How one man could have pwned all your PHP programs
You’ve probably heard the term “supply chain attack” – it’s an all-the-rage jargon phrase in cybersecurity these days.
The metaphor is obvious – keeping goods safe, secure and unspoilt from manufacture to their final delivery has always been tough business.
The Dutch, for example, came up with brandy (brandewijn, which means distilled wine) especially to reduce the cost of transporting wine and to stop it going off in transit.
At 12% alcohol by volume, wine will soon spoil in the cargo hold of a sailing ship, but at 60%, “burned wine” will stay fresh even if you sail half way round the world and back.
The theory was to transport the wine in “compressed” form, and then dilute the brandy back to wine strength at the other end, so it could be sold at wine-like prices in wine-like volumes…
…but in practice the merchants just switched one problem (that the wine would turn to vinegar on the way) for another (that the brandy would vanish in transit, siphoned off by sailors who thought it far better unrediluted).
You’d think that the digital supply chain would be much easier to secure, given that we’re no longer shipping actual physical stuff from A to B, and that we can use cryptography to detect whether a product has “spoiled” in transit.
But knowing that a download worked perfectly isn’t the same as knowing that you downloaded the right file in the first place – so a hacked repository in 2018 poses just the same problem as an unlocked warehouse did back in 1672.
A Dutch wine trader who personally guarded his precious brandy casks against dipsomaniac sailors all the way from Cape Town to Rotterdam would have wasted his effort if the casks had already been drained and sneakily refilled with seawater before they even left the docks in Table Bay.
One rotten apple
Actually, in some ways the supply chain problem is worse today.
In any repository of software packages there is usually a complex web of dependencies such that some low-level software code may be required by – and automatically be downloaded during the installation of – hundreds or thousands of other packages on the site.
To extend our pre-industrial supply metaphor yet further: in software repositories, one rotten apple really can spoil the whole barrel.
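To get a feel for how far one bad package can spread, here is a minimal sketch in Python. The package names and the dependency table are entirely made up for illustration and don't come from any real repository. The sketch simply walks the "who depends on whom" graph backwards from one tainted package.

```python
from collections import deque

# Hypothetical packages, invented for this sketch: each name maps to
# the packages it requires directly.
REQUIRES = {
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-helper"],
    "template-engine": ["string-utils"],
    "blog-app": ["web-framework"],
    "shop-app": ["web-framework", "payment-lib"],
    "payment-lib": ["tls-helper"],
    "tls-helper": [],
    "string-utils": [],
}

def affected_by(tainted, requires):
    """Return every package that pulls in `tainted`, directly or indirectly."""
    # Invert the graph: for each package, record who depends on it.
    dependents = {name: [] for name in requires}
    for name, deps in requires.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk up the chain of dependents.
    seen = set()
    queue = deque([tainted])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(affected_by("tls-helper", REQUIRES)))
```

In this toy table, tainting the low-level tls-helper package reaches five of the other seven packages, even though only two of them require it directly.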
The Heartbleed bug from 2014 is a good example: a long-overlooked and apparently unimportant memory over-read flaw in one part of the OpenSSL cryptographic library ended up turning millions of otherwise well-behaved websites into leaky buckets that potentially let out private data almost at random.
The servers depended on OpenSSL and therefore automatically inherited its security weaknesses along with its cryptographic strengths.
Likewise, mobile phones sometimes turn up in shops with malware preinstalled.
If the vendor’s own firmware images get infected during development, any phones that have the tainted firmware image added back at the factory will sit around with latent malware on them until they’re delivered, displayed, selected, purchased, taken home and turned on for the first time.
Package managers in the supply chain
For better or worse, today’s most popular software repositories serve truly enormous ecosystems, and many of them are essentially volunteer projects.
NPM, for example, is the Node Package Manager, and it serves up critical content – often entirely automatically – to just about everyone who codes in the Node.js programming environment, used on a huge number of web servers these days.
The Python programming community has PyPI, the Python Package Index; Ruby coders have RubyGems; Linux developers have kernel.org; the list goes on.
PHP, still the leading programming language for server-side web development, has Packagist, where open source PHP libraries can be hosted for free.
For many PHP coders, then, Packagist and the Composer package manager that it serves make up a vital part of their ecosystem.
If Packagist were to be hacked and a rotten apple uploaded in a well-chosen place, a truly enormous barrel would end up poisoned.
The Packagist problem
Fortunately, then, a recent and trivially exploitable vulnerability in the Packagist service was found by a cybersecurity researcher and quickly fixed.
The bug was caused by a surprisingly elementary blunder.
Researcher Max Justicz noticed that, when uploading a new package to Packagist, he was expected to provide a URL where Packagist would go looking for his code, so it could be imported to the Packagist repository.
But the Packagist server didn’t take proper care to check that he really had supplied a URL – he could supply a system command, and rather than visiting the URL, Packagist would blindly run his command instead.
Ouch!
In Justicz’s case, he simply supplied a URL (data that would usually look something like https://example.com) in the format $(command).
The bug explained
When processed by many Linux command shells, text like command is treated literally, and used unaltered.
But $(command) tells the shell to run the program command, collect its output and then replace the text $(command) with that very output.
Imagine that you have source code called greet.c that you compile to a program called greet:
    #include <stdio.h>

    int main() {
       printf("hello world");
       return 0;
    }
    . . .
    $ clang -o greet greet.c   # compile the program above
    . . .
    $ echo greet
    greet
    $ echo $(greet)            # you need ./greet if the command is in the current directory
    hello world
    $
As shown above, the shell hands the text argument greet to the echo command unchanged, as greet, but turns the command line argument $(greet) into hello world first, because that’s the output printed by the greet command above.
Of course, in order to do the greet-to-hello-world conversion, the shell first has to execute the program called greet.
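The same thing happens when a program, rather than a person at a terminal, builds the command line. The Python sketch below is an illustration only, not the Packagist code: it assumes a POSIX system with a standard shell and an echo command, and it uses echo hello world as a stand-in for the greet program. The two helper functions are names invented for this example.

```python
import subprocess

# Stand-in for text supplied by an untrusted outsider. The inner
# "echo hello world" plays the role of the greet program above.
untrusted = "$(echo hello world)"

def run_via_shell(text):
    """Glue the text into a command string and hand it to the shell (risky)."""
    result = subprocess.run("echo " + text, shell=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

def run_without_shell(text):
    """Pass the text as one argument; no shell ever gets to parse it."""
    result = subprocess.run(["echo", text],
                            capture_output=True, text=True)
    return result.stdout.strip()

print(run_via_shell(untrusted))      # the shell ran the embedded command
print(run_without_shell(untrusted))  # the text reaches echo unaltered
```

In the first call the shell sees $(...) and runs what is inside it. In the second call there is no shell in the loop at all, so the text is treated as plain data.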
Therefore a web server should never, ever allow text wrapped in $(...) to be passed directly by an untrusted outsider to a command shell, or else the outsider could trick the server into running any program they wanted, just by putting its name inside the round brackets!
(By the way, putting text inside backticks, like `this`, is an old-style equivalent of $(this) and should be treated with identical suspicion.)
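If a shell really can't be avoided, the untrusted text needs to be quoted so that the shell treats it as literal data. Here is a minimal sketch, assuming Python on a POSIX system; shlex.quote is part of Python's standard library, while echo_quoted is a helper name made up for this example. Both the $(...) form and the backtick form come out unexecuted.

```python
import shlex
import subprocess

def echo_quoted(text):
    """Run echo through the shell, but quote the untrusted text first."""
    # shlex.quote wraps the text in single quotes, inside which a POSIX
    # shell performs no command substitution.
    command = "echo " + shlex.quote(text)
    result = subprocess.run(command, shell=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

for sample in ["$(echo pwned)", "`echo pwned`"]:
    print(echo_quoted(sample))   # printed back literally, never run
```

Quoting is a fallback, not a cure-all: leaving the shell out altogether is usually the simpler and safer design.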
The jargon terms for this sort of flaw are:
- Command injection. By sneaking commands into input that is trusted as pure data at the other end, an attacker can sucker the server into running embedded commands.
- Bad input sanitisation. Before trusting data sent from outside, a server needs to go through it and make sure there are no treacherous characters or other data that could make command injection possible.
What to do?
If you’re a programmer, never rely on external data without checking it carefully.
Characters in text strings that are harmless when you print them out, such as semicolons, quote marks, angle brackets, round brackets, backticks, vertical bars (also called pipes), dollar signs, tildes (also known as twiddles), asterisks and many more, often mean different things to system functions than they do to human readers.
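One common way to do that checking is to decide what acceptable input looks like and reject everything else, rather than trying to list every dangerous character. The sketch below is an assumption-laden illustration in Python, not the fix that Packagist applied: the permitted character set, the permitted schemes and the function name looks_like_repo_url are all choices made for this example.

```python
import re
from urllib.parse import urlsplit

# Allowlist of characters accepted in a repository URL (an assumption
# for this sketch). Note what is absent: $ ( ) ` ; | < > quotes, spaces.
SAFE_URL_CHARS = re.compile(r"[A-Za-z0-9._~:/@%+=-]+")

# Schemes accepted in this sketch (also an assumption).
ALLOWED_SCHEMES = {"http", "https"}

def looks_like_repo_url(text):
    """Return True only if the text is a plain http(s) URL with a host."""
    if not SAFE_URL_CHARS.fullmatch(text):
        return False
    parts = urlsplit(text)
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)

for candidate in ["https://example.com", "$(command)", "`command`"]:
    print(candidate, looks_like_repo_url(candidate))
```

A check like this belongs alongside, not instead of, keeping untrusted text away from the shell in the first place.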
As carpenters like to say: measure twice, cut once!