A group of researchers from the University of Washington has shown for the first time that it's possible to encode malicious software into physical strands of DNA, so that when a gene sequencer analyzes it the resulting data becomes a program that corrupts gene-sequencing software and takes control of the underlying computer. [...]
But encoding [the buffer overflow attack] in actual DNA proved harder than they first imagined. DNA sequencers work by mixing DNA with chemicals that bind differently to DNA's basic units of code -- the chemical bases A, T, G, and C -- and each emit a different color of light, captured in a photo of the DNA molecules. To speed up the processing, the images of millions of bases are split up into thousands of chunks and analyzed in parallel. So all the data that comprised their attack had to fit into just a few hundred of those bases, to increase the likelihood it would remain intact throughout the sequencer's parallel processing.
When the researchers sent their carefully crafted attack to the DNA synthesis service Integrated DNA Technologies in the form of As, Ts, Gs, and Cs, they found that DNA has other physical restrictions too. For their DNA sample to remain stable, they had to maintain a certain ratio of Gs and Cs to As and Ts, because the natural stability of DNA depends on a regular proportion of A-T and G-C pairs. And while a buffer overflow often involves using the same strings of data repeatedly, doing so in this case caused the DNA strand to fold in on itself. All of that meant the group had to repeatedly rewrite their exploit code to find a form that could also survive as actual DNA, which the synthesis service would ultimately send them in a finger-sized plastic vial in the mail.
The result, finally, was a piece of attack software that could survive the translation from physical DNA to the digital format, known as FASTQ, that's used to store the DNA sequence. And when that FASTQ file is compressed with a common compression program known as fqzcomp -- FASTQ files are often compressed because they can stretch to gigabytes of text -- it hacks that compression software with its buffer overflow exploit, breaking out of the program and into the memory of the computer running the software to run its own arbitrary commands.
Even then, the attack was fully translated only about 37 percent of the time, since the sequencer's parallel processing often cut it short or -- another hazard of writing code in a physical object -- the program decoded it backward. (A strand of DNA can be sequenced in either direction, but code is meant to be read in only one. The researchers suggest in their paper that future, improved versions of the attack might be crafted as a palindrome.)
This next part makes me sad. By "verge on cheating" you mean "absolutely totally cheating":
Despite that tortuous, unreliable process, the researchers admit, they also had to take some serious shortcuts in their proof-of-concept that verge on cheating. Rather than exploit an existing vulnerability in the fqzcomp program, as real-world hackers do, they modified the program's open-source code to insert their own flaw allowing the buffer overflow.
Still, it's a great stunt. Original paper here.
My favorite line is "The repeated 0xdeadbeef bytes produced a long (40+ base pair) repetitive sequence" because it makes me wonder whether this DNA is technically dead, and/or beef.
Brings new meaning to the old "sanitise your user inputs" advice.
I don't know why but the thought of it is kind of disgusting...
Makes me think of the dumbest hack on a TV crime show. On Bones, a computer caught on fire, an event caused by malware introduced to a computer by scanning a bone that had the malware code carved into its surface in a "fractal pattern" (I think the scan was something like a laser creating a 3D model of the bones).
Most people in the biz compress fastq with gzip, or they compress mapped reads/sequences in bam/cram format using samtools.
Maybe this reflects some level of ignorance, but I've never heard of fqzcomp, and I think calling it commonly used is a stretch.This is an interesting proof-of-concept, though.
It's pretty clear that they tried really hard to come up with a real-world exploit but failed. I'm guessing that they ran up against a deadline, and then contrived the rest to get a good headline.
But they have absolutely demonstrated that this is possible, with a bit more work and the proper unpatched exploit, which is amazing enough in its own right.
It is amazing. Reading through the paper in detail. Thanks for posting this.