Number 4294967295 might surprise you.
Culturally, code exists in a nether zone. We can feel its gnostic effects on our everyday reality, but we rarely see it, and it's quite inscrutable to non-initiates. (The folks in Silicon Valley like it that way; it helps them self-mythologize as wizards.) We construct top-10 lists for movies, games, TV -- pieces of work that shape our souls. But we don't sit around compiling lists of the world's most consequential bits of code, even though they arguably inform the zeitgeist just as much.
So Slate decided to do precisely that. To shed light on the software that has tilted the world on its axis, the editors polled computer scientists, software developers, historians, policymakers, and journalists. They were asked to pick: Which pieces of code had a huge influence? Which ones warped our lives?
Code can be used for good or bad. Thanks to Slate for this critical and meaningful insight.
I read the whole thing, and it was fun. But seriously WTF does "gnostic effects" even mean?
As for null-terminated strings, well. The problem they were intended to address was real, and to this day I don't think there was an obviously better solution. It was bad, maybe, that "C programming skills" became synonymous with "real programmer skills" for so long, and good, probably, that application shops aren't advertising for C programmers anymore. But there are reasons why C is still the implementation language for operating systems, and those reasons are closely tied to why some of us still love it.
I'm not sure either, but I took it to mean that there is this invisible underlying structure that informs and creates the world we live in. You can't really see code; you can sort of see its representation and interpretation. But it impacts so much of what we do.
Animism wasn't real, until we built it. I think it's entirely fair to use "gnostic" to describe this new state of the world where formerly-inanimate objects have coldly inscrutable desires of their own.
"Very nearly every security exploit you’ve ever heard of starts here" dramatically overstates the significance of this in 2019.
Not least because of the "branded bug": if an attack works but is only known by a four-digit CVE number, it quickly fades from memory, whereas if it has a logo and a single-serving web site it's far more likely to be something you've ever heard of. But there were no NUL-terminated strings in BEAST and CRIME, or in SIMjacking attacks. The team that set back Iranian bomb ambitions by years didn't need to overflow any buffers; they just knew how to collide MD5.
The last decade or more has reinforced the correct notion that all bugs are security bugs. The system's security depends upon its operating as expected, and all bugs are deviations from expected behavior; otherwise, "that's a feature".
There are plenty of more grievous errors in this piece. Most obviously, JPEG isn't about files, and that's essential to understanding how it came to be the way it is. The JTC sub-committee responsible for JPEG did not envision it as a way to make a data "file" on a computer that contains a compressed image. The first attempts to do that were part of TIFF, and they are a colossal mess as a result of the JPEG system and TIFF having incompatible ideas about how image metadata works.

The JPG files you see today are the product of the Independent JPEG Group and the JPEG File Interchange Format (JFIF) it defined for its "libjpeg" library, which, instead of TIFF's complexity, essentially says "OK, do JPEG compression in this prescribed way and just write all the compressed data from the JPEG standard into a binary file. To get the image back, take the raw data from the file and decompress it. Done".

The royalty-free aspect is because the IJG, fearful of patents, avoided using a feature of JPEG itself (arithmetic coding) that was patented (by IBM, and possibly by others, because as usual patents are a nightmare of overlapping claims), so if you did use that feature (perhaps because you'd paid one of its claimed inventors) your image file wouldn't open with libjpeg, and so de facto the royalty-free variant dominated.
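To make the "just write the compressed data into a binary file" bit concrete, here's a rough sketch (my own throwaway code, not anything from libjpeg) of what the front of a JFIF file looks like: an SOI marker, then an APP0 segment whose payload starts with "JFIF". Plenty of modern .jpg files lead with an Exif APP1 segment instead, so treat this as an illustration, not a validator:

    #include <stdio.h>
    #include <string.h>

    /* Sketch: peek at the front of a .jpg and check for the JFIF layout
     * the IJG defined.  Everything after the headers is just the
     * compressed JPEG data, written out byte for byte. */
    int looks_like_jfif(const char *path)
    {
        unsigned char hdr[11];
        FILE *f = fopen(path, "rb");
        if (!f)
            return 0;
        size_t n = fread(hdr, 1, sizeof hdr, f);
        fclose(f);
        if (n < sizeof hdr)
            return 0;
        return hdr[0] == 0xFF && hdr[1] == 0xD8 &&   /* SOI marker */
               hdr[2] == 0xFF && hdr[3] == 0xE0 &&   /* APP0 marker */
               memcmp(hdr + 6, "JFIF\0", 5) == 0;    /* APP0 identifier */
    }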
It's nice to see Grace get her well-deserved mention, though. It's all very well that people like Turing clearly understood their idea was meta-applicable, but it took Grace to actually do it. Everything else falls out from that, much more so than from NUL-terminated strings or Spacewar, as fun as those are to reminisce about.
If Pascal strings had won out over C strings, we'd see file paths and other API string parameters limited to 255 characters, much as Win32's MAX_PATH (260) still limits them today. Or maybe they'd have been upgraded to multi-byte length fields of differing widths and endianness, and APIs would have become fractured and incompatible. C-strings ↔ UTF-8 ⇔ Pascal-strings ↔ UTF-16, UTF-32, UCS-2.
Buffer overflows would still exist, because length-prefixed strings do not imply bounds checking. Languages without automatic bounds checks would still outperform languages with them, people would still get annoyed at slow programs and prefer competing programs that do the same thing faster, and so programmers would still prefer "unsafe" languages over "safe" ones.
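A rough sketch of what I mean, with made-up names: the length prefix tells you how long the string is, but nothing about the representation itself forces the code writing into it to check anything.

    #include <string.h>

    /* Hypothetical length-prefixed ("Pascal-style") string: a count byte
     * followed by up to 255 characters. */
    struct pstr {
        unsigned char len;
        char data[255];
    };

    /* No bounds check: overflows dst just as happily as strcpy() would
     * if src is longer than 255 bytes, and the count byte silently wraps. */
    void pstr_set_unchecked(struct pstr *dst, const char *src)
    {
        size_t n = strlen(src);
        memcpy(dst->data, src, n);
        dst->len = (unsigned char)n;
    }

    /* Bounds check: a decision the implementer makes (and pays for),
     * not something the length prefix hands you for free. */
    void pstr_set_checked(struct pstr *dst, const char *src)
    {
        size_t n = strlen(src);
        if (n > sizeof dst->data)
            n = sizeof dst->data;     /* truncate instead of overflowing */
        memcpy(dst->data, src, n);
        dst->len = (unsigned char)n;
    }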
I'm glad you think that Everything Is Just Fine and All Of These Decisions Were Reasonable.
Thaaaaaanks for that, I'll give your opinion all due consideration.
Like I said, the problem addressed was real, but I'll give you that gets(3) should never have been implemented, even in the lab.
I find myself thinking more and more these days "maybe 'rough consensus and running code' was not the best design methodology for a critical piece of global infrastructure."
I wonder how much code is slapped together without ever being intended to become underlying infrastructure. Imagine coming home one day to a city official telling you that the Ikea shelves you sort-of bolted to the wall with leftover screws from another project are now a major part of the plumbing system?
The '\0' business is for reading strings, not writing. It's trivial to generate suffixes: if *p != '\0', then p+1 is a pointer to a valid string (modulo split-up mbchars); and ctype.h functions work on '\0'.
If you're writing into a buffer, you need to know the buffer size -- this, not '\0', is the gets() fail. If you're writing into a string, you need to know the buffer size (if you're lengthening the string) and you need to remember to adjust the string length to match your modification -- and that's true whether the string length is determined by a count or by a terminating '\0'.
Buffer overruns are indeed C's gift that keeps on giving, but not because of '\0'.
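To spell that out with a throwaway example (nothing here is anybody's real API):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *s = "hello";

        /* Reading: the terminator makes suffixes free.  Every p+1 below
         * is itself a valid string, with no length bookkeeping anywhere. */
        for (const char *p = s; *p != '\0'; p++)
            puts(p);                  /* prints hello, ello, llo, lo, o */

        /* Writing: the terminator tells you nothing about how much room
         * you have.  You must carry the buffer size yourself, which is
         * exactly the question gets() never asked. */
        char buf[16];
        if (fgets(buf, sizeof buf, stdin) != NULL)
            printf("read: %s", buf);

        return 0;
    }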
At what point did the choice become between braindamaged-and-inherently-unsafe strings and braindamaged strings combined with unsafe implementations which didn't check bounds? Because there should be other choices.
Oh yes, I remember: all our programs have to run on PDP-11s and PDP-11s are really slow. Yes, it's properly bounds checking strings which is making our programs slow. Really, it is: stop laughing over there.
The choice is between perfectly-reasonable-strings-you-don't-like and also-perfectly-reasonable-strings-you-do-like. String representation is a distracting side-show.
Optional/mandatory, automatic/manual, static/runtime access checks are independent of data representation, and we can rejoice together that the majority of programs today are written in managed languages, completely immune to buffer overruns. Yet they still suffer from e.g. deserialisation and injection attacks, because "don't trust the client" is the more fundamental principle.
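For the sake of one concrete example (sticking with C to match the rest of the thread, though the point is language-agnostic, and picking SQLite just because it's small): no buffer is overrun in either function below, yet the first one is wide open, because it trusts the client.

    #include <stdio.h>
    #include <sqlite3.h>

    /* Injectable: user input is pasted straight into the SQL text.
     * Memory-safe strings would not help; the bug is trusting the client.
     * name = "x' OR '1'='1" matches every row. */
    void find_user_injectable(sqlite3 *db, const char *name)
    {
        char sql[256];
        snprintf(sql, sizeof sql,
                 "SELECT id FROM users WHERE name = '%s';", name);
        sqlite3_exec(db, sql, NULL, NULL, NULL);
    }

    /* Parameterised: the input is bound as data and never parsed as SQL. */
    void find_user_bound(sqlite3 *db, const char *name)
    {
        sqlite3_stmt *stmt;
        if (sqlite3_prepare_v2(db, "SELECT id FROM users WHERE name = ?;",
                               -1, &stmt, NULL) != SQLITE_OK)
            return;
        sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
        while (sqlite3_step(stmt) == SQLITE_ROW)
            printf("id = %d\n", sqlite3_column_int(stmt, 0));
        sqlite3_finalize(stmt);
    }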
Future speed increases are irrelevant when a competing program runs rings around yours on the same hardware, today. "This game is dog slow, but because I wrote it in BASIC instead of 6502 assembly, it's memory safe, which is something people in 30 years' time will really respect," said no Apple II programmer ever.
Wrong decade. Apple ][ hackers were not the problem here. The infrastructure that runs our society does not rely on legacy 6502 code.
who else immediately winced at the redundant comparison in the illustration ;-p
tenuous assumptions that yield tighter code will ALWAYS have their place, but if yr code is reckless AND slow ...ugh!
The people who see me point the bone at the null-terminated string and react by saying either "it was right and correct at the time and could not have been different" or "so what, the next problem along would have been just as bad" are so depressing in their lack of imagination. Is this the Best of All Possible Worlds? Really?
I'm reminded of a friend who was an obsessive Civilization player who said to me, "If I haven't reached space by the 14th Century, I just reset and start over."
What if!
What if, some time between 1980 and 1990, C had been supplanted by another language that differed in only two major respects: runtime bounds and type checking? Let's say the performance impact was severe: 30%, even 60%. That would have been such a huge setback! It would have cost so much money! In 1980. But oops, Moore's Law. A body blow like that would have set the industry back all of six months. Apple might be releasing their new phone next March instead of this September.
And that's assuming that everything else in this alternate history played out the same. Which it wouldn't have, because how many lifetimes of effort would have been reallocated toward doing useful things instead of playing Buffer Overflow Whack-a-Mole?
We might be a decade ahead by now.
That alternate history did happen. It involved Ada (starting in '83), and cubic dollars' worth of aerospace and military funding couldn't make it fly. Ariane 5 blew up on Ada's watch, length-counted strings and all.
There are many different answers now. The lessons have finally been learned, or bypassed in the name of Moore's Law. Bits of browsers are being re-written in length-counted languages, as are all the mobile-device GUIs. It'll take a while, but we're probably on the way to recovery at last.
No, it didn't, because I said "supplanted by". "But Ada didn't take off" does not invalidate my hypothetical.
What if Gary Kildall said yes to IBM?
I feel we'd be even further ahead than your hypothetical.