It's fitting that nil seems to be literally leaking. Said McCarthy: "...some of the decisions ... later proved unfortunate. ... Besides encouraging pornographic programming, giving a special interpretation to the address 0 has caused difficulties in all subsequent implementations."
Perhaps future historians will debate which 20th century McCarthy did the most lasting damage.
Aw hell no, you can't pin this all on McCarthy. The concept of "no value" must be represented somehow, and in Lisp NIL is almost certainly not 0 in its memory representation anyway. The lion's share of the blame belongs to one person, Dennis Ritchie. NIL is one thing, but the NUL-terminated string is what destroys everything. Everything. Like I said a while back,
I'd like to point out again that nearly every security bug you've experienced in your entire life was Dennis Ritchie's fault, for building the single most catastrophic design bug in the history of computing into the C language: the null-terminated string. Thanks, Dennis. Your gift keeps on giving.
Sorry! I was going to blame Tony Hoare, but he only claims to have invented the null reference in 1965 -- a sort of posterior art, if you will. Dennis can take the blame.
By the way, re my comment just below, Tony Hoare took over as Oxford's second Professor of Computation after Strachey unexpectedly died of hepatitis in 1975. In addition to bestowing zero-terminated CPL strings upon everyone in 1963, Strachey also wrote the first AI heuristic search program (for playing checkers) and the first computer melody (using Alan Turing's musical note routines), both in 1951, as well as the first generative-grammar text production in 1952.
It wasn't Ritchie; it was Christopher Strachey, in late 1963, when he implemented strings for CPL (from which BCPL, B, and C copied their string implementations and libraries). CPL strings had to be independent of word size, which varied across the language's two initial target architectures, the Titan in Cambridge and the Atlas in London. See "*Z and *z stand for stopcode" on page 32 here.
Thanks, that was a fascinating read but p32 doesn't seem to say that strings must be NUL-terminated, just gives a way to embed NULs and other characters via escapes. The rest of the document doesn't seem to elaborate on string representation, and I suspect it was an implementation decision.
According to Poul-Henning Kamp, it seems to be Ritchie's fault (with Thompson and Kernighan), but they were propagating an earlier PDP assembly and BCPL convention.
Thanks also for the other links on Strachey.
When BCPL was ported from the 24- and 48-bit-word Atlas/Titan INTCODE system to the 36-bit-word PDP-10 OCODE system, strings changed from NUL-terminated bytes to 7-bit characters packed five per word, with a 7-bit length. See page 14 here.
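In 1993, Ritchie wrote:
None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled `*e'. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.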
('*e' wasn't NUL by the way, but CPL's '*z' was.)
I've asked David Hartley to clarify. At 79, he is still Director of the U.K. National Museum of Computing, a post he has held since 2012. The answer may be in one of his papers, but it's paywalled.
Thanks again. I can read Hartley's paper but it doesn't say what the string representation was. I'd be genuinely interested in his reply.
I heard back from him and one of Strachey's biographers (Strachey's 100th birthday is soon). They couldn't remember, but I gather they are now looking through printouts in Cambridge's archives to find out.
Thanks -- it would be great if you could persuade them to scan and OCR it as they go. More of this early stuff should be online.
Just read George Coulouris's paper ("The London CPL1 compiler", Comp. J BCS, 1968) via the Wikipedia page for the Institute of Computer Science. The paper says:
Strings are stored as a vector of half words. The first half word is the length of the string, subsequent half words are consecutive characters, simple or overprinted.
(CPL1 is a simplified version of CPL). So it looks like the London CPL compiler used packed arrays like BCPL. Maybe the Cambridge compiler did something different? I'd still be interested in David Hartley's group's reply.
Are they doing anything for Strachey's 100th birthday?
(sorry if my other reply sounded a bit dismissive; I seem to have become a bit obsessed by this!)
Thanks, Tim. I forwarded your message and will share any reply. There is a conference for Strachey's birthday and here is a paper for it.
I read David Hartley's paper, and it mentions that the only completed CPL compiler was written by George Coulouris (later professor at QMW). I emailed him, and while he didn't explicitly give me permission to quote him, the gist is that he doesn't remember what string format was used, but would be surprised if it was zero-terminated; also, the language is unlikely to have specified that.
For BCPL, most compilers seem to use a packed format (recall that BCPL has only a word type). Martin Richards's manual, annotated by Dennis Ritchie himself (the PDF linked from that page; example on page 10), uses a packed string format with the length stored in the first byte.
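By contrast, B definitely used terminated strings (strictly, EOF-terminated); Ken Thompson's manual makes this explicit:
A string is any number of characters between " characters. The characters are packed into adjacent objects (lvalues sequential) and terminated with the character '*e'.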
If you look at early Unix sources, even v1 (first PDP-11 version) uses zero-terminated strings -- e.g. look at the source of sh.s.
Curiously, v0 (for the PDP-7) doesn't seem to use them. The source is from one guy's amazing project to run PDP-7 Unix on an actual PDP-7 -- look e.g. at the source for 'cp.s'. I may be misreading the assembly, of course. PDP-11 assemblers have an .ASCIZ directive, but while PDP-7 assemblers had a TEXT declaration, it doesn't seem to have produced zero-terminated strings.
All the above is only circumstantial evidence, of course, but I think it's unlikely that zero-terminated strings came from CPL. They were definitely in Ken Thompson's B and v1 of Unix, so if anyone must be blamed, it may as well be Dennis Ritchie and Ken Thompson. Still, it seems uncharitable to blame them for all the consequences, and I suspect that zero-terminated strings were invented independently several times.
Handle with CAR?
Handle with CADR?
1) I want the T-shirt
2) I want the 21st century Symbolics machine from the alternate history where Symbolics never went bankrupt.
3) For retrocomputing purposes, I'll note that somewhere out there is the last updated source to Genera. Someone had better make the thing work again on emulators before the last people that could answer questions about how to make it function vanish.
So... when can I buy this T-Shirt as DNALounge swag?