DadaDodo |
Exterminate All Rational Thought |
© 1997-2003 Jamie Zawinski <jwz@jwz.org>
DadaDodo is a program that analyses texts for word probabilities, and then generates random sentences based on that. Sometimes these sentences are nonsense; but sometimes they cut right through to the heart of the matter, and reveal hidden meanings.
William S. Burroughs called this ``cut-up theory.'' His approach was to take a page of text, divide it into quadrants, rearrange the quadrants, and then read the page across the divisions. He wrote this way: writing, cutting up, shuffling, publishing the result. Collage and randomness applied to words. He saw this as a way of escaping from a prison that words create for us, locking us down into one way of thinking: an idea echoed in Orwell's ``1984,'' where the purpose of Newspeak was to make Thoughtcrime impossible by making it inexpressible: ``The Revolution will be complete when the language is perfect.'' |
In 1976, industrial music found a name, when Throbbing Gristle formed Industrial Records (``Industrial Music for Industrial People'') along with such bands as Cabaret Voltaire and ClockDVA. These bands were heavily influenced by Burroughs' ideas, and cut-up theory made its way into their music, when the bands would make tape recordings of found sounds (machinery, short-wave radio, television newscasts, public conversations) and cut up, rearrange, and splice the tapes, turning them into music. This was long before digital audio: this was done with razor blades. Today, it's called sampling, and the influence of these bands is felt in nearly all branches of modern pop music. This wasn't the first time ``natural'' sounds had been used in musical compositions; that sort of thing had been going on at least as far back as the 19th century, and the surrealists and futurists of the 1920s and 1930s were way into this kind of thing. |
Ted Nelson, the inventor of hypertext, published ``Computer Lib'' in 1974. This book was more a stream- |
DadaDodo is one of the class of programs known as ``dissociators,'' a term perhaps coined by the amazing Emacs hack, ``Dissociated Press.''
DadaDodo works rather differently than Dissociated Press; whereas Dissociated Press (which, incidentally, refers to itself as a ``travesty generator'') simply grabs segments of the body of text and shuffles them, DadaDodo tries to work on a larger scale: it scans bodies of text, and builds a probability tree expressing how frequently word B tends to occur after word A, and various other statistics; then it generates sentences based on those probabilities. The theory here is that, with a large enough corpus, the generated sentences will tend to be grammatically correct, but semantically random: exterminate all rational thought. |
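The probability-table idea described above can be sketched in a few lines. This is a hypothetical illustration, not DadaDodo's actual code: count how often word B follows word A, then walk the table, picking each next word with probability proportional to its observed count.

```python
import random
from collections import defaultdict

# Build the order-1 statistics: chain[A][B] = number of times B followed A.
def build_chain(text):
    chain = defaultdict(lambda: defaultdict(int))
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a][b] += 1
    return chain

# Walk the chain: at each step, choose the next word weighted by frequency.
def generate(chain, start, length=12):
    out = [start]
    word = start
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:          # dead end: no word ever followed this one
            break
        nexts = list(followers)
        weights = [followers[w] for w in nexts]
        word = random.choices(nexts, weights=weights)[0]
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the rat"
chain = build_chain(corpus)
print(generate(chain, "the"))
```

With a large enough corpus, walks through this table tend to be locally plausible but globally meaningless, which is exactly the effect described above.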
|
DadaDodo doesn't work quite as well as I would like it to. Here's the bug: the smaller the amount of input text, the better the sentences are that it generates. (Above a certain minimum, that is.) I think I understand why this is. My guess is that as the body of input text increases, the probabilities even out to normal English word frequencies. |
I don't think treating every pair of words as one ``word'' for statistical purposes will work very well; that will be too clumpy.
I want some kind of commutative |
I'd kind of like to do this without adding another dimension to my graph, because it's already pretty huge. Another order of magnitude just won't do. But it seems that human language isn't a system which can be modeled by a Markov chain of order 1.
|
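The ``another dimension'' complained about above can be made concrete. In a hypothetical order-2 chain (a sketch, not DadaDodo's code), the table is keyed on the *pair* of preceding words, so every distinct word pair needs its own row and storage can grow toward the square of the vocabulary:

```python
import random
from collections import defaultdict

# Order-2 statistics: chain[(A, B)][C] = times C followed the pair (A, B).
def build_order2(words):
    chain = defaultdict(lambda: defaultdict(int))
    for a, b, c in zip(words, words[1:], words[2:]):
        chain[(a, b)][c] += 1
    return chain

# Walk the chain, sliding the two-word window forward at each step.
def generate_order2(chain, a, b, length=10):
    out = [a, b]
    for _ in range(length - 2):
        followers = chain.get((a, b))
        if not followers:
            break
        nexts = list(followers)
        c = random.choices(nexts, weights=[followers[w] for w in nexts])[0]
        out.append(c)
        a, b = b, c
    return " ".join(out)
```

The output is more grammatical, but the key space is now pairs rather than single words, which is the order-of-magnitude blowup in question.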
Dissociated Press works better because it only ever operates on small inputs, and always shuffles large-ish chunks. Burroughs' cut-ups work better because they work on large-ish chunks, and there are spatial relations that come into play. |
But I really like the idea of breaking the original text down into probabilities and then generating from that, rather than taking the original text and shuffling it. The shuffling approach feels like it preserves too much of the original content, whereas all I want to preserve is the original grammar. Maybe that's not possible (practical). I don't want to have huge lists of nouns/ |
One possibility would be to keep only the most popular 3-way and higher combinations around; for the more common ones, I could hold a pointer to a sub-table, instead of a flat probability. Their popularity could be found using a quadtree- |
Another good compression trick would be to quantize the values; though
the maximal numerator or denominator that we need to express the
probabilities might be a 16 bit number (or higher), we probably could make
do with 8 bits (or less) of resolution: have an 8-bit lookup table of
approximate probabilities. I suspect that the values in this table
would end up being on a logarithmic scale (since that's how nature works.)
|
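A sketch of that quantization trick, with illustrative constants (assumptions, not DadaDodo's actual values): probabilities whose exact representation would need 16 bits are approximated by an 8-bit index into a logarithmically spaced lookup table.

```python
import math

BITS = 8
LEVELS = 1 << BITS            # 256 representable probability levels
MIN_P = 1.0 / 65536.0         # smallest distinguishable probability (16-bit denominator)

# TABLE[i] runs from MIN_P (i = 0) up to 1.0 (i = 255), evenly spaced in
# log space, so the relative error stays roughly constant (about 2%).
TABLE = [MIN_P ** (1.0 - i / (LEVELS - 1)) for i in range(LEVELS)]

def quantize(p):
    """Map a probability in (0, 1] to the nearest 8-bit table index."""
    p = max(MIN_P, min(1.0, p))
    return round((1.0 - math.log(p) / math.log(MIN_P)) * (LEVELS - 1))

def dequantize(i):
    """Recover the approximate probability for an 8-bit index."""
    return TABLE[i]
```

Log spacing matters here: a fixed-step (linear) table would waste nearly all of its 256 levels on large probabilities and crush the small ones together.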
DadaDodo doesn't do quite as much as I would like it to.
I want it to crawl the web and consume text. |
I want it to sometimes randomly bounce to a similar- |
I want it to count syllables, and thereby generate haiku. This could be done by simply generating random sentences until we get ones that have the word and sentence breaks in the right places: it shouldn't take more than a few hundred or thousand iterations each. I think syllable counting is just a hyphenation problem.
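The rejection-sampling idea could look like the predicate below. The syllable counter is a deliberately crude vowel-group heuristic (an assumption for illustration; as noted above, real syllable counting is a hyphenation problem, and this miscounts words like ``queue''):

```python
import re

# Crude heuristic: one syllable per maximal run of vowels.
def count_syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def is_haiku(words):
    """True if the words pack exactly into 5/7/5 syllable lines."""
    targets = [5, 7, 5]
    line, total = 0, 0
    for w in words:
        if line == 3:              # words left over after the third line
            return False
        total += count_syllables(w)
        if total == targets[line]:
            line, total = line + 1, 0
        elif total > targets[line]:
            return False           # a word straddles a line break: reject
    return line == 3

# Rejection loop: given any sentence generator, keep trying until one scans.
def find_haiku(generate_sentence, max_tries=10000):
    for _ in range(max_tries):
        sentence = generate_sentence()
        if is_haiku(sentence.split()):
            return sentence
    return None
```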
| |
Quoth a random Netscape employee, about The Dr. Bronner's Peppermint Castile Soap School of Web Site Design and Panhandling: ``If I see this guy again, so help me, I'm going to refer him for hiring as a web site designer.'' Was it Robert McElwaine? |
DadaDodo is still kinda cool, though.
|