Stephen Wolfram's Epic Analytics are EPIC.


Let's start off talking about email. I have a complete archive of all my email going back to 1989. Here's a plot with a dot showing the time of each of the third of a million emails I've sent since 1989:

OK. So email is one kind of data I've systematically archived. And there's a huge amount that can be learned from that. Another kind of data that I've been collecting is keystrokes. For many years, I've captured every keystroke I've typed -- now more than 100 million of them:

He what? He what? He archived every keystroke for a decade.

I am in a great deal of awe and not a little bit of outright worship at this project. Holy shit. Why didn't I do that?? Look at those fucking graphs. Just fucking look at them.

  1. Piku says:

    I have all my mail going back to 1999. Not entirely sure how to suck the past 8 years' worth out of Gmail though.

    • nandhp says:

      You can use IMAP for that. I have a cron job that does this every 3 hours using mbsync.
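For anyone without mbsync handy, the same idea can be sketched with Python's standard `imaplib` and `mailbox` modules. This is a swapped-in technique, not nandhp's setup: the function name `archive_folder`, the folder name, and the output path are all hypothetical, and the sketch re-downloads everything on each run rather than syncing incrementally the way mbsync does.

```python
import imaplib   # only needed by the caller, to build the connection
import mailbox


def archive_folder(conn, folder, mbox_path):
    """Append every message in `folder` to a local mbox file; return the count.

    `conn` is an already-authenticated imaplib.IMAP4 / IMAP4_SSL object.
    """
    status, _ = conn.select(folder, readonly=True)  # read-only: don't mark mail as seen
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")

    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for num in data[0].split():
            status, parts = conn.fetch(num, "(RFC822)")
            if status != "OK":
                continue  # skip messages the server refuses to return
            box.add(parts[0][1])  # parts[0] is (envelope, raw message bytes)
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

To use it, build the connection with `imaplib.IMAP4_SSL(host)`, call `login(user, password)`, and pass the result in along with a folder such as "INBOX"; the host and credentials are yours to supply. Run from cron, this gives a crude version of the every-3-hours archive described above.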

  2. pavel_lishin says:

    > Why didn't I do that??

    Because you're a security conscious person who doesn't want his project revealing all of his passwords?

  3. Russell Borogove says:

    How is it that Wolfram doesn't recognize a beta distribution when he sees one?

    • DFB says:

      Beta distributions are unimodal, and those have troughs for both sleep and dinner.

      I wonder what was going on at 4 and 5 am in 2005-7. Mid-2005 is particularly weird and not just a time zone shift like mid-2009.

      • I was talking about the emails-per-day plot, which he specifically calls out with a 'what the heck is this distribution', not the time-of-day plots.

    • Richard says:

      How is it that Wolfram doesn't recognize a beta distribution ...

      Aside from being one of the world's biggest dickheads, Wolfram's forgotten more mathematics (and not just the lower high school stuff that sets programmers aside from their math-is-hard managers) than you'd learn in several lifetimes.

      That's why. Because it isn't.

  4. David M.A. says:

    Why would anyone--

    That's the wrong question, isn't it?

  5. Bill Paul says:

    I'm sure the government has been archiving all your e-mails and keystrokes for you.

  6. David Glover says:

    Which is great as long as you don't want to use Wolfram Alpha Pro to analyse your own email, because while they allow mbox uploads, there's a file size limit of 1MB.

    Yes, MB.

    I have single emails bigger than that.

  7. Sam Kington says:

    I love how he says "yeah, I've been storing every keystroke for years", in a sort of "what, doesn't everyone do that?" way.

  8. Phillip Remaker says:

    Hoarders, Mad Science edition.

  9. OCD is a helluva drug

  10. Joe says:

    He should graph his Judge Wapner data.

  11. And I thought that having a livejournal for the past ten years was something of note. >:(

  12. rwos says:

    To create a plot of the file modification times (like his, but w/o colors) with gnuplot, do something like this:
    Put the following into "" or so:
    set xdata time
    set timefmt "%Y-%m-%d"
    set format x "%Y"
    set xlabel 'year'
    set ylabel 'hour'
    set yrange [0:24]
    set ytics 1
    set xtics 365*24*60*60
    set terminal png size 800,600
    plot "/tmp/times.dat" using 1:2 title 'mod time' with dots

    Then do
    find / -printf '%CY-%Cm-%Cd %CH %CM\n' | \
    awk '{print $1" "$2"."($3/60)*100;}' | \
    sed 's/\.[^.]\+$//' > /tmp/times.dat

    And create the plot via
    gnuplot > plot.png

    (The awk-hackery for the hours on the y-axis is because gnuplot can't handle two different time formats natively - at least I don't know how). Works on my Linux machine, YMMV. My plot isn't too interesting, though - I was born the year Wolfram started collecting his mails, so...

    • rwos says:

      Oh, it should be ($3/59) not ($3/60) in the awk line. Well, better use something like Mathematica, I guess.

      • Alex says:

        Also, those should be { not ( in the awk. PS for anyone trying this at home, gnuplot will strftime into RFC 822 style but it won't strptime out of it.
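Given the back-and-forth over the awk line, here is a hedged alternative for producing the same two-column data file (date, then hour of day as a decimal) in Python. It is a sketch, not rwos's method: the function names `mtime_points` and `write_points` are made up for illustration, and it reads `st_mtime` (modification time), whereas the `%C` directives in GNU find's `-printf` report status-change time (`%T` is the modification-time family). The hour fraction is simply minutes divided by 60, so 14:30 becomes 14.50.

```python
import os
import time


def mtime_points(root):
    """Yield (date_string, fractional_hour) for each file found under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            t = time.localtime(st.st_mtime)
            hour = t.tm_hour + t.tm_min / 60.0  # e.g. 14:30 -> 14.5
            yield time.strftime("%Y-%m-%d", t), hour


def write_points(root, out_path):
    """Write one 'YYYY-MM-DD HH.hh' line per file; return the number of lines."""
    count = 0
    with open(out_path, "w") as out:
        for day, hour in mtime_points(root):
            out.write(f"{day} {hour:.2f}\n")
            count += 1
    return count
```

Calling `write_points("/", "/tmp/times.dat")` should produce a file in the shape the gnuplot script above expects (`using 1:2` with `timefmt "%Y-%m-%d"`), though walking the whole filesystem can take a while and will skip anything it cannot read.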

  13. Heh, looks almost exactly like a raster plot out of an electrophysiology paper.

  14. Colin Dean says:

    There was a thing back in 2002 called Project Dolphin which had a client called Pulse that did keystroke counting. The goal was to see how many keys the average Internet user strokes in a year. I hung out with nirgle and friends on IRC for a couple of years and even learned a bunch of Esperanto in the process.

    Ah, 10 years ago.

  15. Zingus J. Rinkle says:

    On 1 Jan 2002 something big happened to his email habits.
    Graduated from college? Dropped out of previous job?

    His keystrokes don't reflect it, but that's because they start in 2002.
    All in all his sleeping patterns since 2002 are really boring and normal.

    • tkil says:

      He explains quite a few of the visually-obvious landmarks in the linked article.

      The really radically displaced e-mail was a trip to Europe, IIRC. Some of the other spikes (and changes to daily routines) had to do with other significant events in his life, e.g., releases of Mathematica 3 or A New Kind of Science.

  16. Ian says:

    Never mind that crap, how many orgasms has he had?