Ok, guess what? The Mail.app importer discards the "Date:" header and uses whatever was in the envelope ("From_") instead. So, not only is this bogus (the send-time is more interesting than the receive-time) but it means that it's going to mis-date just about all of my really old mail, since those have correct Date: headers but seem to have in their envelope the date at which they were converted to "mbox". I blame Kyle Jones.

This means I get to write a script to parse all those Date: headers and regenerate the envelopes. How many times have I done this? Then re-import the mail again. How many times have I done this?
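For the regenerate-the-envelopes step, here is a minimal sketch in Python using only the standard library's `mailbox` and `email.utils` modules. It assumes the old mail sits in plain mbox files, and that a Date: header `parsedate_tz()` can't read should simply keep its existing envelope. `envelope_from_date` and `redate_mbox` are hypothetical names, not part of any existing tool, and the messier early-90s headers discussed further down will need more help than this.

```python
import mailbox
import time
from email.utils import mktime_tz, parsedate_tz


def envelope_from_date(sender, date_header):
    """Return "sender <ctime-style date>" for a From_ line, or None.

    None means the Date: header was missing or unparseable, so the caller
    should keep the existing envelope rather than invent a date.
    """
    if not date_header:
        return None
    try:
        parsed = parsedate_tz(str(date_header))
        if parsed is None:
            return None
        # mktime_tz turns the parsed tuple (with its zone offset) into
        # seconds since the epoch, UTC.
        stamp = mktime_tz(parsed)
        # asctime() gives the ctime-style layout mbox readers expect,
        # e.g. "Sun Dec  3 22:30:00 1995".
        return f"{sender} {time.asctime(time.gmtime(stamp))}"
    except (TypeError, ValueError, IndexError, OverflowError, OSError):
        return None


def redate_mbox(src_path, dst_path):
    """Copy an mbox, rebuilding each From_ envelope from its Date: header."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    dst.lock()
    try:
        for msg in src:
            # get_from() is the From_ line without "From " or the newline;
            # its first word is the envelope sender.
            sender = msg.get_from().split(" ", 1)[0] or "MAILER-DAEMON"
            new_from = envelope_from_date(sender, msg["Date"])
            if new_from is not None:
                msg.set_from(new_from)
            dst.add(msg)
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
```

Writing to a new file rather than rewriting in place means a bad run costs nothing but disk space.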


otterley: I gave up on IMAP because the server is just far too much of a pain in the ass to figure out how to get running.


56 Responses:

  1. earino says:

    About a year or so ago, I was writing my 20 billionth "parse this mailbox, generate new emails, import them into this new mail account" thing, and I snapped and decided I didn't feel like dealing with this. Deciding that google was probably going to be around at least for 4-5 years, I wrote a perl script that went through my 800 megs worth of mail, and I sent it all to a brand spankin' new "gmail archive" account. Now, all of my email from 2003 and before is nicely stored away in a "historical gmail account" that I can search using their lovely tools, send email from, etc... I sincerely doubt that such a thing would work for you, but I resolved to not write another mail parsing utility until google went under :)

    • fgmr says:

      I did that. Unfortunately, gmail uses the date it received the message, not the original date. So all my imported mail is dated last April. Sigh.

      • earino says:

        Yep. I noticed the same problem, I modified the body of my message to say: "Originally sent on [yadda yadda]", however once I had a pretty decent search engine for it, I really basically stopped using the dates for anything. Again though, far and away not a perfect solution, I just wanted to stop worrying about carrying my damned mailbox everywhere I went.

      • jkonrath says:

        I thought about doing the gmail "archive of every mail ever" thing and ran into that exact problem.

        Also, it puts together the "conversations" or threads or whatever strictly by sorting on the Subject. So every "Re: (no subject)" over the last decade is going to be lumped into one single thread. Not cool.
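To see why subject-only grouping lumps every "Re: (no subject)" together, compare it with grouping on the References:/In-Reply-To: headers. This Python sketch is illustrative only. It is not how Gmail actually threads, the function names are hypothetical, and the header-based version is a deliberately simplified stand-in for a real threading algorithm.

```python
import re
from collections import defaultdict

# Strips any run of "Re:" / "Fw:" / "Fwd:" prefixes, case-insensitively.
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    return _REPLY_PREFIX.sub("", str(subject or "")).strip().lower()


def thread_by_subject(messages):
    """Subject-only grouping: every "(no subject)" ends up in one thread."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg.get("Subject"))].append(msg.get("Message-ID"))
    return dict(threads)


def thread_by_references(messages):
    """Simplified header-based grouping.

    Uses the first Message-ID in References: as the thread root, falling back
    to In-Reply-To: and then to the message's own Message-ID. A real threader
    also has to cope with missing and out-of-order parents.
    """
    threads = defaultdict(list)
    for msg in messages:
        refs = str(msg.get("References") or "").split()
        if refs:
            root = refs[0]
        else:
            root = str(msg.get("In-Reply-To") or msg.get("Message-ID") or "").strip()
        threads[root].append(msg.get("Message-ID"))
    return dict(threads)
```

Both functions accept anything with a `.get()` method, so `email.message.Message` objects and plain dicts both work.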

  2. boggyb says:

    Ouch. Wasn't the whole point of a Mac to get away from all this?

    Well, at least your sound is working now.

  3. dballing says:

    There are two column headers available, "Date Received" and "Date Sent"... right-click on the column headers and it will allow you to customize which ones you're presented with.

    Remove the "Date Received", add the "Date Sent", and you're in good shape.

    • jwz says:

      I believe the importer is actually overwriting the contents of the Date: header in the *.mbox/Messages/*.emlx files with the date that was in the envelope. Perhaps when incorporating mail "normally" it keeps track of both dates, but it looks to me like the importer is replacing it.

      • unixbigot says:

        Hmmn, does that mean it'll do the Right Thing if you cat all your old mail onto
        $unixbox:/var/mail/jwz and 'import' it via pop?
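One way to check whether the importer really rewrote Date: is to read the headers straight out of the .emlx files. This Python sketch assumes the commonly described .emlx layout: a first line holding the message length in bytes, then the raw message, then an XML plist of Mail.app metadata. That layout is an assumption here, not something confirmed in this thread, and `read_emlx_headers` / `emlx_date` are hypothetical helper names.

```python
from email.parser import BytesHeaderParser


def read_emlx_headers(path):
    """Parse the headers of a Mail.app .emlx file.

    Assumes the layout: a first line giving the message length in bytes,
    then the raw RFC 822 message, then an XML plist (ignored here).
    """
    with open(path, "rb") as f:
        length = int(f.readline().strip())
        raw = f.read(length)
    # Only the headers are parsed; the body is left as an unparsed payload.
    return BytesHeaderParser().parsebytes(raw)


def emlx_date(path):
    """Return the Date: header as stored in the .emlx file, or None."""
    return read_emlx_headers(path)["Date"]
```

Run it over the *.mbox/Messages/*.emlx files and compare each result with the Date: header in the original mbox; if they differ, something in the import path did the rewriting.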

  4. lx says:

    Maybe this'll save you time, maybe you have something snazzier, but I started wondering what that would look like and this is what happened:

    s/(^|\n)From (.*)(\w{3} \w{3} [0-9 ]\d \d\d:\d\d:\d\d \d{4})(.*)\nDate: (.*?)\n/$1From $2$5$4\nDate: $5\n/gi

    Apply at your Peri?l.

    • jwz says:

      Yeah, well, part of the problem is that in the early 90s, people put whatever-the-fuck they felt like in the Date headers, so they're all over the map. And parsers are often picky about the date format in the From_ line, so if you don't parse and regenerate, you end up with one message getting tacked onto the end of the previous one.

      • lx says:

        Ah, lousy.

        Happy Parsing!

        • terryray says:

          Gee, Jamie, if only you knew where to find some generic all-purpose can-handle-anything-in-the-world date parser...

          • strspn says:

            Terry, you obviously have a date parser of some kind in mind.

            Which is the best for this task? I nominate Tcl's [clock scan]

            $ tclsh
            % clock format [clock scan "12/12/12"]
            Wed Dec 12 12:00:00 AM PST 2012
            % clock format [clock scan "Dec 03, 95"]
            Sun Dec 03 12:00:00 AM PST 1995

            • jwz says:

              He's talking about the one I wrote for Netscape 2.

              Which I stayed up all night porting to Perl.

              Shoot me...

              • terryray says:

                You mean, you just ported it last night? Just for this?

                OK, fine. Bang bang, you're dead.

                So, just out of curiosity, did you start with the C version or the Java version?

                (And, do you realize that the new silly Wiki-based Grendel homepage no longer even includes a link to the source tree?)

                • jwz says:

                  Yes, last night. It wasn't as painful as I expected. I used the C version because, uh, I forgot the Java version even existed! Though I don't think Java would have been significantly easier to port than C. And though it pains me to say it, munging file contents is a lot easier in Perl than in either of those... Or maybe it's just that the worms have eaten into my brain. Hard to say.

                  • terryray says:

                    Of course it's a lot easier than C! Or Java. String munging sucks in both those languages.

                    What you are supposed to be feeling bad about is how inefficient your code is now. And if you got sucked into using perl's regexp support somewhere in your port, then you must really feel bad.

                    Anyway, if the worms have eaten your brain, the same worms ate mine. No more C for me. No more languages that won't garbage collect, no more languages that will let me overflow buffers. No more of that crap. I've done my time. It's time for the computers to take over that drudgery.

                  • strspn says:

                    If you ever get nostalgic, critcl is much easier than Perl XS.

                    Perhaps Perl 6 will have inline Tcl.
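Neither the Netscape parser nor Tcl's [clock scan] is reproduced here, but the general shape of a tolerant Date: parser is easy to sketch in Python: try the standard library's RFC 2822 parser first, then fall back to a list of strptime() formats. The `FALLBACK_FORMATS` list and the `parse_messy_date` name are hypothetical, and treating zone-less dates as UTC is an assumption you may want to change.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical fallbacks for dates that aren't RFC 822-shaped at all;
# extend this list as you run into new variants.
FALLBACK_FORMATS = (
    "%b %d, %y",       # "Dec 03, 95"
    "%b %d, %Y",       # "Dec 03, 1995"
    "%m/%d/%y",        # "12/03/95"
    "%m/%d/%Y",        # "12/03/1995"
    "%m/%d/%y %H:%M",  # "12/03/95 14:30"
)


def parse_messy_date(text):
    """Best-effort parse of a Date: header value into an aware datetime.

    Tries the RFC 2822 parser first, then FALLBACK_FORMATS. Dates with no
    zone are assumed to be UTC. Returns None if nothing matches.
    """
    if not text:
        return None
    text = " ".join(str(text).split())  # collapse folding and extra spaces
    dt = None
    try:
        dt = parsedate_to_datetime(text)
    except (TypeError, ValueError, IndexError, OverflowError):
        pass  # depending on the Python version, bad input raises TypeError or ValueError
    if dt is None:
        for fmt in FALLBACK_FORMATS:
            try:
                dt = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
    if dt is None:
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt
```

Returning None instead of guessing keeps a garbage header from silently producing a plausible-looking wrong date.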

  5. kfringe says:

    Which IMAP server was it? I know you and your POP fetish, but there are IMAP servers that are more user-friendly than you might think. Okay, IMAP is a desperately broken standard, but it is at least a standard that will make sure that you can try a lot of new email clients without having to go through this mail-import dance a hundred times.

    Some of them are even solid. There's no need for you to drink of the UW madness or the Cyrus kool aid to get decent results.

    • jwz says:

      Courier. See whining in previous post.

      • kfringe says:

        You poor bastard.

      • ewindisch says:

        The easiest thing for you to do is to use someone else's services: get a $5-10/mo hosting account with someone, use their IMAP server, and just point the MX record for jwz.org to them.

        By far, I believe the best IMAP server around is Cyrus. It doesn't use mbox or maildir storage, which can be annoying for some, but you can get at the messages via IMAP, which is all you should really need. We use Cyrus at GrokThis.net, my lil' web-hosting company.

        Considering your notoriety, I'm sure there would be at least one company out there that would be willing to barter hosting for advertising.

        • luminalflux says:

          And Cyrus is a couple of orders of magnitude worse to get running. And that's after the headache of getting SASL to work with kerberos/ldap/whatever has gone away.

          • ewindisch says:

            No problem for me to set up on Debian... the tough thing is getting multiple domain names to work, as there isn't much documentation online for handling it. To fix this I create usernames like jwz-jwz-org and jwz-dnalounge-com.

      • muerte says:

        Check out Dovecot. If you're just running mail for a couple of users (< 100) it will be more than sufficient. It's super simple to set up, too, and I'm with you: most IMAP servers are insanely complex.

      • ciphergoth says:

        When you complained about difficulty configuring your IMAP server, I was going to recommend Courier, because my IMAP server hasn't given me any configuration trouble at all. That should give you an idea of how little trouble it has been: I only just discovered my mistake, and it turns out I'm running not Courier but Dovecot.

        I don't remember doing anything at all to make this work besides installing it and putting the appropriate hole in my firewall.

    • jonabbey says:

      Why is IMAP a desperately broken standard?

      I've been reading Mark Crispin's rants on the subject of lousy IMAP clients over on comp.mail.imap for years, chiefly on how dumb it is to treat an IMAP server like it was a POP server, but I've not seen any really cogent criticisms of the protocol itself.

      • kfringe says:

        The gallery can provide all the details. My favorite piece of amazingly stupid brokenness is that IMAP has no concept of any operation called "move."

        As for Crispin's rants: they seem to boil down to "It doesn't do what you want because you won't do it my way. If you do it my way it still won't do what you want. You should stop wanting that."

        • jonabbey says:

          Mmm. It does have the ability to copy an arbitrary set of messages from one mailbox to another as an atomic operation, though. With that and an atomic DELETE command that worked on specified mail messages, I suppose you could get the same effect without too much work.

          Ah, but I see from the RFC that it has no atomic delete operation that applies only to designated messages. It just has the EXPUNGE business.

          How unfortunate.


          • brong says:

            There is UID EXPUNGE (from the UIDPLUS extension) if your server supports it, which is nice.

            (found this old thread searching for a perl module that can actually parse the myriad bogus Date: headers so I don't have to write my own)
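Without a MOVE command, a client has to emulate it with COPY, a \Deleted flag, and an expunge, which is exactly the non-atomic dance described above. Here is a minimal Python sketch using the standard `imaplib` module, assuming an already-authenticated connection with the source mailbox selected. `move_messages` is a hypothetical helper, and the UID EXPUNGE branch is only taken when the server advertises the UIDPLUS capability.

```python
import imaplib


def move_messages(conn: imaplib.IMAP4, uid_set: str, dest: str) -> None:
    """Emulate MOVE: COPY, flag the originals as Deleted, then expunge.

    Not atomic: if the connection dies between steps, the messages can
    end up in both mailboxes.
    """
    typ, data = conn.uid("COPY", uid_set, dest)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"COPY failed: {data!r}")
    typ, data = conn.uid("STORE", uid_set, "+FLAGS.SILENT", r"(\Deleted)")
    if typ != "OK":
        raise imaplib.IMAP4.error(f"STORE failed: {data!r}")
    if "UIDPLUS" in conn.capabilities:
        # UID EXPUNGE removes only the UIDs we just flagged.
        conn.uid("EXPUNGE", uid_set)
    else:
        # Plain EXPUNGE removes every \Deleted message in the selected
        # mailbox, including ones flagged by some other client.
        conn.expunge()
```

The fallback branch is the part that makes people grumpy: on a server without UIDPLUS there is no way to expunge just the messages you moved.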

  6. ezhar says:

    I'm using bincimap (http://www.bincimap.org/) as IMAP server and it does a good job of providing Mail.app with my emails. It's stable (on NetBSD 2.0), fast and simple to setup (even SSL was only a small pain in the ass).

  7. ivo says:

    I would definitely give IMAP another chance. Not only do you add separation between mail storage and GUI frontend (separation is good, it's the Unix-Way(tm)), it also allows you to possibly set up different GUIs in the future. I'm not plugging my own mailreader (pine) here, but having it run through IMAP is great, since it allows me to also use a webgui (when I'm on the road) as an interface. Using maildir as a storage format should make it fairly future-proof; I doubt a great standard like maildir will go away anytime soon.

    IMAP-with-maildir-over-postfix works very well for me. I do it on FreeBSD but I don't see a reason why OSX wouldn't be able to handle this.

    • cananian says:

      Heh. Pine's my mailreader, too -- but none of this "webgui" stuff. I just use a Java SSH client on the road. Yay for uniformity of interface...

  8. doctorow says:

    A lot of the time this problem is because the machine changes the "date sent" field in all your mail, but not "date received." Try View -> Columns -> Date Received.

  9. p3rlm0nk says:

    I had a relatively easy time setting up my home mail server with Dovecot as the IMAP/POP3 daemon, if you're in the mood to experiment. (The rest of the system was postfix. The hardest bits involved setting up encryption everywhere and things like smtp-auth.) Of course, I also probably don't have anywhere near the amount of mail you do, so... Basically I took one look at UW and some of the other "big-name" IMAP servers and threw up my hands rather quickly. I didn't see much point in migrating away from sendmail because of irrational complexity only to replace that irrational complexity on the mail access end.

    I think that mail is second only to backup-related issues on my list of systems administration irritants. And that's mainly because I really hate tape drives.

    • noweb4u says:

      I use dovecot and exim with exim's SMTP AUTH for relaying purposes (which is uncommenting a few lines and creating a password file, and restarting the daemon). I have multiple IMAP clients reading the same mbox files simultaneously with no configuration issues, and the mail stuff just works.

  10. mato says:

    I went through a similar scenario a couple of years back. I have ~13 years worth of email, which until then was all a somewhat disorganised mess of mbox files. FWIW, here are my experiences. Note that the IMAP part of this is only getting fully implemented now, since it's only recently that I have reliable server hardware with enough disk on RAID-1 available.

    Goals, in no particular order

    1st goal: Performance. Fast access to mail folders with 10000+ messages in them.

    2nd goal: Accessibility. I want to be able to access my mail from anywhere using a random 21st-century client du jour, *without* having to import anything into a proprietary format. I also need to be able to access the same mail folders in a sensible way from a shell on the server. This is for situations when I have to react to critical business mail, and I'm stuck on the far end of a GPRS link with a Psion 5. SSH and nail are your friends.

    3rd goal: Reliability. I don't want my mail folder getting corrupted because some message with a "weird" From accidentally gets imported into it. I don't want to deal with locking issues on mbox files. In the event of filesystem corruption on the server, I don't want to have to trawl through multi-megabyte files.

    4th goal: Automation. I want to be able to filter messages based on content, at the server rather than on the client.

    The solution

    All of these basically pointed to a combination of IMAP + "some random mail folder format" + "some random mail filter language". To cut a long story short, I tried and tested various combinations until I came up with the following, that works for me:

    Folder format: Maildir stored on a RAID-1 mirror, but using the Binc IMAP IMAPdir layout for folders, rather than the more common Maildir++. With IMAPdir my $HOME is laid out like so:

    Mail/                 (IMAP root)
      INBOX/ -> ../Maildir/

    The advantage of this format is that it makes it easy to access with command line tools (unlike Maildir++, which insists on sub-folders being stored as dotfiles, for chrissake), and it doesn't impose the silly "everything is a sub-folder of INBOX" restriction.

    Filter language: Maildrop. Procmail is for masochists.

    IMAP server: Binc IMAP. I would have liked to use dovecot, since during my testing it performed substantially faster than Binc IMAP on large folders, but it doesn't support the IMAPdir layout. I may yet decide to pay or otherwise encourage someone to develop that support.

    Hope this helps.


    • mato says:

      Oh, and I forgot. The other advantage to this setup is that I can use the obscenely fast mairix to index & search my mail.


      • crypticreign says:

        How did you go about converting mbox to maildir?

        • mato says:

          That was some time ago, but ISTR using mb2md. mbox2maildir has also been mentioned in this discussion but at the time it appeared less robust than mb2md. In my case I found some corruption (non-standard delimiters between messages, mostly) that mbox2maildir would just silently do bad things with, whilst mb2md would at least tell me about it so that I could hand-edit it out of the mboxes and re-run it.
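For anyone who'd rather not track down mb2md, Python's standard `mailbox` module can do a basic mbox-to-Maildir conversion. This is a sketch, not a replacement for mb2md: it assumes the mbox is well-formed, and unlike the mb2md behaviour described above it will not warn you about corrupt message delimiters. `mbox_to_maildir` is a hypothetical name.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    try:
        for msg in src:
            # Wrapping in MaildirMessage translates the mbox Status:/X-Status:
            # flags into Maildir state before the message is written.
            dst.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

With this conversion, messages carrying the mbox O (old) flag land in cur/ and the R (read) flag becomes the Maildir S (seen) flag; everything else goes to new/.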

  11. mark242 says:

    (the send-time is more interesting than the receive-time)

    Sure, but how often are those times separated by more than 5 minutes?

    • mato says:

      Often. Back in the days of 1992 (or in jwz's case, 1985 or so) we didn't have much of this snazzy broadband fibre-optic infrastructure people are used to these days. Links go down. Mail servers go down. Even today, if you are using something like fetchmail to a laptop, the receive-time in the message envelope will be when it arrived at its final destination, which may be hours after it was originally sent.

      • mark242 says:

        Yes, uucp did suck back then.

        Risking further flamage, I ask: what does a gap of a couple of hours really matter when you're dealing with 13-to-20-year-old mail?
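If you want an actual answer to the how-often question for your own archive, the gap can be measured per message by comparing Date: with the timestamp at the end of the topmost Received: header. A Python sketch, assuming that topmost header was added by your final server and ends in "; <RFC 2822 date>" (common, but not guaranteed); `delivery_delay_seconds` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def delivery_delay_seconds(msg):
    """Seconds between Date: and the newest Received: timestamp, or None.

    msg is an email.message.Message. The first Received: header is taken
    to be the last hop; its date is whatever follows the final ';'.
    """
    received = msg.get_all("Received") or []
    if not received or msg["Date"] is None:
        return None
    try:
        sent = parsedate_to_datetime(str(msg["Date"]))
        arrived = parsedate_to_datetime(str(received[0]).rsplit(";", 1)[-1].strip())
    except (TypeError, ValueError, IndexError, OverflowError):
        return None
    if sent.tzinfo is None or arrived.tzinfo is None:
        return None  # a zone-less date can't be compared reliably
    return (arrived - sent).total_seconds()
```

Negative results are worth a look too: they usually mean the sender's clock was wrong, which happened a lot in old mail.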

  12. jdquintana says:

    I know this doesn't fix your problem right now, but these are very handy mail.app scripts: http://homepage.mac.com/aamann/Mail_Scripts.html

  13. wilecoyote says:

    I haven't posted anything in these threads because I don't have any useful suggestions (in fact, I changed careers precisely to avoid dealing with all this shit), but the more I read about your interactions with technology, the more surprised I am that you haven't yet: 1) smashed all your computers with a huge hammer; 2) moved to Africa and adopted the hunter-gatherer lifestyle.

  14. otterley says:

    I gave up on IMAP because the server is just far too much of a pain in the ass to figure out how to get running.

    I thought it was as simple as downloading Courier IMAP and running

    rpmbuild -ta courier-imap-1.7.1.tar.bz2 # For RPM 4.1 (Red Hat 8.0+)

    and then installing the resulting RPM.

    It should place startup scripts in /etc/init.d like a good daemon RPM should.

  15. causticjb says:

    JWZ, let me again suggest using dovecot for IMAP. The documentation is clear, install is simple (it took me about 25 minutes to install and configure), and it will support Maildir, Maildir++, and mbox without any complaints.

    This isn't (as far as I know) a 200+ person install. For only 2 to 3 people, it's beautifully simple.

    Courier and Cyrus are good industrial strength applications, not really what you want to use if you're simply converting for yourself and a few others. My experience with both has been less than pleasant (as you could guess).

  16. zuvembi says:

    otterley: I gave up on IMAP because the server is just far too much of a pain in the ass to figure out how to get running.

    This reminds me of a snippet of conversation from a few years ago in the Linux kernel mailing list.

    One of the biggest threadlets came out of an example, posted by Linus, of installing Linux from an IDE PCMCIA CD-ROM. He described the procedure:

    Installing from a IDE CDROM is very reliable. But you have to know the magic incantations for it to work:

    [snipped details]

    Do you see? Even _I_ had trouble installing Linux, and I hung my machine about three times just because a standard install got confused.

    If I have trouble installing Linux, something is wrong. Very wrong.

    Mainly I think of it because, whenever I'm trying to install some piece of soft/hardware and it's completely failing, making me feel like a total moron, I can recall these examples. And I can say to myself, "I'm not an idiot, these things are entirely too difficult to install, what the fuck is wrong with the people who made it?"

    • otterley says:

      The real problem is that there are too many goddamned cooks in the kitchen. It's really no wonder anymore why djb describes and supports only one way of installing djbdns, qmail and associated tools, why it is invariant across all flavors of UNIX-like OSes, and why his license excludes the right to modified redistribution.

      If the user follows his included setup instructions step by step, the author ensures himself that when the user writes him for support, the author doesn't first have to ask which OS the user is running, because it won't matter. Assuming the user did not do anything unsupported and undocumented, the config files are guaranteed to be in their predetermined location, the binaries are guaranteed to be in their predetermined location, and the daemons are guaranteed to be started by one of two methods (init(8) on SYSV-like systems, and from /etc/rc.local on BSD systems). The author will be able to say things like, "do THIS" and THIS will not fail because something is in a different location on the particular user's system.

  17. Who needs imap when you've got maildir and unison?

  18. boy says:

    My existing mail machine is running Courier IMAP with the usual SSL nonsense and while it works great once your config is correct, it was such an annoying bitch of a process that it won't ever be done again. A standalone FreeBSD box needing to do mail service for customers at work led me to digging around for a new solution and Dovecot was the ultimate choice. It is at the complete opposite end from Courier on the Scale Of Pain. Semi-related, the Courier webmail app (which is what most of my family uses to deal with mail) is pretty much ass.

    In the future I may move everything over to Hula for my personal project. As much as I love the Postfix/Dovecot combo a complete system with calendaring support would be peachy.