SpamAssassin being skipped?

Dear Lazyweb,

About once a day or so, a spam message lands in my Inbox because SpamAssassin never saw it. There are no X-Spam-* headers in it at all (normally those are present even for non-spam). The message was received from my ISP by the usual route. /var/log/maillog generally shows spamd activity immediately after each sendmail connection, but not in this case: sendmail is logged but not spamd. Messages received just a minute or two before and after show spamd being run as usual.

How is this even possible?

Update: Apparently SpamAssassin assumes that any message bigger than 250K is not spam. Wow, that's a good idea... It's possible that the fix is to add "-x -s 100000000" to the call to spamc in /etc/procmailrc.
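One way to test the size theory is to scan a mailbox for messages that carry no X-Spam-* headers and compare their sizes against the limit. The following is a minimal sketch, assuming the mail is stored in mbox format; the function names and the 250K constant are illustrative rather than taken from SpamAssassin, and the size is an approximation based on the re-serialized message.

```python
import mailbox

# Assumed limit, taken from the "250K" figure in the update above.
SIZE_LIMIT = 250 * 1024


def has_spam_headers(msg):
    """Return True if any header name starts with X-Spam- (case-insensitive)."""
    return any(name.lower().startswith("x-spam-") for name in msg.keys())


def unscanned(mbox_path):
    """Yield (subject, approximate_size_in_bytes) for messages lacking X-Spam-* headers."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            if not has_spam_headers(msg):
                yield msg.get("Subject", "(no subject)"), len(msg.as_bytes())
    finally:
        box.close()


def report(mbox_path):
    """Print one line per unscanned message, flagging those over the assumed limit."""
    for subject, size in unscanned(mbox_path):
        flag = "over limit" if size > SIZE_LIMIT else "under limit"
        print(f"{size:>10}  {flag:<11}  {subject}")
```

Calling `report()` with the path to an mbox file lists the messages that skipped scanning. If every one of them is flagged as over the limit, the size cutoff is the likely culprit.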

22 Responses:

  1. ianw says:

    A similar situation happened to me when I was running Postfix+amavisd+spamassassin+clamav on a VPS. I had them bump up the process limits, and that fixed the problem completely.

    Maybe spamd is being a resource hog?

    • jwz says:

      It's running on my home mail server, though, which is completely idle all the time, as far as I can tell. There's nothing in the logs to indicate that load got high (sendmail will log that) and it has barely touched its swap partition. The "spamd" process was started on feb 22, and isn't using much memory right now.

      • ianw says:

        Are you getting a lot of mail messages when it craps out, or is it just "forgetting" to run the messages through spamd?

      • 'The "spamd" process was started on feb 22, and isn't using much memory right now.' -- this doesn't tell you much: it forks for each connection, and it'll only really start to eat memory once it starts running its tests.

  2. fdoml says:

    Maybe the message is too large?

  3. brad says:

    Same thing happens with me with Postfix+amavisd+spamassassin+clamav, like ianw said. No clue why.

  4. benc says:

    Spamc will not pass a message that's bigger than 250k to spamd by default. It'll also just return the message instead of bombing out with an error if it can't connect to spamd, for whatever reason...

    It only returns an error if it connects OK, *then* fails to scan.
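The behaviour described in this thread can be summarised as a small decision function. This is a simplified model for illustration only, not spamc's actual code: the names are hypothetical, the 250,000-byte default is the figure quoted here, and whether the size comparison is strict is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    SCANNED = "passed to spamd; X-Spam-* headers added"
    PASSED_UNSCANNED = "returned unmodified; no headers"
    ERROR = "spamc reports an error"


def spamc_outcome(size, max_size=250_000, spamd_reachable=True,
                  scan_ok=True, safe_fallback=True):
    """Model what happens to one message, per the descriptions in this thread."""
    # Oversized messages are never handed to spamd.
    if size > max_size:
        return Outcome.PASSED_UNSCANNED
    # With safe fallback (the default), a failed connection is silent.
    if not spamd_reachable:
        return Outcome.PASSED_UNSCANNED if safe_fallback else Outcome.ERROR
    # Connected fine, but the scan itself failed.
    if not scan_ok:
        return Outcome.ERROR
    return Outcome.SCANNED
```

In this model, `safe_fallback=False` stands in for the `-x` flag and `max_size` for `-s`. Two of the four paths deliver a message with no headers at all.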

    • jwz says:

      Geez, why would it not even log anything when it does that? That's nuts!

      Where do I change this? I don't see anything about it in the spamd man page...

      • hasturkun says:

        It isn't in spamd, it's in spamc, set to 250k by default

        • jwz says:

          Oh, ok. So do I add this to /etc/procmailrc, or is there an /etc/sysconfig file, or what? I only barely understand how this junk is supposed to be set up.

          • spamc -s SIZE; SIZE is in bytes. There might be a way of changing the default, but I doubt it -- if in doubt, curse the developers^W^W^W recompile.

            • jmason says:

              I'm one of the developers; curse away. ;)

              Re 'recognising that you shouldn't bother trying to run text-based regular expressions over large binary attachments was just too hard' -- we do indeed not run text-based regexps over large binary attachments. That's not the problem -- the problem is large chunks of text/plain or text/html, especially *hostile* large chunks.

              The whole operation of SpamAssassin is based around the probabilities of various traits appearing in spam. Spam messages over 250KB are (or at least were) very rare.

              At the same time, the perl regexp engine can consume exponential quantities of memory when matching certain backtracking regexps against multi-megabyte message strings -- especially strings that have been crafted to do just that. (See Scott Crosby's research into algorithmic complexity attacks, for example.) This meant that we needed to avoid passing any and all messages to the scanner, since a massive message becomes a trivial way to DOS the server through memory consumption.

              So, we picked a reasonable point at which to limit scans, based on prevalence of spam message sizes.

              Having said that -- that was quite a while ago, and it's time to reevaluate things. Bandwidth has been increasing steadily since then, and larger spam is becoming more common.

              Me, I use 600KB these days, but I'd say going up to 1MB would be fine. Here's what I use in ~/.procmailrc:

              :0fw
              * < 600000
              | spamc
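The backtracking problem mentioned above is easy to reproduce on a small scale. The sketch below uses Python's `re` module rather than Perl, so it shows running time instead of memory and is only an analogy for what the Perl engine does; the pattern is a textbook pathological case, not one of SpamAssassin's rules.

```python
import re
import time

# Nested quantifiers force the engine to try every way of splitting the
# run of "a" characters before it can conclude that there is no match.
PATHOLOGICAL = re.compile(r"(a+)+$")


def time_match(n):
    """Match against n 'a' characters followed by a 'b' that defeats the pattern."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = PATHOLOGICAL.match(subject)
    return result, time.perf_counter() - start


# Inputs are kept tiny on purpose; each extra character roughly doubles the work.
for n in (10, 14, 18):
    result, elapsed = time_match(n)
    print(f"n={n:2d}  matched={result is not None}  seconds={elapsed:.4f}")
```

The inputs here are under twenty characters. A hostile multi-megabyte text part gives a pattern like this far more room, which is the reason for capping the size of what gets scanned.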

              • jwz says:

                The most frustrating part about this was that there was no logging of any kind -- if something had shown up in maillog or in an X- header saying "message too big, unchecked" I'd have been able to figure out what was going wrong on my own...

                • jmason says:

                  I think that may have been because it was part of the procmail recipe. I think spamc will indeed log a message -- if it gets to spamc; but the "< 250000" part of the normal procmailrc recipe means that it never gets past procmail.

                  • jwz says:

                    Hmm, well what I have now is this -- does this do what I want? (I'm blissfully ignorant of procmail syntax...)

                      % cat /etc/procmailrc
                      :0fw
                      | /usr/bin/spamc -x -s 100000000

                  • jmason says:

                    that should work fine, syntactically at least. however 100MB -- -s 100000000 -- is gigantic! that gives a potential attacker plenty of room to perform nasties.

                    If you really want to filter pretty much all spam messages, without exposing your server to the danger of resource exhaustion, I'd suggest 3MB -- "-s 3000000" should do the trick pretty well. Messages over that size will be delivered as nonspam, which works OK, since I've never seen a spammer send out mails that big. (the scale of spamming requires messages that are as short as possible.)

  5. hasturkun says:

    You're probably hitting either the spamc timeout or just an error, which, combined with spamc's default safe fallback, lets the message go through. You can disable the fallback with the -x flag.

  6. yesthattom says:

    In addition to the size issue, is there a chance that you have an additional postfix service running on a port other than 25 that lets spammers have a non-spamassassin vector?

  7. netik says:

    Aside from what's already been said, if the message size is too big, spamd will ignore it.

    If you're running amavis, there's a config option that says "Tag if over" -- the calls to Mail::SpamAssassin will only add X-Spam-* headers if the spam score is greater than that value.

    It's in this section:

    $sa_tag_level_deflt = 1.0; # add spam info headers if at, or above that level
    $sa_tag2_level_deflt = 4.65; # add 'spam detected' headers at that level
    $sa_kill_level_deflt = 6.31; # triggers spam evasive actions
    $sa_dsn_cutoff_level = 9; # spam level beyond which a DSN is not sent
    # $sa_quarantine_cutoff_level = 20; # spam level beyond which quarantine is off
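To make the effect of these thresholds concrete, here is a rough model of how a score maps to the actions named in the comments above. It is an illustration, not amavis's implementation: the function and dictionary names are hypothetical, and the exact comparison operators are assumptions read off the comment wording.

```python
# Threshold values copied from the config excerpt above.
LEVELS = {
    "tag": 1.0,
    "tag2": 4.65,
    "kill": 6.31,
    "dsn_cutoff": 9.0,
}


def actions_for(score, levels=LEVELS):
    """Return the list of actions a given spam score would trigger in this model."""
    actions = []
    if score >= levels["tag"]:
        actions.append("add spam info headers")
    if score >= levels["tag2"]:
        actions.append("add 'spam detected' headers")
    if score >= levels["kill"]:
        actions.append("take evasive action")
    if score > levels["dsn_cutoff"]:
        actions.append("suppress DSN")
    return actions


for score in (0.5, 1.0, 5.0, 10.0):
    print(score, actions_for(score))
```

A message scoring 0.5 triggers nothing in this model, so it would arrive with no X-Spam-* headers even though it was scanned.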

  8. giantlaser says:

    Others have answered your question, so I'll only address the reason there is a 250K limit by default. Try sending yourself a copy of a spam you've received and attach a 3 MB file. Watch spamc/d for memory usage, and the time it takes to actually deliver the message. That might be acceptable on your system, but do that on a multiuser system with a busy mail queue and you'll regret it.

    • evan says:

      Another way of looking at the problem is that spamassassin is back from the days when the default policy was to pass, and it was trying to cut out the spam. These days, I think of email handling as trash by default with filtering software attempting to find signal in the noise.

      • jwz says:

        It's also apparently from the days where recognising that you shouldn't bother trying to run text-based regular expressions over large binary attachments was just too hard.

        ("Now you have two problems.")