Fetchmail and large emails

When I came to the office on Saturday, I wanted to spend some time on OpenCerti and complete the pending ActionScript work. But there was a problem waiting! The emails were stuck, and we were getting duplicate emails.

Let me give you some background. We have a set of Linux servers in the office. The magnet-i.com site is hosted in a NOC in the USA. We have a few POP accounts on the server and a catchall. The catchall captures emails for most of the users; the POP accounts are primarily used by people who travel and need roaming access. We download emails to the local server using Fetchmail. The internet connection is via cable modem and DSL.

This kind of problem has happened before. I remember from the dialup days: if we had a large number of emails and the internet connection dropped while fetchmail was running, it would start downloading all the emails again on the next connection, resulting in duplicate emails.

This time around, we have a major recruitment drive going on. The catchall account had grown to more than 85MB, and fetchmail was giving up on it. It was not only the size; the number of messages was large too. And the mails were not only for HR, but also for other people in the organization.

Vishal and I looked at the problem and first tried deleting unwanted mails via webmail. We got it down to 55MB, but this was still too big for fetchmail to handle on our internet connection.

The next step we generally take is to take the mbox file, bzip2 it, download it over the web, append it to the local mbox, and let the mails flood into the email client. This time we couldn’t simply append it, because the emails were destined for more than one user account. But this is the track we started on.

The mbox file contains all the mails in a single file. So we SSH’ed to the server and located the mailbox of the catchall account (something like /home/user/mail/domainname/emailaccount/ on cPanel servers). Running bzip2 on it got the file size down to 11MB. (OT: I wonder why they didn’t add compression to POP/SMTP. That would have saved a lot of traffic.)
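We ran the bzip2 command directly on the server. For anyone who would rather script this step, here is a minimal sketch using Python's standard bz2 module instead. The function names and file paths are made up for illustration, and it assumes the mailbox is not being written to while it is copied.

```python
import bz2
import shutil
from pathlib import Path
from typing import Optional


def compress_mbox(src: str, dest: Optional[str] = None) -> Path:
    """Write a bzip2-compressed copy of the mbox at src and return its path."""
    src_path = Path(src)
    # Default to "<name>.bz2" next to the original, like the bzip2 command does.
    dest_path = Path(dest) if dest else src_path.with_name(src_path.name + ".bz2")
    with src_path.open("rb") as fin, bz2.open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so a large mbox is not loaded into memory
    return dest_path


def decompress_mbox(src: str, dest: str) -> Path:
    """Unpack a .bz2 file produced above into dest and return its path."""
    dest_path = Path(dest)
    with bz2.open(src, "rb") as fin, dest_path.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest_path
```

Unlike the bzip2 command, this leaves the original file in place, which is probably what you want while the mails are still stuck on the server.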

Downloading this via the web (so that we could resume the download if it broke) did not work. Somehow our server did not allow direct file downloads. (I guess it was me who disabled hot links like this...) We didn’t have time to reconfigure the server, so we simply FTP’ed the file to one of the servers that we use for exchanging files with clients, downloaded the 11MB file to a local machine, and put it on the mail server.
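The point of a web download is that it can be resumed. Had the server allowed it, a resumable fetch could look roughly like the sketch below, which asks for the missing bytes with an HTTP Range header. The function names are hypothetical, and it assumes the web server honours Range requests; if it does not, the file is simply downloaded again from the start.

```python
import os
import urllib.request


def resume_request(url: str, dest: str) -> urllib.request.Request:
    """Build a request that asks only for the bytes missing from dest."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if offset:
        req.add_header("Range", f"bytes={offset}-")
    return req


def download(url: str, dest: str, chunk: int = 64 * 1024) -> None:
    """Download url to dest, continuing a partial file when the server allows it."""
    with urllib.request.urlopen(resume_request(url, dest)) as resp:
        # 206 means the server sent only the requested range, so append;
        # any other success status means a full body, so start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                block = resp.read(chunk)
                if not block:
                    break
                out.write(block)
```

If the connection drops, calling download() again with the same arguments picks up where the partial file ends.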

Now what?

Idea! What if we configure fetchmail to connect to the local POP server? We could create a new user and push the downloaded mails into the new user’s mbox file. When fetchmail connects to the local server to fetch emails, it will fetch emails from this new account. And it can deliver them to the local users as per the original configuration!

The idea was right, and we tested it with one or two messages in the mbox file. At first the mail bounced back, saying there was a “mail forwarding loop”. We removed the “no dns, aka magnet-i.com” part from the fetchmail config, and it started pushing the emails to the postmaster. At least it did not bounce back! A few trials later, after inspecting the logs and the postmaster emails, we figured it out: we need to keep the “aka magnet-i.com” line in, and have the mbox file from the server in place for the new account.

So we set this up, ran fetchmail, and it went chopping through the mbox like crazy, delivering emails to the local users. If we had used some other method (like downloading only new messages with UIDL), it would have taken five times longer to download the emails, and we would have had to monitor the process.

This gives us some best practices for handling a large pile of emails stuck on the server.

  • Bzip2 the mbox file and download it via web.
  • Unzip the mbox file on the local server and process it there.
  • If it’s a single email account, simply append the mbox file to the local mbox file and let the user download emails via her email client.
  • If there are multiple email accounts in the mbox file (the mbox of a catchall), create a new account on the local server and append the mbox to it. Add a rule in .fetchmailrc to use the local server as POP3 and fetch emails for the newly created account, and distribute the mail to “* here”.
  • You can use “fetchmail -v” for verbose output.
  • Monitor /var/log/maillog and /var/log/fetchmail.log for info. Also check the postmaster account (we run Postfix) for error reports.
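The “append the mbox file” step in the list above can be done by hand, but here is a minimal sketch of it using Python's standard mailbox module, which takes care of locking and of keeping the “From ” separator lines intact. The function name and paths are made up for illustration, and it assumes both files are in plain mbox format.

```python
import mailbox


def append_mbox(downloaded: str, local: str) -> int:
    """Append every message from the downloaded mbox to the local mbox.

    Returns the number of messages copied.
    """
    src = mailbox.mbox(downloaded, create=False)  # fail if the download is missing
    dst = mailbox.mbox(local)                     # created if it does not exist yet
    copied = 0
    dst.lock()  # keep the MTA and POP server away while we write
    try:
        for msg in src:
            dst.add(msg)
            copied += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return copied
```

For the catchall case, the local file would be the mbox of the newly created account, so that fetchmail can pick the messages up from the local POP server and distribute them.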

Good troubleshooting for the day! I am off to something bigger now!

Comment

  1. Also, ask fetchmail to fetch smaller batches of emails and delete them from the server immediately. Check fetchlimit in man fetchmail.

    Then run fetchmail manually, say 10 times, with a limit of 5 or 10.

  2. Hi, can you please tell me how to configure fetchmail to retrieve mail from a catchall account? Also, how does fetchmail deliver those mails to the local domain after fetching them? Can you help me at the earliest?

  3. Dear Nirav,

    The information you provided is very helpful.
    Still, I’m looking for a fix for duplicate mails.

    Let me explain the whole thing.

    I have some POP IDs and one catchall ID on my domain server.

    If anyone sends a mail to users who don’t have POP accounts and fall under the catchall ID, the catchall ID receives a copy of the same mail for each user listed in the “To” or “CC” address.

    At the office premises I have configured a Fedora Core 4 server with fetchmail.
    Now when fetchmail fetches mails from the catchall ID, it fetches each user’s copy, but since the “To” and “CC” addresses contain the other users’ IDs, it delivers the mail to those users every time.

    This is irritating, and I didn’t find anything suitable while googling.
    Can you guide me on this one?

    Regards,
    Deven.

  4. Hi Deven,

    That’s how it is supposed to work! We had the same problem, and there is no simple solution to it. We made a small script that would go through the emails and delete the duplicates off the server. Fetchmail was called after this.

    That too was giving problems with “mbox locked” kind of errors. So we finally created POP accounts for all valid users and knocked off the catchall!

    HTH.

    :Nirav

  5. Dear Nirav,

    We are running Sendmail and Fetchmail servers. We have around 500 users; approximately 400 of them sit at our head office, and the rest fetch their mail directly from our ISP.

    The problem we are facing is that, because of the limited number of TCP/IP connections allowed at our ISP’s server, we can’t fetch mail for 400 users at one time. So we have configured four fetchmailrc files, but this has resulted in delayed mail for users at our head office.

    Do you have any solutions for this?

    Thanks in Advance.

  6. Dear All
    I have solved the problem of duplicate mails from the catchall ID.

    I made the following entry in the /etc/procmailrc file:

    :0 Wh: msgid.log
    | formail -D 8192 msgid.cache

    This checks every Message-ID, and if a message appears to be a duplicate it simply discards it.
    So it delivers only one copy to each user.

    thanks

    Deven.
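
The duplicate-removal idea from the comments above (drop any message whose Message-ID has already been seen) can also be sketched with Python's standard mailbox module. This is not the script mentioned in comment 4, just a minimal illustration with a made-up function name. It assumes a plain mbox file and keeps messages that have no Message-ID header at all.

```python
import mailbox


def drop_duplicates(path: str) -> int:
    """Remove messages with a repeated Message-ID, keeping the first copy.

    Returns the number of messages removed.
    """
    box = mailbox.mbox(path, create=False)
    seen = set()
    removed = 0
    box.lock()  # avoid clashing with deliveries while the file is rewritten
    try:
        for key in list(box.keys()):
            msg_id = box[key].get("Message-ID")
            if msg_id is None:
                continue  # no ID to compare, so leave the message alone
            msg_id = msg_id.strip()
            if msg_id in seen:
                box.remove(key)
                removed += 1
            else:
                seen.add(msg_id)
        box.flush()  # rewrites the mbox without the removed messages
    finally:
        box.unlock()
        box.close()
    return removed
```

Run it on the catchall mbox before fetchmail picks the messages up, so that each mail is distributed only once.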