Scripting Gmail download and saving attachments with fetchmail, procmail, and munpack


One of our partners is having difficulty automating email attachment processing. Basically, the attachments on each email, hosted on a secured POP3 server, must be downloaded and saved to a certain folder on a server. In addition, all attachments must be converted to TIFF. This can be done easily with a cronjob, fetchmail, procmail, and munpack. What you need:

  • a Linux install. I’m using Ubuntu 14.04 for this, since CentOS has an outdated and buggy mpack package
  • fetchmail to fetch the email from the pop3 server
  • procmail. Since munpack works with Maildir format, we will be using procmail to transfer the email to Maildir
  • mpack, for unpacking the attachments
  • imagemagick, for converting jpegs and pngs to tiff. It can also do pdf, but the result is horrible at best
  • ghostscript for pdf to tiff conversion
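
To see which of these tools are already present on the box, a quick check along these lines can help. This is only a minimal sketch: missing_tools is a hypothetical helper name, and the command names assume that munpack comes from the mpack package, convert from imagemagick, and gs from ghostscript.

```shell
#!/bin/sh
# Print the name of every command passed as an argument that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands used in this post; anything printed here still needs installing.
missing_tools fetchmail procmail munpack convert gs
```

If nothing is printed, all five commands were found.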

Prepare your Linux box properly; a vanilla install without a desktop environment should suffice. Install the required packages:

sudo apt-get install fetchmail procmail mpack imagemagick

ghostscript should be installed by default on your distro. If you’re using a distribution other than Ubuntu and its derivatives, please make sure that you’re using the latest version of mpack. The next steps should be performed with a non-root account. Let’s start with fetchmail. Create the fetchmail config file by doing

 nano ~/.fetchmailrc

For this purpose I’m using Gmail to simulate POP3 server access:

poll pop.gmail.com
protocol pop3
timeout 300
port 995
username "sovereign.khan@gmail.com" password "IsThisSparta?"
keep
mimedecode
ssl
sslcertck
sslproto TLS1
mda "/usr/bin/procmail -m '/home/ikhsan/.procmailrc'"

Replace the Gmail POP server address, username, and password with yours. Since the file contains the mailbox password in clear text, it must be secured. Do

chmod 700 ~/.fetchmailrc

…so that only you can open and see the file. Note that the last line of the config file contains a hook that calls procmail with its corresponding config file. Next, set up procmail. Create its config file where your fetchmail hook can find it.

nano ~/.procmailrc

The file should look like this

LOGFILE=/home/ikhsan/.procmail.log
MAILDIR=/home/ikhsan/
VERBOSE=on

:0
Maildir/

This will set /home/ikhsan/Maildir as the mail directory, and new mails will be delivered to its new subfolder. Now let’s create the folders that we will use to process the attachments:
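
A Maildir holds each message as a separate file, with fresh deliveries waiting in its new subfolder, so counting the files there is a quick way to confirm that fetchmail and procmail are actually delivering. A minimal sketch, assuming the /home/ikhsan/Maildir location from the config above; count_new_mail is a hypothetical helper name:

```shell
#!/bin/sh
# Print how many messages are waiting in the new/ subfolder of a Maildir.
count_new_mail() {
    count=0
    for msg in "$1"/new/*; do
        # An unmatched glob stays literal, so only count real files.
        if [ -f "$msg" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

count_new_mail "$HOME/Maildir"
```

Running fetchmail once by hand and then calling this should show the count go up if there was unread mail on the server.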

mkdir -p ~/Maildir/process/landing
mkdir -p ~/Maildir/process/extract
mkdir -p ~/Maildir/process/store
mkdir -p ~/Maildir/process/archive

A bit of explanation for the folders:

  • landing is where we first move new mails from procmail’s Maildir prior to extracting the attachments
  • extract is where we will perform attachment extraction
  • store is the final destination of the attachments
  • archive is where the mail files are stored after the process is done. If you want to reprocess certain files, just move them back to landing
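
Moving a mail back from archive to landing for reprocessing could be wrapped in a small helper like the one below. This is only a sketch under the folder layout above; requeue_mail is a hypothetical name, and the file name you pass is whatever name the mail has inside archive.

```shell
#!/bin/sh
# Move one archived message back to landing so the next run picks it up again.
# $1 = path to the Maildir, $2 = file name of the message inside process/archive
requeue_mail() {
    src="$1/process/archive/$2"
    if [ ! -f "$src" ]; then
        echo "no such archived mail: $2" >&2
        return 1
    fi
    mv "$src" "$1/process/landing/"
}
```

Usage would look like requeue_mail ~/Maildir FILENAME. If a folder for that mail already exists in store from the earlier run, it probably needs to be removed or renamed first, because mv will not replace a non-empty folder.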

And now, for the script. Create it wherever you like; I personally keep all of my scripts in one place

nano ~/scripts/getmail.sh

The script is very simple and should be self-explanatory:

#!/bin/bash
# Fetch new mail, unpack the attachments, and convert them to tiff.
DIR=/home/ikhsan/Maildir
LOG="$DIR/getmail.log"

# Print a random 5-character suffix so converted files get distinct names.
rand5() { tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 5; }

# Let unmatched globs expand to nothing, so empty folders are skipped quietly.
shopt -s nullglob

date +%r-%-d/%-m/%-y >> "$LOG"
fetchmail

# Queue the newly delivered mails for processing.
for m in "$DIR"/new/*
do
    mv "$m" "$DIR/process/landing/"
done

cd "$DIR/process/landing/" || exit 1
for i in *
do
    work="$DIR/process/extract/$i"
    echo "processing $i" >> "$LOG"
    mkdir -p "$work"
    cp "$i" "$work/"
    echo "saving backup $i to archive" >> "$LOG"
    mv "$i" "$DIR/process/archive/"
    echo "unpacking $i" >> "$LOG"
    munpack -C "$work" -q "$work/$i"

    echo "converting pdf.." >> "$LOG"
    for x in "$work"/*.pdf
    do
        gs -sDEVICE=tiff24nc -dNOPAUSE -r300x300 -sOutputFile="$work/$i-$(rand5).tiff" -- "$x"
        rm "$x"
    done

    echo "next, the jpegs.." >> "$LOG"
    for y in "$work"/*.jpg
    do
        convert "$y" "$work/$i-$(rand5).tiff"
        rm "$y"
    done

    echo "last, the pngs.." >> "$LOG"
    for z in "$work"/*.png
    do
        convert "$z" "$work/$i-$(rand5).tiff"
        rm "$z"
    done
done

echo "finishing.." >> "$LOG"
# Move every processed folder to its final destination.
for d in "$DIR"/process/extract/*
do
    mv "$d" "$DIR/process/store/"
done
shopt -u nullglob
echo "done!" >> "$LOG"

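Each set of attachments will be kept in a separate folder named after the mail file it came from; Maildir file names typically start with the delivery timestamp, so the folders are effectively tagged with the time the mail arrived. Ghostscript is used to convert pdf to tiff, while ImageMagick’s convert is used for jpeg and png conversions. Make the script executable with chmod +x ~/scripts/getmail.sh, then call it with a cronjob. The script will not preserve the names of the attachments.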
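
If keeping the original attachment names matters, one possible variation is to derive the output name from the attachment itself instead of a random suffix. The helper below is a hypothetical sketch of that idea, not what the script above does:

```shell
#!/bin/sh
# Print the output path for an attachment: same folder and base name, .tiff extension.
tiff_name() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    # ${base%.*} strips the last extension, if there is one.
    printf '%s/%s.tiff\n' "$dir" "${base%.*}"
}
```

Inside the conversion loops this would be used as convert "$y" "$(tiff_name "$y")". The trade-off is that two attachments such as invoice.pdf and invoice.jpg in the same mail would map to the same output name, which is the kind of collision the random suffix avoids.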

crontab -e

To set the script to check for new mail every minute, add

*/1     *       *       *       * /home/ikhsan/scripts/getmail.sh
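
With a one-minute schedule and a fetchmail timeout of 300 seconds, a slow run could in principle still be going when cron starts the next one. One way to guard against that is a simple lock around the work. The sketch below is an assumption on my part rather than something the original script does; run_locked and the lock path are hypothetical names.

```shell
#!/bin/sh
# Run a command only if no other run holds the lock folder.
# mkdir either creates the folder or fails, so only one caller gets the lock.
# $1 = lock folder path, remaining arguments = the command to run
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run is still active, skipping" >&2
        return 0
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}
```

For example, the cron entry could call a small wrapper script that sources this helper and runs run_locked /tmp/getmail.lock /home/ikhsan/scripts/getmail.sh. Note that if a run is killed before it reaches rmdir, the lock folder stays behind and has to be removed by hand.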

Please be conservative with the schedule and consult the mail server admin, since some servers might treat frequent access at short intervals as an attempted Denial of Service attack. ..And we’re done 🙂


		
4 Comments

  1. David says:

    Thank you for providing this! Very helpful. This was the best source of info that I found that explains bringing fetchmail, procmail, and munpack together to retrieve/save emails in scripting.

    1. ikhsan says:

      You’re welcome 😀

  2. somphil says:

    Cool,.. Pak Ikhsan still enjoys tinkering, I see,…
    I once had a case like this too: download the attachments, then upload them to FTP. Right now I use a PowerShell script with EWS, because the POP service on the Exchange server isn't turned on, so the script still runs on Windows.
    If you have ever tried EWS on Linux, please share,.. it would be really helpful 🙂
    Thank you

    1. ikhsan says:

      Hi Phil 😀

      Unfortunately I haven't, but basically the method above can still be used by adding a bridge to OWA/EWS in front of it. For that you can use DavMail, which exposes SMTP and IMAP.

      So it would be set up like this:
      OWA-[EWS]->Davmail-[IMAP]->Fetchmail+Procmail-[MailDir]->MPack. After that, just add a script for ftp/sftp

      Some reading on DavMail:
      https://www.digitalocean.com/community/tutorials/how-to-setup-a-davmail-exchange-gateway-on-a-debian-7-vps

      If I get the itch, I'll give it a try later 😀
