For searching: Split E-Mail mbox with formail limiting output mbox size.

This looked intriguing and I didn't quickly find the answer on the web.
I created maxmail.sh containing this script:

#! /bin/sh -f

prefix=splitmbox
# 10MB
maxsize=10000000

# Check that the count file exists. Make one if it doesn't.
if [ ! -f count ] ; then
     echo 1 > count
fi

# Set a variable to the contents of the count file.
count=$(cat count)

# Create the current splitmbox file if it doesn't exist.
if [ ! -f "$prefix.$count" ] ; then
     touch "$prefix.$count"
fi

# Check the size of that mbox (stat -c is the GNU coreutils syntax).
size=$(stat -c %s "$prefix.$count")

# If it's greater than your max, then increment count.
if [ "$size" -gt "$maxsize" ] ; then
     count=$((count + 1))
     echo "$count" > count
     echo "Splitting to $prefix.$count"
fi

# Append whatever came into this script to the splitmbox file.
cat >> "$prefix.$count"
#-----------------End Script

Then I ran
formail -s ./maxmail.sh < mbox


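The script checks the size before it appends, so a file is only closed
once it is already past the max. If you have some really large individual
mails, they will stay together and may make your split mbox noticeably
bigger than your max.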
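
If the overshoot matters, one possible variant (a rough sketch, not what I
actually ran) is to buffer each incoming message in a temp file, so its size
is known, and roll over to the next file before appending. The function name
append_capped is made up for this example; prefix, maxsize and the count file
are the same as in maxmail.sh.

```shell
#!/bin/sh
# Hypothetical variant of maxmail.sh: roll over *before* appending when the
# incoming message would push the current file past the limit.
prefix=${prefix:-splitmbox}
maxsize=${maxsize:-10000000}

append_capped() {
    # Buffer the message from stdin so its size is known before writing it.
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    # $(( ... )) strips the padding some wc implementations add.
    msgsize=$(($(wc -c < "$tmp")))

    [ -f count ] || echo 1 > count
    count=$(cat count)
    cursize=0
    if [ -f "$prefix.$count" ]; then
        cursize=$(($(wc -c < "$prefix.$count")))
    fi

    # Move on only if the current file is non-empty and would overflow.
    if [ "$cursize" -gt 0 ] && [ $((cursize + msgsize)) -gt "$maxsize" ]; then
        count=$((count + 1))
        echo "$count" > count
    fi

    cat "$tmp" >> "$prefix.$count"
    rm -f "$tmp"
}
```

To use it as the formail -s target, the script would end with a bare
append_capped line. A single message larger than maxsize still ends up in a
file of its own that exceeds the max; nothing short of splitting the message
can avoid that.
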
My result:
Splitting to splitmbox.2
Splitting to splitmbox.3
Splitting to splitmbox.4
Splitting to splitmbox.5

gsker@veeta:~/mail> ls -l splitmbox.*
-rw-rw-r-- 1 gsker gsker 10006881 2010-10-14 19:44 splitmbox.1
-rw-rw-r-- 1 gsker gsker 11950245 2010-10-14 19:45 splitmbox.2
-rw-rw-r-- 1 gsker gsker 12995777 2010-10-14 19:45 splitmbox.3
-rw-rw-r-- 1 gsker gsker 10063591 2010-10-14 19:45 splitmbox.4
-rw-rw-r-- 1 gsker gsker  4328906 2010-10-14 19:45 splitmbox.5

gsker@veeta:~/mail> wc -l splitmbox.*
   165330 splitmbox.1
   210013 splitmbox.2
   200543 splitmbox.3
   171013 splitmbox.4
    90904 splitmbox.5
   837803 total

gsker@veeta:~/mail> wc -l mbox
837803 mbox
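
Matching line counts are a good sign but not proof that the pieces are
identical to the original. A stricter check, sketched here under the
assumption that the pieces are numbered 1, 2, 3, ... with no gaps, is to
concatenate them in numeric order and compare the bytes. The function name
verify_split is invented for this example. Numeric order matters because the
shell sorts splitmbox.10 before splitmbox.2 when expanding splitmbox.*.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if prefix.1, prefix.2, ... concatenated
# in numeric order are byte-for-byte identical to the original mbox.
verify_split() {
    vs_orig=$1
    vs_prefix=$2
    vs_n=1
    while [ -f "$vs_prefix.$vs_n" ]; do
        cat "$vs_prefix.$vs_n"
        vs_n=$((vs_n + 1))
    done | cmp -s - "$vs_orig"   # exit status is cmp's: 0 means identical
}
```

Usage would be something like: verify_split mbox splitmbox && echo identical
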




Cool!



-- 
Gerry Skerbitz
gsker at skerbitz.org
-------------- next part --------------
On Thu, Oct 14, 2010 at 04:35:10PM -0500, Mike Miller wrote:
> The csplit coreutil program lets me split a file into sections based on 
> some delimiter.  What I really want to do is split a file into sections 
> based on a delimiter but forcing those sections to be at least b bytes in 
> size, even if that means including multiple delimiters in most or all 
> sections.
> 
> An example would be that I have an mbox file (email messages) of 300 MB 
> and containing 50,000 messages and I want to break it into 10 sections of 
> at least 30 MB each (the tenth section would have to be a little smaller 
> because there wouldn't be enough file left).
> 
> I can do stuff like this to divide the file "mbox" into individual email 
> messages, one per file...
> 
> csplit -ksz mbox '/^From /' {*}

I don't have an answer to your general question, but in this particular
instance csplit would not necessarily do what you want, as there might
be a paragraph starting with 'From' at the beginning of the line
(which vim e-mail syntax highlighting merrily bolds and colors) that
would result in a message split in two.  Use 'formail' for this kind
of processing.
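
To see how many lines a bare '/^From /' pattern would wrongly treat as
message starts, here is a rough sketch that counts only the "From " lines
sitting at the top of the file or right after an empty line, which is where
mbox separators normally appear. It is only a heuristic and no substitute for
formail; the function name count_messages is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count likely mbox message separators, i.e. lines
# starting with "From " that are the first line or follow an empty line.
count_messages() {
    awk 'BEGIN { prev_blank = 1 }            # treat start of file as a break
         /^From / && prev_blank { n++ }      # separator candidate
         { prev_blank = ($0 == "") }         # remember if this line was empty
         END { print n + 0 }' "$1"
}
```

Comparing its result with grep -c '^From ' mbox gives a quick idea of how
many body lines would have caused a bad split.
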

> ...but I can't figure out how to make the files bigger so that they 
> include multiple delimiters.
> 
> It seems like there ought to be a way to do this.

You could be 'catting' together bunches of smaller files 8^)

Cheers,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163