On Thu, Oct 14, 2010 at 04:35:10PM -0500, Mike Miller wrote:
> The csplit coreutil program lets me split a file into sections based on 
> some delimiter.  What I really want to do is split a file into sections 
> based on a delimiter but forcing those sections to be at least b bytes in 
> size, even if that means including multiple delimiters in most or all 
> sections.
> An example would be that I have an mbox file (email messages) of 300 MB 
> and containing 50,000 messages and I want to break it into 10 sections of 
> at least 30 MB each (the tenth section would have to be a little smaller 
> because there wouldn't be enough file left).
> I can do stuff like this to divide the file "mbox" into individual email 
> messages, one per file...
> csplit -ksz mbox '/^From /' {*}

I don't have an answer to your general question, but in this particular
instance csplit would not necessarily do what you want: a message body
might contain a line starting with 'From ' (which vim's e-mail syntax
highlighting merrily bolds and colors), and csplit would split that
message in two there.  Use 'formail' for this kind of processing.

> ...but I can't figure out how to make the files bigger so that they 
> include multiple delimiters.
> It seems like there ought to be a way to do this.

You could be 'catting' together bunches of smaller files 8^)
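
A rough sketch of that idea, in case it helps: concatenate the
per-message files in order and start a new output file each time the
current one has reached the minimum size.  The function name
merge_chunks, the msg. and part. prefixes, and the 30000000 figure are
all made up for illustration, and the 'From ' caveat above still applies
to whatever does the initial split.

```shell
# merge_chunks MIN_BYTES OUT_PREFIX FILE...
# Hypothetical helper: appends FILEs in the order given to OUT_PREFIX00,
# OUT_PREFIX01, ... and moves to the next output file once the current
# one holds at least MIN_BYTES.  The last output may be smaller.
merge_chunks() {
    min=$1
    prefix=$2
    shift 2
    n=0
    size=0
    for f in "$@"; do
        out=$(printf '%s%02d' "$prefix" "$n")
        # Start each output file empty so stale data is never appended to.
        if [ "$size" -eq 0 ]; then
            : > "$out"
        fi
        cat "$f" >> "$out"
        size=$((size + $(wc -c < "$f")))
        if [ "$size" -ge "$min" ]; then
            n=$((n + 1))
            size=0
        fi
    done
}

# Possible usage (assumed file names; -n 5 keeps the glob in numeric order):
#   csplit -ksz -n 5 -f msg. mbox '/^From /' '{*}'
#   merge_chunks 30000000 part. msg.*
```

Since every piece is appended whole, each part. file ends on a message
boundary and only the last one can fall short of the minimum.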


Bruce Schneier expects the Spanish Inquisition.