Regarding the current list server:
    * PPro 200 w/ 128MB RAM; 9GB and 18GB SCSI disks

Paraphrasing Carl on the reasons it's getting overwhelmed:
    * mailman stores its data in huge mbox files
    * mbox files must be locked for each read & write. 
    * large # of requests for the LKML archives: spiders, etc.

Mailman also requires that each email to be "threaded" by its pipermail
module be submitted one at a time through the global mailman wrapper
script.  There is no well-defined separation between the archive
function of the list and the distribution function of the list.  The
libraries may support it with some hacking, but it's not there yet.

Paraphrasing Carl's solutions:
    * rewrite mailman to use an SQL backend. 
    * hardware upgrades: memory, extra harddrives, etc
    * new box for TCLUG

SQL is overkill.  It is also slower than standard file transfers.  The
Linux kernel is much better at pushing out static pages than dynamic
content, so use that to your advantage.  That's the reason mailman is
web-ifying the list archive in the first place.

Regarding spiders, you can configure Apache to send a "No Archive"
header when fulfilling requests for *.mbox files.  Allow the spiders to
hit the individual HTML files generated instead.  This is how most
people would prefer to view web-based archives anyway.  Provide links on
the site to download the full mbox files.  (I'd be interested in seeing
statistics on how many mbox files are searched as opposed to simply
being downloaded.)  Split the list archive into separate mbox files
based on size or date, and gzip them.  The mbox files never need to be
accessed directly on the server.  Let users download them to their mail
directories and browse them there.  Otherwise, make them use the web
front-end.
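
A rough sketch of the split-by-date-and-gzip idea, assuming Python and
its standard mailbox module.  The function names and the one-file-per-
month layout are made up for illustration; splitting by size would work
just as well.  Messages without a usable Date header land in a separate
"undated" file.

```python
import contextlib
import gzip
import mailbox
import os
from email.utils import parsedate_to_datetime


def month_of(msg):
    """Return 'YYYY-MM' from the Date header, or 'undated' if unusable."""
    raw = msg["Date"]
    if raw is None:
        return "undated"
    try:
        when = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return "undated"
    return f"{when.year:04d}-{when.month:02d}"


def split_mbox_by_month(src_path, out_dir):
    """Write one gzip-compressed mbox per month; return the months seen."""
    os.makedirs(out_dir, exist_ok=True)
    box = mailbox.mbox(src_path, create=False)
    outputs = {}
    with contextlib.ExitStack() as stack:
        stack.callback(box.close)
        for key in box.iterkeys():
            month = month_of(box[key])
            if month not in outputs:
                path = os.path.join(out_dir, f"{month}.mbox.gz")
                outputs[month] = stack.enter_context(gzip.open(path, "wb"))
            # Raw bytes including the leading "From " separator line.
            raw = box.get_bytes(key, from_=True)
            # Normalize to exactly one blank line between messages.
            outputs[month].write(raw.rstrip(b"\n") + b"\n\n")
    return sorted(outputs)
```

Run from cron on whatever box holds the master mbox, something like
split_mbox_by_month("lkml.mbox", "public_html/mbox") would leave a
directory of small static files for the web server to hand out.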

As far as webifying the archive goes: drop pipermail.  Don't even bother
with it.  Use mhonarc.  With this very nice and configurable tool you
can offload the html-izing of the archive onto another machine.  All you
need is the mbox, perl, and mhonarc.  Use rsync or ftp to push the
html-ized archive to your website.
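
To make the convert-then-push step concrete, here is a minimal sketch in
Python that only assembles and (optionally) runs the two commands.  It
assumes mhonarc and rsync are on the PATH; the file names and the rsync
target are hypothetical.  -outdir is mhonarc's option for choosing the
output directory, and -az are ordinary rsync flags (archive mode plus
compression).

```python
import subprocess


def build_publish_commands(mbox_path, html_dir, rsync_target):
    """Return the command lines: convert with mhonarc, then push with rsync."""
    convert = ["mhonarc", "-outdir", html_dir, mbox_path]
    # Trailing slash: copy the directory's contents, not the directory itself.
    push = ["rsync", "-az", html_dir.rstrip("/") + "/", rsync_target]
    return [convert, push]


def publish(mbox_path, html_dir, rsync_target, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    commands = build_publish_commands(mbox_path, html_dir, rsync_target)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling publish("lkml.mbox", "html", "www.example.org:/var/www/lkml/")
just prints what would be run; pass dry_run=False to execute it.  For
incremental updates, mhonarc also documents an -add option, which is
left out here to keep the sketch small.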

Let the distribution of the lists remain separate from the web-ifying of
the list.  Don't bother pushing out "ancient emails" through an exploder
list if you fall behind.  Simply provide the mbox files to the
subscribers and say:

    "Look.  The email server was down for a while and we missed about
    30,000 posts from LKML.  Instead of trying to re-inject them, we are
    providing them as an mbox file, which you can download from <URI>.
    The web-archives will be updated every 3 hours and can be found at
    <URI>.

    If you need help with integrating these mboxes into your own email
    client/folders, take the mbox and dump it on your spool with cat
    file >> <spoolfile>.  A more elegant solution is..."
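
One possible shape for that "more elegant solution", sketched with
Python's standard mailbox module: unlike a bare cat, it takes a lock on
the destination folder while appending, so it is less likely to collide
with a mail client or delivery agent writing at the same moment.  The
function name and paths are hypothetical, and whether these locks match
what your MTA and client actually use is an assumption you would need
to check.

```python
import mailbox


def append_mbox(download_path, folder_path):
    """Append every message in download_path to folder_path; return the count."""
    source = mailbox.mbox(download_path, create=False)
    target = mailbox.mbox(folder_path)  # created if it does not exist yet
    added = 0
    target.lock()  # dot-lock plus fcntl lock where available
    try:
        for message in source:
            target.add(message)
            added += 1
        target.flush()
    finally:
        target.unlock()
        target.close()
        source.close()
    return added
```

A subscriber would gunzip the downloaded file first and then call
something like append_mbox("lkml-missed.mbox", "Mail/lkml").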

Really.  Until mailman separates the tasks of distribution from
webifying the archive in a clean break, something that can be offloaded
(hmm...separate mailman from pipermail perhaps), it'll continue to be a
bad solution for high-volume lists.  It's much better to go with a
simple email list server, such as SmartList, and use the power of
multiple tools, such as mhonarc, perl, and apache.

DISCLAIMER

This post is simply a continuation of the discussion about the TCLUG
email list and its use of mailman, the python email listserver /
archiver.  It is not intended as a criticism of the services that Real
Time is providing us for Free!  I just honestly disagree with the
implementation being used.  Regardless, I appreciate RTE's commitment to
TCLUG!

-- 
Chad Walstrom <chewie at wookimus.net>                 | a.k.a. ^chewie
http://www.wookimus.net/                            | s.k.a. gunnarr
Get my public key, ICQ#, etc. $(mailx -s 'get info' chewie at wookimus.net)