Hi Mike,

There are tons of groups on campus doing this, so contact me off-list if
you want to discuss it.

Fewer machines with more cores per machine are a bit more cost-effective
when it comes to the supporting infrastructure (power, cooling, space,
network devices). Since the jobs aren't memory-intensive, go for a
many-core system. The trade-off is that if you later want to use the
cluster for something that is memory-intensive, you'll be a bit out of
luck. Depending on which processor/chipset you buy, such as the AMD
Opterons or the new Intel Core i7 processors (once they get ECC memory
support), that would not be as big of a hit.

Shared disks are the standard, and there are ways of differentiating
nodes while still booting mostly from the same media. I would recommend
looking into Rocks (http://www.rocksclusters.org), as it would get you
going the quickest. For managing jobs there is Torque/Maui, or Sun Grid
Engine, which does a lot of the work for you that Torque does and then
some.
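
Since your analysis runs one SNP at a time, it should map well onto an
array job, where the scheduler starts many copies of the same script and
each copy works on its own slice of the SNPs. Here is a rough sketch in
Python of how the slicing could work. The function names are made up for
illustration, and the figure of 80 tasks is just my assumption of 10
machines with 8 cores each. The task number would normally come from the
scheduler's environment (SGE_TASK_ID under Sun Grid Engine, PBS_ARRAYID
under Torque, if I remember right), so check your scheduler's docs.

```python
def snp_chunks(n_snps, n_tasks):
    """Split SNP indices 0..n_snps-1 into n_tasks contiguous ranges.

    Each range is a half-open (start, stop) pair. When n_snps is not an
    exact multiple of n_tasks, the first few ranges get one extra SNP.
    """
    base, extra = divmod(n_snps, n_tasks)
    chunks = []
    start = 0
    for task in range(n_tasks):
        size = base + (1 if task < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def chunk_for_task(n_snps, n_tasks, task_id):
    """Return the (start, stop) range for a 1-based array task number."""
    return snp_chunks(n_snps, n_tasks)[task_id - 1]


if __name__ == "__main__":
    # Assumed sizes: 600,000 SNPs spread over 80 array tasks.
    start, stop = chunk_for_task(600000, 80, 1)
    print("task 1 handles SNPs", start, "to", stop - 1)
```

With those numbers each task gets 7,500 SNPs, and each task only needs
to load its own slice, which keeps the memory use per job small.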

Mike Miller wrote:
> We are thinking about putting together a cluster of maybe 10 machines, 
> presumably using GNU/Linux.  Do any of you have experience with this?
>
> Some of the things I'm wondering about include the appropriate 
> configuration of machines -- isn't it better in terms of cost/benefit to 
> buy fewer dual quad-core machines than more single CPU machines, 
> especially if the jobs are not very memory-intensive?
>
> We certainly want to use shared disks, but is there any problem with 
> booting all the computers from the same network drive?  That seems like a 
> good idea to me rather than to have separate HDDs in the machines, but I'm 
> not sure how it is done.
>
> What free software is available for managing jobs, e.g., batch queuing?
>
> FYI ... The idea is to use these machines for our genetic analyses -- 
> maybe 600,000 SNPs on 7,500 people, but this mostly consists of running 
> one SNP at a time on some collection of traits.  I don't think the memory 
> requirements are too great unless we try to load a lot of the data at 
> once.
>
> Mike
>
> _______________________________________________
> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
> tclug-list at mn-linux.org
> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
>