I've spent a lot of time working with EC2 and I would not really 
recommend it for this purpose without putting a lot of effort into 
planning and considering all the options. First of all, EC2 can be more 
expensive than purchasing your own hardware unless you do it right. 
There are two billing types for EC2 instances (the virtual machines you 
launch from an Amazon Machine Image, or AMI): on-demand and reserved. 
On-demand instances are intended to be up for the short term - from a 
few hours to a few days - and their pricing per hour reflects this. 
Reserved instances are cheaper to run per hour (3 cents compared to 10 
cents for certain instances) since you pay a chunk of money up front. 
Throwing your infrastructure in the cloud is not always cost effective 
unless you plan it correctly. (There are companies that do this - I 
work for one.) Keep in mind that after a year or two of hardcore EC2 
usage, you might have spent enough to have purchased your own cluster; 
all expenses after that point are wasted money.
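
To make the break-even argument concrete, here is a rough sketch in 
Python. Only the 3 cent and 10 cent hourly rates come from the numbers 
above; the up-front fee, the hardware price, and the instance count are 
made-up placeholders, so plug in your own quotes.

```python
def break_even_hours(upfront_fee, reserved_rate, on_demand_rate):
    """Hours of use after which a reserved instance beats on-demand."""
    return upfront_fee / (on_demand_rate - reserved_rate)


def months_to_match_hardware(hardware_cost, instances, hourly_rate,
                             hours_per_month=730):
    """Months of always-on EC2 use whose bill equals the hardware price.

    Ignores power, cooling, and admin time for the owned cluster.
    """
    return hardware_cost / (instances * hourly_rate * hours_per_month)


ON_DEMAND_RATE = 0.10    # dollars/hour, from the figures above
RESERVED_RATE = 0.03     # dollars/hour, from the figures above
UPFRONT_FEE = 350.0      # ASSUMPTION: placeholder reservation fee
HARDWARE_COST = 30000.0  # ASSUMPTION: placeholder cluster price
INSTANCES = 20           # ASSUMPTION: placeholder instance count

hours = break_even_hours(UPFRONT_FEE, RESERVED_RATE, ON_DEMAND_RATE)
months = months_to_match_hardware(HARDWARE_COST, INSTANCES, ON_DEMAND_RATE)
print(f"reserved pays off after about {hours:.0f} hours of use")
print(f"on-demand bill matches the hardware after about {months:.1f} months")
```

With these placeholder numbers the on-demand bill catches up with the 
hardware price in under two years, which is the kind of result I mean 
above.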

The other issue is designing your infrastructure around non-persistent 
storage. You might need to set up your own AMIs to ease some of the 
initial configuration (application installation and cluster management 
software). While you can use the many gigabytes of local disk that EC2 
instances come with for scratch space, you will need a combination of 
Simple Storage Service (S3) and Elastic Block Store (EBS) for 
persistent storage. Each of these services has its own limitations. S3 
can store an unlimited number of files, but the maximum file size is 
around 5GB. An EBS volume can only be mounted by one instance at a time 
(for now). An EBS volume is also only available to EC2 instances in the 
same availability zone. You can think of availability zones as data 
centers in the same geographic region (although this isn't necessarily 
correct).
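
If some of your data files are bigger than the S3 limit, one workaround 
is to split them into pieces before uploading. This is only a sketch of 
the bookkeeping: plan_chunks is a hypothetical helper of mine, it does 
not talk to S3 at all, and the 5GB default is just the approximate limit 
mentioned above.

```python
def plan_chunks(total_bytes, max_chunk_bytes=5 * 1024**3):
    """Return (offset, length) pairs that cover total_bytes.

    Each piece is at most max_chunk_bytes long. The default of 5 GiB
    is an ASSUMPTION based on the approximate limit quoted above.
    """
    if total_bytes < 0 or max_chunk_bytes <= 0:
        raise ValueError("sizes must be non-negative and chunk size positive")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(max_chunk_bytes, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


GIB = 1024**3
for offset, length in plan_chunks(12 * GIB):
    print(f"piece at offset {offset // GIB} GiB, {length // GIB} GiB long")
```

Each piece would then be stored as its own S3 object and stitched back 
together on the instance's scratch space.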

While data transfers are free between EC2 instances (over local IP 
addresses), they are not when you are using the public IP, even 
between EC2 instances, from what I've heard. If you're transferring 
gigabytes or even terabytes of input data or results, this can be an 
expensive and slow process. Amazon provides a service (AWS 
Import/Export) where you can send in storage devices and they'll copy 
the data over to S3. If you have a lot of devices, that can also be 
very expensive. Amazon does provide a nice and simple calculator for 
this - 
http://awsimportexport.s3.amazonaws.com/aws-import-export-calculator.html 
- so that you can pick which option works best.
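
For a quick feel of the "slow" part before you open the calculator, 
here is a back-of-the-envelope sketch. The link speed and the 
efficiency factor are assumptions, not measurements; use whatever your 
campus uplink actually sustains.

```python
def upload_days(data_gb, link_mbps, efficiency=0.8):
    """Days needed to push data_gb gigabytes over a link_mbps link.

    efficiency is an ASSUMED fraction of the nominal link speed that
    you actually get after protocol overhead and sharing.
    """
    bits = data_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1000**2 * efficiency)
    return seconds / 86400


# ASSUMPTION: 1 TB of data over a shared 100 Mbit/s uplink.
print(f"{upload_days(1000, 100):.1f} days to upload 1 TB")
```

If the answer comes out in weeks rather than days, shipping disks is 
probably worth pricing.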

They also have another calculator for their other services like EC2 and 
S3 - http://calculator.s3.amazonaws.com/calc5.html

The biggest flaw with EC2 is that while you do have guaranteed CPU and 
memory resources, there is no guarantee of memory bandwidth. This means 
that if an instance from a different AWS account shares the same 
physical machine as your compute job, that instance could be taking up 
all or most of the memory bandwidth, making your job run slower. Not 
only does your job take longer to finish, it is also more expensive, 
since you pay for every hour it runs.
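
The cost effect is simple arithmetic, sketched below. The job size and 
the slowdown factor are made-up examples, and rounding the runtime up 
to whole hours is my assumption about how the billing works.

```python
import math


def job_cost(baseline_hours, instances, hourly_rate, slowdown=1.0):
    """Dollars billed for a job that is slowed by a noisy neighbor.

    slowdown is a hypothetical factor (1.5 means 50% longer runtime).
    ASSUMPTION: the runtime is rounded up to whole billed hours.
    """
    billed_hours = math.ceil(baseline_hours * slowdown)
    return billed_hours * instances * hourly_rate


# ASSUMPTION: a 10 hour job on 20 on-demand instances at 10 cents/hour.
print(f"quiet host: ${job_cost(10, 20, 0.10):.2f}")
print(f"noisy host: ${job_cost(10, 20, 0.10, slowdown=1.5):.2f}")
```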

Since the infrastructure for power, space, and cooling already exists 
for you, purchasing your own hardware might be the better route. The 
biggest issue I see with deciding how many cores to put in a system is 
the network architecture you choose to purchase. If you choose to go 
with gigabit Ethernet, it doesn't make a huge difference. If you're 
thinking of using high speed interconnects like Infiniband, the number 
of systems you have is crucial since the switches and adapters can cost 
quite a bit of money. While a 24 port switch can be reasonably cheap 
(around $5000), a 48 port switch may not be ($20k-50k 
- http://www.provantage.com/scripts/search.dll?QUERY=Infiniband+switch ) 
so you would need to buy multiple smaller switches to get the right 
number of ports, and then add enough extra switches on top of those to 
get good enough bisection bandwidth.
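
As a rough way to count switches, here is a sketch that assumes a 
two-tier layout with full bisection bandwidth, where half of each edge 
switch's ports go to nodes and half go up to a second layer of 
switches. That layout is my assumption for illustration; a vendor may 
well propose something different.

```python
import math


def two_tier_switch_count(nodes, ports_per_switch=24):
    """Return (edge_switches, core_switches) for a two-tier layout.

    ASSUMPTION: full bisection bandwidth, so each edge switch uses half
    of its ports for nodes and half for uplinks to core switches.
    """
    if nodes <= ports_per_switch:
        return 1, 0  # everything fits on a single switch
    down_ports = ports_per_switch // 2
    if nodes > ports_per_switch * down_ports:
        raise ValueError("too many nodes for two tiers of this switch size")
    edge = math.ceil(nodes / down_ports)
    uplinks = edge * (ports_per_switch - down_ports)
    core = math.ceil(uplinks / ports_per_switch)
    return edge, core


edge, core = two_tier_switch_count(48)
print(f"48 nodes: {edge} edge + {core} core switches with 24 ports each")
```

For 48 nodes this comes to six 24 port switches, so at around $5000 
each the total lands inside the price range of a single 48 port switch, 
before counting the extra cables.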

For the current Intel Xeon (non-Nehalem) processors, you shouldn't 
really get more than 8 cores per system, because beyond that count 
there isn't enough memory bandwidth to keep them all well fed with work. 
Dell and sometimes Sun offer good deals to academic groups, so you might 
benefit from that. Both companies also offer free trials of hardware so 
you can benchmark your applications on each and pick which is best. 
While you could get more AMD nodes with the same total power for about 
the same price as a single Intel node, keep in mind that the cost of 
running many less powerful systems, as opposed to a few very powerful 
ones, can be a financial hit in the future.
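
Putting the 8 cores per system guideline together with the 50-100 cores 
and 4GB of RAM per core from the original question, a quick sizing 
sketch looks like this (size_cluster is just a hypothetical helper 
name):

```python
import math


def size_cluster(total_cores, cores_per_node=8, ram_per_core_gb=4):
    """Return (node_count, ram_per_node_gb) for the requested core count."""
    nodes = math.ceil(total_cores / cores_per_node)
    return nodes, cores_per_node * ram_per_core_gb


for cores in (50, 100):
    nodes, ram = size_cluster(cores)
    print(f"{cores} cores: {nodes} nodes with {ram} GB RAM each")
```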

steve ulrich wrote:
> mike -
>
> building out your own compute infrastructure is so 2002. ;)
>
> i've used amazon EC2 for a very similar application where i've been
> running large simulations on their infrastructure with my own VM image
> that i use for my purposes.  you can simply dial up the number of
> processors that you purchase and use.  you're charged by the hour for
> the number of CPU instances you use.
>
> instead of buying hardware yourself that you have to power up, replace
> HDDs, etc. for and manage connectivity for you let someone pay for
> that and simply use their resources on demand.
>
> On Tue, Jul 7, 2009 at 9:29 AM, Mike Miller<mbmiller+l at gmail.com> wrote:
>   
>> We want to put together a few computers to make a little "farm" for doing
>> our statistical analyses.  It would be good to have 50-100 cores.  What is
>> the cheapest way to go?  About 4GB RAM per core should be more than
>> enough.  I'm thinking quad-core chips are going to be cheaper.  How many
>> sockets per mobo? I guess 1-, 2- and 4-socket mobos are available.  We
>> don't need SMP, but we'll take it if it is cheap (which I doubt).  We'll
>> use cloned HDDs in these boxes. My first thought is "blade" but maybe
>> blades are more expensive than somewhat less convenient ways of housing
>> the mobos.
>>
>> We have people here to house it and manage it and to pay for
>> electricity(!). They also will have ideas about what we should buy.
>>
>> Any ideas?
>>
>> Which CPU gives the most flops/dollar these days?
>>
>> Mike
>>
>> _______________________________________________
>> TCLUG Mailing List - Minneapolis/St. Paul, Minnesota
>> tclug-list at mn-linux.org
>> http://mailman.mn-linux.org/mailman/listinfo/tclug-list
