As I haven't played with a Beowulf cluster, this is just a shot in the
dark...

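Are you seeing lots of collisions on your 100 Mb hub when the latency
starts getting really bad?  A hub is one shared, half-duplex collision
domain, so all five nodes contend for the same wire whenever they talk
at once.  If collisions are the problem, you may want to replace the hub
with an Ethernet switch, which gives each port its own collision domain.
If this is just a little test environment, go get a little 5-port (about
$30) or 8-port (about $40) 10/100 switch from General Nanosystems.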

If this is a production environment, you may want to look for a more
robust managed Ethernet switch.
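
If you want to check for collisions before buying anything, here is a
rough Python sketch that reads the per-interface collision counter from
/proc/net/dev on a node.  It assumes the usual Linux layout of that file
(16 counters per interface, with "colls" as the 14th), and the function
names are just ones I made up for illustration, so treat it as a starting
point rather than a tested tool.

```python
def parse_collisions(text):
    """Return {interface: collisions} from /proc/net/dev-style text.

    Assumes the usual Linux layout: 8 receive counters followed by
    8 transmit counters, where the 6th transmit counter is "colls".
    """
    counts = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, stats = line.split(":", 1)
        fields = stats.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        counts[name.strip()] = int(fields[13])
    return counts


def read_collisions(path="/proc/net/dev"):
    """Read the current collision counters from the running kernel."""
    with open(path) as f:
        return parse_collisions(f.read())


def collision_delta(before, after):
    """Collisions added per interface between two samples."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Call read_collisions() twice, a few seconds apart, while the pings are
slow, and pass both results to collision_delta().  If the count for the
cluster interface climbs quickly, the hub is a likely suspect.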

Jeff


On Mon, 22 Jul 2002, Randy Clarksean wrote:

>
> I am setting up a small Beowulf cluster.  I can run a number of the
> benchmarks indicating that it works: data is passed machine to machine, each
> machine will make calculations, all of the LAM examples work.
>
> There are two problems that keep coming up and any suggestions would be
> greatly appreciated.
>
> -  at times there are delays in the network because a ping can take up to 1
> sec to go to any one or all of the machines.  The machines are all on 10/100
> NICs with a 100 Mb hub.  If the machines are carefully rebooted (meaning
> rebooting one at a time until it is all the way back up again) this problem
> will seemingly go away.  So .. any time there is the ping delay, the rsh
> delays everything - which causes the cluster many problems.  Any thoughts on
> rsh parameters that I should change or set differently?  OR is there
> something running on the system that can cause these ping or rsh delays?
>
> - We have installed LAM and MPI related software.  All of the LAM examples
> will start, run, and complete successfully.  Many of the MPI examples will
> come back and tell me that the process did not close or finalize on some of
> the nodes properly.  I apparently have some setup problem ... but as you can
> tell - I have no idea what.
>
> Any suggestions or recommendations would be greatly appreciated.  I have
> posted in comp.parallel and comp.parallel.mpi on a couple occasions with no
> luck.
>
> All of these issues occur on a RH 7.2 installation - all systems (5)
>
> Thanks in advance!
>
> Randy
>
>