I am setting up a small Beowulf cluster.  A number of the benchmarks run
and indicate that it works: data is passed from machine to machine, each
machine performs its calculations, and all of the LAM examples work.

Two problems keep coming up, and any suggestions would be greatly
appreciated.

-  At times there are delays on the network: a ping can take up to 1 sec
to reach any one or all of the machines.  The machines all have 10/100
NICs connected to a 100 Mbps hub.  If the machines are rebooted carefully
(meaning one at a time, waiting until each is all the way back up), the
problem seems to go away.  Whenever the ping delay is present, rsh is
delayed as well, which causes the cluster many problems.  Are there any
rsh parameters that I should change or set differently?  Or is there
something running on the system that can cause these ping or rsh delays?

-  We have installed LAM and the related MPI software.  All of the LAM
examples start, run, and complete successfully.  Many of the MPI examples,
however, come back and report that the process did not close or finalize
properly on some of the nodes.  I apparently have a setup problem but, as
you can tell, I have no idea what it is.
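
To help narrow down the first problem (the ping/rsh delay), here is a
minimal sketch of a probe, not something that is part of our setup.  It
times the hostname lookup and the TCP connect to the rsh port separately
for each node.  The node names are hypothetical placeholders, Python is
assumed to be available, and port 514 is assumed to be where rshd listens.

```python
import socket
import time

# Hypothetical hostnames; substitute the real node names.
NODES = ["node1", "node2", "node3", "node4", "node5"]
RSH_PORT = 514  # assumed: the usual TCP port for rshd ("shell")


def timed(fn, *args):
    """Call fn(*args); return (elapsed seconds, result or caught OSError)."""
    start = time.perf_counter()
    try:
        result = fn(*args)
    except OSError as exc:  # covers lookup failures, refusals, timeouts
        result = exc
    return time.perf_counter() - start, result


def connect(addr, port, timeout):
    """Open a TCP connection and close it again right away."""
    with socket.create_connection((addr, port), timeout=timeout):
        return True


def probe(host, port=RSH_PORT, timeout=5.0,
          resolve=socket.gethostbyname, dial=connect):
    """Time the name lookup and the TCP connect separately for one host."""
    lookup_s, addr = timed(resolve, host)
    if isinstance(addr, OSError):
        return {"host": host, "lookup_s": lookup_s,
                "connect_s": None, "error": str(addr)}
    connect_s, outcome = timed(dial, addr, port, timeout)
    error = str(outcome) if isinstance(outcome, OSError) else None
    return {"host": host, "lookup_s": lookup_s,
            "connect_s": connect_s, "error": error}


def report(nodes=NODES):
    """Print one line per node with both timings."""
    for host in nodes:
        r = probe(host)
        conn = "n/a" if r["connect_s"] is None else "%.3f s" % r["connect_s"]
        print("%-8s lookup %.3f s  connect %s  %s"
              % (r["host"], r["lookup_s"], conn, r["error"] or "ok"))
```

Calling report() from each node while the delay is present, and again
after a careful reboot, should show whether the time is going into the
name lookup or into the connection itself.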

Any suggestions or recommendations would be greatly appreciated.  I have
posted in comp.parallel and comp.parallel.mpi on a couple of occasions
with no luck.

All of these issues occur on all five systems, each a RH 7.2 installation.

Thanks in advance!

Randy