Good tutorial on setting up a grid/cluster using fedora

Adrian Sevcenco Adrian.Sevcenco at cern.ch
Thu Apr 3 20:41:00 UTC 2014


On 04/02/2014 09:54 PM, Bill Oliver wrote:
> 
> Just to see if I can do it, I thought I'd set up a small grid/cluster
> using fedora.  Does anybody know of a good step-by-step guide for this
> (preferably free and online :-) )?
The "cluster" thing has a very blured meaning. You can do a cluster of
anything : you can have a cluster of postgres, a cluster of apache
nodes, etc.. anything that is made from distinct elements and work
together for a computational goal is a cluster.

The "grid" word is simple: cluster of clusters (geographical scattered).
An API that can access a cluster of clusters is a so-called "grid
middleware" .. historically it started with globus and now there are
middlewares like EMI (former gLite), ARC, AliEN (experiment specific)
and others (i am only familiar with those used in the experiment i am
part of)

To return to clusters: a cluster is (usually) a bunch of nodes linked
by a resource manager (like Torque, LSF, SGE, Condor, or Slurm, which
is used by most of the Top 500 supercomputers).

I said usually because there are also things like shared-memory
clusters. These clusters are made of individual nodes with linked
memory access, creating what is called a NUMA machine. If you log on to
such a machine you will see a single computer with a few thousand cores
and some (many) TB of memory.

To return to the resource-managed cluster: you can have two types of
processing: distributed and parallel.
In distributed computing you have some kind of atomic element of data,
so you can distribute chunks of data elements, process them
independently, and merge the results at the end. Usually this happens
in what are called batch systems (like the resource managers enumerated
above), because the computing jobs are batched in queues and the system
processes the jobs sequentially (modulo the number of computing slots
in the cluster); see the sketch below.
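
As a minimal sketch of that chunk-and-merge pattern (in Python, with a
process pool standing in for the computing slots of a batch system;
real batch systems spread the jobs across nodes, not processes):

  from multiprocessing import Pool

  def process_chunk(chunk):
      # One "job": process an independent slice of the data.
      return sum(chunk)

  if __name__ == "__main__":
      data = list(range(1000))
      # Split the data into atomic chunks, one per job.
      chunks = [data[i:i + 100] for i in range(0, len(data), 100)]
      # Four "computing slots": queued jobs run as slots free up.
      with Pool(processes=4) as pool:
          partials = pool.map(process_chunk, chunks)
      # Merge step: combine the per-chunk results.
      print(sum(partials))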
Lately another kind of processing has appeared, in the form of
interactive processing. It started at CERN with the PROOF subsystem
(part of ROOT) around the year 2000 (maybe earlier), and more recently
there is something called Hadoop. The principle is that the data is
distributed (if not already present) on the nodes and processed
interactively. Through the API it is as if you had a computer with many
more cores than your own machine could have. (It is a funny thing to
use a laptop for some data analysis and see things like 3 TB of data
processed in a couple of minutes.)

Parallel processing is usually related to big matrix computations (and
solving many equations with many variables). The standard API is
called MPI (Open MPI is a common implementation, but there are others).
The parallel aspect appears when computation steps depend on other
computation steps, so you need barriers, thread synchronization and (in
general) communication between processes. In my experience the MPI
runtime uses a resource manager to better allocate the required and
available resources (I am used to seeing Torque wrappers and
integration for launching and using MPI-based programs); see the sketch
below.
The problem with parallel computing is that it is heavily dependent on
node interconnect. It works over Ethernet (I would recommend drivers
and hardware that support RoCE (RDMA over Converged Ethernet)), but a
dedicated InfiniBand network is preferable.
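
As a minimal sketch of those dependencies, assuming the mpi4py Python
bindings on top of whatever MPI implementation the cluster provides:
each rank computes a partial result, a barrier synchronizes the steps,
and a reduce communicates the pieces back to rank 0.

  from mpi4py import MPI

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()   # this process's id
  size = comm.Get_size()   # total number of processes

  # Each rank computes its own partial result.
  partial = sum(range(rank * 100, (rank + 1) * 100))

  # Barrier: no rank proceeds until every rank finishes this step.
  comm.Barrier()

  # Inter-process communication: combine the partials on rank 0.
  total = comm.reduce(partial, op=MPI.SUM, root=0)
  if rank == 0:
      print("total:", total)

Launched with something like "mpirun -np 4 python sketch.py" (the
resource manager integration mentioned above usually wraps exactly this
launching step).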

As for the answer to your initial inquiry, I would highly recommend
Rocks Clusters, which is based on CentOS. It will automatically install
and manage your nodes from a single point (the frontend server), with
an NFS-shared home.

HTH,
Adrian
