Good tutorial on setting up a grid/cluster using Fedora

Bill Oliver vendor at billoblog.com
Wed Apr 2 21:34:04 UTC 2014


On Wed, 2 Apr 2014, Greg Woods wrote:

>
> My experience says there isn't. Granted, I am not an expert in parallel
> computing, but I work for a supercomputing site. About 15 years ago,
> high-performance computing hit the wall with regard to how fast a single
> processor could go. We had CRAY computers that used vector processing,
> which means executing the same instruction on a range of memory words
> at the same time, in one instruction cycle. So code like
>
> for i = 1,100 do
>     a[i] = a[i]*2
> done
>
> would execute at the same speed as "x=x*2" (in this admittedly trivial
> example, you get a factor of 100 speedup).  That was a lot easier to
> program for than multiprocessing, but even that required careful
> attention when writing code so that it would vectorize and get the
> performance boost.
>
> After single-processor computing hit the wall, we and every other HPC
> site had to go to parallel processing (modern supercomputers have tens
> of thousands of processors running on thousands of separate nodes). This
> too requires special coding, so that your program will naturally break
> up into separate tasks that can be executed in parallel. That is true
> whether you are talking about using multiple processors on a single
> machine, or spreading a code over multiple systems. There are MPI
> libraries to make this task easier, but it is never as simple as "OK,
> now execute this unmodified code five times as fast using five machines
> instead of one".
>
> How difficult it is to parallelize the code depends, as has already been
> said here, on the particular application to be parallelized.
>
> --Greg
>
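To make Greg's vector loop concrete, here is a minimal C sketch of the same idea.  The function name, the array, and the OpenMP "simd" pragma are just illustrative assumptions, not anything from his site; the point is that a vectorizing compiler turns the loop body into wide SIMD instructions that process several elements at once.

#include <stddef.h>

/* Double every element of a[].  With vectorization enabled (or the
   explicit pragma below), the compiler emits SIMD instructions that
   update several elements per cycle instead of one at a time. */
void double_all(double *a, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * 2.0;
}

With GCC, "-O3 -march=native" will usually auto-vectorize a loop like this, and "-fopenmp-simd" honors the pragma without pulling in the rest of OpenMP.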

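And here is a rough sketch of the kind of "special coding" Greg means when he mentions MPI.  The array, its size, and the chunking scheme are made up for illustration: each rank only touches its own slice, and a real code would also have to distribute the input and gather the results, which is where the effort (and the loss of that naive factor-of-five) goes.

#include <mpi.h>

#define N 1000000

static double a[N];

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Split the index range across ranks; the last rank picks up
       any leftover elements. */
    int chunk = N / nprocs;
    int lo = rank * chunk;
    int hi = (rank == nprocs - 1) ? N : lo + chunk;

    for (int i = lo; i < hi; i++)
        a[i] = a[i] * 2.0;

    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with "mpirun -np 5", and even then it is nowhere near "the same code, five times as fast, for free."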

Right.  A lot of image processing tasks are amenable to parallelization.

Consider an algorithm called "adaptive histogram equalization."  Here is roughly what it does:

1) Get a pixel and a small area around it (say the surrounding 100 pixels).

2) Do a contrast enhancement method called "histogram equalization" on that group of pixels.  This will change the value of the pixel in question.  Let's say that this process involves 500 high-level instructions.

3) Move to the next pixel.  Do the same thing.

If you have a 12-megapixel image (say, 11,760,000 pixels), that's 5,880,000,000 instructions.  That 500-instruction block is impossible to parallelize well.  However, each pixel is independent, so you can parallelize the work on each pixel easily.

I remember back in the 80s implementing this on a MicroVAX GPX II.  It took about 3 hours to do a 512x512 greyscale image by brute force.  Then Henry Fuchs et al. developed the PixelPlanes machine, and Austin et al. implemented it on that -- it took about 4 seconds.  Even today on my laptop with an i7, a brute-force contrast-limited adaptive histogram equalization on a 10-megapixel image takes a "go get a cup of coffee" amount of time.  There are, of course, shortcuts such as the Pizer-Cromartie algorithm, but they introduce interpolation artifacts.
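To show why the per-pixel work splits up so easily, here is a rough C sketch of the brute-force approach.  The window size, image layout, and helper function are hypothetical -- this is not the Pizer-Cromartie method or any particular library, just the "each output pixel depends only on a small window of the input" structure that lets an OpenMP pragma farm the loop out across cores.

#define WIN 11   /* 11x11 neighborhood, roughly "the surrounding 100 pixels" */

/* Hypothetical helper: histogram-equalize one window and return the new
   value of its center pixel (the ~500 instructions of step 2 above). */
static unsigned char equalize_window(const unsigned char *img,
                                     int w, int h, int x, int y)
{
    int hist[256] = {0};
    int count = 0;

    /* Histogram of the window, clipped at the image borders. */
    for (int dy = -WIN / 2; dy <= WIN / 2; dy++)
        for (int dx = -WIN / 2; dx <= WIN / 2; dx++) {
            int xx = x + dx, yy = y + dy;
            if (xx < 0 || yy < 0 || xx >= w || yy >= h)
                continue;
            hist[img[(long)yy * w + xx]]++;
            count++;
        }

    /* Cumulative histogram up to the center pixel's value, rescaled
       to the full 0..255 range. */
    int center = img[(long)y * w + x];
    int cum = 0;
    for (int v = 0; v <= center; v++)
        cum += hist[v];

    return (unsigned char)((255L * cum) / count);
}

/* Brute-force adaptive histogram equalization.  Every output pixel is
   computed from the unmodified input image, so the iterations are fully
   independent and the loop parallelizes trivially. */
void ahe_brute_force(const unsigned char *in, unsigned char *out,
                     int w, int h)
{
#pragma omp parallel for collapse(2) schedule(static)
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            out[(long)y * w + x] = equalize_window(in, w, h, x, y);
}

Something like "gcc -O2 -fopenmp" builds it, and the speedup scales with the core count precisely because no output pixel depends on any other.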

Of course, that's why we have GPUs now, and most of this stuff is done on the GPU using CUDA.

Oh well, as I said, I remember back in the day trying to build a Beowulf cluster and deciding that it just wasn't worth the effort.  I was hoping that, with all the new advances in cloud and virtualization, new tools had come along to make it easier, but no such luck, it seems.

billo

