Threading a scientific program
I wanted one of my codes to have a section where it would diagonalize 4 different matrices at the same time (since this is my bottleneck), and the computers I'm running on (SHARCNET) are mostly quad processors. I looked into MPI and its approach to parallel processing, and it is hard! It is clearly not meant for what I wanted to do: I have a code that executes in serial, and at one bottleneck point I wanted it to do several things on different CPUs. Then I found OpenMP. This was what I wanted: a way to parallelize a section of my code without a complete rewrite! I found the SECTIONS and SECTION directives especially easy. For instance, say you have 2 matrices A and B and you want to do something to them with a function diag(). In a normal C code you'd have
...
diag(A); //long wait
diag(B); //also long wait
...
With OpenMP you can fork this section of the code with
#pragma omp parallel sections
{ }
This creates a region that is forked at the start and joined at the end, so once the region finishes executing, the program goes back to serial execution. Inside it you make a number of SECTION blocks, ideally equal to the number of processors (not required, but it makes sense), and each section executes on a different CPU at the same time. Once the SECTIONS region ends, it's regular serial execution again. So for matrices A and B you'd have
...
#pragma omp parallel sections
{
#pragma omp section
{ diag(A); }
#pragma omp section
{ diag(B); }
}
... // back to normal C or C++ code

There is plenty more to OpenMP, but I don't need it right now, so that's all I know. I can say it is definitely easier than using MPI send/receive! Some good sites are AMD's short one and a more complete one here. Hope this helps someone, but leave me some CPUs please. ;-)
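For anyone who wants to try this out, here's a complete little test program putting all 4 matrices in their own sections. The diag() below is just a placeholder I made up (a sleep standing in for a long computation), so swap in your real diagonalization routine:

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

// placeholder: pretend each "matrix" takes 2 seconds to diagonalize
static void diag(const char *name)
{
    printf("diagonalizing %s on thread %d\n", name, omp_get_thread_num());
    sleep(2);
}

int main(void)
{
    printf("%d processors available\n", omp_get_num_procs());

    double t0 = omp_get_wtime();
    #pragma omp parallel sections
    {
        #pragma omp section
        { diag("A"); }
        #pragma omp section
        { diag("B"); }
        #pragma omp section
        { diag("C"); }
        #pragma omp section
        { diag("D"); }
    }
    printf("all 4 done in %.1f seconds\n", omp_get_wtime() - t0);
    return 0;
}

Compile with gcc -fopenmp test.c -o test and run it; on a quad processor the 4 "diagonalizations" should finish in about the time of 1 (roughly 2 seconds instead of 8). You can also cap the number of threads with the OMP_NUM_THREADS environment variable.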
Labels: programming, Science