[ale] OT: Multi-core Utilization

Alex Carver agcarver+ale at acarver.net
Fri Mar 8 11:48:45 EST 2013


On 3/8/2013 08:33, Jeff Hubbs wrote:
> My *practical* experience has a hole in it when it comes to developing
> software to efficiently use multiple cores in a machine.
>
> If I'm writing code in the likes of C++, Python, or Fortran
> (acknowledging that I've got a range of programming paradigms there) and
> let's say that I'm subtracting two 2-D arrays of floating point numbers
> from one another element-wise, how is it that the operation gets blown
> across multiple CPU cores in an efficient way, if at all?  Bear in mind
> that if this is done in Fortran, it's done in a pair of nested do-loops
> so unless the compiler is really smart, that becomes a serial operation.

Depending on who is doing the optimizing (the programmer or the 
compiler), who knows? :)

The sensible way to do it would be to exploit the fact that matrix 
addition and subtraction require the matrices to be identical in 
dimension.  So you just send a row or a column of each matrix to each 
core and let them rip through one short loop iterating over the elements 
of that row or column.  Now you have parallel single for{} loops, each 
chewing on only one row/column.  The outer iteration is only needed to 
divvy up the work across the cores and executes enough times to hand 
every row/column out to the available cores.  (The number of iterations 
of that outer loop works out to the order of the matrix divided by the 
number of cores available, rounded up to the nearest integer.)
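
For a concrete picture, here's a rough C++ sketch of that scheme using 
std::thread, with one contiguous block of rows handed to each core.  The 
names (parallel_subtract, subtract_rows) and the vector-of-vectors 
layout are just illustrative assumptions on my part, not anything from a 
particular library:

#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // a and b assumed identical in dimension

// One worker: iterates over its assigned rows, one short inner loop per row.
void subtract_rows(const Matrix& a, const Matrix& b, Matrix& out,
                   std::size_t first, std::size_t last)
{
    for (std::size_t i = first; i < last; ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            out[i][j] = a[i][j] - b[i][j];
}

Matrix parallel_subtract(const Matrix& a, const Matrix& b)
{
    const std::size_t rows  = a.size();
    const std::size_t cores = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (rows + cores - 1) / cores;  // ceiling division

    Matrix out(rows, std::vector<double>(rows ? a[0].size() : 0));
    std::vector<std::thread> workers;

    // The "outer loop": just divvies the rows up, one block per core.
    for (std::size_t first = 0; first < rows; first += chunk) {
        const std::size_t last = std::min(first + chunk, rows);
        workers.emplace_back(subtract_rows, std::cref(a), std::cref(b),
                             std::ref(out), first, last);
    }
    for (auto& w : workers)
        w.join();
    return out;
}

Each worker writes only its own block of rows of the output, so there's 
nothing to lock; the outer loop exists purely to divvy up the work and 
launch the threads.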




...or you cheat and send it to the GPU which knows how to natively work 
with matrices and blows the CPU out of the water. :)


