
Block-checkerboard partitioning

The matrix is divided into square blocks of size $2 \times 2$, and each node is assigned one block. The vector X is distributed in portions of size two to each process in the group. A refinement of this is also implemented, in which the vector X is distributed in equal portions to the first process in each block column of the matrix; the first process in each column then broadcasts its portion down the whole column. Each process multiplies its block by its portion of the vector. The MPI_Gather operation is then used to gather the partial results from every node onto the head node, which adds together the contributions received from each block row and inserts the sums into the final qubit vector.
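The data flow described above can be illustrated with a small serial sketch. It is not the MPI implementation itself: each "process" is simulated by one loop iteration, the gather step is simulated by a dictionary, and the names `checkerboard_matvec`, `BLOCK` and `gathered` are hypothetical, chosen only for this illustration. It assumes a square matrix whose dimension is a multiple of two.

```python
BLOCK = 2  # block edge length, matching the 2 x 2 blocks in the text


def checkerboard_matvec(matrix, x):
    """Serial simulation of block-checkerboard y = A x (illustrative only)."""
    n = len(matrix)
    if n % BLOCK != 0 or len(x) != n or any(len(row) != n for row in matrix):
        raise ValueError("expected a square matrix whose size is a multiple of 2")
    grid = n // BLOCK  # number of blocks along each dimension

    # Stand-in for the gather buffer on the head node: one partial
    # result per simulated process, keyed by (block_row, block_col).
    gathered = {}
    for bi in range(grid):
        for bj in range(grid):
            # Portion of X that belongs to block column bj.
            x_part = x[bj * BLOCK:(bj + 1) * BLOCK]
            partial = []
            for r in range(BLOCK):
                row = matrix[bi * BLOCK + r][bj * BLOCK:(bj + 1) * BLOCK]
                partial.append(sum(a * b for a, b in zip(row, x_part)))
            gathered[(bi, bj)] = partial

    # Head node: add the partial results along each block row.
    y = [0] * n
    for (bi, _bj), partial in gathered.items():
        for r in range(BLOCK):
            y[bi * BLOCK + r] += partial[r]
    return y
```

For example, calling `checkerboard_matvec` on the $4 \times 4$ matrix with rows `[1, 2, 3, 4]`, `[5, 6, 7, 8]`, `[9, 10, 11, 12]`, `[13, 14, 15, 16]` and the vector `[1, 0, 2, 1]` simulates four processes and returns `[11, 27, 43, 59]`, the same result as an ordinary matrix-vector product.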



Colm O hEigeartaigh 2003-05-30