OPTIMAL DECOMPOSITION OF A BOX [UPDATED]
For awhile now I’ve been doing distributed computing based on two major methods: the METIS graph partitioning method for decomposition and the MPI method for parallelism. Both of these techniques are well established and used extensively in many fields of computational physics, engineering and chemistry. I’ve been doing simulations in a simple mesh for a few months now. This mesh is simply a box with 200 x 200 x 200 cells. I decompose the box into 8 parts, each part to be run on a different processor using MPI as the construct to deal with processor-processor boundaries/communication. It occurred to me that the METIS method does something particularly ridiculous in this case.
If you simply break the box up into 8 pieces, the easiest possible way to do this is just to simply cut through the planes of the box. The faces of the global box do not exist on processor boundaries, as I apply boundary conditions on all these faces. Each cutting plane has 200 x 200 faces, so you don’t need a CS or math degree to know that the number of processor faces in this case would be 120,000. Is this what METIS gives you? No! It gives you 164,033 processor faces. What the hell?
Here’s a little graph of what this looks like (excuse my very quick and dirty xfig’ing). The width of the boundaries is directly related to the number of processor-processor faces between each decomposed domain.

While there is some obvious symmetry here (within a certain level of approximation), this yields far from the cleanest solution. While METIS may be fantastic for complex domains, it doesn’t do well with simple domains with obvious symmetry. Further, each domain should have a maximum of 3 processor-processor boundaries! It’s important to note here that in fact each processor has 3 major processor-processor boundaries (each node has 3 wide connections — this tell us that METIS is in fact roughly trying to get to the optical structure described above). It’s all the little connections that would be removed with some knowledge of the basic full domain structure. I understand and perhaps believe that this could all be due to some convergence criteria in the method which I am unaware of (in my reading of the papers on the subject and the code itself I haven’t found any such parameter), though still, I see no reason why some from-end part of the algorithmic implementation shouldn’t take into consideration the symmetry of the large and subdomain groups.
Cheers.
[UPDATE after the jump]
UPDATE: Here are two images from the processor 1 set. I think everyone should be able to see what I mean now.


Yikes! Is there a practical reason that you can see why group theory isn’t applied here?
that’s certainly what i would suggest at the front end of such an implementation of METIS …
all this being said i could of course just manually partition my box and be done with it, and the majority of people who use METIS use it in situations where there is nothing even remotely close to an analytical solution or really any symmetry at all … but i can imagine lots of situations in papers i’ve read even recently where people use the method in situations where it wasn’t really well applied. further, the VAST majority of novice parallel mesh solvers (people) will simply use the method because its the easiest thing to do.
i see no reason why there shouldn’t be simply one decomposition method which works well in all situations. there should be a group theory front end.