Parallelizing the Phylogeny Problem

	  Jeff A. Jones and Katherine Yelick 

The problem of determining the evolutionary history of species 
in the form of phylogenetic trees is known as the phylogeny 
problem.  We present a parallelization of the character 
compatibility method for solving the phylogeny problem. 
Abstractly, the algorithm searches through all subsets of 
characters, which may be traits like opposable thumbs or  
DNA sequence values, looking for a maximal consistent subset.  
The notion of consistency in this case is the existence of a 
particular kind of phylogenetic tree called a perfect phylogeny 
tree.  

The two challenges to achieving an efficient implementation
are load balancing and efficient sharing of information to
enable pruning.  In both cases, there is a trade-off between
communication overhead and the quality of the solution.  For
load balancing we use a distributed task queue, which has
imperfect load information but avoids centralization bottlenecks.
To prune the search space, we use the following property:
If a perfect phylogeny tree does not exist for some set of
characters, then none exists for any superset of that set.  
This is implemented by searching the power set starting with 
the smallest sets, and storing failures in an efficient
distributed trie.  The resulting program shows speedups of 
50 on a 64-processor CM-5.