HapBound v0.1 INTRODUCTION : This program, HapBound, computes the lower bound on the minimum number of recombinations that are needed in a network. HapBound improves the Haplotype bound idea, implemented in RecMin, is easier to use and and usually produce sharper bound. HapBound uses integer linear programming. This free distributed software contains the GNU GLPK ILP solver. HapBound also supports CPLEX, which runs faster, but CPLEX is an expensive commerical code. Fore more technical details, refer to our paper. HapBound's performance starts to degrade when the data gets large, say containing more than 300 sites. In that case, we suggest you use CPLEX as ILP solver. CPLEX runs much faster than the free GLPK. Since the implemented algorithm for self derivability test runs in exponential time, HapBound occasionally appears to be slow when -S is turned on. So there is timeout utility to bound the time SDT can run. You can modify the timeout limit in HapBound.cfg. HapBound is tested on Linux machines, as well as Windows (with cygwin). --------------------------------------------------------------------------- SYNOPSIS: HapBound [ -S[,i#[,w#]] ] [ -M ] data-filename OPTIONS: -S Activate SDT: self derivability test. By default, this option allows one additional (new) sequence to be inserted into data matrix before checking SDT. To reduce the running time, the user can choose to specifiy -S,i0 so that no new sequences is allowed. This typically reduces running time, but may produce a weaker bound. Another way to reduce running time is limit the data width (or number of the oclumns) in the matrix SDT is going to be performed. One can use -S,i0,w7, for example, to limit SDT be applied only on matrix up to 7 columns, and no additional row sequences. -M This mode explicitly checks every interval and compute the Haplotype bound directly without taking subsets. Then, we perform SDT on this interval. Surprisingly, this simple method sometimes produce very good bound, especially when coupled with -S. In other words, HapBound -S -M typically produces the strongest bounds. However, we do not recommend -M to be used when the data is large, say containing more than 100 columns. --------------------------------------------------------------------------- DATA FILE : The first line of the data file is ignored. If you like, you may put a description of the data there. The data should be in 0,1. (White space is allowed between columns.) Each sequence should be placed in a its own row. EXAMPLE : An example data set (Kreitman.dat) is included. Try running "./HapBound Kreitman.dat" to check that everything works correctly. CONTACT : Please send bug reportss and technical questions to . For general questions, contact Prof. Dan Gusfield at .