HapBound-GC v0.1 INTRODUCTION : This program, HapBound-GC, computes the lower bound on the minimum number of recombinations that are needed in a network. HapBound-GC improves the Haplotype bound idea, implemented in RecMin, is easier to use and and usually produce sharper bound. HapBound-GC uses integer linear programming. This free distributed software contains the GNU GLPK ILP solver. HapBound-GC also supports CPLEX, which runs faster, but CPLEX is an expensive commercial code. Fore more technical details, refer to our paper. HapBound-GC's performance starts to degrade when the data gets large, say containing more than 300 sites. In that case, we suggest you use CPLEX as ILP solver. CPLEX runs much faster than the free GLPK. Since the implemented algorithm for self derivability test runs in exponential time, HapBound-GC occasionally appears to be slow when -S is turned on. So there is timeout utility to bound the time SDT can run. You can modify the timeout limit in HapBound-GC.cfg. HapBound-GC is tested on Linux machines, as well as Windows (with cygwin). New: 3/16/06, now HapBound-GC supports gene conversion lower bound computation. Gene conversion is a type of recombination with two crossovers. HapBound-GC allows options to be chosen to compute a lower bound on the minimum number of single-crossover recombination AND gene conversion. We also allow user to configure the maximum tract length (defined as the length between two breakpoints in gene conversion). --------------------------------------------------------------------------- SYNOPSIS: HapBound-GC [ -S[,i#[,w#]] ] [ -M ] [-G[,[[s|d],]#]] data-filename OPTIONS: -S Activate SDT: self derivability test. By default, this option allows one additional (new) sequence to be inserted into data matrix before checking SDT. To reduce the running time, the user can choose to specify -S,i0 so that no new sequences is allowed. This typically reduces running time, but may produce a weaker bound. Another way to reduce running time is limit the data width (or number of the columns) in the matrix SDT is going to be performed. One can use -S,i0,w7, for example, to limit SDT be applied only on matrix up to 7 columns, and no additional row sequences. -M This mode explicitly checks every interval and compute the Haplotype bound directly without taking subsets. Then, we perform SDT on this interval. Surprisingly, this simple method sometimes produce very good bound, especially when coupled with -S. In other words, HapBound-GC -S -M typically produces the strongest bounds. However, we do not recommend -M to be used when the data is large, say containing more than 100 columns. -G This mode activates gene conversion lower bound. This option is not to be combined with -S or -M options. Option s means HapBound-GC is to maximize the number of single-crossovers over all solutions with the smallest combined single-crossover and gene-conversion lower bound. Option d, on the other hand, maximizes the number of gene-conversions over all solutions with the smallest combined lower bound. The optional number following s or d is the maximum tract length. Examples: -G activates gene conversion lower bound while allowing arbitrary tract length. -G,1000 activates gene conversion while restricting the tract length to be at most 1000 bp. -G,s activates gene conversion and also maximizes the number of single crossovers as secondary goal. -G,d,1000 activates gene conversion, maximizes gene conversion as secondary goal, and sets the maximum tract length to 1000. --------------------------------------------------------------------------- DATA FILE : The first line of the data file is ignored. If you like, you may put a description of the data there. The data should be in 0,1. (White space is allowed between columns.) Each sequence should be placed in a its own row. EXAMPLE : Example data sets are provided. Try running "./HapBound-GC sim1-n20s30r5f5m500.dat" to check that everything works correctly. CONTACT : Please send bug reports and technical questions to . For general questions, contact Prof. Dan Gusfield at .