HapBound v0.1
INTRODUCTION : This program, HapBound, computes the lower bound on the
minimum number of recombinations that are needed in a
network. HapBound improves the Haplotype bound idea,
implemented in RecMin, is easier to use and and usually
produce sharper bound. HapBound uses integer linear
programming. This free distributed software contains the
GNU GLPK ILP solver. HapBound also supports CPLEX, which
runs faster, but CPLEX is an expensive commerical code.
Fore more technical details, refer to our paper.
HapBound's performance starts to degrade when the data
gets large, say containing more than 300 sites. In that
case, we suggest you use CPLEX as ILP solver. CPLEX runs
much faster than the free GLPK.
Since the implemented algorithm for self derivability test
runs in exponential time, HapBound occasionally appears to
be slow when -S is turned on. So there is timeout utility
to bound the time SDT can run. You can modify the timeout
limit in HapBound.cfg.
HapBound is tested on Linux machines, as well as Windows
(with cygwin).
---------------------------------------------------------------------------
SYNOPSIS: HapBound [ -S[,i#[,w#]] ] [ -M ] data-filename
OPTIONS:
-S Activate SDT: self derivability test. By default, this option
allows one additional (new) sequence to be inserted into
data matrix before checking SDT. To reduce the running time,
the user can choose to specifiy -S,i0 so that no new sequences
is allowed. This typically reduces running time, but may
produce a weaker bound. Another way to reduce running time is
limit the data width (or number of the oclumns) in the matrix
SDT is going to be performed. One can use -S,i0,w7, for example,
to limit SDT be applied only on matrix up to 7 columns, and
no additional row sequences.
-M This mode explicitly checks every interval and compute the
Haplotype bound directly without taking subsets. Then, we perform
SDT on this interval. Surprisingly, this simple method sometimes
produce very good bound, especially when coupled with -S. In
other words, HapBound -S -M typically produces the strongest
bounds. However, we do not recommend -M to be used when the data
is large, say containing more than 100 columns.
---------------------------------------------------------------------------
DATA FILE : The first line of the data file is ignored. If you like,
you may put a description of the data there.
The data should be in 0,1. (White space is allowed
between columns.) Each sequence should be placed in a its
own row.
EXAMPLE : An example data set (Kreitman.dat) is included.
Try running "./HapBound Kreitman.dat" to
check that everything works correctly.
CONTACT : Please send bug reportss and technical questions to
. For general questions, contact
Prof. Dan Gusfield at .