HapBound-GC v0.1


INTRODUCTION :  This program, HapBound-GC, computes the lower bound on the 
                minimum number of recombinations that are needed in a 
                network. HapBound-GC improves the Haplotype bound idea, 
                implemented in RecMin, is easier to use and and usually 
                produce sharper bound. HapBound-GC uses integer linear 
                programming. This free distributed software contains the 
                GNU GLPK ILP solver. HapBound-GC also supports CPLEX, which 
                runs faster, but CPLEX is an expensive commercial code.  
                Fore more technical details, refer to our paper.

                HapBound-GC's performance starts to degrade when the data 
                gets large, say containing more than 300 sites. In that 
                case, we suggest you use CPLEX as ILP solver. CPLEX runs 
                much faster than the free GLPK.

                Since the implemented algorithm for self derivability test
                runs in exponential time, HapBound-GC occasionally appears to
                be slow when -S is turned on. So there is timeout utility
                to bound the time SDT can run. You can modify the timeout
                limit in HapBound-GC.cfg.

                HapBound-GC is tested on Linux machines, as well as Windows 
                (with cygwin).  

                New: 3/16/06, now HapBound-GC supports gene conversion lower 
		bound computation. Gene conversion is a type of recombination
		with two crossovers. HapBound-GC allows options to be chosen
		to compute a lower bound on the minimum number of 
		single-crossover recombination AND gene conversion. We 
		also allow user to configure the maximum tract length 
		(defined as the length between two breakpoints in gene
		conversion).
 
---------------------------------------------------------------------------

SYNOPSIS: HapBound-GC [ -S[,i#[,w#]] ] [ -M ] [-G[,[[s|d],]#]] data-filename

OPTIONS:
  -S        Activate SDT: self derivability test. By default, this option 
            allows one additional (new) sequence to be inserted into 
            data matrix before checking SDT. To reduce the running time,
            the user can choose to specify -S,i0 so that no new sequences
            is allowed. This typically reduces running time, but may 
            produce a weaker bound. Another way to reduce running time is
            limit the data width (or number of the columns) in the matrix
            SDT is going to be performed. One can use -S,i0,w7, for example,
            to limit SDT be applied only on matrix up to 7 columns, and 
            no additional row sequences. 

  -M        This mode explicitly checks every interval and compute the 
            Haplotype bound directly without taking subsets. Then, we perform
            SDT on this interval. Surprisingly, this simple method sometimes
            produce very good bound, especially when coupled with -S. In 
            other words, HapBound-GC -S -M typically produces the strongest
            bounds. However, we do not recommend -M to be used when the data
            is large, say containing more than 100 columns. 

  -G   	    This mode activates gene conversion lower bound. This option is not
	    to be combined with -S or -M options. 
            Option s means HapBound-GC is to maximize the number of 
	    single-crossovers over all solutions with the smallest combined 
 	    single-crossover and gene-conversion lower bound. Option d, on 
	    the other hand, maximizes the number of gene-conversions
            over all solutions with the smallest combined lower bound. 
            The optional number following s or d is the maximum tract length.

            Examples:
            -G        activates gene conversion lower bound while allowing 
                      arbitrary tract length.
            -G,1000   activates gene conversion while restricting the tract 
                      length to be at most 1000 bp. 
            -G,s      activates gene conversion and also maximizes the number 
 	              of single crossovers as secondary goal. 
            -G,d,1000 activates gene conversion, maximizes gene conversion 
                      as secondary goal, and sets the maximum tract length to 
                      1000.

---------------------------------------------------------------------------

DATA FILE    :  The first line of the data file is ignored.  If you like,
                you may put a description of the data there.
                The data should be in 0,1. (White space is allowed 
                between columns.)  Each sequence should be placed in a its
                own row.

EXAMPLE      :  Example data sets are provided.
                Try running "./HapBound-GC sim1-n20s30r5f5m500.dat" to 
                check that everything works correctly.
		   
CONTACT      :  Please send bug reports and technical questions to  
                <wuyu@cs.ucdavis.edu>. For general questions, contact
                Prof. Dan Gusfield at <dmgusfield@ucdavis.edu>.