NAME:

    shrub-gc (simulated history recombination upper bound with gene-conversion)

SYNOPSIS:

    shrub-gc [-d #] [-l #] [-b #] [-n #] [-f] [-r] [-e] [-t #] [-g name] data-filename

OPTIONS:

    (See README-SHRUB-GC for a more detailed description of the program.)

    -d #    Display setting.
            0: Only display final results (Default).
            1: Display the original and the reduced data.
            2: Display as in 1 + display local bound matrices for global
               lower bounds.

    -l #    Fast lower bound method to be used in branch and bound.
            1: Hudson and Kaplan's bound.
            2: Approximate haplotype bound using distinct columns and rows.
            (Default: 1)

    -b #    Degree of branching. The maximum number of row removals to be
            tried, in fast upper bound computation, for every matrix in
            step 3' of the algorithm. (Default: 2)

    -n #    Number of runs to be executed. (Default: 3)

    -f      Turn off full branch and bound.

    -r      Root known. The first sequence is assumed to be the root
            sequence.

    -e      Suppress mutation labels on edges.

    -t #    Gene conversion maximum tract length. (Default: 0)

    -g [name]
            Output an ARG with the computed number of recombination events
            into a file. The output is in GML format. (Default: ARG.gml)

NOTE :

    1) In general, choosing a higher degree of branching leads to better
       fast upper bounds, but increases computation time.

    2) If you are not using full branch and bound, the quality of the bound
       generally depends on which options are used. We recommend trying
       both the "-l 1" and "-l 2" options. Depending on the data set, one
       may work much better than the other. The -b and -n options should
       be explored as well.

EXAMPLES :

    shrub-gc -t 100 datafile
    shrub-gc -t 100 -b 1 -n 50 datafile
    shrub-gc -t 100 -b 3 -n 1 datafile
    shrub-gc -t 100 -l 1 -n 3 datafile
    shrub-gc -t 100 -l 1 -f datafile
    shrub-gc -t 100 -g graphics.gml datafile
    shrub-gc -t 100 -l 1 -f -g graphics.gml datafile

---------------------------------------------------------------------------

COMPILATION :

    A makefile is provided.
    Simply typing "make" should compile and link the program correctly on
    most platforms.

DATA FILE :

    The first line of the data file should contain the physical SNP
    positions. Each data entry should be 0 or 1. (White space is allowed
    between columns.) Each sequence should be placed in its own row.

EXAMPLE :

    Example data sets are included. Use those data sets to check that
    everything works correctly.

GRAPH VIEWING:

    The GML file generated by the program can be viewed using software of
    your choice. We recommend VGJ (Visualizing Graphs with Java), which can
    be downloaded free of charge from the following webpage:

    http://www.eng.auburn.edu/department/cse/research/graph_drawing/graph_drawing.html

    NOTE: You need to place the GIF files ("*-ball.gif") in the main VGJ
    directory.

        * red-ball.gif    (for leaves)
        * blue-ball.gif   (for crossover vertices)
        * green-ball.gif  (for gene-conversion vertices)
        * white-ball.gif  (for coalescent vertices)

    Label below a crossover vertex:
        (i,j) denotes the interval containing the crossover breakpoint
    Label below a gene-conversion vertex:
        [i,j] denotes the interval contained in the gene-conversion tract

    1) Go to the main VGJ directory. (default: graph_drawing)

    2) Start VGJ. (java EDU/auburn/VGJ/VGJ &)

    3) Click on "Start a Graph Window" to bring up a viewing window.

    4) On the menu bar, choose "File" and then "Open (GML)."

    5) When you first open a GML file generated by SHRUB-GC, everything
       looks clustered at a point. To expand the graph, choose
       "Algorithms" --> "Tree" --> "Tree Down"

    6) Although step (5) makes the graph viewable, you might still wish to
       rearrange certain things. For example, some directed edges might
       point upward instead of downward. To rearrange the graph manually,
       click on "Select Nodes" under "Mouse Action," located to the left
       of the graph. Using the mouse, try rearranging the nodes to your
       liking.

BUG REPORT :

    Please report bugs to Yun S. Song
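
DATA FILE CHECK (ILLUSTRATIVE SKETCH) :

    The layout described under DATA FILE can be checked before running the
    program. The Python sketch below is not part of SHRUB-GC; the function
    name parse_data_file and the sample positions are hypothetical. It also
    makes two assumptions that the description above does not confirm: that
    positions are numbers separated by white space, and that there is
    exactly one position per data column.

```python
# Hypothetical helper, not part of SHRUB-GC: checks text against the layout
# described under DATA FILE (position line first, then one 0/1 row per sequence).

def parse_data_file(text):
    """Return (positions, rows) parsed from the text of a data file.

    Assumptions (not confirmed by the SHRUB-GC documentation): positions are
    whitespace-separated numbers, and there is one position per column.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("need a position line and at least one sequence")

    positions = [float(token) for token in lines[0].split()]

    rows = []
    for number, line in enumerate(lines[1:], start=2):
        row = "".join(line.split())  # white space between columns is allowed
        if set(row) - {"0", "1"}:
            raise ValueError(f"line {number}: entries must be 0 or 1")
        if len(row) != len(positions):
            raise ValueError(
                f"line {number}: expected {len(positions)} columns, got {len(row)}"
            )
        rows.append(row)
    return positions, rows


# Made-up sample data: four SNP positions, three sequences.
example = """100 250 400 900
0 0 1 1
0101
1 1 0 0
"""
positions, rows = parse_data_file(example)
print(len(rows), "sequences,", len(positions), "sites")  # 3 sequences, 4 sites
```

    A file that passes this check only matches the layout as read here;
    the example data sets shipped with the program remain the reference
    for what SHRUB-GC actually accepts.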