Demographic Inference

diCal Version 1
[ Link ]
Software accompaniment to
"Sheehan, S.*, Harris, K.*, and Song, Y.S. Estimating variable effective population sizes from multiple genomes: A sequentially Markov conditional sampling distribution approach. Genetics, 194 (2013) 647–662."
diCal Version 1 is a scalable demographic inference method based on the sequentially Markov conditional sampling distribution framework. At present, diCal can infer a piecewise-constant population size history from the genomes of multiple individuals sampled from a single population. We are currently working on extending the method to handle more complex demographies involving multiple populations, population splits, migration, admixture, etc.
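To make "piecewise-constant population size history" concrete, here is a minimal sketch (not diCal's code) of what such a history implies for a pair of lineages: within each epoch the pair coalesces at rate 1 / (relative size), so the probability of no coalescence by time t is exp(-cumulative hazard). The function name `pairwise_survival` and the parameters `boundaries` and `sizes` are hypothetical, and time is assumed to be in coalescent units.

```python
import math

def pairwise_survival(t, boundaries, sizes):
    """Probability that two lineages have not coalesced by time t.

    Hypothetical helper, not part of diCal. Time is in coalescent units.
    boundaries: increasing epoch start times, beginning with 0.0.
    sizes: relative population size in each epoch (same length as boundaries).
    Within epoch i the pair coalesces at rate 1 / sizes[i].
    """
    hazard = 0.0  # cumulative coalescence hazard accumulated up to time t
    for i, start in enumerate(boundaries):
        if t <= start:
            break
        # The last epoch extends to infinity into the past.
        end = boundaries[i + 1] if i + 1 < len(boundaries) else math.inf
        hazard += (min(t, end) - start) / sizes[i]
    return math.exp(-hazard)

# Population half as large during the most recent 0.5 time units.
print(pairwise_survival(1.0, [0.0, 0.5], [0.5, 1.0]))
```

A smaller size in an epoch raises the coalescence rate there, which is the signal a method like diCal exploits when inferring the epoch sizes.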

diCal Version 2
[ Link ]
Software accompaniment to
"Steinrücken, M., Kamm, J.A., and Song, Y.S.
Inference of complex population histories using whole-genome sequences from multiple populations.
bioRxiv Preprint: http://dx.doi.org/10.1101/026591"
diCal Version 2 is an efficient, flexible statistical method that can use whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration.

fastNeutrino
[ Link ]
Software accompaniment to
"Bhaskar, A., Wang, Y.X.R. and Song, Y.S.
Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Research, Vol. 25, No. 2 (2015) 268–279."
fastNeutrino is an efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates.
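The "distribution of sample allele frequencies" is the site frequency spectrum (SFS). As a hedged illustration (not fastNeutrino's interface), the sketch below tabulates the unfolded SFS from 0/1-coded haplotypes and computes the standard constant-size neutral expectation E[ξ_i] = θ/i, the baseline that a piecewise-exponential model departs from. The function names are hypothetical, and the derived allele is assumed to be coded as 1.

```python
from typing import List

def unfolded_sfs(haplotypes: List[List[int]]) -> List[int]:
    """Count segregating sites by derived-allele count i = 1..n-1.

    haplotypes: n rows (sampled chromosomes) x L columns (sites);
    0 = ancestral allele, 1 = derived allele (assumed coding).
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)  # sfs[i - 1] = number of sites with i derived copies
    for site in zip(*haplotypes):
        i = sum(site)
        if 0 < i < n:  # monomorphic sites do not contribute
            sfs[i - 1] += 1
    return sfs

def neutral_expected_sfs(n: int, theta: float) -> List[float]:
    """Expected SFS under a constant-size neutral coalescent: theta / i."""
    return [theta / i for i in range(1, n)]

haps = [[1, 1, 0], [1, 0, 0], [1, 0, 1], [0, 0, 0]]
print(unfolded_sfs(haps), neutral_expected_sfs(len(haps), 6.0))
```

Population growth, for example, produces an excess of low-frequency variants relative to θ/i, which is the kind of deviation a demographic model is fitted to.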

momi
[ Link ]
Software accompaniment to
"Kamm, J.A., Terhorst, J., and Song, Y.S.
Efficient computation of the joint sample frequency spectra for multiple populations.
Journal of Computational and Graphical Statistics, in press. [ Journal ]"
This program computes the expected joint site frequency spectrum (SFS) for a tree-shaped demography without migration, via a multi-population Moran model. It can handle thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). It also computes the "truncated site frequency spectrum" for a single population, i.e., the frequency spectrum for mutations arising after a certain point in time. This can be used in both Moran and coalescent approaches to computing the multi-population SFS.
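The joint SFS is indexed by the number of derived alleles observed in each population. The sketch below is not momi's API; it is a minimal illustration, assuming 0/1-coded haplotypes over a shared set of sites, that tabulates an observed two-population joint SFS, the quantity whose expectation momi computes under a demographic model. The function name `joint_sfs` is hypothetical.

```python
def joint_sfs(pop1, pop2):
    """Observed two-population joint SFS.

    pop1, pop2: lists of haplotypes (rows) over the same sites (columns),
    coded 0 = ancestral, 1 = derived.
    Entry [i][j] counts sites with i derived copies in pop1 and j in pop2.
    Monomorphic sites land in [0][0] or [n1][n2].
    """
    n1, n2 = len(pop1), len(pop2)
    counts = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for s1, s2 in zip(zip(*pop1), zip(*pop2)):
        counts[sum(s1)][sum(s2)] += 1
    return counts

print(joint_sfs([[1, 0, 1], [0, 0, 1]], [[0, 1, 1]]))
```

With more populations the table becomes a d-dimensional array of size (n1 + 1) x ... x (nd + 1), which is why efficient computation of its expectation matters.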
Transition density functions of Wright-Fisher diffusion processes and their applications
spectralTDF
[ Link ]
Software accompaniment to "Steinrücken, M., Jewett, E.M., and Song, Y.S.
SpectralTDF: transition densities of diffusion processes with time-varying selection parameters, mutation rates, and effective population sizes. Bioinformatics, Vol. 32, No. 5 (2016) 795–797."
spectralHMM
[ Link ]
Software accompaniment to "Steinrücken, M., Bhaskar, A. and Song, Y.S.
A novel spectral method for inferring general diploid selection from time series genetic data. Annals of Applied Statistics, Vol. 8, No. 4 (2014) 2203–2222."
Estimating Recombination Rates

LDhelmet
[ Link ]
Software accompaniment to
"Chan, A.H., Jenkins, P.A., and Song, Y.S.
Genome-wide fine-scale recombination rate variation in Drosophila melanogaster.
PLoS Genetics, vol. 8 no. 12 (2012) e1003090."
LDhelmet is a statistical method based on reversible-jump MCMC and composite likelihood. It samples piecewise-constant recombination maps from a posterior distribution.
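A piecewise-constant recombination map can be stored as a sorted list of change points with one rate per segment. The hedged sketch below (not LDhelmet's output format or code) integrates such a map to get the total recombination rate between two positions. The function name `map_distance` is hypothetical, and rates are assumed to be per base pair.

```python
import bisect

def map_distance(breakpoints, rates, x, y):
    """Integrated recombination rate between positions x and y.

    breakpoints: sorted positions [b0, b1, ..., bk];
    rates[i] applies on [b_i, b_{i+1}), so len(rates) == len(breakpoints) - 1.
    x and y are assumed to lie within [b0, bk].
    """
    if x > y:
        x, y = y, x
    total = 0.0
    # Index of the segment containing x.
    i = max(bisect.bisect_right(breakpoints, x) - 1, 0)
    while i < len(rates) and breakpoints[i] < y:
        lo = max(x, breakpoints[i])
        hi = min(y, breakpoints[i + 1])
        if hi > lo:
            total += rates[i] * (hi - lo)  # rate times overlap length
        i += 1
    return total

print(map_distance([0, 10, 20], [0.1, 0.5], 5, 15))
```

Each posterior sample from such a method is one map of this form; averaging `map_distance` over samples would give a posterior mean for an interval.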

Overpaint
[ Link ]
Software accompaniment to
"Yin, J., Jordan, M.I., and Song, Y.S.
Joint estimation of gene conversion rates
and mean conversion tract lengths from population SNP data.
Proceedings of ISMB 2009, Bioinformatics, 25 (2009) i231–i239."
Overpaint is a C++ package that can jointly estimate crossover rates, gene conversion rates, and mean conversion tract lengths from population SNP data.
Accuracy of the Coalescent

Genealogical quantities
[ Link ]
Software accompaniment to
"Bhaskar, A., Clark, A.G., and Song, Y.S. Distortion of genealogical properties when the sample is very large. PNAS, vol. 111 no. 6 (2014) 2385–2390."
Contains several programs to compute various genealogical quantities under Kingman's coalescent and the discrete-time Wright-Fisher model of random mating.
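One elementary way to see how the two models differ is the probability that k lineages undergo no coalescence in a single generation. The sketch below is illustrative only and not one of the distributed programs; it assumes a haploid Wright-Fisher population of N chromosomes, where the exact probability that k lineages pick k distinct parents is the product of (N - j)/N for j = 0..k-1, and compares it with the Kingman approximation exp(-C(k,2)/N). The function names are hypothetical.

```python
from math import comb, exp, prod

def wf_no_coalescence(k, N):
    """Exact P(k lineages choose k distinct parents) in one discrete
    Wright-Fisher generation, haploid population of N chromosomes."""
    return prod((N - j) / N for j in range(k))

def kingman_no_coalescence(k, N):
    """Kingman approximation: pairwise coalescence at total rate C(k, 2)
    per N generations, so no event in one generation w.p. exp(-C(k, 2) / N)."""
    return exp(-comb(k, 2) / N)

for k in (2, 100, 1000):
    print(k, wf_no_coalescence(k, 1000), kingman_no_coalescence(k, 1000))
```

For small k relative to N the two agree closely, while for k comparable to N the exact model allows simultaneous and multiple mergers that the pairwise approximation ignores; with k > N the exact probability is zero, but the approximation stays positive.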
Short-Read Error Correction

ECHO
[ Link ]
Software accompaniment to
"Kao, W.C., Chan, A. H., and Song, Y. S.
ECHO: A reference-free short-read error correction algorithm.
Genome Research,
21 (2011) 1181–1192."
De novo Assembly

Telescoper
[ Link ]
Bresler, M., Sheehan, S., Chan, A.H., and Song, Y.S. Telescoper: De novo Assembly of Highly Repetitive Regions. ECCB'12 Special Issue, Bioinformatics, 28 (2012) i311–i317.
Telescoper is a local assembly algorithm designed for short reads from NGS platforms such as Illumina. The reads must come from two libraries: one with short inserts and one with long inserts. The algorithm begins with a user-given seed string, assembles a graph of possible extensions, and outputs one path of extensions as a FASTA file.
The software is still a beta version. We have not yet tested it extensively, and envision many improvements down the line.
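The idea of growing a seed through a graph of possible extensions can be sketched with k-mers. This is a simplified, hypothetical illustration, not Telescoper's algorithm: it builds a map from each (k-1)-mer to the bases observed after it in the reads, then extends the seed greedily while exactly one extension exists. Telescoper's use of the short- and long-insert libraries to choose among branches in repetitive regions is not reproduced. The names `extension_graph` and `extend_seed` are hypothetical, and the seed is assumed to be at least k-1 bases long.

```python
from collections import defaultdict

def extension_graph(reads, k):
    """Map each (k-1)-mer to the set of bases that follow it in some read."""
    nxt = defaultdict(set)
    for r in reads:
        for i in range(len(r) - k + 1):
            kmer = r[i:i + k]
            nxt[kmer[:-1]].add(kmer[-1])
    return nxt

def extend_seed(seed, reads, k, max_steps=1000):
    """Greedily extend seed while the next base is unambiguous.

    Stops at a branch (several candidate bases), a dead end (none),
    or after max_steps bases, which also guards against cycles.
    """
    graph = extension_graph(reads, k)
    contig = seed
    for _ in range(max_steps):
        choices = graph.get(contig[-(k - 1):], set())
        if len(choices) != 1:
            break
        contig += next(iter(choices))
    return contig

print(extend_seed("ACG", ["ACGTTGCA", "TTGCATCC"], 4))
```

In highly repetitive regions this greedy walk stops early at branch points, which is exactly where paired-library information is needed to pick a path.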
Basecaller for the Illumina Platform

(naive)BayesCall
[ Link ]
Software accompaniment to
"Kao, W.C., Stevens, K. and Song, Y.S.
BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing.
Genome Research,
19 (2009) 1884–1895."
Kao, W.C. and Song, Y.S.
naiveBayesCall: An efficient model-based base-calling algorithm for high-throughput sequencing.
Proc. 14th Annual Intl. Conf. on Research in Computational Molecular Biology
(RECOMB 2010),
Lecture Notes in Computer Science 6044, pages 233–247, 2010.
(A new base-calling algorithm that builds on our previous method BayesCall to achieve scalability.)
Likelihoods under the Coalescent with Recombination

ASF
[ Link ]
Software accompaniment to
"Jenkins, P.A. and Song, Y.S.
Closed-form two-locus sampling distributions: accuracy and universality.
Genetics, 183 (2009) 1087–1103."

COB
[ Link ]
Software accompaniment to
"Lyngsø, R., Song, Y.S., and Hein, J.
Accurate computation of likelihoods in the coalescent with recombination via parsimony.
Proc. 12th Annual Intl. Conf. on Research in Computational Molecular Biology (RECOMB 2008),
Lecture Notes in Computer Science 4955, pages 463–477."
COB is a parsimony-based method for accurately computing likelihoods under the coalescent with
recombination.
Multilocus Match Probability

Wright_Fisher_MP and
Moran_MP
[ Link ]
Software accompaniment to
"Bhaskar, A. and Song, Y.S.
Multilocus match probability in a finite population: A fundamental difference between the Moran and Wright-Fisher models.
Proceedings of ISMB 2009, Bioinformatics, 25 (2009) i187–i195."
Whole-Genome Association Mapping

BLOSSOC
[ Link ]
Software accompaniment to
"Ding, Z., Mailund, T., and Song, Y.S.
Efficient whole-genome association mapping using local phylogenies for
unphased genotype data.
Bioinformatics, 24 (2008) 2215–2221."
This program combines a recently found linear-time algorithm for phasing genotypes on trees with a tree-based method for association mapping. From unphased genotype data, our algorithm builds local phylogenies along the genome and scores each tree according to the clustering of cases and controls.
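One simple way to score "clustering of cases and controls" on a tree is to cut the tree into clusters of samples and test whether case status is unevenly distributed across them. The sketch below uses a Pearson chi-square statistic on the clusters x {case, control} table; this statistic is an assumption chosen for illustration, not necessarily the scoring function BLOSSOC implements, and `cluster_chi2` is a hypothetical name.

```python
def cluster_chi2(clusters, is_case):
    """Pearson chi-square for case/control counts across clusters.

    clusters: list of lists of sample IDs (a partition of the samples,
    e.g. obtained by cutting a local tree).
    is_case: dict mapping sample ID -> True (case) or False (control).
    """
    total = sum(len(c) for c in clusters)
    n_case = sum(1 for c in clusters for s in c if is_case[s])
    n_ctrl = total - n_case
    stat = 0.0
    for c in clusters:
        a = sum(1 for s in c if is_case[s])  # cases in this cluster
        b = len(c) - a                      # controls in this cluster
        for observed, col_total in ((a, n_case), (b, n_ctrl)):
            expected = len(c) * col_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

status = {1: True, 2: True, 3: False, 4: False}
print(cluster_chi2([[1, 2], [3, 4]], status), cluster_chi2([[1, 3], [2, 4]], status))
```

Scanning such a score along the genome, one local tree at a time, highlights regions where the genealogy separates cases from controls.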
Algorithms for Detecting Recombination

HapBound and SHRUB
[ Link ]
Software accompaniment to
"Song, Y.S., Wu, Y. and Gusfield, D.
Efficient computation of close lower and upper bounds on the minimum number of
recombinations in biological sequence evolution,
Proceedings of ISMB 2005.
Bioinformatics, 21, Suppl. 1 (2005) i413–i422."
HapBound and SHRUB respectively compute lower and upper bounds on the minimum number of crossover recombinations.
SHRUB constructs an ancestral recombination graph for the input data.
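For orientation, the sketch below computes the classic Hudson-Kaplan lower bound on the minimum number of recombinations, a simpler and well-known bound that HapBound's bounds are designed to improve on; it is not HapBound's algorithm. Two sites are incompatible if all four gametes 00, 01, 10, 11 occur, which forces a recombination between them, and the bound is the maximum number of pairwise non-overlapping incompatible intervals. Function names are hypothetical, and sites are assumed to be binary and listed in genomic order.

```python
from itertools import combinations

def incompatible(col_a, col_b):
    """Four-gamete test: True if all of 00, 01, 10, 11 appear."""
    return len(set(zip(col_a, col_b))) == 4

def hudson_kaplan_bound(haplotypes):
    """Hudson-Kaplan lower bound on the minimum number of recombinations.

    haplotypes: rows = sequences, columns = binary sites in genomic order.
    """
    cols = list(zip(*haplotypes))
    intervals = [(i, j) for i, j in combinations(range(len(cols)), 2)
                 if incompatible(cols[i], cols[j])]
    # Greedy interval scheduling (sort by right end) gives the maximum number
    # of intervals sharing no gap between adjacent sites.
    count, last_end = 0, -1
    for i, j in sorted(intervals, key=lambda iv: iv[1]):
        if i >= last_end:
            count += 1
            last_end = j
    return count

print(hudson_kaplan_bound([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]))
```

Intervals (0, 1) and (1, 2) count as disjoint because the required recombinations fall in different gaps between adjacent sites.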

HapBound-GC and SHRUB-GC
[ Link ]
Software accompaniment to
"Song, Y.S., Ding, Z., Gusfield, D., Langley, C.H., and Wu, Y.
Algorithms to Distinguish the Role of Gene-Conversion from
Single-Crossover Recombination in the Derivation of SNP Sequences in Populations.
Proceedings of RECOMB 2006.
Lecture Notes in Computer Science 3909, (2006) 231–245."
HapBound-GC and SHRUB-GC respectively compute lower and upper bounds on the minimum combined number of crossover and gene-conversion recombinations.
SHRUB-GC constructs a graphical representation of evolutionary history involving coalescent, mutation, crossover, and gene-conversion events.

Beagle
[ Link ]
Software accompaniment to
"Lyngsø, R., Song, Y.S., and Hein, J.
Minimum Recombination Histories by Branch and Bound.
Proceedings of WABI 2005,
Lecture Notes in Computer Science, 3692, pp. 239–250."
Beagle computes the minimum number of crossover recombinations. It also produces an ancestral recombination graph.