GitHub
-
Pop Gen Methods GitHub
[ Link ]
Various population genetics methods developed by our lab.
-
Song Lab @ Cal GitHub
[ Link ]
Other projects.
Demographic Inference
-
diCal Version 1
[ Link ]
Software accompaniment to
"Sheehan, S.*, Harris, K.*, Song, Y.S. Estimating variable effective population sizes from multiple genomes: A sequentially Markov conditional sampling distribution approach. Genetics, 194 (2013) 647-662. [ Journal ]"
diCal Version 1 is a scalable demographic inference method based on the sequentially Markov conditional sampling distribution framework. At present, diCal can infer a piecewise-constant population size history from the genomes of multiple individuals sampled from a single population. We are currently working on extending the method to handle more complex demography, incorporating multiple populations, population splits, migration, admixture, etc.
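To make "piecewise-constant population size history" concrete, here is a minimal sketch (not diCal code) of how such a history determines the probability that two lineages have not yet coalesced by a given time. The function name, the argument layout, and the convention of measuring time in generations with diploid sizes are assumptions made for illustration; the only fact used is that a pair of lineages coalesces at rate 1/(2N) per generation in a diploid population of size N.

```python
import math


def survival_probability(t, breakpoints, sizes):
    """Probability that two lineages have not coalesced by generation t.

    breakpoints: increasing epoch start times in generations, beginning at 0.
    sizes: diploid effective size in each epoch (hypothetical layout);
           the last epoch extends indefinitely into the past.
    """
    hazard = 0.0
    for k, size in enumerate(sizes):
        start = breakpoints[k]
        end = breakpoints[k + 1] if k + 1 < len(breakpoints) else math.inf
        if t <= start:
            break
        # Pairwise coalescence rate is 1 / (2N) per generation in this epoch.
        hazard += (min(t, end) - start) / (2.0 * size)
    return math.exp(-hazard)


# Two epochs: N = 1000 for the most recent 100 generations, N = 500 before that.
p = survival_probability(200, [0, 100], [1000, 500])
```

An inference method of this kind would treat the entries of `sizes` as the parameters to estimate; the sketch only evaluates the model for fixed values.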
-
diCal Version 2
[ Link ]
Software accompaniment to
"Steinrücken, M., Kamm, J.A., and Song, Y.S.
Inference of complex population histories using whole-genome sequences from multiple populations.
[ Preprint ] "
diCal Version 2 is an efficient, flexible statistical method that can utilize whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration.
-
SMC++
[ Link ]
Software accompaniment to
"Terhorst, J., Kamm, J.A., and Song, Y.S.
Robust and scalable inference of population history from hundreds of unphased whole genomes.
Nature Genetics, Vol. 49 (2017) 303-309.
[ Journal ]"
SMC++ is a new inference method that combines the computational efficiency of the SFS and the advantage of utilizing LD information in coalescent HMMs.
It requires only unphased genomes, and can jointly infer population size histories and split times in diverged populations. It employs a novel spline regularization scheme that greatly reduces estimation error.
-
fastNeutrino
[ Link ]
Software accompaniment to
"Bhaskar, A., Wang, Y.X.R. and Song, Y.S.
Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Research, Vol. 25, No. 2 (2015) 268-279. [ Journal ]"
fastNeutrino is an efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates.
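For readers unfamiliar with the input summary, the "distribution of sample allele frequencies" is commonly tabulated as a site frequency spectrum. The sketch below is not part of fastNeutrino; it assumes a hypothetical input layout in which each haplotype is a list of 0/1 alleles (1 meaning derived) and counts how many segregating sites carry each derived-allele count.

```python
def site_frequency_spectrum(haplotypes):
    """Count segregating sites by derived-allele count.

    haplotypes: list of equal-length lists of 0/1 alleles (assumed layout).
    Returns a list whose entry i - 1 is the number of sites at which
    exactly i of the n sampled haplotypes carry the derived allele.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        derived = sum(site)
        if 0 < derived < n:  # skip monomorphic sites
            sfs[derived - 1] += 1
    return sfs


sample = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
spectrum = site_frequency_spectrum(sample)  # [2, 0, 1]
```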
-
momi
[ Link ]
Software accompaniment to
"Kamm, J.A., Terhorst, J., and Song, Y.S.
Efficient computation of the joint sample frequency spectra for multiple populations.
Journal of Computational and Graphical Statistics, Vol. 26, No. 1 (2017) 182-194. [ Journal ]"
momi computes the expected joint site frequency spectrum (SFS) for a tree-shaped demography without migration, via a multipopulation Moran model. It can handle thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). It also computes the "truncated site frequency spectrum" for a single population, i.e. the frequency spectrum for mutations arising after a certain point in time. This can be used in both Moran and coalescent approaches to computing the multipopulation SFS.
-
momi2
[ Link ]
Software accompaniment to
"Kamm, J.A., Terhorst, J., Durbin, R., and Song, Y.S.
Efficiently inferring the demographic history of many populations with allele count data.
[ Preprint ] "
momi2 computes the expected multipopulation SFS and linear functionals of it, for demographies described by general directed acyclic graphs.
It also performs optimization over the parameter space by utilizing automatic differentiation to compute gradients of the SFS.
Transition density functions of WF diffusion processes and their applications
spectralTDF
[ Link ]
Software accompaniment to "Steinrücken, M., Jewett, E.M., and Song, Y.S.
SpectralTDF: transition densities of diffusion processes with time-varying selection parameters, mutation rates, and effective population sizes. Bioinformatics, Vol. 32, No. 5 (2016) 795-797."
spectralHMM
[ Link ]
Software accompaniment to "Steinrücken, M., Bhaskar, A. and Song, Y.S.
A novel spectral method for inferring general diploid selection from time series genetic data. Annals of Applied Statistics, Vol. 8, No. 4 (2014) 2203-2222."
Estimating Recombination Rates
-
LDhelmet
[ Link ]
Software accompaniment to
"Chan, A.H., Jenkins, P.A., and Song, Y.S.
Genome-wide fine-scale recombination rate variation in Drosophila melanogaster.
PLoS Genetics, vol. 8 no. 12 (2012) e1003090."
LDhelmet is a statistical method based on reversible jump MCMC and composite likelihood. It samples piecewise constant recombination maps from a posterior distribution.
-
Overpaint
[ Link ]
Software accompaniment to
"Yin, J., Jordan, M.I., and Song, Y.S.
Joint estimation of gene conversion rates
and mean conversion tract lengths from population SNP data,
Proceedings of ISMB 2009, Bioinformatics, 25 (2009) i231-i239."
Overpaint is a C++ package that can jointly estimate crossover rates, gene conversion rates, and mean conversion tract lengths from population SNP data.
Accuracy of the Coalescent
-
Genealogical quantities
[ Link ]
Software accompaniment to
"Bhaskar, A., Clark, A.G., and Song, Y.S. Distortion of genealogical properties when the sample is very large. PNAS, vol. 111 no. 6 (2014) 2385-2390."
Contains several programs to compute various genealogical quantities under Kingman's coalescent and the discrete-time Wright-Fisher models of random mating.
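As a point of reference for the coalescent side of that comparison, two standard expectations under Kingman's coalescent can be computed directly. This is a minimal sketch and not the released programs; the function names are hypothetical, and time is measured in coalescent units, in which each pair of lineages coalesces at rate 1.

```python
def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages.

    While k lineages remain, the waiting time to the next coalescence
    has mean 2 / (k (k - 1)) in coalescent units.
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def expected_total_branch_length(n):
    """Expected total branch length of the genealogy of n lineages.

    Each of the k lineages contributes the waiting time above,
    giving k * 2 / (k (k - 1)) = 2 / (k - 1) per epoch.
    """
    return sum(2.0 / (k - 1) for k in range(2, n + 1))


tmrca_10 = expected_tmrca(10)                 # equals 2 * (1 - 1/10)
length_10 = expected_total_branch_length(10)  # equals 2 * (1 + 1/2 + ... + 1/9)
```

These closed-form sums apply to the coalescent only; the corresponding quantities under the discrete-time Wright-Fisher model are not computed by this sketch.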
Short-Read Error Correction
-
ECHO
[ Link ]
Software accompaniment to
"Kao, W.-C., Chan, A. H., and Song, Y. S.
ECHO: A reference-free short-read error correction algorithm,
Genome Research,
21 (2011) 1181-1192."
De novo Assembly
-
Telescoper
[ Link ]
Software accompaniment to "Bresler, M., Sheehan, S., Chan, A.H., and Song, Y.S. Telescoper: De novo Assembly of Highly Repetitive Regions. ECCB'12 Special Issue, Bioinformatics, 28 (2012) i311-i317."
Telescoper is a local assembly algorithm designed for short reads from NGS platforms such as Illumina. The reads must come from two libraries: one with a short insert size and one with a long insert size. Starting from a user-supplied seed string, the algorithm assembles a graph of possible extensions and prints one path of extensions as a FASTA file.
The software is still a beta version. We have not yet tested it extensively, and envision many improvements down the line.
Basecaller for the Illumina Platform
-
(naive)BayesCall
[ Link ]
Software accompaniment to
"Kao, W.C., Stevens, K. and Song, Y.S.
BayesCall: A model-based basecalling algorithm for high-throughput short-read sequencing.
Genome Research,
19 (2009) 1884-1895."
Kao, W.C. and Song, Y.S.
naiveBayesCall: An efficient model-based base-calling algorithm for high-throughput sequencing.
Proc. 14th Annual Intl. Conf. on Research in Computational Molecular Biology
(RECOMB 2010),
Lecture Notes in Computer Science 6044, pages 233--247, 2010.
(A new base-calling algorithm that builds on our previous method BayesCall to achieve scalability.)
Likelihoods under the Coalescent with Recombination
-
ASF
[ Link ]
Software accompaniment to
"Jenkins, P.A. and Song, Y.S.
Closed-form two-locus sampling distributions: accuracy and universality.
Genetics, 183 (2009) 1087-1103."
-
COB
[ Link ]
Software accompaniment to
"Lyngsø, R., Song, Y.S., and Hein, J.
Accurate computation of likelihoods in the coalescent with recombination via parsimony.
Proc. 12th Annual Intl. Conf. on Research in Computational Molecular Biology (RECOMB 2008),
Lecture Notes in Computer Science 4955, pages 463--477."
COB is a parsimony-based method of computing likelihoods accurately under the coalescent with recombination.
Multi-locus Match Probability
-
Wright_Fisher_MP and Moran_MP
[ Link ]
Software accompaniment to
"Bhaskar, A. and Song, Y.S.
Multi-locus match probability in a finite population: A fundamental difference between the Moran and Wright-Fisher models.
Proceedings of ISMB 2009, Bioinformatics, 25 (2009) i187-i195."
Whole-Genome Association Mapping
-
BLOSSOC
[ Link ]
Software accompaniment to
"Ding, Z., Mailund, T., and Song, Y.S.
Efficient whole-genome association mapping using local phylogenies for
unphased genotype data.
Bioinformatics, 24 (2008) 2215-2221."
This program combines a recently found linear-time algorithm for phasing genotypes on trees with a tree-based method for association mapping. From unphased genotype data, our algorithm builds local phylogenies along the genome, and scores each tree according to the clustering of cases and controls.
Algorithms for Detecting Recombination
-
HapBound and SHRUB
[ Link ]
Software accompaniment to
"Song, Y.S., Wu, Y. and Gusfield, D.
Efficient computation of close lower and upper bounds on the minimum number of
recombinations in biological sequence evolution,
Proceedings of ISMB 2005.
Bioinformatics, 21, Suppl.1, (2005) i413-i422."
HapBound and SHRUB respectively compute lower and upper bounds on the minimum number of crossover recombinations.
SHRUB constructs an ancestral recombination graph for the input data.
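To illustrate what a lower bound on the minimum number of recombinations looks like, the sketch below implements the classic four-gamete test and the Hudson-Kaplan bound built on it. This is a simpler and generally weaker bound than the one HapBound computes, and it is not the HapBound algorithm; the function names and the input layout (equal-length strings of 0/1 characters, one per haplotype, with at most one mutation per site assumed) are illustrative choices.

```python
from itertools import combinations


def four_gamete_incompatible(col_a, col_b):
    """True if all four gametes 00, 01, 10, 11 occur at a pair of sites."""
    return len(set(zip(col_a, col_b))) == 4


def hudson_kaplan_bound(haplotypes):
    """Lower bound on the number of crossovers via disjoint incompatible pairs.

    haplotypes: list of equal-length strings over {'0', '1'} (assumed layout).
    """
    n_sites = len(haplotypes[0])
    columns = [[h[s] for h in haplotypes] for s in range(n_sites)]
    # Each incompatible pair (i, j) needs a crossover somewhere between i and j.
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_incompatible(columns[i], columns[j])
    ]
    # Greedily pick the largest set of non-overlapping intervals;
    # intervals that only share an endpoint site do not overlap.
    intervals.sort(key=lambda interval: interval[1])
    count, last_right = 0, 0
    for left, right in intervals:
        if left >= last_right:
            count += 1
            last_right = right
    return count


bound = hudson_kaplan_bound(["000", "011", "101", "110"])  # 2
```

Sorting by right endpoint before the greedy pass is the standard way to select a maximum set of disjoint intervals, which is what makes the count a valid lower bound.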
-
HapBound-GC and SHRUB-GC
[ Link ]
Software accompaniment to
"Song, Y.S., Ding, Z., Gusfield, D., Langley, C.H., and Wu, Y.
Algorithms to Distinguish the Role of Gene-Conversion from
Single-Crossover Recombination in the Derivation of SNP Sequences in Populations,
Proceedings of RECOMB 2006.
Lecture Notes in Computer Science 3909, (2006) 231-245."
HapBound-GC and SHRUB-GC respectively compute lower and upper bounds on the minimum combined number of crossover and gene-conversion recombinations.
SHRUB-GC constructs a graphical representation of evolutionary history involving coalescent, mutation, crossover and gene-conversion events.
-
Beagle
[ Link ]
Software accompaniment to
"Lyngsø, R., Song, Y.S., and Hein, J.
Minimum Recombination Histories by Branch and Bound.
Proceedings of WABI 2005,
Lecture Notes in Computer Science, 3692, pp. 239-250."
Beagle computes the minimum number of crossover recombinations. It also produces an ancestral recombination graph.