- diCal Version 1 [ Link ]
Software accompaniment to "Sheehan, S.*, Harris, K.*, Song, Y.S. Estimating variable effective population sizes from multiple genomes: A sequentially Markov conditional sampling distribution approach. Genetics, 194 (2013) 647-662." "Chan, A.H., Jenkins, P.A., and Song, Y.S.
diCal Version 1 is a scalable demographic inference method based on the sequentially Markov conditional sampling distribution framework. At present, diCal can infer a piecewise-constant population size history from the genomes of multiple individuals sampled from a single population. We are currently working on extending the method to handle more complex demography, incorporating multiple populations, population splits, migration, admixture, etc.
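To show what a piecewise-constant size history means, here is a minimal sketch. It is a generic coalescent illustration, not diCal's code. The function names are hypothetical. Time is assumed to be in coalescent units, and `sizes` holds sizes relative to a reference size, so that a pair of lineages coalesces at rate 1/λ(t).

```python
import bisect
import math


def make_size_history(breakpoints, sizes):
    """Return lambda(t), a piecewise-constant relative population size (hypothetical helper).

    breakpoints: increasing epoch start times (coalescent units), starting at 0.0
    sizes: relative size in each epoch; the last epoch extends to infinity
    """
    if len(breakpoints) != len(sizes) or breakpoints[0] != 0.0:
        raise ValueError("need one size per epoch and a first epoch starting at 0")

    def size_at(t):
        if t < 0:
            raise ValueError("time must be non-negative")
        # Index of the last epoch whose start time is <= t.
        i = bisect.bisect_right(breakpoints, t) - 1
        return sizes[i]

    return size_at


def pair_no_coalescence_prob(breakpoints, sizes, t):
    """P(two lineages have not coalesced by time t) = exp(-integral_0^t ds / lambda(s))."""
    integral = 0.0
    for i, start in enumerate(breakpoints):
        if start >= t:
            break
        end = breakpoints[i + 1] if i + 1 < len(breakpoints) else math.inf
        # Time spent in epoch i before t, weighted by the coalescence rate 1/lambda.
        integral += (min(end, t) - start) / sizes[i]
    return math.exp(-integral)
```

For example, take `breakpoints=[0.0, 0.5, 2.0]` and `sizes=[1.0, 0.2, 2.0]`, which gives a bottleneck between times 0.5 and 2.0. The probability that a pair has not coalesced by t = 1.0 is then exp(-(0.5/1.0 + 0.5/0.2)) = exp(-3).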
Estimating Recombination Rates
- LDhelmet [ Link ]
Software accompaniment to "Chan, A.H., Jenkins, P.A., and Song, Y.S. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genetics, vol. 8 no. 12 (2012) e1003090."
LDhelmet is a statistical method based on reversible-jump Markov chain Monte Carlo (MCMC) and composite likelihood. It samples piecewise-constant recombination maps from their posterior distribution.
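A piecewise-constant map is fully described by its change points and the rate in each segment. A pairwise composite likelihood only needs the recombination rate integrated between each pair of SNPs. The sketch below illustrates both ideas under stated assumptions and is not LDhelmet's implementation. `pair_loglik` is a hypothetical stand-in for a precomputed two-locus likelihood. The `window` parameter, which limits pairs to SNPs at most that many positions apart, is an illustrative choice.

```python
def integrated_rate(change_points, rates, a, b):
    """Integrate a piecewise-constant rate over [a, b].

    rates[i] applies on [change_points[i], change_points[i + 1]); the last
    segment extends to infinity. Positions before change_points[0] contribute 0.
    """
    total = 0.0
    for i, start in enumerate(change_points):
        stop = change_points[i + 1] if i + 1 < len(change_points) else float("inf")
        lo, hi = max(a, start), min(b, stop)
        if hi > lo:
            total += (hi - lo) * rates[i]
    return total


def composite_loglik(positions, change_points, rates, pair_loglik, window=50):
    """Sum two-site log-likelihoods over SNP pairs at most `window` SNPs apart.

    pair_loglik(i, j, rho) is a hypothetical callable returning the log-likelihood
    of the data at SNPs i and j given the recombination rate rho between them.
    """
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, min(n, i + 1 + window)):
            rho = integrated_rate(change_points, rates, positions[i], positions[j])
            total += pair_loglik(i, j, rho)
    return total
```

A reversible-jump sampler over such maps would typically propose adding, removing, or moving change points and changing segment rates. It would then score each proposal with a function like `composite_loglik` together with a prior.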
- Overpaint [ Link ]
Software accompaniment to "Yin, J., Jordan, M.I., and Song, Y.S. Joint estimation of gene conversion rates and mean conversion tract lengths from population SNP data. Proceedings of ISMB 2009, Bioinformatics, 25 (2009) i231-i239."
Overpaint is a C++ package that jointly estimates crossover rates, gene conversion rates, and mean conversion tract lengths from population SNP data.
Short-Read Error Correction
- ECHO [ Link ]
Software accompaniment to "Kao, W.-C., Chan, A.H., and Song, Y.S. ECHO: A reference-free short-read error correction algorithm. Genome Research, 21 (2011) 1181-1192."
De novo Assembly
- Telescoper [ Link ]
Software accompaniment to "Bresler, M., Sheehan, S., Chan, A.H., and Song, Y.S. Telescoper: De novo Assembly of Highly Repetitive Regions. ECCB'12 Special Issue, Bioinformatics, 28 (2012) i311-i317."
Telescoper is a local assembly algorithm designed for short reads from NGS platforms such as Illumina. The reads must come from two libraries: one with a short insert and one with a long insert. The algorithm starts from a user-supplied seed string, assembles a graph of possible extensions, and outputs one path through that graph as a FASTA file. The software is still in beta. We have not yet tested it extensively, and we envision many improvements down the line.
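The idea of extending a seed and writing one path as FASTA can be illustrated with a much-simplified sketch. It is not Telescoper's algorithm. It keeps only the best-supported branch at each step rather than the full extension graph, and it ignores the short- and long-insert library information entirely. All names and defaults (`extend_seed`, `k=21`, `min_support=2`) are hypothetical.

```python
from collections import Counter


def extension_counts(reads, suffix):
    """Count the base that follows each occurrence of `suffix` in the reads."""
    counts = Counter()
    k = len(suffix)
    for read in reads:
        pos = read.find(suffix)
        while pos != -1:
            nxt = pos + k
            if nxt < len(read):
                counts[read[nxt]] += 1
            pos = read.find(suffix, pos + 1)
    return counts


def extend_seed(seed, reads, k=21, min_support=2, max_len=10_000):
    """Greedily extend `seed` to the right, one base at a time (hypothetical helper)."""
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    contig = seed
    seen = {contig[-k:]}
    while len(contig) < max_len:
        counts = extension_counts(reads, contig[-k:])
        if not counts:
            break  # no read extends the current end
        base, support = counts.most_common(1)[0]
        if support < min_support:
            break  # best extension is too weakly supported
        contig += base
        if contig[-k:] in seen:
            break  # k-mer seen before: the path has entered a repeat cycle
        seen.add(contig[-k:])
    return contig


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record wrapped at `width` columns."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

A greedy walk like this is exactly what fails in highly repetitive regions. A repeated k-mer offers several plausible continuations, which is why a method aimed at repeats keeps a graph of alternatives and uses additional evidence such as long-insert pairs.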
Basecaller for the Illumina Platform
- (naive)BayesCall [ Link ]
Software accompaniment to "Kao, W.C., Stevens, K., and Song, Y.S. BayesCall: A model-based basecalling algorithm for high-throughput short-read sequencing. Genome Research, 19 (2009) 1884-1895." and "Kao, W.C. and Song, Y.S. naiveBayesCall: An efficient model-based base-calling algorithm for high-throughput sequencing. Proc. 14th Annual Intl. Conf. on Research in Computational Molecular Biology (RECOMB 2010), Lecture Notes in Computer Science 6044, pages 233-247, 2010."
naiveBayesCall is a new base-calling algorithm that builds on our earlier method, BayesCall, to achieve scalability.
Likelihoods under the Coalescent with Recombination
- ASF [ Link ]
Software accompaniment to "Jenkins, P.A. and Song, Y.S. Closed-form two-locus sampling distributions: accuracy and universality. Genetics, 183 (2009) 1087-1103."
- COB [ Link ]
Software accompaniment to "Lyngsø, R., Song, Y.S., and Hein, J. Accurate computation of likelihoods in the coalescent with recombination via parsimony. Proc. 12th Annual Intl. Conf. on Research in Computational Molecular Biology (RECOMB 2008), Lecture Notes in Computer Science 4955, pages 463-477."
COB is a parsimony-based method of computing likelihoods accurately under the coalescent with recombination.
Multi-locus Match Probability
- Wright_Fisher_MP and Moran_MP [ Link ]
Software accompaniment to "Bhaskar, A. and Song, Y.S. Multi-locus match probability in a finite population: A fundamental difference between the Moran and Wright-Fisher models. Proceedings of ISMB 2009, Bioinformatics, 25 (2009) i187-i195."
Whole-Genome Association Mapping
- BLOSSOC [ Link ]
Software accompaniment to "Ding, Z., Mailund, T., and Song, Y.S. Efficient whole-genome association mapping using local phylogenies for unphased genotype data. Bioinformatics, 24 (2008) 2215-2221."
BLOSSOC combines a recently developed linear-time algorithm for phasing genotypes on trees with a tree-based method for association mapping. From unphased genotype data, it builds local phylogenies along the genome and scores each tree according to how cases and controls cluster within it.
Algorithms for Detecting Recombination
- HapBound and SHRUB [ Link ]
Software accompaniment to "Song, Y.S., Wu, Y., and Gusfield, D. Efficient computation of close lower and upper bounds on the minimum number of recombinations in biological sequence evolution. Proceedings of ISMB 2005, Bioinformatics, 21 Suppl. 1 (2005) i413-i422."
HapBound and SHRUB respectively compute lower and upper bounds on the minimum number of crossover recombinations. SHRUB constructs an ancestral recombination graph for the input data.
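HapBound computes considerably sharper lower bounds than the classical one shown here. As a simple point of reference, the sketch below computes Hudson and Kaplan's R_m bound, which is not HapBound's method. It assumes binary haplotypes under the infinite-sites model with no missing data, given as equal-length strings of 0s and 1s. A pair of sites that shows all four gametes needs at least one crossover between them. Non-overlapping such intervals therefore each need their own event.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Site pairs (i, j), i < j, that fail the four-gamete test."""
    num_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(num_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # 00, 01, 10 and 11 all present
            pairs.append((i, j))
    return pairs


def hudson_kaplan_bound(haplotypes):
    """Maximum number of incompatible intervals with disjoint interiors.

    Intervals may share an endpoint site, because crossovers occur between
    sites; each selected interval then requires a distinct crossover.
    """
    count = 0
    last_end = -1
    # Greedy selection by right endpoint yields a maximum set of non-overlapping intervals.
    for i, j in sorted(incompatible_pairs(haplotypes), key=lambda p: p[1]):
        if i >= last_end:
            count += 1
            last_end = j
    return count
```

For the haplotypes `["0000", "0101", "1010", "1111"]`, every adjacent pair of sites fails the four-gamete test. The bound is therefore 3, one crossover per gap.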
- HapBound-GC and SHRUB-GC [ Link ]
Software accompaniment to "Song, Y.S., Ding, Z., Gusfield, D., Langley, C.H., and Wu, Y. Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover Recombination in the Derivation of SNP Sequences in Populations. Proceedings of RECOMB 2006, Lecture Notes in Computer Science 3909 (2006) 231-245."
HapBound-GC and SHRUB-GC respectively compute lower and upper bounds on the minimum combined number of crossover and gene-conversion recombinations. SHRUB-GC constructs a graphical representation of evolutionary history involving coalescent, mutation, crossover and gene-conversion events.
- Beagle [ Link ]
Software accompaniment to "Lyngsø, R., Song, Y.S., and Hein, J. Minimum Recombination Histories by Branch and Bound. Proceedings of WABI 2005, Lecture Notes in Computer Science, 3692, pp. 239-250."
Beagle computes the minimum number of crossover recombinations. It also produces an ancestral recombination graph.