1. DISENTANGLING THE EFFECTS OF GEOGRAPHIC AND ECOLOGICAL ISOLATION ON GENETIC DIFFERENTIATION
http://onlinelibrary.wiley.com/doi/10.1111/evo.12193/full
Populations can be genetically isolated both by geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user-friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte data sets.
http://genescape.ucdavis.edu/scripts-and-code/
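The covariance idea in the abstract above (allele-frequency covariance that decays with both geographic and ecological distance) can be sketched in a few lines. This is only an illustration and not the BEDASSLE code: the functional form, the function name `decay_covariance`, and the parameter names `a0`, `a_d`, `a_e`, `a2` are assumptions made for this example.

```python
import math

def decay_covariance(geo, eco, a0=1.0, a_d=1.0, a_e=1.0, a2=1.0):
    """Covariance matrix that decreases with geographic and ecological distance.

    Assumed illustrative form: (1 / a0) * exp(-(a_d * D + a_e * E) ** a2),
    where D and E are pairwise geographic and ecological distance matrices.
    """
    n = len(geo)
    return [
        [(1.0 / a0) * math.exp(-((a_d * geo[i][j] + a_e * eco[i][j]) ** a2))
         for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical populations along a line; the third sits in a different habitat.
geo = [[0.0, 1.0, 2.0],
       [1.0, 0.0, 1.0],
       [2.0, 1.0, 0.0]]
eco = [[0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0],
       [1.0, 1.0, 0.0]]

cov = decay_covariance(geo, eco, a_d=0.5, a_e=2.0)
# Ratio of the two effect sizes: how much one unit of ecological distance
# counts relative to one unit of geographic distance in this toy setup.
effect_ratio = 2.0 / 0.5
```

In this toy setup, populations 2 and 3 are as far apart geographically as populations 1 and 2, but their covariance is much lower because they also differ ecologically.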
2. INTEGRATING LANDSCAPE GENOMICS AND SPATIALLY EXPLICIT APPROACHES TO DETECT LOCI UNDER SELECTION IN CLINAL POPULATIONS
http://onlinelibrary.wiley.com/doi/10.1111/evo.12237/abstract
Uncovering the genetic basis of adaptation hinges on the ability to detect loci under selection. However, population genomics outlier approaches to detect selected loci may be inappropriate for clinal populations or those with unclear population structure because they require that individuals be clustered into populations. An alternate approach, landscape genomics, uses individual-based approaches to detect loci under selection and reveal potential environmental drivers of selection. We tested four landscape genomics methods on a simulated clinal population to determine their effectiveness at identifying a locus under varying selection strengths along an environmental gradient. We found all methods produced very low type I error rates across all selection strengths, but elevated type II error rates under “weak” selection. We then applied these methods to an AFLP genome scan of an alpine plant, Campanula barbata, and identified five highly supported candidate loci associated with precipitation variables. These loci also showed spatial autocorrelation and cline patterns indicative of selection along a precipitation gradient. Our results suggest that landscape genomics in combination with other spatial analyses provides a powerful approach for identifying loci potentially under selection and explaining spatially complex interactions between species and their environment.
Wednesday, October 2, 2013
Sunday, September 22, 2013
Population genomics from pool sequencing
Keywords:
- Pool sequencing;
- High throughput sequencing;
- Neutrality tests;
- Composite likelihood estimators;
- Genetic differentiation
Abstract
Next generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θW, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, McDonald-Kreitman and HKA tests and Fst, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics varies with read depth and sample size in different evolutionary scenarios. As an application, we reanalyze sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budget, study of communities of individuals that are hard to isolate, or autopolyploid species.
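For orientation, here is a minimal sketch of three of the statistics named above (Watterson's θW, pairwise diversity Π and Tajima's D), computed from an unfolded site frequency spectrum of individually sequenced chromosomes. These are the standard textbook estimators and do not include the paper's corrections for pooling, sequencing error or ascertainment bias. The function name `sfs_statistics` is made up for this example.

```python
import math

def sfs_statistics(sfs):
    """Return (theta_W, pi, Tajima's D) for a whole region.

    sfs[i - 1] is the number of sites where the derived allele is seen
    i times in a sample of n chromosomes, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    seg_sites = sum(sfs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1
    # Average number of pairwise differences over all n-choose-2 pairs.
    pairs = n * (n - 1) / 2.0
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs
    # Constants for the variance of (pi - theta_W), following Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (pi - theta_w) / math.sqrt(variance) if variance > 0 else float("nan")
    return theta_w, pi, tajima_d

# A spectrum proportional to 1/i (the neutral expectation) for n = 5 chromosomes.
theta_w, pi, tajima_d = sfs_statistics([12, 6, 4, 3])
```

With a spectrum proportional to 1/i, both estimators come out at about 12 and Tajima's D is approximately zero, which is the expected behaviour under neutrality.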
Sunday, August 18, 2013
Theoretical Evolutionary Genetics - draft text
1. http://evolution.genetics.washington.edu/pgbook/pgbook.html
This would be a very good book on population genetics.
- Theoretical Evolutionary Genetics notes by Joseph Felsenstein of the University of Washington. Felsenstein kindly provides these free of charge at: http://evolution.genetics.washington.edu/pgbook/pgbook.html
3. from Whitlock at UBC
Saturday, August 17, 2013
Thursday, July 18, 2013
Inferring Demography from Runs of Homozygosity in Whole Genome Sequence, with Correction for Sequence Errors
Whole genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (2011) PSMC method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarises the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
Our inference method can be applied to any outbred diploid species for one or multiple individuals without the need to phase the data into haplotypes.
http://mbe.oxfordjournals.org/content/early/2013/07/10/molbev.mst125.abstract
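The raw summary the abstract relies on, the distribution of runs of homozygosity, is easy to illustrate. The sketch below only measures the lengths of homozygous stretches between heterozygous sites in a single diploid genome; it does not implement the paper's error correction or its LD-based predictor. The function name and the toy positions are assumptions for this example.

```python
from collections import Counter

def homozygous_run_lengths(het_positions, seq_length):
    """Lengths of homozygous stretches between heterozygous sites.

    het_positions are 0-based positions of heterozygous sites in a
    sequence of seq_length sites; zero-length runs are skipped.
    """
    runs = []
    previous = -1  # virtual position just before the first site
    for pos in sorted(het_positions):
        length = pos - previous - 1
        if length > 0:
            runs.append(length)
        previous = pos
    tail = seq_length - previous - 1
    if tail > 0:
        runs.append(tail)
    return runs

runs = homozygous_run_lengths([3, 4, 10], seq_length=15)
run_distribution = Counter(runs)  # run length -> number of runs of that length
print(runs)  # [3, 5, 4]
```

Long runs point to recent common ancestry of the two haplotypes and short runs to older ancestry, which is why the distribution of run lengths carries information about past population size.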
Tuesday, April 30, 2013
Tuesday, April 16, 2013
Saturday, April 6, 2013
A good GUI tool for sequence analysis
1. Unipro UGENE
http://ugene.unipro.ru/
http://bioinformatics.oxfordjournals.org/content/28/8/1166
- CAP3
- Bowtie-0.12.7
- BWA-0.5.9
- blast-2.2.25
- blast-2.2.25+
- clustalw-2.1
- mafft-6.847
- T-Coffee-8.99
- Mr.Bayes-3.2.0
Summary: Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze and visualize their data. UGENE integrates widely used bioinformatics tools within a common user interface. The toolkit supports multiple biological data formats and allows the retrieval of data from remote data sources. It provides visualization modules for biological objects such as annotated genome sequences, Next Generation Sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees and 3D structures. Most of the integrated algorithms are tuned for maximum performance by the usage of multithreading and special processor instructions. UGENE includes a visual environment for creating reusable workflows that can be launched on local resources or in a High Performance Computing (HPC) environment. UGENE is written in C++ using the Qt framework. The built-in plugin system and structured UGENE API make it possible to extend the toolkit with new functionality.
Availability and implementation: UGENE binaries are freely available for MS Windows, Linux and Mac OS X at http://ugene.unipro.ru/download.html. UGENE code is licensed under the GPLv2; the information about the code licensing and copyright of integrated tools can be found in the LICENSE.3rd_party file provided with the source bundle.
Sunday, March 31, 2013
PGML - plant genome mapping lab
http://www.plantgenome.uga.edu/personnel.html
MULTIPLE COLLINEARITY SCAN - MCSCAN
MCScanX-transposed is a software package able to detect transposed gene duplications that occurred within different epochs based on applying MCScanX within and between related genomes, also useful for integrative analysis of gene duplication modes and annotating a gene family of interest with gene duplication modes.
MCScan is an algorithm to scan multiple genomes or subgenomes to identify putative homologous chromosomal regions, then align these regions using genes as anchors. MCScanXtoolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity and extends the software by incorporating 15 utility programs for display and further analyses. Compared with MCScan version 0.8, MCScanX has the following new features:
Friday, March 29, 2013
FaBox (1.41) - an online fasta sequence toolbox
1. FaBox,
http://onlinelibrary.wiley.com/doi/10.1111/j.1471-8286.2007.01821.x/abstract
FaBox is a collection of simple and intuitive web services that enable biologists to quickly perform typical tasks with sequence data. The services make it easy to extract, edit, and replace sequence headers and join or divide data sets based on header information. Other services include collapsing a set of sequences into haplotypes and automated formatting of input files for a number of population genetics and phylogenetic programs, such as Arlequin, TCS and MrBayes. The toolbox is expected to grow on the basis of requests for particular services and converters in the future.
2. download
http://users-birc.au.dk/biopv/php/fabox/faq.php
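One of the services described above, collapsing a set of sequences into haplotypes, can be sketched offline as follows. This is not FaBox's code; it is a minimal illustration that treats sequences as the same haplotype only when they are identical after upper-casing, and the function names are made up for the example.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def collapse_haplotypes(records):
    """Group sequence headers by identical sequence (case-insensitive)."""
    haplotypes = {}
    for header, seq in records:
        haplotypes.setdefault(seq.upper(), []).append(header)
    return haplotypes

fasta_text = """>ind1
ACGTAC
>ind2
ACGTTC
>ind3
acgtac
"""
haplotypes = collapse_haplotypes(parse_fasta(fasta_text))
for number, (seq, headers) in enumerate(haplotypes.items(), start=1):
    print(f"Hap{number}\t{seq}\t{len(headers)}\t{','.join(headers)}")
```

Here ind1 and ind3 collapse into one haplotype and ind2 forms a second.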
Thursday, March 28, 2013
labs in evolutionary genetics
1. Barker Lab, on genome duplication
http://barkerlab.net/
2. Yaniv Brandvain
Population genetics of speciation and mating system evolution
http://yanivbrandvain.wordpress.com/publications/
3. Coop Lab
Population and evolutionary genetics
http://gcbias.org/publications/
Monday, March 25, 2013
Friday, March 22, 2013
SPAms - helps set up ms simulations
SPAms (Simulation Program for the Analysis of ms)
SPAms is a user-friendly interface for simulating genetic data under several demographic scenarios. It uses the ms program, developed by Richard Hudson (2002), as the engine for simulating the genetic data. The program ms can be downloaded below or from Hudson's webpage. SPAms was written in MATLAB, so the files needed to run it depend on whether you have MATLAB installed, and the downloadable package is divided into several files. You do not need MATLAB to run SPAms. If you have MATLAB, you should not need the MCR Installer file. There is a set of examples for which we provide the R scripts (to analyse the outputs). We recommend that you read the user guide before starting to use the program. Do not start with a large number of simulations and large data sets before you understand how much memory and time you will need to carry out your simulations.
[download ms]
[download User guide]
[download SPAms]
[download Script for Examples]
[download MCRInstaller: 83.3 MB] only required if you do not have Matlab
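For readers who want to see what a front end like this has to produce, here is a rough sketch of assembling an ms command line by hand. It is not part of SPAms: `build_ms_command` is a hypothetical helper, and it only uses the basic ms arguments (sample size, number of replicates, `-t` for theta) plus `-eN` for a past change in population size.

```python
def build_ms_command(n_samples, n_reps, theta, size_changes=()):
    """Build an argument list for Hudson's ms.

    size_changes is a sequence of (time, relative_size) pairs; each one
    becomes an `-eN time relative_size` switch, with time measured in
    units of 4*N0 generations as ms expects.
    """
    cmd = ["ms", str(n_samples), str(n_reps), "-t", str(theta)]
    for time, relative_size in sorted(size_changes):
        cmd += ["-eN", str(time), str(relative_size)]
    return cmd

# 20 chromosomes, 100 replicates, theta = 5, population half as large before t = 0.1.
cmd = build_ms_command(20, 100, 5.0, size_changes=[(0.1, 0.5)])
print(" ".join(cmd))  # ms 20 100 -t 5.0 -eN 0.1 0.5
```

The list form can be passed to `subprocess.run(cmd, capture_output=True, text=True)` if the ms binary is on the PATH. As the description above advises, start with a small number of replicates to gauge memory and run time.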
Wednesday, March 20, 2013
A Geospatial Modelling Approach Integrating Archaeobotany and Genetics to Trace the Origin and Dispersal of Domesticated Plants
Background
The study of the prehistoric origins and dispersal routes of domesticated plants is often based on the analysis of either archaeobotanical or genetic data. As more data become available, spatially explicit models of crop dispersal can be used to combine different types of evidence.
Methodology/Principal Findings
We present a model in which a crop disperses through a landscape that is represented by a conductance matrix. From this matrix, we derive least-cost distances from the geographical origin of the crop and use these to predict the age of archaeological crop remains and the heterozygosity of crop populations. We use measures of the overlap and divergence of dispersal trajectories to predict genetic similarity between crop populations. The conductance matrix is constructed from environmental variables using a number of parameters. Model parameters are determined with multiple-criteria optimization, simultaneously fitting the archaeobotanical and genetic data. The consilience reached by the model is the extent to which it converges around solutions optimal for both archaeobotanical and genetic data. We apply the modelling approach to the dispersal of maize in the Americas.
Conclusions/Significance
The approach makes possible the integrative inference of crop dispersal processes, while controlling model complexity and computational requirements.
van Etten J, Hijmans RJ (2010) A Geospatial Modelling Approach Integrating Archaeobotany and Genetics to Trace the Origin and Dispersal of Domesticated Plants. PLoS ONE 5(8): e12060. doi:10.1371/journal.pone.0012060
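The least-cost distance step in the abstract can be illustrated on a small raster. The sketch below runs Dijkstra's algorithm over a grid of conductance values. It is not the authors' implementation: the four-neighbour movement rule, the step cost (the reciprocal of the mean conductance of the two cells) and the treatment of zero-conductance cells as barriers are all assumptions made for this example.

```python
import heapq

def least_cost_distances(conductance, origin):
    """Least-cost distance from origin (row, col) to every grid cell."""
    rows, cols = len(conductance), len(conductance[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    r0, c0 = origin
    dist[r0][c0] = 0.0
    queue = [(0.0, r0, c0)]
    while queue:
        d, r, c = heapq.heappop(queue)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if conductance[nr][nc] <= 0:
                continue  # assumed rule: zero conductance is a barrier
            mean_conductance = (conductance[r][c] + conductance[nr][nc]) / 2.0
            new_dist = d + 1.0 / mean_conductance
            if new_dist < dist[nr][nc]:
                dist[nr][nc] = new_dist
                heapq.heappush(queue, (new_dist, nr, nc))
    return dist

# Toy landscape: the middle cell of the top row is a barrier.
grid = [[1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0]]
distances = least_cost_distances(grid, origin=(0, 0))
print(distances[0][2])  # 4.0, the path has to go around the barrier
```

In a model of the kind described, distances like these from the crop's origin would then serve as predictors for the age of archaeological remains and for heterozygosity.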
Tuesday, March 19, 2013
Saturday, March 16, 2013
it is still hard to separate demography from selection in genomic inference
Joint analysis of demography and selection in population genetics: where do we stand and where could we go?
Teasing apart the effects of selection and demography on genetic polymorphism remains one of the major challenges in the analysis of population genomic data. The traditional approach has been to assume that demography would leave a genome-wide signature, whereas the effect of selection would be local. In the light of recent genomic surveys of sequence polymorphism, several authors have argued that this approach is questionable based on the evidence of the pervasive role of positive selection and that new approaches are needed. In the first part of this review, we give a few empirical and theoretical examples illustrating the difficulty in teasing apart the effects of selection and demography on genomic polymorphism patterns. In the second part, we review recent efforts to detect recent positive selection. Most available methods still rely on an a priori classification of sites in the genome but there are many promising new approaches. These new methods make use of the latest developments in statistics, explore aspects of the data that had been neglected hitherto or take advantage of the emerging population genomic data. A current and promising approach is based on first estimating demographic and genetic parameters, using, e.g., a likelihood or approximate Bayesian computation framework, focusing on extreme outlier regions, and then using an independent method to confirm these. Finally, especially for species where evidence of natural selection has been limited, more experimental and versatile approaches that contrast populations under varied environmental constraints might be more successful compared with species-wide genome scans in search of specific signatures.
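To make the approximate Bayesian computation step mentioned in the abstract concrete, here is a toy rejection sampler. The model is deliberately trivial (inferring an allele frequency from the number of copies seen in 50 chromosomes) and stands in for a real demographic simulator. The function names, the uniform prior and the tolerance are all assumptions for this example.

```python
import random

def abc_rejection(observed, simulate, prior_draw, n_draws, tolerance, rng):
    """Keep the prior draws whose simulated summary is close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        param = prior_draw(rng)
        if abs(simulate(param, rng) - observed) <= tolerance:
            accepted.append(param)
    return accepted

def simulate_allele_count(freq, rng, n_chromosomes=50):
    """Toy simulator: number of copies of an allele in a sample."""
    return sum(rng.random() < freq for _ in range(n_chromosomes))

rng = random.Random(1)  # fixed seed so the run is repeatable
posterior = abc_rejection(
    observed=15,                     # 15 copies seen among 50 chromosomes
    simulate=simulate_allele_count,
    prior_draw=lambda r: r.random(), # uniform prior on the frequency
    n_draws=5000,
    tolerance=1,
    rng=rng,
)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values approximate the posterior of the frequency and concentrate around 0.3. In the two-step strategy the abstract describes, the simulator would be a demographic model and the fitted model would then be used to judge which genomic regions are extreme outliers.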
Wednesday, February 6, 2013
Wednesday, December 5, 2012
phylogenetic networks - splitstree and neighbor-net
1. SplitsTree
1.1 Application of Phylogenetic Networks in Evolutionary Studies
1.2 http://www.splitstree.org/
2. Neighbor-Net
2.1 Neighbor-Net: An Agglomerative Method for the Construction of Phylogenetic Networks
2.2 Genome-Wide Analysis of the World's Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection
ABC tools - examples
1. DIYABC
1.1 Inferring population history with DIY ABC: a user-friendly approach to approximate Bayesian computation
1.2 Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0)
1.3 An example: Revealing the colonisation histories and invasion routes of M. squamiger using ABC methods
2. ABCtoolbox
2.1 ABCtoolbox: a versatile toolkit for approximate Bayesian computations
Tuesday, November 27, 2012