Saturday, September 28, 2013
Sunday, September 22, 2013
Population genomics from pool sequencing
Keywords:
- Pool sequencing;
- High throughput sequencing;
- Neutrality tests;
- Composite likelihood estimators;
- Genetic differentiation
Abstract
Next generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θW, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, McDonald-Kreitman and HKA tests and Fst, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics varies with read depth and sample size in different evolutionary scenarios. As an application, we reanalyze sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budget, study of communities of individuals that are hard to isolate, or autopolyploid species.
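To make the statistics named in the abstract concrete, here is a minimal Python sketch of the classic individual-sample versions of θW, Π and Tajima's D. It is not the pool-corrected estimators the paper derives: it assumes error-free genotypes from n sequenced chromosomes, and the function names and the input format (one derived-allele count per segregating site) are illustrative choices rather than anything taken from the paper.

```python
from math import sqrt, nan


def watterson_theta(num_segregating_sites, n):
    """Watterson's estimator: S divided by the harmonic number a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, n))
    return num_segregating_sites / a1


def pairwise_diversity(derived_counts, n):
    """Mean number of pairwise differences.

    A site carrying k derived alleles among n chromosomes differs in
    k * (n - k) of the n * (n - 1) / 2 possible pairs.
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in derived_counts)


def tajimas_d(derived_counts, n):
    """Tajima's D (Tajima 1989) for individual, non-pooled data."""
    s = len(derived_counts)
    if s == 0:
        return nan  # D is undefined without segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    difference = pairwise_diversity(derived_counts, n) - s / a1
    return difference / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    counts = [1, 1, 2]  # hypothetical derived-allele counts at 3 sites, n = 4
    print(watterson_theta(len(counts), 4))
    print(pairwise_diversity(counts, 4))
    print(tajimas_d(counts, 4))
```

An excess of rare variants pulls Π below θW and makes D negative, while an excess of intermediate-frequency variants makes it positive. With pooled reads the allele counts are themselves sampled with error, which is why corrected estimators are needed there.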
Monday, September 16, 2013
MedSci
1. http://www.medsci.cn/
2. 2013 Natural Science Foundation query and analysis system (basic query edition)
http://www.medsci.cn/sci/nsfc.do
3. MedSci 2013 journal smart query system (2012 edition)
http://www.medsci.cn/sci/submit.asp
4. Paper services
http://www.medsci.cn/list.asp?classid=110
Saturday, September 14, 2013
BroadE Workshop 2013 July 9-10
http://www.broadinstitute.org/gatk/guide/events?id=3093#materials
This workshop covered the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit, using the “Best Practices” developed by the GATK team. View the workshop materials to learn why each step is essential to the calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.
Workshop materials
Supplemental materials
Monday, September 9, 2013
2013 Dragon Star Program: Bioinformatics
http://yixf.name/2013/09/04/%E8%8D%902013%E5%B9%B4%E9%BE%99%E6%98%9F%E8%AE%A1%E5%88%92%E4%B9%8B%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/
Course homepage
Slide downloads
Course videos
Course overview
- Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
- Day 2. Analyze RNA-seq data and small RNA-seq data.
- Day 3. DNA methylation, Integration with other data types.
- Day 4. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
- Day 5. Analyze DNase-seq data and MNase-seq data. Integration with other data types.
Lab sessions
- Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
- Day 2. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
- Day 3. Analyze RNA-seq data and small RNA-seq data.
- Day 4. Analyze DNase-seq data and MNase-seq data. Integration with other data types.
- Day 5. DNA methylation, Integration with other data types.
MOSAIK: A hash-based algorithm for accurate next-generation sequencing read mapping
http://arxiv.org/pdf/1309.1149v1.pdf
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation
sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align
reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD,
Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to
provide consistent mappings for all the generated data (sequencing technologies, low-coverage and
exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash
clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture
mismatches as well as short insertions and deletions. To support the growing interest in larger structural
variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g.
mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All
variant discovery benefits from an accurate description of the read placement confidence. To this end,
MOSAIK uses a neural-net based training scheme to provide well-calibrated mapping quality scores,
demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities
greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is
provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is
multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO
(http://gkno.me).
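The abstract pairs hash-based seeding with Smith-Waterman local alignment. As a rough illustration of the second step only, the sketch below computes a Smith-Waterman score with a linear gap penalty. It is not MOSAIK's implementation: the function name, the scoring values (match 2, mismatch -1, gap -2) and the score-only output are assumptions made for the example, and the hash clustering that would choose which reference region to align against is left out.

```python
def smith_waterman_score(query, reference, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Uses the standard Smith-Waterman recurrence with a linear gap penalty
    and keeps only two rows of the dynamic programming matrix in memory.
    """
    previous_row = [0] * (len(reference) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        current_row = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            substitution = match if query[i - 1] == reference[j - 1] else mismatch
            current_row[j] = max(
                0,                                  # start a new local alignment
                previous_row[j - 1] + substitution,  # align the two bases
                previous_row[j] + gap,               # gap in the reference
                current_row[j - 1] + gap,            # gap in the query
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


if __name__ == "__main__":
    # A read that matches exactly inside a longer reference window.
    print(smith_waterman_score("ACGT", "TTACGTTT"))
    # The same read against a window with one mismatch.
    print(smith_waterman_score("ACGT", "ACCT"))
```

Because every cell is floored at zero, the alignment can start and end anywhere, which is what lets the method tolerate mismatches and short insertions or deletions within a read.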