Sunday, September 22, 2013

Population genomics from pool sequencing

Keywords:

  • Pool sequencing;
  • High throughput sequencing;
  • Neutrality tests;
  • Composite likelihood estimators;
  • Genetic differentiation

Abstract

Next generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θW, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, McDonald-Kreitman and HKA tests and Fst, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics varies with read depth and sample size in different evolutionary scenarios. As an application, we reanalyze sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budgets, studies of communities of individuals that are hard to isolate, or autopolyploid species.
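To make the statistics named in the abstract concrete, here is a minimal Python sketch of the classical, individual-sequence versions of θW, Π and Tajima's D computed from an unfolded site frequency spectrum. It is not the pool-corrected estimators from the paper: it assumes no sequencing errors and no ascertainment bias, and the function names and the `sfs` input layout are my own choices for illustration.

```python
import math


def harmonic_sums(n):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def watterson_theta(sfs):
    """Watterson's estimator for a region.

    sfs[i-1] is the number of sites where the derived allele is seen
    in i of the n sampled sequences (i = 1..n-1), so n = len(sfs) + 1.
    """
    n = len(sfs) + 1
    a1, _ = harmonic_sums(n)
    return sum(sfs) / a1


def pairwise_diversity(sfs):
    """Average number of pairwise differences (Pi) for the region."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2.0
    return sum(i * (n - i) * c for i, c in enumerate(sfs, start=1)) / pairs


def tajimas_d(sfs):
    """Tajima's D (Tajima 1989) from an unfolded site frequency spectrum."""
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        return float("nan")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (pairwise_diversity(sfs) - s / a1) / math.sqrt(variance)
```

With a spectrum that matches the neutral expectation exactly (counts proportional to 1/i, for example `[12, 6, 4, 3]` for five sequences), the two θ estimates coincide and D is zero; an excess of singletons pulls Π below θW and makes D negative.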

Monday, September 16, 2013

MedSci

1. http://www.medsci.cn/

2. 2013 Natural Science Foundation query and analysis system (basic query edition)
http://www.medsci.cn/sci/nsfc.do

3. MedSci 2013 intelligent journal query system (2012 edition)
http://www.medsci.cn/sci/submit.asp

4. Paper services
http://www.medsci.cn/list.asp?classid=110

public library of bioinformatics

1. http://www.plob.org/
public library of bioinformatics

2. http://www.bioask.net/

Saturday, September 14, 2013

forest plot

https://mcfromnz.wordpress.com/2012/11/06/forest-plots-in-r-ggplot-with-side-table/#more-356


BroadE Workshop 2013 July 9-10

http://www.broadinstitute.org/gatk/guide/events?id=3093#materials

This workshop covered the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit, using the “Best Practices” developed by the GATK team. View the workshop materials to learn why each step is essential to the calling process, which key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.

Workshop materials


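 - Day 1
   - Opening remarks
   - Introduction to Next Generation Sequence Analysis
   - Introduction to the GATK
   - Mapping and duplicate marking (data pre-processing)
   - Local realignment around indels. Tools: RTC (RealignerTargetCreator), IR (IndelRealigner)
   - Base quality score recalibration (BQSR). Tools: BR (BaseRecalibrator), PR (PrintReads)
   - Compression with ReduceReads. Tool: RR (ReduceReads)
 - Day 2
   - Opening remarks
   - Variant calling. Tools: UG (UnifiedGenotyper), HC (HaplotypeCaller)
   - Variant quality score recalibration (VQSR). Tools: VR (VariantRecalibrator), AR (ApplyRecalibration)
   - Genotype phasing and refinement. Tools: PBT (PhaseByTransmission), RBP (ReadBackedPhasing)
   - Functional annotation. Tool: VA (VariantAnnotator)
   - Analyzing variant calls. Tools: SV (SelectVariants), CV (CombineVariants), VE (VariantEval)
 - Introduction to Parallelism (video not available yet). Options and tools: NT (-nt), NCT (-nct), Q (Queue)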



Supplemental materials


 - GenomeSTRiP: Discovery and genotyping of deletions

 - XHMM: Discovery and genotyping of copy number variation from exome read depth (PDF not available for download yet)

Monday, September 9, 2013

2013 Dragon Star Program: Bioinformatics

http://yixf.name/2013/09/04/%E8%8D%902013%E5%B9%B4%E9%BE%99%E6%98%9F%E8%AE%A1%E5%88%92%E4%B9%8B%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/

Course homepage

Course slides download

Course videos

Course overview

  • Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
  • Day 2. Analyze RNA-seq data and small RNA-seq data.
  • Day 3. DNA methylation. Integration with other data types.
  • Day 4. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
  • Day 5. Analyze DNase-seq data and MNase-seq data. Integration with other data types.

Lab sessions

  • Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
  • Day 2. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
  • Day 3. Analyze RNA-seq data and small RNA-seq data.
  • Day 4. Analyze DNase-seq data and MNase-seq data. Integration with other data types.
  • Day 5. DNA methylation. Integration with other data types.

MOSAIK: A hash-based algorithm for accurate next-generation sequencing read mapping

http://arxiv.org/pdf/1309.1149v1.pdf

MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific Biosciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK-assigned and actual mapping qualities greater than 0.98. To support studies of any genome, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
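The general idea of pairing a hash lookup with Smith-Waterman can be sketched in a few lines of Python. This is only a toy illustration of the seed-then-align pattern, not MOSAIK's actual implementation: the k-mer index, the diagonal voting, the padding and the scoring parameters are all assumptions of mine, and MOSAIK's hash clustering and neural-network mapping qualities are not reproduced here.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_offsets(read, index, k):
    """Rank diagonals (reference position minus read position) by k-mer hits."""
    votes = defaultdict(int)
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], ()):
            votes[pos - i] += 1
    # Most supported diagonal first; ties broken by smaller offset.
    return sorted(votes, key=lambda d: (-votes[d], d))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best score, end index in a, end index in b), where the end
    indices are exclusive (1-based position of the last aligned base).
    """
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


def map_read(read, reference, index, k, pad=5):
    """Align the read inside a padded window around the best-supported diagonal.

    Returns (score, exclusive end coordinate in the reference), or None
    when no k-mer of the read is found in the index.
    """
    offsets = candidate_offsets(read, index, k)
    if not offsets:
        return None
    start = max(0, offsets[0] - pad)
    end = min(len(reference), offsets[0] + len(read) + pad)
    score, _, ref_end = smith_waterman(read, reference[start:end])
    return score, start + ref_end
```

The hash lookup restricts the expensive dynamic programming to a short window of the reference, while the Smith-Waterman step tolerates mismatches and short indels within that window, which is the division of labour the abstract describes.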