Showing posts with label "NGS". Show all posts

Friday, November 8, 2013

List of Bioinformatics Workshops and Training Resources

http://gettinggeneticsdone.blogspot.com/search?updated-max=2013-05-15T09:39:00-05:00&max-results=8&start=8&by-date=false




I frequently get asked to recommend workshops or online learning resources for bioinformatics, genomics, statistics, and programming. I compiled a list of both online learning resources and in-person workshops (preferentially highlighting those where workshop materials are freely available online):

List of Bioinformatics Workshops and Training Resources

I hope to keep the page above as up-to-date as possible. Below is a snapshot of what I have listed as of today. Please leave a comment if you're aware of any egregious omissions, and I'll update the page above as appropriate.

From http://stephenturner.us/p/edu, April 4, 2013

In-Person Workshops:

Cold Spring Harbor Courses: meetings.cshl.edu/courses.html

Cold Spring Harbor has been offering advanced workshops and short courses in the life sciences for years. Relevant workshops include Advanced Sequencing Technologies & Applications, Computational & Comparative Genomics, Programming for Biology, Statistical Methods for Functional Genomics, the Genome Access Course, and others. Unlike most of the others below, you won't find material from past years' CSHL courses available online.

Canadian Bioinformatics Workshops: bioinformatics.ca/workshops
Bioinformatics.ca, through its Canadian Bioinformatics Workshops (CBW) series, began offering one- and two-week short courses in bioinformatics, genomics and proteomics in 1999. The more recent workshops focus on training researchers using advanced high-throughput technologies on the latest approaches being used in computational biology to deal with the new data. Course material from past workshops is freely available online, including both audio/video lectures and slideshows. Topics include microarray analysis, RNA-seq analysis, genome rearrangements, copy number alteration, network/pathway analysis, genome visualization, gene function prediction, functional annotation, data analysis using R, statistics for metabolomics, and much more.

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu
The UC Davis Bioinformatics Training program offers several intensive short bootcamp workshops on RNA-seq, data analysis and visualization, and cloud computing with a focus on Amazon's computing resources. They also offer a week-long Bioinformatics Short Course, covering in-depth the practical theory and application of cutting-edge next-generation sequencing techniques. Every course's documentation is freely available online, even if you didn't take the course.

MSU NGS Summer Course: bioinformatics.msu.edu/ngs-summer-course-2013
This intensive two week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Illumina and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. Materials from previous courses are freely available online under a CC-by-SA license.

Genetic Analysis of Complex Human Diseases: hihg.med.miami.edu/edu...
The Genetic Analysis of Complex Human Diseases is a comprehensive four-day course directed toward physician-scientists and other medical researchers. The course will introduce state-of-the-art approaches for the mapping and characterization of human inherited disorders with an emphasis on the mapping of genes involved in common and genetically complex disease phenotypes. The primary goal of this course is to provide participants with an overview of approaches to identifying genes involved in complex human diseases. At the end of the course, participants should be able to identify the key components of a study team, and communicate effectively with specialists in various areas to design and execute a study. The course is in Miami Beach, FL. (Full Disclosure: I teach a section in this course.) Most of the course material from previous years is not available online, but my RNA-seq & methylation lectures are on Figshare.

UAB Short Course on Statistical Genetics and Genomics: soph.uab.edu/ssg/...
Focusing on the state-of-the-art methodology to analyze complex traits, this five-day course will offer an interactive program to enhance researchers' ability to understand & use statistical genetic methods, as well as implement & interpret sophisticated genetic analyses. Topics include GWAS Design/Analysis/Imputation/Interpretation; Non-Mendelian Disorders Analysis; Pharmacogenetics/Pharmacogenomics; ELSI; Rare Variants & Exome Sequencing; Whole Genome Prediction; Analysis of DNA Methylation Microarray Data; Variant Calling from NGS Data; RNAseq: Experimental Design and Data Analysis; Analysis of ChIP-seq Data; Statistical Methods for NGS Data; Discovering new drugs & diagnostics from 300 billion points of data. Video recordings from the 2012 course are available online.

MBL Molecular Evolution Workshop: hermes.mbl.edu/education/...
One of the longest-running courses listed here (est. 1988), the Workshop on Molecular Evolution at Woods Hole presents a series of lectures, discussions, and bioinformatic exercises that span contemporary topics in molecular evolution. The course addresses phylogenetic analysis, population genetics, database and sequence matching, molecular evolution and development, and comparative genomics, using software packages including AWTY, BEAST, BEST, Clustal W/X, FASTA, FigTree, GARLI, MIGRATE, LAMARC, MAFFT, MP-EST, MrBayes, PAML, PAUP*, PHYLIP, STEM, STEM-hy, and SeaView. Some of the course materials can be found by digging around the course wiki.


Online Material:


Canadian Bioinformatics Workshops: bioinformatics.ca/workshops
(In-person workshop described above.) Course material from past workshops is freely available online, including both audio/video lectures and slideshows. Topics include microarray analysis, RNA-seq analysis, genome rearrangements, copy number alteration, network/pathway analysis, genome visualization, gene function prediction, functional annotation, data analysis using R, statistics for metabolomics, and much more.

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu
(In person workshop described above). Every course's documentation is freely available online, even if you didn't take the course. Past topics include Galaxy, Bioinformatics for NGS, cloud computing, and RNA-seq.

MSU NGS Summer Course: bioinformatics.msu.edu/ngs-summer-course-2013
(In person workshop described above). Materials from previous courses are freely available online under a CC-by-SA license, which cover mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq.

EMBL-EBI Train Online: www.ebi.ac.uk/training/online
Train online provides free courses on Europe's most widely used data resources, created by experts at EMBL-EBI and collaborating institutes. Topics include Genes and Genomes, Gene Expression, Interactions, Pathways, and Networks, and others. Of particular interest may be the Practical Course on Analysis of High-Throughput Sequencing Data, which covers Bioconductor packages for short read analysis, ChIP-Seq, RNA-seq, and allele-specific expression & eQTLs.

UC Riverside Bioinformatics Manuals: manuals.bioinformatics.ucr.edu
This is an excellent collection of manuals and code snippets. Topics include Programming in R, R+Bioconductor, Sequence Analysis with R and Bioconductor, NGS analysis with Galaxy and IGV, basic Linux skills, and others.

Software Carpentry: software-carpentry.org
Software Carpentry helps researchers be more productive by teaching them basic computing skills. We recently ran a 2-day Software Carpentry Bootcamp here at UVA. Check out the online lectures for some introductory material on Unix, Python, Version Control, Databases, Automation, and many other topics.

Coursera: coursera.org/courses
Coursera partners with top universities to offer courses online for anyone to take, for free. Courses are usually 4-6 weeks, and consist of video lectures, quizzes, assignments, and exams. Joining a course gives you access to the course's forum where you can interact with the instructor and other participants. Relevant courses include Data Analysis, Computing for Data Analysis using R, and Bioinformatics Algorithms, among others. You can also view all of Jeff Leek's Data Analysis lectures on YouTube.

Rosalind: http://rosalind.info
Quite different from the others listed here, Rosalind is a platform for learning bioinformatics through gaming-like problem solving. Visit the Python Village to learn the basics of Python. Arm yourself at the Bioinformatics Armory, equipping yourself with existing ready-to-use bioinformatics software tools. Or storm the Bioinformatics Stronghold, implementing your own algorithms for computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, string algorithms and others.


Other Resources:


  • Titus Brown's list of bioinformatics courses: Includes a few others not listed here (also see the comments).
  • GMOD Training and Outreach: GMOD is the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases. This page links out to tutorials on GMOD Components such as Apollo, BioMart, Galaxy, GBrowse, MAKER, and others.
  • SEQanswers.com: A discussion forum for anything related to Bioinformatics, including Q&A, paper discussions, new software announcements, protocols, and more.
  • Biostars.org: Similar to SEQanswers, but more strictly a Q&A site.
  • Bioconductor Mailing List: A very active mailing list for getting help with Bioconductor packages. Make sure you do some Google searching yourself first before posting to this list.
  • Bioconductor Events: List of upcoming and prior Bioconductor training and events worldwide.
  • Learn Galaxy: Screencasts and tutorials for learning to use Galaxy.
  • Galaxy Event Horizon: Worldwide Galaxy-related events (workshops, training, user meetings) are listed here.
  • Galaxy RNA-Seq Exercise: Run through a small RNA-seq study from start to finish using Galaxy.
  • Rafael Irizarry's YouTube Channel: Several statistics and bioinformatics video lectures.
  • PLoS Comp Bio Online Bioinformatics Curriculum: A perspective paper by David B Searls outlining a series of free online learning initiatives for beginning to advanced training in biology, biochemistry, genetics, computational biology, genomics, math, statistics, computer science, programming, web development, databases, parallel computing, image processing, AI, NLP, and more.
  • Getting Genetics Done: Shameless plug – I write a blog highlighting literature of interest, new tools, and occasionally tutorials in genetics, statistics, and bioinformatics. I recently wrote this post about how to stay current in bioinformatics & genomics.

A Mitochondrial Manhattan Plot





SysCall - Distinguishing heterozygous sites from systematic errors

http://bio.math.berkeley.edu/SysCall/

SysCall is a logistic regression based classifier.
Given a list of candidate heterozygous genomic locations and a SAM file of sequenced reads, SysCall classifies each genomic location as either a heterozygous site or a systematic error and outputs the corresponding lists, along with the assigned posterior probabilities.
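
To make the idea concrete, here is a minimal sketch of how a logistic regression classifier can turn per-site features into a posterior probability and two output lists. This is not SysCall's code: the feature values, weights, intercept, and function names are all hypothetical and chosen only for illustration.

```python
import math


def posterior_heterozygous(features, weights, intercept):
    """Logistic model: P(heterozygous | features) = sigmoid(intercept + w . x)."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify_sites(sites, weights, intercept, threshold=0.5):
    """Split candidate sites into heterozygous calls and systematic errors.

    `sites` is a list of (site_name, feature_list) pairs. The features are
    hypothetical placeholders, not the ones SysCall actually uses.
    """
    heterozygous, systematic_errors = [], []
    for name, features in sites:
        p = posterior_heterozygous(features, weights, intercept)
        target = heterozygous if p >= threshold else systematic_errors
        target.append((name, p))
    return heterozygous, systematic_errors


# Hypothetical example: two candidate sites, two made-up features each.
candidates = [("chr1:100", [2.0, 0.0]), ("chr1:200", [0.0, 2.0])]
het, err = classify_sites(candidates, weights=[1.0, -1.0], intercept=0.0)
print("heterozygous:", het)
print("systematic errors:", err)
```

In a real classifier the weights would be fit to training data rather than set by hand.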

The submitted manuscript describing SysCall can be found here, and the lists of systematic errors reported in the paper are here.
The slides from a talk on SysCall given at the 2011 CSHL Meeting on The Biology of Genomes can be found here.


Manual: Click here to download the SysCall manual.

Paper
http://www.biomedcentral.com/1471-2105/12/451/

De Novo Transcriptome Assembly with Trinity: Protocol and Videos

http://gettinggeneticsdone.blogspot.com/2013/10/de-novo-transcriptome-assembly-trinity.html


Wednesday, October 2, 2013

Two tools for detecting the genetic basis of adaptation

1. DISENTANGLING THE EFFECTS OF GEOGRAPHIC AND ECOLOGICAL ISOLATION ON GENETIC DIFFERENTIATION

http://onlinelibrary.wiley.com/doi/10.1111/evo.12193/full

Populations can be genetically isolated both by geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user-friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte data sets.

http://genescape.ucdavis.edu/scripts-and-code/
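
As a rough illustration of "a covariance structure that is a decreasing function of both geographic and ecological distance", here is a minimal sketch. The exponential form and the parameter names (`alpha0`, `alpha_d`, `alpha_e`, `alpha2`) are assumptions made for this example; check the paper and the BEDASSLE R package for the parameterization the authors actually use.

```python
import math


def covariance(geo_dist, eco_dist, alpha0=1.0, alpha_d=1.0, alpha_e=1.0, alpha2=1.0):
    """Covariance that decays with a weighted sum of the two distances.

    Assumed illustrative form:
        (1 / alpha0) * exp(-(alpha_d * D + alpha_e * |E|) ** alpha2)
    """
    combined = alpha_d * geo_dist + alpha_e * abs(eco_dist)
    return (1.0 / alpha0) * math.exp(-(combined ** alpha2))


def covariance_matrix(geo, eco, **params):
    """Build the population-by-population covariance matrix from two distance matrices."""
    n = len(geo)
    return [[covariance(geo[i][j], eco[i][j], **params) for j in range(n)]
            for i in range(n)]


# Hypothetical distances among three populations.
geo = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
eco = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
for row in covariance_matrix(geo, eco, alpha_e=2.0):
    print([round(value, 3) for value in row])
```

In this sketch the ratio `alpha_e / alpha_d` says how much one unit of ecological distance counts relative to one unit of geographic distance, which is the kind of quantity the method is designed to estimate.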

2. INTEGRATING LANDSCAPE GENOMICS AND SPATIALLY EXPLICIT APPROACHES TO DETECT LOCI UNDER SELECTION IN CLINAL POPULATIONS

http://onlinelibrary.wiley.com/doi/10.1111/evo.12237/abstract

Uncovering the genetic basis of adaptation hinges on the ability to detect loci under selection. However, population genomics outlier approaches to detect selected loci may be inappropriate for clinal populations or those with unclear population structure because they require that individuals be clustered into populations. An alternate approach, landscape genomics, uses individual-based approaches to detect loci under selection and reveal potential environmental drivers of selection. We tested four landscape genomics methods on a simulated clinal population to determine their effectiveness at identifying a locus under varying selection strengths along an environmental gradient. We found all methods produced very low type I error rates across all selection strengths, but elevated type II error rates under “weak” selection. We then applied these methods to an AFLP genome scan of an alpine plant, Campanula barbata, and identified five highly supported candidate loci associated with precipitation variables. These loci also showed spatial autocorrelation and cline patterns indicative of selection along a precipitation gradient. Our results suggest that landscape genomics in combination with other spatial analyses provides a powerful approach for identifying loci potentially under selection and explaining spatially complex interactions between species and their environment.


Tuesday, October 1, 2013

Computational analysis and characterization of UCE-like elements (ULEs) in plant genomes

Ultraconserved elements (UCEs), stretches of DNA that are identical between distantly related species, are enigmatic genomic features whose function is not well understood. First identified and characterized in mammals, UCEs have been proposed to play important roles in gene regulation, RNA processing, and maintaining genome integrity. However, because all of these functions can tolerate some sequence variation, their ultraconserved and ultraselected nature is not explained. We investigated whether there are highly conserved DNA elements without genic function in distantly related plant genomes. We compared the genomes of Arabidopsis thaliana and Vitis vinifera, species that diverged ∼115 million years ago (Mya). We identified 36 highly conserved elements with at least 85% similarity that are longer than 55 bp. Interestingly, these elements exhibit properties similar to mammalian UCEs, such that we named them UCE-like elements (ULEs). ULEs are located in intergenic or intronic regions and are depleted from segmental duplications. Like UCEs, ULEs are under strong purifying selection, suggesting a functional role for these elements. Like their mammalian counterparts, ULEs show a sharp drop of A+T content at their borders and are enriched close to genes encoding transcription factors and genes involved in development, the latter showing preferential expression in undifferentiated tissues. By comparing the genomes of Brachypodium distachyon and Oryza sativa, species that diverged ∼50 Mya, we identified a different set of ULEs with similar properties in monocots. The identification of ULEs in plant genomes offers new opportunities to study their possible roles in genome function, integrity, and regulation.

http://genome.cshlp.org/content/22/12/2455.long




Monday, September 16, 2013

Saturday, September 14, 2013

BroadE Workshop 2013 July 9-10

http://www.broadinstitute.org/gatk/guide/events?id=3093#materials

This workshop covered the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit, using the “Best Practices” developed by the GATK team. View the workshop materials to learn why each step is essential to the calling process, what are the key operations performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.

Workshop materials


 - Day 1
    - Opening remarks
    - Introduction to Next Generation Sequence Analysis
    - Introduction to the GATK
    - Mapping and duplicate marking (data pre-processing)
    - Local realignment around indels [RTC, IR]
    - Base quality score recalibration (BQSR) [BR, PR]
    - Compression with ReduceReads [RR]
 - Day 2
    - Opening remarks
    - Variant calling [UG, HC]
    - Variant quality score recalibration (VQSR) [VR, AR]
    - Genotype phasing and refinement [PBT, RBP]
    - Functional annotation [VA]
    - Analyzing variant calls [SV, CV, VE]
 - Introduction to Parallelism (video not available yet) [NT, NCT, Q]



Supplemental materials


 - GenomeSTRiP: Discovery and genotyping of deletions
 - XHMM: Discovery and genotyping of copy number variation from exome read depth (PDF not available for download yet)

Monday, September 9, 2013

2013 Dragon Star Program: Bioinformatics

http://yixf.name/2013/09/04/%E8%8D%902013%E5%B9%B4%E9%BE%99%E6%98%9F%E8%AE%A1%E5%88%92%E4%B9%8B%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/

Course homepage

Course materials download

Course videos

Course overview

  • Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
  • Day 2. Analyze RNA-seq data and small RNA-seq data.
  • Day 3. DNA methylation, Integration with other data types.
  • Day 4. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
  • Day 5. Analyze DNase-seq data and MNase-seq data. Integration with other data types.

Lab sessions

  • Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
  • Day 2. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
  • Day 3. Analyze RNA-seq data and small RNA-seq data
  • Day 4. Analyze DNase-seq data and MNase-seq data. Integration with other data types
  • Day 5. DNA methylation, Integration with other data types

MOSAIK: A hash-based algorithm for accurate next-generation sequencing read mapping

http://arxiv.org/pdf/1309.1149v1.pdf

MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-net based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
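
For readers who haven't seen it, here is a minimal textbook sketch of the Smith-Waterman local alignment step mentioned in the abstract. It is not MOSAIK's implementation: the hash clustering stage is left out entirely, and the scoring values (match, mismatch, and a linear gap penalty) are arbitrary assumptions for illustration.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_read, aligned_ref) for the best local alignment."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; local alignment never lets a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,  # match or mismatch
                              score[i - 1][j] + gap,      # gap in the reference
                              score[i][j - 1] + gap)      # gap in the read
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    aligned_read, aligned_ref = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned_read.append(read[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_read.append(read[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            aligned_read.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_read)), "".join(reversed(aligned_ref))


print(smith_waterman("ACGT", "TTACGTTT"))
```

Because the full matrix costs time proportional to read length times reference length, aligners generally restrict this step to candidate regions found by a faster seeding stage, which is the role the abstract assigns to hash clustering.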

Sunday, August 18, 2013

Alternative forms for genomic clines

http://onlinelibrary.wiley.com/doi/10.1002/ece3.609/full

Understanding factors regulating hybrid fitness and gene exchange is a major research challenge for evolutionary biology. Genomic cline analysis has been used to evaluate alternative patterns of introgression, but only two models have been used widely and the approach has generally lacked a hypothesis testing framework for distinguishing effects of selection and drift. I propose two alternative cline models, implement multivariate outlier detection to identify markers associated with hybrid fitness, and simulate hybrid zone dynamics to evaluate the signatures of different modes of selection. Analysis of simulated data shows that previous approaches are prone to false positives (multinomial regression) or relatively insensitive to outlier loci affected by selection (Barton's concordance). The new, theory-based logit-logistic cline model is generally best at detecting loci affecting hybrid fitness. Although some generalizations can be made about different modes of selection, there is no one-to-one correspondence between pattern and process. These new methods will enhance our ability to extract important information about the genetics of reproductive isolation and hybrid fitness. However, much remains to be done to relate statistical patterns to particular evolutionary processes. The methods described here are implemented in a freely available package “HIest” for the R statistical software (CRAN; http://cran.r-project.org/).

Theoretical Evolutionary Genetics - draft text

1. http://evolution.genetics.washington.edu/pgbook/pgbook.html

This would be a very good book on population genetics.

2. Evolution and Selection of Quantitative Traits by Bruce Walsh and Michael Lynch. While this book is in draft form it is available from Bruce Walsh's web page at: http://nitro.biosci.arizona.edu/zbook/NewVolume_2/newvol2.html (Bruce Walsh's web page is in general a fantastic source of information on all things population/quantitative genetics).

3. From Whitlock at UBC