Friday, November 8, 2013

List of Bioinformatics Workshops and Training Resources

http://gettinggeneticsdone.blogspot.com/search?updated-max=2013-05-15T09:39:00-05:00&max-results=8&start=8&by-date=false




I frequently get asked to recommend workshops or online learning resources for bioinformatics, genomics, statistics, and programming. I compiled a list of both online learning resources and in-person workshops (preferentially highlighting those where workshop materials are freely available online):

List of Bioinformatics Workshops and Training Resources

I hope to keep the page above as up-to-date as possible. Below is a snapshot of what I have listed as of today. Please leave a comment if you're aware of any egregious omissions, and I'll update the page above as appropriate.

From http://stephenturner.us/p/edu, April 4, 2013

In-Person Workshops:

Cold Spring Harbor Courses: meetings.cshl.edu/courses.html

Cold Spring Harbor has been offering advanced workshops and short courses in the life sciences for years. Relevant workshops include Advanced Sequencing Technologies & Applications, Computational & Comparative Genomics, Programming for Biology, Statistical Methods for Functional Genomics, the Genome Access Course, and others. Unlike most of the others below, you won't find material from past years' CSHL courses available online.

Canadian Bioinformatics Workshops: bioinformatics.ca/workshops
Bioinformatics.ca, through its Canadian Bioinformatics Workshops (CBW) series, began offering one- and two-week short courses in bioinformatics, genomics, and proteomics in 1999. The more recent workshops focus on training researchers who use advanced high-throughput technologies in the latest computational biology approaches for handling the new data. Course material from past workshops is freely available online, including both audio/video lectures and slideshows. Topics include microarray analysis, RNA-seq analysis, genome rearrangements, copy number alteration, network/pathway analysis, genome visualization, gene function prediction, functional annotation, data analysis using R, statistics for metabolomics, and much more.

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu
The UC Davis Bioinformatics Training program offers several intensive short bootcamp workshops on RNA-seq, data analysis and visualization, and cloud computing with a focus on Amazon's computing resources. They also offer a week-long Bioinformatics Short Course, covering in-depth the practical theory and application of cutting-edge next-generation sequencing techniques. Every course's documentation is freely available online, even if you didn't take the course.

MSU NGS Summer Course: bioinformatics.msu.edu/ngs-summer-course-2013
This intensive two week summer course will introduce attendees with a strong biology background to the practice of analyzing short-read sequencing data from Illumina and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq. Materials from previous courses are freely available online under a CC-by-SA license.

Genetic Analysis of Complex Human Diseases: hihg.med.miami.edu/edu...
The Genetic Analysis of Complex Human Diseases is a comprehensive four-day course directed toward physician-scientists and other medical researchers. The course will introduce state-of-the-art approaches for the mapping and characterization of human inherited disorders with an emphasis on the mapping of genes involved in common and genetically complex disease phenotypes. The primary goal of this course is to provide participants with an overview of approaches to identifying genes involved in complex human diseases. At the end of the course, participants should be able to identify the key components of a study team, and communicate effectively with specialists in various areas to design and execute a study. The course is in Miami Beach, FL. (Full Disclosure: I teach a section in this course.) Most of the course material from previous years is not available online, but my RNA-seq & methylation lectures are on Figshare.

UAB Short Course on Statistical Genetics and Genomics: soph.uab.edu/ssg/...
Focusing on state-of-the-art methodology for analyzing complex traits, this five-day course will offer an interactive program to enhance researchers' ability to understand & use statistical genetic methods, as well as implement & interpret sophisticated genetic analyses. Topics include GWAS Design/Analysis/Imputation/Interpretation; Non-Mendelian Disorders Analysis; Pharmacogenetics/Pharmacogenomics; ELSI; Rare Variants & Exome Sequencing; Whole Genome Prediction; Analysis of DNA Methylation Microarray Data; Variant Calling from NGS Data; RNAseq: Experimental Design and Data Analysis; Analysis of ChIP-seq Data; Statistical Methods for NGS Data; Discovering new drugs & diagnostics from 300 billion points of data. Video recordings from the 2012 course are available online.

MBL Molecular Evolution Workshop: hermes.mbl.edu/education/...
One of the longest-running courses listed here (est. 1988), the Workshop on Molecular Evolution at Woods Hole presents a series of lectures, discussions, and bioinformatic exercises that span contemporary topics in molecular evolution. The course addresses phylogenetic analysis, population genetics, database and sequence matching, molecular evolution and development, and comparative genomics, using software packages including AWTY, BEAST, BEST, Clustal W/X, FASTA, FigTree, GARLI, MIGRATE, LAMARC, MAFFT, MP-EST, MrBayes, PAML, PAUP*, PHYLIP, STEM, STEM-hy, and SeaView. Some of the course materials can be found by digging around the course wiki.


Online Material:


Canadian Bioinformatics Workshops: bioinformatics.ca/workshops
(In-person workshop described above.) Course material from past workshops is freely available online, including both audio/video lectures and slideshows. Topics include microarray analysis, RNA-seq analysis, genome rearrangements, copy number alteration, network/pathway analysis, genome visualization, gene function prediction, functional annotation, data analysis using R, statistics for metabolomics, and much more.

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu
(In person workshop described above). Every course's documentation is freely available online, even if you didn't take the course. Past topics include Galaxy, Bioinformatics for NGS, cloud computing, and RNA-seq.

MSU NGS Summer Course: bioinformatics.msu.edu/ngs-summer-course-2013
(In person workshop described above). Materials from previous courses are freely available online under a CC-by-SA license, which cover mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq.

EMBL-EBI Train Online: www.ebi.ac.uk/training/online
Train online provides free courses on Europe's most widely used data resources, created by experts at EMBL-EBI and collaborating institutes. Topics include Genes and Genomes; Gene Expression; Interactions, Pathways, and Networks; and others. Of particular interest may be the Practical Course on Analysis of High-Throughput Sequencing Data, which covers Bioconductor packages for short read analysis, ChIP-Seq, RNA-seq, and allele-specific expression & eQTLs.

UC Riverside Bioinformatics Manuals: manuals.bioinformatics.ucr.edu
This is an excellent collection of manuals and code snippets. Topics include Programming in R, R+Bioconductor, Sequence Analysis with R and Bioconductor, NGS analysis with Galaxy and IGV, basic Linux skills, and others.

Software Carpentry: software-carpentry.org
Software Carpentry helps researchers be more productive by teaching them basic computing skills. We recently ran a 2-day Software Carpentry Bootcamp here at UVA. Check out the online lectures for some introductory material on Unix, Python, Version Control, Databases, Automation, and many other topics.

Coursera: coursera.org/courses
Coursera partners with top universities to offer courses online for anyone to take, for free. Courses are usually 4-6 weeks, and consist of video lectures, quizzes, assignments, and exams. Joining a course gives you access to the course's forum where you can interact with the instructor and other participants. Relevant courses include Data Analysis, Computing for Data Analysis using R, and Bioinformatics Algorithms, among others. You can also view all of Jeff Leek's Data Analysis lectures on Youtube.

Rosalind: http://rosalind.info
Quite different from the others listed here, Rosalind is a platform for learning bioinformatics through gaming-like problem solving. Visit the Python Village to learn the basics of Python. Arm yourself at the Bioinformatics Armory, equipping yourself with existing ready-to-use bioinformatics software tools. Or storm the Bioinformatics Stronghold, implementing your own algorithms for computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, string algorithms and others.


Other Resources:


  • Titus Brown's list of bioinformatics courses: Includes a few others not listed here (also see the comments).
  • GMOD Training and Outreach: GMOD is the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases. This page links out to tutorials on GMOD Components such as Apollo, BioMart, Galaxy, GBrowse, MAKER, and others.
  • Seqanswers.com: A discussion forum for anything related to Bioinformatics, including Q&A, paper discussions, new software announcements, protocols, and more.
  • Biostars.org: Similar to SEQanswers, but more strictly a Q&A site.
  • BioConductor Mailing list: A very active mailing list for getting help with Bioconductor packages. Make sure you do some Google searching yourself first before posting to this list.
  • Bioconductor Events: List of upcoming and prior Bioconductor training and events worldwide.
  • Learn Galaxy: Screencasts and tutorials for learning to use Galaxy.
  • Galaxy Event Horizon: Worldwide Galaxy-related events (workshops, training, user meetings) are listed here.
  • Galaxy RNA-Seq Exercise: Run through a small RNA-seq study from start to finish using Galaxy.
  • Rafael Irizarry's Youtube Channel: Several statistics and bioinformatics video lectures.
  • PLoS Comp Bio Online Bioinformatics Curriculum: A perspective paper by David B Searls outlining a series of free online learning initiatives for beginning to advanced training in biology, biochemistry, genetics, computational biology, genomics, math, statistics, computer science, programming, web development, databases, parallel computing, image processing, AI, NLP, and more.
  • Getting Genetics Done: Shameless plug – I write a blog highlighting literature of interest, new tools, and occasionally tutorials in genetics, statistics, and bioinformatics. I recently wrote this post about how to stay current in bioinformatics & genomics.

A Mitochondrial Manhattan Plot





Lior Pachter's lab

http://math.berkeley.edu/~lpachter/software.html

Software developed in the Pachter group and still under active development in the group
  • eXpress (2012) Streaming quantification for high-throughput sequencing.
  • SysCall (2011) Distinguishing heterozygous sites from systematic error in high-throughput sequenced reads
  • Cufflinks (2010) Transcript assembly and abundance estimation for RNA-Seq (now a joint effort together with Cole Trapnell and the John Rinn Lab at Harvard University)
  • MetMap (2010) Analysis of Methyl-Seq experiments
Software developed in the Pachter group but now maintained/developed elsewhere
  • ReadSpy (2012) Assessment of uniformity in RNA-Seq reads (now supported by Valerie Hower and her group at the University of Miami)
  • TopHat (2009) Splice junction mapper for short RNA-seq reads (now supported by Steven Salzberg and his group at Johns Hopkins University)
  • FSA (2009) Fast Statistical Alignment (now supported by Robert Bradley and his group at FHCRC)
  • MERCATOR (2004) Homology mapping (now supported by Colin Dewey and his group at the University of Wisconsin)
  • VISTA (2000) Visualization tool for global alignments (now supported by Inna Dubchak and her group at the JGI)
Retired Software
These programs, originally developed in the Pachter group, are no longer under active development and are not being supported.
  • AMAP (2007) Protein multiple alignment (recommended instead: FSA)
  • GENEMAPPER (2006) Reference based gene annotation (recommended instead: an RNA-Seq experiment)
  • MJOIN (2006) Neighbor joining with subtree weights (archived here)
  • PARALIGN (2006) Alignment polytope construction (archived here)
  • SLIM (2003) Minimum network design for optimizing the search space for pair hidden Markov models (archived here)
  • SLAM (2003) Pairwise simultaneous alignment and gene finding (recommended instead: an RNA-Seq experiment)
  • MAVID (2003) Multiple alignment of large genomic sequences (recommended instead: FSA)
################################################################################
Submitted
L. Pachter, Models for transcript quantification from RNA-Seq, submitted.
In press
A. Roberts, L. Schaeffer and L. Pachter, Updating RNA-Seq analyses after re-annotation, in press.
M. Singer and L. Pachter, Bayesian networks in the study of genomewide DNA methylation, in press.
2013
A. Rahman and L. Pachter, CGAL: computing genome assembly likelihoods, Genome Biology, 14 (2013), R8.
2012
C. Trapnell, D.G. Hendrickson, M. Sauvageau, L. Goff, J.L. Rinn and L. Pachter, Differential analysis of gene regulation at transcript resolution with RNA-seq, Nature Biotechnology, advance online publication (2012).
S.A. Mortimer, C. Trapnell, S. Aviran, L. Pachter and J.B. Lucks, SHAPE-Seq: High throughput RNA structure analysis, Current Protocols in Chemical Biology, advance online publication.
A. Kleinman, M. Harel and L. Pachter, Affine and projective tree metric theorems, Annals of Combinatorics, advance online publication (2012).
A. Roberts and L. Pachter, Streaming fragment assignment for real-time analysis of sequencing experiments, Nature Methods, advance online publication (2012).
V. Hower, R. Starfield, A. Roberts, and L. Pachter, Quantifying uniformity in mapped reads, Bioinformatics, 28 (2012), 2680--2682.
L. Pachter, A closer look at RNA editing, Nature Biotechnology, 30 (2012), 246--247.
C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D.R. Kelley, H. Pimentel, S.L. Salzberg, J.L. Rinn and L. Pachter, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Nature Protocols, 7 (2012), 562--578.

SysCall - Distinguishing heterozygous sites from systematic errors

http://bio.math.berkeley.edu/SysCall/

SysCall is a logistic regression based classifier.
Given a list of candidate heterozygous genomic locations and a SAM file of sequenced reads, SysCall classifies each genomic location as either a heterozygous site or a systematic error and outputs the corresponding lists, along with the assigned posterior probabilities.
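
To make the idea of a logistic regression classifier with posterior probabilities concrete, here is a minimal sketch. It is not SysCall's code: the feature values, weights, bias, and the 0.5 threshold are all hypothetical placeholders chosen for illustration, and the features SysCall actually uses are described in its paper and manual.

```python
import math


def posterior_heterozygous(features, weights, bias):
    """Logistic model: P(heterozygous | features) = sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify_sites(sites, weights, bias, threshold=0.5):
    """Split candidate sites into two lists, each entry carrying its posterior.

    `sites` is a list of (site_name, feature_list) pairs. The feature
    definitions, weights, and threshold are hypothetical, not SysCall's.
    """
    heterozygous, systematic_errors = [], []
    for name, features in sites:
        p = posterior_heterozygous(features, weights, bias)
        if p >= threshold:
            heterozygous.append((name, p))
        else:
            systematic_errors.append((name, p))
    return heterozygous, systematic_errors


# Hypothetical two-feature example.
example_sites = [("chr1:100", [1.0, 0.0]), ("chr1:250", [0.0, 1.0])]
het_sites, error_sites = classify_sites(example_sites, weights=[2.0, -3.0], bias=0.0)
```

Each returned entry pairs a site with its estimated probability of being a true heterozygous site, which mirrors the two output lists described above.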

The submitted manuscript describing SysCall can be found here, and the lists of systematic errors reported in the paper are here.
The slides from a talk on SysCall given at the 2011 CSHL Meeting on The Biology of Genomes can be found here.


Manual: Click here to download the SysCall manual.

Paper
http://www.biomedcentral.com/1471-2105/12/451/

PubMed Commons: One post-publication peer review forum to rule them all?

http://gettinggeneticsdone.blogspot.com/2013/10/pubmed-commons-post-publication-peer-review.html

Useful Unix/Linux One-Liners for Bioinformatics

http://gettinggeneticsdone.blogspot.com/2013/10/useful-linux-oneliners-for-bioinformatics.html

Much of the work that bioinformaticians do is munging and wrangling around massive amounts of text. While there are some "standardized" file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times where knowing a little bit of Unix/Linux is extremely helpful, namely awk, sed, cut, grep, GNU parallel, and others.

This is by no means an exhaustive catalog, but I've put together a short list of examples using various Unix/Linux utilities for text manipulation, from the very basic (e.g., sum a column) to the very advanced (munge a FASTQ file and print the total number of reads, total number of unique reads, percentage of unique reads, most abundant sequence, and its frequency). Most of these examples (with the exception of the SeqTK examples) use built-in utilities installed on nearly every Linux system. These examples are a combination of tactics I use every day and examples culled from other sources listed at the top of the page.
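
For readers who want to see the logic behind the two examples mentioned above, here is a rough Python sketch of the same computations. It is not taken from the one-liners repository, and the function names are my own. It assumes plain, uncompressed FASTQ with exactly four lines per record.

```python
import io
from collections import Counter


def sum_column(lines, col, sep="\t"):
    """Sum one zero-based column of delimited text, skipping blank lines."""
    return sum(float(line.rstrip("\n").split(sep)[col])
               for line in lines if line.strip())


def fastq_summary(handle):
    """Summarize reads in a FASTQ stream (assumes 4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[line.strip()] += 1
    total = sum(counts.values())
    unique = len(counts)
    top_seq, top_count = counts.most_common(1)[0] if counts else ("", 0)
    return {
        "total_reads": total,
        "unique_reads": unique,
        "percent_unique": 100.0 * unique / total if total else 0.0,
        "most_abundant": top_seq,
        "most_abundant_count": top_count,
    }


# Tiny in-memory example; with a real file, pass open("reads.fastq") instead.
demo_fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n"
demo_summary = fastq_summary(io.StringIO(demo_fastq))
```

The shell one-liners do the same work in a single pipeline; the Python version is only meant to spell out the steps.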



The list is available as a README in this GitHub repo. This list is a start - I would love suggestions for other things to include. To make a suggestion, leave a comment here, or better - open an issue, or even better still - send me a pull request.

Useful one-liners for bioinformatics: https://github.com/stephenturner/oneliners

Alternatively, download a PDF here.

De Novo Transcriptome Assembly with Trinity: Protocol and Videos

http://gettinggeneticsdone.blogspot.com/2013/10/de-novo-transcriptome-assembly-trinity.html


Wednesday, October 2, 2013

Two tools for detecting the genetic basis of adaptation

1. DISENTANGLING THE EFFECTS OF GEOGRAPHIC AND ECOLOGICAL ISOLATION ON GENETIC DIFFERENTIATION

http://onlinelibrary.wiley.com/doi/10.1111/evo.12193/full

Populations can be genetically isolated both by geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user-friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte data sets.
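
The abstract's key modeling idea is a covariance that decays with both geographic and ecological distance. The sketch below illustrates one such function. The exponential form and the parameter names (`a0`, `a_d`, `a_e`, `a2`) are my assumptions for illustration; consult the paper and the BEDASSLE R package for the model actually fitted.

```python
import math


def covariance(geo_dist, eco_dist, a0=1.0, a_d=1.0, a_e=1.0, a2=1.0):
    """Covariance that decreases with geographic and ecological distance.

    Assumed form: (1 / a0) * exp(-(a_d * D + a_e * E) ** a2).
    All parameter names and default values are hypothetical.
    """
    return (1.0 / a0) * math.exp(-((a_d * geo_dist + a_e * eco_dist) ** a2))


def covariance_matrix(geo, eco, **params):
    """Build the population-by-population covariance matrix from two
    square distance matrices given as lists of lists."""
    n = len(geo)
    return [[covariance(geo[i][j], eco[i][j], **params) for j in range(n)]
            for i in range(n)]


# Three hypothetical populations.
geo = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
eco = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
omega = covariance_matrix(geo, eco, a_d=0.5, a_e=2.0)
```

In a function of this shape, the relative sizes of `a_e` and `a_d` express how much one unit of ecological distance counts compared with one unit of geographic distance, which is the kind of comparison the method is designed to quantify.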

http://genescape.ucdavis.edu/scripts-and-code/

2. INTEGRATING LANDSCAPE GENOMICS AND SPATIALLY EXPLICIT APPROACHES TO DETECT LOCI UNDER SELECTION IN CLINAL POPULATIONS

http://onlinelibrary.wiley.com/doi/10.1111/evo.12237/abstract

Uncovering the genetic basis of adaptation hinges on the ability to detect loci under selection. However, population genomics outlier approaches to detect selected loci may be inappropriate for clinal populations or those with unclear population structure because they require that individuals be clustered into populations. An alternate approach, landscape genomics, uses individual-based approaches to detect loci under selection and reveal potential environmental drivers of selection. We tested four landscape genomics methods on a simulated clinal population to determine their effectiveness at identifying a locus under varying selection strengths along an environmental gradient. We found all methods produced very low type I error rates across all selection strengths, but elevated type II error rates under “weak” selection. We then applied these methods to an AFLP genome scan of an alpine plant, Campanula barbata, and identified five highly supported candidate loci associated with precipitation variables. These loci also showed spatial autocorrelation and cline patterns indicative of selection along a precipitation gradient. Our results suggest that landscape genomics in combination with other spatial analyses provides a powerful approach for identifying loci potentially under selection and explaining spatially complex interactions between species and their environment.


Tuesday, October 1, 2013

Computational analysis and characterization of UCE-like elements (ULEs) in plant genomes

Ultraconserved elements (UCEs), stretches of DNA that are identical between distantly related species, are enigmatic genomic features whose function is not well understood. First identified and characterized in mammals, UCEs have been proposed to play important roles in gene regulation, RNA processing, and maintaining genome integrity. However, because all of these functions can tolerate some sequence variation, their ultraconserved and ultraselected nature is not explained. We investigated whether there are highly conserved DNA elements without genic function in distantly related plant genomes. We compared the genomes of Arabidopsis thaliana and Vitis vinifera; species that diverged ∼115 million years ago (Mya). We identified 36 highly conserved elements with at least 85% similarity that are longer than 55 bp. Interestingly, these elements exhibit properties similar to mammalian UCEs, such that we named them UCE-like elements (ULEs). ULEs are located in intergenic or intronic regions and are depleted from segmental duplications. Like UCEs, ULEs are under strong purifying selection, suggesting a functional role for these elements. As their mammalian counterparts, ULEs show a sharp drop of A+T content at their borders and are enriched close to genes encoding transcription factors and genes involved in development, the latter showing preferential expression in undifferentiated tissues. By comparing the genomes of Brachypodium distachyon and Oryza sativa, species that diverged ∼50 Mya, we identified a different set of ULEs with similar properties in monocots. The identification of ULEs in plant genomes offers new opportunities to study their possible roles in genome function, integrity, and regulation.
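
The two thresholds in the abstract (at least 85% similarity, longer than 55 bp) can be pictured as a simple filter over pairwise-aligned segments. This is only an illustration: how the authors computed similarity is not stated here, so the identity definition below (matching non-gap columns divided by alignment length) and the function names are assumptions.

```python
def percent_identity(aln_a, aln_b):
    """Percent of alignment columns where both sequences carry the same base.

    Inputs are two equal-length aligned strings; '-' marks a gap and never
    counts as a match. This definition of similarity is an assumption.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return 0.0
    matches = sum(1 for x, y in zip(aln_a.upper(), aln_b.upper())
                  if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


def is_ule_candidate(aln_a, aln_b, min_identity=85.0, min_length=55):
    """Apply the two thresholds quoted in the abstract."""
    return (len(aln_a) > min_length
            and percent_identity(aln_a, aln_b) >= min_identity)


# Hypothetical 60 bp segment with 6 mismatches (90% identity).
segment_a = "A" * 60
segment_b = "A" * 54 + "C" * 6
candidate = is_ule_candidate(segment_a, segment_b)
```

A real screen would also need the whole-genome alignment step and the check that elements fall outside genic regions, both of which are beyond this sketch.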

http://genome.cshlp.org/content/22/12/2455.long




Sunday, September 22, 2013

Population genomics from pool sequencing

Keywords:

  • Pool sequencing;
  • High throughput sequencing;
  • Neutrality tests;
  • Composite likelihood estimators;
  • Genetic differentiation

Abstract

Next generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θW, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, McDonald-Kreitman and HKA tests and Fst, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics vary with read depth and sample size in different evolutionary scenarios. As an application, we reanalyze sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budget, study of communities of individuals that are hard to isolate, or autopolyploid species.
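
As background for the statistics named in the abstract, the sketch below computes the classic versions of the Watterson estimator, pairwise diversity, and Tajima's D from individually sequenced, aligned haplotypes. These are the textbook formulas, not the paper's pool-corrected estimators, which additionally account for read depth, sequencing error, and ascertainment bias. The function names are my own, and at least two sequences are required.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def pairwise_diversity(seqs):
    """Mean number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs)


def tajimas_d(seqs):
    """Classic Tajima's D; returns NaN when it is undefined."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return float("nan")
    return (pairwise_diversity(seqs) - s / a1) / math.sqrt(variance)


# Four hypothetical aligned haplotypes.
haplotypes = ["AAAA", "AAAT", "AATT", "ATTT"]
theta_w = watterson_theta(haplotypes)
pi_hat = pairwise_diversity(haplotypes)
d_stat = tajimas_d(haplotypes)
```

Tajima's D compares the two diversity estimates, so a value near zero means pairwise diversity and the Watterson estimate roughly agree.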

Monday, September 16, 2013

MedSci

1. http://www.medsci.cn/

2. 2013 Natural Science Foundation Query and Analysis System (Basic Query Edition)
http://www.medsci.cn/sci/nsfc.do

3. MedSci 2013 Intelligent Journal Query System (2012 edition)
http://www.medsci.cn/sci/submit.asp

4. Paper services
http://www.medsci.cn/list.asp?classid=110

public library of bioinformatics

1. http://www.plob.org/
public library of bioinformatics

2. http://www.bioask.net/

Saturday, September 14, 2013

forest plot

https://mcfromnz.wordpress.com/2012/11/06/forest-plots-in-r-ggplot-with-side-table/#more-356


BroadE Workshop 2013 July 9-10

http://www.broadinstitute.org/gatk/guide/events?id=3093#materials

This workshop covered the core steps involved in calling variants with the Broad’s Genome Analysis Toolkit, using the “Best Practices” developed by the GATK team. View the workshop materials to learn why each step is essential to the calling process, what are the key operations performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.

Workshop materials


  • Day 1 - Opening remarks
  • Introduction to Next Generation Sequence Analysis
  • Introduction to the GATK
  • Mapping and duplicate marking (data pre-processing)
  • Local realignment around indels (RTC, IR)
  • Base quality score recalibration (BQSR) (BR, PR)
  • Compression with ReduceReads (RR)
  • Day 2 - Opening remarks
  • Variant calling (UG, HC)
  • Variant quality score recalibration (VQSR) (VR, AR)
  • Genotype phasing and refinement (PBT, RBP)
  • Functional annotation (VA)
  • Analyzing variant calls (SV, CV, VE)
  • Introduction to Parallelism (video not available yet) (NT, NCT, Q)



Supplemental materials


  • GenomeSTRiP: Discovery and genotyping of deletions
  • XHMM: Discovery and genotyping of copy number variation from exome read depth (PDF not available for download yet)

Monday, September 9, 2013

2013 Dragon Star Program: Bioinformatics

http://yixf.name/2013/09/04/%E8%8D%902013%E5%B9%B4%E9%BE%99%E6%98%9F%E8%AE%A1%E5%88%92%E4%B9%8B%E7%94%9F%E7%89%A9%E4%BF%A1%E6%81%AF%E5%AD%A6/

Course homepage

Slides download

Course videos

Course overview

  • Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
  • Day 2. Analyze RNA-seq data and small RNA-seq data.
  • Day 3. DNA methylation, Integration with other data types.
  • Day 4. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
  • Day 5. Analyze DNase-seq data and MNase-seq data. Integration with other data types.

Lab sessions

  • Day 1. Background. Basic Statistics. Introduce deep sequencing data. Motivational examples.
  • Day 2. Analyze ChIP-seq data on transcription factors and histone modifications. Integration with other sequencing data types.
  • Day 3. Analyze RNA-seq data and small RNA-seq data
  • Day 4. Analyze DNase-seq data and MNase-seq data. Integration with other data types
  • Day 5. DNA methylation, Integration with other data types