Monday, August 5, 2013

Short Read Toolbox

1. http://brianknaus.com/software/srtoolbox/

2. http://openwetware.org/wiki/Short_read_toolbox_Botany2012

Why open source software?

Rocchini and Neteler (2012), Four Freedoms - An article explaining why open source software matters in science.

Platforms

Currently available platforms:
  • Illumina - Illumina (formerly Solexa).
  • 454 - 454/Roche.

Sequence format information

  • Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
  • FASTQ - Wikipedia's FASTQ page.
  • FASTA - Wikipedia's FASTA page.
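
To make the format translation mentioned above more concrete, here is a minimal Perl sketch. It is not taken from the Short Read Toolbox scripts; the subroutine names phred64_to_phred33 and fastq_to_fasta are hypothetical, and the sketch assumes the input qualities use the Phred+64 offset (as in Illumina pipeline versions 1.3 to 1.7) while the fastq standard uses Phred+33.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: shift a Phred+64 quality string to Phred+33.
# Each character encodes a score as ord(char) - offset, so subtracting
# 31 from every code point moves the offset from 64 to 33.
sub phred64_to_phred33 {
    my ($qual) = @_;
    return join '', map { chr(ord($_) - 31) } split //, $qual;
}

# Hypothetical helper: turn the first two lines of a fastq record
# into a two-line fasta record.
sub fastq_to_fasta {
    my ($id, $seq) = @_;
    $id =~ s/^@/>/;    # fastq identifiers start with '@', fasta with '>'
    return "$id\n$seq\n";
}

print phred64_to_phred33("hhhh"), "\n";
print fastq_to_fasta('@read1', 'ACGT');
```

The sketch does no validation: a quality string that is already Phred+33 would be shifted into unprintable characters, so a real script should check the encoding first.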

Alignment format information

Short-read quality control software

  • TileQC - Requires R, RMySQL and MySQL.
  • FastQC - A quality control tool for high throughput sequence data. A Java application.
  • Short Read Toolbox - Scripts for quality control of Illumina data.
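
As a rough idea of what a per-read quality check can look like, the following Perl sketch computes the mean Phred score of one quality string. It is not how TileQC, FastQC or the Short Read Toolbox scripts work internally; the subroutine name mean_quality is hypothetical, and the Phred+33 offset is an assumption about the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mean Phred score of a Phred+33 quality string.
sub mean_quality {
    my ($qual) = @_;
    return 0 if length($qual) == 0;    # avoid dividing by zero
    my $sum = 0;
    $sum += ord($_) - 33 for split //, $qual;    # decode each character
    return $sum / length($qual);
}

printf "%.1f\n", mean_quality('II5+');
```

A read could then be kept or discarded by comparing this mean against a threshold of your choosing.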

Open source de novo genome assemblers

  • Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
  • ABySS - Multi-threaded de novo assembly.
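
For readers new to de Bruijn graphs, here is a minimal Perl sketch of the idea. It is not Velvet's or ABySS's implementation: the subroutine name de_bruijn is hypothetical, and the sketch ignores reverse complements, sequencing errors and graph simplification, all of which real assemblers must handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a de Bruijn graph from a list of reads.
# Nodes are (k-1)-mers; every k-mer adds an edge from its prefix to its suffix.
sub de_bruijn {
    my ($k, @reads) = @_;
    my %graph;
    for my $read (@reads) {
        # The range is empty when the read is shorter than k.
        for my $i (0 .. length($read) - $k) {
            my $kmer   = substr($read, $i, $k);
            my $prefix = substr($kmer, 0, $k - 1);
            my $suffix = substr($kmer, 1);
            $graph{$prefix}{$suffix}++;    # count how often each edge is seen
        }
    }
    return \%graph;
}

# Two overlapping toy reads, k = 3.
my $graph = de_bruijn(3, 'ACGTA', 'CGTAC');
for my $node (sort keys %$graph) {
    print "$node -> ", join(',', sort keys %{ $graph->{$node} }), "\n";
}
```

Edges seen in both reads get a count of 2, which is the kind of coverage information an assembler can use when choosing a path through the graph.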

Open source de novo transcriptome assemblers

  • Trinity - De novo assembler designed specifically for transcriptomes.
  • Rnnotator - Uses multiple calls to Velvet (see de novo genome assemblers).
  • Trans-ABySS - Uses multiple calls to ABySS (see de novo genome assemblers).
  • Oases - Post-processes Velvet output (see de novo genome assemblers) for transcriptomic work.

Hybrid assemblers (reference guided & de novo)

Open source reference guided assemblers

  • SOAP - Short Oligonucleotide Analysis Package.
  • MAQ - Mapping and Assembly with Qualities.
  • Bowtie - An ultrafast, memory-efficient short read aligner.
  • BWA - Burrows-Wheeler aligner.

SNP discovery and calling

Assembly viewers

  • Tablet - Visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
  • SAMtools - Utilities for working with alignments in SAM and BAM formats, including a text-based alignment viewer (tview).

Sequence query programs

  • BLAST - Basic Local Alignment Search Tool.
  • PLAN - A web application for conducting, organizing, and mining large-scale BLAST searches (limited to 1,000 queries).
  • BLAT - BLAST-Like Alignment Tool.

Perl

A very brief example to demonstrate file input/output. It reads a fastq file four lines at a time and writes each record as a single tab-delimited line.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my (@temp, $in, $out);
my $inf = "data.fq";
my $outf = "data_out.fq";
open($in, "<", $inf) or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";
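# Each fastq record spans four lines; read them as a group.
while(<$in>){
  chomp($temp[0]=$_); # First line is an identifier.
  chomp($temp[1]=<$in>); # Second line is sequence.
  chomp($temp[2]=<$in>); # Third line is a '+' separator, optionally followed by the identifier.
  chomp($temp[3]=<$in>); # Fourth line is quality.
  print $out join("\t", @temp)."\n";
}
# Report the file name on failure, not the file handle.
close $in or die "Can't close $inf: $!";
close $out or die "Can't close $outf: $!";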
  • perlintro - Introduction to perl with links to other documentation.
  • BioPerl beginners - Introduction to BioPerl (be prepared for object oriented code).

R project

Computing resources

  • Galaxy - Web-based front end for popular bioinformatic tools.
  • Atmosphere - Virtual computing at iPlant.
  • XSEDE portal - Extreme Science and Engineering Discovery Environment.

Useful links

