2. http://openwetware.org/wiki/Short_read_toolbox_Botany2012
Why open source software?
- Rocchini and Neteler 2012, Four Freedoms - An article that explains the importance of open source software in science.
Platforms
Currently available platforms:
Sequence format information
- Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
- FASTQ - Wikipedia's FASTQ page.
- FASTA - Wikipedia's FASTA page.
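To make the relationship between the fastq and fasta formats concrete, here is a minimal Perl sketch that converts four-line fastq records to fasta. It is not one of the Short Read Toolbox scripts. The subroutine name fastq_to_fasta and the two example reads are made up for illustration, and the sketch assumes every record is exactly four lines with no wrapped sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert four-line FASTQ records (held in one string) to FASTA.
# Assumption: each record is exactly four lines; wrapped records are not handled.
sub fastq_to_fasta {
    my ($fastq) = @_;
    my @lines = split /\n/, $fastq;
    my $fasta = '';
    while (@lines >= 4) {
        my ($id, $seq, $plus, $qual) = splice(@lines, 0, 4);
        die "Record does not start with '\@': $id\n" unless $id   =~ /^\@/;
        die "Third line does not start with '+': $plus\n" unless $plus =~ /^\+/;
        die "Sequence and quality lengths differ for $id\n"
            unless length($seq) == length($qual);
        $id =~ s/^\@/>/;    # FASTA headers begin with '>' instead of '@'
        $fasta .= "$id\n$seq\n";
    }
    return $fasta;
}

# Hypothetical two-read example.
my $example = "\@read1\nACGT\n+\nIIII\n\@read2\nTTGA\n+read2\n!!!!\n";
print fastq_to_fasta($example);
```

The quality line is checked and then dropped, because fasta has no place to store it.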
Alignment format information
Short-read quality control software
- TileQC - Requires R, RMySQL and MySQL.
- FastQC - A quality control tool for high throughput sequence data. A Java application.
- Short Read Toolbox - Scripts for quality control of Illumina data.
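As a rough idea of what a quality control script computes, the following Perl sketch decodes a fastq quality string into Phred scores and reports the mean. It is not taken from any of the tools listed above. The subroutine names are hypothetical, and the default ASCII offset of 33 is an assumption; older Illumina pipelines used an offset of 64, so check your data before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Decode a quality string into a list of Phred scores.
# Assumption: offset 33 unless the caller passes another value (e.g. 64).
sub phred_scores {
    my ($qual, $offset) = @_;
    $offset = 33 unless defined $offset;
    return map { ord($_) - $offset } split //, $qual;
}

# Mean Phred score of one read; returns 0 for an empty quality string.
sub mean_quality {
    my ($qual, $offset) = @_;
    my @scores = phred_scores($qual, $offset);
    return @scores ? sum(@scores) / @scores : 0;
}

# 'I' is ASCII 73, so each base scores 73 - 33 = 40.
printf "%.1f\n", mean_quality("IIII");
```

A read filter could then keep only records whose mean quality is above a threshold of your choosing.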
Open source de novo genome assemblers
- Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
- ABySS - Multi-threaded de novo assembly.
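For readers new to de Bruijn graphs, here is a toy Perl sketch of the basic idea: every k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. This is only an illustration and not how Velvet or ABySS are implemented; real assemblers also handle reverse complements, sequencing errors, and graph simplification. The subroutine name de_bruijn and the example reads are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a toy de Bruijn graph from a list of reads.
# Nodes are (k-1)-mers; each k-mer adds one edge from its prefix to its suffix.
# Returns a reference to a hash of hashes holding edge counts.
sub de_bruijn {
    my ($k, @reads) = @_;
    my %graph;
    for my $read (@reads) {
        # Reads shorter than k give an empty range and are skipped.
        for my $i (0 .. length($read) - $k) {
            my $kmer = substr($read, $i, $k);
            my $from = substr($kmer, 0, $k - 1);
            my $to   = substr($kmer, 1);
            $graph{$from}{$to}++;
        }
    }
    return \%graph;
}

# Two hypothetical overlapping reads, k = 3.
my $graph = de_bruijn(3, "ACGTC", "CGTCA");
for my $from (sort keys %$graph) {
    for my $to (sort keys %{ $graph->{$from} }) {
        print "$from -> $to ($graph->{$from}{$to})\n";
    }
}
```

Edges seen in both reads get a count of 2, which is the kind of coverage information an assembler can use when walking the graph to build contigs.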
Open source de novo transcriptome assemblers
- Trinity - De novo assembler designed specifically for transcriptomes.
- Rnnotator - Uses multiple calls to Velvet (see de novo genome assemblers).
- Trans-ABySS - Uses multiple calls to ABySS (see de novo genome assemblers).
- Oases - Post-processes Velvet output (see de novo genome assemblers) for transcriptomic work.
Hybrid assemblers (reference guided & de novo)
- YASRA - Yet Another Short Read Aligner.
- Aakrosh Ratan dissertation - Description of YASRA.
- Liston:Computer_Scripts - Scripts for post-processing of YASRA contigs.
Open source reference guided assemblers
- SOAP - Short Oligonucleotide Analysis Package.
- MAQ - Mapping and Assembly with Qualities.
- Bowtie - An ultrafast, memory-efficient short-read aligner.
- BWA - Burrows-Wheeler aligner.
SNP discovery and calling
Assembly viewers
Sequence query programs
- BLAST - Basic Local Alignment Search Tool.
- PLAN - A web application for conducting, organizing, and mining large-scale BLAST searches (limited to 1,000 queries).
- BLAT - BLAST-Like Alignment Tool.
Perl
A very brief example to demonstrate file input/output.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my (@temp, $in, $out);
my $inf  = "data.fq";
my $outf = "data_out.fq";

open($in,  "<", $inf)  or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";

# Read one four-line fastq record per iteration and write it as one tab-delimited line.
while (<$in>) {
    chomp($temp[0] = $_);     # First line is an identifier.
    chomp($temp[1] = <$in>);  # Second line is sequence.
    chomp($temp[2] = <$in>);  # Third line is '+', optionally followed by the identifier.
    chomp($temp[3] = <$in>);  # Fourth line is quality.
    print $out join("\t", @temp) . "\n";
}

# Report the file name, not the filehandle, if a close fails.
close $in  or die "$inf: $!";
close $out or die "$outf: $!";
- perlintro - Introduction to Perl with links to other documentation.
- BioPerl beginners - Introduction to BioPerl (be prepared for object-oriented code).
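The file input/output example above writes each fastq record as a single tab-delimited line. Going the other way is just as short. The sketch below is an illustrative addition, and the subroutine name tab_to_fastq is hypothetical. It assumes each line holds exactly four tab-separated fields in the order identifier, sequence, '+' line, quality.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one tab-delimited line back into a four-line FASTQ record.
# Assumption: fields are identifier, sequence, '+' line, quality, in that order.
sub tab_to_fastq {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields so the count check is reliable.
    my @fields = split /\t/, $line, -1;
    die "Expected 4 tab-separated fields, got " . scalar(@fields) . "\n"
        unless @fields == 4;
    return join("\n", @fields) . "\n";
}

# Hypothetical record as it would appear in the tab-delimited output.
print tab_to_fastq("\@read1\tACGT\t+\tIIII\n");
```

Keeping one record per line in between makes it easy to sort or filter reads with standard line-based tools before converting back.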
R project
- R project - Statistical programming environment.
- Bioconductor - R for biologists (microarray and next-generation sequencing data).
- APE - Analysis of phylogenetics and evolution R package.
- HT Sequence Analysis with R and Bioconductor
Computing resources
- Galaxy - Web-based front end for popular bioinformatic tools.
- Atmosphere - Virtual computing at iPlant.
- XSEDE portal - Extreme Science and Engineering Discovery Environment.