Wednesday, January 30, 2013

Adding Chinese fonts in Ubuntu

Copy the fonts directly from Windows (C:\WINDOWS\Fonts) into the ~/.fonts directory on Linux (or one of its subdirectories).
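
The copy step can be sketched as a small shell function. This is only an illustration under a few assumptions: the function name `install_fonts` and the example mount point are hypothetical, the extension patterns are a guess at what the Windows font folder holds, and refreshing the cache with `fc-cache` is an extra step that the note above does not mention.

```shell
# Copy font files from a source folder into a per-user font folder.
# Usage: install_fonts SRC_DIR [DEST_DIR]   (DEST_DIR defaults to ~/.fonts)
install_fonts() {
    src=$1
    dest=${2:-$HOME/.fonts}
    mkdir -p "$dest" || return 1
    # Copy only font files; patterns are case-sensitive, so both cases are listed.
    for f in "$src"/*.ttf "$src"/*.TTF "$src"/*.ttc "$src"/*.TTC "$src"/*.otf "$src"/*.OTF; do
        if [ -e "$f" ]; then
            cp "$f" "$dest/"
        fi
    done
    # Refresh the fontconfig cache if fc-cache is installed (assumed optional).
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache -f "$dest"
    fi
    return 0
}

# Example (hypothetical mount point of the Windows partition):
# install_fonts /media/windows/WINDOWS/Fonts
```

Applications that were already running may need a restart before they pick up the new fonts.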

Tuesday, January 29, 2013

PopGenome: An efficient swiss army knife for population genetic & genomic analysis




The PopGenome library provides data analysis in population genetics and is programmed in the powerful, open-source, statistical computing environment R. Several polymorphism statistics, such as the number of segregating sites and nucleotide or haplotype diversity based FST measurements can be calculated. In addition, PopGenome contributes a lot of neutrality statistics such as the Tajima D and Rozas R2 test. Testing of the significance of these statistics requires generating bootstrap samples from a neutral model using a coalescent approach. To do this PopGenome provides the application of the MS program, which was written by Hudson (2002).
The sliding window method can be used to scan genetic data with different window and jump sizes. Bayesian methods are becoming increasingly important for population genetic studies and will be implemented in the next release. The PopGenome environment will also have the appropriate data handling and analysis capabilities needed for genome-wide resequencing projects.

Sunday, January 27, 2013

Relationship between nucleosome positioning and DNA methylation

http://www.nature.com/nature/journal/v466/n7304/full/nature09147.html

Nucleosomes compact and regulate access to DNA in the nucleus, and are composed of approximately 147 bases of DNA wrapped around a histone octamer [1,2]. Here we report a genome-wide nucleosome positioning analysis of Arabidopsis thaliana using massively parallel sequencing of mononucleosomes. By combining this data with profiles of DNA methylation at single base resolution, we identified 10-base periodicities in the DNA methylation status of nucleosome-bound DNA and found that nucleosomal DNA was more highly methylated than flanking DNA. These results indicate that nucleosome positioning influences DNA methylation patterning throughout the genome and that DNA methyltransferases preferentially target nucleosome-bound DNA. We also observed similar trends in human nucleosomal DNA, indicating that the relationships between nucleosomes and DNA methyltransferases are conserved. Finally, as has been observed in animals, nucleosomes were highly enriched on exons, and preferentially positioned at intron–exon and exon–intron boundaries. RNA polymerase II (Pol II) was also enriched on exons relative to introns, consistent with the hypothesis that nucleosome positioning regulates Pol II processivity. DNA methylation is also enriched on exons, consistent with the targeting of DNA methylation to nucleosomes, and suggesting a role for DNA methylation in exon definition.

Sunday, January 20, 2013

Identifying miRNAs, targets and functions

http://bib.oxfordjournals.org/content/early/2012/11/22/bib.bbs075.full

microRNAs (miRNAs) are small endogenous non-coding RNAs that function as the universal specificity factors in post-transcriptional gene silencing. Discovering miRNAs, identifying their targets and further inferring miRNA functions have been a critical strategy for understanding normal biological processes of miRNAs and their roles in the development of disease. In this review, we focus on computational methods of inferring miRNA functions, including miRNA functional annotation and inferring miRNA regulatory modules, by integrating heterogeneous data sources. We also briefly introduce the research in miRNA discovery and miRNA-target identification with an emphasis on the challenges to computational biology.

Grape RNA-Seq analysis pipeline environment

http://big.crg.cat/services/grape

Grape is a set of workflows that allow for easy exploration of RNA-Seq data. Among other features, it enables users to perform:
  • quality checks
  • read mapping
  • generation of expression and splicing statistics


http://bioinformatics.oxfordjournals.org/content/early/2013/01/16/bioinformatics.btt016.abstract

Friday, January 18, 2013

Field guide to next-generation DNA sequencers


2013 NGS Field Guide: Overview

These pages update the tables presented in Glenn’s (2011) “Field Guide to Next Generation DNA Sequencers” for 2013 values. Previous years’ tables have been archived: 2011 and 2012.
Please note that the contents of this guide are the opinion of Travis Glenn, and do not necessarily represent those of any other organisation or person with which he is associated. Neither the other authors of this blog nor John Wiley and Sons are responsible for the accuracy of any of the information supplied by Travis.
  • Table 1a-c.  “Grades” for common applications on various NGS instruments.  Other information from the original table 1 is relatively static.
  • Table 2a.  Run time, Millions of reads/run, Bases/read, and Yield/run for all common commercial NGS platforms.
  • Table 2b. Reagent costs/run, reagent costs/Mb, and minimum commercially available units for all common commercial NGS platforms.
  • Table 3a. List purchase price for all common commercial NGS platforms, ancillary equipment, and service contracts.
  • Table 3b. Computational resources required for all common commercial NGS platforms.
  • Table 3c. Errors and error rates for common commercial NGS platforms.
  • Table 4.  Advantages and Disadvantages for all common commercial NGS platforms.

Wednesday, January 9, 2013

AMOScmp - reference based alignment

1. http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS
The AMOS consortium is committed to the development of open-source whole genome assembly software.

2. http://www.cbcb.umd.edu/research/SR-assembly-tutorial.shtml

PAGIT - Post Assembly Genome Improvement Toolkit


Tools to automatically generate high-quality sequence by ordering contigs, closing gaps,
correcting sequence errors and transferring annotation.
PAGIT addresses the need for software to generate high quality draft genomes. It is based on a series of programs that we developed:
  1. ABACAS, that is able to contiguate contigs from a de novo assembly against a closely related reference.
  2. IMAGE, an iterative approach for closing gaps in assembled genomes using mate pair information. It is able to close gaps left open by the assembler in a draft genome, even when using the same data sets as used by the original assembler.
  3. iCORN, that enables errors in the consensus sequence to be corrected by iteratively mapping reads to the current assembly.
  4. RATT, a tool to transfer the annotation from a reference genome, or an earlier assembly, onto the latest assembly.

Sunday, January 6, 2013

delete or move/copy lots of files in Linux

http://houwenhui.gotoip2.com/archives/1700


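(1) Quickly deleting a large number of small files
    Today I ran into a cache directory holding millions of files; after more than 20 minutes of deleting, only one directory was gone.
    I found a neat trick online for deleting quickly. The idea is simple: use rsync to sync an empty directory over the target. A directory with tens of thousands of files is cleared almost instantly after you press Enter.
    The steps are as follows:
    1. Create an empty directory
        mkdir -p /tmp/rsync_blank
    2. Identify the target directory to be emptied
        /data/ooxx
    3. Delete by syncing with rsync (note the trailing "/" after each directory). Overall this is roughly an order of magnitude faster.
        rsync --delete-before -a -H -v --progress --stats /tmp/rsync_blank/ /data/ooxx/
    Option descriptions:
    --delete-before  the receiver deletes before the transfer starts
    --progress  show progress during the transfer
    -a  archive mode: transfer recursively and preserve file attributes (permissions, times, ownership)
    -H  preserve hard links
    -v  verbose output
    --stats  print transfer statistics
    Usually there is no need to display progress, so the following command is enough:
        rsync --delete-before -a -H /tmp/rsync_blank/ /data/ooxx/
  The cache directory we wanted to delete is now emptied.
Tips:
If SRC and DEST are of different types (file vs. directory), rsync reports an error.
If SRC and DEST are both files [f], the file's contents are emptied rather than the file being deleted.
If SRC and DEST are both directories [d], all files under the directory are deleted, leaving an empty directory.
Most importantly, it is very fast: handling several GB of files takes only seconds.
The core idea: rsync is effectively working by replacement.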
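
Because a mistyped target would wipe the wrong directory, the rsync trick can be wrapped in a small guard. The sketch below is only an illustration: the function name `empty_dir` is hypothetical, and it uses `-r` instead of `-a` as a deliberate change so that the permissions and timestamps of the throwaway empty directory are not copied onto the target directory.

```shell
# Empty a directory by syncing a fresh empty directory over it.
# Usage: empty_dir TARGET_DIR
empty_dir() {
    target=$1
    # Refuse obviously dangerous or missing targets.
    case "$target" in
        ""|"/") echo "refusing to empty '$target'" >&2; return 1 ;;
    esac
    if [ ! -d "$target" ]; then
        echo "not a directory: $target" >&2
        return 1
    fi
    blank=$(mktemp -d) || return 1
    # Trailing slashes matter: sync the *contents* of blank into target.
    # -r recurses; --delete-before removes everything that is absent from blank.
    rsync -r --delete-before "$blank"/ "$target"/
    status=$?
    rmdir "$blank"
    return $status
}

# Example: empty_dir /data/ooxx
```

The target directory itself is kept; only its contents are removed.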

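  (2) Quickly copying a large number of small files
  1. When moving or copying a large number of small files, cp and mv are inefficient; instead, pack the files with tar and unpack them at the destination.
  2. When transferring over a network, combine this with the nc command and send the data through a pipe and a TCP port.
  nc and tar can quickly transfer files and directories between two machines and are much simpler to set up than ftp or scp.
  Because nc is an extremely lightweight command, busybox usually includes it. When a Linux terminal, such as a Linux PDA,
  is connected to another Linux host over usblan, such an embedded device usually does not ship heavier services such as an ftp server or ssh server,
  so nc may be the only way to upload files.
  For example, to upload the test directory from machine A to machine B (192.168.0.11):
  On machine B, use nc to listen on a port (any port that is not in use) and unpack the received data with tar. -l selects listen mode.
  #nc -l 4444 |tar -C /tmp/dir -zxf -
  Then, on A, send the test directory with tar and nc, using the same port 4444.
  #tar -zcvf  -  test|nc 192.168.0.11 4444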
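
Point 1 describes a local copy through tar but only the network variant is shown. A minimal local sketch could look like the following; the function name `tar_copy` is hypothetical, and compression is left out on the assumption that it only costs CPU time when both folders are on the same machine.

```shell
# Copy the contents of SRC into DEST through a tar pipe.
# Usage: tar_copy SRC_DIR DEST_DIR
tar_copy() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # The first tar writes an archive of SRC to stdout;
    # the second tar reads it from stdin and unpacks it in DEST (-p keeps permissions).
    tar -C "$src" -cf - . | tar -C "$dest" -xpf -
}

# Example: tar_copy /data/many_small_files /backup/many_small_files
```

The return status is that of the unpacking tar, so a failure while reading the source may go unnoticed; check the output for error messages.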

Hello - I use to think I was good with a computer


This is a good post from SEQanswers that explains a lot about NGS software and computer setup.

FULL INSTRUCTION LIST: How To Transform Your Mac Into A Sequencing Analysis Machine

Introduction

I’m a newly hired RA from Jonathan Keats’s lab who will be helping with a bunch of new sequencing stuff. I have been working on installing the suite of sequencing programs on our new workstation. Before I started, I knew virtually nothing about Terminal, Unix or manipulating sequencing files. (In my mind, Terminal was where you board trains and Unix was some Talaxian from Star Trek: Voyager.) The learning curve has been steep, obviously, but Jonathan’s previous posts have been invaluable in making the adjustment.

I wanted to update those posts, however, because (a) some of the instructions have changed as newer versions of applications have appeared; (b) posts could be combined into one gigantic “master-post”; (c) some of the instructions are much more advanced/complicated than others; and (d) some helpful instructions for certain applications weren’t included. 

To make things easier on the next bright-eyed generation of programming-illiterate biologists, I have included specific code instructions at practically every step of the installation process. After a couple times mentioning a particular command, I will stop including it to save space, so if you’re starting from the middle of the instruction set, refer to previous instructions for more information.

If you find this compilation of instructions frustratingly simplistic, then I suggest you read through the previous posts, if only to read through Jonathan’s wry comments about the entire bioinformatics process. Hopefully, this post will be helpful to extreme sequencing/Unix novices like myself.

Please let me know if you have any questions, good luck, and happy hunting!

David K. Edwards V and Jonathan Keats

Before You Begin: Programs

Unix

Before you begin, you should familiarize yourself with Terminal (Applications>Utilities>Terminal). Or better yet, you should invest some time working through the Unix portion of the "Unix and Perl for Biologists" course (http://groups.google.com/group/unix-...for-biologists), made public by Keith Bradnam and Ian Korf at UC Davis. Or preorder their book on Amazon: http://www.amazon.com/UNIX-Perl-Resc...0189572&sr=8-1. Tell your PI it will be the best $50 investment of their career!

It’s really helpful for beginners understanding non-GUI file manipulations and gives you a good list of important Unix commands. (If you’re completely new to programming, it might be too confusing or complicated, but nobody said this was going to be easy.)

Download the entire course package: http://korflab.ucdavis.edu/Unix_and_Perl/index.html.

Here is a general list of helpful Unix commands:
  • To get a manual on any command, type "man command". Type "space" to page down, "b" to back-up, and "q" to quit.
  • To see what folder you are in currently, type "pwd".
  • To see what folders and files exist in the current directory, type "ls".
  • To move into a folder in the current directory, type "cd myfolder". (Note: You can move multiple levels downstream with "cd myfolder/myfolder2".)
  • To go back one directory, type "cd ..". (Note: You can move back multiple levels upstream with "cd ../..".)
  • To copy a file from the current directory to a downstream folder, type "cp myfile myfolder/". (Note: You can copy a file up one directory with "cp myfile ../".)
  • To move a file from the current directory, type "mv" instead of typing “cp”.
  • A folder immediately downstream of the root directory (i.e. absolute top of the tree) is always defined by "command /folder". (This means if you type "cd /something", it looks for the folder "something" downstream of the root directory.)
  • To note the current directory, type ".".
  • To change the permissions of the compiled applications, type "chmod 755 myfile". (This makes the file readable and executable by everyone but only writable by you. To allow everybody to do everything to the file, type “chmod 777 myfile”.)
  • To become a super user for a particular command (and become Superman!), type “sudo”.
  • To decompress a tarball file, type “tar -xvzf file.tar.gz”, where “file.tar.gz” is the compressed file.
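
To get a feel for these commands, here is a short practice session that can be pasted into Terminal. It is only a sketch: it works inside a throwaway folder made with `mktemp -d`, and `mkdir`, `echo`, `rm` and the file names are additions that are not in the list above.

```shell
# A throwaway practice session; everything happens in a temporary folder.
practice=$(mktemp -d)              # create a scratch folder
cd "$practice"                     # move into it
pwd                                # show the folder you are in
mkdir myfolder                     # make a subfolder
echo "hello" > myfile              # create a small file
cp myfile myfolder/                # copy it into the subfolder
mv myfile renamed                  # move (here: rename) the original
chmod 755 renamed                  # rwx for you, r-x for everyone else
ls -l . myfolder                   # list what exists now
tar -czf bundle.tar.gz myfolder    # pack the subfolder into a tarball
rm -r myfolder                     # remove the subfolder...
tar -xzf bundle.tar.gz             # ...and restore it from the tarball
cd ..                              # go back up one level
```

Nothing outside the scratch folder is touched, so the session can be repeated as often as you like.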


Xcode (http://developer.apple.com/technolog...ols/xcode.html)

NOTE: For some reason Apple has decided to mess with you and recent versions of Xcode (OS Lion and OS Mountain Lion compatible versions) no longer install some essential command line commands like "make" which you will use extensively to build the applications. However, there is an extremely simple solution to install these applications from within Xcode.

To install command line tools see (http://slashusr.wordpress.com/2012/0...nd-line-tools/)
  • Launch Xcode
  • Go to Preferences
  • Go to Downloads
  • Click the "Command Line Tools" radio button
  • Follow Prompts

You need to install Xcode on your computer so you can compile the various applications. If you start writing your own scripts, it is also a nice text editor, in our opinion.

The newest version available on the App store, Xcode 4.3, is only compatible with OSX Mountain Lion (10.8.x). If you have Leopard (10.5.x) or Snow Leopard (10.6.x) or Lion (10.7.x), then you can install the package from your OS installation disks. Insert Mac OS X Install Disc 2, open the “Xcode Tools” folder, and double click “XCodeTools.mpkg”. Otherwise, you need to sign up to be a developer and download it from the website.

MacPorts (http://www.macports.org/)

You need to install some packages to run certain applications. There are two programs to install those packages, Fink and MacPorts. There isn’t much difference between the two programs; in general, Fink is more conservative about upgrading packages than MacPorts, but both are perfectly acceptable. I simply chose MacPorts for this protocol.

R and Bioconductor

R: You will need R to perform statistical computations and generate graphs from your data. To install, visit http://www.r-project.org/, then select preferred CRAN mirror and follow the instructions.

Bioconductor: You will probably need Bioconductor to analyze your high-throughput genomic data. To install Bioconductor, you must have the most recent release version of R. The most common packages you will need to install are affy, simpleaffy, and gcrma.

To install these packages, starting first with affy, simply start R and type in the following:

Code:
source("http://bioconductor.org/biocLite.R")
biocLite("affy")
Press enter. R will automatically install the dependencies ‘Biobase’, ‘affyio’, and ‘preprocessCore’ during this installation.

To install simpleaffy, replace “affy” with “simpleaffy” in the above code and press enter. R will automatically install the dependencies ‘DBI’, ‘RSQLite’, ‘xtable’, ‘IRanges’, ‘AnnotationDbi’, ‘annotate’, ‘Biostrings’, ‘genefilter’, and ‘gcrma’.

There are three other dependencies you should install:

(NOTE: The version of cummeRbund that is installed through the current BioConductor development version is 1.0.0. The latest version, version 1.1.3 will be available as part of the Bioconductor development version 2.10, which will be made available in April 2012. For more information, please visit: http://compbio.mit.edu/cummeRbund/index.html.)

For more installation instructions, visit http://www.bioconductor.org/install/. (For this protocol, the current release version of R is 2.14, and the currently released Bioconductor version is 2.9.)

Before You Begin: Folders

You should establish a series of folders to manage your sequencing data and move around after each step is completed. You don’t necessarily have to follow this system of folders and subfolders, but all of our instructions for installing programs are based on this file hierarchy, so if you want to avoid confusion, and jump on our awesome folder-managing bandwagon, then read carefully!

Here is our system of folders and subfolders:

We have a main working directory called "ngs" in our $HOME directory (/Users/YourUserName/). This is our home base for data analysis, and all of our steps and scripts will be called from this folder. Here are our subfolders within “ngs”:
  • ngs/{applications,bwa,run_parameters,run_parameters,scripts,temp,tophat,tophat_fusion}
  • ngs/analyzed_read_files/{chipseq,exomes,genomes,matepair,rnaseq}
  • ngs/finaloutputs/{chipseq,exomes,genomes,matepair,rnaseq}
  • ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,downloads}
  • ngs/refgenomes/downloads/{ncbi36_hg18,grch37_hg19}
  • ngs/refgenomes/downloads/ncbi36_hg18/{annotation_files,reference_sequences}
  • ngs/refgenomes/downloads/grch37_hg19/{annotation_files,reference_sequences}

Each of these subfolders has its own subfolders, so instead of listing everything here, please visit the script “create_ngs_directorystructure_v4.sh” (http://seqanswers.com/forums/showthr...?t=4589&page=4, post #61) for more information. (To run the script, simply copy and paste the code included in that post when you immediately start Terminal, or when you are in the home directory. The corresponding files and folders will be created.)
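
If you only want the folders listed above, they can be created with a few loops. This is a sketch and not the script from the forum post: the function name `make_ngs_tree` is hypothetical, it covers only the folders named in this section, and it lists “run_parameters” once.

```shell
# Create the folders listed above under a chosen root (default: $HOME/ngs).
# Usage: make_ngs_tree [ROOT]
make_ngs_tree() {
    root=${1:-$HOME/ngs}
    # Top-level working folders.
    for d in applications bwa run_parameters scripts temp tophat tophat_fusion; do
        mkdir -p "$root/$d" || return 1
    done
    # One subfolder per experiment type, for intermediate and final results.
    for kind in chipseq exomes genomes matepair rnaseq; do
        mkdir -p "$root/analyzed_read_files/$kind" "$root/finaloutputs/$kind" || return 1
    done
    # Indexed reference genomes, one folder per aligner.
    for idx in bfast_indexed bowtie_indexed bwa_indexed; do
        mkdir -p "$root/refgenomes/$idx" || return 1
    done
    # Downloaded reference files, split by genome build.
    for build in ncbi36_hg18 grch37_hg19; do
        for sub in annotation_files reference_sequences; do
            mkdir -p "$root/refgenomes/downloads/$build/$sub" || return 1
        done
    done
}

# Example: make_ngs_tree            # creates $HOME/ngs/...
```

Because `mkdir -p` skips folders that already exist, running the function a second time does no harm.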

Before You Begin: Picking Genome Files

[Maq is no longer included in this protocol because of recent improvements to BWA. If you need to install Maq, please see Jon’s preceding post on how to install it.]

This step is important and can be the source of most issues. You need to pick a single source for all genome sequence files and annotations. We use Ensembl over UCSC for many reasons. For human genome reference files, we recommend the 1000 Genomes versions. They think about the human genome much more than you do, so give them some credit. Besides, many of the applications you will use are published by those groups, so running them is streamlined and less complicated.

We will be using BWA to align our sequencing data against the reference genome (see BWA installation instructions under “Installing Programs”). You might think to use ensembl (http://www.ensembl.org/info/data/ftp/index.html) to get your reference genome, but the full human genome file (Homo_sapiens.GRCh37.66.dna_rm.toplevel.fa.gz) exceeds the maximum character length allowed by BWA’s index command.

Instead, you should use the 1000 Genomes reference genome (ftp://ftp.sanger.ac.uk/pub/1000genom...ect_reference/). You need to save the reference genome onto your computer:
  • Copy the file “human_g1k_v37.fasta.gz” to your “ngs/refgenomes” folder.
  • Decompress the file by double clicking on it.
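
If you prefer to stay in Terminal, the decompression step can also be done there. The sketch below is an illustration only: the function name `unpack_fasta` is hypothetical, and counting the “>” header lines is an extra sanity check that the protocol does not ask for.

```shell
# Decompress a gzipped FASTA file and report how many sequences it contains.
# Usage: unpack_fasta FILE.fasta.gz
unpack_fasta() {
    gz=$1
    fasta=${gz%.gz}                 # same name without the .gz ending
    # -c writes to stdout, so the original .gz file is kept.
    gunzip -c "$gz" > "$fasta" || return 1
    # Every FASTA record starts with a ">" header line.
    grep -c '^>' "$fasta"
}

# Example, run from your ngs/refgenomes folder:
# unpack_fasta human_g1k_v37.fasta.gz
```

A count of zero would suggest that the download is incomplete or is not a FASTA file.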

Installing Applications

Welcome to the meat-and-potatoes of this somewhat bloated post: program installation. This section has been written in chronological order, meaning that I started with the first program and proceeded onward to the last program. Some of the programs require that you have installed other programs, and unfortunately, unless explicitly mentioned, I don’t know which programs have those requirements.

Therefore, I recommend you follow the same installation order for your own computer. This will certainly make things simpler for newbies like myself, especially since I included the commonly used programs (e.g. BWA) before the less commonly used programs (e.g. Cairo).

As mentioned above, if you’re skipping around, I have written next to each application if it requires one of the preceding applications. However, I can’t be sure that this information is correct, so if you encounter a problem during installation, please let us know and we can amend our instructions.

Final note: The version numbers of programs might be out-of-date, so please change the instructions based on those new version numbers. We will try to update this document periodically to avoid this problem, but you should be forewarned! 

Setting Your Path Directory

To run many of the applications, you will need to either place the applications in the PATH, define additional PATH locations, or note the location of the application each time you call it. To find the current PATH directories used by Unix, type "echo $PATH". You should see something similar to the following:

Code:
/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin
These folders are directly below the root directory and represent the places Unix looks when running an application. If you want to run any of these applications, you must download and compile the application. Before you begin installing any application, do the following (thanks to Nils Homer for the suggestion):

Create a directory in your home directory for the applications:
Code:
mkdir -p $HOME/local/bin
Make sure this directory is in your PATH by editing your .profile file. First check that the file exists (you should see a file called “.profile” in the listing):
Code:
ls -a
Open with nano by typing:
Code:
nano .profile
Add the following lines to your .profile file but DO NOT remove things in the current version: (you don’t need “sudo”):
Code:
export PATH=$HOME/local/bin:$PATH
Save your changes by typing "control O".
Exit nano by typing "control X".

Additionally, when you install applications, place the executable files in this directory so they are in a $PATH directory. You can either copy the application to the directory $HOME/local/bin or install using the install script “./configure --prefix=$HOME/local”.
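
If you would rather not edit the file by hand, the same line can be appended from Terminal. This is a sketch under assumptions: the function name `add_local_bin_to_profile` is hypothetical, and it only appends the export line when that exact line is not already present, so existing settings are left alone.

```shell
# Add $HOME/local/bin to PATH in a profile file, but only once.
# Usage: add_local_bin_to_profile [PROFILE_FILE]   (defaults to ~/.profile)
add_local_bin_to_profile() {
    profile=${1:-$HOME/.profile}
    # Single quotes keep $HOME and $PATH unexpanded, exactly as typed in nano.
    line='export PATH=$HOME/local/bin:$PATH'
    touch "$profile" || return 1
    # -F: fixed string, -x: whole line, -q: quiet. Append only if missing.
    if ! grep -Fxq "$line" "$profile"; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example: add_local_bin_to_profile
```

The change takes effect in new Terminal windows, or immediately after typing “. ~/.profile” (dot, space, file name).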

BWA (http://sourceforge.net/projects/bio-bwa/files/; change naming in instructions based on BWA version) 
NOTE: Reference indexes created with earlier versions of BWA do not work with version 0.6, so you need to re-index each reference if you have worked with an earlier version or, more importantly, if someone is providing you with pre-indexed reference files.
  • Click on the link above and download the newest version (called "bwa-0.6.1.tar.bz2").
  • Move the "bwa-0.6.1.tar.bz2" file to your "ngs/applications" folder.
  • Decompress the file by double clicking on it.
  • Open Terminal (if previously open, ensure you are in your home directory).
  • Navigate to the decompressed folder by typing:
Code:
cd ngs/applications/bwa-0.6.1
Compile the application by typing:
Code:
make
Lines of code will start appearing under your command. Make sure that no errors are listed!

You can confirm that the installation was successful by typing:
Code:
./bwa
This should print the BWA command options in the Terminal window. (The first line is “Program: bwa (alignment via Burrows-Wheeler transformation)”.)

Copy "bwa" to your path directory by typing:
Code:
cp bwa $HOME/local/bin
Now typing "bwa" into Terminal at any point, in any folder, will launch the bwa program.

SAMtools (http://sourceforge.net/projects/samtools/; change naming in instructions based on SAMtools version)
  • Click on the link above and download the newest version (called "samtools-0.1.18.tar.bz2").
  • Move the "samtools-0.1.18.tar.bz2" file to your "ngs/applications" folder.
  • Decompress the file by double clicking on it.
  • Open Terminal (if previously open, ensure you are in your home directory).
  • Navigate to the decompressed folder by typing:

Code:
cd ngs/applications/samtools-0.1.18
Compile the application by typing:
Code:
make
Lines of code will start appearing under your command. Make sure that no errors are listed!

You can confirm that the installation was successful by typing:
Code:
./samtools
This should print the SAMtools command options in the Terminal window. (The first line is “Program: samtools (Tools for alignments in the SAM format)”.)

Copy "samtools" and other valuable applications to your path directory by typing:
Code:
cp samtools $HOME/local/bin
cp bcftools $HOME/local/bin
cp vcfutils.pl $HOME/local/bin
(We are assuming you followed our path directory here. If not, then change “$HOME/local/bin” to your location of choice.)

Note: To save space, we have reduced the number of specific instructions, so instead of writing the exact lines of code required for commands, we will simply summarize them. This applies to decompressing the file, navigating to the decompressed folder, compiling the application, and copying to your path directory.
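
For reference, the four summarized steps (decompress, navigate, compile, copy) can be strung together in Terminal. The sketch below is not part of the original protocol: the function name `build_and_install` and its argument order are hypothetical, and it assumes that your `tar` detects the compression type on its own (recent GNU tar and bsdtar do; otherwise use `tar -xjf` for .tar.bz2 files).

```shell
# Unpack a source tarball, run make inside it, and copy the built programs.
# Usage: build_and_install TARBALL SOURCE_DIR BIN_DIR PROGRAM...
build_and_install() {
    tarball=$1
    srcdir=$2
    bindir=$3
    shift 3
    mkdir -p "$bindir" || return 1
    # Decompress into the current folder.
    tar -xf "$tarball" || return 1
    # Compile inside the unpacked folder; the subshell keeps your current folder unchanged.
    ( cd "$srcdir" && make ) || return 1
    # Copy each named program into the PATH folder.
    for exe in "$@"; do
        cp "$srcdir/$exe" "$bindir/" || return 1
    done
}

# Example, run from ngs/applications:
# build_and_install bwa-0.6.1.tar.bz2 bwa-0.6.1 "$HOME/local/bin" bwa
```

Watch the output of `make` for errors, just as in the step-by-step instructions above.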

GATK (ftp://ftp.broadinstitute.org/pub/gsa...latest.tar.bz2)

According to the website (http://www.broadinstitute.org/gsa/wi...ading_the_GATK, “Outside the Broad Institute”), before you install GATK, you need to install three applications: JVM (Java Virtual Machine), Apache Ant, and Git. GATK requires that your version of JVM is 1.6 or greater, and your version of Apache Ant is 1.7.1 or greater.

JVM (Java Virtual Machine)

You should have JVM already installed on your computer. To confirm this, open Terminal and type:
Code:
java -version
Three lines of code should appear, starting with java version “1.6.0_29”. To update Java, search “Java” on the Apple website and find the most recent version that corresponds to your operating system.
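
Comparing version numbers by eye is easy to get wrong, so here is a small helper for checking the GATK requirements (JVM 1.6 or greater, Ant 1.7.1 or greater). It is a sketch: the function name `version_ge` is hypothetical, it only understands purely numeric parts separated by dots or underscores, and the commented `sed` line assumes that `java -version` prints the version in double quotes on its first line.

```shell
# Return success (0) if version $1 is greater than or equal to version $2.
# Usage: version_ge INSTALLED REQUIRED
version_ge() {
    # Treat "1.6.0_29" like "1.6.0.29".
    a=$(printf '%s' "$1" | tr '_' '.')
    b=$(printf '%s' "$2" | tr '_' '.')
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}                     # first remaining part of each version
        y=${b%%.*}
        if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
        x=${x:-0}                      # missing parts count as 0
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0                           # all parts equal
}

# Example:
# java_version=$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')
# version_ge "$java_version" 1.6 && echo "Java is new enough for GATK"
```

The same function can be used for Ant, for example `version_ge 1.8.2 1.7.1`.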

Ant (http://ant.apache.org/)

You should already have Apache Ant installed on your computer. To confirm this, open Terminal and type:
Code:
ant -version
You should see something like this: “Apache Ant(TM) version 1.8.2 compiled on October 14 2011”. If that doesn’t work, here’s how to install Ant manually:

Click on the link above and download the latest version. (This version will probably be “apache-ant-1.8.2-bin.tar.bz2”.)
Move the “apache-ant-1.8.2-bin.tar.bz2” file to your “ngs/applications” folder.
Decompress the file.
Follow the somewhat complex instructions in the manual. To access the manual, click on the decompressed folder and look under docs/manual/install.html.

Git (http://git-scm.com/download)

Click on the link above and download the latest version. (This version will probably be “git-1.7.9.1-intel-universal-snow-leopard.dmg”.)
Install like any ordinary Mac application. (You thought it would be more complicated, right? You’re welcome!)

Now, onto installing GATK:

Click on the link above.
Move the “GenomeAnalysisTK-latest.tar.bz2” file to your "ngs/applications" folder.
Decompress the file and navigate to it.

To confirm the installation, type “java -jar GenomeAnalysisTK.jar --help”. (If you copy this text, check that the dashes are plain hyphens; otherwise type it by hand.)

You should see a message like: The Genome Analysis Toolkit (GATK) v1.4-30-gf2ef8d1, Compiled 2012/02/17 20:18:04.


Bowtie (http://sourceforge.net/projects/bowtie-bio/files/bowtie)

Click on the link above and download the latest version. (This version will probably be “bowtie-0.12.7-src.zip”.)
Move the “bowtie-0.12.7-src.zip” file to your “ngs/applications” folder.
Decompress the file and navigate to it.
Compile the application (“make”).
Copy "bowtie", "bowtie-build", and "bowtie-inspect" to your path directory.

To test the installation, navigate to the bowtie folder and type:
Code:
bowtie indexes/e_coli reads/e_coli_1000.fq
You should see a bunch of information stream onto the screen, and at the bottom, you should see: 

Code:
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
Boost (http://www.boost.org/)[Prerequisites: SAMtools, $PATH configuration.]

WARNING: Do not download the newest version of Boost (i.e., version 1.48.0)! This version will not natively work with this protocol. Instead, install any earlier version of Boost (we recommend version 1.47.0) and follow the instructions below. (For more information, and instructions on how to modify the latest version of Boost, please visit: http://seqanswers.com/forums/showthread.php?t=16637.)

Click on the link above and download version 1.47.0 or earlier, not the latest version. (MAKE SURE THIS IS VERSION “boost_1_47_0.tar.bz2” OR EARLIER.)
Move the “boost_1_47_0.tar.bz2” file to your “ngs/applications” folder.
Decompress the file and navigate to it.
Build/bootstrap the package by typing:
Code:
./bootstrap.sh
Type in the following command:
Code:
./bjam --prefix=$HOME/local --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install
This command will take awhile, so take your coworkers out for cappuccinos or something while you wait. Once it’s finished, the command will create “include” and “lib” subfolders in $HOME/local. You might get some error messages for which targets failed or were skipped, but ignore that because it won’t affect your other applications.
In the new "include" folder, create a subfolder "bam".
Using Terminal, navigate to the SAMtools folder within ngs/applications.
Copy the "libbam.a" file in the SAMtools folder to $HOME/local/lib by typing:
Code:
cp libbam.a $HOME/local/lib
Copy the header files (files ending in .h) in the SAMtools folder to $HOME/local/include/bam by typing:
Code:
cp *.h $HOME/local/include/bam
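
The last three steps (create the "bam" subfolder, copy "libbam.a", copy the headers) can be combined into one helper. This is a sketch only: the function name `stage_libbam` is hypothetical, and the check for "libbam.a" assumes that you have already run “make” in the SAMtools folder.

```shell
# Copy libbam.a and the SAMtools headers to where Tophat and Cufflinks look for them.
# Usage: stage_libbam SAMTOOLS_DIR [PREFIX]   (PREFIX defaults to $HOME/local)
stage_libbam() {
    samdir=$1
    prefix=${2:-$HOME/local}
    # libbam.a only exists after SAMtools has been compiled.
    if [ ! -f "$samdir/libbam.a" ]; then
        echo "libbam.a not found; run make in $samdir first" >&2
        return 1
    fi
    mkdir -p "$prefix/lib" "$prefix/include/bam" || return 1
    cp "$samdir/libbam.a" "$prefix/lib/" || return 1
    # Header files end in .h.
    cp "$samdir"/*.h "$prefix/include/bam/"
}

# Example, run from your home directory:
# stage_libbam ngs/applications/samtools-0.1.18
```
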
Tophat (http://tophat.cbcb.umd.edu/) [Prerequisites: Bowtie, SAMtools.]


Click on the link above and download the latest version. (This version will probably be “tophat-1.4.1.tar.gz”. Click on the option that says “Source Code.”)
Move the “tophat-1.4.1.tar.gz” file to your “ngs/applications” folder.
Decompress the file and navigate to it.
Build the package by typing
Code:
./configure --prefix=$HOME/local --with-bam=$HOME/local

Compile the application (by typing “make”).
Make the executable available in your $PATH directory by typing:
Code:
make install
To test the Tophat installation, please visit the download website (http://tophat.cbcb.umd.edu/tutorial.html; search under “Testing the installation”) and follow these instructions:

Click on the link above and download the file. (This file will probably be “test_data.tar.gz”.)
Decompress the folder and navigate to it.
To process the data, type:
Code:
tophat -r 20 test_ref reads_1.fq reads_2.fq
You should see lines of code after your command, beginning with something like the following:
Code:
[Mon May  4 11:07:23 2009] Beginning TopHat run (v1.1.1)
-----------------------------------------------
Cufflinks (http://cufflinks.cbcb.umd.edu/tutorial.html) [Prerequisites: Boost (SAMtools).] 

Click on the link above and download the latest version. (This version will probably be “cufflinks-1.3.0.tar.gz”. Click on the option that says “Source Code.”)
Move the “cufflinks-1.3.0.tar.gz” file to your “ngs/applications” folder.
Decompress the file and navigate to it.
Build the package (with Boost, so different from Tophat instructions!) by typing
Code:
./configure --prefix=$HOME/local --with-boost=$HOME/local --with-bam=$HOME/local

Compile the application (by typing “make”).
Make the executable available in your $PATH directory by typing:
Code:
make install
To test the installation, you will need to download the cufflinks test data (http://cufflinks.cbcb.umd.edu/tutorial.html#ref; look under “Testing the installation”). You can download the test text file anywhere (e.g. within your username folder) and navigate to that folder.

Process the test data by typing:
Code:
cufflinks test_data.sam
You should see the following at the beginning of your output:
Code:
You are using Cufflinks v1.3.0, which is the most recent release.
[bam_header_read] EOF marker is absent. The input is probably truncated.
VarScan (http://varscan.sourceforge.net/) (Prerequisites: Samtools?)

Click on the link above and download the latest version. (This version will probably be “VarScan.v2.2.8.jar”.)
Move the “VarScan.v2.2.8.jar” file to your “ngs/applications” folder.
Navigate to your “applications” folder.

To test the installation, type:

Code:
java -jar VarScan.v2.2.8.jar
You should see the following at the beginning of your output:

Code:
VarScan v2.2

USAGE: java net.sf.varscan.VarScan [COMMAND] [OPTIONS]
Picard (http://picard.sourceforge.net/)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be “picard-tools-1.62.zip”.)[/li]
[li]Move the “picard-tools-1.62.zip” file to your “ngs/applications” folder.[/li]
[li]Decompress the file.[/li]
[li]Navigate into the decompressed folder and copy all .jar files to your $PATH directory by typing:

Code:
cp *.jar $HOME/local/bin
While this step isn’t required, it makes things easier and the pipelines we provide use this concept.[/li]
[/ol]
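
The copy step can also be written as a small function that creates the destination folder first and quietly does nothing when no .jar files are found. This is a sketch only; “copy_jars” is a made-up name, and the folder in the example comment assumes that the zip file unpacks to “picard-tools-1.62”.

```shell
# Copy every .jar file from a source folder into a destination folder,
# creating the destination first. copy_jars is an illustrative name.
copy_jars() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    for jar in "$src"/*.jar; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$jar" ] || continue
        cp "$jar" "$dest"/
        echo "copied $(basename "$jar")"
    done
}

# Example (assumed paths):
# copy_jars "$HOME/ngs/applications/picard-tools-1.62" "$HOME/local/bin"
```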

snpEff (http://snpeff.sourceforge.net/download.html)

To install snpEff, you must install both the program and the corresponding reference genome. These instructions include installing the most recent human genome from Ensembl (which is provided on their website). If you use a different genome, make sure that your genome version matches your snpEff version. (In other words, in this example, the genome version is for “v2_0_5” and the snpEff version is for “v2_0_5d”.)

[ol]
[li]Click on the link above and download the latest version of snpEff. (This version will probably be “snpEff_v2_0_5d_core.zip”.)[/li]
[li]Move the “snpEff_v2_0_5d_core.zip” file to your “ngs/applications” folder.[/li]
[li]Decompress the file.[/li]
[li]In the link above, download the latest version of the reference genome (This version will probably be “snpEff_v2_0_5_GRCh37.65.zip”.)[/li]
[li]Move the “snpEff_v2_0_5_GRCh37.65.zip” file to your “ngs/applications” folder.[/li]
[li]Decompress the file.[/li]
[/ol]
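
Since the genome version has to match the snpEff version, you can compare the two file names before decompressing anything. The sketch below assumes the naming pattern of the two example files above (“snpEff_vX_Y_Z...”); “snpeff_version” is a made-up helper, not something shipped with snpEff.

```shell
# Pull the "vX_Y_Z" part out of a snpEff file name, ignoring any
# trailing letter such as the "d" in "v2_0_5d".
snpeff_version() {
    echo "$1" | sed -n 's/^snpEff_\(v[0-9]*_[0-9]*_[0-9]*\).*/\1/p'
}

core="snpEff_v2_0_5d_core.zip"
genome="snpEff_v2_0_5_GRCh37.65.zip"

if [ "$(snpeff_version "$core")" = "$(snpeff_version "$genome")" ]; then
    echo "versions match: $(snpeff_version "$core")"
else
    echo "version mismatch between $core and $genome"
fi
```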


At this point, you’re probably feeling comfortable with these instructions, maybe even patting yourself on the back for understanding them. Well, prepare for more confusion, because we’re entering the wonderful seafaring world of ports!

For the following applications, you will need to install additional ports on your computer. There are two websites you can use to install them: MacPorts (http://www.macports.org/) and fink (http://www.finkproject.org/). The difference between them is that, in general, fink is more conservative about upgrading packages than MacPorts, so while the MacPorts version will be newer, the fink version might be more stable. We selected MacPorts for installing our packages, so our instructions will be tailored toward that program.

MacPorts (http://www.macports.org/install.php) [Prerequisites: XCode.]

To install MacPorts, please visit that website. Choose your operating system under the “Mac OS X Package (.pkg) Installer” section. Install like any ordinary software application.

To test the installation, quit Terminal completely and reopen it so that MacPorts is available. To start the program, type “sudo port”. You should see:

Code:
MacPorts 2.0.3
Entering interactive mode... ("help" for help, "quit" to quit)
To install any port, type:

Code:
install program
where “program” is the name of the port you’re installing. This is the method for installing any of the ports used by the subsequent applications. As the program indicates, to exit MacPorts, type “quit” and press enter.
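
You can also install ports straight from the Terminal prompt with “sudo port install”, without entering interactive mode. The sketch below loops over the two ports this guide asks for (pkgconfig and cairo). The DRY_RUN switch is made up for this example; by default the commands are only printed.

```shell
# Ports named in this guide; extend the list as needed.
PORTS="pkgconfig cairo"

# Print the command for each port; set DRY_RUN=0 to actually run it.
install_ports() {
    for p in $PORTS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sudo port install $p"
        else
            sudo port install "$p"
        fi
    done
}

install_ports
```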

FastX (http://hannonlab.cshl.edu/fastx_toolkit/download.html)

MacPorts: Install “pkgconfig”. (The program is called “pkgconfig 0.26”, found on page 171 of the MacPorts website.)

[ol]
[li]Click on the link above and download libgtextutils. (This version will probably be “libgtextutils-0.6.tar.bz2”.)[/li]
[li]Move the “libgtextutils-0.6.tar.bz2” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]To install the program, like installing previous programs, type in “./configure” and press enter.[/li]
[li]To compile the application completely, type in “make” and press enter, then type in “sudo make install” and press enter.[/li]
[li]Make sure the program can identify “gtextutils” by typing:

Code:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
[/li]
[li]Once that command is processed, type: 

Code:
pkg-config --cflags gtextutils-0.1
You should see the following response:

Code:
-I/usr/local/include/gtextutils-0.1/
(If you have any questions about this step, or have any troubleshooting concerns about installing this application, please visit: http://hannonlab.cshl.edu/fastx_tool...nfig_email.txt.)[/li]

[li]Click on the link above and download the latest version of FastX. (This version will probably be “fastx_toolkit-0.0.13.tar.bz2”.)[/li]
[li]Move the “fastx_toolkit-0.0.13.tar.bz2” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]To install the program, like installing previous programs, type in “./configure”, then “make”, then “make install”.[/li]
[/ol]
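
One thing to keep in mind about the PKG_CONFIG_PATH step: an “export” typed into Terminal only lasts for that session, so if you open a new window before building FastX you will need to set it again. Here is a sketch of a helper that adds the folder only when it is not already listed, so it is safe to run more than once. “add_pkg_config_dir” is a made-up name.

```shell
# Prepend a directory to PKG_CONFIG_PATH unless it is already listed.
# add_pkg_config_dir is an illustrative helper name.
add_pkg_config_dir() {
    dir="$1"
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) PKG_CONFIG_PATH="$dir${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
    esac
    export PKG_CONFIG_PATH
}

add_pkg_config_dir /usr/local/lib/pkgconfig
echo "PKG_CONFIG_PATH=$PKG_CONFIG_PATH"
```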
Circos (http://mkweb.bcgsc.ca/circos/software/download/)
Before installing Circos, you will need to update your perl distribution to install all of Circos’s required packages. To install the packages, type the following in Terminal:

Code:
sudo perl -MCPAN -e shell
When it asks if you would like the program to configure things automatically, and choose the best CPAN mirror sites, type “yes”.

To install any package, type:

Code:
install program
where “program” is the name of the package you’re installing. Before installing these packages, however, you will need to install GD. (I know, it’s like Inception, with a program installation within a program installation within a program….)

GD (http://code.google.com/p/google-desk...tar.gz&can=2&q)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be “gd-2.0.35.tar.gz”.)[/li]
[li]Move the “gd-2.0.35.tar.gz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]To install the program, like installing previous programs, type in “./configure”, then “make”, then “sudo make install”.[/li]
[/ol]

Here is a list of the packages you will need to install (please install them in the following order because some of the packages require other packages):

YAML
Config::General (v2.50 or later)
GD::Polyline (requires YAML)
List::MoreUtils
Math::Bezier
Math::Round
Math::VecStat
Params::Validate
Readonly
Regexp::Common
Set::IntSpan (v1.16 or later)
Clone
Text::Format

Also, if you get the message that says something like this:

Code:
New CPAN.pm version (v1.9800) available.
  [Currently running version is v1.9456]
then type “install CPAN”, then “reload CPAN”, to update to the latest CPAN version. (This process takes a couple minutes.)
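
To see which of the packages in the list are already present before you start, you can ask perl to load each one. This is a rough sketch: “have_module” is a made-up helper, and it only checks that a module loads, not that it meets the minimum versions noted in the list.

```shell
# Modules from the list above, in the suggested installation order.
MODULES="YAML Config::General GD::Polyline List::MoreUtils Math::Bezier
Math::Round Math::VecStat Params::Validate Readonly Regexp::Common
Set::IntSpan Clone Text::Format"

# Succeeds only if perl exists and can load the named module.
have_module() {
    command -v perl >/dev/null 2>&1 && perl -M"$1" -e 1 >/dev/null 2>&1
}

for m in $MODULES; do
    if have_module "$m"; then
        echo "found    $m"
    else
        echo "missing  $m"
    fi
done
```

Anything reported as missing still needs to be installed from the CPAN shell as described above.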

All right, here are the instructions for installing Circos:

[ol]
[li]Click on the link above and download the bug fixes version. (This version will be something like “circos-0.56-1.tgz”.)[/li]
[li]Move the “circos-0.56-1.tgz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file.[/li]
[li]Click on the link above and download the latest version. (This version will probably be “circos-0.56.tgz”.)[/li]
[li]Move the “circos-0.56.tgz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file.[/li]
[li]Drag the decompressed files from the bug fixes folder into the folder of the latest Circos version. When prompted, choose “replace file”.[/li]
[/ol]

To test the Circos installation, please visit this website (http://circos.ca/software/download/tutorials/) and follow these instructions:

[ol]
[li]Click on the link above and download the tutorial file. (This version will be something like “circos-tutorials-0.56.tgz”.)[/li]
[li]Move the “circos-tutorials-0.56.tgz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file.[/li]
[li]Drag the decompressed tutorial file into the file of the latest Circos version. When prompted, choose “replace file”.[/li]
[li]Navigate to the “circos-0.56” folder.[/li]
[li]Access the tutorial by typing:

Code:
cd tutorials/2/2
[/li]
[li]Test the tutorial by typing:

Code:
../../../bin/circos -conf ./circos.conf
[/li]
[/ol]

You should see a series of commands flash onto the screen, eventually ending with:

Code:
debuggroup summary,output 4.85s created PNG image ./circos.png (839 kb)
debuggroup summary,output 4.86s created SVG image ./circos.svg (356 kb)
If you navigate to that folder manually (“circos-0.56/tutorials/2/2”) and click on the “circos.png” file, you should see a circular graph of each human chromosome in different colors.

Finally, copy the binary and library files to your path directory so you can just type "circos" instead of "bin/circos" each time you run the program. If you follow our folder hierarchy, type the following commands in order:

Code:
cd ngs/applications/circos-0.56/bin
cp circos $HOME/local/bin
cd ../lib
cp circos.pm $HOME/local/lib
Also, within the circos folder, to create a couple directories for your personal use, type the following commands in sequential order:

Code:
cd ngs/applications/circos-0.56
mkdir my_plots
mkdir my_reference_files
mkdir my_config_files
mkdir my_data_files
Once you’ve created those directories, you need to populate your reference files. (For more information, please visit: http://circos.ca/tutorials/.) When you visit that website, you can download the hg19 karyotype, decompress the corresponding file, and drag it into your newly created “my_reference_files” folder.
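
The four personal folders can also be created in one go. The sketch below uses “mkdir -p”, which does not complain if a folder already exists, so it can be re-run safely. “make_circos_dirs” is a made-up name, and the example path assumes the “circos-0.56” folder used elsewhere in this guide.

```shell
# Create the four personal folders inside a Circos folder.
# make_circos_dirs is an illustrative helper name.
make_circos_dirs() {
    base="$1"
    for d in my_plots my_reference_files my_config_files my_data_files; do
        mkdir -p "$base/$d"
    done
}

# Example (assumed path):
# make_circos_dirs "$HOME/ngs/applications/circos-0.56"
```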
BEDTools (http://code.google.com/p/bedtools/)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be “BEDTools.v2.15.0.tar.gz”.)[/li]
[li]Move the “BEDTools.v2.15.0.tar.gz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it. (NOTE: The file will be renamed to something like “BEDTools-Version-2.15.0”.)[/li]
[li]To install the program, type in “make clean”, then “make all”. You should see a series of commands being processed.[/li]
[li]To list the available binaries and confirm that they installed, type “ls bin”. You should see columns of files beginning with “annotateBed” in the upper lefthand corner and ending with “windowMaker” in the lower righthand corner.[/li]
[li]Copy the binaries to your PATH directory by typing:
Code:
cp bin/* $HOME/local/bin
[/li]
[/ol]
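
Copying binaries into “$HOME/local/bin” only helps if that folder is actually on your PATH. Here is a quick sketch of a check; “path_contains” is a made-up helper name.

```shell
# Succeeds if the given directory is one of the entries in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/local/bin"; then
    echo "$HOME/local/bin is on your PATH"
else
    echo "$HOME/local/bin is not on your PATH yet"
fi
```

If it reports that the folder is missing, the copied programs will not be found by name until the folder is added to your PATH.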
Pairoscope (http://pairoscope.sourceforge.net/) [Prerequisite: SAMTools]
Truthfully, installing this program is difficult, so brace yourselves, folks. Or as Samuel Jackson says in Jurassic Park, “hold onto your butts.” 

Before installing pairoscope, you need to install Cairo. To install Cairo, type:

Code:
sudo port install cairo
You should get the following response:

Code:
 --->  Computing dependencies for cairo
--->  Cleaning cairo
Also, before installing pairoscope, you need to install CMake (http://www.cmake.org/cmake/help/install.html). To install the program, click on the link above and download the latest version. (This version will probably be “cmake-2.8.7-Darwin64-universal.dmg”.) Simply install like you would a normal application. (Oh, and when the bouncing colorful triangle appears on your Dock, click to “install command line links”.)

Finally, here are the instructions to install pairoscope:

[ol]
[li]Click on the link above and download the latest version. (This version will probably be “pairoscope-0.2.tgz”.)[/li]
[li]Move the “pairoscope-0.2.tgz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to the applications folder.[/li]
[/ol]

To install pairoscope, type:

Code:
ccmake pairoscope-0.2
The screen will transform and you will see a series of capitalized instructions on the left and corresponding answers written in white text on the right. To toggle advanced mode, type “t”.

Scroll all the way down with the arrow keys until you reach “Page 2 of 2”. (NOTE: The following series of instructions are based on our folder architecture, and assume that you followed our instructions for installing SAMTools. If your folder architecture is different, please point ccmake to your corresponding SAMTools directories.)

To edit the Samtools include and library locations, follow these instructions:

[ul]
[li]Under “Samtools_INCLUDE_DIR”, type “/-----/local/include/bam”.[/li]
[li]Under “Samtools_LIBRARY”, type “/-----/local/lib/libbam.a”.[/li]
[/ul]

where “-----” is the full path to your home folder. (To find that path, type “cd” on the command line and press enter, then type “pwd” and press enter. Paste the resulting path in place of “-----” above.)
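
If you would rather not assemble those two paths by hand, the shell can print them for you to copy into ccmake. This sketch assumes SAMtools was installed under “$HOME/local” as in this guide; the PREFIX variable is made up for the example.

```shell
# Build the two values that ccmake asks for from the install prefix.
# PREFIX is an assumption: the $HOME/local prefix used for SAMtools here.
PREFIX="${PREFIX:-$HOME/local}"
SAMTOOLS_INCLUDE_DIR="$PREFIX/include/bam"
SAMTOOLS_LIBRARY="$PREFIX/lib/libbam.a"

echo "Samtools_INCLUDE_DIR = $SAMTOOLS_INCLUDE_DIR"
echo "Samtools_LIBRARY     = $SAMTOOLS_LIBRARY"

# Warn if either path is absent, since the build would then fail later.
[ -d "$SAMTOOLS_INCLUDE_DIR" ] || echo "warning: $SAMTOOLS_INCLUDE_DIR not found"
[ -f "$SAMTOOLS_LIBRARY" ] || echo "warning: $SAMTOOLS_LIBRARY not found"
```

CMake in general also accepts cache values on the command line with “-D”, so passing these two values that way may work as well, but we have only tried the interactive ccmake screen described here.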

To configure, type “c”. You should see a warning appear that starts with:

Code:
CMake Warning (dev) in CMakeLists.txt:
You can ignore this warning, so type “e”. To generate and exit, type “g”.

Now, pairoscope is ready. To make pairoscope, navigate to the “applications” folder and type:

Code:
cmake pairoscope-0.2
You should see a series of commands ending with:

Code:
 -- Build files have been written to: /-----/ngs/applications
where “-----” is the same prefix described above.

A new folder called “CMakeFiles” has been created in the “applications” folder. To make, navigate to the “applications” folder and type “make”. You will see a bunch of purple and green commands beginning with:

Code:
Scanning dependencies of target pairoscope
Copy the newly-created pairoscope program to your $PATH by typing:

Code:
cp pairoscope $HOME/local/bin
To test the installation, type “pairoscope”. You should see a series of commands beginning with the following:

Code:
Usage:   pairoscope [options]        
FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)
[ol]
[li]Click on the link above and download the latest version. (This version will probably be “Source Code for FastQC v0.10.0 (zip file)”. Please download the Source Code version.)[/li]
[li]Move the “fastqc_v0.10.0_source.zip” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it. (NOTE: The file will be renamed “FastQC”.)[/li]
[/ol]

And that’s it! (Seriously! According to the installation files: “Once unzipped it's ready to go.”)
HTSeq (http://pypi.python.org/pypi/HTSeq)
[ol]
[li]Click on the link above and download the latest version. (This version will probably be “HTSeq-0.5.3p3.tar.gz”.)[/li]
[li]Move the “HTSeq-0.5.3p3.tar.gz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Install the program by typing:
Code:
sudo python setup.py install
[/li]
[/ol]

You should see a series of commands being processed and ending with:
Code:
Finished processing dependencies for HTSeq==0.5.3p3
(For more information about program installation, please visit: http://www-huber.embl.de/users/ander.../overview.html.)
chimerascan (http://code.google.com/p/chimerascan/)
[ol]
[li]Click on the link above and download the latest version. (This version will probably be “chimerascan-0.4.5a.tar.gz”.)[/li]
[li]Move the “chimerascan-0.4.5a.tar.gz” file to your “ngs/applications” folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Build the program by typing:
Code:
python setup.py build
[/li]
[li]Install the program by typing:
Code:
sudo python setup.py install
[/li]
[/ol]

To test the installation, you need to access python. To do that, leave the directory (you can type “cd ../” to move into the “applications” folder) and type:

Code:
python
You should see something like:

Code:
Python 2.6.1
Type "help", "copyright", "credits" or "license" for more information.
To test that the chimerascan libraries are in your PYTHONPATH, type “import chimerascan”, then “chimerascan.__version__”. (Just in case that last command is obscured, you should type in “chimerascan” followed by a period, followed by two underscores, then “version”, then two underscores.) You should see the following:

Code:
'0.4.5'
Success! To exit python, type:

Code:
exit()
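
The same check can be done from the Terminal prompt without entering python. This sketch assumes the interpreter is called “python” on your machine; “chimerascan_version” is a made-up helper name.

```shell
# Print the chimerascan version, or a short message if python or the
# module is unavailable. "python" is assumed to be the interpreter name.
chimerascan_version() {
    command -v python >/dev/null 2>&1 || { echo "python not found"; return 1; }
    python -c 'import chimerascan; print(chimerascan.__version__)' 2>/dev/null \
        || { echo "chimerascan not importable"; return 1; }
}

chimerascan_version || true
```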
Congratulations! You now have a working computer that can handle just about any sequencing data you throw into it!
If you have any problems during the installation process, I recommend that you search online for the error message you received. That’s how I managed to resolve many of the difficulties I encountered during this whole process.
Additionally, you should read the README files (you can do so by typing “less README” when you are in the program’s directory) when you have problems, because they might give you helpful information about what’s going wrong with that program.
Finally, please remember that this document is a work in progress. Right now, we have created a system that can manage the installation of the current application versions, but these versions often change, and with those changes come new program requirements or permissions. If you encounter any problems with future versions, please respond to this thread (preferably with a solution!) and we will make the corresponding updates.
(This document was made with help from Venkata Yellapantula.)