2013年6月27日星期四

distinguish effects of geographic and ecological isolation on differentiation

Populations can be genetically isolated by both geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user-friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte datasets.

http://arxiv.org/abs/1302.3274

Piping with samtools, bwa and bedtools

http://www.biostars.org/p/43677/

In this tutorial I will introduce some concepts related to unix piping. Piping is a very useful feature to avoid creation of intermediate use once files. It is assumed that bedtoolssamtools, and bwa are installed.
Lets begin with a typical command to do paired end mapping with bwa: (./ means look in current directory only)
#-t 4 is for using 4 threads/cores
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq > ./s1_1.sai
bwa aln -t 4 ./hg19.fasta ./s1_2.fastq > ./s1_2.sai
bwa sampe ./hg19.fasta ./s1_1.sai ./s1_2.sai ./s1_1.fastq ./s1_2.fastq > s1.sam
Supposed we wish to compress sam to bam, sort, remove duplicates, and create a bed file.
samtools view -Shu s1.sam > s1.bam
samtools sort s1.bam s1_sorted
samtools rmdup -s s1_sorted.bam s1_sorted_nodup.bam
bamToBed -i s1_sorted_nodup.bam > s1_sorted_nodup.bed
This workflow above creates many files that are only used once (such as s1.bam) and we can use the unix pipe utility to reduce the number intermediate files created. The pipe function is the character | and what it does is take the output from one command and sets it as input for next command (assuming next command knows how to deal with this input).
For example, when you type in head myfile.txt the command 'head' will read first 10 lines of myfile.txt and output it (by default) to the display (stdout) so you will see the first 10 lines printed on your screen. However, if you were to do something like this:
head myfile.txt | wc -l
you will instead get 10 printed out on your screen. What the pipe command did was take the output of head command and used it as input for the wordcount (wc) command. wc -l command will count the number of lines passed in (which is 10 since head by default prints the first 10 lines). An equivalent command would be
head myfile.txt > myfile_top10_lines.txt
wc -l myfile_top10_lines.txt
By using the pipe command we are able to eliminate the intermediate file that was generated (myfile_top10_lines.txt).
Below is an example of how pipe command can be used in samtools
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq > ./s1_1.sai
bwa aln -t 4 ./hg19.fasta ./s1_2.fastq > ./s1_2.sai
bwa sampe ./hg19.fasta ./s1_1.sai ./s1_2.sai ./s1_1.fastq ./s1_2.fastq | \
samtools view -Shu - | \
samtools sort - - | \
samtools rmdup -s - - | \
tee s1_sorted_nodup.bam | \
bamToBed > s1_sorted_nodup.bed
A couple of things I wanted to point out here:
  • The \ symbol after the pipe symbol tells the terminal that I want to keep writing command on the next line, that way you can have multiline commands instead of super-long command on one line.
  • The tee command takes the information being piped and creates a file while passing the data along the pipe, this command is useful if you want to save an intermediate file but still need to do additional processing. (Think of a T shaped drain pipe)
  • The - symbol in samtools it to tell the samtools program to take the input from pipe
  • It is not recommended you run this command all at once like I showed above if you have a giant bam file because sort will take forever. Running things in a pipe can be more space efficient since you are not storing intermediate files but can create issues/bottlenecks elsewhere. If problems arise, run the pieces without piping to troubleshoot.
  • The two bwa aln commands can be run in parallel (for example: on another machine) before the sampe command
Other common uses of pipes that I have not shown above it to pipe output into snp calling, you can find some examples here: (NOTE: pileupcommand is depreciated, the new way to call SNPs using samtools using mpileup command)
samtools merge - *.bam | tee merged.bam | samtools rmdup - - | tee rmdup.bam | samtools mpileup - uf ./hg19.fasta - | bcftools view -bvcg - | gzip > var.raw.bcf.gz
Compressing files with gzip can save a lot of space, but makes these files non-human readable (if you use the head or less command it will give you gibberish). To view these files you can use zless and zcat command.
zless var.raw.bcf.gz #similar to gzip -cd var.raw.bcf.gz | less
zcat var.raw.bcf.gz | head
In case you are wondering if there is a way to run bwa sampe without storing intermediate sai files the command below will do this:
bwa sampe ./hg19.fasta <(bwa aln -t 4 ./hg19.fasta ./s1_1.fastq) <(bwa aln -t 4 ./hg19.fasta ./s1_2.fastq) ./s1_1.fastq ./s1_2.fastq | samtools view -Shb /dev/stdin > s1.bam
Notice here that I used the -b command instead of -u command in samtools view. -u is useful for piping since it spits out an uncompressed bam so you don't have to waste CPU cycles compressing/decompressing at every step, however, if you have a file you want to store, it would be better to compress it with -b. Additionally, I used /dev/stdin instead of using the - symbol, they both mean the same thing (afaik) and I wanted to point it out that /dev/stdin might be a more explicit way of showing things. (ie samtools view -Shu /dev/stdin )

The Elements of Bioinformatics

http://elements.eaglegenomics.com/

2013年6月16日星期日

X11 forwarding over SSH: run remote graphical app and display locally

http://linuxcommando.blogspot.ca/2013/05/x11-forwarding-over-ssh-run-remote.html

In the modern networked environment, we often wish to run an application on a remote host while we are comfortably logged in on our local computer.
Assuming both machines are Linux-based, and the application runs on the graphical X desktop, the following approaches come to mind:
  • VNC
  • X11 forwarding over SSH
This article focuses only on X11 forwarding. X11 forwarding over SSH enables you to run a remote X app and display it locally, with traffic between the 2 hosts encrypted by SSH.
For X11 forwarding over SSH to work, both the SSH client and SSH server must be properly configured.
X11 forwarding must be enabled on The SSH server side. This is the machine where the application resides. To enable the feature, make sure the X11 configuration file /etc/ssh/sshd_config on the server contains this line:
X11Forwarding yes
If you edit the said file, you need to restart the sshd daemon for the change to take effect.
On Debian or Ubuntu systems, you restart the SSH daemon like this:
$ sudo service ssh restart
[ ok ] Restarting OpenBSD Secure Shell server: sshd.
$
On the ssh client side, you need to run SSH command with the proper parameters. For instance, suppose you want to run the xclockapplication on the remote SSH server and have it displayed back on the local client.
$ ssh -fX peter@192.168.1.112 xclock 
peter@192.168.1.112's password: 
$
The -X parameter allows an one-off X11 forwarding session.
The -f parameter instructs the SSH client to go to the background just before xclock is run.
If you want to permanently enable X11 forwarding for an user, insert this line in the user's own ~/.ssh/config file on the local host.
ForwardX11 yes 
With X11 forwarding permanently enabled for the client, you can leave out the -X parameter:
$ ssh -f peter@192.168.1.112 xclock 
peter@192.168.1.112's password: 
$
If X11 forwarding is not enabled on the SSH server, any attempt to tunnel X11 will fail with the following error message:
$ ssh -X peter@192.168.1.112 xclock 
peter@192.168.1.112's password: 
X11 forwarding request failed on channel 0
Error: Can't open display: 
$
If X11 forwarding is properly enabled on the server side, you will see a nice looking clock displayed on your local screen.