2012年12月28日星期五

How to Add an Extra Vertical Axis to R Plots

http://www.carlislerainey.com/2012/12/28/how-to-add-an-extra-vertical-axis-to-r-plots/?utm_source=rss&utm_medium=rss&utm_campaign=how-to-add-an-extra-vertical-axis-to-r-plots


set.seed(3571)
n <- 100="100" br="br">a1 <- br="br" n="n" rnorm="rnorm">a2 <- br="br" n="n" rnorm="rnorm">
x <- br="br" null="null">y <- code="code" null="null">
x[1] <- a1="a1" br="br">y[1] <- a2="a2" code="code">
for (t in 2:n) {
x[t] <- -.2="-.2" .5=".5" a1="a1" br="br" t-1="t-1" t="t" x="x" y="y">y[t] <- a2="a2" br="br" t-1="t-1" t="t" y="y">}
par(mar = c(5,5,2,5))
plot(ts(x), ylab = \"x\")
par(new = T)
plot(ts(y), col = \"red\", axes = F, xlab = NA, ylab = NA)
axis(side = 4)
mtext(side = 4, line = 3, \"y\")

2012年12月24日星期一

update Zotero in ubuntu

This led to this forum post - http://forums.zotero.org/discussion/21759/update-failure-on-ubuntu-1110/ - and the idea that the problem was due to not having write access caused the problem. However, while granting write permissions to /opt and /opt/zotero was enough to download the update, it was not enough to install the update, and I had the error "The update could not be installed. Please make sure there are no other copies of Zotero running on your computer, and then restart Zotero to try again." My install is in /opt, simply using

# chmod -R o+w /opt/zotero /opt

finally solved the problem. After installing, I undid the procedure to return to the previous state with

# chmod -R o-w /opt/zotero /opt

vcflib - a set of tools for working with VCF

https://github.com/ekg/vcflib

2012年12月18日星期二

A model-based approach for analysis of spatial structure in genetic data

1. http://www.nature.com/ng/journal/v44/n6/abs/ng.2285.html

Characterizing genetic diversity within and between populations has broad applications in studies of human disease and evolution. We propose a new approach, spatial ancestry analysis, for the modeling of genotypes in two- or three-dimensional space. In spatial ancestry analysis (SPA), we explicitly model the spatial distribution of each SNP by assigning an allele frequency as a continuous function in geographic space. We show that the explicit modeling of the allele frequency allows individuals to be localized on the map on the basis of their genetic information alone. We apply our SPA method to a European and a worldwide population genetic variation data set and identify SNPs showing large gradients in allele frequency, and we suggest these as candidate regions under selection. These regions include SNPs in the well-characterized LCT region, as well as at loci including FOXP2OCA2 and LRP1B.

2012年12月17日星期一

Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data


Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data


Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40–270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17–43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3–26.9 kya), and our analysis yields no evidence for subsequent migration. Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).

https://code.google.com/p/dadi/


Interplay of recombination and selection in the genomes of Chlamydia trachomatis


Interplay of recombination and selection in the genomes of Chlamydia trachomatis


Background

Chlamydia trachomatis is an obligate intracellular bacterial parasite, which causes several severe and debilitating diseases in humans. This study uses comparative genomic analyses of 12 complete published C. trachomatis genomes to assess the contribution of recombination and selection in this pathogen and to understand the major evolutionary forces acting on the genome of this bacterium.

Results

The conserved core genes of C. trachomatis are a large proportion of the pan-genome: we identified 836 core genes in C. trachomatis out of a range of 874-927 total genes in each genome. The ratio of recombination events compared to mutation (ρ/θ) was 0.07 based on ancestral reconstructions using the ClonalFrame tool, but recombination had a significant effect on genetic diversification (r/m = 0.71). The distance-dependent decay of linkage disequilibrium also indicated that C. trachomatis populations behaved intermediately between sexual and clonal extremes. Fifty-five genes were identified as having a history of recombination and 92 were under positive selection based on statistical tests. Twenty-three genes showed evidence of being under both positive selection and recombination, which included genes with a known role in virulence and pathogencity (e.g., ompA, pmps, tarp). Analysis of inter-clade recombination flux indicated non-uniform currents of recombination between clades, which suggests the possibility of spatial population structure in C. trachomatis infections.

Conclusions

C. trachomatis is the archetype of a bacterial species where recombination is relatively frequent yet gene gains by horizontal gene transfer (HGT) and losses (by deletion) are rare. Gene conversion occurs at sites across the whole C. trachomatis genome but may be more often fixed in genes that are under diversifying selection. Furthermore, genome sequencing will reveal patterns of serotype specific gene exchange and selection that will generate important research questions for understanding C. trachomatis pathogenesis.

Evolutionary and population genomics of the cavity causing bacteria Streptococcus mutans


Evolutionary and population genomics of the cavity causing bacteria Streptococcus mutans

Streptococcus mutans is widely recognized as one of the key etiological agents of human dental caries. Despite its role in this important disease our present knowledge of gene content variability across the species and its relationship to adaptation is minimal. Estimates of its demographic history are not available. In this study, we generated genome sequences of 57 Smutans isolates, as well as representative strains of the most closely related species to Smutans (Streptococcus rattiStreptococcus macaccaeStreptococcus criceti), in order to identify the overall structure and potential adaptive features of the dispensable and core components of the genome. We also performed population genetic analyses on the core genome of the species aimed at understanding the demographic history, and impact of selection shaping its genetic variation. The maximum gene content divergence among strains was approximately 23%, with the majority of strains diverging by 5% to 15%. The core genome consisted of 1490 genes and the pan-genome approximately 3296. Maximum likelihood analysis of the synonymous site frequency spectrum (SFS) suggested that theSmutans population started expanding exponentially around 10,000 years ago (95% CI: 3,268 — 14,344 ya), approximately coincidental with the onset of human agriculture. Analysis of the replacement SFS indicated that a majority of these substitutions are under strong negative selection, and the remainder evolved neutrally. A set of 14 genes was identified as being under positive selection, most of which were involved in either sugar metabolism or acid tolerance. Analysis of the core genome suggested that among 73 genes present in all isolates of Smutans but absent in other species of the mutans taxonomic group, the majority can be associated with metabolic processes that could have contributed to the successful adaptation of Smutans to its new niche, the human mouth, and with the dietary changes that accompanied the origin of agriculture.

2012年12月13日星期四

Inferring the History of Population Size Change from Genome-Wide SNP Data

Dense, genome-wide single-nucleotide polymorphism (SNP) data can be used to reconstruct the demographic history of human populations. However, demographic inferences from such data are complicated by recombination and ascertainment bias. We introduce two new statistics, allele frequency-identity by descent (AF-IBD) and allele frequency-identity by state (AF-IBS), that make use of linkage disequilibrium information and show defined relationships to the time of coalescence. These statistics, when conditioned on the derived allele frequency, are able to infer complex population size changes. Moreover, the AF-IBS statistic, which is based on genome-wide SNP data, is robust to varying ascertainment conditions. We constructed an efficient approximate Bayesian computation (ABC) pipeline based on AF-IBD and AF-IBS that can accurately estimate demographic parameters, even for fairly complex models. Finally, we applied this ABC approach to genome-wide SNP data and inferred the demographic histories of two human populations, Yoruba and French. Our results suggest a rather stable ancestral population size with a mild recent expansion for Yoruba, whereas the French seemingly experienced a long-lasting severe bottleneck followed by a drastic population growth. This approach should prove useful for new insights into populations, especially those with complex demographic histories.

http://mbe.oxfordjournals.org/content/29/12/3653.long

2012年12月10日星期一

population genomic studies of bacteria - two cases

1.

After the bottleneck: Genome-wide diversification of the Mycobacterium tuberculosis complex by mutation, recombination, and natural selection




2.

Population Genomics of Chlamydia trachomatis: Insights on Drift, Selection, Recombination, and Population Structure



detect recombination from multiple sequence alignment

1. http://www.genetics.org/content/175/3/1251.full


Inference of Bacterial Microevolution Using Multilocus Sequence Data


We describe a model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance. The key assumption of our model is that recombination events introduce a constant rate of substitutions to a contiguous region of sequence. The method is applicable both to multilocus sequence typing (MLST) data from a few loci and to alignments of multiple bacterial genomes. It can be used to decide whether a subset of isolates share common ancestry, to estimate the age of the common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. It should also be useful in associating particular genetic events with the changes in phenotype that they cause. We show that the model outperforms existing methods of subdividing recombinogenic bacteria using MLST data and provide examples from Salmonella and Bacillus. The software used in this article, ClonalFrame, is available from http://bacteria.stats.ox.ac.uk/.


2. http://www.genetics.org/content/186/4/1435.full

Inference of Homologous Recombination in Bacteria Using Whole-Genome Sequences


Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces an homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available fromhttp://code.google.com/p/clonalorigin/.


2012年12月7日星期五

Extract private SNPs from multi-sample VCF file

http://www.biostars.org/p/58687/



I would use awk to find one sample having a genotype!="0/0"
 gunzip -c my.vcf.gz |
awk -F ' ' '/^#/{print; next;} {n=0; for(i=10;i<=NF;++i) { if($i!="." &&  index($i,"0/0")==0) n++;} if(n==1) print; } '
change the test according to your needs.

The Best Scientific Figures of 2012

http://www.wired.com/wiredscience/2012/12/science-figures-2012/?pid=5582&viewall=true


Humanity's Recent Evolution

It's easy to think that modern life has slowed human evolution, but the opposite is true. Most of humanity's genetic variation has accumulated in the last few thousand years during a period of extraordinary population growth. As a species, we're more evolvable than ever.
In the figure above, taken from an in-depth genetic analysis of 6,515 people, the amount of genetic variation collectively present at each location in the human genome is tabulated from before (left) and after (right) the population boom.
Citation: “Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.” By Wenqing Fu, Timothy D. O’Connor, Goo Jun, Hyun Min Kang, Goncalo Abecasis, Suzanne M. Leal, Stacey Gabriel, David Altshuler, Jay Shendure, Deborah A. Nickerson, Michael J. Bamshad, NHLBI Exome Sequencing Project & Joshua M. Akey. Vol. 491, No. 7426, 29 November 2012

2012年12月4日星期二

NetView - Detect Fine-Scale Population Structures from Genome-Wide Patterns of Variation

http://sydney.edu.au/vetscience/reprogen/netview/

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0048375?imageURI=info:doi/10.1371/journal.pone.0048375.g005




readscan - separate host and pathogen read sequences

Readscan is a program to identify pathogen/contaminant sequences in whole genome shotgun sequencing datasets.

scp multiple files - a bash script

#!/bin/bash 
filelist=$(ssh 'echo *' scp_user@scp_host:remotedir/) 
for File in "$filelist" 
do 
scp scp_user@scp_host:remotedir/"$FILE" /home/ugen/ 
done

TBA - work with multiple alignment MAF format


The resulting TBA multiple alignment is now ready for analysis. Several tools (in addition
to the above mentioned maf project) are available to aid you in this process and are
described in the README. Several are worth noting here:

maf order orders the species in each alignment block to some speci ed order. It will also
remove unwanted species from the multiple alignment (rather than rerunning tba
with a subset of species).
mafFind extracts a sub-region of the multiple alignment.
maf2fasta takes a projected MAF le and re-formats it to either MultiPipMaker format or
multi-FASTA format, representing the entire reference species' sequence, but only
5the alignable portions of all other species' sequences. This program is particularly
useful since the MAF speci cation is relatively new
(see http://genome.ucsc.edu/goldenPath/help/maf.html)
and is not yet as universal as the FASTA format.

http://www.bx.psu.edu/miller_lab/

2012年12月3日星期一

consist


consist of 由…构成
consist in 在于;存在于
consist with 符合;与…一致

The data consists in 370 founder individuals from the HapMap Phase III datasets, of the CEU, TSI, CHB and JPT populations.