1. http://www.mpipz.mpg.de/231719/news_publication_5982981
2. Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses
Friday, November 30, 2012
Wednesday, November 28, 2012
for two reasons - 两方面的原因
Proper recombination between homologs is critical for two reasons: first, the physical link between homologs helps establish their alignment on the meiotic spindle and correct segregation at the first meiotic division; and second, the exchange of DNA provides a nearly limitless source of genetic diversity.
Tuesday, November 27, 2012
extract subset of a fasta file
#fasta file: pa101.fasta
>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP
#script: sequence_extractor.sh
#!/bin/bash
# The 1-based sequence extractor - sequence_extractor.sh
# No guarantees offered.
# usage:
# 1) download the script or copy the contents
# of the script and save it as sequence_extractor.sh
# 2) make it executable: chmod 755 sequence_extractor.sh
# 3) run the script: ./sequence_extractor.sh pa101.fasta 4 6
# arguments: fasta file, 1-based start, 1-based end (single-record fasta only)
# skip the header line and merge the sequence lines into one string;
# the input file is only read, never modified
seq=$(awk 'NR > 1 { printf "%s", $0 }' "$1") || exit $?
# length of the selected region (end - start + 1)
len=$(( $3 - $2 + 1 ))
# display the extracted sequence (bash substring offsets are 0-based)
echo "${seq:$(( $2 - 1 )):$len}"
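As a hedged alternative sketch, the same 1-based extraction can be done entirely in awk, which reads the file without touching it. The function name `extract_region` and the throwaway demo file are assumptions for illustration, not part of the original script; like the script, it assumes a single-record fasta file.

```shell
#!/bin/bash
# extract_region FILE START END  (hypothetical helper; 1-based, inclusive)
extract_region() {
    # skip the header line, join the sequence lines, then cut the region;
    # awk's substr() is already 1-based, so no offset correction is needed
    awk -v s="$2" -v e="$3" '
        NR > 1 { seq = seq $0 }
        END    { print substr(seq, s, e - s + 1) }
    ' "$1"
}

# demo on a throwaway file holding the first two sequence lines of pa101.fasta
demo=$(mktemp)
printf '%s\n' '>gi|129295|sp|P01013|OVAX_CHICK' \
    'QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE' \
    'KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS' > "$demo"

region=$(extract_region "$demo" 4 6)
echo "$region"
```

Because the lines are joined before cutting, a region that spans a line break (for example positions 70 to 71) is returned as one contiguous string.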
Monday, November 26, 2012
the climatic record from Greenland - last glacial maximum
http://www.sciencedirect.com/science/article/pii/S1040618212031965
If we accept an earlier colonization into the Americas, the story is not so neat, because there were substantial time periods between 60 and 30 thousand when colonization was possible environmentally—though these were relatively brief. But, if we look at the climatic record from Greenland rather than Antarctica (Fig. 6b), which should be more appropriate for northern latitudes, it would seem that between 55 and 25 thousand years ago, the warmer episodes were short lived extremes in a very rapidly fluctuating climate (Bender et al., 1994)—and perhaps it was the unpredictability of climate that made it difficult to work out how to adapt to the north. By the time the first Americans crossed Beringia, they seem to have learned to deal with such unpredictability because they survived the Younger Dryas fluctuations (Haynes, 2008).
last glacial maximum
Unix and Perl Primer for Biologists - v3.1.1
last updated: October 2012
- all course material (documentation + files)
- HTML documentation (recommended, has lots of useful links)
- plain text documentation (in Markdown with some MultiMarkdown)
- PDF documentation
If you download the entire course and uncompress the resulting zip file, then this should create a directory called 'Unix_and_Perl_course'. Inside this directory will be a 'Documentation' folder which has all three versions of the documentation (text, HTML, and PDF). The documentation is mostly aimed to be read from start to finish, though if you are comfortable with Unix you can jump to the sections on Perl.
IDEA - calculate dN/dS ratios for multiple sequences, in parallel
Background
The availability of complete genomic sequences for hundreds of organisms promises to make obtaining genome-wide estimates of substitution rates, selective constraints and other molecular evolution variables of interest an increasingly important approach to addressing broad evolutionary questions. Two of the programs most widely used for this purpose are codeml and baseml, parts of the PAML (Phylogenetic Analysis by Maximum Likelihood) suite. A significant drawback of these programs is their lack of a graphical user interface, which can limit their user base and considerably reduce their efficiency.
Results
We have developed IDEA (Interactive Display for Evolutionary Analyses), an intuitive graphical input and output interface which interacts with PHYLIP for phylogeny reconstruction and with codeml and baseml for molecular evolution analyses. IDEA's graphical input and visualization interfaces eliminate the need to edit and parse text input and output files, reducing the likelihood of errors and improving processing time. Further, its interactive output display gives the user immediate access to results. Finally, IDEA can process data in parallel on a local machine or computing grid, allowing genome-wide analyses to be completed quickly.
Conclusion
IDEA provides a graphical user interface that allows the user to follow a codeml or baseml analysis from parameter input through to the exploration of results. Novel options streamline the analysis process, and post-analysis visualization of phylogenies, evolutionary rates and selective constraint along protein sequences simplifies the interpretation of results. The integration of these functions into a single tool eliminates the need for lengthy data handling and parsing, significantly expediting access to global patterns in the data.
http://www.biomedcentral.com/1471-2105/9/524
Genome sequences reveal divergence times of malaria parasite lineages
Objective
The evolutionary history of human malaria parasites (genus Plasmodium) has long been a subject of speculation and controversy. The complete genome sequences of the two most widespread human malaria parasites, P. falciparum and P. vivax, and of the monkey parasite P. knowlesi are now available, together with the draft genomes of the chimpanzee parasite P. reichenowi, three rodent parasites, P. yoelii yoelii, P. berghei and P. chabaudi chabaudi, and one avian parasite, P. gallinaceum.
Methods
We present here an analysis of 45 orthologous gene sequences across the eight species that resolves the relationships of major Plasmodium lineages, and provides the first comprehensive dating of the age of those groups.
Results
Our analyses support the hypothesis that the last common ancestor of P. falciparum and the chimpanzee parasite P. reichenowi occurred around the time of the human-chimpanzee divergence. P. falciparum infections of African apes are most likely derived from humans and not the other way around. On the other hand, P. vivax split from the monkey parasite P. knowlesi in the much more distant past, during the time that encompasses the separation of the Great Apes and Old World Monkeys.
Conclusion
The results support an ancient association between malaria parasites and their primate hosts, including humans.
Anisimova, M - on molecular evolution, and genomics
http://people.inf.ethz.ch/anmaria/publications.html
- Anisimova, M. 2012. Parametric models of codon evolution, in Codon Evolution: mechanisms and models, eds. Cannarozzi G, Schneider A., Oxford University Press.
- Anisimova, M. and D. Liberles 2012. Detecting and understanding natural selection, in Codon Evolution: mechanisms and models, eds. Cannarozzi G, Schneider A., Oxford University Press.
- Roth, A., M. Anisimova, and G.M. Cannarozzi 2012. Measuring codon usage bias, in Codon Evolution: mechanisms and models, eds. Cannarozzi G, Schneider A., Oxford University Press.
- Schirrmeister, B.E., D.A. Dalquen, M. Anisimova and H.C. Bagheri 2012. Gene copy number variation and its significance in cyanobacterial phylogeny. BMC Microbiology 12:177, doi:10.1186/1471-2180-12-177.
- Schaper, E., A.V. Kajava, A. Hauser, and M. Anisimova 2012. Repeat or not repeat? Statistical validation of tandem repeat prediction in genomic sequences. Nucl. Acids Res. doi:10.1093/nar/gks726.
Evolutionary Genomics: statistical and computational methods
Anisimova, M. (Ed.) 2012. Evolutionary Genomics: statistical and computational methods. Springer (Humana Press).
The genetic architecture of adaptations to high altitude in Ethiopia
Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. This hypothesis was recently corroborated by studies of Tibetan highlanders, which showed that polymorphisms in candidate genes show signatures of natural selection as well as well-replicated association signals for variation in hemoglobin levels. We extended genomic analysis to two Ethiopian ethnic groups: Amhara and Oromo. For each ethnic group, we sampled low and high altitude residents, thus allowing genetic and phenotypic comparisons across altitudes and across ethnic groups. Genome-wide SNP genotype data were collected in these samples by using Illumina arrays. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, in the Amhara, SNP rs10803083 is associated with hemoglobin levels at genome-wide levels of significance. No significant genotype association was observed for oxygen saturation levels in either ethnic group. Approaches based on allele frequency divergence did not detect outliers in candidate hypoxia genes, but the most differentiated variants between high- and lowlanders have a clear role in pathogen defense. Interestingly, a significant excess of allele frequency divergence was consistently detected for genes involved in cell cycle control, DNA damage and repair, thus pointing to new pathways for high altitude adaptations. Finally, a comparison of CpG methylation levels between high- and lowlanders found several significant signals at individual genes in the Oromo.
http://arxiv.org/abs/1211.3053
Thursday, November 22, 2012
Demographic processes shaping genetic variation
Demographic processes modulate genome-wide levels and patterns of genetic variation via impacting effective population size independently of natural selection. Such processes include the perturbation of population distributions from external events shaping habitat landscape and internal factors shaping the probability of contemporaneous alleles in a population (coalescence). Several patterns have recently emerged: spatial and temporal heterogeneity in population structure have different influences on the persistence of new mutations and genetic variation, multi-locus analyses indicate that gene flow continues to occur during speciation and the incorporation of demographic processes into models of molecular evolution and association genetics approaches has improved statistical power to detect deviations from neutral-equilibrium expectations and decreased false positive rates.
http://www.sciencedirect.com/science/article/pii/S136952660800037X
Quantitative visualization of biological data in Google Earth using R2G2, an R CRAN package
http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12012/full
We briefly introduce R2G2, an R CRAN package to visualize spatially explicit biological data within the Google Earth interface. Our package combines a collection of basic graph-editing features, including automated placement of dots, segments, polygons, images (including graphs produced with R), along with several complex three-dimensional (3D) representations such as phylogenies, histograms and pie charts. We briefly present some example data sets and show the immediate benefits in communication gained from using the Google Earth interface to visually explore biological results. The package is distributed with detailed help pages providing examples and annotated source scripts with the hope that users will have an easy time using and further developing this package. R2G2 is distributed on http://cran.r-project.org/web/packages.
Wednesday, November 21, 2012
In UNIX grep a phrase and adjacent lines
http://www.experts-exchange.com/OS/Linux/Q_26606094.html
grep -A2 SELECT
That returns the matching line and the next two lines. I found the answer here:
http://linux.byexamples.com/archives/304/grep-multiple-lines/
You can also do
grep -A2 -i select
so it matches upper or lower case (-i is ignore case)
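A small self-contained sketch of the context options, assuming GNU or BSD grep. The sample file contents are made up for illustration. `-A` prints lines after a match and `-B` prints lines before it.

```shell
#!/bin/bash
# build a throwaway sample file (contents are illustrative only)
sample=$(mktemp)
printf '%s\n' 'header' 'select id' 'from users' 'where id = 1' 'footer' > "$sample"

# matching line plus the two lines after it, ignoring case
after=$(grep -A2 -i SELECT "$sample")
# matching line plus the one line before it, ignoring case
before=$(grep -B1 -i SELECT "$sample")

echo "$after"
echo "---"
echo "$before"
rm -f "$sample"
```

Both greps also offer `-C N`, which prints N lines on each side of the match.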
split file into files by pattern
http://www.unix.com/shell-programming-scripting/70227-split-file-based-pattern-awk-grep-sed-perl.html
Code:
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
I would like to split the above file into 3 files like below,
file1:
Code:
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
file2:
Code:
Bxxxx jjjj dddd
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
file3:
Code:
Bffff mmmm iiiii
Ktttt eeeeeee
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Basically, each file needs to start with a "B" record, and a new file should start whenever another "B" record is encountered.
###################
awk '/^B/{close("file"f);f++}{print $0 > "file"f}' input.txt
perl -n -e '/^B/ and open FH, ">output_".$n++; print FH;' input.txt
csplit -k input.txt '/^B/' '{99}'
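To see the awk approach in action, here is a hedged, self-contained sketch that runs a slightly more defensive variant on a shortened copy of the sample data. The temporary directory, the variable names `out` and `n`, and the shortened input are assumptions for illustration. It also assumes the first line of the input starts with "B".

```shell
#!/bin/bash
# work in a throwaway directory so no existing files are overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# shortened sample in the same shape as the example above
printf '%s\n' 'Buuuu xxx bbb' 'Kmmmm rrr ssss uuuu' \
              'Bxxxx jjjj dddd' 'Kuuuu eeeee nnnn' \
              'Bffff mmmm iiiii' > input.txt

# start a new output file at every line beginning with "B";
# the file name is kept in a variable, which avoids the unparenthesized
# > "file"f form that some awk versions parse differently
awk '/^B/ { if (out) close(out); n++; out = "file" n } { print > out }' input.txt

ls file*
cat file2
```

Closing the previous file before opening the next one keeps the number of open file handles at one, which matters when the input contains many "B" records.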
Tuesday, November 20, 2012
attribute, ascribe - 归因于
1. Population history could be attributed to the differentiation among populations.
2. It just fell within the range of the last glacial maximum (LGM), thus supporting that isolation of populations was ascribed to global climate change in Pleistocene.
counteract - 抵消
Restricted gene flow could not counteract the effect of genetic drift and resulted in differentiation among populations.
homoplasy - 异源相似性
However, closely related species delimitation based on morphologic analysis might be distorted by a high level of homoplasy (Nyffeler et al., 2005).
Wednesday, November 14, 2012
Tuesday, November 13, 2012
update R in Ubuntu linux
Keeping R up to date on Ubuntu linux
R is included as part of the standard Ubuntu distribution, and can be installed with a command like
sudo apt-get install r-base
Obviously the software included as part of the standard distribution usually lags a little behind the latest version, and this is usually quite acceptable for most users most of the time. However, R is evolving quite quickly at the moment, and for various reasons I have decided to skip Ubuntu 12.10 (quantal) and stick with Ubuntu 12.04 (precise) for the time being. Since R 2.14 is included with Ubuntu 12.04, and I’d rather use R 2.15, I’d like to run with the latest R builds on my Ubuntu system.
Fortunately this is very easy, as there is a maintained repository for Ubuntu builds of R on CRAN. Full instructions are provided on CRAN, but here is the quick summary. First you need to know your nearest CRAN mirror – there is a list of mirrors on CRAN. I generally use the Bristol mirror, and so I will use it in the following.
sudo su
echo "deb http://www.stats.bris.ac.uk/R/bin/linux/ubuntu precise/" >> /etc/apt/sources.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
apt-get update
apt-get upgrade
That’s it. You are updated to the latest version of R, and your system will check for updates in the usual way. There are just two things you may need to edit in the echo line above. The first is the address of the CRAN mirror (here “www.stats.bris.ac.uk”). The second is the name of the Ubuntu distro you are running (here “precise”).
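The two values that may need editing can be kept in variables so the repository entry is assembled in one place. This is only a sketch: the variable names `mirror`, `codename` and `repo_line` are assumptions for illustration, and the script only prints the entry rather than writing to /etc/apt/sources.list.

```shell
#!/bin/bash
# both values are assumptions to adapt: pick your own mirror and release
mirror="http://www.stats.bris.ac.uk/R"
codename="precise"   # on Ubuntu, `lsb_release -cs` prints this name

# the entry that would be appended to /etc/apt/sources.list
repo_line="deb ${mirror}/bin/linux/ubuntu ${codename}/"
echo "$repo_line"
```

Printing the line first makes it easy to check the space between "ubuntu" and the release name before appending it as root.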
MULTIMIX - inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data
http://onlinelibrary.wiley.com/doi/10.1002/gepi.21692/full
We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model—Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported.
Friday, November 9, 2012
Thursday, November 8, 2012
compromise, render
I am prepared to make some concession on minor details, but I cannot compromise on fundamentals.
在一些细节上我可以作些让步,但在基本原则上我是不会妥协的。
The loss of variation and the cost of domestication in genomes of crop species
may compromise the level of natural defenses against pathogens
and render them more susceptible than their wild relatives.
contingent on
The accuracy of repeat genotypes is contingent on the proper mapping of reads to repeat loci.
promise benefit to
Further, analysing repeats in personal genomes promises benefit not just to medical genetics and the diagnosis of repeat-related disorders but also to forensics and genealogy, where shorter and more stable tandem repeats can serve as DNA fingerprints to uniquely identify individuals.
Wednesday, November 7, 2012
LinuxCast and Codecademy
http://yixf.name/2012/11/07/%E8%8D%90linuxcast%E4%B8%8Ecodecademy/
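LinuxCast: a comprehensive platform for learning and discussing Linux. It offers free, professional Linux videos, tutorials, Q&A and discussion. Through videos plus online Q&A, LinuxCast provides a new, simple way to learn Linux while keeping the content professional, so learning Linux is no longer obscure and hard to follow.
Codecademy: learn to program with Codecademy; it is simple, interactive and fun.
PS: I just briefly tried the Python course. It is quite simple and fun, and the descriptive English is not difficult, so it takes basically no effort to understand.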
Tuesday, November 6, 2012
Monday, November 5, 2012
Install R and RStudio in Ubuntu
Installing R on Ubuntu is extremely easy if you don’t hit any errors, but if you do, you’d better be a very advanced Linux user :-)
Install R
Because the R version in the official Ubuntu repositories is usually about half a year behind the official R-project release, it is recommended to install the latest R from the official r-project.org repository.
Install Oracle DB access package
You can find new versions of the ROracle and DBI packages on CRAN. You also need a properly installed Oracle Instant Client.
Install RStudio Server
Do some RStudio Server setting
Setting the proxy server for RStudio server
This section is optional and assumes nginx is already installed on the server.
Setting auto restart and PATH
Add a user in RStudio
Update package
It is usually better to upgrade r-base and its packages system-wide rather than per user.