evolving all we are

Unknown

1. http://www.mpipz.mpg.de/231719/news_publication_5982981
2. Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses

Unknown

Proper recombination between homologs is critical for two reasons: first, the physical link between homologs helps establish their alignment on the meiotic spindle and correct segregation at the first meiotic division; and second, the exchange of DNA provides a nearly limitless source of genetic diversity

Unknown

http://viralzone.expasy.org/

Unknown

#fasta file: pa101.fasta
>gi|129295|sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED)
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNNSFNVATLPAE
KMKILELPFASGDLSMLVLLPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTS
VLMALGMTDLFIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHP
FLFLIKHNPTNTIVYFGRYWSP

#script: sequence_extractor.sh
#!/bin/bash

# The 1-based sequence extractor - sequence_extractor.sh
# No guarantees offered.

# usage:
# 1) download the script or copy the contents
# of the script and save it as sequence_extractor.sh
# 2) make it executable: chmod 755 sequence_extractor.sh
# 3) run the script: ./sequence_extractor.sh pa101.fasta 4 6
#    (arguments: FASTA file, 1-based start, 1-based end, both inclusive)

# drop the header line and merge the sequence lines into one string;
# the input file is only read, so no backup copy is needed
temp_var1=$(sed -e '1d' "$1" | tr -d '\r\n') || exit $?

# length of the selected region
temp_var2=$(( $3 - $2 + 1 )) || exit $?

# display the extracted sequence
echo "${temp_var1:$(($2-1)):$temp_var2}"
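
The same 1-based extraction can also be sketched as a small function that reads the FASTA text from standard input and never touches the file. The function name `extract_region` is made up for this illustration, and the sketch assumes a single-record FASTA file whose header line starts with ">".

```shell
#!/bin/sh
# extract_region START END
# Reads a single-record FASTA from standard input and prints residues
# START..END (1-based, inclusive). The function name is hypothetical.
extract_region() {
    awk -v s="$1" -v e="$2" '
        /^>/ { next }                      # skip the header line
        { sub(/\r$/, ""); seq = seq $0 }   # join the sequence lines
        END { print substr(seq, s, e - s + 1) }
    '
}

# made-up two-line record; positions 4 to 6 are D, L, L
printf '>demo\nQIKDLLVSSS\nTDLDTT\n' | extract_region 4 6   # prints DLL
```

With a real file this would be run as `extract_region 4 6 < pa101.fasta`.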

Unknown

http://www.sciencedirect.com/science/article/pii/S1040618212031965

If we accept an earlier colonization into the Americas, the story is not so neat, because there were substantial time periods between 60 and 30 thousand when colonization was possible environmentally—though these were relatively brief. But, if we look at the climatic record from Greenland rather than Antarctica (Fig. 6b), which should be more appropriate for northern latitudes, it would seem that between 55 and 25 thousand years ago, the warmer episodes were short lived extremes in a very rapidly fluctuating climate (Bender et al., 1994)—and perhaps it was the unpredictability of climate that made it difficult to work out how to adapt to the north. By the time the first Americans crossed Beringia, they seem to have learned to deal with such unpredictability because they survived the Younger Dryas fluctuations (Haynes, 2008).

last glacial maximum

Unknown

Unix and Perl Primer for Biologists

last updated: October 2012

all course material (documentation + files)
HTML documentation (recommended, has lots of useful links)
plain text documentation (in Markdown with some MultiMarkdown)
PDF documentation

If you download the entire course and uncompress the resulting zip file, then this should create a directory called 'Unix_and_Perl_course'. Inside this directory will be a 'Documentation' folder which has all three versions of the documentation (text, HTML, and PDF). The documentation is mostly aimed to be read from start to finish, though if you are comfortable with Unix you can jump to the sections on Perl.

http://korflab.ucdavis.edu/Unix_and_Perl/index.html

Unknown

Background

The availability of complete genomic sequences for hundreds of organisms promises to make obtaining genome-wide estimates of substitution rates, selective constraints and other molecular evolution variables of interest an increasingly important approach to addressing broad evolutionary questions. Two of the programs most widely used for this purpose are codeml and baseml, parts of the PAML (Phylogenetic Analysis by Maximum Likelihood) suite. A significant drawback of these programs is their lack of a graphical user interface, which can limit their user base and considerably reduce their efficiency.

Results

We have developed IDEA (Interactive Display for Evolutionary Analyses), an intuitive graphical input and output interface which interacts with PHYLIP for phylogeny reconstruction and with codeml and baseml for molecular evolution analyses. IDEA's graphical input and visualization interfaces eliminate the need to edit and parse text input and output files, reducing the likelihood of errors and improving processing time. Further, its interactive output display gives the user immediate access to results. Finally, IDEA can process data in parallel on a local machine or computing grid, allowing genome-wide analyses to be completed quickly.

Conclusion

IDEA provides a graphical user interface that allows the user to follow a codeml or baseml analysis from parameter input through to the exploration of results. Novel options streamline the analysis process, and post-analysis visualization of phylogenies, evolutionary rates and selective constraint along protein sequences simplifies the interpretation of results. The integration of these functions into a single tool eliminates the need for lengthy data handling and parsing, significantly expediting access to global patterns in the data.

http://www.biomedcentral.com/1471-2105/9/524

Unknown

Objective

The evolutionary history of human malaria parasites (genus Plasmodium) has long been a subject of speculation and controversy. The complete genome sequences of the two most widespread human malaria parasites, P. falciparum and P. vivax, and of the monkey parasite P. knowlesi are now available, together with the draft genomes of the chimpanzee parasite P. reichenowi, three rodent parasites, P. yoelii yoelii, P. berghei and P. chabaudi chabaudi, and one avian parasite, P. gallinaceum.

Methods

We present here an analysis of 45 orthologous gene sequences across the eight species that resolves the relationships of major Plasmodium lineages, and provides the first comprehensive dating of the age of those groups.

Results

Our analyses support the hypothesis that the last common ancestor of P. falciparum and the chimpanzee parasite P. reichenowi occurred around the time of the human-chimpanzee divergence. P. falciparum infections of African apes are most likely derived from humans and not the other way around. On the other hand, P. vivax split from the monkey parasite P. knowlesi in the much more distant past, during the time that encompasses the separation of the Great Apes and Old World Monkeys.

Conclusion

The results support an ancient association between malaria parasites and their primate hosts, including humans.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3081533/

Unknown

http://people.inf.ethz.ch/anmaria/publications.html

Anisimova, M. 2012. Parametric models of codon evolution, in Codon Evolution: mechanisms and models, eds. Cannarozzi G, Schneider A., Oxford University Press
Anisimova, M. and D. Liberles 2012. Detecting and understanding natural selection, in Codon Evolution: mechanisms and models, eds. Cannarozzi G, Schneider A., Oxford University Press
Roth, A., M. Anisimova, and G.M. Cannarozzi 2012. Measuring codon usage bias, in Codon Evolution: mechanisms and models, eds. Cannarozzi G, Schneider A., Oxford University Press
Schirrmeister, B.E., D.A. Dalquen, M. Anisimova and H.C. Bagheri 2012. Gene copy number variation and its significance in cyanobacterial phylogeny. BMC Microbiology 12:177, doi:10.1186/1471-2180-12-177
Schaper, E., A.V. Kajava, A. Hauser, and M. Anisimova 2012. Repeat or not repeat?—Statistical validation of tandem repeat prediction in genomic sequences. Nucl. Acids Res. doi:10.1093/nar/gks726

Unknown

Anisimova, M. (Ed.) 2012. Evolutionary Genomics: statistical and computational methods Springer (Humana Press):

vol 1: ISBN 978-1-61779-581-7

vol 2: ISBN 978-1-61779-584-8

Unknown

Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. This hypothesis was recently corroborated by studies of Tibetan highlanders, which showed that polymorphisms in candidate genes show signatures of natural selection as well as well-replicated association signals for variation in hemoglobin levels. We extended genomic analysis to two Ethiopian ethnic groups: Amhara and Oromo. For each ethnic group, we sampled low and high altitude residents, thus allowing genetic and phenotypic comparisons across altitudes and across ethnic groups. Genome-wide SNP genotype data were collected in these samples by using Illumina arrays. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, in the Amhara, SNP rs10803083 is associated with hemoglobin levels at genome-wide levels of significance. No significant genotype association was observed for oxygen saturation levels in either ethnic group. Approaches based on allele frequency divergence did not detect outliers in candidate hypoxia genes, but the most differentiated variants between high- and lowlanders have a clear role in pathogen defense. Interestingly, a significant excess of allele frequency divergence was consistently detected for genes involved in cell cycle control, DNA damage and repair, thus pointing to new pathways for high altitude adaptations. Finally, a comparison of CpG methylation levels between high- and lowlanders found several significant signals at individual genes in the Oromo.

http://arxiv.org/abs/1211.3053

Unknown

Demographic processes modulate genome-wide levels and patterns of genetic variation via impacting effective population size independently of natural selection. Such processes include the perturbation of population distributions from external events shaping habitat landscape and internal factors shaping the probability of contemporaneous alleles in a population (coalescence). Several patterns have recently emerged: spatial and temporal heterogeneity in population structure have different influences on the persistence of new mutations and genetic variation, multi-locus analyses indicate that gene flow continues to occur during speciation and the incorporation of demographic processes into models of molecular evolution and association genetics approaches has improved statistical power to detect deviations from neutral-equilibrium expectations and decreased false positive rates.

http://www.sciencedirect.com/science/article/pii/S136952660800037X

Unknown

http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12012/full

We briefly introduce R2G2, an R CRAN package to visualize spatially explicit biological data within the Google Earth interface. Our package combines a collection of basic graph-editing features, including automated placement of dots, segments, polygons, images (including graphs produced with R), along with several complex three-dimensional (3D) representations such as phylogenies, histograms and pie charts. We briefly present some example data sets and show the immediate benefits in communication gained from using the Google Earth interface to visually explore biological results. The package is distributed with detailed help pages providing examples and annotated source scripts with the hope that users will have an easy time using and further developing this package. R2G2 is distributed on http://cran.r-project.org/web/packages.

Unknown

http://www.experts-exchange.com/OS/Linux/Q_26606094.html

grep -A2 SELECT

That returns the matching line and the next two lines. I found the answer here:

http://linux.byexamples.com/archives/304/grep-multiple-lines/

You can also do

grep -A2 -i select

so that it matches both upper and lower case (-i means ignore case).
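
A quick way to see what -A2 does is to run it on a few lines of made-up input. The query text and the helper name `sample_query` below are invented for the demonstration.

```shell
# made-up input: the first line matches, three more lines follow
sample_query() {
    printf 'select id\nfrom t\nwhere x = 1\norder by id\n'
}

# prints the matching line plus the two lines after it;
# the fourth line ("order by id") is left out
sample_query | grep -i -A2 select
```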

Unknown

http://www.unix.com/shell-programming-scripting/70227-split-file-based-pattern-awk-grep-sed-perl.html

Code:

Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc

I would like to split the above file into 3 files, as below:

file1:

Code:

Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee

file2:

Code:

Bxxxx jjjj dddd
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss

file3:

Code:

Bffff mmmm iiiii
Ktttt eeeeeee
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc

Basically, each file needs to start with a "B" record, and a new file should be started whenever another "B" record comes up.

###################
awk '/^B/{close("file"f);f++}{print $0 > "file"f}' input.txt

perl -n -e '/^B/ and open FH, ">output_".$n++; print FH;' input.txt

csplit -k input.txt '/^B/' '{99}'
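
The awk approach can be tried safely in a scratch directory before running it on real data. This is only a sketch: the function name `split_on_b` is hypothetical, the input is a shortened, made-up version of the sample above, and the output files are named file1, file2, ... in the current directory.

```shell
# split_on_b FILE: start a new output file (file1, file2, ...) at every
# line that begins with "B". Lines before the first "B" record are skipped.
split_on_b() {
    awk '/^B/ { if (out) close(out); n++; out = "file" n }
         out  { print > out }' "$1"
}

# try it in a temporary directory so nothing existing is overwritten
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'B1\nK1\nR1\nB2\nK2\nB3\n' > input.txt
split_on_b input.txt
wc -l file1 file2 file3
```

Closing each finished file before opening the next keeps the number of open files at one, which matters when the input contains many "B" records.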

Unknown

1. population history could be attributed to the differentiation among populations.
2. It just fell within the range of the last glacial maximum (LGM), thus supporting that isolation of populations was ascribed to global climate change in Pleistocene.

Unknown

Restricted gene flow could not counteract the effect of genetic drift and resulted in differentiation among populations.

Unknown

However, closely related species delimitation based on morphologic analysis might be distorted by a high level of homoplasy (Nyffeler et al., 2005).

Unknown

With increasing evidence implicating important biological roles of lincRNAs in animal cells (Barsotti and Prives, 2010; Qureshi et al., 2010), a comprehensive genome-wide analysis of plant lincRNA is warranted.

Unknown

Therefore, as a first step, we reanalyzed these ncRNAs in an attempt to identify bona fide lincRNAs.

adj. in good faith; genuine; sincere

actual, sincere, true, real, positive

adv. in good faith; genuinely; sincerely

truly, sincerely, really, true

Unknown

Keeping R up to date on Ubuntu linux

R is included as part of the standard Ubuntu distribution, and can be installed with a command like

sudo apt-get install r-base

Obviously the software included as part of the standard distribution usually lags a little behind the latest version, and this is usually quite acceptable for most users most of the time. However, R is evolving quite quickly at the moment, and for various reasons I have decided to skip Ubuntu 12.10 (quantal) and stick with Ubuntu 12.04 (precise) for the time being. Since R 2.14 is included with Ubuntu 12.04, and I’d rather use R 2.15, I’d like to run with the latest R builds on my Ubuntu system.

Fortunately this is very easy, as there is a maintained repository for Ubuntu builds of R on CRAN. Full instructions are provided on CRAN, but here is the quick summary. First you need to know your nearest CRAN mirror – there is a list of mirrors on CRAN. I generally use the Bristol mirror, and so I will use it in the following.

sudo su
echo "deb http://www.stats.bris.ac.uk/R/bin/linux/ubuntu precise/" >> /etc/apt/sources.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
apt-get update
apt-get upgrade

That’s it. You are updated to the latest version of R, and your system will check for updates in the usual way. There are just two things you may need to edit in the second line above (the echo command). The first is the address of the CRAN mirror (here “www.stats.bris.ac.uk”). The second is the name of the Ubuntu distro you are running (here “precise”).
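
Since only the mirror and the distro name change, the sources.list entry can be built from those two values. The sketch below assumes the entry has the form "deb MIRROR/bin/linux/ubuntu CODENAME/", with a space before the codename; the function name `cran_repo_line` is hypothetical, and the function only prints the line rather than editing any file.

```shell
# cran_repo_line MIRROR_BASE_URL CODENAME
# Prints the sources.list entry for a CRAN mirror and an Ubuntu codename.
cran_repo_line() {
    printf 'deb %s/bin/linux/ubuntu %s/\n' "$1" "$2"
}

# the Bristol mirror and the "precise" release used in this post
cran_repo_line http://www.stats.bris.ac.uk/R precise

# to use the codename of the running system instead
# (assumes the lsb_release tool is installed):
# cran_repo_line http://www.stats.bris.ac.uk/R "$(lsb_release -cs)"
```

Once the printed line looks right, it can be appended to /etc/apt/sources.list as root.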

http://darrenjw.wordpress.com/2012/11/10/keeping-r-up-to-date-on-ubuntu-linux/

Unknown

http://onlinelibrary.wiley.com/doi/10.1002/gepi.21692/full

We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model—Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported.

Unknown

http://www.youtube.com/playlist?list=PL1D0F3DA630CFBF70

Unknown

I am prepared to make some concession on minor details, but I cannot compromise on fundamentals.

The loss of variation and the cost of domestication in genomes of crop species may compromise the level of natural defenses against pathogens and render them more susceptible than their wild relatives.

Unknown

The accuracy of repeat genotypes is contingent on the proper mapping of reads to repeat loci.

Unknown

Further, analysing repeats in personal genomes promises benefit not just to medical genetics and the diagnosis of repeat-related disorders but also to forensics and genealogy, where shorter and more stable tandem repeats can serve as DNA fingerprints to uniquely identify individuals.

Unknown

http://yixf.name/2012/11/07/%E8%8D%90linuxcast%E4%B8%8Ecodecademy/

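LinuxCast: a comprehensive platform for learning and discussing Linux. It offers free, professional Linux videos, tutorials, Q&A and discussion. By combining video with online Q&A, LinuxCast gives you a new and simple way to learn Linux with more professional content, so learning Linux is no longer obscure and hard to follow.

Codecademy: learn to program with Codecademy; simple, interactive and fun.
PS: I just briefly tried learning Python there. It is quite simple and fun, the descriptive English is not difficult, and it takes almost no effort to follow.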

Unknown

http://www.bartromgens.org/wordpress/?p=397

Unknown

http://mcfromnz.wordpress.com/2012/11/06/forest-plots-in-r-ggplot-with-side-table/

Unknown

http://blog.cloud-mes.com/2012/09/16/install-r-and-rstudio-in-ubuntu/

Installing R on Ubuntu is extremely easy if nothing goes wrong, but if something does, you had better be a very advanced Linux user :-)

Install R

Because the R version in the official Ubuntu repository is usually about half a year behind the R-project official source, it is recommended to install the latest R from the r-project.org repository.

vi /etc/apt/sources.list

# append below line to end of sources.list
# you can view mirror at http://cran.r-project.org/mirrors.html
deb http://ftp.ctex.org/mirrors/CRAN/bin/linux/ubuntu precise/

Import the GPG key and install r-base

cd ~
gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
apt-get update
apt-get install r-base

Install Oracle DB access package

You can find newer versions of the ROracle and DBI packages on CRAN. ROracle also requires a properly installed Oracle Instant Client.

Manually install ROracle

wget http://cran.r-project.org/src/contrib/DBI_0.2-5.tar.gz
R CMD INSTALL DBI_0.2-5.tar.gz
wget http://cran.r-project.org/src/contrib/ROracle_1.1-5.tar.gz
R CMD INSTALL --configure-args='--with-oci-inc=/opt/oracle/instantclient_11_2/sdk/include' ROracle_1.1-5.tar.gz

Install RStudio Server

apt-get install libssl0.9.8 # must be installed even if you have a newer version
apt-get install libapparmor1 apparmor-utils
wget http://download2.rstudio.org/rstudio-server-0.96.331-i386.deb
dpkg -i rstudio-server-0.96.331-i386.deb
rstudio-server verify-installation

Configure RStudio Server

echo 'rsession-memory-limit-mb=1000' > /etc/rstudio/rserver.conf
echo 'rsession-stack-limit-mb=4' >> /etc/rstudio/rserver.conf
echo 'rsession-process-limit=20' >> /etc/rstudio/rserver.conf
# Only add the line below if you will use proxy mode
echo 'www-address=127.0.0.1' >> /etc/rstudio/rserver.conf
groupadd rstudio

Setting up a proxy server for RStudio Server

This section is optional and assumes nginx is already installed on the server.

Do not forget to link the configuration into /opt/nginx/conf/vhosts

server {
  listen       80;
  server_name  cvprstudio;
  location / {
    proxy_pass http://localhost:8787;
    proxy_redirect http://localhost:8787/ $scheme://$host/;
  }
}

Setting auto restart and PATH

ln -s /usr/lib/rstudio-server/extras/init.d/debian/rstudio-server /etc/init.d/rstudio-server
vi /etc/init.d/rstudio-server

Append the lines below to /etc/init.d/rstudio-server, after the SCRIPTNAME line

ORACLE_BASE=/opt/oracle
ORACLE_HOME=/opt/oracle/instantclient_11_2
TNS_ADMIN=/opt/oracle/network/admin
NLS_LANG=AMERICAN_AMERICA.AL32UTF8

Now you can start or restart the server in the standard init.d way

/etc/init.d/rstudio-server restart

Add a user in RStudio

adduser --ingroup rstudio cindy
passwd cindy # set the password

Update package

It is usually better to upgrade r-base and the system-wide packages rather than per-user ones.
