2012年2月6日星期一

why we study phenotypic variations

the proteome is closer to the phenotype than the genome or the transcriptomeand as such may be more directly responsive to natural selectionand thus closely linked to adaptation


I just cite a sequence here.

RNCEP: global weather and climate data at your fingertips

RNCEP: global weather and climate data at your fingertips

reach climate data by the RNCEP package.

2012年2月3日星期五

R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data

http://nar.oxfordjournals.org/content/early/2012/01/28/nar.gks047.long




Dynamic models in biology

http://www.cam.cornell.edu/~dmb/DMBsupplements.html

with R codes.

How to be a quantitive ecologist

text with R code:
http://greenmaths.st-andrews.ac.uk/index.aspx

align facets in case of factor variables (time series)

如果X轴是时间序列变量,那么如何把这些X轴对齐排列呢?

Here is an example.



ex <- structure(list(Time = structure(c(1278428400, 1278429300,
1278430200, 1278431100, 1278432000, 1278432900, 1278433800, 1278434700,
1278435600, 1278436500, 1278437400, 1278438300, 1278439200, 1278440100,
1278441000, 1278441900, 1278442800, 1278443700, 1278444600, 1278445500,
1278446400, 1278447300, 1278448200, 1278449100, 1278450000, 1278450900,
1278451800, 1278452700, 1278453600, 1278454500, 1278455400, 1278456300,
1278457200), class = c("POSIXt", "POSIXct")), `Temperature 1` = c(23.4994760481,
23.5691608609, 23.4065467209, 23.3366466476, 23.7551289027, 23.8713964903,
23.8017531186, 23.8713964903, 23.8017531186, 23.7319104094, 23.7086908787,
23.7086908787, 23.6390259428, 23.6390259428, 23.7319104094, 23.7086908787,
23.7086908787, 23.7783463702, 23.8713964903, 23.9874496028, 24.0572606946,
24.2428788024, 24.3126625639, 24.4750221758, 24.4054436873, 24.4518300234,
24.4286371978, 24.6375394346, 24.7768442676, 24.6375394346, 24.6839132299,
24.4982136668, 24.8695770905), `Temperature 2` = c(26.1917071192,
26.4004768163, 26.7251744961, 26.6092703221, 26.5627215105, 27.0035834406,
26.7251744961, 26.5627215105, 26.7019928016, 26.6324503199, 26.8412800193,
26.7251744961, 26.8876497084, 26.8180959444, 26.7715392541, 26.7715392541,
26.6788115483, 26.8180959444, 26.8644646035, 26.9803955621, 27.0965310426,
27.1197219834, 27.0501510521, 27.0267719079, 26.7947223403, 26.8644646035,
26.9572082609, 27.2124923147, 27.560865829, 27.514456368, 27.5376606738,
27.6536951709, 27.4446582586), sources = c(1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0), status = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("closed",
"open"), class = "factor")), .Names = c("Time", "Temperature 1",
"Temperature 2", "sources", "status"), row.names = c(NA, -33L
), class = "data.frame")

ex$status <- 100 + (ex$status=="open")
melted <- melt(ex,id.vars="Time")
melted$shortvar <- substring(melted$variable,1,11) # This is to put both temperature readings on the same facet.
melted$value2 <- ifelse(melted$shortvar=="Temperature",melted$value,NA)
melted$value <- ifelse(is.na(melted$value2),melted$value,NA)
ggplot(melted, aes(x=Time)) +
 geom_point(aes(y=value2)) +
 geom_line(aes(y=value,group=variable)) +
 facet_grid(shortvar ~ ., scales="free", space="free") +
 scale_y_continuous(expand=c(.1,0), breaks=seq(0,101,1),
minor_breaks=seq(0,100,.5),
 labels=c(as.character(seq(0,99)),"closed","open"))
Created by Pretty R at inside-R.org

label the time series plot - ggplot2

Usually, the X-axis would be date/time variables, then how could we specify the right X positions for labels of whatever lines, text or rectangles? 时间序列作图中,X轴通常为时间序列变量。那么此时如果要给这样的图做注释,我们如何确定横轴坐标呢?

Here an example.



d <- data.frame(mon = seq(as.Date('2010-09-01'),

as.Date('2011-05-01'), by = '1 month'),

                y1 = c(9, 10, 9, 11, 10, 11, 10, 9, 9),

                y2 = c(17, 14, 16, 15, 14, 15, 16, 17, 17))

as.numeric(d$mon)

lab1 <- 'First multi-line\nchart annotation'

lab2 <- 'Second multi-line\nchart annotation'



ggplot(d, aes(x = mon)) +

    theme_bw() +

    geom_rect(xmin = 14000, xmax = 16000, ymin = 10, ymax = 30, fill =

'ivory') +

    geom_line(aes(y = y1), colour = 'red', linetype = 'dashed', size = 1) +

    geom_line(aes(y = y2), colour = 'gold', size = 1) +

    geom_text(aes(x = 14900, y = 17.5, label = lab1)) +

    geom_text(aes(x = 14950, y = 12.5, label = lab2)) +

    ylim(0, 20) +

    scale_x_date(major = '1 month', format = '%b') +

    opts(panel.grid.major = theme_blank(),

         panel.grid.minor = theme_blank()) +

    geom_hline(aes(yintercept = seq(0, 20, by = 2.5)), colour = 'grey80')
Created by Pretty R at inside-R.org

2012年2月2日星期四

LDsplit - super tools for detecting SNPs associated with meiotic recombination hotspots

LDsplit is an open source Java program that detects SNPs (single nucleotide polymorphismsassociated with meiotic recombination hotspots.



top and htop - check for system status of linux


FieldDescription
PIDThe task’s unique process ID, which periodically wraps, though never restarting at zero.
PPIDThe process ID of a task’s parent.
USERThe effective user name of the task’s owner.
PRThe priority of the task.
NIThe nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task’s dispatchability.
VIRTThe total amount of virtual memory used by the task, in kB. It includes all code, data and shared libraries and pages that have been swapped out, and pages that have been mapped but not used. VIRT = SWAP + RES.
RESThe resident/non-swapped physical memory a task has reserved, in kB. RES = CODE + DATA.
SHRThe amount of shared memory used by a task, in kB. It simply reflects memory that could be potentially shared with other processes.
SThe status of the task which can be one of: ’D’ = uninterruptible sleep, ’R’ = running, ’S’ = sleeping, ’T’ = traced or stopped,’Z’ = zombie
%CPUThe task’s share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.
%MEMA task’s currently used share (RES) of available physical memory.
TIME+Total CPU time the task has used since it started.
COMMANDThe command line used to start a task or the name of the associated program. You toggle between command line and name with ’c’, which is both a command-line option and an interactive command.

using Cairo to make your R plot better


Antialias plotting in R using Cairo


before
after
install.packages(c("Cairo"), repos="http://cran.r-project.org" )
library(Cairo)
CairoPNG('new-style.png')
plot(x, y, main='Test plot', pch=21, col='blue', bg='lightblue')
abline(lm(y ~ x), col='red', lwd=2)
dev.off()

2012年2月1日星期三

VPA - R tool for NGS


VPAan R tool for analyzing sequencing variants with user-specifiedfrequency pattern


Background

The massive amounts of genetic variant generated by the next generation sequencing systems demand the development of effective computational tools for variant prioritization. Findings VPA (Variant Pattern Analyzer) is an R tool for prioritizing variants with specified frequency pattern from multiple study subjects in next-generation sequencing study. The tool starts from individual files of variant and sequence calls and extract variants with user-specified frequency pattern across the study subjects of interest. Several position level quality criteria can be incorporated into the variant extraction. It can be used in studies with matched pair design as well as studies with multiple groups of subjects.

Conclusions

VPA can be used as an automatic pipeline to prioritize variants for further functional exploration and hypothesis generation. The package is implemented in the R language and is freely available from http://vpa.r-forge.r-project.org.

A library and toolset for working with human genetic variation data


PLINK/SEQ is an open-source C/C++ library (input VCF file) for working with human genetic variation dataThe specific focus is to provide a platform for analytic tooldevelopment for variation data from large-scale resequencing projectsparticularly whole-exome and whole-genome studiesHoweverthe library couldin principle be applied to other types of genetic studiesincluding whole-genome association studies of common SNPs.
A number of interfaces to the core library are available, providing different ways to access a PLINK/SEQ project:
  • Command line toolpseq provides easy access to some of the most common functions of the library (e.g. loading and querying data) and also implements a number of useful statistical procedures (e.g. to summarise datasets, perform phenotype-genotype association tests).
  • R package for statistical computing: Use R as an interface to the dynamically-linked C/C++ extension library. This provides convenient access to the powerful statistical and visualisation tools available in R.
  • Web-browser: an exome-centric table-browser provides a simple, interactive tool for searching and reporting on a project's variant, genotypic and phenotypic data and meta-data.
  • C/C++ API: alternatively, one can use the C/C++ library API directly, to build analysis packages or other tools.

granova- an R package for graphical analysis of variance

granova

a good plant research center - UZH Institute of plant biology


Molecular Plant Biology / Phytopathology
Prof. Beat KellerProf. Robert DudlerPD Christoph Ringli
Plant Developmental Genetics
Prof. Ueli Grossniklaus
Evolutionary Functional Genomics
Prof. Kentaro Shimizu
Molecular Plant Physiology
Prof. Enrico MartinoiaProf. Felix KellerProf. Stefan Hörtensteiner
Limnology and Limnological Station
Prof. Jakob Pernthaler
Microbiology
Prof. Leo EberlDr. Laure Weisskopf
Administration / Library
Administration Library

quick R - present you many R resources

quick R blog:
http://statmethods.wordpress.com/

quick R web page, when I began to learn R, I referred this page for many times:
http://www.statmethods.net/index.html

R cook book - very valuable collections of tips of using R

R cook book, please go there and check.


R Cookbook

Welcome to the R Cookbook. (This site is not related to Paul Teetor's excellent book by the same name.) The goal of the cookbook is to provide solutions to common tasks and problems in analyzing data, mostly from psychological experiments.
Most of the code in these pages can be copied and pasted into the R command window if you want to see them in action.
  1. Basics
  2. Numbers
  3. Strings
  4. Formulas
  5. Data input and output
  6. Manipulating data
  7. Statistical analysis
  8. Graphs
  9. Scripts and functions
  10. Tools for experiments

Other useful references