Monday, February 6, 2012

why we study phenotypic variations

The proteome is closer to the phenotype than the genome or the transcriptome is, and as such may be more directly responsive to natural selection and thus closely linked to adaptation.

I am just quoting a sentence here.

RNCEP: global weather and climate data at your fingertips

Access global weather and climate data from R with the RNCEP package.

Saturday, February 4, 2012

convert HTML to LaTeX, PDF and vector graphics

1. tools for converting HTML to LaTeX

2. a good discussion here

http://biostar.stackexchange.com/questions/17037/fastqc-html-report-to-pdf-with-a-script

chromosome-wide distribution map

1. ggbio, ideogram, and geneplotter from Bioconductor

2. http://biostar.stackexchange.com/questions/16930/create-chromosome-wide-distribution-map

3. http://biostar.stackexchange.com/questions/378/drawing-chromosome-ideogams-with-data
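Without Bioconductor, a rough chromosome-wide distribution map can be sketched in base R by counting features in fixed-size windows along each chromosome and drawing one track per chromosome. This is a hypothetical illustration, not the ggbio, ideogram or geneplotter API: the chromosome lengths, the simulated positions and the 1 Mb window are all made-up assumptions.

```r
# Hypothetical input: chromosome lengths (bp) and feature positions
# (e.g. SNPs or genes). Replace with your own data.
set.seed(42)
chr_len <- c(chr1 = 30e6, chr2 = 20e6, chr3 = 25e6)
feat <- data.frame(
  chr = rep(names(chr_len), times = c(300, 200, 250)),
  pos = c(runif(300, 1, chr_len[["chr1"]]),
          runif(200, 1, chr_len[["chr2"]]),
          runif(250, 1, chr_len[["chr3"]]))
)

# Count features in consecutive windows of size `win` along one chromosome.
window_counts <- function(pos, len, win = 1e6) {
  breaks <- seq(0, ceiling(len / win) * win, by = win)
  table(cut(pos, breaks = breaks, include.lowest = TRUE))
}

counts <- lapply(names(chr_len), function(ch)
  window_counts(feat$pos[feat$chr == ch], chr_len[[ch]]))
names(counts) <- names(chr_len)

# One horizontal track per chromosome; bar height = features per window.
op <- par(mfrow = c(length(counts), 1), mar = c(2, 4, 1, 1))
for (ch in names(counts)) {
  barplot(as.vector(counts[[ch]]), space = 0, border = NA,
          col = "steelblue", ylab = ch)
}
par(op)
```

Windows with no features still appear (as zero-height bars) because `cut()` keeps every interval as a factor level, so gaps in the distribution stay visible.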

gff2ps - Produces PostScript graphical output from GFF-files

gff2ps converts GFF files to PostScript. I have not tried it yet, but it looks really interesting.

linkage disequilibrium (LD) and population clustering

1. a good discussion.

2. Detecting population structure using STRUCTURE software: effect of background linkage disequilibrium

http://www.nature.com/hdy/journal/v99/n4/full/6801010a.html
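As a reminder of what "background LD" measures, the common pairwise statistic r² can be computed from phased haplotypes: with allele frequencies p_A and p_B and haplotype frequency p_AB, D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). A minimal base-R sketch, assuming two biallelic SNPs coded 0/1 on phased haplotypes (the function name `ld_r2` is made up, and this is not how STRUCTURE itself handles LD):

```r
# r^2 between two biallelic SNPs from phased haplotypes.
# a, b: 0/1 vectors giving the allele carried by each haplotype.
ld_r2 <- function(a, b) {
  stopifnot(length(a) == length(b))
  pA  <- mean(a)                 # frequency of allele A at SNP 1
  pB  <- mean(b)                 # frequency of allele B at SNP 2
  pAB <- mean(a == 1 & b == 1)   # frequency of the A-B haplotype
  D   <- pAB - pA * pB           # disequilibrium coefficient
  D^2 / (pA * (1 - pA) * pB * (1 - pB))
}

ld_r2(c(1, 1, 0, 0), c(1, 1, 0, 0))  # alleles always together: 1
ld_r2(c(1, 1, 0, 0), c(1, 0, 1, 0))  # independent: 0
```

A monomorphic SNP makes the denominator zero and the result `NaN`, so filter such sites out first.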

Friday, February 3, 2012

R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data

http://nar.oxfordjournals.org/content/early/2012/01/28/nar.gks047.long

Dynamic models in biology

http://www.cam.cornell.edu/~dmb/DMBsupplements.html

with R code.

How to be a quantitative ecologist

text with R code:
http://greenmaths.st-andrews.ac.uk/index.aspx

aligning facets when one of the variables is a factor (time series)

If the X axis is a time-series variable, how can the X axes of these facets be lined up?

Here is an example.

library(reshape2)  # for melt()
library(ggplot2)

ex <- structure(list(Time = structure(c(1278428400, 1278429300,
1278430200, 1278431100, 1278432000, 1278432900, 1278433800, 1278434700,
1278435600, 1278436500, 1278437400, 1278438300, 1278439200, 1278440100,
1278441000, 1278441900, 1278442800, 1278443700, 1278444600, 1278445500,
1278446400, 1278447300, 1278448200, 1278449100, 1278450000, 1278450900,
1278451800, 1278452700, 1278453600, 1278454500, 1278455400, 1278456300,
1278457200), class = c("POSIXt", "POSIXct")), `Temperature 1` = c(23.4994760481,
23.5691608609, 23.4065467209, 23.3366466476, 23.7551289027, 23.8713964903,
23.8017531186, 23.8713964903, 23.8017531186, 23.7319104094, 23.7086908787,
23.7086908787, 23.6390259428, 23.6390259428, 23.7319104094, 23.7086908787,
23.7086908787, 23.7783463702, 23.8713964903, 23.9874496028, 24.0572606946,
24.2428788024, 24.3126625639, 24.4750221758, 24.4054436873, 24.4518300234,
24.4286371978, 24.6375394346, 24.7768442676, 24.6375394346, 24.6839132299,
24.4982136668, 24.8695770905), `Temperature 2` = c(26.1917071192,
26.4004768163, 26.7251744961, 26.6092703221, 26.5627215105, 27.0035834406,
26.7251744961, 26.5627215105, 26.7019928016, 26.6324503199, 26.8412800193,
26.7251744961, 26.8876497084, 26.8180959444, 26.7715392541, 26.7715392541,
26.6788115483, 26.8180959444, 26.8644646035, 26.9803955621, 27.0965310426,
27.1197219834, 27.0501510521, 27.0267719079, 26.7947223403, 26.8644646035,
26.9572082609, 27.2124923147, 27.560865829, 27.514456368, 27.5376606738,
27.6536951709, 27.4446582586), sources = c(1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0), status = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("closed",
"open"), class = "factor")), .Names = c("Time", "Temperature 1",
"Temperature 2", "sources", "status"), row.names = c(NA, -33L
), class = "data.frame")

# Encode the factor as numbers (closed = 100, open = 101) so it can be melted
# together with the numeric series and drawn on a continuous y scale.
ex$status <- 100 + (ex$status == "open")

melted <- melt(ex, id.vars = "Time")

# This is to put both temperature readings on the same facet:
# "Temperature 1" and "Temperature 2" both shorten to "Temperature".
melted$shortvar <- substring(melted$variable, 1, 11)

# Temperatures go into value2 (drawn as points); the rest stay in value (lines).
melted$value2 <- ifelse(melted$shortvar == "Temperature", melted$value, NA)
melted$value  <- ifelse(is.na(melted$value2), melted$value, NA)

# One facet row per variable group; all rows share the same Time axis.
ggplot(melted, aes(x = Time)) +
  geom_point(aes(y = value2)) +
  geom_line(aes(y = value, group = variable)) +
  facet_grid(shortvar ~ ., scales = "free", space = "free") +
  # Breaks 0-99 are labelled as numbers; 100 and 101 get the status labels.
  scale_y_continuous(expand = c(.1, 0), breaks = seq(0, 101, 1),
                     minor_breaks = seq(0, 100, .5),
                     labels = c(as.character(seq(0, 99)), "closed", "open"))


label the time series plot - ggplot2

In a time-series plot the X axis is usually a date/time variable. So when we annotate such a plot with lines, text or rectangles, how do we specify the right X positions?

Here is an example.

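library(ggplot2)

d <- data.frame(mon = seq(as.Date('2010-09-01'),
                          as.Date('2011-05-01'), by = '1 month'),
                y1 = c(9, 10, 9, 11, 10, 11, 10, 9, 9),
                y2 = c(17, 14, 16, 15, 14, 15, 16, 17, 17))

# Dates are stored as days since 1970-01-01; printing them as numbers shows
# which values fall inside the plotted range (14853 ... 15095).
as.numeric(d$mon)

lab1 <- 'First multi-line\nchart annotation'
lab2 <- 'Second multi-line\nchart annotation'

# Turn a day count back into a Date so it fits the Date x scale.
day <- function(n) as.Date(n, origin = '1970-01-01')

ggplot(d, aes(x = mon)) +
    theme_bw() +
    # Shaded band across the data range; ymax stays inside ylim(0, 20),
    # since annotations outside the limits would be dropped.
    annotate('rect', xmin = min(d$mon), xmax = max(d$mon),
             ymin = 10, ymax = 20, fill = 'ivory') +
    # linewidth needs ggplot2 >= 3.4; use size = 1 on older versions.
    geom_line(aes(y = y1), colour = 'red', linetype = 'dashed', linewidth = 1) +
    geom_line(aes(y = y2), colour = 'gold', linewidth = 1) +
    annotate('text', x = day(14900), y = 17.5, label = lab1) +
    annotate('text', x = day(14950), y = 12.5, label = lab2) +
    ylim(0, 20) +
    scale_x_date(date_breaks = '1 month', date_labels = '%b') +
    theme(panel.grid.major = element_blank(),
          panel.grid.minor = element_blank()) +
    geom_hline(yintercept = seq(0, 20, by = 2.5), colour = 'grey80')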


Thursday, February 2, 2012

LDsplit - a tool for detecting SNPs associated with meiotic recombination hotspots

LDsplit is an open source Java program that detects SNPs (single nucleotide polymorphisms) associated with meiotic recombination hotspots.

top and htop - checking system status on Linux

Field	Description
PID	The task’s unique process ID, which periodically wraps, though never restarting at zero.
PPID	The process ID of a task’s parent.
USER	The effective user name of the task’s owner.
PR	The priority of the task.
NI	The nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task’s dispatchability.
VIRT	The total amount of virtual memory used by the task, in kB. It includes all code, data and shared libraries and pages that have been swapped out, and pages that have been mapped but not used. VIRT = SWAP + RES.
RES	The resident/non-swapped physical memory a task has reserved, in kB. RES = CODE + DATA.
SHR	The amount of shared memory used by a task, in kB. It simply reflects memory that could be potentially shared with other processes.
S	The status of the task, which can be one of: ’D’ = uninterruptible sleep, ’R’ = running, ’S’ = sleeping, ’T’ = traced or stopped, ’Z’ = zombie.
%CPU	The task’s share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.
%MEM	A task’s currently used share (RES) of available physical memory.
TIME+	Total CPU time the task has used since it started.
COMMAND	The command line used to start a task or the name of the associated program. You toggle between command line and name with ’c’, which is both a command-line option and an interactive command.

using Cairo to make your R plots look better

Antialias plotting in R using Cairo


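install.packages("Cairo", repos = "http://cran.r-project.org")  # only needed once
library(Cairo)
# Example data (hypothetical; substitute your own x and y)
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

CairoPNG('new-style.png')  # open an antialiased PNG device
plot(x, y, main='Test plot', pch=21, col='blue', bg='lightblue')
abline(lm(y ~ x), col='red', lwd=2)
dev.off()                  # close the device and write the file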
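On R builds compiled with cairo support, base R's `png()` can also produce antialiased output without the Cairo package, by passing `type = "cairo"`. A sketch, assuming `capabilities("cairo")` is TRUE on your system; the data and the output file name are made up.

```r
# Hypothetical example data.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

out <- file.path(tempdir(), "base-cairo.png")
if (capabilities("cairo")) {
  # Base-R cairo backend: antialiased lines and symbols.
  png(out, width = 480, height = 480, type = "cairo")
  plot(x, y, main = "Test plot", pch = 21, col = "blue", bg = "lightblue")
  abline(lm(y ~ x), col = "red", lwd = 2)
  dev.off()
}
```

If the check returns FALSE, the Cairo package approach is the fallback.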

Wednesday, February 1, 2012

VPA - R tool for NGS

VPA: an R tool for analyzing sequencing variants with user-specified frequency pattern

Background

The massive amounts of genetic variants generated by next-generation sequencing systems demand the development of effective computational tools for variant prioritization.

Findings

VPA (Variant Pattern Analyzer) is an R tool for prioritizing variants with a specified frequency pattern across multiple study subjects in a next-generation sequencing study. The tool starts from individual files of variant and sequence calls and extracts variants with a user-specified frequency pattern across the study subjects of interest. Several position-level quality criteria can be incorporated into the variant extraction. It can be used in studies with a matched-pair design as well as in studies with multiple groups of subjects.

Conclusions

VPA can be used as an automatic pipeline to prioritize variants for further functional exploration and hypothesis generation. The package is implemented in the R language and is freely available from http://vpa.r-forge.r-project.org.
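To make "variants with a user-specified frequency pattern" concrete, here is a base-R sketch of the idea under hypothetical assumptions: a matrix of variant allele frequencies (variants × subjects) and a pattern saying which subjects must carry a variant (frequency at or above a cutoff) and which must not. This is not VPA's own interface; `vaf`, `pattern`, `match_pattern`, the 0.2 cutoff and the data are all invented for illustration.

```r
# Hypothetical variant allele frequencies: rows = variants, cols = subjects.
vaf <- matrix(c(0.45, 0.50, 0.00, 0.02,
                0.48, 0.00, 0.51, 0.00,
                0.00, 0.03, 0.40, 0.47,
                0.35, 0.41, 0.38, 0.44),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("var", 1:4),
                              c("tumor1", "tumor2", "normal1", "normal2")))

# Pattern: TRUE = variant must be present, FALSE = must be absent.
pattern <- c(tumor1 = TRUE, tumor2 = TRUE, normal1 = FALSE, normal2 = FALSE)

match_pattern <- function(freq, pattern, cutoff = 0.2) {
  # Presence calls for the subjects named in the pattern, in pattern order.
  present <- freq[, names(pattern), drop = FALSE] >= cutoff
  # A variant matches when every subject agrees with the pattern.
  keep <- apply(present, 1, function(row) all(row == pattern))
  rownames(freq)[keep]
}

match_pattern(vaf, pattern)  # "var1": present in both tumors, absent in both normals
```

Quality filters (depth, base quality and so on) would be applied before the presence calls in a real pipeline.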

A library and toolset for working with human genetic variation data

PLINK/SEQ is an open-source C/C++ library (taking VCF files as input) for working with human genetic variation data. The specific focus is to provide a platform for analytic tool development for variation data from large-scale resequencing projects, particularly whole-exome and whole-genome studies. However, the library could in principle be applied to other types of genetic studies, including whole-genome association studies of common SNPs.

A number of interfaces to the core library are available, providing different ways to access a PLINK/SEQ project:

Command line tool: pseq provides easy access to some of the most common functions of the library (e.g. loading and querying data) and also implements a number of useful statistical procedures (e.g. to summarise datasets, perform phenotype-genotype association tests).
R package for statistical computing: Use R as an interface to the dynamically-linked C/C++ extension library. This provides convenient access to the powerful statistical and visualisation tools available in R.
Web-browser: an exome-centric table-browser provides a simple, interactive tool for searching and reporting on a project's variant, genotypic and phenotypic data and meta-data.
C/C++ API: alternatively, one can use the C/C++ library API directly, to build analysis packages or other tools.

granova - an R package for graphical analysis of variance

granova

a good plant research center - UZH Institute of Plant Biology

Molecular Plant Biology / Phytopathology: Prof. Beat Keller, Prof. Robert Dudler, PD Christoph Ringli
Plant Developmental Genetics: Prof. Ueli Grossniklaus
Evolutionary Functional Genomics: Prof. Kentaro Shimizu
Molecular Plant Physiology: Prof. Enrico Martinoia, Prof. Felix Keller, Prof. Stefan Hörtensteiner
Limnology and Limnological Station: Prof. Jakob Pernthaler
Microbiology: Prof. Leo Eberl, Dr. Laure Weisskopf
Administration / Library

Quick-R - many R resources in one place

quick R blog:
http://statmethods.wordpress.com/

Quick-R web page; when I began learning R, I referred to this page many times:
http://www.statmethods.net/index.html

R Cookbook - a very valuable collection of tips for using R

R Cookbook: go there and check it out.

R Cookbook

Welcome to the R Cookbook. (This site is not related to Paul Teetor's excellent book by the same name.) The goal of the cookbook is to provide solutions to common tasks and problems in analyzing data, mostly from psychological experiments.

Most of the code in these pages can be copied and pasted into the R command window if you want to see them in action.

Other useful references

Quick-R - an excellent quick reference
R Reference card (PDF)
R tips - Some simple R tips.
