Monday, February 6, 2012
why we study phenotypic variations
I just cite a sequence here.
Saturday, February 4, 2012
chromosome wide distribution map
1. ggbio, ideogram, and geneplotter from Bioconductor
2. http://biostar.stackexchange.com/questions/16930/create-chromosome-wide-distribution-map
3. http://biostar.stackexchange.com/questions/378/drawing-chromosome-ideogams-with-data
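For a rough map, none of the packages above is strictly required. The following is a minimal base-R sketch (a swapped-in technique, not ggbio or geneplotter): it counts features in 1 Mb windows along each chromosome and draws one barplot per chromosome. The chromosome lengths, feature positions and the helper name bin_counts are hypothetical, simulated only for illustration.

```r
# Base-R sketch of a chromosome-wide distribution map (not ggbio/geneplotter).
# Positions and chromosome lengths are simulated, for illustration only.
set.seed(1)
chr_len <- c(chr1 = 50e6, chr2 = 30e6)  # hypothetical lengths in bp
features <- data.frame(
  chr = rep(names(chr_len), times = c(300, 200)),
  pos = c(runif(300, 1, chr_len[["chr1"]]), runif(200, 1, chr_len[["chr2"]]))
)

# Hypothetical helper: count features in consecutive windows of `bin` bp.
bin_counts <- function(pos, len, bin = 1e6) {
  breaks <- seq(0, ceiling(len / bin) * bin, by = bin)
  hist(pos, breaks = breaks, plot = FALSE)$counts
}

counts <- lapply(names(chr_len), function(ch)
  bin_counts(features$pos[features$chr == ch], chr_len[[ch]]))
names(counts) <- names(chr_len)

# One panel per chromosome; bar height = number of features per 1 Mb window.
op <- par(mfrow = c(length(counts), 1), mar = c(3, 4, 2, 1))
for (ch in names(counts)) {
  barplot(counts[[ch]], space = 0, border = NA, col = "steelblue",
          main = ch, ylab = "features / Mb")
}
par(op)
```

Window size is the main knob: smaller bins show finer structure but noisier counts.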
linkage disequilibrium (LD) and population clustering
1. a good discussion.
http://www.nature.com/hdy/journal/v99/n4/full/6801010a.html
2. Detecting population structure using STRUCTURE software: effect of background linkage disequilibrium
http://www.nature.com/hdy/journal/v99/n4/full/6801010a.html
Friday, February 3, 2012
align facets in case of factor variables (time series)
If the X axis is a time-series variable, how can the X axes of these facets be aligned?
Here is an example.
library(ggplot2)
library(reshape2)  # provides melt()
ex <- structure(list(Time = structure(c(1278428400, 1278429300,
1278430200, 1278431100, 1278432000, 1278432900, 1278433800, 1278434700,
1278435600, 1278436500, 1278437400, 1278438300, 1278439200, 1278440100,
1278441000, 1278441900, 1278442800, 1278443700, 1278444600, 1278445500,
1278446400, 1278447300, 1278448200, 1278449100, 1278450000, 1278450900,
1278451800, 1278452700, 1278453600, 1278454500, 1278455400, 1278456300,
1278457200), class = c("POSIXt", "POSIXct")), `Temperature 1` = c(23.4994760481,
23.5691608609, 23.4065467209, 23.3366466476, 23.7551289027, 23.8713964903,
23.8017531186, 23.8713964903, 23.8017531186, 23.7319104094, 23.7086908787,
23.7086908787, 23.6390259428, 23.6390259428, 23.7319104094, 23.7086908787,
23.7086908787, 23.7783463702, 23.8713964903, 23.9874496028, 24.0572606946,
24.2428788024, 24.3126625639, 24.4750221758, 24.4054436873, 24.4518300234,
24.4286371978, 24.6375394346, 24.7768442676, 24.6375394346, 24.6839132299,
24.4982136668, 24.8695770905), `Temperature 2` = c(26.1917071192,
26.4004768163, 26.7251744961, 26.6092703221, 26.5627215105, 27.0035834406,
26.7251744961, 26.5627215105, 26.7019928016, 26.6324503199, 26.8412800193,
26.7251744961, 26.8876497084, 26.8180959444, 26.7715392541, 26.7715392541,
26.6788115483, 26.8180959444, 26.8644646035, 26.9803955621, 27.0965310426,
27.1197219834, 27.0501510521, 27.0267719079, 26.7947223403, 26.8644646035,
26.9572082609, 27.2124923147, 27.560865829, 27.514456368, 27.5376606738,
27.6536951709, 27.4446582586), sources = c(1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0), status = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("closed",
"open"), class = "factor")), .Names = c("Time", "Temperature 1",
"Temperature 2", "sources", "status"), row.names = c(NA, -33L
), class = "data.frame")
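# Recode status as 100 (closed) / 101 (open) so it can share a numeric y axis
ex$status <- 100 + (ex$status == "open")
# Long format: one row per Time/variable pair
melted <- melt(ex, id.vars = "Time")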
melted$shortvar <- substring(melted$variable,1,11) # This is to put both temperature readings on the same facet.
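# Temperatures go to value2 (drawn as points); sources and status stay in value (drawn as lines)
melted$value2 <- ifelse(melted$shortvar == "Temperature", melted$value, NA)
melted$value <- ifelse(is.na(melted$value2), melted$value, NA)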
ggplot(melted, aes(x=Time)) +
geom_point(aes(y=value2)) +
geom_line(aes(y=value,group=variable)) +
facet_grid(shortvar ~ ., scales="free", space="free") +
scale_y_continuous(expand=c(.1,0), breaks=seq(0,101,1),
minor_breaks=seq(0,100,.5),
labels=c(as.character(seq(0,99)),"closed","open"))
label the time series plot - ggplot2
Usually the X axis is a date/time variable, so how do we specify the right X positions for annotations such as lines, text or rectangles?
Here is an example.
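library(ggplot2)
d <- data.frame(mon = seq(as.Date('2010-09-01'), as.Date('2011-05-01'), by = '1 month'),
                y1 = c(9, 10, 9, 11, 10, 11, 10, 9, 9),
                y2 = c(17, 14, 16, 15, 14, 15, 16, 17, 17))
# Dates are stored as days since 1970-01-01; printing them shows which numbers to use as x positions
as.numeric(d$mon)
lab1 <- 'First multi-line\nchart annotation'
lab2 <- 'Second multi-line\nchart annotation'
# Turn a day number back into a Date so it can be placed on a date axis
day <- function(n) as.Date(n, origin = '1970-01-01')
ggplot(d, aes(x = mon)) +
  theme_bw() +
  # 14000..16000 covered the whole data range, so -Inf/Inf spans the full panel width;
  # ymax = Inf because values above ylim(0, 20) would be dropped
  annotate('rect', xmin = day(-Inf), xmax = day(Inf), ymin = 10, ymax = Inf, fill = 'ivory') +
  # linewidth needs ggplot2 >= 3.4.0 (older versions use size)
  geom_line(aes(y = y1), colour = 'red', linetype = 'dashed', linewidth = 1) +
  geom_line(aes(y = y2), colour = 'gold', linewidth = 1) +
  annotate('text', x = day(14900), y = 17.5, label = lab1) +
  annotate('text', x = day(14950), y = 12.5, label = lab2) +
  ylim(0, 20) +
  scale_x_date(date_breaks = '1 month', date_labels = '%b') +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_hline(yintercept = seq(0, 20, by = 2.5), colour = 'grey80')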
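Looking up day numbers is one way to find x positions; another is to pass Date values straight to annotate(), which accepts them on a date axis. A minimal sketch, assuming a recent ggplot2 (date_breaks/date_labels in scale_x_date); the two dates are simply the day numbers 14900 and 14950 converted back to calendar dates.

```r
library(ggplot2)

d <- data.frame(
  mon = seq(as.Date("2010-09-01"), as.Date("2011-05-01"), by = "1 month"),
  y1 = c(9, 10, 9, 11, 10, 11, 10, 9, 9),
  y2 = c(17, 14, 16, 15, 14, 15, 16, 17, 17)
)

# Day numbers and dates are interchangeable (days since 1970-01-01):
as.Date(14900, origin = "1970-01-01")  # "2010-10-18"
as.Date(14950, origin = "1970-01-01")  # "2010-12-07"

# Place the labels with Date values directly; no day-number lookup needed.
p <- ggplot(d, aes(x = mon)) +
  geom_line(aes(y = y1), colour = "red", linetype = "dashed") +
  geom_line(aes(y = y2), colour = "gold") +
  annotate("text", x = as.Date("2010-10-18"), y = 17.5,
           label = "First multi-line\nchart annotation") +
  annotate("text", x = as.Date("2010-12-07"), y = 12.5,
           label = "Second multi-line\nchart annotation") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  ylim(0, 20)
print(p)
```

Writing dates keeps the annotation readable and robust if the data range changes; numbers are only needed when a layer insists on raw positions.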
Thursday, February 2, 2012
top and htop - checking system status on Linux
Field | Description |
---|---|
PID | The task’s unique process ID, which periodically wraps, though never restarting at zero. |
PPID | The process ID of a task’s parent. |
USER | The effective user name of the task’s owner. |
PR | The priority of the task. |
NI | The nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task’s dispatchability. |
VIRT | The total amount of virtual memory used by the task, in kB. It includes all code, data and shared libraries and pages that have been swapped out, and pages that have been mapped but not used. VIRT = SWAP + RES. |
RES | The resident/non-swapped physical memory a task has reserved, in kB. RES = CODE + DATA. |
SHR | The amount of shared memory used by a task, in kB. It simply reflects memory that could be potentially shared with other processes. |
S | The status of the task, which can be one of: ’D’ = uninterruptible sleep, ’R’ = running, ’S’ = sleeping, ’T’ = traced or stopped, ’Z’ = zombie |
%CPU | The task’s share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. |
%MEM | A task’s currently used share (RES) of available physical memory. |
TIME+ | Total CPU time the task has used since it started. |
COMMAND | The command line used to start a task or the name of the associated program. You toggle between command line and name with ’c’, which is both a command-line option and an interactive command. |
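As the table says, %MEM is just RES expressed as a share of physical memory. A tiny R sketch of that arithmetic, plus a lookup for the S codes listed above; the function names mem_pct and task_state and the example numbers are hypothetical.

```r
# %MEM as described above: resident memory (RES) as a percentage of
# the machine's physical memory. Both inputs are in kB.
mem_pct <- function(res_kb, total_kb) {
  stopifnot(res_kb >= 0, total_kb > 0)
  100 * res_kb / total_kb
}

# Decode the one-letter S (state) codes from the table; unknown codes give NA.
task_state <- function(code) {
  states <- c(D = "uninterruptible sleep", R = "running", S = "sleeping",
              T = "traced or stopped", Z = "zombie")
  unname(states[code])
}

# Hypothetical numbers: 204800 kB resident on a machine with 8 GiB of RAM.
mem_pct(204800, 8 * 1024 * 1024)  # 2.441406
task_state(c("R", "Z"))           # "running" "zombie"
```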
using Cairo to make your R plot better
Antialias plotting in R using Cairo
[Figures: the same plot before and after using Cairo]
install.packages("Cairo", repos = "http://cran.r-project.org")
library(Cairo)
# Example data; x and y are not defined in the original snippet
x <- rnorm(100)
y <- 2 * x + rnorm(100)
CairoPNG("new-style.png")
plot(x, y, main = "Test plot", pch = 21, col = "blue", bg = "lightblue")
abline(lm(y ~ x), col = "red", lwd = 2)
dev.off()
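If installing the Cairo package is inconvenient, base R's png() device can often render through cairo itself via type = "cairo", provided R was built with cairo support (capabilities("cairo") reports this). A sketch under that assumption; the example data and the temporary output path are made up.

```r
# Example data (assumed, for illustration only).
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

out <- file.path(tempdir(), "base-cairo.png")
if (capabilities("cairo")) {
  # grDevices::png() with the cairo backend gives antialiased lines and text.
  png(out, width = 480, height = 480, type = "cairo")
  plot(x, y, main = "Test plot", pch = 21, col = "blue", bg = "lightblue")
  abline(lm(y ~ x), col = "red", lwd = 2)
  dev.off()
}
```

This avoids an extra package dependency; the Cairo package is still useful where base R lacks cairo support.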
Wednesday, February 1, 2012
VPA - R tool for NGS
VPA : an R tool for analyzing sequencing variants with user-specified frequency pattern
Background
The massive amounts of genetic variants generated by next-generation sequencing systems demand the development of effective computational tools for variant prioritization.
Findings
VPA (Variant Pattern Analyzer) is an R tool for prioritizing variants with a specified frequency pattern across multiple study subjects in next-generation sequencing studies. The tool starts from individual files of variant and sequence calls and extracts variants with a user-specified frequency pattern across the study subjects of interest. Several position-level quality criteria can be incorporated into the variant extraction. It can be used in studies with a matched-pair design as well as in studies with multiple groups of subjects.
Conclusions
VPA can be used as an automatic pipeline to prioritize variants for further functional exploration and hypothesis generation. The package is implemented in the R language and is freely available from http://vpa.r-forge.r-project.org.
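VPA's own functions are not shown here, so the following is only a hypothetical base-R illustration of what "extracting variants with a user-specified frequency pattern" means in a matched-pair design (e.g. tumor vs. normal); it is not VPA's API. The function name select_pattern, the thresholds and the frequency matrix are all made up.

```r
# Hypothetical sketch, NOT VPA's API.
# freq: variant allele frequencies, one row per variant, one column per subject.
# Keep variants at >= min_in in every "case" subject and <= max_out in every "control".
select_pattern <- function(freq, group, min_in = 0.3, max_out = 0.05) {
  in_grp  <- freq[, group == "case", drop = FALSE]
  out_grp <- freq[, group == "control", drop = FALSE]
  keep <- apply(in_grp >= min_in, 1, all) & apply(out_grp <= max_out, 1, all)
  rownames(freq)[keep]
}

freq <- matrix(c(0.45, 0.50, 0.00, 0.02,
                 0.40, 0.10, 0.00, 0.00,
                 0.35, 0.38, 0.30, 0.01),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("var1", "var2", "var3"),
                               c("T1", "T2", "N1", "N2")))
group <- c("case", "case", "control", "control")
select_pattern(freq, group)  # "var1"
```

var2 fails because one case is below 0.3, and var3 fails because one control is above 0.05.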
A library and toolset for working with human genetic variation data
PLINK/SEQ is an open-source C/C++ library (input: VCF files) for working with human genetic variation data. The specific focus is to provide a platform for analytic tool development for variation data from large-scale resequencing projects, particularly whole-exome and whole-genome studies. However, the library could in principle be applied to other types of genetic studies, including whole-genome association studies of common SNPs.
A number of interfaces to the core library are available, providing different ways to access a PLINK/SEQ project:
- Command line tool: pseq provides easy access to some of the most common functions of the library (e.g. loading and querying data) and also implements a number of useful statistical procedures (e.g. to summarise datasets, perform phenotype-genotype association tests).
- R package for statistical computing: Use R as an interface to the dynamically-linked C/C++ extension library. This provides convenient access to the powerful statistical and visualisation tools available in R.
- Web-browser: an exome-centric table-browser provides a simple, interactive tool for searching and reporting on a project's variant, genotypic and phenotypic data and meta-data.
- C/C++ API: alternatively, one can use the C/C++ library API directly, to build analysis packages or other tools.
a good plant research center - UZH Institute of Plant Biology
Research area | People |
---|---|
Molecular Plant Biology / Phytopathology | Prof. Beat Keller, Prof. Robert Dudler, PD Christoph Ringli |
Plant Developmental Genetics | Prof. Ueli Grossniklaus |
Evolutionary Functional Genomics | Prof. Kentaro Shimizu |
Molecular Plant Physiology | Prof. Enrico Martinoia, Prof. Felix Keller, Prof. Stefan Hörtensteiner |
Limnology and Limnological Station | Prof. Jakob Pernthaler |
Microbiology | Prof. Leo Eberl, Dr. Laure Weisskopf |
Administration / Library | Administration, Library |
Quick-R - a collection of many R resources
Quick-R blog:
http://statmethods.wordpress.com/
Quick-R web page; when I began to learn R, I referred to this page many times:
http://www.statmethods.net/index.html
R Cookbook - a very valuable collection of tips for using R
Please go and check out the R Cookbook.
R Cookbook
Welcome to the R Cookbook. (This site is not related to Paul Teetor's excellent book by the same name.) The goal of the cookbook is to provide solutions to common tasks and problems in analyzing data, mostly from psychological experiments.
Most of the code in these pages can be copied and pasted into the R command window if you want to see them in action.
- Basics
- Numbers
- Strings
- Formulas
- Data input and output
- Manipulating data
- Statistical analysis
- Graphs
- Scripts and functions
- Tools for experiments
Other useful references
- Quick-R - an excellent quick reference
- R Reference card (PDF)
- R tips - some simple R tips.