2011年3月24日星期四

extract sequences from genome by using R/bioconductor

please reference the following sites for more deep discussions

http://biostar.stackexchange.com/questions/979
http://biostar.stackexchange.com/questions/6358/how-to-extract-specific-coordinates-from-multifasta-file

####################################################
# install packages necessary
# use Arabidopsis thaliana as an example
source("http://www.bioconductor.org/biocLite.R")
biocLite("BSgenome.Athaliana.TAIR.04232008")

# load the package
library(BSgenome.Athaliana.TAIR.04232008)

# touch the funtionalities
Athaliana
seqlengths(Athaliana)
Athaliana$chr1
class(Athaliana)
organism(Athaliana)
alphabetFrequency(Athaliana$chr1)

# get the sequences
start = runif(100, min=1, max =length(Athaliana$chr1)-200)
width = runif(100, min=30, max=200)
result = getSeq(Athaliana, names=chr1, start=start, end=start)
cbind(start, result)

没有评论:

发表评论