Wednesday, February 29, 2012

Tuesday, February 28, 2012

good guide on geography with R

http://geography.uoregon.edu/geogr/

Software list for analysis of next generation sequence data

http://seqanswers.com/wiki/Software/list

How to write applications: jobs and projects

Funding: Got to get a grant

Monday, February 27, 2012

Fetch miRNA targets and map them to pathways using R/Bioconductor

cited from:
Bioconductor Digest, Vol 108, Issue 28

###################################################
Here is an outline of how to do this using the conserved miRNA targets
predicted by TargetScan in human (it can also be done in mouse).

myMir <- "hsa-mir-17"

# 1. to find targets we need to use the mature form of the mirna
# we can do this using 'mirbase.db'

require("mirbase.db") || stop("Could not load package 'mirbase.db'")

## inspect the mature forms for this particular miRNA
print(get(myMir, mirbaseMATURE))
##Accession: MIMAT0000070
## ID: hsa-miR-17
## Start: 14
## End: 36
## Evidence: experimental
## Experiment: cloned [2,5-8], Northern [4]
##
##Accession: MIMAT0000071
## ID: hsa-miR-17*
## Start: 51
## End: 72
## Evidence: experimental
## Experiment: cloned [1,5,7-8], Northern [1]

## select the first one
myMature <- matureName(get(myMir, mirbaseMATURE))[1]

# 2. find the targets of this mature mirna using targetscan

require("targetscan.Hs.eg.db") || stop("Could not load package ",
"'targetscan.Hs.eg.db'")

## get the seed-based family associated with this mature mirna
myMirFam <- get(myMature, targetscan.Hs.egMIRBASE2FAMILY)

## retrieve targets (as NCBI Gene IDs)
myMirTargets <- get(myMirFam, revmap(targetscan.Hs.egTARGETS))

##length(myMirTargets)
##[1] 1114

## 3. map target genes to KEGG pathways
require("org.Hs.eg.db") || stop("Could not load package ",
"'org.Hs.eg.db'.")

myMirTargetsKegg <- mget(myMirTargets, org.Hs.egPATH)
##sum(!is.na(myMirTargetsKegg))
##[1] 315

## optional for convenience, add KEGG pathway names
require("KEGG.db") || stop("Could not load package 'KEGG.db'")

## using only the entries with a KEGG id
myMirTargetsKeggNames <- lapply(myMirTargetsKegg[!is.na(myMirTargetsKegg)],
function(i) mget(i, KEGGPATHID2NAME))

Note that since KEGG is no longer public and the Bioconductor
KEGG.db package will soon be deprecated, you could use the
'reactome.db' package instead for mapping pathways. Just replace
step 3 above with a call to:

mget(myMirTargets, reactomeEXTID2PATHID, ifnotfound=NA)

I'll leave finding the pathway names as an exercise ;-)

Genotype Imputation with Thousands of Genomes

Geospatial analysis - resources

http://www.spatialanalysisonline.com/

Saturday, February 25, 2012

Useful Bash commands to handle FASTA files

http://biostar.stackexchange.com/questions/17726/useful-bash-commands-to-handle-fasta-files

http://chrisduran.eu/bioinformatics/linux-and-osx-commands-for-working-with-fasta-files/

#####################################################

(1) counting number of sequences in a fasta file:

grep -c "^>" file.fa

remove comments (everything after the first whitespace on each header line):

sed -e 's/^\(>[^[:space:]]*\).*/\1/' my.fasta > mymodified.fasta

(2) add something to end of all header lines:

sed 's/>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa

(3) clean up a fasta file so only the first column of the header is kept:

awk '{print $1}' file.fa > output.fa

(4) To extract ids, just use the following:


grep -o -E "^>\w+" file.fasta | tr -d ">"
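Note that \w only matches letters, digits and underscores, so the grep above cuts IDs short at the first dot, pipe or dash. As a rough sketch of an alternative, the awk version below prints everything between the ">" and the first whitespace. The function name fasta_ids and the demo records are made up for illustration.

```shell
# Print the ID of every record: the first whitespace-delimited field of each
# header line, without the leading ">".
fasta_ids() {
    awk '/^>/ { print substr($1, 2) }' "$1"
}

# Hypothetical demo file; the records are invented for this example.
demo=$(mktemp "${TMPDIR:-/tmp}/fasta_ids.XXXXXX")
cat > "$demo" <<'EOF'
>sp|P12345|TEST_HUMAN some description
ACGT
>seq.2 another description
GGCC
EOF

ids=$(fasta_ids "$demo")
printf '%s\n' "$ids"
```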


(5) A useful step is to linearize your sequences (i.e. remove the sequence wrapping). This is not a perfect solution, as I suspect that a few steps could be avoided, but it works quite fast, even for thousands of sequences. Note that it uses # as a temporary marker, so it assumes # does not appear anywhere in the file.
sed -e 's/\(^>.*$\)/#\1#/' file.fasta | tr -d "\r" | tr -d "\n" | sed -e 's/$/#/' | tr "#" "\n" | sed -e '/^$/d'
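For comparison, here is a sketch of the same linearization done in a single awk pass, which does not need a placeholder character. It is only one possible approach, not the method from the linked posts. The function name linearize_fasta and the demo records are made up for illustration.

```shell
# Join the wrapped sequence lines of each record into one line.
# Carriage returns at line ends (Windows files) are stripped as well.
linearize_fasta() {
    awk '
        { sub(/\r$/, "") }                    # drop a trailing CR, if any
        /^>/ {                                # header: flush previous sequence
            if (seq != "") print seq
            seq = ""
            print
            next
        }
        { seq = seq $0 }                      # sequence line: append
        END { if (seq != "") print seq }      # flush the last record
    ' "$1"
}

# Hypothetical demo file with CRLF line endings and wrapped sequences.
demo=$(mktemp "${TMPDIR:-/tmp}/fasta_linear.XXXXXX")
printf '>a desc\r\nACGT\r\nACGT\r\n>b\r\nGG\r\nCC\r\nT\r\n' > "$demo"

linear=$(linearize_fasta "$demo")
printf '%s\n' "$linear"
```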


(6) Remove duplicated sequences. Pierre Lindenbaum proposed this solution.
sed -e '/^>/s/$/@/' -e 's/^>/#/' file.fasta | tr -d '\n' | tr "#" "\n" | tr "@" "\t" | sort -u -t $'\t' -f -k 2,2  | sed -e 's/^/>/' -e 's/\t/\n/'
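The sort-based pipeline reorders the records. If the original order matters, something like the awk sketch below might do: it keeps the first record seen for each distinct sequence (ignoring case, like sort -f) and prints each sequence on a single line. It holds every distinct sequence in memory, so it is meant for files that fit in RAM. The function name dedup_fasta and the demo records are made up for illustration.

```shell
# Keep only the first record for each distinct sequence (case-insensitive),
# preserving input order. Output sequences are written on one line each.
dedup_fasta() {
    awk '
        function emit() {
            if (header == "") return
            key = toupper(seq)
            if (!(key in seen)) {
                seen[key] = 1
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$1"
}

# Hypothetical demo file: b and d duplicate a (d is wrapped over two lines).
demo=$(mktemp "${TMPDIR:-/tmp}/fasta_dedup.XXXXXX")
cat > "$demo" <<'EOF'
>a
ACGT
>b
acgt
>c
GGCC
>d
AC
GT
EOF

unique=$(dedup_fasta "$demo")
printf '%s\n' "$unique"
```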






(7) Splitting a FASTA file of multiple sequences into FASTA files of individual sequences


This command creates one file per member sequence in the current working directory, numbered incrementally with a .fasta extension (e.g. for an input file with 5 member sequences, such as the Arabidopsis genome, it outputs files 1.fasta to 5.fasta).
awk '/^>/{f=++d".fasta"} {print > f}' file.fasta
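Some awk implementations limit how many files can be open at once, so for inputs with many sequences it may be safer to close each output file before starting the next one. The sketch below does that and also writes into a separate output directory. The function name split_fasta, the directory layout and the demo records are made up for illustration.

```shell
# Split a multi-FASTA file into 1.fasta, 2.fasta, ... inside an existing
# output directory, closing each file before opening the next.
# $1 = input FASTA, $2 = output directory (must already exist)
split_fasta() {
    awk -v dir="$2" '
        /^>/ {
            if (f != "") close(f)
            f = dir "/" (++n) ".fasta"
        }
        f != "" { print > f }     # lines before the first header are skipped
    ' "$1"
}

# Hypothetical demo input with three records.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/fasta_split.XXXXXX")
cat > "$workdir/input.fa" <<'EOF'
>chr1
ACGT
ACGT
>chr2
GGCC
>chr3
TTAA
EOF

mkdir "$workdir/out"
split_fasta "$workdir/input.fa" "$workdir/out"
ls "$workdir/out"
```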


(8) Joining multiple FASTA files into a single, multi-sequence FASTA file

This is the reverse of the above and we will assume a few things. Firstly, you want to combine all fasta files in the current directory and, secondly, they all have the same extension (.fasta). Adapt to your needs if this is not the case!
cat *.fasta > combined.fa


(10) List the sequence headers in a FASTA file
grep ">" file.fasta


(11) Counting the number of sequence entities in a FASTA file
grep ">" file.fasta | wc -l


(12) Determining the length of the sequence in a FASTA file

This method gives the TOTAL sequence length of a FASTA file. If your FASTA file has several sequence entries, it returns the sum of the lengths of all entries. To get the length of individual entries you would first need to split the file into individual entries, or do it programmatically: either using a homegrown method or a bioinformatics library such as BioPerl.
grep -v ">" file.fasta | tr -d '[:space:]' | wc -c
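As a sketch of the "homegrown" route for per-entry lengths, the awk snippet below prints one line per record with the ID and the sequence length, separated by a tab. The function name fasta_lengths and the demo records are made up for illustration.

```shell
# Print "<id><TAB><length>" for every record. Whitespace and carriage
# returns inside sequence lines are not counted.
fasta_lengths() {
    awk '
        /^>/ {
            if (seen) print id "\t" len      # flush the previous record
            id = substr($1, 2)               # first header field, minus ">"
            len = 0
            seen = 1
            next
        }
        seen {
            gsub(/[ \t\r]/, "")
            len += length($0)
        }
        END { if (seen) print id "\t" len }
    ' "$1"
}

# Hypothetical demo file: a is wrapped over two lines (4 + 3 bases).
demo=$(mktemp "${TMPDIR:-/tmp}/fasta_len.XXXXXX")
cat > "$demo" <<'EOF'
>a desc
ACGT
ACG
>b
GG
EOF

lengths=$(fasta_lengths "$demo")
printf '%s\n' "$lengths"
```

Summing the second column (for example with awk '{ s += $2 } END { print s }') should give the same total as the grep pipeline above.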


