Wednesday, February 29, 2012

Assessing the role of environmental factors in determining morphological variation


Forest structure and soil fertility determine internal stem morphology of Pedunculate oak

Assessing the role of individual environmental factors in determining species distributions

Applying additive modelling and gradient boosting to assess the effects of watershed and reach characteristics on riverine assemblages

Analysis and mapping in species distribution modelling


ModelMap

Spatial autocorrelation in species distribution modelling

Spatial autocorrelation in predictors reduces the impact of positional uncertainty in occurrence data on species distribution modelling

Environmental requirements of species - modelling the habitat requirement

Modelling the habitat requirement of riverine fish species at the European scale


Biotic interactions in species distribution models - biotic interaction and species distribution modeling

How do species interactions affect species distribution models


Functional trait variation and species distribution modelling - linking functional traits with species distribution

The role of functional traits in species distributions revealed through a hierarchical model








boosted regression trees - a robust ecological modeling tech

1. A working guide to boosted regression trees


2. The dismo package for R implements a robust BRT function.

3. BOOSTED TREES FOR ECOLOGICAL MODELING AND PREDICTION

Monday, February 27, 2012

Fetch miRNA targets and map them to pathways using R/Bioconductor

cited from: 
Bioconductor Digest, Vol 108, Issue 28


###################################################
Here is an outline of how to do this, using the conserved miRNA targets
predicted by TargetScan in human (it can also be done in mouse).

myMir <- "hsa-mir-17"

# 1. to find targets we need to use the mature form of the mirna
# we can do this using 'mirbase.db'

require("mirbase.db") || stop("Could not load package 'mirbase.db'")

## inspect the mature forms for this particular miRNA
print(get(myMir, mirbaseMATURE))
##Accession: MIMAT0000070
##  ID: hsa-miR-17
##  Start: 14
##  End: 36
##  Evidence: experimental
##  Experiment: cloned [2,5-8], Northern [4]
##
##Accession: MIMAT0000071
##  ID: hsa-miR-17*
##  Start: 51
##  End: 72
##  Evidence: experimental
##  Experiment: cloned [1,5,7-8], Northern [1]

## select the first one
myMature <- matureName(get(myMir, mirbaseMATURE))[1]

# 2. find the targets of this mature mirna using targetscan

## the trailing space keeps the two parts of the error message separated
require("targetscan.Hs.eg.db") || stop("Could not load package ",
                                       "'targetscan.Hs.eg.db'")

## get the seed-based family associated with this mature mirna
myMirFam <- get(myMature, targetscan.Hs.egMIRBASE2FAMILY)

## retrieve targets (as NCBI Gene IDs)
myMirTargets <- get(myMirFam, revmap(targetscan.Hs.egTARGETS))

##length(myMirTargets)
##[1] 1114

## 3. map target genes to KEGG pathways
## the trailing space keeps the two parts of the error message separated
require("org.Hs.eg.db") || stop("Could not load package ",
                                "'org.Hs.eg.db'.")

myMirTargetsKegg <- mget(myMirTargets, org.Hs.egPATH)
##sum(!is.na(myMirTargetsKegg))
##[1] 315

## optional for convenience, add KEGG pathway names
require("KEGG.db") || stop("Could not load package 'KEGG.db'")

## using only the entries with a KEGG id
myMirTargetsKeggNames <- lapply(myMirTargetsKegg[!is.na(myMirTargetsKegg)],
                                function(i) mget(i, KEGGPATHID2NAME))


Note that since KEGG is no longer public and the Bioconductor
package will soon be considered deprecated, you could use the
'reactome.db' package instead for mapping pathways. Just replace
step 3 above with a call to:

mget(myMirTargets, reactomeEXTID2PATHID, ifnotfound=NA)

I'll leave finding the pathway names as an exercise ;-)

Broadened the scope and raised the resolution - extend the scale and resolution

The rapid development of next generation sequencing (NGS) technology provides a new chance to extend the scale and resolution of genomic research.

Bio++ is a set of C++ libraries for Bioinformatics

Bio++ is a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Bio++ is fully Object Oriented and is designed to be both easy to use and computer efficient.

C/C++ libraries for bioinformatics


C/C++ libraries for bioinformatics

Apart from, regardless of - Irrespective of

Irrespective of the choice of alignment tool, the user is faced with the problem of cross-mapping, which is the situation in which an sRNA sequence originating from one genomic location is inadvertently mapped to another, incorrect, location.

Genotype Imputation with Thousands of Genomes


Genotype Imputation with Thousands of Genomes


Geospatial analysis - resources

http://www.spatialanalysisonline.com/

Saturday, February 25, 2012

Useful Bash commands to handle FASTA files



#####################################################
(1) counting number of sequences in a fasta file:
grep -c "^>" file.fa
Remove comments from header lines (everything after the first whitespace):
sed -e 's/^\(>[^[:space:]]*\).*/\1/' my.fasta > mymodified.fasta
(2) add something to end of all header lines:
sed 's/>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa
(3) clean up a fasta file so only the first column of the header is output:
awk '{print $1}' file.fa > output.fa
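(4) To extract IDs, use the following (note that \w+ stops at the first character that is not a letter, digit or underscore, so IDs containing "|" or "." are truncated):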

grep -o -E "^>\w+" file.fasta | tr -d ">"
(5) A useful step is to linearize your sequences (i.e. remove the sequence wrapping). This is not a perfect solution, as I suspect that a few steps could be avoided, but it works quite fast, even for thousands of sequences.
sed -e 's/\(^>.*$\)/#\1#/' file.fasta | tr -d "\r" | tr -d "\n" | sed -e 's/$/#/' | tr "#" "\n" | sed -e '/^$/d'
(6) Remove duplicated sequences. Pierre Lindenbaum proposed this solution.
sed -e '/^>/s/$/@/' -e 's/^>/#/' file.fasta | tr -d '\n' | tr "#" "\n" | tr "@" "\t" | sort -u -t $'\t' -f -k 2,2  | sed -e 's/^/>/' -e 's/\t/\n/'
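
The pipelines in (5) and (6) use "#" and "@" as temporary markers, so they break on headers that already contain those characters. As a minimal sketch of an alternative (not the method proposed above), both steps can be done in awk without markers. The function names linearize_fasta and dedupe_fasta are made up for illustration. Unlike the sort-based version, dedupe_fasta keeps the first record of each sequence in input order and writes every sequence on a single line.

```bash
# Hypothetical helpers (names are illustrative, not from the original post).

# Print each record as a header line followed by one unwrapped sequence line.
linearize_fasta() {
  awk '
    { sub(/\r$/, "") }                                   # drop Windows line endings
    /^>/ { if (seq != "") print seq; seq = ""; print; next }
    { seq = seq $0 }                                     # join wrapped sequence lines
    END { if (seq != "") print seq }
  ' "$1"
}

# Keep the first record for each distinct sequence, ignoring case.
dedupe_fasta() {
  linearize_fasta "$1" | awk '
    /^>/ { header = $0; next }
    !(toupper($0) in seen) { seen[toupper($0)] = 1; print header; print }
  '
}

# Small demonstration on a throwaway file.
demo=$(mktemp)
printf '>a first\nACGT\nAC\n>b copy of a\nacgtac\n>c\nTTTT\n' > "$demo"
dedupe_fasta "$demo"
rm -f "$demo"
```

In the demonstration, record b is dropped because its sequence equals that of record a once case is ignored, so only records a and c are printed.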

(7) Splitting a FASTA file of multiple sequences into FASTA files of individual sequences

This command will create as many files as there are member sequences, in the same directory as the source file, incrementally numbered with a .fasta extension (e.g. for an input file with 5 member sequences, such as the Arabidopsis genome, it will output files 1.fasta to 5.fasta). Replace file.fasta with the name of your input file.
awk '/^>/{f=++d".fasta"} {print > f}' file.fasta
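
One caveat of the one-liner is that it never closes its output files, so some awk implementations can run out of file descriptors on inputs with many sequences. The following is a minimal sketch of a variant that closes each file before opening the next and writes into a separate directory. The function name split_fasta and its two-argument interface are assumptions made for illustration.

```bash
# Hypothetical helper: split_fasta INPUT OUTDIR
# Writes OUTDIR/1.fasta, OUTDIR/2.fasta, ... with one record per file.
split_fasta() {
  mkdir -p "$2" || return 1
  awk -v dir="$2" '
    /^>/ { if (out != "") close(out); out = dir "/" (++n) ".fasta" }
    out != "" { print > out }      # lines before the first header are skipped
  ' "$1"
}

# Small demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
printf '>a\nAC\nGT\n>b\nTT\n' > "$demo_dir/in.fa"
split_fasta "$demo_dir/in.fa" "$demo_dir/parts"
ls "$demo_dir/parts"
rm -rf "$demo_dir"
```

Writing to a separate directory also keeps the numbered files apart from other .fasta files, which matters if you later join everything with a glob.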

(8) Joining multiple FASTA files into a single, multi-sequence FASTA file

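This is the reverse of the above and we will assume a few things. Firstly, you want to combine all fasta files in the current directory and, secondly, they all have the same extension (.fasta). Adapt to your needs if this is not the case! The output name below is only a placeholder; give it an extension other than .fasta so that it is not matched by the glob itself.
cat *.fasta > combined.fa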

(10) List the sequence headers in a FASTA file

grep ">" file.fasta

(11) Counting the number of sequence entities in a FASTA file

grep ">" file.fasta | wc -l

(12) Determining the length of the sequence in a FASTA file

This method will give the TOTAL sequence length of a FASTA file. This means that if your FASTA file has a number of sequence entries, it will return the sum of the lengths of all sequence entries. To get the length of individual entries you would first need to split the file into individual entries, or do it programmatically: either using a homegrown method or a Bioinformatics library such as BioPerl.
grep -v ">" file.fasta | tr -d '[:space:]' | wc -c
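
As one possible "homegrown method" for per-entry lengths, the sketch below prints the ID and length of every record in a single awk pass, without splitting the file first. The function name fasta_lengths is made up for illustration, and the ID is assumed to be the first whitespace-delimited word of the header.

```bash
# Hypothetical helper: print "ID<TAB>length" for every record in a FASTA file.
fasta_lengths() {
  awk '
    function emit() { if (id != "") printf "%s\t%d\n", id, len }
    { sub(/\r$/, "") }                                   # drop Windows line endings
    /^>/ { emit(); id = substr($1, 2); len = 0; next }   # new record starts
    { gsub(/[ \t]/, ""); len += length($0) }             # count residues only
    END { emit() }
  ' "$1"
}

# Small demonstration on a throwaway file.
demo=$(mktemp)
printf '>a desc\nACGT\nAC\n>b\nGG\n' > "$demo"
fasta_lengths "$demo"
rm -f "$demo"
```

Summing the second column of this output should give the same total as the grep/tr/wc command above.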