Wednesday, February 29, 2012
boosted regression trees - a robust ecological modeling tech
1. A working guide to boosted regression trees
2. The dismo package for R implements a robust BRT function.
3. BOOSTED TREES FOR ECOLOGICAL MODELING AND PREDICTION
Tuesday, February 28, 2012
Monday, February 27, 2012
fetch miRNA target and map them to pathways using R/bioconductor
##
##[1] 1114
##[1] 315
Expanded the scope and improved the resolution - extend the scale and resolution
Bio++ is a set of C++ libraries for Bioinformatics
Apart from, regardless of: irrespective of
Saturday, February 25, 2012
Useful Bash commands to handle FASTA files
#####################################################
(1) Count the number of sequences in a FASTA file:
grep -c "^>" file.fa
Remove comments from the header lines (everything after the first whitespace):
sed -e 's/^\(>[^[:space:]]*\).*/\1/' my.fasta > mymodified.fasta
(2) add something to end of all header lines:
sed 's/>.*/&WHATEVERYOUWANT/' file.fa > outfile.fa
(3) Clean up a FASTA file so that only the first column of each header is kept:
awk '{print $1}' file.fa > output.fa
(4) To extract ids, just use the following:
grep -o -E "^>\w+" file.fasta | tr -d ">"
(5) A useful step is to linearize your sequences (i.e. remove the sequence wrapping). This is not a perfect solution, as I suspect that a few steps could be avoided, but it works quite fast, even for thousands of sequences.
sed -e 's/\(^>.*$\)/#\1#/' file.fasta | tr -d "\r" | tr -d "\n" | sed -e 's/$/#/' | tr "#" "\n" | sed -e '/^$/d'
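The same linearization can probably be done in a single awk pass, which also avoids trouble when a header happens to contain a "#". This is only a minimal sketch: the function name linearize_fasta and the sample file are made up for illustration, and it assumes an ordinary FASTA file where every record starts with a ">" line.

```shell
# Hypothetical sample input; replace with your own file.
tmpdir=$(mktemp -d)
printf '>seq1 first\nACGT\nAC\n>seq2\nGG\nTT\n' > "$tmpdir/wrapped.fasta"

# linearize_fasta (hypothetical helper name): print each header unchanged,
# followed by its sequence joined onto a single line.
linearize_fasta() {
    awk '
        { sub(/\r$/, "") }                  # strip Windows carriage returns
        /^>/ {
            if (seq != "") print seq        # flush the previous record
            print                           # header line as-is
            seq = ""
            next
        }
        { seq = seq $0 }                    # append wrapped sequence line
        END { if (seq != "") print seq }    # flush the last record
    ' "$1"
}

linearize_fasta "$tmpdir/wrapped.fasta"
```

For the sample above this prints the two headers, each followed by one sequence line (ACGTAC and GGTT).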
(6) Remove duplicated sequences. Pierre Lindenbaum proposed this solution.
sed -e '/^>/s/$/@/' -e 's/^>/#/' file.fasta | tr -d '\n' | tr "#" "\n" | tr "@" "\t" | sort -u -t $'\t' -f -k 2,2 | sed -e 's/^/>/' -e 's/\t/\n/'
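An alternative that keeps the records in their original order (the sort-based pipeline above reorders them) is to remember the sequences already seen in an awk array. This is a sketch, not the method proposed above: dedup_fasta is a hypothetical name, the comparison is case-insensitive to mirror the -f flag of sort, and the first record of each duplicated sequence is the one that is kept.

```shell
# Hypothetical sample input: >b carries the same sequence as >a, in lower case.
tmpdir=$(mktemp -d)
printf '>a\nACGT\n>b\nac\ngt\n>c\nGG\nTT\n' > "$tmpdir/dups.fasta"

# dedup_fasta (hypothetical helper name): print only the first record
# for each distinct sequence, with the sequence on one line.
dedup_fasta() {
    awk '
        function emit(   key) {
            if (hdr == "") return           # nothing read yet
            key = toupper(seq)              # case-insensitive comparison
            if (!(key in seen)) {
                seen[key] = 1
                print hdr
                print seq
            }
        }
        { sub(/\r$/, "") }                  # strip Windows carriage returns
        /^>/ { emit(); hdr = $0; seq = ""; next }
        { seq = seq $0 }                    # join wrapped sequence lines
        END { emit() }
    ' "$1"
}

dedup_fasta "$tmpdir/dups.fasta"
```

On the sample file this keeps records a and c and drops b.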
(7) Splitting a FASTA file of multiple sequences into FASTA files of individual sequences
This command creates one file per member sequence, in the same directory as the source file,
numbered incrementally with a .fasta extension (e.g. for an input file with 5 member sequences, such as the Arabidopsis genome, it will output files 1.fasta to 5.fasta).
awk '/^>/{f=++d".fasta"} {print > f}' file.fasta
(8) Joining multiple FASTA files into a single, multi-sequence FASTA file
This is the reverse of the above, and we will assume a few things. Firstly, you want to combine all FASTA files in the current directory and, secondly, they all have the same extension (.fasta). Adapt to your needs if this is not the case!
cat *.fasta > outfile.fa
(10) List the sequence headers in a FASTA file
grep ">" file.fasta
(11) Counting the number of sequence entities in a FASTA file
grep ">" file.fasta | wc -l
(12) Determining the length of the sequence in a FASTA file
This method gives the TOTAL sequence length of a FASTA file. If your FASTA file has several sequence entries, it returns the sum of the lengths of all entries. To get the length of individual entries you would first need to split the file into individual entries, or do it programmatically: either using a homegrown method or a Bioinformatics library such as BioPerl.
grep -v ">" file.fasta | tr -d '[:space:]' | wc -c
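As a rough example of the "homegrown method" for per-entry lengths, an awk pass can keep a running count for each record, so the file does not have to be split first. This is a sketch under a few assumptions: fasta_lengths is a hypothetical name, the ID is taken to be the first word of the header, and whitespace inside sequence lines is not counted.

```shell
# Hypothetical sample input; replace with your own file.
tmpdir=$(mktemp -d)
printf '>seq1 first\nACGT\nAC\n>seq2\nGG\nTT\n' > "$tmpdir/sample.fasta"

# fasta_lengths (hypothetical helper name): print "ID<TAB>length" per record.
fasta_lengths() {
    awk '
        { sub(/\r$/, "") }                          # strip carriage returns
        /^>/ {
            if (have) print id "\t" len             # flush the previous record
            have = 1
            id = substr($1, 2)                      # first word, without ">"
            len = 0
            next
        }
        { gsub(/[[:space:]]/, ""); len += length($0) }
        END { if (have) print id "\t" len }         # flush the last record
    ' "$1"
}

fasta_lengths "$tmpdir/sample.fasta"
```

For the sample file this reports a length of 6 for seq1 and 4 for seq2; summing the second column gives the same total as the command above.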