Thursday, March 29, 2012

Grant JR, Arantes AS, Liao X, Stothard P (2011) In-depth annotation of SNPs arising from resequencing projects using NGS-SNP. Bioinformatics 27:2300-2301.

Description

NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of transcripts or whole genomes from organisms with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields indicate the names of overlapping protein domains or features, and the conservation of both the SNP site and flanking regions. NCBI, Ensembl, and UniProt IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A "Model_Annotations" field provides several annotations obtained by transferring the SNP in silico to an orthologous gene, typically in a well-characterized species.
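To make "how drastically the SNP changes the score of a protein alignment" concrete, here is a minimal shell sketch. It assumes the change is measured at a single aligned site as (score of variant residue vs. ortholog residue) minus (score of reference residue vs. ortholog residue). This is an illustrative simplification, not NGS-SNP's own code; the function names are hypothetical and only a few BLOSUM62 entries are hard-coded.

```sh
#!/bin/sh
# Hypothetical helpers: NOT part of NGS-SNP. Only a small excerpt of BLOSUM62 is included.

# blosum62 A B: print the BLOSUM62 score for the amino-acid pair (one-letter codes).
blosum62() {
  case "$1$2" in
    RR) echo 5 ;;
    KR|RK) echo 2 ;;
    RW|WR) echo -3 ;;
    WW) echo 11 ;;
    LL) echo 4 ;;
    IL|LI) echo 2 ;;
    LP|PL) echo -3 ;;
    PP) echo 7 ;;
    *) echo "pair not in this excerpt: $1 $2" >&2; return 1 ;;
  esac
}

# score_change REF ALT ORTHO: assumed site-level score change caused by the SNP,
# i.e. score(ALT vs ORTHO) - score(REF vs ORTHO). More negative = more drastic.
score_change() {
  ref_score=$(blosum62 "$1" "$3") || return 1
  alt_score=$(blosum62 "$2" "$3") || return 1
  echo $(( $alt_score - $ref_score ))
}
```

With an arginine in the ortholog, `score_change R W R` prints -8 while the conservative `score_change R K R` prints -3, so under this assumption the R-to-W SNP would sort as the more drastic change.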

NGS-SNP scripts

annotate_SNPs.pl - used to annotate SNPs identified by the sequencing of genomic DNA or transcripts.
merge_and_sort_SNP_lists.pl - used to filter, merge, and sort SNP lists annotated using NGS-SNP.
cDNA_library_entropy.pl - used to choose the best tissues for SNP discovery by mRNA sequencing.
obtain_reference_chromosomes.pl - used to obtain reference chromosome sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
obtain_reference_transcripts.pl - used to obtain reference transcript sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
get_genes_in_area.pl - used to obtain information about genes located within or near CNVs or other variants supplied as input.
ncbi_monitor.pl - used to obtain publications related to genome regions supplied as input.
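The description of cDNA_library_entropy.pl does not say how tissues are ranked. One plausible criterion, assumed here rather than taken from the script, is the Shannon entropy of per-gene read counts: a library whose reads are spread over many genes has higher entropy and should cover more genes for SNP discovery. A minimal sketch with a hypothetical two-column input (gene ID, read count):

```sh
#!/bin/sh
# Hypothetical helper, NOT cDNA_library_entropy.pl itself.
# library_entropy FILE: FILE has whitespace-separated columns "gene_id read_count".
# Prints the Shannon entropy (in bits) of the read distribution across genes.
library_entropy() {
  awk '
    $2 > 0 { count[$1] += $2; total += $2 }   # sum reads per gene, skip zero counts
    END {
      if (total == 0) { print "0.000"; exit }
      h = 0
      for (g in count) {
        p = count[g] / total                  # fraction of reads on this gene
        h -= p * log(p) / log(2)              # accumulate -p * log2(p)
      }
      printf "%.3f\n", h
    }
  ' "$1"
}
```

To rank several libraries, for example: `for f in *.counts; do printf '%s\t%s\n' "$f" "$(library_entropy "$f")"; done | sort -k2,2nr`. A library with reads split evenly over four genes scores 2.000, while one dominated by a single gene scores close to 0.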

Using NGS-SNP

Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.
Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.
Annotate the SNP list using the annotate_SNPs.pl script.

The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored (sample data included with NGS-SNP is analyzed):

cd NGS-SNP/scripts

perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
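To annotate several SNP lists with the same settings, the command can be wrapped in a loop. This is a sketch: the input directory, the `.tab` suffix, the `annotated_` output prefix and the DRY_RUN switch are hypothetical, and every annotate_SNPs.pl option is copied unchanged from the example command.

```sh
#!/bin/sh
# Sketch: run annotate_SNPs.pl on every *.tab SNP list in one directory.
# Assumptions (hypothetical): inputs end in .tab, outputs get an "annotated_" prefix,
# and DRY_RUN=1 prints the commands instead of running them.
# Run from NGS-SNP/scripts; the annotate_SNPs.pl options match the example command.
annotate_all() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  for in_file in "$in_dir"/*.tab; do
    [ -e "$in_file" ] || continue   # glob matched nothing
    out_file="$out_dir/annotated_$(basename "$in_file")"
    # Reuse the positional parameters to hold the full command line safely.
    set -- perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens Mus_musculus \
      -v -matrix annotate_SNPs/data/blosum62.mat -i "$in_file" -o "$out_file"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$@"
    else
      "$@" || return 1
    fi
  done
}

# Usage (hypothetical directories):
# annotate_all my_snp_lists annotated
```

Running it once with `DRY_RUN=1` is a cheap way to check the generated file names before starting a long annotation run.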

For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.

Using a local Ensembl database

To speed up the annotation process by using a local Ensembl database, see "Creating a local copy of Ensembl for NGS-SNP".
To update NGS-SNP so that it uses the latest release of Ensembl, see "Updating the Ensembl API".

NGS Catalog - a collection of NGS resources for human studies

http://bioinfo.mc.vanderbilt.edu/NGS/software.html

How to Extract Text From Multiple Websites with R

Saturday, March 24, 2012

Unix and Python tutorials from the same author

http://pythonstarter.blogspot.de/

http://unstableme.blogspot.de/

plotGoogleMap - easily plot data on Google Maps

genome sequencing for huge genomes

Flow cytometric chromosome sorting in plants: The next generation

ABEL - R packages for genetic studies

http://www.genabel.org/tutorials/ABEL-tutorial

Packages

GenABEL, or *ABEL, is an umbrella name for a number of software packages that facilitate statistical analyses of polymorphic genome data. This rich set of programs now allows very flexible genome-wide association (GWA) analysis (GenABEL, ProbABEL, MixABEL), meta-analysis (MetABEL), parallelization of GWA analyses (ParallABEL), and management of very large files (DatABEL), and it facilitates the evaluation of prediction models (PredictABEL).

Most likely, you need only one of the packages for your specific task. Figure out which one you need, install it, and use it! If you have questions, please refer to the "Support" section of this website.

The code for the latest development versions of all packages is available from GenABEL on R-Forge.

For stable releases, use the CRAN versions of the R packages or the links provided on this website.

GenABEL

Genome-wide association analysis for quantitative, binary and time-till-event traits

MetABEL

Meta-analysis of genome-wide SNP association results

ProbABEL

Genome-wide association analysis of imputed data

PredictABEL

Assess the performance of risk models for binary outcomes

DatABEL

File-based access to large matrices stored on HDD in binary format

ParallABEL

Generalized parallelization of Genome-Wide Association Studies

MixABEL

More mixed models for genome-wide association analysis; experimenting with GSL, multiple input formats, iterators, and parallelization through threads.

VariABEL

Genome-wide variance heterogeneity analysis as a tool for identification of potentially interacting SNPs.

Plotting grouped data vs time with error bars in R

Friday, March 23, 2012

Biological Data Modelling and Scripting in R

workshop materials from the Montreal R user group

https://sites.google.com/site/mcgillbgsa/

to mean, to signify - imply

Interestingly, increased methylation of the hybrid genomes predominantly occurred in regions that were differentially methylated in the two parents and covered by small RNAs, implying that the RNA-directed DNA methylation (RdDM) pathway may direct DNA methylation in hybrids.

to exert an influence - exert influence by doing something

They exert their influence by guiding mRNA cleavage, translational repression, and chromatin modification.

Thursday, March 22, 2012

TopHat-Fusion: an algorithm for discovery of novel fusion transcripts

http://genomebiology.com/2011/12/8/R72

http://tophat-fusion.sourceforge.net/index.html

model-based CpG islands for multiple species - R package

makeCGI is an R package for identifying CpG islands (CGIs) in a genome. It iteratively fits two hidden Markov models (HMMs) to the GC content and the observed-to-expected CpG ratio, and obtains the posterior probability that each genomic region is a CpG island. CpG islands are then defined by thresholding these posterior probabilities.
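The two quantities the makeCGI HMMs are fitted on can be computed for a stretch of sequence as follows. This sketch uses the common definition of the observed-to-expected CpG ratio, (CpG count × sequence length) / (C count × G count); the windowing and HMM fitting that makeCGI performs are not reproduced, and `cpg_features` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper, NOT part of makeCGI.
# cpg_features SEQ: print "<GC fraction> <observed/expected CpG ratio>" for a DNA string.
cpg_features() {
  printf '%s\n' "$1" | awk '
    {
      s = toupper($0)                 # accept lower-case (soft-masked) bases
      n = length(s)
      c = gsub(/C/, "C", s)           # gsub returns the number of matches
      g = gsub(/G/, "G", s)
      cg = gsub(/CG/, "CG", s)        # CpG dinucleotides
      gc = (n > 0) ? (c + g) / n : 0
      oe = (c > 0 && g > 0) ? cg * n / (c * g) : 0
      printf "%.3f %.3f\n", gc, oe
    }'
}
```

For example, `cpg_features ACGT` prints `0.500 4.000`, and a sequence with no C or G reports an O/E ratio of 0 instead of dividing by zero.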

The package can be downloaded from the page below, which also describes the steps for using it. It depends on the Bioconductor packages BSgenome and Biostrings. The input DNA sequence can be either a BSgenome package or a text file in FASTA format.

http://rafalab.jhsph.edu/CGI/index.html

export SVG plot from R

RSvgDevice

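install.packages("RSvgDevice")  # one-time installation

library(RSvgDevice)

# Open an SVG device; everything plotted until dev.off() is written to the file
devSVG(file = "testRplots.svg", width = 10, height = 8,
       bg = "white", fg = "black", onefile = TRUE, xmlHeader = TRUE)
plot(1:11, (-5:5)^2, type = "b", main = "Simple Example Plot")
dev.off()  # close the device and finish writing testRplots.svg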

Time Series Analysis and Its Applications: With R Examples

Wednesday, March 21, 2012

an excellent evolutionary blog

http://bioinfoblog.it/

Trello - a tool for organizing your projects

https://trello.com/board/my-ideas/4f69eb9588299a057b03c7a6

Introduction to Unix systems for Evolutionary Biologists

find bugs in your R scripts

Introduction to R Programming, Part 7: Debugging

automatically detect and efficiently remove sequence contamination from genomic and metagenomic datasets

http://deconseq.sourceforge.net/manual.html

geographic locations of accessions - Arabidopsis thaliana

You can find this information in the 149SNPFiltered folder on this web page.

Genetics and Genomics of the Brassicaceae

Brassicaceae in Agriculture (pp. 67-121)
The Non-coding Landscape of the Genome of Arabidopsis thaliana (pp. 123-151)
Natural Variation in Arabidopsis thaliana (pp. 153-170)
Chasing Ghosts: Comparative Mapping in the Brassicaceae (pp. 171-194)
Comparative Genome Analysis at the Sequence Level in the Brassicaceae (pp. 195-214)
Structural and Functional Evolution of Resynthesized Polyploids (pp. 215-260)
Genetics of Brassica rapa L. (pp. 261-289)
The Genetics of Brassica oleracea (pp. 291-322)
The Genetics of Brassica napus (pp. 323-345)
Genetics of Brassica juncea (pp. 347-372)
Arabidopsis lyrata Genetics (pp. 373-387)
The Genetics of Capsella (pp. 389-411)
Self-Incompatibility in the Brassicaceae (pp. 413-435)
Sequencing the Gene Space of Brassica rapa (pp. 437-467)
Germplasm and Molecular Resources (pp. 469-503)
Resources for Metabolomics (pp. 505-525)
Transformation Technology in the Brassicaceae (pp. 527-560)
Resources for Reverse Genetics Approaches in Arabidopsis thaliana (pp. 561-583)
Resources for Reverse Genetics Approaches in Brassica Species (pp. 585-596)
Bioinformatics Resources for Arabidopsis thaliana (pp. 597-615)
Bioinformatics Resources for the Brassica Species (pp. 617-632)
Perspectives on Genetics and Genomics of the Brassicaceae (pp. 633-677)

Back matter

check Disk Usage and Free Space - unix command

http://unix-school.blogspot.de/2012/03/disk-usage-and-free-space-housekeeping.html
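The linked post covers the standard df and du commands. A minimal sketch of typical invocations, wrapped in hypothetical helper functions (the -h flag is a GNU/BSD extension rather than POSIX, so the ranking helper sticks to `du -k`):

```sh
#!/bin/sh
# Hypothetical helpers around the standard df/du commands.

# disk_summary [DIR]: free space on DIR's filesystem, then DIR's total usage.
disk_summary() {
  dir=${1:-.}
  df -h "$dir"    # size, used and available space of the filesystem holding DIR
  du -sh "$dir"   # total disk usage of the DIR tree
}

# largest_dirs [DIR] [N]: the N largest immediate subdirectories of DIR, in KiB.
largest_dirs() {
  du -sk -- "${1:-.}"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For housekeeping, `largest_dirs ~ 5` shows where most of the space in a home directory has gone.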

Join every 2 lines in a file - unix commands

http://unix-school.blogspot.de/2012/03/join-every-2-lines-in-file.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+http/unix-schoolblogspotcom+(The+UNIX+School)
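Three common ways to join every pair of lines, shown as small shell functions (the function names are hypothetical). They agree on files with an even number of lines; with an odd count, paste leaves a trailing space on the last line, the awk version omits the final newline, and whether sed prints the last line depends on the sed implementation.

```sh
#!/bin/sh
# Hypothetical wrappers; each joins lines 1+2, 3+4, ... of FILE with a space.

# paste reads two lines from stdin per output line ("- -").
join_pairs_paste() { paste -d ' ' - - < "$1"; }

# sed: N appends the next line to the pattern space, then the newline is replaced.
join_pairs_sed() { sed 'N;s/\n/ /' "$1"; }

# awk: odd lines are printed without a newline, even lines complete the pair.
join_pairs_awk() { awk 'NR % 2 { printf "%s ", $0; next } { print }' "$1"; }
```

For a file containing the lines a, b, c, d, all three print `a b` and `c d`.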

Monday, March 19, 2012

to aim at a particular question - bear on these questions

We bring previously overlooked fossil evidence to bear on these questions and find the split between A. thaliana and Arabidopsis lyrata occurred about 13 Mya, and that the split between Arabidopsis and the Brassica complex (broccoli, cabbage, canola) occurred about 43 Mya.