Monday, March 26, 2012

NGS-SNP - a good tool for SNP annotation

In-depth annotation of SNPs arising from resequencing projects using NGS-SNP


NGS-SNP - Overview

Citing NGS-SNP

Grant JR, Arantes AS, Xiaoping L, Stothard P (2011) In-depth annotation of SNPs arising from resequencing projects using NGS-SNP. Bioinformatics 27:2300-2301.

Description

NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of transcripts or whole genomes from organisms with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields indicate the names of overlapping protein domains or features, and the conservation of both the SNP site and flanking regions. NCBI, Ensembl, and Uniprot IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A “Model_Annotations” field provides several annotations obtained by transferring in silico the SNP to an orthologous gene, typically in a well-characterized species.

NGS-SNP scripts

  • annotate_SNPs.pl - used to annotate SNPs identified by the sequencing of genomic DNA or transcripts.
  • merge_and_sort_SNP_lists.pl - used to filter, merge, and sort SNP lists annotated using NGS-SNP.
  • cDNA_library_entropy.pl - used to choose the best tissues for SNP discovery by mRNA sequencing.
  • obtain_reference_chromosomes.pl - used to obtain reference chromosome sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
  • obtain_reference_transcripts.pl - used to obtain reference transcript sequences from Ensembl that can be supplied to SNP discovery tools such as Maq.
  • get_genes_in_area.pl - used to obtain information about genes located within or near CNVs or other variants supplied as input.
  • ncbi_monitor.pl - used to obtain publications related to genome regions supplied as input.

Using NGS-SNP

  1. Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.
  2. Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.
  3. Annotate the SNP list using the annotate_SNPs.pl script.
The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored (sample data included with NGS-SNP is analyzed):
cd NGS-SNP/scripts

perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
        
For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.
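Before reading the per-script documentation, it can help to look at what is actually on disk. The following is a minimal sketch, not part of NGS-SNP: the helper names list_sample_dirs and list_columns are hypothetical, and list_columns assumes that the first line of the tab-delimited file is a header row of column names, which should be checked against the annotate_SNPs.pl documentation.

```shell
# Hypothetical helper: list the sample-data directories below a directory.
# The directory names test_input and sample_output come from the text above.
list_sample_dirs() {
    find "$1" -type d \( -name test_input -o -name sample_output \) | sort
}

# Hypothetical helper: print "index<TAB>name" for every column found on the
# first line of a tab-delimited file.
# ASSUMPTION: the first line is a header row holding the column names.
list_columns() {
    awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i "\t" $i; exit }' "$1"
}
```

For example, running list_sample_dirs . from NGS-SNP/scripts should show where each script keeps its sample files, and running list_columns annotated_snps.tab after the annotation step should show which column number holds a field such as Model_Annotations, if the header assumption holds.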

Using a local Ensembl database
