In-depth annotation of SNPs arising from resequencing projects using NGS-SNP
NGS-SNP - Overview
Citing NGS-SNP
Grant JR, Arantes AS, Liao X, Stothard P (2011) In-depth annotation of SNPs arising from resequencing projects using NGS-SNP. Bioinformatics 27:2300-2301.
Description
NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of transcripts or whole genomes from organisms with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields indicate the names of overlapping protein domains or features, and the conservation of both the SNP site and flanking regions. NCBI, Ensembl, and UniProt IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A “Model_Annotations” field provides several annotations obtained by transferring the SNP in silico to an orthologous gene, typically in a well-characterized species.
NGS-SNP scripts
Using NGS-SNP
- Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.
- Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.
- Annotate the SNP list using the annotate_SNPs.pl script.
The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored, using the sample data included with NGS-SNP:
cd NGS-SNP/scripts
perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.
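Once an annotation run has finished, a quick way to see which fields were written is to list the column names of the output file. The following is a minimal sketch, not part of NGS-SNP: the helper name list_columns is hypothetical, and it assumes the output is tab-delimited and that its first line not starting with "#" is a header row. Check the annotate_SNPs.pl documentation for the actual output layout.

```shell
# list_columns: hypothetical helper, not part of NGS-SNP.
# Prints each column name of a tab-delimited file with its 1-based index.
# Assumption: the first line that does not start with '#' is the header row.
list_columns() {
    awk -F'\t' '!/^#/ {
        for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i
        exit
    }' "$1"
}

# Example (after running annotate_SNPs.pl as shown above):
#   list_columns annotated_snps.tab
```

The printed index can then be passed to standard tools such as cut or sort to pull out or order by a single field.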
Using a local Ensembl database