Thursday, February 28, 2013

SNPMeta


SNPMeta is a Python tool, built on BioPython, that generates "metadata" for single nucleotide polymorphisms (SNPs), either for easy filtering or for submission to SNP databases. The reported information includes the gene name, whether the SNP is coding or noncoding, and whether the SNP is synonymous or nonsynonymous. SNPMeta writes its output in either dbSNP submission report format or a tab-delimited format.
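To make the synonymous/nonsynonymous distinction concrete, here is a minimal sketch of how a coding SNP can be classified by comparing the amino acid encoded before and after the substitution. It is not SNPMeta's actual code: the function name `classify_snp` and its arguments are hypothetical, and it assumes the standard genetic code and that the codon containing the SNP is already known.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(codon, position, alt_base):
    """Hypothetical helper: classify a substitution within one codon.

    codon    -- the three-base reference codon (DNA, coding strand)
    position -- 0-based index of the SNP within the codon
    alt_base -- the alternate allele at that position
    """
    ref = codon.upper()
    alt = ref[:position] + alt_base.upper() + ref[position + 1:]
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"
    return "nonsynonymous"


if __name__ == "__main__":
    print(classify_snp("GAA", 2, "G"))  # GAA and GAG both encode Glu
    print(classify_snp("GAA", 0, "A"))  # GAA (Glu) becomes AAA (Lys)
```

A real annotation also has to locate the codon from the gene model and reading frame, which this sketch leaves out.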

Companion Scripts
These helper scripts are provided to make running SNPMeta easier, though they may also be useful outside that context.
Blast_SNPs.sh - A shell script that runs BLAST on SNPs and saves the reports as XML. Requires a Bash shell and an installation of NCBI's BLAST executables. Before running it, edit the script in a text editor so that its variables match your system. The input is a directory of FASTA files, with one sequence per file. For each FASTA file in the directory, the script creates a new file ending in '.xml' that contains the BLAST report.
Convert_Illumina.py - A Python script that converts the Illumina contextual sequence format to FASTA for input to SNPMeta. Accepts a text file with two tab-separated fields: the SNP name and the SNP contextual sequence. Writes a FASTA file with IUPAC ambiguity codes to stdout.
GBSContextualSeq.py - A Python script that builds SNP contextual sequences from a reference sequence and a VCF file. Generates a separate FASTA file for each sample listed in the VCF file. This is useful for generating contextual sequences from genotyping-by-sequencing (GBS) data, where the SNPs are stored as a VCF. Requires BioPython, and also argparse if using Python < 2.7.
Split_FASTA.py - A Python script that splits a large FASTA file into smaller files. Takes a FASTA file and a positive integer as arguments. Requires BioPython.
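As an illustration of the kind of conversion Convert_Illumina.py performs, the following is a minimal sketch, not the script's actual code. It assumes the contextual sequence marks the SNP as two bracketed alleles, such as `ACGT[A/G]TTGA`, and the names `to_iupac` and `illumina_to_fasta` are hypothetical. It uses only the Python standard library.

```python
import re
import sys

# IUPAC ambiguity codes for the six possible biallelic SNPs.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}

# Assumed input notation: two alleles in brackets, e.g. [A/G].
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def to_iupac(context_seq):
    """Replace each bracketed allele pair with its IUPAC ambiguity code."""
    def replace(match):
        alleles = frozenset(a.upper() for a in match.groups())
        if len(alleles) == 1:
            # Both alleles identical: keep the plain base.
            return next(iter(alleles))
        return IUPAC[alleles]
    return SNP_PATTERN.sub(replace, context_seq)


def illumina_to_fasta(lines):
    """Turn 'name<TAB>contextual sequence' lines into FASTA text."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, context_seq = line.split("\t")[:2]
        records.append(">%s\n%s" % (name, to_iupac(context_seq)))
    return "\n".join(records)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python sketch.py snps.txt > snps.fasta
    with open(sys.argv[1]) as handle:
        print(illumina_to_fasta(handle))
```

Insertions, deletions, and sites with more than two alleles are not handled here; the pattern simply leaves them untouched.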
