Friday, June 29, 2012

On multiple sequence alignment: file formats

1. http://asap.ahabs.wisc.edu/mauve-aligner/mauve-user-guide/mauve-output-file-formats.html

The .alignment file and the XMFA file format


2. http://www.bioperl.org/wiki/HOWTO:AlignIO_and_SimpleAlign

Data files storing multiple sequence alignments come in a variety of formats, and Bio::AlignIO is the BioPerl object for converting between alignment file formats. AlignIO is patterned on the Bio::SeqIO object, and many of its methods have the same names as those in Bio::SeqIO. Just as with Bio::SeqIO, a Bio::AlignIO object can be created with the "-file" and "-format" options:
use strict;
use warnings;
use Bio::AlignIO;
# Open the ClustalW-format alignment file receptors.aln for reading
my $io = Bio::AlignIO->new(-file   => "receptors.aln",
                           -format => "clustalw" );
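Bio::AlignIO performs this kind of conversion inside BioPerl. For readers who do not have BioPerl, the Python sketch below illustrates the idea on one pair of formats: it reads a simple ClustalW-style alignment and writes it out as aligned FASTA. This is not how Bio::AlignIO is implemented. The sketch assumes the common Clustal layout: a "CLUSTAL" header line, then interleaved blocks of "name sequence" lines, with conservation lines that begin with whitespace. The names read_clustal and write_fasta were made up for this example.

```python
def read_clustal(text):
    """Return a dict mapping sequence name -> gapped sequence (input order kept).

    Assumes a 'CLUSTAL' header line followed by interleaved blocks of
    'name  sequence [residue count]' lines.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    seqs = {}
    for line in lines[1:]:
        # Skip blank separator lines and conservation lines (which start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # an optional trailing residue count is ignored
        # dicts keep insertion order (Python 3.7+), so sequences stay in file order
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def write_fasta(seqs, width=60):
    """Format name -> sequence as FASTA text, wrapping sequence lines at `width` columns."""
    out = []
    for name, seq in seqs.items():
        out.append(">" + name)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = """CLUSTAL W (1.83) multiple sequence alignment

seqA    ACGT-ACGT
seqB    ACGTTACG-
        **** ***

seqA    TTGA
seqB    TTGA
"""
    print(write_fasta(read_clustal(example)), end="")
```

For the two-block example, read_clustal joins the blocks into seqA = ACGT-ACGTTTGA and seqB = ACGTTACG-TTGA. The script then prints those two sequences as FASTA records. Gap characters are carried through unchanged, which is what keeps the output an alignment rather than raw sequences.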


3. http://www.bioperl.org/wiki/XMFA_multiple_alignment_format
XMFA multiple alignment format

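XMFA is a modification of the FASTA multiple alignment format that allows multiple alignments in one file. More information is on the Mauve site.
The defline (header line) of each sequence has the form ">seqname:start-stop strand comments", and each alignment block ends with a line starting with "=", which in this example also carries a score: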
   >1:1-598 + chrY
   TCCAAGTCGGCTTTATGTTTGCTTCTGCCAGGCATTCTAGATGCCCCATGTCTAGGATCT
   CTTTAGGCAGGAGAGAGGGTGATGGTGTAGGAGGACCCATTTCTTGGCTTGCAGATTCCA
   ATAATAAAAAAGTCACAGATTTAAACCCCAAACTTTGATGAAATGCAGGTCTAGGGTTTT
   AAAATATAATGAGAGTTAAATACTTTTGTATTTTCTTCATCCAGAGATGGGGCAAGCTTC
   CTCATCTGCTCGTTCATGGGTGATTTATATTTTCCCCACTCCATCCTTTTCCTAAGGTAT
   TTTTTTTTTAGGGACAATGGCTTTTTGCAGAGTACTCAGTTCCAGCTCCGGGGGCACCGG
   TTGAGCCCTTACCGTCCTGCCCCTAAACATCCAGACCTCAAGTTAGAGAGGGGAGTAACA
   TTTGGGGGGTGCCCACACCTAGGAGGACCAATCCTTCTGGTTTCCTTAGGGATGCAGGAA
   TTTGGGGGGGGGGGGCTCAGTGCTAAAACCAGTAGAGTCCTGGGCAAACGAGTATGACTG
   AAGATGCTTTGAACACCCTAGCGTTATGTCGATCGCATGCATCGTAGTGTCGCTGATG
   >2:5000-5598 - chr17
   TGCAGATTGGCCTT-TGTTTCGTTTTTC-AAGCGTT-TAAA--CGCCTTGCCTAAGAATC
   TTTT--GCAGGGAAGGGGATAGTGAACTGGGAAAACCTGGCTCTTCCTTTCGAGATTCCA
   GTAACAAACATGTCATAACTATAAACGCCAAACTTGG--AGAGCGCAGGAATGGAAGGTC
   AAACACCAATGAGAGTTAGATGGTTTTGGGTTT----------------------GCT--
   CTAGTCTGCACG-------GTGCTCCCCGTCCCCTCACGTCCGTGCTTTTCCTCAGGATG
   ATGCCTTGCCAGAACACCGGTGTGCTGCAAGGTGCTCAGCTCCAAATCGGGCTGCACCGC
   TTCAGCTTTCCCCATCCAGCCA--ACGCAGGAAGGCCTGGAGCTACAGAGTTTAGAGCCA
   TCTCTCCGCTGCTCAT--------TAACCAACCATTCCAGCT-------GTCTGTAGTGG
   GTTTTTTTCTT----CTCTACACTAAAATGAGGACAGTCCAGGCCCTTTG--TTAGACTG
   AAGATGCTTTGAACACCCTAGCGTTATGTCGATCGCATGCATCGTAGTGTCGCTGATG
   >3:19000-19598 - chr7
   TCCAGACTGTCTTT-TGCTCCCTTTTTCCGAGCATT-TAAAAATACCATGCCTAAGAATC
   TTTT--GCAGGGAAGGGGATAGCGAGCTGGGAAGGCCTATTTCTTCATTTCGAGATTCTG
   GTAATAAACATGTCATAAATATAAATGCCAAACTCCG--GAAATGCAGGTGTAGAGCGTC
   AGATTCTATTTGGACTTAAATGATGTGGTGTTTT---------------------GCT--
   CTAATTTCTACC-------GTGCTCTCCGTTCC-TCAAGTCCATGCATTTCCTTAGGGTG
   CTGCCTTTCCAGAGTACTGGTATGCTGCAGGGTGCTCAGTTCCACATCTGTCTGCACTAT
   TTCAAAGTTTCCC-TCCAGCCC--ACACAACTATGCCTAGAGCTA--GAGGTTAGAACCG
   TCTGTCCA-TGCTCTT--------TAACCAACCACTCCAGAT-------AGGTGTGGTGG
   TTTTTTTTTTTTTTTCTCTGTACTAAAATTAGGACAGTCCAGGCCTGTTG--TTAGACCA
   AAGATGCTTTGAACACCCTAGCGTTATGTCGATCGCATGCATCGTAGTGTCGCTGATG
   = score = 111
   >1:1000-1060 + chrY
   CACTCTAATAGTAAAGTTTCTTTTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCAT
   >2:6000-6060 + chr17
   CACTCTAATACTAAACTTTCTTTTCCTCTCCACA----CTTTACTTTAATTACATCCCAT
   >3:20000-20060 - chr12
   CACTCTAATAGTAAAGTTTCTT----TGTGCAGAAGCTCTTAGTTTTAATTAGATCCCAT
   = score = 11
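The Python sketch below makes the layout concrete. It is not BioPerl's or Mauve's code. It splits XMFA text into blocks at the "=" lines and parses each defline into its fields. It assumes the defline layout described above and tolerates optional whitespace after ">" in case a writer emits "> 1:...". It also skips lines starting with "#" on the assumption that these are file-level header comments, as in Mauve output. Check both assumptions against real files. The function names are hypothetical.

```python
import re

# Defline layout: >seqname:start-stop strand comments
# Optional whitespace after '>' is tolerated in case a writer emits '> 1:...'.
DEFLINE = re.compile(r"^>\s*(\S+?):(\d+)-(\d+)\s+([+-])(?:\s+(.*))?$")


def parse_defline(line):
    """Split an XMFA defline into (seqname, start, stop, strand, comment)."""
    m = DEFLINE.match(line.strip())
    if m is None:
        raise ValueError("not an XMFA defline: %r" % line)
    name, start, stop, strand, comment = m.groups()
    return name, int(start), int(stop), strand, comment or ""


def read_xmfa_blocks(text):
    """Split XMFA text into blocks, each a list of (defline, gapped_sequence) pairs.

    A line starting with '=' ends the current block. Lines starting with '#'
    and blank lines are skipped; leading indentation is stripped.
    """
    blocks, current = [], []
    header, chunks = None, []
    # A sentinel '=' closes the last block even if the input omits the final '=' line.
    for raw in text.splitlines() + ["="]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith((">", "=")):
            if header is not None:  # finish the previous sequence record
                current.append((header, "".join(chunks)))
            header, chunks = None, []
            if line.startswith(">"):
                header = line
            elif current:  # '=' ends a non-empty block
                blocks.append(current)
                current = []
        else:
            chunks.append(line)
    return blocks
```

For the example above, parse_defline(">1:1-598 + chrY") returns ("1", 1, 598, "+", "chrY"), and read_xmfa_blocks yields two blocks of three sequences each. The score on the "=" line is ignored here. Keep the whole "=" line if you need it.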



4. http://smweb.bcgsc.ca/hmr/index.html
Aggregating and analyzing WGA (whole genome alignment) data


The Berkeley data comes in XMFA format. To build the Berkeley data, we manually downloaded it from the URL above and ran the following build steps:

Build steps

  1. Download the Berkeley data from the "Berkeley data" link.
  2. There is one file per chromosome, so we hand-wrote the cluster job file.
  3. Ran the job on the CMSGSC cluster using the table creation script buildTables.pl.
  4. Created the database in MySQL with the command CREATE DATABASE hmr_berkeley.
  5. Found all the SQL table creation files with find_sql_files.sh.
  6. Generated a MySQL loader shell script with createMySQLDatabaseSHLoader.pl.
  7. Ran the script generated in the previous step with sh.
  8. Changed to the tables/ directory and ran mysqlimport -u smontgom -pMYPASS -h db02 hmr_berkeley *.txt.table
Note: each script may require you to check its input parameters. I will create a tar.gz for download in the future so that these scripts work out of the box. Right now there are a few absolute paths to support running jobs on a cluster.
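The build scripts themselves are not included in this post. The Python sketch below is a rough, hypothetical illustration of the step between parsed XMFA and the mysqlimport load. It flattens alignment blocks, given as lists of (defline, gapped sequence) pairs, into tab-separated rows. This matches the defaults that mysqlimport passes to LOAD DATA: tab between fields and newline between rows. The column layout and the names blocks_to_rows and write_table are assumptions, not the hmr_berkeley schema.

```python
import re

# Same defline layout as in the XMFA section: >seqname:start-stop strand comments
DEFLINE = re.compile(r"^>\s*(\S+?):(\d+)-(\d+)\s+([+-])(?:\s+(.*))?$")


def blocks_to_rows(blocks):
    """Yield one tab-separated row per sequence of each alignment block.

    `blocks` is a list of blocks, each a list of (defline, gapped_sequence)
    pairs. Hypothetical column order:
    block_id, seqname, start, stop, strand, comment, aligned_seq
    """
    for block_id, block in enumerate(blocks, start=1):
        for defline, seq in block:
            m = DEFLINE.match(defline.strip())
            if m is None:
                raise ValueError("not an XMFA defline: %r" % defline)
            name, start, stop, strand, comment = m.groups()
            fields = [str(block_id), name, start, stop, strand, comment or "", seq]
            # A tab or newline inside a field would shift columns when loaded.
            if any("\t" in f or "\n" in f for f in fields):
                raise ValueError("tab or newline inside a field: %r" % defline)
            # Tab between fields, newline between rows: the LOAD DATA defaults.
            yield "\t".join(fields)


def write_table(blocks, path):
    """Write one row per line to `path` for loading with mysqlimport's default settings."""
    with open(path, "w") as fh:
        for row in blocks_to_rows(blocks):
            fh.write(row + "\n")
```

Here is how the pieces would fit together. write_table(blocks, "xmfa_rows.txt") followed by mysqlimport -u USER -p -h HOST hmr_berkeley xmfa_rows.txt would load the rows into a table named xmfa_rows, because mysqlimport takes the table name from the file name without its extension. That table (a hypothetical name) must already exist with matching columns.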

5. http://www.koders.com/python/fid99340E36CA028D2382A72EB1B7D3A1639891E865.aspx?s=fuzzy


Python Parsers for FASTA and related formats.
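The parsers at the link above are not reproduced here. For reference, a minimal generator-based FASTA reader in plain Python looks roughly like the sketch below. It is not the code at that link, and the name read_fasta is made up. Because XMFA records are FASTA-style, the same logic works on the sequences within one XMFA block.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    The header is everything after '>' on the defline; sequence lines are
    concatenated with surrounding whitespace removed. Blank lines are skipped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Since a file object iterates over its lines, a typical call would be `with open("example.fa") as fh:` followed by `for name, seq in read_fasta(fh): ...`, where example.fa is a placeholder file name. Using a generator means only one record is held in memory at a time.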
