The .alignment file and the XMFA file format
2. http://www.bioperl.org/wiki/HOWTO:AlignIO_and_SimpleAlign
use Bio::AlignIO;
my $io = Bio::AlignIO->new(-file => "receptors.aln", -format => "clustalw");
3. http://www.bioperl.org/wiki/XMFA_multiple_alignment_format
XMFA multiple alignment format
>seqname:start-stop strand comments
>1:1-598 + chrY
TCCAAGTCGGCTTTATGTTTGCTTCTGCCAGGCATTCTAGATGCCCCATGTCTAGGATCT
CTTTAGGCAGGAGAGAGGGTGATGGTGTAGGAGGACCCATTTCTTGGCTTGCAGATTCCA
ATAATAAAAAAGTCACAGATTTAAACCCCAAACTTTGATGAAATGCAGGTCTAGGGTTTT
AAAATATAATGAGAGTTAAATACTTTTGTATTTTCTTCATCCAGAGATGGGGCAAGCTTC
CTCATCTGCTCGTTCATGGGTGATTTATATTTTCCCCACTCCATCCTTTTCCTAAGGTAT
TTTTTTTTTAGGGACAATGGCTTTTTGCAGAGTACTCAGTTCCAGCTCCGGGGGCACCGG
TTGAGCCCTTACCGTCCTGCCCCTAAACATCCAGACCTCAAGTTAGAGAGGGGAGTAACA
TTTGGGGGGTGCCCACACCTAGGAGGACCAATCCTTCTGGTTTCCTTAGGGATGCAGGAA
TTTGGGGGGGGGGGGCTCAGTGCTAAAACCAGTAGAGTCCTGGGCAAACGAGTATGACTG
AAGATGCTTTGAACACCCTAGCGTTATGTCGATCGCATGCATCGTAGTGTCGCTGATG
>2:5000-5598 - chr17
TGCAGATTGGCCTT-TGTTTCGTTTTTC-AAGCGTT-TAAA--CGCCTTGCCTAAGAATC
TTTT--GCAGGGAAGGGGATAGTGAACTGGGAAAACCTGGCTCTTCCTTTCGAGATTCCA
GTAACAAACATGTCATAACTATAAACGCCAAACTTGG--AGAGCGCAGGAATGGAAGGTC
AAACACCAATGAGAGTTAGATGGTTTTGGGTTT----------------------GCT--
CTAGTCTGCACG-------GTGCTCCCCGTCCCCTCACGTCCGTGCTTTTCCTCAGGATG
ATGCCTTGCCAGAACACCGGTGTGCTGCAAGGTGCTCAGCTCCAAATCGGGCTGCACCGC
TTCAGCTTTCCCCATCCAGCCA--ACGCAGGAAGGCCTGGAGCTACAGAGTTTAGAGCCA
TCTCTCCGCTGCTCAT--------TAACCAACCATTCCAGCT-------GTCTGTAGTGG
GTTTTTTTCTT----CTCTACACTAAAATGAGGACAGTCCAGGCCCTTTG--TTAGACTG
AAGATGCTTTGAACACCCTAGCGTTATGTCGATCGCATGCATCGTAGTGTCGCTGATG
>3:19000-19598 - chr7
TCCAGACTGTCTTT-TGCTCCCTTTTTCCGAGCATT-TAAAAATACCATGCCTAAGAATC
TTTT--GCAGGGAAGGGGATAGCGAGCTGGGAAGGCCTATTTCTTCATTTCGAGATTCTG
GTAATAAACATGTCATAAATATAAATGCCAAACTCCG--GAAATGCAGGTGTAGAGCGTC
AGATTCTATTTGGACTTAAATGATGTGGTGTTTT---------------------GCT--
CTAATTTCTACC-------GTGCTCTCCGTTCC-TCAAGTCCATGCATTTCCTTAGGGTG
CTGCCTTTCCAGAGTACTGGTATGCTGCAGGGTGCTCAGTTCCACATCTGTCTGCACTAT
TTCAAAGTTTCCC-TCCAGCCC--ACACAACTATGCCTAGAGCTA--GAGGTTAGAACCG
TCTGTCCA-TGCTCTT--------TAACCAACCACTCCAGAT-------AGGTGTGGTGG
TTTTTTTTTTTTTTTCTCTGTACTAAAATTAGGACAGTCCAGGCCTGTTG--TTAGACCA
AAGATGCTTTGAACACCCTAGCGTTATGTCGATCGCATGCATCGTAGTGTCGCTGATG
= score = 111
>1:1000-1060 + chrY
CACTCTAATAGTAAAGTTTCTTTTGCTGTGCAGAAGCTCTTTAGTTTAATTAGATCCCAT
>2:6000-6060 + chr17
CACTCTAATACTAAACTTTCTTTTCCTCTCCACA----CTTTACTTTAATTACATCCCAT
>3:20000-20060 - chr12
CACTCTAATAGTAAAGTTTCTT----TGTGCAGAAGCTCTTAGTTTTAATTAGATCCCAT
= score = 11
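The layout above (a ">seqname:start-stop strand comments" header, then gapped sequence lines, with a line starting with "=" closing each alignment block) can be read with a few lines of Python. The following is only a rough sketch under those assumptions, not the BioPerl parser; the names parse_xmfa, XmfaEntry, and XmfaBlock are made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Header layout as described above: >seqname:start-stop strand comments
HEADER_RE = re.compile(r"^>\s*(\S+?):(\d+)-(\d+)\s+([+-])\s*(.*)$")


@dataclass
class XmfaEntry:
    """One aligned sequence inside a block (hypothetical container)."""
    seqname: str
    start: int
    stop: int
    strand: str
    comments: str
    sequence: str = ""


@dataclass
class XmfaBlock:
    """One alignment block: its entries plus the text after '='."""
    entries: list = field(default_factory=list)
    trailer: str = ""


def parse_xmfa(text):
    """Split XMFA-style text into a list of XmfaBlock objects."""
    blocks = []
    current = XmfaBlock()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"unrecognised header: {line!r}")
            name, start, stop, strand, comments = match.groups()
            current.entries.append(
                XmfaEntry(name, int(start), int(stop), strand, comments.strip())
            )
        elif line.startswith("="):
            # '=' closes the block; keep whatever follows (e.g. "score = 111")
            current.trailer = line[1:].strip()
            blocks.append(current)
            current = XmfaBlock()
        else:
            if not current.entries:
                raise ValueError("sequence data found before any header")
            # sequence lines are wrapped, so append to the latest entry
            current.entries[-1].sequence += line
    if current.entries:
        blocks.append(current)  # tolerate a missing final '='
    return blocks


if __name__ == "__main__":
    sample = (
        ">1:1000-1010 + chrY\nCACTCTAATAG\n"
        ">2:6000-6010 + chr17\nCACT----TAG\n"
        "= score = 11\n"
    )
    for block in parse_xmfa(sample):
        for entry in block.entries:
            print(entry.seqname, entry.start, entry.stop, entry.strand, entry.sequence)
        print("block trailer:", block.trailer)
```

The block trailer is kept as plain text rather than parsed into a number, because the notes above only show it as a free-form "score" comment.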
4. http://smweb.bcgsc.ca/hmr/index.html
Build steps
- Downloaded the Berkeley data from Berkeley data
- There is one file for each chromosome, so we handwrote the cluster job file
- Ran the job on the CMSGSC cluster using the table creation script buildTables.pl
- Created the database in MySQL using the command
CREATE DATABASE hmr_berkeley
- Found all the SQL table creation files with find_sql_files.sh
- Created a MySQL SQL loader from the script createMySQLDatabaseSHLoader.pl
- Ran the output script from the last step using sh
- Changed to the tables/ directory and ran
mysqlimport -u smontgom -pMYPASS -h db02 hmr_berkeley *.txt.table
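If the import step needs to be repeated, the same command line can be assembled from Python instead of relying on the shell glob. This is a hedged sketch only: the function name build_mysqlimport_command is invented here, and the user, host, and database values are placeholders copied from the command above. It builds the argument list and does not run anything by itself.

```python
import glob
import os


def build_mysqlimport_command(tables_dir, user, password, host, database,
                              pattern="*.txt.table"):
    """Return the mysqlimport argument list for every matching table file.

    File names are kept relative to tables_dir, mirroring the step above
    where the command is run from inside the tables/ directory.
    """
    matches = glob.glob(os.path.join(tables_dir, pattern))
    files = sorted(os.path.basename(path) for path in matches)
    # -u, -p and -h are the same options used in the command above
    return ["mysqlimport", "-u", user, f"-p{password}", "-h", host,
            database, *files]


if __name__ == "__main__":
    cmd = build_mysqlimport_command("tables", "smontgom", "MYPASS",
                                    "db02", "hmr_berkeley")
    print(" ".join(cmd))
    # To actually run it (assumes mysqlimport is installed and on PATH):
    #   import subprocess
    #   subprocess.run(cmd, cwd="tables", check=True)
```

Printing the command first makes it easy to check which files the pattern picked up before anything is loaded.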
5. http://www.koders.com/python/fid99340E36CA028D2382A72EB1B7D3A1639891E865.aspx?s=fuzzy
Python Parsers for FASTA and related formats.
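For orientation, a FASTA reader in Python can be quite small. The sketch below is not the code from the linked page; it is a minimal illustration that assumes each record starts with a ">" header line followed by one or more sequence lines, and the name parse_fasta is hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before any header")
        else:
            chunks.append(line)  # wrapped sequence lines are joined later
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    sample = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq), seq)
```

Writing it as a generator means large files can be processed one record at a time, provided the caller feeds the text in manageable pieces.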