Thursday, February 7, 2013

Perl scripts for systematic and evolutionary studies

http://www.molekularesystematik.uni-oldenburg.de/33997.html#Sequences


Supertrees | Sequences | EvoDevo | Miscellaneous

SUPERTREES (AND TREES IN GENERAL)

v.1.2.2
Determines the bootstrap frequencies for a given phylogenetic tree based on the results of a bootstrap analysis.
v.1.3.3
Adds branch length information to a NEXUS-formatted tree description corresponding to divergence time estimates of the nodes to create an ultrametric "chronograph". Will interpolate any missing dates using the log-log formula of Purvis (1995; see Bininda-Emonds et al., 1999 for the reference) and can correct for negative branch lengths.
v.1.1
Labels internal nodes of a NEXUS-formatted tree description with names of higher-level taxa according to a user-input taxonomy. These labels can be viewed in programs such as TreeView.
v.1.0
Adds node numbers to a NEXUS-formatted tree description for presentation purposes. These labels can be viewed in programs such as TreeView.
v.1.2.1
Calculates the number of unique clades / partitions for pairs of trees pruned to their common taxon set. Will optionally calculate a value weighted according to a measure of nodal support (given as a branch length), and can use this to calculate if two topologically identical trees have statistically different support values.
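As a rough illustration of the comparison described above, the following sketch prunes two rooted trees to their shared taxa and counts the clades found in only one of them. The nested-tuple tree representation and all function names are assumptions made for this example; the script itself reads NEXUS tree descriptions, and its actual algorithm and support weighting are not reproduced here.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def prune(tree, keep):
    """Prune a tree to the taxa in `keep`; return None if nothing is left."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # collapse nodes left with a single descendant
    return tuple(children)


def clades(tree):
    """Collect the clades of a rooted tree, excluding the root clade."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        found.add(frozenset(leaves(node)))
        for child in node:
            walk(child)

    walk(tree)
    found.discard(frozenset(leaves(tree)))
    return found


def unique_clade_count(tree_a, tree_b):
    """Count clades present in exactly one of the two pruned trees."""
    common = leaves(tree_a) & leaves(tree_b)
    if len(common) < 2:
        return 0
    return len(clades(prune(tree_a, common)) ^ clades(prune(tree_b, common)))


tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "F"))
print(unique_clade_count(tree_a, tree_b))
```

Here the shared taxa are A, B, C, and D; the pruned trees differ only in whether they group A with B or A with C.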
v.1.2.1
Calculates the qualitative support for the clades present in a supertree relative to those in the source trees contributing to it. Described in Bininda-Emonds (2003) with a modified version (rQS) described in Price et al. (2005).
Prachi Shah and Davide Pisani of Penn State University have kindly ported (an older version of) this program as a DOS executable file. It might only operate using the default parameters, however.
v.2.2
Derives relative branch length formulas for dating a supertree from one or more gene trees according to the "local molecular clock" procedure in Purvis (1995; see Bininda-Emonds et al., 1999 for the reference). Dates can be relative to either ancestral or daughter nodes.
v.1.0a
Uses PAUP* to reverse engineer the source trees from a NEXUS-formatted MRP data matrix and store them in a single tree file. Requires, naturally, that the character boundaries of the source trees are specified (as NEXUS-format CHARSETs).
v.1.2.1
Converts a NEXUS-formatted treefile into a NEXUS-formatted data file ready for analysis. Incorporates both standard and Purvis MRP coding, and allows source trees to be coded as either rooted or unrooted (the latter as described in Bininda-Emonds et al., 2005).
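For readers unfamiliar with MRP, here is a minimal sketch of standard (Baum/Ragan) matrix representation coding for rooted source trees. Each clade of each source tree becomes one binary character: 1 for taxa inside the clade, 0 for other taxa in that tree, and ? for taxa missing from that tree. The nested-tuple tree input and the function names are assumptions for illustration; Purvis coding, unrooted coding, and NEXUS output are not covered.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def clades(tree):
    """List the clades of a rooted tree in preorder, excluding the root."""
    found = []

    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def mrp_matrix(source_trees):
    """Return {taxon: character string} using standard MRP coding."""
    all_taxa = sorted(set().union(*[leaves(t) for t in source_trees]))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        tree_taxa = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in tree_taxa:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")  # taxon absent from this source tree
    return {taxon: "".join(states) for taxon, states in rows.items()}


source_trees = [(("A", "B"), "C"), (("B", "D"), "A")]
for taxon, states in mrp_matrix(source_trees).items():
    print(taxon, states)
```

MRP analyses usually also add an all-zero outgroup row to root the supertree; that step is left out to keep the sketch short.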
v.2.1
Standardizes the taxon names in a set of source trees according to a user-input reference taxonomy and synonymy list. Mismatches are flagged for the user to correct. Note that it cannot account for branch-length or support information in the trees. Described in Bininda-Emonds et al. (2004).
(Note: versions 1.0.x had serious bugs and should not be used!)
taxonoTree.pl v.1.0
Constructs the tree associated with a hierarchical taxonomy presented in a tab-delimited text file.
v.1.0
Converts a NEXUS-formatted treefile into a PHYLIP-formatted treefile or standalone NEXUS-formatted data file.
treePruner.pl v.1.0
Prunes the trees in a NEXUS-formatted treefile to their common taxon set, of specific user-input taxa, or both. Can optionally retain support values in the trees.

SEQUENCES (AND DATA MINING)

autoMT.pl v.1.0
Allows for batch testing of the optimal model of evolution for a series of sequence files. Model testing can be performed using either ModelTEST with PAUP* or MrAIC.pl with PHYML. The applicability of the molecular clock can also be tested using the ModelTEST / PAUP* combination.
batchPHYML.pl v.1.0
Provides a wrapper around PHYML to easily perform sequential analyses on a set of data matrices specified by the user in a tab-delimited text file.
batchRAXML.pl v.1.1.1
Provides a wrapper around RAxML to easily analyze a set of data files according to a common set of search criteria. Also organizes the RAxML output into a set of subdirectories. Compatible with RAxML-VI-HPC v2.2.3.
v.2.0
Mines all gene sequences from a GenBank output file according to annotations provided in each accession. As such, it is limited by the accuracy of the information given in the accession and uses a restricted library of gene synonyms. However, it can often mine more evolutionarily divergent sequences and better account for paralogs than can a BLAST-based search.
v.1.0a
Crude program to count the number of sequences for a given gene in a GenBank download. Does not correct for differences in spelling, etc.
moleRat.pl v.1.0
Calculates rates of evolution along the branches of one or more (gene) trees with respect to a dated reference tree, both for each tree individually and across the set of trees as a whole. Also identifies branches and clades that are evolving significantly differently from the overall average or that have changed their rate significantly with respect to an ancestral reference point (as determined using a paired Student's t-test and a paired Fisher's sign test). Described in Bininda-Emonds (2007).
seqCat.pl v.1.0
Creates an interleaved NEXUS-formatted supermatrix of individual data matrices (in any of fasta, NEXUS, PHYLIP, or Se-Al formats).
v.1.0.2
Processes an aligned DNA sequence data set to retain only 1) those sequences with a minimum level of pairwise overlap and 2) the five most diverse (and longest) sequences for taxa with greater than five sequences. Data can be input in any of fasta, NEXUS, PHYLIP, or Se-Al formats.
seqConverter.pl v.1.2
Converts between some commonly used file formats (fasta, NEXUS, PHYLIP, or Se-Al) and performs simple data transformations (e.g., modify gaps, translate to amino acids, convert to haplotype data). Can also batch convert all files of a specified file type in the working directory. A program that recognizes more file formats is sreformat, part of the HMMER package.
v.1.2
Facilitates the multiple alignment of protein-coding DNA sequences by aligning the amino acid sequences they specify. Data can be input in and output to any of fasta, NEXUS, PHYLIP, or Se-Al formats. Requires a local copy of ClustalW. Described in Bininda-Emonds (2005).
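The core idea of aligning DNA through its amino acid translation can be sketched as follows: once the protein sequences have been aligned, each gap in the protein alignment becomes a three-base gap in the DNA, and each residue takes the next codon. This is only an assumed, simplified illustration with a hypothetical function name; it takes the aligned protein as given, does not call ClustalW, and is not the script's own code.

```python
def thread_codons(dna, aligned_protein, gap="-"):
    """Map an aligned protein sequence back onto its coding DNA.

    Assumes `dna` is in frame and its length is a multiple of three,
    with one codon per ungapped residue in `aligned_protein`.
    """
    if len(dna) % 3 != 0:
        raise ValueError("DNA length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if residues != len(codons):
        raise ValueError("residue count does not match codon count")
    remaining = iter(codons)
    return "".join(gap * 3 if aa == gap else next(remaining)
                   for aa in aligned_protein)


print(thread_codons("ATGGCCTTT", "M-AF"))
```

Because gaps are only ever inserted in multiples of three, the reading frame of every sequence is preserved in the resulting DNA alignment.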

EVODEVO

The Parsimov package, as described in Jeffery et al. (2005). All programs and example files were written by Jonathan Jeffery.
v.1.0.7g
Implements event-pair "parsimony cracking" as described in Jeffery et al. (2005).
v.1.0.3b
Takes a Parsimv7g.pl output file and replaces the PAUP* character numbers with more readable character names according to a user-specified text list (e.g., CellType.txt, based on cell-line characters in spiralians). For the latter, each line contains the character number (in ascending order) and character name, separated by a tab.
A lot of replacing can be automated using a batch file (e.g., isReplaceBat.txt).  The batch file contains the name of the output file to have its character numbers replaced and the name of the text-list to use, separated by a tab.
n/a
Creates a PAUP* command file to describe each tree in memory under ACCTRAN and DELTRAN optimizations (saving each as separate log files) plus a Parsimv7g.pl batch file (e.g., ParsBatch.txt) to crack each of the PAUP* log files produced. Handy for big jobs.
n/a
An example of a batch file: if you have several log files to work through (e.g., ACCTRAN or DELTRAN optimizations, different topologies, etc.), you can use this to get Parsimv7g.pl to run through each in turn. The contents are, for each log file you want to analyze: the name of the log file, the path where you want its output files written, whether to use all [a] or unambiguous [u] changes, whether to use a thorough search if feasible [y/n], and whether to clean up the "working" files as it goes along [y/n] (useful for big data sets where temp files can reach 100s of MB). These data must be tab-separated, with a new line for each log file.
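A batch file in the layout just described could be generated with a few lines of Python. This is a sketch under assumptions: the function names, the example file names, and the literal a/u and y/n values are inferred from the description above and have not been checked against Parsimv7g.pl itself.

```python
def batch_line(log_file, out_dir, changes="a", thorough=True, cleanup=True):
    """Build one tab-separated batch line for a single PAUP* log file."""
    if changes not in ("a", "u"):
        raise ValueError("changes must be 'a' (all) or 'u' (unambiguous)")
    fields = [
        log_file,                     # name of the log file
        out_dir,                      # path for the output files
        changes,                      # all [a] or unambiguous [u] changes
        "y" if thorough else "n",     # thorough search if feasible
        "y" if cleanup else "n",      # clean up working files
    ]
    if any("\t" in field or "\n" in field for field in fields):
        raise ValueError("fields must not contain tabs or line breaks")
    return "\t".join(fields)


def write_batch(path, jobs):
    """Write one line per job; each job is a tuple of batch_line arguments."""
    with open(path, "w", newline="\n") as handle:
        for job in jobs:
            handle.write(batch_line(*job) + "\n")


# Hypothetical job list: file and directory names are placeholders.
jobs = [
    ("acctran.log", "out/acctran", "u", True, False),
    ("deltran.log", "out/deltran", "a", False, True),
]
for job in jobs:
    print(batch_line(*job))
```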
Other EvoDevo programs (written by me). Note: except for GamSim.pl, these programs are somewhat dated and run in MacPerl only.
v.1.0.2
Implements event-pair cracking as described in Jeffery et al. (2002).
BreakPoint.pl, EventPair.pl, JuncCode.pl
v.1.0
These three programs will encode developmental timing data according to three different coding schemes: breakpoint distances, event-pairing, and junction coding. Each program will output a NEXUS-formatted data file ready for analysis.
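Of the three schemes, event-pairing is the simplest to sketch: every pair of developmental events becomes one character that records their relative timing. The version below assumes a common convention (0 = first event earlier, 1 = simultaneous, 2 = first event later, ? = missing data); the state numbering, input format, and function name are assumptions for illustration and may differ from what EventPair.pl actually uses.

```python
from itertools import combinations


def event_pair_code(timing, events=None):
    """Encode one taxon's developmental sequence as event-pair states.

    `timing` maps event name -> rank or time (None if unknown).
    `events` fixes the event order so that all taxa share the same
    characters; it defaults to the sorted event names.
    """
    events = sorted(timing) if events is None else list(events)
    states = []
    for first, second in combinations(events, 2):
        a, b = timing.get(first), timing.get(second)
        if a is None or b is None:
            states.append("?")   # timing unknown for at least one event
        elif a < b:
            states.append("0")   # first event occurs earlier
        elif a == b:
            states.append("1")   # events are simultaneous
        else:
            states.append("2")   # first event occurs later
    return "".join(states)


# Pairs are generated in the order (A,B), (A,C), (B,C).
print(event_pair_code({"A": 1, "B": 2, "C": 2}))
```

With n events this yields n(n-1)/2 characters per taxon, which is why event-pair matrices grow quickly as events are added.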
v.1.0
Infers the developmental sequence of a hypothetical ancestor on a cladogram, following the procedure described in Jeffery et al. (2002).
v.1.0a
Calculates whether the events in a given developmental sequence are evenly distributed or not. Described in Bininda-Emonds et al. (2003).

MISCELLANEOUS

PerlEQ.pl v.1.0b12
A program, written by Jonathan Jeffery, that performs Safe Taxonomic Reduction (developed by Mark Wilkinson) to identify taxa that can be safely removed from a phylogenetic analysis because they are essentially "redundant" with other taxa. I have produced a modified version that allows all input options to be specified from the command line, including a new option to suppress output of the data matrix and character diagnostics to the html file (to keep the size of this file down somewhat).
(Note: the program is no longer being actively maintained.)
v.1.0.9a
Creates a PAUP* batch file with the necessary instructions to perform a parsimony ratchet analysis.
reverseSTR.pl v.1.0
Re-includes taxa, where this is unequivocally possible, to a NEXUS-formatted tree description derived from an analysis using Safe Taxonomic Reduction. Requires the output of STRindexer.pl.
STRindexer.pl v.1.1
Parses the html output file of PerlEQ to identify taxa that can be safely removed from an analysis and then potentially unequivocally re-included (using reverseSTR.pl). Essentially, these are taxa conforming to the PerlEQ category C*.
STRedundancy.pl v.1.0
Identifies characters in a data matrix that will become redundant (i.e., duplicate others or become uninformative) upon deletion of a specified set of taxa. It is geared largely towards STR analyses of MRP matrices.
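The notion of character redundancy after taxon deletion can be illustrated with a small sketch. It assumes a matrix stored as a dictionary of equal-length state strings, treats a character as uninformative when fewer than two states are each shared by at least two remaining taxa, and treats a character as a duplicate when its column is identical to an earlier informative one. These criteria and the function name are assumptions for the example, not a description of how STRedundancy.pl works internally.

```python
from collections import Counter


def redundant_after_deletion(matrix, deleted):
    """Report characters (0-based index) that are redundant once taxa go.

    `matrix` maps taxon name -> string of character states ('?' = missing).
    Returns {index: "uninformative" | "duplicate"}.
    """
    kept = [taxon for taxon in sorted(matrix) if taxon not in deleted]
    n_chars = len(next(iter(matrix.values())))
    seen = set()
    report = {}
    for i in range(n_chars):
        column = "".join(matrix[taxon][i] for taxon in kept)
        counts = Counter(state for state in column if state != "?")
        shared = [state for state, n in counts.items() if n >= 2]
        if len(shared) < 2:
            report[i] = "uninformative"
        elif column in seen:
            report[i] = "duplicate"
        else:
            seen.add(column)
    return report


matrix = {
    "A": "1101",
    "B": "1100",
    "C": "0010",
    "D": "0010",
    "E": "0111",
}
print(redundant_after_deletion(matrix, deleted={"E"}))
```

In this toy matrix all four characters are informative and distinct while taxon E is present; removing E makes the second character a copy of the first and leaves the fourth with a single taxon in state 1.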
v.1.0
Converts Mac and DOS-style line breaks to Unix-style ones. This is a holdover from when my scripts required Unix-style line breaks. Here is an even better program (for Mac OS X) with drag-n-drop and the works.
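The conversion itself is simple enough to sketch in a few lines. The function names below are hypothetical and this is not the script's own code; it just shows the usual order of operations, replacing DOS breaks (CR LF) before classic Mac breaks (CR) so that a CR LF pair does not turn into two line breaks.

```python
def to_unix(data: bytes) -> bytes:
    """Convert DOS (CR LF) and classic Mac (CR) line breaks to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path):
    """Rewrite a file in place with Unix-style line breaks."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix(data))


print(to_unix(b"line1\r\nline2\rline3\n"))
```

Working on bytes rather than text avoids any accidental newline translation by the platform.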
