2. http://egglib.sourceforge.net/index.html
3. The manual (PDF)
http://ftp.jaist.ac.jp/pub/sourceforge/e/eg/egglib/current/egglib.pdf
4. An example
polymorphismBPP(dataType=1)
Computes diversity statistics using tools provided through the Bio++ libraries. Note that attempting to
call this method from an EggLib module compiled without Bio++ support will result in a RuntimeError.
Arguments:
•dataType: 1 for DNA, 2 for RNA, 3 for protein sequences, 4 for standard codons, 5 for vertebrate
mitochondrial codons, 6 for invertebrate mitochondrial codons and 7 for echinoderm mitochondrial codons.
The method returns a dictionary containing the diversity statistics. Some keys will be computed only
in the presence of an outgroup, or if sequences were specified as coding or depending on the value of
other statistics (otherwise, they will be None).
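Because some entries may be None, code that consumes the result usually has to filter them first. The following is a minimal sketch that works on a hand-made dictionary of the same general shape; the helper name computed_stats and all numeric values are illustrative assumptions, not EggLib output.

```python
def computed_stats(stats):
    """Return only the statistics that were actually computed (not None).

    The nested "options" entry is skipped because it records call
    arguments rather than a statistic.
    """
    return {key: value for key, value in stats.items()
            if key != "options" and value is not None}


# Hand-made dictionary mimicking the shape of the returned value.
# The numbers are invented for illustration only.
example = {
    "S": 4,
    "thetaW": 1.92,
    "D": -0.41,
    "Sext": None,   # no outgroup in this hypothetical data set
    "TiTv": None,   # no transversions in this hypothetical data set
    "options": {"dataType": 1},
}

kept = computed_stats(example)
print(sorted(kept))  # ['D', 'S', 'thetaW']
```

In practice the dictionary passed to the helper would be the one returned by polymorphismBPP.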
The following statistics are always computed:
•S: Number of polymorphic sites.
•Sinf: Number of parsimony informative sites.
•Ssin: Number of singleton sites.
•eta: Minimal number of mutations.
•thetaW: Theta estimator (Watterson Theor. Popul. Biol. 7:256-276, 1975).
•T83: Theta estimator (Tajima Genetics 105:437-460, 1983).
•He: Heterozygosity.
•Ti: Number of transitions.
•Tv: Number of transversions.
•K: Number of haplotypes.
•H: Haplotypic diversity.
•rhoH: Hudson’s estimator of rho (Genet. Res. 50:245-250, 1987).
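To make two of these quantities concrete, here is a small sketch that counts polymorphic sites (S) and derives Watterson's estimator as S / a1, where a1 = 1 + 1/2 + ... + 1/(n-1) for n sequences. It is a textbook calculation on a toy alignment, not EggLib's implementation: the function name is hypothetical, gaps and missing data are not handled, and the value is per locus, which may differ from how EggLib scales thetaW.

```python
def watterson_theta(sequences):
    """Return (S, thetaW) for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    # A site is polymorphic when more than one distinct character occurs.
    S = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # Harmonic-like normalising constant a1 = sum of 1/i for i = 1..n-1.
    a1 = sum(1.0 / i for i in range(1, n))
    return S, S / a1


# Toy alignment: sites 3 and 5 (1-based) are polymorphic.
toy = ["ACGTAC", "ACGTTC", "ACCTTC", "ACGTAC"]
S, theta = watterson_theta(toy)
print(S)  # 2
```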
The following statistic is computed only if Tv > 0:
•TiTv: Transition/transversion ratio.
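The Tv > 0 condition exists because the ratio would otherwise divide by zero. The sketch below classifies biallelic DNA sites of a toy alignment as transitions (A/G or C/T) or transversions and returns None for the ratio when no transversion is found. The function names are hypothetical, and EggLib may count Ti and Tv differently (for example at sites with more than two alleles).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True if a and b differ and are both purines or both pyrimidines."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv(sequences):
    """Return (Ti, Tv, ratio) counted over biallelic A/C/G/T sites."""
    ti = tv = 0
    for column in zip(*sequences):
        alleles = set(column)
        # Skip monomorphic sites, multi-allelic sites and non-ACGT characters.
        if len(alleles) != 2 or not alleles <= (PURINES | PYRIMIDINES):
            continue
        a, b = sorted(alleles)
        if is_transition(a, b):
            ti += 1
        else:
            tv += 1
    # Mirror the documented behaviour: no ratio without transversions.
    ratio = float(ti) / tv if tv > 0 else None
    return ti, tv, ratio


toy = ["ACGTAC", "ACGTTC", "ACATTC", "GCGTAC"]
print(ti_tv(toy))  # (2, 1, 2.0)
```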
The following statistic is computed only if S > 0:
•D: Tajima statistic (Genetics 123:585-595, 1989).
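For reference, Tajima's D compares the mean number of pairwise differences (pi) with S / a1 and divides by an estimate of the standard deviation of that difference. The sketch below follows the standard formulas from Tajima (1989); it is not EggLib's code, the function name is hypothetical, and pi is assumed to be given per locus. It returns None when S is 0, in the same spirit as the documented behaviour, and also when n < 4, where the variance term is zero.

```python
import math


def tajima_d(n, S, pi):
    """Tajima's D from sample size n, segregating sites S and mean
    pairwise differences pi. Returns None when D is undefined."""
    if S == 0 or n < 4:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1.0) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# With fewer pairwise differences than expected from S, D is negative.
print(tajima_d(10, 5, 1.0) < 0)  # True
```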
The following statistics are computed only if eta > 0:
•Deta: Tajima’s D computed with eta instead of S.
•Dflstar: Fu and Li’s D* (without outgroup; Genetics 133:693-709).
•Fstar: Fu and Li’s F* (without outgroup; Genetics 133:693-709).
The following statistic is computed only if an outgroup is found:
•Sext: Mutations on external branches.
The following statistics are computed only if an outgroup is found and eta > 0:
•Dfl: Fu and Li’s D (Genetics 133:693-709).
•F: Fu and Li’s F (Genetics 133:693-709).
The following statistics are computed only if sequences are coding (dataType = 4-7):
•ncodon1mut: Number of codon sites with exactly one mutation.
•NSsites: Average number of non-synonymous sites.
•nstop: Number of codon sites with a stop codon.
•nsyn: Number of codon sites with a synonymous change.
•PiNS: Nucleotide diversity computed on non-synonymous sites.
•PiS: Nucleotide diversity computed on synonymous sites.
•SNS: Number of non-synonymous polymorphic sites.
•SS: Number of synonymous polymorphic sites.
•Ssites: Number of synonymous sites.
•tWNS: Watterson’s theta computed on non-synonymous sites.
•tWS: Watterson’s theta computed on synonymous sites.
The following statistics are computed only if sequences are coding (dataType = 4-7) and an outgroup is
found:
•MK: McDonald-Kreitman test table (Nature 351:652-654, 1991).
•NI: Neutrality index (Rand and Kann Mol. Biol. Evol. 13:735-748).
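The neutrality index is conventionally defined from the four cells of a McDonald-Kreitman table as NI = (Pn / Ps) / (Dn / Ds), where P denotes polymorphisms, D fixed differences, n non-synonymous and s synonymous. The sketch below takes the four counts as separate arguments; how EggLib lays out the MK entry is not specified here, so the function name, argument order and the counts in the example are assumptions for illustration.

```python
def neutrality_index(pn, ps, dn, ds):
    """Return NI = (Pn / Ps) / (Dn / Ds), or None if any of Ps, Dn, Ds
    is zero, since one of the two ratios is then undefined."""
    if ps == 0 or dn == 0 or ds == 0:
        return None
    return (float(pn) * ds) / (float(ps) * dn)


# Illustrative counts: 2 non-synonymous and 42 synonymous polymorphisms,
# 7 non-synonymous and 17 synonymous fixed differences.
ni = neutrality_index(2, 42, 7, 17)
print(round(ni, 4))  # 0.1156
```

Values below 1 indicate an excess of non-synonymous fixed differences relative to polymorphism, and values above 1 indicate the opposite.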
The returned dictionary also contains a nested dictionary, options, which echoes the values used
at function call.
Changed in version 2.0.2: The following statistics are now computed only if S > 0:
D, Deta, Dflstar, Fstar, Dfl, F.
Changed in version 2.1.0: Statistics that are not computed are now exported and set to None.