Thursday, February 28, 2013

SNPMeta

SNPMeta is a Python and BioPython-based tool to generate "metadata" for single nucleotide polymorphisms (SNPs) for easy filtering, or submission to SNP databases. Information reported includes gene name, whether the SNP is coding or noncoding, and whether the SNP is synonymous or nonsynonymous. SNPMeta outputs in either a dbSNP submission report format, or a tab-delimited format.
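The synonymous versus nonsynonymous distinction can be illustrated with a small sketch. This is not SNPMeta's code: the function name `classify_coding_snp` and its arguments are hypothetical, and it assumes the standard genetic code and that the codon containing the SNP is already known.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a SNP inside a codon (hypothetical helper, not part of SNPMeta).

    codon    -- reference codon, e.g. "GAA"
    position -- 0-based position of the SNP within the codon
    alt_base -- alternate allele at that position
    Returns (effect, reference amino acid, alternate amino acid).
    """
    codon = codon.upper()
    mutated = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    effect = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return effect, ref_aa, alt_aa
```

For example, `classify_coding_snp("GAA", 2, "G")` compares GAA with GAG, which both encode glutamate, so the change is synonymous.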

Companion Scripts

These are various helper scripts provided to help with running SNPMeta. They might have uses outside of that context, though.

Blast_SNPs.sh - A shell script to run BLAST on SNPs, and save the reports as XML. Requires an installation of NCBI's BLAST executables, and a Bash shell. Edit the script in a text editor so the variables match your system. Requires a directory with FASTA files, with one sequence per file. This script will create a new file for each FASTA in the directory, ending in '.xml', containing the BLAST report.

Convert_Illumina.py - A Python script to convert from the Illumina contextual sequence format to FASTA, for input to SNPMeta. Accepts a text file with two fields, separated by a tab: the SNP Name, and the SNP contextual sequence. Outputs a FASTA file with IUPAC ambiguities to stdout.

GBSContextualSeq.py - A Python script to build SNP contextual sequences from a reference sequence and a VCF file. Generates a separate FASTA file for each sample listed in the VCF file. This is useful for generating contextual sequence from genotyping-by-sequencing (GBS) data, as the SNPs will be stored as a VCF. Requires BioPython. Also requires Argparse if using Python < 2.7.

Split_FASTA.py - A Python script to split a large FASTA file into smaller files. Takes a FASTA file and a positive integer as arguments. Requires BioPython.
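As a rough idea of what the Illumina-to-FASTA conversion involves, here is a minimal sketch. It is not the actual Convert_Illumina.py: it assumes the SNP is written in brackets as `[A/G]` inside the contextual sequence, and the function name `illumina_to_fasta` is made up for illustration.

```python
import re

# IUPAC ambiguity codes for the six possible two-allele SNPs.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
# Assumed Illumina notation for the SNP: two alleles in brackets, e.g. [A/G].
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]")


def illumina_to_fasta(line):
    """Turn one 'name<TAB>contextual sequence' line into a FASTA record."""
    name, seq = line.rstrip("\n").split("\t")

    def to_iupac(match):
        # Look up the ambiguity code for the unordered pair of alleles.
        return IUPAC[frozenset(match.group(1) + match.group(2))]

    return ">{}\n{}".format(name, SNP_PATTERN.sub(to_iupac, seq.upper()))
```

Calling it on each line of the two-column input file and printing the result would give FASTA on stdout, as the description above says.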

Wednesday, February 27, 2013

Structure-Pipeline

1. Structure-Pipeline: a really good tool.

Users of Structure (Pritchard et al., 2000) may be familiar with the interface of the BioHPC cluster at Cornell. Unfortunately, guest access was discontinued in May 2011. If you have Structure installed on your SGE supercomputing cluster, several features of the web-based BioHPC cluster interface can be replaced with a pipeline of qsub and Python scripts. This pipeline will guide you through setting up your data file and parameter settings, running Structure efficiently at many values of K, summarizing those results using CLUMPP, and visualizing the results using custom R scripts.

2. structure Harvester

http://taylor0.biology.ucla.edu/structureHarvester/example/summary.html

3. a simple guide

https://wiki.duke.edu/display/SCSCusers/Using+Structure

Sunday, February 24, 2013

test for local adaptation and to analyze the performance of hybrids relative to native parental plants

A tool and a reference for analyzing transplant experiment data.

1. Geyer, C. J., S. Wagenius, and R. G. Shaw. 2007. Aster models for life history analysis. Biometrika 94:415–426.

http://cran.r-project.org/web/packages/aster/index.html

http://onlinelibrary.wiley.com/doi/10.1111/j.1558-5646.2008.00457.x/full

willislab -- on Mimulus speciation and genetics

http://biology.duke.edu/willislab/

genomics of ecological speciation - some cases

1. http://onlinelibrary.wiley.com/doi/10.1111/j.1461-0248.2010.01546.x/full
A guide to the genomics of ecological speciation in natural animal populations

Interest in ecological speciation is growing, as evidence accumulates showing that natural selection can lead to rapid divergence between subpopulations. However, whether and how ecological divergence can lead to the buildup of reproductive isolation remains under debate. What is the relative importance of natural selection vs. neutral processes? How does adaptation generate reproductive isolation? Can ecological speciation occur despite homogenizing gene flow? These questions can be addressed using genomic approaches, and with the rapid development of genomic technology, will become more answerable in studies of wild populations than ever before. In this article, we identify open questions in ecological speciation theory and suggest useful genomic methods for addressing these questions in natural animal populations. We aim to provide a practical guide for ecologists interested in incorporating genomic methods into their research programs. An increased integration between ecological research and genomics has the potential to shed novel light on the origin of species.

2. http://www.sciencedirect.com/science/article/pii/S0169534712001863

What is needed for next-generation ecological and evolutionary genomics?

Ecological and evolutionary genomics (EEG) aims to link gene functions and genomic features to phenotypes and ecological factors. Although the rapid development of technologies allows central questions to be addressed at an unprecedented level of molecular detail, they do not alleviate one of the major challenges of EEG, which is that a large fraction of genes remains without any annotation. Here, we propose two solutions to this challenge. The first solution is in the form of a database that regroups associations between genes, organismal attributes and abiotic and biotic conditions. This database would result in an ecological annotation of genes by allowing cross-referencing across studies and taxa. Our second solution is to use new functional techniques to characterize genes implicated in the response to ecological challenges.

3. http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2008.03946.x/abstract

Divergent selection and heterogeneous genomic divergence

Levels of genetic differentiation between populations can be highly variable across the genome, with divergent selection contributing to such heterogeneous genomic divergence. For example, loci under divergent selection and those tightly physically linked to them may exhibit stronger differentiation than neutral regions with weak or no linkage to such loci. Divergent selection can also increase genome-wide neutral differentiation by reducing gene flow (e.g. by causing ecological speciation), thus promoting divergence via the stochastic effects of genetic drift. These consequences of divergent selection are being reported in recently accumulating studies that identify: (i) ‘outlier loci’ with higher levels of divergence than expected under neutrality, and (ii) a positive association between the degree of adaptive phenotypic divergence and levels of molecular genetic differentiation across population pairs [‘isolation by adaptation’ (IBA)]. The latter pattern arises because as adaptive divergence increases, gene flow is reduced (thereby promoting drift) and genetic hitchhiking increased. Here, we review and integrate these previously disconnected concepts and literatures. We find that studies generally report 5–10% of loci to be outliers. These selected regions were often dispersed across the genome, commonly exhibited replicated divergence across different population pairs, and could sometimes be associated with specific ecological variables. IBA was not infrequently observed, even at neutral loci putatively unlinked to those under divergent selection. Overall, we conclude that divergent selection makes diverse contributions to heterogeneous genomic divergence. Nonetheless, the number, size, and distribution of genomic regions affected by selection varied substantially among studies, leading us to discuss the potential role of divergent selection in the growth of regions of differentiation (i.e. genomic islands of divergence), a topic in need of future investigation.

Saturday, February 23, 2013

MultiGeneBlast: Combined BLAST searches for operons and gene clusters

MultiGeneBlast is an open source tool for identification of homologs of multigene modules such as operons and gene clusters. It is based on a reformatting of the FASTA headers of NCBI GenBank protein entries, using which it can track down their source nucleotide and coordinates.

Oftentimes when studying such genetic loci, much can be learned from their evolutionary context. Furthermore, MultiGeneBlast can aid in the detection of such multigene parts for synthetic biology projects; a synthetic library of operons can be created based on its output to identify those operons whose function is closest to the one desired by the user.

This tool provides the opportunities to identify all homologous genomic regions by combining the results of single BlastP runs on each gene, and sorting genomic regions from any GenBank entry by the number of hits, synteny conservation and cumulative Blast bit score. The basic algorithm behind this was previously used in our antiSMASH software.
Additionally, architecture searches can be performed to find any genomic regions with Blast hits to any user-specified combination of amino acid sequences.

The tool comes with a pre-configured database containing the most recent version of all relevant GenBank divisions. Moreover, you can easily make your own databases from local files or online GenBank entries or divisions.

http://multigeneblast.sourceforge.net/

Friday, February 22, 2013

bigcor: Large correlation matrices in R

http://rmazing.wordpress.com/2013/02/22/bigcor-large-correlation-matrices-in-r/

It has been shown that by calculating the Pearson correlation between genes, one can identify (by high $\varrho$ values, i.e. > 0.9) genes that share a common regulation mechanism such as being induced/repressed by the same transcription factors:

http://www.jbc.org/content/279/17/17905.long

I had an idea: how about using my microarray data, with gene expression values for 40000 genes in 28 samples, to calculate the correlation between all 40000 genes (variables)?
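A full 40000 x 40000 correlation matrix holds 1.6 billion values, so the usual trick is to compute it one block at a time. The sketch below shows that idea in plain Python; it is not the bigcor implementation (which is written in R), and the function names are made up here. It assumes no gene has zero variance, since such a row cannot be scaled.

```python
import math


def standardize(row):
    """Center a row and scale it to unit length, so dot products equal Pearson r."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def blockwise_correlations(rows, block_size):
    """Yield (i, j, r) for every pair i < j, working through one block pair at a time.

    Only the standardized input (genes x samples) is kept in memory;
    the full genes x genes matrix is never built.
    """
    z = [standardize(r) for r in rows]
    n = len(z)
    starts = range(0, n, block_size)
    for a in starts:
        for b in starts:
            if b < a:
                continue  # the matrix is symmetric, so skip the lower triangle
            for i in range(a, min(a + block_size, n)):
                for j in range(max(b, i + 1), min(b + block_size, n)):
                    yield i, j, sum(x * y for x, y in zip(z[i], z[j]))
```

Because the results are yielded pair by pair, they can be filtered on the fly (for example keeping only r > 0.9) or written to disk block by block.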

Inferring Population Histories Using Genome-Wide Allele Frequency Data

The recent development of high-throughput genotyping technologies has revolutionized the collection of data in a wide range of both model and nonmodel species. These data generally contain huge amounts of information about the demographic history of populations. In this study, we introduce a new method to estimate divergence times on a diffusion time scale from large single-nucleotide polymorphism (SNP) data sets, conditionally on a population history that is represented as a tree. We further assume that all the observed polymorphisms originate from the most ancestral (root) population; that is, we neglect mutations that occur after the split of the most ancestral population. This method relies on a hierarchical Bayesian model, based on Kimura’s time-dependent diffusion approximation of genetic drift. We implemented a Metropolis–Hastings within Gibbs sampler to estimate the posterior distribution of the parameters of interest in this model, which we refer to as the Kimura model. Evaluating the Kimura model on simulated population histories, we found that it provides accurate estimates of divergence time. Assessing model fit using the deviance information criterion (DIC) proved efficient for retrieving the correct tree topology among a set of competing histories. We show that this procedure is robust to low-to-moderate gene flow, as well as to ascertainment bias, providing that the most distantly related populations are represented in the discovery panel. As an illustrative example, we finally analyzed published human data consisting in genotypes for 452,198 SNPs from individuals belonging to four populations worldwide. Our results suggest that the Kimura model may be helpful to characterize the demographic history of differentiated populations, using genome-wide allele frequency data.

http://mbe.oxfordjournals.org/content/30/3/654.full

modeler4simcoal2 (m4s2) - a modeler for coalescent processes

modeler4simcoal2 (m4s2) is a modeler for coalescent processes. It allows the modeling of both demographies and chromosomes (i.e., markers with linkage relationships in multiple chromosome blocks).

m4s2 generates files for usage with Simcoal2 which can easily be analyzed with Arlequin3. m4s2 can be run standalone or can directly call and control Simcoal2. Arlequin3 can also be called after the simulations are run.

m4s2 is a Java Web Start application (requiring Java 1.4, available for Windows, Mac and Linux among others). It requires no installation and can be run directly from the web. m4s2 can be run on more platforms than those supported by Simcoal2 and Arlequin3 (in this case only in standalone mode).

The purpose of m4s2 is to allow biologists to concentrate more on biology and the underlying models used in analysis (and less on having to learn a new computer simulation tool). We expect that m4s2 will lower the barrier for coalescent simulator use.

m4s2 has full expressive power with regards to chromosome modeling (i.e., it can model all that Simcoal2 supports).

Regarding demographies, m4s2 includes a set of models which cover the vast majority found in the literature (e.g., island, stepping-stone). An extension system is also provided, allowing for the creation of new models. A simple extension language is provided; if that language is not enough, the full expressive power of Python (Jython) can be used to create new models. New models can be made available online, as m4s2 can import those directly from the web. We make available an external model on the expansion of humans and domesticated species after the Neolithic as hierarchically structured.

Before using m4s2 we recommend reading the users guide. At least the first few lines... You can run m4s2 directly from here.

http://bioinformatics.oxfordjournals.org/content/23/14/1848.long

Thursday, February 21, 2013

circular plots in R

Is there any R or R / Bioconductor package that can make circular plots like Perl / Circos?

Wednesday, February 20, 2013

Evolutionary Genomics

Statistical and Computational Methods, Volume 2

Evolutionary Biology for the 21st Century

http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001466

Evolutionary Processes That Shape Genomic and Phenotypic Variation

The availability of genomic data from a remarkable range of species has allowed the alignment and comparison of whole genomes. These comparative approaches have been used to characterize the relative importance of fundamental evolutionary processes that cause genomic evolution and to identify particular regions of the genome that have experienced recent positive selection, recurrent adaptive evolution, or extreme sequence conservation[72]–[75]. Yet more recently, resequencing of additional individuals or populations is also allowing genome-wide population genetic analyses within species [76]–[82]. Such population-level comparisons will allow even more powerful study of the relative importance of particular evolutionary processes in molecular evolution as well as the identification of candidate genomic regions that are responsible for key evolutionary changes (e.g., sticklebacks [83], butterflies [84], Arabidopsis [85]). These data, combined with theoretical advances, should provide insight into long-standing questions such as the prevalence of balancing selection, the relative frequency of strong versus weak directional selection, the role of hybridization, and the importance of genetic drift. A key challenge will be to move beyond documenting the action of natural selection on the genome to understanding the importance of particular selective agents. For example, what proportion of selection on genomes results from adaptation to the abiotic environment, coevolution of species, sexual selection, or genetic conflict? Finally, as sequencing costs continue to drop and analytical tools improve, these same approaches may be applied to organisms that present intriguing evolutionary questions but were not tractable methodologically just a few years ago. The nonmodel systems of today may well become the model systems of tomorrow [86].

Understanding Biological Diversification

A major and urgent challenge is to improve knowledge of the identity and distribution of species globally. While we need to retain the traditional focus on phenotypes, powerful new capabilities to obtain and interpret both genomic and spatial data can and should revolutionize the science of biodiversity. Building on momentum from single-locus “barcoding" efforts, new genome-level data can build bridges from population biology to systematics [91]. By establishing a comprehensive and robust “Tree of Life," we will improve understanding of both the distribution of diversity and the nature and timing of the evolutionary processes that have shaped it.

pandas - a python package working with dataframe

1. http://blog.yhathq.com/posts/R-and-pandas-and-what-ive-learned-about-each.html

pandas is the utility belt for data analysts using python. The package centers around the pandas DataFrame, a two-dimensional data structure with indexable rows and columns. It has effectively taken the best parts of Base R, R packages like plyr and reshape2 and consolidated them into a single library. It has lots of features (see library highlights). pandas gets its name from panel data, an econometrics term for multidimensional structured datasets (McKinney 5., 2013)

2. http://pandas.pydata.org/pandas-docs/stable/index.html

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet

Ordered and unordered (not necessarily fixed-frequency) time series data.

Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels

Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

Monday, February 18, 2013

evopipes - Bioinformatic tools for ecological and evolutionary genomics

1. http://evopipes.net/docs.html

2. http://www.la-press.com/evopipesnet-bioinformatic-tools-for-ecological-and-evolutionary-genomi-article-a2316

Sunday, February 17, 2013

SHIPS - a non-parametric clustering algorithm

http://stat.genopole.cnrs.fr/logiciels/SHIPS

Overview

SHIPS (Spectral Hierarchical clustering for the Inference of Population Structure) is a non-parametric clustering algorithm that clusters individuals from a population into genetically homogeneous sub-populations from genotype data. After computing a pairwise distance matrix, the algorithm progressively divides the original population into two sub-populations by the use of a spectral clustering algorithm. The process is then iterated in each of the two sub-populations and so on. This leads to the construction of a binary tree, where each node represents a group of individuals. To determine the final clusters, a tree pruning procedure and an estimation of the optimal number of clusters (a gap statistic) are applied. In such an approach both the final clustering of the individuals and the number of clusters are estimated by the method.

The SHIPS algorithm is implemented in R, which can be downloaded from the CRAN web page, and is divided into several functions:

ships.cluster, which constructs the tree and provides several clustering possibilities
ships.gap, which estimates the final number of clusters
ships.plotCluster, which provides a graphical representation of the clustering
ships.plotGap, which plots the criterion used to estimate the final number of clusters

SHIPS resources

Source code: Ships.r
R package: Ships.tar.gz

Documentation: Documentation.pdf

http://www.plosone.org/article/info:doi/10.1371/journal.pone.0045685

Friday, February 15, 2013

R package for IBD

1. gdsfmt and SNPRelate - Please follow this link to view the tutorial.

gdsfmt and SNPRelate are high-performance computing R packages for multi-core symmetric multiprocessing computer architectures. They are used to accelerate two key computations in GWAS: principal component analysis (PCA) and relatedness analysis using identity-by-descent (IBD) measures. The kernels of our algorithms are written in C/C++, and have been highly optimized. Benchmarks show the uniprocessor implementations of PCA and IBD are ~8 to 50 times faster than the implementations provided by the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs respectively, and can be sped up 30- to 300-fold by utilizing eight cores. SNPRelate can analyze tens of thousands of samples, with millions of SNPs.

https://www.genevastudy.org/Accomplishments/software

http://corearray.sourceforge.net/tutorials/SNPRelate/#x1-130003.3

2. CrypticIBDcheck is an R package to identify pairs of closely-related subjects based on genetic marker data from single-nucleotide polymorphisms (SNPs). The package is able to accommodate SNPs in linkage disequilibrium (LD), without the need to thin the markers so that they are approximately independent in the population. Sample pairs are identified by superposing their estimated identity-by-descent (IBD) coefficients on plots of IBD coefficients for pairs of simulated subjects from one of several common close relationships. The methods are particularly relevant to candidate-gene association studies, in which dependent SNPs cluster in a relatively small number of genes spread throughout the genome. The accommodation of LD allows the use of all available genetic data, a desirable property when working with a modest number of dependent SNPs within candidate genes.

http://cran.r-project.org/web/packages/CrypticIBDcheck/index.html

http://www.scfbm.org/content/pdf/1751-0473-8-5.pdf

GC-Biased Gene Conversion

The Role of GC-Biased Gene Conversion in Shaping the Fastest Evolving Regions of the Human Genome

GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC—in contrast to adaptive processes—may have driven the human changes in these sequences.
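A GC-biased substitution pattern means an excess of weak-to-strong changes (A or T replaced by G or C) over strong-to-weak changes. The sketch below counts both classes between two aligned sequences. It is only an illustration, not the method of the paper: the function name is hypothetical, and it assumes the ancestral state at each site has already been inferred.

```python
WEAK = set("AT")    # bases forming two hydrogen bonds
STRONG = set("GC")  # bases forming three hydrogen bonds


def count_substitution_classes(ancestral, derived):
    """Count weak-to-strong, strong-to-weak and other substitutions.

    Both sequences must be aligned and of equal length; gaps and
    ambiguous bases are skipped.
    """
    counts = {"weak_to_strong": 0, "strong_to_weak": 0, "other": 0}
    for anc, der in zip(ancestral.upper(), derived.upper()):
        if anc == der or anc not in "ACGT" or der not in "ACGT":
            continue
        if anc in WEAK and der in STRONG:
            counts["weak_to_strong"] += 1
        elif anc in STRONG and der in WEAK:
            counts["strong_to_weak"] += 1
        else:
            counts["other"] += 1  # A<->T or G<->C changes leave GC content unchanged
    return counts
```

A region where `weak_to_strong` clearly exceeds `strong_to_weak` is the kind of pattern that would point toward gBGC rather than selection alone.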

2. Phylogenetic Patterns of GC-Biased Gene Conversion in Placental Mammals and the Evolutionary Dynamics of Recombination Landscapes

Analysis of Maize Full-Length cDNAs -- a good example

Sequencing, Mapping, and Analysis of 27,455 Maize Full-Length cDNAs

Analysis

Contaminated FLcDNAs were found by comparing them against the maize, rice and Arabidopsis rRNA sequences with a BLAST e-value≤1e-50, which identified 26 rRNAs. An additional 110 FLcDNAs were identified that encoded proteins highly similar to bacteria (16 cDNAs), fungus (76 cDNAs) and vertebrate (18 cDNAs) and did not show similarity with plant proteins.

The ORFs were computed using the software GETORF in EMBOSS package [50] with parameters “-minsize 150, -find 1, -methionine, -noreverse”. TE and SSR analyses were performed using RepeatMasker (repeatmasker.org). For TE analysis, the Poaceae (grass family) TE database was downloaded from Genetic Information Research Institute (www.girinst.org) and the FLcDNAs that had masked sequence length of ≥100 bp were used for the TE insertion analysis. SSRs with length ≥20 bp and divergence ≤10% were selected for SSR location analysis. Putative transcription factors were analyzed using BLASTx with e-value≤1e-10 against rice and Arabidopsis transcription factor proteins downloaded from PlantTFDB (planttfdb.cbi.pku.edu.cn). Any maize cDNAs showing positive matches in both rice and Arabidopsis were assigned to TF families using the PlantTFDB nomenclature.
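The GETORF settings above ask for ORFs of at least 150 nucleotides, running from a start codon to a stop codon, on the forward strand only. A minimal Python sketch of that kind of search follows. It is not GETORF itself, and details such as alternative start codons or ORFs that run off the end of the sequence may be handled differently by the real program; the function name `forward_orfs` is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq, min_size=150):
    """Return (start, end, frame) for ATG-to-stop ORFs on the forward strand.

    start and end are 0-based and end-exclusive; the stop codon is not
    included. Only ORFs of at least min_size nucleotides are reported,
    and ORFs without a stop codon are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if pos - start >= min_size:
                    orfs.append((start, pos, frame))
                start = None
    return orfs
```

The slice `seq[start:end]` then gives the coding sequence of each reported ORF, ready for translation.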

Plant homolog analysis was conducted using BLASTx (e-value≤1e-10) to compare rice, sorghum, Arabidopsis and poplar protein sequences downloaded from the following sites: 67,393 rice (MSU release 6.0; rice.plantbiology.msu.edu), 35,899 sorghum (www.phytozome.net/sorghum), 32,615 Arabidopsis (TAIR v8.0; www.arabidopsis.org) and 45,555 poplar (genome.jgi-psf.org). The maize FLcDNAs that did not have a homolog were compared with the plant UniProt database [29], where another 147 rice, sorghum, Arabidopsis or poplar homologs were identified and removed. Then the 1,475 putative unique maize FLcDNAs were mapped to GO annotated maize gene models with ≥95% ID and ≥90% alignment length using BLAT. GO over- and under-representation analyses were performed using Cytoscape [51] with the BiNGO (Biological Networks Gene Ontology, [25]) plug-in, applying a hypergeometric test (p-value ≤0.05) with Benjamini and Hochberg false discovery rate (FDR) correction [52] relative to GO annotated maize gene models.
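The statistics behind this kind of GO analysis are simple enough to sketch. The code below is not BiNGO: it shows only the over-representation side of the hypergeometric test plus the Benjamini and Hochberg adjustment, and the function and argument names are chosen here for illustration. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k annotated genes in the test set.

    N -- genes in the reference set
    K -- reference genes carrying the GO term
    n -- genes in the test set
    k -- test-set genes carrying the GO term
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

One p-value is computed per GO term, the whole list is passed through `benjamini_hochberg`, and terms whose adjusted value stays at or below 0.05 are reported as over-represented.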

For annotation of all EST and FLcDNA assemblies, the unitrans were searched against the UniProt plants database (2009-06-17) using BLASTx with e-value≤1e-20. The GO [24] annotations were extracted from the UniProt file and gene association file (ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT), which were mapped to plant GO Slim [32]. Some of the results were computed by custom Perl scripts, and the rest were obtained from the website, as follows: Table 6 was copied from the “Advanced Summary/Example Queries” page. The number of UniProt matches for the 27k were from the “UniTrans Search”, where “Non-maize UniProt Match” was set to ‘yes’; for the non-putative, the “Match Description” was set to “not putative”. Table 8, Table 9, and the top of Table 10 can all be verified from the PAVE query system.

DAVID and WebGestalt for pathway analysis

For pathway analysis you can use:

DAVID (http://david.abcc.ncifcrf.gov/),

WebGestalt, the WEB-based Gene SeT AnaLysis Toolkit (http://bioinfo.vanderbilt.edu/webgestalt/)

Thursday, February 14, 2013

NGS Statistical genetics courses from Abecasis lab

http://genome.sph.umich.edu/wiki/Biostatistics_666

Biostatistics 666: Course Introduction and Hardy Weinberg Equilibrium - PDF

Biostatistics 666: Linkage Disequilibrium - PDF

Biostatistics 666: Introduction to the Coalescent - PDF

Biostatistics 666: Modeling Variation in the Coalescent - PDF

Biostatistics 666: Modeling Recombination and Migration in the Coalescent - PDF

Biostatistics 666: Maximum Likelihood Allele Frequency Estimation - PDF

Biostatistics 666: Introduction to the E-M Algorithm - PDF

Biostatistics 666: Haplotype Estimation - PDF

Biostatistics 666: Haplotype Association Tests - PDF

Biostatistics 666: Association Tests in Structured Populations - PDF

Biostatistics 666: Power of Genomewide Association Studies - PDF

Biostatistics 666: Linkage Analysis in Sibling Pairs - PDF

Biostatistics 666: Multipoint Analysis in Sibling Pairs - PDF

Biostatistics 666: Relationship Checking - PDF

Biostatistics 666: Genotype Imputation - PDF

Biostatistics 666: Whole Genome Sequencing - PDF

Biostatistics 666: Analysis of Low Pass Sequence Data - PDF

Biostatistics 666: Analysis of Copy Number Using Sequence Data - PDF

Biostatistics 666: Introduction to De Novo Assembly - PDF

Biostatistics 666: Rare Variant Burden Tests - PDF

Biostatistics 666: Variance Component Analyses - PDF

Biostatistics 666: Likelihood Calculations for Large Pedigrees - PDF

Genomic consequences of transitions from cross- to self-fertilization on the efficacy of selection

Genomic consequences of transitions from cross- to self-fertilization on the efficacy of selection in three independently derived selfing plants

Background

Transitions from cross- to self-fertilization are associated with increased genetic drift rendering weakly selected mutations effectively neutral. The effect of drift is predicted to reduce selective constraints on amino acid sequences of proteins and relax biased codon usage. We investigated patterns of nucleotide variation to assess the effect of inbreeding on the accumulation of deleterious mutations in three independently evolved selfing plants. Using high-throughput sequencing, we assembled the floral transcriptomes of four individuals of Eichhornia (Pontederiaceae); these included one outcrosser and two independently derived selfers of E. paniculata, and E. paradoxa, a selfing outgroup. The dataset included ~8000 loci totalling ~3.5 Mb of coding DNA.

Results

Tests of selection were consistent with purifying selection constraining evolution of the transcriptome. However, we found an elevation in the proportion of non-synonymous sites that were potentially deleterious in the E. paniculata selfers relative to the outcrosser. Measurements of codon usage in high versus low expression genes demonstrated reduced bias in both E. paniculata selfers.

Conclusions

Our findings are consistent with a small reduction in the efficacy of selection on protein sequences associated with transitions to selfing, and reduced selection in selfers on synonymous changes that influence codon usage.

Wednesday, February 13, 2013

Softberry programs for genomics

Softberry Programs available to academic users at no charge for occasional use in research projects

Program name	Help	Documentation	Linux	Mac/Windows (W)	Results Viewer	Title

ReadsMap	View	Download	Download	Download	Reads Mapping Viewer	Mapping of reads to chromosome (contig)
Reads Mapping Viewer	View	Download	Download	Download		The "Reads Mapping Viewer" software is developed to visualize the "ReadsMap" output data.
Assembler	View	Download	Download	Download	Assembler Viewer	Algorithm for ab initio genome assembly using data produced by next-generation sequencing machines (Illumina/Solexa/etc). To view the program output data the Assembler Viewer can be used.
Assembler Viewer	View	Download	Download	Download		The "Assembler Viewer" software is developed to visualize the "Assembler" output data.
SNP-Toolbox	View	Download	Download (4.3Gb)	Download (W) (3.8GB)		A fast and effective tool for analysis of genome variations in human chromosomes.
Promoter
CPGFinder	View	Download	Download	Download		Search for CpG islands in sequences
FPROM	View	Download	Download	Download		Human promoter prediction
Nsite	View	Download	Download	Download		Search for consensus patterns with statistical estimation.
Nsite-m	View	Download	Download	Download		Search for regulatory motifs conserved in several sequences
Pattern	View	Download	Download	Download		Search for significant patterns in the set of sequences.
Polyah	View	Download	Download	Download		Recognition of 3'-end cleavage and polyadenylation region
ScanWM-PL	View	Download	Download	Download		Search for weight matrix patterns of plant regulatory sequences
TSSG	View	Download	Download	Download		Recognition of human PolII promoter region and start of transcription
Protein Structure
3D-Comp	View	Download	Download	Download	3D-Explorer	Sequence Alignment to Superposition
3D-Match	View	Download	Download	Download	3D-Explorer	Pairwise protein structure alignment
3D-ModelFit	View	Download	Download	Download	3D-Explorer	Program for the estimation of quality of 3D model structure of protein.
Abini3D	View	Download	Download	Download	3D-Explorer	Ab initio folding
CysRec	View	Download	Download	Download		Prediction of SS-bonding states of cysteines and disulphide bridges in protein sequences
GetAtoms	View	Download	Download	Download	3D-Explorer	Optimization of replaced side chain groups by simulated annealing algorithm
MolDyn	View	Download	Download	Download	3D-Explorer	MolDyn is designed to perform multiple tasks with protein structure.
MolMech	View	Download	Download	Download	3D-Explorer	Energy minimization program by molecular mechanic
NNSSP	View	Download	Download	Download		Nearest-neighbor SS prediction
Net-SSPredict	View	Download	Download	Download		Program for secondary structure prediction. Neural nets based on profile of psiBLAST comparison of the query sequence with NR database.
PDisorder	View	Download	Download	Download		Program for predicting ordered and disordered regions in protein sequences
SSEnvID	View	Download	Download	Download		Protein secondary structure and environment assignment from atomic coordinates
SSP	View	Download	Download	Download		Prediction of α-helix and β-strand segments of globular proteins
SSPAL	View	Download	Download	Download		Prediction of protein secondary structure by using local alignments
Alignments
Protmap	View	Download	Download	Download		Mapping of a set of proteins on genome
MaliN	View	Download	Download	Download	MaliN Viewer	Multiple alignment for nucleotide sequences.
MaliP	View	Download	Download	Download	MaliP Viewer	Multiple alignment for protein sequences.
EstMap	View	Download	Download	Download		Program for mapping a whole set of mRNAs/ESTs to a chromosome sequence
Scan2	View	Download	Download	Download		Program for aligning two multimegabyte-size genome sequences using a sequential search for most significant similarity regions.
Scan2a	View	Download	Download	Download		Program for aligning two amino acid sequences using a sequential search for most significant similarity regions.
Gene Finding
Fex	View	Download	Download	Download		Prediction of internal, 5'- and 3'- exons in Human DNA sequences.
Fgenes	View	Download	Download	Download	Sequence Explorer,PDFGenes	Pattern-based human gene structure prediction (multiple genes, both chains).
Fgenes-m	View	Download	Download	Download	Sequence Explorer,PDFGenes	Pattern-based prediction of multiple variants of gene structure
Fsplice	View	Download	Download	Download		Searches for both donor and acceptor splice sites, with thresholds defined independently for each. Can also search for minor variants of the splicing donor site (GC-site).
PDFGenes	View	Download	Download	Download		PDFGenes takes the results of gene-finding software, such as FGenesh, FGenesh+, FGenesh-C, FGenesh-2, FGenes, FGenes-m and BestORF, and renders them in PDF format for easier viewing.
PSF	View	Download	Download	Download		Finding pseudogenes in a genomic sequence.
Rnaspl	View	Download	Download	Download		Program for predicting exon-exon junction positions in cDNA sequences. Requires a locally installed gcc Fortran library.
SPL	View	Download	Download	Download		Prediction of splice sites in DNA sequences. Requires a locally installed gcc Fortran library.
SplM	View	Download	Download	Download		Prediction of splice sites in Human DNA sequences.
Bacterial Gene Finding
ABSplit	View	Download	Download	Download	Sequence Explorer	Determines whether a nucleotide sequence of approx. 300-600 bp belongs to an archaeal or a bacterial genome.
Bprom	View	Download	Download	Download		Prediction of bacterial promoters
FindTerm	View	Download	Download	Download	RNA Secondary Structure Viewer	Searches for bacterial terminators in DNA sequences.
RNA Structure
Bestpal-E	View	Download	Download	Download		Calculates the best palindrome for a given RNA sequence, and also a set of suboptimal palindromes (sorted by energy)
Bestpal-H	View	Download	Download	Download	RNA Secondary Structure Viewer	Calculates best palindrome for given RNA sequence with restrictions.
Bestpal-W	View	Download	Download	Download	RNA Secondary Structure Viewer	Program for searching best "linear" RNA secondary structure for long sequences with a window moving along the sequence.
Find-miRNA	View	Download	Download	Download		Searches for pre-miRNAs in a given RNA sequence and for miRNA in each pre-miRNA found.
FoldRNA	View	Download	Download	Download	RNA Secondary Structure Viewer	Program for RNA secondary structure prediction based on dynamic programming (Nussinov and Jacobson, 1978; Zuker, 2005).
Protein Location
PSite	View	Download	Download	Download		Search for PROSITE patterns with statistical estimation.
CTL-Epitope	View	Download	Download	Download		Predicts CTL epitopes of length 9 in protein sequences.
Protcomp-AN	View	Download	Download	Download		Program for identifying the subcellular localization of eukaryotic proteins: animal/fungi.
Protcomp-B	View	Download	Download	Download		Program for identifying the subcellular localization of bacterial proteins.
Protcomp-PL	View	Download	Download	Download		Program for identifying the subcellular localization of eukaryotic proteins: plants.
Repeats
LCRep	View	Download	Download	Download	Sequence Explorer	Program for mapping low complexity regions in nucleotide sequences.
LCRep-P	View	Download	Download	Download	Sequence Explorer	Program for mapping low complexity regions in protein sequences.
TandemRep	View	Download	Download	Download		Program for mapping tandem repeat regions in nucleotide sequences.
TandemRep-P	View	Download	Download	Download		Program for mapping tandem repeat regions in protein sequences.
SelTag
BdClust	View	Download	Download	Download		Clustering of gene expression profiles or samples by Ben-Dor algorithm.
CHPImport	View	Download	Download	Download		Import expression data from the Affymetrix CHP format to SelTag data file.
FieldCorr	View	Download	Download	Download	Graph Viewer	The program calculates correlation coefficients between the gene expression values in experiments (fields).
GeneCorr	View	Download	Download	Download	Graph Viewer	The program calculates correlation coefficients between the gene expression profiles.
HClust	View	Download	Download	Download		Clustering of gene expression profiles by hierarchical algorithm.
Mas5Norm	View	Download	Download	Download		Normalization of Affymetrix raw gene expression data by the MAS 5.0 algorithm.
Mas5Baseline	View	Download	Download	Download		Comparison of Affymetrix raw gene expression data to the baseline data by the MAS 5.0 algorithm.
SelByExpr	View	Download	Download	Download		Gene selection by query (logical expression).
SelCorr	View	Download	Download	Download	Graph Viewer	The program selects most correlated genes for specified gene set.
SOMClust	View	Download	Download	Download		The program clusters gene expression profiles or samples by SOM (Self-Organizing Map) algorithm.
Statistics
SNNBP-Learn	View	Download	Download	Download		SNNBP-Learn implements the back-propagation training algorithm and outputs the optimal NN structure, saved in the SNNBP internal format.
SNNBP-Predict	View	Download	Download	Download		SNNBP-Predict uses the neural network to calculate output values (predictions) from the input values in the data file (target values need not be specified in this option).
SNNBP-Test	View	Download	Download	Download		SNNBP-Test tests a previously obtained network on user data.
Seqman
Restrictase	View	Download	Download	Download	Restrictase Viewer	Program for finding and displaying the positions of the cut sites of restriction enzyme recognition sequences.
Viewers
3D-Explorer	View	Download	Download	Download		3D Explorer is designed to visualize spatial models of biological macromolecules and their complexes. Application is compatible with PDB files.
Assembler Viewer	View	Download	Download	Download		The "Assembler Viewer" software is developed to visualize the "Assembler" output data.
DotPlot viewer	View	Download	Download	Download		DotPlot viewer visualizes the sites of homology between sets of sequences.
Graph	View	Download	Download	Download		The application is made for visualization of statistics files.
MaliN Viewer	View		Download	Download		The "MaliN viewer" application is developed for aligning nucleotide sequences and working with results of such alignments.
MaliP Viewer	View		Download	Download		The "MaliP viewer" application is developed for aligning protein sequences and working with results of such alignments.
Reads Mapping Viewer	View	Download	Download	Download		The "Reads Mapping Viewer" software is developed to visualize the "ReadsMap" output data.
Restrictase Viewer	View	Download	Download	Download		The "Restrictase viewer" application is intended for visualizing restriction sites found with the "Restrictase" program.
RNA Secondary Structure Viewer	View	Download	Download	Download		RNA Secondary Structure Viewer is made for visualization and manipulation of RNA secondary structure.
Sequence Explorer	View	Download	Download	Download		Sequence Explorer provides the visual representation of genome annotations.
