SNPMeta is a Python and BioPython-based tool to generate "metadata" for single nucleotide polymorphisms (SNPs) for easy filtering, or submission to SNP databases. Information reported includes gene name, whether the SNP is coding or noncoding, and whether the SNP is synonymous or nonsynonymous. SNPMeta outputs in either a dbSNP submission report format, or a tab-delimited format.

Companion Scripts
These helper scripts are provided to help with running SNPMeta, though they may also be useful outside that context.
Blast_SNPs.sh - A shell script to run BLAST on SNPs, and save the reports as XML. Requires an installation of NCBI's BLAST executables, and a Bash shell. Edit the script in a text editor so the variables match your system. Requires a directory with FASTA files, with one sequence per file. This script will create a new file for each FASTA in the directory, ending in '.xml', containing the BLAST report.
Convert_Illumina.py - A Python script to convert from the Illumina contextual sequence format to FASTA, for input to SNPMeta. Accepts a text file with two fields, separated by a tab: the SNP Name, and the SNP contextual sequence. Outputs a FASTA file with IUPAC ambiguities to stdout.
GBSContextualSeq.py - A Python script to build SNP contextual sequences from a reference sequence and a VCF file. Generates a separate FASTA file for each sample listed in the VCF file. This is useful for generating contextual sequences from genotyping-by-sequencing (GBS) data, as the SNPs will be stored as a VCF. Requires BioPython. Also requires Argparse if using Python < 2.7.
Split_FASTA.py - A Python script to split a large FASTA file into smaller files. Takes a FASTA file and a positive integer as arguments. Requires BioPython.
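
The conversion that Convert_Illumina.py is described as doing can be sketched as follows. This is a minimal illustration and not the script's actual code: it assumes the Illumina contextual sequence marks a biallelic SNP as [A/G] inside the flanking sequence, and the function names are hypothetical.

```python
import re

# IUPAC ambiguity codes for biallelic SNPs, keyed by the sorted allele pair.
IUPAC = {"AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M"}

# Assumed input notation: the SNP appears as [X/Y] within the flanking sequence.
SNP_PATTERN = re.compile(r"\[([ACGT])/([ACGT])\]", re.IGNORECASE)


def illumina_to_iupac(context_seq):
    """Replace each [X/Y] SNP with the matching IUPAC ambiguity code."""
    def replace(match):
        first, second = (allele.upper() for allele in match.groups())
        if first == second:
            # Not a real polymorphism; keep the single base.
            return first
        return IUPAC["".join(sorted((first, second)))]
    return SNP_PATTERN.sub(replace, context_seq)


def convert_lines(lines):
    """Turn tab-delimited 'name<TAB>sequence' lines into FASTA text."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, seq = line.split("\t")[:2]
        records.append(">%s\n%s" % (name, illumina_to_iupac(seq)))
    return "\n".join(records)


if __name__ == "__main__":
    print(convert_lines(["snp1\tACGT[A/G]TTCA"]))
```

Indels and multi-allelic sites are not handled in this sketch; they would need their own rules.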



1. Structure-Pipeline: this is a really good tool.

Users of Structure (Pritchard et al., 2000) may be familiar with the interface of the BioHPC cluster at Cornell. Unfortunately, guest access was discontinued in May 2011. If you have Structure installed on your SGE supercomputing cluster, several features of the web-based BioHPC cluster interface can be replaced with a pipeline of qsub and Python scripts. This pipeline will guide you through setting up your datafile and parameter settings, running Structure efficiently at many values of K, summarizing those results using CLUMPP, and visualizing the results using custom R scripts.

2. Structure Harvester

3. a simple guide


test for local adaptation and to analyze the performance of hybrids relative to native parental plants

tool and a reference on transplantation data analysis.

1. Geyer, C. J., S. Wagenius, and R. G. Shaw. 2007. Aster models for life history analysis. Biometrika 94:415-426.


willislab -- on Mimulus speciation and genetics


genomics of ecological speciation - some cases

1. http://onlinelibrary.wiley.com/doi/10.1111/j.1461-0248.2010.01546.x/full
A guide to the genomics of ecological speciation in natural animal populations

Interest in ecological speciation is growing, as evidence accumulates showing that natural selection can lead to rapid divergence between subpopulations. However, whether and how ecological divergence can lead to the buildup of reproductive isolation remains under debate. What is the relative importance of natural selection vs. neutral processes? How does adaptation generate reproductive isolation? Can ecological speciation occur despite homogenizing gene flow? These questions can be addressed using genomic approaches, and with the rapid development of genomic technology, will become more answerable in studies of wild populations than ever before. In this article, we identify open questions in ecological speciation theory and suggest useful genomic methods for addressing these questions in natural animal populations. We aim to provide a practical guide for ecologists interested in incorporating genomic methods into their research programs. An increased integration between ecological research and genomics has the potential to shed novel light on the origin of species.

2. http://www.sciencedirect.com/science/article/pii/S0169534712001863

What is needed for next-generation ecological and evolutionary genomics?

Ecological and evolutionary genomics (EEG) aims to link gene functions and genomic features to phenotypes and ecological factors. Although the rapid development of technologies allows central questions to be addressed at an unprecedented level of molecular detail, they do not alleviate one of the major challenges of EEG, which is that a large fraction of genes remains without any annotation. Here, we propose two solutions to this challenge. The first solution is in the form of a database that regroups associations between genes, organismal attributes and abiotic and biotic conditions. This database would result in an ecological annotation of genes by allowing cross-referencing across studies and taxa. Our second solution is to use new functional techniques to characterize genes implicated in the response to ecological challenges.

Divergent selection and heterogeneous genomic divergence

Levels of genetic differentiation between populations can be highly variable across the genome, with divergent selection contributing to such heterogeneous genomic divergence. For example, loci under divergent selection and those tightly physically linked to them may exhibit stronger differentiation than neutral regions with weak or no linkage to such loci. Divergent selection can also increase genome-wide neutral differentiation by reducing gene flow (e.g. by causing ecological speciation), thus promoting divergence via the stochastic effects of genetic drift. These consequences of divergent selection are being reported in recently accumulating studies that identify: (i) ‘outlier loci’ with higher levels of divergence than expected under neutrality, and (ii) a positive association between the degree of adaptive phenotypic divergence and levels of molecular genetic differentiation across population pairs [‘isolation by adaptation’ (IBA)]. The latter pattern arises because as adaptive divergence increases, gene flow is reduced (thereby promoting drift) and genetic hitchhiking increased. Here, we review and integrate these previously disconnected concepts and literatures. We find that studies generally report 5–10% of loci to be outliers. These selected regions were often dispersed across the genome, commonly exhibited replicated divergence across different population pairs, and could sometimes be associated with specific ecological variables. IBA was not infrequently observed, even at neutral loci putatively unlinked to those under divergent selection. Overall, we conclude that divergent selection makes diverse contributions to heterogeneous genomic divergence. Nonetheless, the number, size, and distribution of genomic regions affected by selection varied substantially among studies, leading us to discuss the potential role of divergent selection in the growth of regions of differentiation (i.e. genomic islands of divergence), a topic in need of future investigation.


MultiGeneBlast: Combined BLAST searches for operons and gene clusters

MultiGeneBlast is an open source tool for identification of homologs of multigene modules such as operons and gene clusters. It is based on a reformatting of the FASTA headers of NCBI GenBank protein entries, using which it can track down their source nucleotide and coordinates.

Oftentimes when studying such genetic loci, much can be learned from their evolutionary context. Furthermore, MultiGeneBlast can aid in the detection of such multigene parts for synthetic biology projects; a synthetic library of operons can be created based on its output to identify those operons whose function is closest to the one desired by the user.
The tool makes it possible to identify all homologous genomic regions by combining the results of single BlastP runs on each gene, and sorting genomic regions from any GenBank entry by the number of hits, synteny conservation and cumulative Blast bit score. The basic algorithm behind this was previously used in our antiSMASH software.
Additionally, architecture searches can be performed to find any genomic regions with Blast hits to any user-specified combination of amino acid sequences.
The tool comes with a pre-configured database containing the most recent version of all relevant GenBank divisions. Moreover, you can easily make your own databases from local files or online GenBank entries or divisions.


bigcor: Large correlation matrices in R


It has been shown that by calculating the Pearson correlation between genes, one can identify (by high ρ values, i.e. > 0.9) genes that share a common regulation mechanism, such as being induced or repressed by the same transcription factors.
I had an idea: how about taking my microarray data on the expression of 40000 genes in 28 samples and calculating the correlation between all 40000 genes (variables)?
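
A full 40000 x 40000 correlation matrix has 1.6 billion entries, about 12.8 GB as 8-byte doubles, which is why the calculation has to be split into blocks. The sketch below illustrates that block-wise idea in plain Python. It is not the bigcor code, and the function names, block size and threshold are assumptions chosen for illustration. Only gene pairs above the threshold are kept, so the full matrix is never held in memory.

```python
import math


def standardize(row):
    """Center a row and scale it to unit length, so that the dot product
    of two standardized rows equals their Pearson correlation."""
    n = len(row)
    mean = sum(row) / n
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant row: correlation is undefined")
    return [c / norm for c in centered]


def high_correlation_pairs(expression, threshold=0.9, block_size=1000):
    """Return (i, j, r) for every gene pair i < j with Pearson r > threshold.

    expression: list of rows, one row of sample values per gene.
    Pairs are visited one block of genes against another block at a time.
    """
    z = [standardize(row) for row in expression]
    n = len(z)
    pairs = []
    for i0 in range(0, n, block_size):
        for j0 in range(i0, n, block_size):
            # One (row block, column block) tile of the upper triangle.
            for i in range(i0, min(i0 + block_size, n)):
                for j in range(max(j0, i + 1), min(j0 + block_size, n)):
                    r = sum(a * b for a, b in zip(z[i], z[j]))
                    if r > threshold:
                        pairs.append((i, j, r))
    return pairs


if __name__ == "__main__":
    toy = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
    print(high_correlation_pairs(toy, threshold=0.9, block_size=2))
```

Pure Python loops would be far too slow for 40000 genes; in practice each tile would be computed as a matrix product with a numerical library, but the tiling logic stays the same.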

Inferring Population Histories Using Genome-Wide Allele Frequency Data

The recent development of high-throughput genotyping technologies has revolutionized the collection of data in a wide range of both model and nonmodel species. These data generally contain huge amounts of information about the demographic history of populations. In this study, we introduce a new method to estimate divergence times on a diffusion time scale from large single-nucleotide polymorphism (SNP) data sets, conditionally on a population history that is represented as a tree. We further assume that all the observed polymorphisms originate from the most ancestral (root) population; that is, we neglect mutations that occur after the split of the most ancestral population. This method relies on a hierarchical Bayesian model, based on Kimura’s time-dependent diffusion approximation of genetic drift. We implemented a Metropolis–Hastings within Gibbs sampler to estimate the posterior distribution of the parameters of interest in this model, which we refer to as the Kimura model. Evaluating the Kimura model on simulated population histories, we found that it provides accurate estimates of divergence time. Assessing model fit using the deviance information criterion (DIC) proved efficient for retrieving the correct tree topology among a set of competing histories. We show that this procedure is robust to low-to-moderate gene flow, as well as to ascertainment bias, providing that the most distantly related populations are represented in the discovery panel. As an illustrative example, we finally analyzed published human data consisting in genotypes for 452,198 SNPs from individuals belonging to four populations worldwide. Our results suggest that the Kimura model may be helpful to characterize the demographic history of differentiated populations, using genome-wide allele frequency data.


modeler4simcoal2 (m4s2) - a modeler for coalescent processes

modeler4simcoal2 (m4s2) is a modeler for coalescent processes. It allows the modeling of both demographies and chromosomes (i.e., markers with linkage relationships in multiple chromosome blocks).

m4s2 generates files for usage with Simcoal2 which can easily be analyzed with Arlequin3. m4s2 can be run standalone or can directly call and control Simcoal2. Arlequin3 can also be called after the simulations are run.

m4s2 is a Java Web Start application (requiring Java 1.4, available for Windows, Mac and Linux among others). It requires no installation and can be run directly from the web. m4s2 can be run on more platforms than those supported by Simcoal2 and Arlequin3 (in this case only in standalone mode).

The purpose of m4s2 is to allow biologists to concentrate more on biology and the underlying models used in analysis (and less on having to learn a new computer simulation tool). We expect that m4s2 will lower the barrier for coalescent simulator use.

m4s2 has full expressive power with regards to chromosome modeling (i.e., it can model all that Simcoal2 supports).

Regarding demographies, m4s2 includes a set of models which cover the vast majority found in the literature (e.g., island, stepping-stone). An extension system is also provided, allowing for the creation of new models. A simple extension language is provided; if that language is not enough, the full expressive power of Python (Jython) can be used to create new models. New models can be made available online, as m4s2 can import those directly from the web. We make available an external model on the expansion of humans and domesticated species after the Neolithic as hierarchically structured.

Before using m4s2 we recommend reading the user's guide. At least the first few lines... You can run m4s2 directly from here.


Evolutionary Genomics

Evolutionary Genomics: Statistical and Computational Methods, Volume 2

Evolutionary Biology for the 21st Century


Evolutionary Processes That Shape Genomic and Phenotypic Variation

The availability of genomic data from a remarkable range of species has allowed the alignment and comparison of whole genomes. These comparative approaches have been used to characterize the relative importance of fundamental evolutionary processes that cause genomic evolution and to identify particular regions of the genome that have experienced recent positive selection, recurrent adaptive evolution, or extreme sequence conservation [72]–[75]. Yet more recently, resequencing of additional individuals or populations is also allowing genome-wide population genetic analyses within species [76]–[82]. Such population-level comparisons will allow even more powerful study of the relative importance of particular evolutionary processes in molecular evolution as well as the identification of candidate genomic regions that are responsible for key evolutionary changes (e.g., sticklebacks [83], butterflies [84], Arabidopsis [85]). These data, combined with theoretical advances, should provide insight into long-standing questions such as the prevalence of balancing selection, the relative frequency of strong versus weak directional selection, the role of hybridization, and the importance of genetic drift. A key challenge will be to move beyond documenting the action of natural selection on the genome to understanding the importance of particular selective agents. For example, what proportion of selection on genomes results from adaptation to the abiotic environment, coevolution of species, sexual selection, or genetic conflict? Finally, as sequencing costs continue to drop and analytical tools improve, these same approaches may be applied to organisms that present intriguing evolutionary questions but were not tractable methodologically just a few years ago. The nonmodel systems of today may well become the model systems of tomorrow [86].

Understanding Biological Diversification

A major and urgent challenge is to improve knowledge of the identity and distribution of species globally. While we need to retain the traditional focus on phenotypes, powerful new capabilities to obtain and interpret both genomic and spatial data can and should revolutionize the science of biodiversity. Building on momentum from single-locus “barcoding” efforts, new genome-level data can build bridges from population biology to systematics [91]. By establishing a comprehensive and robust “Tree of Life,” we will improve understanding of both the distribution of diversity and the nature and timing of the evolutionary processes that have shaped it.

pandas - a Python package for working with DataFrames

1. http://blog.yhathq.com/posts/R-and-pandas-and-what-ive-learned-about-each.html

pandas is the utility belt for data analysts using Python. The package centers around the pandas DataFrame, a two-dimensional data structure with indexable rows and columns. It has effectively taken the best parts of Base R and R packages like plyr and reshape2 and consolidated them into a single library. It has lots of features (see library highlights). pandas gets its name from panel data, an econometrics term for multidimensional structured datasets (McKinney 5., 2013).

2. http://pandas.pydata.org/pandas-docs/stable/index.html

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
pandas is well suited for many different kinds of data:
  • Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
  • Ordered and unordered (not necessarily fixed-frequency) time series data.
  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
  • Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure


SHIPS - a non-parametric clustering algorithm



SHIPS (Spectral Hierarchical clustering for the Inference of Population Structure) is a non-parametric clustering algorithm that clusters individuals from a population into genetically homogeneous sub-populations from genotype data. After computing a pairwise distance matrix, the algorithm progressively divides the original population into two sub-populations by the use of a spectral clustering algorithm. The process is then iterated in each of the two sub-populations and so on. This leads to the construction of a binary tree, where each node represents a group of individuals. To determine the final clusters, a tree pruning procedure and an estimation of the optimal number of clusters (a gap statistic) are applied. In such an approach both the final clustering of the individuals and the number of clusters are estimated by the method.
The SHIPS algorithm is implemented in R, which can be downloaded from the CRAN web page, and is divided into several functions:
  • ships.cluster, which constructs the tree and provides several clustering possibilities
  • ships.gap, which estimates the final number of clusters
  • ships.plotCluster, which provides a graphical representation of the clustering
  • ships.plotGap, which plots the criterion used to estimate the final number of clusters

SHIPS resources


R package for IBD

1. gdsfmt and SNPRelate - Please follow this link to view the tutorial.
    gdsfmt and SNPRelate are high-performance computing R packages for multi-core symmetric multiprocessing computer architectures. They are used to accelerate two key computations in GWAS: principal component analysis (PCA) and relatedness analysis using identity-by-descent (IBD) measures. The kernels of our algorithms are written in C/C++, and have been highly optimized. Benchmarks show the uniprocessor implementations of PCA and IBD are ~8 to 50 times faster than the implementations provided by the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs respectively, and can be sped up 30- to 300-fold by utilizing eight cores. SNPRelate can analyze tens of thousands of samples, with millions of SNPs.

2. CrypticIBDcheck
An R package to identify pairs of closely related subjects based on genetic marker data from single-nucleotide polymorphisms (SNPs). The package is able to accommodate SNPs in linkage disequilibrium (LD), without the need to thin the markers so that they are approximately independent in the population. Sample pairs are identified by superposing their estimated identity-by-descent (IBD) coefficients on plots of IBD coefficients for pairs of simulated subjects from one of several common close relationships. The methods are particularly relevant to candidate-gene association studies, in which dependent SNPs cluster in a relatively small number of genes spread throughout the genome. The accommodation of LD allows the use of all available genetic data, a desirable property when working with a modest number of dependent SNPs within candidate genes.

GC-Biased Gene Conversion


The Role of GC-Biased Gene Conversion in Shaping the Fastest Evolving Regions of the Human Genome

GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC—in contrast to adaptive processes—may have driven the human changes in these sequences.


Phylogenetic Patterns of GC-Biased Gene Conversion in Placental Mammals and the Evolutionary Dynamics of Recombination Landscapes

Analysis of Maize Full-Length cDNAs -- a good example

Sequencing, Mapping, and Analysis of 27,455 Maize Full-Length cDNAs


Contaminated FLcDNAs were found by comparing them against the maize, rice and Arabidopsis rRNA sequences with a BLAST e-value≤1e-50, which identified 26 rRNAs. An additional 110 FLcDNAs were identified that encoded proteins highly similar to bacteria (16 cDNAs), fungus (76 cDNAs) and vertebrate (18 cDNAs) and did not show similarity with plant proteins.
The ORFs were computed using the software GETORF in EMBOSS package [50] with parameters “-minsize 150, -find 1, -methionine, -noreverse”. TE and SSR analyses were performed using RepeatMasker (repeatmasker.org). For TE analysis, the Poaceae (grass family) TE database was downloaded from Genetic Information Research Institute (www.girinst.org) and the FLcDNAs that had masked sequence length of ≥100 bp were used for the TE insertion analysis. SSRs with length ≥20 bp and divergence ≤10% were selected for SSR location analysis. Putative transcription factors were analyzed using BLASTx with e-value≤1e-10 against rice and Arabidopsis transcription factor proteins downloaded from PlantTFDB (planttfdb.cbi.pku.edu.cn). Any maize cDNAs showing positive matches in both rice and Arabidopsis were assigned to TF families using the PlantTFDB nomenclature.
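
The ORF step can be pictured with a small forward-strand ORF finder. This is a rough sketch and not EMBOSS GETORF itself: it assumes the quoted parameters mean ORFs that run from an ATG start codon to a stop codon, forward strand only, with at least 150 nucleotides. It measures the ORF without its stop codon and ignores ORFs that run off the end of the sequence, details on which GETORF may behave differently.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_forward_orfs(seq, min_size=150):
    """Return (start, end) pairs for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based, end-exclusive, and exclude the stop codon.
    min_size is the minimum ORF length in nucleotides (assumed meaning).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                # First start codon since the previous stop in this frame.
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if pos - start >= min_size:
                    orfs.append((start, pos))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 50 + "TAA" + "GG"
    print(find_forward_orfs(toy))
```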
Plant homolog analysis was conducted using BLASTx (e-value≤1e-10) to compare rice, sorghum, Arabidopsis and poplar protein sequences downloaded from the following sites: 67,393 rice (MSU release 6.0; rice.plantbiology.msu.edu), 35,899 sorghum (www.phytozome.net/sorghum), 32,615 Arabidopsis (TAIR v8.0; www.arabidopsis.org) and 45,555 poplar (genome.jgi-psf.org). The maize FLcDNAs that did not have a homolog were compared with the plant UniProt database [29], where another 147 rice, sorghum, Arabidopsis or poplar homologs were identified and removed. Then the 1,475 putative unique maize FLcDNAs were mapped to GO annotated maize gene models with ≥95% ID and ≥90% alignment length using BLAT. GO over- and under- representation analysis were performed using Cytoscape [51] with BiNGO (Biological Networks Gene Ontology, [25]) plug-in and activating a hypergeometric distribution statistical test (p-value ≤0.05) with Benjamini and Hochberg false discovery rate (FDR) correction [52] relative to GO annotated maize gene models.
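
The statistics behind the GO over-representation step can be sketched as below: a one-sided hypergeometric test per GO term, followed by Benjamini-Hochberg adjustment of the resulting p-values. This is an illustrative sketch and not BiNGO's implementation; the function and argument names are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the reference set, K: reference genes carrying the GO term,
    n: genes in the test set, k: test-set genes carrying the GO term.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

A term would then be called over-represented when its adjusted p-value is at or below the chosen cutoff (0.05 in the text above).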
For annotation of all EST and FLcDNA assemblies, the unitrans were searched against the UniProt plants database (2009-06-17) using BLASTx with e-value≤1e-20. The GO [24] annotations were extracted from the UniProt file and gene association file (ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT), which were mapped to plant GO Slim [32]. Some of the results were computed by custom Perl scripts, and the rest were obtained from the website, as follows: Table 6 was copied from the “Advanced Summary/Example Queries” page. The number of UniProt matches for the 27k were from the “UniTrans Search”, where “Non-maize UniProt Match” was set to ‘yes’; for the non-putative, the “Match Description” was set to “not putative”. Table 8, Table 9, and the top of Table 10 can all be verified from the PAVE query system.

DAVID and WebGestalt for pathway analysis

For pathway analysis you can use:

DAVID (http://david.abcc.ncifcrf.gov/)

WebGestalt, the Gene Set Analysis Toolkit (http://bioinfo.vanderbilt.edu/webgestalt/)


NGS Statistical genetics courses from Abecasis lab


Genomic consequences of transitions from cross- to self-fertilization on the efficacy of selection

Genomic consequences of transitions from cross- to self-fertilization on the efficacy of selection in three independently derived selfing plants


Transitions from cross- to self-fertilization are associated with increased genetic drift rendering weakly selected mutations effectively neutral. The effect of drift is predicted to reduce selective constraints on amino acid sequences of proteins and relax biased codon usage. We investigated patterns of nucleotide variation to assess the effect of inbreeding on the accumulation of deleterious mutations in three independently evolved selfing plants. Using high-throughput sequencing, we assembled the floral transcriptomes of four individuals of Eichhornia (Pontederiaceae); these included one outcrosser and two independently derived selfers of E. paniculata, and E. paradoxa, a selfing outgroup. The dataset included ~8000 loci totalling ~3.5 Mb of coding DNA.


Tests of selection were consistent with purifying selection constraining evolution of the transcriptome. However, we found an elevation in the proportion of non-synonymous sites that were potentially deleterious in the E. paniculata selfers relative to the outcrosser. Measurements of codon usage in high versus low expression genes demonstrated reduced bias in both E. paniculata selfers.


Our findings are consistent with a small reduction in the efficacy of selection on protein sequences associated with transitions to selfing, and reduced selection in selfers on synonymous changes that influence codon usage.


Softberry programs for genomics

Softberry Programs available to academic users at no charge for occasional use in research projects

Program nameHelpDocumentationLinuxMac/Windows (W)Results ViewerTitle
ReadsMapViewDownloadDownloadDownloadReads Mapping ViewerMapping of reads to chromosome (contig)
Reads Mapping ViewerViewDownloadDownloadDownloadThe "Reads Mapping Viewer" software is developed to visualize the "ReadsMap" output data.
AssemblerViewDownloadDownloadDownloadAssember ViewerAlgorithm of ab initio genome assembling using data produced by next-generation sequencing machines (Illumina/Solexa/etc). To view the program output data the Assembler Viewer can be used.
Assembler ViewerViewDownloadDownloadDownloadThe "Assembler Viewer" software is developed to visualize the "Assembler" output data.
SNP-ToolboxViewDownloadDownload (4.3Gb)Download  (W) (3.8GB)A fast and effective tool for analysis of genome variations in human chromosomes.
CPGFinderViewDownloadDownloadDownloadSearch for CpG islands in sequences
FPROMViewDownloadDownloadDownloadHuman promoter prediction
NsiteViewDownloadDownloadDownloadSearch for of consensus patterns with statistical estimation.
Nsite-mViewDownloadDownloadDownloadSearch for regulatory motifs conserved in several sequences
PatternViewDownloadDownloadDownloadSearch for significant patterns in the set of sequences.
PolyahViewDownloadDownloadDownloadRecognition of 3'-end cleavage and polyadenilation region
ScanWM-PLViewDownloadDownloadDownloadSearch for weight matrix patterns of plant regulatory sequences
TSSGViewDownloadDownloadDownloadRecognition of human PolII promoter region and start of transcription
Protein Structure
3D-CompViewDownloadDownloadDownload3D-ExplorerSequence Alignment to Superposition
3D-MatchViewDownloadDownloadDownload3D-ExplorerPairwise protein structure alignment
3D-ModelFitViewDownloadDownloadDownload3D-ExplorerProgram for the estimation of quality of 3D model structure of protein.
Abini3DViewDownloadDownloadDownload3D-ExplorerAb inition folding
CysRecViewDownloadDownloadDownloadPrediction of SS-bonding states of cysteines and disulphide bridges in protein sequences
GetAtomsViewDownloadDownloadDownload3D-ExplorerOptimization of replaced side chain groups by simulated annealing algorithm
MolDynViewDownloadDownloadDownload3D-ExplorerMolDyn is designed to perform multiple tasks with protein structure.
MolMechViewDownloadDownloadDownload3D-ExplorerEnergy minimization program by molecular mechanic
NNSSPViewDownloadDownloadDownloadNearest-neighbor SS prediction
Net-SSPredictViewDownloadDownloadDownloadProgram for secondary structure prediction. Neural nets based on profile of psiBLAST comparison of the query sequence with NR database.
PDisorderViewDownloadDownloadDownloadProgram for predicting ordered and disordered regions in protein sequences
SSEnvIDViewDownloadDownloadDownloadProtein secondary structure and environment assignment from atomic coordinates
SSPViewDownloadDownloadDownloadPrediction of a-helix and b-strand segments of globular proteins
SSPALViewDownloadDownloadDownloadPrediction of protein secondary sturcture by using local alignments
ProtmapViewDownloadDownloadDownloadMapping of a set of proteins on genome
MaliNViewDownloadDownloadDownloadMaliN ViewerMultiple alignment for nucleotide sequences.
MaliPViewDownloadDownloadDownloadMaliP ViewerMultiple alignment for protein sequences.
EstMapViewDownloadDownloadDownloadProgram for mapping a whole set of mRNAs/ESTs to a chromosome sequence
Scan2ViewDownloadDownloadDownloadProgram for aligning two multimegabyte-size genome sequences using a sequential search for most significant similarity regions.
Scan2aViewDownloadDownloadDownloadProgram for aligning two aminoacid sequences using a sequential search for most significant similarity regions.
Gene Finding
FexViewDownloadDownloadDownloadPrediction of internal, 5'- and 3'- exons in Human DNA sequences.
FgenesViewDownloadDownloadDownloadSequence Explorer,PDFGenesPattern based human gene structure prediction (multiple genes, both chains).
Fgenes-mViewDownloadDownloadDownloadSequence Explorer,PDFGenesPattern-based prediction of multiple variants of gene structure
FspliceViewDownloadDownloadDownloadProgram provides the possibility to search for both donor and acceptor sites, and to define thresholds for them independently. Program allows to search minor variants of splicing donor site (GC-site) as well.
PDFGenesViewDownloadDownloadDownloadPDFGenes utilizes the results of Gene Finding software, such as FGeneshFGenesh+,FGenesh-CFGenesh-2FGenesFGenes-mand BestORF, and represents them in PDF format for better viewability.
PSFViewDownloadDownloadDownloadFinding pseudogenes in a genomic sequence.
RnasplViewDownloadDownloadDownloadProgram for predicting exon-exon junction positions in cDNA sequences. For this program gcc fortran library should be installed locally.
SPLViewDownloadDownloadDownloadPrediction of splice sites in DNA sequences. For this program gcc fortran library should be installed locally.
SplMViewDownloadDownloadDownloadPrediction of splice sites in Human DNA sequences.
Bacterial Gene Finding
ABSplitViewDownloadDownloadDownloadSequence ExplorerProgram determines for the nucleotide sequence of approx. 300-600 n.p. whether it belongs to archeal or bacterial genome.
BpromViewDownloadDownloadDownloadPrediction of bacterial promoters
FindTermViewDownloadDownloadDownloadRNA Secondary Structure ViewerSearchs for bacterial terminators in DNA sequences.
RNA Structure
Bestpal-E - Calculates the best palindrome for a given RNA sequence, along with a set of suboptimal palindromes sorted by energy.
Bestpal-H - Calculates the best palindrome for a given RNA sequence with restrictions. Viewer: RNA Secondary Structure Viewer.
Bestpal-W - Searches for the best "linear" RNA secondary structure in long sequences, using a window that moves along the sequence. Viewer: RNA Secondary Structure Viewer.
Find-miRNA - Searches for pre-miRNAs in a given RNA sequence, and for miRNA within each pre-miRNA found.
FoldRNA - Predicts RNA secondary structure using dynamic programming (Nussinov and Jacobson, 1978, Zuker, 2005). Viewer: RNA Secondary Structure Viewer.
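As an illustration of the dynamic-programming idea behind structure prediction tools such as FoldRNA, here is a minimal Python sketch of the classic Nussinov-style recurrence, which maximizes the number of nested base pairs. It is not FoldRNA's actual implementation, which is not described here. The function name `nussinov_pairs`, the `min_loop` parameter, and the decision to allow G-U wobble pairs are all assumptions made for the example.

```python
def nussinov_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    Hypothetical helper for illustration only. Assumes Watson-Crick plus
    G-U wobble pairs, and at least `min_loop` unpaired bases inside a hairpin.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k, splitting the problem
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_pairs("GGGAAAUCC"))
```

Counting pairs is a simplification: energy-based methods in the style of Zuker score stacking and loop contributions instead, but fill the table in the same bottom-up order.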
Protein Location
PSite - Searches for PROSITE patterns with statistical estimation.
CTL-Epitope - Predicts CTL epitopes of length 9 in protein sequences.
Protcomp-AN - Identifies the sub-cellular localization of eukaryotic proteins: animal/fungi.
Protcomp-B - Identifies the sub-cellular localization of bacterial proteins.
Protcomp-PL - Identifies the sub-cellular localization of eukaryotic proteins: plants.
LCRep - Maps low complexity regions in nucleotide sequences. Viewer: Sequence Explorer.
LCRep-P - Maps low complexity regions in protein sequences. Viewer: Sequence Explorer.
TandemRep - Maps tandem repeat regions in nucleotide sequences.
TandemRep-P - Maps tandem repeat regions in protein sequences.
BdClust - Clusters gene expression profiles or samples using the Ben-Dor algorithm.
CHPImport - Imports expression data from the Affymetrix CHP format into a SelTag data file.
FieldCorr - Calculates correlation coefficients between the gene expression values in experiments (fields). Viewer: Graph Viewer.
GeneCorr - Calculates correlation coefficients between gene expression profiles. Viewer: Graph Viewer.
HClust - Clusters gene expression profiles using a hierarchical algorithm.
Mas5Norm - Normalizes Affymetrix raw gene expression data using the MAS 5.0 algorithm.
Mas5Baseline - Compares Affymetrix raw gene expression data to baseline data using the MAS 5.0 algorithm.
SelByExpr - Selects genes by query (logical expression).
SelCorr - Selects the genes most correlated with a specified gene set. Viewer: Graph Viewer.
SOMClust - Clusters gene expression profiles or samples using the SOM (Self-Organizing Map) algorithm.
SNNBP-Learn - Implements the back-propagation training algorithm and outputs the optimal neural network (NN) structure, saved in the SNNBP internal format.
SNNBP-Predict - Uses the neural network to calculate output values (predictions) from input values in the data file (target values need not be specified in this option).
SNNBP-Test - Tests a previously obtained network on user data.
Restrictase - Finds and displays the positions of cut sites for restriction enzyme recognition sequences. Viewer: Restrictase Viewer.
3D-Explorer - Visualizes spatial models of biological macromolecules and their complexes. Compatible with PDB files.
Assembler Viewer - Visualizes the output data of "Assembler".
DotPlot viewer - Visualizes sites of homology between sets of sequences.
Graph - Visualizes statistics files.
MaliN Viewer - Aligns nucleotide sequences and works with the results of such alignments.
MaliP Viewer - Aligns protein sequences and works with the results of such alignments.
Reads Mapping Viewer - Visualizes the output data of "ReadsMap".
Restrictase Viewer - Visualizes restriction sites found with the "Restrictase" program.
RNA Secondary Structure Viewer - Visualizes and manipulates RNA secondary structure.
Sequence Explorer - Provides a visual representation of genome annotations.