G6G Directory of Omics and Intelligent Software

TABASCO

Category Genomics>Gene Expression Analysis/Profiling/Tools and Cross-Omics>Agent-Based Modeling/Simulation/Tools

Abstract TABASCO (Transcription And Binding And Serious Computational Overhead) is a simulator created to address the problem of simulating gene expression at single-base resolution.

By defining the rules of transcription and translation a priori (initiation, elongation, termination, and the interactions of polymerases and proteins), Tabasco automatically updates the state of the system as it develops, which makes simulation at such high resolution computationally feasible.

Tabasco was designed to allow the manufacturers to better understand bacteriophage gene expression. In general, Tabasco would be useful to those interested in explicitly simulating hypotheses of protein-DNA interactions and their relation to gene expression (e.g., eukaryotic gene expression initiation).

TABASCO enables the precise representation of individual molecules and events in gene expression for genome-scale systems. The manufacturers use a single-molecule computational engine to track individual molecules interacting with and moving along nucleic acid polymers at single-base resolution.

Tabasco uses logical rules to automatically update and delimit the set of species and reactions that comprise a system during simulation, thereby avoiding the need for a priori specification of all possible combinations of molecules and reaction events.

The manufacturers have confirmed that single-molecule, base-pair-resolved simulation using Tabasco can accurately compute gene expression dynamics and, moving beyond previous simulators, provide for the direct representation of intermolecular events such as polymerase collisions and promoter occlusion.

The manufacturers have demonstrated the computational capacity of Tabasco by simulating the entirety of gene expression during bacteriophage T7 infection; for reference, the 39,937 base pair T7 genome encodes 56 genes that are transcribed by two types of RNA polymerases active across 22 promoters.

Tabasco enables genome-scale simulation of transcription and translation at individual molecule and single base-pair resolution. By directly representing the position and activity of individual molecules on DNA, Tabasco can directly test the effects of detailed molecular processes on system-wide gene expression.

Tabasco would also be useful for studying the complex regulatory mechanisms controlling eukaryotic gene expression.

The computational engine underlying Tabasco could also be adapted to represent other types of processive systems in which individual reaction events are organized across a single spatial dimension (e.g., polysaccharide synthesis).

Tabasco Algorithm --

Tabasco makes use of a Gibson-accelerated Gillespie Stochastic Simulation Algorithm (SSA) to compute the reaction event timing and the resultant time-evolution of the genetic system.
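
As a point of reference, the basic Gillespie scheme that the Gibson method accelerates can be sketched in a few lines. The example below is a minimal illustration in Python (not Tabasco's Java code) using the direct method on a toy model with one transcription reaction and one mRNA degradation reaction. The function name, rate constants, and model are assumptions chosen for illustration only.

```python
import random

def gillespie_direct(k_tx, k_deg, t_end, seed=0):
    """Direct-method SSA for a toy model (hypothetical, not from Tabasco):
    DNA -> DNA + mRNA at rate k_tx; mRNA -> nothing at rate k_deg per molecule."""
    rng = random.Random(seed)
    t, mrna = 0.0, 0
    trajectory = [(t, mrna)]
    while True:
        a_tx = k_tx              # propensity of transcription (zero order)
        a_deg = k_deg * mrna     # propensity of degradation (first order)
        a_total = a_tx + a_deg
        if a_total == 0.0:
            break                # nothing can happen any more
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, weighted by its propensity.
        if rng.random() * a_total < a_tx:
            mrna += 1
        else:
            mrna -= 1
        trajectory.append((t, mrna))
    return trajectory

trajectory = gillespie_direct(k_tx=2.0, k_deg=0.1, t_end=50.0, seed=1)
```

Each entry of the returned list is a (time, mRNA count) pair. The direct method recomputes every propensity at every step, which is the cost that the Gibson approach is designed to reduce.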

Tabasco uses predefined rules of transcription and translation such as initiation, elongation, termination, and protein interactions of polymerases, ribosomes, and other DNA/RNA-associated proteins. Based on these rules, Tabasco automatically updates the states of molecules and reaction events.

Tabasco transitions between two levels of resolution while simulating gene expression: “single-molecule” and “species level”. Reactants and events that occur on the DNA are tracked at single-molecule resolution: each copy of DNA and the proteins associated with it are tracked individually by the simulator.

The structure of Tabasco confers at least four (4) advantages -

1) First, treating gene expression at base-pair resolution allows for more accurate representation of the kinetics of gene expression. For example, traditional SSAs often lump multi-step reactions into single steps, causing inaccurate estimates of pre-steady-state kinetics.

2) Second, tracking the state of individual proteins on DNA and allowing internal logic to automatically generate reactions eliminates the need to enumerate all the possible states of polymerases and proteins associated with the DNA. For example, transcribing polymerases and processes such as genome entry into a cell can cause certain protein binding sites to be inaccessible.

This feature also allows you to consider and integrate many factors that may influence the rate of RNA polymerization for any particular gene, such as the binding of multiple transcription factors or the contribution of RNA polymerases that initiated transcription at a promoter connected to an upstream gene.

3) Third, protein-protein interactions (PPIs) that may occur on DNA, such as collisions between different polymerases, can be accounted for and simulated based on simple and explicit rules.

4) Fourth, Tabasco can be used to graphically depict the location and dynamics of individual RNA polymerase molecules transcribing DNA, providing a useful visual tool for considering genome-scale gene expression dynamics.
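
The second and third advantages above can be illustrated with a small sketch of position tracking on a DNA lattice. The Python below is a hedged illustration, not Tabasco's implementation: the class name, the footprint convention (a polymerase covers its leading base and the bases behind it), and the simple "blocked if the next base is covered" collision rule are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Polymerase:
    position: int   # index of the leading base (assumed convention)
    footprint: int  # number of bases covered, counting back from position

    def covered(self):
        """Bases currently occupied by this polymerase."""
        return range(self.position - self.footprint + 1, self.position + 1)

def site_is_free(polymerases, start, end):
    """True if no polymerase covers any base in [start, end] (no occlusion)."""
    return all(
        p.position < start or p.position - p.footprint + 1 > end
        for p in polymerases
    )

def try_step(polymerases, index, genome_length):
    """Advance one polymerase by one base unless the next base is covered by
    another polymerase (collision) or lies beyond the end of the DNA."""
    pol = polymerases[index]
    target = pol.position + 1
    if target >= genome_length:
        return False
    for other in polymerases:
        if other is not pol and target in other.covered():
            return False
    pol.position = target
    return True

pols = [Polymerase(position=10, footprint=5), Polymerase(position=15, footprint=5)]
blocked = try_step(pols, 0, genome_length=100)       # trailing polymerase is blocked
promoter_free = site_is_free(pols, start=8, end=12)  # binding site is occluded
```

Because the occupied bases are derived from the polymerase positions, the set of possible binding and elongation events follows from the current state and does not have to be enumerated in advance.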

Tabasco Overall Simulator Structure --

Tabasco implements a modified version of the Gibson Next Reaction Method (NRM). Gibson's NRM is an exact SSA that extends Gillespie's original First Reaction method by:

1) Updating only the minimum number of reactions through the use of a dependency graph and using absolute tentative reaction times; and

2) Using an efficient data structure, the indexed priority queue, to store and sort reactions.

The NRM uses a dependency graph to determine which reactions are affected by any particular reaction's execution. In the NRM, the dependency graph is constructed only once at the start of simulation and remains unchanged afterwards.
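
A dependency graph of this kind can be sketched as follows. This Python example is an illustration under stated assumptions, not Tabasco's code: reactions are described only by sets of reactant and product species (stoichiometric coefficients of 1 are assumed), and the reaction names are hypothetical.

```python
def build_dependency_graph(reactions):
    """reactions maps a reaction name to (reactants, products), both sets of
    species names. Returns a map from each reaction to the set of reactions
    whose propensities must be recomputed after it fires."""
    graph = {}
    for name, (reactants, products) in reactions.items():
        # Species whose counts change: consumed or produced, but not both.
        changed = reactants ^ products
        affected = {name}  # a reaction always needs a new time for itself
        for other, (other_reactants, _) in reactions.items():
            if changed & other_reactants:
                affected.add(other)
        graph[name] = affected
    return graph

reactions = {
    "transcribe": ({"DNA"}, {"DNA", "mRNA"}),
    "translate": ({"mRNA"}, {"mRNA", "protein"}),
    "degrade_mRNA": ({"mRNA"}, set()),
    "degrade_protein": ({"protein"}, set()),
}
graph = build_dependency_graph(reactions)
```

Here firing "transcribe" changes only the mRNA count, so only "translate", "degrade_mRNA", and "transcribe" itself need updated reaction times. A graph built once like this works when the reaction set is fixed; it does not by itself cover reactions that are created during simulation.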

Tabasco contains two (2) specialized classes per DNA molecule within the overall indexed priority queue that are used to track a set of dynamically generated reactions.

Each class contains a dynamic priority queue that stores the dynamically generated transcriptional and translational reactions and their tentative reaction times, as well as the dependencies of any particular reaction.

The minimum tentative reaction time for all the dynamic reactions is set as the tentative reaction time of that dynamic priority queue with respect to the overall priority queue.
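
The nesting described above, where an inner queue exposes only its earliest time to the overall queue, can be sketched as follows. This Python example is a simplified illustration with hypothetical names. It uses a binary heap for the inner queue and a linear scan for the outer level, whereas an indexed priority queue would support faster updates.

```python
import heapq
import math

class DynamicQueue:
    """Inner queue of dynamically generated reactions for one DNA molecule."""

    def __init__(self):
        self._heap = []

    def add(self, time, reaction):
        heapq.heappush(self._heap, (time, reaction))

    def min_time(self):
        # The time the outer queue sees for this whole inner queue.
        return self._heap[0][0] if self._heap else math.inf

    def pop(self):
        return heapq.heappop(self._heap)

def next_event(static_times, dynamic_queues):
    """Pick the earliest event among ordinary reactions (name -> time) and
    inner queues (name -> DynamicQueue). Returns (time, label) or None."""
    candidates = [(t, name, None) for name, t in static_times.items()]
    candidates += [(q.min_time(), name, q) for name, q in dynamic_queues.items()]
    if not candidates:
        return None
    time, name, queue = min(candidates, key=lambda c: c[0])
    if math.isinf(time):
        return None
    if queue is None:
        return time, name
    _, reaction = queue.pop()  # descend into the inner queue
    return time, name + ":" + reaction

dna_queue = DynamicQueue()
dna_queue.add(2.5, "elongate_pol_1")
dna_queue.add(1.2, "initiate_promoter_A")
event = next_event({"degrade_protein": 4.0}, {"dna_0": dna_queue})
```

In this sketch the outer level never needs to know how many dynamic reactions exist on a DNA molecule; it only compares one time per inner queue.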

Tabasco Translation --

The transcribing RNA polymerase complexes also produce mRNAs, which are tracked by a separate class. Since there are many more copies of RNA than DNA during simulation, and because the manufacturers were uncertain as to the importance of protein-protein interactions on the mRNA, the manufacturers chose to treat the majority of translation at the species level.

However, if, for example, an RNA polymerase prematurely terminates before reaching the translation stop site, the mRNA and the ribosomes translating it will not produce functional proteins.

Thus, so long as the coding sequence is still being transcribed, one must treat all translation events at the single-molecule level as well. However, as soon as the entire coding sequence of an open reading frame is transcribed, Tabasco transitions to tracking species of mRNA molecules.
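
The hand-off from single-molecule tracking to species-level counting can be sketched as follows. This Python example is a hedged illustration only: the class, the function, and the species naming scheme are hypothetical, and ribosomes on the nascent transcript are omitted for brevity.

```python
from collections import Counter

class NascentTranscript:
    """An mRNA that is still being transcribed and is tracked individually."""

    def __init__(self, gene, orf_end):
        self.gene = gene
        self.orf_end = orf_end  # transcript length at which the ORF is complete
        self.length = 0

def elongate(transcript, nascent, species_counts):
    """Add one base to the transcript. Once the full coding sequence exists,
    stop tracking the molecule and count it as a species instead."""
    transcript.length += 1
    if transcript.length >= transcript.orf_end:
        nascent.remove(transcript)
        species_counts["mRNA_" + transcript.gene] += 1
        return True   # transitioned to species level
    return False      # still tracked as a single molecule

species_counts = Counter()
nascent = [NascentTranscript(gene="gene1", orf_end=3)]
transcript = nascent[0]
while not elongate(transcript, nascent, species_counts):
    pass
```

After the loop, the transcript is no longer in the list of individually tracked molecules and the species count for its mRNA has increased by one. A transcript that terminates early would never reach this transition in the sketch, which matches the reason given above for tracking nascent mRNA individually.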

Tabasco DNA entry --

The manufacturers developed multiple models to represent DNA entry into a cell or compartment (such a step can be useful in starting a simulation of infection or transformation).

First, DNA can enter the cell via a zero-order constant reaction rate. Second, RNA polymerases have themselves been implicated as molecular motors that can drive DNA entry.

Thus, in Tabasco, RNA polymerases that reach the end of a DNA molecule that has not yet fully entered the cell can cause DNA internalization at the rate of transcription elongation. Both of these mechanisms were used during the manufacturers’ simulation of T7 gene expression.
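
The two entry mechanisms can be sketched as a pair of small rules. This Python example is an illustration under assumptions, not Tabasco's code: `entered` is taken to be the number of bases already inside the cell, positions are zero-based, and the rate value in the usage lines is hypothetical.

```python
def entry_propensity(entered, genome_length, k_entry):
    """Zero-order entry: a constant rate until the whole genome is inside."""
    return k_entry if entered < genome_length else 0.0

def polymerase_step(position, entered, genome_length):
    """Move a polymerase one base forward. If it sits at the boundary of the
    entered DNA, the step also pulls one more base into the cell.
    Returns (new_position, new_entered)."""
    target = position + 1
    if target >= genome_length:
        return position, entered  # end of the genome; no further movement
    if target >= entered:
        entered = target + 1      # polymerase-driven internalization
    return target, entered

genome_length = 39937  # T7 genome length quoted above
rate = entry_propensity(entered=100, genome_length=genome_length, k_entry=5.0)
position, entered = polymerase_step(position=99, entered=100, genome_length=genome_length)
```

In a stochastic simulation, the first rule would supply the propensity of a one-base entry reaction, while the second would be applied whenever an elongation event fires, so that polymerase-driven entry proceeds at the elongation rate.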

Tabasco Simulation, Data Output, Visualization, Code --

Tabasco is written in Java. The input file to the simulator is an XML file that describes and parameterizes the relevant genetic elements, initial conditions, and any other reactions that occur.

The visualization is created by producing images that are then merged using QuickTime® to create a movie. The source code and executables, along with documentation and instructions for use, are available via the TABASCO website.

System Requirements

Contact manufacturer.

Manufacturer

Department of Bioengineering
Stanford University
318 Campus Drive
Clark Center Room S170
Stanford, CA 94305-5444 USA

Manufacturer Web Site TABASCO

Price Contact manufacturer.

G6G Abstract Number 20639

G6G Manufacturer Number 104238
