OnTheFly
Category Cross-Omics>Data/Text Mining Systems/Tools
Abstract OnTheFly is a web-based tool for automated document-based text annotation, data linking and network generation that applies biological named entity recognition to enrich Microsoft Office, PDF and plain text documents.
The input files are converted into the HyperText Markup Language (HTML) format and then sent to the Reflect tagging server, which highlights biological entity names such as genes, proteins and chemicals, and attaches JavaScript code to each name that invokes a summary pop-up window.
The window provides an overview of relevant information about the entity, such as a protein description, the domain composition, a link to the 3D structure and links to other relevant online resources.
OnTheFly is also able to extract the bioentities mentioned in a set of files and to produce a ‘graphical representation’ of the networks of known and predicted associations among these entities by retrieving the information from the STITCH database.
The STITCH database --
STITCH (Search Tool for Interactions of Chemicals) is a sister project of the protein-protein interactions (PPIs) server STRING (Search Tool for the Retrieval of Interacting Genes/Proteins); see G6G Abstract Number 20298.
STITCH is a resource to explore known and ‘predicted interactions’ of chemicals and proteins.
Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature.
STITCH contains interactions for over 68,000 chemicals and over 1.5 million proteins in 373 species.
OnTheFly Functionality --
OnTheFly is a service that automatically annotates document files such as Microsoft Word, Excel, PowerPoint, PDF or plain text files.
After submitting the files to the service, the system returns a tagged HTML version of the documents.
Gene, protein and chemical names are highlighted, and by clicking on them the user activates a pop-up window which contains relevant information about the entity.
The presented information includes domains, sequence, organism and sub-cellular localization for proteins; formula for chemicals; and protein-chemical and chemical-chemical interactions for both entity types.
This functionality is provided by the Reflect server.
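The tagging step described above can be illustrated with a minimal Python sketch. It is not the Reflect implementation: it assumes a small in-memory dictionary (`DICTIONARY`, hypothetical) with placeholder identifiers, and it simply wraps each recognized name in an HTML `<span>` that a pop-up script could later attach to.

```python
import html
import re

# Hypothetical mini-dictionary: name -> (entity type, identifier).
# The identifiers are placeholders, not real database accessions.
DICTIONARY = {
    "TP53": ("protein", "EXAMPLE:0001"),
    "aspirin": ("chemical", "EXAMPLE:0002"),
}


def tag_entities(text, dictionary=DICTIONARY):
    """Return HTML in which every dictionary name is wrapped in a <span>."""
    if not dictionary:
        return html.escape(text)
    # Try longer names first so they win over names that are their prefixes.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untagged text between two matches.
        parts.append(html.escape(text[last:match.start()]))
        kind, ident = dictionary[match.group(1)]
        parts.append(
            '<span class="entity {}" data-id="{}">{}</span>'.format(
                kind, html.escape(ident, quote=True), html.escape(match.group(1))
            )
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Matching here is exact and case-sensitive; a production tagger would also need to handle synonyms, spelling variants and organism-specific aliases, which this sketch leaves out.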
Furthermore, OnTheFly can generate interaction networks for a set of extracted bioentities (genes, proteins, chemicals), using interaction data retrieved from the STITCH database.
The user can select the preferred organism whose protein aliases will be used for tagging and network generation; the default organism is set to Homo sapiens.
The size of the network and the number of interactors per recognized entity can be manually defined by the user.
The network generation is not restricted to one document but can be applied to a set of documents simultaneously.
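As a rough illustration of network generation over several documents with a user-defined number of interactors per entity, the following Python sketch may help. It does not query STITCH: the interaction table, entity names and scores are invented placeholders, and `build_network` and `max_interactors` are hypothetical names chosen for this example.

```python
# Hypothetical interaction table: (entity_a, entity_b, confidence score).
# Names and scores are invented placeholders, not STITCH data.
INTERACTIONS = [
    ("E1", "E2", 0.9),
    ("E1", "E3", 0.7),
    ("E1", "E4", 0.4),
    ("E2", "E5", 0.8),
]


def build_network(documents, interactions, max_interactors=2):
    """Build an edge list for the entities found in a set of documents.

    documents: list of sets, one set of recognized entity names per document.
    Returns a sorted list of (entity, entity, score) edges.
    """
    # Pool the entities recognized across all documents.
    seeds = set().union(*documents)
    edges = set()
    for seed in sorted(seeds):
        partners = []
        for a, b, score in interactions:
            if a == seed:
                partners.append((score, b))
            elif b == seed:
                partners.append((score, a))
        # Keep only the highest-scoring partners for this entity.
        partners.sort(key=lambda p: (-p[0], p[1]))
        for score, partner in partners[:max_interactors]:
            # Store each undirected edge once, in a canonical order.
            edges.add((min(seed, partner), max(seed, partner), score))
    return sorted(edges)
```

For example, `build_network([{"E1"}, {"E2"}], INTERACTIONS)` pools the entities of both documents and keeps at most two partners for each of them.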
Lists summarizing the identified bioentities are also generated. These lists contain the ID of each bioentity together with its organism and description.
The summary results cover all bioentities found in the set of selected files.
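A summary list of this kind could be assembled as in the short Python sketch below. The record layout (ID, organism, description) follows the text; the function name `summarize`, the added `files` field and the input structure are assumptions made for illustration only.

```python
from collections import defaultdict


def summarize(per_file):
    """Merge per-file entity lists into one summary list.

    per_file: {filename: [(entity_id, organism, description), ...]}
    Returns one record per distinct entity, listing the files it occurs in.
    """
    files_by_entity = defaultdict(set)
    for filename, entities in per_file.items():
        for entity in entities:
            files_by_entity[entity].add(filename)
    return [
        {"id": ident, "organism": organism, "description": description,
         "files": sorted(files)}
        for (ident, organism, description), files in sorted(files_by_entity.items())
    ]
```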
The performance of the service can be assessed in a number of ways, such as the quality of the document conversion, the time required to tag a document and the accuracy of the annotation.
The file converters used are able to maintain most of the layout of the documents, including column separation, tables and figures.
The time to process a full text article of about 15 pages with images and tables typically ranges between 15 and 20 seconds. This covers the whole process, including the communication with the server.
The name tagging performance of the Reflect server is comparable to other available methods.
What OnTheFly can be used for --
The manufacturer believes that OnTheFly is a very useful service, not only for computer scientists but also for biologists.
The manufacturer presents the following two (2) different scenarios to motivate the user to use OnTheFly.
1) Imagine that you are a biologist and you have experimental results listed in an Excel spreadsheet which also lists protein and gene names.
This could be a document, for example, which contains your results of a microarray experiment.
You can now automatically annotate the entities and link them to relevant databases by dropping your file into the OnTheFly interface.
The system will immediately have your file annotated with names, identifiers, descriptions, synonyms, sequences, organisms, domains, literature, information about the sub-cellular localization, chemical types, protein-protein, protein-chemical and chemical-chemical interactions and links to external databases.
2) A second scenario could be to combine similar documents and extract ‘novel knowledge’ using the ‘network generation’ capability.
Articles often refer to ‘protein complexes’ or interactions between different bioentities coming from either experiments or prediction methods.
OnTheFly could prove to be a useful tool for integrating the knowledge reported in such articles with the knowledge that is currently hidden in databases, and so arrive at extended new biological knowledge.
System Requirements
Web-based; contact manufacturer for details.
Manufacturer
- Structural and Computational Biology Unit
- EMBL Meyerhofstrasse 1
- Heidelberg, Germany
Manufacturer Web Site OnTheFly
Price Contact manufacturer.
G6G Abstract Number 20509
G6G Manufacturer Number 104130




