InforSense TextSense

Category Cross-Omics>Data/Text Mining Systems/Tools

Abstract InforSense TextSense, built on the InforSense Platform, provides a wide range of text processing, analytics and visualization components. It allows scientists, informaticians and analysts in any discipline to rapidly create, execute and deploy applications that bring literature analytics into their work. The TextSense analytics components cover the most widely used text analysis functions. Each component has a visual interface for setting its parameters and can be composed with other components into an analytic workflow to create complete, problem-solving applications. The components are designed to integrate with the wide variety of InforSense analytical components, so that applications for predictive modeling, structured data analysis, bioinformatics and cheminformatics can all be combined with text analytics.

With many applications in a large number of sectors, InforSense TextSense allows organizations to discover new concepts and relationships in large digital collections of textual data, including scientific papers, patents, business reports, laboratory reports, web pages and warranty records.

Note 1: Gene Expression Case Study - Gene expression profiling is widely used for target discovery in the drug development process. Such experiments result in a list of differentially expressed genes that the analyst will wish to investigate further. One information source that can be leveraged for this is the published scientific literature. Text analytics can be used to answer specific questions about the genes: Are there direct or indirect relationships between these genes and the disease under study? In which biological processes are these genes involved? In which biological pathways are these genes involved? In this way, hypotheses based on the experimental results may be supported or contradicted by information extracted from the published scientific literature.
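The first question in the case study above (is a gene mentioned together with the disease under study?) can be illustrated with a minimal Python sketch. This is not TextSense code and does not use any InforSense API; the function name `cooccurrence_counts`, the gene list and the three toy sentences are all hypothetical and chosen only for illustration. It assumes simple case-sensitive matching of gene symbols and substring matching of disease terms, which is far cruder than a real annotation component.

```python
import re
from collections import Counter


def cooccurrence_counts(abstracts, genes, disease_terms):
    """Count, per gene, the abstracts that mention it alongside a disease term.

    Hypothetical helper for illustration only; not part of any product API.
    """
    counts = Counter()
    disease_lower = [term.lower() for term in disease_terms]
    for text in abstracts:
        lowered = text.lower()
        # Skip abstracts that never mention the disease under study.
        if not any(term in lowered for term in disease_lower):
            continue
        # Gene symbols are matched as whole, case-sensitive tokens.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        for gene in genes:
            if gene in tokens:
                counts[gene] += 1
    return counts


# Toy sentences invented for this sketch, not real literature.
abstracts = [
    "TP53 mutations are frequent in breast cancer.",
    "BRCA1 and TP53 interact in breast cancer cell lines.",
    "MYC regulates cell growth in yeast models.",
]
genes = ["TP53", "BRCA1", "MYC"]
result = cooccurrence_counts(abstracts, genes, ["breast cancer"])
for gene in genes:
    print(gene, result[gene])
```

Genes with a count of zero have no literature support in the sample, while higher counts suggest a relationship worth reading about in more detail.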

InforSense TextSense Key Features include:

Import-Export --

InforSense TextSense provides a wide range of components for importing documents and other resources. These include:

XML --

InforSense TextSense supports a wide range of processing operations on Extensible Markup Language (XML) documents. These include:

Preprocessing --

InforSense TextSense supports a wide range of preprocessing components for parsing, cleaning and normalizing text. This includes operations for:

Annotation --

InforSense TextSense implements a generic architecture for tagging features within documents. Supported operations include:

Statistical Analysis --

Statistical analysis can be used to find trends and patterns within a document collection or it can be used to transform documents into the feature vector space in preparation for document categorization. Statistical analysis components include:

Document Categorization --

Once a document has been transformed into the feature vector space, traditional classification and clustering algorithms may be applied to categorize the document. These components include:

Information Extraction --

Information in the form of relationships that exist between features in documents can be uncovered using one of the following components:

Visualization --

InforSense TextSense adds to the suite of InforSense visualizers. The additional visualizers are:

Indexing --

Document collections may be indexed for rapid interactive querying. Supported operations include:

Oracle Text --

InforSense Oracle Edition extends InforSense by allowing components within an InforSense application to be executed within Oracle, without the overhead of data transfer to and from the database. TextSense’s Oracle Text nodes add to this by providing Oracle’s Oracle Text functionality in the same framework.

This package includes components that perform the following in-database processing:

Extensibility --

The TextSense range of components can be extended to add extra functionality, including:

Additionally for Scientists --

Create and deliver true cross-domain applications through access to additional analytical components for biology and chemistry, using InforSense BioSense (see G6G Abstract Number 20033) and ChemSense. These allow InforSense components, your internal tools and third-party software and data stores to be combined in one analytical workflow solution.

Note 2: Ontology Tagging Case Study - Ontologies are structured vocabularies that are used to describe knowledge (entities and the relationships between them) within a given domain. Well-known ontologies include the Gene Ontology in the biomedical domain and the Derwent World Patents Index Codes in the intellectual property domain. Analysts find it much easier to locate relevant literature if it has been categorized against an ontology that describes their domain. This is often done manually; for instance, various biomedical databases curate scientific papers according to the Gene Ontology (GO) concepts to which they refer. Using text analytics and machine learning techniques, documents from any source can be automatically categorized to any ontology. For instance, as well as accessing patents manually categorized by Derwent World Patents Index Codes, an analyst could also access Medline abstracts, which may contain useful information associated with the patents, categorized by the same codes.
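As a rough illustration of automatic categorization against an ontology, the sketch below assigns a new document to the category whose manually labelled examples it most resembles. It is a generic nearest-centroid classifier over term-frequency vectors, written in plain Python; it is not the TextSense implementation, and the category labels, training sentences and function names (`vectorize`, `train_centroids`, `categorize`) are hypothetical placeholders rather than real Gene Ontology or Derwent codes.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Map a document to a term-frequency vector (a Counter of lowercase words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train_centroids(labelled_docs):
    """Sum the vectors of manually categorized documents, one centroid per label."""
    centroids = {}
    for label, text in labelled_docs:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def categorize(text, centroids):
    """Return the label whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Placeholder categories and sentences, invented for this sketch.
labelled = [
    ("apoptosis", "programmed cell death triggered by caspase activation"),
    ("apoptosis", "caspase cascade leads to cell death"),
    ("transport", "membrane protein moves ions across the cell membrane"),
    ("transport", "ion channel transport across membrane"),
]
centroids = train_centroids(labelled)
predicted = categorize("caspase dependent death of tumour cells", centroids)
print(predicted)
```

A production system would add weighting such as TF-IDF, stemming and a stronger classifier, but the flow is the same: learn from manually curated documents, then apply the model to uncategorized ones.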

System Requirements

The InforSense Platform is based on the Java J2EE architecture and has been validated on a wide range of operating system environments. Currently supported platforms include:

Manufacturer

Manufacturer Web Site InforSense Limited

Price Contact manufacturer.

G6G Abstract Number 20034

G6G Manufacturer Number 101430