Category Cross-Omics>Data/Text Mining Systems/Tools

Abstract ProMiner is a software tool for biological name recognition that was developed at the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) in a collaborative project with Aventis Pharma AG.

ProMiner addresses several fundamental issues in ‘named entity recognition’ in the life sciences:

1) Recognition of biological, medical or chemical named entities in scientific text.

2) A dictionary-based approach that can handle voluminous dictionaries, complex thesauri and large controlled vocabularies derived from ontologies.

3) Automated generation, curation and updating of dictionaries, followed by an automatic and manual evaluation process.

4) Recognition of biomedical entities and their spelling variants in text.

5) Mapping of synonyms to reference names and data sources.

6) Context-dependent disambiguation of biomedical terms and resolution of acronyms.

Identification of biological entities in ProMiner is based on a dictionary approach. The dictionaries are updated regularly; they are generated and curated automatically, followed by a manual evaluation process.

Furthermore, classifying dictionary entries according to how ambiguously they are used in scientific text ensures high specificity.

The proprietary ProMiner dictionary developed by SCAI for Homo sapiens has more than 32,000 entries and contains about 400,000 synonyms.

This dictionary covers the vast majority of human gene and protein names and thus allows them to be identified efficiently in unstructured text.
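The general idea of dictionary-based recognition with synonym mapping can be illustrated with a minimal sketch. This is a generic greedy longest-match tagger, not ProMiner's actual algorithm: the toy dictionary, the reference IDs (`GENE:0001`, `GENE:0002`) and the crude normalization rule (ignore case, hyphens and spaces) are hypothetical assumptions, and the context-dependent disambiguation described above is not modeled.

```python
import re

# Hypothetical toy dictionary: reference ID -> list of synonyms.
# Real ProMiner dictionaries and their format are not described in the source.
DICTIONARY = {
    "GENE:0001": ["BRCA1", "breast cancer 1"],
    "GENE:0002": ["TP53", "p53", "tumor protein p53"],
}

def normalize(term: str) -> str:
    """Collapse simple spelling variants by ignoring case, hyphens and spaces."""
    return re.sub(r"[\s\-]+", "", term.lower())

def build_lookup(dictionary):
    """Map every normalized synonym to its reference ID."""
    lookup = {}
    for ref_id, synonyms in dictionary.items():
        for syn in synonyms:
            lookup[normalize(syn)] = ref_id
    return lookup

def tag(text, lookup, max_tokens=4):
    """Greedy longest match over token windows; returns (surface form, ref ID) pairs."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so multi-word names win over substrings.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            ref_id = lookup.get(normalize(surface))
            if ref_id:
                hits.append((surface, ref_id))
                i += n
                break
        else:
            i += 1  # no dictionary match starting at this token
    return hits
```

For example, `tag("Mutations in BRCA-1 and p53 were observed.", build_lookup(DICTIONARY))` maps the variant spelling "BRCA-1" and the synonym "p53" to their reference IDs. A production system would need a far more careful variant model, since aggressive normalization can also create false matches.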

Available dictionaries --

1) Gene and protein name dictionaries for various organisms;

2) Gene ontology dictionary;

3) MeSH term dictionary;

4) Organism name dictionary;

5) Disease name dictionary;

6) Drug name dictionary.

ProMiner's machinery for fast indexing of huge document collections enables:

1) Information retrieval and a fast literature overview, even in a new research area.

2) Answering questions such as “give me all the genes/proteins associated with breast cancer” with a few mouse clicks, taking all synonyms into account and providing links to the sequence databases.

3) Classification of documents (e.g. patents) by distinguishing features, which can be based on large ontologies.
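A query like the breast cancer example can be sketched with an inverted index over documents that have already been tagged with reference IDs. Because all synonyms of a name map to one ID, a single lookup covers every spelling. This is only an illustration under stated assumptions: the ID scheme (`DISEASE:`/`GENE:` prefixes), the sample documents, and the use of simple document-level co-occurrence as a proxy for "associated with" are hypothetical, not ProMiner's documented method.

```python
from collections import defaultdict

def build_index(annotated_docs):
    """Inverted index: reference ID -> set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, ref_ids in annotated_docs.items():
        for ref_id in ref_ids:
            index[ref_id].add(doc_id)
    return index

def co_occurring(index, query_id, prefix):
    """Rank entities whose ID starts with `prefix` by the number of
    documents they share with `query_id` (most shared first)."""
    query_docs = index.get(query_id, set())  # .get avoids inserting a new key
    counts = {}
    for ref_id, docs in index.items():
        if ref_id.startswith(prefix) and ref_id != query_id:
            shared = len(docs & query_docs)
            if shared:
                counts[ref_id] = shared
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical pre-tagged corpus: document ID -> set of reference IDs found in it.
DOCS = {
    "doc1": {"DISEASE:breast_cancer", "GENE:0001", "GENE:0002"},
    "doc2": {"DISEASE:breast_cancer", "GENE:0001"},
    "doc3": {"GENE:0002", "GENE:0003"},
}
```

With this sample corpus, `co_occurring(build_index(DOCS), "DISEASE:breast_cancer", "GENE:")` ranks `GENE:0001` (two shared documents) ahead of `GENE:0002` (one) and omits `GENE:0003`. Co-occurrence counts are a crude signal; real association mining would normally add statistical weighting or relation extraction.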

ProMiner's 'content generation' for the interpretation of large scale experimental data provides:

1) A simple output file for populating or supplementing database content.

2) Linkage to other data, made possible by the mapping to databases or controlled vocabularies.

3) Background information for the annotation of genes/proteins.

4) Gene/protein interaction networks that can be used directly for data interpretation.

Proof of Performance -- ProMiner's performance in recognizing gene and protein names was tested in the »critical assessment of text mining in biology« (BioCreAtIvE I and II).

ProMiner was benchmarked against other industrial and academic named entity recognition tools and, in BioCreAtIvE I, scored highest on texts dealing with the two multicellular organisms (fly and mouse).

System Requirements

ProMiner™ is available for UNIX™ / Linux and the Microsoft Windows™ operating system.

To reduce the time required to process huge text corpora, ProMiner can run in a distributed fashion on grid-enabled hardware.

The software is already integrated into the IBM UIMA framework and can be combined with other text processing software.

ProMiner has been successfully integrated into the Temis BER Skill Cartridge and can be adapted as a pre-processing step for other semantic analysis environments, tagging entity references in texts.


Manufacturer Web Site ProMiner

Price Contact manufacturer.

G6G Abstract Number 20242

G6G Manufacturer Number 101048