Web site and design © 2008-2010 by G6G Consulting Group. All Rights Reserved. Most product content has been taken directly from manufacturer's web
sites; other product content is assembled by G6G Consulting Group. G6G welcomes any corrections and/or comments.
    MedlineRanker

    Category  Cross-Omics>Data/Text Mining Systems/Tools

    Abstract  The MedlineRanker webserver is a text mining system that
    allows flexible ranking of Medline (PubMed) abstracts for a topic of
    interest without expert knowledge.

    Given some abstracts related to a topic, the system automatically
    deduces the most discriminative words in comparison to a random
    selection.

    These words are used to score other abstracts, including those from
    recent publications that have not yet been annotated, which can then be
    ranked by relevance.

    The user defines their topic of interest using their own set of abstracts,
    which can be just a few examples, and they can also run the analysis
    with default parameters.

    If the input contains closely related abstracts, the system returns
    relevant abstracts from the recent bibliography with high accuracy.

    The web interface also allows customization of other parameters and
    inputs, such as the reference set of abstracts, which is compared to the
    query.

    This tool can process thousands of abstracts from the Medline
    database in a few seconds, or millions in a few minutes.

    MedlineRanker method and implementation --

    The MedlineRanker method is derived from a supervised learning
    method which was tested on the subject of stem cells.

    Briefly, noun usage is compared between a set of abstracts related to a
    topic of interest, called the training set, and the whole Medline or a
    subset, called the background set.

    First, nouns are extracted from each English abstract, including the title,
    without counting multiple occurrences.
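
    The extraction step above can be sketched as follows. This is only
    an illustration: the input format (a list of token/tag pairs), the
    rule that tags starting with 'N' mark nouns, and the lowercasing are
    all assumptions made for the example, not details confirmed by the
    manufacturer. The function name unique_nouns is hypothetical.

```python
def unique_nouns(tagged_tokens):
    """Return the distinct nouns of one abstract (title included).

    tagged_tokens: iterable of (token, pos_tag) pairs from some
    part-of-speech tagger. Assumption: noun tags start with "N".
    Using a set means repeated occurrences are counted only once.
    """
    return {token.lower() for token, tag in tagged_tokens if tag.startswith("N")}


# Hypothetical tagged title + abstract text.
tagged = [
    ("Stem", "NN"), ("cells", "NNS"), ("renew", "VV"),
    ("stem", "NN"), ("cells", "NNS"),
]
nouns = unique_nouns(tagged)
```

    Representing each abstract as a set of nouns makes the later
    frequency counts a matter of counting how many abstracts contain a
    given noun.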

    The original supervised learning method was improved by using a
    linear naïve Bayesian classifier, which is applied by calculating noun
    weights with a dot product refactored for speed so that it sums only
    the features that occur.

    The manufacturer also uses the split-Laplace smoothing scheme to
    counteract class skew.
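
    As a rough illustration of how noun weights of this kind can be
    derived, the sketch below computes a log-ratio weight per noun from
    the fraction of training and background abstracts that contain it.
    Note that it uses plain add-one (Laplace) smoothing as a stand-in;
    it does not reproduce the split-Laplace scheme mentioned above. The
    function name noun_weights and the parameter alpha are hypothetical.

```python
import math
from collections import Counter


def noun_weights(training_docs, background_docs, alpha=1.0):
    """Map each noun to log(P(noun | training) / P(noun | background)).

    Each document is a set of nouns. Assumption: plain Laplace
    smoothing with pseudo-count alpha, NOT the split-Laplace scheme.
    """
    t_counts = Counter(n for doc in training_docs for n in doc)
    b_counts = Counter(n for doc in background_docs for n in doc)
    n_t, n_b = len(training_docs), len(background_docs)

    weights = {}
    for noun in set(t_counts) | set(b_counts):
        # Smoothed fraction of abstracts in each set containing the noun.
        p_t = (t_counts[noun] + alpha) / (n_t + 2 * alpha)
        p_b = (b_counts[noun] + alpha) / (n_b + 2 * alpha)
        weights[noun] = math.log(p_t / p_b)
    return weights


training = [{"stem", "cell"}, {"stem", "niche"}]
background = [{"cell", "protein"}, {"protein", "kinase"},
              {"cell", "membrane"}, {"gene"}]
weights = noun_weights(training, background)
```

    Nouns that are over-represented in the training set (here 'stem')
    receive positive weights, while nouns typical of the background
    (here 'protein') receive negative ones.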

    An abstract is scored by summing the weights of each of its nouns, and
    'P-values' are defined as the proportion of abstracts with a higher score
    within 10,000 recent abstracts.
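
    The scoring and 'P-value' definitions above translate almost
    directly into code. The sketch below uses made-up weights and a tiny
    made-up reference sample instead of 10,000 recent abstracts; the
    function names and the choice to give unknown nouns a weight of zero
    are assumptions for the example.

```python
def score(nouns, weights):
    """Sum the weights of the nouns that occur in an abstract.

    Assumption: nouns without a known weight contribute nothing.
    """
    return sum(weights.get(noun, 0.0) for noun in nouns)


def p_value(abstract_score, reference_scores):
    """Proportion of reference abstracts with a strictly higher score."""
    higher = sum(1 for ref in reference_scores if ref > abstract_score)
    return higher / len(reference_scores)


# Hypothetical weights and reference scores, for illustration only.
example_weights = {"stem": 1.5, "cell": 0.2, "protein": -0.7}
s = score({"stem", "cell", "assay"}, example_weights)
p = p_value(s, [2.0, 1.0, 0.5, -0.3])
```

    A small 'P-value' therefore means that few of the reference
    abstracts score higher than the abstract in question.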

    Extraction of nouns in English abstracts is performed using the
    TreeTagger program (Helmut Schmid, Institute for Natural Language
    Processing, University of Stuttgart) and stored in a local MySQL
    database along with information from the Medline database.

    MedlineRanker Results --

    MedlineRanker user inputs -

    There are three (3) different sets of data that the user can provide to
    help them get the most relevant results from MedlineRanker: the
    training set, the background set and the test set.

    A user interested in ranked results related to a particular topic has to
    input some abstracts related to that topic as 'the training set'.

    In the training set, an abstract is represented by its PubMed identifier
    (PMID). These identifiers can be easily retrieved from a PubMed search
    results page as explained in the MedlineRanker webserver online
    documentation.

    Also, thanks to available Medline annotations the webserver can
    automatically construct the training set from a list of biomedical MeSH
    terms.

    Some example training sets can also be selected just by clicking on
    hyperlinks.

    If the user decides to run the analysis with the default parameters, the
    training set profile will be compared to a precomputed profile of the
    entire Medline database, and used to rank ten thousand (10,000) recent
    abstracts.

    Beyond the input set, a 'second main parameter' of MedlineRanker is
    the choice of the reference abstracts, i.e. 'the background set'.

    To construct a profile for the query topic, the noun frequencies in the
    training set are compared to the corresponding frequencies in the
    background set by a linear naïve Bayesian classifier.

    The default background set is the entire Medline database, which is
    clearly suitable when ranking recent abstracts or the most recent years
    of the literature.

    The manufacturer recommends using the default background set;
    however, you can also provide your own list of PMIDs.

    This may be useful when the abstracts that have to be ranked are all
    related to the same secondary topic.

    For instance, if one is interested in ranking abstracts already related to
    protein binding according to their relevance for the topic
    ‘Phosphorylation’, an appropriate background set would be a list of
    abstracts related to ‘protein binding’.

    The 'last main parameter' defines which abstracts are going to be
    ranked, i.e. 'the test set'.

    By default, 10,000 recent abstracts are selected. By using this relatively
    small subset of Medline, the results can be returned quickly and the
    performance of the training set can be evaluated in a short amount of
    time.

    The test set can be extended to the last months or years of Medline at
    a cost in computational time. The manufacturer's server can process
    approximately one million abstracts per minute.

    Alternatively, the user can input their own test set as a list of PMIDs.
    This is very useful for focusing a search on a particular set of abstracts
    of interest.

    For instance, if one were interested in ranking abstracts describing
    protein-protein interaction (PPI), the main PPI databases, like the
    Human Protein Reference Database (HPRD), the Database of
    Interacting Proteins (DIP) or MINT (database of functional relationships
    between proteins, DNA and RNA), provide PMIDs for each described
    interaction.

    MedlineRanker Results page --

    The results page shows the ranked test set as a table, with the most
    relevant records at the top of the table. For each abstract, the table
    shows the rank, PMID, title and 'P-value'.

    The discriminative words that were used to score the abstracts are
    highlighted in the column containing the article title.

    Clicking on a PMID opens a pop-up window showing the whole abstract
    text with highlighted discriminative words, further information and a
    link to PubMed.

    During the ranking process, a leave-one-out cross validation is
    performed on a subset of the data. This provides an estimate of the
    method's predictive performance, including precision and recall at
    several cut-offs, which is displayed as a table.
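
    To make the precision and recall figures concrete, the sketch below
    computes both at a given score cut-off. It assumes that the
    cross-validation has already produced a score and a known relevance
    label for each held-out abstract; the data and the function name
    precision_recall are hypothetical.

```python
def precision_recall(scored, cutoff):
    """Precision and recall when abstracts scoring >= cutoff are kept.

    scored: list of (score, is_relevant) pairs, e.g. from held-out
    abstracts in a cross-validation run (assumed already computed).
    """
    tp = sum(1 for s, rel in scored if s >= cutoff and rel)
    fp = sum(1 for s, rel in scored if s >= cutoff and not rel)
    fn = sum(1 for s, rel in scored if s < cutoff and rel)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical cross-validation output.
scored = [(2.1, True), (1.4, True), (0.9, False), (0.3, True), (-0.5, False)]
strict = precision_recall(scored, cutoff=1.0)
loose = precision_recall(scored, cutoff=0.0)
```

    Lowering the cut-off typically raises recall at the expense of
    precision, which is why several cut-offs are reported side by side.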

    Additionally, the probability of the correct ranking of a random pair of
    abstracts, one relevant and one irrelevant, is calculated from the area
    under a Receiver Operating Characteristic (ROC) curve.

    This is provided to allow future comparisons with other algorithms.
    Finally, the list of ‘discriminative words’ with corresponding weights is
    given in decreasing order of importance.
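
    The pair-ranking probability mentioned above is equivalent to the
    area under the ROC curve, and for small samples it can be computed
    directly by comparing every relevant abstract with every irrelevant
    one. The sketch below does exactly that on made-up scores; counting
    ties as half a correct ranking is a common convention assumed here,
    and the function name pairwise_auc is hypothetical.

```python
def pairwise_auc(relevant_scores, irrelevant_scores):
    """Probability that a random relevant abstract outranks a random
    irrelevant one; equals the area under the ROC curve.

    Assumption: tied scores count as half a correct ranking.
    """
    wins = 0.0
    for r in relevant_scores:
        for i in irrelevant_scores:
            if r > i:
                wins += 1.0
            elif r == i:
                wins += 0.5
    return wins / (len(relevant_scores) * len(irrelevant_scores))


# Hypothetical scores: 5 of the 6 possible pairs are ranked correctly.
auc = pairwise_auc([2.1, 1.4, 0.3], [0.9, -0.5])
```

    A value of 1.0 would mean every relevant abstract outranks every
    irrelevant one, while 0.5 corresponds to random ordering.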

    System Requirements  

    Web based

    Manufacturer   

    MedlineRanker was created by the members of the Computational
    Biology and Data Mining (CBDM) group of Miguel Andrade at the Max
    Delbrück Center for Molecular Medicine, Berlin.

    Computational Biology and Data Mining Group
    Max Delbrück Center for Molecular Medicine
    Robert-Rössle-Str. 10
    13125 Berlin, Germany
    Fax: +49-30-9406-4240

    Manufacturer's Web Site   

    http://cbdm.mdc-berlin.de/~medlineranker/

    Price   Contact manufacturer

    G6G Product Number  20482

    G6G Manufacturer Number 104107
The G6G Directory of Omics and Intelligent Software