Molegro Data Modeller

Category Cross-Omics>Data/Text Mining Systems/Tools

Abstract Molegro Data Modeller (MDM) is a cross-platform application for Data Mining, Data Modeling, and Data Visualization.

Its highly interactive user interface is designed for fast and intuitive data exploration, as opposed to complex workflow-based solutions or command-driven statistical products.

Molegro Data Modeller features/capabilities include:

Regression and Classification --

Molegro Data Modeller offers different types of data modeling:

1) 'Multiple Linear Regression' models simple linear relations between data, and is fast and efficient.

2) 'Partial Least Squares' reduces the dimensionality of the data set before creating a model. It is suitable for data sets with many independent variables.

3) 'Neural Networks' (NNs) are able to model highly non-linear relations.

4) 'Support Vector Machines' are also able to model complex relations and tend to be less prone to over-fitting than Neural Networks.

5) 'K-Nearest-Neighbors' is available for simple classification.

Feature Selection and Cross-Validation --

1) 'Feature selection' is easy to set up in the regression wizard: different schemes can be chosen (Forward, Backward, and Hill Climber selection) and combined with different model selection criteria (Bayesian Information Criterion or cross-validated R^2).

Different descriptor rankings can be employed when searching the descriptors.

2) 'Cross-validation' is just as easy.

Cross-validate by using a specified number of random folds, by using Leave-One-Out or by manually creating folds.
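
The fold schemes mentioned above can be sketched as follows. This is an illustrative outline only, not MDM's code: the helper names (`random_folds`, `leave_one_out`, `train_test_splits`, `cross_validated_r2`) are hypothetical, and the cross-validated R^2 is assumed to be the usual 1 - PRESS / total sum of squares.

```python
import random


def random_folds(n_samples, n_folds, seed=0):
    """Shuffle the row indices and deal them into n_folds groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def leave_one_out(n_samples):
    """Leave-One-Out is the special case of one row per fold."""
    return [[i] for i in range(n_samples)]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out each fold in turn."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def cross_validated_r2(y_true, y_pred):
    """1 - PRESS / SS_tot, where y_pred holds the held-out predictions."""
    mean = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - press / ss_tot
```

Manually created folds fit the same shape: any list of index lists can be passed to `train_test_splits`.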

Visualization --

The different visualization types are highly interactive. Selections in the spreadsheet are directly shown in the plots and vice versa.

It is also possible to apply different user-defined coloring schemes and apply jitter (add artificial noise to the data plots).

High-dimensional data can be visualized as well: using the built-in Spring-mass Map model, it can be projected onto 2D or 3D plots.

Chemistry --

Molegro Data Modeller supports chemical data: MDM understands SMILES and SDF files and can create 2D depictions of molecules directly in the spreadsheet or in the 2D plotter.

Clustering --

Molegro Data Modeller offers different kinds of clustering: K-means clustering and threshold-based clustering (both very efficient), and a density-based clustering scheme (which is able to capture more complex cluster shapes).
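
For readers unfamiliar with K-means, here is a minimal sketch of the standard iteration (assign each point to its nearest center, then move each center to the mean of its members). It is a textbook version under simple assumptions, not MDM's implementation; the function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster a list of equal-length tuples; return (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every point
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
```

K-means favors compact, roughly spherical clusters, which is why a density-based scheme is the better fit for more complex cluster shapes.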

Principal Component Analysis (PCA) --

Principal Component Analysis is a method for reducing the dimensionality of a dataset. A new set of 'principal components' is created using linear combinations of the original descriptors.

The dimensionality is then reduced by keeping only the principal components that account for most of the variance.
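
As a rough illustration of the idea, the sketch below finds the first principal component of a small data set by power iteration on the covariance matrix. It is a simplified stand-in for a full PCA (which would extract all components), not MDM's algorithm; the function name and the sample data are hypothetical, and the method assumes the starting vector is not orthogonal to the leading component.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, explained_variance_ratio, scores) for the leading PC."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the descriptors
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # assumed not orthogonal to the leading component
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, relative to the total variance (trace of cov)
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    # Each row's coordinate on the new component
    scores = [sum(c * x for c, x in zip(r, v)) for r in centered]
    return v, variance / total, scores


# Hypothetical data lying on the line y = 2x: one component explains everything
rows = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, ratio, scores = first_principal_component(rows)
```

The component's direction is a linear combination of the original descriptors, and the explained-variance ratio is what guides how many components to keep.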

Algebraic Data Transformations --

It is possible to work with 'algebraic transformations' directly on columns: for instance, "New Activity = log (Act) + Beta^2" will create a new column based on the expression.
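
Outside MDM, the same kind of column expression can be sketched in a few lines. The table layout and the `add_column` helper below are hypothetical, the sample values are made up, and `log` is assumed to mean the natural logarithm, which the expression above does not specify.

```python
import math

# Hypothetical table stored as column name -> list of values
table = {
    "Act": [1.0, 10.0, 100.0],
    "Beta": [0.5, 1.0, 2.0],
}


def add_column(table, name, expr):
    """Evaluate expr on each row (a dict of column values) and store the result."""
    n_rows = len(next(iter(table.values())))
    table[name] = [
        expr({col: values[i] for col, values in table.items()})
        for i in range(n_rows)
    ]


# New Activity = log(Act) + Beta^2, with log taken as the natural logarithm
add_column(table, "New Activity", lambda row: math.log(row["Act"]) + row["Beta"] ** 2)
```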

Outlier Detection --

Molegro Data Modeller provides two (2) methods for locating abnormal data:

1) A quartile-based method which checks how far away a data point is from the 25th and 75th percentiles. This method examines each descriptor individually.

2) A density-based method which calculates a local density for each data point. Data points with a low density are far away from other data points and could be outliers.
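
The quartile-based idea can be sketched for a single descriptor as follows. The exact rule MDM applies is not stated here, so this example assumes the common Tukey convention of flagging values more than 1.5 interquartile ranges beyond the quartiles; the function name, the `factor` parameter, and the sample values are hypothetical.

```python
import statistics


def quartile_outliers(values, factor=1.5):
    """Return indices of values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical descriptor column with one obviously abnormal value
column = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = quartile_outliers(column)
```

Because each descriptor is examined on its own, a row that is unusual only in its combination of values would not be flagged this way; that is the case the density-based method addresses.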

Advanced Subset Creation --

Molegro Data Modeller offers a grid-based method for creating a diverse subset of a dataset.

It is possible to create grids in an arbitrary number of dimensions, and if you are working with 2D and 3D grids they can be visualized directly in the data plotters.
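
One way a grid-based diverse subset could work is sketched below: overlay a regular grid on the data and keep one representative per occupied cell. How MDM picks the representative is not described here, so this example simply keeps the first point seen in each cell; the function name, the `cell_size` parameter, and the sample points are hypothetical.

```python
def grid_subset(points, cell_size):
    """Return sorted indices of one point per occupied grid cell (any dimension)."""
    n_dims = len(points[0])
    mins = [min(p[j] for p in points) for j in range(n_dims)]
    chosen = {}
    for idx, p in enumerate(points):
        # Integer cell coordinates of this point along every dimension
        cell = tuple(int((p[j] - mins[j]) // cell_size) for j in range(n_dims))
        chosen.setdefault(cell, idx)  # keep the first point seen in each cell
    return sorted(chosen.values())


# Hypothetical 2D data: two tight pairs plus one isolated point
points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
subset = grid_subset(points, cell_size=1.0)
```

Dense regions contribute no more points than sparse ones, which is what makes the resulting subset diverse; a larger `cell_size` yields a smaller subset.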

MDM additional features/capabilities include:

1) Scrambling (shuffling) of columns and "replace with random values" for performing y-Randomization.

2) Data preparation: scaling, normalization, repair of missing values.

3) Statistical measures: Pearson and Spearman correlation, Confusion matrices, F-measures, and many others.

4) Correlation Matrix.

5) Cross-term generation.

6) Custom Data Views and Grid Molecule Depictions.

7) Similarity Browser (Euclidean, Manhattan, Cosine, and Tanimoto measures).

8) Gnuplot export (for creating and customizing publication-quality plots). Gnuplot is a command-line program that can generate two- and three-dimensional plots of functions and data.

9) Online Help and automatic checks for software updates.
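
For reference, the four measures named under the Similarity Browser in the list above can be sketched for numeric descriptor vectors as follows. These are common textbook definitions and may differ in detail from what MDM computes; in particular, the Tanimoto coefficient is shown in its continuous-vector form, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)
```

Euclidean and Manhattan are distances (smaller means more similar), while cosine and Tanimoto are similarities (larger means more similar), so rankings based on them run in opposite directions.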

System Requirements

Molegro Data Modeller works with:

Manufacturer

Manufacturer Web Site Molegro Data Modeller

Price Contact manufacturer.

G6G Abstract Number 20388

G6G Manufacturer Number 104025