## SVMlight

**Category** Intelligent Software > Neural Network Systems/Tools

**Abstract** SVMlight is an implementation of Support Vector Machines (SVMs) in the C programming language.

It is an implementation of Vapnik's Support Vector Machine for pattern recognition (classification), regression, and learning a ranking function.
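All three settings share the same decision machinery: a kernel expansion over support vectors. The following is a minimal sketch of that idea, not SVMlight's code; the support vectors and coefficients are made-up numbers for illustration.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """SVM decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# Toy model: two support vectors with made-up coefficients.
svs = [(0.0, 0.0), (1.0, 1.0)]
alphas = [1.0, 1.0]
labels = [-1, +1]
b = 0.0

print(decision((0.9, 0.9), svs, alphas, labels, b))  # positive, so class +1
```

The sign of the decision value gives the predicted class; for regression and ranking the raw value is used directly.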

The optimization algorithms used in SVMlight are described in [Thorsten Joachims, Learning to Classify Text Using Support Vector Machines. Dissertation, Kluwer, 2002] and [T. Joachims, Making Large-Scale SVM Learning Practical. In: Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola (eds.), MIT Press, 1999].

The algorithm has scalable memory requirements and can efficiently handle problems with many thousands of support vectors.

The software also provides methods for efficiently assessing the generalization performance.

It includes two efficient estimation methods for both error rate and precision/recall.

XiAlpha-estimates can be computed at essentially no computational expense, but they are conservatively biased.

Almost unbiased estimates are provided by leave-one-out testing.

SVMlight exploits the fact that the results of most leave-one-out tests (often more than 99%) are predetermined and need not be computed.
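For context, plain leave-one-out testing retrains once per example. The sketch below illustrates that baseline procedure on a toy 1-NN classifier (not SVMlight's pruned computation, which skips most of these retraining rounds):

```python
def nearest_neighbor_predict(train, query):
    """Predict the label of `query` from its nearest training point (1-NN)."""
    _, label = min(train,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))
    return label

def leave_one_out_error(data):
    """Fraction of points misclassified when each is held out in turn."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # retrain without example i
        if nearest_neighbor_predict(rest, x) != y:
            errors += 1
    return errors / len(data)

data = [((0.0,), -1), ((0.2,), -1), ((1.0,), +1), ((1.2,), +1)]
print(leave_one_out_error(data))  # 0.0 on this well-separated toy set
```

Running n retrainings is what makes naive leave-one-out expensive; SVMlight's contribution is proving most of the n outcomes in advance.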

New in this version is an algorithm for learning ranking functions. The goal is to learn a function from preference examples so that it orders a new set of objects as accurately as possible.

Such ranking problems naturally occur in applications like 'search engines' and recommender systems.
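The standard construction behind Ranking SVMs is to turn each preference pair into a classification example on the feature difference of the two objects; a linear SVM on these differences then yields a ranking function. A minimal sketch of that transformation (the object features and preference pair are invented for illustration):

```python
def pairwise_examples(objects, preferences):
    """Turn preference pairs (i preferred over j) into classification
    examples on feature differences: x_i - x_j gets label +1."""
    examples = []
    for i, j in preferences:
        diff = tuple(a - b for a, b in zip(objects[i], objects[j]))
        examples.append((diff, +1))
        # The mirrored difference gives a balancing negative example.
        examples.append((tuple(-d for d in diff), -1))
    return examples

objects = {"a": (3.0, 1.0), "b": (1.0, 2.0)}
prefs = [("a", "b")]  # "a" should rank above "b"
print(pairwise_examples(objects, prefs))
```

Any weight vector that classifies these difference vectors correctly scores "a" above "b", which is exactly the ordering the preference demanded.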

Furthermore, this version includes an algorithm for training large-scale transductive SVMs. The algorithm proceeds by solving a sequence of optimization problems, lower-bounding the solution using a form of local search.

A detailed description of the algorithm can be found in [Thorsten Joachims, Transductive Inference for Text Classification using Support Vector Machines, International Conference on Machine Learning (ICML) 1999].

A similar transductive learner, which can be thought of as a transductive version of k-Nearest Neighbor (k-NN), is the Spectral Graph Transducer (SGT).

The SGT is a method for transductive learning. It solves a normalized-cut (or ratio-cut) problem, with additional constraints for the labeled examples, using spectral methods.

SVMlight can also train SVMs with 'cost models'.
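A cost model in this sense weights errors asymmetrically, so that some mistakes are penalized more than others. One way to picture example-dependent costs is as a per-example weight on the hinge loss; this is an illustrative sketch under that reading, not SVMlight's actual objective or code:

```python
def weighted_hinge_loss(w, b, data):
    """Sum of cost_i * max(0, 1 - y_i * (w . x_i + b)).
    Each training example carries its own misclassification cost."""
    total = 0.0
    for x, y, cost in data:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += cost * max(0.0, 1.0 - margin)
    return total

# A mistake on the second example is five times as costly as on the first.
data = [((1.0, 0.0), -1, 1.0), ((0.0, 1.0), +1, 5.0)]
print(weighted_hinge_loss((0.0, 0.0), 0.0, data))  # 1.0*1 + 5.0*1 = 6.0
```

Minimizing such a weighted objective pushes the learned hyperplane to protect the expensive examples first.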

SVMlight has been used on a large range of problems, including text classification, image recognition tasks, bioinformatics and medical applications.

Many tasks have sparse instance vectors. SVMlight exploits this property, which leads to a very compact and efficient representation.
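The payoff of sparsity is easiest to see in the inner products that dominate SVM training: only the nonzero entries need to be stored or touched. A minimal sketch (my own illustration, not SVMlight's data structures):

```python
def sparse_dot(u, v):
    """Dot product of sparse vectors stored as {index: value} dicts.
    Iterating over the smaller dict keeps the cost proportional to
    the number of nonzero entries, not the full dimensionality."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v[idx] for idx, val in u.items() if idx in v)

u = {1: 2.0, 40000: 1.0}  # only 2 of tens of thousands of features are nonzero
v = {1: 3.0, 7: 5.0}
print(sparse_dot(u, v))  # 6.0
```

In text classification, for example, a document touches only a few hundred of hundreds of thousands of vocabulary features, so this representation is dramatically smaller than a dense array.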

The main features/capabilities of SVMlight are as follows:

1) Fast optimization algorithm - a) working set selection based on steepest feasible descent; b) "shrinking" heuristic; c) caching of kernel evaluations; and d) use of folding in the linear case.

2) Solves classification and regression problems. For multivariate and structured outputs, use SVMstruct (see below).

3) Solves ranking problems (e.g., learning retrieval functions in the STRIVER search engine).

4) Computes XiAlpha-estimates of the error rate, the precision, and the recall.

5) Efficiently computes Leave-One-Out estimates of the error rate, the precision, and the recall.

6) Includes an algorithm for approximately training large transductive SVMs (TSVMs) (see above).

7) Can train SVMs with cost models and example dependent costs.

8) Allows restarts from a specified vector of dual variables.

9) Handles many thousands of support vectors.

10) Handles several hundred thousand training examples.

11) Supports standard kernel functions and lets you define your own.

12) Uses 'sparse vector' representation.
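The sparse representation of feature 12 is visible directly in SVMlight's input files: each example is one line consisting of a target value followed by `index:value` pairs in ascending index order, with zero-valued features omitted. A small helper sketching that line format (the function itself is my own, hypothetical; only the format is SVMlight's):

```python
def to_svmlight_line(target, features):
    """Format one example in SVMlight's sparse input format:
    '<target> <index>:<value> ...' with feature indices ascending
    and zero-valued features omitted."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in sorted(features.items()) if v != 0)
    return f"{target} {pairs}"

print(to_svmlight_line(+1, {3: 0.5, 1: 2.0, 7: 0.0}))  # "1 1:2 3:0.5"
```

For classification the target is +1 or -1, for regression a real number, and in ranking mode an ordinal rank within a query.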

The following features/capabilities are new in this version:

1) SVMstruct: SVM learning for multivariate and structured outputs like trees, sequences, and sets.

2) SVMperf: New training algorithm for linear classification SVMs that can be much faster than SVMlight for large datasets.

It also lets you directly optimize multivariate performance measures like F1-Score, ROC-Area, and the Precision/Recall Break-Even Point.

3) SVMrank: New algorithm for training Ranking SVMs that is much faster than SVMlight in '-z p' mode.

**System Requirements**

Contact manufacturer.

**Manufacturer**

- Thorsten Joachims, Associate Professor
- Cornell University
- Department of Computer Science
- USA
- E-mail: thorsten@joachims.org

Initially developed at the AI Unit, Department of Computer Science (Informatik), University of Dortmund, Germany.

**Manufacturer Web Site**
SVMlight

**Price** Contact manufacturer.

**G6G Abstract Number** 20392

**G6G Manufacturer Number** 104028