# SVM Classification Example Essay

If you have used machine learning to perform classification, you might have heard about *Support Vector Machines (SVM)*. Introduced a little more than 50 years ago, they have evolved over time and have also been adapted to various other problems like *regression, outlier analysis,* and *ranking*.

SVMs are a favorite tool in the arsenal of many machine learning practitioners. At [24]7, we too use them to solve a variety of problems.

In this post, we will try to gain a high-level understanding of how SVMs work. I’ll focus on developing intuition rather than rigor. What that essentially means is we will skip as much of the math as possible and develop a strong intuition of the working principle.

**The Problem of Classification**

Say there is a machine learning (ML) course offered at your university. The course instructors have observed that students get the most out of it if they are good at Math or Stats. Over time, they have recorded the scores of the enrolled students in these subjects. Also, for each of these students, they have a label depicting their performance in the ML course: “Good” or “Bad.”

Now they want to determine the relationship between Math and Stats scores and the performance in the ML course. Perhaps, based on what they find, they want to specify a prerequisite for enrolling in the course.

How would they go about it? Let’s start with representing the data they have. We could draw a two-dimensional plot, where one axis represents scores in Math, while the other represents scores in Stats. A student with certain scores is shown as a point on the graph.

The color of the point — green or red — represents how the student did in the ML course: “Good” or “Bad” respectively.

This is what such a plot might look like:

When a student requests enrollment, our instructors would ask her to supply her Math and Stats scores. Based on the data they already have, they would make an informed guess about her performance in the ML course.

What we essentially want is some kind of an “algorithm,” to which you feed in the “score tuple” of the form *(math_score, stats_score)*. It tells you whether the student is a red or green point on the plot (red/green is alternatively known as a *class* or *label*). And of course, this algorithm embodies, in some manner, the patterns present in the data we already have, also known as the *training data*.

In this case, finding a line that passes between the red and green clusters, and then determining which side of this line a score tuple lies on, is a good algorithm. We take a side — the green side or the red side — as being a good indicator of her most likely performance in the course.
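This side-of-the-line check is easy to sketch in code. A minimal sketch in Python, where the line’s weights and threshold are made-up numbers rather than anything learned from data:

```python
# Classify a (math_score, stats_score) tuple by which side of the line
# w[0]*math + w[1]*stats + b = 0 it falls on. The weights and threshold
# here are arbitrary placeholders, not learned from data.
def classify(score_tuple, w=(1.0, 1.0), b=-120.0):
    math_score, stats_score = score_tuple
    value = w[0] * math_score + w[1] * stats_score + b
    return "green" if value > 0 else "red"  # "Good" vs "Bad"

print(classify((80, 70)))  # -> green (80 + 70 - 120 > 0)
print(classify((40, 50)))  # -> red  (40 + 50 - 120 < 0)
```
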

The line here is our *separating boundary* (because it separates out the labels) or *classifier* (we use it to classify points). The figure shows two possible classifiers for our problem.

**Good vs Bad Classifiers**

Here’s an interesting question: both lines above separate the red and green clusters. Is there a good reason to choose one over another?

Remember that the worth of a classifier is not in how well it separates the training data. We eventually want it to classify yet-unseen data points (known as *test data*). Given that, we want to choose a line that captures the *general pattern* in the training data, so there is a good chance it does well on the test data.

The first line above seems a bit “skewed.” Near its lower half it seems to run too close to the red cluster, and in its upper half it runs too close to the green cluster. Sure, it separates the training data perfectly, but if it sees a test point that’s a little farther out from the clusters, there is a good chance it would get the label wrong.

The second line doesn’t have this problem. For example, look at the test points shown as squares and the labels assigned by the classifiers in the figure below.

The second line stays as far away as possible from both the clusters while getting the training data separation right. By being right in the middle of the two clusters, it is less “risky,” gives the data distributions for each class some wiggle room so to speak, and thus generalizes well on test data.

SVMs try to find the second kind of line. We selected the better classifier visually, but we need to define the underlying philosophy a bit more precisely to apply it in the general case. Here’s a simplified version of what SVMs do:

- Find lines that correctly classify the training data
- Among all such lines, pick the one that has the greatest distance to the points closest to it.
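The two steps above can be sketched directly. This toy version checks hand-picked candidate lines instead of searching over all possible lines, which a real SVM solver does far more cleverly:

```python
# Toy version of the SVM selection rule: among candidate lines that
# classify everything correctly, keep the one whose closest point is
# farthest away. A line (a, b, c) means a*x + b*y + c = 0; labels are
# +1/-1 by which side of the line a point falls on.

def distance(line, p):
    a, b, c = line
    return abs(a * p[0] + b * p[1] + c) / (a * a + b * b) ** 0.5

def separates(line, points, labels):
    a, b, c = line
    return all(y * (a * p[0] + b * p[1] + c) > 0 for p, y in zip(points, labels))

def best_line(candidates, points, labels):
    valid = [l for l in candidates if separates(l, points, labels)]
    # maximize the distance to the closest point (the "margin" idea)
    return max(valid, key=lambda l: min(distance(l, p) for p in points))

points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5)]
labels = [-1, -1, -1, 1, 1, 1]
# two candidate separators: one skewed close to a cluster, one down the middle
skewed = (1.0, 0.0, -2.5)   # vertical line x = 2.5
middle = (1.0, 1.0, -7.0)   # line x + y = 7
print(best_line([skewed, middle], points, labels))  # -> (1.0, 1.0, -7.0)
```

Both candidates separate the training data, but the middle line’s closest point is much farther away, so it wins.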

The closest points that identify this line are known as *support vectors*. And the region they define around the line is known as the *margin*.

Here’s the second line shown with the support vectors: points with black edges (there are two of them) and the margin (the shaded region).

Support Vector Machines give you a way to pick between many possible classifiers in a way that guarantees a higher chance of correctly labeling your test data. Pretty neat, right?

While the above plot shows a line and data in two dimensions, SVMs work in any number of dimensions; in these dimensions, they find the analogue of the two-dimensional line.

For example, in three dimensions they find a *plane* (we will see an example of this shortly), and in higher dimensions they find a *hyperplane* — a generalization of the two-dimensional line and three-dimensional plane to an arbitrary number of dimensions.

Data that can be separated by a line (or in general, a hyperplane) is known as *linearly separable* data. The hyperplane acts as a *linear classifier*.

**Allowing for Errors**

We looked at the easy case of perfectly linearly separable data in the last section. Real-world data is, however, typically messy. You will almost always have a few instances that a linear classifier can’t get right.

Here’s an example of such data:

Clearly, if we are using a linear classifier, we are never going to be able to perfectly separate the labels. We also don’t want to discard the linear classifier altogether because it does seem like a good fit for the problem except for a few errant points.

How do SVMs deal with this? They allow you to specify how many errors you are willing to accept.

You can provide a parameter called “C” to your SVM; this allows you to dictate the tradeoff between:

- Having a wide margin.
- Correctly classifying the **training** data.

A higher value of C implies you want fewer errors on the training data.

It bears repeating that this is a **tradeoff**. You get better classification of training data at the *expense* of a wide margin.

The following plots show how the classifier and the margin vary as we increase the value of C (support vectors not shown):

Note how the line “tilts” as we increase the value of C. At high values, it tries to accommodate the labels of most of the red points present at the bottom right of the plots. This is probably not what we want for test data. The first plot with C=0.01 seems to capture the general trend better, although it suffers from a lower accuracy on the training data compared to higher values for C.

And since this is a trade-off, note how the width of the margin shrinks as we increase the value of C.

In the previous example, the margin was a “no man’s land” for points. Here, we see it’s not possible anymore to have *both *a good separating boundary *and* an associated point-free margin. Some points creep into the margin.

An important practical problem is to decide on a good value of C. Since real-world data is almost never cleanly separable, this need comes up often. We typically use a technique like *cross-validation* to pick a good value for C.
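To see the trade-off numerically, here is a toy sketch. It uses a Pegasos-style subgradient solver as a stand-in for a real SVM library, and a made-up dataset of two separable clusters; the solver, step sizes, and C values are all illustrative assumptions:

```python
import random

# Toy subgradient solver (NOT a production SVM) for the soft-margin objective
#   min  lam/2 * ||w||^2  +  (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)
# with lam = 1/(n*C), so a large C means weak regularization. The bias
# term is omitted for brevity (the data is centered on the origin).
def train_linear_svm(points, labels, C, epochs=2000, seed=0):
    rng = random.Random(seed)
    n = len(points)
    lam = 1.0 / (n * C)
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):   # one pass in random order
            t += 1
            eta = 1.0 / (lam * t)           # standard Pegasos step size
            x, y = points[i], labels[i]
            margin = y * (w[0] * x[0] + w[1] * x[1])
            scale = 1.0 - eta * lam
            w = [scale * w[0], scale * w[1]]
            if margin < 1.0:                # hinge loss is active
                w = [w[0] + eta * y * x[0], w[1] + eta * y * x[1]]
    return w

def margin_width(w):
    # geometric width of the band |<w, x>| <= 1 is 2 / ||w||
    return 2.0 / (w[0] ** 2 + w[1] ** 2) ** 0.5

# two separable clusters on the x-axis
rng = random.Random(42)
points = [(-2 + rng.uniform(-0.5, 0.5), rng.uniform(-1, 1)) for _ in range(10)]
points += [(2 + rng.uniform(-0.5, 0.5), rng.uniform(-1, 1)) for _ in range(10)]
labels = [-1] * 10 + [1] * 10

wide = margin_width(train_linear_svm(points, labels, C=0.001))
narrow = margin_width(train_linear_svm(points, labels, C=100.0))
print(wide, narrow)  # the margin shrinks as C grows
```

With C = 0.001 the regularizer dominates and the margin stays wide; with C = 100 the hinge penalty dominates, ||w|| grows, and the margin 2/||w|| shrinks.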

**Non-linearly Separable Data**

We have seen how Support Vector Machines systematically handle perfectly or almost linearly separable data. How do they handle cases where the data is absolutely not linearly separable? After all, a lot of real-world data falls in this category. Surely, finding a hyperplane can’t work anymore. This seems unfortunate, given that finding hyperplanes is exactly what SVMs excel at.

Here’s an example of non-linearly separable data (this is a variant of the famous XOR dataset), shown with the linear classifier SVMs find:

You’d agree this doesn’t look great. We have only 75% accuracy on the training data, the best possible with a line, and even to achieve that, the line nearly straddles a few of the points.

We need to do better.

This is where one of my favorite bits about SVMs comes in. Here’s what we have so far: we have a technique that is really good at finding hyperplanes. But then we also have data that is not linearly separable. So what do we do? Project the data into a space where it *is* linearly separable and find a hyperplane in this space!

I’ll illustrate this idea one step at a time.

We start with the dataset in the above figure, and project it into a three-dimensional space where the new coordinates are:

*X*_{1} = *x*_{1}^{2},  *X*_{2} = *x*_{2}^{2},  *X*_{3} = √2 *x*_{1}*x*_{2}

This is what the projected data looks like. Do you see a place where we just might be able to slip in a plane?

Let’s run our SVM on it:

Bingo! We have perfect label separation! Let’s project the plane back to the original two-dimensional space and see what the separation boundary looks like:

100% accuracy on the training data *and* a separating boundary that doesn’t run too close to the data! Yay!
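The whole trick can be sketched in a few lines. This assumes the quadratic map (x₁², x₂², √2·x₁·x₂), the classic projection that separates XOR-style data, and a hand-picked plane in place of a trained SVM:

```python
import math

# Project 2D XOR-style points into 3D with the quadratic map
# (x1, x2) -> (x1^2, x2^2, sqrt(2)*x1*x2). The plane below is chosen by
# hand for illustration; a real SVM would find it from the data.
def project(p):
    x1, x2 = p
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

points = [(1, 1), (2, 1), (-1, -1), (-2, -1),   # label +1 (x1*x2 > 0)
          (1, -1), (2, -1), (-1, 1), (-2, 1)]   # label -1 (x1*x2 < 0)
labels = [1, 1, 1, 1, -1, -1, -1, -1]

# No line separates these clusters in 2D, but in the projected space the
# plane X3 = 0 does:
projected = [project(p) for p in points]
predictions = [1 if x3 > 0 else -1 for (_, _, x3) in projected]
print(predictions == labels)  # -> True: perfectly separated by a plane
```
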

The shape of the separating boundary in the original space depends on the projection. In the projected space, this is *always* a hyperplane.

Remember the primary goal of projecting the data was to put the hyperplane-finding superpowers of SVMs to use.

When you map it back to the original space, the separating boundary is not a line anymore. This is also true for the margin and support vectors. As far as our visual intuition goes, they make sense in the projected space.

Take a look at what they look like in the projected space, and then in the original space. The 3D margin is the region (not shaded to avoid visual clutter) between the planes above and below the separating hyperplane.

There are 4 support vectors in the projected space, which seems reasonable. They sit on the two planes that identify the margin. In the original space, they are still on the margin, but there don’t seem to be enough of them.

Let’s step back and analyze what happened:

**1. How did I know what space to project the data onto?**

It seems I was being extremely specific — there is a square root of 2 in there somewhere!

In this case, I wanted to show how projections to higher dimensions work, so I picked a very specific projection. In general, this is hard to know. However, what we do know is data is more *likely* to be linearly separable when projected onto higher dimensions, thanks to Cover’s theorem.

In practice, we try out a few high-dimensional projections to see what works. In fact, we can project data onto *infinite* dimensions and that often works pretty well. This deserves going into some detail and that’s what the next section is about.

**2. So I project the data first and then run the SVM?**

No. To make the above example easy to grasp I made it sound like we need to project the data first. The fact is you ask the SVM to do the projection for you. This has some benefits. For one, SVMs use something called *kernels* to do these projections, and these are pretty fast (for reasons we shall soon see).

Also, remember I mentioned projecting to infinite dimensions in the previous point? If you project the data yourself, how do you represent or store infinite dimensions? It turns out SVMs are very clever about this, courtesy of kernels again.

It’s about time we looked at kernels.

**Kernels**

Finally, the secret sauce that makes SVMs tick. This is where we need to look at a bit of math.

Let’s take stock of what we have seen so far:

- For linearly separable data SVMs work amazingly well.
- For data that’s almost linearly separable, SVMs can still be made to work pretty well by using the right value of C.
- For data that’s not linearly separable, we can project data to a space where it is perfectly/almost linearly separable, which reduces the problem to 1 or 2 and we are back in business.

It looks like a big part of what makes SVMs universally applicable is the ability to project data to higher dimensions. And this is where kernels come in.

First, a slight digression.

A very surprising aspect of SVMs is that in all of the mathematical machinery it uses, the exact projection, or even the number of dimensions, doesn’t show up. You could write all of it in terms of the *dot products* between various data points (represented as vectors). For *p*-dimensional vectors *x*_{i} and *x*_{j}, where the first subscript identifies the point and the second the dimension number:

*x*_{i} = (*x*_{i1}, *x*_{i2}, ..., *x*_{ip})

The dot product is defined as:

*x*_{i} · *x*_{j} = *x*_{i1}*x*_{j1} + *x*_{i2}*x*_{j2} + ... + *x*_{ip}*x*_{jp}

If we have *n* points in our dataset, the SVM needs *only* the dot product of each pair of points to find a classifier. Just that. This is also true when we want to project data to higher dimensions. We don’t need to provide the SVM with exact projections; we need to give it the dot products between all pairs of points in the projected space.

This is relevant because this is exactly what kernels do. A kernel, short for *kernel function*, takes as input two points in the original space, and directly gives us the dot product in the projected space.

Let’s revisit the projection we did before, and see if we can come up with a corresponding kernel. We will also track the number of computations we need to perform for the projection and then finding the dot products — to see how using a kernel compares.

For a point *i*:

*x*_{i} = (*x*_{i1}, *x*_{i2})

Our corresponding projected point was:

*X*_{i} = (*x*_{i1}^{2}, *x*_{i2}^{2}, √2 *x*_{i1}*x*_{i2})

To compute this projection we need to perform the following operations:

- To get the new first dimension: 1 multiplication
- Second dimension: 1 multiplication
- Third dimension: 2 multiplications

In all, 1+1+2 =** 4 multiplications**.

The dot product in the projected space is:

*X*_{i} · *X*_{j} = *x*_{i1}^{2}*x*_{j1}^{2} + *x*_{i2}^{2}*x*_{j2}^{2} + 2*x*_{i1}*x*_{i2}*x*_{j1}*x*_{j2}

To compute this dot product for two points *i *and *j*, we need to compute their projections first. So that’s 4+4 = 8 multiplications, and then the dot product itself requires 3 multiplications and 2 additions.

In all, that’s:

- Multiplications: 8 (for the projections) + 3 (in the dot product) = 11 multiplications
- Additions: 2 (in the dot product)

Which is total of 11 + 2 = **13 operations**.

I claim this kernel function gives me the same result:

*K*(*x*_{i}, *x*_{j}) = (*x*_{i} · *x*_{j})^{2}

We take the dot product of the vectors in the original space *first*, and then square the result.

Let’s expand it out and check whether my claim is indeed true:

*K*(*x*_{i}, *x*_{j}) = (*x*_{i} · *x*_{j})^{2} ... (1)

= (*x*_{i1}*x*_{j1} + *x*_{i2}*x*_{j2})^{2} ... (2)

= *x*_{i1}^{2}*x*_{j1}^{2} + *x*_{i2}^{2}*x*_{j2}^{2} + 2*x*_{i1}*x*_{j1}*x*_{i2}*x*_{j2} ... (3)

It is; step (3) is exactly the dot product we computed in the projected space. How many operations does this need? Look at step (2) above. To compute the dot product in two dimensions I need 2 multiplications and 1 addition. Squaring it is another multiplication.
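The identity is easy to verify numerically. A quick sketch, with arbitrarily chosen points:

```python
import math

# Check that squaring the 2D dot product equals the dot product of the
# 3D projections (x1^2, x2^2, sqrt(2)*x1*x2).
def kernel(xi, xj):
    return (xi[0] * xj[0] + xi[1] * xj[1]) ** 2   # 3 multiplications, 1 addition

def project(x):
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

def dot3(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

xi, xj = (3.0, 5.0), (-2.0, 4.0)
lhs = kernel(xi, xj)                  # (3*-2 + 5*4)^2 = 14^2 = 196
rhs = dot3(project(xi), project(xj))
print(lhs, rhs)  # identical up to floating-point error
```
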

So, in all:

- Multiplications: 2 (for the dot product in the original space) + 1 (for squaring the result) = 3 multiplications
- Additions: 1 (for the dot product in the original space)

A total of 3 + 1 = **4 operations**. This is only **31% of the operations** we needed before.

It looks like it is faster to use a kernel function to compute the dot products we need. It might not seem like a big deal here: we’re looking at 4 vs 13 operations, but with input points with a lot more dimensions, and with the projected space having an even higher number of dimensions, the computational savings for a large dataset add up incredibly fast. So that’s one huge advantage of using kernels.

Most SVM libraries already come pre-packaged with some popular kernels like *Polynomial, Radial Basis Function (RBF)*, and *Sigmoid*. When we don’t use a projection (as in our first example in this article), we compute the dot products in the original space — this we refer to as using the *linear kernel*.

Many of these kernels give you additional levers to further tune them for your data. For example, the polynomial kernel:

*K*(*x*_{i}, *x*_{j}) = (*x*_{i} · *x*_{j} + *c*)^{d}

allows you to pick the values of *c* and *d* (the degree of the polynomial). For the 3D projection above, I had used a polynomial kernel with *c* = 0 and *d* = 2.
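As a function, this kernel is one line (the parameter names below are mine):

```python
# Sketch of the polynomial kernel K(xi, xj) = (xi . xj + c)^d.
# With c = 0 and d = 2 it reduces to the quadratic kernel used for the
# 3D projection earlier in this article.
def poly_kernel(xi, xj, c=0.0, d=2):
    dot = sum(a * b for a, b in zip(xi, xj))
    return (dot + c) ** d

xi, xj = (3.0, 5.0), (-2.0, 4.0)
print(poly_kernel(xi, xj))            # (14)^2   = 196
print(poly_kernel(xi, xj, c=1, d=3))  # (14+1)^3 = 3375
```
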

But we are not done with the awesomeness of kernels yet!

Remember I mentioned projecting to infinite dimensions a while back? If you haven’t already guessed, the way to make it work is to have the right kernel function. That way, we really don’t have to project the input data, or worry about storing infinite dimensions.

A kernel function computes what the dot product would be if you had actually projected the data.

The RBF kernel is commonly used for a *specific *infinite-dimensional projection. We won’t go into the math of it here, but look at the references at the end of this article.

How can we have infinite dimensions but still compute the dot product? If you find this question confusing, think about how we compute the sums of infinite series. This is similar: there are infinite terms in the dot product, but there happens to exist a formula to calculate their sum.
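For the one-dimensional RBF kernel this can be made concrete. The sketch below uses the standard Taylor expansion of exp(2γxy), which yields infinitely many projection coordinates φ_k(x); a handful of terms of the “infinite” dot product already matches the closed form:

```python
import math

# The 1D RBF kernel exp(-gamma*(x - y)^2) as an infinite dot product.
# Expanding exp(2*gamma*x*y) as a Taylor series gives projection
# coordinates phi_k(x) = exp(-gamma*x^2) * sqrt((2*gamma)^k / k!) * x^k
# for k = 0, 1, 2, ...: infinitely many dimensions, yet the full sum has
# a closed form.
def phi(x, k, gamma):
    return (math.exp(-gamma * x * x)
            * math.sqrt((2 * gamma) ** k / math.factorial(k))
            * x ** k)

def truncated_dot(x, y, gamma, terms):
    return sum(phi(x, k, gamma) * phi(y, k, gamma) for k in range(terms))

x, y, gamma = 0.5, -0.3, 1.0
exact = math.exp(-gamma * (x - y) ** 2)
approx = truncated_dot(x, y, gamma, terms=20)
print(exact, approx)  # the partial sums converge to the closed form
```
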

This answers the questions we had asked in the previous section. Let’s summarize:

- We typically don’t define a specific projection for our data. Instead, we pick from available kernels, tweaking them in some cases, to find one best suited to the data.
- Of course, nothing stops us from defining our own kernels, or performing the projection ourselves, but in many cases we don’t need to. Or we at least start by trying out what’s already available.
- If there is a kernel available for the projection we want, we prefer to use the kernel, because that’s often faster.
- RBF kernels can project points to infinite dimensions.

**SVM libraries to get started**

There are quite a few SVM libraries you could start practicing with:

- libSVM
- SVM-Light
- SVMTorch

Many general ML libraries like scikit-learn also offer SVM modules, which are often wrappers around dedicated SVM libraries. My recommendation is to start out with the tried and tested *libSVM*.

libSVM is available as a command-line tool, but the download also bundles Python, Java, and Matlab wrappers. As long as you have a file with your data in a format libSVM understands (the README that’s part of the download explains this, along with other available options) you are good to go.
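As an illustration of that format, a small hypothetical helper (not part of libSVM) can write a dataset into the sparse form the README describes:

```python
# Hypothetical helper that writes data in libSVM's sparse text format:
#   <label> <index1>:<value1> <index2>:<value2> ...
# Indices are 1-based, and zero-valued features are conventionally omitted.
def to_libsvm_line(label, features):
    pairs = ["%d:%g" % (i + 1, v) for i, v in enumerate(features) if v != 0]
    return " ".join([str(label)] + pairs)

rows = [(1, [0.5, 0.0, 1.2]), (-1, [0.0, 2.0, 0.0])]
lines = [to_libsvm_line(label, feats) for label, feats in rows]
print("\n".join(lines))
# 1 1:0.5 3:1.2
# -1 2:2
```
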

In fact, if you need a *really quick *feel of how different kernels, the value of C, etc., influence finding the separating boundary, try out the “Graphical Interface” on their home page. Mark your points for different classes, pick the SVM parameters, and hit Run!

I couldn’t resist and quickly marked a few points:

Yep, I’m not making it easy for the SVM.

Then I tried out a couple of kernels:

The interface doesn’t show you the separating boundary, but shows you the regions that the SVM learns as belonging to a specific label. As you can see, the linear kernel completely ignores the red points. It thinks of the whole space as yellow (-ish green). But the RBF kernel neatly carves out a ring for the red label!

**Helpful resources**

We have been primarily relying on visual intuitions here. While that’s a great way to gain an initial understanding, I’d strongly encourage you to dig deeper. An example of where visual intuition might prove to be insufficient is in understanding margin width and support vectors for non-linearly separable cases.

Remember that these quantities are decided by optimizing a trade-off. Unless you look at the math, some of the results may seem counter-intuitive.

Another area where getting to know the math helps is in understanding kernel functions. Consider the RBF kernel, which I’ve barely introduced in this short article. I hope the “mystique” surrounding it — its relation to an infinite-dimensional projection coupled with the fantastic results on the last dataset (the “ring”) — has convinced you to take a closer look at it.

**Resources I would recommend:**

- Video Lectures: *Learning from Data* by Yaser Abu-Mostafa. Lectures 14 through 16 talk about SVMs and kernels. I’d also highly recommend the whole series if you’re looking for an introduction to ML; it maintains an excellent balance between math and intuition.
- Book: *The Elements of Statistical Learning* by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Chapter 4 introduces the basic idea behind SVMs, while Chapter 12 deals with them comprehensively.

Happy (Machine) Learning!


## Abstract

### Motivation

Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction.

### Results

The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of the Support Vector Machine. We implemented the Java interface using standard Swing libraries.

We used sample data from a breast cancer study to test classification accuracy. We achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with the RBF kernel of the SVM.

### Conclusion

We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance.

The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.

## Background

High-density DNA microarrays measure the activities of several thousand genes simultaneously, and gene expression profiles have recently been used for cancer and other disease classification. The Support Vector Machine (SVM) [1,2] is a supervised learning algorithm, useful for recognizing subtle patterns in complex datasets, and is one of the classification methods successfully applied to diagnosis and prognosis problems. The algorithm performs discriminative classification, learning by example to predict the classifications of previously unclassified data. The SVM was one of the methods successfully applied to the cancer diagnosis problem in previous studies [3,4]. In principle, the SVM can be applied to very high-dimensional data without altering its formulation. Such capacity is well suited to the structure of microarray data.

The popularity of the SVM algorithm comes from four factors [5]. 1) The SVM algorithm has a strong theoretical foundation, based on the ideas of VC (Vapnik-Chervonenkis) dimension and structural risk minimization [2]. 2) The SVM algorithm scales well to relatively large datasets. 3) The SVM algorithm is flexible, due in part to the robustness of the algorithm itself, and in part to its parameterization via a broad class of functions called kernel functions. The behavior of the SVM can be modified to incorporate prior knowledge of a classification task simply by modifying the underlying kernel function. 4) Accuracy: the most important explanation for the popularity of the SVM algorithm is its accuracy. While the underlying theory suggests explanations for the SVM’s excellent learning performance, its widespread application is due in large part to the empirical success the algorithm has achieved [5].

## Implementation

We have developed a simple graphical interface to our implementation of the SVM algorithm, called SVM Classifier. This interface allows novice users to download the software for local installation and easily apply a sophisticated machine learning algorithm to their data. We implemented a publicly accessible application that allows SVM users to perform SVM training, classification and prediction. For details on using the software, sample dataset and explanations of the underlying algorithms, we refer readers to the web site and the references listed there. SVM users might also be interested in a number of other licensed SVM implementations that have been described previously, including LIBSVM [6].

We used the SVM algorithms implemented by the LIBSVM team [6] as the core. In order to maximize cross-platform compatibility, SVM Classifier is implemented in Java using standard Swing libraries (Figure 1).

Figure 1

SVM Classifier interface screen shot.

The open-source, cross-platform Apache Ant and the free edition of Borland JBuilder 2005 Foundation were used as the build tools. Although developed on Windows XP, SVM Classifier has been successfully tested on Linux and other Windows platforms, and will run on Mac OS 9 with the Swing extension. Users are able to run SVM Classifier on any computer with a Java 1.4 or higher runtime.

The application has two frames: the classification frame and the prediction frame. In both frames, data can be imported in either a "Labeled" or a "Delimited" file format (Figure 2 and Figure 3).

Figure 2

**Labelled Data File Format Screenshot**. The format of training and testing data file is: <label> <index1>:<value1> <index2>:<value2> ... <label> is the target**...**

Figure 3

**Delimited File Format Screenshot**. The delimited File Format is a common format for the microarray experiment. It can be extracted from most microarray experiments by using any spreadsheet. The format of training and testing data file is: <label>**...**

In the classification frame, the user creates a model from the training dataset for classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), or distribution estimation.

In this frame, the user can import the training dataset into the application, select the path to save the model file, select the appropriate SVM and kernel type, and create a model for the dataset. The model file can later be used for prediction. There is also an option for cross validation. The cross validation (CV) technique is used to estimate the accuracy of each parameter combination in the specified range and helps us decide the best parameters for the classification problem.

In the prediction frame, the model is applied to the test data to predict the classification of unknown data. We have also provided a tool for viewing two-dimensional data that can be accessed from the view menu bar (Figure 4).

### Kernel Types

*K*(*x*_{i}, *x*_{j}) = Φ(*x*_{i})^{T}Φ(*x*_{j}) is called the kernel function. Here training vectors *x*_{i} are mapped into a higher (possibly infinite) dimensional space by the function Φ. There are four basic kernels: linear, polynomial, radial basis function (RBF), and sigmoid:

1. Linear: *K*(*x*_{i}, *x*_{j}) = *x*_{i}^{T}*x*_{j}

2. Polynomial: The polynomial kernel of degree *d* is of the form

*K*(*x*_{i}, *x*_{j}) = (*x*_{i}^{T}*x*_{j})^{d}

3. RBF: The Gaussian kernel, known also as the radial basis function, is of the form

*K*(*x*_{i}, *x*_{j}) = exp(-||*x*_{i} - *x*_{j}||^{2}/(2*σ*^{2}))

where *σ* stands for a window width.

4. Sigmoid: The sigmoid kernel is of the form

*K*(*x*_{i}, *x*_{j}) = tanh(*k*(*x*_{i}^{T}*x*_{j}) + *ϑ*)

When the sigmoid kernel is used with the SVM one can regard it as a two-layer neural network.
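For illustration, the four kernels above can be sketched in plain Python (parameter names are illustrative):

```python
import math

# Minimal sketches of the four kernel types for plain Python vectors.
# sigma, d, kappa and theta are the free parameters described above.
def linear(xi, xj):
    return sum(a * b for a, b in zip(xi, xj))

def polynomial(xi, xj, d=2):
    return linear(xi, xj) ** d

def rbf(xi, xj, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def sigmoid(xi, xj, kappa=1.0, theta=0.0):
    return math.tanh(kappa * linear(xi, xj) + theta)

x, y = [1.0, 2.0], [0.5, -1.0]
print(linear(x, y), polynomial(x, y), rbf(x, y), sigmoid(x, y))
```
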

### SVM Types

#### 1. C-SVC: C-Support Vector Classification (Binary Case)

Given a training set of instance-label pairs (*x*_{i}, *y*_{i}), i = 1,...,*l*, where *x*_{i} ∈ *R*^{n} and *y* ∈ {1, -1}^{l}, the support vector machine (SVM) requires the solution of the following optimization problem:

min_{w,b,ξ} (1/2)*w*^{T}*w* + *C* Σ_{i=1}^{l} *ξ*_{i}

subject to *y*_{i}(*w*^{T}Φ(*x*_{i}) + *b*) ≥ 1 - *ξ*_{i}, *ξ*_{i} ≥ 0.

SVM finds a linear separating hyperplane with the maximal margin in this higher dimensional space. *C* > 0 is the penalty parameter of the error term [7,2]. The decision function is:

sgn(*w*^{T}Φ(*x*) + *b*) = sgn(Σ_{i=1}^{l} *y*_{i}*α*_{i}*K*(*x*_{i}, *x*) + *b*)

#### 2. nu-SVC: *ν*-Support Vector Classification (Binary Case)

The parameter *ν* ∈ (0, 1) is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors [8]. Given training vectors *x*_{i} ∈ *R*^{n}, i = 1,...,l, in two classes, and a vector *y* ∈ *R*^{l} such that *y*_{i} ∈ {1, -1}, the primal form considered is:

min_{w,b,ξ,ρ} (1/2)*w*^{T}*w* - *νρ* + (1/*l*) Σ_{i=1}^{l} *ξ*_{i}

subject to *y*_{i}(*w*^{T}Φ(*x*_{i}) + *b*) ≥ *ρ* - *ξ*_{i}, *ξ*_{i} ≥ 0, *ρ* ≥ 0.

And the decision function is:

sgn(Σ_{i=1}^{l} *y*_{i}*α*_{i}*K*(*x*_{i}, *x*) + *b*)

#### 3. epsilon-SVR: *ε*-Support Vector Regression (*ε*-SVR)

One extension of the SVM is to the regression task. A regression problem is given whenever *Y* = ℝ for the training data set *Z* = {(*x*_{i}, *y*_{i}) ∈ *X* × *Y* | i = 1,...,M} and our interest is to find a function of the form *f*: *X* → ℝ. The primal formulation for the SVR is then given by:

min_{w,b,ξ,ξ*} (1/2)*w*^{T}*w* + *C* Σ_{i=1}^{M} (*ξ*_{i} + *ξ*_{i}^{*})

subject to (*w*^{T}Φ(*x*_{i}) + *b*) - *y*_{i} ≤ *ε* + *ξ*_{i}, *y*_{i} - (*w*^{T}Φ(*x*_{i}) + *b*) ≤ *ε* + *ξ*_{i}^{*}, *ξ*_{i}, *ξ*_{i}^{*} ≥ 0.

We have to introduce two types of slack variables, *ξ*_{i} and *ξ*_{i}^{*}: one to control the error induced by observations that are larger than the upper bound of the *ε*-tube, and the other for observations smaller than the lower bound. The approximating function is:

*f*(*x*) = Σ_{i=1}^{M} (*α*_{i}^{*} - *α*_{i})*K*(*x*_{i}, *x*) + *b*

#### 4. nu-SVR: *ν*-Support Vector Regression (*ν*-SVR)

Similar to *ν*-SVC, for regression, [8,9] use a parameter *ν *to control the number of support vectors. However, unlike *ν*-SVC where C is replaced by *ν *here *ν *replaces the parameter *ε *of *ε*-SVR. Then the decision function is the same as that of *ε*-SVR.

#### 5. One-class SVM: distribution estimation

One-class classification differs from the standard classification problem in that the training data is not identically distributed to the test data. The dataset contains two classes: one of them, the target class, is well sampled, while the other class is absent or sampled very sparsely. Schölkopf *et al*. [9] have proposed an approach in which the target class is separated from the origin by a hyperplane. The primal form considered is:

min_{w,ξ,ρ} (1/2)*w*^{T}*w* - *ρ* + (1/(*νl*)) Σ_{i=1}^{l} *ξ*_{i}

subject to *w*^{T}Φ(*x*_{i}) ≥ *ρ* - *ξ*_{i}, *ξ*_{i} ≥ 0.

And the decision function is:

sgn(*w*^{T}Φ(*x*) - *ρ*)

### Cross Validation

The goal of using cross validation is to identify good parameters so that the classifier can accurately predict unknown data [6].

A common way is to separate training data into two parts, one of which is considered unknown in training the classifier. Then the prediction accuracy on this set can more precisely reflect the performance on classifying unknown data. An improved version of this procedure is cross-validation.

In v-fold cross-validation, the training set is divided into v subsets of equal size. Sequentially one subset is tested using the classifier trained on the remaining v - 1 subsets. Thus, each instance of the whole training set is predicted once so the cross-validation accuracy is the percentage of data which are correctly classified. The cross-validation procedure can prevent the overfitting problem [6].
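The v-fold partitioning described above can be sketched as:

```python
# Sketch of v-fold partitioning: split n training indices into v nearly
# equal folds; each fold is held out once while the classifier (not
# shown here) trains on the remaining v - 1 folds.
def v_fold_splits(n, v):
    indices = list(range(n))
    folds = [indices[i::v] for i in range(v)]   # round-robin assignment
    for held_out in folds:
        train = [i for i in indices if i not in set(held_out)]
        yield train, held_out

for train, test in v_fold_splits(n=10, v=5):
    print(test)  # each instance appears in exactly one held-out fold
```
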

### Shrinking

Chang and Lin [6] mentioned that since for many problems the number of free support vectors (i.e. 0 < *α*_{i} < C) is small, the shrinking technique reduces the size of the working problem by ignoring some bounded variables [6,10].

### Caching

Caching is another technique for reducing computational time. Since Q (an *l* by *l* positive semidefinite matrix with *Q*_{ij} = *y*_{i}*y*_{j}*K*(*x*_{i}, *x*_{j})) is fully dense and may not fit in computer memory, elements *Q*_{ij} are calculated as needed. Usually special storage based on the idea of a cache is used to hold recently used *Q*_{ij} values [6,10].
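The caching idea can be illustrated with a bounded memoization of Q entries; this is a toy sketch, not LIBSVM's actual cache implementation:

```python
from functools import lru_cache

# Toy illustration of on-demand computation of Q[i][j] = y_i*y_j*K(x_i, x_j)
# with a bounded cache of recently used entries. Real solvers bound the
# cache by memory size rather than entry count.
x = [(1.0, 2.0), (0.5, -1.0), (3.0, 0.0)]
y = [1, -1, 1]

def kernel(xi, xj):
    return sum(a * b for a, b in zip(xi, xj))   # linear kernel for brevity

@lru_cache(maxsize=128)
def Q(i, j):
    return y[i] * y[j] * kernel(x[i], x[j])

print(Q(0, 1), Q(1, 0))  # symmetric, computed once per (i, j) pair
```
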

## Results and Discussion

We present an evaluation of the classification techniques described above. Data from a breast cancer study [11] is used in this study. The data consists of 22 cDNA microarrays, each representing 3226 genes, based on biopsy specimens of primary breast tumors from 7 patients with germ-line mutations of BRCA1, 8 patients with germ-line mutations of BRCA2, and 7 sporadic cases. We took log2 of the data to perform the classification using the three kernels.

We have achieved 100% accuracy in classification among the BRCA1–BRCA2 samples with the RBF kernel of the SVM. The RBF kernel also shows better performance across all data, as shown in Figure 5.

Figure 5

Classification accuracy with the polynomial, linear, and radial basis function kernels for the BRCA1–BRCA2, BRCA1–sporadic, and BRCA2–sporadic comparisons.

## Conclusion

We have developed a Java GUI application that allows SVM users to perform SVM training, classification, and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM using a radial basis kernel function provides the best performance.

## Availability and Requirements

Project name: SVM Classifier

Project home page: http://mfgn.usm.edu/ebl/svm/

Operating systems: platform independent

Programming language: Java (Swing GUI toolkit)

Other requirements: Java JRE 1.4.2 or higher

License: GNU GPL

Any restrictions to use by non-academics: none

## List of abbreviations

VC dimension: Vapnik Chervonenkis dimension

C-SVC: C-Support Vector Classification

nu-SVC: *ν*-Support Vector Classification

nu-SVR: *ν*-Support Vector Regression

*ε*-SVR: *ε*-Support Vector Regression (epsilon-SVR)

## Authors' contributions

Mehdi Pirooznia designed and implemented the study and handled Java software development, software engineering issues for SVM Classifier, and data set preparation and testing. Youping Deng coordinated and directed the project and revised the manuscript. Both authors have read and approved the final manuscript.

## Acknowledgements

This work was supported by Dean's Research Initiative award of the University of Southern Mississippi to Youping Deng and the Mississippi Functional Genomics Network (DHHS/NIH/NCRR Grant# 2P20RR016476-04).

This article has been published as part of *BMC Bioinformatics *Volume 7, Supplement 4, 2006: Symposium of Computations in Bioinformatics and Bioscience (SCBB06). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/7?issue=S4.

## References

- Noble WS. Support vector machine applications in computational biology. In: Schölkopf B, Tsuda K, Vert JP, editors. Kernel Methods in Computational Biology. MIT Press; 2004. pp. 71–92.
- Vapnik VN. Statistical Learning Theory Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley: New York; 1998.
- Brown MPS, Grundy WN, Lin D, Cristianini N, Sugnet C, Furey TS, Ares JM, Haussler D. Knowledge-based analysis of microarray gene expression data using support vector machines. Proc Natl Acad Sci USA. 2000;97:262–267. doi: 10.1073/pnas.97.1.262.
- Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning. 2001;46:389–422.
- Pavlidis P, Wapinski I, Noble WS. Support vector machine classification on the web. Bioinformatics. 2004;20:586–587. doi: 10.1093/bioinformatics/btg461.
- Chang CC, Lin CJ. LIBSVM: a library for support vector machines. 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Cortes C, Vapnik V. Support-vector network. Machine Learning. 1995;20:273–297.
- Schölkopf B, Smola A, Williamson RC, Bartlett PL. New support vector algorithms. Neural Computation. 2000;12:1207–1245. doi: 10.1162/089976600300015565.
- Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribution. Neural Computation. 2001;13:1443–1471. doi: 10.1162/089976601750264965.
- Joachims T. Making large-scale SVM learning practical. In: Schölkopf B, Burges CJC, Smola AJ, editor. Advances in Kernel Methods – Support Vector Learning. Cambridge: MIT Press; 1998.
- Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Raffeld M, et al. Gene-expression profiles in hereditary breast cancer. N Engl J Med. 2001;344:539–548. doi: 10.1056/NEJM200102223440801.