#10 SVM

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression problems.
It works by finding the best boundary (hyperplane) that separates different classes.

SVM helps the model:

classify data using decision boundaries
maximize the margin between classes
handle high-dimensional data
model complex, non-linear boundaries using kernel functions

What is a Hyperplane?

A hyperplane is the boundary that separates classes.

For 2D data:

hyperplane → line
margin → distance between the boundary and the nearest data points of each class

The optimal hyperplane has the maximum margin.

Margin in SVM

SVM tries to maximize the distance between the decision boundary and the nearest data points of each class.

These nearest points are called Support Vectors.
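To make "nearest points" concrete, here is a minimal sketch that measures each point's perpendicular distance to a hyperplane as |wᵀx + b| / ‖w‖. The hyperplane (w = [1], b = -5) and the 1D study-hours points are assumptions borrowed from the worked example later in this section, and the helper name `distance_to_hyperplane` is hypothetical.

```python
import math

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w^T x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm_w

# Hypothetical hyperplane x - 5 = 0 (w = [1], b = -5) and 1D points (study hours).
w, b = [1.0], -5.0
points = [[2.0], [3.0], [7.0], [8.0]]

distances = [distance_to_hyperplane(p, w, b) for p in points]
closest = min(distances)

# For this maximum-margin boundary, the points at the minimum distance are the support vectors.
support_vectors = [p for p, d in zip(points, distances) if d == closest]

print(distances)        # [3.0, 2.0, 2.0, 3.0]
print(support_vectors)  # [[3.0], [7.0]]
```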

SVM Equation

The hyperplane equation:

$$\mathbf{w^Tx + b = 0}$$

where

| Symbol | Meaning |
|---|---|
| w | weight vector |
| x | input features |
| b | bias |

Classification Rule

$$f(x)=w^Tx+b$$

Prediction:

$$\mathbf{f(x)\ge0 \rightarrow Class\ 1}$$

$$\mathbf{f(x)<0 \rightarrow Class\ 0}$$
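The decision function and the classification rule translate directly into a few lines of code. This is a minimal sketch; the 2D weights w = [1, 1] and b = -4 are hypothetical values chosen so the boundary is the line x₁ + x₂ - 4 = 0.

```python
def decision_function(x, w, b):
    """Compute f(x) = w^T x + b for a single feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b):
    """Classification rule: Class 1 if f(x) >= 0, otherwise Class 0."""
    return 1 if decision_function(x, w, b) >= 0 else 0

# Hypothetical 2D weights: the boundary is the line x1 + x2 - 4 = 0.
w, b = [1.0, 1.0], -4.0

print(decision_function([3.0, 2.0], w, b))  # 1.0
print(predict([3.0, 2.0], w, b))            # 1
print(predict([1.0, 1.0], w, b))            # 0
print(predict([2.0, 2.0], w, b))            # 1 (on the boundary, f(x) = 0)
```

Points exactly on the hyperplane (f(x) = 0) go to Class 1 because the rule uses ≥.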

Support Vectors

Support vectors are the training points closest to the hyperplane.

They are important because:

they define the boundary
removing or moving a support vector changes the hyperplane, while removing any other training point leaves it unchanged
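A quick way to see why only support vectors matter: for 1D, linearly separable data where every Fail lies below every Pass (an assumption of this sketch), the maximum-margin boundary is the midpoint between the largest Fail value and the smallest Pass value. The function name `midpoint_boundary` is hypothetical, and the data comes from the worked example later in this section.

```python
def midpoint_boundary(fail_hours, pass_hours):
    """1D maximum-margin boundary, assuming every Fail value is below every Pass value."""
    return (max(fail_hours) + min(pass_hours)) / 2

fail_hours = [2, 3]
pass_hours = [7, 8]
print(midpoint_boundary(fail_hours, pass_hours))  # 5.0

# Remove a non-support vector (2): the boundary does not move.
print(midpoint_boundary([3], pass_hours))         # 5.0

# Remove a support vector (3): the boundary shifts.
print(midpoint_boundary([2], pass_hours))         # 4.5
```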

Types of SVM

1. Linear SVM

Used when data is linearly separable.

Example:

spam vs not spam
pass vs fail

2. Non-Linear SVM

Used when data cannot be separated by a straight line (or, more generally, a hyperplane).

Non-linear SVM uses the kernel trick.

Types of Kernel Functions -

Linear Kernel -

$$\mathbf{K(x_{i},x_{j})=x_{i}^{T}x_{j}}$$

Polynomial Kernel -

$$\mathbf{K(x_{i},x_{j}) = (x_{i}^{T}x_{j} + c)^{d}}$$

c = constant (c ≥ 0), d = degree of the polynomial

RBF (Gaussian Kernel) -

$$\mathbf{K(x_{i},x_{j}) = e^{-\gamma ||x_{i}-x_{j}||^{2}}}$$

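γ > 0 controls the spread: a larger γ makes each training point's influence more local (a more flexible boundary that can overfit), while a smaller γ gives a smoother boundary.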
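The three kernel formulas above can be written as plain functions. This is a sketch for illustration only; the default values c = 1, d = 2 and γ = 0.5 are assumptions, not values prescribed by SVM itself.

```python
import math

def linear_kernel(xi, xj):
    """Linear kernel: K(xi, xj) = xi^T xj."""
    return sum(a * b for a, b in zip(xi, xj))

def polynomial_kernel(xi, xj, c=1.0, d=2):
    """Polynomial kernel: K(xi, xj) = (xi^T xj + c)^d."""
    return (linear_kernel(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    """RBF (Gaussian) kernel: K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

x1, x2 = [1.0, 2.0], [3.0, 4.0]

print(linear_kernel(x1, x2))          # 11.0
print(polynomial_kernel(x1, x2))      # 144.0, i.e. (11 + 1)^2
print(rbf_kernel(x1, x1))             # 1.0 for identical points
print(rbf_kernel(x1, x2))             # exp(-0.5 * 8) = exp(-4), about 0.0183
print(rbf_kernel(x1, x2, gamma=2.0))  # exp(-16): a larger gamma makes similarity decay faster
```

The RBF kernel behaves like a similarity score: it equals 1 for identical points and shrinks toward 0 as points move apart, at a rate set by γ.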

| Linear SVM | Non-Linear SVM |
|---|---|
| Used when data is linearly separable. | Used when data is not linearly separable. |
| Decision boundary is a straight line (2D), a plane (3D), or a hyperplane in higher dimensions. | Decision boundary is curved or complex in the original feature space. |
| Uses no kernel (equivalently, the linear kernel). | Uses non-linear kernel functions (RBF, Polynomial, Sigmoid). |
| Faster and computationally simpler. | More computationally expensive. |
| Works well for simple datasets. | Works well for complex datasets. |

How Does SVM Handle Non-Linear Classification Problems?

When the data cannot be separated by a straight line, SVM uses the Kernel Trick.

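A kernel function lets SVM behave as if the data had been mapped from a lower-dimensional space to a higher-dimensional space where the classes become linearly separable. SVM then finds an optimal hyperplane in that higher-dimensional space. The "trick" is that the kernel computes inner products in the higher-dimensional space directly from the original features, so the transformed points never have to be computed explicitly.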
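The kernel trick can be checked numerically. For 2D inputs, the polynomial kernel with c = 0 and d = 2, K(x, z) = (xᵀz)², equals an ordinary dot product after the explicit 3D map φ(x) = (x₁², √2·x₁x₂, x₂²). The sketch below (helper names are hypothetical) computes the value both ways; the kernel never builds the 3D vectors.

```python
import math

def phi(x):
    """Explicit 2D -> 3D feature map that matches the kernel (x^T z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def dot(u, v):
    """Ordinary inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel_deg2(x, z):
    """Polynomial kernel with c = 0 and d = 2, computed in the original 2D space."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]

explicit = dot(phi(x), phi(z))       # map to 3D first, then take the inner product
via_kernel = poly_kernel_deg2(x, z)  # same value, no mapping needed

print(via_kernel)                          # 121.0
print(math.isclose(explicit, via_kernel))  # True
```

For high-degree polynomial or RBF kernels the implicit space is very large (infinite-dimensional for RBF), which is why computing the kernel directly matters.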

Steps:

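1. Start with data that is not linearly separable.
2. Choose a kernel function (RBF, Polynomial, Sigmoid, etc.).
3. The kernel implicitly transforms the data into a higher-dimensional space.
4. Find the maximum-margin hyperplane in that space.
5. Use this hyperplane for classification; seen in the original feature space, it is a non-linear (curved) boundary.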
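The steps can be illustrated with an explicit feature map instead of a kernel, so the transformation is visible. In this sketch the 1D data and the map x → (x, x²) are hypothetical: class 0 sits in the middle and class 1 on both sides, so no single threshold on x separates them, but a straight line in the (x, x²) plane does.

```python
# Hypothetical 1D data (x, label): class 0 in the middle, class 1 on both sides,
# so no single threshold on x can separate them.
data = [(-3, 1), (-2, 1), (-1, 0), (0, 0), (1, 0), (2, 1), (3, 1)]

def feature_map(x):
    """Explicit map from 1D to 2D: x -> (x, x^2)."""
    return (x, x * x)

mapped = [(feature_map(x), label) for x, label in data]

# In the mapped space the second coordinate (x^2) separates the classes.
max_class0 = max(z[1] for z, label in mapped if label == 0)  # 1
min_class1 = min(z[1] for z, label in mapped if label == 1)  # 4
threshold = (max_class0 + min_class1) / 2                    # halfway between: 2.5

def classify(x):
    """Linear rule in the mapped space: class 1 if x^2 - threshold >= 0."""
    return 1 if feature_map(x)[1] - threshold >= 0 else 0

print(threshold)                       # 2.5
print([classify(x) for x, _ in data])  # [1, 1, 0, 0, 0, 1, 1]
print(classify(0.5), classify(-4))     # 0 1
```

Back in the original 1D space, the straight line x² = 2.5 becomes two thresholds (x = ±√2.5), which is the "curved" boundary of step 5 in one dimension.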

Diagram

Use the diagram given above in the Non-Linear SVM section.

Example - SVM Classification Step by Step

Suppose we have this dataset:

| Student | Study Hours | Result |
|---|---|---|
| 1 | 2 | Fail |
| 2 | 3 | Fail |
| 3 | 7 | Pass |
| 4 | 8 | Pass |

We want to classify a new student who studies 6 hours.

Step 1 — Plot the Data

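Classes:

Fail → Class 0
Pass → Class 1

On a number line, the Fail points (2, 3) all lie to the left of the Pass points (7, 8), so a single threshold can separate them.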

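Step 2 — Identify Support Vectors and Find the Hyperplane

SVM tries to find the boundary with the largest margin between the classes. The support vectors are the closest points from opposite classes:

| Point (Study Hours) | Class |
|---|---|
| 3 | Fail |
| 7 | Pass |

With one support vector on each side, the maximum-margin boundary lies halfway between them:

$$\mathbf{Midpoint = \frac{3+7}{2} = \frac{10}{2} = 5}$$

so the separating boundary is

$$\mathbf{x=5}$$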
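The same step can be computed from the dataset table. This sketch assumes, as in this example, that every Fail student studied fewer hours than every Pass student, so the support vectors are simply the largest Fail value and the smallest Pass value.

```python
# Dataset from the student table: (study hours, result)
students = [(2, "Fail"), (3, "Fail"), (7, "Pass"), (8, "Pass")]

fail_hours = [h for h, r in students if r == "Fail"]
pass_hours = [h for h, r in students if r == "Pass"]

# Support vectors: the closest points from opposite classes
# (assumes every Fail has fewer hours than every Pass).
sv_fail = max(fail_hours)  # 3
sv_pass = min(pass_hours)  # 7

boundary = (sv_fail + sv_pass) / 2  # 5.0, halfway between the support vectors
margin_width = sv_pass - sv_fail    # 4, the gap between the two support vectors

print(sv_fail, sv_pass, boundary, margin_width)  # 3 7 5.0 4
```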

Step 3 — Write Hyperplane Equation

General SVM decision function:

$$f(x)=w^Tx+b$$

Since this is 1D data, the hyperplane f(x) = 0 becomes:

$$\mathbf{wx+b=0}$$

Step 4 — Find w and b

Boundary:

$$\mathbf{x=5}$$

Rewrite:

$$\mathbf{x-5=0}$$

Compare with:

$$\mathbf{wx+b=0}$$

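we get:

$$\mathbf{w=1, b = -5}$$

Any positive multiple, such as w = 2, b = -10, describes the same boundary; w = 1 is simply a convenient choice.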
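The values w = 1, b = -5 are one valid scaling of the boundary x = 5. The standard SVM formulation fixes the scale by requiring f(x) = +1 at the Pass support vector and f(x) = -1 at the Fail support vector; under that convention the margin width equals 2/‖w‖ (a standard SVM result, not derived in these notes). A short worked derivation for this dataset:

$$\mathbf{w(7)+b=+1, \quad w(3)+b=-1}$$

$$\mathbf{4w=2 \Rightarrow w=0.5, \quad b=-1-3(0.5)=-2.5}$$

$$\mathbf{f(x)=0.5x-2.5}$$

$$\mathbf{Margin\ width=\frac{2}{\|w\|}=\frac{2}{0.5}=4=7-3}$$

This describes the same boundary (0.5x - 2.5 = 0 gives x = 5) and the same predictions; for example, f(6) = 0.5 ≥ 0 → Pass.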

Step 5 — Final Decision Function

$$\mathbf{f(x)=x-5}$$

Step 6 — Classification

Rule

$$\mathbf{f(x)\ge0 \rightarrow Pass}$$

$$\mathbf{f(x)<0 \rightarrow Fail}$$

Step 7 — Predict New Student

Suppose:

$$\mathbf{x=6}$$

$$\mathbf{f(6)=6-5} =1$$

Since f(6) = 1 ≥ 0:

Prediction: Pass
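Steps 5 to 7 fit in a few lines. This sketch hard-codes the decision function f(x) = x - 5 found above; the function names are hypothetical.

```python
def f(hours):
    """Decision function from Step 5: f(x) = w*x + b with w = 1, b = -5."""
    return 1 * hours - 5

def predict_result(hours):
    """Step 6 rule: f(x) >= 0 -> Pass, f(x) < 0 -> Fail."""
    return "Pass" if f(hours) >= 0 else "Fail"

print(f(6), predict_result(6))                    # 1 Pass
print([predict_result(h) for h in [2, 3, 7, 8]])  # ['Fail', 'Fail', 'Pass', 'Pass']
print(predict_result(5), predict_result(4.5))     # Pass Fail
```

A student exactly on the boundary (5 hours) is predicted Pass because the rule uses ≥.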

Advantages of SVM

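effective on high-dimensional data
works well with small to medium-sized datasets
flexible: different kernels handle both linear and non-linear boundaries
relatively resistant to overfitting, since maximizing the margin acts as a form of regularization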

Disadvantages

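slow to train on very large datasets (kernel SVM training time grows at least quadratically with the number of samples)
requires careful tuning of hyperparameters such as the kernel, C, and γ
harder to interpret than simpler models, especially with non-linear kernels
sensitive to noisy data and overlapping classes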

Applications of SVM

Face detection
Image classification
Spam filtering
Bioinformatics
Handwriting recognition

Working Principle of SVM

The working principle of SVM is to find an optimal hyperplane that separates different classes with the maximum possible margin.

Plot the training data points in a feature space.
Consider the hyperplanes that can separate the classes.
Calculate the margin for each hyperplane.
Select the hyperplane with the largest margin.
The data points closest to the hyperplane are called support vectors.
New data points are classified based on which side of the hyperplane they fall on.
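The list above describes a conceptual search. Real SVM solvers (for example libsvm, which scikit-learn's `SVC` is built on) solve a convex optimization problem instead of checking hyperplanes one by one, but for the 1D study-hours data a brute-force sketch over a hypothetical grid of candidate thresholds reproduces the same answer.

```python
# Hypothetical 1D training data (study hours) with labels 0 = Fail, 1 = Pass.
X = [2, 3, 7, 8]
y = [0, 0, 1, 1]

def margin(t):
    """Margin of the boundary x = t, or None if t puts any point on the wrong side.

    Uses the rule f(x) = x - t >= 0 -> class 1.
    """
    for x, label in zip(X, y):
        if (x >= t) != (label == 1):
            return None
    return min(abs(x - t) for x in X)

# Candidate hyperplanes: a coarse grid of thresholds from 2.0 to 8.0 in steps of 0.5.
candidates = [2 + 0.5 * k for k in range(13)]

# Keep the separating candidates, compute their margins, and pick the largest.
scored = [(margin(t), t) for t in candidates if margin(t) is not None]
best_margin, best_t = max(scored)

# Support vectors: the points at exactly best_margin from the chosen boundary.
support_vectors = [x for x in X if abs(x - best_t) == best_margin]

# Classify a new point by which side of the boundary it falls on.
new_x = 6
print(best_t, best_margin, support_vectors, 1 if new_x >= best_t else 0)  # 5.0 2.0 [3, 7] 1
```

The grid is only a teaching device: it can miss the optimum when the best threshold is not on the grid, which is one reason practical solvers optimize directly.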

Python Example

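```python
from sklearn import svm

# Features: study hours, one feature per sample (scikit-learn expects a 2D list/array)
X = [
    [2], [3], [7], [8]
]

# Labels: 0 = Fail, 1 = Pass
y = [0, 0, 1, 1]

# Create a linear SVM classifier
model = svm.SVC(kernel='linear')

# Train
model.fit(X, y)

# Predict for a new student who studies 6 hours
prediction = model.predict([[6]])

print(prediction)  # [1] -> Pass

# Inspect the learned model
print(model.support_vectors_)         # the support vectors (3 and 7)
print(model.coef_, model.intercept_)  # w and b (approximately 0.5 and -2.5)
```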

Conclusion

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm mainly used for classification problems. It works by finding the optimal hyperplane that separates different classes with the maximum possible margin.

SVM performs well on high-dimensional datasets and is highly effective for tasks like image classification, spam detection, handwriting recognition, and text classification. With the help of kernel functions, SVM can also solve complex non-linear problems.

Although SVM can be slow to train on very large datasets, it remains a widely used and often highly accurate classifier, especially for small to medium-sized, high-dimensional problems.