#10 SVM

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression problems.
It works by finding the best boundary (hyperplane) that separates different classes.
An SVM can:
- classify data using decision boundaries
- maximize the margin between classes
- handle high-dimensional data
- work well for complex datasets

What is a Hyperplane?
A hyperplane is the boundary that separates classes.
For 2D data:
- hyperplane → line
- margin → distance between the boundary and the nearest points of each class

The optimal hyperplane has the maximum margin.
Margin in SVM
SVM tries to maximize the distance between the decision boundary and the nearest data points of each class.
These nearest points are called Support Vectors.
SVM Equation
The hyperplane equation:
$$\mathbf{w^Tx + b = 0}$$
where
| Symbol | Meaning |
|---|---|
| $w$ | weight vector |
| $x$ | input features |
| $b$ | bias |
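
To make the margin concrete, the sketch below measures how far a few points lie from a given hyperplane using the standard distance formula |w^T x + b| / ||w||. The values of `w`, `b`, and `points` are made up for illustration, and the hyperplane is simply assumed rather than learned.

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen so that ||w|| = 5.
w = np.array([3.0, 4.0])
b = -10.0

# Hypothetical 2D points.
points = np.array([
    [4.0, 2.0],
    [0.0, 0.0],
    [6.0, 3.0],
    [-2.0, -1.0],
])

# Value of w^T x + b for every point.
scores = points @ w + b

# Geometric distance of each point from the hyperplane.
distances = np.abs(scores) / np.linalg.norm(w)

# The margin is the smallest distance. The points that attain it
# play the role of support vectors for this hyperplane.
margin = distances.min()
nearest = np.flatnonzero(np.isclose(distances, margin))

print(distances)
print(margin)
print(nearest)
```

Here the first two points are the nearest ones, each at distance 2 from the hyperplane, so they determine the margin.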
Classification Rule
$$f(x)=w^Tx+b$$
prediction
$$\mathbf{f(x)\ge0 \rightarrow Class\ 1}$$
$$\mathbf{f(x)<0 \rightarrow Class\ 0}$$
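
The rule above can be sketched in a few lines of Python. The helper names `decision_function` and `predict`, and the values of `w` and `b`, are assumptions made only for this illustration.

```python
import numpy as np

def decision_function(x, w, b):
    """Return f(x) = w^T x + b."""
    return float(np.dot(w, x) + b)

def predict(x, w, b):
    """Apply the rule: Class 1 if f(x) >= 0, otherwise Class 0."""
    return 1 if decision_function(x, w, b) >= 0 else 0

# Hypothetical parameters for a 2D problem.
w = np.array([1.0, -1.0])
b = 0.5

print(predict(np.array([2.0, 1.0]), w, b))  # f(x) = 1.5, so Class 1
print(predict(np.array([0.0, 3.0]), w, b))  # f(x) = -2.5, so Class 0
```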
Support Vectors
Support vectors are the closest points to the hyperplane.
They are important because:
- they define the boundary
- removing one of them can change the hyperplane, whereas removing any other point does not

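
A small experiment can illustrate this. The sketch below uses scikit-learn's `SVC` on a tiny made-up 1D dataset and compares the learned boundary after removing a far point versus a support vector. The helper `boundary` is a hypothetical name introduced here.

```python
import numpy as np
from sklearn import svm

def boundary(X, y):
    """Fit a linear SVM on 1D data and return the x where w*x + b = 0."""
    model = svm.SVC(kernel="linear")
    model.fit(X, y)
    w = model.coef_[0][0]
    b = model.intercept_[0]
    return -b / w

# Hypothetical 1D data: two points per class.
X = np.array([[2.0], [3.0], [7.0], [8.0]])
y = np.array([0, 0, 1, 1])

full = boundary(X, y)

# Drop x = 2, which is not a support vector.
without_far_point = boundary(X[1:], y[1:])

# Drop x = 3, which is a support vector.
keep = [0, 2, 3]
without_support_vector = boundary(X[keep], y[keep])

print(full, without_far_point, without_support_vector)
```

The first two values should come out close to 5, while the third should move to about 4.5, because the nearest point of Class 0 is then x = 2.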
Types of SVM
1. Linear SVM
Used when data is linearly separable.
Example:
- spam vs not spam
- pass vs fail

2. Non-Linear SVM
Used when data cannot be separated using a straight line.
Non-linear SVM uses the kernel trick.
Common kernel functions:
Linear Kernel -
$$\mathbf{K(x_{i},x_{j})=x_{i}^{T}x_{j}}$$
Polynomial Kernel -
$$\mathbf{K(x_{i},x_{j}) = (x_{i}^{T}x_{j} + c)^{d}}$$
c = constant, d = degree of polynomial
RBF (Gaussian Kernel) -
$$\mathbf{K(x_{i},x_{j}) = e^{-\gamma ||x_{i}-x_{j}||^{2}}}$$
γ controls the spread of the kernel: a larger γ makes each point's influence more local.
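
The three kernel formulas above translate directly into NumPy. This is a minimal sketch: the function names and the default values for `c`, `d`, and `gamma` are arbitrary choices for illustration.

```python
import numpy as np

def linear_kernel(xi, xj):
    """K(xi, xj) = xi^T xj"""
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, c=1.0, d=2):
    """K(xi, xj) = (xi^T xj + c)^d"""
    return float((np.dot(xi, xj) + c) ** d)

def rbf_kernel(xi, xj, gamma=0.5):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)"""
    diff = xi - xj
    return float(np.exp(-gamma * np.dot(diff, diff)))

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])

print(linear_kernel(xi, xj))      # 1*3 + 2*4 = 11
print(polynomial_kernel(xi, xj))  # (11 + 1)^2 = 144
print(rbf_kernel(xi, xj))         # exp(-0.5 * 8) = exp(-4)
```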
| Linear SVM | Non-Linear SVM |
|---|---|
| Used when data is linearly separable. | Used when data is not linearly separable. |
| Decision boundary is a straight line (2D) or hyperplane. | Decision boundary is curved or complex. |
| Does not require kernel functions. | Uses kernel functions (RBF, Polynomial, Sigmoid). |
| Faster and computationally simpler. | More computationally expensive. |
| Works well for simple datasets. | Works well for complex datasets. |
How does SVM handle non-linear classification problems?
When the data cannot be separated by a straight line, SVM uses the Kernel Trick.
The kernel function implicitly maps the data from a lower-dimensional space to a higher-dimensional space where the classes can become linearly separable, without computing the new coordinates explicitly. SVM then finds an optimal hyperplane in that higher-dimensional space.
Steps:
1. Non-linear data is given.
2. Choose a kernel function (RBF, Polynomial, Sigmoid, etc.).
3. The kernel implicitly transforms the data into a higher dimension.
4. Find the maximum-margin hyperplane in that space.
5. Use this hyperplane for classification.

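
The steps above can be tried on synthetic data. The sketch below assumes scikit-learn's `make_circles` generator (one class forms an inner ring, the other an outer ring) and compares a linear kernel with an RBF kernel. The dataset parameters are arbitrary choices for illustration.

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Synthetic non-linear data: two concentric rings.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A straight line cannot separate the rings.
linear_model = svm.SVC(kernel="linear").fit(X, y)

# The RBF kernel lets the SVM separate them in an implicit higher dimension.
rbf_model = svm.SVC(kernel="rbf").fit(X, y)

linear_accuracy = linear_model.score(X, y)
rbf_accuracy = rbf_model.score(X, y)

print(f"linear kernel training accuracy: {linear_accuracy:.2f}")
print(f"RBF kernel training accuracy:    {rbf_accuracy:.2f}")
```

On data like this the RBF model should separate the rings almost perfectly, while the linear model stays much closer to chance.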
Diagram
use the diagram given above in the non-linear SVM section.
Example - SVM Classification Step by Step
Suppose we have this dataset:
| Student | Study Hours | Result |
|---|---|---|
| 1 | 2 | Fail |
| 2 | 3 | Fail |
| 3 | 7 | Pass |
| 4 | 8 | Pass |
We want to classify a new student who studies: 6 hours
Step 1 — Plot the Data
Classes:
Fail → Class 0 || Pass → Class 1
Step 2 — Find the Hyperplane
SVM tries to find the best boundary between the classes.
The decision boundary lies halfway between the support vectors, the closest points of each class (3 for Fail and 7 for Pass).
$$\mathbf{Midpoint = \frac{3+7}{2}} =\frac{10}{2}=5$$
so the separating boundary is
$$\mathbf{x=5}$$
The support vectors are the closest points from each class:
| Point | Class |
|---|---|
| 3 | Fail |
| 7 | Pass |
Step 3 — Write Hyperplane Equation
General SVM equation:
$$f(x)=w^Tx+b$$
Since this is 1D data:
$$\mathbf{wx+b=0}$$
Step 4 — Find w and b
Boundary:
$$\mathbf{x=5}$$
Rewrite:
$$\mathbf{x-5=0}$$
Compare with:
$$\mathbf{wx+b=0}$$
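we get (one valid choice, since any positive multiple of w and b describes the same boundary):
$$\mathbf{w=1, b = -5}$$
The canonical SVM scaling, where f(x) = ±1 at the support vectors, would be w = 0.5 and b = -2.5.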
Step 5 — Final Decision Function
$$\mathbf{f(x)=x-5}$$
Step 6 — Classification
Rule
$$\mathbf{f(x)\ge0 \rightarrow Pass}$$
$$\mathbf{f(x)<0 \rightarrow Fail}$$
Step 7 — Predict New Student
Suppose:
$$\mathbf{x=6}$$
$$\mathbf{f(6)=6-5} =1$$
since 1 > 0
Prediction : Pass
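
The hand calculation above can be checked with plain Python. This sketch relies on a shortcut that only holds for 1D, linearly separable data like this table: the support vectors are the largest Fail value and the smallest Pass value. The helper names `f` and `classify` are introduced here for illustration.

```python
# Study hours and results from the table above.
hours = [2, 3, 7, 8]
results = ["Fail", "Fail", "Pass", "Pass"]

# In 1D with separable classes, the support vectors are the largest
# "Fail" value and the smallest "Pass" value.
fail_sv = max(h for h, r in zip(hours, results) if r == "Fail")
pass_sv = min(h for h, r in zip(hours, results) if r == "Pass")

# The boundary lies halfway between them.
boundary = (fail_sv + pass_sv) / 2

def f(x):
    """Decision function f(x) = w*x + b with w = 1 and b = -boundary."""
    return x - boundary

def classify(x):
    """Pass if f(x) >= 0, otherwise Fail."""
    return "Pass" if f(x) >= 0 else "Fail"

print(fail_sv, pass_sv, boundary)  # 3 7 5.0
print(f(6), classify(6))           # 1.0 Pass
```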
Advantages of SVM
- effective in high-dimensional data
- works well with small datasets
- powerful for classification
- relatively robust against overfitting, because it maximizes the margin

Disadvantages
- slow for very large datasets
- difficult to tune parameters
- harder to interpret
- sensitive to noisy data

Applications of SVM
- Face detection
- Image classification
- Spam filtering
- Bioinformatics
- Handwriting recognition

Working Principle of SVM
The working principle of SVM is to find an optimal hyperplane that separates different classes with the maximum possible margin.
1. Plot the training data points in a feature space.
2. Consider the hyperplanes that can separate the classes.
3. Compare them by their margin, the distance to the nearest points of each class.
4. Select the hyperplane with the largest margin. In practice this is found by solving an optimization problem rather than by enumerating hyperplanes.
5. The data points closest to the hyperplane are called support vectors.
6. New data points are classified based on which side of the hyperplane they fall on.

Python Example
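```python
from sklearn import svm

# Features: study hours for each student
X = [[2], [3], [7], [8]]

# Labels: 0 = Fail, 1 = Pass
y = [0, 0, 1, 1]

# Create a linear SVM classifier
model = svm.SVC(kernel='linear')

# Train the model
model.fit(X, y)

# Predict the class for a student who studies 6 hours
prediction = model.predict([[6]])
print(prediction)  # [1]
```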
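
To connect the fitted model back to the hand calculation, the following sketch inspects what scikit-learn learned. It uses the `support_vectors_`, `coef_`, and `intercept_` attributes and the `decision_function` method of a fitted linear `SVC`. The variable names `w`, `b`, and `boundary` are chosen here for readability.

```python
from sklearn import svm

X = [[2], [3], [7], [8]]
y = [0, 0, 1, 1]

model = svm.SVC(kernel="linear")
model.fit(X, y)

# Points the model kept as support vectors.
print(model.support_vectors_)

# Learned parameters of f(x) = w*x + b.
w = model.coef_[0][0]
b = model.intercept_[0]
print(w, b)

# Where the decision function crosses zero.
boundary = -b / w
print(boundary)

# Signed value of f(x) for the new student.
print(model.decision_function([[6]]))
```

The learned w and b should be close to 0.5 and -2.5 rather than the 1 and -5 of the hand calculation, because the solver scales the function so that f(x) = ±1 at the support vectors. Both describe the same boundary near x = 5.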
Conclusion
Support Vector Machine (SVM) is a powerful supervised machine learning algorithm mainly used for classification problems. It works by finding the optimal hyperplane that separates different classes with the maximum possible margin.
SVM performs well on high-dimensional datasets and is highly effective for tasks like image classification, spam detection, handwriting recognition, and text classification. With the help of kernel functions, SVM can also solve complex non-linear problems.
Although SVM can be slower on very large datasets, it remains a widely used classification algorithm that often achieves high accuracy.





