#12 K-Medoids Clustering

K-Medoids is an unsupervised machine learning algorithm used for clustering data points into groups.
It is similar to K-Means, but instead of using the mean (centroid) of a cluster, it uses an actual data point called a medoid as the center of the cluster.
K-Medoids is more robust to ... noise, outliers, extreme values.
because medoids are real points from the dataset not the means like the K-means clustering.
What is a Medoid?
A medoid is the data point inside a cluster whose total distance to all other points is minimum.
Unlike K-means :
K-Means → uses average point
K-Medoids → uses actual data point
K-Medoids Objective
The algorithm tries to minimize total distance inside clusters.
Cost Function
$$\mathbf{Cost = \sum_{i=1}^{n} distance(x_i, m_j)}$$
where:
$$\mathbf{x_i = data\ point}$$
$$ \mathbf{m_j = medoid}$$
Steps of K-Medoids
Select K random medoids
Assign points to nearest medoid
Compute total clustering cost
Swap medoids with non-medoid points
Keep the swap if cost decreases
Repeat until cost no longer improves
K-means vs. K-medoids
| Feature | K-Means | K-Medoids |
|---|---|---|
| Center | Mean (centroid) | Actual data point |
| Sensitive to outliers | Yes | No |
| Robustness | Lower | Higher |
| Speed | Faster | Slower |
| Best for | Large datasets | Noisy datasets |
Example Dataset
| Point | X | Y |
|---|---|---|
| A | 1 | 2 |
| B | 2 | 3 |
| C | 8 | 9 |
| D | 9 | 10 |
choose k=2
Step 1 — Initialize Medoids
choose 2 random centroids. suppose C1 = (1,2), C2 = (8,9)
Distance Formula : Eucledian Distance
$$\mathbf{d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}}$$
Step 2 - Cluster Assignment
| Point | Distance to (M_1) | Distance to (M_2) | Assigned Cluster |
|---|---|---|---|
| A | 0 | 9.89 | Cluster 1 |
| B | 1.41 | 8.48 | Cluster 1 |
| C | 9.89 | 0 | Cluster 2 |
| D | 11.31 | 1.41 | Cluster 2 |
Step 3 - Medoid Update (Check for better medoid)
Now we test whether replacing the current medoid reduces the total cluster cost.
Current medoid in Cluster 1: A(1,2)
Other possible points : B(2,3)
Calculate Total Cost using A as Medoid
$$\mathbf{d(A,B)=\sqrt{2}\approx1.41}$$
Calculate Total Cost using B as Medoid
$$\mathbf{d(B,A)=\sqrt{2}\approx1.41}$$
You can see the 2 costs are same ... so that the medoid will not be updated.
Actually here we have to find the total costs from a particular point ... as here only 2 points in a single cluster So the changes will not happen because distance from A to B and from B to A always be the same.
So this will be better understandable if there is >2 points per cluster.
Suppose if there is A, B, C in the cluster 1 then we have to calculate total 3 costs
if A is medoid : total cost Cost(A) = d(A,B) + d(A,C)
if B is medoid : total cost Cost(B) = d(B, A) + d(B, C)
if C is medoid : total cost Cost(C) = d(C, A) + d(C, B)
we have to find out min(Cos(A), Cost(B), Cost(C))
suppose Cost(B) is minimum
so the updated medoid will be B.
Advantages of K-Medoids
Robust to outliers
Uses real data points
Better for noisy datasets
Produces stable clusters
Disadvantages
Slower than K-Means
Expensive for large datasets
Requires distance calculations repeatedly
Applications of K-Medoids
Customer segmentation
Medical data clustering
Fraud detection
Recommendation systems
Document clustering
Conclusion
K-Medoids is a powerful clustering algorithm that groups similar data points using actual data points as cluster centers called medoids. Compared to K-Means, it is more resistant to noise and outliers, making it suitable for real-world datasets where extreme values may affect clustering quality.





