#9 Logistic Regression

Machine Learning Engineer and open-source developer focused on NLP, LLM applications, Retrieval-Augmented Generation (RAG), semantic search, and AI infrastructure. I enjoy building developer tools, portable AI systems, and production-ready ML pipelines using Python, FastAPI, FAISS, LangChain, TensorFlow, and PyTorch. Creator of: • RagBucket — portable executable RAG artifacts for Python • LazyTune — fast hyperparameter optimization library • AkBOT — AI portfolio chatbot using RAG Contributor to open-source projects including NumPy and LocalStack.

Logistic Regression is a supervised machine learning algorithm mainly used for classification problems.
It predicts the probability of a data point belonging to a particular class.

Examples:

  • Spam detection

  • Disease prediction

  • Fraud detection

  • Sentiment analysis

  • Customer churn prediction

What is Classification?

Classification predicts categories/classes instead of continuous values.

Examples

| Problem | Output |
|---|---|
| Email Spam Detection | Spam / Not Spam |
| Disease Prediction | Positive / Negative |
| Sentiment Analysis | Positive / Neutral / Negative |

Why “Regression” in Logistic Regression?

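Even though it is used for classification, the algorithm first computes a continuous value with a linear regression equation and then turns that value into a probability.

The Sigmoid Function converts the linear output into a probability, and a threshold then converts the probability into a class.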

Logistic Regression Equation

The linear equation

$$z = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n$$

where:

| Symbol | Meaning |
|---|---|
| \(b_0\) | Bias / Intercept |
| \(b_1, b_2, \dots, b_n\) | Coefficients |
| \(x_1, x_2, \dots, x_n\) | Input features |
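
As a minimal sketch, the linear equation is a dot product of the coefficients and the features, plus the bias. The numbers below are made up purely for illustration and do not come from a fitted model.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
b0 = -1.0                         # bias / intercept b_0
b = np.array([0.5, -0.25, 2.0])   # coefficients b_1..b_n
x = np.array([2.0, 4.0, 1.0])     # input features x_1..x_n

# z = b_0 + b_1*x_1 + ... + b_n*x_n
z = b0 + np.dot(b, x)
print(z)
```

Writing the sum as `np.dot` keeps the code the same for any number of features.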

Sigmoid Function

Logistic Regression uses the sigmoid function to convert values into probabilities between 0 and 1.

$$\sigma(z)=\frac{1}{1+e^{-z}}$$

Output range:

| Value | Meaning |
|---|---|
| Close to 1 | Positive class |
| Close to 0 | Negative class |
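
The sigmoid formula can be sketched in a few lines of NumPy. The sample values of z are arbitrary and only show how large negative inputs map close to 0 and large positive inputs map close to 1.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of numbers) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary sample inputs, from strongly negative to strongly positive
z_values = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = sigmoid(z_values)

for z, p in zip(z_values, probs):
    print(f"z = {z:+.1f} -> sigmoid(z) = {p:.4f}")
```

Two properties are worth noticing: sigmoid(0) is 0.5, and sigmoid(-z) equals 1 - sigmoid(z), so the curve is symmetric around that midpoint.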

Decision Boundary

Generally, a threshold of 0.5 on the predicted probability \(P\) is used:

$$P \ge 0.5 \rightarrow \text{Class 1}$$

$$P < 0.5 \rightarrow \text{Class 0}$$
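
A threshold of 0.5 on the probability corresponds to a threshold of 0 on the linear output \(z\). A short derivation, using only the sigmoid definition:

$$\sigma(z) \ge 0.5 \iff \frac{1}{1+e^{-z}} \ge \frac{1}{2} \iff 1+e^{-z} \le 2 \iff e^{-z} \le 1 \iff z \ge 0$$

So the decision boundary is the set of points where \(b_0 + b_1x_1 + \dots + b_nx_n = 0\), which is a straight line (or a flat plane) in feature space.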

How Logistic Regression Works

Steps

  1. Take input features

  2. Apply linear equation

  3. Pass result through sigmoid function

  4. Generate probability

  5. Classify output using threshold

Example

Dataset:

| Hours Studied | Pass |
|---|---|
| 1 | No |
| 2 | No |
| 3 | No |
| 5 | Yes |
| 6 | Yes |
| 7 | Yes |

Suppose a student studies for 4 hours:

$$x=4$$

Assume the coefficients are \(b_0=-4\) and \(b_1=1.2\):

$$z=-4+1.2x$$

Therefore:

$$z=-4+1.2(4)=-4+4.8=0.8$$

Apply the sigmoid function:

$$P = \frac{1}{1+e^{-0.8}}$$

$$P \approx 0.69$$

Since 0.69 > 0.5:

Prediction: Pass
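
The worked example can be reproduced with a short sketch that follows the five steps listed earlier. It uses the coefficients assumed in the example (-4 and 1.2), which are not fitted values, and the helper name `predict` is only illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients assumed in the worked example: z = -4 + 1.2x
b0, b1 = -4.0, 1.2
threshold = 0.5

def predict(hours):
    z = b0 + b1 * hours                            # step 2: linear equation
    p = sigmoid(z)                                 # steps 3-4: probability of passing
    label = "Pass" if p >= threshold else "Fail"   # step 5: apply the threshold
    return z, p, label

z, p, label = predict(4)
print(f"z = {z:.2f}, P = {p:.2f}, prediction = {label}")

# Dataset from the table: (hours studied, passed?)
dataset = [(1, False), (2, False), (3, False), (5, True), (6, True), (7, True)]
for hours, passed in dataset:
    _, p_i, label_i = predict(hours)
    actual = "Pass" if passed else "Fail"
    print(f"{hours} h: P = {p_i:.3f}, predicted {label_i}, actual {actual}")
```

With these assumed coefficients, the predicted label agrees with the actual outcome for every row of the dataset.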

Cost Function

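Logistic Regression uses Log Loss (Cross Entropy Loss). For a single example with true label \(y\) (0 or 1) and predicted probability \(\hat{y}\):

$$\text{Loss} = -\left[y\log(\hat{y}) + (1-y)\log(1-\hat{y})\right]$$

The cost over the whole dataset is the average of this loss across all examples.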
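
To get a feel for the loss, here is a minimal NumPy sketch. The helper name `binary_log_loss` and the probabilities are assumptions made for illustration. The small `eps` keeps the logarithm away from log(0).

```python
import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log() never receives exactly 0
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical predictions for an example whose true label is 1
confident_right = binary_log_loss([1], [0.9])
unsure = binary_log_loss([1], [0.5])
confident_wrong = binary_log_loss([1], [0.1])

print(confident_right, unsure, confident_wrong)
```

The loss grows quickly as the model becomes confident in the wrong answer, which is what pushes training toward well-calibrated probabilities.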

Odds Ratio

  • Odds Ratio tells us how the odds of an event change when a feature increases.

  • It helps us understand the effect of a predictor on the outcome.

  • If Odds Ratio = 1, there is no effect.

  • If Odds Ratio > 1, the event becomes more likely.

  • If Odds Ratio < 1, the event becomes less likely.

Formula for the odds of an event with probability \(p\):

$$\text{Odds} = \frac{p}{1-p}$$

For example, if the probability of passing an exam is 0.8:

$$\text{Odds} = \frac{0.8}{1-0.8} = \frac{0.8}{0.2} = 4$$

$$ \therefore \text{Odds} = 4:1$$

Conclusion:

If P(pass) = 0.8, then Odds = 4:1. This means the likelihood of passing is four times the likelihood of failing.
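
The link between odds and the odds ratio can be checked numerically. In logistic regression, increasing a feature by one unit multiplies the odds by e raised to that feature's coefficient. The sketch below reuses the assumed coefficients from the worked example (z = -4 + 1.2x), and the helper names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

# Odds of passing when P(pass) = 0.8
print(odds(0.8))

# Assumed coefficients from the worked example: z = -4 + 1.2x
b0, b1 = -4.0, 1.2
odds_at_4 = odds(sigmoid(b0 + b1 * 4))   # odds after 4 hours of study
odds_at_5 = odds(sigmoid(b0 + b1 * 5))   # odds after 5 hours of study

# Odds ratio for one extra hour, compared with e^{b_1}
odds_ratio = odds_at_5 / odds_at_4
print(odds_ratio, math.exp(b1))
```

Both printed values in the last line agree (about 3.32), so under these assumed coefficients each extra hour of study multiplies the odds of passing by roughly 3.3.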

Advantages of Logistic Regression

  • Simple and fast

  • Easy to interpret

  • Works well for binary classification

  • Produces probability outputs

  • Requires little computational power to train and to run

Disadvantages

  • Not suitable for complex relationships

  • Sensitive to outliers

  • Assumes a linear relationship between the features and the log-odds of the outcome

  • Lower accuracy on highly non-linear data

Applications of Logistic Regression

  • Email spam filtering

  • Credit card fraud detection

  • Medical diagnosis

  • Customer churn prediction

  • Marketing prediction

Python Example

from sklearn.linear_model import LogisticRegression
import numpy as np

# Features
X = np.array([
    [1],
    [2],
    [3],
    [5],
    [6],
    [7]
])

# Labels
y = np.array([0, 0, 0, 1, 1, 1])

# Create model
model = LogisticRegression()

# Train model
model.fit(X, y)

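# Predict for a student who studies 4 hours
# (4 lies midway between the two classes, so the predicted probability is close to 0.5)
prediction = model.predict([[4]])

print(prediction)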
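
Since the model produces probabilities, it can be more informative to look at them directly instead of only the predicted class. The sketch below refits the same model and uses scikit-learn's `predict_proba`, `coef_` and `intercept_`. The extra query values (2, 4 and 6 hours) are chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [5], [6], [7]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Learned parameters of z = b_0 + b_1 * x
print("b_0 =", model.intercept_[0])
print("b_1 =", model.coef_[0][0])

# Each row is [P(class 0), P(class 1)] for one query
new_hours = np.array([[2], [4], [6]])
proba = model.predict_proba(new_hours)
print(proba)
```

The learned coefficients will generally differ from the hand-picked values in the worked example, partly because scikit-learn applies L2 regularization by default.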

Logistic Regression without sklearn

import numpy as np

def sigmoid(z):
    # Clip z to prevent overflow in np.exp for very large magnitudes.
    # z must not be standardized here: rescaling it would change the
    # predicted probabilities (features are standardized in fit instead).
    z_clipped = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z_clipped))

class LogisticRegression:
    def __init__(self, eta=0.001, epochs=1000):
        self.eta = eta
        self.epochs = epochs
        self.weights = None
        self.bias = None
        self.X_mean = None
        self.X_std = None
        
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        # Standardize X once at the start
        self.X_mean = np.mean(X, axis=0)
        self.X_std = np.std(X, axis=0)
        self.X_std = np.where(self.X_std == 0, 1e-10, self.X_std)  # Avoid division by zero
        X_normalized = (X - self.X_mean) / self.X_std
        
        for epoch in range(self.epochs):
            linear_pred = np.dot(X_normalized, self.weights) + self.bias
            predictions = sigmoid(linear_pred)
            
            # Compute loss (binary cross-entropy)
            loss = -np.mean(y * np.log(predictions + 1e-15) + (1 - y) * np.log(1 - predictions + 1e-15))
            dw = (1 / n_samples) * np.dot(X_normalized.T, (predictions - y))
            db = (1 / n_samples) * np.sum(predictions - y)
            
            self.weights -= self.eta * dw
            self.bias -= self.eta * db
            
            # Print progress every 100 epochs
            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.4f}, Bias: {self.bias:.4f}")
    
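    def predict(self, X):
        # Apply the same standardization that was used during training
        X_normalized = (X - self.X_mean) / self.X_std
        linear_pred = np.dot(X_normalized, self.weights) + self.bias
        y_pred = sigmoid(linear_pred)
        # Class 1 when the probability is at least 0.5, matching the decision boundary
        class_pred = [1 if p >= 0.5 else 0 for p in y_pred]
        return class_pred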

Conclusion

Logistic Regression is one of the most important classification algorithms in machine learning.
It is simple, efficient, interpretable, and widely used in real-world applications involving probability-based predictions.
