Logistic Regression
In this chapter, you'll learn about:
- Binary Classification with Probabilistic Models: Modeling binary outcomes using probabilities.
- Bernoulli Distribution: Understanding the distribution for binary random variables.
- Logistic Function (Sigmoid Function): Introducing the squashing function to map linear combinations to probabilities.
- Logistic Regression Model: Formulating the logistic regression for binary classification.
- Maximum Likelihood Estimation (MLE): Deriving the loss function for logistic regression.
- Cross-Entropy Loss: Connecting the logistic regression loss to cross-entropy and KL divergence.
- Gradient Computation: Calculating gradients for optimization.
- Convexity and Optimization: Discussing the convex nature of logistic regression and optimization methods.
In previous chapters, we introduced classification problems and explored linear classifiers. We discussed the limitations of using linear regression for classification and the need for models specifically designed for categorical outcomes.
In this chapter, we delve into logistic regression, a fundamental algorithm for binary classification tasks. Logistic regression models the probability that a given input belongs to a particular category, allowing for probabilistic interpretation of predictions. It is widely used due to its simplicity, interpretability, and effectiveness.
Binary Classification and the Bernoulli Distribution
Binary Classification Recap
- Objective: Assign an input to one of two classes, labeled as 0 or 1.
- Examples: Spam detection (spam or not spam), disease diagnosis (disease or healthy).
Bernoulli Distribution
- Definition: A discrete probability distribution for a random variable that has two possible outcomes, 1 (success) and 0 (failure).
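- Parameter: $p$, where $0 \le p \le 1$ is the probability of success.
- Probability Mass Function: $P(y) = p^{y} (1 - p)^{1 - y}$ for $y \in \{0, 1\}$, so $P(y = 1) = p$ and $P(y = 0) = 1 - p$.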
- Use in Classification: Models the probability that the target variable equals 1.
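The probability mass function above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `bernoulli_pmf`, the symbol `p` for the success probability, and the value `0.3` are assumptions chosen for the example.

```python
def bernoulli_pmf(y: int, p: float) -> float:
    """Return P(Y = y) for a Bernoulli variable with success probability p."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # p**y * (1 - p)**(1 - y) selects p when y == 1 and 1 - p when y == 0.
    return (p ** y) * ((1.0 - p) ** (1 - y))


p = 0.3  # illustrative success probability (assumed value)
print("P(y = 1) =", bernoulli_pmf(1, p))
print("P(y = 0) =", bernoulli_pmf(0, p))
```

The two probabilities always sum to 1, which is what lets a single number $p$ describe the whole distribution.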
Modeling the Probability with Inputs
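- Goal: Model $P(y = 1 \mid \mathbf{x})$ as a function of the input features $\mathbf{x}$.
- Linear Combination: Compute a linear combination $z = \mathbf{w}^\top \mathbf{x} + b$.
- Issue: The linear combination $z$ can take any real value, but $P(y = 1 \mid \mathbf{x})$ must be between 0 and 1.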
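To make the issue concrete, the sketch below computes a linear combination for a few inputs and checks whether the result could be read as a probability. The weights, bias, and inputs are made-up values, and `linear_score` is a hypothetical helper name, not something defined in this chapter.

```python
def linear_score(w, x, b):
    """Return the linear combination w . x + b for equal-length sequences."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


w = [2.0, -1.0]  # assumed weights, for illustration only
b = 0.5          # assumed bias

for x in ([3.0, 1.0], [0.1, 0.2], [-2.0, 4.0]):
    z = linear_score(w, x, b)
    # A raw score is only usable as a probability if it falls in [0, 1].
    print(f"x = {x}, z = {z}, valid probability: {0.0 <= z <= 1.0}")
```

With these values, only the middle input yields a score inside $[0, 1]$; the other two fall well outside it, which is why the raw score cannot be used directly as a probability.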
The Logistic Function (Sigmoid Function)
Need for a Squashing Function
- Purpose: Map the linear combination to a value between 0 and 1.
- Requirements:
  - Monotonically increasing function.
  - Outputs values strictly between 0 and 1.
Logistic (Sigmoid) Function
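- Definition: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
- Properties:
  - Range: $0 < \sigma(z) < 1$ for all real $z$.
  - S-shaped Curve: As $z \to \infty$, $\sigma(z) \to 1$; as $z \to -\infty$, $\sigma(z) \to 0$.
  - Symmetry: $\sigma(-z) = 1 - \sigma(z)$.
- Visualization: (figure: the S-shaped curve of $\sigma(z)$ plotted against $z$)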
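The sigmoid and its properties can be checked numerically. The sketch below is one possible implementation using only the standard library; the two-branch form is a common numerical-stability trick assumed here, not something prescribed by this chapter. It avoids calling `math.exp` on a large positive argument, which would raise `OverflowError`.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        # exp(-z) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as exp(z) / (1 + exp(z)); exp(z) is below 1.
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    s = sigmoid(z)
    # Symmetry: sigmoid(-z) and 1 - sigmoid(z) should agree up to rounding.
    print(f"z = {z:5.1f}  sigmoid = {s:.6f}  1 - sigmoid(-z) = {1.0 - sigmoid(-z):.6f}")
```

Mathematically the output is strictly between 0 and 1, but in floating point it saturates to exactly `0.0` or `1.0` for inputs of very large magnitude. This matters later when taking logarithms of predicted probabilities.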
Alternative Functions
- Probit Function: Based on the cumulative distribution function (CDF) of the normal distribution.
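  - Definition: $\Phi(z) = \displaystyle\int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-t^{2}/2} \, dt$, the CDF of the standard normal distribution.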
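As a rough comparison, the standard normal CDF can be evaluated with the error function from Python's `math` module, using the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z / \sqrt{2})\right)$. The helper names `normal_cdf` and `sigmoid` and the sample points are assumptions made for this sketch.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sigmoid(z: float) -> float:
    """Logistic function, written in an overflow-safe form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Both functions map the real line to (0, 1) and equal 0.5 at z = 0.
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z = {z:5.1f}  normal CDF = {normal_cdf(z):.4f}  sigmoid = {sigmoid(z):.4f}")
```

Both curves are S-shaped and monotonically increasing, but on the same input scale the normal CDF approaches 0 and 1 more quickly in the tails than the sigmoid does.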