Linear Classification

info

In this chapter, you'll learn about:

  • Classification Problems: Understanding the difference between classification and regression tasks.
  • Types of Classification: Binary, multiclass, and multilabel classification.
  • Evaluation Metrics: Accuracy, confusion matrix, precision, recall, and F1-score.
  • Linear Classifiers: Introduction to linear classification models and decision boundaries.
  • Heuristics in Classification: Linear regression as a classifier (and why it fails), Fisher's linear discriminant, and support vector machines.

In previous chapters, we focused on regression problems where the target variable is continuous. In this chapter, we shift our attention to classification problems, a fundamental type of supervised learning where the target variable is categorical.

Classification tasks involve assigning inputs to one of several predefined categories. Examples include spam detection, digit recognition, and sentiment analysis. Understanding classification is crucial for a wide range of applications in machine learning and artificial intelligence.

Classification vs. Regression

Regression

  • Target Variable: Continuous real numbers.
  • Goal: Predict numerical values.
  • Examples: Predicting house prices, forecasting stock prices.

Classification

  • Target Variable: Categorical labels from a predefined set.
  • Goal: Assign inputs to discrete categories.
  • Examples: Email spam detection, image classification, sentiment analysis.

Key Differences

  • Data Type of Target: Continuous (regression) vs. categorical (classification).
  • Evaluation Metrics: Regression is evaluated with error measures such as mean squared error; classification requires metrics such as accuracy, precision, recall, and F1-score.

Examples of Classification Tasks

Spam Detection

  • Objective: Classify emails as "spam" or "not spam."
  • Features: Bag-of-words representation, where each feature indicates the presence or frequency of a word in the email.
    • Vocabulary: $V = \{v_1, v_2, \dots, v_K\}$.
    • Feature Vector: $\mathbf{x} = [x_1, x_2, \dots, x_K]$, where $x_i$ indicates the occurrence of word $v_i$.
  • Classes: "Spam" (1) or "Not Spam" (0).
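
As a concrete illustration, here is a minimal Python sketch of the bag-of-words representation above; the vocabulary and email text are made-up examples, not from any real spam corpus.

```python
# Minimal bag-of-words featurization; vocabulary and email are illustrative.
vocabulary = ["free", "winner", "meeting", "project", "offer"]

def bag_of_words(text, vocabulary):
    """Map raw text to a binary feature vector x, where x[i] = 1
    if vocabulary word v_i occurs in the text."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    return [1 if v in words else 0 for v in vocabulary]

x = bag_of_words("You are a WINNER! Claim your free offer now", vocabulary)
print(x)  # [1, 1, 0, 0, 1] -> "free", "winner", and "offer" are present
```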

Digit Recognition

  • Objective: Recognize handwritten digits from images.
  • Features: Pixel values from the image.
  • Classes: Digits from 0 to 9.
  • Note: Even though digits are numbers, they are treated as categorical labels in this context.

Product Rating Prediction

  • Objective: Predict a user's rating for a product.
  • Classes: Discrete ratings (e.g., 1 to 5 stars).
  • Considerations:
    • Regression Approach: Treat ratings as continuous values to exploit order and distance information.
    • Classification Approach: Treat ratings as categorical labels to model each rating individually.
  • Decision: Choice between regression and classification depends on the dataset and task requirements.

Types of Classification

Binary Classification

  • Definition: Classification with two possible classes.
  • Examples: Spam vs. not spam, cancer detection (malignant vs. benign).

Multiclass Classification

  • Definition: Classification with more than two classes.
  • Constraint: Each input is assigned to one and only one class.
  • Examples: Digit recognition (classes 0 to 9).

Multilabel Classification

  • Definition: Each input can be assigned multiple labels simultaneously.
  • Representation:
    • Set of Labels: $T \subseteq \{C_1, C_2, \dots, C_K\}$.
    • Binary Vector: $\mathbf{t} = [t_1, t_2, \dots, t_K]$, where $t_i \in \{0, 1\}$.
  • Examples:
    • Emotion Detection: A text may express multiple emotions (e.g., "sad" and "angry").
    • Image Tagging: An image may contain multiple objects (e.g., "mountain," "creek," "sun").

Handling Multilabel Classification

  • Simplest Approach: Decompose into multiple binary classification problems, one for each label (sketched below).
  • Advanced Approach: Model relationships between labels to improve performance.
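
A rough sketch of the simple decomposition approach: the snippet below trains one independent binary classifier per label on synthetic data. The random data and the choice of logistic regression are assumptions for illustration only.

```python
# Binary relevance: one independent binary classifier per label.
# Data is random and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 inputs, 5 features
T = (rng.random((100, 3)) < 0.3).astype(int)   # 3 labels, t_i in {0, 1}

# Train one classifier per label C_k; label correlations are ignored.
classifiers = [LogisticRegression().fit(X, T[:, k]) for k in range(T.shape[1])]

def predict_labels(x):
    """Return the binary label vector t for a single input x."""
    return [int(clf.predict(x.reshape(1, -1))[0]) for clf in classifiers]

print(predict_labels(X[0]))  # e.g. [0, 1, 0]
```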

Evaluation Metrics

Evaluating classification models requires metrics that reflect performance accurately, especially in the presence of class imbalance.

Accuracy

  • Definition: $\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$
  • Limitations:
    • Skewed Class Distributions: High accuracy can be misleading when classes are imbalanced.
    • Example: In cancer detection with 99% healthy cases, a model predicting "healthy" for all patients achieves 99% accuracy but is useless.
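
The cancer example takes only a few lines to reproduce:

```python
# "Always predict healthy": 99% accuracy, yet the sick patient is never found.
y_true = [0] * 99 + [1]      # 99 healthy patients (0), 1 sick patient (1)
y_pred = [0] * 100           # a degenerate model that always says "healthy"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)              # 0.99 -- high accuracy, useless model
```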

Confusion Matrix

A confusion matrix provides a detailed breakdown of correct and incorrect predictions.

Binary Classification Confusion Matrix

|                 | Predicted Positive  | Predicted Negative  |
| --------------- | ------------------- | ------------------- |
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |

  • True Positive (TP): Correctly predicted positives.
  • False Positive (FP): Incorrectly predicted positives (Type I error).
  • False Negative (FN): Incorrectly predicted negatives (Type II error).
  • True Negative (TN): Correctly predicted negatives.

Precision and Recall

These metrics are especially useful in imbalanced datasets.

Precision

  • Definition: $\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$
  • Interpretation: Measures the accuracy of positive predictions.

Recall (Sensitivity)

  • Definition: $\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$
  • Interpretation: Measures the ability to find all positive samples.

F1-Score

  • Definition: Harmonic mean of precision and recall: $\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
  • Purpose: Provides a single metric that balances precision and recall.
  • Use Case: Effective in evaluating models on imbalanced datasets.
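
To make these definitions concrete, the following sketch computes the confusion matrix, precision, recall, and F1-score from scratch on a small made-up set of binary predictions:

```python
# Confusion matrix and derived metrics on illustrative binary predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn)         # 3 1 1 5
print(precision, recall, f1)  # 0.75 0.75 0.75
```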

Weighted Risk and Subjectivity

  • Weighted Risk: Assigning different costs to false positives and false negatives.
  • Challenge: Determining appropriate weights is subjective and can be controversial.
  • Alternative: Use precision, recall, and F1-score to avoid subjective weighting.

Key Considerations

  • Positive Class Definition: Precision and recall focus on the positive class; in skewed datasets, the positive class should be the rare class of interest (e.g., "sick" in cancer detection).
  • Metric Selection: Choose metrics that align with the problem's objectives and consider class distribution.

Hypothesis Class for Linear Classification

Linear classifiers make decisions based on linear functions of the input features.

Binary Classification Hypothesis

  • Decision Function: $y(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w}^\top \mathbf{x} + b \geq 0 \\ 0 & \text{otherwise} \end{cases}$
  • Interpretation:
    • $\mathbf{w}^\top \mathbf{x} + b$ is the goodness score.
    • The model predicts class 1 if the goodness score exceeds a threshold (here, 0); see the sketch below.
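
A minimal sketch of this decision rule, with arbitrary illustrative weights:

```python
# Linear classifier: predict 1 when the goodness score w^T x + b >= 0.
import numpy as np

w = np.array([2.0, -1.0])   # illustrative weights
b = -0.5                    # illustrative bias

def predict(x):
    score = w @ x + b       # goodness score w^T x + b
    return 1 if score >= 0 else 0

print(predict(np.array([1.0, 0.5])))  # score = 2.0 - 0.5 - 0.5 =  1.0 -> 1
print(predict(np.array([0.0, 1.0])))  # score = 0.0 - 1.0 - 0.5 = -1.5 -> 0
```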

Decision Boundary

  • Definition: The set of points where the decision changes (i.e., where $\mathbf{w}^\top \mathbf{x} + b = 0$).
  • Properties:
    • In one-dimensional space, the decision boundary is a point (threshold).
    • In two-dimensional space, it's a line.
    • In three-dimensional space, it's a plane.
    • In $n$-dimensional space, it's a hyperplane.

Geometric Interpretation

  • Normal Vector ($\mathbf{w}$): Perpendicular to the decision boundary.
  • Classification Regions:
    • Positive Side: Points where $\mathbf{w}^\top \mathbf{x} + b \geq 0$.
    • Negative Side: Points where $\mathbf{w}^\top \mathbf{x} + b < 0$.
  • Decision Function: A step function based on the sign of the goodness score.

Visualizing Decision Boundaries

  • One-Dimensional Example:
    • Threshold at $x = -\frac{b}{w}$.
  • Two-Dimensional Example:
    • Decision boundary is the line $w_1 x_1 + w_2 x_2 + b = 0$.
  • Higher Dimensions:
    • Decision boundary is an $(n-1)$-dimensional hyperplane.
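
The boundary can be recovered directly from $\mathbf{w}$ and $b$; the values below are illustrative:

```python
# Recovering the decision boundary from (w, b); values are illustrative.
import numpy as np

# One dimension: the boundary is the point x = -b / w.
w1, b1 = 2.0, -1.0
print(-b1 / w1)  # threshold at x = 0.5

# Two dimensions: the boundary is the line w1*x1 + w2*x2 + b = 0,
# i.e. x2 = -(w1*x1 + b) / w2 (assuming w2 != 0).
w, b = np.array([1.0, 2.0]), -2.0
x1 = np.linspace(-1, 1, 5)
x2 = -(w[0] * x1 + b) / w[1]
print(list(zip(x1, x2)))  # points lying on the boundary line
```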

Attempts to Use Linear Regression for Classification

Linear Regression Model

  • Form: $y(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$
  • Issue:
    • Predicts continuous values, which are not directly suited to categorical targets.
    • Least squares tries to fit the exact target values (0 or 1), leading to inappropriate predictions for classification.

Problems with Linear Regression for Classification

  • Outlier Influence: Linear regression is sensitive to outliers, which can pull the fitted line and shift the decision boundary.
  • Penalty on Well-Classified Samples: Squared error penalizes points that lie far on the correct side of the boundary, even though they are already classified correctly.
  • Thresholding Issues: Converting continuous outputs to classes with a fixed threshold (e.g., 0.5) is arbitrary and breaks down once the fit is skewed.

Illustrative Example

  • Scenario: One-dimensional data with classes labeled as 0 or 1.
  • Linear Regression Fit: Attempts to fit a straight line through the target values.
  • Result: Poor classification performance, especially with outliers.
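
The sketch below reproduces this failure on synthetic one-dimensional data: a single extreme (but clearly positive) sample drags the least-squares line and shifts the 0.5 threshold past a genuinely positive point.

```python
# Least squares on 0/1 labels: a distant positive outlier shifts the boundary.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
t = np.array([0, 0, 0, 1, 1, 1])

def boundary(x, t):
    """Fit y = w*x + b by least squares and return where it crosses 0.5."""
    w, b = np.polyfit(x, t, deg=1)
    return (0.5 - b) / w

print(boundary(x, t))              # 4.5: a sensible threshold

x_out = np.append(x, 100.0)        # extreme, easily classified positive sample
t_out = np.append(t, 1)
print(boundary(x_out, t_out))      # ~6.55: x = 6 is now misclassified
```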

Alternative Approaches

  • Squashing Functions: Apply functions like the sigmoid to map outputs to probabilities between 0 and 1.
  • Classification-Specific Models: Use models designed for classification tasks, such as logistic regression.
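
For instance, passing the goodness score through a sigmoid yields an output in $(0, 1)$ that can be read as a probability; the weights below are arbitrary illustrative values.

```python
# Squashing the unbounded linear score into (0, 1) with a sigmoid.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w, b = np.array([2.0, -1.0]), -0.5
x = np.array([1.0, 0.5])
score = w @ x + b        # unbounded goodness score (here 1.0)
print(sigmoid(score))    # ~0.731: probability-like output in (0, 1)
```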

Heuristics for Linear Classification

Fisher's Linear Discriminant

  • Objective: Find a projection that maximizes the separation between classes.
  • Method:
    • Between-Class Variance: Maximize the variance between class means.
    • Within-Class Variance: Minimize the variance within each class.
  • Result: A linear decision boundary that separates classes effectively.
  • Note: It's a discriminant method, not a probabilistic model.
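
A compact sketch of the two-class case, where the discriminant direction is $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ with $\mathbf{S}_W$ the within-class scatter matrix; the Gaussian toy data is an assumption for illustration.

```python
# Fisher's linear discriminant on two illustrative Gaussian clusters.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))  # class 0
X2 = rng.normal(loc=[3, 2], scale=1.0, size=(50, 2))  # class 1

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter

w = np.linalg.solve(S_w, m2 - m1)  # direction maximizing class separation
w /= np.linalg.norm(w)
print(w)  # unit projection direction
```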

Support Vector Machines (SVM)

  • Concept: Maximize the margin between classes.
  • Margin: The distance between the decision boundary and the nearest data points from any class.
  • Support Vectors: Data points that lie on the margin and influence the position of the decision boundary.
  • Advantages:
    • Effective in high-dimensional spaces.
    • Robust to outliers that lie far from the boundary, since only the support vectors influence the decision boundary.
  • Discussion: We'll explore SVMs in more detail in a later chapter; the snippet below is just a preview.
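
As that preview, the sketch below fits scikit-learn's LinearSVC on linearly separable toy data and measures the distance from the learned boundary to the nearest points; the data and hyperparameters are illustrative assumptions.

```python
# Preview of the margin idea with scikit-learn's LinearSVC on toy data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),   # class 0 cluster
               rng.normal([3, 3], 0.5, size=(50, 2))])  # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

clf = LinearSVC(C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Distance from each point to the hyperplane w^T x + b = 0; the minimum
# approximates the margin the SVM tries to maximize.
distances = np.abs(X @ w + b) / np.linalg.norm(w)
print(distances.min())
```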

Conclusion

In this chapter, we've introduced the fundamentals of classification problems and the challenges associated with them. We've discussed the limitations of using linear regression for classification and highlighted the importance of selecting appropriate evaluation metrics, especially in the presence of class imbalance.

Understanding the hypothesis class for linear classification sets the foundation for more advanced models. In subsequent chapters, we'll delve into probabilistic models for classification, such as logistic regression and more sophisticated techniques.

Recap

  • Classification Problems: Involve predicting categorical target variables.
  • Types of Classification: Binary, multiclass, and multilabel.
  • Evaluation Metrics: Accuracy, precision, recall, and F1-score are crucial for assessing model performance.
  • Linear Classifiers:
    • Decision boundaries separate classes based on linear functions.
    • The normal vector is key in defining the orientation of the decision boundary.
  • Challenges with Linear Regression:
    • Not suitable for classification due to continuous outputs.
    • Sensitive to outliers and may not provide meaningful decision boundaries.
  • Heuristics for Classification:
    • Fisher's linear discriminant optimizes class separability.
    • Support vector machines focus on maximizing the margin between classes.