Linear Classification

info

In this chapter, you'll learn about:

  • Classification Problems: Understanding the difference between classification and regression tasks.
  • Types of Classification: Binary, multiclass, and multilabel classification.
  • Evaluation Metrics: Accuracy, confusion matrix, precision, recall, and F1-score.
  • Linear Classifiers: Introduction to linear classification models and decision boundaries.
  • Heuristics in Classification: Linear regression as a classifier (and why it fails), Fisher's linear discriminant, and support vector machines.

In previous chapters, we focused on regression problems where the target variable is continuous. In this chapter, we shift our attention to classification problems, a fundamental type of supervised learning where the target variable is categorical.

Classification tasks involve assigning inputs to one of several predefined categories. Examples include spam detection, digit recognition, and sentiment analysis. Understanding classification is crucial for a wide range of applications in machine learning and artificial intelligence.

Classification vs. Regression

Regression

  • Target Variable: Continuous real numbers.
  • Goal: Predict numerical values.
  • Examples: Predicting house prices, forecasting stock prices.

Classification

  • Target Variable: Categorical labels from a predefined set.
  • Goal: Assign inputs to discrete categories.
  • Examples: Email spam detection, image classification, sentiment analysis.

Key Differences

  • Data Type of Target: Continuous (regression) vs. categorical (classification).
  • Evaluation Metrics: Regression is evaluated with error measures such as mean squared error; classification requires metrics such as accuracy, precision, recall, and F1-score.

Examples of Classification Tasks

Spam Detection

  • Objective: Classify emails as "spam" or "not spam."
  • Features: Bag-of-words representation, where each feature indicates the presence or frequency of a word in the email.
    • Vocabulary: $V = \{v_1, v_2, \dots, v_K\}$.
    • Feature Vector: $\mathbf{x} = [x_1, x_2, \dots, x_K]$, where $x_i$ indicates the occurrence of word $v_i$.
  • Classes: "Spam" (1) or "Not Spam" (0).
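
As a concrete illustration, here is a minimal Python sketch of the bag-of-words representation above; the vocabulary and email text are made-up examples, not from any real spam corpus.

```python
# Minimal bag-of-words featurization; vocabulary and email are illustrative.
vocabulary = ["free", "winner", "meeting", "project", "offer"]

def bag_of_words(text, vocabulary):
    """Map raw text to a binary feature vector x, where x[i] = 1
    if vocabulary word v_i occurs in the text."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    return [1 if v in words else 0 for v in vocabulary]

x = bag_of_words("You are a WINNER! Claim your free offer now", vocabulary)
print(x)  # [1, 1, 0, 0, 1] -> "free", "winner", and "offer" are present
```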

Digit Recognition

  • Objective: Recognize handwritten digits from images.
  • Features: Pixel values from the image.
  • Classes: Digits from 0 to 9.
  • Note: Even though digits are numbers, they are treated as categorical labels in this context.

Product Rating Prediction

  • Objective: Predict a user's rating for a product.
  • Classes: Discrete ratings (e.g., 1 to 5 stars).
  • Considerations:
    • Regression Approach: Treat ratings as continuous values to exploit order and distance information.
    • Classification Approach: Treat ratings as categorical labels to model each rating individually.
  • Decision: Choice between regression and classification depends on the dataset and task requirements.

Types of Classification

Binary Classification

  • Definition: Classification with two possible classes.
  • Examples: Spam vs. not spam, cancer detection (malignant vs. benign).

Multiclass Classification

  • Definition: Classification with more than two classes.
  • Constraint: Each input is assigned to one and only one class.
  • Examples: Digit recognition (classes 0 to 9).

Multilabel Classification

  • Definition: Each input can be assigned multiple labels simultaneously.
  • Representation:
    • Set of Labels: $T \subseteq \{C_1, C_2, \dots, C_K\}$.
    • Binary Vector: $\mathbf{t} = [t_1, t_2, \dots, t_K]$, where $t_i \in \{0, 1\}$.
  • Examples:
    • Emotion Detection: A text may express multiple emotions (e.g., "sad" and "angry").
    • Image Tagging: An image may contain multiple objects (e.g., "mountain," "creek," "sun").

Handling Multilabel Classification

  • Simplest Approach: Decompose into multiple binary classification problems, one for each label (sketched below).
  • Advanced Approach: Model relationships between labels to improve performance.
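
A rough sketch of the simple decomposition approach: the snippet below trains one independent binary classifier per label on synthetic data. The random data and the choice of logistic regression are assumptions for illustration only.

```python
# Binary relevance: one independent binary classifier per label.
# Data is random and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # 100 inputs, 5 features
T = (rng.random((100, 3)) < 0.3).astype(int)   # 3 labels, t_i in {0, 1}

# Train one classifier per label C_k; label correlations are ignored.
classifiers = [LogisticRegression().fit(X, T[:, k]) for k in range(T.shape[1])]

def predict_labels(x):
    """Return the binary label vector t for a single input x."""
    return [int(clf.predict(x.reshape(1, -1))[0]) for clf in classifiers]

print(predict_labels(X[0]))  # e.g. [0, 1, 0]
```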

Evaluation Metrics

Evaluating classification models requires metrics that reflect performance accurately, especially in the presence of class imbalance.

Accuracy

  • Definition: $\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$
  • Limitations:
    • Skewed Class Distributions: High accuracy can be misleading when classes are imbalanced.
    • Example: In cancer detection with 99% healthy cases, a model predicting "healthy" for all patients achieves 99% accuracy but is useless.
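
The cancer example takes only a few lines to reproduce:

```python
# "Always predict healthy": 99% accuracy, yet the sick patient is never found.
y_true = [0] * 99 + [1]      # 99 healthy patients (0), 1 sick patient (1)
y_pred = [0] * 100           # a degenerate model that always says "healthy"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)              # 0.99 -- high accuracy, useless model
```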

Confusion Matrix

A confusion matrix provides a detailed breakdown of correct and incorrect predictions.

Binary Classification Confusion Matrix

|                 | Predicted Positive  | Predicted Negative  |
| --------------- | ------------------- | ------------------- |
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |

  • True Positive (TP): Correctly predicted positives.
  • False Positive (FP): Incorrectly predicted positives (Type I error).
  • False Negative (FN): Incorrectly predicted negatives (Type II error).
  • True Negative (TN): Correctly predicted negatives.

Precision and Recall

These metrics are especially useful in imbalanced datasets.

Precision

  • Definition: $\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$
  • Interpretation: Measures the accuracy of positive predictions.

Recall (Sensitivity)

  • Definition: $\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$
  • Interpretation: Measures the ability to find all positive samples.

F1-Score

  • Definition: Harmonic mean of precision and recall: $\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
  • Purpose: Provides a single metric that balances precision and recall.
  • Use Case: Effective in evaluating models on imbalanced datasets.
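
To make these definitions concrete, the following sketch computes the confusion matrix, precision, recall, and F1-score from scratch on a small made-up set of binary predictions:

```python
# Confusion matrix and derived metrics on illustrative binary predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn)         # 3 1 1 5
print(precision, recall, f1)  # 0.75 0.75 0.75
```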

Weighted Risk and Subjectivity

  • Weighted Risk: Assigning different costs to false positives and false negatives.
  • Challenge: Determining appropriate weights is subjective and can be controversial.
  • Alternative: Use precision, recall, and F1-score to avoid subjective weighting.

Key Considerations

  • Positive Class Definition: Precision and recall focus on the positive class; in skewed datasets, the positive class should be the rare class of interest (e.g., "sick" in cancer detection).
  • Metric Selection: Choose metrics that align with the problem's objectives and consider class distribution.

Hypothesis Class for Linear Classification

Linear classifiers make decisions based on linear functions of the input features.

Binary Classification Hypothesis

  • Decision Function: $y(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w}^\top \mathbf{x} + b \geq 0 \\ 0 & \text{otherwise} \end{cases}$
  • Interpretation:
    • $\mathbf{w}^\top \mathbf{x} + b$ is the goodness score.
    • The model predicts class 1 if the goodness score exceeds a threshold (here, 0); see the sketch below.
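
A minimal sketch of this decision rule, with arbitrary illustrative weights:

```python
# Linear classifier: predict 1 when the goodness score w^T x + b >= 0.
import numpy as np

w = np.array([2.0, -1.0])   # illustrative weights
b = -0.5                    # illustrative bias

def predict(x):
    score = w @ x + b       # goodness score w^T x + b
    return 1 if score >= 0 else 0

print(predict(np.array([1.0, 0.5])))  # score = 2.0 - 0.5 - 0.5 =  1.0 -> 1
print(predict(np.array([0.0, 1.0])))  # score = 0.0 - 1.0 - 0.5 = -1.5 -> 0
```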

Decision Boundary

  • Definition: The set of points where the decision changes (i.e., where $\mathbf{w}^\top \mathbf{x} + b = 0$).
  • Properties:
    • In one-dimensional space, the decision boundary is a point (threshold).
    • In two-dimensional space, it's a line.
    • In three-dimensional space, it's a plane.
    • In $n$-dimensional space, it's a hyperplane.

Geometric Interpretation

  • Normal Vector ($\mathbf{w}$): Perpendicular to the decision boundary.
  • Classification Regions:
    • Positive Side: Points where $\mathbf{w}^\top \mathbf{x} + b \geq 0$.
    • Negative Side: Points where $\mathbf{w}^\top \mathbf{x} + b < 0$.
  • Decision Function: A step function based on the sign of the goodness score.

Visualizing Decision Boundaries

  • One-Dimensional Example:
    • Threshold at $x = -\frac{b}{w}$.
  • Two-Dimensional Example:
    • Decision boundary is the line $w_1 x_1 + w_2 x_2 + b = 0$.
  • Higher Dimensions:
    • Decision boundary is an $(n-1)$-dimensional hyperplane.
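
The boundary can be recovered directly from $\mathbf{w}$ and $b$; the values below are illustrative:

```python
# Recovering the decision boundary from (w, b); values are illustrative.
import numpy as np

# One dimension: the boundary is the point x = -b / w.
w1, b1 = 2.0, -1.0
print(-b1 / w1)  # threshold at x = 0.5

# Two dimensions: the boundary is the line w1*x1 + w2*x2 + b = 0,
# i.e. x2 = -(w1*x1 + b) / w2 (assuming w2 != 0).
w, b = np.array([1.0, 2.0]), -2.0
x1 = np.linspace(-1, 1, 5)
x2 = -(w[0] * x1 + b) / w[1]
print(list(zip(x1, x2)))  # points lying on the boundary line
```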

Attempts to Use Linear Regression for Classification

Linear Regression Model

  • Form: $y(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$
  • Issue:
    • Predicts continuous values, which are not directly suited to categorical targets.
    • Least squares tries to fit the exact target values (0 or 1), leading to inappropriate predictions for classification.

Problems with Linear Regression for Classification

  • Outlier Influence: Linear regression is sensitive to outliers, which can pull the fitted line and shift the decision boundary.
  • Penalty on Well-Classified Samples: Squared error penalizes points that lie far on the correct side of the boundary, even though they are already classified correctly.
  • Thresholding Issues: Converting continuous outputs to classes with a fixed threshold (e.g., 0.5) is arbitrary and breaks down once the fit is skewed.

Illustrative Example

  • Scenario: One-dimensional data with classes labeled as 0 or 1.
  • Linear Regression Fit: Attempts to fit a straight line through the target values.
  • Result: Poor classification performance, especially with outliers.
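
The sketch below reproduces this failure on synthetic one-dimensional data: a single extreme (but clearly positive) sample drags the least-squares line and shifts the 0.5 threshold past a genuinely positive point.

```python
# Least squares on 0/1 labels: a distant positive outlier shifts the boundary.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
t = np.array([0, 0, 0, 1, 1, 1])

def boundary(x, t):
    """Fit y = w*x + b by least squares and return where it crosses 0.5."""
    w, b = np.polyfit(x, t, deg=1)
    return (0.5 - b) / w

print(boundary(x, t))              # 4.5: a sensible threshold

x_out = np.append(x, 100.0)        # extreme, easily classified positive sample
t_out = np.append(t, 1)
print(boundary(x_out, t_out))      # ~6.55: x = 6 is now misclassified
```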

Alternative Approaches

  • Squashing Functions: Apply functions like the sigmoid to map outputs to probabilities between 0 and 1.
  • Classification-Specific Models: Use models designed for classification tasks, such as logistic regression.
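
For instance, passing the goodness score through a sigmoid yields an output in $(0, 1)$ that can be read as a probability; the weights below are arbitrary illustrative values.

```python
# Squashing the unbounded linear score into (0, 1) with a sigmoid.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

w, b = np.array([2.0, -1.0]), -0.5
x = np.array([1.0, 0.5])
score = w @ x + b        # unbounded goodness score (here 1.0)
print(sigmoid(score))    # ~0.731: probability-like output in (0, 1)
```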

Heuristics for Linear Classification

Fisher's Linear Discriminant

  • Objective: Find a projection that maximizes the separation between classes.
  • Method:
    • Between-Class Variance: Maximize the variance between class means.
    • Within-Class Variance: Minimize the variance within each class.
  • Result: A linear decision boundary that separates classes effectively.
  • Note: It's a discriminant method, not a probabilistic model.
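
A compact sketch of the two-class case, where the discriminant direction is $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$ with $\mathbf{S}_W$ the within-class scatter matrix; the Gaussian toy data is an assumption for illustration.

```python
# Fisher's linear discriminant on two illustrative Gaussian clusters.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))  # class 0
X2 = rng.normal(loc=[3, 2], scale=1.0, size=(50, 2))  # class 1

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter

w = np.linalg.solve(S_w, m2 - m1)  # direction maximizing class separation
w /= np.linalg.norm(w)
print(w)  # unit projection direction
```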

Support Vector Machines (SVM)

  • Concept: Maximize the margin between classes.
  • Margin: The distance between the decision boundary and the nearest data points from any class.
  • Support Vectors: Data points that lie on the margin and influence the position of the decision boundary.
  • Advantages:
    • Effective in high-dimensional spaces.
    • Robust to outliers that lie far from the boundary, since only the support vectors influence the decision boundary.
  • Discussion: We'll explore SVMs in more detail in a later chapter; the snippet below is just a preview.
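
As that preview, the sketch below fits scikit-learn's LinearSVC on linearly separable toy data and measures the distance from the learned boundary to the nearest points; the data and hyperparameters are illustrative assumptions.

```python
# Preview of the margin idea with scikit-learn's LinearSVC on toy data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),   # class 0 cluster
               rng.normal([3, 3], 0.5, size=(50, 2))])  # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

clf = LinearSVC(C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Distance from each point to the hyperplane w^T x + b = 0; the minimum
# approximates the margin the SVM tries to maximize.
distances = np.abs(X @ w + b) / np.linalg.norm(w)
print(distances.min())
```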

Conclusion

In this chapter, we've introduced the fundamentals of classification problems and the challenges associated with them. We've discussed the limitations of using linear regression for classification and highlighted the importance of selecting appropriate evaluation metrics, especially in the presence of class imbalance.

Understanding the hypothesis class for linear classification sets the foundation for more advanced models. In subsequent chapters, we'll delve into probabilistic models for classification, such as logistic regression and more sophisticated techniques.

Recap

  • Classification Problems: Involve predicting categorical target variables.
  • Types of Classification: Binary, multiclass, and multilabel.
  • Evaluation Metrics: Accuracy, precision, recall, and F1-score are crucial for assessing model performance.
  • Linear Classifiers:
    • Decision boundaries separate classes based on linear functions.
    • The normal vector is key in defining the orientation of the decision boundary.
  • Challenges with Linear Regression:
    • Not suitable for classification due to continuous outputs.
    • Sensitive to outliers and may not provide meaningful decision boundaries.
  • Heuristics for Classification:
    • Fisher's linear discriminant optimizes class separability.
    • Support vector machines focus on maximizing the margin between classes.