Linear Regression
In this chapter you'll be introduced to:
- Linear Regression: Understanding the fundamentals of linear regression in supervised learning.
- Hypothesis Class for Linear Regression: Defining the set of functions used in linear regression models.
- Linear Regression Model: Formulating the linear model with weights and bias.
- Affine Transformations and Extended Features: Exploring the relationship between linear and affine transformations.
- Importance of Linear Models: Discussing why linear models are foundational and their relevance to more complex models.
- Training Criteria: Introducing the mean squared error loss function used for training linear regression models.
- Matrix-Vector Representation: Representing linear regression models and loss functions using matrices and vectors for computational efficiency.
- Future Directions: Preparing for discussions on convexity, optimization, and theoretical properties of linear regression.
Linear Regression
Linear regression is one of the most basic algorithms in supervised learning, used in regression tasks where the target variable is continuous. It involves learning a linear relationship between input features and the target output.
In a supervised learning setting, we have:
- Training Data: A set of examples $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$, where:
  - $\mathbf{x}^{(i)} \in \mathbb{R}^d$ is the input vector for the $i$-th example.
  - $y^{(i)} \in \mathbb{R}$ is the target output for the $i$-th example.
- Goal: Learn a function (hypothesis) $h$ that maps inputs to outputs: $\hat{y} = h(\mathbf{x})$.
- Prediction: For a new input $\mathbf{x}$, predict the output $\hat{y} = h(\mathbf{x})$.
Hypothesis Class for Linear Regression
The hypothesis class in linear regression consists of all functions that can be represented as a linear combination of the input features plus a bias term. Formally, the hypothesis class is defined as:

$$\mathcal{H} = \left\{ h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b \;\middle|\; \mathbf{w} \in \mathbb{R}^d,\; b \in \mathbb{R} \right\}$$

- $\mathbf{w} \in \mathbb{R}^d$: Weight vector (parameters) of the model.
- $b \in \mathbb{R}$: Bias term (intercept).
By specifying this hypothesis class, we're considering all possible linear functions parameterized by $\mathbf{w}$ and $b$.
The Linear Regression Model
The linear regression model predicts the output $\hat{y}$ (also denoted as $h(\mathbf{x})$) as:

$$\hat{y} = h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$$

- $\mathbf{x} \in \mathbb{R}^d$: Input feature vector.
- $\mathbf{w} \in \mathbb{R}^d$: Weight vector.
- $b \in \mathbb{R}$: Bias term.
For example, with a one-dimensional input:
- Input: $x \in \mathbb{R}$
- Model: $\hat{y} = wx + b$
- Hypothesis Class: All lines (except vertical lines) in 2D space.
Parameters
- Weights ($\mathbf{w}$): Coefficients for each input feature.
- Bias ($b$): Allows the regression line (or hyperplane) to shift upward or downward.
Linear Transformation
A linear transformation has no bias term:

$$h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$$

- Properties:
  - Passes through the origin.
  - Does not allow shifting.
Affine Transformation
An affine transformation includes a bias term:

$$h(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$$

- Properties:
  - Allows shifting of the function.
  - More flexible in fitting data.
We can represent affine transformations as linear transformations in a higher-dimensional space by introducing an augmented feature vector.
Augmented Feature Vector
Define:
- Extended Input Vector: $\tilde{\mathbf{x}} = [1, x_1, \dots, x_d]^\top \in \mathbb{R}^{d+1}$
- Extended Weight Vector: $\tilde{\mathbf{w}} = [b, w_1, \dots, w_d]^\top \in \mathbb{R}^{d+1}$
Then, the model becomes:

$$\hat{y} = \tilde{\mathbf{w}}^\top \tilde{\mathbf{x}}$$
This representation simplifies notation and allows us to treat the bias term as part of the weight vector.
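As a quick check, here is a minimal NumPy sketch (data and variable names are illustrative, not from a specific library) showing that the affine form and the augmented linear form produce the same prediction:

```python
import numpy as np

# Minimal sketch: the affine model w^T x + b and its augmented
# linear form give identical predictions.
d = 3
rng = np.random.default_rng(0)
x = rng.normal(size=d)   # input vector
w = rng.normal(size=d)   # weight vector
b = 0.5                  # bias term

y_hat_affine = w @ x + b                 # affine: w^T x + b

x_tilde = np.concatenate(([1.0], x))     # extended input [1, x_1, ..., x_d]
w_tilde = np.concatenate(([b], w))       # extended weights [b, w_1, ..., w_d]
y_hat_linear = w_tilde @ x_tilde         # pure linear map in R^(d+1)

assert np.isclose(y_hat_affine, y_hat_linear)
```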
Why Linear Models?
Linear models are straightforward and easy to interpret. Additionally, they provide a foundation for understanding more complex models.
Nonlinear Relationships
While many real-world relationships are nonlinear, linear models can approximate nonlinear functions through:
Nonlinear Feature Transformations
- Polynomial Features: Include $x^2$, $x^3$, etc., as additional features (see the sketch after this list).
- Feature Engineering: Manually create features that capture nonlinearities.
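For instance, a quadratic trend can be captured by a model that is still linear in its parameters. Below is a minimal sketch with synthetic data of our own, using NumPy's least-squares solver to minimize the squared error over polynomial features:

```python
import numpy as np

# Sketch with synthetic data: a quadratic trend fit by a model that is
# linear in its parameters, using polynomial features [1, x, x^2].
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.normal(size=50)

X = np.column_stack([np.ones_like(x), x, x**2])  # polynomial design matrix

# Least-squares fit; this minimizes the squared-error criterion
# introduced in the training section below.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to the true coefficients [1.0, -2.0, 0.5]
```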
Kernel Methods
- Nonlinear Kernels: Use kernel functions to map inputs into a higher-dimensional space where linear regression can be applied.
Neural Networks
- Nonlinear Activation Functions: Introduce nonlinearity through activation functions in neural networks.
- Deep Learning: Stack multiple layers to capture complex patterns.
Understanding linear regression is essential before moving on to more advanced models like neural networks and support vector machines.
Training Criteria for Linear Regression
The goal is to find the parameters $\mathbf{w}$ and $b$ that minimize the difference between the predicted outputs and the true targets.
Mean Squared Error (MSE) Loss
The MSE loss function is commonly used:

$$L(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}^{(i)} - y^{(i)} \right)^2$$

- $\hat{y}^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)} + b$: Predicted output for the $i$-th example.
- $y^{(i)}$: True target for the $i$-th example.
- $n$: Number of training examples.
Loss for a Single Example
For a single training example:

$$\ell(\hat{y}, y) = (\hat{y} - y)^2$$
Why Use the Mean Squared Error?
- Penalizes Larger Errors More Severely: Squaring the errors emphasizes larger discrepancies.
- Mathematical Convenience: Differentiable and leads to closed-form solutions.
- Theoretical Justification: Relates to the assumption of normally distributed errors (sketched after this list).
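To make the last point concrete, here is the standard argument in outline (a sketch, assuming targets are generated by the linear model plus i.i.d. Gaussian noise, $y^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)} + b + \epsilon^{(i)}$ with $\epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2)$). The negative log-likelihood of the data is

$$-\log p\left(y^{(1)}, \dots, y^{(n)} \mid \mathbf{w}, b\right) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y^{(i)} - \hat{y}^{(i)}\right)^2 + \frac{n}{2} \log\left(2\pi\sigma^2\right),$$

so minimizing the negative log-likelihood over $\mathbf{w}$ and $b$ is equivalent to minimizing the sum of squared errors, i.e., the MSE.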
Visualizing the Loss Function
Consider a one-dimensional input:
- Data Points: Plotted on a scatter plot with input on the horizontal axis and target on the vertical axis.
- Regression Line: Represents the model's predictions.
- Residuals: Vertical distances between data points and the regression line.
The MSE loss calculates the average of the squared residuals.
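In code, the residuals and the MSE for a candidate line can be computed directly; the toy data below is ours, chosen only for illustration:

```python
import numpy as np

# Sketch with toy data: residuals are the vertical gaps between the
# targets and a candidate regression line; MSE averages their squares.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])
w, b = 1.0, 0.0                  # candidate line: y_hat = w * x + b

y_hat = w * x + b
residuals = y - y_hat            # vertical distances to the line
mse = np.mean(residuals ** 2)    # mean squared error
print(mse)
```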
Matrix-Vector Representation
Representing the linear regression model and loss function using matrices and vectors simplifies computations, especially for large datasets.
Design Matrix
- Definition: The design matrix $\mathbf{X}$ contains all input vectors stacked as rows.
- Augmented with Bias Term: If including the bias term, prepend a column of ones to $\mathbf{X}$.
Formally:

$$\mathbf{X} = \begin{bmatrix} 1 & (\mathbf{x}^{(1)})^\top \\ \vdots & \vdots \\ 1 & (\mathbf{x}^{(n)})^\top \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}$$
Weight Vector
Collect the bias and weights into the extended weight vector:

$$\tilde{\mathbf{w}} = [b, w_1, \dots, w_d]^\top \in \mathbb{R}^{d+1}$$
Predictions
Compute predictions for all training examples simultaneously:

$$\hat{\mathbf{y}} = \mathbf{X} \tilde{\mathbf{w}}$$

- $\hat{\mathbf{y}} \in \mathbb{R}^n$: Vector of predicted outputs $\hat{y}^{(1)}, \dots, \hat{y}^{(n)}$.
Loss Function
Express the MSE loss in matrix form:

$$L(\tilde{\mathbf{w}}) = \frac{1}{n} \left\| \mathbf{X} \tilde{\mathbf{w}} - \mathbf{y} \right\|_2^2$$

- $\mathbf{y} \in \mathbb{R}^n$: Vector of true targets $y^{(1)}, \dots, y^{(n)}$.
- $\| \cdot \|_2$: Euclidean norm (L2 norm).
Advantages
- Computational Efficiency: Enables the use of vectorized operations.
- Simplifies Derivatives: Easier to compute gradients for optimization.
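A minimal sketch of this vectorized form, on synthetic data of our own:

```python
import numpy as np

# Sketch of the matrix-vector form on synthetic data (names are ours).
rng = np.random.default_rng(2)
n, d = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])  # design matrix
w_tilde = rng.normal(size=d + 1)   # extended weights [b, w_1, ..., w_d]
y = rng.normal(size=n)             # placeholder targets

y_hat = X @ w_tilde                # all n predictions at once
mse = np.mean((y_hat - y) ** 2)    # (1/n) * ||X w~ - y||_2^2
```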
Recap
In this chapter, we've covered:
- Linear Regression Fundamentals: Understanding the linear regression model and its components.
- Hypothesis Class Definition: Specifying the set of linear functions used in linear regression.
- Affine Transformations: Incorporating the bias term and representing it using extended feature vectors.
- Importance of Linear Models: Recognizing the foundational role of linear models in machine learning.
- Training Criteria: Introducing the mean squared error loss function and its significance.
- Matrix-Vector Representation: Leveraging matrices and vectors for efficient computation and notation.