Generalized Linear Models
In this chapter, you'll learn about:
- Generalized Linear Models (GLMs): A unified view of linear regression, logistic regression, and other probabilistic linear models.
- The Three Parts of a GLM: The random component, linear predictor, and link function.
- Exponential-Family Distributions: Why Gaussian, Bernoulli, and Poisson models fit naturally into the same framework.
- Canonical Links and Likelihoods: How the choice of link determines the loss and optimization problem.
- Practical Modeling Choices: When GLMs are appropriate and when you need richer nonlinear models.
In the previous chapters, we studied linear regression, logistic regression, and multinomial regression as if they were separate algorithms. In fact, they are closely related: a generalized linear model (GLM) is a common template that describes all of them in one language.
The key idea is simple: we keep a linear predictor in the features, but we allow the target variable to follow a distribution that matches the task. Continuous real-valued targets suggest Gaussian models, binary targets suggest Bernoulli models, and count data often suggests Poisson models.
Why We Need GLMs
Ordinary linear regression assumes a Gaussian conditional distribution around a linear mean:

$$y \mid x \sim \mathcal{N}(w^\top x + b, \; \sigma^2).$$
This works well when:
- The target is real-valued.
- The conditional noise is roughly symmetric.
- Predicting any real number is acceptable.
It breaks down when the output has structural constraints:
- Binary labels must stay in $\{0, 1\}$ or be interpreted as probabilities in $[0, 1]$.
- Counts must be nonnegative integers.
- Multiclass outputs must form a valid probability distribution whose entries sum to one.
GLMs solve this by keeping the linear part, but changing how the mean of the target distribution is connected to that linear predictor.
The Three Parts of a GLM
A GLM has three ingredients.
1. Random Component
We choose a conditional distribution for the target:

$$y \mid x \sim p(y; \mu(x)),$$

where $\mu(x) = \mathbb{E}[y \mid x]$ is the conditional mean.
Typical choices are:
- Gaussian for real-valued regression targets.
- Bernoulli for binary labels.
- Poisson for counts.
2. Systematic Component
We define a linear predictor:

$$\eta = w^\top x + b.$$
This is the same basic form we used in linear and logistic regression.
3. Link Function
The link function $g$ connects the conditional mean to the linear predictor:

$$g(\mu) = \eta.$$

Equivalently,

$$\mu = g^{-1}(\eta).$$
The inverse link is what turns the raw score into a valid mean parameter for the chosen distribution.
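To make the three ingredients concrete, here is a minimal sketch in NumPy (the function names and the specific weights are illustrative, not part of any standard API):

```python
import numpy as np

def glm_predict_mean(w, b, x, inverse_link):
    """Random + systematic + link: map features to a valid mean parameter."""
    eta = np.dot(w, x) + b       # systematic component: linear predictor
    return inverse_link(eta)     # inverse link turns the raw score into a mean

# Inverse links for the three distributions discussed in this chapter.
identity = lambda eta: eta                          # Gaussian mean
sigmoid  = lambda eta: 1.0 / (1.0 + np.exp(-eta))   # Bernoulli probability
exp_link = lambda eta: np.exp(eta)                  # Poisson rate

w, b, x = np.array([0.5, -1.0]), 0.2, np.array([2.0, 1.0])
print(glm_predict_mean(w, b, x, sigmoid))  # a probability in (0, 1)
```

Swapping the inverse link changes the output space while the linear predictor stays the same.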
Exponential-Family View
Many GLMs use target distributions from the exponential family, whose densities or mass functions can be written in the form

$$p(y; \theta) = h(y)\, \exp\big(\theta\, T(y) - A(\theta)\big).$$
You do not need this formula to use GLMs day to day, but it explains why so many models share the same optimization structure:
- The negative log-likelihood is often convex.
- The gradient takes a clean residual form.
- The mean $\mu$ is tightly connected to the natural parameter $\theta$ of the distribution.
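As a concrete instance, the Bernoulli mass function can be rewritten in exponential-family form:

$$p(y; \mu) = \mu^{y}(1 - \mu)^{1 - y} = \exp\!\left( y \log\frac{\mu}{1 - \mu} + \log(1 - \mu) \right),$$

so the natural parameter is the logit of $\mu$. Inverting that relationship gives the sigmoid, which is exactly the link/inverse-link pair used by logistic regression.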
Common GLM Examples
The table below summarizes the most important cases in this section.
| Task | Target distribution | Mean | Link function |
|---|---|---|---|
| Linear regression | Gaussian | $\mu = w^\top x + b$ | Identity: $g(\mu) = \mu$ |
| Logistic regression | Bernoulli | $\mu = \sigma(w^\top x + b)$ | Logit: $g(\mu) = \log\frac{\mu}{1 - \mu}$ |
| Multinomial regression | Categorical / multinomial | Class probabilities | Softmax / generalized logit |
| Poisson regression | Poisson | $\mu = e^{w^\top x + b}$ | Log: $g(\mu) = \log \mu$ |
Gaussian GLM: Linear Regression
If

$$y \mid x \sim \mathcal{N}(\mu, \sigma^2)$$

and we use the identity link, then

$$\mu = w^\top x + b.$$
Maximizing the likelihood leads to the squared-error objective. This is exactly ordinary least squares under Gaussian noise assumptions.
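A quick numerical check of this equivalence, using NumPy on synthetic data (the weights and noise level here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)  # Gaussian noise around a linear mean

# Maximum likelihood under Gaussian noise is ordinary least squares.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_ols)  # close to true_w
```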
Bernoulli GLM: Logistic Regression
For binary classification,

$$y \mid x \sim \mathrm{Bernoulli}(\mu),$$

and the canonical link is the logit:

$$\log\frac{\mu}{1 - \mu} = w^\top x + b.$$

Solving for $\mu$ gives the sigmoid:

$$\mu = \frac{1}{1 + e^{-(w^\top x + b)}}.$$
This is why logistic regression still uses a linear score internally, but outputs a valid probability.
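The logit and the sigmoid are inverses of each other, which is easy to verify numerically (a minimal sketch; the function names are ours):

```python
import numpy as np

def sigmoid(eta):
    """Inverse link: linear score -> probability."""
    return 1.0 / (1.0 + np.exp(-eta))

def logit(mu):
    """Canonical link: probability -> linear score (log-odds)."""
    return np.log(mu / (1.0 - mu))

eta = 1.3
mu = sigmoid(eta)      # a valid probability
print(logit(mu))       # recovers the original linear score, 1.3
```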
Poisson GLM: Count Modeling
When the target is a count, a common model is

$$y \mid x \sim \mathrm{Poisson}(\mu),$$

with log link

$$\log \mu = w^\top x + b,$$

so that

$$\mu = e^{w^\top x + b}.$$
This guarantees the predicted mean stays positive. Poisson GLMs are useful for quantities like event counts, arrivals, or number of clicks.
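A minimal fitting sketch on synthetic counts, assuming NumPy; with the canonical log link, the gradient of the Poisson negative log-likelihood takes the residual form $X^\top(\mu - y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
true_w = np.array([0.8, -0.5])
y = rng.poisson(np.exp(X @ true_w))  # counts with a log-linear rate

w = np.zeros(2)
for _ in range(2000):
    mu = np.exp(X @ w)               # inverse link keeps the mean positive
    grad = X.T @ (mu - y) / len(y)   # residual-form gradient of the NLL
    w -= 0.01 * grad
print(w)  # approaches true_w
```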
Likelihood and Loss
Once the distribution is chosen, training a GLM usually means maximizing the conditional likelihood:

$$\max_{w, b} \; \prod_{i=1}^{n} p(y_i \mid x_i; w, b),$$

or, equivalently, minimizing the negative log-likelihood:

$$\min_{w, b} \; -\sum_{i=1}^{n} \log p(y_i \mid x_i; w, b).$$
This recovers familiar losses:
- Gaussian + identity link gives squared error.
- Bernoulli + logit link gives binary cross-entropy.
- Multiclass softmax gives multiclass cross-entropy.
- Poisson + log link gives Poisson negative log-likelihood.
This is one of the main reasons GLMs are so useful: the modeling choice and the training objective stay aligned.
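For example, the Bernoulli negative log-likelihood is exactly binary cross-entropy; a small check with made-up labels and probabilities:

```python
import numpy as np

def bernoulli_nll(y, p):
    """Negative log-likelihood of binary labels y under probabilities p."""
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
print(bernoulli_nll(y, p))  # same number a cross-entropy loss would report
```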
Canonical Links
For exponential-family models there is often a preferred link called the canonical link. It maps the mean parameter to the natural parameter of the distribution.
Examples:
- Gaussian: canonical link is identity.
- Bernoulli: canonical link is logit.
- Poisson: canonical link is log.
Canonical links are attractive because they usually lead to cleaner gradients and convex objectives. They are not the only valid choice, but they are often the default starting point.
Interpreting Coefficients
GLMs remain attractive because the parameters still have interpretable effects.
- In linear regression, a unit increase in feature $x_j$ changes the predicted mean additively by $w_j$.
- In logistic regression, a unit increase in $x_j$ changes the log-odds linearly by $w_j$.
- In Poisson regression, a unit increase in $x_j$ changes the log expected count linearly by $w_j$.
The parameter effect is linear in the transformed space, not necessarily in the original output space.
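In the Poisson case, a linear effect on the log expected count means $e^{w_j}$ acts multiplicatively on the count itself; a tiny illustration (the coefficient value is made up):

```python
import numpy as np

w_j = 0.3                      # hypothetical Poisson-GLM coefficient
mu_before = np.exp(1.0)        # expected count at some baseline score eta = 1.0
mu_after  = np.exp(1.0 + w_j)  # same score after a unit increase in x_j
print(mu_after / mu_before)    # equals exp(w_j), about 1.35
```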
When GLMs Work Well
GLMs are strong baselines when:
- The features already capture the key structure of the problem.
- You want interpretable coefficients.
- The target distribution has obvious constraints.
- You need a model that trains quickly and behaves predictably.
Limitations of GLMs
A GLM can still fail if the linear predictor is too restrictive.
- Missing nonlinear structure: The true decision boundary or regression surface may not be close to linear in the chosen features.
- Missing interactions: Important feature combinations may be absent.
- Distribution mismatch: The chosen likelihood may poorly reflect the data.
Common fixes include:
- Adding basis features or interactions.
- Using kernels.
- Moving to tree-based models or neural networks.
Recap
1. What remains linear inside a generalized linear model?
2. Which link function is canonical for Bernoulli targets?
3. Why is the log link natural for Poisson regression?
What's Next
In the next chapter, we move from model specification to model selection: how to estimate generalization error, choose hyperparameters, and avoid fooling ourselves with validation leakage.