
Generalized Linear Models


In this chapter, you'll learn about:

  • Generalized Linear Models (GLMs): A unified view of linear regression, logistic regression, and other probabilistic linear models.
  • The Three Parts of a GLM: The random component, linear predictor, and link function.
  • Exponential-Family Distributions: Why Gaussian, Bernoulli, and Poisson models fit naturally into the same framework.
  • Canonical Links and Likelihoods: How the choice of link determines the loss and optimization problem.
  • Practical Modeling Choices: When GLMs are appropriate and when you need richer nonlinear models.

In the previous chapters, we studied linear regression, logistic regression, and multinomial regression as if they were separate algorithms. In fact, they are closely related: a generalized linear model (GLM) is a common template that expresses all of them in one language.

The key idea is simple: we keep a linear predictor in the features, but we allow the target variable to follow a distribution that matches the task. Continuous real-valued targets suggest Gaussian models, binary targets suggest Bernoulli models, and count data often suggests Poisson models.

Why We Need GLMs

Ordinary linear regression assumes:

\mathbb{E}[Y \mid \mathbf{x}] = \mathbf{w}^\top \mathbf{x} + b

This works well when:

  • The target is real-valued.
  • The conditional noise is roughly symmetric.
  • Predicting any real number is acceptable.

It breaks down when the output has structural constraints:

  • Binary labels must stay in \{0,1\} or be interpreted as probabilities in [0,1].
  • Counts must be nonnegative integers.
  • Multiclass outputs must form a valid probability distribution whose entries sum to one.

GLMs solve this by keeping the linear part, but changing how the mean of the target distribution is connected to that linear predictor.

The Three Parts of a GLM

A GLM has three ingredients.

1. Random Component

We choose a conditional distribution for the target:

Y \mid \mathbf{x} \sim p(y \mid \mu(\mathbf{x}))

where \mu(\mathbf{x}) = \mathbb{E}[Y \mid \mathbf{x}] is the conditional mean.

Typical choices are:

  • Gaussian for real-valued regression targets.
  • Bernoulli for binary labels.
  • Poisson for counts.

2. Systematic Component

We define a linear predictor:

\eta(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b

This is the same basic form we used in linear and logistic regression.

3. Link Function

The link function connects the conditional mean to the linear predictor:

g(\mu(\mathbf{x})) = \eta(\mathbf{x})

Equivalently,

\mu(\mathbf{x}) = g^{-1}(\eta(\mathbf{x}))

The inverse link is what turns the raw score \eta into a valid mean parameter for the chosen distribution.
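The prediction pipeline above can be sketched in a few lines. This is a minimal illustration, not a library API; the names `predict_mean` and `inverse_link` are made up for this example.

```python
import numpy as np

# A GLM prediction is a linear score followed by an inverse link that
# maps the score into the valid range for the distribution's mean.
def predict_mean(w, b, x, inverse_link):
    eta = w @ x + b           # linear predictor: any real number
    return inverse_link(eta)  # mean parameter in the valid range

w = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])

identity = lambda eta: eta                     # Gaussian: mean is any real
sigmoid = lambda eta: 1 / (1 + np.exp(-eta))   # Bernoulli: mean in (0, 1)

print(predict_mean(w, b=0.0, x=x, inverse_link=identity))  # 0.0
print(predict_mean(w, b=0.0, x=x, inverse_link=sigmoid))   # 0.5
```

Swapping the inverse link is the only change needed to move between model families; the linear score is computed identically.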

Exponential-Family View

Many GLMs use target distributions from the exponential family, whose densities or mass functions can be written in the form

p(y \mid \theta, \phi) = h(y, \phi)\exp\left(\frac{y\theta - A(\theta)}{\phi}\right)

You do not need this formula to use GLMs day to day, but it explains why so many models share the same optimization structure:

  • The negative log-likelihood is often convex.
  • The gradient takes a clean residual form.
  • The mean is tightly connected to the natural parameter \theta.
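As a quick sanity check on the exponential-family form, one can verify numerically that the Bernoulli pmf matches it when the natural parameter is the logit of the mean and the log-partition is A(\theta) = \log(1 + e^\theta). This is a sketch with hand-picked numbers, not part of any library.

```python
import numpy as np

# Bernoulli in exponential-family form: theta = logit(mu),
# A(theta) = log(1 + exp(theta)), h = 1, phi = 1.
mu = 0.3
theta = np.log(mu / (1 - mu))
A = np.log(1 + np.exp(theta))

for y in (0, 1):
    pmf = mu**y * (1 - mu)**(1 - y)    # standard Bernoulli pmf
    expfam = np.exp(y * theta - A)     # exponential-family form
    print(y, np.isclose(pmf, expfam))  # True for both outcomes
```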

Common GLM Examples

The table below summarizes the most important cases in this section.

| Task | Target distribution | Mean | Link function |
| --- | --- | --- | --- |
| Linear regression | Gaussian | \mu \in \mathbb{R} | Identity: g(\mu)=\mu |
| Logistic regression | Bernoulli | \mu \in (0,1) | Logit: g(\mu)=\log\frac{\mu}{1-\mu} |
| Multinomial regression | Categorical / multinomial | class probabilities | Softmax / generalized logit |
| Poisson regression | Poisson | \mu > 0 | Log: g(\mu)=\log \mu |

Gaussian GLM: Linear Regression

If

Y \mid \mathbf{x} \sim \mathcal{N}(\mu(\mathbf{x}), \sigma^2)

and we use the identity link, then

\mu(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b

Maximizing the likelihood leads to the squared-error objective. This is exactly ordinary least squares under Gaussian noise assumptions.
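The Gaussian case can be checked end to end on synthetic data: maximizing the likelihood is the same as solving a least-squares problem, so an ordinary least-squares solver recovers the true parameters. The data-generating values below are invented for the example.

```python
import numpy as np

# Under Gaussian noise with the identity link, maximum likelihood
# reduces to ordinary least squares.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true, b_true = np.array([1.5, -2.0]), 0.5
y = X @ w_true + b_true + 0.1 * rng.normal(size=200)

# Append a column of ones so the intercept is estimated jointly.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [1.5, -2.0, 0.5]
```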

Bernoulli GLM: Logistic Regression

For binary classification,

Y \mid \mathbf{x} \sim \text{Bernoulli}(\mu(\mathbf{x}))

and the canonical link is the logit:

\log \frac{\mu(\mathbf{x})}{1-\mu(\mathbf{x})} = \mathbf{w}^\top \mathbf{x} + b

Solving for \mu(\mathbf{x}) gives the sigmoid:

\mu(\mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)

This is why logistic regression still uses a linear score internally, but outputs a valid probability.
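The logit/sigmoid pair can be verified numerically: applying the sigmoid to any linear score gives a probability in (0, 1), and the logit maps it back exactly. A small sketch with arbitrary scores:

```python
import numpy as np

# The logit link and the sigmoid are inverses: a linear score in
# log-odds space becomes a valid probability, and vice versa.
def sigmoid(eta):
    return 1 / (1 + np.exp(-eta))

def logit(mu):
    return np.log(mu / (1 - mu))

eta = np.array([-2.0, 0.0, 3.0])    # arbitrary linear scores
mu = sigmoid(eta)                   # probabilities in (0, 1)
print(np.allclose(logit(mu), eta))  # True: the round trip recovers eta
```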

Poisson GLM: Count Modeling

When the target is a count, a common model is

Y \mid \mathbf{x} \sim \text{Poisson}(\mu(\mathbf{x}))

with log link

\log \mu(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b

so that

\mu(\mathbf{x}) = \exp(\mathbf{w}^\top \mathbf{x} + b)

This guarantees the predicted mean stays positive. Poisson GLMs are useful for quantities like event counts, arrivals, or number of clicks.
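A Poisson GLM is simple enough to fit by hand with gradient descent on the negative log-likelihood. The sketch below uses synthetic data and a hand-tuned step size; it is an illustration of the mechanics, not a production fitting routine.

```python
import numpy as np

# Fit a Poisson GLM with the log link by gradient descent on the NLL.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
w_true, b_true = np.array([0.4, -0.3]), 1.0
y = rng.poisson(np.exp(X @ w_true + b_true))

w, b = np.zeros(2), 0.0
for _ in range(2000):
    mu = np.exp(X @ w + b)            # inverse log link keeps mu > 0
    grad_w = X.T @ (mu - y) / len(y)  # canonical link: residual-form gradient
    grad_b = np.mean(mu - y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

print(w, b)  # close to [0.4, -0.3] and 1.0
```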

Likelihood and Loss

Once the distribution is chosen, training a GLM usually means maximizing the conditional likelihood:

\hat{\mathbf{w}}, \hat{b} = \arg\max_{\mathbf{w}, b} \prod_{i=1}^{N} p(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}, b)

or, equivalently, minimizing the negative log-likelihood:

J(\mathbf{w}, b) = - \sum_{i=1}^{N} \log p(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}, b)

This recovers familiar losses:

  • Gaussian + identity link gives squared error.
  • Bernoulli + logit link gives binary cross-entropy.
  • Multiclass softmax gives multiclass cross-entropy.
  • Poisson + log link gives Poisson negative log-likelihood.
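The per-example negative log-likelihoods behind these losses can be written out directly. A sketch (constants that do not depend on the prediction, like \log y!, are dropped where noted):

```python
import numpy as np

# Per-example negative log-likelihoods; each takes the predicted mean
# mu and the observed target y.
def gaussian_nll(mu, y, sigma=1.0):
    # Squared error plus a constant in mu.
    return 0.5 * ((y - mu) / sigma) ** 2 + 0.5 * np.log(2 * np.pi * sigma**2)

def bernoulli_nll(mu, y):
    # Binary cross-entropy.
    return -(y * np.log(mu) + (1 - y) * np.log(1 - mu))

def poisson_nll(mu, y):
    # Poisson NLL up to the log(y!) constant.
    return mu - y * np.log(mu)

print(bernoulli_nll(0.9, 1))  # ≈ 0.105: a confident, correct prediction
```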

This is one of the main reasons GLMs are so useful: the modeling choice and the training objective stay aligned.

Canonical Links

For exponential-family models there is often a preferred link called the canonical link. It maps the mean parameter to the natural parameter of the distribution.

Examples:

  • Gaussian: canonical link is identity.
  • Bernoulli: canonical link is logit.
  • Poisson: canonical link is log.

Canonical links are attractive because they usually lead to cleaner gradients and convex objectives. They are not the only valid choice, but they are often the default starting point.
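One concrete payoff of the canonical link is that the per-example NLL gradient with respect to the weights takes the same residual form, (mu - y) x, in all three cases above. A sketch with hand-picked numbers:

```python
import numpy as np

# With the canonical link, the per-example gradient of the NLL with
# respect to w is (mu - y) * x for Gaussian, Bernoulli, and Poisson.
x = np.array([1.0, 2.0])
w = np.array([0.3, -0.1])
eta = w @ x  # linear score: 0.1

inverse_links = {
    "gaussian": lambda e: e,
    "bernoulli": lambda e: 1 / (1 + np.exp(-e)),
    "poisson": np.exp,
}
targets = {"gaussian": 0.7, "bernoulli": 1, "poisson": 2}

for name, inv in inverse_links.items():
    mu = inv(eta)
    grad = (mu - targets[name]) * x  # identical residual form in each case
    print(name, grad)
```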

Interpreting Coefficients

GLMs remain attractive because the parameters still have interpretable effects.

  • In linear regression, a unit increase in feature x_j changes the predicted mean additively by w_j.
  • In logistic regression, a unit increase in x_j changes the log-odds linearly by w_j.
  • In Poisson regression, a unit increase in x_j changes the log expected count linearly by w_j.

The parameter effect is linear in the transformed space, not necessarily in the original output space.

When GLMs Work Well

GLMs are strong baselines when:

  • The features already capture the key structure of the problem.
  • You want interpretable coefficients.
  • The target distribution has obvious constraints.
  • You need a model that trains quickly and behaves predictably.

Limitations of GLMs

A GLM can still fail if the linear predictor is too restrictive.

  • Missing nonlinear structure: The true decision boundary or regression surface may not be close to linear in the chosen features.
  • Missing interactions: Important feature combinations may be absent.
  • Distribution mismatch: The chosen likelihood may poorly reflect the data.

Common fixes include:

  • Adding basis features or interactions.
  • Using kernels.
  • Moving to tree-based models or neural networks.
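The first fix, basis expansion, keeps all of the GLM machinery intact: the model is still linear in the (expanded) features. A sketch on synthetic data with a quadratic term:

```python
import numpy as np

# A quadratic basis expansion lets the same least-squares machinery
# fit a curved relationship; the model stays linear in the new features.
rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 0.5 * x + 2.0 * x**2 + 0.1 * rng.normal(size=200)

A = np.column_stack([np.ones_like(x), x, x**2])  # basis expansion
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [1.0, 0.5, 2.0]
```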

Recap

1. What remains linear inside a generalized linear model?

2. Which link function is canonical for Bernoulli targets?

3. Why is the log link natural for Poisson regression?

What's Next

In the next chapter, we move from model specification to model selection: how to estimate generalization error, choose hyperparameters, and avoid fooling ourselves with validation leakage.