Introduction to Machine Learning

In this chapter you'll be introduced to:

Machine Learning (ML): Understanding what machine learning is and how it differs from traditional programming.
Learning from Experience vs. Instructions: The distinction between learning from data and following explicit procedures.
Components of a Machine Learning System: The stages involved in building and using a machine learning model.
Types of Machine Learning: An overview of supervised, unsupervised, and reinforcement learning.

What is Machine Learning?

Machine Learning (ML) is a field of artificial intelligence that focuses on enabling machines to learn from experience rather than being explicitly programmed. In essence, it's about developing algorithms that can recognize patterns in data and make informed decisions or predictions based on that data.

Unlike traditional programming, where a computer follows a set of predefined instructions to produce an output, a machine learning system derives its behavior from the data it is given and can adapt as that data changes.

Learning from Instructions (Heuristics)

In traditional programming, humans provide explicit instructions or procedures for the computer to follow. For example, learning to perform multiplication involves following a specific algorithm taught in school:

Align the numbers based on place value.
Multiply each digit accordingly.
Sum the results to get the final product.

This method relies on learning from instructions, where the steps are clearly defined and must be followed precisely.
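The three steps above can be written down directly as a program, which is what makes this "learning from instructions". The following is a minimal sketch for non-negative integers; the function name `long_multiply` is an illustrative choice, not something defined elsewhere in these docs.

```python
def long_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers with the schoolbook procedure."""
    # Read the digits of b from least significant to most significant.
    digits_b = [int(d) for d in str(b)][::-1]
    total = 0
    for place, digit in enumerate(digits_b):
        # Multiply by one digit and shift by its place value (alignment step).
        partial = a * digit * 10 ** place
        # Sum the partial results to build up the final product.
        total += partial
    return total


print(long_multiply(123, 45))  # 5535
```

Every behavior of this function was spelled out by the programmer; nothing in it was inferred from examples.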

Learning from Experience (Machine Learning)

In contrast, machine learning focuses on learning from experience or data. For example, recognizing handwritten letters involves exposure to many examples of each letter in various styles and fonts. The learning algorithm identifies patterns and features that distinguish one letter from another without explicit programming for each variation.

This approach is particularly useful when:

The desired function is too complex to model with explicit instructions.
The rules governing the task are unknown or hard to articulate.
The system needs to adapt to new patterns or data over time.

Components of a Machine Learning System

A machine learning system typically involves the following components:

Experience (Data): The dataset comprising examples the system learns from.
Machine Learning Algorithm: The method or procedure used to learn from data.
Machine Learning Model: The resulting function or representation after learning.
New Data (Possibly Unseen): Fresh inputs the model will make predictions on.
Prediction: The output or decision made by the model based on new data.

Machine learning involves two primary phases:

Training (Learning): A machine learning algorithm processes the training data and produces a predictive model.
Prediction (Inference): The trained model makes predictions or decisions based on new, unseen data.
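The components and the two phases can be made concrete with a very small sketch. Everything here is assumed for illustration: the data is made up, and the "algorithm" (store the mean feature value per label, then predict the label with the closest mean) is just one simple choice among many.

```python
from statistics import mean


def train(examples):
    """Learning algorithm: turn labeled experience into a model.

    `examples` is a list of (feature, label) pairs. The returned model is
    the mean feature value observed for each label.
    """
    by_label = {}
    for feature, label in examples:
        by_label.setdefault(label, []).append(feature)
    return {label: mean(values) for label, values in by_label.items()}


def predict(model, feature):
    """Inference: return the label whose stored mean is closest to the input."""
    return min(model, key=lambda label: abs(model[label] - feature))


# Experience (hypothetical data): message length in words, with a label.
experience = [(5, "short"), (8, "short"), (40, "long"), (55, "long")]

model = train(experience)        # training phase
prediction = predict(model, 12)  # prediction phase on new, unseen data
print(model, prediction)
```

Note that `train` runs once over the experience, while `predict` can be called repeatedly on new inputs without touching the training data again.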

Types of Machine Learning

Machine learning can be broadly categorized into three types:

Supervised Learning

Supervised Learning involves learning a function that maps an input to an output based on example input-output pairs. It requires:

Specific Goal of Prediction: The target variable or outcome is known and defined.
Known Target During Training: The correct output (labels) for each training example is provided.

Example

Imagine you have a box containing an animal you cannot see. You can feel its size and the softness of its fur. The goal is to predict whether it's a cat or a dog.

Features (Inputs):
- Size: A continuous value between 0 (small) and 1 (large).
- Fur Softness: A continuous value between 0 (hard) and 1 (soft).
Target (Output): 'Cat' or 'Dog'.

During training, you have labeled examples:

Size	Fur Softness	Animal
0.2	0.9	Cat
0.8	0.3	Dog
0.5	0.8	Cat
0.7	0.4	Dog

The model learns to classify new animals based on these features.
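As a rough illustration of how a model could use these labeled examples, the sketch below applies a 1-nearest-neighbor rule: a new animal gets the label of the most similar training example. This is only one possible classifier, chosen here for brevity; the function name `classify` is an assumption.

```python
import math

# Labeled training examples from the table: (size, fur_softness, animal).
training_data = [
    (0.2, 0.9, "Cat"),
    (0.8, 0.3, "Dog"),
    (0.5, 0.8, "Cat"),
    (0.7, 0.4, "Dog"),
]


def classify(size, fur_softness):
    """Predict the label of the closest training example (1-nearest neighbor)."""
    nearest = min(
        training_data,
        key=lambda row: math.dist((size, fur_softness), (row[0], row[1])),
    )
    return nearest[2]


print(classify(0.3, 0.85))  # a small, soft animal
print(classify(0.9, 0.2))   # a large animal with coarser fur
```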

Tasks in Supervised Learning

Within supervised learning, we can broadly distinguish two kinds of task:

Classification: The target variable is categorical (e.g., animal type, spam detection).
Regression: The target variable is continuous (e.g., predicting stock prices).

Example

Imagine you have data about various houses:

Features (Inputs):
- Size: Total square footage of the house.
- Number of Bedrooms: Total bedrooms in the house.
- Location: Neighborhood or area code.
- Age: Number of years since the house was built.

Regression Task

Target (Output): Market price of the house (a continuous value in dollars).

In the regression task, the goal is to predict the exact market price based on the house's features.

Size (sq ft)	Bedrooms	Location	Age (years)	Price ($)
2,000	3	A	5	350,000
1,500	2	B	10	250,000
2,500	4	A	2	500,000
1,800	3	C	20	200,000

The model learns to predict prices such as $375,000 or $410,000 for new houses.
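To show what "learning to predict a price" can look like, here is a minimal sketch that fits a straight line to just the size and price columns of the table using ordinary least squares. Ignoring bedrooms, location, and age is a simplification made for this example, and the helper names are assumptions.

```python
# Size (sq ft) and price ($) columns from the table above.
sizes = [2000, 1500, 2500, 1800]
prices = [350_000, 250_000, 500_000, 200_000]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predict a continuous price for a house of the given size."""
    return slope * size + intercept


print(round(predict_price(2200)))
```

With only four examples the fitted line is a rough guide at best; the point is that the output is a number on a continuous scale rather than a category.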

Classification Task

Target (Output): Price tier category (e.g., 'Affordable', 'Mid-range', 'Luxury').

In the classification task, the goal is to assign each house to a price tier based on its features.

Size (sq ft)	Bedrooms	Location	Age (years)	Price Tier
2,000	3	A	5	Mid-range
1,500	2	B	10	Affordable
2,500	4	A	2	Luxury
1,800	3	C	20	Affordable

The model learns to classify new houses into categories like 'Affordable' or 'Luxury'.
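One way such tier labels could be produced is by thresholding the price from the regression table. The cut-offs below are hypothetical, picked only so that they reproduce the four rows shown here; a real project would choose them from domain knowledge.

```python
# Hypothetical cut-offs, chosen only so that they agree with the example tables.
TIERS = [(300_000, "Affordable"), (450_000, "Mid-range")]


def price_tier(price):
    """Map a continuous price to a discrete tier label."""
    for upper_bound, tier in TIERS:
        if price < upper_bound:
            return tier
    return "Luxury"


prices = [350_000, 250_000, 500_000, 200_000]
print([price_tier(p) for p in prices])
# ['Mid-range', 'Affordable', 'Luxury', 'Affordable']
```

Two houses priced at $200,000 and $250,000 receive the same label, which illustrates the information that is lost when a continuous target is discretized.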

Caveat

Sometimes reframing a problem, by switching between regression and classification, can lead to improved performance or more interpretable results. However, each framing comes with its own trade-offs:

Regression

Pros:
- Provides exact numerical predictions, preserving fine-grained differences between data points.
Cons:
- Can be sensitive to noise and outliers.
- May require more complex models to capture non-linear relationships.

Classification

Pros:
- Simplifies the problem by grouping continuous values into discrete categories, often making the model more robust against noise and easier to interpret.
Cons:
- Discretization can result in the loss of subtle differences, potentially masking valuable information.

When framing your problem, consider:

Does the data naturally cluster into distinct, meaningful categories?
Is there an intrinsic quantitative or ordinal relationship among outcomes (e.g., one category representing a value that’s significantly higher than another), or are they purely nominal?
Are you dealing with noise, outliers, or complex non-linear relationships that might be better managed through discretization?

Unsupervised Learning

Unsupervised Learning deals with unlabeled data, aiming to find patterns or intrinsic structures within the data without predefined labels.

Clustering

Grouping similar data points together based on feature similarity.

Example

Grouping patients into clusters based on severity and number of symptoms per week.
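A minimal sketch of this idea using k-means, one common clustering algorithm. The patient values and the starting centroids are invented for illustration, and fixed starting centroids are used so the result is reproducible.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids.

    Assumes iterations >= 1.
    """
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical patients: (severity on a 0-10 scale, symptoms per week).
patients = [(1, 2), (2, 1), (1.5, 2.5), (8, 9), (9, 8), (8.5, 9.5)]
centroids, clusters = k_means(patients, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge purely from how close the points are to each other.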

Outlier Detection

Identifying data points that deviate significantly from the majority of the data.

Example

Figure: A comparison of outlier detection methods².
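As a small, self-contained illustration (not one of the methods compared in the figure), the sketch below flags values whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a commonly used cut-off of 3.5. The readings are made up.

```python
from statistics import median


def find_outliers(values, cutoff=3.5):
    """Flag values far from the median, measured in units of the MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [v for v in values if 0.6745 * abs(v - center) / mad > cutoff]


readings = [10, 11, 9, 10, 12, 10, 11, 50]  # hypothetical measurements
print(find_outliers(readings))  # [50]
```

Using the median rather than the mean keeps the extreme value itself from distorting the baseline it is compared against.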

Representation Learning

Learning useful representations or features from the data that can be used for various tasks.

Example

Extracting features from images that capture important information, which can then be used for tasks like image classification or captioning.

Figure: Representation transformation to perform image classification³.
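Image models are too large to show here, so the sketch below swaps in a much simpler stand-in: principal component analysis on 2-D points. It learns, from unlabeled data alone, the single direction along which the points vary most, and then re-expresses each point as one coordinate along that direction. The data and function names are assumptions made for this example.

```python
import math


def first_principal_direction(points):
    """Return (unit direction of greatest variance, data center) for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(angle), math.sin(angle)), (mean_x, mean_y)


def encode(point, direction, center):
    """Represent a 2-D point by a single learned coordinate."""
    return (point[0] - center[0]) * direction[0] + (point[1] - center[1]) * direction[1]


# Hypothetical points that lie roughly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
direction, center = first_principal_direction(points)
codes = [encode(p, direction, center) for p in points]
print(direction)
print(codes)
```

The one-number codes keep most of the information in the original two numbers, and could be passed on to a downstream task such as classification.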

Reinforcement Learning

Reinforcement Learning (RL) involves learning to make decisions by taking actions in an environment to maximize cumulative rewards.

Key Characteristics:
- Learning through trial and error.
- Receiving feedback in the form of rewards or penalties.
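Trial and error with reward feedback can be sketched with a multi-armed bandit, a simplified reinforcement learning setting with a single state. The agent below uses an epsilon-greedy rule: mostly pick the action that currently looks best, occasionally try a random one. The reward means, noise level, and parameter values are all assumptions for this example.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit with noisy rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # current value estimate per action
    counts = [0] * len(true_means)       # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an action at random.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: take the action with the highest estimate so far.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # Feedback from the environment: a noisy reward for the chosen action.
        reward = rng.gauss(true_means[action], 0.1)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

The agent is never told which action is best; it discovers this from the rewards it receives, and the cumulative reward grows as its estimates improve.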

Other Categorizations of Machine Learning

Machine learning algorithms can also be categorized by other criteria, regardless of whether they are supervised, unsupervised, or reinforcement learning algorithms:

Batch vs. Online Learning

Batch Learning: The model is trained on the entire dataset at once. The data is treated as static and does not change during training.
Online Learning: The model learns incrementally as new data comes in, often one instance at a time.
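The difference can be sketched with the simplest possible "model", an estimate of the mean of a stream of numbers. The batch version needs all values up front; the online version updates its estimate one instance at a time and never stores the stream. The class and function names are illustrative assumptions.

```python
def batch_mean(values):
    """Batch learning: use the whole dataset at once."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: update the estimate one instance at a time."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        # Move the estimate toward the new instance by a shrinking step.
        self.value += (x - self.value) / self.count
        return self.value


stream = [4.0, 7.0, 1.0, 8.0]  # hypothetical data arriving over time
learner = OnlineMean()
for x in stream:
    learner.update(x)

print(batch_mean(stream), learner.value)  # 5.0 5.0
```

Both arrive at the same answer here, but only the online learner has a usable estimate after every single instance.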

Passive vs. Active Learning

Passive Learning: The model learns from the dataset provided without influencing data collection.
Active Learning: The model can query or request additional data points or labels to improve learning.
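A minimal sketch of active learning using uncertainty sampling, one common query strategy: the learner asks for the label of the point it is least sure about, which here means the point closest to its current decision boundary. The 1-D data, the `oracle` function standing in for a human annotator, and its true boundary of 6.2 are all hypothetical, and the data is assumed to be separable.

```python
def fit_threshold(labeled):
    """Place the decision boundary midway between the two classes (1-D data)."""
    highest_negative = max(x for x, label in labeled if label == 0)
    lowest_positive = min(x for x, label in labeled if label == 1)
    return (highest_negative + lowest_positive) / 2


def most_uncertain(unlabeled, threshold):
    """Query strategy: pick the point closest to the current boundary."""
    return min(unlabeled, key=lambda x: abs(x - threshold))


def oracle(x):
    """Stand-in for a human annotator; the true boundary (6.2) is hypothetical."""
    return int(x > 6.2)


labeled = [(1.0, 0), (9.0, 1)]                # two labels to start with
unlabeled = [2.0, 4.5, 5.5, 6.0, 6.5, 8.0]    # pool the learner may query

for _ in range(3):
    threshold = fit_threshold(labeled)
    query = most_uncertain(unlabeled, threshold)
    unlabeled.remove(query)
    labeled.append((query, oracle(query)))    # request only this one label

print(fit_threshold(labeled), sorted(labeled))
```

A passive learner would receive whichever labels it was handed; here three targeted queries move the boundary from 5.0 to 6.0 while leaving the uninformative points at 2.0 and 8.0 unlabeled.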

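Recap

1. What is the primary goal of Machine Learning?
2. Which component is not part of a typical Machine Learning system?
3. What distinguishes learning from experience from learning from instructions?
4. Which of the following is an example of a Machine Learning task?
5. Which of these is a main type of Machine Learning?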

What's Next

In the next chapter, we'll explore Supervised Learning—the core focus of our Intro to ML docs. You'll learn how to formulate learning tasks using labeled data, work through regression problems, and explore the essential components of building a machine learning system, including hypothesis classes, training, and model evaluation.

Stay tuned for an upcoming introduction to unsupervised learning in future updates, and if you're interested in decision-making models, check out the Intro to RL documentation.

Footnotes

1. Portions of this page are reproduced from work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.
2. Portions of this page are reproduced from work created and shared by Scikit-learn and used according to terms described in the BSD 3-Clause License.
3. Source: Spot Intelligence's blog post by Neri Van Otten.
