Designing Machine Learning Systems

This content is a summary and my personal takeaways from the excellent book Designing Machine Learning Systems by Chip Huyen. It reflects the points I found most relevant along with my own conclusions on the topics discussed.

For more comprehensive coverage of these topics, I highly recommend reading the full book.

📄️ 1. Overview of Machine Learning Systems

Machine learning (ML) has become a cornerstone of modern technology, driving advances in fields such as healthcare, finance, and marketing. However, the decision to use machine learning, and the approach to building and deploying ML systems, require careful consideration and planning. This chapter provides a foundational overview of when to use machine learning, the differences between research and production environments, and the essential factors that influence the success of ML projects.

📄️ 2. Introduction to Machine Learning Systems Design

Designing machine learning systems requires a holistic approach that encompasses both business objectives and technical requirements. A well-designed ML system not only delivers accurate predictions but also aligns with the overall goals of the organization. This chapter goes through the essential aspects of ML systems design, starting from aligning ML objectives with business goals to understanding the critical requirements for building robust and scalable systems. Additionally, it discusses the iterative nature of ML system development and the ongoing debate between data-centric and model-centric approaches in machine learning.

📄️ 3. Data Engineering Fundamentals

This chapter is very introductory. For more depth, the recommended reading is Martin Kleppmann's book Designing Data-Intensive Applications.

📄️ 4. Training Data

The quality and quantity of training data are critical to the success of any machine learning project. Effective sampling, labeling, handling class imbalance, and data augmentation are essential techniques to prepare robust datasets that improve model performance and generalization. This chapter explores various methods for creating and refining training datasets, ensuring they are comprehensive, representative, and suitable for training accurate and reliable machine learning models.
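
As a small illustration of sampling under class imbalance, here is a minimal sketch of a stratified train/test split using only the Python standard library. The function name `stratified_split`, its parameters, and the toy labels are hypothetical and not taken from the book. The idea is to split each class separately so the rare class keeps roughly the same share in both splits.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Send a fixed fraction of each class to the test split.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced dataset: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
positives_in_test = sum(labels[i] for i in test_idx)
```

With a purely random split, the 10 positives could easily end up over- or under-represented in a 20-example test set. Splitting per class keeps the ratio stable.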

📄️ 5. Feature Engineering

Feature engineering involves creating new features or transforming existing ones to improve model performance. Well-engineered features tend to give models a bigger performance boost than algorithmic techniques such as hyperparameter tuning. This chapter explores the types of features, common feature engineering operations, strategies to avoid data leakage, and best practices for creating robust and generalizable features.
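
One common source of data leakage is computing scaling statistics on the full dataset before splitting it. The sketch below is a minimal, assumed example (the helper names `fit_standardizer` and `standardize` and the toy values are hypothetical): the mean and standard deviation are fitted on the training split only, then reused unchanged on the test split.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Compute scaling statistics from the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # fall back to 1.0 for a constant feature
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously fitted statistics without recomputing them."""
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values, already split into train and test.
train = [10.0, 12.0, 14.0, 16.0]
test = [13.0, 30.0]

mu, sigma = fit_standardizer(train)             # the test split is never seen here
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)      # reuses the train statistics
```

Had the statistics been computed over `train + test`, the outlier `30.0` in the test split would have shifted the mean and scale used during training, which is information the model would not have at prediction time.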

📄️ 6. Model Development and Offline Evaluation

Six Tips for Model Selection

📄️ 7. Model Deployment and Prediction Service

In many companies, the team that develops the models is also responsible for deploying them. In others, once the model is ready for deployment, it is exported and handed off to a different team (e.g., DevOps, MLOps, Data Platform) for deployment. This separation can lead to high communication overhead across teams and slow down model updates. It can also complicate debugging when issues arise.