
Deep Reinforcement Learning

warning

This documentation is a collection of my notes from Marlos C. Machado's Deep Reinforcement Learning (DRL) course at the University of Alberta. For more accurate and complete coverage of the topics, check out the slides available on the course website.

Coming soon...

Deep Reinforcement Learning (Deep RL) builds on traditional reinforcement learning (RL) by using deep neural networks for function approximation. Deep RL algorithms learn directly from high-dimensional observations, such as raw pixels from video frames, without relying on handcrafted features. Whereas classical RL methods are often agnostic about the choice of function approximator, Deep RL explicitly leverages modern deep learning techniques to extract rich representations and learn effective control policies end-to-end. This integration lets agents tackle complex decision-making tasks across diverse environments with minimal manual feature engineering.
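
To make the "end-to-end" point concrete, below is a minimal sketch of the kind of value-based update covered early in the course: a convolutional Q-network over stacked 84×84 Atari-style frames, trained with one-step Q-learning targets from a separate target network, in the spirit of DQN (Mnih et al., 2015). This is an illustrative PyTorch sketch, not code from the course; the layer sizes follow the published DQN architecture, but names such as `dqn_update` and the transition-tuple layout are assumptions made here for brevity.

```python
import random

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Convolutional Q-network: stacked 84x84 frames -> one value per action."""
    def __init__(self, num_actions: int, in_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raw uint8 pixels are rescaled to [0, 1]; no handcrafted features.
        return self.net(x.float() / 255.0)

def dqn_update(online, target, buffer, optimizer, batch_size=32, gamma=0.99):
    """One Q-learning step on a uniform minibatch of (s, a, r, s', done) tuples."""
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = map(list, zip(*batch))
    s = torch.stack(states)                        # (B, 4, 84, 84)
    a = torch.tensor(actions)                      # (B,)
    r = torch.tensor(rewards, dtype=torch.float32)
    s2 = torch.stack(next_states)
    done = torch.tensor(dones, dtype=torch.float32)

    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) of actions taken
    with torch.no_grad():
        # Bootstrap from a periodically synced target network; terminal
        # transitions contribute only their reward.
        td_target = r + gamma * (1.0 - done) * target(s2).max(dim=1).values
    loss = nn.functional.smooth_l1_loss(q, td_target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A complete agent would add ε-greedy action selection, a bounded replay buffer filled as the agent acts, and a periodic copy of the online weights into the target network.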

References

  1. University of Alberta - CMPUT 628: Deep Reinforcement Learning (2025)
    Course Website by Professor Marlos C. Machado.

  2. Sutton, R. S., & Barto, A. G. (2018)
    Reinforcement Learning: An Introduction (2nd Edition). MIT Press.

  3. Foundations of Deep RL (2021)
    YouTube Video Series by Professor Pieter Abbeel.

  4. UC Berkeley - CS 285: Deep Reinforcement Learning (2023)
    YouTube Lecture Series by Professor Sergey Levine.

Papers discussed in class

Deep RL Overview & DQN

  1. L. J. Lin (1992)
    "Self-improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching." Machine Learning, 8, 293–321.

  2. M. A. Riedmiller (2005)
    "Neural Fitted Q Iteration – First Experiences with a Data-Efficient Neural Reinforcement Learning Method." European Conference on Machine Learning (ECML).

  3. M. G. Bellemare, Y. Naddaf, J. Veness, & M. Bowling (2013)
    "The Arcade Learning Environment: An Evaluation Platform for General Agents." Journal of Artificial Intelligence Research, 47, 253–279.

  4. V. Mnih et al. (2013)
    "Playing Atari with Deep Reinforcement Learning." CoRR, abs/1312.5602.

  5. M. G. Bellemare, Y. Naddaf, J. Veness, & M. Bowling (2015)
    "The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)." International Joint Conference on Artificial Intelligence (IJCAI).

  6. V. Mnih et al. (2015)
    "Human-level Control Through Deep Reinforcement Learning." Nature, 518(7540), 529–533.

  7. Y. Liang, M. C. Machado, E. Talvitie, & M. H. Bowling (2016)
    "State of the Art Control of Atari Games Using Shallow Reinforcement Learning." International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

  8. M. C. Machado et al. (2018)
    "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents." Journal of Artificial Intelligence Research, 61, 523–562.

  9. H. van Hasselt et al. (2018)
    "Deep Reinforcement Learning and the Deadly Triad." CoRR, abs/1812.02648.

  10. R. Tachet des Combes, P. Bachman, & H. van Seijen (2018)
    "Learning Invariances for Policy Generalization." ICLR 2018 Workshop Track.

Deep Double Learning

  1. H. van Hasselt (2010)
    "Double Q-learning." Neural Information Processing Systems (NeurIPS).

  2. H. van Hasselt, A. Guez, & D. Silver (2016)
    "Deep Reinforcement Learning with Double Q-Learning." AAAI Conference on Artificial Intelligence.

Multi-step Methods

  1. R. Munos, T. Stepleton, A. Harutyunyan, & M. G. Bellemare (2016)
    "Safe and Efficient Off-Policy Reinforcement Learning." Neural Information Processing Systems (NeurIPS).

  2. G. Ostrovski, M. G. Bellemare, A. van den Oord, & R. Munos (2017)
    "Count-Based Exploration with Neural Density Models." International Conference on Machine Learning (ICML).

  3. Z. Wang et al. (2017)
    "Sample Efficient Actor-Critic with Experience Replay." International Conference on Learning Representations (ICLR).

  4. B. Daley, M. White, C. Amato, & M. C. Machado (2023)
    "Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning." International Conference on Machine Learning (ICML).

  5. B. Daley, M. White, & M. C. Machado (2024)
    "Averaging n-step Returns Reduces Variance in Reinforcement Learning." International Conference on Machine Learning (ICML).

Distributional RL

  1. M. G. Bellemare, W. Dabney, & R. Munos (2017)
    "A Distributional Perspective on Reinforcement Learning." International Conference on Machine Learning (ICML).

  2. W. Dabney, M. Rowland, M. G. Bellemare, & R. Munos (2018)
    "Distributional Reinforcement Learning With Quantile Regression." AAAI Conference on Artificial Intelligence.

  3. C. Lyle, M. G. Bellemare, & P. S. Castro (2019)
    "A Comparative Analysis of Expected and Distributional Reinforcement Learning." AAAI Conference on Artificial Intelligence.

  4. W. Dabney, G. Ostrovski, D. Silver, & R. Munos (2018)
    "Implicit Quantile Networks for Distributional Reinforcement Learning." International Conference on Machine Learning (ICML).

  5. J. Farebrother et al. (2024)
    "Stop Regressing: Training Value Functions via Classification for Scalable Deep RL." International Conference on Learning Representations (ICLR).

Auxiliary Objective Functions

  1. M. Jaderberg et al. (2017)
    "Reinforcement Learning with Unsupervised Auxiliary Tasks." International Conference on Learning Representations (ICLR).

  2. C. Gelada, S. Kumar, J. Buckman, O. Nachum, & M. G. Bellemare (2019)
    "DeepMDP: Learning Continuous Latent Space Models for Representation Learning." International Conference on Machine Learning (ICML).

  3. W. Dabney et al. (2021)
    "The Value-Improvement Path: Towards Better Representations for Reinforcement Learning." AAAI Conference on Artificial Intelligence.

  4. H. Wang et al. (2024)
    "Investigating the Properties of Neural Network Representations in Reinforcement Learning." Artificial Intelligence, 330, 104100.

Neural Network Architectures & Auxiliary Inputs

  1. Z. Wang et al. (2016)
    "Dueling Network Architectures for Deep Reinforcement Learning." International Conference on Machine Learning (ICML).

  2. M. J. Hausknecht & P. Stone (2015)
    "Deep Recurrent Q-Learning for Partially Observable MDPs." AAAI Fall Symposium Series.

  3. R. Y. Tao, A. White, & M. C. Machado (2023)
    "Agent-State Construction with Auxiliary Inputs." Transactions on Machine Learning Research.

Experience Replay Buffers

  1. W. Fedus et al. (2020)
    "Revisiting Fundamentals of Experience Replay." International Conference on Machine Learning (ICML).

  2. S. Zhang & R. S. Sutton (2017)
    "A Deeper Look at Experience Replay." CoRR, abs/1712.01275.

  3. T. Schaul et al. (2016)
    "Prioritized Experience Replay." International Conference on Learning Representations (ICLR).

  4. A. Nair et al. (2015)
    "Massively Parallel Methods for Deep Reinforcement Learning." CoRR, abs/1507.04296.

  5. D. Horgan et al. (2018)
    "Distributed Prioritized Experience Replay." International Conference on Learning Representations (ICLR).

  6. S. Kapturowski et al. (2019)
    "Recurrent Experience Replay in Distributed Reinforcement Learning." International Conference on Learning Representations (ICLR).

Policy Gradient Methods

  1. V. Mnih et al. (2016)
    "Asynchronous Methods for Deep Reinforcement Learning." International Conference on Machine Learning (ICML).

  2. T. P. Lillicrap et al. (2016)
    "Continuous Control with Deep Reinforcement Learning." International Conference on Learning Representations (ICLR).

  3. S. Fujimoto, H. van Hoof, & D. Meger (2018)
    "Addressing Function Approximation Error in Actor-Critic Methods." International Conference on Machine Learning (ICML).

  4. T. Haarnoja et al. (2018)
    "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." International Conference on Machine Learning (ICML).

  5. J. Schulman et al. (2015)
    "Trust Region Policy Optimization." International Conference on Machine Learning (ICML).

  6. J. Schulman et al. (2017)
    "Proximal Policy Optimization Algorithms." CoRR, abs/1707.06347.

  7. J. Schulman et al. (2016)
    "High-Dimensional Continuous Control Using Generalized Advantage Estimation." International Conference on Learning Representations (ICLR).

Model-based Reinforcement Learning

  1. J. Schrittwieser et al. (2020)
    "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model." Nature, 588(7839), 604–609.

  2. I. Antonoglou et al. (2022)
    "Planning in Stochastic Environments with a Learned Model." International Conference on Learning Representations (ICLR).

  3. D. Hafner et al. (2021)
    "Mastering Atari with Discrete World Models." International Conference on Learning Representations (ICLR).

  4. D. Hafner, J. Pasukonis, J. Ba, & T. P. Lillicrap (2023)
    "Mastering Diverse Domains through World Models." CoRR, abs/2301.04104.

Guest Lecture and Student Seminars

  1. M. Elsayed, G. Vasan, & A. R. Mahmood (2024)
    "Streaming Deep Reinforcement Learning Finally Works." CoRR, abs/2410.14606.

  2. C. Allen et al. (2024)
    "Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy." Neural Information Processing Systems (NeurIPS).

  3. Z. D. Guo et al. (2022)
    "BYOL-Explore: Exploration by Bootstrapped Prediction." Neural Information Processing Systems (NeurIPS).

  4. S. Flennerhag et al. (2022)
    "Bootstrapped Meta-Learning." International Conference on Learning Representations (ICLR).

  5. R. Asad et al. (2024)
    "Fast Convergence of Softmax Policy Mirror Ascent." CoRR, abs/2411.12042.

  6. T. Schaul, A. Barreto, J. Quan, & G. Ostrovski (2022)
    "The Phenomenon of Policy Churn." Neural Information Processing Systems (NeurIPS).

  7. C. Lyle, M. Rowland, & W. Dabney (2022)
    "Understanding and Preventing Capacity Loss in Reinforcement Learning." International Conference on Learning Representations (ICLR).

  8. A. Kumar, A. Zhou, G. Tucker, & S. Levine (2020)
    "Conservative Q-Learning for Offline Reinforcement Learning." Neural Information Processing Systems (NeurIPS).

  9. L. Chen et al. (2021)
    "Decision Transformer: Reinforcement Learning via Sequence Modeling." Neural Information Processing Systems (NeurIPS).

  10. G. Sokar et al. (2025)
    "Don’t Flatten, Tokenize! Unlocking the Key to SoftMoE’s Efficacy in Deep RL." International Conference on Learning Representations (ICLR).

  11. A. Dedieu et al. (2025)
    "Improving Transformer World Models for Data-Efficient RL." CoRR, abs/2502.01591.

  12. M. Klissarov et al. (2025)
    "On the Modeling Capabilities of Large Language Models for Sequential Decision Making." International Conference on Learning Representations (ICLR).