CS234: Reinforcement Learning Winter 2023

Course Description & Logistics

Realizing the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling, and healthcare. This class provides a solid introduction to the field of reinforcement learning, and students will learn about its core challenges and approaches, including generalization and exploration. Through a combination of lectures and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will cover the basics of reinforcement learning as well as deep reinforcement learning, an extremely promising new area that combines deep learning techniques with reinforcement learning.

Communication: We will use Ed discussion forums. We encourage all students to use Ed for the fastest response to their questions.

Lectures: Live every Tuesday and Thursday. Videos of the lecture content will also be made available to enrolled students through Canvas.
Office hours: Will be announced in the first week of class.

Platforms: All assignments and quizzes will be handled through Gradescope, where you will also find your grades. We will send out links and access codes to enrolled students through Canvas.

Course Instructor

Emma Brunskill

Course Assistants

Dilip Arumugam (Head TA)

Skanda Vaidyanath

Jian Vora

Max Sobol Mark

Regina Wang

Anirudhan Badrinath

Prerequisites for This Class

Proficiency in Python
All class assignments will be in Python. There is a tutorial here for those who aren't as familiar with Python. If you have a lot of programming experience but in a different language (e.g. C, C++, MATLAB, or JavaScript), you will probably be fine.
College Calculus, Linear Algebra (e.g. MATH 51, CME 100)
You should be comfortable taking derivatives and understanding matrix vector operations and notation.
Basic Probability and Statistics (e.g. CS 109 or other stats course)
You should know basics of probabilities, Gaussian distributions, mean, standard deviation, etc.
Foundations of Machine Learning
We will be formulating cost functions, taking derivatives, and performing optimization with gradient descent. Either CS 221 or CS 229 covers this background. Some optimization tricks will be more intuitive with some knowledge of convex optimization.

Learning Outcomes

By the end of the class students should be able to:

Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
Given an application problem (e.g. from computer vision, robotics, etc.), decide whether it should be formulated as an RL problem; if so, define it formally (in terms of the state space, action space, dynamics, and reward model), state which algorithm (from class) is best suited to addressing it, and justify your answer (as assessed by the exam).
Implement in code common RL algorithms (as assessed by the assignments).
Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on these metrics: e.g. regret, sample complexity, computational complexity, empirical performance, convergence, etc (as assessed by assignments and the exam).
Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by an assignment and the exam).

Course Lecture Materials (Videos and Slides)

See the Lecture Materials page.

Draft Course Schedule

	Monday	Tuesday	Wednesday	Thursday	Friday	Saturday	Sunday
Week 1	Jan 9	Jan 10	Jan 11	Jan 12	Jan 13	Jan 14	Jan 15
Lecture Materials		Introduction to Reinforcement Learning 3pm-4:20pm		Tabular MDP planning 3pm-4:20pm; [Assignment 1 Released]
Week 2	Jan 16	Jan 17	Jan 18	Jan 19	Jan 20	Jan 21	Jan 22
Lecture Materials		PyTorch tutorial 3pm-4:20pm		Tabular RL policy evaluation 3pm-4:20pm; Problem Session 1 [Solutions]	Assignment 1 Due at 6 pm; [Assignment 2 Released]
Week 3	Jan 23	Jan 24	Jan 25	Jan 26	Jan 27	Jan 28	Jan 29
Lecture Materials	Problem Session 2 [Solutions]	Q learning and Function approximation 1 3pm-4:20pm		Function approximation 2 3pm-4:20pm
Week 4	Jan 30	Jan 31	Feb 1	Feb 2	Feb 3	Feb 4	Feb 5
Lecture Materials	Problem Session 3 [Solutions]	Function approximation 3 3pm-4:20pm		Policy Search 3pm-4:20pm	Assignment 2 Due at 6 pm
Week 5	Feb 6	Feb 7	Feb 8	Feb 9	Feb 10	Feb 11	Feb 12
Lecture Materials		Policy Search 3pm-4:20pm		Midterm		[Assignment 3 Released]
Week 6	Feb 13	Feb 14	Feb 15	Feb 16	Feb 17	Feb 18	Feb 19
Lecture Materials	Problem Session 4 [Solutions]; Project Proposal Due at 6 pm	Policy Search 3pm-4:20pm		Exploration / exploitation 3pm-4:20pm
Week 7	Feb 20	Feb 21	Feb 22	Feb 23	Feb 24	Feb 25	Feb 26
Lecture Materials		Exploration / exploitation 3pm-4:20pm		Batch RL 3pm-4:20pm			Assignment 3 Due at 6 pm
Week 8	Feb 27	Feb 28	Mar 1	Mar 2	Mar 3	Mar 4	Mar 5
Lecture Materials	Problem Session 5 [Solutions]	Batch RL 3pm-4:20pm		Imitation Learning 3pm-4:20pm; Project Milestone Due at 6 pm
Week 9	Mar 6	Mar 7	Mar 8	Mar 9	Mar 10	Mar 11	Mar 12
Lecture Materials		Guest Lecture 3pm-4:20pm		In-class Quiz
Week 10	Mar 13	Mar 14	Mar 15	Mar 16	Mar 17	Mar 18	Mar 19
Lecture Materials		Value Alignment 3pm-4:20pm		Poster Session 3 - 5 pm

Textbooks

There is no official textbook for the class but a number of the supporting readings will come from:

Reinforcement Learning: An Introduction, Sutton and Barto, 2nd Edition. It is available for free here, and references will refer to the final PDF version available here.

Some other additional references that may be useful are listed below:

Reinforcement Learning: State-of-the-Art, Marco Wiering and Martijn van Otterlo, Eds. [link]
Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig. [link]
Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [link]
David Silver's course on Reinforcement Learning [link]

Grade Breakdown

Assignment 1: 10%
Assignment 2: 18%
Assignment 3: 18%
Midterm: 25%
Quiz: 5%
Course Project: 24%
- Proposal: 1%
- Milestone: 2%
- Poster Presentation: 5%
- Paper: 16%
If you choose to do the default project/4th assignment, your breakdown will instead be:
- Poster presentation: 5%
- Paper/assignment write-up: 19%

0.5% bonus for participation: answer the lecture polls on at least 80% of the lecture days that have polls. You may participate in these remotely as well. Polls are due by Sunday at 6pm for the week of lecture. Complete them while logged in with your Stanford SUNet ID so that your participation counts.
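
As a rough illustration of how the weights above combine, here is a minimal Python sketch. The weights are copied from the breakdown; everything else is an assumption made for illustration, not official course tooling: the names `course_total`, `CUSTOM_PROJECT`, and `DEFAULT_PROJECT`, the convention that each score is a fraction between 0 and 1, and the choice to add the participation bonus as 0.5 percentage points on top of the total.

```python
# Weights in percentage points, copied from the Grade Breakdown above.
CORE = {"assignment1": 10, "assignment2": 18, "assignment3": 18,
        "midterm": 25, "quiz": 5}
CUSTOM_PROJECT = {"proposal": 1, "milestone": 2, "poster": 5, "paper": 16}
DEFAULT_PROJECT = {"poster": 5, "paper": 19}  # default project / 4th assignment
PARTICIPATION_BONUS = 0.5  # assumed to be added on top of the weighted total


def course_total(scores, default_project=False, participated=False):
    """Return the course total in percentage points.

    `scores` maps each component name to a fraction in [0, 1]
    (a hypothetical input format chosen for this sketch).
    """
    project = DEFAULT_PROJECT if default_project else CUSTOM_PROJECT
    weights = {**CORE, **project}
    total = sum(weight * scores[name] for name, weight in weights.items())
    if participated:
        total += PARTICIPATION_BONUS
    return total


# Example: full marks everywhere except 80% on the midterm.
example = {name: 1.0 for name in {**CORE, **CUSTOM_PROJECT}}
example["midterm"] = 0.8
print(course_total(example, participated=True))
```

Both project variants sum to 24 points, so the components add up to 100 in either case.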

Late Day Policy

You can use 5 late days total.
A late day extends the deadline by 24 hours.
You may use up to 2 late days each on assignments 1, 2, and 3, the project proposal, and the project milestone, not to exceed 5 late days total. You may not use any late days for the project poster presentation or the final project paper. For group submissions such as the project proposal and milestone, each group member is charged the corresponding number of late days; if one or more members do not have enough late days left, all group members will incur a grade penalty of 50% within 24 hours and 100% after 24 hours, as explained below.
If you use two late days and hand in an assignment more than 48 hours after the deadline, it will be worth at most 50%. If you do not have enough late days left, an assignment handed in within 24 hours of its due date (adjusted for any late days used) will be worth at most 50%. No credit will be given for assignments handed in more than 24 hours after their due date, adjusted for any late days used. For example, if you use 2 late days, no credit is given after 72 hours. Please contact us if you think you have an extremely rare circumstance for which we should make an exception. This policy is to ensure that feedback can be given in a timely manner.
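
One way to read the rules above is as a small function from lateness to a credit cap. The Python sketch below is only an interpretation for illustration, not an official calculator. The function name `late_credit_cap` and its parameters are hypothetical, and it assumes late days are applied automatically (as many as needed, up to 2 per submission) before any penalty is considered. It does not cover the poster presentation or final paper, which accept no late days.

```python
import math

MAX_LATE_DAYS_PER_SUBMISSION = 2  # from the policy above


def late_credit_cap(hours_late, late_days_remaining):
    """Return (late_days_used, credit_cap) for one submission.

    hours_late: hours past the original deadline (<= 0 means on time).
    late_days_remaining: late days left out of the 5 allowed in total.
    credit_cap is the maximum fraction of credit: 1.0, 0.5, or 0.0.
    """
    if hours_late <= 0:
        return 0, 1.0
    # Assumption: use as many late days as needed, subject to both limits.
    needed = math.ceil(hours_late / 24)
    used = min(needed, MAX_LATE_DAYS_PER_SUBMISSION, late_days_remaining)
    extended_deadline = 24 * used
    if hours_late <= extended_deadline:
        return used, 1.0  # fully covered by late days
    if hours_late <= extended_deadline + 24:
        return used, 0.5  # within 24 hours of the adjusted deadline
    return used, 0.0      # more than 24 hours past the adjusted deadline


print(late_credit_cap(30, 5))  # covered by 2 late days
print(late_credit_cap(50, 5))  # 2 late days used, then capped at 50%
print(late_credit_cap(73, 5))  # past 72 hours, no credit
```

For group submissions, the same check would apply with the smallest `late_days_remaining` among the group members, since every member is charged the late days.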
