Course Description & Logistics

Realizing the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling, and healthcare. This class provides a solid introduction to the field of reinforcement learning, and students will learn about the core challenges and approaches, including generalization and exploration. Through a combination of lectures and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will cover the basics of reinforcement learning as well as deep reinforcement learning, an extremely promising new area that combines deep learning techniques with reinforcement learning.

Communication: We will use Ed discussion forums. We encourage you to use Ed for the fastest response to your questions.

  • Lectures will be live every Tuesday and Thursday. Videos of the lecture content will also be made available to enrolled students through Canvas.
  • Office hours will be announced in the first week of class.

Platforms: All assignments and quizzes will be handled through Gradescope, where you will also find your grades. We will send out links and access codes to enrolled students through Canvas.

Course Instructor

Course Assistants

  • Max Sobol Mark
  • Regina Wang
  • Anirudhan Badrinath

Prerequisites for This Class

  • Proficiency in Python
    All class assignments will be in Python. There is a tutorial here for those who aren't as familiar with Python. If you have a lot of programming experience but in a different language (e.g. C, C++, MATLAB, or JavaScript), you will probably be fine.
  • College Calculus, Linear Algebra (e.g. MATH 51, CME 100)
    You should be comfortable taking derivatives and understanding matrix-vector operations and notation.
  • Basic Probability and Statistics (e.g. CS 109 or other stats course)
    You should know the basics of probability, Gaussian distributions, mean, standard deviation, etc.
  • Foundations of Machine Learning
    We will be formulating cost functions, taking derivatives, and performing optimization with gradient descent. Either CS 221 or CS 229 covers this background. Some optimization tricks will be more intuitive with some knowledge of convex optimization.

Learning Outcomes

By the end of the class students should be able to:

  • Define the key features of reinforcement learning that distinguish it from AI and non-interactive machine learning (as assessed by the exam).
  • Given an application problem (e.g. from computer vision, robotics, etc.), decide if it should be formulated as an RL problem; if so, be able to define it formally (in terms of the state space, action space, dynamics, and reward model), state which algorithm (from class) is best suited for addressing it, and justify your answer (as assessed by the exam).
  • Implement in code common RL algorithms (as assessed by the assignments).
  • Describe (list and define) multiple criteria for analyzing RL algorithms and evaluate algorithms on these metrics: e.g. regret, sample complexity, computational complexity, empirical performance, convergence, etc (as assessed by assignments and the exam).
  • Describe the exploration vs exploitation challenge and compare and contrast at least two approaches for addressing this challenge (in terms of performance, scalability, complexity of implementation, and theoretical guarantees) (as assessed by an assignment and the exam).
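
As a rough illustration of what "define it formally" means in the outcomes above, the four ingredients (state space, action space, dynamics, reward model) can be written down as a small data structure. This is only a sketch: the class name `TabularMDP`, its `validate` method, and the two-state toy problem are hypothetical and do not come from the course materials.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Tuple

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class TabularMDP:
    """Hypothetical container for the four ingredients of a formal RL problem."""

    states: Tuple[State, ...]
    actions: Tuple[Action, ...]
    # dynamics[(s, a)][s_next] = probability of moving to s_next
    dynamics: Dict[Tuple[State, Action], Dict[State, float]]
    # reward[(s, a)] = expected immediate reward
    reward: Dict[Tuple[State, Action], float]

    def validate(self) -> None:
        """Check that every (state, action) pair has a reward and a valid distribution."""
        for s in self.states:
            for a in self.actions:
                if (s, a) not in self.reward:
                    raise ValueError(f"no reward for {(s, a)}")
                probs = self.dynamics.get((s, a), {})
                if any(s_next not in self.states for s_next in probs):
                    raise ValueError(f"unknown next state for {(s, a)}")
                if any(p < 0 for p in probs.values()):
                    raise ValueError(f"negative probability for {(s, a)}")
                if abs(sum(probs.values()) - 1.0) > 1e-9:
                    raise ValueError(f"probabilities for {(s, a)} do not sum to 1")


# Toy two-state problem, invented purely for illustration.
toy = TabularMDP(
    states=("idle", "busy"),
    actions=("wait", "work"),
    dynamics={
        ("idle", "wait"): {"idle": 1.0},
        ("idle", "work"): {"busy": 0.9, "idle": 0.1},
        ("busy", "wait"): {"idle": 0.5, "busy": 0.5},
        ("busy", "work"): {"busy": 1.0},
    },
    reward={
        ("idle", "wait"): 0.0,
        ("idle", "work"): -1.0,
        ("busy", "wait"): 0.0,
        ("busy", "work"): 2.0,
    },
)
toy.validate()  # raises ValueError if the specification is inconsistent
```

Writing the problem down this way makes it easy to spot a missing piece, such as a state-action pair with no reward or transition probabilities that do not sum to one, before choosing an algorithm.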

Course Lecture Materials (Videos and Slides)

See the Lecture Materials page.

Draft Course Schedule


Each week runs Monday through Sunday.

Week 1 (Jan 9 to Jan 15)
  • Introduction to Reinforcement Learning, 3pm-4:20pm
  • Tabular MDP planning, 3pm-4:20pm
  • [Assignment 1 Released]

Week 2 (Jan 16 to Jan 22)
  • PyTorch tutorial, 3pm-4:20pm
  • Tabular RL policy evaluation, 3pm-4:20pm
  • Problem Session 1 [Solutions]
  • Assignment 1 due at 6 pm
  • [Assignment 2 Released]

Week 3 (Jan 23 to Jan 29)
  • Problem Session 2 [Solutions]
  • Q learning and Function approximation 1, 3pm-4:20pm
  • Function approximation 2, 3pm-4:20pm

Week 4 (Jan 30 to Feb 5)
  • Problem Session 3 [Solutions]
  • Function approximation 3, 3pm-4:20pm
  • Policy Search, 3pm-4:20pm
  • Assignment 2 due at 6 pm

Week 5 (Feb 6 to Feb 12)
  • Policy Search, 3pm-4:20pm
  • Midterm
  • [Assignment 3 Released]

Week 6 (Feb 13 to Feb 19)
  • Problem Session 4 [Solutions]
  • Project Proposal due at 6 pm
  • Policy Search, 3pm-4:20pm
  • Exploration / exploitation, 3pm-4:20pm

Week 7 (Feb 20 to Feb 26)
  • Exploration / exploitation, 3pm-4:20pm
  • Batch RL, 3pm-4:20pm
  • Assignment 3 due at 6 pm

Week 8 (Feb 27 to Mar 5)
  • Problem Session 5 [Solutions]
  • Batch RL, 3pm-4:20pm
  • Imitation Learning, 3pm-4:20pm
  • Project Milestone due at 6 pm

Week 9 (Mar 6 to Mar 12)
  • Guest Lecture, 3pm-4:20pm
  • In-class Quiz

Week 10 (Mar 13 to Mar 19)
  • Value Alignment, 3pm-4:20pm
  • Poster Session, 3 - 5 pm

Textbooks

There is no official textbook for the class but a number of the supporting readings will come from: Some other additional references that may be useful are listed below:

Grade Breakdown

  • Assignment 1: 10%
  • Assignment 2: 18%
  • Assignment 3: 18%
  • Midterm: 25%
  • Quiz: 5%
  • Course Project: 24%
    • Proposal: 1%
    • Milestone: 2%
    • Poster Presentation: 5%
    • Paper: 16%
    • If you choose to do the default project/4th assignment, your breakdown will instead be
      • Poster presentation: 5%
      • Paper/assignment write up: 19%
  • 0.5% bonus for participating: answering lecture polls for 80% of the lecture days that have polls. You may participate in these remotely as well. Polls are due by Sunday at 6pm for the week of the lecture. Log in with your Stanford sunid when completing them so that your participation counts.
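
For anyone who wants to track their standing, the breakdown above can be turned into a small calculator. This is a minimal sketch under stated assumptions: the weights are copied from the list above (using the standard project split, not the default project variant), while the function name `course_percentage`, the plain weighted sum, and the treatment of the bonus as 0.5 percentage points added on top are assumptions rather than the official grading procedure.

```python
# Weights in percent of the final grade, copied from the breakdown above
# (standard course project split).
WEIGHTS = {
    "assignment_1": 10,
    "assignment_2": 18,
    "assignment_3": 18,
    "midterm": 25,
    "quiz": 5,
    "project_proposal": 1,
    "project_milestone": 2,
    "project_poster": 5,
    "project_paper": 16,
}
# Assumption: the participation bonus is 0.5 percentage points added on top.
PARTICIPATION_BONUS = 0.5


def course_percentage(scores, participated=False):
    """Combine per-component scores (fractions in [0, 1]) into a percentage.

    Assumption: components combine as a plain weighted sum; any curving or
    late-day adjustments are not modeled here.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    if participated:
        total += PARTICIPATION_BONUS
    return total


# Sanity check: the listed weights account for the whole grade.
assert sum(WEIGHTS.values()) == 100

# Example with made-up scores (fraction of points earned per component).
example_scores = {name: 0.9 for name in WEIGHTS}
example_scores["midterm"] = 0.8
print(course_percentage(example_scores, participated=True))
```

Keeping the weights in one dictionary makes it straightforward to check that they sum to 100 and to swap in the default project split if that option is chosen.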

Late Day Policy

Exams

Assignments and Submission Process


Communication

We believe students often learn an enormous amount from each other as well as from us, the course staff. Therefore, to facilitate discussion and peer learning, we ask that you use Ed for all questions related to lectures and assignments.

For SCPD students: if you have general SCPD-specific questions, please email scpdsupport@stanford.edu or call 650-741-1542. If you have questions related to being an SCPD student in this particular class, please contact us at cs234-win2223-staff@lists.stanford.edu.

For exceptional circumstances that require us to make special arrangements, please email us at cs234-win2223-staff@lists.stanford.edu. For example, such a situation may arise if a student requires extra days to submit a homework due to a medical emergency, or if a student needs to schedule an alternative midterm date due to events such as conference travel. Such requests will be considered and approved on a case-by-case basis.

Regrading Requests

Academic Collaboration and Misconduct

I care about academic collaboration and misconduct both because it is important that we are able to evaluate your own work (independent of your peers’) and because not claiming others’ work as your own is an important part of integrity in your future career. I understand that different institutions and locations can have different definitions of what forms of collaborative behavior are considered acceptable. In this class, for written homework problems, you are welcome to discuss ideas with others, but you are expected to write up your own solutions independently (without referring to another’s solutions). For coding, you may only share the input-output behavior of your programs. This encourages you to work separately but share ideas on how to test your implementation. Please remember that if you share your solution with another student, even if you did not copy from another, you are still violating the honor code.

We periodically run similarity-detection software over all submitted student programs, including programs from past quarters and any solutions found online on public websites. Anyone violating the Stanford University Honor Code will be referred to the Office of Judicial Affairs. If you think you made a mistake (it can happen, especially under stress or when time is short!), please reach out to Emma or the head CA; the consequences will be much less severe than if we approach you.

Academic Accommodation

If you need an academic accommodation based on the impact of a disability, please share your Office of Accessible Education letter with us via an email to our course staff list as soon as it is convenient for you. This helps us ensure the course materials and staff support can meet your needs. The OAE is located at 563 Salvatierra Walk (650-723-1066, http://studentaffairs.stanford.edu/oae).

Credit/No Credit Enrollment

If you're enrolled in the class on credit/no credit status, you will be graded on work as usual per standard Stanford rules. The only difference from taking the class for a letter grade is that you must obtain a C- (C minus) grade or higher in the class to be marked as CR.