CS 294 Deep Reinforcement Learning, Spring 2017 (rll.berkeley.edu)
210 points by aaronjg on Jan 3, 2017 | 18 comments


Quickly glanced through the syllabus, and this seems to cover mostly the advanced aspects of Reinforcement Learning; it assumes you already know the basic concepts such as MDPs, training models, etc.

For those interested in this, I would strongly recommend David Silver's intro to RL [1] before beginning the above course.

1. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html


Everything you say (including the David Silver recommendation) is already stated on the course website under "Prerequisites".


Yes, the basics of MDPs, value/policy iteration, etc. are covered in the undergraduate Artificial Intelligence class (CS 188).
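For anyone rusty on those basics, here's a minimal tabular value-iteration sketch; the toy transition and reward tables are random, purely for illustration:

    import numpy as np

    n_states, n_actions, gamma = 4, 2, 0.99
    rng = np.random.RandomState(0)

    # Toy MDP: P[s, a, s'] = transition probability, R[s, a] = expected reward.
    P = rng.rand(n_states, n_actions, n_states)
    P /= P.sum(axis=2, keepdims=True)
    R = rng.rand(n_states, n_actions)

    V = np.zeros(n_states)
    for _ in range(1000):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:   # stop once values have converged
            break
        V = V_new

    print(V, Q.argmax(axis=1))  # optimal state values and a greedy policy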


Looks really cool.

I recently hit a roadblock when trying to implement the original DeepMind Atari algorithm [0] with TensorFlow. They don't mention this in the paper, but the network wasn't trained to convergence at each training step (maybe this would be obvious to people more well-versed in deep learning, but it wasn't to me coming from a classical RL background).

As it turns out, TensorFlow's optimizers don't have a way to manually terminate training before convergence. That meant I was getting through several orders of magnitude fewer training steps than the DeepMind team did, even after accounting for my inferior hardware. This might not be a problem in some settings, where extra training on certain examples lets you extract more information from them, but in games with sparse rewards it hurts.

Of course, TensorFlow does let you do the gradient calculations and updates by hand, but I wasn't prepared to go that far at the time. Maybe in the next few weeks I'll dive back into it.

[0] https://arxiv.org/pdf/1312.5602.pdf
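In case it helps anyone hitting the same wall, here's a rough sketch of the "gradient calculations and updates by hand" route in TensorFlow 1.x. The tiny two-action Q-network, shapes, and learning rate are made up for illustration and are not the DeepMind architecture:

    import tensorflow as tf

    states = tf.placeholder(tf.float32, [None, 4])    # toy 4-dim state, 2 actions
    actions = tf.placeholder(tf.int32, [None])
    targets = tf.placeholder(tf.float32, [None])      # TD targets computed outside the graph

    W = tf.Variable(tf.random_normal([4, 2], stddev=0.1))
    b = tf.Variable(tf.zeros([2]))
    q_values = tf.matmul(states, W) + b
    q_taken = tf.reduce_sum(q_values * tf.one_hot(actions, 2), axis=1)
    loss = tf.reduce_mean(tf.square(targets - q_taken))

    optimizer = tf.train.RMSPropOptimizer(learning_rate=2.5e-4)
    grads_and_vars = optimizer.compute_gradients(loss)    # explicit gradients
    train_op = optimizer.apply_gradients(grads_and_vars)  # one update per run

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Each sess.run(train_op, feed_dict=...) applies exactly one gradient
        # step on one minibatch, so you decide how many steps to take per
        # batch of experience rather than training to convergence.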


> As it turns out, TensorFlow's optimizers don't have a way to manually terminate training before convergence.

I don't know how you determined this, but the optimizer's minimize op definitely does only one step per run, equivalent to doing the gradient update yourself.
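A quick way to verify this (TF 1.x): the op returned by minimize applies exactly one gradient-descent update each time it's run:

    import tensorflow as tf

    x = tf.Variable(10.0)
    loss = tf.square(x)                                       # d(loss)/dx = 2x
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(x))   # 10.0
        sess.run(train_op)   # one step: x <- x - 0.1 * (2 * 10) = 8.0
        print(sess.run(x))   # 8.0 -- a single update, not a full optimization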


Will non-Berkeley students be able to participate in the discussions on Piazza?

If not, for those interested in following this course online, we might want to start a slack channel study group around this to help each other out. PM if interested.


Stanford's cs231n had a subreddit: reddit.com/r/cs231n. Something similar might be nicer for organization in addition to chat.


Please post details on where to join the slack :)


Seconded.


A slack study group is up at deep-rl-study.slack.com.

Email me (address in my "about") for an invitation :)


The course website mentions that the lectures may be recorded. Do you know how likely that is, and if so, whether the recordings will be posted publicly (this semester)?


Just signed up there, I'd be interested if one gets going. Thanks.


Can't PM you, but willing to get one going.


Interested.


Can't PM, but interested.


It would be great if cleaned-up demo code for many of these models/algorithms could be shared in a single "deep RL quickstart" repo.

Various implementations (sometimes of dubious correctness) are already scattered around Github, but having a single library of code to build from when booting up a new research project would be a boon to people who don't have such great access to collaborators' codebases.

Thanks for sharing these resources.


Will assignments be posted?


Yes.



