Reinforcement Learning


How can an agent learn to act given only indirect, delayed rewards or penalties as feedback?

Consider a robot learning to act in its environment.


General Reinforcement Learning Task



Q Learning


Q Learning Algorithm
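
As a rough illustration of the algorithm this heading names, here is a minimal sketch of tabular Q learning for a deterministic world. Everything specific in it is an assumption made for the example and is not taken from the outline: the five-state corridor, the reward of 100 for entering the goal, the discount factor of 0.9, the purely random action selection, and the names `step`, `q_learning`, `q_table`, and `greedy_policy`. Each update overwrites the table entry with the immediate reward plus the discounted best estimate for the next state.

```python
import random

GAMMA = 0.9          # discount factor (assumed value for this sketch)
N_STATES = 5         # hypothetical corridor: states 0..4
GOAL = N_STATES - 1  # state 4 is an absorbing goal state
ACTIONS = (-1, +1)   # move left or move right


def step(state, action):
    """Hypothetical deterministic environment: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 100 if next_state == GOAL else 0
    return next_state, reward


def q_learning(episodes=200, seed=0):
    """Tabular Q learning with random exploration in a deterministic world."""
    rng = random.Random(seed)
    # Initialize every table entry to zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(GOAL)  # start in any non-goal state
        while state != GOAL:
            action = rng.choice(ACTIONS)              # select an action
            next_state, reward = step(state, action)  # execute it
            # Deterministic update rule: Q(s, a) <- r + gamma * max_a' Q(s', a')
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] = reward + GAMMA * best_next
            state = next_state
    return q


q_table = q_learning()
# Greedy policy: in each non-goal state, pick the action with the largest Q value.
greedy_policy = {
    s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)
}
```

In this toy corridor the learned values fall off by a factor of `GAMMA` for each extra step away from the goal, so the greedy policy moves right from every state. Random action selection is used only to keep the sketch short; other exploration strategies are equally compatible with the update rule.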


Another Example


Properties of Q Learning


Nondeterministic Q Learning