Ref: https://www.cs.toronto.edu/~jlucas/teaching/csc411/lectures/lec21_22_handout.pdf

Topics to formulate:

What is a Policy? (Deterministic Policy, Stochastic Policy)
What is a Value Function?
What is a Model? What is Model-Free? Markov Property for the Model
MDP Problems
Exploration and Exploitation
Bellman Equations
Q-Learning
Function Approximation for Large State Spaces
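
To make several of the topics above concrete (stochastic vs. deterministic policy, exploration vs. exploitation, the Bellman-style update, and Q-Learning), here is a minimal sketch of tabular Q-learning. The 5-state chain environment, the function names (`step`, `epsilon_greedy`, `q_learning`, `greedy_policy`), and the hyperparameter values are all illustrative assumptions, not taken from the lecture handout.

```python
import random

# Hypothetical 5-state chain MDP (an assumption for illustration):
# states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Stochastic policy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Model-free learning: only sampled transitions from step() are used."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target from the Bellman optimality equation:
            # r + gamma * max_a' Q(s', a'), with no bootstrap at terminal.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Deterministic policy read off the learned Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
```

With these defaults, `greedy_policy(q_learning())` should pick action 1 (right) in every non-terminal state. The table `q` stores one value per state-action pair, which is the part that function approximation would replace when the state space is too large to enumerate.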