This notebook covers reinforcement learning combined with search methods. Specifically, we focus on the line of work comprising AlphaGo [1], AlphaGo Zero [2], AlphaZero [3], and MuZero [4]. We walk through these works step by step, highlighting the advances each one made toward training stronger RL models.
[System 1 vs System 2 in Human Thought]: The concepts of System 1 and System 2 thinking were popularized by the psychologist Daniel Kahneman, who spent his career studying the human decision-making process, through his best-seller Thinking, Fast and Slow. In the book, Kahneman uses the two concepts to describe two modes of human thought. System 1 thinking is an automatic, effortless process that operates quickly, continuously, and unconsciously. For example, a person immediately recognizes a dog as a dog without needing to analyze its face, fur, or body. In contrast, System 2 thinking is deliberate reasoning that demands focused effort. For example, solving a non-trivial math problem requires analyzing the problem and working through careful derivations.