David Silver - The Predictron: End-To-End Learning and Planning (2016)

Created: June 23, 2017 / Updated: February 6, 2021 / Status: finished / 1 min read (~126 words)

  • The predictron is composed of four main components:
    • A state representation $\textbf{s} = f(s)$ that encodes raw input $s$
    • A model $\textbf{s}', \textbf{r}, \boldsymbol{\gamma} = m(\textbf{s}, \beta)$ that maps from internal state $\textbf{s}$ to subsequent internal state $\textbf{s}'$, internal reward $\textbf{r}$, and internal discount $\boldsymbol{\gamma}$
    • A value function $v$ that outputs internal values $\textbf{v} = v(\textbf{s})$ representing the future internal return from internal state $\textbf{s}$ onwards
    • An accumulator that combines internal rewards, discounts, and values into an overall estimate of value $\textbf{g}$
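
As a rough illustration of how the four components fit together, here is a minimal sketch of a $k$-step rollout with scalar internal states. It assumes the accumulator takes the simple nested form $g^k = r^1 + \gamma^1 (r^2 + \gamma^2 (\dots (r^k + \gamma^k v^k)))$. The names `k_step_preturn`, `predictron_rollout`, and the `toy_*` functions are hypothetical, the toy components are hand-written stand-ins rather than learned networks, and the $\beta$ argument of the model is left out because these notes do not define it.

```python
from typing import Callable, List, Sequence, Tuple


def k_step_preturn(rewards: Sequence[float],
                   discounts: Sequence[float],
                   final_value: float) -> float:
    """Accumulator sketch: fold rewards and discounts around the final value.

    Computes r1 + g1 * (r2 + g2 * (... (rk + gk * v_k))) from the inside out.
    """
    g = final_value
    for r, gamma in zip(reversed(rewards), reversed(discounts)):
        g = r + gamma * g
    return g


def predictron_rollout(raw_input: float,
                       f: Callable[[float], float],
                       m: Callable[[float], Tuple[float, float, float]],
                       v: Callable[[float], float],
                       k: int) -> float:
    """Encode the input, roll the internal model forward k steps, accumulate."""
    s = f(raw_input)                 # state representation
    rewards: List[float] = []
    discounts: List[float] = []
    for _ in range(k):
        s, r, gamma = m(s)           # model: next state, reward, discount
        rewards.append(r)
        discounts.append(gamma)
    return k_step_preturn(rewards, discounts, v(s))  # value + accumulator


# Toy stand-ins (assumptions for illustration only, not learned components).
def toy_f(x: float) -> float:
    return x


def toy_m(s: float) -> Tuple[float, float, float]:
    return 0.5 * s, 1.0, 0.9         # halve the state, reward 1, discount 0.9


def toy_v(s: float) -> float:
    return s


estimate = predictron_rollout(4.0, toy_f, toy_m, toy_v, k=2)
print(round(estimate, 6))
```

With these toy components and `k=2`, the internal state goes 4 → 2 → 1, so the estimate is $1 + 0.9\,(1 + 0.9 \cdot 1) = 2.71$. With `k=0` the rollout reduces to the value function applied directly to the encoded state.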

  • Silver, David, et al. "The predictron: End-to-end learning and planning." arXiv preprint arXiv:1612.08810 (2016).