Volodymyr Mnih - Human-level control through deep reinforcement learning (2015)

Created: June 10, 2017 / Updated: February 6, 2021 / Status: finished

  • "The presence in the t-SNE embedding of overlapping clusters of points corresponding to the network representation of states experienced during human and agent play shows that the DQN agent also follows sequences of states similar to those found in human play."
    • What is actually being claimed here? That the game passes through similar states for both a human player and a DQN agent? That seems close to a tautology, since both are playing the same game...

  • Two key ideas
    • First, we used a biologically inspired mechanism termed experience replay that randomizes over the data, thereby removing correlations in the observation sequence and smoothing over changes in the data distribution.
    • Second, we used an iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target.
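
The two ideas above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a tabular Q (a list of per-state action-value lists) in place of the deep network, and the names `ReplayBuffer`, `q_update`, and `sync_target` are hypothetical, chosen only for this note.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # Once full, the oldest transition is evicted on each push.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_update(q, q_target, batch, alpha=0.1, gamma=0.99):
    """Move q toward targets computed from the frozen copy q_target.

    Assumption: q and q_target are lists indexed by state, each holding a
    list of action values (a tabular stand-in for the Q-network).
    """
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target[next_state])
        target = reward + bootstrap
        q[state][action] += alpha * (target - q[state][action])


def sync_target(q, q_target):
    """Copy the online values into the target table (done only periodically)."""
    for s in range(len(q)):
        q_target[s] = list(q[s])
```

In a training loop one would push each transition into the buffer, call `q_update` on a sampled batch at every step, and call `sync_target` only every fixed number of steps, so the targets stay fixed between syncs.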

  • Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.