Miltiadis Allamanis - SmartPaste: Learning to Adapt Source Code (2017)

Created: May 25, 2017 / Updated: February 6, 2021 / Status: finished / 2 min read (~212 words)

  • Given a program where variables have been replaced by placeholders, determine what variable should be used for each placeholder
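
To make the task concrete, here is a minimal sketch of how a SmartPaste-style input could be built from a snippet. The function name `make_placeholders`, the `<P1>` placeholder format, and the example snippet are assumptions made for illustration, not taken from the paper.

```python
import re

def make_placeholders(snippet: str, variables: set[str]) -> tuple[str, list[str]]:
    """Replace every occurrence of a known variable with a numbered placeholder.

    Returns the masked snippet and the list of hidden variable names
    (the answers a model would have to recover), in order of appearance.
    """
    answers: list[str] = []

    def mask(match: re.Match) -> str:
        name = match.group(0)
        if name not in variables:
            return name  # keywords, function names, etc. are left untouched
        answers.append(name)
        return f"<P{len(answers)}>"

    # Match identifier-like tokens only; literals and operators are skipped.
    masked = re.sub(r"[A-Za-z_]\w*", mask, snippet)
    return masked, answers

# Hypothetical snippet and variable set.
masked, answers = make_placeholders("total = total + prices[i]", {"total", "prices", "i"})
print(masked)
print(answers)
```

The model's job is then the inverse: given `masked` and the variables in scope at the paste location, recover `answers`.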

  • Existing machine learning models of source code capture its shallow, textual structure, e.g. as a sequence of tokens, as parse trees, or as flat dependency networks of variables
  • In this work, we take advantage of two additional elements of source code: data flow and execution paths
  • The key insight is that exposing these semantics explicitly as input to a machine learning model lessens the requirements on amounts of training data, model capacity and training regime and allows us to solve tasks that are beyond the current state of the art
  • To achieve high accuracy on SmartPaste, we need to learn representations of program semantics
  • First, an approximation of the semantic role of a variable (e.g., "is it a counter?", "is it a filename?") needs to be learned
  • Second, an approximation of variable usage semantics (e.g., "a filename is needed here") is required
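
As a rough illustration of what "exposing data flow explicitly" can mean, the sketch below extracts simple def-use edges (which write a given read depends on) from straight-line code. It uses Python's standard `ast` module purely for convenience and ignores branches and loops; the function name `def_use_edges` and the example program are assumptions, and this is not the paper's program representation.

```python
import ast

def def_use_edges(source: str) -> list[tuple[str, int, int]]:
    """Return (variable, line_of_last_write, line_of_read) edges.

    Simplifying assumption: the code is straight-line (no branches or loops),
    so the most recent write textually before a read is the one that reaches it.
    """
    last_write: dict[str, int] = {}
    edges: list[tuple[str, int, int]] = []
    for stmt in ast.parse(source).body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads in a statement see the writes made by earlier statements.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_write:
                edges.append((n.id, last_write[n.id], n.lineno))
        # Only then record the writes performed by this statement.
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_write[n.id] = n.lineno
    return edges

# Hypothetical four-line program.
program = "count = 0\nstep = 2\ncount = count + step\nprint(count)"
edges = def_use_edges(program)
for edge in edges:
    print(edge)
```

Edges like these tell a model that the `count` printed on line 4 is the value written on line 3, which is information a flat token sequence does not state directly.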

  • Only variable identifiers need to be filled in
  • Several identifiers need to be filled in at the same time and thus all choices need to be made synchronously, reflecting interdependencies
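
The need for synchronous choices can be sketched with a toy joint search. Everything here is hypothetical: the scores are hand-written stand-ins for what a learned model would produce, the pairwise term is an invented rule, and the exhaustive search is only practical for a handful of placeholders.

```python
from itertools import product

def best_joint_assignment(placeholders, candidates, unary, pairwise):
    """Pick one variable per placeholder, maximizing a joint score.

    unary[(placeholder, variable)] scores a single choice in isolation;
    pairwise(var_a, var_b) scores how well two choices fit together.
    """
    best, best_score = None, float("-inf")
    for choice in product(candidates, repeat=len(placeholders)):
        score = sum(unary[(p, v)] for p, v in zip(placeholders, choice))
        score += sum(
            pairwise(choice[a], choice[b])
            for a in range(len(choice))
            for b in range(a + 1, len(choice))
        )
        if score > best_score:
            best, best_score = choice, score
    return dict(zip(placeholders, best))

# Hypothetical pasted snippet: for <P1> in range(<P2>): ...
placeholders = ["P1", "P2"]
candidates = ["i", "n"]
unary = {("P1", "i"): 0.6, ("P1", "n"): 0.4, ("P2", "i"): 0.5, ("P2", "n"): 0.45}

def pairwise(a, b):
    # Invented rule: a loop variable should differ from the loop bound.
    return -1.0 if a == b else 0.0

independent = {p: max(candidates, key=lambda v: unary[(p, v)]) for p in placeholders}
joint = best_joint_assignment(placeholders, candidates, unary, pairwise)
print(independent)
print(joint)
```

Choosing each placeholder independently picks `i` for both slots, while the joint search fills them with `i` and `n`, which is the kind of interdependency the bullet above refers to.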