Black-box reverse engineering

History / Edit / PDF / EPUB / BIB /
Created: May 6, 2016 / Updated: May 16, 2018 / Status: in progress / 3 min read (~499 words)

  • Reversing a hash function
  • Reversing a random generator function
  • Reversing the prime number generator function
  • Given that functions often expect arguments, how can we define a sensible set of test values?

  • Do a single operation, call all methods which return a value and check what they return
  • Call them multiple time to assert idempotences
  • Attempt to build a state machine out of the trials


def ANALYSIS(program):
    state = () # a list of states, initially empty
    functions = program.functions # a list of all functions in the program
    nonVoidFunctions = program.nonVoidFunctions # a list of functions of the program under analysis which return a non-void result upon being called
    callLength = 1 # the number of sequential functions to call
    while true:
        program.reset() # reinitialize any state in the program, assumes that no state persistency exists between resets
        functionsSequence = ... # a list of functions to call, different than the previously tested sequences
        output = () # the list of output generated by calling the functions in the functions sequence
        for function in functionsSequence:
            result = call(program, function)
            output.append(result)

  • Is it possible to deduce/infer pre-conditions/post-conditions?

  • Is it always positive/negative/zero?
  • Is it always incrementing/decrementing?
  • Is it always the same number/set of numbers?
  • Are the numbers within a given range?
  • Can the produced numbers be approximated using linear functions? Lagrange polynomial?
  • Are the numbers always integers or reals?

  • Are the strings always the same length?
  • What is the set of alphabet used?
  • Are the strings produced increasing/decreasing in length?
  • Are some parts of the produced strings constant (always present)?
  • What is the minimal/maximal string length that can be produced?
  • What is the subset of the alphabet used at each index of the string?
  • Is the set of output strings limited?

  • Test a variety of types (boolean, character, string, integer, float, miscellaneous structure) and see which ones crash and which ones don't
    • Once a list of variable types have been determined, use the same procedure as if it was static code

  • Test null values
  • Test type extremums (min/max values)
  • Test regular number set (min, -1, 0, 1, max)
  • Test uppercase/lowercase/random-case

A programmer starts by defining a feature's requirements. Then he writes a piece of code that fulfills these requirements. This code is then parsed, compiled and executed by the computer.

During this process, the variable names used by the programmer are stripped out and converted into placeholders.


function sum($left, $right) {
    return $left + $right;
}

function a1($a1, $a2) {
    $a3 = $a1 + $a2;
    return $a3;
}

sum(3, sum(4, 6));