Quantitative Information Flow analysis by Symbolic Execution

01 July 2014

This month, I am going to go to San Jose, California to present a paper at SPIN. The paper is our recent effort on Quantitative Information Flow analysis using Symbolic Execution.

The problem

Quantitative Information Flow (QIF) is the problem of measuring the information transferred from a variable H to a variable O in a given process.

A well-known example of the QIF problem is the transfer of information about the password to the output, i.e. accept or reject, of a PIN verification program. Suppose somebody steals your bank card, and tries to guess your PIN number, which is a 4-digit number. If her guess is correct, you loose all your information. On the other hand, if her guess is incorrect, you still loose some information. That information is the password is not the same as her try, so she will try another number next time, and her search space is narrowed. Hence, the program leaks some information.

QIF analysis measures the leaked information in information-theoretic metrics, the most widely used are Shannon entropy and Rényi’s min-entropy.

Why Symbolic Execution?

The idea is simple. Imagine that the transfer of information is flow of water from the rivers to the sea. The sources of the rivers are the inputs, the secret input is the source which is tainted. The QIF problem is to measured the amount of tainted water that goes to the sea.

Symbolic Execution is a program analysis technique, that executes program with symbolic values instead of concrete inputs, and returns a set of all possible execution paths of the (bounded) program. That means, instead of considering the flow of each molecule H2O, Symbolic Execution provides to you the set of all rivers.

Results so far

In 2012, I proposed a project for the Google Summer of Code (GSoC) program with NASA’s Java PathFinder team. We have developed a technique to compute precisely channel capacity, i.e. maximum leakage of information, in Java bytecode program. The description of the technique can be found in our paper. It is precise, but expensive.

In 2013, I applied for GSoC again, which results in the paper that I’m going to present at SPIN. The idea is to trade precision with performance. We developed an approximate technique to compute an upper bound of the channel capacity, so it is not as precise as the previous project, but the bounds computed are still reasonable. On the good side, the running time is much much faster.