Friday, May 28, 2010

Predicting vulnerability

Great paper by Thomas Zimmermann and co. at Microsoft on developing techniques/metrics to predict the number of vulnerabilities in a given binary. There hasn't been an incredible amount of productive research to date in this area: predicting the number of remaining vulnerabilities, minimizing them, or estimating the time/complexity required to exploit them. This is critically important for software developers attempting to remove/reduce vulnerabilities, for defenders deploying potentially highly vulnerable software (hello Adobe products!), and for vulnerability researchers trying to optimize where to spend their energy and predict time to discovery.

From their abstract:
Many factors are believed to increase the vulnerability of software system; for example, the more widely deployed or popular is a software system the more likely it is to be attacked. Early identification of defects has been a widely investigated topic in software engineering research. Early identification of software vulnerabilities can help mitigate these attacks to a large degree by focusing better security verification efforts in these components. Predicting vulnerabilities is complicated by the fact that vulnerabilities are, most often, few in number and introduce significant bias by creating a sparse dataset in the population. As a result, vulnerability prediction can be thought of as the proverbial “searching for a needle in a haystack.” In this paper, we present a large-scale empirical study on Windows Vista, where we empirically evaluate the efficacy of classical metrics like complexity, churn, coverage, dependency measures, and organizational structure of the company to predict vulnerabilities and assess how well these software measures correlate with vulnerabilities. We observed in our experiments that classical software measures predict vulnerabilities with a high precision but low recall values. The actual dependencies, however, predict vulnerabilities with a lower precision but substantially higher recall.
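To make the precision/recall trade-off concrete, here is a rough sketch (mine, not the paper's) of what a metric-based predictor looks like: train a simple classifier on per-binary measures like churn, complexity, coverage, and dependency counts, then score it on held-out binaries. All of the feature names, coefficients, and data below are synthetic stand-ins I made up for illustration; they are not the Vista dataset or the authors' model.

# Hypothetical sketch: flag vulnerable binaries from classical code metrics.
# Everything here (features, distributions, coefficients) is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
n_binaries = 5000

# Made-up per-binary metrics: code churn, cyclomatic complexity,
# test coverage, and number of incoming dependencies.
X = np.column_stack([
    rng.gamma(2.0, 50.0, n_binaries),    # churn (lines changed)
    rng.gamma(3.0, 10.0, n_binaries),    # complexity
    rng.uniform(0.0, 1.0, n_binaries),   # coverage (fraction)
    rng.poisson(8.0, n_binaries),        # dependency count
])

# Vulnerabilities are rare (sparse labels), loosely tied to churn,
# complexity, and low coverage in this toy model.
risk = 0.01 * X[:, 0] + 0.03 * X[:, 1] - 1.5 * X[:, 2]
p = 1.0 / (1.0 + np.exp(-(risk - 5.0)))
y = rng.binomial(1, p)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(f"precision: {precision_score(y_test, y_pred, zero_division=0):.2f}")
print(f"recall:    {recall_score(y_test, y_pred, zero_division=0):.2f}")

On a sparse label set like this, precision tells you how many of the flagged binaries actually turned out to be vulnerable, while recall tells you how many of the vulnerable binaries you managed to flag. The paper's result is that classical metrics do well on the former and poorly on the latter, while dependency data flips that trade-off.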