Differential Testing of Cross Deep Learning Framework APIs: Revealing Inconsistencies and Vulnerabilities

Abstract

With the increasing adoption of deep learning (DL) in various applications, developers often reuse models, for example by converting them among frameworks to improve productivity. However, security bugs in model conversion may make models behave differently across DL frameworks and cause unpredictable errors. Prior studies primarily focus on the security of individual DL frameworks, but few of them address the inconsistencies and security bugs that arise during cross-framework conversion. Furthermore, the impact of these issues on DL applications remains largely unexplored. To this end, we propose TENSORSCOPE, a novel approach to test cross-framework APIs for security bugs. It takes as input a number of counterpart APIs that are supposed to be functionally equivalent, and then performs differential testing to identify inconsistencies. We design novel strategies to boost testing efficiency, including 1) joint constraint analysis to raise the quality of test cases, and 2) error-guided test case fixing to refine the input constraints. TENSORSCOPE is extensively evaluated on 1,658 APIs of six popular DL frameworks. The results show that TENSORSCOPE is more effective than FreeFuzz and DocTer, achieving 28.7% and 24.3% higher code coverage, respectively. We find 257 bugs, including 230 new bugs, and receive 8 CVEs and over $1,100 in bounties with developers' acknowledgment. Most importantly, we make the first attempt to exploit these inconsistencies, reducing the accuracy of three models by up to 3.5%.
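At its core, the approach runs counterpart APIs from different frameworks on identical inputs and checks their outputs for numerical inconsistencies. The sketch below is a minimal, hypothetical illustration of that differential-testing idea, not the TENSORSCOPE implementation; the counterpart pairs (tf.nn.relu / torch.relu, tf.math.tanh / torch.tanh), the random input generation, and the tolerance are illustrative assumptions. In particular, TENSORSCOPE improves input quality with joint constraint analysis and error-guided test case fixing, which this sketch elides.

# A minimal, hypothetical sketch of differential testing over counterpart APIs
# (not the TENSORSCOPE implementation): feed identical inputs to supposedly
# equivalent TensorFlow and PyTorch APIs and flag numerical inconsistencies.
import numpy as np
import tensorflow as tf
import torch

# Counterpart APIs that are supposed to be functionally equivalent.
COUNTERPARTS = [
    ("relu", lambda x: tf.nn.relu(tf.constant(x)).numpy(),
             lambda x: torch.relu(torch.from_numpy(x)).numpy()),
    ("tanh", lambda x: tf.math.tanh(tf.constant(x)).numpy(),
             lambda x: torch.tanh(torch.from_numpy(x)).numpy()),
]

def differential_test(trials=100, atol=1e-6):
    """Run each counterpart pair on the same random inputs and report mismatches."""
    rng = np.random.default_rng(0)
    for name, run_tf, run_torch in COUNTERPARTS:
        for _ in range(trials):
            # TENSORSCOPE derives valid inputs via joint constraint analysis and
            # error-guided fixing; here we simply use unconstrained random floats.
            x = rng.standard_normal((4, 4)).astype(np.float32)
            out_tf, out_torch = run_tf(x), run_torch(x)
            if not np.allclose(out_tf, out_torch, atol=atol):
                diff = np.abs(out_tf - out_torch).max()
                print(f"Inconsistency in {name}: max diff {diff:.3e}")

if __name__ == "__main__":
    differential_test()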

Publication
In 32nd USENIX Security Symposium
Zizhuang Deng
Research Associate

My research interests include deep learning systems and software security, fuzzing, and reverse engineering.