Analyzing language-specific built-in functions is a necessary component of multiple program analysis tasks. In symbolic execution and exploit generation, built-in functions are usually manually translated into SMT-LIB specifications for constraint solving. Such translation requires an excessive amount of human efforts and deep understandings of the function behaviors. Incorrect translation can invalidate the results of their client applications and can bring severe security problems, e.g. a true positive case is classified as a (false) negative. In this paper, we explore the feasibility of automating the process of modeling PHP built-in functions into SMT-LIB specifications. We synthesize C programs that are equally transformed from the constraints collected in PHP symbolic execution. We then apply symbolic execution over the C program to find a suitable path for the program analysis task, which turns a constraint solving problem into a path finding problem. As the source code of PHP built-in functions is available, our method smoothly tackles the challenge of manually modeling PHP built-in functions in an automated manner. We thoroughly compare our automated method with the state-of-the-art manual modeling tool. The evaluation results show that our automated method can easily model more built-in functions and achieve higher correctness. It can solve a similar number of constraints in generating exploits for web application vulnerabilities. Our empirical analysis shows that the manual method and automated method have different strengths. Our study suggests combining manual and automated modeling to achieve better accuracy.

2021 THE WEB CONFERENCE NEWSLETTER
The Web Conference is announcing latest news and developments biweekly or on a monthly basis. We respect The General Data Protection Regulation 2016/679.