Client-side cross-site scripting (DOM XSS) vulnerabilities in web applications are common, hard to identify, and difficult to prevent. Taint tracking is the most promising approach for detecting DOM XSS with high precision and recall, but it is too computationally expensive for many practical uses, e.g., in a web browser or for offline analysis at scale. We investigate whether machine learning (ML) classifiers can replace or augment taint tracking as a method for identifying DOM XSS vulnerabilities. Through a large-scale web crawl, we collect over 18 billion JavaScript functions and use taint tracking to label over 180,000 of them as potentially vulnerable. With this data, we train a 3-layer, feed-forward deep neural network (DNN) to analyze a JavaScript function and predict whether it is vulnerable. In the process, we experiment with a range of hyperparameters and show how to train a low-latency, high-recall classifier that could serve as a pre-filter to taint tracking, reducing the cost of stand-alone taint tracking by 3.43x while still detecting 94.5% of unique vulnerabilities. We argue that this combination of a DNN and taint tracking is efficient enough for a range of use cases for which taint tracking by itself is not, including in-browser run-time DOM XSS detection and analysis of large codebases.
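
The pre-filter arrangement described above can be illustrated with a minimal sketch. It is not the authors' implementation: `prefilter_then_analyze`, `toy_score`, `toy_taint_track`, and the threshold value are hypothetical names chosen for illustration, and the keyword-based scorer merely stands in for a trained DNN. The only point is the control flow, in which a cheap classifier decides which functions reach the expensive analysis.

```python
from typing import Callable, Iterable, List, Tuple


def prefilter_then_analyze(
    functions: Iterable[str],
    score: Callable[[str], float],
    taint_track: Callable[[str], bool],
    threshold: float = 0.5,
) -> Tuple[List[str], int]:
    """Run the expensive analysis only on functions the cheap scorer flags.

    Returns the confirmed functions and how many reached the expensive step.
    """
    confirmed: List[str] = []
    analyzed = 0
    for source in functions:
        # Cheap step: skip functions scored below the threshold.
        if score(source) < threshold:
            continue
        # Expensive step: reached only by functions that passed the pre-filter.
        analyzed += 1
        if taint_track(source):
            confirmed.append(source)
    return confirmed, analyzed


def toy_score(source: str) -> float:
    # Hypothetical stand-in for the classifier: flag mentions of common DOM sinks.
    sinks = ("innerHTML", "document.write", "eval(")
    return 1.0 if any(sink in source for sink in sinks) else 0.0


def toy_taint_track(source: str) -> bool:
    # Hypothetical stand-in for taint tracking: an attacker-controlled source
    # and a DOM sink both appear in the same function.
    return "location.hash" in source and "innerHTML" in source


samples = [
    "function a(){ el.innerHTML = location.hash; }",
    "function b(){ return 1 + 2; }",
    "function c(){ el.innerHTML = 'static'; }",
]
confirmed, analyzed = prefilter_then_analyze(samples, toy_score, toy_taint_track)
print(confirmed, analyzed)
```

In this sketch, lowering `threshold` sends more functions to the expensive step, which trades analysis cost for recall; that is the same trade-off a high-recall pre-filter is meant to manage.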
