Scientific-style figures are commonly used on the web to present numerical information. Captions that convey accurate figure information and sound natural would significantly improve figure accessibility. In this paper, we present promising results on machine figure captioning. A recent corpus analysis of real-world captions reveals that machine figure captioning systems should start by generating accurate caption units. We formulate the caption unit generation problem as a controlled captioning problem: given a caption unit type as a control signal, a model generates an accurate and natural caption unit of that type. As a proof of concept, we propose a new deep learning model, FigJAM, that, given a caption unit control signal, uses metadata information and a joint static and dynamic dictionary to generate an accurate and natural caption unit of that type. We conduct quantitative evaluations on two datasets from the related task of figure question answering and show that our model generates more accurate caption units than competitive baseline models. Finally, a user study with 10 human experts confirms the value of machine-generated caption units in terms of their standalone accuracy and naturalness.
