Extreme-scale classification (XC) refers to the task of tagging an instance with a small subset of relevant labels from an extremely large set of all possible labels. The framework of XC has been widely employed in the web applications such as automatic labeling of web-encyclopedia and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges in XC, (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) highly imbalanced data distribution where a large fraction of labels are tail-labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate missing labels according to (Natarajan et al., 2017). This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by incorporating label-frequency based rebalancing and adaptive margins. The resulting losses can be easily incorporated into current frameworks for extreme classification. We tested two state-of-the-art algorithms, DiSMEC (Babbar and Scholkopf, 2017) as a shallow classifier based on the squared-hinge-loss and AttentionXML (You et al., 2019) as a deep learning method based on binary cross-entropy, and observed consistent (up to 15%) improvements in terms of propensity-scored metrics.

2021 THE WEB CONFERENCE NEWSLETTER
The Web Conference is announcing latest news and developments biweekly or on a monthly basis. We respect The General Data Protection Regulation 2016/679.