WWW '21: Proceedings of the Web Conference 2021

SESSION: Session: Online Markets

REST: Relational Event-driven Stock Trend Forecasting

Stock trend forecasting, which aims to predict future stock trends, is crucial for investors seeking to maximize profits from the stock market. In recent years, many event-driven methods have used events extracted from news, social media, and discussion boards to forecast stock trends. However, existing event-driven methods have two main shortcomings: 1) overlooking the influence of event information differentiated by the stock-dependent properties; 2) neglecting the effect of event information from other related stocks. In this paper, we propose a relational event-driven stock trend forecasting (REST) framework, which addresses the shortcomings of existing methods. To remedy the first shortcoming, we propose to model the stock context and learn the effect of event information on the stocks under different contexts. To address the second shortcoming, we construct a stock graph and design a new propagation layer to propagate the effect of event information from related stocks. Experimental studies on real-world data demonstrate the efficiency of our REST framework. The results of an investment simulation show that our framework can achieve a higher return on investment than the baselines.

Exploring the Scale-Free Nature of Stock Markets: Hyperbolic Graph Learning for Algorithmic Trading

Quantitative trading and investment decision making are intricate financial tasks in the ever-growing sixty-trillion-dollar global stock market. Despite advances in stock forecasting, a limitation of most existing neural methods is that they treat stocks independently of each other, ignoring the rich signals between related stocks’ movements. Motivated by financial literature showing that stock markets and inter-stock correlations exhibit scale-free network characteristics, we leverage domain knowledge on the Web to model inter-stock relations as a graph in four major global stock markets and formulate stock selection as a scale-free graph-based learning-to-rank problem. To capture the scale-free spatial and temporal dependencies in stock prices, we propose HyperStockGAT: Hyperbolic Stock Graph Attention Network, the first model on Riemannian manifolds for stock selection. Our work’s key novelty is modeling the complex, scale-free nature of inter-stock relations through temporal hyperbolic graph learning on Riemannian manifolds, which can represent the spatial correlations between stocks more accurately. Through extensive experiments on long-term real-world data spanning over six years on four of the world’s biggest markets (NASDAQ, NYSE, TSE, and China exchanges), we show that HyperStockGAT significantly outperforms state-of-the-art stock forecasting methods in terms of profitability by over 12%, and risk-adjusted Sharpe Ratio by over 4%. We analyze the contributions of HyperStockGAT’s components through a series of exploratory and ablative experiments to demonstrate its practical applicability to real-world trading. Furthermore, we propose a novel hyperbolic architecture that can be applied across various spatiotemporal problems on the Web’s commonly occurring scale-free networks.
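
The abstract above reports gains in risk-adjusted Sharpe Ratio. As a reminder of what that metric measures, the following is a minimal sketch of the textbook annualized Sharpe ratio computed from per-period returns. The function name, the default of 252 trading periods per year, and the sample returns are illustrative assumptions and are not taken from the paper's evaluation protocol.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period simple returns.

    risk_free_rate is expressed per period. periods_per_year=252 is an
    assumed convention for daily data, not a value from the paper.
    """
    excess = [r - risk_free_rate for r in returns]
    spread = stdev(excess)  # sample standard deviation; needs >= 2 returns
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio undefined")
    return mean(excess) / spread * math.sqrt(periods_per_year)


# Hypothetical daily returns of a strategy, for illustration only.
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.001]
ratio = sharpe_ratio(daily_returns)
```

A higher value means more return per unit of volatility, which is why it is reported alongside raw profitability.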

Detecting and Quantifying Wash Trading on Decentralized Cryptocurrency Exchanges

Cryptoassets such as cryptocurrencies and tokens are increasingly traded on decentralized exchanges. The advantage for users is that the funds are not in the custody of a centralized external entity. However, these exchanges are prone to manipulative behavior. In this paper, we illustrate how wash trading activity can be identified on two of the first popular limit order book-based decentralized exchanges on the Ethereum blockchain, IDEX and EtherDelta. We identify a lower bound of accounts and trading structures that meet the legal definitions of wash trading, discovering that they are responsible for a wash trading volume equivalent to 159 million U.S. dollars. While self-trades and two-account structures are predominant, complex forms also occur. We quantify these activities, finding that on both exchanges, more than 30% of all traded tokens have been subject to wash trading activity. On EtherDelta, 10% of the tokens have almost exclusively been wash traded. All data is made available for future research. Our findings underpin the need for countermeasures that are applicable in decentralized systems.
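
To make the two predominant structures named above concrete, here is a deliberately simplified sketch of how self-trades and two-account round trips could be flagged in a trade log. It is not the paper's detection algorithm: the record layout (`token`, `buyer`, `seller`, `amount`), the function names, and the "net position returns to zero" heuristic are all assumptions made for illustration.

```python
from collections import defaultdict


def find_self_trades(trades):
    """Trades where the same account is both buyer and seller."""
    return [t for t in trades if t["buyer"] == t["seller"]]


def find_two_account_candidates(trades, tolerance=0):
    """Account pairs that traded a token in both directions and ended flat.

    Returns sorted (token, account_a, account_b) keys whose net transferred
    amount is within `tolerance` of zero. This is an assumed heuristic only.
    """
    net = defaultdict(int)      # net amount moved from account_a to account_b
    sellers = defaultdict(set)  # which side(s) of the pair acted as seller
    for t in trades:
        buyer, seller = t["buyer"], t["seller"]
        if buyer == seller:
            continue  # self-trades are handled separately
        a, b = sorted((buyer, seller))
        key = (t["token"], a, b)
        net[key] += t["amount"] if seller == a else -t["amount"]
        sellers[key].add(seller)
    return sorted(
        key for key in net
        if len(sellers[key]) == 2 and abs(net[key]) <= tolerance
    )


# Hypothetical trade log, for illustration only.
trades = [
    {"token": "TKN", "buyer": "0xA", "seller": "0xA", "amount": 100},
    {"token": "TKN", "buyer": "0xB", "seller": "0xC", "amount": 50},
    {"token": "TKN", "buyer": "0xC", "seller": "0xB", "amount": 50},
    {"token": "TKN", "buyer": "0xD", "seller": "0xE", "amount": 10},
]
self_trades = find_self_trades(trades)
pair_candidates = find_two_account_candidates(trades)
```

A real analysis would also need time windows, price checks, and the more complex multi-account structures the abstract mentions.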

Towards Understanding and Demystifying Bitcoin Mixing Services

One reason for the popularity of Bitcoin is its anonymity. Although several heuristics have been used to break this anonymity, new approaches are being proposed at the same time to enhance it. One of them is the mixing service. Unfortunately, mixing services have been abused to facilitate criminal activities, e.g., money laundering. As such, there is an urgent need to systematically understand Bitcoin mixing services.

In this paper, we take the first step to understand state-of-the-art Bitcoin mixing services. Specifically, we propose a generic abstraction model for mixing services and observe that there are two mixing mechanisms in the wild, i.e. swapping and obfuscating. Based on this model, we conduct a transaction-based analysis and successfully reveal the mixing mechanisms of four representative services. Besides, we propose a method to identify mixing transactions that leverage the obfuscating mechanism. The proposed approach is able to identify over 92% of the mixing transactions. Based on identified transactions, we then estimate the profit of mixing services and provide a case study of tracing the money flow of stolen Bitcoins.

Towards Understanding Cryptocurrency Derivatives: A Case Study of BitMEX

Since 2018, the cryptocurrency trading landscape has evolved from a collection of spot markets (fiat for cryptocurrency) to a hybrid ecosystem featuring complex and popular derivatives products. In this paper we explore this new paradigm through a study of BitMEX, one of the first and most successful derivatives platforms for leveraged cryptocurrency trading. BitMEX trades on average over 3 billion dollars worth of volume per day, and allows users to go long or short Bitcoin with up to 100x leverage. We analyze the evolution of BitMEX products—both settled and perpetual offerings that have become the standard across other cryptocurrency derivatives platforms. We additionally utilize on-chain forensics, public liquidation events, and a site-wide chat room to describe the diverse ensemble of amateur and professional traders that forms this community. These traders range from wealthy agents running automated strategies, to individuals trading small, risky positions and focusing on very short time-frames. Finally, we discuss how derivative trading has impacted cryptocurrency asset prices, notably how it has led to dramatic price movements in the underlying spot markets.

SESSION: Session: Security

On the Feasibility of Automated Built-in Function Modeling for PHP Symbolic Execution

Symbolic execution has been widely applied in detecting vulnerabilities in web applications. Modeling language-specific built-in functions is essential for symbolic execution. Since built-in functions tend to be complicated and are typically implemented in low-level languages, a common strategy is to manually translate them into the SMT-LIB language for constraint solving. Such translation requires an excessive amount of human effort and a deep understanding of the functions' behavior. Incorrect translation can invalidate the final results. This problem is aggravated in PHP applications because of their cross-language nature, i.e., the built-in functions are written in C, but the rest of the code is in PHP.

In this paper, we explore the feasibility of automating the process of modeling PHP built-in functions for symbolic execution. We synthesize C programs by transforming the constraint solving task in PHP symbolic execution into a C-compliant format and integrating them with C implementations of the built-in functions. We apply symbolic execution on the synthesized C program to find a feasible path, which gives a solution that can be applied to the original PHP constraints. In this way, we automate the modeling of built-in functions in PHP applications.

We thoroughly compare our automated method with the state-of-the-art manual modeling tool. The evaluation results demonstrate that our automated method is more accurate with a higher function coverage, and can exploit a similar number of vulnerabilities. Our empirical analysis also shows that the manual and automated methods have different strengths, which complement each other in certain scenarios. Therefore, the best practice is to combine both of them to optimize the accuracy, correctness, and coverage of symbolic execution.

TLS 1.3 in Practice: How TLS 1.3 Contributes to the Internet

Transport Layer Security (TLS) has become the norm for secure communication over the Internet. In August 2018, TLS 1.3, the latest version of TLS, was approved, providing improved security and performance over the previous TLS version. In this paper, we take a closer look at TLS 1.3 deployments in practice regarding adoption rate, security, performance, and implementation by applying temporal, spatial, and platform-based approaches on 687M connections.

Overall, TLS 1.3 has rapidly been adopted, mainly due to third-party platforms such as Content Delivery Networks (CDNs), and makes a significant contribution to the Internet. In fact, it deprecates vulnerable cryptographic primitives and substantially reduces the time required to perform the TLS 1.3 full handshake compared to the TLS 1.2 handshake. We quantify these aspects and show TLS 1.3 is beneficial to websites that do not rely on the third-party platforms. We also review Common Vulnerabilities and Exposures (CVEs) regarding TLS libraries and show that many recent vulnerabilities can be easily addressed by upgrading to TLS 1.3. However, some websites exhibit unstable support for TLS 1.3 due to multiple platforms with different TLS versions or migration to other platforms, which means that a website can show a lower TLS version at a certain time or from a certain region. Furthermore, we find that most of the implementations (including TLS libraries) do not fully support the new features of TLS 1.3 such as downgrade protection and certificate extensions.
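
Since the abstract notes that a website may present a lower TLS version at certain times or from certain regions, a client-side probe is one simple way to observe this. The sketch below uses Python's standard `ssl` module and is not the paper's measurement tooling; the function names are illustrative assumptions, and `negotiated_version` needs network access, so it is defined here but not called.

```python
import socket
import ssl


def tls13_only_context():
    """Client context that refuses to negotiate anything below TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_version(host, port=443, timeout=5.0):
    """Return the protocol version string the server negotiates, e.g. 'TLSv1.3'.

    Uses the default context, so the server is free to pick an older version
    that the local OpenSSL build still allows.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


strict_ctx = tls13_only_context()
```

Repeating `negotiated_version` for the same host over time, or from several vantage points, would surface the unstable support described above.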

Towards Realistic and Reproducible Web Crawl Measurements

Accurate web measurement is critical for understanding and improving security and privacy online. Such measurements implicitly assume that automated crawls generalize to typical web user experience. But anecdotal evidence suggests the web behaves differently when seen via well-known measurement endpoints or measurement automation frameworks, for various reasons. Our work improves the state of web privacy and security by investigating how key measurements differ when using naive crawling tool defaults vs. careful attempts to match “real” users across the Tranco top 25k web domains. We find web privacy and security measurements significantly affected by vantage point and browser configuration. We conclude that unless researchers ensure their web measurement tools match real world user experience, the research community is likely missing important signals systematically. For example, we find browser configuration alone causing shifts in 19% of known ad and tracking domains encountered and altering the loading frequency of up to 10% of distinct JavaScript code units executed. We find network vantage point having similar, though less dramatic, effects on the same web metrics. To ensure reproducibility, we carefully document our methodology and publish both our code and collected data.

#Twiti: Social Listening for Threat Intelligence

Twitter is a popular public source for threat hunting. Many security vendors and security professionals use Twitter in practice for collecting Indicators of Compromise (IOCs). However, little is known about IOCs on Twitter. Their important characteristics such as earliness, uniqueness, and accuracy have never been investigated. Moreover, how to extract IOCs from Twitter with high accuracy is not obvious. In this paper, we present Twiti, a system that automatically extracts various forms of malware IOCs from Twitter. Based on the collected IOCs, we conduct the first empirical assessment and thorough analysis of malware IOCs on Twitter. Twiti extracts IOCs from tweets identified as having malware IOC information by leveraging natural language processing and machine learning techniques. With extensive evaluation, we demonstrate that not only can Twiti extract malware IOCs accurately, but the extracted IOCs are also unique and early. By analyzing IOCs in Twiti from various aspects, we find that Twitter captures ongoing malware threats such as Emotet variants and malware distribution sites better than other public threat intelligence (TI) feeds. We also find that only a tiny fraction of IOCs on Twitter come from commercial vendor accounts and that individual Twitter users are the main contributors of the early detected or exclusive IOCs, which indicates that Twitter can provide many valuable IOCs uncovered in the commercial domain.

An Investigation of Identity-Account Inconsistency in Single Sign-On

Single Sign-On (SSO) has been widely adopted for online authentication due to its favorable usability and security. However, it also introduces a single point of failure since all service providers fully trust the identity of a user created by the SSO identity provider. In this paper, we investigate the identity-account inconsistency threat, a new SSO vulnerability that can cause the compromise of online accounts. The vulnerability exists because current SSO systems rely heavily on a user’s email address to bind an account with a real identity, but ignore the fact that email addresses might be reused by other users. We reveal that under SSO authentication, such inconsistency allows an adversary controlling a reused email address to take over associated online accounts without knowing any credentials like passwords. Specifically, we first conduct a measurement study on the account management policies of multiple cloud email providers, showing the feasibility of acquiring previously used email accounts. We further perform a systematic study on 100 popular websites using the Google business email service with our own domain address and demonstrate that most online accounts can be compromised by exploiting this inconsistency vulnerability. To shed light on email reuse in the wild, we analyze the commonly used naming conventions that lead to a wide existence of potential email address collisions, and conduct a case study on the account policies of U.S. universities. Finally, we propose several useful practices for end-users, service providers, and identity providers to protect against this identity-account inconsistency threat.

SESSION: Session: Learn to Rank

An Alternative Cross Entropy Loss for Learning-to-Rank

Listwise learning-to-rank methods form a powerful class of ranking algorithms that are widely adopted in applications such as information retrieval. These algorithms learn to rank a set of items by optimizing a loss that is a function of the entire set—as a surrogate to a typically non-differentiable ranking metric. Despite their empirical success, existing listwise methods are based on heuristics and remain theoretically ill-understood. In particular, none of the empirically successful loss functions are related to ranking metrics. In this work, we propose a cross entropy-based learning-to-rank loss function that is theoretically sound, is a convex bound on NDCG—a popular ranking metric—and is consistent with NDCG under learning scenarios common in information retrieval. Furthermore, empirical evaluation of an implementation of the proposed method with gradient boosting machines on benchmark learning-to-rank datasets demonstrates the superiority of our proposed formulation over existing algorithms in quality and robustness.
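
For readers unfamiliar with the metric that the proposed loss bounds, the following is a minimal sketch of NDCG in its common exponential-gain form, DCG = sum over ranks i of (2^rel_i - 1) / log2(i + 1), normalized by the DCG of the ideal ordering. It illustrates the metric only, not the paper's cross entropy loss; the function names and sample labels are illustrative assumptions.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of graded labels listed in ranked order."""
    ranked = relevances if k is None else relevances[:k]
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(ranked, start=1)
    )


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Labels of the documents in the order a hypothetical ranker returned them.
score = ndcg([2, 3, 0, 1])
```

Because the metric depends on the sort order of the scores, it is piecewise constant in them, which is why listwise methods optimize a differentiable surrogate instead.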

Diversification-Aware Learning to Rank using Distributed Representation

Existing work on search result diversification typically falls into the “next document” paradigm, that is, selecting the next document based on the ones already chosen. A sequential process of selecting documents one-by-one is naturally modeled in learning-based approaches. However, such a process makes the learning difficult because there are an exponential number of ranking lists to consider. Sampling is usually used to reduce the computational complexity but this makes the learning less effective. In this paper, we propose a soft version of the “next document” paradigm in which we associate each document with an approximate rank, and thus the subtopics covered prior to a document can also be estimated. We show that we can derive differentiable diversification-aware losses, which are smooth approximations of diversity metrics like α-NDCG, based on these estimates. We further propose to optimize the losses in the learning-to-rank setting using neural distributed representations of queries and documents. Experiments are conducted on the public benchmark TREC datasets. By comparing with an extensive list of baseline methods, we show that our Diversification-Aware LEarning-TO-Rank (DALETOR) approaches outperform them by a large margin, while being much simpler during learning and inference.

Maximizing Marginal Fairness for Dynamic Learning to Rank

Rankings, especially those in search and recommendation systems, often determine how people access information and how information is exposed to people. Therefore, how to balance the relevance and fairness of information exposure is considered one of the key problems for modern IR systems. As conventional ranking frameworks that myopically sort documents by their relevance will inevitably introduce unfair result exposure, recent studies on ranking fairness mostly focus on dynamic ranking paradigms where result rankings can be adapted in real-time to support fairness in groups (e.g., races, genders, etc.). Existing studies on fairness in dynamic learning to rank, however, often achieve the overall fairness of document exposure in ranked lists by significantly sacrificing the performance of result relevance and fairness on the top results. To address this problem, we propose a fair and unbiased ranking method named Maximal Marginal Fairness (MMF). The algorithm integrates unbiased estimators for both relevance and merit-based fairness while providing an explicit controller that balances the selection of documents to maximize the marginal relevance and fairness in top-k results. Theoretical and empirical analysis shows that, with small compromises on long list fairness, our method achieves superior efficiency and effectiveness compared to the state-of-the-art algorithms in both relevance and fairness for top-k rankings.

PairRank: Online Pairwise Learning to Rank by Divide-and-Conquer

Online Learning to Rank (OL2R) eliminates the need for explicit relevance annotation by directly optimizing the rankers from their interactions with users. However, the required exploration drives it away from successful practices in offline learning to rank, which limits OL2R’s empirical performance and practical applicability. In this work, we propose to estimate a pairwise learning to rank model online. In each round, candidate documents are partitioned and ranked according to the model’s confidence in the estimated pairwise rank order, and exploration is only performed on the uncertain pairs of documents, i.e., divide-and-conquer. A regret bound defined directly on the number of mis-ordered pairs is proven, which connects the online solution’s theoretical convergence with its expected ranking performance. Comparisons against an extensive list of OL2R baselines on two public learning to rank benchmark datasets demonstrate the effectiveness of the proposed solution.
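
The regret in the abstract above is defined on the number of mis-ordered pairs. As a small illustration of that quantity only (not of PairRank itself), the sketch below counts the document pairs a ranking places in the wrong relative order with respect to known relevance labels. The function name and the sample data are illustrative assumptions.

```python
from itertools import combinations


def count_misordered_pairs(ranking, relevance):
    """Number of pairs where a less relevant document is ranked above a more
    relevant one.

    ranking: document ids from top to bottom.
    relevance: mapping from document id to a graded relevance label.
    Ties in relevance are not counted as mis-ordered.
    """
    # combinations() preserves list order, so `upper` is always ranked
    # above `lower`.
    return sum(
        1
        for upper, lower in combinations(ranking, 2)
        if relevance[upper] < relevance[lower]
    )


# Hypothetical ranking and labels, for illustration only.
labels = {"d1": 0, "d2": 2, "d3": 1}
errors = count_misordered_pairs(["d1", "d2", "d3"], labels)  # d1 sits above both d2 and d3
```

A perfectly sorted list yields zero, so driving this count down round after round corresponds directly to better rankings.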

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank

Existing work in counterfactual Learning to Rank (LTR) has focussed on optimizing feature-based models that predict the optimal ranking based on document features. LTR methods based on bandit algorithms often optimize tabular models that memorize the optimal ranking per query. These types of model have their own advantages and disadvantages. Feature-based models provide very robust performance across many queries, including those previously unseen; however, the available features often limit the rankings the model can predict. In contrast, tabular models can converge on any possible ranking through memorization. However, memorization is extremely prone to noise, which makes tabular models reliable only when large numbers of user interactions are available. Can we develop a robust counterfactual LTR method that pursues memorization-based optimization whenever it is safe to do so?

We introduce the Generalization and Specialization (GENSPEC) algorithm, a robust feature-based counterfactual LTR method that pursues per-query memorization when it is safe to do so. GENSPEC optimizes a single feature-based model for generalization (robust performance across all queries) and many tabular models for specialization (each optimized for high performance on a single query). GENSPEC uses novel relative high-confidence bounds to choose which model to deploy per query. By doing so, GENSPEC enjoys the high performance of successfully specialized tabular models with the robustness of a generalized feature-based model. Our results show that GENSPEC leads to optimal performance on queries with sufficient click data, while having robust behavior on queries with little or noisy data.

SESSION: Session: Digital Health

Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics

Modern healthcare systems knitted together by a web of entities (e.g., hospitals, clinics, pharmacy companies) are collecting a huge volume of healthcare data from a large number of individuals with various medical procedures, medications, diagnoses, and lab tests. To extract meaningful medical concepts (i.e., phenotypes) from such higher-arity relational healthcare data, tensor factorization has been proven to be an effective approach and has received increasing research attention, due to its intrinsic capability to represent high-dimensional data. Recently, federated learning has offered a privacy-preserving paradigm for collaborative learning among different entities, which seemingly provides an ideal opportunity to further enhance tensor factorization-based collaborative phenotyping to handle sensitive personal health data. However, existing attempts at federated tensor factorization come with various limitations, including restrictions to the classic tensor factorization, high communication cost, and reduced accuracy. We propose a communication efficient federated generalized tensor factorization, which is flexible enough to choose from a variety of losses to best suit different types of data in practice. We design a three-level communication reduction strategy tailored to the generalized tensor factorization, which is able to reduce the uplink communication cost by up to 99.90%. In addition, we theoretically prove that our algorithm does not compromise convergence speed despite the aggressive communication compression. Extensive experiments on two real-world electronic health record datasets demonstrate the efficiency improvements in terms of computation and communication cost.

Completing Missing Prevalence Rates for Multiple Chronic Diseases by Jointly Leveraging Both Intra- and Inter-Disease Population Health Data Correlations

Population health data are more publicly available on the Internet than ever before. Such datasets offer great potential for enabling a better understanding of the health of populations, and can inform health professionals and policy makers for better resource planning, disease management and prevention across different regions. However, due to the laborious and high-cost nature of collecting such public health data, it is commonplace to find many missing entries in these datasets, which challenges the utility of the data and hinders reliable analysis and understanding. To tackle this problem, this paper proposes a deep-learning-based approach, called Compressive Population Health (CPH), to infer and recover (to complete) the missing prevalence rate entries of multiple chronic diseases. The key insight of CPH relies on the combined exploitation of both intra-disease and inter-disease correlation opportunities. Specifically, we first propose a Convolutional Neural Network (CNN) based approach to extract and model both of these two types of correlations, and then adopt a Generative Adversarial Network (GAN) based prevalence inference model to jointly fuse them to facilitate the recovery of missing prevalence rate entries. We extensively evaluate the inference model based on real-world public health datasets publicly available on the Web. Results show that our inference method outperforms other baseline methods in various settings and with a significantly improved accuracy (from 14.8% to 9.1%).

Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach

Online peer-to-peer support platforms enable conversations between millions of people who seek and provide mental health support. If successful, web-based mental health conversations could improve access to treatment and reduce the global disease burden. Psychologists have repeatedly demonstrated that empathy, the ability to understand and feel the emotions and experiences of others, is a key component leading to positive outcomes in supportive conversations. However, recent studies have shown that highly empathic conversations are rare in online mental health platforms.

In this paper, we work towards improving empathy in online mental health support conversations. We introduce a new task of empathic rewriting which aims to transform low-empathy conversational posts to higher empathy. Learning such transformations is challenging and requires a deep understanding of empathy while maintaining conversation quality through text fluency and specificity to the conversational context. Here we propose Partner, a deep reinforcement learning (RL) agent that learns to make sentence-level edits to posts in order to increase the expressed level of empathy while maintaining conversation quality. Our RL agent leverages a policy network, based on a transformer language model adapted from GPT-2, which performs the dual task of generating candidate empathic sentences and adding those sentences at appropriate positions. During training, we reward transformations that increase empathy in posts while maintaining text fluency, context specificity, and diversity. Through a combination of automatic and human evaluation, we demonstrate that Partner successfully generates more empathic, specific, and diverse responses and outperforms NLP methods from related tasks such as style transfer and empathic dialogue generation. This work has direct implications for facilitating empathic conversations on web-based platforms.

Search Engines vs. Symptom Checkers: A Comparison of their Effectiveness for Online Health Advice

Increasingly, people go online to seek health advice. They commonly use the symptoms they are experiencing to identify the health conditions they may have (self-diagnosis task) as well as to determine an appropriate action to take (triaging task); e.g., should they seek emergent medical attention or attempt to treat themselves at home? This paper investigates the effectiveness of two of the most common methods people use for self-diagnosis and triaging: online symptom checkers and traditional web search engines. To this end, we conducted a user study with 64 real-world users performing 8 simulated self-diagnosis tasks. Participants were exposed to both a representative symptom checker and a search engine. The results of our study provide empirical evidence for whether using a search engine for health information improves people’s understanding of their health condition and their ability to act on it, compared to interacting with a symptom checker, which bases its interaction model on a question-answering process. Additionally, recorded answers to qualitative questionnaires from study participants provide insights into which style of interaction and system they prefer to use for obtaining medical information, and how helpful they thought each system was. These findings can help inform the development of better search engines and symptom checkers that support people seeking health advice online.

UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data

Successful health risk prediction demands accuracy and reliability of the model. Existing predictive models mainly depend on mining electronic health records (EHR) with advanced deep learning techniques to improve model accuracy. However, they all ignore the importance of publicly available online health data, especially socioeconomic status, environmental factors, and detailed demographic information for each location, which are all strong predictive signals and can definitely augment precision medicine. To achieve model reliability, the model needs to provide both an accurate prediction and an uncertainty score for the prediction. However, existing uncertainty estimation approaches often fail in handling high-dimensional data, which are present in multi-sourced data.

To fill the gap, we propose the UNcertaInTy-based hEalth risk prediction (UNITE) model. Building upon an adaptive multimodal deep kernel and a stochastic variational inference module, UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data including EHR data, patient demographics, and public health data collected from the web. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer’s disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%. We also show UNITE can model meaningful uncertainties and can provide evidence-based clinical support by clustering similar patients.

SESSION: Session: Search

MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks

We study the problem of deep recall modeling in industrial web search: given a user query, retrieve hundreds of the most relevant documents from billions of candidates. The common framework is to encode queries and documents separately into distributed representations and match them in a latent semantic space. However, all the existing deep encoding models only leverage the information of the document itself, which is often not sufficient in practice when matching with query terms, especially for hard tail queries. In this work we aim to leverage additional information for documents from their co-click neighbours to help document retrieval. The challenges include how to effectively extract information and eliminate noise when involving co-click information while meeting the demands of industrial scalability for real-time online serving.

To handle the noise in co-click relations, we first propose a web-scale Multi-Intention Co-click document Graph (MICG), which builds the co-click connections between documents at the click-intention level rather than the document level. We then present MIRA, an encoding framework based on BERT and graph attention networks, which leverages a two-factor attention mechanism to aggregate neighbours. To meet the online latency requirements, we only involve neighbour information on the document side, which avoids the time-consuming query neighbour search in real-time serving. We conduct extensive offline experiments on two public datasets and one private web-scale dataset from major commercial search engines (Bing and Sougou), demonstrating the effectiveness and scalability of the proposed method compared with several baselines. A further case study reveals that co-click relations mainly help improve web search quality in two respects: key concept enhancing and query term complementary.

Long Short-Term Session Search: Joint Personalized Reranking and Next Query Prediction

Document reranking (DR) and next query prediction (NQP) are two core tasks in session search. They are often driven by the same search intent and, hence, it is natural to jointly optimize both tasks. So far, most models proposed for jointly optimizing DR and NQP have focused on users’ short-term intent in an ongoing search session. Because of this limitation, these models fail to account for users’ long-term intent as captured in their historical search sessions. In contrast, we consider a personalized mechanism for learning a user’s profile from their long-term and short-term behavior to simultaneously enhance the performance of DR and NQP in an ongoing search session.

We propose a personalized session search model, called Long short-term session search Network (LostNet), that jointly learns to rerank documents for the current query and predict the next query. LostNet consists of three modules: The hierarchical session-based attention mechanism tracks the fine-grained short-term intent in an ongoing session. The personalized multi-hop memory network tracks a user’s dynamic profile information from their prior search sessions so as to infer their personal search intent. Joint learning of DR and NQP is aimed at simultaneously reranking documents and predicting the next query based on outputs from the above two modules. We conduct experiments on two large-scale session search benchmark datasets. The results show that LostNet achieves significant improvements over state-of-the-art baselines.

On the Value of Wikipedia as a Gateway to the Web

By linking to external websites, Wikipedia can act as a gateway to the Web. To date, however, little is known about the amount of traffic generated by Wikipedia’s external links. We fill this gap in a detailed analysis of usage logs gathered from Wikipedia users’ client devices. Our analysis proceeds in three steps: First, we quantify the level of engagement with external links, finding that, in one month, English Wikipedia generated 43M clicks to external websites, in roughly even parts via links in infoboxes, cited references, and article bodies. Official links listed in infoboxes have by far the highest click-through rate (CTR), 2.47% on average. In particular, official links associated with articles about businesses, educational institutions, and websites have the highest CTR, whereas official links associated with articles about geographical content, television, and music have the lowest CTR. Second, we investigate patterns of engagement with external links, finding that Wikipedia frequently serves as a stepping stone between search engines and third-party websites, effectively fulfilling information needs that search engines do not meet. Third, we quantify the hypothetical economic value of the clicks received by external websites from English Wikipedia, by estimating that the respective website owners would need to pay a total of $7–13 million per month to obtain the same volume of traffic via sponsored search. Overall, these findings shed light on Wikipedia’s role not only as an important source of information, but also as a high-traffic gateway to the broader Web ecosystem.

Projected Hamming Dissimilarity for Bit-Level Importance Coding in Collaborative Filtering

When reasoning about tasks that involve large amounts of data, a common approach is to represent data items as objects in the Hamming space where operations can be done efficiently and effectively. Object similarity can then be computed by learning binary representations (hash codes) of the objects and computing their Hamming distance. While this is highly efficient, each bit dimension is equally weighted, which means that potentially discriminative information of the data is lost. A more expressive alternative is to use real-valued vector representations and compute their inner product; this allows varying the weight of each dimension but is orders of magnitude slower. To fix this, we derive a new way of measuring the dissimilarity between two objects in the Hamming space with binary weighting of each dimension (i.e., disabling bits): we consider a field-agnostic dissimilarity that projects the vector of one object onto the vector of the other. When working in the Hamming space, this results in a novel projected Hamming dissimilarity, which, by choice of projection, effectively allows a binary importance weighting of the hash code of one object through the hash code of the other. We propose a variational hashing model for learning hash codes optimized for this projected Hamming dissimilarity, and experimentally evaluate it in collaborative filtering experiments. The resultant hash codes lead to effectiveness gains of up to +7% in NDCG and +14% in MRR compared to state-of-the-art hashing-based collaborative filtering baselines, while requiring no additional storage and no computational overhead compared to using the Hamming distance.
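
As a rough illustration of the idea of "disabling bits", the sketch below contrasts the plain Hamming distance with one bitwise form of a projected dissimilarity. The exact formula is an assumption made for illustration (the abstract does not state it): mismatches are counted only on bits that are set in the first, projecting code, so its zero bits act as disabled dimensions. The function names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two hash codes differ."""
    return bin(a ^ b).count("1")


def projected_hamming_dissimilarity(a: int, b: int) -> int:
    """Count mismatches only on bits enabled (set to 1) in code `a`.

    Assumed form: popcount(a AND (a XOR b)). Bits that are 0 in `a`
    are treated as disabled and never contribute to the dissimilarity.
    """
    return bin(a & (a ^ b)).count("1")


user_code = 0b1100   # hypothetical 4-bit user hash code
item_code = 0b1010   # hypothetical 4-bit item hash code

plain = hamming_distance(user_code, item_code)
projected = projected_hamming_dissimilarity(user_code, item_code)
```

Under this assumed form the projected variant costs one extra AND on top of the XOR and popcount used for the Hamming distance, and it is asymmetric: swapping the two codes changes which bits count as enabled.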

Constructing a Comparison-based Click Model for Web Search

Extracting valuable feedback information from user behavior logs is one of the major concerns in Web search studies. Among the tremendous efforts that aim to improve search performance with user behavior modeling, constructing click models is of vital importance because it provides a direct estimation of result relevance. Most existing click models assume that whether or not users click on results only depends on the examination probability and the content of the result. However, through a carefully designed user eye-tracking study, we found that users do not make click-through decisions in isolation. Instead, they also consider the context of a result (e.g., adjacent results). This finding leads to the design of a novel click model named Comparison-based Click Model (CBCM). Different from traditional examination hypotheses, CBCM introduces the concept of an examination viewport and assumes users click results after comparing adjacent results within the same viewport. The experimental results on a publicly available user behavior dataset demonstrate the effectiveness of CBCM. We also release the code of CBCM and the dataset.

SESSION: Session: Systems and Infrastructure

WiseTrans: Adaptive Transport Protocol Selection for Mobile Web Service

To improve the performance of mobile web service, a new transport protocol, QUIC, has been recently proposed. However, for large-scale real-world deployments, deciding whether and when to use QUIC in mobile web service is challenging. Complex temporal correlation of network conditions, high spatial heterogeneity of users in a nationwide deployment, and limited resources on mobile devices all affect the selection of transport protocols. In this paper, we present WiseTrans to adaptively switch transport protocols for mobile web service online and improve the completion time of web requests.

WiseTrans introduces machine learning techniques to deal with temporal heterogeneity, makes decisions with historical information to handle spatial heterogeneity, and switches transport protocols at the request level to reach both high performance and acceptable overhead. We implement WiseTrans on two platforms (Android and iOS) in a popular mobile web service application of Baidu. Comprehensive experiments demonstrate that WiseTrans can reduce request completion time by up to 26.5% on average compared to the usage of a single protocol.

Surrounded by the Clouds: A Comprehensive Cloud Reachability Study

In the early days of cloud computing, datacenters were sparsely deployed at distant locations far from end-users, with high end-to-end communication latency. However, today’s cloud datacenters have become more geographically spread and the bandwidth of the networks keeps increasing, pushing end-user latency down. In this paper, we provide a comprehensive cloud reachability study as we perform extensive global client-to-cloud latency measurements towards 189 datacenters from all major cloud providers. We leverage the well-known measurement platform RIPE Atlas, involving up to 8500 probes deployed in heterogeneous environments, e.g., homes and offices. Our goal is to evaluate the suitability of modern cloud environments for various current and predicted applications. We achieve this by comparing our latency measurements against known human perception thresholds and are able to draw inferences on the suitability of current clouds for novel applications, such as augmented reality. Our results indicate that the current cloud coverage can easily support several latency-critical applications, like cloud gaming, for the majority of the world’s population.

BrowseLite: A Private Data Saving Solution for the Web

The median webpage has increased in size by more than 80% in the last 4 years. This extra complexity allows for a rich browsing experience, but it hurts the majority of mobile users, who still pay for their traffic. This has motivated several data-saving solutions, which aim at reducing the complexity of webpages by transforming their content. Despite each method being unique, they either reduce user privacy by further centralizing web traffic through data-saving middleboxes or introduce web compatibility (Web-compat) issues by removing content that breaks pages in unpredictable ways.

In this paper, we argue that data-saving is still possible without impacting either users’ privacy or Web-compat. Our main observation is that Web images make up a large portion of Web traffic and have negligible impact on Web-compat. To this end we make two main contributions. First, we quantify the potential savings that image manipulation, such as dimension resizing, quality compression, and transcoding, enables at large scale: 300 landing and 880 internal pages. Next, we design and build BrowseLite, an entirely client-side tool that achieves such data savings through opportunistically instrumenting existing server-side tooling to perform image compression, while simultaneously reducing the total amount of image data fetched. The effect of BrowseLite on the user experience is quantified using standard page load metrics and a real user study of over 200 users across 50 optimized web pages. BrowseLite allows for similar savings to middlebox approaches, while offering additional security, privacy, and Web-compat guarantees.

Superways: A Datacenter Topology for Incast-heavy workloads

Several important datacenter applications cause incast congestion, which severely degrades flow completion times of short flows and throughput of long flows. Further, because most flows are short and the incast duration is shorter than typical round-trip times, reactive mechanisms that rely on congestion control are not effective. While modern datacenter topologies provide high bisection bandwidth to support all-to-all traffic, incast is fundamentally a many-to-one traffic pattern, and therefore, requires deep buffers or high bandwidth at the network edge.

We propose Superways, a heterogeneous datacenter topology that provides higher bandwidth for some servers to absorb incasts, as incasts occur only at a small number of servers that aggregate responses from other senders. Our design is based on the key observation that a small subset of servers which aggregate responses are likely to be network bound, whereas most other servers that communicate only with random servers are not. Superways can be implemented over many of the existing datacenter topologies and can be expanded flexibly without incurring high cost and cabling complexity. We also provide a heuristic for scheduling jobs in our topology to fully utilize the extra capacity. Using a real CloudLab implementation and using ns-3 simulations, we show that Superways significantly improves flow completion times and throughput over existing datacenter topologies. We also analyze cost and cabling complexity, and discuss how to expand our topology.

BBR Bufferbloat in DASH Video

BBR is a new congestion control algorithm and is seeing increased adoption, especially for video traffic. BBR solves the bufferbloat problem in legacy loss-based congestion control algorithms where application performance drops considerably when router buffers are deep. BBR regulates traffic such that router queues don’t build up to avoid the bufferbloat problem while still maintaining high throughput. However, our analysis shows that video applications experience significantly poorer performance when using BBR under deep buffers. In fact, we find that video traffic sees inflated latencies because of long queues at the router, ultimately degrading video performance. To understand this dichotomy, we study the interaction between BBR and DASH video. Our investigation reveals that BBR under deep buffers and high network burstiness severely overestimates available bandwidth and does not converge to steady state, both of which result in BBR sending substantially more data into the network, causing a queue buildup. This elevated packet sending rate under BBR is ultimately caused by the router’s ability to absorb bursts in traffic, which destabilizes BBR’s bandwidth estimation and overrides BBR’s expected logic for exiting the startup phase. We design a new bandwidth estimation algorithm and apply it to BBR (and a still-unreleased, newer version of BBR called BBR2). Our modified BBR and BBR2 both see significantly improved video QoE even under deep buffers.

SESSION: Session: Graph Models

Graph Structure Estimation Neural Networks

Graph Neural Networks (GNNs) have drawn considerable attention in recent years and achieved outstanding performance in many tasks. Most empirical studies of GNNs assume that the observed graph represents a complete and accurate picture of node relationships. However, this fundamental assumption cannot always be satisfied, since the real-world graphs from complex systems are error-prone and may not be compatible with the properties of GNNs. Therefore, GNNs relying solely on the original graph may produce unsatisfactory results; a typical example is that GNNs perform well on graphs with homophily but fail in disassortative settings. In this paper, we propose graph estimation neural networks (GEN), which estimate graph structure for GNNs. Specifically, our GEN presents a structure model to fit the mechanism of GNNs by generating graphs with community structure, and an observation model that injects multifaceted observations into calculating the posterior distribution of graphs and is the first to incorporate multi-order neighborhood information. With the above two models, the estimation of the graph is implemented based on Bayesian inference to maximize the posterior probability, which attains mutual optimization with GNN parameters in an iterative framework. To comprehensively evaluate the performance of GEN, we perform a set of experiments on several benchmark datasets with different homophily and a synthetic dataset, where the experimental results demonstrate the effectiveness of our GEN and the rationality of the estimated graph.

Efficient Probabilistic Truss Indexing on Uncertain Graphs

Networks in many real-world applications come with an inherent uncertainty in their structure, due to e.g., noisy measurements, inference and prediction models, or for privacy purposes. Modeling and analyzing uncertain graphs has attracted a great deal of attention. Among the various graph analytic tasks studied, the extraction of dense substructures, such as cores or trusses, has a central role.

In this paper, we study the problem of (k, γ)-truss indexing and querying over an uncertain graph G. A (k, γ)-truss is the largest subgraph of G such that the probability of each edge being contained in at least k − 2 triangles is no less than γ. Our first proposal, CPT-index, keeps all the (k, γ)-trusses: retrieval for any given k and γ can be executed in an optimal linear time w.r.t. the graph size of the queried (k, γ)-truss. We develop a bottom-up CPT-index construction scheme and an improved algorithm for fast CPT-index construction using top-down graph partitions. For trading off between (k, γ)-truss offline indexing and online querying, we further develop an approximate indexing approach, (ϵ, Δr)-APX, equipped with two parameters, ϵ and Δr, that govern tolerated errors.
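
To make the per-edge condition in the (k, γ)-truss definition concrete, the following sketch computes the probability that a given edge lies in at least k − 2 triangles. It assumes the usual uncertain-graph model in which edges exist independently, and it conditions on the edge itself being present; whether the edge's own probability is also multiplied in depends on the exact definition used in the paper, which the abstract does not spell out. This is not the CPT-index or the authors' algorithm, only a check of the condition for one edge, and all names are hypothetical.

```python
def prob_at_least(triangle_probs, s):
    """Probability that at least `s` of the independent triangles exist."""
    # dp[j] = probability that exactly j of the triangles seen so far exist
    dp = [1.0]
    for q in triangle_probs:
        nxt = [0.0] * (len(dp) + 1)
        for j, p in enumerate(dp):
            nxt[j] += p * (1.0 - q)   # this triangle is absent
            nxt[j + 1] += p * q       # this triangle is present
        dp = nxt
    return sum(dp[max(s, 0):])


def triangle_probs_for_edge(edge_prob, u, v):
    """Existence probabilities of the triangles through edge (u, v).

    `edge_prob` maps frozenset({a, b}) to the probability of edge a-b.
    Triangles through different third vertices share no edge other than
    (u, v), so given that (u, v) exists they are independent.
    """
    nbrs = {}
    for e in edge_prob:
        a, b = tuple(e)
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    common = (nbrs.get(u, set()) & nbrs.get(v, set())) - {u, v}
    return [edge_prob[frozenset((u, w))] * edge_prob[frozenset((v, w))]
            for w in sorted(common)]


# Hypothetical toy graph: edge a-b with two possible triangles (via c and d).
edge_prob = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 0.5,
    frozenset(("a", "d")): 1.0,
    frozenset(("b", "d")): 1.0,
}
qs = triangle_probs_for_edge(edge_prob, "a", "b")
p_k4 = prob_at_least(qs, 4 - 2)  # at least 2 triangles, as needed for k = 4
```

An edge would pass the test for a given k and γ when the resulting probability (optionally scaled by the edge's own probability) is at least γ; finding the largest subgraph in which every edge passes requires iterating this check as edges are removed.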

Extensive experiments using large-scale uncertain graphs with 261 million edges validate the efficiency of our proposed indexing and querying algorithms against state-of-the-art methods.

Random Graphs with Prescribed K-Core Sequences: A New Null Model for Network Analysis

In the analysis of large-scale network data, a fundamental operation is the comparison of observed phenomena to the predictions provided by null models: when we find an interesting structure in a family of real networks, it is important to ask whether this structure is also likely to arise in random networks with similar characteristics to the real ones. A long-standing challenge in network analysis has been the relative scarcity of reasonable null models for networks; arguably the most common such model has been the configuration model, which starts with a graph G and produces a random graph with the same node degrees as G. This leads to a very weak form of null model, since fixing the node degrees does not preserve many of the crucial properties of the network, including the structure of its subgraphs.

Guided by this challenge, we establish a new family of network null models that operate on the k-core decomposition. For a graph G, the k-core is its maximal subgraph of minimum degree k; and the core number of a node v in G is the largest k such that v belongs to the k-core of G. We provide the first efficient sampling algorithm to solve the following basic combinatorial problem: given a graph G, produce a random graph sampled nearly uniformly from among all graphs with the same sequence of core numbers as G. This opens the opportunity to compare observed networks G with random graphs that exhibit the same core numbers, a comparison that preserves aspects of the structure of G that are not captured by more local measures like the degree sequence. We illustrate the power of this core-based null model on some fundamental tasks in network analysis, including the enumeration of network motifs.
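
The core numbers defined above can be computed with the standard peeling procedure: repeatedly remove a node of minimum remaining degree and record the largest minimum degree seen so far. The sketch below is a simple quadratic-time version for illustration only; it computes the core-number sequence that the null model preserves, not the sampling algorithm itself. The function name and the adjacency-dictionary input format are assumptions.

```python
def core_numbers(adj):
    """Core number of every node in an undirected graph.

    `adj` maps each node to the set of its neighbours and must be
    symmetric (w in adj[v] if and only if v in adj[w]).
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical example: a triangle 1-2-3 with a pendant node 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cores = core_numbers(graph)
```

In this example the triangle nodes have core number 2 and the pendant node has core number 1, even though node 3 has degree 3, which shows how the core sequence differs from the degree sequence.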

Motif-driven Dense Subgraph Discovery in Directed and Labeled Networks

Dense regions in networks are an indicator of interesting and unusual information. However, most existing methods only consider simple, undirected, unweighted networks. Complex networks in the real world often have rich information though: edges are asymmetrical and nodes/edges have categorical and numerical attributes. Finding dense subgraphs in such networks in accordance with this rich information is an important problem with many applications. Furthermore, most existing algorithms ignore the higher-order relationships (i.e., motifs) among the nodes. Motifs are shown to be helpful for dense subgraph discovery but their wide spectrum in heterogeneous networks makes it challenging to utilize them effectively. In this work, we propose the quark decomposition framework to locate dense subgraphs that are rich with a given motif. We focus on networks with directed edges and categorical attributes on nodes/edges. For a given motif, our framework builds subgraphs, called quarks, in varying quality and with hierarchical relations. Our framework is versatile, efficient, and extendible. We discuss the limitations and practical instantiations of our framework as well as the role confusion problem that needs to be considered in directed networks. We give an extensive evaluation of our framework in directed, signed-directed, and node-labeled networks. We consider various motifs and evaluate the quark decomposition using several real-world networks. Results show that quark decomposition performs better than the state-of-the-art techniques. Our framework is also practical and scalable to networks with up to 101M edges.

Heterogeneous Graph Neural Network via Attribute Completion

Heterogeneous information networks (HINs), also called heterogeneous graphs, are composed of multiple types of nodes and edges, and contain comprehensive information and rich semantics. Graph neural networks (GNNs), as powerful tools for graph data, have shown superior performance on network analysis. Recently, many excellent models have been proposed to process hetero-graph data using GNNs and have achieved great success. These GNN-based heterogeneous models can be interpreted as smoothing node attributes guided by graph structure, which requires all nodes to have attributes. However, this is not easy to satisfy, as some types of nodes often have no attributes in heterogeneous graphs. Previous studies use handcrafted methods to solve this problem, which separate the attribute completion from the graph learning process and, in turn, result in poor performance. In this paper, we hold that missing attributes can be acquired in a learnable manner, and propose a general framework for Heterogeneous Graph Neural Network via Attribute Completion (HGNN-AC), including pre-learning of topological embedding and attribute completion with an attention mechanism. HGNN-AC first uses existing HIN-Embedding methods to obtain node topological embeddings. Then it uses the topological relationship between nodes as guidance to complete attributes for no-attribute nodes by weighted aggregation of the attributes from attributed nodes. Our completion mechanism can be easily combined with an arbitrary GNN-based heterogeneous model, making the whole system end-to-end. We conduct extensive experiments on three real-world heterogeneous graphs. The results demonstrate the superiority of the proposed framework over state-of-the-art baselines.
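
The weighted-aggregation step described above can be sketched as follows. This is a simplified stand-in, not the HGNN-AC implementation: the attention scores here are plain dot products between fixed topological embeddings passed through a softmax, whereas the paper learns its attention end-to-end. The function name and argument layout are hypothetical.

```python
import math


def complete_attributes(target_emb, neighbor_embs, neighbor_attrs):
    """Fill in attributes for a node that has none.

    target_emb:     topological embedding of the no-attribute node
    neighbor_embs:  embeddings of its attributed neighbours
    neighbor_attrs: attribute vectors of those neighbours (same order)
    """
    # Assumed scoring: dot product between topological embeddings.
    scores = [sum(a * b for a, b in zip(target_emb, e)) for e in neighbor_embs]
    # Numerically stable softmax turns scores into aggregation weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Weighted sum of the neighbours' attribute vectors.
    dim = len(neighbor_attrs[0])
    return [sum(w * attr[d] for w, attr in zip(weights, neighbor_attrs))
            for d in range(dim)]


# Two equally similar neighbours contribute equally to the completed vector.
completed = complete_attributes(
    target_emb=[1.0, 0.0],
    neighbor_embs=[[0.5, 0.2], [0.5, -0.3]],
    neighbor_attrs=[[1.0, 0.0], [0.0, 1.0]],
)
```

Because the weights come from a softmax, the completed vector is always a convex combination of the neighbours' attributes; in a trainable version the scoring function would carry learnable parameters updated together with the downstream GNN.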

SESSION: Session: Recommendations

DGCN: Diversified Recommendation with Graph Convolutional Networks

In recent years, much effort has been devoted to improving the accuracy or relevance of recommender systems. Diversity, a crucial factor which measures the dissimilarity among the recommended items, has received rather little scrutiny. Directly related to user satisfaction, diversification is usually taken into consideration after generating the candidate items. However, this decoupled design of diversification and candidate generation makes the whole system suboptimal. In this paper, we aim at pushing the diversification to the upstream candidate generation stage, with the help of Graph Convolutional Networks (GCN). Although GCN-based recommendation algorithms have shown great power in modeling complex collaborative filtering effects to improve the accuracy of recommendation, how diversity changes is ignored in those works. We propose to perform rebalanced neighbor discovering, category-boosted negative sampling and adversarial learning on top of GCN. We conduct extensive experiments on real-world datasets. Experimental results verify the effectiveness of our proposed method on diversification. Further ablation studies validate that our proposed method significantly alleviates the accuracy-diversity dilemma.

Self-Supervised Multi-Channel Hypergraph Convolutional Network for Social Recommendation

Social relations are often used to improve recommendation quality when user-item interaction data is sparse in recommender systems. Most existing social recommendation models exploit pairwise relations to mine potential user preferences. However, real-life interactions among users are very complex and user relations can be high-order. Hypergraphs provide a natural way to model high-order relations, but their potential for improving social recommendation is under-explored. In this paper, we fill this gap and propose a multi-channel hypergraph convolutional network to enhance social recommendation by leveraging high-order user relations. Technically, each channel in the network encodes a hypergraph that depicts a common high-order user relation pattern via hypergraph convolution. By aggregating the embeddings learned through multiple channels, we obtain comprehensive user representations to generate recommendation results. However, the aggregation operation might also obscure the inherent characteristics of different types of high-order connectivity information. To compensate for the aggregating loss, we integrate self-supervised learning into the training of the hypergraph convolutional network to regain the connectivity information with hierarchical mutual information maximization. Extensive experiments on multiple real-world datasets demonstrate the superiority of the proposed model over the current SOTA methods, and the ablation study verifies the effectiveness and rationale of the multi-channel setting and the self-supervised task. The implementation of our model is available via https://github.com/Coder-Yu/RecQ.

Reinforcement Recommendation with User Multi-aspect Preference

Formulating recommender systems with reinforcement learning (RL) frameworks has attracted increasing attention from both academic and industry communities. While many promising results have been achieved, existing models mostly simulate the environment reward with a unified value, which may hinder the understanding of users’ complex preferences and limit the model performance. In this paper, we consider how to model user multi-aspect preferences in the context of RL-based recommender systems. More specifically, we base our model on the framework of deterministic policy gradient (DPG), which is effective in dealing with large action spaces. A major challenge for modeling user multi-aspect preferences lies in the fact that they may contradict each other. To solve this problem, we introduce Pareto optimization into the DPG framework. We assign each aspect a tailored critic, and all the critics share the same actor. The Pareto optimization is realized by a gradient-based method, which can be easily integrated into the actor and critic learning process. Based on the designed model, we theoretically analyze its gradient bias in the optimization process, and we design a weight-reuse mechanism to lower the upper bound of this bias, which is shown to be effective for improving the model performance. We conduct extensive experiments based on three real-world datasets to demonstrate our model’s superiority.

Variation Control and Evaluation for Generative Slate Recommendations

Slate recommendation generates a list of items as a whole instead of ranking each item individually, so as to better model the intra-list positional biases and item relations. In order to deal with the enormous combinatorial space of slates, recent work considers a generative solution so that a slate distribution can be directly modeled. However, we observe that such approaches, despite their proven effectiveness in computer vision, suffer from a trade-off dilemma in recommender systems: when focusing on reconstruction, they easily over-fit the data and hardly generate satisfactory recommendations; on the other hand, when focusing on satisfying the user interests, they get trapped in a few items and fail to cover the item variation in slates. In this paper, we propose to enhance the accuracy-based evaluation with slate variation metrics to estimate the stochastic behavior of generative models. We illustrate that instead of reaching one of the two undesirable extreme cases in the dilemma, a valid generative solution resides in a narrow “elbow” region in between. We also show that item perturbation can enforce slate variation and mitigate the over-concentration of generated slates, which expands the “elbow” performance to an easy-to-find region. We further propose to separate a pivot selection phase from the generation process so that the model can apply perturbation before generation. Empirical results show that this simple modification can provide even better variance with the same level of accuracy compared to post-generation perturbation methods.

Adversarial and Contrastive Variational Autoencoder for Sequential Recommendation

Sequential recommendation as an emerging topic has attracted increasing attention due to its important practical significance. Models based on deep learning and attention mechanism have achieved good performance in sequential recommendation. Recently, the generative models based on Variational Autoencoder (VAE) have shown the unique advantage in collaborative filtering. In particular, the sequential VAE model as a recurrent version of VAE can effectively capture temporal dependencies among items in user sequence and perform sequential recommendation. However, VAE-based models suffer from a common limitation that the representational ability of the obtained approximate posterior distribution is limited, resulting in lower quality of generated samples. This is especially true for generating sequences. To solve the above problem, in this work, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first introduce the adversarial training for sequence generation under the Adversarial Variational Bayes (AVB) framework, which enables our model to generate high-quality latent variables. Then, we employ the contrastive loss. The latent variables will be able to learn more personalized and salient characteristics by minimizing the contrastive loss. Besides, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence. Finally, we conduct extensive experiments on four real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods.

SESSION: Session: Networks, Access and Content Quality

“Is it a Qoincidence?”: An Exploratory Study of QAnon on Voat

Online fringe communities offer fertile grounds to users seeking and sharing ideas fueling suspicion of mainstream news and conspiracy theories. Among these, the QAnon conspiracy theory emerged in 2017 on 4chan, broadly supporting the idea that powerful politicians, aristocrats, and celebrities are closely engaged in a global pedophile ring. Simultaneously, governments are thought to be controlled by “puppet masters,” as democratically elected officials serve as a fake showroom of democracy.

This paper provides an empirical exploratory analysis of the QAnon community on Voat.co, a Reddit-esque news aggregator, which has captured the interest of the press for its toxicity and for providing a platform to QAnon followers. More precisely, we analyze a large dataset from /v/GreatAwakening, the most popular QAnon-related subverse (the Voat equivalent of a subreddit), to characterize activity and user engagement. To further understand the discourse around QAnon, we study the most popular named entities mentioned in the posts, along with the most prominent topics of discussion, which focus on US politics, Donald Trump, and world events. We also use word embeddings to identify narratives around QAnon-specific keywords. Our graph visualization shows that some of the QAnon-related keywords are closely related to those from the Pizzagate conspiracy theory and so-called drops by “Q.” Finally, we analyze content toxicity, finding that discussions on /v/GreatAwakening are less toxic than in the broad Voat community.

Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China

The Great Firewall of China (GFW) prevents Chinese citizens from accessing online content deemed objectionable by the Chinese government. One way it does this is to search for forbidden keywords in unencrypted packet streams. When it detects them, it terminates the offending stream by injecting TCP RST packets, and blocks further traffic between the same two hosts for a few minutes.

We report on a detailed investigation of the GFW’s application-layer understanding of HTTP. Forbidden keywords are only detected in certain locations within an HTTP request. Requests that contain the English word “search” are inspected for a longer list of forbidden keywords than requests without this word. The firewall can be evaded by bending the rules of the HTTP specification. We observe censorship based on the cleartext TLS Server Name Indication (SNI), but we find no evidence for bulk decryption of HTTPS.

We also report on changes since 2014 in the contents of the forbidden keyword list. Over 85% of the forbidden keywords have been replaced since 2014, with the surviving terms referring to perennially sensitive topics. The new keywords refer to recent events and controversies. The GFW’s keyword list is not kept in sync with the blocklists used by Chinese chat clients.

Understanding the Impact of Encrypted DNS on Internet Censorship

DNS traffic is transmitted in plaintext, resulting in privacy leakage. To combat this problem, secure protocols have been used to encrypt DNS messages. Existing studies have investigated the performance overhead and privacy benefits of encrypted DNS communications, yet little has been done from the perspective of censorship. In this paper, we study the impact of the encrypted DNS on Internet censorship in two aspects. On one hand, we explore the severity of DNS manipulation, which could be leveraged for Internet censorship, given the use of encrypted DNS resolvers. In particular, we perform 7.4 million DNS lookup measurements on 3,813 DoT and 75 DoH resolvers and identify that 1.66% of DoT responses and 1.42% of DoH responses undergo DNS manipulation. More importantly, we observe that more than two-thirds of the DoT and DoH resolvers manipulate DNS responses from at least one domain, indicating that the DNS manipulation is prevalent in encrypted DNS, which can be further exploited for enhancing Internet censorship. On the other hand, we evaluate the effectiveness of using encrypted DNS resolvers for censorship circumvention. Specifically, we first discover those vantage points that involve DNS manipulation through on-path devices, and then we apply encrypted DNS resolvers at these vantage points to access the censored domains. We reveal that 37% of the domains are accessible from the vantage points in China, but none of the domains is accessible from the vantage points in Iran, indicating that the censorship circumvention of using encrypted DNS resolvers varies from country to country. Moreover, for a vantage point, using a different encrypted DNS resolver does not lead to a noticeable difference in accessing the censored domains.

Improving Cyberbullying Detection with User Interaction

Cyberbullying, identified as intended and repeated online bullying behavior, has become increasingly prevalent in the past few decades. Despite the significant progress made thus far, the focus of most existing work on cyberbullying detection lies in the independent content analysis of different comments within a social media session. We argue that such leading notions of analysis suffer from three key limitations: they overlook the temporal correlations among different comments; they only consider the content within a single comment rather than the topic coherence across comments; they remain generic and exploit limited interactions between social media users. In this work, we observe that user comments in the same session may be inherently related, e.g., discussing similar topics, and their interaction may evolve over time. We also show that modeling such topic coherence and temporal interaction is critical to capturing the repetitive characteristics of bullying behavior, thus leading to better predictive performance. To achieve this goal, we first construct a unified temporal graph for each social media session. Drawing on recent advances in graph neural networks, we then propose a principled graph-based approach for modeling the temporal dynamics and topic coherence throughout user interactions. We empirically evaluate the effectiveness of our approach with the tasks of session-level bullying detection and comment-level case study. Our code is publicly released.

IFSpard: An Information Fusion-based Framework for Spam Review Detection

Online reviews, which contain quality information and user experience about products, often affect the consumption decisions of customers. Unfortunately, quite a number of spammers attempt to mislead consumers by writing fake reviews for various purposes. Existing methods for detecting spam reviews mainly focus on constructing discriminative features, which heavily depend on experts and may miss some complex but effective features. Recently, some models attempt to learn the latent representations of reviews, users, and items. However, the learned embeddings usually lack interpretability. Moreover, most existing methods are based on a single classification model and ignore the complementarity of different classification models.

To solve these problems, we propose IFSpard, a novel information fusion-based framework that aims at exploring and exploiting useful information from various aspects for spam review detection. First, we design a graph-based feature extraction method and an interaction-mining-based feature crossing method to automatically extract basic and complex features with consideration of different sources of data. Then, we propose a mutual-information-based feature selection and representation learning method to remove the irrelevant and redundant information contained in the automatically constructed features. Finally, we devise an adaptive ensemble model to make use of the information of constructed features and the abilities of different classifiers for spam review detection. Experimental results on several public datasets show that the proposed model performs better than state-of-the-art methods.

SESSION: Session: Sentiment

Dr.Emotion: Disentangled Representation Learning for Emotion Analysis on Social Media to Improve Community Resilience in the COVID-19 Era and Beyond

During the pandemic caused by coronavirus disease (COVID-19), social media has played an important role by enabling people to discuss their experiences and feelings of this global crisis. To help combat the prolonged pandemic that has exposed vulnerabilities impacting community resilience, in this paper, based on our established large-scale COVID-19 related social media data, we propose and develop an integrated framework (named Dr.Emotion) to learn disentangled representations of social media posts (i.e., tweets) for emotion analysis and thus to gain deep insights into public perceptions towards COVID-19. In Dr.Emotion, for given social media posts, we first post-train a transformer-based model to obtain the initial post embeddings. Since users may implicitly express their emotions in social media posts which could be highly entangled with other descriptive information in the post content, to address this challenge for emotion analysis, we propose an adversarial disentangler by integrating emotion-independent (i.e., sentiment-neutral) priors of the posts generated by another post-trained transformer-based model to separate and disentangle the implicitly encoded emotions from the content in latent space for emotion classification at the first attempt. Extensive experimental studies are conducted to fully evaluate Dr.Emotion and promising results demonstrate its performance in emotion analysis by comparison with the state-of-the-art baseline methods. By exploiting our developed Dr.Emotion, we further perform emotion analysis over a large number of social media posts and provide in-depth investigation from both temporal and geographical perspectives, based on which additional work can be conducted to extract and transform the constructive ideas, experiences and support into actionable information to improve community resilience in responses to a variety of crises created by COVID-19 and well beyond.

Modeling Human Motives and Emotions from Personal Narratives Using External Knowledge And Entity Tracking

The ability to automatically understand and infer characters’ motivations and emotional states is key to better narrative comprehension. In this work, we propose a Transformer-based architecture, referred to as , to model characters’ motives and emotions from personal narratives. Towards this goal, we incorporate social commonsense knowledge about the mental states of people related to social events and employ dynamic state tracking of entities using an augmented memory module. Our model learns to produce contextual embeddings and explanations of characters’ mental states by integrating external knowledge along with prior narrative context and mental state encodings. We leverage weakly-annotated personal narratives and knowledge data to train our model and demonstrate its effectiveness on a publicly available dataset containing annotations for character mental states. Further, we show that the learned mental state embeddings can be applied in downstream tasks such as empathetic response generation.

Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources

Sentiment analysis of user-generated reviews or comments on products and services in social networks can help enterprises to analyze the feedback from customers and take corresponding actions for improvement. To mitigate large-scale annotations on the target domain, domain adaptation (DA) provides an alternate solution by learning a transferable model from other labeled source domains. Existing multi-source domain adaptation (MDA) methods either fail to extract some discriminative features in the target domain that are related to sentiment, neglect the correlations of different sources and the distribution difference among different sub-domains even in the same source, or cannot reflect the varying optimal weighting during different training stages. In this paper, we propose a novel instance-level MDA framework, named curriculum cycle-consistent generative adversarial network (C-CycleGAN), to address the above issues. Specifically, C-CycleGAN consists of three components: (1) pre-trained text encoder which encodes textual input from different domains into a continuous representation space, (2) intermediate domain generator with curriculum instance-level adaptation which bridges the gap across source and target domains, and (3) task classifier trained on the intermediate domain for final sentiment classification. C-CycleGAN transfers source samples at instance-level to an intermediate domain that is closer to the target domain with sentiment semantics preserved and without losing discriminative features. Further, our dynamic instance-level weighting mechanisms can assign the optimal weights to different source samples in each training stage. We conduct extensive experiments on three benchmark datasets and achieve substantial gains over state-of-the-art DA approaches. Our source code is released at: https://github.com/WArushrush/Curriculum-CycleGAN.

Latent Target-Opinion as Prior for Document-Level Sentiment Classification: A Variational Approach from Fine-Grained Perspective

Existing works for document-level sentiment classification task treat the review document as an overall text unit, performing feature extraction with various sophisticated model architectures. In this paper, we draw inspiration from fine-grained sentiment analysis, proposing to first learn the latent target-opinion distribution behind the documents, and then leverage such fine-grained prior knowledge into the classification process. We model the latent target-opinion distribution as hierarchical variables, where global-level variable captures the overall target and opinion, and local-level variables retrieve the detailed opinion clues at the word level. The proposed method consists of two main parts: a variational module and a classification module. We employ the conditional variational autoencoder to make reconstructions of the document, during which the user and product information can be integrated. In the classification module, we build a hierarchical model based on Transformer encoders, where the local-level and global-level prior distribution representations induced from the variational module are injected into the word-level and sentence-level Transformers, respectively. Experimental results on benchmark datasets show that the proposed method significantly outperforms strong baselines, achieving the state-of-the-art performance. Further analysis shows that our model is capable of capturing the latent fine-grained target and opinion prior information, which is highly effective for improving the task performance.

Contrastive Lexical Diffusion Coefficient: Quantifying the Stickiness of the Ordinary

Lexical phenomena, such as clusters of words, disseminate through social networks at different rates, but most models of diffusion focus on the discrete adoption of new lexical phenomena (i.e., new topics or memes). It is possible that much of lexical diffusion happens via the changing rates of existing word categories or concepts (those that are already being used, at least to some extent, regularly) rather than new ones. In this study we introduce a new metric, the contrastive lexical diffusion (CLD) coefficient, which attempts to measure the degree to which ordinary language (here clusters of common words) catches on over friendship connections over time. For instance, topics related to meeting and job are found to be sticky, while negative thinking and emotion, and global events, like ‘school orientation’ were found to be less sticky even though they change rates over time. We evaluate the CLD coefficient with both quantitative and qualitative tests, studied over 6 years of language on Twitter. We find CLD predicts the spread of tweets and friendship connections, scores converge with human judgments of lexical diffusion (r=0.92), and CLD coefficients replicate across disjoint networks (r=0.85). Comparing CLD scores can help understand lexical diffusion: positive emotion words appear more diffusive than negative emotions, first-person plurals (we) score higher than other pronouns, and numbers and time appear non-contagious.

SESSION: Session: User Modeling

High-dimensional Sparse Embeddings for Collaborative Filtering

A widely adopted paradigm in the design of recommender systems is to represent users and items as vectors, often referred to as latent factors or embeddings. Embeddings can be obtained using a variety of recommendation models and served in production using a variety of data engineering solutions. Embeddings also facilitate transfer learning, where trained embeddings from one model are reused in another. In contrast, some of the best-performing collaborative filtering models today are high-dimensional linear models that do not rely on factorization, and so they do not produce embeddings [27, 28]. They also require pruning, amounting to a trade-off between the model size and the density of the predicted affinities. This paper argues for the use of high-dimensional, sparse latent factor models, instead. We propose a new recommendation model based on a full-rank factorization of the inverse Gram matrix. The resulting high-dimensional embeddings can be made sparse while still factorizing a dense affinity matrix. We show how the embeddings combine the advantages of latent representations with the performance of high-dimensional linear models.

Sinkhorn Collaborative Filtering

Recommender systems play a vital role in modern web services. In a typical recommender system, we are given a set of observed user-item interaction records and seek to uncover the hidden behavioral patterns of users from these historical interactions. By exploiting these hidden patterns, we aim to discover users’ personalized tastes and recommend them new items. Among various types of recommendation methods, the latent factor collaborative filtering models have dominated the field. In this paper, we develop a unified view for the existing latent factor models from a probabilistic perspective. The unified framework enables us to discern the underlying connections of different latent factor models and deepen our understandings of their advantages and limitations. In particular, we observe that the loss functions adopted by the existing models are oblivious to the geometry induced by the item-similarity. To address this, we propose a novel model—SinkhornCF—based on Sinkhorn divergence. To address the challenge of the expensive computational cost of Sinkhorn divergence, we also propose new techniques to enable the resulting model to be able to scale to large datasets. Its effectiveness is verified on two real-world recommendation datasets.
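For readers unfamiliar with the quantity named above, the following is a minimal NumPy sketch of a Sinkhorn divergence between two distributions over the same set of items. It is not the SinkhornCF model or its scaling techniques; the function names, the regularization strength `eps`, and the iteration count are assumptions chosen for illustration, and the sketch uses the simplified variant that scores only the transport cost of the entropy-regularized plan.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.5, n_iters=200):
    """Transport cost <P, C> of the entropy-regularized plan between a and b.

    a, b : strictly positive histograms over the same items (each sums to 1)
    C    : pairwise item cost matrix (illustrative; e.g. item dissimilarity)
    """
    K = np.exp(-C / eps)               # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):           # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]    # approximate transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(a, b, C, eps=0.5, n_iters=200):
    """Debiased cost: OT(a, b) - 0.5 * OT(a, a) - 0.5 * OT(b, b)."""
    return (sinkhorn_cost(a, b, C, eps, n_iters)
            - 0.5 * sinkhorn_cost(a, a, C, eps, n_iters)
            - 0.5 * sinkhorn_cost(b, b, C, eps, n_iters))

# Three items on a line; the cost is the distance between item positions.
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
a = np.array([0.7, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.7])
div_ab = sinkhorn_divergence(a, b, C)
div_aa = sinkhorn_divergence(a, a, C)
```

Because the cost matrix enters the loss, two predictions that miss the target by the same probability mass are penalized differently depending on how dissimilar the confused items are, which is the geometric sensitivity the abstract refers to.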

HGCF: Hyperbolic Graph Convolution Networks for Collaborative Filtering

Hyperbolic spaces offer a rich setup to learn embeddings with superior properties that have been leveraged in areas such as computer vision, natural language processing and computational biology. Recently, several hyperbolic approaches have been proposed to learn robust representations for users and items in the recommendation setting. However, these approaches don’t capture the higher order relationships that typically exist in the recommendation domain. Graph convolutional neural networks (GCNs) on the other hand excel at capturing higher order information by applying multiple levels of aggregation to local representations. In this paper we combine these frameworks in a novel way, by proposing a hyperbolic GCN model for collaborative filtering. We demonstrate that our model can be effectively learned with a margin ranking loss, and show that hyperbolic space has desirable properties under the rank margin setting. At test time, inference in our model is done using the hyperbolic distance which preserves the structure of the learned space. We conduct extensive empirical analysis on three public benchmarks and compare against a large set of baselines. Our approach achieves highly competitive results and outperforms leading baselines including the Euclidean GCN counterpart. We further study the properties of the learned hyperbolic embeddings and show that they offer meaningful insights into the data. Full code for this work is available here: https://github.com/layer6ai-labs/HGCF.
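As a rough illustration of ranking by hyperbolic distance with a margin loss, the sketch below uses the Poincaré ball model. The abstract does not say which model of hyperbolic space HGCF uses, so the choice of the Poincaré ball, the function names, and the margin value are all assumptions, and the graph convolution itself is omitted.

```python
import numpy as np

def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

def margin_ranking_loss(user, pos_item, neg_item, margin=0.1):
    """Hinge loss that is zero once the positive item is closer to the user
    than the negative item by at least `margin` (illustrative value)."""
    d_pos = poincare_distance(user, pos_item)
    d_neg = poincare_distance(user, neg_item)
    return max(0.0, d_pos - d_neg + margin)

user = np.array([0.0, 0.0])
liked = np.array([0.1, 0.0])
unseen = np.array([0.5, 0.0])
loss_ok = margin_ranking_loss(user, liked, unseen)    # correctly ordered pair
loss_bad = margin_ranking_loss(user, unseen, liked)   # wrongly ordered pair
```

Distances grow quickly near the boundary of the ball, so points placed there are far from almost everything else, which is one reason hyperbolic embeddings are often associated with tree-like or power-law data.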

Collaborative Filtering with Preferences Inferred from Brain Signals

Collaborative filtering is a common technique in which interaction data from a large number of users are used to recommend items to an individual that the individual may prefer but has not interacted with. Previous approaches have achieved this using a variety of behavioral signals, from dwell time and clickthrough rates to self-reported ratings. However, such signals are mere estimations of the real underlying preferences of the users. Here, we use brain-computer interfacing to infer preferences directly from the human brain. We then utilize these preferences in a collaborative filtering setting and report results from an experiment where brain inferred preferences are used in a neural collaborative filtering framework. Our results demonstrate, for the first time, that brain-computer interfacing can provide a viable alternative for behavioral and self-reported preferences in realistic recommendation scenarios. We also discuss the broader implications of our findings for personalization systems and user privacy.

Variable Interval Time Sequence Modeling for Career Trajectory Prediction: Deep Collaborative Perspective

In today’s fast-evolving job market, the timely and effective understanding of the career trajectories of talents can help them quickly develop necessary skills and make the right career transitions at the right time. However, it is a non-trivial task for developing a successful career trajectory prediction method, which should have the abilities for finding the right timing for job-hopping, identifying the right companies, and matching the right positions for the candidates. While people have been trying to develop solutions for providing some of the above abilities, there is no total solution or complete framework to integrate all these abilities together. To this end, in this paper, we propose a unified time-aware career trajectory prediction framework, namely TACTP, which is capable of jointly providing the above three abilities for better understanding the career trajectories of talents. Along this line, we first exploit a hierarchical deep sequential modeling network for career embedding and extract latent talent factors from multiple networks, which are designed with different functions of handling related issues of the timing, companies, and positions for job-hopping. Then, we perform collaborative filtering for generating personalized predictions. Furthermore, we propose a temporal encoding mechanism to handle dynamic temporal information so that TACTP is capable of generating time-aware predictions by addressing the challenges for variable interval time sequence modeling. Finally, we have conducted extensive experiments on large-scale real-world data to evaluate TACTP against the state-of-the-art baselines, and the results show that TACTP has advantages over baselines on all targeted tasks for career trajectory prediction.

SESSION: Session: Bias and Fairness

User-oriented Fairness in Recommendation

As a highly data-driven application, recommender systems could be affected by data bias, resulting in unfair results for different data groups, which could be a reason that affects the system performance. Therefore, it is important to identify and solve the unfairness issues in recommendation scenarios.

In this paper, we address the unfairness problem in recommender systems from the user perspective. We group users into advantaged and disadvantaged groups according to their level of activity, and conduct experiments to show that current recommender systems will behave unfairly between two groups of users. Specifically, the advantaged users (active) who only account for a small proportion in data enjoy much higher recommendation quality than those disadvantaged users (inactive). Such bias can also affect the overall performance since the disadvantaged users are the majority. To solve this problem, we provide a re-ranking approach to mitigate this unfairness problem by adding constraints over evaluation metrics. The experiments we conducted on several real-world datasets with various recommendation algorithms show that our approach can not only improve group fairness of users in recommender systems, but also achieve better overall recommendation performance.
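The grouping step and the quality gap between groups can be sketched as follows. This is only an illustration of the measurement, not the re-ranking method itself; the function names, the `top_fraction` cut-off, and the per-user quality scores are hypothetical.

```python
def split_by_activity(interaction_counts, top_fraction=0.05):
    """Split users into (advantaged, disadvantaged) sets by interaction count.

    top_fraction is an assumed cut-off; the most active users form the
    advantaged group and everyone else the disadvantaged group.
    """
    ranked = sorted(interaction_counts, key=interaction_counts.get, reverse=True)
    n_adv = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_adv]), set(ranked[n_adv:])

def group_gap(quality, advantaged, disadvantaged):
    """Difference in mean recommendation quality between the two groups.

    Both groups must be non-empty.
    """
    def mean(users):
        return sum(quality[u] for u in users) / len(users)
    return mean(advantaged) - mean(disadvantaged)

# Toy data: one very active user and three inactive ones.
counts = {"u1": 100, "u2": 5, "u3": 3, "u4": 2}
quality = {"u1": 0.8, "u2": 0.2, "u3": 0.3, "u4": 0.1}   # e.g. per-user NDCG
adv, dis = split_by_activity(counts, top_fraction=0.25)
gap = group_gap(quality, adv, dis)
```

A fairness-constrained re-ranker would then choose recommendation lists that keep a gap of this kind below a chosen bound while preserving as much overall quality as possible.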

Mitigating Gender Bias in Captioning Systems

Image captioning has made substantial progress with huge supporting image collections sourced from the web. However, recent studies have pointed out that captioning datasets, such as COCO, contain gender bias found in web corpora. As a result, learning models could heavily rely on the learned priors and image context for gender identification, leading to incorrect or even offensive errors. To encourage models to learn correct gender features, we reorganize the COCO dataset and present two new splits COCO-GB V1 and V2 datasets where the train and test sets have different gender-context joint distribution. Models relying on contextual cues will suffer from huge gender prediction errors on the anti-stereotypical test data. Benchmarking experiments reveal that most captioning models learn gender bias, leading to high gender prediction errors, especially for women. To alleviate the unwanted bias, we propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence. Experimental results validate that GAIC can significantly reduce gender prediction errors with a competitive caption quality. Our codes and the designed benchmark datasets are available at https://github.com/datamllab/Mitigating_Gender_Bias_In_Captioning_System.

Fair Partitioning of Public Resources: Redrawing District Boundary to Minimize Spatial Inequality in School Funding

Public schools in the United States offer tuition-free primary and secondary education to their students, and are divided into school districts funded by the local and state governments. Although the primary source of school district revenue is public money, several studies have pointed to the inequality in funding across different school districts. In this paper, we focus on the spatial geometry/distribution of such inequality, i.e., how the highly funded and lesser funded school districts are located relative to each other. Due to the major reliance on local property taxes for school funding, we find existing school district boundaries promoting financial segregation, with highly-funded school districts surrounded by lesser-funded districts and vice-versa.

To counter such issues, we formally propose the Fair Partitioning problem to divide a given set of schools into k districts such that the spatial inequality in the district-level funding is minimized. However, the Fair Partitioning problem turns out to be computationally challenging, and we formally show that it is strongly NP-complete. We further provide a greedy algorithm to offer a practical solution to Fair Partitioning, and show its effectiveness in lowering spatial inequality in school district funding across different states in the United States.

Understanding User Sensemaking in Machine Learning Fairness Assessment Systems

A variety of systems have been proposed to assist users in detecting machine learning (ML) fairness issues. These systems approach bias reduction from a number of perspectives, including recommender systems, exploratory tools, and dashboards. In this paper, we seek to inform the design of these systems by examining how individuals make sense of fairness issues as they use different de-biasing affordances. In particular, we consider the tension between de-biasing recommendations which are quick but may lack nuance and “what-if” style exploration which is time consuming but may lead to deeper understanding and transferable insights. Using logs, think-aloud data, and semi-structured interviews we find that exploratory systems promote a rich pattern of hypothesis generation and testing, while recommendations deliver quick answers which satisfy participants at the cost of reduced information exposure. We highlight design requirements and trade-offs in the design of ML fairness systems to promote accurate and explainable assessments.

Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy

When receiving machine learning services from the cloud, the provider does not need to receive all features; in fact, only a subset of the features are necessary for the target prediction task. Discerning this subset is the key problem of this work. We formulate this problem as a gradient-based perturbation maximization method that discovers this subset in the input feature space with respect to the functionality of the prediction model used by the provider. After identifying the subset, our framework, Cloak, suppresses the rest of the features using utility-preserving constant values that are discovered through a separate gradient-based optimization process. We show that Cloak does not necessarily require collaboration from the service provider beyond its normal service, and can be applied in scenarios where we only have black-box access to the service provider’s model. We theoretically guarantee that Cloak’s optimizations reduce the upper bound of the Mutual Information (MI) between the data and the sifted representations that are sent out. Experimental results show that Cloak reduces the mutual information between the input and the sifted representations by 85.01% with only negligible reduction in utility (1.42%). In addition, we show that Cloak greatly diminishes adversaries’ ability to learn and infer non-conducive features.

SESSION: Session: Models for networks and dynamics

Twin Peaks, a Model for Recurring Cascades

Understanding information dynamics and their resulting cascades is a central topic in social network analysis. In a recent seminal work, Cheng et al. analyzed multiple cascades on Facebook over several months, and noticed that many of them exhibit a recurring behaviour. They tend to have multiple peaks of popularity, with periods of quiescence in between.

In this paper, we propose the first mathematical model that provably explains this interesting phenomenon, besides exhibiting other fundamental properties of information cascades. Our model is simple and shows that it is enough to have a good clustering structure to observe this interesting recurring behaviour with a standard information diffusion model. Furthermore, we complement our theoretical analysis with an experimental evaluation where we show that our model is able to reproduce the observed phenomenon on several social networks.

TEDIC: Neural Modeling of Behavioral Patterns in Dynamic Social Interaction Networks

Dynamic social interaction networks are an important abstraction to model time-stamped social interactions such as eye contact, speaking and listening between people. These networks typically contain informative yet subtle patterns that reflect people’s social characters and relationships, and therefore attract the attention of many social scientists and computer scientists. Previous approaches to extracting those patterns primarily rely on sophisticated expert knowledge of psychology and social science, and the obtained features are often overly task-specific. More generic models based on representation learning of dynamic networks may be applied, but the unique properties of social interactions cause severe model mismatch and degrade the quality of the obtained representations. Here we fill this gap by proposing a novel framework, termed TEmporal network-DIffusion Convolutional networks (TEDIC), for generic representation learning on dynamic social interaction networks. We make TEDIC a good fit by designing two components: 1) diffusion of node attributes over a combination of the original network and its complement to capture long-hop interactive patterns embedded in the behaviors of people making or avoiding contact; 2) temporal convolution networks with a hierarchical set-pooling operation to flexibly extract patterns from different-length interactions scattered over a long time span. The design also endows TEDIC with certain self-explaining power. We evaluate TEDIC over five real datasets for four different social character prediction tasks including deception detection, dominance identification, nervousness detection and community detection. TEDIC not only consistently outperforms previous state-of-the-art methods, but also provides two important pieces of social insight. In addition, it exhibits favorable societal characteristics by remaining unbiased to people from different regions. Our project website is: http://snap.stanford.edu/tedic/.

Modeling Sparse Information Diffusion at Scale via Lazy Multivariate Hawkes Processes

Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven to be notoriously difficult to scale, which has limited their applications to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows computing the exact likelihood and gradients of an MHP independently of the ambient dimensions of the underlying network. We show on synthetic and real-world datasets that our method not only achieves state-of-the-art modeling results, but also improves runtime performance by multiple orders of magnitude on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes in networks at previously unattainable scale.
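To make the object being scaled concrete, the sketch below evaluates the conditional intensity of a textbook multivariate Hawkes process with an exponential kernel. It is not the lazy model proposed in the paper; the kernel choice, the function name, and the parameter names `mu`, `alpha`, and `beta` are assumptions for illustration.

```python
import numpy as np

def mhp_intensity(t, events, mu, alpha, beta=1.0):
    """Intensity vector lambda(t) of a multivariate Hawkes process.

    events : list of (time, dimension) pairs observed so far
    mu     : base rates, shape (d,)
    alpha  : influence matrix, alpha[i, j] = excitation of i by events in j
    beta   : decay rate of the exponential kernel (assumed shared)
    """
    lam = np.asarray(mu, dtype=float).copy()
    for t_k, j in events:
        if t_k < t:   # only past events excite the process
            lam += alpha[:, j] * beta * np.exp(-beta * (t - t_k))
    return lam

mu = np.array([0.1, 0.2])
alpha = np.array([[0.0, 0.5],
                  [0.0, 0.0]])   # dimension 1 excites dimension 0 only
lam_empty = mhp_intensity(1.0, [], mu, alpha)
lam_after = mhp_intensity(1.0, [(0.0, 1)], mu, alpha)
```

In this naive form every event touches a full column of `alpha`, so the cost grows with the number of dimensions; when most dimensions are inactive at any given time, most of that work is spent on entries that contribute nothing, which is the kind of sparsity the abstract says it exploits.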

DYMOND: DYnamic MOtif-NoDes Network Generative Model

Motifs, which have been established as building blocks for network structure, move beyond pair-wise connections to capture longer-range correlations in connections and activity. In spite of this, there are few generative graph models that consider higher-order network structures and even fewer that focus on using motifs in models of dynamic graphs. Most existing generative models for temporal graphs strictly grow the networks via edge addition, and the models are evaluated using static graph structure metrics—which do not adequately capture the temporal behavior of the network. To address these issues, in this work we propose DYnamic MOtif-NoDes (DYMOND)—a generative model that considers (i) the dynamic changes in overall graph structure using temporal motif activity and (ii) the roles nodes play in motifs (e.g., one node plays the hub role in a wedge, while the remaining two act as spokes). We compare DYMOND to three dynamic graph generative model baselines on real-world networks and show that DYMOND performs better at generating graph structure and node behavior similar to the observed network. We also propose a new methodology to adapt graph structure metrics to better evaluate the temporal aspect of the network. These metrics take into account the changes in overall graph structure and the individual nodes’ behavior over time.

Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time Series

We propose a new model for networks of time series that influence each other. Graph structures among time series are found in diverse domains, such as web traffic influenced by hyperlinks, product sales influenced by recommendation, or urban transport volume influenced by road networks and weather. There has been recent progress in graph modeling and in time series forecasting, respectively, but an expressive and scalable approach for a network of series does not yet exist. We introduce Radflow, a novel model that embodies three key ideas: a recurrent neural network to obtain node embeddings that depend on time, the aggregation of the flow of influence from neighboring nodes with multi-head attention, and the multi-layer decomposition of time series. Radflow naturally takes into account dynamic networks where nodes and edges change over time, and it can be used for prediction and data imputation tasks. On real-world datasets ranging from a few hundred to a few hundred thousand nodes, we observe that Radflow variants are the best performing model across a wide range of settings. The recurrent component in Radflow also outperforms N-BEATS, the state-of-the-art time series model. We show that Radflow can learn different trends and seasonal patterns, that it is robust to missing nodes and edges, and that correlated temporal patterns among network neighbors reflect influence strength. We curate WikiTraffic, the largest dynamic network of time series with 366K nodes and 22M time-dependent links spanning five years. This dataset provides an open benchmark for developing models in this area, with applications that include optimizing resources for the web. More broadly, Radflow has the potential to improve forecasts in correlated time series networks such as the stock market, and impute missing measurements in geographically dispersed networks of natural phenomena.

SESSION: Session: Web Mining for Search

Towards a Better Understanding of Query Reformulation Behavior in Web Search

As queries submitted by users directly affect search experiences, how to organize queries has always been a research focus in Web search studies. As search requests become complex and exploratory, many search sessions contain more than a single query, and reformulation becomes a necessity. To help users better formulate their queries in these complex search tasks, modern search engines usually provide a series of reformulation entries on search engine result pages (SERPs), i.e., query suggestions and related entities. However, few existing works have thoroughly studied why and how users perform query reformulations in these heterogeneous interfaces. Therefore, whether search engines provide sufficient assistance for users in reformulating queries remains under-investigated. To shed light on this research question, we conducted a field study to analyze fine-grained user reformulation behaviors including reformulation type, entry, reason, and the inspiration source with various search intents. Unlike existing efforts that rely on external assessors to make judgments, in the field study we collect both implicit behavior signals and explicit user feedback information. Analysis results demonstrate that query reformulation behavior in Web search varies with the type of search task. We also found that the current query suggestions and related query recommendations provided by search engines do not offer enough help for users in complex search tasks. Based on the findings of our field study, we design a supervised learning framework to predict: 1) the reason behind each query reformulation, and 2) how users organize the reformulated query, both of which are novel challenges in this domain. This work provides insight into complex query reformulation behavior in Web search as well as guidance for designing better query suggestion techniques in search engines.

Topic-enhanced knowledge-aware retrieval model for diverse relevance estimation

Relevance measures the relation between a query and a document, and it has several different dimensions, e.g., semantic similarity, topical relatedness, cognitive relevance (the relations in the aspect of knowledge), usefulness, timeliness, utility and so on. However, existing retrieval models mainly focus on semantic similarity and cognitive relevance while ignoring other possible dimensions of relevance. Topical relatedness, as an important dimension of relevance, is not well studied in existing neural information retrieval. In this paper, we propose a Topic Enhanced Knowledge-aware retrieval Model (TEKM) that jointly learns semantic similarity, knowledge relevance and topical relatedness to estimate relevance between query and document. We first construct a neural topic model to learn topical information and generate topic embeddings of a query. Then we combine the topic embeddings with a knowledge-aware retrieval model to estimate different dimensions of relevance. Specifically, we exploit kernel pooling to soft match topic embeddings with words and entities in a unified embedding space to generate fine-grained topical relatedness. The whole model is trained in an end-to-end manner. Experiments on a large-scale publicly available benchmark dataset show that TEKM outperforms existing retrieval models. Further analysis also shows how topical relatedness is modeled to improve a traditional retrieval model based on semantic similarity and knowledge relevance.

Controllable Gradient Item Retrieval

In this paper, we identify and study an important problem of gradient item retrieval. We define the problem as retrieving a sequence of items with a gradual change on a certain attribute, given a reference item and a modification text. For example, after seeing a white dress, a customer may want to buy a similar one with a more floral pattern. The extent of “more floral” is subjective, so presenting a single floral dress is unlikely to satisfy the customer’s needs. A better way is to present a sequence of products with increasingly floral attributes based on the white dress, and allow the customer to select the most satisfactory one from the sequence. Existing item retrieval methods mainly focus on whether the target items appear at the top of the retrieved sequence, but ignore the demand for retrieving a sequence of products with gradual change on a certain attribute. To deal with this problem, we propose a weakly-supervised method that can learn a disentangled item representation from user-item interaction data and ground the semantic meaning of attributes to dimensions of the item representation. Our method takes a reference item and a modification as a query. During inference, we start from the reference item and “walk” along the direction of the modification in the item representation space to retrieve a sequence of items in a gradient manner. We demonstrate that our proposed method can achieve disentanglement through weak supervision. In addition, we empirically show that an item sequence retrieved by our method changes gradually on the indicated attribute and that, in the item retrieval task, our method outperforms existing approaches on three different datasets.
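
The inference step described above, walking from a reference item along a modification direction, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the item vectors are hand-written, dimension 0 is assumed to encode "floral", Euclidean nearest-neighbor lookup is an assumed retrieval rule, and the function `walk_and_retrieve` is hypothetical.

```python
def walk_and_retrieve(items, reference_id, direction, step, n_steps):
    """Return the ids of the items nearest to reference + k*step*direction, k = 1..n_steps.

    items: dict mapping item id -> embedding (list of floats)
    direction: vector for the requested modification (assumed to be given)
    n_steps must not exceed len(items) - 1, since each item is returned at most once.
    """
    ref = items[reference_id]
    result = []
    for k in range(1, n_steps + 1):
        # Move k steps away from the reference along the modification direction.
        query = [r + k * step * d for r, d in zip(ref, direction)]
        candidates = [i for i in items if i != reference_id and i not in result]
        best = min(
            candidates,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(items[i], query)),
        )
        result.append(best)
    return result

# Hypothetical 2-d embeddings: dimension 0 ~ "floral", dimension 1 ~ "color".
items = {
    "white": [0.0, 0.0],
    "slightly_floral": [0.3, 0.0],
    "floral": [0.6, 0.1],
    "very_floral": [1.0, 0.0],
    "red": [0.0, 1.0],
}
seq = walk_and_retrieve(items, "white", [1.0, 0.0], step=0.33, n_steps=3)
```

With these toy vectors the walk returns dresses in order of increasing floral intensity and never selects the red dress, because the direction leaves the color dimension untouched.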

Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval

The ad-hoc retrieval task is to rank related documents given a query and a document collection. A series of deep learning based approaches have been proposed to solve this problem and have gained much attention. However, we argue that they are inherently based on local word sequences, ignoring the subtle long-distance document-level word relationships. To solve the problem, we explicitly model the document-level word relationship through the graph structure, capturing the subtle information via graph neural networks. In addition, due to the complexity and scale of the document collections, it is worthwhile to explore hierarchical matching signals of different granularities at a more general level. Therefore, we propose a Graph-based Hierarchical Relevance Matching model (GHRM) for ad-hoc retrieval, with which we can capture the subtle and general hierarchical matching signals simultaneously. We validate the effects of GHRM on two representative ad-hoc retrieval benchmarks; comprehensive experiments and results demonstrate its superiority over state-of-the-art methods.

Cross-Positional Attention for Debiasing Clicks

A well-known challenge in leveraging implicit user feedback like clicks to improve real-world search services and recommender systems is its inherent bias. Most existing click models are based on the examination hypothesis in user behaviors and differ in how to model such an examination bias. However, they are constrained by assuming a simple position-based bias or enforcing a sequential order in user examination behaviors. These assumptions are insufficient to capture complex real-world user behaviors and hardly generalize to modern user interfaces (UI) in web applications (e.g., results shown in a grid view). In this work, we propose a fully data-driven neural model for the examination bias, Cross-Positional Attention (XPA), which is more flexible in fitting complex user behaviors. Our model leverages the attention mechanism to effectively capture cross-positional interactions among displayed items and is applicable to arbitrary UIs. We employ XPA in a novel neural click model that can both predict clicks and estimate relevance. Our experiments on offline synthetic data sets show that XPA is robust among different click generation processes. We further apply XPA to a large-scale real-world recommender system, showing significantly better results than baselines in online A/B experiments that involve millions of users. This validates the necessity to model more complex user behaviors than those proposed in the literature.

SESSION: Session: Entity Linking and Knowledge Graph Completion

Inductive Entity Representations from Text via Link Prediction

Knowledge Graphs (KG) are of vital importance for multiple applications on the web, including information retrieval, recommender systems, and metadata annotation.

Regardless of whether they are built manually by domain experts or with automatic pipelines, KGs are often incomplete. To address this problem, there is a large amount of work that proposes using machine learning to complete these graphs by predicting new links. Recent work has begun to explore the use of textual descriptions available in knowledge graphs to learn vector representations of entities in order to perform link prediction. However, the extent to which these representations learned for link prediction generalize to other tasks is unclear. This is important given the cost of learning such representations. Ideally, we would prefer representations that do not need to be trained again when transferring to a different task, while retaining reasonable performance.

Therefore, in this work, we propose a holistic evaluation protocol for entity representations learned via a link prediction objective. We consider the inductive link prediction and entity classification tasks, which involve entities not seen during training. We also consider an information retrieval task for entity-oriented search. We evaluate an architecture based on a pretrained language model that exhibits strong generalization to entities not observed during training, and outperforms related state-of-the-art methods (22% MRR improvement in link prediction on average). We further provide evidence that the learned representations transfer well to other tasks without fine-tuning. In the entity classification task we obtain an average improvement of 16% in accuracy compared with baselines that also employ pre-trained models. In the information retrieval task, we obtain significant improvements of up to 8.8% in NDCG@10 for natural language queries. We thus show that the learned representations are not limited to KG-specific tasks, and have greater generalization properties than evaluated in previous work.

Revisiting the Evaluation Protocol of Knowledge Graph Completion Methods for Link Prediction

Completion methods learn models to infer missing (subject, predicate, object) triples in knowledge graphs, a task known as link prediction. The training phase is based on samples of positive triples and their negative counterparts. The test phase consists of ranking each positive triple with respect to its negative counterparts based on the scores obtained by a learned model. The best model ranks all positive triples first. Metrics like mean rank, mean reciprocal rank and hits at k are used to assess accuracy. Under this generic evaluation protocol, we observe several shortcomings: 1) Current metrics assume that each measurement is upper bounded by the same constant value and, therefore, are oblivious to the fact that, in link prediction, each positive triple may have a different number of negative counterparts, which alters the difficulty of ranking positive triples. 2) Benchmarking datasets contain anomalies (unrealistic redundancy) that allegedly simplify link prediction; however, current instantiations of the generic evaluation protocol do not integrate anomalies, which are just discarded based on a user-defined threshold. 3) Benchmarking datasets have been randomly split, which typically alters the graph topology and results in the training split not resembling the original dataset. 4) A single model is typically kept based on its accuracy over the validation split using a given metric; however, since metrics aggregate ranks into a single value, there may be no significant differences among the ranks produced by several models, which must be all evaluated in the test phase. In this paper, we contribute to the evaluation of link prediction as follows: 1) We propose a variation of the mean rank that considers the number of negative counterparts. 2) We define the anomaly coefficient of a predicate and integrate such coefficient in the protocol. 3) We propose a downscaling algorithm to generate training splits that reflect the original graph topology based on a nonparametric, unpaired statistical test. 4) During validation, we discard a learned model only if its output ranks are significantly different from other ranks based on a nonparametric, paired statistical test. Our experiments over seven well-known datasets show that translation-based methods (TransD, TransE and TransH) significantly outperform recent methods, which entails that our understanding of the accuracy of completion methods for link prediction is far from perfect.
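
The standard metrics named in this abstract (mean rank, mean reciprocal rank and hits at k) can be sketched as below. The scores are made up, the function names are hypothetical, and ties are resolved pessimistically by assumption. The sketch covers only the conventional metrics, not the adjusted mean rank or the statistical tests proposed in the paper.

```python
def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the positive triple; ties count against it (pessimistic)."""
    return 1 + sum(1 for s in neg_scores if s >= pos_score)

def mean_rank(ranks):
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical scores: (positive score, scores of its negative counterparts).
queries = [
    (0.9, [0.1, 0.5, 0.95]),            # rank 2; worst possible rank is 4
    (0.8, [0.2]),                       # rank 1; worst possible rank is 2
    (0.3, [0.4, 0.5, 0.6, 0.1, 0.2]),   # rank 4; worst possible rank is 6
]
ranks = [rank_of_positive(p, n) for p, n in queries]
mr = mean_rank(ranks)
mrr = mean_reciprocal_rank(ranks)
h3 = hits_at_k(ranks, 3)
```

The three queries have different numbers of negative counterparts, so the same rank means something different in each case. This is the first shortcoming the abstract lists: plain averaging ignores the differing upper bounds.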

Boosting the Speed of Entity Alignment 10 ×: Dual Attention Matching Network with Normalized Hard Sample Mining

Seeking equivalent entities among multi-source Knowledge Graphs (KGs), also known as entity alignment (EA), is the pivotal step in KG integration. However, most existing EA methods are inefficient and scale poorly. A recent summary points out that some of them even require several days to deal with a dataset containing 200,000 nodes (DWY100K). We believe that over-complex graph encoders and inefficient negative sampling strategies are the two main reasons. In this paper, we propose a novel KG encoder, the Dual Attention Matching Network (Dual-AMN), which not only models both intra-graph and cross-graph information smartly, but also greatly reduces computational complexity. Furthermore, we propose the Normalized Hard Sample Mining Loss to smoothly select hard negative samples with reduced loss shift. The experimental results on widely used public datasets indicate that our method achieves both high accuracy and high efficiency. On DWY100K, the whole running process of our method could be finished in 1,100 seconds, at least 10× faster than previous work. Our method also outperforms previous works across all datasets, with Hits@1 and MRR improving by 6% to 13%.

Progressive, Holistic Geospatial Interlinking

Geospatial data constitute a considerable part of Semantic Web data, but at the moment, its sources are inadequately interlinked with topological relations in the Linked Open Data cloud. Geospatial Interlinking covers this gap with batch techniques that are restricted to individual topological relations, even though most operations are common for all main relations. In this work, we introduce a batch algorithm that simultaneously computes all topological relations and define the task of Progressive Geospatial Interlinking, which produces results in a pay-as-you-go manner when the available computational or temporal resources are limited. We propose two progressive algorithms and conduct a thorough experimental study over large, real datasets, demonstrating the superiority of our techniques over the current state-of-the-art.

RETA: A Schema-Aware, End-to-End Solution for Instance Completion in Knowledge Graphs

Knowledge Graph (KG) completion has been widely studied to tackle the incompleteness issue (i.e., missing facts) in modern KGs. A fact in a KG is represented as a triplet (h, r, t) linking two entities h and t via a relation r. Existing work mostly considers link prediction to solve this problem, i.e., given two elements of a triplet, predicting the missing one, such as (h, r, ?). This task, however, makes a strong assumption about the two given elements in a triplet, which have to be correlated; otherwise the predictions are meaningless, as for (Marie Curie, headquarters location, ?). In addition, the KG completion problem has also been formulated as a relation prediction task, i.e., predicting relations r for a given entity h. Without predicting t, however, this task is a step away from the ultimate goal of KG completion. Against this background, this paper studies an instance completion task suggesting r-t pairs for a given h, i.e., (h, ?, ?). We propose an end-to-end solution called RETA (as it suggests the Relation and Tail for a given head entity) consisting of two components: a RETA-Filter and a RETA-Grader. More precisely, our RETA-Filter first generates candidate r-t pairs for a given h by extracting and leveraging the schema of a KG; our RETA-Grader then evaluates and ranks the candidate r-t pairs considering the plausibility of both the candidate triplet and its corresponding schema using a newly-designed KG embedding model. We evaluate our methods against a sizable collection of state-of-the-art techniques on three real-world KG datasets. Results show that our RETA-Filter generates high-quality candidate r-t pairs, outperforming the best baseline techniques while reducing the candidate size by 10.61%-84.75% under the same candidate quality guarantees. Moreover, our RETA-Grader also significantly outperforms state-of-the-art link prediction techniques on the instance completion task by 16.25%-65.92% across different datasets.

SESSION: Session: Recommendations

A Recommender System for Crowdsourcing Food Rescue Platforms

The challenges of food waste and insecurity arise in wealthy and developing nations alike, impacting millions of livelihoods. The ongoing pandemic only exacerbates the problem. A major force to combat food waste and insecurity, food rescue (FR) organizations match food donations to the non-profits that serve low-resource communities. Since they rely on external volunteers to pick up and deliver the food, some FRs use web-based mobile applications to reach the right set of volunteers. In this paper, we propose the first machine learning based model to improve volunteer engagement in the food waste and security domain. We (1) develop a recommender system to send push notifications to the most likely volunteers for each given rescue, (2) leverage a mathematical programming based approach to diversify our recommendations, and (3) propose an online algorithm to dynamically select the volunteers to notify without the knowledge of future rescues. Our recommendation system improves the hit ratio from 44% achieved by the previous method to 73%. A pilot study of our method is scheduled to take place in the near future.

A Workflow Analysis of Context-driven Conversational Recommendation

A number of recent works have made seminal contributions to the understanding of user intent and recommender interaction in conversational recommendation. However, to date, these studies have not focused explicitly on context-driven interaction that underlies the typical use of more pervasive Question Answering (QA) focused conversational assistants like Amazon Alexa, Apple Siri, and Google Assistant. In this paper, we aim to understand a general workflow of natural context-driven conversational recommendation that arises from a pairwise study of a human user interacting with a human simulating the role of a recommender. In our analysis of this intrinsically organic human-to-human conversation, we observe a clear structure of interaction workflow consisting of a preference elicitation and refinement stage, followed by inquiry and critiquing stages after the first recommendation. To better understand the nature of these stages and the conversational flow within them, we augment existing taxonomies of intent and action to label all interactions at each stage and analyze the workflow. From this analysis, we identify distinct conversational characteristics of each stage, e.g., (i) the preference elicitation stage consists of significant iteration to clarify, refine, and obtain a mutual understanding of preferences, (ii) the inquiry and critiquing stage consists of extensive informational queries to understand features of the recommended item and to (implicitly) specify critiques, and (iii) explanation appears to drive a substantial portion of the post-recommendation interaction, suggesting that beyond the purpose of justification, explanation serves a critical role to direct the evolving conversation itself. 
Altogether, we contribute a novel qualitative and quantitative analysis of workflow in conversational recommendation that further refines our existing understanding of this important frontier of conversational systems and suggests a number of critical avenues for further research to better automate natural recommendation conversations.

Learning Intents behind Interactions with Knowledge Graph for Recommendation

Knowledge graph (KG) plays an increasingly important role in recommender systems. A recent technical trend is to develop end-to-end models founded on graph neural networks (GNNs). However, existing GNN-based models are coarse-grained in relational modeling, failing to (1) identify user-item relation at a fine-grained level of intents, and (2) exploit relation dependencies to preserve the semantics of long-range connectivity.

In this study, we explore intents behind a user-item interaction by using auxiliary item knowledge, and propose a new model, Knowledge Graph-based Intent Network (KGIN). Technically, we model each intent as an attentive combination of KG relations, encouraging the independence of different intents for better model capability and interpretability. Furthermore, we devise a new information aggregation scheme for GNN, which recursively integrates the relation sequences of long-range connectivity (i.e., relational paths). This scheme allows us to distill useful information about user intents and encode them into the representations of users and items. Experimental results on three benchmark datasets show that KGIN achieves significant improvements over state-of-the-art methods such as KGAT [41], KGNN-LS [38], and CKAN [47]. Further analyses show that KGIN offers interpretable explanations for predictions by identifying influential intents and relational paths. The implementations are available at https://github.com/huangtinglin/Knowledge_Graph_based_Intent_Network.

Rabbit Holes and Taste Distortion: Distribution-Aware Recommendation with Evolving Interests

To mitigate the rabbit hole effect in recommendations, conventional distribution-aware recommendation systems aim to ensure that a user’s prior interest areas are reflected in the recommendations that the system makes. For example, a user who historically prefers comedies to dramas by 2:1 should see a similar ratio in recommended movies. Such approaches have proven to be an important building block for recommendation tasks. However, existing distribution-aware approaches enforce that the target taste distribution should exactly match a user’s prior interests (typically revealed through training data), based on the assumption that users’ taste distribution is fundamentally static. This assumption can lead to large estimation errors. We empirically identify this taste distortion problem through a data-driven study over multiple datasets. We show how taste preferences dynamically shift and how a calibration mechanism should be designed with these shifts in mind. We further demonstrate how to incorporate these shifts into a taste-enhanced calibrated recommender system, which simultaneously mitigates both the rabbit hole effect and the taste distortion problem.
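
The conventional, static form of calibration that this abstract argues against can be illustrated with the 2:1 comedy-to-drama example. The sketch below compares the genre distribution of a user's history with that of a recommendation list using KL divergence. The helper names, the smoothing weight `a` and the toy lists are assumptions made for illustration. It does not model the taste shifts the paper studies.

```python
import math
from collections import Counter

def genre_distribution(genres):
    """Relative frequency of each genre in a list of genre labels."""
    counts = Counter(genres)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def miscalibration(p, q, a=0.01):
    """KL(p || q~), where q~ = (1 - a) * q + a * p keeps the divergence finite."""
    total = 0.0
    for g, pv in p.items():
        q_smoothed = (1 - a) * q.get(g, 0.0) + a * pv
        total += pv * math.log(pv / q_smoothed)
    return total

history = ["comedy"] * 4 + ["drama"] * 2      # comedies to dramas, 2:1
calibrated = ["comedy", "comedy", "drama"]    # same 2:1 ratio
skewed = ["comedy", "comedy", "comedy"]       # the "rabbit hole" list

p = genre_distribution(history)
kl_cal = miscalibration(p, genre_distribution(calibrated))
kl_skew = miscalibration(p, genre_distribution(skewed))
```

The list that preserves the 2:1 ratio scores essentially zero, while the all-comedy list scores well above it. A static target like `p` is exactly what becomes stale if the user's tastes drift after the training period.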

DeepRec: On-device Deep Learning for Privacy-Preserving Sequential Recommendation in Mobile Commerce

Sequential recommendation techniques are considered to be a promising way of providing better user experience in mobile commerce by learning sequential interests within user historical interaction behaviors. However, the recently increasing focus on privacy concerns, such as the General Data Protection Regulation (GDPR), can significantly affect the deployment of state-of-the-art sequential recommendation techniques, because user behavior data are no longer allowed to be arbitrarily used without the user’s explicit permission. To address the issue, this paper proposes DeepRec, an on-device deep learning framework for mining interaction behaviors for sequential recommendation without sending any raw data or intermediate results out of the device, preserving user privacy maximally. DeepRec constructs a global model using data collected before GDPR and fine-tunes a personal model continuously on individual mobile devices using data collected after GDPR. DeepRec employs model pruning and embedding sparsity techniques to reduce the computation and network overhead, making the model training process practical on computation-constrained mobile devices. Evaluation results show that DeepRec can achieve comparable recommendation accuracy to existing centralized recommendation approaches with small computation overhead and up to 10x reduction in network overhead.

SESSION: Session: Federated Learning

Meta-HAR: Federated Representation Learning for Human Activity Recognition

Human activity recognition (HAR) based on mobile sensors plays an important role in ubiquitous computing. However, the rise of data regulatory constraints precludes collecting private and labeled signal data from personal devices at scale. Thanks to the growth of computational power on mobile devices, federated learning has emerged as a decentralized alternative solution to model training, which iteratively aggregates locally updated models into a shared global model, therefore being able to leverage decentralized, private data without central collection. However, the effectiveness of federated learning for HAR is affected by the fact that each user has different activity types and even a different signal distribution for the same activity type. Furthermore, it is uncertain if a single global model trained can generalize well to individual users or new users with heterogeneous data. In this paper, we propose Meta-HAR, a federated representation learning framework, in which a signal embedding network is meta-learned in a federated manner, while the learned signal representations are further fed into a personalized classification network at each user for activity prediction. In order to boost the representation ability of the embedding network, we treat the HAR problem at each user as a different task and train the shared embedding network through a Model-Agnostic Meta-learning framework, such that the embedding network can generalize to any individual user. Personalization is further achieved on top of the robustly learned representations in an adaptation procedure. We conducted extensive experiments based on two publicly available HAR datasets as well as a newly created HAR dataset. Results verify that Meta-HAR is effective at maintaining high test accuracies for individual users, including new users, and significantly outperforms several baselines, including Federated Averaging, Reptile and even centralized learning in certain cases. 
Our collected dataset will be open-sourced to facilitate future development in the field of sensor-based human activity recognition.
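
For readers unfamiliar with the aggregation step that federated learning relies on, the following is a minimal sketch of the weighted averaging used by Federated Averaging, one of the baselines named above. It is not Meta-HAR itself. The flat parameter lists, the client sizes and the function name `federated_average` are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]

# Two hypothetical clients with locally updated parameters and data sizes.
clients = [[1.0, 0.0], [3.0, 2.0]]
sizes = [1, 3]
global_model = federated_average(clients, sizes)
```

The client with three times as much data pulls the global parameters three times as strongly. When clients have heterogeneous signal distributions, as the abstract notes for HAR, a single average of this kind may fit no individual user well, which motivates the personalized classification network.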

PFA: Privacy-preserving Federated Adaptation for Effective Model Personalization

Federated learning (FL) has become a prevalent distributed machine learning paradigm with improved privacy. After learning, the resulting federated model should be further personalized to each different client. While several methods have been proposed to achieve personalization, they are typically limited to a single local device, which may incur bias or overfitting since data in a single device is extremely limited. In this paper, we attempt to realize personalization beyond a single client. The motivation is that during the FL process, there may exist many clients with similar data distributions, and thus the personalization performance could be significantly boosted if these similar clients can cooperate with each other. Inspired by this, this paper introduces a new concept called federated adaptation, which aims to adapt the trained model in a federated manner to achieve better personalization results. However, the key challenge for federated adaptation is that we cannot outsource any raw data from the client during adaptation, due to privacy concerns. In this paper, we propose PFA, a framework to accomplish Privacy-preserving Federated Adaptation. PFA leverages the sparsity property of neural networks to generate privacy-preserving representations and uses them to efficiently identify clients with similar data distributions. Based on the grouping results, PFA conducts an FL process in a group-wise way on the federated model to accomplish the adaptation. For evaluation, we manually construct several practical FL datasets based on public datasets in order to simulate both the class-imbalance and background-difference conditions. Extensive experiments on these datasets and popular model architectures demonstrate the effectiveness of PFA, outperforming other state-of-the-art methods by a large margin while ensuring user privacy. We will release our code at: https://github.com/lebyni/PFA.

Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm, drawing tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the various hardware specifications and dynamic states across the participating devices. Theoretically, heterogeneity can exert a huge influence on the FL training process, e.g., making a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in existing FL literature.

In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that can faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol but with heterogeneity in consideration. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, 2.32× longer training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential factors for performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers account for heterogeneity during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.

Incentive Mechanism for Horizontal Federated Learning Based on Reputation and Reverse Auction

Current research on federated learning mainly focuses on joint optimization, improving efficiency and effectiveness, and protecting privacy. However, there are relatively few studies on incentive mechanisms. Most studies fail to consider the fact that if there is no profit, participants have no incentive to provide data and training models, and task requesters cannot identify and select reliable participants with high-quality data. Therefore, this paper proposes a federated learning incentive mechanism based on reputation and reverse auction theory. Participants bid for tasks, and reputation indirectly reflects their reliability and data quality. In this federated learning program, we select and reward participants by combining the reputation and bids of the participants under a limited budget. Theoretical analysis proves that the mechanism satisfies computational efficiency, individual rationality, budget feasibility, and truthfulness. The simulation results show the effectiveness of the mechanism.
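
As a rough illustration of selecting participants by combining reputation and bids under a limited budget, the following Python sketch ranks bids by reputation per unit of asking price and admits them greedily. The scoring rule, the greedy order, the pay-your-bid payment, and all names and values are assumptions made for illustration only. This is not the paper's mechanism, and this simple rule does not by itself guarantee truthfulness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    participant: str
    price: float       # asking price for the task, assumed > 0
    reputation: float  # assumed to lie in (0, 1]; higher means more reliable


def select_participants(bids, budget):
    """Greedily admit bids with the best reputation-per-cost within the budget."""
    ranked = sorted(bids, key=lambda b: b.reputation / b.price, reverse=True)
    winners, spent = [], 0.0
    for bid in ranked:
        # Skip any bid that would push total payments over the budget.
        if spent + bid.price <= budget:
            winners.append(bid.participant)
            spent += bid.price
    return winners, spent


# Hypothetical bids, not data from the paper.
bids = [
    Bid("a", price=4.0, reputation=0.9),
    Bid("b", price=2.0, reputation=0.8),
    Bid("c", price=5.0, reputation=0.5),
    Bid("d", price=3.0, reputation=0.3),
]
winners, spent = select_participants(bids, budget=7.0)
```

With these values, participants "b" and "a" are admitted at a total cost of 6.0, and the remaining bids are rejected because either would exceed the budget of 7.0.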

Hierarchical Personalized Federated Learning for User Modeling

User modeling aims to capture the latent characteristics of users from their behaviors, and is widely applied in numerous applications. Usually, centralized user modeling suffers from the risk of privacy leakage. Instead, federated user modeling expects to provide a secure multi-client collaboration for user modeling through federated learning. Existing federated learning methods are mainly designed for consistent clients, which cannot be directly applied to practical scenarios, where different clients usually store inconsistent user data. Designing a federated solution that adapts to user modeling tasks is therefore crucial, but it faces the following critical challenges: 1) Statistical heterogeneity. The distributions of user data in different clients are not always independent and identically distributed, which leads to personalized clients; 2) Privacy heterogeneity. User data contains both public and private information, which have different levels of privacy. This means we should balance the information to be shared against the information to be protected; 3) Model heterogeneity. The local user models trained with client records are heterogeneous, which requires flexible aggregation on the server. In this paper, we propose a novel client-server architecture framework, namely Hierarchical Personalized Federated Learning (HPFL), to serve federated learning in user modeling with inconsistent clients. In the framework, we first define hierarchical information to finely partition the data with privacy heterogeneity. On this basis, the client trains a user model which contains different components designed for hierarchical information. Moreover, the client applies a fine-grained personalized update strategy to update the personalized user model for statistical heterogeneity. Correspondingly, the server completes a differentiated component aggregation strategy to flexibly aggregate heterogeneous user models in the case of privacy and model heterogeneity. Finally, we conduct extensive experiments on real-world datasets, which demonstrate the effectiveness of the HPFL framework.

SESSION: Session: Security

Data Poisoning Attacks and Defenses to Crowdsourcing Systems

A key challenge of big data analytics is how to collect a large volume of (labeled) data. Crowdsourcing aims to address this challenge via aggregating and estimating high-quality data (e.g., sentiment label for text) from pervasive clients/users. Existing studies on crowdsourcing focus on designing new methods to improve the aggregated data quality from unreliable/noisy clients. However, the security aspects of such crowdsourcing systems remain under-explored to date. We aim to bridge this gap in this work. Specifically, we show that crowdsourcing is vulnerable to data poisoning attacks, in which malicious clients provide carefully crafted data to corrupt the aggregated data. We formulate our proposed data poisoning attacks as an optimization problem that maximizes the error of the aggregated data. Our evaluation results on one synthetic and two real-world benchmark datasets demonstrate that the proposed attacks can substantially increase the estimation errors of the aggregated data. We also propose two defenses to reduce the impact of malicious clients. Our empirical results show that the proposed defenses can substantially reduce the estimation errors of the data poisoning attacks.
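
The attacks and defenses in this paper are optimization-based and are not reproduced here. The toy Python sketch below only illustrates the underlying vulnerability the abstract describes: when client-provided values are aggregated with a plain mean, a few malicious clients can shift the result, whereas a median is less affected. The ratings, the number of malicious clients, and the choice of mean and median as aggregators are all assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical ratings of one item from five honest clients.
honest = [4.0, 4.2, 3.9, 4.1, 4.0]

# Two malicious clients report an extreme value to drag the aggregate upward.
poisoned = honest + [10.0, 10.0]

mean_clean = mean(honest)            # aggregate without attackers
mean_poisoned = mean(poisoned)       # mean aggregation under attack
median_poisoned = median(poisoned)   # median aggregation under attack

# Estimation error introduced by the attack under each aggregator.
mean_shift = abs(mean_poisoned - mean_clean)
median_shift = abs(median_poisoned - median(honest))
```

In this toy setting the mean moves by well over one rating point, while the median moves by about a tenth of a point.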

Deepfake Videos in the Wild: Analysis and Detection

AI-manipulated videos, commonly known as deepfakes, are an emerging problem. Recently, researchers in academia and industry have contributed several (self-created) benchmark deepfake datasets, and deepfake detection algorithms. However, little effort has gone towards understanding deepfake videos in the wild, leading to a limited understanding of the real-world applicability of research contributions in this space. Even if detection schemes are shown to perform well on existing datasets, it is unclear how well the methods generalize to real-world deepfakes. To bridge this gap in knowledge, we make the following contributions: First, we collect and present the largest dataset of deepfake videos in the wild, containing 1,869 videos from YouTube and Bilibili, and extract over 4.8M frames of content. Second, we present a comprehensive analysis of the growth patterns, popularity, creators, manipulation strategies, and production methods of deepfake content in the real-world. Third, we systematically evaluate existing defenses using our new dataset, and observe that they are not ready for deployment in the real-world. Fourth, we explore the potential for transfer learning schemes and competition-winning techniques to improve defenses.

RIGA: Covert and Robust White-Box Watermarking of Deep Neural Networks

Watermarking of deep neural networks (DNN) can enable their tracing once released by a data owner to an online platform. In this paper, we generalize white-box watermarking algorithms for DNNs, where the data owner needs white-box access to the model to extract the watermark. White-box watermarking algorithms have the advantage that they do not impact the accuracy of the watermarked model. We propose Robust whIte-box GAn watermarking (RIGA), a novel white-box watermarking algorithm that uses adversarial training. Our extensive experiments demonstrate that the proposed watermarking algorithm not only does not impact accuracy, but also significantly improves the covertness and robustness over the current state-of-art.

CoResident Evil: Covert Communication In The Cloud With Lambdas

“Serverless” cloud services, such as AWS lambdas, are one of the fastest growing segments of the cloud services market. These services are popular in part due to their light-weight nature and flexibility in scheduling and cost; however, the security issues associated with serverless computing are not well understood. In this work, we explore the feasibility of constructing a practical covert channel from lambdas. We establish that a fast co-residence detection for lambdas is key to enabling such a covert channel, and proceed to develop a reliable and scalable co-residence detector based on the memory bus hardware. Our technique enables dynamic discovery for co-resident lambdas and is incredibly fast, executing in a matter of seconds. We evaluate our approach for correctness and scalability, and use it to establish covert channels and perform data transfer on AWS lambdas. We show that we can establish hundreds of individual covert channels for every 1000 lambdas deployed, and each of those channels can send data at a rate of 00 bits per second, thus demonstrating that covert communication via lambdas is entirely feasible.

DAPter: Preventing User Data Abuse in Deep Learning Inference Services

The data abuse issue has risen along with the widespread development of the deep learning inference service (DLIS). Specifically, mobile users worry about their input data being labeled to secretly train new deep learning models that are unrelated to the DLIS they subscribe to. This unique issue, unlike the privacy problem, is about the rights of data owners in the context of deep learning. However, preventing data abuse is demanding when considering the usability and generality in the mobile scenario. In this work, we propose, to our best knowledge, the first data abuse prevention mechanism called DAPter. DAPter is a user-side DLIS-input converter, which removes unnecessary information with respect to the targeted DLIS. The converted input data by DAPter maintains good inference accuracy and is difficult to be labeled manually or automatically for the new model training. DAPter’s conversion is empowered by our lightweight generative model trained with a novel loss function to minimize abusable information in the input data. Furthermore, adapting DAPter requires no change in the existing DLIS backend and models. We conduct comprehensive experiments with our DAPter prototype on mobile devices and demonstrate that DAPter can substantially raise the bar of the data abuse difficulty with little impact on the service quality and overhead.

SESSION: Session: Information retrieval

Cross-lingual Language Model Pretraining for Retrieval

Existing research on cross-lingual retrieval cannot take good advantage of large-scale pretrained language models such as multilingual BERT and XLM. We hypothesize that the absence of cross-lingual passage-level relevance data for finetuning and the lack of query-document style pretraining are key factors of this issue. In this paper, we introduce two novel retrieval-oriented pretraining tasks to further pretrain cross-lingual language models for downstream retrieval tasks such as cross-lingual ad-hoc retrieval (CLIR) and cross-lingual question answering (CLQA). We construct distant supervision data from multilingual Wikipedia using section alignment to support retrieval-oriented language model pretraining. We also propose to directly finetune language models on part of the evaluation collection by making Transformers capable of accepting longer sequences. Experiments on multiple benchmark datasets show that our proposed model can significantly improve upon general multilingual language models in both the cross-lingual retrieval setting and the cross-lingual transfer setting.

Match Plan Generation in Web Search with Parameterized Action Reinforcement Learning

To achieve good result quality and short query response time, search engines use specific match plans on the Inverted Index to help retrieve a small set of relevant documents from billions of web pages. A match plan is composed of a sequence of match rules, which contain discrete match rule types and continuous stopping quotas. Currently, match plans are manually designed by experts based on years of experience, and such plans have difficulty dealing with heterogeneous queries and varying data distributions. In this work, we formulate the match plan generation as a Partially Observable Markov Decision Process (POMDP) with a parameterized action space, and propose a novel reinforcement learning algorithm, Parameterized Action Soft Actor-Critic (PASAC), to effectively enhance the exploration in both spaces. In our setting, we also discover a skew prioritizing issue of the original Prioritized Experience Replay (PER) and introduce Stratified Prioritized Experience Replay (SPER) to address it. We are the first group to generalize this task for all queries as a learning problem with zero prior knowledge and successfully apply deep reinforcement learning in the real web search environment. Our approach greatly outperforms the well-designed production match plans by over 70% reduction of index block accesses with the quality of documents almost unchanged, and 9% reduction of query response time even with model inference cost. Our method also beats the baselines on some open-source benchmarks.

A Linguistic Study on Relevance Modeling in Information Retrieval

Relevance plays a central role in information retrieval (IR), which has been studied extensively since the 20th century. The definition and the modeling of relevance have always been critical challenges in both information science and computer science research areas. Along with the debate and exploration on relevance, IR has already become a core task in many real-world applications, such as Web search engines, question answering systems, conversational bots, and so on. While relevance acts as a unified concept in all these retrieval tasks, the inherent definitions are quite different due to the heterogeneity of these tasks. This raises a question: Do these different forms of relevance really lead to different modeling focuses? To answer this question, in this work, we conduct an empirical study on relevance modeling in three representative IR tasks, i.e., document retrieval, answer retrieval, and response retrieval. Specifically, we attempt to study the following two questions: 1) Does relevance modeling in these tasks really show differences in terms of natural language understanding (NLU)? We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question. 2) If there do exist differences, how can we leverage the findings to enhance the relevance modeling? We propose three intervention methods to investigate how to leverage different modeling focuses of relevance to improve these IR tasks. We believe the way we study the problem as well as our findings would be beneficial to the IR community.

Estimation of Fair Ranking Metrics with Incomplete Judgments

There is increasing attention to evaluating the fairness of search system ranking decisions. Fair ranking metrics often consider the membership of items in particular groups, typically identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large scale systems. In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics. We formulate a robust and unbiased estimator which can operate even with a very limited number of labeled items. We evaluate our approach using both simulated and real world data. Our experimental results demonstrate that our method can estimate this family of fair ranking metrics and provides a robust, reliable alternative to exhaustive or random data annotation.

Pivot-based Candidate Retrieval for Cross-lingual Entity Linking

Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). In XEL, entity candidate retrieval needs to retrieve a list of plausible candidate entities from a large knowledge graph in a target language given a piece of text in a sentence or question, namely a mention, in a source language. Existing works mainly fall into two categories: lexicon-based and semantic-based approaches. The lexicon-based approach usually creates cross-lingual and mention-entity lexicons, which is effective but relies heavily on bilingual resources (e.g., inter-language links in Wikipedia). The semantic-based approach maps mentions and entities in different languages to a unified embedding space, which reduces dependence on large-scale bilingual dictionaries. However, its effectiveness is limited by the representation capacity of fixed-length vectors. In this paper, we propose a pivot-based approach which inherits the advantages of the aforementioned two approaches while avoiding their limitations. It takes an intermediary set of plausible target-language mentions as pivots to bridge two types of gaps: the cross-lingual gap and the mention-entity gap. Specifically, it first converts mentions in the source language into an intermediary set of plausible mentions in the target language by cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities based on the generated mentions by lexical retrieval. The proposed approach only relies on a small bilingual word dictionary, and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning over 11 languages show that the pivot-based approach outperforms both the lexicon-based and semantic-based approaches by a large margin.

SESSION: Session: Online Conversations

The Structure of Toxic Conversations on Twitter

Social media platforms promise to enable rich and vibrant conversations online; however, their potential is often hindered by antisocial behaviors. In this paper, we study the relationship between structure and toxicity in conversations on Twitter. We collect 1.18M conversations (58.5M tweets, 4.4M users) prompted by tweets that are posted by or mention major news outlets over one year and candidates who ran in the 2018 US midterm elections over four months. We analyze the conversations at the individual, dyad, and group level. At the individual level, we find that toxicity is spread across many low to moderately toxic users. At the dyad level, we observe that toxic replies are more likely to come from users who do not have any social connection nor share many common friends with the poster. At the group level, we find that toxic conversations tend to have larger, wider, and deeper reply trees, but sparser follow graphs. To test the predictive power of the conversational structure, we consider two prediction tasks. In the first prediction task, we demonstrate that the structural features can be used to predict whether the conversation will become toxic as early as the first ten replies. In the second prediction task, we show that the structural characteristics of the conversation are also predictive of whether the next reply posted by a specific user will be toxic or not. We observe that the structural and linguistic characteristics of the conversations are complementary in both prediction tasks. Our findings inform the design of healthier social media platforms and demonstrate that models based on the structural characteristics of conversations can be used to detect early signs of toxicity and potentially steer conversations in a less toxic direction.
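
To make the group-level structural features concrete, the following Python sketch computes the size, depth, and width of a reply tree. The input format (a mapping from each tweet to the tweet it replies to) and the exact definitions of depth and width are assumptions for illustration; the paper may define these features differently.

```python
from collections import Counter


def reply_tree_stats(parent_of):
    """Return (size, depth, width) for a reply tree given as child -> parent.

    The root tweet maps to None. Depth is the number of edges on the longest
    root-to-leaf path; width is the largest number of tweets at one level.
    """
    def level(node):
        # Walk up to the root, counting edges along the way.
        steps = 0
        while parent_of[node] is not None:
            node = parent_of[node]
            steps += 1
        return steps

    per_level = Counter(level(node) for node in parent_of)
    return len(parent_of), max(per_level), max(per_level.values())


# Hypothetical conversation: three direct replies, one of which starts a chain.
conversation = {
    "root": None,
    "r1": "root",
    "r2": "root",
    "r5": "root",
    "r3": "r1",
    "r4": "r3",
}
size, depth, width = reply_tree_stats(conversation)
```

For this conversation the tree has 6 tweets, a depth of 3 (root, r1, r3, r4), and a width of 3 (the three direct replies).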

Interventions for Softening Can Lead to Hardening of Opinions: Evidence from a Randomized Controlled Trial

Motivated by the goal of designing interventions for softening polarized opinions on the Web, and building on results from psychology, we hypothesized that people would be moved more easily towards opposing opinions when the latter were voiced by a celebrity they like, rather than by a celebrity they dislike. We tested this hypothesis in a survey-based randomized controlled trial in which we exposed respondents to opinions that were randomly assigned to one of four spokespersons each: a disagreeing but liked celebrity, a disagreeing and disliked celebrity, a disagreeing expert, and an agreeing but disliked celebrity. After the treatment, we measured changes in the respondents’ opinions, empathy towards the spokespersons, and use of affective language.

Contrary to our hypothesis, no softening of opinions was observed regardless of the respondents’ attitudes towards the celebrity. Instead, we found strong evidence of a hardening of pretreatment opinions when a disagreeing opinion was attributed to an expert or when an agreeing opinion was attributed to a disliked celebrity. We also observed a pronounced reduction in empathy for disagreeing spokespersons, indicating a punitive response. The only celebrity for whom, on average, empathy remained unchanged was the one who agreed, even though they were disliked.

Our results could be explained as a reaction to violated expectations towards experts and as a perceived breach of trust by liked celebrities. They confirm that naïve strategies at mediation may not yield intended results, and how difficult it is to depolarize—and how easy it is to further polarize or provoke emotional responses.

“Short is the Road that Leads from Fear to Hate”: Fear Speech in Indian WhatsApp Groups

WhatsApp is the most popular messaging app in the world. Due to its popularity, WhatsApp has become a powerful and cheap tool for political campaigning; it was widely used during the 2019 Indian general election to connect with voters on a large scale. Along with the campaigning, there have been reports that WhatsApp has also become a breeding ground for harmful speech against various protected groups and religious minorities. Many such messages attempt to instil fear among the population about a specific (minority) community. According to research on inter-group conflict, such ‘fear speech’ messages could have a lasting impact and might lead to real offline violence. In this paper, we perform the first large scale study on fear speech across thousands of public WhatsApp groups discussing politics in India. We curate a new dataset and try to characterize fear speech from this dataset. We observe that users writing fear speech messages use various events and symbols to create the illusion of fear among the reader about a target community. We build models to classify fear speech and observe that current state-of-the-art NLP models do not perform well at this task. Fear speech messages tend to spread faster and could potentially go undetected by classifiers built to detect traditional toxic speech due to their low toxic nature. Finally, using a novel methodology to target users with Facebook ads, we conduct a survey among the users of these WhatsApp groups to understand the types of users who consume and share fear speech. We believe that this work opens up new research questions that are very different from tackling hate speech, which the research community has traditionally been involved in. We have made our code and dataset public for other researchers.

“Go eat a bat, Chang!”: On the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19

The outbreak of the COVID-19 pandemic has changed our lives in unprecedented ways. In the face of the projected catastrophic consequences, most countries have enacted social distancing measures in an attempt to limit the spread of the virus. Under these conditions, the Web has become an indispensable medium for information acquisition, communication, and entertainment. At the same time, unfortunately, the Web is being exploited for the dissemination of potentially harmful and disturbing content, such as the spread of conspiracy theories and hateful speech towards specific ethnic groups, in particular towards Chinese people and people of Asian descent since COVID-19 is believed to have originated from China.

In this paper, we make a first attempt to study the emergence of Sinophobic behavior on the Web during the outbreak of the COVID-19 pandemic. We collect two large datasets from Twitter and 4chan’s Politically Incorrect board (/pol/) over a time period of approximately five months and analyze them to investigate whether there is a rise or important differences with regard to the dissemination of Sinophobic content. We find that COVID-19 indeed drives the rise of Sinophobia on the Web and that the dissemination of Sinophobic content is a cross-platform phenomenon: it exists on fringe Web communities like /pol/, and to a lesser extent on mainstream ones like Twitter. Using word embeddings over time, we characterize the evolution of Sinophobic slurs on both Twitter and /pol/. Finally, we find interesting differences in the context in which words related to Chinese people are used on the Web before and after the COVID-19 outbreak: on Twitter we observe a shift towards blaming China for the situation, while on /pol/ we find a shift towards using more (and new) Sinophobic slurs.

Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations

Online conversations can go in many directions: some turn out poorly due to antisocial behavior, while others turn out positively to the benefit of all. Research on improving online spaces has focused primarily on detecting and reducing antisocial behavior. Yet we know little about positive outcomes in online conversations and how to increase them—is a prosocial outcome simply the lack of antisocial behavior or something more? Here, we examine how conversational features lead to prosocial outcomes within online discussions. We introduce a series of new theory-inspired metrics to define prosocial outcomes such as mentoring and esteem enhancement. Using a corpus of 26M Reddit conversations, we show that these outcomes can be forecasted from the initial comment of an online conversation, with the best model providing a relative 24% improvement over human forecasting performance at ranking conversations for predicted outcome. Our results indicate that platforms can use these early cues in their algorithmic ranking of early conversations to prioritize better outcomes.

SESSION: Session: Systems and Infrastructure

DF-TAR: A Deep Fusion Network for Citywide Traffic Accident Risk Prediction with Dangerous Driving Behavior

Because traffic accidents cause huge social and economic losses, it is of prime importance to precisely predict the traffic accident risk for reducing future accidents. In this paper, we propose a Deep Fusion network for citywide Traffic Accident Risk prediction (DF-TAR) with dangerous driving statistics that contain the frequencies of various dangerous driving offences in each region. Our unique contribution is to exploit these statistics, obtained by processing the data from in-vehicle sensors, for modeling the traffic accident risk. Toward this goal, we first examine the correlation between dangerous driving offences and traffic accidents, and the analysis shows a strong correlation between them in terms of both location and time. Specifically, quick start (0.83), rapid acceleration (0.76), and sharp turn (0.76) are the top three offences that have the highest average correlation scores. We then train the DF-TAR model using the dangerous driving statistics as well as external environmental features. By extensive experiments on various frameworks, the DF-TAR model is shown to improve the accuracy of the baseline models by up to 54% by virtue of the integration of dangerous driving into the modeling of traffic accident risk.

Dissecting Performance of Production QUIC

IETF QUIC, the standardized version of Google’s UDP-based layer-4 network protocol, has seen increasing adoption from large Internet companies for its benefits over TCP. Yet despite its rapid adoption, performance analysis of QUIC in production is scarce. Most existing analyses have only used unoptimized open-source QUIC servers on non-tuned kernels: these analyses are unrepresentative of production deployments which raises the question of whether QUIC actually outperforms TCP in practice.

In this paper, we conduct one of the first comparative studies on the performance of QUIC and TCP against production endpoints hosted by Google, Facebook, and Cloudflare under various dimensions: network conditions, workloads, and client implementations.

To understand our results, we create a tool to systematically visualize the root causes of performance differences between the two protocols. Using our tool we make several key observations. First, while QUIC has some inherent advantages over TCP, such as worst-case 1-RTT handshakes, its overall performance is largely determined by the server’s choice of congestion-control algorithm and the robustness of its congestion-control implementation under edge-case network scenarios. Second, we find that some QUIC clients require non-trivial configuration tuning in order to achieve optimal performance. Lastly, we demonstrate that QUIC’s removal of head-of-line (HOL) blocking has little impact on web-page performance in practice. Taken together, our observations illustrate the fact that QUIC’s performance is inherently tied to implementation design choices, bugs, and configurations which implies that QUIC measurements are not always a reflection of the protocol and often do not generalize across deployments.

XY-Sketch: on Sketching Data Streams at Web Scale

Conventional sketching methods for counting stream item frequencies use hash functions to map data items to a concise structure, e.g., a two-dimensional array, at the expense of overcounting due to hash collisions. Despite their popularity, the accumulated errors originating from hash collisions degrade sketching accuracy as the data volume grows rapidly, which poses a great challenge to sketching big data streams at web scale. In this paper, we propose a novel structure, called XY-sketch, which estimates the frequency of a data item by estimating the probability of this item appearing in the data stream. The framework associated with XY-sketch consists of two phases, namely the decomposition and recomposition phases. A data item is split into a set of compactly stored basic elements, which can be strung together in a probabilistic manner for query evaluation during the recomposition phase. Throughout, we conduct optimization under space constraints and detailed theoretical analysis. Experiments on both real and synthetic datasets show superior scalability on sketching large-scale streams. Remarkably, XY-sketch is orders of magnitude more accurate than existing solutions when the space budget is small.
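
For context, the conventional approach the abstract contrasts against can be sketched with a Count-Min sketch, a well-known structure that hashes each item into one counter per row of a two-dimensional array. The Python code below is a minimal illustration of that baseline, not of XY-sketch itself; the table dimensions, the use of salted BLAKE2b as the hash family, and the example stream are assumptions.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min sketch: a depth x width table of counters."""

    def __init__(self, depth=4, width=64):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across rows
        # never underestimates the true frequency.
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


sketch = CountMinSketch()
stream = ["a"] * 5 + ["b"] * 3 + ["c"]
for item in stream:
    sketch.add(item)
```

Here `sketch.estimate("a")` returns at least 5; any excess over the true count comes from hash collisions, which is the overcounting error that grows as more items are inserted into a fixed-size table.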

NTAM: Neighborhood-Temporal Attention Model for Disk Failure Prediction in Cloud Platforms

With the rapid deployment of cloud platforms, high service reliability is of critical importance. An industrial cloud platform contains a huge number of disks, and disk failure is a common cause of service unreliability. In recent years, many machine learning based disk failure prediction approaches have been proposed, and they can predict disk failures based on disk status data before the failures actually happen. In this way, proactive actions can be taken in advance to improve service reliability. However, existing approaches treat each disk individually and do not explore the influence of the neighboring disks. In this paper, we propose Neighborhood-Temporal Attention Model (NTAM), a novel deep learning based approach to disk failure prediction. When predicting whether or not a disk will fail in the near future, NTAM not only utilizes a disk’s own status data, but also considers its neighbors’ status data. Moreover, NTAM includes a novel attention-based temporal component to capture the temporal nature of the disk status data. In addition, we propose a data enhancement method, called Temporal Progressive Sampling (TPS), to handle the extreme data imbalance issue. We evaluate NTAM on a public dataset as well as two industrial datasets collected from millions of disks in Microsoft Azure. Our experimental results show that NTAM significantly outperforms state-of-the-art competitors. Also, our empirical evaluations indicate the effectiveness of the neighborhood-aware component and the temporal component underlying NTAM as well as the effectiveness of TPS. More encouragingly, we have successfully applied NTAM and TPS to Microsoft cloud platforms (including Microsoft Azure and Microsoft 365) and obtained benefits in industrial practice.

WebSocket Adoption and the Landscape of the Real-Time Web

Developers are increasingly deploying web applications which require real-time bidirectional updates, a use case which does not naturally align with the traditional client-server architecture of the web. Many solutions have arisen to address this need over the preceding decades, including HTTP polling, Server-Sent Events, and WebSockets. This paper investigates this ecosystem and reports on the prevalence, benefits, and drawbacks of these technologies, with a particular focus on the adoption of WebSockets. We crawl the Tranco Top 1 Million websites to build a dataset for studying real-time updates in the wild. We find that HTTP Polling remains significantly more common than WebSockets, and WebSocket adoption appears to have stagnated in the past two to three years. We investigate some of the possible reasons for this decrease in the rate of adoption, and we contrast the adoption process to that of other web technologies. Our findings further suggest that even when WebSockets are employed, the prescribed best practices for securing them are often disregarded. The dataset is made available in the hopes that it may help inform the development of future real-time solutions for the web.

SESSION: Session: Graph Models

Theoretically Improving Graph Neural Networks via Anonymous Walk Graph Kernels

Graph neural networks (GNNs) have achieved tremendous success in graph mining. However, the inability of GNNs to model substructures in graphs remains a significant drawback. Specifically, message-passing GNNs (MPGNNs), as the prevailing type of GNNs, have been theoretically shown to be unable to distinguish, detect or count many graph substructures. While efforts have been made to compensate for this inability, existing works either rely on pre-defined substructure sets, thus being less flexible, or lack theoretical insights. In this paper, we propose GSKN, a GNN model with a theoretically stronger ability to distinguish graph structures. Specifically, we design GSKN based on anonymous walks (AWs), flexible substructure units, and derive it upon feature mappings of graph kernels (GKs). We theoretically show that GSKN provably extends the 1-WL test, and hence the maximally powerful MPGNNs from both graph-level and node-level viewpoints. Correspondingly, various experiments are conducted to evaluate GSKN, where GSKN outperforms a wide range of baselines, endorsing the analysis.
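As a small illustration of the anonymous walk (AW) idea mentioned above, the sketch below maps a concrete walk to its anonymous form by replacing each node with the order in which it first appears. The function name and the 1-based indexing are assumptions made for illustration; the sketch does not reproduce GSKN or its kernel feature mappings.

```python
from typing import Hashable, Sequence, Tuple


def anonymous_walk(walk: Sequence[Hashable]) -> Tuple[int, ...]:
    """Replace each node in a walk by the rank of its first occurrence (1-based)."""
    first_seen = {}
    pattern = []
    for node in walk:
        if node not in first_seen:
            # A node seen for the first time gets the next unused index.
            first_seen[node] = len(first_seen) + 1
        pattern.append(first_seen[node])
    return tuple(pattern)


# Two walks over different nodes share one anonymous walk when they
# revisit nodes in the same pattern.
print(anonymous_walk(["u", "v", "w", "v"]))
print(anonymous_walk(["x", "y", "z", "y"]))
```

Because node identities are discarded, the anonymous form describes only the local structure that the walk traverses, which is what makes it usable as a flexible substructure unit.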

Interpreting and Unifying Graph Neural Networks with An Optimization Framework

Graph Neural Networks (GNNs) have received considerable attention on graph-structured data learning for a wide variety of tasks. The well-designed propagation mechanism, which has been demonstrated to be effective, is the most fundamental part of GNNs. Although most GNNs basically follow a message passing manner, little effort has been made to discover and analyze their essential relations. In this paper, we establish a surprising connection between different propagation mechanisms with a unified optimization problem, showing that despite the proliferation of various GNNs, in fact, their proposed propagation mechanisms are the optimal solution optimizing a feature fitting function over a wide class of graph kernels with a graph regularization term. Our proposed unified optimization framework, summarizing the commonalities between several of the most representative GNNs, not only provides a macroscopic view on surveying the relations between different GNNs, but also further opens up new opportunities for flexibly designing new GNNs. With the proposed framework, we discover that existing works usually utilize naïve graph convolutional kernels for the feature fitting function, and we further develop two novel objective functions considering adjustable graph kernels showing low-pass or high-pass filtering capabilities respectively. Moreover, we provide the convergence proofs and expressive power comparisons for the proposed models. Extensive experiments on benchmark datasets clearly show that the proposed GNNs not only outperform the state-of-the-art methods but also have good ability to alleviate over-smoothing, and further verify the feasibility for designing GNNs with our unified optimization framework.

Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework

Semi-supervised learning on graphs is an important problem in the machine learning area. In recent years, state-of-the-art classification methods based on graph neural networks (GNNs) have shown their superiority over traditional ones such as label propagation. However, the sophisticated architectures of these neural models lead to a complex prediction mechanism, which cannot make full use of valuable prior knowledge lying in the data, e.g., structurally correlated nodes tend to have the same class. In this paper, we propose a framework based on knowledge distillation to address the above issues. Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model), and injects it into a well-designed student model. The student model is built with two simple prediction mechanisms, i.e., label propagation and feature transformation, which naturally preserve structure-based and feature-based prior knowledge, respectively. Specifically, we design the student model as a trainable combination of parameterized label propagation and feature transformation modules. As a result, the learned student can benefit from both prior knowledge and the knowledge in GNN teachers for more effective predictions. Moreover, the learned student model has a more interpretable prediction process than GNNs. We conduct experiments on five public benchmark datasets and employ seven GNN models including GCN, GAT, APPNP, SAGE, SGC, GCNII and GLP as the teacher models. Experimental results show that the learned student model can consistently outperform its corresponding teacher model on average. Code and data are available at https://github.com/BUPT-GAMMA/CPF
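For readers unfamiliar with label propagation, one of the two simple mechanisms named above, here is a minimal sketch of the plain, unparameterized version on a toy graph. It is not the parameterized module from the paper or from the linked repository; the function name, the adjacency-list input format, and the fixed iteration count are illustrative assumptions.

```python
from typing import Dict, List


def label_propagation(
    adj: Dict[int, List[int]],
    labels: Dict[int, int],
    num_classes: int,
    iterations: int = 10,
) -> Dict[int, int]:
    """Plain label propagation: unlabeled nodes repeatedly average neighbor scores."""
    scores = {v: [0.0] * num_classes for v in adj}
    for v, c in labels.items():
        scores[v][c] = 1.0  # one-hot scores for labeled nodes

    for _ in range(iterations):
        updated = {}
        for v in adj:
            if v in labels:
                updated[v] = scores[v]  # labeled nodes stay clamped
                continue
            degree = max(len(adj[v]), 1)
            updated[v] = [
                sum(scores[u][c] for u in adj[v]) / degree
                for c in range(num_classes)
            ]
        scores = updated

    # Predict the class with the highest propagated score.
    return {v: max(range(num_classes), key=lambda c: scores[v][c]) for v in adj}


# Path graph 0-1-2-3-4-5 with only the two end nodes labeled.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(label_propagation(path, {0: 0, 5: 1}, num_classes=2))
```

Each unlabeled node ends up with the class of the nearer labeled endpoint, which reflects the structural prior that connected nodes tend to share a class.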

CurGraph: Curriculum Learning for Graph Classification

Graph neural networks (GNNs) have achieved state-of-the-art performance on graph classification tasks. Existing work usually feeds graphs to GNNs in random order for training. However, graphs can vary greatly in their difficulty for classification, and we argue that GNNs can benefit from an easy-to-difficult curriculum, similar to the learning process of humans. Evaluating the difficulty of graphs is challenging due to the high irregularity of graph data. To address this issue, we present the CurGraph (Curriculum Learning for Graph Classification) framework, which analyzes the graph difficulty in the high-level semantic feature space. Specifically, we use the infomax method to obtain graph-level embeddings and a neural density estimator to model the embedding distributions. Then we calculate the difficulty scores of graphs based on the intra-class and inter-class distributions of their embeddings. Given the difficulty scores, CurGraph first exposes a GNN to easy graphs, before gradually moving on to hard ones. To provide a soft transition from easy to hard, we propose a smooth-step method, which utilizes a time-variant smooth function to filter out hard graphs. Thanks to CurGraph, a GNN learns from the graphs at the border of its capability, neither too easy nor too hard, to gradually expand its border at each training step. Empirically, CurGraph yields significant gains for popular GNN models on graph classification and enables them to achieve superior performance on miscellaneous graphs.

Lorentzian Graph Convolutional Networks

Graph convolutional networks (GCNs) have received considerable research attention recently. Most GCNs learn the node representations in Euclidean geometry, but that could have a high distortion in the case of embedding graphs with scale-free or hierarchical structure. Recently, some GCNs have been proposed to deal with this problem in non-Euclidean geometry, e.g., hyperbolic geometry. Although hyperbolic GCNs achieve promising performance, existing hyperbolic graph operations actually cannot rigorously follow the hyperbolic geometry, which may limit the ability of hyperbolic geometry and thus hurt the performance of hyperbolic GCNs. In this paper, we propose a novel hyperbolic GCN named Lorentzian graph convolutional network (LGCN), which rigorously guarantees that the learned node features follow the hyperbolic geometry. Specifically, we rebuild the graph operations of hyperbolic GCNs with Lorentzian versions, e.g., the feature transformation and non-linear activation. Also, an elegant neighborhood aggregation method is designed based on the centroid of Lorentzian distance. Moreover, we prove some proposed graph operations are equivalent in different types of hyperbolic geometry, which fundamentally indicates their correctness. Experiments on six datasets show that LGCN performs better than the state-of-the-art methods. LGCN has lower distortion when learning the representation of tree-like graphs compared with existing hyperbolic GCNs. We also find that the performance of some hyperbolic GCNs can be improved by simply replacing the graph operations with those we defined in this paper.

SESSION: Session: Recommendations

Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation

Self-attention has become increasingly popular in a variety of sequence modeling tasks from natural language processing to recommendation, due to its effectiveness. However, self-attention suffers from quadratic computational and memory complexities, prohibiting its applications on long sequences. Existing approaches that address this issue mainly rely on a sparse attention context, either using a local window, or a permuted bucket obtained by locality-sensitive hashing (LSH) or sorting, while crucial information may be lost. Inspired by the idea of vector quantization that uses cluster centroids to approximate items, we propose LISA (LInear-time Self Attention), which enjoys both the effectiveness of vanilla self-attention and the efficiency of sparse attention. LISA scales linearly with the sequence length, while enabling full contextual attention via computing differentiable histograms of codeword distributions. Meanwhile, unlike some efficient attention methods, our method poses no restriction on causal masking or sequence length. We evaluate our method on four real-world datasets for sequential recommendation. The results show that LISA outperforms the state-of-the-art efficient attention methods in both performance and speed, and it is up to 57x faster and 78x more memory efficient than vanilla self-attention.

Learning Heterogeneous Temporal Patterns of User Preference for Timely Recommendation

Recommender systems have achieved great success in modeling user’s preferences on items and predicting the next item the user would consume. Recently, there have been many efforts to utilize time information of users’ interactions with items to capture inherent temporal patterns of user behaviors and offer timely recommendations at a given time. Existing studies regard the time information as a single type of feature and focus on how to associate it with user preferences on items. However, we argue they are insufficient for fully learning the time information because the temporal patterns of user preference are usually heterogeneous. A user’s preference for a particular item may 1) increase periodically or 2) evolve over time under the influence of significant recent events, and each of these two kinds of temporal pattern appears with some unique characteristics. In this paper, we first define the unique characteristics of the two kinds of temporal pattern of user preference that should be considered in time-aware recommender systems. Then we propose a novel recommender system for timely recommendations, called TimelyRec, which jointly learns the heterogeneous temporal patterns of user preference considering all of the defined characteristics. In TimelyRec, a cascade of two encoders captures the temporal patterns of user preference using a proposed attention module for each encoder. Moreover, we introduce an evaluation scenario that evaluates the performance on predicting an interesting item and when to recommend the item simultaneously in top-K recommendation (i.e., item-timing recommendation). Our extensive experiments on a scenario for item recommendation and the proposed scenario for item-timing recommendation on real-world datasets demonstrate the superiority of TimelyRec and the proposed attention modules.

Drug Package Recommendation via Interaction-aware Graph Induction

Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which strongly support intelligent medical services such as drug recommendation. However, prior work mainly follows traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., synergistic or antagonistic effects, have been largely ignored. To that end, in this paper, we aim to develop a new paradigm for drug package recommendation that considers the interaction effects among drugs, in which the interaction effects could be affected by patient conditions. Specifically, we first design a pre-training method based on neural collaborative filtering to get the initial embeddings of patients and drugs. Then, the drug interaction graph is initialized based on medical records and domain knowledge. Along this line, we propose a new Drug Package Recommendation (DPR) framework with two variants, DPR on Weighted Graph (DPR-WG) and DPR on Attributed Graph (DPR-AG), to solve the problem, in which the interactions are described as signed weights or attribute vectors, respectively. In detail, a mask layer is utilized to capture the impact of patient condition, and graph neural networks (GNNs) are leveraged for the final graph induction task to embed the package. Extensive experiments on a real-world data set from a first-rate hospital demonstrate the effectiveness of our DPR framework compared with several competitive baseline methods, and further support the heuristic study for the drug package generation task with adequate performance.

Interest-aware Message-Passing GCN for Recommendation

Graph Convolution Networks (GCNs) manifest great potential in recommendation. This is attributed to their capability of learning good user and item embeddings by exploiting the collaborative signals from the high-order neighbors. Like other GCN models, the GCN based recommendation models also suffer from the notorious over-smoothing problem: when stacking more layers, node embeddings become more similar and eventually indistinguishable, resulting in performance degradation. The recently proposed LightGCN and LR-GCN alleviate this problem to some extent. However, we argue that they overlook an important factor for the over-smoothing problem in recommendation, that is, high-order neighboring users who share no common interests with a user can also be involved in the user’s embedding learning in the graph convolution operation. As a result, the multi-layer graph convolution will make users with dissimilar interests have similar embeddings. In this paper, we propose a novel Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. The subgraph consists of users with similar interests and their interacted items. To form the subgraphs, we design an unsupervised subgraph generation module, which can effectively identify users with common interests by exploiting both user feature and graph structure. In this way, our model can avoid propagating negative information from high-order neighbors into embedding learning. Experimental results on three large-scale benchmark datasets show that our model can gain performance improvement by stacking more layers and outperform the state-of-the-art GCN-based recommendation models significantly.

Task-adaptive Neural Process for User Cold-Start Recommendation

User cold-start recommendation is a long-standing challenge for recommender systems due to the fact that only a few interactions of cold-start users can be exploited. Recent studies seek to address this challenge from the perspective of meta learning, and most of them follow a manner of parameter initialization, where the model parameters can be learned by a few steps of gradient updates. While these gradient-based meta-learning models achieve promising performances to some extent, a fundamental problem of them is how to adapt the global knowledge learned from previous tasks for the recommendations of cold-start users more effectively.

In this paper, we develop a novel meta-learning recommender called task-adaptive neural process (TaNP). TaNP is a new member of the neural process family, where making recommendations for each user is associated with a corresponding stochastic process. TaNP directly maps the observed interactions of each user to a predictive distribution, sidestepping some training issues in gradient-based meta-learning models. More importantly, to balance the trade-off between model capacity and adaptation reliability, we introduce a novel task-adaptive mechanism. It enables our model to learn the relevance of different tasks and customize the global knowledge to the task-related decoder parameters for estimating user preferences. We validate TaNP on multiple benchmark datasets in different experimental settings. Empirical results demonstrate that TaNP yields consistent improvements over several state-of-the-art meta-learning recommenders.

SESSION: Session: Sampling

Consistent Sampling Through Extremal Process

The Jaccard similarity has been widely used in search and machine learning, especially in industrial practice. For binary (0/1) data, the Jaccard similarity is often called the “resemblance” and the method of minwise hashing has been the standard tool for computing resemblances in massive data. For general weighted data, the commonly used sampling algorithm for computing the (weighted) Jaccard similarity is the Consistent Weighted Sampling (CWS). A convenient (and perhaps also mysterious) implementation of CWS is the so-called “0-bit CWS” published in KDD 2015 [31], which we refer to in this paper as the “relaxed CWS” and which was purely an empirical observation without theoretical justification. The difficulty in the analysis of the “relaxed CWS” is due to the complicated probability problem, which we could not resolve at this point.

In this paper, we propose using extremal processes to generate samples for estimating the Jaccard similarity. Surprisingly, the proposed “extremal sampling” (ES) scheme makes it possible to analyze the “relaxed ES” variant. Through some novel probability endeavours, we are able to rigorously compute the bias of the “relaxed ES” which, to a good extent, explains why the “relaxed ES” works so well and when it does not in extreme corner cases. Interestingly, compared with CWS, the resultant algorithm only involves counting and does not need sophisticated mathematical operations (as required by CWS). It is therefore not surprising that the proposed ES scheme is actually noticeably faster than CWS.

Although ES is different from CWS (and other algorithms in the literature for estimating the Jaccard similarity), in retrospect ES is indeed closely related to CWS. This paper provides the much needed insight which connects CWS with extremal processes. This insight may help understand CWS (and variants), and might help develop new algorithms for similarity estimation, in future research.
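As background for the quantities discussed above, the sketch below computes the weighted Jaccard similarity exactly and estimates the binary resemblance with plain minwise hashing. It does not implement CWS, the “relaxed CWS”, or the proposed extremal sampling; the function names, the salted SHA-256 hashing, and the default of 128 hash functions are illustrative assumptions.

```python
import hashlib
from typing import Dict, Iterable, List


def weighted_jaccard(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Exact weighted Jaccard: sum of coordinate-wise minima over sum of maxima."""
    keys = set(u) | set(v)
    numerator = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    denominator = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator > 0 else 0.0


def _hash(item: str, seed: int) -> int:
    # Salting with the seed simulates a family of independent hash functions.
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: Iterable[str], num_hashes: int = 128) -> List[int]:
    """Signature of a non-empty set: the minimum hash value under each function."""
    items = list(items)
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_resemblance(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching positions estimates the binary Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

For binary data, passing weights of 1 to `weighted_jaccard` recovers the resemblance, which the minwise-hashing estimate approximates more closely as `num_hashes` grows.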

Beyond Outlier Detection: Outlier Interpretation by Attention-Guided Triplet Deviation Network

Outlier detection is an important task in many domains and has been intensively studied over the past decade. Further, how to explain outliers, i.e., outlier interpretation, is more significant, which can provide valuable insights for analysts to better understand, solve, and prevent these detected outliers. However, only limited studies consider this problem. Most of the existing methods follow a score-and-search approach. They select a feature subspace as the interpretation for each queried outlier by estimating outlying scores of the outlier in searched subspaces. Due to the tremendous search space, they have to utilize pruning strategies and set a maximum subspace length, often resulting in suboptimal interpretation results. Accordingly, this paper proposes a novel Attention-guided Triplet deviation network for Outlier interpretatioN (ATON). Instead of searching a subspace, ATON directly learns an embedding space and learns how to attach attention to each embedding dimension (i.e., capturing the contribution of each dimension to the outlierness of the queried outlier). Specifically, ATON consists of a feature embedding module and a customized self-attention learning module, which are optimized by a triplet deviation-based loss function. We obtain an optimal attention-guided embedding space with expanded high-level information and rich semantics, and thus outlying behaviors of the queried outlier can be better unfolded. ATON finally distills a subspace of original features from the embedding module and the attention coefficient. With its good generality, ATON can be employed as an additional step of any black-box outlier detector. A comprehensive suite of experiments is conducted to evaluate the effectiveness and efficiency of ATON. The proposed ATON significantly outperforms state-of-the-art competitors on 12 real-world datasets and obtains good scalability w.r.t. both data dimensionality and data size.

Fair and Representative Subset Selection from Data Streams

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a -approximation algorithm that requires passes over the stream for any constant ε > 0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
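To illustrate the problem setting, the sketch below runs a simple offline greedy for maximum coverage, a monotone submodular objective, while capping how many items may be picked from each group. This is an illustrative baseline that holds all items in memory; it is not one of the streaming algorithms proposed in the paper, and the function name, input format, and example data are assumptions.

```python
from typing import Dict, Hashable, List, Set, Tuple


def fair_greedy_cover(
    items: Dict[str, Set[Hashable]],
    group_of: Dict[str, str],
    caps: Dict[str, int],
) -> Tuple[List[str], Set[Hashable]]:
    """Greedy maximum coverage with a per-group limit on selected items."""
    selected: List[str] = []
    covered: Set[Hashable] = set()
    used = {group: 0 for group in caps}

    while True:
        best, best_gain = None, 0
        for name in sorted(items):  # sorted for deterministic tie-breaking
            group = group_of[name]
            if name in selected or used[group] >= caps[group]:
                continue  # already chosen, or the group's quota is exhausted
            gain = len(items[name] - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # no feasible item adds new coverage
        selected.append(best)
        covered |= items[best]
        used[group_of[best]] += 1

    return selected, covered


items = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {7}}
group_of = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
print(fair_greedy_cover(items, group_of, caps={"X": 1, "Y": 1}))
```

The per-group caps sum to the overall cardinality limit here, so the fairness constraint also bounds the total number of selected items.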

CLEAR: Contrastive-Prototype Learning with Drift Estimation for Resource Constrained Stream Mining

Non-stationary data stream mining aims to classify large-scale online instances that emerge continuously. Compared with offline learning, the most apparent challenge is the consecutive emergence of new categories when tackling a non-static categorical distribution. Non-stationary stream settings often appear in real-world applications, e.g., online classification in E-commerce systems that involves incoming products, or the summary of news topics on social networks (Twitter). Ideally, a learning model should be able to learn novel concepts from labeled data (in new tasks) and reduce the abrupt degradation of model performance on the old concepts (also named the catastrophic forgetting problem). In this work, we focus on improving the performance of the stream mining approach under constrained resources, where both the memory for old data and the labeled new instances are limited. We propose a simple yet efficient resource-constrained framework, CLEAR, to address the above challenges during one-pass stream mining. Specifically, CLEAR focuses on creating and calibrating the class representation (the prototype) in the embedding space. We first apply contrastive-prototype learning on a large amount of unlabeled data, and generate the discriminative prototype for each class in the embedding space. Next, for updating on new tasks/categories, we propose a drift estimation strategy to calibrate/compensate for the drift of each class representation, which could reduce knowledge forgetting without storing any previous data. We perform experiments on public datasets (e.g., CUB200, CIFAR100) under the stream setting, where our approach is consistently and clearly better than many state-of-the-art methods under both the memory and annotation restrictions.

Diversity on the Go! Streaming Determinantal Point Processes under a Maximum Induced Cardinality Objective

Over the past decade, Determinantal Point Processes (DPPs) have proven to be a mathematically elegant framework for modeling diversity. Given a set of items N, DPPs define a probability distribution over subsets of N, with sets of larger diversity having greater probability. Recently, DPPs have achieved success in the domain of recommendation systems, as a method to enforce diversity of recommendations in addition to relevance. In large-scale recommendation applications, however, the input typically comes in the form of a stream too large to fit into main memory. The natural greedy algorithm for DPP-based recommendations is memory intensive and cannot be used in a streaming setting.

In this work, we give the first streaming algorithm for optimizing DPPs under the Maximum Induced Cardinality (MIC) objective of Gillenwater et al.  [15]. As noted by [15], the MIC objective is better suited towards recommendation systems than the classically used maximum a posteriori (MAP) DPP objective. In the insertion-only streaming model, our algorithm runs in time per update and uses memory, where k is the number of diverse items to be selected. In the sliding window streaming model, our algorithm runs in time per update and memory where n is the size of the sliding window. The approximation guarantees are simple, and depend on the largest and the k-th largest eigenvalues of the kernel matrix used to model diversity. We show that in practice, the algorithm often achieves close to optimal results, and meets the memory and latency requirements of production systems. Furthermore, the algorithm works well even in a non-streaming setting, and runs in a fraction of time compared to the classic greedy algorithm.

SESSION: Session: Knowledge Graphs

Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs

Knowledge Graphs (KGs) are ubiquitous structures for information storage in several real-world applications such as web search, e-commerce, social networks, and biology. Querying KGs remains a foundational and challenging problem due to their size and complexity. Promising approaches to tackle this problem include embedding the KG units (e.g., entities and relations) in a Euclidean space such that the query embedding contains the information relevant to its results. These approaches, however, fail to capture the hierarchical nature and semantic information of the entities present in the graph. Additionally, most of these approaches only utilize multi-hop queries (that can be modeled by simple translation operations) to learn embeddings and ignore more complex operations such as intersection and union of simpler queries. To tackle such complex operations, in this paper, we formulate KG representation learning as a self-supervised logical query reasoning problem that utilizes translation, intersection and union queries over KGs. We propose Hyperboloid Embeddings (HypE), a novel self-supervised dynamic reasoning framework that utilizes positive first-order existential queries on a KG to learn representations of its entities and relations as hyperboloids in a Poincaré ball. HypE models the positive first-order queries as geometrical translation, intersection, and union. For the problem of KG reasoning in real-world datasets, the proposed HypE model significantly outperforms the state-of-the-art results. We also apply HypE to an anomaly detection task on a popular e-commerce website product taxonomy as well as hierarchically organized web articles and demonstrate significant performance improvements compared to existing baseline methods. Finally, we also visualize the learned HypE embeddings in a Poincaré ball to clearly interpret and comprehend the representation space.

ColChain: Collaborative Linked Data Networks

One of the major obstacles that currently prevents the Semantic Web from exploiting its full potential is that the data it provides access to is sometimes not available or outdated. The reason is rooted deep within its architecture that relies on data providers to keep the data available, queryable, and up-to-date at all times – an expectation that many data providers in reality cannot live up to for an extended (or infinite) period of time. Hence, decentralized architectures have recently been proposed that use replication to keep the data available in case the data provider fails. Although this increases availability, it does not help keeping the data up-to-date or allow users to query and access previous versions of a dataset. In this paper, we therefore propose ColChain (COLlaborative knowledge CHAINs), a novel decentralized architecture based on blockchains that not only lowers the burden for the data providers but at the same time also allows users to propose updates to faulty or outdated data, trace updates back to their origin, and query older versions of the data. Our extensive experiments show that ColChain reaches these goals while achieving query processing performance comparable to the state of the art.

MedPath: Augmenting Health Risk Prediction via Medical Knowledge Paths

The broad adoption of electronic health records (EHR) data and the availability of biomedical knowledge graphs (KGs) on the web have provided clinicians and researchers unprecedented resources and opportunities for conducting health risk predictions to improve healthcare quality and medical resource allocation. Existing methods have focused on improving the EHR feature representations using attention mechanisms, time-aware models, or external knowledge. However, they ignore the importance of using personalized information to make predictions. Besides, the reliability of their prediction interpretations needs to be improved since their interpretable attention scores are not explicitly reasoned from disease progression paths. In this paper, we propose MedPath to solve these challenges and augment existing risk prediction models with the ability to use personalized information and provide reliable interpretations inferring from disease progression paths. Firstly, MedPath extracts personalized knowledge graphs (PKGs) containing all possible disease progression paths from observed symptoms to target diseases from a large-scale online medical knowledge graph. Next, to augment existing EHR encoders for achieving better predictions, MedPath learns a PKG embedding by conducting multi-hop message passing from symptom nodes to target disease nodes through a graph neural network encoder. Since MedPath reasons disease progression by paths existing in PKGs, it can provide explicit explanations for the prediction by pointing out how observed symptoms can finally lead to target diseases. Experimental results on three real-world medical datasets show that MedPath is effective in improving the performance of eight state-of-the-art methods with higher F1 scores and AUCs. Our case study also demonstrates that MedPath can greatly improve the explicitness of the risk prediction interpretation.

Efficient Computation of Semantically Cohesive Subgraphs for Keyword-Based Knowledge Graph Exploration

A knowledge graph (KG) represents a set of entities and their relations. To explore the content of a large and complex KG, a convenient way is keyword-based querying. Traditional methods assign small weights to salient entities or relations, and answer an exploratory keyword query by computing a group Steiner tree (GST), which is a minimum-weight subgraph that connects all the keywords in the query. Recent studies have suggested improving the semantic cohesiveness of a query answer by minimizing the pairwise semantic distances between the entities in a subgraph, but it remains unclear how to efficiently compute such a semantically cohesive subgraph. In this paper, we formulate it as a quadratic group Steiner tree problem (QGSTP) by extending the classical minimum-weight GST problem which is NP-hard. We design two approximation algorithms for QGSTP and prove their approximation ratios. Furthermore, to improve their practical performance, we present heuristics including pruning and ranking strategies.

WiseKG: Balanced Access to Web Knowledge Graphs

SPARQL query services that balance processing between clients and servers become more and more essential to handle the increasing load for open and decentralized knowledge graphs on the Web. To this end, Linked Data Fragments (LDF) have introduced a foundational framework that has sparked research exploring a spectrum of potential Web querying interfaces in between server-side query processing via SPARQL endpoints and client-side query processing of data dumps. Current proposals in between typically suffer from imbalanced load on either the client or the server. In this paper, to the best of our knowledge, we present the first work that combines both client-side and server-side query optimization techniques in a truly dynamic fashion: we introduce WiseKG, a system that employs a cost model that dynamically delegates the load between servers and clients by combining client-side processing of shipped partitions with efficient server-side processing of star-shaped sub-queries, based on current server workload and client capabilities. Our experiments show that WiseKG significantly outperforms state-of-the-art solutions in terms of average total query execution time per client, while at the same time decreasing network traffic and increasing server-side availability.

SESSION: Session: Mobile and Ubiquitous Computing

A Longitudinal Study of Removed Apps in iOS App Store

To improve app quality and nip potential threats in the bud, modern app markets have released strict guidelines along with an app vetting process that precedes app publishing. However, there has been growing evidence showing the ineffectiveness of app vetting, allowing potentially harmful and policy-violating apps to sneak into the market from time to time. Therefore, app removal is a common practice, and market maintainers have to remove undesired apps from the market periodically in a reactive manner. Although a number of reports and news media have mentioned removed apps, our research community still lacks a comprehensive understanding of the landscape of such apps. To fill the void, in this paper, we present a large-scale and longitudinal study of removed apps in the iOS app store. We first record daily snapshots of the iOS app store continuously over a span of 1.5 years. By comparing each pair of consecutive snapshots, we have collected the information of over 1 million removed apps with their accurate removal dates. This comprehensive dataset enables us to characterize the overall landscape of removed apps. We observe that, although most of the removed apps are low-quality apps (e.g., outdated and abandoned), a number of the removed apps are quite popular. We further investigate the practical reasons leading to the removal of such popular apps, and observe several interesting reasons, including ranking fraud, fake descriptions, and content issues. More importantly, most of these misbehaviors are reflected in app meta information including app description, app review, and ASO keywords. This motivates us to design an automated approach to flagging the removed apps. Experimental results suggest that, even without access to the bytecode of mobile apps, we can identify the removed apps with good performance (F1=83%). Furthermore, we are able to flag the removed apps in advance as long as their inappropriate behaviors appear in their metadata. We believe our approach can work as a whistle blower that pinpoints policy-violating behaviors in a timely manner, which will be quite effective in improving the app maintenance process.
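
The snapshot comparison described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `removed_apps`, the app identifiers, and the dates are hypothetical, and each snapshot is assumed to be a plain set of app IDs.

```python
from datetime import date

def removed_apps(snapshots):
    """Map each removed app ID to the date of the first snapshot that lacks it.

    `snapshots` is a date-ordered list of (date, set_of_app_ids) pairs.
    If an app reappears and is removed again, the later date wins.
    """
    removed = {}
    for (_, prev_ids), (day, curr_ids) in zip(snapshots, snapshots[1:]):
        # Apps present yesterday but missing today were removed in between.
        for app_id in prev_ids - curr_ids:
            removed[app_id] = day
    return removed

# Hypothetical daily snapshots of a store catalogue.
snapshots = [
    (date(2020, 1, 1), {"app.a", "app.b", "app.c"}),
    (date(2020, 1, 2), {"app.a", "app.c"}),
    (date(2020, 1, 3), {"app.a", "app.c", "app.d"}),
    (date(2020, 1, 4), {"app.a", "app.d"}),
]
result = removed_apps(snapshots)
```

Daily granularity bounds the error of each removal date to one day, which is presumably why the snapshots are taken daily.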

Demystifying Illegal Mobile Gambling Apps

Mobile gambling apps, a new type of online gambling service emerging in the mobile era, have become one of the most popular and lucrative underground businesses in the mobile app ecosystem. Since their emergence, mobile gambling apps have been subject to strict regulation from both government authorities and app markets. However, to the best of our knowledge, mobile gambling apps have not been investigated by our research community. In this paper, we take the first step to fill the void. Specifically, we first perform a 5-month dataset collection process to harvest illegal gambling apps in China, where mobile gambling apps are outlawed. We have collected 3,366 unique gambling apps with 5,344 different versions. We then characterize the gambling apps from various perspectives including app distribution channels, network infrastructure, malicious behaviors, and abused third-party and payment services. Our work has revealed a number of covert distribution channels, the unique characteristics of gambling apps, and the abused fourth-party payment services. Finally, we propose a “guilt-by-association” expansion method to identify new suspicious gambling services, which helps us further identify over 140K suspicious gambling domains and over 57K gambling app candidates. Our study demonstrates the urgency of detecting and regulating illegal gambling apps.

ReACt: A Resource-centric Access Control System for Web-app Interactions on Android

We identify and survey five mechanisms through which web content interacts with mobile apps. While useful, these web-app interaction mechanisms cause various notable security vulnerabilities on mobile apps or web content. The root cause is lack of proper access control mechanisms for web-app interactions on mobile OSes. Existing solutions usually adopt either an origin-centric design or a code-centric design, and suffer from one or several of the following limitations: coarse protection granularity, poor flexibility in terms of access control policy establishment, and incompatibility with existing apps/OSes due to the need of modifying the apps and/or the underlying OS. More importantly, none of the existing works can organically deal with all the five web-app interaction mechanisms. In this paper, we propose ReACt, a novel Resource-centric Access Control design that can coherently work with all the web-app interaction mechanisms while addressing the above-mentioned limitations. We have implemented a prototype system on Android, and performed extensive evaluation on it. The evaluation results show that our system works well with existing commercial off-the-shelf Android apps and different versions of Android OS, and it can achieve the design goals with small overhead.

NeuroPose: 3D Hand Pose Tracking using EMG Wearables

Ubiquitous finger motion tracking enables a number of exciting applications in augmented reality, sports analytics, rehabilitation-healthcare, haptics, etc. This paper presents NeuroPose, a system that shows the feasibility of 3D finger motion tracking using a platform of wearable ElectroMyoGraphy (EMG) sensors. EMG sensors can sense electrical potential from muscles due to finger activation, thus offering rich information for fine-grained finger motion sensing. However, converting the sensor information to 3D finger poses is non-trivial since signals from multiple fingers superimpose at the sensor in complex patterns. Towards solving this problem, NeuroPose fuses information from anatomical constraints of finger motion with machine learning architectures on Recurrent Neural Networks (RNN), Encoder-Decoder Networks, and ResNets to extract 3D finger motion from noisy EMG data. The generated motion pattern is temporally smooth as well as anatomically consistent. Furthermore, a transfer learning algorithm is leveraged to adapt a model pretrained on one user to a new user with minimal training overhead. A systematic study with 12 users demonstrates a median error of 6.24° and a 90%-ile error of 18.33° in tracking 3D finger joint angles. The accuracy is robust to natural variation in sensor mounting positions as well as changes in wrist positions of the user. NeuroPose is implemented on a smartphone with a processing latency of 0.101 s and a low energy overhead.

Whale Watching in Inland Indonesia: Analyzing a Small, Remote, Internet-Based Community Cellular Network

While only generating a minuscule percentage of global traffic, largely lost in the noise of large-scale analyses, remote rural networks are the physical frontier of the Internet today. Through tight integration with a local operator’s infrastructure, we gather a unique dataset to characterize and report a year of interaction between finances, utilization, and performance of a novel, remote, data-only Community LTE Network in Bokondini, Indonesia. With visibility to drill down to individual users, we find use highly unbalanced and the network supported by only a handful of relatively heavy consumers. 45% of users are offline more days than online, and the median user consumes only 77 MB per day online and 36 MB per day on average, limiting consumption by frequently “topping up” in small amounts. Outside video and social media, messaging and IP calling provided by over-the-top services like Facebook Messenger, QQ, and WhatsApp comprise a relatively large percentage of traffic consistently across both heavy and light users. Our analysis shows that Internet-only Community Cellular Networks can be profitable despite most users spending less than $1 USD/day, and offers insights into the unique properties of these networks.

SESSION: Session: Neural Networks

Learning Neural Point Processes with Latent Graphs

Neural point processes (NPPs) employ neural networks to capture complicated dynamics of asynchronous event sequences. Existing NPPs feed all history events into neural networks, assuming that all event types contribute to the prediction of the target type. However, this assumption can be problematic because in reality some event types do not contribute to the predictions of another type. To correct this defect, we learn to omit those types of events that do not contribute to the prediction of one target type during the formulation of NPPs. Towards this end, we simultaneously consider the tasks of (1) finding event types that contribute to predictions of the target types and (2) learning a NPP model from event sequences. For the former, we formulate a latent graph, with event types being vertices and non-zero contributing relationships being directed edges; then we propose a probabilistic graph generator, from which we sample a latent graph. For the latter, the sampled graph can be readily used as a plug-in to modify an existing NPP model. Because these two tasks are nested, we propose to optimize the model parameters through bilevel programming, and develop an efficient solution based on truncated gradient back-propagation. Experimental results on both synthetic and real-world datasets show the improved performance against state-of-the-art baselines. This work removes disturbance of non-contributing event types with the aid of a validation procedure, similar to the practice to mitigate overfitting used when training machine learning models.

OCT-GAN: Neural ODE-based Conditional Tabular GANs

Synthesizing tabular data is attracting much attention these days for various purposes. With sophisticated synthetic data, for instance, one can augment one's training data. For the past couple of years, tabular data synthesis techniques have been greatly improved. Recent work made progress in addressing many problems in synthesizing tabular data, such as the imbalanced distribution and multimodality problems. However, the data utility of state-of-the-art methods is not satisfactory yet. In this work, we significantly improve the utility by designing our generator and discriminator based on neural ordinary differential equations (NODEs). After showing that NODEs have theoretically preferred characteristics for generating tabular data, we introduce our designs. The NODE-based discriminator performs a hidden vector evolution trajectory-based classification rather than classifying with a hidden vector at the last layer only. Our generator also adopts an ODE layer at the very beginning of its architecture to transform its initial input vector (i.e., the concatenation of a noisy vector and a condition vector in our case) onto another latent vector space suitable for the generation process. We conduct experiments with 13 datasets, including but not limited to insurance fraud detection and online news article prediction, and our presented method outperforms other state-of-the-art tabular data synthesis methods in many cases of our classification, regression, and clustering experiments.

Neural Collaborative Reasoning

Existing Collaborative Filtering (CF) methods are mostly designed based on the idea of matching, i.e., by learning user and item embeddings from data using shallow or deep models, they try to capture the associative relevance patterns in data, so that a user embedding can be matched with relevant item embeddings using designed or learned similarity functions. However, as a cognition rather than a perception intelligent task, recommendation requires not only the ability of pattern recognition and matching from data, but also the ability of cognitive reasoning in data.

In this paper, we propose to advance Collaborative Filtering (CF) to Collaborative Reasoning (CR), which means that each user knows part of the reasoning space, and they collaborate for reasoning in the space to estimate preferences for each other. Technically, we propose a Neural Collaborative Reasoning (NCR) framework to bridge learning and reasoning. Specifically, we integrate the power of representation learning and logical reasoning, where representations capture similarity patterns in data from perceptual perspectives, and logic facilitates cognitive reasoning for informed decision making. An important challenge, however, is to bridge differentiable neural networks and symbolic reasoning in a shared architecture for optimization and inference. To solve the problem, we propose a modularized reasoning architecture, which learns logical operations such as AND (∧), OR (∨) and NOT (¬) as neural modules for implication reasoning (→). In this way, logical expressions can be equivalently organized as neural networks, so that logical reasoning and prediction can be conducted in a continuous space. Experiments on real-world datasets verified the advantages of our framework compared with shallow, deep, and reasoning models.

Deep Co-Attention Network for Multi-View Subspace Learning

Many real-world applications involve data from multiple modalities and thus exhibit the view heterogeneity. For example, user modeling on social media might leverage both the topology of the underlying social network and the content of the users’ posts; in the medical domain, multiple views could be X-ray images taken at different poses. To date, various techniques have been proposed to achieve promising results, such as canonical correlation analysis based methods, etc. In the meanwhile, it is critical for decision-makers to be able to understand the prediction results from these methods. For example, given the diagnostic result that a model provided based on the X-ray images of a patient at different poses, the doctor needs to know why the model made such a prediction. However, state-of-the-art techniques usually suffer from the inability to utilize the complementary information of each view and to explain the predictions in an interpretable manner.

To address these issues, in this paper, we propose a deep co-attention network for multi-view subspace learning, which aims to extract both the common information and the complementary information in an adversarial setting and provide robust interpretations behind the prediction to the end-users via the co-attention mechanism. In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation by incorporating the classifier into our model. This improves the quality of latent representation and accelerates the convergence speed. Finally, we develop an efficient iterative algorithm to find the optimal encoders and discriminator, which are evaluated extensively on synthetic and real-world data sets. We also conduct a case study to demonstrate how the proposed method robustly interprets the predictions on an image data set.

ATJ-Net: Auto-Table-Join Network for Automatic Learning on Relational Databases

A relational database, consisting of multiple tables, provides heterogeneous information across various entities, widely used in real-world services. This paper studies the supervised learning task on multiple tables, aiming to predict one label column with the help of multiple-tabular data. However, classical ML techniques mainly focus on single-tabular data. Multiple-tabular data refers to many-to-many mapping among joinable attributes and n-ary relations, which cannot be utilized directly by classical ML techniques. Besides, current graph techniques, like heterogeneous information network (HIN) and graph neural networks (GNN), are infeasible to be deployed directly and automatically in a multi-table environment, which limits the learning on databases.

For automatic learning on relational databases, we propose an auto-table-join network (ATJ-Net). Multiple tables with relationships are considered as a hypergraph, where vertices are joinable attributes and hyperedges are tuples of tables. Then, ATJ-Net builds a graph neural network on the heterogeneous hypergraph, which samples and aggregates the vertices and hyperedges on n-hop sub-graphs as the receptive field. In order to enable ATJ-Net to be automatically deployed to different datasets and avoid the “no free lunch” dilemma, we use random architecture search to select optimal aggregators and prune redundant paths in the network. To verify the effectiveness of our methods across various tasks and schemas, we conduct extensive experiments on 4 tasks, 8 different schemas, and 19 sub-datasets w.r.t. citing prediction, review classification, recommendation, and task-blind challenge. ATJ-Net achieves the best performance over state-of-the-art approaches on three tasks and is competitive with the KddCup Winner solution on the task-blind challenge.

SESSION: Session: Personalization

A Cooperative Memory Network for Personalized Task-oriented Dialogue Systems with Incomplete User Profiles

There is increasing interest in developing personalized Task-oriented Dialogue Systems (TDSs). Previous work on personalized TDSs often assumes that complete user profiles are available for most or even all users. This is unrealistic because In this paper, we study personalized TDSs without assuming that user profiles are complete. We propose a Cooperative Memory Network (CoMemNN) that has a novel mechanism to gradually enrich user profiles as dialogues progress and to simultaneously improve response selection based on the enriched profiles. Cooperative Memory Network (CoMemNN) consists of two core modules: User Profile Enrichment (UPE) and Dialogue Response Selection (DRS). The former enriches incomplete user profiles by utilizing collaborative information from neighbor users as well as current dialogues. The latter uses the enriched profiles to update the current user query so as to encode more useful information, based on which a personalized response to a user request is selected.

We conduct extensive experiments on the personalized bAbI dialogue benchmark datasets. We find that CoMemNN is able to enrich user profiles effectively, which results in an improvement of 3.06% in terms of response selection accuracy compared to state-of-the-art methods. We also test the robustness of CoMemNN against incompleteness of user profiles by randomly discarding attribute values from user profiles. Even when discarding 50% of the attribute values, CoMemNN is able to match the performance of the best performing baseline without discarding user profiles, showing the robustness of CoMemNN.

Stimuli-Sensitive Hawkes Processes for Personalized Student Procrastination Modeling

Student procrastination and cramming for deadlines are major challenges in online learning environments, with negative educational and well-being side effects. Modeling student activities in continuous time and predicting their next study time are important problems that can help in creating personalized timely interventions to mitigate these challenges. However, previous attempts at dynamic modeling of student procrastination suffer from major issues: they are unable to predict the next activity times, cannot deal with missing activity history, are not personalized, and disregard important course properties, such as assignment deadlines, that are essential in explaining the cramming behavior. To resolve these problems, we introduce a new personalized stimuli-sensitive Hawkes process model (SSHP), which jointly models all student-assignment pairs and utilizes their similarities to predict students’ next activity times even when there are no historical observations. Unlike regular point processes that assume a constant external triggering effect from the environment, we model three dynamic types of external stimuli, according to assignment availabilities, assignment deadlines, and each student’s time management habits. Our experiments on two synthetic datasets and two real-world datasets show superior performance in future activity prediction compared with state-of-the-art models. Moreover, we show that our model achieves a flexible and accurate parameterization of activity intensities in students.
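
For readers unfamiliar with Hawkes processes, the following minimal Python sketch shows the textbook intensity with a constant external (base) rate, i.e. the "regular" setting the abstract contrasts itself with. It is not the SSHP model: the exponential kernel and the parameter names and values `mu`, `alpha`, and `beta` are assumptions chosen for illustration.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t.

    mu    : constant external (base) rate
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    Only events strictly before t contribute.
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + excitation

# With no history the intensity equals the base rate; a recent event raises it.
quiet = hawkes_intensity(5.0, [])
excited = hawkes_intensity(2.0, [1.0])
```

A stimuli-sensitive variant would replace the constant `mu` with a time-varying term, for example one that depends on the assignment deadline; the exact form used by SSHP is not given in the abstract.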

Personalized Treatment Selection using Causal Heterogeneity

Randomized experimentation (also known as A/B testing or bucket testing) is widely used in the internet industry to measure the metric impact obtained by different treatment variants. A/B tests identify the treatment variant showing the best performance, which then becomes the chosen or selected treatment for the entire population. However, the effect of a given treatment can differ across experimental units and a personalized approach for treatment selection can greatly improve upon the usual global selection strategy. In this work, we develop a framework for personalization through (i) estimation of heterogeneous treatment effect at either a cohort or member-level, followed by (ii) selection of optimal treatment variants for cohorts (or members) obtained through (deterministic or stochastic) constrained optimization.

We perform a two-fold evaluation of our proposed methods. First, a simulation analysis is conducted to study the effect of personalized treatment selection under carefully controlled settings. This simulation illustrates the differences between the proposed methods and the suitability of each with increasing uncertainty. We also demonstrate the effectiveness of the method through a real-life example related to serving notifications at Linkedin. The solution significantly outperformed both heuristic solutions and the global treatment selection baseline leading to a sizable win on top-line metrics like member visits.
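
The difference between global and personalized selection can be seen in a small Python sketch. It assumes the heterogeneous effects have already been estimated per cohort and ignores the constrained optimization step described in the abstract; the cohort names, variant labels, and numbers are hypothetical.

```python
def global_choice(effects, sizes):
    """Pick the single variant with the best population-weighted average effect."""
    variants = next(iter(effects.values())).keys()
    total = sum(sizes.values())
    avg = {
        v: sum(effects[c][v] * sizes[c] for c in effects) / total
        for v in variants
    }
    return max(avg, key=avg.get)

def personalized_choice(effects):
    """Pick the best variant separately for each cohort (unconstrained)."""
    return {c: max(ev, key=ev.get) for c, ev in effects.items()}

# Hypothetical estimated treatment effects per cohort and variant.
effects = {
    "new": {"A": 0.10, "B": 0.30},
    "active": {"A": 0.50, "B": 0.20},
}
sizes = {"new": 100, "active": 300}

best_global = global_choice(effects, sizes)
best_per_cohort = personalized_choice(effects)
```

Here variant A wins globally because the larger cohort prefers it, while the smaller cohort would be better served by variant B, which is the gap a personalized strategy aims to close.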

Incremental Spatio-Temporal Graph Learning for Online Query-POI Matching

Query and Point-of-Interest (POI) matching, aiming at recommending the most relevant POIs from partial query keywords, has become one of the most essential functions in online navigation and ride-hailing applications. Existing methods for query-POI matching, such as Google Maps and Uber, have a natural focus on measuring the static semantic similarity between contextual information of queries and geographical information of POIs. However, it remains challenging for dynamic and personalized online query-POI matching because of the non-stationary and situational context-dependent query-POI relevance. Moreover, the large volume of online queries requires an adaptive and incremental model training strategy that is efficient and scalable in the online scenario. To this end, in this paper, we propose an Incremental Spatio-Temporal Graph Learning (IncreSTGL) framework for intelligent online query-POI matching. Specifically, we first model dynamic query-POI interactions as microscopic and macroscopic graphs. Then, we propose an incremental graph representation learning module to refine and update query-POI interaction graphs in an online incremental fashion, which includes: (i) a contextual graph attention operation quantifying query-POI correlation based on historical queries under dynamic situational context, (ii) a graph discrimination operation capturing the sequential query-POI relevance drift from a holistic view of personalized preference and social homophily, and (iii) a multi-level temporal attention operation summarizing the temporal variations of query-POI interaction graphs for subsequent query-POI matching. Finally, we introduce a lightweight semantic matching module for online query-POI similarity measurement. To demonstrate the effectiveness and efficiency of the proposed algorithm, we conduct extensive experiments on two real-world datasets collected from a leading online navigation and map service provider in China.

Slot Self-Attentive Dialogue State Tracking

An indispensable component in task-oriented dialogue systems is the dialogue state tracker, which keeps track of users’ intentions in the course of conversation. The typical approach towards this goal is to fill in multiple pre-defined slots that are essential to complete the task. Although various dialogue state tracking methods have been proposed in recent years, most of them predict the value of each slot separately and fail to consider the correlations among slots. In this paper, we propose a slot self-attention mechanism that can learn the slot correlations automatically. Specifically, a slot-token attention is first utilized to obtain slot-specific features from the dialogue context. Then a stacked slot self-attention is applied on these features to learn the correlations among slots. We conduct comprehensive experiments on two multi-domain task-oriented dialogue datasets, including MultiWOZ 2.0 and MultiWOZ 2.1. The experimental results demonstrate that our approach achieves state-of-the-art performance on both datasets, verifying the necessity and effectiveness of taking slot correlations into consideration.

SESSION: Session: Network Embeddings

Dynamic Embeddings for Interaction Prediction

In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention. While the last decade has seen an explosion of RSs aimed at identifying relevant items that match user preferences, there is still a range of aspects that could be considered to further improve their performance. For example, often RSs are centered around the user, who is modeled using her recent sequence of activities. Recent studies, however, have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings.

Building on the success of these studies, we propose a novel method called DeePRed that addresses some of their limitations. In particular, we avoid recursive and costly interactions between consecutive short-term embeddings by using long-term (stationary) embeddings as a proxy. This enables us to train DeePRed using simple mini-batches without the overhead of the specialized mini-batches proposed in previous studies. Moreover, DeePRed’s effectiveness comes from the aforementioned design and a multi-way attention mechanism that inspects user-item compatibility. Experiments show that DeePRed outperforms the best state-of-the-art approach by at least 14% in Mean Reciprocal Rank (MRR) on the next-item prediction task, while gaining more than an order of magnitude speedup over the best performing baselines. Although this study is mainly concerned with temporal interaction networks, we also show the power and flexibility of DeePRed by adapting it to the case of static interaction networks, substituting the short- and long-term aspects with local and global ones.
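
Mean Reciprocal Rank, the metric quoted above, can be computed as in the following minimal Python sketch. The function name, the item IDs, and the convention that a missing target contributes zero are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the true next item across all test interactions.

    ranked_lists[i] is the model's ranking (best first) for interaction i,
    targets[i] is the item the user actually interacted with next.
    A target missing from its ranking contributes 0.
    """
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)

# Hypothetical rankings: the true item "i1" is ranked 1st, 2nd, and 3rd.
rankings = [["i1", "i2", "i3"], ["i2", "i1", "i3"], ["i3", "i2", "i1"]]
truth = ["i1", "i1", "i1"]
mrr = mean_reciprocal_rank(rankings, truth)  # (1 + 1/2 + 1/3) / 3
```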

Knowledge Embedding Based Graph Convolutional Network

Recently, a considerable body of literature has grown up around Graph Convolutional Networks (GCNs). How to effectively leverage the rich structural information in complex graphs, such as knowledge graphs with heterogeneous types of entities and relations, is a primary open challenge in the field. Most GCN methods are either restricted to graphs with a homogeneous type of edges (e.g., citation links only), or focus on representation learning for nodes only instead of jointly propagating and updating the embeddings of both nodes and edges for target-driven objectives. This paper addresses these limitations by proposing a novel framework, namely the Knowledge Embedding based Graph Convolutional Network (KE-GCN), which combines and extends the power of GCNs in graph-based belief propagation and the strengths of advanced knowledge embedding (a.k.a. knowledge graph embedding) methods. Our theoretical analysis shows that KE-GCN offers an elegant unification of several well-known GCN methods as specific cases, with a new perspective of graph convolution. Experimental results on benchmark datasets show the advantageous performance of KE-GCN over strong baseline methods in the tasks of knowledge graph alignment and entity classification.

Motif-Preserving Dynamic Attributed Network Embedding

Network embedding has emerged as a new learning paradigm to embed complex network into a low-dimensional vector space while preserving node proximities in both network structures and properties. It advances various network mining tasks, ranging from link prediction to node classification. However, most existing works primarily focus on static networks while many networks in real-life evolve over time with addition/deletion of links and nodes, naturally with associated attribute evolution. In this work, we present Motif-preserving Temporal Shift Network (MTSN), a novel dynamic network embedding framework that simultaneously models the local high-order structures and temporal evolution for dynamic attributed networks. Specifically, MTSN learns node representations by stacking the proposed TIME module to capture both local high-order structural proximities and node attributes by motif-preserving encoder and temporal dynamics by temporal shift operation in a dynamic attributed network. Finally, we perform extensive experiments on four real-world network datasets to demonstrate the superiority of MTSN against state-of-the-art network embedding baselines in terms of both effectiveness and efficiency. The source code of our method is available at: https://github.com/ZhijunLiu95/MTSN.

Highly Liquid Temporal Interaction Graph Embeddings

Capturing the topological and temporal information of interactions and predicting future interactions are crucial for many domains, such as social networks, financial transactions, and e-commerce. With the advent of co-evolutional models, the mutual influence between the interacting users and items is captured. However, existing models only update the interaction information of nodes along the timeline. This causes the problem of information asymmetry, where early updated nodes often have much less information than the most recently updated nodes. The information asymmetry is essentially a blockage of information flow. We propose HILI (Highly Liquid Temporal Interaction Graph Embeddings) to predict highly liquid embeddings on temporal interaction graphs. Our embedding model makes interaction information highly liquid without information asymmetry. Least-recently-used-based and frequency-based windows are used to determine the priority of the nodes that receive the latest interaction information. HILI updates node embeddings by attention layers. The attention layers learn the correlation between nodes and update node embeddings simply and quickly. In addition, HILI introduces a self-linear layer, a linear layer initialized with a novel method. A self-linear layer reduces the expected space of the predicted embedding of the next interacting node and makes the predicted embedding focus more on relevant nodes. We illustrate the geometric meaning of a self-linear layer in the paper. Furthermore, the results of the experiments show that our model outperforms other state-of-the-art temporal interaction prediction models.

Multiplex Bipartite Network Embedding using Dual Hypergraph Convolutional Networks

A bipartite network is a graph structure where nodes are from two distinct domains and only inter-domain interactions exist as edges. A large number of network embedding methods exist to learn vectorial node representations from general graphs with both homogeneous and heterogeneous node and edge types, including some that can specifically model the distinct properties of bipartite networks. However, these methods are inadequate to model multiplex bipartite networks (e.g., in e-commerce), that have multiple types of interactions (e.g., click, inquiry, and buy) and node attributes. Most real-world multiplex bipartite networks are also sparse and have imbalanced node distributions that are challenging to model. In this paper, we develop an unsupervised Dual HyperGraph Convolutional Network (DualHGCN) model that scalably transforms the multiplex bipartite network into two sets of homogeneous hypergraphs and uses spectral hypergraph convolutional operators, along with intra- and inter-message passing strategies to promote information exchange within and across domains, to learn effective node embeddings. We benchmark DualHGCN using four real-world datasets on link prediction and node classification tasks. Our extensive experiments demonstrate that DualHGCN significantly outperforms state-of-the-art methods, and is robust to varying sparsity levels and imbalanced node distributions.

SESSION: Session: Information Extraction

Semi-Open Information Extraction

Open Information Extraction (OIE), the task aimed at discovering all textual facts organized in the form of (subject, predicate, object) found within a sentence, has gained much attention recently. However, in some knowledge-driven applications such as question answering, we often have a target entity and hope to obtain its structured factual knowledge for better understanding, instead of extracting all possible facts aimlessly from the corpus. In this paper, we define a new task, namely Semi-Open Information Extraction (SOIE), to address this need. The goal of SOIE is to discover domain-independent facts towards a particular entity from general and diverse web text. To facilitate research on this new task, we propose a large-scale human-annotated benchmark called SOIED, consisting of 61,984 facts for 8,013 subject entities annotated on 24,000 Chinese sentences collected from the web search engine.

In addition, we propose a novel unified model called USE for this task. First, we introduce subject-guided sequence as input to a pre-trained language model and normalize the hidden representations conditioned on the subject embedding to encode the sentence in a subject-aware manner. Second, we decompose SOIE into three uncoupled subtasks: predicate extraction, object extraction, and boundary alignment. They can all be formulated as the problem of table filling by forming a two-dimensional tag table based on a task-specific tagging scheme. Third, we introduce a collaborative learning strategy that enables the interactive relations among subtasks to be better exploited by explicitly exchanging informative clues. Finally, we evaluate USE and several strong baselines on our new dataset. Experimental results demonstrate the advantages of the proposed method and reveal insight for future improvement.

RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network

In this paper, we present a novel method named RECON, that automatically identifies relations in a sentence (sentential relation extraction) and aligns to a knowledge graph (KG). RECON uses a graph neural network to learn representations of both the sentence as well as facts stored in a KG, improving the overall extraction quality. These facts, including entity attributes (label, alias, description, instance-of) and factual triples, have not been collectively used in the state of the art methods. We evaluate the effect of various forms of representing the KG context on the performance of RECON. The empirical evaluation on two standard relation extraction datasets shows that RECON significantly outperforms all state of the art methods on NYT Freebase and Wikidata datasets.

GNEM: A Generic One-to-Set Neural Entity Matching Framework

Entity Matching is a classic research problem in any data analytics pipeline, aiming to identify records referring to the same real-world entity. It plays an important role in data cleansing and integration. Advanced entity matching techniques focus on extracting syntactic or semantic features from record pairs via complex neural architectures or pre-trained language models. However, the performances always suffer from noisy or missing attribute values in the records. We observe that comparing one record with several relevant records in a collective manner allows each pairwise matching decision to be made by borrowing valuable insights from other pairs, which is beneficial to the overall matching performance. In this paper, we propose a generic one-to-set neural framework named GNEM for entity matching. GNEM predicts matching labels between one record and a set of relevant records simultaneously. It constructs a record pair graph with weighted edges and adopts the graph neural network to propagate information among pairs. We further show that GNEM can be interpreted as an extension and generalization of the existing pairwise matching techniques. Extensive experiments on real-world data sets demonstrate that GNEM consistently outperforms the existing pairwise entity matching techniques and achieves up to 8.4% improvement on F1-Score compared with the state-of-the-art neural methods.

Effective Named Entity Recognition with Boundary-aware Bidirectional Neural Networks

Named Entity Recognition (NER) is a fundamental problem in Natural Language Processing and has received much research attention. Although current neural NER approaches have achieved state-of-the-art performance, they still suffer from one or more of the following three problems in their architectures: (1) boundary tag sparsity, (2) lack of global decoding information, and (3) boundary error propagation. In this paper, we propose a novel Boundary-aware Bidirectional Neural Networks (Ba-BNN) model to tackle these problems for neural NER. The proposed Ba-BNN model is built on the structure of pointer networks to tackle the first problem, boundary tag sparsity. Moreover, we use a boundary-aware binary classifier to capture the global decoding information as input to the decoders. In the Ba-BNN model, we propose to use two decoders to process the information in two different directions (i.e., left-to-right and right-to-left). The final hidden states of the left-to-right decoder are obtained by incorporating the hidden states of the right-to-left decoder in the decoding process. In addition, a boundary retraining strategy is proposed to help reduce the boundary error propagation caused by the pointer networks in boundary detection and entity classification. We have conducted extensive experiments on three NER benchmark datasets. The results show that the proposed Ba-BNN model outperforms the current state-of-the-art models.

A Trigger-Sense Memory Flow Framework for Joint Entity and Relation Extraction

A joint entity and relation extraction framework constructs a unified model to perform entity recognition and relation extraction simultaneously, which can exploit the dependency between the two tasks to mitigate the error propagation problem suffered by the pipeline model. Current efforts on joint entity and relation extraction focus on enhancing the interaction between entity recognition and relation extraction through parameter sharing, joint decoding, or other ad-hoc tricks (e.g., modeled as a semi-Markov decision process, cast as a multi-round reading comprehension task). However, two issues remain. First, the interaction utilized by most methods is still weak and uni-directional, and therefore unable to model the mutual dependency between the two tasks. Second, most methods ignore relation triggers, which can help explain why humans would extract a relation in the sentence. They are essential for relation extraction but overlooked. To this end, we present a Trigger-Sense Memory Flow Framework (TriMF) for joint entity and relation extraction. We build a memory module to remember category representations learned in the entity recognition and relation extraction tasks. Based on it, we design a multi-level memory flow attention mechanism to enhance the bi-directional interaction between entity recognition and relation extraction. Moreover, without any human annotations, our model can enhance relation trigger information in a sentence through a trigger sensor module, which improves model performance and makes model predictions more interpretable. Experimental results show that our proposed framework achieves state-of-the-art results by improving the relation F1 to 52.44% (+3.2%) on SciERC, 66.49% (+4.9%) on ACE05, 72.35% (+0.6%) on CoNLL04 and 80.66% (+2.3%) on ADE.

SESSION: Session: Knowledge Graph Embeddings

MulDE: Multi-teacher Knowledge Distillation for Low-dimensional Knowledge Graph Embeddings

Link prediction based on knowledge graph embeddings (KGE) aims to predict new triples to automatically construct knowledge graphs (KGs). However, recent KGE models achieve performance improvements by excessively increasing the embedding dimensions, which may cause enormous training costs and require more storage space. In this paper, instead of training high-dimensional models, we propose MulDE, a novel knowledge distillation framework, which includes multiple low-dimensional hyperbolic KGE models as teachers and two student components, namely Junior and Senior. Under a novel iterative distillation strategy, the Junior component, a low-dimensional KGE model, asks teachers actively based on its preliminary prediction results, and the Senior component integrates teachers’ knowledge adaptively to train the Junior component based on two mechanisms: relation-specific scaling and contrast attention. The experimental results show that MulDE can effectively improve the performance and training speed of low-dimensional KGE models. The distilled 32-dimensional model is competitive compared to the state-of-the-art high-dimensional methods on several widely-used datasets.

Efficient Non-Sampling Knowledge Graph Embedding

Knowledge Graph (KG) is a flexible structure that is able to describe the complex relationship between data entities. Currently, most KG embedding models are trained based on negative sampling, i.e., the model aims to maximize some similarity of the connected entities in the KG, while minimizing the similarity of the sampled disconnected entities. Negative sampling helps to reduce the time complexity of model learning by only considering a subset of negative instances, which may fail to deliver stable model performance due to the uncertainty in the sampling procedure. To avoid such deficiency, we propose a new framework for KG embedding—Efficient Non-Sampling Knowledge Graph Embedding (NS-KGE). The basic idea is to consider all of the negative instances in the KG for model learning, and thus to avoid negative sampling. The framework can be applied to square-loss based knowledge graph embedding models or models whose loss can be converted to a square loss. A natural side-effect of this non-sampling strategy is the increased computational complexity of model learning. To solve the problem, we leverage mathematical derivations to reduce the complexity of non-sampling loss function, which eventually provides us both better efficiency and better accuracy in KG embedding compared with existing models. Experiments on benchmark datasets show that our NS-KGE framework can achieve a better performance on efficiency and accuracy over traditional negative sampling based models, and that the framework is applicable to a large class of knowledge graph embedding models.
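The complexity reduction mentioned above can be illustrated with a small sketch. For a square loss, the expensive part is the sum of squared scores over every (head, tail) pair. Assuming a DistMult-style score (an assumption made here for illustration, not necessarily the model or derivation used in the paper), that sum can be rearranged so that heads and tails are aggregated separately. All function names below are hypothetical.

```python
from itertools import product


def score(h, r, t):
    """DistMult-style score sum_d h[d] * r[d] * t[d] (assumed scoring function)."""
    return sum(hd * rd * td for hd, rd, td in zip(h, r, t))


def naive_sum_sq(heads, r, tails):
    """Sum of squared scores over every (head, tail) pair: O(|H| * |T| * d)."""
    return sum(score(h, r, t) ** 2 for h, t in product(heads, tails))


def gram(vectors):
    """d x d matrix whose (i, j) entry is sum_v v[i] * v[j]."""
    d = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(d)] for i in range(d)]


def factored_sum_sq(heads, r, tails):
    """Same quantity computed from two Gram matrices: O((|H| + |T|) * d^2)."""
    gh, gt = gram(heads), gram(tails)
    d = len(r)
    return sum(r[i] * r[j] * gh[i][j] * gt[i][j] for i in range(d) for j in range(d))


# Toy embeddings, made up for this example.
heads = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
tails = [[2.0, 1.0], [-1.0, 4.0]]
rel = [0.5, -2.0]

naive = naive_sum_sq(heads, rel, tails)
fast = factored_sum_sq(heads, rel, tails)
```

Both functions return the same value, but the factored version never enumerates the head-tail pairs, which is what makes training on all negative instances affordable in this simplified setting.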

Structure-Augmented Text Representation Learning for Efficient Knowledge Graph Completion

Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks, but these graphs are usually incomplete, urging auto-completion of them (a.k.a. knowledge graph completion). Prevalent graph embedding approaches, e.g., TransE, learn structured knowledge via representing graph elements (i.e., entities/relations) into dense embeddings and capturing their triple-level relationship with spatial distance. However, they are hardly generalizable to the elements never visited in training and are intrinsically vulnerable to graph incompleteness. In contrast, textual encoding approaches, e.g., KG-BERT, resort to graph triple’s text and triple-level contextualized representations. They are generalizable enough and robust to the incompleteness, especially when coupled with pre-trained encoders. But two major drawbacks limit the performance: (1) high overheads due to the costly scoring of all possible triples in inference, and (2) a lack of structured knowledge in the textual encoder. In this paper, we follow the textual encoding paradigm and aim to alleviate its drawbacks by augmenting it with graph embedding techniques – a complementary hybrid of both paradigms. Specifically, we partition each triple into two asymmetric parts as in translation-based graph embedding approach, and encode both parts into contextualized representations by a Siamese-style textual encoder. Built upon the representations, our model employs both deterministic classifier and spatial measurement for representation and structure learning respectively. It thus reduces the overheads by reusing graph elements’ embeddings to avoid combinatorial explosion, and enhances structured knowledge by exploring the spatial characteristics. Moreover, we develop a self-adaptive ensemble scheme to further improve the performance by incorporating triple scores from an existing graph embedding model. 
In experiments, we achieve state-of-the-art performance on three benchmarks and a zero-shot dataset for link prediction, with highlights of inference costs reduced by 1-2 orders of magnitude compared to a sophisticated textual encoding method.
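As a minimal sketch of the "spatial distance" used by translation-based graph embeddings such as TransE, a triple (head, relation, tail) is considered plausible when head + relation lands close to tail. The vectors below are made up for illustration, and the L2 norm is one common choice (an L1 norm is also used in practice); this does not reproduce the hybrid model proposed in the paper.

```python
import math


def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-dimensional embeddings.
paris = [1.0, 2.0]
capital_of = [0.5, -1.0]
france = [1.5, 1.0]
germany = [4.0, 1.0]

true_triple = transe_distance(paris, capital_of, france)
false_triple = transe_distance(paris, capital_of, germany)
```

Here the true triple has distance zero because the toy vectors were chosen so that paris + capital_of equals france exactly, while the corrupted triple is farther away.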

An Adversarial Transfer Network for Knowledge Representation Learning

Knowledge representation learning has received a lot of attention in the past few years. The success of existing methods heavily relies on the quality of knowledge graphs. The entities with few triplets tend to be learned with less expressive power. Fortunately, there are many knowledge graphs constructed from various sources, the representations of which could contain much information. We propose an adversarial embedding transfer network ATransN, which transfers knowledge from one or more teacher knowledge graphs to a target one through an aligned entity set without explicit data leakage. Specifically, we add soft constraints on aligned entity pairs and neighbours to the existing knowledge representation learning methods. To handle the problem of possible distribution differences between teacher and target knowledge graphs, we introduce an adversarial adaption module. The discriminator of this module evaluates the degree of consistency between the embeddings of an aligned entity pair. The consistency score is then used as the weights of soft constraints. It is not necessary to acquire the relations and triplets in teacher knowledge graphs because we only utilize the entity representations. Knowledge graph completion results show that ATransN achieves better performance against baselines without transfer on three datasets, CN3l, WK3l, and DWY100k. The ablation study demonstrates that ATransN can bring steady and consistent improvement in different settings. The extension of combining other knowledge graph embedding algorithms and the extension with three teacher graphs display the promising generalization of the adversarial transfer network.

Mixed-Curvature Multi-Relational Graph Neural Network for Knowledge Graph Completion

Knowledge graphs (KGs) have gradually become valuable assets for many AI applications. In a KG, a node denotes an entity, and an edge (or link) denotes a relationship between the entities represented by the nodes. Knowledge graph completion infers and predicts missing edges in a KG automatically. Knowledge graph embeddings have shed light on addressing this task. Recent research embeds KGs in hyperbolic (negatively curved) space instead of conventional Euclidean (zero curved) space and is effective in capturing hierarchical structures. However, as multi-relational graphs, KGs are not structured uniformly and display intrinsic heterogeneous structures. They usually contain rich types of structures, such as hierarchical and cyclic typed structures. Embedding KGs in single-curvature space, such as Euclidean or hyperbolic space, overlooks the intrinsic heterogeneous structures of KGs, and therefore cannot accurately capture their structures. To address this issue, we propose Mixed-Curvature Multi-Relational Graph Neural Network (M2GNN), a generic approach that embeds multi-relational KGs in a mixed-curvature space for knowledge graph completion. Specifically, we define and construct a mixed-curvature space through a product manifold combining multiple single-curvature spaces (e.g., spherical, hyperbolic, or Euclidean) with the purpose of modeling a variety of structures. However, constructing a mixed-curvature space typically requires manually defining the fixed curvatures, which needs domain knowledge and additional data analysis. Improperly defined curvature space also cannot capture the structures of KGs accurately. To address this problem, we set mixed-curvatures as trainable parameters to better capture the underlying structures of the KGs. Furthermore, we propose a Graph Neural Updater by leveraging the heterogeneous relational context in mixed-curvature space to improve the quality of the embedding. 
Experiments on three KG datasets demonstrate that the proposed M2GNN can outperform its single geometry counterpart as well as state-of-the-art embedding methods on the KG completion task.
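The product-manifold construction described above can be sketched as follows. A point has one component per single-curvature space, and the distance between two points combines the per-component geodesic distances as a root sum of squares. This sketch fixes the curvatures at 0, +1, and -1 for simplicity, whereas the abstract describes curvatures as trainable parameters; the function names are hypothetical.

```python
import math


def euclidean_dist(x, y):
    """Distance in flat (zero-curvature) space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def spherical_dist(x, y):
    """Geodesic distance on the unit sphere (curvature +1); inputs are unit vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def poincare_dist(x, y):
    """Geodesic distance in the Poincare ball (curvature -1); inputs have norm < 1."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * sq / ((1.0 - nx) * (1.0 - ny)))


def product_dist(p, q):
    """Distance on the product manifold; p and q are (euclidean, spherical, hyperbolic) tuples."""
    parts = (
        euclidean_dist(p[0], q[0]),
        spherical_dist(p[1], q[1]),
        poincare_dist(p[2], q[2]),
    )
    return math.sqrt(sum(d * d for d in parts))


p = ([0.0, 0.0], [1.0, 0.0], [0.0, 0.0])
q = ([3.0, 4.0], [0.0, 1.0], [0.5, 0.0])
d_pq = product_dist(p, q)
```

For these toy points the component distances are 5 (Euclidean), pi/2 (spherical), and ln 3 (hyperbolic), so each space contributes its own notion of closeness to the combined distance.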

SESSION: Session: Relevance, Ranking and Recommendations

Elo-MMR: A Rating System for Massive Multiplayer Competitions

Skill estimation mechanisms, colloquially known as rating systems, play an important role in competitive sports and games. They provide a measure of player skill, which incentivizes competitive performances and enables balanced match-ups. In this paper, we present a novel Bayesian rating system for contests with many participants. It is widely applicable to competition formats with discrete ranked matches, such as online programming competitions, obstacle courses races, and video games. The system’s simplicity allows us to prove theoretical bounds on its robustness and runtime. In addition, we show that it is incentive-compatible: a player who seeks to maximize their rating will never want to underperform. Experimentally, the rating system surpasses existing systems in prediction accuracy, and computes faster than existing systems by up to an order of magnitude.

DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems

Learning effective feature crosses is the key behind building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples, DCN showed limited expressiveness in its cross network at learning more predictive feature interactions. Despite significant research progress made, many deep learning models in production still rely on traditional feed-forward neural networks to learn feature crosses inefficiently.

In light of the pros/cons of DCN and existing feature interaction learning approaches, we propose an improved framework DCN-V2 to make DCN more practical in large-scale industrial settings. In a comprehensive experimental study with extensive hyper-parameter search and model tuning, we observed that DCN-V2 approaches outperform all the state-of-the-art algorithms on popular benchmark datasets. The improved DCN-V2 is more expressive yet remains cost efficient at feature interaction learning, especially when coupled with a mixture of low-rank architecture. DCN-V2 is simple, can be easily adopted as building blocks, and has delivered significant offline accuracy and online business metrics gains across many web-scale learning to rank systems at Google. Our code and tutorial are open-sourced as part of TensorFlow Recommenders (TFRS).

A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Learning

Matrix learning is at the core of many machine learning problems. A number of real-world applications such as collaborative filtering and text mining can be formulated as low-rank matrix completion problems, which recover an incomplete matrix using low-rank assumptions. To ensure that the matrix solution has a low rank, a recent trend is to use nonconvex regularizers that adaptively penalize singular values. They offer good recovery performance and have nice theoretical properties, but are computationally expensive due to repeated access to individual singular values. In this paper, based on the key insight that adaptive shrinkage on singular values improves empirical performance, we propose a new nonconvex low-rank regularizer called the “nuclear norm minus Frobenius norm” regularizer, which is scalable, adaptive and sound. We first show that it provably holds the adaptive shrinkage property. Further, we discover its factored form, which bypasses the computation of singular values and allows fast optimization by general optimization algorithms. Stable recovery and convergence are guaranteed. Extensive low-rank matrix completion experiments on a number of synthetic and real-world data sets show that the proposed method obtains state-of-the-art recovery performance while being the fastest in comparison to existing low-rank matrix learning methods.
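To make the regularizer's name concrete, the following sketch evaluates the plain difference between the nuclear norm (sum of singular values) and the Frobenius norm (square root of the sum of squared singular values), taking the singular values as given. Any weighting between the two terms, and the factored form that avoids singular values altogether, are not shown; the function names are hypothetical.

```python
import math


def nuclear_norm(singular_values):
    """Nuclear norm: sum of singular values."""
    return sum(singular_values)


def frobenius_norm(singular_values):
    """Frobenius norm expressed through singular values."""
    return math.sqrt(sum(s * s for s in singular_values))


def nuclear_minus_frobenius(singular_values):
    """Unweighted 'nuclear norm minus Frobenius norm' penalty."""
    return nuclear_norm(singular_values) - frobenius_norm(singular_values)


rank_one = nuclear_minus_frobenius([3.0, 0.0, 0.0])
rank_two = nuclear_minus_frobenius([3.0, 4.0, 0.0])
rank_three = nuclear_minus_frobenius([1.0, 1.0, 1.0])
```

The penalty vanishes for a rank-one spectrum and grows as the energy spreads over more singular values, which is the sense in which it favors low-rank solutions.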

An Adversarial Imitation Click Model for Information Retrieval

Modern information retrieval systems, including web search, ads placement, and recommender systems, typically rely on learning from user feedback. Click models, which study how users interact with a ranked list of items, provide a useful understanding of user feedback for learning ranking models. Constructing the “right” dependencies is the key to any successful click model. However, probabilistic graphical models (PGMs) have to rely on manually assigned dependencies, and oversimplify user behaviors. Existing neural network based methods improve on PGMs by enhancing the expressive ability and allowing flexible dependencies, but still suffer from exposure bias and inferior estimation. In this paper, we propose a novel framework, Adversarial Imitation Click Model (AICM), based on imitation learning. Firstly, we explicitly learn the reward function that recovers users’ intrinsic utility and underlying intentions. Secondly, we model user interactions with a ranked list as a dynamic system instead of one-step click prediction, alleviating the exposure bias problem. Finally, we minimize the JS divergence through adversarial training and learn a stable distribution of click sequences, which makes AICM generalize well across different distributions of ranked lists. A theoretical analysis indicates that AICM reduces the exposure bias from O(T^2) to O(T). Our studies on a public web search dataset show that AICM not only outperforms state-of-the-art models in traditional click metrics but also achieves superior performance in addressing the exposure bias and recovering the underlying patterns of click sequences.
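For readers unfamiliar with the JS (Jensen-Shannon) divergence mentioned above, the sketch below computes it for two discrete distributions using its standard definition as the average KL divergence to the mixture. It only illustrates the quantity itself; the adversarial training procedure that minimizes it in AICM is not reproduced, and the example distributions are made up.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence of p and q to their mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical click distributions over three ranked positions.
observed = [0.6, 0.3, 0.1]
generated = [0.4, 0.4, 0.2]

same = js_divergence(observed, observed)
different = js_divergence(observed, generated)
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike the KL divergence, this measure is symmetric and bounded by ln 2, which it reaches when the two distributions have disjoint support.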

CrowdGP: a Gaussian Process Model for Inferring Relevance from Crowd Annotations

Test collection has been a crucial factor for developing information retrieval systems. Constructing a test collection requires annotators to assess the relevance of massive query-document pairs. Relevance annotations acquired through crowdsourcing platforms alleviate the enormous cost of this process but they are often noisy. Existing models to denoise crowd annotations mostly assume that annotations are generated independently, based on which a probabilistic graphical model is designed to model the annotation generation process. However, tasks are often correlated with each other in reality. It is an understudied problem whether and how task correlation helps in denoising crowd annotations.

In this paper, we relax the independence assumption to model task correlation in terms of relevance. We propose a new crowd annotation generation model named CrowdGP, where true relevance labels, annotator competence, annotator’s bias towards relevancy, task difficulty, and task’s bias towards relevancy are modelled through a Gaussian process and multiple Gaussian variables respectively. The CrowdGP model shows better performance in terms of inferring true relevance labels compared with state-of-the-art baselines on two crowdsourcing datasets on relevance. The experiments also demonstrate its effectiveness in terms of selecting new tasks for future crowd annotation, which is a new functionality of CrowdGP. Ablation studies indicate that the effectiveness is attributed to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries.

SESSION: Session: Urban Computing

Fine-Grained Urban Flow Prediction

Urban flow prediction benefits smart cities in many aspects, such as traffic management and risk assessment. However, a critical prerequisite for these benefits is having fine-grained knowledge of the city. Thus, unlike previous works that are limited to coarse-grained data, we extend the horizon of urban flow prediction to fine granularity which raises specific challenges: 1) the predominance of inter-grid transitions observed in fine-grained data makes it more complicated to capture the spatial dependencies among grid cells at a global scale; 2) it is very challenging to learn the impact of external factors (e.g., weather) on a large number of grid cells separately. To address these two challenges, we present a Spatio-Temporal Relation Network (STRN) to predict fine-grained urban flows. First, a backbone network is used to learn high-level representations for each cell. Second, we present a Global Relation Module (GloNet) that captures global spatial dependencies much more efficiently compared to existing methods. Third, we design a Meta Learner that takes external factors and land functions (e.g., POI density) as inputs to produce meta knowledge and boost model performances. We conduct extensive experiments on two real-world datasets. The results show that STRN reduces the errors by 7.1% to 11.5% compared to the state-of-the-art method while using much fewer parameters. Moreover, a cloud-based system called UrbanFlow 3.0 has been deployed to show the practicality of our approach.

AutoSTG: Neural Architecture Search for Predictions of Spatio-Temporal Graph

Spatio-temporal graphs are important structures to describe urban sensory data, e.g., traffic speed and air quality. Predicting over spatio-temporal graphs enables many essential applications in intelligent cities, such as traffic management and environment analysis. Recently, many deep learning models have been proposed for spatio-temporal graph prediction and achieved significant results. However, designing neural networks requires rich domain knowledge and expert efforts. To this end, we study automated neural architecture search for spatio-temporal graphs with the application to urban traffic prediction, which meets two challenges: 1) how to define search space for capturing complex spatio-temporal correlations; and 2) how to learn network weight parameters related to the corresponding attributed graph of a spatio-temporal graph.

To tackle these challenges, we propose a novel framework, entitled AutoSTG, for automated spatio-temporal graph prediction. In our AutoSTG, spatial graph convolution and temporal convolution operations are adopted in our search space to capture complex spatio-temporal correlations. Besides, we employ the meta learning technique to learn the adjacency matrices of spatial graph convolution layers and kernels of temporal convolution layers from the meta knowledge of the attributed graph. And specifically, such meta knowledge is learned by a graph meta knowledge learner that iteratively aggregates knowledge on the attributed graph. Finally, extensive experiments were conducted on two real-world benchmark datasets to demonstrate that AutoSTG can find effective network architectures and achieve state-of-the-art results. To the best of our knowledge, we are the first to study neural architecture search for spatio-temporal graphs.

Intelligent Electric Vehicle Charging Recommendation Based on Multi-Agent Reinforcement Learning

The Electric Vehicle (EV) has become a preferable choice in the modern transportation system due to its environmental and energy sustainability. However, in many large cities, EV drivers often fail to find proper spots for charging, because of the limited charging infrastructure and the spatiotemporally unbalanced charging demands. The recent emergence of deep reinforcement learning provides great potential to improve the charging experience from various aspects over a long-term horizon. In this paper, we propose a framework, named Multi-Agent Spatio-Temporal Reinforcement Learning (Master), for intelligently recommending publicly accessible charging stations by jointly considering various long-term spatiotemporal factors. Specifically, by regarding each charging station as an individual agent, we formulate this problem as a multi-objective multi-agent reinforcement learning task. We first develop a multi-agent actor-critic framework with a centralized attentive critic to coordinate the recommendation between geo-distributed agents. Moreover, to quantify the influence of future potential charging competition, we introduce a delayed access strategy to exploit the knowledge of future charging competition during training. After that, to effectively optimize multiple learning objectives, we extend the centralized attentive critic to multi-critics and develop a dynamic gradient re-weighting strategy to adaptively guide the optimization direction. Finally, extensive experiments on two real-world datasets demonstrate that Master achieves the best comprehensive performance compared with nine baseline approaches.

STUaNet: Understanding Uncertainty in Spatiotemporal Collective Human Mobility

The high dynamics and heterogeneous interactions in complicated urban systems have raised the issue of uncertainty quantification in spatiotemporal human mobility, which supports critical decision-making in risk-aware web applications such as urban event prediction, where fluctuations are of significant interest. Although uncertainty quantifies the potential variations around prediction results, traditional learning schemes always lack uncertainty labels, and conventional uncertainty quantification approaches mostly rely upon statistical estimations with Bayesian Neural Networks or ensemble methods. However, they have never involved any spatiotemporal evolution of uncertainties under various contexts, and they also suffer from the poor efficiency of statistical uncertainty estimation, which requires training models multiple times. To provide high-quality uncertainty quantification for spatiotemporal forecasting, we propose an uncertainty learning mechanism to simultaneously estimate internal data quality and quantify external uncertainty regarding various contextual interactions. To address the lack of uncertainty labels, we propose a hierarchical data turbulence scheme where we can actively inject controllable uncertainty for guidance, and hence provide insights to both uncertainty quantification and weakly supervised learning. Finally, we re-calibrate and boost the prediction performance by devising a gated-based bridge to adaptively leverage the learned uncertainty into predictions. Extensive experiments on three real-world spatiotemporal mobility sets have corroborated the superiority of our proposed model in terms of both forecasting and uncertainty quantification.

DeepFEC: Energy Consumption Prediction under Real-World Driving Conditions for Smart Cities

Air pollution is a serious problem all over the world, and analysing and predicting vehicle energy consumption has become a major concern. Vehicle energy consumption depends not only on speed but also on a number of external factors such as road topology, traffic, and driving style. Obtaining the cost for each link (i.e., link energy consumption) in road networks plays a key role in the energy-optimal route planning process. This paper presents a novel framework that identifies vehicle/driving environment-dependent factors to predict energy consumption over a road network based on historical consumption data for different vehicle types. We design a deep-learning-based structure, called DeepFEC, to accurately forecast energy consumption on every road in a city based on real traffic conditions. A residual neural network and a recurrent neural network are employed to model the spatial and temporal closeness, respectively. Static vehicle data reflecting vehicle type, vehicle weight, engine configuration and displacement are also learned. The outputs of these neural networks are dynamically aggregated to improve the spatially correlated time series data forecasting. Extensive experiments conducted on a diverse fleet consisting of 264 gasoline vehicles, 92 Hybrid Electric Vehicles, and 27 Plug-in Hybrid Electric Vehicles/Electric Vehicles driven on the Michigan road network show that our proposed deep learning algorithm significantly outperforms the state-of-the-art prediction algorithms. To make the results reproducible, the code, the data used, and details of the experimental setup are made available online at https://github.com/ElmiSay/DeepFEC.

SESSION: Session: Crowdsourcing

Separ: Towards Regulating Future of Work Multi-Platform Crowdworking Environments with Privacy Guarantees

Crowdworking platforms provide the opportunity for diverse workers to execute tasks for different requesters. The popularity of the “gig” economy has given rise to independent platforms that provide competing and complementary services. Workers, as well as requesters with specific tasks, may need to work for or avail themselves of the services of multiple platforms, resulting in the rise of multi-platform crowdworking systems. Recently, there has been increasing interest by governmental, legal and social institutions to enforce regulations, such as minimal and maximal work hours, on crowdworking platforms. Platforms within multi-platform crowdworking systems, therefore, need to collaborate to enforce cross-platform regulations. While collaborating to enforce global regulations requires the transparent sharing of information about tasks and their participants, the privacy of all participants needs to be preserved. In this paper, we propose an overall vision exploring the regulation, privacy, and architecture dimensions for the future of work multi-platform crowdworking environments. We then present Separ, a multi-platform crowdworking system that enforces a large sub-space of practical global regulations on a set of distributed independent platforms in a privacy-preserving manner. Separ enforces privacy using lightweight and anonymous tokens, while transparency is achieved using fault-tolerant blockchain ledgers shared among multiple platforms. The privacy guarantees of Separ against covert adversaries are formalized and thoroughly demonstrated, while the experiments reveal the efficiency of Separ in terms of performance and scalability.

Online Label Aggregation: A Variational Bayesian Approach

Noisy labeled data is more a norm than a rarity for crowdsourced content. It is effective to distill noise and infer correct labels by aggregating results from crowd workers. To ensure time relevance and overcome slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution via subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and designs a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels through the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points for synthetic and real-world datasets, respectively.

Peer Grading the Peer Reviews: A Dual-Role Approach for Lightening the Scholarly Paper Review Process

Scientific peer review is pivotal to maintain quality standards for academic publication. The effectiveness of the reviewing process is currently being challenged by the rapid increase of paper submissions in various conferences. Those venues need to recruit a large number of reviewers of different levels of expertise and background. The submitted reviews often do not meet the conformity standards of the conferences. Such a situation poses an ever-bigger burden on the meta-reviewers when trying to reach a final decision.

In this work, we propose a human-AI approach that estimates the conformity of reviews to the conference standards. Specifically, we ask peers to grade each other’s reviews anonymously with respect to important criteria of review conformity such as sufficient justification and objectivity. We introduce a Bayesian framework that learns the conformity of reviews from both the peer grading process, historical reviews and decisions of a conference, while taking into account grading reliability. Our approach helps meta-reviewers easily identify reviews that require clarification and detect submissions requiring discussions while not inducing additional overhead from reviewers. Through a large-scale crowdsourced study where crowd workers are recruited as graders, we show that the proposed approach outperforms machine learning or review grades alone and that it can be easily integrated into existing peer review systems.

Multi-Session Diversity to Improve User Satisfaction in Web Applications

In various Web applications, users consume content in a series of sessions. That is prevalent in online music listening, where a session is a channel and channels are listened to in sequence, or in crowdsourcing, where a session is a set of tasks and task sets are completed in sequence. Content diversity can be defined in more than one way, e.g., based on artists or genres for music, or on requesters or rewards in crowdsourcing. A user may prefer to experience diversity within or across sessions. Naturally, intra-session diversity is set-based, whereas inter-session diversity is sequence-based. This novel multi-session diversity gives rise to four bi-objective problems with the goal of minimizing or maximizing inter and intra diversities. Given the hardness of those problems, we propose to formulate a constrained optimization problem that optimizes inter diversity, subject to the constraint of intra diversity. We develop an efficient algorithm to solve our problem. Our experiments with human subjects on two real datasets, music and crowdsourcing, show that our diversity formulations do serve different user needs and yield high user satisfaction. Our large data experiments on real and synthetic data empirically demonstrate that our solution satisfies the theoretical bounds and is highly scalable compared to baselines.

What do You Mean? Interpreting Image Classification with Crowdsourced Concept Extraction and Analysis

Global interpretability is a vital requirement for image classification applications. Existing interpretability methods mainly explain a model behavior by identifying salient image patches, which require manual efforts from users to make sense of, and also do not typically support model validation with questions that investigate multiple visual concepts. In this paper, we introduce a scalable human-in-the-loop approach for global interpretability. Salient image areas identified by local interpretability methods are annotated with semantic concepts, which are then aggregated into a tabular representation of images to facilitate automatic statistical analysis of model behavior. We show that this approach answers interpretability needs for both model validation and exploration, and provides semantically more diverse, informative, and relevant explanations while still allowing for scalable and cost-efficient execution.

SESSION: Session: Question Answering Systems

FINN: Feedback Interactive Neural Network for Intent Recommendation

Intent recommendation, as a new type of recommendation service, is to recommend a predicted query to a user in the search box when the user lands on the homepage of an application without any input. Such an intent recommendation service has been widely used in e-commerce applications, such as Taobao and Amazon. The most difficult part is to accurately predict user’s search intent, so as to improve user’s search experience and reduce tedious typing especially on mobile phones. Existing methods mainly rely on user’s historical search behaviors to estimate user’s current intent, but they do not make full use of the feedback information between the user and the intent recommendation system. Essentially, feedback information is the key factor for capturing dynamics of user search intents in real time. Therefore, we propose a feedback interactive neural network (FINN) to estimate user’s potential search intent more accurately, by making full use of the feedback interaction with the following three parts: 1) Both positive feedback (PF) and negative feedback (NF) information are collected simultaneously. PF includes user’s search intent information that the user is interested in, such as the query used and the title clicked. NF indicates user’s search intent information that the user is not interested in, such as the query recommended by the system but not clicked by the user. 2) A filter-attention (FAT) structure is proposed to filter out the noisy feedback and get more accurate positive and negative intentions of users. 3) A multi-task learning is designed to match the correlation between the user’s search intent and query candidates, which can learn and recommend query candidates from user interests and disinterests associated with each user. Finally, extensive experiments have been conducted by comparing with state-of-the-art methods, and it shows that our FINN method can achieve the best performance using the Taobao mobile application dataset. 
In addition, online experimental results show that our method improves the CTR by 8% and attracts 7.98% more users than the baseline.

Weakly-Supervised Question Answering with Effective Rank and Weighted Loss over Candidates

We study the weakly supervised question answering problem. Weakly supervised question answering aims to learn how the questions should be answered directly from the <question, answer> pairs without golden solutions/evidences, which makes question answering models much easier to scale to many domains. However, in the weak supervision setup, a question typically involves many candidate solutions, and the spuriousness of candidate solutions will hurt the performance of the question answering models. In this paper, we present an effective method to learn a question answering model in a weakly supervised way. Specifically, in order to reduce the spuriousness of candidate solutions used for training, we design several simple yet effective scoring functions to rank the candidate solutions. Despite its simplicity, this ranking process can improve the quality of the training data significantly, with fewer spurious candidates left. Then, different from previous approaches that either treat all candidates equally for training or only select the candidate with the largest likelihood in each iteration, we formulate this problem as a multi-task learning problem by weighing the losses computed from top-k candidates. Experimental results show that our method can outperform previous approaches on both semantic parsing and machine reading comprehension tasks.
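The idea of weighing the losses computed from the top-k ranked candidates can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the softmax over rank scores used here, together with the names `weighted_topk_loss`, `log_likelihoods`, and `rank_scores`, are assumptions made for illustration and not the authors' formulation.

```python
import math


def weighted_topk_loss(log_likelihoods, rank_scores, k=3):
    """Weighted negative log-likelihood over the k best-ranked candidates.

    log_likelihoods[i] is the model's log-probability of candidate i and
    rank_scores[i] is its score from some ranking function (both hypothetical).
    """
    # Keep only the k candidates with the highest rank score.
    order = sorted(range(len(rank_scores)),
                   key=lambda i: rank_scores[i], reverse=True)[:k]
    # Assumed weighting: a softmax over the selected rank scores.
    top = max(rank_scores[i] for i in order)
    exp_scores = [math.exp(rank_scores[i] - top) for i in order]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    # Weighted sum of per-candidate losses.
    return sum(w * -log_likelihoods[i] for w, i in zip(weights, order))
```

With `k=1` this reduces to training on the single best-ranked candidate, and with equal rank scores it averages the losses of the selected candidates, which are the two extremes the abstract contrasts with.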

Controlling the Risk of Conversational Search via Reinforcement Learning

Users often formulate their search queries and questions in immature language, without well-developed keywords and complete structures. Such queries are likely to fail to express their true information needs and raise ambiguity, as fragmentary language often yields various interpretations and aspects. This gives search engines a hard time processing and understanding the query, and eventually leads to unsatisfactory retrieval results. An alternative to answering directly when facing an ambiguous query is to proactively ask the user clarifying questions. Recent years have seen many works and shared tasks from both the NLP and IR communities on identifying the need for asking clarifying questions and on methodologies to generate them. A fact often neglected by these works is that, although the need for clarifying questions is sometimes correctly recognized, the clarifying questions these systems generate are still off-topic and dissatisfying to users, and may just cause users to leave the conversation.

In this work, we propose a risk-aware conversational search agent model to balance the risk of answering the user’s query and asking clarifying questions. The agent is fully aware that asking clarifying questions can potentially collect more information from the user, but it will compare all the choices it has and evaluate the risks. Only then will it decide between answering and asking. To demonstrate that our system is able to retrieve better answers, we conduct experiments on the MSDialog dataset, which contains real-world customer service conversations from the Microsoft products community. We also propose a reinforcement learning strategy that allows us to train our model on the original dataset directly and saves us from any further data annotation efforts. Our experimental results show that our risk-aware conversational search agent is able to significantly outperform strong non-risk-aware baselines.

Joint Spatio-Textual Reasoning for Answering Tourism Questions

Our goal is to answer real-world tourism questions that seek Points-of-Interest (POI) recommendations. Such questions express various kinds of spatial and non-spatial constraints, necessitating a combination of textual and spatial reasoning. In response, we develop the first joint spatio-textual reasoning model, which combines geo-spatial knowledge with information in textual corpora to answer questions. We first develop a modular spatial-reasoning network that uses geo-coordinates of location names mentioned in a question, and of candidate answer POIs, to reason over only spatial constraints. We then combine our spatial-reasoner with a textual reasoner in a joint model and present experiments on a real world POI recommendation task. We report substantial improvements over existing models without joint spatio-textual reasoning. To the best of our knowledge, we are the first to develop a joint QA model that combines reasoning over external geo-spatial knowledge along with textual reasoning.

Adapting to Context-Aware Knowledge in Natural Conversation for Multi-Turn Response Selection

Virtual assistants aim to build a human-like conversational agent. However, current human-machine conversations still do not seem intelligent enough to users to sustain a continued dialog over time. Responses from agents are often inconsistent, uninformative, less engaging, and even memoryless. In recent years, most researchers have tried to incorporate conversation context and external knowledge, e.g. wiki pages and knowledge graphs, into models that focus only on solving some special conversation problems from local perspectives. Few researchers are dedicated to the whole capability of a conversational agent endowed with the abilities of not only passively reacting to the conversation but also proactively leading it.

In this paper, we first explore the essence of conversations among humans by analyzing real dialog records. We find that some conversations revolve around the same context and topic, and some require additional information or even move on to a new topic. Based on that, we identify three conversation modes, shown in Figure 1, and study how to adapt to them for a continuous conversation. To this end, we define “Adaptive Knowledge-Grounded Conversations” (AKGCs), where the knowledge is to ground the conversation within a multi-turn context by adapting to three modes. To achieve AKGC, a model called MNDB is proposed to model natural dialog behaviors for multi-turn response selection. To ensure a consistent response, MNDB constructs a multi-turn context flow. Then, to mimic user behaviors of incorporating knowledge in natural conversations, we design a ternary-grounding network along with the context flow. In this network, to gain the ability to adapt to diversified conversation modes, we exploit multi-view semantic relations among response candidates, context and knowledge. Thus, three adaptive matching signals are extracted for final response selection. Evaluation results on two benchmarks indicate that MNDB can significantly outperform state-of-the-art models.

SESSION: Session: Politics on the Web

Understanding the Complexity of Detecting Political Ads

Online political advertising has grown significantly over the last few years. To monitor online sponsored political discourse, companies such as Facebook, Google, and Twitter have created public Ad Libraries collecting the political ads that run on their platforms. Currently, both policymakers and platforms are debating further restrictions on political advertising to deter misuses.

This paper investigates whether we can reliably distinguish political ads from non-political ads. We take an empirical approach to analyze what kind of ads are deemed political by ordinary people and what kind of ads lead to disagreement. Our results show a significant disagreement between what ad platforms, ordinary people, and advertisers consider political and suggest that this disagreement mainly comes from diverging opinions on which ads address social issues. Overall our results imply that it is important to consider social issue ads as political, but they also complicate political advertising regulations.

War of Words II: Enriched Models of Law-Making Processes

The European Union law-making process is an instance of a peer-production system. We mine a rich dataset of law edits and introduce models predicting their adoption by parliamentary committees. Edits are proposed by parliamentarians, and they can be in conflict with edits of other parliamentarians and with the original proposition in the law. Our models combine three different categories of features: (a) Explicit features extracted from data related to the edits, the parliamentarians, and the laws, (b) latent features that capture bi-linear interactions between parliamentarians and laws, and (c) text features of the edits. We show experimentally that this combination enables us to accurately predict the success of the edits. Furthermore, it leads to model parameters that are interpretable, hence provides valuable insight into the law-making process.

Assessing the Effects of Friend-to-Friend Texting on Turnout in the 2018 US Midterm Elections

Recent mobile app technology lets people systematize the process of messaging their friends to urge them to vote. Prior to the most recent US midterm elections in 2018, the mobile app Outvote randomized an aspect of their system, hoping to unobtrusively assess the causal effect of their users’ messages on voter turnout. However, properly assessing this causal effect is hindered by multiple statistical challenges, including attenuation bias due to mismeasurement of subjects’ outcomes and low precision due to two-sided non-compliance with subjects’ assignments. We address these challenges, which are likely to impinge upon any study that seeks to randomize authentic friend-to-friend interactions, by tailoring the statistical analysis to make use of additional data about both users and subjects. Using meta-data of users’ in-app behavior, we reconstruct subjects’ positions in users’ queues. We use this information to refine the study population to more compliant subjects who were higher in the queues, and we do so in a systematic way which optimizes a proxy for the study’s power. To mitigate attenuation bias, we then use ancillary data of subjects’ matches to the voter rolls that lets us refine the study population to one with low rates of outcome mismeasurement. Our analysis reveals statistically significant treatment effects from friend-to-friend mobilization efforts ( 8.3, CI = (1.2, 15.3)) that are among the largest reported in the get-out-the-vote (GOTV) literature. While social pressure from friends has long been conjectured to play a role in effective GOTV treatments, the present study is among the first to assess these effects experimentally. 

Fast Evaluation for Relevant Quantities of Opinion Dynamics

One of the main subjects in the field of social networks is to quantify conflict, disagreement, controversy, and polarization, and some quantitative indicators have been developed to quantify these concepts. However, direct computation of these indicators involves the operations of matrix inversion and multiplication, which make it computationally infeasible for large-scale graphs with millions of nodes. In this paper, by reducing the problem of computing relevant quantities to evaluating ℓ2 norms of some vectors, we present a nearly linear time algorithm to estimate all these quantities. Our algorithm is based on the Laplacian solvers, and has a proved theoretical guarantee of error for each quantity. We execute extensive numerical experiments on a variety of real networks, which demonstrate that our approximation algorithm is efficient and effective, scalable to large graphs having millions of nodes.
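As a rough illustration of replacing matrix inversion with a linear solve followed by an ℓ2-norm evaluation, the sketch below assumes the Friedkin-Johnsen opinion model, which the abstract does not name: equilibrium opinions z satisfy (I + L) z = s for innate opinions s and graph Laplacian L, and polarization is taken as the squared ℓ2 norm of the mean-centred z. Plain conjugate gradient is used as a stand-in for the Laplacian solvers mentioned above; it does not carry the nearly linear time guarantee. All function names are hypothetical.

```python
def apply_i_plus_l(adj, x):
    """Return (I + L) x for an undirected graph given as adjacency lists."""
    return [x[u] + len(adj[u]) * x[u] - sum(x[v] for v in adj[u])
            for u in range(len(adj))]


def solve_equilibrium(adj, s, tol=1e-10, max_iter=1000):
    """Solve (I + L) z = s by conjugate gradient, without forming an inverse."""
    z = [0.0] * len(s)
    r = list(s)                      # residual s - (I + L) z for z = 0
    p = list(r)                      # current search direction
    rs = sum(a * a for a in r)       # squared residual norm
    for _ in range(max_iter):
        if rs < tol:
            break
        ap = apply_i_plus_l(adj, p)
        alpha = rs / sum(a * b for a, b in zip(p, ap))
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(a * a for a in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return z


def polarization(z):
    """Squared l2 norm of the mean-centred equilibrium opinions (assumed definition)."""
    mean = sum(z) / len(z)
    return sum((zi - mean) ** 2 for zi in z)
```

For a two-node graph `adj = [[1], [0]]` with innate opinions `s = [1.0, 0.0]`, `solve_equilibrium(adj, s)` gives equilibrium opinions close to `[2/3, 1/3]`. Each iteration costs one pass over the edges, so the matrix is never stored or inverted.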

Random Walks with Erasure: Diversifying Personalized Recommendations on Social and Information Networks

Most existing personalization systems promote items that match a user’s previous choices or those that are popular among similar users. This results in recommendations that are highly similar to the ones users are already exposed to, resulting in their isolation inside familiar but insulated information silos. In this context, we develop a novel recommendation framework with a goal of improving information diversity using a modified random walk exploration of the user-item graph. We focus on the problem of political content recommendation, while addressing a general problem applicable to personalization tasks in other social and information networks.

For recommending political content on social networks, we first propose a new model to estimate the ideological positions for both users and the content they share, which is able to recover ideological positions with high accuracy. Based on these estimated positions, we generate diversified personalized recommendations using our new random-walk based recommendation algorithm. With experimental evaluations on large datasets of Twitter discussions, we show that our method based on random walks with erasure is able to generate more ideologically diverse recommendations. Our approach does not depend on the availability of labels regarding the bias of users or content producers. With experiments on open benchmark datasets from other social and information networks, we also demonstrate the effectiveness of our method in recommending diverse long-tail items.

SESSION: Session: Graph Models

Soft-mask: Adaptive Substructure Extractions for Graph Neural Networks

For learning graph representations, not all detailed structures within a graph are relevant to the given graph tasks. Task-relevant structures can be localized or sparse, involved only in subgraphs or characterized by the interactions of subgraphs (a hierarchical perspective). A graph neural network should be able to efficiently extract task-relevant structures and be invariant to irrelevant parts, which is challenging for general message passing GNNs. In this work, we propose to learn graph representations from a sequence of subgraphs of the original graph to better capture task-relevant substructures or hierarchical structures and skip noisy parts. To this end, we design a soft-mask GNN layer to extract desired subgraphs through the mask mechanism. The soft-mask is defined in a continuous space to maintain the differentiability and characterize the weights of different parts. Compared with existing subgraph or hierarchical representation learning methods and graph pooling operations, the soft-mask GNN layer is not limited by a fixed sample or drop ratio, and is therefore more flexible in extracting subgraphs of arbitrary sizes. Extensive experiments on public graph benchmarks show that the soft-mask mechanism brings performance improvements. It also provides interpretability: visualizing the values of masks in each layer gives insight into the structures learned by the model.

Graph Contrastive Learning with Adaptive Augmentation

Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes—a crucial component in CL—remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structures and attributes of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art baselines and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
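The contrast between uniform and centrality-aware edge dropping can be sketched as follows. This is an illustrative sketch only: the choice of degree as the centrality measure, the log-mean edge score, the normalisation, and the parameter names `base_p` and `max_p` are assumptions for the example, not details taken from the abstract.

```python
import math
import random


def edge_drop_probs(edges, base_p=0.3, max_p=0.7):
    """Give each edge a drop probability that shrinks as its endpoints get more central."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Assumed edge importance: log of the mean endpoint degree.
    score = [math.log((degree[u] + degree[v]) / 2) for u, v in edges]
    s_max, s_mean = max(score), sum(score) / len(score)
    if s_max - s_mean < 1e-12:
        # All edges are equally important: fall back to uniform dropping.
        return [min(base_p, max_p)] * len(edges)
    # The most important edge gets probability 0; others scale up, capped at max_p.
    return [min((s_max - s) / (s_max - s_mean) * base_p, max_p) for s in score]


def augment(edges, rng, **kwargs):
    """Sample one augmented view by dropping each edge with its own probability."""
    probs = edge_drop_probs(edges, **kwargs)
    return [e for e, p in zip(edges, probs) if rng.random() >= p]
```

Calling `augment(edges, random.Random(0))` twice with different seeds would yield the two views that a contrastive objective compares. Edges between high-degree nodes survive more often than peripheral ones, whereas uniform dropping would treat them alike.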

SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism

Graph representation learning has attracted increasing research attention. However, most existing studies fuse all structural features and node attributes to provide an overarching view of graphs, neglecting finer substructures’ semantics, and suffering from interpretation enigmas. This paper presents a novel hierarchical subgraph-level selection and embedding-based graph neural network for graph classification, namely SUGAR, to learn more discriminative subgraph representations and respond in an explanatory way. SUGAR reconstructs a sketched graph by extracting striking subgraphs as the representative part of the original graph to reveal subgraph-level patterns. To adaptively select striking subgraphs without prior knowledge, we develop a reinforcement pooling mechanism, which improves the generalization ability of the model. To differentiate subgraph representations among graphs, we present a self-supervised mutual information mechanism to encourage subgraph embedding to be mindful of the global graph structural properties by maximizing their mutual information. Extensive experiments on six typical bioinformatics datasets demonstrate a significant and consistent improvement in model quality with competitive performance and interpretability.

Strongly Local Hypergraph Diffusions for Clustering and Semi-supervised Learning

Hypergraph-based machine learning methods are now widely recognized as important for modeling and using higher-order and multiway relationships between data objects. Local hypergraph clustering and semi-supervised learning specifically involve finding a well-connected set of nodes near a given set of labeled vertices. Although many methods for local graph clustering exist, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. To tackle these issues, this paper proposes a new diffusion-based hypergraph clustering algorithm that solves a quadratic hypergraph cut based objective akin to a hypergraph analog of Andersen-Chung-Lang personalized PageRank clustering for graphs. We prove that, for hypergraphs with fixed maximum hyperedge size, this method is strongly local, meaning that its runtime depends only on the size of the output instead of the size of the hypergraph, and is highly scalable. Moreover, our method enables us to compute with a wide variety of cardinality-based hypergraph cut functions. We also prove that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee. We demonstrate that on large real-world hypergraphs our new method finds better clusters and runs much faster than existing approaches. Specifically, it runs in a few seconds for hypergraphs with a few million hyperedges, compared with minutes for a flow-based technique. We furthermore show that our framework is general enough that it can also be used to solve other p-norm based cut objectives on hypergraphs.
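To make the notion of strong locality concrete, here is a minimal sketch of the push procedure for approximate personalized PageRank on ordinary graphs, in the style of Andersen, Chung, and Lang. It is the graph analogue mentioned in the abstract, not the hypergraph diffusion the paper proposes, and the function and parameter names are chosen for this example.

```python
from collections import deque


def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from `seed` using local push steps.

    adj maps each node to a list of neighbours (undirected graph).
    Returns (p, r): the approximation and the leftover residual mass.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        mass = r.get(u, 0.0)
        if deg_u == 0 or mass < eps * deg_u:
            continue                      # residual too small to push
        p[u] = p.get(u, 0.0) + alpha * mass
        r[u] = (1 - alpha) * mass / 2     # lazy walk keeps half at u
        share = (1 - alpha) * mass / (2 * deg_u)
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r
```

Only nodes that receive enough residual mass are ever touched, so the work is governed by `alpha` and `eps` rather than by the size of the graph. On a path of 1000 nodes seeded in the middle, for example, the procedure visits only a small neighbourhood of the seed.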

TG-GAN: Continuous-time Temporal Graph Deep Generative Models with Time-Validity Constraints

Deep generative models of graph-structured data have become popular in very recent years. Although initial research has focused on static graphs in applications such as molecular design and social networks, many challenges involve temporal graphs whose topology and attribute values evolve dynamically over time. Sophisticated and unknown network processes that affect temporal graphs cannot be captured adequately by prescribed models. Application areas include social mobility networks and catastrophic cybersecurity failures. These web-scale applications challenge current deep graph generative models with the need to capture 1) time-validity constraints, 2) time and topological distributions, and 3) joint time and graph encoding and decoding. Here, we propose the “Temporal Graph Generative Adversarial Network” (TG-GAN) for continuous-time graph generation with time-validity constraints. TG-GAN can jointly generate the time, node, and edge information for truncated temporal walks via a novel recurrent-based model and a valid time decoder. The generated truncated temporal walks are then assembled into time-budgeted temporal walks for temporal graphs under the learned topological and temporal dependencies. In addition, a discriminator is proposed to combine time and node encoding operations over a recurrent architecture to distinguish generated sequences from real ones sampled by a truncated temporal walk sampler. Extensive experiments on both synthetic and real-world datasets confirm that TG-GAN significantly outperforms five benchmarking methods in terms of efficiency and effectiveness.

SESSION: Session: Privacy

Cookie Swap Party: Abusing First-Party Cookies for Web Tracking

As a step towards protecting user privacy, most web browsers perform some form of third-party HTTP cookie blocking or periodic deletion by default, while users typically have the option to select even stricter blocking policies. As a result, web trackers have shifted their efforts to work around these restrictions and retain or even improve the extent of their tracking capability.

In this paper, we shed light on the increasingly used practice of relying on first-party cookies that are set by third-party JavaScript code to implement user tracking and other potentially unwanted capabilities. Unlike third-party cookies, first-party cookies are not sent automatically by the browser to third parties on HTTP requests. This tracking is nevertheless possible because any included third-party code runs in the context of the parent page, and thus can fully set or read existing first-party cookies, which it can then leak to the same or other third parties. Previous works that survey user privacy on the web in relation to cookies, third-party or otherwise, have not fully explored this mechanism. To address this gap, we propose a dynamic data flow tracking system based on Chromium to track the leakage of first-party cookies to third parties, and use it to conduct a large-scale study of the Alexa top 10K websites. In total, we found that 97.72% of the websites have first-party cookies that are set by third-party JavaScript, and that on 57.66% of these websites there is at least one such cookie that contains a unique user identifier that is diffused to multiple third parties. Our results highlight the privacy-intrusive capabilities of first-party cookies, even when a privacy-savvy user has taken mitigative measures such as blocking third-party cookies, or employing popular crowd-sourced filter lists such as EasyList/EasyPrivacy and the Disconnect list.

User Tracking in the Post-cookie Era: How Websites Bypass GDPR Consent to Track Users

During the past few years, mostly as a result of the GDPR and the CCPA, websites have started to present users with cookie consent banners. These banners are web forms where the users can state their preference and declare which cookies they would like to accept, if such an option exists. Although requesting consent before storing any identifiable information is a good start towards respecting user privacy, previous research has shown that websites do not always respect user choices. Furthermore, considering the ever-decreasing reliance of trackers on cookies and the actions browser vendors take by blocking or restricting third-party cookies, we anticipate a world where stateless tracking emerges, either because trackers or websites do not use cookies, or because users simply refuse to accept any.

In this paper, we explore whether websites use more persistent and sophisticated forms of tracking in order to track users who said they do not want cookies. Such forms of tracking include first-party ID leaking, ID synchronization, and browser fingerprinting. Our results suggest that websites do use such modern forms of tracking even before users had the opportunity to register their choice with respect to cookies. To add insult to injury, when users choose to raise their voice and reject all cookies, user tracking only intensifies. As a result, users’ choices play very little role with respect to tracking: we measured that more than 75% of tracking activities happened before users had the opportunity to make a selection in the cookie consent banner, or when users chose to reject all cookies.

It’s Not Just the Site, It’s the Contents: Intra-domain Fingerprinting Social Media Websites Through CDN Bursts

Website fingerprinting (or inter-domain WSF), enhanced by various machine learning techniques, has shown its power to identify websites a user has visited. To the best of our knowledge, the finer-grained problem of web page fingerprinting (or intra-domain WPF) has not been systematically studied by our research community. WPF attackers, such as government agencies enforcing Internet censorship, are keen to identify the particular web pages (e.g., a political dissident’s social media page) visited by the target user.

In this work, we investigate the intra-domain WPF among social media websites, against the realistic on-path passive attack scenario. We reveal that delivering large-size data such as images and videos via Content Delivery Networks (CDNs), which is a common practice in social media websites, makes intra-domain WPF highly feasible. The network traffic generated during rendering a social media page exhibits temporal and volumetric patterns that are sufficiently recognizable by machine learning algorithms. We characterize such patterns as CDN bursts, and use features extracted from them to empower classification algorithms to achieve a high classification accuracy (96%) and a low false positive rate (0.02%).

Have You been Properly Notified? Automatic Compliance Analysis of Privacy Policy Text with GDPR Article 13

With the rapid development of web and mobile applications, as well as their wide adoption in different domains, more and more personal data is provided, consciously or unconsciously, to different application providers. The privacy policy is an important medium for users to understand what personal information has been collected and used. As data privacy protection is becoming a critical social issue, laws and regulations are being enacted in different countries and regions, the most representative of which is the EU General Data Protection Regulation (GDPR). It is thus important to detect compliance issues between regulations, e.g., GDPR, and privacy policies, and provide intuitive results for data subjects (i.e., users), data collection parties (i.e., service providers) and the regulatory authorities. In this work, we aim to solve the problem of compliance analysis between GDPR (Article 13) and privacy policies. We formulate the task as a combination of a sentence classification step and a rule-based analysis step. We manually curate a corpus of 36,610 labeled sentences from 304 privacy policies, and benchmark our corpus with several standard sentence classifiers. We also conduct a rule-based analysis to detect compliance issues and a user study to evaluate the usability of our approach. The web-based tool AutoCompliance is publicly accessible.

Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset

SESSION: Session: Recommendations

STAN: Spatio-Temporal Attention Network for Next Location Recommendation

Next location recommendation is at the core of various location-based applications. Current state-of-the-art models have attempted to solve spatial sparsity with hierarchical gridding and model temporal relation with explicit time intervals, while some vital questions remain unsolved. Non-adjacent locations and non-consecutive visits provide non-trivial correlations for understanding a user’s behavior but were rarely considered. To aggregate all relevant visits from user trajectory and recall the most plausible candidates from weighted representations, here we propose a Spatio-Temporal Attention Network (STAN) for location recommendation. STAN explicitly exploits relative spatiotemporal information of all the check-ins with self-attention layers along the trajectory. This improvement allows a point-to-point interaction between non-adjacent locations and non-consecutive check-ins with explicit spatio-temporal effect. STAN uses a bi-layer attention architecture that first aggregates spatiotemporal correlation within user trajectory and then recalls the target with consideration of personalized item frequency (PIF). By visualization, we show that STAN is in line with the above intuition. Experimental results unequivocally show that our model outperforms the existing state-of-the-art methods by 9-17%.

Session-aware Linear Item-Item Models for Session-based Recommendation

Session-based recommendation aims at predicting the next item given a sequence of previous items consumed in the session, e.g., on e-commerce or multimedia streaming services. Specifically, session data exhibits some unique characteristics, i.e., session consistency and sequential dependency over items within the session, repeated item consumption, and session timeliness. In this paper, we propose simple-yet-effective linear models for considering the holistic aspects of the sessions. The comprehensive nature of our models helps improve the quality of session-based recommendation. More importantly, it provides a generalized framework for reflecting different perspectives of session data. Furthermore, since our models can be solved by closed-form solutions, they are highly scalable. Experimental results demonstrate that the proposed linear models show competitive or state-of-the-art performance in various metrics on several real-world datasets.

Learning Fair Representations for Recommendation: A Graph-based Perspective

As a key application of artificial intelligence, recommender systems are among the most pervasive computer-aided systems to help users find potential items of interest. Recently, researchers have paid considerable attention to fairness issues for artificial intelligence applications. Most of these approaches assumed independence of instances, and designed sophisticated models to eliminate the sensitive information to facilitate fairness. However, recommender systems differ greatly from these approaches as users and items naturally form a user-item bipartite graph, and are collaboratively correlated in the graph structure. In this paper, we propose a novel graph-based technique for ensuring fairness of any recommendation model. Here, the fairness requirements refer to not exposing the sensitive feature set in the user modeling process. Specifically, given the original embeddings from any recommendation model, we learn a composition of filters that transform each user’s and each item’s original embeddings into a filtered embedding space based on the sensitive feature set. For each user, this transformation is achieved under the adversarial learning of a user-centric graph, in order to obfuscate each sensitive feature between both the filtered user embedding and the subgraph structures of this user. Finally, extensive experimental results clearly show the effectiveness of our proposed model for fair recommendation. We publish the source code at https://github.com/newlei/FairGo.

Leveraging Review Properties for Effective Recommendation

Many state-of-the-art recommendation systems leverage explicit item reviews posted by users by considering their usefulness in representing the users’ preferences and describing the items’ attributes. These posted reviews may have various associated properties, such as their length, their age since they were posted, or their rating of the item. However, it remains unclear how these different review properties contribute to the usefulness of their corresponding reviews in addressing the recommendation task. In particular, users show distinct preferences when considering different aspects of the reviews (i.e. properties) for making decisions about the items. Hence, it is important to model the relationship between the reviews’ properties and the usefulness of the reviews while learning the users’ preferences and the items’ attributes. In this paper, we propose to model the reviews with their associated available properties. We introduce a novel review properties-based recommendation model (RPRM) that learns which review properties are more important than others in capturing the usefulness of reviews, thereby enhancing the recommendation results. Furthermore, inspired by the users’ information adoption framework, we integrate two loss functions and a negative sampling strategy into our proposed RPRM model, to ensure that the properties of reviews are correlated with the users’ preferences. We examine the effectiveness of RPRM using the well-known Yelp and Amazon datasets. Our results show that RPRM significantly outperforms one classical baseline and five existing state-of-the-art baselines. Moreover, we experimentally show the advantages of using our proposed loss functions and negative sampling strategy, which further enhance the recommendation performance of RPRM.

A Model of Two Tales: Dual Transfer Learning Framework for Improved Long-tail Item Recommendation

Highly skewed long-tail item distribution is very common in recommendation systems. It significantly hurts model performance on tail items. To improve tail-item recommendation, we conduct research to transfer knowledge from head items to tail items, leveraging the rich user feedback in head items and the semantic connections between head and tail items. Specifically, we propose a novel dual transfer learning framework that jointly learns the knowledge transfer at both the model level and the item level: 1) The model-level knowledge transfer builds a generic meta-mapping of model parameters from few-shot to many-shot models. It captures the implicit data augmentation on the model level to improve the representation learning of tail items. 2) The item-level transfer connects head and tail items through item-level features, to ensure a smooth transfer of meta-mapping from head items to tail items. The two types of transfers are incorporated to ensure the learned knowledge from head items can be well applied for tail item representation learning in the long-tail distribution settings. Through extensive experiments on two benchmark datasets, results show that our proposed dual transfer learning framework significantly outperforms other state-of-the-art methods for tail item recommendation in hit ratio and NDCG. It is also very encouraging that our framework further improves head items and overall performance on top of the gains on tail items.

SESSION: Session: Mobile and Ubiquitous Computing

DeepVista: 16K Panoramic Cinema on Your Mobile Device

In this paper, we design, implement, and evaluate DeepVista, which is to our knowledge the first consumer-class system that streams panoramic videos far beyond the ultra high-definition resolution (up to 16K) to mobile devices, offering truly immersive experiences. Such an immense resolution makes streaming video-on-demand (VoD) content extremely resource-demanding. To tackle this challenge, DeepVista introduces a novel framework that leverages an edge server to perform efficient, intelligent, and quality-guaranteed content transcoding, by extracting from panoramic frames the viewport stream that will be delivered to the client. To support real-time transcoding of 16K content, DeepVista employs several key mechanisms such as dual-GPU acceleration, lossless viewport extraction, deep viewport prediction, and a two-layer streaming design. Our extensive evaluations using real users’ viewport movement data indicate that DeepVista outperforms existing solutions, and can smoothly stream 16K panoramic videos to mobile devices over diverse wireless networks including WiFi, LTE, and mmWave 5G.

CoopEdge: A Decentralized Blockchain-based Platform for Cooperative Edge Computing

Edge computing (EC) has recently emerged as a novel computing paradigm that offers users low-latency services. Because their limited physical size constrains their computing resources, edge servers cannot always handle all incoming computation tasks in a timely manner when they operate independently. They often need to cooperate through peer-offloading. Deployed and managed by different stakeholders, edge servers operate in a distrusted environment. Trust and incentive are the two main issues that challenge cooperative computing between them. Another unique challenge in the EC environment is to facilitate trust and incentive in a decentralized manner. To tackle these challenges systematically, this paper proposes CoopEdge, a novel blockchain-based decentralized platform, to drive and support cooperative edge computing. On CoopEdge, an edge server can publish a computation task for other edge servers to contend for. A winner is selected from candidate edge servers based on their reputations. After that, a consensus is reached among edge servers to record the performance in task execution on blockchain. We implement CoopEdge based on Hyperledger Sawtooth and evaluate it experimentally against a baseline and two state-of-the-art implementations in a simulated EC environment. The results validate the usefulness of CoopEdge and demonstrate its performance.

Temporal Analysis of the Entire Ethereum Blockchain Network

With over 42 billion USD market capitalization (October 2020), Ethereum is the largest public blockchain that supports smart contracts. Recent works have modeled transactions, tokens, and other interactions in the Ethereum blockchain as static graphs to provide new observations and insights by conducting relevant graph analysis. Surprisingly, there has been much less study of the evolution and temporal properties of these networks. In this paper, we investigate the evolutionary nature of Ethereum interaction networks from a temporal graph perspective. We study the growth rate and growth model of four Ethereum blockchain networks, as well as the active lifespan and update rate of high-degree vertices. We detect anomalies based on temporal changes in global network properties, and forecast the survival of network communities in succeeding months by leveraging relevant graph features and machine learning models.

SrVARM: State Regularized Vector Autoregressive Model for Joint Learning of Hidden State Transitions and State-Dependent Inter-Variable Dependencies from Multi-variate Time Series

Many applications, e.g., healthcare and education, call for effective methods for constructing predictive models from high-dimensional time series data where the relationship between variables can be complex and vary over time. In such settings, the underlying system undergoes a sequence of unobserved transitions among a finite set of hidden states. Furthermore, the relationships between the observed variables and their temporal dynamics may depend on the hidden state of the system. To further complicate matters, the hidden state sequences underlying the observed data from different individuals may not be aligned relative to a common frame of reference. Against this background, we consider the novel problem of jointly learning the state-dependent inter-variable relationships as well as the pattern of transitions between hidden states from multi-variate time series data. To solve this problem, we introduce the State-Regularized Vector Autoregressive Model (SrVARM) which combines a state-regularized recurrent neural network to learn the dynamics of transitions between discrete hidden states with an augmented autoregressive model which models the inter-variable dependencies in each state using a state-dependent directed acyclic graph (DAG). We propose an efficient algorithm for training SrVARM by leveraging a recently introduced reformulation of the combinatorial problem of optimizing the DAG structure with respect to a scoring function into a continuous optimization problem. We report results of extensive experiments with simulated data as well as a real-world benchmark that show that SrVARM outperforms state-of-the-art baselines in recovering the unobserved state transitions and discovering the state-dependent relationships among variables.

Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network

Ubiquitous mobile computing has enabled ride-hailing services to collect vast amounts of behavioral data of riders and drivers and optimize supply and demand matching in real time. While these mobility service providers have some degree of control over the market by assigning vehicles to requests, they need to deal with the uncertainty arising from self-interested driver behavior since workers are usually free to drive when they are not assigned tasks. If a driver’s behavior can be accurately replicated on the digital twin, more detailed and realistic counterfactual simulations will enable decision making to improve mobility services as well as to validate urban planning.

In this work, we formulate the problem of passenger-vehicle matching in a sparsely connected graph and propose an algorithm to derive an equilibrium policy in a multi-agent environment. Our framework combines value iteration methods to estimate the optimal policy given expected state visitation and policy propagation to compute multi-agent state visitation frequencies. Furthermore, we develop a method to learn the driver’s reward function that is transferable to an environment with significantly different dynamics from the training data. We evaluate the robustness to changes in spatio-temporal supply-demand distributions and deterioration in data quality using a real-world taxi trajectory dataset; our approach significantly outperforms several baselines in terms of imitation accuracy. The computational time required to obtain an equilibrium policy shared by all vehicles does not depend on the number of agents, and even on the scale of real-world services, it takes only a few seconds on a single CPU.

SESSION: Session: Online Advertising

Unifying Offline Causal Inference and Online Bandit Learning for Data Driven Decision

A fundamental question for companies with large amounts of logged data is: How to use such logged data together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong decisions during testing hurt users’ experiences and cause irreversible damage. A typical alternative is offline causal inference, which analyzes logged data alone to make decisions. However, these decisions are not adaptive to the new incoming data, and so a wrong decision will continuously hurt users’ experiences. To overcome the aforementioned limitations, we propose a framework to unify offline causal inference algorithms (e.g., weighting, matching) and online learning algorithms (e.g., UCB, LinUCB). We propose novel algorithms and derive bounds on the decision accuracy via the notion of “regret”. We derive the first upper regret bound for forest-based online bandit algorithms. Experiments on two real datasets show that our algorithms outperform other algorithms that use only logged data or online feedback, or algorithms that do not use the data properly.
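As background for the online side of this abstract, here is a minimal sketch of the standard UCB1 bandit rule with an optional warm start from logged counts. It is not the framework proposed in the paper: the warm start simply treats logged rewards as if they were unbiased online samples, with none of the causal-inference correction (weighting, matching) the abstract refers to. The names `ucb1`, `bernoulli_arms`, `prior_counts`, and `prior_sums` are hypothetical and introduced only for illustration.

```python
import math
import random


def bernoulli_arms(means, seed=0):
    """Return a pull(arm) function simulating 0/1 rewards (toy environment)."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < means[arm] else 0.0


def ucb1(pull, n_arms, horizon, prior_counts=None, prior_sums=None):
    """Run UCB1 for `horizon` rounds.

    prior_counts / prior_sums are hypothetical warm-start statistics taken
    from logged data (number of logged pulls and total logged reward per arm).
    Returns (choices, counts), where counts includes any prior counts.
    """
    counts = list(prior_counts) if prior_counts is not None else [0] * n_arms
    sums = list(prior_sums) if prior_sums is not None else [0.0] * n_arms
    choices = []
    for _ in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]          # play every arm once before using UCB
        else:
            total = sum(counts)
            # Index = empirical mean + exploration bonus that shrinks as an
            # arm accumulates observations.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices, counts
```

With a warm start, arms that the logs already cover well begin with a small exploration bonus, so fewer online rounds are spent re-learning what the logs show. If the logged data were collected under a biased policy, this naive reuse can mislead the learner, which is the kind of problem an offline causal-inference step is meant to address.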

Automated Creative Optimization for E-Commerce Advertising

Advertising creatives are ubiquitous in E-commerce advertisements and aesthetic creatives may improve the click-through rate (CTR) of the products. Nowadays smart advertisement platforms provide the function of compositing creatives based on source materials provided by advertisers. Since a great number of creatives can be generated, it is difficult to accurately predict their CTR given a limited amount of feedback. Factorization machine (FM), which models inner product interaction between features, can be applied for the CTR prediction of creatives. However, interactions between creative elements may be more complex than the inner product, and the FM-estimated CTR may be of high variance due to limited feedback. To address these two issues, we propose an Automated Creative Optimization (AutoCO) framework to model complex interaction between creative elements and to balance between exploration and exploitation. Specifically, motivated by AutoML, we propose one-shot search algorithms for searching effective interaction functions between elements. We then develop stochastic variational inference to estimate the posterior distribution of parameters based on the reparameterization trick, and apply Thompson Sampling for efficiently exploring potentially better creatives. We evaluate the proposed method with both a synthetic dataset and two public datasets. The experimental results show our method can outperform competing baselines with respect to cumulative regret. The online A/B test shows our method leads to a 7% increase in CTR compared to the baseline.
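To make the exploration/exploitation idea concrete, the sketch below shows Thompson Sampling in its simplest form, with an independent Beta posterior over the CTR of each creative. This is a deliberately simpler stand-in and not the AutoCO method: the abstract describes a posterior over model parameters estimated by stochastic variational inference, whereas here each creative is treated as an unrelated arm. The function name `thompson_sampling` and the CTR values in the usage note are assumptions for illustration.

```python
import random


def thompson_sampling(true_ctrs, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling over a set of creatives.

    true_ctrs : hidden click probabilities used only by the simulator
    Returns the number of impressions given to each creative.
    """
    rng = random.Random(seed)
    k = len(true_ctrs)
    clicks = [1] * k        # Beta(1, 1) prior: one pseudo-click ...
    skips = [1] * k         # ... and one pseudo-skip per creative
    impressions = [0] * k
    for _ in range(rounds):
        # Draw one plausible CTR per creative from its current posterior.
        samples = [rng.betavariate(clicks[a], skips[a]) for a in range(k)]
        # Show the creative whose sampled CTR is highest.
        arm = max(range(k), key=samples.__getitem__)
        impressions[arm] += 1
        # Observe simulated feedback and update that creative's posterior.
        if rng.random() < true_ctrs[arm]:
            clicks[arm] += 1
        else:
            skips[arm] += 1
    return impressions
```

Calling `thompson_sampling([0.05, 0.10, 0.40], rounds=3000)` would be one way to try it: creatives with little feedback have wide posteriors and are still sampled occasionally, while impressions concentrate on the creative that looks best as evidence accumulates.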

GuideBoot: Guided Bootstrap for Deep Contextual Bandits in Online Advertising

The exploration/exploitation (E&E) dilemma lies at the core of interactive systems such as online advertising, for which contextual bandit algorithms have been proposed. Bayesian approaches provide guided exploration via uncertainty estimation, but their applicability is often limited due to over-simplified assumptions. Non-Bayesian bootstrap methods, on the other hand, can apply to complex problems by using deep reward models, but lack clear guidance for the exploration behavior. Developing a practical method for complex deep contextual bandits remains largely unsolved.

In this paper, we introduce Guided Bootstrap (GuideBoot), combining the best of both worlds. GuideBoot provides explicit guidance to the exploration behavior by training multiple models over both real samples and noisy samples with fake labels, where the noise is added according to the predictive uncertainty. The proposed method is efficient as it can make decisions on-the-fly by utilizing only one randomly chosen model, but is also effective as we show that it can be viewed as a non-Bayesian approximation of Thompson sampling. Moreover, we extend it to an online version that can learn solely from streaming data, which is favored in real applications. Extensive experiments on both synthetic tasks and large-scale advertising environments show that GuideBoot achieves significant improvements against previous state-of-the-art methods.

A Hybrid Bandit Model with Visual Priors for Creative Ranking in Display Advertising

Creatives play an important role in e-commerce for exhibiting products. Sellers usually create multiple creatives for comprehensive demonstrations, so it is crucial to display the most appealing design to maximize the Click-Through Rate (CTR). For this purpose, modern recommender systems dynamically rank creatives when a product is proposed for a user. However, this task suffers more from the cold-start problem than conventional product recommendation, since the user-click data is scarcer and creatives potentially change more frequently. In this paper, we propose a hybrid bandit model with visual priors which first makes predictions with a visual evaluation, and then naturally evolves to focus on the specialities through the hybrid bandit model. Our contributions are three-fold: 1) We present a visual-aware ranking model (called VAM) that incorporates a list-wise ranking loss for ordering the creatives according to the visual appearance. 2) Regarding visual evaluation as a prior, the hybrid bandit model (called HBM) is proposed to evolve consistently to make better posterior estimations by taking more observations into consideration for online scenarios. 3) A first large-scale creative dataset, CreativeRanking, is constructed, which contains over 1.7M creatives of 500k products as well as their real impression and click data. Extensive experiments have also been conducted on both our dataset and the public Mushroom dataset, demonstrating the effectiveness of the proposed method.

Local Clustering in Contextual Multi-Armed Bandits

We study the problem of identifying user clusters in contextual multi-armed bandits (MAB). Contextual MAB is an effective tool for many real applications, such as content recommendation and online advertisement. In practice, user dependency plays an essential role in the user’s actions, and thus the rewards. Clustering similar users can improve the quality of reward estimation, which in turn leads to more effective content recommendation and targeted advertising. Different from traditional clustering settings, we cluster users based on the unknown bandit parameters, which will be estimated incrementally. In particular, we define the problem of cluster detection in contextual MAB, and propose a bandit algorithm, LOCB, embedded with a local clustering procedure. We also provide a theoretical analysis of LOCB in terms of the correctness and efficiency of clustering and its regret bound. Finally, we evaluate the proposed algorithm from various aspects and show that it outperforms state-of-the-art baselines.

SESSION: Session: Web Mining Applications

Learning from Graph Propagation via Ordinal Distillation for One-Shot Automated Essay Scoring

One-shot automated essay scoring (AES) aims to assign scores to a set of essays written specific to a certain prompt, with only one manually scored essay per distinct score. Compared to the previously studied prompt-specific AES, which usually requires a large number of manually scored essays for model training (e.g., about 600 manually scored essays out of 1000 essays in total), one-shot AES can greatly reduce the workload of manual scoring. In this paper, we propose a Transductive Graph-based Ordinal Distillation (TGOD) framework to tackle the task of one-shot AES. Specifically, we design a transductive graph-based model as a teacher model to generate pseudo labels of unlabeled essays based on the one-shot labeled essays. Then, we distill the knowledge in the teacher model into a neural student model by learning from the high confidence pseudo labels. Different from the general knowledge distillation, we propose an ordinal-aware unimodal distillation which imposes a unimodal distribution constraint on the output of the student model, to tolerate the minor errors in the pseudo labels. Experimental results on the public dataset ASAP show that TGOD can improve the performance of existing neural AES models under the one-shot AES setting and achieve an acceptable average QWK of 0.69.

Wiki2Prop: A Multimodal Approach for Predicting Wikidata Properties from Wikipedia

Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, or Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity – that is, the properties describing each entity as well as the relationships to other entities. Despite the tremendous manual and automatic efforts the community invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling the gaps in Wikidata, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities with a dedicated Wikipedia page in any language to make predictions directly based on textual content. We show that this problem can be formulated as a multi-label classification problem where every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem using a multimodal Deep Learning method to predict which properties should be attached to a given entity, using its Wikipedia page embeddings. Moreover, Wiki2Prop is able to incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show how Wiki2Prop significantly outperforms its competitors for the task of property prediction in Wikidata, and how the use of multilingual and multimodal data improves the results further. Finally, we make Wiki2Prop available as a property recommender system that can be activated and used directly in the context of a Wikidata entity page.

FANCY: Human-centered, Deep Learning-based Framework for Fashion Style Analysis

Fashion style analysis is of the utmost importance for fashion professionals. However, style classification criteria differ and rely heavily on professionals’ subjective experiences, with no quantitative criteria. We present FANCY (Fashion Attributes detectioN for Clustering stYle), a human-centered, deep learning-based framework to support fashion professionals’ analytic tasks using a computational method integrated with their insights. We work closely with fashion professionals in the whole study process to reflect their domain knowledge and experience as much as possible. We redefine fashion attributes, demonstrate a strong association between fashion attributes and styles, and develop a deep learning model that detects attributes in a given fashion image and reflects fashion professionals’ insight. Based on 302,772 attribute-annotated runway fashion images, we developed 25 new fashion styles (the FANCY dataset). We summarize quantitative standards of the fashion style groups and present fashion trends based on time, location, and brand.

PARIMA: Viewport Adaptive 360-Degree Video Streaming

With increasing advancements in technologies for capturing 360° videos, advances in streaming such videos have become a popular research topic. However, streaming 360° videos requires high bandwidth, thus escalating the need for optimized streaming algorithms. Researchers have proposed various methods to tackle the problem, either considering the network bandwidth or attempting to predict future viewports in advance. However, most of the existing works either (1) do not consider video contents to predict user viewport, or (2) do not adapt to user preferences dynamically, or (3) require a lot of training data for new videos, thus making them potentially unfit for video streaming purposes. We develop PARIMA, a fast and efficient online viewport prediction model that uses past viewports of users along with the trajectories of prime objects as a representative of video content to predict future viewports. We claim that the head movement of a user largely depends upon the trajectories of the prime objects in the video. We employ a pyramid-based bitrate allocation scheme and perform a comprehensive evaluation of the performance of PARIMA. In our evaluation, we show that PARIMA outperforms state-of-the-art approaches, improving the Quality of Experience by over 30% while maintaining a short response time.

Controllable and Diverse Text Generation in E-commerce

In E-commerce, a key challenge in text generation is to find a good trade-off between word diversity and accuracy (relevance) in order to make generated text appear more natural and human-like. In order to improve the relevance of generated results, conditional text generators were developed that use input keywords or attributes to produce the corresponding text. Prior work, however, does not finely control the diversity of automatically generated sentences. For example, it does not control the order of keywords to put more relevant ones first. Moreover, it does not explicitly control the balance between diversity and accuracy. To remedy these problems, we propose a fine-grained controllable generative model, called Apex, that uses an algorithm borrowed from automatic control (namely, a variant of the proportional, integral, and derivative (PID) controller) to precisely manipulate the diversity/accuracy trade-off of generated text. The algorithm is injected into a Conditional Variational Autoencoder (CVAE), allowing Apex to control both (i) the order of keywords in the generated sentences (conditioned on the input keywords and their order), and (ii) the trade-off between diversity and accuracy. Evaluation results on real-world datasets show that the proposed method outperforms existing generative models in terms of diversity and relevance. Moreover, it achieves about 97% accuracy in the control of the order of keywords.

Apex is currently deployed to generate production descriptions and item recommendation reasons in Taobao, the largest E-commerce platform in China. The A/B production test results show that our method improves click-through rate (CTR) by 13.17% compared to the existing method for production descriptions. For item recommendation reasons, it is able to increase CTR by 6.89% and 1.42% compared to user reviews and top-K item recommendation without reviews, respectively.
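The abstract above borrows a PID controller from automatic control. As a rough illustration of the textbook discrete-time form only, the sketch below steers a measured quantity toward a target. The class name `PIDController`, the gains, and the toy update rule are assumptions made for this example; the Apex variant and its coupling to the CVAE are not specified in the abstract and are not reproduced here.

```python
class PIDController:
    """Textbook discrete PID controller (illustrative, not the Apex variant)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        # Integral term accumulates past error; derivative term reacts to its change.
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Two direct calls, to show how the three terms combine.
pid = PIDController(kp=1.0, ki=0.5, kd=0.25, setpoint=2.0)
first = pid.update(0.0)   # error 2, integral 2, derivative 0
second = pid.update(1.0)  # error 1, integral 3, derivative -1

# Hypothetical toy system: the measured value moves by half of the control signal.
controller = PIDController(kp=1.0, ki=0.5, kd=0.25, setpoint=3.0)
value = 0.0
for _ in range(60):
    value += 0.5 * controller.update(value)
```

In a setting like the one described, the measured value would be some diversity statistic of the generated text and the control signal would adjust a weight in the training objective; that mapping is an assumption here, not a detail taken from the abstract.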

SESSION: Session: Special networks and dynamics

Nonlinear Higher-Order Label Spreading

Label spreading is a general technique for semi-supervised learning with point cloud or network data, which can be interpreted as a diffusion of labels on a graph. While there are many variants of label spreading, nearly all of them are linear models, where the incoming information to a node is a weighted sum of information from neighboring nodes. Here, we add nonlinearity to label spreading via nonlinear functions involving higher-order network structure, namely triangles in the graph. For a broad class of nonlinear functions, we prove convergence of our nonlinear higher-order label spreading algorithm to the global solution of an interpretable semi-supervised loss function. We demonstrate the efficiency and efficacy of our approach on a variety of point cloud and network datasets, where the nonlinear higher-order model outperforms classical label spreading, hypergraph clustering, and graph neural networks.
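For orientation, the classical linear baseline that the abstract contrasts against can be sketched as the iteration F ← αSF + (1 − α)Y, where S is the symmetrically normalized adjacency matrix and Y holds the seed labels. This is a minimal sketch of that standard linear scheme only; the nonlinear, triangle-based update proposed in the paper is not reproduced. The function name, the toy graph, and the parameter values are assumptions for illustration.

```python
def label_spreading(adj, labels, alpha=0.9, iters=200):
    """Classical linear label spreading: F <- alpha * S @ F + (1 - alpha) * Y."""
    n = len(adj)
    num_classes = max(l for l in labels if l is not None) + 1
    # Symmetric normalization S = D^{-1/2} A D^{-1/2}.
    deg = [sum(row) for row in adj]
    S = [[adj[i][j] / (deg[i] * deg[j]) ** 0.5 if adj[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot seed matrix Y; rows of unlabeled nodes stay all-zero.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(num_classes)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(num_classes)] for i in range(n)]
    # Each node takes the class with the largest diffused score.
    return [max(range(num_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy graph (assumed for illustration): two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

labels = [0, None, None, None, None, 1]  # one seed label per triangle
predicted = label_spreading(adj, labels)
```

Each triangle inherits the label of its own seed node, since every update is a weighted sum over neighbors. The paper's contribution replaces part of that weighted sum with nonlinear functions over triangles.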

HDMI: High-order Deep Multiplex Infomax

Networks have been widely used to represent the relations between objects such as academic networks and social networks, and learning embeddings for networks has thus garnered plenty of research attention. Self-supervised network representation learning aims at extracting node embeddings without external supervision. Recently, maximizing the mutual information between the local node embedding and the global summary (e.g. Deep Graph Infomax, or DGI for short) has shown promising results on many downstream tasks such as node classification. However, there are two major limitations of DGI. Firstly, DGI merely considers the extrinsic supervision signal (i.e., the mutual information between node embedding and global summary) while ignoring the intrinsic signal (i.e., the mutual dependence between node embedding and node attributes). Secondly, nodes in a real-world network are usually connected by multiple edges with different relations, while DGI does not fully explore the various relations among nodes. To address the above-mentioned problems, we propose a novel framework, called High-order Deep Multiplex Infomax (HDMI), for learning node embeddings on multiplex networks in a self-supervised way. To be more specific, we first design a joint supervision signal containing both extrinsic and intrinsic mutual information by high-order mutual information, and we propose a High-order Deep Infomax (HDI) to optimize the proposed supervision signal. Then we propose an attention-based fusion module to combine node embeddings from different layers of the multiplex network. Finally, we evaluate the proposed HDMI on various downstream tasks such as unsupervised clustering and supervised classification. The experimental results show that HDMI achieves state-of-the-art performance on these tasks.

Network of Tensor Time Series

Co-evolving time series appear in a multitude of applications such as environmental monitoring, financial analysis, and smart transportation. This paper aims to address two challenges: (C1) how to incorporate explicit relationship networks of the time series; and (C2) how to model the implicit relationship of the temporal dynamics. We propose a novel model called Network of Tensor Time Series (NeT3), which comprises two modules: Tensor Graph Convolutional Network (TGCN) and Tensor Recurrent Neural Network (TRNN). TGCN tackles the first challenge by generalizing Graph Convolutional Network (GCN) for flat graphs to tensor graphs, which captures the synergy between multiple graphs associated with the tensors. TRNN leverages tensor decomposition to model the implicit relationships among co-evolving time series. The experimental results on five real-world datasets demonstrate the efficacy of the proposed method.

Improving Graph Neural Networks with Structural Adaptive Receptive Fields

The abundant information in graphs helps us to learn more expressive node representations. Different nodes in the neighborhood have different importance to the central node. Thus, the average weight aggregation used in most Graph Neural Networks fails to model such differences. GAT-based models introduce the attention mechanism to solve this problem, but they ignore the rich structural information and may suffer from the problem of over-smoothing. In this paper, we propose Graph Neural Networks with STructural Adaptive Receptive fields (STAR-GNN), which adaptively construct a receptive field for each node with structural information and further achieve better aggregation of information. Firstly, we model local structural distribution based on anonymous random walks, and then use the structural information to construct receptive fields guided by mutual information. Then, as the generated receptive fields are irregular, we design a sub-graph aggregator to boost node representations and theoretically prove that it has the ability to capture the complex structures in receptive fields. Experimental results demonstrate the power of STAR-GNN in learning structural receptive fields adaptively and encoding more informative structural characteristics in real-world networks.

Few-shot Network Anomaly Detection via Cross-network Meta-learning

Network anomaly detection, also known as graph anomaly detection, aims to find network elements (e.g., nodes, edges, subgraphs) with significantly different behaviors from the vast majority. It has a profound impact in a variety of applications ranging from finance and healthcare to social network analysis. Due to the unbearable labeling cost, existing methods are predominantly developed in an unsupervised manner. Nonetheless, the anomalies they identify may turn out to be data noise or uninteresting data instances due to the lack of prior knowledge on the anomalies of interest. Hence, it is critical to investigate and develop few-shot learning for network anomaly detection. In real-world scenarios, a few labeled anomalies are often easy to access on similar networks from the same domain as the target network, yet most existing works fail to leverage them and merely focus on a single network. Taking advantage of this potential, in this work, we tackle the problem of few-shot network anomaly detection by (1) proposing a new family of graph neural networks, Graph Deviation Networks (GDN), that can leverage a small number of labeled anomalies for enforcing statistically significant deviations between abnormal and normal nodes on a network; and (2) equipping the proposed GDN with a new cross-network meta-learning algorithm to realize few-shot network anomaly detection by transferring meta-knowledge from multiple auxiliary networks. Extensive evaluations demonstrate the efficacy of the proposed approach on few-shot or even one-shot network anomaly detection.

SESSION: Session: Applications

Learning Dynamic User Behavior Based on Error-driven Event Representation

Understanding the evolution of large graphs over time is of significant importance in user behavior understanding and prediction. Modeling user behavior with temporal networks has gained increasing attention in recent years since it allows capturing users’ dynamic preferences and predicting their next actions. Recently, some approaches have been proposed to model user behavior. However, these methods suffer from two problems: they work on static data, which ignores the dynamic evolution, or they model the whole behavior sequences directly by recurrent neural networks and thus suffer from noisy information. To tackle these problems, we propose a dynamic user behavior learning algorithm called LDBR. It views user behaviors as a set of dynamic events and uses recent event embedding to predict future user behavior and infer the current semantic labels. Specifically, we propose a new strategy to automatically learn a good event embedding in behavior sequence by introducing a smooth sampling strategy and minimizing the temporal link prediction error.

It is hard to obtain real-world datasets with evolving labels. Thus in this paper, we provide a new dynamic network dataset with evolving labels called Arxiv and make it publicly available. Based on the Arxiv dataset, we conduct a case study to verify the quality of event embedding. Extensive experiments on temporal link prediction tasks further demonstrate the effectiveness of the LDBR model.

Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and analyzing what BERT already knows when solving this task, we obtain a better understanding of what task-specific knowledge BERT needs the most and where it is most needed. The analysis further motivates us to take a different approach than most existing works. Instead of using prior knowledge to create a new training task for fine-tuning BERT, we directly inject knowledge into BERT’s multi-head attention mechanism. This leads us to a simple yet effective approach that enjoys a fast training stage, as it saves the model from training on additional data or tasks other than the main task. Extensive experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance over the original BERT model, and the performance benefit is most salient when training data is scarce.

Wait, Let’s Think about Your Purchase Again: A Study on Interventions for Supporting Self-Controlled Online Purchases

As online marketplaces adopt new technologies to encourage consumers’ purchases (e.g., one-click purchases), the number of consumers who impulsively buy products also increases. Although some interventions have been introduced for consumers’ self-controlled purchases, there have been few studies that evaluate the effectiveness of the techniques in the real environment. In this paper, we conducted an online survey with 118 consumers in their 20s to investigate their impulse buying behaviors and self-control strategies. Based on the survey results and literature surveys, we developed interventions that can assist consumers in controlling their online purchase habits, including Reflection, Distraction, Desire Reduction, and Salient Cost. For evaluation, we enrolled 107 participants in a user study on a real-world e-commerce site. The results indicate that all interventions were effective in reducing impulse buying urges, with variations in user experiences. Our findings and design implications are discussed.

Online Mobile App Usage as an Indicator of Sleep Behavior and Job Performance

Sleep is critical to human function, mediating factors like memory, mood, energy, and alertness; therefore, it is commonly conjectured that a good night’s sleep is important for job performance. However, both real-world sleep behavior and job performance are difficult to measure at scale. In this work, we demonstrate that people’s everyday interactions with online mobile apps can reveal insights into their job performance in real-world contexts. We present an observational study in which we objectively tracked the sleep behavior and job performance of salespeople (N = 15) and athletes (N = 19) for 18 months, leveraging a mattress sensor and online mobile app to conduct the largest study of this kind to date. We first demonstrate that cumulative sleep measures are significantly correlated with job performance metrics, showing that an hour of daily sleep loss for a week was associated with a 9.0% average reduction in contracts established for salespeople and a 9.5% average reduction in game grade for the athletes. We then investigate the utility of online app interaction time as a passively collectible and scalable performance indicator. We show that app interaction time is correlated with the job performance of the athletes, but not the salespeople. To support the claim that our app-based performance indicator truly captures meaningful variation in psychomotor function as it relates to sleep and is robust against potential confounds, we conducted a second study to evaluate the relationship between sleep behavior and app interaction time in a cohort of 274 participants. Using a generalized additive model to control for per-participant random effects, we demonstrate that participants who lost one hour of daily sleep for a week exhibited average app interaction times that were 5.0% slower. We also find that app interaction time exhibits meaningful chronobiologically consistent correlations with sleep history, time awake, and circadian rhythms. The findings from this work reveal an opportunity for online app developers to generate new insights regarding cognition and productivity.

Quiz-Style Question Generation for News Stories

A large majority of American adults get at least some of their news from the Internet. Even though many online news products have the goal of informing their users about the news, they lack scalable and reliable tools for measuring how well they are achieving this goal, and therefore have to resort to noisy proxy metrics (e.g., click-through rates or reading time) to track their performance.

As a first step towards measuring news informedness at scale, we study the problem of quiz-style multiple-choice question generation, which may be used to survey users about their knowledge of recent news. In particular, we formulate the problem as two sequence-to-sequence tasks: question-answer generation (QAG) and distractor, or incorrect answer, generation (DG). We introduce NewsQuizQA, the first dataset intended for quiz-style question-answer generation, containing 20K human-written question-answer pairs from 5K news article summaries. Using this dataset, we propose a series of novel techniques for applying large pre-trained Transformer encoder-decoder models, namely PEGASUS and T5, to the tasks of question-answer generation and distractor generation.

We show that our models outperform strong baselines using both automated metrics and human raters. We provide a case study of running weekly quizzes on real-world users via the Google Surveys platform over the course of two months. We found that users generally found the automatically generated questions to be educational and enjoyable. Finally, to serve the research community, we are releasing the NewsQuizQA dataset.

SESSION: Session: Discovery, prediction and recommendation

Large-scale Comb-K Recommendation

Promotion recommendation, as a new recommendation paradigm in recent years, plays an important role in stimulating the purchase desire of users and maximizing the total revenue. Different from previous recommendations (e.g., item/group recommendation), promotion recommendation aims to select a set of K items based on all user preferences in the selection phase and maximize the total revenue in the delivery phase. Although these two phases are closely related to each other, existing methods usually focus on item selection in the selection phase, largely ignoring the delivery phase and leading to sub-optimal performance. To solve the promotion recommendation problem, we propose the comb-K recommendation model, a constrained combinatorial optimization model which seamlessly integrates the selection phase and delivery phase with delicately designed constraints. When selecting K items, the comb-K recommendation is able to simultaneously search for the optimal combination of item selection and delivery with full consideration of all user preferences. Specifically, we propose a novel heterogeneous graph convolutional network to estimate user preference and propose the user-level comb-K recommendation model through solving a binary combination optimization problem. In order to handle combination explosion for large-scale users, we further cluster massive users into limited groups and present a group-level comb-K recommendation model in which a novel heterogeneous graph pooling network is proposed to perform user clustering and estimate group preference. In addition, considering the “long tail” phenomenon in e-commerce, we design a restricted neighbor heuristic search to accelerate the solving process. Extensive experiments on four datasets demonstrate the superiority of the comb-K model for large-scale promotion recommendation. On billion-scale data, when clustering 2.5 × 10⁷ users into 10³ groups, our model is able to preserve 98.7% of personalized preferences at the group level and significantly improves the Total Click and Hit Ratio by 9.35% and 7.14%, respectively.

Dual Side Deep Context-aware Modulation for Social Recommendation

Social recommendation is effective in improving the recommendation performance by leveraging social relations from online social networking platforms. Social relations among users provide friends’ information for modeling users’ interest in candidate items and help expose items to potential consumers (i.e., item attraction). However, two issues have not been well studied. Firstly, for user interests, existing methods typically aggregate friends’ information contextualized on the candidate item only, and this shallow context-aware aggregation makes them suffer from limited friends’ information. Secondly, for item attraction, if the item’s past consumers are friends of the targeted user or have a similar consumption habit, the item may be more attractive to the targeted user, but most existing methods neglect this relation-enhanced context-aware item attraction.

To address the above issues, we propose DICER (Dual sIde deep Context-awarE modulation for social Recommendation). Specifically, we first propose a novel graph neural network to model the social relation and collaborative relation, and on top of high-order relations, a dual side deep context-aware modulation is introduced to capture the friends’ information and item attraction. Empirical results on two real-world datasets show the effectiveness of the proposed model, and further experiments are conducted to help understand how the dual context-aware modulation works.

Graph Neural Networks for Friend Ranking in Large-scale Social Platforms

Graph Neural Networks (GNNs) have recently enabled substantial advances in graph learning. Despite their rich representational capacity, GNNs remain under-explored for large-scale social modeling applications. One such industrially ubiquitous application is friend suggestion: recommending users other candidate users to befriend, to improve user connectivity, retention and engagement. However, modeling such user-user interactions on large-scale social platforms poses unique challenges: such graphs often have heavy-tailed degree distributions, where a significant fraction of users are inactive and have limited structural and engagement information. Moreover, users interact with different functionalities, communicate with diverse groups, and have multifaceted interaction patterns.

We study the application of GNNs for friend suggestion, providing the first investigation of GNN design for this task, to our knowledge. To leverage the rich knowledge of in-platform actions, we formulate friend suggestion as multi-faceted friend ranking with multi-modal user features and link communication features. We design a neural architecture GraFRank to learn expressive user representations from multiple feature modalities and user-user interactions. Specifically, GraFRank employs modality-specific neighbor aggregators and cross-modality attentions to learn multi-faceted user representations. We conduct experiments on two multi-million user datasets from Snapchat, a leading mobile social platform, where GraFRank outperforms several state-of-the-art approaches on candidate retrieval (by 30% MRR) and ranking (by 20% MRR) tasks. Moreover, our qualitative analysis indicates notable gains for critical populations of less-active and low-degree users.

Pathfinder Discovery Networks for Neural Message Passing

In this work we propose Pathfinder Discovery Networks (PDNs), a method for jointly learning a message passing graph over a multiplex network with a downstream semi-supervised model. PDNs inductively learn an aggregated weight for each edge, optimized to produce the best outcome for the downstream learning task. PDNs are a generalization of attention mechanisms on graphs which allow flexible construction of similarity functions between nodes. They also support edge convolutions and cheap multiscale mixing layers. We show that PDNs overcome weaknesses of existing methods for graph attention (e.g. Graph Attention Networks), such as the diminishing weight problem.

Our experimental results demonstrate competitive predictive performance on academic node classification tasks. Additional results from a challenging suite of node classification experiments show how PDNs can learn a wider class of functions than existing baselines. We analyze the relative computational complexity of PDNs, and show that PDN runtime is not considerably higher than static-graph models. Finally, we discuss how PDNs can be used to construct an easily interpretable attention mechanism that allows users to understand information propagation in the graph.

Few-Shot Graph Learning for Molecular Property Prediction

The recent success of graph neural networks has significantly boosted molecular property prediction, advancing activities such as drug discovery. The existing deep neural network methods usually require a large training dataset for each property, impairing their performance in cases (especially for new molecular properties) with a limited amount of experimental data, which are common in real situations. To this end, we propose Meta-MGNN, a novel model for few-shot molecular property prediction. Meta-MGNN applies a molecular graph neural network to learn molecular representations and builds a meta-learning framework for model optimization. To exploit unlabeled molecular information and address task heterogeneity of different molecular properties, Meta-MGNN further incorporates molecular structures, attribute-based self-supervised modules and self-attentive task weights into the former framework, strengthening the whole learning model. Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.

SESSION: Session: Question Answering Systems

Multi-domain Dialogue State Tracking with Recursive Inference

Multi-domain dialogue state tracking (DST) is a critical component for monitoring user goals during the course of an interaction. Existing approaches have relied on dialogue history indiscriminately or updated on the most recent turns incrementally. However, whether modeled on a fixed ontology or an open vocabulary, the former setting violates the interactive and progressive nature of dialogue, while the latter is easily affected by error accumulation. Here, we propose a Recursive Inference mechanism (ReInf) to resolve DST in multi-domain scenarios that call for more robust and accurate tracking capability. Specifically, our agent reviews the dialogue history in reverse until it has confidently pinpointed sufficient turns for slot value prediction. It also recursively factors in potential dependencies among domains and slots to further solve the co-reference and value sharing problems. The quantitative and qualitative experimental results on the MultiWOZ 2.1 corpus demonstrate that the proposed ReInf not only outperforms the state-of-the-art methods, but also achieves reasonable turn reference and interpretable slot co-reference.

Automatic Intent-Slot Induction for Dialogue Systems

Automatically and accurately identifying user intents and filling the associated slots from their spoken language are critical to the success of dialogue systems. Traditional methods require manually defining the DOMAIN-INTENT-SLOT schema and asking many domain experts to annotate the corresponding utterances, upon which neural models are trained. This procedure brings challenges such as hindered information sharing, out-of-schema issues, and data sparsity in open-domain dialogue systems. To tackle these challenges, we explore a new task of automatic intent-slot induction and propose a novel domain-independent tool. That is, we design a coarse-to-fine three-step procedure including Role-labeling, Concept-mining, And Pattern-mining (RCAP): (1) role-labeling: extracting key phrases from users’ utterances and classifying them into a quadruple of coarsely-defined intent-roles via sequence labeling; (2) concept-mining: clustering the extracted intent-role mentions and naming them into abstract fine-grained concepts; (3) pattern-mining: applying the Apriori algorithm to mine intent-role patterns and automatically inferring the intent-slot using these coarse-grained intent-role labels and fine-grained concepts. Empirical evaluations on both real-world in-domain and out-of-domain datasets show that: (1) our RCAP can generate a satisfactory SLU schema and outperforms the state-of-the-art supervised learning method; (2) our RCAP can be directly applied to out-of-domain datasets and gain at least 76% improvement of F1-score on intent detection and 41% improvement of F1-score on slot filling; (3) our RCAP exhibits its power in generic intent-slot extractions with less manual effort, which opens pathways for schema induction on new domains and unseen intent-slot discovery for generalizable dialogue systems.

Multilingual COVID-QA: Learning towards Global Information Sharing via Web Question Answering in Multiple Languages

Since late December 2019, an outbreak of atypical pneumonia, now known as COVID-19 and caused by a novel coronavirus, has been reported. Cases have spread to more than 200 countries and regions internationally. The World Health Organization (WHO) officially declared the coronavirus outbreak a pandemic, and the public health emergency has had a world-wide impact on daily lives: people are advised to keep social distance, in-person events have been moved online, and some functional facilities have been locked down. As an alternative, the Web has become an active venue for people to share information. On this ongoing topic, people continuously post questions online and seek answers. Yet, sharing global information conveyed in different languages is challenging because the language barrier is intrinsically unfriendly to monolingual speakers. In this paper, we propose a multilingual COVID-QA model to answer people’s questions in their own languages while the model is able to absorb knowledge from other languages. Another challenge is that in most cases, the information to share does not have parallel data in multiple languages. To this end, we propose a novel framework which incorporates (unsupervised) translation alignment to learn pseudo-parallel data. Then we train multilingual question-answering mapping and generation. We demonstrate the effectiveness of our proposed approach compared against a series of competitive baselines. In this way, we make it easier to share global information across language barriers, and hopefully we contribute to the battle against COVID-19.

ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks

With the development of deep learning techniques and large-scale datasets, question answering (QA) systems have improved quickly, providing more accurate and satisfying answers. However, current QA systems focus either on the sentence-level answer, i.e., answer selection, or on the phrase-level answer, i.e., machine reading comprehension. How to produce compositional answers has not been thoroughly investigated. In compositional question answering, the systems should assemble several pieces of supporting evidence from the document to generate the final answer, which is more difficult than sentence-level or phrase-level QA. In this paper, we present a large-scale compositional question answering dataset containing more than 120k human-labeled questions. The answer in this dataset is composed of discontiguous sentences in the corresponding document. To tackle the ComQA problem, we propose a hierarchical graph neural network, which represents the document from the low-level word to the high-level sentence. We also devise a question selection and node selection task for pre-training. Our proposed model achieves a significant improvement over previous machine reading comprehension methods and pre-training methods. Code and dataset can be found at https://github.com/benywon/ComQA.

Cross-domain Knowledge Distillation for Retrieval-based Question Answering Systems

Question Answering (QA) systems have been extensively studied in both academia and the research community due to their wide real-world applications. When building such industrial-scale QA applications, we are facing two prominent challenges, i.e., i) lacking a sufficient amount of training data to learn an accurate model and ii) requiring high inference speed for online model serving. There are generally two ways to mitigate the above-mentioned problems. One is to adopt transfer learning to leverage information from other domains; the other is to distill the “dark knowledge” from a large teacher model to small student models. The former usually employs parameter sharing mechanisms for knowledge transfer, but does not utilize the “dark knowledge” of pre-trained large models. The latter usually does not consider the cross-domain information from other domains. We argue that these two types of methods can be complementary to each other. Hence in this work, we provide a new perspective on the potential of the teacher-student paradigm facilitating cross-domain transfer learning, where the teacher and student tasks belong to heterogeneous domains, with the goal to improve the student model’s performance in the target domain. Our framework considers the “dark knowledge” learned from large teacher models and also leverages the adaptive hints to alleviate the domain differences between teacher and student models. Extensive experiments have been conducted on two text matching tasks for retrieval-based QA systems. Results show the proposed method has better performance than the competing methods including the existing state-of-the-art transfer learning methods. We have also deployed our method in an online production system and observed significant improvements compared to the existing approaches in terms of both accuracy and cross-domain robustness.

SESSION: Session: Ontologies and Knowledge Extraction

Computing Views of OWL Ontologies for the Semantic Web

This paper tackles the problem of computing views of OWL ontologies using a forgetting-based approach. In traditional relational databases, a view is a subset of a database, whereas in ontologies, a view is more than a subset; it contains not only axioms contained in the original ontology, but may also contain newly-derived axioms entailed by the original ontology (implicitly contained in the original ontology). Specifically, given an ontology, its signature is the set of all the names occurring in it, and a view of the ontology is a new ontology obtained from it using only part of its signature, namely the target signature, while preserving all logical entailments up to the target signature. Computing views of OWL ontologies is useful for Semantic Web applications such as ontology-based query answering, in that the view can be used as a substitute for the original ontology to answer queries formulated with the target signature, and information hiding, in the sense that it restricts users from viewing certain information of an ontology.

Forgetting is a form of non-standard reasoning concerned with eliminating from an ontology a subset of its signature, namely the forgetting signature, in such a way that all logical entailments are preserved up to the target signature. Forgetting can thus be used as a means for computing views of OWL ontologies: the result of forgetting a set of names from an ontology is the view of that ontology for the target signature.

In this paper, we present a forgetting-based method for computing views of OWL ontologies specified in the description logic ALCHOI, the basic ALC extended with role hierarchy, nominals and inverse roles. The method is terminating and sound. Although the method is not complete, an evaluation with a prototype implementation on a corpus of real-world ontologies has shown very good success rates. This is very useful from the perspective of the Semantic Web, as it provides knowledge engineers with a powerful tool for creating views of OWL ontologies.

Advanced Semantics for Commonsense Knowledge Extraction

Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications such as robust chatbots. Prior works like ConceptNet, TupleKB and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and monolithic strings for P and O. Also, these projects have either prioritized precision or recall, but hardly reconcile these complementary goals. This paper presents a methodology, called Ascent, to automatically build a large-scale knowledge base (KB) of CSK assertions, with advanced expressiveness and both better precision and recall than prior works. Ascent goes beyond triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter are important to express temporal and spatial validity of assertions and further qualifiers. Ascent combines open information extraction with judicious cleaning using language models. Intrinsic evaluation shows the superior size and quality of the Ascent KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent. A web interface, data and code can be found at https://www.mpi-inf.mpg.de/ascent.

DISCOS: Bridging the Gap between Discourse Knowledge and Commonsense Knowledge

Commonsense knowledge is crucial for artificial intelligence systems to understand natural language. Previous commonsense knowledge acquisition approaches typically rely on human annotations (for example, ATOMIC) or text generation models (for example, COMET). Human annotation could provide high-quality commonsense knowledge, yet its high cost often results in relatively small scale and low coverage. On the other hand, generation models have the potential to automatically generate more knowledge. Nonetheless, machine learning models often fit the training data well and thus struggle to generate high-quality novel knowledge. To address the limitations of previous approaches, in this paper, we propose an alternative commonsense knowledge acquisition framework DISCOS (from DIScourse to COmmonSense), which automatically populates expensive complex commonsense knowledge to more affordable linguistic knowledge resources. Experiments demonstrate that we can successfully convert discourse knowledge about eventualities from ASER, a large-scale discourse knowledge graph, into if-then commonsense knowledge defined in ATOMIC without any additional annotation effort. Further study suggests that DISCOS significantly outperforms previous supervised approaches in terms of novelty and diversity with comparable quality. In total, we can acquire 3.4M ATOMIC-like inferential commonsense knowledge by populating ATOMIC on the core part of ASER. Code and data are available at https://github.com/HKUST-KnowComp/DISCOS-commonsense.

Role-Aware Modeling for N-ary Relational Knowledge Bases

N-ary relational knowledge bases (KBs) represent knowledge with binary and beyond-binary relational facts. In particular, in an n-ary relational fact, the involved entities play different roles, e.g., the ternary relation PlayCharacterIn consists of three roles, Actor, Character and Movie. However, existing approaches are often directly extended from binary relational KBs, i.e., knowledge graphs, while missing the important semantic property of role. Therefore, we start from the role level, and propose Role-Aware Modeling, RAM for short, for facts in n-ary relational KBs. RAM explores a latent space that contains basis vectors, and represents roles by linear combinations of these vectors. This encourages semantically related roles to have close representations. RAM further introduces a pattern matrix that captures the compatibility between the role and all involved entities. To this end, it presents a multilinear scoring function to measure the plausibility of a fact composed of certain roles and entities. We show that RAM achieves both theoretical full expressiveness and computational efficiency, and that it also provides an elegant generalization for approaches in binary relational KBs. Experiments demonstrate that RAM outperforms representative baselines on both n-ary and binary relational datasets.

Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus

With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach for improving the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process.

Our initial experiments demonstrate the strong performance across multiple datasets of our DL approach in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors.
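As a quick consistency check on the reported ranges, F1 is the harmonic mean of precision and recall. The sketch below assumes, purely for illustration, that the lower and upper ends of the quoted precision and recall ranges pair up; the abstract does not say which dataset produced which value, and the function name is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both expected in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical pairings of the range endpoints quoted in the abstract.
low = f1_score(precision=0.88, recall=0.91)
high = f1_score(precision=0.99, recall=0.92)
print(f"{low:.3f} {high:.3f}")
```

Under that assumed pairing, both values round to the ends of the reported F1 range (89-95%).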

SESSION: Session: Security

Towards a Lightweight, Hybrid Approach for Detecting DOM XSS Vulnerabilities with Machine Learning

Client-side cross-site scripting (DOM XSS) vulnerabilities in web applications are common, hard to identify, and difficult to prevent. Taint tracking is the most promising approach for detecting DOM XSS with high precision and recall, but is too computationally expensive for many practical uses.

We investigate whether machine learning (ML) classifiers can replace or augment taint tracking when detecting DOM XSS vulnerabilities. Through a large-scale web crawl, we collect over 18 billion JavaScript functions and use taint tracking to label over 180,000 functions as potentially vulnerable. With this data, we train a deep neural network (DNN) to analyze a JavaScript function and predict if it is vulnerable to DOM XSS. We experiment with a range of hyperparameters and present a low-latency, high-recall classifier that could serve as a pre-filter to taint tracking, reducing the cost of stand-alone taint tracking by 3.43× while detecting 94.5% of unique vulnerabilities. We argue that this combination of a DNN and taint tracking is efficient enough for a range of use cases for which taint tracking by itself is not, including in-browser run-time DOM XSS detection and analyzing large codebases.

An Empirical Study of Real-World WebAssembly Binaries: Security, Languages, Use Cases

WebAssembly has emerged as a low-level language for the web and beyond. Despite its popularity in different domains, little is known about WebAssembly binaries that occur in the wild. This paper presents a comprehensive empirical study of 8,461 unique WebAssembly binaries gathered from a wide range of sources, including source code repositories, package managers, and live websites. We study the security properties, source languages, and use cases of the binaries and how they influence the security of the WebAssembly ecosystem. Our findings update some previously held assumptions about real-world WebAssembly and highlight problems that call for future research. For example, we show that vulnerabilities that propagate from insecure source languages potentially affect a wide range of binaries (e.g., two thirds of the binaries are compiled from memory unsafe languages, such as C and C++) and that 21% of all binaries import potentially dangerous APIs from their host environment. We also show that cryptomining, which once accounted for the majority of all WebAssembly code, has been marginalized (less than 1% of all binaries found on the web) and gives way to a diverse set of use cases. Finally, 29% of all binaries on the web are minified, calling for techniques to decompile and reverse engineer WebAssembly. Overall, our results show that WebAssembly has left its infancy and is growing up into a language that powers a diverse ecosystem, with new challenges and opportunities for security researchers and practitioners. Besides these insights, we also share the dataset underlying our study, which is 58 times larger than the largest previously reported benchmark.

Security of Alerting Authorities in the WWW: Measuring Namespaces, DNSSEC, and Web PKI

During disasters, crises, and emergencies the public relies on online services provided by official authorities to receive timely alerts, trustworthy information, and access to relief programs. It is therefore crucial for the authorities to reduce risks when accessing their online services. This includes catering to secure identification of service, secure resolution of name to network service, and content security and privacy as a minimum base for trustworthy communication.

In this paper, we take a first look at Alerting Authorities (AA) in the US and investigate security measures related to trustworthy and secure communication. We study the domain namespace structure, DNSSEC penetration, and web certificates. We introduce an integrative threat model to better understand whether and how the online presence and services of AAs are harmed. As an illustrative example, we investigate 1,388 Alerting Authorities. We observe partially heightened security relative to the global Internet trends, yet find cause for concern as about 78% of service providers fail to deploy measures of trustworthy service provision. Our analysis shows two major shortcomings. First, how the DNS ecosystem is leveraged: about 50% of organizations do not own their dedicated domain names and are dependent on others, 55% opt for unrestricted-use namespaces, which simplifies phishing, and less than 4% of unique AA domain names are secured by DNSSEC, which can lead to DNS poisoning and possibly to certificate misissuance. Second, how Web PKI certificates are utilized: 15% of all hosts provide no or invalid certificates, and thus cannot cater to confidentiality and data integrity, 64% of the hosts provide domain validation certificates that lack any identity information, and shared certificates have gained in popularity, which leads to fate-sharing and can be a cause for instability.

LChecker: Detecting Loose Comparison Bugs in PHP

Weakly-typed languages such as PHP support loosely comparing two operands by implicitly converting their types and values. Such a language feature is widely used but can also pose severe security threats. In certain conditions, loose comparisons can cause unexpected results, leading to authentication bypass and other functionality problems.
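To make the hazard concrete, the sketch below emulates in Python one well-known case of PHP's loose `==`: two strings that both look numeric are compared by numeric value, so distinct strings in exponent notation with a zero mantissa compare equal. The helper names and the simplified numeric-string pattern are assumptions for illustration; this is not PHP's full comparison table and not LChecker's analysis.

```python
import re

# Simplified approximation of a PHP "numeric string" (assumption: optional
# sign, digits, optional fraction, optional exponent; whitespace ignored).
_NUMERIC = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?$")


def is_numeric_string(s: str) -> bool:
    return bool(_NUMERIC.match(s))


def loose_eq(a: str, b: str) -> bool:
    """Emulate PHP `==` on two strings: numeric strings compare by value."""
    if is_numeric_string(a) and is_numeric_string(b):
        return float(a) == float(b)
    return a == b


def strict_eq(a: str, b: str) -> bool:
    """Emulate PHP `===` on two strings: plain string equality."""
    return a == b


# Two different hash-like strings; both parse as 0 * 10^N, i.e. 0.0.
stored = "0e462097431906509019562988736854"
supplied = "0e830400451993494058024219903391"

print(loose_eq(stored, supplied))   # True: an authentication check would pass
print(strict_eq(stored, supplied))  # False: strict comparison rejects it
```

A check written with the strict operator avoids this class of bypass, which is why such comparisons are a natural target for static detection.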

In this paper, we present the first in-depth study of such loose comparison bugs. We develop LChecker, a system to statically detect PHP loose comparison bugs. It employs a context-sensitive inter-procedural data-flow analysis together with several new techniques. We also enhance the PHP interpreter to help dynamically validate the detected bugs. Our evaluation shows that LChecker can both effectively and efficiently detect PHP loose comparison bugs with a reasonably low false-positive rate. It also successfully detected all previously known bugs in our evaluation dataset with no false negatives. Using LChecker, we discovered 42 new loose comparison bugs and were assigned 9 new CVE IDs.

SEPAL: Towards a Large-scale Analysis of SEAndroid Policy Customization

Nowadays, SEAndroid has been widely deployed in Android devices to enforce security policies and provide flexible mandatory access control (MAC), for the purpose of narrowing down attack surfaces and restricting risky operations. Generally, the original SEAndroid security policy rules are carefully and strictly written and maintained by the Android community. However, in practice, mobile device manufacturers usually have to customize these policy rules and add their own new rules to satisfy their functionality extensions, which breaks the integrity of SEAndroid and causes serious security issues. Still, up to now, it is a challenging task to identify these security issues due to the large and ever-increasing number of policy rules, as well as the complexity of policy semantics.

To investigate the status quo of SEAndroid policy customization, we propose SEPAL, a universal tool to automatically retrieve and examine the customized policy rules. SEPAL applies NLP techniques and trains a wide&deep model to quickly and precisely predict whether a rule is unregulated or not. Our evaluation shows SEPAL is effective, practical and scalable. We verify that SEPAL outperforms the state-of-the-art approach (i.e., EASEAndroid) by 15% in accuracy on average. In our experiments, SEPAL successfully identifies 7,111 unregulated policy rules with a low false positive rate from 595,236 customized rules (extracted from 774 Android firmware images of 72 manufacturers). We further discover that the policy customization problem is getting worse in newer Android versions (e.g., around 8% for Android 7 and nearly 20% for Android 9), even though more and more effort is being made. Then, we conduct a deep study and discuss why the unregulated rules are introduced and how they can compromise user devices. Last, we report some unregulated rules to seven vendors, and so far four of them have confirmed our findings.

SESSION: Session: User Experience

From Personal Data to Digital Legacy: Exploring Conflicts in the Sharing, Security and Privacy of Post-mortem Data

As digital technologies become more prevalent there is a growing awareness of the importance of good security and privacy practices. The tools and techniques used to achieve this are typically designed with the living user in mind, with little consideration of how they should or will perform after the user has died. We report on two workshops carried out with users of password managers to explore their views on the post-mortem sharing, security and privacy of a range of common digital assets. We discuss a post-mortem privacy paradox where users recognise value in planning for their digital legacy, yet avoid actively doing so. Importantly, our findings highlight a tension between the use of recommended security tools during life and facilitating appropriate post-mortem access to chosen assets. We offer design recommendations to facilitate and encourage digital legacy planning while promoting good security habits during life.

ConceptGuide: Supporting Online Video Learning with Concept Map-based Recommendation of Learning Path

People increasingly use online video platforms, e.g., YouTube, to locate educational videos to acquire knowledge or skills to meet personal learning needs. However, most existing video platforms display video search results in generic ranked lists based on relevance to queries. The design of relevance-oriented information display does not take into account the inner structure of the knowledge domain, and may not suit the needs of online learners. In this paper, we present ConceptGuide, a prototype system for learning orientations to support ad hoc online learning from unorganized video materials. ConceptGuide features a computational pipeline that performs content analysis on the transcripts of YouTube videos retrieved for a topic, and generates concept-map-based visual recommendations of inter-concept and inter-video links, forming learning pathways as structures for learners to consume. We evaluated ConceptGuide by comparing the design to the general-purpose interface of YouTube in learning experiences and behaviors. ConceptGuide was found to improve the efficiency of video learning and helped learners explore the knowledge of interest in many constructive ways.

An Experimental Study to Understand User Experience and Perception Bias Occurred by Fact-checking Messages

Fact-checking has become the de facto solution for fighting fake news online. This research brings attention to the unexpected and diminished effect of fact-checking due to cognitive biases. We conducted an experiment (66,870 decisions) comparing the change in users’ stance toward unproven claims before and after being presented with a hypothetical fact-checked condition. We found that, first, the claims tagged with the ‘Lack of Evidence’ label are recognized similarly as false information unlike other borderline labels, indicating the presence of uncertainty-aversion bias in response to insufficient information. Second, users who initially show disapproval toward a claim are less likely to correct their views later than those who initially approve of the same claim when opposite fact-checking labels are shown, an indication of disapproval bias. Finally, user interviews revealed that users are more likely to share claims with Divided Evidence than those with Lack of Evidence among borderline messages, reaffirming the presence of uncertainty-aversion bias. On average, we confirm that fact-checking helps users correct their views and reduces the circulation of falsehoods by leading them to abandon extreme views. Simultaneously, the presence of two biases reveals that fact-checking does not always elicit the desired user experience and that the outcome varies by the design of fact-checking messages and people’s initial view. These new observations have direct implications for multiple stakeholders, including platforms, policy-makers, and online users.

Touch Screen Exploration of Visual Artwork for Blind People

This paper investigates how touchscreen exploration and verbal feedback can be used to support blind people to access visual artwork. We present two artwork exploration modalities. The first one, attribute-based exploration, extends prior work on touchscreen image accessibility, and provides fine-grained segmentation of artwork visual elements; when the user touches an element, the associated attributes are read. The second one, hierarchical exploration, is designed with domain experts and provides multi-level segmentation of the artwork; the user initially accesses a general description of the entire artwork and then explores a coarse segmentation of the visual elements with the corresponding high-level descriptions; once selected, coarse segments are subdivided into fine-grained ones, which the user can access for more detailed descriptions.

The two exploration modalities, implemented as a mobile web app, were evaluated through a user study with 10 blind participants. Both modalities were appreciated by the participants. Attribute-based exploration was perceived to be easier to access. In contrast, hierarchical exploration was considered more understandable, useful, interesting and captivating, and the participants remembered more details about the artwork with this modality. Participants commented that the two modalities work well together and therefore both should be made available.

Generating Accurate Caption Units for Figure Captioning

Scientific-style figures are commonly used on the web to present numerical information. Captions that tell accurate figure information and sound natural would significantly improve figure accessibility. In this paper, we present promising results on machine figure captioning. A recent corpus analysis of real-world captions reveals that machine figure captioning systems should start by generating accurate caption units. We formulate the caption unit generation problem as a controlled captioning problem. Given a caption unit type as a control signal, a model generates an accurate caption unit of that type. As a proof-of-concept on single bar charts, we propose a model, FigJAM, that achieves this goal through utilizing metadata information and a joint static and dynamic dictionary. Quantitative evaluations with two datasets from the figure question answering task show that our model can generate more accurate caption units than competitive baseline models. A user study with ten human experts confirms the value of machine-generated caption units in their standalone accuracy and naturalness. Finally, a post-editing simulation study demonstrates the potential for models to paraphrase and stitch together single-type caption units into multi-type captions by learning from data.

SESSION: Session: Online Advertising and Pricing

Stochastic bandits for multi-platform budget optimization in online advertising

We study the problem of an online advertising system that wants to optimally spend an advertiser’s given budget for a campaign across multiple platforms, without knowing the value for showing an ad to the users on those platforms. We model this challenging practical application as a Stochastic Bandits with Knapsacks problem over T rounds of bidding with the set of arms given by the set of distinct bidding m-tuples, where m is the number of platforms. First, we modify the algorithm proposed in Badanidiyuru et al. [11] to extend it to the case of multiple platforms, obtaining an algorithm for both the discrete and continuous bid spaces. Namely, for discrete bid spaces we give an algorithm whose regret is bounded in terms of OPT, where OPT is the performance of the optimal algorithm that knows the distributions. For continuous bid spaces we give a corresponding regret bound for our algorithm. When restricted to this special case, this bound improves over Sankararaman and Slivkins [34] in the regime $\mathrm{OPT} \ll T$, as is the case in the particular application at hand. Second, we show a lower bound for the discrete case and an $\Omega(m^{1/3} B^{2/3})$ lower bound for the continuous setting, almost matching the upper bounds. Finally, we use a real-world data set from a large internet online advertising company with multiple ad platforms and show that our algorithms outperform common benchmarks and satisfy the required properties warranted in the real-world application.
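The arm set described above, one bid per platform, can be sketched as a Cartesian product. The function name and the assumption that every platform shares the same discrete bid grid are hypothetical simplifications for illustration, not details taken from the paper.

```python
from itertools import product


def bidding_arms(bid_levels, num_platforms):
    """Enumerate all bidding m-tuples: one bid per platform.

    Assumes (for illustration) that every platform uses the same discrete
    grid of bid levels.
    """
    return list(product(bid_levels, repeat=num_platforms))


# Hypothetical example: 3 bid levels on each of m = 2 platforms.
arms = bidding_arms(bid_levels=[0.0, 0.5, 1.0], num_platforms=2)
print(len(arms))  # 3 ** 2 = 9 arms
print(arms[:3])   # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0)]
```

The number of arms grows exponentially in m under this naive enumeration, which helps explain why the dependence of the bounds on m matters.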

Incrementality Testing in Programmatic Advertising: Enhanced Precision with Double-Blind Designs

Measuring the incremental value of advertising (incrementality) is critical for financial planning and budget allocation by advertisers. Running randomized controlled experiments is the gold standard in marketing incrementality measurement. Current literature and industry practices to run incrementality experiments focus on running placebo, intention-to-treat (ITT), or ghost bidding based experiments. A fundamental challenge with these is that the serving engine as treatment administrator is not blind to the user treatment assignment. Similarly, ITT and ghost bidding solutions provide greatly decreased precision since many experiment users never see ads. We present a novel randomized design solution for incrementality testing based on ghost bidding with improved measurement precision. Our design provides faster and cheaper results including double-blind, to the users and to the serving engine, post-auction experiment execution without ad targeting bias. We also identify ghost impressions in open ad exchanges by matching the bidding values or ads sent to external auctions with held-out bid values. This design leads to higher precision than ITT or current ghost bidding solutions. Our proposed design has been fully deployed in a real production system within a commercial programmatic ad network combined with a Demand Side Platform (DSP) that places ad bids in third-party ad exchanges. We have found reductions of up to 85% of the advertiser budget to reach statistical significance with typical ghost bids conversion and winner rates. Moreover, the highest statistical power at 50% control size design of this current practice is reached at 8% of our proposed design.
By deploying this design, for an advertiser in the insurance industry, to measure the incrementality of display and native programmatic advertising, we have found conclusive evidence that the last-touch attribution framework (current industry standard) undervalues these channels by 87% when compared to the incremental conversions derived from the experiment.

FM2: Field-matrixed Factorization Machines for Recommender Systems

Click-through rate (CTR) prediction plays a critical role in recommender systems and online advertising. The data used in these applications are multi-field categorical data, where each feature belongs to one field. Field information has proved to be important, and several works consider fields in their models. In this paper, we propose a novel approach to model the field information effectively and efficiently. The proposed approach is a direct improvement of FwFM, and is named Field-matrixed Factorization Machines (FmFM, or FM2). We also propose a new explanation of FM and FwFM within the FmFM framework, and compare it with the FFM. Besides pruning the cross terms, our model supports field-specific variable dimensions of embedding vectors, which acts as a soft pruning. We also propose an efficient way to minimize the dimension while keeping the model performance. The FmFM model can also be optimized further by caching the intermediate vectors, and it only takes thousands of floating-point operations (FLOPs) to make a prediction. Our experimental results show that it can outperform the FFM, which is more complex. The FmFM model’s performance is also comparable to DNN models, which require many more FLOPs at runtime.
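One reading of the field-matrixed idea is that each pair of fields gets its own matrix, and the interaction between two features is the inner product after one embedding is multiplied by that matrix. The pure-Python sketch below follows that reading; the function names, and the claim that an identity matrix recovers an FM-style dot product while a scaled identity recovers an FwFM-style weighted dot product, are assumptions to be checked against the paper.

```python
def vec_times_matrix(v, M):
    """Row vector times matrix: returns v @ M as a list."""
    return [sum(v[k] * M[k][c] for k in range(len(v))) for c in range(len(M[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fmfm_interaction(v_i, v_j, M):
    """Pairwise term <v_i M, v_j>, where M belongs to the (field_i, field_j) pair."""
    return dot(vec_times_matrix(v_i, M), v_j)


def scaled_identity(n, r=1.0):
    return [[r if a == b else 0.0 for b in range(n)] for a in range(n)]


v_i, v_j = [1.0, 2.0], [3.0, 4.0]
print(fmfm_interaction(v_i, v_j, scaled_identity(2)))       # 11.0, plain dot product
print(fmfm_interaction(v_i, v_j, scaled_identity(2, 0.5)))  # 5.5, weighted dot product

# A rectangular matrix lets the two fields use different embedding sizes.
M_rect = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps dimension 3 to dimension 2
print(fmfm_interaction([1.0, 2.0, 3.0], [2.0, 1.0], M_rect))  # 13.0
```

The rectangular case illustrates how field-specific embedding dimensions can coexist, which is the soft pruning the abstract mentions.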

Integrating Floor Plans into Hedonic Models for Rent Price Appraisal

Online real estate platforms have become significant marketplaces facilitating users’ search for an apartment or a house. Yet it remains challenging to accurately appraise a property’s value. Prior works have primarily studied real estate valuation based on hedonic price models that take structured data into account, while accompanying unstructured data is typically ignored. In this study, we investigate to what extent an automated visual analysis of apartment floor plans on online real estate platforms can enhance hedonic rent price appraisal. We propose a tailored two-stage deep learning approach to learn price-relevant designs of floor plans from historical price data. Subsequently, we integrate the floor plan predictions into hedonic rent price models that account for both structural and locational characteristics of an apartment. Our empirical analysis based on a unique dataset of 9,174 real estate listings suggests that current hedonic models underutilize the available data. We find that (1) the visual design of floor plans has significant explanatory power regarding rent prices, even after controlling for structural and locational apartment characteristics, and (2) harnessing floor plans results in an up to 10.56% lower out-of-sample prediction error. We further find that floor plans yield a particularly high gain in prediction performance for older and smaller apartments. Altogether, our empirical findings contribute to the existing research body by establishing the link between the visual design of floor plans and real estate prices. Moreover, our approach has important implications for online real estate platforms, which can use our findings to enhance user experience in their real estate listings.

TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored Search

Text encoders based on C-DSSM or transformers have demonstrated strong performance in many Natural Language Processing (NLP) tasks. Low-latency variants of these models have also been developed in recent years in order to apply them in the field of sponsored search, which has strict computational constraints. However, these models are not a panacea for all Natural Language Understanding (NLU) challenges, as the pure semantic information in the data is not sufficient to fully identify user intents. We propose the TextGNN model, which naturally extends the strong twin-tower structured encoders with complementary graph information from users’ historical behaviors, which serves as a natural guide to help us better understand the intents and hence generate better language representations. The model inherits all the benefits of twin-tower models such as C-DSSM and TwinBERT, so that it can still be used in low-latency environments while achieving a significant performance gain over strong encoder-only baseline models in both offline evaluations and an online production system. In offline experiments, the model achieves a 0.14% overall increase in ROC-AUC with a 1% increase in accuracy for long-tail low-frequency Ads, and in online A/B testing, the model shows a 2.03% increase in Revenue Per Mille with a 2.32% decrease in Ad defect rate.

SESSION: Session: Search

Leveraging User Behavior History for Personalized Email Search

An effective email search engine can facilitate users’ search tasks and improve their communication efficiency. Users could have varied preferences on various ranking signals of an email, such as relevance and recency based on their tasks at hand and even their jobs. Thus a uniform matching pattern is not optimal for all users. Instead, an effective email ranker should conduct personalized ranking by taking users’ characteristics into account. Existing studies have explored user characteristics from various angles to make email search results personalized. However, little attention has been given to users’ search history for characterizing users. Although users’ historical behaviors have been shown to be beneficial as context in Web search, their effect in email search has not been studied and remains unknown. Given these observations, we propose to leverage user search history as query context to characterize users and build a context-aware ranking model for email search. In contrast to previous context-dependent ranking techniques that are based on raw texts, we use ranking features in the search history. This frees us from potential privacy leakage while giving a better generalization power to unseen users. Accordingly, we propose a context-dependent neural ranking model (CNRM) that encodes the ranking features in users’ search history as query context and show that it can significantly outperform the baseline neural model without using the context. We also investigate the benefit of the query context vectors obtained from CNRM on the state-of-the-art learning-to-rank model LambdaMart by clustering the vectors and incorporating the cluster information. Experimental results show that significantly better results can be achieved on LambdaMart as well, indicating that the query clusters can characterize different users and effectively turn the ranking model personalized.

Partial-Softmax Loss based Deep Hashing

Recently, deep supervised hashing methods have shown state-of-the-art performance by integrating feature learning and hash code learning into an end-to-end network to generate high-quality hash codes. However, it is still a challenge to efficiently learn discriminative hash codes that preserve the label information of images. To overcome this difficulty, in this paper, we propose a novel Partial-Softmax Loss based Deep Hashing, called PSLDH, to generate high-quality hash codes. Specifically, PSLDH first trains a category hashing network to generate a discriminative hash code for each category, and the hash code will preserve semantic information of the corresponding category well. Then, instead of defining the similarity between data pairs using their corresponding label vectors, we directly use the learned hash codes of categories to supervise the learning process of the image hashing network, and a novel Partial-SoftMax loss is proposed to optimize the image hashing network. By minimizing the novel Partial-SoftMax loss, the learned hash codes can preserve the label information of images sufficiently. Extensive experiments on three benchmark datasets show that the proposed method outperforms the state-of-the-art baselines on the image retrieval task.

Unsupervised Multi-Index Semantic Hashing

Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan strategy for searching over all the hash codes, even though much faster alternatives exist. One such alternative is multi-index hashing, an approach that constructs a smaller candidate set to search over, which, depending on the distribution of the hash codes, can lead to sub-linear search time. In this work, we propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives, which enable learning hash codes that reduce the candidate sets produced by multi-index hashing, while being end-to-end trainable. In fact, our proposed training objectives are model agnostic, i.e., not tied to how the hash codes are generated specifically in MISH, and are straightforward to include in existing and future semantic hashing models. We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search. We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH, while MISH is still able to obtain state-of-the-art effectiveness.
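As an illustration of the multi-index hashing idea mentioned in the abstract above, the following minimal Python sketch builds one hash table per substring of the code and uses the pigeonhole principle to collect candidates before verifying them with the full Hamming distance. It is a sketch under stated assumptions, not the MISH model or its training objectives; the class name `MultiIndex`, the helper names, and the bit-string representation are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code, num_blocks):
    """Split a bit string into num_blocks contiguous blocks of near-equal length."""
    base, extra = divmod(len(code), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        length = base + (1 if i < extra else 0)
        blocks.append(code[start:start + length])
        start += length
    return blocks


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(block, radius):
    """All bit strings within Hamming distance `radius` of `block`."""
    out = []
    for r in range(radius + 1):
        for positions in combinations(range(len(block)), r):
            bits = list(block)
            for p in positions:
                bits[p] = "1" if bits[p] == "0" else "0"
            out.append("".join(bits))
    return out


class MultiIndex:
    """Hypothetical toy index: one hash table per code block."""

    def __init__(self, codes, num_blocks):
        self.codes = codes
        self.num_blocks = num_blocks
        self.tables = [defaultdict(list) for _ in range(num_blocks)]
        for idx, code in enumerate(codes):
            for t, block in enumerate(split_code(code, num_blocks)):
                self.tables[t][block].append(idx)

    def search(self, query, radius):
        """Return (sorted indices within `radius`, number of candidates checked)."""
        # Pigeonhole: a code within `radius` of the query must match the query
        # in at least one block up to radius // num_blocks bit flips.
        block_radius = radius // self.num_blocks
        candidates = set()
        for t, block in enumerate(split_code(query, self.num_blocks)):
            for probe in neighbors(block, block_radius):
                candidates.update(self.tables[t].get(probe, ()))
        # Verify candidates with the full Hamming distance.
        hits = sorted(i for i in candidates if hamming(query, self.codes[i]) <= radius)
        return hits, len(candidates)


codes = ["00000000", "00000011", "11111111", "00110000"]
index = MultiIndex(codes, num_blocks=2)
hits, num_candidates = index.search("00000001", radius=1)
```

In this toy example only two of the four codes are verified against the query; how small the candidate set gets in practice depends on how the hash codes are distributed across the blocks.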

Learning a Product Relevance Model from Click-Through Data in E-Commerce

The search engine plays a fundamental role in online e-commerce systems, helping users find the products they want from massive product collections. Relevance is an essential requirement for e-commerce search, since showing products that do not match search query intent will degrade user experience. Because of the vocabulary gap between the user language of queries and the seller language of products, measuring semantic relevance is necessary, and neural networks are engaged to address this task. However, semantic relevance is different from click-through rate prediction in that no direct training signal is available. Most previous attempts learn relevance models from user click-through data, which are cheap and abundant. Unfortunately, click behavior is noisy and misleading, as it is affected not only by relevance but also by factors including price, image and attractive titles. Therefore, it is challenging but valuable to learn relevance models from click-through data. In this paper, we propose a new relevance learning framework that concentrates on how to train a relevance model from the weak supervision of click-through data. Different from previous efforts that treat samples as either relevant or irrelevant, we construct more fine-grained samples for training. We propose a novel way to consider samples of different relevance confidence, and come up with a new training objective to learn a robust relevance model with a desirable score distribution. The proposed model is evaluated on offline annotated data and online A/B testing, and it achieves both promising performance and high computational efficiency. The model has already been deployed online, serving the search traffic of Taobao for over a year.

High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding

Recently, with the discoveries in neurobiology, high-dimensional sparse hashing has attracted increasing attention. In contrast with general hashing that generates low-dimensional hash codes, high-dimensional sparse hashing maps inputs into a higher-dimensional space and generates sparse hash codes, achieving superior performance. However, sparse hashing has not yet been fully studied in the hashing literature. For example, it remains open how to fully exploit the power of sparse coding in cross-modal retrieval tasks, and how to discretely solve the binary and sparse constraints so as to avoid the quantization error problem. Motivated by these issues, in this paper, we present an efficient sparse hashing method, i.e., High-dimensional Sparse Cross-modal Hashing, HSCH for short. It not only takes the high-level semantic similarity of data into consideration, but also properly exploits the low-level feature similarity. Specifically, we theoretically design a fine-grained similarity with two critical fusion rules. Then we take advantage of sparse codes to embed the fine-grained similarity into the to-be-learnt hash codes. Moreover, an efficient discrete optimization algorithm is proposed to solve the binary and sparse constraints, reducing the quantization error. As a result, the model becomes much more trainable, and the learnt hash codes are more discriminative. More importantly, retrieval with HSCH is as efficient as with general hash methods. Extensive experiments on three widely-used datasets demonstrate the superior performance of HSCH compared with several state-of-the-art cross-modal hashing approaches.

SESSION: Session: Link prediction

Hashing-Accelerated Graph Neural Networks for Link Prediction

Networks are ubiquitous in the real world. Link prediction, as one of the key problems for network-structured data, aims to predict whether there exists a link between two nodes. The traditional approaches are based on explicit similarity computation between compact node representations obtained by embedding each node into a low-dimensional space. In order to efficiently handle the intensive similarity computation in link prediction, the hashing technique has been successfully used to produce node representations in the Hamming space. However, hashing-based link prediction algorithms face accuracy loss from the randomized hashing techniques or inefficiency from the learning to hash techniques in the embedding process. Currently, the Graph Neural Network (GNN) framework is widely applied to graph-related tasks in an end-to-end manner, but it commonly requires substantial computational resources and memory costs due to massive parameter learning, which makes GNN-based algorithms impractical without the help of a powerful workhorse. In this paper, we propose a simple and effective model called #GNN, which balances the trade-off between accuracy and efficiency. #GNN is able to efficiently acquire node representations in the Hamming space for link prediction by exploiting the randomized hashing technique to implement message passing and capture high-order proximity in the GNN framework. Furthermore, we characterize the discriminative power of #GNN in probability. The extensive experimental results demonstrate that the proposed #GNN algorithm achieves accuracy comparable to the learning-based algorithms and outperforms the randomized algorithm, while running significantly faster than the learning-based algorithms. Also, the proposed algorithm shows excellent scalability on a large-scale network with limited resources.

Multi-view Graph Contrastive Representation Learning for Drug-Drug Interaction Prediction

Potential Drug-Drug Interactions (DDI) occur while treating complex or co-existing diseases with drug combinations, which may cause changes in drugs’ pharmacological activity. Therefore, DDI prediction has been an important task in the medical health machine learning community. Graph-based learning methods have recently aroused widespread interest and have proved to be well suited to this task. However, these methods are often limited to exploiting the inter-view drug molecular structure and ignore the drug’s intra-view interaction relationship, which is vital to capturing the complex DDI patterns. This study presents a new method, multi-view graph contrastive representation learning for drug-drug interaction prediction, MIRACLE for brevity, to capture inter-view molecule structure and intra-view interactions between molecules simultaneously. MIRACLE treats a DDI network as a multi-view graph where each node in the interaction graph itself is a drug molecular graph instance. We use GCN to encode DDI relationships and a bond-aware attentive message propagating method to capture drug molecular structure information in the MIRACLE learning stage. Also, we propose a novel unsupervised contrastive learning component to balance and integrate the multi-view information. Comprehensive experiments on multiple real datasets show that MIRACLE outperforms the state-of-the-art DDI prediction models consistently.

Multi-level Hyperedge Distillation for Social Linking Prediction on Sparsely Observed Networks

Social linking prediction is one of the most fundamental problems in online social networks and has attracted researchers’ persistent attention. Most of the existing works predict unobserved links using graph neural networks (GNNs) to learn node embeddings upon pair-wise relations. Despite promising results given enough observed links, these models still struggle to achieve strong performance when observed links are extremely limited. The main reason is that they only focus on the smoothness of node representations on pair-wise relations. Unfortunately, this assumption may fail when the networks do not have enough observed links to support it. To this end, we go beyond pair-wise relations and propose a novel framework using hypergraph neural networks with multi-level hyperedge distillation strategies. To break through the limitations of sparsely observed links, we introduce the hypergraph to uncover higher-level relations, which are crucial for deducing unobserved links. A hypergraph allows one edge to connect multiple nodes, making it easier to learn higher-level relations for link prediction. To overcome the restrictions of manually designed hypergraphs, which are fixed in most hypergraph research, we propose a new method to automatically learn high-quality hyperedges using three novel hyperedge distillation strategies. The generated hyperedges are hierarchical and follow the power-law distribution, which can significantly improve the link prediction performance. To predict unobserved links, we present a novel hypergraph neural network named HNN. HNN takes the multi-level hypergraphs as input and makes the node embeddings smooth on hyperedges instead of pair-wise links only. Extensive evaluations on four real-world datasets demonstrate our model’s superior performance over state-of-the-art baselines, especially when very few links are observed.
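To make the hypergraph notion in the abstract above concrete, the following minimal Python sketch stores each hyperedge as a set of nodes and applies one round of mean aggregation from nodes to hyperedges and back, which pulls the features of nodes sharing a hyperedge toward each other. It is an illustration only, not the HNN model or its hyperedge distillation strategies; the function name `hyperedge_smooth` and the toy data are assumptions.

```python
def hyperedge_smooth(features, hyperedges):
    """One round of node -> hyperedge -> node mean aggregation.

    features:   dict mapping node -> list of floats (all the same length)
    hyperedges: list of sets of nodes; one hyperedge may join many nodes
    """
    dim = len(next(iter(features.values())))

    # Each hyperedge is represented by the mean of its member nodes' features.
    edge_repr = [
        [sum(features[n][d] for n in members) / len(members) for d in range(dim)]
        for members in hyperedges
    ]

    smoothed = {}
    for node, vec in features.items():
        incident = [edge_repr[e] for e, members in enumerate(hyperedges) if node in members]
        if not incident:
            # Nodes outside every hyperedge keep their original features.
            smoothed[node] = list(vec)
        else:
            smoothed[node] = [sum(r[d] for r in incident) / len(incident) for d in range(dim)]
    return smoothed


# Toy data (hypothetical): one hyperedge joins three users at once.
features = {"a": [0.0], "b": [3.0], "c": [6.0], "d": [10.0]}
hyperedges = [{"a", "b", "c"}]
smoothed = hyperedge_smooth(features, hyperedges)
```

A pair-wise graph would need three separate edges to express the same group of three users; the single hyperedge captures the group relation directly.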

Self-Supervised Learning of Contextual Embeddings for Link Prediction in Heterogeneous Networks

Representation learning methods for heterogeneous networks produce a low-dimensional vector embedding (that is typically fixed for all tasks) for each node. Many of the existing methods focus on obtaining a static vector representation for a node in a way that is agnostic to the downstream application where it is being used. In practice, however, downstream tasks such as link prediction require specific contextual information that can be extracted from the subgraphs related to the nodes provided as input to the task. To tackle this challenge, we develop a framework for bridging static representation learning methods, which use global information from the entire graph, with localized attention-driven mechanisms to learn contextual node representations. We first pre-train our model in a self-supervised manner by introducing higher-order semantic associations and masking nodes, and then fine-tune our model for a specific link prediction task. Instead of training node representations by aggregating information from all semantic neighbors connected via metapaths, we automatically learn the composition of different metapaths that characterize the context for a specific task without the need for any pre-defined metapaths. The framework significantly outperforms both static and contextual embedding learning methods on several publicly available benchmark network datasets. We also demonstrate the interpretability, the effectiveness of contextual learning, and the scalability of the framework through extensive evaluation.

Community Value Prediction in Social E-commerce

The phenomenal success of the newly-emerging social e-commerce has demonstrated that utilizing social relations is becoming a promising approach to promote e-commerce platforms. In this new scenario, one of the most important problems is to predict the value of a community formed by closely connected users in social networks due to its tremendous business value. However, few works have addressed this problem because of 1) its novel setting and 2) its challenging nature, in that the structure of a community has complex effects on its value. To bridge this gap, we develop a Multi-scale Structure-aware Community value prediction network (MSC) that jointly models the structural information of different scales, including peer relations, community structure, and inter-community connections, to predict the value of given communities. Specifically, we first propose a Masked Edge Learning Graph Convolutional Network (MEL-GCN) based on a novel masked propagation mechanism to model peer influence. Then, we design a Pair-wise Community Pooling (PCPool) module to capture critical community structures. Finally, we model the inter-community connections by distinguishing intra-community edges from inter-community edges and employing a Multi-aggregator Framework (MAF). Extensive experiments on a large-scale real-world social e-commerce dataset demonstrate our method’s superior performance over state-of-the-art baselines, with a relative performance gain of 11.40%, 10.01%, and 10.97% in MAE, RMSE, and NRMSE, respectively. A further ablation study shows the effectiveness of our designed components. Our code and dataset are available.

SESSION: Session: Recommendations

RetaGNN: Relational Temporal Attentive Graph Neural Networks for Holistic Sequential Recommendation

Sequential recommendation (SR) aims to accurately recommend a list of items to a user based on the items she has recently accessed. As new users continuously arrive in the real world, one crucial task is to have inductive SR that can produce embeddings of users and items without re-training. Given that user-item interactions can be extremely sparse, another critical task is to have transferable SR that can transfer the knowledge derived from one domain with rich data to another domain. In this work, we aim to present holistic SR that simultaneously accommodates conventional, inductive, and transferable settings. We propose a novel deep learning-based model, Relational Temporal Attentive Graph Neural Networks (RetaGNN), for holistic SR. The main idea of RetaGNN is three-fold. First, to have inductive and transferable capabilities, we train a relational attentive GNN on the local subgraph extracted from a user-item pair, in which the learnable weight matrices are on various relations among users, items, and attributes, rather than nodes or edges. Second, long-term and short-term temporal patterns of user preferences are encoded by a proposed sequential self-attention mechanism. Third, a relation-aware regularization term is devised for better training of RetaGNN. Experiments conducted on MovieLens, Instagram, and Book-Crossing datasets show that RetaGNN can outperform state-of-the-art methods under conventional, inductive, and transferable settings. The derived attention weights also bring model explainability.

Disentangling User Interest and Conformity for Recommendation with Causal Embedding

Recommendation models are usually trained on observational interaction data. However, observational interaction data could result from users’ conformity towards popular items, which entangles users’ real interest. Existing methods treat this problem as eliminating popularity bias, e.g., by re-weighting training samples or leveraging a small fraction of unbiased data. However, the variety of user conformity is ignored by these approaches, and different causes of an interaction are bundled together as unified representations; hence robustness and interpretability are not guaranteed when the underlying causes change. In this paper, we present DICE, a general framework that learns representations where interest and conformity are structurally disentangled, and into which various backbone recommendation models can be smoothly integrated. We assign users and items separate embeddings for interest and conformity, and make each embedding capture only one cause by training with cause-specific data, which is obtained according to the colliding effect of causal inference. Our proposed methodology outperforms state-of-the-art baselines with remarkable improvements on two real-world datasets on top of various backbone models. We further demonstrate that the learned embeddings successfully capture the desired causes, and show that DICE guarantees the robustness and interpretability of recommendation.

Future-Aware Diverse Trends Framework for Recommendation

In recommender systems, modeling user-item behaviors is essential for user representation learning. Existing sequential recommenders consider the sequential correlations between historically interacted items for capturing users’ historical preferences. However, since users’ preferences are by nature time-evolving and diversified, solely modeling the historical preference (without being aware of the time-evolving trends of preferences) can be inferior for recommending complementary or fresh items and thus hurt the effectiveness of recommender systems. In this paper, we bridge the gap between the past preference and potential future preference by proposing the future-aware diverse trends (FAT) framework. By future-aware, we mean that for each inspected user, we construct future sequences from other similar users, which consist of behaviors that happen after the last behavior of the inspected user, based on a proposed neighbor behavior extractor. By diverse trends, we mean that, supposing the future preferences can be diversified, we propose the diverse trends extractor and the time-aware mechanism to represent the possible trends of preferences for a given user with multiple vectors. We leverage both the representations of historical preference and possible future trends to obtain the final recommendation. The quantitative and qualitative results from relatively extensive experiments on real-world datasets demonstrate that the proposed framework not only outperforms the state-of-the-art sequential recommendation methods across various metrics, but also makes complementary and fresh recommendations.

Graph Embedding for Recommendation against Attribute Inference Attacks

In recent years, recommender systems have played a pivotal role in helping users identify the most suitable items that satisfy personal preferences. As user-item interactions can be naturally modelled as graph-structured data, variants of graph convolutional networks (GCNs) have become a well-established building block in the latest recommenders. Due to the wide utilization of sensitive user profile data, existing recommendation paradigms are likely to expose users to the threat of privacy breach, and GCN-based recommenders are no exception. Apart from the leakage of raw user data, the fragility of current recommenders under inference attacks offers malicious attackers a backdoor to estimate users’ private attributes via their behavioral footprints and the recommendation results. However, little attention has been paid to developing recommender systems that can defend against such attribute inference attacks, and existing works achieve attack resistance by either sacrificing considerable recommendation accuracy or only covering specific attack models or protected information. In our paper, we propose GERAI, a novel differentially private graph convolutional network to address such limitations. Specifically, in GERAI, we bind the information perturbation mechanism in differential privacy with the recommendation capability of graph convolutional networks. Furthermore, based on local differential privacy and the functional mechanism, we devise a dual-stage encryption paradigm to simultaneously enforce privacy guarantees on users’ sensitive features and the model optimization process. Extensive experiments show the superiority of GERAI in terms of its resistance to attribute inference attacks and recommendation effectiveness.

AutoDim: Field-aware Embedding Dimension Search in Recommender Systems

Practical large-scale recommender systems usually contain thousands of feature fields from users, items, contextual information, and their interactions. Most of them empirically allocate a unified dimension to all feature fields, which is memory inefficient. Thus it is highly desirable to assign various embedding dimensions to different feature fields according to their importance and predictability. Due to the large number of feature fields and the nuanced relationship among embedding dimensions, feature distributions, and neural network architectures, manually allocating embedding dimensions in practical recommender systems can be challenging. To this end, we propose an AutoML-based framework (AutoDim) in this paper, which can automatically select dimensions for different feature fields in a data-driven fashion. Specifically, we first propose an end-to-end differentiable framework that can calculate the weights over various dimensions in a soft and continuous manner for feature fields, and an AutoML-based optimization algorithm; then, we derive a hard and discrete embedding component architecture according to the maximal weights and retrain the whole recommender framework. We conduct extensive experiments on benchmark datasets to validate the effectiveness of AutoDim.
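The soft-then-hard selection step described in the abstract above can be sketched in a few lines of Python: a softmax turns per-field scores into continuous weights over candidate dimensions, and the dimension with the maximal weight is then chosen for each field. This is a minimal sketch under stated assumptions, not the AutoDim implementation; the function names, the candidate sizes, and the scores are hypothetical, and in a real system the scores would be learned jointly with the recommender rather than given.

```python
import math


def softmax(logits):
    """Numerically stable softmax: soft, continuous weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_dimensions(field_logits, candidate_dims):
    """Pick, for each feature field, the candidate dimension with the maximal weight."""
    chosen = {}
    for field, logits in field_logits.items():
        weights = softmax(logits)
        best = max(range(len(weights)), key=weights.__getitem__)
        chosen[field] = candidate_dims[best]
    return chosen


# Hypothetical candidate embedding sizes and per-field scores.
candidate_dims = [2, 8, 16]
field_logits = {
    "user_id": [0.1, 2.0, 0.3],
    "gender": [1.5, 0.2, 0.1],
}
chosen = select_dimensions(field_logits, candidate_dims)
```

Keeping the weights continuous during the search is what makes the first stage differentiable; the discrete choice is only made once, before retraining.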

SESSION: Session: Topic Modeling

Verdi: Quality Estimation and Error Detection for Bilingual Corpora

Translation Quality Estimation is critical to reducing post-editing efforts in machine translation and to cross-lingual corpus cleaning. As a research problem, quality estimation (QE) aims to directly estimate the quality of translation in a given pair of source and target sentences, and to highlight the words that need corrections, without reference to gold-standard translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors to enable diverse features to be extracted from a pair of sentences for subsequent quality estimation, including a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM). We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles a primal task and a dual task simultaneously with weight sharing, leading to stronger context prediction ability than single-direction NMT models. By taking advantage of the dual learning scheme, we further design a novel feature to directly encode the translated target information without relying on the source context. Extensive experiments conducted on WMT20 QE tasks demonstrate that our method beats the winner of the competition and outperforms other baseline methods by a great margin. We further use the sentence-level scores provided by Verdi to clean a parallel corpus and observe benefits on both model performance and training efficiency.

Crosslingual Topic Modeling with WikiPDA

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics. It leverages the fact that Wikipedia articles link to each other and are mapped to concepts in the Wikidata knowledge base, such that, when represented as bags of links, articles are inherently language-independent. WikiPDA works in two steps, by first densifying bags of links using matrix completion and then training a standard monolingual topic model. A human evaluation shows that WikiPDA produces more coherent topics than monolingual text-based latent Dirichlet allocation (LDA), thus offering crosslinguality at no cost. We demonstrate WikiPDA’s utility in two applications: a study of topical biases in 28 Wikipedia language editions, and crosslingual supervised document classification. Finally, we highlight WikiPDA’s capacity for zero-shot language transfer, where a model is reused for new languages without any fine-tuning. Researchers can benefit from WikiPDA as a practical tool for studying Wikipedia’s content across its 299 language editions in interpretable ways, via an easy-to-use library publicly available at https://github.com/epfl-dlab/WikiPDA.

Keyword-aware Abstractive Summarization by Extracting Set-level Intermediate Summaries

Abstractive summarization is useful in providing a summary or a digest of news or other web texts and enhancing users’ reading experience, especially when they are reading on small displays such as mobile phones. However, existing encoder-decoder summarization models have difficulty learning the latent alignment between source documents and summaries because of their vast disparity in length. In this paper, we propose an extractor-abstractor framework in which the keyword-based extractor selects a few sets of salient sentences from the input document and then the abstractor paraphrases these sets of sentences, which are more aligned to the summary, in parallel to generate the final summary. The new extractor and abstractor are pretrained from a set of “pseudo summaries” extracted by specially designed heuristics, and then further trained together in a reinforcement learning framework. The results show that the proposed model generates high-quality summaries with faster training speed and a smaller training memory footprint, and outperforms the state-of-the-art models on CNN/Daily Mail, Webis-TLDR-17, Webis-Snippet-20, WikiHow and DUC-2002 datasets.

Graph Topic Neural Network for Document Representation

Graph Neural Networks (GNNs) such as GCN can effectively learn document representations via the semantic relation graph among documents and words. However, with a few exceptions, most of the previous work in this line of research does not consider the underlying topical semantics inherent in document contents and the relation graph, making the representations less effective and hard to interpret. In a few recent studies trying to incorporate latent topics into GNNs, the topics have been learned independently from the relation graph modeling. Intuitively, topic extraction can benefit much from the information propagation of the relation graph structure: directly and indirectly connected documents and words have similar topics. In this paper, we propose a novel Graph Topic Neural Network (GTNN) model to mine latent topic semantics for interpretable document representation learning, taking into account the document-document, document-word, and word-word relationships in the graph. We also show that our model can be viewed as semi-amortized inference for a relational topic model based on the Poisson distribution, with high-order correlations. We test our model in several settings: unsupervised, semi-supervised, and supervised representation learning, for both connected and unconnected documents. In all the cases, our model outperforms the state-of-the-art models for these tasks.

Insightful Dimensionality Reduction with Very Low Rank Variable Subsets

Dimensionality reduction techniques can be employed to produce robust, cost-effective predictive models, and to enhance interpretability in exploratory data analysis. However, the models produced by many of these methods are formulated in terms of abstract factors or are too high-dimensional to facilitate insight and fit within low computational budgets.

In this paper we explore an alternative approach to interpretable dimensionality reduction. Given a data matrix, we study the following question: are there subsets of variables that can be primarily explained by a single factor?

We formulate this challenge as the problem of finding submatrices close to rank one. Despite its potential, this topic has not been sufficiently addressed in the literature, and there exist virtually no algorithms for this purpose that are simultaneously effective, efficient and scalable.

We formalize the task as two problems which we characterize in terms of computational complexity, and propose efficient, scalable algorithms with approximation guarantees. Our experiments demonstrate how our approach can produce insightful findings in data, and show our algorithms to be superior to strong baselines.
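One simple way to quantify how close a subset of variables is to rank one, in the sense of the abstract above, is the share of the submatrix's squared Frobenius norm captured by its largest singular value. The following Python sketch estimates that share with power iteration on the Gram matrix. It is not the paper's algorithm and carries none of its approximation guarantees; the function name `rank_one_share`, the iteration count, and the toy matrix are assumptions.

```python
import math
import random


def rank_one_share(matrix, columns, iterations=200, seed=0):
    """Estimate sigma_1^2 / ||A||_F^2 for the submatrix A made of the given columns.

    A value near 1.0 suggests the selected variables are largely explained
    by a single factor. The estimate is a lower bound on the true share.
    """
    sub = [[row[c] for c in columns] for row in matrix]
    k = len(columns)
    # Gram matrix G = A^T A; its trace equals the squared Frobenius norm of A.
    gram = [[sum(row[i] * row[j] for row in sub) for j in range(k)] for i in range(k)]
    total = sum(gram[i][i] for i in range(k))
    if total == 0:
        return 0.0  # all-zero submatrix: nothing to explain

    # Random start so the vector is almost surely not orthogonal to the top direction.
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(k)]
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    # Rayleigh quotient of the unit vector v approximates the top eigenvalue of G.
    length_sq = sum(x * x for x in v)
    top = sum(v[i] * sum(gram[i][j] * v[j] for j in range(k)) for i in range(k)) / length_sq
    return top / total


# Toy data (hypothetical): columns 0 and 1 are proportional, column 2 is not.
data = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 5.0],
]
share_pair = rank_one_share(data, [0, 1])
share_all = rank_one_share(data, [0, 1, 2])
```

Here the pair of proportional columns scores essentially 1.0, while adding the third column lowers the share. Searching over all column subsets this way would be exponential, which is why efficient algorithms for the problem are of interest.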

SESSION: Session: Mobile and Ubiquitous Computing

SDFVAE: Static and Dynamic Factorized VAE for Anomaly Detection of Multivariate CDN KPIs

Content Delivery Networks (CDNs) are critical for providing good user experience of cloud services. CDN providers typically collect various multivariate Key Performance Indicators (KPIs) time series to monitor and diagnose system performance. State-of-the-art anomaly detection methods mostly use deep learning to extract the normal patterns of data, due to its superior performance. However, KPI data usually exhibit non-additive Gaussian noise, which makes it difficult for deep learning models to learn the normal patterns, resulting in degraded performance in anomaly detection. In this paper, we propose a robust and noise-resilient anomaly detection mechanism using multivariate KPIs. Our key insight is that different KPIs are constrained by certain time-invariant characteristics of the underlying system, and that explicitly modelling such invariance may help resist noise in the data. We thus propose a novel anomaly detection method called SDFVAE, short for Static and Dynamic Factorized VAE, that learns the representations of KPIs by explicitly factorizing the latent variables into dynamic and static parts. Extensive experiments using real-world data show that SDFVAE achieves an F1-score ranging from 0.92 to 0.99 on both regular and noisy datasets, outperforming state-of-the-art methods by a large margin.

MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments

With the advantages of flexible scalability and fast delivery, microservice has become a popular software architecture in the modern IT industry. However, the explosion in the number of service instances and complex dependencies make the troubleshooting extremely challenging in microservice environments. To help understand and troubleshoot a microservice system, the end-to-end tracing technology has been widely applied to capture the execution path of each request. Nevertheless, the tracing data are not fully leveraged by cloud and application providers when conducting latency issue localization in the microservice environment.

This paper proposes a novel system, named MicroRank, which analyzes clues provided by normal and abnormal traces to locate root causes of latency issues. Once a latency issue is detected by the Anomaly Detector in MicroRank, the cause localization procedure is triggered. MicroRank first distinguishes which traces are abnormal. Then, MicroRank’s PageRank Scorer module takes the abnormal and normal trace information as its input and differentiates the importance of different traces for the extended spectrum techniques. Finally, the spectrum techniques calculate the ranking list based on the weighted spectrum information from the PageRank Scorer to locate root causes more effectively. The experimental evaluations on a widely-used open-source system and a production system show that MicroRank achieves excellent results not only when there is a single root cause but also when two issues happen at the same time. Moreover, MicroRank achieves a 6% to 22% improvement in recall in localizing root causes compared to current state-of-the-art methods.
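The spectrum step can be pictured with a minimal, unweighted sketch. It assumes each trace is reduced to the set of services it touched plus an abnormal flag, and it uses the Ochiai formula, a common choice in spectrum-based fault localization. The PageRank-derived trace weights described in the abstract are not modelled here, and the names `ochiai_scores` and `traces` are illustrative only.

```python
import math


def ochiai_scores(traces):
    """Score services by how strongly they co-occur with abnormal traces.

    traces: list of (set_of_service_names, is_abnormal) pairs (hypothetical format).
    Ochiai score = ef / sqrt(total_abnormal * (ef + ep)), where ef / ep count the
    abnormal / normal traces that pass through the service.
    """
    total_abnormal = sum(1 for _, abnormal in traces if abnormal)
    services = set().union(*(s for s, _ in traces)) if traces else set()
    scores = {}
    for svc in services:
        ef = sum(1 for s, abnormal in traces if abnormal and svc in s)
        ep = sum(1 for s, abnormal in traces if not abnormal and svc in s)
        denom = math.sqrt(total_abnormal * (ef + ep))
        scores[svc] = ef / denom if denom > 0 else 0.0
    return scores


traces = [
    ({"gateway", "cart", "db"}, True),
    ({"gateway", "cart"}, False),
    ({"gateway", "db"}, True),
    ({"gateway", "search"}, False),
]
scores = ochiai_scores(traces)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy input `db` appears in every abnormal trace and in no normal one, so it ranks first; a weighted variant would replace the raw counts with sums of per-trace weights.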

Outlier-Resilient Web Service QoS Prediction

The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar service candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services, and it has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values due to high time cost and huge resource overhead. Thus, it is essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few of them have taken outliers into consideration, which may dramatically degrade the prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method in this paper. Our method utilizes Cauchy loss to measure the discrepancy between the observed QoS values and the predicted ones. Owing to the robustness of Cauchy loss, our method is resilient to outliers. We further extend our method to provide time-aware QoS prediction results by taking the temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method is able to achieve better performance than state-of-the-art baseline methods.
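To see why a Cauchy loss resists outliers, the sketch below compares it with the squared loss on a few residuals. It uses one common form of the Cauchy loss, ln(1 + (r/γ)²); the scale parameter `gamma` and the function names are assumptions for illustration, and the paper's exact formulation may differ.

```python
import math


def cauchy_loss(residual, gamma=1.0):
    """Cauchy loss ln(1 + (r / gamma)^2): grows only logarithmically for large residuals."""
    return math.log1p((residual / gamma) ** 2)


def squared_loss(residual):
    """Ordinary squared loss, shown for comparison; grows quadratically."""
    return residual ** 2


# A residual of 100 (an outlier) dominates the squared loss far more than the Cauchy loss.
for r in (0.1, 1.0, 100.0):
    print(r, squared_loss(r), cauchy_loss(r))
```

Because the penalty for a large residual is so much smaller, a single extreme QoS observation pulls the fitted model far less than it would under a squared loss.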

Autodidactic Neurosurgeon: Collaborative Deep Inference for Mobile Edge Intelligence via Online Learning

Recent breakthroughs in deep learning (DL) have led to the emergence of many intelligent mobile applications and services, but they also pose unprecedented computing challenges on resource-constrained mobile devices. This paper builds a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server, aiming at joining the power of both on-device processing and computation offloading. The basic idea of this system is to partition a deep neural network (DNN) into a front-end part running on the mobile device and a back-end part running on the edge server, with the key challenge being how to locate the optimal partition point to minimize the end-to-end inference delay. Unlike existing efforts on DNN partitioning that rely heavily on a dedicated offline profiling stage to search for the optimal partition point, our system has a built-in online learning module, called Autodidactic Neurosurgeon (ANS), to automatically learn the optimal partition point on-the-fly. Therefore, ANS is able to closely follow the changes of the system environment by generating new knowledge for adaptive decision making. The core of ANS is a novel contextual bandit learning algorithm, called μLinUCB, which not only has a provable theoretical learning performance guarantee but also is ultra-lightweight for easy real-world implementation. We implement our system on a video stream object detection testbed to validate the design of ANS and evaluate its performance. The experiments show that ANS significantly outperforms state-of-the-art benchmarks in terms of tracking system changes and reducing the end-to-end inference delay.
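The partition-point objective can be made concrete with a small sketch. It assumes per-layer latencies and output sizes are already known, which is exactly the offline-profiling setting the abstract contrasts with; the online μLinUCB learner is not shown. All names and numbers below are hypothetical.

```python
def best_partition(device_ms, edge_ms, output_kb, input_kb, bandwidth_kb_per_ms):
    """Return (split_index, delay_ms) minimising end-to-end inference delay.

    Split p means layers [0, p) run on the device and layers [p, n) on the edge.
    p == 0 uploads the raw input; p == n runs everything on the device (no upload).
    """
    n = len(device_ms)
    best = None
    for p in range(n + 1):
        if p == n:
            transfer = 0.0
        else:
            payload = input_kb if p == 0 else output_kb[p - 1]
            transfer = payload / bandwidth_kb_per_ms
        delay = sum(device_ms[:p]) + transfer + sum(edge_ms[p:])
        if best is None or delay < best[1]:
            best = (p, delay)
    return best


device_ms = [10, 20, 40]   # per-layer latency on the mobile device
edge_ms = [1, 2, 4]        # per-layer latency on the edge server
output_kb = [500, 50, 5]   # size of each layer's output
slow_link = best_partition(device_ms, edge_ms, output_kb, 300, 5.0)
fast_link = best_partition(device_ms, edge_ms, output_kb, 300, 10.0)
```

With these made-up numbers the best split moves from after the second layer on the slower link to full offloading on the faster one, which illustrates why a fixed, profiled partition point can go stale when network conditions change.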

Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding

Change Point Detection (CPD) methods identify the times associated with changes in the trends and properties of time series data in order to describe the underlying behaviour of the system. For instance, detecting the changes and anomalies associated with web service usage, application usage or human behaviour can provide valuable insights for downstream modelling tasks. We propose a novel self-supervised Time Series Change Point detection method based on Contrastive Predictive Coding (TS-CP²). TS-CP² is the first approach to employ a contrastive learning strategy for CPD by learning an embedded representation that separates pairs of embeddings of time-adjacent intervals from pairs of interval embeddings separated across time. Through extensive experiments on three diverse, widely used time series datasets, we demonstrate that our method outperforms five state-of-the-art CPD methods, which include unsupervised and semi-supervised approaches. TS-CP² is shown to improve on methods that use either handcrafted statistical or temporal features by 79.4% and on deep learning-based methods by 17.0% with respect to the F1-score averaged across the three datasets.

SESSION: Session: Graph Neural Networks

REST: Reciprocal Framework for Spatiotemporal-coupled Predictions

In recent years, Graph Convolutional Networks (GCNs) have been applied to benefit spatiotemporal predictions. Current approaches to spatiotemporal prediction often rely heavily on the quality of handcrafted, fixed graph structures; however, we argue that such a paradigm could be expensive and sub-optimal in many applications. To raise the bar, this paper proposes to jointly mine the spatial dependencies and model temporal patterns in a coupled framework, i.e., to make spatiotemporal-coupled predictions. We come up with a novel Reciprocal SpatioTemporal (REST) framework, which introduces Edge Inference Networks (EINs) to couple with GCNs. From the temporal side to the spatial side, EINs infer spatial dependencies among time series vertices and generate multi-modal directed weighted graphs to serve GCNs. And from the spatial side to the temporal side, GCNs utilize these spatial dependencies to make predictions and then introduce feedback to optimize EINs. The REST framework is incrementally trained for higher performance of spatiotemporal prediction, powered by the reciprocity between its two components in such an iterative joint learning process. Additionally, to maximize the power of the REST framework, we design a phased heuristic approach, which effectively stabilizes the training procedure and prevents early stopping. Extensive experiments on two real-world datasets have demonstrated that the proposed REST framework significantly outperforms baselines, and can learn meaningful spatial dependencies beyond predefined graphical structures.

Predicting Customer Value with Social Relationships via Motif-based Graph Attention Networks

Customer value is essential for successful customer relationship management. Although growing evidence suggests that customers’ purchase decisions can be influenced by social relationships, social influence is largely overlooked in previous research. In this work, we fill this gap with a novel framework — Motif-based Multi-view Graph Attention Networks with Gated Fusion (MAG), which jointly considers customer demographics, past behaviors, and social network structures. Specifically, (1) to make the best use of higher-order information in complex social networks, we design a motif-based multi-view graph attention module, which explicitly captures different higher-order structures, along with the attention mechanism auto-assigning high weights to informative ones. (2) To model the complex effects of customer attributes and social influence, we propose a gated fusion module with two gates: one depicts the susceptibility to social influence and the other depicts the dependency of the two factors. Extensive experiments on two large-scale datasets show superior performance of our model over the state-of-the-art baselines. Further, we discover that the increase of motifs does not guarantee better performances and identify how motifs play different roles. These findings shed light on how to understand socio-economic relationships among customers and find high-value customers.

HINTS: Citation Time Series Prediction for New Publications via Dynamic Heterogeneous Information Network Embedding

Accurate prediction of scientific impact is important for scientists, academic recommender systems, and granting organizations alike. Existing approaches rely on many years of leading citation values to predict a scientific paper’s citations (a proxy for impact), even though most papers make their largest contributions in the first few years after they are published. In this paper, we tackle a new problem: predicting a new paper’s citation time series from the date of publication (i.e., without leading values). We propose HINTS, a novel end-to-end deep learning framework that converts citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a formal model that can predict citation counts immediately after publication. Empirical analysis on two real-world datasets from Computer Science and Physics shows that HINTS is competitive with baseline citation prediction models. While we focus on citations, our approach generalizes to other “cold start” time series prediction tasks where relational data is available and accurate prediction in early timestamps is crucial.

Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection

Graph-based fraud detection approaches have attracted much attention recently due to the abundant relational information of graph-structured data, which may be beneficial for the detection of fraudsters. However, GNN-based algorithms can fare poorly when the label distribution of nodes is heavily skewed, which is common in sensitive areas such as financial fraud. To remedy the class imbalance problem of graph-based fraud detection, we propose a Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs. First, nodes and edges are picked with a devised label-balanced sampler to construct sub-graphs for mini-batch training. Next, for each node in the sub-graph, the neighbor candidates are chosen by a proposed neighborhood sampler. Finally, information from the selected neighbors and different relations is aggregated to obtain the final representation of a target node. Experiments on both benchmark and real-world graph-based fraud detection tasks demonstrate that PC-GNN clearly outperforms state-of-the-art baselines.
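The "pick" step can be sketched in its simplest form as inverse-frequency sampling, so that each class receives the same total sampling mass regardless of how rare it is. This is a simplification for illustration: the sampler in the paper may also use graph structure, and the function names here are hypothetical.

```python
import random
from collections import Counter


def balanced_weights(labels):
    """Weight each node by 1 / (frequency of its label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def sample_nodes(labels, k, seed=0):
    """Draw k node indices (with replacement) using the label-balanced weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=balanced_weights(labels), k=k)


# 95 benign nodes (label 0) and 5 fraud nodes (label 1).
labels = [0] * 95 + [1] * 5
weights = balanced_weights(labels)
batch = sample_nodes(labels, k=10)
```

Under these weights a fraud node is 19 times more likely to be drawn than a benign node, so mini-batches contain both classes in roughly equal proportion in expectation.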

Rumor Detection with Field of Linear and Non-Linear Propagation

The propagation of rumors is a complex and varied phenomenon. In the process of rumor dissemination, in addition to rumor claims, there will be abundant social context information surrounding the rumor. Therefore, it is vital to learn the characteristics of rumors in terms of both the linear temporal sequence and the non-linear diffusion structure simultaneously. However, in some existing research, time-dependent and diffusion-related information has not been fully utilized. Accordingly, in this paper, we propose a novel model, Rumor Detection with Field of Linear and Non-Linear Propagation (RDLNP), to automatically detect rumors from the above two fields by taking advantage of claim content, social context and temporal information. First, the Rumor Hybrid Feature Learning (RHFL) module we designed can extract the correlations between the claims and temporal information, differentiate the hybrid features of specific posts, and generate unified representations for rumors. Second, we propose Non-Linear Structure Learning (NLSL) and Linear Sequence Learning (LSL) to integrate contextual features along the path of the diffusion structure and the temporal engagement variation of responses, respectively. Finally, Shared Feature Learning (SFL) models the representation reinforcement and learns the mutual influence between NLSL and LSL, and then highlights their valuable features. Experiments conducted on two public and widely used datasets, i.e., PHEME and RumorEval, demonstrate both the effectiveness and the outstanding performance of the proposed approach.

SESSION: Session: Personalization

Situation and Behavior Understanding by Trope Detection on Films

The human ability of deep cognitive skills is crucial for the development of various real-world applications that process diverse and abundant user-generated input. While recent progress in deep learning and natural language processing has enabled learning systems to reach human performance on some benchmarks requiring shallow semantics, such human ability still remains challenging for even modern contextual embedding models, as pointed out by many recent studies [9, 10, 22, 24, 32]. Existing machine comprehension datasets assume sentence-level input, lack causal or motivational inferences, or can be answered with question-answer bias. Here, we present a challenging novel task, trope detection on films, in an effort to create a situation and behavior understanding for machines. Tropes are frequently used storytelling devices for creative works. Compared to existing movie tag prediction tasks, tropes are more sophisticated as they can vary widely, from a moral concept to a series of circumstances, and are embedded with motivations and cause-and-effects. We introduce a new dataset, Tropes in Movie Synopses (TiMoS), with 5623 movie synopses and 95 different tropes collected from a Wikipedia-style database, TVTropes. We present a multi-stream comprehension network (MulCom) leveraging multi-level attention of words, sentences, and role relations. Experimental results demonstrate that modern models including BERT contextual embedding, movie tag prediction systems, and relational networks reach at most 37% of human performance (23.97/64.87) in terms of F1 score. Our MulCom outperforms all modern baselines, by 1.5 to 5.0 F1 score and 1.5 to 3.0 mean of average precision (mAP) score. We also provide a detailed analysis and human evaluation to pave the way for future research.

A Novel Macro-Micro Fusion Network for User Representation Learning on Mobile Apps

The evolution of mobile apps has greatly changed the way that we live. It becomes increasingly important to understand and model the users on mobile apps. Instead of focusing on some specific app alone, it has become a popular paradigm to study the user behavior on various mobile apps in a symbiotic environment.

In this paper, we study the task of user representation learning with both macro and micro interaction data on mobile apps. Specifically, macro and micro interaction refer to user-app interaction and user-item interaction on some specific app, respectively. By combining the two kinds of user data, we expect to derive a more comprehensive, robust user representation model on mobile apps. In order to effectively fuse the information across the macro and micro views, we propose a novel macro-micro fusion network for user representation learning on mobile apps. With a Transformer architecture as the base model, we design a representation fusion component that is able to capture the category-based semantic alignment at the user level. After such semantic alignment, the information across the two views can be adaptively fused in our approach. Furthermore, we adopt mutual information maximization to derive a self-supervised loss to enhance the learning of our fusion network. Extensive experiments with three downstream tasks on two real-world datasets have demonstrated the effectiveness of our approach.

Where To Next? A Dynamic Model of User Preferences

We consider the problem of predicting users’ preferences on online platforms. We build on recent findings suggesting that users’ preferences change over time, and that helping users expand their horizons is important in ensuring that they stay engaged. Most existing models of user preferences attempt to capture simultaneous preferences: “Users who like A tend to like B as well”. In this paper, we argue that these models fail to anticipate changing preferences. To overcome this issue, we seek to understand the structure that underlies the evolution of user preferences. To this end, we propose the Preference Transition Model (PTM), a dynamic model for user preferences towards classes of items. The model enables the estimation of transition probabilities between classes of items over time, which can be used to estimate how users’ tastes are expected to evolve based on their past history. We test our model’s predictive performance on a number of different prediction tasks on data from three different domains: music streaming, restaurant recommendations and movie recommendations, and find that it outperforms competing approaches. We then focus on a music application, and inspect the structure learned by our model. We find that the PTM uncovers remarkable regularities in users’ preference trajectories over time. We believe that these findings could inform a new generation of dynamic, diversity-enhancing recommender systems.
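As a simplified picture of "transition probabilities between classes of items", the sketch below estimates a first-order transition table by counting consecutive class pairs in user histories and normalising each row. This count-based estimator is only a stand-in for illustration; the PTM described in the abstract is a richer dynamic model, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(histories):
    """Estimate P(next class | current class) from ordered per-user class histories."""
    counts = defaultdict(lambda: defaultdict(int))
    for history in histories:
        for src, dst in zip(history, history[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs


# Each list is one user's sequence of consumed item classes (made-up data).
histories = [
    ["pop", "rock", "jazz"],
    ["pop", "rock", "rock"],
    ["pop", "jazz"],
]
probs = transition_probabilities(histories)
```

Here two of the three observed transitions out of `pop` lead to `rock`, so the estimate for that move is 2/3; a recommender could use such a table to anticipate where a user's tastes are likely to go next rather than only what co-occurs now.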

Density-Ratio Based Personalised Ranking from Implicit Feedback

Learning from implicit user feedback is challenging as we can only observe positive samples but never access negative ones. Most conventional methods cope with this issue by adopting a pairwise ranking approach with negative sampling. However, the pairwise ranking approach has a severe disadvantage in the convergence time owing to the quadratically increasing computational cost with respect to the sample size; it is problematic, particularly for large-scale datasets and complex models such as neural networks. By contrast, a pointwise approach does not directly solve a ranking problem, and is therefore inferior to a pairwise counterpart in top-K ranking tasks; however, it is generally advantageous in regards to the convergence time. This study aims to establish an approach to learn personalised ranking from implicit feedback, which reconciles the training efficiency of the pointwise approach and ranking effectiveness of the pairwise counterpart. The key idea is to estimate the ranking of items in a pointwise manner; we first reformulate the conventional pointwise approach based on density ratio estimation and then incorporate the essence of ranking-oriented approaches (e.g. the pairwise approach) into our formulation. Through experiments on three real-world datasets, we demonstrate that our approach dramatically reduces the convergence time (one to two orders of magnitude faster) and significantly improves the ranking performance.

Itinerary-aware Personalized Deep Matching at Fliggy

Matching items for a user from a travel item pool of large cardinality has been the most important technology for increasing the business at Fliggy, one of the most popular online travel platforms (OTPs) in China. There are three major challenges facing OTPs: sparsity, diversity, and implicitness. In this paper, we present a novel Fliggy ITinerary-aware deep matching NETwork (FitNET) to address these three challenges. FitNET is designed based on the popular deep matching network, which has been successfully employed in many industrial recommendation systems due to its effectiveness. The concept of an itinerary, defined as the list of unconsumed orders of a user, is proposed here for the first time in the context of recommendation systems for OTPs. All orders in a user itinerary are learned as a whole, based on which the implicit travel intention of each user can be more accurately inferred. To alleviate the sparsity problem, users’ profiles are incorporated into FitNET. Meanwhile, a series of itinerary-aware attention mechanisms that capture the vital interactions between a user’s itinerary and other input categories are carefully designed. These mechanisms are very helpful in inferring a user’s travel intention or preference, and in handling the diversity in a user’s needs. Further, two training objectives, i.e., prediction accuracy of the user’s travel intention and prediction accuracy of the user’s click behavior, are utilized by FitNET, so that these two objectives can be optimized simultaneously. An offline experiment on a Fliggy production dataset with over 0.27 million users and 1.55 million travel items, and an online A/B test, both show that FitNET effectively learns users’ travel intentions, preferences, and diverse needs based on their itineraries, and gains superior performance compared with state-of-the-art methods. FitNET has now been successfully deployed at Fliggy, serving major online traffic.

SESSION: Session: Text Classification and Clustering

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from a label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus on only modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution, an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage fully-connected attention to capture the interrelations between them. To leverage the label hierarchy, we propose different ways to regularize the parameters and output probability of each child label by its parents. Extensive experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH over the state-of-the-art deep learning baselines.

Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

Text categorization is an essential task in Web content analysis. Considering the ever-evolving Web data and new emerging categories, instead of the laborious supervised setting, in this paper, we focus on the minimally-supervised setting that aims to categorize documents effectively, with a couple of seed documents annotated per category. We recognize that texts collected from the Web are often structure-rich, i.e., accompanied by various metadata. One can easily organize the corpus into a text-rich network, joining raw text documents with document attributes, high-quality phrases, label surface names as nodes, and their associations as edges. Such a network provides a holistic view of the corpus’ heterogeneous data sources and enables a joint optimization for network-based analysis and deep textual model training. We therefore propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases – a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Each module generates pseudo training labels from the unlabeled document set, and both modules mutually enhance each other by co-training using pooled pseudo labels. We test our model on two real-world datasets. On the challenging e-commerce product categorization dataset with 683 categories, our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%, significantly outperforming all compared methods; our accuracy is only less than 2% away from the supervised BERT model trained on about 50K labeled documents.

Scalable Auto-weighted Discrete Multi-view Clustering

Multi-view clustering, which uses complementary information to improve clustering performance, has been widely studied in machine learning. However, challenges remain when handling large-scale multi-view data due to the traditional approaches’ high time complexity. Besides, the existing approaches suffer from parameter selection. Due to the lack of labeled data, parameter selection in practical clustering applications is difficult, especially in big data. In this paper, we propose a novel approach for large-scale multi-view clustering to overcome the above challenges. Our approach focuses on learning the low-dimensional binary embedding of multi-view data, preserving the samples’ local structure during binary embedding, and optimizing the embedding and clustering in a unified framework. Furthermore, we propose to learn the parameters using a combination of data-driven and heuristic approaches. Experiments on five large-scale multi-view datasets show that the proposed method is superior to the state-of-the-art in terms of clustering quality and running time.

Linguistically-Enriched and Context-Aware Zero-shot Slot Filling

Slot filling is the task of identifying contiguous spans of words in an utterance that correspond to certain parameters (i.e., slots) of a user request/query. Slot filling is one of the most important challenges in modern task-oriented dialog systems. Supervised approaches have proven effective at tackling this challenge, but they need a significant amount of labeled training data in a given domain. However, new domains (i.e., unseen in training) may emerge after deployment. Thus, it is imperative that these models seamlessly adapt and fill slots from both seen and unseen domains – unseen domains contain unseen slot types with no training data, and even seen slots in unseen domains are typically presented in different contexts. This setting is commonly referred to as zero-shot slot filling. Little work has focused on this setting, with limited experimental evaluation. Existing models that mainly rely on context-independent embedding-based similarity measures fail to detect slot values in unseen domains or do so only partially. We propose a new zero-shot slot filling neural model that works in three steps. Step one acquires domain-oblivious, context-aware representations of utterance words by exploiting (a) linguistic features such as part-of-speech tags; (b) named entity recognition cues; and (c) contextual embeddings from pre-trained language models. Step two fine-tunes these rich representations and produces slot-independent tags for each word. Step three exploits generalizable context-aware utterance-slot similarity features at the word level, uses slot-independent tags, and contextualizes them to produce slot-specific predictions for each word. Our thorough evaluation on four diverse public datasets demonstrates that our approach consistently outperforms state-of-the-art models by 17.52%, 22.15%, 17.42%, and 17.95% on average for unseen domains on SNIPS, ATIS, MultiWOZ, and SGD datasets, respectively.

Enquire One’s Parent and Child Before Decision: Fully Exploit Hierarchical Structure for Self-Supervised Taxonomy Expansion

A taxonomy is a hierarchically structured knowledge graph that plays a crucial role in machine intelligence. The taxonomy expansion task aims to find a position for a new term in an existing taxonomy to capture the emerging knowledge in the world and keep the taxonomy dynamically updated. Previous taxonomy expansion solutions neglect valuable information brought by the hierarchical structure and evaluate the correctness of merely an added edge, which downgrades the problem to node-pair scoring or mini-path classification. In this paper, we propose the Hierarchy Expansion Framework (HEF), which fully exploits the hierarchical structure’s properties to maximize the coherence of the expanded taxonomy. HEF makes use of the taxonomy’s hierarchical structure in multiple aspects: i) HEF utilizes subtrees containing the most relevant nodes as self-supervision data for a complete comparison of parental and sibling relations; ii) HEF adopts a coherence modeling module to evaluate the coherence of a taxonomy’s subtree by integrating hypernymy relation detection and several tree-exclusive features; iii) HEF introduces the Fitting Score for position selection, which explicitly evaluates both path and level selections and takes full advantage of parental relations to interchange information for disambiguation and self-correction. Extensive experiments show that by better exploiting the hierarchical structure and optimizing the taxonomy’s coherence, HEF vastly surpasses the prior state-of-the-art on three benchmark datasets by an average improvement of 46.7% in accuracy and 32.3% in mean reciprocal rank.

SESSION: Session: Knowledge Graph Validation

Typing Errors in Factual Knowledge Graphs: Severity and Possible Ways Out

Large-scale factual knowledge graphs (KGs) such as DBpedia and Wikidata are essential to many popular downstream tasks and are also widely used by various research communities as training and/or benchmarking data. Despite their immense success and utility, these KGs are surprisingly noisy. In this study, we investigate the quality of these KGs, where the typing error rate is estimated to be 27% for coarse-grained types on average, and even 73% for certain fine-grained types. In pursuit of solutions, we propose an active typing error detection algorithm that maximizes the utilization of both gold and noisy labels. We also comprehensively discuss and compare the state-of-the-art in unsupervised, semi-supervised, and supervised paradigms to deal with typing errors in factual KGs. The outcomes of this study provide guidelines for researchers to use noisy factual KGs. To help practitioners deploy the techniques and conduct further research, we have published our code and data.

Few-Shot Knowledge Validation using Rules

Knowledge graphs (KGs) form the basis of modern intelligent search systems – their network structure helps with the semantic reasoning and interpretation of complex tasks. A KG is a highly dynamic structure in which facts are continuously updated, added, and removed. A typical approach to ensure data quality in the presence of continuous changes is to apply logic rules. These rules are automatically mined from the data using frequency-based approaches. As a result, these approaches depend on the data quality of the KG and are susceptible to errors and incompleteness.

To address these issues, we propose Colt, a few-shot rule-based knowledge validation framework that enables the interactive quality assessment of logic rules. It evaluates the quality of any rule by asking a user to validate only a few facts entailed by such rule on the KG. We formalize the problem as learning a validation function over the rule’s outcomes and study the theoretical connections to the generalized maximum coverage problem. Our model obtains (i) an accurate estimate of the quality of a rule with fewer than 20 user interactions and (ii) 75% quality (F1) with 5% annotations in the task of validating facts entailed by any rule.

OntoZSL: Ontology-enhanced Zero-shot Learning

Zero-shot Learning (ZSL), which aims to predict classes that have never appeared in the training data, has attracted considerable research interest. The key to implementing ZSL is to leverage prior knowledge of classes, which builds the semantic relationship between classes and enables the transfer of the learned models (e.g., features) from training classes (i.e., seen classes) to unseen classes. However, the priors adopted by the existing methods are relatively limited, with incomplete semantics. In this paper, we explore richer and more competitive prior knowledge to model the inter-class relationship for ZSL via ontology-based knowledge representation and semantic embedding. Meanwhile, to address the data imbalance between seen classes and unseen classes, we developed a generative ZSL framework with Generative Adversarial Networks (GANs).

Our main findings include: (i) an ontology-enhanced ZSL framework that can be applied to different domains, such as image classification (IMGC) and knowledge graph completion (KGC); (ii) a comprehensive evaluation with multiple zero-shot datasets from different domains, where our method often achieves better performance than the state-of-the-art models. In particular, on four representative ZSL baselines of IMGC, the ontology-based class semantics outperform the previous priors e.g., the word embeddings of classes by an average of 12.4 accuracy points in the standard ZSL across two example datasets (see Figure 4).

Trav-SHACL: Efficiently Validating Networks of SHACL Constraints

Knowledge graphs have emerged as expressive data structures for Web data. The potential of knowledge graphs, and the demand for ecosystems that facilitate their creation, curation, and understanding, are evident in diverse domains, e.g., biomedicine. The Shapes Constraint Language (SHACL) is the W3C recommendation language for integrity constraints over RDF knowledge graphs. Enabling quality assessments of knowledge graphs, SHACL is rapidly gaining attention in real-world scenarios. SHACL models integrity constraints as a network of shapes, where a shape contains the constraints to be fulfilled by the same entities. The validation of a SHACL shape schema can face the issue of tractability during validation. To facilitate full adoption, efficient computational methods are required. We present Trav-SHACL, a SHACL engine capable of planning the traversal and execution of a shape schema in a way that invalid entities are detected early and needless validations are minimized. Trav-SHACL reorders the shapes in a shape schema for efficient validation and rewrites target and constraint queries for fast detection of invalid entities. Trav-SHACL is empirically evaluated on 27 testbeds executed against knowledge graphs of up to 34M triples. Our experimental results suggest that Trav-SHACL exhibits high performance gradually and reduces validation time by a factor of up to 28.93 compared to the state of the art.

Online Disease Diagnosis with Inductive Heterogeneous Graph Convolutional Networks

We propose a Healthcare Graph Convolutional Network (HealGCN) to offer a disease self-diagnosis service for online users based on Electronic Healthcare Records (EHRs). This paper focuses on two main challenges for online disease diagnosis: (1) serving cold-start users via graph convolutional networks and (2) handling scarce clinical descriptions via a symptom retrieval system. To this end, we first organize the EHR data into a heterogeneous graph that is capable of modeling complex interactions among users, symptoms and diseases, and tailor the graph representation learning towards disease diagnosis with an inductive learning paradigm. Then, we build a disease self-diagnosis system with a corresponding EHR Graph-based Symptom Retrieval System (GraphRet) that can search and provide a list of relevant alternative symptoms by tracing the predefined meta-paths. GraphRet helps enrich the seed symptom set through the EHR graph when confronting users with scarce descriptions, hence yielding better diagnosis accuracy. Finally, we validate the superiority of our model on a large-scale EHR dataset.

SESSION: Session: Decomposition and Detection of Anomalies and Motifs

Causal Network Motifs: Identifying Heterogeneous Spillover Effects in A/B Tests

Randomized experiments, or “A/B” tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings, such as social networks, where users are interacting and influencing one another, may violate conventional assumptions of no interference for credible causal inference. Existing solutions to the network setting include accounting for the fraction or count of treated neighbors in a user’s network, yet most current methods do not account for the local network structure beyond simply counting the number of neighbors. Our study provides an approach that accounts for both the local structure in a user’s social network via motifs as well as the treatment assignment conditions of neighbors. We propose a two-part approach. We first introduce and employ “causal network motifs”, which are network motifs that characterize the assignment conditions in local ego networks; and then we propose a tree-based algorithm for identifying different network interference conditions and estimating their average potential outcomes. Our approach can account for social network theories, such as structural diversity and echo chambers, and also can help specify network interference conditions that are suitable to each experiment. We test our method on a synthetic network setting and on a real-world experiment on a large-scale network, which highlight how accounting for local structures can better account for different interference patterns in networks.

MStream: Fast Anomaly Detection in Multi-Aspect Streams

Given a stream of entries in a multi-aspect data setting, i.e., entries having multiple dimensions, how can we detect anomalous activities in an unsupervised manner? For example, in the intrusion detection setting, existing work seeks to detect anomalous events or edges in dynamic graph streams, but this does not allow us to take into account additional attributes of each entry. Our work aims to define a streaming multi-aspect data anomaly detection framework, termed MStream, which can detect unusual group anomalies as they occur, in a dynamic manner. MStream has the following properties: (a) it detects anomalies in multi-aspect data including both categorical and numeric attributes; (b) it is online, thus processing each record in constant time and constant memory; (c) it can capture the correlation between multiple aspects of the data. MStream is evaluated over the KDDCUP99, CICIDS-DoS, UNSW-NB 15 and CICIDS-DDoS datasets, and outperforms state-of-the-art baselines.

Knowledge-Preserving Incremental Social Event Detection via Heterogeneous GNNs

Social events provide valuable insights into group social behaviors and public concerns and therefore have many applications in fields such as product recommendation and crisis management. The complexity and streaming nature of social messages make it appealing to address social event detection in an incremental learning setting, where acquiring, preserving, and extending knowledge are major concerns. Most existing methods, including those based on incremental clustering and community detection, learn limited amounts of knowledge as they ignore the rich semantics and structural information contained in social data. Moreover, they cannot memorize previously acquired knowledge. In this paper, we propose a novel Knowledge-Preserving Incremental Heterogeneous Graph Neural Network (KPGNN) for incremental social event detection. To acquire more knowledge, KPGNN models complex social messages into unified social graphs to facilitate data utilization and explores the expressive power of GNNs for knowledge extraction. To continuously adapt to the incoming data, KPGNN adopts contrastive loss terms that cope with a changing number of event classes. It also leverages the inductive learning ability of GNNs to efficiently detect events and extends its knowledge from previously unseen data. To deal with large social streams, KPGNN adopts a mini-batch subgraph sampling strategy for scalable training, and periodically removes obsolete data to maintain a dynamic embedding space. KPGNN requires no feature engineering and has few hyperparameters to tune. Extensive experiment results demonstrate the superiority of KPGNN over various baselines.

How Do Hyperedges Overlap in Real-World Hypergraphs? - Patterns, Measures, and Generators

Hypergraphs, a generalization of graphs, naturally represent groupwise relationships among multiple individuals or objects, which are common in many application areas, including web, bioinformatics, and social networks. The flexibility in the number of nodes in each hyperedge, which provides the expressiveness of hypergraphs, brings about structural differences between graphs and hypergraphs. Especially, the overlaps of hyperedges lead to complex high-order relations beyond pairwise relations, raising new questions that have not been considered in graphs: How do hyperedges overlap in real-world hypergraphs? Are there any pervasive characteristics? What underlying process can cause such patterns?

In this work, we closely investigate thirteen real-world hypergraphs from various domains and share interesting observations of the overlaps of hyperedges. To this end, we define principled measures and statistically compare the overlaps of hyperedges in real-world hypergraphs and those in null models. Additionally, based on the observations, we propose HyperLap, a realistic hypergraph generative model. HyperLap is (a) Realistic: it accurately reproduces overlapping patterns of real-world hypergraphs, (b) Automatically Fittable: its parameters can be tuned automatically using HyperLap+ to generate hypergraphs particularly similar to a given target hypergraph, (c) Scalable: it generates and fits a hypergraph with 0.7 billion hyperedges within a few hours.

STruD: Truss Decomposition of Simplicial Complexes

A simplicial complex is a generalization of a graph: a collection of n-ary relationships (instead of binary as the edges of a graph), named simplices. In this paper, we develop a new tool to study the structure of simplicial complexes: we generalize the graph notion of truss decomposition to complexes, and show that this more powerful representation gives rise to different properties compared to the graph-based one. This power, however, comes with important computational challenges derived from the combinatorial explosion caused by the downward closure property of complexes.

Drawing upon ideas from itemset mining and similarity search, we design a memory-aware algorithm, dubbed STruD, which is able to efficiently compute the truss decomposition of a simplicial complex. STruD adapts its behavior to the amount of available memory by storing intermediate data in a compact way. We then devise a variant that computes directly the n simplices of maximum trussness. By applying STruD to several datasets, we prove its scalability, and provide an analysis of their structure.

Finally, we show that the truss decomposition can be seen as a filtration, and as such it can be used to study the persistent homology of a dataset, a method for computing topological features at different spatial resolutions, prominent in Topological Data Analysis.

SESSION: Session: Facts and Misinformation

Constructing Explainable Opinion Graphs from Reviews

The Web is a major resource of both factual and subjective information. While there are significant efforts to organize factual information into knowledge bases, there is much less work on organizing opinions, which are abundant in subjective data, into a structured format.

We present ExplainIt, a system that extracts and organizes opinions into an opinion graph, which is useful for downstream applications such as generating explainable review summaries and facilitating search over opinion phrases. In such graphs, a node represents a set of semantically similar opinions extracted from reviews and an edge between two nodes signifies that one node explains the other. ExplainIt mines explanations with a supervised method and groups similar opinions together in a weakly supervised way before combining the clusters of opinions together with their explanation relationships into an opinion graph. We experimentally demonstrate that the explanation relationships generated in the opinion graph are of good quality. Our labeled datasets for explanation mining and grouping opinions are publicly available at https://github.com/megagonlabs/explainit.

The Surprising Performance of Simple Baselines for Misinformation Detection

As social media becomes increasingly prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performance of a broad set of modern transformer-based language models and show that with basic fine-tuning, these models are competitive with and can even significantly outperform recently proposed state-of-the-art methods. We present our framework as a baseline for creating and evaluating new methods for misinformation detection. We further study a comprehensive set of benchmark datasets, and discuss potential data leakage and the need for careful design of the experiments and understanding of datasets to account for confounding variables. As an extreme case example, we show that classifying only based on the first three digits of tweet ids, which contain information on the date, gives state-of-the-art performance on a commonly used benchmark dataset for fake news detection, Twitter16. We provide a simple tool to detect this problem and suggest steps to mitigate it in future datasets.
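The tweet-id leakage check described above can be pictured with a minimal sketch: a "classifier" that looks only at the first few digits of an id and predicts the majority label seen for that prefix in training. This is not the authors' tool; the function names, the fallback rule, and the toy ids and labels below are all assumptions made for illustration. If such a baseline scores well on a real benchmark, the ids are leaking label information.

```python
from collections import Counter, defaultdict


def fit_prefix_baseline(train, prefix_len=3):
    """Map each id prefix to the majority label observed in training.

    train: iterable of (tweet_id, label) pairs (hypothetical format).
    Returns the prefix table and a fallback label for unseen prefixes.
    """
    train = list(train)
    counts = defaultdict(Counter)
    for tweet_id, label in train:
        counts[str(tweet_id)[:prefix_len]][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in counts.items()}
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    return table, fallback


def prefix_accuracy(table, fallback, test, prefix_len=3):
    """Accuracy of predicting labels from the id prefix alone."""
    test = list(test)
    hits = sum(table.get(str(i)[:prefix_len], fallback) == y for i, y in test)
    return hits / len(test)


# Made-up ids and labels, chosen so the prefix fully determines the label.
train = [(500111, "fake"), (500222, "fake"),
         (600111, "real"), (600222, "real"), (600333, "real")]
test = [(500999, "fake"), (600999, "real"), (700111, "real")]

table, fallback = fit_prefix_baseline(train)
leak_score = prefix_accuracy(table, fallback, test)
```

A high score from a model this trivial is a warning sign about the dataset split, not evidence of a good detector.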

MQuadE: a Unified Model for Knowledge Fact Embedding

The task of knowledge graph embedding (KGE) tries to find appropriate representations for entities and relations and appropriate mathematical computations between the representations to approximate the symbolic and logical relationships between entities. One major challenge for KGE is that the relations in real-world knowledge bases exhibit complex behaviors: they can be injective (1-1) or non-injective (1-N, N-1, or N-N), symmetric or skew-symmetric; one relation may be the inversion of another relation; one relation may be the composition of two other relations (where the composition can be either Abelian or non-Abelian). To our knowledge, there has not been any theoretical guarantee that these complex behaviors can be modeled by existing KGE methods.

This paper proposes a method called MQuadE to tackle the challenge in KGE modeling. In MQuadE, we represent a fact triple (h, r, t), that is, (head entity, relation, tail entity), in the knowledge graph with a matrix quadruple (H, R, R̂, T), where H and T are the representations of h and t respectively and ⟨R, R̂⟩ is the pair of representations of r. MQuadE projects the head entity into HR and the tail entity into R̂T, then assumes that HR and R̂T are similar for true facts and dissimilar for false facts. We prove that MQuadE, as a unified model for KGE, is able to model the generally concerned types of relations (symmetric, skew-symmetric, injective, non-injective, inversion, Abelian composition, non-Abelian composition). Experiments on link prediction and triple classification show that MQuadE outperforms many previous knowledge graph embedding methods, especially on 1-N, N-1, and N-N relations.
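To make the scoring idea concrete, here is a minimal sketch that measures how far the projected head entity is from the projected tail entity. The abstract only says the two projections should be "similar" for true facts; using the squared Frobenius distance as the similarity measure is an assumption of this sketch, as are the names `mquade_score` and `R_hat` (the second matrix in the relation's pair) and the toy matrices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mquade_score(H, R, R_hat, T):
    """Squared Frobenius distance between H*R and R_hat*T.

    Lower means the triple looks more plausible. The choice of distance
    is an assumption of this sketch, not taken from the abstract.
    """
    left = matmul(H, R)
    right = matmul(R_hat, T)
    return sum((x - y) ** 2
               for row_l, row_r in zip(left, right)
               for x, y in zip(row_l, row_r))


identity = [[1, 0], [0, 1]]
swap = [[0, 1], [1, 0]]
R = [[1, 2], [3, 4]]

# With identity entities and R_hat == R, both projections coincide.
true_like = mquade_score(identity, R, R, identity)
# Replacing the tail entity moves the right-hand projection away.
false_like = mquade_score(identity, R, R, swap)
```

In a trained model the matrices would be learned so that observed triples receive low scores and corrupted triples receive high ones.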

Target-adaptive Graph for Cross-target Stance Detection

Target plays an essential role in stance detection of an opinionated review/claim, since the stance expressed in the text often depends on the target. In practice, we need to deal with targets unseen in the annotated training data. As such, detecting stance for an unknown or unseen target is an important research problem. This paper presents a novel approach that automatically identifies and adapts the target-dependent and target-independent roles that a word plays with respect to a specific target in stance expressions, so as to achieve cross-target stance detection. More concretely, we explore a novel solution of constructing heterogeneous target-adaptive pragmatics dependency graphs (TPDG) for each sentence towards a given target. An in-target graph is constructed to produce inherent pragmatics dependencies of words for a distinct target. In addition, another cross-target graph is constructed to develop the versatility of words across all targets for boosting the learning of dominant word-level stance expressions available to an unknown target. A novel graph-aware model with interactive Graphical Convolutional Network (GCN) blocks is developed to derive the target-adaptive graph representation of the context for stance detection. The experimental results on a number of benchmark datasets show that our proposed model outperforms state-of-the-art methods in cross-target stance detection.

Mining Dual Emotion for Fake News Detection

Emotion plays an important role in detecting fake news online. When leveraging emotional signals, the existing methods focus on exploiting the emotions of news content conveyed by the publishers (i.e., publisher emotion). However, fake news often evokes high-arousal or activating emotions in people, so the emotions of news comments aroused in the crowd (i.e., social emotion) should not be ignored. Furthermore, it remains to be explored whether there exists a relationship between publisher emotion and social emotion (i.e., dual emotion), and how the dual emotion appears in fake news. In this paper, we verify that dual emotion is distinctive between fake and real news and propose Dual Emotion Features to represent dual emotion and the relationship between them for fake news detection. Further, we show that our proposed features can be easily plugged into existing fake news detectors as an enhancement. Extensive experiments on three real-world datasets (one in English and the others in Chinese) show that our proposed feature set: 1) outperforms the state-of-the-art task-related emotional features; 2) is compatible with existing fake news detectors and effectively improves the performance of detecting fake news.

SESSION: Session: Question Answering and Text Processing

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d. assumption, i.e., the training distribution over questions is the same as the test distribution. However, i.i.d. may be neither achievable nor desirable on large-scale KBs because 1) the true user distribution is hard to capture and 2) randomly sampling training examples from the enormous space would be data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d., compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.
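One way to picture the three levels is to sort a test question by what it shares with the training set. The sketch below assumes a reading that the abstract itself does not spell out: a question is zero-shot if it involves a schema item never seen in training, compositional if all of its schema items were seen but not in this combination, and i.i.d. otherwise. The function name, the representation of a question as a tuple of schema items, and the toy item names are all hypothetical.

```python
def generalization_level(items, train_questions):
    """Classify a test question relative to the training questions.

    items: schema items (e.g. relations or classes) used by the test question.
    train_questions: list of item tuples, one per training question.
    Both formats are assumptions of this sketch.
    """
    seen_items = set().union(*train_questions)
    seen_combinations = {frozenset(q) for q in train_questions}
    query = frozenset(items)
    if not query <= seen_items:
        return "zero-shot"        # at least one unseen schema item
    if query not in seen_combinations:
        return "compositional"    # seen items, new combination
    return "i.i.d."               # combination already seen in training


train = [("artist", "album"), ("artist", "birthplace"), ("film", "director")]

level_a = generalization_level(("album", "artist"), train)
level_b = generalization_level(("album", "birthplace"), train)
level_c = generalization_level(("film", "award"), train)
```

Splitting a test set this way lets each level be evaluated separately instead of reporting one aggregate number.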

Improving Neural Question Generation using Deep Linguistic Representation

Question Generation (QG) is a challenging Natural Language Processing (NLP) task which aims at generating questions with given answers and context. There are many works incorporating linguistic features to improve the performance of QG. However, similar to traditional word embedding, these works normally embed such features with a set of trainable parameters, which results in the linguistic features not being fully exploited. In this work, inspired by the recent achievements of text representation, we propose to utilize linguistic information via large pre-trained neural models. First, these models are trained in several specific NLP tasks in order to better represent linguistic features. Then, such feature representation is fused into a seq2seq based QG model to guide question generation. Extensive experiments were conducted on two benchmark Question Generation datasets to evaluate the effectiveness of our approach. The experimental results demonstrate that our approach outperforms the state-of-the-art QG systems, significantly improving on the baseline by 17.2% and 6.2% under the BLEU-4 metric on these two datasets, respectively.

Diverse and Specific Clarification Question Generation with Keywords

Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQGen) can be a promising approach to help alleviate the problem. Unlike traditional QGen, which assumes the existence of answers in the context and generates questions accordingly, CQGen mimics user behaviors of asking for unstated information. The generated CQs can serve as a sanity check or proofreading to help e-commerce merchants identify potential missing information before advertising their products, and consequently improve the consumer experience. Due to the variety of possible user backgrounds and use cases, the information need can be quite diverse but also specific to a detailed topic, while previous works assume generating one CQ per context and the results tend to be generic. We thus propose the task of Diverse CQGen and also tackle the challenge of specificity. We propose a new model named KPCNet, which generates CQs with Keyword Prediction and Conditioning, to deal with the tasks. Automatic and human evaluation on 2 datasets (Home & Kitchen, Office) showed that KPCNet can generate more specific questions and promote better group-level diversity than several competing baselines.

Knowledge-Aware Procedural Text Understanding with Multi-Stage Training

Procedural text describes dynamic state changes during a step-by-step natural process (e.g., photosynthesis). In this work, we focus on the task of procedural text understanding, which aims to comprehend such documents and track entities’ states and locations during a process. Although recent approaches have achieved substantial progress, their results are far behind human performance. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved, which require the incorporation of external knowledge bases. Previous works on external knowledge injection usually rely on noisy web mining tools and heuristic rules with limited applicable scenarios. In this paper, we propose a novel KnOwledge-Aware proceduraL text understAnding (KoaLa) model, which effectively leverages multiple forms of external knowledge in this task. Specifically, we retrieve informative knowledge triples from ConceptNet and perform knowledge-aware reasoning while tracking the entities. Besides, we employ a multi-stage training schema which fine-tunes the BERT model over unlabeled data collected from Wikipedia before further fine-tuning it on the final model. Experimental results on two procedural text datasets, ProPara and Recipes, verify the effectiveness of the proposed methods, in which our model achieves state-of-the-art performance in comparison to various baselines.

Multi-level Connection Enhanced Representation Learning for Script Event Prediction

Script event prediction (SEP) aims to choose a correct subsequent event from a candidate list, given a chain of ordered context events. Event representation learning has been proposed and successfully applied to this task. Most previous representation learning methods mainly focus on coarse-grained connections at the event or chain level, while ignoring more fine-grained connections between events. Here we propose a novel framework which can enhance the representation learning of events by mining their connections at multiple granularity levels, including argument level, event level and chain level. In our method, we first employ a masked self-attention mechanism to model the relations between the components of events (i.e., arguments). Then, a directed graph convolutional network is further utilized to model the temporal or causal relations between events in the chain. Finally, we introduce an attention module to the context event chain, so as to dynamically aggregate context events with respect to the current candidate event. By fusing threefold connections in a unified framework, our approach can learn more accurate argument/event/chain representations, and thus leads to better prediction performance. Comprehensive experiment results on the public New York Times corpus demonstrate that our model outperforms other state-of-the-art baselines. Our code is available at https://github.com/YueAWu/MCer.

SESSION: Session: Learning

Unsupervised Lifelong Learning with Curricula

Lifelong machine learning (LML) has driven the development of extensive web applications, enabling the learning systems deployed on web servers to deal with a sequence of tasks in an incremental fashion. Such systems can retain knowledge from learned tasks in a knowledge base and seamlessly apply it to improve future learning. Unfortunately, most existing LML methods require labels in every task, whereas providing persistent human labeling for all future tasks is costly, onerous, error-prone, and hence impractical. Motivated by this situation, we propose a new paradigm named unsupervised lifelong learning with curricula (ULLC), where only one task needs to be labeled for initialization and the system then performs lifelong learning for subsequent tasks in an unsupervised fashion. A main challenge of realizing this paradigm lies in the occurrence of negative knowledge transfer, where partial old knowledge becomes detrimental for learning a given task yet cannot be filtered out by the learner without the help of labels. To overcome this challenge, we draw insights from the learning behaviors of humans. Specifically, when faced with a difficult task that cannot be well tackled by our current knowledge, we usually postpone it and work on some easier tasks first, which allows us to grow our knowledge. Thereafter, once we go back to the postponed task, we are more likely to tackle it well as we are more knowledgeable now. The key idea of ULLC is similar – at any time, a pool of candidate tasks is organized in a curriculum by their distances to the knowledge base. The learner then starts from the closer tasks, accumulates knowledge from learning them, and moves to learn the faraway tasks with a gradually augmented knowledge base. The viability and effectiveness of our proposal are substantiated through extensive empirical studies on both synthetic and real datasets.
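The curriculum idea can be sketched as a greedy loop: always learn the candidate task closest to the current knowledge base, then add it to the knowledge base before choosing again. Everything concrete here is an assumption made for illustration: tasks are represented as plain feature vectors, the distance to the knowledge base is the Euclidean distance to the nearest already-learned task, and "learning" a task simply means storing its vector.

```python
import math


def distance_to_kb(task_vec, kb):
    """Distance from a task to its nearest neighbour in the knowledge base."""
    return min(math.dist(task_vec, known) for known in kb)


def curriculum_order(initial_task, candidates):
    """Greedily order tasks from closest to farthest from a growing KB.

    initial_task: vector of the single labeled task used for initialization.
    candidates: dict mapping task name -> vector (hypothetical format).
    """
    kb = [initial_task]
    pool = dict(candidates)
    order = []
    while pool:
        # Pick the task that is currently easiest, i.e. closest to the KB.
        name = min(pool, key=lambda n: distance_to_kb(pool[n], kb))
        order.append(name)
        kb.append(pool.pop(name))  # "learning" the task augments the KB
    return order


candidates = {"A": (2.0, 0.0), "B": (0.0, 3.5), "C": (5.0, 0.0)}
order = curriculum_order((0.0, 0.0), candidates)
```

In this toy example, sorting once by distance to the initial task would give A, B, C, but after A is learned, C becomes closer to the knowledge base than B, so the loop visits C first. Re-evaluating distances after each task is what lets faraway tasks become reachable as knowledge accumulates.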

Taxonomy-aware Learning for Few-Shot Event Detection

Event detection classifies unlabeled sentences into event labels, which can benefit numerous applications, including information retrieval, question answering and script learning. One of the major obstacles to event detection in reality is insufficient training data. To deal with the low-resource problem, we investigate few-shot event detection in this paper and propose TaLeM, a novel taxonomy-aware learning model, consisting of two components, i.e., the taxonomy-aware self-supervised learning framework (TaSeLF) and the taxonomy-aware prototypical networks (TaPN). Specifically, TaSeLF mines the taxonomy-aware distance relations to increase the training examples, which alleviates the generalization bottleneck brought by the insufficient data. TaPN introduces the Poincaré embeddings to represent the label taxonomy, and integrates them into a task-adaptive projection network, which tackles problems of the class centroids distribution and the taxonomy-aware embedding distribution in the vanilla prototypical networks.

Extensive experiments in the four types of meta tasks demonstrate the superiority of our proposal over the strong baselines, and further verify the effectiveness and importance of modeling the label taxonomy. Besides, TaSeLF can be a flexible plug-in for the other taxonomy-based few-shot classification tasks.

Distilling Knowledge from Publicly Available Online EMR Data to Emerging Epidemic for Prognosis

Due to the characteristics of COVID-19, the epidemic develops rapidly and overwhelms health service systems worldwide. Many patients suffer from life-threatening systemic problems and need to be carefully monitored in ICUs. An intelligent prognosis can help physicians take an early intervention, prevent adverse outcomes, and optimize the medical resource allocation, which is urgently needed, especially in this ongoing global pandemic crisis. However, in the early stage of the epidemic outbreak, the data available for analysis is limited due to the lack of effective diagnostic mechanisms, the rarity of the cases, and privacy concerns. In this paper, we propose a distilled transfer learning framework, which leverages the existing publicly available online Electronic Medical Records to enhance the prognosis for inpatients with emerging infectious diseases. It learns to embed the COVID-19-related medical features based on massive existing EMR data. The transferred parameters are further trained to imitate the teacher model’s representation based on distillation, which embeds the health status more comprehensively on the source dataset. We conduct Length-of-Stay prediction experiments for patients in ICUs on real-world COVID-19 datasets. The experiment results indicate that our proposed model consistently outperforms competitive baseline methods. In order to further verify the scalability of the proposed framework to deal with different clinical tasks on different EMR datasets, we conduct an additional mortality prediction experiment on End-Stage Renal Disease datasets. The extensive experiments demonstrate that the proposed framework can benefit the prognosis for emerging pandemics and other diseases with limited EMR.

AID: Active Distillation Machine to Leverage Pre-Trained Black-Box Models in Private Data Settings

This paper presents an active distillation method for a local institution (e.g., hospital) to find the best queries within its given budget to distill an on-server black-box model’s predictive knowledge into a local surrogate with transparent parameterization. This allows local institutions to understand better the predictive reasoning of the black-box model in its own local context or to further customize the distilled knowledge with its private dataset that cannot be centralized and fed into the server model. The proposed method thus addresses several challenges of deploying machine learning (ML) in many industrial settings (e.g., healthcare analytics) with strong proprietary constraints. These include: (1) the opaqueness of the server model’s architecture which prevents local users from understanding its predictive reasoning in their local data contexts; (2) the increasing cost and risk of uploading local data on the cloud for analysis; and (3) the need to customize the server model with private onsite data. We evaluated the proposed method on both benchmark and real-world healthcare data where significant improvements over existing local distillation methods were observed. A theoretical analysis of the proposed method is also presented.

UserSim: User Simulation via Supervised Generative Adversarial Network

With the recent advances in Reinforcement Learning (RL), there has been tremendous interest in employing RL for recommender systems. However, directly training and evaluating a new RL-based recommendation algorithm requires collecting users’ real-time feedback in the real system, which is time- and effort-consuming and could negatively impact users’ experiences. This calls for a user simulator that can mimic real users’ behaviors to pre-train and evaluate new recommendation algorithms. Simulating users’ behaviors in a dynamic system faces immense challenges: (i) the underlying item distribution is complex, and (ii) historical logs for each user are limited. In this paper, we develop a user simulator based on a Generative Adversarial Network (GAN). To be specific, the generator captures the underlying distribution of users’ historical logs and generates realistic logs that can be considered as augmentations of real logs, while the discriminator not only distinguishes real and fake logs but also predicts users’ behaviors. The experimental results based on benchmark datasets demonstrate the effectiveness of the proposed simulator.

SESSION: Session: Adversarial Cloaking and Attacks

Adversarial Item Promotion: Vulnerabilities at the Core of Top-N Recommenders that Use Images to Address Cold Start

E-commerce platforms provide their customers with ranked lists of recommended items matching the customers’ preferences. Merchants on e-commerce platforms would like their items to appear as high as possible in the top-N of these ranked lists. In this paper, we demonstrate how unscrupulous merchants can create item images that artificially promote their products, improving their rankings. Recommender systems that use images to address the cold start problem are vulnerable to this security risk. We describe a new type of attack, Adversarial Item Promotion (AIP), that strikes directly at the core of Top-N recommenders: the ranking mechanism itself. Existing work on adversarial images in recommender systems investigates the implications of conventional attacks, which target deep learning classifiers. In contrast, our AIP attacks are embedding attacks that seek to push feature representations in a way that fools the ranker (not a classifier) and directly leads to item promotion. We introduce three AIP attacks: insider attack, expert attack, and semantic attack, which are defined with respect to three successively more realistic attack models. Our experiments evaluate the danger of these attacks when mounted against three representative visually-aware recommender algorithms in a framework that uses images to address cold start. We also evaluate potential defenses, including adversarial training, and find that common, currently existing techniques do not eliminate the danger of AIP attacks. In sum, we show that using images to address cold start opens recommender systems to potential threats with clear practical implications.

Robust Android Malware Detection against Adversarial Example Attacks

Adversarial examples pose severe threats to Android malware detection because they can render the machine learning based detection systems useless. How to effectively detect Android malware under various adversarial example attacks becomes an essential but very challenging issue. Existing adversarial example defense mechanisms usually rely heavily on the instances or the knowledge of adversarial examples, and thus their usability and effectiveness are significantly limited because they often cannot resist the unseen-type adversarial examples. In this paper, we propose a novel robust Android malware detection approach that can resist adversarial examples without requiring their instances or knowledge by jointly investigating malware detection and adversarial example defenses. More precisely, our approach employs a new VAE (variational autoencoder) and an MLP (multi-layer perceptron) to detect malware, and combines their detection outcomes to make the final decision. In particular, we share a feature extraction network between the VAE and the MLP to reduce model complexity and design a new loss function to disentangle the features of different classes, hence improving detection performance. Extensive experiments confirm our model’s advantage in accuracy and robustness. Our method outperforms 11 state-of-the-art robust Android malware detection models when resisting 7 kinds of adversarial example attacks.

Where are you taking me? Understanding Abusive Traffic Distribution Systems

Illicit website owners frequently rely on traffic distribution systems (TDSs) operated by less-than-scrupulous advertising networks to acquire user traffic. While researchers have described a number of case studies on various TDSs or the businesses they serve, we still lack an understanding of how users are differentiated in these ecosystems, how different illicit activities frequently leverage the same advertisement networks and, subsequently, the same malicious advertisers. We design ODIN (Observatory of Dynamic Illicit ad Networks), the first system to study cloaking, user differentiation and business integration at the same time in four different types of traffic sources: typosquatting, copyright-infringing movie streaming, ad-based URL shortening, and illicit online pharmacy websites.

ODIN performed 874,494 scrapes over two months (June 19, 2019–August 24, 2019), posing as six different types of users (e.g., mobile, desktop, and crawler) and accumulating over 2TB of data. We observed 81% more malicious pages compared to using only the best performing crawl profile by itself. Three of the traffic sources we study redirect users to the same traffic broker domain names up to 44% of the time and all of them often expose users to the same malicious advertisers. Our experiments show that novel cloaking techniques could decrease by half the number of malicious pages observed. Worryingly, popular blacklists do not just suffer from the lack of coverage and delayed detection, but miss the vast majority of malicious pages targeting mobile users. We use these findings to design a classifier, which can make precise predictions about the likelihood of a user being redirected to a malicious advertiser.

One Detector to Rule Them All: Towards a General Deepfake Attack Detection Framework

Deep learning-based video manipulation methods have become widely accessible to the masses. With little to no effort, people can quickly learn how to generate deepfake (DF) videos. While deep learning-based detection methods have been proposed to identify specific types of DFs, their performance suffers for other types of deepfake methods, including real-world deepfakes, on which they are not sufficiently trained. In other words, most of the proposed deep learning-based detection methods lack transferability and generalizability. Beyond detecting a single type of DF from benchmark deepfake datasets, we focus on developing a generalized approach to detect multiple types of DFs, including deepfakes from unknown generation methods such as DeepFake-in-the-Wild (DFW) videos. To better cope with unknown and unseen deepfakes, we introduce a Convolutional LSTM-based Residual Network (CLRNet), which adopts a unique model training strategy and explores spatial as well as temporal information in deepfakes. Through extensive experiments, we show that existing defense methods are not ready for real-world deployment, whereas our defense method (CLRNet) achieves far better generalization when detecting various benchmark deepfake methods (97.57% on average). Furthermore, we evaluate our approach with a high-quality DeepFake-in-the-Wild dataset, collected from the Internet, containing numerous videos and more than 150,000 frames. Our CLRNet model generalizes well against high-quality DFW videos, achieving 93.86% detection accuracy and outperforming existing state-of-the-art defense methods by a considerable margin.

A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning

As modern neural machine translation (NMT) systems have been widely deployed, their security vulnerabilities require close scrutiny. Most recently, NMT systems have been found vulnerable to targeted attacks which cause them to produce specific, unsolicited, and even harmful translations. These attacks are usually exploited in a white-box setting, where adversarial inputs causing targeted translations are discovered for a known target system. However, this approach is less viable when the target system is black-box and unknown to the adversary (e.g., secured commercial systems). In this paper, we show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data. We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system’s training data. We then analyse the effectiveness of the targeted poisoning in two common NMT training scenarios: the from-scratch training and the pre-train & fine-tune paradigm. Our results are alarming: even on the state-of-the-art systems trained with massive parallel data (tens of millions), the attacks are still successful (over 50% success rate) under surprisingly low poisoning budgets (e.g., 0.006%). Lastly, we discuss potential defences to counter such attacks.

SESSION: Session: Graph Algorithms

On the Equivalence of Decoupled Graph Convolution Network and Label Propagation

The original design of Graph Convolution Network (GCN) couples feature transformation and neighborhood aggregation for node representation learning. Recently, some work has shown that coupling is inferior to decoupling, which supports deep graph propagation better and has become the latest paradigm of GCN (e.g., APPNP [16] and SGCN [32]). Despite its effectiveness, the working mechanisms of the decoupled GCN are not well understood.

In this paper, we explore the decoupled GCN for semi-supervised node classification from a novel and fundamental perspective: label propagation. We conduct thorough theoretical analyses, proving that the decoupled GCN is essentially the same as the two-step label propagation: first, propagating the known labels along the graph to generate pseudo-labels for the unlabeled nodes, and second, training normal neural network classifiers on the augmented pseudo-labeled data. More interestingly, we reveal why the decoupled GCN is effective: going beyond the conventional label propagation, it can automatically assign structure- and model-aware weights to the pseudo-label data. This explains why the decoupled GCN is relatively robust to structure noise and over-smoothing, but sensitive to label noise and model initialization. Based on this insight, we propose a new label propagation method named Propagation then Training Adaptively (PTA), which overcomes the flaws of the decoupled GCN with a dynamic and adaptive weighting strategy. Our PTA is simple yet more effective and robust than decoupled GCN. We empirically validate our findings on four benchmark datasets, demonstrating the advantages of our method. The code is available at https://github.com/DongHande/PT_propagation_then_training.
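
The first of the two steps named in this abstract, propagating known labels along the graph to obtain soft pseudo-labels, can be illustrated with a small sketch. This is not the paper's PTA method: the personalized-PageRank-style update, the teleport weight `alpha`, the function name `propagate_labels`, and the toy path graph are all assumptions chosen for illustration.

```python
def propagate_labels(adj, labels, num_classes, alpha=0.1, steps=10):
    """Spread known labels over a graph to get soft pseudo-label scores.

    adj: dict node -> list of neighbours (undirected graph).
    labels: dict node -> class index, for the labelled nodes only.
    alpha: weight of a node's own initial label at every step (assumed value).
    """
    nodes = sorted(adj)
    # One-hot rows for labelled nodes, all-zero rows for unlabelled ones.
    y0 = {v: [1.0 if labels.get(v) == c else 0.0 for c in range(num_classes)]
          for v in nodes}
    y = {v: row[:] for v, row in y0.items()}
    for _ in range(steps):
        new_y = {}
        for v in nodes:
            # Average the neighbours' current soft scores ...
            agg = [0.0] * num_classes
            for u in adj[v]:
                for c in range(num_classes):
                    agg[c] += y[u][c] / len(adj[v])
            # ... and mix in the node's own initial label (teleport term).
            new_y[v] = [(1 - alpha) * agg[c] + alpha * y0[v][c]
                        for c in range(num_classes)]
        y = new_y
    return y


# Toy path graph 0 - 1 - 2 - 3 - 4 with only the two end nodes labelled.
toy_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
soft = propagate_labels(toy_adj, {0: 0, 4: 1}, num_classes=2)
```

In this toy run, node 1 receives a higher score for class 0 and node 3 for class 1, since each sits closer to the corresponding labelled end. The second step would train an ordinary classifier on these scores, which the sketch leaves out.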

Mixup for Node and Graph Classification

Mixup is an advanced data augmentation method for training neural network based image classifiers, which interpolates both features and labels of a pair of images to produce synthetic samples. However, devising the Mixup methods for graph learning is challenging due to the irregularity and connectivity of graph data. In this paper, we propose the Mixup methods for two fundamental tasks in graph learning: node and graph classification. To interpolate the irregular graph topology, we propose the two-branch graph convolution to mix the receptive field subgraphs for the paired nodes. Mixup on different node pairs can interfere with the mixed features for each other due to the connectivity between nodes. To block this interference, we propose the two-stage Mixup framework, which uses each node’s neighbors’ representations before Mixup for graph convolutions. For graph classification, we interpolate complex and diverse graphs in the semantic space. Qualitatively, our Mixup methods enable GNNs to learn more discriminative features and reduce over-fitting. Quantitative results show that our method yields consistent gains in terms of test accuracy and F1-micro scores on standard datasets, for both node and graph classification. Overall, our method effectively regularizes popular graph neural networks for better generalization without increasing their time complexity.
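
For reference, the basic Mixup operation described in the first sentence of this abstract (interpolating the features and labels of a pair of samples) can be sketched as below. This shows only the standard vector form, not the two-branch graph convolution or two-stage framework the paper proposes. The function names are hypothetical, and drawing the mixing ratio from a Beta(alpha, alpha) distribution is the usual convention, assumed here.

```python
import random


def mixup(x_i, y_i, x_j, y_j, lam):
    """Interpolate two samples: features and one-hot labels use the same ratio."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def sample_lambda(alpha=1.0, rng=random):
    """Draw the mixing ratio from Beta(alpha, alpha); alpha=1.0 is uniform on [0, 1]."""
    return rng.betavariate(alpha, alpha)


# Mix a class-0 sample with a class-1 sample at a fixed ratio of 0.25.
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 1.0], 0.25)
```

The difficulty the abstract points to is that graph nodes have no fixed-size feature vector to interpolate like this: each node's representation depends on its neighborhood, which differs in size and shape between the two paired nodes.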

Effective and Scalable Clustering on Massive Attributed Graphs

Given a graph G where each node is associated with a set of attributes, and a parameter k specifying the number of output clusters, k-attributed graph clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar. This problem is challenging on massive graphs, e.g., with millions of nodes and billions of attribute values. For such graphs, existing solutions either incur prohibitively high costs, or produce clustering results with compromised quality.

In this paper, we propose an efficient approach to k-AGC that yields high-quality clusters with costs linear to the size of the input graph G. The main contributions of our approach are twofold: (i) a novel formulation of the k-AGC problem based on an attributed multi-hop conductance quality measure custom-made for this problem setting, which effectively captures cluster coherence in terms of both topological proximities and attribute similarities, and (ii) a linear-time optimization solver that obtains high-quality clusters iteratively, based on efficient matrix operations such as orthogonal iterations, an alternative optimization approach, as well as an initialization technique that significantly speeds up the convergence of our approach in practice.

Extensive experiments, comparing against 11 competitors on 6 real datasets, demonstrate that our approach consistently outperforms all competitors in terms of result quality measured against ground truth labels, while being up to orders of magnitude faster. In particular, on the Microsoft Academic Knowledge Graph dataset with 265.2 million edges and 1.1 billion attribute values, our approach outputs high-quality results for 5-AGC within 1.68 hours using a single CPU core, while none of the 11 competitors finish within 3 days.

Mask-GVAE: Blind Denoising Graphs via Partition

We present Mask-GVAE, a variational generative model for blind denoising large discrete graphs, in which “blind denoising” means we don’t require any supervision from clean graphs. We focus on recovering graph structures via deleting irrelevant edges and adding missing edges, which has many applications in real-world scenarios, for example, enhancing the quality of connections in a co-authorship network. Mask-GVAE makes use of the robustness in low eigenvectors of graph Laplacian against random noise and decomposes the input graph into several stable clusters. It then harnesses the huge computations by decoding probabilistic smoothed subgraphs in a variational manner. On a wide variety of benchmarks, Mask-GVAE outperforms competing approaches by a significant margin on PSNR and WL similarity.

Bridging the Gap between von Neumann Graph Entropy and Structural Information: Theory and Applications

The von Neumann graph entropy (VNGE) is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by networked data. However, it is computationally demanding and hard to interpret using simple structural patterns. Due to the close relation between the Laplacian spectrum and the degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might approximate VNGE well.

In this work, we therefore study the difference between the structural information and VNGE, which we call the entropy gap. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we prove for the first time that the entropy gap is between 0 and log₂ e in any undirected unweighted graph. Consequently, we certify that the structural information is a good approximation of VNGE that achieves provable accuracy, scalability, and interpretability simultaneously. This approximation is further applied to two entropy-related tasks, network design and graph similarity measure, where a novel graph similarity measure and fast algorithms are proposed. Our experimental results on graphs of various scales and types show that the very small entropy gap readily applies to a wide range of graphs, including weighted graphs. As an approximation of VNGE, the structural information is the only one that achieves both high efficiency and high accuracy among the prominent methods. It is at least two orders of magnitude faster than SLaQ [40] with comparable accuracy. Our structural information based methods also exhibit superior performance in two entropy-related tasks.
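
The structural information defined in this abstract, the Shannon entropy of the normalized degree sequence, is cheap to compute from an edge list alone. A minimal sketch follows, assuming base-2 logarithms and a simple undirected graph with no self-loops; the function name and the two toy graphs are hypothetical. VNGE itself is not computed here, since it needs the Laplacian eigenvalues, which is the cost the approximation avoids.

```python
import math
from collections import Counter


def structural_information(edges):
    """Shannon entropy (in bits) of the degree sequence normalized by 2 * |E|."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    volume = sum(degree.values())  # equals twice the number of edges
    return -sum((d / volume) * math.log2(d / volume) for d in degree.values())


# A 4-cycle is 2-regular, so every node carries the same share of the volume.
cycle_h = structural_information([(0, 1), (1, 2), (2, 3), (3, 0)])

# A star with three leaves concentrates half of the volume on the hub.
star_h = structural_information([(0, 1), (0, 2), (0, 3)])
```

A regular graph on n nodes attains the maximum value log2(n), and any imbalance in the degree sequence, as in the star, lowers it.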

SESSION: Session: Text Classification

Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels that are chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web-encyclopedia, prediction of related searches, and recommendation systems.

While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges, (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) highly imbalanced data distribution where a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate missing labels according to Natarajan et al. [26]. This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining with label-frequency-based rebalancing.
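
The loss of convexity and lower-boundedness mentioned above can be seen in a small sketch of an unbiased per-label loss. The noise model is an assumption made for this illustration: a truly relevant label is observed with probability `propensity`, and an irrelevant label is never observed as relevant. The function names and the choice of logistic loss are likewise assumptions, not taken from the paper.

```python
import math


def bce(y, score):
    """Logistic (binary cross-entropy) loss for one label, with y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def unbiased_loss(y_obs, score, propensity, loss=bce):
    """Per-label loss whose expectation over the label noise is the clean loss.

    Assumes one-sided noise: a relevant label survives with probability
    `propensity`; irrelevant labels are never flipped to relevant.
    """
    if y_obs == 1:
        # The second coefficient is negative whenever propensity < 1, so the
        # estimate is no longer convex in the score or bounded from below.
        return (loss(1, score) / propensity
                + (1.0 - 1.0 / propensity) * loss(0, score))
    return loss(0, score)


# Averaging over the noise for a truly relevant label recovers the clean loss.
p, s = 0.25, 0.3
expected = p * unbiased_loss(1, s, p) + (1 - p) * unbiased_loss(0, s, p)
clean = bce(1, s)

# For a confident positive score, the estimate for an observed label is negative.
negative_value = unbiased_loss(1, 5.0, p)
```

The negative coefficient on the `loss(0, score)` term is what motivates replacing the estimate with a convex surrogate, as the abstract proposes.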

We show that the proposed loss functions can be easily incorporated into various frameworks for extreme classification. These include (i) linear classifiers, such as DiSMEC, on sparse input data representation, (ii) the attention-based deep architecture AttentionXML, learnt on dense GloVe embeddings, and (iii) the XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baseline models on the propensity-scored metrics for precision and nDCG.

ECLARE: Extreme Classification with Label Graph Correlations

Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels and small amount of training data per rare label offer significant statistical and computational challenges. State-of-the-art deep XC methods attempt to remedy this by incorporating textual descriptions of labels but do not adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds. Core contributions of ECLARE include a frugal architecture and scalable techniques to train deep models along with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are 2–14% more accurate on both publicly available benchmark datasets as well as proprietary datasets for a related products recommendation task sourced from the Bing search engine. Code for ECLARE is available at https://github.com/Extreme-classification/ECLARE

GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

This paper develops the GalaXC algorithm for Extreme Classification, where the task is to annotate a document with the most relevant subset of labels from an extremely large label set. Extreme classification has been successfully applied to several real-world web-scale applications such as web search, product recommendation, query rewriting, etc. GalaXC identifies two critical deficiencies in leading extreme classification algorithms. First, existing approaches generally assume that documents and labels reside in disjoint sets, even though in several applications, labels and documents cohabit the same space. Second, several approaches, albeit scalable, do not utilize various forms of metadata offered by applications, such as label text and label correlations. To remedy these, GalaXC presents a framework that enables collaborative learning over joint document-label graphs at massive scales, in a way that naturally allows various auxiliary sources of information, including label metadata, to be incorporated. GalaXC also introduces a novel label-wise attention mechanism to meld high-capacity extreme classifiers with its framework. An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4 × V100 GPUs. This allowed GalaXC to not only scale to applications with several millions of labels, but also be up to 18% more accurate than leading deep extreme classifiers, while being up to 2-50× faster to train and 10× faster to predict on benchmark datasets. GalaXC is particularly well-suited to warm-start scenarios where predictions need to be made on data points with partially revealed label sets, and was found to be up to 25% more accurate than extreme classification algorithms specifically designed for warm-start settings. In A/B tests conducted on the Bing search engine, GalaXC could improve the Click Yield (CY) and coverage by 1.52% and 1.11% respectively.
Code for GalaXC is available at https://github.com/Extreme-classification/GalaXC

Generalizing Discriminative Retrieval Models using Generative Tasks

Information Retrieval has a long history of applying either discriminative or generative modeling to retrieval and ranking tasks. Recent developments in transformer architectures and multi-task learning techniques have dramatically improved our ability to train effective neural models capable of resolving a wide variety of tasks using either of these paradigms. In this paper, we propose a novel multi-task learning approach which can be used to produce more effective neural ranking models. The key idea is to improve the quality of the underlying transformer model by cross-training a retrieval task and one or more complementary language generation tasks. By targeting the training on the encoding layer in the transformer architecture, our experimental results show that the proposed multi-task learning approach consistently improves retrieval effectiveness on the targeted collection and can easily be re-targeted to new ranking tasks. We provide an in-depth analysis showing how multi-task learning modifies model behaviors, resulting in more general models.

FedPS: A Privacy Protection Enhanced Personalized Search Framework

Personalized search returns each user more accurate results by collecting the user’s historical search behaviors to infer her interests and query intents. However, it brings the risk of user privacy leakage, and this may greatly limit the practical application of personalized search. In this paper, we focus on the problem of privacy protection in personalized search, and propose a privacy protection enhanced personalized search framework, denoted with FedPS. Under this framework, we keep each user’s private data on her individual client, and train a shared personalized ranking model with all users’ decentralized data by means of federated learning. We implement two models within the framework: the first one applies a personalization model with a personal module that fits the user’s data distribution to alleviate the challenge of data heterogeneity in federated learning; the second model introduces trustworthy proxies and group servers to solve the problems of limited communication, performance bottleneck and privacy attack for FedPS. Experimental results verify that our proposed framework can enhance privacy protection without losing too much accuracy.

SESSION: Session: Bias and Fairness

Auditing for Discrimination in Algorithms Delivering Job Ads

Ad platforms such as Facebook, Google and LinkedIn promise value for advertisers through their targeted advertising. However, multiple studies have shown that ad delivery on such platforms can be skewed by gender or race due to hidden algorithmic optimization by the platforms, even when not requested by the advertisers. Building on prior work measuring skew in ad delivery, we develop a new methodology for black-box auditing of algorithms for discrimination in the delivery of job advertisements. Our first contribution is to distinguish skew in ad delivery due to protected categories such as gender or race from skew due to differences in qualification among people in the targeted audience. This distinction is important in U.S. law, where ads may be targeted based on qualifications, but not on protected categories. Second, we develop an auditing methodology that distinguishes skew explainable by differences in qualifications from skew due to other factors, such as the ad platform’s optimization for engagement or training its algorithms on biased data. Our method controls for job qualification by comparing ad delivery of two concurrent ads for similar jobs, but for a pair of companies with different de facto gender distributions of employees. We describe the careful statistical tests that establish evidence of non-qualification skew in the results. Third, we apply our proposed methodology to two prominent targeted advertising platforms for job ads: Facebook and LinkedIn. We confirm skew by gender in ad delivery on Facebook, and show that it cannot be justified by differences in qualifications. We fail to find skew in ad delivery on LinkedIn. Finally, we suggest improvements to ad platform practices that could make external auditing of their algorithms in the public interest more feasible and accurate.

Debiasing Career Recommendations with Neural Fair Collaborative Filtering

A growing proportion of human interactions are digitized on social media platforms and subjected to algorithmic decision-making, and it has become increasingly important to ensure fair treatment from these algorithms. In this work, we investigate gender bias in collaborative-filtering recommender systems trained on social media data. We develop neural fair collaborative filtering (NFCF), a practical framework for mitigating gender bias in recommending career-related sensitive items (e.g. jobs, academic concentrations, or courses of study) using a pre-training and fine-tuning approach to neural collaborative filtering, augmented with bias correction techniques. We show the utility of our methods for gender de-biased career and college major recommendations on the MovieLens dataset and a Facebook dataset, respectively, and achieve better performance and fairer behavior than several state-of-the-art models.

The Interaction between Political Typology and Filter Bubbles in News Recommendation Algorithms

Algorithmic personalization of news and social media content aims to improve user experience; however, there is evidence that this filtering can have the unintended side effect of creating homogeneous “filter bubbles,” in which users are over-exposed to ideas that conform with their preexisting perceptions and beliefs. In this paper, we investigate this phenomenon in the context of political news recommendation algorithms, which have important implications for civil discourse.

We first collect and curate a collection of over 900K news articles from 41 sources annotated by topic and partisan lean. We then conduct simulation studies to investigate how different algorithmic strategies affect filter bubble formation. Drawing on Pew studies of political typologies, we identify heterogeneous effects based on the user’s pre-existing preferences. For example, we find that i) users with more extreme preferences are shown less diverse content but have higher click-through rates than users with less extreme preferences, ii) content-based and collaborative-filtering recommenders result in markedly different filter bubbles, and iii) when users have divergent views on different topics, recommenders tend to have a homogenization effect.

Dialect Diversity in Text Summarization on Twitter

Discussions on Twitter involve participation from different communities with different dialects and it is often necessary to summarize a large number of posts into a representative sample to provide a synopsis. Yet, any such representative sample should sufficiently portray the underlying dialect diversity to present the voices of different participating communities representing the dialects. Extractive summarization algorithms perform the task of constructing subsets that succinctly capture the topic of any given set of posts. However, we observe that there is dialect bias in the summaries generated by common summarization approaches, i.e., they often return summaries that under-represent certain dialects.

The vast majority of existing “fair” summarization approaches require socially salient attribute labels (in this case, dialect) to ensure that the generated summary is fair with respect to the socially salient attribute. Nevertheless, in many applications, these labels do not exist. Furthermore, due to the ever-evolving nature of dialects in social media, it is unreasonable to label or accurately infer the dialect of every social media post. To correct for the dialect bias, we employ a framework that takes an existing text summarization algorithm as a blackbox and, using a small set of dialect-diverse sentences, returns a summary that is relatively more dialect-diverse. Crucially, this approach does not need the posts being summarized to have dialect labels, ensuring that the diversification process is independent of dialect classification/identification models. We show the efficacy of our approach on Twitter datasets containing posts written in dialects used by different social groups defined by race or gender; in all cases, our approach leads to improved dialect diversity compared to standard text summarization approaches.

Fairness-Aware PageRank

Algorithmic fairness has attracted significant attention in the past years. In this paper, we consider fairness for link analysis and in particular for the celebrated PageRank algorithm. Given that the nodes in a network belong to groups (for example, based on demographic or other characteristics), we provide a parity-based definition of fairness that imposes constraints on the proportion of PageRank allocated to the members of each group. We propose two families of fair PageRank algorithms: the first (Fairness-Sensitive PageRank) modifies the jump vector of the PageRank algorithm to enforce fairness; the second (Locally Fair PageRank) imposes a fair behavior per node. We then define a stronger fairness requirement, termed universal personalized fairness, that asks that the derived personalized PageRanks of all nodes are fair. We prove that the locally fair algorithms also achieve universal personalized fairness, and furthermore, we prove that this is the only family of algorithms with this property, establishing an equivalence between universal personalized fairness and local fairness. We also consider the problem of achieving fairness while minimizing the utility loss with respect to the original PageRank algorithm. We present experiments with real and synthetic networks that examine the fairness of the original PageRank and demonstrate qualitatively and quantitatively the properties of our algorithms.
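
The mechanism behind the first family, changing the jump vector to shift how much PageRank a group receives, can be illustrated with a plain power-iteration sketch. It does not reproduce the paper's algorithms, which choose the jump vector so that the fairness constraint is met. The function names, the damping value 0.85, the iteration count, and the toy directed cycle are assumptions for illustration.

```python
def pagerank(out_links, jump, gamma=0.85, iters=200):
    """Power iteration for PageRank with an arbitrary jump vector.

    out_links: dict node -> list of successors (each node needs at least one).
    jump: dict node -> restart probability; the values must sum to 1.
    gamma: probability of following a link instead of jumping.
    """
    rank = dict(jump)
    for _ in range(iters):
        # Every node first receives its share of the jump mass ...
        new_rank = {v: (1 - gamma) * jump[v] for v in out_links}
        # ... then each node passes its current rank evenly to its successors.
        for v, succs in out_links.items():
            share = gamma * rank[v] / len(succs)
            for u in succs:
                new_rank[u] += share
        rank = new_rank
    return rank


def group_share(rank, group):
    """Proportion of the total PageRank mass held by the nodes in `group`."""
    return sum(rank[v] for v in group)


# Directed 4-cycle; the group of interest is {0, 1}.
cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
group = {0, 1}

uniform = pagerank(cycle, {v: 0.25 for v in cycle})
biased = pagerank(cycle, {0: 0.5, 1: 0.5, 2: 0.0, 3: 0.0})

uniform_share = group_share(uniform, group)
biased_share = group_share(biased, group)
```

With the uniform jump vector the group holds half of the mass by symmetry, and concentrating the jump vector on the group raises its share. The graph structure still limits how far the share can move, which is one reason the choice of jump vector is posed as a constrained problem.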

SESSION: Session: Recommendations

Cost-Effective and Interpretable Job Skill Recommendation with Deep Reinforcement Learning

Nowadays, as organizations operate in very fast-paced and competitive environments, the workforce has to be agile and able to regularly learn new job skills. However, it is nontrivial for talents to know which skills to develop at each working stage. To this end, in this paper, we aim to develop a cost-effective recommendation system based on deep reinforcement learning, which can provide personalized and interpretable job skill recommendation for each talent. Specifically, we first design an environment to estimate the utilities of skill learning by mining the massive job advertisement data, which includes a skill-matching-based salary estimator and a frequent itemset-based learning difficulty estimator. Based on the environment, we design a Skill Recommendation Deep Q-Network (SRDQN) with multi-task structure to estimate the long-term skill learning utilities. In particular, SRDQN recommends job skills in a personalized and cost-effective manner; that is, the talents will only learn the recommended necessary skills for achieving their career goals. Finally, extensive experiments on a real-world dataset clearly validate the effectiveness and interpretability of our approach.

Personalized Approximate Pareto-Efficient Recommendation

Real-world recommendation systems usually have different learning objectives and evaluation criteria on accuracy, diversity or novelty. Therefore, multi-objective recommendation (MOR) has been widely explored to jointly model different objectives. Pareto efficiency, where no objective can be further improved without hurting others, is viewed as an optimal situation in multi-objective optimization. Recently, the Pareto efficiency model has been introduced to MOR, but all existing scalarization methods only have shared objective weights for all instances. To capture users’ objective-level preferences and enhance personalization in Pareto-efficient recommendation, we propose a novel Personalized Approximate Pareto-Efficient Recommendation (PAPERec) framework for multi-objective recommendation. Specifically, we design an approximate Pareto-efficient learning based on scalarization with KKT conditions that closely mimics Pareto efficiency, where users have personalized weights on different objectives. We propose a Pareto-oriented reinforcement learning module to find appropriate personalized objective weights for each user, with the weighted sum of multiple objectives’ gradients considered in reward. In experiments, we conduct extensive offline and online evaluations on a real-world recommendation system. The significant improvements verify the effectiveness of PAPERec in practice. We have deployed PAPERec on WeChat Top Stories, affecting millions of users. The source code is released at https://github.com/onepunch-cyber/PAPERec.

ELIXIR: Learning from User Feedback on Explanations to Improve Recommender Models

System-provided explanations for recommendations are an important component towards transparent and trustworthy AI. In state-of-the-art research, this is a one-way signal, though, to improve user acceptance. In this paper, we turn the role of explanations around and investigate how they can contribute to enhancing the quality of generated recommendations themselves. We devise a human-in-the-loop framework, called Elixir, where user feedback on explanations is leveraged for pairwise learning of user preferences. Elixir leverages feedback on pairs of recommendations and explanations to learn user-specific latent preference vectors, overcoming sparseness by label propagation with item-similarity-based neighborhoods. Our framework is instantiated using generalized graph recommendation via Random Walk with Restart. Insightful experiments with a real user study show significant improvements in movie and book recommendations over item-level feedback.

Bidirectional Distillation for Top-K Recommender System

Recommender systems (RS) have started to employ knowledge distillation, which is a model compression technique training a compact model (student) with the knowledge transferred from a cumbersome model (teacher). The state-of-the-art methods rely on unidirectional distillation transferring the knowledge only from the teacher to the student, with an underlying assumption that the teacher is always superior to the student. However, we demonstrate that the student performs better than the teacher on a significant proportion of the test set, especially for RS. Based on this observation, we propose a Bidirectional Distillation (BD) framework whereby both the teacher and the student collaboratively improve with each other. Specifically, each model is trained with a distillation loss that makes it follow the other’s prediction, along with its original loss function. For effective bidirectional distillation, we propose a rank discrepancy-aware sampling scheme to distill only the informative knowledge that can fully enhance each other. The proposed scheme is designed to effectively cope with a large performance gap between the teacher and the student. Trained in the bidirectional way, both the teacher and the student turn out to be significantly improved compared to when they are trained separately. Our extensive experiments on real-world datasets show that our proposed framework consistently outperforms the state-of-the-art competitors. We also provide analyses for an in-depth understanding of BD and ablation studies to verify the effectiveness of each proposed component.
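
The idea that each model combines its original loss with a loss toward the other model's prediction can be sketched as follows. This is only a minimal illustration under stated assumptions: binary cross-entropy as both the original and the distillation loss, a single weight `lam`, and the function names are all hypothetical, and the rank discrepancy-aware sampling described in the abstract is not modeled.

```python
import math


def bce(pred, target):
    """Binary cross-entropy between a predicted probability and a (possibly soft) target."""
    eps = 1e-12
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def bidirectional_losses(teacher_pred, student_pred, label, lam=0.5):
    """Return (teacher_loss, student_loss) for one user-item pair.

    Each model's loss is its own supervised loss plus `lam` times a
    distillation term that pulls it toward the other model's prediction.
    `lam` is an assumed hyperparameter for this sketch.
    """
    teacher_loss = bce(teacher_pred, label) + lam * bce(teacher_pred, student_pred)
    student_loss = bce(student_pred, label) + lam * bce(student_pred, teacher_pred)
    return teacher_loss, student_loss


# Example: the student is closer to the true label than the teacher on this pair.
t_loss, s_loss = bidirectional_losses(teacher_pred=0.4, student_pred=0.8, label=1.0)
print(t_loss, s_loss)
```

Setting `lam` to zero recovers two independently trained models, while dropping only the teacher's distillation term recovers the usual unidirectional setup.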

Towards Content Provider Aware Recommender Systems: A Simulation Study on the Interplay between User and Provider Utilities

Most existing recommender systems focus primarily on matching users (content consumers) to content which maximizes user satisfaction on the platform. It is increasingly obvious, however, that content providers have a critical influence on user satisfaction through content creation, largely determining the content pool available for recommendation. A natural question thus arises: can we design recommenders taking into account the long-term utility of both users and content providers? By doing so, we hope to sustain more content providers and a more diverse content pool for long-term user satisfaction. Understanding the full impact of recommendations on both user and content provider groups is challenging. This paper aims to serve as a research investigation of one approach toward building a content provider aware recommender, and evaluating its impact in a simulated setup.

To characterize the user-recommender-provider interdependence, we complement user modeling by formalizing provider dynamics as well. The resulting joint dynamical system gives rise to a weakly-coupled partially observable Markov decision process driven by recommender actions and user feedback to providers. We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the content provider associated with the recommended content, which we show to be equivalent to maximizing overall user utility and the utilities of all content providers on the platform under some mild assumptions. To evaluate our approach, we introduce a simulation environment capturing the key interactions among users, providers, and the recommender. We offer a number of simulated experiments that shed light on both the benefits and the limitations of our approach. These results help understand how and when a content provider aware recommender agent is of benefit in building multi-stakeholder recommender systems.

SESSION: Session: Network Algorithms

Robust Network Alignment via Attack Signal Scaling and Adversarial Perturbation Elimination

Recent studies have shown that graph learning models are highly vulnerable to adversarial attacks, and network alignment methods are no exception. How to enhance the robustness of network alignment against adversarial attacks remains an open research problem. In this paper, we propose a robust network alignment solution, RNA, for offering preemptive protection of existing network alignment algorithms, enhanced with the guidance of effective adversarial attacks. First, we analyze how popular iterative gradient-based adversarial attack techniques suffer from gradient vanishing issues and give a false sense of attack effectiveness. Based on dynamical isometry theory, an attack signal scaling (ASS) method with an established upper bound of feasible signal scaling is introduced to alleviate the gradient vanishing issues for effective adversarial attacks while maintaining the decision boundary of network alignment. Second, we develop an adversarial perturbation elimination (APE) model to neutralize adversarial nodes in vulnerable space to adversarial-free nodes in safe area, by integrating Dirac delta approximation (DDA) techniques and the LSTM models. Our proposed APE method is able to provide proactive protection to existing network alignment algorithms against adversarial attacks. The theoretical analysis demonstrates the existence of an optimal distribution for the APE model to reach a lower bound. Last but not least, extensive evaluation on real datasets shows that RNA is able to offer preemptive protection to trained network alignment methods against three popular adversarial attack models.

Attent: Active Attributed Network Alignment

Network alignment finds node correspondences across multiple networks, where the alignment accuracy is of crucial importance because of its profound impact on downstream applications. The vast majority of existing works focus on how to best utilize the topology and attribute information of the input networks as well as the anchor links when available. Nonetheless, with a few exceptions, how to boost the alignment performance through actively obtaining high-quality and informative anchor links has not been well studied. The sparse literature on active network alignment introduces the human in the loop to label some seed node correspondences (i.e., anchor links), which are informative from the perspective of querying the most uncertain node given few potential matchings. However, the direct influence of the intrinsic network attribute information on the alignment results has largely remained unknown. In this paper, we tackle this challenge and propose an active network alignment method (Attent) to identify the best nodes to query. The key idea of the proposed method is to leverage effective and efficient influence functions defined over the alignment solution to evaluate the goodness of the candidate nodes for query. Our proposed query strategy bears three distinct advantages, including (1) effectiveness, being able to accurately quantify the influence of the candidate nodes on the alignment results; (2) efficiency, scaling linearly with a 15–17× speed-up over the straightforward implementation without any quality loss; (3) generality, consistently improving alignment performance of a variety of network alignment algorithms.

BRIGHT: A Bridging Algorithm for Network Alignment

Multiple networks emerge in a wealth of high-impact applications. Network alignment, which aims to find the node correspondence across different networks, plays a fundamental role for many data mining tasks. Most of the existing methods can be divided into two categories: (1) consistency optimization based methods, which often explicitly assume the alignment to be consistent in terms of neighborhood topology and attribute across networks, and (2) network embedding based methods which learn low-dimensional node embedding vectors to infer alignment. In this paper, by analyzing representative methods of these two categories, we show that (1) the consistency optimization based methods are essentially specific random walk propagations from anchor links that might be too restrictive; (2) the embedding based methods no longer explicitly assume alignment consistency but inevitably suffer from the space disparity issue. To overcome these two limitations, we bridge these methods and propose a novel family of network alignment algorithms BRIGHT to handle both plain and attributed networks. Specifically, it constructs a space by random walk with restart (RWR) whose bases are one-hot encoding vectors of anchor nodes, followed by a shared linear layer. Our experiments on real-world networks show that the proposed family of algorithms BRIGHT outperforms the state of the art for both plain and attributed network alignment tasks.
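
To make the "RWR space with anchor nodes as bases" concrete, the sketch below computes, for a small undirected graph, one random-walk-with-restart score vector per anchor and stacks them so that every node is described by its RWR scores relative to each anchor. The toy graph, the restart probability of 0.15, and the function names are assumptions for illustration; the shared linear layer mentioned in the abstract is omitted, and this is not the authors' implementation.

```python
def rwr_scores(adj, seed, restart=0.15, iters=200):
    """Random walk with restart from a single seed node (one-hot restart vector).

    adj: dict node -> list of neighbors (every node has at least one neighbor)
    """
    nodes = list(adj)
    r = {v: (1.0 if v == seed else 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            share = (1.0 - restart) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        r = nxt
    return r


def anchor_features(adj, anchors):
    """Describe each node by its RWR score with respect to every anchor node."""
    per_anchor = [rwr_scores(adj, a) for a in anchors]
    return {v: [scores[v] for scores in per_anchor] for v in adj}


# Hypothetical path graph 0 - 1 - 2 - 3 with anchors at both ends.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = anchor_features(adj, anchors=[0, 3])
for node, vec in features.items():
    print(node, vec)
```

Because the coordinates are defined relative to anchor nodes that are matched across networks, nodes from two different networks can be placed in a comparable space by running the same construction on each network.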

Sketch-based Algorithms for Approximate Shortest Paths in Road Networks

Constructing efficient data structures (distance oracles) for fast computation of shortest paths and other connectivity measures in graphs has been a promising area of study in computer science [23, 24, 28]. In this paper, we propose very efficient algorithms, based on a distance oracle, for computing approximate shortest paths and alternate paths in road networks. Specifically, we adopt a distance oracle construction that exploits the existence of small separators in such networks. In other words, the existence of a small cut in a graph admits a partitioning of the graph into balanced components with a small number of inter-component edges. We demonstrate the efficacy of our algorithm by using it to find near optimal shortest paths and show that it also has the desired properties of well-studied goal-oriented path search algorithms such as ALT [12]. We further demonstrate the use of our distance oracle to produce multiple alternative routes in addition to the shortest path. Finally, we empirically demonstrate that our method, while exploring few edges, produces high quality alternates with respect to metrics such as optimality-loss and diversity of paths.

Efficient Reductions and a Fast Algorithm of Maximum Weighted Independent Set

The maximum independent set problem is one of the most fundamental problems in graph algorithms and has been widely studied in social networks. The weighted version of this problem, where each vertex is assigned a nonnegative weight, also receives a lot of attention due to its potential applications in many areas. However, many nice properties and fast algorithms for the unweighted version cannot be extended to the weighted version. In this paper, we study the structural properties of this problem, giving some sufficient conditions for a vertex being or not being in a maximum weighted independent set. These properties provide a suite of reduction rules that includes and generalizes almost all frequently used reduction rules for this problem. These rules can efficiently find partial solutions and greatly reduce the instances, especially for sparse graphs. Based on them, we also propose a simple yet practical exact algorithm. To demonstrate the efficiency of our algorithm, we compare it with state-of-the-art algorithms on several well-known datasets from the real world. The experimental results reveal that our exact algorithm is not only faster than existing algorithms but can also exactly solve more hard instances within 1,000 seconds. For remaining infeasible instances, our reduction rules can also improve existing heuristic algorithms by producing higher-quality solutions using less time.
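
As an example of what a reduction rule of this kind looks like, the sketch below applies one simple, well-known sufficient condition: if a vertex weighs at least as much as all of its neighbors combined, some maximum weighted independent set contains it, so the vertex can be taken and its closed neighborhood deleted. This particular rule is chosen here only for illustration; the abstract does not list the paper's rules, and the function name and toy instance are hypothetical.

```python
def neighborhood_weight_reduction(adj, weight):
    """Exhaustively apply the rule: take v if weight[v] >= total weight of its neighbors.

    adj:    dict vertex -> set of neighbors (undirected graph)
    weight: dict vertex -> nonnegative weight
    Returns (chosen, remaining_adj): vertices fixed into the solution and the
    reduced graph on which an exact or heuristic solver would continue.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    chosen = []
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:
                continue  # already deleted earlier in this pass
            if weight[v] >= sum(weight[u] for u in adj[v]):
                chosen.append(v)
                # Delete v together with all of its neighbors.
                for u in adj[v] | {v}:
                    for x in adj.pop(u):
                        if x in adj:
                            adj[x].discard(u)
                changed = True
    return chosen, adj


# Hypothetical instance: path a - b - c - d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
weight = {"a": 5, "b": 3, "c": 4, "d": 1}
chosen, remaining = neighborhood_weight_reduction(adj, weight)
print(chosen, remaining)
```

On this path the rule alone solves the instance; on a unit-weight triangle it does not fire at all, which is why a suite of complementary rules and a fallback search are needed in general.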

SESSION: Session: Auctions and Incentives

Auction Design for ROI-Constrained Buyers

We combine theory and empirics to (i) show that some buyers in online advertising markets are financially constrained and (ii) demonstrate how to design auctions that take into account such financial constraints. We use data from a field experiment where reserve prices were randomized on Google’s advertising exchange (AdX). We find that, contrary to the predictions of classical auction theory, a significant set of buyers lowers their bids when reserve prices go up. We show that this behavior can be explained if we assume buyers have constraints on their minimum return on investment (ROI). We proceed to design auctions for ROI-constrained buyers. We show that optimal auctions for symmetric ROI-constrained buyers are either second-price auctions with reduced reserve prices or subsidized second-price auctions. For asymmetric buyers, the optimal auction involves a modification of virtual values. Going back to the data, we show that using ROI-aware optimal auctions can lead to large revenue gains and large welfare gains for buyers.
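
For readers less familiar with the mechanism being modified, here is a minimal sketch of a standard sealed-bid second-price auction with a reserve price: the highest bidder wins only if the bid meets the reserve, and pays the larger of the reserve and the second-highest bid. The bidder names and numbers are made up, and the ROI constraints, reduced reserves, and subsidies studied in the paper are not modeled.

```python
def second_price_with_reserve(bids, reserve):
    """Run a single-item second-price auction with a reserve price.

    bids:    dict bidder -> bid amount
    reserve: minimum acceptable bid
    Returns (winner, price), or (None, 0.0) if no bid meets the reserve.
    """
    eligible = [(bidder, bid) for bidder, bid in bids.items() if bid >= reserve]
    if not eligible:
        return None, 0.0
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    winner = eligible[0][0]
    # With a single eligible bidder, the reserve acts as the second price.
    price = eligible[1][1] if len(eligible) > 1 else reserve
    return winner, price


bids = {"a": 10.0, "b": 6.0}
print(second_price_with_reserve(bids, reserve=4.0))
print(second_price_with_reserve(bids, reserve=8.0))
print(second_price_with_reserve(bids, reserve=12.0))
```

In this textbook format, raising the reserve never gives a value-maximizing bidder a reason to bid less, which is why the bid reductions observed in the experiment call for a different explanation such as an ROI constraint.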

Bid Prediction in Repeated Auctions with Learning

We consider the problem of bid prediction in repeated auctions and evaluate the performance of econometric methods for learning agents, using a dataset from a mainstream sponsored search auction marketplace. Sponsored search auctions are a billion dollar industry and the main source of revenue of several tech giants. A critical problem in optimizing such marketplaces is understanding how bidders will react to changes in the auction design. We propose the use of no-regret-based econometrics for bid prediction, modeling players as no-regret learners with respect to a utility function, unknown to the analyst. We propose econometric approaches to simultaneously learn the parameters of a player’s utility and her learning rule, and apply these methods to a real-world dataset from the BingAds sponsored search auction marketplace. We show that the no-regret econometric methods perform comparably to state-of-the-art time-series machine-learning methods when there is no co-variate shift, but significantly outperform machine-learning methods when there is a co-variate shift between the training and test periods. This portrays the importance of using structural econometric approaches for predicting how players will respond to changes in the market. Moreover, we show that among structural econometric methods, approaches based on no-regret learning outperform more traditional, equilibrium-based, econometric methods that assume that players continuously best respond to competition. Finally, we demonstrate how the prediction performance of the no-regret learning algorithms can be further improved by considering bidders who optimize a utility function with a visibility bias component.

Towards Efficient Auctions in an Auto-bidding World

Auto-bidding has become one of the main options for bidding in online advertisements, in which advertisers only need to specify high-level objectives and leave the complex task of bidding to auto-bidders. In this paper, we propose a family of auctions with boosts to improve welfare in auto-bidding environments with both return on ad spend constraints and budget constraints. Our empirical results validate our theoretical findings and show that both the welfare and revenue can be improved by selecting the weight of the boosts properly.

Information Elicitation from Rowdy Crowds

We initiate the study of information elicitation mechanisms for a crowd containing both self-interested agents, who respond to incentives, and adversarial agents, who may collude to disrupt the system. Our mechanisms work in the peer prediction setting where ground truth need not be accessible to the mechanism or even exist.

We provide a meta-mechanism that reduces the design of peer prediction mechanisms to a related robust learning problem. The resulting mechanisms are ϵ-informed truthful, which means truth-telling is the highest paid ϵ-Bayesian Nash equilibrium (up to ϵ-error) and pays strictly more than uninformative equilibria. The value of ϵ depends on the properties of the robust learning algorithm, and typically tends to 0 as the number of tasks and agents increases.

We show how to use our meta-mechanism to design mechanisms with provable guarantees in two important crowdsourcing settings even when some agents are self-interested and others are adversarial.

Evaluating the Rationales of Amateur Investors

Social media’s rise in popularity has demonstrated the usefulness of the wisdom of the crowd. Most previous works take into account the law of large numbers and simply average the results extracted from tasks such as opinion mining and sentiment analysis. Few attempt to identify high-quality opinions from the mined results. In this paper, we propose an approach for capturing expert-like rationales from social media platforms without requiring annotated data. By leveraging stylistic and semantic features, our approach achieves an F1-score of 90.81%. The comparison between the rationales of experts and those of the crowd is done from stylistic and semantic perspectives, revealing that stylistic and semantic information provides complementary cues for professional rationales. We further show the advantage of using these superlative analysis results in the financial market, and find that top-ranked opinions identified by our approach increase potential returns by up to 90.31% and reduce downside risk by up to 71.69%, compared with opinions ranked by feedback from social media users. Moreover, the performance of our method on downside risk control is comparable with that of professional analysts.

SESSION: Session: Knowledge Extraction

Information Extraction From Co-Occurring Similar Entities

Knowledge about entities and their interrelations is a crucial factor of success for tasks like question answering or text summarization. Publicly available knowledge graphs like Wikidata or DBpedia are, however, far from being complete. In this paper, we explore how information extracted from similar entities that co-occur in structures like tables or lists can help to increase the coverage of such knowledge graphs. In contrast to existing approaches, we do not focus on relationships within a listing (e.g., between two entities in a table row) but on the relationship between a listing’s subject entities and the context of the listing. To that end, we propose a descriptive rule mining approach that uses distant supervision to derive rules for these relationships based on a listing’s context. Extracted from a suitable data corpus, the rules can be used to extend a knowledge graph with novel entities and assertions. In our experiments we demonstrate that the approach is able to extract up to 3M novel entities and 30M additional assertions from listings in Wikipedia. We find that the extracted information is of high quality and thus suitable to extend Wikipedia-based knowledge graphs like DBpedia, YAGO, and CaLiGraph. For the case of DBpedia, this would result in an increase of covered entities by roughly 50%.

Unsupervised Semantic Association Learning with Latent Label Inference

In this paper, we unify a diverse set of learning tasks in NLP, semantic retrieval and related areas, under a common umbrella, which we call unsupervised semantic association learning (USAL). Examples of this generic task include word sense disambiguation, answer selection and question retrieval. We then present a novel modeling framework to tackle such tasks. The framework introduces, under the deep learning paradigm, a latent label indexing the true target in the candidate target set. An EM algorithm is then developed for learning the deep model and inferring the latent variables, principled under variational techniques and noise contrastive estimation. We apply the model and algorithm to several semantic retrieval benchmark tasks and the superior performance of the proposed approach is demonstrated via empirical studies.

TCN: Table Convolutional Network for Web Table Interpretation

Information extraction from semi-structured webpages provides valuable long-tailed facts for augmenting knowledge graphs. Relational Web tables are a critical component containing additional entities and attributes of rich and diverse knowledge. However, extracting knowledge from relational tables is challenging because of sparse contextual information. Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which only captures related cell information in the same table. In this work, we propose a novel relational table representation learning approach considering both the intra- and inter-table contextual information. On one hand, the proposed Table Convolutional Network model employs the attention mechanism to adaptively focus on the most informative intra-table cells of the same row or column; and, on the other hand, it aggregates inter-table contextual information from various types of implicit connections between cells across different tables. Specifically, we propose three novel aggregation modules for (i) cells of the same value, (ii) cells of the same schema position, and (iii) cells linked to the same page topic. We further devise a supervised multi-task training objective for jointly predicting column type and pairwise column relation, as well as a table cell recovery objective for pre-training. Experiments on real Web table datasets demonstrate our method can outperform competitive baselines by of F1 for column type prediction and by of F1 for pairwise column relation prediction.

Extracting Contextualized Quantity Facts from Web Tables

Quantity queries, with filter conditions on quantitative measures of entities, are beyond the functionality of search engines and QA assistants. To enable such queries over web contents, this paper develops a novel method for automatically extracting quantity facts from ad-hoc web tables. This involves recognizing quantities, with normalized values and units, aligning them with the proper entities, and contextualizing these pairs with informative cues to match sophisticated queries with modifiers. Our method includes a new approach to aligning quantity columns to entity columns. Prior works assumed a single subject-column per table, whereas our approach is geared for complex tables and leverages external corpora as evidence. For contextualization, we identify informative cues from text and structural markup that surrounds a table. For query-time fact ranking, we devise a new scoring technique that exploits both context similarity and inter-fact consistency. Comparisons of our building blocks against state-of-the-art baselines and extrinsic experiments with two query benchmarks demonstrate the benefits of our method.

Searching to Sparsify Tensor Decomposition for N-ary Relational Data

Tensor, an extension of the vector and matrix to the multi-dimensional case, is a natural way to describe the N-ary relational data. Recently, tensor decomposition methods have been introduced into N-ary relational data and become state-of-the-art on embedding learning. However, the performance of existing tensor decomposition methods is not as good as desired. First, they suffer from the data-sparsity issue since they can only learn from the N-ary relational data with a specific arity, i.e., parts of common N-ary relational data. Besides, they are neither effective nor efficient enough to be trained due to the over-parameterization problem. In this paper, we propose a novel method, i.e., S2S, for effectively and efficiently learning from the N-ary relational data. Specifically, we propose a new tensor decomposition framework, which allows embedding sharing to learn from facts with mixed arity. Since the core tensors may still suffer from the over-parameterization, we propose to reduce parameters by sparsifying the core tensors while retaining their expressive power using neural architecture search (NAS) techniques, which can search for data-dependent architectures. As a result, the proposed S2S is not only guaranteed to be expressive but also learns efficiently from mixed arity. Finally, empirical results have demonstrated that S2S is efficient to train and achieves state-of-the-art performance.