Hence, research on the temporal dimension of Web contents opens up great opportunities for analysts. For example, one could compare the notions of “online friends” and “social networks” as of today versus five or ten years back. Similar examples about “tablet PC” or “online music” are relevant for a business analyst or a technology journalist. Similarly, the hyperlink structure of archived material can now be systematically exploited. So, it is possible to see how site (or even domain) structures develop over time, whether they are affected by Web spam or not, and which prevalent structures exist in general within a certain domain.
The focus of TempWeb and the topics it addresses are a natural match with The Web Conference. With digital content born almost two decades ago, the need for a more systematic exploitation of our digital cultural heritage, as well as for new analysis techniques, has become evident. While Web content from the early 1990s is almost completely lost, national libraries, digital news archives and archiving institutions (like the Internet Archive Foundation) have protected later Web content from vanishing. However, the societal as well as scientific impact of temporal Web analytics has not been sufficiently studied. As The Web Conference is the premier event series in this domain, we consider TempWeb an ideal venue to exchange knowledge about temporal analytics at web scale with experts from science and industry.
This is the 7th edition of the annual workshop series labeled “WebAndTheCity – Web Intelligence and Smart Cities”, which started back in Florence in 2015 and has taken place every year since in conjunction with the WWW conference series. Last year the workshop was held virtually in Taipei during The Web Conference 2020. The workshop series aims to investigate the role of the Web and Web applications in smart city (SC) growth. This year, the workshop focuses on SC resilience, with the inclusion of Web Intelligence that enables self-sensing, early alerts, and self-adaptation of smart services in SCs. SCs seem to have failed so far in recognizing Covid-19 and other risks, and questions arise: how can an SC self-detect and analyze risks from its environment, and how can it generate alerts and communicate with the appropriate smart components and city stakeholders against these risks?
Chiara Renso and Jeanna Matthews
Third Workshop on Fairness, Accountability, Transparency, Ethics and Society on the Web (FATES 2021)
The workshop will promote discussion around the critical questions of fairness, accountability, transparency and ethics, and join forces towards a Web that is truly inclusive, transparent and open. Personal data collected from people via social media and mobile devices, often considered sensitive information, has been extensively used by systems for a number of purposes, including user behavior forecasting, content recommendation and fraud detection. User behavior, in turn, changes based on the algorithms that users are exposed to. Recent studies have revealed that many machine-learning based systems exhibit biases, including racial and gender bias. This scenario raises new challenges concerning algorithmic fairness and accountability, transparency of machine-learning models, tools to deal with privacy matters, and the ethics of modeling and analyzing online communities, such as social media interactions, mobility data, political engagement networks, healthcare communities, and so on. The goal of this workshop is to gather researchers and developers from academia, industry, and civil society to present and debate these topics and the importance of developing better AI systems on the Web. To achieve this, we will seek contributions that describe research initiatives, projects, results, design techniques and experiments being developed to deal with fairness, accountability, transparency, and ethics in AI, as well as privacy. In this sense, we encourage submissions at various degrees of progress, such as new results, visions, techniques, innovative application papers, and progress reports.
Modern society faces an unprecedented number of events that impact countries, communities and economies around the globe, across language, country and community borders. Recent examples include sudden or unexpected events such as terrorist attacks, a world-wide pandemic and political shake-ups such as Brexit, as well as longer, ongoing and evolving topics such as the migration crisis in Europe, that regularly spawn events of global importance affecting local communities. These developments result in a vast amount of event-centric, multilingual information available from heterogeneous sources on the Web, in the Web of Data, within Knowledge Graphs, in social media, inside Web archives and in news sources. Such event-centric information differs across sources, languages and communities, potentially reflecting community-specific aspects, opinions, sentiments and bias.
The theme of the CLEOPATRA workshop – event-centric multilingual analytics – includes a variety of interdisciplinary challenges related to analysis, interaction with and interpretation of vast amounts of event-centric textual, semantic and visual information in multiple languages originating from different communities. The objective of the workshop is to bring together researchers and practitioners interested in the development of methods for analysing event-centric multilingual information.
The CLEOPATRA workshop will be a highly interactive event, which will include keynotes by experts in the relevant fields, poster and demo sessions, research presentations and discussion.
Training, retraining and deploying web-scale machine learning models requires large amounts of high-quality data. Often, this is achieved via a time-consuming, labor-intensive human annotation process. While web-scale applications offer an abundance of unlabeled, often extremely noisy data, there is a severe lack of high-quality labeled data from which practitioners can train ML models that perform well in customer-facing applications. To this end, it is imperative that ML scientists and engineers devise innovative ways to deal with the constrained setting of small amounts of labeled data, and make the best use of the limited (time and monetary) budget available to obtain annotated data. Thus, one needs to train data-efficient machine learning models. This need has led to the proliferation of creative techniques such as data augmentation, transfer learning, self-supervised learning, active learning, and multi-task learning, to name a few. While many of these techniques have been shown to work well in specific settings, web data offers additional challenges. Web data is multi-modal in nature, it carries implicit signals from user interactions, and it often involves multiple agents. Given the uniqueness, importance, and growing interest in these problems, the workshop on Data-efficient Machine Learning for Web Applications (DeMaL) is a venue to present ideas and solutions to these problems. The full-day workshop aims to bring together practitioners in both academia and industry working on the collection, annotation and usage of labeled data for large-scale web applications.
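As a toy illustration of one data-efficiency technique mentioned above, active learning, the following sketch selects the most uncertain unlabeled examples for human annotation. The entropy scoring, toy model, and budget are illustrative assumptions, not part of any specific workshop contribution:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(unlabeled, predict_proba, budget):
    """Pick the `budget` most uncertain examples to send to annotators.

    `predict_proba` is any model hook returning a class distribution;
    the examples the current model is least sure about are the ones
    whose labels are expected to be most informative.
    """
    scored = [(entropy(predict_proba(x)), x) for x in unlabeled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:budget]]

# Hypothetical model: confidence grows with the input's magnitude.
def toy_predict_proba(x):
    p = min(0.99, 0.5 + abs(x) / 10)
    return [p, 1 - p]

pool = [0.1, 4.0, 0.5, 3.0, 0.2]
picked = select_for_annotation(pool, toy_predict_proba, budget=2)
# picked holds the two inputs the toy model is least confident about
```

With a fixed labeling budget, spending annotations on high-entropy examples typically yields better models than labeling a uniform random sample.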
Paolo Manghi, Andrea Mannocci, Francesco Osborne, Dimitris Sacharidis, Angelo Salatino and Thanasis Vergoulis
1st International Workshop on Scientific Knowledge Representation, Discovery, and Assessment (Sci-K)
In the last decades, we have experienced a substantial increase in the volume of published scientific articles and related research objects (e.g., data sets, software packages); a trend that is expected to continue. This opens up fundamental challenges including generating large-scale machine-readable representations of scientific knowledge, making scholarly data discoverable and accessible, and designing reliable and comprehensive metrics to assess scientific impact. The main objective of Sci-K is to provide a forum for researchers and practitioners from different disciplines to present, educate from, and guide research related to scientific knowledge. Specifically, we foresee three main themes that cover the most important challenges in the field: representation, discoverability, and assessment.
Representation. There is an urgent need for flexible, context-sensitive, fine-grained, and machine-actionable representations of scholarly knowledge that are at the same time structured, interlinked, and semantically rich: Scientific Knowledge Graphs (SKGs). These resources can power several data-driven services for navigating, analysing, and making sense of research dynamics. Current challenges are related to the design of ontologies able to conceptualise scholarly knowledge, model its representation, and enable its exchange across different SKGs.
Discoverability. It is important that scholarly information is easily findable, discoverable, and visible, so that it can be mined and organised within SKGs. Hence, we need discovery tools able to crawl the Web and identify scholarly data, whether on a publisher’s website or elsewhere – institutional repositories, preprint servers, open-access repositories, and others. This is a particularly challenging endeavour as it requires a deep understanding of both the scholarly communication landscape and the needs of a variety of stakeholders: researchers, publishers, funders, and the general public. Other challenges are related to the discovery and extraction of entities and concepts, integration of information from heterogeneous sources, identification of duplicates, finding connections between entities, and identifying conceptual inconsistencies.
Assessment. Due to the continuous growth in the volume of research output, rigorous approaches for the assessment of research impact are now more valuable than ever. In this context, we urgently need reliable and comprehensive metrics and indicators of the scientific impact and merit of publications, datasets, research institutions, individual researchers, and other relevant entities.
Wenzhong Guo, Chin-Chen Chang, Eyhab Al-Masri, Chi-Hua Chen, Haishuai Wang and K. Shankar
The 1st International Workshop on Deep Learning for the Web of Things
In recent years, Internet of Things (IoT) and Web of Things (WoT) techniques have become increasingly popular for collecting sensing data and building intelligent services and applications. Several organizations (e.g., oneM2M, the AllSeen Alliance, the Open Connectivity Foundation (OCF), and IEEE) have been established to define IoT standards and specifications for building an IoT ecosystem. These standards and specifications address data models, unique identification of things, service descriptions and dependencies, discovery, trust management, real-time control, and cyber-physical systems. For instance, the AllSeen Alliance and OCF designed discovery and advertisement mechanisms that send multicast packets to find devices exposing a target interface, over wireless local area networks based on IEEE 802.11 or personal area networks based on IEEE 802.15, for building a self-organizing network. Devices can then follow the data models and control methods in these specifications to control other AllJoyn or OCF devices in IoT applications. However, communication across different IoT standards and specifications remains an important challenge, so the interoperation of services across platforms based on different IoT standards and specifications needs to be investigated. For example, the Interworking Proxy Entity (IPE) was designed in oneM2M’s Release 2 to connect oneM2M, AllJoyn, OCF, and Lightweight M2M. The WoT defined by the World Wide Web Consortium (W3C) focuses on web technologies for combining and interoperating the IoT with the web of data. Developers can use WoT techniques to collect sensing data and control devices across different IoT standards and specifications for applications in agriculture, energy, enterprise, finance, healthcare, industry, public services, residency, retail, and transportation.
Furthermore, deep learning techniques (e.g., neural networks (NN), convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM)) have been widely applied to image recognition and time-series inference in IoT and WoT applications. Advanced driver assistance systems and autonomous cars, for instance, have been developed based on machine learning and deep learning techniques, performing forward collision warning, blind spot monitoring, lane departure warning, traffic sign recognition, traffic safety, infrastructure management and congestion control, and so on. Autonomous cars can share their detected information (e.g., traffic signs and collision events) with other cars via vehicular communication systems (e.g., dedicated short range communications (DSRC), vehicular ad hoc networks (VANETs), long term evolution (LTE), and 5th generation mobile networks) for cooperation. However, enhancing the performance and efficiency of these deep learning techniques is one of the big challenges in implementing such real-time applications. Several optimization techniques (e.g., the stochastic gradient descent algorithm (SGD), the adaptive moment estimation algorithm (Adam), and Nesterov-accelerated adaptive moment estimation (Nadam)) have been proposed to support deep learning algorithms in searching for solutions faster; for example, gradient descent is a popular optimization technique for quickly seeking the optimized weight sets and filters of a CNN for image recognition. IoT and WoT applications based on these image recognition techniques (e.g., autonomous cars and augmented reality navigation systems) have gained considerable attention, and hybrid approaches typical of mathematics for engineering and computer science (e.g., combining deep learning and optimization techniques) can be investigated and developed to support a variety of IoT and WoT applications.
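To make the optimizer discussion concrete, here is a minimal sketch of the Adam update rule on a single scalar parameter, minimizing a toy quadratic loss; the learning rate, step count, and loss function are illustrative choices, not values from any cited system:

```python
import math

def adam_minimize(grad, w, steps=2000, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One-parameter Adam: adaptive moment estimation over `steps` updates."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_star = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
```

The same per-parameter update, applied element-wise to every weight and filter of a CNN, is what frameworks run under the hood; the bias-correction terms are what distinguish Adam from plain momentum-based SGD.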
Eyhab Al-Masri and Di Wang
First International Workshop on the Efficiency of Modern Datacenters (EMDC)
Major research challenges in the operation of data centers include performance, power efficiency, availability, scalability, and security, among many others. As the number of Internet of Things (IoT) devices proliferates, data center capabilities will transcend basic management operations. That is, traditional management capabilities for CPU, memory and input/output operations need to be augmented with more advanced IoT-based management capabilities covering items such as temperature sensors, fan speed sensors, power sensors, and moisture sensors, among many others. Many modern data centers today continuously collect and aggregate a wide range of telemetry data in order to avoid critical downtime. For example, as the heat load of modern data centers increases, the ability to monitor and manage ambient temperature becomes ever more vital for the availability, reliability, serviceability, safety, manageability and scalability of these mission-critical assets. However, such management capabilities also consume network bandwidth, computational processing power and data storage. Therefore, we need more rigorous architectures and design methods for efficient modern data centers, more sophisticated design and simulation tools, reliable equipment and software system benchmarks, and accurate performance evaluation methods, among many others. This workshop will provide researchers and practitioners with a venue to discuss the efficiency of modern data centers. The workshop’s ambition is to help shape a community of interest around the existing research opportunities and challenges associated with the engineering design and management of modern data centers. In this context, we believe a dedicated workshop that brings researchers and practitioners together will help investigate innovative ideas and approaches to this new research challenge, with a main focus on the efficiency of modern data centers, and will foster collaborations and the exchange of points of view.
Cheng-Te Li and Lun-Wei Lu
The 9th International Workshop on Natural Language Processing for Social Media (SocialNLP 2021)
SocialNLP is a new interdisciplinary area of natural language processing (NLP) and social computing. We consider three plausible directions for SocialNLP: (1) addressing issues in social computing using techniques from artificial intelligence; (2) solving NLP problems using information from online social networks or social media; and (3) handling new problems that involve both social media analysis and natural language processing. We aim to organize the ninth SocialNLP workshop at The Web Conference 2021 for the following reasons. First, social media analysis and sentiment analysis are two research topics closely related to natural language processing, and their development depends heavily on NLP techniques because of their textual data. In recent WWW conferences, whether judged by the number of submissions or of participants, they are clearly two of the largest research communities. Therefore, we believe that the SocialNLP workshop can draw much interest and attract a large audience of potential academic and industrial participants of WWW. Second, social media data is generated and collected by online social services, which have accumulated a large amount of user-generated social data, i.e., big social data. Processing such big social data with Web AI knowledge and NLP techniques raises many important research problems, so NLP and AI researchers may find inspiration and useful information at the SocialNLP workshop. Third, user-generated data in social media is mainly textual. Theories and techniques of Web artificial intelligence and natural language processing are needed for semantic understanding, accurate search, and efficient processing of social media content.
From an application perspective, novel online applications involving social media analysis and sentiment analysis, such as emergency management, social recommendation, user behavior analysis, user social community analysis and future prediction, are topics to which NLP/Web/AI/social media researchers have paid close attention.
Through this workshop, we provide a platform for research presentations and head-to-head discussions in the area of SocialNLP, with the hope of combining the insight and experience of prominent researchers from both the NLP and social computing fields to jointly contribute to the area. Seeing the high impact of the Covid-19 pandemic this year, we have been considering what we, and NLP and social media analytics techniques, can do to help mitigate its effects. A serious issue that obviously makes the situation even worse is fake news about Covid-19. For example, belief in the fake news claim that drinking pure alcohol prevents infection caused 30 deaths in Turkey. Considering that our workshop’s previous participants have developed technologies and systems for social media data and emotion analysis, we have specially designed the CovidFake-EmoReact challenge this year.
While supervised and unsupervised learning have been extensively used for knowledge discovery for decades and have achieved immense success, much less attention was paid to reinforcement learning in knowledge discovery until the recent emergence of deep reinforcement learning (DRL). By integrating deep learning into reinforcement learning, DRL is capable not only of continuous sensing and learning to act, but also of capturing complex patterns with the power of deep learning. Recent years have witnessed the enormous success of DRL in numerous domains, such as the game of Go, video games, and robotics, leading to increasing advances in DRL for knowledge discovery. For instance, RL-based recommender systems have been developed to produce recommendations that maximize user utility (reward) in the long run for interactive systems; RL-based traffic signal systems have been designed to control traffic lights in real time to enhance traffic efficiency in urban computing. Similar excitement has been generated in other areas of knowledge discovery, such as graph optimization, interactive dialogue systems, and big data systems. While these successes show the promise of DRL, transferring learning from game-based DRL to knowledge discovery is fraught with unique challenges, including, but not limited to, extreme data sparsity, power-law distributed samples, and large state and action spaces. Therefore, it is timely and necessary to provide a venue that can bring together academic researchers and industry practitioners (1) to discuss the principles, limitations and applications of DRL for knowledge discovery; and (2) to foster research on innovative algorithms, novel techniques, and new applications of DRL to knowledge discovery.
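As a minimal sketch of the RL-based recommendation idea mentioned above, the following epsilon-greedy bandit treats each recommendation as an action and each click as a reward; the click rates, round count, and exploration rate are hypothetical values chosen purely for illustration:

```python
import random

def epsilon_greedy_recommender(true_click_rates, rounds=5000,
                               epsilon=0.1, seed=0):
    """Learn which item to recommend from simulated click feedback.

    A context-free bandit is the simplest RL view of recommendation:
    each recommendation is an action, each click a reward, and the
    policy balances exploring items against exploiting the best so far.
    """
    rng = random.Random(seed)
    n = len(true_click_rates)
    clicks, shows = [0] * n, [0] * n
    for _ in range(rounds):
        if rng.random() < epsilon:            # explore a random item
            item = rng.randrange(n)
        else:                                 # exploit the best estimate
            item = max(range(n),
                       key=lambda i: clicks[i] / shows[i] if shows[i] else 1.0)
        shows[item] += 1
        clicks[item] += rng.random() < true_click_rates[item]  # simulate click
    # Return the item with the highest empirical click-through rate.
    return max(range(n), key=lambda i: clicks[i] / shows[i] if shows[i] else 0.0)

best = epsilon_greedy_recommender([0.02, 0.05, 0.30])
```

Real DRL recommenders replace the per-item click-rate table with a neural value function or policy over user and item features, which is exactly where the sparsity and large action-space challenges noted above arise.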
Due to the exponential growth in the number of scientific articles published in the biomedical domain, obtaining the articles most relevant to a topic of interest, integrating knowledge from various studies, and finding reliable and scientifically sound studies present significant challenges. While traditional term-based information retrieval and machine learning techniques can be employed for literature search, ranking and integration, such approaches lack an effective mechanism for retrieving scientific articles that contain domain-specific terminology, phrases, and abbreviations, where text can have differing and ambiguous semantics depending on context and domain. Knowledge representation and semantics-enabled techniques have already shown the potential to systematically curate, organize, retrieve and interpret content in ways that relate well to human understanding.
The recent major collaborative effort for rapid and effective retrieval of literature around COVID-19 is a strong indication for the need to develop effective retrieval techniques specifically tailored for the biomedical domain. The White House Office of Science and Technology Policy along with major institutes such as Chan Zuckerberg Initiative, Microsoft Research, the Allen Institute for Artificial Intelligence, and the National Institutes of Health’s National Library of Medicine, among others, shared the COVID-19 Open Research Dataset, known as CORD-19, which consists of 30,000 scientific articles about the virus known as SARS-CoV-2. One of the objectives behind the release of such a dataset is for researchers to build tools and techniques that can identify and effectively retrieve information from the literature.
To this end, the main objective of this workshop is to bring together researchers from the Web search, semantic Web, and biomedical communities to present and discuss the latest methods and results in biomedical information and knowledge representation, integration, and retrieval. Given that searching for and retrieving all studies that address a research question is one of the initial and most important tasks in devising a systematic review, we also aim to investigate the application of semantics and Web search to creating biomedical systematic literature reviews. The distinguishing feature of this workshop is its focus on leveraging semantic techniques for information retrieval from the biomedical literature. This means it pays rigorous attention to semantic-based techniques for the creation, analysis, and integration of biomedical knowledge bases, with the ultimate objective of employing such knowledge bases to improve search performance over the biomedical literature.
Miriam Redi, Robert West and Leila Zia
8th Wiki Workshop
The goal of Wiki Workshop is to bring together the volunteers, policy makers, and researchers exploring all aspects of the Wikimedia projects such as Wikipedia, Wikidata, and Wikimedia Commons. With members of the Wikimedia Foundation’s Research team in the organizing committee and with the experience of 7 successful previous workshops in 2015 (at ICWSM), 2016 (at WWW and ICWSM), 2017 (at WWW), 2018 (at TheWebConf), 2019 (at TheWebConf), 2020 (at TheWebConf) we aim to continue facilitating a direct pathway for exchanging ideas between researchers, policy makers, volunteers and practitioners of the Wikimedia projects.
FinTech is an emerging and popular topic in both the financial and engineering domains, and the Internet and mobile technology are key drivers of the FinTech revolution. In Bank 3.0, financial institutions put financial service functions into their websites, so that customers can perform operations such as transfers by themselves via the site. In the past five years, mobile banking has become all the rage as mobile devices have grown more and more prevalent, and some of the infrastructure for developing Bank 4.0 is already in place. We think that it is a good time to hold a forum to discuss possible application scenarios for using information from the Web in the FinTech field.
Expressing opinions and interacting with others on the Web has led to the production of an abundance of online discourse data, such as claims and viewpoints on controversial topics, their sources and contexts (events, entities). This data constitutes a valuable source of insights for studies into misinformation spread, bias reinforcement, echo chambers or political agenda setting. While knowledge graphs (KGs) promise to provide the key to a Web of structured information, they are mainly focused on facts without keeping track of the diversity, connection or temporal evolution of online discourse data. As opposed to facts, claims are inherently more complex. Their interpretation strongly depends on the context and a variety of intentional or unintended meanings, where terminology and conceptual understandings strongly diverge across communities from computational social science, to argumentation mining, fact-checking, or viewpoint/stance detection. This workshop aims at strengthening the relations between these communities, providing a forum for shared works on the modeling, extraction and analysis of discourse on the Web. It will address the need for a shared understanding and structured knowledge about discourse data in order to enable machine-interpretation, discoverability and reuse, in support of scientific or journalistic studies into the analysis of societal debates on the Web.
Workshop Data Science for Social Good
The need for social innovation is clear: from climate change to income inequality to geopolitical upheaval and terrorism, the difficulties confronting us are unprecedented not only in their variety but also in their complexity. At the same time, today’s public policy practices and tools are not sufficient. It is increasingly clear that we need not only new solutions but new methods of arriving at solutions.
Data and data science will become more central to meeting these challenges and to social innovation, philanthropy, international development and humanitarian aid. From analyzing satellite imagery to map poverty, to using Facebook data to track the global digital gender gap, “Data Science for Social Good” holds great promise. Data from corporate actors (e.g., mobile phone data, remote sensing, satellite imagery), as well as the digital traces generated by the pervasiveness of the Web, can be combined with state-of-the-art data science and synergistically exploited to address many social problems and support global agencies and policymakers in implementing better and more impactful policies and interventions.
Yet, for all of data’s potential to address public challenges, the truth remains that most of the data assets and data science capabilities available today are not yet sufficiently applied to solving public problems. Because of a lack of awareness of this potential, restrictions on funding and data access, and often poorly distributed data science capacity, much of it goes untapped.
This workshop will review the potential and emerging field of data science for good – as well as how to develop new partnerships for the data age (Data Collaboratives) to unlock both data and data science capabilities.
We will gather researchers from the fields of data science, machine learning and artificial intelligence together with experts in the social and political sciences to present and discuss applications of data science with a high social impact; and will examine why, where, how and under what conditions data science can advance social objectives.
Online news websites have gained huge popularity for digital news reading. News recommendation and intelligence techniques are critical for news websites to improve users’ reading experience and alleviate information overload. However, news recommendation and intelligence face special challenges due to the rich textual information of news articles and the abundant behaviors of users. First, NLP techniques are at the heart of understanding news. Second, news articles usually have short lifecycles, which leads to a severe cold-start problem.
Besides, it is also difficult to model diverse and evolving user interests. Thus, we propose the 1st International Workshop on News Recommendation and Intelligence, which aims to promote news recommendation and other news intelligence techniques for improving users’ reading experience. It will provide rich opportunities for researchers and practitioners to disseminate their ideas and knowledge on news recommendation and intelligence. Meanwhile, this workshop will hold a shared task on news recommendation, which can set up a good testbed for news recommendation research. We hope this full-day workshop can facilitate research on news recommendation, news intelligence and other related fields.
Loïc Barrault, Erik Cambria, Giuseppe Castellucci, Simone Filice and Elman Mansimov
NLP Beyond Text – 2nd Edition Workshop
The web is an endless source of information that can be provided in different formats, including text, image, video and audio. At the same time, the way people access the web is rapidly evolving: new forms of interaction (e.g., visual or voice-based) are complementing or replacing traditional text-based interaction. Conversational assistants, such as Amazon Alexa and Google Assistant, represent a notable example of advanced human-machine interaction. Therefore, the capability of processing multi-modal data and interpreting user gestures, voice and facial expressions is becoming crucial in modern systems. The goal of this workshop is to promote research in the context of multi-modal and cross-modal NLP.
Luis Garcia Pueyo, Anand Bhaskar, Prathyusha Senthil Kumar, Panayiotis Tsaparas, Kiran Garimella, Yu Sun and Francesco Bonchi
Workshop on Misinformation Integrity in Social Networks
The Workshop on Misinformation Integrity in Social Networks, MISINFO 2021, to be held in conjunction with the 30th edition of The Web Conference (WWW) in Ljubljana, Slovenia, aims to bring together researchers and practitioners to discuss misinformation integrity challenges in social networks and social media platforms. The event consists of (1) a series of invited talks by reputed members of the integrity community from both academia and industry, (2) a call for papers for contributed talks, and (3) a panel with the speakers.
Social media platforms and the web in general play an outsized role in the media consumption process. They have expanded the reach of media messaging through advertising and digital publications, and given anyone with internet access a mechanism for expressing opinions and views. The flip side of this expanded access is that these platforms harbor the potential for attacks on and abuse of information processes: misinformation campaigns organized by foreign adversaries and financially motivated actors, misleading and polarizing views from the extremes of the political spectrum receiving viral distribution, and general fake-news/misinformation tactics emerging as new threats.
LocWeb2021 is a full-day workshop at The Web Conference 2021. It will run for the 11th time with evolving topics around location-aware information access, Web architecture, spatial social computing, and social good. It is designed as a meeting place for researchers around the location topic at The Web Conference. We expect to draw 30–40 participants in a mixed mini-conference and discussion format.
Ashutosh Joshi, Shailendra Agarwal, Vaclav Petricek and Atul Saroop
Workshop on Multilingual Search
In this first Multilingual Search workshop, we aim to bring together researchers and practitioners from across the world, and, in particular, from different disciplines, such as information retrieval, data mining, machine learning, data science, NLP/NLU, machine translation, transfer learning and other related areas to share their ideas and research achievements in providing a seamless search experience in a multilingual setting.
Ying Ding, Benjamin Glicksberg, Jim Hendler, Mark Musen, Fei Wang, Yifan Peng, Gq Zhang and Marinka Zitnik
International Workshop on AI in Health: Transferring and Integrating Knowledge for Better Health
Rich medical concepts connected by semantic relationships can integrate EHR data into knowledge graphs that enable knowledge-intensive discoveries, but this remains an open field with many challenges. For example, data cannot be easily shared across hospital systems due to privacy, security, and policy issues. In particular, EHRs are embedded in different commercial vendor systems, which makes their integration extremely troublesome. Even though the dramatic increase in healthcare data offers unprecedented opportunities for evidence-based care, the interoperability of EHRs and the mining of integrated EHRs are still open to innovative solutions. In this workshop, we welcome researchers from various domains to discuss and share the latest progress in knowledge representation, computer vision, deep learning, knowledge graphs, deep graph mining, and natural language processing, and to promote innovative semantic approaches that address pressing needs in healthcare.
Bing Yin, Luna Dong, Haixun Wang and Chao Zhang
PKG4Ecommerce: Product Knowledge Graph for E-Commerce Workshop 2020
The global e-Commerce market is valued at USD 9.09 trillion, with an annual growth rate of 14.7%. The 2020 pandemic dramatically changed people's lifestyles, and e-Commerce will further accelerate its growth and penetration into people's daily lives. E-Commerce websites and apps are among the most visited destinations in people's daily routines. Customers want e-Commerce websites and apps to act as a personal assistant that finds the exact products they are searching for, provides recommendations when they are not sure which products to buy, and answers questions about product details. Extracting structured knowledge about e-Commerce products from their text descriptions, images, reviews, and customer interaction logs is key to building a delightful shopping experience for search, recommendation, advertising, and product QA. Many challenges in building a product knowledge base can benefit from the lessons learned in building the semantic web. On the other hand, the unique data in e-Commerce can spark new research directions in the Web Conference community. The PKG4Ecommerce workshop aims to bring together researchers from both academia and industry labs to exchange notes and take the pulse of the state of the art in building intelligent shopping assistants with product knowledge mining and management.
Yuxiao Dong, Danai Koutra and Qiaozhu Mei
Graph Learning Benchmarks (GLB 2021)
We organize a half-day workshop that calls on the Web community to crowdsource diverse benchmark datasets and tasks for graph representation learning. Real-world graph data are ubiquitous, heterogeneous, and diverse, and a variety of machine learning tasks are formulated on top of graph-structured data at the node, edge, or graph level. While recent literature has demonstrated that certain families of graph neural networks achieve promising performance on tasks such as node classification, graph classification, and link prediction, most of the reported success can only be verified and compared on a rather limited set of publicly available benchmark datasets. Increasingly, further improvements on different tasks (especially tasks beyond node and link prediction) face dramatically different technical bottlenecks. The lack of diversity in existing benchmark datasets may also have biased the development of graph (representation) learning techniques toward narrow directions.
By crowdsourcing novel tasks and datasets, the proposed workshop aims to increase the diversity of graph learning benchmarks, identify new demands on graph representation learning in general, and gain a better understanding of how concrete techniques perform on these benchmarks. Unlike most existing graph-learning workshops, which focus on the development of new models and algorithms, the proposed workshop will focus on discovering novel datasets, tasks, and applications and on how they can be formalized as new benchmarks. The contributed papers will be evaluated based on the meaningfulness of the proposed tasks/datasets, their potential to become new benchmarks for graph learning, and their contribution to understanding the pros and cons of graph neural networks. In particular, part of our goal is to discover benchmark tasks where existing popular graph learning methods fail, which would provide valuable insights for the development of new methodology. The expected outcome of the workshop includes a collection of contributed papers, new benchmark datasets/tasks, and a summary paper giving a taxonomy of the new datasets.
Jie Tang, Yuxiao Dong and Zhilin Yang
The International Workshop on Self-Supervised Learning for the Web (S2L@WWW 2021)
Over the past few years, self-supervised learning has achieved great success across a variety of domains, such as natural language processing, computer vision, and speech recognition. The promise of self-supervised learning is to leverage the input data itself as the supervision signal, yielding models that can be as powerful as techniques trained with dedicated label information. In other words, it does not require task-specific labeled data, which is often arduously expensive to obtain at scale. Despite its soaring performance on text and image tasks, self-supervised learning for problems on the Web, e.g., retrieval, recommendation, graph mining, and social network analysis, remains largely unexplored.
The Web is a treasure trove of user experiences, knowledge, and multi-modal data that presents great opportunities for artificial intelligence. The Web itself, with its huge amounts of multi-modal data (text, images, graphs, etc.), multiple types of entities (e.g., users, documents, organizations), user behaviors, and relations between entities, has become so large and complex that traditional methodologies are inadequate. Over the last two decades, the conventional paradigm of mining and learning from Web data has involved a massive scale of manual effort in data labeling, much of which requires extensive and specific domain knowledge.
In light of these issues, it is highly promising to leverage the power of self-supervised learning to facilitate classical Web mining tasks. We therefore propose to host a dedicated workshop at WWW 2021 to explore the potential of self-supervised learning for the Web. The workshop is timely for connecting scholars working on self-supervised learning across the machine learning, NLP, computer vision, graph neural network, deep learning, artificial intelligence, and Web communities, and it will offer a platform for discussing and identifying the field's main challenges, future directions, and opportunities.
Mahdi Bohlouli and Maria-Esther Vidal
The 3rd Innovation Workshop on Transforming Big Data into Actionable Knowledge (BiDAW)
Today, the world is changing rapidly through the integration of new technologies. Digitalization in general, and big data and next-generation Artificial Intelligence (AI) in particular, are impacting our lives as well as business processes. In this regard, the World Wide Web (WWW) plays a key role. The operational environment, as well as all digital services on the web, should respond to these changes and yield actionable insights that support decision making in various domains. Industrial integration of technologies that enable such change includes knowledge graphs, explainable and conversational AI, human-centered AI, as well as deep learning, adversarial networks, and reinforcement learning. The 3rd BiDAW workshop aims to bring together practitioners and researchers from the areas of the semantic web, the WWW, artificial intelligence, and databases to discuss research issues and experiences in developing and deploying concepts, techniques, and applications that address various issues related to big data and actionable knowledge. We believe the BiDAW workshop will provide a productive and promising brainstorming environment for business leaders and academics working to transform big data and next-generation AI into actionable knowledge.