Scope and Topics

This workshop is dedicated to pioneering research at the intersection of information seeking (e.g., information retrieval, recommendation, and question-answering) and the transformative potential of big models (e.g., language, visual, audio, and multimodal models), with a specific focus on five key aspects. (1) We encourage exploring how big models can revolutionize information retrieval strategies, emphasizing advancements that enhance efficiency and effectiveness in accessing information. (2) We encourage active researchers to use big models to advance recommender algorithms, enhancing user modeling for improved personalized recommendations. (3) We encourage groundbreaking research in question-answering systems that leverages the capabilities of big models to provide accurate and contextually relevant responses. (4) We emphasize the critical role of trustworthiness when employing big models for information seeking, to ensure that generated content is reliable and complies with ethical and legal standards. (5) We encourage researchers to design robust evaluation methodologies, standards, and human evaluation paradigms to comprehensively assess the impact of big models on information seeking.

In summary, the workshop will focus on exploring the challenges and opportunities of integrating big models into information seeking, covering areas such as information retrieval, recommendation, and question-answering. Our aim is to offer a platform for participants to present their research, share experiences, and foster collaborative discussions.

Call for Papers

In recent years, the fields of data mining and machine learning have experienced a surge in the scale and complexity of models. These large-scale models, often referred to as "big models", hold immense potential to revolutionize the landscape of information seeking. Within the realm of language models in particular, recent breakthroughs have given rise to powerful ChatGPT-like models that excel in both natural language understanding and generation. Trained on vast volumes of data, these models offer exciting opportunities for enhancing information seeking experiences. The workshop aims to foster collaboration by bringing together researchers, practitioners, and industry experts from the information seeking and big model communities. Participants will delve into novel techniques, methodologies, and applications related to leveraging big models for information retrieval, recommendation systems, and question-answering.

Topics of Interest

The workshop centers on pioneering research at the intersection of information seeking, big models, and novel technologies. It specifically focuses on five key aspects: (1) revolutionizing information retrieval, e.g., exploring how big models can enhance efficiency and effectiveness in accessing information; (2) advancing recommender algorithms, e.g., using big models to improve personalized recommendations; (3) enhancing question-answering systems, e.g., leveraging big models for accurate and contextually relevant responses; (4) ensuring trustworthiness, e.g., upholding ethical and legal standards when using big models; (5) comprehensive evaluation, e.g., designing robust evaluation methodologies to assess the impact of big models. We welcome original submissions across a wide range of topics, including but not limited to:

  • Information retrieval with large generative models
    • examining how large generative models influence information retrieval and ranking, and vice versa.
  • Query expansion and reformulation strategies
    • developing innovative approaches for query expansion and reformulation using contextual information from big models.
  • User studies and experiences with big models
    • exploring the effectiveness of big models in addressing diverse information needs from a user-centric perspective.
  • Ethical considerations and responsible AI
    • examining ethical implications and proposing strategies for responsible AI in the development and deployment of large generative models.
  • Architectural integration into search engines
    • discussing architectural considerations for integrating big models into existing search products.
  • Scalability challenges in deploying big models
    • exploring solutions that ensure efficient and reliable performance at scale.
  • Applications in specialized domains
    • investigating the role of big models in domains such as healthcare, education, finance, legal, and others.
  • Robust evaluation metrics
    • exploring reliable metrics for assessing big models' performance in information seeking.
  • Multimodal information retrieval
    • investigating theoretical, algorithmic, or practical solutions to problems spanning multimedia and information retrieval with big models.
  • Integration in recommendation systems
    • exploring how big models enhance recommendation systems for personalized content discovery and accurate recommendations.
  • Advancements in question-answering
    • discussing advancements in using big models for both open-domain and domain-specific question-answering.

Submission Guidelines

Prospective authors are invited to submit original research papers that address the topics of interest for the workshop. Paper submissions should be limited to a maximum of ten (10) pages in the IEEE 2-column format, including the bibliography and any appendices. Submissions longer than 10 pages will be rejected without review. All submissions will undergo double-blind review by the Program Committee on the basis of technical quality, relevance to the scope of the workshop, originality, significance, and clarity. Please refer to the ICDM regular Submission Guidelines for more information.

Important Dates

Workshop papers submission: September 17, 2024
Notification of workshop papers acceptance to authors: October 7, 2024
Camera-ready deadline and copyright form: October 11, 2024
Workshop date: December 9, 2024

Paper submission link: Workshop on Information Seeking with Big Models
Submissions are due at 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.

Workshop Organizers

Zheng Wang

Researcher

Huawei Singapore Research Center

Shu Xian Teo

Researcher

Huawei Singapore Research Center

Wei Shi

Researcher

Huawei Singapore Research Center

Kun Tan

Researcher

Huawei Technologies Co., Ltd.

Xiangnan He

Professor

University of Science and Technology of China

Cheng Long

Professor

Nanyang Technological University

Gao Cong

Professor

Nanyang Technological University

Speakers

Keynote 1

Prof. Aixin Sun is an Associate Professor at the College of Computing and Data Science (CCDS), Nanyang Technological University (NTU), Singapore. He received his B.A.Sc. (First Class Honours) and Ph.D., both in Computer Engineering, from NTU in 2001 and 2004, respectively. His current research interests include information retrieval, text mining, recommender systems, and social computing. Dr. Sun is an associate editor of ACM Transactions on Information Systems (TOIS), ACM Transactions on Intelligent Systems and Technology (TIST), ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), and Neurocomputing, and an editorial board member of the Journal of the Association for Information Science and Technology (JASIST). He has also served as an Area Chair or Senior PC member for many conferences, including SIGIR, WWW, WSDM, and NeurIPS.

Talk Title: Question-Answering via Large Vision-Language Models
Abstract: For a long time, the development of natural language processing (NLP) has been based on the assumption that data is processed in raw text format. However, the documents we encounter in daily life often contain rich structures, including nicely formatted pages with visual elements such as figures, charts, photos, and tables. The rapid advancements in large vision-language models (LVLMs) provide an opportunity to reconsider traditional NLP pipelines. With LVLMs, we can read PDF/Word files directly from screenshots (e.g., PNG files); this simplified process eliminates the need to parse documents into extracted (or OCR-ed) text before LLMs can answer questions about them. More importantly, this end-to-end approach preserves both textual and visual information in its entirety. In this talk, I will introduce MMLongBench-Doc, an evaluation of 14 LVLMs and 10 LLMs on question-answering tasks over long documents with rich visual elements. I will also discuss key insights gained from this benchmarking and their implications for the future of NLP.

Keynote 2

Dr. Shanshan Feng is currently a senior scientist at the Centre for Frontier AI Research, Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore. He received his Ph.D. degree from Nanyang Technological University, Singapore, in 2017, and his B.E. degree from the University of Science and Technology of China in 2011. His current research interests include big data analytics, social graph learning, and recommender systems. He has published more than 60 papers in prestigious journals and conferences, including IEEE TPAMI, IEEE TKDE, IEEE TNNLS, IEEE TGRS, SIGKDD, SIGIR, ICDE, and VLDB. He has served as a PC member of conferences such as SIGIR, SIGKDD, ICDE, ICDM, AAAI, and IJCAI.

Talk Title: Multi-Objective Optimization in Generative Recommendations
Abstract: Recent advancements in recommendation systems have primarily focused on improving single-objective metrics, such as accuracy. However, the evolving landscape of artificial general intelligence and the rise of large language models (LLMs) have opened new avenues for handling multiple tasks and objectives simultaneously within recommendation systems. These versatile foundational models offer enhanced flexibility in addressing multi-task and multi-objective challenges, making it increasingly critical to explore and understand this emerging field. In this talk, I will present a comprehensive review of multi-objective optimization techniques in recommendation systems, focusing on the integration of generative AI methods. The discussion will cover key tasks, methods, applications, and recent developments related to Multi-Objective Generative Recommendations. The goal is to bridge the gap in existing literature by highlighting the importance of multi-objective approaches in the next generation of recommendation systems.

Keynote 3

Dr. Hossein Esfandiari is a Senior Research Scientist at the Google NYC Algorithms and Optimization Team. Previously, he was a Postdoctoral Researcher at Harvard University in the Theory of Computation group, where he was advised by Professor Michael Mitzenmacher. He received his Ph.D. in Computer Science from the University of Maryland, under the guidance of Professor Mohammad T. HajiAghayi. He completed his undergraduate studies in Computer Engineering at Sharif University of Technology.

Talk Title: Providing Large-Scale Privacy for Complex Models via Smooth Anonymity
Abstract: When working with user data, providing well-defined privacy guarantees is paramount. Here, we aim to manipulate and share a complex learning model or an entire dataset with a third party privately. As our first main result, we prove that any differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee in the worst case. In such situations, we need to look into other privacy notions such as k-anonymity. Hence, we consider a variation of k-anonymity, which we call smooth-k-anonymity, and design a very large-scale algorithm that efficiently provides smooth-k-anonymity for billions of entries. Our empirical evaluations show that our algorithm improves the performance in downstream machine learning tasks on anonymized data.

Workshop Schedule

  • On-site venue: Room 6
  • Date & Time: December 9th Morning (Abu Dhabi Local Time)
  • Zoom link: Meeting ID: 825 5154 1095, Passcode: 241209

  • Time          Title
    10:30 - 10:40 Welcome Message from the Chairs
    10:40 - 11:15 Keynote 1: Question-Answering via Large Vision-Language Models
    11:15 - 11:50 Keynote 2: Multi-Objective Optimization in Generative Recommendations
    11:50 - 12:25 Keynote 3: Providing Large-Scale Privacy for Complex Models via Smooth Anonymity
    12:25 - 12:35 Presentation 1: SRSA: A Cost-Efficient Strategy-Router Search Agent for Contextual Queries
    12:35 - 12:45 Presentation 2: Emotion Retrieval Using Generative Models: An Exploratory Study
    12:45 - 12:55 Presentation 3: Foundation Models for Course Equivalency Evaluation
    12:55 - 13:00 Workshop Closing


Contact

For any questions, please email Zheng Wang.