Information Seeking with Big Models (BigIS)

Workshop at IEEE ICDM 2025

Scope and Topics

This workshop is dedicated to pioneering research at the intersection of information seeking (e.g., information retrieval, recommendation, and question answering) and big models (e.g., language, visual, audio, and multimodal models), focusing on five key aspects. (1) We encourage work on how big models can reshape information retrieval strategies, with an emphasis on advances that improve the efficiency and effectiveness of accessing information. (2) We encourage researchers to use big models to advance recommender algorithms, improving user modeling for better personalized recommendations. (3) We encourage research on question-answering systems that leverages the capabilities of big models to provide accurate and contextually relevant responses. (4) We emphasize the critical role of trustworthiness when employing big models for information seeking, so that generated content is reliable and complies with ethical and legal standards. (5) We encourage researchers to design robust evaluation methodologies, standards, and human evaluation paradigms to comprehensively assess the impact of big models on information seeking.

News

16 April 2025 - Call for Papers is released.
16 April 2025 - Workshop homepage is now available.

Call for Papers

The workshop centers around pioneering research at the intersection of information seeking, big models, and novel technologies. It specifically focuses on five key aspects: (1) revolutionizing information retrieval, e.g., exploring how big models can enhance efficiency and effectiveness in accessing information; (2) advancing recommender algorithms, e.g., utilizing big models to improve personalized recommendations; (3) enhancing question-answering systems, e.g., leveraging big models for accurate and contextually relevant responses; (4) ensuring trustworthiness, e.g., emphasizing ethical and legal standards when using big models; (5) comprehensive evaluation, e.g., designing robust evaluation methodologies to assess big models' impact. We welcome original submissions across a wide range of topics, including but not limited to:

Information retrieval with large generative models
Query expansion and reformulation strategies
User studies and experiences with big models
Ethical considerations and responsible AI
Architectural integration into search engines
Scalability challenges in deploying big models
Applications in specialized domains
Robust evaluation metrics
Multimodal information seeking
Integration in recommendation systems
Advancements in question-answering

Important dates:

Workshop Papers Submission: 7 September 2025
Workshop Papers Notification: 19 September 2025
Camera-ready Submission: 25 September 2025
Workshop date: 12 November 2025

Please note: Submissions are due by 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.

Submission

Submission Guidelines:

Submitted papers (.pdf format) must use the same format and template as the main conference. Please remember to add Concepts and Keywords. Accepted papers can be up to 10 pages, including references. All papers will undergo the same review process and review period. Paper submissions must conform to the “double-blind” review policy. All papers will be peer-reviewed by experts in the field. Acceptance will be based on relevance to the workshop, scientific novelty, and technical quality.

Submission Site: Submission Link

Organizers

Zheng Wang (Huawei Singapore Research Center, Singapore)
Jieer Ouyang (Huawei Singapore Research Center, Singapore)
Xiaoneng Xiang (Huawei Singapore Research Center, Singapore)
Wei Shi (Huawei Singapore Research Center, Singapore)
Kun Tan (Huawei Technologies Co., Ltd.)
Xiangnan He (University of Science and Technology of China, China)
Cheng Long (Nanyang Technological University, Singapore)
Gao Cong (Nanyang Technological University, Singapore)

Speakers

Keynote 1

Speaker: Raymond Chi-Wing Wong (The Hong Kong University of Science and Technology)

Talk Title: Data Extraction for Data Visualization

Abstract: Data visualization (DV) has become a prevailing tool in the market due to its effectiveness in illustrating insights from vast amounts of data. To lower the barrier to using DV, automatic DV tasks, such as natural language question (NLQ) to visualization translation (formally called text-to-vis), have been investigated in the research community. The task of text-to-vis is to extract relevant data from databases and to generate visualized charts based on this data. In this talk, we focus on recent developments in different variants of text-to-vis.

Bio: Raymond Chi-Wing Wong is a Professor in Computer Science and Engineering (CSE) at The Hong Kong University of Science and Technology (HKUST). He is currently the associate head of CSE and the director of the Undergraduate Research Opportunities Program (UROP). He received his BSc, MPhil, and PhD degrees in Computer Science and Engineering from CUHK in 2002, 2004, and 2008, respectively. He has received 46 awards and published 137 conference papers (e.g., SIGMOD, SIGKDD, VLDB, ICDE, ICDM), 52 journal/chapter papers (e.g., TODS, DAMI, TKDE, VLDBJ, TKDD), and 1 book. His research interests include data mining and databases.

Keynote 2

Speaker: Bo An (Nanyang Technological University)

Talk Title: From Algorithmic and RL-based to LLM-powered Agents

Abstract: In the early days of tackling AI problems involving complex cooperation and strategic interactions, algorithmic approaches were widely employed. Reinforcement learning has since proven effective in learning efficient policies for large-scale optimization problems that are beyond the scalability of traditional algorithmic approaches. Recently, the use of large language models (LLMs) as computational engines has given rise to a new paradigm: LLM-powered agents capable of addressing complex problems across various domains. This talk will explore our recent work within these three paradigms and offer insights into the development of scalable, efficient, and distributed artificial general intelligence.

Bio: Bo An is a President’s Chair Professor in Computer Science, Head of Division of Artificial Intelligence, and Director of Centre of AI-for-X at Nanyang Technological University, Singapore. He received his Ph.D. degree in Computer Science from the University of Massachusetts, Amherst. His current research interests include artificial intelligence, multiagent systems, computational game theory, reinforcement learning, and optimization. His research results have been successfully applied to many domains, including infrastructure security, sustainability, and e-commerce. He has published over 150 refereed papers at AAMAS, IJCAI, AAAI, ICLR, NeurIPS, ICML, AISTATS, ICAPS, KDD, UAI, EC, WWW, JAAMAS, and AIJ. Dr. An was the recipient of the 2010 IFAAMAS Victor Lesser Distinguished Dissertation Award, an Operational Excellence Award from the Commander, First Coast Guard District of the United States, the 2012 INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice, the 2018 Nanyang Research Award (Young Investigator), and the 2022 Nanyang Research Award. His publications won the Best Innovative Application Paper Award at AAMAS’12, the Innovative Application Award at IAAI’16, and the Best Paper Award at DAI’20. He was invited to give an Early Career Spotlight talk at IJCAI’17. He led the team HogRider, which won the 2017 Microsoft Collaborative AI Challenge. He was named to IEEE Intelligent Systems' "AI's 10 to Watch" list for 2018. He was PC Co-Chair of AAMAS’20 and General Co-Chair of AAMAS’23. He is on the IJCAI Board of Trustees and will be Program Chair of IJCAI’27. He is Editor-in-Chief of IEEE Intelligent Systems and serves on the editorial boards of AIJ, JAAMAS, ACM TAAS, ACM TIST, and JAIR.

Keynote 3

Speaker: Aixin Sun (Nanyang Technological University)

Talk Title: Storage-Efficient Visual Document Retrieval and Beyond

Abstract: Visually rich documents often contain complex structures—layouts, figures, charts, images, and tables—that challenge traditional text-based retrieval systems. Recent advances in large vision-language models (LVLMs) offer new opportunities to rethink document search pipelines by processing documents directly in their visual form. A natural approach is to encode each page as a dense set of patch-level embeddings, but this introduces substantial storage overhead. In this talk, I will present our empirical study on reducing the number of patch embeddings per page while maintaining retrieval accuracy. We evaluate two token-reduction strategies, token pruning and token merging, and find that token merging offers a more effective trade-off between storage efficiency and performance. I will also introduce how to apply the dense retrieval pipeline to large-scale visual content retrieval.

Bio: Dr. SUN Aixin is an Associate Professor at the College of Computing and Data Science, Nanyang Technological University (NTU), Singapore. His research interests include information retrieval, recommender systems, and natural language processing. His work has earned accolades including the SIGIR 2025 Test of Time Honorable Mention Award, the Best Student Paper Award at the IEEE International Conference on Services Computing in 2020, and a Best Student Paper Honorable Mention at SIGIR 2016. He serves as an Associate Editor for ACM TOIS, ACM TORS, and ACM TIST, and is on the editorial board of JASIST.

Time Zone: US Eastern Standard Time (EST)
Time (EST)	Session
13:30 – 13:40	Welcome Message from the Chairs
13:40 – 14:20	Keynote 1 — Data Extraction for Data Visualization (Raymond Chi-Wing Wong)
14:20 – 15:00	Keynote 2 — From Algorithmic and RL-based to LLM-powered Agents (Bo An)
15:00 – 15:40	Keynote 3 — Storage-Efficient Visual Document Retrieval and Beyond (Aixin Sun)
15:40 – 15:55	Coffee Break
15:55 – 16:10	Paper 1 — Automated Indicator Mining and Forecast-Ready Feature Engineering via LLMs: Application in Colocation Data Center Supply Chains
16:10 – 16:25	Paper 2 — OTTER: Open-Tagging via Text-Image Representation for Multi-modal Understanding
16:25 – 16:40	Paper 3 — A Multimodal Conversational Agent for Tabular Data Analysis
16:40 – 17:30	Open Discussion, Q&A, Networking