Details of a Researcher - SAKAI, Tetsuya

SAKAI, Tetsuya

Scopus Paper Info

Paper Count: 207 Citation Count: 2457 h-index: 28

The data was downloaded from the Scopus API on October 24, 2025, via http://api.elsevier.com and http://www.scopus.com.

Google Scholar Information

Citation Count: 7608 h-index: 47 i10-index: 168

News & Topics

2024.10.03

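Emerging Outcomes and Future Directions of Medical-Science-Engineering Collaboration (Report on the 4th Joint Symposium of Nippon Medical School and Waseda University)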

2023.10.16

Advancing Medical-Science-Engineering Research Exchange (Report on the 3rd Joint Symposium of Nippon Medical School and Waseda University)

Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job title

Professor

Degree

Doctorate

Homepage URL

http://sakailab.com/tetsuya/

Profile

http://sakailab.com/tetsuya/

Research Areas

Human interface and interaction

Research Interests

information access, information retrieval, natural language processing

Awards

SIGIR Academy 2023

2023.05 ACM SIGIR
DEIM 2020 Excellent Paper Award (second author)

2020
FIT 2020 Excellent Paper Award (first author)

2020
CSS 2019 Best Paper Award (fifth author)

2019
ACM Distinguished Member

2018
WASEDA e-Teaching Award 2018

2018
ACM Recognition of Service Award (SIGIR'17 Co-chair)

2017
ACM Senior Member

2016
Waseda University Presidential Teaching Award (2016 Spring Semester)

2016
Waseda University Teaching Award (2014 Autumn Semester)

2015
CSS 2014 Student Paper Award (third author)

2014
MobileHCI 2014 Honorable Mention (second author)

2014
ACM SIGIR 2013 best paper shortlisted nominee (first author)

2013
AIRS 2012 Best Paper Award (first author)

2012
WebDB Forum 2010 Excellent Paper Award and NTT Resonant Award (second author)

2010
FIT 2008 Funai Best Paper Award (first author)

2008
IEICE ESS 2007 Merit Award

2007
IPSJ 2007 Best Paper Award (single author)

2007
IPSJ 2006 Yamashita SIG Research Award (single author)

2006
IPSJ 2006 Best Paper Award (single author)

2006
FIT 2005 Excellent Paper Award (single author)

2005

Papers

Click the search button and be happy: Evaluating direct and immediate information access

Tetsuya Sakai, Makoto P. Kato, Young-In Song

International Conference on Information and Knowledge Management, Proceedings 621 - 630 2011 [Refereed]

We define Direct Information Access as a type of information access where there is no user operation such as clicking or scrolling between the user's click on the search button and the user's information acquisition; we define Immediate Information Access as a type of information access where the user can locate the relevant information within the system output very quickly. Hence, a Direct and Immediate Information Access (DIIA) system is expected to satisfy the user's information need very quickly with its very first response. We propose a nugget-based evaluation framework for DIIA, which takes nugget positions into account in order to evaluate the ability of a system to present important nuggets first and to minimise the amount of text the user has to read. To demonstrate the integrity, usefulness and limitations of our framework, we built a Japanese DIIA test collection with 60 queries and over 2,800 nuggets as well as an offset-based nugget match evaluation interface, and conducted experiments with manual and automatic runs. The results suggest our proposal is a useful complement to traditional ranked retrieval evaluation based on document relevance. © 2011 ACM.

Citations (Scopus): 22
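
The position-aware, nugget-based idea in the DIIA abstract can be illustrated with a minimal Python sketch. It assumes that each matched nugget has a weight and a character offset (how much text the user must read to reach it), and that credit decays linearly to zero at a hypothetical reading-budget parameter `patience`; the total is normalised by the same quantity computed for an assumed ideal output. The function names are hypothetical, and this is an illustration of the idea, not necessarily the exact measure defined in the paper.

```python
from typing import Iterable, Tuple

Nugget = Tuple[float, int]  # (weight, character offset of the nugget in the output)

def _position_weighted_gain(nuggets: Iterable[Nugget], patience: int) -> float:
    # Each nugget earns its weight times the reading budget left when the user reaches it;
    # nuggets located beyond the budget earn nothing.
    return sum(weight * max(0, patience - offset) for weight, offset in nuggets)

def position_aware_nugget_score(matched, ideal, patience: int = 500) -> float:
    """Hypothetical position-aware nugget score; lies in [0, 1] when `ideal` is truly ideal."""
    ideal_gain = _position_weighted_gain(ideal, patience)
    if ideal_gain == 0:
        return 0.0
    return _position_weighted_gain(matched, patience) / ideal_gain

# The system finds both nuggets, but places the minor one later than the ideal output does.
system = [(3.0, 0), (1.0, 200)]
ideal = [(3.0, 0), (1.0, 50)]
print(position_aware_nugget_score(system, ideal))  # 1800 / 1950, about 0.923
```

Pushing any nugget further down the output lowers the score, which is the behaviour the abstract asks for: important nuggets first, and as little text to read as possible.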
LLM-assisted Relevance Assessments: When Should We Ask LLMs for Help?

Rikiya Takehi, Ellen M. Voorhees, Tetsuya Sakai, Ian Soboroff

Proceedings of SIGIR 2025 95 - 105 2025 [Refereed]
My System Is As Effective As Yours: Reproducibility, Sustainability, and More

Tetsuya Sakai

Proceedings of SIGIR 2025 3943 - 3953 2025 [Refereed]
COPWA at the NTCIR-18 FairWeb-2 Task

Amogh Raina, Tetsuya Sakai

Proceedings of NTCIR-18 80 - 83 2025 [Refereed]
RSLFW at the NTCIR-18 FairWeb-2 Task

Atsuya Ishikawa, Sijie Tao, Tetsuya Sakai

Proceedings of NTCIR-18 68 - 71 2025 [Refereed]
Overview of the NTCIR-18 FairWeb-2 Task

Sijie Tao, Tetsuya Sakai, Junjie Wang, Hanpei Fang, Yuxiang Zhang, Haitao Li, Yiteng Tu, Nuo Chen, Maria Maistro

Proceedings of NTCIR-18 40 - 60 2025 [Refereed]
Evaluating Group Fairness and Relevance in Conversational Search: An Alternative Formulation

Tetsuya Sakai, Sijie Tao, Young-In Song

Proceedings of EVIA 2025 15 - 22 2025 [Refereed]
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmented Generation

Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou

Findings of NAACL 2025 1308 - 1330 2025 [Refereed]
Explainable Detection of Logical and Structural Anomalies based on Multimodal Large Language Models

Noeko Fujii, Tetsuya Sakai

ROBOVIS 2025 2025 [Refereed]
Reconstruction of 3D Brain Structures from Clinical 2D MRI Data

Rui Shi, Tsukasa Koike, Tetsuro Sekine, Akio Morita, Tetsuya Sakai

ICPRAM 2025, SciTePress, INSTICC, 2025 352 - 259 2025 [Refereed]
Understanding User Behavior and Measuring System Vulnerability

Nuo Chen, Jiqun Liu, Hanpei Fang, Yuankai Luo, Tetsuya Sakai, Xiao-Ming Wu

ACM TOIS 2025 [Refereed]
Evaluating System Responses Based On Overconfidence and Underconfidence

Tetsuya Sakai

2024 [Refereed]

Authorship: Lead author
Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Large Language Models

Yuxiang Zhang, Xin Fan, Junjie Wang, Chongxian Chen, Fan Mo, Tetsuya Sakai, Hayato Yamana

ACM SIGIR-AP 226 - 235 2024 [Refereed]
AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment

Nuo Chen, Jiqun Liu, Xiaoyu Dong, Qijiong Liu, Tetsuya Sakai, Xiao-Ming Wu

ACM SIGIR-AP 56 - 63 2024 [Refereed]
Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval

Kelong Mao, Chenlong Deng, Haonan Chen, Fengran Mo, Zheng Liu, Tetsuya Sakai, Zhicheng Dou

EMNLP 1227 - 1240 2024 [Refereed]
A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen WAN, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

EMNLP 11388 - 11422 2024 [Refereed]
Benchmarking Chinese Text-to-Table Performance in Large Language Models

Haoxiang Shi, Jiaan Wang, Jiarong Xu, Cen Wang, Tetsuya Sakai

arXiv:2405.12174 2024 [Refereed]
Solving Named Entity Recognition Problems via a Single-stream Reasoner

Yuxiang Zhang, Junjie Wang, Xinyu Zhu, Tetsuya Sakai, Hayato Yamana

ACM TOIS 42 ( 5 ) 2024 [Refereed]
Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models

Qijiong Liu, Nuo Chen, Tetsuya Sakai, Xiao-Ming Wu

ACM WSDM 452 - 461 2024 [Refereed]
Enhancing Parameter Efficiency in Model Inference using an Ultralight Inter-Transformer Linear Structure

Haoxiang Shi, Tetsuya Sakai

IEEE Access 12 43734 - 43746 2024 [Refereed]
Modeling Multimodal Uncertainties via Probability Distribution Encoders included Vision-Language Models

Junjie Wang, Yatai Ji, Yuxiang Zhang, Yanru Zhu, Tetsuya Sakai

IEEE Access 12 420 - 434 2024 [Refereed]
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple-Choice Perspective

Junjie Wang, Ping Yang, Ruyi Gan, Yuxiang Zhang, Jiaxing Zhang, Tetsuya Sakai

IEEE Access 11 142829 - 142845 2023 [Refereed]
Ethical Alignment Meets Conversational Information Retrieval

Yiyao Yu, Junjie Wang, Yuxiang Zhang, Lin Zhang, Yujiu Yang, Tetsuya Sakai

ACM SIGIR-AP 32 - 39 2023 [Refereed]
Deriving Nugget-level Scores from Turn-level Scores

Rikiya Takehi, Akihisa Watanabe, Tetsuya Sakai

ACM SIGIR-AP 40 - 45 2023 [Refereed]
Chuweb21D: A Deduped English Document Collection for Web Search Tasks

Zhumin Chu, Tetsuya Sakai, Qingyao Ai, Yiqun Liu

ACM SIGIR-AP 63 - 72 2023 [Refereed]
Fairness-based Evaluation of Conversational Search: A Pilot Study

Tetsuya Sakai

EVIA 5 - 13 2023 [Refereed]
Decoy Effect in Search Interaction: A Pilot Study

Nuo Chen, Jiqun Liu, Tetsuya Sakai, Xiao-Ming Wu

EVIA 14 - 16 2023 [Refereed]
On A Few Responsibilities of (IR) Researchers (Fairness, Awareness, and Sustainability), A keynote at ECIR 2023

Tetsuya Sakai

SIGIR Forum 2023 [Refereed]
Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

Haoxiang Shi, Sumio Fujita, Tetsuya Sakai

SIGIR Workshop on ReNeuIR 2023 [Refereed]
Self-supervised and Few-shot Contrastive Learning Frameworks for Text Clustering

Haoxiang Shi, Tetsuya Sakai

IEEE 11 84134 - 84143 2023 [Refereed]
On the Ordering of Pooled Web Pages, Gold Assessments, and Bronze Assessments

Tetsuya Sakai, Sijie Tao, Nuo Chen, Yujing Li, Maria Maistro, Zhumin Chu, Nicola Ferro

ACM TOIS 2023 [Refereed]
How Many Crowd Workers Do I Need? On Statistical Power When Crowdsourcing Relevance Judgments

Kevin Roitero, David La Barbera, Michael Soprano, Gianluca Demartini, Stefano Mizzaro, Tetsuya Sakai

ACM TOIS 2023 [Refereed]
A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

Tetsuya Sakai, Jin Young Kim, Inho Kang

ACM TOIS 2023 [Refereed]
Practice and Challenges in Building a Business-oriented Search Engine Quality Metric

Nuo Chen, Donghyun Park, Hyungae Park, Kijun Choi, Tetsuya Sakai, Jinyoung Kim

SIGIR 2023 2023 [Refereed]
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

CVPR 2023 2023 [Refereed]
A Reference-Dependent Model for Web Search Evaluation

Nuo Chen, Jiqun Liu, Tetsuya Sakai

The 2023 ACM Web Conference 2023 [Refereed]
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

arXiv 2022 [Refereed]
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai

EMNLP 2022 2022 [Refereed]
Understanding the Behavior Transparency of Voice Assistant Applications Using the ChatterBox Framework

Atsuko Natatsuka, Ryo Iijima, Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Tatsuya Mori

Proceedings of RAID 2022 2022 [Refereed]
MAP: Modality-Agnostic Uncertainty-Aware Vision-Language Pre-training Model

Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

arXiv 2022 [Refereed]
Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks

Tetsuya Sakai, Sijie Tao, Maria Maistro, Zhumin Chu, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

arXiv 2022 [Refereed]
LayerConnect: Hypernetwork-Assisted Inter-Layer Connector to Enhance Parameter Efficiency

Haoxiang Shi, Rongsheng Zhang, Jiaan Wang, Cen Wang, Guandan Chen, Yinhe Zheng, Tetsuya Sakai

Proceedings of COLING 2022 2022 [Refereed]
Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

Rei Shimizu, Sumio Fujita, Tetsuya Sakai

Proceedings of ACM ICTIR 2022 2022 [Refereed]
Constructing Better Evaluation Metrics by Incorporating the Anchoring Effect into the User Model

Nuo Chen, Fan Zhang, Tetsuya Sakai

ACM SIGIR 2022 2022 [Refereed]
Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization

Yuji Naraki, Tetsuya Sakai, Yoshihiko Hayashi

LREC 2022 2022 [Refereed]
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai

CVPR 2022 2022 [Refereed]
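Selecting Evaluation Methods for Stance Detection Tasks (SIG-Recommended Paper)

Yuki Amemiya, Tetsuya Sakai

IEICE Transactions D (Japanese Edition), Special Section on Data Engineering and Information Management 2022 [Refereed]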
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

ACM TOIS 2022 [Refereed]
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai

Findings of the Association for Computational Linguistics: EMNLP 2021 2021 [Refereed]
A Closer Look at Evaluation Measures for Ordinal Quantification

Tetsuya Sakai

Proceedings of the CIKM 2021 Workshops 2021 [Refereed]
Evaluating Relevance Judgments with Pairwise Discriminative Power

Zhumin Chu, Jiaxin Mao, Fan Zhang, Yiqun Liu, Tetsuya Sakai, Min Zhang, Shaoping Ma

Proceedings of ACM CIKM 2021 2021 [Refereed]
Incorporating Query Reformulating Behavior into Web Search Evaluation

Jia Chen, Yiqun Liu, Jiaxin Mao, Fan Zhang, Tetsuya Sakai, Weizhi Ma, Min Zhang, Shaoping Ma

Proceedings of ACM CIKM 2021 2021 [Refereed]
A Simple and Effective Usage of Self-supervised Contrastive Learning for Text Clustering

Haoxiang Shi, Cen Wang, Tetsuya Sakai

Proceedings of IEEE SMC 2021 2021
Evaluating Evaluation Measures for Ordinal Classification and Ordinal Quantification

Tetsuya Sakai

Proceedings of ACL-IJCNLP 2021 2021 [Refereed]
WWW3E8: 259,000 Relevance Labels for Studying the Effect of Document Presentation Order for Relevance Assessors

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

Proceedings of ACM SIGIR 2021 2021 [Refereed]
On the Two-Sample Randomisation Test for IR Evaluation

Tetsuya Sakai

Proceedings of ACM SIGIR 2021 2021 [Refereed]
Scalable Personalised Item Ranking through Parametric Density Estimation

Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin'ichi Satoh

Proceedings of ACM SIGIR 2021 2021 [Refereed]
Fast and Exact Randomisation Test for Comparing Two Systems with Paired Data

Rikiya Suzuki, Tetsuya Sakai

Proceedings of ACM ICTIR 2021 2021 [Refereed]
DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators’ Labels

Zhaohao Zeng, Tetsuya Sakai

arXiv 2021 [Refereed]
How Do Users Revise Zero-Hit Product Search Queries?

Yuki Amemiya, Tomohiro Manabe, Sumio Fujita, Tetsuya Sakai

Proceedings of ECIR 2021 Part II 2021 [Refereed]
On the Instability of Diminishing Return IR Measures

Tetsuya Sakai

Proceedings of ECIR 2021 Part I 2021 [Refereed]
RSL19BD at DBDC4: Ensemble of Decision Tree-Based and LSTM-Based Models

Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems 2021 [Refereed]
Retrieval Evaluation Measures that Agree with Users’ SERP Preferences: Traditional, Preference-based, and Diversity Measures

Tetsuya Sakai, Zhaohao Zeng

ACM TOIS 2020 [Refereed]
A Siamese CNN Architecture for Learning Chinese Sentence Similarity

Haoxiang Shi, Cen Wang, Tetsuya Sakai

Proceedings of AACL-IJCNLP 2020 Student Research Workshop (SRW) 2020 [Refereed]
Automatic Evaluation of Iconic Image Retrieval based on Colour, Shape, and Texture

Riku Togashi, Sumio Fujita, Tetsuya Sakai

Proceedings of ACM ICMR 2020 2020 [Refereed]
SogouQ: The First Large-Scale Test Collection with Click Streams Used in a Shared-Task Evaluation

Ruihua Song, Min Zhang, Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou

Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact 2020 [Refereed]
Graded Relevance

Tetsuya Sakai

Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact 2020 [Refereed]
Visual Intents vs. Clicks, Likes, and Purchases in E-commerce

Riku Togashi, Tetsuya Sakai

Proceedings of ACM SIGIR 2020 2020 [Refereed]
Good Evaluation Measures based on Document Preferences

Tetsuya Sakai, Zhaohao Zeng

Proceedings of ACM SIGIR 2020 2020 [Refereed]
How to Measure the Reproducibility of System-oriented IR Experiments

Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

Proceedings of ACM SIGIR 2020 2020 [Refereed]
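Industry Recommendation from Entry Sheets (Job Application Forms) Based on Document Classification Techniques

三王慶太, Tetsuya Sakai

Journal of the Database Society of Japan (Japanese Edition) 2020 [Refereed]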
Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations

Tetsuya Sakai, Peng Xiao

Proceedings of AIRS 2019 2020 [Refereed]
Generating Short Product Descriptors based on Very Little Training Data

Peng Xiao, Joo-Young Lee, Sijie Tao, Young-Sook Hwang, Tetsuya Sakai

Proceedings of AIRS 2019 2020 [Refereed]
Unsupervised Answer Retrieval with Data Fusion for Community Question Answering

Sosuke Kato, Toru Shimizu, Sumio Fujita, Tetsuya Sakai

Proceedings of AIRS 2019 2020 [Refereed]
Towards Automatic Evaluation of Reused Answers in Community Question Answering

Hsin-Wen Liu, Sumio Fujita, Tetsuya Sakai

Proceedings of AIRS 2019 2020 [Refereed]
Arc Loss: Softmax with Additive Angular Margin for Answer Retrieval

Rikiya Suzuki, Sumio Fujita, and Tetsuya Sakai

Proceedings of AIRS 2019 2020 [Refereed]
System Evaluation of Ternary Error-Correcting Output Codes for Multiclass Classification Problems*

Shigeichi Hirasawa, Gendo Kumoi, Hideki Yagi, Manabu Kobayashi, Masayuki Goto, Tetsuya Sakai, Hiroshige Inazumi

2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) 2019.10 [Refereed]
Understanding the inconsistencies between text descriptions and the use of privacy-sensitive resources of mobile apps

Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

SOUPS 2015 - Proceedings of the 11th Symposium on Usable Privacy and Security 241 - 255 2019

Permission warnings and privacy policy enforcement are widely used to inform mobile app users of privacy threats. These mechanisms disclose information about the use of privacy-sensitive resources such as user location or contact list. However, it has been reported that very few users pay attention to these mechanisms during installation. Instead, a user may focus on a more user-friendly source of information: the text description, which is written by a developer who has an incentive to attract user attention. When a user searches for an app in a marketplace, his/her query keywords are generally searched on the text descriptions of mobile apps. Then, users review the search results, often by reading the text descriptions; i.e., text descriptions are associated with user expectation. Given these observations, this paper aims to address the following research question: What are the primary reasons that text descriptions of mobile apps fail to refer to the use of privacy-sensitive resources? To answer the research question, we performed an empirical large-scale study using a huge volume of apps with our ACODE (Analyzing COde and DEscription) framework, which combines static code analysis and text analysis. We developed lightweight techniques so that we can handle hundreds of thousands of distinct text descriptions. We note that our text analysis technique does not require manually labeled descriptions; hence, it enables us to conduct a large-scale measurement study without requiring expensive labeling tasks. Our analysis of 200,000 apps and multilingual text descriptions collected from official and third-party Android marketplaces revealed four primary factors that are associated with the inconsistencies between text descriptions and the use of privacy-sensitive resources: (1) existence of app building services/frameworks that tend to add API permissions/code unnecessarily, (2) existence of prolific developers who publish many applications that unnecessarily install permissions and code, (3) existence of secondary functions that tend to be unmentioned, and (4) existence of third-party libraries that access the privacy-sensitive resources. We believe that these findings will be useful for improving users' awareness of privacy on mobile software distribution platforms.
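
The abstract does not detail ACODE's text analysis, so the following is only a hypothetical Python sketch of the general consistency check it motivates: flag permissions that the code uses (assumed to come from a separate static-analysis step) but that the description never mentions. The keyword table and function name are invented for illustration, and plain keyword matching is a stand-in, not the paper's technique.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical keyword table; a real system would need far richer, multilingual vocabularies.
PERMISSION_KEYWORDS: Dict[str, Set[str]] = {
    "android.permission.ACCESS_FINE_LOCATION": {"location", "gps", "map", "nearby"},
    "android.permission.READ_CONTACTS": {"contact", "contacts", "address book", "phonebook"},
    "android.permission.RECORD_AUDIO": {"microphone", "voice", "audio", "record"},
}

def unmentioned_permissions(used_permissions: Iterable[str], description: str) -> List[str]:
    """Return used permissions whose keywords never appear in the app description."""
    text = description.lower()
    words = set(re.findall(r"[a-z]+", text))
    missing = []
    for perm in sorted(used_permissions):
        keywords = PERMISSION_KEYWORDS.get(perm)
        if keywords is None:
            continue  # no vocabulary for this permission in the sketch, so it is skipped
        # Multi-word keywords are matched as substrings, single words as whole tokens.
        if not any((kw in text) if " " in kw else (kw in words) for kw in keywords):
            missing.append(perm)
    return missing

used = {"android.permission.ACCESS_FINE_LOCATION", "android.permission.READ_CONTACTS"}
print(unmentioned_permissions(used, "Find restaurants nearby using GPS."))
# ['android.permission.READ_CONTACTS']
```

Because it needs no labeled descriptions, a check of this shape scales to many apps, which mirrors the motivation given in the abstract; its keyword lists, however, are exactly the kind of manual resource a real system would try to avoid or learn.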
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

Tetsuya Sakai

Proceedings of ACM WSDM 2019 2019 [Refereed]
Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

Zhaohao Zeng, Ruihua Song, Pingping Lin, and Tetsuya Sakai

Proceedings of ACM WSDM 2019 2019 [Refereed]
A Comparative Study of Deep Learning Approaches for Extractive Query-Focused Multi-Document Summarization

Yuliska and Tetsuya Sakai

Proceedings of IEEE ICICT 2019 2019 [Refereed]
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

Tetsuya Sakai

Proceedings of ECIR 2019 Part II (LNCS 11438) 2019 [Refereed]
CENTRE@CLEF 2019

Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

Proceedings of ECIR 2019 Part II (LNCS 11438) 2019 [Refereed]
Celebrating 20 Years of NTCIR: The Book

Douglas W. Oard, Tetsuya Sakai, and Noriko Kando

Proceedings of EVIA 2019 2019 [Refereed]
RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

Chih-hao Wang, Sosuke Kato, and Tetsuya Sakai

Proceedings of Chatbots and Conversational Agents and Dialogue Breakdown Detection Challenge (WOCHAT+DBDC), IWSDS 2019 2019 [Refereed]
Low-cost, Bottom-up Measures for Evaluating Search Result Diversification

Zhicheng Dou, Xue Yang, Diya Li, Ji-Rong Wen, Tetsuya Sakai

Information Retrieval Journal 2019 [Refereed]
Which Diversity Evaluation Measures Are “Good”?

Tetsuya Sakai and Zhaohao Zeng

Proceedings of ACM SIGIR 2019 2019 [Refereed]
The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

Proceedings of ACM SIGIR 2019 2019 [Refereed]
Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

Proceedings of OSIRRC 2019 2019 [Refereed]
BM25 Pseudo Relevance Feedback Using Anserini at Waseda University

Zhaohao Zeng, Tetsuya Sakai

Proceedings of OSIRRC 2019 2019 [Refereed]
Composing a Picture Book by Automatic Story Understanding and Visualization

Xiaoyu Qi, Ruihua Song, Chunting Wang, Jin Zhou, and Tetsuya Sakai

Proceedings of the Second Storytelling Workshop (StoryNLP @ ACL2019) 2019 [Refereed]
CENTRE@CLEF2019: Overview of the Replicability and Reproducibility Tasks

Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

CLEF 2019 Working Notes 2019 [Refereed]
CENTRE@CLEF2019: Sequel in the Systematic Reproducibility Realm

Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

Proceedings of CLEF 2019 (LNCS 11696) 2019 [Refereed]
Generalising Kendall’s Tau for Noisy and Incomplete Preference Judgements

Riku Togashi and Tetsuya Sakai

Proceedings of ACM ICTIR 2019 2019 [Refereed]
Evaluating Image-Inspired Poetry Generation

Chao-Chung Wu, Ruihua Song, Tetsuya Sakai, Wen-Feng Cheng, Xing Xie, and Shou-De Lin

Proceedings of NLPCC 2019 2019 [Refereed]
How to Run an Evaluation Task: with a Primary Focus on Ad Hoc Information Retrieval

Tetsuya Sakai

Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF 2019 [Refereed]
A Large-Scale Measurement Study of Voice Assistant Apps

Atsuko Natatsuka, Ryo Iijima, Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Tatsuya Mori

Computer Security Symposium 2019 [Refereed]
Voice Input Interface Failures and Frustration: Developer and User Perspectives

Shiyoh Goetsu and Tetsuya Sakai

ACM UIST 2019 Adjunct 2019 [Refereed]
A First Look at the Privacy Risks of Voice Assistant Apps

Atsuko Natatsuka, Mitsuaki Akiyama, Ryo Iijima, Tetsuya Sakai, Takuya Watanabe, and Tatsuya Mori

ACM CCS 2019 Posters & Demos 2019 [Refereed]
Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

Zhaohao Zeng, Ruihua Song, Pingping Lin, and Tetsuya Sakai

Journal of Information Processing 2019 [Refereed]
Search Result Diversity Evaluation Based on Intent Hierarchies

Xiaojie Wang, Ji-Rong Wen, Zhicheng Dou, Tetsuya Sakai, Rui Zhang

IEEE Transactions on Knowledge and Data Engineering 30 ( 1 ) 156 - 169 2018.01 [Refereed]

Search result diversification aims at returning diversified document lists to cover different user intents of a query. Existing diversity measures assume that the intents of a query are disjoint, and do not consider their relationships. In this paper, we introduce intent hierarchies to model the relationships between intents, and present four weighting schemes. Based on intent hierarchies, we propose several hierarchical measures that take into account the relationships between intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on the TREC Web Track 2009-2013 diversity test collections and by using the NTCIR-11 IMine test collection. Our main experimental findings are: (1) Hierarchical measures are more discriminative and intuitive than existing measures. In terms of intuitiveness, it is preferable for hierarchical measures to use whole intent hierarchies rather than only the leaf nodes. (2) The types of intent hierarchies used affect the discriminative power and intuitiveness of hierarchical measures. We suggest the best type of intent hierarchies to be used according to whether nonuniform weights are available. (3) To measure the benefits of diversification algorithms that use automatically mined hierarchical intents, it is important to use hierarchical measures instead of existing measures.

Citations (Scopus): 18
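
The abstract does not define the hierarchical measures themselves. As a hypothetical illustration of why hierarchies matter, the Python sketch below propagates relevance from a document's intents to all of their ancestors and then averages intent coverage (subtopic recall at rank k) over hierarchy levels using assumed level weights. The function names, the coverage-based formulation, and the weighting are all assumptions, not the paper's measures.

```python
from typing import Dict, List, Optional

def _with_ancestors(intent: str, parent: Dict[str, Optional[str]]) -> List[str]:
    # Walk up the hierarchy (assumed to be a tree): a document relevant to a
    # child intent also serves every ancestor intent.
    chain: List[str] = []
    node: Optional[str] = intent
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain

def hierarchical_coverage(ranking, doc_intents, parent, depth, k, level_weights) -> float:
    """Weighted average, over hierarchy levels, of the fraction of intents covered in the top k."""
    covered = set()
    for doc in ranking[:k]:
        for intent in doc_intents.get(doc, []):
            covered.update(_with_ancestors(intent, parent))
    levels: Dict[int, List[str]] = {}
    for intent, d in depth.items():
        levels.setdefault(d, []).append(intent)
    total_weight = sum(level_weights[d] for d in levels)
    score = 0.0
    for d, intents in levels.items():
        fraction = sum(1 for i in intents if i in covered) / len(intents)
        score += level_weights[d] * fraction
    return score / total_weight

parent = {"music": None, "movie": None, "music/lyrics": "music",
          "music/download": "music", "movie/reviews": "movie"}
depth = {"music": 1, "movie": 1, "music/lyrics": 2, "music/download": 2, "movie/reviews": 2}
doc_intents = {"d1": ["music/lyrics"], "d2": ["music/download"], "d3": ["movie/reviews"]}
print(hierarchical_coverage(["d1", "d2", "d3"], doc_intents, parent, depth,
                            k=2, level_weights={1: 1.0, 2: 1.0}))  # (1/2 + 2/3) / 2, about 0.583
```

With k = 2 both retrieved documents serve the same top-level intent, so the top level is only half covered even though two of the three leaves are; a leaf-only recall of 2/3 would hide that lack of diversity.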
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

Tetsuya Sakai

Proceedings of ACM SIGIR 2018 2018 [Refereed]
Comparing Two Binned Probability Distributions for Information Access Evaluation

Tetsuya Sakai

Proceedings of ACM SIGIR 2018 2018 [Refereed]
CENTRE@CLEF2018: Overview of the Replicability Task

Nicola Ferro, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

CLEF 2018 Working Notes (Invited Paper) 2018 [Refereed]
Topic Set Size Design for Paired and Unpaired Data

Tetsuya Sakai

Proceedings of ACM ICTIR 2018 2018 [Refereed]
Classifying Community QA Questions That Contain an Image

Kenta Tamaki, Riku Togashi, Sosuke Kato, Sumio Fujita, Hideyuki Maeda, and Tetsuya Sakai

Proceedings of ACM ICTIR 2018 2018 [Refereed]
Ranking Mobile Search Cards Based on User Interactions in Abandoned Sessions

川崎真未, Inho Kang, Tetsuya Sakai

IPSJ Transactions on Databases (TOD) 11(3) 2018 [Refereed]
Towards Automatic Evaluation of Customer-Helpdesk Dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, and Tetsuya Sakai

Journal of Information Processing 2018 [Refereed]
Overview of CENTRE@CLEF 2018: a First Tale in the Systematic Reproducibility Realm

Nicola Ferro, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

Proceedings of CLEF 2018 (LNCS 11018) 2018 [Refereed]
Why You Should Listen to This Song: Reason Generation for Explainable Recommendation

Guoshuai Zhao, Hao Fu, Ruihua Song, Tetsuya Sakai, Xing Xie, and Xueming Qian

1st Workshop on Scalable and Applicable Recommendation Systems (SAREC 2018) 2018 [Refereed]
Understanding the Inconsistency between Behaviors and Descriptions of Mobile Apps

Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, and Tatsuya Mori

IEICE Transactions 2018 [Refereed]
Proceedings of AIRS 2018 (LNCS 11292)

Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

Editors 2018 [Refereed]
SIGIR 2017 chairs' welcome

Hang Li, Arjen P. De Vries, Ryen W. White, Noriko Kando, Tetsuya Sakai, Hideo Joho

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval iv - v 2017.08
The probability that your hypothesis is correct, credible intervals, and effect sizes for IR evaluation

Tetsuya Sakai

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 25 - 34 2017.08 [Refereed]

Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as "probability that the hypothesis is (in)correct" and "probability that the true parameter value drops within the interval is 95%," we can easily obtain P(H|D) and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just "equality of means," and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of the hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass's δ.

Citations (Scopus): 12
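
The paper obtains posterior distributions with MCMC. For the simplest paired case, a sketch can avoid MCMC altogether: assuming the per-topic score differences are normally distributed and using the standard noninformative prior p(μ, σ²) proportional to 1/σ², the posterior can be sampled directly, yielding P(μ > 0 | D), an EAP estimate, and a 95% credible interval. This direct Monte Carlo sampling is a swapped-in technique, not the paper's tools, and the effect size computed here is μ/σ of the differences, a simplification of Glass's δ, which standardises by the baseline system's standard deviation.

```python
import math
import random
import statistics
from typing import Dict, Sequence

def bayesian_paired_summary(diffs: Sequence[float], draws: int = 20000, seed: int = 0) -> Dict[str, object]:
    """Posterior summaries for the mean per-topic difference between two systems.

    Assumes diffs ~ Normal(mu, sigma^2) with prior p(mu, sigma^2) proportional to 1/sigma^2.
    Then sigma^2 | D is scaled-inverse-chi-square(n - 1, s^2), and
    mu | sigma^2, D ~ Normal(mean, sigma^2 / n).
    """
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two topics")
    mean = statistics.fmean(diffs)
    s2 = statistics.variance(diffs)  # sample variance with n - 1 in the denominator
    if s2 == 0:
        raise ValueError("all differences are identical; the posterior is degenerate")
    rng = random.Random(seed)
    mus, effects = [], []
    for _ in range(draws):
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square draw with n - 1 degrees of freedom
        sigma2 = (n - 1) * s2 / chi2
        mu = rng.gauss(mean, math.sqrt(sigma2 / n))
        mus.append(mu)
        effects.append(mu / math.sqrt(sigma2))  # standardised mean difference for this draw
    ordered = sorted(mus)
    return {
        "prob_mu_positive": sum(m > 0 for m in mus) / draws,  # estimate of P(mu > 0 | D)
        "eap_mu": statistics.fmean(mus),
        "credible_interval_95": (ordered[int(0.025 * draws)], ordered[int(0.975 * draws) - 1]),
        "eap_effect_size": statistics.fmean(effects),
    }

# Hypothetical per-topic differences (system A minus system B) in some evaluation measure.
diffs = [0.05, 0.08, 0.03, 0.06, 0.07, 0.04, 0.09, 0.05]
print(bayesian_paired_summary(diffs))
```

Unlike a p-value, `prob_mu_positive` is directly the quantity the abstract says researchers usually want, and the same posterior draws give an EAP value for any other statistic computed from (μ, σ²).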
Evaluating mobile search with height-biased gain

Cheng Luo, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 435 - 444 2017.08 [Refereed]

Mobile search engine result pages (SERPs) are becoming highly visual and heterogeneous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the results cannot be obtained if users only read part of their contents. The benefit brought by the result is affected by the user's reading behavior, and the internal gain distribution (over the height) should be modeled to get a more accurate estimation. To tackle these problems, we conduct a lab-based user study to construct a suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of users' browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain (HBG), which is calculated by summing up the product of gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.

Citations (Scopus): 22
LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency

Sosuke Kato, Riku Togashi, Hideyuki Maeda, Sumio Fujita, Tetsuya Sakai

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 1309 - 1312 2017.08 [Refereed]

Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have enabled researchers to construct robust open-domain question answering (QA) systems. It is often claimed that such state-of-the-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given massive training data plus a separate corpus of Q&A pairs as the target knowledge source, how well would such a system really perform? How fast would it respond? In this demonstration, we provide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-term memory (LSTM) architecture with over 11 million Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&A knowledge source for answer retrieval. Our core demonstration system is a pair of Japanese monolingual QA systems, but we leverage machine translation to let SIGIR attendees enter English questions and compare the Japanese responses from the two systems after translating them into English.
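
The BM25 baseline in this demonstration ranks Q&A pairs by lexical overlap with the question. Below is a textbook Okapi BM25 scorer over pre-tokenised documents, offered as a sketch. The parameters k1 = 1.2 and b = 0.75 are common defaults. The non-negative IDF variant is one of several in use. Neither choice should be read as the setting of the demonstrated system.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.

    Uses the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per doc
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Saturating term frequency with document-length normalisation
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores
```

Sorting candidate answers by these scores gives the BM25 ranking. An LSTM-based system would replace this lexical score with a learned one.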

Citations (Scopus): 5
Does document relevance affect the searcher's perception of time?

Cheng Luo, Yiqun Liu, Tetsuya Sakai, Ke Zhou, Fan Zhang, Xue Li, Shaoping Ma

WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining 141 - 150 2017.02 [Refereed]

Time plays an essential role in multiple areas of Information Retrieval (IR) studies such as search evaluation, user behavior analysis, temporal search result ranking and query understanding. In search evaluation studies in particular, time is usually adopted as a measure to quantify users' efforts in search processes. Psychological studies have reported that the time perception of human beings can be affected by many stimuli, such as attention and motivation, which are closely related to many cognitive factors in search. Since users' search experiences are affected by their subjective feelings of time, rather than the objective time measured by timing devices, it is necessary to look into the different factors that affect search users' perception of time. In this work, we take a first step towards revealing the time perception mechanism of search users with the following contributions: (1) We establish an experimental research framework to measure the subjective perception of time while reading documents in a search scenario, which originates from, but also differs from, traditional time perception measurements in psychological studies. (2) With the framework, we show that while users are reading result documents, document relevance has a small yet visible effect on search users' perception of time. By further examining the impact of other factors, we demonstrate that the effect on relevant documents can also be influenced by individuals and tasks. (3) We conduct a preliminary experiment in which the difference between perceived time and dwell time is taken into consideration in a search evaluation task. We found that the revised framework achieved a better correlation with users' satisfaction feedback. This work may help us better understand the time perception mechanism of search users and provide insights into how to better incorporate the time factor in search evaluation studies.

Citations (Scopus): 7
Investigating Users' Time Perception during Web Search

Cheng Luo, Xue Li, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, and Shaoping Ma

Proceedings of CHIIR 2017 2017 [Refereed]
Overview of Special Issue

Donna Harman and Diane Kelly (eds.), James Allan, Nicholas J. Belkin, Paul Bennett, Jamie Callan, Charles Clarke, Fernando Diaz, Susan Dumais, Nicola Ferro, Donna Harman, Djoerd Hiemstra, Ian Ruthven, Tetsuya Sakai, Mark D. Smucker, Justin Zobel

SIGIR Forum, 51(2) 2017 [Refereed]
Mobile Vertical Ranking based on Preference Graphs

Yuta Kadotami, Yasuaki Yoshida, Sumio Fujita, and Tetsuya Sakai

ACM ICTIR 2017 2017 [Refereed]
Ranking Rich Mobile Verticals based on Clicks and Abandonment

Mami Kawasaki, Inho Kang, and Tetsuya Sakai

Proceedings of ACM CIKM 2017 2017 [Refereed]
Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, and Tetsuya Sakai

Proceedings of EVIA 2017 2017 [Refereed]
Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths

Tetsuya Sakai

Proceedings of EVIA 2017 2017 [Refereed]
Towards Automatic Evaluation of Multi-Turn Dialogues: A Task Design that Leverages Inherently Subjective Annotations

Tetsuya Sakai

Proceedings of EVIA 2017 2017 [Refereed]
The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and Students

Tetsuya Sakai

Proceedings of EVIA 2017 2017 [Refereed]
Unanimity-Aware Gain for Highly Subjective Assessments

Tetsuya Sakai

Proceedings of EVIA 2017 2017 [Refereed]
RSL17BD at DBDC3: Computing Utterance Similarities based on Term Frequency and Word Embedding Vectors

Sosuke Kato and Tetsuya Sakai

Proceedings of DSTC6 2017 [Refereed]
Simple and effective approach to score standardisation

Tetsuya Sakai

ICTIR 2016 - Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval 95 - 104 2016.09 [Refereed]

Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0,1] range using a standard normal cumulative distribution function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.
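
Here is a minimal sketch of the two mappings compared in this study, applied to a topic-by-run matrix. Each raw score is first turned into a z-score using its topic's sample mean and sample standard deviation across runs. It is then mapped either through the standard normal CDF, as in Webber et al., or through a linear transformation A*z + B. The values A = 0.15 and B = 0.5 below are assumptions for illustration. The linear map is not clipped, so a few scores may fall outside [0, 1].

```python
from statistics import NormalDist, mean, stdev


def std_cdf(z):
    """Map a z-score into (0, 1) with the standard normal CDF."""
    return NormalDist().cdf(z)


def std_ab(z, a=0.15, b=0.5):
    """Linear transformation a*z + b; a and b here are illustrative assumptions."""
    return a * z + b


def standardise_matrix(matrix, transform):
    """matrix[t][r] is the raw score of run r on topic t (at least two runs).

    Each score is standardised with its own topic's sample mean and sample
    standard deviation across runs, then passed through `transform`.
    """
    out = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        # A topic on which every run scores the same carries no information;
        # map all its scores to z = 0 instead of dividing by zero.
        out.append([transform((x - mu) / sd if sd > 0 else 0.0) for x in row])
    return out
```

Both mappings preserve the within-topic ordering of runs. They differ in how strongly extreme z-scores are compressed: the CDF flattens the tails, whereas the linear map keeps distances proportional.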

Citations (Scopus): 16
Evaluating search result diversity using intent hierarchies

Xiaojie Wang, Zhicheng Dou, Tetsuya Sakai, Ji-Rong Wen

SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 415 - 424 2016.07 [Refereed]

Search result diversification aims at returning diversified document lists to cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive, and do not consider the relationships among the intents. In this paper, we introduce intent hierarchies to model the relationships among intents. Based on intent hierarchies, we propose several hierarchical measures that can consider the relationships among intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections. Our main experimental findings are: (1) Hierarchical measures are generally more discriminative and intuitive than existing measures using flat lists of intents; (2) When the queries have multilayer intent hierarchies, hierarchical measures are less correlated with existing measures, but can achieve greater improvements in discriminative power; (3) Hierarchical measures are more intuitive in terms of diversity or relevance. The hierarchical measures using the whole intent hierarchies are more intuitive than those using only the leaf nodes in terms of diversity and relevance.

Citations (Scopus): 23
Topic set size design

Tetsuya Sakai

INFORMATION RETRIEVAL JOURNAL 19 ( 3 ) 256 - 283 2016.06 [Refereed]

Traditional pooling-based information retrieval (IR) test collections typically have n = 50-100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n. The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata's three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.
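
The paired t-test variant of topic set size design admits a commonly quoted closed-form starting point. The formula is n ≈ ((z_{α/2} + z_β) / minΔ)² + z_{α/2}² / 2, where minΔ = minD / σ_t is the minimum detectable difference expressed in standard-deviation units. The sketch below assumes σ_t² = 2V, where V is the within-system variance estimate; in other words, the variance of per-topic score differences is taken to be twice V, ignoring covariance between systems. The exact procedure would go on to check the power with the noncentral t distribution and adjust n. That step is omitted here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def topic_set_size_paired_t(alpha, beta, min_d, within_var):
    """Approximate number of topics for a two-sided paired t-test.

    alpha: Type I error rate; beta: Type II error rate;
    min_d: minimum detectable difference in mean scores;
    within_var: estimated within-system variance of the evaluation measure.
    """
    z = NormalDist().inv_cdf
    z_half_alpha = z(1 - alpha / 2)
    z_beta = z(1 - beta)
    var_diff = 2 * within_var  # assumed variance of per-topic score differences
    min_delta = min_d / math.sqrt(var_diff)  # standardised minimum effect
    n = ((z_half_alpha + z_beta) / min_delta) ** 2 + z_half_alpha ** 2 / 2
    return math.ceil(n)
```

Consider hypothetical inputs α = 0.05, β = 0.20, minD = 0.10 and a within-system variance of 0.02. With these, the approximation gives 34 topics. Halving minD roughly quadruples n, which is why measures with larger variances call for substantially larger topic sets.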

Citations (Scopus): 33
On Estimating Variances for Topic Set Size Design

Tetsuya Sakai, Lifeng Shang

EVIA 2016 2016 [Refereed]
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect User Preferences?

Makoto P. Kato, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Hajime Morita

EVIA 2016 2016 [Refereed]
Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015

Tetsuya Sakai

ACM SIGIR 2016 2016 [Refereed]
Two Sample T-tests for IR Evaluation: Student or Welch?

Tetsuya Sakai

ACM SIGIR 2016 2016 [Refereed]
Report on the First International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2015)

Laure Soulier, Lynda Tamine, Tetsuya Sakai, Leif Azzopardi, and Jeremy Pickens

ACM ICTIR 2016 2016 [Refereed]
Topic Set Size Design and Power Analysis in Practice (Tutorial Abstract)

Tetsuya Sakai

ACM ICTIR 2016 2016 [Refereed]
The Effect of Score Standardisation on Topic Set Size Design

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2016 9994 16 - 28 2016 [Refereed]

Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic set size design, std-AB suppresses score variances and thereby enables test collection builders to consider realistic choices of topic set sizes, and to handle unnormalised measures in the same way as normalised measures. In addition, even discrete measures that clearly violate normality assumptions look more continuous after applying std-AB, which may make them more suitable for statistically motivated topic set size design. Our experiments cover a variety of tasks and evaluation measures from NTCIR-12.

Citations (Scopus): 2
Search result diversification based on hierarchical intents

Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen

International Conference on Information and Knowledge Management, Proceedings 19-23- 63 - 72 2015.10 [Refereed]

A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information needs as possible. Most existing intent-aware search result diversification algorithms formulate user intents for a query as a flat list of subtopics. In this paper, we introduce a new hierarchical structure to represent user intents and propose two general hierarchical diversification models to leverage hierarchical intents. Experimental results show that our hierarchical diversification models outperform state-of-the-art diversification methods that use traditional flat subtopics.

Citations (Scopus): 70
Dynamic author name disambiguation for growing digital libraries

Yanan Qian, Qinghua Zheng, Tetsuya Sakai, Junting Ye, Jun Liu

INFORMATION RETRIEVAL 18 ( 5 ) 379 - 412 2015.10 [Refereed]

When a digital library user searches for publications by an author name, she often sees a mixture of publications by different authors who have the same name. With the growth of digital libraries and involvement of more authors, this author ambiguity problem is becoming critical. Author disambiguation (AD) often tries to solve this problem by leveraging metadata such as coauthors, research topics, publication venues and citation information, since more personal information such as the contact details is often restricted or missing. In this paper, we study the problem of how to efficiently disambiguate author names given an incessant stream of published papers. To this end, we propose a "BatchAD+IncAD" framework for dynamic author disambiguation. First, we perform batch author disambiguation (BatchAD) to disambiguate all author names at a given time by grouping all records (each record refers to a paper with one of its author names) into disjoint clusters. This establishes a one-to-one mapping between the clusters and real-world authors. Then, for newly added papers, we periodically perform incremental author disambiguation (IncAD), which determines whether each new record can be assigned to an existing cluster, or to a new cluster not yet included in the previous data. Based on the new data, IncAD also tries to correct previous AD results. Our main contributions are: (1) We demonstrate with real data that a small number of new papers often have overlapping author names with a large portion of existing papers, so it is challenging for IncAD to effectively leverage previous AD results. (2) We propose a novel IncAD model which aggregates metadata from a cluster of records to estimate the author's profile such as her coauthor distributions and keyword distributions, in order to predict how likely it is that a new record is "produced" by the author. 
(3) Using two labeled datasets and one large-scale raw dataset, we show that the proposed method is much more efficient than state-of-the-art methods while ensuring high accuracy.
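
The core of the IncAD step estimates an author profile from a cluster of records and scores how likely a new record is to have been "produced" by that author. A naive-Bayes-style sketch of this idea follows. It is hypothetical and is not the paper's model. Coauthor and keyword counts are smoothed with add-one smoothing over an assumed vocabulary size. Each record is scored by its mean log-probability. A record whose best score falls below a hypothetical threshold opens a new cluster.

```python
import math
from collections import Counter


class AuthorProfile:
    """Hypothetical author profile built from one cluster of records."""

    def __init__(self):
        self.coauthors = Counter()
        self.keywords = Counter()

    def add(self, record):
        """Aggregate one record's metadata into the profile."""
        self.coauthors.update(record["coauthors"])
        self.keywords.update(record["keywords"])

    def avg_log_likelihood(self, record, vocab_size):
        """Mean add-one-smoothed log-probability of the record's metadata."""
        logs = []
        for items, counts in ((record["coauthors"], self.coauthors),
                              (record["keywords"], self.keywords)):
            total = sum(counts.values())
            logs.extend(math.log((counts[x] + 1) / (total + vocab_size)) for x in items)
        return sum(logs) / len(logs) if logs else -math.inf


def assign(record, profiles, vocab_size, new_cluster_threshold):
    """Index of the most likely existing cluster, or None to open a new one."""
    best, best_ll = None, -math.inf
    for i, profile in enumerate(profiles):
        ll = profile.avg_log_likelihood(record, vocab_size)
        if ll > best_ll:
            best, best_ll = i, ll
    return best if best_ll >= new_cluster_threshold else None
```

Averaging the log-probabilities rather than summing them keeps a single threshold usable for records with different numbers of coauthors and keywords.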

Citations (Scopus): 45
Understanding the Inconsistencies between Text Descriptions and the Use of Privacy-sensitive Resources of Mobile Apps

Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

SOUPS 2015 2015 [Refereed]
Topic Set Size Design with the Evaluation Measures for Short Text Conversation

Tetsuya Sakai, Lifeng Shang, Zhengdong Lu, Hang Li

INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2015 9460 319 - 331 2015 [Refereed]

Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.
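
For reference, here is a sketch of the standard definitions of nERR@10 and nG@1 computed from graded relevance. It uses the usual ERR cascade stopping probability (2^g - 1) / 2^gmax. It assumes grades 0 to 2, and for nG@1 it uses those grades directly as gains. The official STC gain values may differ, and P+ is not covered.

```python
def err_at_k(grades, k, max_grade):
    """Expected Reciprocal Rank over the top-k relevance grades."""
    err, p_reach = 0.0, 1.0  # p_reach: probability the user reaches this rank
    for rank, g in enumerate(grades[:k], start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade  # satisfaction probability at this rank
        err += p_reach * p_stop / rank
        p_reach *= 1 - p_stop
    return err


def nerr_at_k(run_grades, judged_grades, k=10, max_grade=2):
    """ERR@k normalised by the ERR@k of an ideal ranking of all judged items."""
    ideal = err_at_k(sorted(judged_grades, reverse=True), k, max_grade)
    return err_at_k(run_grades, k, max_grade) / ideal if ideal > 0 else 0.0


def ng_at_1(run_grades, judged_grades):
    """Gain of the top item over the best available gain (grade used as gain)."""
    best = max(judged_grades, default=0)
    return run_grades[0] / best if run_grades and best > 0 else 0.0
```

nG@1 looks only at the first comment, so it is a coarser, higher-variance measure than nERR@10. This is consistent with the observation that it needs more topics under the same statistical requirements.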

Citations (Scopus): 4
Designing test collections for comparing many systems

Tetsuya Sakai

CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management 61 - 70 2014.11 [Refereed]

A researcher decides to build a test collection for comparing her new information retrieval (IR) systems with several state-of-the-art baselines. She wants to know the number of topics (n) she needs to create in advance, so that she can start looking for (say) a query log large enough for sampling n good topics, and estimating the relevance assessment cost. We provide practical solutions to researchers like her using power analysis and sample size design techniques, and demonstrate their usefulness for several IR tasks and evaluation measures. We consider not only the paired t-test but also one-way analysis of variance (ANOVA) for significance testing to accommodate comparison of m(≥ 2) systems under a given set of statistical requirements (α: the Type I error rate, β: the Type II error rate, and minD: the minimum detectable difference between the best and the worst systems). Using our simple Excel tools and some pooled variance estimates from past data, researchers can build statistically well-designed test collections. We demonstrate that, as different evaluation measures have different variances across topics, they inevitably require different topic set sizes. This suggests that the evaluation measures should be chosen at the test collection design phase. Moreover, through a pool depth reduction experiment with past data, we show how the relevance assessment cost can be reduced dramatically while freezing the set of statistical requirements. Based on the cost analysis and the available budget, researchers can determine the right balance between n and the pool depth pd. Our techniques and tools are applicable to test collections for non-IR tasks as well.

Citations (Scopus): 13
Metrics, Statistics, Tests (invited paper)

Tetsuya Sakai

PROMISE Winter School 2013: Bridging between Information Retrieval and Databases (LNCS 8174) 2014 [Refereed]
Statistical Reform in Information Retrieval?

Tetsuya Sakai

SIGIR Forum 2014 [Refereed]
Designing Test Collections That Provide Tight Confidence Intervals

Tetsuya Sakai

Forum on Information Technology 2014 13 ( 2 ) 15 - 18 2014 [Refereed]

ReviewCollage: A Mobile Interface for Direct Comparison Using Online Reviews

Haojian Jin, Tetsuya Sakai, Koji Yatani

PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION WITH MOBILE DEVICES AND SERVICES (MOBILEHCI'14) 349 - 358 2014 [Refereed]

Review comments posted on websites can help the user decide which product to purchase or which place to visit. They can also be useful for closely comparing a couple of candidate entities. However, the user may have to read different webpages back and forth for comparison, and this is not desirable particularly when she is using a mobile device. We present ReviewCollage, a mobile interface that aggregates information about two reviewed entities in a one-page view. ReviewCollage uses attribute-value pairs, known to be effective for review text summarization, and highlights the similarities and differences between the entities. Our user study confirms that ReviewCollage can support the user in comparing two entities and making a decision within a couple of minutes, at least as quickly as existing summarization interfaces. It also reveals that ReviewCollage could be most useful when two entities are very similar.

Citations (Scopus): 6
Topic Set Size Design with Variance Estimates from Two-Way ANOVA

Tetsuya Sakai

EVIA 2014 2014 [Refereed]
When do people use query suggestion? A query suggestion log analysis

Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

INFORMATION RETRIEVAL 16 ( 6 ) 725 - 746 2013.12 [Refereed]

Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher's current state.

Citations (Scopus): 33
Introduction to the special issue on search intents and diversification

Tetsuya Sakai, Noriko Kando, Craig Macdonald, Ian Soboroff

INFORMATION RETRIEVAL 16 ( 4 ) 427 - 428 2013.08 [Refereed]

Citations (Scopus): 1
Diversified search evaluation: lessons from the NTCIR-9 INTENT task

Tetsuya Sakai, Ruihua Song

INFORMATION RETRIEVAL 16 ( 4 ) 504 - 529 2013.08 [Refereed]

The evaluation of diversified web search results is a relatively new research topic and is not as well understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The evaluation framework used at NTCIR provides more "intuitive" and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for α-nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.
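
The NTCIR framework discussed above combines intent popularity P(i|q) with per-intent graded relevance. Here is a minimal sketch of D#-nDCG under common assumptions. Each document's global gain is the sum over intents of P(i|q) times its per-intent gain. D-nDCG is nDCG over global gains with a log2(r + 1) discount. I-rec is the fraction of intents covered in the top l. Finally, D#-nDCG = γ * I-rec + (1 - γ) * D-nDCG, with γ = 0.5 assumed. The input formats are illustrative.

```python
import math


def d_sharp_ndcg(ranked_docs, intent_probs, qrels, cutoff=10, gamma=0.5):
    """Return (D#-nDCG, I-rec, D-nDCG) at the given cutoff.

    ranked_docs:  document IDs in ranked order.
    intent_probs: {intent: P(i|q)}, the intent popularity.
    qrels:        {intent: {doc: per-intent graded gain}}.
    """
    def global_gain(doc):
        # Popularity-weighted sum of per-intent gains.
        return sum(p * qrels.get(i, {}).get(doc, 0) for i, p in intent_probs.items())

    def dcg(gains):
        return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:cutoff], start=1))

    judged = {d for rel in qrels.values() for d in rel}
    ideal = dcg(sorted((global_gain(d) for d in judged), reverse=True))
    d_ndcg = dcg([global_gain(d) for d in ranked_docs]) / ideal if ideal > 0 else 0.0

    # I-rec: fraction of intents with at least one relevant document in the top ranks.
    top = ranked_docs[:cutoff]
    covered = {i for i in intent_probs if any(qrels.get(i, {}).get(d, 0) > 0 for d in top)}
    i_rec = len(covered) / len(intent_probs) if intent_probs else 0.0

    return gamma * i_rec + (1 - gamma) * d_ndcg, i_rec, d_ndcg
```

I-rec rewards covering many intents, while D-nDCG rewards placing documents that are highly relevant to popular intents near the top. Their mixture is one way of balancing diversity and relevance.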

Citations (Scopus): 15
Web Search Evaluation with Informational and Navigational Intents

Tetsuya Sakai

Journal of Information Processing 2013 [Refereed]
The Unreusability of Diversified Test Collections

Tetsuya Sakai

EVIA 2013 2013 [Refereed]
Summaries, Ranked Retrieval and Sessions: A Unified Framework for Information Access Evaluation

Tetsuya Sakai and Zhicheng Dou

ACM SIGIR 2013 2013 [Refereed]
Exploring semi-automatic nugget extraction for Japanese one click access evaluation

Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 749 - 752 2013 [Refereed]

Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually extracted and semi-automatically extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text.

Citations (Scopus): 3
Report from the NTCIR-10 1CLICK-2 Japanese subtask: Baselines, upperbounds and evaluation robustness

Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 753 - 756 2013 [Refereed]

The One Click Access Task (1CLICK) of NTCIR requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context. Systems are evaluated based on information units (or iUnits), and are required to present important pieces of information first and to minimise the amount of text the user has to read. Using the official Japanese results of the second round of the 1CLICK task from NTCIR-10, we discuss our task setting and evaluation framework. Our analyses show that: (1) Simple baseline methods that leverage search engine snippets or Wikipedia are effective for "lookup" type queries but not necessarily for other query types; (2) There is still a substantial gap between manual and automatic runs; and (3) Our evaluation metrics are relatively robust to the incompleteness of iUnits.

Citations (Scopus): 4
Summary of the NTCIR-10 INTENT-2 Task: Subtopic mining and search result diversification

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Makoto P. Kato

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 761 - 764 2013 [Refereed]

The NTCIR INTENT task comprises two subtasks: Subtopic Mining, where systems are required to return a ranked list of subtopic strings for each given query; and Document Ranking, where systems are required to return a diversified web search result for each given query. This paper summarises the novel features of the Second INTENT task at NTCIR-10 and its main findings, and poses some questions for future diversified search evaluation.

Citations (Scopus): 14
Time-aware structured query suggestion

Taiki Miyanishi, Tetsuya Sakai

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 809 - 812 2013 [Refereed]

Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at different time periods in the past, existing query suggestion methods neither utilize nor present such information. In this study, we propose Time-aware Structured Query Suggestion (TaSQS) which clusters query suggestions along a timeline so that the user can narrow down his search from a temporal point of view. Moreover, when a suggested query is clicked, TaSQS presents web pages from query-URL bipartite graphs after ranking them according to the click counts within a particular time period. Our experiments using data from a commercial search engine log show that the time-aware clustering and the time-aware document ranking features of TaSQS are both effective.
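
The time-aware document ranking step can be illustrated with a click log. For a clicked suggestion, count the clicks on each URL within the chosen period (the query-URL edges of the bipartite graph restricted to that period) and sort the URLs by that count. The log format (query, URL, timestamp) and the function name are hypothetical. The timeline clustering of suggestions is not shown.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def rank_urls_in_period(
    click_log: Iterable[Tuple[str, str, datetime]],
    query: str,
    start: datetime,
    end: datetime,
) -> List[str]:
    """Rank the URLs clicked for `query` within [start, end) by click count."""
    counts = Counter(
        url for q, url, ts in click_log if q == query and start <= ts < end
    )
    # most_common() orders URLs by descending click count
    return [url for url, _ in counts.most_common()]
```

The same query can then yield different document rankings for different periods, which is the behaviour the time-aware ranking feature relies on.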

Citations (Scopus): 17
The Impact of Intent Selection on Diversified Search Evaluation

Tetsuya Sakai, Zhicheng Dou, Charles L.A. Clarke

ACM SIGIR 2013 2013 [Refereed]
Evaluating Heterogeneous Information Access (Position paper)

Ke Zhou, Tetsuya Sakai, Mounia Lalmas, Zhicheng Dou, and Joemon M. Jose

Workshop on Modeling User Behavior for Information Access Evaluation 2013 [Refereed]
Mining Search Intents from Text Fragments

Qinglei Wang, Yanan Qian, Ruihua Song, Zhicheng Dou, Fan Zhang, Tetsuya Sakai, and Qinghua Zheng

Information Retrieval 2013 [Refereed]
On the reliability and intuitiveness of aggregated search metrics

Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins, Joemon M. Jose

International Conference on Information and Knowledge Management, Proceedings 689 - 698 2013 [Refereed]

Aggregating search results from a variety of diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the various key aspects to be measured (i.e. various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new light on the further development and application of AS metrics.
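
Discriminative power is usually measured by running a significance test on every pair of systems and reporting the fraction of pairs found significantly different. The sketch below uses a paired sign-flip randomisation test without any multiple-comparison correction. Published studies often use paired bootstrap or randomised Tukey HSD tests instead, so the test choice, the trial count, and the function names here are assumptions.

```python
import itertools
import random


def randomisation_p_value(x, y, trials=2000, seed=0):
    """Two-sided paired sign-flip randomisation test on per-topic scores."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(x, y)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis, each topic's difference is equally likely
        # to have either sign.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


def discriminative_power(runs, alpha=0.05, **test_kwargs):
    """Fraction of run pairs whose difference is significant at level alpha.

    runs: {run_name: [score on topic 1, score on topic 2, ...]}
    """
    pairs = list(itertools.combinations(runs, 2))
    if not pairs:
        return 0.0
    significant = sum(
        randomisation_p_value(runs[a], runs[b], **test_kwargs) < alpha for a, b in pairs
    )
    return significant / len(pairs)
```

Computing this for every candidate metric over the same set of runs and topics gives a comparable reliability figure. A higher value means the metric separates more system pairs at the chosen significance level.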

Citations (Scopus): 15
Dynamic query intent mining from a search log stream

Yanan Qian, Tetsuya Sakai, Junting Ye, Qinghua Zheng, Cong Li

International Conference on Information and Knowledge Management, Proceedings 1205 - 1208 2013 [Refereed]

It has long been recognized that search queries are often broad and ambiguous. Even when submitting the same query, different users may have different search intents. Moreover, the intents are dynamically evolving. Some intents are constantly popular with users, while others are more bursty. We propose a method for mining dynamic query intents from search query logs. By regarding the query logs as a data stream, we identify constant intents while quickly capturing new bursty intents. To evaluate the accuracy and efficiency of our method, we conducted experiments using 50 topics from the NTCIR-9 INTENT data and five additional popular topics, all supplemented with six-month query logs from a commercial search engine. Our results show that our method can accurately capture new intents with short response time.

DOI

Scopus

17

Citation

(Scopus)
How intuitive are diversified search metrics? Concordance test results for the diversity U-measures

Tetsuya Sakai

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 13 - 24 2013 [Refereed]

Most of the existing Information Retrieval (IR) metrics discount the value of each retrieved relevant document based on its rank. This statement also applies to the evaluation of diversified search: the widely-used diversity metrics, namely, α-nDCG, Intent-Aware Expected Reciprocal Rank (ERR-IA) and D#-nDCG, are all rank-based. These evaluation metrics regard the system output as a list of document IDs, and ignore all other features such as snippets and document full texts of various lengths. In contrast, the U-measure framework of Sakai and Dou uses the amount of text read by the user as the foundation for discounting the value of relevant information, and can take into account the user's snippet reading and full text reading behaviours. The present study compares the diversity versions of U-measure (D-U and U-IA) with the state-of-the-art diversity metrics using the concordance test: given a pair of ranked lists, we quantify the ability of each metric to favour the more diversified and more relevant list. Our results show that while D#-nDCG is the overall winner in terms of simultaneous concordance with diversity and relevance, D-U and U-IA statistically significantly outperform other state-of-the-art metrics. Moreover, in terms of concordance with relevance alone, D-U and U-IA significantly outperform all rank-based diversity metrics. Thus, D-U and U-IA are not only more realistic but also more relevance-oriented than other diversity metrics. © 2013 Springer-Verlag.
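The abstract above names D#-nDCG among the rank-based diversity metrics. The following is a minimal Python sketch of one common formulation, not the official NTCIREVAL implementation: the global gain of a document is assumed to be GG(d) = Σ_i P(i|q) g_i(d), D-nDCG@l is nDCG computed over global gains with an assumed 1/log2(r+1) discount, I-rec@l is the fraction of intents covered in the top l, and D#-nDCG = γ·I-rec + (1 − γ)·D-nDCG with an assumed γ = 0.5. The function names and data layout are hypothetical.

```python
import math
from typing import Dict, List


def global_gain(doc: str, intent_probs: Dict[str, float],
                gains: Dict[str, Dict[str, float]]) -> float:
    # GG(d) = sum_i P(i|q) * g_i(d); gains[intent][doc] is the per-intent gain.
    return sum(p * gains.get(i, {}).get(doc, 0.0) for i, p in intent_probs.items())


def d_sharp_ndcg(ranking: List[str], intent_probs: Dict[str, float],
                 gains: Dict[str, Dict[str, float]],
                 l: int = 10, gamma: float = 0.5) -> float:
    top = ranking[:l]
    # D-nDCG@l: nDCG over global gains (log2(r+1) discount is an assumption).
    dcg = sum(global_gain(d, intent_probs, gains) / math.log2(r + 1)
              for r, d in enumerate(top, start=1))
    # Ideal list: every judged document sorted by global gain.
    judged = {d for per_intent in gains.values() for d in per_intent}
    ideal = sorted((global_gain(d, intent_probs, gains) for d in judged),
                   reverse=True)[:l]
    idcg = sum(g / math.log2(r + 1) for r, g in enumerate(ideal, start=1))
    d_ndcg = dcg / idcg if idcg > 0 else 0.0
    # I-rec@l: fraction of intents with at least one relevant document in the top l.
    covered = sum(1 for i in intent_probs
                  if any(gains.get(i, {}).get(d, 0.0) > 0 for d in top))
    i_rec = covered / len(intent_probs) if intent_probs else 0.0
    return gamma * i_rec + (1 - gamma) * d_ndcg
```

With two equally likely intents, a list that covers both scores higher than one that covers only one of them, because both the I-rec and the D-nDCG components drop.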

DOI

Scopus

6

Citation

(Scopus)
User-aware advertisability

Hai-Tao Yu, Tetsuya Sakai

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 452 - 463 2013 [Refereed]

In sponsored search, many studies focus on finding the most relevant advertisements (ads) and their optimal ranking for a submitted query. Determining whether it is suitable to show ads has received less attention. In this paper, we introduce the concept of user-aware advertisability, which refers to the probability of ad-click on sponsored ads when a specific user submits a query. When computing the advertisability for a given query-user pair, we first classify the clicked web pages based on a pre-defined category hierarchy and use the aggregated topical categories of clicked web pages to represent user preference. Taking user preference into account, we then compute the ad-click probability for this query-user pair. Compared with existing methods, the experimental results show that user preference is of great value for generating user-specific advertisability. In particular, our approach that computes advertisability per query-user pair outperforms the two state-of-the-art methods that compute advertisability per query in terms of a variant of the normalized Discounted Cumulative Gain metric. © 2013 Springer-Verlag.

DOI

Scopus
Estimating intent types for search result diversification

Kosetsu Tsukuda, Tetsuya Sakai, Zhicheng Dou, Katsumi Tanaka

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 25 - 37 2013 [Refereed]

Given an ambiguous or underspecified query, search result diversification aims at accommodating different user intents within a single Search Engine Result Page (SERP). While automatic identification of different intents for a given query is a crucial step for result diversification, estimating intent types (informational vs. navigational) is also important. If it is possible to distinguish between informational and navigational intents, search engines can aim to return one best URL for each navigational intent, while allocating more space to the informational intents within the SERP. In light of these observations, we propose a new framework for search result diversification that is both intent importance-aware and intent type-aware. Our experiments using the NTCIR-9 INTENT Japanese Subtopic Mining and Document Ranking test collections show that: (a) our intent type estimation method for Japanese achieves 64.4% accuracy; and (b) our proposed diversification method achieves 0.6373 in D#-nDCG and 0.5898 in DIN#-nDCG over 56 topics, which are statistically significant gains over the top performers of the NTCIR-9 INTENT Japanese Document Ranking runs. Moreover, our relevance-oriented model significantly outperforms our diversity-oriented model and the original model by Dou et al. © 2013 Springer-Verlag.

DOI

Scopus

6

Citation

(Scopus)
On labelling intent types for evaluating search result diversification

Tetsuya Sakai, Young-In Song

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 38 - 49 2013 [Refereed]

Search result diversification is important for accommodating different user needs by means of covering popular and diverse query intents within a single result page. To evaluate diversity, we believe that it is important to consider the distinction between informational and navigational intents, as users would not want redundant information especially for navigational intents. In this study, we conduct intent type-sensitive diversity evaluation based on both top-down labelling, which labels each intent as either navigational or informational a priori, and bottom-up labelling, which labels each intent based on whether a "navigational relevant" document has actually been identified in the document collection. Our results suggest that reliable type-sensitive diversity evaluation can be conducted using the top-down approach with a clear intent labelling guideline, while ensuring that the desired URLs for navigational intents make their way into relevance assessments. © 2013 Springer-Verlag.

DOI

Scopus

1

Citation

(Scopus)
Query snowball: A co-occurrence-based approach to multi-document summarization for question answering

Hajime Morita, Tetsuya Sakai, Manabu Okumura

IPSJ Online Transactions 5 ( 2012 ) 124 - 129 2012 [Refereed]

We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.
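The knapsack-constrained maximum coverage formulation in this abstract can be illustrated with the standard cost-effectiveness greedy heuristic. This is a sketch under stated assumptions, not the authors' solver: a sentence's cost is its length in words, the coverage units are unordered word pairs co-occurring in a sentence, and the pair weights (which the paper derives from its co-occurrence graph) are simply supplied by the caller. `word_pairs` and `greedy_summary` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Pair = FrozenSet[str]


def word_pairs(sentence: str) -> Set[Pair]:
    # Hypothetical coverage unit: unordered pairs of distinct words in one sentence.
    words = sorted(set(sentence.lower().split()))
    return {frozenset(p) for p in combinations(words, 2)}


def greedy_summary(sentences: List[str], weights: Dict[Pair, float],
                   budget: int) -> List[str]:
    covered: Set[Pair] = set()
    chosen: List[str] = []
    used = 0
    remaining = list(sentences)
    while remaining:
        best, best_ratio = None, 0.0
        for s in remaining:
            cost = len(s.split())
            if cost == 0 or used + cost > budget:
                continue  # sentence does not fit in the remaining length budget
            # Only pairs not yet covered contribute new weight.
            gain = sum(weights.get(p, 0.0) for p in word_pairs(s) - covered)
            if gain / cost > best_ratio:
                best, best_ratio = s, gain / cost
        if best is None:
            break  # nothing fits, or nothing adds positive weight
        chosen.append(best)
        used += len(best.split())
        covered |= word_pairs(best)
        remaining.remove(best)
    return chosen
```

On its own, this ratio-based greedy rule can perform poorly in the worst case; a common fix is to also consider the best single sentence that fits the budget and keep whichever selection covers more weight.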

DOI

Scopus
Evaluation with Informational and Navigational Intents

Tetsuya Sakai

WWW 2012 2012 [Refereed]
Structured query suggestion for specialization and parallel movement: Effect on search behaviors

Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web 389 - 398 2012 [Refereed]

Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from "nikon" to "nikon camera") and parallel movement (e.g. from "nikon camera" to "canon camera"). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a label for each category. Moreover, SParQS presents some new entities as alternatives to the original query (e.g. "canon" in response to the query "nikon"), together with their query suggestions classified in the same way as the original query's suggestions. We conducted a task-based user study to compare SParQS with a traditional "flat list" query suggestion interface. Our results show that the SParQS interface enables subjects to search more successfully than the flat list case, even though query suggestions presented were exactly the same in the two interfaces. In addition, the subjects found the query suggestions more helpful when they were presented in the SParQS interface rather than in a flat list.

DOI

Scopus

30

Citation

(Scopus)
AspecTiles: Tile-based visualization of diversified web search results

Mayu Iwata, Tetsuya Sakai, Takehiro Yamamoto, Yu Chen, Yi Liu, Ji-Rong Wen, Shojiro Nishio

SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval 85 - 94 2012 [Refereed]

A diversified search result for an underspecified query generally contains web pages in which there are answers that are relevant to different aspects of the query. In order to help the user locate such relevant answers, we propose a simple extension to the standard Search Engine Result Page (SERP) interface, called AspecTiles. In addition to presenting a ranked list of URLs with their titles and snippets, AspecTiles visualizes the relevance degree of a document to each aspect by means of colored squares ("tiles"). To compare AspecTiles with the standard SERP interface in terms of usefulness, we conducted a user study with 32 participants and 30 search tasks designed based on the TREC web diversity task topics. Our results show that AspecTiles has some advantages in terms of search performance, user behavior, and user satisfaction. First, AspecTiles enables the user to gather relevant information significantly more efficiently than the standard SERP interface for tasks where the user considers several different aspects of the query to be important at the same time (multi-aspect tasks). Second, AspecTiles affects the user's information seeking behavior: with this interface, we observed significantly fewer query reformulations, shorter queries and deeper examinations of ranked lists in multi-aspect tasks. Third, participants found AspecTiles significantly more useful for finding relevant information, and easier to use, than the standard SERP interface. These results suggest that simple interfaces like AspecTiles can enhance the search performance and search experience of the user when their queries are underspecified. © 2012 ACM.

DOI

Scopus

12

Citation

(Scopus)
Towards Zero-Click Mobile IR Evaluation: Knowing What and Knowing When

Tetsuya Sakai

ACM SIGIR 2012 2012 [Refereed]
New Assessment Criteria for Query Suggestion

Zhongrui Ma, Yu Chen, Ruihua Song, Tetsuya Sakai, Jiaheng Lu, and Ji-Rong Wen

ACM SIGIR 2012 2012 [Refereed]
The wisdom of advertisers: Mining subgoals via query clustering

Takehiro Yamamoto, Tetsuya Sakai, Mayu Iwata, Chen Yu, Ji-Rong Wen, Katsumi Tanaka

ACM International Conference Proceeding Series 505 - 514 2012 [Refereed]

This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall. © 2012 ACM.
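Purity, one of the clustering measures listed above, can be sketched as follows, assuming each query is mapped to exactly one predicted cluster and one gold subgoal label: every cluster is credited with its majority gold label, and purity is the fraction of queries that agree with their cluster's majority. The function name, query IDs and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable


def purity(clusters: Dict[str, Hashable], gold: Dict[str, Hashable]) -> float:
    # clusters: query -> predicted cluster id; gold: query -> gold subgoal label.
    by_cluster: Dict[Hashable, Counter] = {}
    for query, cluster_id in clusters.items():
        by_cluster.setdefault(cluster_id, Counter())[gold[query]] += 1
    # Each cluster contributes the size of its largest gold-label group.
    majority_total = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority_total / len(clusters) if clusters else 0.0
```

For example, clustering {"q1", "q2"} and {"q3", "q4"} when the gold labels are flights, hotel, hotel, hotel gives a purity of 3/4, because only q1 disagrees with its cluster's majority.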

DOI

Scopus

7

Citation

(Scopus)
The reusability of a diversified search test collection

Tetsuya Sakai, Zhicheng Dou, Ruihua Song, Noriko Kando

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7675 26 - 38 2012 [Refereed]

Traditional ad hoc IR test collections were built using a relatively large pool depth (e.g. 100), and are usually assumed to be reusable. Moreover, when they are reused to compare a new system with another or with systems that contributed to the pools ("contributors"), an even larger measurement depth (e.g. 1,000) is often used for computing evaluation metrics. In contrast, the web diversity test collections that have been created in the past few years at TREC and NTCIR use a much smaller pool depth (e.g. 20). The measurement depth is also small (e.g. 10-30), as search result diversification is primarily intended for the first result page. In this study, we examine the reusability of a typical web diversity test collection, namely, one from the NTCIR-9 INTENT-1 Chinese Document Ranking task, which used a pool depth of 20 and official measurement depths of 10, 20 and 30. First, we conducted additional relevance assessments to expand the official INTENT-1 collection to achieve a pool depth of 40. Using the expanded relevance assessments, we show that run rankings at the measurement depth of 30 are too unreliable, given that the pool depth is 20. Second, we conducted a leave-one-out experiment for every participating team of the INTENT-1 Chinese task, to examine how (un)fairly new runs are evaluated with the INTENT-1 collection. We show that, for the purpose of comparing new systems with the contributors of the test collection being used, condensed-list versions of existing diversity evaluation metrics are more reliable than the raw metrics. However, even the condensed-list metrics may be unreliable if the new systems are not competitive compared to the contributors. © Springer-Verlag 2012.

DOI

Scopus

5

Citation

(Scopus)
One click one revisited: Enhancing evaluation based on information units

Tetsuya Sakai, Makoto P. Kato

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7675 39 - 51 2012 [Refereed]

This paper extends the evaluation framework of the NTCIR-9 One Click Access Task (1CLICK-1), which required systems to return a single, concise textual output in response to a query in order to satisfy the user immediately after a click on the SEARCH button. Unlike traditional nugget-based summarisation and question answering evaluation methods, S-measure, the official evaluation metric of 1CLICK-1, discounts the value of each information unit based on its position within the textual output. We first show that the discount parameter L of S-measure affects system ranking and discriminative power, and that using multiple values, e.g. L = 250 (user has only 30 seconds to view the text) and L = 500 (user has one minute), is beneficial. We then complement the recall-like S-measure with a simple, precision-like metric called T-measure as well as a combination of S-measure and T-measure, called S#. We show that S# with a heavy emphasis on S-measure imposes an appropriate length penalty to 1CLICK-1 system outputs and yet achieves discriminative power that is comparable to S-measure. These new metrics will be used at NTCIR-10 1CLICK-2. © Springer-Verlag 2012.

DOI

Scopus

3

Citation

(Scopus)
Grid-based interaction for exploratory search

Hideo Joho, Tetsuya Sakai

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7675 496 - 505 2012 [Refereed]

This paper presents a grid-based interaction model that is designed to encourage searchers to organize a complex search space by managing n × m subspaces. A search interface was developed based on the proposed interaction model, and its performance was evaluated by a user study carried out in the context of the NTCIR-9 VisEx Task. Compared with a baseline system, the proposed interface led to cases in which subjects discovered new knowledge without accessing external resources. These encouraging experimental results warrant further studies on the model. © Springer-Verlag 2012.

DOI

Scopus
Using graded-relevance metrics for evaluating community QA answer selection

Tetsuya Sakai, Yohei Seki, Daisuke Ishikawa, Kazuko Kuriyama, Noriko Kando, Chin-Yew Lin

Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 187 - 196 2011 [Refereed]

Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments. Copyright 2011 ACM.

DOI

Scopus

36

Citation

(Scopus)
Query Session Data vs. Clickthrough Data as Query Suggestion Resources

Makoto P. Kato, Tetsuya Sakai, and Katsumi Tanaka

ECIR 2011 Workshop on Session Information Retrieval 2011 [Refereed]
Challenges in Diversity Evaluation (keynote)

Tetsuya Sakai

ECIR 2011 Workshop on Diversity in Document Retrieval 2011 [Refereed]
Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

Naoyoshi Aikawa, Tetsuya Sakai, and Hayato Yamana

IPSJ Journal 2011 ( 1 ) 1 - 9 2011 [Refereed]

CiNii
Evaluating Diversified Search Results Using Per-Intent Graded Relevance

Tetsuya Sakai and Ruihua Song

ACM SIGIR 2011 2011 [Refereed]
NTCIREVAL: A Generic Toolkit for Information Access Evaluation

Tetsuya Sakai

FIT 2011 2011 [Refereed]
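Automatic Prediction of High-Quality Answers in Community QA

Daisuke Ishikawa, Tetsuya Sakai, Yohei Seki, Kazuko Kuriyama, Noriko Kando

Journal of Japan Society of Information and Knowledge 2011 [Refereed]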
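From Microsoft Research in Beijing, 2011: Success Stories of Japanese Interns

Tetsuya Sakai

Digest of the Industry-Academia Collaborative GCOE Domestic Symposium for Supporting Young Researchers 2011 [Refereed]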
What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria

Daisuke Ishikawa, Noriko Kando, and Tetsuya Sakai

EVIA2011 2011 [Refereed]
Analysis of Best-Answer Estimation for a Q&A Site and its Application to Machine Learning

Daisuke Ishikawa, Kazuko Kuriyama, Tetsuya Sakai, Yohei Seki, Noriko Kando

Journal of Japan Society of Information and Knowledge 20 ( 2 ) 73 - 85 2010.05

In this research, we investigated whether a computer could estimate the best answer on a Q&A site. First, a best-answer estimation experiment was carried out with human assessors. Data from Yahoo! Chiebukuro were used for the experiment: 50 questions, extracted at random from four categories, viz., "Consultation of love," "Personal computer," "General knowledge," and "Politics," were used. The accuracy rate (precision) of the estimation by the two assessors was 50% and 52% (random estimation: 34%) for "Consultation of love," 62% and 58% (random estimation: 38%) for "Personal computer," 54% and 56% (random estimation: 37%) for "General knowledge," and 56% and 60% (random estimation: 35.8%) for "Politics." Next, the experimental results were analyzed, and a machine learning system that uses "Detailed," "Evidence," and "Polite" as features for choosing the best answer was constructed. The precision of the machine learning system exceeded the assessors' results in the "Personal computer" category (67%) and fell below them in the "Consultation of love" category (41%). In the "General knowledge" and "Politics" categories, the precision of the machine learning system was almost equal to the assessors' results.

DOI CiNii
Boiling Down Information Retrieval Test Collections

Tetsuya Sakai, Teruko Mitamura

RIAO 2010 Proceedings 2010 [Refereed]
Constructing a Test Collection with Multi-Intent Queries

Ruihua Song, Dongjie Qi, Hua Liu, Tetsuya Sakai, Jian-Yun Nie, Hsiao-Wen Hon, and Yong Yu

EVIA 2010 Proceedings 2010 [Refereed]
Simple Evaluation Metrics for Diversified Search Results

Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, and Chin-Yew Lin

EVIA 2010 Proceedings 2010 [Refereed]
Ranking Retrieval Systems without Relevance Assessments – Revisited

Tetsuya Sakai and Chin-Yew Lin

EVIA 2010 Proceedings 2010 [Refereed]
The Task of Selecting Good Answers in Community QA: A Study of Evaluation Methods

Tetsuya Sakai, Daisuke Ishikawa, Kazuko Kuriyama, Yohei Seki, Noriko Kando

FIT 2010 2010 [Refereed]
Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

Naoyoshi Aikawa, Tetsuya Sakai, and Hayato Yamana

WebDB Forum 2010 2010 [Refereed]
On the robustness of information retrieval metrics to biased relevance assessments

Tetsuya Sakai

Journal of Information Processing 17 156 - 166 2009 [Refereed]

Information Retrieval (IR) test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used IR evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems and the number of pooled documents. Even though previous studies have shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold when the relevance data are biased towards particular systems or towards the top of the pools. More specifically, we show that the condensed-list versions of Average Precision, Q-measure and normalised Discounted Cumulative Gain, which we denote as AP′, Q′ and nDCG′, are not necessarily superior to the original metrics for handling biases. Nevertheless, AP′ and Q′ are generally superior to bpref, Rank-Biased Precision and its condensed-list version even in the presence of biases.
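The condensed-list idea described in this abstract is simple to state in code. A minimal sketch, assuming binary relevance and a `qrels` dictionary in which any document absent from the dictionary is unjudged: AP is computed on the original list (unjudged documents count as nonrelevant) and AP′ on the list with unjudged documents removed. The helper names are hypothetical.

```python
from typing import Dict, List


def condense(ranking: List[str], qrels: Dict[str, int]) -> List[str]:
    # Drop every document that never appeared in the relevance assessments.
    return [d for d in ranking if d in qrels]


def average_precision(ranking: List[str], qrels: Dict[str, int]) -> float:
    # Unjudged documents are treated as nonrelevant (qrels.get(..., 0)).
    num_rel = sum(1 for rel in qrels.values() if rel > 0)
    if num_rel == 0:
        return 0.0
    hits, total = 0, 0.0
    for r, d in enumerate(ranking, start=1):
        if qrels.get(d, 0) > 0:
            hits += 1
            total += hits / r  # precision at the rank of each relevant document
    return total / num_rel


def ap_prime(ranking: List[str], qrels: Dict[str, int]) -> float:
    # AP' is AP computed on the condensed list.
    return average_precision(condense(ranking, qrels), qrels)
```

For a run ["x", "d1", "y", "d3"] where d1 and d3 are relevant and x, y are unjudged, AP is 0.5 while AP′ is 1.0: condensing moves the judged relevant documents up, which is why condensed-list metrics tend to favour runs that retrieve many unjudged documents.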

DOI

Scopus

4

Citation

(Scopus)
Serendipitous Search via Wikipedia: A Query Log Analysis

Tetsuya Sakai, Kenichi Nogami

PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL 780 - 781 2009 [Refereed]

We analyse the query log of a click-oriented Japanese search engine that utilises the link structures of Wikipedia for encouraging the user to change his information need and to perform repeated, serendipitous, exploratory search. Our results show that users tend to make transitions within the same query type: from person names to person names, from place names to place names, and so on.
Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments

Tetsuya Sakai, Noriko Kando, Hideki Shima, Chuan-Jie Lin, Ruihua Song, Miho Sugimoto, and Teruko Mitamura

DBSJ Journal (Database Society of Japan) 2009 [Refereed]
People, Clouds, and Interaction for Information Access (invited paper)

Tetsuya Sakai

IUCS 2009 2009 [Refereed]
On information retrieval metrics designed for evaluation with incomplete relevance assessments

Tetsuya Sakai, Noriko Kando

INFORMATION RETRIEVAL 11 ( 5 ) 447 - 470 2008.10 [Refereed]

Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving considerable attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs: the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I Error, and on Kendall's rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.
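Discriminative power, as defined in this abstract, is the proportion of system pairs whose difference is statistically significant at a chosen Type I error level. The sketch below uses an exact two-sided paired randomization (sign-flip) test over per-topic scores purely because it is self-contained; this test is a substitution and not necessarily the one used in the paper, and enumerating all 2^n sign patterns is only practical for a small number of topics n. Function names and the score layout are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, List


def paired_randomization_p(x: List[float], y: List[float]) -> float:
    # Exact two-sided sign-flip test on per-topic differences (2**n patterns).
    diffs = [a - b for a, b in zip(x, y)]
    observed = abs(sum(diffs))
    extreme = 0
    for signs in product((1, -1), repeat=len(diffs)):
        # Count sign patterns at least as extreme as the observed difference.
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** len(diffs)


def discriminative_power(scores: Dict[str, List[float]], alpha: float = 0.05) -> float:
    # scores: system name -> per-topic metric values (same topic order for all systems).
    pairs = list(combinations(sorted(scores), 2))
    if not pairs:
        return 0.0
    significant = sum(1 for a, b in pairs
                      if paired_randomization_p(scores[a], scores[b]) < alpha)
    return significant / len(pairs)
```

With three systems where A beats B and C on all eight topics and B and C tie everywhere, two of the three pairs are significant, so the discriminative power is 2/3.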

DOI

Scopus

110

Citation

(Scopus)
Introduction to the NTCIR-6 Special Issue

Noriko Kando, Teruko Mitamura, and Tetsuya Sakai

ACM Transactions on Asian Language Information Processing (TALIP) 2008 [Refereed]
Precision-at-ten considered redundant

William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai

ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings 695 - 696 2008 [Refereed]

Information retrieval systems are compared using evaluation metrics, with researchers commonly reporting results for simple metrics such as precision-at-10 or reciprocal rank together with more complex ones such as average precision or discounted cumulative gain. In this paper, we demonstrate that complex metrics are as good as or better than simple metrics at predicting the performance of the simple metrics on other topics. Therefore, reporting of results from simple metrics alongside complex ones is redundant.
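Precision-at-10 and reciprocal rank, the "simple" metrics this abstract refers to, reduce to a few lines under binary relevance. A sketch with hypothetical function names; note that P@10 divides by 10 even when fewer than 10 documents are retrieved.

```python
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    # Fraction of the top k ranks occupied by relevant documents (always divide by k).
    return sum(1 for d in ranking[:k] if d in relevant) / k


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    # 1/rank of the first relevant document, or 0 if none is retrieved.
    for r, d in enumerate(ranking, start=1):
        if d in relevant:
            return 1.0 / r
    return 0.0
```

Both functions look only at the top of the list and ignore graded relevance, which is the information that more complex metrics such as average precision or nDCG also exploit.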

DOI

Scopus

29

Citation

(Scopus)
Comparing Metrics across TREC and NTCIR: The Robustness to Pool Depth Bias

Tetsuya Sakai

ACM SIGIR 2008 Proceedings 2008 [Refereed]

CiNii
Design and Development of a Clickthrough-Based Exploratory Search Site

酒井哲也, 小山田浩史, 野上謙一, 北村仁美, 梶浦正浩, 東美奈子, 野中由美子, 小野雅也, 菊池豊

7th Forum on Information Technology (FIT 2008) 2008 [Refereed]

CiNii
Comparing metrics across TREC and NTCIR: The robustness to system bias

Tetsuya Sakai

International Conference on Information and Knowledge Management, Proceedings 581 - 590 2008 [Refereed]

Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in a more realistic setting, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold in the presence of system bias. In our experiments using TREC and NTCIR data, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, even under system bias, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'. Copyright 2008 ACM.

DOI

Scopus

29

Citation

(Scopus)
Modelling A User Population for Designing Information Retrieval Metrics

Tetsuya Sakai and Stephen Robertson

Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008) 2008 [Refereed]
On the reliability of information retrieval metrics based on graded relevance

Tetsuya Sakai

INFORMATION PROCESSING & MANAGEMENT 43 ( 2 ) 531 - 548 2007.03 [Refereed]

This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values. (c) 2006 Elsevier Ltd. All rights reserved.
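Q-measure, mentioned above as the recommended recall-based graded-relevance metric, can be sketched with the blended ratio Q = (1/R) Σ_r J(r)·(C(r) + β·cg(r)) / (r + β·cg*(r)), where J(r) indicates a relevant document at rank r, C(r) counts relevant documents in the top r, cg(r) and cg*(r) are the cumulative gains of the run and of an ideal ranking, and R is the number of relevant documents. β = 1 and the per-document gain values are assumptions; the function name is hypothetical.

```python
from typing import Dict, List


def q_measure(ranking: List[str], gains: Dict[str, float], beta: float = 1.0) -> float:
    # gains: doc -> gain value (> 0 for relevant docs); unlisted docs have gain 0.
    ideal = sorted((g for g in gains.values() if g > 0), reverse=True)
    num_rel = len(ideal)
    if num_rel == 0:
        return 0.0
    count, cg, cg_ideal, total = 0, 0.0, 0.0, 0.0
    for r, d in enumerate(ranking, start=1):
        g = gains.get(d, 0.0)
        cg += g
        # The ideal cumulative gain stops growing once all relevant docs are placed.
        if r <= num_rel:
            cg_ideal += ideal[r - 1]
        if g > 0:
            count += 1
            # Blended ratio at each relevant rank.
            total += (count + beta * cg) / (r + beta * cg_ideal)
    return total / num_rel
```

With gains {"d1": 3, "d2": 1}, the ideal order ["d1", "d2"] scores 1.0 and the swapped order ["d2", "d1"] scores 0.75, since the first relevant document then carries less gain than the ideal one at rank 1.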

Citations (Scopus): 83
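nDCG at document cut-off l, discussed in the abstract above, divides a system's discounted cumulative gain by that of an ideal ranking. The sketch below is illustrative only. The gain values (S=3, A=2, B=1) and the log2(r + 1) discount are assumptions: this is one common formulation, while the original Järvelin-Kekäläinen definition discounts with log_b(r) only for ranks at or beyond b, so numbers can differ from those reported in the paper.

```python
import math
from typing import Dict, List

# Hypothetical gain values for NTCIR-style relevance levels (an assumption).
GAIN = {"S": 3.0, "A": 2.0, "B": 1.0}


def dcg(gains: List[float], cutoff: int) -> float:
    """Discounted cumulative gain over the top `cutoff` ranks, log2(r + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:cutoff], start=1))


def ndcg_at(ranked: List[str], qrels: Dict[str, str], cutoff: int) -> float:
    """nDCG(l): system DCG divided by the DCG of an ideal ranking, both cut at l."""
    gains = [GAIN.get(qrels.get(doc, ""), 0.0) for doc in ranked]  # unjudged -> 0
    ideal = sorted((GAIN[lv] for lv in qrels.values() if lv in GAIN), reverse=True)
    ideal_dcg = dcg(ideal, cutoff)
    return dcg(gains, cutoff) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = {"d1": "S", "d2": "B", "d3": "N"}  # "N": judged nonrelevant
print(ndcg_at(["d2", "d1", "d3"], qrels, cutoff=2))
```

Changing the `GAIN` table is the simplest way to probe the robustness to gain values that the abstract mentions.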
On the Reliability of Factoid Question Answering Evaluation

Tetsuya Sakai

ACM Transactions on Asian Language Information Processing (TALIP) 2007 [Refereed]
On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

Tetsuya Sakai

Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007) 32 - 43 2007 [Refereed]

User Satisfaction Task: A Proposal for NTCIR-7

Tetsuya Sakai

Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007) 2007 [Refereed]
Pic-A-Topic: Efficient Viewing of Informative TV Contents on Travel, Cooking, Food and More

Tetsuya Sakai, Tatsuya Uehara, Taishi Shimomori, Makoto Koyama, and Mika Fukui

RIAO 2007 Proceedings 2007 [Refereed]
Alternatives to Bpref

Tetsuya Sakai

Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 71 - 78 2007 [Refereed]

Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref which has been designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric called rpref has also been proposed. However, we show that the application of Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power. Copyright 2007 ACM.

Citations (Scopus): 140
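For comparison with the condensed-list approach, here is a minimal sketch of bpref in the min(R, N) form that, as far as we understand, trec_eval uses: each judged relevant document that was retrieved is penalised by the number of judged nonrelevant documents ranked above it (capped at R), normalised by min(R, N). Unjudged documents are skipped. The `qrels` format is the same hypothetical one as above (0 means judged nonrelevant); this is an illustration, not the paper's code.

```python
from typing import Dict, List


def bpref(ranked: List[str], qrels: Dict[str, int]) -> float:
    """bpref over binary judgments; unjudged documents are ignored."""
    R = sum(1 for rel in qrels.values() if rel > 0)   # judged relevant
    N = sum(1 for rel in qrels.values() if rel == 0)  # judged nonrelevant
    if R == 0:
        return 0.0
    score, nonrel_above = 0.0, 0
    for doc in ranked:
        if doc not in qrels:
            continue  # unjudged: skipped, much as in a condensed list
        if qrels[doc] > 0:
            if nonrel_above == 0:
                score += 1.0
            else:
                # N >= 1 whenever nonrel_above > 0, so no division by zero
                score += 1.0 - min(nonrel_above, R) / min(R, N)
        else:
            nonrel_above += 1
    return score / R


qrels = {"d1": 1, "d2": 0, "d3": 1}  # hypothetical judgments
print(bpref(["d1", "u1", "d2", "d3"], qrels))
```

For this list bpref is 0.5, while AP on the condensed list ["d1", "d2", "d3"] would be (1/1 + 2/3)/2, about 0.833. bpref only asks whether judged nonrelevant documents outrank each relevant one, which is one reason it can behave differently from condensed-list AP.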
Evaluating the Task of Finding One Relevant Document Using Incomplete Relevance Data

Tetsuya Sakai

FIT 2007 Information Technology Letters 2007 [Refereed]
Evaluating Information Retrieval Metrics based on Bootstrap Hypothesis Tests

Tetsuya Sakai

IPSJ TOD 2007 [Refereed]
On the Properties of Evaluation Metrics for Finding One Highly Relevant Document

Tetsuya Sakai

IPSJ TOD 2007 [Refereed]
Interrogative Phrase Complementation for High-Accuracy Speech-Input Question Answering

筒井秀樹, 真鍋俊彦, 福井美佳, 酒井哲也, 藤井寛子, 浦田耕二

IPSJ Journal 2007 [Refereed]
Toward Better Search Systems: Trends in Information Retrieval Evaluation That Considers Graded Relevance

酒井哲也

IPSJ Magazine 2006 [Refereed]
A Further Note on Evaluation Metrics for the Task of Finding One Highly Relevant Document

Tetsuya Sakai

IPSJ SIG Technical Report 2006 [Refereed]
On the Task of Finding One Highly Relevant Document with High Precision

Tetsuya Sakai

IPSJ TOD 2006 [Refereed]
Give Me Just One Highly Relevant Document: P-measure

Tetsuya Sakai

ACM SIGIR 2006 Proceedings 2006 [Refereed]
Evaluating Evaluation Metrics based on the Bootstrap

Tetsuya Sakai

ACM SIGIR 2006 Proceedings 2006 [Refereed]

A Study on the Progress of Document Retrieval Technology Based on NTCIR

酒井哲也

Information Technology Letters 2006 [Refereed]
Improving the robustness to recognition errors in speech input question answering

Hideki Tsutsui, Toshihiko Manabe, Mika Fukui, Tetsuya Sakai, Hiroko Fujii, Koji Urata

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS 4182 297 - 312 2006 [Refereed]

In our previous work, we developed a prototype of a speech-input help system for home appliances such as digital cameras and microwave ovens. Given a factoid question, the system performs textual question answering using the manuals as the knowledge source; given a HOW question, it retrieves and plays a demonstration video. However, our first prototype suffered from speech recognition errors, especially when the Japanese interrogative phrases in factoid questions were misrecognized. We therefore propose a method for solving this problem, which complements a speech query transcript with an interrogative phrase selected from a predetermined list. The selection process first narrows down candidate phrases based on co-occurrences within the manual text, and then computes the similarity between each candidate and the query transcript in terms of pronunciation. Our method improves the Mean Reciprocal Rank of the top three answers from 0.429 to 0.597 for factoid questions.
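The figures 0.429 and 0.597 above are Mean Reciprocal Rank over the top three answers. A minimal sketch of that computation follows, assuming an answer is correct when it exactly matches an entry in a hypothetical gold answer set; real QA evaluations often use more lenient matching, so this is only an illustration of the metric.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(answers: Sequence[str], correct: Set[str], k: int = 3) -> float:
    """1/rank of the first correct answer within the top k, or 0 if none."""
    for rank, answer in enumerate(answers[:k], start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: List[Tuple[Sequence[str], Set[str]]], k: int = 3) -> float:
    """Average reciprocal rank over all questions."""
    return sum(reciprocal_rank(a, c, k) for a, c in results) / len(results)


# Hypothetical system outputs paired with gold answer sets.
results = [
    (["1998", "2001", "2003"], {"2001"}),   # correct at rank 2 -> 0.5
    (["blue", "red", "green"], {"blue"}),   # correct at rank 1 -> 1.0
    (["3", "4", "5"], {"6"}),               # nothing correct in top 3 -> 0.0
]
print(mean_reciprocal_rank(results))  # 0.5
```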
Pic-A-Topic: Gathering information efficiently from recorded TV shows on travel

Tetsuya Sakai, Tatsuya Uehara, Kazuo Sumita, Taishi Shimomori

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS 4182 429 - 444 2006 [Refereed]

We introduce a system called Pic-A-Topic, which analyses closed captions of Japanese TV shows on travel to perform topic segmentation and topic sentence selection. Our objective is to provide a table-of-contents interface that enables efficient viewing of desired topical segments within recorded TV shows to users of appliances such as hard disk recorders and digital TVs. According to our experiments using 14.5 hours of recorded travel TV shows, Pic-A-Topic's F1-measure for the topic segmentation task is 82% of manual performance on average. Moreover, a preliminary user evaluation experiment suggests that this level of performance may be indistinguishable from manual performance.
Bootstrap-based comparisons of IR metrics for finding one relevant document

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS 4182 374 - 389 2006 [Refereed]

This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure >= O-measure >= NWRR >= RR" generally holds, where ">=" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.
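The sensitivity method above is built on bootstrap hypothesis tests. As a rough illustration of that building block only, here is a two-sided paired bootstrap test on per-topic score differences using a studentized statistic: the differences are shifted to mean zero so the null hypothesis holds, resampled with replacement, and the achieved significance level (ASL) is the fraction of resamples whose |t| is at least the observed |t|. The function names, the trial count and the seed are assumptions, and the paper's exact procedure may differ.

```python
import math
import random
from typing import Sequence


def studentized_mean(values: Sequence[float]) -> float:
    """t = mean / (sd / sqrt(n)); needs at least two values."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0.0:
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(n))


def paired_bootstrap_asl(scores_x: Sequence[float], scores_y: Sequence[float],
                         trials: int = 1000, seed: int = 0) -> float:
    """Two-sided achieved significance level for the per-topic differences."""
    z = [x - y for x, y in zip(scores_x, scores_y)]
    observed = studentized_mean(z)
    z_mean = sum(z) / len(z)
    w = [d - z_mean for d in z]  # shift so that the mean difference is 0
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.choice(w) for _ in w]  # resample topics with replacement
        if abs(studentized_mean(sample)) >= abs(observed):
            hits += 1
    return hits / trials


# Hypothetical per-topic scores of two runs under one metric.
run_x = [0.50, 0.62, 0.48, 0.71, 0.55, 0.66, 0.43, 0.59, 0.52, 0.68]
run_y = [0.41, 0.50, 0.40, 0.60, 0.47, 0.55, 0.30, 0.52, 0.42, 0.58]
print(paired_bootstrap_asl(run_x, run_y))
```

Repeating such tests over all run pairs and counting how many reach a chosen significance level is, roughly, how bootstrap-based discriminative power is compared across metrics.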
Ranking the NTCIR systems based on multigrade relevance

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY 3411 251 - 262 2005 [Refereed]

At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multigrade relevance. This paper shows through a theoretical analysis that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision, and then verifies this claim experimentally by ranking the systems submitted to the NTCIR-3 CLIR Task. Our experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that it is more reliable than Average Weighted Precision.
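The abstract does not define Q-measure, so the sketch below uses the commonly cited definition, which we assume here: Q = (1/R) Σ_r I(r) BR(r), with blended ratio BR(r) = (C(r) + β cg(r)) / (r + β cg*(r)). I(r) flags a relevant document at rank r, C(r) counts relevant documents in the top r, cg(r) and cg*(r) are the cumulative gains of the system list and of an ideal list, and β (1 here) is a parameter. The `gains` dictionary of hypothetical graded judgments is an illustrative assumption.

```python
from typing import Dict, List


def q_measure(ranked: List[str], gains: Dict[str, float], beta: float = 1.0) -> float:
    """Q-measure; `gains` maps relevant doc IDs to positive gain values."""
    positive = sorted((g for g in gains.values() if g > 0), reverse=True)
    num_rel = len(positive)  # R
    if num_rel == 0:
        return 0.0
    total = 0.0
    count = 0          # C(r): relevant documents in the top r
    cg = 0.0           # cg(r): cumulative gain of the system list
    cg_ideal = 0.0     # cg*(r): cumulative gain of the ideal list
    for r, doc in enumerate(ranked, start=1):
        if r <= num_rel:
            cg_ideal += positive[r - 1]  # the ideal list stops growing after rank R
        g = gains.get(doc, 0.0)
        if g > 0:
            count += 1
            cg += g
            total += (count + beta * cg) / (r + beta * cg_ideal)  # blended ratio BR(r)
    return total / num_rel


gains = {"d1": 3.0, "d2": 1.0}  # hypothetical graded judgments
print(q_measure(["x", "d2", "d1"], gains))  # equals (2/6 + 6/7) / 2, about 0.595
```

Setting `beta=0` reduces BR(r) to precision C(r)/r, so the function then returns binary Average Precision; this is one way to see the close relationship with Average Precision described in the abstract.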
On the Stability of System Rankings in Evaluation Workshops

酒井哲也

Workshop "Thinking About Evaluation Workshops", co-located with the 11th Annual Meeting of the Association for Natural Language Processing 2005 [Refereed]
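The Effect of Named Entity Extraction and Answer Type Taxonomy on the Performance of Question Answering Systems (Natural Language Processing)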

市村由美, 齋藤佳美, 酒井哲也, 国分智晴, 小山誠

IEICE Transactions 2005 [Refereed]
Flexible Pseudo-Relevance Feedback via Selective Sampling

Tetsuya Sakai, Toshihiko Manabe, and Makoto Koyama

ACM TALIP 2005 [Refereed]
Advanced Technologies for Information Access (invited paper)

Tetsuya Sakai

International Journal of Computer Processing of Oriental Languages 2005 [Refereed]
Evaluation Metrics for the Task of Retrieving One Highly Relevant Document with High Precision

酒井哲也

Information Technology Letters 2005 [Refereed]
The reliability of metrics based on graded relevance

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS 3689 1 - 16 2005 [Refereed]

This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections with submitted runs from the Chinese IR and English IR tasks in the NTCIR-3 CLIR track to examine the metrics using methods proposed by Buckley/Voorhees and Voorhees/Buckley as well as Kendall's rank correlation. Our results show that AnDCG(l) and nDCG(l) ((Average) Normalised Discounted Cumulative Gain at document cut-off l) are good metrics, provided that l is large. However, if one wants to avoid the parameter l altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.
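Kendall's rank correlation, used above to measure how closely two metrics agree on system rankings, can be sketched as follows. This is tau-a: system pairs that one metric ties are counted as neither concordant nor discordant. The run names and scores are hypothetical, and both dictionaries are assumed to cover the same systems.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two system rankings given as per-system mean scores."""
    systems = sorted(scores_a)  # both dicts are assumed to cover the same systems
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1   # both metrics order the pair the same way
        elif product < 0:
            discordant += 1   # the metrics disagree on this pair
    num_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / num_pairs


# Hypothetical mean scores of three runs under two metrics.
by_q_measure = {"runA": 0.30, "runB": 0.20, "runC": 0.10}
by_ndcg = {"runA": 0.45, "runB": 0.25, "runC": 0.30}
print(kendall_tau(by_q_measure, by_ndcg))  # one of three pairs swapped
```

If SciPy is available, `scipy.stats.kendalltau` computes tau-b by default, which adjusts for ties and gives the same value as this sketch when there are none.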
Introduction to the special issue: Recent advances in information processing and access for Japanese

Tetsuya Sakai, Yuji Matsumoto

ACM Transactions on Asian Language Information Processing 4 ( 4 ) 275 - 376 2005 [Refereed]

The Relationship between Answer Ranking and User Satisfaction in a Question Answering System

Tomoharu Kokubu, Tetsuya Sakai, Yoshimi Saito, Hideki Tsutsui, Toshihiko Manabe, Makoto Koyama, and Hiroko Fujii

NTCIR-5 Proceedings (Open Submission Session) 2005 [Refereed]
The Effect of Topic Sampling on Sensitivity Comparisons of Information Retrieval Metrics

Tetsuya Sakai

NTCIR-5 Proceedings (Open Submission Session) 2005 [Refereed]
ASKMi: A Japanese Question Answering System based on Semantic Role Analysis

Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, Tomoharu Kokubu, and Toshihiko Manabe

RIAO 2004 2004 [Refereed]

New Performance Metrics based on Multigrade Relevance

Tetsuya Sakai

NTCIR-4 Proceedings (Open Submission Session) 2004 [Refereed]

The Effect of Back-Formulating Questions in Question Answering Evaluation

Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Tomoharu Kokubu, and Makoto Koyama

ACM SIGIR 2004 2004 [Refereed]

Query Expansion Using a General-Purpose Thesaurus and Pseudo-Relevance Feedback

小山誠, 真鍋俊彦, 木村和広, 酒井哲也

Symposium on "Text Processing for Information Access" 2003 [Refereed]
BRIDJE over a Language Barrier: Cross-Language Information Access by Integrating Translation and Retrieval

Tetsuya Sakai, Makoto Koyama, Masaru Suzuki, Akira Kumano, and Toshihiko Manabe

IRAL 2003 2003 [Refereed]

Evaluating Retrieval Performance for Japanese Question Answering: What Are Best Passages?

Tetsuya Sakai and Tomoharu Kokubu

ACM SIGIR 2003 2003 [Refereed]

Average Gain Ratio: A Simple Retrieval Performance Measure for Evaluation with Multiple Relevance Levels

Tetsuya Sakai

ACM SIGIR 2003 2003 [Refereed]
Relative and Absolute Term Selection Criteria: A Comparative Study for English and Japanese IR

Tetsuya Sakai and Stephen E. Robertson

ACM SIGIR 2002 2002 [Refereed]
Generating transliteration rules for cross-language information retrieval from machine translation dictionaries

Tetsuya Sakai, Akira Kumano, Toshihiko Manabe

Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 6 290 - 295 2002 [Refereed]

This paper describes a method for automatically converting existing English-Japanese and Japanese-English machine translation dictionaries into English-Japanese transliteration rules and Japanese-English back-transliteration rules for cross-language information retrieval. An existing English-katakana word alignment module, which is part of our own machine translation system, is exploited in generating probabilistic rewriting rules. If our system is allowed to output 15 candidate spellings, it successfully transliterates more than 75% of a set of out-of-vocabulary English words into katakana, and successfully back-transliterates more than 55% of a set of out-of-vocabulary katakana words into English. Moreover, our preliminary cross-language information retrieval experiments, which treat the candidate spellings as a group of synonyms, suggest that our methods can indeed compensate for the failure of machine translation in some cases.

Citations (Scopus): 4
The Use of External Text Data in Cross-Language Information Retrieval based on Machine Translation

Tetsuya Sakai

IEEE SMC 2002 2002 [Refereed]
Retrieval of Highly Relevant English Documents Based on Semantic Role Analysis

酒井哲也, 小山誠, 鈴木優, 真鍋俊彦

FIT 2002 Information Technology Letters LD-8 67 - 68 2002 [Refereed]

A framework for cross-language information access: Application to English and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

COMPUTERS AND THE HUMANITIES 35 ( 4 ) 371 - 388 2001.11 [Refereed]

Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent, or of which they have no knowledge at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.
A framework for cross-language information access: Application to English and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

Language Resources and Evaluation 35 ( 4 ) 371 - 388 2001

Flexible Pseudo-Relevance Feedback via Direct Mapping and Categorization of Search Requests

Tetsuya Sakai, Stephen E. Robertson, and Stephen Walker

ECIR 2001 2001 [Refereed]
Japanese-English Cross-Language Information Retrieval using Machine Translation and Pseudo-Relevance Feedback

Tetsuya Sakai

International Journal of Computer Processing of Oriental Languages 14 ( 2 ) 83 - 107 2001 [Refereed]

Flexible Pseudo-Relevance Feedback Using Optimization Tables

Tetsuya Sakai, Stephen E. Robertson

ACM SIGIR 2001 2001 [Refereed]
Generic Summaries for Indexing in Information Retrieval

Tetsuya Sakai and Karen Sparck Jones

ACM SIGIR 2001 2001 [Refereed]

Combining the Ranked Output from Fulltext and Summary Indexes

Tetsuya Sakai

ACM SIGIR 2001 Workshop on Text Summarization 2001 [Refereed]
Incremental relevance feedback in Japanese text retrieval

Gareth Jones, Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

Information Retrieval 2 ( 4 ) 361 - 384 2000 [Refereed]

The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using "number-to-view" graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, and also in a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones. © 2000 Kluwer Academic Publishers.

Citations (Scopus): 2
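The abstract above mentions term reweighting and query expansion under a probabilistic retrieval model but does not give formulas. The sketch below assumes the standard Robertson/Sparck Jones relevance weight (with the usual 0.5 correction) for reweighting, and ranks expansion candidates by the offer weight r × w, a common choice in that model; neither is necessarily the exact formulation used in the paper. The collection statistics and the romanized Japanese terms are hypothetical.

```python
import math
from typing import Dict, List, Set


def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Robertson/Sparck Jones relevance weight with the usual 0.5 correction.

    N: documents in the collection, n: documents containing the term,
    R: known relevant documents, r: known relevant documents containing the term.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def expansion_terms(relevant_docs: List[Set[str]], doc_freq: Dict[str, int],
                    N: int, k: int) -> List[str]:
    """Pick k candidate terms from the relevant documents by offer weight r * w."""
    R = len(relevant_docs)
    candidates = set().union(*relevant_docs)

    def offer_weight(term: str) -> float:
        r = sum(1 for doc in relevant_docs if term in doc)
        return r * rsj_weight(N, doc_freq[term], R, r)

    # Highest offer weight first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda t: (-offer_weight(t), t))[:k]


# Hypothetical statistics for a 1000-document collection.
doc_freq = {"sakura": 20, "hanami": 30, "no": 990}
relevant = [{"sakura", "hanami", "no"}, {"sakura", "no"}, {"sakura", "hanami", "no"}]
print(expansion_terms(relevant, doc_freq, N=1000, k=2))  # ['sakura', 'hanami']
```

In an incremental setting, R, r and hence the weights would be recomputed each time the user marks more documents as relevant, which is the behaviour the simulated searching experiments examine.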
MT-based Japanese-English Cross-Language IR Experiments using the TREC Test Collections

Tetsuya Sakai

IRAL 2000 2000 [Refereed]
A First Step towards Flexible Local Feedback for Ad hoc Retrieval

Tetsuya Sakai, Masahiro Kajiura, and Kazuo Sumita

IRAL 2000 2000 [Refereed]
BMIR-J2: A test collection for evaluation of Japanese information retrieval systems

Tetsuya Sakai, Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuo Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata, Noriko Kando

SIGIR Forum (ACM Special Interest Group on Information Retrieval) 33 ( 1 ) 13 - 17 1999

BMIR-J2 is the first complete test collection generally available for evaluating Japanese information retrieval systems. BMIR-J2 features include a novel division of search requests based on various functions required to perform successful retrieval. BMIR-J2 and its smaller predecessor BMIR-J1 were constructed by a volunteer-based working group under the Information Processing Society of Japan. We hope that BMIR-J2 will come into wide use and that it will foster the development of Japanese IR systems.

Citations (Scopus): 10
Query Expansion through Feedback and Retrieval Effectiveness Evaluation in Probabilistic-Model-Based Japanese Information Filtering

酒井哲也, Gareth J.F. Jones, 梶浦正浩, 住田一男

IPSJ Journal 1999 [Refereed]
A comparison of query translation methods for English-Japanese cross-language information retrieval

Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, and Kazuo Sumita

SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL 269 - 270 1999 [Refereed]

In this paper we report results of an investigation into English-Japanese Cross-Language Information Retrieval (CLIR) comparing a number of query translation methods. Results from experiments using the standard BMIR-J2 Japanese collection suggest that full machine translation (MT) can outperform popular dictionary-based query translation methods and further that in this context MT is largely robust to queries with little linguistic structure.
Exploring the use of Machine Translation resources for English-Japanese Cross-Language Information Retrieval

Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, and Kazuo Sumita

MT Summit VII Workshop on Machine Translation for Cross Language Information Retrieval 1999 [Refereed]
Construction of a Test Collection for Evaluating Japanese Information Retrieval Systems

木本晴夫, 小川泰嗣, 石川徹也, 増永良文, 福島俊一, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 木谷強, 三池誠司, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

IPSJ Journal 1999 [Refereed]
A Study of English-Japanese and Japanese-English Cross-Language Retrieval Using Machine Translation

酒井哲也, 梶浦正浩, 住田一男, Jones, G, Collier, N

IPSJ Journal 40 ( 11 ) 4075 - 4086 1999 [Refereed]

Test Collections for Evaluating Information Retrieval Systems

酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝, 神門典子

Computer Today 1998 [Refereed]
Construction of a Test Collection for Evaluating Japanese Information Retrieval Systems

木本晴夫, 小川泰嗣, 石川徹也, 増永良文, 福島俊一, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 木谷強, 三池誠司, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

Informatics Symposium '98 1998 [Refereed]
Profile Generation in NEAT, an Information Filtering System That Adapts to User Requirements

酒井哲也, Jones, G.J.F, 梶浦正浩, 住田一男

Interaction '98 149 - 152 1998 [Refereed]

Lessons from BMIR-J2: A Test Collection for Japanese IR Systems

Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuo Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Tetsuya Sakai, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata

ACM SIGIR '98 1998 [Refereed]
Experiments in Japanese Text Retrieval and Routing using the NEAT System

Gareth Jones, Tetsuya Sakai, Masaharu Kajiura, and Kazuo Sumita

ACM SIGIR '98 1998 [Refereed]
Application of Query Expansion Techniques in Probabilistic Japanese News Filtering

Tetsuya Sakai, Gareth Jones, Masahiro Kajiura, and Kazuo Sumita

IRAL '98 1998 [Refereed]
Query Generation Using Boolean Expressions and Document Structure for Information Filtering, and Evaluation of Retrieval Effectiveness

酒井哲也, 梶浦正浩, 住田一男

IPSJ Journal 1998 [Refereed]
A Test Collection for Evaluating Japanese Text Retrieval Systems

酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝

Advanced Database Symposium '98, Panel: The Future of Multimedia Information Retrieval Benchmarks 1998 [Refereed]
An Information Filter for Flow Information on the WWW (FreshEye)

住田一男, 上原龍也, 小野顕司, 酒井哲也, 池田朋男, 下郡信宏

Interaction '97 1997 [Refereed]
BMIR-J1: A Test Collection for Evaluating Japanese Information Retrieval Systems

福島俊一, 小川泰嗣, 石川徹也, 増永良文, 木本晴夫, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 三池誠司, 酒井哲也, 木谷強, 徳永健伸, 鶴岡弘, 安形輝

Natural Language Processing Symposium '96 1996 [Refereed]
A User Interface for Generating Dynamic Abstracts of Retrieved Documents

Tetsuya Sakai, Etsuo Itoh, Seiji Miike, Kazuo Sumita

47th FID 1994 [Refereed]

Books and Other Publications

Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

Tetsuya Sakai, Emi Ishita, Hiroaki Ohshima, Faegheh Hasibi, Jiaxin Mao, Joemon Jose

SIGIR-AP 2024
Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

Qingyao Ai, Yiqun Liu, Alistair Moffat, Xuanjing Huang, Tetsuya Sakai, Justin Zobel

SIGIR-AP 2023
Proceedings of ACM SIGIR 2021

Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, Tetsuya Sakai, Alejandro Bellogín, Masaharu Yoshioka

2021
Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact

Tetsuya Sakai, Douglas W. Oard, Noriko Kando

Springer 2020
Proceedings of the Open-Source IR Replicability Challenge (OSIRRC 2019)

Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

2019
U-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
Q-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
Expected Reciprocal Rank. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
ERR-IA. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
D-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
alpha-nDCG. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
Advanced Information Retrieval Measures. In: Liu L., Özsu M. (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018
Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power

Tetsuya Sakai

Springer 2018
Proceedings of AIRS 2018 (LNCS 11292)

Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

2018
Proceedings of ACM SIGIR 2017

Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de Vries, Ryen W. White

2017
Encyclopedia of Artificial Intelligence

The Japanese Society for Artificial Intelligence

Kyoritsu Shuppan 2017
Proceedings of SPIRE 2016 (LNCS 9954)

Shunsuke Inenaga, Kunihiko Sadakane, Tetsuya Sakai

Springer 2016
Information Access Evaluation Methodology: For the Advancement of Search Engines

酒井哲也

Corona Publishing 2015
Proceedings of ACM SIGIR 2013

Gareth J.F. Jones, Páraic Sheridan, Diane Kelly, Maarten de Rijke, and Tetsuya Sakai

2013
Proceedings of NTCIR-10

Noriko Kando, Kazuaki Kishida, Eric Tang, Tetsuya Sakai, Makoto P. Kato, Ka Po Chow, Isao Goto, Yotaro Watanabe, Tomoyosi Akiba, Hiromitsu Nishizaki, Akiko Aizawa, Mizuki Morita, and Eiji Aramaki

2013
Proceedings of NTCIR-9

Noriko Kando, Daisuke Ishikawa, Miho Sugimoto, Fredric C. Gey, Tetsuya Sakai, Tomoyosi Akiba, Hideki Shima, Shlomo Geva, Eric Tang, Andrew Trotman, Tsuneaki Kato, Bin Lu, and Isao Goto

2011
Proceedings of the 3rd International Workshop on Evaluating Information Access (EVIA 2010)

Tetsuya Sakai, Mark Sanderson, William Webber, Noriko Kando, Kazuaki Kishida

2010
Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation

Shlomo Geva, Jaap Kamps, Carol Peters, Tetsuya Sakai, Andrew Trotman, and Ellen Voorhees

2009
5th Asia Information Retrieval Symposium (AIRS 2009)

Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, Tetsuya Sakai

Springer 2009
Encyclopedia of Natural Language Processing

Co-authored

Kyoritsu Shuppan 2009
Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)

Tetsuya Sakai, Mark Sanderson, Noriko Kando, and Miho Sugimoto

2008
Proceedings of AIRS 2008 (LNCS 4993)

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, and Guodong Zhou

2008
Proceedings of the First International Workshop on Evaluating Information Access (EVIA 2007)

Tetsuya Sakai, Mark Sanderson, and David Kirk Evans

2007

Presentations

Explainable Detection of Logical and Structural Anomalies Based on Multimodal Large Language Models

藤井野枝子, 酒井哲也

DEIM2025

Presentation date： 2025
A Comparative Study of Group Fairness Measures for Information Retrieval

Presentation date： 2025
Improving ColBERT-Based Document Retrieval Using URLs within Websites

石川敦也, 酒井哲也

DEIM2025

Presentation date： 2025
A Study of Adversarial Text Generation Methods for Japanese

真壁潤, 酒井哲也

DEIM2025

Presentation date： 2025
Group-Fair Web Search and Conversational Search

酒井哲也, Sijie Tao, Hanpei Fang, Yuxiang Zhang

Information Processing Society of Japan

Presentation date： 2024

Event date：
2024
Cross-lingual Relevance Estimation with Soft and Hard Prompts

Hanpei Fang, Tetsuya Sakai

Presentation date： 2024

Event date：
2024
A Study on Automatic Nugget Weight Generation with LLMs

Kai-xin Chang, Tetsuya Sakai

Presentation date： 2024

Event date：
2024
A Transformer Model for Generating Intermediate Music

Presentation date： 2024

Event date：
2024
Building an Information Retrieval System Specialized for Movies That Users Cannot Recall

DEIM

Presentation date： 2024

Event date：
2024
RSLTOT at the TREC 2023 ToT Track

Reo Yoshikoshi, Tetsuya Sakai

TREC

Presentation date： 2024

Event date：
2024
A Practical Guide to Computing Evaluation Measures and Comparing Systems: Twelve Small Tips

Tetsuya Sakai [Invited]

NTCIR-17

Presentation date： 2023.12

Event date：
2023.12
Overview of the NTCIR-17 FairWeb-1 Task

Sijie Tao, Nuo Chen, Tetsuya Sakai, Zhumin Chu, Hiromi Arai, Ian Soboroff, Nicola Ferro, Maria Maistro

NTCIR-17

Presentation date： 2023
RSLFW at the NTCIR-17 FairWeb-1 Task

Fan Li, Kaize Shi, Kenta Inaba, Sijie Tao, Nuo Chen, Tetsuya Sakai

NTCIR-17

Presentation date： 2023
Evaluating Parrots and Sociopathic Liars (keynote)

Tetsuya Sakai [Invited]

ACM ICTIR

Presentation date： 2023
On A Few Responsibilities of (IR) Researchers: Fairness, Awareness, and Sustainability (keynote)

Tetsuya Sakai [Invited]

ECIR

Presentation date： 2023
An Investigation of the Influence of Web Search Results on Users' Opinion Formation

Kenta Inaba, Tetsuya Sakai

DEIM

Presentation date： 2023
SWAN: A Generic Framework for Auditing Textual Conversational Systems

Tetsuya Sakai

arXiv, Cornell University

Presentation date： 2023
Evaluating Evaluation Measures, Evaluating Information Access Systems, Designing and Constructing Test Collections, and Evaluating Again

Tetsuya Sakai [Invited]

Proceedings of NTCIR-16

Presentation date： 2022
A Web Search Task That Considers Group Fairness

酒井哲也

IPSJ SIG Technical Report

Presentation date： 2022
Overview of the NTCIR-16 WeWantWeb with CENTRE (WWW-4) Task

Tetsuya Sakai, Sijie Tao, Zhumin Chu, Maria Maistro, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

Proceedings of NTCIR-16

Presentation date： 2022
SLWWW at the NTCIR-16 WWW-4 Task

Yuya Ubukata, Masaki Muraoka, Sijie Tao, Tetsuya Sakai

Proceedings of NTCIR-16

Presentation date： 2022
RSLDE at the NTCIR-16 DialEval-2 Task

Fan Li, Tetsuya Sakai

Proceedings of NTCIR-16

Presentation date： 2022
Overview of the NTCIR-16 Dialogue Evaluation (DialEval-2) Task

Sijie Tao, Tetsuya Sakai

Proceedings of NTCIR-16

Presentation date： 2022
On Variants of Root Normalised Order-aware Divergence and a Divergence based on Kendall’s Tau

Tetsuya Sakai

arXiv:2204.07304

Presentation date： 2022
A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

Tetsuya Sakai, Jin Young Kim, Inho Kang

arXiv:2204.00280

Presentation date： 2022
Automatic Quality Assessment of Documents Using Transformers

吉越玲士, 酒井哲也

DEIM 2022

Presentation date： 2022
An Invitation to the NTCIR-16 Web Search and Reproducibility Task (WWW-4) and Dialogue Evaluation Task (DialEval-2)

酒井哲也

IPSJ SIG Technical Report

Presentation date： 2021
The Effect of Embeddings with Speaker Information in Dialogue Summarization

楢木悠士, 酒井哲也, 林良彦

Proceedings of FIT2021

Presentation date： 2021
RealSakaiLab at the TREC 2020 Health Misinformation Track

Sijie Tao, Tetsuya Sakai

Presentation date： 2021
Speaker-Aware Dialogue Summarization

楢木悠士, 酒井哲也

Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing

Presentation date： 2021
Development of an Interactive Analysis System for Voice Assistant Apps

刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

The 54th IEICE Technical Meeting on Information and Communication System Security

Presentation date： 2021
A Study of Automatic UI Design Evaluation for Mobile Applications

栗林峻, 酒井哲也

DEIM 2021

Presentation date： 2021
Selecting Evaluation Methods for Stance Detection Tasks

雨宮佑基, 酒井哲也

DEIM 2021

Presentation date： 2021
Automatic Generation of Nikkei Radio Scripts from Nikkei Newspaper Articles

清水嶺, 酒井哲也

DEIM 2021

Presentation date： 2021
A Study of Comparative Sentence Filtering for Extracting Useful Reviews

小橋賢介, 雨宮佑基, 酒井哲也

DEIM 2021

Presentation date： 2021
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

Presentation date： 2021
Overview of the TREC 2018 CENTRE Track

Ian Soboroff, Nicola Ferro, Maria Maistro, Tetsuya Sakai

Proceedings of TREC 2018

Presentation date： 2020
Improving Concept Representations for Short Text Classification

Sijie Tao, Tetsuya Sakai

Presentation date： 2020
Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

Shiyoh Goetsu, Tetsuya Sakai

Presentation date： 2020
A Study of Context-Aware Evaluation Aspect Extraction for Product Comparison

小橋賢介, 酒井哲也

DEIM 2020

Presentation date： 2020
A Proposed System for Automatically Checking Whether Explanations of Android App Permission Requests Are Sufficient

小島智樹, 酒井哲也

DEIM 2020

Presentation date： 2020
Purchase Prediction based on Recurrent Neural Networks with an Emphasis on Recent User Activities

Quanyu Piao, Joo-Young Lee, Tetsuya Sakai

DEIM 2020

Presentation date： 2020
Experiments on Unsupervised Text Classification based on Graph Neural Networks

Haoxiang Shi, Cen Wang, Tetsuya Sakai

DEIM 2020

Presentation date： 2020
Do Neural Models for Response Generation Fully Exploit the Input Natural Language Text?

Lingfeng Zhang, Tetsuya Sakai

DEIM 2020

Presentation date： 2020
Toward Building a Dataset for Resolving Zero-Match Queries in Product Search

雨宮佑基, 真鍋知博, 藤田澄男, 酒井哲也

DEIM 2020

Presentation date： 2020
A Prototype Task-Oriented Neural Dialogue System Using Interpretable Internal Representations

村田憲俊, 酒井哲也

DEIM 2020

Presentation date： 2020
Response Generation based on the Big Five Personality Traits

Wanqi Wu, Tetsuya Sakai

DEIM 2020

Presentation date： 2020
Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

Shiyoh Goetsu, Tetsuya Sakai

arXiv

Presentation date： 2020
selt Team’s Entity Linking System at the NTCIR-15 QALab-PoliInfo2

Yuji Naraki, Tetsuya Sakai

Proceedings of NTCIR-15

Presentation date： 2020
SLWWW at the NTCIR-15 WWW-3 Task

Masaki Muraoka, Zhaohao Zeng, Tetsuya Sakai

Proceedings of NTCIR-15

Presentation date： 2020
Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng, Yukun Zheng, Jiaxin Mao, Zhumin Chu, Yiqun Liu, Maria Maistro, Zhicheng Dou, Nicola Ferro, Ian Soboroff

Proceedings of NTCIR-15

Presentation date： 2020
RSLNV at the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

Ting Cao, Fan Zhang, Haoxiang Shi, Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Injae Lee, Kyungduk Kim, Inho Kang

Proceedings of NTCIR-15

Presentation date： 2020
SKYMN at the NTCIR-15 DialEval-1 Task

Junjie Wang, Yuxiang Zhang, Tetsuya Sakai, Hayato Yamana

Proceedings of NTCIR-15

Presentation date： 2020
Overview of the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Inho Kang

Proceedings of NTCIR-15

Presentation date： 2020
Diversified Search Evaluation Measures Closer to User Perception

酒井哲也, Zhaohao Zeng

FIT 2020 Proceedings

Presentation date： 2020
On Fuhr’s Guideline for IR Evaluation

Tetsuya Sakai

SIGIR Forum

Presentation date： 2020
Polarity Classification of Japanese Tweets Based on Pseudo-Annotation

小橋賢介, 酒井哲也

DEIM 2019

Presentation date： 2019
An Approach to the FigureQA Task That Considers Abstract Images

坂本凜, 酒井哲也

DEIM 2019

Presentation date： 2019
A Study of the Fake News Challenge Using Convolutional Neural Networks

雨宮佑基, 酒井哲也

DEIM 2019

Presentation date： 2019
An Investigation of User Frustration Caused by Processing Errors in Voice User Interfaces

呉越思瑶, 酒井哲也

DEIM 2019

Presentation date： 2019
Query-Focused Extractive Summarization based on Deep Learning: Comparison of Similarity Measures for Pseudo Ground Truth Generation

Yuliska, Tetsuya Sakai

DEIM 2019

Presentation date： 2019
Exploring Multi-label Classification Using Text Graph Convolutional Networks on the NTCIR-13 MedWeb Dataset

Sijie Tao, Tetsuya Sakai

DEIM 2019

Presentation date： 2019
Supplementing Explanations to Users for Android App Permission Requests

小島智樹, 酒井哲也

DEIM 2019

Presentation date： 2019
A Study of Unknown-Word Annotation Using Active Learning

黒澤瞭佑, 酒井哲也

DEIM 2019

Presentation date： 2019
Dialogue Quality Distribution Prediction based on a Loss that Compares Adjacent Probability Bins

河東宗祐, 酒井哲也

DEIM 2019

Presentation date： 2019
Acquiring Diversity in a Chit-Chat Dialogue System Based on a Twitter Corpus

村田憲俊, 酒井哲也

DEIM 2019

Presentation date： 2019
Industry Recommendation from Job Application Entry Sheets Based on Document Classification

三王慶太, 酒井哲也

DEIM 2019

Presentation date： 2019
Graded Relevance Assessments and Graded Relevance Measures of NTCIR: A Survey of the First Twenty Years

Tetsuya Sakai

arXiv:1903.11272

Presentation date： 2019
RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

arXiv:1905.01799

Presentation date： 2019
Overview of the NTCIR-14 CENTRE Task

Tetsuya Sakai, Nicola Ferro, Ian Soboroff, Zhaohao Zeng, Peng Xiao, and Maria Maistro

Proceedings of NTCIR-14

Presentation date： 2019
Overview of the NTCIR-14 We Want Web Task

Jiaxin Mao, Tetsuya Sakai, Cheng Luo, Peng Xiao, Yiqun Liu, and Zhicheng Dou

Proceedings of NTCIR-14

Presentation date： 2019
Overview of the NTCIR-14 Short Text Conversation Task: Dialogue Quality and Nugget Detection Subtasks

Zhaohao Zeng, Sosuke Kato, and Tetsuya Sakai

Proceedings of NTCIR-14

Presentation date： 2019
SLSTC at the NTCIR-14 STC-3 Dialogue Quality and Nugget Detection Subtasks

Sosuke Kato, Rikiya Suzuki, Zhaohao Zeng, and Tetsuya Sakai

Proceedings of NTCIR-14

Presentation date： 2019
SLWWW at the NTCIR-14 We Want Web Task

Peng Xiao and Tetsuya Sakai

Proceedings of NTCIR-14

Presentation date： 2019
An Invitation to the NTCIR-15 Web Search and Reproducibility Task (WWW-3) and Dialogue Evaluation Task (DialEval-1)

酒井哲也

IPSJ SIG Technical Report 2019-IFAT-136

Presentation date： 2019
Overview of the TREC 2018 CENTRE Track

Ian Soboroff, Nicola Ferro, Maria Maistro, and Tetsuya Sakai

Proceedings of TREC 2018

Presentation date： 2019
Ranking Mobile Verticals Based on Clicks and Abandonment

川崎真未, Inho Kang, 酒井哲也

DEIM 2018

Presentation date： 2018
An Examination of Document Classification Using Generative Adversarial Nets

小島智樹, 酒井哲也

DEIM 2018

Presentation date： 2018
A Prototype Japanese Dialogue System Using Word-Level and Character-Level Information

村田憲俊, 酒井哲也

DEIM 2018

Presentation date： 2018
Classifying Community QA Questions That Contain an Image

Kenta Tamaki, Riku Togashi, Sumio Fujita, Hideyuki Maeda, Tetsuya Sakai

DEIM 2018

Presentation date： 2018
An Interactive Recommender System Tailored to User Needs

呉越思瑶, 酒井哲也

DEIM 2018

Presentation date： 2018
Report on NTCIR-13: The Thirteenth Round of NII Testbeds and Community for Information Access Research

Yiqun Liu, Makoto P. Kato, Charles L.A. Clarke, Noriko Kando, and Tetsuya Sakai

SIGIR Forum 52(1) 2018

Presentation date： 2018
A Comparative Study of Deep Learning Approaches for Visual Question Classification in Community QA

Hsin-Wen Liu, Avikalp Srivastava, Sumio Fujita, Toru Shimizu, Riku Togashi, and Tetsuya Sakai

IPSJ SIG Technical Report 2018-IFAT-132 (17)

Presentation date： 2018
A Study of Training Data Selection for a Dialogue Breakdown Detection Corpus

河東宗祐, 酒井哲也

IPSJ SIG Technical Report 2018-IFAT-132 (28)

Presentation date： 2018
Automatic Evaluation and Diversification of Image Search Based on Color, Shape, and Texture

富樫陸, 藤田澄男, 酒井哲也

IPSJ SIG Technical Report 2018-IFAT-132 (12)

Presentation date： 2018
Supplementing Permission Explanations to Users Using Android App Reviews

小島智樹, 酒井哲也

IPSJ SIG Technical Report

Presentation date： 2018
Designing Evaluation Experiments and Reporting Results in Papers: Are You Doing It Properly?

酒井哲也

3rd Natural Language Processing Symposium

Presentation date： 2017
Report on NTCIR-12: The Twelfth Round of NII Testbeds and Community for Information Access Research

Makoto P. Kato, Kazuaki Kishida, Noriko Kando, Tetsuya Sakai, and Mark Sanderson

SIGIR Forum 50 (2)

Presentation date： 2017
Sentiment Estimation of New Words Using the Sentiment Polarity Values of Surrounding Words in Tweets

黒澤瞭佑, 酒井哲也

DEIM 2017

Presentation date： 2017
Automatic Answering of Multiple-Choice Questions Using Answer Verification

佐藤航, 酒井哲也

DEIM 2017

Presentation date： 2017
A Detailed Analysis of Query Expansion Results in English-Japanese Cross-Language Retrieval

玉置賢太, 酒井哲也

DEIM 2017

Presentation date： 2017
Dialogue Breakdown Detection Considering Annotation Distributions

河東宗祐, 酒井哲也

DEIM 2017

Presentation date： 2017
Personalization of Recipe Search Using Expanded Queries

犬塚眞太郎, 酒井哲也

DEIM 2017

Presentation date： 2017
Vertical Relevance Estimation Using Click-Based Preference Graphs

門田見侑大, 吉田泰明, 藤田澄男, 酒井哲也

DEIM 2017

Presentation date： 2017
Effects of Interpersonal Relationships and Collaboration When Multiple People Work to Improve Their Sleep Habits

飯島聡美, 酒井哲也

DEIM 2017

Presentation date： 2017
Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

IPSJ SIG Technical Report 2017-NL-232

Presentation date： 2017
Ranking Rich Mobile Verticals based on Clicks and Abandonment

Mami Kawasaki, Inho Kang, and Tetsuya Sakai

Proceedings of NTCIR-13

Presentation date： 2017
Overview of the NTCIR-13 Short Text Conversation Task

Lifeng Shang, Tetsuya Sakai, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao, Yuki Arase, and Masako Nomoto

Proceedings of NTCIR-13

Presentation date： 2017
Overview of the NTCIR-13 We Want Web Task

Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong, and Jingfang Xu

Proceedings of NTCIR-13

Presentation date： 2017
SLOLQ at the NTCIR-13 OpenLiveQ Task

Ryo Kashimura and Tetsuya Sakai

Proceedings of NTCIR-13

Presentation date： 2017
SLQAL at the NTCIR-13 QA Lab-3 Task

Kou Sato and Tetsuya Sakai

Proceedings of NTCIR-13

Presentation date： 2017
SLSTC at the NTCIR-13 STC Task

Jun Guan and Tetsuya Sakai

Proceedings of NTCIR-13

Presentation date： 2017
SLWWW at the NTCIR-13 WWW Task

Peng Xiao, Lingtao Li, Yimeng Fan, and Tetsuya Sakai

Proceedings of NTCIR-13

Presentation date： 2017
Project Next IR: Failure Analysis of Information Retrieval

難波英嗣, 酒井哲也, 神門典子

IPSJ Magazine

Presentation date： 2016
A Study of a Dialogue System Based on Speaker-Aware Learning

河東宗祐, 酒井哲也

DEIM 2016

Presentation date： 2016
Behavior Pattern Analysis for Purchase Prediction on Shopping Sites

出縄弘人, Young-In Song, 酒井哲也

DEIM 2016

Presentation date： 2016
Analysis of Vertical Intents Using Search Logs with Context

門田見侑大, 吉田泰明, 藤田澄男, 酒井哲也

DEIM 2016

Presentation date： 2016
English-Japanese Cross-Language Retrieval Using Distributed Word Representations and Pseudo-Relevance Feedback

玉置賢太, 林佑明, 酒井哲也

DEIM 2016

Presentation date： 2016
Collaborative Healthcare: Improving Daytime Productivity through Regular Sleep

飯島聡美, 酒井哲也

DEIM 2016

Presentation date： 2016
Overview of the NTCIR-12 Short Text Conversation Task

Lifeng Shang, Tetsuya Sakai, Zhengdong Lu, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao

NTCIR-12

Presentation date： 2016
Overview of the NTCIR-12 MobileClick Task

Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita, and Sumio Fujita

NTCIR-12

Presentation date： 2016
NEXTI at NTCIR-12 IMine-2 Task

Hidetsugu Nanba, Tetsuya Sakai, Noriko Kando, Atsushi Keyaki, Koji Eguchi, Kenji Hatano, Toshiyuki Shimizu, Yu Hirate, and Atsushi Fujii

NTCIR-12

Presentation date： 2016
SLQAL at the NTCIR-12 QALab-2 Task

Shin Higuchi, Tetsuya Sakai

NTCIR-12

Presentation date： 2016
SLSTC at the NTCIR-12 STC Task

Hiroto Denawa, Tomoaki Sano, Yuta Kadotami, Sosuke Kato, and Tetsuya Sakai

NTCIR-12

Presentation date： 2016
SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask

Satomi Iijima and Tetsuya Sakai

NTCIR-12

Presentation date： 2016
Evaluating Helpdesk Dialogues: Initial Considerations from An Information Access Perspective

Tetsuya Sakai, Zhaohao Zeng, Cheng Luo

IPSJ SIG Technical Report

Presentation date： 2016
Dialogue Breakdown Prediction Using the Similarity of word2vec-Based Utterance Vectors

河東宗祐, 酒井哲也

78th Meeting of the JSAI SIG on Spoken Language Understanding and Dialogue Processing (SLUD) (7th Dialogue System Symposium)

Presentation date： 2016
TREC 2014 Temporal Summarization Track Overview

Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Richard McCreadie, and Tetsuya Sakai

TREC 2014

Presentation date： 2015
Cross-Language Information Retrieval Using Contextual Information from Distributed Word Representations

林佑明, 酒井哲也

DEIM Forum 2015

Presentation date： 2015
Error Analysis of Information Retrieval

難波英嗣, 酒井哲也

Workshop at the 21st Annual Meeting of the Association for Natural Language Processing

Presentation date： 2015
Topic Set Size Design with the Evaluation Measures for Short Text Conversation

Tetsuya Sakai

Presentation date： 2015
ECol 2015: First International Workshop on the Evaluation of Collaborative Information Seeking and Retrieval

Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine

ACM CIKM 2015

Presentation date： 2015
TREC 2013 Temporal Summarization

Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Virgil Pavlu, and Tetsuya Sakai

TREC 2013

Presentation date： 2014
Analysis of and Countermeasures against Android Apps That Abuse Video Input Devices

渡邉卓弥, 森達哉, 酒井哲也

IEICE Technical Report

Presentation date： 2014
Correlation Analysis of Android App Descriptions and Access to Private Information

渡邉卓弥, 秋山満昭, 酒井哲也, 鷲崎弘宜, 森達哉

Anti-Malware Engineering Workshop 2014 (MWS 2014)

Presentation date： 2014
Overview of the NTCIR-11 MobileClick Task

Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Mayu Iwata

NTCIR-11

Presentation date： 2014
A Preview of the NTCIR-10 INTENT-2 Results

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, and Mayu Iwata

IPSJ SIG Technical Report

Presentation date： 2013
Overview of the NTCIR-10 INTENT-2 Task

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, and Mayu Iwata

NTCIR-10

Presentation date： 2013
Overview of the NTCIR-10 1CLICK-2 Task

Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Mayu Iwata

NTCIR-10

Presentation date： 2013
Microsoft Research Asia at the NTCIR-10 Intent Task

Kosetsu Tsukuda, Zhicheng Dou, and Tetsuya Sakai

NTCIR-10

Presentation date： 2013
MSRA at NTCIR-10 1CLICK-2

Kazuya Narita, Tetsuya Sakai, Zhicheng Dou, and Young-In Song

NTCIR-10

Presentation date： 2013
How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures

Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2013
Toward Mobile "Information" Retrieval: An Invitation to the NTCIR-11 MobileClick Task

加藤誠, Matthew Ekstrand-Abueg, Virgil Pavlu, 酒井哲也, 山本岳洋, 岩田麻佑

5th Meeting of the JSAI SIG on Interactive Information Access and Visual Mining

Presentation date： 2013
Ambiguous Queries and (Un)clear Queries: An Invitation to the NTCIR-10 INTENT-2 and 1CLICK-2 Tasks

酒井哲也

IPSJ SIG Technical Report

Presentation date： 2012
NTCIR-9: Review and Future Prospects

酒井哲也, 上保秀夫, 神門典子, 加藤恒昭, 相澤彰子, 秋葉友良, 後藤功雄, 木村文則, 三田村照子, 西崎博光, 嶋秀樹, 吉岡真治, Shlomo Geva, Ling-Xiang Tang, Andrew Trotman, and Yue Xu

IPSJ SIG Technical Report

Presentation date： 2012
Frontiers, Challenges, and Opportunities for Information Retrieval: Report from SWIRL 2012, the Second Strategic Workshop on Information Retrieval in Lorne

Allan, J, Aslam, J, Azzopardi, L, Belkin, N, Borlund, P, Bruza, P, Callan, J, Carman, M, Clarke, C.L.A, Craswell, N, Croft, W.B, Culpepper, J.S, Diaz, F, Dumais, S, Ferro, N, Geva, S, Gonzalo, J, Hawking, D, Jarvelin, K, Jones, G, Jones, R, Kamps, J, Kando, N, Kanoulas, E, Karlgren, J, Kelly, D, Lease, M, Lin, J, Mizzaro, S, Moffat, A, Murdock, V, Oard, D.W, de Rijke, M, Sakai, T, Sanderson, M, Scholer, F, Si, L, Thom, J.A, Thomas, P, Trotman, A, Turpin, A

SIGIR Forum

Presentation date： 2012
The Reusability of a Diversified Search Test Collection

Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2012
One Click One Revisited: Enhancing Evaluation based on Information Units

Tetsuya Sakai and Makoto P. Kato

IPSJ SIG Technical Report

Presentation date： 2012
Determination of the Quality of Answers in Community QA by Multiple Assessors

石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

Journal of Japan Society of Information and Knowledge

Presentation date： 2011
Japanese Hyponymy Extraction based on a Term Similarity Graph

Takuya Akiba and Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2011
Overview of NTCIR-9

Tetsuya Sakai and Hideo Joho

NTCIR-9 Proceedings

Presentation date： 2011
Overview of the NTCIR-9 INTENT Task

Ruihua Song, Min Zhang, Tetsuya Sakai, Makoto P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang and Naoki Orii

NTCIR-9 Proceedings

Presentation date： 2011
Overview of NTCIR-9 1CLICK

Tetsuya Sakai, Makoto P. Kato, and Young-In Song

NTCIR-9 Proceedings

Presentation date： 2011
Microsoft Research Asia at the NTCIR-9 1CLICK Task

Naoki Orii, Young-In Song, and Tetsuya Sakai

NTCIR-9 Proceedings

Presentation date： 2011
Microsoft Research Asia at the NTCIR-9 Intent Task

Jialong Han, Qinglei Wang, Naoki Orii, Zhicheng Dou, Tetsuya Sakai, and Ruihua Song

NTCIR-9 Proceedings

Presentation date： 2011
TTOKU Summarization Based Systems at NTCIR-9 1CLICK Task

Hajime Morita, Takuya Makino, Tetsuya Sakai, Hiroya Takamura, and Manabu Okumura

NTCIR-9 Proceedings

Presentation date： 2011
Grid-based Interaction for NTCIR-9 VisEx Task

Hideo Joho, Tetsuya Sakai

NTCIR-9 Proceedings

Presentation date： 2011
A Study of a Grid-Based Interaction Model for NTCIR-9 VisEx

上保秀夫, 酒井哲也

7th Meeting of the JSAI SIG on Information Compilation

Presentation date： 2011
Analysis of Best-Answer Estimation on Q&A Sites and Its Application to Machine Learning

石川大介, 栗山和子, 酒井哲也, 関洋平, 神門典子

Proceedings of the Annual Conference of the Japan Society of Information and Knowledge

Presentation date： 2010
Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access

Teruko Mitamura, Hideki Shima, Tetsuya Sakai, Noriko Kando, Tatsunori Mori, Koichi Takeda, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, and Cheng-Wei Lee

NTCIR-8 Proceedings

Presentation date： 2010
Overview of NTCIR-8 ACLIA IR4QA

Tetsuya Sakai, Hideki Shima, Noriko Kando, Ruihua Song, Chuan-Jie Lin, Teruko Mitamura, Miho Sugimoto, and Cheng-Wei Lee

NTCIR-8 Proceedings

Presentation date： 2010
NTCIR-GeoTime Overview: Evaluating Geographic and Temporal Search

Fredric Gey, Ray Larson, Noriko Kando, Jorge Machado, and Tetsuya Sakai

NTCIR-8 Proceedings

Presentation date： 2010
Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task

Daisuke Ishikawa, Tetsuya Sakai, and Noriko Kando

NTCIR-8 Proceedings

Presentation date： 2010
Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation

Tetsuya Sakai, Daisuke Ishikawa, and Noriko Kando

NTCIR-8 Proceedings

Presentation date： 2010
Microsoft Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task

Young-In Song, Jing Liu, Tetsuya Sakai, Xin-Jing Wang, Guwen Feng, Yunbo Cao, Hisami Suzuki, and Chin-Yew Lin

NTCIR-8 Proceedings

Presentation date： 2010
Multilinguality at NTCIR, and moving on... (invited talk)

Tetsuya Sakai [Invited]

Proceedings of the COLING 2010 Fourth Workshop on Cross Lingual Information Access

Presentation date： 2010
EVIA 2010: The Third International Workshop on Evaluating Information Access

William Webber, Tetsuya Sakai, and Mark Sanderson

ACM SIGIR Forum

Presentation date： 2010
Query Log Analysis of an Exploratory Search Site That Leverages Wikipedia

酒井哲也, 野上謙一

IPSJ SIG Technical Report

Presentation date： 2009
NTCIR-7 ACLIA IR4QA Results based on Qrels Version 2

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, and Teruko Mitamura

NTCIR-7 Online Proceedings

Presentation date： 2009
EVIA 2008: The Second International Workshop on Evaluating Information Access

Tetsuya Sakai, Mark Sanderson, and Noriko Kando

ACM SIGIR Forum

Presentation date： 2009
Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, and Teruko Mitamura

IPSJ SIG Technical Report

Presentation date： 2009
Report on the SIGIR 2009 Workshop on the Future of IR Evaluation

Jaap Kamps, Shlomo Geva, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

ACM SIGIR Forum

Presentation date： 2009
Tutorial: Information Retrieval Test Collections and Evaluation Metrics

酒井哲也

IPSJ SIG Technical Report

Presentation date： 2008
Comparing Metrics across TREC and NTCIR: The Robustness to System Bias

Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2008
Breaking News from NTCIR-7 (in Japanese)

酒井哲也, 加藤恒昭, 藤井敦, 難波英嗣, 関洋平, 三田村照子, 神門典子

Digital Libraries Editorial Committee

Presentation date： 2008
Are Popular Documents More Likely To Be Relevant? A Dive into the ACLIA IR4QA Pools

Tetsuya Sakai and Noriko Kando

Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)

Presentation date： 2008
Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access

Teruko Mitamura, Eric Nyberg, Hideki Shima, Tsuneaki Kato, Tatsunori Mori, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Tetsuya Sakai, Donghong Ji, and Noriko Kando

NTCIR-7 Proceedings

Presentation date： 2008
Overview of the NTCIR-7 ACLIA IR4QA Task

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen, and Eric Nyberg

NTCIR-7 Proceedings

Presentation date： 2008
Generating and Evaluating Topic Labels to Support Efficient TV Program Viewing

小山誠, 酒井哲也, 福井美佳, 上原龍也, 下森大志

IPSJ SIG Technical Report

Presentation date： 2007
Toshiba BRIDJE at NTCIR-6 CLIR: The Head/Lead Method and Graded Relevance Feedback

Tetsuya Sakai, Makoto Koyama, Tatsuya Izuha, Akira Kumano, Toshihiko Manabe, and Tomoharu Kokubu

NTCIR-6 Proceedings

Presentation date： 2007
A Further Note on Alternatives to Bpref

Tetsuya Sakai and Noriko Kando

IPSJ SIG Technical Report

Presentation date： 2007
EVIA 2007: The First International Workshop on Evaluating Information Access

Mark Sanderson, Tetsuya Sakai, and Noriko Kando

ACM SIGIR Forum

Presentation date： 2007
Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2006
Improving the Robustness to Recognition Errors in Speech Input Question Answering

筒井秀樹, 真鍋俊彦, 福井美佳, 藤井寛子, 浦田耕二, 酒井哲也

IPSJ SIG Technical Report

Presentation date： 2005
Document Classification Techniques and Their Application to Questionnaire Analysis

平澤茂一, 石田崇, 足立鉱史, 後藤正幸, 酒井哲也

Japan Society for Management Information, 2005 Spring National Conference

Presentation date： 2005
An Internet-Based Research Support Environment: Information Retrieval System

石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

Japan Society for Management Information, 2005 Spring National Conference

Presentation date： 2005
On the Relationship between the Rank of Correct Answers and User Satisfaction in Question Answering Systems

國分智晴, 酒井哲也, 齋藤佳美, 筒井秀樹, 真鍋俊彦, 藤井寛子

IPSJ SIG Technical Report

Presentation date： 2005
Analysis of Student Questionnaires on a Teaching Support System

渡辺智幸, 後藤正幸, 石田崇, 酒井哲也, 平澤茂一

FIT 2005 General Session Proceedings

Presentation date： 2005
The Effect of Topic Sampling in Sensitivity Comparisons of Information Retrieval Metrics

Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2005
Toshiba BRIDJE at NTCIR-5: Evaluation using Geometric Means

Tetsuya Sakai, Toshihiko Manabe, Akira Kumano, Makoto Koyama, and Tomoharu Kokubu

NTCIR-5 Proceedings

Presentation date： 2005
A Multimodal Help System Based on Question Answering Technology

浦田耕二, 福井美佳, 藤井寛子, 鈴木優, 酒井哲也, 齋藤佳美, 市村由美, 佐々木寛

IPSJ SIG Technical Report

Presentation date： 2004
A Consideration of the Relationship between Question Answering and Japanese Named Entity Extraction and Named Entity Taxonomies

市村由美, 齋藤佳美, 酒井哲也, 國分智晴, 小山誠

IPSJ SIG Technical Report

Presentation date： 2004
Toshiba BRIDJE at NTCIR-4 CLIR: Monolingual/Bilingual IR and Flexible Feedback

Tetsuya Sakai, Makoto Koyama, Akira Kumano, and Toshihiko Manabe

NTCIR-4 Proceedings

Presentation date： 2004
Toshiba ASKMi at NTCIR-4 QAC2

Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, and Tomoharu Kokubu

NTCIR-4 Proceedings

Presentation date： 2004
A Student Questionnaire Analysis System Based on Natural Language Expressions

酒井哲也, 石田崇, 後藤正幸, 平澤茂一

FIT 2004 General Session Proceedings N-021

Presentation date： 2004
Extracting Term Definitions from Newspaper Articles and Classifying Them by Named Entity Class

小山誠, 酒井哲也, 真鍋俊彦

IPSJ SIG Technical Report

Presentation date： 2004
High-Precision Search via Question Abstraction for Japanese Question Answering

Tetsuya Sakai, Yoshimi Saito, Tomoharu Kokubu, Makoto Koyama, and Toshihiko Manabe

IPSJ SIG Technical Report

Presentation date： 2004
Analysis of Multiple-Choice and Free-Text Student Questionnaires Using Information Retrieval Techniques

石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

Japan Society for Management Information, 2004 Autumn National Conference

Presentation date： 2004
A Note on the Reliability of Japanese Question Answering Evaluation

Tetsuya Sakai

IPSJ SIG Technical Report

Presentation date： 2004
Efficient Analysis of Course Questionnaires Using Information Retrieval Techniques

酒井哲也, 伊藤潤, 後藤正幸, 石田崇, 平澤茂一

Japan Society for Management Information, 2003 Spring National Conference

Presentation date： 2003
Knowledge Discovery from Multiple-Choice and Free-Text Questionnaires

後藤正幸, 酒井哲也, 伊藤潤, 石田崇, 平澤茂一

2003 PC Conference

Presentation date： 2003
Analysis of Multiple-Choice and Free-Text Questionnaires on Courses

平澤茂一, 石田崇, 伊藤潤, 後藤正幸, 酒井哲也

Private University Information Education Association, FY2003 National Conference on University Informatization

Presentation date： 2003
Knowledge Discovery from Documents Using PLSI

伊藤潤, 石田崇, 後藤正幸, 酒井哲也, 平澤茂一

FIT 2003 General Session Proceedings

Presentation date： 2003
Evaluation of Passage Retrieval in Question Answering Systems

國分智晴, 酒井哲也

FIT 2003 General Session Proceedings

Presentation date： 2003
Toshiba KIDS at NTCIR-3: Japanese and English-Japanese IR

Tetsuya Sakai, Makoto Koyama, Mika Suzuki, and Toshihiko Manabe

NTCIR-3 Proceedings

Presentation date： 2003
An Automatic Analysis Method for Document Files Using Bayesian Statistics

後藤正幸, 伊藤潤, 石田崇, 酒井哲也, 平澤茂一

Japan Society for Management Information, 2003 Autumn National Conference

Presentation date： 2003
A Course Model and Its Validation

石田崇, 伊藤潤, 後藤正幸, 酒井哲也, 平澤茂一

Japan Society for Management Information, 2003 Autumn National Conference

Presentation date： 2003
Extracting Important Parts of Japanese Documents Using Dependency Trees

伊藤潤, 酒井哲也, 平澤茂一

IPSJ SIG Technical Report

Presentation date： 2003
Flexible Pseudo-Relevance Feedback for NTCIR-2

Tetsuya Sakai, Stephen E. Robertson, and Stephen Walker

NTCIR-2

Presentation date： 2001
Generic Summaries for Indexing in Information Retrieval - Detailed Test Results

Tetsuya Sakai and Karen Sparck Jones

Computer Laboratory, University of Cambridge

Presentation date： 2001
An Internet-Based Research Activity Support System

平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

2001 PC Conference

Presentation date： 2001
A Study of Cross-Language Information Retrieval Using BMIR-J2

酒井哲也, 梶浦正浩, 住田一男

IPSJ SIG Technical Report

Presentation date： 1999
Probabilistic Retrieval of Japanese News Articles for IREX at Toshiba

Tetsuya Sakai, Masaharu Kajiura, and Kazuo Sumita

IREX Workshop

Presentation date： 1999
Cross-Language Information Retrieval for NTCIR at Toshiba

Tetsuya Sakai, Yasuyo Shibazaki, Masaru Suzuki, Masaharu Kajiura, Toshihiko Manabe, Kazuo Sumita

NTCIR-1

Presentation date： 1999
BMIR-J2: A Test Collection for Evaluation of Japanese Information Retrieval Systems

Tetsuya Sakai, Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuro Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata, and Noriko Kando

ACM SIGIR Forum

Presentation date： 1999
First Experiments on the BMIR-J2 Collection using the NEAT System

Gareth Jones, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

IPSJ SIG Technical Report

Presentation date： 1998
Cross-Language Information Access: a case study for English and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita, and Hideki Hirakawa

IPSJ SIG Technical Report

Presentation date： 1998
BMIR-J2: A Test Collection for Evaluating Japanese Information Retrieval Systems

木谷強, 小川泰嗣, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

IPSJ SIG Technical Report

Presentation date： 1997
Development of the NEAT Information Filtering System

梶浦正浩, 三池誠司, 酒井哲也, 佐藤誠, 住田一男

54th IPSJ National Convention

Presentation date： 1997
Evaluation of the NEAT Information Filtering System Using the BMIR-J2 Benchmark

酒井哲也, 梶浦正浩, 三池誠司, 佐藤誠, 住田一男

54th IPSJ National Convention

Presentation date： 1997
Generating Profiles from Search Request Statements for the NEAT Information Filtering System

酒井哲也, 梶浦正浩, 住田一男

IPSJ SIG Technical Report

Presentation date： 1997
Efficient Document Retrieval for Digital Libraries

住田一男, 酒井哲也, 小野顕司, 三池誠司

Digital Libraries No. 3

Presentation date： 1995
Evaluation of a Dynamic Abstract Presentation Interface for a Document Retrieval System

酒井哲也, 三池誠司, 住田一男

IPSJ SIG Technical Report (Human-Computer Interaction)

Presentation date： 1994


Research Projects

Nugget-based Automatic Evaluation of Task-oriented dialogues

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2017.04 - 2021.03

Sakai Tetsuya


We served as task organisers of NTCIR (NII Testbeds and Community for Information access Research), an international evaluation forum, and constructed the DCH-2 dialogue data set. Our paper on an early version of this data set received the excellent paper runner-up award at WebDB Forum 2018. We also designed an evaluation measure for a task that requires systems to estimate the gold distribution of dialogue quality scores and showed its effectiveness. The work has been accepted as a full paper at ACL-IJCNLP 2021, a top conference in natural language processing.
Exploratory Search Considering the User's Situation

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2016.04 - 2020.03

Kando Noriko


The purpose of this study is to propose interactive exploratory search technologies that support users according to their individual situations. First, we conducted multiple laboratory and crowdsourcing user studies and investigated various factors that affect the search process and users' perceived satisfaction, such as the cognitive complexity of search tasks. Second, we investigated and tested a wide variety of tools and underlying technologies that support users' exploratory search, such as inferring the user's situation from logs and/or eye gaze, query recommendation, search result diversification, multi-faceted search, and ostensive search. Third, we developed a prototype interactive exploratory search and guide app for museums based on the ostensive search model; using user logs from both the virtual space of the app and the physical space of the museum, it encouraged a memorable museum experience and provided context before and after museum visits.

Misc

Developing an Interactive Analysis System for Voice Assistant Apps

刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

IEICE Technical Report (Web) 120 (384 (ICSS2020 26-59)) 2021

Overview of the NTCIR-12 MobileClick-2 Task.

Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita, Sumio Fujita

Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016 2016
Overview of the NTCIR-11 MobileClick Task.

Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-11, National Center of Sciences, Tokyo, Japan, December 9-12, 2014 2014
A Preview of the NTCIR-10 INTENT-2 Results

2013 ( 5 ) 1 - 8 2013.02

Overview of the NTCIR-10 1CLICK-2 Task.

Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-10, National Center of Sciences, Tokyo, Japan, June 18-21, 2013 182 - 211 2013
Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

Hajime Morita, Tetsuya Sakai, Manabu Okumura

5 ( 2 ) 11 - 16 2012.06

Determination of the Quality of Answers in Community QA by Multiple Assessors

ISHIKAWA Daisuke, SAKAI Tetsuya, SEKI Yohei, KURIYAMA Kazuko, KANDO Noriko

Journal of Japan Society of Information and Knowledge 21 ( 2 ) 169 - 177 2011.05


Community Question Answering (CQA) has recently become a popular means of satisfying personal information needs. However, as the quality of answers posted on CQA sites varies widely, there is a need to effectively extract high-quality answers from CQA. In this study, multiple assessors manually analyzed high-quality answers in Yahoo! Chiebukuro data, and we identified the criteria the assessors used to evaluate high-quality answers.

Overview of the NTCIR-9 INTENT Task.

Ruihua Song, Min Zhang, Tetsuya Sakai, Makoto P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang, Naoki Orii

Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NTCIR-9, National Center of Sciences, Tokyo, Japan, December 6-9, 2011 82 - 105 2011
Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments

SAKAI TETSUYA, KANDO NORIKO, LIN CHUAN-JIE, SONG RUIHUA, SHIMA HIDEKI, MITAMURA TERUKO

2009 ( 9 ) 1 - 8 2009.07

Hobby & Work

SAKAI Tetsuya

IPSJ Magazine 49 ( 7 ) 835 - 835 2008.07

Comparing Metrics across TREC and NTCIR : The Robustness to System Bias

SAKAI Tetsuya

IPSJ SIG Notes 2008 ( 56 ) 1 - 8 2008.06


Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that they are not necessarily superior to traditional metrics in the presence of system bias. Using data from both TREC and NTCIR, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'.
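A few lines of code make the difference between a traditional metric and its condensed-list version concrete. The Python sketch below is written under stated assumptions and is not taken from the paper. It computes binary-relevance AP and AP'. Here `qrels` is a hypothetical dictionary from judged document IDs to grades, where 0 means judged nonrelevant. Unjudged documents count as nonrelevant for AP and are removed before computing AP'. The function names and toy data are illustrative only.

```python
from typing import Dict, List


def average_precision(ranked: List[str], qrels: Dict[str, int]) -> float:
    """Binary-relevance Average Precision.

    qrels maps judged document IDs to grades (0 = judged nonrelevant);
    documents absent from qrels (unjudged) count as nonrelevant.
    """
    num_relevant = sum(1 for grade in qrels.values() if grade > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / num_relevant


def condensed_list(ranked: List[str], qrels: Dict[str, int]) -> List[str]:
    """Remove unjudged documents while preserving the original order."""
    return [doc for doc in ranked if doc in qrels]


def average_precision_prime(ranked: List[str], qrels: Dict[str, int]) -> float:
    """AP' (hypothetical helper name): AP computed over the condensed list."""
    return average_precision(condensed_list(ranked, qrels), qrels)


if __name__ == "__main__":
    qrels = {"d1": 1, "d2": 0, "d3": 1, "d4": 1}  # toy judgments
    run = ["d1", "u1", "d2", "d3"]                  # u1 is unjudged
    print(average_precision(run, qrels))            # 0.5
    print(average_precision_prime(run, qrels))      # about 0.556
```

In this toy run, dropping the unjudged `u1` moves `d3` from rank 4 to rank 3. As a result, AP' (5/9) exceeds AP (0.5), which matches the direction of the overestimation the abstract reports for condensed-list metrics.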

Information Retrieval Test Collections and Evaluation Metrics: A Tutorial

Sakai Tetsuya

IPSJ SIG Notes 2008 ( 4 ) 1 - 8 2008.01

A Further Note on Alternatives to Bpref

SAKAI Tetsuya, KANDO Noriko

IPSJ SIG Notes 2007 ( 109 ) 7 - 14 2007.11


This paper compares the robustness of information retrieval (IR) metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs: two from TREC and two from NTCIR. We investigate the effect of reducing the original relevance data on discriminative power (i.e., how often statistical significance can be detected given the probability of Type I error) and on Kendall's rank correlation between two system rankings. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also clarify some properties of these metrics that follow immediately from their definitions.
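Kendall's rank correlation between the system ranking under the full relevance data and the ranking under reduced data can be sketched as follows. This minimal Python version makes two assumptions. Each ranking is given as a dictionary from system name to mean metric score. No scores are tied, so it computes tau-a. The system names and scores are hypothetical.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Kendall's tau-a between two system rankings given as system -> mean score.

    Assumes both dicts cover the same systems and that there are no tied scores.
    """
    if set(scores_a) != set(scores_b):
        raise ValueError("both rankings must cover the same systems")
    systems = sorted(scores_a)
    n = len(systems)
    if n < 2:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign of the score differences means the pair is ordered the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    full = {"A": 0.40, "B": 0.35, "C": 0.30, "D": 0.20}     # hypothetical full qrels
    reduced = {"A": 0.42, "B": 0.31, "C": 0.33, "D": 0.18}  # hypothetical reduced qrels
    print(kendall_tau(full, reduced))  # B and C swap: (5 - 1) / 6
```

Swapping a single adjacent pair among four systems lowers tau from 1 to about 0.667. This shows how quickly the correlation reacts to small changes in a short ranking.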

After Q and R Comes O, then P…

SAKAI Tetsuya

IPSJ Magazine 48 ( 7 ) 761 - 761 2007.07

Automatic Generation of Topic Labels for Efficient Video Viewing

KOYAMA Makoto, SAKAI Tetsuya, FUKUI Mika, UEHARA Tatsuya, SHIMOMORI Taishi

IPSJ SIG Notes 2007 ( 34 ) 17 - 23 2007.03


This paper describes a method for generating keyword, phrase and sentence labels for video segments of TV programs. Using a relevance feedback algorithm from information retrieval, it selects topic keywords, phrases and sentences from the closed caption text of each topical segment. Thirty-nine subjects evaluated keyword, phrase and sentence labels for TV programs about travel, towns and cooking. The results show that keyword and phrase labels outperform sentence labels in terms of understandability and relevance.
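The abstract does not say which relevance feedback algorithm is used. The following Python sketch therefore uses Robertson's offer weight as a hypothetical stand-in, which is not necessarily the paper's method. The offer weight is the number of "relevant" units containing a term multiplied by the Robertson/Sparck Jones relevance weight. The sketch treats the caption sentences of one segment as the relevant set and all caption sentences of the program as the collection, and ranks that segment's candidate keywords. Tokenization, the function name and the toy data are all assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def offer_weight_ranking(
    segment: List[List[str]], collection: List[List[str]]
) -> List[Tuple[str, float]]:
    """Rank the terms of a segment by Robertson's offer weight (hypothetical helper).

    segment:    tokenized sentences of the target topical segment (the "relevant" set)
    collection: tokenized sentences of the whole program; must include the segment
    """
    N = len(collection)
    R = len(segment)
    n = Counter(term for sent in collection for term in set(sent))  # sentence frequency
    r = Counter(term for sent in segment for term in set(sent))     # frequency in segment
    ranking = []
    for term, r_t in r.items():
        n_t = n[term]
        # Robertson/Sparck Jones weight with the usual 0.5 smoothing.
        rsj = math.log(
            ((r_t + 0.5) * (N - n_t - R + r_t + 0.5))
            / ((n_t - r_t + 0.5) * (R - r_t + 0.5))
        )
        ranking.append((term, r_t * rsj))
    # Sort by descending weight; break ties alphabetically for determinism.
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    segment = [["kyoto", "temple", "autumn"], ["kyoto", "temple", "garden"]]
    others = [["tokyo", "ramen"], ["tokyo", "train"], ["ramen", "recipe", "autumn"], ["train", "station"]]
    for term, weight in offer_weight_ranking(segment, segment + others):
        print(f"{term}\t{weight:.3f}")
```

Terms concentrated in the segment, such as "kyoto" and "temple", rank above terms that also occur elsewhere in the program, such as "autumn". That is the behavior one would want from a keyword label.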

Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

SAKAI Tetsuya

IPSJ SIG Notes 2006 ( 94 ) 57 - 64 2006.09

Large-scale information retrieval evaluation efforts such as TREC and NTCIR have always used binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evaluation with graded relevance.

Improving the Robustness to Recognition Errors in Speech Input Question Answering

TSUTSUI Hideki, MANABE Toshihiko, FUKUI Mika, FUJII Hiroko, URATA Koji, SAKAI Tetsuya

IPSJ SIG Notes 2005 ( 22 ) 31 - 38 2005.03

We have been developing a multimodal question answering system that combines search technology for highly expressive multimodal content such as video, speech and text with factoid question answering technology for understanding the user's information need and extracting exact answers from text. Failure analyses of our system showed that speech recognition errors were fatal for answer type recognition, and therefore for the final Mean Reciprocal Rank (MRR) performance, especially with numerical factoid questions. We therefore propose a new method that is robust to speech recognition errors. This method improves our MRR, computed over the top 3 answers, from 0.429 to 0.597.

A Note on the Reliability of Japanese Question Answering Evaluation

SAKAI Tetsuya

IPSJ SIG Technical Reports 2004 ( 119 ) 57 - 64 2004.11

This paper compares some existing QA evaluation metrics from the viewpoint of reliability and usefulness, using the NTCIR-4 QAC2 Japanese QA tasks and our adaptations of the Buckley/Voorhees and Voorhees/Buckley reliability measurement methods. Our main findings are: (1) the fraction of questions with a correct answer within the top 5 (NQcorrect5) and that with a correct answer at rank 1 (NQcorrect1) are not as stable as Reciprocal Rank based on ranked lists containing up to five answers; (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is as reliable and useful as Reciprocal Rank, provided that a mild gain value assignment is used. Emphasising answer correctness levels tends to hurt stability, while handling multiple correct answers improves it.

High-Precision Search via Question Abstraction for Japanese Question Answering

SAKAI Tetsuya, SAITO Yoshimi, KOKUBU Tomoharu, KOYAMA Makoto, MANABE Toshihiko

IPSJ SIG Notes 2004 ( 93 ) 139 - 146 2004.09

This paper explores the use of Question Abstraction, i.e., Named Entity Recognition applied to questions input by the user, for reranking retrieved documents to enhance retrieval precision for Japanese Question Answering (QA). Question Abstraction may help improve precision because (a) as named entities are often phrases, it may have effects similar to those of phrasal or proximity search; and (b) as named entity recognition is context-sensitive, the named entity tags may help disambiguate ambiguous terms and phrases. Our experiments using several Japanese "exact answer" QA test collections show that this approach significantly improves IR precision, but that this improvement does not necessarily carry over to overall QA performance. Additionally, we conduct preliminary experiments on the use of Question Abstraction for Pseudo-Relevance Feedback using Japanese IR test collections, and find positive (though not statistically significant) effects. Thus the Question Abstraction approach probably deserves further investigation.

Extraction and Classification of Term Definitions Using Named Entity Extraction from News Articles

KOYAMA Makoto, SAKAI Tetsuya, MANABE Toshihiko

IPSJ SIG Notes 2004 ( 93 ) 45 - 51 2004.09

In this paper, we propose a system that uses Japanese newspaper corpora to extract and classify term definitions in order to expand the knowledge of a natural language system such as a question answering system. The system classifies term definitions based on semantic classes obtained through named entity extraction and words obtained through morphological analysis. In an experiment using news articles, the system classifies term definitions into 14 semantic classes and achieves 82.1% precision and 50.8% recall.

N-021 A Student Questionnaire Analysis System based on Natural Language Expressions

Sakai Tetsuya, Ishida Takashi, Goto Masayuki, Hirasawa Shigeichi

3 ( 4 ) 325 - 328 2004.08

Japanese Text Extraction using the Dependency Tree

ITO Jun, SAKAI Tetsuya, HIRASAWA Shigeichi

IPSJ SIG Notes 2003 ( 108 ) 19 - 24 2003.11

A Japanese sentence can be expressed as a tree structure (dependency tree) based on dependency relations. Since a subtree of a dependency tree preserves the dependencies of the original tree, it generally represents a correct sentence on its own. In this paper, a document is expressed as an extended dependency tree, in which weights are assigned to its nodes and edges. The problem of extracting important text fragments is then formalized as that of "searching the subtrees of the extended dependency tree for one that maximizes a certain score". We implemented such a summarization system and evaluated it through manual assessment as well as comparison with the original texts.

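An Automatic Analysis Method for Document Files Using Bayesian Statistics

後藤正幸, 伊藤潤, 石田崇, 酒井哲也

Proceedings of the Japan Society for Management Information 2003 Fall National Conference, Hakodate, pp. 28-31 2003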
"Research Activity Support System Using the Internet": System Configuration

平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

2001 PC Conference 2001
A Study on Cross-language Information Retrieval using BMIR-J2

SAKAI Tetsuya, KAJIURA Masahiro, SUMITA Kazuo

IPSJ SIG Notes 1999 ( 2 ) 41 - 48 1999.01

We study a cross-language IR approach using the NEAT information filtering system and the AS-TRANSAC machine translation system. The BMIR-J2 standard Japanese test collection and our own translated data are used for evaluation. In the English-to-Japanese experiments, we consider both document translation and query translation, and also compare the retrieval performance when the queries are translated by different translators. In the Japanese-to-pseudo-English experiments, we perform local feedback both before and after query translation. We achieve over 90% of Japanese monolingual performance.

BMIR-J2: A Test Collection for Evaluation of Japanese Information Retrieval Systems

KITANI Tsuyoshi, OGAWA Yasushi, ISHIKAWA Tetsuya, KIMOTO Haruo, NAKAWATASE Hidekazu, KESHI Ikuo, TOYOURA Jun, FUKUSHIMA Toshikazu, MATSUI Kunio, UEDA Yoshihiro, SAKAI Tetsuya, TOKUNAGA Takenobu, TSURUOKA Hiroshi, AGATA Teru

IPSJ SIG Notes 1998 ( 2 ) 15 - 22 1998.01

BMIR-J2, a test collection for evaluation of Japanese information retrieval systems to be released in March 1998, has been developed by a working group under the Special Interest Group on Database Systems of the Information Processing Society of Japan. Since March 1996, a preliminary version called BMIR-J1 has been distributed to fifty sites and used in many research projects. Based on comments from the BMIR-J1 users and our experience, we have enlarged the collection size and revised search queries and relevance assessments in BMIR-J2. In this paper, we describe BMIR-J2 and its development process, and discuss issues to be considered for improving BMIR-J2 further.

Profile Generation from Query Sentences for the NEAT Information Filtering System

SAKAI Tetsuya, KAJIURA Masahiro, SUMITA Kazuo

IPSJ SIG Notes 1997 ( 86 ) 83 - 88 1997.09

The NEAT information filtering system selects relevant articles from digital text provided daily by Japanese newspaper companies and publishers, and sends them by e-mail to its users. NEAT calculates a score for each article and produces a ranked output based on various types of query vectors written in the profile, such as the location, density and distribution of keywords, as well as Boolean operators. We show that profiles generated automatically from query sentences can lie halfway between simple Boolean profiles and hand-made profiles in terms of retrieval effectiveness. Combining this method with relevance feedback should considerably lighten the burden of manual profile definition.

Evaluation of the NEAT Information Filtering System Using the BMIR-J1 Benchmark

54 301 - 302 1997.03

Development of Information Filtering System NEAT

54 299 - 300 1997.03

Effective Document Retrieval for Digital Library: Document Structure Analysis and Automatic Abstract Generation

Sumita Kazuo, Sakai Tetsuya, Ono Kenji, Miike Seiji

Digital libraries 3 35 - 41 1995.03

Evaluation of the Abstract Presentation Interface for a Document Retrieval System

1994 ( 96 ) 49 - 54 1994.11

A Document Retrieval System with an Interactive Abstract Generator : Functions and Configuration

48 275 - 276 1994.03

Learning Formal Languages from Feasible Teachers

Journal of Japan Industrial Management Association 44 ( 3 ) 245 - 245 1993.08

Syllabus

Master's Thesis (Department of Computer Science and Communications Engineering)

Graduate School of Fundamental Science and Engineering

2025 full year
Master's Thesis (Department of Computer Science and Communications Engineering)

Graduate School of Fundamental Science and Engineering

2025 full year
Research on Information Access

Graduate School of Fundamental Science and Engineering

2025 full year
Seminar on Information Access C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Information Access B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Information Access A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Information Access D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Laboratory B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Laboratory A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 spring semester
Advanced Project Study (Autumn)

Graduate School of Fundamental Science and Engineering

2025 fall semester
Advanced Project Study (Spring)

Graduate School of Fundamental Science and Engineering

2025 spring semester
Foundations for Information Access Evaluation

Graduate School of Fundamental Science and Engineering

2025 spring semester
Research on Information Access

Graduate School of Fundamental Science and Engineering

2025 full year
Foundations for Information Access Evaluation

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Information Access D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Information Access C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Information Access B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Information Access A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Special Laboratory B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Laboratory A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 spring semester
Research on Information Access

Graduate School of Fundamental Science and Engineering

2025 full year
Special Seminar B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Seminar A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A (Intensive Course)

School of Fundamental Science and Engineering

2025 intensive course (spring and fall)
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Engineering Laboratory A (2)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Engineering Laboratory B [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Programming C

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Engineering Laboratory B

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Project Research A

School of Fundamental Science and Engineering

2025 spring semester
Foundations for Information Access Evaluation

School of Fundamental Science and Engineering

2025 spring semester
Informationalized Society [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Databases

School of Fundamental Science and Engineering

2025 fall semester
Statistical Analysis in Practice

School of Fundamental Science and Engineering

2025 fall quarter
Project Research B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Communications and Computer Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Communications and Computer Engineering Laboratory A

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Foundations for Information Access Evaluation

School of Fundamental Science and Engineering

2025 spring semester
Statistical Analysis in Practice

School of Fundamental Science and Engineering

2025 fall quarter
Project Research B

School of Fundamental Science and Engineering

2025 fall semester
Project Research A

School of Fundamental Science and Engineering

2025 spring semester
Databases

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A (Intensive Course)

School of Fundamental Science and Engineering

2025 intensive course (spring and fall)
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Communications and Computer Engineering Laboratory B

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Communications and Computer Engineering Laboratory B [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis A (Spring)

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis B (Spring) [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis A (Fall)

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis B (Fall)

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis B (Fall) [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis B (Spring)

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Communications Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Communications Engineering Laboratory A

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Communications Engineering Laboratory B

School of Fundamental Science and Engineering

2025 spring semester
Project Research Spring

School of Fundamental Science and Engineering

2025 spring semester
Databases

School of Fundamental Science and Engineering

2025 fall semester
Project Research Fall

School of Fundamental Science and Engineering

2025 fall semester
Introduction to Computers and Networks

School of Fundamental Science and Engineering

2025 spring semester
Foundations for Information Access Evaluation

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis A (Fall) [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis A (Spring) [S Grade]

School of Fundamental Science and Engineering

2025 spring semester

Sub-affiliation

Faculty of Science and Engineering, Graduate School of Fundamental Science and Engineering

Research Institute

2024-2026

Waseda Research Institute for Science and Engineering, Concurrent Researcher

Internal Special Research Projects

Building an Information Access Evaluation Framework Based on Bayesian Statistics

2017

I published the following full paper at SIGIR 2017, the top conference in information retrieval. Its abstract is as follows:

Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as “probability that the hypothesis is (in)correct” and “probability that the true parameter value drops within the interval is 95%,” we can easily obtain P(H|D) and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just “equality of means,” and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of the hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass’s delta.
Evaluation Across Information Retrieval Test Collections Using Statistical Methods, and Evaluation of Information Retrieval Papers

2016

I published five international conference papers (SIGIR, SIGIR, SIGIR (short), ICTIR, AIRS), two international workshop papers (EVIA, EVIA), and a workshop report (SIGIR Forum). Moreover, I gave a tutorial at an international conference (ICTIR) and a keynote at a Japanese symposium (IPSJ SIGNL) on this topic.
Research on Information Retrieval Techniques for "Taciturn Users"

2015

We published one international journal paper, one international conference paper, one evaluation conference overview (TREC), and two unrefereed domestic papers.
Systematization and Evaluation of Information Access Evaluation Infrastructure

2015

We published one book, one international journal paper, one international conference paper, one domestic IPSJ workshop paper and organised an international workshop.
Research on Sample Size Design for Test Collections

2014

We published three refereed papers (two for international conferences and one for a domestic conference) on how to determine the topic set size of a test collection.
Research on Information Access via Minimal Interaction

2014

Co-researchers: Koji Yatani, Makoto P. Kato, Takehiro Yamamoto, Virgil Pavlu, Javed Aslam, Fernando Diaz

We collaborated with various researchers from outside Waseda and published several papers related to information access via minimal interactions. We ran a task called MobileClick at NTCIR and a track called Temporal Summarization at TREC. It is worth noting that our MobileHCI paper (a collaboration with the University of Tokyo) received an Honourable Mention Award.
Systematization of Search Engine Evaluation Metrics and Demonstration of Their Effectiveness

2014

We published three refereed papers (two for international conferences and one for a domestic conference) on how to determine the topic set size of a test collection.
