Updated on 2024/12/21

SAKAI, Tetsuya
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor
Degree
Ph.D.
Profile
http://sakailab.com/tetsuya/

Research Areas

  • Human interface and interaction

Research Interests

  • information access, information retrieval, natural language processing

Awards

  • SIGIR Academy 2023

    2023.05   ACM SIGIR  

  • DEIM 2020 Excellent Paper Award (second author)

    2020  

  • FIT 2020 Excellent Paper Award (first author)

    2020  

  • CSS 2019 Best Paper Award (fifth author)

    2019  

  • ACM Distinguished Member

    2018  

  • WASEDA e-Teaching Award 2018

    2018  

  • ACM Recognition of Service Award (SIGIR'17 Co-chair)

    2017  

  • ACM Senior Member

    2016  

  • Waseda University Presidential Teaching Award (2016 Spring Semester)

    2016  

  • Waseda University Teaching Award (2014 Autumn Semester)

    2015  

  • CSS 2014 Student Paper Award (third author)

    2014  

  • MobileHCI 2014 Honorable Mention (second author)

    2014  

  • ACM SIGIR 2013 best paper shortlisted nominee (first author)

    2013  

  • AIRS 2012 Best Paper Award (first author)

    2012  

  • WebDB Forum 2010 Excellent Paper Award and NTT Resonant Award (second author)

    2010  

  • FIT 2008 Funai Best Paper Award (first author)

    2008  

  • IEICE ESS 2007 Merit Award

    2007  

  • IPSJ 2007 Best Paper Award (single author)

    2007  

  • IPSJ 2006 Yamashita SIG Research Award (single author)

    2006  

  • IPSJ 2006 Best Paper Award (single author)

    2006  

  • FIT 2005 Excellent Paper Award (single author)

    2005  

Papers

  • Click the search button and be happy: Evaluating direct and immediate information access

    Tetsuya Sakai, Makoto P. Kato, Young-In Song

    International Conference on Information and Knowledge Management, Proceedings     621 - 630  2011  [Refereed]

    We define Direct Information Access as a type of information access where there is no user operation such as clicking or scrolling between the user's click on the search button and the user's information acquisition; we define Immediate Information Access as a type of information access where the user can locate the relevant information within the system output very quickly. Hence, a Direct and Immediate Information Access (DIIA) system is expected to satisfy the user's information need very quickly with its very first response. We propose a nugget-based evaluation framework for DIIA, which takes nugget positions into account in order to evaluate the ability of a system to present important nuggets first and to minimise the amount of text the user has to read. To demonstrate the integrity, usefulness and limitations of our framework, we built a Japanese DIIA test collection with 60 queries and over 2,800 nuggets as well as an offset-based nugget match evaluation interface, and conducted experiments with manual and automatic runs. The results suggest our proposal is a useful complement to traditional ranked retrieval evaluation based on document relevance. © 2011 ACM.

    Citations (Scopus): 20

  • Understanding User Behavior and Measuring System Vulnerability

    Nuo Chen, Jiqun Liu, Hanpei Fang, Yuankai Luo, Tetsuya Sakai, Xiao-Ming Wu

    ACM TOIS    2025  [Refereed]

  • Evaluating System Responses Based On Overconfidence and Underconfidence

    Tetsuya Sakai

       2024  [Refereed]

    Authorship:Lead author

  • Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Large Language Models

    Yuxiang Zhang, Xin Fan, Junjie Wang, Chongxian Chen, Fan Mo, Tetsuya Sakai, Hayato Yamana

    ACM SIGIR-AP     226 - 235  2024  [Refereed]

  • AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment

    Nuo Chen, Jiqun Liu, Xiaoyu Dong, Qijiong Liu, Tetsuya Sakai, Xiao-Ming Wu

    ACM SIGIR-AP     56 - 63  2024  [Refereed]

  • Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval

    Kelong Mao, Chenlong Deng, Haonan Chen, Fengran Mo, Zheng Liu, Tetsuya Sakai, Zhicheng Dou

    EMNLP     1227 - 1240  2024  [Refereed]

  • A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

    Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen WAN, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

    EMNLP     11388 - 11422  2024  [Refereed]

  • Benchmarking Chinese Text-to-Table Performance in Large Language Models

    Haoxiang Shi, Jiaan Wang, Jiarong Xu, Cen Wang, Tetsuya Sakai

    arXiv:2405.12174    2024  [Refereed]

  • Solving Named Entity Recognition Problems via a Single-stream Reasoner

    Yuxiang Zhang, Junjie Wang, Xinyu Zhu, Tetsuya Sakai, Hayato Yamana

    ACM TOIS   42 ( 5 )  2024  [Refereed]

  • Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models

    Qijiong Liu, Nuo Chen, Tetsuya Sakai, Xiao-Ming Wu

    ACM WSDM     452 - 461  2024  [Refereed]

  • Enhancing Parameter Efficiency in Model Inference using an Ultralight Inter-Transformer Linear Structure

    Haoxiang Shi, Tetsuya Sakai

    IEEE Access   12   43734 - 43746  2024  [Refereed]

  • Modeling Multimodal Uncertainties via Probability Distribution Encoders included Vision-Language Models

    Junjie Wang, Yatai Ji, Yuxiang Zhang, Yanru Zhu, Tetsuya Sakai

    IEEE Access   12   420 - 434  2024  [Refereed]

  • Zero-Shot Learners for Natural Language Understanding via a Unified Multiple-Choice Perspective

    Junjie Wang, Ping Yang, Ruyi Gan, Yuxiang Zhang, Jiaxing Zhang, Tetsuya Sakai

    IEEE Access   11   142829 - 142845  2023  [Refereed]

  • Ethical Alignment Meets Conversational Information Retrieval

    Yiyao Yu, Junjie Wang, Yuxiang Zhang, Lin Zhang, Yujiu Yang, Tetsuya Sakai

    ACM SIGIR-AP     32 - 39  2023  [Refereed]

  • Deriving Nugget-level Scores from Turn-level Scores

    Rikiya Takehi, Akihisa Watanabe, Tetsuya Sakai

    ACM SIGIR-AP     40 - 45  2023  [Refereed]

  • Chuweb21D: A Deduped English Document Collection for Web Search Tasks

    Zhumin Chu, Tetsuya Sakai, Qingyao Ai, Yiqun Liu

    ACM SIGIR-AP     63 - 72  2023  [Refereed]

  • Fairness-based Evaluation of Conversational Search: A Pilot Study

    Tetsuya Sakai

    EVIA     5 - 13  2023  [Refereed]

  • Decoy Effect in Search Interaction: A Pilot Study

    Nuo Chen, Jiqun Liu, Tetsuya Sakai, Xiao-Ming Wu

    EVIA     14 - 16  2023  [Refereed]

  • On A Few Responsibilities of (IR) Researchers (Fairness, Awareness, and Sustainability), A keynote at ECIR 2023

    Tetsuya Sakai

    SIGIR Forum    2023  [Refereed]

  • Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

    Haoxiang Shi, Sumio Fujita, Tetsuya Sakai

    SIGIR Workshop on ReNeuIR    2023  [Refereed]

  • Self-supervised and Few-shot Contrastive Learning Frameworks for Text Clustering

    Haoxiang Shi, Tetsuya Sakai

    IEEE Access   11   84134 - 84143  2023  [Refereed]

  • On the Ordering of Pooled Web Pages, Gold Assessments, and Bronze Assessments

    Tetsuya Sakai, Sijie Tao, Nuo Chen, Yujing Li, Maria Maistro, Zhumin Chu, Nicola Ferro

    ACM TOIS    2023  [Refereed]

  • How Many Crowd Workers Do I Need? On Statistical Power When Crowdsourcing Relevance Judgments

    Kevin Roitero, David La Barbera, Michael Soprano, Gianluca Demartini, Stefano Mizzaro, Tetsuya Sakai

    ACM TOIS    2023  [Refereed]

  • A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

    Tetsuya Sakai, Jin Young Kim, Inho Kang

    ACM TOIS    2023  [Refereed]

  • Practice and Challenges in Building a Business-oriented Search Engine Quality Metric

    Nuo Chen, Donghyun Park, Hyungae Park, Kijun Choi, Tetsuya Sakai, Jinyoung Kim

    SIGIR 2023    2023  [Refereed]

  • MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

    Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

    CVPR 2023    2023  [Refereed]

  • A Reference-Dependent Model for Web Search Evaluation

    Nuo Chen, Jiqun Liu, Tetsuya Sakai

    The 2023 ACM Web Conference    2023  [Refereed]

  • Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    arXiv    2022  [Refereed]

  • Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

    Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai

    EMNLP 2022    2022  [Refereed]

  • Understanding the Behavior Transparency of Voice Assistant Applications Using the ChatterBox Framework

    Atsuko Natatsuka, Ryo Iijima, Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Tatsuya Mori

    Proceedings of RAID 2022    2022  [Refereed]

  • MAP: Modality-Agnostic Uncertainty-Aware Vision-Language Pre-training Model

    Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

    arXiv    2022  [Refereed]

  • Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks

    Tetsuya Sakai, Sijie Tao, Maria Maistro, Zhumin Chu, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

    arXiv    2022  [Refereed]

  • LayerConnect: Hypernetwork-Assisted Inter-Layer Connector to Enhance Parameter Efficiency

    Haoxiang Shi, Rongsheng Zhang, Jiaan Wang, Cen Wang, Guandan Chen, Yinhe Zheng, Tetsuya Sakai

    Proceedings of COLING 2022    2022  [Refereed]

  • Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

    Rei Shimizu, Sumio Fujita, Tetsuya Sakai

    Proceedings of ACM ICTIR 2022    2022  [Refereed]

  • Constructing Better Evaluation Metrics by Incorporating the Anchoring Effect into the User Model

    Nuo Chen, Fan Zhang, Tetsuya Sakai

    ACM SIGIR 2022    2022  [Refereed]

  • Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization

    Yuji Naraki, Tetsuya Sakai, Yoshihiko Hayashi

    LREC 2022    2022  [Refereed]

  • AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

    Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai

    CVPR 2022    2022  [Refereed]

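  • Selecting Evaluation Methods for Stance Detection Tasks (paper recommended by a technical committee)

    Yuki Amemiya, Tetsuya Sakai

    IEICE Transactions D (Japanese Edition), Special Section on Data Engineering and Information Management    2022  [Refereed]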

  • Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    ACM TOIS    2022  [Refereed]

  • MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

    Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai

    Findings of the Association for Computational Linguistics: EMNLP 2021    2021  [Refereed]

  • A Closer Look at Evaluation Measures for Ordinal Quantification

    Tetsuya Sakai

    Proceedings of the CIKM 2021 Workshops    2021  [Refereed]

  • Evaluating Relevance Judgments with Pairwise Discriminative Power

    Zhumin Chu, Jiaxin Mao, Fan Zhang, Yiqun Liu, Tetsuya Sakai, Min Zhang, Shaoping Ma

    Proceedings of ACM CIKM 2021    2021  [Refereed]

  • Incorporating Query Reformulating Behavior into Web Search Evaluation

    Jia Chen, Yiqun Liu, Jiaxin Mao, Fan Zhang, Tetsuya Sakai, Weizhi Ma, Min Zhang, Shaoping Ma

    Proceedings of ACM CIKM 2021    2021  [Refereed]

  • A Simple and Effective Usage of Self-supervised Contrastive Learning for Text Clustering

    Haoxiang Shi, Cen Wang, Tetsuya Sakai

    Proceedings of IEEE SMC 2021    2021

  • Evaluating Evaluation Measures for Ordinal Classification and Ordinal Quantification

    Tetsuya Sakai

    Proceedings of ACL-IJCNLP 2021    2021  [Refereed]

  • WWW3E8: 259,000 Relevance Labels for Studying the Effect of Document Presentation Order for Relevance Assessors

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    Proceedings of ACM SIGIR 2021    2021  [Refereed]

  • On the Two-Sample Randomisation Test for IR Evaluation

    Tetsuya Sakai

    Proceedings of ACM SIGIR 2021    2021  [Refereed]

  • Scalable Personalised Item Ranking through Parametric Density Estimation

    Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin’ichi Satoh

    Proceedings of ACM SIGIR 2021    2021  [Refereed]

  • Fast and Exact Randomisation Test for Comparing Two Systems with Paired Data

    Rikiya Suzuki, Tetsuya Sakai

    Proceedings of ACM ICTIR 2021    2021  [Refereed]

  • DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators’ Labels

    Zhaohao Zeng, Tetsuya Sakai

    arXiv    2021  [Refereed]

  • How Do Users Revise Zero-Hit Product Search Queries?

    Yuki Amemiya, Tomohiro Manabe, Sumio Fujita, Tetsuya Sakai

    Proceedings of ECIR 2021 Part II    2021  [Refereed]

  • On the Instability of Diminishing Return IR Measures

    Tetsuya Sakai

    Proceedings of ECIR 2021 Part I    2021  [Refereed]

  • RSL19BD at DBDC4: Ensemble of Decision Tree-Based and LSTM-Based Models

    Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

    Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems    2021  [Refereed]

  • Retrieval Evaluation Measures that Agree with Users’ SERP Preferences: Traditional, Preference-based, and Diversity Measures

    Tetsuya Sakai, Zhaohao Zeng

    ACM TOIS    2020  [Refereed]

  • A Siamese CNN Architecture for Learning Chinese Sentence Similarity

    Haoxiang Shi, Cen Wang, Tetsuya Sakai

    Proceedings of AACL-IJCNLP 2020 Student Research Workshop (SRW)    2020  [Refereed]

  • Automatic Evaluation of Iconic Image Retrieval based on Colour, Shape, and Texture

    Riku Togashi, Sumio Fujita, Tetsuya Sakai

    Proceedings of ACM ICMR 2020    2020  [Refereed]

  • SogouQ: The First Large-Scale Test Collection with Click Streams Used in a Shared-Task Evaluation

    Ruihua Song, Min Zhang, Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou

    Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact    2020  [Refereed]

  • Graded Relevance

    Tetsuya Sakai

    Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact    2020  [Refereed]

  • Visual Intents vs. Clicks, Likes, and Purchases in E-commerce

    Riku Togashi, Tetsuya Sakai

    Proceedings of ACM SIGIR 2020    2020  [Refereed]

  • Good Evaluation Measures based on Document Preferences

    Tetsuya Sakai, Zhaohao Zeng

    Proceedings of ACM SIGIR 2020    2020  [Refereed]

  • How to Measure the Reproducibility of System-oriented IR Experiments

    Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

    Proceedings of ACM SIGIR 2020    2020  [Refereed]

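  • Industry Recommendation from Job Application Forms ("Entry Sheets") Based on Document Classification Techniques

    三王慶太, Tetsuya Sakai

    Journal of the Database Society of Japan (Japanese Edition)    2020  [Refereed]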

  • Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations

    Tetsuya Sakai, Peng Xiao

    Proceedings of AIRS 2019    2020  [Refereed]

  • Generating Short Product Descriptors based on Very Little Training Data

    Peng Xiao, Joo-Young Lee, Sijie Tao, Young-Sook Hwang, Tetsuya Sakai

    Proceedings of AIRS 2019    2020  [Refereed]

  • Unsupervised Answer Retrieval with Data Fusion for Community Question Answering

    Sosuke Kato, Toru Shimizu, Sumio Fujita, Tetsuya Sakai

    Proceedings of AIRS 2019    2020  [Refereed]

  • Towards Automatic Evaluation of Reused Answers in Community Question Answering

    Hsin-Wen Liu, Sumio Fujita, Tetsuya Sakai

    Proceedings of AIRS 2019    2020  [Refereed]

  • Arc Loss: Softmax with Additive Angular Margin for Answer Retrieval

    Rikiya Suzuki, Sumio Fujita, and Tetsuya Sakai

    Proceedings of AIRS 2019    2020  [Refereed]

  • System Evaluation of Ternary Error-Correcting Output Codes for Multiclass Classification Problems*

    Shigeichi Hirasawa, Gendo Kumoi, Hideki Yagi, Manabu Kobayashi, Masayuki Goto, Tetsuya Sakai, Hiroshige Inazumi

    2019 IEEE International Conference on Systems, Man and Cybernetics (SMC)    2019.10  [Refereed]

  • Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

    Tetsuya Sakai

    Proceedings of ACM WSDM 2019    2019  [Refereed]

  • Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

    Zhaohao Zeng, Ruihua Song, Pingping Lin, and Tetsuya Sakai

    Proceedings of ACM WSDM 2019    2019  [Refereed]

  • A Comparative Study of Deep Learning Approaches for Extractive Query-Focused Multi-Document Summarization

    Yuliska and Tetsuya Sakai

    Proceedings of IEEE ICICT 2019    2019  [Refereed]

  • Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

    Tetsuya Sakai

    Proceedings of ECIR 2019 Part II (LNCS 11438)    2019  [Refereed]

  • CENTRE@CLEF 2019

    Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

    Proceedings of ECIR 2019 Part II (LNCS 11438)    2019  [Refereed]

  • Celebrating 20 Years of NTCIR: The Book

    Douglas W. Oard, Tetsuya Sakai, and Noriko Kando

    Proceedings of EVIA 2019    2019  [Refereed]

  • RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

    Chih-hao Wang, Sosuke Kato, and Tetsuya Sakai

    Proceedings of Chatbots and Conversational Agents and Dialogue Breakdown Detection Challenge (WOCHAT+DBDC), IWSDS 2019    2019  [Refereed]

  • Low-cost, Bottom-up Measures for Evaluating Search Result Diversification

    Zhicheng Dou, Xue Yang, Diya Li, Ji-Rong Wen, Tetsuya Sakai

    Information Retrieval Journal    2019  [Refereed]

  • Which Diversity Evaluation Measures Are “Good”?

    Tetsuya Sakai and Zhaohao Zeng

    Proceedings of ACM SIGIR 2019    2019  [Refereed]

  • The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

    Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

    Proceedings of ACM SIGIR 2019    2019  [Refereed]

  • Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

    Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

    Proceedings of OSIRRC 2019    2019  [Refereed]

  • BM25 Pseudo Relevance Feedback Using Anserini at Waseda University

    Zhaohao Zeng, Tetsuya Sakai

    Proceedings of OSIRRC 2019    2019  [Refereed]

  • Composing a Picture Book by Automatic Story Understanding and Visualization

    Xiaoyu Qi, Ruihua Song, Chunting Wang, Jin Zhou, and Tetsuya Sakai

    Proceedings of the Second Storytelling Workshop (StoryNLP @ ACL2019)    2019  [Refereed]

  • CENTRE@CLEF2019: Overview of the Replicability and Reproducibility Tasks

    Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

    CLEF 2019 Working Notes    2019  [Refereed]

  • CENTRE@CLEF2019: Sequel in the Systematic Reproducibility Realm

    Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

    Proceedings of CLEF 2019 (LNCS 11696)    2019  [Refereed]

  • Generalising Kendall’s Tau for Noisy and Incomplete Preference Judgements

    Riku Togashi and Tetsuya Sakai

    Proceedings of ACM ICTIR 2019    2019  [Refereed]

  • Evaluating Image-Inspired Poetry Generation

    Chao-Chung Wu, Ruihua Song, Tetsuya Sakai, Wen-Feng Cheng, Xing Xie, and Shou-De Lin

    Proceedings of NLPCC 2019    2019  [Refereed]

  • How to Run an Evaluation Task: with a Primary Focus on Ad Hoc Information Retrieval

    Tetsuya Sakai

    Information Retrieval Evaluation in a Changing World : Lessons Learned from 20 Years of CLEF    2019  [Refereed]

  • A Large-Scale Survey of Voice Assistant Apps

    Atsuko Natatsuka, Ryo Iijima, Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Tatsuya Mori

    Computer Security Symposium (CSS)    2019  [Refereed]

  • Voice Input Interface Failures and Frustration: Developer and User Perspectives

    Shiyoh Goetsu and Tetsuya Sakai

    ACM UIST 2019 Adjunct    2019  [Refereed]

  • A First Look at the Privacy Risks of Voice Assistant Apps

    Atsuko Natatsuka, Mitsuaki Akiyama, Ryo Iijima, Tetsuya Sakai, Takuya Watanabe, and Tatsuya Mori

    ACM CCS 2019 Posters & Demos    2019  [Refereed]

  • Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

    Zhaohao Zeng, Ruihua Song, Pingping Lin, and Tetsuya Sakai

    Journal of Information Processing    2019  [Refereed]

  • Search Result Diversity Evaluation Based on Intent Hierarchies

    Xiaojie Wang, Ji-Rong Wen, Zhicheng Dou, Tetsuya Sakai, Rui Zhang

    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING   30 ( 1 ) 156 - 169  2018.01  [Refereed]

    Search result diversification aims at returning diversified document lists to cover different user intents of a query. Existing diversity measures assume that the intents of a query are disjoint, and do not consider their relationships. In this paper, we introduce intent hierarchies to model the relationships between intents, and present four weighting schemes. Based on intent hierarchies, we propose several hierarchical measures that take into account the relationships between intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on the TREC Web Track 2009-2013 diversity test collections and by using the NTCIR-11 IMine test collection. Our main experimental findings are: (1) Hierarchical measures are more discriminative and intuitive than existing measures. In terms of intuitiveness, it is preferable for hierarchical measures to use the whole intent hierarchies rather than only the leaf nodes. (2) The types of intent hierarchies used affect the discriminative power and intuitiveness of hierarchical measures. We suggest the best type of intent hierarchies to be used according to whether nonuniform weights are available. (3) To measure the benefits of diversification algorithms that use automatically mined hierarchical intents, it is important to use hierarchical measures instead of existing measures.

    Citations (Scopus): 17

  • Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

    Tetsuya Sakai

    Proceedings of ACM SIGIR 2018    2018  [Refereed]

  • Comparing Two Binned Probability Distributions for Information Access Evaluation

    Tetsuya Sakai

    Proceedings of ACM SIGIR 2018    2018  [Refereed]

  • CENTRE@CLEF2018: Overview of the Replicability Task

    Nicola Ferro, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

    CLEF 2018 Working Notes (invited paper)    2018  [Refereed]

  • Topic Set Size Design for Paired and Unpaired Data

    Tetsuya Sakai

    Proceedings of ACM ICTIR 2018    2018  [Refereed]

  • Classifying Community QA Questions That Contain an Image

    Kenta Tamaki, Riku Togashi, Sosuke Kato, Sumio Fujita, Hideyuki Maeda, and Tetsuya Sakai

    Proceedings of ACM ICTIR 2018    2018  [Refereed]

  • Ranking Mobile Search Cards by Focusing on User Operations in Abandoned Sessions

    Mami Kawasaki, Inho Kang, Tetsuya Sakai

    IPSJ TOD 11(3)    2018  [Refereed]

  • Towards Automatic Evaluation of Customer-Helpdesk Dialogues

    Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, and Tetsuya Sakai

    Journal of Information Processing    2018  [Refereed]

  • Overview of CENTRE@CLEF 2018: a First Tale in the Systematic Reproducibility Realm

    Nicola Ferro, Maria Maistro, Tetsuya Sakai, and Ian Soboroff

    Proceedings of CLEF 2018 (LNCS 11018)    2018  [Refereed]

  • Why You Should Listen to This Song: Reason Generation for Explainable Recommendation

    Guoshuai Zhao, Hao Fu, Ruihua Song, Tetsuya Sakai, Xing Xie, and Xueming Qian

    1st Workshop on Scalable and Applicable Recommendation Systems (SAREC 2018)    2018  [Refereed]

  • Understanding the Inconsistency between Behaviors and Descriptions of Mobile Apps

    Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, and Tatsuya Mori

    IEICE Transactions    2018  [Refereed]

  • Proceedings of AIRS 2018 (LNCS 11292)

    Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

    Editor    2018  [Refereed]

  • The probability that your hypothesis is correct, credible intervals, and effect sizes for IR evaluation

    Tetsuya Sakai

    SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval     25 - 34  2017.08  [Refereed]

    Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as "probability that the hypothesis is (in)correct" and "probability that the true parameter value drops within the interval is 95%," we can easily obtain P(H|D) and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just "equality of means," and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass's Δ.

    Citations (Scopus): 10

  • Evaluating mobile search with height-biased gain

    Cheng Luo, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

    SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval     435 - 444  2017.08  [Refereed]

    Mobile search engine result pages (SERPs) are becoming highly visual and heterogeneous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights with each other. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the results cannot be obtained if users only read part of their contents. The benefit brought by the result is affected by the user's reading behavior, and the internal gain distribution (over the height) should be modeled to get a more accurate estimation. To tackle these problems, we conduct a lab-based user study to construct a suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of the user's browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain (HBG), which is calculated by summing up the product of gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.

    Citations (Scopus): 22

  • LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency

    Sosuke Kato, Riku Togashi, Hideyuki Maeda, Sumio Fujita, Tetsuya Sakai

    SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval     1309 - 1312  2017.08  [Refereed]

    Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have enabled researchers to construct robust open-domain question answering (QA) systems. It is often claimed that such state-of-the-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given massive training data plus a separate corpus of Q&A pairs as the target knowledge source, how well would such a system really perform? How fast would it respond? In this demonstration, we provide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-term memory (LSTM) architecture with over 11 million Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&A knowledge source for answer retrieval. Our core demonstration system is a pair of Japanese monolingual QA systems, but we leverage machine translation for letting the SIGIR attendees enter English questions and compare the Japanese responses from the two systems after translating them into English.

    Citations (Scopus): 5

  • Does document relevance affect the searcher's perception of time?

    Cheng Luo, Yiqun Liu, Tetsuya Sakai, Ke Zhou, Fan Zhang, Xue Li, Shaoping Ma

    WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining     141 - 150  2017.02  [Refereed]

    Time plays an essential role in multiple areas of Information Retrieval (IR) studies such as search evaluation, user behavior analysis, temporal search result ranking and query understanding. In search evaluation studies in particular, time is usually adopted as a measure to quantify users' efforts in search processes. Psychological studies have reported that the time perception of human beings can be affected by many stimuli, such as attention and motivation, which are closely related to many cognitive factors in search. Considering the fact that users' search experiences are affected by their subjective feelings of time, rather than the objective time measured by timing devices, it is necessary to look into the different factors that have impacts on search users' perception of time. In this work, we make a first step towards revealing the time perception mechanism of search users with the following contributions: (1) We establish an experimental research framework to measure the subjective perception of time while reading documents in a search scenario, which originates from but is also different from traditional time perception measurements in psychological studies. (2) With the framework, we show that while users are reading result documents, document relevance has a small yet visible effect on search users' perception of time. By further examining the impact of other factors, we demonstrate that the effect on relevant documents can also be influenced by individuals and tasks. (3) We conduct a preliminary experiment in which the difference between perceived time and dwell time is taken into consideration in a search evaluation task. We found that the revised framework achieved a better correlation with users' satisfaction feedback. This work may help us better understand the time perception mechanism of search users and provide insights into how to better incorporate the time factor in search evaluation studies.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • Investigating Users' Time Perception during Web Search

    Cheng Luo, Xue Li, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, and Shaoping Ma

    Proceedings of CHIIR 2017    2017  [Refereed]

  • Overview of Special Issue

    Donna Harman and Diane Kelly (eds.), James Allan, Nicholas J. Belkin, Paul Bennett, Jamie Callan, Charles Clarke, Fernando Diaz, Susan Dumais, Nicola Ferro, Donna Harman, Djoerd Hiemstra, Ian Ruthven, Tetsuya Sakai, Mark D. Smucker, Justin Zobel

    SIGIR Forum, 51(2)    2017  [Refereed]

  • Mobile Vertical Ranking based on Preference Graphs

    Yuta Kadotami, Yasuaki Yoshida, Sumio Fujita, and Tetsuya Sakai

    ACM ICTIR 2017    2017  [Refereed]

  • Ranking Rich Mobile Verticals based on Clicks and Abandonment

    Mami Kawasaki, Inho Kang, and Tetsuya Sakai

    Proceedings of ACM CIKM 2017    2017  [Refereed]

  • Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

    Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, and Tetsuya Sakai

    Proceedings of EVIA 2017    2017  [Refereed]

  • Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths

    Tetsuya Sakai

    Proceedings of EVIA 2017    2017  [Refereed]

  • Towards Automatic Evaluation of Multi-Turn Dialogues: A Task Design that Leverages Inherently Subjective Annotations

    Tetsuya Sakai

    Proceedings of EVIA 2017    2017  [Refereed]

  • The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and Students

    Tetsuya Sakai

    Proceedings of EVIA 2017    2017  [Refereed]

  • Unanimity-Aware Gain for Highly Subjective Assessments

    Tetsuya Sakai

    Proceedings of EVIA 2017    2017  [Refereed]

  • RSL17BD at DBDC3: Computing Utterance Similarities based on Term Frequency and Word Embedding Vectors

    Sosuke Kato and Tetsuya Sakai

    Proceedings of DSTC6    2017  [Refereed]

  • Simple and effective approach to score standardisation

    Tetsuya Sakai

    ICTIR 2016 - Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval     95 - 104  2016.09  [Refereed]

     View Summary

    Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0,1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • Evaluating search result diversity using intent hierarchies

    Xiaojie Wang, Zhicheng Dou, Tetsuya Sakai, Ji-Rong Wen

    SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval     415 - 424  2016.07  [Refereed]

     View Summary

    Search result diversification aims at returning diversified document lists to cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive, and do not consider the relationships among the intents. In this paper, we introduce intent hierarchies to model the relationships among intents. Based on intent hierarchies, we propose several hierarchical measures that can consider the relationships among intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections. Our main experimental findings are: (1) Hierarchical measures are generally more discriminative and intuitive than existing measures using flat lists of intents; (2) When the queries have multilayer intent hierarchies, hierarchical measures are less correlated to existing measures, but can get more improvement in discriminative power; (3) Hierarchical measures are more intuitive in terms of diversity or relevance. The hierarchical measures using the whole intent hierarchies are more intuitive than only using the leaf nodes in terms of diversity and relevance.

    DOI

    Scopus

    23
    Citation
    (Scopus)
  • Topic set size design

    Tetsuya Sakai

    INFORMATION RETRIEVAL JOURNAL   19 ( 3 ) 256 - 283  2016.06  [Refereed]

     View Summary

    Traditional pooling-based information retrieval (IR) test collections typically have n = 50-100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n. The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata's three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.

    DOI

    Scopus

    32
    Citation
    (Scopus)
  • On Estimating Variances for Topic Set Size Design

    Tetsuya Sakai, Lifeng Shang

    EVIA 2016    2016  [Refereed]

  • Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect User Preferences?

    Makoto P. Kato, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Hajime Morita

    EVIA 2016    2016  [Refereed]

  • Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS, 2006-2015

    Tetsuya Sakai

    ACM SIGIR 2016    2016  [Refereed]

  • Two Sample T-tests for IR Evaluation: Student or Welch?

    Tetsuya Sakai

    ACM SIGIR 2016    2016  [Refereed]

  • Report on the First International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2015)

    Laure Soulier, Lynda Tamine, Tetsuya Sakai, Leif Azzopardi, and Jeremy Pickens

    ACM ICTIR 2016    2016  [Refereed]

  • Topic Set Size Design and Power Analysis in Practice (Tutorial Abstract)

    Tetsuya Sakai

    ACM ICTIR 2016    2016  [Refereed]

  • The Effect of Score Standardisation on Topic Set Size Design

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2016   9994   16 - 28  2016  [Refereed]

     View Summary

    Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently-proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic set size design, std-AB suppresses score variances and thereby enables test collection builders to consider realistic choices of topic set sizes, and to handle unnormalised measures in the same way as normalised measures. In addition, even discrete measures that clearly violate normality assumptions look more continuous after applying std-AB, which may make them more suitable for statistically motivated topic set size design. Our experiments cover a variety of tasks and evaluation measures from NTCIR-12.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Search result diversification based on hierarchical intents

    Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen

    International Conference on Information and Knowledge Management, Proceedings   19-23-   63 - 72  2015.10  [Refereed]

     View Summary

    A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information needs as possible. Most existing intent-aware search result diversification algorithms formulate user intents for a query as a flat list of subtopics. In this paper, we introduce a new hierarchical structure to represent user intents and propose two general hierarchical diversification models to leverage hierarchical intents. Experimental results show that our hierarchical diversification models outperform state-of-the-art diversification methods that use traditional flat subtopics.

    DOI

    Scopus

    66
    Citation
    (Scopus)
  • Dynamic author name disambiguation for growing digital libraries

    Yanan Qian, Qinghua Zheng, Tetsuya Sakai, Junting Ye, Jun Liu

    INFORMATION RETRIEVAL   18 ( 5 ) 379 - 412  2015.10  [Refereed]

     View Summary

    When a digital library user searches for publications by an author name, she often sees a mixture of publications by different authors who have the same name. With the growth of digital libraries and involvement of more authors, this author ambiguity problem is becoming critical. Author disambiguation (AD) often tries to solve this problem by leveraging metadata such as coauthors, research topics, publication venues and citation information, since more personal information such as the contact details is often restricted or missing. In this paper, we study the problem of how to efficiently disambiguate author names given an incessant stream of published papers. To this end, we propose a "BatchAD+IncAD" framework for dynamic author disambiguation. First, we perform batch author disambiguation (BatchAD) to disambiguate all author names at a given time by grouping all records (each record refers to a paper with one of its author names) into disjoint clusters. This establishes a one-to-one mapping between the clusters and real-world authors. Then, for newly added papers, we periodically perform incremental author disambiguation (IncAD), which determines whether each new record can be assigned to an existing cluster, or to a new cluster not yet included in the previous data. Based on the new data, IncAD also tries to correct previous AD results. Our main contributions are: (1) We demonstrate with real data that a small number of new papers often have overlapping author names with a large portion of existing papers, so it is challenging for IncAD to effectively leverage previous AD results. (2) We propose a novel IncAD model which aggregates metadata from a cluster of records to estimate the author's profile such as her coauthor distributions and keyword distributions, in order to predict how likely it is that a new record is "produced" by the author. (3) Using two labeled datasets and one large-scale raw dataset, we show that the proposed method is much more efficient than state-of-the-art methods while ensuring high accuracy.

    DOI

    Scopus

    41
    Citation
    (Scopus)
  • Understanding the Inconsistencies between Text Descriptions and the Use of Privacy-sensitive Resources of Mobile Apps

    Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

    SOUPS 2015    2015  [Refereed]

  • Topic Set Size Design with the Evaluation Measures for Short Text Conversation

    Tetsuya Sakai, Lifeng Shang, Zhengdong Lu, Hang Li

    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2015   9460   319 - 331  2015  [Refereed]

     View Summary

    Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Designing test collections for comparing many systems

    Tetsuya Sakai

    CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management     61 - 70  2014.11  [Refereed]

     View Summary

    A researcher decides to build a test collection for comparing her new information retrieval (IR) systems with several state-of-the-art baselines. She wants to know the number of topics (n) she needs to create in advance, so that she can start looking for (say) a query log large enough for sampling n good topics, and estimating the relevance assessment cost. We provide practical solutions to researchers like her using power analysis and sample size design techniques, and demonstrate their usefulness for several IR tasks and evaluation measures. We consider not only the paired t-test but also one-way analysis of variance (ANOVA) for significance testing to accommodate comparison of m (≥ 2) systems under a given set of statistical requirements (α: the Type I error rate, β: the Type II error rate, and minD: the minimum detectable difference between the best and the worst systems). Using our simple Excel tools and some pooled variance estimates from past data, researchers can build statistically well-designed test collections. We demonstrate that, as different evaluation measures have different variances across topics, they inevitably require different topic set sizes. This suggests that the evaluation measures should be chosen at the test collection design phase. Moreover, through a pool depth reduction experiment with past data, we show how the relevance assessment cost can be reduced dramatically while freezing the set of statistical requirements. Based on the cost analysis and the available budget, researchers can determine the right balance between n and the pool depth pd. Our techniques and tools are applicable to test collections for non-IR tasks as well.

    DOI

    Scopus

    13
    Citation
    (Scopus)
  • Metrics, Statistics, Tests (invited paper)

    Tetsuya Sakai

    PROMISE Winter School 2013: Bridging between Information Retrieval and Databases (LNCS 8174)    2014  [Refereed]

  • Statistical Reform in Information Retrieval?

    Tetsuya Sakai

    SIGIR Forum    2014  [Refereed]

  • Designing Test Collections That Provide Tight Confidence Intervals

    Tetsuya Sakai

    Forum on Information Technology 2014   13 ( 2 ) 15 - 18  2014  [Refereed]

    CiNii

  • ReviewCollage: A Mobile Interface for Direct Comparison Using Online Reviews

    Haojian Jin, Tetsuya Sakai, Koji Yatani

    PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION WITH MOBILE DEVICES AND SERVICES (MOBILEHCI'14)     349 - 358  2014  [Refereed]

     View Summary

    Review comments posted on online websites can help the user decide on a product to purchase or a place to visit. They can also be useful to closely compare a couple of candidate entities. However, the user may have to read different webpages back and forth for comparison, and this is not desirable particularly when she is using a mobile device. We present ReviewCollage, a mobile interface that aggregates information about two reviewed entities in a one-page view. ReviewCollage uses attribute-value pairs, known to be effective for review text summarization, and highlights the similarities and differences between the entities. Our user study confirms that ReviewCollage can help the user compare two entities and make a decision within a couple of minutes, at least as quickly as existing summarization interfaces. It also reveals that ReviewCollage could be most useful when two entities are very similar.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Topic Set Size Design with Variance Estimates from Two-Way ANOVA

    Tetsuya Sakai

    EVIA 2014    2014  [Refereed]

  • When do people use query suggestion? A query suggestion log analysis

    Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

    INFORMATION RETRIEVAL   16 ( 6 ) 725 - 746  2013.12  [Refereed]

     View Summary

    Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher's current state.

    DOI

    Scopus

    33
    Citation
    (Scopus)
  • Introduction to the special issue on search intents and diversification

    Tetsuya Sakai, Noriko Kando, Craig Macdonald, Ian Soboroff

    INFORMATION RETRIEVAL   16 ( 4 ) 427 - 428  2013.08  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Diversified search evaluation: lessons from the NTCIR-9 INTENT task

    Tetsuya Sakai, Ruihua Song

    INFORMATION RETRIEVAL   16 ( 4 ) 504 - 529  2013.08  [Refereed]

     View Summary

    The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The evaluation framework used at NTCIR provides more "intuitive" and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for D#-nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.

    DOI

    Scopus

    15
    Citation
    (Scopus)
  • Web Search Evaluation with Informational and Navigational Intents

    Tetsuya Sakai

    Journal of Information Processing    2013  [Refereed]

  • The Unreusability of Diversified Test Collections

    Tetsuya Sakai

    EVIA 2013    2013  [Refereed]

  • Summaries, Ranked Retrieval and Sessions: A Unified Framework for Information Access Evaluation

    Tetsuya Sakai and Zhicheng Dou

    ACM SIGIR 2013    2013  [Refereed]

  • Exploring semi-automatic nugget extraction for Japanese one click access evaluation

    Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     749 - 752  2013  [Refereed]

     View Summary

    Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually-extracted and semi-automatically-extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that the manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text. Copyright © 2013 ACM.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Report from the NTCIR-10 1CLICK-2 Japanese subtask: Baselines, upperbounds and evaluation robustness

    Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     753 - 756  2013  [Refereed]

     View Summary

    The One Click Access Task (1CLICK) of NTCIR requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context. Systems are evaluated based on information units (or iUnits), and are required to present important pieces of information first and to minimise the amount of text the user has to read. Using the official Japanese results of the second round of the 1CLICK task from NTCIR-10, we discuss our task setting and evaluation framework. Our analyses show that: (1) Simple baseline methods that leverage search engine snippets or Wikipedia are effective for "lookup" type queries but not necessarily for other query types; (2) There is still a substantial gap between manual and automatic runs; and (3) Our evaluation metrics are relatively robust to the incompleteness of iUnits. Copyright © 2013 ACM.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Summary of the NTCIR-10 INTENT-2 Task: Subtopic mining and search result diversification

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Makoto P. Kato

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     761 - 764  2013  [Refereed]

     View Summary

    The NTCIR INTENT task comprises two subtasks: Subtopic Mining, where systems are required to return a ranked list of subtopic strings for each given query; and Document Ranking, where systems are required to return a diversified web search result for each given query. This paper summarises the novel features of the Second INTENT task at NTCIR-10 and its main findings, and poses some questions for future diversified search evaluation. Copyright © 2013 ACM.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • Time-aware structured query suggestion

    Taiki Miyanishi, Tetsuya Sakai

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     809 - 812  2013  [Refereed]

     View Summary

    Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at different time periods in the past, existing query suggestion methods neither utilize nor present such information. In this study, we propose Time-aware Structured Query Suggestion (TaSQS) which clusters query suggestions along a timeline so that the user can narrow down his search from a temporal point of view. Moreover, when a suggested query is clicked, TaSQS presents web pages from query-URL bipartite graphs after ranking them according to the click counts within a particular time period. Our experiments using data from a commercial search engine log show that the time-aware clustering and the time-aware document ranking features of TaSQS are both effective. Copyright © 2013 ACM.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • The Impact of Intent Selection on Diversified Search Evaluation

    Tetsuya Sakai, Zhicheng Dou, Charles L.A. Clarke

    ACM SIGIR 2013    2013  [Refereed]

  • Evaluating Heterogeneous Information Access (Position paper)

    Ke Zhou, Tetsuya Sakai, Mounia Lalmas, Zhicheng Dou, and Joemon M. Jose

    Workshop on Modeling User Behavior for Information Access Evaluation    2013  [Refereed]

  • Mining Search Intents from Text Fragments

    Qinglei Wang, Yanan Qian, Ruihua Song, Zhicheng Dou, Fan Zhang, Tetsuya Sakai, and Qinghua Zheng

    Information Retrieval    2013  [Refereed]

  • On the reliability and intuitiveness of aggregated search metrics

    Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins, Joemon M. Jose

    International Conference on Information and Knowledge Management, Proceedings     689 - 698  2013  [Refereed]

     View Summary

    Aggregating search results from a variety of diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the various key aspects to be measured (i.e. various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new light on the further development and application of AS metrics. Copyright 2013 ACM.

    DOI

    Scopus

    15
    Citation
    (Scopus)
  • Dynamic query intent mining from a search log stream

    Yanan Qian, Tetsuya Sakai, Junting Ye, Qinghua Zheng, Cong Li

    International Conference on Information and Knowledge Management, Proceedings     1205 - 1208  2013  [Refereed]

     View Summary

    It has long been recognized that search queries are often broad and ambiguous. Even when submitting the same query, different users may have different search intents. Moreover, the intents are dynamically evolving. Some intents are constantly popular with users, while others are more bursty. We propose a method for mining dynamic query intents from search query logs. By regarding the query logs as a data stream, we identify constant intents while quickly capturing new bursty intents. To evaluate the accuracy and efficiency of our method, we conducted experiments using 50 topics from the NTCIR INTENT-9 data and five additional popular topics, all supplemented with six-month query logs from a commercial search engine. Our results show that our method can accurately capture new intents with short response time. Copyright 2013 ACM.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • How intuitive are diversified search metrics? Concordance test results for the diversity U-measures

    Tetsuya Sakai

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   13 - 24  2013  [Refereed]

     View Summary

    Most of the existing Information Retrieval (IR) metrics discount the value of each retrieved relevant document based on its rank. This statement also applies to the evaluation of diversified search: the widely-used diversity metrics, namely, α-nDCG, Intent-Aware Expected Reciprocal Rank (ERR-IA) and D#-nDCG, are all rank-based. These evaluation metrics regard the system output as a list of document IDs, and ignore all other features such as snippets and document full texts of various lengths. In contrast, the U-measure framework of Sakai and Dou uses the amount of text read by the user as the foundation for discounting the value of relevant information, and can take into account the user's snippet reading and full text reading behaviours. The present study compares the diversity versions of U-measure (D-U and U-IA) with the state-of-the-art diversity metrics using the concordance test: given a pair of ranked lists, we quantify the ability of each metric to favour the more diversified and more relevant list. Our results show that while D#-nDCG is the overall winner in terms of simultaneous concordance with diversity and relevance, D-U and U-IA statistically significantly outperform other state-of-the-art metrics. Moreover, in terms of concordance with relevance alone, D-U and U-IA significantly outperform all rank-based diversity metrics. Thus, D-U and U-IA are not only more realistic but also more relevance-oriented than other diversity metrics. © 2013 Springer-Verlag.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • User-aware advertisability

    Hai-Tao Yu, Tetsuya Sakai

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   452 - 463  2013  [Refereed]

    In sponsored search, many studies focus on finding the most relevant advertisements (ads) and their optimal ranking for a submitted query. Determining whether it is suitable to show ads has received less attention. In this paper, we introduce the concept of user-aware advertisability, which refers to the probability of ad-click on sponsored ads when a specific user submits a query. When computing the advertisability for a given query-user pair, we first classify the clicked web pages based on a pre-defined category hierarchy and use the aggregated topical categories of clicked web pages to represent user preference. Taking user preference into account, we then compute the ad-click probability for this query-user pair. Compared with existing methods, the experimental results show that user preference is of great value for generating user-specific advertisability. In particular, our approach that computes advertisability per query-user pair outperforms the two state-of-the-art methods that compute advertisability per query in terms of a variant of the normalized Discounted Cumulative Gain metric. © 2013 Springer-Verlag.

    DOI

    Scopus

  • Estimating intent types for search result diversification

    Kosetsu Tsukuda, Tetsuya Sakai, Zhicheng Dou, Katsumi Tanaka

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   25 - 37  2013  [Refereed]

    Given an ambiguous or underspecified query, search result diversification aims at accommodating different user intents within a single Search Engine Result Page (SERP). While automatic identification of different intents for a given query is a crucial step for result diversification, also important is the estimation of intent types (informational vs. navigational). If it is possible to distinguish between informational and navigational intents, search engines can aim to return one best URL for each navigational intent, while allocating more space to the informational intents within the SERP. In light of the observations, we propose a new framework for search result diversification that is intent importance-aware and type-aware. Our experiments using the NTCIR-9 INTENT Japanese Subtopic Mining and Document Ranking test collections show that: (a) our intent type estimation method for Japanese achieves 64.4% accuracy; and (b) our proposed diversification method achieves 0.6373 in D#-nDCG and 0.5898 in DIN#-nDCG over 56 topics, which are statistically significant gains over the top performers of the NTCIR-9 INTENT Japanese Document Ranking runs. Moreover, our relevance oriented model significantly outperforms our diversity oriented model and the original model by Dou et al. © 2013 Springer-Verlag.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • On labelling intent types for evaluating search result diversification

    Tetsuya Sakai, Young-In Song

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   38 - 49  2013  [Refereed]

    Search result diversification is important for accommodating different user needs by means of covering popular and diverse query intents within a single result page. To evaluate diversity, we believe that it is important to consider the distinction between informational and navigational intents, as users would not want redundant information especially for navigational intents. In this study, we conduct intent type-sensitive diversity evaluation based on both top-down labelling, which labels each intent as either navigational or informational a priori, and bottom-up labelling, which labels each intent based on whether a "navigational relevant" document has actually been identified in the document collection. Our results suggest that reliable type-sensitive diversity evaluation can be conducted using the top-down approach with a clear intent labelling guideline, while ensuring that the desired URLs for navigational intents make their way into relevance assessments. © 2013 Springer-Verlag.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Evaluation with Informational and Navigational Intents

    Tetsuya Sakai

    WWW 2012    2012  [Refereed]

  • Structured query suggestion for specialization and parallel movement: Effect on search behaviors

    Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

    WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web     389 - 398  2012  [Refereed]

    Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from "nikon" to "nikon camera") and parallel movement (e.g. from "nikon camera" to "canon camera"). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a label for each category. Moreover, SParQS presents some new entities as alternatives to the original query (e.g. "canon" in response to the query "nikon"), together with their query suggestions classified in the same way as the original query's suggestions. We conducted a task-based user study to compare SParQS with a traditional "flat list" query suggestion interface. Our results show that the SParQS interface enables subjects to search more successfully than the flat list case, even though query suggestions presented were exactly the same in the two interfaces. In addition, the subjects found the query suggestions more helpful when they were presented in the SParQS interface rather than in a flat list.

    DOI

    Scopus

    29
    Citation
    (Scopus)
  • Query snowball: A co-occurrence-based approach to multi-document summarization for question answering

    Hajime Morita, Tetsuya Sakai, Manabu Okumura

    IPSJ Online Transactions   5 ( 2012 ) 124 - 129  2012  [Refereed]

    We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

    DOI

    Scopus

  • AspecTiles: Tile-based visualization of diversified web search results

    Mayu Iwata, Tetsuya Sakai, Takehiro Yamamoto, Yu Chen, Yi Liu, Ji-Rong Wen, Shojiro Nishio

    SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval     85 - 94  2012  [Refereed]

    A diversified search result for an underspecified query generally contains web pages in which there are answers that are relevant to different aspects of the query. In order to help the user locate such relevant answers, we propose a simple extension to the standard Search Engine Result Page (SERP) interface, called AspecTiles. In addition to presenting a ranked list of URLs with their titles and snippets, AspecTiles visualizes the relevance degree of a document to each aspect by means of colored squares ("tiles"). To compare AspecTiles with the standard SERP interface in terms of usefulness, we conducted a user study involving 30 search tasks designed based on the TREC web diversity task topics as well as 32 participants. Our results show that AspecTiles has some advantages in terms of search performance, user behavior, and user satisfaction. First, AspecTiles enables the user to gather relevant information significantly more efficiently than the standard SERP interface for tasks where the user considers several different aspects of the query to be important at the same time (multi-aspect tasks). Second, AspecTiles affects the user's information seeking behavior: with this interface, we observed significantly fewer query reformulations, shorter queries and deeper examinations of ranked lists in multi-aspect tasks. Third, participants of our user study found AspecTiles significantly more useful for finding relevant information and easy to use than the standard SERP interface. These results suggest that simple interfaces like AspecTiles can enhance the search performance and search experience of the user when their queries are underspecified. © 2012 ACM.

    DOI

    Scopus

    12
    Citation
    (Scopus)
  • Towards Zero-Click Mobile IR Evaluation: Knowing What and Knowing When

    Tetsuya Sakai

    ACM SIGIR 2012    2012  [Refereed]

  • New Assessment Criteria for Query Suggestion

    Zhongrui Ma, Yu Chen, Ruihua Song, Tetsuya Sakai, Jiaheng Lu, and Ji-Rong Wen

    ACM SIGIR 2012    2012  [Refereed]

  • The wisdom of advertisers: Mining subgoals via query clustering

    Takehiro Yamamoto, Tetsuya Sakai, Mayu Iwata, Chen Yu, Ji-Rong Wen, Katsumi Tanaka

    ACM International Conference Proceeding Series     505 - 514  2012  [Refereed]

    This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall. © 2012 ACM.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • The reusability of a diversified search test collection

    Tetsuya Sakai, Zhicheng Dou, Ruihua Song, Noriko Kando

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7675   26 - 38  2012  [Refereed]

    Traditional ad hoc IR test collections were built using a relatively large pool depth (e.g. 100), and are usually assumed to be reusable. Moreover, when they are reused to compare a new system with another or with systems that contributed to the pools ("contributors"), an even larger measurement depth (e.g. 1,000) is often used for computing evaluation metrics. In contrast, the web diversity test collections that have been created in the past few years at TREC and NTCIR use a much smaller pool depth (e.g. 20). The measurement depth is also small (e.g. 10-30), as search result diversification is primarily intended for the first result page. In this study, we examine the reusability of a typical web diversity test collection, namely, one from the NTCIR-9 INTENT-1 Chinese Document Ranking task, which used a pool depth of 20 and official measurement depths of 10, 20 and 30. First, we conducted additional relevance assessments to expand the official INTENT-1 collection to achieve a pool depth of 40. Using the expanded relevance assessments, we show that run rankings at the measurement depth of 30 are too unreliable, given that the pool depth is 20. Second, we conduct a leave-one-out experiment for every participating team of the INTENT-1 Chinese task, to examine how (un)fairly new runs are evaluated with the INTENT-1 collection. We show that, for the purpose of comparing new systems with the contributors of the test collection being used, condensed-list versions of existing diversity evaluation metrics are more reliable than the raw metrics. However, even the condensed-list metrics may be unreliable if the new systems are not competitive compared to the contributors. © Springer-Verlag 2012.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • One click one revisited: Enhancing evaluation based on information units

    Tetsuya Sakai, Makoto P. Kato

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7675   39 - 51  2012  [Refereed]

    This paper extends the evaluation framework of the NTCIR-9 One Click Access Task (1CLICK-1), which required systems to return a single, concise textual output in response to a query in order to satisfy the user immediately after a click on the SEARCH button. Unlike traditional nugget-based summarisation and question answering evaluation methods, S-measure, the official evaluation metric of 1CLICK-1, discounts the value of each information unit based on its position within the textual output. We first show that the discount parameter L of S-measure affects system ranking and discriminative power, and that using multiple values, e.g. L = 250 (user has only 30 seconds to view the text) and L = 500 (user has one minute), is beneficial. We then complement the recall-like S-measure with a simple, precision-like metric called T-measure as well as a combination of S-measure and T-measure, called S#. We show that S# with a heavy emphasis on S-measure imposes an appropriate length penalty to 1CLICK-1 system outputs and yet achieves discriminative power that is comparable to S-measure. These new metrics will be used at NTCIR-10 1CLICK-2. © Springer-Verlag 2012.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Grid-based interaction for exploratory search

    Hideo Joho, Tetsuya Sakai

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7675   496 - 505  2012  [Refereed]

    This paper presents a grid-based interaction model that is designed to encourage searchers to organize a complex search space by managing n × m subspaces. A search interface was developed based on the proposed interaction model, and its performance was evaluated by a user study carried out in the context of the NTCIR-9 VisEx Task. With the proposed interface, there were cases where subjects discovered new knowledge without accessing external resources when compared to a baseline system. The encouraging results from experiments warrant further studies on the model. © Springer-Verlag 2012.

    DOI

    Scopus

  • Using graded-relevance metrics for evaluating community QA answer selection

    Tetsuya Sakai, Yohei Seki, Daisuke Ishikawa, Kazuko Kuriyama, Noriko Kando, Chin-Yew Lin

    Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011     187 - 196  2011  [Refereed]

    Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments. Copyright 2011 ACM.

    DOI

    Scopus

    36
    Citation
    (Scopus)
  • Query Session Data vs. Clickthrough Data as Query Suggestion Resources

    Makoto P. Kato, Tetsuya Sakai, and Katsumi Tanaka

    ECIR 2011 Workshop on Session Information Retrieval    2011  [Refereed]

  • Challenges in Diversity Evaluation (keynote)

    Tetsuya Sakai

    ECIR 2011 Workshop on Diversity in Document Retrieval    2011  [Refereed]

  • Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

    Naoyoshi Aikawa, Tetsuya Sakai, and Hayato Yamana

    IPSJ Journal   2011 ( 1 ) 1 - 9  2011  [Refereed]

    CiNii

  • Evaluating Diversified Search Results Using Per-Intent Graded Relevance

    Tetsuya Sakai and Ruihua Song

    ACM SIGIR 2011    2011  [Refereed]

  • NTCIREVAL: A Generic Toolkit for Information Access Evaluation

    Tetsuya Sakai

    FIT 2011    2011  [Refereed]

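  • Automatic Prediction of High-Quality Answers in Community QA

    Daisuke Ishikawa, Tetsuya Sakai, Yohei Seki, Kazuko Kuriyama, Noriko Kando

    Journal of Japan Society of Information and Knowledge    2011  [Refereed]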

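  • From Microsoft Research in Beijing 2011: Success Stories of Japanese Interns

    Tetsuya Sakai

    Digest of the Industry-Academia Joint GCOE Domestic Symposium for Supporting Young Researchers    2011  [Refereed]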

  • What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria

    Daisuke Ishikawa, Noriko Kando, and Tetsuya Sakai

    EVIA2011    2011  [Refereed]

  • Analysis of Best-Answer Estimation for a Q&A Site and its Application to Machine Learning

    ISHIKAWA Daisuke, KURIYAMA Kazuko, SAKAI Tetsuya, SEKI Yohei, KANDO Noriko

    Journal of Japan Society of Information and Knowledge   20 ( 2 ) 73 - 85  2010.05

    In this research, we investigated whether a computer could estimate the best answer on a Q&A site. First, a best answer estimation experiment was carried out with human assessors. The data of Yahoo! Chiebukuro was used for the experiment; 50 questions extracted at random from four categories, viz., "Consultation of love," "Personal computer," "General knowledge," and "Politics," were used. The accuracy rate (precision) of the estimation by two assessors was 50% and 52% (random estimation: 34%) for "Consultation of love," 62% and 58% (random estimation: 38%) for "Personal computer," 54% and 56% (random estimation: 37%) for "General knowledge," and 56% and 60% (random estimation: 35.8%) for "Politics." Next, the experimental results were analyzed, and a machine learning system was constructed that uses "Detailed," "Evidence," and "Polite" as features for choosing the best answer. The precision of the machine learning system exceeded the assessors' results in the "Personal computer" (67%) category, and it fell below the assessors' results in the "Consultation of love" (41%) category. In the "General knowledge" and "Politics" categories, the precision of the machine learning system was almost equal to the assessors' results.

    DOI CiNii

  • Boiling Down Information Retrieval Test Collections

    Tetsuya Sakai, Teruko Mitamura

    RIAO 2010 Proceedings    2010  [Refereed]

  • Constructing a Test Collection with Multi-Intent Queries

    Ruihua Song, Dongjie Qi, Hua Liu, Tetsuya Sakai, Jian-Yun Nie, Hsiao-Wen Hon, and Yong Yu

    EVIA 2010 Proceedings    2010  [Refereed]

  • Simple Evaluation Metrics for Diversified Search Results

    Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, and Chin-Yew Lin

    EVIA 2010 Proceedings    2010  [Refereed]

  • Ranking Retrieval Systems without Relevance Assessments – Revisited

    Tetsuya Sakai and Chin-Yew Lin

    EVIA 2010 Proceedings    2010  [Refereed]

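  • The Task of Selecting High-Quality Answers in Community QA: A Study of Evaluation Methods

    Tetsuya Sakai, Daisuke Ishikawa, Kazuko Kuriyama, Yohei Seki, Noriko Kando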

    FIT 2010    2010  [Refereed]

  • Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

    Naoyoshi Aikawa, Tetsuya Sakai, and Hayato Yamana

    WebDB Forum 2010    2010  [Refereed]

  • On the robustness of information retrieval metrics to biased relevance assessments

    Tetsuya Sakai

    Journal of Information Processing   17   156 - 166  2009  [Refereed]

    Information Retrieval (IR) test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used IR evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems and the number of pooled documents. Even though previous studies have shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold when the relevance data are biased towards particular systems or towards the top of the pools. More specifically, we show that the condensed-list versions of Average Precision, Q-measure and normalised Discounted Cumulative Gain, which we denote as AP′, Q′ and nDCG′, are not necessarily superior to the original metrics for handling biases. Nevertheless, AP′ and Q′ are generally superior to bpref, Rank-Biased Precision and its condensed-list version even in the presence of biases.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Serendipitous Search via Wikipedia: A Query Log Analysis

    Tetsuya Sakai, Kenichi Nogami

    Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval     780 - 781  2009  [Refereed]

    We analyse the query log of a click-oriented Japanese search engine that utilises the link structures of Wikipedia for encouraging the user to change his information need and to perform repeated, serendipitous, exploratory search. Our results show that users tend to make transitions within the same query type: from person names to person names, from place names to place names, and so on.

  • Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments

    Tetsuya Sakai, Noriko Kando, Hideki Shima, Chuan-Jie Lin, Ruihua Song, Miho Sugimoto, and Teruko Mitamura

    DBSJ Journal (Journal of the Database Society of Japan)    2009  [Refereed]

  • People, Clouds, and Interaction for Information Access (invited paper)

    Tetsuya Sakai

    IUCS 2009    2009  [Refereed]

  • On information retrieval metrics designed for evaluation with incomplete relevance assessments

    Tetsuya Sakai, Noriko Kando

    Information Retrieval   11 ( 5 ) 447 - 470  2008.10  [Refereed]

    Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs: the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I Error, and on Kendall's rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.

    DOI

    Scopus

    106
    Citation
    (Scopus)
  • Introduction to the NTCIR-6 Special Issue

    Noriko Kando, Teruko Mitamura, and Tetsuya Sakai

    ACM Transactions on Asian Language Information Processing (TALIP)    2008  [Refereed]

  • Precision-at-ten considered redundant

    William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai

    ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings     695 - 696  2008  [Refereed]

    Information retrieval systems are compared using evaluation metrics, with researchers commonly reporting results for simple metrics such as precision-at-10 or reciprocal rank together with more complex ones such as average precision or discounted cumulative gain. In this paper, we demonstrate that complex metrics are as good as or better than simple metrics at predicting the performance of the simple metrics on other topics. Therefore, reporting of results from simple metrics alongside complex ones is redundant.

    DOI

    Scopus

    29
    Citation
    (Scopus)
  • Comparing Metrics across TREC and NTCIR: The Robustness to Pool Depth Bias

    Tetsuya Sakai

    ACM SIGIR 2008 Proceedings    2008  [Refereed]

    CiNii

  • Design and Development of a Clickthrough-Based Exploratory Search Site

    酒井 哲也, 小山田 浩史, 野上 謙一, 北村 仁美, 梶浦 正浩, 東 美奈子, 野中 由美子, 小野 雅也, 菊池 豊

    7th Forum on Information Technology (FIT 2008)    2008  [Refereed]

    CiNii

  • Comparing metrics across TREC and NTCIR: The robustness to system bias

    Tetsuya Sakai

    International Conference on Information and Knowledge Management, Proceedings     581 - 590  2008  [Refereed]

    Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in a more realistic setting, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold in the presence of system bias. In our experiments using TREC and NTCIR data, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, even under system bias, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'. Copyright 2008 ACM.

    DOI

    Scopus

    27
    Citation
    (Scopus)
  • Modelling A User Population for Designing Information Retrieval Metrics

    Tetsuya Sakai and Stephen Robertson

    Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)    2008  [Refereed]

  • On the reliability of information retrieval metrics based on graded relevance

    Tetsuya Sakai

    Information Processing & Management   43 ( 2 ) 531 - 548  2007.03  [Refereed]

    This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values. © 2006 Elsevier Ltd. All rights reserved.

    DOI

    Scopus

    81
    Citation
    (Scopus)
  • On the Reliability of Factoid Question Answering Evaluation

    Tetsuya Sakai

    ACM Transactions on Asian Language Information Processing (TALIP)    2007  [Refereed]

  • On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    Tetsuya Sakai

    Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007)     32 - 43  2007  [Refereed]

    CiNii

  • User Satisfaction Task: A Proposal for NTCIR-7

    Tetsuya Sakai

    Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007)    2007  [Refereed]

  • Pic-A-Topic: Efficient Viewing of Informative TV Contents on Travel, Cooking, Food and More

    Tetsuya Sakai, Tatsuya Uehara, Taishi Shimomori, Makoto Koyama, and Mika Fukui

    RIAO 2007 Proceedings    2007  [Refereed]

  • Alternatives to Bpref

    Tetsuya Sakai

    Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07     71 - 78  2007  [Refereed]

    Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref which has been designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric called rpref has also been proposed. However, we show that the application of Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power. Copyright 2007 ACM.

    DOI

    Scopus

    132 citations (Scopus)

  • Evaluating the Task of Finding One Relevant Document Using Incomplete Relevance Data

    Tetsuya Sakai

    FIT 2007 Information Technology Letters    2007  [Refereed]

  • Evaluating Information Retrieval Metrics based on Bootstrap Hypothesis Tests

    Tetsuya Sakai

    IPSJ TOD    2007  [Refereed]

  • On the Properties of Evaluation Metrics for Finding One Highly Relevant Document

    Tetsuya Sakai

    IPSJ TOD    2007  [Refereed]

  • 高精度な音声入力質問応答のための疑問表現補完

    筒井 秀樹, 真鍋 俊彦, 福井 美佳, 酒井 哲也, 藤井 寛子, 浦田 耕二

    情報処理学会論文誌    2007  [Refereed]

  • よりよい検索システム実現のために:正解の良し悪しを考慮した情報検索評価の動向

    酒井哲也

    情報処理    2006  [Refereed]

  • A Further Note on Evaluation Metrics for the Task of Finding One Highly Relevant Document

    Tetsuya Sakai

    IPSJ SIG Technical Report    2006  [Refereed]

  • On the Task of Finding One Highly Relevant Document with High Precision

    Tetsuya Sakai

    IPSJ TOD    2006  [Refereed]

  • Give Me Just One Highly Relevant Document: P-measure

    Tetsuya Sakai

    ACM SIGIR 2006 Proceedings    2006  [Refereed]

  • Evaluating Evaluation Metrics based on the Bootstrap

    Tetsuya Sakai

    ACM SIGIR 2006 Proceedings    2006  [Refereed]

    CiNii

  • NTCIRに基づく文書検索技術の進歩に関する一考察

    酒井哲也

    情報科学技術レターズ    2006  [Refereed]

  • Improving the robustness to recognition errors in speech input question answering

    Hideki Tsutsui, Toshihiko Manabe, Mika Fukui, Tetsuya Sakai, Hiroko Fujii, Koji Urata

    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS   4182   297 - 312  2006  [Refereed]

     View Summary

    In our previous work, we developed a prototype of a speech-input help system for home appliances such as digital cameras and microwave ovens. Given a factoid question, the system performs textual question answering using the manuals as the knowledge source. Whereas, given a HOW question, it retrieves and plays a demonstration video. However, our first prototype suffered from speech recognition errors, especially when the Japanese interrogative phrases in factoid questions were misrecognized. We therefore propose a method for solving this problem, which complements a speech query transcript with an interrogative phrase selected from a pre-determined list. The selection process first narrows down candidate phrases based on co-occurrences within the manual text, and then computes the similarity between each candidate and the query transcript in terms of pronunciation. Our method improves the Mean Reciprocal Rank of top three answers from 0.429 to 0.597 for factoid questions.

  • Pic-A-Topic: Gathering information efficiently from recorded TV shows on travel

    Tetsuya Sakai, Tatsuya Uehara, Kazuo Sumita, Taishi Shimomori

    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS   4182   429 - 444  2006  [Refereed]

     View Summary

    We introduce a system called Pic-A-Topic, which analyses closed captions of Japanese TV shows on travel to perform topic segmentation and topic sentence selection. Our objective is to provide a table-of-contents interface that enables efficient viewing of desired topical segments within recorded TV shows to users of appliances such as hard disk recorders and digital TVs. According to our experiments using 14.5 hours of recorded travel TV shows, Pic-A-Topic's F1-measure for the topic segmentation task is 82% of manual performance on average. Moreover, a preliminary user evaluation experiment suggests that this level of performance may be indistinguishable from manual performance.

  • Bootstrap-based comparisons of IR metrics for finding one relevant document

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS   4182   374 - 389  2006  [Refereed]

     View Summary

    This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure >= O-measure >= NWRR >= RR" generally holds, where ">=" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.

  • Ranking the NTCIR systems based on multigrade relevance

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY   3411   251 - 262  2005  [Refereed]

     View Summary

    At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multi-grade relevance. This paper shows that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision through a theoretical analysis, and then verifies the above claim through experiments by actually ranking the systems submitted to the NTCIR-3 CLIR Task. Our experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that it is more reliable than Average Weighted Precision.

  • 評価型ワークショップにおけるシステム順位の安定性について

    酒井哲也

    言語処理学会第11回年次大会 併設ワークショップ「評価型ワークショップを考える」    2005  [Refereed]

  • 固有表現抽出と回答タイプ体系が質問応答システムの性能に与える影響(自然言語処理)

    市村由美, 齋藤佳美, 酒井哲也, 国分智晴, 小山誠

    電子情報通信学会 論文誌    2005  [Refereed]

  • Flexible Pseudo-Relevance Feedback via Selective Sampling

    Tetsuya Sakai, Toshihiko Manabe, and Makoto Koyama

    ACM TALIP    2005  [Refereed]

  • Advanced Technologies for Information Access (invited paper)

    Tetsuya Sakai

    International Journal of Computer Processing of Oriental Languages    2005  [Refereed]

  • ひとつの高適合文書を高精度に検索するタスクのための評価指標

    酒井哲也

    情報科学技術レターズ    2005  [Refereed]

  • The reliability of metrics based on graded relevance

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS   3689   1 - 16  2005  [Refereed]

     View Summary

    This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections with submitted runs from the Chinese IR and English IR tasks in the NTCIR-3 CLIR track to examine the metrics using methods proposed by Buckley/Voorhees and Voorhees/Buckley as well as Kendall's rank correlation. Our results show that AnDCG(l) and nDCG(l) ((Average) Normalised Discounted Cumulative Gain at document cut-off l) are good metrics, provided that l is large. However, if one wants to avoid the parameter l altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.

  • Introduction to the special issue: Recent advances in information processing and access for Japanese

    Tetsuya Sakai, Yuji Matsumoto

    ACM Transactions on Asian Language Information Processing   4 ( 4 ) 275 - 376  2005  [Refereed]

    DOI

    Scopus

  • The Relationship between Answer Ranking and User Satisfaction in a Question Answering System

    Tomoharu Kokubu, Tetsuya Sakai, Yoshimi Saito, Hideki Tsutsui, Toshihiko Manabe, Makoto Koyama, and Hiroko Fujii

    NTCIR-5 Proceedings (Open Submission Session)    2005  [Refereed]

  • The Effect of Topic Sampling on Sensitivity Comparisons of Information Retrieval Metrics

    Tetsuya Sakai

    NTCIR-5 Proceedings (Open Submission Session)    2005  [Refereed]

  • ASKMi: A Japanese Question Answering System based on Semantic Role Analysis

    Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, Tomoharu Kokubu, and Toshihiko Manabe

    RIAO 2004    2004  [Refereed]

    CiNii

  • New Performance Metrics based on Multigrade Relevance

    Tetsuya Sakai

    NTCIR-4 Proceedings (Open Submission Session)    2004  [Refereed]

    CiNii

  • The Effect of Back-Formulating Questions in Question Answering Evaluation

    Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Tomoharu Kokubu, and Makoto Koyama

    ACM SIGIR 2004    2004  [Refereed]

    CiNii

  • 汎用シソーラスと擬似適合性フィードバックとを用いた検索質問拡張

    小山誠, 真鍋俊彦, 木村和広, 酒井哲也

    「情報アクセスのためのテキスト処理」シンポジウム    2003  [Refereed]

  • BRIDJE over a Language Barrier: Cross-Language Information Access by Integrating Translation and Retrieval

    Tetsuya Sakai, Makoto Koyama, Masaru Suzuki, Akira Kumano, and Toshihiko Manabe

    IRAL 2003    2003  [Refereed]

    CiNii

  • Evaluating Retrieval Performance for Japanese Question Answering: What Are Best Passages?

    Tetsuya Sakai and Tomoharu Kokubu

    ACM SIGIR 2003    2003  [Refereed]

    CiNii

  • Average Gain Ratio: A Simple Retrieval Performance Measure for Evaluation with Multiple Relevance Levels

    Tetsuya Sakai

    ACM SIGIR 2003    2003  [Refereed]

  • Relative and Absolute Term Selection Criteria: A Comparative Study for English and Japanese IR

    Tetsuya Sakai and Stephen E. Robertson

    ACM SIGIR 2002    2002  [Refereed]

  • Generating transliteration rules for cross-language information retrieval from machine translation dictionaries

    Tetsuya Sakai, Akira Kumano, Toshihiko Manabe

    Proceedings of the IEEE International Conference on Systems, Man and Cybernetics   6   290 - 295  2002  [Refereed]

     View Summary

    This paper describes a method for automatically converting existing English-Japanese and Japanese-English machine translation dictionaries into English-Japanese transliteration rules and Japanese-English back-transliteration rules for cross language information retrieval. An existing English-katakana word alignment module, which is part of our own machine translation system, is exploited in generating probabilistic rewriting rules. If our system is allowed to output 15 candidate spellings, it successfully transliterates more than 75% of a set of out-of-vocabulary English words into katakana, and successfully back-transliterates more than 55% of a set of out-of-vocabulary katakana words into English. Moreover, our preliminary cross-language information retrieval experiments, which treat the candidate spellings as a group of synonyms, suggest that our methods can indeed compensate for the failure of machine translation in some cases.

    DOI

    Scopus

    4 citations (Scopus)

  • The Use of External Text Data in Cross-Language Information Retrieval based on Machine Translation

    Tetsuya Sakai

    IEEE SMC 2002    2002  [Refereed]

  • 意味役割解析に基づく高適合英語文書の検索

    酒井哲也, 小山誠, 鈴木優, 真鍋俊彦

    FIT 2002 情報技術レターズ LD-8     67 - 68  2002  [Refereed]

    CiNii

  • A framework for cross-language information access: Application to English and Japanese

    Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

    COMPUTERS AND THE HUMANITIES   35 ( 4 ) 371 - 388  2001.11  [Refereed]

     View Summary

    Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.

  • A framework for cross-language information access: Application to English and Japanese

    Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

    Language Resources and Evaluation   35 ( 4 ) 371 - 388  2001

     View Summary

    Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications. © 2001 Kluwer Academic Publishers.

  • Flexible Pseudo-Relevance Feedback via Direct Mapping and Categorization of Search Requests

    Tetsuya Sakai, Stephen E. Robertson, and Stephen Walker

    ECIR 2001    2001  [Refereed]

  • Japanese-English Cross-Language Information Retrieval using Machine Translation and Pseudo-Relevance Feedback

    Tetsuya Sakai

    International Journal of Computer Processing of Oriental Languages   14 ( 2 ) 83 - 107  2001  [Refereed]

    DOI CiNii

  • Flexible Pseudo-Relevance Feedback Using Optimization Tables

    Tetsuya Sakai, Stephen E. Robertson

    ACM SIGIR 2001    2001  [Refereed]

  • Generic Summaries for Indexing in Information Retrieval

    Tetsuya Sakai and Karen Sparck Jones

    ACM SIGIR 2001    2001  [Refereed]

    CiNii

  • Combining the Ranked Output from Fulltext and Summary Indexes

    Tetsuya Sakai

    ACM SIGIR 2001 Workshop on Text Summarization    2001  [Refereed]

  • Incremental relevance feedback in Japanese text retrieval

    Gareth Jones, Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

    Information Retrieval   2 ( 4 ) 361 - 384  2000  [Refereed]

     View Summary

    The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using "number-to-view" graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, but also a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones. © 2000 Kluwer Academic Publishers.

    DOI

    Scopus

    2 citations (Scopus)

  • MT-based Japanese-English Cross-Language IR Experiments using the TREC Test Collections

    Tetsuya Sakai

    IRAL 2000    2000  [Refereed]

  • A First Step towards Flexible Local Feedback for Ad hoc Retrieval

    Tetsuya Sakai, Masahiro Kajiura, and Kazuo Sumita

    IRAL 2000    2000  [Refereed]

  • 確率モデルに基づく日本語情報フィルタリングにおけるフィードバックによる検索条件展開および検索精度評価

    酒井哲也, Gareth J.F. Jones, 梶浦正浩, 住田一男

    情報処理学会論文誌    1999  [Refereed]

  • A comparison of query translation methods for English-Japanese cross-language information retrieval

    Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, and Kazuo Sumita

    SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL     269 - 270  1999  [Refereed]

     View Summary

    In this paper we report results of an investigation into English-Japanese Cross-Language Information Retrieval (CLIR) comparing a number of query translation methods. Results from experiments using the standard BMIR-J2 Japanese collection suggest that full machine translation (MT) can outperform popular dictionary-based query translation methods and further that in this context MT is largely robust to queries with little linguistic structure.

  • Exploring the use of Machine Translation resources for English-Japanese Cross-Language Information Retrieval

    Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, and Kazuo Sumita

    MT Summit VII Workshop on Machine Translation for Cross Language Information Retrieval    1999  [Refereed]

  • 日本語情報検索システム評価用テストコレクションの構築

    木本 晴夫, 小川 泰嗣, 石川 徹也, 増永 良文, 福島 俊一, 田中 智博, 中渡瀬 秀一, 芥子 育雄, 豊浦 潤, 宮内 忠信, 上田 良寛, 松井 くにお, 木谷 強, 三池 誠司, 酒井 哲也, 徳永 健伸, 鶴岡 弘, 安形 輝

    情報処理学会論文誌    1999  [Refereed]

  • 機械翻訳を用いた英日・日英言語横断検索に関する一考察

    酒井哲也, 梶浦 正浩, 住田 一男, Jones, G, Collier, N

    情報処理学会論文誌   40 ( 11 ) 4075 - 4086  1999  [Refereed]

    CiNii

  • 情報検索システム評価のためのテストコレクション

    酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝, 神門典子

    Computer Today    1998  [Refereed]

  • 日本語情報検索システム評価用テストコレクションの構築

    木本晴夫, 小川泰嗣, 石川徹也, 増永良文, 福島俊一, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 木谷強, 三池誠司, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

    情報学シンポジウム'98    1998  [Refereed]

  • ユーザーの要求に応じた 情報フィルタリングシステム NEATのプロファイル生成

    酒井哲也, Jones, G.J.F, 梶浦正浩, 住田一男

    Interaction '98     149 - 152  1998  [Refereed]

    CiNii

  • Lessons from BMIR-J2: A Test Collection for Japanese IR Systems

    Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuo Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Tetsuya Sakai, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata

    ACM SIGIR '98    1998  [Refereed]

  • Experiments in Japanese Text Retrieval and Routing using the NEAT System

    Gareth Jones, Tetsuya Sakai, Masahiro Kajiura, and Kazuo Sumita

    ACM SIGIR '98    1998  [Refereed]

  • Application of Query Expansion Techniques in Probabilistic Japanese News Filtering

    Tetsuya Sakai, Gareth Jones, Masahiro Kajiura, and Kazuo Sumita

    IRAL '98    1998  [Refereed]

  • 情報フィルタリングのためのブール式と文書構造を利用した検索条件生成と検索精度評価

    酒井 哲也, 梶浦 正浩, 住田 一男

    情報処理学会論文誌    1998  [Refereed]

  • 日本語テキスト情報検索システムの評価用テストコレクション

    酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝

    アドバンストデータベースシンポジウム'98, パネル:マルチメディア情報検索ベンチマークの未来    1998  [Refereed]

  • WWW上のフロー情報を対象にした情報フィルタ (FreshEye)

    住田一男, 上原龍也, 小野顕司, 酒井哲也, 池田朋男, 下郡信宏

    インタラクション'97    1997  [Refereed]

  • 日本語情報検索システム評価用テストコレクションBMIR-J1

    福島俊一, 小川泰嗣, 石川徹也, 増永良文, 木本晴夫, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 三池誠司, 酒井哲也, 木谷強, 徳永健伸, 鶴岡弘, 安形輝

    自然言語処理シンポジウム'96    1996  [Refereed]

  • A User Interface for Generating Dynamic Abstracts of Retrieved Documents

    Tetsuya Sakai, Etsuo Itoh, Seiji Miike, Kazuo Sumita

    47th FID    1994  [Refereed]


Books and Other Publications

  • Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

    Tetsuya Sakai, Emi Ishita, Hiroaki Ohshima, Faegheh Hasibi, Jiaxin Mao, Joemon Jose

    SIGIR-AP  2024

  • Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

    Qingyao Ai, Yiqun Liu, Alistair Moffat, Xuanjing Huang, Tetsuya Sakai, Justin Zobel

    SIGIR-AP  2023

  • Proceedings of ACM SIGIR 2021

    Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, Tetsuya Sakai, Alejandro Bellogín, Masaharu Yoshioka

    2021

  • Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact

    Tetsuya Sakai, Douglas W. Oard, Noriko Kando

    Springer  2020

  • Proceedings of the Open-Source IR Replicability Challenge (OSIRRC 2019)

    Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

    2019

  • U-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • Q-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • Expected Reciprocal Rank. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • ERR-IA. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • D-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • alpha-nDCG. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • Advanced Information Retrieval Measures. In: Liu L., Özsu M. (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018

  • Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power

    Tetsuya Sakai

    Springer  2018

  • Proceedings of AIRS 2018 (LNCS 11292)

    Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

    2018

  • Proceedings of ACM SIGIR 2017

    Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de Vries, Ryen W. White

    2017

  • 人工知能学大事典

    人工知能学会

    共立出版  2017

  • Proceedings of SPIRE 2016 (LNCS 9954)

    Shunsuke Inenaga, Kunihiko Sadakane, Tetsuya Sakai

    Springer  2016

  • 情報アクセス評価方法論~検索エンジンの進歩のために~

    酒井哲也

    コロナ社  2015

  • Proceedings of ACM SIGIR 2013

    Gareth J.F. Jones, Páraic Sheridan, Diane Kelly, Maarten de Rijke, and Tetsuya Sakai

    2013

  • Proceedings of NTCIR-10

    Noriko Kando, Kazuaki Kishida, Eric Tang, Tetsuya Sakai, Makoto P. Kato, Ka Po Chow, Isao Goto, Yotaro Watanabe, Tomoyosi Akiba, Hiromitsu Nishizaki, Akiko Aizawa, Mizuki Morita, and Eiji Aramaki

    2013

  • Proceedings of NTCIR-9

    Noriko Kando, Daisuke Ishikawa, Miho Sugimoto, Fredric C. Gey, Tetsuya Sakai, Tomoyosi Akiba, Hideki Shima, Shlomo Geva, Eric Tang, Andrew Trotman, Tsuneaki Kato, Bin Lu, and Isao Goto

    2011

  • Proceedings of the 3rd International Workshop on Evaluating Information Access (EVIA 2010)

    Tetsuya Sakai, Mark Sanderson, William Webber, Noriko Kando, Kazuaki Kishida

    2010

  • Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation

    Shlomo Geva, Jaap Kamps, Carol Peters, Tetsuya Sakai, Andrew Trotman, and Ellen Voorhees

    2009

  • 5th Asia Information Retrieval Symposium (AIRS 2009)

    Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, Tetsuya Sakai

    Springer  2009

  • 言語処理学辞典

    Co-authored

    共立出版  2009

  • Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)

    Tetsuya Sakai, Mark Sanderson, Noriko Kando, and Miho Sugimoto

    2008

  • Proceedings of AIRS 2008 (LNCS 4993)

    Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, and Guodong Zhou

    2008

  • Proceedings of the First International Workshop on Evaluating Information Access (EVIA 2007)

    Tetsuya Sakai, Mark Sanderson, and David Kirk Evans

    2007


Presentations

  • グループフェアなウェブ検索と会話型検索

    酒井哲也, Sijie Tao, Hanpei Fang, Yuxiang Zhang

    情報処理学会 

    Presentation date: 2024

    Event date: 2024

  • Cross-lingual Relevance Estimation with Soft and Hard Prompts

    Hanpei Fang, Tetsuya Sakai

    Presentation date: 2024

    Event date: 2024

  • A Study on Automatic Nugget Weight Generation with LLMs

    Kai-xin Chang, Tetsuya Sakai

    Presentation date: 2024

    Event date: 2024

  • 中間音楽生成のための Transformer モデル

    Presentation date: 2024

    Event date: 2024

  • 思い出せない映画に特化した情報検索システムの作成

    DEIM 

    Presentation date: 2024

    Event date: 2024

  • RSLTOT at the TREC 2023 ToT Track

    Reo Yoshikoshi, Tetsuya Sakai

    TREC 

    Presentation date: 2024

    Event date: 2024

  • Overview of the NTCIR-17 FairWeb-1 Task

    Sijie Tao, Nuo Chen, Tetsuya Sakai, Zhumin Chu, Hiromi Arai, Ian Soboroff, Nicola Ferro, Maria Maistro

    NTCIR-17 

    Presentation date: 2023

  • RSLFW at the NTCIR-17 FairWeb-1 Task

    Fan Li, Kaize Shi, Kenta Inaba, Sijie Tao, Nuo Chen, Tetsuya Sakai

    NTCIR-17 

    Presentation date: 2023

  • Evaluating Parrots and Sociopathic Liars (keynote)

    Tetsuya Sakai  [Invited]

    ACM ICTIR 

    Presentation date: 2023

  • On A Few Responsibilities of (IR) Researchers: Fairness, Awareness, and Sustainability (keynote)

    Tetsuya Sakai  [Invited]

    ECIR 

    Presentation date: 2023

  • ウェブ検索結果がユーザの意見形成に及ぼす影響の調査

    Kenta Inaba, Tetsuya Sakai

    DEIM 

    Presentation date: 2023

  • SWAN: A Generic Framework for Auditing Textual Conversational Systems

    Tetsuya Sakai

    arXiv, Cornell University 

    Presentation date: 2023

  • Evaluating Evaluation Measures, Evaluating Information Access Systems, Designing and Constructing Test Collections, and Evaluating Again

    Tetsuya Sakai  [Invited]

    Proceedings of NTCIR-16 

    Presentation date: 2022

  • グループフェアネスを考慮したウェブ検索タスク

    酒井哲也

    情報処理学会研究報告 

    Presentation date: 2022

  • Overview of the NTCIR-16 We Want Web with CENTRE (WWW-4) Task

    Tetsuya Sakai, Sijie Tao, Zhumin Chu, Maria Maistro, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

    Proceedings of NTCIR-16 

    Presentation date: 2022

  • SLWWW at the NTCIR-16 WWW-4 Task

    Yuya Ubukata, Masaki Muraoka, Sijie Tao, Tetsuya Sakai

    Proceedings of NTCIR-16 

    Presentation date: 2022

  • RSLDE at the NTCIR-16 DialEval-2 Task

    Fan Li, Tetsuya Sakai

    Proceedings of NTCIR-16 

    Presentation date: 2022

  • Overview of the NTCIR-16 Dialogue Evaluation (DialEval-2) Task

    Sijie Tao, Tetsuya Sakai

    Proceedings of NTCIR-16 

    Presentation date: 2022

  • On Variants of Root Normalised Order-aware Divergence and a Divergence based on Kendall’s Tau

    Tetsuya Sakai

    arXiv:2204.07304 

    Presentation date: 2022

  • A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

    Tetsuya Sakai, Jin Young Kim, Inho Kang

    arXiv:2204.00280 

    Presentation date: 2022

  • Transformerを用いた文書の自動品質評価

    吉越玲士, 酒井哲也

    DEIM 2022 

    Presentation date: 2022

  • NTCIR-16ウェブ検索・再現可能性タスク (WWW-4) および対話評価タスク (DialEval-2)への誘い

    酒井哲也

    情報処理学会研究報告 

    Presentation date: 2021

  • 対話要約における話者情報を持つEmbeddingの効果

    楢木悠士, 酒井哲也, 林 良彦

    FIT2021講演論文集 

    Presentation date: 2021

  • RealSakaiLab at the TREC 2020 Health Misinformation Track

    Sijie Tao, Tetsuya Sakai

    Presentation date: 2021

  • 話者情報を認識した対話要約

    楢木悠士, 酒井哲也

    言語処理学会第27回年次大会発表論文集 

    Presentation date: 2021

  • Voice Assistantアプリの対話型解析システムの開発

    刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

    電子情報通信学会 第54回情報通信システムセキュリティ研究会 

    Presentation date: 2021

  • モバイルアプリケーションにおけるUIデザイン自動評価の検討

    栗林峻, 酒井哲也

    DEIM 2021 

    Presentation date: 2021

  • スタンス検出タスクにおける評価方法の選定

    雨宮佑基, 酒井哲也

    DEIM 2021 

    Presentation date: 2021

  • 日経新聞の記事からの日経ラジオ用読み原稿の自動生成

    清水嶺, 酒井哲也

    DEIM 2021 

    Presentation date: 2021

  • 有用なレビューを抽出するための比較文フィルタリングの検討

    小橋賢介, 雨宮佑基, 酒井哲也

    DEIM 2021 

    Presentation date: 2021

  • Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    Presentation date: 2021

  • Overview of the TREC 2018 CENTRE Track

    Ian Soboroff, Nicola Ferro, Maria Maistro, Tetsuya Sakai

    Proceedings of TREC 2018 

    Presentation date: 2020

  • Improving Concept Representations for Short Text Classification

    Sijie Tao, Tetsuya Sakai

    Presentation date: 2020

  • Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

    Shiyoh Goetsu, Tetsuya Sakai

    Presentation date: 2020

  • 商品比較のための文脈つき評価軸抽出の検討

    小橋賢介, 酒井哲也

    DEIM 2020 

    Presentation date: 2020

  • Androidアプリの権限要求に対する説明十分性の自動確認システムの提案

    小島智樹, 酒井哲也

    DEIM 2020 

    Presentation date: 2020

  • Purchase Prediction based on Recurrent Neural Networks with an Emphasis on Recent User Activities

    Quanyu Piao, Joo-Young Lee, Tetsuya Sakai

    DEIM 2020 

    Presentation date: 2020

  • Experiments on Unsupervised Text Classification based on Graph Neural Networks

    Haoxiang Shi, Cen Wang, Tetsuya Sakai

    DEIM 2020 

    Presentation date: 2020

  • Do Neural Models for Response Generation Fully Exploit the Input Natural Language Text?

    Lingfeng Zhang, Tetsuya Sakai

    DEIM 2020 

    Presentation date: 2020

  • 商品検索におけるゼロマッチ解消のためのデータセット構築の検討

    雨宮佑基, 真鍋知博, 藤田澄男, 酒井哲也

    DEIM 2020 

    Presentation date: 2020

  • 解釈可能な内部表現を使用したタスク指向ニューラル対話システムの試作

    村田憲俊, 酒井哲也

    DEIM 2020 

    Presentation date: 2020

  • Response Generation based on the Big Five Personality Traits

    Wanqi Wu, Tetsuya Sakai

    DEIM 2020 

    Presentation date: 2020

  • Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

    Shiyoh Goetsu, Tetsuya Sakai

    arXiv 

    Presentation date: 2020

  • selt Team’s Entity Linking System at the NTCIR-15 QALab-PoliInfo2

    Yuji Naraki, Tetsuya Sakai

    Proceedings of NTCIR-15 

    Presentation date: 2020

  • SLWWW at the NTCIR-15 WWW-3 Task

    Masaki Muraoka, Zhaohao Zeng, Tetsuya Sakai

    Proceedings of NTCIR-15 

    Presentation date: 2020

  • Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng, Yukun Zheng, Jiaxin Mao, Zhumin Chu, Yiqun Liu, Maria Maistro, Zhicheng Dou, Nicola Ferro, Ian Soboroff

    Proceedings of NTCIR-15 

    Presentation date: 2020

  • RSLNV at the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

    Ting Cao, Fan Zhang, Haoxiang Shi, Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Injae Lee, Kyungduk Kim, Inho Kang

    Proceedings of NTCIR-15 

    Presentation date: 2020

  • SKYMN at the NTCIR-15 DialEval-1 Task

    Junjie Wang, Yuxiang Zhang, Tetsuya Sakai, Hayato Yamana

    Proceedings of NTCIR-15 

    Presentation date: 2020

  • Overview of the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

    Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Inho Kang

    Proceedings of NTCIR-15 

    Presentation date: 2020

  • ユーザの感覚に近い多様化検索評価指標

    酒井哲也, Zhaohao Zeng

    FIT2020講演論文集 

    Presentation date: 2020

  • On Fuhr’s Guideline for IR Evaluation

    Tetsuya Sakai

    SIGIR Forum 

    Presentation date: 2020

  • 擬似アノテーションにもとづく日本語ツイートの極性判定

    小橋賢介, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • FigureQAタスクにおける抽象画像を考慮したアプローチ

    坂本凜, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • Convolutional Neural Networkを用いたFake News Challengeの検討

    雨宮佑基, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • 音声ユーザインタフェースにおける処理エラーによるユーザフラストレーションに関する調査

    呉越思瑶, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • Query-Focused Extractive Summarization based on Deep Learning: Comparison of Similarity Measures for Pseudo Ground Truth Generation

    Yuliska, Tetsuya Sakai

    DEIM 2019 

    Presentation date: 2019

  • Exploring Multi-label Classification Using Text Graph Convolutional Networks on the NTCIR-13 MedWeb Dataset

    Sijie Tao, Tetsuya Sakai

    DEIM 2019 

    Presentation date: 2019

  • Androidアプリの権限要求に対するユーザーへの説明の補完

    小島智樹, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • 能動学習を利用した未知語アノテーションの検討

    黒澤瞭佑, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • Dialogue Quality Distribution Prediction based on a Loss that Compares Adjacent Probability Bins

    河東宗祐, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • Twitterコーパスに基づく雑談対話システムにおける多様性の獲得

    村田憲俊, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • 文書分類技術に基づくエントリーシートからの業界推薦

    三王慶太, 酒井哲也

    DEIM 2019 

    Presentation date: 2019

  • Graded Relevance Assessments and Graded Relevance Measures of NTCIR: A Survey of the First Twenty Years

    Tetsuya Sakai

    arXiv:1903.11272 

    Presentation date: 2019

  • RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

    Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

    arXiv:1905.01799 

    Presentation date: 2019

  • Overview of the NTCIR-14 CENTRE Task

    Tetsuya Sakai, Nicola Ferro, Ian Soboroff, Zhaohao Zeng, Peng Xiao, and Maria Maistro

    Proceedings of NTCIR-14 

    Presentation date: 2019

  • Overview of the NTCIR-14 We Want Web Task

    Jiaxin Mao, Tetsuya Sakai, Cheng Luo, Peng Xiao, Yiqun Liu, and Zhicheng Dou

    Proceedings of NTCIR-14 

    Presentation date: 2019

  • Overview of the NTCIR-14 Short Text Conversation Task: Dialogue Quality and Nugget Detection Subtasks

    Zhaohao Zeng, Sosuke Kato, and Tetsuya Sakai

    Proceedings of NTCIR-14 

    Presentation date: 2019

  • SLSTC at the NTCIR-14 STC-3 Dialogue Quality and Nugget Detection Subtasks

    Sosuke Kato, Rikiya Suzuki, Zhaohao Zeng, and Tetsuya Sakai

    Proceedings of NTCIR-14 

    Presentation date: 2019

  • SLWWW at the NTCIR-14 We Want Web Task

    Peng Xiao and Tetsuya Sakai

    Proceedings of NTCIR-14 

    Presentation date: 2019

  • NTCIR-15ウェブ検索・再現可能性タスク (WWW-3) および対話評価タスク (DialEval-1)への誘い

    酒井哲也

    情報処理学会研究報告2019-IFAT-136 

    Presentation date: 2019

  • Overview of the TREC 2018 CENTRE Track

    Ian Soboroff, Nicola Ferro, Maria Maistro, and Tetsuya Sakai

    Proceedings of TREC 2018 

    Presentation date: 2019

  • クリックと放棄に基づくモバイルバーティカルの順位付け

    川崎 真未, Inho Kang, 酒井哲也

    DEIM 2018 

    Presentation date: 2018

  • Generative Adversarial Nets を用いた文書分類の検証

    小島智樹, 酒井哲也

    DEIM 2018 

    Presentation date: 2018

  • 単語レベルと文字レベルの情報を用いた日本語対話システムの試作

    村田憲俊, 酒井哲也

    DEIM 2018 

    Presentation date: 2018

  • Classifying Community QA Questions That Contain an Image

    Kenta Tamaki, Riku Togashi, Sumio Fujita, Hideyuki Maeda, Tetsuya Sakai

    DEIM 2018 

    Presentation date: 2018

  • ユーザーのニーズに合わせたインタラクティブな推薦システムの提案

    呉越思瑶, 酒井哲也

    DEIM 2018 

    Presentation date: 2018

  • Report on NTCIR-13: The Thirteenth Round of NII Testbeds and Community for Information Access Research

    Yiqun Liu, Makoto P. Kato, Charles L.A. Clarke, Noriko Kando, and Tetsuya Sakai

    SIGIR Forum 52(1) 2018 

    Presentation date: 2018

  • A Comparative Study of Deep Learning Approaches for Visual Question Classification in Community QA

    Hsin-Wen Liu, Avikalp Srivastava, Sumio Fujita, Toru Shimizu, Riku Togashi, and Tetsuya Sakai

    IPSJ SIG Technical Report 2018-IFAT-132 (17) 

    Presentation date: 2018

  • 対話破綻検出コーパスに対する学習データ選別の検討

    河東宗祐, 酒井哲也

    情報処理学会研究報告 2018-IFAT-132 (28) 

    Presentation date: 2018

  • 色・形状・テクスチャに基づく画像検索の自動評価と多様化

    富樫陸, 藤田澄男, 酒井哲也

    情報処理学会研究報告 2018-IFAT-132 (12) 

    Presentation date: 2018

  • Androidアプリのレビューを用いたユーザーへの権限説明の補完

    小島智樹, 酒井哲也

    情報処理学会研究報告 

    Presentation date: 2018

  • 評価実験の設計と論文での結果報告: きちんとやっていますか?

    酒井 哲也

    第3回自然言語処理シンポジウム 

    Presentation date: 2017

  • Report on NTCIR-12: The Twelfth Round of NII Testbeds and Community for Information Access Research

    Makoto P. Kato, Kazuaki Kishida, Noriko Kando, Tetsuya Sakai, and Mark Sanderson

    SIGIR Forum 50 (2) 

    Presentation date: 2017

  • ツイートにおける周辺単語の感情極性値を用いた新語の感情推定

    黒澤 瞭佑, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • 解答検証を利用した選択式問題への自動解答

    佐藤 航, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • 英日言語横断検索におけるクエリ拡張結果の詳細分析

    玉置 賢太, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • アノテーション分布を考慮した対話破綻検出

    河東 宗祐, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • 拡張クエリを用いたレシピ検索のパーソナライゼーション

    犬塚 眞太郎, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • クリックに基づく選好グラフを用いたバーティカル適合性推定

    門田見 侑大, 吉田 泰明, 藤田澄男, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • 複数人で睡眠習慣改善に臨む際の人間関係と協調の効果

    飯島 聡美, 酒井 哲也

    DEIM 2017 

    Presentation date: 2017

  • Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

    Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

    情報処理学会研究報告 2017-NL-232 

    Presentation date: 2017

  • Ranking Rich Mobile Verticals based on Clicks and Abandonment

    Mami Kawasaki, Inho Kang, and Tetsuya Sakai

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • Overview of the NTCIR-13 Short Text Conversation Task

    Lifeng Shang, Tetsuya Sakai, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao, Yuki Arase, and Masako Nomoto

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • Overview of the NTCIR-13 We Want Web Task

    Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong, and Jingfang Xu

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • SLOLQ at the NTCIR-13 OpenLiveQ Task

    Ryo Kashimura and Tetsuya Sakai

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • SLQAL at the NTCIR-13 QA Lab-3 Task

    Kou Sato and Tetsuya Sakai

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • SLSTC at the NTCIR-13 STC Task

    Jun Guan and Tetsuya Sakai

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • SLWWW at the NTCIR-13 WWW Task

    Peng Xiao, Lingtao Li, Yimeng Fan, and Tetsuya Sakai

    Proceedings of NTCIR-13 

    Presentation date: 2017

  • Project Next IR -情報検索の失敗分析‐

    難波英嗣, 酒井哲也, 神門典子

    情報処理 

    Presentation date: 2016

  • 発話者を考慮した学習に基づく対話システムの検討

    河東宗祐, 酒井哲也

    DEIM 2016 

    Presentation date: 2016

  • ショッピングサイトにおける購入予測のための行動パターン分析

    出縄弘人, Young-In Song, 酒井哲也

    DEIM 2016 

    Presentation date: 2016

  • コンテキスト付き検索ログを用いた要求ヴァーティカルの分析

    門田見侑大, 吉田泰明, 藤田澄男, 酒井哲也

    DEIM 2016 

    Presentation date: 2016

  • 言語の分散表現と擬似適合性フィードバックを用いた英日言語横断検索

    玉置賢太, 林佑明, 酒井哲也

    DEIM 2016 

    Presentation date: 2016

  • 協調型ヘルスケア -規則正しい睡眠による日中の生産性向上

    飯島聡美, 酒井哲也

    DEIM 2016 

    Presentation date: 2016

  • Overview of the NTCIR-12 Short Text Conversation Task

    Lifeng Shang, Tetsuya Sakai, Zhengdong Lu, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao

    NTCIR-12 

    Presentation date: 2016

  • Overview of the NTCIR-12 MobileClick Task

    Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita, and Sumio Fujita

    NTCIR-12 

    Presentation date: 2016

  • NEXTI at NTCIR-12 IMine-2 Task

    Hidetsugu Nanba, Tetsuya Sakai, Noriko Kando, Atsushi Keyaki, Koji Eguchi, Kenji Hatano, Toshiyuki Shimizu, Yu Hirate, and Atsushi Fujii

    NTCIR-12 

    Presentation date: 2016

  • SLQAL at the NTCIR-12 QALab-2 Task

    Shin Higuchi, Tetsuya Sakai

    NTCIR-12 

    Presentation date: 2016

  • SLSTC at the NTCIR-12 STC Task

    Hiroto Denawa, Tomoaki Sano, Yuta Kadotami, Sosuke Kato, and Tetsuya Sakai

    NTCIR-12 

    Presentation date: 2016

  • SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask

    Satomi Iijima and Tetsuya Sakai

    NTCIR-12 

    Presentation date: 2016

  • Evaluating Helpdesk Dialogues: Initial Considerations from An Information Access Perspective

    Tetsuya Sakai, Zhaohao Zeng, Cheng Luo

    IPSJ SIG Technical Report 

    Presentation date: 2016

  • word2vecによる発話ベクトルの類似度を用いた対話破綻予測

    河東宗祐, 酒井 哲也

    人工知能学会 音声・言語理解と対話処理研究会(SLUD)第78回研究会 (第7回対話システムシンポジウム)

    Presentation date: 2016

  • TREC 2014 Temporal Summarization Track Overview

    Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Richard McCreadie, and Tetsuya Sakai

    TREC 2014 

    Presentation date: 2015

  • 言語の分散表現による文脈情報を利用した言語横断情報検索

    林佑明, 酒井哲也

    DEIM Forum 2015 

    Presentation date: 2015

  • 情報検索のエラー分析

    難波英嗣, 酒井哲也

    言語処理学会第21回年次大会ワークショップ 

    Presentation date: 2015

  • Topic Set Size Design with the Evaluation Measures for Short Text Conversation

    Tetsuya Sakai

    Presentation date: 2015

  • ECol 2015: First International Workshop on the Evaluation of Collaborative Information Seeking and Retrieval

    Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine

    ACM CIKM 2015 

    Presentation date: 2015

  • TREC 2013 Temporal Summarization

    Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Virgil Pavlu, and Tetsuya Sakai

    TREC 2013 

    Presentation date: 2014

  • 映像入力デバイスを悪用する Android アプリの解析と対策法

    渡邉卓弥, 森達哉, 酒井哲也

    信学技報 

    Presentation date: 2014

  • Androidアプリの説明文とプライバシー情報アクセスの相関分析

    渡邉卓弥, 秋山満昭, 酒井哲也, 鷲崎弘宜, 森達哉

    マルウェア対策研究人材育成ワークショップ 2014 

    Presentation date: 2014

  • Overview of the NTCIR-11 MobileClick Task

    Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Mayu Iwata

    NTCIR-11 

    Presentation date: 2014

  • A Preview of the NTCIR-10 INTENT-2 Results

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, and Mayu Iwata

    IPSJ SIG Technical Report 

    Presentation date: 2013

  • Overview of the NTCIR-10 INTENT-2 Task

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, and Mayu Iwata

    NTCIR-10 

    Presentation date: 2013

  • Overview of the NTCIR-10 1CLICK-2 Task

    Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, and Mayu Iwata

    NTCIR-10 

    Presentation date: 2013

  • Microsoft Research Asia at the NTCIR-10 Intent Task

    Kosetsu Tsukuda, Zhicheng Dou, and Tetsuya Sakai

    NTCIR-10 

    Presentation date: 2013

  • MSRA at NTCIR-10 1CLICK-2

    Kazuya Narita, Tetsuya Sakai, Zhicheng Dou, and Young-In Song

    NTCIR-10 

    Presentation date: 2013

  • How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures

    Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2013

  • モバイル「情報」検索に向けて: NTCIR-11 MobileClickタスクへの誘い

    加藤誠, Matthew Ekstrand-Abueg, Virgil Pavlu, 酒井哲也, 山本岳洋, 岩田麻佑

    人工知能学会第5回インタラクティブ情報アクセスと可視化マイニング研究会 

    Presentation date: 2013

  • 曖昧なクエリと(不)明快なクエリ:NTCIR-10 INTENT-2と1CLICK-2タスクへの誘い

    酒井哲也

    IPSJ SIG Technical Report 

    Presentation date: 2012

  • NTCIR-9総括と今後の展望

    酒井哲也, 上保秀夫, 神門典子, 加藤恒昭, 相澤彰子, 秋葉友良, 後藤功雄, 木村文則, 三田村照子, 西崎博光, 嶋秀樹, 吉岡真治, Shlomo Geva, Ling-Xiang Tang, Andrew Trotman, and Yue Xu

    IPSJ SIG Technical Report 

    Presentation date: 2012

  • Frontiers, Challenges, and Opportunities for Information Retrieval: Report from SWIRL 2012 The Second Strategic Workshop on Information Retrieval in Lorne

    Allan, J, Aslam, J, Azzopardi, L, Belkin, N, Borlund, P, Bruza, P, Callan, J, Carman, M, Clarke, C.L.A, Craswell, N, Croft, W.B, Culpepper, J.S, Diaz, F, Dumais, S, Ferro, N, Geva, S, Gonzalo, J, Hawking, D, Jarvelin, K, Jones, G, Jones, R, Kamps, J, Kando, N, Kanoulas, E, Karlgren, J, Kelly, D, Lease, M, Lin, J, Mizzaro, S, Moffat, A, Murdock, V, Oard, D.W, de Rijke, M, Sakai, T, Sanderson, M, Scholer, F, Si, L, Thom, J.A, Thomas, P, Trotman, A, Turpin, A

    SIGIR Forum 

    Presentation date: 2012

  • The Reusability of a Diversified Search Test Collection

    Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2012

  • One Click One Revisited: Enhancing Evaluation based on Information Units

    Tetsuya Sakai and Makoto P. Kato

    IPSJ SIG Technical Report 

    Presentation date: 2012

  • 複数判定者によるコミュニティQAの良質回答の判定

    石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

    情報知識学会誌 

    Presentation date: 2011

  • Japanese Hyponymy Extraction based on a Term Similarity Graph

    Takuya Akiba and Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2011

  • Overview of NTCIR-9

    Tetsuya Sakai and Hideo Joho

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • Overview of the NTCIR-9 INTENT Task

    Ruihua Song, Min Zhang, Tetsuya Sakai, Makoto P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang and Naoki Orii

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • Overview of NTCIR-9 1CLICK

    Tetsuya Sakai, Makoto P. Kato, and Young-In Song

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • Microsoft Research Asia at the NTCIR-9 1CLICK Task

    Naoki Orii, Young-In Song, and Tetsuya Sakai

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • Microsoft Research Asia at the NTCIR-9 Intent Task

    Jialong Han, Qinglei Wang, Naoki Orii, Zhicheng Dou, Tetsuya Sakai, and Ruihua Song

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • TTOKU Summarization Based Systems at NTCIR-9 1CLICK Task

    Hajime Morita, Takuya Makino, Tetsuya Sakai, Hiroya Takamura, and Manabu Okumura

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • Grid-based Interaction for NTCIR-9 VisEx Task

    Hideo Joho, Tetsuya Sakai

    NTCIR-9 Proceedings 

    Presentation date: 2011

  • NTCIR-9 VisEx におけるグリッド型インタラクションモデルの研究

    上保秀夫, 酒井哲也

    人工知能学会情報編纂研究会第7回研究会 

    Presentation date: 2011

  • Q&Aサイトにおけるベストアンサー推定の分析とその機械学習への応用

    石川大介, 栗山和子, 酒井哲也, 関洋平, 神門典子

    情報知識学会年次大会予稿 

    Presentation date: 2010

  • Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access

    Teruko Mitamura, Hideki Shima, Tetsuya Sakai, Noriko Kando, Tatsunori Mori, Koichi Takeda, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, and Cheng-Wei Lee

    NTCIR-8 Proceedings 

    Presentation date: 2010

  • Overview of NTCIR-8 ACLIA IR4QA

    Tetsuya Sakai, Hideki Shima, Noriko Kando, Ruihua Song, Chuan-Jie Lin, Teruko Mitamura, Miho Sugimoto, and Cheng-Wei Lee

    NTCIR-8 Proceedings 

    Presentation date: 2010

  • NTCIR-GeoTime Overview: Evaluating Geographic and Temporal Search

    Fredric Gey, Ray Larson, Noriko Kando, Jorge Machado, and Tetsuya Sakai

    NTCIR-8 Proceedings 

    Presentation date: 2010

  • Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task

    Daisuke Ishikawa, Tetsuya Sakai, and Noriko Kando

    NTCIR-8 Proceedings 

    Presentation date: 2010

  • Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation

    Tetsuya Sakai, Daisuke Ishikawa, and Noriko Kando

    NTCIR-8 Proceedings 

    Presentation date: 2010

  • Microsoft Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task

    Young-In Song, Jing Liu, Tetsuya Sakai, Xin-Jing Wang, Guwen Feng, Yunbo Cao, Hisami Suzuki, and Chin-Yew Lin

    NTCIR-8 Proceedings 

    Presentation date: 2010

  • Multilinguality at NTCIR, and moving on... (invited talk)

    Tetsuya Sakai  [Invited]

    Proceedings of the COLING 2010 Fourth Workshop on Cross Lingual Information Access 

    Presentation date: 2010

  • EVIA 2010: The Third International Workshop on Evaluating Information Access

    William Webber, Tetsuya Sakai, and Mark Sanderson

    ACM SIGIR Forum 

    Presentation date: 2010

  • ウィキペディアを活用した探検型検索サイトのクエリログ分析

    酒井哲也, 野上謙一

    IPSJ SIG Technical Report 

    Presentation date: 2009

  • NTCIR-7 ACLIA IR4QA Results based on Qrels Version 2

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, and Teruko Mitamura

    NTCIR-7 Online Proceedings 

    Presentation date: 2009

  • EVIA 2008: The Second International Workshop on Evaluating Information Access

    Tetsuya Sakai, Mark Sanderson, and Noriko Kando

    ACM SIGIR Forum 

    Presentation date: 2009

  • Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, and Teruko Mitamura

    IPSJ SIG Technical Report 

    Presentation date: 2009

  • Report on the SIGIR 2009 Workshop on the Future of IR Evaluation

    Jaap Kamps, Shlomo Geva, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

    ACM SIGIR Forum 

    Presentation date: 2009

  • チュートリアル 情報検索テストコレクションと評価指標

    酒井哲也

    IPSJ SIG Technical Report 

    Presentation date: 2008

  • Comparing Metrics across TREC and NTCIR: The Robustness to System Bias

    Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2008

  • Breaking News from NTCIR-7 (in Japanese)

    酒井 哲也, 加藤 恒昭, 藤井 敦, 難波 英嗣, 関 洋平, 三田村照子, 神門典子

    ディジタル図書館編集委員会 

    Presentation date: 2008

  • Are Popular Documents More Likely To Be Relevant? A Dive into the ACLIA IR4QA Pools

    Tetsuya Sakai and Noriko Kando

    Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008) 

    Presentation date: 2008

  • Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access

    Teruko Mitamura, Eric Nyberg, Hideki Shima, Tsuneaki Kato, Tatsunori Mori, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Tetsuya Sakai, Donghong Ji, and Noriko Kando

    NTCIR-7 Proceedings 

    Presentation date: 2008

  • Overview of the NTCIR-7 ACLIA IR4QA Task

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen, and Eric Nyberg

    NTCIR-7 Proceedings 

    Presentation date: 2008

  • 効率的な番組視聴を支援するための話題ラベルの生成とその評価

    小山誠, 酒井哲也, 福井美佳, 上原龍也, 下森大志

    IPSJ SIG Technical Report 

    Presentation date: 2007

  • Toshiba BRIDJE at NTCIR-6 CLIR: The Head/Lead Method and Graded Relevance Feedback

    Tetsuya Sakai, Makoto Koyama, Tatsuya Izuha, Akira Kumano, Toshihiko Manabe, and Tomoharu Kokubu

    NTCIR-6 Proceedings 

    Presentation date: 2007

  • A Further Note on Alternatives to Bpref

    Tetsuya Sakai and Noriko Kando

    IPSJ SIG Technical Report 

    Presentation date: 2007

  • EVIA 2007: The First International Workshop on Evaluating Information Access

    Mark Sanderson, Tetsuya Sakai, and Noriko Kando

    ACM SIGIR Forum 

    Presentation date: 2007

  • Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2006

  • 質問応答型検索の音声認識誤りに対するロバスト性向上

    筒井 秀樹, 真鍋俊彦, 福井 美佳, 藤井 寛子, 浦田 耕二, 酒井哲也

    IPSJ SIG Technical Report 

    Presentation date: 2005

  • 文書分類技法とそのアンケート分析への応用

    平澤茂一, 石田崇, 足立鉱史, 後藤正幸, 酒井哲也

    経営情報学会2005年度春季全国研究発表大会 

    Presentation date: 2005

  • インターネットを用いた研究支援環境~情報検索システム~

    石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

    経営情報学会2005年度春季全国研究発表大会 

    Presentation date: 2005

  • 質問応答システムの正解順位とユーザ満足率の関係について

    國分智晴, 酒井哲也, 齋藤 佳美, 筒井 秀樹, 真鍋俊彦, 藤井寛子

    IPSJ SIG Technical Report 

    Presentation date: 2005

  • 教学支援システムに関する学生アンケートの分析

    渡辺智幸, 後藤正幸, 石田崇, 酒井哲也, 平澤茂一

    FIT 2005 一般講演論文集 

    Presentation date: 2005

  • The Effect of Topic Sampling in Sensitivity Comparisons of Information Retrieval Metrics

    Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2005

  • Toshiba BRIDJE at NTCIR-5: Evaluation using Geometric Means

    Tetsuya Sakai, Toshihiko Manabe, Akira Kumano, Makoto Koyama, and Tomoharu Kokubu

    NTCIR-5 Proceedings 

    Presentation date: 2005

  • 質問応答技術に基づくマルチモーダルヘルプシステム

    浦田 耕二, 福井美佳, 藤井寛子, 鈴木優, 酒井哲也, 齋藤佳美, 市村 由美, 佐々木寛

    IPSJ SIG Technical Report 

    Presentation date: 2004

  • 質問応答と,日本語固有表現抽出および固有表現体系の関係についての考察

    市村由美, 齋藤佳美, 酒井哲也, 國分智晴, 小山誠

    IPSJ SIG Technical Report 

    Presentation date: 2004

  • Toshiba BRIDJE at NTCIR-4 CLIR: Monolingual/Bilingual IR and Flexible Feedback

    Tetsuya Sakai, Makoto Koyama, Akira Kumano, and Toshihiko Manabe

    NTCIR-4 Proceedings 

    Presentation date: 2004

  • Toshiba ASKMi at NTCIR-4 QAC2

    Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, and Tomoharu Kokubu

    NTCIR-4 Proceedings 

    Presentation date: 2004

  • 自然言語表現に基づく学生アンケート分析システム

    酒井哲也, 石田崇, 後藤正幸, 平澤茂一

    FIT 2004 一般講演論文集 N-021 

    Presentation date: 2004

  • 新聞記事からの用語定義の抽出と固有表現クラスに基づく分類

    小山誠, 酒井哲也, 真鍋俊彦

    IPSJ SIG Technical Report 

    Presentation date: 2004

  • High-Precision Search via Question Abstraction for Japanese Question Answering

    Tetsuya Sakai, Yoshimi Saito, Tomoharu Kokubu, Makoto Koyama, and Toshihiko Manabe

    IPSJ SIG Technical Report 

    Presentation date: 2004

  • 情報検索技術を用いた選択式・自由記述式の学生アンケート解析

    石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

    経営情報学会2004年度秋季全国研究発表大会 

    Presentation date: 2004

  • A Note on the Reliability of Japanese Question Answering Evaluation

    Tetsuya Sakai

    IPSJ SIG Technical Report 

    Presentation date: 2004

  • 情報検索技術を用いた効率的な授業アンケートの分析

    酒井哲也, 伊藤潤, 後藤正幸, 石田崇, 平澤茂一

    経営情報学会2003年度春季全国研究発表大会 

    Presentation date: 2003

  • 選択式・記述式アンケートからの知識発見

    後藤正幸, 酒井哲也, 伊藤潤, 石田崇, 平澤茂一

    2003 PCカンファレンス 

    Presentation date: 2003

  • 授業に関する選択式・記述式アンケートの分析

    平澤茂一, 石田崇, 伊藤潤, 後藤正幸, 酒井哲也

    私立大学情報教育協会平成15年度大学情報化全国大会 

    Presentation date: 2003

  • PLSIを利用した文書からの知識発見

    伊藤潤, 石田崇, 後藤正幸, 酒井哲也, 平澤茂一

    FIT 2003 一般講演論文集 

    Presentation date: 2003

  • 質問応答システムにおけるパッセージ検索の評価

    國分智晴, 酒井哲也

    FIT 2003 一般講演論文集 

    Presentation date: 2003

  • Toshiba KIDS at NTCIR-3: Japanese and English-Japanese IR

    Tetsuya Sakai, Makoto Koyama, Mika Suzuki, and Toshihiko Manabe

    NTCIR-3 Proceedings 

    Presentation date: 2003

  • ベイズ統計を用いた文書ファイルの自動分析手法

    後藤正幸, 伊藤潤, 石田崇, 酒井哲也, 平澤茂一

    経営情報学会2003年度秋季全国研究発表大会 

    Presentation date: 2003

  • 授業モデルとその検証

    石田崇, 伊藤潤, 後藤正幸, 酒井哲也, 平澤茂一

    経営情報学会2003年度秋季全国研究発表大会 

    Presentation date: 2003

  • 係り受け木を用いた日本語文書の重要部分抽出

    伊藤潤, 酒井哲也, 平澤茂一

    IPSJ SIG Technical Report 

    Presentation date: 2003

  • Flexible Pseudo-Relevance Feedback for NTCIR-2

    Tetsuya Sakai, Stephen E. Robertson, and Stephen Walker

    NTCIR-2 

    Presentation date: 2001

  • Generic Summaries for Indexing in Information Retrieval - Detailed Test Results

    Tetsuya Sakai and Karen Sparck Jones

    Computer Laboratory, University of Cambridge 

    Presentation date: 2001

  • インターネットを用いた研究活動支援システム

    平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

    2001 PCカンファランス 

    Presentation date: 2001

  • Cross-language情報検索のためのBMIR-J2を用いた一考察

    酒井 哲也, 梶浦 正浩, 住田 一男

    IPSJ SIG Technical Report 

    Presentation date: 1999

  • Probabilistic Retrieval of Japanese News Articles for IREX at Toshiba

    Tetsuya Sakai, Masaharu Kajiura, and Kazuo Sumita

    IREX Workshop 

    Presentation date: 1999

  • Cross-Language Information Retrieval for NTCIR at Toshiba

    Tetsuya Sakai, Yasuyo Shibazaki, Masaru Suzuki, Masaharu Kajiura, Toshihiko Manabe, Kazuo Sumita

    NTCIR-1 

    Presentation date: 1999

  • BMIR-J2: A Test Collection for Evaluation of Japanese Information Retrieval Systems

    Tetsuya Sakai, Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuro Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata, and Noriko Kando

    ACM SIGIR Forum 

    Presentation date: 1999

  • First Experiments on the BMIR-J2 Collection using the NEAT System

    Gareth Jones, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

    IPSJ SIG Technical Report 

    Presentation date: 1998

  • Cross-Language Information Access: a case study for English and Japanese

    Gareth Jones, Nigel Collier, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita, and Hideki Hirakawa

    IPSJ SIG Technical Report 

    Presentation date: 1998

  • 日本語情報検索システム評価用テストコレクションBMIR-J2

    木谷強, 小川泰嗣, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

    IPSJ SIG Technical Report 

    Presentation date: 1997

  • 情報フィルタリングシステムNEATの開発

    梶浦正浩, 三池誠司, 酒井哲也, 佐藤誠, 住田一男

    第54回情報処理学会全国大会 

    Presentation date: 1997

  • ベンチマークBMIR-J2を用いた情報フィルタリングシステムNEATの評価

    酒井哲也, 梶浦正浩, 三池誠司, 佐藤誠, 住田一男

    第54回情報処理学会全国大会 

    Presentation date: 1997

  • 情報フィルタリングシステムNEATのための検索要求文からのプロファイル生成

    酒井 哲也, 梶浦 正浩, 住田 一男

    IPSJ SIG Technical Report 

    Presentation date: 1997

  • 電子図書館のための効率的な文書検索

    住田一男, 酒井哲也, 小野顕司, 三池誠司

    ディジタル図書館 No.3 

    Presentation date: 1995

  • 文書検索システムの動的抄録提示インタフェースの評価

    酒井 哲也, 三池 誠司, 住田 一男

    情報処理学会研究報告ヒューマンコンピュータインタラクション 

    Presentation date: 1994

Research Projects

  • ナゲットに基づくタスク指向対話の自動評価に関する研究

    Project Year: 2017.04 - 2021.03

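    We ran the Short Text Conversation (STC-3) task at the competition-style international conference NTCIR-14 on schedule, and received result submissions from 12 research institutions, including the Sakai Laboratory at Waseda University. The task is to estimate the quality of customer-helpdesk dialogues, and this technology could in the future be applied to the response strategies of dialogue systems. We presented the evaluation method for the task at SIGIR, the premier information retrieval conference, and described the dataset in the Journal of Information Processing. The latter was selected as a best paper runner-up at WebDB Forum 2018.
    ・Zeng, Z., Luo, C., Shang, L., Li, H., and Sakai, T.: Towards Automatic Evaluation of Customer-Helpdesk Dialogues, Journal of Information Processing, Volume 26, pp.768-778, refereed, 2018. WebDB Forum 2018 Best Paper Runner-up
    ・Sakai, T.: Comparing Two Binned Probability Distributions for Information Access Evaluation, Proceedings of ACM SIGIR 2018, pp.1073-1076, refereed, 2018.
    We were able to run the task according to the following schedule: April: data crawling and annotation tool development; May-August: data annotation; September: release of training data; November: release of evaluation data and result submission deadline; February: release of the draft task overview paper; March: submission of draft task participant papers.
    The plan for fiscal year 2019 is as follows.
    ・Present our research results at NTCIR-14 as task organizers and as task participants
    ・Proceed with the Chinese-to-English translation of the dialogue dataset DCH-1 so that a wider range of dialogue researchers can use it
    ・Design and propose a dialogue task for NTCIR-15, 推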

  • Exploratory Search Considering the User's Situation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2016.04 - 2020.03

    Kando Noriko

    The purpose of this study is to propose interactive exploratory search technologies that support users according to each user's situation. First, we conducted multiple laboratory-based and crowdsourced user studies and investigated various factors that affect the search process and users' perceived satisfaction, such as the cognitive complexity of the search tasks. Second, we investigated and tested a wide variety of tools and underlying technologies that support users' exploratory search, such as inference of the user's situation based on logs and/or eye gaze, query recommendation, search result diversification, multi-facet search, and ostensive search. Third, we developed a prototype of an interactive exploratory search and guide app for museums based on the ostensive search model. Using users' logs from both the virtual space of the app and the physical space of the museum, it encouraged an unforgettable museum experience and introduced context before and after museum visits.

Misc

  • Developing an Interactive Analysis System for Voice Assistant Apps

    刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

    電子情報通信学会技術研究報告(Web)   120 ( 384(ICSS2020 26-59) )  2021

    J-GLOBAL

  • Overview of the NTCIR-12 MobileClick-2 Task.

    Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita, Sumio Fujita

    Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016    2016

  • Overview of the NTCIR-11 MobileClick Task.

    Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-11, National Center of Sciences, Tokyo, Japan, December 9-12, 2014    2014

  • A Preview of the NTCIR-10 INTENT-2 Results

      2013 ( 5 ) 1 - 8  2013.02

    CiNii

  • Overview of the NTCIR-10 1CLICK-2 Task.

    Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-10, National Center of Sciences, Tokyo, Japan, June 18-21, 2013     182 - 211  2013

  • Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

    Hajime Morita, Tetsuya Sakai, Manabu Okumura

      5 ( 2 ) 11 - 16  2012.06

    CiNii

  • Determination of the Quality of Answers in Community QA by Multiple Assessors

    ISHIKAWA Daisuke, SAKAI Tetsuya, SEKI Yohei, KURIYAMA Kazuko, KANDO Noriko

    Journal of Japan Society of Information and Knowledge   21 ( 2 ) 169 - 177  2011.05

    Community Question Answering (CQA) has recently become a popular means of satisfying personal information needs. However, as the quality of answers posted on CQA sites varies widely, there is a need to effectively extract high-quality answers from CQA. In this study, we manually analyzed the high-quality answers from Yahoo! Chiebukuro data by multiple assessors and identified criteria used by assessors to evaluate high-quality answers.

    DOI CiNii

  • Overview of the NTCIR-9 INTENT Task.

    Ruihua Song, Min Zhang, Tetsuya Sakai, Makoto P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang, Naoki Orii

    Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NTCIR-9, National Center of Sciences, Tokyo, Japan, December 6-9, 2011     82 - 105  2011

  • Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments

    SAKAI TETSUYA, KANDO NORIKO, LIN CHUAN-JIE, SONG RUIHUA, SHIMA HIDEKI, MITAMURA TERUKO

      2009 ( 9 ) 1 - 8  2009.07

    CiNii

  • Hobby & Work

    SAKAI Tetsuya

    IPSJ Magazine   49 ( 7 ) 835 - 835  2008.07

    CiNii

  • Comparing Metrics across TREC and NTCIR : The Robustness to System Bias

    SAKAI Tetsuya

    IPSJ SIG Notes   2008 ( 56 ) 1 - 8  2008.06

    Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that they are not necessarily superior to traditional metrics in the presence of system bias. Using data from both TREC and NTCIR, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'.

    CiNii

  • Information Retrieval Test Collections and Evaluation Metrics: A Tutorial

    SAKAI Tetsuya

    IPSJ SIG Notes   2008 ( 4 ) 1 - 8  2008.01

    CiNii

  • A Further Note on Alternatives to Bpref

    SAKAI Tetsuya, KANDO Noriko

    IPSJ SIG Notes   2007 ( 109 ) 7 - 14  2007.11

    This paper compares the robustness of information retrieval (IR) metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs, two from TREC and two from NTCIR. We investigate the effect of reducing the original relevance data on discriminative power (i.e., how often statistical significance can be detected given the probability of Type I Error) and on Kendall's rank correlation between two system rankings. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also clarify some properties of these metrics that immediately follow from their definitions.

    CiNii

  • After Q and R Comes O, then P…

    SAKAI Tetsuya

    IPSJ Magazine   48 ( 7 ) 761 - 761  2007.07

    CiNii

  • Automatic Generation of Topic Labels for Efficient Video Viewing

    KOYAMA Makoto, SAKAI Tetsuya, FUKUI Mika, UEHARA Tatsuya, SHIMOMORI Taishi

    IPSJ SIG Notes   2007 ( 34 ) 17 - 23  2007.03

    This paper describes a method for generating keyword, phrase and sentence labels for video segments of TV programs. By using a relevance feedback algorithm in information retrieval, it selects topic keywords, phrases and sentences from closed caption text in each topical segment. 39 subjects evaluated keyword, phrase and sentence labels from TV programs about travel, town and cooking. The results show that keyword and phrase labels achieve better results than sentence labels on understandability and relevance of labels.

    CiNii

  • Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    SAKAI Tetsuya

    IPSJ SIG Notes   2006 ( 94 ) 57 - 64  2006.09

    Large-scale information retrieval evaluation efforts such as TREC and NTCIR have always used binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evaluation with graded relevance.

    CiNii

  • Improving the Robustness to Recognition Errors in Speech Input Question Answering

    TSUTSUI Hideki, MANABE Toshihiko, FUKUI Mika, FUJII Hiroko, URATA Koji, SAKAI Tetsuya

    IPSJ SIG Notes   2005 ( 22 ) 31 - 38  2005.03

    We have been developing a multimodal question answering system that combines the search technology for multimodal contents with high expressive power such as video, speech and text, and the factoid question answering technology for understanding the user's information need and extracting exact answers from text. Failure analyses of our system showed that speech recognition errors were fatal for answer type recognition and therefore for the final Mean Reciprocal Rank (MRR) performance, especially with numerical factoid questions. We therefore propose a new method which is robust to speech recognition errors. This method improves our MRR based on top 3 answers from 0.429 to 0.597.

    CiNii

  • A Note on the Reliability of Japanese Question Answering Evaluation

    SAKAI Tetsuya

    IPSJ SIG Technical Reports   2004 ( 119 ) 57 - 64  2004.11

    This paper compares some existing QA evaluation metrics from the viewpoint of reliability and usefulness, using the NTCIR-4 QAC2 Japanese QA tasks and our adaptations of Buckley/Voorhees and Voorhees/Buckley reliability measurement methods. Our main findings are: (1) The fraction of questions with a correct answer within Top 5 (NQcorrect5) and that with a correct answer at Rank 1 (NQcorrect1) are not as stable as Reciprocal Rank based on ranked lists containing up to five answers. (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is as reliable and useful as Reciprocal Rank, provided that a mild gain value assignment is used. Emphasising answer correctness levels tends to hurt stability, while handling multiple correct answers improves it.

    CiNii

  • High-Precision Search via Question Abstraction for Japanese Question Answering

    SAKAI Tetsuya, SAITO Yoshimi, KOKUBU Tomoharu, KOYAMA Makoto, MANABE Toshihiko

    IPSJ SIG Notes   2004 ( 93 ) 139 - 146  2004.09

    This paper explores the use of Question Abstraction, i.e., Named Entity Recognition for questions input by the user, for reranking retrieved documents to enhance retrieval precision for Japanese Question Answering (QA). Question Abstraction may help improve precision because (a) as named entities are often phrases, it may have effects that are similar to phrasal or proximity search; and (b) as named entity recognition is context-sensitive, the named entity tags may help disambiguate ambiguous terms and phrases. Our experiments using several Japanese "exact answer" QA test collections show that this approach significantly improves IR precision, but that this improvement is not necessarily carried over to the overall QA performance. Additionally, we conduct preliminary experiments on the use of Question Abstraction for Pseudo-Relevance Feedback using Japanese IR test collections, and find positive (though not statistically significant) effects. Thus the Question Abstraction approach probably deserves further investigation.

    CiNii

  • Extraction and Classification of Term Definitions Using Named Entity Extraction from News Articles

    KOYAMA Makoto, SAKAI Tetsuya, MANABE Toshihiko

    IPSJ SIG Notes   2004 ( 93 ) 45 - 51  2004.09

    In this paper, we propose a system that uses Japanese newspaper corpora for extracting and classifying term definitions to expand the knowledge of a natural language system such as a question answering system. The system classifies term definitions based on semantic classes obtained through named entity extraction and words obtained through morphological analysis. In an experiment using news articles, the system classifies term definitions by 14 semantic classes and achieves 82.1% precision and 50.8% recall.

    CiNii

  • N-021 A Student Questionnaire Analysis System based on Natural Language Expressions

    Sakai Tetsuya, Ishida Takashi, Goto Masayuki, Hirasawa Shigeichi

      3 ( 4 ) 325 - 328  2004.08

    CiNii

  • Japanese Text Extraction using the Dependency Tree

    ITO Jun, SAKAI Tetsuya, HIRASAWA Shigeichi

    IPSJ SIG Notes   2003 ( 108 ) 19 - 24  2003.11

    A Japanese sentence can be expressed as a tree structure (dependency tree) based on dependency relations. Since a subtree of a dependency tree preserves the dependency of the original tree, it generally represents a correct sentence on its own. In this paper, a document is expressed as an extended dependency tree, in which weights are assigned to its nodes and edges. Moreover, the problem of extracting important text fragments is formalized as that of "searching for a subtree that maximizes a certain score from subtrees of the extended dependency tree". We implemented such a summarization system and performed evaluations based on manual assessment as well as comparison with original texts.

    CiNii

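  • An Automatic Analysis Method for Document Files Using Bayesian Statistics

    GOTO Masayuki, ITO Jun, ISHIDA Takashi, SAKAI Tetsuya

    Proceedings of the Japan Society for Management Information 2003 Autumn National Conference, Hakodate   pp. 28-31  2003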

  • "An Internet-Based Research Activity Support System": System Configuration

    平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

    2001 PC Conference    2001

  • A Study on Cross-language Information Retrieval using BMIR-J2

    SAKAI Tetsuya, KAJIURA Masahiro, SUMITA Kazuo

    IPSJ SIG Notes   1999 ( 2 ) 41 - 48  1999.01

    We study a cross-language IR approach using the NEAT information filtering system and the AS-TRANSAC machine translation system. The BMIR-J2 standard Japanese test collection and our own translated data are used for evaluation. In the English-to-Japanese experiments, we consider both document translation and query translation, and also compare the retrieval performance when the queries are translated by different translators. In the Japanese-to-pseudo-English experiments, we perform local feedback both before and after query translation. We achieve over 90% of Japanese monolingual performance.

    CiNii

  • BMIR-J2: A Test Collection for Evaluation of Japanese Information Retrieval Systems

    KITANI Tsuyoshi, OGAWA Yasushi, ISHIKAWA Tetsuya, KIMOTO Haruo, NAKAWATASE Hidekazu, KESHI Ikuo, TOYOURA Jun, FUKUSHIMA Toshikazu, MATSUI Kunio, UEDA Yoshihiro, SAKAI Tetsuya, TOKUNAGA Takenobu, TSURUOKA Hiroshi, AGATA Teru

    IPSJ SIG Notes   1998 ( 2 ) 15 - 22  1998.01

    BMIR-J2, a test collection for evaluation of Japanese information retrieval systems to be released in March 1998, has been developed by a working group under the Special Interest Group on Database Systems in Information Processing Society of Japan. Since March 1996, a preliminary version called BMIR-J1 has been distributed to fifty sites and used in many research projects. Based on comments from the BMIR-J1 users and our experience, we have enlarged the collection size and revised search queries and relevance assessments in BMIR-J2. In this paper, we describe BMIR-J2 and its development process, and discuss issues to be considered for improving BMIR-J2 further.

    CiNii

  • Profile Generation from Query Sentences for the NEAT Information Filtering System

    SAKAI Tetsuya, KAJIURA Masahiro, SUMITA Kazuo

    IPSJ SIG Notes   1997 ( 86 ) 83 - 88  1997.09

    The NEAT information filtering system selects relevant articles from digital text provided daily by Japanese newspaper companies and publishers, and sends them by e-mail to its users. NEAT calculates a score for each article and produces a ranked output based on various types of query vectors written in the profile, such as location, density and distribution of keywords as well as boolean operators. We show that profiles generated automatically from query sentences can lie halfway between simple boolean profiles and hand-made profiles with respect to retrieval effectiveness. By combining this method and relevance feedback, the burden of manual profile definition will be lightened considerably.

    CiNii

  • Effective Document Retrieval for Digital Library: Document Structure Analysis and Automatic Abstract Generation

    Sumita Kazuo, Sakai Tetsuya, Ono Kenji, Miike Seiji

    Digital libraries   3   35 - 41  1995.03

    CiNii

  • Learning formal languages from Feasible Teachers

    Journal of Japan Industrial Management Association   44 ( 3 ) 245 - 245  1993.08

    CiNii

 

Syllabus

 

Sub-affiliation

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

Research Institute

  • 2022 - 2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

Internal Special Research Projects

  • Building an Information Access Evaluation Framework Based on Bayesian Statistics

    2017  

    I published the following full paper at SIGIR 2017, the top conference in information retrieval. The following is the abstract: Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as “probability that the hypothesis is (in)correct” and “probability that the true parameter value drops within the interval is 95%,” we can easily obtain P(H|D) and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just “equality of means,” and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass’s delta.

  • Cross-Test-Collection Evaluation in Information Retrieval and Evaluation of Information Retrieval Papers Using Statistical Methods

    2016  

    I published five international conference papers (SIGIR, SIGIR, SIGIR (short), ICTIR, AIRS), two international workshop papers (EVIA, EVIA), and a workshop report (SIGIR Forum). Moreover, I gave a tutorial at an international conference (ICTIR) and a keynote at a Japanese symposium (IPSJ SIGNL) on this topic.

  • Research on Information Retrieval Technology for "Reticent Users"

    2015  

    We published one international journal paper, one international conference paper, one evaluation conference overview (TREC), and two unrefereed domestic papers.

  • Systematisation and Evaluation of Information Access Evaluation Infrastructure

    2015  

    We published one book, one international journal paper, one international conference paper, one domestic IPSJ workshop paper and organised an international workshop.

  • Research on Sample Size Design for Test Collections

    2014  

    We published three refereed papers (two for international conferences and one for a domestic conference) on how to determine the topic set size of a test collection.

  • Systematisation and Validation of Search Engine Evaluation Metrics

    2014  

    We published three refereed papers (two for international conferences and one for a domestic conference) on how to determine the topic set size of a test collection.

  • Research on Information Access via Minimal Interactions

    2014   Koji Yatani, Makoto P. Kato, Takehiro Yamamoto, Virgil Pavlu, Javed Aslam, Fernando Diaz

    We collaborated with various researchers from outside Waseda and published several papers related to information access via minimal interactions. We ran a task called MobileClick at NTCIR and a track called Temporal Summarization at TREC. It is worth noting that our MobileHCI paper (collaboration with the University of Tokyo) received an Honourable Mention Award.
