研究者詳細 - 酒井　哲也

写真a

サカイ　テツヤ

酒井　哲也

Scopus 論文情報

論文数: 207 Citation: 2457 h-index: 28

Click to view the Scopus page. The data was downloaded from Scopus API in October 23, 2025, via http://api.elsevier.com and http://www.scopus.com .

Google Scholar 情報（Citations per year）

Citation: 7608 h-index: 47 i10-index: 168

Click to view the Google Scholar page.

Scopus 情報

News & Topics

2024.10.03

見えてきた医理工連携の成果と展開（第4回日本医科大学・早稲田大学合同シンポジウム開催報告）

2023.10.16

進む、医理工研究交流（第３回日本医科大学・早稲田大学合同シンポジウム開催報告）

所属

理工学術院基幹理工学部

職名

教授

学位

博士

メールアドレス

ホームページ

http://sakailab.com/tetsuya/

プロフィール

http://sakailab.com/tetsuya/

研究分野

ヒューマンインタフェース、インタラクション

研究キーワード

情報アクセス、情報検索、自然言語処理

受賞

SIGIR Academy 2023

2023年05月 ACM SIGIR
DEIM 2020 Excellent Paper Award (second author)

2020年
FIT 2020 Excellent Paper Award (first author)

2020年
CSS 2019 Best Paper Award (fifth author)

2019年
ACM Distinguished Member

2018年
第6回WASEDA e-Teaching Award

2018年
ACM Recognition of Service Award (SIGIR'17 Co-chair)

2017年
ACM Senior Member

2016年
早稲田大学ティーチングアワード総長賞（2016年度春学期）

2016年
早稲田大学ティーチングアワード（2014年度秋学期）

2015年
CSS 2014 Student Paper Award (third author)

2014年
MobileHCI 2014 Honorable Mention(second author)

2014年
ACM SIGIR 2013 best paper shortlisted nominee (first author)

2013年
AIRS 2012 Best Paper Award (first author)

2012年
WebDB Forum 2010 Excellent Paper Award and NTT Resonant Award (second author)

2010年
FIT 2008 Funai Best Paper Award (first author)

2008年
IEICE ESS 2007 Merit Award

2007年
IPSJ 2007 Best Paper Award (single author)

2007年
IPSJ 2006 Yamashita SIG Research Award (single author)

2006年
IPSJ 2006 Best Paper Award (single author)

2006年
FIT 2005 Excellent Paper Award (single author)

2005年

▼全件表示

論文

Click the search button and be happy: Evaluating direct and immediate information access

Tetsuya Sakai, Makoto P. Kato, Young-In Song

International Conference on Information and Knowledge Management, Proceedings 621 - 630 2011年 [査読有り]

　概要を見る

We define Direct Information Access as a type of information access where there is no user operation such as clicking or scrolling between the user's click on the search button and the user's information acquisition
we define Immediate Information Access as a type of information access where the user can locate the relevant information within the system output very quickly. Hence, a Direct and Immediate Information Access (DIIA) system is expected to satisfy the user's information need very quickly with its very first response. We propose a nugget-based evaluation framework for DIIA, which takes nugget positions into account in order to evaluate the ability of a system to present important nuggets first and to minimise the amount of text the user has to read. To demonstrate the integrity, usefulness and limitations of our framework, we built a Japanese DIIA test collection with 60 queries and over 2,800 nuggets as well as an offset-based nugget match evaluation interface, and conducted experiments with manual and automatic runs. The results suggest our proposal is a useful complement to traditional ranked retrieval evaluation based on document relevance. © 2011 ACM.

DOI

Scopus

22

被引用数

(Scopus)
LLM-assisted Relevance Assessments: When should We Ask LLMs for Help?

Rikiya Takehi, Ellen M. Voorhees, Tetsuya Sakai, Ian Soboroff

Proceedings of SIGIR 2025 95 - 105 2025年 [査読有り]
My System Is As Effective As Yours: Reproducibility, Sustainability, and More

酒井哲也

Proceedings of SIGIR 2025 3943 - 3953 2025年 [査読有り]
COPWA at the NTCIR-18 FairWeb-2 Task

Amogh Raina, Tetsuya Sakai

Proceedings of NTCIR-18 80 - 83 2025年 [査読有り]
RSLFW at the NTCIR-18 FairWeb-2 Task

Atsuya Ishikawa, Sijie Tao, Tetsuya Sakai

Proceedings of NTCIR-18 68 - 71 2025年 [査読有り]
Overview of the NTCIR-18 FairWeb-2 Task

Sijie Tao, Tetsuya Sakai, Junjie Wang, Hanpei Fang, Yuxiang Zhang, Haitao Li, Yiteng Tu, Nuo Chen, Maria Maistro

Proceedings of NTCIR-18 40 - 60 2025年 [査読有り]
Evaluating Group Fairness and Relevance in Conversational Search

Tetsuya Sakai, Sijie Tao, Young-In Song

An Alternative Formulation, Procedings of EVIA 2025 15 - 22 2025年 [査読有り]
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentated Generation

Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou

Findings of NAACL 2025 1308 - 1330 2025年 [査読有り]
Explainable Detection of Logical and Structural Anomalies based on Multimodal Large Language Models

Noeko Fujii, Tetsuya Sakai

ROBOVIS 2025 2025年 [査読有り]
Reconstruction of 3D Brain Structures from Clinical 2D MRI Data

Rui Shi, Tsukasa Koike, Tetsuro Sekine, Akio Morita, Tetsuya Sakai

ICPRAM 2025, SciTePress, INSTICC, 2025 352 - 259 2025年 [査読有り]
Understanding User Behavior and Measuring System Vulnerability

Nuo Chen, Jiqun Liu, Hanpei Fang, Yuankai Luo, Tetsuya Sakai, Xiao-Ming Wu

ACM TOIS 2025年 [査読有り]
Evaluating System Responses Based On Overconfidence and Underconfidence

Tetsuya Sakai

SIGIR-AP 2024 Workshops EMTCIR 2024 and UM-CIR 2024年 [査読有り]

担当区分：筆頭著者
Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Large Language Models

Yuxiang Zhang, Xin Fan, Junjie Wang, Chongxian Chen, Fan Mo, Tetsuya Sakai, Hayato Yamana

ACM SIGIR-AP 226 - 235 2024年 [査読有り]
AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment

Nuo Chen, Jiqun Liu, Xiaoyu Dong, Qijiong Liu, Tetsuya Sakai, Xiao-Ming Wu

ACM SIGIR-AP 56 - 63 2024年 [査読有り]
Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval

Kelong Mao, Chenlong Deng, Haonan Chen, Fengran Mo, Zheng Liu, Tetsuya Sakai, Zhicheng Dou

EMLNLP 1227 - 1240 2024年 [査読有り]
A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen WAN, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

EMLNLP 11388 - 11422 2024年 [査読有り]
Benchmarking Chinese Text-to-Table Performance in Large Language Models

Haoxiang Shi, Jiaan Wang, Jiarong Xu, Cen Wang, Tetsuya Sakai

arxiv:2405.12174 2024年 [査読有り]
Solving Named Entity Recognition Problems via a Single-stream Reasoner

Yuxiang Zhang, Junjie, Wang, Xinyu Zhu, Tetsuya Sakai, Hayato Yamana

ACM TOIS 42 ( 5 ) 2024年 [査読有り]
Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models

Qijiong Liu, Nuo Chen, Tetsuya Sakai, Xiao-Ming Wu

ACM WSDM 452 - 461 2024年 [査読有り]
Enhancing Parameter Efficiency in Model Inference using an Ultralight Inter-Transformer Linear Structure

Haoxiang Shi, Tetsuya Sakai

IEEE Access 12 43734 - 43746 2024年 [査読有り]
Modeling Multimodal Uncertainties via Probability Distribution Encoders included Vision-Language Models

Junjie Wang, Yatai, Ji, Yuxiang Zhang, Yanru Zhu, Tetsuya Sakai

IEEE Access 12 420 - 434 2024年 [査読有り]
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple-Choice Perspective

Junjie Wang, Ping Yang, Ruxi Gan, Yuxiang Zhang, Jiaxing Zhang, Tetsuya Sakai

IEEE Access 11 142829 - 142845 2023年 [査読有り]
Ethical Alignment Meets Conversational Information Retrieval

Yiyao Yu, Junjie Wang, Yuxiang Zhang, Lin Zhang, Yujiu Yang, Tetsuya Sakai

ACM SIGIR-AP 32 - 39 2023年 [査読有り]
Deriving Nugget-level Scores from Turn-level Scores

Rikiya Takehi, Akihisa Watanabe, Tetsuya Sakai

ACM SIGIR-AP 40 - 45 2023年 [査読有り]
Chuweb21D: A Deduped English Document Collection for Web Search Tasks

Zhumin Chu, Tetsuya Sakai, Qingyao Ai, Yiqun Liu

ACM SIGIR-AP 63 - 72 2023年 [査読有り]
Fairness-based Evaluation of Conversational Search: A Pilot Study

Tetsuya Sakai

EVIA 5 - 13 2023年 [査読有り]
Decoy Effect in Search Interaction: A Pilot Study

Nuo Chen, Jiqun Liu, Tetsuya Sakai, Xiao-Ming Wu

EVIA 14 - 16 2023年 [査読有り]
On A Few Responsibilities of (IR) Researchers (Fairness, Awareness, and Sustainability), A keynote at ECIR 2023

Tetsuya Sakai

SIGIR Forum 2023年 [査読有り]
Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

Haoxiang Shi, Sumio Fujita, Tetsuya Sakai

SIGIR Workshop on ReNeuIR 2023年 [査読有り]
Self-supervised and Few-shot Contrastive Learning Frameworks for Text Clustering

Haoxiang Shi, Tetsuya Sakai

IEEE 11 84134 - 84143 2023年 [査読有り]
On the Ordering of Pooled Web Pages, Gold Assessments, and Bronze Assessments

Tetsuya Sakai, Sijie Tao, Nuo Chen, Yujing Li, Maria Maistro, Zhumin Chu, Nicola Ferro

ACM TOIS 2023年 [査読有り]
How Many Crowd Workers Do I Need? On Statistical Power When Crowdsourcing Relevance Judgments

Kevin Roitero, David La Barbera, Michael Soprano, Gianluca Demartini, Stefano Mizzaro, Tetsuya Sakai

ACM TOIS 2023年 [査読有り]
A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

Tetsuya Sakai, Jin Young Kim, Inho Kang

ACM TOIS 2023年 [査読有り]
Practice and Challenges in Building a Business-oriented Search Engine Quality Metric

Nuo Chen, Donghyun Park, Hyungae Park, Kijun Choi, Tetsuya Sakai, Jinyoung Kim

SIGIR 2023 2023年 [査読有り]
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

CVPR 2023 2023年 [査読有り]
A Reference-Dependent Model for Web Search Evaluation

Nuo Chen, Jiqun Liu, Tetsuya Sakai

The 2023 ACM Web Conference 2023年 [査読有り]
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

arXiv 2022年 [査読有り]
Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai

EMNLP 2022 2022年 [査読有り]
Understanding the Behavior Transparency of Voice Assistant Applications Using the ChatterBox Framework

Atsuko Natatsuka, Ryo Iijima, Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Tatsuya Mori

Proceedings of RAID 2022 2022年 [査読有り]
MAP: Modality-Agnostic Uncertainty-Aware Vision-Language Pre-training Model

Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang

arXiv 2022年 [査読有り]
Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks

Tetsuya Sakai, Sijie Tao, Maria Maistro, Zhumin Chu, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

arXiv 2022年 [査読有り]
LayerConnect: Hypernetwork-Assisted Inter-Layer Connector to Enhance Parameter Efficiency

Haoxiang Shi, Rongsheng Zhang, Jiaan Wang, Cen Wang, Guandan Chen, Yinhe Zheng, Tetsuya Sakai

Proceedings of COLING 2022 2022年 [査読有り]
Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?

Rei Shimizu, Sumio Fujita, Tetsuya Sakai

Proceedings of ACM ICTIR 2022 2022年 [査読有り]
Constructing Better Evaluation Metrics by Incorporating the Anchoring Effect into the User Model

Nuo Chen, Fan Zhang, Tetsuya Sakai

ACM SIGIR 2022 2022年 [査読有り]
Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization

Yuji Naraki, Tetsuya Sakai, Yoshihiko Hayashi

LREC 2022 2022年 [査読有り]
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai

CVPR 2022 2022年 [査読有り]
スタンス検出タスクにおける評価方法の選定 (研究会推薦論文)

雨宮佑基, 酒井哲也

電子情報通信学会和文論文誌D「データ工学と情報マネジメント特集」 2022年 [査読有り]
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

ACM TOIS 2022年 [査読有り]
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai

Findings of the Association for Computational Linguistics: EMNLP 2021 2021年 [査読有り]
A Closer Look at Evaluation Measures for Ordinal Quantification

Tetsuya Sakai

Proceedings of the CIKM 2021 Workshops 2021年 [査読有り]
Evaluating Relevance Judgments with Pairwise Discriminative Power

Zhumin Chu, Jiaxin Mao, Fan Zhang, Yiqun Liu, Tetsuya Sakai, Min Zhang, Shaoping Ma

Proceedings of ACM CIKM 2021 2021年 [査読有り]
Incorporating Query Reformulating Behavior into Web Search Evaluation

Jia Chen, Yiqun Liu, Jiaxin Mao, Fan Zhang, Tetsuya Sakai, Weizhi Ma, Min Zhang, Shaoping Ma

Proceedings of ACM CIKM 2021 2021年 [査読有り]
A Simple and Effective Usage of Self-supervised Contrastive Learning for Text Clustering

Haoxiang Shi, Cen Wang, Tetsuya Sakai

Proceedings of IEEE SMC 2021 2021年 [招待有り]
Evaluating Evaluation Measures for Ordinal Classification and Ordinal Quantification

Tetsuya Sakai

Proceedings of ACL-IJCNLP 2021 2021年 [査読有り]
WWW3E8: 259,000 Relevance Labels for Studying the Effect of Document Presentation Order for Relevance Assessors

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

Proceedings of ACM SIGIR 2021 2021年 [査読有り]
On the Two-Sample Randomisation Test for IR Evaluation

Tetsuya Sakai

Proceedings of ACM SIGIR 2021 2021年 [査読有り]
Scalable Personalised Item Ranking through Parametric Density Estimation

Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin’Ichi Satoh

Proceedings of ACM SIGIR 2021 2021年 [査読有り]
Fast and Exact Randomisation Test for Comparing Two Systems with Paired Data

Rikiya Suzuki, Tetsuya Sakai

Proceedings of ACM ICTIR 2021 2021年 [査読有り]
DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators’ Labels

Zhaohao Zeng, Tetsuya Sakai

arXiv 2021年 [査読有り]
How Do Users Revise Zero-Hit Product Search Queries?

Yuki Amemiya, Tomohiro Manabe, Sumio Fujita, Tetsuya Sakai

Proceedings of ECIR 2021 Part II 2021年 [査読有り]
On the Instability of Diminishing Return IR Measures

Tetsuya Sakai

Proceedings of ECIR 2021 Part I 2021年 [査読有り]
RSL19BD at DBDC4: Ensemble of Decision Tree-Based and LSTM-Based Models

Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems 2021年 [査読有り]
Retrieval Evaluation Measures that Agree with Users’ SERP Preferences: Traditional, Preference-based, and Diversity Measures

Tetsuya Sakai, Zhaohao Zeng

ACM TOIS 2020年 [査読有り]
A Siamese CNN Architecture for Learning Chinese Sentence Similarity

Haoxiang Shi, Cen Wang, Tetsuya Sakai

Proceedings of AACL-IJCNLP 2020 Student Research Workshop (SRW) 2020年 [査読有り]
Automatic Evaluation of Iconic Image Retrieval based on Colour, Shape, and Texture

Riku Togashi, Sumio Fujita, Tetsuya Sakai

Proceedings of ACM ICMR 2020 2020年 [査読有り]
SogouQ: The First Large-Scale Test Collection with Click Streams Used in a Shared-Task Evaluation

Ruihua Song, Min Zhang, Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou

Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact 2020年 [査読有り]
Graded Relevance

Tetsuya Sakai

Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact 2020年 [査読有り]
Visual Intents vs. Clicks, Likes, and Purchases in E-commerce

Riku Togashi, Tetsuya Sakai

Proceedings of ACM SIGIR 2020 2020年 [査読有り]
Good Evaluation Measures based on Document Preferences

Tetsuya Sakai, Zhaohao Zeng

Proceedings of ACM SIGIR 2020 2020年 [査読有り]
How to Measure the Reproducibility of System-oriented IR Experiments

Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

Proceedings of ACM SIGIR 2020 2020年 [査読有り]
文書分類技術に基づくエントリーシートからの業界推薦

三王慶太, 酒井哲也

日本データベース学会和文論文誌 2020年 [査読有り]
Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations

Tetsuya Sakai, Peng Xiao

Proceedings of AIRS 2019 2020年 [査読有り]
Generating Short Product Descriptors based on Very Little Training Data

Peng Xiao, Joo-Young Lee, Sijie Tao, Young-Sook Hwang, Tetsuya Sakai

Proceedings of AIRS 2019 2020年 [査読有り]
Unsupervised Answer Retrieval with Data Fusion for Community Question Answering

Sosuke Kato, Toru Shimizu, Sumio Fujita, Tetsuya Sakai

Proceedings of AIRS 2019 2020年 [査読有り]
Towards Automatic Evaluation of Reused Answers in Community Question Answering

Hsin-Wen Liu, Sumio Fujita, Tetsuya Sakai

Proceedings of AIRS 2019 2020年 [査読有り]
Arc Loss: Softmax with Additive Angular Margin for Answer Retrieval

Rikiya Suzuki, Sumio Fujita, Tetsuya Sakai

Proceedings of AIRS 2019 2020年 [査読有り]
System Evaluation of Ternary Error-Correcting Output Codes for Multiclass Classification Problems*

Shigeichi Hirasawa, Gendo Kumoi, Hideki Yagi, Manabu Kobayashi, Masayuki Goto, Tetsuya Sakai, Hiroshige Inazumi

2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) 2019年10月 [査読有り]

DOI
Understanding the inconsistencies between text descriptions and the use of privacy-sensitive resources of mobile apps

Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

SOUPS 2015 - Proceedings of the 11th Symposium on Usable Privacy and Security 241 - 255 2019年

　概要を見る

Permission warnings and privacy policy enforcement are widely used to inform mobile app users of privacy threats. These mechanisms disclose information about use of privacy-sensitive resources such as user location or contact list. However, it has been reported that very few users pay attention to these mechanisms during installation. Instead, a user may focus on a more user-friendly source of information: text description, which is written by a developer who has an incentive to attract user attention. When a user searches for an app in a marketplace, his/her query keywords are generally searched on text descriptions of mobile apps. Then, users review the search results, often by reading the text descriptions
i.e., text descriptions are associated with user expectation. Given these observations, this paper aims to address the following research question: What are the primary reasons that text descriptions of mobile apps fail to refer to the use of privacy-sensitive resources? To answer the research question, we performed empirical large-scale study using a huge volume of apps with our ACODE (Analyzing COde and DEscription) framework, which combines static code analysis and text analysis. We developed light-weight techniques so that we can handle hundred of thousands of distinct text descriptions. We note that our text analysis technique does not require manually labeled descriptions
hence, it enables us to conduct a large-scale measurement study without requiring expensive labeling tasks. Our analysis of 200,000 apps and multilingual text descriptions collected from official and third-party Android marketplaces revealed four primary factors that are associated with the inconsistencies between text descriptions and the use of privacy-sensitive resources: (1) existence of app building services/frameworks that tend to add API permissions/code unnecessarily, (2) existence of prolific developers who publish many applications that unnecessarily install permissions and code, (3) existence of secondary functions that tend to be unmentioned, and (4) existence of third-party libraries that access to the privacy-sensitive resources. We believe that these findings will be useful for improving users' awareness of privacy on mobile software distribution platforms.
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

Tetsuya Sakai

Proceesings of ACM WSDM 2019 2019年 [査読有り]
Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

Zhaohao Zeng, Ruihua Song, Pingping Lin, Tetsuya Sakai

Proceesings of ACM WSDM 2019 2019年 [査読有り]
A Comparative Study of Deep Learning Approaches for Extractive Query-Focused Multi-Document Summarization

Yuliska, Tetsuya Sakai

Proceedings of IEEE ICICT 2019 2019年 [査読有り]
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

Tetsuya Sakai

Proceedings of ECIR 2019 Part II (LNCS 11438) 2019年 [査読有り]
CENTRE@CLEF 2019

Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

Proceedings of ECIR 2019 Part II (LNCS 11438) 2019年 [査読有り]
Celebrating 20 Years of NTCIR: The Book

Douglas W. Oard, D.W, Tetsuya Sakai, Noriko Kando

Proceedings of EVIA 2019 2019年 [査読有り]
RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

Proceedings of Chatbots and Conversational Agents and Dialogue Breakdown Detection Challenge (WOCHAT+DBDC), IWSDS 2019 2019年 [査読有り]
Low-cost, Bottom-up Measures for Evaluating Search Result Diversification

Zhicheng Dou, Xue Yang, Diya Li, Ji-Rong Wen, Tetsuya Sakai

Information Retrieval Journal 2019年 [査読有り]
Which Diversity Evaluation Measures Are “Good”?

Tetsuya Sakai, Zhaohao Zeng

Proceedings of ACM SIGIR 2019 2019年 [査読有り]
The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

Proceedings of ACM SIGIR 2019 2019年 [査読有り]
Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

Proceedings of OSIRRC 2019 2019年 [査読有り]
BM25 Pseudo Relevance Feedback Using Anserini at Waseda University

Zhaohao Zeng, Tetsuya Sakai

Proceedings of OSIRRC 2019 2019年 [査読有り]
Composing a Picture Book by Automatic Story Understanding and Visualization

Xiaoyu Qi, Ruihua Song, Chunting Wang, Jin Zhou, Tetsuya Sakai

Proceedings of the Second Storytelling Workshop (StoryNLP @ ACL2019) 2019年 [査読有り]
CENTRE@CLEF2019: Overview of the Replicability and Reproducibility Tasks

Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

CLEF 2019 Working Notes 2019年 [査読有り]
CENTRE@CLEF2019: Sequel in the Systematic Reproducibility Realm

Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

Proceedings of CLEF 2019 (LNCS 11696) 2019年 [査読有り]
Generalising Kendall’s Tau for Noisy and Incomplete Preference Judgements

Riku Togashi, Tetsuya Sakai

Proceedings of ACM ICTIR 2019 2019年 [査読有り]
Evaluating Image-Inspired Poetry Generation

Chao-Chung Wu, Ruihua Song, Tetsuya Sakai, Wen-Feng Cheng, Xing Xie, Shou-De Lin

Proceedings of NLPCC 2019 2019年 [査読有り]
How to Run an Evaluation Task: with a Primary Focus on Ad Hoc Information Retrieval

Tetsuya Sakai

Information Retrieval Evaluation in a Changing World : Lessons Learned from 20 Years of CLEF 2019年 [査読有り]
Voice Assistant アプリの大規模実態調査

刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

コンピュータセキュリティシンポジウム 2019年 [査読有り]
Voice Input Interface Failures and Frustration: Developer and User Perspectives

Shiyoh Goetsu, Tetsuya Sakai

ACM UIST 2019 Adjunct 2019年 [査読有り]
A First Look at the Privacy Risks of Voice Assistant Apps

Atsuko Natatsuka, Mitsuaki Akiyama, Ryo Iijima, Tetsuya Sakai, Takuya Watanabe, Tatsuya Mori

ACM CCS 2019 Posters & Demos 2019年 [査読有り]
Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

Zhaohao Zeng, Ruihua Song, Pingping Lin, Tetsuya Sakai

Journal of Information Processing 2019年 [査読有り]
Search Result Diversity Evaluation Based on Intent Hierarchies

Xiaojie Wang, Ji-Rong Wen, Zhicheng Dou, Tetsuya Sakai, Rui Zhang

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 30 ( 1 ) 156 - 169 2018年01月 [査読有り]

　概要を見る

Search result diversification aims at returning diversified document lists to cover different user intents of a query. Existing diversity measures assume that the intents of a query are disjoint, and do not consider their relationships. In this paper, we introduce intent hierarchies to model the relationships between intents, and present four weighing schemes. Based on intent hierarchies, we propose several hierarchical measures that take into account the relationships between intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections and by using NTCIR-11 IMine test collection. Our main experimental findings are: (1) Hierarchical measures are more discriminative and intuitive than existing measures. In terms of intuitiveness, it is preferable for hierarchical measures to use the whole intent hierarchies than to use only the leaf nodes. (2) The types of intent hierarchies used affect the discriminative power and intuitiveness of hierarchical measures. We suggest the best type of intent hierarchies to be used according to whether the nonuniform weights are available. (3) To measure the benefits of the diversification algorithms which use automatically mined hierarchical intents, it is important to use hierarchical measures instead of existing measures.

DOI

Scopus

18

被引用数

(Scopus)
Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

Tetsuya Sakai

Proceedings of ACM SIGIR 2018 2018年 [査読有り]
Comparing Two Binned Probability Distributions for Information Access Evaluation

Tetsuya Sakai

Proceedings of ACM SIGIR 2018 2018年 [査読有り]
CENTRE@CLEF2018: Overview of the Replicability Task

Nicola Ferro, Maria Maistro, Tetsuya Sakai, Ian Soboroff

CLEF 2018 Working Notes 招待論文 2018年 [査読有り]
Topic Set Size Design for Paired and Unpaired Data

Tetsuya Sakai

Proceedings of ACM ICTIR 2018 2018年 [査読有り]
Classifying Community QA Questions That Contain an Image

Kenta Tamaki, Riku Togashi, Sosuke Kato, Sumio Fujita, Hideyuki Maeda, Tetsuya Sakai

Proceedings of ACM ICTIR 2018 2018年 [査読有り]
放棄セッションにおけるユーザ操作に着目したモバイル検索カードの順位付け

川崎真未, Inho Kang, 酒井哲也

IPSJ TOD11(3) 2018年 [査読有り]
Towards Automatic Evaluation of Customer-Helpdesk Dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

Journal of Information Processing 2018年 [査読有り]
Overview of CENTRE@CLEF 2018: a First Tale in the Systematic Reproducibility Realm

Nicola Ferro, Maria Maistro, Tetsuya Sakai, Ian Soboroff

Proceedings of CLEF 2018 (LNCS 11018) 2018年 [査読有り]
Why You Should Listen to This Song: Reason Generation for Explainable Recommendation

Guoshuai Zhao, Hao Fu, Ruihua Song, Tetsuya Sakai, Xing Xie, Xueming Qian

1st Workshop on Scalable and Applicable Recommendation Systems (SAREC 2018) 2018年 [査読有り]
Understanding the Inconsistency between Behaviors and Descriptions of Mobile Apps

Takuya Watanabe, Akiyama Mitsuki, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

IEICE Transactions 2018年 [査読有り]
Proceedings of AIRS 2018 (LNCS 11292)

Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

エディタ 2018年 [査読有り]
SIGIR 2017 chairs' welcome

Hang Li, Arjen P. De Vries, Ryen W. White, Noriko Kando, Tetsuya Sakai, Hideo Joho

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval iv - v 2017年08月
The probability that your hypothesis is correct, credible intervals, and effect sizes for IR evaluation

Tetsuya Sakai

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 25 - 34 2017年08月 [査読有り]

　概要を見る

Using classical statistical significance tests, researchers can only discuss PD+jH, the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is PHjD, the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-The-Art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. .at is, instead of the classical p-values and 95% confidence intervals, which are offen misinterpreted respectively as "probability that the hypothesis is (in)correct" and "probability that the true parameter value drops within the interval is 95%," we can easily obtain PHjD and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just "equality of means," and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare PHjD with p-values, credible intervals with con.-dence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, PHjD and credible intervals
and (b) sample effect sizes from classical significance tests can di.er considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of hypothesis being true, not only for the raw di.erence in means but also for the effect size in terms of Glass's.δ.

DOI

Scopus

12

被引用数

(Scopus)
Evaluating mobile search with height-biased gain

Cheng Luo, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 435 - 444 2017年08月 [査読有り]

　概要を見る

Mobile search engine result pages (SERPs) are becoming highly visual and heterogenous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights with each other. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the results cannot be obtained if users only read part of their contents. The benefit brought by the result is affected by user's reading behavior and the internal gain distribution (over the height) should be modeled to get a more accurate estimation. To tackle these problems, we conduct a lab-based user study to construct suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of user's browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain, which is calculated by summing up the product of gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.

DOI

Scopus

22

被引用数

(Scopus)
LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency

Sosuke Kato, Riku Togashi, Hideyuki Maeda, Sumio Fujita, Tetsuya Sakai

SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 1309 - 1312 2017年08月 [査読有り]

　概要を見る

Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have en-abled researchers to construct robust open-domain question an-swering (QA) systems. It is often claimed that such state-of-The-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given mas-sive training data plus a separate corpus of Q&amp
A pairs as the tar-get knowledge source, how well would such a system really per-form? How fast would it respond? In this demonstration, we pro-vide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-Term memory (LSTM) architecture with over 11 mil-lion Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&amp
A knowledge source for answer retrieval. Our core demonstration system is a pair of Japan-ese monolingual QA systems, but we leverage machine translation for letting the SIGIR attendees enter English questions and com-pare the Japanese responses from the two systems after translating them into English.

DOI

Scopus

5

被引用数

(Scopus)
Does document relevance affect the searcher's perception of time?

Cheng Luo, Yiqun Liu, Tetsuya Sakai, Ke Zhou, Fan Zhang, Xue Li, Shaoping Ma

WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining 141 - 150 2017年02月 [査読有り]

　概要を見る

Time plays an essential role in multiple areas of Information Retrieval (IR) studies such as search evaluation, user behavior analysis, temporal search result ranking and query understanding. Especially, in search evaluation studies, time is usually adopted as a measure to quantify users' efforts in search processes. Psychological studies have reported that the time perception of human beings can be affected by many stimuli, such as attention and motivation, which are closely related to many cognitive factors in search. Considering the fact that users' search experiences are affected by their subjective feelings of time, rather than the objective time measured by timing devices, it is necessary to look into the different factors that have impacts on search users' perception of time. In this work, we make a first step towards revealing the time perception mechanism of search users with the following contributions: (1) We establish an experimental research framework to measure the subjective perception of time while reading documents in search scenario, which originates from but is also different from traditional time perception measurements in psychological studies. (2) With the framework, we show that while users are reading result documents, document relevance has small yet visible effect on search users' perception of time. By further examining the impact of other factors, we demonstrate that the effect on relevant documents can also be influenced by individuals and tasks. (3) We conduct a preliminary experiment in which the difference between perceived time and dwell time is taken into consideration in a search evaluation task. We found that the revised framework achieved a better correlation with users' satisfaction feedbacks. This work may help us better understand the time perception mechanism of search users and provide insights in how to better incorporate time factor in search evaluation studies.

DOI

Scopus

7

被引用数

(Scopus)
Investigating Users' Time Perception during Web Search

Cheng Luo, Xue Li, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

Proceedings of CHIIR 2017 2017年 [査読有り]
Overview of Special Issue

Donna Harman, Diane Kelly (eds, James Allan, Nicholas J. Belkin, Paul Bennett, Jamie Callan, Charles Clarke, Fernando Diaz, Susan Dumais, Nicola Ferro, Donna Harman, Djoerd Hiemstra, Ian Ruthven, Tetsuya Sakai, Mark D. Smucker, Justin Zobel

SIGIR Forum, 51(2) 2017年 [査読有り]
Mobile Vertical Ranking based on Preference Graphs

Yuta Kadotami, Yasuaki Yoshida, Sumio Fujita, Tetsuya Sakai

ACM ICTIR 2017 2017年 [査読有り]
Ranking Rich Mobile Verticals based on Clicks and Abandonment

Mami Kawasaki, Inho Kang, Tetsuya Sakai

Proceedings of ACM CIKM 2017 2017年 [査読有り]
Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

Proceedings of EVIA 2017 2017年 [査読有り]
Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths

Tetsuya Sakai

Proceedings of EVIA 2017 2017年 [査読有り]
Towards Automatic Evaluation of Multi-Turn Dialogues: A Task Design that Leverages Inherently Subjective Annotations

Tetsuya Sakai

Proceedings of EVIA 2017 2017年 [査読有り]
The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and Students

Tetsuya Sakai

Proceedings of EVIA 2017 2017年 [査読有り]
Unanimity-Aware Gain for Highly Subjective Assessments

Tetsuya Sakai

Proceedings of EVIA 2017 2017年 [査読有り]
RSL17BD at DBDC3: Computing Utterance Similarities based on Term Frequency and Word Embedding Vectors

Sosuke Kato, Tetsuya Sakai

Proceedings of DSTC6 2017年 [査読有り]
Simple and effective approach to score standardisation

Tetsuya Sakai

ICTIR 2016 - Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval 95 - 104 2016年09月 [査読有り]

　概要を見る

Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0,1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.

DOI

Scopus

16

被引用数

(Scopus)
Evaluating search result diversity using intent hierarchies

Xiaojie Wang, Zhicheng Dou, Tetsuya Sakai, Ji-Rong Wen

SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval 415 - 424 2016年07月 [査読有り]

　概要を見る

Search result diversification aims at returning diversified document lists to cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive, and do not consider the relationships among the intents. In this paper, we introduce intent hierarchies to model the relationships among intents. Based on intent hierarchies, we propose several hierarchical measures that can consider the relationships among intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections. Our main experimental findings are: (1) Hierarchical measures are generally more discriminative and intuitive than existing measures using flat lists of intents
(2) When the queries have multilayer intent hierarchies, hierarchical measures are less correlated to existing measures, but can get more improvement in discriminative power
(3) Hierarchical measures are more intuitive in terms of diversity or relevance. The hierarchical measures using the whole intent hierarchies are more intuitive than only using the leaf nodes in terms of diversity and relevance.

DOI

Scopus

23

被引用数

(Scopus)
Topic set size design

Tetsuya Sakai

INFORMATION RETRIEVAL JOURNAL 19 ( 3 ) 256 - 283 2016年06月 [査読有り]

　概要を見る

Traditional pooling-based information retrieval (IR) test collections typically have n = 50-100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n. The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata's three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.

DOI

Scopus

33

被引用数

(Scopus)
On Estimating Variances for Topic Set Size Design

Tetsuya Sakai, Lifeng Shang

EVIA 2016 2016年 [査読有り]
Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect User Preferences?

Makoto P. Kato, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Hajime Morita

EVIA 2016 2016年 [査読有り]
Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS,2006-2015

Tetsuya Sakai

ACM SIGIR 2016 2016年 [査読有り]
Two Sample T-tests for IR Evaluation: Student or Welch?

Tetsuya Sakai

ACM SIGIR 2016 2016年 [査読有り]
Report on the First International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2015),

Laure Soulier, Lynda Tamine, Tetsuya Sakai, Leif Azzopardi, Jeremy Pickens

ACM SIGIR 2016 2016年 [査読有り]
Topic Set Size Design and Power Analysis in Practice (Tutorial Abstract),

Tetsuya Sakai

ACM ICTIR 2016 2016年 [査読有り]
The Effect of Score Standardisation on Topic Set Size Design

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2016 9994 16 - 28 2016年 [査読有り]

　概要を見る

Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently-proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic set size design, std-AB suppresses score variances and thereby enables test collection builders to consider realistic choices of topic set sizes, and to handle unnormalised measures in the same way as normalised measures. In addition, even discrete measures that clearly violate normality assumptions look more continuous after applying std-AB, which may make them more suitable for statistically motivated topic set size design. Our experiments cover a variety of tasks and evaluation measures from NTCIR-12.

DOI

Scopus

2

被引用数

(Scopus)
Search result diversification based on hierarchical intents

Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen

International Conference on Information and Knowledge Management, Proceedings 19-23- 63 - 72 2015年10月 [査読有り]

　概要を見る

A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information needs as possible. Most existing intent-aware search result diversification algorithms formulate user intents for a query as a flat list of subtopics. In this paper, we introduce a new hierarchical structure to represent user intents and propose two general hierarchical diversification models to leverage hierarchical intents. Experimental results show that our hierarchical diversification models outperform state-of-the-art diversification methods that use traditional flat subtopics.

DOI

Scopus

70

被引用数

(Scopus)
Dynamic author name disambiguation for growing digital libraries

Yanan Qian, Qinghua Zheng, Tetsuya Sakai, Junting Ye, Jun Liu

INFORMATION RETRIEVAL 18 ( 5 ) 379 - 412 2015年10月 [査読有り]

　概要を見る

When a digital library user searches for publications by an author name, she often sees a mixture of publications by different authors who have the same name. With the growth of digital libraries and involvement of more authors, this author ambiguity problem is becoming critical. Author disambiguation (AD) often tries to solve this problem by leveraging metadata such as coauthors, research topics, publication venues and citation information, since more personal information such as the contact details is often restricted or missing. In this paper, we study the problem of how to efficiently disambiguate author names given an incessant stream of published papers. To this end, we propose a "BatchAD+IncAD" framework for dynamic author disambiguation. First, we perform batch author disambiguation (BatchAD) to disambiguate all author names at a given time by grouping all records (each record refers to a paper with one of its author names) into disjoint clusters. This establishes a one-to-one mapping between the clusters and real-world authors. Then, for newly added papers, we periodically perform incremental author disambiguation (IncAD), which determines whether each new record can be assigned to an existing cluster, or to a new cluster not yet included in the previous data. Based on the new data, IncAD also tries to correct previous AD results. Our main contributions are: (1) We demonstrate with real data that a small number of new papers often have overlapping author names with a large portion of existing papers, so it is challenging for IncAD to effectively leverage previous AD results. (2) We propose a novel IncAD model which aggregates metadata from a cluster of records to estimate the author's profile such as her coauthor distributions and keyword distributions, in order to predict how likely it is that a new record is "produced" by the author. (3) Using two labeled datasets and one large-scale raw dataset, we show that the proposed method is much more efficient than state-of-the-art methods while ensuring high accuracy.

DOI

Scopus

45

被引用数

(Scopus)
Understanding the Inconsistencies between Text Descriptions and the Use of Privacy-sensitive Resources of Mobile Apps

Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

SOUPS 2015 2015年 [査読有り]
Topic Set Size Design with the Evaluation Measures for Short Text Conversation

Tetsuya Sakai, Lifeng Shang, Zhengdong Lu, Hang Li

INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2015 9460 319 - 331 2015年 [査読有り]

　概要を見る

Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the respository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.

DOI

Scopus

4

被引用数

(Scopus)
Designing test collections for comparing many systems

Tetsuya Sakai

CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management 61 - 70 2014年11月 [査読有り]

　概要を見る

A researcher decides to build a test collection for comparing her new information retrieval (IR) systems with several state-of-the-art baselines. She wants to know the number of topics (n) she needs to create in advance, so that she can start looking for (say) a query log large enough for sampling n good topics, and estimating the relevance assessment cost. We provide practical solutions to researchers like her using power analysis and sample size design techniques, and demonstrate its usefulness for several IR tasks and evaluation measures. We consider not only the paired t-test but also one-way analysis of variance (ANOVA) for significance testing to accommodate comparison of m(≥ 2) systems under a given set of statistical requirements (α: the Type I error rate, β: the Type II error rate, and minD: the minimum detectable difference between the best and the worst systems). Using our simple Excel tools and some pooled variance estimates from past data, researchers can design statistically well-designed test collections. We demonstrate that, as different evaluation measures have different variances across topics, they inevitably require different topic set sizes. This suggests that the evaluation measures should be chosen at the test collection design phase. Moreover, through a pool depth reduction experiment with past data, we show how the relevance assessment cost can be reduced dramatically while freezing the set of statistical requirements. Based on the cost analysis and the available budget, researchers can determine the right balance betweeen n and the pool depth pd. Our techniques and tools are applicable to test collections for non-IR tasks as well.

DOI

Scopus

13

被引用数

(Scopus)
Metrics, Statistics, Tests (invited paper)

Tetsuya Sakai

PROMISE Winter School 2013: Bridging between Information Retrieval and Databases (LNCS 8173) 2014年 [査読有り]
Statistical Reform in Information Retrieval?

Tetsuya Sakai

SIGIR Forum 2014年 [査読有り]
Designing Test Collections That Provide Tight Confidence Intervals

Tetsuya Sakai

Forum on Information Technology 2014 13 ( 2 ) 15 - 18 2014年 [査読有り]

CiNii
ReviewCollage: A Mobile Interface for Direct Comparison Using Online Reviews

Haojian Jin, Tetsuya Sakai, Koji Yatani

PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION WITH MOBILE DEVICES AND SERVICES (MOBILEHCI'14) 349 - 358 2014年 [査読有り]

　概要を見る

Review comments posted in online websites can help the user decide a product to purchase or place to visit. They can also be useful to closely compare a couple of candidate entities. However, the user may have to read different webpages back and forth for comparison, and this is not desirable particularly when she is using a mobile device. We present ReviewCollage, a mobile interface that aggregates information about two reviewed entities in a one-page view. ReviewCollage uses attribute-value pairs, known to be effective for review text summarization, and highlights the similarities and differences between the entities. Our user study confirms that ReviewCollage can support the user to compare two entities and make a decision within a couple of minutes, at least as quickly as existing summarization interfaces. It also reveals that ReviewCollage could be most useful when two entities are very similar.

DOI

Scopus

6

被引用数

(Scopus)
Topic Set Size Design with Variance Estimates from Two-Way ANOVA

Tetsuya Sakai

EVIA 2014 2014年 [査読有り]
When do people use query suggestion? A query suggestion log analysis

Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

INFORMATION RETRIEVAL 16 ( 6 ) 725 - 746 2013年12月 [査読有り]

　概要を見る

Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher's current state.

DOI

Scopus

33

被引用数

(Scopus)
Introduction to the special issue on search intents and diversification

Tetsuya Sakai, Noriko Kando, Craig Macdonald, Ian Soboroff

INFORMATION RETRIEVAL 16 ( 4 ) 427 - 428 2013年08月 [査読有り]

DOI

Scopus

1

被引用数

(Scopus)
Diversified search evaluation: lessons from the NTCIR-9 INTENT task

Tetsuya Sakai, Ruihua Song

INFORMATION RETRIEVAL 16 ( 4 ) 504 - 529 2013年08月 [査読有り]

　概要を見る

The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The evaluation framework used at NTCIR provides more "intuitive" and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for -nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.

DOI

Scopus

15

被引用数

(Scopus)
Web Search Evaluation with Informational and Navigational Intents

Tetsuya Sakai

Journal of Information Processing 2013年 [査読有り]
The Unreusability of Diversified Test Collections

Tetsuya Sakai

EVIA 2013 2013年 [査読有り]
Summaries, Ranked Retrieval and Sessions: A Unified Framework for Information Access Evaluation

Tetsuya Sakai, Zhicheng Dou

ACM SIGIR 2013 2013年 [査読有り]
Exploring semi-automatic nugget extraction for Japanese one click access evaluation

Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 749 - 752 2013年 [査読有り]

　概要を見る

Building test collections based on nuggets is useful evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually-extracted and semi-automatically- extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that the manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text. Copyright © 2013 ACM.

DOI

Scopus

3

被引用数

(Scopus)
Report from the NTCIR-10 1CLICK-2 Japanese subtask: Baselines, upperbounds and evaluation robustness

Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 753 - 756 2013年 [査読有り]

　概要を見る

The One Click Access Task (1CLICK) of NTCIR requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context. Systems are evaluated based on information units (or iUnits), and are required to present important pieces of information first and to minimise the amount of text the user has to read. Using the official Japanese results of the second round of the 1CLICK task from NTCIR-10, we discuss our task setting and evaluation framework. Our analyses show that: (1) Simple baseline methods that leverage search engine snippets or Wikipedia are effective for "lookup" type queries but not necessarily for other query types
(2) There is still a substantial gap between manual and automatic runs
and (3) Our evaluation metrics are relatively robust to the incompleteness of iUnits. Copyright © 2013 ACM.

DOI

Scopus

4

被引用数

(Scopus)
Summary of the NTCIR-10 INTENT-2 Task: Subtopic mining and search result diversification

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Makoto P. Kato

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 761 - 764 2013年 [査読有り]

　概要を見る

The NTCIR INTENT task comprises two subtasks: Subtopic Mining, where systems are required to return a ranked list of subtopic strings for each given query
and Document Ranking, where systems are required to return a diversified web search result for each given query. This paper summarises the novel features of the Second INTENT task at NTCIR-10 and its main findings, and poses some questions for future diversified search evaluation. Copyright © 2013 ACM.

DOI

Scopus

14

被引用数

(Scopus)
Time-aware structured query suggestion

Taiki Miyanishi, Tetsuya Sakai

SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval 809 - 812 2013年 [査読有り]

　概要を見る

Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at different time periods in the past, existing query suggestion methods neither utilize nor present such information. In this study, we propose Time-aware Structured Query Suggestion (TaSQS) which clusters query suggestions along a timeline so that the user can narrow down his search from a temporal point of view. Moreover, when a suggested query is clicked, TaSQS presents web pages from query-URL bipartite graphs after ranking them according to the click counts within a particular time period. Our experiments using data from a commercial search engine log show that the time-aware clustering and the time-aware document ranking features of TaSQS are both effective. Copyright © 2013 ACM.

DOI

Scopus

17

被引用数

(Scopus)
The Impact of Intent Selection on Diversified Search Evaluation

Tetsuya Sakai, Zhicheng Dou, Charles L.A. Clarke

ACM SIGIR 2013 2013年 [査読有り]
Evaluating Heterogeneous Information Access (Position paper)

Ke Zhou, Tetsuya Sakai, Mounia Lalmas, Zhicheng Dou, Joemon M. Jose

Workshop on Modeling User Behavior for Information Access Evaluation 2013年 [査読有り]
Mining Search Intents from Text Fragments

Qinglei Wang, Yanan Qian, Ruihua Song, Zhicheng Dou, Fan Zhang, Tetsuya Sakai, Qinghua Zheng

Information Retrieval 2013年 [査読有り]
On the reliability and intuitiveness of aggregated search metrics

Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins, Joemon M. Jose

International Conference on Information and Knowledge Management, Proceedings 689 - 698 2013年 [査読有り]

　概要を見る

Aggregating search results from a variety of diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals
(2) the likelihood of each vertical preference is available
and (3) the topical relevance assessments of results returned from each vertical is available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the various key aspects to be measured (i.e. various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new lights on the further developments and applications of AS metrics. Copyright 2013 ACM.

DOI

Scopus

15

被引用数

(Scopus)
Dynamic query intent mining from a search log stream

Yanan Qian, Tetsuya Sakai, Junting Ye, Qinghua Zheng, Cong Li

International Conference on Information and Knowledge Management, Proceedings 1205 - 1208 2013年 [査読有り]

　概要を見る

It has long been recognized that search queries are often broad and ambiguous. Even when submitting the same query, different users may have different search intents. Moreover, the intents are dynamically evolving. Some intents are constantly popular with users, others are more bursty. We propose a method for mining dynamic query intents from search query logs. By regarding the query logs as a data stream, we identify constant intents while quickly capturing new bursty intents. To evaluate the accuracy and efficiency of our method, we conducted experiments using 50 topics from the NTCIR INTENT-9 data and additional five popular topics, all supplemented with six-month query logs from a commercial search engine. Our results show that our method can accurately capture new intents with short response time. Copyright 2013 ACM.

DOI

Scopus

17

被引用数

(Scopus)
How intuitive are diversified search metrics? Concordance test results for the diversity U-measures

Tetsuya Sakai

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 13 - 24 2013年 [査読有り]

　概要を見る

Most of the existing Information Retrieval (IR) metrics discount the value of each retrieved relevant document based on its rank. This statement also applies to the evaluation of diversified search: the widely-used diversity metrics, namely, α-nDCG, Intent-Aware Expected Reciprocal Rank (ERR-IA) and D#-nDCG, are all rank-based. These evaluation metrics regard the system output as a list of document IDs, and ignore all other features such as snippets and document full texts of various lengths. In contrast, the U-measure framework of Sakai and Dou uses the amount of text read by the user as the foundation for discounting the value of relevant information, and can take into account the user's snippet reading and full text reading behaviours. The present study compares the diversity versions of U-measure (D-U and U-IA) with the state-of-the-art diversity metrics using the concordance test: given a pair of ranked lists, we quantify the ability of each metric to favour the more diversified and more relevant list. Our results show that while D#-nDCG is the overall winner in terms of simultaneous concordance with diversity and relevance, D-U and U-IA statistically significantly outperform other state-of-the-art metrics. Moreover, in terms of concordance with relevance alone, D-U and U-IA significantly outperform all rank-based diversity metrics. Thus, D-U and U-IA are not only more realistic but also more relevance-oriented than other diversity metrics. © 2013 Springer-Verlag.

DOI

Scopus

6

被引用数

(Scopus)
User-aware advertisability

Hai-Tao Yu, Tetsuya Sakai

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 452 - 463 2013年 [査読有り]

　概要を見る

In sponsored search, many studies focus on finding the most relevant advertisements (ads) and their optimal ranking for a submitted query. Determining whether it is suitable to show ads has received less attention. In this paper, we introduce the concept of user-aware advertisability, which refers to the probability of ad-click on sponsored ads when a specific user submits a query. When computing the advertisability for a given query-user pair, we first classify the clicked web pages based on a pre-defined category hierarchy and use the aggregated topical categories of clicked web pages to represent user preference. Taking user preference into account, we then compute the ad-click probability for this query-user pair. Compared with existing methods, the experimental results show that user preference is of great value for generating user-specific advertisability. In particular, our approach that computes advertisability per query-user pair outperforms the two state-of-the-art methods that compute advertisability per query in terms of a variant of the normalized Discounted Cumulative Gain metric. © 2013 Springer-Verlag.

DOI

Scopus
Estimating intent types for search result diversification

Kosetsu Tsukuda, Tetsuya Sakai, Zhicheng Dou, Katsumi Tanaka

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 25 - 37 2013年 [査読有り]

　概要を見る

Given an ambiguous or underspecified query, search result diversification aims at accommodating different user intents within a single Search Engine Result Page (SERP). While automatic identification of different intents for a given query is a crucial step for result diversification, also important is the estimation of intent types (informational vs. navigational). If it is possible to distinguish between informational and navigational intents, search engines can aim to return one best URL for each navigational intent, while allocating more space to the informational intents within the SERP. In light of the observations, we propose a new framework for search result diversification that is intent importance-aware and type-aware. Our experiments using the NTCIR-9 INTENT Japanese Subtopic Mining and Document Ranking test collections show that: (a) our intent type estimation method for Japanese achieves 64.4% accuracy
and (b) our proposed diversification method achieves 0.6373 in D#-nDCG and 0.5898 in DIN#-nDCG over 56 topics, which are statistically significant gains over the top performers of the NTCIR-9 INTENT Japanese Document Ranking runs. Moreover, our relevance oriented model significantly outperforms our diversity oriented model and the original model by Dou et al.. © 2013 Springer-Verlag.

DOI

Scopus

6

被引用数

(Scopus)
On labelling intent types for evaluating search result diversification

Tetsuya Sakai, Young-In Song

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8281 38 - 49 2013年 [査読有り]

　概要を見る

Search result diversification is important for accommodating different user needs by means of covering popular and diverse query intents within a single result page. To evaluate diversity, we believe that it is important to consider the distinction between informational and navigational intents, as users would not want redundant information especially for navigational intents. In this study, we conduct intent type-sensitive diversity evaluation based on both top-down labelling, which labels each intent as either navigational or informational a priori, and bottom-up labelling, which labels each intent based on whether a "navigational relevant" document has actually been identified in the document collection. Our results suggest that reliable type-sensitive diversity evaluation can be conducted using the top-down approach with a clear intent labelling guideline, while ensuring that the desired URLs for navigational intents make their way into relevance assessments. © 2013 Springer-Verlag.

DOI

Scopus

1

被引用数

(Scopus)
Query snowball: A co-occurrence-based approach to multi-document summarization for question answering

Hajime Morita, Tetsuya Sakai, Manabu Okumura

IPSJ Online Transactions 5 ( 2012 ) 124 - 129 2012年 [査読有り]

　概要を見る

We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

DOI

Scopus
Evaluation with Informational and Navigational Intents

Tetsuya Sakai

WWW 2012 2012年 [査読有り]
Structured query suggestion for specialization and parallel movement: Effect on search behaviors

Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web 389 - 398 2012年 [査読有り]

　概要を見る

Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from "nikon" to "nikon camera") and parallel movement (e.g. from "nikon camera" to "canon camera"). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a label for each category. Moreover, SParQS presents some new entities as alternatives to the original query (e.g. "canon" in response to the query "nikon"), together with their query suggestions classified in the same way as the original query's suggestions. We conducted a task-based user study to compare SParQS with a traditional "flat list" query suggestion interface. Our results show that the SParQS interface enables subjects to search more successfully than the flat list case, even though query suggestions presented were exactly the same in the two interfaces. In addition, the subjects found the query suggestions more helpful when they were presented in the SParQS interface rather than in a flat list.

DOI

Scopus

30

被引用数

(Scopus)
Query snowball: A co-occurrence-based approach to multi-document summarization for question answering

Hajime Morita, Tetsuya Sakai, Manabu Okumura

IPSJ Online Transactions 5 ( 2012 ) 124 - 129 2012年 [査読有り]

　概要を見る

We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

DOI

Scopus
AspecTiles: Tile-based visualization of diversified web search results

Mayu Iwata, Tetsuya Sakai, Takehiro Yamamoto, Yu Chen, Yi Liu, Ji-Rong Wen, Shojiro Nishio

SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval 85 - 94 2012年 [査読有り]

　概要を見る

A diversified search result for an underspecified query generally contains web pages in which there are answers that are relevant to different aspects of the query. In order to help the user locate such relevant answers, we propose a simple extension to the standard Search Engine Result Page (SERP) interface, called AspecTiles. In addition to presenting a ranked list of URLs with their titles and snippets, AspecTiles visualizes the relevance degree of a document to each aspect by means of colored squares ("tiles"). To compare AspecTiles with the standard SERP interface in terms of usefulness, we conducted a user study involving 30 search tasks designed based on the TREC web diversity task topics as well as 32 participants. Our results show that AspecTiles has some advantages in terms of search performance, user behavior, and user satisfaction. First, AspecTiles enables the user to gather relevant information significantly more efficiently than the standard SERP interface for tasks where the user considers several different aspects of the query to be important at the same time (multi-aspect tasks). Second, AspecTiles affects the user's information seeking behavior: with this interface, we observed significantly fewer query reformulations, shorter queries and deeper examinations of ranked lists in multi-aspect tasks. Third, participants of our user study found AspecTiles significantly more useful for finding relevant information and easy to use than the standard SERP interface. These results suggest that simple interfaces like AspecTiles can enhance the search performance and search experience of the user when their queries are underspecified. © 2012 ACM.

DOI

Scopus

12

被引用数

(Scopus)
Towards Zero-Click Mobile IR Evaluation: Knowing What and Knowing When

Tetsuya Sakai

ACM SIGIR 2012 2012年 [査読有り]
New Assessment Criteria for Query Suggestion

Zhongrui Ma, Yu Chen, Ruihua Song, Tetsuya Sakai, Jiaheng Lu, Ji-Rong Wen

ACM SIGIR 2012 2012年 [査読有り]
The wisdom of advertisers: Mining subgoals via query clustering

Takehiro Yamamoto, Tetsuya Sakai, Mayu Iwata, Chen Yu, Ji-Rong Wen, Katsumi Tanaka

ACM International Conference Proceeding Series 505 - 514 2012年 [査読有り]

　概要を見る

This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall. © 2012 ACM.

DOI

Scopus

7

被引用数

(Scopus)
The reusability of a diversified search test collection

Tetsuya Sakai, Zhicheng Dou, Ruihua Song, Noriko Kando

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7675 26 - 38 2012年 [査読有り]

　概要を見る

Traditional ad hoc IR test collections were built using a relatively large pool depth (e.g. 100), and are usually assumed to be reusable. Moreover, when they are reused to compare a new system with another or with systems that contributed to the pools ("contributors"), an even larger measurement depth (e.g. 1,000) is often used for computing evaluation metrics. In contrast, the web diversity test collections that have been created in the past few years at TREC and NTCIR use a much smaller pool depth (e.g. 20). The measurement depth is also small (e.g. 10-30), as search result diversification is primarily intended for the first result page. In this study, we examine the reusability of a typical web diversity test collection, namely, one from the NTCIR-9 INTENT-1 Chinese Document Ranking task, which used a pool depth of 20 and official measurement depths of 10, 20 and 30. First, we conducted additional relevance assessments to expand the official INTENT-1 collection to achieve a pool depth of 40. Using the expanded relevance assessments, we show that run rankings at the measurement depth of 30 are too unreliable, given that the pool depth is 20. Second, we conduct a leave-one-out experiment for every participating team of the INTENT-1 Chinese task, to examine how (un)fairly new runs are evaluated with the INTENT-1 collection. We show that, for the purpose of comparing new systems with the contributors of the test collection being used, condensed-list versions of existing diversity evaluation metrics are more reliable than the raw metrics. However, even the condensed-list metrics may be unreliable if the new systems are not competitive compared to the contributors. © Springer-Verlag 2012.

DOI

Scopus

5

被引用数

(Scopus)
One click one revisited: Enhancing evaluation based on information units

Tetsuya Sakai, Makoto P. Kato

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7675 39 - 51 2012年 [査読有り]

　概要を見る

This paper extends the evaluation framework of the NTCIR-9 One Click Access Task (1CLICK-1), which required systems to return a single, concise textual output in response to a query in order to satisfy the user immediately after a click on the SEARCH button. Unlike traditional nugget-based summarisation and question answering evaluation methods, S-measure, the official evaluation metric of 1CLICK-1, discounts the value of each information unit based on its position within the textual output. We first show that the discount parameter L of S-measure affects system ranking and discriminative power, and that using multiple values, e.g. L = 250 (user has only 30 seconds to view the text) and L = 500 (user has one minute), is beneficial. We then complement the recall-like S-measure with a simple, precision-like metric called T-measure as well as a combination of S-measure and T-measure, called S#. We show that S# with a heavy emphasis on S-measure imposes an appropriate length penalty to 1CLICK-1 system outputs and yet achieves discriminative power that is comparable to S-measure. These new metrics will be used at NTCIR-10 1CLICK-2. © Springer-Verlag 2012.

DOI

Scopus

3

被引用数

(Scopus)
Grid-based interaction for exploratory search

Hideo Joho, Tetsuya Sakai

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7675 496 - 505 2012年 [査読有り]

　概要を見る

This paper presents a grid-based interaction model that is designed to encourage searchers to organize a complex search space by managing n x m sub spaces. A search interface was developed based on the proposed interaction model, and its performance was evaluated by a user study carried out in the context of the NTCIR-9 VisEx Task. With the proposed interface, there were cases where subjects discovered new knowledge without accessing external resources when compared to a baseline system. The encouraging results from experiments warrant further studies on the model. © Springer-Verlag 2012.

DOI

Scopus
Using graded-relevance metrics for evaluating community QA answer selection

Tetsuya Sakai, Yohei Seki, Daisuke Ishikawa, Kazuko Kuriyama, Noriko Kando, Chin-Yew Lin

Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011 187 - 196 2011年 [査読有り]

　概要を見る

Community Question Answering (CQA) sites such as Yahoo ! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation
and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BAbased evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments. Copyright 2011 ACM.

DOI

Scopus

36

被引用数

(Scopus)
Query Session Data vs. Clickthrough Data as Query Suggestion Resources

Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

ECIR 2011 Workshop on Session Information Retrieval 2011年 [査読有り]
Challenges in Diversity Evaluation (keynote)

Tetsuya Sakai

ECIR 2011 Workshop on Diversity in Document Retrieval 2011年 [査読有り]
Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

Naoyoshi Aikawa, Tetsuya Sakai, Hayato Yamana

情報処理学会論文誌 2011 ( 1 ) 1 - 9 2011年 [査読有り]

CiNii
Evaluating Diversified Search Results Using Per-Intent Graded Relevance

Tetsuya Sakai, Ruihua Song

ACM SIGIR 2011 2011年 [査読有り]
NTCIREVAL: A Generic Toolkit for Information Access Evaluation

Tetsuya Sakai

FIT 2011 2011年 [査読有り]
コミュニティQAにおける良質回答の自動予測

石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

情報知識学会誌 2011年 [査読有り]
北京のマイクロソフト研究所より2011 - 日本人インターンの成功事例 -

酒井哲也

若手研究者支援のための産学共同GCOE国内シンポジウムダイジェスト集 2011年 [査読有り]
What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria

Daisuke Ishikawa, Noriko Kando, Tetsuya Sakai

EVIA2011 2011年 [査読有り]
Q&Aサイトにおけるベストアンサー推定の分析とその機械学習への応用

石川大介, 栗山和子, 酒井哲也, 関洋平, 神門典子

情報知識学会誌 20 ( 2 ) 73 - 85 2010年05月

　概要を見る

本研究では，Q サイトにおけるベストアンサーを計算機が推定可能か検証した．まず最初に，人間の判定者によるベストアンサー推定実験を行った．ベストアンサー推定実験にはYahoo!知恵袋データを利用し，「恋愛相談」「パソコン」「一般教養」「政治」の4つのカテゴリからそれぞれ無作為抽出した50 問を使用した．判定者二人による推定結果の正解率(精度) は，「恋愛相談」では50%と52%(ランダム推定:34%)，「パソコン」では62%と58%(ランダム推定:38%)，「一般教養」では54%と56%(ランダム推定:37%)，「政治」では56%と60%(ランダム推定:35.8%) であった．次に，この実験結果を分析し，ベストアンサーを選ぶ要因として「詳しい」「根拠」「丁寧」を素性とする機械学習システムを構築した．判定者らと同じ50 問を用いた推定実験の結果，機械学習システムの精度は，「パソコン」では判定者らの結果を上回り(67%)，「恋愛相談」では判定者らの結果を下回った(41%)．「一般教養」と「政治」では機械学習システムと判定者らの結果はほぼ同等であった．

DOI CiNii
Boiling Down Information Retrieval Test Collections

Tetsuya Sakai, Teruko Mitamura

RIAO 2010 Proceedings 2010年 [査読有り]
Constructing a Test Collection with Multi-Intent Queries

Ruihua Song, Dongjie Qi, Hua Liu, Tetsuya Sakai, Jian-Yun Nie, Hsiao-Wen Hon, Yong Yu

EVIA 2010 Proceedings 2010年 [査読有り]
Simple Evaluation Metrics for Diversified Search Results

Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, Chin-Yew Lin

EVIA 2010 Proceedings 2010年 [査読有り]
Ranking Retrieval Systems without Relevance Assessments – Revisited

Tetsuya Sakai, Chin-Yew Lin

EVIA 2010 Proceedings 2010年 [査読有り]
コミュニティQAにおける良質な回答の選定タスク: 評価方法に関する考察

酒井哲也, 石川大介, 栗山和子, 関洋平, 神門典子

FIT 2010 2010年 [査読有り]
Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

Naoyoshi Aikawa, Tetsuya Sakai, Hayato Yamana

WebDB Forum 2010 2010年 [査読有り]
On the robustness of information retrieval metrics to biased relevance assessments

Tetsuya Sakai

Journal of Information Processing 17 156 - 166 2009年 [査読有り]

　概要を見る

Information Retrieval (IR) test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used IR evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems and the number of pooled documents. Even though previous studies have shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold when the relevance data are biased towards particular systems or towards the top of the pools. More specifically, we show that the condensed-list versions of Average Precision, Qmeasure and normalised Discounted Cumulative Gain, which we denote as AP′, Q′ and nDCG′, are not necessarily superior to the original metrics for handling biases. Nevertheless, AP′ and Q′ are generally superior to bpref, Rank-Biased Precision and its condensed-list version even in the presence of biases.

DOI

Scopus

4

被引用数

(Scopus)
Serendipitous Search via Wikipedia: A Query Log Analysis

Tetsuya Sakai, Kenichi Nogami

PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL 780 - 781 2009年 [査読有り]

　概要を見る

We analyse the query log of a click-oriented Japanese search engine that utilises the link structures of Wikipedia for encouraging the user to change his information need and to perform repeated, serendipitous, exploratory search. Our results show that users tend to make transitions within the same query type: from person names to person names, from place names to place names, and so on.
Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments

Tetsuya Sakai, Noriko Kando, Hideki Shima, Chuan-Jie Lin, Ruihua Song, Miho Sugimoto, Teruko Mitamura

日本データベース学会論文誌 2009年 [査読有り]
People, Clouds, and Interaction for Information Access (invited paper)

Tetsuya Sakai

IUCS 2009 2009年 [査読有り]
On information retrieval metrics designed for evaluation with incomplete relevance assessments

Tetsuya Sakai, Noriko Kando

INFORMATION RETRIEVAL 11 ( 5 ) 447 - 470 2008年10月 [査読有り]

　概要を見る

Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs-the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I Error, and on Kendall's rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.

DOI

Scopus

110

被引用数

(Scopus)
Introduction to the NTCIR-6 Special Issue

Noriko Kando, Teruko Mitamura, Tetsuya Sakai

ACM Transactions on Asian Language Information Processing (TALIP) 2008年 [査読有り]
Precision-at-ten considered redundant

William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai

ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings 695 - 696 2008年 [査読有り]

　概要を見る

Information retrieval systems are compared using evaluation metrics, with researchers commonly reporting results for simple metrics such as precision-at-10 or reciprocal rank together with more complex ones such as average precision or discounted cumulative gain. In this paper, we demonstrate that complex metrics are as good as or better than simple metrics at predicting the performance of the simple metrics on other topics. Therefore, reporting of results from simple metrics alongside complex ones is redundant.

DOI

Scopus

29

被引用数

(Scopus)
Comparing Metrics across TREC and NTCIR: The Robustness to Pool Depth Bias

Tetsuya Sakai

ACM SIGIR 2008 Proceedings 2008年 [査読有り]

CiNii
クリックスルーに基づく探検型検索サイトの設計と開発,

酒井哲也, 小山田浩史, 野上謙一, 北村仁美, 梶浦正浩, 東美奈子, 野中由美子, 小野雅也, 菊池豊

第7回情報科学技術フォーラム2008 2008年 [査読有り]

CiNii
Comparing metrics across TREC and NTCIR: The robustness to system bias

Tetsuya Sakai

International Conference on Information and Knowledge Management, Proceedings 581 - 590 2008年 [査読有り]

　概要を見る

Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in a more realistic setting, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold in the presence of system bias. In our experiments using TREC and NTCIR data, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, even under system bias, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'. Copyright 2008 ACM.

DOI

Scopus

29

被引用数

(Scopus)
Modelling A User Population for Designing Information Retrieval Metrics

Tetsuya Sakai, Stephen Robertson

Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008) 2008年 [査読有り]
On the reliability of information retrieval metrics based on graded relevance

Tetsuya Sakai

INFORMATION PROCESSING & MANAGEMENT 43 ( 2 ) 531 - 548 2007年03月 [査読有り]

　概要を見る

This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off I are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values. (c) 2006 Elsevier Ltd. All rights reserved.

DOI

Scopus

83

被引用数

(Scopus)
On the Reliability of Factoid Question Answering Evaluation

Tetsuya Sakai

ACM Transactions on Asian Language Information Processing (TALIP) 2007年 [査読有り]
On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

Tetsuya Sakai

Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007) 32 - 43 2007年 [査読有り]

CiNii
User Satisfaction Task: A Proposal for NTCIR-7

Tetsuya Sakai

Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007) 2007年 [査読有り]
Pic-A-Topic: Efficient Viewing of Informative TV Contents on Travel, Cooking, Food and More

Tetsuya Sakai, Tatsuya Uehara, Taishi Shimomori, Makoto Koyama, Mika Fukui

RIAO 2007 Proceedings 2007年 [査読有り]
Alternatives to Bpref

Tetsuya Sakai

Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 71 - 78 2007年 [査読有り]

　概要を見る

Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref which has been designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric called rpref has also been proposed. However, we show that the application of Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP)to condensed lists, obtained by ?ltering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power. Copyright 2007 ACM.

DOI

Scopus

140

被引用数

(Scopus)
Evaluating the Task of Finding One Relevant Document Using Incomplete Relevance Data

Tetsuya Sakai

FIT 2007 Information Technology Letters 2007年 [査読有り]
Evaluating Information Retrieval Metrics based on Bootstrap Hypothesis Tests

Tetsuya Sakai

IPSJ TOD 2007年 [査読有り]
On the Properties of Evaluation Metrics for Finding One Highly Relevant Document

Tetsuya Sakai

IPSJ TOD 2007年 [査読有り]
高精度な音声入力質問応答のための疑問表現補完

筒井秀樹, 真鍋俊彦, 福井美佳, 酒井哲也, 藤井寛子, 浦田耕二

情報処理学会論文誌 2007年 [査読有り]
よりよい検索システム実現のために：正解の良し悪しを考慮した情報検索評価の動向

酒井哲也

情報処理 2006年 [査読有り]
A Further Note on Evaluation Metrics for the Task of Finding One Highly Relevant Document

Tetsuya Sakai

IPSJ SIG Technical Report 2006年 [査読有り]
On the Task of Finding One Highly Relevant Document with High Precision

Tetsuya Sakai

IPSJ TOD 2006年 [査読有り]
Give Me Just One Highly Relevant Document: P-measure

Tetsuya Sakai

ACM SIGIR 2006 Proceedings 2006年 [査読有り]
Evaluating Evaluation Metrics based on the Bootstrap

Tetsuya Sakai

ACM SIGIR 2006 Proceedings 2006年 [査読有り]

CiNii
NTCIRに基づく文書検索技術の進歩に関する一考察

酒井哲也

情報科学技術レターズ 2006年 [査読有り]
Improving the robustness to recognition errors in speech input question answering

Hideki Tsutsui, Toshihiko Manabe, Mika Fukui, Tetsuya Sakai, Hiroko Fujii, Koji Urata

INFORMATION RETRIEVAL TECHNOLOLGY, PROCEEDINGS 4182 297 - 312 2006年 [査読有り]

　概要を見る

In our previous work, we developed a prototype of a speech-input help system for home appliances such as digital cameras and microwave ovens. Given a factoid question, the system performs textual question answering using the manuals as the knowledge source. Whereas, given a HOW question, it retrieves and plays a demonstration video. However, our first prototype suffered from speech recognition errors, especially when the Japanese interrogative phrases in factoid questions were misrecognized. We therefore propose a method for solving this problem, which complements a speech query transcript with an interrogative phrase selected from a pre-determined list. The selection process first narrows down candidate phrases based on co-occurrences within the manual text, and then computes the similarity between each candidate and the query transcript in terms of pronunciation. Our method improves the Mean Reciprocal Rank of top three answers from 0.429 to 0.597 for factoid questions.
Pic-A-Topic: Gathering information efficiently from recorded TV shows on travel

Tetsuya Sakai, Tatsuya Uehara, Kazuo Sumita, Taishi Shimomori

INFORMATION RETRIEVAL TECHNOLOLGY, PROCEEDINGS 4182 429 - 444 2006年 [査読有り]

　概要を見る

We introduce a system called Pic-A-Topic, which analyses closed captions of Japanese TV shows on travel to perform topic segmentation and topic sentence selection. Our objective is to provide a table-of-contents interface that enables efficient viewing of desired topical segments within recorded TV shows to users of appliances such as hard disk recorders and digital TVs. According to our experiments using 14.5 hours of recorded travel TV shows, Pic-A-Topic's F1-measure for the topic segmentation task is 82% of manual performance on average. Moreover, a preliminary user evaluation experiment suggests that this level of performance may be indistinguishable from manual performance.
Bootstrap-based comparisons of IR metrics for finding one relevant document

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS 4182 374 - 389 2006年 [査読有り]

　概要を見る

This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P+-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P((+))-measure >= O-measure >= NWRR >= RR" generally holds, where ">=" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P((+))-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.
Ranking the NTCIR systems based on multigrade relevance

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY 3411 251 - 262 2005年 [査読有り]

　概要を見る

At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multi-grade relevance. This paper shows that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision through a theoretical analysis, and then verify the above claim through experiments by actually ranking the systems submitted to the NTCIR-3 CLIR Task. Our experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that it is more reliable than Average Weighted Precision.
評価型ワークショップにおけるシステム順位の安定性について

酒井哲也

言語処理学会第11回年次大会併設ワークショップ「評価型ワークショップを考える」 2005年 [査読有り]
固有表現抽出と回答タイプ体系が質問応答システムの性能に与える影響(自然言語処理)

市村由美, 齋藤佳美, 酒井哲也, 国分智晴, 小山誠

電子情報通信学会論文誌 2005年 [査読有り]
Flexible Pseudo-Relevance Feedback via Selective Sampling

Tetsuya Sakai, Toshihiko Manabe, Makoto Koyama

ACM TALIP 2005年 [査読有り]
Advanced Technologies for Information Access (invited paper)

Tetsuya Sakai

International Journal of Computer Processing of Oriental Languages 2005年 [査読有り]
ひとつの高適合文書を高精度に検索するタスクのための評価指標

酒井哲也

情報科学技術レターズ 2005年 [査読有り]
The reliability of metrics based on graded relevance

Tetsuya Sakai

INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS 3689 1 - 16 2005年 [査読有り]

　概要を見る

This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections with submitted runs from the Chinese IR and English IR tasks in the NTCIR-3 CUR track to examine the metrics using methods proposed by Buckley/Voorhees and Voorhees/Buckley as well as Kendall's rank correlation. Our results show that AnDCC(l) and nDCC(l) ((Average) Normalised Discounted Cumulative Cain at Document cut-off 1) are good metrics, provided that I is large. However, if one wants to avoid the parameter I altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.
Introduction to the special issue: Recent advances in information processing and access for Japanese

Tetsuya Sakai, Yuji Matsumoto

ACM Transactions on Asian Language Information Processing 4 ( 4 ) 275 - 376 2005年 [査読有り]

DOI

Scopus
The Relationship between Answer Ranking and User Satisfaction in a Question Answering System

Tomoharu Kokubu, Tetsuya Sakai, Yoshimi Saito, Hideki Tsutsui, Toshihiko Manabe, Makoto Koyama, Hiroko Fujii

NTCIR-5 Proceedings (Open Submission Session) 2005年 [査読有り]
The Effect of Topic Sampling on Sensitivity Comparisons of Information Retrieval Metrics

Tetsuya Sakai

NTCIR-5 Proceedings (Open Submission Session) 2005年 [査読有り]
ASKMi: A Japanese Question Answering System based on Semantic Role Analysis

Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, Tomoharu Kokubu, Toshihiko Manabe

RIAO 2004 2004年 [査読有り]

CiNii
New Performance Metrics based on Multigrade Relevance

Tetsuya Sakai

NTCIR-4 Proceedings (Open Submission Session), 2004年 [査読有り]

CiNii
The Effect of Back-Formulating Questions in Question Answering Evaluation

Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Tomoharu Kokubu, Makoto Koyama

ACM SIGIR 2004 2004年 [査読有り]

CiNii
汎用シソーラスと擬似適合性フィードバックとを用いた検索質問拡張

小山誠, 真鍋俊彦, 木村和広, 酒井哲也

「情報アクセスのためのテキスト処理」シンポジウム 2003年 [査読有り]
BRIDJE over a Language Barrier: Cross-Language Information Access by Integrating Translation and Retrieval

Tetsuya Sakai, Makoto Koyama, Masaru Suzuki, Akira Kumano, Toshihiko Manabe

IRAL 2003 2003年 [査読有り]

CiNii
Evaluating Retrieval Performance for Japanese Question Answering: What Are Best Passages?

Tetsuya Sakai, Tomoharu Kokubu

ACM SIGIR 2003 2003年 [査読有り]

CiNii
Average Gain Ratio: A Simple Retrieval Performance Measure for Evaluation with Multiple Relevance Levels

Tetsuya Sakai

ACM SIGIR 2003 2003年 [査読有り]
Relative and Absolute Term Selection Criteria: A Comparative Study for English and Japanese IR

Tetsuya Sakai, Stephen E. Robertson

ACM SIGIR 2002 2002年 [査読有り]
Generating transliteration rules for cross-language information retrieval from machine translation dictionaries

Tetsuya Sakai, Akira Kumano, Toshihiko Manabe

Proceedings of the IEEE International Conference on Systems, Man and Cybernetics 6 290 - 295 2002年 [査読有り]

　概要を見る

This paper describes a method for automatically converting existing English-Japanese and Japanese-English machine translation dictionaries into English-Japanese transliteration rules and Japanese-English back-transliteration rules for cross language information retrieval. An existing English-katakana word alignment module, which is part of our own machine translation system, is exploited in generating probabilistic rewriting rules. If our system is allowed to output 15 candidate spellings, it successfully transliterates more than 75% of a set of out-of-vocabulary English words into katakana, and successfully back-transliterates more than 55% of a set of out-of-vocabulary katakana words into English. Moreover, our preliminary cross-language information retrieval experiments, which treat the candidate spellings as a group of synonyms, suggest that our methods can indeed compensate for the failure of machine translation in some cases.

DOI

Scopus

4

被引用数

(Scopus)
The Use of External Text Data in Cross-Language Information Retrieval based on Machine Translation

Tetsuya Sakai

IEEE SMC 2002 2002年 [査読有り]
意味役割解析に基づく高適合英語文書の検索

酒井哲也, 小山誠, 鈴木優, 真鍋俊彦

FIT 2002 情報技術レターズ LD-8 67 - 68 2002年 [査読有り]

CiNii
A framework for cross-language information access: Application to English and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

COMPUTERS AND THE HUMANITIES 35 ( 4 ) 371 - 388 2001年11月 [査読有り]

　概要を見る

Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.
A framework for cross-language information access: Application to english and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

Language Resources and Evaluation 35 ( 4 ) 371 - 388 2001年

　概要を見る

Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications. © 2001 Kluwer Academic Publishers.
Flexible Pseudo-Relevance Feedback via Direct Mapping and Categorization of Search Requests

Tetsuya Sakai, Stephen E. Robertson, Stephen Walker

ECIR 2001 2001年 [査読有り]
Japanese-English Cross-Language Information Retrieval using Machine Translation and Pseudo-Relevance Feedback

Tetsuya Sakai

International Journal of Computer Processing of Oriental Languages 14 ( 2 ) 83 - 107 2001年 [査読有り]

DOI CiNii
Flexible Pseudo-Relevance Feedback Using Optimization Tables

Tetsuya Sakai, Stephen E. Robertson

ACM SIGIR 2001 2001年 [査読有り]
Generic Summaries for Indexing in Information Retrieval

Tetsuya Sakai, Karen Sparck Jones

ACM SIGIR 2001 2001年 [査読有り]

CiNii
Combining the Ranked Output from Fulltext and Summary Indexes

Tetsuya Sakai

ACM SIGIR 2001 Workshop on Text Summarization 2001年 [査読有り]
Incremental relevance feedback in Japanese text retrieval

Gareth Jones, Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

Information Retrieval 2 ( 4 ) 361 - 384 2000年 [査読有り]

　概要を見る

The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval
examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using "number-to-view" graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, but also a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones. © 2000 Kluwer Academic Publishers.

DOI

Scopus

2

被引用数

(Scopus)
MT-based Japanese-English Cross-Language IR Experiments using the TREC Test Collections

Tetsuya Sakai

IRAL 2000 2000年 [査読有り]
A First Step towards Flexible Local Feedback for Ad hoc Retrieval

Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

IRAL 2000 2000年 [査読有り]
BMIR-J2: A test collection for evaluation of Japanese information retrieval systems

Tetsuya Sakai, Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuo Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata, Noriko Kando

SIGIR Forum (ACM Special Interest Group on Information Retrieval) 33 ( 1 ) 13 - 17 1999年

　概要を見る

BMIR-J2 is the first complete test collection generally available for evaluating Japanese information retrieval systems. BMIR-J2 features include a novel division of search requests based on various functions required to perform successful retrieval. BMIR-J2 and its smaller predecessor BMIR-J1 were constructed by a volunteer-based working group under the Information Processing Society of Japan. We hope that BMIR-J2 will come into wide use and that it will foster the development of Japanese IR systems.

DOI

Scopus

10

被引用数

(Scopus)
確率モデルに基づく日本語情報フィルタリングにおけるフィードバックによる検索条件展開および検索精度評価

酒井哲也, Gareth J.F. Jones, 梶浦正浩, 住田一男

情報処理学会論文誌 1999年 [査読有り]
A comparison of query translation methods for English-Japanese cross-language information retrieval

Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, and Kazuo Sumita

SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL 269 - 270 1999年 [査読有り]

　概要を見る

In this paper we report results of an investigation into English-Japanese Cross-Language Information Retrieval (CLIR) comparing a number of query translation methods. Results from experiments using the standard BMIR-J2 Japanese collection suggest that full machine translation (MT) can outperform popular dictionary-based query translation methods and further that in this context MT is largely robust to queries with little linguistic structure.
Exploring the use of Machine Translation resources for English-Japanese Cross-Language Information Retrieval

Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, Kazuo Sumita

MT Summit VII Workshop on Machine Translation for Cross Language Information Retrieval 1999年 [査読有り]
日本語情報検索システム評価用テストコレクションの構築

木本晴夫, 小川泰嗣, 石川徹也, 増永良文, 福島俊一, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 木谷強, 三池誠司, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

情報処理学会論文誌 1999年 [査読有り]
機械翻訳を用いた英日・日英言語横断検索に関する一考察

酒井哲也, 梶浦正浩, 住田一男, Jones, G, Collier, N

情報処理学会論文誌 40 ( 11 ) 4075 - 4086 1999年 [査読有り]

CiNii
情報検索システム評価のためのテストコレクション

酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝, 神門典子

Computer Today 1998年 [査読有り]
日本語情報検索システム評価用テストコレクションの構築

木本晴夫, 小川泰嗣, 石川徹也, 増永良文, 福島俊一, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 木谷強, 三池誠司, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

情報学シンポジウム'98 1998年 [査読有り]
ユーザーの要求に応じた情報フィルタリングシステム NEATのプロファイル生成

酒井哲也, Jones, G.J.F, 梶浦正浩, 住田一男

Interaction '98 149 - 152 1998年 [査読有り]

CiNii
Lessons from BMIR-J2: A Test Collection for Japanese IR Systems

Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuo Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Tetsuya Sakai, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata

ACM SIGIR '98 1998年 [査読有り]
Experiments in Japanese Text Retrieval and Routing using the NEAT System

Gareth Jones, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

ACM SIGIR '98 1998年 [査読有り]
Application of Query Expansion Techniques in Probabilistic Japanese News Filtering

Tetsuya Sakai, Gareth Jones, Masahiro Kajiura, Kazuo Sumita

IRAL '98 1998年 [査読有り]
情報フィルタリングのためのブール式と文書構造を利用した検索条件生成と検索精度評価

酒井哲也, 梶浦正浩, 住田一男

情報処理学会論文誌 1998年 [査読有り]
日本語テキスト情報検索システムの評価用テストコレクション

酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝

アドバンストデータベースシンポジウム'98, パネル：マルチメディア情報検索ベンチマークの未来 1998年 [査読有り]
WWW上のフロー情報を対象にした情報フィルタ (FreshEye)

住田一男, 上原龍也, 小野顕司, 酒井哲也, 池田朋男, 下郡信宏

インタラクション'97 1997年 [査読有り]
日本語情報検索システム評価用テストコレクションBMIR-J1

福島俊一, 小川泰嗣, 石川徹也, 増永良文, 木本晴夫, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 三池誠司, 酒井哲也, 木谷強, 徳永健伸, 鶴岡弘, 安形輝

自然言語処理シンポジウム'96 1996年 [査読有り]
A User Interface for Generating Dynamic Abstracts of Retrieved Documents

Tetsuya Sakai, Etsuo Itoh, Seiji Miike, Kazuo Sumita

47th FID 1994年 [査読有り]

▼全件表示

書籍等出版物

Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

Tetsuya Sakai, Emi Ishita, Hiroaki Ohshima, Faegheh Hasibi, Jiaxin Mao, Joemon Jose

SIGIR-AP 2024年
Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

Qingyao Ai, Yiqin Liu, Alistair Moffa, Xuanjing Huang, Tetsuya Sakai, Justin Zobel

SIGIR-AP 2023年
Proceedings of ACM SIGIR 2021

Fernando, Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, Tetsuya Sakai, Alejandro Bellogín, Masaharu Yoshioka

2021年
Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact

Tetsuya Sakai, Douglas W. Oard, Noriko Kando

Springer 2020年
Proceedings of the Open-Source IR Replicability Challenge (OSIRRC 2019)

Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

2019年
U-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
Q-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
Expected Reciprocal Rank. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
ERR-IA. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
D-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
alpha-nDCG. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
Advanced Information Retrieval Measures. In: Liu L., Özsu M. (eds) Encyclopedia of Database Systems (Second Edition)

Tetsuya Sakai

Springer 2018年
Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power

Tetsuya Sakai

Springer 2018年
Proceedings of AIRS 2018 (LNCS 11292)

Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

2018年
Proceedings of ACM SIGIR 2017

Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de, Vries, A.P, Ryen W. White

2017年
人工知能学大事典

人工知能学会

共立出版 2017年
Proceedings of SPIRE 2016 (LNCS 9954)

Shunsuke Inegaga, Kunihiko Sadakane, Tetsuya Sakai

Springer 2016年
情報アクセス評価方法論～検索エンジンの進歩のために～,

酒井哲也

コロナ社 2015年
Proceedings of ACM SIGIR 2013

Gareth J.F. Jones, Páraic Sheridan, Diane Kelly, Maarten de Rijke, Tetsuya Sakai

2013年
Proceedings of NTCIR-10

Noriko Kando, Kazuaki Kishida, Eric Tang, Tetsuya Sakai, Makoto P. Kato, Ka Po Chow, Isao Goto, Yotaro Watanabe, Tomoyosi Akiba, Hiromitsu Nishizaki, Akiko Aizawa, Mizuki Morita, Eiji Aramaki

2013年
Proceedings of NTCIR-9

Noriko Kando, Daisuke Ishikawa, Miho Sugimoto, Fredric C. Gey, Tetsuya Sakai, Tomoyosi Akiba, Hideki Shima, Shlomo Geva, Eric Tang, Andrew Trotman, Tsuneaki Kato, Bin Lu, Isao Goto

2011年
Proceedings of the 3rd International Workshop on Evaluating Information Access (EVIA 2010)

Tetsuya Sakai, Mark Sanderson, William Webber, Noriko Kando, Kazuaki Kishida

2010年
Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation,

Shlomo Geva, Jaap Kamps, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

2009年
5th Asia Information Retrieval Symposium (AIRS 2009)

Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, Tetsuya Sakai

Springer 2009年
Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation

Shlomo Geva, Jaap Kamps, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

2009年
言語処理学辞典

共同執筆

共立出版 2009年
Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)

Tetsuya Sakai, Mark Sanderson, Noriko Kando, Miho Sugimoto

2008年
Proceedings of AIRS 2008 (LNCS 4993)

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

2008年
Proceedings of the First International Workshop on Evaluating Information Access (EVIA 2007),

Tetsuya Sakai, Mark Sanderson, David Kirk Evans

2007年

▼全件表示

講演・口頭発表等

マルチモーダル大規模言語モデルに基づく論理的および構造的異常の説明可能な検知

藤井野枝子, 酒井哲也

DEIM2025

発表年月： 2025年
A Comparative Study of Group Fairness Measures for Information Retrieval

陶思捷, 酒井哲也

DEIM2025

発表年月： 2025年
Webサイト内URLを用いたColBERTによる文書検索の精度向上

石川敦也, 酒井哲也

DEIM2025

発表年月： 2025年
日本語におけるadversarial text生成手法の検討

真壁潤, 酒井哲也

DEIM2025

発表年月： 2025年
グループフェアなウェブ検索と会話型検索

酒井哲也, Sijie Tao, Hanpei Fang, Yuxiang Zhang

情報処理学会

発表年月： 2024年

開催年月：
2024年

　

　
Cross-lingual Relevance Estimation with Soft and Hard Prompts

Hanpei Fang, Tetsuya Sakai

DEIM

発表年月： 2024年

開催年月：
2024年

　

　
A Study on Automatic Nugget Weight Generation with LLMs

Kai-xin Chang, Tetsuya Sakai

DEIM

発表年月： 2024年

開催年月：
2024年

　

　
中間音楽生成のための Transformer モデル

劉明明, 酒井哲也

DEIM

発表年月： 2024年

開催年月：
2024年

　

　
思い出せない映画に特化した情報検索システムの作成

吉越玲士, 酒井哲也

DEIM

発表年月： 2024年

開催年月：
2024年

　

　
RSLTOT at the TREC 2023 ToT Track

Reo Yoshikoshi, Tetsuya Sakai

TREC

発表年月： 2024年

開催年月：
2024年

　

　
A Practical Guide to Computing Evaluation Measures and Comparing Systems: Twelve Small Tips

酒井哲也 [招待有り]

NTCIR-17

発表年月： 2023年12月

開催年月：
2023年12月

　

　
Overview of the NTCIR-17 FairWeb-1 Task

Sijie Tao, Nuo Chen, Tetsuya Sakai, Zhumin Chu, Hiromi Arai, Ian Soboroff, Nicola Ferro, Maria Maistro

NTCIR-17

発表年月： 2023年
RSLFW at the NTCIR-17 FairWeb 1 Task

Fan Li, Kaize Shi, Kenta Inaba, Sijie Tao, Nuo Chen, Tetsuya Sakai

NTCIR-17

発表年月： 2023年
Evaluating Parrots and Sociopathic Liars (keynote)

Tetsuya Sakai [招待有り]

ACM ICTIR

発表年月： 2023年
On A Few Responsibilities of (IR) Researchers: Fairness, Awareness, and Sustainability (keynote)

Tetsuya Sakai [招待有り]

ECIR

発表年月： 2023年
ウェブ検索結果がユーザの意見形成に及ぼす影響の調査

稲葉健太, 酒井哲也

DEIM

発表年月： 2023年
SWAN: A Generic Framework for Auditing Textual Conversational Systems

Tetsuya Sakai

arXiv, Cornell University

発表年月： 2023年
Evaluating Evaluation Measures, Evaluating Information Access Systems, Designing and Constructing Test Collections, and Evaluating Again

Tetsuya Sakai [招待有り]

Proceedings of NTCIR-16

発表年月： 2022年
グループフェアネスを考慮したウェブ検索タスク

酒井哲也

情報処理学会研究報告

発表年月： 2022年
Overview of the NTCIR-16 WeWantWeb with CENTRE (WWW-4) Task

Tetsuya Sakai, Sijie Tao, Zhumin Chu, Maria Maistro, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

Proceedings of NTCIR-16

発表年月： 2022年
SLWWW at the NTCIR-16 WWW-4 Task

Yuya Ubukata, Masaki Muraoka, Sijie Tao, Tetsuya Sakai

Proceedings of NTCIR-16

発表年月： 2022年
RSLDE at the NTCIR-16 DialEval-2 Task

Fan Li, Tetsuya Sakai

Proceedings of NTCIR-16

発表年月： 2022年
Overview of the NTCIR-16 Dialogue Evaluation (DialEval-2) Task

Sijie Tao, Tetsuya Sakai

Proceedings of NTCIR-16

発表年月： 2022年
On Variants of Root Normalised Order-aware Divergence and a Divergence based on Kendall’s Tau

Tetsuya Sakai

arXiv:2204.07304

発表年月： 2022年
A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

Tetsuya Sakai, Jin Young Kim, Inho Kang

arXiv:2204.00280

発表年月： 2022年
Transformerを用いた文書の自動品質評価

吉越玲士, 酒井哲也

DEIM 2022

発表年月： 2022年
NTCIR-16ウェブ検索・再現可能性タスク (WWW-4) および対話評価タスク (DialEval-2)への誘い

酒井哲也

情報処理学会研究報告

発表年月： 2021年
対話要約における話者情報を持つEmbeddingの効果

楢木悠士, 酒井哲也, 林良彦

FIT2021講演論文集

発表年月： 2021年
RealSakaiLab at the TREC 2020 Health Misinformation Track

Sijie Tao, Tetsuya Sakai

Proceedings of TREC 2020

発表年月： 2021年
話者情報を認識した対話要約

楢木悠士, 酒井哲也

言語処理学会第27回年次大会発表論文集

発表年月： 2021年
Voice Assistantアプリの対話型解析システムの開発

刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

電子情報通信学会第54回情報通信システムセキュリティ研究会

発表年月： 2021年
モバイルアプリケーションにおけるUIデザイン自動評価の検討

栗林峻, 酒井哲也

DEIM 2021

発表年月： 2021年
スタンス検出タスクにおける評価方法の選定

雨宮佑基, 酒井哲也

DEIM 2021

発表年月： 2021年
日経新聞の記事からの日経ラジオ用読み原稿の自動生成

清水嶺, 酒井哲也

DEIM 2021

発表年月： 2021年
有用なレビューを抽出するための比較文フィルタリングの検討

小橋賢介, 雨宮佑基, 酒井哲也

DEIM 2021

発表年月： 2021年
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

情報処理学会研究報告

発表年月： 2021年
Overview of the TREC 2018 CENTRE Track

Ian Soboroff, Nicola Ferro, Maria Maistro, Tetsuya Sakai

Proceedings of TREC 2018

発表年月： 2020年
Improving Concept Representations for Short Text Classification

Sijie Tao, Tetsuya Sakai

言語処理学会第26回年次大会発表論文集

発表年月： 2020年
Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

Shiyoh Goetsu, Tetsuya Sakai

インタラクション 2020

発表年月： 2020年
商品比較のための文脈つき評価軸抽出の検討

小橋賢介, 酒井哲也

DEIM 2020

発表年月： 2020年
Androidアプリの権限要求に対する説明十分性の自動確認システムの提案

小島智樹, 酒井哲也

DEIM 2020

発表年月： 2020年
Purchase Prediction based on Recurrent Neural Networks with an Emphasis on Recent User Activities

Quanyu Piao, Joo-Young Lee, Tetsuya Sakai

DEIM 2020

発表年月： 2020年
Experiments on Unsupervised Text Classification based on Graph Neural Networks

Haoxiang Shi, Cen Wang, Tetsuya Sakai

DEIM 2020

発表年月： 2020年
Do Neural Models for Response Generation Fully Exploit the Input Natural Language Text?

Lingfeng Zhang, Tetsuya Sakai

DEIM 2020

発表年月： 2020年
商品検索におけるゼロマッチ解消のためのデータセット構築の検討

雨宮佑基, 真鍋知博, 藤田澄男, 酒井哲也

DEIM 2020

発表年月： 2020年
解釈可能な内部表現を使用したタスク指向ニューラル対話システムの試作

村田憲俊, 酒井哲也

DEIM 2020

発表年月： 2020年
Response Generation based on the Big Five Personality Traits

Wanqi Wu, Tetsuya Sakai

DEIM 2020

発表年月： 2020年
Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

Shiyoh Goetsu, Tetsuya Sakai

arXiv

発表年月： 2020年
selt Team’s Entity Linking System at the NTCIR-15 QALab-PoliInfo2

Yuji Naraki, Tetsuya Sakai

Proceedings of NTCIR-15

発表年月： 2020年
SLWWW at the NTCIR-15WWW-3 Task

Masaki Muraoka, Zhaohao Zeng, Tetsuya Sakai

Proceedings of NTCIR-15

発表年月： 2020年
Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task

Tetsuya Sakai, Sijie Tao, Zhaohao Zeng, Yukun Zheng, Jiaxin Mao, Zhumin Chu, Yiqun Liu, Maria Maistro, Zhicheng Dou, Nicola Ferro, Ian Soboroff

Proceedings of NTCIR-15

発表年月： 2020年
RSLNV at the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

Ting Cao, Fan Zhang, Haoxiang Shi, Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Injae Lee, Kyungduk Kim, Inho Kang

Proceedings of NTCIR-15

発表年月： 2020年
SKYMN at the NTCIR-15 DialEval-1 Task

Junjie Wang, Yuxiang Zhang, Tetsuya Sakai, Hayato Yamana

Proceedings of NTCIR-15

発表年月： 2020年
Overview of the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Inho Kang

Proceedings of NTCIR-15

発表年月： 2020年
ユーザの感覚に近い多様化検索評価指標

酒井哲也, Zhaohao Zeng

FIT2020講演論文集

発表年月： 2020年
On Fuhr’s Guideline for IR Evaluation

Tetsuya Sakai

SIGIR Forum

発表年月： 2020年
擬似アノテーションにもとづく日本語ツイートの極性判定

小橋賢介, 酒井哲也

DEIM 2019

発表年月： 2019年
FigureQAタスクにおける抽象画像を考慮したアプローチ

坂本凜, 酒井哲也

DEIM 2019

発表年月： 2019年
Convolutional Neural Networkを用いたFake News Challengeの検討

雨宮佑基, 酒井哲也

DEIM 2019

発表年月： 2019年
音声ユーザインタフェースにおける処理エラーによるユーザフラストレーションに関する調査

呉越思瑶, 酒井哲也

DEIM 2019

発表年月： 2019年
Query-Focused Extractive Summarization based on Deep Learning: Comparison of Similarity Measures for Pseudo Ground Truth Generation

Yuliska, Tetsuya Sakai

DEIM 2019

発表年月： 2019年
Exploring Multi-label Classification Using Text Graph Convolutional Networks on the NTCIR-13 MedWeb Dataset

Sijie Tao, Tetsuya Sakai

DEIM 2019

発表年月： 2019年
Androidアプリの権限要求に対するユーザーへの説明の補完

小島智樹, 酒井哲也

DEIM 2019

発表年月： 2019年
能動学習を利用した未知語アノテーションの検討

黒澤瞭佑, 酒井哲也

DEIM 2019

発表年月： 2019年
Dialogue Quality Distribution Prediction based on a Loss that Compares Adjacent Probability Bins

河東宗祐, 酒井哲也

DEIM 2019

発表年月： 2019年
Twitterコーパスに基づく雑談対話システムにおける多様性の獲得

村田憲俊, 酒井哲也

DEIM 2019

発表年月： 2019年
文書分類技術に基づくエントリーシートからの業界推薦

三王慶太, 酒井哲也

DEIM 2019

発表年月： 2019年
Graded Relevance Assessments and Graded Relevance Measures of NTCIR: A Survey of the First Twenty Years

Tetsuya Sakai

arXiv:1903.11272

発表年月： 2019年
RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

arXiv:1905.01799

発表年月： 2019年
Overview of the NTCIR-14 CENTRE Task

Tetsuya Sakai, Nicola Ferro, Ian Soboroff, Zhaohao Zeng, Peng Xiao, Maria Maistro

Proceedings of NTCIR-14

発表年月： 2019年
Overview of the NTCIR-14 We Want Web Task

Jiaxin Mao, Tetsuya Sakai, Cheng Luo, Peng Xiao, Yiqun Liu, Zhicheng Dou

Proceedings of NTCIR-14

発表年月： 2019年
Overview of the NTCIR-14 Short Text Conversation Task: Dialogue Quality and Nugget Detection Subtasks

Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai

Proceedings of NTCIR-14

発表年月： 2019年
SLSTC at the NTCIR-14 STC-3 Dialogue Quality and Nugget Detection Subtasks

Sosuke Kato, Rikiya Suzuki, Zhaohao Zeng, Tetsuya Sakai

Proceedings of NTCIR-14

発表年月： 2019年
SLWWW at the NTCIR-14 We Want Web Task

Peng Xiao, Tetsuya Sakai

Proceedings of NTCIR-14

発表年月： 2019年
NTCIR-15ウェブ検索・再現可能性タスク (WWW-3) および対話評価タスク (DialEval-1)への誘い

酒井哲也

情報処理学会研究報告2019-IFAT-136

発表年月： 2019年
Overview of the TREC 2018 CENTRE Track

Ian Soboroff, Nicola Ferro, Maria Maistro, Tetsuya Sakai

Proceedings of TREC 2018

発表年月： 2019年
クリックと放棄に基づくモバイルバーティカルの順位付け

川崎真未, Inho Kang, 酒井哲也

DEIM 2018

発表年月： 2018年
Generative Adversarial Nets を用いた文書分類の検証

小島智樹, 酒井哲也

DEIM 2018

発表年月： 2018年
単語レベルと文字レベルの情報を用いた日本語対話システムの試作

村田憲俊, 酒井哲也

DEIM 2018

発表年月： 2018年
Classifying Community QA QuestionsThat Contain an Image

Kenta Tamaki, Riku Togashi, Sumio Fujita, Hideyuki Maeda, Tetsuya Sakai

DEIM 2018

発表年月： 2018年
ユーザーのニーズに合わせたインタラクティブな推薦システムの提案

呉越思瑶, 酒井哲也

DEIM 2018

発表年月： 2018年
Report on NTCIR-13: The Thirteenth Round of NII Testbeds and Community for Information Access Research

Yiqun Liu, Makoto P. Kato, Charles L.A. Clarke, Noriko Kando, Tetsuya Sakai

SIGIR Forum 52(1) 2018

発表年月： 2018年
A Comparative Study of Deep Learning Approaches for Visual Question Classification in Community QA

Hsin-Wen Liu, Avikalp Srivastava, Sumio Fujita, Toru Shimizu, Riku Togashi, Tetsuya Sakai

IPSJ SIG Technical Report 2018-IFAT-132 (17)

発表年月： 2018年
対話破綻検出コーパスに対する学習データ選別の検討

河東宗祐, 酒井哲也

情報処理学会研究報告 2018-IFAT-132 (28)

発表年月： 2018年
色・形状・テクスチャに基づく画像検索の自動評価と多様化

富樫陸, 藤田澄男, 酒井哲也

情報処理学会研究報告 2018-IFAT-132 (12)

発表年月： 2018年
Androidアプリのレビューを用いたユーザーへの権限説明の補完

小島智樹, 酒井哲也

情報処理学会研究報告

発表年月： 2018年
評価実験の設計と論文での結果報告: きちんとやっていますか?

酒井哲也

第3回自然言語処理シンポジウム

発表年月： 2017年
Report on NTCIR-12: The Twelfth Round of NII Testbeds and Community for Information Access Research

Makoto P. Kato, Kazuaki Kishida, Noriko Kando, Tetsuya Sakai, Mark Sanderson

SIGIR Forum 50 (2)

発表年月： 2017年
ツイートにおける周辺単語の感情極性値を用いた新語の感情推定

黒澤瞭佑, 酒井哲也

DEIM 2017

発表年月： 2017年
解答検証を利用した選択式問題への自動解答

佐藤航, 酒井哲也

DEIM 2017

発表年月： 2017年
英日言語横断検索におけるクエリ拡張結果の詳細分析

玉置賢太, 酒井哲也

DEIM 2017

発表年月： 2017年
アノテーション分布を考慮した対話破綻検出

河東宗祐, 酒井哲也

DEIM 2017

発表年月： 2017年
拡張クエリを用いたレシピ検索のパーソナライゼーション

犬塚眞太郎, 酒井哲也

DEIM 2017

発表年月： 2017年
クリックに基づく選好グラフを用いたバーティカル適合性推定

門田見侑大, 吉田泰明, 藤田澄男, 酒井哲也

DEIM 2017

発表年月： 2017年
複数人で睡眠習慣改善に臨む際の人間関係と協調の効果

飯島聡美, 酒井哲也

DEIM 2017

発表年月： 2017年
Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

情報処理学会研究報告 2017-NL-232

発表年月： 2017年
Ranking Rich Mobile Verticals based on Clicks and Abandonment

Mami Kawasaki, Inho Kang, Tetsuya Sakai

情報処理学会研究報告 2017-IFAT-127

発表年月： 2017年
Overview of the NTCIR-13 Short Text Conversation Task

Lifeng Shang, Tetsuya Sakai, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao, Yuki Arase, Masako Nomoto

Proceedings of NTCIR-13

発表年月： 2017年
Overview of the NTCIR-13 We Want Web Task

Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong, Jingfang Xu

Proceedings of NTCIR-13

発表年月： 2017年
SLOLQ at the NTCIR-13 OpenLiveQ Task

Ryo Kashimura, Tetsuya Sakai

Proceedings of NTCIR-13

発表年月： 2017年
SLQAL at the NTCIR-13 QA Lab-3 Task

Kou Sato, Tetsuya Sakai

Proceedings of NTCIR-13

発表年月： 2017年
SLSTC at the NTCIR-13 STC Task

Jun Guan, Tetsuya Sakai

Proceedings of NTCIR-13

発表年月： 2017年
SLWWW at the NTCIR-13 WWW Task

Peng Xiao, Lingtao Li, Yimeng Fan, Tetsuya Sakai

Proceedings of NTCIR-13

発表年月： 2017年
Project Next IR -情報検索の失敗分析‐

難波英嗣, 酒井哲也, 神門典子

情報処理

発表年月： 2016年
発話者を考慮した学習に基づく対話システムの検討

河東宗祐, 酒井哲也

DEIM 2016

発表年月： 2016年
ショッピングサイトにおける購入予測のための行動パターン分析

出縄弘人, Young-In Song, 酒井哲也

DEIM 2016

発表年月： 2016年
コンテキスト付き検索ログを用いた要求ヴァーティカルの分析

門田見侑大, 吉田泰明, 藤田澄男, 酒井哲也

DEIM 2016

発表年月： 2016年
言語の分散表現と擬似適合性フィードバックを用いた英日言語横断検索

玉置賢太, 林佑明, 酒井哲也

DEIM 2016

発表年月： 2016年
協調型ヘルスケア -規則正しい睡眠による日中の生産性向上

飯島聡美, 酒井哲也

DEIM 2016

発表年月： 2016年
Overview of the NTCIR-12 Short Text Conversation Task

Lifeng Shang, Tetsuya Sakai, Zhengdong Lu, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao

NTCIR-12

発表年月： 2016年
Overview of the NTCIR-12 MobileClick Task

Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita, Sumio Fujita

NTCIR-12

発表年月： 2016年
NEXTI at NTCIR-12 IMine-2 Task

Hidetsugu Nanba, Tetsuya Sakai, Noriko Kando, Atsushi Keyaki, Koji Eguchi, Kenji Hatano, Toshiyuki Shimizu, Yu Hirate, Atsushi Fujii

NTCIR-12

発表年月： 2016年
SLQAL at the NTCIR-12 QALab-2 Task

Shin Higuchi, Tetsuya Sakai

NTCIR-12

発表年月： 2016年
SLSTC at the NTCIR-12 STC Task

Hiroto Denawa, Tomoaki Sano, Yuta Kadotami, Sosuke Kato, Tetsuya Sakai

NTCIR-12

発表年月： 2016年
SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask

Satomi Iijima, Tetsuya Sakai

NTCIR-12

発表年月： 2016年
Evaluating Helpdesk Dialogues: Initial Considerations from An Information Access Perspective

Tetsuya Sakai, Zhaohao Zeng, Cheng Luo

情報処理学会研究報告

発表年月： 2016年
word2vecによる発話ベクトルの類似度を用いた対話破綻予測

河東宗祐, 酒井哲也

人工知能学会音声・言語理解と対話処理研究会（SLUD）第78回研究会 (第7回対話システムシンポジウム),

発表年月： 2016年
TREC 2014 Temporal Summarization Track Overview

Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Richard, McCreadie, Tetsuya Sakai

TREC 2014

発表年月： 2015年
言語の分散表現による文脈情報を利用した言語横断情報検索

林佑明, 酒井哲也

DEIM Forum 2015

発表年月： 2015年
情報検索のエラー分析

難波英嗣, 酒井哲也

言語処理学会第２１回年次大会ワークショップ

発表年月： 2015年
Topic Set Size Design with the Evaluation Measures for Short Text Conversation

Tetsuya Sakai

情報処理学会研究報告

発表年月： 2015年
ECol 2015: First International Workshop on the Evaluation of Collaborative Information Seeking and Retrieval

Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine

ACM CIKM 2015

発表年月： 2015年
TREC 2013 Temporal Summarization

Javd Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai

TREC 2013

発表年月： 2014年
映像入力デバイスを悪用する Android アプリの解析と対策法

渡邉卓弥, 森達哉, 酒井哲也

信学技報

発表年月： 2014年
Androidアプリの説明文とプライバシー情報アクセスの相関分析

渡邉卓弥, 秋山満昭, 酒井哲也, 鷲崎弘宜, 森達哉

マルウェア対策研究人材育成ワークショップ 2014

発表年月： 2014年
Overview of the NTCIR-11 MobileClick Task

Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

NTCIR-11

発表年月： 2014年
A Preview of the NTCIR-10 INTENT-2 Results

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, Mayu Iwata

情報処理学会研究報告

発表年月： 2013年
Overview of the NTCIR-10 INTENT-2 Task

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, Mayu Iwata

NTCIR-10

発表年月： 2013年
Overview of the NTCIR-10 1CLICK-2 Task

Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

NTCIR-10

発表年月： 2013年
Microsoft Research Asia at the NTCIR-10 Intent Task

Kosetsu Tsukuda, Zhicheng Dou, Tetsuya Sakai

NTCIR-10

発表年月： 2013年
MSRA at NTCIR-10 1CLICK-2

Kazuya Narita, Tetsuya Sakai, Zhicheng Dou, Young-In Song

NTCIR-10

発表年月： 2013年
How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures

Tetsuya Sakai

情処学会研究報告

発表年月： 2013年
モバイル「情報」検索に向けて: NTCIR-11 MobileClickタスクへの誘い

加藤誠, Matthew Ekstrand-Abueg, Virgil Pavlu, 酒井哲也, 山本岳洋, 岩田麻佑

人工知能学会第5回インタラクティブ情報アクセスと可視化マイニング研究会

発表年月： 2013年
曖昧なクエリと(不)明快なクエリ:NTCIR-10 INTENT-2と1CLICK-2タスクへの誘い

酒井哲也

情報処理学会研究報告

発表年月： 2012年
NTCIR-9総括と今後の展望

酒井哲也, 上保秀夫, 神門典子, 加藤恒昭, 相澤彰子, 秋葉友良, 後藤功雄, 木村文則, 三田村照子, 西崎博光, 嶋秀樹, 吉岡真治, Shlomo Geva, Ling-Xiang Tang, Andrew Trotman, Yue Xu

情報処理学会研究報告

発表年月： 2012年
Frontiers, Challenges, and Opportunities for Information Retrieval: Report from SWIRL 2012 The Second Strategic Workshop on Information Retrieval in Lorne,

Allan, J, Aslam, J, Azzopardi, L, Belkin, N, Borlund, P, Bruza, P, Callan, J, Carman, M, Clarke, C.L.A, Craswell, N. Croft, W, B, Culpepper, J.S, Diaz, F, Dumais, S, Ferro, N, Geva, S, Gonzalo, J, Hawking, D, Jarvelin, K, Jones, G, Jones, R, Kamps, J, Kando, N, Kanoulas, N, Karlgren, J, Kelly, D, Lease, M, Lin, J, Mizzaro, S, Moffat, A, Murdock, V, Oard, D.W, de Rijke, M, Sakai, T, Sanderson, M, Scholer, F, Si, L, Thom, J.A, Thomas, P, Trotman, A, Turpin, A

SIGIR Forum

発表年月： 2012年
The Reusability of a Diversified Search Test Collection

Tetsuya Sakai

情報処理学会研究報告

発表年月： 2012年
One Click One Revisited: Enhancing Evaluation based on Information Units

Tetsuya Sakai, Makoto P. Kato

情報処理学会研究報告

発表年月： 2012年
複数判定者によるコミュニティQAの良質回答の判定

石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

情報知識学会誌

発表年月： 2011年
Japanese Hyponymy Extraction based on a Term Similarity Graph

Takuya Akiba, Tetsuya Sakai

情報処理学会研究報告

発表年月： 2011年
Overview of NTCIR-9

Tetsuya Sakai, Hideo Joho

NTCIR-9 Proceedings

発表年月： 2011年
Overview of the NTCIR-9 INTENT Task

Ruihua Song, Min Zhang, Tetsuya Sakai, Makoto P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang, Naoki Orii

NTCIR-9 Proceedings

発表年月： 2011年
Overview of NTCIR-9 1CLICK

Tetsuya Sakai, Makoto P. Kato, Young-In Song

NTCIR-9 Proceedings

発表年月： 2011年
Microsoft Research Asia at the NTCIR-9 1CLICK Task

Naoki Orii, Young-In Song, Tetsuya Sakai

NTCIR-9 Proceedings

発表年月： 2011年
Microsoft Research Asia at the NTCIR-9 Intent Task

Jialong Han, Qinglei Wang, Naoki Orii, Zhicheng Dou, Tetsuya Sakai, Ruihua Song

NTCIR-9 Proceedings

発表年月： 2011年
TTOKU Summarization Based Systems at NTCIR-9 1CLICK Task

Hajime Morita, Takuya Makino, Tetsuya Sakai, Hiroya Takamura, Manabu Okumura

NTCIR-9 Proceedings

発表年月： 2011年
Grid-based Interaction for NTCIR-9 VisEx Task,

Hideo Joho, Tetsuya Sakai

NTCIR-9 Proceedings

発表年月： 2011年
NTCIR-9 VisEx におけるグリッド型インタラクションモデルの研究

上保秀夫, 酒井哲也

人工知能学会情報編纂研究会第７回究会

発表年月： 2011年
Q&Aサイトにおけるベストアンサー推定の分析とその機械学習への応用

石川大介, 栗山和子, 酒井哲也, 関洋平, 神門典子

情報知識学会年次大会予稿

発表年月： 2010年
Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access

Teruko Mitamura, Hideki Shima, Tetsuya Sakai, Noriko Kando, Tatsunori Mori, Koichi Takeda, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Cheng-Wei Lee

NTCIR-8 Proceedings

発表年月： 2010年
Overview of NTCIR-8 ACLIA IR4QA

Tetsuya Sakai, Hideki Shima, Noriko Kando, Ruihua Song, Chuan-Jia Lin, Teruko Mitamura, Miho Sugimoto, Cheng-Wei Lee

NTCIR-8 Proceedings

発表年月： 2010年
NTCIR-GeoTime Overview: Evaluating Geographic and Temporal Search

Fredric Gey, Ray Larson, Noriko Kando, Jorge Machado, Tetsuya Sakai

NTCIR-8 Proceedings

発表年月： 2010年
Overview of the NTCIR-8 Community QA Pilot Task (Part I):

Daisuke Ishikawa, Tetsuya Sakai, Noriko Kando

The Test Collection and the Task, NTCIR-8 Proceedings

発表年月： 2010年
Overview of the NTCIR-8 Community QA Pilot Task (Part II)

Tetsuya Sakai, Daisuke Ishikawa, Noriko Kando

System Evaluation, NTCIR-8 Proceedings

発表年月： 2010年
Microsoft Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task

Young-In Song, Jing Liu, Tetsuya Sakai, Xin-Jing Wang, Guwen Feng, Yunbo Cao, Hisami Suzuki, Chin-Yew Lin

NTCIR-8 Proceedings

発表年月： 2010年
Mutilinguality at NTCIR, and moving on... (invited talk),

Tetsuya Sakai [招待有り]

Proceedings of the COLING 2010 Fourth Workshop on Cross Lingual Information Access

発表年月： 2010年
EVIA 2010: The Third International Workshop on Evaluating Information Access

William Webber, Tetsuya Sakai, Mark Sanderson

ACM SIGIR Forum

発表年月： 2010年
ウィキペディアを活用した探検型検索サイトのクエリログ分析

酒井哲也, 野上謙一

情報処理学会研究報告

発表年月： 2009年
NTCIR-7 ACLIA IR4QA Results based on Qrels Version 2

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, Teruko Mitamura

NTCIR-7 Online Proceedings

発表年月： 2009年
EVIA 2008: The Second International Workshop on Evaluating Information Access

Tetsuya Sakai, Mark Sanderson, Noriko Kando

ACM SIGIR Forum

発表年月： 2009年
Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, Teruko Mitamura

情報処理学会研究報告

発表年月： 2009年
Report on the SIGIR 2009 Workshop on the Future of IR Evaluation,

Jaap Kamps, Shlomo Geva, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

ACM SIGIR Forum

発表年月： 2009年
チュートリアル情報検索テストコレクションと評価指標

酒井哲也

情報処理学会研究報告

発表年月： 2008年
Comparing Metrics across TREC and NTCIR: The Robustness to System Bias

Tetsuya Sakai

情報処理学会研究報告

発表年月： 2008年
Breaking News from NTCIR-7 (in Japanese),

酒井哲也, 加藤恒昭, 藤井敦, 難波英嗣, 関洋平, 三田村照子, 神門典子

ディジタル図書館編集委員会

発表年月： 2008年
Are Popular Documents More Likely To Be Relevant? A Dive into the ACLIA IR4QA Pools

Tetsuya Sakai, Noriko Kando

Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)

発表年月： 2008年
Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access

Teruko Mitamura, Eric Nyberg, Hideki Shima, Tsuneaki Kato, Tatsunori Mori, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Tetsuya Sakai, Donghong Ji, Noriko Kando

NTCIR-7 Proceedings

発表年月： 2008年
Overview of the NTCIR-7 ACLIA IR4QA Task

Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen, Eric Nyberg

NTCIR-7 Proceedings

発表年月： 2008年
効率的な番組視聴を支援するための話題ラベルの生成とその評価

小山誠, 酒井哲也, 福井美佳, 上原龍也, 下森大志

情報処理学会研究報告

発表年月： 2007年
Toshiba BRIDJE at NTCIR-6 CLIR: The Head/Lead Method and Graded Relevance Feedback

Tetsuya Sakai, Makoto Koyama, Tatsuya Izuha, Akira Kumano, Toshihiko Manabe, Tomoharu Kokubu

NTCIR-6 Proceedings

発表年月： 2007年
A Further Note on Alternatives to Bpref

Tetsuya Sakai, Noriko Kando

情報処理学会研究報告

発表年月： 2007年
EVIA 2007: The First International Workshop on Evaluating Information Access

Mark Sanderson, Tetsuya Sakai, Noriko Kando

ACM SIGIR Forum

発表年月： 2007年
Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

Tetsuya Sakai

IPSJ SIG Technical Report

発表年月： 2006年
質問応答型検索の音声認識誤りに対するロバスト性向上

筒井秀樹, 真鍋俊彦, 福井美佳, 藤井寛子, 浦田耕二, 酒井哲也

情報処理学会研究報告

発表年月： 2005年
文書分類技法とそのアンケート分析への応用

平澤茂一, 石田崇, 足立鉱史, 後藤正幸, 酒井哲也

経営情報学会2005年度春季全国研究発表大会

発表年月： 2005年
インターネットを用いた研究支援環境～情報検索システム～

石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

経営情報学会2005年度春季全国研究発表大会

発表年月： 2005年
質問応答システムの正解順位とユーザ満足率の関係について

國分智晴, 酒井哲也, 齋藤佳美, 筒井秀樹, 真鍋俊彦, 藤井寛子

情報処理学会研究報告

発表年月： 2005年
教学支援システムに関する学生アンケートの分析

渡辺智幸, 後藤正幸, 石田崇, 酒井哲也, 平澤茂一

FIT 2005 一般講演論文集

発表年月： 2005年
The Effect of Topic Sampling in Sensitivity Comparisons of Information Retrieval Metrics

Tetsuya Sakai

情報処理学会研究報告

発表年月： 2005年
Toshiba BRIDJE at NTCIR-5: Evaluation using Geometric Means

Tetsuya Sakai, Toshihiko Manabe, Akira Kumano, Makoto Koyama, Tomoharu Kokubu

NTCIR-5 Proceedings

発表年月： 2005年
質問応答技術に基づくマルチモーダルヘルプシステム

浦田耕二, 福井美佳, 藤井寛子, 鈴木優, 酒井哲也, 齋藤佳美, 市村由美, 佐々木寛

情報処理学会研究報告

発表年月： 2004年
質問応答と，日本語固有表現抽出および固有表現体系の関係についての考察

市村由美, 齋藤佳美, 酒井哲也, 國分智晴, 小山誠

情報処理学会研究報告

発表年月： 2004年
Toshiba BRIDJE at NTCIR-4 CLIR: Monolingual/Bilingual IR and Flexible Feedback

Tetsuya Sakai, Makoto Koyama, Akira Kumano, Toshihiko Manabe

NTCIR-4 Proceedings

発表年月： 2004年
Toshiba ASKMi at NTCIR-4 QAC2

Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, Tomoharu Kokubu

NTCIR-4 Proceedings

発表年月： 2004年
自然言語表現に基づく学生アンケート分析システム

酒井哲也, 石田崇, 後藤正幸, 平澤茂一

FIT 2004 一般講演論文集 N-021

発表年月： 2004年
新聞記事からの用語定義の抽出と固有表現クラスに基づく分類

小山誠, 酒井哲也, 真鍋俊彦

情報処理学会研究報告

発表年月： 2004年
High-Precision Search via Question Abstraction for Japanese Question Answering

Tetsuya Sakai, Yoshimi Saito, Tomoharu Kokubu, Makoto Koyama, Toshihiko Manabe

情報処理学会研究報告

発表年月： 2004年
情報検索技術を用いた選択式・自由記述式の学生アンケート解析

石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

経営情報学会2004年度秋季全国研究発表大会

発表年月： 2004年
A Note on the Reliability of Japanese Question Answering Evaluation

Tetsuya Sakai

情報処理学会研究報告

発表年月： 2004年
情報検索技術を用いた効率的な授業アンケートの分析

酒井哲也, 伊藤潤, 後藤正幸, 石田崇, 平澤茂一

経営情報学会2003年度春季全国研究発表大会

発表年月： 2003年
選択式・記述式アンケートからの知識発見

後藤正幸, 酒井哲也, 伊藤潤, 石田崇, 平澤茂一

2003 PCカンファレンス

発表年月： 2003年
授業に関する選択式・記述式アンケートの分析

平澤茂一, 石田崇, 伊藤潤, 後藤正幸, 酒井哲也

私立大学情報教育協会平成15年度大学情報化全国大会

発表年月： 2003年
PLSIを利用した文書からの知識発見

伊藤潤, 石田崇, 後藤正幸, 酒井哲也, 平澤茂一

FIT 2003 一般講演論文集

発表年月： 2003年
質問応答システムにおけるパッセージ検索の評価,

國分智晴, 酒井哲也

FIT 2003 一般講演論文集

発表年月： 2003年
Toshiba KIDS at NTCIR-3: Japanese and English-Japanese IR

Tetsuya Sakai, Makoto Koyama, Mika Suzuki, Toshihiko Manabe

NTCIR-3 Proceedings

発表年月： 2003年
ベイズ統計を用いた文書ファイルの自動分析手法

後藤正幸, 伊藤潤, 石田崇, 酒井哲也, 平澤茂一

経営情報学会2003年度秋季全国研究発表大会

発表年月： 2003年
授業モデルとその検証

石田崇, 伊藤潤, 後藤正幸, 酒井哲也, 平澤茂一

経営情報学会2003年度秋季全国研究発表大会

発表年月： 2003年
係り受け木を用いた日本語文書の重要部分抽出

伊藤潤, 酒井哲也, 平澤茂一

情報処理学会研究報告

発表年月： 2003年
Flexible Pseudo-Relevance Feedback for NTCIR-2

Tetsuya Sakai, Stephen E. Robertson, Stephen Walker

NTCIR-2

発表年月： 2001年
Generic Summaries for Indexing in Information Retrieval - Detailed Test Results

Tetsuya Sakai, Karen Sparck Jones

Computer Laboratory, University of Cambridge

発表年月： 2001年
インターネットを用いた研究活動支援システム

平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

2001 PCカンファランス

発表年月： 2001年
Cross -language情報検索のためのBMIR - J2を用いた一考察

酒井哲也, 梶浦正浩, 住田一男

情報処理学会研究報告

発表年月： 1999年
Probabilistic Retrieval of Japanese News Articles for IREX at Toshiba

Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

IREX Workshop

発表年月： 1999年
Cross-Language Information Retrieval for NTCIR at Toshiba

Tetsuya Sakai, Yasuyo Shibazaki, Masaru Suzuki, Masaharu Kajiura, Toshihiko Manabe, Kazuo Sumita

NTCIR-1

発表年月： 1999年
BMIR-J2: A Test Collection for Evaluation of Japanese Information Retrieval Systems

Tetsuya Sakai, Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuro Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata, Noriko Kando

ACM SIGIR Forum

発表年月： 1999年
First Experiments on the BMIR-J2 Collection using the NEAT System

Gareth Jones, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

情報処理学会研究報告

発表年月： 1998年
Cross-Language Information Access: a case study for English and Japanese

Gareth Jones, Nigel Collier, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita, Hideki Hirakawa

情報処理学会研究報告

発表年月： 1998年
日本語情報検索システム評価用テストコレクションBMIR-J2

木谷強, 小川泰嗣, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

情報処理学会研究報告

発表年月： 1997年
情報フィルタリングシステムNEATの開発

梶浦正浩, 三池誠司, 酒井哲也, 佐藤誠, 住田一男

第54回情報処理学会全国大会

発表年月： 1997年
ベンチマークBMIR-J2を用いた情報フィルタリングシステムNEATの評価

酒井哲也, 梶浦正浩, 三池誠司, 佐藤誠, 住田一男

第54会情報処理学会全国大会

発表年月： 1997年
情報フィルタリングシステムNEATのための検索要求文からのプロファイル生成

酒井哲也, 梶浦正浩, 住田一男

情報処理学会研究報告

発表年月： 1997年
電子図書館のための効率的な文書検索

住田一男, 酒井哲也, 小野顕司, 三池誠司

ディジタル図書館 No.3

発表年月： 1995年
文書検索システムの動的抄録提示インタフェースの評価

酒井哲也, 三池誠司, 住田一男

情報処理学会研究報告ヒューマンコンピュータインタラクション

発表年月： 1994年

▼全件表示

共同研究・競争的資金等の研究課題

ナゲットに基づくタスク指向対話の自動評価に関する研究

日本学術振興会科学研究費助成事業

研究期間:

2017年04月

-

2021年03月

酒井哲也

　概要を見る

コンペティション型国際会議NTCIR-14にてShort Text Conversation (STC-3) タスクをスケジュール通りに運営し、早稲田大学酒井研究室を含む12の研究機関から結果を提出してもらうことができた。このタスクは、顧客・ヘルプデスク間の対話の品質を推定するものであり、この技術は将来的に対話システムの応答戦略に応用可能である。タスクの評価方法については情報検索会議の最高峰SIGIRにて発表を行い、データセットに関してはJournal of Information Processingにてまとめた。後者はWebDB Forum 2018にてbest paper runner-upに選出された。 ・Zeng, Z., Luo, C., Shang, L., Li, H., and Sakai, T.: Towards Automatic Evaluation of Customer-Helpdesk Dialogues, Journal of Information Processing, Volume 26, pp.768-778, 査読あり, 2018. WebDB Forum 2018 Best Paper Runner-up ・Sakai, T.: Comparing Two Binned Probability Distributions for Information Access Evaluation, Proceedings of ACM SIGIR 2018, pp.1073-1076, 査読あり, 2018.以下のスケジュールに沿ってタスク運営を進めることができた。4月データのクローリング＋アノテーションツールの開発、5-8月データのアノテーション、9月学習用データ公開、11月評価用データ公開・結果提出締切、2月タスクオーバービュー論文暫定版公開、3月タスク参加者論文暫定版投稿2019年度の計画は以下の通りである。・NTCIR-14にてタスク運営者およびタスク参加者としての研究成果を発表・対話データセットDCH-1の中英翻訳を進め、より広くの対話研究者が使えるようにする・NTCIR-15における対話タスクの設計と提案、推
利用者の状況を考慮する探索的検索の技術

日本学術振興会科学研究費助成事業

研究期間:

2016年04月

-

2020年03月

神門典子, 吉岡真治, 山本岳洋, 酒井哲也, 相澤彰子, 大島裕明

　概要を見る

（1）ユーザの状況の捕捉：ユーザ実験によって収集したインタラクションデータ(検索行動ログ、視線、検索前後のアンケート・コンセプトマップ・インタビューなど）を用い、検索過程における個々のユーザの状況の捕捉を研究した。タスクの認知的複雑さ（Complexity)、背景知識、タスクの自覚する困難さや満足度との関係を解析した。対象は、法律・学術における「調査」、Web、音楽を取り上げた。中国・清華大の協力により、モバイル端末を用いた検索インタラクションとユーザ背景知識、香港大学の協力により音楽検索などの「楽しみのための探索的検索Search for fun」とタスクの認知的複雑性の影響について検討した。さらに新たなインタラクション環境として、美術館博物館の来館者のタブレット端末を用いたデジタル空間での探索行動と実空間での探索閲覧行動の両面を補足し、それらを用いてよりよい探索エクスペリエンス提供を目指すすサブ課題に着手した。 （2）ユーザの状況に応じて、ユーザを支援する技術：異なる認知的複雑さタスクを選定し、検索過程でユーザを支援する技術のプロトタイプを提案した。具体的には (a)クエリマイニングによる代替検索戦略提案：コミュニティQAコーパスの質問・解答構造を利用して、同じ検索意図や目的のための異なる検索戦略をユーザに提示、(b)マルチファセット検索UIのベースラインシステムを構築し、随時、サブカテゴリや検索Trail提示の仕組みを提案した。これらの支援メカニズムが有用な状況やユーザ特性について検討を深める。さらに、情報源のスタンスなどより包括的な視点の有用性も検討した。(c)コンセプトマップが探索に与える影響について検討した。 (3) 検索の基礎技術：今年度は、自然言語対話、モバイル端末におけるタッチインタラクション、検索実験計画法と評価法について研究をすすめた。いくつかの対象領域に焦点を当て、ユーザの探索行動の捕捉とも出るかのための解析、探索過程で状況に応じて探索の方向性を提案する検索支援について、研究を進めることができた。中国・清華大の協力により、モバイル端末を用いた検索インタラクション、香港大学の協力により音楽検索など、予定よりも豊かな研究対象に取り組むことができた。さらに、今年度後半から、より豊かな探索インタラクションデータを捕捉できる環境として、従来のデジタル空間での探索行動に加え、センサー等によりユーザの物理空間での探索・閲覧行動も捕捉して、両者を連携して、よりよい探索インタラクション経験を提供するメカニズムについても検討を進めていくこととした。（1）ユーザの状況の捕捉：既有のインタラクションデータに加え、新たにユーザ検索実験によりインタラクションデータを収集し、より多面的に解析を行う。 （2）ユーザの状況に応じて、ユーザを支援する技術：異なる認知的複雑さタスクを選定し、検索過程でユーザを支援する技術のプロトタイプについて、ユーザを支援するメカニズムが、有用である、状況、タスクの特性、ユーザの特性との関係をより明らかにできるように検討をすすめる。 (3) 検索の基礎技術：必要に応じて研究をすすめ、（２）のユニットに適用する

Misc

Voice Assistantアプリの対話型解析システムの開発

刀塚敦子, 飯島涼, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉, 森達哉

電子情報通信学会技術研究報告(Web) 120 ( 384(ICSS2020 26-59) ) 2021年

J-GLOBAL
Overview of the NTCIR-12 MobileClick-2 Task.

Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto,Virgil Pavlu, Hajime Morita, Sumio Fujita

Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies, National Center of Sciences, Tokyo, Japan, June 7-10, 2016 2016年
Overview of the NTCIR-11 MobileClick Task.

Makoto P. Kato, Matthew Ekstrand-Abueg,Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-11, National Center of Sciences, Tokyo, Japan, December 9-12, 2014 2014年
A Preview of the NTCIR-10 INTENT-2 Results

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, MakotoP.Kato, Mayu Iwata

研究報告デジタルドキュメント（DD） 2013 ( 5 ) 1 - 8 2013年02月

　概要を見る

The second NTCIR INTENT task (INTENT-2) will be concluded at the NTCIR-10 conference in June 2013. The task comprises two subtasks: Subtopic Mining (given a query, return a ranked list of subtopic strings) and Document Ranking (given a query, return a diversified web search result). The task attracted participating teams from China, France, Japan and South Korea: 12 teams for Subtopic Mining and 4 teams for Document Ranking. This paper provides a preview of the official results of the task, while keeping the participating teams anonymous.The second NTCIR INTENT task (INTENT-2) will be concluded at the NTCIR-10 conference in June 2013. The task comprises two subtasks: Subtopic Mining (given a query, return a ranked list of subtopic strings) and Document Ranking (given a query, return a diversified web search result). The task attracted participating teams from China, France, Japan and South Korea: 12 teams for Subtopic Mining and 4 teams for Document Ranking. This paper provides a preview of the official results of the task, while keeping the participating teams anonymous.

CiNii
A Preview of the NTCIR-10 INTENT-2 Results

Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, MakotoP.Kato, Mayu Iwata

研究報告情報基礎とアクセス技術（IFAT） 2013 ( 5 ) 1 - 8 2013年02月

　概要を見る

The second NTCIR INTENT task (INTENT-2) will be concluded at the NTCIR-10 conference in June 2013. The task comprises two subtasks: Subtopic Mining (given a query, return a ranked list of subtopic strings) and Document Ranking (given a query, return a diversified web search result). The task attracted participating teams from China, France, Japan and South Korea: 12 teams for Subtopic Mining and 4 teams for Document Ranking. This paper provides a preview of the official results of the task, while keeping the participating teams anonymous.The second NTCIR INTENT task (INTENT-2) will be concluded at the NTCIR-10 conference in June 2013. The task comprises two subtasks: Subtopic Mining (given a query, return a ranked list of subtopic strings) and Document Ranking (given a query, return a diversified web search result). The task attracted participating teams from China, France, Japan and South Korea: 12 teams for Subtopic Mining and 4 teams for Document Ranking. This paper provides a preview of the official results of the task, while keeping the participating teams anonymous.

CiNii
Overview of the NTCIR-10 1CLICK-2 Task.

Makoto P. Kato, Matthew Ekstrand-Abueg,Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies, NTCIR-10, National Center of Sciences, Tokyo, Japan, June 18-21, 2013 182 - 211 2013年
Query Snowball : A Co-occurrence-based Approach to Multi-document Summarization for Question Answering (データベース Vol.5 No.2)

Hajime Morita, Tetsuya Sakai, Manabu Okumura

情報処理学会論文誌データベース（TOD） 5 ( 2 ) 11 - 16 2012年06月

　概要を見る

We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

CiNii
複数判定者によるコミュニティQAの良質回答の判定

石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

情報知識学会誌 21 ( 2 ) 169 - 177 2011年05月

　概要を見る

コミュニティ型質問応答サイト(CQA) は，ユーザが自身の状況に応じた回答を得ることができる新たな情報獲得手段である．しかしCQA サイトに投稿された回答の質は様々であるため，そこから良質な回答を効果的に取り出す方法が必要である．そこで本研究は，Yahoo!知恵袋データを用いて複数判定者によって手動で良質回答を分析し，判定者らが良質回答の評価に用いた判定基準を特定した．

DOI CiNii
Overview of the NTCIR-9 INTENT Task.

Ruihua Song, Min Zhan, Tetsuya Sakai, Makoto, P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang, Naoki Orii

Proceedings of the 9th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, NTCIR-9, National Center of Sciences, Tokyo, Japan, December 6-9, 2011 82 - 105 2011年
Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments (情報学基礎(FI) Vol.2009-FI-95)

Tetsuya Sakai, Noriko Kando, Chuan-JieLin, Ruihua Song, Hideki Shima, Teruko Mitamura

研究報告情報学基礎（FI） 2009 ( 9 ) 1 - 8 2009年07月

　概要を見る

At the NTCIR-7 Workshop Meeting held in December 2008, participating systems of the ACLIA IR4QA task were evaluated based on "qrels version 1," which covered the depth-30 pool for every topic and went further down the pool for a limited number of topics, due to time constraints. This paper reports on revised results based on "qrels version 2" which covers the depth-100 pool for every topic. While the version 1 and version 2 results are generally in agreement, some differences in system rankings and significance test results suggest that the additional effort was worthwhile. This paper also reports on a set of additional experiments with new "pseudo-qrels," which mimics the qrels without relying on any manual relevance assessments. Our pseudo-qrels experiments are surprisingly successful: the Pearson correlation coefficients between performances based on our "size-100" pseudo-qrels and those based on qrels version 2 are over 0.9, and even the Kendall rank correlations are 0.58-0.86. Hence, for the next round of IR4QA at NTCIR-8, we may be able to predict system rankings with reasonable accuracy using size-100 pseudo-qrels, right after the run submission deadline.At the NTCIR-7 Workshop Meeting held in December 2008, participating systems of the ACLIA IR4QA task were evaluated based on "qrels version 1," which covered the depth-30 pool for every topic and went further down the pool for a limited number of topics, due to time constraints. This paper reports on revised results based on "qrels version 2" which covers the depth-100 pool for every topic. While the version 1 and version 2 results are generally in agreement, some differences in system rankings and significance test results suggest that the additional effort was worthwhile. This paper also reports on a set of additional experiments with new "pseudo-qrels," which mimics the qrels without relying on any manual relevance assessments. Our pseudo-qrels experiments are surprisingly successful: the Pearson correlation coefficients between performances based on our "size-100" pseudo-qrels and those based on qrels version 2 are over 0.9, and even the Kendall rank correlations are 0.58-0.86. Hence, for the next round of IR4QA at NTCIR-8, we may be able to predict system rankings with reasonable accuracy using size-100 pseudo-qrels, right after the run submission deadline.

CiNii
Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments (データベースシステム(DBS) Vol.2009-DBS-148)

Tetsuya Sakai, Noriko Kando, Chuan-JieLin, Ruihua Song, Hideki Shima, Teruko Mitamura

研究報告データベースシステム（DBS） 2009 ( 9 ) 1 - 8 2009年07月

　概要を見る

At the NTCIR-7 Workshop Meeting held in December 2008, participating systems of the ACLIA IR4QA task were evaluated based on "qrels version 1," which covered the depth-30 pool for every topic and went further down the pool for a limited number of topics, due to time constraints. This paper reports on revised results based on "qrels version 2" which covers the depth-100 pool for every topic. While the version 1 and version 2 results are generally in agreement, some differences in system rankings and significance test results suggest that the additional effort was worthwhile. This paper also reports on a set of additional experiments with new "pseudo-qrels," which mimics the qrels without relying on any manual relevance assessments. Our pseudo-qrels experiments are surprisingly successful: the Pearson correlation coefficients between performances based on our "size-100" pseudo-qrels and those based on qrels version 2 are over 0.9, and even the Kendall rank correlations are 0.58-0.86. Hence, for the next round of IR4QA at NTCIR-8, we may be able to predict system rankings with reasonable accuracy using size-100 pseudo-qrels, right after the run submission deadline.At the NTCIR-7 Workshop Meeting held in December 2008, participating systems of the ACLIA IR4QA task were evaluated based on "qrels version 1," which covered the depth-30 pool for every topic and went further down the pool for a limited number of topics, due to time constraints. This paper reports on revised results based on "qrels version 2" which covers the depth-100 pool for every topic. While the version 1 and version 2 results are generally in agreement, some differences in system rankings and significance test results suggest that the additional effort was worthwhile. This paper also reports on a set of additional experiments with new "pseudo-qrels," which mimics the qrels without relying on any manual relevance assessments. Our pseudo-qrels experiments are surprisingly successful: the Pearson correlation coefficients between performances based on our "size-100" pseudo-qrels and those based on qrels version 2 are over 0.9, and even the Kendall rank correlations are 0.58-0.86. Hence, for the next round of IR4QA at NTCIR-8, we may be able to predict system rankings with reasonable accuracy using size-100 pseudo-qrels, right after the run submission deadline.

CiNii
趣味と仕事

酒井哲也

情報処理 49 ( 7 ) 835 - 835 2008年07月

CiNii
Comparing metrics across TREC and NTCIR: the robustness to system bias (データベースシステム・情報学基礎)

酒井哲也

情報処理学会研究報告データベースシステム（DBS） 2008 ( 56 ) 1 - 8 2008年06月

　概要を見る

Test collections are growing larger and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem but most of them have only examined the metrics under incomplete but unbiased conditions using random samples of the original relevance data. This paper examines nine metrics in more realistic settings by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list obtained by removing all unjudged documents from the original ranked list are effective for handling very incomplete but unbiased relevance data we show that they are not necessarily superior to traditional metrics in the presence of system bias. Using data from both TREC and NTCIR we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them and that the overestimation tends to be larger than the underestimation. We then show that when relevance data is heavily biased towards a single team or a few teams the condensed-list versions of Average Precision (AP) Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG) which we call AP' Q' and nDCG' are not necessarily superior to the original metrics in terms of discriminative power i.e. the overall ability to detect pairwise statistical significance. Nevertheless AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP) which we call RBP'.Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that they are not necessarily superior to traditional metrics in the presence of system bias. Using data from both TREC and NTCIR, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'.

CiNii
情報検索テストコレクションと評価指標

酒井哲也

情報処理学会研究報告自然言語処理（NL） 2008 ( 4 ) 1 - 8 2008年01月

　概要を見る

本稿では，情報検索評価用テストコレクションの概要とその作成過程，検索有効性の評価指標，およびこれらを用いた情報検索評価の方法について述べる．This paper describes what information retrieval test collections consist of and how they are constructed, and provide some formal definitions of evaluation metrics for measuring retrieval effectiveness. We then describe how to conduct sound evaluation experiments.

CiNii
情報検索テストコレクションと評価指標

酒井哲也

情報処理学会研究報告情報学基礎（FI） 2008 ( 4 ) 1 - 8 2008年01月

　概要を見る

本稿では，情報検索評価用テストコレクションの概要とその作成過程，検索有効性の評価指標，およびこれらを用いた情報検索評価の方法について述べる．This paper describes what information retrieval test collections consist of and how they are constructed, and provide some formal definitions of evaluation metrics for measuring retrieval effectiveness. We then describe how to conduct sound evaluation experiments.

CiNii
A further note on alternatives to Bpref (情報学基礎)

Tetsuya SAKAI, Noriko KANDO

情報処理学会研究報告情報学基礎（FI） 2007 ( 109 ) 7 - 14 2007年11月

　概要を見る

This paper compares the robustness of information retrieval (IR) metrics to incomplete relevance assessments using four different sets of graded-relevance test collections with submitted runs - two from TREC and two from NTCIR. We investigate the effect of reducing the original relevance data on discriminative power (i.e. how often statistical significance can be detected given the probability of Type I Error) and on Kendall's rank correlation between two system rankings. According to these experiments Q' nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also clarify some properties of these metrics that immediately follow from their definitions.This paper compares the robustness of information retrieval (IR) metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs - two from TREC and two from NTCIR. We investigate the effect of reducing the original relevance data on discriminative power (i.e., how often statistical significance can be detected given the probability of Type I Error) and on Kendall's rank correlation between two system rankings. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also clarify some properties of these metrics that immediately follow from their definitions.

CiNii
Q, Rの次はO, そしてP…

酒井哲也

情報処理 48 ( 7 ) 761 - 761 2007年07月

CiNii
効率的な番組視聴を支援するための話題ラベルの生成とその評価

小山誠, 酒井哲也, 福井美佳, 上原龍也, 下森大志

情報処理学会研究報告デジタルドキュメント（DD） 2007 ( 34 ) 17 - 23 2007年03月

　概要を見る

本論文では、ＴＶ番組を話題毎に分割して得られるセグメントに対して、内容を簡潔に表す代表キーワードや代表フレーズ、代表文を付与する話題ラベリングの手法とその評価について述べる。本手法は、セグメントに対するクローズドキヤプション（字幕テキスト）から、情報検索における適合フィードバック技術を応用して、対象セグメントの話題に関するキーワードやフレーズ、文を抽出する。「旅行」「タウン」「料理」ジャンルの情報番組を対象に、自動および人手により作成した文形式のラベル、キーワード・フレーズ形式のラベルに対して、３９名の被験者による主観評価実験を実施した。その結果、ラベルの「わかりやすさ」と「適切さ」に関し、全体としてはキーワード・フレーズ形式のラベルが文形式のラベルより高い評価を得られることを確認した。This paper describes a method for generating keyword, phrase and sentence labels for video segments of TV programs. By using a relevance feedback algorithm in information retrieval, it selects topic keywords, phrases and sentences from closed caption text in each topical segment. 39 subjects evaluated keyword, phrase and sentence labels from TV programs about travel, town and cooking. The results show that keyword and phrase labels achieve better results than sentence labels on understandability and relevance of labels.

CiNii
Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evalution with Graded Relevance

Tetsuya Sakai

情報処理学会研究報告情報学基礎（FI） 2006 ( 94 ) 57 - 64 2006年09月

　概要を見る

Large-scale information retrieval evalution efforts such as TREC and NTCIR have always used binary-relevance evalution metrics even when graded relevance data were available. However the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evalution with graded relevance.Large-scale information retrieval evalution efforts such as TREC and NTCIR have always used binary-relevance evalution metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evalution with graded relevance.

CiNii
Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evalution with Graded Relevance

Tetsuya Sakai

情報処理学会研究報告自然言語処理（NL） 2006 ( 94 ) 57 - 64 2006年09月

　概要を見る

Large-scale information retrieval evalution efforts such as TREC and NTCIR have always used binary-relevance evalution metrics even when graded relevance data were available. However the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evalution with graded relevance.Large-scale information retrieval evalution efforts such as TREC and NTCIR have always used binary-relevance evalution metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evalution with graded relevance.

CiNii
質問応答型検索の音声認識誤りに対するロバスト性向上

筒井秀樹, 真鍋俊彦, 福井美佳, 藤井寛子, 浦田耕二, 酒井哲也

情報処理学会研究報告自然言語処理（NL） 2005 ( 22 ) 31 - 38 2005年03月

　概要を見る

我々はこれまで，質問応答型マルチモーダルヘルプシステムの開発を行ってきた．これはユーザからの質問に対し，映像・音声・取扱説明(テキスト)などで構成される表現力豊かなマルチモーダルコンテンツの検索技術，および，質問内容を理解し，ユーザが必要としている情報に対して的確に回答する質問応答技術を融合することにより，よりわかりやすい情報提供を実現したシステムである．この中で，音声入力による質問を処理する際，音声認識誤りが起きると，その後の処理がうまく行かず，適切な回答が出来ない場合があった．失敗原因を検討した結果，具体的な時間や量をきくFactoid型の質問に対する音声認識誤りの影響が大きいことがわかった．これは，音声認識誤りによって疑問詞についての情報が失われることにより，具体的に何を回答すべきかを判定する回答タイプ判定に失敗することが原因であった．そこで今回，音声認識誤りに対するロバスト性向上を目的とし，回答タイプが正しく判定されるように，音声認識結果を補完して検索する手法を開発した．その結果，上位３位までのＭＲＲ(Mean Reciprocal Rank)による検索精度で，従来手法が0.429であったのに対し，今回の手法では0.597に向上した．We have been developing a multimodal question answering system that combines the search technol-ogy for multimodal contents with high expressive power such as video, speech and text, and the factoid question answering technology for understanding the user's information need and extracting exact an-swers from text. Failure analyses of our system showed that speech recognition errors were fatal for answer type recognition and therefore for the final Mean Reciprocal Rank (MRR) performance, espe-cially with numerical factoid questions. We therefore propose a new method which is robust to speech recognition errors. This method improves our MRR based on top 3 answers from 0.429 to 0.597.

CiNii
A Note on the Reliability of Japanese Question Answering Evaluation

Tetsuya Sakai

情報処理学会研究報告情報学基礎（FI） 2004 ( 119 ) 57 - 64 2004年11月

　概要を見る

This paper compares existing QA evaluation metrics from the viewpoint of reliability and usefulness using the NTCIR-4 QAC2 Japanese QA tasks and our adaptations of Buckley/Voorhees and Voorhees/Buckley reliability measurement methods. Our main findings are: (1) The fraction of questions with a correct answer within Top 5 (NQcorrect5) and that with a correct answer at Rank 1 (NQcorrect1) are not as stable as Reciprocal Rank based on ranked lists containing up to five answers. (2) Q-measure which can handle multiple correct answers and answer correctness levels is as reliable and useful as Reciprocal Rank provided that a mild gain value assignment is used. Using answer correctness levels tends to hurt stability while handling multiple correct answers improves it.This paper compares existing QA evaluation metrics from the viewpoint of reliability and usefulness, using the NTCIR-4 QAC2 Japanese QA tasks and our adaptations of Buckley/Voorhees and Voorhees/Buckley reliability measurement methods. Our main findings are: (1) The fraction of questions with a correct answer within Top 5 (NQcorrect5) and that with a correct answer at Rank 1 (NQcorrect1) are not as stable as Reciprocal Rank based on ranked lists containing up to five answers. (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is as reliable and useful as Reciprocal Rank, provided that a mild gain value assignment is used. Using answer correctness levels tends to hurt stability, while handling multiple correct answers improves it.

CiNii
High - Precision Search via Question Abstraction for Japanese Question Answering

Tetsuya SAKAI, Yoshimi SAITO, Tomoharu KOKUBU, Makoto KOYAMA, Toshihiko MANABE

情報処理学会研究報告自然言語処理（NL） 2004 ( 93 ) 139 - 146 2004年09月

　概要を見る

This paper explores the use of Question Abstraction i.e. Named Entity Recognition for questions input by the user for reranking retrieved documents to enhance retrieval precision for Japanese Question Answering (QA). Question Abstraction may help improve precision because (a) As named entities are often phrases it may have effects that are similar to phrasal or proximity search; (b) As named entity recognition is context-sensitive the named entity tags may help disambiguate ambiguous terms and phrases. Our experiments using several Japanese ``exact answer'' QA test collections show that this approach significantly improves IR precision but that this improvement is not necessarily carried over to the overall QA performance. Additionally we conduct preliminary experiments on the use of Question Abstraction for Pseudo-Relevance Feedback using Japanese {?em IR} test collections and find positive (though not statistically significant) effects. Thus the Question Abstraction approach probably deserves further investigations.This paper explores the use of Question Abstraction, i.e., Named Entity Recognition for questions input by the user, for reranking retrieved documents to enhance retrieval precision for Japanese Question Answering (QA). Question Abstraction may help improve precision because (a) As named entities are often phrases, it may have effects that are similar to phrasal or proximity search; (b) As named entity recognition is context-sensitive, the named entity tags may help disambiguate ambiguous terms and phrases. Our experiments using several Japanese ``exact answer'' QA test collections show that this approach significantly improves IR precision, but that this improvement is not necessarily carried over to the overall QA performance. Additionally, we conduct preliminary experiments on the use of Question Abstraction for Pseudo-Relevance Feedback using Japanese {\em IR} test collections, and find positive (though not statistically significant) effects. Thus the Question Abstraction approach probably deserves further investigations.

CiNii
新聞記事からの用語定義の抽出と固有表現クラスに基づく分類

小山誠, 酒井哲也, 真鍋俊彦

情報処理学会研究報告自然言語処理（NL） 2004 ( 93 ) 45 - 51 2004年09月

　概要を見る

本報告では，質問応答システムなどの自然言語処理システムの言語知識の拡張のため，新聞記事から用語定義を抽出し，分類・体系化するシステムを提案する．本システムは，定義文に対する固有表現抽出結果から得られる固有表現の意味クラスと，定義文に対する形態素解析結果から抽出される語に基づき，用語定義を分類する．新聞記事を用いた評価実験を行った結果，14の意味クラスに対して，適合率82.1%，再現率50.8%で抽出した用語定義を分類できることを確認した．In this paper, we propose a system that uses Japanese newspaper corpora for extracting and classifying term definitions to expand the knowledge of a natural language system such as a question answering system. The system classifies term definitions based on semantic classes obtained through named entity extraction and words obtained through morphological analysis. In an experiment using news articles, the system classifies term definitions by 14 semantic classes and achieves 82.1% precision and 50.8% recall.

CiNii
N-021 自然言語表現に基づく学生アンケート分析システム(N.教育・人文科学)

酒井哲也, 石田崇, 後藤正幸, 平澤茂一

情報科学技術フォーラム一般講演論文集 3 ( 4 ) 325 - 328 2004年08月

CiNii
係り受け木を用いた日本語文書の重要部分抽出

伊藤潤, 酒井哲也, 平澤茂一

情報処理学会研究報告自然言語処理（NL） 2003 ( 108 ) 19 - 24 2003年11月

　概要を見る

日本語の文は、係り受け関係をもとに木構造（係り受け木）で表すことができる．係り受け木の部分木の表す分は，係り受け関係が保存されるため一般に正しい文となる．本稿では，文書を拡大係り受け木として表し，そのノード，エッジに重みを与える．そして，重要部分抽出問題を「拡大係り受け木の部分木のうち評価値を最大にする木を探索する問題」として定式化し，その最適化問題を解くアルゴリズムを示す．その後，提案手法による要約システムを実装し，作成された要約文を人手による採点と原文との類似度で評価を行った．A Japanese sentence can be expressed as a tree structure (dependency tree) based on dependency relations.Since a subtree of a dependency tree preserves the dependency relations of the original tree, it generally represents a correct sentence on its own. In this paper, a document is expressed as an extended dependency tree, in which weights are assigned to its nodes and edges. Moreover, the problem of extracting important text fragments is formalized as that of "searching for a subtree that maximizes a certain score from subtrees of the extended decision tree". We implemented auch a summarization system and performed evaluations based on manual assessment as well as comparison with original texts.

CiNii
"ベイズ統計を用いた文書ファイルの自動分析手法,"

後藤正幸, 伊藤潤, 石田崇, 酒井哲也

経営情報学会2003年度秋季全国研究発表大会予稿集，函館 pp.28-31 2003年
「インターネットを用いた研究活動支援システム」システム構成

平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

2001PCカンファレンス 2001年
Cross - language情報検索のためのBMIR - J2を用いた一考察

酒井哲也, 梶浦正浩, 住田一男

情報処理学会研究報告自然言語処理（NL） 1999 ( 2 ) 41 - 48 1999年01月

　概要を見る

本論文では，日本語テストコレクションBMIR-J2およびこれを英訳したデータを用い，情報フィルタリングシステムNEATと機械翻訳システムASTRANSACによる日・英間のcross-language情報検索の検索精度評価実験を行う．英語検索要求による日本語文書の検索実験では，文書の翻訳と検索要求の翻訳のアプローチを，さらに異なる翻訳者による検索要求の翻訳を比較する．日本語検索要求による擬似英語文書の検索実験では，検索要求の翻訳の前後にローカルフィードバックを行う．以上により，日本語単言語検索の90％以上の精度を実現する．We study a cross-language IR approach using the NEAT information filtering system and the AS-TRANSAC machine translation system. The BMIR-J2 standard Japanese test collection and our own translated data are used for evaluation. In the English-to-Japanese experiments, we consider both document translation and query translation, and also compare the retrieval performance when the queries are translated by different translators. In the Japanese-to-pseudo-English experiments, we perform local feedback both before and after query translation. We achieve over 90% of Japanese monolingual performance.

CiNii
日本語情報検索システム評価用テストコレクションBMIR ? J2

木谷強, 小川泰嗣, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

情報処理学会研究報告データベースシステム（DBS） 1998 ( 2 ) 15 - 22 1998年01月

　概要を見る

日本語情報検索システム評価用テストコレクションBMIR-J2は、情報処理学会データベースシステム研究会内のワーキンググループによって作成されている。BMIR-J2は1998年3月から配布される予定であるが、これに先立ち、テスト版としてBMIR-J2が1996年3月からモニタ公開された。J1は50箇所のモニタに配布され、多数の研究成果が発表されている。BMIR-J2では、J1に対するモニタユーザからのアンケートの回答と、作成にあたったワーキングループメンバの経験をもとに、テストコレクションの検索対象テキスト数を大幅に増やし、検索要求と適合性判定基準も見直した。本論文では、BMIR-J2の内容とその作成手順、および今後の課題について述べる。BMIR-J2, a test collection for evaluation of Japanese information retrieval systems to be released in March 1998, has been developed by a working group under the Special Interest Group on Database Systems in Information Processing Society of Japan. Since March 1996, a preliminary version called BMIR-J1 has been distributed to fifty sites and used in many research projects. Based on comments from the BMIR-J1 users and our experience, we have enlarged the collection size and revised search queries and relevance assessments in BMIR-J2. In this paper, we describe BMIR-J2 and its development process, and discuss issues to be considered for improving BMIR-J2 further.

CiNii
情報フィルタリングシステムNEATのための検索要求文からのプロファイル生成

酒井哲也, 梶浦正浩, 住田一男

情報処理学会研究報告情報学基礎（FI） 1997 ( 86 ) 83 - 88 1997年09月

　概要を見る

我々は、新聞社・雑誌社により日々提供される電子化記事から個々のユーザーの興味に合ったものを選出し電子メイルなどで配信する情報フィルタリングシステムNEATを開発した。NEATは、プロファイルに記述されたブール式、検索語の出現位置・文書内密度・文書内分布などの多様な検索条件ベクトルに基づき、文書に対して加点しランキングを行う。今回、BMIR?J1の自然言語で書かれた検索要求文からプロファイルを自動生成する実験を行い、単純なブール式のプロファイルと人手によるプロファイルの中間程度の性能を達成できることを確認した。初期プロファイルの自動生成とrelevance feedbackの併用により、人手によるプロファイル作成の負荷は大幅に軽減されると考えられる。The NEAT information filtering system selects relevant articles from digital text provided daily by Japanese newspaper companies and publishers, and sends them by e-mail to its users. NEAT calculates a score for each article and produces a ranked output based on various types of query vectors written in the profile, such as location, density and distribution of keywords as well as boolean operators. We show that profiles generated automatically from query sentences can lie halfway between simple boolean profiles and hand-made profiles with respect to retrieval effectiveness. By combining this method and relevance feedback, the burden of manual profile definition will be lightened considerably.

CiNii
ベンチマーク BMIR-J1 を用いた情報フィルタリングシステム NEAT の評価

酒井哲也, 梶浦正浩, 三池誠司, 佐藤誠, 住田一男

全国大会講演論文集 54 301 - 302 1997年03月

　概要を見る

我々は, プール式, 検索語の出現位置, 検素語の文書内密度・分布などの多様な検索条件ベクトルにより文書に対して加点しランキングを行う情報フィルタリングシステム NEAT を開発した. 本稿では, 検索システム評価用ベンチマーク BMIR-J1を用いた, NEAT のプール式および検索語の出現位置情報のみを利用した場合の検索精度の評価について報告する.

CiNii
情報フィルタリングシステム NEAT の開発

梶浦正浩, 三池誠司, 酒井哲也, 佐藤誠, 住田一男

全国大会講演論文集 54 299 - 300 1997年03月

　概要を見る

我々は, 新聞社/雑誌社などから日々提供される文書(記事)よりユーザの要求に合致するものを抽出しユーザに提供する, 実サービス用の情報フィルタリングシステム NEAT (News Extractor with Accurately Tailored profiles) およびシステムの中核であるフィルタリングエンジシを開発した. フィルタリングエンジンは, 2種類の単語検索方法を結合した新しい検索法や多様なフィールドに対応した複数の検索条件ベクトルを用いることによって, 高い再現率/適合率を実現できるよう設計されている. 本稿では, 開発した NEAT およびフィルタリングエンジンの概要について述べ, また, 新しい単語検索法の評価結果を示す.

CiNii
電子図書館のための効率的な文書検索 : 検索/提示のための文書構造化と抄録生成

住田一男, 酒井哲也, 小野顕司, 三池誠司

ディジタル図書館 3 35 - 41 1995年03月

CiNii
文書検索システムの動的抄録提示インタフェースの評価

酒井哲也, 三池誠司, 住田一男

情報処理学会研究報告ヒューマンコンピュータインタラクション（HCI） 1994 ( 96 ) 49 - 54 1994年11月

　概要を見る

膨大な文書検索結果の全文を読み所望の文書を選出したり情報を得たりする労力の軽減のために、我々は、検索した文書の抄録を自動生成して提示するインタフェースを開発した。これは、ユーザーによる「詳しく」「簡単に」の指示に応じて、抄録の任意の部分の長さを変更することを可能とする。このインタフェースの評価実験を行った結果、文書の要不要の判定時間は、原文のみを提示した場合に比べ80％程度に短縮されることがわかった。この代償として、平均では判定の質が原文のみを提示した場合の90％程度に低下してしまうが、内容の深い理解を必要としない判定問題においては判定の質を保持することができた。In order to lighten the burden of browsing through numerous retrieved documents to select relevant documents or to obtain useful information, we have developed a user interface for generating and presenting abstracts of the retrieved documents. The interface enables the user to alter the length of any part of the abstract by entering DETAIL or BRIEF commands. Experiments show that, using this interface, the time for judging the relevance of each retrieved document can be reduced to 80% in comparison to using an interface that only presents the full-text to the user. Although the quality of relevance judgment was on average lowered to 90% of that achieved in the case of full-text presentation, it was not affected when deep understanding of the contents was not required.

CiNii
自動抄録機能をもつ対話的文書検索システム : システムの機能と構成

住田一男, 酒井哲也, 小野顕司, 伊藤悦雄, 三池誠司, 武田公人

全国大会講演論文集 48 275 - 276 1994年03月

　概要を見る

近年、ワークステーションの計算機パワーの増大にともない、全文文書を検索対象とした全文検索システムの実用化が進みつつある。しかし、現在実用化されている全文検索システムでは、検索してきた文書を表示する場合、検索文書のタイトルの一覧を表示するか、原文をそのまま表示するにすぎない。検索は、結果から統計的な情報を作成すること、あるいは検索した文書を読み、理解すること、内容を参考にし、再利用すること等を目的として行われる。このため、検索システムにおいては、検索速度や精度の点だけではなく、検索結果の提示方法も配慮し、効率的な検索を可能にする必要がある。効率的な検索インタフェースの構築を目的として、ディレクトリ構造のような情報の階層構造を利用し、大量情報を可視化する試みがなされている。しかし、情報伝達の中心である言語情報についての扱いがこれまで未検討であった。我々は、効率的な検索を目的として、検索結果の文書から自動的に抄録を生成し提示することを特長とする文書検索システムBREVIDOC(Broadcatching system with an essence viewer for retrieved documents) を試作した。本稿では、試作したシステムの構成ならびに機能を述べる。

CiNii
Learning formal languages from Feasible Teachers

酒井哲也, 平澤茂一, 松嶋敏泰

日本経営工学会誌 44 ( 3 ) 245 - 245 1993年08月

CiNii

▼全件表示

現在担当している科目

卒業論文Ａ

基幹理工学部

2025年春学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年春学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2025年春学期
プログラミングＣ　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再

基幹理工学部

2025年春学期
情報理工学実験Ｂ【前年度成績S評価者用】

基幹理工学部

2025年春学期
情報理工学実験Ｂ

基幹理工学部

2025年春学期
情報理工学実験Ａ

基幹理工学部

2025年秋学期
情報理工学実験Ａ　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）

基幹理工学部

2025年春学期
情報アクセス評価基盤

基幹理工学部

2025年春学期
データベース【前年度成績S評価者用】

基幹理工学部

2025年秋学期
データベース

基幹理工学部

2025年秋学期
統計解析実習

基幹理工学部

2025年秋クォーター
卒業論文Ｂ

基幹理工学部

2025年秋学期
プロジェクト研究Ｂ

基幹理工学部

2025年秋学期
プロジェクト研究Ａ

基幹理工学部

2025年春学期
卒業論文Ａ　（集中）

基幹理工学部

2025年集中（春・秋学期）
卒業論文Ｂ　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ（秋学期）

基幹理工学部

2025年秋学期
情報通信実験Ｂ【前年度成績S評価者用】

基幹理工学部

2025年春学期
情報通信実験Ｂ

基幹理工学部

2025年春学期
情報通信実験Ａ　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
情報通信実験Ａ

基幹理工学部

2025年秋学期
卒業論文Ｂ

基幹理工学部

2025年秋学期
卒業論文Ａ　（集中）

基幹理工学部

2025年集中（春・秋学期）
統計解析実習

基幹理工学部

2025年秋クォーター
プロジェクト研究Ｂ

基幹理工学部

2025年秋学期
データベース

基幹理工学部

2025年秋学期
プロジェクト研究Ａ

基幹理工学部

2025年春学期
卒業論文Ａ（秋学期）

基幹理工学部

2025年秋学期
卒業論文Ａ

基幹理工学部

2025年春学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再

基幹理工学部

2025年春学期
卒業論文Ｂ（春学期）

基幹理工学部

2025年春学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2025年春学期
卒業論文Ｂ　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年春学期
Graduation Thesis A (Fall)

基幹理工学部

2025年秋学期
Graduation Thesis B (Fall) [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis B (Spring)

基幹理工学部

2025年春学期
Computer Science and Communications Engineering Laboratory A [S Grade]

基幹理工学部

2025年秋学期
Computer Science and Communications Engineering Laboratory A

基幹理工学部

2025年秋学期
情報アクセス評価基盤

基幹理工学部

2025年春学期
Project Research Spring

基幹理工学部

2025年春学期
Project Research Fall

基幹理工学部

2025年秋学期
Databases

基幹理工学部

2025年秋学期
Introduction to Computers and Networks

基幹理工学部

2025年春学期
Computer Science and Communications Engineering Laboratory B

基幹理工学部

2025年春学期
Graduation Thesis A (Spring) [S Grade]

基幹理工学部

2025年春学期
Foundations for Information Access Evaluation

基幹理工学部

2025年春学期
Graduation Thesis B (Spring) [S Grade]

基幹理工学部

2025年春学期
Graduation Thesis A (Fall) [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis A (Spring)

基幹理工学部

2025年春学期
Graduation Thesis B (Fall)

基幹理工学部

2025年秋学期
修士論文（情報・通信）

大学院基幹理工学研究科

2025年通年
Master's Thesis (Department of Computer Science and Communications Engineering)

大学院基幹理工学研究科

2025年通年
リサーチプロジェクト（秋）

大学院基幹理工学研究科

2025年秋学期
情報アクセス研究

大学院基幹理工学研究科

2025年通年
Foundations for Information Access Evaluation

大学院基幹理工学研究科

2025年春学期
Special Laboratory B in Computer Science and Communications Engineering

大学院基幹理工学研究科

2025年秋学期
Special Laboratory A in Computer Science and Communications Engineering

大学院基幹理工学研究科

2025年春学期
Research on Information Access

大学院基幹理工学研究科

2025年通年
情報アクセス演習C

大学院基幹理工学研究科

2025年春学期
情報アクセス演習Ｂ

大学院基幹理工学研究科

2025年秋学期
情報アクセス演習A

大学院基幹理工学研究科

2025年春学期
情報アクセス演習D

大学院基幹理工学研究科

2025年秋学期
リサーチプロジェクト（春）

大学院基幹理工学研究科

2025年春学期
情報アクセス評価基盤

大学院基幹理工学研究科

2025年春学期
情報理工・情報通信特別実験B

大学院基幹理工学研究科

2025年秋学期
情報理工・情報通信特別実験A

大学院基幹理工学研究科

2025年春学期
情報理工・情報通信特別演習Ａ

大学院基幹理工学研究科

2025年春学期
Seminar on Information Access D

大学院基幹理工学研究科

2025年秋学期
Seminar on Information Access C

大学院基幹理工学研究科

2025年春学期
Seminar on Information Access B

大学院基幹理工学研究科

2025年秋学期
Seminar on Information Access A

大学院基幹理工学研究科

2025年春学期
情報アクセス研究

大学院基幹理工学研究科

2025年通年
情報理工・情報通信特別演習Ｂ

大学院基幹理工学研究科

2025年秋学期

▼全件表示

他学部・他研究科等兼任情報

理工学術院大学院基幹理工学研究科

学内研究所・附属機関兼任歴

2024年

-

2026年

理工学術院総合研究所兼任研究員

特定課題制度（学内資金）

ベイズ統計に基づく情報アクセス評価体系の構築

2017年

　概要を見る

I published the following full paper at SIGIR 2017, the top conference in information retrieval.The following is the abstract:Using classical statistical significance tests, researchers can onlydiscuss P(D+|H), the probability of observing the data D at hand orsomething more extreme, under the assumption that the hypothesisH is true (i.e., the p-value). But what we usually want is P(D+|H),the probability that a hypothesis is true, given the data. If we useBayesian statistics with state-of-the-art Markov Chain Monte Carlo(MCMC) methods for obtaining posterior distributions, this is nolonger a problem. That is, instead of the classical p-values and 95%confidence intervals, which are often misinterpreted respectivelyas “probability that the hypothesis is (in)correct” and “probabilitythat the true parameter value drops within the interval is 95%,” wecan easily obtain P(H|D) and credible intervals which representexactly the above. Moreover, with Bayesian tests, we can easilyhandle virtually any hypothesis, not just “equality of means,” andobtain an Expected A Posteriori (EAP) value of any statistic thatwe are interested in. We provide simple tools to encourage theIR community to take up paired and unpaired Bayesian tests forcomparing two systems. Using a variety of TREC and NTCIR data,we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones.Our results show that (a) p-values and confidence intervals canrespectively be regarded as approximations of what we really want,namely, P(H|D) and credible intervals; and (b) sample effect sizesfrom classical significance tests can differ considerably from theBayesian EAP effect sizes, which suggests that the former can bepoor estimates of population effect sizes. For both paired and unpairedtests, we propose that the IR community report the EAP, thecredible interval, and the probability of hypothesis being true, notonly for the raw difference in means but also for the effect size interms of Glass’s delta.
統計的手法を用いた情報検索テストコレクション横断評価および情報検索論文の評価

2016年

　概要を見る

I published five international conference papers (SIGIR, SIGIR, SIGIR(short), ICTIR, AIRS),two international workshop papers (EVIA, EVIA), and a workshop report (SIGIR Forum).Moreover, I gave a tutorial at an international conference (ICTIR) and a keynote at a Japanese symposium (IPSJ SIGNL) on this topic.
「寡黙なユーザ」のための情報検索技術に関する研究

2015年

　概要を見る

We published one international journal paper, one international conference paper, one evaluation conference overview (TREC), and two unrefereed domestic papers.
情報アクセス評価基盤の体系化および評価

2015年

　概要を見る

We published one book, one international journal paper, one international conference paper, one domestic IPSJ workshop paper and organised an international workshop.
テストコレクションのサンプルサイズ設計に関する研究

2014年

　概要を見る

We published three refereed papers (two forinternational conferences and one for a domestic conference) on how todetermine the topic set size of a test collection.
サーチエンジン評価指標の体系化と有効性実証

2014年

　概要を見る

We published three refereed papers (two forinternational conferences and one for a domestic conference) on how todetermine the topic set size of a test collection.
最小限のインタラクションを介した情報アクセスに関する研究

2014年 Koji Yatani, Makoto P. Kato, Takehiro Yamamoto, Virgil Pavlu, Javed Aslam, Fernando Diaz

　概要を見る

Wecollaborated with various researchers from outside Waseda and published severalpapers related to information access via minimal interactions. We ran a taskcalled MobileClick at NTCIR and a track called Temporal Summarization at TREC.It is worth noting that ourMobileHCI paper (collaboration with the University of Tokyo) received anHonourable Mention Award.

▼全件表示