Updated: 2022/09/28


Tetsuya Sakai (酒井 哲也)
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Position
Professor
Homepage
http://sakailab.com/tetsuya/

Concurrent Affiliations

  • Faculty of Science and Engineering, Graduate School of Fundamental Science and Engineering

University Research Institutes

  • 2020-2022

    Waseda Research Institute for Science and Engineering, Adjunct Researcher

Academic Degrees

  • Doctorate

Research Fields

  • Human interfaces, interaction

Research Keywords

  • Information access, information retrieval, natural language processing

Papers

  • Click the search button and be happy: Evaluating direct and immediate information access

    Tetsuya Sakai, Makoto P. Kato, Young-In Song

    International Conference on Information and Knowledge Management, Proceedings, 621-630, 2011 [Refereed]

    Abstract

    We define Direct Information Access as a type of information access where there is no user operation, such as clicking or scrolling, between the user's click on the search button and the user's information acquisition; we define Immediate Information Access as a type of information access where the user can locate the relevant information within the system output very quickly. Hence, a Direct and Immediate Information Access (DIIA) system is expected to satisfy the user's information need very quickly with its very first response. We propose a nugget-based evaluation framework for DIIA, which takes nugget positions into account in order to evaluate the ability of a system to present important nuggets first and to minimise the amount of text the user has to read. To demonstrate the integrity, usefulness and limitations of our framework, we built a Japanese DIIA test collection with 60 queries and over 2,800 nuggets as well as an offset-based nugget match evaluation interface, and conducted experiments with manual and automatic runs. The results suggest our proposal is a useful complement to traditional ranked retrieval evaluation based on document relevance. © 2011 ACM.

    DOI
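
The position-aware nugget scoring described in the abstract above can be sketched roughly as follows. This is an illustrative simplification only: the function name, the linear offset discount, the `patience` parameter, and the normalisation by total nugget weight are assumptions for exposition, not the exact measure proposed in the paper.

```python
def position_discounted_score(matches, nuggets, patience=1000.0):
    """Score one system output for Direct and Immediate Information Access.

    matches:  list of (nugget_id, character_offset) pairs found in the output.
    nuggets:  dict mapping nugget_id -> importance weight.
    patience: offset (in characters) beyond which a matched nugget earns nothing.

    A matched nugget's weight is discounted linearly by the offset at which
    it appears, so a system that presents important nuggets first, with
    little preceding text, scores higher.
    """
    gained = sum(nuggets[nid] * max(0.0, 1.0 - off / patience)
                 for nid, off in matches)
    ideal = sum(nuggets.values())  # every nugget presented at offset 0
    return gained / ideal if ideal else 0.0
```

Under this sketch, matching the same nugget earlier in the output strictly increases the score, which captures the "minimise the amount of text the user has to read" requirement.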

  • Evaluating the Effects of Embedding with Speaker Identity Information in Dialogue Summarization

    Yuji Naraki, Tetsuya Sakai, Yoshihiko Hayashi

    LREC 2022, 2022 [Refereed]

  • AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval

    Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkilä, Tetsuya Sakai

    CVPR 2022, 2022 [Refereed]

  • Selection of Evaluation Methods for the Stance Detection Task (SIG-Recommended Paper)

    Yuki Amemiya, Tetsuya Sakai

    IEICE Transactions on Information and Systems (Japanese Edition), Special Section on Data Engineering and Information Management, 2022 [Refereed]

  • Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    ACM TOIS, 2022 [Refereed]

  • MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

    Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai

    Findings of the Association for Computational Linguistics: EMNLP 2021, 2021 [Refereed]

  • A Closer Look at Evaluation Measures for Ordinal Quantification

    Tetsuya Sakai

    Proceedings of the CIKM 2021 Workshops, 2021 [Refereed]

  • Evaluating Relevance Judgments with Pairwise Discriminative Power

    Zhumin Chu, Jiaxin Mao, Fan Zhang, Yiqun Liu, Tetsuya Sakai, Min Zhang, Shaoping Ma

    Proceedings of ACM CIKM 2021, 2021 [Refereed]

  • Incorporating Query Reformulating Behavior into Web Search Evaluation

    Jia Chen, Yiqun Liu, Jiaxin Mao, Fan Zhang, Tetsuya Sakai, Weizhi Ma, Min Zhang, Shaoping Ma

    Proceedings of ACM CIKM 2021, 2021 [Refereed]

  • A Simple and Effective Usage of Self-supervised Contrastive Learning for Text Clustering

    Haoxiang Shi, Cen Wang, Tetsuya Sakai

    Proceedings of IEEE SMC 2021, 2021 [Invited]

  • Evaluating Evaluation Measures for Ordinal Classification and Ordinal Quantification

    Tetsuya Sakai

    Proceedings of ACL-IJCNLP 2021, 2021 [Refereed]

  • WWW3E8: 259,000 Relevance Labels for Studying the Effect of Document Presentation Order for Relevance Assessors

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    Proceedings of ACM SIGIR 2021, 2021 [Refereed]

  • On the Two-Sample Randomisation Test for IR Evaluation

    Tetsuya Sakai

    Proceedings of ACM SIGIR 2021, 2021 [Refereed]

  • Scalable Personalised Item Ranking through Parametric Density Estimation

    Riku Togashi, Masahiro Kato, Mayu Otani, Tetsuya Sakai, Shin’Ichi Satoh

    Proceedings of ACM SIGIR 2021, 2021 [Refereed]

  • Fast and Exact Randomisation Test for Comparing Two Systems with Paired Data

    Rikiya Suzuki, Tetsuya Sakai

    Proceedings of ACM ICTIR 2021, 2021 [Refereed]

  • DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators’ Labels

    Zhaohao Zeng, Tetsuya Sakai

    arXiv, 2021 [Refereed]

  • How Do Users Revise Zero-Hit Product Search Queries?

    Yuki Amemiya, Tomohiro Manabe, Sumio Fujita, Tetsuya Sakai

    Proceedings of ECIR 2021 Part II, 2021 [Refereed]

  • On the Instability of Diminishing Return IR Measures

    Tetsuya Sakai

    Proceedings of ECIR 2021 Part I, 2021 [Refereed]

  • RSL19BD at DBDC4: Ensemble of Decision Tree-Based and LSTM-Based Models

    Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

    Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems, 2021 [Refereed]

  • Retrieval Evaluation Measures that Agree with Users’ SERP Preferences: Traditional, Preference-based, and Diversity Measures

    Tetsuya Sakai, Zhaohao Zeng

    ACM TOIS, 2020 [Refereed]

  • A Siamese CNN Architecture for Learning Chinese Sentence Similarity

    Haoxiang Shi, Cen Wang, Tetsuya Sakai

    Proceedings of the AACL-IJCNLP 2020 Student Research Workshop (SRW), 2020 [Refereed]

  • Automatic Evaluation of Iconic Image Retrieval based on Colour, Shape, and Texture

    Riku Togashi, Sumio Fujita, Tetsuya Sakai

    Proceedings of ACM ICMR 2020, 2020 [Refereed]

  • SogouQ: The First Large-Scale Test Collection with Click Streams Used in a Shared-Task Evaluation

    Ruihua Song, Min Zhang, Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou

    Evaluating Information Retrieval and Access Tasks: NTCIR's Legacy of Research Impact, 2020 [Refereed]

  • Graded Relevance

    Tetsuya Sakai

    Evaluating Information Retrieval and Access Tasks: NTCIR's Legacy of Research Impact, 2020 [Refereed]

  • Visual Intents vs. Clicks, Likes, and Purchases in E-commerce

    Riku Togashi, Tetsuya Sakai

    Proceedings of ACM SIGIR 2020, 2020 [Refereed]

  • Good Evaluation Measures based on Document Preferences

    Tetsuya Sakai, Zhaohao Zeng

    Proceedings of ACM SIGIR 2020, 2020 [Refereed]

  • How to Measure the Reproducibility of System-oriented IR Experiments

    Timo Breuer, Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Philipp Schaer, Ian Soboroff

    Proceedings of ACM SIGIR 2020, 2020 [Refereed]

  • Industry Recommendation from Job Application Forms based on Document Classification Techniques

    三王慶太, Tetsuya Sakai

    DBSJ Japanese Journal, 2020 [Refereed]

  • Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations

    Tetsuya Sakai, Peng Xiao

    Proceedings of AIRS 2019, 2020 [Refereed]

  • Generating Short Product Descriptors based on Very Little Training Data

    Peng Xiao, Joo-Young Lee, Sijie Tao, Young-Sook Hwang, Tetsuya Sakai

    Proceedings of AIRS 2019, 2020 [Refereed]

  • Unsupervised Answer Retrieval with Data Fusion for Community Question Answering

    Sosuke Kato, Toru Shimizu, Sumio Fujita, Tetsuya Sakai

    Proceedings of AIRS 2019, 2020 [Refereed]

  • Towards Automatic Evaluation of Reused Answers in Community Question Answering

    Hsin-Wen Liu, Sumio Fujita, Tetsuya Sakai

    Proceedings of AIRS 2019, 2020 [Refereed]

  • Arc Loss: Softmax with Additive Angular Margin for Answer Retrieval

    Rikiya Suzuki, Sumio Fujita, Tetsuya Sakai

    Proceedings of AIRS 2019, 2020 [Refereed]

  • System Evaluation of Ternary Error-Correcting Output Codes for Multiclass Classification Problems

    Shigeichi Hirasawa, Gendo Kumoi, Hideki Yagi, Manabu Kobayashi, Masayuki Goto, Tetsuya Sakai, Hiroshige Inazumi

    2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), October 2019 [Refereed]

    DOI

  • Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

    Tetsuya Sakai

    Proceedings of ACM WSDM 2019, 2019 [Refereed]

  • Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

    Zhaohao Zeng, Ruihua Song, Pingping Lin, Tetsuya Sakai

    Proceedings of ACM WSDM 2019, 2019 [Refereed]

  • A Comparative Study of Deep Learning Approaches for Extractive Query-Focused Multi-Document Summarization

    Yuliska, Tetsuya Sakai

    Proceedings of IEEE ICICT 2019, 2019 [Refereed]

  • Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

    Tetsuya Sakai

    Proceedings of ECIR 2019 Part II (LNCS 11438), 2019 [Refereed]

  • CENTRE@CLEF 2019

    Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

    Proceedings of ECIR 2019 Part II (LNCS 11438), 2019 [Refereed]

  • Celebrating 20 Years of NTCIR: The Book

    Douglas W. Oard, Tetsuya Sakai, Noriko Kando

    Proceedings of EVIA 2019, 2019 [Refereed]

  • RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

    Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

    Proceedings of Chatbots and Conversational Agents and Dialogue Breakdown Detection Challenge (WOCHAT+DBDC), IWSDS 2019, 2019 [Refereed]

  • Low-cost, Bottom-up Measures for Evaluating Search Result Diversification

    Zhicheng Dou, Xue Yang, Diya Li, Ji-Rong Wen, Tetsuya Sakai

    Information Retrieval Journal, 2019 [Refereed]

  • Which Diversity Evaluation Measures Are “Good”?

    Tetsuya Sakai, Zhaohao Zeng

    Proceedings of ACM SIGIR 2019, 2019 [Refereed]

  • The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

    Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

    Proceedings of ACM SIGIR 2019, 2019 [Refereed]

  • Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019)

    Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

    Proceedings of OSIRRC 2019, 2019 [Refereed]

  • BM25 Pseudo Relevance Feedback Using Anserini at Waseda University

    Zhaohao Zeng, Tetsuya Sakai

    Proceedings of OSIRRC 2019, 2019 [Refereed]

  • Composing a Picture Book by Automatic Story Understanding and Visualization

    Xiaoyu Qi, Ruihua Song, Chunting Wang, Jin Zhou, Tetsuya Sakai

    Proceedings of the Second Storytelling Workshop (StoryNLP @ ACL 2019), 2019 [Refereed]

  • CENTRE@CLEF2019: Overview of the Replicability and Reproducibility Tasks

    Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

    CLEF 2019 Working Notes, 2019 [Refereed]

  • CENTRE@CLEF2019: Sequel in the Systematic Reproducibility Realm

    Nicola Ferro, Norbert Fuhr, Maria Maistro, Tetsuya Sakai, Ian Soboroff

    Proceedings of CLEF 2019 (LNCS 11696), 2019 [Refereed]

  • Generalising Kendall’s Tau for Noisy and Incomplete Preference Judgements

    Riku Togashi, Tetsuya Sakai

    Proceedings of ACM ICTIR 2019, 2019 [Refereed]

  • Evaluating Image-Inspired Poetry Generation

    Chao-Chung Wu, Ruihua Song, Tetsuya Sakai, Wen-Feng Cheng, Xing Xie, Shou-De Lin

    Proceedings of NLPCC 2019, 2019 [Refereed]

  • How to Run an Evaluation Task: with a Primary Focus on Ad Hoc Information Retrieval

    Tetsuya Sakai

    Information Retrieval Evaluation in a Changing World: Lessons Learned from 20 Years of CLEF, 2019 [Refereed]

  • A Large-Scale Empirical Study of Voice Assistant Apps

    Atsuko Natatsuka, Ryo Iijima, Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Tatsuya Mori

    Computer Security Symposium (CSS), 2019 [Refereed]

  • Voice Input Interface Failures and Frustration: Developer and User Perspectives

    Shiyoh Goetsu, Tetsuya Sakai

    ACM UIST 2019 Adjunct, 2019 [Refereed]

  • A First Look at the Privacy Risks of Voice Assistant Apps

    Atsuko Natatsuka, Mitsuaki Akiyama, Ryo Iijima, Tetsuya Sakai, Takuya Watanabe, Tatsuya Mori

    ACM CCS 2019 Posters & Demos, 2019 [Refereed]

  • Attitude Detection for One-Round Conversation: Jointly Extracting Target-Polarity Pairs

    Zhaohao Zeng, Ruihua Song, Pingping Lin, Tetsuya Sakai

    Journal of Information Processing, 2019 [Refereed]

  • Search Result Diversity Evaluation Based on Intent Hierarchies

    Xiaojie Wang, Ji-Rong Wen, Zhicheng Dou, Tetsuya Sakai, Rui Zhang

    IEEE Transactions on Knowledge and Data Engineering, 30(1), 156-169, January 2018 [Refereed]

    Abstract

    Search result diversification aims at returning diversified document lists to cover different user intents of a query. Existing diversity measures assume that the intents of a query are disjoint, and do not consider their relationships. In this paper, we introduce intent hierarchies to model the relationships between intents, and present four weighting schemes. Based on intent hierarchies, we propose several hierarchical measures that take into account the relationships between intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections and by using the NTCIR-11 IMine test collection. Our main experimental findings are: (1) Hierarchical measures are more discriminative and intuitive than existing measures. In terms of intuitiveness, it is preferable for hierarchical measures to use the whole intent hierarchies than to use only the leaf nodes. (2) The types of intent hierarchies used affect the discriminative power and intuitiveness of hierarchical measures. We suggest the best type of intent hierarchies to be used according to whether the nonuniform weights are available. (3) To measure the benefits of the diversification algorithms which use automatically mined hierarchical intents, it is important to use hierarchical measures instead of existing measures.

    DOI

  • Conducting Laboratory Experiments Properly with Statistical Tools: An Easy Hands-on Tutorial

    Tetsuya Sakai

    Proceedings of ACM SIGIR 2018, 2018 [Refereed]

  • Comparing Two Binned Probability Distributions for Information Access Evaluation

    Tetsuya Sakai

    Proceedings of ACM SIGIR 2018, 2018 [Refereed]

  • CENTRE@CLEF2018: Overview of the Replicability Task

    Nicola Ferro, Maria Maistro, Tetsuya Sakai, Ian Soboroff

    CLEF 2018 Working Notes (invited paper), 2018 [Refereed]

  • Topic Set Size Design for Paired and Unpaired Data

    Tetsuya Sakai

    Proceedings of ACM ICTIR 2018, 2018 [Refereed]

  • Classifying Community QA Questions That Contain an Image

    Kenta Tamaki, Riku Togashi, Sosuke Kato, Sumio Fujita, Hideyuki Maeda, Tetsuya Sakai

    Proceedings of ACM ICTIR 2018, 2018 [Refereed]

  • Ranking Mobile Search Cards based on User Actions in Abandoned Sessions

    Mami Kawasaki, Inho Kang, Tetsuya Sakai

    IPSJ TOD, 11(3), 2018 [Refereed]

  • Towards Automatic Evaluation of Customer-Helpdesk Dialogues

    Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

    Journal of Information Processing, 2018 [Refereed]

  • Overview of CENTRE@CLEF 2018: a First Tale in the Systematic Reproducibility Realm

    Nicola Ferro, Maria Maistro, Tetsuya Sakai, Ian Soboroff

    Proceedings of CLEF 2018 (LNCS 11018), 2018 [Refereed]

  • Why You Should Listen to This Song: Reason Generation for Explainable Recommendation

    Guoshuai Zhao, Hao Fu, Ruihua Song, Tetsuya Sakai, Xing Xie, Xueming Qian

    1st Workshop on Scalable and Applicable Recommendation Systems (SAREC 2018), 2018 [Refereed]

  • Understanding the Inconsistency between Behaviors and Descriptions of Mobile Apps

    Takuya Watanabe, Akiyama Mitsuki, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

    IEICE Transactions, 2018 [Refereed]

  • Proceedings of AIRS 2018 (LNCS 11292)

    Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

    Editors, 2018 [Refereed]

  • The probability that your hypothesis is correct, credible intervals, and effect sizes for IR evaluation

    Tetsuya Sakai

    SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 25-34, August 2017 [Refereed]

    Abstract

    Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as "probability that the hypothesis is (in)correct" and "probability that the true parameter value drops within the interval is 95%," we can easily obtain P(H|D) and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just "equality of means," and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes. For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of the hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass's Δ.

    DOI
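
The P(H|D) idea in the abstract above can be illustrated with a deliberately simplified paired comparison: under a normal model with a flat prior and the variance estimated from the data, the posterior probability that one system's mean score exceeds the other's has a closed form. This is only a sketch of the concept; the paper itself uses MCMC with richer models, and the function name and modelling assumptions here are illustrative.

```python
import math
import statistics

def prob_a_beats_b(diffs):
    """P(mean per-topic score difference > 0 | data), i.e. the posterior
    probability that System A outperforms System B on average.

    diffs: per-topic score differences (A minus B).

    Under a normal likelihood with a flat prior, the posterior of the
    mean difference is approximately Normal(mean(diffs), stdev(diffs)^2/n),
    so the probability mass above zero is a standard normal CDF value.
    """
    n = len(diffs)
    dbar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    z = dbar / se
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Unlike a p-value, the returned quantity directly answers "how likely is it that A is really better than B, given these scores?" under the stated model.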

  • Evaluating mobile search with height-biased gain

    Cheng Luo, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

    SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 435-444, August 2017 [Refereed]

    Abstract

    Mobile search engine result pages (SERPs) are becoming highly visual and heterogeneous. Unlike the traditional ten-blue-link SERPs for desktop search, different verticals and cards occupy different amounts of space within the small screen. Hence, traditional retrieval measures that regard the SERP as a ranked list of homogeneous items are not adequate for evaluating the overall quality of mobile SERPs. Specifically, we address the following new problems in mobile search evaluation: (1) Different retrieved items have different heights within the scrollable SERP, unlike a ten-blue-link SERP in which results have similar heights with each other. Therefore, the traditional rank-based decaying functions are not adequate for mobile search metrics. (2) For some types of verticals and cards, the information that the user seeks is already embedded in the snippet, which makes clicking on those items to access the landing page unnecessary. (3) For some results with complex sub-components (and usually a large height), the total gain of the results cannot be obtained if users only read part of their contents. The benefit brought by the result is affected by the user's reading behavior, and the internal gain distribution (over the height) should be modeled to get a more accurate estimation. To tackle these problems, we conduct a lab-based user study to construct a suitable user behavior model for mobile search evaluation. From the results, we find that the geometric heights of users' browsing trails can be adopted as a good signal of user effort. Based on these findings, we propose a new evaluation metric, Height-Biased Gain, which is calculated by summing up the product of the gain distribution and discount factors that are both modeled in terms of result height. To evaluate the effectiveness of the proposed metric, we compare the agreement of evaluation metrics with side-by-side user preferences on a test collection composed of four mobile search engines. Experimental results show that HBG agrees with user preferences 85.33% of the time, which is better than all existing metrics.

    DOI
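
The core idea of height-based discounting in the abstract above can be sketched as follows. The exponential decay, the `decay` constant, and the single per-result gain are illustrative assumptions; the published metric estimates both the gain distribution over a result's height and the discount curve from observed user browsing behaviour.

```python
import math

def height_biased_gain(results, decay=1000.0):
    """Sum each result's gain weighted by a discount that decays with the
    vertical pixel position at which the result starts on the SERP,
    rather than with its rank.

    results: list of (gain, height_in_pixels) tuples in display order.
    decay:   characteristic scroll depth (pixels) of the discount.
    """
    offset = 0.0   # vertical position where the current result starts
    total = 0.0
    for gain, height in results:
        total += gain * math.exp(-offset / decay)
        offset += height  # taller results push later ones further down
    return total
```

Note that placing a tall, low-gain card above a short, high-gain one lowers the score, which is exactly the behaviour a rank-based discount cannot capture.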

  • LSTM vs. BM25 for Open-domain QA: A hands-on comparison of effectiveness and efficiency

    Sosuke Kato, Riku Togashi, Hideyuki Maeda, Sumio Fujita, Tetsuya Sakai

    SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1309-1312, August 2017 [Refereed]

    Abstract

    Recent advances in neural networks, along with the growth of rich and diverse community question answering (cQA) data, have enabled researchers to construct robust open-domain question answering (QA) systems. It is often claimed that such state-of-the-art QA systems far outperform traditional IR baselines such as BM25. However, most such studies rely on relatively small data sets, e.g., those extracted from the old TREC QA tracks. Given massive training data plus a separate corpus of Q&A pairs as the target knowledge source, how well would such a system really perform? How fast would it respond? In this demonstration, we provide the attendees of SIGIR 2017 an opportunity to experience a live comparison of two open-domain QA systems, one based on a long short-term memory (LSTM) architecture with over 11 million Yahoo! Chiebukuro (i.e., Japanese Yahoo! Answers) questions and over 27.4 million answers for training, and the other based on BM25. Both systems use the same Q&A knowledge source for answer retrieval. Our core demonstration system is a pair of Japanese monolingual QA systems, but we leverage machine translation for letting the SIGIR attendees enter English questions and compare the Japanese responses from the two systems after translating them into English.

    DOI

  • Does document relevance affect the searcher's perception of time?

    Cheng Luo, Yiqun Liu, Tetsuya Sakai, Ke Zhou, Fan Zhang, Xue Li, Shaoping Ma

    WSDM 2017 - Proceedings of the 10th ACM International Conference on Web Search and Data Mining, 141-150, February 2017 [Refereed]

    Abstract

    Time plays an essential role in multiple areas of Information Retrieval (IR) studies such as search evaluation, user behavior analysis, temporal search result ranking and query understanding. Especially, in search evaluation studies, time is usually adopted as a measure to quantify users' efforts in search processes. Psychological studies have reported that the time perception of human beings can be affected by many stimuli, such as attention and motivation, which are closely related to many cognitive factors in search. Considering the fact that users' search experiences are affected by their subjective feelings of time, rather than the objective time measured by timing devices, it is necessary to look into the different factors that have impacts on search users' perception of time. In this work, we make a first step towards revealing the time perception mechanism of search users with the following contributions: (1) We establish an experimental research framework to measure the subjective perception of time while reading documents in a search scenario, which originates from but is also different from traditional time perception measurements in psychological studies. (2) With the framework, we show that while users are reading result documents, document relevance has a small yet visible effect on search users' perception of time. By further examining the impact of other factors, we demonstrate that the effect on relevant documents can also be influenced by individuals and tasks. (3) We conduct a preliminary experiment in which the difference between perceived time and dwell time is taken into consideration in a search evaluation task. We found that the revised framework achieved a better correlation with users' satisfaction feedback. This work may help us better understand the time perception mechanism of search users and provide insights into how to better incorporate the time factor in search evaluation studies.

    DOI

  • Investigating Users' Time Perception during Web Search

    Cheng Luo, Xue Li, Yiqun Liu, Tetsuya Sakai, Fan Zhang, Min Zhang, Shaoping Ma

    Proceedings of CHIIR 2017, 2017 [Refereed]

  • Overview of Special Issue

    Donna Harman, Diane Kelly (eds.), James Allan, Nicholas J. Belkin, Paul Bennett, Jamie Callan, Charles Clarke, Fernando Diaz, Susan Dumais, Nicola Ferro, Donna Harman, Djoerd Hiemstra, Ian Ruthven, Tetsuya Sakai, Mark D. Smucker, Justin Zobel

    SIGIR Forum, 51(2), 2017 [Refereed]

  • Mobile Vertical Ranking based on Preference Graphs

    Yuta Kadotami, Yasuaki Yoshida, Sumio Fujita, Tetsuya Sakai

    ACM ICTIR 2017, 2017 [Refereed]

  • Ranking Rich Mobile Verticals based on Clicks and Abandonment

    Mami Kawasaki, Inho Kang, Tetsuya Sakai

    Proceedings of ACM CIKM 2017, 2017 [Refereed]

  • Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

    Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

    Proceedings of EVIA 2017, 2017 [Refereed]

  • Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths

    Tetsuya Sakai

    Proceedings of EVIA 2017, 2017 [Refereed]

  • Towards Automatic Evaluation of Multi-Turn Dialogues: A Task Design that Leverages Inherently Subjective Annotations

    Tetsuya Sakai

    Proceedings of EVIA 2017, 2017 [Refereed]

  • The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and Students

    Tetsuya Sakai

    Proceedings of EVIA 2017, 2017 [Refereed]

  • Unanimity-Aware Gain for Highly Subjective Assessments

    Tetsuya Sakai

    Proceedings of EVIA 2017, 2017 [Refereed]

  • RSL17BD at DBDC3: Computing Utterance Similarities based on Term Frequency and Word Embedding Vectors

    Sosuke Kato, Tetsuya Sakai

    Proceedings of DSTC6, 2017 [Refereed]

  • Simple and effective approach to score standardisation

    Tetsuya Sakai

    ICTIR 2016 - Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, 95-104, September 2016 [Refereed]

    Abstract

    Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0,1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.

    DOI
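
The two transformations contrasted in the abstract above are easy to sketch. The per-topic standardisation follows the description directly; the constants in `linearise` are illustrative placeholder choices, not necessarily those used in the paper.

```python
import numpy as np

def standardise(raw):
    """raw: topic-by-run score matrix (rows = topics, columns = runs).
    Each score becomes 'how many standard deviation units this run is
    from the average run on this topic', using the per-topic sample
    mean and sample standard deviation across runs."""
    mean = raw.mean(axis=1, keepdims=True)
    std = raw.std(axis=1, ddof=1, keepdims=True)  # sample SD across runs
    return (raw - mean) / std

def linearise(z, a=0.5, b=0.15):
    """The linear-transformation alternative to the normal-CDF mapping:
    shift and scale the standardised scores, then clip to [0, 1].
    The constants a and b here are illustrative."""
    return np.clip(a + b * z, 0.0, 1.0)
```

After `standardise`, every topic row has mean 0 and unit sample variance, so topic hardness no longer dominates cross-collection comparisons.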

  • Evaluating search result diversity using intent hierarchies

    Xiaojie Wang, Zhicheng Dou, Tetsuya Sakai, Ji-Rong Wen

    SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 415-424, July 2016 [Refereed]

    Abstract

    Search result diversification aims at returning diversified document lists to cover different user intents for ambiguous or broad queries. Existing diversity measures assume that user intents are independent or exclusive, and do not consider the relationships among the intents. In this paper, we introduce intent hierarchies to model the relationships among intents. Based on intent hierarchies, we propose several hierarchical measures that can consider the relationships among intents. We demonstrate the feasibility of hierarchical measures by using a new test collection based on TREC Web Track 2009-2013 diversity test collections. Our main experimental findings are: (1) Hierarchical measures are generally more discriminative and intuitive than existing measures using flat lists of intents; (2) When the queries have multilayer intent hierarchies, hierarchical measures are less correlated to existing measures, but can get more improvement in discriminative power; (3) Hierarchical measures are more intuitive in terms of diversity or relevance. The hierarchical measures using the whole intent hierarchies are more intuitive than only using the leaf nodes in terms of diversity and relevance.

    DOI

  • Topic set size design

    Tetsuya Sakai

    Information Retrieval Journal, 19(3), 256-283, June 2016 [Refereed]

    Abstract

    Traditional pooling-based information retrieval (IR) test collections typically have n = 50-100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n. The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata's three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.

    DOI
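
The paired-t-test flavour of topic set size design described above can be approximated with the textbook normal-approximation power formula. This sketch drops the small correction terms of the full sample size design procedure, so treat its output as a ballpark figure rather than the method's exact answer.

```python
import math
from statistics import NormalDist

def topic_set_size(within_var, min_diff, alpha=0.05, beta=0.20):
    """Approximate number of topics n so that a paired t-test at
    significance level alpha detects a true mean score difference of
    min_diff with power 1 - beta:

        n ~ (z_{1-alpha/2} + z_{1-beta})^2 * within_var / min_diff^2

    within_var: within-system variance of the evaluation measure,
    estimated from a past topic-by-run score matrix.
    """
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(1.0 - beta)
    return math.ceil((z_a + z_b) ** 2 * within_var / min_diff ** 2)
```

The formula makes the paper's key point concrete: a measure with a larger within-system variance, or a smaller difference one wants to detect, demands many more topics under the same statistical requirements.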

  • On Estimating Variances for Topic Set Size Design

    Tetsuya Sakai, Lifeng Shang

    EVIA 2016, 2016 [Refereed]

  • Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect User Preferences?

    Makoto P. Kato, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Hajime Morita

    EVIA 2016, 2016 [Refereed]

  • Statistical Significance, Power, and Sample Sizes: A Systematic Review of SIGIR and TOIS,2006-2015

    Tetsuya Sakai

    ACM SIGIR 2016, 2016 [Refereed]

  • Two Sample T-tests for IR Evaluation: Student or Welch?

    Tetsuya Sakai

    ACM SIGIR 2016, 2016 [Refereed]

  • Report on the First International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2015)

    Laure Soulier, Lynda Tamine, Tetsuya Sakai, Leif Azzopardi, Jeremy Pickens

    ACM SIGIR 2016, 2016 [Refereed]

  • Topic Set Size Design and Power Analysis in Practice (Tutorial Abstract)

    Tetsuya Sakai

    ACM ICTIR 2016, 2016 [Refereed]

  • The Effect of Score Standardisation on Topic Set Size Design

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2016   9994   16 - 28  2016年  [査読有り]

     概要を見る

    Given a topic-by-run score matrix from past data, topic set size design methods can help test collection builders determine the number of topics to create for a new test collection from a statistical viewpoint. In this study, we apply a recently-proposed score standardisation method called std-AB to score matrices before applying topic set size design, and demonstrate its advantages. For topic set size design, std-AB suppresses score variances and thereby enables test collection builders to consider realistic choices of topic set sizes, and to handle unnormalised measures in the same way as normalised measures. In addition, even discrete measures that clearly violate normality assumptions look more continuous after applying std-AB, which may make them more suitable for statistically motivated topic set size design. Our experiments cover a variety of tasks and evaluation measures from NTCIR-12.

    DOI
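The std-AB standardisation described above can be sketched roughly as follows: z-score each topic (row) of a topic-by-run score matrix, then map the z-scores linearly onto [0, 1]. The constants A = 0.15 and B = 0.5 and the clipping step are my own assumptions for illustration; they are not taken from this page.

```python
import numpy as np

def std_ab(scores, A=0.15, B=0.5):
    """Standardise a topic-by-run score matrix per topic (row):
    z-score each row, then map linearly to A * z + B and clip to
    [0, 1], so that normalised and unnormalised measures end up on
    the same scale."""
    mean = scores.mean(axis=1, keepdims=True)
    sd = scores.std(axis=1, ddof=1, keepdims=True)
    z = (scores - mean) / sd
    return np.clip(A * z + B, 0.0, 1.0)
```

Because each row is centred at B with spread governed by A, the resulting matrix has suppressed score variances, which is what makes the subsequent topic set size design choices more realistic.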

  • Search result diversification based on hierarchical intents

    Sha Hu, Zhicheng Dou, Xiaojie Wang, Tetsuya Sakai, Ji-Rong Wen

    International Conference on Information and Knowledge Management, Proceedings   19-23-   63 - 72  2015年10月  [査読有り]

     概要を見る

    A large percentage of queries issued to search engines are broad or ambiguous. Search result diversification aims to solve this problem, by returning diverse results that can fulfill as many different information needs as possible. Most existing intent-aware search result diversification algorithms formulate user intents for a query as a flat list of subtopics. In this paper, we introduce a new hierarchical structure to represent user intents and propose two general hierarchical diversification models to leverage hierarchical intents. Experimental results show that our hierarchical diversification models outperform state-of-the-art diversification methods that use traditional flat subtopics.

    DOI

  • Dynamic author name disambiguation for growing digital libraries

    Yanan Qian, Qinghua Zheng, Tetsuya Sakai, Junting Ye, Jun Liu

    INFORMATION RETRIEVAL   18 ( 5 ) 379 - 412  2015年10月  [査読有り]

     概要を見る

    When a digital library user searches for publications by an author name, she often sees a mixture of publications by different authors who have the same name. With the growth of digital libraries and involvement of more authors, this author ambiguity problem is becoming critical. Author disambiguation (AD) often tries to solve this problem by leveraging metadata such as coauthors, research topics, publication venues and citation information, since more personal information such as the contact details is often restricted or missing. In this paper, we study the problem of how to efficiently disambiguate author names given an incessant stream of published papers. To this end, we propose a "BatchAD+IncAD" framework for dynamic author disambiguation. First, we perform batch author disambiguation (BatchAD) to disambiguate all author names at a given time by grouping all records (each record refers to a paper with one of its author names) into disjoint clusters. This establishes a one-to-one mapping between the clusters and real-world authors. Then, for newly added papers, we periodically perform incremental author disambiguation (IncAD), which determines whether each new record can be assigned to an existing cluster, or to a new cluster not yet included in the previous data. Based on the new data, IncAD also tries to correct previous AD results. Our main contributions are: (1) We demonstrate with real data that a small number of new papers often have overlapping author names with a large portion of existing papers, so it is challenging for IncAD to effectively leverage previous AD results. (2) We propose a novel IncAD model which aggregates metadata from a cluster of records to estimate the author's profile such as her coauthor distributions and keyword distributions, in order to predict how likely it is that a new record is "produced" by the author. (3) Using two labeled datasets and one large-scale raw dataset, we show that the proposed method is much more efficient than state-of-the-art methods while ensuring high accuracy.

    DOI

  • Understanding the Inconsistencies between Text Descriptions and the Use of Privacy-sensitive Resources of Mobile Apps

    Takuya Watanabe, Mitsuaki Akiyama, Tetsuya Sakai, Hironori Washizaki, Tatsuya Mori

    SOUPS 2015    2015年  [査読有り]

  • Topic Set Size Design with the Evaluation Measures for Short Text Conversation

    Tetsuya Sakai, Lifeng Shang, Zhengdong Lu, Hang Li

    INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2015   9460   319 - 331  2015年  [査読有り]

     概要を見る

    Short Text Conversation (STC) is a new NTCIR task which tackles the following research question: given a microblog repository and a new post to that microblog, can systems reuse an old comment from the repository to satisfy the author of the new post? The official evaluation measures of STC are normalised gain at 1 (nG@1), normalised expected reciprocal rank at 10 (nERR@10), and P+, all of which can be regarded as evaluation measures for navigational intents. In this study, we apply the topic set size design technique of Sakai to decide on the number of test topics, using variance estimates of the above evaluation measures. Our main conclusion is to create 100 test topics, but what distinguishes our work from other tasks with similar topic set sizes is that we know what this topic set size means from a statistical viewpoint for each of our evaluation measures. We also demonstrate that, under the same set of statistical requirements, the topic set sizes required by nERR@10 and P+ are more or less the same, while nG@1 requires more than twice as many topics. To our knowledge, our task is the first among all efforts at TREC-like evaluation conferences to actually create a new test collection by using this principled approach.

    DOI

  • Designing test collections for comparing many systems

    Tetsuya Sakai

    CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management     61 - 70  2014年11月  [査読有り]

     概要を見る

    A researcher decides to build a test collection for comparing her new information retrieval (IR) systems with several state-of-the-art baselines. She wants to know the number of topics (n) she needs to create in advance, so that she can start looking for (say) a query log large enough for sampling n good topics, and estimating the relevance assessment cost. We provide practical solutions to researchers like her using power analysis and sample size design techniques, and demonstrate their usefulness for several IR tasks and evaluation measures. We consider not only the paired t-test but also one-way analysis of variance (ANOVA) for significance testing to accommodate comparison of m(≥ 2) systems under a given set of statistical requirements (α: the Type I error rate, β: the Type II error rate, and minD: the minimum detectable difference between the best and the worst systems). Using our simple Excel tools and some pooled variance estimates from past data, researchers can build statistically well-designed test collections. We demonstrate that, as different evaluation measures have different variances across topics, they inevitably require different topic set sizes. This suggests that the evaluation measures should be chosen at the test collection design phase. Moreover, through a pool depth reduction experiment with past data, we show how the relevance assessment cost can be reduced dramatically while freezing the set of statistical requirements. Based on the cost analysis and the available budget, researchers can determine the right balance between n and the pool depth pd. Our techniques and tools are applicable to test collections for non-IR tasks as well.

    DOI
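The confidence-interval perspective mentioned above can be sketched as picking the smallest n whose t-based interval half-width for the difference between two system means falls below a target δ. This is my own simplified reading, assuming a pooled within-system variance estimate sigma2 and a two-system comparison; it is not the paper's exact procedure or tool.

```python
from scipy import stats

def ci_based_topic_set_size(delta, sigma2, alpha=0.05, n_max=10000):
    """Smallest number of topics n such that the half-width of a
    (1 - alpha) confidence interval for the difference between two
    system means, t_{alpha/2, n-1} * sqrt(2 * sigma2 / n), is at
    most delta."""
    for n in range(2, n_max):
        halfwidth = stats.t.ppf(1 - alpha / 2, n - 1) * (2 * sigma2 / n) ** 0.5
        if halfwidth <= delta:
            return n
    raise ValueError("n_max too small")
```

As in the paper's message, the required n grows with the variance of the chosen evaluation measure, so measures with large within-system variances demand substantially more topics for the same interval width.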

  • Metrics, Statistics, Tests (invited paper)

    Tetsuya Sakai

    PROMISE Winter School 2013: Bridging between Information Retrieval and Databases (LNCS 8173)    2014年  [査読有り]

  • Statistical Reform in Information Retrieval?

    Tetsuya Sakai

    SIGIR Forum    2014年  [査読有り]

  • Designing Test Collections That Provide Tight Confidence Intervals

    Tetsuya Sakai

    Forum on Information Technology 2014   13 ( 2 ) 15 - 18  2014年  [査読有り]

    CiNii

  • ReviewCollage: A Mobile Interface for Direct Comparison Using Online Reviews

    Haojian Jin, Tetsuya Sakai, Koji Yatani

    PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION WITH MOBILE DEVICES AND SERVICES (MOBILEHCI'14)     349 - 358  2014年  [査読有り]

     概要を見る

    Review comments posted on online websites can help the user decide on a product to purchase or a place to visit. They can also be useful for closely comparing a couple of candidate entities. However, the user may have to read different webpages back and forth for comparison, and this is not desirable particularly when she is using a mobile device. We present ReviewCollage, a mobile interface that aggregates information about two reviewed entities in a one-page view. ReviewCollage uses attribute-value pairs, known to be effective for review text summarization, and highlights the similarities and differences between the entities. Our user study confirms that ReviewCollage helps the user compare two entities and make a decision within a couple of minutes, at least as quickly as existing summarization interfaces. It also reveals that ReviewCollage could be most useful when two entities are very similar.

    DOI

  • Topic Set Size Design with Variance Estimates from Two-Way ANOVA

    Tetsuya Sakai

    EVIA 2014    2014年  [査読有り]

  • When do people use query suggestion? A query suggestion log analysis

    Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

    INFORMATION RETRIEVAL   16 ( 6 ) 725 - 746  2013年12月  [査読有り]

     概要を見る

    Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher's current state.

    DOI

  • Introduction to the special issue on search intents and diversification

    Tetsuya Sakai, Noriko Kando, Craig Macdonald, Ian Soboroff

    INFORMATION RETRIEVAL   16 ( 4 ) 427 - 428  2013年08月  [査読有り]

    DOI

  • Diversified search evaluation: lessons from the NTCIR-9 INTENT task

    Tetsuya Sakai, Ruihua Song

    INFORMATION RETRIEVAL   16 ( 4 ) 504 - 529  2013年08月  [査読有り]

     概要を見る

    The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The evaluation framework used at NTCIR provides more "intuitive" and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for -nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.

    DOI

  • Web Search Evaluation with Informational and Navigational Intents

    Tetsuya Sakai

    Journal of Information Processing    2013年  [査読有り]

  • The Unreusability of Diversified Test Collections

    Tetsuya Sakai

    EVIA 2013    2013年  [査読有り]

  • Summaries, Ranked Retrieval and Sessions: A Unified Framework for Information Access Evaluation

    Tetsuya Sakai, Zhicheng Dou

    ACM SIGIR 2013    2013年  [査読有り]

  • Exploring semi-automatic nugget extraction for Japanese one click access evaluation

    Matthew Ekstrand-Abueg, Virgil Pavlu, Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     749 - 752  2013年  [査読有り]

     概要を見る

    Building test collections based on nuggets is useful for evaluating systems that return documents, answers, or summaries. However, nugget construction requires a lot of manual work and is not feasible for large query sets. Towards an efficient and scalable nugget-based evaluation, we study the applicability of semi-automatic nugget extraction in the context of the ongoing NTCIR One Click Access (1CLICK) task. We compare manually-extracted and semi-automatically-extracted Japanese nuggets to demonstrate the coverage and efficiency of the semi-automatic nugget extraction. Our findings suggest that the manual nugget extraction can be replaced with a direct adaptation of the English semi-automatic nugget extraction system, especially for queries for which the user desires broad answers from free-form text. Copyright © 2013 ACM.

    DOI

  • Report from the NTCIR-10 1CLICK-2 Japanese subtask: Baselines, upperbounds and evaluation robustness

    Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     753 - 756  2013年  [査読有り]

     概要を見る

    The One Click Access Task (1CLICK) of NTCIR requires systems to return a concise multi-document summary of web pages in response to a query which is assumed to have been submitted in a mobile context. Systems are evaluated based on information units (or iUnits), and are required to present important pieces of information first and to minimise the amount of text the user has to read. Using the official Japanese results of the second round of the 1CLICK task from NTCIR-10, we discuss our task setting and evaluation framework. Our analyses show that: (1) Simple baseline methods that leverage search engine snippets or Wikipedia are effective for "lookup" type queries but not necessarily for other query types; (2) There is still a substantial gap between manual and automatic runs; and (3) Our evaluation metrics are relatively robust to the incompleteness of iUnits. Copyright © 2013 ACM.

    DOI

  • Summary of the NTCIR-10 INTENT-2 Task: Subtopic mining and search result diversification

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Makoto P. Kato

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     761 - 764  2013年  [査読有り]

     概要を見る

    The NTCIR INTENT task comprises two subtasks: Subtopic Mining, where systems are required to return a ranked list of subtopic strings for each given query; and Document Ranking, where systems are required to return a diversified web search result for each given query. This paper summarises the novel features of the Second INTENT task at NTCIR-10 and its main findings, and poses some questions for future diversified search evaluation. Copyright © 2013 ACM.

    DOI

  • Time-aware structured query suggestion

    Taiki Miyanishi, Tetsuya Sakai

    SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval     809 - 812  2013年  [査読有り]

     概要を見る

    Most commercial search engines have a query suggestion feature, which is designed to capture various possible search intents behind the user's original query. However, even though different search intents behind a given query may have been popular at different time periods in the past, existing query suggestion methods neither utilize nor present such information. In this study, we propose Time-aware Structured Query Suggestion (TaSQS) which clusters query suggestions along a timeline so that the user can narrow down his search from a temporal point of view. Moreover, when a suggested query is clicked, TaSQS presents web pages from query-URL bipartite graphs after ranking them according to the click counts within a particular time period. Our experiments using data from a commercial search engine log show that the time-aware clustering and the time-aware document ranking features of TaSQS are both effective. Copyright © 2013 ACM.

    DOI

  • The Impact of Intent Selection on Diversified Search Evaluation

    Tetsuya Sakai, Zhicheng Dou, Charles L.A. Clarke

    ACM SIGIR 2013    2013年  [査読有り]

  • Evaluating Heterogeneous Information Access (Position paper)

    Ke Zhou, Tetsuya Sakai, Mounia Lalmas, Zhicheng Dou, Joemon M. Jose

    Workshop on Modeling User Behavior for Information Access Evaluation    2013年  [査読有り]

  • Mining Search Intents from Text Fragments

    Qinglei Wang, Yanan Qian, Ruihua Song, Zhicheng Dou, Fan Zhang, Tetsuya Sakai, Qinghua Zheng

    Information Retrieval    2013年  [査読有り]

  • On the reliability and intuitiveness of aggregated search metrics

    Ke Zhou, Mounia Lalmas, Tetsuya Sakai, Ronan Cummins, Joemon M. Jose

    International Conference on Information and Knowledge Management, Proceedings     689 - 698  2013年  [査読有り]

     概要を見る

    Aggregating search results from a variety of diverse verticals such as news, images, videos and Wikipedia into a single interface is a popular web search presentation paradigm. Although several aggregated search (AS) metrics have been proposed to evaluate AS result pages, their properties remain poorly understood. In this paper, we compare the properties of existing AS metrics under the assumptions that (1) queries may have multiple preferred verticals; (2) the likelihood of each vertical preference is available; and (3) the topical relevance assessments of results returned from each vertical are available. We compare a wide range of AS metrics on two test collections. Our main criteria of comparison are (1) discriminative power, which represents the reliability of a metric in comparing the performance of systems, and (2) intuitiveness, which represents how well a metric captures the various key aspects to be measured (i.e. various aspects of a user's perception of AS result pages). Our study shows that the AS metrics that capture key AS components (e.g., vertical selection) have several advantages over other metrics. This work sheds new light on the further developments and applications of AS metrics. Copyright 2013 ACM.

    DOI

  • Dynamic query intent mining from a search log stream

    Yanan Qian, Tetsuya Sakai, Junting Ye, Qinghua Zheng, Cong Li

    International Conference on Information and Knowledge Management, Proceedings     1205 - 1208  2013年  [査読有り]

     概要を見る

    It has long been recognized that search queries are often broad and ambiguous. Even when submitting the same query, different users may have different search intents. Moreover, the intents are dynamically evolving. Some intents are constantly popular with users, while others are more bursty. We propose a method for mining dynamic query intents from search query logs. By regarding the query logs as a data stream, we identify constant intents while quickly capturing new bursty intents. To evaluate the accuracy and efficiency of our method, we conducted experiments using 50 topics from the NTCIR-9 INTENT data and five additional popular topics, all supplemented with six-month query logs from a commercial search engine. Our results show that our method can accurately capture new intents with short response time. Copyright 2013 ACM.

    DOI

  • How intuitive are diversified search metrics? Concordance test results for the diversity U-measures

    Tetsuya Sakai

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   13 - 24  2013年  [査読有り]

     概要を見る

    Most of the existing Information Retrieval (IR) metrics discount the value of each retrieved relevant document based on its rank. This statement also applies to the evaluation of diversified search: the widely-used diversity metrics, namely, α-nDCG, Intent-Aware Expected Reciprocal Rank (ERR-IA) and D#-nDCG, are all rank-based. These evaluation metrics regard the system output as a list of document IDs, and ignore all other features such as snippets and document full texts of various lengths. In contrast, the U-measure framework of Sakai and Dou uses the amount of text read by the user as the foundation for discounting the value of relevant information, and can take into account the user's snippet reading and full text reading behaviours. The present study compares the diversity versions of U-measure (D-U and U-IA) with the state-of-the-art diversity metrics using the concordance test: given a pair of ranked lists, we quantify the ability of each metric to favour the more diversified and more relevant list. Our results show that while D#-nDCG is the overall winner in terms of simultaneous concordance with diversity and relevance, D-U and U-IA statistically significantly outperform other state-of-the-art metrics. Moreover, in terms of concordance with relevance alone, D-U and U-IA significantly outperform all rank-based diversity metrics. Thus, D-U and U-IA are not only more realistic but also more relevance-oriented than other diversity metrics. © 2013 Springer-Verlag.

    DOI

  • User-aware advertisability

    Hai-Tao Yu, Tetsuya Sakai

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   452 - 463  2013年  [査読有り]

     概要を見る

    In sponsored search, many studies focus on finding the most relevant advertisements (ads) and their optimal ranking for a submitted query. Determining whether it is suitable to show ads has received less attention. In this paper, we introduce the concept of user-aware advertisability, which refers to the probability of ad-click on sponsored ads when a specific user submits a query. When computing the advertisability for a given query-user pair, we first classify the clicked web pages based on a pre-defined category hierarchy and use the aggregated topical categories of clicked web pages to represent user preference. Taking user preference into account, we then compute the ad-click probability for this query-user pair. Compared with existing methods, the experimental results show that user preference is of great value for generating user-specific advertisability. In particular, our approach that computes advertisability per query-user pair outperforms the two state-of-the-art methods that compute advertisability per query in terms of a variant of the normalized Discounted Cumulative Gain metric. © 2013 Springer-Verlag.

    DOI

  • Estimating intent types for search result diversification

    Kosetsu Tsukuda, Tetsuya Sakai, Zhicheng Dou, Katsumi Tanaka

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   25 - 37  2013年  [査読有り]

     概要を見る

    Given an ambiguous or underspecified query, search result diversification aims at accommodating different user intents within a single Search Engine Result Page (SERP). While automatic identification of different intents for a given query is a crucial step for result diversification, also important is the estimation of intent types (informational vs. navigational). If it is possible to distinguish between informational and navigational intents, search engines can aim to return one best URL for each navigational intent, while allocating more space to the informational intents within the SERP. In light of these observations, we propose a new framework for search result diversification that is intent importance-aware and type-aware. Our experiments using the NTCIR-9 INTENT Japanese Subtopic Mining and Document Ranking test collections show that: (a) our intent type estimation method for Japanese achieves 64.4% accuracy; and (b) our proposed diversification method achieves 0.6373 in D#-nDCG and 0.5898 in DIN#-nDCG over 56 topics, which are statistically significant gains over the top performers of the NTCIR-9 INTENT Japanese Document Ranking runs. Moreover, our relevance oriented model significantly outperforms our diversity oriented model and the original model by Dou et al. © 2013 Springer-Verlag.

    DOI

  • On labelling intent types for evaluating search result diversification

    Tetsuya Sakai, Young-In Song

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8281   38 - 49  2013年  [査読有り]

     概要を見る

    Search result diversification is important for accommodating different user needs by means of covering popular and diverse query intents within a single result page. To evaluate diversity, we believe that it is important to consider the distinction between informational and navigational intents, as users would not want redundant information especially for navigational intents. In this study, we conduct intent type-sensitive diversity evaluation based on both top-down labelling, which labels each intent as either navigational or informational a priori, and bottom-up labelling, which labels each intent based on whether a "navigational relevant" document has actually been identified in the document collection. Our results suggest that reliable type-sensitive diversity evaluation can be conducted using the top-down approach with a clear intent labelling guideline, while ensuring that the desired URLs for navigational intents make their way into relevance assessments. © 2013 Springer-Verlag.

    DOI

  • Query snowball: A co-occurrence-based approach to multi-document summarization for question answering

    Hajime Morita, Tetsuya Sakai, Manabu Okumura

    IPSJ Online Transactions   5 ( 2012 ) 124 - 129  2012年  [査読有り]

     概要を見る

    We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

    DOI

  • Evaluation with Informational and Navigational Intents

    Tetsuya Sakai

    WWW 2012    2012年  [査読有り]

  • Structured query suggestion for specialization and parallel movement: Effect on search behaviors

    Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

    WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web     389 - 398  2012年  [査読有り]

     概要を見る

    Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it is often difficult for the user to choose from a list of query suggestions, and to understand the relation between an input query and suggested ones. In this paper, we propose a new method to present query suggestions to the user, which has been designed to help two popular query reformulation actions, namely, specialization (e.g. from "nikon" to "nikon camera") and parallel movement (e.g. from "nikon camera" to "canon camera"). Using a query log collected from a popular commercial Web search engine, our prototype called SParQS classifies query suggestions into automatically generated categories and generates a label for each category. Moreover, SParQS presents some new entities as alternatives to the original query (e.g. "canon" in response to the query "nikon"), together with their query suggestions classified in the same way as the original query's suggestions. We conducted a task-based user study to compare SParQS with a traditional "flat list" query suggestion interface. Our results show that the SParQS interface enables subjects to search more successfully than the flat list case, even though query suggestions presented were exactly the same in the two interfaces. In addition, the subjects found the query suggestions more helpful when they were presented in the SParQS interface rather than in a flat list.

    DOI

  • AspecTiles: Tile-based visualization of diversified web search results

    Mayu Iwata, Tetsuya Sakai, Takehiro Yamamoto, Yu Chen, Yi Liu, Ji-Rong Wen, Shojiro Nishio

    SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval     85 - 94  2012年  [査読有り]

     概要を見る

    A diversified search result for an underspecified query generally contains web pages in which there are answers that are relevant to different aspects of the query. In order to help the user locate such relevant answers, we propose a simple extension to the standard Search Engine Result Page (SERP) interface, called AspecTiles. In addition to presenting a ranked list of URLs with their titles and snippets, AspecTiles visualizes the relevance degree of a document to each aspect by means of colored squares ("tiles"). To compare AspecTiles with the standard SERP interface in terms of usefulness, we conducted a user study involving 30 search tasks designed based on the TREC web diversity task topics as well as 32 participants. Our results show that AspecTiles has some advantages in terms of search performance, user behavior, and user satisfaction. First, AspecTiles enables the user to gather relevant information significantly more efficiently than the standard SERP interface for tasks where the user considers several different aspects of the query to be important at the same time (multi-aspect tasks). Second, AspecTiles affects the user's information seeking behavior: with this interface, we observed significantly fewer query reformulations, shorter queries and deeper examinations of ranked lists in multi-aspect tasks. Third, participants of our user study found AspecTiles significantly more useful for finding relevant information and easy to use than the standard SERP interface. These results suggest that simple interfaces like AspecTiles can enhance the search performance and search experience of the user when their queries are underspecified. © 2012 ACM.

    DOI

  • Towards Zero-Click Mobile IR Evaluation: Knowing What and Knowing When

    Tetsuya Sakai

    ACM SIGIR 2012    2012年  [査読有り]

  • New Assessment Criteria for Query Suggestion

    Zhongrui Ma, Yu Chen, Ruihua Song, Tetsuya Sakai, Jiaheng Lu, Ji-Rong Wen

    ACM SIGIR 2012    2012年  [査読有り]

  • The wisdom of advertisers: Mining subgoals via query clustering

    Takehiro Yamamoto, Tetsuya Sakai, Mayu Iwata, Chen Yu, Ji-Rong Wen, Katsumi Tanaka

    ACM International Conference Proceeding Series     505 - 514  2012年  [査読有り]

     概要を見る

    This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as "book flights," "book a hotel," "find good restaurants" and "decide which sightseeing spots to visit." As another example, if a searcher wants to lose weight, there may exist several alternative solutions such as "do physical exercise," "take diet pills," and "control calorie intake." In this paper, we refer to such subtasks or solutions as subgoals, and propose to utilize sponsored search data for finding subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers' tremendous efforts in trying to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method that combines ad impressions from sponsored search data and query co-occurrences from session data outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions in terms of purity, NMI, Rand Index, F1-measure and subgoal recall. © 2012 ACM.

    DOI

  • The reusability of a diversified search test collection

    Tetsuya Sakai, Zhicheng Dou, Ruihua Song, Noriko Kando

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7675   26 - 38  2012年  [査読有り]

     概要を見る

    Traditional ad hoc IR test collections were built using a relatively large pool depth (e.g. 100), and are usually assumed to be reusable. Moreover, when they are reused to compare a new system with another or with systems that contributed to the pools ("contributors"), an even larger measurement depth (e.g. 1,000) is often used for computing evaluation metrics. In contrast, the web diversity test collections that have been created in the past few years at TREC and NTCIR use a much smaller pool depth (e.g. 20). The measurement depth is also small (e.g. 10-30), as search result diversification is primarily intended for the first result page. In this study, we examine the reusability of a typical web diversity test collection, namely, one from the NTCIR-9 INTENT-1 Chinese Document Ranking task, which used a pool depth of 20 and official measurement depths of 10, 20 and 30. First, we conducted additional relevance assessments to expand the official INTENT-1 collection to achieve a pool depth of 40. Using the expanded relevance assessments, we show that run rankings at the measurement depth of 30 are too unreliable, given that the pool depth is 20. Second, we conduct a leave-one-out experiment for every participating team of the INTENT-1 Chinese task, to examine how (un)fairly new runs are evaluated with the INTENT-1 collection. We show that, for the purpose of comparing new systems with the contributors of the test collection being used, condensed-list versions of existing diversity evaluation metrics are more reliable than the raw metrics. However, even the condensed-list metrics may be unreliable if the new systems are not competitive compared to the contributors. © Springer-Verlag 2012.

    DOI

  • One click one revisited: Enhancing evaluation based on information units

    Tetsuya Sakai, Makoto P. Kato

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7675   39 - 51  2012年  [査読有り]

     概要を見る

    This paper extends the evaluation framework of the NTCIR-9 One Click Access Task (1CLICK-1), which required systems to return a single, concise textual output in response to a query in order to satisfy the user immediately after a click on the SEARCH button. Unlike traditional nugget-based summarisation and question answering evaluation methods, S-measure, the official evaluation metric of 1CLICK-1, discounts the value of each information unit based on its position within the textual output. We first show that the discount parameter L of S-measure affects system ranking and discriminative power, and that using multiple values, e.g. L = 250 (user has only 30 seconds to view the text) and L = 500 (user has one minute), is beneficial. We then complement the recall-like S-measure with a simple, precision-like metric called T-measure as well as a combination of S-measure and T-measure, called S#. We show that S# with a heavy emphasis on S-measure imposes an appropriate length penalty to 1CLICK-1 system outputs and yet achieves discriminative power that is comparable to S-measure. These new metrics will be used at NTCIR-10 1CLICK-2. © Springer-Verlag 2012.

    DOI
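The position-based discount behind S-measure, described in the abstract above, can be sketched in Python. This is an illustrative simplification, not the paper's exact definition: the linear decay max(0, 1 − offset/L) and the normalisation by an ideal (pseudo minimal) output are assumptions for the sketch.

```python
def s_measure(matched, ideal, L=500):
    """Position-discounted nugget recall, sketching S-measure.

    matched: (weight, character offset) pairs for nuggets found in the
             system output
    ideal:   (weight, character offset) pairs for the same nugget pool
             in an ideal (pseudo minimal) output
    L:       patience parameter (roughly, characters the user is
             willing to read; L=250 ~ 30 seconds, L=500 ~ one minute)
    """
    def decay(offset):
        # value decreases linearly with position and vanishes at L
        return max(0.0, 1.0 - offset / L)

    num = sum(w * decay(off) for w, off in matched)
    den = sum(w * decay(off) for w, off in ideal)
    return num / den if den else 0.0
```

With this sketch, a nugget placed halfway through the patience window retains half its value, which is how the metric rewards systems that present important nuggets first.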

  • Grid-based interaction for exploratory search

    Hideo Joho, Tetsuya Sakai

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7675   496 - 505  2012年  [査読有り]

     概要を見る

    This paper presents a grid-based interaction model that is designed to encourage searchers to organize a complex search space by managing n × m subspaces. A search interface was developed based on the proposed interaction model, and its performance was evaluated by a user study carried out in the context of the NTCIR-9 VisEx Task. With the proposed interface, there were cases where subjects discovered new knowledge without accessing external resources when compared to a baseline system. The encouraging results from experiments warrant further studies on the model. © Springer-Verlag 2012.

    DOI

  • Using graded-relevance metrics for evaluating community QA answer selection

    Tetsuya Sakai, Yohei Seki, Daisuke Ishikawa, Kazuko Kuriyama, Noriko Kando, Chin-Yew Lin

    Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011     187 - 196  2011年  [査読有り]

     概要を見る

    Community Question Answering (CQA) sites such as Yahoo! Answers have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites can be irrelevant, incomplete, redundant, incorrect, biased, ill-formed or even abusive. Hence, automatic selection of "good" answers for a given posted question is a practical research problem that will help us manage the quality of accumulated knowledge. One way to evaluate answer selection systems for CQA would be to use the Best Answers (BAs) that are readily available from the CQA sites. However, BAs may be biased, and even if they are not, there may be other good answers besides BAs. To remedy these two problems, we propose system evaluation methods that involve multiple answer assessors and graded-relevance information retrieval metrics. Our main findings from experiments using the NTCIR-8 CQA task data are that, using our evaluation methods, (a) we can detect many substantial differences between systems that would have been overlooked by BA-based evaluation; and (b) we can better identify hard questions (i.e. those that are handled poorly by many systems and therefore require focussed investigation) compared to BA-based evaluation. We therefore argue that our approach is useful for building effective CQA answer selection systems despite the cost of manual answer assessments. Copyright 2011 ACM.

    DOI

  • Query Session Data vs. Clickthrough Data as Query Suggestion Resources

    Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka

    ECIR 2011 Workshop on Session Information Retrieval    2011年  [査読有り]

  • Challenges in Diversity Evaluation (keynote)

    Tetsuya Sakai

    ECIR 2011 Workshop on Diversity in Document Retrieval    2011年  [査読有り]

  • Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

    Naoyoshi Aikawa, Tetsuya Sakai, Hayato Yamana

    情報処理学会論文誌   2011 ( 1 ) 1 - 9  2011年  [査読有り]

    CiNii

  • Evaluating Diversified Search Results Using Per-Intent Graded Relevance

    Tetsuya Sakai, Ruihua Song

    ACM SIGIR 2011    2011年  [査読有り]

  • NTCIREVAL: A Generic Toolkit for Information Access Evaluation

    Tetsuya Sakai

    FIT 2011    2011年  [査読有り]

  • コミュニティQAにおける良質回答の自動予測

    石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

    情報知識学会誌    2011年  [査読有り]

  • 北京のマイクロソフト研究所より2011 - 日本人インターンの成功事例 -

    酒井哲也

    若手研究者支援のための産学共同GCOE国内シンポジウムダイジェスト集    2011年  [査読有り]

  • What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria

    Daisuke Ishikawa, Noriko Kando, Tetsuya Sakai

    EVIA2011    2011年  [査読有り]

  • Q&Aサイトにおけるベストアンサー推定の分析とその機械学習への応用

    石川 大介, 栗山 和子, 酒井 哲也, 関 洋平, 神門 典子

    情報知識学会誌   20 ( 2 ) 73 - 85  2010年05月

     概要を見る

    In this study, we examined whether a computer can predict Best Answers on Q&A sites. First, we conducted a Best Answer prediction experiment with human judges, using Yahoo! Chiebukuro data: 50 questions randomly sampled from each of four categories ("love advice," "PC," "general knowledge," and "politics"). The accuracy (precision) of the two judges' predictions was 50% and 52% for "love advice" (random guessing: 34%), 62% and 58% for "PC" (random: 38%), 54% and 56% for "general knowledge" (random: 37%), and 56% and 60% for "politics" (random: 35.8%). We then analysed these results and built a machine learning system that uses "detailed," "well-grounded," and "polite" as features for selecting Best Answers. In a prediction experiment on the same 50 questions, the machine learning system outperformed the human judges for "PC" (67%) and underperformed them for "love advice" (41%); for "general knowledge" and "politics," the system and the judges performed roughly on par.

    DOI CiNii

  • Boiling Down Information Retrieval Test Collections

    Tetsuya Sakai, Teruko Mitamura

    RIAO 2010 Proceedings    2010年  [査読有り]

  • Constructing a Test Collection with Multi-Intent Queries

    Ruihua Song, Dongjie Qi, Hua Liu, Tetsuya Sakai, Jian-Yun Nie, Hsiao-Wen Hon, Yong Yu

    EVIA 2010 Proceedings    2010年  [査読有り]

  • Simple Evaluation Metrics for Diversified Search Results

    Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, Chin-Yew Lin

    EVIA 2010 Proceedings    2010年  [査読有り]

  • Ranking Retrieval Systems without Relevance Assessments – Revisited

    Tetsuya Sakai, Chin-Yew Lin

    EVIA 2010 Proceedings    2010年  [査読有り]

  • コミュニティQAにおける良質な回答の選定タスク: 評価方法に関する考察

    酒井哲也, 石川大介, 栗山和子, 関洋平, 神門典子

    FIT 2010    2010年  [査読有り]

  • Community QA Question Classification: Is the Asker Looking for Subjective Answers or Not?

    Naoyoshi Aikawa, Tetsuya Sakai, Hayato Yamana

    WebDB Forum 2010    2010年  [査読有り]

  • On the robustness of information retrieval metrics to biased relevance assessments

    Tetsuya Sakai

    Journal of Information Processing   17   156 - 166  2009年  [査読有り]

     概要を見る

    Information Retrieval (IR) test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used IR evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems and the number of pooled documents. Even though previous studies have shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold when the relevance data are biased towards particular systems or towards the top of the pools. More specifically, we show that the condensed-list versions of Average Precision, Q-measure and normalised Discounted Cumulative Gain, which we denote as AP′, Q′ and nDCG′, are not necessarily superior to the original metrics for handling biases. Nevertheless, AP′ and Q′ are generally superior to bpref, Rank-Biased Precision and its condensed-list version even in the presence of biases.

    DOI
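The condensed-list approach referred to in several of the abstracts above (AP′ etc.) simply removes unjudged documents from the ranked list before scoring. A minimal binary-relevance sketch, with helper names of my own choosing:

```python
def average_precision(ranked, qrels):
    """Binary Average Precision over a ranked list.

    ranked: document ids, best first
    qrels:  dict mapping judged doc -> True (relevant) / False;
            documents absent from qrels are unjudged
    """
    R = sum(1 for rel in qrels.values() if rel)  # total relevant docs
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, 1):
        if qrels.get(doc, False):
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / R if R else 0.0

def condensed_ap(ranked, qrels):
    """AP' : score the condensed list, i.e. the ranked list with all
    unjudged documents filtered out (Sakai's condensed-list approach)."""
    condensed = [doc for doc in ranked if doc in qrels]
    return average_precision(condensed, qrels)
```

Unjudged documents thus neither hurt nor help a run under AP′, which is why the papers above find it overestimates new (non-contributing) systems while raw AP underestimates them.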

  • Serendipitous Search via Wikipedia: A Query Log Analysis

    Tetsuya Sakai, Kenichi Nogami

    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL     780 - 781  2009年  [査読有り]

     概要を見る

    We analyse the query log of a click-oriented Japanese search engine that utilises the link structures of Wikipedia for encouraging the user to change his information need and to perform repeated, serendipitous, exploratory search. Our results show that users tend to make transitions within the same query type: from person names to person names, from place names to place names, and so on.

  • Ranking the NTCIR ACLIA IR4QA Systems without Relevance Assessments

    Tetsuya Sakai, Noriko Kando, Hideki Shima, Chuan-Jie Lin, Ruihua Song, Miho Sugimoto, Teruko Mitamura

    日本データベース学会論文誌    2009年  [査読有り]

  • People, Clouds, and Interaction for Information Access (invited paper)

    Tetsuya Sakai

    IUCS 2009    2009年  [査読有り]

  • On information retrieval metrics designed for evaluation with incomplete relevance assessments

    Tetsuya Sakai, Noriko Kando

    INFORMATION RETRIEVAL   11 ( 5 ) 447 - 470  2008年10月  [査読有り]

     概要を見る

    Modern information retrieval (IR) test collections have grown in size, but the available manpower for relevance assessments has more or less remained constant. Hence, how to reliably evaluate and compare IR systems using incomplete relevance data, where many documents exist that were never examined by the relevance assessors, is receiving a lot of attention. This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs: the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task. Following previous work, we artificially reduce the original relevance data to simulate IR evaluation environments with extremely incomplete relevance data. We then investigate the effect of this reduction on discriminative power, which we define as the proportion of system pairs with a statistically significant difference for a given probability of Type I Error, and on Kendall's rank correlation, which reflects the overall resemblance of two system rankings according to two different metrics or two different relevance data sets. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also point out some weaknesses of bpref and Rank-Biased Precision by examining their formal definitions.

    DOI
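Kendall's rank correlation, used throughout these papers to compare two system rankings, can be computed directly from concordant and discordant pairs. A minimal tie-free sketch:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's rank correlation between two system rankings.

    rank_a, rank_b: lists containing the same systems, best first,
    with no ties. Returns a value in [-1, 1]; 1 means identical
    rankings, -1 means exactly reversed rankings.
    """
    position_b = {system: i for i, system in enumerate(rank_b)}
    concordant = discordant = 0
    # every pair (x, y) below has x ranked above y in rank_a
    for x, y in combinations(rank_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)
```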

  • Introduction to the NTCIR-6 Special Issue

    Noriko Kando, Teruko Mitamura, Tetsuya Sakai

    ACM Transactions on Asian Language Information Processing (TALIP)    2008年  [査読有り]

  • Precision-at-ten considered redundant

    William Webber, Alistair Moffat, Justin Zobel, Tetsuya Sakai

    ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings     695 - 696  2008年  [査読有り]

     概要を見る

    Information retrieval systems are compared using evaluation metrics, with researchers commonly reporting results for simple metrics such as precision-at-10 or reciprocal rank together with more complex ones such as average precision or discounted cumulative gain. In this paper, we demonstrate that complex metrics are as good as or better than simple metrics at predicting the performance of the simple metrics on other topics. Therefore, reporting of results from simple metrics alongside complex ones is redundant.

    DOI

  • Comparing Metrics across TREC and NTCIR: The Robustness to Pool Depth Bias

    Tetsuya Sakai

    ACM SIGIR 2008 Proceedings    2008年  [査読有り]

    CiNii

  • クリックスルーに基づく探検型検索サイトの設計と開発

    酒井 哲也, 小山田 浩史, 野上 謙一, 北村 仁美, 梶浦 正浩, 東 美奈子, 野中 由美子, 小野 雅也, 菊池 豊

    第7回情報科学技術フォーラム2008    2008年  [査読有り]

    CiNii

  • Comparing metrics across TREC and NTCIR: The robustness to system bias

    Tetsuya Sakai

    International Conference on Information and Knowledge Management, Proceedings     581 - 590  2008年  [査読有り]

     概要を見る

    Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in a more realistic setting, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that these results do not hold in the presence of system bias. In our experiments using TREC and NTCIR data, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, even under system bias, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'. Copyright 2008 ACM.

    DOI
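Rank-Biased Precision, one of the metrics examined in the abstract above, is defined by Moffat and Zobel as a geometrically weighted sum: the user moves from one rank to the next with persistence p. A minimal binary-relevance sketch:

```python
def rbp(rels, p=0.95):
    """Rank-Biased Precision (Moffat & Zobel).

    rels: binary relevance values (0/1) down the ranking
    p:    persistence; the probability of moving to the next rank
    """
    # rank i contributes rel_i * p^(i-1); (1 - p) normalises the sum
    return (1 - p) * sum(rel * p ** (i - 1) for i, rel in enumerate(rels, 1))
```

With p = 0.5 a single relevant document at rank 1 already yields 0.5, reflecting a very impatient user; the condensed-list variant RBP′ discussed above would simply be this function applied after unjudged documents are removed.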

  • Modelling A User Population for Designing Information Retrieval Metrics

    Tetsuya Sakai, Stephen Robertson

    Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)    2008年  [査読有り]

  • On the reliability of information retrieval metrics based on graded relevance

    Tetsuya Sakai

    INFORMATION PROCESSING & MANAGEMENT   43 ( 2 ) 531 - 548  2007年03月  [査読有り]

     概要を見る

    This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values. (c) 2006 Elsevier Ltd. All rights reserved.

    DOI
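A minimal sketch of nDCG at a document cut-off l, assuming the standard log2 discount (the papers above compare several (A)nDCG variants, so this is illustrative only):

```python
from math import log2

def ndcg_at(gains, ideal_gains, l):
    """nDCG at document cut-off l.

    gains:       gain value of the document at each rank of the run
    ideal_gains: the gain values of all relevant documents (any order);
                 they are sorted into the ideal ranking internally
    """
    def dcg(gs):
        # rank 1 is undiscounted since log2(1 + 1) = 1
        return sum(g / log2(rank + 1) for rank, g in enumerate(gs[:l], 1))

    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0
```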

  • On the Reliability of Factoid Question Answering Evaluation

    Tetsuya Sakai

    ACM Transactions on Asian Language Information Processing (TALIP)    2007年  [査読有り]

  • On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    Tetsuya Sakai

    Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007)     32 - 43  2007年  [査読有り]

    CiNii

  • User Satisfaction Task: A Proposal for NTCIR-7

    Tetsuya Sakai

    Proceedings of the First Workshop on Evaluating Information Access (EVIA 2007)    2007年  [査読有り]

  • Pic-A-Topic: Efficient Viewing of Informative TV Contents on Travel, Cooking, Food and More

    Tetsuya Sakai, Tatsuya Uehara, Taishi Shimomori, Makoto Koyama, Mika Fukui

    RIAO 2007 Proceedings    2007年  [査読有り]

  • Alternatives to Bpref

    Tetsuya Sakai

    Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07     71 - 78  2007年  [査読有り]

     概要を見る

    Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref which has been designed for evaluation environments with incomplete relevance data. A graded-relevance version of this metric called rpref has also been proposed. However, we show that the application of Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, we show that the use of graded relevance boosts the robustness of IR evaluation to incompleteness and therefore that Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four graded-relevance test collections from NTCIR to compare ten different IR metrics in terms of system ranking stability and pairwise discriminative power. Copyright 2007 ACM.

    DOI

  • Evaluating the Task of Finding One Relevant Document Using Incomplete Relevance Data

    Tetsuya Sakai

    FIT 2007 Information Technology Letters    2007年  [査読有り]

  • Evaluating Information Retrieval Metrics based on Bootstrap Hypothesis Tests

    Tetsuya Sakai

    IPSJ TOD    2007年  [査読有り]

  • On the Properties of Evaluation Metrics for Finding One Highly Relevant Document

    Tetsuya Sakai

    IPSJ TOD    2007年  [査読有り]

  • 高精度な音声入力質問応答のための疑問表現補完

    筒井 秀樹, 真鍋 俊彦, 福井 美佳, 酒井 哲也, 藤井 寛子, 浦田 耕二

    情報処理学会論文誌    2007年  [査読有り]

  • よりよい検索システム実現のために:正解の良し悪しを考慮した情報検索評価の動向

    酒井哲也

    情報処理    2006年  [査読有り]

  • A Further Note on Evaluation Metrics for the Task of Finding One Highly Relevant Document

    Tetsuya Sakai

    IPSJ SIG Technical Report    2006年  [査読有り]

  • On the Task of Finding One Highly Relevant Document with High Precision

    Tetsuya Sakai

    IPSJ TOD    2006年  [査読有り]

  • Give Me Just One Highly Relevant Document: P-measure

    Tetsuya Sakai

    ACM SIGIR 2006 Proceedings    2006年  [査読有り]

  • Evaluating Evaluation Metrics based on the Bootstrap

    Tetsuya Sakai

    ACM SIGIR 2006 Proceedings    2006年  [査読有り]

    CiNii

  • NTCIRに基づく文書検索技術の進歩に関する一考察

    酒井哲也

    情報科学技術レターズ    2006年  [査読有り]

  • Improving the robustness to recognition errors in speech input question answering

    Hideki Tsutsui, Toshihiko Manabe, Mika Fukui, Tetsuya Sakai, Hiroko Fujii, Koji Urata

    INFORMATION RETRIEVAL TECHNOLOLGY, PROCEEDINGS   4182   297 - 312  2006年  [査読有り]

     概要を見る

    In our previous work, we developed a prototype of a speech-input help system for home appliances such as digital cameras and microwave ovens. Given a factoid question, the system performs textual question answering using the manuals as the knowledge source. In contrast, given a HOW question, it retrieves and plays a demonstration video. However, our first prototype suffered from speech recognition errors, especially when the Japanese interrogative phrases in factoid questions were misrecognized. We therefore propose a method for solving this problem, which complements a speech query transcript with an interrogative phrase selected from a pre-determined list. The selection process first narrows down candidate phrases based on co-occurrences within the manual text, and then computes the similarity between each candidate and the query transcript in terms of pronunciation. Our method improves the Mean Reciprocal Rank of top three answers from 0.429 to 0.597 for factoid questions.

  • Pic-A-Topic: Gathering information efficiently from recorded TV shows on travel

    Tetsuya Sakai, Tatsuya Uehara, Kazuo Sumita, Taishi Shimomori

    INFORMATION RETRIEVAL TECHNOLOLGY, PROCEEDINGS   4182   429 - 444  2006年  [査読有り]

     概要を見る

    We introduce a system called Pic-A-Topic, which analyses closed captions of Japanese TV shows on travel to perform topic segmentation and topic sentence selection. Our objective is to provide a table-of-contents interface that enables efficient viewing of desired topical segments within recorded TV shows to users of appliances such as hard disk recorders and digital TVs. According to our experiments using 14.5 hours of recorded travel TV shows, Pic-A-Topic's F1-measure for the topic segmentation task is 82% of manual performance on average. Moreover, a preliminary user evaluation experiment suggests that this level of performance may be indistinguishable from manual performance.

  • Bootstrap-based comparisons of IR metrics for finding one relevant document

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS   4182   374 - 389  2006年  [査読有り]

     概要を見る

    This paper compares the sensitivity of IR metrics designed for the task of finding one relevant document, using a method recently proposed at SIGIR 2006. The metrics are: P(+)-measure, P-measure, O-measure, Normalised Weighted Reciprocal Rank (NWRR) and Reciprocal Rank (RR). All of them except for RR can handle graded relevance. Unlike the ad hoc (but nevertheless useful) "swap" method proposed by Voorhees and Buckley, the new method derives the sensitivity and the performance difference required to guarantee a given significance level directly from Bootstrap Hypothesis Tests. We use four data sets from NTCIR to show that, according to this method, "P(+)-measure >= O-measure >= NWRR >= RR" generally holds, where ">=" means "is at least as sensitive as". These results generalise and reinforce previously reported ones based on the swap method. Therefore, we recommend the use of P(+)-measure and O-measure for practical tasks such as known-item search where recall is either unimportant or immeasurable.
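A paired bootstrap hypothesis test of the kind referred to above can be sketched as follows; the shift-based null hypothesis and the names below are my illustration, not the paper's exact procedure:

```python
import random

def paired_bootstrap_test(scores_x, scores_y, B=1000, seed=0):
    """Paired bootstrap test on per-topic scores of two systems.

    Resamples topics with replacement under the null hypothesis of a
    zero mean difference and returns the achieved significance level
    (ASL); small values indicate a significant difference.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(scores_x, scores_y)]
    observed = sum(diffs) / len(diffs)
    # shift the differences so that the null hypothesis holds exactly
    shifted = [d - observed for d in diffs]
    count = 0
    for _ in range(B):
        sample = [rng.choice(shifted) for _ in diffs]
        if abs(sum(sample) / len(sample)) >= abs(observed):
            count += 1
    return count / B
```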

  • Ranking the NTCIR systems based on multigrade relevance

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY   3411   251 - 262  2005年  [査読有り]

     概要を見る

    At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multi-grade relevance. This paper shows, through a theoretical analysis, that Q-measure inherits both the reliability of non-interpolated Average Precision and the multigrade relevance capability of Average Weighted Precision, and then verifies this claim through experiments that rank the systems submitted to the NTCIR-3 CLIR Task. Our experiments confirm that the Q-measure ranking is very highly correlated with the Average Precision ranking and that it is more reliable than Average Weighted Precision.

  • 評価型ワークショップにおけるシステム順位の安定性について

    酒井哲也

    言語処理学会第11回年次大会 併設ワークショップ「評価型ワークショップを考える」    2005年  [査読有り]

  • 固有表現抽出と回答タイプ体系が質問応答システムの性能に与える影響(自然言語処理)

    市村由美, 齋藤佳美, 酒井哲也, 国分智晴, 小山誠

    電子情報通信学会 論文誌    2005年  [査読有り]

  • Flexible Pseudo-Relevance Feedback via Selective Sampling

    Tetsuya Sakai, Toshihiko Manabe, Makoto Koyama

    ACM TALIP    2005年  [査読有り]

  • Advanced Technologies for Information Access (invited paper)

    Tetsuya Sakai

    International Journal of Computer Processing of Oriental Languages    2005年  [査読有り]

  • ひとつの高適合文書を高精度に検索するタスクのための評価指標

    酒井哲也

    情報科学技術レターズ    2005年  [査読有り]

  • The reliability of metrics based on graded relevance

    Tetsuya Sakai

    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS   3689   1 - 16  2005年  [査読有り]

     概要を見る

    This paper compares 14 metrics designed for information retrieval evaluation with graded relevance, together with 10 traditional metrics based on binary relevance, in terms of reliability and resemblance of system rankings. More specifically, we use two test collections with submitted runs from the Chinese IR and English IR tasks in the NTCIR-3 CLIR track to examine the metrics using methods proposed by Buckley/Voorhees and Voorhees/Buckley as well as Kendall's rank correlation. Our results show that AnDCG(l) and nDCG(l) ((Average) Normalised Discounted Cumulative Gain at document cut-off l) are good metrics, provided that l is large. However, if one wants to avoid the parameter l altogether, or if one requires a metric that closely resembles TREC Average Precision, then Q-measure appears to be the best choice.

  • Introduction to the special issue: Recent advances in information processing and access for Japanese

    Tetsuya Sakai, Yuji Matsumoto

    ACM Transactions on Asian Language Information Processing   4 ( 4 ) 275 - 376  2005年  [査読有り]

    DOI

  • The Relationship between Answer Ranking and User Satisfaction in a Question Answering System

    Tomoharu Kokubu, Tetsuya Sakai, Yoshimi Saito, Hideki Tsutsui, Toshihiko Manabe, Makoto Koyama, Hiroko Fujii

    NTCIR-5 Proceedings (Open Submission Session)    2005年  [査読有り]

  • The Effect of Topic Sampling on Sensitivity Comparisons of Information Retrieval Metrics

    Tetsuya Sakai

    NTCIR-5 Proceedings (Open Submission Session)    2005年  [査読有り]

  • ASKMi: A Japanese Question Answering System based on Semantic Role Analysis

    Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, Tomoharu Kokubu, Toshihiko Manabe

    RIAO 2004    2004年  [査読有り]

    CiNii

  • New Performance Metrics based on Multigrade Relevance

    Tetsuya Sakai

    NTCIR-4 Proceedings (Open Submission Session)    2004年  [査読有り]

    CiNii

  • The Effect of Back-Formulating Questions in Question Answering Evaluation

    Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Tomoharu Kokubu, Makoto Koyama

    ACM SIGIR 2004    2004年  [査読有り]

    CiNii

  • 汎用シソーラスと擬似適合性フィードバックとを用いた検索質問拡張

    小山誠, 真鍋俊彦, 木村和広, 酒井哲也

    「情報アクセスのためのテキスト処理」シンポジウム    2003年  [査読有り]

  • BRIDJE over a Language Barrier: Cross-Language Information Access by Integrating Translation and Retrieval

    Tetsuya Sakai, Makoto Koyama, Masaru Suzuki, Akira Kumano, Toshihiko Manabe

    IRAL 2003    2003年  [査読有り]

    CiNii

  • Evaluating Retrieval Performance for Japanese Question Answering: What Are Best Passages?

    Tetsuya Sakai, Tomoharu Kokubu

    ACM SIGIR 2003    2003年  [査読有り]

    CiNii

  • Average Gain Ratio: A Simple Retrieval Performance Measure for Evaluation with Multiple Relevance Levels

    Tetsuya Sakai

    ACM SIGIR 2003    2003年  [査読有り]

  • Relative and Absolute Term Selection Criteria: A Comparative Study for English and Japanese IR

    Tetsuya Sakai, Stephen E. Robertson

    ACM SIGIR 2002    2002年  [査読有り]

  • Generating transliteration rules for cross-language information retrieval from machine translation dictionaries

    Tetsuya Sakai, Akira Kumano, Toshihiko Manabe

    Proceedings of the IEEE International Conference on Systems, Man and Cybernetics   6   290 - 295  2002年  [査読有り]

     概要を見る

    This paper describes a method for automatically converting existing English-Japanese and Japanese-English machine translation dictionaries into English-Japanese transliteration rules and Japanese-English back-transliteration rules for cross language information retrieval. An existing English-katakana word alignment module, which is part of our own machine translation system, is exploited in generating probabilistic rewriting rules. If our system is allowed to output 15 candidate spellings, it successfully transliterates more than 75% of a set of out-of-vocabulary English words into katakana, and successfully back-transliterates more than 55% of a set of out-of-vocabulary katakana words into English. Moreover, our preliminary cross-language information retrieval experiments, which treat the candidate spellings as a group of synonyms, suggest that our methods can indeed compensate for the failure of machine translation in some cases.

    DOI

  • The Use of External Text Data in Cross-Language Information Retrieval based on Machine Translation

    Tetsuya Sakai

    IEEE SMC 2002    2002年  [査読有り]

  • 意味役割解析に基づく高適合英語文書の検索

    酒井哲也, 小山誠, 鈴木優, 真鍋俊彦

    FIT 2002 情報技術レターズ LD-8     67 - 68  2002年  [査読有り]

    CiNii

  • A framework for cross-language information access: Application to English and Japanese

    Gareth Jones, Nigel Collier, Tetsuya Sakai, Kazuo Sumita, Hideki Hirakawa

    Computers and the Humanities   35 ( 4 ) 371 - 388  2001年11月  [査読有り]

    Internet search engines allow access to online information from all over the world. However, there is currently a general assumption that users are fluent in the languages of all documents that they might search for. This has for historical reasons usually been a choice between English and the locally supported language. Given the rapidly growing size of the Internet, it is likely that future users will need to access information in languages in which they are not fluent or have no knowledge of at all. This paper shows how information retrieval and machine translation can be combined in a cross-language information access framework to help overcome the language barrier. We present encouraging preliminary experimental results using English queries to retrieve documents from the standard Japanese language BMIR-J2 retrieval test collection. We outline the scope and purpose of cross-language information access and provide an example application to suggest that technology already exists to provide effective and potentially useful applications.

  • Flexible Pseudo-Relevance Feedback via Direct Mapping and Categorization of Search Requests

    Tetsuya Sakai, Stephen E. Robertson, Stephen Walker

    ECIR 2001    2001年  [査読有り]

  • Japanese-English Cross-Language Information Retrieval using Machine Translation and Pseudo-Relevance Feedback

    Tetsuya Sakai

    International Journal of Computer Processing of Oriental Languages   14 ( 2 ) 83 - 107  2001年  [査読有り]

    DOI CiNii

  • Flexible Pseudo-Relevance Feedback Using Optimization Tables

    Tetsuya Sakai, Stephen E. Robertson

    ACM SIGIR 2001    2001年  [査読有り]

  • Generic Summaries for Indexing in Information Retrieval

    Tetsuya Sakai, Karen Sparck Jones

    ACM SIGIR 2001    2001年  [査読有り]

    CiNii

  • Combining the Ranked Output from Fulltext and Summary Indexes

    Tetsuya Sakai

    ACM SIGIR 2001 Workshop on Text Summarization    2001年  [査読有り]

  • Incremental relevance feedback in Japanese text retrieval

    Gareth Jones, Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

    Information Retrieval   2 ( 4 ) 361 - 384  2000年  [査読有り]

    The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using "number-to-view" graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, but also a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones. © 2000 Kluwer Academic Publishers.

    DOI

  • MT-based Japanese-English Cross-Language IR Experiments using the TREC Test Collections

    Tetsuya Sakai

    IRAL 2000    2000年  [査読有り]

  • A First Step towards Flexible Local Feedback for Ad hoc Retrieval

    Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

    IRAL 2000    2000年  [査読有り]

  • 確率モデルに基づく日本語情報フィルタリングにおけるフィードバックによる検索条件展開および検索精度評価

    酒井哲也, Gareth J.F. Jones, 梶浦正浩, 住田一男

    情報処理学会論文誌    1999年  [査読有り]

  • A comparison of query translation methods for English-Japanese cross-language information retrieval

    Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, Kazuo Sumita

    SIGIR '99: Proceedings of the 22nd International Conference on Research and Development in Information Retrieval     269 - 270  1999年  [査読有り]

    In this paper we report results of an investigation into English-Japanese Cross-Language Information Retrieval (CLIR) comparing a number of query translation methods. Results from experiments using the standard BMIR-J2 Japanese collection suggest that full machine translation (MT) can outperform popular dictionary-based query translation methods and further that in this context MT is largely robust to queries with little linguistic structure.

  • Exploring the use of Machine Translation resources for English-Japanese Cross-Language Information Retrieval

    Gareth Jones, Tetsuya Sakai, Nigel Collier, Akira Kumano, Kazuo Sumita

    MT Summit VII Workshop on Machine Translation for Cross Language Information Retrieval    1999年  [査読有り]

  • 日本語情報検索システム評価用テストコレクションの構築

    木本 晴夫, 小川 泰嗣, 石川 徹也, 増永 良文, 福島 俊一, 田中 智博, 中渡瀬 秀一, 芥子 育雄, 豊浦 潤, 宮内 忠信, 上田 良寛, 松井 くにお, 木谷 強, 三池 誠司, 酒井 哲也, 徳永 健伸, 鶴岡 弘, 安形 輝

    情報処理学会論文誌    1999年  [査読有り]

  • 機械翻訳を用いた英日・日英言語横断検索に関する一考察

    酒井哲也, 梶浦正浩, 住田一男, Gareth Jones, Nigel Collier

    情報処理学会論文誌   40 ( 11 ) 4075 - 4086  1999年  [査読有り]

    CiNii

  • 情報検索システム評価のためのテストコレクション

    酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝, 神門典子

    Computer Today    1998年  [査読有り]

  • 日本語情報検索システム評価用テストコレクションの構築

    木本晴夫, 小川泰嗣, 石川徹也, 増永良文, 福島俊一, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 木谷強, 三池誠司, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

    情報学シンポジウム'98    1998年  [査読有り]

  • ユーザーの要求に応じた情報フィルタリングシステムNEATのプロファイル生成

    酒井哲也, Gareth J.F. Jones, 梶浦正浩, 住田一男

    Interaction '98     149 - 152  1998年  [査読有り]

    CiNii

  • Lessons from BMIR-J2: A Test Collection for Japanese IR Systems

    Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuo Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Tetsuya Sakai, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata

    ACM SIGIR '98    1998年  [査読有り]

  • Experiments in Japanese Text Retrieval and Routing using the NEAT System

    Gareth Jones, Tetsuya Sakai, Masahiro Kajiura, Kazuo Sumita

    ACM SIGIR '98    1998年  [査読有り]

  • Application of Query Expansion Techniques in Probabilistic Japanese News Filtering

    Tetsuya Sakai, Gareth Jones, Masahiro Kajiura, Kazuo Sumita

    IRAL '98    1998年  [査読有り]

  • 情報フィルタリングのためのブール式と文書構造を利用した検索条件生成と検索精度評価

    酒井 哲也, 梶浦 正浩, 住田 一男

    情報処理学会論文誌    1998年  [査読有り]

  • 日本語テキスト情報検索システムの評価用テストコレクション

    酒井哲也, 小川泰嗣, 木谷強, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 徳永健伸, 鶴岡弘, 安形輝

    アドバンストデータベースシンポジウム'98, パネル:マルチメディア情報検索ベンチマークの未来    1998年  [査読有り]

  • WWW上のフロー情報を対象にした情報フィルタ (FreshEye)

    住田一男, 上原龍也, 小野顕司, 酒井哲也, 池田朋男, 下郡信宏

    インタラクション'97    1997年  [査読有り]

  • 日本語情報検索システム評価用テストコレクションBMIR-J1

    福島俊一, 小川泰嗣, 石川徹也, 増永良文, 木本晴夫, 田中智博, 中渡瀬秀一, 芥子育雄, 豊浦潤, 宮内忠信, 上田良寛, 松井くにお, 三池誠司, 酒井哲也, 木谷強, 徳永健伸, 鶴岡弘, 安形輝

    自然言語処理シンポジウム'96    1996年  [査読有り]

  • A User Interface for Generating Dynamic Abstracts of Retrieved Documents

    Tetsuya Sakai, Etsuo Itoh, Seiji Miike, Kazuo Sumita

    47th FID    1994年  [査読有り]

書籍等出版物

  • Proceedings of ACM SIGIR 2021

    Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, Tetsuya Sakai, Alejandro Bellogín, Masaharu Yoshioka

    2021年

  • Evaluating Information Retrieval and Access Tasks: NTCIR’s Legacy of Research Impact

    Tetsuya Sakai, Douglas W. Oard, Noriko Kando

    Springer  2020年

  • Proceedings of the Open-Source IR Replicability Challenge (OSIRRC 2019)

    Ryan Clancy, Nicola Ferro, Claudia Hauff, Jimmy Lin, Tetsuya Sakai, Ze Zhong Wu

    2019年

  • U-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • Q-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • Expected Reciprocal Rank. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • ERR-IA. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • D-Measure. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • alpha-nDCG. In: Ling Liu and M. Tamer Özsu (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • Advanced Information Retrieval Measures. In: Liu L., Özsu M. (eds) Encyclopedia of Database Systems (Second Edition)

    Tetsuya Sakai

    Springer  2018年

  • Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power

    Tetsuya Sakai

    Springer  2018年

  • Proceedings of AIRS 2018 (LNCS 11292)

    Yuen-Hsien Tseng, Tetsuya Sakai, Jing Jiang, Lun-Wei Ku, Dae Hoon Park, Jui-Feng Yeh, Liang-Chih Yu, Lung-Hao Lee, Zhi-Hong Chen

    2018年

  • Proceedings of ACM SIGIR 2017

    Noriko Kando, Tetsuya Sakai, Hideo Joho, Hang Li, Arjen P. de Vries, Ryen W. White

    2017年

  • 人工知能学大事典

    人工知能学会

    共立出版  2017年

  • Proceedings of SPIRE 2016 (LNCS 9954)

    Shunsuke Inenaga, Kunihiko Sadakane, Tetsuya Sakai

    Springer  2016年

  • 情報アクセス評価方法論~検索エンジンの進歩のために~

    酒井哲也

    コロナ社  2015年

  • Proceedings of ACM SIGIR 2013

    Gareth J.F. Jones, Páraic Sheridan, Diane Kelly, Maarten de Rijke, Tetsuya Sakai

    2013年

  • Proceedings of NTCIR-10

    Noriko Kando, Kazuaki Kishida, Eric Tang, Tetsuya Sakai, Makoto P. Kato, Ka Po Chow, Isao Goto, Yotaro Watanabe, Tomoyosi Akiba, Hiromitsu Nishizaki, Akiko Aizawa, Mizuki Morita, Eiji Aramaki

    2013年

  • Proceedings of NTCIR-9

    Noriko Kando, Daisuke Ishikawa, Miho Sugimoto, Fredric C. Gey, Tetsuya Sakai, Tomoyosi Akiba, Hideki Shima, Shlomo Geva, Eric Tang, Andrew Trotman, Tsuneaki Kato, Bin Lu, Isao Goto

    2011年

  • Proceedings of the 3rd International Workshop on Evaluating Information Access (EVIA 2010)

    Tetsuya Sakai, Mark Sanderson, William Webber, Noriko Kando, Kazuaki Kishida

    2010年

  • Proceedings of the SIGIR 2009 Workshop on the Future of IR Evaluation

    Shlomo Geva, Jaap Kamps, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

    2009年

  • 5th Asia Information Retrieval Symposium (AIRS 2009)

    Gary Geunbae Lee, Dawei Song, Chin-Yew Lin, Akiko Aizawa, Kazuko Kuriyama, Masaharu Yoshioka, Tetsuya Sakai

    Springer  2009年

  • 言語処理学辞典

    共同執筆

    共立出版  2009年

  • Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)

    Tetsuya Sakai, Mark Sanderson, Noriko Kando, Miho Sugimoto

    2008年

  • Proceedings of AIRS 2008 (LNCS 4993)

    Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

    2008年

  • Proceedings of the First International Workshop on Evaluating Information Access (EVIA 2007)

    Tetsuya Sakai, Mark Sanderson, David Kirk Evans

    2007年

Misc

  • Voice Assistantアプリの対話型解析システムの開発

    刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

    電子情報通信学会技術研究報告(Web)   120 ( 384(ICSS2020 26-59) )  2021年

    J-GLOBAL

  • A Preview of the NTCIR-10 INTENT-2 Results

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, Mayu Iwata

    研究報告デジタルドキュメント(DD)   2013 ( 5 ) 1 - 8  2013年02月

    The second NTCIR INTENT task (INTENT-2) will be concluded at the NTCIR-10 conference in June 2013. The task comprises two subtasks: Subtopic Mining (given a query, return a ranked list of subtopic strings) and Document Ranking (given a query, return a diversified web search result). The task attracted participating teams from China, France, Japan and South Korea: 12 teams for Subtopic Mining and 4 teams for Document Ranking. This paper provides a preview of the official results of the task, while keeping the participating teams anonymous.

    CiNii

  • A Preview of the NTCIR-10 INTENT-2 Results

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, Mayu Iwata

    研究報告情報基礎とアクセス技術(IFAT)   2013 ( 5 ) 1 - 8  2013年02月

    The second NTCIR INTENT task (INTENT-2) will be concluded at the NTCIR-10 conference in June 2013. The task comprises two subtasks: Subtopic Mining (given a query, return a ranked list of subtopic strings) and Document Ranking (given a query, return a diversified web search result). The task attracted participating teams from China, France, Japan and South Korea: 12 teams for Subtopic Mining and 4 teams for Document Ranking. This paper provides a preview of the official results of the task, while keeping the participating teams anonymous.

    CiNii

  • Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering (データベース Vol.5 No.2)

    Hajime Morita, Tetsuya Sakai, Manabu Okumura

    情報処理学会論文誌データベース(TOD)   5 ( 2 ) 11 - 16  2012年06月

    We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

    CiNii

  • Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments (情報学基礎(FI) Vol.2009-FI-95)

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, Teruko Mitamura

    研究報告情報学基礎(FI)   2009 ( 9 ) 1 - 8  2009年07月

    At the NTCIR-7 Workshop Meeting held in December 2008, participating systems of the ACLIA IR4QA task were evaluated based on "qrels version 1," which covered the depth-30 pool for every topic and went further down the pool for a limited number of topics, due to time constraints. This paper reports on revised results based on "qrels version 2" which covers the depth-100 pool for every topic. While the version 1 and version 2 results are generally in agreement, some differences in system rankings and significance test results suggest that the additional effort was worthwhile. This paper also reports on a set of additional experiments with new "pseudo-qrels," which mimic the qrels without relying on any manual relevance assessments. Our pseudo-qrels experiments are surprisingly successful: the Pearson correlation coefficients between performances based on our "size-100" pseudo-qrels and those based on qrels version 2 are over 0.9, and even the Kendall rank correlations are 0.58-0.86. Hence, for the next round of IR4QA at NTCIR-8, we may be able to predict system rankings with reasonable accuracy using size-100 pseudo-qrels, right after the run submission deadline.

    CiNii

  • Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments (データベースシステム(DBS) Vol.2009-DBS-148)

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, Teruko Mitamura

    研究報告データベースシステム(DBS)   2009 ( 9 ) 1 - 8  2009年07月

    At the NTCIR-7 Workshop Meeting held in December 2008, participating systems of the ACLIA IR4QA task were evaluated based on "qrels version 1," which covered the depth-30 pool for every topic and went further down the pool for a limited number of topics, due to time constraints. This paper reports on revised results based on "qrels version 2" which covers the depth-100 pool for every topic. While the version 1 and version 2 results are generally in agreement, some differences in system rankings and significance test results suggest that the additional effort was worthwhile. This paper also reports on a set of additional experiments with new "pseudo-qrels," which mimic the qrels without relying on any manual relevance assessments. Our pseudo-qrels experiments are surprisingly successful: the Pearson correlation coefficients between performances based on our "size-100" pseudo-qrels and those based on qrels version 2 are over 0.9, and even the Kendall rank correlations are 0.58-0.86. Hence, for the next round of IR4QA at NTCIR-8, we may be able to predict system rankings with reasonable accuracy using size-100 pseudo-qrels, right after the run submission deadline.

    CiNii
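The Pearson correlation used in the pseudo-qrels comparison above is a plain sample correlation over per-system effectiveness scores. A minimal sketch, with invented per-system scores (not values from the paper):

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical mean effectiveness scores for five systems under two qrels versions
# (invented numbers, purely for illustration).
qrels_v1 = [0.31, 0.28, 0.22, 0.40, 0.35]
qrels_v2 = [0.33, 0.27, 0.24, 0.42, 0.34]
print(round(pearson(qrels_v1, qrels_v2), 3))  # close to 1.0: the two versions agree
```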

  • 趣味と仕事

    酒井 哲也

    情報処理   49 ( 7 ) 835 - 835  2008年07月

    CiNii

  • Comparing metrics across TREC and NTCIR: the robustness to system bias (データベースシステム・情報学基礎)

    酒井 哲也

    情報処理学会研究報告データベースシステム(DBS)   2008 ( 56 ) 1 - 8  2008年06月

    Test collections are growing larger, and relevance data constructed through pooling are suspected of becoming more and more incomplete and biased. Several studies have used evaluation metrics specifically designed to handle this problem, but most of them have only examined the metrics under incomplete but unbiased conditions, using random samples of the original relevance data. This paper examines nine metrics in more realistic settings, by reducing the number of pooled systems. Even though previous work has shown that metrics based on a condensed list, obtained by removing all unjudged documents from the original ranked list, are effective for handling very incomplete but unbiased relevance data, we show that they are not necessarily superior to traditional metrics in the presence of system bias. Using data from both TREC and NTCIR, we first show that condensed-list metrics overestimate new systems while traditional metrics underestimate them, and that the overestimation tends to be larger than the underestimation. We then show that, when relevance data is heavily biased towards a single team or a few teams, the condensed-list versions of Average Precision (AP), Q-measure (Q) and normalised Discounted Cumulative Gain (nDCG), which we call AP', Q' and nDCG', are not necessarily superior to the original metrics in terms of discriminative power, i.e., the overall ability to detect pairwise statistical significance. Nevertheless, AP' and Q' are generally more discriminative than bpref and the condensed-list version of Rank-Biased Precision (RBP), which we call RBP'.

    CiNii

  • 情報検索テストコレクションと評価指標

    酒井 哲也

    情報処理学会研究報告自然言語処理(NL)   2008 ( 4 ) 1 - 8  2008年01月

    This paper describes what information retrieval test collections consist of and how they are constructed, and provides some formal definitions of evaluation metrics for measuring retrieval effectiveness. We then describe how to conduct sound evaluation experiments.

    CiNii

  • 情報検索テストコレクションと評価指標

    酒井 哲也

    情報処理学会研究報告情報学基礎(FI)   2008 ( 4 ) 1 - 8  2008年01月

    This paper describes what information retrieval test collections consist of and how they are constructed, and provides some formal definitions of evaluation metrics for measuring retrieval effectiveness. We then describe how to conduct sound evaluation experiments.

    CiNii

  • A further note on alternatives to Bpref (情報学基礎)

    Tetsuya Sakai, Noriko Kando

    情報処理学会研究報告情報学基礎(FI)   2007 ( 109 ) 7 - 14  2007年11月

    This paper compares the robustness of information retrieval (IR) metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs - two from TREC and two from NTCIR. We investigate the effect of reducing the original relevance data on discriminative power (i.e., how often statistical significance can be detected given the probability of Type I Error) and on Kendall's rank correlation between two system rankings. According to these experiments, Q', nDCG' and AP' proposed by Sakai are superior to bpref proposed by Buckley and Voorhees and to Rank-Biased Precision proposed by Moffat and Zobel. We also clarify some properties of these metrics that immediately follow from their definitions.

    CiNii
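Kendall's rank correlation between two system rankings, used in the comparison above, can be computed directly from concordant and discordant pair counts. A minimal sketch, assuming no ties and using made-up system rankings:

```python
from itertools import combinations

def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two rankings of the same systems.

    Each ranking maps a system name to its rank (1 = best); ties are not handled.
    tau = (concordant - discordant) / total number of system pairs.
    """
    systems = list(ranking_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign of rank difference in both rankings -> concordant pair.
        agree = (ranking_a[s] - ranking_a[t]) * (ranking_b[s] - ranking_b[t])
        if agree > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(systems)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical rankings of four systems under full vs. reduced relevance data.
full    = {"sysA": 1, "sysB": 2, "sysC": 3, "sysD": 4}
reduced = {"sysA": 1, "sysB": 3, "sysC": 2, "sysD": 4}
print(kendall_tau(full, reduced))  # one swapped pair out of six
```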

  • Q, Rの次はO, そしてP…

    酒井 哲也

    情報処理   48 ( 7 ) 761 - 761  2007年07月

    CiNii

  • 効率的な番組視聴を支援するための話題ラベルの生成とその評価

    小山 誠, 酒井 哲也, 福井 美佳, 上原 龍也, 下森 大志

    情報処理学会研究報告デジタルドキュメント(DD)   2007 ( 34 ) 17 - 23  2007年03月

    This paper describes a method for generating keyword, phrase and sentence labels for video segments of TV programs. By using a relevance feedback algorithm from information retrieval, it selects topic keywords, phrases and sentences from the closed-caption text of each topical segment. 39 subjects evaluated keyword, phrase and sentence labels from TV programs about travel, towns and cooking. The results show that keyword and phrase labels achieve better results than sentence labels on understandability and relevance.

    CiNii

  • Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    Tetsuya Sakai

    情報処理学会研究報告情報学基礎(FI)   2006 ( 94 ) 57 - 64  2006年09月

    Large-scale information retrieval evaluation efforts such as TREC and NTCIR have always used binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evaluation with graded relevance.

    CiNii

  • Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    Tetsuya Sakai

    情報処理学会研究報告自然言語処理(NL)   2006 ( 94 ) 57 - 64  2006年09月

    Large-scale information retrieval evaluation efforts such as TREC and NTCIR have always used binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 crosslingual task has finally announced that it will use graded-relevance metrics, though only as additional metrics. This paper compares graded-relevance metrics in terms of the ability to control the balance between retrieving highly relevant documents and retrieving any relevant documents early in the ranked list. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evaluation with graded relevance.

    CiNii
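For reference, the normalised Discounted Cumulative Gain (nDCG) that the abstract above compares Q-measure against divides the discounted cumulative gain of a ranked list by that of an ideal ordering. A minimal sketch using the common logarithmic discount; the gain values are illustrative, not data from the paper:

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, all_gains):
    """DCG of the ranked list divided by the DCG of an ideal ordering."""
    ideal = sorted(all_gains, reverse=True)[: len(gains)]
    return dcg(gains) / dcg(ideal)

# Hypothetical graded gains (3 = highly relevant, 0 = non-relevant) for one
# ranked list, plus the gains of every relevant document in the collection.
ranked_gains = [1, 3, 0, 2]
all_gains = [3, 2, 1, 1]
print(round(ndcg(ranked_gains, all_gains), 3))  # below 1.0: imperfect ordering
```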

  • 質問応答型検索の音声認識誤りに対するロバスト性向上

    筒井 秀樹, 真鍋 俊彦, 福井 美佳, 藤井 寛子, 浦田 耕二, 酒井 哲也

    情報処理学会研究報告自然言語処理(NL)   2005 ( 22 ) 31 - 38  2005年03月

     概要を見る

    我々はこれまで,質問応答型マルチモーダルヘルプシステムの開発を行ってきた.これはユーザからの質問に対し,映像・音声・取扱説明(テキスト)などで構成される表現力豊かなマルチモーダルコンテンツの検索技術,および,質問内容を理解し,ユーザが必要としている情報に対して的確に回答する質問応答技術を融合することにより,よりわかりやすい情報提供を実現したシステムである.この中で,音声入力による質問を処理する際,音声認識誤りが起きると,その後の処理がうまく行かず,適切な回答が出来ない場合があった.失敗原因を検討した結果,具体的な時間や量をきくFactoid型の質問に対する音声認識誤りの影響が大きいことがわかった.これは,音声認識誤りによって疑問詞についての情報が失われることにより,具体的に何を回答すべきかを判定する回答タイプ判定に失敗することが原因であった.そこで今回,音声認識誤りに対するロバスト性向上を目的とし,回答タイプが正しく判定されるように,音声認識結果を補完して検索する手法を開発した.その結果,上位3位までのMRR(Mean Reciprocal Rank)による検索精度で,従来手法が0.429であったのに対し,今回の手法では0.597に向上した.We have been developing a multimodal question answering system that combines the search technol-ogy for multimodal contents with high expressive power such as video, speech and text, and the factoid question answering technology for understanding the user's information need and extracting exact an-swers from text. Failure analyses of our system showed that speech recognition errors were fatal for answer type recognition and therefore for the final Mean Reciprocal Rank (MRR) performance, espe-cially with numerical factoid questions. We therefore propose a new method which is robust to speech recognition errors. This method improves our MRR based on top 3 answers from 0.429 to 0.597.

    CiNii
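
    The MRR figures quoted in the abstract above (0.429 improved to 0.597) are means of reciprocal ranks over questions, with a top-3 answer cutoff. A minimal sketch of the measure — the question and answer data below are invented for illustration:

```python
def reciprocal_rank(ranked_answers, correct, cutoff=3):
    """1/rank of the first correct answer within the cutoff; 0 if none."""
    for rank, answer in enumerate(ranked_answers[:cutoff], start=1):
        if answer in correct:
            return 1.0 / rank
    return 0.0

def mrr(runs):
    """Mean Reciprocal Rank over (ranked_answers, correct_answers) pairs."""
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)

# Invented toy data: three questions, each with a top-3 answer list.
runs = [
    (["1995", "1998", "2001"], {"1998"}),     # first correct at rank 2 -> 0.5
    (["Tokyo", "Osaka", "Kobe"], {"Tokyo"}),  # rank 1 -> 1.0
    (["red", "blue", "green"], {"white"}),    # no correct answer -> 0.0
]
print(mrr(runs))  # -> 0.5
```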

  • A Note on the Reliability of Japanese Question Answering Evaluation

    Tetsuya Sakai

    情報処理学会研究報告情報学基礎(FI)   2004 ( 119 ) 57 - 64  2004年11月

     概要を見る

    This paper compares existing QA evaluation metrics from the viewpoint of reliability and usefulness, using the NTCIR-4 QAC2 Japanese QA tasks and our adaptations of Buckley/Voorhees and Voorhees/Buckley reliability measurement methods. Our main findings are: (1) The fraction of questions with a correct answer within Top 5 (NQcorrect5) and that with a correct answer at Rank 1 (NQcorrect1) are not as stable as Reciprocal Rank based on ranked lists containing up to five answers. (2) Q-measure, which can handle multiple correct answers and answer correctness levels, is as reliable and useful as Reciprocal Rank, provided that a mild gain value assignment is used. Using answer correctness levels tends to hurt stability, while handling multiple correct answers improves it.

    CiNii

  • High-Precision Search via Question Abstraction for Japanese Question Answering

    Tetsuya SAKAI, Yoshimi SAITO, Tomoharu KOKUBU, Makoto KOYAMA, Toshihiko MANABE

    情報処理学会研究報告自然言語処理(NL)   2004 ( 93 ) 139 - 146  2004年09月

     概要を見る

    This paper explores the use of Question Abstraction, i.e., Named Entity Recognition for questions input by the user, for reranking retrieved documents to enhance retrieval precision for Japanese Question Answering (QA). Question Abstraction may help improve precision because (a) As named entities are often phrases, it may have effects that are similar to phrasal or proximity search; (b) As named entity recognition is context-sensitive, the named entity tags may help disambiguate ambiguous terms and phrases. Our experiments using several Japanese ``exact answer'' QA test collections show that this approach significantly improves IR precision, but that this improvement is not necessarily carried over to the overall QA performance. Additionally, we conduct preliminary experiments on the use of Question Abstraction for Pseudo-Relevance Feedback using Japanese {\em IR} test collections, and find positive (though not statistically significant) effects. Thus the Question Abstraction approach probably deserves further investigations.

    CiNii

  • 新聞記事からの用語定義の抽出と固有表現クラスに基づく分類

    小山 誠, 酒井 哲也, 真鍋 俊彦

    情報処理学会研究報告自然言語処理(NL)   2004 ( 93 ) 45 - 51  2004年09月

     概要を見る

    本報告では,質問応答システムなどの自然言語処理システムの言語知識の拡張のため,新聞記事から用語定義を抽出し,分類・体系化するシステムを提案する.本システムは,定義文に対する固有表現抽出結果から得られる固有表現の意味クラスと,定義文に対する形態素解析結果から抽出される語に基づき,用語定義を分類する.新聞記事を用いた評価実験を行った結果,14の意味クラスに対して,適合率82.1%,再現率50.8%で抽出した用語定義を分類できることを確認した.In this paper, we propose a system that uses Japanese newspaper corpora for extracting and classifying term definitions to expand the knowledge of a natural language system such as a question answering system. The system classifies term definitions based on semantic classes obtained through named entity extraction and words obtained through morphological analysis. In an experiment using news articles, the system classifies term definitions by 14 semantic classes and achieves 82.1% precision and 50.8% recall.

    CiNii

  • N-021 自然言語表現に基づく学生アンケート分析システム(N.教育・人文科学)

    酒井 哲也, 石田 崇, 後藤 正幸, 平澤 茂一

    情報科学技術フォーラム一般講演論文集   3 ( 4 ) 325 - 328  2004年08月

    CiNii

  • 係り受け木を用いた日本語文書の重要部分抽出

    伊藤 潤, 酒井 哲也, 平澤 茂一

    情報処理学会研究報告自然言語処理(NL)   2003 ( 108 ) 19 - 24  2003年11月

     概要を見る

    日本語の文は、係り受け関係をもとに木構造(係り受け木)で表すことができる.係り受け木の部分木の表す文は,係り受け関係が保存されるため一般に正しい文となる.本稿では,文書を拡大係り受け木として表し,そのノード,エッジに重みを与える.そして,重要部分抽出問題を「拡大係り受け木の部分木のうち評価値を最大にする木を探索する問題」として定式化し,その最適化問題を解くアルゴリズムを示す.その後,提案手法による要約システムを実装し,作成された要約文を人手による採点と原文との類似度で評価を行った.A Japanese sentence can be expressed as a tree structure (dependency tree) based on dependency relations. Since a subtree of a dependency tree preserves the dependency relations of the original tree, it generally represents a correct sentence on its own. In this paper, a document is expressed as an extended dependency tree, in which weights are assigned to its nodes and edges. Moreover, the problem of extracting important text fragments is formalized as that of "searching for a subtree that maximizes a certain score from subtrees of the extended dependency tree". We implemented such a summarization system and performed evaluations based on manual assessment as well as comparison with original texts.

    CiNii
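
    The optimisation problem described in the abstract above — searching the subtrees of a weighted tree for one that maximises the score — can be illustrated with a toy bottom-up dynamic program over node weights (edge weights omitted for brevity; the tree and weights below are invented, not the paper's data):

```python
def best_subtree(tree, weights, root):
    """Best-scoring subtree hanging from `root`: a child's subtree is
    kept only if its own best score is positive (bottom-up DP)."""
    score = weights[root]
    nodes = [root]
    for child in tree.get(root, []):
        child_score, child_nodes = best_subtree(tree, weights, child)
        if child_score > 0:
            score += child_score
            nodes.extend(child_nodes)
    return score, nodes

# Invented toy dependency tree: node 0 is the root of the document tree.
tree = {0: [1, 2], 1: [3]}
weights = {0: 1, 1: -2, 2: 3, 3: 4}
score, nodes = best_subtree(tree, weights, 0)
print(score, sorted(nodes))  # -> 6 [0, 1, 2, 3]
```

    Node 1 has a negative weight but is retained because its descendant's weight makes the subtree below it worth keeping, which mirrors how a low-importance word can survive extraction when it connects important fragments.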

  • "ベイズ統計を用いた文書ファイルの自動分析手法,"

    後藤正幸, 伊藤潤, 石田崇, 酒井哲也

    経営情報学会2003年度秋季全国研究発表大会予稿集,函館   pp.28-31  2003年

  • 「インターネットを用いた研究活動支援システム」システム構成

    平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

    2001PCカンファレンス    2001年

  • Cross-language情報検索のためのBMIR-J2を用いた一考察

    酒井 哲也, 梶浦 正浩, 住田 一男

    情報処理学会研究報告自然言語処理(NL)   1999 ( 2 ) 41 - 48  1999年01月

     概要を見る

    本論文では,日本語テストコレクションBMIR-J2およびこれを英訳したデータを用い,情報フィルタリングシステムNEATと機械翻訳システムASTRANSACによる日・英間のcross-language情報検索の検索精度評価実験を行う.英語検索要求による日本語文書の検索実験では,文書の翻訳と検索要求の翻訳のアプローチを,さらに異なる翻訳者による検索要求の翻訳を比較する.日本語検索要求による擬似英語文書の検索実験では,検索要求の翻訳の前後にローカルフィードバックを行う.以上により,日本語単言語検索の90%以上の精度を実現する.We study a cross-language IR approach using the NEAT information filtering system and the AS-TRANSAC machine translation system. The BMIR-J2 standard Japanese test collection and our own translated data are used for evaluation. In the English-to-Japanese experiments, we consider both document translation and query translation, and also compare the retrieval performance when the queries are translated by different translators. In the Japanese-to-pseudo-English experiments, we perform local feedback both before and after query translation. We achieve over 90% of Japanese monolingual performance.

    CiNii

  • 日本語情報検索システム評価用テストコレクションBMIR-J2

    木谷強, 小川 泰嗣, 石川 徹也, 木本 晴夫, 中渡瀬 秀一, 芥子 育雄, 豊浦 潤, 福島 俊一, 松井 くにお, 上田 良寛, 酒井 哲也, 徳永 健伸, 鶴岡 弘, 安形 輝

    情報処理学会研究報告データベースシステム(DBS)   1998 ( 2 ) 15 - 22  1998年01月

     概要を見る

    日本語情報検索システム評価用テストコレクションBMIR-J2は、情報処理学会データベースシステム研究会内のワーキンググループによって作成されている。BMIR-J2は1998年3月から配布される予定であるが、これに先立ち、テスト版としてBMIR-J1が1996年3月からモニタ公開された。J1は50箇所のモニタに配布され、多数の研究成果が発表されている。BMIR-J2では、J1に対するモニタユーザからのアンケートの回答と、作成にあたったワーキンググループメンバの経験をもとに、テストコレクションの検索対象テキスト数を大幅に増やし、検索要求と適合性判定基準も見直した。本論文では、BMIR-J2の内容とその作成手順、および今後の課題について述べる。BMIR-J2, a test collection for evaluation of Japanese information retrieval systems to be released in March 1998, has been developed by a working group under the Special Interest Group on Database Systems in Information Processing Society of Japan. Since March 1996, a preliminary version called BMIR-J1 has been distributed to fifty sites and used in many research projects. Based on comments from the BMIR-J1 users and our experience, we have enlarged the collection size and revised search queries and relevance assessments in BMIR-J2. In this paper, we describe BMIR-J2 and its development process, and discuss issues to be considered for improving BMIR-J2 further.

    CiNii

  • 情報フィルタリングシステムNEATのための検索要求文からのプロファイル生成

    酒井 哲也, 梶浦 正浩, 住田 一男

    情報処理学会研究報告情報学基礎(FI)   1997 ( 86 ) 83 - 88  1997年09月

     概要を見る

    我々は、新聞社・雑誌社により日々提供される電子化記事から個々のユーザーの興味に合ったものを選出し電子メイルなどで配信する情報フィルタリングシステムNEATを開発した。NEATは、プロファイルに記述されたブール式、検索語の出現位置・文書内密度・文書内分布などの多様な検索条件ベクトルに基づき、文書に対して加点しランキングを行う。今回、BMIR?J1の自然言語で書かれた検索要求文からプロファイルを自動生成する実験を行い、単純なブール式のプロファイルと人手によるプロファイルの中間程度の性能を達成できることを確認した。初期プロファイルの自動生成とrelevance feedbackの併用により、人手によるプロファイル作成の負荷は大幅に軽減されると考えられる。The NEAT information filtering system selects relevant articles from digital text provided daily by Japanese newspaper companies and publishers, and sends them by e-mail to its users. NEAT calculates a score for each article and produces a ranked output based on various types of query vectors written in the profile, such as location, density and distribution of keywords as well as boolean operators. We show that profiles generated automatically from query sentences can lie halfway between simple boolean profiles and hand-made profiles with respect to retrieval effectiveness. By combining this method and relevance feedback, the burden of manual profile definition will be lightened considerably.

    CiNii

  • ベンチマーク BMIR-J1 を用いた情報フィルタリングシステム NEAT の評価

    酒井 哲也, 梶浦 正浩, 三池 誠司, 佐藤 誠, 住田 一男

    全国大会講演論文集   54   301 - 302  1997年03月

     概要を見る

    我々は, ブール式, 検索語の出現位置, 検索語の文書内密度・分布などの多様な検索条件ベクトルにより文書に対して加点しランキングを行う情報フィルタリングシステム NEAT を開発した. 本稿では, 検索システム評価用ベンチマーク BMIR-J1を用いた, NEAT のブール式および検索語の出現位置情報のみを利用した場合の検索精度の評価について報告する.

    CiNii

  • 情報フィルタリングシステム NEAT の開発

    梶浦 正浩, 三池 誠司, 酒井 哲也, 佐藤 誠, 住田 一男

    全国大会講演論文集   54   299 - 300  1997年03月

     概要を見る

    我々は, 新聞社/雑誌社などから日々提供される文書(記事)よりユーザの要求に合致するものを抽出しユーザに提供する, 実サービス用の情報フィルタリングシステム NEAT (News Extractor with Accurately Tailored profiles) およびシステムの中核であるフィルタリングエンジンを開発した. フィルタリングエンジンは, 2種類の単語検索方法を結合した新しい検索法や多様なフィールドに対応した複数の検索条件ベクトルを用いることによって, 高い再現率/適合率を実現できるよう設計されている. 本稿では, 開発した NEAT およびフィルタリングエンジンの概要について述べ, また, 新しい単語検索法の評価結果を示す.

    CiNii

  • 電子図書館のための効率的な文書検索 : 検索/提示のための文書構造化と抄録生成

    住田 一男, 酒井 哲也, 小野 顕司, 三池 誠司

    ディジタル図書館   3   35 - 41  1995年03月

    CiNii

  • 文書検索システムの動的抄録提示インタフェースの評価

    酒井 哲也, 三池 誠司, 住田 一男

    情報処理学会研究報告ヒューマンコンピュータインタラクション(HCI)   1994 ( 96 ) 49 - 54  1994年11月

     概要を見る

    膨大な文書検索結果の全文を読み所望の文書を選出したり情報を得たりする労力の軽減のために、我々は、検索した文書の抄録を自動生成して提示するインタフェースを開発した。これは、ユーザーによる「詳しく」「簡単に」の指示に応じて、抄録の任意の部分の長さを変更することを可能とする。このインタフェースの評価実験を行った結果、文書の要不要の判定時間は、原文のみを提示した場合に比べ80%程度に短縮されることがわかった。この代償として、平均では判定の質が原文のみを提示した場合の90%程度に低下してしまうが、内容の深い理解を必要としない判定問題においては判定の質を保持することができた。In order to lighten the burden of browsing through numerous retrieved documents to select relevant documents or to obtain useful information, we have developed a user interface for generating and presenting abstracts of the retrieved documents. The interface enables the user to alter the length of any part of the abstract by entering DETAIL or BRIEF commands. Experiments show that, using this interface, the time for judging the relevance of each retrieved document can be reduced to 80% in comparison to using an interface that only presents the full-text to the user. Although the quality of relevance judgment was on average lowered to 90% of that achieved in the case of full-text presentation, it was not affected when deep understanding of the contents was not required.

    CiNii

  • 自動抄録機能をもつ対話的文書検索システム : システムの機能と構成

    住田 一男, 酒井 哲也, 小野 顕司, 伊藤 悦雄, 三池 誠司, 武田 公人

    全国大会講演論文集   48   275 - 276  1994年03月

     概要を見る

    近年、ワークステーションの計算機パワーの増大にともない、全文文書を検索対象とした全文検索システムの実用化が進みつつある。しかし、現在実用化されている全文検索システムでは、検索してきた文書を表示する場合、検索文書のタイトルの一覧を表示するか、原文をそのまま表示するにすぎない。検索は、結果から統計的な情報を作成すること、あるいは検索した文書を読み、理解すること、内容を参考にし、再利用すること等を目的として行われる。このため、検索システムにおいては、検索速度や精度の点だけではなく、検索結果の提示方法も配慮し、効率的な検索を可能にする必要がある。効率的な検索インタフェースの構築を目的として、ディレクトリ構造のような情報の階層構造を利用し、大量情報を可視化する試みがなされている。しかし、情報伝達の中心である言語情報についての扱いがこれまで未検討であった。我々は、効率的な検索を目的として、検索結果の文書から自動的に抄録を生成し提示することを特長とする文書検索システムBREVIDOC(Broadcatching system with an essence viewer for retrieved documents) を試作した。本稿では、試作したシステムの構成ならびに機能を述べる。

    CiNii

  • Learning formal languages from Feasible Teachers

    酒井 哲也, 平澤 茂一, 松嶋 敏泰

    日本経営工学会誌   44 ( 3 ) 245 - 245  1993年08月

    CiNii

▼全件表示

受賞

  • DEIM 2020 Excellent Paper Award (second author)

    2020年  

  • FIT 2020 Excellent Paper Award (first author)

    2020年  

  • CSS 2019 Best Paper Award (fifth author)

    2019年  

  • ACM Distinguished Member

    2018年  

  • 第6回WASEDA e-Teaching Award

    2018年  

  • ACM Recognition of Service Award (SIGIR'17 Co-chair)

    2017年  

  • ACM Senior Member

    2016年  

  • 早稲田大学ティーチングアワード総長賞(2016年度春学期)

    2016年  

  • 早稲田大学ティーチングアワード(2014年度秋学期)

    2015年  

  • CSS 2014 Student Paper Award (third author)

    2014年  

  • MobileHCI 2014 Honorable Mention(second author)

    2014年  

  • ACM SIGIR 2013 best paper shortlisted nominee (first author)

    2013年  

  • AIRS 2012 Best Paper Award (first author)

    2012年  

  • WebDB Forum 2010 Excellent Paper Award and NTT Resonant Award (second author)

    2010年  

  • FIT 2008 Funai Best Paper Award (first author)

    2008年  

  • IEICE ESS 2007 Merit Award

    2007年  

  • IPSJ 2007 Best Paper Award (single author)

    2007年  

  • IPSJ 2006 Yamashita SIG Research Award (single author)

    2006年  

  • IPSJ 2006 Best Paper Award (single author)

    2006年  

  • FIT 2005 Excellent Paper Award (single author)

    2005年  

▼全件表示

共同研究・競争的資金等の研究課題

  • ナゲットに基づくタスク指向対話の自動評価に関する研究

    研究期間:

    2017年04月
    -
    2021年03月
     

     概要を見る

    コンペティション型国際会議NTCIR-14にてShort Text Conversation (STC-3) タスクをスケジュール通りに運営し、早稲田大学酒井研究室を含む12の研究機関から結果を提出してもらうことができた。このタスクは、顧客・ヘルプデスク間の対話の品質を推定するものであり、この技術は将来的に対話システムの応答戦略に応用可能である。タスクの評価方法については情報検索会議の最高峰SIGIRにて発表を行い、データセットに関してはJournal of Information Processingにてまとめた。後者はWebDB Forum 2018にてbest paper runner-upに選出された。

    ・Zeng, Z., Luo, C., Shang, L., Li, H., and Sakai, T.: Towards Automatic Evaluation of Customer-Helpdesk Dialogues, Journal of Information Processing, Volume 26, pp.768-778, 査読あり, 2018. WebDB Forum 2018 Best Paper Runner-up
    ・Sakai, T.: Comparing Two Binned Probability Distributions for Information Access Evaluation, Proceedings of ACM SIGIR 2018, pp.1073-1076, 査読あり, 2018.

    以下のスケジュールに沿ってタスク運営を進めることができた。4月 データのクローリング+アノテーションツールの開発、5-8月 データのアノテーション、9月 学習用データ公開、11月 評価用データ公開・結果提出締切、2月 タスクオーバービュー論文暫定版公開、3月 タスク参加者論文暫定版投稿

    2019年度の計画は以下の通りである。
    ・NTCIR-14にてタスク運営者およびタスク参加者としての研究成果を発表
    ・対話データセットDCH-1の中英翻訳を進め、より広くの対話研究者が使えるようにする
    ・NTCIR-15における対話タスクの設計と提案、推

  • 利用者の状況を考慮する探索的検索の技術

    研究期間:

    2016年04月
    -
    2020年03月
     

     概要を見る

    (1)ユーザの状況の捕捉: ユーザ実験によって収集したインタラクションデータ(検索行動ログ、視線、検索前後のアンケート・コンセプトマップ・インタビューなど)を用い、検索過程における個々のユーザの状況の捕捉を研究した。タスクの認知的複雑さ(Complexity)、背景知識、タスクの自覚する困難さや満足度との関係を解析した。対象は、法律・学術における「調査」、Web、音楽を取り上げた。中国・清華大の協力により、モバイル端末を用いた検索インタラクションとユーザ背景知識、香港大学の協力により音楽検索などの「楽しみのための探索的検索Search for fun」とタスクの認知的複雑性の影響について検討した。さらに新たなインタラクション環境として、美術館博物館の来館者のタブレット端末を用いたデジタル空間での探索行動と実空間での探索閲覧行動の両面を捕捉し、それらを用いてよりよい探索エクスペリエンス提供を目指すサブ課題に着手した。

    (2)ユーザの状況に応じて、ユーザを支援する技術: 異なる認知的複雑さタスクを選定し、検索過程でユーザを支援する技術のプロトタイプを提案した。具体的には (a)クエリマイニングによる代替検索戦略提案: コミュニティQAコーパスの質問・解答構造を利用して、同じ検索意図や目的のための異なる検索戦略をユーザに提示、(b)マルチファセット検索UIのベースラインシステムを構築し、随時、サブカテゴリや検索Trail提示の仕組みを提案した。これらの支援メカニズムが有用な状況やユーザ特性について検討を深める。さらに、情報源のスタンスなどより包括的な視点の有用性も検討した。(c)コンセプトマップが探索に与える影響について検討した。

    (3)検索の基礎技術: 今年度は、自然言語対話、モバイル端末におけるタッチインタラクション、検索実験計画法と評価法について研究をすすめた。いくつかの対象領域に焦点を当て、ユーザの探索行動の捕捉とモデル化のための解析、探索過程で状況に応じて探索の方向性を提案する検索支援について、研究を進めることができた。中国・清華大の協力により、モバイル端末を用いた検索インタラクション、香港大学の協力により音楽検索など、予定よりも豊かな研究対象に取り組むことができた。さらに、今年度後半から、より豊かな探索インタラクションデータを捕捉できる環境として、従来のデジタル空間での探索行動に加え、センサー等によりユーザの物理空間での探索・閲覧行動も捕捉して、両者を連携して、よりよい探索インタラクション経験を提供するメカニズムについても検討を進めていくこととした。

    (1)ユーザの状況の捕捉: 既有のインタラクションデータに加え、新たにユーザ検索実験によりインタラクションデータを収集し、より多面的に解析を行う。

    (2)ユーザの状況に応じて、ユーザを支援する技術: 異なる認知的複雑さタスクを選定し、検索過程でユーザを支援する技術のプロトタイプについて、ユーザを支援するメカニズムが有用である状況、タスクの特性、ユーザの特性との関係をより明らかにできるように検討をすすめる。

    (3)検索の基礎技術: 必要に応じて研究をすすめ、(2)のユニットに適用する

講演・口頭発表等

  • Overview of the NTCIR-16 WeWantWeb with CENTRE (WWW-4) Task

    Tetsuya Sakai, Sijie Tao, Zhumin Chu, Maria Maistro, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, Yiqun Liu

    Proceedings of NTCIR-16  

    発表年月: 2022年

  • SLWWW at the NTCIR-16 WWW-4 Task

    Yuya Ubukata, Masaki Muraoka, Sijie Tao, Tetsuya Sakai

    Proceedings of NTCIR-16  

    発表年月: 2022年

  • RSLDE at the NTCIR-16 DialEval-2 Task

    Fan Li, Tetsuya Sakai

    Proceedings of NTCIR-16  

    発表年月: 2022年

  • Overview of the NTCIR-16 Dialogue Evaluation (DialEval-2) Task

    Sijie Tao, Tetsuya Sakai

    Proceedings of NTCIR-16  

    発表年月: 2022年

  • On Variants of Root Normalised Order-aware Divergence and a Divergence based on Kendall’s Tau

    Tetsuya Sakai

    arXiv:2204.07304  

    発表年月: 2022年

  • A Versatile Framework for Evaluating Ranked Lists in terms of Group Fairness and Relevance

    Tetsuya Sakai, Jin Young Kim, Inho Kang

    arXiv:2204.00280  

    発表年月: 2022年

  • Transformerを用いた文書の自動品質評価

    吉越玲士, 酒井哲也

    DEIM 2022  

    発表年月: 2022年

  • NTCIR-16ウェブ検索・再現可能性タスク (WWW-4) および対話評価タスク (DialEval-2)への誘い

    酒井哲也

    情報処理学会研究報告  

    発表年月: 2021年

  • 対話要約における話者情報を持つEmbeddingの効果

    楢木悠士, 酒井哲也, 林 良彦

    FIT2021講演論文集  

    発表年月: 2021年

  • RealSakaiLab at the TREC 2020 Health Misinformation Track

    Sijie Tao, Tetsuya Sakai

    Proceedings of TREC 2020  

    発表年月: 2021年

  • 話者情報を認識した対話要約

    楢木悠士, 酒井哲也

    言語処理学会第27回年次大会発表論文集  

    発表年月: 2021年

  • Voice Assistantアプリの対話型解析システムの開発

    刀塚敦子, 飯島涼, 渡邉卓弥, 秋山満昭, 酒井哲也, 森達哉

    電子情報通信学会 第54回情報通信システムセキュリティ研究会  

    発表年月: 2021年

  • モバイルアプリケーションにおけるUIデザイン自動評価の検討

    栗林峻, 酒井哲也

    DEIM 2021  

    発表年月: 2021年

  • スタンス検出タスクにおける評価方法の選定

    雨宮佑基, 酒井哲也

    DEIM 2021  

    発表年月: 2021年

  • 日経新聞の記事からの日経ラジオ用読み原稿の自動生成

    清水嶺, 酒井哲也

    DEIM 2021  

    発表年月: 2021年

  • 有用なレビューを抽出するための比較文フィルタリングの検討

    小橋賢介, 雨宮佑基, 酒井哲也

    DEIM 2021  

    発表年月: 2021年

  • Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents?

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng

    情報処理学会研究報告  

    発表年月: 2021年

  • Overview of the TREC 2018 CENTRE Track

    Ian Soboroff, Nicola Ferro, Maria Maistro, Tetsuya Sakai

    Proceedings of TREC 2018  

    発表年月: 2020年

  • Improving Concept Representations for Short Text Classification

    Sijie Tao, Tetsuya Sakai

    言語処理学会第26回年次大会発表論文集  

    発表年月: 2020年

  • Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

    Shiyoh Goetsu, Tetsuya Sakai

    インタラクション 2020  

    発表年月: 2020年

  • 商品比較のための文脈つき評価軸抽出の検討

    小橋賢介, 酒井哲也

    DEIM 2020  

    発表年月: 2020年

  • Androidアプリの権限要求に対する説明十分性の自動確認システムの提案

    小島智樹, 酒井哲也

    DEIM 2020  

    発表年月: 2020年

  • Purchase Prediction based on Recurrent Neural Networks with an Emphasis on Recent User Activities

    Quanyu Piao, Joo-Young Lee, Tetsuya Sakai

    DEIM 2020  

    発表年月: 2020年

  • Experiments on Unsupervised Text Classification based on Graph Neural Networks

    Haoxiang Shi, Cen Wang, Tetsuya Sakai

    DEIM 2020  

    発表年月: 2020年

  • Do Neural Models for Response Generation Fully Exploit the Input Natural Language Text?

    Lingfeng Zhang, Tetsuya Sakai

    DEIM 2020  

    発表年月: 2020年

  • 商品検索におけるゼロマッチ解消のためのデータセット構築の検討

    雨宮佑基, 真鍋知博, 藤田澄男, 酒井哲也

    DEIM 2020  

    発表年月: 2020年

  • 解釈可能な内部表現を使用したタスク指向ニューラル対話システムの試作

    村田憲俊, 酒井哲也

    DEIM 2020  

    発表年月: 2020年

  • Response Generation based on the Big Five Personality Traits

    Wanqi Wu, Tetsuya Sakai

    DEIM 2020  

    発表年月: 2020年

  • Different Types of Voice User Interface Failures May Cause Different Degrees of Frustration

    Shiyoh Goetsu, Tetsuya Sakai

    arXiv  

    発表年月: 2020年

  • selt Team’s Entity Linking System at the NTCIR-15 QALab-PoliInfo2

    Yuji Naraki, Tetsuya Sakai

    Proceedings of NTCIR-15  

    発表年月: 2020年

  • SLWWW at the NTCIR-15WWW-3 Task

    Masaki Muraoka, Zhaohao Zeng, Tetsuya Sakai

    Proceedings of NTCIR-15  

    発表年月: 2020年

  • Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) Task

    Tetsuya Sakai, Sijie Tao, Zhaohao Zeng, Yukun Zheng, Jiaxin Mao, Zhumin Chu, Yiqun Liu, Maria Maistro, Zhicheng Dou, Nicola Ferro, Ian Soboroff

    Proceedings of NTCIR-15  

    発表年月: 2020年

  • RSLNV at the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

    Ting Cao, Fan Zhang, Haoxiang Shi, Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Injae Lee, Kyungduk Kim, Inho Kang

    Proceedings of NTCIR-15  

    発表年月: 2020年

  • SKYMN at the NTCIR-15 DialEval-1 Task

    Junjie Wang, Yuxiang Zhang, Tetsuya Sakai, Hayato Yamana

    Proceedings of NTCIR-15  

    発表年月: 2020年

  • Overview of the NTCIR-15 Dialogue Evaluation (DialEval-1) Task

    Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai, Inho Kang

    Proceedings of NTCIR-15  

    発表年月: 2020年

  • ユーザの感覚に近い多様化検索評価指標

    酒井哲也, Zhaohao Zeng

    FIT2020講演論文集  

    発表年月: 2020年

  • On Fuhr’s Guideline for IR Evaluation

    Tetsuya Sakai

    SIGIR Forum  

    発表年月: 2020年

  • 擬似アノテーションにもとづく日本語ツイートの極性判定

    小橋賢介, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • FigureQAタスクにおける抽象画像を考慮したアプローチ

    坂本凜, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • Convolutional Neural Networkを用いたFake News Challengeの検討

    雨宮佑基, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • 音声ユーザインタフェースにおける処理エラーによるユーザフラストレーションに関する調査

    呉越思瑶, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • Query-Focused Extractive Summarization based on Deep Learning: Comparison of Similarity Measures for Pseudo Ground Truth Generation

    Yuliska, Tetsuya Sakai

    DEIM 2019  

    発表年月: 2019年

  • Exploring Multi-label Classification Using Text Graph Convolutional Networks on the NTCIR-13 MedWeb Dataset

    Sijie Tao, Tetsuya Sakai

    DEIM 2019  

    発表年月: 2019年

  • Androidアプリの権限要求に対するユーザーへの説明の補完

    小島智樹, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • 能動学習を利用した未知語アノテーションの検討

    黒澤瞭佑, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • Dialogue Quality Distribution Prediction based on a Loss that Compares Adjacent Probability Bins

    河東宗祐, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • Twitterコーパスに基づく雑談対話システムにおける多様性の獲得

    村田憲俊, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • 文書分類技術に基づくエントリーシートからの業界推薦

    三王慶太, 酒井哲也

    DEIM 2019  

    発表年月: 2019年

  • Graded Relevance Assessments and Graded Relevance Measures of NTCIR: A Survey of the First Twenty Years

    Tetsuya Sakai

    arXiv:1903.11272  

    発表年月: 2019年

  • RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models

    Chih-hao Wang, Sosuke Kato, Tetsuya Sakai

    arXiv:1905.01799  

    発表年月: 2019年

  • Overview of the NTCIR-14 CENTRE Task

    Tetsuya Sakai, Nicola Ferro, Ian Soboroff, Zhaohao Zeng, Peng Xiao, Maria Maistro

    Proceedings of NTCIR-14  

    発表年月: 2019年

  • Overview of the NTCIR-14 We Want Web Task

    Jiaxin Mao, Tetsuya Sakai, Cheng Luo, Peng Xiao, Yiqun Liu, Zhicheng Dou

    Proceedings of NTCIR-14  

    発表年月: 2019年

  • Overview of the NTCIR-14 Short Text Conversation Task: Dialogue Quality and Nugget Detection Subtasks

    Zhaohao Zeng, Sosuke Kato, Tetsuya Sakai

    Proceedings of NTCIR-14  

    発表年月: 2019年

  • SLSTC at the NTCIR-14 STC-3 Dialogue Quality and Nugget Detection Subtasks

    Sosuke Kato, Rikiya Suzuki, Zhaohao Zeng, Tetsuya Sakai

    Proceedings of NTCIR-14  

    発表年月: 2019年

  • SLWWW at the NTCIR-14 We Want Web Task

    Peng Xiao, Tetsuya Sakai

    Proceedings of NTCIR-14  

    発表年月: 2019年

  • NTCIR-15ウェブ検索・再現可能性タスク (WWW-3) および対話評価タスク (DialEval-1)への誘い

    酒井哲也

    情報処理学会研究報告2019-IFAT-136  

    発表年月: 2019年

  • Overview of the TREC 2018 CENTRE Track

    Ian Soboroff, Nicola Ferro, Maria Maistro, Tetsuya Sakai

    Proceedings of TREC 2018  

    発表年月: 2019年

  • クリックと放棄に基づくモバイルバーティカルの順位付け

    川崎 真未, Inho Kang, 酒井哲也

    DEIM 2018  

    発表年月: 2018年

  • Generative Adversarial Nets を用いた文書分類の検証

    小島智樹, 酒井哲也

    DEIM 2018  

    発表年月: 2018年

  • 単語レベルと文字レベルの情報を用いた日本語対話システムの試作

    村田憲俊, 酒井哲也

    DEIM 2018  

    発表年月: 2018年

  • Classifying Community QA Questions That Contain an Image

    Kenta Tamaki, Riku Togashi, Sumio Fujita, Hideyuki Maeda, Tetsuya Sakai

    DEIM 2018  

    発表年月: 2018年

  • ユーザーのニーズに合わせたインタラクティブな推薦システムの提案

    呉越思瑶, 酒井哲也

    DEIM 2018  

    発表年月: 2018年

  • Report on NTCIR-13: The Thirteenth Round of NII Testbeds and Community for Information Access Research

    Yiqun Liu, Makoto P. Kato, Charles L.A. Clarke, Noriko Kando, Tetsuya Sakai

    SIGIR Forum 52(1) 2018  

    発表年月: 2018年

  • A Comparative Study of Deep Learning Approaches for Visual Question Classification in Community QA

    Hsin-Wen Liu, Avikalp Srivastava, Sumio Fujita, Toru Shimizu, Riku Togashi, Tetsuya Sakai

    IPSJ SIG Technical Report 2018-IFAT-132 (17)  

    発表年月: 2018年

  • 対話破綻検出コーパスに対する学習データ選別の検討

    河東宗祐, 酒井哲也

    情報処理学会研究報告 2018-IFAT-132 (28)  

    発表年月: 2018年

  • 色・形状・テクスチャに基づく画像検索の自動評価と多様化

    富樫陸, 藤田澄男, 酒井哲也

    情報処理学会研究報告 2018-IFAT-132 (12)  

    発表年月: 2018年

  • Androidアプリのレビューを用いたユーザーへの権限説明の補完

    小島智樹, 酒井哲也

    情報処理学会研究報告  

    発表年月: 2018年

  • 評価実験の設計と論文での結果報告: きちんとやっていますか?

    酒井 哲也

    第3回自然言語処理シンポジウム  

    発表年月: 2017年

  • Report on NTCIR-12: The Twelfth Round of NII Testbeds and Community for Information Access Research

    Makoto P. Kato, Kazuaki Kishida, Noriko Kando, Tetsuya Sakai, Mark Sanderson

    SIGIR Forum 50 (2)  

    発表年月: 2017年

  • ツイートにおける周辺単語の感情極性値を用いた新語の感情推定

    黒澤 瞭佑, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • 解答検証を利用した選択式問題への自動解答

    佐藤 航, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • 英日言語横断検索におけるクエリ拡張結果の詳細分析

    玉置 賢太, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • アノテーション分布を考慮した対話破綻検出

    河東 宗祐, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • 拡張クエリを用いたレシピ検索のパーソナライゼーション

    犬塚 眞太郎, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • クリックに基づく選好グラフを用いたバーティカル適合性推定

    門田見 侑大, 吉田 泰明, 藤田澄男, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • 複数人で睡眠習慣改善に臨む際の人間関係と協調の効果

    飯島 聡美, 酒井 哲也

    DEIM 2017  

    発表年月: 2017年

  • Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues

    Zhaohao Zeng, Cheng Luo, Lifeng Shang, Hang Li, Tetsuya Sakai

    情報処理学会研究報告 2017-NL-232  

    発表年月: 2017年

  • Ranking Rich Mobile Verticals based on Clicks and Abandonment

    Mami Kawasaki, Inho Kang, Tetsuya Sakai

    情報処理学会研究報告 2017-IFAT-127  

    発表年月: 2017年

  • Overview of the NTCIR-13 Short Text Conversation Task

    Lifeng Shang, Tetsuya Sakai, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao, Yuki Arase, Masako Nomoto

    Proceedings of NTCIR-13  

    発表年月: 2017年

  • Overview of the NTCIR-13 We Want Web Task

    Cheng Luo, Tetsuya Sakai, Yiqun Liu, Zhicheng Dou, Chenyan Xiong, Jingfang Xu

    Proceedings of NTCIR-13  

    発表年月: 2017年

  • SLOLQ at the NTCIR-13 OpenLiveQ Task

    Ryo Kashimura, Tetsuya Sakai

    Proceedings of NTCIR-13  

    発表年月: 2017年

  • SLQAL at the NTCIR-13 QA Lab-3 Task

    Kou Sato, Tetsuya Sakai

    Proceedings of NTCIR-13  

    発表年月: 2017年

  • SLSTC at the NTCIR-13 STC Task

    Jun Guan, Tetsuya Sakai

    Proceedings of NTCIR-13  

    発表年月: 2017年

  • SLWWW at the NTCIR-13 WWW Task

    Peng Xiao, Lingtao Li, Yimeng Fan, Tetsuya Sakai

    Proceedings of NTCIR-13  

    発表年月: 2017年

  • Project Next IR -情報検索の失敗分析‐

    難波英嗣, 酒井哲也, 神門典子

    情報処理  

    発表年月: 2016年

  • 発話者を考慮した学習に基づく対話システムの検討

    河東宗祐, 酒井哲也

    DEIM 2016  

    発表年月: 2016年

  • ショッピングサイトにおける購入予測のための行動パターン分析

    出縄弘人, Young-In Song, 酒井哲也

    DEIM 2016  

    発表年月: 2016年

  • コンテキスト付き検索ログを用いた要求ヴァーティカルの分析

    門田見侑大, 吉田泰明, 藤田澄男, 酒井哲也

    DEIM 2016  

    発表年月: 2016年

  • 言語の分散表現と擬似適合性フィードバックを用いた英日言語横断検索

    玉置賢太, 林佑明, 酒井哲也

    DEIM 2016  

    発表年月: 2016年

  • 協調型ヘルスケア -規則正しい睡眠による日中の生産性向上

    飯島聡美, 酒井哲也

    DEIM 2016  

    発表年月: 2016年

  • Overview of the NTCIR-12 Short Text Conversation Task

    Lifeng Shang, Tetsuya Sakai, Zhengdong Lu, Hang Li, Ryuichiro Higashinaka, Yusuke Miyao

    NTCIR-12  

    発表年月: 2016年

  • Overview of the NTCIR-12 MobileClick Task

    Makoto P. Kato, Tetsuya Sakai, Takehiro Yamamoto, Virgil Pavlu, Hajime Morita, Sumio Fujita

    NTCIR-12  

    発表年月: 2016年

  • NEXTI at NTCIR-12 IMine-2 Task

    Hidetsugu Nanba, Tetsuya Sakai, Noriko Kando, Atsushi Keyaki, Koji Eguchi, Kenji Hatano, Toshiyuki Shimizu, Yu Hirate, Atsushi Fujii

    NTCIR-12  

    発表年月: 2016年

  • SLQAL at the NTCIR-12 QALab-2 Task

    Shin Higuchi, Tetsuya Sakai

    NTCIR-12  

    発表年月: 2016年

  • SLSTC at the NTCIR-12 STC Task

    Hiroto Denawa, Tomoaki Sano, Yuta Kadotami, Sosuke Kato, Tetsuya Sakai

    NTCIR-12  

    発表年月: 2016年

  • SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask

    Satomi Iijima, Tetsuya Sakai

    NTCIR-12  

    発表年月: 2016年

  • Evaluating Helpdesk Dialogues: Initial Considerations from An Information Access Perspective

    Tetsuya Sakai, Zhaohao Zeng, Cheng Luo

    情報処理学会研究報告  

    発表年月: 2016年

  • word2vecによる発話ベクトルの類似度を用いた対話破綻予測

    河東宗祐, 酒井 哲也

    人工知能学会 音声・言語理解と対話処理研究会(SLUD)第78回研究会 (第7回対話システムシンポジウム),  

    発表年月: 2016年

  • TREC 2014 Temporal Summarization Track Overview

    Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Richard McCreadie, Tetsuya Sakai

    TREC 2014  

    発表年月: 2015年

  • 言語の分散表現による文脈情報を利用した言語横断情報検索

    林佑明, 酒井哲也

    DEIM Forum 2015  

    発表年月: 2015年

  • 情報検索のエラー分析

    難波英嗣, 酒井哲也

    言語処理学会第21回年次大会ワークショップ  

    発表年月: 2015年

  • Topic Set Size Design with the Evaluation Measures for Short Text Conversation

    Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2015年

  • ECol 2015: First International Workshop on the Evaluation of Collaborative Information Seeking and Retrieval

    Leif Azzopardi, Jeremy Pickens, Tetsuya Sakai, Laure Soulier, Lynda Tamine

    ACM CIKM 2015  

    発表年月: 2015年

  • TREC 2013 Temporal Summarization

    Javed Aslam, Fernando Diaz, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai

    TREC 2013  

    発表年月: 2014年

  • 映像入力デバイスを悪用する Android アプリの解析と対策法

    渡邉卓弥, 森達哉, 酒井哲也

    信学技報  

    発表年月: 2014年

  • Androidアプリの説明文とプライバシー情報アクセスの相関分析

    渡邉卓弥, 秋山満昭, 酒井哲也, 鷲崎弘宜, 森達哉

    マルウェア対策研究人材育成ワークショップ 2014  

    発表年月: 2014年

  • Overview of the NTCIR-11 MobileClick Task

    Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    NTCIR-11  

    発表年月: 2014年

  • A Preview of the NTCIR-10 INTENT-2 Results

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, Mayu Iwata

    情報処理学会研究報告  

    発表年月: 2013年

  • Overview of the NTCIR-10 INTENT-2 Task

    Tetsuya Sakai, Zhicheng Dou, Takehiro Yamamoto, Yiqun Liu, Min Zhang, Ruihua Song, Makoto P. Kato, Mayu Iwata

    NTCIR-10  

    発表年月: 2013年

  • Overview of the NTCIR-10 1CLICK-2 Task

    Makoto P. Kato, Matthew Ekstrand-Abueg, Virgil Pavlu, Tetsuya Sakai, Takehiro Yamamoto, Mayu Iwata

    NTCIR-10  

    発表年月: 2013年

  • Microsoft Research Asia at the NTCIR-10 Intent Task

    Kosetsu Tsukuda, Zhicheng Dou, Tetsuya Sakai

    NTCIR-10  

    発表年月: 2013年

  • MSRA at NTCIR-10 1CLICK-2

    Kazuya Narita, Tetsuya Sakai, Zhicheng Dou, Young-In Song

    NTCIR-10  

    発表年月: 2013年

  • How Intuitive Are Diversified Search Metrics? Concordance Test Results for the Diversity U-measures

    Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2013年

  • モバイル「情報」検索に向けて: NTCIR-11 MobileClickタスクへの誘い

    加藤誠, Matthew Ekstrand-Abueg, Virgil Pavlu, 酒井哲也, 山本岳洋, 岩田麻佑

    人工知能学会第5回インタラクティブ情報アクセスと可視化マイニング研究会  

    発表年月: 2013年

  • 曖昧なクエリと(不)明快なクエリ:NTCIR-10 INTENT-2と1CLICK-2タスクへの誘い

    酒井哲也

    情報処理学会研究報告  

    発表年月: 2012年

  • NTCIR-9総括と今後の展望

    酒井哲也, 上保秀夫, 神門典子, 加藤恒昭, 相澤彰子, 秋葉友良, 後藤功雄, 木村文則, 三田村照子, 西崎博光, 嶋秀樹, 吉岡真治, Shlomo Geva, Ling-Xiang Tang, Andrew Trotman, Yue Xu

    情報処理学会研究報告  

    発表年月: 2012年

  • Frontiers, Challenges, and Opportunities for Information Retrieval: Report from SWIRL 2012 The Second Strategic Workshop on Information Retrieval in Lorne

    Allan, J., Aslam, J., Azzopardi, L., Belkin, N., Borlund, P., Bruza, P., Callan, J., Carman, M., Clarke, C.L.A., Craswell, N., Croft, W.B., Culpepper, J.S., Diaz, F., Dumais, S., Ferro, N., Geva, S., Gonzalo, J., Hawking, D., Jarvelin, K., Jones, G., Jones, R., Kamps, J., Kando, N., Kanoulas, E., Karlgren, J., Kelly, D., Lease, M., Lin, J., Mizzaro, S., Moffat, A., Murdock, V., Oard, D.W., de Rijke, M., Sakai, T., Sanderson, M., Scholer, F., Si, L., Thom, J.A., Thomas, P., Trotman, A., Turpin, A.

    SIGIR Forum  

    発表年月: 2012年

  • The Reusability of a Diversified Search Test Collection

    Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2012年

  • One Click One Revisited: Enhancing Evaluation based on Information Units

    Tetsuya Sakai, Makoto P. Kato

    情報処理学会研究報告  

    発表年月: 2012年

  • 複数判定者によるコミュニティQAの良質回答の判定

    石川大介, 酒井哲也, 関洋平, 栗山和子, 神門典子

    情報知識学会誌  

    発表年月: 2011年

  • Japanese Hyponymy Extraction based on a Term Similarity Graph

    Takuya Akiba, Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2011年

  • Overview of NTCIR-9

    Tetsuya Sakai, Hideo Joho

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • Overview of the NTCIR-9 INTENT Task

    Ruihua Song, Min Zhang, Tetsuya Sakai, Makoto P. Kato, Yiqun Liu, Miho Sugimoto, Qinglei Wang, Naoki Orii

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • Overview of NTCIR-9 1CLICK

    Tetsuya Sakai, Makoto P. Kato, Young-In Song

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • Microsoft Research Asia at the NTCIR-9 1CLICK Task

    Naoki Orii, Young-In Song, Tetsuya Sakai

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • Microsoft Research Asia at the NTCIR-9 Intent Task

    Jialong Han, Qinglei Wang, Naoki Orii, Zhicheng Dou, Tetsuya Sakai, Ruihua Song

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • TTOKU Summarization Based Systems at NTCIR-9 1CLICK Task

    Hajime Morita, Takuya Makino, Tetsuya Sakai, Hiroya Takamura, Manabu Okumura

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • Grid-based Interaction for NTCIR-9 VisEx Task

    Hideo Joho, Tetsuya Sakai

    NTCIR-9 Proceedings  

    発表年月: 2011年

  • NTCIR-9 VisEx におけるグリッド型インタラクションモデルの研究

    上保秀夫, 酒井哲也

    人工知能学会情報編纂研究会第7回研究会  

    発表年月: 2011年

  • Q&Aサイトにおけるベストアンサー推定の分析とその機械学習への応用

    石川大介, 栗山和子, 酒井哲也, 関洋平, 神門典子

    情報知識学会年次大会予稿  

    発表年月: 2010年

  • Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access

    Teruko Mitamura, Hideki Shima, Tetsuya Sakai, Noriko Kando, Tatsunori Mori, Koichi Takeda, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Cheng-Wei Lee

    NTCIR-8 Proceedings  

    発表年月: 2010年

  • Overview of NTCIR-8 ACLIA IR4QA

    Tetsuya Sakai, Hideki Shima, Noriko Kando, Ruihua Song, Chuan-Jie Lin, Teruko Mitamura, Miho Sugimoto, Cheng-Wei Lee

    NTCIR-8 Proceedings  

    発表年月: 2010年

  • NTCIR-GeoTime Overview: Evaluating Geographic and Temporal Search

    Fredric Gey, Ray Larson, Noriko Kando, Jorge Machado, Tetsuya Sakai

    NTCIR-8 Proceedings  

    発表年月: 2010年

  • Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task

    Daisuke Ishikawa, Tetsuya Sakai, Noriko Kando

    NTCIR-8 Proceedings  

    発表年月: 2010年

  • Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation

    Tetsuya Sakai, Daisuke Ishikawa, Noriko Kando

    NTCIR-8 Proceedings  

    発表年月: 2010年

  • Microsoft Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task

    Young-In Song, Jing Liu, Tetsuya Sakai, Xin-Jing Wang, Guwen Feng, Yunbo Cao, Hisami Suzuki, Chin-Yew Lin

    NTCIR-8 Proceedings  

    発表年月: 2010年

  • Multilinguality at NTCIR, and moving on... (invited talk)

    Tetsuya Sakai  [招待有り]

    Proceedings of the COLING 2010 Fourth Workshop on Cross Lingual Information Access  

    発表年月: 2010年

  • EVIA 2010: The Third International Workshop on Evaluating Information Access

    William Webber, Tetsuya Sakai, Mark Sanderson

    ACM SIGIR Forum  

    発表年月: 2010年

  • ウィキペディアを活用した探検型検索サイトのクエリログ分析

    酒井哲也, 野上謙一

    情報処理学会研究報告  

    発表年月: 2009年

  • NTCIR-7 ACLIA IR4QA Results based on Qrels Version 2

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, Teruko Mitamura

    NTCIR-7 Online Proceedings  

    発表年月: 2009年

  • EVIA 2008: The Second International Workshop on Evaluating Information Access

    Tetsuya Sakai, Mark Sanderson, Noriko Kando

    ACM SIGIR Forum  

    発表年月: 2009年

  • Revisiting NTCIR ACLIA IR4QA with Additional Relevance Assessments

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Ruihua Song, Hideki Shima, Teruko Mitamura

    情報処理学会研究報告  

    発表年月: 2009年

  • Report on the SIGIR 2009 Workshop on the Future of IR Evaluation

    Jaap Kamps, Shlomo Geva, Carol Peters, Tetsuya Sakai, Andrew Trotman, Ellen Voorhees

    ACM SIGIR Forum  

    発表年月: 2009年

  • チュートリアル 情報検索テストコレクションと評価指標

    酒井哲也

    情報処理学会研究報告  

    発表年月: 2008年

  • Comparing Metrics across TREC and NTCIR: The Robustness to System Bias

    Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2008年

  • Breaking News from NTCIR-7 (in Japanese)

    酒井 哲也, 加藤 恒昭, 藤井 敦, 難波 英嗣, 関 洋平, 三田村照子, 神門典子

    ディジタル図書館編集委員会  

    発表年月: 2008年

  • Are Popular Documents More Likely To Be Relevant? A Dive into the ACLIA IR4QA Pools

    Tetsuya Sakai, Noriko Kando

    Proceedings of the Second International Workshop on Evaluating Information Access (EVIA 2008)  

    発表年月: 2008年

  • Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access

    Teruko Mitamura, Eric Nyberg, Hideki Shima, Tsuneaki Kato, Tatsunori Mori, Chin-Yew Lin, Ruihua Song, Chuan-Jie Lin, Tetsuya Sakai, Donghong Ji, Noriko Kando

    NTCIR-7 Proceedings  

    発表年月: 2008年

  • Overview of the NTCIR-7 ACLIA IR4QA Task

    Tetsuya Sakai, Noriko Kando, Chuan-Jie Lin, Teruko Mitamura, Hideki Shima, Donghong Ji, Kuang-Hua Chen, Eric Nyberg

    NTCIR-7 Proceedings  

    発表年月: 2008年

  • 効率的な番組視聴を支援するための話題ラベルの生成とその評価

    小山誠, 酒井哲也, 福井美佳, 上原龍也, 下森大志

    情報処理学会研究報告  

    発表年月: 2007年

  • Toshiba BRIDJE at NTCIR-6 CLIR: The Head/Lead Method and Graded Relevance Feedback

    Tetsuya Sakai, Makoto Koyama, Tatsuya Izuha, Akira Kumano, Toshihiko Manabe, Tomoharu Kokubu

    NTCIR-6 Proceedings  

    発表年月: 2007年

  • A Further Note on Alternatives to Bpref

    Tetsuya Sakai, Noriko Kando

    情報処理学会研究報告  

    発表年月: 2007年

  • EVIA 2007: The First International Workshop on Evaluating Information Access

    Mark Sanderson, Tetsuya Sakai, Noriko Kando

    ACM SIGIR Forum  

    発表年月: 2007年

  • Controlling the Penalty on Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

    Tetsuya Sakai

    IPSJ SIG Technical Report  

    発表年月: 2006年

  • 質問応答型検索の音声認識誤りに対するロバスト性向上

    筒井 秀樹, 真鍋俊彦, 福井 美佳, 藤井 寛子, 浦田 耕二, 酒井哲也

    情報処理学会研究報告  

    発表年月: 2005年

  • 文書分類技法とそのアンケート分析への応用

    平澤茂一, 石田崇, 足立鉱史, 後藤正幸, 酒井哲也

    経営情報学会2005年度春季全国研究発表大会  

    発表年月: 2005年

  • インターネットを用いた研究支援環境~情報検索システム~

    石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

    経営情報学会2005年度春季全国研究発表大会  

    発表年月: 2005年

  • 質問応答システムの正解順位とユーザ満足率の関係について

    國分智晴, 酒井哲也, 齋藤 佳美, 筒井 秀樹, 真鍋俊彦, 藤井寛子

    情報処理学会研究報告  

    発表年月: 2005年

  • 教学支援システムに関する学生アンケートの分析

    渡辺智幸, 後藤正幸, 石田崇, 酒井哲也, 平澤茂一

    FIT 2005 一般講演論文集  

    発表年月: 2005年

  • The Effect of Topic Sampling in Sensitivity Comparisons of Information Retrieval Metrics

    Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2005年

  • Toshiba BRIDJE at NTCIR-5: Evaluation using Geometric Means

    Tetsuya Sakai, Toshihiko Manabe, Akira Kumano, Makoto Koyama, Tomoharu Kokubu

    NTCIR-5 Proceedings  

    発表年月: 2005年

  • 質問応答技術に基づくマルチモーダルヘルプシステム

    浦田 耕二, 福井美佳, 藤井寛子, 鈴木優, 酒井哲也, 齋藤佳美, 市村 由美, 佐々木寛

    情報処理学会研究報告  

    発表年月: 2004年

  • 質問応答と,日本語固有表現抽出および固有表現体系の関係についての考察

    市村由美, 齋藤佳美, 酒井哲也, 國分智晴, 小山誠

    情報処理学会研究報告  

    発表年月: 2004年

  • Toshiba BRIDJE at NTCIR-4 CLIR: Monolingual/Bilingual IR and Flexible Feedback

    Tetsuya Sakai, Makoto Koyama, Akira Kumano, Toshihiko Manabe

    NTCIR-4 Proceedings  

    発表年月: 2004年

  • Toshiba ASKMi at NTCIR-4 QAC2

    Tetsuya Sakai, Yoshimi Saito, Yumi Ichimura, Makoto Koyama, Tomoharu Kokubu

    NTCIR-4 Proceedings  

    発表年月: 2004年

  • 自然言語表現に基づく学生アンケート分析システム

    酒井哲也, 石田崇, 後藤正幸, 平澤茂一

    FIT 2004 一般講演論文集 N-021  

    発表年月: 2004年

  • 新聞記事からの用語定義の抽出と固有表現クラスに基づく分類

    小山誠, 酒井哲也, 真鍋俊彦

    情報処理学会研究報告  

    発表年月: 2004年

  • High-Precision Search via Question Abstraction for Japanese Question Answering

    Tetsuya Sakai, Yoshimi Saito, Tomoharu Kokubu, Makoto Koyama, Toshihiko Manabe

    情報処理学会研究報告  

    発表年月: 2004年

  • 情報検索技術を用いた選択式・自由記述式の学生アンケート解析

    石田崇, 足立鉱史, 後藤正幸, 酒井哲也, 平澤茂一

    経営情報学会2004年度秋季全国研究発表大会  

    発表年月: 2004年

  • A Note on the Reliability of Japanese Question Answering Evaluation

    Tetsuya Sakai

    情報処理学会研究報告  

    発表年月: 2004年

  • 情報検索技術を用いた効率的な授業アンケートの分析

    酒井哲也, 伊藤潤, 後藤正幸, 石田崇, 平澤茂一

    経営情報学会2003年度春季全国研究発表大会  

    発表年月: 2003年

  • 選択式・記述式アンケートからの知識発見

    後藤正幸, 酒井哲也, 伊藤潤, 石田崇, 平澤茂一

    2003 PCカンファレンス  

    発表年月: 2003年

  • 授業に関する選択式・記述式アンケートの分析

    平澤茂一, 石田崇, 伊藤潤, 後藤正幸, 酒井哲也

    私立大学情報教育協会平成15年度大学情報化全国大会  

    発表年月: 2003年

  • PLSIを利用した文書からの知識発見

    伊藤潤, 石田崇, 後藤正幸, 酒井哲也, 平澤茂一

    FIT 2003 一般講演論文集  

    発表年月: 2003年

  • 質問応答システムにおけるパッセージ検索の評価

    國分智晴, 酒井哲也

    FIT 2003 一般講演論文集  

    発表年月: 2003年

  • Toshiba KIDS at NTCIR-3: Japanese and English-Japanese IR

    Tetsuya Sakai, Makoto Koyama, Mika Suzuki, Toshihiko Manabe

    NTCIR-3 Proceedings  

    発表年月: 2003年

  • ベイズ統計を用いた文書ファイルの自動分析手法

    後藤正幸, 伊藤潤, 石田崇, 酒井哲也, 平澤茂一

    経営情報学会2003年度秋季全国研究発表大会  

    発表年月: 2003年

  • 授業モデルとその検証

    石田崇, 伊藤潤, 後藤正幸, 酒井哲也, 平澤茂一

    経営情報学会2003年度秋季全国研究発表大会  

    発表年月: 2003年

  • 係り受け木を用いた日本語文書の重要部分抽出

    伊藤潤, 酒井哲也, 平澤茂一

    情報処理学会研究報告  

    発表年月: 2003年

  • Flexible Pseudo-Relevance Feedback for NTCIR-2

    Tetsuya Sakai, Stephen E. Robertson, Stephen Walker

    NTCIR-2  

    発表年月: 2001年

  • Generic Summaries for Indexing in Information Retrieval - Detailed Test Results

    Tetsuya Sakai, Karen Sparck Jones

    Computer Laboratory, University of Cambridge  

    発表年月: 2001年

  • インターネットを用いた研究活動支援システム

    平澤茂一, 松嶋敏泰, 鴻巣敏之, 酒井哲也, 中澤真, 李相協, 野村亮

    2001 PCカンファレンス  

    発表年月: 2001年

  • Cross-language情報検索のためのBMIR-J2を用いた一考察

    酒井 哲也, 梶浦 正浩, 住田 一男

    情報処理学会研究報告  

    発表年月: 1999年

  • Probabilistic Retrieval of Japanese News Articles for IREX at Toshiba

    Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

    IREX Workshop  

    発表年月: 1999年

  • Cross-Language Information Retrieval for NTCIR at Toshiba

    Tetsuya Sakai, Yasuyo Shibazaki, Masaru Suzuki, Masaharu Kajiura, Toshihiko Manabe, Kazuo Sumita

    NTCIR-1  

    発表年月: 1999年

  • BMIR-J2: A Test Collection for Evaluation of Japanese Information Retrieval Systems

    Tetsuya Sakai, Tsuyoshi Kitani, Yasushi Ogawa, Tetsuya Ishikawa, Haruo Kimoto, Ikuro Keshi, Jun Toyoura, Toshikazu Fukushima, Kunio Matsui, Yoshihiro Ueda, Takenobu Tokunaga, Hiroshi Tsuruoka, Hidekazu Nakawatase, Teru Agata, Noriko Kando

    ACM SIGIR Forum  

    発表年月: 1999年

  • First Experiments on the BMIR-J2 Collection using the NEAT System

    Gareth Jones, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita

    情報処理学会研究報告  

    発表年月: 1998年

  • Cross-Language Information Access: a case study for English and Japanese

    Gareth Jones, Nigel Collier, Tetsuya Sakai, Masaharu Kajiura, Kazuo Sumita, Hideki Hirakawa

    情報処理学会研究報告  

    発表年月: 1998年

  • 日本語情報検索システム評価用テストコレクションBMIR-J2

    木谷強, 小川泰嗣, 石川徹也, 木本晴夫, 中渡瀬秀一, 芥子育雄, 豊浦潤, 福島俊一, 松井くにお, 上田良寛, 酒井哲也, 徳永健伸, 鶴岡弘, 安形輝

    情報処理学会研究報告  

    発表年月: 1997年

  • 情報フィルタリングシステムNEATの開発

    梶浦正浩, 三池誠司, 酒井哲也, 佐藤誠, 住田一男

    第54回情報処理学会全国大会  

    発表年月: 1997年

  • ベンチマークBMIR-J2を用いた情報フィルタリングシステムNEATの評価

    酒井哲也, 梶浦正浩, 三池誠司, 佐藤誠, 住田一男

    第54回情報処理学会全国大会  

    発表年月: 1997年

  • 情報フィルタリングシステムNEATのための検索要求文からのプロファイル生成

    酒井 哲也, 梶浦 正浩, 住田 一男

    情報処理学会研究報告  

    発表年月: 1997年

  • 電子図書館のための効率的な文書検索

    住田一男, 酒井哲也, 小野顕司, 三池誠司

    ディジタル図書館 No.3  

    発表年月: 1995年

  • 文書検索システムの動的抄録提示インタフェースの評価

    酒井 哲也, 三池 誠司, 住田 一男

    情報処理学会研究報告ヒューマンコンピュータインタラクション  

    発表年月: 1994年


特定課題研究

  • ベイズ統計に基づく情報アクセス評価体系の構築

    2017年  


    I published the following full paper at SIGIR 2017, the top conference in information retrieval. The following is the abstract: Using classical statistical significance tests, researchers can only discuss P(D+|H), the probability of observing the data D at hand or something more extreme, under the assumption that the hypothesis H is true (i.e., the p-value). But what we usually want is P(H|D), the probability that a hypothesis is true, given the data. If we use Bayesian statistics with state-of-the-art Markov Chain Monte Carlo (MCMC) methods for obtaining posterior distributions, this is no longer a problem. That is, instead of the classical p-values and 95% confidence intervals, which are often misinterpreted respectively as "probability that the hypothesis is (in)correct" and "probability that the true parameter value drops within the interval is 95%," we can easily obtain P(H|D) and credible intervals which represent exactly the above. Moreover, with Bayesian tests, we can easily handle virtually any hypothesis, not just "equality of means," and obtain an Expected A Posteriori (EAP) value of any statistic that we are interested in. We provide simple tools to encourage the IR community to take up paired and unpaired Bayesian tests for comparing two systems. Using a variety of TREC and NTCIR data, we compare P(H|D) with p-values, credible intervals with confidence intervals, and Bayesian EAP effect sizes with classical ones. Our results show that (a) p-values and confidence intervals can respectively be regarded as approximations of what we really want, namely, P(H|D) and credible intervals; and (b) sample effect sizes from classical significance tests can differ considerably from the Bayesian EAP effect sizes, which suggests that the former can be poor estimates of population effect sizes.
    For both paired and unpaired tests, we propose that the IR community report the EAP, the credible interval, and the probability of the hypothesis being true, not only for the raw difference in means but also for the effect size in terms of Glass's delta.
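
    The abstract above argues for reporting P(H|D), a credible interval, and an EAP estimate instead of a p-value. As a minimal illustration only (not the paper's actual MCMC implementation), the sketch below assumes a normal model with a vague prior for paired per-topic score differences, so the posterior of the mean difference is approximately Normal(sample mean, s/√n); the function name and the score lists are hypothetical.

    ```python
    import random
    import statistics

    def paired_bayes_test(scores_a, scores_b, draws=20000, seed=0):
        """Monte Carlo posterior of the mean per-topic score difference.

        Simplified stand-in for full MCMC: assumes a normal model with a
        vague (flat) prior, so the posterior of the mean difference is
        approximately Normal(sample mean, s / sqrt(n)).
        Returns (P(mean diff > 0 | data), 95% credible interval, EAP).
        """
        rng = random.Random(seed)
        diffs = [a - b for a, b in zip(scores_a, scores_b)]
        n = len(diffs)
        mean = statistics.fmean(diffs)
        sd = statistics.stdev(diffs)
        # Draw from the approximate posterior of the mean difference.
        samples = sorted(rng.gauss(mean, sd / n ** 0.5) for _ in range(draws))
        p_hypothesis = sum(s > 0 for s in samples) / draws  # P(H|D) for H: mean diff > 0
        credible = (samples[int(0.025 * draws)], samples[int(0.975 * draws)])
        eap = statistics.fmean(samples)  # Expected A Posteriori estimate
        return p_hypothesis, credible, eap

    # Hypothetical per-topic scores for two systems on ten topics:
    a = [0.45, 0.52, 0.61, 0.38, 0.70, 0.55, 0.48, 0.66, 0.59, 0.41]
    b = [0.40, 0.50, 0.55, 0.35, 0.62, 0.51, 0.47, 0.60, 0.57, 0.39]
    p, (lo, hi), eap = paired_bayes_test(a, b)
    print(f"P(H|D)={p:.3f}, 95% credible interval=({lo:.3f}, {hi:.3f}), EAP={eap:.3f}")
    ```

    Unlike a p-value, the three reported quantities answer the question practitioners actually ask: how probable is the hypothesis, and how large is the difference likely to be.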

  • 統計的手法を用いた情報検索テストコレクション横断評価および情報検索論文の評価

    2016年  


    I published five international conference papers (SIGIR, SIGIR, SIGIR (short), ICTIR, AIRS), two international workshop papers (EVIA, EVIA), and a workshop report (SIGIR Forum). Moreover, I gave a tutorial at an international conference (ICTIR) and a keynote at a Japanese symposium (IPSJ SIGNL) on this topic.

  • 「寡黙なユーザ」のための情報検索技術に関する研究

    2015年  


    We published one international journal paper, one international conference paper, one evaluation conference overview (TREC), and two unrefereed domestic papers.

  • 情報アクセス評価基盤の体系化および評価

    2015年  


    We published one book, one international journal paper, one international conference paper, one domestic IPSJ workshop paper and organised an international workshop.

  • テストコレクションのサンプルサイズ設計に関する研究

    2014年  


    We published three refereed papers (two for international conferences and one for a domestic conference) on how to determine the topic set size of a test collection.

  • 最小限のインタラクションを介した情報アクセスに関する研究

    2014年   Koji Yatani, Makoto P. Kato, Takehiro Yamamoto, Virgil Pavlu, Javed Aslam, Fernando Diaz


    We collaborated with various researchers from outside Waseda and published several papers related to information access via minimal interactions. We ran a task called MobileClick at NTCIR and a track called Temporal Summarization at TREC. It is worth noting that our MobileHCI paper (collaboration with the University of Tokyo) received an Honourable Mention Award.

  • サーチエンジン評価指標の体系化と有効性実証

    2014年  


    We published three refereed papers (two for international conferences and one for a domestic conference) on how to determine the topic set size of a test collection.


 

現在担当している科目
