Updated on 2024/04/23

 
KAWAHARA, Daisuke
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor

Research Experience

  • 2021.04 - Now

    National Institute of Informatics   Visiting Professor

  • 2020.04 - Now

    Waseda University   Faculty of Science and Engineering   Professor

  • 2017.07 - Now

    RIKEN Center for Advanced Intelligence Project   Visiting Researcher

  • 2010.10 - 2020.03

    Kyoto University   Graduate School of Informatics   Associate Professor

  • 2008.10 - 2010.09

    National Institute of Information and Communications Technology   Senior Researcher

  • 2006.04 - 2008.09

    National Institute of Information and Communications Technology   Researcher

  • 2002.04 - 2006.03

    University of Tokyo   Graduate School of Information Science and Technology   Research Associate

Education Background

  • 2005

    Doctor of Informatics (Kyoto University)

  • 1999 - 2002

    Kyoto University   Graduate School of Informatics   Department of Intelligence Science and Technology   Doctoral Program

  • 1997 - 1999

    Kyoto University   Graduate School of Engineering   Department of Electronics and Communication   Master's Program

  • 1993 - 1997

    Kyoto University   Faculty of Engineering   Department of Electrical Engineering II

Research Areas

  • Intelligent informatics

Awards

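  • Committee Special Award, 29th Annual Meeting of the Association for Natural Language Processing

    2023.03   The Association for Natural Language Processing   JCommonsenseQA 2.0: Improving a Commonsense Reasoning Dataset through Human-Computer Collaboration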

    Winner: 栗原健太郎, 河原大輔, 柴田知秀

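  • Excellence Award, 29th Annual Meeting of the Association for Natural Language Processing

    2023.03   The Association for Natural Language Processing   Construction of a Japanese WiC Dataset and Its Application to Detecting Reading Difficulty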

    Winner: 吉田あいり, 河原大輔

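  • Outstanding Research Award, IPSJ Special Interest Group on Natural Language Processing

    2022.09   Information Processing Society of Japan, Special Interest Group on Natural Language Processing   KWJA: A Japanese Analyzer Based on a General-Purpose Language Model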

    Winner: 植田 暢大, 大村 和正, 児玉 貴志, 清丸 寛一, 村脇 有吾, 河原 大輔, 黒橋 禎夫

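  • Language Resource Award, 28th Annual Meeting of the Association for Natural Language Processing

    2022.03   The Association for Natural Language Processing   JGLUE: Japanese General Language Understanding Benchmark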

    Winner: 栗原健太郎, 河原大輔, 柴田知秀

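  • Language Resource Award, 27th Annual Meeting of the Association for Natural Language Processing

    2021.03   The Association for Natural Language Processing   Improving a Typo Dataset and Correction System Based on the Edit History of Japanese Wikipedia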

    Winner: 田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

  • Best Paper Award

    2020.03   The Association for Natural Language Processing  

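  • Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prize for Science and Technology, Research Category)

    2017.04   Ministry of Education, Culture, Sports, Science and Technology   Research on the Construction of Integrated Language Resources for Japanese Text Analysis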

    Winner: 黒橋禎夫, 河原大輔

  • 20th Anniversary Best Paper Award

    2014.10   The Association for Natural Language Processing  

  • 56th Maejima Award

    2011.03   Development of the Information Analysis Engine "WISDOM"

    Winner: 木俵豊, 黒橋禎夫, 赤峯享, 河原大輔, 加藤義清

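  • Excellent Presentation Award, 14th Annual Meeting

    2009.03   The Association for Natural Language Processing   Coordination Structure Analysis without Using Similarity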

    Winner: 河原大輔, 黒橋禎夫

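  • Best Presentation Award, 13th Annual Meeting

    2008.03   The Association for Natural Language Processing   Construction of an Open Search Engine Infrastructure for Large-Scale Japanese Web Documents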

    Winner: 新里圭司, 柴田知秀, 河原大輔, 黒橋禎夫

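  • 2007 Paper Award

    2008.03   The Association for Natural Language Processing   An Integrated Probabilistic Model of Syntactic and Case Structure Analysis Based on Automatically Constructed Large-Scale Case Frames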

    Winner: 河原大輔, 黒橋禎夫

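  • FY2006 Yamashita SIG Research Award

    2007.03   Information Processing Society of Japan   Construction of Large-Scale Case Frames from the Web Using a High-Performance Computing Environment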

    Winner: 河原大輔, 黒橋禎夫

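  • Best Presentation Award, 12th Annual Meeting

    2007.03   The Association for Natural Language Processing   An Integrated Probabilistic Model of Syntactic and Case Structure Analysis Based on Large-Scale Case Frames Acquired from the Web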

    Winner: 河原大輔, 黒橋禎夫

  • 2005 Paper Award

    2006.03   The Association for Natural Language Processing   Gradual Automatic Construction of a Case Frame Dictionary

    Winner: 河原大輔, 黒橋禎夫

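  • Excellent Presentation Award, 8th Annual Meeting

    2002.06   The Association for Natural Language Processing   Automatic Construction of a Case Frame Dictionary for Robust Case Analysis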

    Winner: 河原大輔, 黒橋禎夫

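  • Excellent Presentation Award, 6th Annual Meeting

    2000.06   The Association for Natural Language Processing   Construction of a Case Frame Dictionary from a Large Corpus and Case Analysis Using It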

    Winner: 河原大輔, 鍜治伸裕, 黒橋禎夫

 

Papers

  • JGLUE: Japanese General Language Understanding Benchmark

    栗原 健太郎, 河原 大輔, 柴田 知秀

    Journal of Natural Language Processing   30 ( 1 ) 63 - 87  2023.03  [Refereed]

  • SCTB-V2: the 2nd version of the Chinese treebank in the scientific domain

    Chenhui Chu, Zhuoyuan Mao, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

    Language Resources and Evaluation    2022.10

  • Automatic Japanese Example Extraction for Flashcard-based Foreign Language Learning

    Arseny Tolmachev, Sadao Kurohashi, Daisuke Kawahara

    Journal of Information Processing   30   315 - 330  2022.04  [Refereed]

    Citations (Scopus): 2

  • Construction of a Typo Dataset and Correction System Based on the Edit History of Japanese Wikipedia

    田中 佑, 村脇 有吾, 河原 大輔, 黒橋 禎夫

    Journal of Natural Language Processing   28 ( 4 ) 995 - 1033  2021.12  [Refereed]

  • RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems

    Qianying Liu, Wenyu Guan, Sujian Li, Fei Cheng, Daisuke Kawahara, Sadao Kurohashi

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   30   1 - 11  2021.11  [Refereed]

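  • Japanese Discourse Relation Analysis: Task Design, Automatic Recognition of Discourse Markers, and Corpus Annotation

    岸本 裕大, 村脇 有吾, 河原 大輔, 黒橋 禎夫

    Journal of Natural Language Processing   27 ( 4 ) 889 - 931  2020.12  [Refereed]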

  • Design and Structure of The Juman++ Morphological Analyzer Toolkit

    Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi

    Journal of Natural Language Processing   27 ( 1 ) 89 - 132  2020.03  [Refereed]

  • A System for Worldwide COVID-19 Information Aggregation.

    Akiko Aizawa, Frédéric Bergéron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, Nobuhiro Ueda, Honai Ueoka, Masao Utiyama, Ying Zhong

       2020

  • Annotating a Driving Experience Corpus with Behavior and Subjectivity

    Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

    Journal of Natural Language Processing   26 ( 2 ) 329 - 359  2019.06  [Refereed]

  • Neural Network-based Chinese Joint Syntactic Analysis

    Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi

    Journal of Natural Language Processing   26 ( 1 ) 231 - 258  2019.03

  • Comprehensive Temporal Information Annotation of Events and Its Analysis

    坂口 智洋, 河原 大輔, 黒橋 禎夫

    Journal of Natural Language Processing   26 ( 1 )  2019.03  [Refereed]

  • Integrated Chinese Syntactic Analysis Using Neural Networks

    栗田修平, 河原 大輔, 黒橋 禎夫

    Journal of Natural Language Processing   26 ( 1 )  2019.03  [Refereed]

  • Improving Chinese Semantic Role Labeling using High-quality Surface and Deep Case Frames

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    Journal of Natural Language Processing   25 ( 2 ) 201 - 221  2018.03  [Refereed]

  • Learning to Answer Questions by Understanding Using Entity-Based Memory Network

    Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

    Computación y Sistemas   21 ( 4 ) 799 - 808  2017  [Refereed]

    This paper introduces a novel neural network model for question answering, the entity-based memory network. It enhances the ability of neural networks to represent and compute information over a long period by keeping records of the entities contained in text. The core component is a memory pool that stores the entities' states, which are continuously updated according to the input text. Questions about the input text are used to search the memory pool for related entities, and answers are then predicted from the states of the retrieved entities. Entities in this model are regarded as the basic units that carry information and construct text, and the information carried by text is encoded in the states of entities. Hence, text can best be understood by analysing the entities it contains. Compared with previous memory network models, the proposed model can handle fine-grained information and more sophisticated relations based on entities. We formulated several different tasks as question answering problems and tested the proposed model on them. The experiments yielded satisfactory results.

  • Chinese Word Segmentation and Unknown Word Extraction by Mining Maximized Substring

    Mo Shen, Daisuke Kawahara, Sadao Kurohashi

    Journal of Natural Language Processing   23 ( 3 ) 266 - 266  2016.06  [Refereed]

    Chinese word segmentation is an initial and important step in Chinese language processing. Recent advances in machine learning techniques have boosted the performance of Chinese word segmentation systems, yet the identification of out-of-vocabulary words is still a major problem in this field of study. Recent research has attempted to address this problem by exploiting characteristics of frequent substrings in unlabeled data. We propose a simple yet effective approach for extracting a specific type of frequent substrings, called maximized substrings, which provide good estimations of unknown word boundaries. In the task of Chinese word segmentation, we use these substrings, extracted from large-scale unlabeled data, to improve the segmentation accuracy. The effectiveness of this approach is demonstrated through experiments using various data sets from different domains. In the task of unknown word extraction, we apply post-processing techniques that effectively reduce the noise in the extracted substrings. We demonstrate the effectiveness and efficiency of our approach by comparing the results with a widely applied Chinese word recognition method from a previous study.

  • Designing a Word Association Game on a Dialogue System for Acquiring Knowledge of Related Words

    町田雄一郎, 河原大輔, 黒橋禎夫, 颯々野学

    IPSJ Journal   57 ( 3 ) 1058 - 1068  2016.03  [Refereed]

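  • Automatic Acquisition of Lexical Knowledge on Case Alternation between Passive/Causative and Active Forms

    笹野遼平, 河原大輔, 黒橋禎夫, 奥村学

    Journal of Natural Language Processing   21 ( 6 ) 1207 - 1233  2014.12  [Refereed]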

  • Language-independent Approach to High Quality Dependency Selection From Automatic Parses

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    Journal of Natural Language Processing   21 ( 6 ) 1163 - 1182  2014.12  [Refereed]

  • Dependency parse reranking with rich subtree features

    Mo Shen, Daisuke Kawahara, Sadao Kurohashi

    IEEE Transactions on Audio, Speech and Language Processing   22 ( 7 ) 1208 - 1218  2014.07  [Refereed]

    In pursuing machine understanding of human language, highly accurate syntactic analysis is a crucial step. In this work, we focus on dependency grammar, which models syntax by encoding transparent predicate-argument structures. Recent advances in dependency parsing have shown that employing higher-order subtree structures in graph-based parsers can substantially improve parsing accuracy. However, this approach becomes less efficient as the order of the subtrees increases. This work explores a new reranking approach for dependency parsing that can utilize complex subtree representations by applying efficient subtree selection methods. We demonstrate the effectiveness of the approach in experiments conducted on the Penn Treebank and the Chinese Treebank. Our system achieves the best performance among known supervised systems evaluated on these datasets, improving the baseline accuracy from 91.88% to 93.42% for English and from 87.39% to 89.25% for Chinese.

    Citations (Scopus): 7

  • Japanese Zero Anaphora Resolution Considering Exophora and Author/Reader Expressions

    萩行正嗣, 河原大輔, 黒橋禎夫

    Journal of Natural Language Processing   21 ( 3 ) 563 - 600  2014.06  [Refereed]

  • Construction and Analysis of a Corpus of Diverse Document Leads Annotated with Semantic Relations

    萩行正嗣, 河原大輔, 黒橋禎夫

    Journal of Natural Language Processing   21 ( 2 ) 213 - 247  2014.04  [Refereed]

  • Chinese-Japanese machine translation exploiting Chinese characters

    Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

    ACM Transactions on Asian Language Information Processing   12 ( 4 ) 1 - 25  2013.10  [Refereed]

    The Chinese and Japanese languages share Chinese characters. Since the Chinese characters in Japanese originated from ancient China, many common Chinese characters exist between these two languages. Since Chinese characters contain significant semantic information and common Chinese characters share the same meaning in the two languages, they can be quite useful in Chinese-Japanese machine translation (MT). We therefore propose a method for creating a Chinese character mapping table for Japanese, traditional Chinese, and simplified Chinese, with the aim of constructing a complete resource of common Chinese characters. Furthermore, we point out two main problems in Chinese word segmentation for Chinese-Japanese MT, namely, unknown words and word segmentation granularity, and propose an approach exploiting common Chinese characters to solve these problems. We also propose a statistical method for detecting semantically equivalent Chinese characters other than the common ones and a method for exploiting shared Chinese characters in phrase alignment. Results of experiments carried out on a state-of-the-art phrase-based statistical MT system and an example-based MT system show that our proposed approaches can improve MT performance significantly, thereby verifying the effectiveness of shared Chinese characters for Chinese-Japanese MT.

    Citations (Scopus): 19

  • TSUBAKI: An open search engine infrastructure for developing information access methodology

    Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

    Journal of Information Processing   20 ( 1 ) 216 - 227  2012  [Refereed]

    Due to the explosive growth in the amount of information over the last decade, it has become much harder to obtain necessary information with conventional information access methods. Hence, drastically new technology is needed, and developing such technology requires search engine infrastructures. Although existing search engine APIs can be regarded as such infrastructures, these APIs have several restrictions, such as a limit on the number of API calls. To help the development of new technology, we are running an open search engine infrastructure, TSUBAKI, on a high-performance computing environment. In this paper, we describe the TSUBAKI infrastructure.

    Citations (Scopus): 12

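  • Construction of a Blog Corpus Annotated with Syntactic, Anaphoric, and Evaluative Information

    橋本 力, 黒橋 禎夫, 河原 大輔, 新里 圭司, 永田 昌明

    Journal of Natural Language Processing (Technical Material)   18 ( 2 ) 175 - 201  2011.06  [Refereed]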

  • The Effect of Corpus Size on Case Frame Acquisition for Predicate-Argument Structure Analysis

    Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

    IEICE Transactions on Information and Systems   E93-D ( 6 ) 1361 - 1368  2010.06  [Refereed]

    This paper reports the effect of corpus size on case frame acquisition for predicate-argument structure analysis in Japanese. For this study, we collect a Japanese corpus consisting of up to 100 billion words, and construct case frames from corpora of six different sizes. Then, we apply these case frames to syntactic and case structure analysis, and zero anaphora resolution, in order to investigate the relationship between the corpus size for case frame acquisition and the performance of predicate-argument structure analysis. We obtained better analyses by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.

    Citations (Scopus): 2

  • Identifying information senders of web pages

    Yoshikiyo Kato, Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi, Tomohide Shibata

    Transactions of the Japanese Society for Artificial Intelligence   25 ( 1 ) 90 - 103  2010  [Refereed]

    The source of information is one of the crucial elements when judging the credibility of information. On the current Web, however, information about the source is not readily available to users. In this paper, we formulate the problem of identifying the information source as the problem of identifying the information sender configuration (ISC) of a Web page. An information sender of a Web page is an entity that is involved in the publication of the information on the page. An information sender configuration of a Web page describes the information senders of the page and the relationships among them. Information sender identification is a sub-problem of identifying the ISC, and we present a method for extracting information senders from Web pages, along with its evaluation. ISC provides a basis for deeper analysis of information on the Web.

  • Compilation of an idiom example database for supervised idiom identification

    Chikara Hashimoto, Daisuke Kawahara

    Language Resources and Evaluation   43 ( 4 ) 355 - 384  2009.12  [Refereed]

    Some phrases can be interpreted in their context either idiomatically (figuratively) or literally. The precise identification of idioms is essential in order to achieve full-fledged natural language processing. Because of this, the authors of this paper have created an idiom corpus for Japanese. This paper reports on the corpus itself and the results of an idiom identification experiment conducted using the corpus. The corpus targeted 146 ambiguous idioms and consists of 102,856 examples, each of which is annotated with a literal/idiomatic label. All sentences were collected from the World Wide Web. For idiom identification, 90 of the 146 idioms were targeted, and a word sense disambiguation (WSD) method was adopted using both common WSD features and idiom-specific features. As far as can be determined, the corpus and the experiment are both the largest of their kind. A standard supervised WSD method was found to work well for idiom identification, achieving accuracies of 89.25% and 88.86% with and without idiom-specific features, respectively. It was also found that the most effective idiom-specific feature is the one that involves the adjacency of idiom constituents.

    Citations (Scopus): 13

  • Extracting the author of web pages

    Yoshikiyo Kato, Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi, Tomohide Shibata

    International Conference on Information and Knowledge Management, Proceedings     35 - 41  2008  [Refereed]

    In this paper, we define the problem of identifying the author of a Web page as a sub-problem of identifying the information sender configuration of a Web page. We propose a method that extracts author name candidates from a Web page based on linguistic features and ranks the candidates based on local features such as distance from the main content. The evaluation shows that we can achieve more than 75% precision when evaluating candidates ranked within the top five.

    Citations (Scopus): 4

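  • Natural Language Processing Using Case Frames (Part 2): Syntactic and Case Structure Analysis Based on Case Frames and Its Applications

    黒橋 禎夫, 河原 大輔

    Gekkan Gengo   36 ( 12 ) 76 - 83  2007.12  [Refereed]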

  • Natural Language Processing Using Case Frames (Part 1): Organizing Basic Vocabulary and Automatically Acquiring Case Frames

    黒橋 禎夫, 河原 大輔

    Gekkan Gengo   36 ( 11 ) 94 - 100  2007.11  [Refereed]

  • Automatic Text Presentation for the Conversational Knowledge Process

    Sadao Kurohashi, Daisuke Kawahara, Nobuhiro Kaji, Tomohide Shibata

    Conversational Informatics: An Engineering Approach     201 - 216  2007.10  [Refereed]

    Citations (Scopus): 2

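  • An Integrated Probabilistic Model of Syntactic and Case Structure Analysis Based on Automatically Constructed Large-Scale Case Frames

    河原 大輔, 黒橋 禎夫

    Journal of Natural Language Processing   14 ( 4 ) 67 - 81  2007.07  [Refereed]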

  • Extraction of questions behind messages

    Naohiro Matsumura, Daisuke Kawahara, Masashi Okamoto, Sadao Kurohashi, Toyoaki Nishida

    Transactions of the Japanese Society for Artificial Intelligence   22 ( 1 ) 93 - 102  2007  [Refereed]

    To overcome the limitation of conventional text-mining approaches, in which frequent patterns of word occurrences are extracted to understand obvious user needs, this paper proposes an approach to extracting questions behind messages to understand potential user needs. We first extract characteristic case frames by comparing the case frames constructed from target messages with those constructed from 25M sentences from the Web and 20M sentences from 20 years of newspaper articles. Then we extract questions behind messages by transforming the characteristic case frames into interrogative sentences based on new and old information, i.e., by replacing new information with WH-question words. The proposed approach is, in other words, a kind of classification of word occurrence patterns. Qualitative evaluations of our preliminary experiments suggest that the extracted questions show problem consciousness and alternative solutions, all of which help to understand potential user needs.

    Citations (Scopus): 2

  • Cards-to-presentation on the web: generating multimedia contents featuring agent animations

    YI Nakano, T Murayama, M Okamoto, D Kawahara, Q Li, S Kurohashi, T Nishida

    Journal of Network and Computer Applications   29 ( 2-3 ) 83 - 104  2006.04  [Refereed]

    With the goal of supporting the knowledge circulation and creation process in a society, we have studied story-based communication in a network community. Based on this research motivation, this paper proposes a web-based multimedia environment called Stream-oriented Public Opinion Channel (SPOC), which enables novice users to embody a story as multimedia content and distribute it on the Internet. The system produces digital camera work for graphics and video clips and automatically generates agent animations according to linguistic information in a text. The findings of our evaluation experiments show that SPOC is easy for novice users to learn and use, suggesting that this system can reduce a user's cost in creating multimedia content and encourage communication in a network community.

    Citations (Scopus): 3

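  • Automatic Construction of a Nominal Case Frame Dictionary and Its Application to Relation Analysis of Noun Phrases

    笹野 遼平, 河原 大輔, 黒橋 禎夫

    Journal of Natural Language Processing   12 ( 3 ) 129 - 144  2005.07  [Refereed]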

  • Zero pronoun resolution based on automatically constructed case frames and structural preference of antecedents

    D Kawahara, S Kurohashi

    Natural Language Processing - IJCNLP 2004   3248   12 - 21  2005  [Refereed]

    This paper describes a method to detect and resolve zero pronouns in Japanese text. We detect zero pronouns by case analysis based on automatically constructed case frames and select their appropriate antecedents based on similarity to examples in the case frames. We also introduce a structural preference of antecedents to precisely capture the tendency of a zero pronoun to have its antecedent in a nearby position. Experimental results on 100 articles indicated that the precision and recall of zero pronoun detection were 87.1% and 74.8%, respectively, and the accuracy of antecedent estimation was 61.8%.

  • Media Conversion of Linguistic Information for the Conversational Knowledge Process

    黒橋禎夫, 大泉敏貴, 柴田知秀, 鍜治伸裕, 河原大輔, 岡本雅史, 西田豊明

    Sociotechnica   2   173 - 180  2004.10  [Refereed]

  • Text understanding for conversational agent

    D Kawahara, R Sasano, S Kurohashi

    Intelligent Media Technology for Communicative Intelligence   3490   12 - 20  2004  [Refereed]

    This paper describes a text understanding system for conversational agents. The system resolves zero, direct, and indirect anaphors in Japanese texts by integrating two sorts of linguistic resources: a hand-annotated corpus with various relations and automatically constructed case frames. The corpus has relevance tags, which consist of predicate-argument relations, relations between nouns, and coreferences, and is utilised for learning the parameters of the system and for testing it. The case frames are indispensable knowledge both for detecting zero/indirect anaphors and for estimating appropriate antecedents. Our preliminary experiments showed promising results.

  • Predicate Paraphrasing based on Case Frame Alignment

    Nobuhiro Kaji, Daisuke Kawahara, Sadao Kurohashi, Satoshi Sato

    Journal of Natural Language Processing   10 ( 4 ) 65 - 81  2003.07  [Refereed]

  • Automatic Construction of Case Frames Using the Pair of a Predicate and Its Immediately Preceding Case Component as the Unit

    河原大輔, 黒橋禎夫

    Journal of Natural Language Processing   9 ( 1 ) 3 - 19  2002.01  [Refereed]

Research Projects

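  • Research on the Lexicon Based on Encyclopedic Semantics: An Empirical Study Using Large-Scale Corpora

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2023.04 - 2027.03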

    松本 曜, 小原 京子, 中嶌 浩貴, 籾山 洋介, 河原 大輔, 加藤 祥, 陳 奕廷

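  • Psychological, Biological, and Informatics Research on the Mechanisms Underlying Self-Transcendent Emotions

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2022.04 - 2025.03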

    野村 理朗, 河原 大輔, 高橋 英之, 西平 直

  • Building General Language Understanding Infrastructure by Fusing Computational and Human Intelligence

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (A)

    Project Year: 2021.04 - 2025.03

  • Acquisition of Knowledge Frame with Denotational and Connotational Meanings and its Application to Text Understanding

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year: 2018.04 - 2021.03

    Kawahara Daisuke

    We have acquired knowledge about denotation and connotation for case frames and "events" based on predicate-argument structures to achieve natural language understanding. For denotation knowledge, we mapped case frames to semantic frames of FrameNet and induced semantic frames using deep learning techniques. For connotation knowledge, we gradually acquired emotion knowledge for events. We also devised a method to use knowledge in deep learning models.

  • Core Technologies for Analyzing Big Text by Combining Intelligence of Computers and Humans

    JST  PRESTO

    Project Year: 2014.10 - 2018.03

  • Symbol Grounding based on Metadata Recognition for Web Contents

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

    Project Year: 2013.04 - 2016.03

    Kawahara Daisuke

    We developed a system for recognizing metadata of Web contents and real-world referents of named entities. This system identifies authors of Web contents as our target metadata and referents of location expressions in Web contents. By using deep natural language processing techniques and real-world information, we achieved more precise analysis than previous systems.

  • Development of the Interactive Method for Foreign Language Learning Based on Linguistic Knowledge Extracted from the Web Corpus

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

    Project Year: 2012.04 - 2016.03

    Hajime Nozawa, KAWAHARA Daisuke, LEE Jae-ho

    We conducted a survey with a large-scale English newspaper corpus to see what types of nouns are frequently used as subjects and objects of various verbs, and defined the patterns of co-occurring verbs and nouns extracted from the corpus as the grammatical knowledge that learners of English need to acquire to master the use of verbs. By manually editing examples of such usages taken from the corpus, we created grammatical exercises for learners that are more natural and consistent with encyclopedic knowledge. We also developed a game-like e-learning system on which learners can do the exercises interactively.

  • Modeling Context Analysis based on Semantic Annotation on Diverse Documents

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year: 2012.04 - 2015.03

    KUROHASHI Sadao, KAWAHARA Daisuke, SHIBATA Tomohide

    To study semantic analysis of natural language texts, a corpus annotated with semantic relations is required. Although existing corpora annotated with semantic relations have been restricted to newspaper articles, texts of various genres and styles contain linguistic expressions that are missing from newspaper articles. In this research, we defined annotation criteria for linguistic phenomena that existing criteria have not covered. We built a corpus of diverse document leads annotated with semantic relations. We also proposed a novel approach for rapidly developing a corpus with discourse annotations using crowdsourcing.

  • Co-deepening Knowledge Acquisition and Linguistic Analysis based on Large-scale Observation of Language Uses

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Young Scientists (A)

    Project Year: 2011.04 - 2014.03

    KAWAHARA Daisuke

    First, we automatically acquired linguistic knowledge from a large-scale web corpus. The acquired linguistic knowledge mainly consists of case frames, which describe the relations between predicates and their arguments. Then, we integrated such linguistic knowledge into a linguistic analyzer to improve the performance of the analyzer. We also developed an information retrieval system based on the knowledge-rich linguistic analyzer, and confirmed that our information retrieval system outperformed conventional information retrieval systems based on bag of words and dependency relations.

  • Construction of Information Retrieval Infrastructure Based on Structural Natural Language Processing

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research on Priority Areas

    Project Year: 2007 - 2010

    KUROHASHI Sadao, KAWAHARA Daisuke, SHIBATA Tomohide, SHINZATO Keiji, SASANO Ryohei

    The essential purpose of Information Retrieval is not to get relevant documents, but to obtain relevant information and knowledge. In order to achieve this, we believe that text understanding by machine, or Natural Language Processing is the most important aspect. This research project constructed IR infrastructure based on structural NLP, analyzing predicate argument structures in texts, handling expressive diversity in natural language, and providing a bird's-eye view towards a given topic by organizing and relating information.

Misc

  • KWJA: A Japanese Analyzer Based on a General-Purpose Language Model

    植田 暢大, 大村 和正, 児玉 貴志, 清丸 寛一, 村脇 有吾, 河原 大輔, 黒橋 禎夫

    IPSJ Special Interest Group on Natural Language Processing, 253rd Meeting   2022-NL-253 ( 2 ) 1 - 14  2022.09

  • The 1st Workshop on Construction and Improving Usability of Japanese Evaluation Datasets (JED2022) - Achievements and Prospects

    Hiroshi Matsuda, Tomohide Shibata, Daisuke Kawahara, Sorami Hisamoto, Takahiro Kubo, Masayuki Asahara

    Journal of Natural Language Processing   29 ( 3 ) 1023 - 1029  2022.09  [Invited]

  • Building a Personalized Dialogue System with Prompt-Tuning

    Tomohito Kasahara, Daisuke Kawahara, Nguyen Tung, Shengzhe Li, Kenta Shinzato, Toshinori Sato

    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop     96 - 105  2022.07  [Refereed]

  • Generate, Evaluate, and Select: A Dialogue System with a Response Evaluator for Diversity-Aware Response Generation

    Ryoma Sakaeda, Daisuke Kawahara

    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop     76 - 82  2022.07  [Refereed]

  • Grounding in social media: An approach to building a chit-chat dialogue model

    Ritvik Choudhary, Daisuke Kawahara

    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop     9 - 15  2022.07  [Refereed]

  • JGLUE: Japanese General Language Understanding Evaluation

    Kentaro Kurihara, Daisuke Kawahara, Tomohide Shibata

    In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)     2957 - 2966  2022.06  [Refereed]

  • JGLUE: Japanese General Language Understanding Benchmark

    栗原 健太郎, 河原 大輔, 柴田 知秀

    Journal of Natural Language Processing   29 ( 2 ) 711 - 717  2022.06  [Invited]

  • Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions

    Tatsuya Ide, Daisuke Kawahara

    In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop     21 - 30  2022.05  [Refereed]

  • JGLUE: Japanese General Language Understanding Benchmark

    栗原健太郎, 河原大輔, 柴田知秀

    28th Annual Meeting of the Association for Natural Language Processing     2023 - 2028  2022.03

    Research paper, summary (national, other academic conference)  

  • Detecting Reading Difficulty Based on Structural Ambiguity

    吉田あいり, 河原大輔

    The 28th Annual Meeting of the Association for Natural Language Processing     425 - 429  2022.03

    Research paper, summary (national, other academic conference)  

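  • A Dialogue System Based on Response Generation, Evaluation, and Selection

    榮田亮真, 河原大輔

    The 28th Annual Meeting of the Association for Natural Language Processing     380 - 385  2022.03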

    Research paper, summary (national, other academic conference)  

  • Open-Domain Dialogue Response Generation Using Interactions on Social Media

    Ritvik Choudhary, 河原大輔

    The 28th Annual Meeting of the Association for Natural Language Processing     392 - 397  2022.03

    Research paper, summary (national, other academic conference)  

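  • Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions

    井手竜也, 河原大輔

    The 28th Annual Meeting of the Association for Natural Language Processing     386 - 391  2022.03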

    Research paper, summary (national, other academic conference)  

  • Building a Dialogue System with Personality via Prompt-Tuning

    笠原智仁, 河原大輔

    The 28th Annual Meeting of the Association for Natural Language Processing     179 - 184  2022.03

    Research paper, summary (national, other academic conference)  

  • Multi-Task Learning of Generation and Classification for Emotion-Aware Dialogue Response Generation

    Tatsuya Ide, Daisuke Kawahara

    NAACL Student Research Workshop (SRW) 2021     119 - 125  2021.06  [Refereed]

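  • Improving a Typo Dataset and Correction System Based on the Edit History of Japanese Wikipedia

    田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

    The 27th Annual Meeting of the Association for Natural Language Processing     1540 - 1545  2021.03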

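  • Building Large-Scale Semantic Frame Knowledge Using Collective Intelligence

    小原京子, 河原大輔, 笹野遼平, 関根聡

    The 27th Annual Meeting of the Association for Natural Language Processing     554 - 558  2021.03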

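  • A Textual Entailment Recognition System for Fact-Checking Support

    栗原健太郎, 河原大輔

    The 27th Annual Meeting of the Association for Natural Language Processing     1734 - 1739  2021.03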

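  • Generating a Pseudo-Dialogue Corpus via Back-Translation and Filtering, and Training a Dialogue System with It

    榮田亮真, 河原大輔

    The 27th Annual Meeting of the Association for Natural Language Processing     647 - 652  2021.03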

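  • Emotion-Aware Dialogue Response Generation via Multi-Task Learning of Generation and Classification

    井手竜也, 河原大輔

    The 27th Annual Meeting of the Association for Natural Language Processing     642 - 646  2021.03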

  • Building a Worldwide COVID-19 Information Aggregation Site through Open Collaboration

    河原 大輔

    Journal of Natural Language Processing   27 ( 4 ) 939 - 943  2020.12  [Invited]

    Article, review, commentary, editorial, etc. (scientific journal)  

  • BERT-based Cohesion Analysis of Japanese Texts

    Nobuhiro Ueda, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 28th International Conference on Computational Linguistics (COLING2020)     1323 - 1333  2020.12  [Refereed]

    Research paper, summary (international conference)  

  • A Method for Building a Commonsense Inference Dataset based on Basic Events

    Kazumasa Omura, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP2020)     2450 - 2460  2020.11  [Refereed]

    Research paper, summary (international conference)  

  • Minimize Exposure Bias of Seq2Seq Models in Joint Entity and Relation Extraction

    Haoran Zhang, Qianying Liu, Aysa Xuemo Fan, Heng Ji, Daojian Zeng, Fei Cheng, Daisuke Kawahara, Sadao Kurohashi

    In Findings of the Association for Computational Linguistics: EMNLP 2020     236 - 246  2020.11  [Refereed]

  • A System for Worldwide COVID-19 Information Aggregation

    Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, Nobuhiro Ueda, Honai Ueoka, Masao Utiyama, Ying Zhong

    In Proceedings of Workshop on NLP for COVID-19 (Part 2) at EMNLP2020    2020.11  [Refereed]

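  • Multilingualization of a Natural Language Inference Dataset Using Machine Translation

    吉越 卓見, 河原 大輔, 黒橋 禎夫

    The 244th Meeting of the IPSJ Special Interest Group on Natural Language Processing    2020.07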

  • Building a Japanese Typo Dataset from Wikipedia's Revision History

    Yu Tanaka, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL2020SRW)     230 - 236  2020.07  [Refereed]

    Research paper, summary (international conference)  

  • Acquiring Social Knowledge about Personality and Driving-related Behavior

    Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

    In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC2020)     2306 - 2315  2020.05  [Refereed]

    Research paper, summary (international conference)  

  • Development of a Japanese Personality Dictionary based on Psychological Methods

    Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

    In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC2020)     3103 - 3108  2020.05  [Refereed]

    Research paper, summary (international conference)  

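  • Extending a Japanese Corpus Annotated with Predicate-Argument Structures Using Crowdsourcing

    阿部 航平, 河原 大輔, 黒橋 禎夫

    The 26th Annual Meeting of the Association for Natural Language Processing    2020.03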

  • Building a Commonsense Inference Dataset Based on Basic Events

    大村 和正, 河原 大輔, 黒橋 禎夫

    The 26th Annual Meeting of the Association for Natural Language Processing    2020.03

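  • Causal Graph: Aggregating the Causes, Effects, and Solutions of Events Based on Structural Language Processing

    清丸 寛一, 植田 暢大, 児玉 貴志, 田中 佑, 岸本 裕大, 田中 リベカ, 河原 大輔, 黒橋 禎夫

    The 26th Annual Meeting of the Association for Natural Language Processing    2020.03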

  • Building a Japanese Typo Dataset Using Wikipedia's Revision History

    田中 佑, 村脇 有吾, 河原 大輔, 黒橋 禎夫

    The 26th Annual Meeting of the Association for Natural Language Processing    2020.03

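  • Integrated Anaphora and Coreference Resolution Using BERT and a Refinement Network

    植田 暢大, 河原 大輔, 黒橋 禎夫

    The 26th Annual Meeting of the Association for Natural Language Processing    2020.03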

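  • Social Style Estimation Based on the Analysis of Assertiveness and Emotion in Dialogue Texts

    高橋 憲生, 河原 大輔, 黒橋 禎夫

    The 26th Annual Meeting of the Association for Natural Language Processing    2020.03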

  • Tree-structured Decoding for Solving Math Word Problems

    Qianying Liu, Wenyu Guan, Sujian Li, Daisuke Kawahara

    In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing    2019.11  [Refereed]

  • Machine Comprehension Improves Domain-Specific Japanese Predicate-Argument Structure Analysis

    Norio Takahashi, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of EMNLP-IJCNLP 2019 Workshop MRQA: Machine Reading for Question Answering, Hong Kong    2019.11  [Refereed]

  • Diversity-aware Event Prediction based on a Conditional Variational Autoencoder with Reconstruction

    Hirokazu Kiyomaru, Kazumasa Omura, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing (COIN)     113 - 122  2019.11  [Refereed]

  • A Community Detection Method Towards Analysis of Xi Feng Parties in the Northern Song Dynasty

    Qianying Liu, Qiyao Wang, Wending Chen, Daisuke Kawahara

    In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33)    2019.09  [Refereed]

  • A Better Ad Experience: Click Prediction Leveraging Sequential Networks Derived Specifically From User Search Behaviors

    Shengzhe Li, Tomoko Izumi, Yu Kuratake, Jiali Yao, Jerry Turner, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33)    2019.09  [Refereed]

  • Applying Machine Translation to Psychology: Automatic Translation of Personality Adjectives

    Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

    In Proceedings of the 17th Machine Translation Summit (MT Summit XVII)    2019.08  [Refereed]

  • Emotion helps Sentiment: A Multi-task Model for Sentiment and Emotion Analysis

    Abhishek Kumar, Asif Ekbal, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 2019 International Joint Conference on Neural Networks    2019.07  [Refereed]

  • Shrinking Japanese Morphological Analyzers With Neural Networks and Semi-supervised Learning

    Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of NAACL-HLT 2019: Annual Conference of the North American Chapter of the Association for Computational Linguistics     2744 - 2755  2019.06  [Refereed]

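  • Building a Word Difficulty Database Based on Crowdsourced Questions Recalling the Age of Acquisition

    水谷 勇介, 河原 大輔, 黒橋 禎夫

    The 25th Annual Meeting of the Association for Natural Language Processing     1503 - 1506  2019.03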

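  • Predicate-Argument Structure Analysis Based on a Domain-Specific Machine Reading Comprehension Model

    高橋 憲生, 柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 25th Annual Meeting of the Association for Natural Language Processing    2019.03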

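  • How Far Can Large-Scale Automatically Analyzed Data Shrink a Morphological Analyzer?

    Arseny Tolmachev, 河原大輔, 黒橋禎夫

    The 25th Annual Meeting of the Association for Natural Language Processing    2019.03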

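  • Improving the Accuracy of Japanese Syntactic Parsing with BERT

    柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 25th Annual Meeting of the Association for Natural Language Processing     205 - 208  2019.03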

  • Diversity-Aware Event Prediction Based on a Conditional VAE

    清丸 寛一, 大村和正, 村脇有吾, 河原大輔, 黒橋禎夫

    The 25th Annual Meeting of the Association for Natural Language Processing    2019.03

  • Analyzing the Components of Funniness in Oogiri via Crowdsourcing

    中川 裕貴, 村脇 有吾, 河原 大輔, 黒橋 禎夫

    The 25th Annual Meeting of the Association for Natural Language Processing    2019.03

  • Annotating a Driving Experience Corpus with Behavior and Subjectivity

    Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

    In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32)    2018.12  [Refereed]

  • Juman++: A Morphological Analysis Toolkit for Scriptio Continua

    Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of EMNLP 2018: Conference on Empirical Methods in Natural Language Processing, System Demonstrations     54 - 54  2018.11  [Refereed]

  • Cross-lingual Knowledge Projection Using Machine Translation and Target-side Knowledge Base Completion

    Naoki Otani, Hirokazu Kiyomaru, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics     1508 - 1520  2018.08  [Refereed]

  • Neural Adversarial Training for Semi-supervised Japanese Predicate-argument Structure Analysis

    Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL2018)    2018.07  [Refereed]

  • Knowledge-enriched Two-layered Attention Network for Sentiment Analysis

    Abhishek Kumar, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT2018), Volume 2 (Short Papers)     253 - 258  2018.06  [Refereed]

  • JDCFC: A Japanese Dialogue Corpus with Feature Changes

    Tetsuaki Nakamura, Daisuke Kawahara

    In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)     2915 - 2921  2018.05  [Refereed]

  • Comprehensive Annotation of Various Types of Temporal Information on the Time Axis

    Tomohiro Sakaguchi, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)     332 - 338  2018.05  [Refereed]

  • Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations

    Yudai Kishimoto, Shinnosuke Sawada, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)    2018.05  [Refereed]

  • JFCKB: Japanese Feature Change Knowledge Base

    Tetsuaki Nakamura, Daisuke Kawahara

    In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)     1398 - 1404  2018.05  [Refereed]

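  • Design and Visualization of Basic Units of Linguistic Information Based on Predicate-Argument Structures

    齋藤 純, 坂口 智洋, 柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 24th Annual Meeting of the Association for Natural Language Processing     93 - 93  2018.03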

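  • Aligning Japanese FrameNet with Automatically Constructed Case Frames via Crowdsourcing

    河原 大輔, 小原 京子, 関根 聡, 乾 健太郎

    The 24th Annual Meeting of the Association for Natural Language Processing     706 - 709  2018.03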

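  • Building a Dialogue Corpus Annotated with Feature Changes Including Emotions, and Using It to Estimate Dialogue Naturalness

    仲村 哲明, 河原 大輔

    The 24th Annual Meeting of the Association for Natural Language Processing     654 - 657  2018.03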

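  • Toward Improving the Acquisition of Opinion Tags Suited to Opinion Analysis

    三澤 賢祐, 成田 和弥, 伊藤 友博, 柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 24th Annual Meeting of the Association for Natural Language Processing     572 - 572  2018.03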

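  • Comprehensive Temporal Information Annotation of the Kyoto University Text Corpus

    坂口智洋, 河原大輔, 黒橋禎夫

    The 233rd Meeting of the IPSJ Special Interest Group on Natural Language Processing    2017.10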

  • Automatically Acquired Lexical Knowledge Improves Japanese Joint Morphological and Dependency Analysis

    Daisuke Kawahara, Yuta Hayashibe, Hajime Morita, Sadao Kurohashi

    In Proceedings of the 15th International Conference on Parsing Technologies (IWPT2017)     1 - 10  2017.09  [Refereed]

  • A Robust Language Processing Infrastructure for Information Analysis of Real-World Text

    河原大輔, 黒橋禎夫, 林部祐太, 森田一, Arseny Tolmachev

    The 11th Text Analytics Symposium     25 - 30  2017.09

  • Improving Chinese Semantic Role Labeling using High-quality Surface and Deep Case Frames

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL2017)     567 - 576  2017.04  [Refereed]

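  • Joint Word Segmentation, POS Tagging, and Dependency Parsing Based on Neural Networks

    栗田 修平, 河原 大輔, 黒橋 禎夫

    The 23rd Annual Meeting of the Association for Natural Language Processing    2017.03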

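  • Anaphora Resolution Based on Knowledge of Feature Changes of Event Participants Acquired via Collective Intelligence

    仲村 哲明, 河原 大輔

    The 23rd Annual Meeting of the Association for Natural Language Processing     767 - 770  2017.03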

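  • A Survey of Acceptability Judgments in Practice: They Are Biased Psychological Ratings Composed of Heterogeneous Responses

    黒田 航, 仲村 哲明, 河原 大輔

    The 23rd Annual Meeting of the Association for Natural Language Processing     398 - 401  2017.03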

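  • Automatic Construction of a Zero-Anaphora-Annotated Corpus Using a Parallel Corpus

    古川 智雅, 中澤 敏明, 柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 23rd Annual Meeting of the Association for Natural Language Processing    2017.03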

  • Improving Discourse Relation Annotation Using Crowdsourcing

    岸本 裕大, 澤田 晋之介, 村脇 有吾, 河原 大輔, 黒橋 禎夫

    The 23rd Annual Meeting of the Association for Natural Language Processing     819 - 819  2017.03

  • Neural joint model for transition-based Chinese syntactic analysis

    Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi

    ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)   1   1204 - 1214  2017  [Refereed]

    We present neural network-based joint models for Chinese word segmentation, POS tagging and dependency parsing. Our models are the first neural approaches to fully joint Chinese analysis, which is known to prevent the error propagation problem of pipeline models. Although word embeddings play a key role in dependency parsing, they could not be applied directly to the joint task in previous work. To address this problem, we propose embeddings of character strings in addition to words. Experiments show that our models outperform existing systems in Chinese word segmentation and POS tagging and achieve favorable accuracy in dependency parsing. We also explore bi-LSTM models with fewer features.

  • Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language

    Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 26th International Conference on Computational Linguistics (COLING2016)     298 - 308  2016.12  [Refereed]

  • SCTB: A Chinese Treebank in Scientific Domain

    Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 12th Workshop on Asian Language Resources (ALR12 2016)    2016.12  [Refereed]

  • Large-Scale Acquisition of Commonsense Knowledge via a Quiz Game on a Dialogue System

    Naoki Otani, Daisuke Kawahara, Sadao Kurohashi, Nobuhiro Kaji, Manabu Sassano

    In Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)     11 - 20  2016.12  [Refereed]

  • IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations

    Naoki Otani, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of EMNLP 2016: Conference on Empirical Methods in Natural Language Processing     511 - 520  2016.11  [Refereed]

  • Age Related Differences in Episodic Memory Recollections: Applying Latent Dirichlet Allocation to Free-Writings on Driving Incidents by Older and Young Drivers

    Ritsuko Iwai, Takatsune Kumada, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 38th Annual Conference of Cognitive Science Society (poster)    2016.08  [Refereed]

  • Leveraging VerbNet to Build Corpus-Specific Verb Clusters

    Daniel Peterson, Daisuke Kawahara, Jordan Boyd-Graber, Martha Palmer

    In Proceedings of *SEM 2016: The Fifth Joint Conference on Lexical and Computational Semantics     102 - 107  2016.08  [Refereed]

  • Neural Network-Based Model for Japanese Predicate Argument Structure Analysis

    Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL2016)     1235 - 1244  2016.08  [Refereed]

  • Design of Word Association Games using Dialog Systems for Acquisition of Word Association Knowledge

    Yuichiro Machida, Daisuke Kawahara, Sadao Kurohashi, Manabu Sassano

    In Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016     86 - 91  2016.06  [Refereed]

  • Constructing a Dictionary Describing Feature Changes of Arguments in Event Sentences

    Tetsuaki Nakamura, Daisuke Kawahara

    In Proceedings of the 4th Workshop on EVENTS: Definition, Detection, Coreference, and Representation    2016.06  [Refereed]

  • M2L at SemEval-2016 Task 8 “Meaning Representation Parsing”: AMR Parsing with Neural Networks

    Yevgeniy Puzikov, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016)    2016.06  [Refereed]

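  • Learning Multilingual Vector Representations of Predicate-Argument Structures

    宇野真矢, 柴田知秀, 河原大輔, 黒橋禎夫

    The 226th Meeting of the IPSJ Special Interest Group on Natural Language Processing    2016.05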

  • Automatic Generation of Health Advice for Users' Lifelogs

    粟村誉, 岡照晃, 荒牧英治, 河原大輔, 黒橋禎夫

    The 22nd Annual Meeting of the Association for Natural Language Processing    2016.03

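  • Acquiring Commonsense Knowledge through an Association Game

    大谷直樹, 河原大輔, 黒橋禎夫, 鍜治伸裕, 颯々野学

    The 22nd Annual Meeting of the Association for Natural Language Processing     897 - 897  2016.03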

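  • Acquiring Knowledge about Feature Changes of Event Participants Using Collective Intelligence

    仲村哲明, 河原大輔

    The 22nd Annual Meeting of the Association for Natural Language Processing     901 - 904  2016.03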

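  • おしゃべりけんこうノート: A Health Advice System Based on Advice from Registered Dietitians and Instructors

    岡照晃, 粟村誉, 荒牧英治, 河原大輔, 黒橋禎夫

    The 22nd Annual Meeting of the Association for Natural Language Processing    2016.03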

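  • An Attempt at Automatic Translation of Personality Expressions

    植田晋平, 河原大輔, 黒橋禎夫, 岩井律子, 井関龍太, 熊田孝恒

    The 22nd Annual Meeting of the Association for Natural Language Processing     282 - 285  2016.03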

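  • Japanese Case Frame Construction Robust to the Diversity of Case Patterns

    林部祐太, 河原大輔, 黒橋禎夫

    The 224th Meeting of the IPSJ Special Interest Group on Natural Language Processing    2015.12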

  • Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model

    Hajime Morita, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of EMNLP 2015: Conference on Empirical Methods in Natural Language Processing     2292 - 2297  2015.09  [Refereed]

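  • Toward a Health Advice Generation System That Reads Between the Lines

    粟村誉, 岡照晃, 荒牧英治, 河原大輔, 黒橋禎夫

    The 223rd Meeting of the IPSJ Special Interest Group on Natural Language Processing    2015.09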

  • Chinese Semantic Role Labeling using High-quality Syntactic Knowledge

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    The 8th SIGHAN Workshop on Chinese Language Processing     120 - 127  2015.07  [Refereed]

  • Corpus Patterns for Semantic Processing

    Patrick Hanks, Elisabetta Jezek, Daisuke Kawahara, Octavian Popescu

    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP2015) (Tutorials)     12 - 15  2015.07  [Refereed]

  • Classification and Acquisition of Contradictory Event Pairs using Crowdsourcing

    Yu Takabatake, Hajime Morita, Daisuke Kawahara, Sadao Kurohashi, Ryuichiro Higashinaka, Yoshihiro Matsuo

    In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation     99 - 107  2015.06  [Refereed]

  • Location Name Disambiguation Exploiting Spatial Proximity and Temporal Consistency

    Takashi Awamura, Eiji Aramaki, Daisuke Kawahara, Tomohide Shibata, Sadao Kurohashi

    In Proceedings of the 3rd International Workshop on Natural Language Processing for Social Media     1 - 9  2015.06  [Refereed]

  • Toward an Advice Agent for Diet and Exercise Based on Diary Texts

    Tetsuaki Nakamura, Takashi Awamura, Yiqi Zhang, Eiji Aramaki, Daisuke Kawahara, Sadao Kurohashi

    Ambient Intelligence for Health and Cognitive Enhancement, Papers from the AAAI Spring Symposium, Technical Report SS-15-01     43 - 48  2015.03  [Refereed]

  • Improving the Coverage of Bilingual Knowledge Using Collective Intelligence

    牛久敦, 河原大輔, 黒橋禎夫, 颯々野学

    The 77th National Convention of IPSJ, Kyoto     2:205 - 2:206  2015.03

  • Toward Automatic Generation of Health Advice for Blog Articles

    仲村 哲明, 粟村 誉, Yiqi Zhang, 荒牧 英治, 河原 大輔, 黒橋 禎夫

    Proceedings of the 77th National Convention of IPSJ    2015.03

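  • Analysis and Classification of Contradictions between Events Using Crowdsourcing

    高畠悠, 森田一, 河原大輔, 黒橋禎夫, 東中竜一郎, 松尾義博

    The 21st Annual Meeting of the Association for Natural Language Processing     305 - 308  2015.03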

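  • Enhancing and Evaluating Related-Word Knowledge by Combining Automatic Acquisition and Collective Intelligence

    町田雄一郎, 河原大輔, 黒橋禎夫, 颯々野学

    The 21st Annual Meeting of the Association for Natural Language Processing     1060 - 1063  2015.03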

  • Discourse Relation Analysis Using Probabilistic Annotations Obtained via Crowdsourcing

    澤田 晋之介, 小浜 翔太郎, 河原 大輔, 黒橋 禎夫

    The 77th National Convention of IPSJ    2015.03

  • A Post-Editing Interface That Visualizes Sentence Structure

    岸本裕大, 中澤敏明, 河原大輔, 黒橋禎夫

    The 77th National Convention of IPSJ     2:209 - 2:210  2015.03

  • Post-Editing User Interface Using Visualization of a Sentence Structure

    Yudai Kishimoto, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the AMTA2014 Workshop on Post-Editing Technology and Practice (WPTP3)    2014.10  [Refereed]

  • Acquiring Associative Concepts through Gamification

    町田雄一郎, 河原大輔, 柴田知秀, 黒橋禎夫, 颯々野学

    IPSJ Kansai Branch Convention 2014    2014.09

  • Rapid Development of a Corpus with Discourse Annotations using Two-stage Crowdsourcing

    Daisuke Kawahara, Yuichiro Machida, Tomohide Shibata, Sadao Kurohashi, Hayato Kobayashi, Manabu Sassano

    In Proceedings of the 25th International Conference on Computational Linguistics (COLING2014)     269 - 278  2014.08  [Refereed]

  • Dependency Parse Reranking with Rich Subtree Features

    Mo Shen, Daisuke Kawahara, Sadao Kurohashi

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   22 ( 7 ) 1208 - 1218  2014.07  [Refereed]

    In pursuing machine understanding of human language, highly accurate syntactic analysis is a crucial step. In this work, we focus on dependency grammar, which models syntax by encoding transparent predicate-argument structures. Recent advances in dependency parsing have shown that employing higher-order subtree structures in graph-based parsers can substantially improve the parsing accuracy. However, the inefficiency of this approach increases with the order of the subtrees. This work explores a new reranking approach for dependency parsing that can utilize complex subtree representations by applying efficient subtree selection methods. We demonstrate the effectiveness of the approach in experiments conducted on the Penn Treebank and the Chinese Treebank. Our system achieves the best performance among known supervised systems evaluated on these datasets, improving the baseline accuracy from 91.88% to 93.42% for English, and from 87.39% to 89.25% for Chinese.

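  • Building a Corpus Annotated with Discourse Relations via Two-Stage Crowdsourcing

    河原 大輔, 町田 雄一郎, 柴田 知秀, 黒橋 禎夫, 小林 隼人, 颯々野 学

    The 217th Meeting of the IPSJ Special Interest Group on Natural Language Processing    2014.07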

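  • Location Name Disambiguation in Social Media Considering Spatial Proximity and Temporal Consistency

    粟村 誉, 荒牧 英治, 河原 大輔, 柴田知秀, 黒橋 禎夫

    The 217th Meeting of the IPSJ Special Interest Group on Natural Language Processing    2014.07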

  • A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes

    Daisuke Kawahara, Daniel W. Peterson, Martha Palmer

    In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL2014)     1030 - 1040  2014.06  [Refereed]

  • Chinese Morphological Analysis with Character-level POS Tagging (Short Paper)

    Mo Shen, Hongxiao Liu, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL2014)    2014.06  [Refereed]

  • A Self-correcting Approach to Solve Syntactic Ambiguities based on Collocational Strength

    Hajime Nozawa, Daisuke Kawahara

    In Proceedings of the 6th International Conference on Corpus Linguistics (CILC2014)    2014.05  [Refereed]

  • Inducing Example-based Semantic Frames from a Massive Amount of Verb Uses

    Daisuke Kawahara, Daniel W. Peterson, Octavian Popescu, Martha Palmer

    In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL2014)     58 - 67  2014.04  [Refereed]

  • Japanese Discourse Structure Analysis Based on Automatically Acquired Large-Scale Knowledge

    Qinghan Bu, Daisuke Kawahara, Sadao Kurohashi

    The 20th Annual Meeting of the Association for Natural Language Processing     725 - 728  2014.03

  • Chinese Unknown Word Extraction by Mining Maximized Substrings

    Shen Mo, 黒橋禎夫, 河原大輔

    The 20th Annual Meeting of the Association for Natural Language Processing     384 - 387  2014.03

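  • Zero Anaphora Resolution Considering Author/Reader Expressions and Exophoric Zero Anaphora

    萩行正嗣, 河原大輔, 黒橋禎夫

    The 20th Annual Meeting of the Association for Natural Language Processing     721 - 724  2014.03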

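  • An Attempt to Resolve Syntactic Pattern Ambiguities Using Word Affinities Computed from a Corpus

    野澤 元, 河原 大輔

    The 20th Annual Meeting of the Association for Natural Language Processing     189 - 192  2014.03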

  • A Framework for Compiling High Quality Knowledge Resources From Raw Corpora

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    LREC 2014 - Ninth International Conference on Language Resources and Evaluation     109 - 114  2014  [Refereed]

    The identification of various types of relations is a necessary step to allow computers to understand natural language text. In particular, the clarification of relations between predicates and their arguments is essential because predicate-argument structures convey most of the information in natural languages. To precisely capture these relations, wide-coverage knowledge resources are indispensable. Such knowledge resources can be derived from automatic parses of raw corpora, but unfortunately parsing still has not achieved a high enough performance for precise knowledge acquisition. We present a framework for compiling high quality knowledge resources from raw corpora. Our proposed framework selects high quality dependency relations from automatic parses and makes use of them for not only the calculation of fundamental distributional similarity but also the acquisition of knowledge such as case frames.

  • Single Classifier Approach for Verb Sense Disambiguation based on Generalized Features

    Daisuke Kawahara, Martha Palmer

    LREC 2014 - Ninth International Conference on Language Resources and Evaluation     4210 - 4213  2014  [Refereed]

    We present a supervised method for verb sense disambiguation based on VerbNet. Most previous supervised approaches to verb sense disambiguation create a classifier for each verb that reaches a frequency threshold. These methods, however, have a significant practical problem: they cannot be applied to rare or unseen verbs. To overcome this problem, we create a single classifier that can be applied to rare or unseen verbs in new text. This single classifier also exploits generalized semantic features of a verb and its modifiers to better deal with rare or unseen verbs. Our experimental results show that the proposed method achieves performance equivalent to that of per-verb classifiers, which cannot be applied to unseen verbs. Our classifier could be used to improve the classifications in lexical resources of verbs, such as VerbNet, in a semi-automatic manner, and possibly to extend the coverage of these resources to new verbs.

  • Towards Fully Lexicalized Dependency Parsing for Korean

    Jungyeul Park, Daisuke Kawahara, Sadao Kurohashi, Key-Sun Choi

    In Proceedings of the 13th International Conference on Parsing Technologies (IWPT2013, short paper)     120 - 126  2013.11  [Refereed]

  • High Quality Dependency Selection from Automatic Parses

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP2013)     947 - 951  2013.10  [Refereed]

  • Precise Information Retrieval Exploiting Predicate-Argument Structures

    Daisuke Kawahara, Keiji Shinzato, Tomohide Shibata, Sadao Kurohashi

    In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP2013)     37 - 45  2013.10  [Refereed]

  • Japanese Zero Reference Resolution Considering Exophora and Author/Reader Mentions

    Masatsugu Hangyo, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of EMNLP 2013: Conference on Empirical Methods in Natural Language Processing     924 - 934  2013.10  [Refereed]

  • Chinese Word Segmentation by Mining Maximized Substrings

    Mo Shen, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP2013)     171 - 179  2013.10  [Refereed]

  • Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese

    Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi, Manabu Okumura

    In Proceedings of EMNLP 2013: Conference on Empirical Methods in Natural Language Processing     1213 - 1223  2013.10  [Refereed]

  • Language-independent Approach to High Quality Dependency Selection From Automatic Parses

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    IPSJ 2013    2013.09

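  • The Analysis Flow and Features of KNP, a Syntactic and Predicate-Argument Structure Analysis System

    笹野遼平, 河原大輔, 黒橋禎夫, 奥村学

    The 19th Annual Meeting of the Association for Natural Language Processing     110 - 113  2013.03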

  • Dependency Parse Reranking Based-on Subtree Extraction

    Mo Shen, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of NLP 2013     58 - 61  2013.03

  • Selecting High Quality Dependencies from Automatic Parses

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of NLP 2013    2013.03

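  • A Design for Unified and Consistent Management of Japanese Lexical Knowledge

    黒橋禎夫, 進義治, 柴田知秀, 村脇有吾, 河原大輔

    The 19th Annual Meeting of the Association for Natural Language Processing     26 - 29  2013.03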

  • Toward Non-Computer-Based Computation (Special Issue: Editorial Board Members' Aspirations for 2013)

    河原 大輔

    Journal of the Japanese Society for Artificial Intelligence   28 ( 1 ) 20 - 20  2013.01

  • Knowledge Acquisition from a Large Web Corpus and its Applications

    TSUBAME e-Science Journal (ESJ)   7   12 - 15  2012.12

  • Building a Diverse Document Leads Corpus Annotated with Semantic Relations

    Masatsugu Hangyo, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 26)     535 - 544  2012.11  [Refereed]

  • A Reranking Approach for Dependency Parsing with Variable-sized Subtree Features

    Mo Shen, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 26)     308 - 317  2012.11  [Refereed]

  • Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation

    Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT2012)     35 - 42  2012.05  [Refereed]

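  • Building a Corpus of Diverse Document Leads Annotated with Semantic Relations

    萩行 正嗣, 河原 大輔, 黒橋 禎夫

    The 206th Meeting of the IPSJ Special Interest Group on Natural Language Processing    2012.05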

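  • Automatic Acquisition of Lexical Knowledge Supporting Real-World Text Analysis

    柴田知秀, 村脇有吾, 黒橋禎夫, 河原大輔

    The 18th Annual Meeting of the Association for Natural Language Processing     81 - 84  2012.03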

  • A Framework of Automatic Case Frame Construction From Raw Corpus

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    The 18th Annual Meeting of the Association for Natural Language Processing     389 - 392  2012.03

  • TSUBAKI: An Open Search Engine Infrastructure for Developing Information Access Methodology

    Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

      52 ( 12 ) 12p  2011.12

  • Generative Modeling of Coordination by Factoring Parallelism and Selectional Preferences

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP2011)     456 - 464  2011.11  [Refereed]

  • Automatic Construction of Multilingual Case Frames

    Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

    IPSJ Kansai Branch Convention    2011.09

  • Automatic Estimation of Sentence-Final Modality Using Word Co-occurrence Distributions

    中村紘規, 玉城伸仁, 河原大輔, 黒橋禎夫

    IPSJ Kansai Branch Convention    2011.09

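  • Robust Morphological Analysis Handling the Diverse Variations of Linguistic Expressions on the Web

    勝木 健太, 笹野 遼平, 河原 大輔, 黒橋 禎夫

    The 17th Annual Meeting of the Association for Natural Language Processing     1003 - 1006  2011.03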

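  • Stepwise Selection of Target Pages for Large-Scale Web Information Analysis

    赤峯 享, 加藤 義清, 川田 拓也, レオン 末松豊インティ, 河原 大輔, 乾 健太郎, 黒橋 禎夫

    The 17th Annual Meeting of the Association for Natural Language Processing     41 - 44  2011.03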

  • Treatment of Complex Sentences, Modality and Verbal Structures in Linguistics-Based MT

    Alexis Kauffmann, Daisuke Kawahara, Sadao Kurohashi

    The 17th Annual Meeting of the Association for Natural Language Processing     818 - 821  2011.03

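  • User Evaluation of the Information Analysis System WISDOM and Its Analysis

    川田 拓也, 赤峯 享, 河原 大輔, 加藤 義清, 乾 健太郎, 黒橋 禎夫, 木俵 豊

    The 17th Annual Meeting of the Association for Natural Language Processing     45 - 48  2011.03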

  • Acquiring Language (Special Issue: Editorial Board Members' Aspirations for 2011)

    河原 大輔

    Journal of the Japanese Society for Artificial Intelligence   26 ( 1 ) 16 - 16  2011.01

  • Web information analysis for open-domain decision support: System design and user evaluation

    Takuya Kawada, Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Yutaka I. Leon-Suematsu, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

    ACM International Conference Proceeding Series     13 - 18  2011  [Refereed]

    In this paper, we investigate the effectiveness of the system design of a Web information analysis system for open-domain decision support. Making decisions requires collecting and comparing information from various viewpoints. When making decisions based on Web information, however, it is difficult to obtain diverse information from a variety of sources using current search engines. Based on this observation, we design a system that supports open-domain decision making by analyzing Web information. Among the major design decisions are to focus on two elements, i.e., identifying the source of information and extracting informative content, and to organize the two elements so that the user can quickly grasp who is saying what on the Web. The assumption behind these decisions is that information organized in this way facilitates proper judgments in the user's decision-making process. We conduct a user evaluation to verify the effectiveness of our approach. The results confirm that our system is superior to current search engines for grasping organized information from senders with different stances, and that it supports the decision-making process by (i) uncovering biases, (ii) showing various opinions from multiple viewpoints, and (iii) revealing information sources. © 2011 ACM.

  • Identifying Contradictory and Contrastive Relations between Statements to Outline Web Information on a Given Topic

    Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi

    COLING 2010 Poster Volume     534 - 542  2010.08  [Refereed]

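  • Web情報分析のための大規模Webページの収集・選択・検索 [Large-Scale Collection, Selection, and Search of Web Pages for Web Information Analysis]

    赤峯 享, 加藤 義清, 河原 大輔, レオン 末松豊インティ, 新里 圭司, 乾 健太郎, 黒橋 禎夫, 木俵 豊

    The 16th Annual Meeting of the Association for Natural Language Processing     238 - 241  2010.03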

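  • Web情報の俯瞰的把握のための主要・対比・対立文の抽出と集約 [Extraction and Aggregation of Major, Contrastive, and Contradictory Statements for an Overview of Web Information]

    河原 大輔, 乾 健太郎, 黒橋 禎夫

    The 16th Annual Meeting of the Association for Natural Language Processing     134 - 137  2010.03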

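  • Web ページの情報発信構成の同定 [Identifying the Information Sender Configuration of Web Pages]

    加藤 義清, 河原 大輔, 乾 健太郎, 黒橋 禎夫

    The 16th Annual Meeting of the Association for Natural Language Processing     90 - 93  2010.03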

  • Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation

    Daisuke Kawahara, Sadao Kurohashi

    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION     1389 - 1393  2010  [Refereed]

    We present a method for acquiring reliable predicate-argument structures from raw corpora for the automatic compilation of case frames. Such lexicon compilation requires highly reliable predicate-argument structures to contribute in practice to Natural Language Processing (NLP) applications, such as paraphrasing, textual entailment, and machine translation. We first apply chunking to raw corpora and then extract reliable chunks to ensure that high-quality predicate-argument structures are obtained from the chunks. Our experiments confirmed that we succeeded in acquiring highly reliable predicate-argument structures on a large scale.

  • Organizing information on the web to support user judgments on information credibility

    Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Yutaka I. Leon-Suematsu, Takuya Kawada, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

    2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings     123 - 130  2010  [Refereed]

    A vast amount of information and knowledge has been accumulated and circulated on the Web. This information provides people with options regarding their daily lives and is starting to have a strong influence on governmental policies and business management. A crucial problem is that information on the Web is not necessarily credible. This paper describes an information analysis system called WISDOM, which assists users in assessing the credibility of information on the Web. WISDOM organizes information on a given topic through the following three types of analysis: (1) extracting and contrasting opinions and important statements around the points related to the topic, (2) identifying and classifying the information sender of each page, and (3) analyzing the appearance of each page, for example, page design and writing style. Our preliminary evaluation indicates the effectiveness of WISDOM and its advantage over Google in grasping differences among information senders and opinions. ©2010 IEEE.

  • Capturing Consistency between Intra-clause and Inter-clause Relations in Knowledge-rich Dependency and Case Structure Analysis

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 11th International Conference on Parsing Technology (IWPT'09)     108 - 116  2009.10  [Refereed]

  • WISDOM: A Web Information Credibility Analysis System

    Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

    In Proceedings of the ACL-IJCNLP 2009 Software Demonstrations     1 - 4  2009.08  [Refereed]

  • Webページの大規模収集・検索基盤 [A Large-Scale Collection and Search Infrastructure for Web Pages]

    赤峯 享, 加藤 義清, 河原 大輔, 新里 圭司, 乾 健太郎, 黒橋 禎夫, 木俵 豊

    IPSJ SIG Meeting (情報処理学会研究会) Vol.2009-DBS-148 No.14    2009.07

  • The Effect of Corpus Size on Case Frame Acquisition for Discourse Analysis

    Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT2009)     521 - 529  2009.06  [Refereed]

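  • 節内と節間の整合性をとる構文・格解析 [Syntactic and Case Structure Analysis Capturing Consistency between Intra-clause and Inter-clause Relations]

    河原 大輔, 黒橋 禎夫

    The 15th Annual Meeting of the Association for Natural Language Processing     24 - 27  2009.03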

  • Summarizing evaluative information on the web for information credibility analysis

    Daisuke Kawahara, Tetsuji Nakagawa, Takuya Kawada, Kentaro Inui, Sadao Kurohashi

    ACM International Conference Proceeding Series     187 - 192  2009  [Refereed]

    The World Wide Web comprises a wide variety of evaluative information. It consists of positive and negative opinions on innumerable topics from various perspectives, thus proving to be a useful information source for information credibility analysis. To present an informative and at-a-glance summary of any topic that a user of such an analysis system searches for, it is important to summarize many diverse evaluative expressions on the topic. In this paper, we describe a method for summarizing an extensive variety of evaluative expressions that are automatically extracted. Copyright 2009 ACM.

  • Development of a large-scale web crawler and search engine infrastructure

    Susumu Akamine, Yoshikiyo Kato, Daisuke Kawahara, Keiji Shinzato, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

    ACM International Conference Proceeding Series     126 - 131  2009  [Refereed]

    This paper reports the ongoing development of a large-scale Web crawler and search engine infrastructure at the National Institute of Information and Communications Technology. This infrastructure has the following characteristics: (1) It collects one billion Japanese Web pages while keeping them up to date. (2) It selects 100 million pages from among the collected pages and converts them into a standard data format that stores the results of morphological analysis, dependency parsing, and synonym augmentation. (3) The selected set of pages is searchable and accessible to users. (4) The scalability of the system is achieved by using a large-scale cluster machine for distributed data processing. Copyright 2009 ACM.

  • Identifying Information Sender Configuration of Web Pages

    Yoshikiyo Kato, Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi, Tomohide Shibata

    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1     335 - 340  2009  [Refereed]

    The source of a piece of information is a crucial element to consider when judging the credibility of that information. In this paper, we address the task of identifying the information source, which we cast as the problem of identifying the information sender configuration (ISC) of a Web page. An information sender of a Web page is an entity involved in the publication of the information on the page. The ISC of a Web page describes the information senders of the page and the relationships among them. Information sender extraction is thus a subtask of identifying the ISC; we present a method for extracting information senders from Web pages and offer a preliminary evaluation. The ISC provides a basis for deeper analysis of information on the Web.

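  • JUMAN/KNPを用いた形態素・構文・格解析 [Morphological, Syntactic, and Case Structure Analysis Using JUMAN/KNP]

    河原 大輔, 黒橋 禎夫

    Academic Center for Computing and Media Studies, Kyoto University, Specialized Course in Media Information Processing, "Natural Language Processing Technology"    2008.09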

  • Webページの著者の同定 [Identifying the Authors of Web Pages]

    加藤 義清, 河原 大輔, 乾 健太郎, 黒橋 禎夫, 柴田 知秀

    The 7th Forum on Information Technology (FIT2008)    2008.09

  • Coordination Disambiguation without Any Similarities

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008)     425 - 432  2008.08  [Refereed]

  • A Fully-Lexicalized Probabilistic Model for Japanese Zero Anaphora Resolution

    Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008)     769 - 776  2008.08  [Refereed]

  • Chinese Dependency Parsing with Large Scale Automatically Constructed Case Structures

    Kun Yu, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008)     1049 - 1056  2008.08  [Refereed]

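  • 主要・対立表現の俯瞰的把握 - ウェブの情報信頼性分析に向けて [Grasping Major and Contradictory Expressions at a Glance: Toward Information Credibility Analysis of the Web]

    河原 大輔, 黒橋 禎夫, 乾 健太郎

    The 186th Meeting of the IPSJ SIG on Natural Language Processing    2008.07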

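  • 類似性を用いない並列構造解析 [Coordinate Structure Analysis without Similarity]

    河原 大輔, 黒橋 禎夫

    The 14th Annual Meeting of the Association for Natural Language Processing     91 - 94  2008.03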

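  • コーパスサイズの拡大および用例の汎化による格フレームのカバレッジの改善 [Improving the Coverage of Case Frames by Enlarging the Corpus and Generalizing Examples]

    笹野 遼平, 河原 大輔, 黒橋 禎夫

    The 14th Annual Meeting of the Association for Natural Language Processing     528 - 531  2008.03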

  • Cascaded Classification for High Quality Head-modifier Pair Selection

    Kun Yu, Daisuke Kawahara, Sadao Kurohashi

    The 14th Annual Meeting of the Association for Natural Language Processing     95 - 98  2008.03

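  • 分布類似度を用いた大規模格フレームの自動構築 [Automatic Construction of Large-Scale Case Frames Using Distributional Similarity]

    濱田 慧, 笹野 遼平, 柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 14th Annual Meeting of the Association for Natural Language Processing     532 - 535  2008.03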

  • TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology

    Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi

    In Proceedings of Third International Joint Conference on Natural Language Processing (IJCNLP2008)     189 - 196  2008.01  [Refereed]

  • A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure

    Keiji Shinzato, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi

    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008     2236 - 2241  2008  [Refereed]

    In recent years, language resources acquired from the Web have been released, and these data improve the performance of applications in several NLP tasks. Although language resources organized by Web page would be useful for NLP tasks and applications such as knowledge acquisition, document retrieval, and document summarization, no such resources have been released so far. In this paper, we propose a data format for the results of Web page processing, and a search engine infrastructure that makes it possible to share approximately 100 million Japanese Web pages. With this data, NLP researchers can begin their own processing immediately without having to analyze Web pages themselves.

  • Information Credibility Analysis of Web Contents

    Sadao Kurohashi, Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Kentaro Inui, Yutaka Kidawara

    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION     146 - +  2008  [Refereed]

    As computers and computer networks become more sophisticated, a vast amount of information and knowledge has been accumulated and circulated on the Web. They provide people with options regarding their daily lives and are starting to have a strong influence on governmental policies and business management. However, a crucial problem is that information on the Web is not necessarily credible. It is actually very difficult for human beings to judge information credibility and even more difficult for computers. However, computers can be used to develop a system that collects, organizes and relativises information and helps human beings view information from several viewpoints and judge information credibility. This paper introduces the information credibility criteria project at the National Institute of Information and Communications Technology, which aims to develop such a system, called WISDOM.

  • Grasping major statements and their contradictions toward information credibility analysis of web contents

    Daisuke Kawahara, Sadao Kurohashi, Kentaro Inui

    Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008     393 - 397  2008  [Refereed]

    The World Wide Web contains a wide variety of news reports, arguments, opinions, etc. that vary widely in quality. People judge the credibility of information on the Web when making decisions in daily life. As the quantity of information on the Web increases explosively, a system that supports such judgments is needed. We have been developing an information credibility analysis system, WISDOM, that considers the viewpoints of information contents, information senders, and information appearances. In this paper, from the viewpoint of information contents, we propose a method for providing a bird's-eye view of major statements on a given topic and their contradictions. We evaluate the obtained statements in our experiments and confirm the effectiveness of our approach. Furthermore, we discuss our future objectives. © 2008 IEEE.

  • Learning Head-modifier Pairs to Improve Lexicalized Dependency Parsing on a Chinese Treebank

    Kun Yu, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of TLT 2007    2007.12  [Refereed]

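  • JUMAN/KNPを用いた形態素・構文・格解析 [Morphological, Syntactic, and Case Structure Analysis Using JUMAN/KNP]

    河原 大輔, 黒橋 禎夫

    Academic Center for Computing and Media Studies, Kyoto University, Specialized Course in Media Information Processing, "Natural Language Processing Technology"    2007.09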

  • Probabilistic Coordination Disambiguation in a Fully-Lexicalized Japanese Parser

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL2007)     306 - 314  2007.06  [Refereed]

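  • 代表表記による自然言語リソースの整備 [Organizing Natural Language Resources Using Representative Notation]

    岡部 浩司, 河原 大輔, 黒橋 禎夫

    The 13th Annual Meeting of the Association for Natural Language Processing     606 - 609  2007.03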

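  • 大規模語彙的知識に基づく構文・並列・格構造解析の統合的確率モデル [An Integrated Probabilistic Model of Syntactic, Coordinate, and Case Structure Analysis Based on Large-Scale Lexical Knowledge]

    河原 大輔, 黒橋 禎夫

    The 13th Annual Meeting of the Association for Natural Language Processing     506 - 509  2007.03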

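  • 大規模日本語ウェブ文書を対象とした開放型検索エンジン基盤の構築 [Building an Open Search Engine Infrastructure for Large-Scale Japanese Web Documents]

    新里 圭司, 柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 13th Annual Meeting of the Association for Natural Language Processing     1117 - 1120  2007.03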

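  • 自然言語処理基盤としてのウェブ文書標準フォーマットの提案 [A Proposal of a Standard Web Document Format as an Infrastructure for Natural Language Processing]

    新里 圭司, 橋本 力, 河原 大輔, 黒橋 禎夫

    The 13th Annual Meeting of the Association for Natural Language Processing     602 - 605  2007.03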

  • Improving coreference resolution using bridging reference resolution and automatically acquired synonyms

    Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

    ANAPHORA: ANALYSIS, ALGORITHMS AND APPLICATIONS   4410   125 - +  2007  [Refereed]

    We present a knowledge-rich approach to Japanese coreference resolution. In Japanese, proper noun coreference and common noun coreference occupy a central position in coreference relations. To improve coreference resolution for such a language, wide-coverage knowledge of synonyms is required. We first acquire knowledge of synonyms from a large raw corpus and from dictionary definition sentences, and then resolve coreference relations based on this knowledge. Furthermore, to boost the performance of coreference resolution, we integrate a bridging reference resolution system into the coreference resolver.

  • Example-based Machine Translation based on Deeper NLP

    Toshiaki Nakazawa, Kun Yu, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of International Workshop on Spoken Language Translation (IWSLT'06)     64 - 70  2006.11  [Refereed]

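  • 表層的語彙分布に基づく談話/テクストの主観性・主体性分析に向けて [Toward an Analysis of Subjectivity and Agentivity in Discourse/Text Based on Surface Lexical Distribution]

    岡本 雅史, 河原 大輔, 黒橋 禎夫

    Proceedings of the Japanese Cognitive Linguistics Association (日本認知言語学会論文集), Vol. 6     423 - 432  2006.09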

  • A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics     176 - 183  2006.06  [Refereed]

  • Case Frame Compilation from the Web using High-Performance Computing

    Daisuke Kawahara, Sadao Kurohashi

    the 5th International Conference on Language Resources and Evaluation    2006.05  [Refereed]

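  • 格フレームを用いたかな表記語の曖昧性解消 [Disambiguation of Words Written in Kana Using Case Frames]

    岡部 浩司, 河原 大輔, 黒橋 禎夫

    The 12th Annual Meeting of the Association for Natural Language Processing     1115 - 1118  2006.03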

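  • 自動獲得した知識に基づく統合的な照応解析 [Integrated Anaphora Resolution Based on Automatically Acquired Knowledge]

    笹野 遼平, 河原 大輔, 黒橋 禎夫

    The 12th Annual Meeting of the Association for Natural Language Processing     480 - 483  2006.03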

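  • Webから獲得した大規模格フレームに基づく構文・格解析の統合的確率モデル [An Integrated Probabilistic Model of Syntactic and Case Structure Analysis Based on Large-Scale Case Frames Acquired from the Web]

    河原 大輔, 黒橋 禎夫

    The 12th Annual Meeting of the Association for Natural Language Processing     1111 - 1114  2006.03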

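  • 高性能計算環境を用いたWebからの大規模格フレーム構築 [Large-Scale Case Frame Construction from the Web Using a High-Performance Computing Environment]

    河原 大輔, 黒橋 禎夫

    IPSJ SIG on Natural Language Processing, 171-12    2006.01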

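  • JUMAN/KNPを用いた形態素解析・構文解析 [Morphological Analysis and Parsing Using JUMAN/KNP]

    黒橋 禎夫, 河原 大輔

    Academic Center for Computing and Media Studies, Kyoto University, Specialized Course in Media Information Processing, "Natural Language Processing Technology"    2005.08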

  • Gradual Fertilization of Case Frames

    Daisuke Kawahara, Sadao Kurohashi

    Journal of Natural Language Processing   12 ( 2 ) 109 - 131  2005.03

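  • 日本語辞書整備のための日本語カタカナ複合名詞の自動分割 [Automatic Segmentation of Japanese Katakana Compound Nouns for Japanese Dictionary Development]

    中澤 敏明, 河原 大輔, 黒橋 禎夫

    The 11th Annual Meeting of the Association for Natural Language Processing     588 - 591  2005.03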

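  • 大規模格フレームに基づく構文・格解析の統合的確率モデル [An Integrated Probabilistic Model of Syntactic and Case Structure Analysis Based on Large-Scale Case Frames]

    河原 大輔, 黒橋 禎夫

    The 11th Annual Meeting of the Association for Natural Language Processing     923 - 926  2005.03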

  • Automatic acquisition of basic Katakana lexicon from a given corpus

    T Nakazawa, D Kawahara, S Kurohashi

    NATURAL LANGUAGE PROCESSING - IJCNLP 2005, PROCEEDINGS   3651   682 - 693  2005  [Refereed]

    Katakana, a Japanese phonetic script mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium- or large-sized Japanese corpus from some domain.

  • PP-attachment disambiguation boosted by a gigantic volume of unambiguous examples

    D Kawahara, S Kurohashi

    NATURAL LANGUAGE PROCESSING - IJCNLP 2005, PROCEEDINGS   3651   188 - 198  2005  [Refereed]

    We present a PP-attachment disambiguation method based on a gigantic volume of unambiguous examples extracted from a raw corpus. The unambiguous examples are used to acquire precise lexical preferences for PP-attachment disambiguation. Attachment decisions are made by a machine learning method that optimizes the use of these lexical preferences. Our experiments indicate that the precise lexical preferences work effectively.

  • Improving Japanese Zero Pronoun Resolution by Global Word Sense Disambiguation

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 20th International Conference on Computational Linguistics     343 - 349  2004.08  [Refereed]

  • Automatic Construction of Nominal Case Frames and its Application to Indirect Anaphora Resolution

    Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 20th International Conference on Computational Linguistics     1201 - 1207  2004.08  [Refereed]

  • Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames

    Daisuke Kawahara, Ryohei Sasano, Sadao Kurohashi

    In Proceedings of the 4th International Conference on Language Resources and Evaluation     1833 - 1836  2004.05  [Refereed]

  • Zero Pronoun Resolution based on Automatically Constructed Case Frames and Structural Preference of Antecedents

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 1st International Joint Conference on Natural Language Processing     334 - 341  2004.03  [Refereed]

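  • 言語情報と映像情報の統合による教示発話の構造解析 [Structural Analysis of Instruction Utterances by Integrating Linguistic and Visual Information]

    柴田 知秀, 立木 将人, 河原 大輔, 岡本 雅史, 黒橋 禎夫, 西田 豊明

    The 10th Annual Meeting of the Association for Natural Language Processing     532 - 535  2004.03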

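  • 名詞格フレーム辞書の自動構築とそれを用いた名詞句の関係解析 [Automatic Construction of a Nominal Case Frame Dictionary and Its Use in Relation Analysis of Noun Phrases]

    笹野 遼平, 河原 大輔, 黒橋 禎夫

    The 10th Annual Meeting of the Association for Natural Language Processing     472 - 475  2004.03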

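  • 語の大域的多義性解消に基づく省略解析の精度向上 [Improving the Accuracy of Ellipsis Resolution through Global Word Sense Disambiguation]

    河原 大輔, 黒橋 禎夫

    The 10th Annual Meeting of the Association for Natural Language Processing     769 - 772  2004.03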

  • Converting Text into Agent Animations: Assigning Gestures to Texts

    Yukiko I. Nakano, Masashi Okamoto, Daisuke Kawahara, Qing Li, Toyoaki Nishida

    Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2004), Companion Volume     153 - 156  2004  [Refereed]

  • Structural analysis of instruction utterances using linguistic and visual information

    T Shibata, M Tachiki, D Kawahara, M Okamoto, S Kurohashi, T Nishida

    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS   3213   393 - 400  2004  [Refereed]

    A crucial point in realizing a video retrieval system is how to provide an effective method of accessing video content. This paper focuses on Japanese cooking instruction utterances and describes a method for analyzing their structure, which leads to a summary of the video. We detect a hierarchical structure of video content by using linguistic and visual information. We found that integrating visual information improves the detection of task units compared with using linguistic information alone.

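  • 言葉の背後に潜む『問い』の抽出 (ことば工学研究会(第14回)テーマ:ことばと身体性) [Extracting the "Questions" Hidden behind Words (14th Language Engineering Workshop; theme: Language and Embodiment)]

    松村 真宏, 河原 大輔, 岡本 雅史

    ことば工学研究会 (Language Engineering Workshop)   14   1 - 7  2003.08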

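  • 料理教示発話の構造解析 [Structural Analysis of Cooking Instruction Utterances]

    西田 悠介, 柴田 知秀, 河原 大輔, 岡本 雅史, 黒橋 禎夫, 西田 豊明

    The 9th Annual Meeting of the Association for Natural Language Processing     601 - 604  2003.03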

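  • 主題と文章構造の解析に基づくスライドの自動生成 [Automatic Slide Generation Based on the Analysis of Topics and Discourse Structure]

    柴田 知秀, 河原 大輔, 黒橋 禎夫

    The 9th Annual Meeting of the Association for Natural Language Processing     597 - 600  2003.03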

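  • 自動構築した格フレーム辞書に基づく省略解析の大規模評価 [Large-Scale Evaluation of Ellipsis Resolution Based on an Automatically Constructed Case Frame Dictionary]

    河原 大輔, 黒橋 禎夫

    The 9th Annual Meeting of the Association for Natural Language Processing     589 - 592  2003.03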

  • Embodied conversational agents for presenting intellectual multimedia contents

    YI Nakano, T Murayama, D Kawahara, S Kurohashi, T Nishida

    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS   2774   1030 - 1036  2003  [Refereed]

    This paper presents an embodied conversational agent (ECA) that presents multimedia content. The system takes plain text as input and automatically generates a presentation featuring an animated agent. It selects and generates appropriate gestures and facial expressions for a humanoid agent according to linguistic information in the text. As a component of the ECA system, we also present an agent animation system, RISA, which can draw animations of natural human behaviors in web-based applications.

  • Structural analysis of instruction utterances

    T Shibata, D Kawahara, M Okamoto, S Kurohashi, T Nishida

    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS   2774   1054 - 1061  2003  [Refereed]

    Toward designing a system that teaches various tasks interactively and visually, this paper proposes a method of analyzing instruction utterances. One of the biggest problems in dealing with spoken language is ellipsis/anaphora resolution. We resolve it using a domain-specific case frame dictionary constructed automatically from a large amount of text. Then, we attach an utterance type to distinguish actions from notes, tips, etc. Based on the attached type, we analyze the discourse structure of the utterances and detect units of action.

  • Fertilization of Case Frame Dictionary for Robust Japanese Case Analysis

    Daisuke Kawahara, Sadao Kurohashi

    In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002)    2002.08  [Refereed]

  • Construction of a Japanese Relevance-tagged Corpus

    Daisuke Kawahara, Sadao Kurohashi, Koiti Hasida

    In Proceedings of the Third International Conference on Language Resources and Evaluation    2002.05  [Refereed]

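  • 会話型コンテンツを用いた知識流通支援 [Supporting Knowledge Circulation Using Conversational Content]

    久保田 秀和, 黒橋 禎夫, 西田 豊明, 河原 大輔, 清田 陽司

    Proceedings of the 64th National Convention (第64回全国大会講演論文集)   2002 ( 1 ) 535 - 542  2002.03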

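  • 頑健な格解析を実現する格フレーム辞書の自動構築 [Automatic Construction of a Case Frame Dictionary for Robust Case Analysis]

    河原 大輔, 黒橋 禎夫

    The 8th Annual Meeting of the Association for Natural Language Processing     515 - 518  2002.03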

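  • 格関係の比較を用いた複数テキスト間の重複・差分の検出 [Detecting Overlaps and Differences among Multiple Texts by Comparing Case Relations]

    成松 深, 河原 大輔, 黒橋 禎夫, 西田 豊明

    The 8th Annual Meeting of the Association for Natural Language Processing     535 - 538  2002.03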

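  • 国語辞典とコーパスを用いた用言の言い換え規則の学習 [Learning Paraphrase Rules for Predicates Using a Japanese Dictionary and a Corpus]

    鍜治 伸裕, 河原 大輔, 黒橋 禎夫, 佐藤 理史

    The 8th Annual Meeting of the Association for Natural Language Processing     331 - 334  2002.03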

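  • 「関係」タグ付きコーパスの作成 [Construction of a "Relevance"-Tagged Corpus]

    河原 大輔, 黒橋 禎夫, 橋田 浩一

    The 8th Annual Meeting of the Association for Natural Language Processing     495 - 498  2002.03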

  • Verb paraphrase based on case frame alignment

    N Kaji, D Kawahara, S Kurohashi, S Sato

    40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE     215 - 222  2002  [Refereed]

    This paper describes a method of translating the predicate-argument structure of a verb into that of an equivalent verb, which is a core component of dictionary-based paraphrasing. Our method captures several usages of a headword and of its definition heads (def-heads) in the form of their case frames and aligns those case frames, which amounts to acquiring word sense disambiguation rules and detecting the appropriate equivalent and case marker transformation.

  • Case Frame Construction by Coupling the Predicate and its Adjacent Case Component

    Daisuke Kawahara, Sadao Kurohashi

    IPSJ SIG Notes   2000 ( 107 ) 127 - 134  2000.11

    This paper describes a method to construct a case frame dictionary automatically from a raw corpus. First, we parse a corpus and collect reliable examples from the parsed corpus. Secondly, to deal with semantic ambiguity of a predicate, we distinguish examples by a predicate and its adjacent case component and cluster them. We also report on an experimental result of case structure analysis using the constructed dictionary.
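
The 2000 IPSJ SIG Notes entry above summarizes case frame construction in two steps: collect reliable examples from a parsed corpus, then separate a predicate's usages by coupling it with its adjacent case component and cluster the resulting groups. The following is a minimal Python sketch of those two steps under stated assumptions. Examples are assumed to arrive already parsed as `(predicate, [(case_marker, noun), ...])`, with the last pair being the component adjacent to the predicate. The clustering is assumed to be a greedy merge driven by a hypothetical mean-Jaccard similarity with a 0.5 threshold. That measure is a stand-in and not necessarily the one used in the paper, and `build_case_frames` and `similarity` are illustrative names only.

```python
from collections import Counter, defaultdict


def similarity(a, b):
    """Hypothetical measure (not from the paper): mean Jaccard overlap of
    nouns, averaged over the case markers that both frames share."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    scores = [len(set(a[m]) & set(b[m])) / len(set(a[m]) | set(b[m])) for m in shared]
    return sum(scores) / len(scores)


def build_case_frames(examples, threshold=0.5):
    """examples: [(predicate, [(case_marker, noun), ...]), ...].
    Assumption: the last pair is the case component adjacent to the predicate."""
    # Step 1: couple each predicate with its adjacent component to split usages.
    groups = defaultdict(lambda: defaultdict(Counter))
    for pred, args in examples:
        if not args:
            continue
        marker, noun = args[-1]
        for m, n in args:
            groups[(pred, marker, noun)][m][n] += 1

    # Step 2: greedily merge groups of the same predicate whose slots overlap.
    frames = []  # list of (predicate, {case_marker: Counter of nouns})
    for (pred, _, _), slots in sorted(groups.items()):
        for frame_pred, frame_slots in frames:
            if frame_pred == pred and similarity(frame_slots, slots) >= threshold:
                for m, nouns in slots.items():
                    frame_slots[m].update(nouns)
                break
        else:  # no sufficiently similar frame: start a new one
            frames.append((pred, defaultdict(Counter, {m: Counter(c) for m, c in slots.items()})))
    return frames


examples = [
    ("積む", [("に", "トラック"), ("を", "荷物")]),  # load cargo onto a truck
    ("積む", [("に", "トラック"), ("を", "箱")]),    # load boxes onto a truck
    ("積む", [("を", "経験")]),                      # gain experience
]
frames = build_case_frames(examples)
for pred, slots in frames:
    print(pred, {m: dict(c) for m, c in slots.items()})
```

On these toy inputs, the two examples with トラックに ("load something onto a truck") merge into one frame because they share the に slot, while 経験を積む ("gain experience") stays in a frame of its own. Coupling a predicate with its adjacent component is meant to support exactly this kind of usage split.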

Sub-affiliation

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

Research Institute

  • 2022
    -
    2024

    Global Information and Telecommunication Institute   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

Internal Special Research Projects

  • 計算機の文章読解能力向上に関する研究 [Research on Improving the Text Understanding Abilities of Computers]

    2023  

    To improve the text understanding abilities of computers, we conducted studies on training foundation models using large text corpora and developing and evaluating application systems through fine-tuning these models. For the training of the foundation models, we investigated the impact of filtering methods for large text corpora on downstream tasks, constructed models that learned the syllable count for literature generation, and examined knowledge-integrated models using Mixture of Experts (MoE). For application systems, we developed systems for generating interesting senryu (a type of haiku) and playing word chain games, etc. For evaluation, we automatically constructed a Japanese Winoground dataset for evaluating Japanese multimodal models. Through these research and development efforts, we believe that we have taken a step forward in the study of text understanding by computers.

  • 集合知による注釈付けに基づくデータ駆動型言語理解の変革 [Transforming Data-Driven Language Understanding through Annotation by the Wisdom of Crowds]

    2021  

    We attempted to use the wisdom of crowds to build probabilistically annotated corpora toward a breakthrough in natural language understanding. Using crowdsourcing as the wisdom of crowds, five to ten crowdworkers made annotations for the tasks of syntactic parsing and discourse relation analysis. The collected annotations were then converted to probabilities using the EM algorithm. As a result, we confirmed that the higher-level the task, the more the probability values of the annotation labels varied. We also verified that the resulting probabilistic multi-label annotations were plausible. In the future, we plan to increase the size of the probabilistically annotated corpora and develop analyzers based on them. This will enable us to dramatically improve the accuracy of natural language analysis and understanding, such as syntactic parsing and anaphora resolution.

  • 速報的情報を俯瞰するためのテキストの集約と分析手法に関する研究 [Research on Text Aggregation and Analysis Methods for Overviewing Breaking Information]

    2020  

    As the coronavirus disease (COVID-19) has been rapidly spreading around the world, there is an increasing need for a system for aggregating immediate information that transcends borders and domains. To build such a system, it is necessary to use natural language processing (NLP) technologies flexibly, such as combining machine translation of multilingual texts with information analysis technology, and mapping information transmitted by experts with social media texts. We have studied an application of NLP technologies for COVID-19 by cooperating with researchers in informatics including NLP. Then, we have developed a system for aggregating COVID-19 information from all over the world. In this system, COVID-19 information is grouped by regions and topics, such as infection status, prevention, medical information, economic policies, and education. Collected multilingual articles are translated into Japanese and English by machine translation and are automatically classified into the topics by the contextualized language model BERT. We hope that this system is useful for many people, and this kind of technology will be used for other future events and disasters.
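
The 2021 project summary above says only that the crowdworkers' annotations were converted to probabilities with the EM algorithm; it does not name the model. One common choice for this task is the Dawid-Skene model. It jointly estimates a label distribution for each item and a confusion matrix for each worker. The minimal pure-Python sketch below assumes that model. The function name `dawid_skene`, the smoothing constant, the iteration count, and the toy votes are all hypothetical, and the sketch is not the project's actual implementation.

```python
def dawid_skene(annotations, labels, iters=50, smooth=0.01):
    """Estimate P(true label | crowd votes) per item with Dawid-Skene style EM.

    Assumption: the Dawid-Skene model (the project summary does not specify one).
    annotations: {item: [(worker, label), ...]}, at least one vote per item.
    Returns {item: {label: probability}}.
    """
    # Initialise each item's distribution with plain vote proportions.
    post = {}
    for item, votes in annotations.items():
        counts = {k: 0.0 for k in labels}
        for _, lab in votes:
            counts[lab] += 1
        post[item] = {k: c / len(votes) for k, c in counts.items()}
    workers = {w for votes in annotations.values() for w, _ in votes}

    for _ in range(iters):
        # M-step: class priors and one smoothed confusion matrix per worker,
        # conf[w][true][given] = P(worker w answers `given` | true label).
        prior = {k: sum(p[k] for p in post.values()) / len(post) for k in labels}
        conf = {w: {k: {j: smooth for j in labels} for k in labels} for w in workers}
        for item, votes in annotations.items():
            for w, lab in votes:
                for k in labels:
                    conf[w][k][lab] += post[item][k]
        for w in workers:
            for k in labels:
                total = sum(conf[w][k].values())
                for j in labels:
                    conf[w][k][j] /= total

        # E-step: posterior over the true label given all votes on the item.
        for item, votes in annotations.items():
            scores = {}
            for k in labels:
                s = prior[k]
                for w, lab in votes:
                    s *= conf[w][k][lab]
                scores[k] = s
            z = sum(scores.values())
            post[item] = {k: s / z for k, s in scores.items()}
    return post


votes = {
    "s1": [("a", "X"), ("b", "X"), ("c", "X")],
    "s2": [("a", "Y"), ("b", "Y"), ("c", "X")],
    "s3": [("a", "X"), ("b", "Y"), ("c", "Y")],
}
for item, dist in dawid_skene(votes, ["X", "Y"]).items():
    print(item, {k: round(p, 3) for k, p in dist.items()})
```

In this model, a worker whose votes consistently agree with the consensus ends up with a sharper confusion matrix and therefore more weight. Items on which workers disagree can keep probability mass on several labels, which resembles the probabilistic multi-label annotation the summary describes.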