Researcher Details - Daisuke Kawahara

Daisuke Kawahara (河原　大輔)

Scopus Publication Information

Papers: 50 Citations: 511 h-index: 11

The data was downloaded from the Scopus API on July 15, 2026, via http://api.elsevier.com and http://www.scopus.com.

Google Scholar Information (Citations per year)

Citations: 4679 h-index: 33 i10-index: 101

Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job Title

Professor

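Career

April 2021 - Present: Visiting Professor, National Institute of Informatics
April 2020 - Present: Professor, Faculty of Science and Engineering, Waseda University
July 2017 - March 2025: Visiting Researcher, RIKEN Center for Advanced Intelligence Project
October 2010 - March 2020: Associate Professor, Graduate School of Informatics, Kyoto University
October 2008 - September 2010: Senior Researcher, National Institute of Information and Communications Technology
April 2006 - September 2008: Researcher, National Institute of Information and Communications Technology
April 2002 - March 2006: Academic Research Support Staff, Graduate School of Information Science and Technology, The University of Tokyo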

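Education

2005: Doctor of Informatics, Kyoto University
1999 - 2002: Doctoral Program, Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
1997 - 1999: Master's Program, Department of Electronics and Communication, Graduate School of Engineering, Kyoto University
1993 - 1997: Department of Electrical Engineering II, Faculty of Engineering, Kyoto University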

Research Areas

Intelligent informatics

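Awards

Committee Special Award, 32nd Annual Meeting of the Association for Natural Language Processing

March 2026, Association for Natural Language Processing, "WAON: A Large-Scale, High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models"

Recipients: 杉浦一瑳, 栗田修平, 小田悠介, 河原大輔, 岡部寿男, 岡崎直観

Committee Special Award, 32nd Annual Meeting of the Association for Natural Language Processing

March 2026, Association for Natural Language Processing, "Verification and Performance Analysis of Mathematical Reasoning Processes Using Ensemble Distillation and Learning-Based Aggregation"

Recipients: 榎本倫太郎, 栗田修平, 河原大輔

Excellence Award, 2025 Annual Conference of the Japanese Society for Artificial Intelligence (39th)

November 2025, Japanese Society for Artificial Intelligence, "Training a Retriever for Buzzer Quizzes Using Question Texts with Sentence-Length Limits"

Recipients: 佐々木斗海, 河原大輔

Outstanding Research Award, IPSJ Special Interest Group on Natural Language Processing

September 2025, IPSJ Special Interest Group on Natural Language Processing, "JMTEB and JMTEB-lite: Japanese Massive Text Embedding Benchmark and Its Lightweight Version"

Recipients: 李聖哲, 大萩雅也, 李凌寒, 福地成彦, 柴田知秀, 河原大輔

Outstanding Research Award, IPSJ Special Interest Group on Natural Language Processing

March 2025, IPSJ Special Interest Group on Natural Language Processing, "Construction of a RAG Benchmark Using Synthetic Documents about Nonexistent Entities and Events"

Recipients: 李聖哲, 大萩雅也, 塚越駿, 福地成彦, 柴田知秀, 河原大輔

Sponsor Award (Hakuhodo Technologies Award), 31st Annual Meeting of the Association for Natural Language Processing

March 2025, Association for Natural Language Processing, "Do Large Language Models Simulate the Minds of Others?"

Recipients: 青木洸士郎, 河原大輔

Committee Special Award, 31st Annual Meeting of the Association for Natural Language Processing

March 2025, Association for Natural Language Processing, "LLM-jp-3 VILA: Construction of Japanese Multimodal Datasets and a Strong Japanese Multimodal Model"

Recipients: 笹川慶人, 前田航希, 杉浦一瑳, 栗田修平, 岡崎直観, 河原大輔
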
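Best Paper Award 2023, Association for Natural Language Processing

March 2024, Association for Natural Language Processing, "JGLUE: Japanese Language Understanding Benchmark"

Recipients: 栗原健太郎, 河原大輔, 柴田知秀

Paper Award 2023, Association for Natural Language Processing

March 2024, Association for Natural Language Processing, "Construction and Use of a Commonsense Inference Dataset Based on Basic Events"

Recipients: 大村和正, 河原大輔, 黒橋禎夫

Sponsor Award (Mercari Award), 30th Annual Meeting of the Association for Natural Language Processing

March 2024, Association for Natural Language Processing, "Examining the Relationship between Prompt Politeness and Large Language Model Performance"

Recipients: 尹子旗, 王昊, 堀尾海斗, 河原大輔, 関根聡

Sponsor Award (Hitachi Award), 30th Annual Meeting of the Association for Natural Language Processing

March 2024, Association for Natural Language Processing, "Construction of Japanese TruthfulQA"

Recipients: 中村友亮, 河原大輔

Committee Special Award, 30th Annual Meeting of the Association for Natural Language Processing

March 2024, Association for Natural Language Processing, "Construction of Japanese TruthfulQA"

Recipients: 中村友亮, 河原大輔

Committee Special Award, 29th Annual Meeting of the Association for Natural Language Processing

March 2023, Association for Natural Language Processing, "JCommonsenseQA 2.0: Improving a Commonsense Reasoning Dataset through Human-Machine Collaboration"

Recipients: 栗原健太郎, 河原大輔, 柴田知秀

Excellence Award, 29th Annual Meeting of the Association for Natural Language Processing

March 2023, Association for Natural Language Processing, "Construction of a Japanese WiC Dataset and Its Application to Detecting Reading Difficulty"

Recipients: 吉田あいり, 河原大輔

Outstanding Research Award, Special Interest Group on Natural Language Processing

September 2022, IPSJ Special Interest Group on Natural Language Processing, "KWJA: A Japanese Analyzer Based on General-Purpose Language Models"

Recipients: 植田暢大, 大村和正, 児玉貴志, 清丸寛一, 村脇有吾, 河原大輔, 黒橋禎夫

Language Resource Award, 28th Annual Meeting of the Association for Natural Language Processing

March 2022, Association for Natural Language Processing, "JGLUE: Japanese Language Understanding Benchmark"

Recipients: 栗原健太郎, 河原大輔, 柴田知秀
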
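Language Resource Award, 27th Annual Meeting of the Association for Natural Language Processing

March 2021, Association for Natural Language Processing, "Improving a Typo Dataset and Correction System Based on the Revision History of Japanese Wikipedia"

Recipients: 田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

Paper Award 2019

March 2020, Association for Natural Language Processing, "Integrated Syntactic Analysis of Chinese Using Neural Networks"

Recipients: 栗田修平, 河原大輔, 黒橋禎夫

Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prize for Science and Technology, Research Category)

April 2017, Ministry of Education, Culture, Sports, Science and Technology, "Research on Building Integrated Language Resources for Japanese Text Analysis"

Recipients: 黒橋禎夫, 河原大輔

20th Anniversary Paper Award

October 2014, Association for Natural Language Processing, "Gradual Automatic Construction of a Case Frame Dictionary"

Recipients: 河原大輔, 黒橋禎夫

56th Maejima Award

March 2011, "Development of the Information Analysis Engine WISDOM"

Recipients: 木俵豊, 黒橋禎夫, 赤峯享, 河原大輔, 加藤義清

Excellent Presentation Award, 14th Annual Meeting

March 2009, Association for Natural Language Processing, "Coordinate Structure Analysis without Using Similarity"

Recipients: 河原大輔, 黒橋禎夫

Best Presentation Award, 13th Annual Meeting

March 2008, Association for Natural Language Processing, "Construction of an Open Search Engine Infrastructure for Large-Scale Japanese Web Documents"

Recipients: 新里圭司, 柴田知秀, 河原大輔, 黒橋禎夫

Paper Award 2007

March 2008, Association for Natural Language Processing, "An Integrated Probabilistic Model of Syntactic and Case Analysis Based on Automatically Constructed Large-Scale Case Frames"

Recipients: 河原大輔, 黒橋禎夫
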
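FY2006 Yamashita SIG Research Award

March 2007, Information Processing Society of Japan, "Large-Scale Case Frame Construction from the Web Using a High-Performance Computing Environment"

Recipients: 河原大輔, 黒橋禎夫

Best Presentation Award, 12th Annual Meeting

March 2007, Association for Natural Language Processing, "An Integrated Probabilistic Model of Syntactic and Case Analysis Based on Large-Scale Case Frames Acquired from the Web"

Recipients: 河原大輔, 黒橋禎夫

Paper Award 2005

March 2006, Association for Natural Language Processing, "Gradual Automatic Construction of a Case Frame Dictionary"

Recipients: 河原大輔, 黒橋禎夫

Excellent Presentation Award, 8th Annual Meeting

June 2002, Association for Natural Language Processing, "Automatic Construction of a Case Frame Dictionary for Robust Case Analysis"

Recipients: 河原大輔, 黒橋禎夫

Excellent Presentation Award, 6th Annual Meeting

June 2000, Association for Natural Language Processing, "Case Frame Dictionary Construction from a Large Corpus and Case Analysis Using It"

Recipients: 河原大輔, 鍜治伸裕, 黒橋禎夫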

Papers

Advancing psychological assessment: quantifying self-compassion through free-text responses and language model BERT

Hirohito Okano, Daisuke Kawahara, Michio Nomura

Scientific Reports 15 ( 1 ) July 2025 [Refereed]

Citations (Scopus): 2

Construction of Japanese Natural Questions and BoolQ

植松拓也, 王昊, 福田創, 河原大輔, 柴田知秀

Journal of Natural Language Processing 32 ( 2 ) 497 - 519 June 2025 [Refereed]

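Adding Kaeriten Marks to Classical Chinese Texts and Generating Their Kakikudashi Readings Using Language Models

王昊, 清水博文, 河原大輔

Journal of Natural Language Processing 31 ( 1 ) 134 - 154 March 2024 [Refereed]
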
JGLUE: Japanese Language Understanding Benchmark

栗原健太郎, 河原大輔, 柴田知秀

Journal of Natural Language Processing 30 ( 1 ) 63 - 87 March 2023 [Refereed]

A Novel Global Prototype-Based Node Embedding Technique

Zyad Alkayem, Rami Zewail, Amin Shoukry, Daisuke Kawahara, Samir A. Elsagheer Mohamed

IEEE Access 10 125311 - 125318 November 2022 [Refereed]

SCTB-V2: the 2nd version of the Chinese treebank in the scientific domain

Chenhui Chu, Zhuoyuan Mao, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

Language Resources and Evaluation October 2022 [Refereed]

Automatic Japanese Example Extraction for Flashcard-based Foreign Language Learning

Arseny Tolmachev, Sadao Kurohashi, Daisuke Kawahara

Journal of Information Processing 30 315 - 330 April 2022 [Refereed]

Citations (Scopus): 4

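Construction of a Typo Dataset and Correction System Based on the Revision History of Japanese Wikipedia

田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 28 ( 4 ) 995 - 1033 December 2021 [Refereed]
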
RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems

Qianying Liu, Wenyu Guan, Sujian Li, Fei Cheng, Daisuke Kawahara, Sadao Kurohashi

IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 1 - 11 November 2021 [Refereed]

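Japanese Discourse Relation Analysis: Task Design, Automatic Recognition of Discourse Markers, and Corpus Annotation

岸本裕大, 村脇有吾, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 27 ( 4 ) 889 - 931 December 2020 [Refereed]
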
Design and Structure of The Juman++ Morphological Analyzer Toolkit

Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi

Journal of Natural Language Processing 27 ( 1 ) 89 - 132 March 2020 [Refereed]

Annotating a Driving Experience Corpus with Behavior and Subjectivity

Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

Journal of Natural Language Processing 26 ( 2 ) 329 - 359 June 2019 [Refereed]

Comprehensive Temporal Information Annotation of Events and Its Analysis

坂口智洋, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 26 ( 1 ) March 2019 [Refereed]

Integrated Syntactic Analysis of Chinese Using Neural Networks

栗田修平, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 26 ( 1 ) March 2019 [Refereed]

Improving Chinese Semantic Role Labeling using High-quality Surface and Deep Case Frames

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

Journal of Natural Language Processing 25 ( 2 ) 201 - 221 March 2018 [Refereed]

Learning to Answer Questions by Understanding Using Entity-Based Memory Network

Xun Wang, Katsuhito Sudoh, Masaaki Nagata, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

Computacion y Sistemas 21 ( 4 ) 799 - 808 2017 [Refereed]

Chinese Word Segmentation and Unknown Word Extraction by Mining Maximized Substring

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

Journal of Natural Language Processing 23 ( 3 ) 266 - 266 June 2016 [Refereed]

Abstract: Chinese word segmentation is an initial and important step in Chinese language processing. Recent advances in machine learning techniques have boosted the performance of Chinese word segmentation systems, yet the identification of out-of-vocabulary words is still a major problem in this field of study. Recent research has attempted to address this problem by exploiting characteristics of frequent substrings in unlabeled data. We propose a simple yet effective approach for extracting a specific type of frequent substrings, called maximized substrings, which provide good estimations of unknown word boundaries. In the task of Chinese word segmentation, we use these substrings which are extracted from large scale unlabeled data to improve the segmentation accuracy. The effectiveness of this approach is demonstrated through experiments using various data sets from different domains. In the task of unknown word extraction, we apply post-processing techniques that effectively reduce the noise in the extracted substrings. We demonstrate the effectiveness and efficiency of our approach by comparing the results with a widely applied Chinese word recognition method in a previous study.

Design of an Association Game on a Dialogue System for Acquiring Related-Word Knowledge

町田雄一郎, 河原大輔, 黒橋禎夫, 颯々野学

IPSJ Journal 57 ( 3 ) 1058 - 1068 March 2016 [Refereed]

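Automatic Acquisition of Lexical Knowledge on Case Alternation between Passive/Causative and Active Forms

笹野遼平, 河原大輔, 黒橋禎夫, 奥村学

Journal of Natural Language Processing 21 ( 6 ) 1207 - 1233 December 2014 [Refereed]
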
Language-independent Approach to High Quality Dependency Selection From Automatic Parses

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

Journal of Natural Language Processing 21 ( 6 ) 1163 - 1182 December 2014 [Refereed]

Dependency parse reranking with rich subtree features

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

IEEE Transactions on Audio, Speech and Language Processing 22 ( 7 ) 1208 - 1218 July 2014 [Refereed]

Japanese Zero Anaphora Resolution Considering Exophora and Author/Reader Mentions

萩行正嗣, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 21 ( 3 ) 563 - 600 June 2014 [Refereed]

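Construction and Analysis of a Corpus Annotated with Semantic Relations for the Opening Sentences of Diverse Documents

萩行正嗣, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 21 ( 2 ) 213 - 247 April 2014 [Refereed]
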
Chinese-Japanese machine translation exploiting Chinese characters

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

ACM Transactions on Asian Language Information Processing 12 ( 4 ) 1 - 25 October 2013 [Refereed]

TSUBAKI: An open search engine infrastructure for developing information access methodology

Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

Journal of Information Processing 20 ( 1 ) 216 - 227 2012 [Refereed]

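Construction of a Blog Corpus Annotated with Syntactic, Anaphoric, and Sentiment Information

橋本力, 黒橋禎夫, 河原大輔, 新里圭司, 永田昌明

Journal of Natural Language Processing (Technical Report) 18 ( 2 ) 175 - 201 June 2011 [Refereed]
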
The Effect of Corpus Size on Case Frame Acquisition for Predicate-Argument Structure Analysis

Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E93D ( 6 ) 1361 - 1368 June 2010 [Refereed]

Identifying the Information Senders of Web Pages

加藤義清, 河原大輔, 乾健太郎, 黒橋禎夫, 柴田知秀

Transactions of the Japanese Society for Artificial Intelligence 25 ( 1 ) 90 - 103 2010 [Refereed]

Compilation of an idiom example database for supervised idiom identification

Chikara Hashimoto, Daisuke Kawahara

LANGUAGE RESOURCES AND EVALUATION 43 ( 4 ) 355 - 384 December 2009 [Refereed]

Natural Language Processing Using Case Frames (Part 2): Syntactic and Case Analysis Based on Case Frames and Its Applications

黒橋禎夫, 河原大輔

月刊「言語」 36 ( 12 ) 76 - 83 December 2007 [Refereed]

Natural Language Processing Using Case Frames (Part 1): Organizing Basic Vocabulary and Automatically Acquiring Case Frames

黒橋禎夫, 河原大輔

月刊「言語」 36 ( 11 ) 94 - 100 November 2007 [Refereed]

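An Integrated Probabilistic Model of Syntactic and Case Analysis Based on Automatically Constructed Large-Scale Case Frames

河原大輔, 黒橋禎夫

Journal of Natural Language Processing 14 ( 4 ) 67 - 81 July 2007 [Refereed]
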
Extracting the "Questions" Hidden behind Messages

松村真宏, 河原大輔, 岡本雅史, 黒橋禎夫, 西田豊明

Transactions of the Japanese Society for Artificial Intelligence 22 ( 1 ) 93 - 102 2007 [Refereed]

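Automatic Construction of a Nominal Case Frame Dictionary and Its Use in Relation Analysis of Noun Phrases

笹野遼平, 河原大輔, 黒橋禎夫

Journal of Natural Language Processing 12 ( 3 ) 129 - 144 July 2005 [Refereed]
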
Paraphrasing Predicates Based on Case Frame Alignment

鍜治伸裕, 河原大輔, 黒橋禎夫, 佐藤理史

Journal of Natural Language Processing 10 ( 4 ) 65 - 81 July 2003 [Refereed]

Automatic Construction of Case Frames Using Pairs of a Predicate and Its Closest Case Component as Units

河原大輔, 黒橋禎夫

Journal of Natural Language Processing 9 ( 1 ) 3 - 19 January 2002 [Refereed]

Language model BERT can estimate trait self-compassion from people's free texts with high accuracy

Hirohito Okano, Daisuke Kawahara, Michio Nomura

March 2024

LLM-jp: A Cross-Organizational Project for Research and Development of Large Language Models Strong in Japanese

河原大輔, 空閑洋平, 黒橋禎夫, 鈴木潤, 宮尾祐介

Journal of Natural Language Processing 31 ( 1 ) 266 - 279 2024

Scientific keyphrase extraction: Extracting candidates with semi-supervised data augmentation

Qianying Liu, Daisuke Kawahara, Sujian Li

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11221 183 - 194 2018

Citations (Scopus): 3

Extracting the author of web pages

Yoshikiyo Kato, Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi, Tomohide Shibata

International Conference on Information and Knowledge Management, Proceedings 35 - 41 2008 [Refereed]

Abstract: In this paper, we define the problem of identifying the author of a Web page as a sub-problem of identifying the information sender configuration of a Web page. We propose a method that extracts the author name candidates from a Web page based on linguistic features, and ranks the candidates based on local features such as distance from the main content. The evaluation shows that we can achieve more than 75% precision when evaluated with candidates ranked within top five. Copyright 2008 ACM.

Automatic Text Presentation for the Conversational Knowledge Process

Sadao Kurohashi, Daisuke Kawahara, Nobuhiro Kaji, Tomohide Shibata

Conversational Informatics: An Engineering Approach 201 - 216 October 2007 [Refereed]

Cards-to-presentation on the web: generating multimedia contents featuring agent animations

YI Nakano, T Murayama, M Okamoto, D Kawahara, Q Li, S Kurohashi, T Nishida

JOURNAL OF NETWORK AND COMPUTER APPLICATIONS 29 ( 2-3 ) 83 - 104 April 2006 [Refereed]

Abstract: With the goal of supporting the knowledge circulation and creation process in a society, we have studied story-based communication in a network community. On the basis of this research motivation, this paper proposes a web-based multimedia environment called Stream-oriented Public Opinion Channel (SPOC), which enables novice users to embody a story as multimedia content and distribute it on the Internet. The system produces digital camera work for graphics and video clips and automatically generates agent animations according to linguistic information in a text. The findings of our evaluation experiments show that SPOC is easy for novice users to learn and use, suggesting that this system can reduce a user's cost in creating multimedia content and encourage communication in a network community. (c) 2005 Elsevier Ltd. All rights reserved.

Media Conversion of Linguistic Information for the Conversational Knowledge Process

黒橋禎夫, 大泉敏貴, 柴田知秀, 鍜治伸裕, 河原大輔, 岡本雅史, 西田豊明

社会技術研究論文集 2 173 - 180 October 2004 [Refereed]

Text understanding for conversational agent

D Kawahara, R Sasano, S Kurohashi

INTELLIGENT MEDIA TECHNOLOGY FOR COMMUNICATIVE INTELLIGENCE 3490 12 - 20 2004 [Refereed]

Abstract: This paper describes a text understanding system for conversational agents. The system resolves zero, direct and indirect anaphors in Japanese texts by integrating two sorts of linguistic resources: a hand-annotated corpus with various relations and automatically constructed case frames. The corpus has relevance tags which consist of predicate-argument relations, relations between nouns and coreferences, and is utilised for learning parameters of the system and testing it. The case frames are indispensable knowledge both for detecting zero/indirect anaphors and for estimating appropriate antecedents. Our preliminary experiments showed promising results.

Joint Research and Competitive Funding Projects

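Evolving Large Language Models into Large Knowledge Models through Fusion with Human Knowledge

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

Research period: April 2024 - March 2028

河原大輔, 笹野遼平, 鈴木潤
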
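A Study of the Lexicon Based on Encyclopedic Semantics: An Empirical Study Using Large-Scale Corpora

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

Research period: April 2023 - March 2027

松本曜, 小原京子, 中嶌浩貴, 籾山洋介, 河原大輔, 加藤祥, 陳奕廷
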
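Cognitive Science of "Iki"

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

Research period: June 2023 - March 2026

野村理朗, 河原大輔

Abstract: As a challenging inquiry into "iki", this research first focuses on its linguistic aspects and aims to establish an understanding of its psychological and biological basis. It further examines the core elements of "iki" in verbal art, along with the possibility of generating them, from the perspectives of both natural language processing and embodiment. Specifically, we collect and evaluate senryu stimuli that express "iki", then build and analyze deep learning models that estimate and generate the funniness of the rated senryu, thereby clarifying the linguistic elements of "iki".
To this end, in fiscal year 2023, encompassing the axes of cognition and embodiment together with the perspective of Eastern thought, we conducted experiments, surveys, and natural language processing studies on the funniness of senryu containing elements of "iki" and on the components related to embodiment.
The main results are as follows. 1) We developed a Japanese version of the Interoceptive Accuracy Scale (IAS), a measure of beliefs about how accurately one can notice interoceptive sensations. For this, 281 participants (mean age 40.74 years, SD: 8.75, range: 20-65; 143 women, 137 men, 1 other) responded online, and analysis of the responses confirmed internal consistency, test-retest reliability, and construct validity. 2) Using natural language processing, we developed a model that generates funny senryu. Evaluation of the generated senryu showed that the model outperformed the baseline method on all three criteria (mora count, appropriateness as senryu, and funniness as senryu), confirming the effectiveness of the method. 3) In addition, a discussion of a "mushin" (no-mind) model, which is complementary to "iki" and was constructed in light of Eastern thought (e.g., Toshihiko Izutsu's "synchronic structuralization"), is in press as a book.
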
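Psychological, Biological, and Informatics Research on the Mechanisms Giving Rise to Self-Transcendent Emotions

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

Research period: April 2022 - March 2025

野村理朗, 河原大輔, 高橋英之, 西平直
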
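Building a General-Purpose Language Understanding Infrastructure by Fusing Computational and Human Knowledge

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

Research period: April 2021 - March 2024

河原大輔, 鈴木潤, 笹野遼平
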
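Acquisition of Knowledge Frames Integrating Denotative and Connotative Meaning and Their Application to Language Understanding

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research

Research period: April 2018 - March 2021

河原大輔, 笹野遼平

Abstract: Toward realizing natural language understanding by computers, we acquired knowledge about denotation (literal meaning) and connotation (implied meaning) for case frames, which are based on predicate-argument structures, and for "events". For knowledge about denotation, we used deep learning techniques to align case frames with semantic frames (FrameNet) and to automatically estimate semantic frames. For knowledge about connotation, we incrementally acquired emotion knowledge about events. We also devised methods for using the acquired knowledge in deep learning models.
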
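A Big Text Analysis Infrastructure Integrating Computer and Human Knowledge

Japan Science and Technology Agency, PRESTO

Research period: October 2014 - March 2018
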
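Symbol Grounding Based on Automatic Metadata Annotation of Web Content

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Challenging Exploratory Research

Research period: April 2013 - March 2016

河原大輔

Abstract: In this project, we researched and developed a system that recognizes metadata in the text of Web content and identifies its real-world referents. For metadata, we mainly targeted the authors of Web content, and for referent identification we targeted place-name expressions. By using real-world information in addition to deep natural language analysis techniques, the system achieved higher-accuracy analysis than existing systems.
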
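Development of an Interactive Foreign Language Learning Method Based on Linguistic Knowledge Acquired from the Web

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C)

Research period: April 2012 - March 2016

野澤元, 河原大輔, 李在鎬, 渋谷良方

Abstract: Using a large-scale English newspaper corpus, we investigated which nouns are frequently used as subjects and objects of various verbs, and regarded these co-occurring verb-noun patterns observed in the corpus as grammatical knowledge that learners of English need to acquire in order to master verb usage. By manually editing example sentences of such usages obtained from the corpus, we created grammar exercises for learners that are more natural and consistent with encyclopedic knowledge. Furthermore, we developed a game-style e-learning system in which learners can practice these grammar exercises interactively.
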
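Clarifying Models of Context Understanding Based on Higher-Order Annotation of Diverse Texts

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

Research period: April 2012 - March 2015

黒橋禎夫, 河原大輔, 柴田知秀

Abstract: Research on semantic analysis of natural language requires corpora annotated with semantic relations, but existing corpora with semantic relation tags have been developed mainly from newspaper articles. However, documents come in diverse genres and styles, and include linguistic phenomena that do not appear in newspaper articles. In this research, we defined new annotation criteria for phenomena not covered by conventional criteria and, using the Web, built a corpus annotated with semantic relations consisting of the opening sentences of diverse documents. In addition, we devised a method for building a corpus annotated with discourse relations through two-stage crowdsourcing.
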
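Co-Deepening of Linguistic Knowledge Acquisition and Language Analysis Based on Large-Scale Observation of Language Use

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Young Scientists (A)

Research period: April 2011 - March 2014

河原大輔

Abstract: In this research, we first automatically acquired linguistic knowledge from a large-scale collection of Web text. The main linguistic knowledge consists of case frames, which describe predicates and the arguments (nouns) they take. Next, we improved the accuracy of a language analysis system by incorporating the acquired linguistic knowledge into language analysis. We also applied the language analyzer, based on this high-coverage linguistic knowledge, to information retrieval and developed an information retrieval system with higher accuracy than before.
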
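Building Fundamental Technologies for Information Retrieval through Structural Language Processing

Japan Society for the Promotion of Science, Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research on Priority Areas

Research period: 2007 - 2010

黒橋禎夫, 河原大輔, 柴田知秀, 新里圭司, 笹野遼平

Abstract: The true purpose of information retrieval is to obtain not the surface text but the information and knowledge within it, which essentially requires computers to understand text and language. Through structural language processing, we built fundamental technologies for next-generation information retrieval that use predicate-argument structures rather than words as the unit, absorb the diversity of linguistic expressions, and provide a bird's-eye view based on clustering of search results.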

Misc

Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMs

Yusuke Nakamura, Hirokazu Kiyomaru, Chaoran Liu, Shuhei Kurita, Daisuke Kawahara

Findings of the Association for Computational Linguistics: EACL 2026 4107 - 4113 March 2026 [Refereed]

Scaling Data-Constrained Language Models with Synthetic Data

Hirokazu Kiyomaru, Yusuke Oda, Takashi Kodama, Chaoran Liu, Daisuke Kawahara

Findings of the Association for Computational Linguistics: EACL 2026 1002 - 1016 March 2026 [Refereed]

Evaluating the Impact of SAE-based Language Steering on LLM Performance

Sebastian Zwirner, Wentao Hu, Koshiro Aoki, Daisuke Kawahara

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop) 555 - 568 March 2026 [Refereed]

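Discriminatory Tendencies of Language Models toward Non-Native Speakers' Japanese

折田奈甫, 小川隼斗, 河原大輔

50th Conference of the Japanese Association of Sociolinguistic Sciences March 2026 [Refereed]
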
VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction

Hao Wang, Eiki Murata, Lingfang Zhang, Ayako Sato, So Fukuda, Ziqi Yin, Wentao Hu, Keisuke Nakao, Yusuke Nakamura, Sebastian Zwirner, Yi-Chia Chen, Hiroyuki Otomo, Hiroki Ouchi, Daisuke Kawahara

Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26) January 2026 [Refereed]

Testing Simulation Theory in LLMs’ Theory of Mind

Koshiro Aoki, Daisuke Kawahara

Proceedings of the 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Student Research Workshop 96 - 104 December 2025 [Refereed]

Leveraging High-Resource English Corpora for Cross-lingual Domain Adaptation in Low-Resource Japanese Medicine via Continued Pre-training

Kazuma Kobayashi, Zhen Wan, Fei Cheng, Yuma Tsuta, Xin Zhao, Junfeng Jiang, Jiahao Huang, Zhiyi Huang, Yusuke Oda, Rio Yokota, Yuki Arase, Daisuke Kawahara, Akiko Aizawa, Sadao Kurohashi

Findings of the Association for Computational Linguistics: EMNLP 2025 11469 - 11488 November 2025 [Refereed]

Harmony: A Human-Aware, Responsive, Modular Assistant with a Locally Deployed Large Language Model

Ziqi Yin, Mingxin Zhang, Daisuke Kawahara

Companion of the 2025 ACM International Joint Conference on Pervasive and Ubiquitous Computing 126 - 130 October 2025 [Refereed]

Building Japanese Creativity Benchmarks and Applying them to Enhance LLM Creativity

So Fukuda, Hayato Ogawa, Kaito Horio, Daisuke Kawahara, Tomohide Shibata

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop) 939 - 957 July 2025 [Refereed]

Detecting Honkadori based on Waka Embeddings

Hayato Ogawa, Kaito Horio, Daisuke Kawahara

Proceedings of the Second Workshop on Ancient Language Processing 112 - 119 May 2025 [Refereed]

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations) 470 - 484 May 2025 [Refereed]

Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance

Ziqi Yin, Hao Wang, Kaito Horio, Daisuke Kawahara, Satoshi Sekine

Proceedings of the Second Workshop on Social Influence in Conversations (SICon 2024) 9 - 35 November 2024 [Refereed]

A Comprehensive Analysis of Memorization in Large Language Models

Hirokazu Kiyomaru, Issa Sugiura, Daisuke Kawahara, Sadao Kurohashi

Proceedings of the 17th International Natural Language Generation Conference 584 - 596 September 2024 [Refereed]

SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition

Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara

Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR) 129 - 137 August 2024 [Refereed]

Investigating Web Corpus Filtering Methods for Language Model Development in Japanese

Rintaro Enomoto, Arseny Tolmachev, Takuro Niitsuma, Shuhei Kurita, Daisuke Kawahara

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop) 154 - 160 June 2024 [Refereed]

Improving Repository-level Code Search with Text Conversion

Mizuki Kondo, Daisuke Kawahara, Toshiyuki Kurabayashi

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop) 130 - 137 June 2024 [Refereed]

Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation

Hao Wang, Tetsuro Morimura, Ukyo Honda, Daisuke Kawahara

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop) 212 - 218 June 2024 [Refereed]

A Benchmark Suite of Japanese Natural Questions

Takuya Uematsu, Hao Wang, Daisuke Kawahara, Tomohide Shibata

Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024) 58 - 68 June 2024 [Refereed]

Time-aware COMET: a Commonsense Knowledge Model with Temporal Knowledge

Eiki Murata, Daisuke Kawahara

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) 16162 - 16174 May 2024 [Refereed]

Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation

Tomohito Kasahara, Daisuke Kawahara

Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Student Research Workshop 24 - 31 November 2023 [Refereed]

Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models

Hao Wang, Hirofumi Shimizu, Daisuke Kawahara

Findings of the Association for Computational Linguistics: ACL 2023 8589 - 8601 July 2023 [Refereed]

KWJA: A Unified Japanese Analyzer Based on Foundation Models

Nobuhiro Ueda, Kazumasa Omura, Takashi Kodama, Hirokazu Kiyomaru, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) 538 - 548 July 2023 [Refereed]

Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition

Sakura Imai, Daisuke Kawahara, Naho Orita, Hiromune Oda

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop) 139 - 151 July 2023 [Refereed]

Building a Personalized Dialogue System with Prompt-Tuning

Tomohito Kasahara, Daisuke Kawahara, Nguyen Tung, Shengzhe Li, Kenta Shinzato, Toshinori Sato

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop 96 - 105 July 2022 [Refereed]

Generate, Evaluate, and Select: A Dialogue System with a Response Evaluator for Diversity-Aware Response Generation

Ryoma Sakaeda, Daisuke Kawahara

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop 76 - 82 July 2022 [Refereed]

Grounding in social media: An approach to building a chit-chat dialogue model

Ritvik Choudhary, Daisuke Kawahara

In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop 9 - 15 July 2022 [Refereed]

JGLUE: Japanese General Language Understanding Evaluation

Kentaro Kurihara, Daisuke Kawahara, Tomohide Shibata

In Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022) 2957 - 2966 June 2022 [Refereed]

Building a Dialogue Corpus Annotated with Expressed and Experienced Emotions

Tatsuya Ide, Daisuke Kawahara

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop 21 - 30 May 2022 [Refereed]

Multi-Task Learning of Generation and Classification for Emotion-Aware Dialogue Response Generation

Tatsuya Ide, Daisuke Kawahara

NAACL Student Research Workshop (SRW) 2021 119 - 125 June 2021 [Refereed]

BERT-based Cohesion Analysis of Japanese Texts

Nobuhiro Ueda, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 28th International Conference on Computational Linguistics (COLING2020) 1323 - 1333 December 2020 [Refereed]

Research paper/abstract (international conference)

A Method for Building a Commonsense Inference Dataset based on Basic Events

Kazumasa Omura, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP2020) 2450 - 2460 November 2020 [Refereed]

Research paper/abstract (international conference)

A System for Worldwide COVID-19 Information Aggregation

Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, Nobuhiro Ueda, Honai Ueoka, Masao Utiyama, Ying Zhong

In Proceedings of Workshop on NLP for COVID-19 (Part 2) at EMNLP2020 abs/2008.01523 November 2020 [Refereed]

Building a Japanese Typo Dataset from Wikipedia's Revision History

Yu Tanaka, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL2020SRW) 230 - 236 July 2020 [Refereed]

Research paper/abstract (international conference)

Acquiring Social Knowledge about Personality and Driving-related Behavior

Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC2020) 2306 - 2315 May 2020 [Refereed]

Research paper/abstract (international conference)

Development of a Japanese Personality Dictionary based on Psychological Methods

Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC2020) 3103 - 3108 May 2020 [Refereed]

Research paper/abstract (international conference)

Tree-structured Decoding for Solving Math Word Problems

Qianying Liu, Wenyu Guan, Sujian Li, Daisuke Kawahara

In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing November 2019 [Refereed]

Machine Comprehension Improves Domain-Specific Japanese Predicate-Argument Structure Analysis

Norio Takahashi, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of EMNLP-IJCNLP 2019 Workshop MRQA: Machine Reading for Question Answering, Hong Kong November 2019 [Refereed]

Diversity-aware Event Prediction based on a Conditional Variational Autoencoder with Reconstruction

Hirokazu Kiyomaru, Kazumasa Omura, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing (COIN) 113 - 122 November 2019 [Refereed]

A Community Detection Method Towards Analysis of Xi Feng Parties in the Northern Song Dynasty

Qianying Liu, Qiyao Wang, Wending Chen, Daisuke Kawahara

In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33) September 2019 [Refereed]

A Better Ad Experience: Click Prediction Leveraging Sequential Networks Derived Specifically From User Search Behaviors

Shengzhe Li, Tomoko Izumi, Yu Kuratake, Jiali Yao, Jerry Turner, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33) September 2019 [Refereed]

Applying Machine Translation to Psychology: Automatic Translation of Personality Adjectives

Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

In Proceedings of the 17th Machine Translation Summit (MT Summit XVII) August 2019 [Refereed]

Emotion helps Sentiment: A Multi-task Model for Sentiment and Emotion Analysis

Abhishek Kumar, Asif Ekbal, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 2019 International Joint Conference on Neural Networks July 2019 [Refereed]

Shrinking Japanese Morphological Analyzers With Neural Networks and Semi-supervised Learning

Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of NAACL-HLT 2019: Annual Conference of the North American Chapter of the Association for Computational Linguistics 2744 - 2755 June 2019 [Refereed]

Annotating a Driving Experience Corpus with Behavior and Subjectivity

Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi

In Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32) December 2018 [Refereed]

Juman++: A Morphological Analysis Toolkit for Scriptio Continua

Arseny Tolmachev, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of EMNLP 2018: Conference on Empirical Methods in Natural Language Processing, System Demonstrations 54 - 54 November 2018 [Refereed]

Cross-lingual Knowledge Projection Using Machine Translation and Target-side Knowledge Base Completion

Naoki Otani, Hirokazu Kiyomaru, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics 1508 - 1520 August 2018 [Refereed]

Neural Adversarial Training for Semi-supervised Japanese Predicate-argument Structure Analysis

Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL2018) July 2018 [Refereed]

Knowledge-enriched Two-layered Attention Network for Sentiment Analysis

Abhishek Kumar, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT2018), Volume 2 (Short Papers) 253 - 258 June 2018 [Refereed]

JDCFC: A Japanese Dialogue Corpus with Feature Changes

Tetsuaki Nakamura, Daisuke Kawahara

In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference 2915 - 2921 May 2018 [Refereed]

Comprehensive Annotation of Various Types of Temporal Information on the Time Axis

Tomohiro Sakaguchi, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference 332 - 338 May 2018 [Refereed]

Improving Crowdsourcing-Based Annotation of Japanese Discourse Relations

Yudai Kishimoto, Shinnosuke Sawada, Yugo Murawaki, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference May 2018 [Refereed]

JFCKB: Japanese Feature Change Knowledge Base

Tetsuaki Nakamura, Daisuke Kawahara

In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018) 1398 - 1404 2018年05月 [査読有り]
Automatically Acquired Lexical Knowledge Improves Japanese Joint Morphological and Dependency Analysis

Daisuke Kawahara, Yuta Hayashibe, Hajime Morita, Sadao Kurohashi

In Proceedings of the 15th International Conference on Parsing Technologies (IWPT2017) 1 - 10 2017年09月 [査読有り]
Improving Chinese Semantic Role Labeling using High-quality Surface and Deep Case Frames

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL2017) 567 - 576 2017年04月 [査読有り]
Neural joint model for transition-based Chinese syntactic analysis

Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi

ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) 1 1204 - 1214 2017年 [査読有り]

Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language

Mo Shen, Wingmui Li, HyunJeong Choe, Chenhui Chu, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 26th International Conference on Computational Linguistics (COLING2016) 298 - 308 2016年12月 [査読有り]
SCTB: A Chinese Treebank in Scientific Domain

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 12th Workshop on Asian Language Resources (ALR12 2016) 2016年12月 [査読有り]
Large-Scale Acquisition of Commonsense Knowledge via a Quiz Game on a Dialogue System

Naoki Otani, Daisuke Kawahara, Sadao Kurohashi, Nobuhiro Kaji, Manabu Sassano

In Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016) 11 - 20 2016年12月 [査読有り]
IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations

Naoki Otani, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of EMNLP 2016: Conference on Empirical Methods in Natural Language Processing 511 - 520 2016年11月 [査読有り]
Age Related Differences in Episodic Memory Recollections: Applying Latent Dirichlet Allocation to Free-Writings on Driving Incidents by Older and Young Drivers

Ritsuko Iwai, Takatsune Kumada, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 38th Annual Conference of the Cognitive Science Society (poster) 2016年08月 [査読有り]
Leveraging VerbNet to Build Corpus-Specific Verb Clusters

Daniel Peterson, Daisuke Kawahara, Jordan Boyd-Graber, Martha Palmer

In Proceedings of *SEM 2016: The Fifth Joint Conference on Lexical and Computational Semantics 102 - 107 2016年08月 [査読有り]
Neural Network-Based Model for Japanese Predicate Argument Structure Analysis

Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL2016) 1235 - 1244 2016年08月 [査読有り]
Design of Word Association Games using Dialog Systems for Acquisition of Word Association Knowledge

Yuichiro Machida, Daisuke Kawahara, Sadao Kurohashi, Manabu Sassano

In Proceedings of the 5th Workshop on Automated Knowledge Base Construction (AKBC) 2016 86 - 91 2016年06月 [査読有り]
Constructing a Dictionary Describing Feature Changes of Arguments in Event Sentences

Tetsuaki Nakamura, Daisuke Kawahara

In Proceedings of the 4th Workshop on EVENTS: Definition, Detection, Coreference, and Representation 2016年06月 [査読有り]
M2L at SemEval-2016 Task 8 “Meaning Representation Parsing”: AMR Parsing with Neural Networks

Yevgeniy Puzikov, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016) 2016年06月 [査読有り]
Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model

Hajime Morita, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of EMNLP 2015: Conference on Empirical Methods in Natural Language Processing 2292 - 2297 2015年09月 [査読有り]
Chinese Semantic Role Labeling using High-quality Syntactic Knowledge

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

The 8th SIGHAN Workshop on Chinese Language Processing 120 - 127 2015年07月 [査読有り]
Corpus Patterns for Semantic Processing

Patrick Hanks, Elisabetta Jezek, Daisuke Kawahara, Octavian Popescu

In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP2015) (Tutorials) 12 - 15 2015年07月 [査読有り]
Classification and Acquisition of Contradictory Event Pairs using Crowdsourcing

Yu Takabatake, Hajime Morita, Daisuke Kawahara, Sadao Kurohashi, Ryuichiro Higashinaka, Yoshihiro Matsuo

In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation 99 - 107 2015年06月 [査読有り]
Location Name Disambiguation Exploiting Spatial Proximity and Temporal Consistency

Takashi Awamura, Eiji Aramaki, Daisuke Kawahara, Tomohide Shibata, Sadao Kurohashi

In Proceedings of the 3rd International Workshop on Natural Language Processing for Social Media 1 - 9 2015年06月 [査読有り]
Toward an Advice Agent for Diet and Exercise Based on Diary Texts

Tetsuaki Nakamura, Takashi Awamura, Yiqi Zhang, Eiji Aramaki, Daisuke Kawahara, Sadao Kurohashi

Ambient Intelligence for Health and Cognitive Enhancement, Papers from the AAAI Spring Symposium, Technical Report SS-15-01 43 - 48 2015年03月 [査読有り]
Post-Editing User Interface Using Visualization of a Sentence Structure

Yudai Kishimoto, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of AMTA2014 Workshop on Post-Editing Technology and Practice (WPTP3) 2014年10月 [査読有り]
Rapid Development of a Corpus with Discourse Annotations using Two-stage Crowdsourcing

Daisuke Kawahara, Yuichiro Machida, Tomohide Shibata, Sadao Kurohashi, Hayato Kobayashi, Manabu Sassano

In Proceedings of the 25th International Conference on Computational Linguistics (COLING2014) 269 - 278 2014年08月 [査読有り]
Dependency Parse Reranking with Rich Subtree Features

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 22 ( 7 ) 1208 - 1218 2014年07月 [査読有り]

A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes

Daisuke Kawahara, Daniel W. Peterson, Martha Palmer

In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL2014) 1030 - 1040 2014年06月 [査読有り]
Chinese Morphological Analysis with Character-level POS Tagging (Short Paper)

Mo Shen, Hongxiao Liu, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL2014) 2014年06月 [査読有り]
A Self-correcting Approach to Solve Syntactic Ambiguities based on Collocational Strength

Hajime Nozawa, Daisuke Kawahara

In Proceedings of the 6th International Conference on Corpus Linguistics (CILC2014) 2014年05月 [査読有り]
Inducing Example-based Semantic Frames from a Massive Amount of Verb Uses

Daisuke Kawahara, Daniel W. Peterson, Octavian Popescu, Martha Palmer

In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL2014) 58 - 67 2014年04月 [査読有り]
A Framework for Compiling High Quality Knowledge Resources From Raw Corpora

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION 109 - 114 2014年 [査読有り]
Single Classifier Approach for Verb Sense Disambiguation based on Generalized Features

Daisuke Kawahara, Martha Palmer

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION 4210 - 4213 2014年 [査読有り]
Towards Fully Lexicalized Dependency Parsing for Korean

Jungyeul Park, Daisuke Kawahara, Sadao Kurohashi, Key-Sun Choi

In Proceedings of the 13th International Conference on Parsing Technologies (IWPT2013, short paper) 120 - 126 2013年11月 [査読有り]
High Quality Dependency Selection from Automatic Parses

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP2013) 947 - 951 2013年10月 [査読有り]
Precise Information Retrieval Exploiting Predicate-Argument Structures

Daisuke Kawahara, Keiji Shinzato, Tomohide Shibata, Sadao Kurohashi

In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP2013) 37 - 45 2013年10月 [査読有り]
Japanese Zero Reference Resolution Considering Exophora and Author/Reader Mentions

Masatsugu Hangyo, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of EMNLP 2013: Conference on Empirical Methods in Natural Language Processing 924 - 934 2013年10月 [査読有り]
Chinese Word Segmentation by Mining Maximized Substrings

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP2013) 171 - 179 2013年10月 [査読有り]
Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese

Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi, Manabu Okumura

In Proceedings of EMNLP 2013: Conference on Empirical Methods in Natural Language Processing 1213 - 1223 2013年10月 [査読有り]
Building a Diverse Document Leads Corpus Annotated with Semantic Relations

Masatsugu Hangyo, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation 535 - 544 2012年11月 [査読有り]
A Reranking Approach for Dependency Parsing with Variable-sized Subtree Features

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation 308 - 317 2012年11月 [査読有り]
Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation

Chenhui Chu, Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 16th Annual Conference of the European Association for Machine Translation (EAMT2012) 35 - 42 2012年05月 [査読有り]
Generative Modeling of Coordination by Factoring Parallelism and Selectional Preferences

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP2011) 456 - 464 2011年11月 [査読有り]
Web information analysis for open-domain decision support: System design and user evaluation

Takuya Kawada, Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Yutaka I. Leon-Suematsu, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

ACM International Conference Proceeding Series 13 - 18 2011年 [査読有り]

Identifying Contradictory and Contrastive Relations between Statements to Outline Web Information on a Given Topic

Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi

COLING 2010 Poster Volume 534 - 542 2010年08月 [査読有り]
Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation

Daisuke Kawahara, Sadao Kurohashi

LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION 1389 - 1393 2010年 [査読有り]
Organizing information on the web to support user judgments on information credibility

Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Yutaka I. Leon-Suematsu, Takuya Kawada, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings 123 - 130 2010年 [査読有り]

Capturing Consistency between Intra-clause and Inter-clause Relations in Knowledge-rich Dependency and Case Structure Analysis

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09) 108 - 116 2009年10月 [査読有り]

WISDOM: A Web Information Credibility Analysis System

Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

In Proceedings of the ACL-IJCNLP 2009 Software Demonstrations 1 - 4 2009年08月 [査読有り]

The Effect of Corpus Size on Case Frame Acquisition for Discourse Analysis

Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT2009) 521 - 529 2009年06月 [査読有り]

Summarizing evaluative information on the web for information credibility analysis

Daisuke Kawahara, Tetsuji Nakagawa, Takuya Kawada, Kentaro Inui, Sadao Kurohashi

ACM International Conference Proceeding Series 187 - 192 2009年 [査読有り]

Development of a large-scale web crawler and search engine infrastructure

Susumu Akamine, Yoshikiyo Kato, Daisuke Kawahara, Keiji Shinzato, Kentaro Inui, Sadao Kurohashi, Yutaka Kidawara

ACM International Conference Proceeding Series 126 - 131 2009年 [査読有り]

Identifying Information Sender Configuration of Web Pages

Yoshikiyo Kato, Daisuke Kawahara, Kentaro Inui, Sadao Kurohashi, Tomohide Shibata

2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 1 335 - 340 2009年 [査読有り]

Coordination Disambiguation without Any Similarities

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008) 425 - 432 2008年08月 [査読有り]

A Fully-Lexicalized Probabilistic Model for Japanese Zero Anaphora Resolution

Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008) 769 - 776 2008年08月 [査読有り]

Chinese Dependency Parsing with Large Scale Automatically Constructed Case Structures

Kun Yu, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008) 1049 - 1056 2008年08月 [査読有り]
TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology

Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi

In Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP2008) 189 - 196 2008年01月 [査読有り]

A Large-Scale Web Data Collection as a Natural Language Processing Infrastructure

Keiji Shinzato, Daisuke Kawahara, Chikara Hashimoto, Sadao Kurohashi

SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 2236 - 2241 2008年 [査読有り]
Information Credibility Analysis of Web Contents

Sadao Kurohashi, Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Kentaro Inui, Yutaka Kidawara

PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON UNIVERSAL COMMUNICATION 146 - + 2008年 [査読有り]

Grasping major statements and their contradictions toward information credibility analysis of web contents

Daisuke Kawahara, Sadao Kurohashi, Kentaro Inui

Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008 393 - 397 2008年 [査読有り]

Learning Head-modifier Pairs to Improve Lexicalized Dependency Parsing on a Chinese Treebank

Kun Yu, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of TLT 2007 2007年12月 [査読有り]
Probabilistic Coordination Disambiguation in a Fully-Lexicalized Japanese Parser

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL2007) 306 - 314 2007年06月 [査読有り]
Improving coreference resolution using bridging reference resolution and automatically acquired synonyms

Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

ANAPHORA: ANALYSIS, ALGORITHMS AND APPLICATIONS 4410 125 - + 2007年 [査読有り]
Example-based Machine Translation based on Deeper NLP

Toshiaki Nakazawa, Kun Yu, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of International Workshop on Spoken Language Translation (IWSLT'06) 64 - 70 2006年11月 [査読有り]

A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics 176 - 183 2006年06月 [査読有り]

Case Frame Compilation from the Web using High-Performance Computing

Daisuke Kawahara, Sadao Kurohashi

the 5th International Conference on Language Resources and Evaluation 2006年05月 [査読有り]

Automatic acquisition of basic Katakana lexicon from a given corpus

T Nakazawa, D Kawahara, S Kurohashi

NATURAL LANGUAGE PROCESSING - IJCNLP 2005, PROCEEDINGS 3651 682 - 693 2005年 [査読有り]

PP-attachment disambiguation boosted by a gigantic volume of unambiguous examples

D Kawahara, S Kurohashi

NATURAL LANGUAGE PROCESSING - IJCNLP 2005, PROCEEDINGS 3651 188 - 198 2005年 [査読有り]

Improving Japanese Zero Pronoun Resolution by Global Word Sense Disambiguation

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 20th International Conference on Computational Linguistics 343 - 349 2004年08月 [査読有り]
Automatic Construction of Nominal Case Frames and its Application to Indirect Anaphora Resolution

Ryohei Sasano, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 20th International Conference on Computational Linguistics 1201 - 1207 2004年08月 [査読有り]

Toward Text Understanding: Integrating Relevance-tagged Corpus and Automatically Constructed Case Frames

Daisuke Kawahara, Ryohei Sasano, Sadao Kurohashi

In Proceedings of the 4th International Conference on Language Resources and Evaluation 1833 - 1836 2004年05月 [査読有り]

Zero Pronoun Resolution based on Automatically Constructed Case Frames and Structural Preference of Antecedents

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of the 1st International Joint Conference on Natural Language Processing 334 - 341 2004年03月 [査読有り]

Converting Text into Agent Animations: Assigning Gestures to Texts

Yukiko I. Nakano, Masashi Okamoto, Daisuke Kawahara, Qing Li, Toyoaki Nishida

Proceedings of Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2004), Companion Volume 153 - 156 2004年 [査読有り]
Structural analysis of instruction utterances using linguistic and visual information

T Shibata, M Tachiki, D Kawahara, M Okamoto, S Kurohashi, T Nishida

KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 1, PROCEEDINGS 3213 393 - 400 2004年 [査読有り]
Structural analysis of instruction utterances

T Shibata, D Kawahara, M Okamoto, S Kurohashi, T Nishida

KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS 2774 1054 - 1061 2003年 [査読有り]
Fertilization of Case Frame Dictionary for Robust Japanese Case Analysis

Daisuke Kawahara, Sadao Kurohashi

In Proceedings of 19th COLING (COLING02) 2002年08月 [査読有り]

Construction of a Japanese Relevance-tagged Corpus

Daisuke Kawahara, Sadao Kurohashi, Koiti Hasida

In Proceedings of the Third International Conference on Language Resources and Evaluation 495 - 498 2002年05月 [査読有り]

Verb paraphrase based on case frame alignment

N Kaji, D Kawahara, S Kurohashi, S Sato

40TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE 215 - 222 2002年 [査読有り]
韻と内容を制御した日本語ラップ歌詞の自動生成

織田宥楽, 小川隼斗, 河原大輔

情報処理学会第267回自然言語処理研究発表会 2026年03月
官庁出版物コーパスを用いた日本語LLMの継続事前学習とその分析

屋藤翔麻, 清丸寛一, 小田悠介, 河原大輔

情報処理学会第267回自然言語処理研究発表会 2026年03月
LLMの生成テキストの真偽検証のための日本語真偽判定データセットの構築

政野美和, 清丸寛一, 欅惇志, 堀尾海斗, 源怜維, 欅リベカ, 中山功太, 橘秀幸, 河原大輔

言語処理学会第32回年次大会 2026年03月
WAON: 視覚言語モデルのための大規模かつ高品質な日本語画像・テキスト対データセット

杉浦一瑳, 栗田修平, 小田悠介, 河原大輔, 岡部寿男, 岡崎直観

言語処理学会第32回年次大会 1612 - 1617 2026年03月
LLMの生成テキストの真偽検証のための日本語言説分解データセットの構築と評価

政野美和, 欅リベカ, 欅惇志, 清丸寛一, 中山功太, 堀尾海斗, 源怜維, 橘秀幸, 河原大輔

言語処理学会第32回年次大会 1394 - 1399 2026年03月
JAMMEval: 再アノテーションによる日本語VQA評価データセットの信頼性向上

杉浦一瑳, 前田航希, 栗田修平, 小田悠介, 河原大輔, 岡崎直観

言語処理学会第32回年次大会 1104 - 1109 2026年03月
LLM-jp FactCheck：学習コーパスに照らした真偽検証によるモデル応答の分析

清丸寛一, 出口祥之, 政野美和, 源怜維, 堀尾海斗, 欅惇志, 中山功太, 橘秀幸, 欅リベカ, 河原大輔

言語処理学会第32回年次大会 248 - 253 2026年03月
大規模コーパスにおける要配慮個人情報検出の精度向上

源怜維, 小田悠介, 河原大輔

言語処理学会第32回年次大会 3611 - 3616 2026年03月
指示認識テキスト埋め込みモデルの指示によるベクトル変位の分析

小川隼斗, 福地成彦, 李聖哲, 河原大輔

言語処理学会第32回年次大会 3129 - 3134 2026年03月
GRPOを用いた日本語ラップの歌詞生成モデルの構築

小川隼斗, 河原大輔

言語処理学会第32回年次大会 4233 - 4238 2026年03月
オープンなVLMを活用した日本語マルチモーダル指示データセットの構築

中尾圭佑, 栗田修平, 河原大輔

言語処理学会第32回年次大会 642 - 647 2026年03月
大規模言語モデルに対するプロービングによる複合動詞の意味理解の分析

小野聡, 河原大輔

言語処理学会第32回年次大会 1590 - 1595 2026年03月
Improving SAE-based Language Steering with Prompting in Large Language Models

Sebastian Zwirner, Wentao Hu, Koshiro Aoki, Daisuke Kawahara

言語処理学会第32回年次大会 2555 - 2560 2026年03月
尤度を用いた進化戦略によるLLMの最適化

福田創, 河原大輔

言語処理学会第32回年次大会 2639 - 2644 2026年03月
大規模言語モデルの手順型応答を対象としたファクトチェックフレームワークの構築

杣谷星音, 河原大輔

言語処理学会第32回年次大会 1036 - 1041 2026年03月
アンサンブル蒸留と学習ベース集計を用いた数学的推論プロセスの検証と性能分析

榎本倫太郎, 栗田修平, 河原大輔

言語処理学会第32回年次大会 3117 - 3122 2026年03月
クロスコーダーを用いた脳と言語モデルにおける内部表現の特徴量比較

青木洸士郎, 濵田偉月, 折田奈甫, 河原大輔, 酒井弘

言語処理学会第32回年次大会 2543 - 2548 2026年03月
マルチモーダルLLM の縦書きテキスト読み取り能力の評価

笹川慶人, 栗田修平, 河原大輔

言語処理学会第32回年次大会 607 - 612 2026年03月
JaCarEval: 日本語車載対話に対するLLM評価器のメタ評価フレームワーク

藤田一颯, 織田宥楽, Sebastian Zwirner, 河原大輔

言語処理学会第32回年次大会 3421 - 3426 2026年03月
応答内容・順序に着目した音声対話ベンチマークの構築

渡邉一功, 水本智也, 周藤唯, 河原大輔

言語処理学会第32回年次大会 648 - 653 2026年03月 [査読有り]
JMT-Safety: 日本語マルチターン対話における安全性評価ベンチマーク

五十里渚, 福田創, 高山隼矢, 綿岡晃輝, 河原大輔

言語処理学会第32回年次大会 3605 - 3610 2026年03月
LLM の生成テキストの真偽検証のための日本語言説分解データセットの構築

政野美和, 欅リベカ, 欅惇志, 清丸寛一, 中山功太, 堀尾海斗, 源怜維, 橘秀幸, 河原大輔

情報処理学会第265回自然言語処理研究発表会 2025年09月
JMTEB and JMTEB-lite: Japanese Massive Text Embedding Benchmark and Its Lightweight Version

李聖哲, 大萩雅也, 李凌寒, 福地成彦, 柴田知秀, 河原大輔

情報処理学会第265回自然言語処理研究発表会 2025年09月
複数言語の逐次学習による多言語モデルの分類タスク性能の検証

張齢方, 河原大輔

2025年度人工知能学会全国大会（第39回） 2025年05月
文長制限を設けた問題文による早押しクイズ用Retrieverの学習

佐々木斗海, 小林俊介, 河原大輔

2025年度人工知能学会全国大会（第39回） 2025年05月
教師なし学習によるラップの形式の学習

織田宥楽, 小川隼斗, 河原大輔

2025年度人工知能学会全国大会（第39回） 2025年05月
大規模言語モデルによる数学概念誤解分析

陳奕嘉, 河原大輔

2025年度人工知能学会全国大会（第39回） 2025年05月
実在しないエンティティや出来事に関する合成文書を用いたRAGベンチマークの構築

李聖哲, 大萩雅也, 塚越駿, 福地成彦, 柴田知秀, 河原大輔

情報処理学会第263回自然言語処理研究発表会 2025年03月
対話に対する共感のアノテーションと共感制御可能な対話モデルの構築

鈴江万碧, 堀尾海斗, 折田奈甫, 河原大輔

言語処理学会第31回年次大会 4132 - 4137 2025年03月
Japanese MT-bench++: より自然なマルチターン対話設定の日本語大規模ベンチマーク

植松拓也, 福田創, 河原大輔, 柴田知秀

言語処理学会第31回年次大会 3569 - 3574 2025年03月
連合学習におけるLoRAの統合数と精度の関係の検証

尹子旗, 村田栄樹, 河原大輔

言語処理学会第31回年次大会 3462 - 3467 2025年03月
関数単位の修正箇所特定によるリポジトリレベルのバグ修正

近藤瑞希, 河原大輔, 倉林利行

言語処理学会第31回年次大会 1793 - 1798 2025年03月
LLM-jp-3 VILA: 日本語マルチモーダルデータセット及び強力な日本語マルチモーダルモデルの構築

笹川慶人, 前田航希, 杉浦一瑳, 栗田修平, 岡崎直観, 河原大輔

言語処理学会第31回年次大会 1185 - 1190 2025年03月
大規模言語モデルの事前学習用コーパスにおける要配慮個人情報の検出

源怜維, 小田悠介, 河原大輔

言語処理学会第31回年次大会 2873 - 2878 2025年03月
日本語創造性ベンチマークの構築

福田創, 小川隼斗, 堀尾海斗, 河原大輔, 柴田知秀

言語処理学会第31回年次大会 77 - 82 2025年03月
大規模言語モデルは他者の心をシミュレートしているか

青木洸士郎, 河原大輔

言語処理学会第31回年次大会 3176 - 3181 2025年03月
多様な言い換え生成と自己学習手法の統合による大規模言語モデルへの新規知識の追加学習

山本貴之, 河原大輔

言語処理学会第31回年次大会 2231 - 2236 2025年03月
LLMによるクイズの自動生成と質問応答への応用

小林俊介, 河原大輔

言語処理学会第31回年次大会 3984 - 3989 2025年03月
大規模言語モデルにおける多段推論の依存構造と推論能力の関係検証

榎本倫太郎, 新妻巧朗, 栗田修平, 河原大輔

言語処理学会第31回年次大会 857 - 862 2025年03月
Sparse Autoencoders as a Tool for Steering the Output Language of Large Language Models

Sebastian Zwirner, Wentao Hu, Koshiro Aoki, Daisuke Kawahara

言語処理学会第31回年次大会 3411 - 3416 2025年03月
真面目LLMと不真面目LLMで推論能力は変わるか？

堀尾海斗, 河原大輔

言語処理学会第31回年次大会 1687 - 1692 2025年03月
日本語の包括的な指示追従性データセットの構築

堀尾海斗, 福田創, 小川隼斗, 鈴江万碧, 織田宥楽, 河原大輔, 関根聡, 安藤まや

言語処理学会第31回年次大会 3401 - 3405 2025年03月
SvMoE: MoEルータの教師あり学習

村田栄樹, 河原大輔

言語処理学会第31回年次大会 3395 - 3400 2025年03月
オープンLLMによる翻訳を活用した日本語CLIPの開発

杉浦一瑳, 栗田修平, 小田悠介, 河原大輔, 岡崎直観

言語処理学会第31回年次大会 1421 - 1426 2025年03月
LLMの学術ドメイン適応のための合成データに基づく統合フレームワーク

小川隼斗, 河原大輔, 相澤彰子

言語処理学会第31回年次大会 3367 - 3372 2025年03月
和歌の埋め込みに基づく本歌取りの推定

小川隼斗, 河原大輔

言語処理学会第31回年次大会 3832 - 3837 2025年03月
Estimation of self-compassion from free texts using a large language model BERT

Hirohito Okano, Daisuke Kawahara, Michio Nomura

INTERNATIONAL JOURNAL OF PSYCHOLOGY 59 493 - 494 2024年08月

研究発表ペーパー・要旨（国際会議）
おもしろい川柳の生成

太田聖三郎, 河原大輔, 野村理朗

言語処理学会第30回年次大会 3286 - 3291 2024年03月
知識志向 Mixture of LoRA Experts の構築

伊藤俊太朗, 河原大輔

言語処理学会第30回年次大会 3101 - 3106 2024年03月
環境依存情報を利用しない大規模言語モデルによるコンピュータータスク自動化手法

笹川慶人, 河原大輔

言語処理学会第30回年次大会 1969 - 1974 2024年03月
大規模言語モデル開発における日本語Web文書のフィルタリング手法の検証

榎本倫太郎, Tolmachev Arseny, 新妻巧朗, 栗田修平, 河原大輔

言語処理学会第30回年次大会 2274 - 2279 2024年03月
科学技術論文を対象とした根拠付き生成型要約システムの構築

笠原智仁, 村田栄樹, 河原大輔

言語処理学会第30回年次大会 2131 - 2136 2024年03月
多様なクイズを自動生成する手法およびその検証

小林俊介, 河原大輔

言語処理学会第30回年次大会 24 - 29 2024年03月
自由記述からセルフ・コンパッションを推定することは可能か? ―BERTによる心理学的構成概念の定量化―

岡野裕仁, 河原大輔, 野村理朗

言語処理学会第30回年次大会 1113 - 1118 2024年03月
Uzushio: A Distributed Huge Corpus Processor for the LLM Era

Arseny Tolmachev, Masayoshi Hayashi, Takuro Niitsuma, Rintaro Enomoto, Hao Wang, Shuhei Kurita, Daisuke Kawahara, Kazuma Takaoka, Yoshitaka Uchida

言語処理学会第30回年次大会 902 - 907 2024年03月
ichikara-instruction LLMのための日本語インストラクションデータの作成

関根聡, 安藤まや, 後藤美知子, 鈴木久美, 河原大輔, 井之上直也, 乾健太郎

言語処理学会第30回年次大会 1508 - 1513 2024年03月
SlideAVSR: 視聴覚音声認識のための論文解説動画データセット

王昊, 栗田修平, 清水周一郎, 河原大輔

言語処理学会第30回年次大会 1258 - 1263 2024年03月
手順のテキスト化による将棋解説文生成

山内悠輔, 河原大輔

言語処理学会第30回年次大会 3197 - 3202 2024年03月
日本語Natural QuestionsとBoolQの構築

植松拓也, 王昊, 河原大輔, 柴田知秀

言語処理学会第30回年次大会 679 - 684 2024年03月
TaCOMET: 時間を考慮したイベント常識生成モデル

村田栄樹, 河原大輔

言語処理学会第30回年次大会 868 - 873 2024年03月
テキスト変換によるリポジトリレベルのコード検索の改善

近藤瑞希, 河原大輔, 倉林利行

言語処理学会第30回年次大会 65 - 70 2024年03月
意味的プロービングデータセットの構築と言語モデルの評価: イタリア語の倒置を例に

今井咲良, Giovanni Pasa, 小田博宗, 折田奈甫, 河原大輔

言語処理学会第30回年次大会 2670 - 2675 2024年03月
プロンプトの丁寧さと大規模言語モデルの性能の関係検証

尹子旗, 王昊, 堀尾海斗, 河原大輔, 関根聡

言語処理学会第30回年次大会 1803 - 1808 2024年03月
日本語Winogroundデータセットの自動構築

清水博文, 河原大輔

言語処理学会第30回年次大会 1523 - 1528 2024年03月
日本語TruthfulQAの構築

中村友亮, 河原大輔

言語処理学会第30回年次大会 1709 - 1714 2024年03月
多段階転移学習による不完全発話補完の精度向上

尹子旗, 河原大輔

2023年度人工知能学会全国大会（第37回） 2023年06月
日本語における Chain-of-Thought プロンプトの検証

堀尾海斗, 村田栄樹, 王昊, 井手竜也, 河原大輔, 山崎天, 新里顕大, 中町礼文, 李聖哲, 佐藤敏紀

2023年度人工知能学会全国大会（第37回） 2023年06月
日本語BERTにおけるトークナイザの違いによる影響の検証

伊藤俊太朗, 河原大輔

2023年度人工知能学会全国大会（第37回） 2023年06月
非言語データを用いた対照学習による文埋め込み学習の日本語における効果検証

清水博文, 河原大輔

2023年度人工知能学会全国大会（第37回） 2023年06月
大規模言語モデルによって構築された常識知識グラフの拡大と低コストフィルタリング

村田栄樹, 井手竜也, 榮田亮真, 河原大輔, 山崎天, 李聖哲, 新里顕大, 佐藤敏紀

言語処理学会第29回年次大会 2123 - 2128 2023年03月
日本語の分類タスクにおけるカリキュラム学習とマルチタスク学習の効果検証

植松拓也, 河原大輔

言語処理学会第29回年次大会 843 - 848 2023年03月
複数文書の読解を要する質問の自動生成と質問応答システムへの応用

小林俊介, 河原大輔

言語処理学会第29回年次大会 2616 - 2621 2023年03月
事前学習モデルに基づく日本語形態素解析器における辞書の利用

田村稔行, 河原大輔

言語処理学会第29回年次大会 333 - 338 2023年03月
魅力的な対話応答生成のための複数教師による知識蒸留

Ritvik Choudhary, 河原大輔

言語処理学会第29回年次大会 1388 - 1393 2023年03月
言語モデルを用いた漢文の返り点付与と書き下し文生成

王昊, 清水博文, 河原大輔

言語処理学会第29回年次大会 3031 - 3036 2023年03月
機械学習を用いた川柳の面白さの予測

太田聖三郎, 河原大輔, 野村理朗

言語処理学会第29回年次大会 577 - 582 2023年03月
Decoderベースの大規模言語モデルに基づくテキスト生成の自動評価指標

笠原智仁, 河原大輔, 山崎天, 新里顕大, 佐藤敏紀

言語処理学会第29回年次大会 1940 - 1945 2023年03月
対話行為の分布を利用した雑談対話システムの評価指標

榮田亮真, 井手竜也, 村田栄樹, 河原大輔

言語処理学会第29回年次大会 1653 - 1658 2023年03月
理論言語学の知見を応用した多言語クラスタリング

今井咲良, 河原大輔, 折田奈甫, 小田博宗

言語処理学会第29回年次大会 1316 - 1321 2023年03月
日本語WiCデータセットの構築と読みづらさ検出への応用

吉田あいり, 河原大輔

言語処理学会第29回年次大会 1643 - 1647 2023年03月
対話に基づく常識知識グラフの構築と対話応答生成に対する適用

井手竜也, 榮田亮真, 河原大輔, 山崎天, 李聖哲, 新里顕大, 佐藤敏紀

言語処理学会第29回年次大会 125 - 130 2023年03月
人間と言語モデルに対するプロンプトを用いたゼロからのイベント常識知識グラフ構築

井手竜也, 村田栄樹, 堀尾海斗, 河原大輔, 山崎天, 李聖哲, 新里顕大, 佐藤敏紀

言語処理学会第29回年次大会 322 - 327 2023年03月
JCommonsenseQA 2.0: 計算機と人の協働による常識推論データセットの改良

栗原健太郎, 河原大輔, 柴田知秀

言語処理学会第29回年次大会 2908 - 2913 2023年03月
テキスト生成モデルによる日本語形態素解析

児玉貴志, 植田暢大, 大村和正, 清丸寛一, 村脇有吾, 河原大輔, 黒橋禎夫

言語処理学会第29回年次大会 339 - 344 2023年03月
KWJA：汎用言語モデルに基づく日本語解析器

植田暢大, 大村和正, 児玉貴志, 清丸寛一, 村脇有吾, 河原大輔, 黒橋禎夫

情報処理学会第253回自然言語処理研究会 2022-NL-253 ( 2 ) 1 - 14 2022年09月
日本語における評価用データセットの構築と利用性の向上―JED2022 ワークショップの成果と展望

松田寛, 柴田知秀, 河原大輔, 久本空海, 久保隆宏, 浅原正幸

自然言語処理 29 ( 3 ) 1023 - 1029 2022年09月 [招待有り]

JGLUE: 日本語言語理解ベンチマーク

栗原健太郎, 河原大輔, 柴田知秀

自然言語処理 29 ( 2 ) 711 - 717 2022年06月 [招待有り]

JGLUE: 日本語言語理解ベンチマーク

栗原健太郎, 河原大輔, 柴田知秀

言語処理学会第28回年次大会 2023 - 2028 2022年03月

研究発表ペーパー・要旨（全国大会，その他学術会議）
構造的曖昧性に基づく読みづらさの検出

吉田あいり, 河原大輔

言語処理学会第28回年次大会 425 - 429 2022年03月

研究発表ペーパー・要旨（全国大会，その他学術会議）
応答の生成・評価・選択による対話システム

榮田亮真, 河原大輔

言語処理学会第28回年次大会 380 - 385 2022年03月

研究発表ペーパー・要旨（全国大会，その他学術会議）
ソーシャルメディア上のインタラクションを利用したオープンドメイン対話応答生成

Ritvik Choudhary, 河原大輔

言語処理学会第28回年次大会 392 - 397 2022年03月

研究発表ペーパー・要旨（全国大会，その他学術会議）
表出感情と経験感情をタグ付けした対話コーパスの構築

井手竜也, 河原大輔

言語処理学会第28回年次大会 386 - 391 2022年03月

研究発表ペーパー・要旨（全国大会，その他学術会議）
Prompt-Tuningによる個性を持った対話システムの構築

笠原智仁, 河原大輔

言語処理学会第28回年次大会 179 - 184 2022年03月

研究発表ペーパー・要旨（全国大会，その他学術会議）
日本語Wikipediaの編集履歴に基づく入力誤りデータセットと訂正システムの改良

田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

言語処理学会第27回年次大会 1540 - 1545 2021年03月
集合知を用いた大規模意味的フレーム知識の構築

小原京子, 河原大輔, 笹野遼平, 関根聡

言語処理学会第27回年次大会 554 - 558 2021年03月
ファクトチェック支援のための含意関係認識システム

栗原健太郎, 河原大輔

言語処理学会第27回年次大会 1734 - 1739 2021年03月
逆翻訳とフィルタリングによる擬似対話コーパスの生成とそれを用いた対話システムの学習

榮田亮真, 河原大輔

言語処理学会第27回年次大会 647 - 652 2021年03月
生成と分類のマルチタスク学習による感情が考慮された対話応答生成

井手竜也, 河原大輔

言語処理学会第27回年次大会 642 - 646 2021年03月
オープンコラボレーションによるCOVID-19世界情報集約サイトの構築

河原大輔

自然言語処理 27 ( 4 ) 939 - 943 2020年12月 [招待有り]

記事・総説・解説・論説等（学術雑誌）
Minimize Exposure Bias of Seq2Seq Models in Joint Entity and Relation Extraction

Haoran Zhang, Qianying Liu, Aysa Xuemo Fan, Heng Ji, Daojian Zeng, Fei Cheng, Daisuke Kawahara, Sadao Kurohashi

In Findings of the Association for Computational Linguistics: EMNLP 2020 236 - 246 2020年11月 [査読有り]
機械翻訳を用いた自然言語推論データセットの多言語化

吉越卓見, 河原大輔, 黒橋禎夫

情報処理学会第244回自然言語処理研究会 2020年07月
クラウドソーシングを用いた日本語述語項構造タグ付きコーパスの拡張

阿部航平, 河原大輔, 黒橋禎夫

言語処理学会第26回年次大会 2020年03月
基本イベントに基づく常識推論データセットの構築

大村和正, 河原大輔, 黒橋禎夫

言語処理学会第26回年次大会 2020年03月
因果関係グラフ: 構造的言語処理に基づくイベントの原因・結果・解決策の集約

清丸寛一, 植田暢大, 児玉貴志, 田中佑, 岸本裕大, 田中リベカ, 河原大輔, 黒橋禎夫

言語処理学会第26回年次大会 2020年03月
Wikipediaの修正履歴を用いた日本語入力誤りデータセットの構築

田中佑, 村脇有吾, 河原大輔, 黒橋禎夫

言語処理学会第26回年次大会 2020年03月
BERTとRefinementネットワークによる統合的照応・共参照解析

植田暢大, 河原大輔, 黒橋禎夫

言語処理学会第26回年次大会 2020年03月
対話テキスト中の自己主張及び感情の分析に基づくソーシャルスタイル推定

高橋憲生, 河原大輔, 黒橋禎夫

言語処理学会第26回年次大会 2020年03月
クラウドソーシングを用いた習得時期の想起質問に基づく単語難易度データベースの構築

水谷勇介, 河原大輔, 黒橋禎夫

言語処理学会第25回年次大会 1503 - 1506 2019年03月
ドメインを限定した機械読解モデルに基づく述語項構造解析

高橋憲生, 柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第25回年次大会 2019年03月
大規模な自動解析データが形態素解析器をどこまで小さくできるか

Arseny Tolmachev, 河原大輔, 黒橋禎夫

言語処理学会第25回年次大会 2019年03月
BERTによる日本語構文解析の精度向上

柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第25回年次大会 205 - 208 2019年03月
Conditional VAEに基づく多様性を考慮したイベント予測

清丸寛一, 大村和正, 村脇有吾, 河原大輔, 黒橋禎夫

言語処理学会第25回年次大会 2019年03月
クラウドソーシングによる大喜利の面白さの構成要素の分析

中川裕貴, 村脇有吾, 河原大輔, 黒橋禎夫

言語処理学会第25回年次大会 2019年03月
述語項構造に基づく言語情報の基本単位のデザインと可視化

齋藤純, 坂口智洋, 柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第24回年次大会 93 - 93 2018年03月
クラウドソーシングによる日本語FrameNetと自動構築した格フレームとの対応付け

河原大輔, 小原京子, 関根聡, 乾健太郎

言語処理学会第24回年次大会 706 - 709 2018年03月
感情を含む特徴変化情報付き対話コーパスの構築とそれを用いた対話の自然さ推定

仲村哲明, 河原大輔

言語処理学会第24回年次大会 654 - 657 2018年03月
意見分析に適した意見タグ獲得改善への取り組み

三澤賢祐, 成田和弥, 伊藤友博, 柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第24回年次大会 572 - 572 2018年03月
京都大学テキストコーパスに対する網羅的な時間情報アノテーション

坂口智洋, 河原大輔, 黒橋禎夫

情報処理学会第233回自然言語処理研究会 2017年10月
実テキストの情報分析のための頑健な言語処理基盤

河原大輔, 黒橋禎夫, 林部祐太, 森田一, Arseny Tolmachev

第11回テキストアナリティクス・シンポジウム 25 - 30 2017年09月
ニューラルネットワークに基づく単語分割・品詞付与・構文解析の統合解析

栗田修平, 河原大輔, 黒橋禎夫

言語処理学会第23回年次大会 2017年03月
集合知により獲得された事態参与者の特徴変化知識に基づく照応解析

仲村哲明, 河原大輔

言語処理学会第23回年次大会 767 - 770 2017年03月
容認度判定の実態調査の報告: その実体は不均一な反応からなる,バイアスのかかった心理評定である

黒田航, 仲村哲明, 河原大輔

言語処理学会第23回年次大会 398 - 401 2017年03月
対訳コーパスを用いたゼロ照応タグ付きコーパスの自動構築

古川智雅, 中澤敏明, 柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第23回年次大会 2017年03月
クラウドソーシングを用いた談話関係アノテーションの改良

岸本裕大, 澤田晋之介, 村脇有吾, 河原大輔, 黒橋禎夫

言語処理学会第23回年次大会 819 - 819 2017年03月
多言語述語項構造ベクトル表現の学習

宇野真矢, 柴田知秀, 河原大輔, 黒橋禎夫

情報処理学会第226回自然言語処理研究会 2016年05月
ユーザのライフログに対する健康アドバイスの自動生成

粟村誉, 岡照晃, 荒牧英治, 河原大輔, 黒橋禎夫

言語処理学会第22回年次大会 2016年03月
連想ゲームによるコモンセンス知識の獲得

大谷直樹, 河原大輔, 黒橋禎夫, 鍜治伸裕, 颯々野学

言語処理学会第22回年次大会 897 - 897 2016年03月
集合知を用いた事態参与者の特徴変化に関する知識の獲得

仲村哲明, 河原大輔

言語処理学会第22回年次大会 901 - 904 2016年03月
おしゃべりけんこうノート：管理栄養士・インストラクターのアドバイスに基づく健康アドバイスシステム

岡照晃, 粟村誉, 荒牧英治, 河原大輔, 黒橋禎夫

言語処理学会第22回年次大会 2016年03月
パーソナリティ表現の自動翻訳の試み

植田晋平, 河原大輔, 黒橋禎夫, 岩井律子, 井関龍太, 熊田孝恒

言語処理学会第22回年次大会 282 - 285 2016年03月
格パターンの多様性に頑健な日本語格フレーム構築

林部祐太, 河原大輔, 黒橋禎夫

情報処理学会第224回自然言語処理研究会 2015年12月
行間を読む健康アドバイス生成システムの実現に向けて

粟村誉, 岡照晃, 荒牧英治, 河原大輔, 黒橋禎夫

情報処理学会第223回自然言語処理研究会 2015年09月
集合知を利用した対訳知識のカバレッジ向上

牛久敦, 河原大輔, 黒橋禎夫, 颯々野学

情報処理学会第77回全国大会, 京都 2:205 - 2:206 2015年03月
ブログ記事に対する健康アドバイスの自動生成に向けて

仲村哲明, 粟村誉, Yiqi Zhang, 荒牧英治, 河原大輔, 黒橋禎夫

情報処理学会第77回全国大会講演論文集 2015年03月
クラウドソーシングを活用した事態間矛盾の分析と分類

高畠悠, 森田一, 河原大輔, 黒橋禎夫, 東中竜一郎, 松尾義博

言語処理学会第21回年次大会 305 - 308 2015年03月
自動獲得と集合知の併用による関連語知識の高度化と評価

町田雄一郎, 河原大輔, 黒橋禎夫, 颯々野学

言語処理学会第21回年次大会 1060 - 1063 2015年03月
クラウドソーシングによる確率的アノテーションを利用した談話関係解析

澤田晋之介, 小浜翔太郎, 河原大輔, 黒橋禎夫

情報処理学会第77回全国大会 2015年03月
文の構造を可視化した翻訳後編集インターフェース

岸本裕大, 中澤敏明, 河原大輔, 黒橋禎夫

情報処理学会第77回全国大会 2:209 - 2:210 2015年03月
ゲーミフィケーションによる連想概念の獲得

町田雄一郎, 河原大輔, 柴田知秀, 黒橋禎夫, 颯々野学

情報処理学会関西支部 2014年度支部大会 2014年09月
2段階のクラウドソーシングによる談話関係タグ付きコーパスの構築

河原大輔, 町田雄一郎, 柴田知秀, 黒橋禎夫, 小林隼人, 颯々野学

情報処理学会第217回自然言語処理研究会 2014年07月
ソーシャルメディアにおける空間的近接性と時間的一貫性を考慮した地名の曖昧性解消

粟村誉, 荒牧英治, 河原大輔, 柴田知秀, 黒橋禎夫

情報処理学会第217回自然言語処理研究会 2014年07月
Japanese Discourse Structure Analysis Based on Automatically Acquired Large-Scale Knowledge

Qinghan Bu, Daisuke Kawahara, Sadao Kurohashi

言語処理学会第20回年次大会 725 - 728 2014年03月
Chinese Unknown Word Extraction by Mining Maximized Substrings

Shen Mo, 黒橋禎夫, 河原大輔

言語処理学会第20回年次大会 384 - 387 2014年03月
著者・読者表現および外界ゼロ照応を考慮したゼロ照応解析

萩行正嗣, 河原大輔, 黒橋禎夫

言語処理学会第20回年次大会 721 - 724 2014年03月
コーパスから算出した語の親和性によって構文パターンの曖昧性を解消する試み

野澤元, 河原大輔

言語処理学会第20回年次大会 189 - 192 2014年03月
Language-independent Approach to High Quality Dependency Selection From Automatic Parses

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

IPSJ 2013 2013年09月
構文・述語項構造解析システムKNPの解析の流れと特徴

笹野遼平, 河原大輔, 黒橋禎夫, 奥村学

言語処理学会第19回年次大会 110 - 113 2013年03月
Dependency Parse Reranking Based-on Subtree Extraction

Mo Shen, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of NLP 2013 58 - 61 2013年03月
Selecting High Quality Dependencies from Automatic Parses

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

In Proceedings of NLP 2013 2013年03月
日本語語彙知識の統一的・整合的管理のデザイン

黒橋禎夫, 進義治, 柴田知秀, 村脇有吾, 河原大輔

言語処理学会第19回年次大会 26 - 29 2013年03月
非計算機的計算に向けて(<特集>編集委員今年の抱負2013)

河原大輔

人工知能学会誌 = Journal of Japanese Society for Artificial Intelligence 28 ( 1 ) 20 - 20 2013年01月

大規模ウェブテキスト集合からの知識獲得とその応用

河原大輔, 黒橋禎夫

Tsubame ESJ. : e-science journal 7 12 - 15 2012年12月

大規模語彙的知識に基づく受身形と能動形の表層格の対応付け

笹野遼平, 河原大輔, 黒橋禎夫, 奥村学

研究報告自然言語処理（NL） 2012 ( 9 ) 1 - 6 2012年07月


We propose a method for automatically acquiring knowledge about case alternations between the passive and active voices. The method focuses on the similarity of case examples and distributions between the corresponding passive and active forms of the same predicate, and leverages large-scale case frames automatically acquired from the Web together with a small number of alternation patterns. We then apply the acquired knowledge to a case alternation task, converting passive sentences into active ones, and show its usefulness.

多様な文書の書き始めに対する意味関係タグ付きコーパスの構築

萩行正嗣, 河原大輔, 黒橋禎夫

情報処理学会第206回自然言語処理研究会 2012年05月
実テキスト解析をささえる語彙知識の自動獲得

柴田知秀, 村脇有吾, 黒橋禎夫, 河原大輔

言語処理学会第18回年次大会 81 - 84 2012年03月
A Framework of Automatic Case Frame Construction From Raw Corpus

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

言語処理学会第18回年次大会 389 - 392 2012年03月
TSUBAKI: An Open Search Engine Infrastructure for Developing Information Access Methodology (特集:情報爆発時代におけるIT基盤技術)

Keiji Shinzato, Tomohide Shibata, Daisuke Kawahara, Sadao Kurohashi

情報処理学会論文誌 52 ( 12 ) 12p 2011年12月


Due to the explosive growth in the amount of information over the last decade, it is becoming extremely hard to obtain necessary information with conventional information access methods. Hence, drastically new technology is needed, and developing it requires search engine infrastructures. Although existing search engine APIs can be regarded as such infrastructures, they have several restrictions, such as a limit on the number of API calls. To help the development of new technology, we are running an open search engine infrastructure, TSUBAKI, on a high-performance computing environment. In this paper, we describe the TSUBAKI infrastructure.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.20 (2012) No.1 (online), DOI http://dx.doi.org/10.2197/ipsjjip.20.216

Automatic Construction of Multilingual Case Frames

Gongye Jin, Daisuke Kawahara, Sadao Kurohashi

情報処理学会関西支部支部大会 2011年09月
単語の共起分布を用いた文末モダリティの自動推定

中村紘規, 玉城伸仁, 河原大輔, 黒橋禎夫

情報処理学会関西支部支部大会 2011年09月
Web上の多彩な言語表現バリエーションに対応した頑健な形態素解析

勝木健太, 笹野遼平, 河原大輔, 黒橋禎夫

言語処理学会第17回年次大会 1003 - 1006 2011年03月
大規模Web情報分析のための分析対象ページの段階的選択

赤峯享, 加藤義清, 川田拓也, レオン末松豊インティ, 河原大輔, 乾健太郎, 黒橋禎夫

言語処理学会第17回年次大会 41 - 44 2011年03月
Treatment of Complex Sentences, Modality and Verbal Structures in Linguistics-Based MT

Alexis Kauffmann, Daisuke Kawahara, Sadao Kurohashi

言語処理学会第17回年次大会 818 - 821 2011年03月
情報分析システム WISDOM のユーザ評価とその分析

川田拓也, 赤峯享, 河原大輔, 加藤義清, 乾健太郎, 黒橋禎夫, 木俵豊

言語処理学会第17回年次大会 45 - 48 2011年03月
言語を獲得する(<特集>編集委員今年の抱負2011)

河原大輔

人工知能学会誌 = Journal of Japanese Society for Artificial Intelligence 26 ( 1 ) 16 - 16 2011年01月

Web情報分析のための大規模Webページの収集・選択・検索

赤峯享, 加藤義清, 河原大輔, レオン末松豊インティ, 新里圭司, 乾健太郎, 黒橋禎夫, 木俵豊

言語処理学会第16回年次大会 238 - 241 2010年03月
Web情報の俯瞰的把握のための主要・対比・対立文の抽出と集約

河原大輔, 乾健太郎, 黒橋禎夫

言語処理学会第16回年次大会 134 - 137 2010年03月
Web ページの情報発信構成の同定

加藤義清, 河原大輔, 乾健太郎, 黒橋禎夫

言語処理学会第16回年次大会 90 - 93 2010年03月
Webページの大規模収集・検索基盤

赤峯享, 加藤義清, 河原大輔, 新里圭司, 乾健太郎, 黒橋禎夫, 木俵豊

情報処理学会研究会 Vol.2009-DBS-148 No.14 2009年07月
節内と節間の整合性をとる構文・格解析

河原大輔, 黒橋禎夫

言語処理学会第15回年次大会 24 - 27 2009年03月
JUMAN/KNPを用いた形態素・構文・格解析

河原大輔, 黒橋禎夫

京都大学学術情報メディアセンター, メディア情報処理専修コース「自然言語処理技術」 2008年09月
Webページの著者の同定

加藤義清, 河原大輔, 乾健太郎, 黒橋禎夫, 柴田知秀

第7回情報科学技術フォーラム(FIT2008) 2008年09月
主要・対立表現の俯瞰的把握－ウェブの情報信頼性分析に向けて

河原大輔, 黒橋禎夫, 乾健太郎

情報処理学会第186回自然言語処理研究会 2008年07月
類似性を用いない並列構造解析

河原大輔, 黒橋禎夫

言語処理学会第14回年次大会 91 - 94 2008年03月
コーパスサイズの拡大および用例の汎化による格フレームのカバレッジの改善

笹野遼平, 河原大輔, 黒橋禎夫

言語処理学会第14回年次大会 528 - 531 2008年03月
Cascaded Classification for High Quality Head-modifier Pair Selection

Kun Yu, Daisuke Kawahara, Sadao Kurohashi

言語処理学会第14回年次大会 95 - 98 2008年03月
分布類似度を用いた大規模格フレームの自動構築

濱田慧, 笹野遼平, 柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第14回年次大会 532 - 535 2008年03月
JUMAN/KNPを用いた形態素・構文・格解析

河原大輔, 黒橋禎夫

京都大学学術情報メディアセンター, メディア情報処理専修コース「自然言語処理技術」 2007年09月
代表表記による自然言語リソースの整備

岡部浩司, 河原大輔, 黒橋禎夫

言語処理学会第13回年次大会 606 - 609 2007年03月
大規模語彙的知識に基づく構文・並列・格構造解析の統合的確率モデル

河原大輔, 黒橋禎夫

言語処理学会第13回年次大会 506 - 509 2007年03月
大規模日本語ウェブ文書を対象とした開放型検索エンジン基盤の構築

新里圭司, 柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第13回年次大会 1117 - 1120 2007年03月
自然言語処理基盤としてのウェブ文書標準フォーマットの提案

新里圭司, 橋本力, 河原大輔, 黒橋禎夫

言語処理学会第13回年次大会 602 - 605 2007年03月
表層的語彙分布に基づく談話/テクストの主観性・主体性分析に向けて

岡本雅史, 河原大輔, 黒橋禎夫

日本認知言語学会論文集第６巻 423 - 432 2006年09月
格フレームを用いたかな表記語の曖昧性解消

岡部浩司, 河原大輔, 黒橋禎夫

言語処理学会第12回年次大会 1115 - 1118 2006年03月
自動獲得した知識に基づく統合的な照応解析

笹野遼平, 河原大輔, 黒橋禎夫

言語処理学会第12回年次大会 480 - 483 2006年03月
Webから獲得した大規模格フレームに基づく構文・格解析の統合的確率モデル

河原大輔, 黒橋禎夫

言語処理学会第12回年次大会 1111 - 1114 2006年03月
高性能計算環境を用いたWebからの大規模格フレーム構築

河原大輔, 黒橋禎夫

情報処理学会自然言語処理研究会 171-12 2006年01月
JUMAN/KNPを用いた形態素解析・構文解析

黒橋禎夫, 河原大輔

京都大学学術情報メディアセンター, メディア情報処理専修コース「自然言語処理技術」 2005年08月
格フレーム辞書の漸次的自動構築

河原大輔, 黒橋禎夫

自然言語処理 = Journal of natural language processing 12 ( 2 ) 109 - 131 2005年03月

日本語辞書整備のための日本語カタカナ複合名詞の自動分割

中澤敏明, 河原大輔, 黒橋禎夫

言語処理学会第11回年次大会 588 - 591 2005年03月
大規模格フレームに基づく構文・格解析の統合的確率モデル

河原大輔, 黒橋禎夫

言語処理学会第11回年次大会 923 - 926 2005年03月
言語情報と映像情報の統合による教示発話の構造解析

柴田知秀, 立木将人, 河原大輔, 岡本雅史, 黒橋禎夫, 西田豊明

言語処理学会第10回年次大会 532 - 535 2004年03月
名詞格フレーム辞書の自動構築とそれを用いた名詞句の関係解析

笹野遼平, 河原大輔, 黒橋禎夫

言語処理学会第10回年次大会 472 - 475 2004年03月
語の大域的多義性解消に基づく省略解析の精度向上

河原大輔, 黒橋禎夫

言語処理学会第10回年次大会 769 - 772 2004年03月
言葉の背後に潜む『問い』の抽出 (ことば工学研究会(第14回)テーマ:ことばと身体性)

松村真宏, 河原大輔, 岡本雅史

ことば工学研究会 14 1 - 7 2003年08月

料理教示発話の構造解析

西田悠介, 柴田知秀, 河原大輔, 岡本雅史, 黒橋禎夫, 西田豊明

言語処理学会第9回年次大会 601 - 604 2003年03月
主題と文章構造の解析に基づくスライドの自動生成

柴田知秀, 河原大輔, 黒橋禎夫

言語処理学会第9回年次大会 597 - 600 2003年03月
自動構築した格フレーム辞書に基づく省略解析の大規模評価

河原大輔, 黒橋禎夫

言語処理学会第9回年次大会 589 - 592 2003年03月
Embodied conversational agents for presenting intellectual multimedia contents

YI Nakano, T Murayama, D Kawahara, S Kurohashi, T Nishida

KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS 2774 1030 - 1036 2003年 [査読有り]


This paper presents an embodied conversational agent (ECA) that presents multimedia contents. The system takes plain text as input and automatically generates a presentation featuring an animated agent. It selects and generates appropriate gestures and facial expressions for a humanoid agent according to linguistic information in the text. As a component of the ECA system, we also present an agent animation system, RISA, which can draw animations of natural human behaviors in web-based applications.
会話型コンテンツを用いた知識流通支援

久保田秀和, 黒橋禎夫, 西田豊明, 河原大輔, 清田陽司

第64回全国大会講演論文集 2002 ( 1 ) 535 - 542 2002年03月

頑健な格解析を実現する格フレーム辞書の自動構築

河原大輔, 黒橋禎夫

言語処理学会第8回年次大会 515 - 518 2002年03月
格関係の比較を用いた複数テキスト間の重複・差分の検出

成松深, 河原大輔, 黒橋禎夫, 西田豊明

言語処理学会第8回年次大会 535 - 538 2002年03月
国語辞典とコーパスを用いた用言の言い換え規則の学習

鍜治伸裕, 河原大輔, 黒橋禎夫, 佐藤理史

言語処理学会第8回年次大会 331 - 334 2002年03月
「関係」タグ付きコーパスの作成

河原大輔, 黒橋禎夫, 橋田浩一

言語処理学会第8回年次大会 495 - 498 2002年03月
用言と直前の格要素の組を単位とする格フレームの自動獲得

河原大輔, 黒橋禎夫

情報処理学会研究報告自然言語処理（NL） 2000 ( 107 ) 127 - 134 2000年11月


This paper describes a method for automatically constructing a case frame dictionary from a raw corpus. First, we parse the corpus and collect high-confidence dependencies from the parse results. Second, to deal with the semantic ambiguity of predicates, we collect examples from the raw corpus using the pair of a predicate and its adjacent case component as the unit, and cluster them. We also report the results of case structure analysis using the constructed case frames.

京都大学自然言語処理ツール

黒橋禎夫, 河原大輔

情報処理学会研究報告自然言語処理（NL） 2000 ( 53 ) 91 - 91 2000年06月


Natural language processing tools developed at the Language Media Lab (formerly the Nagao Lab), Kyoto University, are demonstrated: the Japanese morphological analyzer JUMAN, the Japanese parser KNP, a GUI for correcting the outputs of JUMAN/KNP, and the Kyoto Corpus. JUMAN segments a sentence into words and identifies their parts of speech, based on the Masuoka-Takubo Japanese grammar, the EDR Japanese word dictionary, and its own dictionary of fixed expressions; its accuracy is about 99% in the newspaper domain. KNP first detects the scopes of coordinate structures based on similarity, and then uniquely determines the dependency structure of the sentence using a detailed grammar of Japanese bunsetsu phrases; its accuracy is about 90% in the newspaper domain. The Kyoto Corpus consists of 40,000 sentences from Mainichi newspaper articles, first analyzed automatically by JUMAN and KNP and then corrected manually using the GUI. All these resources are available at http://www-nagao.kuee.kyoto-u.ac.jp/.



現在担当している科目

情報通信基礎　【前年度成績S評価者用】

基幹理工学部

2026年春学期
情報通信基礎

基幹理工学部

2026年春学期
情報理工学実験Ｂ【前年度成績S評価者用】

基幹理工学部

2026年春学期
卒業論文Ｂ

基幹理工学部

2026年秋学期
情報理工学実験Ｂ

基幹理工学部

2026年春学期
自然言語処理

基幹理工学部

2026年春学期
プロジェクト研究Ｂ

基幹理工学部

2026年秋学期
プロジェクト研究Ａ

基幹理工学部

2026年春学期
卒業論文Ｂ（春学期）

基幹理工学部

2026年春学期
卒業論文Ａ　（集中）

基幹理工学部

2026年集中（春・秋学期）
卒業論文Ａ

基幹理工学部

2026年春学期
卒業論文Ａ（秋学期）

基幹理工学部

2026年秋学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年春学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2026年春学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2026年秋学期
卒業論文Ｂ　18前再

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再

基幹理工学部

2026年春学期
情報理工学実験Ａ

基幹理工学部

2026年秋学期
情報理工学実験Ａ　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
プロジェクト研究Ａ

基幹理工学部

2026年春学期
卒業論文Ｂ　18前再

基幹理工学部

2026年秋学期
卒業論文Ｂ（春学期）

基幹理工学部

2026年春学期
卒業論文Ｂ

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年春学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2026年秋学期
卒業論文Ａ　18前再

基幹理工学部

2026年春学期
自然言語処理

基幹理工学部

2026年春学期
プロジェクト研究Ｂ

基幹理工学部

2026年秋学期
情報通信実験Ｂ【前年度成績S評価者用】

基幹理工学部

2026年春学期
情報通信実験Ｂ

基幹理工学部

2026年春学期
離散数学【前年度成績S評価者用】

基幹理工学部

2026年春学期
離散数学

基幹理工学部

2026年春学期
卒業論文Ａ　（集中）

基幹理工学部

2026年集中（春・秋学期）
卒業論文Ａ（秋学期）

基幹理工学部

2026年秋学期
卒業論文Ａ

基幹理工学部

2026年春学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
情報通信実験Ａ　【前年度成績S評価者用】

基幹理工学部

2026年秋学期
情報通信実験Ａ

基幹理工学部

2026年秋学期
情報通信学プログラミング基礎　【前年度成績S評価者用】

基幹理工学部

2026年春学期
情報通信学プログラミング基礎

基幹理工学部

2026年春学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2026年春学期
Graduation Thesis A　(Fall)【For students enrolled before 2022】

基幹理工学部

2026年秋学期
Graduation Thesis B (Spring) [S Grade]

基幹理工学部

2026年春学期
Graduation Thesis B (Fall) [S Grade]

基幹理工学部

2026年秋学期
Graduation Thesis B (Fall)

基幹理工学部

2026年秋学期
Computer Science and Communications Engineering Laboratory A

基幹理工学部

2026年秋学期
Graduation Thesis B (Spring)

基幹理工学部

2026年春学期
Computer Science and Communications Engineering Laboratory A [S Grade]

基幹理工学部

2026年秋学期
Project Research Fall

基幹理工学部

2026年秋学期
Project Research Spring

基幹理工学部

2026年春学期
Introduction to Computers and Networks

基幹理工学部

2026年春学期
Computer Science and Communications Engineering Laboratory B

基幹理工学部

2026年春学期
Graduation Thesis A　(Fall)[S Grade]【For students enrolled before 2022】

基幹理工学部

2026年秋学期
Graduation Thesis A (Spring) [S Grade]

基幹理工学部

2026年春学期
Graduation Thesis A (Fall)

基幹理工学部

2026年秋学期
Graduation Thesis A (Fall) [S Grade]

基幹理工学部

2026年秋学期
Graduation Thesis A (Spring)

基幹理工学部

2026年春学期
Graduation Thesis A　(Spring)[S Grade]【For students enrolled before 2022】

基幹理工学部

2026年春学期
Graduation Thesis A　(Spring)【For students enrolled before 2022】

基幹理工学部

2026年春学期
Master's Thesis (Department of Computer Science and Communications Engineering)

大学院基幹理工学研究科

2026年通年
修士論文（情報・通信）

大学院基幹理工学研究科

2026年通年
自然言語処理演習Ｄ

大学院基幹理工学研究科

2026年秋学期
自然言語処理

大学院基幹理工学研究科

2026年春学期
情報理工・情報通信特別実験B

大学院基幹理工学研究科

2026年秋学期
情報理工・情報通信特別実験A

大学院基幹理工学研究科

2026年春学期
自然言語処理研究

大学院基幹理工学研究科

2026年通年
情報理工・情報通信特別演習Ａ

大学院基幹理工学研究科

2026年春学期
自然言語処理研究

大学院基幹理工学研究科

2026年通年
Seminar on Natural Language Processing D

大学院基幹理工学研究科

2026年秋学期
Seminar on Natural Language Processing C

大学院基幹理工学研究科

2026年春学期
Seminar on Natural Language Processing B

大学院基幹理工学研究科

2026年秋学期
Seminar on Natural Language Processing A

大学院基幹理工学研究科

2026年春学期
Special Laboratory B in Computer Science and Communications Engineering

大学院基幹理工学研究科

2026年秋学期
Special Laboratory A in Computer Science and Communications Engineering

大学院基幹理工学研究科

2026年春学期
Research on Natural Language Processing

大学院基幹理工学研究科

2026年通年
Natural Language Processing

大学院基幹理工学研究科

2026年春学期
自然言語処理演習Ｃ

大学院基幹理工学研究科

2026年春学期
自然言語処理演習Ｂ

大学院基幹理工学研究科

2026年秋学期
自然言語処理演習Ａ

大学院基幹理工学研究科

2026年春学期
情報理工・情報通信特別演習Ｂ

大学院基幹理工学研究科

2026年秋学期


他学部・他研究科等兼任情報

理工学術院大学院基幹理工学研究科

学内研究所・附属機関兼任歴

2024年

-

2026年

理工学術院総合研究所兼任研究員

特定課題制度（学内資金）

計算機の文章読解・生成能力の向上、評価に関する研究

2025年


To improve computers’ text understanding and generation capabilities, we conducted research and development on the training and evaluation of large language models using large-scale text corpora, as well as on application systems built upon them. Regarding the training of large language models, we examined large language model training using synthetic data, constructed a large-scale Japanese vision–language model, and developed a highly efficient process reward model. Regarding evaluation, we built a safety evaluation benchmark for Japanese multi-turn dialogue, a fact-checking framework targeting procedural responses, and a meta-evaluation framework for evaluators of Japanese in-vehicle dialogue. Regarding application systems, we developed an automatic Japanese rap lyrics generation system that controls rhyme and content, as well as a smart home assistant using local language models. We believe that these research and development efforts have advanced research on text understanding and generation by one step.
計算機の文章読解・生成能力向上に関する研究

2024年


To enhance the capabilities of computational text understanding and generation, we conducted research involving the training of large language models (LLMs) on large-scale corpora and the development of application systems based on these models. Regarding LLM training, we focused on integrating knowledge graphs into LLMs, improving knowledge-fusion models using a Mixture of Experts (MoE) approach, automatic quiz generation, and validating the vision-and-language model LongCLIP. The developed application systems included a recognizer for "Honka-dori" (classical poetry allusions) based on waka embeddings, an integrated framework leveraging synthetic data for adapting LLMs to academic domains, and a music generation system employing retrieval-augmented generation with ABC notation. Evaluation studies involved the creation of Japanese datasets related to prompting and the validation of empathy annotations in dialogue contexts. These advancements significantly contribute to the field of computational text understanding and generation.
計算機の文章読解能力向上に関する研究

2023年


To improve the text understanding abilities of computers, we conducted studies on training foundation models using large text corpora and developing and evaluating application systems through fine-tuning these models. For the training of the foundation models, we investigated the impact of filtering methods for large text corpora on downstream tasks, constructed models that learned the syllable count for literature generation, and examined knowledge-integrated models using Mixture of Experts (MoE). For application systems, we developed systems for generating interesting senryu (a type of haiku) and playing word chain games, etc. For evaluation, we automatically constructed a Japanese Winoground dataset for evaluating Japanese multimodal models. Through these research and development efforts, we believe that we have taken a step forward in the study of text understanding by computers.
集合知による注釈付けに基づくデータ駆動型言語理解の変革

2021年


We used the wisdom of crowds to build probabilistically annotated corpora, aiming at a breakthrough in natural language understanding. Using crowdsourcing as the source of collective wisdom, five to ten crowdworkers made annotations for the tasks of syntactic parsing and discourse relation analysis. The collected annotations were then converted to probabilities using the EM algorithm. As a result, we confirmed that the higher the level of a task, the more the probability values of the annotation labels vary. We also verified that the resulting probabilistic multi-label annotations were plausible. In the future, we plan to increase the size of the probabilistically annotated corpora and develop analyzers based on them. This should enable us to dramatically improve the accuracy of natural language analysis and understanding, such as syntactic parsing and anaphora resolution.
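
The conversion of crowd annotations into probabilities can be illustrated with a small sketch. The abstract states only that the EM algorithm was used; the Dawid-Skene-style model below (one confusion matrix per worker), the function name `em_aggregate`, the `(item, worker, label)` input layout, and the smoothing constant are all assumptions made for illustration, not the project's actual implementation. The sketch also assumes item and worker ids are consecutive integers starting at 0 and that every item has at least one vote.

```python
import math


def em_aggregate(votes, n_labels, n_iter=50, smooth=0.01):
    """Turn crowd votes into per-item label probabilities with EM.

    votes: list of (item, worker, label) integer triples (hypothetical layout).
    Returns a list of rows, one per item, each a probability distribution.
    """
    n_items = 1 + max(i for i, _, _ in votes)
    n_workers = 1 + max(w for _, w, _ in votes)

    # Initialise posteriors with raw vote proportions (soft majority vote).
    post = [[0.0] * n_labels for _ in range(n_items)]
    for i, _, l in votes:
        post[i][l] += 1.0
    post = [[p / sum(row) for p in row] for row in post]

    for _ in range(n_iter):
        # M-step: label priors and smoothed per-worker confusion matrices,
        # where conf[w][k][l] estimates P(worker w answers l | true label k).
        prior = [sum(row[k] for row in post) / n_items for k in range(n_labels)]
        conf = [[[smooth] * n_labels for _ in range(n_labels)]
                for _ in range(n_workers)]
        for i, w, l in votes:
            for k in range(n_labels):
                conf[w][k][l] += post[i][k]
        for w in range(n_workers):
            for k in range(n_labels):
                total = sum(conf[w][k])
                conf[w][k] = [c / total for c in conf[w][k]]

        # E-step: recompute posteriors in log space for numerical stability.
        logp = [[math.log(prior[k] + 1e-12) for k in range(n_labels)]
                for _ in range(n_items)]
        for i, w, l in votes:
            for k in range(n_labels):
                logp[i][k] += math.log(conf[w][k][l])
        post = []
        for row in logp:
            m = max(row)
            exp_row = [math.exp(v - m) for v in row]
            z = sum(exp_row)
            post.append([e / z for e in exp_row])
    return post
```

Calling `em_aggregate(votes, n_labels=2)` returns one probability distribution per item. Unlike simple vote counting, the estimate gives less weight to workers whose answers are inconsistent with the inferred labels.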
速報的情報を俯瞰するためのテキストの集約と分析手法に関する研究

2020年


As the coronavirus disease (COVID-19) has been rapidly spreading around the world, there is an increasing need for a system for aggregating immediate information that transcends borders and domains. To build such a system, it is necessary to use natural language processing (NLP) technologies flexibly, such as combining machine translation of multilingual texts with information analysis technology, and mapping information transmitted by experts with social media texts. We have studied an application of NLP technologies for COVID-19 by cooperating with researchers in informatics including NLP. Then, we have developed a system for aggregating COVID-19 information from all over the world. In this system, COVID-19 information is grouped by regions and topics, such as infection status, prevention, medical information, economic policies, and education. Collected multilingual articles are translated into Japanese and English by machine translation and are automatically classified into the topics by the contextualized language model BERT. We hope that this system is useful for many people, and this kind of technology will be used for other future events and disasters.