Updated on 2022/12/02


KIKUCHI, Hideaki
Scopus Paper Info  
Paper Count: 0  Citation Count: 0  h-index: 3

Citation count denotes the number of citations in papers published for a particular year.

Faculty of Human Sciences, School of Human Sciences
Job title

Concurrent Post

  • Faculty of Human Sciences   Graduate School of Human Sciences

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

  • Affiliated organization   Global Education Center

Research Institute

  • 2021

    Center for Data Science   Concurrent Center Member

  • 2021

    Center for Higher Education Studies   Concurrent Center Member


  • Waseda University   Ph.D. (Information Science)

Professional Memberships












Research Areas

  • Intelligent informatics

Research Interests

  • Speech Science, Spoken Dialogue, Human Agent Interaction


  • Vowels in infant-directed speech: More breathy and more variable, but not clearer

    Kouki Miyazawa, Takahito Shinya, Andrew Martin, Hideaki Kikuchi, Reiko Mazuka

    COGNITION   166   84 - 93  2017.09  [Refereed]

    Infant-directed speech (IDS) is known to differ from adult-directed speech (ADS) in a number of ways, and it has often been argued that some of these IDS properties facilitate infants' acquisition of language. An influential study in support of this view is Kuhl et al. (1997), which found that vowels in IDS are produced with expanded first and second formants (F1/F2) on average, indicating that the vowels are acoustically further apart in IDS than in ADS. These results have been interpreted to mean that the way vowels are produced in IDS makes infants' task of learning vowel categories easier. The present paper revisits this interpretation by means of a thorough analysis of IDS vowels using a large-scale corpus of Japanese natural utterances. We will show that the expansion of F1/F2 values does occur in spontaneous IDS even when the vowels' prosodic position, lexical pitch accent, and lexical bias are accounted for. When IDS vowels are compared to carefully read speech (CS) by the same mothers, however, larger variability among IDS vowel tokens means that the acoustic distances among vowels are farther apart only in CS, but not in IDS when compared to ADS. Finally, we will show that IDS vowels are significantly more breathy than ADS or CS vowels. Taken together, our results demonstrate that even though expansion of formant values occurs in spontaneous IDS, this expansion cannot be interpreted as an indication that the acoustic distances among vowels are farther apart, as is the case in CS. Instead, we found that IDS vowels are characterized by breathy voice, which has been associated with the communication of emotional affect. (C) 2017 Elsevier B.V. All rights reserved.



  • Assigning a Personality to a Spoken Dialogue Agent by Behavior Reporting

    Yoshito Ogawa, Hideaki Kikuchi

    NEW GENERATION COMPUTING   35 ( 2 ) 181 - 209  2017.04  [Refereed]

    A method to assign a personality to a spoken dialogue agent is proposed and evaluated. The proposed method assigns a personality by having the agent report on its behavior independently of its interaction with a user, and it attempts to assign complex personalities. For this purpose, we have defined a behavior report dialogue and designed a personality assigning method using behavior reporting. The proposed method consists of three steps: collecting stereotypes between a personality and behavior through a questionnaire, designing the behavior report dialogue from the collected stereotypes, and having the agent report on its behavior at the start of interactions with a user. Experimental results show that the proposed method can assign a personality by repeating the behavior report dialogue (the assigned personality is equivalent to the personality determined by the collected stereotypes) and that reporting behavior influences the assigned personality. In addition, we verified that the proposed method can assign "kind", "judicious" and the five basic personalities defined in the Tokyo University Egogram Second Edition.



  • Turn-taking timing of mother tongue

    Ichikawa Akira, Oohashi Hiroki, Naka Makiko, Kikuchi Hideaki, Horiuchi Yasuo, Kuroiwa Shingo

    Studies in Science and Technology   5 ( 1 ) 113 - 122  2016

  • Humor utterance generation for non-task-oriented dialogue systems

    Shohei Fujikura, Yoshito Ogawa, Hideaki Kikuchi

    HAI 2015 - Proceedings of the 3rd International Conference on Human-Agent Interaction     171 - 173  2015.10

    We propose a humor utterance generation method that is compatible with dialogue systems, to increase the "desire of continuing dialogue". A dialogue system retrieves leading-item: noun pairs from Twitter as knowledge and attempts to select the most humorous reply using word similarity, which reveals that incongruity can be explained by the incongruity-resolution model. We consider the differences among individuals, and confirm the validity of the proposed method. Experimental results indicate that high-incongruity replies are significantly more effective than low-incongruity replies under a limited condition.



  • Constructing the corpus of infant-directed speech and infant-like robot-directed speech

    Ryuji Nakamura, Kouki Miyazawa, Hisashi Ishihara, Ken'ya Nishikawa, Hideaki Kikuchi, Minoru Asada, Reiko Mazuka

    HAI 2015 - Proceedings of the 3rd International Conference on Human-Agent Interaction     167 - 169  2015.10

    The characteristics of the spoken language used to address infants have been eagerly studied as part of language acquisition research. Because infants cannot be experimentally controlled, researchers have tried to reveal the features and roles of infant-directed speech (IDS) by comparing speech directed toward infants with speech directed toward other listeners. However, those other listeners share few characteristics with infants, while infants have many characteristics from which the features of IDS may derive. In this study, to solve this problem, we introduce a new approach that replaces the infant with an infant-like robot whose motions can be controlled and whose appearance closely imitates that of a real infant. We have now recorded both infant-directed and infant-like robot-directed speech and are constructing both corpora. Analysis of these corpora is expected to contribute to the study of infant-directed speech. In this paper, we discuss the contents of this approach and the outline of the corpora.



  • Automatic Estimation of Speaking Style in Speech Corpora Focusing on Speech Transcriptions

    Shen Raymond, Kikuchi Hideaki

    Journal of Natural Language Processing   21 ( 3 ) 445 - 464  2014

    Recent developments in computer technology have allowed the construction and widespread application of large-scale speech corpora. To make data retrieval easier for users of speech corpora, we attempt to characterise the speaking style of speakers recorded in the corpora. We first introduce the three scales for measuring speaking style that were proposed by Eskenazi in 1993. We then construct an estimation model of speaking style from morphological features extracted from speech transcriptions, features that have proven effective in discriminating between styles and identifying authors in the field of natural language processing. More specifically, we randomly choose transcriptions from various speech corpora as text stimuli with which to conduct a rating experiment on speaking style perception. Then, using the features extracted from these stimuli and the rating results, we construct an estimation model of speaking style by multiple regression analysis. Leave-one-out cross-validation shows that, among the three scales of speaking style, the ratings of two scales can be estimated with high accuracy, which demonstrates the effectiveness of our method in the estimation of speaking style.


  • Self Organizing Maps as the Perceptual Acquisition Model:-Unsupervised Phoneme Learning from Continuous Speech-

    MIYAZAWA Kouki, SHIROSE Ayako, MAZUKA Reiko, KIKUCHI Hideaki

    J. SOFT   26 ( 1 ) 510 - 520  2014

    We assume that the self-organizing map (SOM) is adequate as a language acquisition model of the native phonetic system. However, many studies do not consider the quantitative features of the input data (the appearance frequency and the number of frames of each phoneme). Our model is designed to learn the acoustic feature values of natural continuous speech and to estimate the number and boundaries of the vowel categories without explicit instruction. In a simulation, we investigate the relationship between the amount of learning and the accuracy for the vowels in a single Japanese speaker's natural speech. As a result, the recognition accuracy of our model ranges from 5% (/u/) to 92% (/s/).

  • Effect of Schemed Acting Directions on Speech Expressions : Toward the Achievement of Expressive Acted Speech

    MIYAJIMA Takahiro, KIKUCHI Hideaki, SHIRAI Katsuhiko, OKAWA Shigeki

    Journal of the Phonetic Society of Japan   17 ( 3 ) 10 - 23  2013.12

    This paper explains a procedure for enhancing expressiveness in acted speech. We designed our own "format of acting script" with reference to drama theory and created 280 acting scripts. We presented these acting scripts as acting directions to three actresses and collected 840 speech samples. For comparison, using typical emotion words as acting directions, we also collected 160 speech samples from each actress. We then compared the tendencies of various features across data types and speakers, and found that our acting scripts are effective in enhancing the expressiveness of acted speech, both psychologically and acoustically.

  • Humor Utterance Generation Method for Non-task-oriented Dialogue System

    FUJIKURA Shohei, OGAWA Yoshito, KIKUCHI Hideaki

    IEICE technical report. Natural language understanding and models of communication   113 ( 338 ) 29 - 32  2013.11

    In this study, we propose a humor generation method for non-task-oriented dialogue systems using Twitter. We have been aiming to establish design principles for dialogue systems that foster the desire to continue interaction, by analyzing the factors that make users feel that they "want to chat again next time". We confirmed that handling humor is effective for this desire. In this paper, we propose a method by which a dialogue system can automatically generate humor using knowledge extracted from Twitter in the form of modifier-noun pairs and value-predicate clause pairs. In an evaluation experiment, we confirmed that the proposed method can generate humor.


  • Desire of Continuing Interaction with Spoken Dialogue System

    KIKUCHI Hideaki, MIYAZAWA Kouki, OGAWA Yoshito, FUJIKURA Shouhei

    IEICE technical report. Speech   113 ( 220 ) 21 - 26  2013.09

    We aimed to improve the desire to continue interaction with a spoken dialogue system through three case studies of spoken dialogue system construction. System utterances with humor, control of the speech rate of system utterances, and estimation of the user's personality from the user's utterances are effective in improving this desire.


  • E-037 Development of Fatigue Degree Estimation System for Smartphone

    Aoki Yuki, Miyajima Takahiro, Kikuchi Hideaki, Shiomi Kakuichi

      12 ( 2 ) 269 - 270  2013.08


  • J-054 Personality Recognition and Improvement of Dialogue Continuance Desire Using Control of Speech Rate and Speech Interval Length

    Takeya Yuki, Ogawa Yoshito, Kikuchi Hideaki

      12 ( 3 ) 507 - 508  2013.08


  • The Relationship between the Level of Intimacy and Manner of Speech


    Technical report of IEICE. HIP   112 ( 483 ) 109 - 114  2013.03

    In this study, we analyzed how manner of speech changes with the level of intimacy. For our experiments, we collected two kinds of dialogue data: dialogues from an initial meeting, which we called "low intimacy", and dialogues recorded six months later, which we called "high intimacy". Subjects listened to the dialogues of three pairs of speakers and evaluated their impressions of the manner of speech through a questionnaire. Analyzing the results, we extracted four significant factors: "Liveliness", "Pleasantness", "Fluency" and "Speed". Comparing the factor scores for the low intimacy dialogues with those for the high intimacy dialogues, we found similar results for different partners in the low intimacy dialogues, but different factor scores for different partners in the high intimacy dialogues. In particular, the fluency score increased in the dialogues after six months.


  • Effects of an Agent's Feature Grasping on an User's Attachment

    OGAWA Yoshito, HARADA Kaho, KIKUCHI Hideaki

    IEICE technical report. Speech   112 ( 450 ) 35 - 40  2013.02

    In this study, we consider the effects of an agent's grasp of user features on the user's attachment to the agent. Recent HAI studies have examined strategies for getting users to continue using spoken dialogue systems over long periods. In this research, we propose a system that estimates the user's degree of activeness from prosody and accumulates each estimate, together with a correct label derived from the user's response, as training data for subsequent estimation. Our results show that our system produces more stable estimates and that higher estimation accuracy makes users feel a stronger attachment.


  • The construction of an evaluation scale for singing voice of popular music : in Amateur singing voice

    KANATO Ai, KIKUCHI Hideaki

    IEICE technical report. Speech   112 ( 422 ) 49 - 54  2013.01

    In this research, we tried to construct an evaluation scale for the singing voice in popular music. In this paper, we examined the effectiveness of the scale for amateur singing voices and the factors underlying the evaluation of singing voice. As a result, we constructed a scale with 12 words and confirmed its reliability. We also found a characteristic factor in singing voice that differs from speaking voice.


  • Study on System Utterance of Suggestion to Promote User's Acceptance in Driving Environment

    MIYAZAWA Kouki, KAGETANI Takuya, SHEN Raymond, KIKUCHI Hideaki, OGAWA Yoshito, HATA Chihiro, OHTA Katsumi, HOZUMI Hideaki, MITAMURA Takeshi

    Transactions of the Japanese Society for Artificial Intelligence   25 ( 6 ) 723 - 732  2010

    In this study, we aim to clarify the factors that promote a user's acceptance of suggestions from an interactive agent in a driving environment. Our aim is to determine how human beings accept encouragement from interaction partners, and which kinds of dialogue or action control are necessary for designing a car navigation system that makes suggestions and requests to drivers. First, we conducted an experiment to collect dialogues between humans in a driving simulation environment, and then analyzed the drivers' acceptance and evaluation of the navigators. As a result, we found that the presence and reliability of the navigator are strongly related to acceptance of the navigator's suggestions. When navigators sat next to drivers, the drivers' rate of suggestion acceptance rose; however, the drivers' stress also increased. In addition, based on linguistic and acoustic analysis of the navigators' utterances, we identified several points for designing system suggestion utterances that promote user acceptance: expressing the grounds for a suggestion, giving exact numbers, and using a wide pitch range are all strongly related to the acceptance of suggestions.


  • 韻律情報を用いた発話態度認識とその対話システムへの応用

    八木大三, 藤江真也, 菊池英明, 小林哲則

    日本音響学会2005年春季研究発表会講演論文集     65 - 66  2005.03

  • 肯定的/否定的発話態度の認識とその音声対話システムへの応用

    藤江真也, 江尻康, 菊池英明, 小林哲則

    電子情報通信学会論文誌   J88-D-II ( 2 ) 489 - 498  2005.03

  • 早稲田大学eスクールの実践:大学教育におけるeラーニングの展望

    向後千春, 松居辰則, 西村昭治, 浅田匡, 菊池英明, 金群, 野嶋栄一郎

    第11回日本教育メディア学会年次大会発表論文集     45 - 48  2004.10

  • 韻律情報を利用した文章入力システムのための韻律制御モデル

    大久保崇, 菊池英明, 白井克彦

    日本音響学会2004年秋季研究発表会講演論文集     133 - 134  2004.09

  • 音声対話における発話の感情判別

    小林季実子, 菊池英明, 白井克彦

    日本音響学会2004年秋季研究発表会講演論文集     281 - 282  2004.09

  • 日本語話し言葉コーパス

    国立国語研究所    2004.03

  • 韻律情報を用いた肯定的/否定的態度の認識

    八木大三, 藤江真也, 菊池英明, 小林哲則

    日本音響学会2004年春季研究発表会講演論文集     141 - 142  2004.03

  • アイヌ語音声データベース

    早稲田大学語学教育研究所    2004.03

  • Spoken Dialogue System Using Prosody As Para-Linguistic Information

    FUJIE Shinya, YAGI Daizo, MATSUSAKA Yosuke, KIKUCHI Hideaki, KOBAYASHI Tetsunori

    proc. of SP2004(International Conference Speech Prosody,2004)     387 - 390  2004.03

  • Corpus of Spontaneous Japanese: Design, Annotation and XML Representation

    Kikuo Maekawa, Hideaki Kikuchi, Wataru Tsukahara

    International Symposium on Large-scale Knowledge Resources (LKR2004)     19 - 24  2004.03

  • 日本語話し言葉コーパスの音声ラベリング

    菊池英明, 前川喜久雄, 五十嵐陽介, 米山聖子, 藤本雅子

    音声研究   7 ( 3 ) 16 - 26  2003.12

  • 音声対話における韻律を用いた話題境界検出

    大久保崇, 菊池英明, 白井克彦

    電子情報通信学会技術報告   103 ( 519 ) 235 - 240  2003.12

  • パラ言語の理解能力を有する対話ロボット

    藤江真也, 江尻康, 菊池英明, 小林哲則

    情報処理学会音声言語情報処理研究会   2003-SLP-48   13 - 20  2003.10

  • パラ言語情報を用いた音声対話システム

    藤江真也, 八木大三, 菊池英明, 小林哲則

    日本音響学会2003年秋季研究発表会講演論文集     39 - 40  2003.09

  • Use of a large-scale spontaneous speech corpus in the study of linguistic variation

    MAEKAWA Kikuo, KOISO Hanae, KIKUCHI Hideaki, YONEYAMA Kiyoko

    proc. of 15th Int'l Congress of Phonetic Sciences     643 - 644  2003.08

  • Evaluation of the effectiveness of "X-JToBI": A new prosodic labeling scheme for spontaneous Japanese speech

    KIKUCHI Hideaki, MAEKAWA Kikuo

    proc. of 15th Int'l Congress of Phonetic Sciences     579 - 582  2003.08

  • 自発音声コーパスにおけるF0下降開始位置の分析

    籠宮隆之, 五十嵐陽介, 菊池英明, 米山聖子, 前川喜久雄

    日本音響学会2003年春季研究発表会講演論文集     317 - 318  2003.03

  • 『日本語話し言葉コーパス』(CSJ)のXML検索環境

    塚原渉, 菊池英明, 前川喜久雄

    第3回話し言葉の科学と工学ワークショップ講演予稿集     15 - 20  2003.02

  • XMLを利用した『日本語話し言葉コーパス』(CSJ)の整合性検証

    菊池英明, 塚原渉, 前川喜久雄

    第3回話し言葉の科学と工学ワークショップ講演予稿集     21 - 26  2003.02

  • Performance of segmental and prosodic labeling of spontaneous speech

    Kikuchi, H, K. Maekawa

    proc. of the ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition (SSPR2003)     191 - 194  2003.02

  • Recognition of para-linguistic information and its application to spoken dialogue system

    S Fujie, Y Ejiri, Y Matsusaka, H Kikuchi, T Kobayashi


    Human-human interaction in spoken dialogue seems to use not only the linguistic information in the utterances but also additional information that supports the linguistic information. We call this additional information "para-linguistic information". In this paper, we present a method for recognizing attitudes from prosodic information and a method for recognizing head gestures. In the former method, F0 pattern and phoneme alignment are introduced as features in order to recognize two attitudes, "positive" and "negative". In the latter method, a left-to-right HMM is introduced as the probabilistic model and optical flow is introduced as the feature in order to recognize three gestures, "nod", "tilt" and "shake". Experimental results show that these methods are sufficient to recognize the user's attitude as para-linguistic information. Finally, we show a prototype spoken dialogue system that uses para-linguistic information and how this information contributes to efficient conversation.

  • 日本語自発音声韻律ラベリングスキームX-JToBIの能力検証

    菊池英明, 前川喜久雄

    人工知能学会言語・音声理解と対話処理研究会   SIG-SLUD-A202-06   33 - 37  2002.11

  • X-JToBI: An extended J_ToBI for spontaneous speech

    Maekawa, K, H. Kikuchi, Y. Igarashi, J. Venditti

    proc. 7th International Congress on Spoken Language Processing (ICSLP2002)     1545 - 1548  2002.10

  • 自発音声韻律ラベリングスキームX-JToBIによるラベリング精度の検証

    菊池英明, 前川喜久雄

    日本音響学会2002年春季研究発表会講演論文集     259 - 260  2002.09

  • 音声対話における心的状態変化の予測をともなうメタ発話生成機構

    菊池英明, 白井克彦

    情報処理学会論文誌   Vol.43, No.7   2130 - 2137  2002.07

  • 大規模自発音声コーパス『日本語話し言葉コーパス』の仕様と作成

    籠宮隆之, 小磯花絵, 小椋秀樹, 山口昌也, 菊池英明, 間淵洋子, 土屋菜穂子, 斎藤美紀, 西川賢哉, 前川喜久雄

    国語学会2002年度春季大会要旨集     225 - 232  2002.05

  • 日本語自発音声の韻律ラベリング体系: X-JToBI

    前川喜久雄, 菊池英明, 五十嵐陽介

    日本音響学会2002年春季研究発表会講演論文集     313 - 314  2002.03

  • 自発音声に対する音素自動ラベリング精度の検証

    菊池英明, 前川喜久雄

    日本音響学会2001年春季研究発表会講演論文集     97 - 98  2002.03

  • 日本語自発音声の韻律ラベリングスキーム: X-JToBI

    菊池英明, 前川喜久雄, 五十嵐陽介

    第2回話し言葉の科学と工学ワークショップ講演予稿集     19 - 26  2002.02

  • 自発音声に対する音素自動ラベリング精度の検証

    菊池英明, 前川喜久雄

    第2回話し言葉の科学と工学ワークショップ講演予稿集     53 - 58  2002.02

  • X-JToBI: 自発音声の韻律ラベリングスキーム

    前川喜久雄, 菊池英明, 五十嵐陽介

    情報処理学会音声言語情報処理研究会   SLP-39-23   135 - 140  2001.12

  • 自発音声コーパスにおける印象評定とその要因

    籠宮隆之, 槙洋一, 菊池英明, 前川喜久雄

    日本音響学会2001年秋季研究発表会講演論文集     381 - 382  2001.09

  • 多次元心的状態を扱う音声対話システムの構築

    鈴木堅悟, 青山一美, 菊池英明, 白井克彦

    情報処理学会音声言語情報処理研究会   2001-SLP037   13 - 18  2001.06

  • 「日本語話し言葉コーパス」における書き起こしの方法とその基準について

    小磯花絵, 土屋菜緒子, 間淵洋子, 斉藤美紀, 籠宮隆之, 菊池英明, 前川喜久雄

    日本語科学   Vol.9   43 - 58  2001.04

  • 自発音声に対するJ_ToBIラベリングの問題点検討

    菊池英明, 籠宮隆之, 前川喜久雄, 竹内京子

    日本音響学会2001年春季研究発表会講演論文集   2001 ( 1 ) 383 - 384  2001.03


  • 日本語音声への韻律ラベリング


    人工知能学会研究会資料   SIG-SLUD-A003-4   21 - 24  2001.02

  • 音声対話に基づく知的情報検索システム

    菊池英明, 阿部賢司, 桐山伸也, 大野澄雄, 河原達也, 板橋秀一, 広瀬啓吉, 中川聖一, 堂下修二, 白井克彦, 藤崎博也

    情報処理学会音声言語情報処理研究会   2001-SLP-35   85 - 90  2001.02

  • 『日本語話し言葉コーパス』の構築における計算機利用

    前川喜久雄, 菊池英明, 籠宮隆之, 山口昌也, 小磯花絵, 小椋秀

    日本語学, 明治書院     61 - 79  2001

  • 『日本語話し言葉コーパス』の書き起こし基準について

    小磯花絵, 土屋菜穂子, 間淵洋子, 斉藤美紀, 籠宮隆之, 菊池英明, 前川喜久雄

    電子情報通信学会技術報告   NLC2000-56, SP2000-104   55 - 60  2000.12

  • モノローグを対象とした自発音声コーパス:その設計について

    第14回日本音声学会全国大会予稿集    2000.10

  • Overview of an Intelligent System for Information Retrieval Based on Human-Machine Dialogue through Spoken Language

    proc. of Int'l. Conference on Spoken Language Processing    2000.10

  • Modeling of Spoken Dialogue Control for Improvement of Dialogue Efficiency

    proc. of IEEE Int'l. Conference on Systems, Man and Cybernetics    2000.10  [Refereed]

  • Improvement of Dialogue Efficiency by Dialogue Control Model According to Performance of Processes

    proc. of Int'l. Conference on Spoken Language Processing    2000.10

  • Designing a Domain Independent Platform of Spoken Dialogue System

    proc. of Int'l. Conference on Spoken Language Processing    2000.10

  • Controlling Non-verbal Information in Speaker-change for Spoken Dialogue

    proc. of IEEE Int'l. Conference on Systems, Man and Cybernetics    2000.10  [Refereed]

  • 大規模話し言葉コーパスにおける発話スタイルの諸相---書き起こしテキストの分析から---

    日本音響学会2000年秋季研究発表会講演論文集    2000.09

  • 日本語話し言葉コーパスの設計

    音声研究   4 ( 2 )  2000.08

  • 音声対話システム汎用プラットフォームにおける行動管理部の構築

    人工知能学会全国大会(第14回)   6月8日  2000.06

  • 対話効率の向上を目的とした音声対話制御のモデル化

    ヒューマンインタフェース学会誌   Vol.2, No.2  2000.05

  • 音声対話システム汎用プラットフォームの検討

    情報処理学会音声言語情報処理研究会   2000-SLP-30  2000.02

  • 課題遂行対話における対話潤滑語の認定

    人工知能学会誌   Vol.14, No.5  1999.09

  • Improving Recognition Correct Rate of Important Words in Large Vocabulary Speech Recognition

    proc. of Eurospeech    1999.09

  • A Post-Processing of Speech for Hearing Impaired Integrate into Standard Digital Audio Decoders

    proc. of Eurospeech    1999.09

  • Controlling Dialogue Strategy According to Performance of Processes

    proc. of ESCA Workshop Interactive Dialogue in Multi-modal Systems    1999.05

  • 音声対話システムにおける処理性能と対話戦略の関係についての一考察

    日本音響学会講演論文集   pp.109-110  1999.03

  • 人間型ロボットの対話インタフェースにおける発話交替時の非言語情報の制御

    情報処理学会論文誌   Vol.40, No.2  1999.02

  • システムの処理性能を考慮した対話制御方法の検討

    人工知能学会 言語・音声理解と対話処理研究会予稿集   pp.1-6  1999.02

  • Use of Nonverbal Information in Communication between Human and Robot

    Proc. Of International Conference on Spoken Language Processing (ICSLP)   pp.2351 - 2354  1998.12

  • 非言語的現象の分析と対話処理 -電子メール討論

    日本音響学会誌   54, No.11  1998.11

  • Controlling Gaze of Humanoid in Communication with Human

    Proc. Of International Conference on Intelligent Robots and Systems (IROS)   1, pp.255-260  1998.10  [Refereed]

  • 人間型対話インタフェースにおけるまばたき制御の検討

    人工知能学会全国大会論文集   15-14, pp.242-245  1998.06

  • 時間的制約を考慮した対話制御方法の実現方法

    人工知能学会全国大会論文集   37-07, pp.677-678  1998.06

  • 人間とロボットのコミュニケーションにおける非言語情報の利用

    情報処理学会音声言語情報処理研究会資料   21-7, pp.69-74  1998.05

  • 情報のら旋成長を支援するコミュニケーション形電子図書館

    電子情報通信学会論文誌   Vol.J81-D-II, No.5  1998.05

  • Multimodal Communication Between Human and Robot

    Proc. Of International Wireless and Telecommunications Symposium (IWTS)   pp.322-325  1998.05

  • 時間的制約を考慮した対話制御方法の検討

    日本音響学会講演論文集   3-6-16, pp.113-114  1998.03

  • 自由会話における時間的制約の影響の分析

    電子情報通信学会技術研究報告   SP97-55, pp.31-36  1997.10

  • 音声を利用したマルチモーダルインタフェース

    電子情報通信学会誌   80;10, pp.1031-1035  1997.10

  • 複数ユーザとロボットの対話における非言語情報の役割

    日本音響学会講演論文集   3-1-13, pp.111-112  1997.09

  • The Role of Non-Verbal Information in Spoken Dialogue between a Man and a Robot

    International Conference on Speech Processing (ICSP) '97 Proceedings   2, pp.539-544  1997.08

  • ロボットとの対話における非言語情報の役割

    人工知能学会全国大会論文集   21-06, pp.433-436  1997.06

  • 音響学会員のためのインターネット概説

    日本音響学会誌   52巻8号  1996.08

  • User Interface for a Digital Library to Support Construction of a "Virtual Personal Library"

    proc. of ICMCS(International Conference on Multimedia Computing and Systems)    1996.06  [Refereed]

  • 情報処理学会第53回全国大会大会優秀賞


  • ハイパーメディア共有アーキテクチャにおけるバージョン管理方式

    情報処理学会全国大会講演論文集    1996.03

  • ハイパーメディア共有アーキテクチャ

    情報処理学会全国大会講演論文集    1996.03

  • Extensions of World-wide Aiming at the construction of a "Virtual Personal Library"

    proc. of Seventh ACM Conf. on Hypertext    1996.03

  • Multimodal Interface Using Speech and Pointing Gestures, and Its Applications for Interior Design and PC Window Manipulation

    proc. of IWHIT95 (International Workshop on Human Interface Technology 95)    1995.10

  • 音声とペンを入力手段とするマルチモーダルインタフェースの構築

    情報処理学会音声言語情報処理研究会   SLP-7-18  1995.07

  • Agent-typed Multimodal Interface Using Speech, Pointing Gestures, and CG

    proc. of HCI(Human Computer Interaction) International '95    1995.07  [Refereed]

  • 仮想個人図書館の構築を支援するユーザインタフェースの開発

    電子情報通信学会春季大会講演論文集    1995.03

  • 音声・ポインティング・CGによるエージェント型ユーザインタフェースの試作と評価

    第10回ヒューマンインタフェースシンポジウム論文集    1994.10

  • マルチモーダルウインドウシステムの構築

    第10回ヒューマンインタフェースシンポジウム論文集    1994.10

  • 音声・ポインティング・CGによるエージェント型ユーザインタフェースシステム

    電子情報通信学会秋季大会講演論文集    1994.09

  • 音声対話インタフェースにおける発話権管理による割り込みへの対処

    電子情報通信学会論文誌   Vol.J77-D-II, No.8  1994.08

  • 音声対話システムにおける発話権の制御

    電子情報通信学会春季大会講演論文集   Vol.D-108  1993.03

  • Three Different LR Parsing Algorithms for Phoneme-Context-Dependent HMM-Based Continuous Speech Recognition

    IEICE Trans. Inf. & Sys.   Vol.E76-D, No.1  1993.01

  • ナビゲーションシステムにおける音声対話インタフェースの構成

    人工知能学会言語・音声理解と対話処理研究会   SIG-SLUD-9203-3  1992.10

Books and Other Publications

  • "感情の音声表出", 石井克典監修「IoHを指向する感情・思考センシング技術」

    KIKUCHI Hideaki

    CMC出版  2019 ISBN: 9784781314303

  • "音声対話システム", 白井克彦編著「音声言語処理の潮流」

    KIKUCHI Hideaki

    コロナ社  2010


    KIKUCHI Hideaki

    Americas Group Publications,U.S.  2010 ISBN: 0935047727

  • "音声コミュニケーションの分析単位 -ToBI-", 坊農真弓, 高梨克也編,「多人数インタラクションの分析手法」

    KIKUCHI Hideaki

    オーム社  2009

  • "韻律を利用した対話状態の推定", 広瀬啓吉 編「韻律と音声言語情報処理」

    KIKUCHI Hideaki, SHIRAI Katsuhiko

    丸善  2006 ISBN: 4621076744

  • "Voicing in Japanese," Van de Weijer, Nanjo, Nishihara (Eds.)

    MAEKAWA Kikuo, KIKUCHI Hideaki

    Mouton de Gruyter, Berlin and New York  2006

  • "Spoken Language Systems", S. Nakagawa et al. (Eds.)

    HATAOKA Nobuo, ANDO Haru, KIKUCHI Hideaki

    Ohmsha/IOS Press  2005 ISBN: 427490637x

Industrial Property Rights

  • 検索語クラスタリング装置、検索語クラスタリング方法、検索語クラスタリングプログラム及び記録媒体

    白井 克彦, 菊池 英明, 新関 一馬


  • 音声認識装置及び音声認識用プログラム

    白井 克彦, 菊池 英明, 大久保 崇


  • 連続音声認識装置および方法

    白井 克彦, 城崎 康夫, 菊池 英明




Research Projects

  • Real-time MRI database of articulatory movements of Japanese

    Project Year :


  • Development of protocol of effective social work interview for older adults with dementia

    Project Year :


  • バーチャルリアリティ環境におけるオラリティの運用の検討

    Project Year :



  • Elaboration of articulatory phonetics by means of realtime-MRI and WAVE data

    Project Year :


  • コーパス言語学的手法に基づく会話音声の韻律特徴の体系化

    Project Year :



  • Estimation of User's Impression Space for Improving Desire of Interaction with Spoken Dialogue System

    Project Year :


    This study aims to investigate how humans' personal characteristics influence the impressions they form of an agent through human-agent interaction. We conducted an experiment in which subjects interacted with an agent or a human and formed impressions of them. The results showed that subjects who have no programming experience tend to evaluate an agent lower than a human. Subjects in the “high emotional-warmth” group also tend to evaluate an agent lower than a human. In addition, we proposed a humor utterance generation method that is compatible with dialogue systems, to increase the desire of sustainability. Through an experiment, we confirmed the validity of the method: high-incongruity replies were significantly more effective than low-incongruity and random replies. Finally, we confirmed that generating humor utterances is effective for increasing the desire of sustainability in interaction with a dialogue system.

  • Fundamental Study for Conversion between Spoken and Written Japanese Considering Influence of Interactivity

    Project Year :


    Our research clarifies the differences in style among four modes of Japanese sentences. Each of these modes is a combination of two binary distinctions: spoken/written and dialogue/monologue. We developed a method for acquiring sentences in these modes while controlling the issues and stories they cover. The acquired data show that, in the dialogue condition, the differences in style between spoken and written sentences are smaller than in the monologue condition. These results imply that the traits of dialogue, in which the talker is prompted to compose sentences quickly and to pay more attention to the listener in front of him/her, reduce the styles specific to spoken or written language.

  • 音声対話システムに対するインタラクション欲求向上のためのユーザ印象空間の推定

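    Grants-in-Aid for Scientific Research (Waseda University)  Grants-in-Aid for Scientific Research (Grant-in-Aid for Scientific Research (C))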

    Project Year :


  • Preparation of Database and Analysis of Dialectal Difference in Phonetics, Phonology, Grammar and Vocabulary among Dialects of Ainu

    Project Year :


    The differences among the person marking systems of several Ainu dialects were observed, and a prominent difference was found between the Saru and Chitose dialects, which have so far been regarded as very similar. Based on this observation, the general tendency and historical implications of the Ainu person marking system were investigated. Differences among the kinship terms of the Saru and Shizunai dialects were researched using the epic texts of these dialects. The Ainu audio database was experimentally constructed, and records for the database have been accumulated. Tagging of the verbs in the Ainu text records for the database, focusing on the grammatical and semantic role of each morpheme, was attempted and evaluated.

  • On the physical factors that make mother-tongue dialogue smooth: a comparison with the non-mother tongue

    Project Year :



    In dialogue among adult mother-tongue speakers, overlapping utterances are produced at transition-relevance places (TRPs). This phenomenon seems to result from some capability that reduces the mental ‘burden’ of dialogue in the mother tongue. We examined the age at which Japanese mother-tongue speakers acquire this capability. First, the turn-taking timing of mother-tongue and non-mother-tongue speakers was compared. It was found that the non-mother-tongue speakers could not maintain TRPs. Next, the timing of turn-taking between adults and five-year-olds or six-year-olds was examined. A difference from adults was present for five-year-olds but absent for six-year-olds. We therefore found that the capability has already been acquired by the age of six.

  • A Dialogue System Centered on Prosody Control



  • XML documentation of complex annotation on spontaneous speech data


    Annotating spontaneous speech data is a difficult task, but maintaining a large annotated spontaneous speech database and retrieving information from it are even more difficult. We proposed an XML format that can represent nearly all annotation information of the Corpus of Spontaneous Japanese (CSJ). The CSJ is one of the world's largest spontaneous speech databases, with very rich annotation including transcription, POS information, clause boundary information, dependency-structure information, discourse-boundary information, segment labels, intonation labels, and so forth. Our XML format includes 10 layers (starting with the "Talk" element and ending with the "Phone" and "Tone" elements) arranged according to the structure of natural language. Its 208 attributes cover linguistic, paralinguistic, and non-linguistic annotation of the speech data as well as various disfluency phenomena. Some attributes were also introduced to represent the format of the transcription text. We converted all 3302 talks of the CSJ (661 hours, over 7.5 million morphemes) into XML documents and used them for data validation. Information retrieval experiments were also conducted using the XML documents. The XSLT language turned out to give satisfactory performance: retrievals of modest complexity could be performed within 15 to 30 minutes on a PC of ordinary performance (3 GHz CPU with 2 GB of memory). Lastly, we developed a simple GUI-based search tool that helps novice users write XSLT query scripts. The software is written in Java and runs on nearly all PC platforms. The XML documents and the GUI search tool will be publicly available as part of the CSJ in June 2004.

  • Development of new prototypes and models of higher education utilizing broadband networks.


    To obtain substantive outcomes from e-learning courses, the e-learning system, including the learning management system, must facilitate learners' learning. Teachers, coaches, and support staff must also each do their part. Teachers have three types of work: design, management, and evaluation of courses. Designing the detailed course structure is a new and important part of teachers' work. Online coaches, in turn, appear to take on a larger share of the work of supporting the teacher and facilitating classroom activities. Coaches have three types of work: facilitating classroom activities, establishing the classroom atmosphere and standards, and facilitating discussion processes. Many kinds of learning management systems are now available, free or commercial. The minimum functions are video streaming, a bulletin board system, and testing, but these functions should be carefully designed and made more usable to achieve more substantive learning outcomes. As for future learning environments, learners will be able to access their own working spaces directly by opening a web browser.

  • Analysis of effective communication process by using body and artifact for faculty Development


    Analysis of a college lecture class and an elementary-school English class using the Observational System for Instructional Analysis (OSIA), originally developed by Hough and Duncan, yielded the following findings. The OSIA analysis starts with a transcription of class activities and takes the form of a matrix and a timeline.
    1) Compared with the college class, the level of interaction was significantly higher and more student talk was observed in the elementary English class.
    2) A more experienced Assistant Language Teacher (ALT, a native speaker of English) created a higher level of teacher-student interaction in the elementary English class by not simply starting repetition practice but waiting for students' utterances after showing visual materials. This created more energetic class activities with a higher sense of participation.
    3) Richer facial expressions by teachers made the class atmosphere more positive.
    4) The more experienced ALT used short sentences and phrases carefully and effectively as feedback each time students answered.
    5) Classes proceeded effectively and smoothly when appropriate artifacts were used for instruction.

  • Quantitative Analysis of Linguistic Variation Using a Spontaneous Speech Corpus


    The first half of the three-year research project was devoted to deriving new research data from the Corpus of Spontaneous Japanese (CSJ), including:
    1) Prosodic and morphological information of the CSJ-Core (about 44 hours), reorganized for the study of prosodic variation. The new data is encoded as an XML document whose hierarchical structure reflects that of Japanese prosody. The most basic node of the new XML document corresponds to the so-called 'accentual phrase' of Japanese (by Kikuchi).
    2) Word-origin information (i.e., native, Sino-Japanese, borrowing, or a mixture of these) assigned to a total of forty thousand short-unit words recorded in the CSJ (by Ogura).
    3) A phonetic database for the study of variation of the velar nasal in Tokyo Japanese (by Maekawa and Hibiya).
    4) A relational database (RDB) containing all word-form variations observed in the whole CSJ (7.52 million short-unit words) (by Maekawa).
    Based on these data, we analyzed language variation recorded in the CSJ, including:
    1) Devoicing of vowels (by Maekawa & Kikuchi)
    2) Non-lexical lengthening of vowels (by Den)
    3) Moraic nasalization of particles (by Koiso)
    4) Variation of the velar nasal (by Hibiya)
    5) Word-form variation in the whole CSJ (by Maekawa)
    6) Variation in the accentual-phrase-final rising intonation (by Maekawa & Kikuchi)
    7) Variation of morphological features at the end of sentences (by Ogura)
    The results of these studies were presented at international and domestic conferences and reprinted in the 252-page final report of the project.

  • A Study on a framework of spontaneous communication depending on dialogue situation


    In this study, we examined a framework for communication systems that interact with users spontaneously so as to cope with practical dialogue environments. While conventional spoken dialogue systems aim to accomplish specific spoken-dialogue tasks efficiently, the framework for a spontaneous communication system was developed to evolve such systems into ones that spontaneously start and continue spoken dialogues. For this purpose, three studies were conducted: (i) understanding of the dialogue environment, which aims at advanced human recognition and spoken dialogue recognition using image and speech signals; (ii) a spontaneous communication management model, which models how to start, continue, and end spoken dialogues; and (iii) speech generation and motion expression technology, which concerns how to present the system's intentions through utterances or motions. In study (i), human pose estimation using a stereo camera was developed. Combining depth information from the images with the shapes and textures of human bodies and clothes enabled accurate estimation of human poses. In addition, estimation of utterance intention was studied. Using sentence-final characteristics and word N-grams, more accurate estimation of utterance intention was achieved. In study (ii), models for a robot to start, continue, and end communication with a human were developed. For these purposes, we modeled the mental state of the conversational partner. In study (iii), a speech generation technique for laughter was developed. Based on acoustic analyses of human laughter, synthesis of both laughter and laughter-speech was realized. Through these three studies, a basic framework for spontaneous communication was established.

  • Analysis of infant-directed speech by acoustic and computational modeling methods.


    The goal of the present project was to investigate how infant-directed speech (IDS) differs from adult-directed speech (ADS), and what role the properties of IDS may play in infants' phonological acquisition. In particular, we focused on the acquisition of vowel categories. For vowel categories based on quality differences (i.e., /a/, /i/, /u/, /e/, /o/), SOM models developed for ADS were able to learn similar categories from IDS as well. The distinction between short and long vowels, however, turned out to be particularly challenging. This was because the actual duration of vowels varied widely regardless of whether they were phonologically short or long. We are continuing our research into what additional information a model needs to acquire the long-short vowel distinction.


Specific Research

  • Refining the Expressive Power of Sound Symbolism



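    In this study, stimuli are created and presented using engineering methods, and the results are interpreted through psychological statistical processing, in order to refine the expressive power of sound symbolism. Specifically, we clarify differences in the expressive power of sound symbolism by characterizing participants in terms of their level of abstraction ability. We conducted impression-rating experiments on sound symbolism and examined whether differences in psychological distance could arise. In the impression-rating experiments, Scheffé's method of paired comparisons was used to compare the distances on a psychological scale between the selected auditory and visual stimuli. The results showed that, for native Japanese speakers as well, round-sounding names tend to be strongly associated with bouba faces and angular-sounding names with kiki faces.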

  • Development of Annotation Criteria and Automatic Estimation for Assigning Speaking-Style Attributes to Spoken Language Corpora

    2020   沈 睿



  • Improving the Desire for Interaction by Controlling Spoken-Language Features of Spoken Dialogue System Utterances



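    We aim to develop technology that changes the impression given to users by controlling the system utterances of a spoken dialogue system. This Specific Research project focused on basic research into a method of giving an anthropomorphized system a personality through self-disclosure. Through experiments, we confirmed that a specific personality can be conveyed depending on the amount and content of self-disclosure. This result was published as a refereed paper in the Transactions of the Human Interface Society ("Giving Personality to a Spoken Dialogue Agent through Self-Disclosure"). We also examined a technique for automatically generating humorous utterances from microblogs ("Automatic Generation of Humorous Utterances Using Microblogs in a Non-Task-Oriented Dialogue System") and a method for controlling speech rate or pause length ("Improving Personality Perception and the Desire to Continue Dialogue by Controlling the Speech Rate and Pause Length of Robot Utterances"), and investigated experimentally how each changes the impression users form of the system. Both results were presented at domestic conferences. We further confirmed experimentally that the attachment users feel can be changed by estimating the user's mental state from an analysis of the prosody of user utterances and changing the system's behavior accordingly; this result was presented at an international conference ("Effects of an Agent Feature Comprehension on the Emotional Attachment of Users"). In all of these studies, we confirmed that these methods of changing users' impressions increase the desire to continue interacting with a spoken dialogue system. We summarized these findings and presented them at a domestic conference ("Desire to Continue Interaction with Spoken Dialogue Systems"). To systematize this series of results and develop more widely applicable technology, it became necessary to clarify the relationship between the desire to continue interaction and the user impression space. Accordingly, following this Specific Research project, we applied for a fiscal 2014 Grant-in-Aid for Scientific Research (C) on the theme "Estimation of User's Impression Space for Improving Desire of Interaction with Spoken Dialogue System", and the application was accepted.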

  • Advancing Emotion Estimation Technology through Model Training with Biological Information as the Supervisory Signal



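    This study proposes introducing a psychophysiological approach, in which models are trained with biological information as the supervisory signal, in order to estimate a speaker's mental state from speech with higher accuracy. Conventional emotion estimation is undeniably subjective, because ratings based on the experimenter's judgment are used as the supervisory signal during model training. Its targets have also been limited mainly to basic emotions. Biological information is considered to be free of intentional manipulation and able to capture changes over time. Using biological signals as the first stage of the experiment should therefore make it possible to target diverse and continuous changes in feeling, and to make more objective and accurate judgments. We prepared two read-aloud tasks of different difficulty and observed whether differences in biological-signal responses between the tasks also appear as differences in speech. We compared the speech of subjects judged to be under stress by the experimenter's subjective evaluation with the speech of those among them who could also be judged to be under stress from changes in their biological signals. As biological signals, we used blood volume pulse (BVP), electrocardiogram (EKG), skin temperature (TEMP), and skin conductance (SC), which were considered usable for estimating mental state. For the speech comparison, we extracted nine features from each utterance, namely the maximum, minimum, amplitude, and mean of F0 and of power, plus speech rate, and used them for decision-tree learning. The C4.5 algorithm was used for decision-tree learning, and evaluation was performed by cross-validation. The classification rate averaged 63.9% when the model was trained on all the data (stress judged only by the experimenter's subjective evaluation), and improved to an average of 77.8% when the model was trained on the selected data (stress judged from changes in biological signals in addition to subjective evaluation). The results suggest that biological signals can serve as an indicator for judging stress. This experiment demonstrated the usefulness of biological information when estimating mental state from speech.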
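
    As a rough illustration of the nine-feature vector and the cross-validation split mentioned in the summary above, the following Python sketch may help. It is not the project's code. The function names, the reading of "amplitude" as max minus min, and morae per second as the speech-rate unit are all assumptions made for illustration. The C4.5 learner itself is left out; any decision-tree implementation could be trained on these vectors.

```python
from statistics import mean


def contour_stats(values):
    """Return (max, min, range, mean) of a non-empty contour.

    The range (max - min) stands in for "amplitude"; this is an assumption.
    """
    hi, lo = max(values), min(values)
    return hi, lo, hi - lo, mean(values)


def extract_features(f0_hz, power_db, n_morae, duration_s):
    """Build a nine-element feature vector for one utterance.

    f0_hz:      F0 values of voiced frames (Hz)
    power_db:   frame-level power values (dB)
    n_morae, duration_s: used for speech rate in morae per second
                (hypothetical unit, not stated in the source)
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    speech_rate = n_morae / duration_s
    return [*contour_stats(f0_hz), *contour_stats(power_db), speech_rate]


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

    Each utterance would be turned into one vector with extract_features, and k_fold_indices would then decide which vectors train the tree and which are held out in each round.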

Overseas Activities

  • Advancing Analysis Techniques for Emotion, Evaluation, and Attitude in Spoken Language


    USA   The Ohio State University

    China   Peking University


