Updated on 2024/12/21

KOBAYASHI, Tetsunori
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor
Degree
Dr. Eng.
Profile
He is interested in research on computer-human interaction using speech and image processing, intelligent robots, speech production and perception, and interface development paradigms.

Research Experience

  • 1997.04
    -
    Now

    Waseda University   Professor

  • 2018.11
    -
    2020.09

    Waseda University   Center for Research Strategy   Director

  • 2004.04
    -
    2009.03

    NHK Science & Technology Research Laboratories   Visiting Researcher

  • 2000.04
    -
    2002.03

    ATR Spoken Language Translation Research Laboratories   Visiting Researcher

  • 1994.07
    -
    1995.08

    MIT   Laboratory for Computer Science   Visiting Researcher

  • 2020.04
    -
    2020.09

Waseda University   Research Council   Chair

  • 2014.11
    -
    2016.09

Waseda University   Institute for Advanced Study   Deputy Director

  • 2010.09
    -
    2016.09

    Waseda University   Research Institute for Science and Engineering   Deputy Director

  • 2007.04
    -
    2014.03

    Waseda University   Dept. Computer Science and Engineering (due to change of department name)

  • 2004.04
    -
    2007.03

    Waseda University   Dept. Computer Science   Professor

  • 2003.04
    -
    2004.03

Waseda University   Dept. Computer Science (due to department reorganization)

  • 1997.04
    -
    2003.03

    Waseda University   Dept. Electrical, Electronics and Computer Engineering   Professor

  • 1996.04
    -
    1997.03

Waseda University   Dept. Electrical, Electronics and Computer Engineering (due to change of department name)   Associate Professor

  • 1991.04
    -
    1996.03

    Waseda University   Department of Electrical Engineering, School of Science and Engineering

  • 1987.04
    -
    1991.03

    Hosei University

  • 1990.07
    -
    1990.09

    Auditory and Visual Perception Research Laboratories   Invited Researcher

  • 1985.04
    -
    1987.03

    Hosei University   Dept. Electrical Engineering   Lecturer

Education Background

  • 1985.03
    -
     

    Waseda University   Graduate School of Science and Engineering  

  • 1982.03
    -
     

    Waseda University   Graduate School of Science and Engineering  

  • 1980.03
    -
     

    Waseda University   School of Science and Engineering   Department of Electrical Engineering  

Committee Memberships

  • 2004.05
    -
    2006.05

The Institute of Electronics, Information and Communication Engineers (IEICE)   Journal Editorial Committee   Special Editorial Secretary

  • 2001.04
    -
    2004.03

Information Processing Society of Japan (IPSJ)   Special Interest Group on Spoken Language Processing   Chair

  • 1998.04
    -
    2002.03

Association for Natural Language Processing   Director

Professional Memberships

  • 1994
    -
    Now

    THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING

  • 1987
    -
    Now

    THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE

  • 1983
    -
    Now

    INFORMATION PROCESSING SOCIETY OF JAPAN

  • 1980
    -
    Now

    ACOUSTICAL SOCIETY OF JAPAN

  • 1980
    -
    Now

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS

  • 2003
    -
    Now

    Language Resources Association

  • 1996.10
    -
    Now

    ACM

  • 1989.01
    -
    Now

    THE ROBOTICS SOCIETY OF JAPAN

  • 1984
    -
    Now

    IEEE

Research Areas

  • Intelligent robotics / Perceptual information processing

Research Interests

  • Pattern Recognition

  • Image Processing

  • Spoken Language Processing

  • Conversational Robot

Awards

  • Fellow

    2023.03   Institute of Electronics, Information and Communication Engineers (IEICE)   Contributions to research on multi-modal multi-party conversations using robots

  • Fellow

    2016.06   Information Processing Society of Japan   Contributions to pioneering research on robot conversation and contributions to the revitalization of the research community

  • American Publishers Awards for Professional and Scholarly Excellence

    2008   Springer handbook of robotics

  • Best Paper Award

    2001   Institute of Electronics, Information and Communication Engineers (IEICE)   Conversation Robot Participating in Group Conversation

Winner: Yosuke Matsusaka, Tsuyoshi Tojo, Tetsunori Kobayashi

  • Award for Academic Startups

    2024.08   Japan Science and Technology Agency  

  • SIG Research Award

    2019   Japanese Society for Artificial Intelligence (JSAI)   Proposal of a Web Novel Recommendation System Capable of Recommending Hidden Gems

  • Best Poster Award

    2016.12   ACM SIGGRAPH VRCAI2016   Video Semantic Indexing using Object Detector

    Winner: Kazuya Ueki, Tetsunori Kobayashi

  • SIG Research Award

    2015   Japanese Society for Artificial Intelligence (JSAI)   Passivity and Activity in Information Access: Accessing News Articles through Spoken Dialogue

  • SIG Research Award

    2012   Japanese Society for Artificial Intelligence (JSAI)   Spontaneous Action Timing Detection and Utterance Strategies for Activating Multi-Party Conversation

  • Outstanding Research Award

    2012   HAI-2012

  • SIG Research Award

    2011   Japanese Society for Artificial Intelligence (JSAI)   Utterance Timing Control Based on Speech Expectation and Willingness

  • Best Paper

    2008   IEEE BTAS2008 (International Conference on Biometrics: Theory, Applications and Systems)   Class distance weighted locality preserving projection for automatic age estimation

Winner: Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi

  • SIG Research Award

    2008   Japanese Society for Artificial Intelligence (JSAI)   Development of a Robot That Supports Activation of Human-Human Communication

Media Coverage

  • Gaze and Gestures That Can "Read the Room"

    Newspaper, magazine

    Yomiuri Shimbun  

    2012.04

  • OKI and Waseda Jointly Develop a Low-Latency Sound Source Separation Module

    Newspaper, magazine

    Author: Other  

    Nikkei BP,   Nikkei Electronics,  

    2009.05

  • Conversation Robot

    TV or radio program

    NHK,   Science ZERO  

    2006.01

  • Conversation Robot

    TV or radio program

    Discovery Channel Canada  

    2005.03

  • Age and Gender Estimation System

    TV or radio program

    TV Tokyo   World Business Satellite  

    2004.03

  • Conversation Robot

    TV or radio program

    NHK,   Close-up Gendai  

    2000.01

  • Conversation Robot ROBITA

    TV or radio program

    NHK   Science Eye  

    1999.12

  • Conversation Robot ROBITA

    TV or radio program

    TBS   Tetsuya Chikushi and Takashi Tachibana's "Hito no Tabi, Hito e no Tabi"  

    1999.05

  • Picking Up Only Voices amid Noise: OKI's Noise-Canceling Speech Recognition Technology for Smartphones

    Newspaper, magazine

    Nihon Keizai Shimbun  

    2012.11

  • Sound Source Separation Technology

    TV or radio program

    TV Tokyo   World Business Satellite, "Trend Tamago"  

    2008.11

  • Sound Source Separation Technology

    Newspaper, magazine

    Nikkei Sangyo Shimbun  

    2008.11

  • Conversation Robot

    TV or radio program

    VARA (Dutch broadcaster)  

    2006.01

  • NEC Soft Automatically Analyzes Store Visitors: Age and Gender Estimation Developed with Waseda

    Newspaper, magazine

    Nikkei MJ  

    2004.05

  • Konami and Waseda Jointly Develop CG

    Newspaper, magazine

    Nihon Keizai Shimbun  

    2004.02

  • Dialogue That Reads Human Emotions

    Newspaper, magazine

    Nihon Keizai Shimbun  

    2004.01

  • Waseda Develops "Conversational Robot" That Understands and Responds to Ambiguous Language

    Newspaper, magazine

    Nikkei Sangyo Shimbun  

    2004.01

  • Estimating Customers' Age and Gender: Waseda and NEC Soft

    Newspaper, magazine

    Nikkei Sangyo Shimbun  

    2003.10

  • Conversation Robot

    TV or radio program

    TV Tokyo   "Kenja no Money"  

    2003.06

  • Start with Toys: Enhancing Recognition to Diversify Interaction with Users

    Newspaper, magazine

    Nikkei BP,   Nikkei Electronics,  

    No. 747, pp. 133-134,  

    1999.07

  • Another World Cup

    TV or radio program

    TV Aichi  

    1999.06

  • Special Feature: The Arrival of Kansei (Affective) Robots

    Newspaper, magazine

    Nikkan Kogyo Shimbun,   Trigger,  

    Vol. 18, No. 5, pp. 24-26,  

    1999.05

  • Conversation Robot

    TV or radio program

    TV Asahi,   Shukan Chikyu TV "Robot Special",  

    1998.01

  • Gesture Recognition System

    TV or radio program

    Tokyo Metropolitan Television   "The Day Robots Become Astro Boy"  

    1997.09

  • Japan's World-Leading Robot Technology

    Newspaper, magazine

    Nihon Kogyo Shimbun  

    1997.04


Papers

  • Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction

    Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

    2022 IEEE Spoken Language Technology Workshop (SLT)    2023.01  [Refereed]  [International journal]

    Authorship:Last author

    DOI

  • BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

    Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Findings of the Association for Computational Linguistics: EMNLP 2022     5486 - 5503  2022  [Refereed]  [International coauthorship]

  • Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation

    Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi

    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   25 ( 3 ) 637 - 650  2017.03  [Refereed]

    Authorship:Last author

     View Summary

    We propose a blind source separation method that yields high-quality speech with low distortion. Time-frequency (TF) masking can effectively reduce interference, but it produces nonlinear distortion. By contrast, linear filtering using a separation matrix such as independent vector analysis (IVA) can avoid nonlinear distortion, but the separation performance is reduced under reverberant conditions. The tandem connectionist approach combines several separation methods and it has been used frequently to compensate for the disadvantages of these methods. In this study, we propose associative memory model (AMM)-based linear filtering and a tandem connectionist framework, which applies TF masking followed by linear filtering. By using AMM trained with speech spectra to optimize the separation matrix, the proposed linear filtering method considers the properties of speech that are not considered explicitly in IVA, such as the harmonic components of spectra. TF masking is applied in the proposed tandem connectionist framework to reduce unwanted components that hinder the optimization of the separation matrix, and it is approximated by using a linear separation matrix to reduce nonlinear distortion. The results obtained in simultaneous speech separation experiments demonstrate that although the proposed linear filtering method can increase the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR) compared with IVA, the proposed tandem connectionist framework can obtain greater increases in SDR and SIR, and it reduces the phoneme error rate more than the proposed linear filtering method.

    DOI

    Citations (Scopus): 4

  • Four-participant group conversation: A facilitation robot controlling engagement density as the fourth participant

    Yoichi Matsuyama, Iwao Akiba, Shinya Fujie, Tetsunori Kobayashi

    Computer Speech and Language   33 ( 1 ) 1 - 24  2015.09  [Refereed]

    Authorship:Last author

  • Conversational Robots: An Approach to conversation protocol issues that utilizes the paralinguistic information available in a robot-human setting.

    Tetsunori Kobayashi, Shinya Fujie

    Acoustical Science and Technology   34 ( 2 ) 64 - 72  2013.03  [Refereed]  [Invited]

    Authorship:Lead author

  • Conversation robot participating in group conversation

    Y Matsusaka, T Tojo, T Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E86D ( 1 ) 26 - 36  2003.01  [Refereed]  [Invited]

    Authorship:Last author

     View Summary

    We developed a conversation system which can participate in a group conversation. Group conversation is a form of conversation in which three or more participants talk to each other about a topic on an equal footing. Conventional conversation systems have been designed under the assumption that each system merely talked with only one person. Group conversation is different from these conventional systems in the following points. It is necessary for the system to understand the conversational situation such as who is speaking, to whom he is speaking, and also to whom the other participants pay attention. It is also necessary for the system itself to try to affect the situation appropriately. In this study, we realized the function of recognizing the conversational situation, by combining image processing and acoustic processing, and the function of working on the conversational situation utilizing facial and body actions of the robot. Thus, a robot that can join in the group conversation was realized.

  • Hierarchical Multi-Task Learning with CTC and Recursive Operation

    Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Interspeech 2024     2855 - 2859  2024.09

    DOI

  • Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture

    Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06

    DOI

  • BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder

    Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06

    DOI

  • Intermpl: Momentum Pseudo-Labeling With Intermediate CTC Loss

    Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06

    DOI

  • Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization

    Yusuke Fujita, Tetsuji Ogawa, Tetsunori Kobayashi

    IEEE Access   11   140069 - 140076  2023

    DOI

  • PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

    Ryo Yanagisawa, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    2022 IEEE International Conference on Big Data (Big Data)    2022.12  [Refereed]

    DOI

  • Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision

    Masato Takatsuka, Tetsunori Kobayashi, Yoshihiko Hayashi

    Proceedings of the 29th International Conference on Computational Linguistics     6151 - 6164  2022.10  [Refereed]

  • Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation

    Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

    Interspeech 2022    2022.09  [Refereed]  [International journal]

    Authorship:Last author

    DOI

  • Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

    Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2022.05  [Refereed]

    Authorship:Last author

    DOI

  • Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models

    Naohiro TAWARA, Atsunori OGAWA, Tomoharu IWATA, Hiroto ASHIKAWA, Tetsunori KOBAYASHI, Tetsuji OGAWA

    IEICE Transactions on Information and Systems   E105.D ( 1 ) 150 - 160  2022.01  [Refereed]

    DOI

  • Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation

    Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

    Interspeech 2021    2021.08  [Refereed]  [International journal]

    Authorship:Last author

    DOI

  • Timing generating networks: Neural network based precise turn-taking timing prediction in multiparty conversation

    Shinya Fujie, Hayato Katayama, Jin Sakuma, Tetsunori Kobayashi

    Interspeech 2021     3771 - 3775  2021.08  [Refereed]

    Authorship:Last author

  • Improved Mask-CTC for Non-Autoregressive End-to-End ASR

    Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2021.06  [Refereed]

    Authorship:Last author

    DOI

  • Noise-robust attention learning for end-to-end speech recognition

    Yosuke Higuchi, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

    European Signal Processing Conference   2021-   311 - 315  2021.01

     View Summary

    We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have adopted a combination of recurrent neural networks and attention mechanisms to achieve direct speech-to-text translation. In the real-world environment, however, noisy conditions make it difficult for the attention mechanisms to estimate the accurate alignment between the input speech frames and output characters, leading to the degradation of the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL) which explicitly tells the attention mechanism where to “listen at” in a sequence of noisy speech features. Specifically, we train the attention weights estimated from a noisy speech to approximate the weights estimated from a clean speech. The experimental results based on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.

    DOI

    Citations (Scopus): 4

  • Investigation of network architecture for single-channel end-to-end denoising

    Takuya Hasumi, Tetsunori Kobayashi, Tetsuji Ogawa

    European Signal Processing Conference   2021-   441 - 445  2021.01

     View Summary

    This paper examines the effectiveness of a fully convolutional time-domain audio separation network (Conv-TasNet) on single-channel denoising. Conv-TasNet, which has a structure to explicitly estimate a mask for encoded features, has shown to be effective in single-channel sound source separation in noise-free environments, but it has not been applied to denoising. Therefore, the present study investigates a method of learning Conv-TasNet for denoising and clarifies the optimal structure for single-channel end-to-end modeling. Experimental comparisons conducted using the CHiME-3 dataset demonstrate that Conv-TasNet performs well in denoising and yields improvements in single-channel end-to-end denoising over existing denoising autoencoder-based modeling.

    DOI

    Citations (Scopus): 2

  • Personalized Extractive Summarization for a News Dialogue System

    Hiroaki Takatsu, Mayu Okuda, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi

    2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings     1044 - 1051  2021.01

     View Summary

    In modern society, people's interests and preferences are diversifying. Along with this, the demand for personalized summarization technology is increasing. In this study, we propose a method for generating summaries tailored to each user's interests using profile features obtained from questionnaires administered to users of our spoken-dialogue news delivery system. We propose a method that collects and uses the obtained user profile features to generate a summary tailored to each user's interests, specifically, the sentence features obtained by BERT and user profile features obtained from the questionnaire result. In addition, we propose a method for extracting sentences by solving an integer linear programming problem that considers redundancy and context coherence, using the degree of interest in sentences estimated by the model. The results of our experiments confirmed that summaries generated based on the degree of interest in sentences estimated using user profile information can transmit information more efficiently than summaries based solely on the importance of sentences.

    DOI

    Citations (Scopus): 4

  • Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue

    Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi

    2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings     629 - 635  2021.01

     View Summary

    This paper analyzes the effectiveness of different modalities in automated speaking proficiency scoring in an online dialogue task of non-native speakers. Conversational competence of a language learner can be assessed through the use of multimodal behaviors such as speech content, prosody, and visual cues. Although lexical and acoustic features have been widely studied, there has been no study on the usage of visual features, such as facial expressions and eye gaze. To build an automated speaking proficiency scoring system using multi-modal features, we first constructed an online video interview dataset of 210 Japanese English-learners with annotations of their speaking proficiency. We then examined two approaches for incorporating visual features and compared the effectiveness of each modality. Results show the end-to-end approach with deep neural networks achieves a higher correlation with human scoring than one with handcrafted features. Modalities are effective in the order of lexical, acoustic, and visual features.

    DOI

    Citations (Scopus): 8

  • Deep Speech Extraction with Time-Varying Spatial Filtering Guided by Desired Direction Attractor

    Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2020-   671 - 675  2020.05

     View Summary

    In this investigation, a deep neural network (DNN) based speech extraction method is proposed to enhance a speech signal propagating from the desired direction. The proposed method integrates knowledge based on a sound propagation model and the time-varying characteristics of a speech source, into a DNN-based separation framework. This approach outputs a separated speech source using time-varying spatial filtering, which achieves superior speech extraction performance compared with time-invariant spatial filtering. Given that the gradient of all modules can be calculated, back-propagation can be performed to maximize the speech quality of the output signal in an end-to-end manner. Guided information is also modeled based on the sound propagation model, which facilitates disentangled representations of the target speech source and noise signals. The experimental results demonstrate that the proposed method can extract the target speech source more accurately than conventional DNN-based speech source separation and conventional speech extraction using time-invariant spatial filtering.

    DOI

    Citations (Scopus): 5

  • Exploring and exploiting the hierarchical structure of a scene for scene graph generation

    Ikuto Kurosawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    Proceedings - International Conference on Pattern Recognition     1422 - 1429  2020

     View Summary

    The scene graph of an image is an explicit, concise representation of the image; hence, it can be used in various applications such as visual question answering or robot vision. We propose a novel neural network model for generating scene graphs that maintain global consistency, which prevents the generation of unrealistic scene graphs; the performance in the scene graph generation task is expected to improve. Our proposed model is used to construct a hierarchical structure whose leaf nodes correspond to objects depicted in the image, and a message is passed along the estimated structure on the fly. To this end, we aggregate features of all objects into the root node of the hierarchical structure, and the global context is back-propagated to the root node to maintain all the object nodes. The experimental results on the Visual Genome dataset indicate that the proposed model outperformed the existing models in scene graph generation tasks. We further qualitatively confirmed that the hierarchical structures captured by the proposed model seemed to be valid.

    DOI

  • Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

    Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2020-   3655 - 3659  2020

     View Summary

    We present Mask CTC, a novel non-autoregressive end-to-end automatic speech recognition (ASR) framework, which generates a sequence by refining outputs of the connectionist temporal classification (CTC). Neural sequence-to-sequence models are usually autoregressive: each output token is generated by conditioning on previously generated tokens, at the cost of requiring as many iterations as the output length. On the other hand, non-autoregressive models can simultaneously generate tokens within a constant number of iterations, which results in significant inference time reduction and better suits end-to-end ASR model for real-world scenarios. In this work, Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC. During inference, the target sequence is initialized with the greedy CTC outputs and low-confidence tokens are masked based on the CTC probabilities. Based on the conditional dependence between output tokens, these masked low-confidence tokens are then predicted conditioning on the high-confidence tokens. Experimental results on different speech recognition tasks show that Mask CTC outperforms the standard CTC model (e.g., 17.9% → 12.1% WER on WSJ) and approaches the autoregressive model, requiring much less inference time using CPUs (0.07 RTF in Python implementation). All of our codes are publicly available at https://github.com/espnet/espnet.

    DOI

    Citations (Scopus): 78

  • Mentoring-reverse mentoring for unsupervised multi-channel speech source separation

    Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2020-   86 - 90  2020

     View Summary

    Mentoring-reverse mentoring, which is a novel knowledge transfer framework for unsupervised learning, is introduced in multi-channel speech source separation. This framework aims to improve two different systems, which are referred to as a senior and a junior system, by mentoring each other. The senior system, which is composed of a neural separator and a statistical blind source separation (BSS) model, generates a pseudo-target signal. The junior system, which is composed of a neural separator and a post-filter, was constructed using teacher-student learning with the pseudo-target signal generated from the senior system i.e, imitating the output from the senior system (mentoring step). Then, the senior system can be improved by propagating the shared neural separator of the grown-up junior system to the senior system (reverse mentoring step). Since the improved neural separator can give better initial parameters for the statistical BSS model, the senior system can yield more accurate pseudo-target signals, leading to iterative improvement of the pseudo-target signal generator and the neural separator. Experimental comparisons conducted under the condition where mixture-clean parallel data are not available demonstrated that the proposed mentoring-reverse mentoring framework yielded improvements in speech source separation over the existing unsupervised source separation methods.

    DOI

    Citations (Scopus): 11

  • Efficient Human-In-The-Loop Object Detection using Bi-Directional Deep SORT and Annotation-Free Segment Identification.

    Koki Madono, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference(APSIPA)     1226 - 1233  2020

  • Waseda meisei at TRECVID 2018: Ad-hoc video search

    Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

    2018 TREC Video Retrieval Evaluation, TRECVID 2018    2020

     View Summary

    The Waseda Meisei team participated in the TRECVID 2018 Ad-hoc Video Search (AVS) task [1]. For this year's AVS task, we submitted both manually assisted and fully automatic runs. Our approach focuses on the concept-based video retrieval, based on the same approach as last year. Specifically, it improves on the word-based keyword extraction method presented in last year's system, which could neither handle keywords related to motion nor appropriately capture the meaning of phrases or whole sentences in queries. To deal with these problems, we introduce two new measures: (i) calculating the similarity between the definition of a word and an entire query sentence, (ii) handling of multi-word phrases. Our best manually assisted run achieved a mean average precision (mAP) of 10.6%, which was ranked the highest among all submitted manually assisted runs. Our best fully automatic run achieved an mAP of 6.0%, which ranked sixth among all participants.

  • Waseda_Meisei at TRECVID 2018: Fully-automatic ad-hoc video search

    Yu Nakagome, Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

    2018 TREC Video Retrieval Evaluation, TRECVID 2018    2020

  • Word attribute prediction enhanced by lexical entailment tasks

    Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings     5846 - 5854  2020  [Refereed]

     View Summary

    Human semantic knowledge about concepts acquired through perceptual inputs and daily experiences can be expressed as a bundle of attributes. Unlike the conventional distributed word representations that are purely induced from a text corpus, a semantic attribute is associated with a designated dimension in attribute-based vector representations. Thus, semantic attribute vectors can effectively capture the commonalities and differences among concepts. However, as semantic attributes have been generally created by psychological experimental settings involving human annotators, an automatic method to create or extend such resources is highly demanded in terms of language resource development and maintenance. This study proposes a two-stage neural network architecture, Word2Attr, in which initially acquired attribute representations are then fine-tuned by employing supervised lexical entailment tasks. The quantitative empirical results demonstrated that the fine-tuning was indeed effective in improving the performances of semantic/visual similarity/relatedness evaluation tasks. Although the qualitative analysis confirmed that the proposed method could often discover valid but not-yet human-annotated attributes, they also exposed future issues to be worked: we should refine the inventory of semantic attributes that currently relies on an existing dataset.

  • Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification.

    Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    Proceedings of the 28th International Conference on Computational Linguistics(COLING)     5535 - 5540  2020  [Refereed]

  • MicroLapse: Measuring workers' leniency to prediction errors of microtasks' working times

Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Jeffrey P. Bigham

    Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW     352 - 356  2019.11

     View Summary

    Working time estimation is known to be helpful for allowing crowd workers to select lucrative microtasks. We previously proposed a machine learning method for estimating the working times of microtasks, but a practical evaluation was not possible because it was unclear what errors would be problematic for workers across different scales of microtask working times. In this study, we formulate MicroLapse, a function that expresses a maximal error in working time prediction that workers can accept for a given working time length. We collected 60,760 survey answers from 660 Amazon Mechanical Turk workers to formulate MicroLapse. Our evaluation of our previous method based on MicroLapse demonstrated that our working time prediction method was fairly successful for shorter microtasks, which could not have been concluded in our previous paper.

    DOI

  • Regularized adversarial training for single-shot virtual try-on

    Kotaro Kikuchi, Kota Yamaguchi, Edgar Simo-Serra, Tetsunori Kobayashi

    Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019     3149 - 3152  2019.10


    Spatially placing an object onto a background is an essential operation in graphic design and facilitates many different applications such as virtual try-on. The placing operation is formulated as a geometric inference problem for given foreground and background images, and has been approached by spatial transformer architecture. In this paper, we propose a simple yet effective regularization technique to guide the geometric parameters based on user-defined trust regions. Our approach stabilizes the training process of spatial transformer networks and achieves a high-quality prediction with single-shot inference. Our proposed method is independent of initial parameters, and can easily incorporate various priors to prevent different types of trivial solutions. Empirical evaluation with the Abstract Scenes and CelebA datasets shows that our approach achieves favorable results compared to baselines.
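The trust-region regularization described above can be illustrated with a minimal sketch: the geometric parameters of the spatial transformer incur no cost inside a user-defined region and a growing penalty outside it. The quadratic hinge form below is an assumption for illustration, not the exact term from the paper.

```python
def trust_region_penalty(theta, centers, radii):
    """Quadratic hinge penalty on geometric parameters: zero while each
    parameter stays inside its trust region [c - r, c + r], growing
    quadratically with the overshoot outside it."""
    penalty = 0.0
    for t, c, r in zip(theta, centers, radii):
        overshoot = abs(t - c) - r
        if overshoot > 0:
            penalty += overshoot ** 2
    return penalty
```

Added to the training loss, such a term steers the spatial transformer away from trivial solutions (e.g., collapsing the foreground to a point) without hard-constraining the prediction.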

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Speaker adversarial training of DPGMM-based feature extractor for zero-resource languages

    Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. INTERSPEECH2019     266 - 270  2019.09  [Refereed]

  • Multi-channel speech enhancement using time-domain convolutional denoising autoencoder

    Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. INTERSPEECH2019     86 - 90  2019.09  [Refereed]

  • Calving prediction from video: Exploiting behavioural information relevant to calving signs in Japanese black beef cows

    Kazuma Sugawara, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. ECPLF2019     663 - 669  2019.08  [Refereed]

  • Two-stage calving prediction system: Exploiting state-based information relevant to calving signs in Japanese black beef cows

    Ryosuke Hyodo, Saki Yasuda, Yusuke Okimoto, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. ECPLF2019     670 - 676  2019.08  [Refereed]

  • Data assimilation versus machine learning: Comparative study of fish catch forecasting

    Yuka Horiuchi, Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. OCEANS2019    2019.06  [Refereed]

  • Psychological measure on fish catches and its application to optimization criterion for machine learning based predictors

    Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. OCEANS2019    2019.06  [Refereed]

  • An Experimental Study on Expressing Additional Nuances through Sentence-Final Intonation Control toward More Expressive Conversational Speech Synthesis (in Japanese)

    Kazuhiko Iwata, Tetsunori Kobayashi

    IEICE Transactions on Information and Systems (Japanese Edition)   Vol.J102-D ( 6 ) 442 - 453  2019.06  [Refereed]

    Authorship:Last author

    DOI

  • TurkScanner: Predicting the hourly wage of microtasks

    Susumu Saito, Chun-Wei Chiang, Saiph Savage, Teppei Nakano, Tetsunori Kobayashi, Jeffrey Bigham

    Proc. The Web Conference 2019     3187 - 3193  2019.05  [Refereed]  [International coauthorship]

  • Postfiltering using an adversarial denoising autoencoder with noise-aware training

    Naohiro Tawara, Hikari Tanabe, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

    Proc. ICASSP2019     3282 - 3286  2019.05  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • End-to-middle training based action generation for multi-party conversation robot

    Hayato Katayama, Shinya Fujie, Tetsunori Kobayashi

    Proc. IWSDS2019    2019.04  [Refereed]

    Authorship:Last author

  • Investigation of Users' Short Responses in Actual Conversation System and Automatic Recognition of their Intentions

    Katsuya Yokoyama, Hiroaki Takatsu, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi

    2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings     934 - 940  2019.02


    In human-human conversations, listeners often convey intentions to speakers through feedback consisting of reflexive short responses. The speakers recognize these intentions and change their conversational plans to make communication more efficient. These functions are expected to be effective in human-system conversations as well; however, there are only a few systems using these functions or research corpora including such functions. We created a corpus that consists of users' short responses to an actual conversation system and developed a model for recognizing the intentions of these responses. First, we categorized the intentions of feedback that affect the progress of conversations. We then collected 15,604 short responses of users from 2,060 conversation sessions using our news-delivery conversation system. Twelve annotators labeled each utterance based on intention through a listening test. We then designed our deep-neural-network-based intention recognition model using the collected data. We found that feedback in the form of questions, which is the most frequently occurring expression, was correctly recognized and contributed to the efficiency of the conversation system.
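The recognition step above can be sketched with a toy classifier: a bag-of-words perceptron over short-response tokens. This is a deliberately simplified stand-in — the paper uses a deep neural network, and the intention labels below are hypothetical examples.

```python
from collections import defaultdict

def train_perceptron(samples, labels, epochs=10):
    """Train a multiclass perceptron over bag-of-words short responses."""
    classes = sorted(set(labels))
    w = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        for words, gold in zip(samples, labels):
            pred = max(classes, key=lambda c: sum(w[c][t] for t in words))
            if pred != gold:  # standard perceptron update on mistakes
                for t in words:
                    w[gold][t] += 1.0
                    w[pred][t] -= 1.0
    return w, classes

def predict(w, classes, words):
    return max(classes, key=lambda c: sum(w[c][t] for t in words))
```

Even this linear model illustrates the task shape: map a very short utterance to an intention category that steers the dialogue plan.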

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Speech Synthesis for Conversational News Contents Delivery (in Japanese)

    Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Kazuhiko Iwata, Tetsunori Kobayashi

    Transactions of the Japanese Society for Artificial Intelligence   34 ( 2 ) B-I65_1 - 15  2019.02  [Refereed]

    Authorship:Last author

  • Speech synthesis for conversational news contents delivery

    Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Kazuhiko Iwata, Tetsunori Kobayashi

    Transactions of the Japanese Society for Artificial Intelligence   34 ( 2 )  2019


    We have been developing a speech-based “news-delivery system”, which can transmit news contents via spoken dialogues. In such a system, a speech synthesis subsystem that can flexibly adjust the prosodic features of utterances is vital: the system should be able to highlight spoken phrases containing noteworthy information in an article; it should also provide properly controlled pauses between utterances to facilitate the user's interactive reactions, including questions. To achieve these goals, we decided to incorporate the position of the utterance in the paragraph and the role of the utterance in the discourse structure into the bundle of features for speech synthesis. A thorough investigation of news-telling speech data uttered by a voice actress found these features to be crucially important in fulfilling the above-mentioned requirements for spoken utterances. Specifically, these features dictate the importance of the information carried by spoken phrases, and hence should be effectively utilized in synthesizing prosodically adequate utterances. Based on these investigations, we devised a deep neural network-based speech synthesis model that takes the role and position features as input. In addition, we designed a neural network model that can estimate an adequate pause length between utterances. Experimental results showed that adding these features to the input makes the synthesized speech more suitable for information delivery. Furthermore, we confirmed that properly inserted pauses make it easier for users to ask questions during system utterances.
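The role and position features described above can be sketched as a simple input encoding for the synthesis model. The discourse-role inventory below is hypothetical; the paper derives its feature set from an analysis of news-telling speech.

```python
# HYPOTHETICAL role inventory, for illustration only.
ROLES = ["lead", "elaboration", "background", "closing"]

def encode_utterance(role, index, n_utterances):
    """Encode an utterance's discourse role (one-hot) and its relative
    position within the paragraph (a scalar in [0, 1])."""
    one_hot = [1.0 if r == role else 0.0 for r in ROLES]
    position = index / max(n_utterances - 1, 1)
    return one_hot + [position]
```

Concatenated with the usual linguistic features, such a vector lets the network condition prosody on where an utterance sits in the paragraph and what role it plays in the discourse.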

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Recognition of Intentions of Users’ Short Responses for Conversational News Delivery System

    Hiroaki Takatsu, Katsuya Yokoyama, Yoichi Matsuyama, Shinya Fujie, Tetsunori Kobayashi

    Proc. INTERSPEECH2019     1193 - 1197  2019  [Refereed]

    Authorship:Last author

  • Social image tags as a source of word embeddings: A Task-oriented Evaluation

    Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    LREC 2018 - 11th International Conference on Language Resources and Evaluation     969 - 973  2019  [Refereed]


    The distributional hypothesis has played a central role in statistical NLP. Recently, however, its limitation in incorporating perceptual and empirical knowledge has been noted, eliciting a field of perceptually grounded computational semantics. Typical sources of features in such research are image datasets, where images are accompanied by linguistic tags and/or descriptions. Mainstream approaches employ machine learning techniques to integrate/combine visual features with linguistic features. In contrast to or supplementing these approaches, this study assesses the effectiveness of social image tags in generating word embeddings, and argues that these generated representations exhibit somewhat different and favorable behaviors from corpus-originated representations. More specifically, we generated word embeddings by using image tags obtained from a large social image dataset, YFCC100M, which collects Flickr images and the associated tags. We evaluated the efficacy of the generated word embeddings with standard semantic similarity/relatedness tasks, which showed that performances comparable to corpus-originated word embeddings were attained. These results further suggest that the generated embeddings could be effective in discriminating synonyms and antonyms, which has been an issue in distributional hypothesis-based approaches. In summary, social image tags can be utilized as yet another source of visually enforced features, provided the amount of available tags is large enough.
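The underlying intuition — tags attached to the same image co-occur, so tag co-occurrence can stand in for textual context — can be shown with a minimal sketch. Real embeddings would be trained (e.g., skip-gram) on YFCC100M-scale data; here raw co-occurrence count vectors are compared directly, purely for illustration.

```python
from collections import defaultdict
import math

def tag_cooccurrence_vectors(tagged_images):
    """Build a co-occurrence count vector for each tag from per-image tag sets."""
    vecs = defaultdict(lambda: defaultdict(int))
    for tags in tagged_images:
        for t in tags:
            for u in tags:
                if t != u:
                    vecs[t][u] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

With enough images, tags naming similar concepts end up with similar co-occurrence profiles, which is the property the paper exploits at scale.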

  • SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders.

    Hiroaki Tsuyuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    Computational Linguistics - 16th International Conference of the Pacific Association for Computational Linguistics(PACLING)   1215 CCIS   43 - 55  2019  [Refereed]


    A sentence encoder that can be readily employed in many applications or effectively fine-tuned to a specific task/domain is highly demanded. Such a sentence encoding technique would achieve a broader range of applications if it can deal with almost arbitrary word-sequences. This paper proposes a training regime for enabling encoders that can effectively deal with word-sequences of various kinds, including complete sentences, as well as incomplete sentences and phrases. The proposed training regime can be distinguished from existing methods in that it first extracts word-sequences of an arbitrary length from an unlabeled corpus of ordered or unordered sentences. An encoding model is then trained to predict the adjacency between these word-sequences. Herein an unordered sentence indicates an individual sentence without neighboring contextual sentences. In some NLP tasks, such as sentence classification, the semantic contents of an isolated sentence have to be properly encoded. Further, by employing rather unconstrained word-sequences extracted from a large corpus, without heavily relying on complete sentences, it is expected that linguistic expressions of various kinds are employed in the training. This property contributes to enhancing the applicability of the resulting word-sequence/sentence encoders. The experimental results obtained from supervised evaluation tasks demonstrated that the trained encoder achieved performance comparable to existing encoders while exhibiting superior performance in unsupervised evaluation tasks that involve incomplete sentences and phrases.
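The data-construction stage described above — sample arbitrary-length word-sequences and label whether the second immediately follows the first — can be sketched as follows. The length bounds and the negative-sampling scheme are illustrative assumptions, not the paper's exact recipe.

```python
import random

def adjacency_pairs(tokens, n_pairs, min_len=2, max_len=5, seed=0):
    """Sample (sequence, sequence, label) triples from a token stream:
    label 1 when the second sequence immediately follows the first, else 0."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        l1 = rng.randint(min_len, max_len)
        l2 = rng.randint(min_len, max_len)
        start = rng.randint(0, len(tokens) - l1 - l2)
        s1 = tokens[start:start + l1]
        if rng.random() < 0.5:                       # positive: truly adjacent
            s2 = tokens[start + l1:start + l1 + l2]
            label = 1
        else:                                        # toy negative: sampled elsewhere
            far = rng.randint(0, len(tokens) - l2)   # (may rarely be adjacent too)
            s2 = tokens[far:far + l2]
            label = 0
        pairs.append((s1, s2, label))
    return pairs
```

An encoder trained to predict these labels must represent phrases and sentence fragments well, which is the property the paper targets.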

    DOI

    Scopus

  • Towards Answer-unaware Conversational Question Generation.

    Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

    Proceedings of the 2nd Workshop on Machine Reading for Question Answering(MRQA@EMNLP)     63 - 71  2019  [Refereed]

    DOI

  • Zero-Shot Video Retrieval from a Query Phrase Including Multiple Concepts - Efforts and Challenges in TRECVID AVS Task -

    Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsunori Kobayashi

      84 ( 12 ) 983 - 990  2018.12  [Refereed]

    Authorship:Last author

  • Adversarial autoencoder for reducing nonlinear distortion

    Naohiro Tawara, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

    Proc. APSIPA2018     1669 - 1673  2018.11  [Refereed]

  • Sequential fish catch forecasting using Bayesian state space models

    Yuya Kokaki, Naohiro Tawara, Tetsunori Kobayashi, Kazuo Hashimoto, Tetsuji Ogawa

    Proc. ICPR2018     776 - 781  2018.08  [Refereed]

  • Fine-grained Video Retrieval using Query Phrases – Waseda_Meisei TRECVID 2017 AVS System –

    Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsunori Kobayashi

    Proceedings of the 24th International Conference on Pattern Recognition     3322 - 3327  2018.08  [Refereed]

    Authorship:Last author

  • Acoustic feature representation based on timbre for fault detection of rotary machines

    Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. SDPC2018     302 - 305  2018.08  [Refereed]

    Authorship:Last author

  • Speaker invariant feature extraction for zero-resource languages with adversarial training

    Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

    2018 IEEE International Conference on Acoustics, Speech and Signal Processing     2381 - 2385  2018.04  [Refereed]

  • Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations

    Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi

    Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018)     6084 - 6088  2018.04  [Refereed]  [International journal]

    Authorship:Last author

    DOI

    Scopus

    23
    Citation
    (Scopus)
  • A Spoken Dialogue System for Enabling Information Behavior of Various Intention Levels

    Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Yoshihiko Hayashi, Tetsunori Kobayashi

    Transactions of the Japanese Society for Artificial Intelligence   33 ( 1 ) DSH - C_1  2018  [Refereed]

    Authorship:Last author

    DOI

  • Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms

    Koji Hirakawa, Kotaro Kikuchi, Kazuya Ueki, Tetsunori Kobayashi, Yoshihiko Hayashi

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11292 LNCS   157 - 163  2018  [Refereed]


    The performance of an ad-hoc video search (AVS) task can only be improved when video processing for analyzing video contents and linguistic processing for interpreting natural language queries are nicely combined. Among the several issues associated with this challenging task, this paper particularly focuses on the word sense disambiguation/filtering (WSD/WSF) of the terms contained in a search query. We propose WSD/WSF methods that employ distributed sense representations, and discuss their efficacy in improving the performance of an AVS system that makes full use of a large bank of visual concept classifiers. The application of a WSD/WSF method is crucial, as each visual concept classifier is linked with the lexical concept denoted by a word sense. The results are generally promising, outperforming not only a baseline query processing method that only considers the polysemy of a query term but also a strong WSD baseline method.
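The sense-selection step can be sketched as scoring each candidate sense of an ambiguous query term against the averaged vector of the surrounding query terms. The 3-dimensional vectors below are fabricated for illustration; the actual system uses learned distributed sense representations tied to visual concept classifiers.

```python
import math

# Fabricated toy vectors, for illustration only.
SENSE_VECTORS = {
    "bank#finance": [0.9, 0.1, 0.0],
    "bank#river":   [0.0, 0.2, 0.9],
}
WORD_VECTORS = {
    "money": [1.0, 0.0, 0.1],
    "loan":  [0.8, 0.2, 0.0],
    "water": [0.0, 0.1, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_sense(context_terms, senses=SENSE_VECTORS):
    """Pick the sense whose vector is closest to the averaged context vector."""
    dim = len(next(iter(senses.values())))
    ctx = [sum(WORD_VECTORS[w][i] for w in context_terms) / len(context_terms)
           for i in range(dim)]
    return max(senses, key=lambda s: cosine(senses[s], ctx))
```

In the AVS setting, the chosen sense then determines which visual concept classifiers the query term is routed to.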

    DOI

    Scopus

  • Waseda_Meisei at TRECVID 2018: Ad-hoc Video Search.

    Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

    2018 TREC Video Retrieval Evaluation(TRECVID)    2018

  • Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension.

    Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

    Proceedings of the 27th International Conference on Computational Linguistics(COLING)     973 - 983  2018  [Refereed]

  • Exploiting end of sentences and speaker alternations in recurrent neural network-based language modeling for multiparty conversations

    Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2017 (APSIPA2017)    2017.12  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Object Detection Oriented Feature Pooling for Video Semantic Indexing

    Kazuya Ueki, Tetsunori Kobayashi

    The 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications     44 - 51  2017.02  [Refereed]

    Authorship:Last author

  • Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations

    Kentaro Kanada, Tetsunori Kobayashi, Yoshihiko Hayashi

    2017 Workshop on Sense, Concept and Entity Representations and their Application     37 - 46  2017  [Refereed]

  • Adaptive training of vibration-based anomaly detector for wind turbine condition monitoring

    Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. Annual Conference on PHM Society     177 - 184  2017  [Refereed]

  • Incorporating visual features into word embeddings: A bimodal autoencoder-based approach.

    Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    IWCS 2017 - 12th International Conference on Computational Semantics - Short papers(IWCS(2))    2017  [Refereed]

  • Video Semantic Indexing using Object Detector

    Kazuya Ueki, Tetsunori Kobayashi

    Proc. VRCAI2016    2016.12  [Refereed]

    Authorship:Last author

  • Evaluation for Collaborative Video Surveillance Platform using Prototype System of Abandoned Object Detection

    Susumu Saito, Teppei Nakano, Tetsunori Kobayashi

    Proc. ICDSC2016     172 - 177  2016.09  [Refereed]

    Authorship:Last author

  • Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

    Trans. on Signal and Information Processing   5  2016.08  [Refereed]

    Authorship:Last author

  • Waseda at TRECVID 2016: Fully-automatic Ad-hoc Video Search

    Kotaro Kikuchi, Kazuya Ueki, Susumu Saito, Tetsunori Kobayashi

    2016 TREC Video Retrieval Evaluation, TRECVID 2016    2016

  • A Spoken Dialog System for Coordinating Information Consumption and Exploration.

    Shinya Fujie, Ishin Fukuoka, Asumi Mugita, Hiroaki Takatsu, Yoshihiko Hayashi, Tetsunori Kobayashi

    Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval(CHIIR)     253 - 256  2016  [Refereed]

    Authorship:Last author


    Passive consumption of information is boring in most cases and even painful in some, especially when the information content is delivered through speech media. The user of a speech-based information delivery system, for example a text-to-speech system, usually cannot interrupt the ongoing information flow, which prevents him or her from confirming some part of the content or posing an inquiry for further information exploration. We argue that a carefully designed spoken dialog system could remedy these undesirable situations and further enable an enjoyable conversation with users. The key technologies to realize such an attractive dialog system are: (1) pre-compilation of a dialog plan based on the analysis of a source content, and (2) dynamic recognition of the user's state of understanding and interests. This paper illustrates technical views for implementing these functionalities, and discusses a dialog example to exemplify the technical merits of the proposed system.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Multi-Feature Based Fast Depth Decision in HEVC Inter Prediction for VLSI Implementation

    Gaoxing Chen, Zhenyu Liu, Tetsunori Kobayashi, Takeshi Ikenaga

    2016 9TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2016)     124 - 134  2016  [Refereed]


    High efficiency video coding (HEVC) is the latest international video compression standard, achieving roughly double the compression efficiency of the previous standard H.264/AVC. To increase compression accuracy, HEVC employs coding units (CUs) ranging from 8 x 8 to 64 x 64. However, the encoding complexity of HEVC increases considerably because of the manifold partition sizes. Many works have focused on reducing this complexity but did not consider the feasibility of hardware implementation. This paper proposes a hardware-friendly fast depth range definition algorithm based on multiple features: a block texture feature, a quantization feature, and a block motion feature. The block texture feature is based on texture similarity in consecutive frames; the quantization feature is based on the compression regularity of HEVC; and the block motion feature compensates for differences caused by moving objects. Compared with the original HEVC, the proposed method saves about 33.72% of the processing time with a 0.76% BD-bitrate increase on average.

  • Image Retrieval under Very Noisy Annotations

    Kazuya Ueki, Tetsunori Kobayashi

    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)     1277 - 1282  2016  [Refereed]

    Authorship:Last author


    In recent years, a significant number of tagged images uploaded onto image sharing sites have enabled us to create high-performance image recognition models. However, there are many inaccurate image tags on the Internet, and it is very laborious to investigate the percentage of tags that are incorrect. In this paper, we propose a new method for creating an image recognition model that can be used even when the image data set includes many incorrect tags. Our method has two superior features. First, our method automatically measures the reliability of annotations and does not require any parameter adjustment for the percentage of error tags. This is a very important feature because we usually do not know how many errors are included in the database, especially in actual Internet environments. Second, our method iterates the error modification process. It begins with the modification of simple and obvious errors, gradually deals with much more difficult errors, and finally creates the high-performance recognition model with refined annotations. Using an object recognition image database with many annotation errors, our experiments showed that the proposed method successfully improved the image retrieval performance in approximately 90 percent of the image object categories.

  • Video Semantic Indexing using Object Detection-Derived Features

    Kotaro Kikuchi, Kazuya Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)     1288 - 1292  2016  [Refereed]

    Authorship:Last author


    A new feature extraction method based on object detection to achieve accurate and robust semantic indexing of videos is proposed. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have been used in semantic indexing, in general are extracted from the entire image and do not explicitly represent the information of meaningful objects that contributes to the determination of semantic categories. In this case, the background region, which does not contain the meaningful objects, is unduly considered, exerting a harmful effect on the indexing performance. In the present study, an attempt was made to suppress the undesirable effects derived from the redundant background information by incorporating object detection technology into semantic indexing. In the proposed method, a combination of the meaningful objects detected in the video frame image is represented as a feature vector for verification of semantic categories. Experimental comparisons demonstrate that the proposed method facilitates the TRECVID semantic indexing task.
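The representation described above — a feature vector built from the combination of objects detected in a frame — can be sketched minimally. The object label set and confidence threshold below are illustrative assumptions, not the detector configuration used in the paper.

```python
# HYPOTHETICAL detector label set, for illustration only.
OBJECT_CLASSES = ["person", "car", "dog", "chair", "bottle"]

def detection_feature(detections, threshold=0.5):
    """Turn detector output, a list of (class_name, confidence) pairs, into a
    fixed-length vector counting confident detections per object class."""
    index = {name: i for i, name in enumerate(OBJECT_CLASSES)}
    feature = [0.0] * len(OBJECT_CLASSES)
    for name, conf in detections:
        if conf >= threshold and name in index:
            feature[index[name]] += 1.0
    return feature
```

Because only detected objects contribute, background regions with no meaningful objects cannot inject noise into the representation, which is the motivation stated in the abstract.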

  • IMPROVING SEMANTIC VIDEO INDEXING: EFFORTS IN WASEDA TRECVID 2015 SIN SYSTEM

    Kazuya Ueki, Tetsunori Kobayashi

    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS     1184 - 1188  2016  [Refereed]

    Authorship:Last author


    In this paper, we propose a method for improving the performance of semantic video indexing. Our approach involves extracting features from multiple convolutional neural networks (CNNs), creating multiple classifiers, and integrating them. We employed four measures to accomplish this: (1) utilizing multiple evidences observed in each video and effectively compressing them into a fixed-length vector; (2) introducing gradient and motion features to CNNs; (3) enriching variations of the training and the testing sets; and (4) extracting features from several CNNs trained with various large-scale datasets. Using the test dataset from TRECVID's 2014 evaluation benchmark, we evaluated the performance of the proposal in terms of the mean extended inferred average precision measure. On this measure, our system's performance was 35.7, outperforming the state-of-the-art TRECVID 2014 benchmark performance of 33.2. Based on this work, our submission at TRECVID 2015 was ranked second among all submissions.

  • Separation matrix optimization using associative memory model for blind source separation

    Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

    2015 23rd European Signal Processing Conference, EUSIPCO 2015     1098 - 1102  2015.12  [Refereed]


    A source signal is estimated using an associative memory model (AMM) and used for separation matrix optimization in linear blind source separation (BSS) to yield high-quality, less distorted speech. Linear-filtering-based BSS, such as independent vector analysis (IVA), has been shown to be effective in sound source separation while avoiding non-linear signal distortion. This technique, however, requires the assumptions that the sound sources are independent and generated from non-Gaussian distributions. We propose a method for estimating a linear separation matrix without any assumptions about the sources by repeating the following two steps: estimating non-distorted reference signals using an AMM, and optimizing the separation matrix to minimize the error between the estimated signals and the reference signals. Experimental comparisons carried out on simultaneous speech separation suggest that the proposed method can reduce the residual distortion caused by IVA.
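The separation-matrix update in the second step has a closed least-squares form: given mixtures X and AMM-estimated references S_ref, the W minimizing ||W X - S_ref||^2 is W = S_ref Xᵀ (X Xᵀ)⁻¹. The pure-Python two-source sketch below illustrates only this step (real systems apply it per frequency bin and alternate it with AMM re-estimation).

```python
def matmul(A, B):
    """Dense matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def separation_matrix(S_ref, X):
    """Least-squares W minimizing ||W X - S_ref||^2 for two sources."""
    Xt = transpose(X)
    return matmul(matmul(S_ref, Xt), inv2(matmul(X, Xt)))
```

When the references are accurate, W converges toward the inverse of the (unknown) mixing matrix without any independence or non-Gaussianity assumption.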

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Improving Classification Accuracy of Image Categories Using Local Descriptors with Supplemental Information

    Kazuya Ueki, Yohei Shiraishi, Naohiro Tawara, Tetsunori Kobayashi

      80 ( 12 ) 1144 - 1149  2015.12  [Refereed]

    Authorship:Last author

  • Waseda at TRECVID 2015: Semantic Indexing (notebook paper of the TRECVID 2015 Workshop, November 2015)

    Kazuya Ueki, Tetsunori Kobayashi

    The TREC Video Retrieval Evaluation, TRECVID 2015    2015.11  [Refereed]

    Authorship:Last author

  • Automatic image tag refinement for image retrieval.

    Kazuya Ueki, Tetsunori Kobayashi

    Proc. 5th Asia International Symposium on Mechatronics     396 - 399  2015.10  [Refereed]

    Authorship:Last author

  • Multiscale recurrent neural network based language model.

    Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi

    Proc. 16th Annual Conf. of the Int'l Speech Communication Association     2366 - 2370  2015.09  [Refereed]

    Authorship:Last author

  • Bilinear map of filter-bank outputs for DNN-based speech recognition.

    Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

    Proc. 16th Annual Conf. of the Int'l Speech Communication Association     16 - 20  2015.09  [Refereed]

  • Blind source separation using associative memory model and linear separation filter.

    Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

    Proc. 2015 European Signal Processing Conference     1103 - 1107  2015.09  [Refereed]

    Authorship:Corresponding author

  • A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    Trans. on Signal and Information Processing   4 ( e6 )  2015.09  [Refereed]

    Authorship:Last author

  • Bilinear map of filter‐bank outputs for DNN‐based speech recognition

    Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

    INTERSPEECH 2015     16 - 20  2015.09  [Refereed]

  • Feature extraction for rotary-machine acoustic diagnostics focused on period.

    Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 44th International Congress and Exposition on Noise Control Engineering    2015.08  [Refereed]

    Authorship:Last author

  • Towards a Computational Model of Small Group Facilitation.

    Yoichi Matsuyama, Tetsunori Kobayashi

    2015 AAAI Spring Symposium Series    2015.03  [Refereed]

    Authorship:Last author

  • Automatic Expressive Opinion Sentence Generation for Enjoyable Conversational Systems

    Yoichi Matsuyama, Akihiro Saito, Shinya Fujie and Tetsunori Kobayashi

    Trans. on Audio, Speech, and Language Processing   23 ( 1 ) 313 - 326  2015.02  [Refereed]

    Authorship:Last author

  • Waseda at TRECVID 2015 semantic indexing (SIN)

    Kazuya Ueki, Tetsunori Kobayashi

    2015 TREC Video Retrieval Evaluation, TRECVID 2015    2015

  • A COMPARATIVE STUDY OF SPECTRAL CLUSTERING FOR I-VECTOR-BASED SPEAKER CLUSTERING UNDER NOISY CONDITIONS

    Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP)     2041 - 2045  2015  [Refereed]

    Authorship:Last author


    The present paper dealt with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering significantly depends on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded the state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture the speaker similarity under noisy conditions. Therefore, we attempted to examine the efficiency of spectral clustering on i-vector-based similarity for speech corrupted by noise because spectral clustering can yield robustness against noise by non-linear projection. Experimental comparisons demonstrated that spectral clustering yielded significant improvement from conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.
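The combination above — a cosine-similarity affinity over i-vectors followed by spectral partitioning — can be illustrated with a small pure-Python sketch for the two-cluster case. The power-iteration-with-deflation eigensolver and the non-negativity clamp are implementation choices made here for self-containedness, not details taken from the paper.

```python
import math, random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def spectral_bipartition(vectors, iters=200, seed=0):
    """Split vectors into two clusters by the sign of the second eigenvector
    of the normalized affinity M = D^{-1/2} A D^{-1/2}."""
    n = len(vectors)
    # cosine affinity, clamped to be non-negative
    A = [[max(cosine(vectors[i], vectors[j]), 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A]
    M = [[A[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # leading eigenvector of M is known in closed form: proportional to D^{1/2} 1
    v1 = [math.sqrt(x) for x in d]
    n1 = math.sqrt(sum(x * x for x in v1))
    v1 = [x / n1 for x in v1]
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        proj = sum(a * b for a, b in zip(v, v1))   # deflate the leading direction
        v = [a - proj * b for a, b in zip(v, v1)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [0 if x >= 0 else 1 for x in v]
```

The non-linear step (eigen-embedding before assigning labels) is what gives spectral clustering its robustness relative to clustering the raw similarities directly.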

  • Multi-layer Feature Extractions for Image Classification - Knowledge from Deep CNNs

    Kazuya Ueki, Tetsunori Kobayashi

    2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015)     9 - 12  2015  [Refereed]

    Authorship:Last author


    Recently, there has been considerable research into the application of deep learning to image recognition. Notably, deep convolutional neural networks (CNNs) have achieved excellent performance in a number of image classification tasks, compared with conventional methods based on techniques such as Bag-of-Features (BoF) using local descriptors. In this paper, to cultivate a better understanding of the structure of CNN, we focus on the characteristics of deep CNNs, and adapt them to SIFT+BoF-based methods to improve the classification accuracy. We introduce the multi-layer structure of CNNs into the classification pipeline of the BoF framework, and conduct experiments to confirm the effectiveness of this approach using a fine-grained visual categorization dataset. The results show that the average classification rate is improved from 52.4% to 69.8%.

  • Effect of frequency weighting on MLP-based speaker canonicalization

    Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

    Proc. 15th Annual Conf. of the Int'l Speech Communication Association     2987 - 2991  2014.09  [Refereed]

    Authorship:Last author

  • Blocked Gibbs Sampling Based Multi-Scale Mixture Model for Speaker Clustering on Noisy Data

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    IEEE International Workshop on Machine Learning for Signal Processing    2013.09  [Refereed]

    Authorship:Last author

  • Expression of Speaker's Intentions through Sentence-Final Particle/Intonation Combinations in Japanese Conversational Speech Synthesis

    Kazuhiko Iwata, Tetsunori Kobayashi

    8th ISCA Speech Synthesis Workshop     235 - 240  2013.08  [Refereed]

    Authorship:Last author

  • Speaker's Intentions Conveyed to Listeners by Sentence-Final Particles and Their Intonations in Japanese Conversational Speech

    Kazuhiko Iwata, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2013     6895 - 6899  2013.05  [Refereed]

    Authorship:Last author

  • A Four-Participant Group Facilitation Framework for Conversational Robots

    Yoichi Matsuyama, Iwao Akiba, Akihiro Saito, Tetsunori Kobayashi

    Proceedings of the SIGDIAL 2013 Conference     284 - 293  2013  [Refereed]

    Authorship:Last author

  • Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis

    Kazuhiko Iwata, Tetsunori Kobayashi

    Proc. 13th Annual Conf. of the Int'l Speech Communication Association     442 - 445  2012.09  [Refereed]

    Authorship:Last author

  • Fully Bayesian Speaker Clustering Based on Hierarchically Structured Utterance-Oriented Dirichlet Process Mixture Model.

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    Proc. 13th Annual Conf. of the Int'l Speech Communication Association     2166 - 2169  2012.09  [Refereed]

    Authorship:Last author

  • AAM Fitting Using Shape Parameter Distribution.

    Youhei Shiraishi, Shinya Fujie, Tetsunori Kobayashi

    Proc. EUSIPCO2012     2238 - 2242  2012.08  [Refereed]

    Authorship:Last author

  • Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2012     5253 - 5256  2012.03  [Refereed]

    Authorship:Last author

  • Conversation Robot Participating in and Promoting Human-Human Communication

    Shinya Fujie, Yoichi Matsuyama, Akira Taniyama, Tetsunori Kobayashi

      J95-A ( 1 ) 37 - 45  2012.01  [Refereed]

    Authorship:Last author

  • Spatial filter calibration based on minimization of modified LSD.

    Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 12th Annual Conf. of the Int'l Speech Communication Association     1761 - 1764  2011.09  [Refereed]

    Authorship:Last author

  • Development and evaluation of Japanese Lombard speech corpus

    Tetsuji Ogawa, Takanobu Nishiura, Takeshi Yamada, Norihide Kitaoka, Tetsunori Kobayashi

    Proc. Internoise2011     1366 - 1373  2011.09  [Refereed]

    Authorship:Last author

  • Speaker verification robust to talking style variation using multiple kernel leaning based on conditional entropy minimization

    Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

    Proc. 12th Annual Conf. of the Int'l Speech Communication Association     2741 - 2744  2011.08  [Refereed]

    Authorship:Last author

  • Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model.

    Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 12th Annual Conf. of the Int'l Speech Communication Association     2905 - 2908  2011.08  [Refereed]

    Authorship:Last author

  • Multiparty Conversation Facilitation Strategy Using Combination of Question Answering and Spontaneous Utterances

    Yoichi Matsuyama, Yushi Xu, Akihiro Saito, Shinya Fujie, Tetsunori Kobayashi

    The Paralinguistic Information Processing and its Integration in Spoken Dialogue Systems     103 - 112  2011.08  [Refereed]

    Authorship:Last author

  • Conversational Speech Synthesis System with Communication Situation Dependent HMMs

    Kazuhiko Iwata, Tetsunori Kobayashi

    The Paralinguistic Information Processing and its Integration in Spoken Dialogue Systems     113 - 124  2011.08  [Refereed]

    Authorship:Last author

  • Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation

    OGAWA Tetsuji, UEKI Kazuya, KOBAYASHI Tetsunori

    IEICE Trans. Inf. & Syst.   94 ( 8 ) 1683 - 1689  2011.08  [Refereed]

    Authorship:Last author

     View Summary

    We propose a novel method of supervised feature projection called class-distance-based discriminant analysis (CDDA), which is suitable for automatic age estimation (AAE) from facial images. Most methods of supervised feature projection, e.g., Fisher discriminant analysis (FDA) and local Fisher discriminant analysis (LFDA), focus on determining whether two samples belong to the same class (i.e., the same age in AAE) or not. Even if an estimated age is not consistent with the correct age in AAE systems, i.e., the AAE system induces error, smaller errors are better. To treat such characteristics in AAE, CDDA determines between-class separability according to the class distance (i.e., difference in ages); two samples with similar ages are imposed to be close and those with spaced ages are imposed to be far apart. Furthermore, we propose an extension of CDDA called local CDDA (LCDDA), which aims at handling multimodality in samples. Experimental results revealed that CDDA and LCDDA could extract more discriminative features than FDA and LFDA.

    DOI CiNii

    Scopus

  • SPEAKER RECOGNITION USING MULTIPLE KERNEL LEARNING BASED ON CONDITIONAL ENTROPY MINIMIZATION

    Tetsuji Ogawa, Hideitsu Hino, Nima Reyhani, Noboru Murata, Tetsunori Kobayashi

    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING     2204 - 2207  2011  [Refereed]

    Authorship:Last author

     View Summary

    We applied a multiple kernel learning (MKL) method based on information-theoretic optimization to speaker recognition. Most of the kernel methods applied to speaker recognition systems require a suitable kernel function and its parameters to be determined for a given data set. In contrast, MKL eliminates the need for strict determination of the kernel function and parameters by using a convex combination of element kernels. In the present paper, we describe an MKL algorithm based on conditional entropy minimization (MCEM). We experimentally verified the effectiveness of MCEM for speaker classification; this method reduced the speaker error rate as compared to conventional methods.

    DOI

    Scopus

  • Framework of Communication Activation Robot Participating in Multiparty Conversation

    Yoichi Matsuyama, Shinya Fujie, Tetsunori Kobayashi

    AAAI Fall Symposium, Dialog with Robots     68 - 73  2010.11  [Refereed]

    Authorship:Last author

  • DEVELOPMENT OF ZONAL BEAMFORMER AND ITS APPLICATION TO ROBOT AUDITION

    Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

    18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010)     1529 - 1533  2010.08  [Refereed]

    Authorship:Last author

     View Summary

    We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the FBF widens the space as the distance from the robot increases. The ZBF can better improve the performance of eliminating the directional noise coming from behind the target source than the FBF because the ZBF has a considerably sharper directivity than the FBF.

  • Speech enhancement using a square microphone array in the presence of directional and diffuse noise

    Tetsuji Ogawa, Shintaro Takada, Kenzo Akagiri, and Tetsunori Kobayashi

    Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA)   E93-E ( 5 )  2010.05  [Refereed]

    Authorship:Last author

  • A Meeting Assistance System with a Collaborative Editor for Argument Structure Visualization

    Yasutomo Arai, Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

    Proc. Int'l Conf. on Computer Supported Cooperative Work 2010    2010.02  [Refereed]

    Authorship:Last author

  • A Collaborative Lexical Data Design System for Speech Recognition Application Developers

    Hiroshi Sasaki, Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

    Proc. Int'l Conf. on Computer Supported Cooperative Work 2010     455 - 456  2010.02  [Refereed]

    Authorship:Last author

  • Conversation Robot and Its Audition System

    FUJIE Shinya, OGAWA Tetsuji, KOBAYASHI Tetsunori

    JRSJ   28 ( 1 ) 23 - 26  2010.01  [Refereed]  [Invited]

    Authorship:Last author

    DOI CiNii

  • Psychological evaluation of a group communication activation robot in a party game

    Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi

    Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010     3046 - 3049  2010

     View Summary

    We propose a communication activation robot and evaluate effectiveness of communication activation. As an example of application, we developed the system participating in a quiz-formed party game called NANDOKU quiz on a multi-modal conversation robot SCHEMA, and we conducted an experiment in a laboratory to evaluate its capability of activation in group communication. We evaluated interaction in NANDOKU quiz game with subjects as panelists using video analysis and SD(Semantic Differential) method with questionnaires. The result of SD method indicates that subjects feel more pleased and more noisy with participation of a robot. As the result of video analysis, the smiling duration ratio is greater with participation of a robot. These results imply evidence of robot's communication activation function in the party game. © 2010 ISCA.

  • A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination

    Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010     2954 - 2957  2010

     View Summary

    We present a realization method of the principle of minimum relative entropy discrimination (MRED) in order to derive a regularized discriminative training method. MRED is advantageous since it provides a Bayesian interpretations of the conventional discriminative training methods and regularization techniques. In order to realize MRED for speech recognition, we proposed an approximation method of MRED that strictly preserves the constraints used in MRED. Further, in order to practically perform MRED, an optimization method based on convex optimization and its solver based on the cutting plane algorithm are also proposed. The proposed methods were evaluated on continuous phoneme recognition tasks. We confirmed that the MRED-based training system outperformed conventional discriminative training methods in the experiments. © 2010 ISCA.

  • SCHEMA: multi-party interaction-oriented humanoid robot

    Yoichi Matsuyama, Kosuke Hosoya, Hikaru Taniyama, Hiroki Tsuboi, Shinya Fujie, Tetsunori Kobayashi

    ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation    2009.12  [Refereed]

    Authorship:Last author

    DOI

  • Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions

    Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E92D ( 11 ) 2244 - 2252  2009.11  [Refereed]

    Authorship:Last author

     View Summary

    The accuracy of simulation-based assessments of speech recognition systems under noisy conditions is investigated with a focus on the influence of the Lombard effect on the speech recognition performances. This investigation was carried out under various recognition conditions of different sound pressure levels of ambient noise, for different recognition tasks, such as continuous speech recognition and spoken word recognition, and using different recognition systems, i.e., systems with and without adaptation of the acoustic models to ambient noise. Experimental results showed that accurate simulation was not always achieved when dry sources with neutral talking style were used, but it could be achieved if the dry sources that include the influence of the Lombard effect were used; the simulation in the latter case is accurate, irrespective of the recognition conditions.

    DOI

    Scopus

  • Conversation robot participating in and activating a group communication

    Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi

    Proc. 10th Annual Conf. of the Int'l Speech Communication Association     264 - 267  2009.09  [Refereed]

    Authorship:Last author

  • Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head

    Tetsuji Ogawa, Kosuke Hosoya, Kenzo Akagiri, Tetsunori Kobayashi

    Proc. 2009 European Signal Processing Conference     879 - 883  2009.08  [Refereed]

    Authorship:Last author

  • System Design of Group Communication Activator: An Entertainment Task for Elderly Care

    Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, Tetsunori Kobayashi

    Proc. HRI2009     243 - 244  2009.03  [Refereed]

    Authorship:Last author

  • Upper-Body Contour Extraction Using Face and Body Shape Variance Information

    Kazuki Hoshiai, Shinya Fujie, Tetsunori Kobayashi

    ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS   5414   862 - +  2009  [Refereed]

    Authorship:Last author

     View Summary

    We propose a fitting method using a model that integrates face and body shape variance information for upper-body contour extraction. Accurate body-contour extraction is necessary for various applications, such as pose estimation, gesture recognition, and so on. In this study, we regard it as the shape model fitting problem. A model including shape variance information can fit to the contour robustly even in the noisy case. AAMs are one of these models and can fit to a face successfully. It needs appearance information for effective fitting, but it can not be used in our case because appearance of upper-body easily changes by clothes. Instead of intensity image, proposed method uses edge image as appearance information. However, discrimination between a true contour edge of upper-body and other edges is difficult. To solve this problem, we integrate shapes of upper-body and face. It is expected that this integrated model is more robust to edges in clutter background and various locations of the body than a body shape model using only body shape information, We conduct experiments and confirm improvement in accuracy by integration of face and body variance information.

  • Robot auditory system using head-mounted square microphone array

    Kosuke Hosoya, Tetsuji Ogawa, Tetsunori Kobayashi

    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS     2736 - 2741  2009  [Refereed]

    Authorship:Last author

     View Summary

    A new noise reduction method suitable for autonomous mobile robots was proposed and applied to preprocessing of a hands-free spoken dialogue system. When a robot talks with a conversational partner in real environments, not only speech utterances by the partner but also various types of noise, such as directional noise, diffuse noise, and noise from the robot, are observed at microphones. We attempted to remove these types of noise simultaneously with small and lightweight devices and low-computational-cost algorithms. We assumed that the conversational partner of the robot was in front of the robot. In this case, the aim of the proposed method is extracting speech signals coming from the frontal direction of the robot. The proposed noise reduction system was evaluated in the presence of various types of noise: the number of word errors was reduced by 69% as compared to the conventional methods. The proposed robot auditory system can also cope with the case in which a conversational partner (i.e., a sound source) moves from the front of the robot: the sound source was localized by face detection and tracking using facial images obtained from a camera mounted on an eye of the robot. As a result, various types of noise could be reduced in real time, irrespective of the sound source positions, by combining speech information with image information.

  • Multi-modal Integration for Personalized Conversation: Towards a Humanoid in Daily Life

    Shinya Fujie, Daichi Watanabe, Yuhi Ichikawa, Hikaru Taniyama, Kosuke Hosoya, Yoichi Matsuyama, Tetsunori Kobayashi

    Proc. Int'l Conf. on Humanoid Robots     617 - 622  2008.12  [Refereed]

    Authorship:Last author

  • Designing Communication Activation System in Group Communication

    Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, Tetsunori Kobayashi

    Proc. Int'l Conf. on Humanoid Robots     629 - 634  2008.12  [Refereed]

    Authorship:Last author

  • Class Distance Weighted Locality Preserving Projection for Automatic Age Estimation

    Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. Biometrics: Theory, Applications and Systems    2008.10  [Refereed]

    Authorship:Last author

  • Design and Formulation for Speech Interface Based on Flexible Shortcuts

    Teppei Nakano, Tomoyuki Kumai, Tetsunori Kobayashi, Yasushi Ishikawa

    Proc. 9th Annual Conf. of the Int'l Speech Communication Association     2474 - 2477  2008.09  [Refereed]

    Authorship:Corresponding author

  • An ASM fitting method based on machine learning that provides a robust parameter initialization for AAM fitting

    Matthias Wimmer, Shinya Fujie, Freek Stulp, Tetsunori Kobayashi, Bernd Radig

    Proc. Int'l Conf. on Automatic Face and Gesture Recognition    2008.09  [Refereed]

    Authorship:Corresponding author

  • Ears of the robot: noise reduction using four-line ultra-micro omni-directional microphones mounted on a robot head

    Tetsuji Ogawa, Hirofumi Takeuchi, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

    Proc. 2008 European Signal Processing Conference    2008.08  [Refereed]

    Authorship:Last author

  • Ears of the robot: Direction of arrival estimation based on pattern recognition using robot-mounted microphones

    Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 5 ) 1522 - 1530  2008.05  [Refereed]

    Authorship:Last author

     View Summary

    We propose a new type of direction-of-arrival estimation method for robot audition that is free from strict head related transfer function estimation. The proposed method is based on statistical pattern recognition that employs a ratio of power spectrum amplitudes occurring for a microphone pair as a feature vector. It does not require any phase information explicitly, which is frequently used in conventional techniques, because the phase information is unreliable for the case in which strong reflections and diffractions occur around the microphones. The feature vectors we adopted can treat these influences naturally. The effectiveness of the proposed method was shown from direction-of-arrival estimation tests for 19 kinds of directions: 92.4% of errors were reduced compared with the conventional phase-based method.

    DOI

    Scopus

  • Speech enhancement using square microphone array for mobile devices

    Shintaro Takada, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2008     313 - 316  2008.04  [Refereed]

    Authorship:Last author

  • Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

    Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

    Trans. on Information and Systems (ED)   E91-D ( 3 ) 815 - 824  2008.03  [Refereed]

    Authorship:Last author

  • Social robots that interact with people.

    Cynthia Breazeal, Atsuo Takanishi, Tetsunori Kobayashi

    Springer handbook of robotics    2008  [Refereed]  [Invited]

    Authorship:Last author

  • Upper-body Contour Extraction and Tracking Using Face and Body Shape Variance Information

    Kazuki Hoshiai, Shinya Fujie, Tetsunori Kobayashi

    2008 8TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS 2008)     398 - +  2008  [Refereed]

    Authorship:Last author

     View Summary

    We propose a fitting method using a model that integrates face and body shape variance information for upper-body contour extraction and tracking. Accurate body-contour extraction is necessary for various applications, such as pose estimation, gesture recognition, and so on. In this study, we regard it as the shape model fitting problem. A model including shape variance information can fit to the contour robustly even in the noise case. AAMs are one of these models and can fit to a face successfully. It needs appearance information for effective fitting, but it can not be used in our case because appearance of upper-body easily changes by clothes. Instead of intensity image, proposed method uses edge image as appearance information. However, discrimination between a true contour edge of upper-body and other edges is difficult. To solve this problem, we integrate shape models of upper body and face. It is expected that this integrated model is more robust to edges in clutter background and various locations of the body than a body shape model using only body shape information. We conduct experiments and confirm improvement in accuracy by integration of face and body variance information.

  • Extensible speech recognition system using Proxy-Agent

    Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

    Proc. Automatic Speech Recognition and Understanding Workshop     601 - 606  2007.12  [Refereed]

    Authorship:Last author

  • Gender Classification Based on Integration of Multiple Classifiers Using Different Features of Facial and Neck Images

    Kazuya Ueki, Tetsunori Kobayashi

    Journal of the Institute of Image Information and Television Engineers   61 ( 12 ) 1803 - 1809  2007.12  [Refereed]

    Authorship:Last author

  • Sound Source Separation using Null-Beamforming and Spectral Subtraction for Mobile Devices

    Shintaro Takada, Satoshi Kanba, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

    IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007)     30 - 33  2007.10  [Refereed]

    Authorship:Last author

  • Ears of the Robot: Three Simultaneous Speech Segregation and Recognition Using Robot-Mounted Microphones

    MOCHIKI Naoya, OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE Trans. Inf. Syst., D   90 ( 9 ) 1465 - 1468  2007.09  [Refereed]

    Authorship:Last author

     View Summary

    A new type of sound source segregation method using robot-mounted microphones, which are free from strict head related transfer function (HRTF) estimation, has been proposed and successfully applied to three simultaneous speech recognition systems. The proposed segregation method is executed with sound intensity differences that are due to the particular arrangement of the four directivity microphones and the existence of a robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on the narrow band sound intensity comparison), two-line spectral subtraction and their integration. We performed 20K vocabulary continuous speech recognition test in the presence of three speakers' simultaneous talk, and achieved more than 70% word error reduction compared with the case without any segregation processing.

    CiNii

  • Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

    Shoei Sato, Tetsunori Kobayashi, et al.

    Proc. 8th Annual Conf. of the Int'l Speech Communication Association     345 - 348  2007.08  [Refereed]

    Authorship:Corresponding author

  • Fusion-based age-group classification method using multiple two-dimensional feature extraction algorithms

    Kazuya Ueki, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E90D ( 6 ) 923 - 934  2007.06  [Refereed]

    Authorship:Last author

     View Summary

    An age-group classification method based on a fusion of different classifiers with different two-dimensional feature extraction algorithms is proposed. Theoretically, an integration of multiple classifiers can provide better performance compared to a single classifier. In this paper, we extract effective features from one sample image using different dimensional reduction methods, construct multiple classifiers in each subspace, and combine them to reduce age-group classification errors. As for the dimensional reduction methods, two-dimensional PCA (2DPCA) and two-dimensional LDA (2DLDA) are used. These algorithms are antisymmetric in the treatment of the rows and the columns of the images. We prepared the row-based and column-based algorithms to make two different classifiers with different error tendencies. By combining these classifiers with different errors, the performance can be improved. Experimental results show that our fusion-based age-group classification method achieves better performance than existing two-dimensional algorithms alone.

    DOI

    Scopus

  • Multi-modal Conversation Robot: On the Robot's Act of "Listening" in Conversation

    Tetsunori Kobayashi, Shinya Fujie

    Journal of the Society of Instrument and Control Engineers   46 ( 6 ) 466 - 471  2007.06  [Refereed]  [Invited]

    Authorship:Lead author

  • Speech Starter: Speech Input Interface Capable of Endpoint Detection by Using Filled Pauses

    Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi

      48 ( 5 ) 2001 - 2011  2007.05  [Refereed]

    Authorship:Last author

  • Adequacy analysis of simulation-based assessment of speech recognition system

    Tetsuji Ogawa, Satoshi Kanba, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2007     1153 - 1157  2007.04  [Refereed]

    Authorship:Last author

  • Speech Spotter: Speech Input Interface Capable of Using Speech Recognition in the Midst of Human-Human Conversation

    Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi

      48 ( 3 ) 1275 - 1283  2007.03  [Refereed]

    Authorship:Last author

  • Conversation robot with the function of gaze recognition

    Shinya Fujie, Toshihiko Yamahata, Tetsunori Kobayashi

    IEEE-RAS Int'l Conf. on Humanoid Robots     364 - 369  2006.12  [Refereed]

    Authorship:Last author

  • Realization of rhythmic dialogue on spoken dialogue system using para-linguistic information

    Shinya Fujie , Tetsunori Kobayashi

    The Journal of the Acoustical Society of America    2006.11  [Refereed]

    Authorship:Last author

  • Hybrid Voice Conversion of Unit Selection and Generation Using Prosody Dependent HMM

    Masashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

    Trans. on Information and Systems (ED)   E89-D ( 11 ) 2775 - 2782  2006.11  [Refereed]

    Authorship:Last author

  • Source Separation Using Multiple Directivity Patterns Produced by ICA-based BSS

    Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 14th European Signal Processing Conference    2006.09  [Refereed]

    Authorship:Last author

  • A Method for Solving the Permutation Problem of Frequency-Domain BSS Using Reference Signal

    Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 14th European Signal Processing Conference    2006.09  [Refereed]

    Authorship:Last author

  • Two-dimensional Heteroscedastic Linear Discriminant Analysis for Age-group Classification

    Kazuya Ueki, Teruhide Hayashida, Tetsunori Kobayashi

    Proc. 18th International Conference on Pattern Recognition     585 - 588  2006.08  [Refereed]

    Authorship:Last author

  • Head Gesture Recognition for the Moving Conversation Robot

    NAKAJIMA Kei, EJIRI Yasushi, FUJIE Shinya, OGAWA Tetsuji, MATSUSAKA Yosuke, KOBAYASHI Tetsunori

    The IEICE transactions on information and systems   89 ( 7 ) 1514 - 1522  2006.07  [Refereed]

    Authorship:Last author

    CiNii

  • Spoken Dialogue System Using Recognition of User's Feedback for Rhythmic Dialogue

    Shinya Fujie, Riho Miyake, Tetsunori Kobayashi

    Proc. Speech Prosody 2006    2006.05  [Refereed]

    Authorship:Last author

  • MONEA: Message-Oriented Networked-Robot Architecture

    Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

    Proc. International Conference on Robotics and Automation     194 - 199  2006.04  [Refereed]

    Authorship:Last author

  • MONEA : Message-Oriented Networked-Robot Architecture for Efficient Multifunctional-Robot Development Environment

    Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

      24 ( 4 ) 115 - 125  2006.04  [Refereed]

    Authorship:Last author

  • Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

    Tetsuji Ogawa, Tetsunori Kobayashi

    Trans. on Information and Systems (ED)   E89-D ( 3 ) 939 - 945  2006.03  [Refereed]

    Authorship:Last author

  • Adaptive understanding of proposal-requesting expressions for conversational information retrieval system

    Kenichiro Hosokawa, Shinya Fujie, Tetsunori Kobayashi

    Systems and Computers in Japan   37 ( 14 ) 62 - 72  2006

     View Summary

    This paper considers a conversational system in which information is provided in accordance with the conditions presented by the user, and proposes a method that can adequately deal even with unknown expressions. In most conventional systems, the relation between the expression and the intention of the utterance by the user is built into the system beforehand. Thus, it is difficult to deal adequately with unknown expressions which have not been learned. We propose a framework which adaptively manages on-line the relation between the expression and the intention by interaction with the user. The proposed method produces a framework in which the connection between the expression and the intention is dynamically modified according to the explicitness or implicitness of the affirmative or negative attitude shown by the user to the proposal made by the system. It is verified by an evaluation experiment that the system can adequately learn the relation between the expression and the intention of the user by the proposed method, and can deal adequately with unknown expressions. © 2006 Wiley Periodicals, Inc.

    DOI

    Scopus

  • Manifold HLDA and its application to robust speech recognition

    Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. Interspeech 2006 - 9th International Conference on Spoken Language Processing     1551 - 1554  2006  [Refereed]

     View Summary

    A manifold heteroscedastic linear discriminant analysis (MHLDA), which explicitly removes environmental information from the information useful for discrimination, is proposed. A feature parameter used in pattern recognition usually carries both categorical information and environmental information. The well-known HLDA tries to extract useful information (UI) representing categorical information from the feature parameter. However, environmental information still remains in the UI parameters extracted by HLDA and causes slight degradation in performance, because HLDA does not handle the environmental information explicitly. The proposed MHLDA also extracts UI like HLDA, but handles environmental information explicitly, which makes the MHLDA-based UI parameter less influenced by the environment. As a trade-off, MHLDA slightly degrades the categorical information. In this paper, we combine HLDA-based UI and MHLDA-based UI for pattern recognition to draw on the benefits of both parameters. Experimental results show the effectiveness of this combining method.
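
    The combination described in the summary above (letting a classifier draw on both HLDA- and MHLDA-projected features) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection matrices are random stand-ins for the trained HLDA/MHLDA transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vectors: 100 samples, 20 dimensions.
X = rng.normal(size=(100, 20))

# Random stand-ins for the two learned transforms: an HLDA-style
# projection keeping categorical information, and an MHLDA-style
# projection that additionally suppresses environmental variation.
W_hlda = rng.normal(size=(20, 8))
W_mhlda = rng.normal(size=(20, 8))

# Project into each subspace, then concatenate so a downstream
# classifier can use both representations.
U_combined = np.concatenate([X @ W_hlda, X @ W_mhlda], axis=1)
print(U_combined.shape)  # (100, 16)
```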

  • Subspace-based age-group classification using facial images under various lighting conditions

    Kazuya Ueki, Teruhide Hayashida, Tetsunori Kobayashi

    Proc. 7th International Conference on Automatic Face and Gesture Recognition     43+  2006  [Refereed]

     View Summary

    This paper presents a framework of age-group classification using facial images under various lighting conditions. Our method is based on the appearance-based approach that projects images from the original image space into a face-subspace. We propose a two-phased approach (2DLDA+LDA), which is based on 2DPCA and LDA. Our experimental results show that the new 2DLDA+LDA-based approach improves classification accuracy over the conventional PCA-based and LDA-based approaches. Moreover, the effectiveness of eliminating dimensions that do not contain important discriminative information is confirmed. The accuracy rates are 46.3%, 67.8% and 78.1% for age-groups divided into 5-year, 10-year and 15-year ranges, respectively.
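
    The 2D step in the summary above can be illustrated briefly: unlike PCA/LDA on vectorized images, 2DPCA/2DLDA-style methods project the image matrix directly, and the second phase then applies ordinary LDA to the much smaller result. A hypothetical sketch, with a random matrix standing in for the learned discriminant directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 32x32 grayscale face image.
image = rng.random((32, 32))

# 2DPCA/2DLDA-style methods project the image matrix directly
# (image @ W) instead of flattening it to a 1024-dim vector first.
# W is a random stand-in for learned discriminant directions.
W = rng.normal(size=(32, 5))
feature = image @ W           # 32x5 feature matrix

# The second phase (the "+LDA" step) then operates on the flattened,
# far lower-dimensional feature.
vector = feature.reshape(-1)  # 160 dims instead of 1024
print(feature.shape, vector.shape)  # (32, 5) (160,)
```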

  • A Study of a Spectral Conversion Method Using Prosodic Information

    Ryo Mochizuki, Masashi Okubo, Tetsunori Kobayashi

    The Transactions of the IEICE D-II   J88-DII ( 11 ) 2269 - 2276  2005.11  [Refereed]

    Authorship:Last author

  • Optimizing the Structure of Partly-Hidden Markov Models Using Weighted Likelihood-Ratio Maximization Criterion

    Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 6th Annual Conf. of the Int'l Speech Communication Association     3353 - 3356  2005.09  [Refereed]

    Authorship:Last author

  • Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system

    Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

    Proc. 6th Annual Conf. of the Int'l Speech Communication Association     889 - 892  2005.09  [Refereed]

    Authorship:Last author

  • A Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation Using Reference Signal

    Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    14th European Signal Processing Conference     15  2005.09  [Refereed]

    Authorship:Last author

  • Extension of Hidden Markov Models for Multiple Candidates and its Application to Gesture Recognition

    Yosuke Sato, Tetsuji Ogawa, Tetsunori Kobayashi

    Trans. on Information and Systems (ED)   E88-D ( 6 ) 1239 - 1247  2005.06  [Refereed]

    Authorship:Last author

  • Speech recognition in the blind condition based on multiple directivity patterns using a microphone array

    Toshiyuki Sekiya, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2005     1373 - 1376  2005.03  [Refereed]

    Authorship:Last author

  • Adaptive Understanding of Proposal-Requesting Expressions for Conversational Information Retrieval System

    Kenichiro Hosokawa, Shinya Fujie, Tetsunori Kobayashi

      J88-DII ( 3 ) 619 - 628  2005.03  [Refereed]

    Authorship:Last author

  • Recognition of Positive/Negative Attitude and Its Application to a Spoken Dialogue System

    Shinya Fujie, Yasushi Ejiri, Hideaki Kikuchi, Tetsunori Kobayashi

      J88-DII ( 3 ) 488 - 498  2005.03  [Refereed]

    Authorship:Last author

  • Speech Shift: Speech Input Interface Using Intentional Control of Voice Pitch

    Yukihiro Omoto, Masataka Goto, Katsunobu Itou, Tetsunori Kobayashi

      J88-DII ( 3 ) 469 - 489  2005.03  [Refereed]

    Authorship:Last author

  • An Evaluation of Affective Representation by Prosodic/Spectral Features

    Masashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

      J88-DII ( 2 ) 441 - 444  2005.02  [Refereed]

    Authorship:Last author

  • Anthropomorphic conversational robot: Multimodal human interface with para-linguistic information expressing/understanding abilities

    KOBAYASHI Tetsunori, FUJIE Shinya, MATSUSAKA Yosuke, SHIRAI Katsuhiko

    The Journal of the Acoustical Society of Japan   61 ( 2 ) 85 - 90  2005.02  [Refereed]  [Invited]

    Authorship:Lead author

    DOI CiNii

  • A Conversation Robot with Back-Channel Feedback Function based on Linguistic and Nonlinguistic Information

    Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

    Proc. International Conf. on Autonomous Robots and Agents     379 - 384  2004.12  [Refereed]

    Authorship:Last author

  • Speech Spotter: On-demand Speech Recognition in Human-Human Conversation on the Telephone or in Face-to-Face Situations

    Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi

    Proc. 5th Annual Conf. of the Int'l Speech Communication Association    2004.10  [Refereed]

    Authorship:Last author

  • Speech Recognition Interface for Music Information Retrieval: "Speech Completion" and "Speech Spotter"

    Masataka Goto, Katsunobu Itou, Koji Kitayama, Tetsunori Kobayashi

    ISMIR2004     403 - 408  2004.10  [Refereed]

    Authorship:Last author

  • Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot

    Naoya Mochiki, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 5th Annual Conf. of the Int'l Speech Communication Association    2004.10  [Refereed]

    Authorship:Last author

  • Prosody based Attitude Recognition with Feature Selection and Its Application to Spoken Dialog System as Para-Linguistic Information

    Shinya Fujie, Daizo Yagi, Hideaki Kikuchi, Tetsunori Kobayashi

    Proc. 5th Annual Conf. of the Int'l Speech Communication Association     2841 - 2844  2004.10  [Refereed]

    Authorship:Last author

  • A low-band spectrum envelope reconstruction method for PSOLA-based F0 modification

    Ryo Mochizuki, Tetsunori Kobayashi

    Trans. on Information and Systems (ED)   E87-D ( 10 ) 2426 - 2429  2004.10  [Refereed]

    Authorship:Last author

  • A Conversation Robot Using Head Gesture Recognition as Para-Linguistic Information

    Shinya Fujie, Yasushi Ejiri, Kei Nakajima, Yosuke Matsusaka, Tetsunori Kobayashi

    Proceedings of 13th IEEE International Workshop on Robot and Human Communication     159 - 164  2004.09  [Refereed]

    Authorship:Last author

  • A Method of Gender Classification by Integrating Facial, Hairstyle, and Clothing Images

    Kazuya Ueki, Hiromitsu Komatsu, Satoshi Imaizumi, Kenichi Kaneko, Nobuhiro Sekine, Jiro Katto, Tetsunori Kobayashi

    Proc. Int'l Conf. on Pattern Recognition     446 - 449  2004.08  [Refereed]

    Authorship:Last author

  • Design and Implementation of Data Sharing Architecture for Multifunctional Robot Development

    Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi

    Systems and Computers in Japan   35 ( 8 ) 54 - 65  2004.07  [Refereed]

    Authorship:Last author

  • Extension of State-Observation Dependency in Partly-Hidden Markov Models and Its Application to Continuous Speech Recognition

    OGAWA Tetsuji, KOBAYASHI Tetsunori

    The Transactions of the Institute of Electronics, Information and Communication Engineers   87 ( 6 ) 1216 - 1223  2004.06  [Refereed]

    Authorship:Last author

    CiNii

  • Speech Enhancement based on Multiple Directivity patterns using a Microphone Array

    Toshiyuki Sekiya, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2004    2004.05  [Refereed]

    Authorship:Last author

  • A Low-band Spectrum Envelope Modeling For High Quality Pitch Modification

    Ryo Mochizuki, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2004     645 - 648  2004.05  [Refereed]

    Authorship:Last author

  • Spoken Dialogue System Using Prosody as Para-Linguistic Information

    Shinya Fujie, Daizo Yagi, Yosuke Matsusaka, Hideaki Kikuchi, Tetsunori Kobayashi

    Proc. Int'l Conf. on Speech Prosody 2004     387 - 390  2004.03  [Refereed]

    Authorship:Last author

  • Multi-Layer Audio Segregation and its Application to Double Talk

    Toshiyuki Sekiya, Tomohiro Sawada, Tetsuji Ogawa, Tetsunori Kobayashi

    SWIM (Lectures by Masters in Speech Processing)    2004.01  [Refereed]

    Authorship:Last author

  • Recognition of Para-Linguistic Information and Its Application to Spoken Dialogue System

    Shinya FUJIE, Yasushi EJIRI, Yosuke MATSUSAKA, Hideaki KIKUCHI , Tetsunori KOBAYASHI

    Proc. Automatic Speech Recognition and Understanding Workshop     231 - 236  2003.12  [Refereed]

    Authorship:Last author

  • Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking

    Noriyuki Murai, Tetsunori Kobayashi

    Systems and Computers in Japan   34 ( 30 ) 103 - 111  2003.11  [Refereed]

    Authorship:Last author

  • Speech Starter: Noise-Robust Endpoint Detection by Using Filled Pauses

    Koji Kitayama, Masataka Goto, Katsunobu Itou, Tetsunori Kobayashi

    Proc. 4th Annual Conf. of the Int'l Speech Communication Association     1237 - 1240  2003.09  [Refereed]

    Authorship:Last author

  • Evaluation of the Mental Workload of Using Speech while Driving a Car

    Junichi Munechika, Yosuke Matsusaka, Tetsunori Kobayashi

    2nd Forum on Information Technology (FIT2003), Information Technology Letters   2   105 - 106  2003.09  [Refereed]

    Authorship:Last author

  • Design and Implementation of Data Sharing Architecture for Multi-Functional Robot Development

    Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi

      J86-D-I ( 5 ) 318 - 329  2003.05  [Refereed]

    Authorship:Last author

  • Hybrid modeling of PHMM and HMM for speech recognition

    Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2003     140 - 143  2003.04  [Refereed]

    Authorship:Last author

  • Inter-Module Cooperation Architecture for Interactive Robot

    KyeongJu Kim, Yosuke Matsusaka, Tetsunori Kobayashi

    International Conference on Intelligent Robots and Systems     2286 - 2291  2002.10  [Refereed]

    Authorship:Last author

  • Generalization of State-Observation-Dependency in Partly Hidden Markov Models

    Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 3rd Annual Conf. of the Int'l Speech Communication Association     2673 - 2676  2002.09  [Refereed]

    Authorship:Last author

  • System Architecture to Realize Widely Applicable and Interactive Behavior of the Robot

    Yosuke Matsusaka, Tetsunori Kobayashi

    Proc. Int'l Workshop on Lifelike Animated Agents -Tools, Affective Functions, and Applications-     77 - 82  2002.08  [Refereed]

    Authorship:Last author

  • Media-Integrated Biometric Person Recognition Based on the Dempster-Shafer Theory

    Yoshiaki Sugie, Tetsunori Kobayashi

    Proc. Int'l Conf. on Pattern Recognition 2002     381 - 384  2002.08  [Refereed]

    Authorship:Last author

  • Extension of Hidden Markov Models to Deal with Multiple Candidates of Observations and its Application to Mobile-robot-oriented Gesture Recognition

    Yosuke Sato, Tetsunori Kobayashi

    Proc. Int'l Conf. on Pattern Recognition 2002     515 - 519  2002.08  [Refereed]

    Authorship:Last author

  • Trend of Spoken Dialogue Research (Special Issue: Recent Advancements of Spoken Language Interfaces and Dialogue Systems)

    Tetsunori Kobayashi

      17 ( 3 ) 266 - 270  2002.05  [Refereed]  [Invited]

    Authorship:Lead author

  • Humanoid Robots in Waseda University: Hadaly-2 and WABIAN

    Shuji Hashimoto, Tetsunori Kobayashi, et al.

    Autonomous Robots, Kluwer Academic Publishers   12   25 - 38  2002  [Refereed]

  • System Software for Collaborative Development of Interactive Robot

    Yosuke Matsusaka, Tetsunori Kobayashi

    IEEE-RAS Int'l Conf. on Humanoid Robots     271 - 277  2001.11  [Refereed]

    Authorship:Last author

  • Modeling of conversational strategy for the robot participating in the group conversation

    Yosuke Matsusaka, Shinya Fujie, Tetsunori Kobayashi

    Proc. 2nd Annual Conf. of the Int'l Speech Communication Association     2173 - 2176  2001.09  [Refereed]

    Authorship:Last author

  • Estimating positions of multiple adjacent speakers based on MUSIC spectra correlation using a microphone array

    Hidetomo Tanaka, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2001     3045 - 3048  2001.05  [Refereed]

    Authorship:Last author

  • Japanese Dictation ToolKit - 1999 version -

    Kawahara Tatsuya, Lee Akinobu, Kobayashi Tetsunori, Takeda Kazuya, Minematsu Nobuaki, Sagayama Shigeki, Itou Katsunobu, Itou Akinori, Yamamoto Mikio, Yamada Atsushi, Utsuro Takehito, Shikano Kiyohiro

    The Journal of the Acoustical Society of Japan   57 ( 3 ) 210 - 214  2001.03

    DOI CiNii J-GLOBAL

  • The DARPA Speech Projects and Speech Recognition Research in Japan

    Tetsunori Kobayashi

    The Journal of the Acoustical Society of Japan   57 ( 1 ) 70 - 60  2001.01  [Refereed]

    Authorship:Lead author

  • Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking

    Noriyuki Murai, Tetsunori Kobayashi

      J83-D-II ( 11 ) 2465 - 2742  2000.11  [Refereed]

    Authorship:Last author

  • Spoken Word Recognition Using Partly-Hidden Markov Models

    Junko Koyama, Tetsunori Kobayashi

      J83-D-II ( 11 ) 2379 - 2387  2000.11  [Refereed]

    Authorship:Last author

  • Robust Language Modeling for Small Corpus of Target Task Using Class Combined Word Statistics and Selective Use of General Corpus

    Yosuke Wada, Norihiko Kobayashi, Tetsunori Kobayashi

      J83-D-II ( 11 ) 2379 - 2406  2000.11  [Refereed]

    Authorship:Last author

  • Partly-Hidden Markov Model and Its Application to Gesture Recognition

    Ken Masumitsu, Tetsunori Kobayashi

      41 ( 11 ) 3060 - 3069  2000.11  [Refereed]

    Authorship:Last author

  • Free software toolkit for Japanese large vocabulary continuous speech recognition

    T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, S.Sagayama, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

    Proc. 1st Annual Conf. of the Int'l Speech Communication Association     476 - 479  2000.09  [Refereed]

  • Dictation of multi-party conversation using statistical turn taking model and speaker models

    Noriyuki Murai, Tetsunori Kobayashi

    Proc. of International Conference on Acoustic, Speech, Signal Processing     1575 - 1578  2000.06  [Refereed]

    Authorship:Last author

  • A conversational robot utilizing facial and body expressions

    Tsuyoshi Tojo, Yosuke Matsusaka, Tomotada Ishii, Tetsunori Kobayashi

    Proc. International Conf. on System, Man and Cybernetics     858 - 863  2000.06  [Refereed]

    Authorship:Last author

  • Japanese Dictation ToolKit - 1998 version -

    KAWAHARA Tatsuya, LEE Akinobu, KOBAYASHI Tetsunori, TAKEDA Kazuya, MINEMATSU Nobuaki, ITOU Katsunobu, YAMAMOTO Mikio, YAMADA Atsushi, UTSURO Takehito, SHIKANO Kiyohiro

    The Journal of the Acoustical Society of Japan   56 ( 4 ) 255 - 259  2000.04  [Refereed]

    DOI CiNii

  • Multi-person Conversation via Multi-modal Interface: A Robot who Communicates with Multi-user

    Yosuke Matsusaka, Tsuyoshi Tojo, Sentaro Kubota, Kenji Furukawa, Daisuke Tamiya, Shinya Fujie, Tetsunori Kobayashi

    Proc. European Conf. on Speech Communication and Technology     1723 - 1726  1999.09  [Refereed]

    Authorship:Last author

  • Class-combined Word N-gram for Robust Language Modeling

    Noriyuki Kobayashi, Tetsunori Kobayashi

    Proc. European Conf. on Speech Communication and Technology     1599 - 1602  1999.09  [Refereed]

    Authorship:Last author

  • Multi-person Conversation Robot using Multi-modal Interface

    Yosuke Matsusaka, Tsuyoshi Tojo, Sentaro Kubota, Kenji Furukawa, Shinya Fujie, Tetsunori Kobayashi

    Proc. SCI/ICAS'99     450 - 455  1999.07  [Refereed]

    Authorship:Last author

  • Controlling Dialogue Strategy According to Performance of Processing

    Hideaki Kikuchi, Tetsunori Kobayashi, Katsuhiko Shirai

    Proc. ESCA Workshop on Interactive Dialogue in Multi-Modal Systems     85 - 88  1999.06  [Refereed]

  • Japanese dictation toolkit: 1997 version

    T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Ito, A.Itoh, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

    Journal of the Acoustical Society of Japan E   20 ( 3 ) 233 - 239  1999.05  [Refereed]

  • JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research

    K.Ito, M.Yamamoto, K.Takeda, T.Takezawa, T.Matsuoka, Tetsunori Kobayashi, K.Shikano and S.Itahashi

    Journal of the Acoustical Society of Japan E   20 ( 3 ) 199 - 207  1999.05  [Refereed]

  • Effect of Vocabulary Extension using Word Sequence Concatenation for Large Vocabulary Continuous Speech Recognition

    Yosuke Wada, Norihiko Kobayashi, Yuichiro Nakano, Tetsunori Kobayashi

      40 ( 4 ) 1413 - 1420  1999.04  [Refereed]

    Authorship:Last author

  • Partly Hidden Markov Model and its Application to Speech Recognition

    T.Kobayashi, K.Masumitsu, J.Furuyama

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1999     121 - 124  1999.03  [Refereed]

    Authorship:Lead author

  • Japanese dictation toolkit: 1997 version

    T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Ito, A.Itoh, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

      55 ( 3 ) 175 - 180  1999.03  [Refereed]

  • The Design of the Newspaper-Based Japanese Large Vocabulary Continuous Speech Recognition Corpus

    K.Itoh, M.Yamamoto, K.Takezawa, T.Matsuoka, K.Shikano, T.Kobayashi, S.Itahashi

    Proc. 5th Int'l Conf. on Spoken Language Processing     3261 - 3264  1998.12  [Refereed]

  • Sharable software repository for Japanese large vocabulary continuous speech recognition

    T.Kawahara, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

    Proc. 5th Int'l Conf. on Spoken Language Processing     3257 - 3260  1998.12  [Refereed]

  • Source-Extended Language Model for Large Vocabulary Continuous Speech Recognition

    Tetsunori Kobayashi, Norihiko Kobayashi, Yosuke Wada

    Proc. 5th Int'l Conf. on Spoken Language Processing     2431 - 2434  1998.12  [Refereed]

    Authorship:Lead author

  • Controlling Gaze of Humanoid in Communication with Human

    H.Kikuchi, M.Yokoyama, K.Hoashi, Y.Hidaki, T.Kobayashi, K.Shirai

    Proc. IROS'98/IEEE     255 - 260  1998.10  [Refereed]

  • Design and Development of Japanese Speech Corpus for Large Vocabulary Continuous Speech Recognition Assessment

    K.Itou, K.Takeda, T.Takezawa, T.Matsuoka, K.Shikano, T.Kobayashi, S.Itahashi, M.Yamamoto

    Proc. of First International Workshop on East-Asian Language Resources and Evaluation     98 - 103  1998.05  [Refereed]

  • Common Platform of Japanese Large Vocabulary Continuous Speech Recognizer—Proposal and Initial Results

    T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

    Proc. of First International Workshop on East-Asian Language Resources and Evaluation     117 - 122  1998.05  [Refereed]

  • Speech Processing Technology towards Practical Use

    Katsuhiko Shirai, Tetsunori Kobayashi, Ikuo Kudo

      38 ( 11 ) 971 - 975  1997.10  [Refereed]  [Invited]

  • Humanoid - Intelligent Anthropomorphic Robot -

    Shuji Hashimoto, Seinosuke Narita, Katsuhiko Shirai, Tetsunori Kobayashi, Atsuo Takanishi, Shigeki Sugano, Yoshinori Kasahara

      38 ( 11 ) 959 - 969  1997.10  [Refereed]

  • Humanoid Robot ---Development of an Information Assistant Robot Hadaly---

    S. Hashimoto, S. Narita, H. Kasahara, A. Takanishi, S. Sugano, K. Shirai, T. Kobayashi, H. Takanobu, T. Kurata, K. Fujiwara, T. Matsuno, T. Kawasaki, K. Hoashi

    Proc. Int'l Workshop on Robot and Human Communication     106 - 111  1997.09  [Refereed]

  • Development of ASJ Continuous Speech Corpus --- Japanese Newspaper Article Sentences (JNAS) ---

    Shuichi ITAHASHI, Mikio YAMAMOTO, Toshiyuki TAKEZAWA, Tetsunori KOBAYASHI

    Proc. Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques    1997.09  [Refereed]

  • Partly Hidden Markov Model and its Application to Gesture Recognition

    Tetsunori Kobayashi, Satoshi Haruyama

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1997     3081 - 3084  1997.04  [Refereed]

    Authorship:Lead author

  • Human interface of the humanoids

    Tetsunori Kobayashi

    Proc. International Workshop on Human Interface Technology     63  1997.03  [Refereed]

    Authorship:Lead author

  • Cooperative Use of Speech in a Multimodal Input Environment: Design and Evaluation of the Speech-Based Drawing System S-tgif

    Takuya Nishimoto, Nobutoshi Shida, Tetsunori Kobayashi, Katsuhiko Shirai

    The Transactions of the IEICE D-II   79 ( 12 ) 2176 - 2183  1996.12  [Refereed]

    Authorship:Corresponding author

     View Summary

    Within a multimodal interface framework, we examined how speech input can contribute to improving the interface, and designed and evaluated a multimodal drawing system, S-tgif, that exploits the resulting findings. Following interface design principles, the system makes the most of the strengths of speech (ease of operation and of recalling procedures) while compensating elsewhere for its weaknesses (poor state visibility and robustness). Evaluation experiments showed that speech was particularly effective shortly after users started using the system, or after resuming from a break, reducing task completion time to about 80%. As users became proficient, the objective benefit of speech faded, but speech input remained well received: speech usage exceeded 90% for certain commands, and subjective ratings were high. These results show that a useful interface can be built by considering the effective use of speech in line with interface design principles.

    CiNii

  • ALICE: Acquisition of Language In Conversational Environment: An Approach to Weakly Supervised Training of Spoken Language System for Language Porting

    Tetsunori Kobayashi

    Proc. 4th Int'l Conf. on Spoken Language Processing     833 - 836  1996.10  [Refereed]

    Authorship:Lead author

  • An application of Dempster and Shafer's probability theory to speech recognition

    Tetsunori Kobayashi

    The Journal of the Acoustical Society of America   100 ( 4 Pt.2 ) 2757  1996.10  [Refereed]

    Authorship:Lead author

  • Speech recognition in nonstationary noise based on parallel HMMs and spectral subtraction

    Ryuji Mine, Tetsunori Kobayashi, Katsuhiko Shirai

    Systems and Computers in Japan   27 ( 14 ) 37 - 44  1996  [Refereed]

    Authorship:Corresponding author

     View Summary

    This paper proposes a method of speech recognition in a nonstationary noisy environment, combining parallel HMMs and spectral subtraction. In the proposed method, a set of hypotheses is generated for the combinations of speech and noise that can produce the observed data through a series of subtraction processes. Using HMMs prepared separately for the speech and the noise, the probabilities of occurrence are calculated. The task in the experiment is 100-word recognition in the noisy environment of an ordinary car running in an urban area. Comparative experiments are made between the proposed method, the ordinary spectral subtraction method, and other parallel HMM methods, and the effectiveness of the proposed method is verified.

    DOI

    Scopus

    2 Citations (Scopus)
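
    The spectral-subtraction step underlying the method summarized above can be sketched in a few lines; the magnitude spectra below are illustrative numbers, and the flooring factor is an assumed choice, not a value from the paper:

```python
import numpy as np

# Observed magnitude spectrum and an estimated noise spectrum
# (illustrative values only).
observed = np.array([5.0, 3.0, 1.0, 4.0])
noise_est = np.array([1.0, 2.0, 2.0, 1.5])

# Subtract the noise estimate and floor the result so bins where the
# noise estimate exceeds the observation stay non-negative (the floor
# factor 0.1 is an assumption).
floor = 0.1 * observed
clean_est = np.maximum(observed - noise_est, floor)
print(clean_est)  # [4.  1.  0.1 2.5]
```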
  • Improving human interface in drawing tool using speech, mouse and key-board

    Takuya Nishimoto, Nobutoshi Shida, Tetsunori Kobayashi, Katsuhiko Shirai

    Proc. International Workshop on Robot and Human Communication     107 - 112  1995  [Refereed]

    Authorship:Corresponding author

  • Phoneme recognition in various styles of utterance based on mutual information criterion

    Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

    Proc. 3rd Int'l Conf. on Spoken Language Processing     1911 - 1917  1994.09  [Refereed]

  • Multimodal drawing tool using speech, mouse and key-board

    T.Nishimoto, N.Shida, T.Kobayashi, K.Shirai

    Proc. 3rd Int'l Conf. on Spoken Language Processing     1287 - 1290  1994.09  [Refereed]

    Authorship:Corresponding author

  • Generation of prosody in speech synthesis using large speech data-base

    Naohiro Sakurai, Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

    Proc. 3rd Int'l Conf. on Spoken Language Processing     747 - 750  1994.09  [Refereed]

    Authorship:Corresponding author

  • Phoneme recognition in continuous speech based on mutual information criterion

    Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

      50 ( 9 ) 702 - 710  1994.09  [Refereed]

  • Handling of User Interruption Realizing Timing-Free Utterances for Spoken Dialogue Interface

    Hideaki Kikuchi, Ikuo Kudo, Tetsunori Kobayashi, Katsuhiko Shirai

      J77-D ( 8 ) 1502 - 1511  1994.08  [Refereed]

  • Speech Recognition in Nonstationary Noise Based on Parallel HMMs and Spectral Subtraction

    Ryuji MINE, Tetsunori KOBAYASHI, Katsuhiko SHIRAI

      J78-DII ( 7 ) 1021 - 1027  1994.07  [Refereed]

    Authorship:Corresponding author

  • Recognition of conversational speech

    Tetsunori Kobayashi

      50 ( 7 ) 563 - 567  1994.07  [Refereed]  [Invited]

    Authorship:Lead author

  • Characterization of fluctuations in fundamental periods of speech based on fractal analysis

    Tetsunori Kobayashi

    The Journal of the Acoustical Society of America   95 ( 5 Pt.2 ) 2824  1994.05  [Refereed]

    Authorship:Lead author

  • Automatic training of phoneme dictionary based on mutual information criterion

    Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1994     241 - 244  1994.04  [Refereed]

  • Markov model based noise modeling and its application to noisy speech recognition using dynamical features of speech

    Tetsunori Kobayashi, Ryuji Mine, Katsuhiko Shirai

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1994     57 - 60  1994.04  [Refereed]

    Authorship:Lead author

  • Phoneme Recognition Using Probability Ratio between Phoneme-Group-Pair

    Tetsunori Kobayashi, Y.Hamano, S.An, Katsuhiko Shirai

      J77-A ( 2 ) 128 - 134  1994.02  [Refereed]

    Authorship:Lead author

  • Speech synthesis of japanese sentences using large waveform data-base

    Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

    1993 International Workshop on Speech Processing     95 - 100  1993.11  [Refereed]

  • Word spotting in conversational speech based on phonemic likelihood by mutual information criteion

    S.Okawa, T.Kobayashi, K.Shirai

    Proc. European Conf. on Speech Communication and Technology     1281 - 1284  1993.09  [Refereed]

  • Speech recognition under the unstationary noise based on the noise markov model and spectral subtraction

    T.Kobayashi, R.Mine, K.Shirai

    Proc. European Conf. on Speech Communication and Technology     833 - 836  1993.09  [Refereed]

    Authorship:Lead author

  • Speech Recognition Based on Hidden Markov Models

    Tetsunori Kobayashi

    IEEJ Transactions on Electronics, Information and Systems (C)   113 ( 5 ) 295 - 301  1993.05  [Refereed]  [Invited]

    Authorship:Lead author

    CiNii

  • Design and creation of speech and text corpora of dialogue

    Satoru Hayamizu, Shuichi Itahashi, Tetsunori Kobayashi, Toshiyuki Takezawa

    Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA)   E76-A ( 1 ) 17 - 22  1993  [Refereed]

  • Phrase recognition in conversational speech using prosodic and phonemic information

    Shigeki Okawa, Takashi Endo, Tetsunori Kobayashi, Katsuhiko Shirai

    Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA)   E76-A ( 1 ) 44 - 50  1993  [Refereed]

  • High quality synthetic speech generation using synchronized oscillators

    Kenji Hashimoto, Takemi Mochida, Yasuaki Satoh, Tetsunori Kobayashi, Katsuhiko Shirai

    Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA)   E76-A ( 11 ) 1949 - 1956  1993  [Refereed]

  • ASJ continuous speech corpus for research

    Tetsunori Kobayashi, Shuichi Itahashi, Satoru Hayamizu, Toshiyuki Takezawa

      48 ( 12 ) 888 - 893  1992.12  [Refereed]

    Authorship:Lead author

  • Spectral mapping onto probabilistic domain using neural networks and its application to speaker adaptive phoneme recognition

    T.Kobayashi

    Proc. 2nd Int'l Conf. on Spoken Language Processing     385 - 388  1992.11  [Refereed]

    Authorship:Lead author

  • Speaker adaptive phoneme recognition based on spectral mapping to probabilistic domain

    T.Kobayashi, Y.Uchiyama, J.Osada, K.Shirai

    Proc. of International Conference on Acoustics, Speech and Signal Processing     457 - 460  1992.03  [Refereed]

    Authorship:Lead author

  • Fractal dimension of fluctuations in fundamental period of speech

    K.Shirai, T.Kobayashi, M.Yagyu

    Proc. of International Conference on Noise in Physical Systems and 1/f Fluctuations    1991.11  [Refereed]

    Authorship:Corresponding author

  • Visualization of Speech Production Process and Color Representation of Phonetic Information

    Katsuhiko Shirai, Tetsunori Kobayashi

      11 ( 43 ) 216 - 221  1991.10  [Refereed]

  • The Role of Fluctuations in Fundamental Period for Natural Speech Synthesis.

    Tetsunori Kobayashi, Hidetoshi Sekine

      47 ( 8 ) 539 - 544  1991.08  [Refereed]

    Authorship:Lead author

  • Estimation of articulatory motion using neural networks

    Katsuhiko Shirai, Tetsunori Kobayashi

    Journal of Phonetics   19   379 - 385  1991.08  [Refereed]

    Authorship:Corresponding author

  • Analysis of contextual dependency of phonetic features and its application to speech recognition

    T.Kobayashi, K.Watanabe, Y.Uchiyama

    Proc. Korea-Japan joint workshop on advanced technology of speech recognition and synthesis     92 - 97  1991.07  [Refereed]

    Authorship:Lead author

  • Application of neural networks to articulatory motion estimation

    T.Kobayashi, M.Yagyu, K.Shirai

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1991     489 - 492  1991.05  [Refereed]

    Authorship:Lead author

  • Dependence of spectral features of vowels and voiceless stops on phoneme environment.

    Tetsunori Kobayashi, Kazuhiro Watanabe, Toshiyuki Matsuda

      J74-A ( 3 ) 353 - 359  1991  [Refereed]

    Authorship:Lead author

  • Statistical properties of fluctuation of pitch intervals and its modeling for natural synthetic speech

    T.Kobayashi,H.Sekine

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1990     321 - 324  1990  [Refereed]

    Authorship:Lead author

  • Dependence of phonemic feature on context

    T.Kobayashi, K.Watanabe

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1990     769 - 772  1990  [Refereed]

    Authorship:Lead author

  • Quantification Theory Dealing with Categorical Dependency and Its Application to the Modeling of Spectral Difference of Phoneme.

    Tetsunori Kobayashi, Toshiyuki Matsuda, Kazuhiro Watanabe

      J74-A ( 3 ) 345 - 352  1990  [Refereed]

    Authorship:Lead author

  • Contextual Factor Analysis of Vowel Distribution

    Tetsunori Kobayashi, Toshiyuki Matsuda, Kazuhiro Watanabe

    Proc. European Conf. on Speech Communication and Technology     2277 - 2280  1989  [Refereed]

    Authorship:Lead author

  • A categorical factor analysis of vowel distribution based on the modified quantification theory

    Tetsunori Kobayashi, Toshiyuki Matsuda

    The Journal of the Acoustical Society of America    1988  [Refereed]

    Authorship:Lead author

  • Speech Production Model and Automatic Recognition

    Katsuhiko Shirai, Tetsunori Kobayashi

    Nature, Cognition and System   I   3 - 14  1988  [Refereed]

  • Description of Task Dependent Knowledge for Speech Understanding System

    Tetsunori Kobayashi, Katsuhiko Shirai

    European Conference on Speech Technology    1987  [Refereed]

    Authorship:Lead author

  • The robot musician ‘wabot-2’(waseda robot-2)

    Ichiro Kato, Sadamu Ohteru, Katsuhiko Shirai, Toshiaki Matsushima, Seinosuke Narita, Shigeki Sugano, Tetsunori Kobayashi, Eizo Fujisawa

    Robotics   3 ( 2 ) 143 - 155  1987  [Refereed]

  • A network model dealing with focus of conversation for speech understanding system

    Tetsunori Kobayashi, Katsuhiko Shirai

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1986     1589 - 1592  1986  [Refereed]

    Authorship:Lead author

  • Estimation of articulatory parameters by table look-up method and its application for speaker independent phoneme recognition

    Katsuhiko Shirai, Tetsunori Kobayashi, Jun Yazawa

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1986     2247 - 2250  1986  [Refereed]

    Authorship:Corresponding author

  • Estimating articulatory motion from speech wave

    Katsuhiko Shirai, Tetsunori Kobayashi

    Speech Communication   5 ( 2 ) 159 - 170  1986  [Refereed]

     View Summary

    If articulatory movements can be estimated, the articulatory parameters that represent the motion of the articulatory organs would be useful for speech recognition. This paper discusses an effective method of estimating articulatory movements and its application to speech recognition. First, a method of estimating articulatory parameters known as the model matching method is described, and various spectral distance measures are evaluated for it. The results show that the best measure on average is the higher-order cepstral distance measure, one of the peak-weighted measures. Second, articulatory parameters are used for the recognition of vowels uttered by unspecified speakers; adapting the model with the estimated mean vocal tract length is shown to be effective for normalizing speaker differences. Third, the motor commands that move the articulatory organs are estimated with articulatory dynamics taken into account, and continuous vowels are recognized by means of these estimated commands. A considerable part of the coarticulation effects can be compensated for by this command estimation, which makes the method useful for continuous speech recognition. © 1986.

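The higher-order cepstral distance named in this summary weights coefficient differences so that spectral peaks are emphasized. A minimal sketch of one common index-weighted variant (an illustration only; `weighted_cepstral_distance` and the linear weight are assumptions, not necessarily the exact measure used in the paper):

```python
import math

def weighted_cepstral_distance(c1, c2, weight=lambda k: k):
    """Cepstral distance with each coefficient difference scaled by a
    weight that grows with the quefrency index k, emphasizing the
    higher-order cepstrum (and hence spectral peaks)."""
    assert len(c1) == len(c2)
    return math.sqrt(sum((weight(k) * (a - b)) ** 2
                         for k, (a, b) in enumerate(zip(c1, c2), start=1)))

# A mismatch in a higher-order coefficient costs more than the same
# mismatch in a lower-order one.
print(weighted_cepstral_distance([1.0, 0.0], [0.0, 0.0]))  # → 1.0
print(weighted_cepstral_distance([0.0, 1.0], [0.0, 0.0]))  # → 2.0
```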
  • Evaluation of Spectral Distance Measure for the Estimation of Articulatory Motion by the Model Matching Method

    Tetsunori Kobayashi, Jun Yazawa, Katsuhiko Shirai

      J68-A ( 2 ) 210 - 217  1985.10  [Refereed]

    Authorship:Lead author

  • Speech I/O System Realizing Flexible Conversation for Robot

    Katsuhiko Shirai, Tetsunori Kobayashi, Kazuhiko Iwata, Yoshio Fukazawa

      3 ( 4 ) 362 - 372  1985.08  [Refereed]

    Authorship:Corresponding author

  • Phrase speech recognition for large vocabulary

    Tetsunori Kobayashi, Yasuhiro Komori, Katsuhiko Shirai

      J68-D ( 6 ) 1304 - 1311  1985.06  [Refereed]

    Authorship:Lead author


  • Speech conversation system of the musician robot

    Tetsunori Kobayashi, Y. Komori, N. Hashimoto, Kazuhiko Iwata, Y. Fukazawa, K. Shirai

    Proc. ICAR'85     483 - 488  1985  [Refereed]

    Authorship:Lead author

  • Speech I/O System Realizing Flexible Conversation for Robot--The Conversational System of WABOT-2

    Katsuhiko Shirai, Tetsunori Kobayashi, Kazuhiko Iwata, Yoshio Fukazawa

    Bulletin of Science and Engineering Research Laboratory, Waseda University   112   53 - 79  1985  [Refereed]

    Authorship:Corresponding author

  • Recognition of Vowels in Continuous Speech Based on the Articulatory Control Model

    Tetsunori Kobayashi, Katsuhiko Shirai

      J67-A ( 10 ) 935 - 942  1984.10  [Refereed]

    Authorship:Lead author


  • Phrase speech recognition of large vocabulary using feature in articulatory domain

    Katsuhiko Shirai, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1984     409 - 412  1984  [Refereed]

    Authorship:Corresponding author

  • Considerations on articulatory dynamics for continuous speech recognition

    Katsuhiko Shirai, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1983     324 - 327  1983  [Refereed]

    Authorship:Corresponding author

  • Validity of Articulatory Parameters in Continuous Speech Recognition for Unspecified Speakers - Vowel Discrimination Test -

    Katsuhiko Shirai, Hiroshi Matsuura, Tetsunori Kobayashi

      J65-A ( 7 ) 671 - 678  1982.07  [Refereed]


  • Recognition of semivowels and consonants in continuous speech using articulatory parameters

    Katsuhiko Shirai, Tetsunori Kobayashi

    Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1982     2004 - 2007  1982  [Refereed]

    Authorship:Corresponding author


Books and Other Publications

  • Paralinguistic Information and its Integration in Spoken Dialogue Systems

    Ramón López-Cózar Delgado, Tetsunori Kobayashi( Part: Joint editor)

    Springer  2011 ISBN: 9781461413349

  • 音声言語処理の潮流

    白井克彦( Part: Contributor)

    コロナ社  2010.03 ISBN: 9784339008104

  • 韻律と音声言語情報処理 : アクセント・イントネーション・リズムの科学

    広瀬啓吉( Part: Contributor)

    丸善  2006.01 ISBN: 4621076744

  • 情報システムとヒューマンインターフェース

    白井克彦( Part: Contributor)

    早稲田大学出版部  2010.03 ISBN: 9784657101099

  • Springer handbook of robotics

    Siciliano, Bruno, Khatib, Oussama( Part: Contributor)

    Springer  2008 ISBN: 9783540239574

  • 人工知能学事典

    人工知能学会( Part: Contributor)

    共立出版  2005.12 ISBN: 4320121074

  • ロボット工学ハンドブック

    日本ロボット学会( Part: Contributor)

    コロナ社  2005.06 ISBN: 9784339045765

  • Spoken language systems

    Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara( Part: Contributor)

    Ohmsha,IOS  2005 ISBN: 1586035150

  • 感性情報学 : 感じる・楽しむ・創りだす : 感性的ヒューマンインタフェース最前線

    原島博, 井口征士, 乾敏郎( Part: Contributor)

    工作舎  2004.05 ISBN: 4875023782

  • マルチメディア処理入門

    新田恒雄, 岡村好庸, 杉浦彰彦, 小林哲則, 金沢靖, 山本真司( Part: Joint author)

    朝倉書店  2002.04 ISBN: 4254205074

  • 人間型ロボットのはなし

    早稲田大学ヒューマンノイドプロジェクト( Part: Joint author)

    日刊工業新聞社  1999.06 ISBN: 4526043974


  • Cで学ぶプログラミング技法

    小林哲則( Part: Sole author)

    培風館  1997.11 ISBN: 4563013951

  • Recent research towards advanced man-machine interface through spoken language.

    Hiroya Fujisaki( Part: Contributor)

    Elsevier  1996.10 ISBN: 0444816070


  • International Symposium on Spoken Dialogue : New directions in human and man-machine communication

    Katsuhiko Shirai, Tetsunori Kobayashi, Yasunari Harada( Part: Joint editor)

    ISSD Organizing Committee  1993.11 ISBN: 4990026918


Presentations

  • HANASHI-JOZU = A Good Conversationalist: Goodbye Request-response Model, Hello Pre-planned Information Transfer Model.

    Tetsunori Kobayashi  [Invited]

    Multimodal Agents for Ageing and Multicultural Societies, NII Shonan meeting 

    Presentation date: 2018.10

  • A robot-based enjoyable conversation system

    Tetsunori Kobayashi  [Invited]

    5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan

    Presentation date: 2016.11

  • A robot-based approach towards finding conversation protocol: the role of prosody, eye gaze and body expressions in communication

    Tetsunori Kobayashi  [Invited]

    The Fifth Workshop of Eye Gaze in Intelligent Human Machine Interaction 

    Presentation date: 2013

  • Robot as a multimodal human interface device

    Tetsunori Kobayashi  [Invited]

    International Conference on Auditory-Visual Speech Processing 

    Presentation date: 2010.10

  • Conversation robot recognizing and expressing paralinguistic information

    Tetsunori Kobayashi  [Invited]

    Workshop on Predictive Models of Human Communication Dynamics 

    Presentation date: 2010.08

  • Information-encountering conversation system: knowledge transfer through diverse information behaviors

    Tetsunori Kobayashi  [Invited]

    The 7th Dialogue System Symposium (JSAI/IPSJ/IEICE)

    Presentation date: 2016.10

  • Conversation and robots

     [Invited]

    MMDAgent DAY ! 

    Presentation date: 2016.10

  • Enjoyable Conversation System

    Tetsunori Kobayashi  [Invited]

    InterACT 2016 

    Presentation date: 2016.07

  • Conversation robots and their protocols

    Tetsunori Kobayashi  [Invited]

    Acoustical Society of Japan, Spring Meeting

    Presentation date: 2016.03

  • Speech synthesis systems for conversation

    Tetsunori Kobayashi, Kazuhiko Iwata  [Invited]

    IEICE Technical Committee on Speech (SP)

    Presentation date: 2014.11

  • History of the Conversational Robot

    Tetsunori Kobayashi  [Invited]

    International Workshop on Spoken Dialogue System 

    Presentation date: 2010.10

  • Multimodal conversation robots and group communication

    Tetsunori Kobayashi  [Invited]

    IEICE VNV (Verbal and Nonverbal Communication) Workshop

    Presentation date: 2009.03

  • A new paradigm for developing speech-recognition-based application systems

    Tetsunori Kobayashi  [Invited]

    The 10th Spoken Language Symposium (IPSJ/IEICE)

    Presentation date: 2008.12

  • Realizing rhythmic dialogue communication through paralinguistic understanding and generation

    Tetsunori Kobayashi  [Invited]

    Robotics Society of Japan, Robotics Engineering Seminar

    Presentation date: 2006.03

  • A conversation robot with the ability to understand and generate paralinguistic information

    Tetsunori Kobayashi  [Invited]

    IEICE Technical Committee on Pattern Recognition and Media Understanding (PRMU)

    Presentation date: 2005.09

  • Sound source localization and separation using microphones mounted on a robot head

     [Invited]

    Acoustical Society of Japan, Spring Meeting

    Presentation date: 2005.03

  • Current status and challenges of speech recognition technology

    Tetsunori Kobayashi  [Invited]

    IEICE Symposium on Practical Application of Speech Technology

    Presentation date: 2004.03

  • Toward the realization of conversation robots

    Tetsunori Kobayashi  [Invited]

    IEICE Technical Committee on Human Communication Science (HCS)

    Presentation date: 2003.04

  • Trend of Stochastic Speech Recognition

    Tetsunori Kobayashi  [Invited]

    ISCIE Stochastic System Symposium 

    Presentation date: 1997.11


Research Projects

  • Temporal structural modeling of conversational interaction and efficient information transfer using speech media.

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2018.04
    -
    2022.03
     

    Kobayashi Tetsunori

     View Summary

    In order to efficiently convey a large amount of information via voice media, it is important to incorporate conversational elements into the information transmission and to guarantee the rhythm of the interaction.
    Here, we modeled the constraints on the temporal structure of conversational interaction that form the basis of rhythmic conversation and incorporated the model into our information delivery system. The system can monitor the user's response at any time while delivering a summarized document, and can restore and present information that was reduced during summarization in response to the user's reaction. These features achieved efficient document transmission through spoken conversation.
    In addition, as important elemental technologies of the system, low-latency speech recognition, expressive speech synthesis, and paralinguistic understanding were developed to enhance its performance.

  • User-friendly voice dialogue service that responds with natural timing and sensitivity

    NEDO  Venture support program for innovation implementation

    Project Year :

    2013.04
    -
    2014.03
     

  • Development of technology for utilizing information appliances, sensors, and human interface devices (Related to the development of speech recognition core technology)

    Ministry of Economy, Trade and Industry  Strategic Technology Development Consignment Program

    Project Year :

    2006
    -
    2009
     

  • Development of an automatic human-behavior-pattern analysis system

    NEDO  Grant for practical application development of university-launched ventures

    Project Year :

    2003.03
    -
    2004.03
     

  • Spoken Dialogue System utilizing Prosody Control

    Japan Society for the Promotion of Science (JSPS)  Grant-in-Aid for Scientific Research

    Project Year :

    2000
    -
    2003
     

  • A study on social networking services for older adults: clarification of barriers and providing solutions

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Grant-in-Aid for Challenging Exploratory Research

    Project Year :

    2013.04
    -
    2016.03
     

    Kobayashi Tetsunori, NAKANO Teppei

     View Summary

    This research presents a basic design and implementation of a framework that helps older adults communicate with others via Internet content sharing. The framework includes a prototype of the Channel-Oriented Interface, a very simple user interface for older adults, and an interaction design using that interface. A distinctive feature is a framework for "supporters", who help users configure their interface and usage; thanks to this, the interface is simplified and its operations are unified. We also designed and conducted a series of experiments using the prototype system: a questionnaire survey of 100 older adults, usability testing with 31 older adults, and a user-experience evaluation with 3 pairs of older adults and their children or grandchildren. The results supported most of our hypotheses, although issues remain for future work.

  • A study on communication robot performing rhythmic conversation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

    Project Year :

    2008
    -
    2010
     

    KOBAYASHI Tetsunori

     View Summary

    We refined generation and recognition methods for linguistic and paralinguistic information and realized a communication robot that can converse with a group of people. The robot was used to stimulate human-to-human conversation. For this purpose, we designed the robot's appearance to express a character suited to conversation and to support the expression of paralinguistic information. We designed behaviors appropriate to each conversational situation, and conversational procedures that make the conversation attractive. We also improved speech recognition/synthesis methods for conversation.

  • Construction of voice quality generation mechanism by a mechanical model

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2007
    -
    2010
     

    HONDA Masaaki, TAKANISHI Atsuo, KOBAYASHI Tetsunori, SAGISAKA Yoshinori, FUKUI Koutaro

     View Summary

    The research project aimed to quantitatively clarify the speech production process and its control strategy using a mechanical speech production model (Talking ROBOT) that mimics the human mechanism. Using laryngeal control of the model similar to human control, we reproduced speech sounds with various voice qualities, such as breathy voice and creaky voice, as well as laughter and laughing voice. We also examined the vocal cord vibration behavior and the aero-acoustic phenomena involved in generating these sounds by direct measurement.

  • Leading research for the practical use of speech recognition technology

    NEDO  Advanced Information and Communications Equipment and Devices Program

    Project Year :

    2005.06
    -
    2006.03
     

  • Studies on conversation systems with understanding and generating functions of linguistic and para-linguistic information

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (B)

    Project Year :

    2003
    -
    2006
     

    KOBAYASHI Tetsunori, FUJIE Shinya, OGAWA Tetsuji

     View Summary

    As a tool for investigating the fundamental elements of natural spoken-language communication, we developed a prototype spoken dialogue system with functions for understanding and generating linguistic and paralinguistic information.
    Although many excellent studies on speech recognition and synthesis have been conducted, no practical spoken dialogue system yet satisfies us. One reason is that most spoken dialogue systems have not dealt with paralinguistic information; quantitative understanding of paralinguistic information is not yet sufficient to build natural conversation systems. In this study, we realized many component technologies and a conversation-robot platform as tools for revealing the quantitative roles of paralanguage.
    In particular, the following outcomes were obtained: 1) sound localization and separation methods using the four-line directivity microphone mounted on the robot's head; 2) a high-quality speech synthesis method based on waveform synthesis and a high-quality voice conversion method for expressing paralinguistic information; 3) a method for attitude recognition and back-channel feedback generation based on prosodic information as paralinguistic information in speech; 4) methods for head gesture recognition and facial expression recognition as paralinguistic information in vision; 5) the humanoid robot "ROBISUKE", developed as the platform of the spoken dialogue system; and 6) the Message-Oriented RObot Architecture (MONEA), proposed for integrating the above modules.
    Future work includes experiments to quantitatively identify the requirements for natural conversation.

  • High performance speech and gesture recognition based on the stochastic model with mutual state-observation-dependencies

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Grant-in-Aid for Scientific Research (C)

    Project Year :

    2000
    -
    2002
     

    KOBAYASHI Tetsunori

     View Summary

    Aiming to treat more complicated temporal changes of stochastic phenomena, the Partly-Hidden Markov Model (PHMM) was proposed and applied to speech and gesture recognition. It can treat observation-dependent behaviors in both observations and state transitions. Simulation experiments showed the high potential of PHMM, and in gesture recognition and isolated spoken-word recognition experiments PHMM outperformed HMM.
    In the original PHMM formulation, a common pair of hidden state and observable state determined the stochastic behavior of both the observation and the state transition. In the formulation modified here, we use a common hidden state but separate observable states for the observation and for the state transition. This slight modification brought great flexibility to the modeling of phenomena and, on continuous speech, reduced word errors compared with both HMM and the original PHMM.
    We also proposed the Smoothed Partly-Hidden Markov Model (SPHMM), in which the observation and state-transition probabilities are defined as the geometric means of the PHMM-based and HMM-based ones. Continuous speech recognition experiments showed that SPHMM gave the best performance among HMM, PHMM, and SPHMM when the smoothing weight was set adequately.
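The geometric-mean smoothing that defines SPHMM can be written in one line. A minimal sketch (illustrative; `smoothed_prob` and its argument handling are assumptions, not the authors' code):

```python
def smoothed_prob(p_phmm, p_hmm, w):
    """SPHMM-style smoothing: the weighted geometric mean of a
    PHMM-based and an HMM-based probability estimate (0 <= w <= 1).
    w = 1 recovers the PHMM estimate, w = 0 the HMM estimate."""
    return (p_phmm ** w) * (p_hmm ** (1.0 - w))

print(smoothed_prob(0.8, 0.2, 1.0))  # → 0.8 (pure PHMM)
print(smoothed_prob(0.8, 0.2, 0.0))  # → 0.2 (pure HMM)
```

Intermediate weights interpolate between the two models in the log-probability domain, which is what makes a single scalar weight effective for smoothing.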

  • Language understanding, acquisition, and action generation based on real-world interaction through perception and action

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Exploratory Research

    Project Year :

    1998
    -
    1999
     

    KOBAYASHI Tetsunori

     View Summary

    In this research, we placed a robot with perceptual functions, as the thinking agent, in our everyday environment, creating a setting in which interaction with the real world is possible, and attempted to build a framework in which the robot thinks through perception and action. Focusing in particular on interpreting requests addressed to the robot and generating appropriate action plans in response, we examined methods for reaching appropriate problem interpretations through interaction with the real world.
    In the previous year we developed elemental technologies, including 1) construction of an autonomous robot, 2) definition of the robot's API, 3) perception algorithms for the external world, and 4) algorithms for extracting the effects of each action on the analyzed scene; building on these, we developed a language acquisition algorithm for expressions corresponding one-to-one to action primitives. However, problems remained in the basic performance of the elemental technologies.
    This year we aimed to improve these elemental technologies and to realize more advanced intelligent processing by integrating them. As improvements, we added eyebrows to the robot and used them for facial-expression synthesis; presenting the robot's internal state through facial expressions enabled richer interaction with users. Incorporating environmental adaptation into the scene-analysis (visual processing) algorithm improved robustness and stabilized system operation. For integrated processing, we examined an algorithm that determines which combination of primitives should realize a complex action that no single primitive can, by solving a combinatorial problem over pairs of action primitives and their effects. Through the above, we realized a basic framework for an intelligent system that performs language understanding/acquisition and action planning based on integrated perception, action, and thought.

  • Research on speech recognition in temporally varying noise environments

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Encouragement of Young Scientists (A)

    Project Year :

    1993
     
     
     

    KOBAYASHI Tetsunori

     View Summary

    This research examined basic techniques for accurately recognizing speech uttered under temporally varying noise.
    Humans can attend to and recognize a target voice even when it is heavily corrupted by noise or accompanied by background music. This ability rests on knowledge of the instantaneous spectral characteristics of speech and of their temporal patterns, together with comparable knowledge about noise, allowing the two to be separated and attention to be directed selectively to the speech. In this research we realized this function probabilistically, representing speech and noise with two independent stochastic models and searching for the most plausible combination of speech and noise under these models.
    Concretely, each information source was represented independently by a hidden Markov model (HMM), and the probability that the observed signal sequence arises from the combined sources was computed by combining spectral subtraction with optimal time alignment based on dynamic programming.
    As a result, recognition rates that were 74% and 8% at -10 dB and -20 dB, respectively, without noise countermeasures were improved to 100% and 40%.
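The spectral subtraction mentioned in this summary has a standard core step: subtracting an estimated noise power spectrum from the observed one, bin by bin, with a floor to keep power positive. A generic sketch (the flooring scheme and constant are assumptions, not this study's implementation):

```python
def spectral_subtraction(noisy_power, noise_power, floor=0.01):
    """Subtract an estimated noise power spectrum from the observed
    power spectrum, flooring each bin at a fraction of the observed
    power so that no bin goes negative."""
    return [max(s - n, floor * s) for s, n in zip(noisy_power, noise_power)]

# Bins where the noise estimate exceeds the observation are floored
# rather than clipped to zero.
print(spectral_subtraction([1.0, 0.5, 0.2], [0.25, 0.25, 0.25]))
```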

  • Generative and phenomenological modeling of fundamental-frequency fluctuations in speech

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Encouragement of Young Scientists (A)

    Project Year :

    1990
     
     
     

    KOBAYASHI Tetsunori

  • Research on applying neural networks to the phoneme decision stage of a knowledge-driven speech recognition system

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Encouragement of Young Scientists (A)

    Project Year :

    1989
     
     
     

    KOBAYASHI Tetsunori

  • Research on phoneme recognition using symbolic representations of phonetic features and fuzzy inference

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Encouragement of Young Scientists (A)

    Project Year :

    1987
     
     
     

    KOBAYASHI Tetsunori

  • Research on bottom-up coarticulation processing based on articulatory state estimation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research, Encouragement of Young Scientists (A)

    Project Year :

    1986
     
     
     

    KOBAYASHI Tetsunori


Misc

  • IPA Japanese dictation free software project

    Katsunobu Itou, Kiyohiro Shikano, Tatsuya Kawahara, Kazuya Takeda, Atsushi Yamada, Akinori Ito, Takehito Utsuro, Tetsunori Kobayashi, Nobuaki Minematsu, Mikio Yamamoto, Shigeki Sagayama, Akinobu Lee

    2nd International Conference on Language Resources and Evaluation, LREC 2000    2000.01

     View Summary

    Large vocabulary continuous speech recognition (LVCSR) is an important basis for the application development of speech recognition technology. We constructed a common Japanese LVCSR speech database and have been developing sharable Japanese LVCSR programs/models through volunteer-based efforts. We have been engaged in the following two volunteer-based activities: a) the IPSJ (Information Processing Society of Japan) LVCSR speech database working group, and b) the IPA (Information-technology Promotion Agency) Japanese dictation free software project. The IPA Japanese dictation free software project (April 1997 to March 2000) aims at building Japanese LVCSR free software/models based on the IPSJ LVCSR speech database (JNAS) and the Mainichi newspaper article text corpus. The software repository, as the product of the IPA project, is available to the public, and more than 500 CD-ROMs have been distributed. Performance evaluation was carried out for the simple, fast, and accurate versions in February 2000, using 200 sentence utterances from 46 speakers with gender-independent HMM models and 20k/60k language models. The accurate version, with 2000 HMM states and 16 Gaussian mixtures, shows a 95.9% word correct rate. The fast version, with phonetic tied-mixture HMMs and a 1/10-reduced language model, shows a 92.2% word correct rate at real-time speed. The CD-ROM with the IPA Japanese dictation free software and its development workbench will be distributed upon registration at http://www.lang.astem.or.jp/dictation-tk/ or by sending e-mail to dictation-tk-request@astem.or.jp.
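The word correct rates quoted in this summary are of the form (N − S − D)/N, where S and D are substitutions and deletions under a minimum-edit-distance word alignment. A small self-contained sketch (illustrative only; `word_correct_rate` is not the project's actual scoring tool):

```python
def word_correct_rate(ref, hyp):
    """Word correct rate (N - S - D) / N over a minimum-edit-distance
    alignment of reference and hypothesis word lists. Insertions are
    not penalized, unlike in word accuracy."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrack the alignment to count substitutions and deletions.
    i, j, subs, dels = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            j -= 1          # insertion in the hypothesis
    return (n - subs - dels) / n

# One substitution (b→x) and one insertion (e): 3 of 4 reference words correct.
print(word_correct_rate("a b c d".split(), "a x c d e".split()))  # → 0.75
```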

Industrial Property Rights

  • Learning device, speech recognition device, learning method, and learning program

    Japanese Patent No. 7473890

    Patent

  • Dialogue system and program

    Japanese Patent No. 7274210

    Patent

  • Information transfer system and program

    Japanese Patent No. 7244910

    Patent

  • Sound pickup device, sound pickup program, and sound pickup method

    Patent

  • Early-sign detection system and program

    Patent

  • Early-sign detection system and program

    Japanese Patent No. 7107498

    Patent

  • Information reproduction program, information reproduction method, information processing device, and data structure

    Patent

  • Prediction device, prediction method, and prediction program

    Japanese Patent No. 6928346

    Patent

  • Control-state monitoring system and program

    Patent

  • State monitoring system

    Japanese Patent No. 6717461

    Patent

  • Word prediction device and program

    Japanese Patent No. 6588874

    Patent

  • Language probability calculation method, language probability calculation device, and language probability calculation program

    Japanese Patent No. 6495814

    Patent

  • Conversation robot

    Japanese Patent No. 5751610

    Patent

  • Information processing system and information processing method

    Japanese Patent No. 5467298

    Patent

  • Information processing device and information processing method

    Japanese Patent No. 5466593

    Patent

  • Dialogue activation robot

    Japanese Patent No. 5294315

    Patent

  • Sound source separation device, sound source separation method, sound source separation program, and recording medium

    Japanese Patent No. 5190859

    Patent

  • Sound source separation device, method, and program

    Japanese Patent No. 5170465

    Patent

  • Acoustic model creation device, method, and program for speech recognition

    Japanese Patent No. 5152931

    Patent

  • Sound source separation device, program, and method

    Japanese Patent No. 5105336

    Patent

  • Spoken dialogue device, spoken dialogue method, and robot device

    Japanese Patent No. 5051882

    Patent

  • Sound source separation device, method, and program

    Japanese Patent No. 4986248

    Patent

  • Sound source separation system, sound source separation method, and acoustic signal acquisition device

    Japanese Patent No. 4873913

    Patent

  • Customer information collection and management system

    Japanese Patent No. 4778532

    Patent

  • Sound source separation method and system

    Japanese Patent No. 4594629

    Patent

  • Person attribute identification method and system

    Japanese Patent No. 4511850

    Patent

  • Sound source separation method and system, and speech recognition method and system

    Japanese Patent No. 4457221

    Patent

  • Customer information collection and management method and system

    Japanese Patent No. 4125634

    Patent

  • Speech input mode conversion system

    Japanese Patent No. 3906327

    Patent


Other

  • Special lecture: How ChatGPT works, understood with high-school mathematics (Waseda Sci. & Eng. Open Campus)

    2023.08
    -
     
  • Special class: Foundations of artificial intelligence, understood with high-school mathematics (Waseda University Senior High School)

    2021.06
    -
     
  • Special lecture: Foundations of artificial intelligence, understood with high-school mathematics (Waseda Sci. & Eng. Open Campus)

    2018.08
    -
     
  • Special class: Artificial intelligence and conversation robots (Kochi Technical High School)

    2018.05
    -
     
  • Special class: The world opened up by media information processing (Waseda University Senior High School)

    2005
    -
     
  • Special class: Communication with robots (Yamate Gakuin High School)

    2005
    -
     
  • Special class: History of conversation robots (Tokyo Metropolitan Tachikawa High School)

    2005
    -
     
  • Special lecture: How ChatGPT works, understood with high-school mathematics (Waseda Sci. & Eng. Open Campus)

    2024.08
    -
     
  • Special class: Foundations of artificial intelligence, understood with high-school mathematics (Waseda University Senior High School)

    2019.10
    -
     
  • Special lecture: Conversation and robots: my life's work (Tokyo Woman's Christian University)

    2017.06
    -
     
  • Seminar : Enjoyable Conversation System (MIT, Spoken Lang. Sys. Group)

    2015.11
    -
     
  • Seminar: Research on conversation protocols using robots (Fujie Lab, Chiba Institute of Technology)

    2015.05
    -
     
  • Special lecture: Conversation systems: who can create the opportunity to provide information? (NAIST)

    2014.11
    -
     
  • Seminar : A robot-based approach towards finding conversation protocols (MERL)

    2013.06
    -
     
  • Seminar: Development of conversation robots (Takeda Lab, Nagoya University)

    2011.10
    -
     
  • Lecture : Conversation Robot (E-Just, Egypt)

    2010.05
    -
     
  • Special lecture: A multimodal conversation robot with paralinguistic understanding and generation functions (Toyohashi University of Technology)

    2009.11
    -
     
  • Seminar : Recent Research Topics in Perceptual Computing Group at Waseda University (MIT, Spoken Lang. Sys. Group)

    2009.10
    -
     
  • Seminar: Development and future directions of spoken dialogue robots (Hataoka Lab, Tohoku Institute of Technology)

    2009.02
    -
     
  • Seminar: Toward robots that converse naturally with humans (Mukawa Lab, Tokyo Denki University)

    2007.12
    -
     
  • Special class: Communication and doll-type robots: an invitation to science and engineering (Kodaira Third Junior High School)

    2007.09
    -
     
  • Seminar: Use of paralinguistic information in conversation with robots - ROBISUKE: a multimodal conversation robot - (Makino Lab, Tohoku University)

    2004.11
    -
     
  • Seminar: ROBISUKE: a next-generation conversation robot (Okuno Lab, Kyoto University)

    2003.11
    -
     
  • Seminar: Dialogue between robots and humans (Shirai Lab, Osaka University)

    2003.10
    -
     
  • Special lecture: Dialogue with humanoid robots through multimodal interfaces (JAIST)

    1999.10
    -
     
  • Seminar : Multi-person Communication via Multi-modal Interface - Human Interface of the Humanoid Robot - (MIT, Spoken Lang. Sys. Group)

    1999.05
    -
     
  • ◇ On the distinction among "special lecture", "special class", and "seminar" in this section

     View Summary

    Special class: a one-off class given at a high school or junior high school. Special lecture: a one-off lecture given at a university. Seminar: a visit to a university or company laboratory to present and discuss research.


 


Teaching Experience

  • Optimization Algorithms (Waseda University)

    2005.04
    -
    Now
     

  • Information Theory (Waseda University)

    1995.09
    -
    Now
     

  • Pattern Recognition (Hosei University, Waseda University)

    1991.04
    -
    Now
     

  • Signal Processing (Hosei University, Waseda University)

    1991.04
    -
    2008.03
     

  • Programming (Hosei University, Tokyo University of Agriculture and Technology, Waseda University)

    1985.04
    -
    2004.03
     

  • Electronic Circuits (Waseda University)

    1992.04
    -
    1994.07
     

  • Artificial Intelligence (Hosei University, Waseda University)

    1985.04
    -
    1992.03
     

  • Computer Engineering (Hosei University)

    1986.04
    -
    1991.03
     

  • Electromagnetics (Hosei University)

    1985.04
    -
    1991.03
     


 

Social Activities

  • Conversational Robot ROBISUKE, Exhibition & Demonstration

    Expo 2005 Aichi

    2005.08
     
     

  • Conversational Robot ROBISUKE, Exhibition & Demonstration

    Lille 2004 -European Capital of Culture : Robots ! 

    2003.12
    -
    2004.03

  • Conversational Robot ROBISUKE, Exhibition & Demonstration

    ROBODEX2003 

    2003.03
     
     

  • Conversational Robot ROBITA, Exhibition & Demonstration

    NTT InterCommunication Center, "Coexisting/Evolving Robots" Exhibition

    1999.02
     
     

  • WABOT-2, Exhibition & Demonstration

    Tsukuba Science Expo '85

    1985.03
    -
    1985.09

  • SCHEMA: multi-party interaction-oriented humanoid robot

    ACM,  SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation 

    2009.12
     
     

  • Conversational Robot ROBISUKE, Exhibition & Demonstration

    Ogaki City, Monozukuri Festival

    2009.03
     
     

  • Conversational Robot ROBISUKE, Demonstration

    Care Town Kodaira, visiting demonstration

    2008.06
    -
     

  • Conversational Robot ROBISUKE, Exhibition & Demonstration

    21st Century Dream Week: Hida-Takayama Robot World

    2005.09
     
     

  • Conversational Robot ROBISUKE, Exhibition & Demonstration

    Japan Festival in Korea 

    2002.10
     
     

  • Conversational Robot ROBITA, Exhibition & Demonstration

    The Robot Expo

    2001.04
    -
    2001.05


Academic Activities

  • Organizing committee member, Local arrangement committee chair, Interspeech 2010

    Competition, symposium, etc.

    ISCA  

    2010.09
     
     
  • Organizing committee chair, International Symposium on Spoken Dialogue

    Competition, symposium, etc.

    Waseda University  

    1993.11
     
     
  • Exhibition committee member, International Conference on Spoken Language Processing 1990

    Competition, symposium, etc.

    1990.11
     
     

Sub-affiliation

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

  • Affiliated organization   Global Education Center

Research Institute

  • 2023
    -
    2024

    Center for Data Science   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

  • 2020
    -
    2024

    Perceptual Computing Laboratory   Director of Research Institute

Internal Special Research Projects

  • Research on spoken conversation systems

    2021  

     View Summary

    By separating the protocol-handling part of a conversation system from the content-handling part, coupling the two loosely, and applying end-to-end learning to the former, we are studying a conversation-protocol control model that can be trained from relatively little data. This year we focused on precise modeling of the system's utterance timing. Conventional conversation systems act once the end of the user's utterance is detected, but reliable end-of-utterance detection takes time and prevents rhythmic conversation. We therefore examined deciding, at every update of the speech analysis frame and without relying on end-of-utterance detection, whether the system should speak, based on prosodic patterns and utterance content. The model is a deep neural network (DNN) based on LSTMs, and the inputs examined were spectral envelope features, prosodic features, linguistic features (subword sequences obtained from speech recognition), and estimated dialogue acts. This configuration allowed utterance timing to be controlled precisely, contributing to smooth conversation flow, and showed that using dialogue acts has a particularly large effect.
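The frame-synchronous decision loop described in this summary — deciding at every analysis-frame update whether the system should speak — can be sketched structurally. This toy example stands in for the LSTM-based model (the recurrence, the `score` function, and the threshold are placeholders, not the project's implementation):

```python
def respond_frames(frames, score, threshold=0.5):
    """Return indices of analysis frames at which the system would
    decide to take the turn, evaluated at every frame update rather
    than after end-of-utterance detection."""
    decisions = []
    state = 0.0                           # recurrent state carried frame to frame
    for t, feat in enumerate(frames):
        state = 0.9 * state + 0.1 * feat  # toy recurrence in place of the LSTM
        if score(state) >= threshold:     # per-frame "speak now?" decision
            decisions.append(t)
    return decisions

# Toy run: feature values rise as the user's utterance winds down.
print(respond_frames([0.0, 0.2, 0.8, 1.0, 1.0], lambda s: s, threshold=0.1))  # → [3, 4]
```

The point of the structure is that the decision is available at every frame, so the system can respond with low latency instead of waiting for a stable end-of-utterance event.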

  • Research on protocols and architectures for conversation systems

    2020  

     View Summary

     Of the four-layer protocol we propose for spoken conversation, we refined the functions of the turn-taking layer. To achieve well-paced conversation, this layer decides, according to context, whether the system should take the turn and, if so, after how long a pause. Last year we proposed TGN (Timing Generating Networks), a neural network that can control the output timing of events; this year we refined it by incorporating multi-task learning with estimation of the obligation to speak, together with a mechanism for exploiting linguistic information. This extension improved by 7.5% the accuracy with which utterance timing can be estimated to within 0.5 seconds.

  • Research on protocols and architectures for conversation systems

    2019  

     View Summary

     Of the four-layer protocol we propose for spoken conversation, we studied how to realize the functions of the turn-taking layer. To achieve well-paced conversation, this layer decides, according to context, whether the system should take the turn and, if so, after how long a pause. To solve this problem we proposed ETCNN (Event-Timing Controllable Neural Network), a neural network that can control the output timing of events. ETCNN is an end-to-end framework in which output timing is controlled according to the prosody of the user's utterances and other user information. Applying this method reduced the estimation error of utterance timing by about 20% on average compared with conventional methods, and greatly reduced estimation outliers.

  • Research on protocols and architectures for conversation systems

    2018  

     View Summary

     Of the four-layer spoken-conversation protocol we propose, we studied how to realize the participation-structure formation layer and the message transmission layer. For the participation-structure formation layer, we examined a method that determines the system's participation-structure-forming actions end-to-end from sensor information. By introducing multi-task neural networks with utterance and gaze recognition as auxiliary tasks, accuracy improved by more than 30 points over the conventional rule-based method. For the message transmission layer, we proposed a method that estimates the importance of each sentence in a paragraph from BERT-based analysis and statically controls the pauses between sentences accordingly. In a pairwise preference evaluation, the system with this method achieved an extremely high preference rate of 77% over the system without it.

  • Research on protocols and architectures for conversation systems

    2017  

     View Summary

    We organized the conversation protocol, by analogy with communication systems, into a) a physical layer, b) a participation-structure formation layer, c) a message transmission layer, and d) a turn-taking layer. Layer a) corresponds to the physical layer of a communication system: having a human-like body as a means of expression enables data exchange in the same manner as between humans. Layer b) corresponds to the data-link/network layers: bodily expression conveys each party's state of participation in the conversation and the procedures for changing it. Layer c) corresponds to the transport layer: back-channels and the like signal the success or failure of data exchange. Layer d) corresponds to the session layer: it defines the start and end of a session. We described the behaviors required for smooth conversation at two levels, a function/role level and a concrete bodily-motion level, hiding the hardware-dependent parts in the lower level.
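
    The layered organization above can be illustrated as a small behavior table; the layer names follow the summary, but the concrete behaviors and motions are invented examples, and real hardware-dependent motion code would sit below this level.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Layer(Enum):
    PHYSICAL = auto()       # human-like body: same channel humans use
    PARTICIPATION = auto()  # who is in the conversation, and how that changes
    MESSAGE = auto()        # back-channels acknowledging (un)successful transfer
    TURN_TAKING = auto()    # session start/end, who holds the floor

@dataclass
class Behavior:
    layer: Layer
    role: str     # function/role-level description
    motion: str   # concrete bodily motion (hardware-dependent, hidden below)

# Illustrative registry of behaviors, keyed by protocol layer.
PROTOCOL = [
    Behavior(Layer.PARTICIPATION, "signal joining", "turn body toward group"),
    Behavior(Layer.MESSAGE, "acknowledge receipt", "nod"),
    Behavior(Layer.TURN_TAKING, "yield the turn", "gaze at addressee"),
]

def behaviors_for(layer):
    """Look up the function/role behaviors registered for one layer."""
    return [b.role for b in PROTOCOL if b.layer is layer]

print(behaviors_for(Layer.MESSAGE))
```

    Separating the role level from the motion level is what lets the same protocol run on different robot hardware.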

  • Spoken conversation: research on information access through diverse information behaviors, including information encountering

    2017   小川哲司, 林良彦

     View Summary

     Spoken conversation systems have traditionally been built around question answering. Comfortable information access, however, also requires the system to offer information proactively, and these modes must alternate with good rhythm. Aiming at a spoken conversation system that can respond promptly to such complex information behaviors, we proposed a "scenario-driven conversation system". Based on analysis of the document to be conveyed, a main scenario plan that conveys the gist of the document and sub-plans that answer anticipated questions are prepared in advance, and the conversation proceeds along them. Experiments showed that, compared with a conventional conversation system, the resulting system conveys only the information the user needs, and does so efficiently.
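
    The scenario structure (a main plan plus sub-plans for anticipated questions) can be sketched as below; the content and control flow are toy examples, not the actual system.

```python
# Toy scenario-driven dialogue: a main plan conveys the gist in order,
# and a sub-plan answers anticipated questions before resuming.
main_plan = [
    "A new transit line opened today.",
    "It cuts the commute between the two termini to 30 minutes.",
]
sub_plan = {  # anticipated question -> prepared answer
    "when": "Service started this morning.",
    "cost": "The base fare is unchanged.",
}

def run_dialogue(user_inputs):
    """Interleave main-plan utterances with sub-plan answers."""
    transcript = []
    inputs = iter(user_inputs)
    for utterance in main_plan:
        transcript.append(("system", utterance))
        reaction = next(inputs, None)
        if reaction in sub_plan:            # branch into the sub-plan...
            transcript.append(("system", sub_plan[reaction]))
        # ...then fall through and resume the main plan
    return transcript

log = run_dialogue(["when", "ok"])
for speaker, text in log:
    print(f"{speaker}: {text}")
```

    Because both plans are prepared before the conversation starts, each branch is a table lookup, which is what gives the system its responsiveness.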

  • Conversation: realizing information reception through diverse information behaviors of differing intentionality

    2016   林良彦, 藤江真也

     View Summary

     We realized a conversation system that conveys news content through well-paced information exchange combining both active (search-like) and passive (encounter-like) information behaviors. To ensure responsiveness, we prepared conversation scenarios containing branches for anticipated user responses and proceeded by switching among them. A scenario consists of a main plan that conveys the core of the news and sub-plans that present supplementary information according to the user's reactions. The former was built by selecting important words in light of topicality and summarizing the news to include them; the latter by preparing answers covering all question types for the important content words in each breath group. With these, we realized a conversation system that achieves the intended goals.

  • Research on deep-learning-based correction of noise-suppression distortion and its application to speech recognition in noise

    2016   小川哲司

     View Summary

     In this research we studied a way to realize a high-quality diffuse-noise suppression filter by combining area sound capture, the fast and accurate source-separation method the applicant has developed, with low-distortion noise suppression based on deep learning. In the proposed method, after the target sound and noise are separated by area capture, a filter is built to suppress the noise components remaining in the target. To this end, a deep neural network taking as input the power spectra of the target-dominant signal and the diffuse-noise-dominant signal obtained by area capture estimates the SNR in each frequency band. Compared with a conventional multichannel Wiener filter, the proposed method achieved high noise-suppression performance while keeping processing distortion low.
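
    The per-band gain computation can be illustrated as follows. Note that this sketch substitutes a plain power ratio for the DNN-based SNR estimator of the summary, and the band powers are invented numbers.

```python
import numpy as np

def wiener_gain_from_snr(snr):
    """Per-band Wiener gain G = SNR / (SNR + 1), from a linear SNR estimate."""
    return snr / (snr + 1.0)

def suppress(target_dominant_power, noise_dominant_power):
    """Estimate per-band SNR from the two area-capture outputs and apply
    a Wiener-like gain to the target-dominant power spectrum.

    A real system would estimate the SNR with a DNN; here the plain
    power ratio stands in for that estimator.
    """
    snr = target_dominant_power / np.maximum(noise_dominant_power, 1e-12)
    return wiener_gain_from_snr(snr) * target_dominant_power

target = np.array([4.0, 1.0, 0.25])  # band powers where the target dominates
noise = np.array([1.0, 1.0, 1.0])    # band powers where diffuse noise dominates
out = suppress(target, noise)
print(out)  # high-SNR bands pass almost unchanged; low-SNR bands are attenuated
```

    The learned SNR estimate is what lets the filter attenuate residual noise without the musical-noise distortion of a purely statistical gain.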

  • A spoken conversation system that induces a state of resonance among participants

    2014   林良彦, 小川哲司, 松山洋一, 藤江真也, 中野鐵兵

     View Summary

     Using new conversation-control and information-provision techniques, we realized a spoken conversation system that leads a conversation into a state of resonance (a state in which participants respond to one another in a mutually reinforcing way). For conversation control, we proposed an adjustment function that gives all participants equal opportunities to speak. In conversation, particular people often speak repeatedly while others cannot join in; we solved this by implementing a function that interrupts the conversation, takes the initiative, and then directs the topic to people who have had few chances to speak. For information provision, we implemented a function that utters the subjective statements of review-article contributors as if they were the system's own opinions, proposing a method for selecting utterances that entertain the conversation partner and a framework for keeping the subjectivity of the selected sentences consistent.

  • Research on a conversational robot as an element that energizes the atmosphere in group conversation settings

    2012   松山洋一, 岩田和彦

     View Summary

     Targeting "group conversation" (multi-party conversation), characterized by the dynamic exchange of utterances among participants seen in small meetings and casual chats, we studied a framework for a machine system that participates in such conversation and energizes it. Specifically, we took up four subtopics: 1) a protocol for encouraging people who cannot join the conversation to participate; 2) automatic generation of engaging utterances; 3) advanced face-image processing for conversation systems; and 4) advanced speech synthesis for conversation systems. The results are as follows. Topic 1) examined what mechanisms are needed to find a participant who has few chances either to speak or to be spoken to, address that person, and bring them into the conversation. The system needs functions for finding the target person, deciding on appropriate content to say, and choosing a timing that does not disturb overall harmony. We realized the desired behavior by implementing: (i) estimating each participant's role from speech activity, gaze, and so on, and selecting as the target the person who is least often speaker or primary addressee; (ii) determining what to say by appropriately tracking the topic with a CRF; and (iii) first joining the conversation under the current topic, becoming a "harmonious participant", and only then addressing the target. Topic 2) examined how to automatically prepare answers to participants' questions. To make utterances engaging, answers include the system's own impressions and evaluations. We devised a scheme in which the system crawls review sites, extracts passages that evaluate related topics, rewrites them in a colloquial style, ranks them by suitability, and uses the top sentences as answers, scoring highly those sentences that use many low-frequency adjectives. This yielded a mechanism for choosing informative, surprising sentences and succeeded in generating effective answers. Topic 3) examined technology for stable face detection needed by conversation systems; by improving AAM, the detection accuracy of faces and facial parts was dramatically improved. Topic 4) examined conversational-style speech synthesis; we realized a synthesizer that speaks with voice quality and intonation appropriate to the context by carefully clustering contexts in a psychological space and clustering utterance-final expressions.
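
    The adjective-rarity ranking used for answer selection in topic 2) can be sketched as follows; the corpus counts and candidate sentences are invented for illustration.

```python
from collections import Counter

# Toy corpus statistics: adjective -> document frequency (illustrative numbers).
adjective_freq = Counter({"good": 900, "nice": 800, "piquant": 3, "velvety": 5})

def rarity_score(sentence_adjectives):
    """Score a candidate answer by how rare its adjectives are
    (rarer adjectives -> more informative, more surprising)."""
    return sum(1.0 / adjective_freq[a] for a in sentence_adjectives
               if a in adjective_freq)

candidates = {
    "The soup was good.": ["good"],
    "The soup was piquant and velvety.": ["piquant", "velvety"],
}
best = max(candidates, key=lambda s: rarity_score(candidates[s]))
print(best)
```

    Inverse frequency is one simple way to operationalize "low-frequency adjectives score highly"; the original ranking may have used a different weighting.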

  • Comprehensive research on spoken conversation systems

    2011   藤江 真也, 小川 哲司, 松山 洋一, 岩田 和彦

     View Summary

    Toward conversational communication using robots, we worked on the following themes. (1) Elucidating the spoken-conversation protocol: we modeled the protocol based on observation of conversations, in particular organizing how, in multi-party conversation, the choice of addressee and the control of turns are carried out with accompanying bodily expression. (2) Realizing attractive conversation: we organized how a robot should speak for the conversation to be attractive. In particular, taking care to make it easy for the partner to speak, we prepared a mechanism that does not merely answer what was asked but includes related new topics while answering; this made it easier for users to continue the conversation. (3) Development of component technologies. 3-1) Visual processing: a participant's posture is important for identifying, for example, their intention to participate in the conversation; it is also well known that gaze indicates the direct addressee, and that facial expression conveys the success or failure of information transfer and the partner's interest. We developed a system for automatic recognition of posture and facial expression, addressing the image feature-point extraction required for posture and expression estimation by collecting and integrating information not only from a camera mounted on the robot but also from cameras installed on the ceiling of the room. 3-2) Auditory processing: we solved various problems that arise in hands-free multi-party spoken conversation, handling mainly the removal of directional noise arriving from behind the target speaker, as well as reverberation, with our proposed six-microphone strip beamformer. Moreover, in conversation people may speak many sentences in one breath or break a single sentence into fragments; this mismatch between utterance units and semantic units makes conversational speech recognition difficult. We interpreted differences in speaking style (the placement of pauses) as the result of including protocol-related information in the utterance, and studied actively exploiting the characteristic prosodic phenomena this causes during decoding. (4) Integrated system: integrating the above, we realized a system with which multiple people can enjoy conversation while playing a game. A conversation experiment with elderly users at a day-care facility was well received.

  • Research on quantitative modeling of paralinguistic understanding and generation mechanisms using a conversational robot

    2007   藤江 真也

     View Summary

     Using the spoken conversation robot with paralinguistic understanding and expression functions developed so far, we attempted to quantitatively clarify the role of the paralinguistic information required to establish natural spoken communication. In conversational communication, alongside conveying linguistic information by speech, humans convey their state of conversational participation (whether they are ready to receive information, whether they received it correctly, how they evaluate it, and so on) through facial expression, and smooth information exchange rests on this foundation. Such information is part of what is called paralinguistic information. While studies have qualitatively pointed out its importance, no quantitative study had examined how strictly it must be modeled for natural communication to hold. In this research we selected gaze expression as the robot's expressive behavior related to smoothing turn-taking, and attempted to model it quantitatively. In general, to yield the turn one directs one's gaze at the listener at the end of the utterance, and to hold the turn one averts one's gaze. We built a model parameterized by the variation of expression (how gaze is averted and met), its frequency, and its temporal structure; conducted conversation experiments between subjects and a robot generating paralinguistic information under various parameter settings; and evaluated naturalness. This yielded relational expressions among the parameters that realize natural gaze expression, and findings on combinations of parameter sets for continuous operation; we confirmed that conversation proceeds naturally when gaze is driven accordingly.

  • A robot that converses with multiple speakers, with situation awareness and bodily expression: toward an information terminal that shares space with humans

    2000   菊池 英明, 高西 淳夫

     View Summary

     Toward a robot that converses with multiple speakers, we studied: 1) separation and recognition of multiple speakers' voices; 2) recognition of communication channels among multiple speakers; 3) expression of intent with the body; and 4) information-integration technology. For 1), in addition to the acoustic model and language model normally used in speech recognition, we added an inter-utterance language model concerning utterance turn-taking and a speaker model statistically representing speaker changes, and established an algorithm that estimates the most likely speaker alternation and utterance content using these four probabilistic models. For 2), combining sound-source localization with the speaker's face direction made it possible to recognize who is speaking to whom. For localization, we proposed a method using the correlation of MUSIC spectra, dramatically improving localization accuracy; for face direction, we proposed an ICA-based feature-extraction method, realizing highly accurate recognition. For 3), we expanded the robot hardware's expressive capability by adding eyebrows, a mouth, and the like to the existing eyes and hands, and established motions and presentation strategies for effectively expressing internal state with this simplified body. For 4), we devised an information-transfer mechanism adding subscribe/publish functions to a blackboard system and implemented it transparently across the diverse processors composing the robot. Using these results, we realized a robot that grasps external situations visually and aurally, and can converse with multiple partners, at times expressing intent through nonverbal means such as gesture.
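
    The four-model combination described for 1) can be illustrated as a joint argmax over (speaker, utterance) hypotheses. The log-probabilities below are invented, and a real system would decode over sequences rather than a single pair.

```python
# Toy log-probabilities from the four models (illustrative values).
acoustic = {("A", "hello"): -2.0, ("B", "hello"): -5.0,
            ("A", "bye"): -6.0, ("B", "bye"): -3.0}
language = {"hello": -1.0, "bye": -1.5}
inter_utterance = {"hello": -0.5, "bye": -2.0}  # turn-taking language model
speaker_change = {"A": -1.0, "B": -0.2}         # P(next speaker | prev speaker)

def best_hypothesis():
    """Pick the (speaker, utterance) pair maximizing the summed log score
    of the four models, mirroring the joint estimation described above."""
    def score(spk, utt):
        return (acoustic[(spk, utt)] + language[utt]
                + inter_utterance[utt] + speaker_change[spk])
    return max(((s, u) for (s, u) in acoustic), key=lambda h: score(*h))

print(best_hypothesis())
```

    Summing log-probabilities corresponds to multiplying the four models' probabilities, so the decoder jointly resolves who spoke and what was said.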

  • Research on precise modeling of stochastic processes and its application to speech and gesture recognition

    1998   橋本 周司, 笠原 博徳

     View Summary

    In this research we refined the stochastic-process model used for time-series pattern matching and applied it to improve the performance of speech recognition and gesture recognition. In time-series pattern recognition, typified by speech and gesture recognition, the stochastic-process model plays an important role; the hidden Markov model (HMM) has conventionally been used as this model. HMMs, however, can handle only piecewise-stationary stochastic processes, which caused various problems. To solve this, starting from a double Markov model, we proposed a new stochastic model, the partial hidden Markov model (PHMM), in which the older state is treated as an unobservable hidden state and the newer state as an observable one. Whereas in an HMM both the output and the next state depend only on the previous state, in a PHMM both depend on the state and the previous output. Owing to this structure, we obtained a stochastic-process model with greater expressive power for transitional segments than the HMM while keeping model complexity low. For PHMM parameter estimation, we formulated the problem with the EM algorithm and established an exact estimation method. Comparing the characteristics of PHMMs and HMMs in simulation, we found that in an HMM the output probabilities mainly determine the timing of state transitions and the transition probabilities are almost meaningless, whereas in a PHMM the transition probabilities determine the transition timing; the PHMM was also more effective at distinguishing differences in transitional dynamics. Gesture-recognition and speech-recognition experiments with the PHMM gave higher performance than the HMM in both tasks, confirming the PHMM's effectiveness for time-series pattern recognition.
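
    The PHMM's key dependency, conditioning the output on the previous output as well as the state, can be illustrated with a small forward-algorithm sketch. This is a simplification in which only the emission (not the transition) is conditioned on the previous output, and the probability tables are invented.

```python
import numpy as np

# Toy PHMM with 2 hidden states and binary outputs.  In an HMM the output
# depends only on the current state; here it also depends on the previous
# output, which is what captures transitional dynamics.
trans = np.array([[0.9, 0.1],   # P(next state | state)
                  [0.2, 0.8]])
emit = np.array([                # P(output | state, previous output)
    [[0.7, 0.3], [0.4, 0.6]],    # state 0: rows indexed by previous output
    [[0.1, 0.9], [0.5, 0.5]],    # state 1
])
init = np.array([0.5, 0.5])
init_emit = np.array([[0.6, 0.4],   # first output has no previous output
                      [0.2, 0.8]])

def phmm_likelihood(outputs):
    """Forward algorithm with emissions conditioned on the previous output."""
    alpha = init * init_emit[:, outputs[0]]
    for prev, cur in zip(outputs, outputs[1:]):
        alpha = (alpha @ trans) * emit[:, prev, cur]
    return alpha.sum()

p = phmm_likelihood([0, 1, 1])
print(round(p, 4))  # -> 0.1028
```

    Compared with a plain HMM forward pass, the only change is the extra index `prev` into the emission table; the recursion is otherwise identical.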
