Details of a Researcher - KOBAYASHI, Tetsunori

KOBAYASHI, Tetsunori

Scopus Paper Info

Paper Count: 238 Citation Count: 2275 h-index: 24

The data was downloaded from the Scopus API on August 11, 2025, via http://api.elsevier.com and http://www.scopus.com .

Google Scholar Information (Citations per year)

Citation Count: 5736 h-index: 35 i10-index: 115

Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job title

Professor

Degree

Dr. Eng.

Homepage URL

http://www.pcl.cs.waseda.ac.jp/index.html

Profile

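Interested in research on computer-human interaction using speech and image processing, intelligent robots, speech production and perception, and development paradigms for interfaces.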

Research Experience

1997.04

-

Now

Waseda University Professor
2018.11

-

2020.09

Waseda University Center for Research Strategy Director
2004.04

-

2009.03

NHK Science & Technology Research Laboratories Visiting Researcher
2000.04

-

2002.03

ATR Spoken Language Translation Research Laboratories Visiting Researcher
1994.07

-

1995.08

MIT Laboratory for Computer Science Visiting Researcher
2020.04

-

2020.09

Waseda University Research Council Chair
2014.11

-

2016.09

Waseda University Institute for Advanced Study Deputy Director
2010.09

-

2016.09

Waseda University Research Institute for Science and Engineering Deputy Director
2007.04

-

2014.03

Waseda University Dept. Computer Science and Engineering (due to change of department name)
2004.04

-

2007.03

Waseda University Dept. Computer Science Professor
2003.04

-

2004.03

Waseda University Dept. Computer Science (due to department reorganization)
1997.04

-

2003.03

Waseda University Dept. Electrical, Electronics and Computer Engineering Professor
1996.04

-

1997.03

Waseda University Dept. Electrical, Electronics and Computer Engineering (due to change of department name) Associate Professor
1991.04

-

1996.03

Waseda University Department of Electrical Engineering, School of Science and Engineering
1987.04

-

1991.03

Hosei University
1990.07

-

1990.09

Auditory and Visual Perception Research Laboratories Invited Researcher
1985.04

-

1987.03

Hosei University Dept. Electrical Engineering Lecturer

Education Background

1985.03

-

　

Waseda University Graduate School of Science and Engineering
1982.03

-

　

Waseda University Graduate School of Science and Engineering
1980.03

-

　

Waseda University School of Science and Engineering Department of Electrical Engineering

Committee Memberships

2004.05

-

2006.05

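Institute of Electronics, Information and Communication Engineers (IEICE) Journal Editorial Committee, Special Editorial Secretary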
2001.04

-

2004.03

Information Processing Society of Japan, Special Interest Group on Spoken Language Processing, Chair
1998.04

-

2002.03

The Association for Natural Language Processing, Director

Professional Memberships

1994

-

Now

THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING
1987

-

Now

THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE
1983

-

Now

INFORMATION PROCESSING SOCIETY OF JAPAN
1980

-

Now

ACOUSTICAL SOCIETY OF JAPAN
1980

-

Now

THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS
2003

-

Now

Language Resources Association
1996.10

-

Now

ACM
1989.01

-

Now

THE ROBOTICS SOCIETY OF JAPAN
1984

-

Now

IEEE

Research Areas

Intelligent robotics / Perceptual information processing

Research Interests

Pattern Recognition
Image Processing
Spoken Language Processing
Conversational Robot

Awards

Fellow

2023.03 Institute of Electronics, Information and Communication Engineers (IEICE) Contributions to research on multi-modal multi-party conversations using robots
Fellow

2016.06 Information Processing Society of Japan Contributions to pioneering research on robot conversation and contributions to the revitalization of the research community
American Publishers Awards for Professional and Scholarly Excellence

2008 Springer handbook of robotics
Best Paper Award

2001 Institute of Electronics, Information and Communication Engineers (IEICE) Conversation Robot Participating in Group Conversation

Winner： Yosuke Matsusaka, Tsuyoshi Tojo, Tetsunori Kobayashi
Award for Academic Startups

2024.08 Japan Science and Technology Agency
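SIG Research Excellence Award

2019 Japanese Society for Artificial Intelligence (JSAI) Proposal of a Web novel recommender system capable of recommending hidden gems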
Best Poster Award

2016.12 ACM SIGGRAPH VRCAI2016 Video Semantic Indexing using Object Detector

Winner： Kazuya Ueki, Tetsunori Kobayashi
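SIG Research Excellence Award

2015 Japanese Society for Artificial Intelligence (JSAI) Passivity and activeness in information access: news article access via spoken dialogue
SIG Research Excellence Award

2012 Japanese Society for Artificial Intelligence (JSAI) Spontaneous action timing detection and utterance behavior strategies for activating multi-party conversation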
HAI-2012

2012 Outstanding Research Award
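SIG Research Excellence Award

2011 Japanese Society for Artificial Intelligence (JSAI) Utterance timing control based on degrees of utterance expectation and motivation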
Best Paper

2008 IEEE BTAS2008 (International Conference on Biometrics: Theory, Applications and Systems) Class distance weighted locality preserving projection for automatic age estimation

Winner： Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi.
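SIG Research Excellence Award

2008 Japanese Society for Artificial Intelligence (JSAI) Development of a robot that supports the activation of human-human communication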

Media Coverage

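Gaze and gestures that can "read the situation"

Newspaper, magazine

Yomiuri Shimbun

2012.04
OKI and Waseda University jointly develop a sound source separation module with low processing delay

Newspaper, magazine

Author： Other

Nikkei BP, Nikkei Electronics,

2009.05
Conversational robot

TV or radio program

NHK, Science ZERO

2006.01
Conversational robot

TV or radio program

Discovery Channel Canada

2005.03
Age and gender estimation system

TV or radio program

TV Tokyo, World Business Satellite

2004.03
Conversational robot

TV or radio program

NHK, Close-up Gendai

2000.01
Conversational robot ROBITA

TV or radio program

NHK, Science Eye

1999.12
Conversational robot ROBITA

TV or radio program

TBS, "Hito no Tabi, Hito e no Tabi" with Tetsuya Chikushi and Takashi Tachibana

1999.05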
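Picking up only the voice even in noise: OKI's smartphone technology removes noise and recognizes speech

Newspaper, magazine

Nihon Keizai Shimbun

2012.11
Sound source separation technology

TV or radio program

TV Tokyo, World Business Satellite, Trend Tamago

2008.11
Sound source separation technology

Newspaper, magazine

Nikkei Sangyo Shimbun

2008.11
Conversational robot

TV or radio program

VARA (Dutch broadcaster)

2006.01
NEC Soft and Waseda University develop automatic analysis of store visitors, estimating age and gender

Newspaper, magazine

Nikkei MJ

2004.05
Konami and Waseda University jointly develop CG

Newspaper, magazine

Nihon Keizai Shimbun

2004.02
Dialogue that reads human emotions

Newspaper, magazine

Nihon Keizai Shimbun

2004.01
Waseda University develops "conversational robot" that understands and responds to ambiguous language

Newspaper, magazine

Nikkei Sangyo Shimbun

2004.01
Estimating customers' age and gender: Waseda University and NEC Soft

Newspaper, magazine

Nikkei Sangyo Shimbun

2003.10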
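Conversational robot

TV or radio program

TV Tokyo, Kenja no Money

2003.06
Start with toys: strengthening recognition functions to diversify dialogue with users

Newspaper, magazine

Nikkei BP, Nikkei Electronics,

No.747, pp.133-134,

1999.07
Another World Cup

TV or radio program

TV Aichi

1999.06
Special feature: Kansei robots arrive

Newspaper, magazine

Nikkan Kogyo Shimbun, Trigger,

Vol.18, No.5, pp.24-26,

1999.05
Conversational robot

TV or radio program

TV Asahi, Shukan Chikyu TV, "Robot Special",

1998.01
Gesture recognition system

TV or radio program

Tokyo Metropolitan Television, "The Day Robots Become Astro Boy"

1997.09
Japanese robot technology leading the world

Newspaper, magazine

Nihon Kogyo Shimbun

1997.04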

Papers

Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction

Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

2022 IEEE Spoken Language Technology Workshop (SLT) 2023.01 [Refereed] [International journal]

Authorship：Last author

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ACM EMNLP 2022 5486 - 5503 2022 [Refereed] [International coauthorship]
Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 25 ( 3 ) 637 - 650 2017.03 [Refereed]

Authorship：Last author

We propose a blind source separation method that yields high-quality speech with low distortion. Time-frequency (TF) masking can effectively reduce interference, but it produces nonlinear distortion. By contrast, linear filtering using a separation matrix such as independent vector analysis (IVA) can avoid nonlinear distortion, but the separation performance is reduced under reverberant conditions. The tandem connectionist approach combines several separation methods and it has been used frequently to compensate for the disadvantages of these methods. In this study, we propose associative memory model (AMM)-based linear filtering and a tandem connectionist framework, which applies TF masking followed by linear filtering. By using AMM trained with speech spectra to optimize the separation matrix, the proposed linear filtering method considers the properties of speech that are not considered explicitly in IVA, such as the harmonic components of spectra. TF masking is applied in the proposed tandem connectionist framework to reduce unwanted components that hinder the optimization of the separation matrix, and it is approximated by using a linear separation matrix to reduce nonlinear distortion. The results obtained in simultaneous speech separation experiments demonstrate that although the proposed linear filtering method can increase the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR) compared with IVA, the proposed tandem connectionist framework can obtain greater increases in SDR and SIR, and it reduces the phoneme error rate more than the proposed linear filtering method.

Scopus citations: 5
Four-participant group conversation: A facilitation robot controlling engagement density as the fourth participant

Yoichi Matsuyama, Iwao Akiba, Shinya Fujie and Tetsunori Kobayashi

Computer Speech and Language 33 ( 1 ) 1 - 24 2015.09 [Refereed]

Authorship：Last author
Conversational Robots: An Approach to conversation protocol issues that utilizes the paralinguistic information available in a robot-human setting.

Tetsunori Kobayashi, Shinya Fujie

Acoustical Science and Technology 34 ( 2 ) 64 - 72 2013.03 [Refereed] [Invited]

Authorship：Lead author
Conversation robot participating in group conversation

Y Matsusaka, T Tojo, T Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E86D ( 1 ) 26 - 36 2003.01 [Refereed] [Invited]

Authorship：Last author

We developed a conversation system which can participate in a group conversation. Group conversation is a form of conversation in which three or more participants talk to each other about a topic on an equal footing. Conventional conversation systems have been designed under the assumption that each system merely talked with only one person. Group conversation is different from these conventional systems in the following points. It is necessary for the system to understand the conversational situation such as who is speaking, to whom he is speaking, and also to whom the other participants pay attention. It is also necessary for the system itself to try to affect the situation appropriately. In this study, we realized the function of recognizing the conversational situation, by combining image processing and acoustic processing, and the function of working on the conversational situation utilizing facial and body actions of the robot. Thus, a robot that can join in the group conversation was realized.
Hierarchical Multi-Task Learning with CTC and Recursive Operation

Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Interspeech 2024 2855 - 2859 2024.09

Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture

Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06

BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06

Intermpl: Momentum Pseudo-Labeling With Intermediate CTC Loss

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06

Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization

Yusuke Fujita, Tetsuji Ogawa, Tetsunori Kobayashi

IEEE Access 11 140069 - 140076 2023

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

Ryo Yanagisawa, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

2022 IEEE International Conference on Big Data (Big Data) 2022.12 [Refereed]

Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision

Masato Takatsuka, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 29th International Conference on Computational Linguistics 6151 - 6164 2022.10 [Refereed]
Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation

Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

Interspeech 2022 2022.09 [Refereed] [International journal]

Authorship：Last author

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022.05 [Refereed]

Authorship：Last author

Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models

Naohiro TAWARA, Atsunori OGAWA, Tomoharu IWATA, Hiroto ASHIKAWA, Tetsunori KOBAYASHI, Tetsuji OGAWA

IEICE Transactions on Information and Systems E105.D ( 1 ) 150 - 160 2022.01 [Refereed]

Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Interspeech 2021 2021.08 [Refereed] [International journal]

Authorship：Last author

Timing generating networks: Neural network based precise turn-taking timing prediction in multiparty conversation

Shinya Fujie, Hayato Katayama, Jin Sakuma, Tetsunori Kobayashi

Interspeech 2021 3771 - 3775 2021.08 [Refereed]

Authorship：Last author
Improved Mask-CTC for Non-Autoregressive End-to-End ASR

Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021.06 [Refereed]

Authorship：Last author

Noise-robust attention learning for end-to-end speech recognition

Yosuke Higuchi, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

European Signal Processing Conference 2021- 311 - 315 2021.01

We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have adopted a combination of recurrent neural networks and attention mechanisms to achieve direct speech-to-text translation. In the real-world environment, however, noisy conditions make it difficult for the attention mechanisms to estimate the accurate alignment between the input speech frames and output characters, leading to the degradation of the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL) which explicitly tells the attention mechanism where to “listen at” in a sequence of noisy speech features. Specifically, we train the attention weights estimated from a noisy speech to approximate the weights estimated from a clean speech. The experimental results based on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.

Scopus citations: 5
Investigation of network architecture for single-channel end-to-end denoising

Takuya Hasumi, Tetsunori Kobayashi, Tetsuji Ogawa

European Signal Processing Conference 2021- 441 - 445 2021.01

This paper examines the effectiveness of a fully convolutional time-domain audio separation network (Conv-TasNet) on single-channel denoising. Conv-TasNet, which has a structure to explicitly estimate a mask for encoded features, has been shown to be effective in single-channel sound source separation in noise-free environments, but it has not been applied to denoising. Therefore, the present study investigates a method of learning Conv-TasNet for denoising and clarifies the optimal structure for single-channel end-to-end modeling. Experimental comparisons conducted using the CHiME-3 dataset demonstrate that Conv-TasNet performs well in denoising and yields improvements in single-channel end-to-end denoising over existing denoising autoencoder-based modeling.

Scopus citations: 3
Personalized Extractive Summarization for a News Dialogue System

Hiroaki Takatsu, Mayu Okuda, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi

2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings 1044 - 1051 2021.01

In modern society, people's interests and preferences are diversifying. Along with this, the demand for personalized summarization technology is increasing. In this study, we propose a method for generating summaries tailored to each user's interests using profile features obtained from questionnaires administered to users of our spoken-dialogue news delivery system. We propose a method that collects and uses the obtained user profile features to generate a summary tailored to each user's interests, specifically, the sentence features obtained by BERT and user profile features obtained from the questionnaire result. In addition, we propose a method for extracting sentences by solving an integer linear programming problem that considers redundancy and context coherence, using the degree of interest in sentences estimated by the model. The results of our experiments confirmed that summaries generated based on the degree of interest in sentences estimated using user profile information can transmit information more efficiently than summaries based solely on the importance of sentences.

Scopus citations: 4
Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue

Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi

2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings 629 - 635 2021.01

This paper analyzes the effectiveness of different modalities in automated speaking proficiency scoring in an online dialogue task of non-native speakers. Conversational competence of a language learner can be assessed through the use of multimodal behaviors such as speech content, prosody, and visual cues. Although lexical and acoustic features have been widely studied, there has been no study on the usage of visual features, such as facial expressions and eye gaze. To build an automated speaking proficiency scoring system using multi-modal features, we first constructed an online video interview dataset of 210 Japanese English-learners with annotations of their speaking proficiency. We then examined two approaches for incorporating visual features and compared the effectiveness of each modality. Results show the end-to-end approach with deep neural networks achieves a higher correlation with human scoring than one with handcrafted features. Modalities are effective in the order of lexical, acoustic, and visual features.

Scopus citations: 12
Deep Speech Extraction with Time-Varying Spatial Filtering Guided by Desired Direction Attractor

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2020- 671 - 675 2020.05

In this investigation, a deep neural network (DNN) based speech extraction method is proposed to enhance a speech signal propagating from the desired direction. The proposed method integrates knowledge based on a sound propagation model and the time-varying characteristics of a speech source, into a DNN-based separation framework. This approach outputs a separated speech source using time-varying spatial filtering, which achieves superior speech extraction performance compared with time-invariant spatial filtering. Given that the gradient of all modules can be calculated, back-propagation can be performed to maximize the speech quality of the output signal in an end-to-end manner. Guided information is also modeled based on the sound propagation model, which facilitates disentangled representations of the target speech source and noise signals. The experimental results demonstrate that the proposed method can extract the target speech source more accurately than conventional DNN-based speech source separation and conventional speech extraction using time-invariant spatial filtering.

Scopus citations: 7
Exploring and exploiting the hierarchical structure of a scene for scene graph generation

Ikuto Kurosawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings - International Conference on Pattern Recognition 1422 - 1429 2020

The scene graph of an image is an explicit, concise representation of the image; hence, it can be used in various applications such as visual question answering or robot vision. We propose a novel neural network model for generating scene graphs that maintain global consistency, which prevents the generation of unrealistic scene graphs; the performance in the scene graph generation task is expected to improve. Our proposed model is used to construct a hierarchical structure whose leaf nodes correspond to objects depicted in the image, and a message is passed along the estimated structure on the fly. To this end, we aggregate features of all objects into the root node of the hierarchical structure, and the global context is back-propagated to the root node to maintain all the object nodes. The experimental results on the Visual Genome dataset indicate that the proposed model outperformed the existing models in scene graph generation tasks. We further qualitatively confirmed that the hierarchical structures captured by the proposed model seemed to be valid.

Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020- 3655 - 3659 2020

We present Mask CTC, a novel non-autoregressive end-to-end automatic speech recognition (ASR) framework, which generates a sequence by refining outputs of the connectionist temporal classification (CTC). Neural sequence-to-sequence models are usually autoregressive: each output token is generated by conditioning on previously generated tokens, at the cost of requiring as many iterations as the output length. On the other hand, non-autoregressive models can simultaneously generate tokens within a constant number of iterations, which results in significant inference time reduction and better suits end-to-end ASR model for real-world scenarios. In this work, Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC. During inference, the target sequence is initialized with the greedy CTC outputs and low-confidence tokens are masked based on the CTC probabilities. Based on the conditional dependence between output tokens, these masked low-confidence tokens are then predicted conditioning on the high-confidence tokens. Experimental results on different speech recognition tasks show that Mask CTC outperforms the standard CTC model (e.g., 17.9% → 12.1% WER on WSJ) and approaches the autoregressive model, requiring much less inference time using CPUs (0.07 RTF in Python implementation). All of our codes are publicly available at https://github.com/espnet/espnet.
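
The initialization step of the inference procedure described in this summary (greedy CTC decoding, then masking low-confidence tokens) can be illustrated with a minimal sketch. This is not the authors' ESPnet implementation: the function name `greedy_ctc_with_mask`, the `threshold` parameter, the `<mask>` symbol, and the choice of the maximum frame probability within a run as the token confidence are all assumptions made for illustration. The subsequent step, in which a decoder predicts the masked tokens conditioned on the unmasked ones, is not shown.

```python
from itertools import groupby

BLANK = 0          # assumed index of the CTC blank symbol
MASK = "<mask>"    # hypothetical placeholder for low-confidence tokens


def greedy_ctc_with_mask(frame_probs, vocab, threshold=0.9):
    """Greedy CTC decoding followed by masking of low-confidence tokens.

    frame_probs: list of per-frame probability lists over vocabulary ids.
    vocab: list mapping ids to token strings (id 0 is the blank).
    threshold: assumed confidence cut-off below which a token is masked.
    """
    # Pick the most probable id and its probability for every frame.
    best = [(max(range(len(p)), key=p.__getitem__), max(p)) for p in frame_probs]

    tokens = []
    # Collapse consecutive repeats, then drop blanks (standard CTC decoding).
    for token_id, run in groupby(best, key=lambda pair: pair[0]):
        if token_id == BLANK:
            continue
        # Assumption: a token's confidence is the highest frame probability in its run.
        confidence = max(prob for _, prob in run)
        tokens.append(vocab[token_id] if confidence >= threshold else MASK)
    return tokens


vocab = ["<blank>", "a", "b", "c"]
frames = [
    [0.02, 0.95, 0.02, 0.01],  # "a"
    [0.05, 0.90, 0.03, 0.02],  # "a" again, collapsed with the previous frame
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.20, 0.60, 0.10],  # "b", low confidence
    [0.05, 0.02, 0.03, 0.90],  # "c"
]
print(greedy_ctc_with_mask(frames, vocab, threshold=0.9))  # ['a', '<mask>', 'c']
```

In this toy input the token "b" falls below the assumed threshold and is replaced by the mask symbol, which is the position a mask-predict decoder would then fill in.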

Scopus citations: 98
Mentoring-reverse mentoring for unsupervised multi-channel speech source separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020- 86 - 90 2020

Mentoring-reverse mentoring, which is a novel knowledge transfer framework for unsupervised learning, is introduced in multi-channel speech source separation. This framework aims to improve two different systems, which are referred to as a senior and a junior system, by mentoring each other. The senior system, which is composed of a neural separator and a statistical blind source separation (BSS) model, generates a pseudo-target signal. The junior system, which is composed of a neural separator and a post-filter, was constructed using teacher-student learning with the pseudo-target signal generated from the senior system, i.e., imitating the output from the senior system (mentoring step). Then, the senior system can be improved by propagating the shared neural separator of the grown-up junior system to the senior system (reverse mentoring step). Since the improved neural separator can give better initial parameters for the statistical BSS model, the senior system can yield more accurate pseudo-target signals, leading to iterative improvement of the pseudo-target signal generator and the neural separator. Experimental comparisons conducted under the condition where mixture-clean parallel data are not available demonstrated that the proposed mentoring-reverse mentoring framework yielded improvements in speech source separation over the existing unsupervised source separation methods.

Scopus citations: 11
Efficient Human-In-The-Loop Object Detection using Bi-Directional Deep SORT and Annotation-Free Segment Identification.

Koki Madono, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference(APSIPA) 1226 - 1233 2020
Waseda meisei at TRECVID 2018: Ad-hoc video search

Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

2018 TREC Video Retrieval Evaluation, TRECVID 2018 2020

The Waseda Meisei team participated in the TRECVID 2018 Ad-hoc Video Search (AVS) task [1]. For this year's AVS task, we submitted both manually assisted and fully automatic runs. Our approach focuses on the concept-based video retrieval, based on the same approach as last year. Specifically, it improves on the word-based keyword extraction method presented in last year's system, which could neither handle keywords related to motion nor appropriately capture the meaning of phrases or whole sentences in queries. To deal with these problems, we introduce two new measures: (i) calculating the similarity between the definition of a word and an entire query sentence, (ii) handling of multi-word phrases. Our best manually assisted run achieved a mean average precision (mAP) of 10.6%, which was ranked the highest among all submitted manually assisted runs. Our best fully automatic run achieved an mAP of 6.0%, which ranked sixth among all participants.
Waseda_Meisei at TRECVID 2018: Fully-automatic ad-hoc video search

Yu Nakagome, Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

2018 TREC Video Retrieval Evaluation, TRECVID 2018 2020
Word attribute prediction enhanced by lexical entailment tasks

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings 5846 - 5854 2020 [Refereed]

Human semantic knowledge about concepts acquired through perceptual inputs and daily experiences can be expressed as a bundle of attributes. Unlike the conventional distributed word representations that are purely induced from a text corpus, a semantic attribute is associated with a designated dimension in attribute-based vector representations. Thus, semantic attribute vectors can effectively capture the commonalities and differences among concepts. However, as semantic attributes have been generally created by psychological experimental settings involving human annotators, an automatic method to create or extend such resources is highly demanded in terms of language resource development and maintenance. This study proposes a two-stage neural network architecture, Word2Attr, in which initially acquired attribute representations are then fine-tuned by employing supervised lexical entailment tasks. The quantitative empirical results demonstrated that the fine-tuning was indeed effective in improving the performances of semantic/visual similarity/relatedness evaluation tasks. Although the qualitative analysis confirmed that the proposed method could often discover valid but not-yet human-annotated attributes, they also exposed future issues to be worked: we should refine the inventory of semantic attributes that currently relies on an existing dataset.
Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification.

Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 28th International Conference on Computational Linguistics(COLING) 5535 - 5540 2020 [Refereed]
MicroLapse: Measuring workers' leniency to prediction errors of microtasks' working times

Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Jeffrey P. Bigham

Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW 352 - 356 2019.11

Working time estimation is known to be helpful for allowing crowd workers to select lucrative microtasks. We previously proposed a machine learning method for estimating the working times of microtasks, but a practical evaluation was not possible because it was unclear what errors would be problematic for workers across different scales of microtask working times. In this study, we formulate MicroLapse, a function that expresses a maximal error in working time prediction that workers can accept for a given working time length. We collected 60,760 survey answers from 660 Amazon Mechanical Turk workers to formulate MicroLapse. Our evaluation of our previous method based on MicroLapse demonstrated that our working time prediction method was fairly successful for shorter microtasks, which could not have been concluded in our previous paper.

Regularized adversarial training for single-shot virtual try-on

Kotaro Kikuchi, Kota Yamaguchi, Edgar Simo-Serra, Tetsunori Kobayashi

Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019 3149 - 3152 2019.10

Spatially placing an object onto a background is an essential operation in graphic design and facilitates many different applications such as virtual try-on. The placing operation is formulated as a geometric inference problem for given foreground and background images, and has been approached by spatial transformer architecture. In this paper, we propose a simple yet effective regularization technique to guide the geometric parameters based on user-defined trust regions. Our approach stabilizes the training process of spatial transformer networks and achieves a high-quality prediction with single-shot inference. Our proposed method is independent of initial parameters, and can easily incorporate various priors to prevent different types of trivial solutions. Empirical evaluation with the Abstract Scenes and CelebA datasets shows that our approach achieves favorable results compared to baselines.

Scopus citations: 6
Speaker adversarial training of DPGMM-based feature extractor for zero-resource languages

Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. INTERSPEECH2019 266 - 270 2019.09 [Refereed]
Multi-channel speech enhancement using time-domain convolutional denoising autoencoder

Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. INTERSPEECH2019 86 - 90 2019.09 [Refereed]
Calving prediction from video: Exploiting behavioural information relevant to calving signs in Japanese black beef cows

Kazuma Sugawara, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. ECPLF2019 663 - 669 2019.08 [Refereed]
Two-stage calving prediction system: Exploiting state-based information relevant to calving signs in Japanese black beef cows

Ryosuke Hyodo, Saki Yasuda, Yusuke Okimoto, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. ECPLF2019 670 - 676 2019.08 [Refereed]
Data assimilation versus machine learning: Comparative study of fish catch forecasting

Yuka Horiuchi, Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. OCEANS2019 2019.06 [Refereed]
Psychological measure on fish catches and its application to optimization criterion for machine learning based predictors

Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. OCEANS2019 2019.06 [Refereed]
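An experimental study on expressing additional nuances through sentence-final intonation control to improve the expressiveness of conversational speech synthesis

Kazuhiko Iwata, Tetsunori Kobayashi

IEICE Transactions on Information and Systems (Japanese Edition), Vol.J102-D ( 6 ) 442 - 453 2019.06 [Refereed]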

Authorship：Last author

TurkScanner: Predicting the hourly wage of microtasks

Susumu Saito, Chun-Wei Chiang, Saiph Savage, Teppei Nakano, Tetsunori Kobayashi, Jeffrey Bigham

Proc. The Web Conference 2019 3187 - 3193 2019.05 [Refereed] [International coauthorship]
Postfiltering using an adversarial denoising autoencoder with noise-aware training

Naohiro Tawara, Hikari Tanabe, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Proc. ICASSP2019 3282 - 3286 2019.05 [Refereed]

Citations (Scopus): 1
End-to-middle training based action generation for multi-party conversation robot

Hayato Katayama, Shinya Fujie, Tetsunori Kobayashi

Proc. IWSDS2019 2019.04 [Refereed]

Authorship：Last author
Investigation of Users' Short Responses in Actual Conversation System and Automatic Recognition of their Intentions

Katsuya Yokoyama, Hiroaki Takatsu, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi

2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings 934 - 940 2019.02

In human-human conversations, listeners often convey intentions to speakers through feedback consisting of reflexive short responses. The speakers recognize these intentions and change the conversational plans to make communication more efficient. These functions are expected to be effective in human-system conversations also; however, there are only a few systems using these functions and few research corpora that include them. We created a corpus that consists of users' short responses to an actual conversation system and developed a model for recognizing the intention of these responses. First, we categorized the intention of feedback that affects the progress of conversations. We then collected 15604 short responses of users from 2060 conversation sessions using our news-delivery conversation system. Twelve annotators labeled each utterance based on intention through a listening test. We then designed our deep-neural-network-based intention recognition model using the collected data. We found that feedback in the form of questions, which is the most frequently occurring expression, was correctly recognized and contributed to the efficiency of the conversation system.

Citations (Scopus): 5
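Speech synthesis for conversational news contents delivery

Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Kazuhiko Iwata, Tetsunori Kobayashi

Transactions of the Japanese Society for Artificial Intelligence, 34 ( 2 ) B-I65_1 - 15 2019.02 [Refereed]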

Authorship：Last author
Speech synthesis for conversational news contents delivery

Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Kazuhiko Iwata, Tetsunori Kobayashi

Transactions of the Japanese Society for Artificial Intelligence 34 ( 2 ) 2019

We have been developing a speech-based “news-delivery system”, which can transmit news contents via spoken dialogues. In such a system, a speech synthesis subsystem that can flexibly adjust the prosodic features in utterances is vital: the system should be able to highlight spoken phrases containing noteworthy information in an article; it should also provide properly controlled pauses between utterances to facilitate users' interactive reactions, including questions. To achieve these goals, we have decided to incorporate the position of the utterance in the paragraph and the role of the utterance in the discourse structure into the bundle of features for speech synthesis. These features were found to be crucially important in fulfilling the above-mentioned requirements for the spoken utterances through a thorough investigation of news-telling speech data uttered by a voice actress. Specifically, these features dictate the importance of information carried by spoken phrases, and hence should be effectively utilized in synthesizing prosodically adequate utterances. Based on these investigations, we devised a deep neural network-based speech synthesis model that takes as input the role and position features. In addition, we designed a neural network model that can estimate an adequate pause length between utterances. Experimental results showed that adding these features to the input yields speech better suited to information delivery. Furthermore, we confirmed that inserting pauses properly makes it easier for users to ask questions during system utterances.

Citations (Scopus): 1
Recognition of Intentions of Users’ Short Responses for Conversational News Delivery System

Hiroaki Takatsu, Katsuya Yokoyama, Yoichi Matsuyama, Shinya Fujie, Tetsunori Kobayashi

Proc. INTERSPEECH2019 1193 - 1197 2019 [Refereed]

Authorship：Last author
Social image tags as a source of word embeddings: A Task-oriented Evaluation

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

LREC 2018 - 11th International Conference on Language Resources and Evaluation 969 - 973 2019 [Refereed]

The distributional hypothesis has played a central role in statistical NLP. Recently, however, its limitation in incorporating perceptual and empirical knowledge has been noted, giving rise to the field of perceptually grounded computational semantics. Typical sources of features in such research are image datasets, where images are accompanied by linguistic tags and/or descriptions. Mainstream approaches employ machine learning techniques to integrate/combine visual features with linguistic features. In contrast to or supplementing these approaches, this study assesses the effectiveness of social image tags in generating word embeddings, and argues that these generated representations exhibit somewhat different and favorable behaviors from corpus-originated representations. More specifically, we generated word embeddings by using image tags obtained from a large social image dataset YFCC100M, which collects Flickr images and the associated tags. We evaluated the efficacy of generated word embeddings with standard semantic similarity/relatedness tasks, which showed that comparable performances with corpus-originated word embeddings were attained. These results further suggest that the generated embeddings could be effective in discriminating synonyms and antonyms, which has been an issue in distributional hypothesis-based approaches. In summary, social image tags can be utilized as yet another source of visually enforced features, provided the amount of available tags is large enough.
SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders.

Hiroaki Tsuyuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Computational Linguistics - 16th International Conference of the Pacific Association for Computational Linguistics(PACLING) 1215 CCIS 43 - 55 2019 [Refereed]

A sentence encoder that can be readily employed in many applications or effectively fine-tuned to a specific task/domain is in high demand. Such a sentence encoding technique would achieve a broader range of applications if it can deal with almost arbitrary word-sequences. This paper proposes a training regime for enabling encoders that can effectively deal with word-sequences of various kinds, including complete sentences, as well as incomplete sentences and phrases. The proposed training regime can be distinguished from existing methods in that it first extracts word-sequences of an arbitrary length from an unlabeled corpus of ordered or unordered sentences. An encoding model is then trained to predict the adjacency between these word-sequences. Herein an unordered sentence indicates an individual sentence without neighboring contextual sentences. In some NLP tasks, such as sentence classification, the semantic contents of an isolated sentence have to be properly encoded. Further, by employing rather unconstrained word-sequences extracted from a large corpus, without heavily relying on complete sentences, it is expected that linguistic expressions of various kinds are employed in the training. This property contributes to enhancing the applicability of the resulting word-sequence/sentence encoders. The experimental results obtained from supervised evaluation tasks demonstrated that the trained encoder achieved performance comparable to existing encoders while exhibiting superior performance in unsupervised evaluation tasks that involve incomplete sentences and phrases.

Towards Answer-unaware Conversational Question Generation.

Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 2nd Workshop on Machine Reading for Question Answering(MRQA@EMNLP) 63 - 71 2019 [Refereed]

Zero-Shot Video Retrieval from a Query Phrase Including Multiple Concepts - Efforts and Challenges in TRECVID AVS Task -

Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsunori Kobayashi

84 ( 12 ) 983 - 990 2018.12 [Refereed]

Authorship：Last author
Adversarial autoencoder for reducing nonlinear distortion

Naohiro Tawara, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Proc. APSIPA2018 1669 - 1673 2018.11 [Refereed]
Sequential fish catch forecasting using Bayesian state space models

Yuya Kokaki, Naohiro Tawara, Tetsunori Kobayashi, Kazuo Hashimoto, Tetsuji Ogawa

Proc. ICPR2018 776 - 781 2018.08 [Refereed]
Fine-grained Video Retrieval using Query Phrases – Waseda_Meisei TRECVID 2017 AVS System –

Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsunori Kobayashi

Proceedings of the 24th International Conference on Pattern Recognition 3322 - 3327 2018.08 [Refereed]

Authorship：Last author
Acoustic feature representation based on timbre for fault detection of rotary machines

Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. SDPC2018 302 - 305 2018.08 [Refereed]

Authorship：Last author
Speaker invariant feature extraction for zero-resource languages with adversarial training

Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

2018 IEEE International Conference on Acoustics, Speech and Signal Processing 2381 - 2385 2018.04 [Refereed]
Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations

Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi

Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018) 6084 - 6088 2018.04 [Refereed] [International journal]

Authorship：Last author

Citations (Scopus): 23
A Spoken Dialogue System for Enabling Information Behavior of Various Intention Levels

Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Yoshihiko Hayashi, Tetsunori Kobayashi

Transactions of the Japanese Society for Artificial Intelligence 33 ( 1 ) DSH - C_1 2018 [Refereed]

Authorship：Last author

Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms

Koji Hirakawa, Kotaro Kikuchi, Kazuya Ueki, Tetsunori Kobayashi, Yoshihiko Hayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11292 LNCS 157 - 163 2018 [Refereed]

Performance on an ad-hoc video search (AVS) task can only be improved when the video processing for analyzing video contents and the linguistic processing for interpreting natural language queries are nicely combined. Among the several issues associated with this challenging task, this paper particularly focuses on the word sense disambiguation/filtering (WSD/WSF) of the terms contained in a search query. We propose WSD/WSF methods which employ distributed sense representations, and discuss their efficacy in improving the performance of an AVS system which makes full use of a large bank of visual concept classifiers. The application of a WSD/WSF method is crucial, as each visual concept classifier is linked with the lexical concept denoted by a word sense. The results are generally promising, outperforming not only a baseline query processing method that only considers the polysemy of a query term but also a strong WSD baseline method.

Waseda_Meisei at TRECVID 2018: Ad-hoc Video Search.

Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

2018 TREC Video Retrieval Evaluation(TRECVID) 2018
Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension.

Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 27th International Conference on Computational Linguistics(COLING) 973 - 983 2018 [Refereed]
Exploiting end of sentences and speaker alternations in recurrent neural network-based language modeling for multiparty conversations

Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2017 (APSIPA2017) 2017.12 [Refereed]

Citations (Scopus): 1
Object Detection Oriented Feature Pooling for Video Semantic Indexing

Kazuya Ueki, Tetsunori Kobayashi

The 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 44 - 51 2017.02 [Refereed]

Authorship：Last author
Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations

Kentaro Kanada, Tetsunori Kobayashi, Yoshihiko Hayashi

2017 Workshop on Sense, Concept and Entity Representations and their Application 37 - 46 2017 [Refereed]
Adaptive training of vibration-based anomaly detector for wind turbine condition monitoring

Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Annual Conference on PHM Society 177 - 184 2017 [Refereed]
Incorporating visual features into word embeddings: A bimodal autoencoder-based approach.

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

IWCS 2017 - 12th International Conference on Computational Semantics - Short papers(IWCS(2)) 2017 [Refereed]
Video Semantic Indexing using Object Detector

Kazuya Ueki, Tetsunori Kobayashi

Proc. VRCAI2016 2016.12 [Refereed]

Authorship：Last author
Evaluation for Collaborative Video Surveillance Platform using Prototype System of Abandoned Object Detection

Susumu Saito, Teppei Nakano, Tetsunori Kobayashi

Proc. ICDSC2016 172 - 177 2016.09 [Refereed]

Authorship：Last author
Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

Trans. on Signal and Information Processing 5 2016.08 [Refereed]

Authorship：Last author
Waseda at TRECVID 2016: Fully-automatic Ad-hoc Video Search

Kotaro Kikuchi, Kazuya Ueki, Susumu Saito, Tetsunori Kobayashi

2016 TREC Video Retrieval Evaluation, TRECVID 2016 2016
A Spoken Dialog System for Coordinating Information Consumption and Exploration.

Shinya Fujie, Ishin Fukuoka, Asumi Mugita, Hiroaki Takatsu, Yoshihiko Hayashi, Tetsunori Kobayashi

Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval(CHIIR) 253 - 256 2016 [Refereed]

Authorship：Last author

Passive consumption of information is boring in most cases and even painful in some cases, especially when the information content is delivered by employing speech media. The user of a speech-based information delivery system, for example a text-to-speech system, usually cannot interrupt the ongoing information flow, which prevents her/him from confirming some part of the content or posing an inquiry for further information exploration. We argue that a carefully designed spoken dialog system could remedy these undesirable situations, and further enable an enjoyable conversation with the users. The key technologies to realize such an attractive dialog system are: (1) pre-compilation of a dialog plan based on the analysis of a source content, and (2) the dynamic recognition of the user's state of understanding and interests. This paper illustrates technical views to implement these functionalities, and discusses a dialog example to exemplify the technical merits of the proposed system.

Citations (Scopus): 3
Multi-Feature Based Fast Depth Decision in HEVC Inter Prediction for VLSI Implementation

Gaoxing Chen, Zhenyu Liu, Tetsunori Kobayashi, Takeshi Ikenaga

2016 9TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2016) 124 - 134 2016 [Refereed]

High efficiency video coding (HEVC) is the latest international video compression standard and achieves double the compression efficiency of the previous standard, H.264/AVC. To increase the compression accuracy, HEVC employs coding units (CUs) ranging from 8 x 8 to 64 x 64. However, the encoding complexity of HEVC increases considerably because of the many partition sizes. Many works focus on reducing the complexity but do not consider the feasibility of hardware implementation. This paper proposes a hardware-friendly fast depth range definition algorithm based on multiple features. Block texture, quantization, and block motion features are utilized. The block texture feature is based on the texture similarity in consecutive frames. The quantization feature is based on the compression regularity of HEVC. The block motion feature compensates for the difference caused by moving objects. Compared with the original HEVC, the proposed method saves about 33.72% of the processing time with a 0.76% BD-bitrate increase on average.
Image Retrieval under Very Noisy Annotations

Kazuya Ueki, Tetsunori Kobayashi

2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) 1277 - 1282 2016 [Refereed]

Authorship：Last author

In recent years, a significant number of tagged images uploaded onto image sharing sites has enabled us to create high-performance image recognition models. However, there are many inaccurate image tags on the Internet, and it is very laborious to investigate the percentage of tags that are incorrect. In this paper, we propose a new method for creating an image recognition model that can be used even when the image data set includes many incorrect tags. Our method has two superior features. First, our method automatically measures the reliability of annotations and does not require any parameter adjustment for the percentage of error tags. This is a very important feature because we usually do not know how many errors are included in the database, especially in actual Internet environments. Second, our method iterates the error modification process. It begins with the modification of simple and obvious errors, gradually deals with much more difficult errors, and finally creates the high-performance recognition model with refined annotations. Using an object recognition image database with many annotation errors, our experiments showed that the proposed method successfully improved the image retrieval performance in approximately 90 percent of the image object categories.
Video Semantic Indexing using Object Detection-Derived Features

Kotaro Kikuchi, Kazuya Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) 1288 - 1292 2016 [Refereed]

Authorship：Last author

A new feature extraction method based on object detection to achieve accurate and robust semantic indexing of videos is proposed. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have been used in semantic indexing, in general are extracted from the entire image and do not explicitly represent the information of meaningful objects that contributes to the determination of semantic categories. In this case, the background region, which does not contain the meaningful objects, is unduly considered, exerting a harmful effect on the indexing performance. In the present study, an attempt was made to suppress the undesirable effects derived from the redundant background information by incorporating object detection technology into semantic indexing. In the proposed method, a combination of the meaningful objects detected in the video frame image is represented as a feature vector for verification of semantic categories. Experimental comparisons demonstrate that the proposed method facilitates the TRECVID semantic indexing task.
IMPROVING SEMANTIC VIDEO INDEXING: EFFORTS IN WASEDA TRECVID 2015 SIN SYSTEM

Kazuya Ueki, Tetsunori Kobayashi

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS 1184 - 1188 2016 [Refereed]

Authorship：Last author

In this paper, we propose a method for improving the performance of semantic video indexing. Our approach involves extracting features from multiple convolutional neural networks (CNNs), creating multiple classifiers, and integrating them. We employed four measures to accomplish this: (1) utilizing multiple evidences observed in each video and effectively compressing them into a fixed-length vector; (2) introducing gradient and motion features to CNNs; (3) enriching variations of the training and the testing sets; and (4) extracting features from several CNNs trained with various large-scale datasets. Using the test dataset from TRECVID's 2014 evaluation benchmark, we evaluated the performance of the proposal in terms of the mean extended inferred average precision measure. On this measure, our system's performance was 35.7, outperforming the state-of-the-art TRECVID 2014 benchmark performance of 33.2. Based on this work, our submission at TRECVID 2015 was ranked second among all submissions.
Separation matrix optimization using associative memory model for blind source separation

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

2015 23rd European Signal Processing Conference, EUSIPCO 2015 1098 - 1102 2015.12 [Refereed]

A source signal is estimated using an associative memory model (AMM) and used for separation matrix optimization in linear blind source separation (BSS) to yield high-quality, less distorted speech. Linear-filtering-based BSS, such as independent vector analysis (IVA), has been shown to be effective in sound source separation while avoiding non-linear signal distortion. This technique, however, requires assuming that the sound sources are independent and generated from non-Gaussian distributions. We propose a method for estimating a linear separation matrix without any assumptions about the sources by repeating the following two steps: estimating non-distorted reference signals by using an AMM and optimizing the separation matrix to minimize an error between the estimated signal and reference signal. Experimental comparisons carried out in simultaneous speech separation suggest that the proposed method can reduce the residual distortion caused by IVA.

Citations (Scopus): 2
Improving Classification Accuracy of Image Categories Using Local Descriptors with Supplemental Information

Kazuya Ueki, Yohei Shiraishi, Naohiro Tawara, Tetsunori Kobayashi

80 ( 12 ) 1144 - 1149 2015.12 [Refereed]

Authorship：Last author
Waseda at TRECVID 2015: Semantic Indexing, notebook paper of the TRECVID 2015 Workshop: November 2015.

Kazuya Ueki, Tetsunori Kobayashi

The TREC Video Retrieval Evaluation 2015 2015.11 [Refereed]

Authorship：Last author
Automatic image tag refinement for image retrieval.

Kazuya Ueki, Tetsunori Kobayashi

Proc. 5th Asia International Symposium on Mechatronics 396 - 399 2015.10 [Refereed]

Authorship：Last author
Multiscale recurrent neural network based language model.

Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi

Proc. 16th Annual Conf. of the Int'l Speech Communication Association 2366 - 2370 2015.09 [Refereed]

Authorship：Last author
Bilinear map of filter-bank outputs for DNN-based speech recognition.

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

Proc. 16th Annual Conf. of the Int'l Speech Communication Association 16 - 20 2015.09 [Refereed]
Blind source separation using associative memory model and linear separation filter.

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

Proc. 2015 European Signal Processing Conference 1103 - 1107 2015.09 [Refereed]

Authorship：Corresponding author
A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura and Tetsunori Kobayashi

Trans. on Signal and Information Processing 4 ( e6 ) 2015.09 [Refereed]

Authorship：Last author
Bilinear map of filter‐bank outputs for DNN‐based speech recognition

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

INTERSPEECH 2015 16 - 20 2015.09 [Refereed]
Feature extraction for rotary-machine acoustic diagnostics focused on period.

Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 44th International Congress and Exposition on Noise Control Engineering 2015.08 [Refereed]

Authorship：Last author
Towards a Computational Model of Small Group Facilitation.

Yoichi Matsuyama, Tetsunori Kobayashi

2015 AAAI Spring Symposium Series 2015.03 [Refereed]

Authorship：Last author
Automatic Expressive Opinion Sentence Generation for Enjoyable Conversational Systems

Yoichi Matsuyama, Akihiro Saito, Shinya Fujie and Tetsunori Kobayashi

Trans. on Audio, Speech, and Language Processing 23 ( 1 ) 313 - 326 2015.02 [Refereed]

Authorship：Last author
Waseda at TRECVID 2015 semantic indexing (SIN)

Kazuya Ueki, Tetsunori Kobayashi

2015 TREC Video Retrieval Evaluation, TRECVID 2015 2015
A COMPARATIVE STUDY OF SPECTRAL CLUSTERING FOR I-VECTOR-BASED SPEAKER CLUSTERING UNDER NOISY CONDITIONS

Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) 2041 - 2045 2015 [Refereed]

Authorship：Last author

The present paper dealt with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering significantly depends on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture the speaker similarity under noisy conditions. Therefore, we examined the efficiency of spectral clustering on i-vector-based similarity for speech corrupted by noise, because spectral clustering can yield robustness against noise by non-linear projection. Experimental comparisons demonstrated that spectral clustering yielded significant improvement over conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.
Multi-layer Feature Extractions for Image Classification - Knowledge from Deep CNNs

Kazuya Ueki, Tetsunori Kobayashi

2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015) 9 - 12 2015 [Refereed]

Authorship：Last author

Recently, there has been considerable research into the application of deep learning to image recognition. Notably, deep convolutional neural networks (CNNs) have achieved excellent performance in a number of image classification tasks, compared with conventional methods based on techniques such as Bag-of-Features (BoF) using local descriptors. In this paper, to cultivate a better understanding of the structure of CNN, we focus on the characteristics of deep CNNs, and adapt them to SIFT+BoF-based methods to improve the classification accuracy. We introduce the multi-layer structure of CNNs into the classification pipeline of the BoF framework, and conduct experiments to confirm the effectiveness of this approach using a fine-grained visual categorization dataset. The results show that the average classification rate is improved from 52.4% to 69.8%.
Effect of frequency weighting on MLP-based speaker canonicalization

Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

Proc. 15th Annual Conf. of the Int'l Speech Communication Association 2987 - 2991 2014.09 [Refereed]

Authorship：Last author
Effect of Frequency Weighting on MLP-Based Speaker Canonicalization

Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

INTERSPEECH 2014 2987 - 2990 2014.09 [Refereed]
Blocked Gibbs Sampling Based Multi-Scale Mixture Model for Speaker Clustering on Noisy Data

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

IEEE International Workshop on Machine Learning for Signal Processing 2013.09 [Refereed]

Authorship：Last author
Expression of Speaker's Intentions through Sentence-Final Particle/Intonation Combinations in Japanese Conversational Speech Synthesis

Kazuhiko Iwata, Tetsunori Kobayashi

8th ISCA Speech Synthesis Workshop 235 - 240 2013.08 [Refereed]

Authorship：Last author
Speaker's Intentions Conveyed to Listeners by Sentence-Final Particles and Their Intonations in Japanese Conversational Speech

Kazuhiko Iwata, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2013 6895 - 6899 2013.05 [Refereed]

Authorship：Last author
A Four-Participant Group Facilitation Framework for Conversational Robots

Yoichi Matsuyama, Iwao Akiba, Akihiro Saito, Tetsunori Kobayashi

Proceedings of the SIGDIAL 2013 Conference 284 - 293 2013 [Refereed]

Authorship：Last author
Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis

Kazuhiko Iwata, Tetsunori Kobayashi

Proc. 13th Annual Conf. of the Int'l Speech Communication Association 442 - 445 2012.09 [Refereed]

Authorship：Last author
Fully Bayesian Speaker Clustering Based on Hierarchically Structured Utterance-oriented Dirichlet process mixture model.

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Proc. 13th Annual Conf. of the Int'l Speech Communication Association 2166 - 2169 2012.09 [Refereed]

Authorship：Last author
AAM Fitting Using Shape Parameter Distribution.

Youhei Shiraishi, Shinya Fujie, Tetsunori Kobayashi

Proc. EUSIPCO2012 2238 - 2242 2012.08 [Refereed]

Authorship：Last author
Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2012 5253 - 5256 2012.03 [Refereed]

Authorship：Last author
Conversation Robot Participating in and Promoting Human-Human Communication

Shinya Fujie, Yoichi Matsuyama, Akira Taniyama, Tetsunori Kobayashi

J95-A ( 1 ) 37 - 45 2012.01 [Refereed]

Authorship：Last author
Spatial filter calibration based on minimization of modified LSD.

Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 12th Annual Conf. of the Int'l Speech Communication Association 1761 - 1764 2011.09 [Refereed]

Authorship：Last author
Development and evaluation of Japanese Lombard speech corpus

Tetsuji Ogawa, Takanobu Nishiura, Takeshi Yamada, Norihide Kitaoka, Tetsunori Kobayashi

Proc. Internoise2011 1366 - 1373 2011.09 [Refereed]

Authorship：Last author
Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization

Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

Proc. 12th Annual Conf. of the Int'l Speech Communication Association 2741 - 2744 2011.08 [Refereed]

Authorship：Last author
Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model.

Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 12th Annual Conf. of the Int'l Speech Communication Association 2905 - 2908 2011.08 [Refereed]

Authorship：Last author
Multiparty Conversation Facilitation Strategy Using Combination of Question Answering and Spontaneous Utterances

Yoichi Matsuyama, Yushi Xu, Akihiro Saito, Shinya Fujie, Tetsunori Kobayashi

The paralinguistic Information Processing and its Integration in Spoken Dialogue Systems 103 - 112 2011.08 [Refereed]

Authorship：Last author
Conversational Speech Synthesis System with Communication Situation Dependent HMMs

Kazuhiko Iwata, Tetsunori Kobayashi

The paralinguistic Information Processing and its Integration in Spoken Dialogue Systems 113 - 124 2011.08 [Refereed]

Authorship：Last author
Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation

OGAWA Tetsuji, UEKI Kazuya, KOBAYASHI Tetsunori

IEICE Trans. Inf. & Syst. 94 ( 8 ) 1683 - 1689 2011.08 [Refereed]

Authorship：Last author

We propose a novel method of supervised feature projection called class-distance-based discriminant analysis (CDDA), which is suitable for automatic age estimation (AAE) from facial images. Most methods of supervised feature projection, e.g., Fisher discriminant analysis (FDA) and local Fisher discriminant analysis (LFDA), focus on determining whether two samples belong to the same class (i.e., the same age in AAE) or not. In AAE systems, even when an estimated age does not match the correct age, a smaller error is preferable to a larger one. To reflect this characteristic of AAE, CDDA determines between-class separability according to the class distance (i.e., the difference in ages); two samples with similar ages are constrained to be close, and those with widely separated ages are constrained to be far apart. Furthermore, we propose an extension of CDDA called local CDDA (LCDDA), which aims at handling multimodality in samples. Experimental results revealed that CDDA and LCDDA could extract more discriminative features than FDA and LFDA.

SPEAKER RECOGNITION USING MULTIPLE KERNEL LEARNING BASED ON CONDITIONAL ENTROPY MINIMIZATION

Tetsuji Ogawa, Hideitsu Hino, Nima Reyhani, Noboru Murata, Tetsunori Kobayashi

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 2204 - 2207 2011 [Refereed]

Authorship：Last author

We applied a multiple kernel learning (MKL) method based on information-theoretic optimization to speaker recognition. Most of the kernel methods applied to speaker recognition systems require a suitable kernel function and its parameters to be determined for a given data set. In contrast, MKL eliminates the need for strict determination of the kernel function and parameters by using a convex combination of element kernels. In the present paper, we describe an MKL algorithm based on conditional entropy minimization (MCEM). We experimentally verified the effectiveness of MCEM for speaker classification; this method reduced the speaker error rate as compared to conventional methods.

Citations (Scopus): 4
Framework of Communication Activation Robot Participating in Multiparty Conversation

Yoichi Matsuyama, Shinya Fujie, Tetsunori Kobayashi

AAAI Fall Symposium, Dialog with Robots 68 - 73 2010.11 [Refereed]

Authorship：Last author
DEVELOPMENT OF ZONAL BEAMFORMER AND ITS APPLICATION TO ROBOT AUDITION

Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010) 1529 - 1533 2010.08 [Refereed]

Authorship：Last author

We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the FBF widens the space as the distance from the robot increases. The ZBF can better improve the performance of eliminating the directional noise coming from behind the target source than the FBF because the ZBF has a considerably sharper directivity than the FBF.
Speech enhancement using a square microphone array in the presence of directional and diffuse noise

Tetsuji Ogawa, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E93-E ( 5 ) 2010.05 [Refereed]

Authorship：Last author
A Meeting Assistance System with a Collaborative Editor for Argument Structure Visualization

Yasutomo Arai, Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. Int'l Conf. on Computer Supported Cooperative Work 2010 2010.02 [Refereed]

Authorship：Last author
A Collaborative Lexical Data Design System for Speech Recognition Application Developers

Hiroshi Sasaki, Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. Int'l Conf. on Computer Supported Cooperative Work 2010 455 - 456 2010.02 [Refereed]

Authorship：Last author
Conversation Robot and Its Audition System

FUJIE Shinya, OGAWA Tetsuji, KOBAYASHI Tetsunori

JRSJ 28 ( 1 ) 23 - 26 2010.01 [Refereed] [Invited]

Authorship：Last author
Psychological evaluation of a group communication activation robot in a party game

Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 3046 - 3049 2010

We propose a communication activation robot and evaluate how effectively it activates communication. As an example application, we implemented a system that participates in a quiz-style party game called the NANDOKU quiz on the multimodal conversation robot SCHEMA, and we conducted a laboratory experiment to evaluate its ability to activate group communication. We evaluated interaction in the NANDOKU quiz game, with subjects as panelists, using video analysis and the semantic differential (SD) method with questionnaires. The SD results indicate that subjects perceived the game as more pleasant and noisier when the robot participated. Video analysis shows that the smiling duration ratio is greater when the robot participates. These results suggest that the robot has a communication activation function in the party game. © 2010 ISCA.
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination

Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 2954 - 2957 2010

We present a method for realizing the principle of minimum relative entropy discrimination (MRED) in order to derive a regularized discriminative training method. MRED is advantageous because it provides a Bayesian interpretation of conventional discriminative training methods and regularization techniques. To realize MRED for speech recognition, we propose an approximation of MRED that strictly preserves the constraints used in MRED. Further, to make MRED practical, we also propose an optimization method based on convex optimization and a solver based on the cutting plane algorithm. The proposed methods were evaluated on continuous phoneme recognition tasks. We confirmed that the MRED-based training system outperformed conventional discriminative training methods in the experiments. © 2010 ISCA.
SCHEMA: multi-party interaction-oriented humanoid robot

Yoichi Matsuyama, Kosuke Hosoya, Hikaru Taniyama, Hiroki Tsuboi, Shinya Fujie, Tetsunori Kobayashi

ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation 2009.12 [Refereed]

Authorship：Last author
Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions

Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E92D ( 11 ) 2244 - 2252 2009.11 [Refereed]

Authorship：Last author

The accuracy of simulation-based assessments of speech recognition systems under noisy conditions is investigated with a focus on the influence of the Lombard effect on the speech recognition performances. This investigation was carried out under various recognition conditions of different sound pressure levels of ambient noise, for different recognition tasks, such as continuous speech recognition and spoken word recognition, and using different recognition systems, i.e., systems with and without adaptation of the acoustic models to ambient noise. Experimental results showed that accurate simulation was not always achieved when dry sources with neutral talking style were used, but it could be achieved if the dry sources that include the influence of the Lombard effect were used; the simulation in the latter case is accurate, irrespective of the recognition conditions.

Citations (Scopus): 4
Conversation robot participating in and activating a group communication

Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi

Proc. 10th Annual Conf. of the Int'l Speech Communication Association 264 - 267 2009.09 [Refereed]

Authorship：Last author
Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head

Tetsuji Ogawa, Kosuke Hosoya, Kenzo Akagiri, Tetsunori Kobayashi

Proc. 2009 European Signal Processing Conference 879 - 883 2009.08 [Refereed]

Authorship：Last author
System Design of Group Communication Activator: An Entertainment Task for Elderly Care

Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, Tetsunori Kobayashi

Proc. HRI2009 243 - 244 2009.03 [Refereed]

Authorship：Last author
Upper-Body Contour Extraction Using Face and Body Shape Variance Information

Kazuki Hoshiai, Shinya Fujie, Tetsunori Kobayashi

ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS 5414 862 - + 2009 [Refereed]

Authorship：Last author

We propose a fitting method using a model that integrates face and body shape variance information for upper-body contour extraction. Accurate body-contour extraction is necessary for various applications, such as pose estimation and gesture recognition. In this study, we treat it as a shape model fitting problem. A model that includes shape variance information can fit the contour robustly even in noisy cases. Active appearance models (AAMs) are one such model and can fit a face successfully. An AAM needs appearance information for effective fitting, but this cannot be used in our case because the appearance of the upper body changes easily with clothing. Instead of an intensity image, the proposed method uses an edge image as appearance information. However, discriminating between a true contour edge of the upper body and other edges is difficult. To solve this problem, we integrate the shapes of the upper body and face. This integrated model is expected to be more robust to edges in cluttered backgrounds and to various body locations than a model using only body shape information. We conduct experiments and confirm that integrating face and body variance information improves accuracy.
Robot auditory system using head-mounted square microphone array

Kosuke Hosoya, Tetsuji Ogawa, Tetsunori Kobayashi

2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS 2736 - 2741 2009 [Refereed]

Authorship：Last author

A new noise reduction method suitable for autonomous mobile robots was proposed and applied to the preprocessing of a hands-free spoken dialogue system. When a robot talks with a conversational partner in real environments, the microphones pick up not only the partner's speech but also various types of noise, such as directional noise, diffuse noise, and noise from the robot itself. We attempted to remove these types of noise simultaneously with small, lightweight devices and algorithms of low computational cost. We assumed that the conversational partner was in front of the robot. In this case, the aim of the proposed method is to extract speech signals arriving from the frontal direction of the robot. The proposed noise reduction system was evaluated in the presence of various types of noise: the number of word errors was reduced by 69% compared with conventional methods. The proposed robot auditory system can also cope with the case in which a conversational partner (i.e., a sound source) moves away from the front of the robot: the sound source was localized by face detection and tracking using facial images obtained from a camera mounted in an eye of the robot. As a result, various types of noise could be reduced in real time, irrespective of the sound source position, by combining speech information with image information.
Multi-modal Integration for Personalized Conversation: Towards a Humanoid in Daily Life

Shinya Fujie, Daichi Watanabe, Yuhi Ichikawa, Hikaru Taniyama, Kosuke Hosoya, Yoichi Matsuyama, Tetsunori Kobayashi

Proc. Int'l Conf. on Humanoid Robots 617 - 622 2008.12 [Refereed]

Authorship：Last author
Designing Communication Activation System in Group Communication

Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, Tetsunori Kobayashi

Proc. Int'l Conf. on Humanoid Robots 629 - 634 2008.12 [Refereed]

Authorship：Last author
Class Distance Weighted Locality Preserving Projection for Automatic Age Estimation

Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. Biometrics: Theory, Applications and Systems 2008.10 [Refereed]

Authorship：Last author
Design and Formulation for Speech Interface Based on Flexible Shortcuts

Teppei Nakano, Tomoyuki Kumai, Tetsunori Kobayashi, Yasushi Ishikawa

Proc. 9th Annual Conf. of the Int'l Speech Communication Association 2474 - 2477 2008.09 [Refereed]

Authorship：Corresponding author
An ASM fitting method based on machine learning that provides a robust parameter initialization for AAM fitting

Matthias Wimmer, Shinya Fujie, Freek Stulp, Tetsunori Kobayashi, Bernd Radig

Proc. Int'l Conf. on Automatic Face and Gesture Recognition 2008.09 [Refereed]

Authorship：Corresponding author
Ears of the robot: noise reduction using four-line ultra-micro omni-directional microphones mounted on a robot head

Tetsuji Ogawa, Hirofumi Takeuchi, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

Proc. 2008 European Signal Processing Conference 2008.08 [Refereed]

Authorship：Last author
Ears of the robot: Direction of arrival estimation based on pattern recognition using robot-mounted microphones

Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D ( 5 ) 1522 - 1530 2008.05 [Refereed]

Authorship：Last author

We propose a new type of direction-of-arrival estimation method for robot audition that is free from strict head-related transfer function estimation. The proposed method is based on statistical pattern recognition and uses the ratio of power spectrum amplitudes observed at a microphone pair as a feature vector. It does not explicitly require phase information, which is frequently used in conventional techniques, because phase information is unreliable when strong reflections and diffractions occur around the microphones. The feature vectors we adopted can handle these influences naturally. The effectiveness of the proposed method was shown in direction-of-arrival estimation tests for 19 directions: errors were reduced by 92.4% compared with the conventional phase-based method.

Citations (Scopus): 3
Speech enhancement using square microphone array for mobile devices

Shintaro Takada, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2008 313 - 316 2008.04 [Refereed]

Authorship：Last author
Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E91-D ( 3 ) 815 - 824 2008.03 [Refereed]

Authorship：Last author
Social robots that interact with people.

Cynthia Breazeal, Atsuo Takanishi, Tetsunori Kobayashi

Springer handbook of robotics 2008 [Refereed] [Invited]

Authorship：Last author
Upper-body Contour Extraction and Tracking Using Face and Body Shape Variance Information

Kazuki Hoshiai, Shinya Fujie, Tetsunori Kobayashi

2008 8TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS 2008) 398 - + 2008 [Refereed]

Authorship：Last author

We propose a fitting method using a model that integrates face and body shape variance information for upper-body contour extraction and tracking. Accurate body-contour extraction is necessary for various applications, such as pose estimation and gesture recognition. In this study, we treat it as a shape model fitting problem. A model that includes shape variance information can fit the contour robustly even in noisy cases. Active appearance models (AAMs) are one such model and can fit a face successfully. An AAM needs appearance information for effective fitting, but this cannot be used in our case because the appearance of the upper body changes easily with clothing. Instead of an intensity image, the proposed method uses an edge image as appearance information. However, discriminating between a true contour edge of the upper body and other edges is difficult. To solve this problem, we integrate shape models of the upper body and face. This integrated model is expected to be more robust to edges in cluttered backgrounds and to various body locations than a model using only body shape information. We conduct experiments and confirm that integrating face and body variance information improves accuracy.
Extensible speech recognition system using Proxy-Agent

Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. Automatic Speech Recognition and Understanding Workshop 601 - 606 2007.12 [Refereed]

Authorship：Last author
Gender Classification Based on Integration of Multiple Classifiers Using Different Features of Facial and Neck Images

Kazuya Ueki, Tetsunori Kobayashi

Journal of the Institute of Image Information and Television Engineers 61 ( 12 ) 1803 - 1809 2007.12 [Refereed]

Authorship：Last author
Sound Source Separation using Null-Beamforming and Spectral Subtraction for Mobile Devices

Shintaro Takada, Satoshi Kanba, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007) 30 - 33 2007.10 [Refereed]

Authorship：Last author
Ears of the robot: Three simultaneous speech segregation and recognition using robot-mounted microphones

Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D ( 9 ) 1465 - 1468 2007.09 [Refereed]

Authorship：Last author

A new type of sound source segregation method using robot-mounted microphones, which is free from strict head-related transfer function (HRTF) estimation, has been proposed and successfully applied to a system that recognizes three simultaneous utterances. The proposed segregation method exploits sound intensity differences that arise from the particular arrangement of four directional microphones and from the robot head acting as a sound barrier. The proposed method consists of three layers of signal processing: two-line SAFIA (binary masking based on narrow-band sound intensity comparison), two-line spectral subtraction, and their integration. We performed a 20K-vocabulary continuous speech recognition test with three speakers talking simultaneously and achieved a word error reduction of more than 70% compared with the case without any segregation processing.

Citations (Scopus): 3
Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

Shoei Sato, Tetsunori Kobayashi, et al.

Proc. 8th Annual Conf. of the Int'l Speech Communication Association 345 - 348 2007.08 [Refereed]

Authorship：Corresponding author
Fusion-based age-group classification method using multiple two-dimensional feature extraction algorithms

Kazuya Ueki, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D ( 6 ) 923 - 934 2007.06 [Refereed]

Authorship：Last author

An age-group classification method based on a fusion of different classifiers with different two-dimensional feature extraction algorithms is proposed. Theoretically, an integration of multiple classifiers can provide better performance compared to a single classifier. In this paper, we extract effective features from one sample image using different dimensional reduction methods, construct multiple classifiers in each subspace, and combine them to reduce age-group classification errors. As for the dimensional reduction methods, two-dimensional PCA (2DPCA) and two-dimensional LDA (2DLDA) are used. These algorithms are antisymmetric in the treatment of the rows and the columns of the images. We prepared the row-based and column-based algorithms to make two different classifiers with different error tendencies. By combining these classifiers with different errors, the performance can be improved. Experimental results show that our fusion-based age-group classification method achieves better performance than existing two-dimensional algorithms alone.

Citations (Scopus): 4
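Multimodal Conversation Robot: On the Act of "Listening" by a Robot in Conversation

Tetsunori Kobayashi, Shinya Fujie

Journal of the Society of Instrument and Control Engineers 46 ( 6 ) 466 - 471 2007.06 [Refereed] [Invited]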

Authorship：Lead author
Speech Starter: Speech Input Interface Capable of Endpoint Detection by Using Filled Pauses

Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi

48 ( 5 ) 2001 - 2011 2007.05 [Refereed]

Authorship：Last author
Adequacy analysis of simulation-based assessment of speech recognition system

Tetsuji Ogawa, Satoshi Kanba, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2007 1153 - 1157 2007.04 [Refereed]

Authorship：Last author
Speech Spotter: Speech Input Interface Capable of Using Speech Recognition in the Midst of Human-Human Conversation

Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi

48 ( 3 ) 1275 - 1283 2007.03 [Refereed]

Authorship：Last author
Conversation robot with the function of gaze recognition

Shinya Fujie, Toshihiko Yamahata, Tetsunori Kobayashi

IEEE-RAS Int'l Conf. on Humanoid Robots 364 - 369 2006.12 [Refereed]

Authorship：Last author
Realization of rhythmic dialogue on spoken dialogue system using para-linguistic information

Shinya Fujie, Tetsunori Kobayashi

The Journal of the Acoustical Society of America 2006.11 [Refereed]

Authorship：Last author
Hybrid Voice Conversion of Unit Selection and Generation Using Prosody Dependent HMM

Masashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E89-D ( 11 ) 2775 - 2782 2006.11 [Refereed]

Authorship：Last author
Source Separation Using Multiple Directivity Patterns Produced by ICA-based BSS

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 14th European Signal Processing Conference 2006.09 [Refereed]

Authorship：Last author
A Method for Solving the Permutation Problem of Frequency-Domain BSS Using Reference Signal

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 14th European Signal Processing Conference 2006.09 [Refereed]

Authorship：Last author
Two-dimensional Heteroscedastic Linear Discriminant Analysis for Age-group Classification

Kazuya Ueki, Teruhide Hayashida, Tetsunori Kobayashi

Proc. 18th International Conference on Pattern Recognition 585 - 588 2006.08 [Refereed]

Authorship：Last author
Head Gesture Recognition for the Moving Conversation Robot

NAKAJIMA Kei, EJIRI Yasushi, FUJIE Shinya, OGAWA Tetsuji, MATSUSAKA Yosuke, KOBAYASHI Tetsunori

The IEICE transactions on information and systems 89 ( 7 ) 1514 - 1522 2006.07 [Refereed]

Authorship：Last author
Spoken Dialogue System Using Recognition of User's Feedback for Rhythmic Dialogue

Shinya Fujie, Riho Miyake, Tetsunori Kobayashi

Proc. Speech Prosody 2006 2006.05 [Refereed]

Authorship：Last author
MONEA: Message-Oriented Networked-Robot Architecture

Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. International Conference on Robotics and Automation 194 - 199 2006.04 [Refereed]

Authorship：Last author
MONEA: Message-Oriented Networked-Robot Architecture for Efficient Multifunctional-Robot Development Environment

Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

24 ( 4 ) 115 - 125 2006.04 [Refereed]

Authorship：Last author
Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

Tetsuji Ogawa, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E89-D ( 3 ) 939 - 945 2006.03 [Refereed]

Authorship：Last author
Adaptive understanding of proposal-requesting expressions for conversational information retrieval system

Kenichiro Hosokawa, Shinya Fujie, Tetsunori Kobayashi

Systems and Computers in Japan 37 ( 14 ) 62 - 72 2006

This paper considers a conversational system in which information is provided in accordance with the conditions presented by the user, and proposes a method that can adequately deal even with unknown expressions. In most conventional systems, the relation between the expression and the intention of the utterance by the user is built into the system beforehand. Thus, it is difficult to deal adequately with unknown expressions which have not been learned. We propose a framework which adaptively manages on-line the relation between the expression and the intention by interaction with the user. The proposed method produces a framework in which the connection between the expression and the intention is dynamically modified according to the explicitness or implicitness of the affirmative or negative attitude shown by the user to the proposal made by the system. It is verified by an evaluation experiment that the system can adequately learn the relation between the expression and the intention of the user by the proposed method, and can deal adequately with unknown expressions. © 2006 Wiley Periodicals, Inc.

Manifold HLDA and its application to robust speech recognition

Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1551 - 1554 2006 [Refereed]

We propose manifold heteroscedastic linear discriminant analysis (MHLDA), which explicitly removes environmental information from the information useful for discrimination. Usually, a feature parameter used in pattern recognition contains both categorical information and environmental information. The well-known HLDA tries to extract useful information (UI) that represents categorical information from the feature parameter. However, environmental information still remains in the UI parameters extracted by HLDA, and it causes a slight degradation in performance. This is because HLDA does not handle the environmental information explicitly. The proposed MHLDA also tries to extract UI, as HLDA does, but it handles environmental information explicitly. This handling makes the MHLDA-based UI parameters less sensitive to the environment. As a trade-off, however, MHLDA loses a small amount of categorical information. In this paper, we combine HLDA-based UI and MHLDA-based UI for pattern recognition to draw on the benefits of both parameters. Experimental results show the effectiveness of this combined method.
Subspace-based age-group classification using facial images under various lighting conditions

Kazuya Ueki, Teruhide Hayashida, Tetsunori Kobayashi

PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION 43 - + 2006 [Refereed]

This paper presents a framework of age-group classification using facial images under various lighting conditions. Our method is based on the appearance-based approach that projects images from the original image space into a face-subspace. We propose a two-phased approach (2DLDA+LDA), which is based on 2DPCA and LDA. Our experimental results show that the new 2DLDA+LDA-based approach improves classification accuracy more than the conventional PCA-based and LDA-based approach. Moreover, the effectiveness of eliminating dimensions that do not contain important discriminative information is confirmed. The accuracy rates are 46.3%, 67.8% and 78.1% for age-groups that are in the 5-year, 10-year and 15-year range respectively.
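A Study of a Spectral Conversion Method Using Prosodic Information

Ryo Mochizuki, Masashi Okubo, Tetsunori Kobayashi

IEICE Transactions (Japanese Edition) J88-DII ( 11 ) 2269 - 2276 2005.11 [Refereed]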

Authorship：Last author
Optimizing the Structure of Partly-Hidden Markov Models Using Weighted Likelihood-Ratio Maximization Criterion

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 6th Annual Conf. of the Int'l Speech Communication Association 3353 - 3356 2005.09 [Refereed]

Authorship：Last author
Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system

Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

Proc. 6th Annual Conf. of the Int'l Speech Communication Association 889 - 892 2005.09 [Refereed]

Authorship：Last author
A Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation Using Reference Signal

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

14th European Signal Processing Conference 15 2005.09 [Refereed]

Authorship：Last author
Extension of Hidden Markov Models for Multiple Candidates and its Application to Gesture Recognition

Yosuke Sato, Tetsuji Ogawa, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E88-D ( 6 ) 1239 - 1247 2005.06 [Refereed]

Authorship：Last author
Speech recognition in the blind condition based on multiple directivity patterns using a microphone array

Toshiyuki Sekiya, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2005 1373 - 1376 2005.03 [Refereed]

Authorship：Last author
Adaptive Understanding of Proposal-Requesting Expressions for Conversational Information Retrieval System

Kenichiro Hosokawa, Shinya Fujie, Tetsunori Kobayashi

J88-DII ( 3 ) 619 - 628 2005.03 [Refereed]

Authorship：Last author
Recognition of Positive/Negative Attitude and Its Application to a Spoken Dialogue System

Shinya Fujie, Yasushi Ejiri, Hideaki Kikuchi, Tetsunori Kobayashi

J88-DII ( 3 ) 488 - 498 2005.03 [Refereed]

Authorship：Last author
Speech Shift: Speech Input Interface Using Intentional Control of Voice Pitch

Yukihiro Omoto, Masataka Goto, Katsunobu Itou, Tetsunori Kobayashi

J88-DII ( 3 ) 469 - 489 2005.03 [Refereed]

Authorship：Last author
An Evaluation of Affective Representation by Prosodic/Spectral Features

Masashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

J88-DII ( 2 ) 441 - 444 2005.02 [Refereed]

Authorship：Last author
Anthropo-morphic conversational robot : Multimodal human interface with para-linguistic information expressing/understanding abilities

KOBAYASHI Tetsunori, FUJIE Shinya, MATSUSAKA Yosuke, SHIRAI Katsuhiko

The Journal of the Acoustical Society of Japan 61 ( 2 ) 85 - 90 2005.02 [Refereed] [Invited]

Authorship：Lead author

A Conversation Robot with Back-Channel Feedback Function based on Linguistic and Nonlinguistic Information

Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

Proc. International Conf. on Autonomous Robots and Agents 379 - 384 2004.12 [Refereed]

Authorship：Last author
Speech Spotter: On-demand Speech Recognition in Human-Human Conversation on the Telephone or in Face-to-Face Situations

Masataka Goto, Koji Kitayama, Katunobu Itou, Tetsunori Kobayashi

Proc. 5th Annual Conf. of the Int'l Speech Communication Association 2004.10 [Refereed]

Authorship：Last author
Speech Recognition Interface for Music Information Retrieval: "Speech Completion" and "Speech Spotter"

Masataka Goto, Katunobu Itou, Koji Kitayama, Tetsunori Kobayashi

ISMIR2004 403 - 408 2004.10 [Refereed]

Authorship：Last author
Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot

Naoya Mochiki, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 5th Annual Conf. of the Int'l Speech Communication Association 2004.10 [Refereed]

Authorship：Last author
Prosody based Attitude Recognition with Feature Selection and Its Application to Spoken Dialog System as Para-Linguistic Information

Shinya Fujie, Daizo Yagi, Hideaki Kikuchi, Tetsunori Kobayashi

Proc. 5th Annual Conf. of the Int'l Speech Communication Association 2841 - 2844 2004.10 [Refereed]

Authorship：Last author
A low-band spectrum envelope reconstruction method for PSOLA-based F0 modification

Ryo Mochizuki, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E87-D ( 10 ) 2426 - 2429 2004.10 [Refereed]

Authorship：Last author
A Conversation Robot Using Head Gesture Recognition as Para-Linguistic Information

Shinya Fujie, Yasushi Ejiri, Kei Nakajima, Yosuke Matsusaka, Tetsunori Kobayashi

Proceedings of 13th IEEE International Workshop on Robot and Human Communication 159 - 164 2004.09 [Refereed]

Authorship：Last author
A Method of Gender Classification by Integrating Facial, Hairstyle, and Clothing Images

Kazuya Ueki, Hiromitsu Komatsu, Satoshi Imaizumi, Kenichi Kaneko, Nobuhiro Sekine, Jiro Katto, Tetsunori Kobayashi

Proc. Int'l Conf. on Pattern Recognition 446 - 449 2004.08 [Refereed]

Authorship：Last author
Design and Implementation of Data Sharing Architecture for Multifunctional Robot Development

Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi

Systems and Computers in Japan 35 ( 8 ) 54 - 65 2004.07 [Refereed]

Authorship：Last author
Extension of State-Observation Dependency in Partly-Hidden Markov Models and Its Application to Continuous Speech Recognition

OGAWA Tetsuji, KOBAYASHI Tetsunori

The Transactions of the Institute of Electronics, Information and Communication Engineers 87 ( 6 ) 1216 - 1223 2004.06 [Refereed]

Authorship：Last author

Speech Enhancement based on Multiple Directivity patterns using a Microphone Array

Toshiyuki Sekiya, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2004 2004.05 [Refereed]

Authorship：Last author
A Low-band Spectrum Envelope Modeling For High Quality Pitch Modification

Ryo Mochizuki, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2004 645 - 648 2004.05 [Refereed]

Authorship：Last author
Spoken Dialogue System Using Prosody as Para-Linguistic Information

Shinya Fujie, Daizo Yagi, Yosuke Matsusaka, Hideaki Kikuchi, Tetsunori Kobayashi

Proc. Int'l Conf. on Speech Prosody 2004 387 - 390 2004.03 [Refereed]

Authorship：Last author
Multi-Layer Audio Segregation and its Application to Double Talk

Toshiyuki Sekiya, Tomohiro Sawada, Tetsuji Ogawa, Tetsunori Kobayashi

SWIM (Lectures by Masters in Speech Processing) 2004.01 [Refereed]

Authorship：Last author
Recognition of Para-Linguistic Information and Its Application to Spoken Dialogue System

Shinya FUJIE, Yasushi EJIRI, Yosuke MATSUSAKA, Hideaki KIKUCHI, Tetsunori KOBAYASHI

Proc. Automatic Speech Recognition and Understanding Workshop 231 - 236 2003.12 [Refereed]

Authorship：Last author
Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking

Noriyuki Murai, Tetsunori Kobayashi

Systems and Computers in Japan 34 ( 30 ) 103 - 111 2003.11 [Refereed]

Authorship：Last author
Speech Starter: Noise-Robust Endpoint Detection by Using Filled Pauses

Koji Kitayama, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi

Proc. 4th Annual Conf. of the Int'l Speech Communication Association 1237 - 1240 2003.09 [Refereed]

Authorship：Last author
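Evaluation of the Mental Workload of Speech Use While Driving a Car

宗近純一, Yosuke Matsusaka, Tetsunori Kobayashi

2nd Forum on Information Technology (FIT2003), Information Technology Letters 2 105 - 106 2003.09 [Refereed]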

Authorship：Last author
Design and Implementation of Data Sharing Architecture for Multi-Functional Robot Development

Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi

J86-D-I ( 5 ) 318 - 329 2003.05 [Refereed]

Authorship：Last author
Hybrid modeling of PHMM and HMM for speech recognition

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2003 140 - 143 2003.04 [Refereed]

Authorship：Last author
Inter-Module Cooperation Architecture for Interactive Robot

KyeongJu Kim, Yosuke Matsusaka, Tetsunori Kobayashi

International Conference on Intelligent Robots and Systems 2286 - 2291 2002.10 [Refereed]

Authorship：Last author
Generalization of State-Observation-Dependency in Partly Hidden Markov Models

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 3rd Annual Conf. of the Int'l Speech Communication Association 2673 - 2676 2002.09 [Refereed]

Authorship：Last author
System Architecture to Realize Widely Applicable and Interactive Behavior of the Robot

Yosuke Matsusaka, Tetsunori Kobayashi

Proc. Int'l Workshop on Lifelike Animated Agents -Tools, Affective Functions, and Applications- 77 - 82 2002.08 [Refereed]

Authorship：Last author
Media-Integrated Biometric Person Recognition Based on the Dempster-Shafer Theory

Yoshiaki Sugie, Tetsunori Kobayashi

Proc. Int'l Conf. on Pattern Recognition 2002 381 - 384 2002.08 [Refereed]

Authorship：Last author
Extension of Hidden Markov Models to Deal with Multiple Candidates of Observations and its Application to Mobile-robot-oriented Gesture Recognition

Yosuke Sato, Tetsunori Kobayashi

Proc. Int'l Conf. on Pattern Recognition 2002 515 - 519 2002.08 [Refereed]

Authorship：Last author
Trend of Spoken Dialogue Research (<Special Issue> Recent Advancements of Spoken Language Interfaces and Dialogue Systems)

Tetsunori Kobayashi

17 ( 3 ) 266 - 270 2002.05 [Refereed] [Invited]

Authorship：Lead author
Humanoid robots in waseda university—hadaly-2 and wabian

Shuji Hashimoto, Tetsunori Kobayashi, et al.

Autonomous Robots, Kluwer Academic Publishers 12 25 - 38 2002 [Refereed]
System Software for Collaborative Development of Interactive Robot

Yosuke Matsusaka, Tetsunori Kobayashi

IEEE-RAS Int'l Conf. on Humanoid Robots 271 - 277 2001.11 [Refereed]

Authorship：Last author
Modeling of conversational strategy for the robot participating in the group conversation

Yosuke Matsusaka, Shinya Fujie, Tetsunori Kobayashi

Proc. 2nd Annual Conf. of the Int'l Speech Communication Association 2173 - 2176 2001.09 [Refereed]

Authorship：Last author
Estimating positions of multiple adjacent speakers based on MUSIC spectra correlation using a microphone array

Hidetomo Tanaka, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2001 3045 - 3048 2001.05 [Refereed]

Authorship：Last author
Japanese Dictation ToolKit - 1999 version -

Kawahara Tatsuya, Lee Akinobu, Kobayashi Tetsunori, Takeda Kazuya, Minematsu Nobuaki, Sagayama Shigeki, Itou Katsunobu, Itou Akinori, Yamamoto Mikio, Yamada Atsushi, Utsuro Takehito, Shikano Kiyohiro

The Journal of the Acoustical Society of Japan 57 ( 3 ) 210 - 214 2001.03

The DARPA Speech Projects and Speech Recognition Research in Japan

Tetsunori Kobayashi

The Journal of the Acoustical Society of Japan 57 ( 1 ) 70 - 60 2001.01 [Refereed]

Authorship：Lead author
Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking

Noriyuki Murai, Tetsunori Kobayashi

J83-D-II ( 11 ) 2465 - 2742 2000.11 [Refereed]

Authorship：Last author
Spoken Word Recognition Using Partly-Hidden Markov Models

Junko Koyama, Tetsunori Kobayashi

J83-D-II ( 11 ) 2379 - 2387 2000.11 [Refereed]

Authorship：Last author
Robust Language Modeling for Small Corpus of Target Task Using Class Combined Word Statistics and Selective Use of General Corpus

Yosuke Wada, Norihiko Kobayashi, Tetsunori Kobayashi

J83-D-II ( 11 ) 2379 - 2406 2000.11 [Refereed]

Authorship：Last author
Partly-Hidden Markov Model and Its Application to Gesture Recognition

Ken Masumitsu, Tetsunori Kobayashi

41 ( 11 ) 3060 - 2069 2000.11 [Refereed]

Authorship：Last author
Free software toolkit for Japanese large vocabulary continuous speech recognition (co-authored)

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, S.Sagayama, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. 1st Annual Conf. of the Int'l Speech Communication Association 476 - 479 2000.09 [Refereed]
Dictation of multi-party conversation using statistical turn taking model and speaker models

Noriyuki Murai, Tetsunori Kobayashi

Proc. of International Conference on Acoustic, Speech, Signal Processing 1575 - 1578 2000.06 [Refereed]

Authorship：Last author
A conversational robot utilizing facial and body expressions

Tsuyoshi Tojo, Yosuke Matsusaka, Tomotada Ishii, Tetsunori Kobayashi

Proc. International Conf. on System, Man and Cybernetics 858 - 863 2000.06 [Refereed]

Authorship：Last author
Japanese Dictation ToolKit - 1998 version -

KAWAHARA Tatsuya, LEE Akinobu, KOBAYASHI Tetsunori, TAKEDA Kazuya, MINEMATSU Nobuaki, ITOU Katsunobu, YAMAMOTO Mikio, YAMADA Atsushi, UTSURO Takehito, SHIKANO Kiyohiro

The Journal of the Acoustical Society of Japan 56 ( 4 ) 255 - 259 2000.04 [Refereed]

Multi-person Conversation via Multi-modal Interface : A Robot who Communicate with Multi-user

Yosuke Matsusaka, Tsuyoshi Tojo, Sentaro Kubota, Kenji Furukawa, Daisuke Tamiya, Shinya Fujie, Tetsunori Kobayashi

Proc. European Conf. on Speech Communication and Technology 1723 - 1726 1999.09 [Refereed]

Authorship：Last author
Class-combined Word N-gram for Robust Language Modeling

Norihiko Kobayashi, Tetsunori Kobayashi

Proc. European Conf. on Speech Communication and Technology 1599 - 1602 1999.09 [Refereed]

Authorship：Last author
Multi-person Conversation Robot using Multi-modal Interface

Yosuke Matsusaka, Tsuyoshi Tojo, Sentaro Kubota, Kenji Furukawa, Shinya Fujie, Tetsunori Kobayashi

Proc. SCI/ICAS'99 450 - 455 1999.07 [Refereed]

Authorship：Last author
Controlling Dialogue Strategy According to Performance of Processing

Hideaki Kikuchi, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. ESCA Workshop on Interactive Dialogue in Multi-Modal Systems 85 - 88 1999.06 [Refereed]
Japanese dictation toolkit: 1997 version

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Ito, A.Itoh, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Journal of the Acoustical Society of Japan E 20 ( 3 ) 233 - 239 1999.05 [Refereed]
JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research

K.Ito, M.Yamamoto, K.Takeda, T.Takezawa, T.Matsuoka, Tetsunori Kobayashi, K.Shikano and S.Itahashi

Journal of the Acoustical Society of Japan E 20 ( 3 ) 199 - 207 1999.05 [Refereed]
Effect of Vocabulary Extension using Word Sequence Concatenation for Large Vocabulary Continuous Speech Recognition

Yosuke Wada, Norihiko Kobayashi, Yuichiro Nakano, Tetsunori Kobayashi

40 ( 4 ) 1413 - 1420 1999.04 [Refereed]

Authorship：Last author
Partly Hidden Markov Model and its Application to Speech Recognition

T.Kobayashi, K.Masumitsu, J.Furuyama

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1999 121 - 124 1999.03 [Refereed]

Authorship：Lead author
Japanese dictation toolkit: 1997 version

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Ito, A.Itoh, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

55 ( 3 ) 175 - 180 1999.03 [Refereed]
The Design of the Newspaper-Based Japanese Large Vocabulary Continuous Speech Recognition Corpus

K.Itoh, M.Yamamoto, K.Takezawa, T.Matsuoka, K.Shikano, T.Kobayashi, S.Itahashi

Proc. 5th Int'l Conf. on Spoken Language Processing 3261 - 3264 1998.12 [Refereed]
Sharable software repository for Japanese large vocabulary continuous speech recognition

T.Kawahara, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. 5th Int'l Conf. on Spoken Language Processing 3257 - 3260 1998.12 [Refereed]
Source-Extended Language Model for Large Vocabulary Continuous Speech Recognition

Tetsunori Kobayashi, Norihiko Kobayashi, Yosuke Wada

Proc. 5th Int'l Conf. on Spoken Language Processing 2431 - 2434 1998.12 [Refereed]

Authorship：Lead author
Controlling Gaze of Humanoid in Communication with Human

H.Kikuchi, M.Yokoyama, K.Hoashi, Y.Hidaki, T.Kobayashi, K.Shirai

Proc. IROS98/IEEE 255 - 260 1998.10 [Refereed]
Design and Development of Japanese Speech Corpus for Large Vocabulary Continuous Speech Recognition Assessment

K.Itou, K.Takeda, T.Takezawa, T.Matsuoka, K.Shikano, T.Kobayashi, S.Itahashi, M.Yamamoto

Proc. of First International Workshop on East-Asian Language Resources and Evaluation 98 - 103 1998.05 [Refereed]
Common Platform of Japanese Large Vocabulary Continuous Speech Recognizer—Proposal and Initial Results

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. of First International Workshop on East-Asian Language Resources and Evaluation 117 - 122 1998.05 [Refereed]
Speech Processing Technology towards Practical Use

Katsuhiko Shirai, Tetsunori Kobayashi, Ikuo Kudo

38 ( 11 ) 971 - 975 1997.10 [Refereed] [Invited]
Humanoid - Intelligent Anthropomorphic Robot -

Shuji Hashimoto, Seinosuke Narita, Katsuhiko Shirai, Tetsunori Kobayashi, Atsuo Takanishi, Shigeki Sugano, Yoshinori Kasahara

38 ( 11 ) 959 - 969 1997.10 [Refereed]
Humanoid Robot: Development of an Information Assistant Robot Hadaly

S. Hashimoto, S. Narita, H. Kasahara, A. Takanishi, S. Sugano, K. Shirai, T. Kobayashi, H. Takanobu, T. Kurata, K. Fujiwara, T. Matsuno, T. Kawasaki, K. Hoashi

Proc. Int'l Workshop on Robot and Human Communication 106 - 111 1997.09 [Refereed]
Development of ASJ Continuous Speech Corpus: Japanese Newspaper Article Sentences (JNAS)

Shuichi ITAHASHI, Mikio YAMAMOTO, Toshiyuki TAKEZAWA, Tetsunori KOBAYASHI

Proc. Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques 1997.09 [Refereed]
Partly Hidden Markov Model and its Application to Gesture Recognition

Tetsunori Kobayashi, Satoshi Haruyama

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1997 3081 - 3084 1997.04 [Refereed]

Authorship：Lead author
Human interface of the humanoids

Tetsunori Kobayashi

Proc. International Workshop on Human Interface Technology 63 1997.03 [Refereed]

Authorship：Lead author
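Cooperative Use of Speech in a Multimodal Input Environment: Design and Evaluation of the Speech Drawing System S-tgif

Takuya Nishimoto, Nobutoshi Shida, Tetsunori Kobayashi, Katsuhiko Shirai

The Transactions of the Institute of Electronics, Information and Communication Engineers D-II (Information and Systems II: Information Processing) 79 ( 12 ) 2176 - 2183 1996.12 [Refereed]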

Authorship：Corresponding author

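We examined how speech input can contribute to improving an interface within the framework of multimodal interfaces, and we built and evaluated the multimodal drawing system S-tgif based on the findings. In building the system, we followed basic interface principles: we exploited the strengths of speech, namely operability and ease of recalling procedures, and compensated by other means for its weaknesses, namely ease of understanding the system state and robustness. Evaluation experiments showed that speech was especially effective shortly after users began using the system or when they resumed after a break, reducing the time needed to complete a task to about 80%. As users became skilled with the system, the objective benefit of speech diminished, but speech input was still supported by users: for certain commands the speech usage rate exceeded 90%, and the subjective evaluations were also high. These results show that a useful interface can be built by considering the effective use of speech in accordance with interface principles.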

ALICE : Acquisition of Language In Conversational Environment : An Approach to Weakly Supervised Training of Spoken Language System for Language Porting

Tetsunori Kobayashi

Proc. 4th Int'l Conf. on Spoken Language Processing 833 - 836 1996.10 [Refereed]

Authorship：Lead author
An application of Dempster and Shafer's probability theory to speech recognition

Tetsunori Kobayashi

The Journal of the Acoustical Society of America 100 ( 4 Pt.2 ) 2757 1996.10 [Refereed]

Authorship：Lead author
Speech recognition in nonstationary noise based on parallel HMMs and spectral subtraction

Ryuji Mine, Tetsunori Kobayashi, Katsuhiko Shirai

Systems and Computers in Japan 27 ( 14 ) 37 - 44 1996 [Refereed]

Authorship：Corresponding author

This paper proposes a method of speech recognition in a nonstationary noisy environment that combines parallel HMMs and spectral subtraction. In the proposed method, a set of hypotheses is generated, through a series of subtraction processes, for the combinations of speech and noise that can produce the observed data. The probabilities of occurrence are then calculated using HMMs prepared separately for the speech and the noise. The experimental task is 100-word recognition in the noisy environment of an ordinary car running in an urban area. Comparative experiments are conducted for the proposed method, the ordinary spectral subtraction method, and other parallel HMM methods, and they verify the effectiveness of the proposed method.

Citations (Scopus): 2
Improving human interface in drawing tool using speech, mouse and key-board

Takuya Nishimoto, Nobutoshi Shida, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. International Workshop on Robot and Human Communication 107 - 112 1995 [Refereed]

Authorship：Corresponding author
Phoneme recognition in various styles of utterance based on mutual information criterion (co-authored)

Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. 3rd Int'l Conf. on Spoken Language Processing 1911 - 1917 1994.09 [Refereed]
Multimodal drawing tool using speech, mouse and key-board (co-authored)

T.Nishimoto, N.Shida, T.Kobayashi, K.Shirai

Proc. 3rd Int'l Conf. on Spoken Language Processing 1287 - 1290 1994.09 [Refereed]

Authorship：Corresponding author
Generation of prosody in speech synthesis using large speech data-base

Naohiro Sakurai, Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. 3rd Int'l Conf. on Spoken Language Processing 747 - 750 1994.09 [Refereed]

Authorship：Corresponding author
Phoneme recognition in continuous speech based on mutual information criterion

Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

50 ( 9 ) 702 - 710 1994.09 [Refereed]
Handling of User Interruption Realizing Timing-Free Utterances for Spoken Dialogue Interface

Hideaki Kikuchi, Ikuo Kudo, Tetsunori Kobayashi, Katsuhiko Shirai

J77-D ( 8 ) 1502 - 1511 1994.08 [Refereed]
Speech Recognition in Nonstationary Noise Based on Parallel HMMs and Spectral Subtraction

Ryuji MINE, Tetsunori KOBAYASHI, Katsuhiko SHIRAI

J78-DII ( 7 ) 1021 - 1027 1994.07 [Refereed]

Authorship：Corresponding author
Recognition of conversational speech

Tetsunori Kobayashi

50 ( 7 ) 563 - 567 1994.07 [Refereed] [Invited]

Authorship：Lead author
Characterization of fluctuations in fundamental periods of speech based on fractal analysis

Tetsunori Kobayashi

The Journal of the Acoustical Society of America 95 ( 5 Pt.2 ) 2824 1994.05 [Refereed]

Authorship：Lead author
Automatic training of phoneme dictionary based on mutual information criterion

Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1994 241 - 244 1994.04 [Refereed]
Markov model based noise modeling and its application to noisy speech recognition using dynamical features of speech

Tetsunori Kobayashi, Ryuji Mine, Katsuhiko Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1994 57 - 60 1994.04 [Refereed]

Authorship：Lead author
Phoneme Recognition Using Probability Ratio between Phoneme-Group-Pair

Tetsunori Kobayashi, Y.Hamano, S.An, Katsuhiko Shirai

J77-A ( 2 ) 128 - 134 1994.02 [Refereed]

Authorship：Lead author
Speech synthesis of japanese sentences using large waveform data-base

Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

1993 International Workshop on Speech Processing 95 - 100 1993.11 [Refereed]
Word spotting in conversational speech based on phonemic likelihood by mutual information criterion

S.Okawa, T.Kobayashi, K.Shirai

Proc. European Conf. on Speech Communication and Technology 1281 - 1284 1993.09 [Refereed]
Speech recognition under the unstationary noise based on the noise markov model and spectral subtraction

T.Kobayashi, R.Mine, K.Shirai

Proc. European Conf. on Speech Communication and Technology 833 - 836 1993.09 [Refereed]

Authorship：Lead author
Speech Recognition Based on Hidden Markov Models

Tetsunori Kobayashi

IEEJ Transactions on Electronics, Information and Systems (Section C) 113 ( 5 ) 295 - 301 1993.05 [Refereed] [Invited]

Authorship：Lead author

Design and creation of speech and text corpora of dialogue

Satoru Hayamizu, Shuichi Itahashi, Tetsunori Kobayashi, Toshiyuki Takezawa

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E76-A ( 1 ) 17 - 22 1993 [Refereed]
Phrase recognition in conversational speech using prosodic and phonemic information

Shigeki Okawa, Takashi Endo, Tetsunori Kobayashi, Katsuhiko Shirai

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E76-A ( 1 ) 44 - 50 1993 [Refereed]
High quality synthetic speech generation using synchronized oscillators

Kenji Hashimoto, Takemi Mochida, Yasuaki Satoh, Tetsunori Kobayashi, Katsuhiko Shirai

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E76-A ( 11 ) 1949 - 1956 1993 [Refereed]
ASJ continuous speech corpus for research

Tetsunori Kobayashi, Shuichi Itahashi, Satoru Hayamizu, Toshiyuki Takezawa

48 ( 12 ) 888 - 893 1992.12 [Refereed]

Authorship：Lead author
Spectral mapping onto probabilistic domain using neural networks and its application to speaker adaptive phoneme recognition

T.Kobayashi

Proc. 2nd Int'l Conf. on Spoken Language Processing 385 - 388 1992.11 [Refereed]

Authorship：Lead author
Speaker adaptive phoneme recognition based on spectral mapping to probabilistic domain

T.Kobayashi, Y.Uchiyama, J.Osada, K.Shirai

Proc. of International Conference on Acoustics, Speech and Signal Processing 457 - 460 1992.03 [Refereed]

Authorship：Lead author
Fractal dimension of fluctuations in fundamental period of speech

K.Shirai, T.Kobayashi, M.Yagyu

Proc. of International Conference on Noise in Physical Systems and 1/f Fluctuations 1991.11 [Refereed]

Authorship：Corresponding author
Visualization of Speech Production Process and Color Representation of Phonetic Information

Katsuhiko Shirai, Tetsunori Kobayashi

11 ( 43 ) 216 - 221 1991.10 [Refereed]
The Role of Fluctuations in Fundamental Period for Natural Speech Synthesis.

Tetsunori Kobayashi, Hidetoshi Sekine

47 ( 8 ) 539 - 544 1991.08 [Refereed]

Authorship：Lead author
Estimation of articulatory motion using neural networks

Katsuhiko Shirai, Tetsunori Kobayashi

Journal of Phonetics 19 379 - 385 1991.08 [Refereed]

Authorship：Corresponding author
Analysis of contextual dependency of phonetic features and its application to speech recognition

T.Kobayashi, K.Watanabe, Y.Uchiyama

Proc. Korea-Japan joint workshop on advanced technology of speech recognition and synthesis 92 - 97 1991.07 [Refereed]

Authorship：Lead author
Application of neural networks to articulatory motion estimation

T.Kobayashi, M.Yagyu, K.Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1991 489 - 492 1991.05 [Refereed]

Authorship：Lead author
Dependence of spectral features of vowels and voiceless stops on phoneme environment.

Tetsunori Kobayashi, Kazuhiro Watanabe, Toshiyuki Matsuda

J74-A ( 3 ) 353 - 359 1991 [Refereed]

Authorship：Lead author
Statistical properties of fluctuation of pitch intervals and its modeling for natural synthetic speech

T.Kobayashi, H.Sekine

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1990 321 - 324 1990 [Refereed]

Authorship：Lead author
Dependence of phonemic feature on context

T.Kobayashi, K.Watanabe

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1990 769 - 772 1990 [Refereed]

Authorship：Lead author
Quantification Theory Dealing with Categorical Dependency and Its Application to the Modeling of Spectral Difference of Phoneme.

Tetsunori Kobayashi, Toshiyuki Matsuda, Kazuhiro Watanabe

J74-A ( 3 ) 345 - 352 1990 [Refereed]

Authorship：Lead author
Contextual Factor Analysis of Vowel Distribution

Tetsunori Kobayashi, Toshiyuki Matsuda, Kazuhiro Watanabe

Proc. European Conf. on Speech Communication and Technology 2277 - 2280 1989 [Refereed]

Authorship：Lead author
A categorical factor analysis of vowel distribution based on the modified quantification theory

Tetsunori Kobayashi, Toshiyuki Matsuda

The Journal of the Acoustical Society of America 1988 [Refereed]

Authorship：Lead author
Speech Production Model and Automatic Recognition

Katsuhiko Shirai, Tetsunori Kobayashi

Nature, Cognition and System I 3 - 14 1988 [Refereed]
Description of Task Dependent Knowledge for Speech Understanding System

Tetsunori Kobayashi, Katsuhiko Shirai

European Conference on Speech Technology 1987 [Refereed]

Authorship：Lead author
The robot musician 'WABOT-2' (Waseda robot-2)

Ichiro Kato, Sadamu Ohteru, Katsuhiko Shirai, Toshiaki Matsushima, Seinosuke Narita, Shigeki Sugano, Tetsunori Kobayashi, Eizo Fujisawa

Robotics 3 ( 2 ) 143 - 155 1987 [Refereed]
A network model dealing with focus of conversation for speech understanding system

Tetsunori Kobayashi, Katsuhiko Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1986 1589 - 1592 1986 [Refereed]

Authorship：Lead author
Estimation of articulatory parameters by table look-up method and its application for speaker independent phoneme recognition

Katsuhiko Shirai, Tetsunori Kobayashi, Jun Yazawa

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1986 2247 - 2250 1986 [Refereed]

Authorship：Corresponding author
Estimating articulatory motion from speech wave

Katsuhiko Shirai, Tetsunori Kobayashi

Speech Communication 5 ( 2 ) 159 - 170 1986 [Refereed]

If articulatory movements can be estimated, then the articulatory parameters which represent the motion of the articulatory organs would be useful for speech recognition. This paper discusses an effective method of estimating articulatory movements and its application to speech recognition. First, a method of estimating articulatory parameters known as the model matching method is described, and various spectral distance measures are evaluated for this method. The results show that the best measure on average is the higher-order cepstral distance measure, which is one of the peak-weighted measures. Second, articulatory parameters are used for the recognition of vowels uttered by unspecified speakers. It is shown that adapting the model with the estimated mean vocal tract length is effective in normalizing speaker differences. Third, the motor commands that move the articulatory organs are estimated by considering articulatory dynamics, and continuous vowels are recognized by means of these estimated commands. A considerable part of the coarticulation effects can be compensated for by this command estimation, and the method is useful for continuous speech recognition.

Citations (Scopus): 28
Evaluation of Spectral Distance Measure for the Estimation of Articulatory Motion by the Model Matching Method

Tetsunori Kobayashi, Jun Yazawa, Katsuhiko Shirai

J68-A ( 2 ) 210 - 217 1985.10 [Refereed]

Authorship：Lead author
Speech I/O System Realizing Flexible Conversation for Robot

Katsuhiko Shirai, Tetsunori Kobayashi, Kazuhiko Iwata, Yoshio Fukazawa

3 ( 4 ) 362 - 372 1985.08 [Refereed]

Authorship：Corresponding author
Phrase speech recognition for large vocabulary

Tetsunori Kobayashi, Yasuhiro Komori, Katsuhiko Shirai

J68-D ( 6 ) 1304 - 1311 1985.06 [Refereed]

Authorship：Lead author

Speech conversation system of the musician robot

Tetsunori Kobayashi, Y. Komori, N. Hashimoto, Kazuhiko Iwata, Y. Fukazawa, K. Shirai

Proc. ICAR'85 483 - 488 1985 [Refereed]

Authorship：Lead author
Speech I/O System Realizing Flexible Conversation for Robot--The Conversational System of WABOT-2

Katsuhiko Shirai, Tetsunori Kobayashi, Kazuhiko Iwata, Yoshio Fukazawa

Bulletin of Science and Engineering Research Laboratory, Waseda University 112 53 - 79 1985 [Refereed]

Authorship：Corresponding author
Recognition of Vowels in Continuous Speech Based on the Articulatory Control Model

Tetsunori Kobayashi, Katsuhiko Shirai

J67-A ( 10 ) 935 - 942 1984.10 [Refereed]

Authorship：Lead author

Phrase speech recognition of large vocabulary using feature in articulatory domain

Katsuhiko Shirai, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1984 409 - 412 1984 [Refereed]

Authorship：Corresponding author
Considerations on articulatory dynamics for continuous speech recognition

Katsuhiko Shirai, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1983 324 - 327 1983 [Refereed]

Authorship：Corresponding author
Validity of Articulatory Parameters in Continuous Speech Recognition for Unspecified Speakers - Vowel Discrimination Test -

Katsuhiko Shirai, Hiroshi Matsuura, Tetsunori Kobayashi

J65-A ( 7 ) 671 - 678 1982.07 [Refereed]

Recognition of semivowels and consonants in continuous speech using articulatory parameters

Katsuhiko Shirai, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1982 2004 - 2007 1982 [Refereed]

Authorship：Corresponding author


Books and Other Publications

Paralinguistic Information and its Integration in Spoken Dialogue Systems

Ramón López-Cózar Delgado, Tetsunori Kobayashi( Part： Joint editor)

Springer 2011 ISBN: 9781461413349
Trends in Spoken Language Processing (音声言語処理の潮流)

白井克彦( Part： Contributor)

コロナ社 2010.03 ISBN: 9784339008104
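Prosody and Spoken Language Information Processing: The Science of Accent, Intonation, and Rhythm (韻律と音声言語情報処理 : アクセント・イントネーション・リズムの科学)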

広瀬啓吉( Part： Contributor)

丸善 2006.01 ISBN: 4621076744
Information Systems and Human Interfaces (情報システムとヒューマンインターフェース)

白井克彦( Part： Contributor)

早稲田大学出版部 2010.03 ISBN: 9784657101099
Springer handbook of robotics

Siciliano, Bruno, Khatib, Oussama( Part： Contributor)

Springer 2008 ISBN: 9783540239574
Encyclopedia of Artificial Intelligence (人工知能学事典)

人工知能学会( Part： Contributor)

共立出版 2005.12 ISBN: 4320121074
Robotics Handbook (ロボット工学ハンドブック)

日本ロボット学会( Part： Contributor)

コロナ社 2005.06 ISBN: 9784339045765
Spoken language systems

Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara( Part： Contributor)

Ohmsha,IOS 2005 ISBN: 1586035150
Kansei Informatics: Feeling, Enjoying, Creating: The Frontier of Kansei Human Interfaces (感性情報学 : 感じる・楽しむ・創りだす : 感性的ヒューマンインタフェース最前線)

原島博, 井口征士, 乾敏郎( Part： Contributor)

工作舎 2004.05 ISBN: 4875023782
Introduction to Multimedia Processing (マルチメディア処理入門)

新田恒雄, 岡村好庸, 杉浦彰彦, 小林哲則, 金沢靖, 山本真司( Part： Joint author)

朝倉書店 2002.04 ISBN: 4254205074
The Story of Humanoid Robots (人間型ロボットのはなし)

早稲田大学ヒューマンノイドプロジェクト( Part： Joint author)

日刊工業新聞社 1999.06 ISBN: 4526043974

Programming Techniques Learned with C (Cで学ぶプログラミング技法)

Tetsunori Kobayashi( Part： Sole author)

培風館 1997.11 ISBN: 4563013951
Recent research towards advanced man-machine interface through spoken language.

Hiroya Fujisaki( Part： Contributor)

Elsevier 1996.10 ISBN: 0444816070

International Symposium on Spoken Dialogue : New directions in human and man-machine communication

Katsuhiko Shirai, Tetsunori Kobayashi, Yasunari Harada( Part： Joint editor)

ISSD Organizing Committee 1993.11 ISBN: 4990026918


Presentations

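Robots that can converse and conversation systems that have a body

Tetsunori Kobayashi [Invited]

Acoustical Society of Japan 2025 Spring Meeting, Special Session "New Developments in Spoken Dialogue Technology 1"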

Presentation date： 2025.03
HANASHI-JOZU = A Good Conversationalist: Goodbye Request-response Model, Hello Pre-planned Information Transfer Model.

Tetsunori Kobayashi [Invited]

Multimodal Agents for Ageing and Multicultural Societies, NII Shonan meeting

Presentation date： 2018.10
A robot-based enjoyable conversation system

Tetsunori Kobayashi [Invited]

5th ASA-ASJ Joint Meeting, Nov. 2016

Presentation date： 2016.11
A robot-based approach towards finding conversation protocol: the role of prosody, eye gaze and body expressions in communication

Tetsunori Kobayashi [Invited]

The Fifth Workshop of Eye Gaze in Intelligent Human Machine Interaction

Presentation date： 2013
Robot as a multimodal human interface device

Tetsunori Kobayashi [Invited]

International Conference on Auditory-Visual Speech Processing

Presentation date： 2010.10
Conversation robot recognizing and expressing paralinguistic information

Tetsunori Kobayashi [Invited]

Workshop on Predictive Models of Human Communication Dynamics

Presentation date： 2010.08
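Information-encounter conversation system: knowledge transfer through diverse information behaviors

Tetsunori Kobayashi [Invited]

Japanese Society for Artificial Intelligence / Information Processing Society of Japan / IEICE, 7th Dialogue System Symposium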

Presentation date： 2016.10
Conversation and robots

[Invited]

MMDAgent DAY !

Presentation date： 2016.10
Enjoyable Conversation System

Tetsunori Kobayashi [Invited]

InterACT 2016

Presentation date： 2016.07
Conversation robots and their protocols

Tetsunori Kobayashi [Invited]

Acoustical Society of Japan, Spring Meeting

Presentation date： 2016.03
Speech synthesis system for conversation

Tetsunori Kobayashi, Kazuhiko Iwata [Invited]

IEICE Technical Committee on Speech

Presentation date： 2014.11
History of the Conversational Robot

Tetsunori Kobayashi [Invited]

International Workshop on Spoken Dialogue System

Presentation date： 2010.10
Multimodal conversation robots and group communication

Tetsunori Kobayashi [Invited]

IEICE VNV research group

Presentation date： 2009.03
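A new paradigm for developing speech recognition application systems

Tetsunori Kobayashi [Invited]

Information Processing Society of Japan / IEICE, 10th Spoken Language Symposium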

Presentation date： 2008.12
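Realizing rhythmic dialogue communication through paralanguage understanding and generation functions

Tetsunori Kobayashi [Invited]

Robotics Society of Japan, Robotics Seminar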

Presentation date： 2006.03
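A conversation robot capable of understanding and generating paralanguage

Tetsunori Kobayashi [Invited]

IEICE Technical Committee on Pattern Recognition and Media Understanding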

Presentation date： 2005.09
Sound source localization and separation using microphones mounted on a robot head

[Invited]

Acoustical Society of Japan, Spring Meeting

Presentation date： 2005.03
Current state and challenges of speech recognition technology

Tetsunori Kobayashi [Invited]

IEICE Symposium on Practical Applications of Speech

Presentation date： 2004.03
Toward the realization of conversation robots

Tetsunori Kobayashi [Invited]

IEICE Technical Committee on Human Communication Science (HCS)

Presentation date： 2003.04
Multi-Modal Conversational Interface for Humanoid Robot

Tetsunori Kobayashi, Katsuhiko Shirai [Invited]

Presentation date： 1999.12
Trend of Stochastic Speech Recognition

Tetsunori Kobayashi [Invited]

ISCIE Stochastic System Symposium

Presentation date： 1997.11


Research Projects

Temporal structural modeling of conversational interaction and efficient information transfer using speech media.

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2018.04

-

2022.03

Kobayashi Tetsunori


To convey large amounts of information efficiently through speech, it is important to incorporate conversational elements into the information delivery and to maintain the rhythm of the interaction.
We modeled the constraints on the temporal structure of conversational interaction that underlie rhythmic conversation and incorporated the model into our information delivery system. While delivering a summarized document, the system monitors the user's reactions at all times and, in response to them, restores and presents information that was omitted during summarization. These features achieved efficient document delivery through spoken conversation.
In addition, low-latency speech recognition, expressive speech synthesis, and paralinguistic understanding technologies were developed as key elemental technologies to enhance the performance of the system.
User-friendly voice dialogue service that responds with natural timing and sensitivity

NEDO Venture support program for innovation implementation

Project Year :

2013.04

-

2014.03
Development of technology for utilizing information appliances, sensors, and human interface devices (Related to the development of speech recognition core technology)

Ministry of Economy, Trade and Industry Strategic Technology Development Consignment Program

Project Year :

2006

-

2009
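Development of an automatic analyzer for human behavior patterns

NEDO, Grant for Practical Application Development for University-Originated Business Creation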

Project Year :

2003.03

-

2004.03
Spoken Dialogue System utilizing Prosody Control

Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research

Project Year :

2000

-

2003
A study on social networking services for older adults: clarification of barriers and providing solutions

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Challenging Exploratory Research

Project Year :

2013.04

-

2016.03

Kobayashi Tetsunori, NAKANO Teppei


This research presents the basic design and implementation of a framework that helps older adults communicate with others through Internet content sharing. The framework includes a prototype design of the Channel-Oriented Interface, a very simple user interface for older adults, and an interaction design that uses the interface. The interface is characterized by a framework for "supporters", who help users configure their interface and usage. Thanks to this feature, the interface is simplified and the operations are unified. We also designed and conducted a series of experiments using the prototype system: a questionnaire survey of 100 older adults, usability testing with 31 older adults, and a user-experience evaluation with 3 pairs of older adults and their children or grandchildren. The experimental results supported most of our hypotheses, although some tasks remain for future work.
A study on communication robot performing rhythmic conversation

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Project Year :

2008

-

2010

KOBAYASHI Tetsunori


We refined methods for generating and recognizing linguistic and paralinguistic information and built a communication robot that can converse with a group of people. The robot was used to stimulate human-to-human conversation. For this purpose, we designed the robot's appearance to express the character desired for conversation and to support the expression of paralinguistic information. We designed behaviors suited to each conversational situation and conversational procedures that make the conversation attractive. We also improved speech recognition and synthesis methods for conversation.
Construction of voice quality generation mechanism by a mechanical model

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2007

-

2010

HONDA Masaaki, TAKANISHI Atsuo, KOBAYASHI Tetsunori, SAGISAKA Yoshinori, FUKUI Koutaro


The research project aimed to clarify quantitatively the speech production process and its control strategy by using a mechanical speech production model (Talking ROBOT) that mimics the human mechanism. Using laryngeal control of the model similar to human control, we succeeded in reproducing speech sounds with various voice qualities, such as breathy voice and creaky voice, as well as laughter and laughing voice. We also examined, by direct measurement, the vocal cord vibration behavior and the aero-acoustic phenomena involved in generating these voice sounds.
Leading research for the practical use of speech recognition technology

NEDO Advanced Information and Communications Equipment and Devices Program

Project Year :

2005.06

-

2006.03
Studies on conversation systems with understanding and generating functions of linguistic and para-linguistic information

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

Project Year :

2003

-

2006

KOBAYASHI Tetsunori, FUJIE Shinya, OGAWA Tetsuji


As a tool for investigating the fundamental elements of natural spoken language communication, a prototype spoken dialogue system with understanding and generating functions for linguistic and para-linguistic information was developed.
Although many excellent studies on speech recognition and synthesis have been conducted, no practical spoken dialogue system yet satisfies us. One reason is that most spoken dialogue systems have not dealt with para-linguistic information. The quantitative understanding of para-linguistic information is not sufficient to build a natural conversation system. In this study, we worked to realize a range of component technologies and a conversation robot platform as tools for revealing the quantitative roles of para-language.
In particular, the following outcomes were obtained: 1) sound localization and separation methods using the four-line directivity microphone mounted on the head of the robot, 2) a high-quality speech synthesis method based on waveform synthesis and a high-quality voice conversion method for expressing para-linguistic information, 3) a method of attitude recognition and back-channel feedback generation based on prosodic information as para-linguistic information in speech, 4) methods of head gesture recognition and facial expression recognition as para-linguistic information in visual information, 5) the humanoid robot "ROBISUKE", developed as the platform of the spoken dialogue system, and 6) Message Oriented RObot Architecture, MONEA, proposed for integrating the above modules.
Future work includes experiments to determine quantitatively the requirements for natural conversation.
High performance speech and gesture recognition based on the stochastic model with mutual state-observation-dependencies

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (C)

Project Year :

2000

-

2002

KOBAYASHI Tetsunori


To handle more complicated temporal changes in stochastic phenomena, the Partly-Hidden Markov Model (PHMM) is proposed and applied to speech and gesture recognition. It can handle observation-dependent behavior in both observations and state transitions. Simulation experiments showed the high potential of PHMM. In addition, PHMM outperformed HMM in gesture recognition and isolated spoken word recognition experiments.
In the original formulation of PHMM, a common pair of hidden state and observable state determined the stochastic behavior of both the observation and the state transition. In the formulation modified here, the hidden state is shared, but separate observable states are used for the observation and for the state transition. This slight modification made the modeling considerably more flexible and reduced word errors on continuous speech compared with HMM and the traditional PHMM.
We also proposed the Smoothed Partly-Hidden Markov Model (SPHMM), in which the observation and state transition probabilities are defined as geometric means of the PHMM-based and HMM-based probabilities. Continuous speech recognition experiments showed that SPHMM gave the best performance among HMM, PHMM, and SPHMM when the smoothing weight was set appropriately.
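As a rough illustration of the smoothing idea in the summary above, the sketch below combines a PHMM-based probability and an HMM-based probability by a weighted geometric mean, computed in the log domain. The function name, the weight parameter `w`, and the example values are assumptions made for illustration only; the summary does not give the exact formulation, and any renormalization the actual model may apply is not shown.

```python
import math

def smoothed_log_prob(log_p_phmm: float, log_p_hmm: float, w: float) -> float:
    """Weighted geometric mean of two probabilities, in the log domain.

    exp(result) equals p_phmm ** w * p_hmm ** (1 - w).
    `w` is a hypothetical smoothing weight in [0, 1]: w = 1 returns the
    PHMM-based value and w = 0 returns the HMM-based value.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * log_p_phmm + (1.0 - w) * log_p_hmm

# Illustrative values only (not taken from the reported experiments).
p_phmm, p_hmm = 0.20, 0.05
smoothed = math.exp(smoothed_log_prob(math.log(p_phmm), math.log(p_hmm), w=0.5))
```

Working in the log domain avoids numerical underflow when many such probabilities are multiplied along a long observation sequence.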
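Language understanding, language acquisition, and action generation based on interaction with the real world through perception and action

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Exploratory Research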

Project Year :

1998

-

1999

Tetsunori Kobayashi


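In this study, we took a robot with perceptual functions as the thinking agent and placed it in our living environment, creating a setting in which it can interact with the real world, and attempted to build a framework in which the robot thinks through perception and action. In particular, we focused on interpreting requests addressed to the robot and on generating appropriate action plans in response, and examined how to interpret these problems appropriately through interaction with the real world.
In the previous fiscal year, we developed elemental technologies including 1) building an autonomous robot, 2) defining the robot's API, 3) creating algorithms for perceiving the external world, and 4) creating an algorithm that extracts the effect each action has on the scene analysis results. Building on these, we developed a language acquisition algorithm for linguistic expressions that correspond one-to-one to action primitives, but problems remained in the basic performance of each elemental technology.
In the current fiscal year, we aimed to advance these elemental technologies and to achieve more advanced intelligent processing by integrating them. To advance the elemental technologies, we added eyebrows to the robot and used them for facial expression synthesis. Being able to show the robot's internal state through facial expressions enabled richer interaction with users. We also incorporated environment adaptation into the scene analysis algorithm (visual processing), which improved its robustness and stabilized system operation. For integrated processing, we examined an algorithm that determines which combination of primitives should compose a complex action that no single primitive can achieve, by solving a combinatorial problem over pairs relating each action primitive to its effect. Through this work, we realized the basic framework of an intelligent system that performs language understanding, language acquisition, and action planning based on the integrated processing of perception, action, and thought.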
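A study on speech recognition in time-varying noise environments

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)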

Project Year :

1993

　

　

Tetsunori Kobayashi


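In this study, we worked to establish basic technology for accurately recognizing speech uttered in time-varying noise.
Humans can pick out and recognize the speech they attend to even when it is covered by considerable noise or when music is playing in the background. This is because humans combine knowledge of the instantaneous spectral features of human speech and of their time series with similar knowledge about noise, and can separate the two while attending selectively to the speech. In this study, we realized this function probabilistically by representing speech and noise with two independent stochastic models and searching for the most likely combination of speech and noise under these models.
Specifically, each information source was represented independently by a stochastic model called a hidden Markov model (HMM), and the probability that the observed signal sequence arises from the combination of these sources was computed by combining spectral subtraction with optimal time alignment based on dynamic programming.
As a result, recognition rates that were 74% and 8% at -10 dB and -20 dB without any noise countermeasure improved to 100% and 40%, respectively.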
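The summary above names spectral subtraction as one ingredient of the method. The following is a minimal sketch of that subtraction step alone, applied bin by bin to power spectra; it does not show the HMM composition or the dynamic-programming alignment used in the project. The function name, the over-subtraction factor `alpha`, the spectral floor `floor`, and the example values are all assumptions for illustration.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum from a noisy one, bin by bin.

    alpha (over-subtraction factor) and floor (spectral floor, expressed as a
    fraction of the noisy power) are hypothetical tuning parameters.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        # Flooring keeps a bin from going negative when the noise estimate
        # exceeds the observed power.
        cleaned.append(max(y - alpha * n, floor * y))
    return cleaned

# Illustrative power values for four frequency bins.
noisy = [4.0, 9.0, 1.0, 16.0]
noise = [1.0, 1.0, 2.0, 4.0]
estimate = spectral_subtraction(noisy, noise)
```

In the third bin the noise estimate exceeds the observed power, so the floor value is used instead of a negative result.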
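Production-based and phenomenological modeling of fluctuations in the fundamental frequency of speech

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)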

Project Year :

1990

　

　

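Tetsunori Kobayashi
A study on applying neural networks to the phoneme decision stage of a knowledge-driven speech recognition system

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)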

Project Year :

1989

　

　

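Tetsunori Kobayashi
A study on phoneme recognition using symbolic representation of phonemic features and fuzzy inference

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)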

Project Year :

1987

　

　

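Tetsunori Kobayashi
A study on bottom-up coarticulation processing based on articulatory state estimation

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research Grant-in-Aid for Encouragement of Young Scientists (A)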

Project Year :

1986

　

　

Tetsunori Kobayashi


Misc

IPA Japanese dictation free software project

Katsunobu Itou, Kiyohiro Shikano, Tatsuya Kawahara, Kazuya Takeda, Atsushi Yamada, Akinori Ito, Takehito Utsuro, Tetsunori Kobayashi, Nobuaki Minematsu, Mikio Yamamoto, Shigeki Sagayama, Akinobu Lee

2nd International Conference on Language Resources and Evaluation, LREC 2000 2000.01


Large vocabulary continuous speech recognition (LVCSR) is an important basis for the application development of speech recognition technology. We have constructed a common Japanese LVCSR speech database and have been developing sharable Japanese LVCSR programs/models through volunteer-based efforts. We have been engaged in the following two volunteer-based activities: a) the IPSJ (Information Processing Society of Japan) LVCSR speech database working group, and b) the IPA (Information Technology Promotion Agency) Japanese dictation free software project. The IPA Japanese dictation free software project (April 1997 to March 2000) aims at building Japanese LVCSR free software/models based on the IPSJ LVCSR speech database (JNAS) and the Mainichi newspaper article text corpus. The software repository produced by the IPA project is available to the public. More than 500 CD-ROMs have been distributed. The performance evaluation was carried out for the simple version, the fast version, and the accurate version in February 2000. The evaluation used 200 sentence utterances from 46 speakers. Gender-independent HMM models and 20k/60k language models were used for the evaluation. The accurate version with 2000 HMM states and 16 Gaussian mixtures shows a 95.9 % word correct rate. The fast version with the phonetic tied mixture HMM and the 1/10 reduced language model shows a 92.2 % word correct rate and real-time speed. The CD-ROM with the IPA Japanese dictation free software and its development workbench will be distributed upon registration at http://www.lang.astem.or.jp/dictation-tk/ or by sending e-mail to dictation-tk-request@astem.or.jp.
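For readers unfamiliar with the metric, the sketch below shows how a word correct rate is conventionally computed from alignment counts, assuming the common definition in which substitutions and deletions count as errors and insertions are ignored. The summary does not state the exact definition used in the evaluation, and the function name and the counts are hypothetical.

```python
def word_correct_rate(num_ref_words: int, substitutions: int, deletions: int) -> float:
    """Percentage of reference words recognized correctly.

    Assumes the conventional definition (N - S - D) / N * 100.
    Insertions are not penalized here, unlike in word accuracy.
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    correct = num_ref_words - substitutions - deletions
    return 100.0 * correct / num_ref_words

# Hypothetical counts, not taken from the evaluation described above.
rate = word_correct_rate(num_ref_words=500, substitutions=20, deletions=5)
```

Because insertions are ignored, a word correct rate is never lower than the word accuracy computed on the same alignment.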

Industrial Property Rights

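Learning device, speech recognition device, learning method, and learning program

Japanese Patent No. 7473890

Patent
Dialogue system and program

Japanese Patent No. 7274210

Patent
Information transmission system and program

Japanese Patent No. 7244910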

Patent
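Sound pickup device, sound pickup program, and sound pickup method

Patent
Sign detection system and program

Patent
Sign detection system and program

Japanese Patent No. 7107498

Patent
Information playback program, information playback method, information processing device, and data structure

Patent
Prediction device, prediction method, and prediction program

Japanese Patent No. 6928346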

Patent
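Control state monitoring system and program

Patent
State monitoring system

Japanese Patent No. 6717461

Patent
Word prediction device and program

Japanese Patent No. 6588874

Patent
Language probability calculation method, language probability calculation device, and language probability calculation program

Japanese Patent No. 6495814

Patent
Conversation robot

Japanese Patent No. 5751610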

Patent
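Information processing system and information processing method

Japanese Patent No. 5467298

Patent
Information processing device and information processing method

Japanese Patent No. 5466593

Patent
Dialogue activation robot

Japanese Patent No. 5294315

Patent
Sound source separation device, sound source separation method, sound source separation program, and recording medium

Japanese Patent No. 5190859

Patent
Sound source separation device, method, and program

Japanese Patent No. 5170465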

Patent
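Device and method for creating acoustic models for speech recognition, and program

Japanese Patent No. 5152931

Patent
Sound source separation device, program, and method

Japanese Patent No. 5105336

Patent
Spoken dialogue device, spoken dialogue method, and robot device

Japanese Patent No. 5051882

Patent
Sound source separation device, method, and program

Japanese Patent No. 4986248

Patent
Sound source separation system, sound source separation method, and acoustic signal acquisition device

Japanese Patent No. 4873913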

Patent
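Customer information collection and management system

Japanese Patent No. 4778532

Patent
Sound source separation method and system

Japanese Patent No. 4594629

Patent
Person attribute identification method and system

Japanese Patent No. 4511850

Patent
Sound source separation method and system, and speech recognition method and system

Japanese Patent No. 4457221

Patent
Customer information collection and management method and system

Japanese Patent No. 4125634

Patent
Speech input mode conversion system

Japanese Patent No. 3906327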

Patent


Other

Special lecture: How ChatGPT Works, Understood with High School Mathematics (Waseda University Science and Engineering, Open Campus)

2023.08

-

　
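Special class: Fundamentals of Artificial Intelligence, Understood with High School Mathematics (Waseda University Senior High School)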

2021.06

-

　
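Special lecture: Fundamentals of Artificial Intelligence, Understood with High School Mathematics (Waseda University Science and Engineering, Open Campus)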

2018.08

-

　
Special class: Artificial Intelligence and Conversation Robots (Kochi Technical High School)

2018.05

-

　
Special class: The World Opened Up by Media Information Processing (Waseda University Senior High School)

2005

-

　
Special class: Communication with Robots (Yamate Gakuin High School)

2005

-

　
Special class: History of Conversation Robots (Tokyo Metropolitan Tachikawa High School)

2005

-

　
Special lecture: How ChatGPT Works, Understood with High School Mathematics (Waseda University Science and Engineering, Open Campus)

2024.08

-

　
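Special class: Fundamentals of Artificial Intelligence, Understood with High School Mathematics (Waseda University Senior High School)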

2019.10

-

　
Special lecture: Conversation and Robots: My Life's Work (Tokyo Woman's Christian University)

2017.06

-

　
Seminar : Enjoyable Conversation System (MIT, Spoken Lang. Sys. Group)

2015.11

-

　
Seminar: Research on Conversation Protocols Using Robots (Chiba Institute of Technology, Fujie Lab)

2015.05

-

　
Special lecture: Conversation Systems: Who Can Create the Opening for Providing Information? (Nara Institute of Science and Technology)

2014.11

-

　
Seminar : A robot-based approach towards finding conversation protocols (MERL)

2013.06

-

　
Seminar: Development of Conversation Robots (Nagoya University, Takeda Lab)

2011.10

-

　
Lecture : Conversation Robot (E-Just, Egypt)

2010.05

-

　
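Special lecture: A Multimodal Conversation Robot with Paralanguage Understanding and Generation Functions (Toyohashi University of Technology)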

2009.11

-

　
Seminar : Recent Research Topics in Perceptual Computing Group at Waseda University (MIT, Spoken Lang. Sys. Group)

2009.10

-

　
Seminar: Development and Future Prospects of Spoken Dialogue Robots (Tohoku Institute of Technology, Hataoka Lab)

2009.02

-

　
Seminar: Toward Robots That Converse Naturally with Humans (Tokyo Denki University, Mukawa Lab)

2007.12

-

　
Special class: Communication and Human-Shaped Robots: An Invitation to Science and Engineering (Kodaira No. 3 Junior High School)

2007.09

-

　
Seminar: Using Paralinguistic Information in Conversation with Robots - ROBISUKE: A Multimodal Conversation Robot - (Tohoku University, Makino Lab)

2004.11

-

　
Seminar: ROBISUKE: A Next-Generation Conversation Robot (Kyoto University, Okuno Lab)

2003.11

-

　
Seminar: Dialogue between Robots and People (Osaka University, Shirai Lab)

2003.10

-

　
Special lecture: Dialogue with a Humanoid Robot through a Multimodal Interface (Japan Advanced Institute of Science and Technology)

1999.10

-

　
Seminar : Multi-person Communication via Multi-modal Interface - Human Interface of the Humanoid Robot - (MIT, Spoken Lang. Sys. Group)

1999.05

-

　
◇ On the distinction between special lectures, special classes, and seminars in this section


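Special class: a one-off class given on a visit to a high school or junior high school; Special lecture: a one-off lecture given on a visit to a university; Seminar: a visit to a university or corporate laboratory to present information and hold discussions.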


Syllabus

Methodical Robotics

Graduate School of Advanced Science and Engineering

2025 fall semester
Innovation and Technology Practice alpha

Global Education Center

2025 fall quarter
Innovation and Technology Start-Up alpha

Global Education Center

2025 spring quarter
Innovation and Technology Practice alpha

Global Education Center

2025 fall quarter
Innovation and Technology Start-Up alpha

Global Education Center

2025 spring quarter
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A (Intensive Course)

School of Fundamental Science and Engineering

2025 intensive course (spring and fall)
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Engineering Laboratory A (2)

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Engineering Laboratory B [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Engineering Laboratory B

School of Fundamental Science and Engineering

2025 spring semester
Computer Science and Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Communications Engineering Laboratory

School of Fundamental Science and Engineering

2025 fall semester
Pattern Recognition and Machine Learning

School of Fundamental Science and Engineering

2025 fall semester
Project Research A

School of Fundamental Science and Engineering

2025 spring semester
Optimization Algorithm

School of Fundamental Science and Engineering

2025 spring semester
Project Research B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Information Theory A

School of Fundamental Science and Engineering

2025 spring semester
Communications Engineering Laboratory [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Information Theory A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Information Theory A

School of Fundamental Science and Engineering

2025 spring semester
Communications Engineering Laboratory

School of Fundamental Science and Engineering

2025 fall semester
Optimization Algorithm [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Optimization Algorithm

School of Fundamental Science and Engineering

2025 spring semester
Communications and Computer Engineering Laboratory B [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Communications and Computer Engineering Laboratory B

School of Fundamental Science and Engineering

2025 spring semester
Communications and Computer Engineering Laboratory A

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A (Intensive Course)

School of Fundamental Science and Engineering

2025 intensive course (spring and fall)
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Project Research A

School of Fundamental Science and Engineering

2025 spring semester
Pattern Recognition and Machine Learning

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Communications Engineering Laboratory [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Communications and Computer Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Communications Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis A (Spring) [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis B (Fall)

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Communications Engineering Laboratory A

School of Fundamental Science and Engineering

2025 fall semester
Project Research B

School of Fundamental Science and Engineering

2025 fall semester
Computer Science and Communications Engineering Laboratory B

School of Fundamental Science and Engineering

2025 spring semester
Project Research Spring

School of Fundamental Science and Engineering

2025 spring semester
Project Research Fall

School of Fundamental Science and Engineering

2025 fall semester
Introduction to Computers and Networks

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis B (Fall) [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis B (Spring) [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis A (Fall) [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Graduation Thesis A (Spring)

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis B (Spring)

School of Fundamental Science and Engineering

2025 spring semester
Graduation Thesis A (Fall)

School of Fundamental Science and Engineering

2025 fall semester
Practice on Embodiment Informatics I

Graduate School of Fundamental Science and Engineering

2025 spring semester
Practice on Embodiment Informatics H

Graduate School of Fundamental Science and Engineering

2025 fall semester
Practice on Embodiment Informatics G

Graduate School of Fundamental Science and Engineering

2025 spring semester
Practice on Embodiment Informatics F

Graduate School of Fundamental Science and Engineering

2025 fall semester
Practice on Embodiment Informatics E

Graduate School of Fundamental Science and Engineering

2025 spring semester
Practice on Embodiment Informatics D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Practice on Embodiment Informatics C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Practice on Embodiment Informatics B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Practice on Embodiment Informatics A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Embodiment Informatics

Graduate School of Fundamental Science and Engineering

2025 spring semester
Intensive Seminar on Embodiment Informatics

Graduate School of Fundamental Science and Engineering

2025 full year
Practice on Embodiment Informatics J

Graduate School of Fundamental Science and Engineering

2025 fall semester
Master's Thesis (Department of Computer Science and Communications Engineering)

Graduate School of Fundamental Science and Engineering

2025 full year
Master's Thesis (Department of Computer Science and Communications Engineering)

Graduate School of Fundamental Science and Engineering

2025 full year
Intensive Seminar on Embodiment Informatics

Graduate School of Fundamental Science and Engineering

2025 full year
Case Study on Innovation -Practice-

Graduate School of Fundamental Science and Engineering

2025 fall quarter
Case Study on Innovation -Start-Up-

Graduate School of Fundamental Science and Engineering

2025 spring quarter
Pattern Recognition

Graduate School of Fundamental Science and Engineering

2025 spring semester
Special Laboratory B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Laboratory A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 spring semester
Perceptual Computing

Graduate School of Fundamental Science and Engineering

2025 fall semester
Research on Linguistic Cognitive Science

Graduate School of Fundamental Science and Engineering

2025 full year
Research on Perceptual Computing

Graduate School of Fundamental Science and Engineering

2025 full year
Perceptual Computing

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Perceptual Computing C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Perceptual Computing B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Perceptual Computing A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Pattern Recognition

Graduate School of Fundamental Science and Engineering

2025 spring semester
Special Laboratory B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Laboratory A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 spring semester
Perceptual Computing

Graduate School of Fundamental Science and Engineering

2025 fall semester
Research on Linguistic Cognitive Science

Graduate School of Fundamental Science and Engineering

2025 full year
Research on Perceptual Computing

Graduate School of Fundamental Science and Engineering

2025 full year
Seminar on Linguistic Cognitive Science D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Linguistic Cognitive Science C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Linguistic Cognitive Science B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Linguistic Cognitive Science A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Perceptual Computing D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Perceptual Computing C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Perceptual Computing B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Perceptual Computing A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Special Seminar B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 fall semester
Special Seminar A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2025 spring semester
Practice on Embodiment Informatics G

Graduate School of Creative Science and Engineering

2025 spring semester
Practice on Embodiment Informatics F

Graduate School of Creative Science and Engineering

2025 fall semester
Practice on Embodiment Informatics E

Graduate School of Creative Science and Engineering

2025 spring semester
Practice on Embodiment Informatics D

Graduate School of Creative Science and Engineering

2025 fall semester
Practice on Embodiment Informatics C

Graduate School of Creative Science and Engineering

2025 spring semester
Practice on Embodiment Informatics B

Graduate School of Creative Science and Engineering

2025 fall semester
Practice on Embodiment Informatics A

Graduate School of Creative Science and Engineering

2025 spring semester
Research on Perceptual Computing

Graduate School of Fundamental Science and Engineering

2025 full year
Research on Linguistic Cognitive Science

Graduate School of Fundamental Science and Engineering

2025 full year
Seminar on Perceptual Computing D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Linguistic Cognitive Science D

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Linguistic Cognitive Science C

Graduate School of Fundamental Science and Engineering

2025 spring semester
Seminar on Linguistic Cognitive Science B

Graduate School of Fundamental Science and Engineering

2025 fall semester
Seminar on Linguistic Cognitive Science A

Graduate School of Fundamental Science and Engineering

2025 spring semester
Embodiment Informatics

Graduate School of Creative Science and Engineering

2025 spring semester
Analysis and Discussion of Papers on Advanced Robotics

Graduate School of Creative Science and Engineering

2025 spring semester / fall semester
Intensive Seminar on Embodiment Informatics

Graduate School of Creative Science and Engineering

2025 full year
Case Study on Innovation -Practice-

Graduate School of Creative Science and Engineering

2025 fall quarter
Case Study on Innovation -Start-Up-

Graduate School of Creative Science and Engineering

2025 spring quarter
Intensive Seminar on Embodiment Informatics

Graduate School of Creative Science and Engineering

2025 full year
Practice on Embodiment Informatics J

Graduate School of Creative Science and Engineering

2025 fall semester
Practice on Embodiment Informatics I

Graduate School of Creative Science and Engineering

2025 spring semester
Practice on Embodiment Informatics H

Graduate School of Creative Science and Engineering

2025 fall semester
Methodical Robotics

Graduate School of Creative Science and Engineering

2025 fall semester
Advanced Seminar on Methodical Robotics

Graduate School of Creative Science and Engineering

2025 fall semester
Analysis and Discussion of Papers on Advanced Robotics

Graduate School of Creative Science and Engineering

2025 spring semester / fall semester
Intensive Seminar on Embodiment Informatics

Graduate School of Advanced Science and Engineering

2025 full year
Practice on Embodiment Informatics J

Graduate School of Advanced Science and Engineering

2025 fall semester
Practice on Embodiment Informatics I

Graduate School of Advanced Science and Engineering

2025 spring semester
Practice on Embodiment Informatics H

Graduate School of Advanced Science and Engineering

2025 fall semester
Practice on Embodiment Informatics G

Graduate School of Advanced Science and Engineering

2025 spring semester
Practice on Embodiment Informatics F

Graduate School of Advanced Science and Engineering

2025 fall semester
Practice on Embodiment Informatics E

Graduate School of Advanced Science and Engineering

2025 spring semester
Intensive Seminar on Embodiment Informatics

Graduate School of Advanced Science and Engineering

2025 full year
Case Study on Innovation -Practice-

Graduate School of Advanced Science and Engineering

2025 fall quarter
Case Study on Innovation -Start-Up-

Graduate School of Advanced Science and Engineering

2025 spring quarter
Analysis and Discussion of Papers on Advanced Robotics

Graduate School of Creative Science and Engineering

2025 spring semester / fall semester
Practice on Embodiment Informatics D

Graduate School of Advanced Science and Engineering

2025 fall semester
Practice on Embodiment Informatics C

Graduate School of Advanced Science and Engineering

2025 spring semester
Practice on Embodiment Informatics B

Graduate School of Advanced Science and Engineering

2025 fall semester
Practice on Embodiment Informatics A

Graduate School of Advanced Science and Engineering

2025 spring semester
Embodiment Informatics

Graduate School of Advanced Science and Engineering

2025 spring semester

Teaching Experience

Optimization Algorithms (Waseda University)

2005.04

-

Now
Information Theory (Waseda University)

1995.09

-

Now
Pattern Recognition (Hosei University, Waseda University)

1991.04

-

Now
Signal Processing (Hosei University, Waseda University)

1991.04

-

2008.03
Programming (Hosei University, Tokyo University of Agriculture and Technology, Waseda University)

1985.04

-

2004.03
Electronic Circuits (Waseda University)

1992.04

-

1994.07
Artificial Intelligence (Hosei University, Waseda University)

1985.04

-

1992.03
Computer Engineering (Hosei University)

1986.04

-

1991.03
Electromagnetism (Hosei University)

1985.04

-

1991.03

Social Activities

Conversational Robot ROBISUKE, Exhibition & Demonstration

Expo 2005 Aichi, Japan (愛・地球博)
2005.08
Conversational Robot ROBISUKE, Exhibition & Demonstration

Lille 2004 -European Capital of Culture : Robots !
2003.12

-

2004.03
Conversational Robot ROBISUKE, Exhibition & Demonstration

ROBODEX2003
2003.03
Conversational Robot ROBITA, Exhibition & Demonstration

NTT Intercommunication Center, "Coexisting / Evolving Robots" exhibition (「共生する／進化するロボット」展)
1999.02
WABOT2, Exhibition & Demonstration

International Science and Technology Exposition, Tsukuba '85 (科学万博つくば'85)
1985.03

-

1985.09
SCHEMA: multi-party interaction-oriented humanoid robot

ACM, SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation
2009.12
Conversational Robot ROBISUKE, Exhibition & Demonstration

Ogaki City Manufacturing Festival (大垣市ものづくりフェスティバル)
2009.03
Conversational Robot ROBISUKE, Demonstration

Visiting demonstration at Care Town Kodaira (ケアタウン小平)
2008.06
Conversational Robot ROBISUKE, Exhibition & Demonstration

21st Century Dream Week: Hida-Takayama Robot World (21世紀夢ウィーク～飛騨高山ロボットワールド～)
2005.09
Conversational Robot ROBISUKE, Exhibition & Demonstration

Japan Festival in Korea
2002.10
Conversational Robot ROBITA, Exhibition & Demonstration

The Robot Expo (ザ・ロボット博)
2001.04

-

2001.05

Academic Activities

Organizing committee member, Local arrangement committee chair, Interspeech 2010

Competition, symposium, etc.

ISCA

2010.09
Organizing committee chair, International Symposium on Spoken Dialogue

Competition, symposium, etc.

Waseda University

1993.11
Exhibition committee member, International Conference on Spoken Language Processing 1990

Competition, symposium, etc.

1990.11

Sub-affiliation

Faculty of Science and Engineering, Graduate School of Fundamental Science and Engineering
Affiliated organization, Global Education Center

Research Institute

2025

-

2026

Center for Data Science Concurrent Researcher
2025

Perceptual Computing Laboratory Director of Research Institute
2024

-

2026

Waseda Research Institute for Science and Engineering Concurrent Researcher

Internal Special Research Projects

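Research on Spoken Conversation Systems

2021

We are investigating a conversation-protocol control model that can be trained with a relatively small amount of data. The part of a conversation system that handles content is separated from the part that handles protocol, the two are loosely coupled, and end-to-end learning is applied to the latter. This fiscal year we focused on precise modeling of the system's utterance timing. Conventional conversation systems act on detection of the end of the user's utterance, but stable end-of-utterance detection takes time, which prevents a well-paced conversation. We therefore examined deciding at every time step, synchronized with the update of the speech analysis frame, whether the system should speak, based on prosodic patterns and utterance content and without relying on end-of-utterance detection. The model is an LSTM-based DNN (deep neural network). As input we examined spectral envelope features, prosodic features, linguistic features (subword sequences obtained from speech recognition), and estimated dialogue acts. The results showed that this configuration allows utterance timing to be controlled precisely and contributes to smooth conversation, and that using dialogue acts is highly effective.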
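Research on Protocols and Architecture of Conversation Systems

2020

Of the four-layer protocol for spoken conversation that we propose, we worked on refining the function of the turn-taking layer. To achieve a well-paced conversation, the turn-taking layer decides, according to context, whether the system should take the turn and, if so, after how long a pause. Last fiscal year we proposed a neural network called TGN (Timing Generating Networks) that can control the output timing of events. This fiscal year we refined it by incorporating multitask learning with utterance-obligation estimation and a mechanism for using linguistic information. This extension improved the accuracy of estimating utterance timing to within 0.5 seconds by 7.5%.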
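Research on Protocols and Architecture of Conversation Systems

2019

Of the proposed four-layer protocol for spoken conversation, we studied how to implement the function of the turn-taking layer. To achieve a well-paced conversation, the turn-taking layer decides, according to context, whether the system should take the turn and, if so, after how long a pause. To solve this problem we proposed a neural network called ETCNN (Event-Timing Controllable Neural Network) that can control the output timing of events. ETCNN is an end-to-end framework in which the output timing can be controlled according to the prosody of the user's utterance, the user, and other factors. Applying this method reduced the utterance-timing estimation error by about 20% on average compared with conventional methods and markedly reduced estimation outliers.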
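Research on Protocols and Architecture of Conversation Systems

2018

Of the four-layer protocol for spoken conversation that we propose, we studied how to implement the functions of the participation-structure formation layer and the message-transmission layer. For the participation-structure formation layer, we studied a method that determines the system's actions for forming the participation structure end-to-end from sensor information. Introducing multitask neural networks with utterance recognition and gaze recognition as auxiliary tasks improved accuracy by more than 30 points over the conventional rule-based method. For the message-transmission layer, we proposed a method that derives the importance of each sentence within a paragraph from a BERT-based analysis and statically controls the pauses between sentences accordingly. In a paired-comparison preference evaluation, the system with this method was preferred over the system without it at the very high rate of 77%.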
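Research on Protocols and Architecture of Conversation Systems

2017

We organized the conversation protocol, by analogy with communication systems, into a) a physical layer, b) a participation-structure formation layer, c) a message-transmission layer, and d) a turn-taking layer. Layer a) corresponds to the physical layer of a communication system: by having a body as a human-like means of expression, the system can exchange data in the same way people do with one another. Layer b) corresponds to the data link and network layers: through bodily expression, it provides the state of participation in the conversation and the procedures for changing it. Layer c) corresponds to the transport layer: it conveys whether data exchange succeeded by means such as backchannels. Layer d) corresponds to the session layer: it defines the start and end of a session. We described these behaviors, which are needed for smooth conversation, at two levels, the function/role level and the concrete bodily-action level, and hid the hardware-dependent parts in the lower level.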
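
The layer analogy in the 2017 summary above can be written down as a small lookup table. The following is only an illustrative sketch: the names `Layer`, `LayerSpec`, `PROTOCOL`, and `describe` are hypothetical and do not come from the project itself; only the layer names and their communication-system counterparts are taken from the summary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Layer(Enum):
    """The four conversation-protocol layers, labeled a) to d) as in the summary."""
    PHYSICAL = "a"
    PARTICIPATION_STRUCTURE = "b"
    MESSAGE_TRANSMISSION = "c"
    TURN_TAKING = "d"


@dataclass(frozen=True)
class LayerSpec:
    network_analogue: str  # corresponding layer(s) in a communication system
    role: str              # what the layer does in conversation


# Hypothetical table; the wording of each role paraphrases the summary.
PROTOCOL: Dict[Layer, LayerSpec] = {
    Layer.PHYSICAL: LayerSpec(
        "physical",
        "human-like body as the means of exchanging data"),
    Layer.PARTICIPATION_STRUCTURE: LayerSpec(
        "data link / network",
        "participation state and the procedures for changing it"),
    Layer.MESSAGE_TRANSMISSION: LayerSpec(
        "transport",
        "signals success or failure of data exchange, e.g. by backchannels"),
    Layer.TURN_TAKING: LayerSpec(
        "session",
        "defines the start and end of a session"),
}


def describe(layer: Layer) -> str:
    """Return a one-line description of a layer and its analogue."""
    spec = PROTOCOL[layer]
    return f"{layer.value}) {layer.name.lower()}: {spec.network_analogue} layer; {spec.role}"


if __name__ == "__main__":
    for layer in Layer:
        print(describe(layer))
```

Keeping the table separate from any behavior code mirrors the two-level description in the summary, where roles are specified independently of the concrete bodily actions that realize them.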
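Spoken Conversation: Research on Information Access through Diverse Information Behaviors Including Information Encountering

2017 Tetsuji Ogawa, Yoshihiko Hayashi

Spoken conversation systems have conventionally been built for question answering. Comfortable information access, however, also requires the system to offer information on its own initiative, and the conversation must be able to move between these modes at a good pace. To build a spoken conversation system that responds quickly to such complex information behaviors, we proposed a "scenario-driven conversation system." Based on an analysis of the document to be conveyed, a main plan (a scenario that conveys the outline of the document) and sub-plans (answers to anticipated questions) are prepared in advance, and the conversation proceeds along them. In experiments, the resulting system conveyed only the information the user needed more efficiently than a conventional conversation system.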
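Research on Deep-Learning-Based Correction of Distortion Caused by Noise Suppression and Its Application to Speech Recognition in Noise

2016 Tetsuji Ogawa

In this study we examined a way to realize a highly refined diffuse-noise suppression filter by combining area sound pickup, a fast and accurate sound source separation technique that the applicant has long studied, with low-distortion noise suppression based on deep learning. In the proposed method, area sound pickup first separates the target sound from the noise, and a filter is then built to suppress the noise remaining in the target sound. To this end, a deep neural network estimates the SNR in each frequency band from the power spectra of the two signals obtained by area sound pickup: one dominated by the target sound and one dominated by diffuse noise. Compared with a conventional multichannel Wiener filter, the proposed method achieved high noise suppression performance while keeping processing distortion low.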
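Conversation: Realizing Information Reception through Diverse Information Behaviors with Differing Intentionality

2016 Yoshihiko Hayashi, Shinya Fujie

We built a conversation system that conveys news content through a well-paced exchange of information mixing active (search-like) and passive (encounter-like) information behaviors. Because responsiveness is a priority, we prepared conversation scenarios that include branches for the user's expected responses and advanced the conversation by switching among them. A scenario consists of a main plan, which conveys the core of the news, and sub-plans, which present supplementary information according to the user's reactions. The main plan was created by choosing key terms with topicality in mind and summarizing the news so as to include them. The sub-plans were created by preparing answers covering all question types for the important content words in each breath group. With this, we realized a conversation system that achieves the intended goal.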
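A Spoken Conversation System That Induces a State of Resonance among Participants

2014 Yoshihiko Hayashi, Tetsuji Ogawa, Yoichi Matsuyama, Shinya Fujie, Teppei Nakano

Using new conversation-control and information-provision techniques, we built a spoken conversation system that leads a conversation into a state of resonance (a state in which participants converse in responsive harmony with one another). For conversation control, we proposed a coordinating function that gives all participants equal opportunities to speak. In conversation, certain people may speak again and again while others cannot join in. We solved this problem by implementing a function that interrupts the conversation, takes the initiative, and then directs the topic to a person who has had few opportunities to speak. For information provision, we implemented a function that utters the subjective statements of reviewers found in review articles as if they were the system's own opinions. We proposed a method for selecting utterances that entertain the conversation partner and a framework for keeping the subjective stance of the selected sentences consistent.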
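Research on a Conversational Robot as an Element That Enlivens Group Conversation

2012 Yoichi Matsuyama, Kazuhiko Iwata

We studied a framework for a machine system that takes part in and enlivens "group conversation" (multi-party conversation), the kind of dynamic exchange among participants seen in small meetings and casual chat. Specifically, we addressed four subthemes: 1) a protocol for encouraging people who cannot join the conversation to participate, 2) automatic generation of utterances that attract interest, 3) improved facial image processing for conversation systems, and 4) improved speech synthesis for conversation systems. The results for each are as follows. Subtheme 1 examined what mechanisms are needed to find a participant who has few opportunities either to speak or to be addressed, speak to that person, and bring them into the conversation. The system needs functions for finding the target person, deciding on appropriate content to say, and choosing a timing that does not disturb the overall harmony. We achieved the desired functions by implementing the following mechanisms: (i) estimate each participant's participation role from their speaking state, gaze, and other cues, and select as the target the person who is most often neither the speaker nor the main addressee; (ii) determine what to say by tracking the topic appropriately using a CRF; and (iii) first join the conversation on the current topic to become a "harmonious participant," then address the target person. Subtheme 2 examined how to prepare answers to participants' questions automatically. To make utterances interesting, answers were designed to include the system's own subjective impressions and evaluative content. To this end, we devised a method in which the system crawls review sites, extracts passages that express evaluations of related topics, rewrites them in a colloquial style, ranks them by suitability, and uses the top-ranked sentences as answers. The ranking criterion favored sentences that make heavy use of low-frequency adjectives. This yielded a mechanism for selecting informative, surprising sentences and succeeded in generating effective answers. Subtheme 3 examined techniques for stable face detection, which conversation systems require. By improving the AAM, we dramatically improved the detection accuracy for faces and facial parts. Subtheme 4 examined conversational-style speech synthesis. We realized a synthesizer that can speak with voice quality and intonation appropriate to the context by carefully clustering contexts in a psychological space, clustering sentence-final expressions, and so on.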
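Comprehensive Research on Spoken Conversation Systems

2011 Shinya Fujie, Tetsuji Ogawa, Yoichi Matsuyama, Kazuhiko Iwata

Toward conversational communication with robots, we conducted research on the following themes. (1) Clarifying the spoken conversation protocol: We modeled the spoken conversation protocol based on observation of conversations. In particular, for multi-party conversation, we organized how functions such as selecting the conversation partner and controlling speaking turns are carried out with accompanying bodily expressions. (2) Realizing attractive conversation: We organized what a robot's utterances should be like for a conversation to be attractive. With particular attention to making it easy for the partner to speak, we first prepared a mechanism that does not merely answer what was asked but brings in a related new topic while answering. This made it easier for users to keep the conversation going. (3) Development of component technologies. 3-1) Visual information processing: A participant's posture is important for identifying, among other things, that participant's intention to take part in the conversation. It is also already well known that gaze indicates the direct communication partner and that facial expression conveys the success or failure of information transfer and whether the partner is interested. We developed a system that automatically recognizes this "posture and facial expression." For extraction of the image feature points needed for posture recognition and expression estimation, we realized a system that gathers information not only from a camera mounted on the robot but also from a camera installed on the ceiling of the room and uses both in an integrated way. 3-2) Auditory information processing: We solved various problems that arise when multi-party spoken conversation is conducted hands-free. Directional noise arriving mainly from behind the target speaker and reverberation were handled with the proposed six-microphone band-shaped beamformer [4]. In addition, in conversation people may say many sentences in one breath or say a single sentence in fragments, and this mismatch between utterance units and units of meaning makes conversational speech recognition difficult. Here we interpreted differences in the way of speaking (the placement of pauses) as the result of embedding a kind of protocol-related information in the utterance, and examined a method that actively exploits the characteristic prosodic phenomena this causes during decoding. (4) Integrated system: Integrating (1)-(3) above, we realized a system with which several people can enjoy conversation while playing a game. We conducted conversation experiments with elderly people at a day-care facility, and the system was well received.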
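Research on Quantitative Modeling of Paralinguistic Understanding and Generation Mechanisms Using a Conversational Robot

2007 Shinya Fujie

Using the spoken conversation robot we had developed, which can understand and express paralinguistic information, we attempted to clarify quantitatively the role that paralinguistic information plays in establishing natural spoken communication. In conversational communication, while conveying linguistic information by voice, we humans also convey through facial expression our state of participation in the conversation (whether we are ready to receive information, whether we have received it correctly, how we evaluate what we have received, and so on), and this underlies smooth information exchange. This information is part of what is called paralanguage. Although there are studies that point out its importance qualitatively, no attempt has been made to examine quantitatively how strictly it must be modeled for natural communication to be established. In this study we therefore selected gaze expression as the robot's expressive behavior related to smooth turn-taking and attempted to model it quantitatively. It is generally held that a speaker directs their gaze at the listener at the end of an utterance to yield the turn and averts it to hold the turn. We built a model whose parameters are the variations of the expression (how the gaze is averted and how it is directed), their frequency, and their temporal structure. We then conducted conversation experiments between subjects and a conversational robot that generates paralinguistic information under various parameter settings, and evaluated naturalness. As a result, we obtained relational expressions among the parameters that realize natural gaze expression, together with findings on combinations of parameter sets for continuous operation, and confirmed that conversation proceeds naturally when the gaze is driven accordingly.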
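A Conversational Robot for Multiple Speakers with Situation Awareness and Bodily Expression: Toward an Information Terminal That Shares Space with Humans

2000 Hideaki Kikuchi, Atsuo Takanishi

Toward a robot that converses with multiple speakers, we studied four points: 1) separation and recognition of the speech of multiple speakers, 2) recognition of the communication channels among multiple speakers, 3) expression of intention through the body, and 4) information integration technology. For 1), in addition to the acoustic model and language model ordinarily used in speech recognition, we added an inter-utterance language model for turn-taking and a speaker model that statistically represents speaker changes, and established an algorithm that uses these four probabilistic models to estimate the most likely sequence of speaker changes and utterance content. For 2), combining sound source localization with the speaker's face direction made it possible to recognize who is speaking to whom. For sound source localization, we proposed a method using the correlation of MUSIC spectra, which dramatically improved localization accuracy. For face direction, we proposed an ICA-based feature extraction method and achieved highly accurate face direction recognition. For 3), we extended the expressive capability of the robot hardware by adding eyebrows, a mouth, hands, and other parts to the existing eyes, hands, and so on. We also established motions for effectively expressing internal states with this simplified body, along with a strategy for presenting them. For 4), we devised an information transfer mechanism that adds subscribe/publish functions to a blackboard system and implemented it transparently across the wide variety of processors that make up the robot. Using these results, we realized a robot that grasps the external situation visually and aurally and can converse with multiple partners while at times expressing its intentions through nonverbal means such as gestures.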
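
Point 4) of the 2000 summary above describes a blackboard system extended with subscribe/publish functions. The sketch below shows that general idea in a single process; it is not the project's implementation. The class `Blackboard`, its methods, and the keys `"speaker"` and `"face_direction"` are all hypothetical, and the distribution across processors mentioned in the summary is not modeled.

```python
from typing import Any, Callable, Dict, List

# A subscriber receives the key that changed and its new value.
Callback = Callable[[str, Any], None]


class Blackboard:
    """Shared key-value store that notifies subscribers when a key is written."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, key: str, callback: Callback) -> None:
        """Register a callback to run whenever `key` is published."""
        self._subscribers.setdefault(key, []).append(callback)

    def publish(self, key: str, value: Any) -> None:
        """Store the value, then push it to every subscriber of `key`."""
        self._data[key] = value
        for callback in list(self._subscribers.get(key, [])):
            callback(key, value)

    def read(self, key: str, default: Any = None) -> Any:
        """Classic blackboard access: poll the latest value of `key`."""
        return self._data.get(key, default)


def demo() -> List[str]:
    """Wire a hypothetical gaze module to speaker-identification results."""
    board = Blackboard()
    log: List[str] = []
    board.subscribe("speaker", lambda key, value: log.append(f"gaze: look at {value}"))
    board.publish("speaker", "person_A")      # triggers the gaze callback
    board.publish("face_direction", "left")   # no subscriber; only stored
    return log


if __name__ == "__main__":
    print(demo())
```

With a plain blackboard every module has to poll for changes; adding subscribe/publish lets a module react as soon as a relevant result is written, which is presumably why the two were combined.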
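Research on Precise Modeling of Stochastic Processes and Its Application to Speech Recognition and Gesture Recognition

1998 Shuji Hashimoto, Hironori Kasahara

In this study we refined stochastic process models for time-series pattern matching and used them to try to improve the performance of speech recognition and gesture recognition. In time-series pattern recognition, of which speech and gesture recognition are typical examples, the stochastic process model plays an important role. The hidden Markov model (HMM) has conventionally been used as this model. However, an HMM can handle only piecewise-stationary stochastic processes, which has caused various problems. To solve this, starting from a second-order Markov model, we proposed a new stochastic model, the partly hidden Markov model (PHMM), in which the temporally older state is an unobservable hidden state and the newer state is an observable state. Whereas in an HMM both the output and the next state are determined solely by the previous state, in a PHMM both the output and the state are determined by the state and the previous output. With this structure we obtained a stochastic process model that represents transient portions better than an HMM while limiting the increase in model complexity. For PHMM parameter estimation, we gave a formulation using the EM algorithm and established a rigorous estimation method. Comparing the characteristics of PHMM and HMM in simulation experiments, we found that in an HMM the output probabilities mainly determine the timing of state transitions and the state transition probabilities are almost meaningless, whereas in a PHMM the transition probabilities determine that timing. PHMM also proved more effective than HMM at distinguishing differences in the dynamics of transitional portions. In gesture recognition and speech recognition experiments using PHMM, both tasks achieved higher performance than with HMM, confirming the effectiveness of PHMM for time-series pattern recognition.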
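
The difference in dependency structure described in the 1998 summary above can be written as two factorizations of the joint probability. This is one reading of the summary, not the formulation from the original work. The notation is an assumption: $s_t$ is the hidden state and $o_t$ the output at time $t$, $s_0$ and $o_0$ are fixed start symbols, and the choice to condition the output on $s_t$ rather than $s_{t-1}$ is a convention picked here.

```latex
\begin{align}
\text{HMM:}\quad
  & P(o_{1:T}, s_{1:T}) = \prod_{t=1}^{T} P(s_t \mid s_{t-1})\, P(o_t \mid s_t) \\
\text{PHMM:}\quad
  & P(o_{1:T}, s_{1:T}) = \prod_{t=1}^{T} P(s_t \mid s_{t-1}, o_{t-1})\, P(o_t \mid s_t, o_{t-1})
\end{align}
```

Under this reading, the extra conditioning on $o_{t-1}$ is what lets the transition probabilities respond to the observed signal, consistent with the summary's finding that in a PHMM the transition probabilities set the timing of state changes.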
