Researcher Details - Tetsunori Kobayashi

Tetsunori Kobayashi (小林 哲則)

Scopus Publication Information

Publications: 238 Citations: 2275 h-index: 24

The data was downloaded from the Scopus API on August 11, 2025, via http://api.elsevier.com and http://www.scopus.com .

Google Scholar Information (Citations per year)

Citations: 5736 h-index: 35 i10-index: 115

News & Topics

2021.12.23

Developed an English conversation proficiency assessment system

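Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job title

Professor

Degree

Doctor of Engineering

Homepage

http://www.pcl.cs.waseda.ac.jp/index.html

Profile

Research interests include computer-human interaction using speech and image processing, intelligent robots, speech production and perception, and development paradigms for interfaces.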

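Career

April 1997 - Present: Professor, Waseda University (Department of Communications and Computer Engineering, School of Fundamental Science and Engineering, Faculty of Science and Engineering)
November 2018 - September 2020: Director, Center for Research Strategy, Waseda University
April 2004 - March 2009: Visiting Researcher, NHK Science & Technology Research Laboratories
April 2000 - March 2002: Visiting Researcher, ATR Spoken Language Translation Research Laboratories
July 1994 - August 1995: Visiting Researcher, MIT Laboratory for Computer Science
April 2020 - September 2020: Head, Research Council (研究院), Waseda University
November 2014 - September 2016: Deputy Director, Waseda Institute for Advanced Study
September 2010 - September 2016: Deputy Director, Waseda Research Institute for Science and Engineering
April 2007 - March 2014: Professor, Department of Computer Science and Engineering, School of Fundamental Science and Engineering, Faculty of Science and Engineering, Waseda University (due to department renaming)
April 2004 - March 2007: Professor, Department of Computer and Network Engineering, School of Fundamental Science and Engineering, Faculty of Science and Engineering, Waseda University (due to faculty reorganization)
April 2003 - March 2004: Professor, Department of Computer and Network Engineering, School of Science and Engineering, Waseda University (due to department reorganization)
April 1997 - March 2003: Professor, Department of Electrical, Electronics and Computer Engineering, School of Science and Engineering, Waseda University
April 1996 - March 1997: Associate Professor, Department of Electrical, Electronics and Computer Engineering, School of Science and Engineering, Waseda University (due to department renaming)
April 1991 - March 1996: Associate Professor, Department of Electrical Engineering, School of Science and Engineering, Waseda University
April 1987 - March 1991: Associate Professor, Department of Electrical Engineering, Faculty of Engineering, Hosei University
July 1990 - September 1990: Invited Researcher, ATR Auditory and Visual Perception Research Laboratories
April 1985 - March 1987: Full-time Lecturer, Department of Electrical Engineering, Faculty of Engineering, Hosei University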

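Education

March 1985: Completed the doctoral program, Graduate School of Science and Engineering, Waseda University
March 1982: Completed the master's program, Graduate School of Science and Engineering, Waseda University
March 1980: Graduated from the Department of Electrical Engineering, School of Science and Engineering, Waseda University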

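Committee Memberships

May 2004 - May 2006: Special Editorial Secretary, Journal Editorial Committee, Institute of Electronics, Information and Communication Engineers (IEICE)
April 2001 - March 2004: Chair, Special Interest Group on Spoken Language Processing, Information Processing Society of Japan (IPSJ)
April 1998 - March 2002: Director, Association for Natural Language Processing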

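Professional Memberships

1994 - Present: Association for Natural Language Processing
1987 - Present: Japanese Society for Artificial Intelligence (JSAI)
1983 - Present: Information Processing Society of Japan (IPSJ)
1980 - Present: Acoustical Society of Japan
1980 - Present: Institute of Electronics, Information and Communication Engineers (IEICE)
2003 - Present: Language Resources Association (言語資源協会)
October 1996 - Present: ACM
January 1989 - Present: Robotics Society of Japan
1984 - Present: IEEE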

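Research Areas

Intelligent robotics / Perceptual information processing

Research Keywords

Pattern recognition
Image information processing
Spoken language processing
Conversational robots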

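Awards

Fellow, IEICE, March 2023. For contributions to research on multimodal multiparty conversation using robots.
Fellow, IPSJ, June 2016. For pioneering contributions to robot conversation research and contributions to revitalizing the research community.
American Publishers Awards for Professional and Scholarly Excellence, 2008. Springer handbook of robotics.
Best Paper Award, IEICE, 2001. "Construction of a dialogue robot that participates in group conversation". Recipients: Y. Matsusaka (松坂要佐), T. Tojo (東條剛), Tetsunori Kobayashi (小林哲則).
University Venture Award, Japan Science and Technology Agency President's Award, Japan Science and Technology Agency, August 2024.
SIG Research Excellence Award (研究会優秀賞), JSAI, 2019. "Proposal of a web novel recommendation system capable of recommending hidden gems".
Best Poster Award, ACM SIGGRAPH VRCAI2016, December 2016. "Video Semantic Indexing using Object Detector". Recipients: Kazuya Ueki, Tetsunori Kobayashi.
SIG Research Excellence Award (研究会優秀賞), JSAI, 2015. "Passivity and activeness in information access: accessing news articles through spoken dialogue".
SIG Research Excellence Award (研究会優秀賞), JSAI, 2012. "Spontaneous action timing detection and utterance behavior strategy for activating multiparty conversation".
Outstanding Research Award, HAI-2012, 2012.
SIG Research Excellence Award (研究会優秀賞), JSAI, 2011. "Utterance timing control based on the degree of utterance expectation and motivation".
Best Paper, IEEE BTAS2008 (International Conference on Biometrics: Theory, Applications and Systems), 2008. "Class distance weighted locality preserving projection for automatic age estimation". Recipients: Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi.
SIG Research Excellence Award (研究会優秀賞), JSAI, 2008. "Development of a robot that supports the activation of human-human communication".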

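Media Coverage

"Gaze and gestures that can 'read the room'". Newspaper/magazine. Yomiuri Shimbun. April 2012.
"Low-latency sound source separation module jointly developed by OKI and Waseda University". Newspaper/magazine. Author: other than the researcher. Nikkei BP, Nikkei Electronics. May 2009.
"Conversation robot". TV/radio program. NHK, Science ZERO. January 2006.
"Conversation robot". TV/radio program. Discovery Channel Canada. March 2005.
"Age and gender estimation system". TV/radio program. TV Tokyo, World Business Satellite. March 2004.
"Conversation robot". TV/radio program. NHK, Close-up Gendai. January 2000.
"Conversation robot ROBITA". TV/radio program. NHK, Science Eye. December 1999.
"Conversation robot ROBITA". TV/radio program. TBS, 筑紫哲也・立花隆の『人のたび・ヒトへの旅』. May 1999.
"Picking up only the voice even amid noise: OKI technology for smartphones removes noise and recognizes speech". Newspaper/magazine. Nihon Keizai Shimbun. November 2012.
"Sound source separation technology". TV/radio program. TV Tokyo, World Business Satellite, Trend Tamago. November 2008.
"Sound source separation technology". Newspaper/magazine. Nikkei Sangyo Shimbun. November 2008.
"Conversation robot". TV/radio program. VARA (Dutch television broadcaster). January 2006.
"NEC Soft automatically analyzes store visitors; developed with Waseda University; estimates age and gender". Newspaper/magazine. Nikkei MJ. May 2004.
"Konami and Waseda University jointly develop CG". Newspaper/magazine. Nihon Keizai Shimbun. February 2004.
"Dialogue that reads human emotions". Newspaper/magazine. Nihon Keizai Shimbun. January 2004.
"Waseda University develops a 'conversational robot' that understands and answers even ambiguous words". Newspaper/magazine. Nikkei Sangyo Shimbun. January 2004.
"Estimating customers' age and gender: Waseda University and NEC Soft". Newspaper/magazine. Nikkei Sangyo Shimbun. October 2003.
"Conversation robot". TV/radio program. TV Tokyo, 賢者のマネー. June 2003.
"Start with toys first: strengthening recognition functions to diversify dialogue with users". Newspaper/magazine. Nikkei BP, Nikkei Electronics, No.747, pp.133-134. July 1999.
"Another World Cup". TV/radio program. TV Aichi. June 1999.
"Special feature: Kansei robots arrive". Newspaper/magazine. Nikkan Kogyo Shimbun, Trigger, Vol.18, No.5, pp.24-26. May 1999.
"Conversation robot". TV/radio program. TV Asahi, 週間地球テレビ『ロボット特集』. January 1998.
"Gesture recognition system". TV/radio program. Tokyo Metropolitan Television, ロボットが鉄腕アトムになる日. September 1997.
"Japan's world-leading robot technology". Newspaper/magazine. Nihon Kogyo Shimbun. April 1997.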

Papers

Response Timing Estimation for Spoken Dialog Systems Based on Syntactic Completeness Prediction

Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

2022 IEEE Spoken Language Technology Workshop (SLT), January 2023 [Refereed] [International journal]

Role: last author

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ACM EMNLP 2022, 5486 - 5503, 2022 [Refereed] [International co-authorship]

Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 25 ( 3 ) 637 - 650, March 2017 [Refereed]

Role: last author

We propose a blind source separation method that yields high-quality speech with low distortion. Time-frequency (TF) masking can effectively reduce interference, but it produces nonlinear distortion. By contrast, linear filtering using a separation matrix such as independent vector analysis (IVA) can avoid nonlinear distortion, but the separation performance is reduced under reverberant conditions. The tandem connectionist approach combines several separation methods and it has been used frequently to compensate for the disadvantages of these methods. In this study, we propose associative memory model (AMM)-based linear filtering and a tandem connectionist framework, which applies TF masking followed by linear filtering. By using AMM trained with speech spectra to optimize the separation matrix, the proposed linear filtering method considers the properties of speech that are not considered explicitly in IVA, such as the harmonic components of spectra. TF masking is applied in the proposed tandem connectionist framework to reduce unwanted components that hinder the optimization of the separation matrix, and it is approximated by using a linear separation matrix to reduce nonlinear distortion. The results obtained in simultaneous speech separation experiments demonstrate that although the proposed linear filtering method can increase the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR) compared with IVA, the proposed tandem connectionist framework can obtain greater increases in SDR and SIR, and it reduces the phoneme error rate more than the proposed linear filtering method.

Cited by 5 (Scopus)

Four-participant group conversation: A facilitation robot controlling engagement density as the fourth participant

Yoichi Matsuyama, Iwao Akiba, Shinya Fujie and Tetsunori Kobayashi

Computer Speech and Language 33 ( 1 ) 1 - 24, September 2015 [Refereed]

Role: last author

Conversational Robots: An Approach to conversation protocol issues that utilizes the paralinguistic information available in a robot-human setting.

Tetsunori Kobayashi, Shinya Fujie

Acoustical Science and Technology 34 ( 2 ) 64 - 72, March 2013 [Refereed] [Invited]

Role: first author

Conversation robot participating in group conversation

Y Matsusaka, T Tojo, T Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E86D ( 1 ) 26 - 36, January 2003 [Refereed] [Invited]

Role: last author

We developed a conversation system which can participate in a group conversation. Group conversation is a form of conversation in which three or more participants talk to each other about a topic on an equal footing. Conventional conversation systems have been designed under the assumption that each system merely talked with only one person. Group conversation is different from these conventional systems in the following points. It is necessary for the system to understand the conversational situation such as who is speaking, to whom he is speaking, and also to whom the other participants pay attention. It is also necessary for the system itself to try to affect the situation appropriately. In this study, we realized the function of recognizing the conversational situation, by combining image processing and acoustic processing, and the function of working on the conversational situation utilizing facial and body actions of the robot. Thus, a robot that can join in the group conversation was realized.

Hierarchical Multi-Task Learning with CTC and Recursive Operation

Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Interspeech 2024, 2855 - 2859, September 2024

Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture

Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2023

BECTRA: Transducer-Based End-To-End ASR with BERT-Enhanced Encoder

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2023

InterMPL: Momentum Pseudo-Labeling With Intermediate CTC Loss

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2023

Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization

Yusuke Fujita, Tetsuji Ogawa, Tetsunori Kobayashi

IEEE Access 11, 140069 - 140076, 2023

PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

Ryo Yanagisawa, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

2022 IEEE International Conference on Big Data (Big Data), December 2022 [Refereed]

Phrase-Level Localization of Inconsistency Errors in Summarization by Weak Supervision

Masato Takatsuka, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 29th International Conference on Computational Linguistics, 6151 - 6164, October 2022 [Refereed]

Response Timing Estimation for Spoken Dialog System using Dialog Act Estimation

Jin Sakuma, Shinya Fujie, Tetsunori Kobayashi

Interspeech 2022, September 2022 [Refereed] [International journal]

Role: last author

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022 [Refereed]

Role: last author

Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models

Naohiro TAWARA, Atsunori OGAWA, Tomoharu IWATA, Hiroto ASHIKAWA, Tetsunori KOBAYASHI, Tetsuji OGAWA

IEICE Transactions on Information and Systems E105.D ( 1 ) 150 - 160, January 2022 [Refereed]

Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Interspeech 2021, August 2021 [Refereed] [International journal]

Role: last author

Timing generating networks: Neural network based precise turn-taking timing prediction in multiparty conversation

Shinya Fujie, Hayato Katayama, Jin Sakuma, Tetsunori Kobayashi

Interspeech 2021, 3771 - 3775, August 2021 [Refereed]

Role: last author

Improved Mask-CTC for Non-Autoregressive End-to-End ASR

Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2021 [Refereed]

Role: last author

Noise-robust attention learning for end-to-end speech recognition

Yosuke Higuchi, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

European Signal Processing Conference 2021, 311 - 315, January 2021

We propose a method for improving the noise robustness of an end-to-end automatic speech recognition (ASR) model using attention weights. Several studies have adopted a combination of recurrent neural networks and attention mechanisms to achieve direct speech-to-text translation. In the real-world environment, however, noisy conditions make it difficult for the attention mechanisms to estimate the accurate alignment between the input speech frames and output characters, leading to the degradation of the recognition performance of the end-to-end model. In this work, we propose noise-robust attention learning (NRAL) which explicitly tells the attention mechanism where to “listen at” in a sequence of noisy speech features. Specifically, we train the attention weights estimated from a noisy speech to approximate the weights estimated from a clean speech. The experimental results based on the CHiME-4 task indicate that the proposed NRAL approach effectively improves the noise robustness of the end-to-end ASR model.

Cited by 5 (Scopus)

Investigation of network architecture for single-channel end-to-end denoising

Takuya Hasumi, Tetsunori Kobayashi, Tetsuji Ogawa

European Signal Processing Conference 2021, 441 - 445, January 2021

This paper examines the effectiveness of a fully convolutional time-domain audio separation network (Conv-TasNet) on single-channel denoising. Conv-TasNet, which has a structure to explicitly estimate a mask for encoded features, has shown to be effective in single-channel sound source separation in noise-free environments, but it has not been applied to denoising. Therefore, the present study investigates a method of learning Conv-TasNet for denoising and clarifies the optimal structure for single-channel end-to-end modeling. Experimental comparisons conducted using the CHiME-3 dataset demonstrate that Conv-TasNet performs well in denoising and yields improvements in single-channel end-to-end denoising over existing denoising autoencoder-based modeling.

Cited by 3 (Scopus)

Personalized Extractive Summarization for a News Dialogue System

Hiroaki Takatsu, Mayu Okuda, Yoichi Matsuyama, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi

2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings, 1044 - 1051, January 2021

In modern society, people's interests and preferences are diversifying. Along with this, the demand for personalized summarization technology is increasing. In this study, we propose a method for generating summaries tailored to each user's interests using profile features obtained from questionnaires administered to users of our spoken-dialogue news delivery system. We propose a method that collects and uses the obtained user profile features to generate a summary tailored to each user's interests, specifically, the sentence features obtained by BERT and user profile features obtained from the questionnaire result. In addition, we propose a method for extracting sentences by solving an integer linear programming problem that considers redundancy and context coherence, using the degree of interest in sentences estimated by the model. The results of our experiments confirmed that summaries generated based on the degree of interest in sentences estimated using user profile information can transmit information more efficiently than summaries based solely on the importance of sentences.

Cited by 4 (Scopus)

Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue

Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi

2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings, 629 - 635, January 2021

This paper analyzes the effectiveness of different modalities in automated speaking proficiency scoring in an online dialogue task of non-native speakers. Conversational competence of a language learner can be assessed through the use of multimodal behaviors such as speech content, prosody, and visual cues. Although lexical and acoustic features have been widely studied, there has been no study on the usage of visual features, such as facial expressions and eye gaze. To build an automated speaking proficiency scoring system using multi-modal features, we first constructed an online video interview dataset of 210 Japanese English-learners with annotations of their speaking proficiency. We then examined two approaches for incorporating visual features and compared the effectiveness of each modality. Results show the end-to-end approach with deep neural networks achieves a higher correlation with human scoring than one with handcrafted features. Modalities are effective in the order of lexical, acoustic, and visual features.

Cited by 12 (Scopus)

Deep Speech Extraction with Time-Varying Spatial Filtering Guided by Desired Direction Attractor

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2020, 671 - 675, May 2020

In this investigation, a deep neural network (DNN) based speech extraction method is proposed to enhance a speech signal propagating from the desired direction. The proposed method integrates knowledge based on a sound propagation model and the time-varying characteristics of a speech source, into a DNN-based separation framework. This approach outputs a separated speech source using time-varying spatial filtering, which achieves superior speech extraction performance compared with time-invariant spatial filtering. Given that the gradient of all modules can be calculated, back-propagation can be performed to maximize the speech quality of the output signal in an end-to-end manner. Guided information is also modeled based on the sound propagation model, which facilitates disentangled representations of the target speech source and noise signals. The experimental results demonstrate that the proposed method can extract the target speech source more accurately than conventional DNN-based speech source separation and conventional speech extraction using time-invariant spatial filtering.

Cited by 7 (Scopus)

Exploring and exploiting the hierarchical structure of a scene for scene graph generation

Ikuto Kurosawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings - International Conference on Pattern Recognition, 1422 - 1429, 2020

The scene graph of an image is an explicit, concise representation of the image; hence, it can be used in various applications such as visual question answering or robot vision. We propose a novel neural network model for generating scene graphs that maintain global consistency, which prevents the generation of unrealistic scene graphs; the performance in the scene graph generation task is expected to improve. Our proposed model is used to construct a hierarchical structure whose leaf nodes correspond to objects depicted in the image, and a message is passed along the estimated structure on the fly. To this end, we aggregate features of all objects into the root node of the hierarchical structure, and the global context is back-propagated to the root node to maintain all the object nodes. The experimental results on the Visual Genome dataset indicate that the proposed model outperformed the existing models in scene graph generation tasks. We further qualitatively confirmed that the hierarchical structures captured by the proposed model seemed to be valid.

Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 3655 - 3659, 2020

We present Mask CTC, a novel non-autoregressive end-to-end automatic speech recognition (ASR) framework, which generates a sequence by refining outputs of the connectionist temporal classification (CTC). Neural sequence-to-sequence models are usually autoregressive: each output token is generated by conditioning on previously generated tokens, at the cost of requiring as many iterations as the output length. On the other hand, non-autoregressive models can simultaneously generate tokens within a constant number of iterations, which results in significant inference time reduction and better suits end-to-end ASR model for real-world scenarios. In this work, Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC. During inference, the target sequence is initialized with the greedy CTC outputs and low-confidence tokens are masked based on the CTC probabilities. Based on the conditional dependence between output tokens, these masked low-confidence tokens are then predicted conditioning on the high-confidence tokens. Experimental results on different speech recognition tasks show that Mask CTC outperforms the standard CTC model (e.g., 17.9% → 12.1% WER on WSJ) and approaches the autoregressive model, requiring much less inference time using CPUs (0.07 RTF in Python implementation). All of our codes are publicly available at https://github.com/espnet/espnet.

Cited by 98 (Scopus)

Mentoring-reverse mentoring for unsupervised multi-channel speech source separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 86 - 90, 2020

Mentoring-reverse mentoring, which is a novel knowledge transfer framework for unsupervised learning, is introduced in multi-channel speech source separation. This framework aims to improve two different systems, which are referred to as a senior and a junior system, by mentoring each other. The senior system, which is composed of a neural separator and a statistical blind source separation (BSS) model, generates a pseudo-target signal. The junior system, which is composed of a neural separator and a post-filter, was constructed using teacher-student learning with the pseudo-target signal generated from the senior system, i.e., imitating the output from the senior system (mentoring step). Then, the senior system can be improved by propagating the shared neural separator of the grown-up junior system to the senior system (reverse mentoring step). Since the improved neural separator can give better initial parameters for the statistical BSS model, the senior system can yield more accurate pseudo-target signals, leading to iterative improvement of the pseudo-target signal generator and the neural separator. Experimental comparisons conducted under the condition where mixture-clean parallel data are not available demonstrated that the proposed mentoring-reverse mentoring framework yielded improvements in speech source separation over the existing unsupervised source separation methods.

Cited by 11 (Scopus)

Efficient Human-In-The-Loop Object Detection using Bi-Directional Deep SORT and Annotation-Free Segment Identification

Koki Madono, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 1226 - 1233, 2020

Waseda meisei at TRECVID 2018: Ad-hoc video search

Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

2018 TREC Video Retrieval Evaluation, TRECVID 2018, 2020

Copyright © TRECVID 2018. All rights reserved. The Waseda Meisei team participated in the TRECVID 2018 Ad-hoc Video Search (AVS) task [1]. For this year's AVS task, we submitted both manually assisted and fully automatic runs. Our approach focuses on the concept-based video retrieval, based on the same approach as last year. Specifically, it improves on the word-based keyword extraction method presented in last year's system, which could neither handle keywords related to motion nor appropriately capture the meaning of phrases or whole sentences in queries. To deal with these problems, we introduce two new measures: (i) calculating the similarity between the definition of a word and an entire query sentence, (ii) handling of multi-word phrases. Our best manually assisted run achieved a mean average precision (mAP) of 10.6%, which was ranked the highest among all submitted manually assisted runs. Our best fully automatic run achieved an mAP of 6.0%, which ranked sixth among all participants.

Waseda_Meisei at TRECVID 2018: Fully-automatic ad-hoc video search

Yu Nakagome, Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

2018 TREC Video Retrieval Evaluation, TRECVID 2018, 2020

Word attribute prediction enhanced by lexical entailment tasks

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings, 5846 - 5854, 2020 [Refereed]

© European Language Resources Association (ELRA), licensed under CC-BY-NC. Human semantic knowledge about concepts acquired through perceptual inputs and daily experiences can be expressed as a bundle of attributes. Unlike the conventional distributed word representations that are purely induced from a text corpus, a semantic attribute is associated with a designated dimension in attribute-based vector representations. Thus, semantic attribute vectors can effectively capture the commonalities and differences among concepts. However, as semantic attributes have been generally created by psychological experimental settings involving human annotators, an automatic method to create or extend such resources is highly demanded in terms of language resource development and maintenance. This study proposes a two-stage neural network architecture, Word2Attr, in which initially acquired attribute representations are then fine-tuned by employing supervised lexical entailment tasks. The quantitative empirical results demonstrated that the fine-tuning was indeed effective in improving the performances of semantic/visual similarity/relatedness evaluation tasks. Although the qualitative analysis confirmed that the proposed method could often discover valid but not-yet human-annotated attributes, it also exposed future issues to be addressed: we should refine the inventory of semantic attributes that currently relies on an existing dataset.

Exploiting Narrative Context and A Priori Knowledge of Categories in Textual Emotion Classification

Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 28th International Conference on Computational Linguistics (COLING), 5535 - 5540, 2020 [Refereed]

MicroLapse: Measuring workers' leniency to prediction errors of microtasks' working times

Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Jeffrey P. Bigham

Proceedings of the ACM Conference on Computer Supported Cooperative Work, CSCW, 352 - 356, November 2019

Working time estimation is known to be helpful for allowing crowd workers to select lucrative microtasks. We previously proposed a machine learning method for estimating the working times of microtasks, but a practical evaluation was not possible because it was unclear what errors would be problematic for workers across different scales of microtask working times. In this study, we formulate MicroLapse, a function that expresses a maximal error in working time prediction that workers can accept for a given working time length. We collected 60,760 survey answers from 660 Amazon Mechanical Turk workers to formulate MicroLapse. Our evaluation of our previous method based on MicroLapse demonstrated that our working time prediction method was fairly successful for shorter microtasks, which could not have been concluded in our previous paper.

Regularized adversarial training for single-shot virtual try-on

Kotaro Kikuchi, Kota Yamaguchi, Edgar Simo-Serra, Tetsunori Kobayashi

Proceedings - 2019 International Conference on Computer Vision Workshop, ICCVW 2019, 3149 - 3152, October 2019

Spatially placing an object onto a background is an essential operation in graphic design and facilitates many different applications such as virtual try-on. The placing operation is formulated as a geometric inference problem for given foreground and background images, and has been approached by spatial transformer architecture. In this paper, we propose a simple yet effective regularization technique to guide the geometric parameters based on user-defined trust regions. Our approach stabilizes the training process of spatial transformer networks and achieves a high-quality prediction with single-shot inference. Our proposed method is independent of initial parameters, and can easily incorporate various priors to prevent different types of trivial solutions. Empirical evaluation with the Abstract Scenes and CelebA datasets shows that our approach achieves favorable results compared to baselines.

Cited by 6 (Scopus)

Speaker adversarial training of DPGMM-based feature extractor for zero-resource languages

Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. INTERSPEECH2019, 266 - 270, September 2019 [Refereed]

Multi-channel speech enhancement using time-domain convolutional denoising autoencoder

Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. INTERSPEECH2019, 86 - 90, September 2019 [Refereed]

Calving prediction from video: Exploiting behavioural information relevant to calving signs in Japanese black beef cows

Kazuma Sugawara, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. ECPLF2019, 663 - 669, August 2019 [Refereed]

Two-stage calving prediction system: Exploiting state-based information relevant to calving signs in Japanese black beef cows

Ryosuke Hyodo, Saki Yasuda, Yusuke Okimoto, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. ECPLF2019, 670 - 676, August 2019 [Refereed]

Data assimilation versus machine learning: Comparative study of fish catch forecasting

Yuka Horiuchi, Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. OCEANS2019, June 2019 [Refereed]

Psychological measure on fish catches and its application to optimization criterion for machine learning based predictors

Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. OCEANS2019, June 2019 [Refereed]

An experimental study on expressing additional nuances by controlling sentence-final intonation, toward more expressive conversational speech synthesis (対話音声合成の表現力向上に向けた文末音調の制御による付加的なニュアンスの表現に関する実験的検討)

Kazuhiko Iwata (岩田和彦), Tetsunori Kobayashi (小林哲則)

IEICE Transactions D (電子情報通信学会論文誌 D), Vol.J102-D ( 6 ) 442 - 453, June 2019 [Refereed]

Role: last author

TurkScanner: Predicting the hourly wage of microtasks

Susumu Saito, Chun-Wei Chiang, Saiph Savage, Teppei Nakano, Tetsunori Kobayashi, Jeffrey Bigham

Proc. The Web Conference 2019, 3187 - 3193, May 2019 [Refereed] [International co-authorship]

Postfiltering using an adversarial denoising autoencoder with noise-aware training

Naohiro Tawara, Hikari Tanabe, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Proc. ICASSP2019, 3282 - 3286, May 2019 [Refereed]

Cited by 1 (Scopus)

End-to-middle training based action generation for multi-party conversation robot

Hayato Katayama, Shinya Fujie, Tetsunori Kobayashi

Proc. IWSDS2019, April 2019 [Refereed]

Role: last author

Investigation of Users' Short Responses in Actual Conversation System and Automatic Recognition of their Intentions

Katsuya Yokoyama, Hiroaki Takatsu, Hiroshi Honda, Shinya Fujie, Tetsunori Kobayashi

2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings, 934 - 940, February 2019

In human-human conversations, listeners often convey intentions to speakers through feedback consisting of reflexive short responses. The speakers recognize these intentions and change the conversational plans to make communication more efficient. These functions are expected to be effective in human-system conversations also; however, there are only a few systems that use these functions and few research corpora that include them. We created a corpus that consists of users' short responses to an actual conversation system and developed a model for recognizing the intention of these responses. First, we categorized the intention of feedback that affects the progress of conversations. We then collected 15604 short responses of users from 2060 conversation sessions using our news-delivery conversation system. Twelve annotators labeled each utterance based on intention through a listening test. We then designed our deep-neural-network-based intention recognition model using the collected data. We found that feedback in the form of questions, which is the most frequently occurring expression, was correctly recognized and contributed to the efficiency of the conversation system.

DOI

Scopus

5

被引用数

(Scopus)
会話によるニュース記事伝達のための音声合成

高津弘明, 福岡維新, 藤江真也, 岩田和彦, 小林哲則

人工知能学会論文誌, 34 ( 2 ) B-I65_1 - 15 2019年02月 [査読有り]

担当区分：最終著者
Speech synthesis for conversational news contents delivery

Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Kazuhiko Iwata, Tetsunori Kobayashi

Transactions of the Japanese Society for Artificial Intelligence 34 ( 2 ) 2019年

　概要を見る

We have been developing a speech-based “news-delivery system”, which can transmit news contents via spoken dialogues. In such a system, a speech synthesis subsystem that can flexibly adjust the prosodic features in utterances is vital: the system should be able to highlight spoken phrases containing noteworthy information in an article; it should also provide properly controlled pauses between utterances to facilitate the user’s interactive reactions, including questions. To achieve these goals, we have decided to incorporate the position of the utterance in the paragraph and the role of the utterance in the discourse structure into the bundle of features for speech synthesis. A thorough investigation of news-telling speech data uttered by a voice actress found these features to be crucially important in fulfilling the above-mentioned requirements for the spoken utterances. Specifically, these features dictate the importance of information carried by spoken phrases, and hence should be effectively utilized in synthesizing prosodically adequate utterances. Based on these investigations, we devised a deep neural network-based speech synthesis model that takes as input the role and position features. In addition, we designed a neural network model that can estimate an adequate pause length between utterances. Experimental results showed that adding these features to the input makes the synthesized speech more suitable for information delivery. Furthermore, we confirmed that inserting pauses properly makes it easier for users to ask questions during system utterances.

DOI

Scopus

1

被引用数

(Scopus)
Recognition of Intentions of Users’ Short Responses for Conversational News Delivery System

Hiroaki Takatsu, Katsuya Yokoyama, Yoichi Matsuyama, Shinya Fujie, Tetsunori Kobayashi

Proc. INTERSPEECH2019 1193 - 1197 2019年 [査読有り]

担当区分：最終著者
Social image tags as a source of word embeddings: A Task-oriented Evaluation

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

LREC 2018 - 11th International Conference on Language Resources and Evaluation 969 - 973 2019年 [査読有り]

　概要を見る

The distributional hypothesis has been playing a central role in statistical NLP. Recently, however, its limitation in incorporating perceptual and empirical knowledge is noted, eliciting a field of perceptually grounded computational semantics. Typical sources of features in such research are image datasets, where images are accompanied by linguistic tags and/or descriptions. Mainstream approaches employ machine learning techniques to integrate/combine visual features with linguistic features. In contrast to or supplementing these approaches, this study assesses the effectiveness of social image tags in generating word embeddings, and argues that these generated representations exhibit somewhat different and favorable behaviors from corpus-originated representations. More specifically, we generated word embeddings by using image tags obtained from a large social image dataset YFCC100M, which collects Flickr images and the associated tags. We evaluated the efficacy of generated word embeddings with standard semantic similarity/relatedness tasks, which showed that comparable performances with corpus-originated word embeddings were attained. These results further suggest that the generated embeddings could be effective in discriminating synonyms and antonyms, which has been an issue in distributional hypothesis-based approaches. In summary, social image tags can be utilized as yet another source of visually enforced features, provided the amount of available tags is large enough.
SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders.

Hiroaki Tsuyuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Computational Linguistics - 16th International Conference of the Pacific Association for Computational Linguistics(PACLING) 1215 CCIS 43 - 55 2019年 [査読有り]

　概要を見る

A sentence encoder that can be readily employed in many applications or effectively fine-tuned to a specific task/domain is highly demanded. Such a sentence encoding technique would achieve a broader range of applications if it can deal with almost arbitrary word-sequences. This paper proposes a training regime for enabling encoders that can effectively deal with word-sequences of various kinds, including complete sentences, as well as incomplete sentences and phrases. The proposed training regime can be distinguished from existing methods in that it first extracts word-sequences of an arbitrary length from an unlabeled corpus of ordered or unordered sentences. An encoding model is then trained to predict the adjacency between these word-sequences. Herein an unordered sentence indicates an individual sentence without neighboring contextual sentences. In some NLP tasks, such as sentence classification, the semantic contents of an isolated sentence have to be properly encoded. Further, by employing rather unconstrained word-sequences extracted from a large corpus, without heavily relying on complete sentences, it is expected that linguistic expressions of various kinds are employed in the training. This property contributes to enhancing the applicability of the resulting word-sequence/sentence encoders. The experimental results obtained from supervised evaluation tasks demonstrated that the trained encoder achieved performance comparable to existing encoders while exhibiting superior performance in unsupervised evaluation tasks that involve incomplete sentences and phrases.

DOI

Scopus
Towards Answer-unaware Conversational Question Generation.

Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 2nd Workshop on Machine Reading for Question Answering(MRQA@EMNLP) 63 - 71 2019年 [査読有り]

DOI
複雑のコンセプトを含むクエリ文からのゼロショット映像検索 -TRECVID AVSタスクにおける成果と課題-

植木一也, 平川幸司, 菊池康太郎, 小林哲則

精密工学会誌 84 ( 12 ) 983 - 990 2018年12月 [査読有り]

担当区分：最終著者
Adversarial autoencoder for reducing nonlinear distortion

Naohiro Tawara, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Proc. APSIPA2018 1669 - 1673 2018年11月 [査読有り]
Sequential fish catch forecasting using Bayesian state space models

Yuya Kokaki, Naohiro Tawara, Tetsunori Kobayashi, Kazuo Hashimoto, Tetsuji Ogawa

Proc. ICPR2018 776 - 781 2018年08月 [査読有り]
Fine-grained Video Retrieval using Query Phrases – Waseda_Meisei TRECVID 2017 AVS System –

Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsunori Kobayashi

Proceedings of the 24th International Conference on Pattern Recognition 3322 - 3327 2018年08月 [査読有り]

担当区分：最終著者
Acoustic feature representation based on timbre for fault detection of rotary machines

Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. SDPC2018 302 - 305 2018年08月 [査読有り]

担当区分：最終著者
Speaker invariant feature extraction for zero-resource languages with adversarial training

Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

2018 IEEE International Conference on Acoustics, Speech and Signal Processing 2381 - 2385 2018年04月 [査読有り]
Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations

Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi

Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018) 6084 - 6088 2018年04月 [査読有り] [国際誌]

担当区分：最終著者

DOI

Scopus

23

被引用数

(Scopus)
A Spoken Dialogue System for Enabling Information Behavior of Various Intention Levels

Hiroaki Takatsu, Ishin Fukuoka, Shinya Fujie, Yoshihiko Hayashi, Tetsunori Kobayashi

Transactions of the Japanese Society for Artificial Intelligence 33 ( 1 ) DSH - C_1 2018年 [査読有り]

担当区分：最終著者

DOI
Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms

Koji Hirakawa, Kotaro Kikuchi, Kazuya Ueki, Tetsunori Kobayashi, Yoshihiko Hayashi

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11292 LNCS 157 - 163 2018年 [査読有り]

　概要を見る

The performances of an ad-hoc video search (AVS) task can only be improved when the video processing for analyzing video contents and the linguistic processing for interpreting natural language queries are nicely combined. Among the several issues associated with this challenging task, this paper particularly focuses on the sense disambiguation/filtering (WSD/WSF) of the terms contained in a search query. We propose WSD/WSF methods which employ distributed sense representations, and discuss their efficacy in improving the performance of an AVS system which makes full use of a large bank of visual concept classifiers. The application of a WSD/WSF method is crucial, as each visual concept classifier is linked with the lexical concept denoted by a word sense. The results are generally promising, outperforming not only a baseline query processing method that only considers the polysemy of a query term but also a strong WSD baseline method.

DOI

Scopus
Waseda_Meisei at TRECVID 2018: Ad-hoc Video Search.

Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

2018 TREC Video Retrieval Evaluation(TRECVID) 2018年
Answerable or Not: Devising a Dataset for Extending Machine Reading Comprehension.

Mao Nakanishi, Tetsunori Kobayashi, Yoshihiko Hayashi

Proceedings of the 27th International Conference on Computational Linguistics(COLING) 973 - 983 2018年 [査読有り]
Exploiting end of sentences and speaker alternations in recurrent neural network-based language modeling for multiparty conversations

Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2017 (APSIPA2017) 2017年12月 [査読有り]

DOI

Scopus

1

被引用数

(Scopus)
Object Detection Oriented Feature Pooling for Video Semantic Indexing

Kazuya Ueki, Tetsunori Kobayashi

The 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications 44 - 51 2017年02月 [査読有り]

担当区分：最終著者
Classifying Lexical-semantic Relationships by Exploiting Sense/Concept Representations

Kentaro Kanada, Tetsunori Kobayashi, Yoshihiko Hayashi

2017 Workshop on Sense, Concept and Entity Representations and their Application 37 - 46 2017年 [査読有り]
Adaptive training of vibration-based anomaly detector for wind turbine condition monitoring

Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Annual Conference on PHM Society 177 - 184 2017年 [査読有り]
Incorporating visual features into word embeddings: A bimodal autoencoder-based approach.

Mika Hasegawa, Tetsunori Kobayashi, Yoshihiko Hayashi

IWCS 2017 - 12th International Conference on Computational Semantics - Short papers(IWCS(2)) 2017年 [査読有り]
Video Semantic Indexing using Object Detector

Kazuya Ueki, Tetsunori Kobayashi

Proc. VRCAI2016 2016年12月 [査読有り]

担当区分：最終著者
Evaluation for Collaborative Video Surveillance Platform using Prototype System of Abandoned Object Detection

Susumu Saito, Teppei Nakano, Tetsunori Kobayashi

Proc. ICDSC2016 172 - 177 2016年09月 [査読有り]

担当区分：最終著者
Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

Trans. on Signal and Information Processing 5 2016年08月 [査読有り]

担当区分：最終著者
Waseda at TRECVID 2016: Fully-automatic Ad-hoc Video Search

Kotaro Kikuchi, Kazuya Ueki, Susumu Saito, Tetsunori Kobayashi

2016 TREC Video Retrieval Evaluation, TRECVID 2016 2016年
A Spoken Dialog System for Coordinating Information Consumption and Exploration.

Shinya Fujie, Ishin Fukuoka, Asumi Mugita, Hiroaki Takatsu, Yoshihiko Hayashi, Tetsunori Kobayashi

Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval(CHIIR) 253 - 256 2016年 [査読有り]

担当区分：最終著者

　概要を見る

Passive consumption of information is boring in most cases and even painful in some cases, especially when the information content is delivered by employing speech media. The user of a speech-based information delivery system, for example a text-to-speech system, usually cannot interrupt the ongoing information flow, which prevents her/him from confirming some part of the content or posing an inquiry for further information exploration. We argue that a carefully designed spoken dialog system could remedy these undesirable situations, and further enable an enjoyable conversation with the users. The key technologies to realize such an attractive dialog system are: (1) pre-compilation of a dialog plan based on the analysis of a source content, and (2) the dynamic recognition of user's state of understanding and interests. This paper illustrates technical views to implement these functionalities, and discusses a dialog example to exemplify the technical merits of the proposed system.

DOI

Scopus

3

被引用数

(Scopus)
Multi-Feature Based Fast Depth Decision in HEVC Inter Prediction for VLSI Implementation

Gaoxing Chen, Zhenyu Liu, Tetsunori Kobayashi, Takeshi Ikenaga

2016 9TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2016) 124 - 134 2016年 [査読有り]

　概要を見る

High efficiency video coding (HEVC) is the latest international video compression standard and achieves double the compression efficiency of the previous standard, H.264/AVC. To increase the compression accuracy, HEVC employs coding units (CUs) ranging from 8 x 8 to 64 x 64. However, the encoding complexity of HEVC increases considerably because of the manifold partition sizes. Many works focus on reducing the complexity but do not consider the feasibility of hardware implementation. This paper proposes a hardware-friendly fast depth range definition algorithm based on multiple features. A block texture feature, a quantization feature, and a block motion feature are utilized. The block texture feature is based on the texture similarity in consecutive frames. The quantization feature is based on the compression regularity of HEVC. The block motion feature compensates for the difference caused by moving objects. Compared with the original HEVC, the proposed method can save about 33.72% of the processing time with a 0.76% BD-bitrate increase on average.
Image Retrieval under Very Noisy Annotations

Kazuya Ueki, Tetsunori Kobayashi

2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) 1277 - 1282 2016年 [査読有り]

担当区分：最終著者

　概要を見る

In recent years, a significant number of tagged images uploaded onto image sharing sites has enabled us to create high-performance image recognition models. However, there are many inaccurate image tags on the Internet, and it is very laborious to investigate the percentage of tags that are incorrect. In this paper, we propose a new method for creating an image recognition model that can be used even when the image data set includes many incorrect tags. Our method has two superior features. First, our method automatically measures the reliability of annotations and does not require any parameter adjustment for the percentage of error tags. This is a very important feature because we usually do not know how many errors are included in the database, especially in actual Internet environments. Second, our method iterates the error modification process. It begins with the modification of simple and obvious errors, gradually deals with much more difficult errors, and finally creates the high-performance recognition model with refined annotations. Using an object recognition image database with many annotation errors, our experiments showed that the proposed method successfully improved the image retrieval performance in approximately 90 percent of the image object categories.
Video Semantic Indexing using Object Detection-Derived Features

Kotaro Kikuchi, Kazuya Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) 1288 - 1292 2016年 [査読有り]

担当区分：最終著者

　概要を見る

A new feature extraction method based on object detection to achieve accurate and robust semantic indexing of videos is proposed. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have been used in semantic indexing, in general are extracted from the entire image and do not explicitly represent the information of meaningful objects that contributes to the determination of semantic categories. In this case, the background region, which does not contain the meaningful objects, is unduly considered, exerting a harmful effect on the indexing performance. In the present study, an attempt was made to suppress the undesirable effects derived from the redundant background information by incorporating object detection technology into semantic indexing. In the proposed method, a combination of the meaningful objects detected in the video frame image is represented as a feature vector for verification of semantic categories. Experimental comparisons demonstrate that the proposed method facilitates the TRECVID semantic indexing task.
IMPROVING SEMANTIC VIDEO INDEXING: EFFORTS IN WASEDA TRECVID 2015 SIN SYSTEM

Kazuya Ueki, Tetsunori Kobayashi

2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS 1184 - 1188 2016年 [査読有り]

担当区分：最終著者

　概要を見る

In this paper, we propose a method for improving the performance of semantic video indexing. Our approach involves extracting features from multiple convolutional neural networks (CNNs), creating multiple classifiers, and integrating them. We employed four measures to accomplish this: (1) utilizing multiple evidences observed in each video and effectively compressing them into a fixed-length vector; (2) introducing gradient and motion features to CNNs; (3) enriching variations of the training and the testing sets; and (4) extracting features from several CNNs trained with various large-scale datasets. Using the test dataset from TRECVID's 2014 evaluation benchmark, we evaluated the performance of the proposal in terms of the mean extended inferred average precision measure. On this measure, our system's performance was 35.7, outperforming the state-of-the-art TRECVID 2014 benchmark performance of 33.2. Based on this work, our submission at TRECVID 2015 was ranked second among all submissions.
Separation matrix optimization using associative memory model for blind source separation

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

2015 23rd European Signal Processing Conference, EUSIPCO 2015 1098 - 1102 2015年12月 [査読有り]

　概要を見る

A source signal is estimated using an associative memory model (AMM) and used for separation matrix optimization in linear blind source separation (BSS) to yield high quality and less distorted speech. Linear-filtering-based BSS, such as independent vector analysis (IVA), has been shown to be effective in sound source separation while avoiding non-linear signal distortion. This technique, however, requires several assumptions of sound sources being independent and generated from non-Gaussian distribution. We propose a method for estimating a linear separation matrix without any assumptions about the sources by repeating the following two steps: estimating non-distorted reference signals by using an AMM and optimizing the separation matrix to minimize an error between the estimated signal and reference signal. Experimental comparisons carried out in simultaneous speech separation suggest that the proposed method can reduce the residual distortion caused by IVA.

DOI

Scopus

2

被引用数

(Scopus)
付加情報付き局所特徴量を用いた画像カテゴリ識別の向上

植木一也,白石洋平,俵直弘,小林哲則

精密工学会誌 80 ( 12 ) 1144 - 1149 2015年12月 [査読有り]

担当区分：最終著者
Waseda at TRECVID 2015: Semantic Indexing, notebook paper of the TRECVID 2015 Workshop: November 2015.

Kazuya Ueki, Tetsunori Kobayashi

The TREC Video Retrieval Evaluation 2015 2015年11月 [査読有り]

担当区分：最終著者
Automatic image tag refinement for image retrieval.

Kazuya Ueki, Tetsunori Kobayashi

Proc. 5th Asia International Symposium on Mechatronics 396 - 399 2015年10月 [査読有り]

担当区分：最終著者
Multiscale recurrent neural network based language model.

Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi

Proc. 16th Annual Conf. of the Int'l Speech Communication Association 2366 - 2370 2015年09月 [査読有り]

担当区分：最終著者
Bilinear map of filter-bank outputs for DNN-based speech recognition.

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

Proc. 16th Annual Conf. of the Int'l Speech Communication Association 16 - 20 2015年09月 [査読有り]
Blind source separation using associative memory model and linear separation filter.

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

Proc. 2015 European Signal Processing Conference 1103 - 1107 2015年09月 [査読有り]

担当区分：責任著者
A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Trans. on Signal and Information Processing 4 ( e6 ) 2015年09月 [査読有り]

担当区分：最終著者
Bilinear map of filter-bank outputs for DNN-based speech recognition

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

INTERSPEECH 2015 16 - 20 2015年09月 [査読有り]
Feature extraction for rotary-machine acoustic diagnostics focused on period.

Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 44th International Congress and Exposition on Noise Control Engineering 2015年08月 [査読有り]

担当区分：最終著者
Towards a Computational Model of Small Group Facilitation.

Yoichi Matsuyama, Tetsunori Kobayashi

2015 AAAI Spring Symposium Series 2015年03月 [査読有り]

担当区分：最終著者
Automatic Expressive Opinion Sentence Generation for Enjoyable Conversational Systems

Yoichi Matsuyama, Akihiro Saito, Shinya Fujie, Tetsunori Kobayashi

Trans. on Audio, Speech, and Language Processing 23 ( 1 ) 313 - 326 2015年02月 [査読有り]

担当区分：最終著者
Waseda at TRECVID 2015 semantic indexing (SIN)

Kazuya Ueki, Tetsunori Kobayashi

2015 TREC Video Retrieval Evaluation, TRECVID 2015 2015年
A COMPARATIVE STUDY OF SPECTRAL CLUSTERING FOR I-VECTOR-BASED SPEAKER CLUSTERING UNDER NOISY CONDITIONS

Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) 2041 - 2045 2015年 [査読有り]

担当区分：最終著者

　概要を見る

The present paper dealt with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering significantly depends on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded the state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture the speaker similarity under noisy conditions. Therefore, we attempted to examine the efficiency of spectral clustering on i-vector-based similarity for speech corrupted by noise because spectral clustering can yield robustness against noise by non-linear projection. Experimental comparisons demonstrated that spectral clustering yielded significant improvement from conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.
Multi-layer Feature Extractions for Image Classification - Knowledge from Deep CNNs

Kazuya Ueki, Tetsunori Kobayashi

2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP 2015) 9 - 12 2015年 [査読有り]

担当区分：最終著者

　概要を見る

Recently, there has been considerable research into the application of deep learning to image recognition. Notably, deep convolutional neural networks (CNNs) have achieved excellent performance in a number of image classification tasks, compared with conventional methods based on techniques such as Bag-of-Features (BoF) using local descriptors. In this paper, to cultivate a better understanding of the structure of CNN, we focus on the characteristics of deep CNNs, and adapt them to SIFT+BoF-based methods to improve the classification accuracy. We introduce the multi-layer structure of CNNs into the classification pipeline of the BoF framework, and conduct experiments to confirm the effectiveness of this approach using a fine-grained visual categorization dataset. The results show that the average classification rate is improved from 52.4% to 69.8%.
Effect of frequency weighting on MLP-based speaker canonicalization

Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

Proc. 15th Annual Conf. of the Int'l Speech Communication Association 2987 - 2991 2014年09月 [査読有り]

担当区分：最終著者
Effect of Frequency Weighting on MLP-Based Speaker Canonicalization

Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

INTERSPEECH 2014 2987 - 2990 2014年09月 [査読有り]
Blocked Gibbs Sampling Based Multi-Scale Mixture Model for Speaker Clustering on Noisy Data

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

IEEE International Workshop on Machine Learning for Signal Processing 2013年09月 [査読有り]

担当区分：最終著者
Expression of Speaker's Intentions through Sentence-Final Particle/Intonation Combinations in Japanese Conversational Speech Synthesis

Kazuhiko Iwata, Tetsunori Kobayashi

8th ISCA Speech Synthesis Workshop 235 - 240 2013年08月 [査読有り]

担当区分：最終著者
Speaker's Intentions Conveyed to Listeners by Sentence-Final Particles and Their Intonations in Japanese Conversational Speech

Kazuhiko Iwata, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2013 6895 - 6899 2013年05月 [査読有り]

担当区分：最終著者
A Four-Participant Group Facilitation Framework for Conversational Robots

Yoichi Matsuyama, Iwao Akiba, Akihiro Saito, Tetsunori Kobayashi

Proceedings of the SIGDIAL 2013 Conference 284 - 293 2013年 [査読有り]

担当区分：最終著者
Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis

Kazuhiko Iwata, Tetsunori Kobayashi

Proc. 13th Annual Conf. of the Int'l Speech Communication Association 442 - 445 2012年09月 [査読有り]

担当区分：最終著者
Fully Bayesian Speaker Clustering Based on Hierarchically Structured Utterance-oriented Dirichlet process mixture model.

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Proc. 13th Annual Conf. of the Int'l Speech Communication Association 2166 - 2169 2012年09月 [査読有り]

担当区分：最終著者
AAM Fitting Using Shape Parameter Distribution.

Youhei Shiraishi, Shinya Fujie, Tetsunori Kobayashi

Proc. EUSIPCO2012 2238 - 2242 2012年08月 [査読有り]

担当区分：最終著者
Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2012 5253 - 5256 2012年03月 [査読有り]

担当区分：最終著者
人同士のコミュニケーションに参加し活性化する会話ロボット

藤江真也,松山洋一,谷山輝,小林哲則

電子情報通信学会論文誌A J95-A ( 1 ) 37 - 45 2012年01月 [査読有り]

担当区分：最終著者
Spatial filter calibration based on minimization of modified LSD.

Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 12th Annual Conf. of the Int'l Speech Communication Association 1761 - 1764 2011年09月 [査読有り]

担当区分：最終著者
Development and evaluation of Japanese Lombard speech corpus

Tetsuji Ogawa, Takanobu Nishiura, Takeshi Yamada, Norihide Kitaoka, Tetsunori Kobayashi

Proc. Internoise2011 1366 - 1373 2011年09月 [査読有り]

担当区分：最終著者
Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization

Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

Proc. 12th Annual Conf. of the Int'l Speech Communication Association 2741 - 2744 2011年08月 [査読有り]

担当区分：最終著者
Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model.

Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 12th Annual Conf. of the Int'l Speech Communication Association 2905 - 2908 2011年08月 [査読有り]

担当区分：最終著者
Multiparty Conversation Facilitation Strategy Using Combination of Question Answering and Spontaneous Utterances

Yoichi Matsuyama, Yushi Xu, Akihiro Saito, Shinya Fujie, Tetsunori Kobayashi

The paralinguistic Information Processing and its Integration in Spoken Dialogue Systems 103 - 112 2011年08月 [査読有り]

担当区分：最終著者
Conversational Speech Synthesis System with Communication Situation Dependent HMMs

Kazuhiko Iwata, Tetsunori Kobayashi

The paralinguistic Information Processing and its Integration in Spoken Dialogue Systems 113 - 124 2011年08月 [査読有り]

担当区分：最終著者
Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation

OGAWA Tetsuji, UEKI Kazuya, KOBAYASHI Tetsunori

IEICE transactions on information and systems 94 ( 8 ) 1683 - 1689 2011年08月 [査読有り]

担当区分：最終著者

　概要を見る

We propose a novel method of supervised feature projection called class-distance-based discriminant analysis (CDDA), which is suitable for automatic age estimation (AAE) from facial images. Most methods of supervised feature projection, e.g., Fisher discriminant analysis (FDA) and local Fisher discriminant analysis (LFDA), focus on determining whether two samples belong to the same class (i.e., the same age in AAE) or not. Even if an estimated age is not consistent with the correct age in AAE systems, i.e., the AAE system induces error, smaller errors are better. To treat such characteristics in AAE, CDDA determines between-class separability according to the class distance (i.e., difference in ages); two samples with similar ages are imposed to be close and those with spaced ages are imposed to be far apart. Furthermore, we propose an extension of CDDA called local CDDA (LCDDA), which aims at handling multimodality in samples. Experimental results revealed that CDDA and LCDDA could extract more discriminative features than FDA and LFDA.

DOI CiNii

Scopus
SPEAKER RECOGNITION USING MULTIPLE KERNEL LEARNING BASED ON CONDITIONAL ENTROPY MINIMIZATION

Tetsuji Ogawa, Hideitsu Hino, Nima Reyhani, Noboru Murata, Tetsunori Kobayashi

2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING 2204 - 2207 2011年 [査読有り]

担当区分：最終著者

　概要を見る

We applied a multiple kernel learning (MKL) method based on information-theoretic optimization to speaker recognition. Most of the kernel methods applied to speaker recognition systems require a suitable kernel function and its parameters to be determined for a given data set. In contrast, MKL eliminates the need for strict determination of the kernel function and parameters by using a convex combination of element kernels. In the present paper, we describe an MKL algorithm based on conditional entropy minimization (MCEM). We experimentally verified the effectiveness of MCEM for speaker classification; this method reduced the speaker error rate as compared to conventional methods.

DOI

Scopus

4

被引用数

(Scopus)
Framework of Communication Activation Robot Participating in Multiparty Conversation

Yoichi Matsuyama, Shinya Fujie, Tetsunori Kobayashi

AAAI Fall Symposium, Dialog with Robots 68 - 73 2010年11月 [査読有り]

担当区分：最終著者
DEVELOPMENT OF ZONAL BEAMFORMER AND ITS APPLICATION TO ROBOT AUDITION

Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010) 1529 - 1533 2010年08月 [査読有り]

担当区分：最終著者

　概要を見る

We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the FBF widens the space as the distance from the robot increases. The ZBF can better improve the performance of eliminating the directional noise coming from behind the target source than the FBF because the ZBF has a considerably sharper directivity than the FBF.
Speech enhancement using a square microphone array in the presence of directional and diffuse noise

Tetsuji Ogawa, Shintaro Takada, Kenzo Akagiri, and Tetsunori Kobayashi

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E93-E ( 5 ) May 2010 [Refereed]

Role: Last author
A Meeting Assistance System with a Collaborative Editor for Argument Structure Visualization

Yasutomo Arai, Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. Int'l Conf. on Computer Supported Cooperative Work 2010 February 2010 [Refereed]

Role: Last author
A Collaborative Lexical Data Design System for Speech Recognition Application Developers

Hiroshi Sasaki, Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. Int'l Conf. on Computer Supported Cooperative Work 2010 455 - 456 February 2010 [Refereed]

Role: Last author
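会話ロボットとその聴覚機能 [Conversation robots and their auditory functions]

Shinya Fujie, Tetsuji Ogawa, Tetsunori Kobayashi

Journal of the Robotics Society of Japan 28 ( 1 ) 23 - 26 January 2010 [Refereed] [Invited]

Role: Last author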

DOI CiNii
Psychological evaluation of a group communication activation robot in a party game

Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 3046 - 3049 2010

Abstract

We propose a communication activation robot and evaluate the effectiveness of its communication activation. As an example application, we developed a system that participates in a quiz-style party game called the NANDOKU quiz on the multi-modal conversation robot SCHEMA, and we conducted a laboratory experiment to evaluate its capability to activate group communication. We evaluated interaction in the NANDOKU quiz game, with subjects as panelists, using video analysis and the SD (Semantic Differential) method with questionnaires. The results of the SD method indicate that subjects feel more pleased and perceive the interaction as noisier when a robot participates. The video analysis shows that the smiling duration ratio is greater when a robot participates. These results suggest that the robot has a communication activation function in the party game. © 2010 ISCA.
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination

Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 2954 - 2957 2010

Abstract

We present a realization method of the principle of minimum relative entropy discrimination (MRED) in order to derive a regularized discriminative training method. MRED is advantageous since it provides a Bayesian interpretation of the conventional discriminative training methods and regularization techniques. In order to realize MRED for speech recognition, we propose an approximation method of MRED that strictly preserves the constraints used in MRED. Further, in order to practically perform MRED, an optimization method based on convex optimization and its solver based on the cutting plane algorithm are also proposed. The proposed methods were evaluated on continuous phoneme recognition tasks. We confirmed that the MRED-based training system outperformed conventional discriminative training methods in the experiments. © 2010 ISCA.
Development of zonal beam former and its application to robot audition

Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

European Signal Processing Conference 1529 - 1533 2010

Abstract

We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the FBF widens the space as the distance from the robot increases. The ZBF can better improve the performance of eliminating the directional noise coming from behind the target source than the FBF because the ZBF has a considerably sharper directivity than the FBF. © EURASIP, 2010.
SCHEMA: multi-party interaction-oriented humanoid robot

Yoichi Matsuyama, Kosuke Hosoya, Hikaru Taniyama, Hiroki Tsuboi, Shinya Fujie, Tetsunori Kobayashi

ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation December 2009 [Refereed]

Role: Last author

DOI
Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions

Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E92D ( 11 ) 2244 - 2252 November 2009 [Refereed]

Role: Last author

Abstract

The accuracy of simulation-based assessments of speech recognition systems under noisy conditions is investigated with a focus on the influence of the Lombard effect on the speech recognition performances. This investigation was carried out under various recognition conditions of different sound pressure levels of ambient noise, for different recognition tasks, such as continuous speech recognition and spoken word recognition, and using different recognition systems, i.e., systems with and without adaptation of the acoustic models to ambient noise. Experimental results showed that accurate simulation was not always achieved when dry sources with neutral talking style were used, but it could be achieved if the dry sources that include the influence of the Lombard effect were used; the simulation in the latter case is accurate, irrespective of the recognition conditions.

DOI

Scopus

4

Citations

(Scopus)
Conversation robot participating in and activating a group communication

Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi

Proc. 10th Annual Conf. of the Int'l Speech Communication Association 264 - 267 September 2009 [Refereed]

Role: Last author
Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head

Tetsuji Ogawa, Kosuke Hosoya, Kenzo Akagiri, Tetsunori Kobayashi

Proc. 2009 European Signal Processing Conference 879 - 883 August 2009 [Refereed]

Role: Last author
System Design of Group Communication Activator: An Entertainment Task for Elderly Care

Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, Tetsunori Kobayashi

Proc. HRI2009 243 - 244 March 2009 [Refereed]

Role: Last author
Upper-Body Contour Extraction Using Face and Body Shape Variance Information

Kazuki Hoshiai, Shinya Fujie, Tetsunori Kobayashi

ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS 5414 862 - + 2009 [Refereed]

Role: Last author

Abstract

We propose a fitting method using a model that integrates face and body shape variance information for upper-body contour extraction. Accurate body-contour extraction is necessary for various applications, such as pose estimation and gesture recognition. In this study, we regard it as a shape model fitting problem. A model that includes shape variance information can fit the contour robustly even in noisy cases. AAMs are one such model and can fit a face successfully. An AAM needs appearance information for effective fitting, but this cannot be used in our case because the appearance of the upper body changes easily with clothing. Instead of an intensity image, the proposed method uses an edge image as appearance information. However, discriminating between a true contour edge of the upper body and other edges is difficult. To solve this problem, we integrate the shapes of the upper body and the face. This integrated model is expected to be more robust to edges in cluttered backgrounds and to various locations of the body than a shape model using only body shape information. We conduct experiments and confirm an improvement in accuracy from integrating face and body variance information.
Robot auditory system using head-mounted square microphone array

Kosuke Hosoya, Tetsuji Ogawa, Tetsunori Kobayashi

2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS 2736 - 2741 2009 [Refereed]

Role: Last author

Abstract

A new noise reduction method suitable for autonomous mobile robots was proposed and applied to preprocessing of a hands-free spoken dialogue system. When a robot talks with a conversational partner in real environments, not only speech utterances by the partner but also various types of noise, such as directional noise, diffuse noise, and noise from the robot, are observed at microphones. We attempted to remove these types of noise simultaneously with small, lightweight devices and low-computational-cost algorithms. We assumed that the conversational partner of the robot was in front of the robot. In this case, the aim of the proposed method is to extract speech signals coming from the frontal direction of the robot. The proposed noise reduction system was evaluated in the presence of various types of noise: the number of word errors was reduced by 69% as compared to the conventional methods. The proposed robot auditory system can also cope with the case in which a conversational partner (i.e., a sound source) moves from the front of the robot: the sound source was localized by face detection and tracking using facial images obtained from a camera mounted on an eye of the robot. As a result, various types of noise could be reduced in real time, irrespective of the sound source positions, by combining speech information with image information.
Multi-modal Integration for Personalized Conversation: Towards a Humanoid in Daily Life

Shinya Fujie, Daichi Watanabe, Yuhi Ichikawa, Hikaru Taniyama, Kosuke Hosoya, Yoichi Matsuyama, Tetsunori Kobayashi

Proc. Int'l Conf. on Humanoid Robots 617 - 622 December 2008 [Refereed]

Role: Last author
Designing Communication Activation System in Group Communication

Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, Tetsunori Kobayashi

Proc. Int'l Conf. on Humanoid Robots 629 - 634 December 2008 [Refereed]

Role: Last author
Class Distance Weighted Locality Preserving Projection for Automatic Age Estimation

Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. Biometrics: Theory, Applications and Systems October 2008 [Refereed]

Role: Last author
Design and Formulation for Speech Interface Based on Flexible Shortcuts

Teppei Nakano, Tomoyuki Kumai, Tetsunori Kobayashi, Yasushi Ishikawa

Proc. 9th Annual Conf. of the Int'l Speech Communication Association 2474 - 2477 September 2008 [Refereed]

Role: Corresponding author
An ASM fitting method based on machine learning that provides a robust parameter initialization for AAM fitting

Matthias Wimmer, Shinya Fujie, Freek Stulp, Tetsunori Kobayashi, Bernd Radig

Proc. Int'l Conf. on Automatic Face and Gesture Recognition September 2008 [Refereed]

Role: Corresponding author
Ears of the robot: noise reduction using four-line ultra-micro omni-directional microphones mounted on a robot head

Tetsuji Ogawa, Hirofumi Takeuchi, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

Proc. 2008 European Signal Processing Conference August 2008 [Refereed]

Role: Last author
Ears of the robot: Direction of arrival estimation based on pattern recognition using robot-mounted microphones

Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D ( 5 ) 1522 - 1530 May 2008 [Refereed]

Role: Last author

Abstract

We propose a new type of direction-of-arrival estimation method for robot audition that is free from strict head related transfer function estimation. The proposed method is based on statistical pattern recognition that employs a ratio of power spectrum amplitudes occurring for a microphone pair as a feature vector. It does not require any phase information explicitly, which is frequently used in conventional techniques, because the phase information is unreliable for the case in which strong reflections and diffractions occur around the microphones. The feature vectors we adopted can treat these influences naturally. The effectiveness of the proposed method was shown from direction-of-arrival estimation tests for 19 kinds of directions: 92.4% of errors were reduced compared with the conventional phase-based method.

DOI

Scopus

3

Citations

(Scopus)
Speech enhancement using square microphone array for mobile devices

Shintaro Takada, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2008 313 - 316 April 2008 [Refereed]

Role: Last author
Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E91-D ( 3 ) 815 - 824 March 2008 [Refereed]

Role: Last author
Social robots that interact with people.

Cynthia Breazeal, Atsuo Takanishi, Tetsunori Kobayashi

Springer handbook of robotics 2008 [Refereed] [Invited]

Role: Last author
Upper-body Contour Extraction and Tracking Using Face and Body Shape Variance Information

Kazuki Hoshiai, Shinya Fujie, Tetsunori Kobayashi

2008 8TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS 2008) 398 - + 2008 [Refereed]

Role: Last author

Abstract

We propose a fitting method using a model that integrates face and body shape variance information for upper-body contour extraction and tracking. Accurate body-contour extraction is necessary for various applications, such as pose estimation and gesture recognition. In this study, we regard it as a shape model fitting problem. A model that includes shape variance information can fit the contour robustly even in noisy cases. AAMs are one such model and can fit a face successfully. An AAM needs appearance information for effective fitting, but this cannot be used in our case because the appearance of the upper body changes easily with clothing. Instead of an intensity image, the proposed method uses an edge image as appearance information. However, discriminating between a true contour edge of the upper body and other edges is difficult. To solve this problem, we integrate shape models of the upper body and the face. This integrated model is expected to be more robust to edges in cluttered backgrounds and to various locations of the body than a shape model using only body shape information. We conduct experiments and confirm an improvement in accuracy from integrating face and body variance information.
Extensible speech recognition system using Proxy-Agent

Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. Automatic Speech Recognition and Understanding Workshop 601 - 606 December 2007 [Refereed]

Role: Last author
Gender Classification Based on Integration of Multiple Classifiers Using Different Features of Facial and Neck Images

Kazuya Ueki, Tetsunori Kobayashi

Journal of the Institute of Image Information and Television Engineers 61 ( 12 ) 1803 - 1809 December 2007 [Refereed]

Role: Last author
Sound Source Separation using Null-Beamforming and Spectral Subtraction for Mobile Devices

Shintaro Takada, Satoshi Kanba, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007) 30 - 33 October 2007 [Refereed]

Role: Last author
Ears of the Robot : Three Simultaneous Speech Segregation and Recognition Using Robot-Mounted Microphones

MOCHIKI Naoya, OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE transactions on information and systems 90 ( 9 ) 1465 - 1468 September 2007 [Refereed]

Role: Last author

Abstract

A new type of sound source segregation method using robot-mounted microphones, which are free from strict head related transfer function (HRTF) estimation, has been proposed and successfully applied to three simultaneous speech recognition systems. The proposed segregation method is executed with sound intensity differences that are due to the particular arrangement of the four directivity microphones and the existence of a robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on the narrow band sound intensity comparison), two-line spectral subtraction and their integration. We performed 20K vocabulary continuous speech recognition test in the presence of three speakers' simultaneous talk, and achieved more than 70% word error reduction compared with the case without any segregation processing.

CiNii
Ears of the robot: Three simultaneous speech segregation and recognition using robot-mounted microphones

Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D ( 9 ) 1465 - 1468 September 2007 [Refereed]

Role: Last author

Abstract

A new type of sound source segregation method using robot-mounted microphones, which are free from strict head related transfer function (HRTF) estimation, has been proposed and successfully applied to three simultaneous speech recognition systems. The proposed segregation method is executed with sound intensity differences that are due to the particular arrangement of the four directivity microphones and the existence of a robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on the narrow band sound intensity comparison), two-line spectral subtraction and their integration. We performed 20K vocabulary continuous speech recognition test in the presence of three speakers' simultaneous talk, and achieved more than 70% word error reduction compared with the case without any segregation processing.

DOI

Scopus

3

Citations

(Scopus)
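The abstract above characterizes SAFIA as binary masking based on a narrow-band sound intensity comparison. A minimal sketch of that general idea for two channels follows. The function name binary_mask_two_channels and the list-of-frames spectrogram layout are assumptions for illustration; this is a generic per-bin comparison, not the authors' two-line SAFIA implementation, and it omits the spectral subtraction and integration stages.

```python
def binary_mask_two_channels(spec_a, spec_b):
    """Assign each time-frequency bin to the channel where it is stronger.

    spec_a and spec_b are spectrograms given as lists of frames, where each
    frame is a list of complex frequency bins (hypothetical layout). A bin
    is kept in the channel with the larger magnitude and zeroed in the other.
    """
    out_a, out_b = [], []
    for frame_a, frame_b in zip(spec_a, spec_b):
        row_a, row_b = [], []
        for bin_a, bin_b in zip(frame_a, frame_b):
            if abs(bin_a) >= abs(bin_b):
                row_a.append(bin_a)
                row_b.append(0j)
            else:
                row_a.append(0j)
                row_b.append(bin_b)
        out_a.append(row_a)
        out_b.append(row_b)
    return out_a, out_b


if __name__ == "__main__":
    # One frame with two frequency bins per channel.
    a = [[3 + 0j, 1 + 0j]]
    b = [[1 + 0j, 2 + 0j]]
    masked_a, masked_b = binary_mask_two_channels(a, b)
    print(masked_a, masked_b)
```

The comparison relies on each narrow-band component being dominated by a single source, which is the usual assumption behind binary time-frequency masking of speech.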
Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

Shoei Sato, Tetsunori Kobayashi, et al.

Proc. 8th Annual Conf. of the Int'l Speech Communication Association 345 - 348 August 2007 [Refereed]

Role: Corresponding author
Fusion-based age-group classification method using multiple two-dimensional feature extraction algorithms

Kazuya Ueki, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D ( 6 ) 923 - 934 June 2007 [Refereed]

Role: Last author

Abstract

An age-group classification method based on a fusion of different classifiers with different two-dimensional feature extraction algorithms is proposed. Theoretically, an integration of multiple classifiers can provide better performance compared to a single classifier. In this paper, we extract effective features from one sample image using different dimensional reduction methods, construct multiple classifiers in each subspace, and combine them to reduce age-group classification errors. As for the dimensional reduction methods, two-dimensional PCA (2DPCA) and two-dimensional LDA (2DLDA) are used. These algorithms are antisymmetric in the treatment of the rows and the columns of the images. We prepared the row-based and column-based algorithms to make two different classifiers with different error tendencies. By combining these classifiers with different errors, the performance can be improved. Experimental results show that our fusion-based age-group classification method achieves better performance than existing two-dimensional algorithms alone.

DOI

Scopus

4

Citations

(Scopus)
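マルチモーダル会話ロボット：ロボットが会話において「聴く」行為について [Multimodal conversation robots: On the act of "listening" by a robot in conversation]

Tetsunori Kobayashi, Shinya Fujie

Journal of the Society of Instrument and Control Engineers 46 ( 6 ) 466 - 471 June 2007 [Refereed] [Invited]

Role: Lead author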
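音声スタータ: 有声休止による発話開始の指定が可能な音声入力インタフェース [Speech Starter: A speech input interface in which the start of an utterance can be specified by a filled pause]

Masataka Goto, Koji Kitayama, Katunobu Itou, Tetsunori Kobayashi

IPSJ Journal 48 ( 5 ) 2001 - 2011 May 2007 [Refereed]

Role: Last author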
Adequacy analysis of simulation-based assessment of speech recognition system

Tetsuji Ogawa, Satoshi Kanba, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2007 1153 - 1157 April 2007 [Refereed]

Role: Last author
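音声スポッタ: 人間同士の会話中に音声認識が利用可能な音声入力インタフェース [Speech Spotter: A speech input interface that makes speech recognition available during human-human conversation]

Masataka Goto, Koji Kitayama, Katunobu Itou, Tetsunori Kobayashi

IPSJ Journal 48 ( 3 ) 1275 - 1283 March 2007 [Refereed]

Role: Last author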
Conversation robot with the function of gaze recognition

Shinya Fujie, Toshihiko Yamahata, Tetsunori Kobayashi

IEEE-RAS Int'l Conf. on Humanoid Robots 364 - 369 December 2006 [Refereed]

Role: Last author
Realization of rhythmic dialogue on spoken dialogue system using para-linguistic information

Shinya Fujie, Tetsunori Kobayashi

The Journal of the Acoustical Society of America November 2006 [Refereed]

Role: Last author
Hybrid Voice Conversion of Unit Selection and Generation Using Prosody Dependent HMM

Masashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E89-D ( 11 ) 2775 - 2782 November 2006 [Refereed]

Role: Last author
Source Separation Using Multiple Directivity Patterns Produced by ICA-based BSS

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 14th European Signal Processing Conference September 2006 [Refereed]

Role: Last author
A Method for Solving the Permutation Problem of Frequency-Domain BSS Using Reference Signal

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 14th European Signal Processing Conference September 2006 [Refereed]

Role: Last author
Two-dimensional Heteroscedastic Linear Discriminant Analysis for Age-group Classification

Kazuya Ueki, Teruhide Hayashida, Tetsunori Kobayashi

Proc. 18th International Conference on Pattern Recognition 585 - 588 August 2006 [Refereed]

Role: Last author
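対話ロボットの動作に頑健な頭部ジェスチャ認識 [Head gesture recognition robust to the motion of a conversation robot]

Kei Nakajima, Yasushi Ejiri, Shinya Fujie, Tetsuji Ogawa, Yosuke Matsusaka, Tetsunori Kobayashi

The IEICE transactions on information and systems (Japanese edition) 89 ( 7 ) 1514 - 1522 July 2006 [Refereed]

Role: Last author

Abstract

When a robot converses with a person, recognizing the conversational partner's head gestures is important for achieving natural dialogue. However, when images from a camera mounted on the robot's head are used for gesture recognition, the robot itself is also expected to perform head gestures, so the images are disturbed and recognition becomes difficult. In this paper, we examine how to handle such shaky images in HMM-based gesture recognition. Specifically, we prepare HMM output probabilities for each robot motion and switch between them according to the robot's motion. Evaluation experiments showed that switching models according to the robot's motion reduced errors by 79% compared with not switching, confirming the effectiveness of the proposed method.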

CiNii
Spoken Dialogue System Using Recognition of User's Feedback for Rhythmic Dialogue

Shinya Fujie, Riho Miyake, Tetsunori Kobayashi

Proc. Speech Prosody 2006 May 2006 [Refereed]

Role: Last author
MONEA: Message-Oriented Networked-Robot Architecture

Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Proc. International Conference on Robotics and Automation 194 - 199 April 2006 [Refereed]

Role: Last author
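MONEA:効率的多機能ロボット開発環境を実現するメッセージ指向ネットワークロボットアーキテクチャ [MONEA: A message-oriented networked-robot architecture for an efficient multifunctional robot development environment]

Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi

Journal of the Robotics Society of Japan 24 ( 4 ) 115 - 125 April 2006 [Refereed]

Role: Last author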
Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

Tetsuji Ogawa, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E89-D ( 3 ) 939 - 945 March 2006 [Refereed]

Role: Last author
Adaptive understanding of proposal-requesting expressions for conversational information retrieval system

Kenichiro Hosokawa, Shinya Fujie, Tetsunori Kobayashi

Systems and Computers in Japan 37 ( 14 ) 62 - 72 2006

Abstract

This paper considers a conversational system in which information is provided in accordance with the conditions presented by the user, and proposes a method that can adequately deal even with unknown expressions. In most conventional systems, the relation between the expression and the intention of the utterance by the user is built into the system beforehand. Thus, it is difficult to deal adequately with unknown expressions which have not been learned. We propose a framework which adaptively manages on-line the relation between the expression and the intention by interaction with the user. The proposed method produces a framework in which the connection between the expression and the intention is dynamically modified according to the explicitness or implicitness of the affirmative or negative attitude shown by the user to the proposal made by the system. It is verified by an evaluation experiment that the system can adequately learn the relation between the expression and the intention of the user by the proposed method, and can deal adequately with unknown expressions. © 2006 Wiley Periodicals, Inc.

DOI

Scopus
Manifold HLDA and its application to robust speech recognition

Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1551 - 1554 2006 [Refereed]

Abstract

A manifold heteroscedastic linear discriminant analysis (MHLDA), which explicitly removes environmental information from the information useful for discrimination, is proposed. Usually, a feature parameter used in pattern recognition contains both categorical information and environmental information. The well-known HLDA tries to extract useful information (UI) that represents categorical information from the feature parameter. However, environmental information still remains in the UI parameters extracted by HLDA, and it causes a slight degradation in performance. This is because HLDA does not handle the environmental information explicitly. The proposed MHLDA also tries to extract UI, as HLDA does, but it handles environmental information explicitly. This handling makes the MHLDA-based UI parameters less influenced by the environment. As a trade-off, however, MHLDA slightly degrades the categorical information. In this paper, we combine HLDA-based UI and MHLDA-based UI for pattern recognition to draw on the benefits of both parameters. Experimental results show the effectiveness of this combination.
Subspace-based age-group classification using facial images under various lighting conditions

Kazuya Ueki, Teruhide Hayashida, Tetsunori Kobayashi

PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION - PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE 43 - + 2006 [Refereed]

Abstract

This paper presents a framework of age-group classification using facial images under various lighting conditions. Our method is based on the appearance-based approach that projects images from the original image space into a face-subspace. We propose a two-phased approach (2DLDA+LDA), which is based on 2DPCA and LDA. Our experimental results show that the new 2DLDA+LDA-based approach improves classification accuracy more than the conventional PCA-based and LDA-based approaches. Moreover, the effectiveness of eliminating dimensions that do not contain important discriminative information is confirmed. The accuracy rates are 46.3%, 67.8% and 78.1% for age-groups that are in the 5-year, 10-year and 15-year range, respectively.
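韻律情報を用いたスペクトル変換方式の検討 [A study of a spectral conversion method using prosodic information]

Ryo Mochizuki, Masashi Okubo, Tetsunori Kobayashi

IEICE Transactions (Japanese Edition) J88-DII ( 11 ) 2269 - 2276 November 2005 [Refereed]

Role: Last author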
Optimizing the Structure of Partly-Hidden Markov Models Using Weighted Likelihood-Ratio Maximization Criterion

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 6th Annual Conf. of the Int'l Speech Communication Association 3353 - 3356 September 2005 [Refereed]

Role: Last author
Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system

Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

Proc. 6th Annual Conf. of the Int'l Speech Communication Association 889 - 892 September 2005 [Refereed]

Role: Last author
A Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation Using Reference Signal

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

14th European Signal Processing Conference 15 September 2005 [Refereed]

Role: Last author
Extension of Hidden Markov Models for Multiple Candidates and its Application to Gesture Recognition

Yosuke Sato, Tetsuji Ogawa and Tetsunori Kobayashi

Trans. on Information and Systems (ED) E88-D ( 6 ) 1239 - 1247 June 2005 [Refereed]

Role: Last author
Speech recognition in the blind condition based on multiple directivity patterns using a microphone array

Toshiyuki Sekiya, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2005 1373 - 1376 March 2005 [Refereed]

Role: Last author
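対話型情報検索システムにおける提案要求表現の適応的理解 [Adaptive understanding of proposal-requesting expressions for conversational information retrieval system]

Kenichiro Hosokawa, Shinya Fujie, Tetsunori Kobayashi

IEICE Transactions (Japanese Edition) J88-DII ( 3 ) 619 - 628 March 2005 [Refereed]

Role: Last author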
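肯定的/否定的発話態度の認識とその音声対話システムへの応用 [Recognition of positive/negative attitude in utterances and its application to a spoken dialogue system]

Shinya Fujie, Yasushi Ejiri, Hideaki Kikuchi, Tetsunori Kobayashi

IEICE Transactions (Japanese Edition) J88-DII ( 3 ) 488 - 498 March 2005 [Refereed]

Role: Last author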
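音声シフト：音高の意図的な変化を利用した音声入力インタフェース [Speech Shift: A speech input interface using intentional changes in voice pitch]

Yukihiro Omoto, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi

IEICE Transactions (Japanese Edition) J88-DII ( 3 ) 469 - 489 March 2005 [Refereed]

Role: Last author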
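心的態度表現に寄与する韻律/スペクトル包絡特徴の評価 [Evaluation of prosodic and spectral envelope features contributing to the expression of mental attitude]

Masashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

IEICE Transactions (Japanese Edition) J88-DII ( 2 ) 441 - 444 February 2005 [Refereed]

Role: Last author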
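人間形会話ロボット－パラ言語の生成・理解機能を持つマルチモーダルインタフェース－ [Humanoid conversation robot: A multimodal interface with functions for generating and understanding paralanguage]

Tetsunori Kobayashi, Shinya Fujie, Yosuke Matsusaka, Katsuhiko Shirai

Journal of the Acoustical Society of Japan 61 ( 2 ) 85 - 90 February 2005 [Refereed] [Invited]

Role: Lead author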

DOI CiNii
A Conversation Robot with Back-Channel Feedback Function based on Linguistic and Nonlinguistic Information

Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

Proc. International Conf. on Autonomous Robots and Agents 379 - 384 December 2004 [Refereed]

Role: Last author
Speech Spotter: On-demand Speech Recognition in Human-Human Conversation on the Telephone or in Face-to-Face Situations

Masataka Goto, Koji Kitayama, Katunobu Itou, Tetsunori Kobayashi

Proc. 5th Annual Conf. of the Int'l Speech Communication Association October 2004 [Refereed]

Role: Last author
Speech Recognition Interface for Music Information Retrieval: "Speech Completion" and "Speech Spotter"

Masataka Goto, Katunobu Itou, Koji Kitayama, Tetsunori Kobayashi

ISMIR2004 403 - 408 October 2004 [Refereed]

Role: Last author
Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot

Naoya Mochiki, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 5th Annual Conf. of the Int'l Speech Communication Association October 2004 [Refereed]

Role: Last author
Prosody based Attitude Recognition with Feature Selection and Its Application to Spoken Dialog System as Para-Linguistic Information

Shinya Fujie, Daizo Yagi, Hideaki Kikuchi, Tetsunori Kobayashi

Proc. 5th Annual Conf. of the Int'l Speech Communication Association 2841 - 2844 October 2004 [Refereed]

Role: Last author
A low-band spectrum envelope reconstruction method for PSOLA-based F0 modification

Ryo Mochizuki, Tetsunori Kobayashi

Trans. on Information and Systems (ED) E87-D ( 10 ) 2426 - 2429 October 2004 [Refereed]

Role: Last author
A Conversation Robot Using Head Gesture Recognition as Para-Linguistic Information

Shinya Fujie, Yasushi Ejiri, Kei Nakajima, Yosuke Matsusaka, Tetsunori Kobayashi

Proceedings of 13th IEEE International Workshop on Robot and Human Communication 159 - 164 September 2004 [Refereed]

Role: Last author
A Method of Gender Classification by Integrating Facial, Hairstyle, and Clothing Images

Kazuya Ueki, Hiromitsu Komatsu, Satoshi Imaizumi, Kenichi Kaneko, Satoshi Imaizumi, Nobuhiro Sekine, Jiro Katto, Tetsunori Kobayashi

Proc. Int'l Conf. on Pattern Recognition 446 - 449 August 2004 [Refereed]

Role: Last author
Design and Implementation of Data Sharing Architecture for Multifunctional Robot Development

Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi

Systems and Computers in Japan 35 ( 8 ) 54 - 65 July 2004 [Refereed]

Role: Last author
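部分隠れマルコフモデルにおける状態・出力間依存関係の拡張と連続音声認識への適用 [Extension of state-output dependencies in partly-hidden Markov models and its application to continuous speech recognition]

Tetsuji Ogawa, Tetsunori Kobayashi

The transactions of the Institute of Electronics, Information and Communication Engineers. D-II 87 ( 6 ) 1216 - 1223 June 2004 [Refereed]

Role: Last author

Abstract

We extended the mutual dependencies between states and outputs in the partly-hidden Markov model (PHMM) and applied the model to continuous speech recognition. The PHMM is a framework in which the output probabilities and state transition probabilities are determined by two state sequences: hidden states and observable states. In the conventional PHMM, the same hidden state and the same observable state were each used both to condition the output probability and to condition the state transition probability. Here, for the hidden state we use the same one to condition both the output probability and the state transition probability, whereas for the observable state we use different ones to condition the output probability and the state transition probability. This simple modification gives the model considerable freedom and yields a more accurate model of the stochastic process. We also constructed and examined the smoothed partly-hidden Markov model (SPHMM), a probabilistic model that integrates the HMM with the PHMM whose state-output dependencies are extended in this way. In continuous speech recognition experiments on read newspaper speech, the PHMM and SPHMM reduced errors by 10% and 24%, respectively, compared with the HMM, demonstrating the effectiveness of the proposed models.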

CiNii
Speech Enhancement based on Multiple Directivity patterns using a Microphone Array

Toshiyuki Sekiya, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2004 May 2004 [Refereed]

Role: Last author
A Low-band Spectrum Envelope Modeling For High Quality Pitch Modification

Ryo Mochizuki, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustics, Speech, and Signal Processing 2004 645 - 648 May 2004 [Refereed]

Role: Last author
Spoken Dialogue System Using Prosody as Para-Linguistic Information

Shinya Fujie, Daizo Yagi, Yosuke Matsusaka, Hideaki Kikuchi, Tetsunori Kobayashi

Proc. Int'l Conf. on Speech Prosody 2004 387 - 390 March 2004 [Refereed]

Role: Last author
Multi-Layer Audio Segregation and its Application to Double Talk

Toshiyuki Sekiya, Tomohiro Sawada, Tetsuji Ogawa, Tetsunori Kobayashi

SWIM (Lectures by Masters in Speech Processing) January 2004 [Refereed]

Role: Last author
Recognition of Para-Linguistic Information and Its Application to Spoken Dialogue System

Shinya FUJIE, Yasushi EJIRI, Yosuke MATSUSAKA, Hideaki KIKUCHI, Tetsunori KOBAYASHI

Proc. Automatic Speech Recognition and Understanding Workshop 231 - 236 2003年12月 [査読有り]

担当区分：最終著者
Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking

Noriyuki Murai, Tetsunori Kobayashi

Systems and Computers in Japan 34 ( 30 ) 103 - 111 2003年11月 [査読有り]

担当区分：最終著者
Speech Starter: Noise-Robust Endpoint Detection by Using Filled Pauses

Koji Kitayama, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi

Proc. 4th Annual Conf. of the Int'l Speech Communication Association 1237 - 1240 2003年09月 [査読有り]

担当区分：最終著者
車運転時における音声利用の心的負荷評価

宗近純一,松坂要佐,小林哲則

第2回情報科学技術フォーラム FIT2003 情報技術レターズ 2 105 - 106 2003年09月 [査読有り]

担当区分：最終著者
多機能ロボット開発のための情報共有アーキテクチャの設計と実装

松坂要佐, 於久健太郎, 小林哲則

電子情報通信学会論文誌 J86-D-I ( 5 ) 318 - 329 2003年05月 [査読有り]

担当区分：最終著者
Hybrid modeling of PHMM and HMM for speech recognition

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2003 140 - 143 2003年04月 [査読有り]

担当区分：最終著者
Inter-Module Cooperation Architecture for Interactive Robot

KyeongJu Kim, Yosuke Matsusaka, Tetsunori Kobayashi

International Conference on Intelligent Robots and Systems 2286 - 2291 2002年10月 [査読有り]

担当区分：最終著者
Generalization of State-Observation-Dependency in Partly Hidden Markov Models

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 3rd Annual Conf. of the Int'l Speech Communication Association 2673 - 2676 2002年09月 [査読有り]

担当区分：最終著者
System Architecture to Realize Widely Applicable and Interactive Behavior of the Robot

Yosuke Matsusaka, Tetsunori Kobayashi

Proc. Int'l Workshop on Lifelike Animated Agents -Tools, Affective Functions, and Applications- 77 - 82 2002年08月 [査読有り]

担当区分：最終著者
Media-Integrated Biometric Person Recognition Based on the Dempster-Shafer Theory

Yoshiaki Sugie, Tetsunori Kobayashi

Proc. Int'l Conf. on Pattern Recognition 2002 381 - 384 2002年08月 [査読有り]

担当区分：最終著者
Extension of Hidden Markov Models to Deal with Multiple Candidates of Observations and its Application to Mobile-robot-oriented Gesture Recognition

Yosuke Sato, Tetsunori Kobayashi

Proc. Int'l Conf. on Pattern Recognition 2002 515 - 519 2002年08月 [査読有り]

担当区分：最終著者
音声対話研究の現状と動向

小林哲則

人工知能学会誌 17 ( 3 ) 266 - 270 2002年05月 [査読有り] [招待有り]

担当区分：筆頭著者
Humanoid robots in waseda university—hadaly-2 and wabian

Shuji Hashimoto, Tetsunori Kobayashi, et al.

Autonomous Robots, Kluwer Academic Publishers 12 25 - 38 2002年 [査読有り]
System Software for Collaborative Development of Interactive Robot

Yosuke Matsusaka, Tetsunori Kobayashi

IEEE-RAS Int'l Conf. on Humanoid Robots 271 - 277 2001年11月 [査読有り]

担当区分：最終著者
Modeling of conversational strategy for the robot participating in the group conversation

Yosuke Matsusaka, Shinya Fujie, Tetsunori Kobayashi

Proc. 2nd Annual Conf. of the Int'l Speech Communication Association 2173 - 2176 2001年09月 [査読有り]

担当区分：最終著者
Estimating positions of multiple adjacent speakers based on MUSIC spectra correlation using a microphone array

Hidetomo Tanaka, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 2001 3045 - 3048 2001年05月 [査読有り]

担当区分：最終著者
日本語ディクテーション基本ソフトウェア(99年度版)

河原達也, 李晃伸, 小林哲則, 武田一哉, 峰松信明, 嵯峨山茂樹, 伊藤克亘, 伊藤彰則, 山本幹雄, 山田篤, 宇津呂武仁, 鹿野清宏

日本音響学会誌 57 ( 3 ) 210 - 214 2001年03月

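The Japanese Dictation Toolkit (日本語ディクテーション基本ソフトウェア) was designed and built as a common platform for research and development in large vocabulary continuous speech recognition (LVCSR). The platform consists of a standard recognition engine, Japanese acoustic models, Japanese language models, and tools for Japanese morphological analysis and reading assignment. The FY1999 version brings further gains in accuracy, speed, and vocabulary size. This paper describes its specifications and reports an evaluation of the component technologies on 20,000-word and 60,000-word dictation tasks. The toolkit is freely available to the public.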

DARPA音声プロジェクトと日本の音声認識研究

小林哲則

日本音響学会誌 57 ( 1 ) 70 - 60 2001年01月 [査読有り]

担当区分：筆頭著者
話者性と発話交替を考慮した複数話者対話音声の認識

村井則之, 小林哲則

電子情報通信学会論文誌DII J83-D-II ( 11 ) 2465 - 2742 2000年11月 [査読有り]

担当区分：最終著者
部分隠れマルコフモデルによる単語音声認識

古山純子, 小林哲則

電子情報通信学会論文誌DII J83-D-II ( 11 ) 2379 - 2387 2000年11月 [査読有り]

担当区分：最終著者
単語・クラス統計の融合と汎用コーパスの選択的利用に基づく小規模目的コーパスからの頑健な言語モデル作成法

和田陽介, 小林紀彦, 小林哲則

電子情報通信学会論文誌DII J83-D-II ( 11 ) 2379 - 2406 2000年11月 [査読有り]

担当区分：最終著者
部分隠れマルコフモデルとそのジェスチャ認識への応用

益満健, 小林哲則

情報処理学会論文誌 41 ( 11 ) 3060 - 2069 2000年11月 [査読有り]

担当区分：最終著者
Free software toolkit for Japanese large vocabulary continuous speech recognition

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, S.Sagayama, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. 1st Annual Conf. of the Int'l Speech Communication Association 476 - 479 2000年09月 [査読有り]
Dictation of multi-party conversation using statistical turn taking model and speaker models

Noriyuki Murai, Tetsunori Kobayashi

Proc. of International Conference on Acoustic, Speech, Signal Processing 1575 - 1578 2000年06月 [査読有り]

担当区分：最終著者
A conversational robot utilizing facial and body expressions

Tsuyoshi Tojo, Yosuke Matsusaka, Tomotada Ishii, Tetsunori Kobayashi

Proc. International Conf. on System, Man and Cybernetics 858 - 863 2000年06月 [査読有り]

担当区分：最終著者
日本語ディクテーション基本ソフトウェア : 98年度版

河原達也, 李晃伸, 小林哲則, 武田一哉, 峯松信明, 伊藤克亘, 山本幹雄, 山田篤, 宇津呂武仁, 鹿野清宏

日本音響学会誌 56 ( 4 ) 255 - 259 2000年04月 [査読有り]

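The Japanese Dictation Toolkit (日本語ディクテーション基本ソフトウェア) was designed and built as a common platform for research and development in large vocabulary continuous speech recognition (LVCSR). The platform consists of a standard recognition engine, Japanese acoustic models, Japanese language models, and tools for Japanese morphological analysis and reading assignment. In the FY1998 version, each module was substantially revised and improved. This paper describes its specifications and reports an evaluation of the component technologies on a 20,000-word dictation task. The toolkit is freely available to the public.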

Multi-person Conversation via Multi-modal Interface : A Robot who Communicate with Multi-user

Yosuke Matsusaka, Tsuyoshi Tojo, Sentaro Kubota, Kenji Furukawa, Daisuke Tamiya, Shinya Fujie, Tetsunori Kobayashi

Proc. European Conf. on Speech Communication and Technology 1723 - 1726 1999年09月 [査読有り]

担当区分：最終著者
Class-combined Word N-gram for Robust Language Modeling

Noriyuki Kobayashi, Tetsunori Kobayashi

Proc. European Conf. on Speech Communication and Technology 1599 - 1602 1999年09月 [査読有り]

担当区分：最終著者
Multi-person Conversation Robot using Multi-modal Interface

Yosuke Matsusaka, Tsuyoshi Tojo, Sentaro Kubota, Kenji Furukawa, Shinya Fujie, Tetsunori Kobayashi

Proc. SCI/ICAS'99 450 - 455 1999年07月 [査読有り]

担当区分：最終著者
Controlling Dialogue Strategy According to Performance of Processing

Hideaki Kikuchi, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. ESCA Workshop on Interactive Dialogue in Multi-Modal Systems 85 - 88 1999年06月 [査読有り]
Japanese dictation toolkit: 1997 version

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Ito, A.Itoh, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Journal of the Acoustical Society of Japan E 20 ( 3 ) 233 - 239 1999年05月 [査読有り]
JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research

K.Ito, M.Yamamoto, K.Takeda, T.Takezawa, T.Matsuoka, Tetsunori Kobayashi, K.Shikano and S.Itahashi

Journal of the Acoustical Society of Japan E 20 ( 3 ) 199 - 207 1999年05月 [査読有り]
大語彙連続音声認識における連鎖語の追加による語彙拡大の効果

和田陽介,小林紀彦,中野裕一郎,小林哲則

情報処理学会論文誌 40 ( 4 ) 1413 - 1420 1999年04月 [査読有り]

担当区分：最終著者
Partly Hidden Markov Model and its Application to Speech Recognition

T.Kobayashi, K.Masumitsu, J.Furuyama

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1999 121 - 124 1999年03月 [査読有り]

担当区分：筆頭著者
日本語ディクテーション基本ソフトウェア-1997年版

河原達也, 李晃伸, 小林哲則, 武田一哉, 峯松信明, 嵯峨山茂樹, 伊藤克亘, 伊藤彰則, 山本幹雄, 山田篤, 宇津呂武仁, 鹿野清宏

日本音響学会誌 55 ( 3 ) 175 - 180 1999年03月 [査読有り]
The Design of the Newspaper-Based Japanese Large Vocabulary Continuous Speech Recognition Corpus

K.Itoh, M.Yamamoto, K.Takezawa, T.Matsuoka, K.Shikano, T.Kobayashi, S.Itahashi

Proc. 5th Int'l Conf. on Spoken Language Processing 3261 - 3264 1998年12月 [査読有り]
Sharable software repository for Japanese large vocabulary continuous speech recognition

T.Kawahara, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. 5th Int'l Conf. on Spoken Language Processing 3257 - 3260 1998年12月 [査読有り]
Source-Extended Language Model for Large Vocabulary Continuous Speech Recognition

Tetsunori Kobayashi, Norihiko Kobayashi, Yosuke Wada

Proc. 5th Int'l Conf. on Spoken Language Processing 2431 - 2434 1998年12月 [査読有り]

担当区分：筆頭著者
Controlling Gaze of Humanoid in Communication with Human

H.Kikuchi,M.Yokoyama,K.Hoashi,Y.Hidaki,T.Kobayashi,K.Shirai

Proc.IROS98/IEEE 255 - 260 1998年10月 [査読有り]
Design and Development of Japanese Speech Corpus for Large Vocabulary Continuous Speech Recognition Assessment

K.Itou,K.Takeda,T.Takezawa,T.Matsuoka,K.Shikano,T.Kobayashi,S.Itahashi,M.Yamamoto

Proc. of First International Workshop on East-Asian Language Resources and Evaluation 98 - 103 1998年05月 [査読有り]
Common Platform of Japanese Large Vocabulary Continuous Speech Recognizer—Proposal and Initial Results

T.Kawahara, A.Lee, T.Kobayashi, K.Takeda, N.Minematsu, K.Itou, A.Ito, M.Yamamoto, A.Yamada, T.Utsuro, K.Shikano

Proc. of First International Workshop on East-Asian Language Resources and Evaluation 117 - 122 1998年05月 [査読有り]
花開く音声処理技術

白井克彦,小林哲則,工藤育男

情報処理学会誌 38 ( 11 ) 971 - 975 1997年10月 [査読有り] [招待有り]
ヒューマノイド —人間形高度情報処理ロボット—

橋本周司,成田誠之助,白井克彦,小林哲則,高西淳夫,菅野重樹,笠原博徳

情報処理学会誌 38 ( 11 ) 959 - 969 1997年10月 [査読有り]
Humanoid Robot ---Development of an Information Assistant Robot Hadaly---

S. Hashimoto, S. Narita, H. Kasahara, A. Takanishi, S. Sugano, K. Shirai, T. Kobayashi, H. Takanobu, T. Kurata, K. Fujiwara, T. Matsuno, T. Kawasaki, K. Hoashi

Proc. Int'l Workshop on Robot and Human Communication 106 - 111 1997年09月 [査読有り]
Development of ASJ Continuous Speech Corpus --- Japanese Newspaper Article Sentences (JNAS) ---

Shuichi ITAHASHI, Mikio YAMAMOTO, Toshiyuki TAKEZAWA, Tetsunori KOBAYASHI

Proc. Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques 1997年09月 [査読有り]
Partly Hidden Markov Model and its Application to Gesture Recognition

Tetsunori Kobayashi, Satoshi Haruyama

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1997 3081 - 3084 1997年04月 [査読有り]

担当区分：筆頭著者
Human interface of the humanoids

Tetsunori Kobayashi

Proc. International Workshop on Human Interface Technology 63 1997年03月 [査読有り]

担当区分：筆頭著者
マルチモーダル入力環境下における音声の協調的利用 : 音声作図システムS-tgifの設計と評価

西本卓也, 志田修利, 小林哲則, 白井克彦

電子情報通信学会論文誌. D-II, 情報・システム, II-情報処理 79 ( 12 ) 2176 - 2183 1996年12月 [査読有り]

担当区分：責任著者

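This study examines how speech input can help improve an interface within a multimodal framework, and applies the findings to build and evaluate S-tgif, a multimodal drawing system. In building the system, general interface principles were followed: the strengths of speech (operability and ease of recalling procedures) were exploited, while its weaknesses (ease of understanding the system state and robustness) were compensated for by other means. The evaluation showed that speech was especially effective shortly after users began using the system or after a break in use, reducing the time needed to complete a task to about 80%. As users became skilled, the objective benefit of speech diminished, but the rate of speech use exceeded 90% for certain commands and speech also received high subjective ratings, showing that users favoured speech input. These results show that a useful interface can be built by considering the effective use of speech in line with interface principles.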

ALICE : Acquisition of Language In Conversational Environment : An Approach to Weakly Supervised Training of Spoken Language System for Language Porting

Tetsunori Kobayashi

Proc. 4th Int'l Conf. on Spoken Language Processing 833 - 836 1996年10月 [査読有り]

担当区分：筆頭著者
An application of Dempster and Shafer's probability theory to speech recognition

Tetsunori Kobayashi

The Journal of the Acoustical Society of America 100 ( 4 Pt.2 ) 2757 1996年10月 [査読有り]

担当区分：筆頭著者
Speech recognition in nonstationary noise based on parallel HMMs and spectral subtraction

Ryuji Mine, Tetsunori Kobayashi, Katsuhiko Shirai

Systems and Computers in Japan 27 ( 14 ) 37 - 44 1996年 [査読有り]

担当区分：責任著者

This paper proposes a method of speech recognition in a nonstationary noisy environment that combines parallel HMMs and spectral subtraction. In the proposed method, a set of hypotheses is generated, by a series of subtraction processes, about the combinations of speech and noise that could produce the observed data. The probabilities of occurrence are then calculated using HMMs prepared separately for the speech and the noise. The experimental task is 100-word recognition in the noisy environment of an ordinary car running in an urban area. The proposed method is compared experimentally with the ordinary spectral subtraction method and other parallel HMM methods, and its effectiveness is verified.

被引用数 (Scopus): 2
Improving human interface in drawing tool using speech, mouse and key-board

Takuya Nishimoto, Nobutoshi Shida, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. International Workshop on Robot and Human Communication 107 - 112 1995年 [査読有り]

担当区分：責任著者
Phoneme recognition in various styles of utterance based on mutual information criterion

Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. 3rd Int'l Conf. on Spoken Language Processing 1911 - 1917 1994年09月 [査読有り]
Multimodal drawing tool using speech, mouse and key-board

T.Nishimoto, N.Shida, T.Kobayashi, K.Shirai

Proc. 3rd Int'l Conf. on Spoken Language Processing 1287 - 1290 1994年09月 [査読有り]

担当区分：責任著者
Generation of prosody in speech synthesis using large speech data-base

Naohiro Sakurai, Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. 3rd Int'l Conf. on Spoken Language Processing 747 - 750 1994年09月 [査読有り]

担当区分：責任著者
情報量を基準とした音韻性抽出に基づく連続音声中の音韻認識

大川茂樹,小林哲則,白井克彦

日本音響学会誌 50 ( 9 ) 702 - 710 1994年09月 [査読有り]
音声対話インターフェースにおける発話権管理による割り込みへの対処

菊池英明, 工藤育男, 小林哲則, 白井克彦

電子情報通信学会論文誌 J77-D ( 8 ) 1502 - 1511 1994年08月 [査読有り]
並列HMMとスペクトルサブトラクションによる非定常雑音環境下における音声認識

嶺竜治, 小林哲則, 白井克彦

電子情報通信学会誌 DII J78-DII ( 7 ) 1021 - 1027 1994年07月 [査読有り]

担当区分：責任著者
対話音声の認識技術

小林哲則

日本音響学会誌 50 ( 7 ) 563 - 567 1994年07月 [査読有り] [招待有り]

担当区分：筆頭著者
Characterization of fluctuations in fundamental periods of speech based on fractal analysis

Tetsunori Kobayashi

The Journal of the Acoustical Society of America 95 ( 5 Pt.2 ) 2824 1994年05月 [査読有り]

担当区分：筆頭著者
Automatic training of phoneme dictionary based on mutual information criterion

Shigeki Okawa, Tetsunori Kobayashi, Katsuhiko Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1994 241 - 244 1994年04月 [査読有り]
Markov model based noise modeling and its application to noisy speech recognition using dynamical features of speech

Tetsunori Kobayashi, Ryuji Mine, Katsuhiko Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1994 57 - 60 1994年04月 [査読有り]

担当区分：筆頭著者
音素群対確率比空間における音素認識

小林哲則, 浜野康和, 安成雨, 白井克彦

電子情報通信学会論文誌 J77-A ( 2 ) 128 - 134 1994年02月 [査読有り]

担当区分：筆頭著者
Speech synthesis of Japanese sentences using large waveform data-base

Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

1993 International Workshop on Speech Processing 95 - 100 1993年11月 [査読有り]
Word spotting in conversational speech based on phonemic likelihood by mutual information criterion

S.Okawa, T.Kobayashi, K.Shirai

Proc. European Conf. on Speech Communication and Technology 1281 - 1284 1993年09月 [査読有り]
Speech recognition under the unstationary noise based on the noise markov model and spectral subtraction

T.Kobayashi, R.Mine, K.Shirai

Proc. European Conf. on Speech Communication and Technology 833 - 836 1993年09月 [査読有り]

担当区分：筆頭著者
隠れマルコフモデルに基づく音声認識

小林哲則

電気学会論文誌 C 電子・情報・システム部門誌 113 ( 5 ) 295 - 301 1993年05月 [査読有り] [招待有り]

担当区分：筆頭著者

Design and creation of speech and text corpora of dialogue

Satoru Hayamizu, Shuichi Itahashi, Tetsunori Kobayashi, Toshiyuki Takezawa

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E76-A ( 1 ) 17 - 22 1993年 [査読有り]
Phrase recognition in conversational speech using prosodic and phonemic information

Shigeki Okawa, Takashi Endo, Tetsunori Kobayashi, Katsuhiko Shirai

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E76-A ( 1 ) 44 - 50 1993年 [査読有り]
High quality synthetic speech generation using synchronized oscillators

Kenji Hashimoto, Takemi Mochida, Yasuaki Satoh, Tetsunori Kobayashi, Katsuhiko Shirai

Trans. on Fundamentals of Electronics, Communications and Computer Sciences (EA) E76-A ( 11 ) 1949 - 1956 1993年 [査読有り]
日本音響学会研究用連続音声データベース

小林哲則, 板橋秀一, 速水悟, 竹沢俊幸

日本音響学会誌 48 ( 12 ) 888 - 893 1992年12月 [査読有り]

担当区分：筆頭著者
Spectral mapping onto probabilistic domain using neural networks and its application to speaker adaptive phoneme recognition

T.Kobayashi

Proc. 2nd Int'l Conf. on Spoken Language Processing 385 - 388 1992年11月 [査読有り]

担当区分：筆頭著者
Speaker adaptive phoneme recognition based on spectral mapping to probabilistic domain

T.Kobayashi, Y.Uchiyama, J.Osada, K.Shirai

Proc. of International Conference on Acoustics, Speech and Signal Processing 457 - 460 1992年03月 [査読有り]

担当区分：筆頭著者
Fractal dimension of fluctuations in fundamental period of speech

K.Shirai, T.Kobayashi, M.Yagyu

Proc. of International Conference on Noise in Physical Systems and 1/f Fluctuations 1991年11月 [査読有り]

担当区分：責任著者
音声生成過程の可視化と色表示について

白井克彦, 小林哲則

可視化情報学会誌 11 ( 43 ) 216 - 221 1991年10月 [査読有り]
合成音の自然性に対する基本周期揺らぎの役割

小林哲則, 関根英敏

日本音響学会誌 47 ( 8 ) 539 - 544 1991年08月 [査読有り]

担当区分：筆頭著者
Estimation of articulatory motion using neural networks

Katsuhiko Shirai, Tetsunori Kobayashi

Journal of Phonetics 19 379 - 385 1991年08月 [査読有り]

担当区分：責任著者
Analysis of contextual dependency of phonetic features and its application to speech recognition

T.Kobayashi, K.Watanabe, Y.Uchiyama

Proc. Korea-Japan joint workshop on advanced technology of speech recognition and synthesis 92 - 97 1991年07月 [査読有り]

担当区分：筆頭著者
Application of neural networks to articulatory motion estimation

T.Kobayashi, M.Yagyu, K.Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1991 489 - 492 1991年05月 [査読有り]

担当区分：筆頭著者
母音および無声破裂子音スペクトルの音韻環境依存性

小林哲則, 渡辺一博, 松田俊幸

電子情報通信学会論文誌A J74-A ( 3 ) 353 - 359 1991年 [査読有り]

担当区分：筆頭著者
Statistical properties of fluctuation of pitch intervals and its modeling for natural synthetic speech

T.Kobayashi, H.Sekine

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1990 321 - 324 1990年 [査読有り]

担当区分：筆頭著者
Dependence of phonemic feature on context

T.Kobayashi, K.Watanabe

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1990 769 - 772 1990年 [査読有り]

担当区分：筆頭著者
範疇間の依存関係を扱う数量化理論とその音韻スペクトル変形のモデル化に対する応用

小林哲則, 松田俊幸, 渡辺一博

電子情報通信学会論文誌A J74-A ( 3 ) 345 - 352 1990年 [査読有り]

担当区分：筆頭著者
Contextual Factor Analysis of Vowel Distribution

Tetsunori Kobayashi, Toshiyuki Matsuda, Kazuhiro Watanabe

Proc. European Conf. on Speech Communication and Technology 2277 - 2280 1989年 [査読有り]

担当区分：筆頭著者
A categorical factor analysis of vowel distribution based on the modified qualification theory

Tetsunori Kobayashi, Toshiyuki Matsuda

The Journal of the Acoustical Society of America 1988年 [査読有り]

担当区分：筆頭著者
Speech Production Model and Automatic Recognition

Katsuhiko Shirai, Tetsunori Kobayashi

Nature, Cognition and System I 3 - 14 1988年 [査読有り]
Description of Task Dependent Knowledge for Speech Understanding System

Tetsunori Kobayashi, Katsuhiko Shirai

European Conference on Speech Technology 1987年 [査読有り]

担当区分：筆頭著者
The robot musician ‘WABOT-2’ (Waseda robot-2)

Ichiro Kato, Sadamu Ohteru, Katsuhiko Shirai, Toshiaki Matsushima, Seinosuke Narita, Shigeki Sugano, Tetsunori Kobayashi, Eizo Fujisawa

Robotics 3 ( 2 ) 143 - 155 1987年 [査読有り]
A network model dealing with focus of conversation for speech understanding system

Tetsunori Kobayashi, Katsuhiko Shirai

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1986 1589 - 1592 1986年 [査読有り]

担当区分：筆頭著者
Estimation of articulatory parameters by table look-up method and its application for speaker independent phoneme recognition

Katsuhiko Shirai, Tetsunori Kobayashi, Jun Yazawa

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1986 2247 - 2250 1986年 [査読有り]

担当区分：責任著者
Estimating articulatory motion from speech wave

Katsuhiko Shirai, Tetsunori Kobayashi

Speech Communication 5 ( 2 ) 159 - 170 1986年 [査読有り]

If articulatory movements can be estimated, then the articulatory parameters which represent the motion of the articulatory organs would be useful for speech recognition. This paper discusses an effective method of estimating articulatory movements and its application to speech recognition. First, a method of estimating articulatory parameters known as the model matching method is described, and various spectral distance measures are evaluated for this method. The results show that the best on average is the higher-order cepstral distance measure, which is one of the peak-weighted measures. Second, articulatory parameters are utilized for the recognition of vowels uttered by unspecified speakers. It is shown that adapting the model by the estimated mean vocal tract length is effective in normalizing speaker differences. Third, the motor commands that move the articulatory organs are estimated considering articulatory dynamics, and continuous vowels are recognized by means of these estimated commands. It has been found that a considerable part of the coarticulation effects can be compensated for by this command estimation, and the method is useful for continuous speech recognition. © 1986.

被引用数 (Scopus): 28
モデル・マッチング法による調音状態推定のためのスペクトル距離尺度の検討

小林哲則, 矢沢淳, 白井克彦

電子通信学会論文誌A J68-A ( 2 ) 210 - 217 1985年10月 [査読有り]

担当区分：筆頭著者
ロボットとの柔軟な対話を目的とした音声入出力システム

白井克彦, 小林哲則, 岩田和彦, 深沢克夫

日本ロボット学会誌 3 ( 4 ) 362 - 372 1985年08月 [査読有り]

担当区分：責任著者
大語彙を対象とした文節音声の認識

小林哲則, 小森康弘, 白井克彦

電子通信学会論文誌D J68-D ( 6 ) 1304 - 1311 1985年06月 [査読有り]

担当区分：筆頭著者

Speech conversation system of the musician robot

Tetsunori Kobayashi, Y. Komori, N. Hashimoto, Kazuhiko Iwata, Y. Fukazawa, K. Shirai

Proc. ICAR'85 483 - 488 1985年 [査読有り]

担当区分：筆頭著者
Speech I/O System Realizing Flexible Conversation for Robot--The Conversational System of WABOT-2

Katsuhiko Shirai, Tetsunori Kobayashi, Kazuhiko Iwata, Yoshio Fukazawa

Bulletin of Science and Engineering Research Laboratory, Waseda University 112 53 - 79 1985年 [査読有り]

担当区分：責任著者
調音制御モデルに基づく連続音声中の母音認識

小林哲則, 白井克彦

電子通信学会論文誌A J67-A ( 10 ) 935 - 942 1984年10月 [査読有り]

担当区分：筆頭著者

Phrase speech recognition of large vocabulary using feature in articulatory domain

Katsuhiko Shirai, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1984 409 - 412 1984年 [査読有り]

担当区分：責任著者
Considerations on articulatory dynamics for continuous speech recognition

Katsuhiko Shirai, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1983 324 - 327 1983年 [査読有り]

担当区分：責任著者
不特定話者の連続音声認識に対する調音パラメタの有効性

白井克彦, 松浦博, 小林哲則

電子通信学会論文誌A J65-A ( 7 ) 671 - 678 1982年07月 [査読有り]

Recognition of semivowels and consonants in continuous speech using articulatory parameters

Katsuhiko Shirai, Tetsunori Kobayashi

Proc. Int'l Conf. on Acoustic, Speech, Signal Processing 1982 2004 - 2007 1982年 [査読有り]

担当区分：責任著者

書籍等出版物

Paralinguistic Information and its Integration in Spoken Dialogue Systems

Ramón López-Cózar Delgado, Tetsunori Kobayashi( 担当：共編者(共編著者))

Springer 2011年 ISBN: 9781461413349
音声言語処理の潮流

白井克彦( 担当：分担執筆)

コロナ社 2010年03月 ISBN: 9784339008104
韻律と音声言語情報処理 : アクセント・イントネーション・リズムの科学

広瀬啓吉( 担当：分担執筆)

丸善 2006年01月 ISBN: 4621076744
情報システムとヒューマンインターフェース

白井克彦( 担当：分担執筆)

早稲田大学出版部 2010年03月 ISBN: 9784657101099
Springer handbook of robotics

Siciliano, Bruno, Khatib, Oussama( 担当：分担執筆)

Springer 2008年 ISBN: 9783540239574
人工知能学事典

人工知能学会( 担当：分担執筆)

共立出版 2005年12月 ISBN: 4320121074
ロボット工学ハンドブック

日本ロボット学会( 担当：分担執筆)

コロナ社 2005年06月 ISBN: 9784339045765
Spoken language systems

Seiichi Nakagawa, Michio Okada, Tatsuya Kawahara( 担当：分担執筆)

Ohmsha,IOS 2005年 ISBN: 1586035150
感性情報学 : 感じる・楽しむ・創りだす : 感性的ヒューマンインタフェース最前線

原島博, 井口征士, 乾敏郎( 担当：分担執筆)

工作舎 2004年05月 ISBN: 4875023782
マルチメディア処理入門

新田恒雄, 岡村好庸, 杉浦彰彦, 小林哲則, 金沢靖, 山本真司( 担当：共著)

朝倉書店 2002年04月 ISBN: 4254205074
人間型ロボットのはなし

早稲田大学ヒューマノイドプロジェクト( 担当：共著)

日刊工業新聞社 1999年06月 ISBN: 4526043974

Cで学ぶプログラミング技法

小林哲則( 担当：単著)

培風館 1997年11月 ISBN: 4563013951
Recent research towards advanced man-machine interface through spoken language.

Hiroya Fujisaki( 担当：分担執筆)

Elsevier 1996年10月 ISBN: 0444816070

International Symposium on Spoken Dialogue : New directions in human and man-machine communication

Katsuhiko Shirai, Tetsunori Kobayashi, Yasunari Harada( 担当：共編者(共編著者))

ISSD Organizing Committee 1993年11月 ISBN: 4990026918

講演・口頭発表等

会話のできるロボットと身体を持った会話システム

小林哲則 [招待有り]

日本音響学会2025年春季研究発表会, スペシャルセッション「音声対話技術の新展開1」

発表年月： 2025年03月
HANASHI-JOZU = A Good Conversationalist: Goodbye Request-response Model, Hello Pre-planned Information Transfer Model.

Tetsunori Kobayashi [招待有り]

Multimodal Agents for Ageing and Multicultural Societies, NII Shonan meeting

発表年月： 2018年10月
A robot-based enjoyable conversation system

Tetsunori Kobayashi [招待有り]

5-th ASA-ASJ Joint meeting, Nov.2016

発表年月： 2016年11月
A robot-based approach towards finding conversation protocol: the role of prosody, eye gaze and body expressions in communication

Tetsunori Kobayashi [招待有り]

The Fifth Workshop of Eye Gaze in Intelligent Human Machine Interaction

発表年月： 2013年
Robot as a multimodal human interface device

Tetsunori Kobayashi [招待有り]

International Conference on Auditory-Visual Speech Processing

発表年月： 2010年10月
Conversation robot recognizing and expressing paralinguistic information

Tetsunori Kobayashi [招待有り]

Workshop on Predictive Models of Human Communication Dynamics

発表年月： 2010年08月
情報遭遇型会話システム：多様な情報行動による知識の伝達

小林哲則 [招待有り]

人工知能学会/情報処理学会/電子情報通信学会第7回対話システムシンポジウム

発表年月： 2016年10月
会話とロボット

[招待有り]

MMDAgent DAY !

発表年月： 2016年10月
Enjoyable Conversation System

Tetsunori Kobayashi [招待有り]

InterACT 2016

発表年月： 2016年07月
会話ロボットとそのプロトコル

小林哲則 [招待有り]

日本音響学会春季研究発表会

発表年月： 2016年03月
会話向け音声合成システム

小林哲則, 岩田和彦 [招待有り]

電子情報通信学会・音声研究会

発表年月： 2014年11月
History of the Conversational Robot

Tetsunori Kobayashi [招待有り]

International Workshop on Spoken Dialogue System

発表年月： 2010年10月
マルチモーダル会話ロボットとグループコミュニケーション

小林哲則 [招待有り]

電子情報通信学会 VNV研究会

発表年月： 2009年03月
音声認識応用システム開発の新パラダイム

小林哲則 [招待有り]

情報処理学会/電子情報通信学会, 第10回音声言語シンポジウム

発表年月： 2008年12月
パラ言語の理解・生成機能によるリズムある対話コミュニケーションの実現

小林哲則 [招待有り]

日本ロボット学会・ロボット工学セミナー

発表年月： 2006年03月
パラ言語の理解・生成能力を有する会話ロボット

小林哲則 [招待有り]

電子情報通信学会・パターン認識メディア理解研究会

発表年月： 2005年09月
ロボット頭部に設置したマイクロホンによる音源定位・音源分離

[招待有り]

日本音響学会・春季研究発表会

発表年月： 2005年03月
音声認識技術の現状と課題

小林哲則 [招待有り]

電子情報通信学会音声実用化シンポジウム

発表年月： 2004年03月
会話ロボットの実現に向けて

小林哲則 [招待有り]

電子情報通信学会・ヒューマンコミュニケーション基礎(HCS)研究会

発表年月： 2003年04月
ヒューマノイドロボットにおけるマルチモーダル会話インタフェース

小林哲則, 白井克彦 [招待有り]

第1回音声言語シンポジウム（音声研究会,言語理解とコミュニケーション研究会共催）

発表年月： 1999年12月
Trend of Stochastic Speech Recognition

Tetsunori Kobayashi [招待有り]

ISCIE Stochastic System Symposium

発表年月： 1997年11月

共同研究・競争的資金等の研究課題

音声メディアを利用した情報伝達における相互行為の時間構造的特徴と伝達効率の関係

日本学術振興会科学研究費助成事業

研究期間:

2018年04月

-

2022年03月

小林哲則, 藤江真也, 森大毅, 徳田恵一

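To convey a coherent body of information efficiently through speech, it is important to bring conversational elements into information delivery and to maintain the rhythm of the interaction. This study adopted as its base system a conversational information delivery system that summarizes a document, splits it into fragments, and conveys them one by one while continuously monitoring the user's reactions, restoring and presenting information omitted during summarization in response to those reactions. Constraints on the temporal structure of conversational interaction were modelled and built into this system. As key component technologies, low-latency speech recognition, expressive speech synthesis, and paralinguistic understanding were addressed, and original, high-performance techniques were realized.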
自然なタイミングと感性で応答する使いやすい音声対話サービス事業

NEDO イノベーション実用化ベンチャー支援事業

研究期間:

2013年04月

-

2014年03月
情報家電センサー・ヒューマンインターフェイスデバイス活用技術開発＜音声認識基盤技術の開発＞に係るもの

経済産業省戦略的技術開発委託費

研究期間:

2006年

-

2009年
人物行動パターン自動解析装置の開発

NEDO, 大学発事業創出実用化開発費助成金

研究期間:

2003年03月

-

2004年03月
韻律制御に主体をおいた対話システム

日本学術振興会科学研究費助成事業

研究期間:

2000年

-

2003年

小林哲則, 中川聖一, 菊池英明, 白井克彦, 匂坂芳典, 甲斐充彦

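The results for this fiscal year are as follows.
a) Dialogue rhythm and prosody control
Building on results up to the previous fiscal year, a statistical model was trained on accent-phrase-level parameters of prosodic information for the task of detecting topic boundaries in dialogue, and it was confirmed that discrimination accuracy comparable to that of humans is obtained on open data as well. (Shirai, Kikuchi)
A method was developed that uses prosodic information and surface linguistic information to decide when the system should produce back-channel responses and when speakers should change, both of which are important for building a natural dialogue system. The method was implemented in a chat dialogue system on the topic of weather forecasts, and its usefulness was confirmed through subjective evaluation by participants who talked with the system. (Nakagawa)
b) Application to dialogue speech understanding
Based on a statistical analysis of the characteristics of repeated correction utterances in dialogue speech, the combined use of phrase-level prosodic features and its application to detecting correction utterances were evaluated. For robust dialogue speech understanding in combination with these, the analysis and modelling of the prosodic characteristics of fillers were also studied. (Kai)
c) Application to dialogue speech synthesis
The prosodic markedness of vocabulary was analysed from both the production and the perception side using adverbs of degree, and prosodic emphasis control for generating natural conversational speech was realized. A speech-rate control model based on a statistical computational model was built, the analysis of local speech rate in conversational speech was advanced, and flexible control of speech rate was made possible. The influence of prosody control parameters on the naturalness of synthesized speech was also investigated. (Sagisaka)
d) Dialogue system
The above results were brought together and implemented as a dialogue system. In particular, recognition and generation systems for facial expression and for vocal expression were integrated on the dialogue platform developed up to the previous fiscal year, yielding a rhythmic dialogue system that can exchange paralinguistic information. (Kobayashi)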
高齢者におけるSNS利用障壁の調査とその改善に関する研究

日本学術振興会科学研究費助成事業挑戦的萌芽研究

研究期間:

2013年04月

-

2016年03月

小林哲則, 中野鐵兵

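A framework and interface were proposed, designed, and implemented that allow elderly people, whose ties with society tend to weaken, to enjoy communication, mainly with family members, by receiving content shared through various social media, and its effectiveness was verified. Difficulty in reaching media and in operating them, one of the barriers that keep elderly people from using social media, was addressed through a framework for receiving support from a third party, a simplified interface, and visualized and unified operations. The effectiveness of the proposed approach was verified through a questionnaire survey of 100 elderly people, an interface operation experiment with 31 elderly people, and evaluation experiments of two to three months with three groups of elderly people.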
リズムある会話を可能とするコミュニケーションロボットに関する研究

日本学術振興会科学研究費助成事業基盤研究(B)

研究期間:

2008年

-

2010年

小林哲則, 藤江真也, 小川哲司, 高西敦夫, 松山洋一, 岩田和彦

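By advancing the generation and understanding of linguistic and paralinguistic information, a communication robot that can converse with several people in a natural rhythm was realized. The robot was also used in an attempt to enliven conversation between people. To this end, robot hardware that takes into account personality assignment and paralinguistic expression, robot behaviour suited to the conversational situation, and methods for conducting engaging conversation were designed. Improvements to the robot's auditory functions and speaking style were also examined.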
実体モデルに基づく声質生成機構の構築

日本学術振興会科学研究費助成事業

研究期間:

2007年

-

2010年

誉田雅彰, 高西淳夫, 小林哲則, 匂坂芳典, 福井孝太郎

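The study aimed to build a mechanism for generating and controlling voice quality based on a mechanical physical model (a talking robot) that imitates the human phonation and speech mechanism, and to clarify constructively how voice quality is generated. The results showed that, by adjusting the larynx of the human-like physical model, a variety of voice qualities such as breathy voice and vocal fry, as well as affective vocal information such as laughter and speech-laughs, can be reproduced. The study also clarified the vocal fold vibration and aeroacoustic phenomena that occur when these voices are produced.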
音声認識技術実用化に向けた先導研究

NEDO 高度情報通信機器・デバイス基盤プログラム

研究期間:

2005年06月

-

2006年03月
言語・パラ言語の生成・理解能力を有する会話システムに関する研究

日本学術振興会科学研究費助成事業基盤研究(B)

研究期間:

2003年

-

2006年

小林哲則, 藤江真也, 小川哲司, 松坂要佐

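As a tool for exploring the requirements for natural spoken dialogue communication, a prototype spoken dialogue system was built that can understand and express paralinguistic information in addition to linguistic information.
Despite remarkable recent progress in speech recognition and synthesis, no spoken dialogue system is natural enough to suggest that it could be widely accepted by general users. One cause is the neglect of paralinguistic information (information in facial and vocal expression that conveys the speaker's internal and mental state), which plays an important role in dialogue. However, there is almost no quantitative knowledge about paralanguage of the kind needed to build a dialogue system that handles language and paralanguage in a balanced way. In this study, various component technologies were developed as tools for quantitatively clarifying the role of paralanguage and were assembled into a spoken dialogue system.
Specifically, the following were realized: 1) a method of sound source localization and separation suited to robots was proposed, using four directivity channels installed on the robot head; 2) speech synthesis capable of expressing paralanguage was examined, a technique for improving synthesis quality in waveform-based synthesis was proposed, and a high-quality voice conversion method was proposed; 3) taking speaking attitude and back-channel or ask-back responses as the paralinguistic information contained in speech, a method for recognizing them from the prosodic information in utterances was proposed; 4) a method for recognizing paralinguistic information contained in visual information such as head gestures and facial expressions was proposed; 5) the humanoid robot ROBISUKE was designed and built as a platform for the spoken dialogue system; 6) MONEA was proposed as an information-sharing framework that organically integrates the understanding and generation functions of each module, and communication middleware was implemented.
These results will be used in future quantitative experiments to clarify the requirements for natural spoken dialogue.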
状態・出力に相互依存性を有する確率モデルに基づく高精度な音声・ジェスチャ認識

日本学術振興会科学研究費助成事業基盤研究(C)

研究期間:

2000年

-

2002年

小林哲則

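This study proposed probabilistic models with greater expressive power to replace the hidden Markov model (HMM), the probabilistic model usually used for time-series pattern recognition, and attempted to use them to realize high-accuracy continuous speech recognition and gesture recognition systems. The specific results are as follows.
(1) Introduction of the partly hidden Markov model (PHMM)
The PHMM was proposed with the aim of handling complex temporal changes in features inside the probabilistic model. Compared with the HMM, the PHMM is better at representing transient portions, and it differs from other prior work in that not only the output probability but also the state transition probability depends on past outputs. Simulation experiments confirmed that the PHMM outperforms the HMM in representing transient portions and in the stability of the likelihood for the correct category. In gesture recognition and isolated word recognition, the PHMM outperformed the HMM regardless of conditions. In continuous speech recognition using features that include delta coefficients, however, it fell below the HMM.
(2) Extension of the state-output interdependence in the PHMM
Conventionally, a common pair of hidden state and observable state was used to determine both the output probability and the state transition probability. The dependency between states and outputs in the PHMM was extended so that a common hidden state is still used, while different observable states are used for determining the output and for determining the state transition. This lets the PHMM handle output dependencies between frames farther apart than in the conventional framework. In continuous speech recognition experiments, the extended model greatly improved on the conventional PHMM and the HMM, and resolved the problem seen with the conventional PHMM of falling below the HMM when features that include delta coefficients are used.
(3) Smoothing of probabilities in the PHMM
The smoothed partly hidden Markov model (SPHMM) was proposed as a new probabilistic framework based on the PHMM. The SPHMM interpolates and smooths the higher-order probabilities of the PHMM with the lower-order probabilities used in the HMM, aiming to combine the precision of the PHMM with the reliability of the HMM. Because it accepts as correct only word hypotheses to which both the PHMM and the HMM give high scores, it is also considered effective for reducing recognition errors. In continuous speech recognition experiments, the SPHMM, with the smoothing weight set appropriately, outperformed both the HMM and the PHMM, confirming its effectiveness. Given its structure of smoothing the PHMM's higher-order probabilities with the HMM's lower-order probabilities, it was found to be particularly effective when higher-order features are used.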
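The smoothing idea in (3) can be illustrated with a minimal sketch. The source does not state the interpolation formula, so the log-linear (weighted geometric) form below is an assumption, chosen because it favours only hypotheses that both models score highly. The function name `smoothed_log_score` and all example scores are hypothetical.

```python
def smoothed_log_score(log_p_phmm: float, log_p_hmm: float, weight: float) -> float:
    """Interpolate PHMM and HMM log-probabilities with a smoothing weight.

    Assumed form (not taken from the source):
        log p = weight * log p_PHMM + (1 - weight) * log p_HMM
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * log_p_phmm + (1.0 - weight) * log_p_hmm


# Hypothetical word hypotheses: (PHMM log-probability, HMM log-probability).
hypotheses = {
    "both_models_agree": (-1.0, -1.0),
    "phmm_only": (-0.5, -6.0),
}

# Pick the hypothesis with the best smoothed score at an assumed weight of 0.5.
best = max(hypotheses, key=lambda w: smoothed_log_score(*hypotheses[w], 0.5))
print(best)
```

With `weight` set to 1 the score reduces to the PHMM alone and with 0 to the HMM alone, so the weight plays the role of the smoothing weight that the summary says must be tuned.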
知覚・行動による実世界とのインタラクションに基づく言語の理解・獲得と行動生成

日本学術振興会科学研究費助成事業萌芽的研究

研究期間:

1998年

-

1999年

小林哲則

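In this study, a robot with perceptual functions was taken as the thinking agent and placed in our living environment, creating a situation in which it can interact with the real world, and an attempt was made to build a framework in which the robot thinks through perception and action. The focus was on interpreting requests made to the robot and on generating appropriate action plans in response, and methods for arriving at an appropriate interpretation of these problems through interaction with the real world were examined.
In the previous fiscal year, component technologies were developed, including 1) construction of an autonomous robot, 2) definition of the robot's API, 3) algorithms for perceiving the external world, and 4) algorithms for extracting the effect of each action on the results of scene analysis. On this basis, a language acquisition algorithm was developed for linguistic expressions that correspond one-to-one to action primitives, but problems remained in the basic performance of each component technology.
This fiscal year, the aim was to advance these component technologies and to realize more advanced intelligent processing that integrates them. As an advance in component technology, eyebrows were added to the robot and used for facial expression synthesis. Because the robot's internal state can now be shown through facial expressions, richer interaction with users was achieved. Environment-adaptive processing was also built into the scene analysis algorithm (visual processing), improving robustness and stabilizing system operation. For integrated processing, an algorithm was examined that determines which combination of primitives should make up a complex action that cannot be achieved by a single primitive, by solving a combinatorial problem over pairs relating each action primitive to its effect. Through the above, the basic framework of an intelligent system that performs language understanding and acquisition and action planning based on integrated processing of perception, action, and thought was realized.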
時間変化する雑音環境下における音声認識に関する研究

日本学術振興会科学研究費助成事業奨励研究(A)

研究期間:

1993年

　

　

小林哲則

本研究では時間変化する雑音下で発話された音声を高精度で認識するための基本技術を確立することを検討した。
人間は、音声にかなりの雑音がかぶっていても、あるいは、背景に音楽が流れていようとも、注目する音声を捉え、認識することができる。これらの機能は、人間の音声における瞬時スペクトルの特徴、あるいはその時系列の特徴に関する知識と、雑音における同様の知識を兼ね備えて持って、それらを分離しながら人間の音声にのみに選択的に注目する機能を持っているためである。本研究では、この機能を、音声と雑音とを2つの独立な確率モデルで表し、このモデルの下でもっともらしい音声と雑音の組合せ探索するという枠組によって、確率論理的に実現した・
具体的には、それぞれの情報源を独立に隠れマルコフモデル(HMM)と呼ばれる確率モデルで表現し、これらの情報源が与えられた時、その組合せの情報源から得られた観測信号列が生起する確率を、スペクトルサブトラクションと動的計画法に基づく最適時間整合とを組み合わせることによって実現した。
この結果、雑音対策なしのとき、-10dB、-20dBで、それぞれ74%、8%であった認識率を、100%、40%に向上させることができた。
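Spectral subtraction, one of the two components named above, can be illustrated for a single analysis frame. This is a generic textbook-style sketch, not the method of the study: it omits the HMM and dynamic-programming parts entirely, and the function name `spectral_subtraction` and the `over_subtraction` and `floor` parameters are assumptions introduced for the example.

```python
def spectral_subtraction(noisy_power, noise_power, over_subtraction=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum from a noisy power spectrum.

    Both inputs are per-band power values for one frame. Bands where the
    subtraction would go below a small fraction of the noisy power are
    clamped to that fraction instead of becoming negative.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bands")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        residual = y - over_subtraction * n
        cleaned.append(max(residual, floor * y))
    return cleaned
```

The clamp is there because a noise estimate can exceed the observed power in some bands, and a negative power value has no physical meaning.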
音声の基本周波数揺らぎの生成論的および現象論的モデル化

日本学術振興会科学研究費助成事業奨励研究(A)

研究期間:

1990年

　

　

小林哲則
知識主導型音声認識システムの音韻決定部におけるニューラルネットの応用に関する研究

日本学術振興会科学研究費助成事業奨励研究(A)

研究期間:

1989年

　

　

小林哲則
音韻特徴の記号的表現とファジー推論を用いた音韻認識に関する研究

日本学術振興会科学研究費助成事業奨励研究(A)

研究期間:

1987年

　

　

小林哲則
調音状態推定に基づくボトムアップ型調音結合処理に関する研究

日本学術振興会科学研究費助成事業奨励研究(A)

研究期間:

1986年

　

　

小林哲則

Misc

IPA Japanese dictation free software project

Katsunobu Itou, Kiyohiro Shikano, Tatsuya Kawahara, Kazuya Takeda, Atsushi Yamada, Akinori Ito, Takehito Utsuro, Tetsunori Kobayashi, Nobuaki Minematsu, Mikio Yamamoto, Shigeki Sagayama, Akinobu Lee

2nd International Conference on Language Resources and Evaluation, LREC 2000 2000年01月

Large vocabulary continuous speech recognition (LVCSR) is an important basis for developing applications of speech recognition technology. We have constructed a common Japanese LVCSR speech database and have been developing sharable Japanese LVCSR programs and models through volunteer-based efforts. We have been engaged in two volunteer-based activities: a) the IPSJ (Information Processing Society of Japan) LVCSR speech database working group, and b) the IPA (Information Technology Promotion Agency) Japanese dictation free software project. The IPA Japanese dictation free software project (April 1997 to March 2000) aims to build free Japanese LVCSR software and models based on the IPSJ LVCSR speech database (JNAS) and the Mainichi newspaper article text corpus. The software repository produced by the IPA project is available to the public, and more than 500 CD-ROMs have been distributed. A performance evaluation of the simple, fast, and accurate versions was carried out in February 2000, using 200 sentence utterances from 46 speakers with gender-independent HMMs and 20k/60k language models. The accurate version, with 2000 HMM states and 16 Gaussian mixtures, achieves a 95.9% word correct rate. The fast version, with the phonetic tied-mixture HMM and the language model reduced to 1/10, achieves a 92.2% word correct rate at real-time speed. The CD-ROM containing the IPA Japanese dictation free software and its development workbench will be distributed upon registration at http://www.lang.astem.or.jp/dictation-tk/ or by sending e-mail to dictation-tk-request@astem.or.jp.

産業財産権

学習装置、音声認識装置、学習方法、および、学習プログラム

特許7473890

特許権
対話システムおよびプログラム

特許7274210

特許権
情報伝達システムおよびプログラム

特許7244910

特許権
収音装置、収音プログラム、及び収音方法

特許権
予兆検知システムおよびプログラム

特許権
予兆検知システムおよびプログラム

特許7107498

特許権
情報再生プログラム、情報再生方法、情報処理装置及びデータ構造

特許権
予測装置、予測方法および予測プログラム

特許6928346

特許権
制御状態監視システムおよびプログラム

特許権
状態監視システム

特許6717461

特許権
単語予測装置、プログラム

特許6588874

特許権
言語確率算出方法、言語確率算出装置および言語確率算出プログラム

特許6495814

特許権
会話ロボット

特許5751610

特許権
情報処理システム及び情報処理方法

特許5467298

特許権
情報処理装置及び情報処理方法

特許5466593

特許権
対話活性化ロボット

特許5294315

特許権
音源分離装置、音源分離方法、音源分離プログラム及び記録媒体

特許5190859

特許権
音源分離装置、方法及びプログラム

特許5170465

特許権
音声認識用音響モデル作成装置とその方法と、プログラム

特許5152931

特許権
音源分離装置、プログラム及び方法

特許5105336

特許権
音声対話装置、音声対話方法及びロボット装置

特許5051882

特許権
音源分離装置、方法及びプログラム

特許4986248

特許権
音源分離システムおよび音源分離方法、並びに音響信号取得装置

特許4873913

特許権
顧客情報収集管理システム

特許4778532

特許権
音源分離方法およびそのシステム

特許4594629

特許権
人物属性識別方法およびそのシステム

特許4511850

特許権
音源分離方法およびそのシステム、並びに音声認識方法およびそのシステム

特許4457221

特許権
顧客情報収集管理方法及びそのシステム

特許4125634

特許権
音声入力モード変換システム

特許3906327

特許権

その他

特別講義：高校数学で理解するChatGPTのしくみ（早大理工・オープンキャンパス）

2023年08月

-

　
特別授業：高校数学で理解する人工知能の基礎（早大学院）

2021年06月

-

　
特別講義：高校数学で理解する人工知能の基礎（早大理工・オープンキャンパス）

2018年08月

-

　
特別授業：人工知能と会話ロボット（高知工業高校）

2018年05月

-

　
特別授業：メディア情報処理が拓く世界（早大学院）

2005年

-

　
特別授業：ロボットとのコミュニケーション（山手学院高校）

2005年

-

　
特別授業：会話ロボットの歴史（都立立川高校）

2005年

-

　
特別講義：高校数学で理解するChatGPTのしくみ（早大理工・オープンキャンパス）

2024年08月

-

　
特別授業：高校数学で理解する人工知能の基礎（早大学院）

2019年10月

-

　
特別講義：会話とロボット・・・私のライフワーク（東京女子大）

2017年06月

-

　
Seminar : Enjoyable Conversation System (MIT, Spoken Lang. Sys. Group)

2015年11月

-

　
セミナー：ロボットを用いた会話プロトコルの研究（千葉工大・藤江研）

2015年05月

-

　
特別講義：会話システム：情報提供のきっかけは誰が作りうるか（奈良先端大）

2014年11月

-

　
Seminar : A robot-based approach towards finding conversation protocols (MERL)

2013年06月

-

　
セミナー：会話ロボットの開発（名古屋大・武田研）

2011年10月

-

　
Lecture : Conversation Robot (E-Just, Egypt)

2010年05月

-

　
特別講義：パラ言語の理解・生成機能を持つマルチモーダル会話ロボット（豊橋技科大）

2009年11月

-

　
Seminar : Recent Research Topics in Perceptual Computing Group at Waseda University (MIT, Spoken Lang. Sys. Group)

2009年10月

-

　
セミナー：音声対話ロボットの開発と将来展開（東北工大・畑岡研）

2009年02月

-

　
セミナー：人間と自然に会話するロボットの実現を目指して（電機大・武川研）

2007年12月

-

　
特別授業：コミュニケーションと人形ロボット ―理工学のススメ―（小平３中）

2007年09月

-

　
セミナー：ロボットとの会話におけるパラ言語情報の利用－ ROBISUKE：マルチモーダル会話ロボット－（東北大・牧野研）

2004年11月

-

　
セミナー：ROBISUKE：次世代の会話ロボット（京都大・奥乃研）

2003年11月

-

　
セミナー：ロボットと人との対話（阪大・白井研）

2003年10月

-

　
特別講義：マルチモーダルインタフェースによるヒューマノイドロボットとの対話（北陸先端大）

1999年10月

-

　
Seminar : Multi-person Communication via Multi-modal Interface - Human Interface of the Humanoid Robot - (MIT, Spoken Lang. Sys. Group)

1999年05月

-

　
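◇ On the distinction between 特別講義 (special lecture), 特別授業 (special class), and セミナー (seminar) in this section

特別授業 (special class): a one-off class given on a visit to a high school or junior high school. 特別講義 (special lecture): a one-off lecture given on a visit to a university. セミナー (seminar): a visit to a university or company laboratory to present information and hold discussions.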

現在担当している科目

体系的ロボット工学特論

大学院先進理工学研究科

2025年秋学期
イノベーションとテクノロジー実践　α：人工知能・先端ロボットテクノロジー実践

グローバル・エデュケーション・センター

2025年秋クォーター
イノベーションとテクノロジー基礎　α：人工知能・先端ロボットテクノロジーの基礎とスタートアップを学ぶ

グローバル・エデュケーション・センター

2025年春クォーター
イノベーションとテクノロジー実践　α：人工知能・先端ロボットテクノロジー実践　（学部生用）

グローバル・エデュケーション・センター

2025年秋クォーター
イノベーションとテクノロジー基礎　α：人工知能・先端ロボットテクノロジーの基礎とスタートアップを学ぶ（学部生用）

グローバル・エデュケーション・センター

2025年春クォーター
卒業論文Ｂ　18前再

基幹理工学部

2025年秋学期
卒業論文Ｂ

基幹理工学部

2025年秋学期
卒業論文Ａ　（集中）

基幹理工学部

2025年集中（春・秋学期）
卒業論文Ａ（秋学期）

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再

基幹理工学部

2025年春学期
情報理工学実験Ａ

基幹理工学部

2025年秋学期
情報理工学実験Ｂ【前年度成績S評価者用】

基幹理工学部

2025年春学期
情報理工学実験Ｂ

基幹理工学部

2025年春学期
情報理工学実験Ａ　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
情報通信ラボ

基幹理工学部

2025年秋学期
パターン認識と機械学習

基幹理工学部

2025年秋学期
プロジェクト研究Ａ

基幹理工学部

2025年春学期
最適化アルゴリズム

基幹理工学部

2025年春学期
プロジェクト研究Ｂ

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）

基幹理工学部

2025年春学期
情報理論Ａ

基幹理工学部

2025年春学期
情報通信ラボ【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年春学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ

基幹理工学部

2025年春学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2025年春学期
情報理論Ａ　【前年度成績S評価者用】

基幹理工学部

2025年春学期
情報理論Ａ

基幹理工学部

2025年春学期
情報通信ラボ

基幹理工学部

2025年秋学期
最適化アルゴリズム【前年度成績S評価者用】

基幹理工学部

2025年春学期
最適化アルゴリズム

基幹理工学部

2025年春学期
情報通信実験Ｂ【前年度成績S評価者用】

基幹理工学部

2025年春学期
情報通信実験Ｂ

基幹理工学部

2025年春学期
情報通信実験Ａ

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）

基幹理工学部

2025年春学期
卒業論文Ｂ

基幹理工学部

2025年秋学期
卒業論文Ａ　（集中）

基幹理工学部

2025年集中（春・秋学期）
卒業論文Ａ（秋学期）

基幹理工学部

2025年秋学期
プロジェクト研究Ａ

基幹理工学部

2025年春学期
パターン認識と機械学習

基幹理工学部

2025年秋学期
卒業論文Ｂ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ｂ（春学期）　18前再

基幹理工学部

2025年春学期
卒業論文Ａ

基幹理工学部

2025年春学期
卒業論文Ａ（秋学期）　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再

基幹理工学部

2025年春学期
情報通信ラボ【前年度成績S評価者用】

基幹理工学部

2025年秋学期
卒業論文Ｂ　18前再

基幹理工学部

2025年秋学期
卒業論文Ａ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年春学期
情報通信実験Ａ　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
Computer Science and Communications Engineering Laboratory A [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis A (Spring) [S Grade]

基幹理工学部

2025年春学期
Graduation Thesis B (Fall)

基幹理工学部

2025年秋学期
Computer Science and Communications Engineering Laboratory A

基幹理工学部

2025年秋学期
プロジェクト研究Ｂ

基幹理工学部

2025年秋学期
Computer Science and Communications Engineering Laboratory B

基幹理工学部

2025年春学期
Project Research Spring

基幹理工学部

2025年春学期
Project Research Fall

基幹理工学部

2025年秋学期
Introduction to Computers and Networks

基幹理工学部

2025年春学期
Graduation Thesis B (Fall) [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis B (Spring) [S Grade]

基幹理工学部

2025年春学期
Graduation Thesis A (Fall) [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis A (Spring)

基幹理工学部

2025年春学期
Graduation Thesis B (Spring)

基幹理工学部

2025年春学期
Graduation Thesis A (Fall)

基幹理工学部

2025年秋学期
実体情報学演習Ｉ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｈ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｇ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｆ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｅ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｄ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｃ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｂ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ａ

大学院基幹理工学研究科

2025年春学期
実体情報学概論

大学院基幹理工学研究科

2025年春学期
実体情報学特別演習　23前再

大学院基幹理工学研究科

2025年通年
実体情報学演習Ｊ

大学院基幹理工学研究科

2025年秋学期
Master's Thesis (Department of Computer Science and Communications Engineering)

大学院基幹理工学研究科

2025年通年
修士論文（情報・通信）

大学院基幹理工学研究科

2025年通年
実体情報学特別演習

大学院基幹理工学研究科

2025年通年
イノベーション事例研究特論（実践）　23前再

大学院基幹理工学研究科

2025年秋クォーター
イノベーション事例研究特論（基礎）　23前再

大学院基幹理工学研究科

2025年春クォーター
パターン認識特論

大学院基幹理工学研究科

2025年春学期
情報理工・情報通信特別実験B

大学院基幹理工学研究科

2025年秋学期
情報理工・情報通信特別実験A

大学院基幹理工学研究科

2025年春学期
知覚情報システム

大学院基幹理工学研究科

2025年秋学期
言語認知科学研究

大学院基幹理工学研究科

2025年通年
知覚情報システム研究

大学院基幹理工学研究科

2025年通年
知覚情報システム

大学院基幹理工学研究科

2025年秋学期
Seminar on Perceptual Computing C

大学院基幹理工学研究科

2025年春学期
Seminar on Perceptual Computing B

大学院基幹理工学研究科

2025年秋学期
Seminar on Perceptual Computing A

大学院基幹理工学研究科

2025年春学期
Pattern Recognition

大学院基幹理工学研究科

2025年春学期
Special Laboratory B in Computer Science and Communications Engineering

大学院基幹理工学研究科

2025年秋学期
Special Laboratory A in Computer Science and Communications Engineering

大学院基幹理工学研究科

2025年春学期
Perceptual Computing

大学院基幹理工学研究科

2025年秋学期
Research on Linguistic Cognitive Science

大学院基幹理工学研究科

2025年通年
Research on Perceptual Computing

大学院基幹理工学研究科

2025年通年
言語認知科学演習Ｄ

大学院基幹理工学研究科

2025年秋学期
言語認知科学演習Ｃ

大学院基幹理工学研究科

2025年春学期
言語認知科学演習Ｂ

大学院基幹理工学研究科

2025年秋学期
言語認知科学演習Ａ

大学院基幹理工学研究科

2025年春学期
知覚情報システム演習D

大学院基幹理工学研究科

2025年秋学期
知覚情報システム演習C

大学院基幹理工学研究科

2025年春学期
知覚情報システム演習B

大学院基幹理工学研究科

2025年秋学期
知覚情報システム演習A

大学院基幹理工学研究科

2025年春学期
情報理工・情報通信特別演習Ｂ

大学院基幹理工学研究科

2025年秋学期
情報理工・情報通信特別演習Ａ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｇ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｆ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｅ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｄ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｃ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｂ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ａ

大学院創造理工学研究科

2025年春学期
知覚情報システム研究

大学院基幹理工学研究科

2025年通年
言語認知科学研究

大学院基幹理工学研究科

2025年通年
Seminar on Perceptual Computing D

大学院基幹理工学研究科

2025年秋学期
Seminar on Linguistic Cognitive Science D

大学院基幹理工学研究科

2025年秋学期
Seminar on Linguistic Cognitive Science C

大学院基幹理工学研究科

2025年春学期
Seminar on Linguistic Cognitive Science B

大学院基幹理工学研究科

2025年秋学期
Seminar on Linguistic Cognitive Science A

大学院基幹理工学研究科

2025年春学期
実体情報学概論

大学院創造理工学研究科

2025年春学期
最先端ロボティクスにおける論文の分析および議論

大学院創造理工学研究科

2025年春学期@秋学期
実体情報学特別演習

大学院創造理工学研究科

2025年通年
イノベーション事例研究特論（実践）　23前再

大学院創造理工学研究科

2025年秋クォーター
イノベーション事例研究特論（基礎）　23前再

大学院創造理工学研究科

2025年春クォーター
実体情報学特別演習　23前再

大学院創造理工学研究科

2025年通年
実体情報学演習Ｊ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｉ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｈ

大学院創造理工学研究科

2025年秋学期
体系的ロボット工学特論

大学院創造理工学研究科

2025年秋学期
体系的ロボット工学特別演習

大学院創造理工学研究科

2025年秋学期
Analysis and Discussion of Papers on Advanced Robotics

大学院創造理工学研究科

2025年春学期@秋学期
実体情報学特別演習　23前再

大学院先進理工学研究科

2025年通年
実体情報学演習Ｊ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｉ

大学院先進理工学研究科

2025年春学期
実体情報学演習Ｈ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｇ

大学院先進理工学研究科

2025年春学期
実体情報学演習Ｆ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｅ

大学院先進理工学研究科

2025年春学期
実体情報学特別演習

大学院先進理工学研究科

2025年通年
イノベーション事例研究特論（実践）　23前再

大学院先進理工学研究科

2025年秋クォーター
イノベーション事例研究特論（基礎）　23前再

大学院先進理工学研究科

2025年春クォーター
最先端ロボティクスにおける論文の分析および議論

大学院創造理工学研究科

2025年春学期@秋学期
実体情報学演習Ｄ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｃ

大学院先進理工学研究科

2025年春学期
実体情報学演習Ｂ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ａ

大学院先進理工学研究科

2025年春学期
実体情報学概論

大学院先進理工学研究科

2025年春学期

担当経験のある科目(授業)

最適化アルゴリズム（早稲田大学）

2005年04月

-

継続中
情報理論（早稲田大学）

1995年09月

-

継続中
パターン認識（法政大学，早稲田大学）

1991年04月

-

継続中
信号処理（法政大学，早稲田大学）

1991年04月

-

2008年03月
プログラミング（法政大学，東京農工大学，早稲田大学）

1985年04月

-

2004年03月
電子回路（早稲田大学）

1992年04月

-

1994年07月
人工知能（法政大学，早稲田大学）

1985年04月

-

1992年03月
計算機工学（法政大学）

1986年04月

-

1991年03月
電磁気学（法政大学）

1985年04月

-

1991年03月

社会貢献活動

会話ロボット ROBISUKE の実演展示

愛・地球博
2005年08月

　

　
Conversational Robot ROBISUKE, Exhibition & Demonstration

Lille 2004 -European Capital of Culture : Robots !
2003年12月

-

2004年03月
Conversational Robot ROBISUKE, Exhibition & Demonstration

ROBODEX2003
2003年03月

　

　
会話ロボット ROBITA の実演展示

NTT Intercommunication Center 「共生する／進化するロボット」展
1999年02月

　

　
WABOT2 の実演展示

科学万博つくば'85
1985年03月

-

1985年09月
SCHEMA: multi-party interaction-oriented humanoid robot

ACM, SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation
2009年12月

　

　
会話ロボットROBISUKEの実演展示

大垣市ものづくりフェスティバル
2009年03月

　

　
会話ロボットROBISUKEの実演

ケアタウン小平訪問デモ
2008年06月

-

　
会話ロボットROBISUKEの実演展示

21世紀夢ウィーク～飛騨高山ロボットワールド～
2005年09月

　

　
Conversational Robot ROBISUKE, Exhibition & Demonstration

Japan Festival in Korea
2002年10月

　

　
会話ロボットROBITA の実演展示

ザ・ロボット博
2001年04月

-

2001年05月

学術貢献活動

Organizing committee member, Local arrangement committee chair, Interspeech 2010

大会・シンポジウム等

ISCA

2010年09月

　

　
Organizing committee chair, International Symposium on Spoken Dialogue

大会・シンポジウム等

Waseda University

1993年11月

　

　
Exhibition committee member, International Conference on Spoken Language Processing 1990

大会・シンポジウム等

1990年11月

他学部・他研究科等兼任情報

理工学術院大学院基幹理工学研究科
附属機関・学校グローバル・エデュケーション・センター

学内研究所・附属機関兼任歴

2025年

-

2026年

データ科学センター兼任センター員
2025年

　

　

知覚情報システム研究所プロジェクト研究所所長
2024年

-

2026年

理工学術院総合研究所兼任研究員

特定課題制度（学内資金）

音声会話システムに関する研究

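2021年

We are studying a conversation protocol control model that can be trained from a relatively small amount of data. The approach separates the part of a conversation system that handles protocol from the part that handles content, couples the two loosely, and applies end-to-end learning to the protocol part. This year we focused on precise modeling of the system's utterance timing. Conventional conversation systems act on detection of the end of the user's utterance, but stable end-of-utterance detection takes time and prevents conversation with a good rhythm. We therefore examined deciding, at every time step synchronized with the update of the speech analysis frame, whether the system should speak, based on prosodic patterns and utterance content rather than on end-of-utterance detection. The model is a DNN (deep neural network) based on LSTM, and the inputs examined were spectral envelope features, prosodic features, linguistic features (the subword sequence produced by speech recognition), and estimated dialogue acts. The results showed that this configuration allows precise control of utterance timing and contributes to smooth conversation, and that using dialogue acts has a large effect.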
会話システムのプロトコルとアーキテクチャに関する研究

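2020年

Of the four-layer protocol we propose for spoken conversation, we refined the function of the turn-taking layer. To achieve conversation with a good rhythm, the turn-taking layer decides, according to context, whether the system should take the turn and, if so, after how long a pause. Last year we proposed a neural network that can control the output timing of events, called TGN (Timing Generating Networks), to address this. This year we refined it by adding multi-task learning with utterance obligation estimation and a mechanism for using linguistic information. This extension improved the accuracy of estimating utterance timing to within 0.5 seconds by 7.5%.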
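The evaluation measure mentioned above, the share of utterance timings estimated to within 0.5 seconds, can be computed as in the following sketch. The function name `timing_accuracy`, the inclusive comparison, and the input layout (two parallel lists of times in seconds) are assumptions for illustration; the abstract does not describe how the measure was implemented.

```python
def timing_accuracy(predicted, reference, tolerance=0.5):
    """Fraction of predictions whose absolute error is within the tolerance.

    predicted and reference are equal-length lists of times in seconds.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance)
    return hits / len(predicted)
```

For example, with errors of 0.2 s, 0.8 s, and 0.1 s, two of the three predictions count as hits.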
会話システムのプロトコルとアーキテクチャに関する研究

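2019年

Of the four-layer protocol we propose for spoken conversation, we studied how to implement the turn-taking layer. To achieve conversation with a good rhythm, the turn-taking layer decides, according to context, whether the system should take the turn and, if so, after how long a pause. To solve this problem we proposed ETCNN (Event-Timing Controllable Neural Network), a neural network that can control the output timing of events. ETCNN is an end-to-end framework in which the output timing can be controlled according to factors such as the prosody of the user's utterance and the user. Applying this method reduced the utterance timing estimation error by about 20% on average compared with the conventional method and markedly reduced estimation outliers.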
会話システムのプロトコルとアーキテクチャに関する研究

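2018年

Of the four-layer protocol we propose for spoken conversation, we studied how to implement the participation-structure formation layer and the message transmission layer. For the participation-structure formation layer, we studied a method that determines the system's actions for forming the participation structure end-to-end from sensor information. Introducing multi-task neural networks with utterance and gaze recognition as auxiliary tasks improved accuracy by more than 30 points over the conventional rule-based method. For the message transmission layer, we proposed a method that derives the importance of each sentence within a paragraph from a BERT-based analysis and statically controls the pauses between sentences accordingly. In a paired-comparison preference evaluation, the system with this method was preferred over the system without it at a very high rate of 77%.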
会話システムのプロトコルとアーキテクチャに関する研究

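2017年

We organized the conversation protocol, by analogy with communication systems, into a) a physical layer, b) a participation-structure formation layer, c) a message transmission layer, and d) a turn-taking layer. Layer a) corresponds to the physical layer of a communication system: by having a body as a human-like means of expression, the system can exchange data in the same way people do with each other. Layer b) corresponds to the data link and network layers and provides, through bodily expression, the state of participation in the conversation and the procedures for changing it. Layer c) corresponds to the transport layer and conveys the success or failure of data exchange through backchannels and similar signals. Layer d) corresponds to the session layer and defines the start and end of a session. We described the behaviors needed for smooth conversation at two levels, the function/role level and the concrete bodily-action level, and hid the hardware-dependent parts in the lower level.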
音声会話：情報遭遇を含む多様な情報行動による情報アクセスに関する研究

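2017年小川哲司, 林良彦

Spoken conversation has so far been realized mainly for question answering. Comfortable information access, however, also requires the system to offer information on its own initiative, and requires smooth, rhythmic transitions between these modes. To realize a spoken conversation system that responds promptly to such complex information behaviors, we proposed a "scenario-driven conversation system". Based on an analysis of the document to be conveyed, a main plan (a scenario that conveys the outline of the document) and sub-plans (which answer anticipated questions) are prepared in advance, and the conversation proceeds along them. Experiments showed that, compared with a conventional conversation system, the resulting system conveys just the information the user needs more efficiently.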
深層学習に基づく雑音抑圧処理歪の補正と雑音下音声認識への適用に関する研究

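2016年小川哲司

In this research we studied a scheme that realizes a diffuse-noise suppression filter in a highly refined form, by combining area sound pickup (a fast, accurate source separation technique that the applicant has been developing) with a low-distortion noise suppression technique based on deep learning. In the proposed scheme, area sound pickup first separates the target sound from the noise, and a filter is then built to suppress the noise components remaining in the target sound. To this end, a deep neural network estimates the SNR in each frequency band from the power spectra of the target-dominant and diffuse-noise-dominant signals obtained by area sound pickup. Compared with a conventional multichannel Wiener filter, the proposed scheme achieved high noise suppression while keeping processing distortion low.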
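One common way to turn a per-band SNR estimate into a suppression filter is the Wiener-type gain SNR / (1 + SNR). The sketch below shows only that step. It is an assumption that a gain of this form is a reasonable stand-in here: the abstract does not state how the estimated SNR is converted into a filter, and the function names `wiener_gains` and `apply_gains` are hypothetical. The DNN that produces the SNR estimates is not modeled.

```python
def wiener_gains(snr_db_per_band):
    """Convert per-band SNR estimates in dB into gains between 0 and 1."""
    gains = []
    for snr_db in snr_db_per_band:
        snr = 10.0 ** (snr_db / 10.0)  # dB to linear power ratio
        gains.append(snr / (1.0 + snr))
    return gains


def apply_gains(magnitudes, gains):
    """Scale per-band spectral magnitudes by the corresponding gains."""
    if len(magnitudes) != len(gains):
        raise ValueError("magnitudes and gains must have the same length")
    return [m * g for m, g in zip(magnitudes, gains)]
```

Bands with high estimated SNR pass almost unchanged, while bands dominated by noise are attenuated toward zero.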
会話：意図性の異なる多様な情報行動による情報享受の実現

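2016年林良彦, 藤江真也

We realized a conversation system that conveys news content through rhythmic exchange of information, mixing active (search-like) and passive (encounter-like) information behaviors. Because responsiveness is essential, we prepared conversation scenarios containing branches for the anticipated user responses and advanced the conversation by switching between them. A scenario consists of a main plan, which conveys the core of the news, and sub-plans, which present supplementary information in response to the user's reactions. The main plan was built by selecting important words with topicality in mind and summarizing the news so that it contains them. The sub-plans were built by preparing answers covering all question types for the important content words in each breath group. With these, we realized a conversation system that meets the intended goal.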
参加者間の共鳴状態を誘導する音声会話システム

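2014年林良彦, 小川哲司, 松山洋一, 藤江真也, 中野鐵兵

Using new conversation control and information provision techniques, we realized a spoken conversation system that leads a conversation into a resonant state (one in which participants respond to each other as if resonating). For conversation control, we proposed a coordination function that gives all participants equal opportunities to speak. In conversation, certain people often speak repeatedly while others cannot join in. We addressed this by implementing a function that interrupts the conversation to take the initiative and then hands the topic to a person who has had few opportunities to speak. For information provision, we implemented a function that utters the subjective statements of reviewers found in review articles as if they were the system's own opinions. We proposed a method for selecting utterances that entertain the conversation partner and a framework for keeping the opinions expressed in the selected sentences consistent.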
グループ会話環境下における場の活性化要素としての会話ロボットに関する研究

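2012年松山洋一, 岩田和彦

We studied a framework for realizing a machine system that takes part in and enlivens "group conversation (multiparty conversation)", the kind of conversation seen in small meetings and casual chat, characterized by dynamic exchanges of utterances among participants.
Specifically, we took up four sub-themes: 1) a protocol for encouraging people who cannot join the conversation to participate, 2) automatic generation of utterances that attract interest, 3) improved face image processing for conversation systems, and 4) improved speech synthesis for conversation systems. The results for each are as follows.
1) Protocol for encouraging people who cannot join the conversation to participate: We examined what mechanisms are needed to find a participant who has few chances either to speak or to be spoken to, address that person, and bring them into the conversation. The system needs functions for finding the target person, deciding appropriate content to address them with, and deciding a timing that does not disturb the overall harmony. We realized these functions by (i) estimating each participant's participation role from their speaking state, gaze, and other cues, and targeting the person with the highest proportion of time spent as neither speaker nor main listener, (ii) tracking the topic appropriately with a CRF to determine what to say, and (iii) first joining the conversation on the current topic to become a "harmonious participant" and only then addressing the target person.
2) Automatic generation of utterances that attract interest: We examined how to automatically prepare answers to questions from participants. To make utterances interesting, answers were designed to include the system's own impressions and evaluative content. To this end we devised a scheme in which the system crawls review sites, extracts passages that evaluate the relevant topic, rewrites them in a colloquial style, ranks them by suitability, and uses the top-ranked sentences as answers. The ranking criterion favors sentences that make heavy use of low-frequency adjectives. This yields a mechanism for choosing informative, unexpected sentences, and succeeded in generating effective answers.
3) Improved face image processing for conversation systems: We examined techniques for the stable face detection that conversation systems require. Improving AAM dramatically improved the detection accuracy for faces and facial parts.
4) Improved speech synthesis for conversation systems: We examined conversational-style speech synthesis. We realized a synthesizer that can speak with voice quality and intonation suited to the context, through careful clustering of contexts in a psychological space, clustering of sentence-final expressions, and related processing.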
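The target-selection rule in sub-theme 1), choosing the participant who is most often neither speaker nor main listener, can be sketched as below. The data layout (one dictionary of participant-to-role labels per time frame), the role labels `"speaker"`, `"addressee"`, and `"side"`, and the function name `pick_target` are assumptions for illustration; the role estimation from speech and gaze is not modeled.

```python
from collections import Counter


def pick_target(role_log):
    """Return the participant with the highest share of frames spent as
    neither speaker nor main listener (addressee).

    role_log is a list of frames; each frame maps a participant name to a
    role label. Ties go to the participant seen first. An empty log raises
    ValueError.
    """
    idle = Counter()
    total = Counter()
    for frame in role_log:
        for person, role in frame.items():
            total[person] += 1
            if role not in ("speaker", "addressee"):
                idle[person] += 1
    if not total:
        raise ValueError("role_log contains no participants")
    return max(total, key=lambda p: idle[p] / total[p])
```

Using a ratio rather than a raw count keeps the comparison fair when participants are observed for different numbers of frames.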
音声会話システムの総合的研究

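2011年藤江　真也, 小川　哲司, 松山　洋一, 岩田　和彦

Toward realizing conversational communication with robots, we conducted research on the following themes.
(1) Clarifying the spoken conversation protocol: We modeled the spoken conversation protocol based on observation of conversations. In particular, for multiparty conversation, we organized what bodily expressions accompany the selection of conversation partners and the control of speaking turns.
(2) Realizing attractive conversation: We organized what a robot's utterances should be like for a conversation to be attractive. With particular attention to making it easy for the partner to talk, we first provided a mechanism that does not merely answer what is asked but includes a related new topic while answering. This made it easier for users to continue the conversation.
(3) Development of component technologies
3-1) Visual information processing: A participant's posture is important for identifying, for example, their willingness to take part in the conversation. It is also well known that gaze indicates the direct communication partner and that facial expression conveys the success or failure of information transfer and whether the partner is interested. We developed an automatic recognition system for this "posture and facial expression". For the problem of extracting the image feature points needed for posture recognition and expression estimation, we realized a system that gathers information not only from a camera mounted on the robot but also from a camera installed on the ceiling of the room, and uses them in an integrated way.
3-2) Auditory information processing: We addressed various problems that arise in hands-free multiparty spoken conversation. Mainly, the removal of directional noise arriving from behind the target speaker and the problem of reverberation were handled with our proposed six-microphone band-shaped beamformer [4]. In conversation, people also speak many sentences in one breath or speak a single sentence in fragments, and this mismatch between utterance units and units of meaning makes conversational speech recognition difficult. We interpreted differences in speaking style (how pauses are placed) as the result of embedding a kind of protocol-related information in the utterance, and examined a method that actively exploits the characteristic prosodic phenomena this causes in decoding.
(4) Integrated system: Integrating (1) through (3) above, we realized a system with which several people can enjoy conversation while playing a game. Conversation experiments with elderly users of a day-care facility were well received.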
会話ロボットの利用に基づくパラ言語理解・生成機構の定量的モデル化に関する研究

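2007年藤江　真也

Using the spoken conversation robot we had developed, which can understand and express paralinguistic information, we attempted to clarify quantitatively the role that paralinguistic information plays in establishing natural spoken conversation.
In conversational communication, humans convey linguistic information by speech while at the same time conveying, through facial expressions, their state of participation in the conversation (whether they are ready to receive information, whether they received it correctly, how they evaluate it, and so on); smooth exchange of information rests on this. Such information is part of what is called paralanguage. Although studies have pointed out its importance qualitatively, no attempt had been made to examine quantitatively how strictly it must be modeled for natural communication to hold.
In this research we therefore chose gaze expression as the robot's expressive behavior related to smooth turn-taking, and attempted to model it quantitatively. It is generally said that a speaker looks toward the listener at the end of an utterance to hand over the turn, and averts the gaze to keep the turn. We built a model whose parameters are the variations of the expression (how the gaze is averted and how it is directed), their frequency, and their temporal structure, ran conversation experiments between subjects and a conversation robot generating paralinguistic information under various parameter settings, and evaluated naturalness. As a result we obtained relational expressions among the parameters that yield natural gaze expression, and findings on how parameter sets should be combined for continuous operation. We confirmed that conversation proceeds naturally when the gaze is driven accordingly.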
状況把握と身体表現機能を有する複数話者との会話ロボット－人間と空間を共有する情報端末の実現に向けて

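2000年菊池　英明, 高西　淳夫

Toward a robot that converses with multiple speakers, we studied four topics: 1) separation and recognition of the speech of multiple speakers, 2) recognition of communication channels among multiple speakers, 3) expression of intention through the body, and 4) information integration.
For 1), in addition to the acoustic model and language model normally used in speech recognition, we used an inter-utterance language model for turn-taking and a speaker model that statistically represents speaker changes, and established an algorithm that estimates the most likely sequence of speaker changes and utterance content using these four probabilistic models.
For 2), combining sound source localization with the face direction of the speaker made it possible to recognize who is speaking to whom. For localization we proposed a method using the correlation of MUSIC spectra, which dramatically improved localization accuracy. For face direction we proposed an ICA-based feature extraction method and achieved highly accurate face direction recognition.
For 3), we extended the expressive capability of the robot hardware by adding eyebrows, a mouth, hands, and other parts to the existing eyes, hands, and so on. Using this simplified body, we also established motions for expressing internal states effectively and a strategy for presenting them.
For 4), we devised an information transfer mechanism that adds subscribe/publish functions to a blackboard system, and implemented it transparently across the many different processors that make up the robot.
Using these results, we realized a robot that grasps the external situation visually and auditorily and converses with multiple partners, at times expressing intentions through nonverbal means such as gestures.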
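The idea in topic 4), a blackboard extended with subscribe/publish functions, can be illustrated with a small in-process sketch. This is not the robot's actual implementation, which ran across multiple processors: the class name `Blackboard`, its methods, and the example keys are hypothetical, and everything here runs in a single process.

```python
from collections import defaultdict


class Blackboard:
    """Shared key-value store that also notifies subscribers on each write."""

    def __init__(self):
        self._entries = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, key, callback):
        """Register callback(key, value) to run whenever key is published."""
        self._subscribers[key].append(callback)

    def publish(self, key, value):
        """Store the value, then notify every subscriber of that key."""
        self._entries[key] = value
        for callback in self._subscribers[key]:
            callback(key, value)

    def read(self, key, default=None):
        """Return the latest value for key, or default if never published."""
        return self._entries.get(key, default)
```

A module can either poll the board with `read`, as with a plain blackboard, or register a callback with `subscribe` so it is called as soon as another module publishes a new result.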
確率過程の精密なモデル化とその音声認識・ジェスチャ認識への応用に関する研究

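1998年橋本　周司, 笠原　博徳

In this research we refined the stochastic process model used for time-series pattern matching and used it to try to improve speech recognition and gesture recognition. In time-series pattern recognition, of which speech and gesture recognition are typical, the stochastic process model plays an important role. The hidden Markov model (HMM) has conventionally been used, but HMM can only handle piecewise-stationary stochastic processes, which causes various problems. To solve this, starting from a second-order Markov model, we proposed a new probabilistic model, the partly hidden Markov model (PHMM), in which the older state is an unobservable hidden state and the newer state is an observable state. Whereas in HMM both the output and the next state depend only on the previous state, in PHMM both the output and the state depend on the state and the previous output. With this structure we obtained a stochastic process model that represents transient segments better than HMM while limiting the growth in model complexity.
For PHMM parameter estimation, we formulated the problem with the EM algorithm and established a rigorous estimation method. Comparing PHMM and HMM in simulation experiments showed that in HMM the output probability mainly determines the timing of state changes and the state transition probability is almost meaningless, whereas in PHMM the transition probability determines that timing. PHMM was also found more effective than HMM at distinguishing differences in the dynamics of transient segments.
Gesture and speech recognition experiments with PHMM gave higher performance than HMM in both tasks, confirming the effectiveness of PHMM for time-series pattern recognition.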
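The dependency structure described above can be made concrete with a forward-algorithm sketch for a discrete model. This is one reading of the abstract, under stated assumptions: the state at time t depends on the previous state and the previous output, the output at time t depends on the current state and the previous output, and a separate table handles the first output. The function name `phmm_likelihood` and all table names and layouts are hypothetical, and the published formulation may differ.

```python
def phmm_likelihood(outputs, init, first_out, trans, out):
    """Likelihood of a discrete output sequence under the assumed PHMM.

    init[s]               : P(first state = s)
    first_out[s][o]       : P(first output = o | state s)
    trans[sp][op][s]      : P(state s | previous state sp, previous output op)
    out[s][op][o]         : P(output o | state s, previous output op)
    """
    if not outputs:
        raise ValueError("outputs must not be empty")
    n_states = len(init)
    # First frame: no previous output exists, so use the first-output table.
    alpha = [init[s] * first_out[s][outputs[0]] for s in range(n_states)]
    # Later frames: both tables are indexed by the previous output.
    for prev_o, cur_o in zip(outputs, outputs[1:]):
        alpha = [
            sum(alpha[sp] * trans[sp][prev_o][s] for sp in range(n_states))
            * out[s][prev_o][cur_o]
            for s in range(n_states)
        ]
    return sum(alpha)
```

If `trans` and `out` are filled so that they ignore the previous output, the recursion reduces to the ordinary HMM forward algorithm, which makes the extra dependency easy to see.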
