Details of a Researcher - OGAWA, Tetsuji


OGAWA, Tetsuji

Scopus Paper Info

Paper Count: 152 Citation Count: 1045 h-index: 17

The data was downloaded from the Scopus API on July 11, 2026, via http://api.elsevier.com and http://www.scopus.com .

Google Scholar Information (Citations per year)

Citation Count: 1726 h-index: 21 i10-index: 38


Scopus Information

Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job title

Professor

Degree

Ph.D. (Waseda University)

Homepage URL

https://sites.google.com/site/ogatetsu/

Profile

Tetsuji Ogawa received his B.S., M.S., and Ph.D. in electrical engineering from Waseda University, Tokyo, Japan, in 2000, 2002, and 2005, respectively. At Waseda University, he was a Research Associate from 2004 to 2007, a Visiting Lecturer in 2007, an Assistant Professor from 2007 to 2012, and an Associate Professor from 2012 to 2019, and he has been a Professor since 2019. He was an Adjunct Professor at Egypt-Japan University of Science and Technology (E-JUST) from 2012 to 2015. He was a Visiting Scholar at the Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, from June to September 2012 and from June to August 2013, and a Visiting Scholar in the Speech Processing Group, Faculty of Information Technology, Brno University of Technology, Czech Republic, from June to July 2014 and from May to August 2015. His research interests include stochastic modeling for pattern recognition, speech enhancement, and speech and speaker recognition. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Institute of Electronics, Information and Communication Engineers of Japan (IEICE), the Information Processing Society of Japan (IPSJ), the Japanese Society of Artificial Intelligence (JSAI), the Acoustical Society of Japan (ASJ), the Japan Society of Mechanical Engineers (JSME), the Japan Wind Energy Association (JWEA), the Japanese Society of Animal Science (JSAS), and the Japanese Society of Fisheries Science (JSFS).

Research Experience

2025.04

-

Now

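Research Center for Advanced Science and Technology, The University of Tokyo, Offshore Wind Development Promotion Facility (AIHOW), Visiting Senior Researcher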
2019.04

-

Now

Waseda University
2016.06

-

Now

The National Institute of Advanced Industrial Science and Technology (AIST) Artificial Intelligence Research Center Guest Researcher
2020.04

-

2025.03

NHK Science & Technology Research Laboratories, Visiting Researcher
2012.04

-

2019.03

Waseda University
2015.05

-

2015.08

Brno University of Technology Visiting Scholar
2012.04

-

2015.03

Egypt-Japan University of Science and Technology Adjunct Associate Professor
2014.06

-

2014.07

Brno University of Technology Visiting Scholar
2013.06

-

2013.08

Johns Hopkins University Visiting Researcher
2012.06

-

2012.09

Johns Hopkins University Visiting Researcher
2007.11

-

2012.03

Assistant Professor, Waseda Institute for Advanced Study
2007.04

-

2007.10

Visiting Lecturer, Waseda University
2004.04

-

2007.03

Research Associate, Waseda University


Education Background

2002.04

-

2005.03

Waseda University
2000.04

-

2002.03

Waseda University
1996.04

-

2000.03

Waseda University

Committee Memberships

2023.04

-

Now

Acoustical Society of Japan, Councilor
2019

-

Now

Kochi Prefecture Marine Innovation Steering Council, Member
2017.09

-

Now

Institute of Electronics, Information and Communication Engineers (IEICE), Standing Reviewer
2014.04

-

Now

Acoustical Society of Japan, Reviewer
2021.06

-

2025.05

Acoustical Society of Japan, Journal Editorial Committee Member
2020.11

-

2021.06

Ongaku Symposium 2021 (音学シンポジウム2021), Executive Committee Member
2019.05

-

2021.04

IEICE Technical Committee on Speech, Secretary
2019.11

-

2020.06

Ongaku Symposium 2020 (音学シンポジウム2020), Executive Committee Member
2020

　

　

Speaker Odyssey 2020 Local Organizing Committee
2017.05

-

2019.04

IEICE Technical Committee on Speech, Committee Member
2017

　

　

7th Symposium on Biometrics, Recognition and Authentication, Program Committee Member
2010

-

2011

IEICE Information and Systems Society Journal, Editorial Committee Member
2008

-

2011

IPSJ Special Interest Group on Spoken Language Processing, Steering Committee Member
2010

　

　

Advanced Language Information Forum (ALAGIN), Young Researchers Forum, Executive Committee Member
2009

-

2010

9th Forum on Information Technology (FIT), Program Committee Member


Professional Memberships

2021.10

-

Now

Japanese Society of Fisheries Oceanography
2019.05

-

Now

The Japanese Society of Artificial Intelligence (JSAI)
2018.07

-

Now

The Japanese Society of Fisheries Science (JSFS)
2018.06

-

Now

Japan Wind Energy Association (JWEA)
2018.01

-

Now

Japanese Society of Animal Science (JSAS)
2017.08

-

Now

The Japan Society of Mechanical Engineers (JSME)
2008.03

-

Now

Information Processing Society of Japan (IPSJ)
2000.01

-

Now

The Acoustical Society of Japan (ASJ)
　

　

　

Institute of Electronics, Information and Communication Engineers of Japan (IEICE)
　

　

　

International Speech Communication Association (ISCA)
　

　

　

The Institute of Electrical and Electronics Engineers, Inc. (IEEE)


Research Areas

Perceptual information processing / Human interface and interaction / Intelligent informatics / Medical systems (medical information systems) / Medical assistive technology (nursing science and engineering) / Aquatic bioproduction science / Animal production science

Research Interests

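Spoken language processing
Acoustic signal processing
Image processing
Video processing
Pattern recognition
Machine learning
Data-driven science
Anomaly detection
Smart maintenance
Precision livestock farming
Precision fisheries
Nursing informatics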


Awards

Outstanding Presentation Award, 251st Meeting of the IPSJ Special Interest Group on Natural Language Processing

2021.12

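Winner: Hiroaki Sato, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa
Waseda University Teaching Award, President's Award

2018.02 Waseda University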
APSIPA ASC2017 Poster Book Prizes

2017.12 APSIPA ASC2017
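IPSJ Yamashita SIG Research Award

2012.03 Information Processing Society of Japan (IPSJ)
Acoustical Society of Japan Awaya Prize Young Researcher Award

2011.03 Acoustical Society of Japan (ASJ)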
BTAS2008 Best Paper Award

2008.10 BTAS2008


Papers

Analysis of the correlation between theory of mind and dialogue ability to identify essential ToM for dialogue systems

Haruhisa Iseno, Atsumoto Ohashi, Tetsuji Ogawa, Shinnosuke Takamichi, Ryuichiro Higashinaka

Proc. The 39th Pacific Asia Conference on Language, Information and Computation (PACLIC 39) 2025.12 [Refereed]
Image Recognition Framework via Adaptive Class Descriptions with Vision-Language Models

Haruki Konii, Teppei Nakano, Mari Wakabayashi, Tomomi Sato, Tetsuji Ogawa

Proc. The 8th Asian Conference on Pattern Recognition (ACPR 2025) 397 - 411 2025.11 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus
Towards Farmers’ Decision Support: Explainable-by-Design Modeling for Calving Sign Detection in Cattle

Michihiro Nakata, Teppei Nakano, Susumu Saito, Tetsuji Ogawa

Proc. The 8th Asian Conference on Pattern Recognition (ACPR 2025) 427 - 441 2025.11 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus
Lyric-aware karaoke background video selection using large language models and moment retrieval

Tomoki Ariga, Jun Taniguchi, Yosuke Higuchi, Sayaka Toma, Kunihiro Abe, Rie Shigyo, Tetsuji Ogawa

Proc. The 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA2025) 1492 - 1497 2025.10 [Refereed]

Authorship：Last author, Corresponding author
Strong eye closure detection in children with profound intellectual and multiple disabilities using robust temporal difference features

Proc. The 17th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA2025) 2477 - 2482 2025.10 [Refereed]

Authorship：Last author, Corresponding author
Video-Based Vibration Analysis for Predictive Maintenance: A Motion Magnification and Random Forest Approach

Walid Gomaa, Abdelrahman Ammar, Ismael Abbo, Mohamed Nassef, Tetsuji Ogawa, Mohab Hossam

Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics 445 - 452 2025.10 [Refereed]

DOI

Scopus

1

Citation

(Scopus)
Stabilizing and Enhancing Remixing-based Unsupervised Sound Source Separation

Kohei Saijo, Tetsuji Ogawa

APSIPA Transactions on Signal and Information Processing 14 ( 1 ) 2025.10 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus
Necessity of voice sample selection in qualification tests for crowdsourced subjective audio quality evaluation

Takuma Yabe, Moe Yaegashi, Teppei Nakano, Tetsuji Ogawa

Proc. the 33rd European Signal Processing Conference (EUSIPCO2025) 261 - 265 2025.09 [Refereed]

Authorship：Last author, Corresponding author
A comparative study on positional encoding for time-frequency domain dual-path transformer-based source separation models

Kohei Saijo, Tetsuji Ogawa

Proc. the 33rd European Signal Processing Conference (EUSIPCO2025) 446 - 450 2025.09 [Refereed]

Authorship：Last author, Corresponding author
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition

Asahi Sakuma, Hiroaki Sato, Ryuga Sugano, Tadashi Kumano, Yoshihiko Kawai, Tetsuji Ogawa

Proc. The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH2025) 5503 - 5507 2025.08 [Refereed]

Authorship：Last author

DOI
End-to-End Speech Translation Guided by Robust Translation Capability of Large Language Model

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 26th Annual Conference of the International Speech Communication Association (INTERSPEECH2025) 21 - 25 2025.08 [Refereed]

DOI
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model for Guiding End-to-End Speech Recognition

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2025) 1 - 5 2025.04 [Refereed]

DOI
What to refer and how? - Exploring handling of auxiliary information in target speaker extraction

Tomohiro Hayashi, Riku Ogino, Kohei Saijo, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2024 (APSIPA2024) 2024.12 [Refereed]

Authorship：Last author, Corresponding author
Differences between singer and speaker verification: Training singer feature representation extractor utilizing singing voice characteristics

Sayaka Toma, Tomoki Ariga, Yosuke Higuchi, Ichiju Hayasaka, Rie Shigyo, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2024 (APSIPA2024) 2024.12 [Refereed]

Authorship：Last author, Corresponding author
WindVibraTransformer: A foundational model for precise and robust wind turbine condition monitoring via vibration signals

Takuya Wakayama, Taiki Inoue, Jun Ogata, Makoto Iida, Tetsuji Ogawa

Proc. 23rd International Conference on Machine Learning and Applications (ICMLA2024) 2024.12 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus

1

Citation

(Scopus)
Leveraging data from vast unexplored seas: positive unlabeled learning for refining prediction area in good fishing ground prediction

Haruki Konii, Teppei Nakano, Yasumasa Miyazawa, Tetsuji Ogawa

Proc. 27th International Conference on Pattern Recognition (ICPR2024) 2024.12 [Refereed]

Authorship：Last author, Corresponding author
Exploring impact of prioritizing intra-singer acoustic variations on singer embedding extractor construction for singer verification

Sayaka Toma, Tomoki Ariga, Yosuke Higuchi, Ichiju Hayasaka, Rie Shigyo, Tetsuji Ogawa

Proc. The 27th Conference of the Oriental COCOSDA (O-COCOSDA2024) 2024.10 [Refereed]

Authorship：Last author, Corresponding author
Construction of individual tracking dataset for developing foundational models in calving sign monitoring for beef cattle

Michihiro Nakata, Sawa Ohyoshi, Teppei Nakano, Tetsuji Ogawa

Proc. The 11th European Conference on Precision Livestock Farming (ECPLF2024) 1625 - 1632 2024.09 [Refereed]

Authorship：Last author, Corresponding author
Hierarchical Multi-Task Learning with CTC and Recursive Operation

Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 25th Annual Conference of the International Speech Communication Association (INTERSPEECH2024) 2855 - 2859 2024.09 [Refereed]

DOI
Exploring robust and explainable design for facial expression-based emotional state estimation in children with profound intellectual multiple disabilities

Kota Mochida, Teppei Nakano, Shinya Fujie, Mari Wakabayashi, Tomomi Sato, Tetsuji Ogawa

Proc. the 32nd European Signal Processing Conference (EUSIPCO2024) 481 - 485 2024.08 [Refereed]

Authorship：Last author, Corresponding author
Normal with occasional anomalies: Feature extraction for detecting non-stationary abnormal events in wind turbines

Takuya Wakayama, Taiki Inoue, Jun Ogata, Makoto Iida, Tetsuji Ogawa

Proc. the 32nd European Signal Processing Conference (EUSIPCO2024) 2012 - 2016 2024.08 [Refereed]

Authorship：Last author, Corresponding author
Parody detection using source-target attention with teacher-forced lyrics

Tomoki Ariga, Yosuke Higuchi, Kazutoshi Hayasaka, Naoki Okamoto, Tetsuji Ogawa

2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2024) 2024.04 [Refereed]

Authorship：Last author, Corresponding author
Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization

Yusuke Fujita, Tetsuji Ogawa, Tetsunori Kobayashi

IEEE Access 11 140069 - 140076 2023.12 [Refereed]

DOI
A single speech enhancement model unifying dereverberation, denoising, speaker counting, separation, and extraction

Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU2023) 2023.12 [Refereed]

Authorship：Last author
Learning discriminative feature representation via metric learning for early operation of wind turbine anomaly detection systems

Taiki Inoue, Jun Ogata, Makoto Iida, Tetsuji Ogawa

Proc. 22nd International Conference on Machine Learning and Applications (ICMLA2023) 2023.12 [Refereed]

Authorship：Last author, Corresponding author
Masry: A text-to-speech system for the Egyptian Arabic

Ahmed Hammad Azab, Ahmed Bayoumi Zaki, Tetsuji Ogawa, Walid Gomaa

Proc. 20th International Conference on Informatics in Control, Automation, and Robotics (ICINCO2023) 2023.11 [Refereed]
Lightweight Multiscale Attention-Aware Method for Semantic Segmentation of Urban Structural Buildings in Drone Aerial Imagery

Jacob Herman, Rami Zewail, Tetsuji Ogawa, Samir El Sagheer

2023 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC) 2023.09 [Refereed]

DOI
Mask-CTC-based encoder pre-training for streaming end-to-end speech recognition

Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. the 31st European Signal Processing Conference (EUSIPCO2023) 56 - 60 2023.09 [Refereed]
Voice or Content? --- Exploring impact of speech content on age estimation from voice

Yuta Ide, Naohiro Tawara, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

Proc. the 31st European Signal Processing Conference (EUSIPCO2023) 221 - 225 2023.09 [Refereed]

Authorship：Last author, Corresponding author
Spotting parodies: Detecting alignment collapse between lyrics and singing voice

Tomoki Ariga, Yosuke Higuchi, Mitsunori Kanno, Rie Shigyo, Takato Mizuguchi, Naoki Okamoto, Tetsuji Ogawa

Proc. the 31st European Signal Processing Conference (EUSIPCO2023) 286 - 290 2023.09 [Refereed]

Authorship：Last author, Corresponding author
Remixing-based unsupervised source separation from scratch

Kohei Saijo, Tetsuji Ogawa

Proc. The 24th Annual Conference of the International Speech Communication Association (INTERSPEECH2023) 1678 - 1682 2023.08 [Refereed]

Authorship：Last author, Corresponding author
Thermal Gait Dataset for Deep Learning-Oriented Gait Recognition

Fatma Youssef, Ahmed El-Mahdy, Tetsuji Ogawa, Walid Gomaa

2023 International Joint Conference on Neural Networks (IJCNN) 2023.06 [Refereed]

DOI
Narrow Down Forecast Range: Using Knowledge of Past Operations and Attribute-Dependent Thresholding in Good Fishing Ground Prediction

Haruki Konii, Teppei Nakano, Yasumasa Miyazawa, Tetsuji Ogawa

OCEANS 2023 - Limerick 2023.06 [Refereed]

Authorship：Last author, Corresponding author

DOI
Neural Diarization with Non-Autoregressive Intermediate Attractors

Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06 [Refereed]

Authorship：Last author

DOI
Self-Remixing: Unsupervised Speech Separation via Separation and Remixing

Kohei Saijo, Tetsuji Ogawa

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06 [Refereed]

Authorship：Last author, Corresponding author

DOI
Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture

Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06 [Refereed]

DOI
BECTRA: Transducer-Based End-to-End ASR with BERT-Enhanced Encoder

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06 [Refereed]

DOI
InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss

Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023.06 [Refereed]

DOI
A Lightweight Transfer Learning-Based Model for Building Classification in Aerial Imagery

Jacob Herman, Rami Zewail, Tetsuji Ogawa, Samir ElSagheer

2023 15th International Conference on Computer Research and Development (ICCRD) 181 - 186 2023.01 [Refereed]

DOI
PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

Ryo Yanagisawa, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

2022 IEEE International Conference on Big Data (Big Data) 4039 - 4044 2022.12 [Refereed]

Authorship：Last author, Corresponding author

DOI
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

Proc. The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP2022) 2022.12 [Refereed]
Refinement of Utterance Fluency Feature Extraction and Automated Scoring of L2 Oral Fluency with Dialogic Features

Ryuki Matsuura, Shungo Suzuki, Mao Saeki, Tetsuji Ogawa, Yoichi Matsuyama

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 1309 - 1317 2022.11 [Refereed]

DOI
Do You Know How Humans Sound? Exploring a Qualification Test Design for Crowdsourced Evaluation of Voice Synthesis Quality

Moe Yaegashi, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 980 - 985 2022.11 [Refereed]

Authorship：Last author, Corresponding author

DOI
Design of Discriminators in GAN-Based Unsupervised Learning of Neural Post-Processors for Suppressing Localized Spectral Distortion

Riku Ogino, Kohei Saijo, Tetsuji Ogawa

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 969 - 975 2022.11 [Refereed]

Authorship：Last author, Corresponding author

DOI
Text-only domain adaptation based on intermediate CTC

Hiroaki Sato, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa

Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022) 2022.09 [Refereed]

Authorship：Last author
Confusion detection for adaptive conversational strategies of an oral proficiency assessment interview agent

Mao Saeki, Kotoka Miyagi, Shinya Fujie, Shungo Suzuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoichi Matsuyama

Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022) 2022.09 [Refereed]
Can humans correct errors from system? Investigating error tendencies in speaker identification using crowdsourcing

Yuta Ide, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022) 2022.09 [Refereed]

Authorship：Last author, Corresponding author
Unsupervised training of sequential neural beamformer using coarsely-separated and non-separated signals

Kohei Saijo, Tetsuji Ogawa

Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022) 2022.09 [Refereed]

Authorship：Last author, Corresponding author
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022) 7797 - 7801 2022.05 [Refereed]

DOI
Remix-Cycle-Consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

Kohei Saijo, Tetsuji Ogawa

Proc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022) 4373 - 4377 2022.05 [Refereed]

Authorship：Last author, Corresponding author

DOI
Sequential fish catch counter using vision-based fish detection and tracking

Riko Tanaka, Teppei Nakano, Tetsuji Ogawa

Proc. MTS/IEEE OCEANS 2022 Chennai Conference and Exhibit (OCEANS2022) 2022.02 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus

9

Citation

(Scopus)
Inlier modeling-based good fishing ground detection for efficient bullet tuna trolling using meteorological and oceanographic information

Yuka Horiuchi, Teppei Nakano, Yasumasa Miyazawa, Tetsuji Ogawa

Proc. MTS/IEEE OCEANS 2022 Chennai Conference and Exhibit (OCEANS2022) 2022.02 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus

2

Citation

(Scopus)
Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models

Naohiro TAWARA, Atsunori OGAWA, Tomoharu IWATA, Hiroto ASHIKAWA, Tetsunori KOBAYASHI, Tetsuji OGAWA

IEICE Transactions on Information and Systems E105.D ( 1 ) 150 - 160 2022.01 [Refereed]

Authorship：Last author, Corresponding author

DOI
An investigation of enhancing CTC model for triggered attention-based streaming ASR

Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA2021) 2021.12 [Refereed]

Authorship：Corresponding author
Comparative study on DNN-based minimum variance beamforming robust to small movements of sound sources

Kohei Saijo, Kazuhiro Katagiri, Masaru Fujieda, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA2021) 2021.12 [Refereed]

Authorship：Last author, Corresponding author
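Feature representation learning of normal states toward efficient operation of wind turbine anomaly detection (in Japanese)

Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Makoto Iida, Tetsuji Ogawa

Journal of the Japan Wind Energy Association 45 ( 3 ) 60 - 68 2021.11 [Refereed]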

Authorship：Last author, Corresponding author
SIA-GAN: Scrambling Inversion Attack Using Generative Adversarial Network

Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

IEEE Access 9 129385 - 129393 2021.09 [Refereed]

Authorship：Last author

DOI
VocalTurk: Exploring Feasibility of Crowdsourced Speaker Identification

Susumu Saito, Yuta Ide, Teppei Nakano, Tetsuji Ogawa

Proc. The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH2021) 1723 - 1727 2021.08 [Refereed]

Authorship：Last author, Corresponding author

DOI
Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH2021) 3051 - 3055 2021.08 [Refereed]

DOI
Improved Mask-CTC for Non-Autoregressive End-to-End ASR

Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 8363 - 8367 2021.06 [Refereed]

DOI
Scrambling Parameter Generation to Improve Perceptual Information Hiding

Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

Electronic Imaging 2021 ( 11 ) 155 - 1 2021.01 [Refereed]

Authorship：Last author

The present study proposes a method to improve perceptual information hiding (PIH) in image scrambling approaches. Image scrambling has been used to address privacy issues in cloud-based machine learning. The performance of image scrambling depends on the scrambling parameters, because they determine how well perceptual information is hidden. However, in existing image scrambling approaches, the effect of the scrambling parameters on performance has not been quantitatively evaluated, which may lead to private information being exposed in public. To overcome this issue, a suitable metric for PIH is investigated, and scrambling parameter generation is then proposed to combine image scrambling approaches. Experimental comparisons using several image quality assessment metrics show that Learned Perceptual Image Patch Similarity (LPIPS) is suitable for PIH. The proposed scrambling parameter generation is also experimentally confirmed to be effective for PIH while maintaining classification performance.

DOI
Investigation on network architecture for single-channel end-to-end denoising

Takuya Hasumi, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. The 2020 European Signal Processing Conference (EUSIPCO2020) 2021.01 [Refereed]

Authorship：Last author, Corresponding author
Noise-robust attention learning for end-to-end speech recognition

Yosuke Higuchi, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. The 2020 European Signal Processing Conference (EUSIPCO2020) 2021.01 [Refereed]

Authorship：Last author, Corresponding author
Toward building a data-driven system for detecting mounting actions of black beef cattle

Yuriko Kawano, Susumu Saito, Teppei Nakano, Ikumi Kondo, Ryota Yamazaki, Hiromi Kusaka, Minoru Sakaguchi, Tetsuji Ogawa

Proc. 25th International Conference on Pattern Recognition (ICPR2020) 2021.01 [Refereed]

Authorship：Last author, Corresponding author
Crowdsourced verification for operating calving surveillance systems at an early stage

Yusuke Okimoto, Soshi Kawata, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

Proc. 25th International Conference on Pattern Recognition (ICPR2020) 2021.01 [Refereed]

Authorship：Last author, Corresponding author
Feature Representation Learning for Calving Detection of Cows Using Video Frames

Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

Proc. 25th International Conference on Pattern Recognition (ICPR2020) 2021.01 [Refereed]

Authorship：Last author, Corresponding author
Analysis of multimodal features for speaking proficiency scoring in an interview dialogue

Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 8th IEEE Spoken Language Technology Workshop (SLT2021) 2021.01 [Refereed]
Efficient human-in-the-loop object detection using bi-directional deep SORT and annotation-free segment identification

Koki Madono, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2020 (APSIPA2020) 2020.12 [Refereed]

Authorship：Last author, Corresponding author
Exploiting narrative context and a priori knowledge of categories in textual emotion classification

Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

The 28th International Conference on Computational Linguistics (COLING2020) 5535 - 5540 2020.12 [Refereed]
Crowd-sourced development of image dataset for detecting mounting actions of black beef cattle

Yuriko Kawano, Susumu Saito, Teppei Nakano, Ikumi Kondo, Ryota Yamazaki, Hitomi Kusaka, Minoru Sakaguchi, Tetsuji Ogawa

The 2nd Asian Conference on Precision Livestock Farming (ACPLF2020) 341 - 351 2020.10 [Refereed]

Authorship：Last author, Corresponding author
Attention network learning for robust detection of allantochorion and fetal membrane of Japanese black beef cattle

Soshi Kawata, Teppei Nakano, Tetsuji Ogawa

The 2nd Asian Conference on Precision Livestock Farming (ACPLF2020) 333 - 340 2020.10 [Refereed]

Authorship：Last author, Corresponding author
Data-driven feature extraction for calving sign detection in Japanese black beef cattle using video frames

Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

The 2nd Asian Conference on Precision Livestock Farming (ACPLF2020) 323 - 332 2020.10 [Refereed]

Authorship：Last author, Corresponding author
Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing

Masaya Morinaga, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. The 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP2020), Works-In-Progress and Demonstration Papers 2020.10 [Refereed]

Authorship：Last author, Corresponding author
Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 21st Annual Conference of the International Speech Communication Association (INTERSPEECH2020) 3655 - 3659 2020.10 [Refereed]
Mentoring-reverse mentoring for unsupervised multi-channel speech source separation

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 21st Annual Conference of the International Speech Communication Association (INTERSPEECH2020) 86 - 90 2020.10 [Refereed]
CHARM-Deep: Continuous Human Activity Recognition Model Based on Deep Neural Network Using IMU Sensors of Smartwatch

Sara Ashry, Tetsuji Ogawa, Walid Gomaa

IEEE Sensors Journal 20 ( 15 ) 8757 - 8770 2020.08 [Refereed]

DOI
SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders

Hiroaki Tsuyuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

Communications in Computer and Information Science 43 - 55 2020.07 [Refereed]

DOI

Scopus
Deep speech extraction with time-varying spatial filtering guided by desired direction attractor

Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020) 671 - 675 2020.05 [Refereed]
Frame-level phoneme-invariant speaker embedding for text-independent speaker recognition on extremely short utterances

Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa

Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020) 6799 - 6803 2020.05 [Refereed]

Authorship：Last author
Block-wise scrambled image recognition using adaptation network

Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

AAAI-20 Workshop on Artificial Intelligence of Things 2020.02 [Refereed]

Authorship：Last author
Vibration-Based Fault Detection for Flywheel Condition Monitoring

Takanori Hasegawa, Mao Saeki, Tetsuji Ogawa, Teppei Nakano

Procedia Structural Integrity 17 487 - 494 2019.09 [Refereed]

Authorship：Corresponding author

DOI

Scopus

12

Citation

(Scopus)
Speaker adversarial training of DPGMM-based feature extractor for zero-resource languages

Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. INTERSPEECH2019 266 - 270 2019.09 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus

4

Citation

(Scopus)
Multi-channel speech enhancement using time-domain convolutional denoising autoencoder

Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. INTERSPEECH2019 86 - 90 2019.09 [Refereed]

Authorship：Last author, Corresponding author

DOI

Scopus

39

Citation

(Scopus)
Calving prediction from video: Exploiting behavioural information relevant to calving signs in Japanese black beef cows

Kazuma Sugawara, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. ECPLF2019 663 - 669 2019.08 [Refereed]

Authorship：Last author, Corresponding author
Two-stage calving prediction system: Exploiting state-based information relevant to calving signs in Japanese black beef cows

Ryosuke Hyodo, Saki Yasuda, Yusuke Okimoto, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. ECPLF2019 670 - 676 2019.08 [Refereed]

Authorship：Last author, Corresponding author
Data assimilation versus machine learning: Comparative study of fish catch forecasting

Yuka Horiuchi, Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. OCEANS2019 2019.06 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 1
Psychological measure on fish catches and its application to optimization criterion for machine learning based predictors

Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. OCEANS2019 2019.06 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 3
Visual explanation of neural network based rotation machinery anomaly detection system

Mao Saeki, Jun Ogata, Masahiro Murakawa, Tetsuji Ogawa

Proc. ICPHM2019 2019.06 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 20
Postfiltering using an adversarial denoising autoencoder with noise-aware training

Naohiro Tawara, Hikari Tanabe, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Proc. ICASSP2019 3282 - 3286 2019.05 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 2
Adversarial autoencoder for reducing nonlinear distortion

Naohiro Tawara, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Proc. APSIPA2018 2018.11 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 5
Sequential fish catch forecasting using Bayesian state space models

Yuya Kokaki, Naohiro Tawara, Tetsunori Kobayashi, Kazuo Hashimoto, Tetsuji Ogawa

Proc. ICPR2018 776 - 781 2018.08 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 6
Acoustic feature representation based on timbre for fault detection of rotary machines

Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. SDPC2018 2018.08 [Refereed]

DOI

Citations (Scopus): 3
Tandem connectionist anomaly detection: Use of faulty vibration signals in feature representation learning

Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsuji Ogawa

Proc. ICPHM2018 1 - 7 2018.06 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 12
Speaker invariant feature extraction for zero-resource languages with adversarial training

Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018) 2381 - 2385 2018.04 [Refereed] [International journal]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 25
Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations

Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi

Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018) 6084 - 6088 2018.04 [Refereed] [International journal]

DOI

Citations (Scopus): 25
Exploiting end of sentences and speaker alternations in recurrent neural network-based language modeling for multiparty conversations

Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2017 (APSIPA2017) 2017.12 [Refereed]

Authorship：Last author, Corresponding author

DOI

Citations (Scopus): 1
Adaptive training of vibration-based anomaly detector for wind turbine condition monitoring

Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsunori Kobayashi, Tetsuji Ogawa

Proc. Annual Conference on PHM Society 177 - 184 2017.10 [Refereed]

Authorship：Last author, Corresponding author
Real-Time Large-Scale Map Matching Using Mobile Phone Data

Essam Algizawy, Tetsuji Ogawa, Ahmed El-Mahdy

ACM Transactions on Knowledge Discovery from Data 11 ( 4 ) 1 - 38 2017.08 [Refereed] [International journal]

With the widespread use of mobile phones, cellular mobile big data is becoming an important resource that provides a wealth of information at almost no cost. However, the data generally suffers from relatively high spatial granularity, limiting the scope of its application. In this article, we consider, for the first time, the utility of actual mobile big data for map matching, allowing for “microscopic” level traffic analysis. The state-of-the-art in map matching generally targets GPS data, which provides far denser sampling and higher location resolution than the mobile data. Our approach extends the typical Hidden-Markov model used in map matching to accommodate highly sparse location trajectories, exploit the large mobile data volume to learn the model parameters, and exploit the sparsity of the data to provide for real-time Viterbi processing. We study an actual, anonymised mobile trajectories data set of the city of Dakar, Senegal, spanning a year, and generate a corresponding road-level traffic density, at an hourly granularity, for each mobile trajectory. We observed a relatively high correlation between the generated traffic intensities and corresponding values obtained by the gravity and equilibrium models typically used in mobility analysis, indicating the utility of the approach as an alternative means for traffic analysis.

DOI

Citations (Scopus): 34
Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi

IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 ( 3 ) 637 - 650 2017.03 [Refereed] [International journal]

DOI
A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation

Tetsuji Ogawa, Sri Harish Mallidi, Emmanuel Dupoux, Jordan Cohen, Naomi H. Feldman, Hynek Hermansky

Proc. 23rd International Conference on Pattern Recognition (ICPR2016) 2222 - 2227 2016.12 [Refereed] [International journal]

Authorship：Lead author, Corresponding author

DOI
Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

APSIPA Trans. Signal & Infor. Process. ( 5 ) 2016.08 [Refereed]

DOI
Video semantic indexing using object detection-derived features

Kotaro Kikuchi, Kazuya Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. 24th European Signal Processing Conference (EUSIPCO2016) 1288 - 1292 2016.08 [Refereed]

DOI
Separation matrix optimization using associative memory model for blind source separation

Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

2015 23rd European Signal Processing Conference, EUSIPCO 2015 1098 - 1102 2015.12 [Refereed]

A source signal is estimated using an associative memory model (AMM) and used for separation matrix optimization in linear blind source separation (BSS) to yield high quality and less distorted speech. Linear-filtering-based BSS, such as independent vector analysis (IVA), has been shown to be effective in sound source separation while avoiding non-linear signal distortion. This technique, however, requires the assumptions that the sound sources are independent and generated from non-Gaussian distributions. We propose a method for estimating a linear separation matrix without any assumptions about the sources by repeating the following two steps: estimating non-distorted reference signals by using an AMM and optimizing the separation matrix to minimize an error between the estimated signal and reference signal. Experimental comparisons carried out in simultaneous speech separation suggest that the proposed method can reduce the residual distortion caused by IVA.

DOI

Citations (Scopus): 2
Uncertainty estimation of DNN classifiers

Sri Harish Mallidi, Tetsuji Ogawa, Hynek Hermansky

2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015.12 [Refereed]

DOI
A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

APSIPA Transactions on Signal and Information Processing 4 ( 4 ) 2015.10 [Refereed]

An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization to make it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with the Markov chain Monte Carlo and incorporated in the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM is successfully applied on large-scale data and outperforms the conventional hierarchical agglomerative clustering, especially for large amounts of utterances.

DOI

Citations (Scopus): 2
Autoencoder based multi-stream combination for noise robust speech recognition

Sri Harish Mallidi, Tetsuji Ogawa, Karel Vesely, Phani S. Nidadavolu, Hynek Hermansky

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015) 3551 - 3555 2015.09 [Refereed]

The performance of automatic speech recognition (ASR) systems degrades rapidly when there is a mismatch between train and test acoustic conditions. Performance can be improved using a multi-stream framework, which involves combining posterior probabilities from several classifiers (often deep neural networks (DNNs)) trained on different features/streams. Knowledge about the confidence of each of these classifiers on a noisy test utterance can help in devising better techniques for posterior combination than simple sum and product rules [1]. In this work, we propose to use autoencoders, which are multilayer feed-forward neural networks, for estimating this confidence measure. During the training phase, for each stream, an autoencoder is trained on TANDEM features extracted from the corresponding DNN. On employing the autoencoder during the testing phase, we show that the reconstruction error of the autoencoder is correlated to the robustness of the corresponding stream. These error estimates are then used as confidence measures to combine the posterior probabilities generated from each of the streams. Experiments on Aurora4 and BABEL databases indicate significant improvements, especially in the scenario of mismatch between train and test acoustic conditions.
Bilinear map of filter-bank outputs for DNN-based speech recognition

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015) 16 - 20 2015.09 [Refereed]

Authorship：Lead author, Corresponding author

Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). The filter-bank outputs with temporal contexts form a time-frequency pattern of speech and have been shown to be effective as a feature parameter for DNN-based acoustic models. We attempt to project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map to improve acoustic separability in feature extraction. This extension makes extracting a more precise structure of the time-frequency pattern possible because the bilinear map yields higher-order correlations of features. Experimental comparisons carried out in phoneme recognition demonstrate that the tensor feature provides comparable results to the filter-bank feature, and the fusion of the two features yields an improvement over each feature.
Feature extraction for rotary-machine acoustic diagnostics focused on period

Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. INTERNOISE2015 2015.08 [Refereed]
TOWARDS MACHINES THAT KNOW WHEN THEY DO NOT KNOW: SUMMARY OF WORK DONE AT 2014 FREDERICK JELINEK MEMORIAL WORKSHOP

Hynek Hermansky, Lukas Burget, Jordan Cohen, Emmanuel Dupoux, Naomi Feldman, John Godfrey, Sanjeev Khudanpur, Matthew Maciejewski, Sri Harish Mallidi, Anjali Menon, Tetsuji Ogawa, Vijayaditya Peddinti, Richard Rose, Richard Stern, Matthew Wiesner, Karel Vesely

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) 5009 - 5013 2015 [Refereed]

A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken by the group.

DOI

Citations (Scopus): 7
A COMPARATIVE STUDY OF SPECTRAL CLUSTERING FOR I-VECTOR-BASED SPEAKER CLUSTERING UNDER NOISY CONDITIONS

Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) 2041 - 2045 2015 [Refereed]

The present paper dealt with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering significantly depends on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded the state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture the speaker similarity under noisy conditions. Therefore, we attempted to examine the efficiency of spectral clustering on i-vector-based similarity for speech corrupted by noise because spectral clustering can yield robustness against noise by non-linear projection. Experimental comparisons demonstrated that spectral clustering yielded significant improvement over conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.

DOI

Citations (Scopus): 4
Effect of frequency weighting on MLP-based speaker canonicalization

Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

Proc. INTERSPEECH2014 2987 - 2991 2014.09 [Refereed]
Vision based SLAM for humanoid robots: A survey

Walaa Gouda, Walid Gomaa, Tetsuji Ogawa

Proceedings of the 2013 2nd International Japan-Egypt Conference on Electronics, Communications and Computers, JEC-ECC 2013 170 - 175 2013.12 [Refereed]

Authorship：Last author

This paper is a survey work for designing a Vision based Simultaneous Localization and Mapping (VSLAM) humanoid robot to generate a map of an unknown environment. A lot of factors have to be considered while designing a VSLAM robot. Vision Sensors are very attractive for application in SLAM because of their rich sensory output and cost effectiveness. Different issues are involved in the problem of vision based SLAM and many different approaches exist in order to solve these issues. Similarly the type of environment determines the suitable feature extraction method. The main objective of this survey is to conduct a comparative study among the current vision sensing methods in terms of imaging systems used for performing VSLAM, feature extraction algorithms used in some recently published papers, and initialization of landmarks, and to figure out the best for our work. © 2013 IEEE.

DOI

Citations (Scopus): 13
Integration of MKL-based and i-vector-based speaker verification by short

Hideitsu Hino, Tetsuji Ogawa

2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013) 562 - 566 2013.11 [Refereed]

Authorship：Last author, Corresponding author

We developed a speaker verification system that is efficient for short utterances. The i-vector-based speaker representation has helped realize highly accurate speaker verification systems; however, it might not be robust against short utterances because the reliability of statistics required for extracting i-vectors is low. On the other hand, multiple kernel learning based on conditional entropy minimization has also achieved high accuracy in speaker verification that is robust against intra-speaker variability. To improve the robustness of speaker verification systems against short utterances, we attempted to integrate the above-mentioned complementary systems. Our experimental results showed that the proposed system integration achieved high-accuracy speaker verification systems, irrespective of the utterance lengths, even for very short utterances (e.g., less than two seconds).

DOI

Citations (Scopus): 1
Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2013.09 [Refereed]

A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is represented by a Gaussian mixture model (GMM). In speaker modeling from speech, this GMM represents intra-speaker dynamics derived from the difference in the attributes such as phoneme contexts and the existence of non-stationary noise and the mixture of GMMs (MoGMMs) represents inter-speaker dynamics derived from the difference in speakers. Gibbs sampling is a powerful technique to estimate such hierarchically structured models but can easily induce the local optima problem depending on its use especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on the blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and effectively applied for reducing a singularity solution given in the model with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved the clustering performance by 17% on average compared to the conventional sampling-based methods. © 2013 IEEE.

DOI

Citations (Scopus): 1
Stream Selection and Integration in Multistream ASR Using GMM-Based Performance Monitoring

Tetsuji Ogawa, Feipeng Li, Hynek Hermansky

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013) 3331 - 3335 2013.08 [Refereed]

Authorship：Lead author, Corresponding author

A moderately deep and rather wide artificial neural net is applied in phoneme recognition of noisy speech. The net is formed by first estimating posterior probabilities of phonemes in 21 band-limited streams covering the whole speech spectrum. These 21 band-limited streams are subdivided into three seven band-limited stream subsets, by differently sub-sampling the original 21 band-limited streams. In the second processing stage, all non-empty combinations of seven band-limited streams from each subset are formed as inputs to 127 artificial neural nets that are again trained to yield phoneme posteriors. In this way, 127 x 3 = 381 processing streams are formed. A novel technique for finding the best combination of the resulting 381 parallel processing streams, which uses the likelihood of a single-state Gaussian mixture model of the final classifier output, is applied to select the most efficient streams. The technique is efficient in phoneme recognition of speech that is corrupted by realistic additive noise.
An Improved Entropy-Based Multiple Kernel Learning

Hideitsu Hino, Tetsuji Ogawa

2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012) 1189 - 1192 2012.11 [Refereed]

Authorship：Last author

Kernel methods have been successfully used in many practical machine learning problems. However, the problem of choosing a suitable kernel is left to practitioners. One method to select the optimal kernel is to learn a linear combination of element kernels. A framework of multiple kernel learning based on conditional entropy minimization criterion (MCEM) has been proposed and it has been shown to work well for, e.g., speaker recognition tasks. In this paper, a computationally efficient implementation for MCEM, which utilizes sequential quadratic programming, is formulated. Through a comparative experiment to conventional MCEM algorithm on a speaker verification task, the proposed method is shown to offer comparable verification accuracy with considerable improvement in computational speed.
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012) 2163 - 2166 2012.09 [Refereed]

We have proposed a novel speaker clustering method based on a hierarchically structured utterance-oriented Dirichlet process mixture model. In the proposed method, the number of speakers can be determined from the given data in a nonparametric Bayesian manner and intra-speaker variability is successfully handled by multi-scale mixture modeling. Experimental results showed that the proposed method is computationally efficient and effective in speaker clustering. The proposed method significantly improved the accuracy of speaker clustering systems as compared with the conventional method, particularly for the case in which the number of utterances varied from speaker to speaker.
FULLY BAYESIAN INFERENCE OF MULTI-MIXTURE GAUSSIAN MODEL AND ITS EVALUATION USING SPEAKER CLUSTERING

Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 5253 - 5256 2012.03 [Refereed]

This study aims to verify effective optimization methods for estimating parametric, fully Bayesian models in speech processing. For that purpose, we investigate the impact of the difference in optimization methods for the multi-scale Gaussian mixture model, which is suitable for speaker clustering, on the clustering accuracy. The Markov chain Monte Carlo (MCMC)-based method was compared with the variational Bayesian method in the speaker clustering experiment; with a small amount of data, the MCMC-based method was more effective; with large scale data (more than one million samples), the difference between these methods in terms of the clustering accuracy decreased and the MCMC-based method was computationally efficient.

DOI

Citations (Scopus): 6
CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

Takahiro Fukumori, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Norihide Kitaoka, Takeshi Yamada, Kazumasa Yamamoto, Satoru Tsuge, Masakiyo Fujimoto, Tetsuya Takiguchi, Chiyomi Miyajima, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

Acoustical Science and Technology 32 ( 5 ) 201 - 210 2011.09 [Refereed]

We have been distributing a new collection of databases and evaluation tools called CENSREC-4, which is a framework for evaluating distant-talking speech in reverberant environments. The data contained in CENSREC-4 are connected digit utterances as in CENSREC-1. Two subsets are included in the data: "basic data sets" and "extra data sets." The basic data sets are used for evaluating the room impulse response-convolved speech data to simulate the various reverberations. The extra data sets consist of simulated data and corresponding real recorded data. Evaluation tools are presently only provided for the basic data sets and will be delivered to the extra data sets in the future. The task of CENSREC-4 with a basic data set appears simple; however, the results of experiments prove that CENSREC-4 provides a challenging reverberation speech-recognition task, in the sense that a traditional technique to improve recognition and a widely used criterion to represent the difficulty of recognition deliver poor performance. Within this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies. © 2011 The Acoustical Society of Japan.

DOI

Citations (Scopus): 2
Development and evaluation of Japanese Lombard speech corpus

Tetsuji Ogawa, Takanobu Nishiura, Takeshi Yamada, Norihide Kitaoka, Tetsunori Kobayashi

Proc. Internoise2011 2011.09 [Refereed] [Invited]

Authorship：Lead author, Corresponding author
Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation

Tetsuji Ogawa, Kazuya Ueki, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E94D ( 8 ) 1683 - 1689 2011.08 [Refereed]

Authorship：Lead author, Corresponding author

We propose a novel method of supervised feature projection called class-distance-based discriminant analysis (CDDA), which is suitable for automatic age estimation (AAE) from facial images. Most methods of supervised feature projection, e.g., Fisher discriminant analysis (FDA) and local Fisher discriminant analysis (LFDA), focus on determining whether two samples belong to the same class (i.e., the same age in AAE) or not. Even if an estimated age is not consistent with the correct age in AAE systems, i.e., the AAE system induces error, smaller errors are better. To treat such characteristics in AAE, CDDA determines between-class separability according to the class distance (i.e., difference in ages); two samples with similar ages are imposed to be close and those with spaced ages are imposed to be far apart. Furthermore, we propose an extension of CDDA called local CDDA (LCDDA), which aims at handling multimodality in samples. Experimental results revealed that CDDA and LCDDA could extract more discriminative features than FDA and LFDA.

DOI

Scopus
Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization

Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2741 - 2744 2011.08 [Refereed]

Authorship：Lead author, Corresponding author
Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model

Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011) 2905 - 2908 2011.08 [Refereed]

This paper provides the analytical solution and algorithm of UO-DPMM based on a non-parametric Bayesian manner, and thus realizes fully Bayesian speaker clustering. We carried out preliminary speaker clustering experiments by using a TIMIT database to compare the proposed method with the conventional Bayesian Information Criterion (BIC) based method, which is an approximate Bayesian approach. The results showed that the proposed method outperformed the conventional one in terms of both computational cost and robustness to changes in tuning parameters.
Spatial filter calibration based on minimization of modified LSD

Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011) 1761 - 1764 2011.08 [Refereed]

A new sound source separation method has been developed that is robust against individual variability in microphones and acoustic lines. A specific area that has a target sound source was enhanced by using a spatial filter developed by time-frequency masking. However, there is a strong likelihood that the spatial filters will be distorted due to the impact of individual variability in microphone characteristics and acoustic lines. To solve this problem, calibration of these spatial filters' shapes was attempted using a modified log-spectral distance (MLSD) minimization criterion, which uses utterances made by each individual (i.e., a sound source) at the desired positions. The effectiveness of this spatial filter calibration was experimentally verified in speech recognition experiments; MLSD-based calibration had fewer word errors than the cases without calibration and calibration using other criteria.
Speaker recognition using multiple kernel learning based on conditional entropy minimization

Tetsuji Ogawa, Hideitsu Hino, Nima Reyhani, Noboru Murata, Tetsunori Kobayashi

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2204 - 2207 2011.05 [Refereed]

Authorship：Lead author, Corresponding author

DOI
CENSREC-1-AV: An audio-visual corpus for noisy bimodal speech recognition

Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Tetsuya Takiguchi, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

Proc. AVSP2010 2010.09 [Refereed]
DEVELOPMENT OF ZONAL BEAMFORMER AND ITS APPLICATION TO ROBOT AUDITION

Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010) 1529 - 1533 2010.08 [Refereed]

We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the FBF widens the space as the distance from the robot increases. The ZBF can better improve the performance of eliminating the directional noise coming from behind the target source than the FBF because the ZBF has a considerably sharper directivity than the FBF.
Speech Enhancement Using a Square Microphone Array in the Presence of Directional and Diffuse Noise

Tetsuji Ogawa, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E93A ( 5 ) 926 - 935 2010.05 [Refereed]

Authorship：Lead author, Corresponding author

We propose a new speech enhancement method suitable for mobile devices used in the presence of various types of noise. In order to achieve high-performance speech recognition and auditory perception in mobile devices, various types of noise have to be removed under the constraints of a space-saving microphone arrangement and few computational resources. The proposed method can reduce both the directional noise and the diffuse noise under the abovementioned constraints for mobile devices by employing a square microphone array and conducting low-computational-cost processing that consists of multiple null beamforming, minimum power channel selection, and Wiener filtering. The effectiveness of the proposed method is experimentally verified in terms of speech recognition accuracy and speech quality when both the directional noise and the diffuse noise are observed simultaneously; this method reduces the number of word errors and improves the log-spectral distances as compared to conventional methods.

DOI

Citations (Scopus): 4
Development of zonal beam former and its application to robot audition

Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

European Signal Processing Conference 1529 - 1533 2010

We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the space enhanced by the FBF widens as the distance from the robot increases. The ZBF eliminates directional noise arriving from behind the target source more effectively than the FBF because it has a considerably sharper directivity. © EURASIP, 2010.
Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions

Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E92D ( 11 ) 2244 - 2252 2009.11 [Refereed]

Authorship：Lead author, Corresponding author

The accuracy of simulation-based assessments of speech recognition systems under noisy conditions is investigated with a focus on the influence of the Lombard effect on speech recognition performance. This investigation was carried out under various recognition conditions: different sound pressure levels of ambient noise; different recognition tasks, such as continuous speech recognition and spoken word recognition; and different recognition systems, i.e., systems with and without adaptation of the acoustic models to ambient noise. Experimental results showed that accurate simulation was not always achieved when dry sources with a neutral talking style were used, but it could be achieved when dry sources that include the influence of the Lombard effect were used; the simulation in the latter case is accurate irrespective of the recognition conditions.

Scopus citations: 4
Robot auditory system using head-mounted square microphone array

Kosuke Hosoya, Tetsuji Ogawa, Tetsunori Kobayashi

2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS 2736 - 2741 2009.10 [Refereed]

A new noise reduction method suitable for autonomous mobile robots was proposed and applied to the preprocessing of a hands-free spoken dialogue system. When a robot talks with a conversational partner in real environments, not only speech utterances by the partner but also various types of noise, such as directional noise, diffuse noise, and noise from the robot, are observed at the microphones. We attempted to remove these types of noise simultaneously with small, lightweight devices and low-computational-cost algorithms. We assumed that the conversational partner of the robot was in front of the robot. In this case, the aim of the proposed method is to extract speech signals coming from the frontal direction of the robot. The proposed noise reduction system was evaluated in the presence of various types of noise: the number of word errors was reduced by 69% as compared to the conventional methods. The proposed robot auditory system can also cope with the case in which a conversational partner (i.e., a sound source) moves away from the front of the robot: the sound source was localized by face detection and tracking using facial images obtained from a camera mounted on an eye of the robot. As a result, various types of noise could be reduced in real time, irrespective of the sound source positions, by combining speech information with image information.

Scopus citations: 6
CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments

Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

Acoustical Science and Technology 30 ( 5 ) 363 - 371 2009.08 [Refereed]

Voice activity detection (VAD) plays an important role in speech processing including speech recognition, speech enhancement, and speech coding under noisy environments. We have developed an evaluation framework for VAD under noisy environments, named CENSREC-1-C. We designed this framework for simple isolated utterance detection and hence, this framework consists of noisy continuous digit utterances and evaluation tools for VAD results. We define two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance. We also provide the evaluation results of a power-based VAD method as a reference. ©2009 The Acoustical Society of Japan.

Scopus citations: 27
Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head

Tetsuji Ogawa, Kosuke Hosoya, Kenzo Akagiri, Tetsunori Kobayashi

Proc. EUSIPCO2009 2009.08 [Refereed]
CENSREC-4: Development of Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments

Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 968 - 971 2008.09 [Refereed]
Class Distance Weighted Locality Preserving Projection for Automatic Age Estimation

Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi

2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems (BTAS2008) 2008.09 [Refereed]

Ears of the Robot: Noise Reduction Using Four-Line Ultra-Micro Omni-Directional Microphones Mounted on A Robot Head

Tetsuji Ogawa, Hirofumi Takeuchi, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

Proc. EUSIPCO2008 2008.08 [Refereed]

Authorship：Lead author, Corresponding author
Ears of the robot: Direction of arrival estimation based on pattern recognition using robot-mounted microphones

Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D ( 5 ) 1522 - 1530 2008.05 [Refereed]

We propose a new type of direction-of-arrival estimation method for robot audition that is free from strict head-related transfer function estimation. The proposed method is based on statistical pattern recognition that employs the ratio of power spectrum amplitudes between a microphone pair as a feature vector. It does not explicitly require phase information, which is frequently used in conventional techniques, because phase information is unreliable when strong reflections and diffractions occur around the microphones. The feature vectors we adopted can treat these influences naturally. The effectiveness of the proposed method was shown in direction-of-arrival estimation tests for 19 directions: errors were reduced by 92.4% compared with the conventional phase-based method.

Scopus citations: 3
Speech enhancement using square microphone array for mobile devices

Shintaro Takada, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 313 - 316 2008.04 [Refereed]

In this paper, we propose a new type of speech enhancement method that is suitable for mobile devices used in noisy environments. To achieve high-performance speech recognition and auditory perception on mobile devices, disturbance noise has to be removed under the requirements of a space-saving microphone arrangement and a low computational cost. The proposed method can reduce both directional and diffuse noise under these requirements by using a square microphone array and low-cost processing that consists of multiple null beamforming, minimum power channel selection, and Wiener filtering. The effectiveness of the proposed method is demonstrated in terms of speech recognition accuracy and speech quality under the condition in which both directional and diffuse noise exist simultaneously: it reduced recognition errors by 40% and improved the PESQ-based MOS value by 0.75 points.

Scopus citations: 6
Sound source separation using null-beamforming and spectral subtraction for mobile devices

Shintaro Takada, Satoshi Kanba, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

Proc. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2007) 133 - 136 2007.10 [Refereed]

This paper presents a new type of speech segregation method for mobile devices in noisy situations where two or more speakers are talking simultaneously. The proposed method consists of multiple null beamformers, minimum power channel selection, and spectral subtraction. It operates with a space-saving, coplanar microphone arrangement and low-cost calculations, which are very important requirements for mobile applications. The effectiveness of the proposed method is demonstrated in segregation and recognition experiments on two simultaneous continuous speech signals: the method improved the PESQ-based MOS value by about one point and reduced word recognition errors by 70% compared with no processing.

Scopus citations: 9
Ears of the robot: Three simultaneous speech segregation and recognition using robot-mounted microphones

Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE Transactions on Information and Systems E90-D ( 9 ) 1465 - 1468 2007.09 [Refereed]

Authorship：Corresponding author

A new type of sound source segregation method using robot-mounted microphones, which is free from strict head-related transfer function (HRTF) estimation, has been proposed and successfully applied to the recognition of three simultaneous utterances. The proposed segregation method exploits sound intensity differences that are due to the particular arrangement of the four directivity microphones and the presence of the robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on narrow-band sound intensity comparison), two-line spectral subtraction, and their integration. We performed a 20K-vocabulary continuous speech recognition test in the presence of three simultaneous speakers and achieved more than 70% word error reduction compared with the case without any segregation processing. Copyright © 2007 The Institute of Electronics, Information and Communication Engineers.

Scopus citations: 3
Adequacy Analysis of Simulation-Based Assessment of Speech Recognition System

Tetsuji Ogawa, Satoshi Kanba, Tetsunori Kobayashi

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 ( 4 ) 1153 - 1157 2007.04 [Refereed]

Authorship：Lead author

Manifold HLDA and its application to robust speech recognition

Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 9th International Conference on Spoken Language Processing (INTERSPEECH2006 - ICSLP) 1551 - 1554 2006.09 [Refereed]

Authorship：Corresponding author

A manifold heteroscedastic linear discriminant analysis (MHLDA), which explicitly removes environmental information from the information useful for discrimination, is proposed. Usually, a feature parameter used in pattern recognition involves both categorical information and environmental information. The well-known HLDA tries to extract useful information (UI) that represents categorical information from the feature parameter. However, environmental information still remains in the UI parameters extracted by HLDA, and it causes a slight degradation in performance. This is because HLDA does not handle the environmental information explicitly. The proposed MHLDA also tries to extract UI like HLDA, but it handles environmental information explicitly. This handling makes the MHLDA-based UI parameters less influenced by the environment. In return, however, the categorical information is slightly degraded in MHLDA. In this paper, we combine the HLDA-based UI and the MHLDA-based UI for pattern recognition to draw on the benefits of both parameters. Experimental results show the effectiveness of this combination method.

Source Separation Using Multiple Directivity Patterns Produced by ICA-based BSS

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 14th European Signal Processing Conference (EUSIPCO2006) 2006.09 [Refereed]
A Method for Solving the Permutation Problem of Frequency-Domain BSS Using Reference Signal

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. The 14th European Signal Processing Conference (EUSIPCO2006) 2006.09 [Refereed]
Head Gesture Recognition for the Moving Conversation Robot

NAKAJIMA Kei, EJIRI Yasushi, FUJIE Shinya, OGAWA Tetsuji, MATSUSAKA Yosuke, KOBAYASHI Tetsunori

The IEICE transactions on information and systems J89-D ( 7 ) 1514 - 1522 2006.09 [Refereed]

Genetic algorithm based optimization of Partly-Hidden Markov Model structure using discriminative criterion

Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E89D ( 3 ) 939 - 945 2006.03 [Refereed]

Authorship：Lead author

Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation-dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, it is expected that the optimal structure, which gives the best performance, differs from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM is optimally defined for each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore, it gives model structures with good discriminative performance. We define as the optimal structures the model structure combination that best satisfies the WLRM criterion among all possible structure combinations. A genetic algorithm is also applied as an adequate approximation of a full search. The effectiveness of the proposed structure optimization is shown by the results of continuous speech recognition of lecture talks: it reduced word errors compared to HMM and PHMM with a common structure for all models.

Scopus citations: 2
A Method for Solving the Permutation Problem of Frequency-domain Blind Source Separation using Reference Signal

Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Biennial on DSP for in-Vehicle and Mobile Systems 2005.09 [Refereed]
Optimizing the Structure of Partly-Hidden Markov Models Using Weighted Likelihood-Ratio Maximization Criterion

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. Interspeech2005 3353 - 3356 2005.09 [Refereed]

Authorship：Lead author
Extension of Hidden Markov Models for multiple candidates and its application to gesture recognition

Yosuke Sato, Tetsuji Ogawa, Tetsunori Kobayashi

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E88D ( 6 ) 1239 - 1247 2005.06 [Refereed]

We propose a modified Hidden Markov Model (HMM) with a view to improving gesture recognition using a moving camera. The conventional HMM is formulated so as to deal with only one feature candidate per frame. However, for a mobile robot, the background and the lighting conditions are always changing, and the feature extraction problem becomes difficult. It is almost impossible to extract a reliable feature vector under such conditions. In this paper, we define a new gesture recognition framework in which multiple candidates of feature vectors are generated with confidence measures and the HMM is extended to deal with these multiple feature vectors. Experimental results comparing the proposed system with feature vectors based on DCT and with the method of selecting only one candidate feature point verify the effectiveness of the proposed technique.

Scopus citations: 1
Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot

Naoya Mochiki, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. ICSLP2004 2 821 - 824 2004.10 [Refereed]
Extension of State-Observation Dependency in Partly-Hidden Markov Models and Its Application to Continuous Speech Recognition

Tetsuji Ogawa, Tetsunori Kobayashi

The Transactions of the Institute of Electronics,Information and Communication Engineers. J87-DII ( 6 ) 1216 - 1223 2004.06 [Refereed]

Authorship：Lead author

Speech Recognition of Double Talk using SAFIA-based Audio Segregation

Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

Proc. EUROSPEECH2003 1285 - 1288 2003.09 [Refereed]
Hybrid modeling of PHMM and HMM for speech recognition

Tetsuji Ogawa, Tetsunori Kobayashi

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS 1 140 - 143 2003 [Refereed]

Authorship：Lead author

A hybrid acoustic model of Partly Hidden Markov Model (PHMM) and HMM is proposed.
PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can realize observation-dependent behaviors in both observations and state transitions. It achieved good performance, but some errors with a different trend from those of HMM still remained.
In this paper, we designed a new acoustic model on the basis of PHMM, in which the observation and state transition probabilities are defined by the geometric means of the PHMM-based ones and the HMM-based ones. In this framework, if a word hypothesis is given a low score by either PHMM or HMM, it is very unlikely to remain a probable candidate. Since many errors are due to high scores of incorrect categories rather than a low score of the correct category, this property contributed to reducing errors. Moreover, the proposed model is more stable than PHMM because the higher-order statistics of PHMM, which are generally accurate but sometimes less reliable, are smoothed by the lower-order statistics of HMM, which are less accurate but robust.
Experimental results showed the effectiveness of the proposed model: it reduced word errors by 25% compared with HMM.

Generalization of State-Observation-Dependency in Partly Hidden Markov Models

Tetsuji Ogawa, Tetsunori Kobayashi

Proc. ICSLP2002 2673 - 2676 2002.09 [Refereed]

Authorship：Lead author

Books and Other Publications

Smart fisheries

WADA Masaaki( Part： Contributor, Catch Prediction Model)

Midori Shobo 2023.03 ISBN: 9784895318853
Speech (Vol. 2)

岩野, 公司, 河原, 達也, 篠田, 浩一, 伊藤, 彰則, 増村, 亮, 小川, 哲司, 駒谷, 和範( Part： Contributor, Speaker Recognition)

コロナ社 2023.01 ISBN: 9784339013672
Introduction to Smart Fisheries

和田, 雅昭( Part： Contributor, Daily Catch Forecasting for Set-Net Fishing: Catch Forecasting Modeling That Incorporates Knowledge of Set-Net Fishing and Does Not Require Big Data)

緑書房 2022.03 ISBN: 9784895317818
Encyclopedia of artificial intelligence

( Part： Contributor)

2017.07 ISBN: 9784320997974
Acoustics Keyword Book

日本音響学会( Part： Contributor, Speaker Diarization)

コロナ社 2016.03 ISBN: 9784339008807

Presentations

Failure Sign Detection Technology for Predictive Maintenance of Wind Turbines

小川哲司 [Invited]

産業技術総合研究所第87回人工知能セミナー「AI技術と風力発電」

Presentation date： 2025.08
Efforts to Narrow the Prediction Range in Predicting Good Fishing Grounds Using Meteorological and Oceanographic Information

兒新治紀, 中野鐵兵, 宮澤泰正, 小川哲司

マリンITワークショップ2023

Presentation date： 2023.08
Video Monitoring Modeling of Breeding Cows That Enables Livestock Farmers to Make Decisions with Conviction

小川哲司, 斎藤奨, 中野鐵兵 [Invited]

第10回計測自動制御学会制御部門マルチシンポジウム，企画セッション：農・林・畜・水産業への計測制御技術応用

Presentation date： 2023.03
Video monitoring for detecting calving signs of breeding cows - How to construct and operate AI systems that enable users to make decisions with conviction?

Tetsuji Ogawa

CSE Research Seminar in E-JUST, E-JUST, Alexandria, Egypt

Presentation date： 2022.12
Tutti: A Platform for Developing and Operating Data Annotation Systems

斎藤奨, 中野鐵兵, 小川哲司

第25回情報論的学習理論ワークショップ (IBIS2022)

Presentation date： 2022.11
Uncertainty Estimation for Deep Neural Networks Based on Disagreement in Predicted Classes

松永直輝, 斎藤奨, 中野鐵兵, 小川哲司

第24回情報論的学習理論ワークショップ（IBIS2021）

Presentation date： 2021.11
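Detecting Calving Signs of Breeding Cows Based on Video Monitoring: How to Construct and Operate Video Monitoring Systems That Enable Users to Make Decisions with Conviction?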

小川哲司

第2回AI・人工知能EXPO秋・アカデミックフォーラム

Presentation date： 2021.10
Counting the Number of Fish Caught Based on Fish Detection and Tracking in Onboard Video

田中理子, 中野鐵兵, 漁崎盛也, 小川哲司

マリンITワークショップ2021

Presentation date： 2021.09
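Construction and Operation of Explainable Condition Monitoring Systems for Decision Support (Using Video Monitoring of Livestock as an Example)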

小川哲司, 兵頭亮介, 斎藤奨, 中野鐵兵 [Invited]

電子情報通信学会総合大会，企画セッション：AIは本当にPoCを超えられるのか？-実用化を阻む大きな壁-

Presentation date： 2021.03
A Study on Fishing Ground Prediction for Directly Supporting the Decision-Making of Mejika Fishermen: As Part of the Kochi Marine Innovation Initiative

小川哲司, 堀内優佳, 田中理子, 宮澤泰正, 漁崎盛也

マリンITワークショップ2021みえ

Presentation date： 2021.03
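Case Study on Early Deployment of a Wind Turbine Anomaly Detection System: How Should Artificial Intelligence Technology Be Constructed and Operated for Maintenance Decision-Making?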

小川哲司, 長谷川隆徳, 緒方淳 [Invited]

トライボロジー技術へのAIの活用を考える研究会

Presentation date： 2021.03
How to Develop and Operate Artificial Intelligence Technology When Big Data Is Unavailable: Case Studies on Supporting Primary Industries

小川哲司 [Invited]

早稲田大学実体情報学博士プログラム 2020年度第4回コロキューム

Presentation date： 2020.12
Interpretable Video Monitoring Modeling Incorporating Domain Knowledge of Users' Decision-Making Processes

兵頭亮介, 中野鐵兵, 小川哲司

第23回情報論的学習理論ワークショップ (IBIS2020) (茨城県・つくば市)

Presentation date： 2020.11
How to Develop AI Technology When Big Data Is Unavailable: Case Studies on Supporting the Fishery and Livestock Industries

小川哲司, 斎藤奨, 中野鐵兵 [Invited]

電子情報通信学会総合大会，企画セッション：あなたは本当にAIを理解していますか？ - 基本原理から使い方，応用まで -

Presentation date： 2020.03

Event date： 2020.03
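Current Status and Challenges of Artificial Intelligence Technology: Points to Note When Applying It to Maintenance and Primary Industry Support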

小川哲司 [Invited]

IoTビジネス推進コンソーシアム沖縄第7回セミナー (沖縄県・那覇市)

Presentation date： 2019.10
Effect of Missing Sensor Data on Catch Forecasting Performance

小川哲司, 堀内優佳, 小林哲則, 福嶋正義, 井戸上彰

マリンITワークショップ2019 (北海道・函館市)

Presentation date： 2019.08
A Psychological Scale of Catch Amount and Its Use in Optimizing Machine-Learning-Based Catch Forecasting Models

小川哲司, 幸加木裕也, 橋本和夫, 小林哲則, 福嶋正義, 井戸上彰

マリンITワークショップ2019いしがき (沖縄県・石垣市)

Presentation date： 2019.03
Recent Trends in Artificial Intelligence Technology and Cases of Industry-Academia Collaboration in Kagoshima Prefecture

小川哲司 [Invited]

鹿児島ITビジネス研究会 (鹿児島県・鹿児島市)

Presentation date： 2019.03
Daily Catch Forecasting for Set-Net Fishing Using State-Space Models

小川哲司

マリンITワークショップ (北海道・函館市)

Presentation date： 2018.08
The Future of IoT and Livestock Farming from an Information Engineering Perspective

小川哲司 [Invited]

日本繁殖生物学会若手サマーセミナー合宿 (茨城県・笠間市)

Presentation date： 2018.08
Toward proactive forecasting for smart maintenance of infrastructure equipment and support for primary industry

Tetsuji Ogawa [Invited]

7th Research Seminar in E-JUST (Alexandria) Egypt-Japan University of Science and Technology (E-JUST)

Presentation date： 2018.03
Progress and Challenges in Artificial Intelligence Research

小川哲司 [Invited]

鹿児島ITビジネス研究会 (鹿児島県・鹿児島市)

Presentation date： 2017.09
High resolution traffic maps generation using cellular big data

Ahmed El-Mahdy, Essam Algizawy, Tetsuji Ogawa, Hisham Shishiny, Mohamed Badder, Keiji Kimura

NetMob2015 (Boston)

Presentation date： 2015.04
Comparison of Fully Bayesian Model Estimation Methods for Speaker Clustering Using a Hierarchical Utterance Generative Model

俵直弘, 小川哲司, 渡部晋治, 小林哲則

第14回情報論的学習理論ワークショップ（IBIS2011） (奈良県・奈良市)

Presentation date： 2011.11
Discriminant Analysis Based on Inter-Class Distance and Its Application to Age Estimation Systems

小川哲司, 小林哲則

第13回情報論的学習理論ワークショップ（IBIS2010） (東京都・目黒区)

Presentation date： 2010.11
Sound source separation system and acoustic signal acquisition device

Tetsuji Ogawa

Leading Edge Japan 2009 (New York)

Presentation date： 2009.03
Multi-layer audio segregation and its application to double talk recognition

Toshiyuki Sekiya, Tomohiro Sawada, Tetsuji Ogawa, Tetsunori Kobayashi

SWIM, Lectures by Masters in Speech Processing (Honolulu)

Presentation date： 2004.01

Research Projects

Study on Construction and Operation Method of Sustainable Condition Monitoring System for Decision Support

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2022.04 - 2025.03
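Development of a "Communication Support AI" That Supports the Growth of Children with Severe Motor and Intellectual Disabilities and Establishment of a Sustainable Operation Method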

木原記念横浜生命科学振興財団 2023 年度 LIP.横浜トライアル助成金

Project Year: 2023.06 - 2024.03

佐藤朝美, 小川哲司
Research on sustainable fishery condition monitoring through cooperation between fishermen and artificial intelligence technology

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2022.06 - 2024.03
Deep semantic annotation of video contents

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2017.04 - 2021.03

To enable an advanced retrieval system or an intelligent knowledge extraction system that deals with a large set of video contents, it is essential to semantically annotate them adequately. Towards this ultimate goal, this study researched fundamental technologies that combine vision and language technologies. More specifically, we have developed an effective yet efficient scene graph generation system and an action captioning system. Empirical results show that the resulting systems generally performed better than the comparative systems. These systems produce information structures suited to computer processing and to human consumption, respectively.
Research and Development on Improving Fishery Efficiency Using Local Ocean Data

Ministry of Internal Affairs and Communications, Strategic Information and Communications R&D Promotion Programme (SCOPE)

Project Year: 2017.04 - 2020.03

内海康雄, 北島宏之, 若生一広, 菅原利弥, 宇都宮栄二, 井戸上彰, 阿部博則, 福嶋正義, 小川哲司, 小林哲則, 中野鐵兵, 橋本和夫
A study on speaker-specific information extraction in consideration of vocalization mechanism and its application to speaker verification

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2016.04 - 2019.03

Ogawa Tetsuji, Tawara Naohiro

An attempt was made to develop a neural network that learns speaker representations unaffected by phoneme information, under the assumption that speaker and phoneme information are separable in acoustic features. As a result, a disentangling neural network was successfully developed to extract the phoneme and speaker information separately from each frame of acoustic features. The present study introduced statistical pooling, which aims to reflect utterance-by-utterance speaker information in the frame-by-frame features, and demonstrated that pooling just before classification (i.e., late pooling) performed well. In addition, a loss function based on the entropy of classifiers was introduced to optimize the feature extractors so that the extracted features contain only the desired speaker-specific and phoneme-specific information, and it was shown to be effective in speaker verification.
A study on total optimization of multiple pattern recognition systems using cooperative and adaptive training

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2013.04 - 2016.03

Ogawa Tetsuji

Attempts have been made to cooperatively optimize multiple pattern recognition systems so that a total system can be developed efficiently and automatically. Specifically, a clustering technique that is robust against environmental changes and a multistream pattern recognition framework, which cooperatively exploits information yielded by multiple systems, have been developed as fundamental technologies for adaptively refining the systems to cope with changes in the characteristics of data (e.g., the users and surrounding environments of the system).
Effective improvement of time-series pattern recognition systems using clustering and unsupervised adaptive training

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2011 - 2012

OGAWA Tetsuji

I developed technologies for clustering speech data into acoustic attributes such as speakers and types of noise and technologies for adaptively optimizing speech recognition systems in unsupervised ways. The developed technologies would be essential for constructing a system structuring speech data and a speech retrieval system.
A study on online adaptive pattern recognition with sequential optimization of model structures

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2009 - 2010

OGAWA Tetsuji

I developed a method of adaptively optimizing both the structure and parameters of statistical models used in pattern recognition systems to effectively improve robustness of those systems to environmental changes. In addition, I attempted to apply this framework to speaker recognition systems using speech information and face recognition systems using image information.
A study on communication robot performing rhythmic conversation

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2008 - 2010

KOBAYASHI Tetsunori

We refined generation and recognition methods for linguistic and paralinguistic information and achieved a communication robot that can converse with a group of people. The robot was used to stimulate human-to-human conversation. For this purpose, we designed the robot's appearance to express the desired character for conversation and to perform paralinguistic expression functions. We designed behaviors suited to each conversational situation and conversational procedures that make the conversation attractive. We also improved speech recognition and synthesis methods for conversation.
Study on Speech Enhancement Based on Distorted Speech Corpora in the Real-world

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2007 - 2009

TAKEDA Kazuya, KITAOKA Norihide, YAMADA Takeshi, NISHIURA Takanobu, MIYAJIMA Chiyomi, TAMURA Satoshi, NAKAMURA Satoshi, KUROIWA Shingo, TSUGE Satoru, TAKIGUCHI Tetsuya, YAMAMOTO Kazumasa, OGAWA Tetsuji, NAKAYAMA Masato

For distorted speech recognition in the real world, we conducted the following: (1) development of distorted speech corpora named CENSREC and their public distribution; (2) accurate recognition performance prediction for additively/convolutionally distorted speech; (3) development of a structural explanation of distortion factors and recognition methods for distorted speech; (4) development of distorted speech recognition methods.
A study on a pattern recognition system based on the combination of complementary classifiers

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2007 - 2008

OGAWA Tetsuji
A Study on Structure Optimization and Robustness Enhancement of Stochastic Models with Interdependence Between States and Outputs

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2005 - 2006

小川哲司

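This fiscal year, we examined the following two points.
(1) Optimal selection of the model structure of the Partly-Hidden Markov Model (PHMM)
As a framework for optimizing the PHMM model structure for each phoneme, last fiscal year we introduced the weighted likelihood-ratio maximization criterion as the evaluation criterion and a genetic algorithm as the optimization algorithm, and reduced the errors of the conventional method in lecture speech recognition. This fiscal year, we examined items A) to C) below in detail.
A) Evaluation function: We introduced multiple evaluation criteria, including the weighted likelihood-ratio criterion, the maximum likelihood criterion, and a Bayesian criterion, and evaluated recognition performance. The likelihood-ratio criterion, which is discriminative, gave the best performance.
B) Optimization algorithm: We evaluated performance with a genetic algorithm and with tabu search. Tabu search tended to fall into local optima, and the genetic algorithm reached the optimal solution faster.
C) Sharing of discrimination classes: We clustered phonemes to make the search more efficient, but found that sharing classes at the optimization stage does not yield performance comparable to that obtained without sharing.
(2) Study of features robust to environmental variation
Highly accurate stochastic models such as PHMM are more susceptible to speaker and environmental variation than simple models such as HMM. We therefore studied methods for removing speaker and environmental information from acoustic features and extracting only the phonetic information needed for discrimination (discriminative information extraction). For discriminative information extraction, we proposed the use of HLDA and Manifold HLDA (MHLDA), an extension of HLDA, and evaluated them on word speech recognition. Integrating the parameters extracted by HLDA and MHLDA gave performance that is robust to environmental variation.
Building on this finding, we also studied a method of integrating stochastic models that introduces boosting into HLDA, and obtained the preliminary finding that it enables more robust recognition than maximum likelihood classification.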
Studies on conversation systems with understanding and generating functions of linguistic and para-linguistic information

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year: 2003 - 2006

KOBAYASHI Tetsunori, FUJIE Shinya, OGAWA Tetsuji

As a tool for investigating the fundamental elements of natural spoken-language communication, we developed a prototype spoken dialogue system that understands and generates linguistic and para-linguistic information.
Although many excellent studies on speech recognition and synthesis have been conducted, no practical spoken dialogue system is yet satisfactory. One reason is that most spoken dialogue systems have not dealt with para-linguistic information, and the quantitative understanding of para-linguistic information is not sufficient to build a natural conversation system. In this study, we developed a number of component technologies and a conversation robot platform as tools for revealing the quantitative roles of para-language.
In particular, the following outcomes were obtained: 1) sound localization and separation methods using a four-line directivity microphone mounted on the head of the robot; 2) a high-quality speech synthesis method based on waveform synthesis and a high-quality voice conversion method for expressing para-linguistic information; 3) a method of attitude recognition and back-channel feedback generation based on prosodic information as the para-linguistic information in speech; 4) methods of head gesture recognition and facial expression recognition as the para-linguistic information in vision; 5) the humanoid robot "ROBISUKE", developed as the platform for the spoken dialogue system; and 6) the Message Oriented RObot Architecture, MONEA, proposed for integrating the above modules.
Future work includes experiments to determine quantitatively the requirements for natural conversation.

Misc

階層的Deep Biasingを用いた低頻度語に頑健な音声認識

楠奈穂美, 樋口陽祐, 小川哲司, 小林哲則

情報処理学会研究報告 (SLP) 2025.12

Research paper, summary (national, other academic conference)
Common Crawl を用いた大規模音声音響データセットの構築

淺井航平, 杉浦一瑳, 中田亘, 栗田修平, 高道慎之介, 小川哲司, 東中竜一郎

日本音響学会秋季研究発表会講演論文集 2025.09

Research paper, summary (national, other academic conference)
大規模言語モデルによる歌詞解釈記述とモーメント検索を用いたカラオケ背景映像の選択

有賀智輝, 谷口純, 当間佐耶佳, 阿部国大, 執行里恵, 小川哲司

第28回画像の認識・理解シンポジウム (MIRU2025) IS-3-203 2025.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
視覚言語モデルを用いた適応的なクラス記述に基づく画像認識フレームワーク

兒新治紀, 中野鐵兵, 佐藤朝美, 小川哲司

第28回画像の認識・理解シンポジウム (MIRU2025) IS-1-146 2025.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
養育者に無理なく頼るモデル構築：重症児感情状態推定のための人間参加型学習および視覚言語モデルの活用

望田康太, 中野鐵兵, 若林麻里, 佐藤朝美, 小川哲司

2025年度人工知能学会全国大会（JSAI2025） 4LS-OS-38-03 2025.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

振動信号による風車状態監視のための精密かつ少量データに頑健なモデリング

若山拓矢, 井上太揮, 深山覚, 飯田誠, 小川哲司

2025年度人工知能学会全国大会（JSAI2025） 4R3-GS-10-03 2025.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

映像からの繁殖牛発情行動検知における物体検出モデル利用に関する検討

小林恵輔, 中野鐵兵, 春日良一, 日下裕美, 坂口実, 小川哲司

2025年度人工知能学会全国大会（JSAI2025） 3win5-82 2025.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

映像を用いた繁殖牛分娩監視のための基盤モデルの開発と運用

中田道寛, 川島由理, 中野鐵兵, 春日良一, 小川哲司

2025年度人工知能学会全国大会（JSAI2025） 2O1-GS-10-01 2025.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
良漁場予測範囲絞り込みのためのPU学習を用いた未探索海域データの活用

兒新治紀, 中野鐵兵, 宮澤泰正, 小川哲司

2025年度人工知能学会全国大会（JSAI2025） 1Q4-GS-10-03 2025.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

歌唱音声の特性を考慮した歌唱者照合のための頑健な特徴抽出器の構築

当間佐耶佳, 有賀智輝, 樋口陽祐, 早坂一寿, 執行里恵, 小川哲司

日本音響学会研究発表会講演論文集 ( 1-2-17 ) 935 - 938 2025.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
発音プロンプトと辞書を活用したEnd-to-End音声認識のキーワード認識精度改善手法

菅野竜雅, 佐藤裕明, 佐久間旭, 熊野正, 河合吉彦, 小川哲司

日本音響学会研究発表会講演論文集 ( 1-2-14 ) 927 - 930 2025.03

Authorship：Last author

Research paper, summary (national, other academic conference)
階層的マルチタスク学習とContextual Biasingを用いたEnd-to-End音声認識

楠奈穂美, 樋口陽祐, 小川哲司, 小林哲則

日本音響学会研究発表会講演論文集 ( 1-2-10 ) 913 - 916 2025.03

Research paper, summary (national, other academic conference)
音声・音響・音楽を扱うオープン基盤モデルの構築に向けたデータセット策定

高道慎之介, 和田仰, 小川諒, 山岡洸瑛, 中田亘, 淺井航平, 関健太郎, 岡本悠希, 齋藤佑樹, 小川哲司, 猿渡洋, 中村友彦, 深山覚

言語処理学会第31回年次大会発表論文集 2692 - 2696 2025.03

Research paper, summary (national, other academic conference)
Evidential deep learningを用いた不確実性に基づくストリーミング音声認識

佐藤裕明, 佐久間旭, 菅野竜雅, 熊野正, 河合吉彦, 小川哲司

電子情報通信学会研究報告(SP) 124 ( 391 ) 1 - 6 2025.03

Authorship：Last author

Research paper, summary (national, other academic conference)
音質主観評価における評価者選抜のための音声サンプル選定の重要性

矢部拓真, 八重樫萌絵, 中野鐵兵, 小川哲司

電子情報通信学会研究報告(SP) 124 ( 391 ) 329 - 334 2025.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
発話被りを含む会話音声認識のための多話者CTC損失関数の検討

佐久間旭, 佐藤裕明, 菅野竜雅, 熊野正, 河合吉彦, 小川哲司

電子情報通信学会技術研究報告 (SP) 124 ( 303 ) 6 - 11 2024.12

Authorship：Last author

Research paper, summary (national, other academic conference)
End-to-End音声認識における指示チューニングされた大規模言語モデルの活用

樋口陽祐, 小川哲司, 小林哲則

情報処理学会研究報告 (SLP) 2024-SLP-154 ( 27 ) 1 - 8 2024.12

Research paper, summary (national, other academic conference)
再帰的フィードバックを用いた階層的 End-to-End 音声認識

楠奈穂美, 樋口陽祐, 小川哲司, 小林哲則

情報処理学会研究報告 (SLP) 2024-SLP-154 ( 1 ) 1 - 7 2024.12

Research paper, summary (national, other academic conference)
WindVibraTransformer：振動信号による精密かつ頑健な風車状態監視のための基盤モデル

若山拓矢, 井上太揮, 緒方淳, 飯田誠, 小川哲司

第46回風力エネルギー利用シンポジウム A1-05 2024.11

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
RangeBoundTrack: 黒毛和種雌牛分娩監視映像データセット構築のための牛追跡

中田道寛, 大吉佐和, 中野鐵兵, 春日良一, 小川哲司

日本畜産学会第132回大会 2024.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
低頻度語のためのプロンプトを活用した音声認識

菅野竜雅, 佐藤裕明, 佐久間旭, 熊野正, 河合吉彦, 小川哲司

日本音響学会研究発表会講演論文集 2024.09

Authorship：Last author

Research paper, summary (national, other academic conference)
状態変化の頻度情報の抽出と家畜の映像監視のための特徴表現としての利用

中田道寛, 中野鐵兵, 小川哲司

第27回画像の認識・理解シンポジウム (MIRU2024) IS-3-142 1 - 4 2024.08

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
重症児感情状態推定モデル構築のためのフィードバックサイクルの検討：人の「見守り」による効率的なモデル構築

望田康太, 中野鐵兵, 若林麻里, 佐藤朝美, 小川哲司

第27回画像の認識・理解シンポジウム (MIRU2024) IS-1-165 1 - 4 2024.08

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
歌唱者埋め込み表現抽出器の構築において歌唱者内の音響変動を重要視することの効果の検証

当間佐耶佳, 有賀智輝, 樋口陽祐, 早坂一寿, 執行里恵, 小川哲司

情報処理学会研究報告 2024-SLP-152 ( 60 ) 331 - 336 2024.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
発音出力を利用したchain of thought 音声認識

菅野竜雅, 佐藤裕明, 熊野正, 河合吉彦, 小川哲司

日本音響学会研究発表会講演論文集 2024.03

Authorship：Last author

Research paper, summary (national, other academic conference)
再帰的フィードバックを用いた階層的マルチタスク学習によるEnd-to-End音声認識

楠奈穂美, 樋口陽祐, 小川哲司, 小林哲則

日本音響学会研究発表会講演論文集 2024.03

Research paper, summary (national, other academic conference)
M-measureを用いた特徴抽出に基づく回転速度に頑健な風車異常検知

若山拓矢, 井上太揮, 緒方淳, 飯田誠, 小川哲司

第45回風力エネルギー利用シンポジウム 2023.11

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
風車異常検知システム早期運用のための距離学習を用いた識別的な特徴表現の学習

井上太揮, 緒方淳, 飯田誠, 小川哲司

第45回風力エネルギー利用シンポジウム 2023.11

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Streaming transducerにおけるテキストのみを用いた学習方法に関する検討

佐藤裕明, 菅野竜雅, 佐久間旭, 河合吉彦, 熊野正, 山田一郎, 小川哲司

日本音響学会研究発表会講演論文集 2023.09

Authorship：Last author

Research paper, summary (national, other academic conference)
深層話者埋め込みを用いた歌唱者の照合に関する検討

当間佐耶佳, 有賀智輝, 樋口陽祐, 早坂一寿, 岡本直紀, 小川哲司

日本音響学会研究発表会講演論文集 2023.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Teacher-Forcingにより歌詞を与えた際のAttentionの崩れに着目した替え歌検知

有賀智輝, 樋口陽祐, 早坂一寿, 岡本直紀, 小林哲則, 小川哲司

日本音響学会研究発表会講演論文集 2023.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
An investigation on constructing multi-look-ahead contextual block streaming transducer

Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Tetsunori Kobayashi

日本音響学会研究発表会講演論文集 2023.09

Research paper, summary (national, other academic conference)
音源の分離と再混合による事前学習を必要としないモノラル教師なし音源分離

西城耕平, 小川哲司

日本音響学会研究発表会講演論文集 2023.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
事前学習済みマスク言語モデルを用いたEnd-to-end音声認識

樋口陽祐, 小川哲司, 小林哲則, 渡部晋治

日本音響学会研究発表会講演論文集 2023.09

Research paper, summary (national, other academic conference)
字幕制作効率化のための音声認識エラー検出手法

菅野竜雅, 佐藤裕明, 佐久間旭, 熊野正, 河合吉彦, 山田一郎, 小川哲司

映像情報メディア学会2023年年次大会 2023.08

Authorship：Last author

Research paper, summary (national, other academic conference)
アクションユニットを用いた重症心身障害児の感情状態推定

望田康太, 岸凌祐, 大矢耀介, 中野鐵兵, 藤江真也, 佐藤朝美, 小川哲司

第24回日本医療情報学会看護学術大会 2023.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
重症心身障害児を対象とした顔表情に基づく感情状態推定のための事前学習モデルに関する検討

望田康太, 中野鐵兵, 藤江真也, 若林麻里, 佐藤朝美, 小川哲司

第26回画像の認識・理解シンポジウム (MIRU2023) IS1-104 1 - 4 2023.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
映像監視に基づく意思決定支援のための事前学習モデルの構築法と繁殖牛の分娩検知への応用

中田道寛, 斎藤奨, 中野鐵兵, 小川哲司

第26回画像の認識・理解シンポジウム (MIRU2023) IS1-101 1 - 4 2023.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
歌詞と歌唱音声のアライメント崩れに基づく替え歌検知

有賀智輝, 樋口陽祐, 菅野光則, 執行里恵, 水口天都, 岡本直紀, 小川哲司

電子情報通信学会技術研究報告(SP) 123 ( 88 ) 48 - 53 2023.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Self-Remixing: 音源の分離と再混合による教師なし音源分離

西城耕平, 小川哲司

日本音響学会研究発表会講演論文集 191 - 194 2023.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
中間層予測を用いたEnd-to-Endダイアライゼーション

藤田雄介, 小松達也, 木田祐介, 小川哲司

日本音響学会研究発表会講演論文集 665 - 666 2023.03

Authorship：Last author

Research paper, summary (national, other academic conference)
気象・海況情報を用いた良漁場予測における予測範囲の絞り込み

兒新治紀, 中野鐵兵, 宮澤泰正, 小川哲司

日本水産学会春季大会 2023.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Multiple latency CBS streaming ASR for conversational systems

Zhao Huaibo, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

情報処理学会研究報告 (SLP) 2022-SLP-146 ( 9 ) 1 - 6 2023.02

Research paper, summary (national, other academic conference)
畜産農家の意思決定支援AI導入に向けた取組み

小川哲司, 斎藤奨, 中野鐵兵

ITUジャーナル 52 ( 10 ) 10 - 13 2022.10 [Invited]

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)
映像監視に基づく繁殖牛の分娩予兆検知～ユーザが納得して意思決定できるような映像監視システムをどう構築し運用するか？

小川哲司

計測と制御・特集「農・林・畜・水産業に挑む画像センシング技術」 61 ( 10 ) 746 - 749 2022.10 [Invited]

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)
CycleGANを用いた教師無し音声処理歪み補正

荻野里久, 西城耕平, 小川哲司

日本音響学会研究発表会講演論文集 371 - 374 2022.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
ブラインド音源分離を教師としたTeacher-Student学習とUnmix-Remix無矛盾学習によるSequential Neural Beamformerの教師なし学習

西城耕平, 小川哲司

日本音響学会研究発表会講演論文集 359 - 362 2022.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
クラウドソーシングにおける動的タスク発注モデルの教師無し学習

柳澤遼, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

電子情報通信学会技術研究報告 (AI) 122 ( 96 ) 72 - 76 2022.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
対話特徴を用いた第二言語発話の流暢性自動採点

松浦瑠希, 鈴木駿吾, 佐伯真於, 藤江真也, 小川哲司, 松山洋一

情報処理学会研究報告 (SLP) 2022-SLP-142 ( 47 ) 1 - 6 2022.06

Research paper, summary (national, other academic conference)
Transducer型ストリーミング音声認識におけるMask-CTCを用いた事前学習

趙懐博, 樋口陽祐, 木田祐介, 小川哲司, 小林哲則

情報処理学会研究報告 (SLP) 2022-SLP-142 ( 61 ) 1 - 6 2022.06

Research paper, summary (national, other academic conference)
クラウドソーシングを用いた合成音声の音質主観評価のためのワーカ選抜基準

八重樫萌絵, 斎藤奨, 中野鐵兵, 小川哲司

電子情報通信学会技術研究報告 (SP) 122 ( 81 ) 104 - 109 2022.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
局所的な真偽判定を用いた敵対的学習に基づく教師なし音声処理歪み補正

荻野里久, 西城耕平, 藤枝大, 小川哲司

電子情報通信学会技術研究報告 (SP) 122 ( 81 ) 49 - 54 2022.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
ブラインド音源分離の分離音と観測信号を教師信号として用いたSequential Neural Beamformerの教師なし学習

西城耕平, 小川哲司

電子情報通信学会技術研究報告 (SP) 122 ( 81 ) 110 - 115 2022.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
テキストのみを用いたIntermediate-CTCコンフォーマーモデルのドメイン適応

佐藤裕明, 小森智康, 三島剛, 河合吉彦, 望月貴裕, 佐藤庄衛, 小川哲司

日本音響学会研究発表会講演論文集 2022.03

Authorship：Last author

Research paper, summary (national, other academic conference)
粒度の異なるサブワード単位に基づく階層的条件付きEnd-to-End音声認識

樋口陽祐, 軽部敬太, 小川哲司, 小林哲則

日本音響学会研究発表会講演論文集 2022.03

Research paper, summary (national, other academic conference)
クラウドソーシングを用いた合成音声評価におけるワーカからの回答の分析

八重樫萌絵, 斎藤奨, 中野鐵兵, 小川哲司

日本音響学会研究発表会講演論文集 2022.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
敵対的学習と Unmix-Remix 無矛盾学習による教師なし音源分離

西城耕平, 小川哲司

日本音響学会研究発表会講演論文集 2022.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
ペアデータを必要としない敵対的学習に基づく音声処理歪み補正

荻野里久, 藤枝大, 片桐一浩, 小川哲司

日本音響学会研究発表会講演論文集 2022.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
言い淀みとポーズ位置検出に基づく第二言語発話の流暢性自動採点

松浦瑠希, 鈴木駿吾, 佐伯真於, 小川哲司, 松山洋一

日本音響学会研究発表会講演論文集 2022.03

Research paper, summary (national, other academic conference)
クラウドソーシングを用いた話者照合結果の検証における誤り削減傾向に関する調査

井手悠太, 斎藤奨, 中野鐵兵, 小川哲司

日本音響学会研究発表会講演論文集 2022.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
風車運用高度化技術研究開発

飯田誠, 古澤陽子, 山本和男, 緒方淳, 小川哲司

日本風力エネルギー学会誌・特集「風力発電分野の国家プロジェクト」 45 ( 4 ) 582 - 586 2022.02 [Invited]

Authorship：Last author

Article, review, commentary, editorial, etc. (scientific journal)
End-to-end音声認識モデルにおけるテキストデータ学習手法の検討

佐藤裕明, 小森智康, 三島剛, 河合吉彦, 望月貴裕, 佐藤庄衛, 小川哲司

2021年度映像情報メディア学会冬季大会 2021.12

Authorship：Last author

Research paper, summary (national, other academic conference)
テキストのみを用いたドメイン適応のためのIntermediate-CTCコンフォーマーモデルに関する検討

佐藤裕明, 小森智康, 三島剛, 河合吉彦, 望月貴裕, 佐藤庄衛, 小川哲司

情報処理学会研究報告 (SLP) 2021.12

Authorship：Last author

Research paper, summary (national, other academic conference)
クラウドソーシングを用いた結果の検証による話者照合性能の改善

井手悠太, 斎藤奨, 中野鐵兵, 小川哲司

情報処理学会研究報告 (SLP) 2021.12

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
CTCと異なる粒度のサブワード単位に基づいた階層的条件付きEnd-to-End音声認識

樋口陽祐, 軽部敬太, 小川哲司, 小林哲則

情報処理学会研究報告 (SLP) 1 - 6 2021.12

Research paper, summary (national, other academic conference)
マルソウダ曳縄漁のための気象・海況情報を用いた良漁場予測

堀内優佳, 中野鐵兵, 宮澤泰正, 小川哲司

水産海洋学会2021年度研究発表大会要旨 2021.11

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
船上映像を用いた漁獲尾数計測器

田中理子, 中野鐵兵, 小川哲司

水産海洋学会2021年度研究発表大会要旨 2021.11

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
農家の皆さんにとって使い勝手が良く，信頼してもらえるAI技術の作り方－農家の意思決定支援のための家畜の映像監視システム開発を例に―

小川哲司

肉牛ジャーナル 34 ( 10 ) 59 - 63 2021.10

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (trade magazine, newspaper, online media)
Triggered attention型ストリーミング音声認識におけるMask-CTCを用いた事前学習

趙懐博, 樋口陽祐, 小林哲則, 小川哲司

情報処理学会研究報告 (SLP) 1 - 6 2021.10

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
DNNを用いた最小分散ビームフォーマの音源の動きに対する頑健性：音源追跡とエリア収音に基づくアプローチの比較

西城耕平, 藤枝大, 片桐一浩, 小林哲則, 小川哲司

日本音響学会研究発表会講演論文集 321 - 322 2021.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
VocalTurk：クラウドソーシングを用いた話者照合の性能調査

斎藤奨, 井手悠太, 中野鐵兵, 小川哲司

日本音響学会研究発表会講演論文集 1003 - 1006 2021.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
予測の不一致に基づく深層学習モデルの不確実性推定とクラウドソーシングを用いた映像監視への応用

松永直輝, 斎藤奨, 中野鐵兵, 小川哲司

第24回画像の認識・理解シンポジウム (MIRU2021) 1 - 4 2021.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
意思決定支援のための解釈可能な映像監視システムの開発フローと繁殖牛の分娩予兆検知への応用

兵頭亮介, 斎藤奨, 中野鐵兵, 小川哲司

第24回画像の認識・理解シンポジウム (MIRU2021) 1 - 4 2021.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
黒毛和牛種の映像監視における解釈可能な分娩予兆通知システム

兵頭亮介, 斎藤奨, 中野鐵兵, 赤羽誠, 春日良一, 小川哲司

日本畜産学会第128回大会要旨 2021.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
空間フィルタ出力を補助情報として用いた音源の移動に頑健なニューラル音声強調

西城耕平, 藤枝大, 片桐一浩, 小林哲則, 小川哲司

日本音響学会研究発表会講演論文集 427 - 428 2021.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
ペアデータを必要としない敵対的学習に基づく多チャンネル音源分離

中込優, 戸上真人, 小川哲司, 小林哲則

日本音響学会研究発表会講演論文集 409 - 410 2021.03

Research paper, summary (national, other academic conference)
コモンセンス知識を利用した物語中の登場人物の感情推定

田辺ひかり, 小川哲司, 小林哲則, 林良彦

言語処理学会第27回年次大会 27th 538 - 542 2021.03

Research paper, summary (national, other academic conference)

単語の重要度に応じてパラメタ数可変な単語分散表現の学習

露木浩章, 小川哲司, 小林哲則, 林良彦

言語処理学会第27回年次大会 27th 12 - 16 2021.03

Research paper, summary (national, other academic conference)

CTCとマスク推定に基づく推論速度の速いEnd-to-End音声認識

樋口陽祐, 稲熊寛文, 渡部晋治, 小川哲司, 小林哲則

電子情報通信学会技術研究報告 (SP) 2020.12

Research paper, summary (national, other academic conference)
分布類似度に基づく健全性指標と風車異常検知システムの早期運用における効果

長谷川隆徳, 緒方淳, 飯田誠, 小川哲司

第42回風力エネルギー利用シンポジウム予稿集 2020.11

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Mentoring-Reverse Mentoring: 多チャンネル音源分離における教師なし学習のための知識伝搬フレームワーク

中込優, 戸上真人, 小川哲司, 小林哲則

日本音響学会講演論文集 2020 ( 秋季 ) 127 - 130 2020.09

Research paper, summary (national, other academic conference)
Mask CTC: CTCとマスク推定に基づいた非自己回帰的なEnd-to-End音声認識

樋口陽祐, 渡部晋治, Chen Nanxin, 小川哲司, 小林哲則

日本音響学会講演論文集 2020 ( 秋季 ) 747 - 748 2020.09

Research paper, summary (national, other academic conference)
書き起こしのための遠方発話音声認識技術の検討

佐藤裕明, 萩原愛子, 伊藤均, 三島剛, 河合吉彦, 小森智康, 佐藤庄衛, 小川哲司

日本音響学会講演論文集 2020 ( 秋季 ) 841 - 842 2020.09

Authorship：Last author

Research paper, summary (national, other academic conference)
感情に関するマルチラベルアノテーションにおける正解基準の設定

田辺ひかり, 小川哲司, 小林哲則, 林良彦

人工知能学会全国大会論文集 JSAI2020 1 - 4 2020.06

Research paper, summary (national, other academic conference)
クラウドソーシングにおける効率的な回答収集のための動的なマイクロタスク追加発注

森永聖也, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

人工知能学会全国大会論文集 JSAI2020 1 - 4 2020.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
肉牛の発情検知のための乗駕行動画像データセット構築におけるクラウドソーシングの活用

川野百合子, 斎藤奨, 中野鐵兵, 赤羽誠, 近藤育海, 山崎凌汰, 日下裕美, 坂口実, 小川哲司

人工知能学会全国大会論文集 JSAI2020 1 - 4 2020.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
ドローン空撮画像を用いた潮目の検知に関する検討

幸加木裕也, 小林哲則, 小川哲司

日本水産学会春季大会要旨 2020.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
End-to-end雑音除去のためのネットワーク構造の検討

蓮実拓也, 小林哲則, 小川哲司

日本音響学会講演論文集 2020 ( 春季 ) 335 - 336 2020.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
感情推定における感情カテゴリに関する先験的知識の利用

田辺ひかり, 小川哲司, 小林哲則, 林良彦

言語処理学会第26回年次大会発表論文集 P6-23 2020.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
局所的依存構造をSelf-Attentionにより考慮する翻訳文生成

露木浩章, 小川哲司, 小林哲則, 林良彦

言語処理学会第26回年次大会発表論文集 P1-7 2020.03

Research paper, summary (national, other academic conference)
所望音源の方向アトラクターに基づく時変の空間フィルタを用いたDNN音声抽出

中込優, 戸上真人, 小川哲司, 小林哲則

日本音響学会講演論文集 2020 ( 春季 ) 305 - 308 2020.03

Research paper, summary (national, other academic conference)
短発話を対象としたテキスト独立型話者認識のためのフレームレベル音素非依存特徴抽出

俵直弘, 小川厚徳, 岩田具治, デラクロアマーク, 小川哲司

日本音響学会講演論文集 2020 ( 春季 ) 997 - 998 2020.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Attentionに関する損失を利用したノイズに頑健なEnd-to-End音声認識

樋口陽祐, 俵直弘, 小川厚徳, 岩田具治, 小林哲則, 小川哲司

日本音響学会講演論文集 2020 ( 春季 ) 935 - 936 2020.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
クラウドソーシングにおける動的な回答収集による低コストな多数決手法

森永聖也, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

情報処理学会研究報告 (HCI) 2019-HCI-186 ( 36 ) 1 - 6 2020.01

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
マルチチャネル音声強調のための時間領域畳み込みデノイジングオートエンコーダ

俵直弘, 小林哲則, 小川哲司

電子情報通信学会技術研究報告（SP） 2019.12

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
あらゆる風車に適用可能な状態監視技術を目指して～風車主要機器におけるデータ駆動型異常検知とその評価～

長谷川隆徳, 緒方淳, 村川正宏, 飯田誠, 小川哲司

第41回風力エネルギー利用シンポジウム 2019.12

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
映像情報を用いた繁殖牛分娩検知システムの構築と運用法に関する研究・開発（自然に挑む画像センシング技術～農林水産業の現場でいかに役立つか？～）

小川哲司, 斎藤奨, 中野鐵兵

OplusE 41 ( 6 ) 858 - 862 2019.11 [Invited]

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)
画像情報による黒毛和牛種の乗駕行動の検知に関する検討

川野百合子, 河田宗士, 沖本祐典, 中野鐵兵, 赤羽誠, 近藤育海, 山崎凌汰, 日下裕美, 坂口実, 小川哲司

日本畜産学会第126回大会要旨 IV-19-03 2019.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
双方向時系列情報を利用した検出結果と正解情報付与による効率的なアノテーション手法

真殿航輝, 中野鐵兵, 小林哲則, 小川哲司

第22回画像の認識・理解シンポジウム PS2-5 1 - 4 2019.08

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
学習可能な暗号化画像への敵対的学習に基づく攻撃

真殿航輝, 田中正行, 大西正輝, 小川哲司

第22回画像の認識・理解シンポジウム PS1-41 1 - 4 2019.08

Authorship：Last author

Research paper, summary (national, other academic conference)
DPGMMと敵対的学習に基づく話者の違いに頑健な特徴抽出とゼロリソース音声認識での評価

樋口陽祐, 俵直弘, 小林哲則, 小川哲司

情報処理学会研究報告 (SLP) 2019-SLP-128 ( 6 ) 1 - 6 2019.07

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
画像から得られる牛の身体情報に基づく分娩予兆検知

兵頭亮介, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

人工知能学会全国大会論文集 JSAI2019 2019.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

肉牛の分娩検知システムにおけるクラウドソーシングを用いた誤通報の抑制

沖本裕典, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

人工知能学会全国大会論文集 JSAI2019 2019.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

ベイズ状態空間モデルを用いた定置網漁のための日単位漁獲量予測

幸加木裕也, 堀内優佳, 俵直弘, 福嶋正義, 井戸上彰, 橋本和夫, 小林哲則, 小川哲司

人工知能学会全国大会論文集 JSAI2019 2019.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

回転機器状態監視のための振動異常検知システムにおける特徴表現学習

長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

人工知能学会全国大会論文集 JSAI2019 2019.06

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)

動画像から得られる牛の身体情報に基づく分娩予兆検知システム

兵頭亮介, 菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

電子情報通信学会技術研究報告（PRMU） 119 ( 64 ) 1 - 6 2019.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
クラウドソーシングを用いた肉牛分娩開始検知システムの早期運用

沖本裕典, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

電子情報通信学会技術研究報告（PRMU） 119 ( 64 ) 7 - 12 2019.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
網内の魚の振る舞いを表現した状態空間モデルによる定置網漁のための日単位漁獲量予測

幸加木裕也, 俵直弘, 橋本和夫, 小林哲則, 福嶋正義, 井戸上彰, 小川哲司

電子情報通信学会技術研究報告（PRMU） 119 ( 64 ) 13 - 18 2019.05

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
早稲田大学知覚情報システム・メディアインテリジェンス研究室紹介

長谷川隆徳, 黒澤郁音, 斎藤奨, 松山洋一, 林良彦, 小林哲則, 小川哲司

日本風力エネルギー学会誌 43 ( 1 ) 154 - 157 2019.05

Authorship：Last author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)

クエリ文によるゼロショット映像検索 – TRECVID 2018 AVSタスクの成果報告 –

植木一也, 中込優, 平川幸司, 菊池康太郎, 林良彦, 小川哲司, 小林哲則

動的画像処理実用化ワークショップ2019 (DIA2019) 2019.03

Research paper, summary (national, other academic conference)
漁獲量における心理尺度と漁獲量予測器の最適化への利用

幸加木裕也, 福嶋正義, 井戸上彰, 橋本和夫, 小林哲則, 小川哲司

日本水産学会春季大会要旨 140 2019.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
画像情報による黒毛和牛種の状態識別に基づいた分娩予兆検知システム

兵頭亮介, 安田早希, 斎藤奨, 菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

日本畜産学会第125回大会要旨 XIII-29-10 2019.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
映像情報による肉牛の分娩検知システムにおけるクラウドソーシングを用いた誤検出抑制

沖本祐典, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

日本畜産学会第125回大会要旨 XIII-29-09 2019.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
音韻・話者特徴抽出のためのディスエンタングリングニューラルネットワークの実現にむけて

俵直弘, 小林哲則, 小川哲司

日本音響学会講演論文集 2019 ( 春季 ) 1003 - 1004 2019.03
ドメイン属性情報を用いたRNN言語モデルのドメイン汎化

芦川博人, 森岡幹, 俵直弘, 小川厚徳, 岩田具治, 小川哲司, 小林哲則

日本音響学会講演論文集 2019 ( 春季 ) 927 - 930 2019.03
ゼロリソース言語音声認識のための発話者の違いに頑健な特徴抽出

樋口陽祐, 俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2019 ( 春季 ) 923 - 924 2019.03
noise-aware学習を用いた敵対的デノイジングオートエンコーダによるポストフィルタリング

俵直弘, 田辺ひかり, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

日本音響学会講演論文集 2019 ( 春季 ) 159 - 162 2019.03
隣接単語系列の予測による文の分散表現構成法

露木浩章, 小川哲司, 小林哲則, 林良彦

言語処理学会第25回年次大会発表論文集 1479 - 1482 2019.03
敵対的デノイジングオートエンコーダを用いた拡散性雑音除去

田辺ひかり, 俵直弘, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

電子情報通信学会技術研究報告（SP） 118 ( 497 ) 155 - 160 2019.03

隣接単語系列の予測による汎用的な文の分散表現の構成

露木浩章, 小川哲司, 小林哲則, 林良彦

言語処理学会年次大会発表論文集(Web) 25th 2019

畳み込みニューラルネットワークに基づく風車異常検知システムにおける判断根拠の可視化に関する検討

佐伯真於, 緒方淳, 村川正宏, 小川哲司

第40回風力エネルギー利用シンポジウム予稿集 2018.12
正常稼働状態の表現学習に基づく風車異常検知

長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

第40回風力エネルギー利用シンポジウム予稿集 2018.12
RNN言語モデルのためのドメイン属性情報を用いたゼロショット学習

芦川博人, 森岡幹, 俵直弘, 小川厚徳, 岩田具治, 小川哲司, 小林哲則

情報処理学会研究報告 2018.12
映像からの牛の分娩予兆行動検知に関する検討

菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

電子情報通信学会技術研究報告 (PRMU) 118 ( 362 ) 79 - 84 2018.12
画像からの牛の状態識別に基づく分娩予兆検知

兵頭亮介, 安田早希, 斎藤奨, 沖本裕典, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

電子情報通信学会技術研究報告 (PRMU) 118 ( 362 ) 57 - 60 2018.12
Waseda_Meisei at TRECVID 2018: Fully-automatic ad-hoc video search

Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

Notebook paper of the TRECVID 2018 Workshop 2018.11

Research paper, summary (international conference)
Waseda Meisei at TRECVID2018: Ad-hoc video search

Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

Notebook paper of the TRECVID 2018 Workshop 2018.11

Research paper, summary (international conference)
定置網漁の日単位漁獲量予測モデリングにおける学習データ量と予測性能の関係の調査

堀内優佳, 幸加木裕也, 小林哲則, 小川哲司

日本水産学会秋季大会要旨 2018.09

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
敵対的デノイジングオートエンコーダによる非線形ひずみ除去フィルタリング

俵直弘, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

日本音響学会講演論文集 2018 ( 秋季 ) 159 - 162 2018.09
非線形ひずみ除去のための敵対的 denoising autoencoder

俵直弘, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

情報処理学会研究報告 2018-SLP-123 ( 1 ) 1 - 7 2018.07
牛の分娩予兆として映像から観測可能な状態の検知

沖本祐典, 菅原一真, 齊藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

人工知能学会全国大会論文集 JSAI2018 2018.06

AIで風車の異常を見つける：データ駆動型アプローチによる異常検知の最新動向

長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

日本風力エネルギー学会誌 42 ( 1 ) 72 - 76 2018.05 [Invited]

Authorship：Last author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)

定置網漁における漁獲過程モデルを用いたシロサケの日単位漁獲量予測

幸加木裕也, 俵直弘, 小林哲則, 橋本和夫, 小川哲司

日本水産学会春季大会要旨 2018.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
映像情報を用いた分娩時の牛の状態推定

沖本祐典, 菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

日本畜産学会第124回大会要旨 2018.03

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
敵対的マルチタスク学習を用いた話者の違いに頑健な特徴抽出とゼロリソース音素識別による評価

土屋平, 俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2018 ( 春季 ) 9 - 12 2018.03
話者正規化における言語非依存性とゼロリソース音声認識における効果

島田拓也, 俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2018 ( 春季 ) 109 - 112 2018.03
敵対的学習に基づく話者特徴抽出

俵直弘, 土屋平, 小川哲司, 小林哲則

日本音響学会講演論文集 2018 ( 春季 ) 141 - 144 2018.03
異種データ活用のための変換複合行列分解

土屋平, 岩田具治, 小川哲司

電子情報通信学会技術研究報告 (IBISML) 117 ( 475 ) 41 - 48 2018.03

正常・損傷の表現学習に基づく風力発電システム異常検知技術の高度化

長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

第39回風力エネルギー利用シンポジウム 371 - 374 2017.12
Waseda Meisei at TRECVID2017: Ad-hoc video search

Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Notebook paper of the TRECVID 2017 Workshop 2017.11

Research paper, summary (international conference)
正常・損傷の表現学習に基づく機械振動異常検知

長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

第16回評価・診断に関するシンポジウム講演論文集 5 - 10 2017.11

複数人対話を対象としたRNN言語モデルにおける発話終端情報利用の有効性

芦川博人, 俵直弘, 小川厚徳, 岩田具治, 小林哲則, 小川哲司

日本音響学会講演論文集 2017 ( 秋季 ) 23 - 26 2017.09
ドメイン依存・非依存の内部表現を有する再帰型ニューラルネットワーク言語モデル

森岡幹, 俵直弘, 小川哲司, 小川厚徳, 岩田具治, 小林哲則

日本音響学会講演論文集 2017 ( 秋季 ) 27 - 30 2017.09
会話参加状態を考慮した振る舞いをするロボットのシステムアーキテクチャ

菅原一真, 浅野秀平, 赤川優斗, 藤江真也, 小川哲司, 小林哲則

人工知能学会全国大会論文集 JSAI2017 2017.06

Use of the end of sentence and speaker-derived information in recurrent neural network language models for multiparty conversations

116 ( 477 ) 287 - 290 2017.03

国際会議INTERSPEECH2016参加報告

浅見太一, 小川厚徳, 小川哲司, 大谷大和, 倉田岳人, 齋藤大輔, 塩田さやか, 篠原雄介, 鈴木雅之, 高道慎之介, 南條浩輝, 橋本佳, 樋口卓哉, 増村亮, 吉野幸一郎, 渡部晋治

情報処理学会研究報告 (SLP) vol.2016-SLP-115 ( 7 ) 1 - 7 2017.02

Research paper, summary (national, other academic conference)
少量データに頑健なニューラルネットワーク言語モデル

森岡幹, 岩田具治, 小川厚徳, 俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2016 ( 秋季 ) 89 - 92 2016.09
複数人対話のための話者情報を用いたRNN言語モデル

芦川博人, 森岡幹, 小川厚徳, 岩田具治, 俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2016 ( 秋季 ) 85 - 88 2016.09
深層学習を用いた出現音素の偏りに頑健な話者照合手法

佐藤洋輔, 小川哲司, 堀内靖雄, 黒岩眞吾

電子情報通信学会総合大会講演論文集 2016.03

Research paper, summary (national, other academic conference)
連想記憶に基づく線形分離行列推定を用いたタンデム接続型音源分離

大町基, 小川哲司, 小林哲則, 藤枝大, 片桐一浩

日本音響学会講演論文集 2016 ( 春季 ) 21 - 24 2016.03
高次相関を考慮した音響特徴量のDNNに基づく音声認識での利用

小川哲司, 小林哲則, 新田恒雄

日本音響学会講演論文集 2016 ( 春季 ) 161 - 162 2016.03

Authorship：Lead author, Corresponding author

Research paper, summary (national, other academic conference)
ニューラルネットワークに基づく識別器の不確かさの推定とマルチストリーム音声認識への適用

小川哲司, Mallidi Harish, Vesely Karel, Hermansky Hynek

日本音響学会講演論文集 2016 ( 春季 ) 67 - 70 2016.03

Authorship：Lead author, Corresponding author

Research paper, summary (national, other academic conference)
国際会議INTERSPEECH2015参加報告

浅見太一, 大谷大和, 小川哲司, 木下慶介, 倉田岳人, 齋藤大輔, 塩田さやか, 太刀岡勇気, 中村静, 増村亮, 渡部晋治

情報処理学会研究報告 2016-SLP-110 ( 4 ) 1 - 5 2016.02
スペクトラルクラスタリングに基づく話者クラスタリングのための因子分析法の効果の検証

俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2015 ( 秋季 ) 173 - 174 2015.09
連想記憶に基づくブラインド音源分離のエコーキャンセリングへの応用

大町基, 小川哲司, 小林哲則, 藤枝大, 片桐一浩

日本音響学会講演論文集 2015 ( 秋季 ) 593 - 596 2015.09
複数の文脈長を考慮したリカレントニューラルネットワークに基づく言語モデル

森岡幹, 俵直弘, 小川哲司, 岩田具治, 小川厚徳, 堀貴明, 小林哲則

日本音響学会講演論文集 2015 ( 秋季 ) 17 - 20 2015.09
国際会議ICASSP2015参加報告

岡本拓磨, 小川哲司, 落合翼, 柏木陽佑, 亀岡弘和, 木下慶介, 郡山知樹, 齋藤大輔, 篠崎隆宏, 高木信二, 滝口哲也, 太刀岡勇気, 俵直弘, 橋本佳, 藤本雅清, 松田繁樹, 三村正人, 吉岡拓也, 渡部晋治

情報処理学会研究報告 2015-SLP-107 ( 3 ) 1 - 7 2015.07
テンソル積による基底変換に基づく音声認識に関する研究

上田賢次郎, 小川哲司, 小林哲則, 桂田浩一, 新田恒雄

日本音響学会講演論文集 2015 ( 春季 ) 7 - 10 2015.03
国際会議INTERSPEECH2014，SLT2014参加報告

浅見太一, 岩野公司, 小川哲司, 駒谷和範, 齋藤大輔, 篠田浩一, 太刀岡勇気, 東中竜一郎, 福田隆, 増村亮, 渡部晋治

情報処理学会研究報告 2015-SLP-105 ( 7 ) 1 - 6 2015.02

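We attended INTERSPEECH2014, organized by ISCA and held in Singapore from September 14 to 18, 2014, and SLT2014, organized by IEEE and held in Lake Tahoe, USA, from December 14 to 18 of the same year. Both are top-tier international conferences in the field of spoken language processing. Here we report on the latest technical trends and the noteworthy presentations at these conferences, focusing on presentations from overseas.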

i-vectorを用いたスペクトラルクラスタリングによる雑音環境下話者クラスタリング

俵直弘, 小川哲司, 小林哲則

情報処理学会研究報告 2015-SLP-105 ( 11 ) 1 - 6 2015.02

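Noise-robust speaker clustering is achieved by combining i-vector speaker representations with spectral clustering. First, we show experimentally that, when speaker clustering is applied to noisy speech, inter-utterance similarities computed from i-vectors, which are known as highly accurate speaker features, do not properly estimate speaker similarity. We then confirm the validity of applying spectral clustering to this problem by analyzing the eigenvectors of the graph Laplacian. Finally, to confirm the noise robustness of spectral clustering experimentally, we conduct speaker clustering experiments on speech obtained by adding various types of noise to the Corpus of Spontaneous Japanese, and show that noisy speech can be clustered with accuracy comparable to that for clean speech.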
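
As a rough illustration of the spectral clustering step named in this summary, the sketch below clusters utterance-level vectors through the eigenvectors of a normalized graph Laplacian. It is not the authors' implementation: the Gaussian affinity on cosine distance, the `sigma` value, the deterministic k-means initialization, and all function names are assumptions made for the example, and the input rows stand in for i-vectors that would come from a separate extractor.

```python
import numpy as np

def affinity_from_cosine(X, sigma=0.5):
    """Affinity matrix from cosine distance between row vectors (assumed kernel)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(Xn @ Xn.T, -1.0, 1.0)
    A = np.exp(-(1.0 - cos) / sigma)
    np.fill_diagonal(A, 0.0)  # no self-loops in the similarity graph
    return A

def spectral_embedding(A, k):
    """Rows of the k eigenvectors of the normalized Laplacian with smallest eigenvalues."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(A.shape[0]) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    U = vecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, iters=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_utterances(X, k):
    """Cluster utterance vectors (one per row) into k speaker clusters."""
    return kmeans(spectral_embedding(affinity_from_cosine(X), k), k)
```

Clustering is done in the eigenvector space rather than on the raw similarities, which is the property the summary examines when it analyzes the eigenvectors of the graph Laplacian.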

連想記憶と線形分離フィルタを用いたブラインド音源分離

大町基, 小川哲司, 小林哲則, 藤枝大, 片桐一浩

情報処理学会研究報告 2015-SLP-105 ( 4 ) 1 - 6 2015.02

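We propose a high-accuracy, low-distortion blind source separation method that combines associative memory with a linear separation filter. Source separation based on linear filters, such as independent component analysis (ICA) and independent vector analysis (IVA), has the advantage of low distortion. However, ICA and IVA assume that the sources are independent and non-Gaussian, so separation performance degrades when these assumptions do not hold. The proposed method obtains the separated speech by alternating two steps: using the associative memory to find the distortion-free speech closest to the output of the linear separation filter, and correcting the separation filter coefficients so that the filter output approaches the output of the associative memory. This yields low-distortion separated speech without assuming source independence. Source separation experiments on simultaneous speech from two speakers confirmed that the proposed method improves separation accuracy over IVA.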
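
The alternating structure described in this summary can be sketched with a toy example. This is only an illustration under strong assumptions, not the proposed system: a nearest-template lookup stands in for the associative memory, the mixture is instantaneous and in the time domain, the filter update is an ordinary least-squares fit, and the names `nearest_template` and `alternate` are hypothetical.

```python
import numpy as np

def nearest_template(y, templates):
    """Stand-in for the associative memory: return the stored clean signal
    with the highest absolute normalized correlation to y."""
    yn = y / np.linalg.norm(y)
    tn = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    return templates[np.argmax(np.abs(tn @ yn))]

def alternate(X, templates, w0, iters=5):
    """Alternate between recalling a clean target and refitting the linear filter.

    X: (channels, samples) observed mixture; w0: initial filter weights.
    """
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(iters):
        y = w @ X                                  # current linear filter output
        target = nearest_template(y, templates)    # closest distortion-free signal
        # Least-squares update so that the filter output approaches the target.
        w, *_ = np.linalg.lstsq(X.T, target, rcond=None)
    return w
```

Because the final output is still a linear combination of the observed channels, the sketch keeps the low-distortion property of linear filtering while taking its target from the memory rather than from an independence assumption.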

CiNii
スペクトラルクラスタリングに基づく話者クラスタリング

俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2014 ( 秋季 ) 95 - 98 2014.09
A study on MLP-based speaker canonicalization

IPSJ SIG Notes 2014-SLP-102 ( 8 ) 1 - 6 2014.07

Accurate and efficient speaker canonicalization is proposed to improve the performance of speaker-independent ASR systems. Vocal tract length normalization (VTLN) is often applied to speaker canonicalization in ASR; however, it requires parallel decoding of speech when estimating the optimal warping parameter. In addition, VTLN applies the same linear spectral transformation throughout an utterance, although the optimal mapping functions differ among phonemes. In this study, we propose a novel speaker canonicalization method using a multilayer perceptron (MLP) that is trained on a data set of vowels to map an input spectrum to the output spectrum of a standard (canonical) speaker. The proposed method integrates MLP-based mapping and identity mapping depending on the frequency band, and achieves accurate recognition without any tuning of the mapping function at run time. Experiments on a continuous digit recognition task showed that the proposed method reduces the intra-class variability in both the vowel and consonant parts and outperforms VTLN.

CiNii
Speaker recognition using i-vector

Ogawa Tetsuji, Shiota Sayaka

THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN 70 ( 6 ) 332 - 339 2014.06 [Invited]

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)

DOI CiNii J-GLOBAL
標準話者母音スペクトルへの変換に基づく話者正準化

久保田雄一, 大町基, 小川哲司, 小林哲則, 新田恒雄

日本音響学会講演論文集 2014 ( 春季 ) 77 - 78 2014.03
因子分析モデルに基づく話者照合の環境変動に対する頑健性の調査

福地佑介, 俵直弘, 小川哲司, 小林哲則

日本音響学会講演論文集 2013 ( 秋季 ) 75 - 78 2013.09
Machine learning for speaker recognition

OGAWA Tetsuji, MATSUI Tomoko

The Journal of the Acoustical Society of Japan 69 ( 7 ) 349 - 356 2013.07

CiNii
効率的なサンプリング手法を用いた話者モデリング

俵直弘, 小川哲司, 渡部晋治, 中村篤, 小林哲則

情報処理学会研究報告 2013-SLP-97 ( 2 ) 1 - 8 2013.07

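We propose an efficient sampling method for estimating a multi-scale mixture model. A multi-scale mixture model is a mixture model whose component distributions are themselves mixtures; in this paper we deal with the model whose components are Gaussian mixture models (GMMs). When this model is applied to a set of speech data uttered by multiple speakers, intra-speaker variability, observed at a relatively short scale of several tens of frames such as an utterance, is represented by each component GMM. Inter-speaker variability, which lies between utterances of different speakers and is observed at a relatively long scale, is represented by the multi-scale mixture model as a whole. For estimating the structure of such a complex, hierarchical distribution, model estimation frameworks based on stochastic approaches such as Markov chain Monte Carlo (MCMC) methods are effective. However, when a simple MCMC method such as Gibbs sampling is applied as is, the long-time-scale and short-time-scale structures, which should be hierarchical, are sampled on an equal footing, so the iterative estimation easily falls into a local optimum. We therefore propose a sampling method that accounts for the hierarchical structure of the model by introducing a technique akin to blocked Gibbs sampling. We also show that introducing the iterated conditional modes (ICM) algorithm and replacing part of the sampling process with a deterministic procedure avoids the pathological solution in which all distributions collapse into a single one. Speaker clustering experiments on an evaluation set with superimposed non-stationary noise showed that structure estimation based on the proposed sampling method clusters more accurately than structure estimation based on conventional sampling or variational Bayes.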

CiNii
話者認識で用いる機械学習

小川哲司, 松井知子

日本音響学会誌 69 ( 7 ) 349 - 356 2013.07 [Invited]

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)

DOI
指向性を付与したマルチチャネルウィーナフィルタを前段に持つ音源分離方式の検討

大町基, 小川哲司, 赤桐健三, 小林哲則

日本音響学会講演論文集 2013 ( 春季 ) 937 - 940 2013.03
性能モニタリングに基づく多層パーセプトロンの適応的選択による雑音に頑健なマルチストリーム音声認識

小川哲司, Li Feipeng, Hermansky Hynek

日本音響学会講演論文集 2013 ( 春季 ) 167 - 170 2013.03
Current situations and issues of speaker recognition technologies

AMINO Kanae, ISHIHARA Shunichi, OGAWA Tetsuji, OSANAI Takashi, KUROIWA Shingo, KOSHINAKA Takafumi, SHINODA Koichi, TSUGE Satoru, NISHIDA Masafumi, MATSUI Tomoko, WANG Longbiao

IEICE technical report. Speech 112 ( 450 ) 63 - 70 2013.02 [Invited]

Speaker recognition, the task of identifying who is speaking from his or her voice, has been studied for 30 years. As security measures have grown in importance, speaker recognition research has recently boomed. In this article, we survey the present state of speaker recognition research and discuss its open problems. In particular, we focus on worldwide trends, machine learning approaches, robustness against environmental variation, and forensic applications.

CiNii
New Speech Research Paradigm in the Cloud Era

2012-SLP-92 ( 4 ) 1 - 7 2012.07

CiNii
Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

2012-SLP-92 ( 8 ) 1 - 6 2012.07

CiNii
発話単位DPMMを用いたフルベイズ話者クラスタリングと大規模データによる評価

俵直弘, 小川哲司, 渡部晋治, 中村篤, 小林哲則

日本音響学会講演論文集 2012 ( 春季 ) 207 - 210 2012.03
話者照合における因子分析に基づく特徴抽出に関する評価

小川哲司, 小林哲則

日本音響学会講演論文集 2012 ( 春季 ) 197 - 198 2012.03
Fully Bayesian speaker clustering based on hierarchical structured Dirichlet process mixture model

TAWARA Naohiro, OGAWA Tetsuji, WATANABE Shinji, NAKAMURA Atsushi, KOBAYASHI Tetsunori

111 ( 480 ) 21 - 28 2012.03

We proposed a novel speaker clustering method that estimates the structure of a fully Bayesian utterance generative model with a hierarchical structure. We defined the hierarchical generative model as a mixture of GMMs, each of which represents a speaker's distribution. Because exact estimation of this model is infeasible, we estimated it approximately by introducing a sampling method. Speaker clustering experiments showed that the proposed method is effective for data in which the number of utterances varies from speaker to speaker, whereas the conventional method suffers significant degradation in clustering accuracy on such data.

CiNii
多重混合ガウス分布モデルにおけるフルベイズモデル推定手法の検討と話者クラスタリングによる評価

俵直弘, 渡部晋治, 小川哲司, 小林哲則

日本音響学会講演論文集 2011 ( 秋季 ) 175 - 178 2011.09
Modified LSD 最小化に基づく空間フィルタキャリブレーション

田中信秋, 小川哲司, 小林哲則

日本音響学会講演論文集 2011 ( 秋季 ) 33 - 36 2011.09
クラス内変動に頑健なカーネルマシンと話者照合への適用

小川哲司, 日野英逸, 村田昇, 小林哲則

日本音響学会講演論文集 2011 ( 秋季 ) 183 - 186 2011.09

Authorship：Lead author, Corresponding author

Research paper, summary (national, other academic conference)
Speaker verification system robust to speaking style variation using multiple kernel learning based on conditional entropy minimization

2011-SLP-87 ( 3 ) 1 - 6 2011.07

CiNii
発話を単位としたディリクレ過程混合モデルに基づく話者クラスタリング

俵直弘, 渡部晋治, 小川哲司, 小林哲則

日本音響学会講演論文集 2011 ( 春季 ) 41 - 44 2011.03
Investigation on optimization in speaker recognition using multiple kernel learning

IEICE technical report 110 ( 357 ) 153 - 158 2010.12

CiNii
マルチカーネル学習を用いた話者認識における最適化の検討

小川哲司, 日野英逸, Nima Reyhani, 村田昇, 小林哲則

情報処理学会研究報告 2010-SLP-84 ( 27 ) 1 - 6 2010.12

CiNii
Toward Developing Practical Automatic Speech Recognition Technology : Sound Source Separation Using Square Microphone Array

Takashi Yazu, Makoto Morito, Kei Yamada, Tetsuji Ogawa

IPSJ Magazine 51 ( 11 ) 1410 - 1416 2010.11

Authorship：Last author

Article, review, commentary, editorial, etc. (scientific journal)

CiNii
シャッタが切り取る世界（ちょっとしたエッセイ）

小川哲司

日本音響学会誌 66 ( 10 ) 528 - 528 2010.10

Authorship：Lead author, Corresponding author

Article, review, commentary, editorial, etc. (scientific journal)

DOI CiNii
情報論的な最適化に基づくマルチカーネル学習を用いた話者認識

小川哲司, 日野英逸, Nima Reyhani, 村田昇, 小林哲則

日本音響学会講演論文集 2010 ( 秋季 ) 81 - 84 2010.09
CENSREC-1-AV: An evaluation framework for multimodal speech recognition

Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda, Takeshi Yamada, Tetsuya Takiguchi, Satoru Tsuge, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Satoshi Nakamura

IPSJ SIG Notes 2010 ( 7 ) 1 - 6 2010.07

This paper introduces an evaluation framework for multimodal speech recognition: CENSREC-1-AV. The CENSREC-1-AV corpus provides an audiovisual speech database and a baseline system for multimodal speech recognition. Speech signals were recorded in clean conditions for training, and in-car noise was added for testing. Color and infrared pictures were captured as training data, and the test images were corrupted using gamma correction. In the baseline system, acoustic MFCCs and eigenface or optical-flow information are adopted as audio and visual features, respectively, and multi-stream HMMs are used as the recognition model.

CiNii
雑音下マルチモーダル音声認識評価基盤CENSREC-1-AVの構築

田村哲嗣, 宮島千代美, 北岡教英, 武田一哉, 山田武志, 滝口哲也, 柘植覚, 山本一公, 西浦敬信, 中山雅人, 傳田遊亀, 藤本雅清, 松田繁樹, 小川哲司, 黒岩眞吾, 中村哲

情報処理学会研究報告 2010-SLP-82 ( 7 ) 1 - 6 2010.07

CiNii
CENSREC-1-AV: マルチモーダル音声認識コーパスの構築

田村哲嗣, 宮島千代美, 北岡教英, 武田一哉, 山田武志, 滝口哲也, 柘植覚, 山本一公, 西浦敬信, 中山雅人, 傳田遊亀, 藤本雅清, 松田繁樹, 小川哲司, 黒岩眞吾, 中村哲

日本音響学会講演論文集 2010 ( 春季 ) 219 - 220 2010.03
Conversation Robot and Its Audition System

FUJIE Shinya, OGAWA Tetsuji, KOBAYASHI Tetsunori

JRSJ 28 ( 1 ) 23 - 26 2010.01

Article, review, commentary, editorial, etc. (scientific journal)

DOI CiNii
ロボット頭頂部に設置した小型正方形マイクロホンアレイによる音源定位

細谷耕佑, 小川哲司, 小林哲則

日本音響学会講演論文集 2009 ( 秋季 ) 775 - 778 2009.09

Research paper, summary (national, other academic conference)
音声認識利用者の発声方法誘導を行うエキスパートシステムの実装と評価

網田康裕, 中野鐵兵, 小川哲司, 菊池英明, 小林哲則

日本音響学会講演論文集 2009 ( 秋季 ) 229 - 230 2009.09
ゾーン強調型ビームフォーマの構築

田中信秋, 細谷耕佑, 小川哲司, 小林哲則

日本音響学会講演論文集 2009 ( 秋季 ) 153 - 154 2009.09
ロンバード発声音声コーパスの設計と評価

小川哲司, 川野弘, 西浦敬信, 山田武志, 北岡教英, 小林哲則

日本音響学会講演論文集 2009 ( 秋季 ) 141 - 144 2009.09
連続円動作の認識に基づくメニュー項目の選択法

橋口拓弥, 藤江真也, 小川哲司, 中野鐵兵, 小林哲則

画像の理解・認識シンポジウム(MIRU2009)予稿集 IS3-70 1846 - 1850 2009.07
騒音下音声認識システム評価におけるロンバード効果の影響の検証−ロンバード発声適応モデルを用いた評価−

小川哲司, 小林哲則

日本音響学会講演論文集 2009 ( 春季 ) 175 - 176 2009.03
Hands-free speech recognition system for robot

HOSOYA Kosuke, OGAWA Tetsuji, FUJIE Shinya, WATANABE Daichi, ICHIKAWA Yuhi, TANIYAMA Hikaru, KOBAYASHI Tetsunori

IPSJ SIG Notes 2008-SLP-74 ( 123 ) 7 - 12 2008.12

A new noise reduction method suitable for autonomous mobile robots is proposed and applied as pre-processing for a hands-free spoken dialogue system. Using small, lightweight devices and low-computational-cost algorithms, the proposed method can reduce various kinds of noise that are mixed with the target speech when people talk with the robot, such as directional noise, diffuse noise, noise from the robot's own movement, and speech uttered by the robot. Here, we assume that the person talking with the robot is in front of the robot, and thus the proposed method aims at extracting speech signals arriving from the frontal direction. In addition, when the person moves away from the front of the robot, the sound source can be localized by face detection and tracking using facial images obtained from a camera mounted on the eyes of the robot. By taking advantage of the robot's ability to combine speech information with image information, the various kinds of noise can be reduced in real time, and thus the hands-free spoken dialogue system can work well in real environments.

CiNii
Progress Report of SLP Noisy Speech Recognition Evaluation WG : Individual evaluation framework for each factor affecting recognition performance (3)

KITAOKA Norihide, YAMADA Takeshi, TAKIGUCHI Tetsuya, TSUGE Satoru, YAMAMOTO Kazumasa, MIYAJIMA Chiyomi, NISHIURA Takanobu, NAKAYAMA Masato, DENDA Yuki, FUJIMOTO Masakiyo, TAMURA Satoshi, MATSUDA Shigeki, OGAWA Tetsuji, KUROIWA Shingo, TAKEDA Kazuya, NAKAMURA Satoshi

IPSJ SIG Notes 2008-SLP-73 ( 102 ) 41 - 46 2008.10

We organized a working group under the Special Interest Group of Spoken Language Processing in the Information Processing Society of Japan and have developed evaluation frameworks for noisy speech recognition (the CENSREC series), with which researchers can evaluate their own noise-robust speech recognition methods and compare them with others. In this report, we introduce the series, review the history of noisy speech recognition research at ASJ and ICASSP, and consider the role of our work in that history. Finally, we discuss future directions.

CiNii
Dimensionality Reduction in Rescoring Using Likelihood Patterns Given by HMMs

OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report 108 ( 142 ) 73 - 78 2008.07

We investigate dimensionality reduction of feature vectors in rescoring that uses likelihood patterns given by HMMs with long-time structures as feature parameters. The likelihood patterns calculated for word utterances using word-wise statistical models are discriminative even when the utterances belong to different word classes consisting of similar phonemes. In rescoring with likelihood feature vectors that have long-time structures, this characteristic can help reduce errors for classes that are difficult to classify by conventional ML classification. However, since the dimensionality of the likelihood feature vectors equals the vocabulary size, the method is not feasible for large-vocabulary tasks. Thus, in the present paper, we attempt to reduce the dimensionality of the feature vectors by selecting from the vocabulary only the word classes that contribute to classification and using the likelihoods of only those word classes as the feature parameters. When static pattern recognition on the feature space constructed from the likelihood feature vectors was applied to rescoring in a word recognition system, the proposed dimensionality reduction did not considerably degrade performance compared to the system without dimensionality reduction, and it improved performance compared to conventional HMMs.

CiNii
HMM における尤度パターンの非対称性を利用した音声認識

加藤健一, 小川哲司, 小林哲則

日本音響学会講演論文集 2008 ( 春季 ) 209 - 212 2008.03
ロボット頭部に設置した4系統小型無指向性マイクロホンによるハンズフリー音声認識

竹内寛史, 高田晋太郎, 小川哲司, 赤桐健三, 小林哲則, 森戸誠

日本音響学会講演論文集 2008 ( 春季 ) 155 - 158 2008.03
残響下音声認識評価基盤（CENSREC-4）の構築

西浦敬信, 中山雅人, 傳田遊亀, 北岡教英, 山本一公, 山田武志, 藤本雅清, 柘植覚, 宮島千代美, 滝口哲也, 田村哲嗣, 小川哲司, 松田繁樹, 黒岩眞吾, 武田一哉, 中村哲

日本音響学会講演論文集 2008 ( 春季 ) 175 - 178 2008.03
雑音下音声認識評価ワーキンググループ活動報告：認識に影響する要因の個別評価環境(2)

北岡教英, 山田武志, 滝口哲也, 柘植覚, 山本一公, 宮島千代美, 西浦敬信, 中山雅人, 傳田遊亀, 藤本雅清, 田村哲嗣, 松田繁樹, 小川哲司, 黒岩眞吾, 武田一哉, 中村哲

情報処理学会研究報告 2007-SLP-69 1 - 6 2007.12

CiNii
指向性雑音と拡散性雑音の混在する環境を対象とした携帯端末向け音声強調の検討

高田晋太郎, 小川哲司, 赤桐健三, 小林哲則

日本音響学会講演論文集 2007 ( 秋季 ) 743 - 746 2007.09
テンプレート群からの確率的距離を用いた階層的音声認識の検討

加藤健一, 小川哲司, 小林哲則

日本音響学会講演論文集 2007 ( 秋季 ) 147 - 150 2007.09
シミュレーションに基づく騒音下音声認識システム評価におけるロンバード効果の影響の検証−複数の認識タスク，騒音レベルに対する評価−

小川哲司, 倉持公壮, 小林哲則

日本音響学会講演論文集 2007 ( 秋季 ) 195 - 198 2007.09
Hierarchical Spoken Word Recognition System Using Probabilistic Distances from a Group of Templates with Long-Time Structures

KATO Ken-ichi, OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report 107 ( 116 ) 79 - 84 2007.06

We propose a hierarchical spoken word recognition method that, at the first stage, calculates probabilistic distances from a group of templates with relatively long-time structures and, at the second stage, applies static pattern recognition using the probabilistic distances as feature vectors. Almost all conventional speech recognizers treat the time series of spectral parameters as feature vectors and prepare a statistical model for each category. The category with the highest likelihood is taken as the category of the input data. Here, the likelihood of each category depends on the quantity or quality of the training data set and on the structure of the statistical models, which leads to classifier-specific trends in recognition errors. The probabilistic distances from templates are stable if word probability models are selected as the templates. In the present paper, we show that hierarchical spoken word recognition using probabilistic distances from word templates as feature vectors can reduce errors even when the likelihood of the correct category is not the highest. In a spoken word recognition experiment, the proposed method reduced errors by 79% compared with the conventional HMM-based speech recognition method.

CiNii
重み付きHLDA を用いた相補的識別器の構成

加藤健一, 小川哲司, 小林哲則

日本音響学会講演論文集 2007 ( 春季 ) 39 - 40 2007.03
空間フィルタとポストフィルタを用いた背景雑音抑圧

高田晋太郎, 小川哲司, 赤桐健三, 小林哲則

日本音響学会講演論文集 2007 ( 春季 ) 575 - 576 2007.03
プロキシエージェントアーキテクチャによる音声認識アプリケーション用ユーザモニタリング機能の効率化

中野鐵兵, 梅本暁, 藤江真也, 小川哲司, 小林哲則

情報処理学会研究報告 (SLP) 2006-SLP-65 23 - 28 2007.02

Research paper, summary (national, other academic conference)
Combining Complementary Classifiers generated by Boosting in Feature Transformation

KATO Ken-ichi, OGAWA Tetsuji, KOBAYASHI Tetsunori

106 ( 442 ) 25 - 30 2006.12

CiNii
Combining Complementary Classifiers generated by Boosting in Feature Transformation

KATO Ken-ichi, OGAWA Tetsuji, KOBAYASHI Tetsunori

IPSJ SIG Notes 2006 ( 136 (SLP-64) ) 203 - 208 2006.12

A framework for system combination using boosting in feature transformation is proposed. In general, combining multiple classifiers improves on the classification performance of each individual classifier. However, there are two important issues in such system combination. First, classification performance is not necessarily improved if the classifiers are not complementary. Second, an inappropriate combination degrades performance even if complementary classifiers can be obtained. In this paper, we address how to generate and how to combine complementary classifiers. To generate complementary classifiers, boosting was applied in HLDA-based feature transformation. At the combination stage, pattern recognition using a support vector machine was performed, in which a pair of likelihoods emitted by the first-stage classifiers was used as the feature parameter. Experimental results showed the effectiveness of the proposed method: it reduced errors by 74% compared to the case without any system combination.

CiNii
少数のマイクロホンを用いた携帯端末向け音源分離

高田晋太郎, 勘場智之, 小川哲司, 赤桐健三, 小林哲則

日本音響学会講演論文集 2006 ( 秋季 ) 493 - 494 2006.09
時間連続性を利用した音源分離処理の高精度化

勘場智之, 小川哲司, 赤桐健三, 小林哲則

日本音響学会講演論文集 2006 ( 秋季 ) 491 - 492 2006.09
シミュレーションに基づく騒音環境下音声認識におけるロンバード効果の影響

小川哲司, 勘場智之, 小林哲則

日本音響学会講演論文集 2006 ( 秋季 ) 101 - 102 2006.09
Adequacy analysis of simulation-based assessment of speech recognition system

OGAWA Tetsuji, KANBA Satoshi, KOBAYASHI Tetsunori

IEICE technical report 106 ( 123 ) 1 - 6 2006.06

The adequacy of simulation-based assessment of speech recognition systems in noisy conditions is investigated and discussed. To evaluate speech recognition systems in various environments, it is desirable to collect test data in each environment, but this is not realistic because of the enormous effort required. A promising way to perform such evaluation efficiently is to simulate the evaluation experiments for a target environment as follows: a comparatively small amount of test data is collected, and test data for the target environment is then generated by convolving the collected data with the impulse response of the target environment. However, it is not obvious whether this simulation can precisely approximate an experiment in a real environment. This paper clarifies the conditions for performing effective simulations of noisy speech recognition, focusing on the influence of convolving with an impulse response and on the change in acoustic characteristics caused by the Lombard effect.

CiNii
Sound Source Separation using Null Beamformer and Spectral Subtraction, and its Application to Cellular Phone

TAKADA Shintaro, KANBA Satoshi, OGAWA Tetsuji, AKAGIRI Kenzo, KOBAYASHI Tetsunori

IEICE technical report 106 ( 123 ) 7 - 12 2006.06

A novel speech segregation method that combines a null beamformer using 3-channel omnidirectional microphones with spectral subtraction is proposed and successfully applied to mobile terminal devices such as cellular phones and PDAs. To apply speech recognition technology to mobile devices in noisy environments, interfering sounds and ambient noise must be suppressed under the constraints of a small number of microphones, a space-saving microphone arrangement, and low computational cost. The proposed method aims at solving this problem. In this paper, the performance of the proposed method is evaluated using microphones actually embedded in a cellular phone. In sound source separation and continuous speech recognition experiments on double-talk, the proposed method improved the PESQ-based MOS value by 1 point and achieved 80% word accuracy in speech recognition.

CiNii
ロボット頭部に設置したマイクロホンによる環境変動に頑健な音源定位

久保俊明, 持木南生也, 小川哲司, 小林哲則

人工知能学会研究会資料 SIG-Challenge-0522 89 - 94 2005.10

CiNii
BSSとスペクトラルサブトラクションの多段処理による音源分離

伊佐崇, 関矢俊之, 小川哲司, 小林哲則

日本音響学会講演論文集 2005 ( 秋季 ) 705 - 706 2005.09
ロボット頭部に設置した4系統指向性マイクロホンによる音源定位におけるHLDA利用の効果

久保俊明, 持木南生也, 小川哲司, 小林哲則

日本音響学会講演論文集 2005 ( 秋季 ) 717 - 718 2005.09
An extension of the state-observation dependency in Partly Hidden Markov Models and its application to continuous speech recognition

Tetsuji Ogawa, Tetsunori Kobayashi

Systems and Computers in Japan 36 ( 8 ) 31 - 39 2005.07

We extend the state-observation dependencies in a Partly Hidden Markov Model (PHMM) and apply this model to continuous speech recognition. In a PHMM the observations and state transitions are dependent on a series of hidden and observable states. In the standard formulation of a PHMM, the observations and state transitions are conditioned on the same hidden state and observable state variables. Here we also condition the observations and state transitions on the same hidden states but condition the observations and state transitions on different observation states, respectively. This simple improvement to the model gives it significant flexibility allowing it to model stochastic processes more precisely. In addition, by integrating the PHMM containing this extended state-observation dependency with a standard HMM we can construct a stochastic model that we call a Smoothed Partly Hidden Markov Model (SPHMM). Results of continuous speech recognition on a newspaper read-speech have shown reductions of 10 and 24% in the error rate using the PHMM and SPHMM, respectively, compared to a standard HMM thereby displaying the effectiveness of the proposed models. © 2005 Wiley Periodicals, Inc.

DOI
Optimizing the Structure of Partly Hidden Markov Models Using Classification Measure and Genetic Algorithm

OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report. Speech 105 ( 132 ) 37 - 42 2005.06

The structure of the Partly-Hidden Markov Model (PHMM) is optimized. The PHMM was proposed in our previous work to deal with complicated temporal changes in acoustic features. It can realize observation-dependent behavior in both observations and state transitions. In the previous formulation of the PHMM, we used a common structure for all model categories. However, it is well known that the optimal structure, which gives the best performance, differs from category to category. In this paper, we design a new structure optimization method in which the state-observation dependencies in the PHMM are optimally defined for each category using the Weighted Likelihood-Ratio Maximization (WLRM) criterion. The WLRM criterion induces sparse and discriminative structures, and therefore yields structurally discriminative models. We define the optimal structures as the combination of model structures that gives the maximum weighted likelihood ratio over all possible structure patterns, and a genetic algorithm is applied to search for it approximately. In continuous speech recognition experiments on lecture speech, the effectiveness of the proposed structure optimization was shown: it reduced word errors compared to the HMM and to the PHMM with a common structure for all categories.

CiNii
Enhancement of Frequency-Domain BSS by Solving Permutation Problem Using Reference Signal and SMDP

ISA Takashi, SEKIYA Toshiyuki, OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report. Speech 105 ( 133 ) 31 - 36 2005.06

In this paper, we propose a method for solving the permutation problem of blind source separation (BSS) in the frequency domain. We integrate frequency-domain BSS with SMDP (Segregation using Multiple Directivity Pattern). For each frequency, the permutation problem is solved by calculating correlation coefficients between a reference signal and the separated signal. The reference signal corresponding to each original signal is obtained by a process other than BSS, and it does not need to be well separated. We generate a set of simultaneous equations for the amplitudes of the sound sources using these multiple directivities. The solution of these equations gives good estimates of the disturbances. Spectral subtraction is applied with these estimates, and the target speech is enhanced. Experimental results on double-talk recognition show that the proposed technique achieves a 30% error reduction.

CiNii
Effectiveness of adopting HLDA in sound source localization using spectral intensity ratio of microphones

KUBO Toshiaki, MOCHIKI Naoya, SEKIYA Toshiyuki, OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report. Speech 105 ( 133 ) 37 - 42 2005.06

We propose a novel sound source localization method that achieves robust performance in various environments by using heteroscedastic LDA (HLDA). In our previous work, we proposed a robust sound localization method that does not require a strict head-related transfer function (HRTF). In this method, the spectral intensity ratios of microphones mounted on the robot head are extracted as feature parameters, and statistical pattern recognition is then conducted. In pattern recognition, it is well known that performance degrades when the training environment differs from the operating environment. To compensate for the difference, a model adaptation technique such as MLLR is executed using a small amount of data obtained from several directions in the operating environment. However, when the environment in which the robot acts changes constantly, it is practically difficult to adapt every time. Thus, in this paper, we propose extracting useful information from the feature vectors using HLDA. In this method, nuisance information that does not contribute to discrimination, such as reverberation, is removed, and essential information is extracted. A sound source localization experiment showed the robustness of the proposed method.

CiNii
ロボット頭部に設置した４系統指向性マイクロホンによる音源定位

持木南生也, 関矢俊之, 小川哲司, 小林哲則

日本音響学会講演論文集 2005 ( 春季 ) 609 - 610 2005.03
重み付き尤度比最大基準に基づく部分隠れマルコフモデルの構造の最適化

小川哲司, 小林哲則

日本音響学会講演論文集 2005 ( 春季 ) 131 - 132 2005.03
ロボット頭部に設置した4系統指向性マイクロフォンによる音源定位および混合音声認識

持木南生也, 関矢俊之, 小川哲司, 小林哲則

人工知能学会研究会資料 SIG-Challenge-0420-4 21 - 27 2004.12
複数の指向特性を利用した音源分離における音源定位との統合

関矢俊之, 小川哲司, 小林哲則

日本音響学会講演論文集 2004 ( 秋季 ) 617 - 618 2004.10
雑音環境下における階層的音源分離の評価

関矢俊之, 澤田知寛, 小川哲司, 小林哲則

日本音響学会講演論文集 2004 ( 春季 ) 99 - 100 2004.03
ロボット頭部に設置した4系統指向性マイクロホンによる混合音声認識

持木南生也, 関矢俊之, 小川哲司, 小林哲則

日本音響学会講演論文集 2004 ( 春季 ) 95 - 96 2004.03
階層的音源分離に基づく混合音声の認識

澤田知寛, 関矢俊之, 小川哲司, 小林哲則

人工知能学会研究会資料 SIG-Challenge-0318-5 27 - 32 2003.11
Mixed Speech Recognition Using Microphone Array

SEKIYA Toshiyuki, OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report. Speech 103 ( 93 ) 13 - 18 2003.05

Double-talk recognition under distant-microphone conditions is one of the serious problems in real-environment speech recognition. In this paper, this problem is addressed by microphone-array-based BSAS (Band-Selection-based Audio Segregation). In this approach, we prepare several different directivity characteristics using a microphone array and utilize the differences among these array outputs to extract the desired speech. We also used generalized harmonic analysis (GHA) instead of the FFT for spectral analysis to improve the performance of BSAS. These modifications enable good segregation to the human ear, but the quality is still insufficient for recognition because some spectral distortion occurs in the segregation processing. We used MLLR-based acoustic model adaptation and retraining for robustness to the spectral distortion. These efforts enabled 76.2% word accuracy at an SN ratio of 0 dB. This represents a 45% reduction in error compared with the case where only array signal processing was used, and a 30% error reduction compared with the case where array signal processing and BSAS were used.

CiNii
SAFIAによる同時発話音声の認識

関矢俊之, 芹沢新, 小川哲司, 小林哲則

日本音響学会講演論文集 2003 ( 春季 ) 19 - 20 2003.03
部分隠れマルコフモデルの拡張と連続音声認識による評価

小川哲司, 小林哲則

日本音響学会講演論文集 2002 ( 秋季 ) 51 - 52 2002.09
Continuous speech recognition by Partly-Hidden Markov Model

OGAWA Tetsuji, KOBAYASHI Tetsunori

IEICE technical report. Speech 102 ( 159 ) 25 - 30 2002.06

The state-observation dependencies in the Partly-Hidden Markov Model (PHMM) are generalized, and the model is successfully applied to continuous speech recognition. The PHMM, which was proposed in our previous paper, is a novel stochastic model in which pairs of hidden states (H-states) and observable states (O-states) determine the stochastic behavior of the current observation and the next state transition. In the previous formulation of the PHMM, we used a common pair of H-state and O-state to determine both of these. In the modified PHMM proposed here, we use a common H-state but different O-states for the current observation and for the next state transition. This slight modification brings great flexibility in modeling. Experimental results showed the effectiveness of the PHMM (without delta parameters): it reduced the word error by 19% compared to a triphone HMM (with delta parameters).

CiNii
複数の話者依存モデルを用いた話者空間表現に基づく話者適応

牛久祐輔, 小川哲司, 小林哲則

日本音響学会講演論文集 2001 ( 秋季 ) 129 - 130 2001.10
音素単位の部分隠れマルコフモデルにおける状態・出力依存関係の一般化

小川哲司, 小林哲則

日本音響学会講演論文集 2000 ( 秋季 ) 19 - 20 2000.09
部分隠れマルコフモデルにおける状態・出力依存関係の一般化

小川哲司, 古山純子, 小林哲則

日本音響学会講演論文集 2000 ( 春季 ) 155 - 156 2000.03

Industrial Property Rights

予兆検知システムおよびプログラム

特許7313610

中野鐵兵, 小川哲司, 小林哲則, 沖本祐典

Patent
収音装置、収音プログラム、及び収音方法

藤枝大, 原宗大, 片桐一浩, 西城耕平, 小林哲則, 小川哲司

Patent
収音装置、収音プログラム、及び収音方法

藤枝大, 片桐一浩, 西城耕平, 小川哲司

Patent
音声認識モデル学習装置、音声認識装置、およびプログラム

佐藤裕明, 所澤愛子, 伊藤均, 三島剛, 河合吉彦, 小森智康, 小川哲司, 佐藤庄衛

Patent
学習装置、音声認識装置、学習方法、および、学習プログラム

Patent
照合装置、照合方法、および、照合プログラム

小川哲司

Patent
制御状態監視システムおよびプログラム

Patent
信号処理装置、信号処理プログラム、信号処理方法、及び収音装置

Patent
予兆検知システムおよびプログラム

中野鐵兵, 小川哲司, 小林哲則, 沖本祐典

Patent
モニタリング対象機器の異常発生予兆検知方法及びシステム

長谷川隆徳, 緒方淳, 小川哲司, 村川正宏

Patent
予測装置、予測方法および予測プログラム

小林哲則, 小川哲司, 森岡幹

Patent
状態監視システム

中野鐵兵, 小林哲則, 斎藤奨, 小川哲司

Patent
単語予測装置、プログラム

岩田具治, 小川厚徳, 小林哲則, 小川哲司, 森岡幹, 川崎真未

Patent
音源分離システム、方法及びプログラム

矢頭隆, 片桐一浩, 藤枝大, 小林哲則, 大町基, 小川哲司

Patent
音源分離装置、方法及びプログラム

Patent
音源分離装置、プログラム及び方法

Patent
音源分離装置、方法及びプログラム

Patent
エコーキャンセラ及びエコーキャンセル方法

小林哲則, 赤桐健三, 藤江真也, 小川哲司

Patent
認識器構築システム、認識器構築方法、組立サービス提供システム、およびプログラム

小林哲則, 中野鐵兵, 藤江真也, 小川哲司

Patent

Syllabus

Introduction to Computer Science and Communications Engineering [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Introduction to Computer Science and Communications Engineering

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Computer Science and Engineering Laboratory A (2)

School of Fundamental Science and Engineering

2026 fall semester
Computer Science and Engineering Laboratory B

School of Fundamental Science and Engineering

2026 spring semester
Computer Science and Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Logic Circuits

School of Fundamental Science and Engineering

2026 spring semester
Logic Circuits [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2026 spring semester
Pattern Recognition and Machine Learning

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A (Intensive Course)

School of Fundamental Science and Engineering

2026 intensive course (spring and fall)
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2026 fall semester
Project Research B

School of Fundamental Science and Engineering

2026 fall semester
Project Research A

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2026 spring semester
Computer Science and Engineering Laboratory B [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A (Intensive Course)

School of Fundamental Science and Engineering

2026 intensive course (spring and fall)
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2026 spring semester
Communications and Computer Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Communications and Computer Engineering Laboratory A

School of Fundamental Science and Engineering

2026 fall semester
Circuit Theory B [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Circuit Theory B

School of Fundamental Science and Engineering

2026 fall semester
Logic Circuits [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Logic Circuits

School of Fundamental Science and Engineering

2026 spring semester
Project Research B

School of Fundamental Science and Engineering

2026 fall semester
Pattern Recognition and Machine Learning

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2026 spring semester
Communications and Computer Engineering Laboratory B

School of Fundamental Science and Engineering

2026 spring semester
Project Research A

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Communications and Computer Engineering Laboratory B [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2026 fall semester
Graduation Thesis B (Spring) [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Graduation Thesis A (Fall) [S Grade] [For students enrolled before 2022]

School of Fundamental Science and Engineering

2026 fall semester
Graduation Thesis B (Fall)

School of Fundamental Science and Engineering

2026 fall semester
Graduation Thesis A (Spring) [For students enrolled before 2022]

School of Fundamental Science and Engineering

2026 spring semester
Computer Science and Communications Engineering Laboratory A [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Computer Science and Communications Engineering Laboratory A

School of Fundamental Science and Engineering

2026 fall semester
Project Research Fall

School of Fundamental Science and Engineering

2026 fall semester
Introduction to Computers and Networks

School of Fundamental Science and Engineering

2026 spring semester
Computer Science and Communications Engineering Laboratory B

School of Fundamental Science and Engineering

2026 spring semester
Graduation Thesis B (Spring)

School of Fundamental Science and Engineering

2026 spring semester
Graduation Thesis A (Fall) [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Graduation Thesis A (Fall)

School of Fundamental Science and Engineering

2026 fall semester
Graduation Thesis A (Spring) [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Graduation Thesis A (Spring)

School of Fundamental Science and Engineering

2026 spring semester
Graduation Thesis A (Fall) [For students enrolled before 2022]

School of Fundamental Science and Engineering

2026 fall semester
Graduation Thesis A (Spring) [S Grade] [For students enrolled before 2022]

School of Fundamental Science and Engineering

2026 spring semester
Graduation Thesis B (Fall) [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Project Research Spring

School of Fundamental Science and Engineering

2026 spring semester
Logic Circuits [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Logic Circuits

School of Fundamental Science and Engineering

2026 fall semester
Master's Thesis (Department of Computer Science and Communications Engineering)

Graduate School of Fundamental Science and Engineering

2026 full year
Master's Thesis (Department of Computer Science and Communications Engineering)

Graduate School of Fundamental Science and Engineering

2026 full year
Special Laboratory A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2026 spring semester
Special Laboratory B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2026 fall semester
Perceptual Computing

Graduate School of Fundamental Science and Engineering

2026 fall semester
Research on Media Intelligence

Graduate School of Fundamental Science and Engineering

2026 full year
Perceptual Computing

Graduate School of Fundamental Science and Engineering

2026 fall semester
Special Laboratory B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2026 fall semester
Special Laboratory A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2026 spring semester
Perceptual Computing

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Media Intelligence D

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Media Intelligence C

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Media Intelligence B

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Media Intelligence A

Graduate School of Fundamental Science and Engineering

2026 spring semester
Research on Media Intelligence

Graduate School of Fundamental Science and Engineering

2026 full year
Seminar on Media Intelligence D

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Media Intelligence C

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Media Intelligence B

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Media Intelligence A

Graduate School of Fundamental Science and Engineering

2026 spring semester
Pattern Recognition

Graduate School of Fundamental Science and Engineering

2026 spring semester
Pattern Recognition

Graduate School of Fundamental Science and Engineering

2026 spring semester
Special Seminar B in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2026 fall semester
Special Seminar A in Computer Science and Communications Engineering

Graduate School of Fundamental Science and Engineering

2026 spring semester
Research on Media Intelligence

Graduate School of Fundamental Science and Engineering

2026 full year


Teaching Experience

Computer Science and Communications Engineering Lab

Waseda University
2023.04

-

Now
Pattern Recognition

Waseda University
2023.04

-

Now
Pattern Recognition and Machine Learning

Waseda University
2023.04

-

Now
Circuit Theory B

Waseda University
2020.09

-

Now
Machine Learning

Waseda University / enPiT-Pro Smart SE
2019.04

-

Now
Introduction to Computers and Networks

Waseda University
2019.04

-

Now
Fundamentals of Information and Communications

Waseda University
2017.04

-

Now
Perceptual Computing

Waseda University
2016.09

-

Now
Logic Circuits

Waseda University
2016.09

-

Now
Logic Circuits

Waseda University
2016.04

-

Now
Optimization, Recognition, and Learning

Waseda University
2021.09

-

2023.03
Information and Communications Laboratory C / Sound Information Processing

Waseda University
2016.09

-

2023.03
Pattern Recognition and Machine Learning

Waseda University
2016.04

-

2023.03
Modeling for Engineering A

Waseda University
2016.04

-

2023.03
Algorithms and Data Structures A

Waseda University
2019.04

-

2019.09
Circuit Theory A

Waseda University
2016.09

-

2019.03
Machine Learning

Egypt-Japan University of Science and Technology
2012.09

-

2015.02
Perceptual Information Systems

Waseda University
2008.04

-

2011.09
Sound Information Processing

Waseda University Open Education Center
2008.09

-

2011.03
Interactive Systems

Waseda University Open Education Center
2008.04

-

2010.09
Sound Interface

Waseda University Open Education Center
2007.09

-

2008.03


Sub-affiliation

Faculty of Science and Engineering Graduate School of Fundamental Science and Engineering

Research Institute

2026

-

2030

Perceptual Computing Laboratory Director of Research Institute
2024

-

2026

Research Organization for Open Innovation Strategy Concurrent Researcher
2024

-

2026

Waseda Research Institute for Science and Engineering Concurrent Researcher
2024

-

2026

Waseda University Mission-oriented Research and Education Center Concurrent Researcher

Internal Special Research Projects

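A Systematic Study of Speech Recording and Noise Suppression Methods for Collecting High-Quality Spoken Dialogue Data

2025

This study examined how to acquire natural conversational speech with little noise, toward building high-performance spoken dialogue models capable of meaningful, natural responses. Deep learning of spoken dialogue models requires corpora of natural human-to-human dialogue, and when collecting them it is important to suppress the effects of reverberation, other speakers' voices, and background noise while capturing the target speaker's speech without distortion. However, even when face-to-face dialogue is recorded with each participant wearing a close-talk microphone, the wearer's own speech is captured clearly but contamination by other speakers' utterances and ambient noise is unavoidable.

To extract the target speaker's speech from such noisy dialogue recordings, this study compared signal-processing-based and deep-learning-based source separation methods. Specifically, independent vector analysis (IVA) was used as the signal-processing-based method, and monaural source separation with pretrained models such as MossFormer2 was applied as the deep-learning-based method. In addition, deep-learning-based background noise removal (FRCRN and MossFormerGAN) was applied after source separation, and configurations effective for obtaining cleaner audio were investigated.

The experiments confirmed that, for speech recorded with shoulder-worn microphones, the domain gap between the pretrained model and the input speech did not greatly affect separation performance, and that deep-learning-based monaural source separation was the most effective. IVA, on the other hand, has the advantage of rarely introducing large distortion into the separated speech, but background noise tended to remain. Applying background noise removal as post-processing was nevertheless found to yield some improvement in speech quality. These findings contribute to building a foundation for collecting high-quality dialogue corpora.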
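Research on Methods for Building State Monitoring Systems That Can Reflect User Intent Directly and Intuitively

2025

This study proposed an individually adaptive image recognition framework that uses a vision-language model (VLM), targeting emotional state estimation from facial images of children with severe motor and intellectual disabilities (hereafter, severely disabled children). The emotional states these children express, and how they express them, vary greatly between individuals and also change over time owing to development and medical factors. It is therefore not realistic to use a general-purpose emotional state estimation model built on large-scale data as is. Conventional individually adaptive pattern recognition methods, meanwhile, suffer from costly annotation by caregivers and experts, dependence on artificial intelligence (AI) engineers for modeling, and opacity of the decision process.

In response, this study developed a method that performs pattern recognition via language descriptions generated by the VLM, without retraining the model, while optimizing for each individual. For example, a pleasant state is detected by having the VLM check how well the input image agrees with a language description of the pleasant state. Specifically, we first proposed a "VLM description correction method," which generates, from a small number of facial images, a feature description combining the intensities of action units (AUs), which are patterns of facial muscle movement, with free-form text, and iteratively revises it based on inference errors. This achieves individual adaptation without expert knowledge or gradient-based retraining, and secures high explainability through natural-language descriptions. Furthermore, to reduce dependence on the initial description, we proposed a "VLM multiple-description integration method," which generates and integrates multiple feature descriptions and thereby improves recognition robustness.

In evaluation experiments, the proposed method proved effective for estimating pleasant and unpleasant states in severely disabled children and outperformed existing methods that require expert knowledge. The significance of this study lies in presenting a new pattern recognition framework that enables individual optimization using only a small number of images and natural language; it is expected to extend to diverse real-world image recognition tasks where explainability is valued.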
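Research on Methods for Building Emotion Estimation Models for Severely Disabled Children Using Vision-Language Models

2024

We studied methods for efficiently building emotional state estimation models for children with severe motor and intellectual disabilities (severely disabled children). Because the emotional states these children express and their means of expression differ greatly from child to child, model construction requires individual annotation by caregivers, and the size of that burden is a problem. To address this, we proposed "Parents-in-the-Loop Learning (PITL)" as a framework for building highly individualized models while limiting caregiver involvement. PITL consists of (1) a "teaching stage," in which an initial model is built from knowledge provided by the caregiver, and (2) a "correction stage," in which the caregiver verifies and corrects the initial model's estimates to improve model performance. With this framework, we aimed to build a child-specific emotional state estimation model efficiently and accurately with limited caregiver involvement. However, the teaching stage of PITL requires intervention by AI engineers, for example to select the sign detectors to use. Since these children's emotional states and ways of expressing them not only differ between children but may also change and multiply with growth, it is desirable that caregivers can build models flexibly according to their own intent without relying on AI engineers each time. This study therefore proposed using a vision-language model (VLM) to further reduce AI engineers' involvement, and examined the feasibility of VLM-based initial model construction. The aim was to let caregivers drive the model construction process themselves through intuitive prompt input. In an experiment with one severely disabled child who uses facial-expression signs, the model built with PITL performed on par with a model based on precise but labor-intensive manual annotation, confirming its effectiveness. For VLM-based initial model construction, the results suggested that giving multiple correct examples as prompts makes the VLM usable for building emotional state estimation models for these children.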
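Research on Building an Emotion Estimation System for Severely Disabled Children for Sustainable Nursing Support

2023

As an artificial intelligence (AI) technology for supporting communication with children with severe motor and intellectual disabilities who require medical care (hereafter, severely disabled children), we studied methods for estimating these children's emotional states and intentions from video. Because these children express emotion in strongly individual ways, and because the purpose of estimating emotional states and intentions is to support decision-making in medical and nursing care, we sought to realize emotional state estimation for sustainable nursing in a form in which an estimation model can be built robustly even from a small amount of training data (Requirement I) and the grounds for a detected emotional state can be explained (Requirement II). Specifically, we proposed an emotional state estimation framework that meets both requirements by using large-scale pretrained models that identify the cues (signs) of an emotional state or their components. The proposed method assumes a child whose emotional state can be read from facial expressions; it detects action units (AUs), the units of facial muscle movement, as components of signs, and uses intermediate information from the detection as features for emotional state estimation. Facial muscle movements are information that caregivers actually rely on in their decision-making, and can serve both as a feature representation of emotional state and as intuitive material for explaining the grounds of a prediction. In addition, because facial muscle movements are person-independent, large-scale data from non-disabled people can be used to build the pretrained model. This is expected to yield accurate features and make it possible to train the emotional state estimator on a small amount of data from the child. Through experiments on estimating pleasant and unpleasant states using video data of one severely disabled child, we compared the proposed method with a method using a general-purpose pretrained model and concluded that the proposed method is effective in terms of both estimation performance and explainability of prediction grounds. The findings are expected to contribute not only to developing communication-support AI for these children but also, more generally, to predicting highly person-dependent attributes and to modeling for that purpose.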
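Quality Assurance in Crowdsourcing: Dynamic Task Requesting for Efficient Answer Collection

2022

Crowdsourcing (requesting work over the Internet) has made it relatively easy to collect the large-scale data needed for machine learning, but degraded data quality caused by malicious workers and similar factors is a problem. Issuing multiple requests for the same task and taking a majority vote over the answers improves data quality, but the cost increase that comes with more requests cannot be ignored. In response, we sought to develop a data quality assurance technique that is both economical and reliable by adaptively determining the number of requests according to task difficulty. For annotation of livestock monitoring images, we showed that parameters such as the minimum and maximum number of requests and the minimum worker agreement rate can be learned without ground-truth labels.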
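The adaptive requesting idea in this summary can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function name `collect_label`, the stopping rule, and the parameter values are assumptions. The summary states only that the minimum and maximum number of requests and the minimum worker agreement rate are the parameters, and it learns them from data, whereas this sketch fixes them by hand.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def collect_label(
    ask_worker: Callable[[], Hashable],
    min_requests: int = 3,
    max_requests: int = 7,
    min_agreement: float = 0.8,
) -> Tuple[Hashable, int]:
    """Request answers one at a time and stop once workers agree enough.

    Returns the majority label and the number of answers requested.
    All parameter values here are hypothetical.
    """
    if not 1 <= min_requests <= max_requests:
        raise ValueError("need 1 <= min_requests <= max_requests")
    answers: List[Hashable] = []
    while len(answers) < max_requests:
        answers.append(ask_worker())
        if len(answers) < min_requests:
            continue  # always gather the minimum number of answers first
        label, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= min_agreement:
            return label, len(answers)  # easy task: stop early
    # Hard task: budget exhausted, fall back to a plain majority vote.
    label, _ = Counter(answers).most_common(1)[0]
    return label, len(answers)


# Simulated workers: an easy image (unanimous) and a hard one (split votes).
easy = iter(["cow", "cow", "cow", "cow", "cow", "cow", "cow"])
hard = iter(["cow", "calf", "cow", "calf", "cow", "cow", "cow"])
easy_result = collect_label(lambda: next(easy))
hard_result = collect_label(lambda: next(hard))
print(easy_result, hard_result)
```

In this toy run the unanimous task stops after the minimum three requests, while the contested task uses the full budget of seven, which is the cost saving the summary describes.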
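Research on Efficient Video Annotation Using Crowdsourcing and Object Tracking

2021

To annotate multiple moving objects in video efficiently, we proposed an interactive annotation method that uses pseudo bounding-box labels obtained through iterative self-training of an object detector. In the proposed method, missed detections are kept low during bounding-box label generation, while iterative self-training builds an object detector robust to changes in the targets' appearance. In addition, correcting low-quality tracking results through interactive tracking improved annotation accuracy and reduced the annotation cost of existing tools in which boxes are drawn around target objects. Validation on a standard benchmark and on livestock video monitoring data confirmed the high practicality of the proposed method.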
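Research on Methods for Building and Operating Explainable State Monitoring Systems for Decision Support

2021

We sought to establish a framework for growing a state monitoring system efficiently and sustainably by using crowdsourcing to verify the data accumulated during its operation. We did so through the development of a system that detects signs of calving in livestock from video, an important task in supporting the decisions of livestock farmers. Specifically, we clarified (1) a multi-task learning method for video monitoring models that is robust to label noise, including missed positive examples; (2) an ensemble learning method that accounts for complementarity, for estimating the uncertainty of deep neural network predictions, together with a data selection method based on disagreement among the predictions of multiple models; and (3) an implementation method that allows a monitoring system for streaming video to run in real time.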
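Item (2) above, selecting data by disagreement among multiple models, can be illustrated with a short sketch. The summary does not say how disagreement is measured, so the variance of the models' predicted probabilities used here is an assumption, and `select_by_disagreement` and the example numbers are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def select_by_disagreement(
    predictions: Sequence[Sequence[float]], budget: int
) -> List[int]:
    """Pick the samples on which the ensemble members disagree most.

    predictions[m][i] is model m's positive-class probability for sample i.
    Disagreement is measured as the variance across models (an assumption).
    """
    n_samples = len(predictions[0])
    scores = [
        pvariance([model[i] for model in predictions]) for i in range(n_samples)
    ]
    ranked = sorted(range(n_samples), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Three hypothetical models scoring four video clips.
ensemble = [
    [0.95, 0.10, 0.80, 0.50],
    [0.90, 0.15, 0.20, 0.52],
    [0.92, 0.05, 0.60, 0.48],
]
selected = select_by_disagreement(ensemble, budget=2)
print(selected)  # indices of the clips to send for crowd verification
```

Clips where the models agree are left alone, so the verification effort goes to the predictions the ensemble is least certain about.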
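Research on Methods for Building and Operating Sustainable State Monitoring Systems Using Crowdsourcing

2020

A video monitoring system intended to support human decision-making must (1) be buildable from a small amount of data, (2) be operable sustainably, and (3) be able to explain the grounds for its predictions. In this study, we sought to establish a framework for building and operating systems that meet these requirements by incorporating knowledge of the user's (expert's) decision-making process into a neural network. Based on the proposed framework, we built a video monitoring system that detects signs of calving in breeding cattle. Compared with a system built with an end-to-end approach, it was shown to be more effective both in detection performance that is robust to small data and environmental variation and in the interpretability of prediction grounds for livestock farmers.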
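Research on Detecting Tidal Fronts from Drone Aerial Imagery

2020

We sought to develop a technique for automatically detecting tidal fronts in sea-surface video captured by drones. If drones can detect tidal fronts, information on good fishing grounds can be provided to fishery operators at relatively low cost, which is expected to contribute to more efficient operations. To build a tidal front detection model, we constructed a dataset of drone aerial images of tidal fronts (158,739 images in total) and conducted classification experiments on the presence or absence of a tidal front. Using a convolutional neural network equipped with a pyramid pooling module as the detection model, tidal fronts were detected with a precision of 0.90, a recall of 0.81, and an F-measure of 0.85.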
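The three reported figures are mutually consistent if the F-measure is the balanced harmonic mean of precision and recall (F1), which is an assumption since the summary does not name the variant. A short check:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the summary above.
f_value = f_measure(0.90, 0.81)
print(round(f_value, 2))  # 0.85
```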
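Research on Detecting Signs of Estrus in Breeding Cattle Using Video Information

2019

Using crowdsourcing, in which work is requested from an unspecified large number of people over the Internet, we developed technology for detecting signs of estrus in breeding cattle from video. In particular, this study focused on mounting behavior as a sign of estrus and built an evaluation platform for it. First, using an object detection algorithm and crowdsourcing, we developed a method for reliably annotating the presence or absence of mounting behavior while keeping missed cattle detections low. We applied the proposed annotation to video data covering 29 instances of mounting behavior recorded in a free-stall barn housing 14 beef cattle, and built a dataset of 5,020 images in total. Cross-validation experiments on this dataset showed that mounting behavior can be detected at the image level with a positive detection rate (陽性判定率) of 0.80 and a sensitivity of 0.76.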
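Research on Sustainable Operation of Video Monitoring Systems

2019

We sought to establish a framework for putting a video monitoring system into operation early, without waiting for big data to accumulate, while growing the system by making efficient use of the data accumulated day by day. In particular, this study developed and validated a framework that maintains high detection performance even in the early operation stage by using crowdsourcing to correct the results of pattern-recognition-based video monitoring. We evaluated the proposed early-operation method through the development of a video-based calving detection system for breeding cattle. The evaluation showed that combining pattern recognition (calving detection) with crowdsourcing suppresses false detections while keeping missed calvings low, making early operation of the video monitoring system possible.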
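Speech Enhancement Robust to Diverse Noise Using Area Sound Pickup and Generative Adversarial Networks

2018 俵直弘

We proposed a post-filter method in which an adversarial denoising autoencoder (ADAE) corrects the nonlinear distortion produced by area sound pickup. Area sound pickup is a technique that can separate target and interfering sounds with high accuracy based on time-frequency masking, but it produces the unpleasant distortion characteristic of nonlinear signal processing. We therefore tried reducing the nonlinear distortion with an ADAE, which is effective in single-channel source enhancement, and it proved effective in improving sound quality. In addition, introducing a noise-aware training framework, in which the observed signal before separation and noise information are used as auxiliary inputs to the ADAE, further improved the quality of the enhanced signal.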
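A Method for Constructing Fast, Accurate, Low-Distortion Noise Removal Filters Using Area Sound Pickup and Deep Learning

2017

We studied a method for achieving accurate, low-distortion noise suppression for speech with superimposed diffuse noise. To this end, after separating the target sound from diffuse noise with area sound pickup, a source separation technique the applicant has continued to research, we proposed a method for estimating a filter that suppresses the noise components remaining in the target sound. Specifically, a deep neural network estimated the coefficients of a linear filter (strictly speaking, the a priori SNR) from the power spectra of the target sound and noise separated by area sound pickup. When noise suppression performance under diffuse noise was evaluated by noise reduction rate and log-spectral distance, the proposed method improved on the conventional multichannel Wiener filter on both measures.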
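The relation between an a priori SNR estimate and a linear filter coefficient can be sketched with the standard Wiener gain, G = ξ / (1 + ξ). Using this particular mapping is an assumption, since the summary says only that the network estimates the a priori SNR. The values in `xi_hat` are made up for illustration, and the deep neural network that would produce them is out of scope.

```python
from typing import List, Sequence


def wiener_gain(prior_snr: Sequence[float]) -> List[float]:
    """Map a priori SNR values (linear scale, one per frequency bin) to Wiener gains."""
    if any(xi < 0 for xi in prior_snr):
        raise ValueError("a priori SNR must be non-negative")
    return [xi / (1.0 + xi) for xi in prior_snr]


def apply_gain(power_spectrum: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Scale a power spectrum by the squared amplitude gain in each bin."""
    return [g * g * p for g, p in zip(gains, power_spectrum)]


# Hypothetical a priori SNR estimates for three bins
# (in the study these would come from the deep neural network).
xi_hat = [0.0, 1.0, 9.0]
gains = wiener_gain(xi_hat)
enhanced = apply_gain([4.0, 4.0, 4.0], gains)
print(gains)  # [0.0, 0.5, 0.9]
```

A bin judged to be all noise (ξ = 0) is removed entirely, while a bin dominated by the target (ξ = 9) passes almost unchanged, which is how a linear filter of this kind can suppress residual noise with little distortion.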
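Research on Methods for Constructing Pattern Recognition Systems with Metacognitive Functions

2016

By imitating human metacognition (the ability to know whether one knows something, and how well one knows it), we aim to establish a pattern recognition method that delivers robust, high performance on unknown inputs without relying solely on data collection. In this project, through evaluation on noisy speech recognition, we focused on the recognition performance prediction technique and the multi-stream pattern recognition algorithm that form the basis of "pattern recognition with metacognitive functions."

We built many DNN-based pattern recognition systems, each handling a different phenomenon, and selected the best one based on the temporal variation of the DNN outputs (posterior probabilities) and the reconstruction error of an autoencoder, thereby achieving recognition robust to environmental changes.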
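The selection step can be sketched as below, using only one of the two cues named in the summary, the autoencoder reconstruction error; the temporal variation of the posteriors is omitted. The pairing of one autoencoder with each recognizer, the names, and the toy stand-in "autoencoders" (plain functions that return a fixed vector) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]
Autoencoder = Callable[[Frame], List[float]]


def reconstruction_error(autoencoder: Autoencoder, frames: Sequence[Frame]) -> float:
    """Mean squared error between input frames and their reconstructions."""
    total, count = 0.0, 0
    for frame in frames:
        rebuilt = autoencoder(frame)
        total += sum((a - b) ** 2 for a, b in zip(frame, rebuilt))
        count += len(frame)
    return total / count


def select_system(autoencoders: Dict[str, Autoencoder], frames: Sequence[Frame]) -> str:
    """Choose the recognizer whose paired autoencoder reconstructs the input best.

    A low reconstruction error suggests the input resembles that system's
    training condition, i.e. the system "knows" this kind of input.
    """
    return min(
        autoencoders, key=lambda name: reconstruction_error(autoencoders[name], frames)
    )


# Toy stand-ins: each one reconstructs toward the typical feature values
# of the condition it was (hypothetically) trained on.
autoencoders = {
    "clean_model": lambda frame: [0.0] * len(frame),
    "noisy_model": lambda frame: [1.0] * len(frame),
}
observed = [[0.9, 1.1], [1.0, 0.8]]
chosen = select_system(autoencoders, observed)
print(chosen)  # noisy_model
```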
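Spontaneous Speech Recognition with Partly-Hidden Markov Models

2004

In this research, we have proposed the partly-hidden Markov model (PHMM) as a stochastic model with greater expressive power to replace the hidden Markov model (HMM) commonly used in speech recognition. In the PHMM, both the state and the output depend on past outputs, but until now the same structure had been used for all model categories. This fiscal year, we therefore tried to select the dependency structure between states and outputs in the PHMM optimally for each model category, based on a maximum weighted likelihood ratio criterion.

In the likelihood-ratio-maximization framework for model structure selection, the directly computed difference between the log-likelihoods given by the correct and incorrect categories is introduced as the objective function, and the model structure that maximizes it is selected. Improving the likelihood ratio matters more for data whose ratio is close to 0 than for data whose ratio is already large, where the recognition result is unlikely to change even if the ratio improves. We therefore weighted the likelihood ratio so that a small value is used as is and a large value is truncated at a threshold. We call this the weighted likelihood ratio, and selected model structures to maximize it. Moreover, rather than maximizing the weighted likelihood ratio over the data belonging to each category separately, this method considers the combinations of model structures possible across all categories and maximizes the weighted likelihood ratio over the enormous number of resulting combinations. The combination of structures giving the largest weighted likelihood ratio is taken as the optimal structure. Because exhaustive search over so many patterns is not realistic, we applied a genetic algorithm to obtain an approximate solution to the exhaustive search.

We evaluated the proposed model structure selection method in continuous speech recognition experiments on academic lecture speech, and it was shown to reduce the errors of the PHMM without model structure selection.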
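The weighting described above, using small likelihood ratios as they are and truncating large ones at a threshold, can be sketched as follows. The function names, the threshold value, and the log-likelihood numbers are hypothetical. The sketch also compares just two candidate structures directly instead of searching combinations with a genetic algorithm as the study did.

```python
from typing import Dict, List, Tuple

# (log-likelihood under the correct category, under the competing incorrect category)
Sample = Tuple[float, float]


def weighted_likelihood_ratio(
    loglik_correct: float, loglik_incorrect: float, threshold: float
) -> float:
    """Log-likelihood difference, truncated at `threshold` when it is large."""
    return min(loglik_correct - loglik_incorrect, threshold)


def objective(samples: List[Sample], threshold: float) -> float:
    """Sum of weighted likelihood ratios over the data."""
    return sum(weighted_likelihood_ratio(c, i, threshold) for c, i in samples)


def select_structure(candidates: Dict[str, List[Sample]], threshold: float) -> str:
    """Pick the candidate structure with the largest weighted likelihood ratio."""
    return max(candidates, key=lambda name: objective(candidates[name], threshold))


candidates = {
    # Ratios 2, 30, -1: one very confident sample, one misrecognized sample.
    "structure_a": [(-10.0, -12.0), (-10.0, -40.0), (-15.0, -14.0)],
    # Ratios 3, 6, 1: every sample is recognized correctly.
    "structure_b": [(-10.0, -13.0), (-10.0, -16.0), (-15.0, -16.0)],
}
best = select_structure(candidates, threshold=5.0)
print(best)  # structure_b
```

Without truncation, `structure_a` would score 31 against 10 purely because of one already-confident sample. With the threshold, the scores become 6 and 9, so the structure that helps the borderline samples is preferred, which matches the motivation given in the summary.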
