Updated on 2024/12/21


 
OGAWA, Tetsuji
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor
Degree
Ph.D. (Waseda University)
Profile

Tetsuji Ogawa received his B.S., M.S., and Ph.D. degrees in electrical engineering from Waseda University, Tokyo, Japan, in 2000, 2002, and 2005, respectively. At Waseda University, he was a Research Associate from 2004 to 2007, a Visiting Lecturer in 2007, an Assistant Professor from 2007 to 2012, and an Associate Professor from 2012 to 2019, and he has been a Professor since 2019. He was an Adjunct Professor at Egypt-Japan University of Science and Technology (E-JUST) from 2012 to 2015. He was a Visiting Scholar at the Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, from June to September 2012 and from June to August 2013, and a Visiting Scholar in the Speech Processing Group, Faculty of Information Technology, Brno University of Technology, Czech Republic, from June to July 2014 and from May to August 2015. His research interests include stochastic modeling for pattern recognition, speech enhancement, and speech and speaker recognition. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Institute of Electronics, Information and Communication Engineers of Japan (IEICE), the Information Processing Society of Japan (IPSJ), The Japanese Society of Artificial Intelligence (JSAI), The Acoustical Society of Japan (ASJ), The Japan Society of Mechanical Engineers (JSME), the Japan Wind Energy Association (JWEA), the Japanese Society of Animal Science (JSAS), and The Japanese Society of Fisheries Science (JSFS).

Research Experience

  • 2020.04
    -
    Now

    NHK Science & Technology Research Laboratories   Visiting Researcher

  • 2019.04
    -
    Now

    Waseda University

  • 2016.06
    -
    Now

    The National Institute of Advanced Industrial Science and Technology (AIST)   Artificial Intelligence Research Center   Guest Researcher

  • 2012.04
    -
    2019.03

    Waseda University

  • 2015.05
    -
    2015.08

    Brno University of Technology   Visiting Scholar

  • 2012.04
    -
    2015.03

    Egypt-Japan University of Science and Technology   Adjunct Associate Professor

  • 2014.06
    -
    2014.07

    Brno University of Technology   Visiting Scholar

  • 2013.06
    -
    2013.08

    Johns Hopkins University   Visiting Researcher

  • 2012.06
    -
    2012.09

    Johns Hopkins University   Visiting Researcher

  • 2007.11
    -
    2012.03

    Assistant Professor, Waseda Institute for Advanced Study

  • 2007.04
    -
    2007.10

    Visiting Lecturer, Waseda University

  • 2004.04
    -
    2007.03

    Research Associate, Waseda University


Education Background

  • 2002.04
    -
    2005.03

    Waseda University  

  • 2000.04
    -
    2002.03

    Waseda University  

  • 1996.04
    -
    2000.03

    Waseda University  

Committee Memberships

  • 2023.04
    -
    Now

    The Acoustical Society of Japan (ASJ)  Councilor

  • 2021.06
    -
    Now

    The Acoustical Society of Japan (ASJ)  Journal Editorial Committee Member

  • 2019
    -
    Now

    Kochi Prefecture Marine Innovation Steering Council  Member

  • 2017.09
    -
    Now

    Institute of Electronics, Information and Communication Engineers (IEICE)  Standing Reviewer

  • 2014.04
    -
    Now

    The Acoustical Society of Japan (ASJ)  Reviewer

  • 2020.11
    -
    2021.06

    Ongaku Symposium 2021 (音学シンポジウム2021)  Executive Committee Member

  • 2019.05
    -
    2021.04

    IEICE Technical Committee on Speech  Secretary

  • 2019.11
    -
    2020.06

    Ongaku Symposium 2020 (音学シンポジウム2020)  Executive Committee Member

  • 2020
     
     

    Speaker Odyssey 2020  Local Organizing Committee

  • 2017.05
    -
    2019.04

    IEICE Technical Committee on Speech  Committee Member

  • 2017
     
     

    The 7th Symposium on Biometrics, Recognition and Authentication  Program Committee Member

  • 2010
    -
    2011

    IEICE  Information and Systems Society Journal, Editorial Committee Member

  • 2008
    -
    2011

    IPSJ Special Interest Group on Spoken Language Processing  Steering Committee Member

  • 2010
     
     

    Advanced Language Information Forum (ALAGIN)  Young Researchers Forum Executive Committee Member

  • 2009
    -
    2010

    The 9th Forum on Information Technology (FIT)  Program Committee Member


Professional Memberships

  • 2021.10
    -
    Now

    Japanese Society of Fisheries Oceanography

  • 2019.05
    -
    Now

    The Japanese Society of Artificial Intelligence (JSAI)

  • 2018.07
    -
    Now

    The Japanese Society of Fisheries Science (JSFS)

  • 2018.06
    -
    Now

    Japan Wind Energy Association (JWEA)

  • 2018.01
    -
    Now

    Japanese Society of Animal Science (JSAS)

  • 2017.08
    -
    Now

    The Japan Society of Mechanical Engineers (JSME)

  • 2008.03
    -
    Now

    Information Processing Society of Japan (IPSJ)

  • 2000.01
    -
    Now

    The Acoustical Society of Japan (ASJ)

  •  
     
     

    Institute of Electronics, Information and Communication Engineers of Japan (IEICE)

  •  
     
     

    International Speech Communication Association (ISCA)

  •  
     
     

    The Institute of Electrical and Electronics Engineers, Inc. (IEEE)


Research Areas

  • Perceptual information processing / Human interface and interaction / Intelligent informatics / Medical systems (medical information systems) / Medical assistive technology (nursing science and engineering) / Aquatic bioproduction science / Animal production science

Research Interests

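  • Spoken language processing

  • Acoustic signal processing

  • Image processing

  • Video processing

  • Pattern recognition

  • Machine learning

  • Data-driven science

  • Anomaly detection

  • Smart maintenance

  • Precision livestock farming

  • Precision fisheries

  • Nursing informatics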


Awards

  • Excellent Presentation Award, The 251st IPSJ Special Interest Group on Natural Language Processing Meeting

    2021.12  

    Winner: Hiroaki Sato, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa

  • Waseda University Teaching Award, President's Award

    2018.02   Waseda University  

  • APSIPA ASC2017 Poster Book Prizes

    2017.12   APSIPA ASC2017  

  • IPSJ Yamashita SIG Research Award

    2012.03   Information Processing Society of Japan (IPSJ)  

  • ASJ Awaya Prize Young Researcher Award

    2011.03   The Acoustical Society of Japan (ASJ)  

  • BTAS2008 Best Paper Award

    2008.10   BTAS2008  


 

Papers

  • What to refer and how? - Exploring handling of auxiliary information in target speaker extraction

    Tomohiro Hayashi, Riku Ogino, Kohei Saijo, Tetsuji Ogawa

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2024 (APSIPA2024)    2024.12  [Refereed]

    Authorship:Last author, Corresponding author

  • Differences between singer and speaker verification: Training singer feature representation extractor utilizing singing voice characteristics

    Sayaka Toma, Tomoki Ariga, Yosuke Higuchi, Ichiju Hayasaka, Rie Shigyo, Tetsuji Ogawa

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2024 (APSIPA2024)    2024.12  [Refereed]

    Authorship:Last author, Corresponding author

  • A foundational model for precise and robust wind turbine condition monitoring via vibration signals

    Takuya Wakayama, Taiki Inoue, Jun Ogata, Makoto Iida, Tetsuji Ogawa

    Proc. 23rd International Conference on Machine Learning and Applications (ICMLA2024)    2024.12  [Refereed]

    Authorship:Last author, Corresponding author

  • Leveraging data from vast unexplored seas: positive unlabeled learning for refining prediction area in good fishing ground prediction

    Haruki Konii, Teppei Nakano, Yasumasa Miyazawa, Tetsuji Ogawa

    Proc. 27th International Conference on Pattern Recognition (ICPR2024)    2024.12  [Refereed]

    Authorship:Last author, Corresponding author

  • Exploring impact of prioritizing intra-singer acoustic variations on singer embedding extractor construction for singer verification

    Sayaka Toma, Tomoki Ariga, Yosuke Higuchi, Ichiju Hayasaka, Rie Shigyo, Tetsuji Ogawa

    Proc. The 27th Conference of the Oriental COCOSDA (O-COCOSDA2024)    2024.10  [Refereed]

    Authorship:Last author, Corresponding author

  • Construction of individual tracking dataset for developing foundational models in calving sign monitoring for beef cattle

    Michihiro Nakata, Sawa Ohyoshi, Teppei Nakano, Tetsuji Ogawa

    Proc. The 11th European Conference on Precision Livestock Farming (ECPLF2024)     1625 - 1632  2024.09  [Refereed]

    Authorship:Last author, Corresponding author

  • Hierarchical Multi-Task Learning with CTC and Recursive Operation

    Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 25th Annual Conference of the International Speech Communication Association (INTERSPEECH2024)     2855 - 2859  2024.09  [Refereed]

    DOI

  • Exploring robust and explainable design for facial expression-based emotional state estimation in children with profound intellectual multiple disabilities

    Kota Mochida, Teppei Nakano, Shinya Fujie, Mari Wakabayashi, Tomomi Sato, Tetsuji Ogawa

    Proc. the 32nd European Signal Processing Conference (EUSIPCO2024)     481 - 485  2024.08  [Refereed]

    Authorship:Last author, Corresponding author

  • Normal with occasional anomalies: Feature extraction for detecting non-stationary abnormal events in wind turbines

    Takuya Wakayama, Taiki Inoue, Jun Ogata, Makoto Iida, Tetsuji Ogawa

    Proc. the 32nd European Signal Processing Conference (EUSIPCO2024)     2012 - 2016  2024.08  [Refereed]

    Authorship:Last author, Corresponding author

  • Parody detection using source-target attention with teacher-forced lyrics

    Tomoki Ariga, Yosuke Higuchi, Kazutoshi Hayasaka, Naoki Okamoto, Tetsuji Ogawa

    2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2024)    2024.04  [Refereed]

    Authorship:Last author, Corresponding author

  • Self-Conditioning via Intermediate Predictions for End-to-End Neural Speaker Diarization

    Yusuke Fujita, Tetsuji Ogawa, Tetsunori Kobayashi

    IEEE Access   11   140069 - 140076  2023.12  [Refereed]

    DOI

  • A single speech enhancement model unifying dereverberation, denoising, speaker counting, separation, and extraction

    Kohei Saijo, Wangyou Zhang, Zhong-Qiu Wang, Shinji Watanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU2023)    2023.12  [Refereed]

    Authorship:Last author

  • Learning discriminative feature representation via metric learning for early operation of wind turbine anomaly detection systems

    Taiki Inoue, Jun Ogata, Makoto Iida, Tetsuji Ogawa

    Proc. 22nd International Conference on Machine Learning and Applications (ICMLA2023)    2023.12  [Refereed]

    Authorship:Last author, Corresponding author

  • Masry: A text-to-speech system for the Egyptian Arabic

    Ahmed Hammad Azab, Ahmed Bayoumi Zaki, Tetsuji Ogawa, Walid Gomaa

    Proc. 20th International Conference on Informatics in Control, Automation, and Robotics (ICINCO2023)    2023.11  [Refereed]

  • Lightweight Multiscale Attention-Aware Method for Semantic Segmentation of Urban Structural Buildings in Drone Aerial Imagery

    Jacob Herman, Rami Zewail, Tetsuji Ogawa, Samir El Sagheer

    2023 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC)    2023.09  [Refereed]

    DOI

  • Mask-CTC-based encoder pre-training for streaming end-to-end speech recognition

    Huaibo Zhao, Yosuke Higuchi, Yusuke Kida, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. the 31st European Signal Processing Conference (EUSIPCO2023)     56 - 60  2023.09  [Refereed]

  • Voice or Content? --- Exploring impact of speech content on age estimation from voice

    Yuta Ide, Naohiro Tawara, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

    Proc. the 31st European Signal Processing Conference (EUSIPCO2023)     221 - 225  2023.09  [Refereed]

    Authorship:Last author, Corresponding author

  • Spotting parodies: Detecting alignment collapse between lyrics and singing voice

    Tomoki Ariga, Yosuke Higuchi, Mitsunori Kanno, Rie Shigyo, Takato Mizuguchi, Naoki Okamoto, Tetsuji Ogawa

    Proc. the 31st European Signal Processing Conference (EUSIPCO2023)     286 - 290  2023.09  [Refereed]

    Authorship:Last author, Corresponding author

  • Remixing-based unsupervised source separation from scratch

    Kohei Saijo, Tetsuji Ogawa

    Proc. The 24th Annual Conference of the International Speech Communication Association (INTERSPEECH2023)     1678 - 1682  2023.08  [Refereed]

    Authorship:Last author, Corresponding author

  • Thermal Gait Dataset for Deep Learning-Oriented Gait Recognition

    Fatma Youssef, Ahmed El-Mahdy, Tetsuji Ogawa, Walid Gomaa

    2023 International Joint Conference on Neural Networks (IJCNN)    2023.06  [Refereed]

    DOI

  • Narrow Down Forecast Range: Using Knowledge of Past Operations and Attribute-Dependent Thresholding in Good Fishing Ground Prediction

    Haruki Konii, Teppei Nakano, Yasumasa Miyazawa, Tetsuji Ogawa

    OCEANS 2023 - Limerick    2023.06  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • Neural Diarization with Non-Autoregressive Intermediate Attractors

    Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06  [Refereed]

    Authorship:Last author

    DOI

  • Self-Remixing: Unsupervised Speech Separation via Separation and Remixing

    Kohei Saijo, Tetsuji Ogawa

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • Conversation-Oriented ASR with Multi-Look-Ahead CBS Architecture

    Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06  [Refereed]

    DOI

  • BECTRA: Transducer-Based End-To-End ASR with BERT-Enhanced Encoder

    Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06  [Refereed]

    DOI

  • InterMPL: Momentum Pseudo-Labeling With Intermediate CTC Loss

    Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)    2023.06  [Refereed]

    DOI

  • A Lightweight Transfer Learning-Based Model for Building Classification in Aerial Imagery

    Jacob Herman, Rami Zewail, Tetsuji Ogawa, Samir ElSagheer

    2023 15th International Conference on Computer Research and Development (ICCRD)     181 - 186  2023.01  [Refereed]

    DOI

  • PostMe: Unsupervised Dynamic Microtask Posting For Efficient and Reliable Crowdsourcing

    Ryo Yanagisawa, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    2022 IEEE International Conference on Big Data (Big Data)     4039 - 4044  2022.12  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model

    Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

    Proc. The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP2022)    2022.12  [Refereed]

  • Refinement of Utterance Fluency Feature Extraction and Automated Scoring of L2 Oral Fluency with Dialogic Features

    Ryuki Matsuura, Shungo Suzuki, Mao Saeki, Tetsuji Ogawa, Yoichi Matsuyama

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     1309 - 1317  2022.11  [Refereed]

    DOI

  • Do You Know How Humans Sound? Exploring a Qualification Test Design for Crowdsourced Evaluation of Voice Synthesis Quality

    Moe Yaegashi, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     980 - 985  2022.11  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • Design of Discriminators in GAN-Based Unsupervised Learning of Neural Post-Processors for Suppressing Localized Spectral Distortion

    Riku Ogino, Kohei Saijo, Tetsuji Ogawa

    2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     969 - 975  2022.11  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • Text-only domain adaptation based on intermediate CTC

    Hiroaki Sato, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa

    Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022)    2022.09  [Refereed]

    Authorship:Last author

  • Confusion detection for adaptive conversational strategies of an oral proficiency assessment interview agent

    Mao Saeki, Kotoka Miyagi, Shinya Fujie, Shungo Suzuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoichi Matsuyama

    Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022)    2022.09  [Refereed]

  • Can humans correct errors from system? Investigating error tendencies in speaker identification using crowdsourcing

    Yuta Ide, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

    Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022)    2022.09  [Refereed]

    Authorship:Last author, Corresponding author

  • Unsupervised training of sequential neural beamformer using coarsely-separated and non-separated signals

    Kohei Saijo, Tetsuji Ogawa

    Proc. The 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH2022)    2022.09  [Refereed]

    Authorship:Last author, Corresponding author

  • Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units

    Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022)     7797 - 7801  2022.05  [Refereed]

    DOI

  • Remix-Cycle-Consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation

    Kohei Saijo, Tetsuji Ogawa

    Proc. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2022)     4373 - 4377  2022.05  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • Sequential fish catch counter using vision-based fish detection and tracking

    Riko Tanaka, Teppei Nakano, Tetsuji Ogawa

    Proc. MTS/IEEE OCEANS 2022 Chennai Conference and Exhibit (OCEANS2022)    2022.02  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 3

  • Inlier modeling-based good fishing ground detection for efficient bullet tuna trolling using meteorological and oceanographic Information

    Yuka Horiuchi, Teppei Nakano, Yasumasa Miyazawa, Tetsuji Ogawa

    Proc. MTS/IEEE OCEANS 2022 Chennai Conference and Exhibit (OCEANS2022)    2022.02  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 1

  • Multi-Source Domain Generalization Using Domain Attributes for Recurrent Neural Network Language Models

    Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Hiroto Ashikawa, Tetsunori Kobayashi, Tetsuji Ogawa

    IEICE Transactions on Information and Systems   E105.D ( 1 ) 150 - 160  2022.01  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • An investigation of enhancing CTC model for triggered attention-based streaming ASR

    Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA2021)    2021.12  [Refereed]

    Authorship:Corresponding author

  • Comparative study on DNN-based minimum variance beamforming robust to small movements of sound sources

    Kohei Saijo, Kazuhiro Katagiri, Masaru Fujieda, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2021 (APSIPA2021)    2021.12  [Refereed]

    Authorship:Last author, Corresponding author

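  • Feature representation learning of normal states for efficient operation of wind turbine anomaly detection (in Japanese)

    Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Makoto Iida, Tetsuji Ogawa

    Journal of Japan Wind Energy Association   45 ( 3 ) 60 - 68  2021.11  [Refereed]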

    Authorship:Last author, Corresponding author

  • SIA-GAN: Scrambling Inversion Attack Using Generative Adversarial Network

    Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

    IEEE Access   9   129385 - 129393  2021.09  [Refereed]

    Authorship:Last author

    DOI

  • VocalTurk: Exploring Feasibility of Crowdsourced Speaker Identification

    Susumu Saito, Yuta Ide, Teppei Nakano, Tetsuji Ogawa

    Proc. The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH2021)     1723 - 1727  2021.08  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

  • Efficient and Stable Adversarial Learning Using Unpaired Data for Unsupervised Multichannel Speech Separation

    Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH2021)     3051 - 3055  2021.08  [Refereed]

    DOI

  • Improved Mask-CTC for Non-Autoregressive End-to-End ASR

    Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)     8363 - 8367  2021.06  [Refereed]

    DOI

  • Scrambling Parameter Generation to Improve Perceptual Information Hiding

    Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

    Electronic Imaging   2021 ( 11 ) 155 - 1  2021.01  [Refereed]

    Authorship:Last author

    Summary: This study proposes a method for improving perceptual information hiding (PIH) in image scrambling approaches. Image scrambling has been used to address privacy issues in cloud-based machine learning. The performance of an image scrambling approach depends on its scrambling parameters, because they determine how well perceptual information is hidden. However, existing image scrambling approaches have not quantitatively evaluated how the scrambling parameters affect this performance, which may lead to private information being exposed in public. To overcome this issue, a suitable metric for evaluating PIH is investigated, and scrambling parameter generation is then proposed for combining image scrambling approaches. Experimental comparisons using several image quality assessment metrics show that Learned Perceptual Image Patch Similarity (LPIPS) is suitable for PIH. The proposed scrambling parameter generation is also experimentally confirmed to be effective at hiding perceptual information while maintaining classification performance.

    DOI

  • Investigation on network architecture for single-channel end-to-end denoising

    Takuya Hasumi, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. The 2020 European Signal Processing Conference (EUSIPCO2020)    2021.01  [Refereed]

    Authorship:Last author, Corresponding author

  • Noise-robust attention learning for end-to-end speech recognition

    Yosuke Higuchi, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. The 2020 European Signal Processing Conference (EUSIPCO2020)    2021.01  [Refereed]

    Authorship:Last author, Corresponding author

  • Toward building a data-driven system for detecting mounting actions of black beef cattle

    Yuriko Kawano, Susumu Saito, Teppei Nakano, Ikumi Kondo, Ryota Yamazaki, Hiromi Kusaka, Minoru Sakaguchi, Tetsuji Ogawa

    Proc. 25th International Conference on Pattern Recognition (ICPR2020)    2021.01  [Refereed]

    Authorship:Last author, Corresponding author

  • Crowdsourced verification for operating calving surveillance systems at an early stage

    Yusuke Okimoto, Soshi Kawata, Susumu Saito, Teppei Nakano, Tetsuji Ogawa

    Proc. 25th International Conference on Pattern Recognition (ICPR2020)    2021.01  [Refereed]

    Authorship:Last author, Corresponding author

  • Feature Representation Learning for Calving Detection of Cows Using Video Frames

    Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

    Proc. 25th International Conference on Pattern Recognition (ICPR2020)    2021.01  [Refereed]

    Authorship:Last author, Corresponding author

  • Analysis of multimodal features for speaking proficiency scoring in an interview dialogue

    Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 8th IEEE Spoken Language Technology Workshop (SLT2021)    2021.01  [Refereed]

  • Efficient human-in-the-loop object detection using bi-directional deep SORT and annotation-free segment identification

    Koki Madono, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2020 (APSIPA2020)    2020.12  [Refereed]

    Authorship:Last author, Corresponding author

  • Exploiting narrative context and a priori knowledge of categories in textual emotion classification

    Hikari Tanabe, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    The 28th International Conference on Computational Linguistics (COLING2020)     5535 - 5540  2020.12  [Refereed]

  • Crowd-sourced development of image dataset for detecting mounting actions of black beef cattle

    Yuriko Kawano, Susumu Saito, Teppei Nakano, Ikumi Kondo, Ryota Yamazaki, Hiromi Kusaka, Minoru Sakaguchi, Tetsuji Ogawa

    The 2nd Asian Conference on Precision Livestock Farming (ACPLF2020)     341 - 351  2020.10  [Refereed]

    Authorship:Last author, Corresponding author

  • Attention network learning for robust detection of allantochorion and fetal membrane of Japanese black beef cattle

    Soshi Kawata, Teppei Nakano, Tetsuji Ogawa

    The 2nd Asian Conference on Precision Livestock Farming (ACPLF2020)     333 - 340  2020.10  [Refereed]

    Authorship:Last author, Corresponding author

  • Data-driven feature extraction for calving sign detection in Japanese black beef cattle using video frames

    Ryosuke Hyodo, Teppei Nakano, Tetsuji Ogawa

    The 2nd Asian Conference on Precision Livestock Farming (ACPLF2020)     323 - 332  2020.10  [Refereed]

    Authorship:Last author, Corresponding author

  • Exploring Effectiveness of Inter-Microtask Qualification Tests in Crowdsourcing

    Masaya Morinaga, Susumu Saito, Teppei Nakano, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. The 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP2020), Works-In-Progress and Demonstration Papers    2020.10  [Refereed]

    Authorship:Last author, Corresponding author

  • Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

    Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 21st Annual Conference of the International Speech Communication Association (INTERSPEECH2020)     3655 - 3659  2020.10  [Refereed]

  • Mentoring-reverse mentoring for unsupervised multi-channel speech source separation

    Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 21st Annual Conference of the International Speech Communication Association (INTERSPEECH2020)     86 - 90  2020.10  [Refereed]

  • CHARM-Deep: Continuous Human Activity Recognition Model Based on Deep Neural Network Using IMU Sensors of Smartwatch

    Sara Ashry, Tetsuji Ogawa, Walid Gomaa

    IEEE Sensors Journal   20 ( 15 ) 8757 - 8770  2020.08  [Refereed]

    DOI

  • SemSeq: A Regime for Training Widely-Applicable Word-Sequence Encoders

    Hiroaki Tsuyuki, Tetsuji Ogawa, Tetsunori Kobayashi, Yoshihiko Hayashi

    Communications in Computer and Information Science     43 - 55  2020.07  [Refereed]

    DOI


  • Deep speech extraction with time-varying spatial filtering guided by desired direction attractor

    Yu Nakagome, Masahito Togami, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020)     671 - 675  2020.05  [Refereed]

  • Frame-level phoneme-invariant speaker embedding for text-independent speaker recognition on extremely short utterances

    Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa

    Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2020)     6799 - 6803  2020.05  [Refereed]

    Authorship:Last author

  • Block-wise scrambled image recognition using adaptation network

    Koki Madono, Masayuki Tanaka, Masaki Onishi, Tetsuji Ogawa

    AAAI-20 Workshop on Artificial Intelligence of Things    2020.02  [Refereed]

    Authorship:Last author

  • Vibration-Based Fault Detection for Flywheel Condition Monitoring

    Takanori Hasegawa, Mao Saeki, Tetsuji Ogawa, Teppei Nakano

    Procedia Structural Integrity   17   487 - 494  2019.09  [Refereed]

    Authorship:Corresponding author

    DOI

    Scopus citations: 9

  • Speaker adversarial training of DPGMM-based feature extractor for zero-resource languages

    Yosuke Higuchi, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. INTERSPEECH2019     266 - 270  2019.09  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 4

  • Multi-channel speech enhancement using time-domain convolutional denoising autoencoder

    Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. INTERSPEECH2019     86 - 90  2019.09  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 31

  • Calving prediction from video: Exploiting behavioural information relevant to calving signs in Japanese black beef cows

    Kazuma Sugawara, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. ECPLF2019     663 - 669  2019.08  [Refereed]

    Authorship:Last author, Corresponding author

  • Two-stage calving prediction system: Exploiting state-based information relevant to calving signs in Japanese black beef cows

    Ryosuke Hyodo, Saki Yasuda, Yusuke Okimoto, Susumu Saito, Teppei Nakano, Makoto Akanabe, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. ECPLF2019     670 - 676  2019.08  [Refereed]

    Authorship:Last author, Corresponding author

  • Data assimilation versus machine learning: Comparative study of fish catch forecasting

    Yuka Horiuchi, Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. OCEANS2019    2019.06  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 1

  • Psychological measure on fish catches and its application to optimization criterion for machine learning based predictors

    Yuya Kokaki, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. OCEANS2019    2019.06  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 3

  • Visual explanation of neural network based rotation machinery anomaly detection system

    Mao Saeki, Jun Ogata, Masahiro Murakawa, Tetsuji Ogawa

    Proc. ICPHM2019    2019.06  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 16

  • Postfiltering using an adversarial denoising autoencoder with noise-aware training

    Naohiro Tawara, Hikari Tanabe, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

    Proc. ICASSP2019     3282 - 3286  2019.05  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 1

  • Adversarial autoencoder for reducing nonlinear distortion

    Naohiro Tawara, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

    Proc. APSIPA2018    2018.11  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus citations: 5

  • Sequential fish catch forecasting using Bayesian state space models

    Yuya Kokaki, Naohiro Tawara, Tetsunori Kobayashi, Kazuo Hashimoto, Tetsuji Ogawa

    Proc. ICPR2018     776 - 781  2018.08  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Acoustic feature representation based on timbre for fault detection of rotary machines

    Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. SDPC2018    2018.08  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Tandem connectionist anomaly detection: Use of faulty vibration signals in feature representation learning

    Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsuji Ogawa

    Proc. ICPHM2018     1 - 7  2018.06  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus

    11
    Citation
    (Scopus)
  • Speaker invariant feature extraction for zero-resource languages with adversarial training

    Taira Tsuchiya, Naohiro Tawara, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018)     2381 - 2385  2018.04  [Refereed]  [International journal]

    Authorship:Last author, Corresponding author

    DOI

    Scopus

    23
    Citation
    (Scopus)
  • Language model domain adaptation via recurrent neural network with domain-shared and domain-specific representations

    Tsuyoshi Morioka, Naohiro Tawara, Tetsuji Ogawa, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi

    Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP2018)     6084 - 6088  2018.04  [Refereed]  [International journal]

    DOI

    Scopus

    23
    Citation
    (Scopus)
  • Exploiting end of sentences and speaker alternations in recurrent neural network-based language modeling for multiparty conversations

    Hiroto Ashikawa, Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2017 (APSIPA2017)    2017.12  [Refereed]

    Authorship:Last author, Corresponding author

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Adaptive training of vibration-based anomaly detector for wind turbine condition monitoring

    Takanori Hasegawa, Jun Ogata, Masahiro Murakawa, Tetsunori Kobayashi, Tetsuji Ogawa

    Proc. Annual Conference on PHM Society     177 - 184  2017.10  [Refereed]

    Authorship:Last author, Corresponding author

  • Real-Time Large-Scale Map Matching Using Mobile Phone Data

    Essam Algizawy, Tetsuji Ogawa, Ahmed El-Mahdy

    ACM Transactions on Knowledge Discovery from Data   11 ( 4 ) 1 - 38  2017.08  [Refereed]  [International journal]

    With the widespread use of mobile phones, cellular mobile big data is becoming an important resource that provides a wealth of information with almost no cost. However, the data generally suffers from relatively high spatial granularity, limiting the scope of its application. In this article, we consider, for the first time, the utility of actual mobile big data for map matching allowing for “microscopic” level traffic analysis. The state-of-the-art in map matching generally targets GPS data, which provides far denser sampling and higher location resolution than the mobile data. Our approach extends the typical Hidden-Markov model used in map matching to accommodate highly sparse location trajectories, exploit the large mobile data volume to learn the model parameters, and exploit the sparsity of the data to provide for real-time Viterbi processing. We study an actual, anonymised mobile trajectories data set of the city of Dakar, Senegal, spanning a year, and generate a corresponding road-level traffic density, at an hourly granularity, for each mobile trajectory. We observed a relatively high correlation between the generated traffic intensities and corresponding values obtained by the gravity and equilibrium models typically used in mobility analysis, indicating the utility of the approach as an alternative means for traffic analysis.

    DOI

    Scopus

    30
    Citation
    (Scopus)
  • Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation

    Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   25 ( 3 ) 637 - 650  2017.03  [Refereed]  [International journal]

    DOI

  • A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation

    Tetsuji Ogawa, Sri Harish Mallidi, Emmanuel Dupoux, Jordan Cohen, Naomi H. Feldman, Hynek Hermansky

    Proc. 23rd International Conference on Pattern Recognition (ICPR2016)     2222 - 2227  2016.12  [Refereed]  [International journal]

    Authorship:Lead author, Corresponding author

    DOI

  • Nested Gibbs sampling for mixture-of-mixture model and its application to speaker clustering

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

    APSIPA Trans. Signal & Infor. Process.   ( 5 )  2016.08  [Refereed]

    DOI

  • Video semantic indexing using object detection-derived features

    Kotaro Kikuchi, Kazuya Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. 24th European Signal Processing Conference (EUSIPCO2016)     1288 - 1292  2016.08  [Refereed]

    DOI

  • Separation matrix optimization using associative memory model for blind source separation

    Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri

    2015 23rd European Signal Processing Conference, EUSIPCO 2015     1098 - 1102  2015.12  [Refereed]

    A source signal is estimated using an associative memory model (AMM) and used for separation matrix optimization in linear blind source separation (BSS) to yield high quality and less distorted speech. Linear-filtering-based BSS, such as independent vector analysis (IVA), has been shown to be effective in sound source separation while avoiding non-linear signal distortion. This technique, however, requires several assumptions of sound sources being independent and generated from non-Gaussian distribution. We propose a method for estimating a linear separation matrix without any assumptions about the sources by repeating the following two steps: estimating non-distorted reference signals by using an AMM and optimizing the separation matrix to minimize an error between the estimated signal and reference signal. Experimental comparisons carried out in simultaneous speech separation suggest that the proposed method can reduce the residual distortion caused by IVA.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Uncertainty estimation of DNN classifiers

    Sri Harish Mallidi, Tetsuji Ogawa, Hynek Hermansky

    2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)    2015.12  [Refereed]

    DOI

  • A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large-scale data

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    APSIPA Transactions on Signal and Information Processing   4 ( 4 )  2015.10  [Refereed]

    An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization to make it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with the Markov chain Monte Carlo and incorporated in the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM is successfully applied on large-scale data and outperforms the conventional hierarchical agglomerative clustering, especially for large amounts of utterances.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Autoencoder based multi-stream combination for noise robust speech recognition

    Sri Harish Mallidi, Tetsuji Ogawa, Karel Vesely, Phani S. Nidadavolu, Hynek Hermansky

    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015)     3551 - 3555  2015.09  [Refereed]

    The performance of automatic speech recognition (ASR) systems degrades rapidly when there is a mismatch between training and test acoustic conditions. Performance can be improved using a multi-stream framework, which involves combining posterior probabilities from several classifiers (often deep neural networks (DNNs)) trained on different features/streams. Knowledge about the confidence of each of these classifiers on a noisy test utterance can help in devising better techniques for posterior combination than simple sum and product rules [1]. In this work, we propose to use autoencoders, which are multilayer feed-forward neural networks, for estimating this confidence measure. During the training phase, for each stream, an autoencoder is trained on TANDEM features extracted from the corresponding DNN. On employing the autoencoder during the testing phase, we show that the reconstruction error of the autoencoder is correlated to the robustness of the corresponding stream. These error estimates are then used as confidence measures to combine the posterior probabilities generated from each of the streams. Experiments on Aurora4 and BABEL databases indicate significant improvements, especially in the scenario of mismatch between training and test acoustic conditions.

  • Bilinear map of filter-bank outputs for DNN-based speech recognition

    Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015)     16 - 20  2015.09  [Refereed]

    Authorship:Lead author, Corresponding author

    Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). The filter-bank outputs with temporal contexts form a time-frequency pattern of speech and have been shown to be effective as a feature parameter for DNN-based acoustic models. We attempt to project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map to improve acoustic separability in feature extraction. This extension makes extracting a more precise structure of the time-frequency pattern possible because the bilinear map yields higher-order correlations of features. Experimental comparisons carried out in phoneme recognition demonstrate that the tensor feature provides comparable results to the filter-bank feature, and the fusion of the two features yields an improvement over each feature.

  • Feature extraction for rotary-machine acoustic diagnostics focused on period

    Kesaaki Minemura, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. INTERNOISE2015    2015.08  [Refereed]

  • TOWARDS MACHINES THAT KNOW WHEN THEY DO NOT KNOW: SUMMARY OF WORK DONE AT 2014 FREDERICK JELINEK MEMORIAL WORKSHOP

    Hynek Hermansky, Lukas Burget, Jordan Cohen, Emmanuel Dupoux, Naomi Feldman, John Godfrey, Sanjeev Khudanpur, Matthew Maciejewski, Sri Harish Mallidi, Anjali Menon, Tetsuji Ogawa, Vijayaditya Peddinti, Richard Rose, Richard Stern, Matthew Wiesner, Karel Vesely

    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP)     5009 - 5013  2015  [Refereed]

    A group of junior and senior researchers gathered as a part of the 2014 Frederick Jelinek Memorial Workshop in Prague to address the problem of predicting the accuracy of a nonlinear Deep Neural Network probability estimator for unknown data in a different application domain from the domain in which the estimator was trained. The paper describes the problem and summarizes approaches that were taken by the group.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • A COMPARATIVE STUDY OF SPECTRAL CLUSTERING FOR I-VECTOR-BASED SPEAKER CLUSTERING UNDER NOISY CONDITIONS

    Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP)     2041 - 2045  2015  [Refereed]

    The present paper dealt with speaker clustering for speech corrupted by noise. In general, the performance of speaker clustering significantly depends on how well the similarities between speech utterances can be measured. The recently proposed i-vector-based cosine similarity has yielded the state-of-the-art performance in speaker clustering systems. However, this similarity often fails to capture the speaker similarity under noisy conditions. Therefore, we attempted to examine the efficiency of spectral clustering on i-vector-based similarity for speech corrupted by noise because spectral clustering can yield robustness against noise by non-linear projection. Experimental comparisons demonstrated that spectral clustering yielded significant improvement from conventional methods, such as agglomerative clustering and k-means clustering, under non-stationary noise conditions.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Effect of frequency weighting on MLP-based speaker canonicalization

    Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

    Proc. INTERSPEECH2014     2987 - 2991  2014.09  [Refereed]

  • Vision based SLAM for humanoid robots: A survey

    Walaa Gouda, Walid Gomaa, Tetsuji Ogawa

    Proceedings of the 2013 2nd International Japan-Egypt Conference on Electronics, Communications and Computers, JEC-ECC 2013     170 - 175  2013.12  [Refereed]

    Authorship:Last author

    This paper is a survey work for designing a Vision based Simultaneous Localization and Mapping (VSLAM) humanoid robot to generate a map of an unknown environment. A lot of factors have to be considered while designing a VSLAM robot. Vision Sensors are very attractive for application in SLAM because of their rich sensory output and cost effectiveness. Different issues are involved in the problem of vision based SLAM and many different approaches exist in order to solve these issues. Similarly the type of environment determines the suitable feature extraction method. The main objective of this survey is to conduct a comparative study among the current vision sensing methods in terms of imaging systems used for performing VSLAM, feature extraction algorithms used in some recently published papers, and initialization of landmarks, and to figure out the best for our work. © 2013 IEEE.

    DOI

    Scopus

    12
    Citation
    (Scopus)
  • Integration of MKL-based and i-vector-based speaker verification by short

    Hideitsu Hino, Tetsuji Ogawa

    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013)     562 - 566  2013.11  [Refereed]

    Authorship:Last author, Corresponding author

    We developed a speaker verification system that is efficient for short utterances. The i-vector-based speaker representation has helped realize highly accurate speaker verification systems; however, it might not be robust against short utterances because the reliability of statistics required for extracting i-vectors is low. On the other hand, multiple kernel learning based on conditional entropy minimization has also achieved high accuracy in speaker verification that is robust against intra-speaker variability. To improve the robustness of speaker verification systems against short utterances, we attempted to integrate the above-mentioned complementary systems. Our experimental results showed that the proposed system integration achieved high-accuracy speaker verification systems, irrespective of the utterance lengths, even for very short utterances (e.g., less than two seconds).

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    IEEE International Workshop on Machine Learning for Signal Processing, MLSP    2013.09  [Refereed]

    A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is represented by a Gaussian mixture model (GMM). In speaker modeling from speech, this GMM represents intra-speaker dynamics derived from the difference in the attributes such as phoneme contexts and the existence of non-stationary noise and the mixture of GMMs (MoGMMs) represents inter-speaker dynamics derived from the difference in speakers. Gibbs sampling is a powerful technique to estimate such hierarchically structured models but can easily induce the local optima problem depending on its use especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on the blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and effectively applied for reducing a singularity solution given in the model with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved the clustering performance by 17% on average compared to the conventional sampling-based methods. © 2013 IEEE.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Stream Selection and Integration in Multistream ASR Using GMM-Based Performance Monitoring

    Tetsuji Ogawa, Feipeng Li, Hynek Hermansky

    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013)     3331 - 3335  2013.08  [Refereed]

    Authorship:Lead author, Corresponding author

    A moderately deep and rather wide artificial neural net is applied in phoneme recognition of noisy speech. The net is formed by first estimating posterior probabilities of phonemes in 21 band-limited streams covering the whole speech spectrum. These 21 band-limited streams are subdivided into three seven band-limited stream subsets, by differently sub-sampling the original 21 band-limited streams. In the second processing stage, all non-empty combinations of seven band-limited streams from each subset are formed as inputs to 127 artificial neural nets that are again trained to yield phoneme posteriors. In this way, 127 x 3 = 381 processing streams are formed. A novel technique for finding the best combination of the resulting 381 parallel processing streams, which uses the likelihood of a single-state Gaussian mixture model of the final classifier output, is applied to selecting the most efficient streams. The technique is efficient in phoneme recognition of speech that is corrupted by realistic additive noise.

  • An Improved Entropy-Based Multiple Kernel Learning

    Hideitsu Hino, Tetsuji Ogawa

    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012)     1189 - 1192  2012.11  [Refereed]

    Authorship:Last author

    Kernel methods have been successfully used in many practical machine learning problems. However, the problem of choosing a suitable kernel is left to practitioners. One method to select the optimal kernel is to learn a linear combination of element kernels. A framework of multiple kernel learning based on conditional entropy minimization criterion (MCEM) has been proposed and it has been shown to work well for, e.g., speaker recognition tasks. In this paper, a computationally efficient implementation for MCEM, which utilizes sequential quadratic programming, is formulated. Through an experimental comparison with the conventional MCEM algorithm on a speaker verification task, the proposed method is shown to offer comparable verification accuracy with considerable improvement in computational speed.

  • Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012)     2163 - 2166  2012.09  [Refereed]

    We have proposed a novel speaker clustering method based on a hierarchically structured utterance-oriented Dirichlet process mixture model. In the proposed method, the number of speakers can be determined from the given data in a nonparametric Bayesian manner and intra-speaker variability is successfully handled by multi-scale mixture modeling. Experimental results showed that the proposed method is computationally efficient and effective in speaker clustering. The proposed method significantly improves the accuracy of speaker clustering systems as compared with the conventional method, particularly for the case in which the number of utterances varied from speaker to speaker.

  • FULLY BAYESIAN INFERENCE OF MULTI-MIXTURE GAUSSIAN MODEL AND ITS EVALUATION USING SPEAKER CLUSTERING

    Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)     5253 - 5256  2012.03  [Refereed]

    This study aims to verify effective optimization methods for estimating parametric, fully Bayesian models in speech processing. For that purpose, we investigate the impact of the difference in optimization methods for the multi-scale Gaussian mixture model, which is suitable for speaker clustering, on the clustering accuracy. The Markov chain Monte Carlo (MCMC)-based method was compared with the variational Bayesian method in the speaker clustering experiment; with a small amount of data, the MCMC-based method was more effective; with large scale data (more than one million samples), the difference between these methods in terms of the clustering accuracy decreased and the MCMC-based method was computationally efficient.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments

    Takahiro Fukumori, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Norihide Kitaoka, Takeshi Yamada, Kazumasa Yamamoto, Satoru Tsuge, Masakiyo Fujimoto, Tetsuya Takiguchi, Chiyomi Miyajima, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

    Acoustical Science and Technology   32 ( 5 ) 201 - 210  2011.09  [Refereed]

    We have been distributing a new collection of databases and evaluation tools called CENSREC-4, which is a framework for evaluating distant-talking speech in reverberant environments. The data contained in CENSREC-4 are connected digit utterances as in CENSREC-1. Two subsets are included in the data: "basic data sets" and "extra data sets." The basic data sets are used for evaluating the room impulse response-convolved speech data to simulate the various reverberations. The extra data sets consist of simulated data and corresponding real recorded data. Evaluation tools are presently only provided for the basic data sets and will be delivered to the extra data sets in the future. The task of CENSREC-4 with a basic data set appears simple; however, the results of experiments prove that CENSREC-4 provides a challenging reverberation speech-recognition task, in the sense that a traditional technique to improve recognition and a widely used criterion to represent the difficulty of recognition deliver poor performance. Within this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies. © 2011 The Acoustical Society of Japan.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Development and evaluation of Japanese Lombard speech corpus

    Tetsuji Ogawa, Takanobu Nishiura, Takeshi Yamada, Norihide Kitaoka, Tetsunori Kobayashi

    Proc. Internoise2011    2011.09  [Refereed]  [Invited]

    Authorship:Lead author, Corresponding author

  • Class-Distance-Based Discriminant Analysis and Its Application to Supervised Automatic Age Estimation

    Tetsuji Ogawa, Kazuya Ueki, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E94D ( 8 ) 1683 - 1689  2011.08  [Refereed]

    Authorship:Lead author, Corresponding author

    We propose a novel method of supervised feature projection called class-distance-based discriminant analysis (CDDA), which is suitable for automatic age estimation (AAE) from facial images. Most methods of supervised feature projection, e.g., Fisher discriminant analysis (FDA) and local Fisher discriminant analysis (LFDA), focus on determining whether two samples belong to the same class (i.e., the same age in AAE) or not. Even if an estimated age is not consistent with the correct age in AAE systems, i.e., the AAE system induces error, smaller errors are better. To treat such characteristics in AAE, CDDA determines between-class separability according to the class distance (i.e., difference in ages); two samples with similar ages are imposed to be close and those with spaced ages are imposed to be far apart. Furthermore, we propose an extension of CDDA called local CDDA (LCDDA), which aims at handling multimodality in samples. Experimental results revealed that CDDA and LCDDA could extract more discriminative features than FDA and LFDA.

    DOI

    Scopus

  • Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization

    Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH     2741 - 2744  2011.08  [Refereed]

    Authorship:Lead author, Corresponding author

  • Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model

    Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011)     2905 - 2908  2011.08  [Refereed]

    This paper provides the analytical solution and algorithm of UO-DPMM based on a non-parametric Bayesian manner, and thus realizes fully Bayesian speaker clustering. We carried out preliminary speaker clustering experiments by using a TIMIT database to compare the proposed method with the conventional Bayesian Information Criterion (BIC) based method, which is an approximate Bayesian approach. The results showed that the proposed method outperformed the conventional one in terms of both computational cost and robustness to changes in tuning parameters.

  • Spatial filter calibration based on minimization of modified LSD

    Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011)     1761 - 1764  2011.08  [Refereed]

    A new sound source separation method has been developed that is robust against individual variability in microphones and acoustic lines. A specific area that has a target sound source was enhanced by using a spatial filter developed by time-frequency masking. However, there is a strong likelihood that the spatial filters will be distorted due to the impact of individual variability in microphone characteristics and acoustic lines. To solve this problem, calibration of these spatial filters' shapes was attempted using a modified log-spectral distance (MLSD) minimization criterion, which uses utterances made by each individual (i.e., a sound source) at the desired positions. The effectiveness of this spatial filter calibration was experimentally verified in speech recognition experiments; MLSD-based calibration had fewer word errors than the cases without calibration and calibration using other criteria.

  • Speaker recognition using multiple kernel learning based on conditional entropy minimization

    Tetsuji Ogawa, Hideitsu Hino, Nima Reyhani, Noboru Murata, Tetsunori Kobayashi

    2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)     2204 - 2207  2011.05  [Refereed]

    Authorship:Lead author, Corresponding author

    DOI

  • CENSREC-1-AV: An audio-visual corpus for noisy bimodal speech recognition

    Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Tetsuya Takiguchi, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

    Proc. AVSP2010    2010.09  [Refereed]

  • DEVELOPMENT OF ZONAL BEAMFORMER AND ITS APPLICATION TO ROBOT AUDITION

    Nobuaki Tanaka, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

    18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010)     1529 - 1533  2010.08  [Refereed]

    We have proposed a zonal beamformer (ZBF), which enhances the sound source located in a zonal space, and applied the ZBF to noise reduction systems for robot audition. A conversational partner of a robot does not always remain stationary with respect to the robot. In order to cope with such a situation, we have proposed a fan-like beamformer (FBF), which enhances the sound source located in a fan-like space in front of the robot under the assumption that the partner is in front of the robot. However, the FBF may degrade the noise reduction performance when directional noise sources are located behind the target source because the FBF widens the space as the distance from the robot increases. The ZBF can better improve the performance of eliminating the directional noise coming from behind the target source than the FBF because the ZBF has a considerably sharper directivity than the FBF.

  • Speech Enhancement Using a Square Microphone Array in the Presence of Directional and Diffuse Noise

    Tetsuji Ogawa, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E93A ( 5 ) 926 - 935  2010.05  [Refereed]

    Authorship:Lead author, Corresponding author

     View Summary

    We propose a new speech enhancement method suitable for mobile devices used in the presence of various types of noise. In order to achieve high-performance speech recognition and auditory perception in mobile devices, various types of noise have to be removed under the constraints of a space-saving microphone arrangement and few computational resources. The proposed method can reduce both the directional noise and the diffuse noise under the abovementioned constraints for mobile devices by employing a square microphone array and conducting low-computational-cost processing that consists of multiple null beamforming, minimum power channel selection, and Wiener filtering. The effectiveness of the proposed method is experimentally verified in terms of speech recognition accuracy and speech quality when both the directional noise and the diffuse noise are observed simultaneously; this method reduces the number of word errors and improves the log-spectral distances as compared to conventional methods.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Influence of Lombard Effect: Accuracy Analysis of Simulation-Based Assessments of Noisy Speech Recognition Systems for Various Recognition Conditions

    Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E92D ( 11 ) 2244 - 2252  2009.11  [Refereed]

    Authorship:Lead author, Corresponding author

    The accuracy of simulation-based assessments of speech recognition systems under noisy conditions is investigated with a focus on the influence of the Lombard effect on speech recognition performance. This investigation was carried out under various recognition conditions: different sound pressure levels of ambient noise; different recognition tasks, such as continuous speech recognition and spoken word recognition; and different recognition systems, i.e., systems with and without adaptation of the acoustic models to ambient noise. Experimental results showed that accurate simulation was not always achieved when dry sources with a neutral talking style were used, but it could be achieved when dry sources that include the influence of the Lombard effect were used; the simulation in the latter case is accurate irrespective of the recognition conditions.

    Citations (Scopus): 4
  • Robot auditory system using head-mounted square microphone array

    Kosuke Hosoya, Tetsuji Ogawa, Tetsunori Kobayashi

    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS     2736 - 2741  2009.10  [Refereed]

    A new noise reduction method suitable for autonomous mobile robots was proposed and applied as preprocessing for a hands-free spoken dialogue system. When a robot talks with a conversational partner in real environments, not only speech utterances by the partner but also various types of noise, such as directional noise, diffuse noise, and noise from the robot, are observed at the microphones. We attempted to remove these types of noise simultaneously with small, lightweight devices and low-computational-cost algorithms. We assumed that the conversational partner of the robot was in front of the robot. In this case, the aim of the proposed method is to extract speech signals coming from the frontal direction of the robot. The proposed noise reduction system was evaluated in the presence of various types of noise: the number of word errors was reduced by 69% as compared to the conventional methods. The proposed robot auditory system can also cope with the case in which a conversational partner (i.e., a sound source) moves away from the front of the robot: the sound source was localized by face detection and tracking using facial images obtained from a camera mounted on an eye of the robot. As a result, various types of noise could be reduced in real time, irrespective of the sound source positions, by combining speech information with image information.

    Citations (Scopus): 6
  • CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments

    Norihide Kitaoka, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

    Acoustical Science and Technology   30 ( 5 ) 363 - 371  2009.08  [Refereed]

    Voice activity detection (VAD) plays an important role in speech processing including speech recognition, speech enhancement, and speech coding under noisy environments. We have developed an evaluation framework for VAD under noisy environments, named CENSREC-1-C. We designed this framework for simple isolated utterance detection and hence, this framework consists of noisy continuous digit utterances and evaluation tools for VAD results. We define two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance. We also provide the evaluation results of a power-based VAD method as a reference. ©2009 The Acoustical Society of Japan.

    Citations (Scopus): 27
  • Direction-of-arrival estimation under noisy condition using four-line omni-directional microphones mounted on a robot head

    Tetsuji Ogawa, Kosuke Hosoya, Kenzo Akagiri, Tetsunori Kobayashi

    Proc. EUSIPCO2009    2009.08  [Refereed]

  • CENSREC-4: Development of Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments

    Masato Nakayama, Takanobu Nishiura, Yuki Denda, Norihide Kitaoka, Kazumasa Yamamoto, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Tetsuji Ogawa, Shigeki Matsuda, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5     968 - 971  2008.09  [Refereed]

  • Class Distance Weighted Locality Preserving Projection for Automatic Age Estimation

    Kazuya Ueki, Masakazu Miya, Tetsuji Ogawa, Tetsunori Kobayashi

    2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems (BTAS2008)    2008.09  [Refereed]

  • Ears of the Robot: Noise Reduction Using Four-Line Ultra-Micro Omni-Directional Microphones Mounted on A Robot Head

    Tetsuji Ogawa, Hirofumi Takeuchi, Shintaro Takada, Kenzo Akagiri, Tetsunori Kobayashi

    Proc. EUSIPCO2008    2008.08  [Refereed]

    Authorship:Lead author, Corresponding author

  • Ears of the robot: Direction of arrival estimation based on pattern recognition using robot-mounted microphones

    Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 5 ) 1522 - 1530  2008.05  [Refereed]

    We propose a new type of direction-of-arrival estimation method for robot audition that does not require strict estimation of head-related transfer functions. The proposed method is based on statistical pattern recognition that employs the ratio of power spectrum amplitudes between a microphone pair as a feature vector. It does not explicitly require phase information, which is frequently used in conventional techniques, because phase information is unreliable when strong reflections and diffractions occur around the microphones. The feature vectors we adopted can handle these influences naturally. The effectiveness of the proposed method was shown in direction-of-arrival estimation tests for 19 directions: errors were reduced by 92.4% compared with the conventional phase-based method.

    Citations (Scopus): 3
  • Speech enhancement using square microphone array for mobile devices

    Shintaro Takada, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12     313 - 316  2008.04  [Refereed]

    In this paper, we propose a new type of speech enhancement method that is suitable for mobile devices used in noisy environments. To achieve high-performance speech recognition and auditory perception on mobile devices, disturbance noise has to be removed under the requirements of a space-saving microphone arrangement and a low computational cost. The proposed method can reduce both directional and diffuse noise under these requirements by applying a square microphone array and low-cost processing that consists of multiple null beamformers, minimum power channel selection, and Wiener filtering. The effectiveness of the proposed method is demonstrated in terms of speech recognition accuracy and speech quality under the condition in which both directional and diffuse noise exist simultaneously: it reduced recognition errors by 40% and improved the PESQ-based MOS value by 0.75 points.

    Citations (Scopus): 5
  • Sound source separation using null-beamforming and spectral subtraction for mobile devices

    Shintaro Takada, Satoshi Kanba, Tetsuji Ogawa, Kenzo Akagiri, Tetsunori Kobayashi

    Proc. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2007)     133 - 136  2007.10  [Refereed]

    This paper presents a new type of speech segregation method for mobile devices in noisy acoustic conditions where two or more speakers are talking simultaneously. The proposed method consists of multiple null beamformers, minimum power channel selection, and spectral subtraction. It works with a space-saving, coplanar microphone arrangement and low-cost computation, which are very important requirements for mobile applications. The effectiveness of the proposed method is demonstrated in segregation and recognition experiments on two simultaneous continuous speech utterances: the method improved the PESQ-based MOS value by about one point and reduced word recognition errors by 70% compared with no processing.

    Citations (Scopus): 9
  • Ears of the robot: Three simultaneous speech segregation and recognition using robot-mounted microphones

    Naoya Mochiki, Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE Transactions on Information and Systems   E90-D ( 9 ) 1465 - 1468  2007.09  [Refereed]

    Authorship:Corresponding author

    A new type of sound source segregation method using robot-mounted microphones, which does not require strict head-related transfer function (HRTF) estimation, has been proposed and successfully applied to a recognition system for three simultaneous speech signals. The proposed segregation method exploits sound intensity differences that arise from the particular arrangement of the four directivity microphones and from the robot head acting as a sound barrier. The proposed method consists of three-layered signal processing: two-line SAFIA (binary masking based on narrow-band sound intensity comparison), two-line spectral subtraction, and their integration. We performed a 20K-vocabulary continuous speech recognition test in the presence of three speakers talking simultaneously, and achieved more than 70% word error reduction compared with the case without any segregation processing. Copyright © 2007 The Institute of Electronics, Information and Communication Engineers.

    Citations (Scopus): 3
  • Adequacy Analysis of Simulation-Based Assessment of Speech Recognition System

    Tetsuji Ogawa, Satoshi Kanba, Tetsunori Kobayashi

    2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07   ( 4 ) 1153 - 1157  2007.04  [Refereed]

    Authorship:Lead author

  • Manifold HLDA and its application to robust speech recognition

    Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 9th International Conference on Spoken Language Processing (INTERSPEECH2006 - ICSLP)     1551 - 1554  2006.09  [Refereed]

    Authorship:Corresponding author

    A manifold heteroscedastic linear discriminant analysis (MHLDA), which explicitly removes environmental information from the information useful for discrimination, is proposed. Usually, a feature parameter used in pattern recognition involves both categorical information and environmental information. The well-known HLDA tries to extract useful information (UI) that represents categorical information from the feature parameter. However, environmental information still remains in the UI parameters extracted by HLDA, and it causes slight degradation in performance. This is because HLDA does not handle the environmental information explicitly. The proposed MHLDA also tries to extract UI like HLDA, but it handles environmental information explicitly. This handling makes the MHLDA-based UI parameters less influenced by the environment. As a trade-off, however, the categorical information is slightly degraded in MHLDA. In this paper, we combine HLDA-based UI and MHLDA-based UI for pattern recognition to draw on the benefits of both parameters. Experimental results show the effectiveness of this combination method.

  • Source Separation Using Multiple Directivity Patterns Produced by ICA-based BSS

    Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 14th European Signal Processing Conference (EUSIPCO2006)    2006.09  [Refereed]

  • A Method for Solving the Permutation Problem of Frequency-Domain BSS Using Reference Signal

    Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. The 14th European Signal Processing Conference (EUSIPCO2006)    2006.09  [Refereed]

  • Head Gesture Recognition for the Moving Conversation Robot

    NAKAJIMA Kei, EJIRI Yasushi, FUJIE Shinya, OGAWA Tetsuji, MATSUSAKA Yosuke, KOBAYASHI Tetsunori

    The IEICE transactions on information and systems   J89-D ( 7 ) 1514 - 1522  2006.09  [Refereed]

  • Genetic algorithm based optimization of Partly-Hidden Markov Model structure using discriminative criterion

    Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E89D ( 3 ) 939 - 945  2006.03  [Refereed]

    Authorship:Lead author

    Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation-dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, the optimal structure that gives the best performance is expected to differ from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM is optimally defined for each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore, it gives model structures with good discriminative performance. We define the optimal structures as the combination of model structures that best satisfies the WLRM criterion among all possible structure combinations. A genetic algorithm is applied as an adequate approximation of a full search. The effectiveness of the proposed structure optimization is shown by the results of continuous lecture talk speech recognition: it reduced the word errors compared to HMM and PHMM with a common structure for all models.

    Citations (Scopus): 2
  • A Method for Solving the Permutation Problem of Frequency-domain Blind Source Separation using Reference Signal

    Takashi Isa, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Biennial on DSP for in-Vehicle and Mobile Systems    2005.09  [Refereed]

  • Optimizing the Structure of Partly-Hidden Markov Models Using Weighted Likelihood-Ratio Maximization Criterion

    Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. Interspeech2005     3353 - 3356  2005.09  [Refereed]

    Authorship:Lead author

  • Extension of Hidden Markov Models for multiple candidates and its application to gesture recognition

    Yosuke Sato, Tetsuji Ogawa, Tetsunori Kobayashi

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E88D ( 6 ) 1239 - 1247  2005.06  [Refereed]

    We propose a modified Hidden Markov Model (HMM) with a view to improving gesture recognition using a moving camera. The conventional HMM is formulated so as to deal with only one feature candidate per frame. However, for a mobile robot, the background and the lighting conditions are always changing, and the feature extraction problem becomes difficult. It is almost impossible to extract a reliable feature vector under such conditions. In this paper, we define a new gesture recognition framework in which multiple candidates of feature vectors are generated with confidence measures and the HMM is extended to deal with these multiple feature vectors. Experimental results comparing the proposed system with feature vectors based on DCT and with the method of selecting only one candidate feature point verify the effectiveness of the proposed technique.

    Citations (Scopus): 1
  • Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot

    Naoya Mochiki, Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. ICSLP2004   2   821 - 824  2004.10  [Refereed]

  • Extension of State-Observation Dependency in Partly-Hidden Markov Models and Its Application to Continuous Speech Recognition

    Tetsuji Ogawa, Tetsunori Kobayashi

    The Transactions of the Institute of Electronics,Information and Communication Engineers.   J87-DII ( 6 ) 1216 - 1223  2004.06  [Refereed]

    Authorship:Lead author

  • Speech Recognition of Double Talk using SAFIA-based Audio Segregation

    Toshiyuki Sekiya, Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. EUROSPEECH2003     1285 - 1288  2003.09  [Refereed]

  • Hybrid modeling of PHMM and HMM for speech recognition

    Tetsuji Ogawa, Tetsunori Kobayashi

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS   1   140 - 143  2003  [Refereed]

    Authorship:Lead author

    A hybrid acoustic model of Partly Hidden Markov Model (PHMM) and HMM is proposed.
    PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can realize observation-dependent behaviors in both observations and state transitions. It achieved good performance, but some errors with a different trend from those of HMM still remained.
    In this paper, we designed a new acoustic model on the basis of PHMM, in which the observation and state transition probabilities are defined by the geometric means of the PHMM-based ones and the HMM-based ones. In this framework, if a word hypothesis is given a low score by either PHMM or HMM, it is unlikely to remain a probable candidate. Since many errors are due to the high scores of incorrect categories rather than the low score of the correct category, this property contributed to reducing errors. Moreover, the proposed model is more stable than PHMM because the higher-order statistics of PHMM, which are generally accurate but sometimes less reliable, are smoothed by the lower-order statistics of HMM, which are less accurate but robust.
    Experimental results showed the effectiveness of the proposed model: it reduced the word errors by 25% compared with HMM.

  • Generalization of State-Observation-Dependency in Partly Hidden Markov Models

    Tetsuji Ogawa, Tetsunori Kobayashi

    Proc. ICSLP2002     2673 - 2676  2002.09  [Refereed]

    Authorship:Lead author

Books and Other Publications

  • Smart fisheries

    WADA Masaaki( Part: Contributor, Catch Prediction Model)

    Midori Shobo  2023.03 ISBN: 9784895318853

  • Speech (Volume 2)

    岩野, 公司, 河原, 達也, 篠田, 浩一, 伊藤, 彰則, 増村, 亮, 小川, 哲司, 駒谷, 和範( Part: Contributor, Speaker Recognition)

    Corona Publishing  2023.01 ISBN: 9784339013672

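  • Introduction to Smart Fisheries

    和田, 雅昭( Part: Contributor, Daily catch prediction for set-net fishing: Catch prediction modeling that incorporates knowledge of set-net fishing and does not require big data)

    Midori Shobo  2022.03 ISBN: 9784895317818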

  • Encyclopedia of artificial intelligence

    ( Part: Contributor)

    2017.07 ISBN: 9784320997974

  • Acoustics Keyword Book

    Acoustical Society of Japan( Part: Contributor, Speaker Diarization)

    Corona Publishing  2016.03 ISBN: 9784339008807

Presentations

  • An approach to narrowing the prediction area in good fishing ground prediction using weather and sea condition information

    兒新治紀, 中野鐵兵, 宮澤泰正, 小川哲司

    Marine IT Workshop 2023

    Presentation date: 2023.08

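  • Video monitoring modeling of breeding cows to enable livestock farmers to make decisions with conviction

    小川哲司, 斎藤奨, 中野鐵兵  [Invited]

    10th SICE Multi-Symposium on Control Systems, Organized Session: Applications of Measurement and Control Technology to Agriculture, Forestry, Livestock, and Fisheries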

    Presentation date: 2023.03

  • Video monitoring for detecting calving signs of breeding cows - How to construct and operate AI systems that enable users to make decisions with conviction?

    Tetsuji Ogawa

    CSE Research Seminar in E-JUST, E-JUST, Alexandria, Egypt 

    Presentation date: 2022.12

  • Tutti: A platform for developing and operating data annotation systems

    斎藤奨, 中野鐵兵, 小川哲司

    25th Information-Based Induction Sciences Workshop (IBIS2022)

    Presentation date: 2022.11

  • Uncertainty estimation for deep neural networks based on disagreement in predicted classes

    松永直輝, 斎藤奨, 中野鐵兵, 小川哲司

    24th Information-Based Induction Sciences Workshop (IBIS2021)

    Presentation date: 2021.11

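  • Detecting calving signs of breeding cows based on video monitoring: How to construct and operate video monitoring systems that enable users to make decisions with conviction?

    小川哲司

    2nd AI Expo Autumn, Academic Forum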

    Presentation date: 2021.10

  • Counting the number of fish caught based on fish detection and tracking in onboard video

    田中理子, 中野鐵兵, 漁崎盛也, 小川哲司

    Marine IT Workshop 2021

    Presentation date: 2021.09

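  • Constructing and operating explainable condition monitoring systems for decision support (using video monitoring of livestock as an example)

    小川哲司, 兵頭亮介, 斎藤奨, 中野鐵兵  [Invited]

    IEICE General Conference, Organized Session: Can AI really go beyond PoC? The high wall blocking practical application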

    Presentation date: 2021.03

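  • A study on fishing ground prediction for directly supporting the decision-making of mejika (frigate tuna) fishermen: As part of the Kochi Marine Innovation initiative

    小川哲司, 堀内優佳, 田中理子, 宮澤泰正, 漁崎盛也

    Marine IT Workshop 2021 Mie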

    Presentation date: 2021.03

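  • Case study on the early deployment of a wind turbine anomaly detection system: How should AI technology be constructed and operated for maintenance decision-making?

    小川哲司, 長谷川隆徳, 緒方淳  [Invited]

    Study Group on the Use of AI in Tribology Technology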

    Presentation date: 2021.03

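  • How to develop and operate AI technology when big data is unavailable: Case studies in support for primary industries

    小川哲司  [Invited]

    Waseda University Graduate Program for Embodiment Informatics, 4th Colloquium of AY2020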

    Presentation date: 2020.12

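  • Interpretable video monitoring modeling incorporating domain knowledge about users' decision-making processes

    兵頭亮介, 中野鐵兵, 小川哲司

    23rd Information-Based Induction Sciences Workshop (IBIS2020)  (Tsukuba, Ibaraki)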

    Presentation date: 2020.11

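  • How to develop AI technology when big data is unavailable: Case studies in support for fisheries and livestock farming

    小川哲司, 斎藤奨, 中野鐵兵  [Invited]

    IEICE General Conference, Organized Session: Do you really understand AI? From basic principles to usage and applications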

    Presentation date: 2020.03

    Event date: 2020.03
     
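  • Current state and challenges of AI technology: Points to note when applying it to maintenance and primary industry support

    小川哲司  [Invited]

    7th Seminar of the IoT Business Promotion Consortium Okinawa  (Naha, Okinawa)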

    Presentation date: 2019.10

  • Effect of missing sensor data on catch prediction performance

    小川哲司, 堀内優佳, 小林哲則, 福嶋正義, 井戸上彰

    Marine IT Workshop 2019  (Hakodate, Hokkaido)

    Presentation date: 2019.08

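  • A psychological scale of catch amounts and its use in optimizing machine-learning-based catch prediction models

    小川哲司, 幸加木裕也, 橋本和夫, 小林哲則, 福嶋正義, 井戸上彰

    Marine IT Workshop 2019 Ishigaki  (Ishigaki, Okinawa)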

    Presentation date: 2019.03

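  • Recent developments in AI technology and case studies of industry-academia collaboration in Kagoshima Prefecture

    小川哲司  [Invited]

    Kagoshima IT Business Study Group  (Kagoshima, Kagoshima)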

    Presentation date: 2019.03

  • Daily catch prediction for set-net fishing using state-space models

    小川哲司

    Marine IT Workshop  (Hakodate, Hokkaido)

    Presentation date: 2018.08

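  • The future of IoT and livestock farming from the perspective of information engineering

    小川哲司  [Invited]

    Society for Reproduction and Development, Young Researchers' Summer Seminar Camp  (Kasama, Ibaraki)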

    Presentation date: 2018.08

  • Toward proactive forecasting for smart maintenance of infrastructure equipment and support for primary industry

    Tetsuji Ogawa  [Invited]

    7th Research Seminar in E-JUST  (Alexandria)  Egypt-Japan University of Science and Technology (E-JUST)

    Presentation date: 2018.03

  • Progress and challenges in AI research

    小川哲司  [Invited]

    Kagoshima IT Business Study Group  (Kagoshima, Kagoshima)

    Presentation date: 2017.09

  • High resolution traffic maps generation using cellular big data

    Ahmed El-Mahdy, Essam Algizawy, Tetsuji Ogawa, Hisham Shishiny, Mohamed Badder, Keiji Kimura

    NetMob2015  (Boston) 

    Presentation date: 2015.04

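  • Comparison of fully Bayesian model estimation methods for speaker clustering using a hierarchical utterance generation model

    俵直弘, 小川哲司, 渡部晋治, 小林哲則

    14th Information-Based Induction Sciences Workshop (IBIS2011)  (Nara, Nara)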

    Presentation date: 2011.11

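  • Discriminant analysis based on inter-class distance and its application to age estimation systems

    小川哲司, 小林哲則

    13th Information-Based Induction Sciences Workshop (IBIS2010)  (Meguro, Tokyo)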

    Presentation date: 2010.11

  • Sound source separation system and acoustic signal acquisition device

    Tetsuji Ogawa

    Leading Edge Japan 2009  (New York) 

    Presentation date: 2009.03

  • Multi-layer audio segregation and its application to double talk recognition

    Toshiyuki Sekiya, Tomohiro Sawada, Tetsuji Ogawa, Tetsunori Kobayashi

    SWIM, Lectures by Masters in Speech Processing  (Honolulu) 

    Presentation date: 2004.01

Research Projects

  • Study on Construction and Operation Method of Sustainable Condition Monitoring System for Decision Support

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2022.04 - 2025.03

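  • Development of a "communication support AI" that supports the growth of children with severe motor and intellectual disabilities, and establishment of a method for its sustainable operation

    Kihara Memorial Yokohama Foundation for the Advancement of Life Sciences  FY2023 LIP. Yokohama Trial Grant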

    Project Year: 2023.06 - 2024.03

    佐藤朝美, 小川哲司

  • Research on sustainable fishery condition monitoring through cooperation between fishermen and artificial intelligence technology

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2022.06 - 2024.03

  • Deep semantic annotation of video contents

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2017.04 - 2021.03

    To enable an advanced retrieval system or an intelligent knowledge extraction system that deals with a large set of video contents, it is essential to annotate them semantically and adequately. Towards this ultimate goal, this study researched fundamental technologies that combine vision and language technologies. More specifically, we have developed an effective yet efficient scene graph generation system and an action captioning system. Empirical results show that the resulting systems generally performed better than the comparative systems. These systems produce information structures suited to computer processing and to human consumption, respectively.

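  • Research and development on improving fishery efficiency using local ocean data

    Ministry of Internal Affairs and Communications  Strategic Information and Communications R&D Promotion Programme (SCOPE)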

    Project Year: 2017.04 - 2020.03

    内海康雄, 北島宏之, 若生一広, 菅原利弥, 宇都宮栄二, 井戸上彰, 阿部博則, 福嶋正義, 小川哲司, 小林哲則, 中野鐵兵, 橋本和夫

  • A study on speaker-specific information extraction in consideration of vocalization mechanism and its application to speaker verification

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2016.04 - 2019.03

    Ogawa Tetsuji, Tawara Naohiro

    An attempt was made to develop a neural network that learns speaker representations unaffected by phoneme information, under the assumption that speaker and phoneme information are separable in acoustic features. As a result, a disentangling neural network was successfully developed to extract the phoneme and speaker information separately from each frame of acoustic features. The present study introduced statistical pooling, which aims at reflecting the utterance-by-utterance speaker information in the frame-by-frame features, and demonstrated that pooling just before classification (i.e., late pooling) performed well. In addition, a loss function based on the entropy of classifiers was introduced to optimize feature extractors such that the extracted features contain only the desired speaker-specific and phoneme-specific information, and it was shown to be effective in speaker verification.

  • A study on total optimization of multiple pattern recognition systems using cooperative and adaptive training

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2013.04 - 2016.03

    Ogawa Tetsuji

    Attempts have been made to cooperatively optimize multiple pattern recognition systems, developing a total system efficiently and automatically. Specifically, the clustering technique that is robust against the environmental changes and multistream pattern recognition framework, which cooperatively exploits information yielded from multiple systems, have been developed as the fundamental technologies for adaptively refining the systems to cope with the changes in characteristics of data (e.g., users and surrounding environments of the system).

  • Effective improvement of time-series pattern recognition systems using clustering and unsupervised adaptive training

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2011 - 2012

    OGAWA Tetsuji

    I developed technologies for clustering speech data into acoustic attributes such as speakers and types of noise and technologies for adaptively optimizing speech recognition systems in unsupervised ways. The developed technologies would be essential for constructing a system structuring speech data and a speech retrieval system.

  • A study on online adaptive pattern recognition with sequential optimization of model structures

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2009 - 2010

    OGAWA Tetsuji

    I developed a method of adaptively optimizing both the structure and parameters of statistical models used in pattern recognition systems to effectively improve robustness of those systems to environmental changes. In addition, I attempted to apply this framework to speaker recognition systems using speech information and face recognition systems using image information.

  • A study on communication robot performing rhythmic conversation

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2008 - 2010

    KOBAYASHI Tetsunori

    We refined methods for generating and recognizing linguistic and paralinguistic information and realized a communication robot that can converse with a group of people. The robot was used to stimulate human-to-human conversation. For this purpose, we designed the robot's appearance to express the character desired for conversation and to support the expression of paralinguistic information. We designed behaviors suited to each conversational situation and conversational procedures that make the conversation attractive. We also improved speech recognition and synthesis methods for conversation.

  • Study on Speech Enhancement Based on Distorted Speech Corpora in the Real-world

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2007 - 2009

    TAKEDA Kazuya, KITAOKA Norihide, YAMADA Takeshi, NISHIURA Takanobu, MIYAJIMA Chiyomi, TAMURA Satoshi, NAKAMURA Satoshi, KUROIWA Shingo, TSUGE Satoru, TAKIGUCHI Tetsuya, YAMAMOTO Kazumasa, OGAWA Tetsuji, NAKAYAMA Masato

    For distorted speech recognition in the real world, we carried out the following: (1) development and public distribution of distorted speech corpora named CENSREC; (2) accurate prediction of recognition performance for additively and convolutionally distorted speech; (3) development of a structural explanation of distortion factors and of recognition methods for distorted speech; (4) development of distorted speech recognition methods.

  • A study on a pattern recognition system based on the combination of complementary classifiers

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2007 - 2008

    OGAWA Tetsuji

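  • A study on structure optimization and robustness enhancement of stochastic models with mutual dependencies between states and outputs

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research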

    Project Year: 2005 - 2006

    小川 哲司

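    This fiscal year, we investigated the following two topics.
    (1) Optimal selection of the model structure of Partly-Hidden Markov Models (PHMM)
    As a framework for optimizing the PHMM structure for each phoneme, last year we introduced the weighted likelihood-ratio maximization criterion as the evaluation criterion and a genetic algorithm as the optimization algorithm, and reduced the errors of the conventional method in lecture speech recognition. This year, we examined items A) to C) below in detail.
    A) Evaluation function: We introduced several evaluation criteria, including the weighted likelihood-ratio criterion, the maximum likelihood criterion, and a Bayesian criterion, and evaluated recognition performance. The likelihood-ratio criterion, which is discriminative, gave the best performance.
    B) Optimization algorithm: We evaluated performance with a genetic algorithm and with tabu search. Tabu search tended to fall into local optima, and the genetic algorithm reached the optimal solution faster.
    C) Sharing of discriminant classes: We clustered phonemes to make the search more efficient, but sharing classes at the optimization stage did not yield performance comparable to that obtained without sharing.
    (2) Features robust to environmental variation
    High-precision stochastic models such as PHMM are more susceptible to variation in speakers and environments than simple models such as HMM. We therefore studied methods that remove speaker and environmental information from acoustic features and extract only the phonemic information needed for discrimination (discriminative information extraction). We proposed the use of HLDA and its extension, Manifold HLDA (MHLDA), for this purpose and evaluated them in word speech recognition. Integrating the parameters extracted by HLDA and MHLDA gave performance robust to environmental variation.
    Building on this finding, we also studied a method of integrating stochastic models that introduces boosting into HLDA, and obtained the preliminary finding that it enables more robust recognition than maximum-likelihood classification.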

  • Studies on conversation systems with understanding and generating functions of linguistic and para-linguistic information

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year: 2003 - 2006

    KOBAYASHI Tetsunori, FUJIE Shinya, OGAWA Tetsuji

    As a tool for investigating the fundamental elements of natural spoken language communication, we developed a prototype spoken dialogue system with functions for understanding and generating linguistic and para-linguistic information.
    Although many excellent studies on speech recognition and synthesis have been conducted, no practical spoken dialogue system is yet satisfactory. One reason is that most spoken dialogue systems do not deal with para-linguistic information. The quantitative understanding of para-linguistic information is not yet sufficient to build a natural conversation system. In this study, we set out to realize a number of component technologies and a conversation robot platform as tools for revealing the quantitative roles of para-language.
    In particular, the following outcomes were obtained: 1) sound localization and separation methods using a four-line directivity microphone mounted on the robot's head; 2) a high-quality speech synthesis method based on waveform synthesis and a high-quality voice conversion method for expressing para-linguistic information; 3) a method of attitude recognition and back-channel feedback generation based on prosodic information, as the para-linguistic information in speech; 4) methods of head gesture recognition and facial expression recognition, as the para-linguistic information in visual input; 5) the humanoid robot "ROBISUKE", developed as the platform of the spoken dialogue system; and 6) Message Oriented RObot Architecture, MONEA, proposed for integrating the above-mentioned modules.
    Future work includes experiments to quantitatively identify the requirements for natural conversation.

Misc

  • RangeBoundTrack: 黒毛和種雌牛分娩監視映像データセット構築のための牛追跡

    中田道寛, 大吉佐和, 中野鐵兵, 春日良一, 小川哲司

    日本畜産学会 第132回大会    2024.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 低頻度語のためのプロンプトを活用した音声認識

    菅野竜雅, 佐藤裕明, 佐久間旭, 熊野正, 河合吉彦, 小川哲司

    日本音響学会研究発表会講演論文集    2024.09

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • 状態変化の頻度情報の抽出と家畜の映像監視のための特徴表現としての利用

    中田道寛, 中野鐵兵, 小川哲司

    第27回画像の認識・理解シンポジウム (MIRU2024)   IS-3-142   1 - 4  2024.08

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 重症児感情状態推定モデル構築のためのフィードバックサイクルの検討:人の「見守り」による効率的なモデル構築

    望田康太, 中野鐵兵, 若林麻里, 佐藤朝美, 小川哲司

    第27回画像の認識・理解シンポジウム (MIRU2024)   IS-1-165   1 - 4  2024.08

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 歌唱者埋め込み表現抽出器の構築において歌唱者内の音響変動を重要視することの効果の検証

    当間佐耶佳, 有賀智輝, 樋口陽祐, 早坂一寿, 執行里恵, 小川哲司

    情報処理学会研究報告   2024-SLP-152 ( 60 ) 331 - 336  2024.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 発音出力を利用したchain of thought 音声認識

    菅野竜雅, 佐藤裕明, 熊野正, 河合吉彦, 小川哲司

    日本音響学会研究発表会講演論文集    2024.03

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • 再帰的フィードバックを用いた階層的マルチタスク学習によるEnd-to-End音声認識

    楠奈穂美, 樋口陽祐, 小川哲司, 小林哲則

    日本音響学会研究発表会講演論文集    2024.03

    Research paper, summary (national, other academic conference)  

  • M-measureを用いた特徴抽出に基づく回転速度に頑健な風車異常検知

    若山拓矢, 井上太揮, 緒方淳, 飯田誠, 小川哲司

    第45回風力エネルギー利用シンポジウム    2023.11

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 風車異常検知システム早期運用のための距離学習を用いた識別的な特徴表現の学習

    井上太揮, 緒方淳, 飯田誠, 小川哲司

    第45回風力エネルギー利用シンポジウム    2023.11

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • Streaming transducerにおけるテキストのみを用いた学習方法に関する検討

    佐藤裕明, 菅野竜雅, 佐久間旭, 河合吉彦, 熊野正, 山田一郎, 小川哲司

    日本音響学会研究発表会講演論文集    2023.09

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • 深層話者埋め込みを用いた歌唱者の照合に関する検討

    当間佐耶佳, 有賀智輝, 樋口陽祐, 早坂一寿, 岡本直紀, 小川哲司

    日本音響学会研究発表会講演論文集    2023.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • Teacher-Forcingにより歌詞を与えた際のAttentionの崩れに着目した替え歌検知

    有賀智輝, 樋口陽祐, 早坂一寿, 岡本直紀, 小林哲則, 小川哲司

    日本音響学会研究発表会講演論文集    2023.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • An investigation on constructing multi-look-ahead contextual block streaming transducer

    Huaibo Zhao, Shinya Fujie, Tetsuji Ogawa, Tetsunori Kobayashi

    日本音響学会研究発表会講演論文集    2023.09

    Research paper, summary (national, other academic conference)  

  • 音源の分離と再混合による事前学習を必要としないモノラル教師なし音源分離

    西城耕平, 小川哲司

    日本音響学会研究発表会講演論文集    2023.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 事前学習済みマスク言語モデルを用いたEnd-to-end音声認識

    樋口陽祐, 小川哲司, 小林哲則, 渡部晋治

    日本音響学会研究発表会講演論文集    2023.09

    Research paper, summary (national, other academic conference)  

  • 字幕制作効率化のための音声認識エラー検出手法

    菅野竜雅, 佐藤裕明, 佐久間旭, 熊野正, 河合吉彦, 山田一郎, 小川哲司

    映像情報メディア学会2023年年次大会    2023.08

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • アクションユニットを用いた重症心身障害児の感情状態推定

    望田康太, 岸凌祐, 大矢耀介, 中野鐵兵, 藤江真也, 佐藤朝美, 小川哲司

    第24回日本医療情報学会看護学術大会    2023.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 重症心身障害児を対象とした顔表情に基づく感情状態推定のための事前学習モデルに関する検討

    望田康太, 中野鐵兵, 藤江真也, 若林麻里, 佐藤朝美, 小川哲司

    第26回画像の認識・理解シンポジウム (MIRU2023)   IS1-104   1 - 4  2023.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 映像監視に基づく意思決定支援のための事前学習モデルの構築法と繁殖牛の分娩検知への応用

    中田道寛, 斎藤奨, 中野鐵兵, 小川哲司

    第26回画像の認識・理解シンポジウム (MIRU2023)   IS1-101   1 - 4  2023.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 歌詞と歌唱音声のアライメント崩れに基づく替え歌検知

    有賀智輝, 樋口陽祐, 菅野光則, 執行里恵, 水口天都, 岡本直紀, 小川哲司

    電子情報通信学会技術研究報告(SP)   123 ( 88 ) 48 - 53  2023.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • Self-Remixing: 音源の分離と再混合による教師なし音源分離

    西城耕平, 小川哲司

    日本音響学会研究発表会講演論文集     191 - 194  2023.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 中間層予測を用いたEnd-to-Endダイアライゼーション

    藤田雄介, 小松達也, 木田祐介, 小川哲司

    日本音響学会研究発表会講演論文集     665 - 666  2023.03

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • 気象・海況情報を用いた良漁場予測における予測範囲の絞り込み

    兒新治紀, 中野鐵兵, 宮澤泰正, 小川哲司

    日本水産学会春季大会    2023.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • Multiple latency CBS streaming ASR for conversational systems

    Zhao Huaibo, Shinya Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi

    情報処理学会研究報告 (SLP)   2022-SLP-146 ( 9 ) 1 - 6  2023.02

    Research paper, summary (national, other academic conference)  

  • 畜産農家の意思決定支援AI導入に向けた取組み

    小川哲司, 斎藤奨, 中野鐵兵

    ITUジャーナル   52 ( 10 ) 10 - 13  2022.10  [Invited]

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • 映像監視に基づく繁殖牛の分娩予兆検知~ユーザが納得して意思決定できるような映像監視システムをどう構築し運用するか?

    小川哲司

    計測と制御・特集「農・林・畜・水産業に挑む画像センシング技術 」   61 ( 10 ) 746 - 749  2022.10  [Invited]

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • CycleGANを用いた教師無し音声処理歪み補正

    荻野里久, 西城耕平, 小川哲司

    日本音響学会研究発表会講演論文集     371 - 374  2022.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ブラインド音源分離を教師としたTeacher-Student学習とUnmix-Remix無矛盾学習によるSequential Neural Beamformerの教師なし学習

    西城耕平, 小川哲司

    日本音響学会研究発表会講演論文集     359 - 362  2022.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングにおける動的タスク発注モデルの教師無し学習

    柳澤遼, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

    電子情報通信学会技術研究報告 (AI)   122 ( 96 ) 72 - 76  2022.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 対話特徴を用いた第二言語発話の流暢性自動採点

    松浦瑠希, 鈴木駿吾, 佐伯真於, 藤江真也, 小川哲司, 松山洋一

    情報処理学会研究報告 (SLP)   2022-SLP-142 ( 47 ) 1 - 6  2022.06

    Research paper, summary (national, other academic conference)  

  • Transducer型ストリーミング音声認識におけるMask-CTCを用いた事前学習

    趙懐博, 樋口陽祐, 木田祐介, 小川哲司, 小林哲則

    情報処理学会研究報告 (SLP)   2022-SLP-142 ( 61 ) 1 - 6  2022.06

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングを用いた合成音声の音質主観評価のためのワーカ選抜基準

    八重樫萌絵, 斎藤奨, 中野鐵兵, 小川哲司

    電子情報通信学会技術研究報告 (SP)   122 ( 81 ) 104 - 109  2022.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 局所的な真偽判定を用いた敵対的学習に基づく教師なし音声処理歪み補正

    荻野里久, 西城耕平, 藤枝大, 小川哲司

    電子情報通信学会技術研究報告 (SP)   122 ( 81 ) 49 - 54  2022.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ブラインド音源分離の分離音と観測信号を教師信号として用いたSequential Neural Beamformerの教師なし学習

    西城耕平, 小川哲司

    電子情報通信学会技術研究報告 (SP)   122 ( 81 ) 110 - 115  2022.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • テキストのみを用いたIntermediate-CTCコンフォーマーモデルのドメイン適応

    佐藤裕明, 小森智康, 三島剛, 河合吉彦, 望月貴裕, 佐藤庄衛, 小川哲司

    日本音響学会研究発表会講演論文集    2022.03

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • 粒度の異なるサブワード単位に基づく階層的条件付きEnd-to-End音声認識

    樋口陽祐, 軽部敬太, 小川哲司, 小林哲則

    日本音響学会研究発表会講演論文集    2022.03

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングを用いた合成音声評価におけるワーカからの回答の分析

    八重樫萌絵, 斎藤奨, 中野鐵兵, 小川哲司

    日本音響学会研究発表会講演論文集    2022.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 敵対的学習と Unmix-Remix 無矛盾学習による教師なし音源分離

    西城耕平, 小川哲司

    日本音響学会研究発表会講演論文集    2022.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ペアデータを必要としない敵対的学習に基づく音声処理歪み補正

    荻野里久, 藤枝大, 片桐一浩, 小川哲司

    日本音響学会研究発表会講演論文集    2022.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 言い淀みとポーズ位置検出に基づく第二言語発話の流暢性自動採点

    松浦瑠希, 鈴木駿吾, 佐伯真於, 小川哲司, 松山洋一

    日本音響学会研究発表会講演論文集    2022.03

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングを用いた話者照合結果の検証における誤り削減傾向に関する調査

    井手悠太, 斎藤奨, 中野鐵兵, 小川哲司

    日本音響学会研究発表会講演論文集    2022.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 風車運用高度化技術研究開発

    飯田誠, 古澤陽子, 山本和男, 緒方淳, 小川哲司

    日本風力エネルギー学会誌・特集「風力発電分野の国家プロジェクト」   45 ( 4 ) 582 - 586  2022.02  [Invited]

    Authorship:Last author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • End-to-end音声認識モデルにおけるテキストデータ学習手法の検討

    佐藤裕明, 小森智康, 三島剛, 河合吉彦, 望月貴裕, 佐藤庄衛, 小川哲司

    2021年度映像情報メディア学会冬季大会    2021.12

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • テキストのみを用いたドメイン適応のためのIntermediate-CTCコンフォーマーモデルに関する検討

    佐藤裕明, 小森智康, 三島剛, 河合吉彦, 望月貴裕, 佐藤庄衛, 小川哲司

    情報処理学会研究報告 (SLP)    2021.12

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングを用いた結果の検証による話者照合性能の改善

    井手悠太, 斎藤奨, 中野鐵兵, 小川哲司

    情報処理学会研究報告 (SLP)    2021.12

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • CTCと異なる粒度のサブワード単位に基づいた階層的条件付きEnd-to-End音声認識

    樋口陽祐, 軽部敬太, 小川哲司, 小林哲則

    情報処理学会研究報告 (SLP)     1 - 6  2021.12

    Research paper, summary (national, other academic conference)  

  • マルソウダ曳縄漁のための気象・海況情報を用いた良漁場予測

    堀内優佳, 中野鐵兵, 宮澤泰正, 小川哲司

    水産海洋学会2021年度研究発表大会要旨    2021.11

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 船上映像を用いた漁獲尾数計測器

    田中理子, 中野鐵兵, 小川哲司

    水産海洋学会2021年度研究発表大会要旨    2021.11

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 農家の皆さんにとって使い勝手が良く,信頼してもらえるAI技術の作り方-農家の意思決定支援のための家畜の映像監視システム開発を例に―

    小川哲司

    肉牛ジャーナル   34 ( 10 ) 59 - 63  2021.10

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (trade magazine, newspaper, online media)  

  • Triggered attention型ストリーミング音声認識におけるMask-CTCを用いた事前学習

    趙懐博, 樋口陽祐, 小林哲則, 小川哲司

    情報処理学会研究報告 (SLP)     1 - 6  2021.10

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • DNNを用いた最小分散ビームフォーマの音源の動きに対する頑健性:音源追跡とエリア収音に基づくアプローチの比較

    西城耕平, 藤枝大, 片桐一浩, 小林哲則, 小川哲司

    日本音響学会研究発表会講演論文集     321 - 322  2021.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • VocalTurk:クラウドソーシングを用いた話者照合の性能調査

    斎藤奨, 井手悠太, 中野鐵兵, 小川哲司

    日本音響学会研究発表会講演論文集     1003 - 1006  2021.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 予測の不一致に基づく深層学習モデルの不確実性推定とクラウドソーシングを用いた映像監視への応用

    松永直輝, 斎藤奨, 中野鐵兵, 小川哲司

    第24回画像の認識・理解シンポジウム (MIRU2021)     1 - 4  2021.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 意思決定支援のための解釈可能な映像監視システムの開発フローと繁殖牛の分娩予兆検知への応用

    兵頭亮介, 斎藤奨, 中野鐵兵, 小川哲司

    第24回画像の認識・理解シンポジウム (MIRU2021)     1 - 4  2021.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 黒毛和牛種の映像監視における解釈可能な分娩予兆通知システム

    兵頭亮介, 斎藤奨, 中野鐵兵, 赤羽誠, 春日良一, 小川哲司

    日本畜産学会 第128回大会要旨    2021.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 空間フィルタ出力を補助情報として用いた音源の移動に頑健なニューラル音声強調

    西城耕平, 藤枝大, 片桐一浩, 小林哲則, 小川哲司

    日本音響学会研究発表会講演論文集     427 - 428  2021.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ペアデータを必要としない敵対的学習に基づく多チャンネル音源分離

    中込優, 戸上真人, 小川哲司, 小林哲則

    日本音響学会研究発表会講演論文集     409 - 410  2021.03

    Research paper, summary (national, other academic conference)  

  • コモンセンス知識を利用した物語中の登場人物の感情推定

    田辺ひかり, 小川哲司, 小林哲則, 林良彦

    言語処理学会第27回年次大会   27th   538 - 542  2021.03

    Research paper, summary (national, other academic conference)  

  • 単語の重要度に応じてパラメタ数可変な単語分散表現の学習

    露木浩章, 小川哲司, 小林哲則, 林良彦

    言語処理学会第27回年次大会   27th   12 - 16  2021.03

    Research paper, summary (national, other academic conference)  

  • CTCとマスク推定に基づく推論速度の速いEnd-to-End音声認識

    樋口陽祐, 稲熊寛文, 渡部晋治, 小川哲司, 小林哲則

    電子情報通信学会技術研究報告 (SP)    2020.12

    Research paper, summary (national, other academic conference)  

  • 分布類似度に基づく健全性指標と風車異常検知システムの早期運用における効果

    長谷川隆徳, 緒方淳, 飯田誠, 小川哲司

    第42回風力エネルギー利用シンポジウム予稿集    2020.11

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • Mentoring-Reverse Mentoring: 多チャンネル音源分離における教師なし学習のための知識伝搬フレームワーク

    中込優, 戸上真人, 小川哲司, 小林哲則

    日本音響学会講演論文集   2020 ( 秋季 ) 127 - 130  2020.09

    Research paper, summary (national, other academic conference)  

  • Mask CTC: CTCとマスク推定に基づいた非自己回帰的なEnd-to-End音声認識

    樋口陽祐, 渡部晋治, Chen Nanxin, 小川哲司, 小林哲則

    日本音響学会講演論文集   2020 ( 秋季 ) 747 - 748  2020.09

    Research paper, summary (national, other academic conference)  

  • 書き起こしのための遠方発話音声認識技術の検討

    佐藤裕明, 萩原愛子, 伊藤均, 三島剛, 河合吉彦, 小森智康, 佐藤庄衛, 小川哲司

    日本音響学会講演論文集   2020 ( 秋季 ) 841 - 842  2020.09

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • 感情に関するマルチラベルアノテーションにおける正解基準の設定

    田辺ひかり, 小川哲司, 小林哲則, 林良彦

    人工知能学会全国大会論文集   JSAI2020   1 - 4  2020.06

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングにおける効率的な回答収集のための動的なマイクロタスク追加発注

    森永聖也, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

    人工知能学会全国大会論文集   JSAI2020   1 - 4  2020.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 肉牛の発情検知のための乗駕行動画像データセット構築におけるクラウドソーシングの活用

    川野百合子, 斎藤奨, 中野鐵兵, 赤羽誠, 近藤育海, 山崎凌汰, 日下裕美, 坂口実, 小川哲司

    人工知能学会全国大会論文集   JSAI2020   1 - 4  2020.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ドローン空撮画像を用いた潮目の検知に関する検討

    幸加木裕也, 小林哲則, 小川哲司

    日本水産学会春季大会要旨    2020.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • End-to-end雑音除去のためのネットワーク構造の検討

    蓮実拓也, 小林哲則, 小川哲司

    日本音響学会講演論文集   2020 ( 春季 ) 335 - 336  2020.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 感情推定における感情カテゴリに関する先験的知識の利用

    田辺ひかり, 小川哲司, 小林哲則, 林良彦

    言語処理学会第26回年次大会発表論文集   P6-23  2020.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 局所的依存構造をSelf-Attentionにより考慮する翻訳文生成

    露木浩章, 小川哲司, 小林哲則, 林良彦

    言語処理学会第26回年次大会発表論文集   P1-7  2020.03

    Research paper, summary (national, other academic conference)  

  • 所望音源の方向アトラクターに基づく時変の空間フィルタを用いたDNN音声抽出

    中込優, 戸上真人, 小川哲司, 小林哲則

    日本音響学会講演論文集   2020 ( 春季 ) 305 - 308  2020.03

    Research paper, summary (national, other academic conference)  

  • 短発話を対象としたテキスト独立型話者認識のためのフレームレベル音素非依存特徴抽出

    俵直弘, 小川厚徳, 岩田具治, デラクロアマーク, 小川哲司

    日本音響学会講演論文集   2020 ( 春季 ) 997 - 998  2020.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • Attentionに関する損失を利用したノイズに頑健なEnd-to-End音声認識

    樋口陽祐, 俵直弘, 小川厚徳, 岩田具治, 小林 哲則, 小川哲司

    日本音響学会講演論文集   2020 ( 春季 ) 935 - 936  2020.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングにおける動的な回答収集による低コストな多数決手法

    森永聖也, 斎藤奨, 中野鐵兵, 小林哲則, 小川哲司

    情報処理学会研究報告 (HCI)   2019-HCI-186 ( 36 ) 1 - 6  2020.01

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • マルチチャネル音声強調のための時間領域畳み込みデノイジングオートエンコーダ

    俵直弘, 小林哲則, 小川哲司

    電子情報通信学会技術研究報告(SP)    2019.12

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • あらゆる風車に適用可能な状態監視技術を目指して~風車主要機器におけるデータ駆動型異常検知とその評価~

    長谷川隆徳, 緒方淳, 村川正宏, 飯田誠, 小川哲司

    第41回風力エネルギー利用シンポジウム    2019.12

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 映像情報を用いた繁殖牛分娩検知システムの構築と運用法に関する研究・開発(自然に挑む画像センシング技術~農林水産業の現場でいかに役立つか?~)

    小川哲司, 斎藤奨, 中野鐵兵

    OplusE   41 ( 6 ) 858 - 862  2019.11  [Invited]

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • 画像情報による黒毛和牛種の乗駕行動の検知に関する検討

    川野百合子, 河田宗士, 沖本祐典, 中野鐵兵, 赤羽誠, 近藤育海, 山崎凌汰, 日下裕美, 坂口実, 小川哲司

    日本畜産学会 第126回大会要旨     IV-19-03  2019.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 双方向時系列情報を利用した検出結果と正解情報付与による効率的なアノテーション手法

    真殿航輝, 中野鐵兵, 小林哲則, 小川哲司

    第22回画像の認識・理解シンポジウム   PS2-5   1 - 4  2019.08

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 学習可能な暗号化画像への敵対的学習に基づく攻撃

    真殿航輝, 田中正行, 大西正輝, 小川哲司

    第22回画像の認識・理解シンポジウム   PS1-41   1 - 4  2019.08

    Authorship:Last author

    Research paper, summary (national, other academic conference)  

  • DPGMMと敵対的学習に基づく話者の違いに頑健な特徴抽出とゼロリソース音声認識での評価

    樋口陽祐, 俵直弘, 小林哲則, 小川哲司

    情報処理学会研究報告 (SLP)   2019-SLP-128 ( 6 ) 1 - 6  2019.07

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 画像から得られる牛の身体情報に基づく分娩予兆検知

    兵頭亮介, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    人工知能学会全国大会論文集   JSAI2019  2019.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 肉牛の分娩検知システムにおけるクラウドソーシングを用いた誤通報の抑制

    沖本裕典, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    人工知能学会全国大会論文集   JSAI2019  2019.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ベイズ状態空間モデルを用いた定置網漁のための日単位漁獲量予測

    幸加木裕也, 堀内優佳, 俵直弘, 福嶋正義, 井戸上彰, 橋本和夫, 小林哲則, 小川哲司

    人工知能学会全国大会論文集   JSAI2019  2019.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 回転機器状態監視のための振動異常検知システムにおける特徴表現学習

    長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

    人工知能学会全国大会論文集   JSAI2019  2019.06

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 動画像から得られる牛の身体情報に基づく分娩予兆検知システム

    兵頭亮介, 菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    電子情報通信学会技術研究報告(PRMU)   119 ( 64 ) 1 - 6  2019.05

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • クラウドソーシングを用いた肉牛分娩開始検知システムの早期運用

    沖本裕典, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    電子情報通信学会技術研究報告(PRMU)   119 ( 64 ) 7 - 12  2019.05

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 網内の魚の振る舞いを表現した状態空間モデルによる定置網漁のための日単位漁獲量予測

    幸加木裕也, 俵直弘, 橋本和夫, 小林哲則, 福嶋正義, 井戸上彰, 小川哲司

    電子情報通信学会技術研究報告(PRMU)   119 ( 64 ) 13 - 18  2019.05

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 早稲田大学知覚情報システム・メディアインテリジェンス研究室紹介

    長谷川隆徳, 黒澤郁音, 斎藤奨, 松山洋一, 林良彦, 小林哲則, 小川哲司

    日本風力エネルギー学会誌   43 ( 1 ) 154 - 157  2019.05

    Authorship:Last author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • クエリ文によるゼロショット映像検索 – TRECVID 2018 AVSタスクの成果報告 –

    植木一也, 中込優, 平川幸司, 菊池康太郎, 林良彦, 小川哲司, 小林哲則

    動的画像処理実用化ワークショップ2019 (DIA2019)    2019.03

    Research paper, summary (national, other academic conference)  

  • 漁獲量における心理尺度と漁獲量予測器の最適化への利用

    幸加木裕也, 福嶋正義, 井戸上彰, 橋本和夫, 小林哲則, 小川哲司

    日本水産学会春季大会要旨     140  2019.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 画像情報による黒毛和牛種の状態識別に基づいた分娩予兆検知システム

    兵頭亮介, 安田早希, 斎藤奨, 菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    日本畜産学会 第125回大会要旨     XIII-29-10  2019.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 映像情報による肉牛の分娩検知システムにおけるクラウドソーシングを用いた誤検出抑制

    沖本祐典, 斎藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    日本畜産学会 第125回大会要旨     XIII-29-09  2019.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 音韻・話者特徴抽出のためのディスエンタングリングニューラルネットワークの実現にむけて

    俵直弘, 小林哲則, 小川哲司

    日本音響学会講演論文集   2019 ( 春季 ) 1003 - 1004  2019.03

  • ドメイン属性情報を用いたRNN言語モデルのドメイン汎化

    芦川博人, 森岡幹, 俵直弘, 小川厚徳, 岩田具治, 小川哲司, 小林哲則

    日本音響学会講演論文集   2019 ( 春季 ) 927 - 930  2019.03

  • ゼロリソース言語音声認識のための発話者の違いに頑健な特徴抽出

    樋口陽祐, 俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2019 ( 春季 ) 923 - 924  2019.03

  • noise-aware学習を用いた敵対的デノイジングオートエンコーダによるポストフィルタリング

    俵直弘, 田辺ひかり, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

    日本音響学会講演論文集   2019 ( 春季 ) 159 - 162  2019.03

  • 隣接単語系列の予測による文の分散表現構成法

    露木浩章, 小川哲司, 小林哲則, 林良彦

    言語処理学会第25回年次大会発表論文集     1479 - 1482  2019.03

  • 敵対的デノイジングオートエンコーダを用いた拡散性雑音除去

    田辺ひかり, 俵直弘, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

    電子情報通信学会技術研究報告(SP)   118 ( 497 ) 155 - 160  2019.03

  • 隣接単語系列の予測による汎用的な文の分散表現の構成

    露木浩章, 小川哲司, 小林哲則, 林良彦

    言語処理学会年次大会発表論文集(Web)   25th  2019

  • 畳み込みニューラルネットワークに基づく風車異常検知システムにおける判断根拠の可視化に関する検討

    佐伯真於, 緒方淳, 村川正宏, 小川哲司

    第40回風力エネルギー利用シンポジウム予稿集    2018.12

  • 正常稼働状態の表現学習に基づく風車異常検知

    長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

    第40回風力エネルギー利用シンポジウム予稿集    2018.12

  • RNN言語モデルのためのドメイン属性情報を用いたゼロショット学習

    芦川博人, 森岡幹, 俵直弘, 小川厚徳, 岩田具治, 小川哲司, 小林哲則

    情報処理学会研究報告    2018.12

  • 映像からの牛の分娩予兆行動検知に関する検討

    菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    電子情報通信学会技術研究報告 (PRMU)   118 ( 362 ) 79 - 84  2018.12

  • 画像からの牛の状態識別に基づく分娩予兆検知

    兵頭亮介, 安田早希, 斎藤奨, 沖本裕典, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    電子情報通信学会技術研究報告 (PRMU)   118 ( 362 ) 57 - 60  2018.12

  • Waseda_Meisei at TRECVID 2018: Fully-automatic ad-hoc video search

    Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

    Notebook paper of the TRECVID 2018 Workshop    2018.11

    Research paper, summary (international conference)  

  • Waseda Meisei at TRECVID2018: Ad-hoc video search

    Kazuya Ueki, Yu Nakagome, Koji Hirakawa, Kotaro Kikuchi, Yoshihiko Hayashi, Tetsuji Ogawa, Tetsunori Kobayashi

    Notebook paper of the TRECVID 2018 Workshop    2018.11

    Research paper, summary (international conference)  

  • 定置網漁の日単位漁獲量予測モデリングにおける学習データ量と予測性能の関係の調査

    堀内優佳, 幸加木裕也, 小林哲則, 小川哲司

    日本水産学会秋季大会要旨    2018.09

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 敵対的デノイジングオートエンコーダによる非線形ひずみ除去フィルタリング

    俵直弘, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

    日本音響学会講演論文集   2018 ( 秋季 ) 159 - 162  2018.09

  • 非線形ひずみ除去のための敵対的 denoising autoencoder

    俵直弘, 小林哲則, 藤枝大, 片桐一浩, 矢頭隆, 小川哲司

    情報処理学会研究報告   2018-SLP-123 ( 1 ) 1 - 7  2018.07

  • 牛の分娩予兆として映像から観測可能な状態の検知

    沖本祐典, 菅原一真, 齊藤奨, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    人工知能学会全国大会論文集   JSAI2018  2018.06

  • AIで風車の異常を見つける:データ駆動型アプローチによる異常検知の最新動向

    長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

    日本風力エネルギー学会誌   42 ( 1 ) 72 - 76  2018.05  [Invited]

    Authorship:Last author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • 定置網漁における漁獲過程モデルを用いたシロサケの日単位漁獲量予測

    幸加木裕也, 俵直弘, 小林哲則, 橋本和夫, 小川哲司

    日本水産学会春季大会要旨    2018.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 映像情報を用いた分娩時の牛の状態推定

    沖本祐典, 菅原一真, 中野鐵兵, 赤羽誠, 小林哲則, 小川哲司

    日本畜産学会 第124回大会要旨    2018.03

    Authorship:Last author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 敵対的マルチタスク学習を用いた話者の違いに頑健な特徴抽出とゼロリソース音素識別による評価

    土屋平, 俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2018 ( 春季 ) 9 - 12  2018.03

  • 話者正規化における言語非依存性とゼロリソース音声認識における効果

    島田拓也, 俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2018 ( 春季 ) 109 - 112  2018.03

  • 敵対的学習に基づく話者特徴抽出

    俵直弘, 土屋平, 小川哲司, 小林哲則

    日本音響学会講演論文集   2018 ( 春季 ) 141 - 144  2018.03

  • 異種データ活用のための変換複合行列分解

    土屋平, 岩田具治, 小川哲司

    電子情報通信学会技術研究報告 (IBISML)   117 ( 475 ) 41 - 48  2018.03

  • 正常・損傷の表現学習に基づく風力発電システム異常検知技術の高度化

    長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

    第39回風力エネルギー利用シンポジウム     371 - 374  2017.12

  • Waseda Meisei at TRECVID2017: Ad-hoc video search

    Kazuya Ueki, Koji Hirakawa, Kotaro Kikuchi, Tetsuji Ogawa, Tetsunori Kobayashi

    Notebook paper of the TRECVID 2017 Workshop    2017.11

    Research paper, summary (international conference)  

  • 正常・損傷の表現学習に基づく機械振動異常検知

    長谷川隆徳, 緒方淳, 村川正宏, 小川哲司

    第16回評価・診断に関するシンポジウム講演論文集     5 - 10  2017.11

  • 複数人対話を対象としたRNN言語モデルにおける発話終端情報利用の有効性

    芦川博人, 俵直弘, 小川厚徳, 岩田具治, 小林哲則, 小川哲司

    日本音響学会講演論文集   2017 ( 秋季 ) 23 - 26  2017.09

  • ドメイン依存・非依存の内部表現を有する再帰型ニューラルネットワーク言語モデル

    森岡幹, 俵直弘, 小川哲司, 小川厚徳, 岩田具治, 小林哲則

    日本音響学会講演論文集   2017 ( 秋季 ) 27 - 30  2017.09

  • 会話参加状態を考慮した振る舞いをするロボットのシステムアーキテクチャ

    菅原一真, 浅野秀平, 赤川優斗, 藤江真也, 小川哲司, 小林哲則

    人工知能学会全国大会論文集   JSAI2017  2017.06

  • 国際会議INTERSPEECH2016参加報告

    浅見太一, 小川厚徳, 小川哲司, 大谷大和, 倉田岳人, 齋藤大輔, 塩田さやか, 篠原雄介, 鈴木雅之, 高道慎之介, 南條浩輝, 橋本佳, 樋口卓哉, 増村亮, 吉野幸一郎, 渡部晋治

    情報処理学会研究報告 (SLP)   vol.2016-SLP-115 ( 7 ) 1 - 7  2017.02

    Research paper, summary (national, other academic conference)  

  • 少量データに頑健なニューラルネットワーク言語モデル

    森岡幹, 岩田具治, 小川厚徳, 俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2016 ( 秋季 ) 89 - 92  2016.09

  • 複数人対話のための話者情報を用いたRNN言語モデル

    芦川博人, 森岡幹, 小川厚徳, 岩田具治, 俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2016 ( 秋季 ) 85 - 88  2016.09

  • 深層学習を用いた出現音素の偏りに頑健な話者照合手法

    佐藤洋輔, 小川哲司, 堀内靖雄, 黒岩眞吾

    電子情報通信学会総合大会講演論文集    2016.03

    Research paper, summary (national, other academic conference)  

  • 連想記憶に基づく線形分離行列推定を用いたタンデム接続型音源分離

    大町基, 小川哲司, 小林哲則, 藤枝大, 片桐一浩

    日本音響学会講演論文集   2016 ( 春季 ) 21 - 24  2016.03

  • 高次相関を考慮した音響特徴量のDNNに基づく音声認識での利用

    小川哲司, 小林哲則, 新田恒雄

    日本音響学会講演論文集   2016 ( 春季 ) 161 - 162  2016.03

    Authorship:Lead author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • ニューラルネットワークに基づく識別器の不確かさの推定とマルチストリーム音声認識への適用

    小川哲司, Mallidi Harish, Vesely Karel, Hermansky Hynek

    日本音響学会講演論文集   2016 ( 春季 ) 67 - 70  2016.03

    Authorship:Lead author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 国際会議INTERSPEECH2015参加報告

    浅見太一, 大谷大和, 小川哲司, 木下慶介, 倉田岳人, 齋藤大輔, 塩田さやか, 太刀岡勇気, 中村静, 増村亮, 渡部晋治

    情報処理学会研究報告   2016-SLP-110 ( 4 ) 1 - 5  2016.02

  • スペクトラルクラスタリングに基づく話者クラスタリングのための因子分析法の効果の検証

    俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2015 ( 秋季 ) 173 - 174  2015.09

  • 連想記憶に基づくブラインド音源分離のエコーキャンセリングへの応用

    大町基, 小川哲司, 小林哲則, 藤枝大, 片桐一浩

    日本音響学会講演論文集   2015 ( 秋季 ) 593 - 596  2015.09

  • 複数の文脈長を考慮したリカレントニューラルネットワークに基づく言語モデル

    森岡幹, 俵直弘, 小川哲司, 岩田具治, 小川厚徳, 堀貴明, 小林哲則

    日本音響学会講演論文集   2015 ( 秋季 ) 17 - 20  2015.09

  • 国際会議ICASSP2015参加報告

    岡本拓磨, 小川哲司, 落合翼, 柏木陽佑, 亀岡弘和, 木下慶介, 郡山知樹, 齋藤大輔, 篠崎隆宏, 高木信二, 滝口哲也, 太刀岡勇気, 俵直弘, 橋本佳, 藤本雅清, 松田繁樹, 三村正人, 吉岡拓也, 渡部晋治

    情報処理学会研究報告   2015-SLP-107 ( 3 ) 1 - 7  2015.07

  • テンソル積による基底変換に基づく音声認識に関する研究

    上田賢次郎, 小川哲司, 小林哲則, 桂田浩一, 新田恒雄

    日本音響学会講演論文集   2015 ( 春季 ) 7 - 10  2015.03

  • 国際会議INTERSPEECH2014,SLT2014参加報告

    浅見太一, 岩野公司, 小川哲司, 駒谷和範, 齋藤大輔, 篠田浩一, 太刀岡勇気, 東中竜一郎, 福田隆, 増村亮, 渡部晋治

    情報処理学会研究報告   2015-SLP-105 ( 7 ) 1 - 6  2015.02

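    We attended INTERSPEECH 2014, organized by ISCA and held in Singapore from September 14 to 18, 2014, and SLT 2014, organized by IEEE and held in Lake Tahoe, USA, from December 14 to 18 of the same year. Both are top-tier international conferences in spoken language processing. Focusing on presentations from outside Japan, this report covers the latest technical trends and the notable presentations at these conferences.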

  • i-vectorを用いたスペクトラルクラスタリングによる雑音環境下話者クラスタリング

    俵直弘, 小川哲司, 小林哲則

    情報処理学会研究報告   2015-SLP-105 ( 11 ) 1 - 6  2015.02

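    We realize noise-robust speaker clustering by combining i-vector speaker representations with spectral clustering. First, we show experimentally that when speaker clustering is applied to noisy speech, speaker similarity cannot be estimated properly even if inter-utterance similarity is computed with i-vectors, which are known as highly accurate speaker features. Next, we confirm the validity of applying spectral clustering to this problem by analyzing the eigenvectors of the graph Laplacian. Finally, to confirm the noise robustness of spectral clustering experimentally, we conduct speaker clustering experiments on speech obtained by adding various types of noise to the Corpus of Spontaneous Japanese, and show that noisy speech can be clustered with accuracy comparable to that of clean speech.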

  • 連想記憶と線形分離フィルタを用いたブラインド音源分離

    大町基, 小川哲司, 小林哲則, 藤枝大, 片桐一浩

    情報処理学会研究報告   2015-SLP-105 ( 4 ) 1 - 6  2015.02

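    We propose a blind source separation method with low distortion and high accuracy that combines associative memory with a linear separation filter. Source separation based on linear filters, such as independent component analysis (ICA) and independent vector analysis (IVA), has the advantage of low distortion. However, ICA and IVA assume that the sources are independent and non-Gaussian, and separation performance degrades when these assumptions do not hold. The proposed method obtains the separated speech by alternating two steps: using associative memory to find the distortion-free speech closest to the output of the linear separation filter, and correcting the filter coefficients so that the filter output approaches the associative memory output. This yields separated speech with little distortion without assuming source independence. In source separation experiments on simultaneous speech from two speakers, the proposed method improved separation accuracy over IVA.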

  • スペクトラルクラスタリングに基づく話者クラスタリング

    俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2014 ( 秋季 ) 95 - 98  2014.09

  • A study on MLP-based speaker canonicalization

    IPSJ SIG Notes   2014-SLP-102 ( 8 ) 1 - 6  2014.07

    Accurate and efficient speaker canonicalization is proposed to improve the performance of speaker-independent ASR systems. Vocal tract length normalization (VTLN) is often applied to speaker canonicalization in ASR; however, it requires parallel decoding of speech when estimating the optimal warping parameter. In addition, VTLN applies the same linear spectral transformation throughout an utterance, although the optimal mapping functions differ among phonemes. In this study, we propose a novel speaker canonicalization method using a multilayer perceptron (MLP) that is trained on a data set of vowels to map an input spectrum to the spectrum of a standard (canonical) speaker. The proposed method integrates MLP-based mapping with identity mapping, depending on the frequency band, and achieves accurate recognition without any tuning of the mapping function at run time. Experiments on a continuous digit recognition task showed that the proposed method reduces the intra-class variability in both the vowel and consonant parts and outperforms VTLN.

  • Speaker recognition using i-vector

    Ogawa Tetsuji, Shiota Sayaka

    THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN   70 ( 6 ) 332 - 339  2014.06  [Invited]

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • 標準話者母音スペクトルへの変換に基づく話者正準化

    久保田雄一, 大町基, 小川哲司, 小林哲則, 新田恒雄

    日本音響学会講演論文集   2014 ( 春季 ) 77 - 78  2014.03

  • 因子分析モデルに基づく話者照合の環境変動に対する頑健性の調査

    福地佑介, 俵直弘, 小川哲司, 小林哲則

    日本音響学会講演論文集   2013 ( 秋季 ) 75 - 78  2013.09

  • Machine learning for speaker recognition

    OGAWA Tetsuji, MATSUI Tomoko

    The Journal of the Acoustical Society of Japan   69 ( 7 ) 349 - 356  2013.07

  • 効率的なサンプリング手法を用いた話者モデリング

    俵直弘, 小川哲司, 渡部晋治, 中村篤, 小林哲則

    情報処理学会研究報告   2013-SLP-97 ( 2 ) 1 - 8  2013.07

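    We propose an efficient sampling method for estimating a multi-scale mixture model. A multi-scale mixture model is a mixture model whose component distributions are themselves mixture distributions; in this paper we deal with a model that uses Gaussian mixture models (GMMs) as the components. When this model is applied to a set of speech data uttered by multiple speakers, intra-speaker variability, which is observed on a relatively short scale of a few tens of frames such as an utterance, is represented by each component GMM. In contrast, inter-speaker variability, which lies between utterances of different speakers and is observed on a relatively long scale, is represented by the multi-scale mixture model as a whole. For estimating the model structure of such a complex, hierarchically structured distribution, a model estimation framework based on a stochastic approach such as the Markov chain Monte Carlo (MCMC) method is effective. However, when a simple MCMC method such as Gibbs sampling is applied directly, the long-time-scale and short-time-scale structures, which should be hierarchical, are sampled on an equal footing, so the iterative model estimation easily falls into local optima. In this study, we therefore propose a sampling method that can take the hierarchical structure of the model into account by introducing a technique akin to blocked Gibbs sampling. We also show that introducing the iterated conditional modes (ICM) algorithm, which replaces part of the sampling process with a deterministic procedure, avoids pathological solutions in which all distributions collapse into a single one. Speaker clustering experiments on an evaluation set with superimposed non-stationary noise showed that structure estimation based on the proposed sampling method clusters more accurately than structure estimation based on conventional sampling methods or variational Bayes.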

  • 話者認識で用いる機械学習

    小川哲司, 松井知子

    日本音響学会誌   69 ( 7 ) 349 - 356  2013.07  [Invited]

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • 指向性を付与したマルチチャネルウィーナフィルタを前段に持つ音源分離方式の検討

    大町基, 小川哲司, 赤桐健三, 小林哲則

    日本音響学会講演論文集   2013 ( 春季 ) 937 - 940  2013.03

  • 性能モニタリングに基づく多層パーセプトロンの適応的選択による雑音に頑健なマルチストリーム音声認識

    小川哲司, Li Feipeng, Hermansky Hynek

    日本音響学会講演論文集   2013 ( 春季 ) 167 - 170  2013.03

  • Current situations and issues of speaker recognition technologies

    AMINO Kanae, ISHIHARA Shunichi, OGAWA Tetsuji, OSANAI Takashi, KUROIWA Shingo, KOSHINAKA Takafumi, SHINODA Koichi, TSUGE Satoru, NISHIDA Masafumi, MATSUI Tomoko, WANG Longbiao

    IEICE technical report. Speech   112 ( 450 ) 63 - 70  2013.02  [Invited]

    Speaker recognition, which identifies who is speaking from his or her voice, has been studied for 30 years. As security measures have grown in importance, speaker recognition research has recently been booming. In this article, we survey the present state of speaker recognition research and discuss its open problems. In particular, we focus on worldwide trends, machine learning approaches, robustness to varied environments, and forensic applications.

  • New Speech Research Paradigm in the Cloud Era

      2012-SLP-92 ( 4 ) 1 - 7  2012.07

  • 発話単位DPMMを用いたフルベイズ話者クラスタリングと大規模データによる評価

    俵直弘, 小川哲司, 渡部晋治, 中村篤, 小林哲則

    日本音響学会講演論文集   2012 ( 春季 ) 207 - 210  2012.03

  • 話者照合における因子分析に基づく特徴抽出に関する評価

    小川哲司, 小林哲則

    日本音響学会講演論文集   2012 ( 春季 ) 197 - 198  2012.03

  • Fully Bayesian speaker clustering based on hierarchical structured Dirichlet process mixture model

    TAWARA Naohiro, OGAWA Tetsuji, WATANABE Shinji, NAKAMURA Atsushi, KOBAYASHI Tetsunori

      111 ( 480 ) 21 - 28  2012.03

    We propose a novel speaker clustering method that estimates the structure of a fully Bayesian, hierarchically structured generative model of utterances. We define the hierarchical generative model as a mixture of GMMs, each of which represents one speaker's distribution. Because exact estimation of this model is infeasible, we estimate it approximately with a sampling method. Speaker clustering experiments showed that the proposed method is effective for data in which the number of utterances varies from speaker to speaker, whereas the conventional method suffers significant degradation in clustering accuracy on such data.

  • 多重混合ガウス分布モデルにおけるフルベイズモデル推定手法の検討と話者クラスタリングによる評価

    俵直弘, 渡部晋治, 小川哲司, 小林哲則

    日本音響学会講演論文集   2011 ( 秋季 ) 175 - 178  2011.09

  • Modified LSD 最小化に基づく空間フィルタキャリブレーション

    田中信秋, 小川哲司, 小林哲則

    日本音響学会講演論文集   2011 ( 秋季 ) 33 - 36  2011.09

  • クラス内変動に頑健なカーネルマシンと話者照合への適用

    小川哲司, 日野英逸, 村田昇, 小林哲則

    日本音響学会講演論文集   2011 ( 秋季 ) 183 - 186  2011.09

    Authorship:Lead author, Corresponding author

    Research paper, summary (national, other academic conference)  

  • 発話を単位としたディリクレ過程混合モデルに基づく話者クラスタリング

    俵直弘, 渡部晋治, 小川哲司, 小林哲則

    日本音響学会講演論文集   2011 ( 春季 ) 41 - 44  2011.03

  • Investigation on optimization in speaker recognition using multiple kernel learning

    IEICE technical report   110 ( 357 ) 153 - 158  2010.12

  • マルチカーネル学習を用いた話者認識における最適化の検討

    小川哲司, 日野英逸, Nima Reyhani, 村田昇, 小林哲則

    情報処理学会研究報告   2010-SLP-84 ( 27 ) 1 - 6  2010.12

  • Toward Developing Practical Automatic Speech Recognition Technology : Sound Source Separation Using Square Microphone Array

    Takashi Yazu, Makoto Morito, Kei Yamada, Tetsuji Ogawa

    IPSJ Magazine   51 ( 11 ) 1410 - 1416  2010.11

    Authorship:Last author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • シャッタが切り取る世界(ちょっとしたエッセイ)

    小川哲司

    日本音響学会誌   66 ( 10 ) 528 - 528  2010.10

    Authorship:Lead author, Corresponding author

    Article, review, commentary, editorial, etc. (scientific journal)  

  • 情報論的な最適化に基づくマルチカーネル学習を用いた話者認識

    小川哲司, 日野英逸, Nima Reyhani, 村田昇, 小林哲則

    日本音響学会講演論文集   2010 ( 秋季 ) 81 - 84  2010.09

  • CENSREC-1-AV: An evaluation framework for multimodal speech recognition

    Satoshi Tamura, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda, Takeshi Yamada, Tetsuya Takiguchi, Satoru Tsuge, Kazumasa Yamamoto, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Shigeki Matsuda, Tetsuji Ogawa, Shingo Kuroiwa, Satoshi Nakamura

    IPSJ SIG Notes   2010 ( 7 ) 1 - 6  2010.07

    This paper introduces CENSREC-1-AV, an evaluation framework for multimodal speech recognition. The CENSREC-1-AV corpus provides an audiovisual speech database and a baseline multimodal speech recognition system. Speech signals were recorded in clean conditions for training, and in-car noise was added for testing. Color and infrared images were captured as training data, and the test images were corrupted using gamma correction. In the baseline system, acoustic MFCCs are used as audio features, eigenface or optical-flow information is used as visual features, and multi-stream HMMs are used as the recognition model.

  • 雑音下マルチモーダル音声認識評価基盤CENSREC-1-AVの構築

    田村哲嗣, 宮島千代美, 北岡教英, 武田一哉, 山田武志, 滝口哲也, 柘植覚, 山本一公, 西浦敬信, 中山雅人, 傳田遊亀, 藤本雅清, 松田繁樹, 小川哲司, 黒岩眞吾, 中村哲

    情報処理学会研究報告   2010-SLP-82 ( 7 ) 1 - 6  2010.07

  • CENSREC-1-AV: マルチモーダル音声認識コーパスの構築

    田村哲嗣, 宮島千代美, 北岡教英, 武田一哉, 山田武志, 滝口哲也, 柘植覚, 山本一公, 西浦敬信, 中山雅人, 傳田遊亀, 藤本雅清, 松田繁樹, 小川哲司, 黒岩眞吾, 中村哲

    日本音響学会講演論文集   2010 ( 春季 ) 219 - 220  2010.03

  • Conversation Robot and Its Audition System

    FUJIE Shinya, OGAWA Tetsuji, KOBAYASHI Tetsunori

    JRSJ   28 ( 1 ) 23 - 26  2010.01

    Article, review, commentary, editorial, etc. (scientific journal)  

  • ロボット頭頂部に設置した小型正方形マイクロホンアレイによる音源定位

    細谷耕佑, 小川哲司, 小林哲則

    日本音響学会講演論文集   2009 ( 秋季 ) 775 - 778  2009.09

    Research paper, summary (national, other academic conference)  

  • 音声認識利用者の発声方法誘導を行うエキスパートシステムの実装と評価

    網田康裕, 中野鐵兵, 小川哲司, 菊池英明, 小林哲則

    日本音響学会講演論文集   2009 ( 秋季 ) 229 - 230  2009.09

  • ゾーン強調型ビームフォーマの構築

    田中信秋, 細谷耕佑, 小川哲司, 小林哲則

    日本音響学会講演論文集   2009 ( 秋季 ) 153 - 154  2009.09

  • ロンバード発声音声コーパスの設計と評価

    小川哲司, 川野弘, 西浦敬信, 山田武志, 北岡教英, 小林哲則

    日本音響学会講演論文集   2009 ( 秋季 ) 141 - 144  2009.09

  • 連続円動作の認識に基づくメニュー項目の選択法

    橋口拓弥, 藤江真也, 小川哲司, 中野鐵兵, 小林哲則

    画像の理解・認識シンポジウム(MIRU2009)予稿集   IS3-70   1846 - 1850  2009.07

  • 騒音下音声認識システム評価におけるロンバード効果の影響の検証−ロンバード発声適応モデルを用いた評価−

    小川哲司, 小林哲則

    日本音響学会講演論文集   2009 ( 春季 ) 175 - 176  2009.03

  • Hands-free speech recognition system for robot

    HOSOYA Kosuke, OGAWA Tetsuji, FUJIE Shinya, WATANABE Daichi, ICHIKAWA Yuhi, TANIYAMA Hikaru, KOBAYASHI Tetsunori

    IPSJ SIG Notes   2008-SLP-74 ( 123 ) 7 - 12  2008.12

    A new noise reduction method suitable for autonomous mobile robots is proposed and applied as the pre-processing stage of a hands-free spoken dialogue system. Using small, lightweight devices and low-computational-cost algorithms, the proposed method reduces the various kinds of noise that are mixed with the target speech when people talk with the robot, such as directional noise, diffuse noise, noise from the robot's own movement, and the robot's own speech. Here, we assume that the person talking with the robot is in front of it, and thus the proposed method aims to extract speech signals arriving from the robot's frontal direction. In addition, when the person moves away from the front of the robot, the sound source can be localized by face detection and tracking on images from a camera mounted on the robot's eyes. By exploiting the robot's ability to combine speech information with image information, the various kinds of noise can be reduced in real time, and thus the hands-free spoken dialogue system can work well in real environments.

  • Progress Report of SLP Noisy Speech Recognition Evaluation WG : Individual evaluation framework for each factor affecting recognition performance (3)

    KITAOKA Norihide, YAMADA Takeshi, TAKIGUCHI Tetsuya, TSUGE Satoru, YAMAMOTO Kazumasa, MIYAJIMA Chiyomi, NISHIURA Takanobu, NAKAYAMA Masato, DENDA Yuki, FUJIMOTO Masakiyo, TAMURA Satoshi, MATSUDA Shigeki, OGAWA Tetsuji, KUROIWA Shingo, TAKEDA Kazuya, NAKAMURA Satoshi

    IPSJ SIG Notes   2008-SLP-73 ( 102 ) 41 - 46  2008.10

    We organized a working group under the Special Interest Group of Spoken Language Processing in the Information Processing Society of Japan and have developed evaluation frameworks for noisy speech recognition (the CENSREC series), with which researchers can evaluate their own noise-robust speech recognition methods and compare them with others. In this report, we introduce the series, review the history of noisy speech recognition research at ASJ and ICASSP, and consider the role of our work in that history. Finally, we discuss future directions.

  • Dimensionality Reduction in Rescoring Using Likelihood Patterns Given by HMMs

    OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report   108 ( 142 ) 73 - 78  2008.07

    We investigate dimensionality reduction of the feature vectors used in rescoring, where likelihood patterns given by HMMs with long-time structures serve as feature parameters. The likelihood patterns calculated for word utterances using word-wise statistical models are discriminative even when those utterances belong to different word classes consisting of similar phonemes. In rescoring with likelihood feature vectors with long-time structures, this characteristic can reduce errors for classes that are difficult to classify by conventional ML classification. However, because the dimensionality of the likelihood feature vectors equals the vocabulary size, the method is not feasible for large-vocabulary tasks. Thus, in the present paper, we attempt to reduce the dimensionality of the feature vectors by selecting from the vocabulary only the word classes that contribute to classification and using only the likelihoods for those word classes as feature parameters. When static pattern recognition on the feature space constructed from the likelihood feature vectors was applied to rescoring in a word recognition system, the proposed dimensionality reduction did not degrade performance considerably compared with the system without dimensionality reduction, and it improved performance compared with conventional HMMs.

  • HMM における尤度パターンの非対称性を利用した音声認識

    加藤健一, 小川哲司, 小林哲則

    日本音響学会講演論文集   2008 ( 春季 ) 209 - 212  2008.03

  • ロボット頭部に設置した4系統小型無指向性マイクロホンによるハンズフリー音声認識

    竹内寛史, 高田晋太郎, 小川哲司, 赤桐健三, 小林哲則, 森戸誠

    日本音響学会講演論文集   2008 ( 春季 ) 155 - 158  2008.03

  • 残響下音声認識評価基盤(CENSREC-4)の構築

    西浦敬信, 中山雅人, 傳田遊亀, 北岡教英, 山本一公, 山田武志, 藤本雅清, 柘植覚, 宮島千代美, 滝口哲也, 田村哲嗣, 小川哲司, 松田繁樹, 黒岩眞吾, 武田一哉, 中村哲

    日本音響学会講演論文集   2008 ( 春季 ) 175 - 178  2008.03

  • 雑音下音声認識評価ワーキンググループ活動報告:認識に影響する要因の個別評価環境(2)

    北岡教英, 山田武志, 滝口哲也, 柘植覚, 山本一公, 宮島千代美, 西浦敬信, 中山雅人, 傳田遊亀, 藤本雅清, 田村哲嗣, 松田繁樹, 小川哲司, 黒岩眞吾, 武田一哉, 中村哲

    情報処理学会研究報告   2007-SLP-69   1 - 6  2007.12

    CiNii

  • 指向性雑音と拡散性雑音の混在する環境を対象とした携帯端末向け音声強調の検討

    高田晋太郎, 小川哲司, 赤桐健三, 小林哲則

    日本音響学会講演論文集   2007 ( 秋季 ) 743 - 746  2007.09

  • テンプレート群からの確率的距離を用いた階層的音声認識の検討

    加藤健一, 小川哲司, 小林哲則

    日本音響学会講演論文集   2007 ( 秋季 ) 147 - 150  2007.09

  • シミュレーションに基づく騒音下音声認識システム評価におけるロンバード効果の影響の検証−複数の認識タスク,騒音レベルに対する評価−

    小川哲司, 倉持公壮, 小林哲則

    日本音響学会講演論文集   2007 ( 秋季 ) 195 - 198  2007.09

  • Hierarchical Spoken Word Recognition System Using Probabilistic Distances from a Group of Templates with Long-Time Structures

    KATO Ken-ichi, OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report   107 ( 116 ) 79 - 84  2007.06

    We propose a hierarchical spoken word recognition method that calculates probabilistic distances from a group of templates with relatively long-time structures at the first stage and applies static pattern recognition using those probabilistic distances as feature vectors at the second stage. Almost all conventional speech recognizers treat the time series of spectral parameters as feature vectors and prepare a statistical model for each category; the category with the highest likelihood is taken as the category of the input data. Here, the likelihood of each category depends on the quantity or quality of the training data and on the structure of the statistical models, which leads to classifier-specific trends in recognition errors. The probabilistic distances from templates are stable if word probability models are selected as the templates. In the present paper, we show that hierarchical spoken word recognition using probabilistic distances from word templates as feature vectors can reduce errors even when the likelihood of the correct category is not the highest. In a spoken word recognition experiment, the proposed method reduced errors by 79% compared with the conventional HMM-based speech recognition method.

  • 重み付きHLDA を用いた相補的識別器の構成

    加藤健一, 小川哲司, 小林哲則

    日本音響学会講演論文集   2007 ( 春季 ) 39 - 40  2007.03

  • 空間フィルタとポストフィルタを用いた背景雑音抑圧

    高田晋太郎, 小川哲司, 赤桐健三, 小林哲則

    日本音響学会講演論文集   2007 ( 春季 ) 575 - 576  2007.03

  • プロキシエージェントアーキテクチャによる音声認識アプリケーション用ユーザモニタリング機能の効率化

    中野鐵兵, 梅本暁, 藤江真也, 小川哲司, 小林哲則

    情報処理学会研究報告 (SLP)   2006-SLP-65   23 - 28  2007.02

    Research paper, summary (national, other academic conference)  

  • Combining Complementary Classifiers generated by Boosting in Feature Transformation

    KATO Ken-ichi, OGAWA Tetsuji, KOBAYASHI Tetsunori

      106 ( 442 ) 25 - 30  2006.12

  • Combining Complementary Classifiers generated by Boosting in Feature Transformation

    KATO Ken-ichi, OGAWA Tetsuji, KOBAYASHI Tetsunori

    IPSJ SIG Notes   2006 ( 136 (SLP-64) ) 203 - 208  2006.12

    A framework for system combination using boosting in feature transformation is proposed. In general, combining multiple classifiers improves on the classification performance of each individual classifier. However, there are two important issues in such system combination. First, the classification performance is not necessarily improved if the classifiers are not complementary. Second, an inappropriate combination degrades the performance even if complementary classifiers are obtained. In this paper, we address both how to generate and how to combine complementary classifiers. To generate the complementary classifiers, boosting was applied to HLDA-based feature transformation. At the combination stage, pattern recognition was performed with a support vector machine, using the pair of likelihoods emitted by the first-stage classifiers as a feature parameter. Experimental results showed the effectiveness of the proposed method: it reduced errors by 74% compared with the case without any system combination.

  • 少数のマイクロホンを用いた携帯端末向け音源分離

    高田晋太郎, 勘場智之, 小川哲司, 赤桐健三, 小林哲則

    日本音響学会講演論文集   2006 ( 秋季 ) 493 - 494  2006.09

  • 時間連続性を利用した音源分離処理の高精度化

    勘場智之, 小川哲司, 赤桐健三, 小林哲則

    日本音響学会講演論文集   2006 ( 秋季 ) 491 - 492  2006.09

  • シミュレーションに基づく騒音環境下音声認識におけるロンバード効果の影響

    小川哲司, 勘場智之, 小林哲則

    日本音響学会講演論文集   2006 ( 秋季 ) 101 - 102  2006.09

  • Adequacy analysis of simulation-based assessment of speech recognition system

    OGAWA Tetsuji, KANBA Satoshi, KOBAYASHI Tetsunori

    IEICE technical report   106 ( 123 ) 1 - 6  2006.06

    The adequacy of simulation-based assessment of speech recognition systems in noisy conditions is investigated and discussed. To evaluate speech recognition systems in various environments, it is desirable to collect test data in each of those environments, but this is unrealistic because it requires an enormous amount of work. To perform such evaluation efficiently, a promising approach is to simulate the evaluation experiments in the target environments as follows: a comparatively small amount of test data is collected, and test data for the target environment is then generated by convolving the collected data with the impulse response of the target environment. However, it is not obvious whether this simulation can accurately approximate an experiment in the real environment. This paper clarifies the conditions for effective simulation of noisy speech recognition, focusing on the influence of convolution with an impulse response and on the change in acoustic characteristics caused by the Lombard effect.

  • Sound Source Separation using Null Beamformer and Spectral Subtraction, and its Application to Cellular Phone

    TAKADA Shintaro, KANBA Satoshi, OGAWA Tetsuji, AKAGIRI Kenzo, KOBAYASHI Tetsunori

    IEICE technical report   106 ( 123 ) 7 - 12  2006.06

    A novel speech segregation method that combines a null beamformer using 3-channel omnidirectional microphones with spectral subtraction is proposed and successfully applied to mobile terminal devices such as cellular phones and PDAs. To apply speech recognition technology to mobile devices in noisy environments, interfering sounds and ambient noise must be suppressed under the constraints of a small number of microphones, a space-saving microphone arrangement, and low computational cost. The proposed method aims to solve this problem. In this paper, the performance of the proposed method is evaluated using microphones actually embedded in a cellular phone. In sound source separation and continuous speech recognition experiments on double-talk, the proposed method improved the PESQ-based MOS value by 1 point and achieved 80% word accuracy in speech recognition.

  • ロボット頭部に設置したマイクロホンによる環境変動に頑健な音源定位

    久保俊明, 持木南生也, 小川哲司, 小林哲則

    人工知能学会研究会資料   SIG-Challenge-0522   89 - 94  2005.10

  • BSSとスペクトラルサブトラクションの多段処理による音源分離

    伊佐崇, 関矢俊之, 小川哲司, 小林哲則

    日本音響学会講演論文集   2005 ( 秋季 ) 705 - 706  2005.09

  • ロボット頭部に設置した4系統指向性マイクロホンによる音源定位におけるHLDA利用の効果

    久保俊明, 持木南生也, 小川哲司, 小林哲則

    日本音響学会講演論文集   2005 ( 秋季 ) 717 - 718  2005.09

  • An extension of the state-observation dependency in Partly Hidden Markov Models and its application to continuous speech recognition

    Tetsuji Ogawa, Tetsunori Kobayashi

    Systems and Computers in Japan   36 ( 8 ) 31 - 39  2005.07

    We extend the state-observation dependencies in a Partly Hidden Markov Model (PHMM) and apply this model to continuous speech recognition. In a PHMM the observations and state transitions are dependent on a series of hidden and observable states. In the standard formulation of a PHMM, the observations and state transitions are conditioned on the same hidden state and observable state variables. Here we also condition the observations and state transitions on the same hidden states but condition the observations and state transitions on different observation states, respectively. This simple improvement to the model gives it significant flexibility allowing it to model stochastic processes more precisely. In addition, by integrating the PHMM containing this extended state-observation dependency with a standard HMM we can construct a stochastic model that we call a Smoothed Partly Hidden Markov Model (SPHMM). Results of continuous speech recognition on a newspaper read-speech have shown reductions of 10 and 24% in the error rate using the PHMM and SPHMM, respectively, compared to a standard HMM thereby displaying the effectiveness of the proposed models. © 2005 Wiley Periodicals, Inc.

  • Optimizing the Structure of Partly Hidden Markov Models Using Classification Measure and Genetic Algorithm

    OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report. Speech   105 ( 132 ) 37 - 42  2005.06

    The structure of the Partly-Hidden Markov Model (PHMM) is optimized. The PHMM was proposed in our previous work to deal with complicated temporal changes in acoustic features. It can realize observation-dependent behavior in both observations and state transitions. In the previous formulation of the PHMM, we used a common structure for all model categories. However, it is well known that the optimal structure, which gives the best performance, differs from category to category. In this paper, we design a new structure optimization method in which the state-observation dependencies in the PHMM are optimally defined for each category using the Weighted Likelihood-Ratio Maximization (WLRM) criterion. The WLRM criterion induces sparse and discriminative structures and therefore yields structurally discriminative models. We define the optimal structures as the combination of model structures that gives the maximum weighted likelihood ratio over all possible structure patterns, and a genetic algorithm is applied to approximate the search for this optimum. Continuous speech recognition experiments on lecture speech showed the effectiveness of the proposed structure optimization: it reduced word errors compared with the HMM and with the PHMM using a common structure for all categories.

  • Enhancement of Frequency-Domain BSS by Solving Permutation Problem Using Reference Signal and SMDP

    ISA Takashi, SEKIYA Toshiyuki, OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report. Speech   105 ( 133 ) 31 - 36  2005.06

    In this paper, we propose a method for solving the permutation problem of blind source separation (BSS) in the frequency domain. We integrate two techniques: frequency-domain BSS and SMDP (Segregation using Multiple Directivity Pattern). For each frequency, the permutation problem is solved by calculating correlation coefficients between the reference signal and the separated signal. The reference signal corresponding to each original signal is obtained by a process different from BSS; it does not need to be well separated. We generate simultaneous equations for the amplitudes of the sound sources using multiple directivity patterns. The solution of these equations gives good estimates of the disturbances. Spectral subtraction is applied with these estimates of the disturbances to enhance the target speech. Experimental results on double-talk recognition show that the proposed technique achieves a 30% error reduction.

  • Effectiveness of adopting HLDA in sound source localization using spectral intensity ratio of microphones

    KUBO Toshiaki, MOCHIKI Naoya, SEKIYA Toshiyuki, OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report. Speech   105 ( 133 ) 37 - 42  2005.06

    We propose a novel sound source localization method that performs robustly in various environments by using heteroscedastic LDA (HLDA). In our previous work, we proposed a robust sound localization method that does not require a strict head-related transfer function (HRTF). In this method, the spectral intensity ratio of microphones mounted on the robot head is extracted as feature parameters, and statistical pattern recognition is then conducted. In pattern recognition, it is well known that performance degrades when the training environment differs from the operating environment. To compensate for the difference, a model adaptation technique such as MLLR is executed using a small amount of data obtained from several directions in the operating environment. However, when the environment in which the robot acts changes continually, it is practically difficult to adapt in every case. Thus, in this paper, we propose extracting useful information from the feature vectors by using HLDA. In this method, nuisance information that does not contribute to discrimination, such as reverberation, is removed and the essential information is extracted. A sound source localization experiment showed the robustness of the proposed method.

  • ロボット頭部に設置した4系統指向性マイクロホンによる音源定位

    持木南生也, 関矢俊之, 小川哲司, 小林哲則

    日本音響学会講演論文集   2005 ( 春季 ) 609 - 610  2005.03

  • 重み付き尤度比最大基準に基づく部分隠れマルコフモデルの構造の最適化

    小川哲司, 小林哲則

    日本音響学会講演論文集   2005 ( 春季 ) 131 - 132  2005.03

  • ロボット頭部に設置した4系統指向性マイクロフォンによる音源定位および混合音声認識

    持木南生也, 関矢俊之, 小川哲司, 小林哲則

    人工知能学会研究会資料   SIG-Challenge-0420-4   21 - 27  2004.12

  • 複数の指向特性を利用した音源分離における音源定位との統合

    関矢俊之, 小川哲司, 小林哲則

    日本音響学会講演論文集   2004 ( 秋季 ) 617 - 618  2004.10

  • 雑音環境下における階層的音源分離の評価

    関矢俊之, 澤田知寛, 小川哲司, 小林哲則

    日本音響学会講演論文集   2004 ( 春季 ) 99 - 100  2004.03

  • ロボット頭部に設置した4系統指向性マイクロホンによる混合音声認識

    持木南生也, 関矢俊之, 小川哲司, 小林哲則

    日本音響学会講演論文集   2004 ( 春季 ) 95 - 96  2004.03

  • 階層的音源分離に基づく混合音声の認識

    澤田知寛, 関矢俊之, 小川哲司, 小林哲則

    人工知能学会研究会資料   SIG-Challenge-0318-5   27 - 32  2003.11

  • Mixed Speech Recognition Using Microphonearray

    SEKIYA Toshiyuki, OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report. Speech   103 ( 93 ) 13 - 18  2003.05

    Double-talk recognition under distant-microphone conditions is one of the serious problems in real-environment speech recognition. In this paper, this problem is addressed by microphone-array-based BSAS (Band-Selection-based Audio Segregation). In this approach, we prepare several different directivity characteristics using a microphone array and use the differences among the array outputs to extract the desired speech. We also use generalized harmonic analysis (GHA) instead of the FFT for spectral analysis to improve the performance of BSAS. These modifications enable segregation that sounds good to human listeners, but the quality is still insufficient for recognition because some spectral distortion occurs in the segregation processing. We therefore use MLLR-based acoustic model adaptation and retraining to gain robustness to the spectral distortion. These efforts yielded 76.2% word accuracy at an SN ratio of 0 dB; this represents a 45% reduction in errors relative to using array signal processing alone and a 30% reduction relative to using array signal processing and BSAS.

  • SAFIAによる同時発話音声の認識

    関矢俊之, 芹沢新, 小川哲司, 小林哲則

    日本音響学会講演論文集   2003 ( 春季 ) 19 - 20  2003.03

  • 部分隠れマルコフモデルの拡張と連続音声認識による評価

    小川哲司, 小林哲則

    日本音響学会講演論文集   2002 ( 秋季 ) 51 - 52  2002.09

  • Continuous speech recognition by Partly-Hidden Markov Model

    OGAWA Tetsuji, KOBAYASHI Tetsunori

    IEICE technical report. Speech   102 ( 159 ) 25 - 30  2002.06

    State-observation dependencies in the Partly-Hidden Markov Model (PHMM) are generalized, and the resulting model is successfully applied to continuous speech recognition. The PHMM, which was proposed in our previous paper, is a novel stochastic model in which pairs of hidden states (H-states) and observable states (O-states) determine the stochastic behavior of the current observation and the next state transition. In the previous formulation of the PHMM, we used a common pair of H-state and O-state to determine both of these. In the modified PHMM proposed here, we use a common H-state but different O-states for the current observation and for the next state transition. This slight modification brings great flexibility to the modeling. Experimental results showed the effectiveness of the PHMM (without delta parameters): it reduced word errors by 19% compared with the triphone HMM (with delta parameters).

  • 複数の話者依存モデルを用いた話者空間表現に基づく話者適応

    牛久祐輔, 小川哲司, 小林哲則

    日本音響学会講演論文集   2001 ( 秋季 ) 129 - 130  2001.10

  • 音素単位の部分隠れマルコフモデルにおける状態・出力依存関係の一般化

    小川哲司, 小林哲則

    日本音響学会講演論文集   2000 ( 秋季 ) 19 - 20  2000.09

  • 部分隠れマルコフモデルにおける状態・出力依存関係の一般化

    小川哲司, 古山純子, 小林哲則

    日本音響学会講演論文集   2000 ( 春季 ) 155 - 156  2000.03

Industrial Property Rights

  • 予兆検知システムおよびプログラム

    特許7313610

    中野 鐵兵, 小川 哲司, 小林 哲則, 沖本 祐典

    Patent

  • 収音装置、収音プログラム、及び収音方法

    藤枝 大, 原 宗大, 片桐 一浩, 西城 耕平, 小林 哲則, 小川 哲司

    Patent

  • 収音装置、収音プログラム、及び収音方法

    藤枝 大, 片桐 一浩, 西城 耕平, 小川 哲司

    Patent

  • 音声認識モデル学習装置、音声認識装置、およびプログラム

    佐藤 裕明, 所澤 愛子, 伊藤 均, 三島 剛, 河合 吉彦, 小森 智康, 小川 哲司, 佐藤 庄衛

    Patent

  • 学習装置、音声認識装置、学習方法、および、学習プログラム

    Patent

  • 照合装置、照合方法、および、照合プログラム

    小川 哲司

    Patent

  • 制御状態監視システムおよびプログラム

    Patent

  • 信号処理装置、信号処理プログラム、信号処理方法、及び収音装置

    Patent

  • 予兆検知システムおよびプログラム

    中野 鐵兵, 小川 哲司, 小林 哲則, 沖本 祐典

    Patent

  • モニタリング対象機器の異常発生予兆検知方法及びシステム

    長谷川 隆徳, 緒方 淳, 小川 哲司, 村川 正宏

    Patent

  • 予測装置、予測方法および予測プログラム

    小林 哲則, 小川 哲司, 森岡 幹

    Patent

  • 状態監視システム

    中野 鐵兵, 小林 哲則, 斎藤 奨, 小川 哲司

    Patent

  • 単語予測装置、プログラム

    岩田 具治, 小川 厚徳, 小林 哲則, 小川 哲司, 森岡 幹, 川崎 真未

    Patent

  • 音源分離システム、方法及びプログラム

    矢頭 隆, 片桐 一浩, 藤枝 大, 小林 哲則, 大町 基, 小川 哲司

    Patent

  • Sound source separation device, method, and program

    Patent

  • Sound source separation device, program, and method

    Patent

  • Sound source separation device, method, and program

    Patent

  • Echo canceller and echo cancellation method

    小林 哲則, 赤桐 健三, 藤江 真也, 小川 哲司

    Patent

  • Recognizer construction system, recognizer construction method, assembly service provision system, and program

    小林 哲則, 中野 鐵兵, 藤江 真也, 小川 哲司

    Patent

 

Teaching Experience

  • Computer Science and Communications Engineering Lab

    Waseda University  

    2023.04
    -
    Now
     

  • Pattern Recognition

    Waseda University  

    2023.04
    -
    Now
     

  • Pattern Recognition and Machine Learning

    Waseda University  

    2023.04
    -
    Now
     

  • Circuit Theory B

    Waseda University  

    2020.09
    -
    Now
     

  • Machine Learning

    Waseda University / enPiT-Pro Smart SE  

    2019.04
    -
    Now
     

  • Introduction to Computers and Networks

    Waseda University  

    2019.04
    -
    Now
     

  • Fundamentals of Information and Communications

    Waseda University  

    2017.04
    -
    Now
     

  • Perceptual Computing

    Waseda University  

    2016.09
    -
    Now
     

  • Logic Circuits

    Waseda University  

    2016.09
    -
    Now
     

  • Logic Circuits

    Waseda University  

    2016.04
    -
    Now
     

  • Optimization, Recognition, and Learning

    Waseda University  

    2021.09
    -
    2023.03
     

  • Computer Science and Communications Engineering Lab C / Sound Information Processing

    Waseda University  

    2016.09
    -
    2023.03
     

  • Pattern Recognition and Machine Learning

    Waseda University  

    2016.04
    -
    2023.03
     

  • Modeling for Engineering A

    Waseda University  

    2016.04
    -
    2023.03
     

  • Algorithms and Data Structures A

    Waseda University  

    2019.04
    -
    2019.09
     

  • Circuit Theory A

    Waseda University  

    2016.09
    -
    2019.03
     

  • Machine Learning

    Egypt-Japan University of Science and Technology  

    2012.09
    -
    2015.02
     

  • Perceptual Information Systems

    Waseda University  

    2008.04
    -
    2011.09
     

  • Sound Information Processing

    Waseda University Open Education Center  

    2008.09
    -
    2011.03
     

  • Interactive Systems

    Waseda University Open Education Center  

    2008.04
    -
    2010.09
     

  • Sound Interfaces

    Waseda University Open Education Center  

    2007.09
    -
    2008.03
     

 

Sub-affiliation

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

Research Institute

  • 2022
    -
    2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Center for a Carbon Neutral Society   Concurrent Researcher

  • 2022
    -
    2024

    Research Organization for Open Innovation Strategy   Concurrent Researcher

Internal Special Research Projects

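  • Research on building an emotion estimation system for children with severe disabilities, toward sustainable nursing support

    2023  

    As an artificial intelligence (AI) technology to support communication with children with severe motor and intellectual disabilities who require medical care (hereafter, children with SMID), we studied a method for estimating these children's emotional states and intentions from video. Because these children express emotion in highly individual ways, and because the purpose of estimating emotional states and intentions is to support decision-making in medical and nursing care, we attempted to realize emotional state estimation for sustainable nursing care in a form that allows an estimation model to be built robustly from a small amount of training data (Requirement I) and that can explain the evidence when an emotional state is detected (Requirement II). Specifically, we proposed an emotional state estimation framework that meets both requirements by using a large-scale pretrained model that identifies the cues (signs) of emotional states or their components. The proposed method targets children whose emotional states can be read from facial expressions; it detects action units (AUs), the units of facial muscle movement, as components of the signs and uses intermediate information from the detection as features for emotional state estimation. Facial muscle movements are information that caregivers actually rely on in their decision-making, so they can serve both as a feature representation of emotional states and as intuitive material for explaining predictions. In addition, because facial muscle movements are person-independent, large-scale data from people without disabilities can be used to build the pretrained model. This is expected to yield accurate features and to make it possible to train the emotional state estimator on a small amount of data from children with SMID. Through experiments on estimating pleasant and unpleasant states using video data of one child with SMID, we compared the proposed method with a method using a general-purpose pretrained model and concluded that the proposed method is effective in terms of both estimation performance and the explainability of prediction evidence. The findings of this study are expected to contribute not only to the development of communication-support AI for children with SMID but also, more generally, to the prediction of highly person-dependent attributes and to modeling for that purpose.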

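  • Quality assurance in crowdsourcing: dynamic task ordering for efficient answer collection

    2022  

    Crowdsourcing (requesting work via the Internet) makes it relatively easy to collect the large-scale data needed for machine learning, but degradation of data quality caused by malicious workers and similar factors is a problem. Data quality can be improved by ordering the same task multiple times and taking a majority vote over the answers, but the cost increase that comes with more orders cannot be ignored. To address this, we attempted to develop a data quality assurance technique that is both economical and reliable by adaptively determining the number of orders according to task difficulty. For annotation of livestock monitoring images, we showed that parameters such as the minimum and maximum number of orders and the minimum worker agreement rate can be learned without ground-truth labels.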
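
    The adaptive ordering idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `collect_label`, the callback `ask_worker`, and the fixed parameter values are all assumptions made here, whereas the summary states that the corresponding parameters were learned without ground-truth labels.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def collect_label(
    ask_worker: Callable[[], Hashable],  # hypothetical callback: orders one answer
    min_orders: int = 3,                 # assumed value, not from the source
    max_orders: int = 7,                 # assumed value, not from the source
    min_agreement: float = 0.8,          # assumed value, not from the source
) -> Tuple[Hashable, List[Hashable]]:
    """Order answers one at a time and stop once the majority is strong enough."""
    if not 1 <= min_orders <= max_orders:
        raise ValueError("require 1 <= min_orders <= max_orders")
    answers: List[Hashable] = []
    while len(answers) < max_orders:
        answers.append(ask_worker())
        if len(answers) < min_orders:
            continue  # always collect at least min_orders answers
        label, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= min_agreement:
            return label, answers  # easy task: stop early and save cost
    # Hard task: the budget is exhausted, fall back to a plain majority vote.
    label, _ = Counter(answers).most_common(1)[0]
    return label, answers
```

    With these assumed settings, a task on which the first three workers agree stops after three orders, while a contested task uses all seven.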

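  • Research on methods for building and operating explainable state monitoring systems for decision support

    2021  

    We attempted to establish a framework for growing a state monitoring system efficiently and sustainably by using crowdsourcing to verify the data accumulated while the system is in operation. We did so through the development of a system that detects signs of calving in livestock from video, an important problem in supporting the decisions of livestock farmers. Specifically, we clarified (1) a multi-task learning method for video monitoring models that is robust to label noise, including missed positives; (2) an ensemble learning method that takes complementarity into account for estimating the predictive uncertainty of deep neural networks, together with a data selection method based on disagreement among the predictions of multiple models; and (3) an implementation method that lets a monitoring system for streaming video run in real time.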
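
    Data selection by disagreement among multiple models, item (2) above, might look like the sketch below. The summary does not say how disagreement is measured, so the variance of the models' positive-class probabilities is used here purely as an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def disagreement(probs: Sequence[float]) -> float:
    """Variance of the positive-class probabilities given by the ensemble members.

    Variance is an assumed disagreement measure, not one named in the source.
    """
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)


def select_for_verification(
    ensemble_probs: Sequence[Sequence[float]], budget: int
) -> List[int]:
    """Return the indices of the `budget` samples the models disagree on most."""
    scores = [disagreement(p) for p in ensemble_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Each inner list holds one sample's predictions from three models.
samples = [[0.9, 0.9, 0.9], [0.1, 0.9, 0.5], [0.4, 0.6, 0.5]]
print(select_for_verification(samples, budget=1))  # prints [1]
```

    Samples selected this way would be the ones sent out for verification, which keeps the verification cost focused on uncertain predictions.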

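  • Research on efficient video annotation using crowdsourcing and object tracking

    2021  

    To annotate multiple moving objects in video efficiently, we proposed an interactive annotation method that makes use of pseudo bounding-box labels obtained through iterative self-training of an object detector. In the proposed method, missed detections are kept low during bounding-box label generation, while iterative self-training yields an object detector that is robust to changes in the targets' appearance. In addition, correcting low-quality tracking results through interactive tracking improved annotation accuracy and reduced annotation cost relative to existing tools in which a bounding box is drawn on each target object. Validation on standard benchmarks and on livestock video monitoring data confirmed the high practicality of the proposed method.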

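  • Research on methods for building and operating sustainable state monitoring systems using crowdsourcing

    2020  

    A video monitoring system intended to support human decision-making must (1) be buildable from a small amount of data, (2) be operable sustainably, and (3) be able to explain the evidence for its predictions. In this study, we attempted to establish a framework for building and operating systems that meet these requirements by incorporating knowledge about the decision-making process of users (experts) into neural networks. Based on the proposed framework, we built a system that detects signs of calving in breeding cattle through video monitoring. Compared with a system built with an end-to-end approach, it proved more effective both in detection performance that is robust to small data and environmental changes and in the interpretability of prediction evidence for livestock farmers.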

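  • Research on detecting tidal fronts based on aerial drone imagery

    2020  

    We attempted to develop a technique for automatically detecting tidal fronts from sea-surface video captured by drones. If tidal fronts can be detected by drone, information about good fishing grounds can be provided to fishery operators at relatively low cost, which is expected to contribute to more efficient fishing operations. To build a tidal front detection model, we constructed a dataset of aerial drone images of tidal fronts (158,739 images in total) and conducted classification experiments on the presence or absence of a tidal front. Using a convolutional neural network equipped with a pyramid pooling module as the detection model, tidal fronts were detected with a precision of 0.90, a recall of 0.81, and an F-measure of 0.85.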
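
    As a quick consistency check on the reported figures, the F-measure can be recomputed from precision and recall. The sketch assumes the reported value is the balanced F1 score (the harmonic mean of the two), which the summary does not state explicitly.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 0.90 and recall 0.81, as reported in the summary above.
print(round(f_measure(0.90, 0.81), 2))  # prints 0.85
```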

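  • Research on detecting signs of estrus in breeding cattle using video

    2019  

    Using crowdsourcing, in which work is requested from a large number of unspecified people via the Internet, we developed techniques for detecting signs of estrus in breeding cattle from video. In particular, this study focused on mounting behavior as a sign of estrus and built an evaluation platform for it. First, using an object detection algorithm and crowdsourcing, we developed a method for reliably annotating the presence or absence of mounting behavior while keeping missed cattle detections low. We applied the proposed annotation to video data covering 29 instances of mounting behavior recorded in a free-stall barn housing 14 beef cattle, and built a dataset of 5,020 images in total. Cross-validation experiments on this dataset showed that mounting behavior can be detected at the image level with a positive judgment rate of 0.80 and a sensitivity of 0.76.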

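  • Research on sustainable operation of video monitoring systems

    2019  

    We attempted to establish a framework in which a video monitoring system is put into operation early, without waiting for big data to accumulate, and is then grown by making efficient use of the data accumulated day by day. In particular, we developed and validated a framework that maintains high detection performance even in the early operation stage by using crowdsourcing to correct the results of pattern-recognition-based video monitoring. We evaluated the proposed early-operation method through the development of a video-based calving detection system for breeding cattle. The results showed that combining pattern recognition (calving detection) with crowdsourcing suppresses false detections while keeping missed calvings low, making early operation of the video monitoring system possible.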

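  • Speech enhancement robust to diverse noise using area sound pickup and generative adversarial networks

    2018   俵 直弘

    We proposed a post-filtering method that uses an adversarial denoising autoencoder (ADAE) to correct the nonlinear distortion produced by area sound pickup. Area sound pickup can separate target and interfering sounds with high accuracy based on time-frequency masking, but it suffers from the unpleasant distortion characteristic of nonlinear signal processing. We therefore tried reducing the nonlinear distortion with an ADAE, which is effective for single-channel source enhancement, and found it effective in improving sound quality. Introducing a noise-aware training framework, in which the observed signal before separation and noise information are used as auxiliary inputs to the ADAE, further improved the quality of the enhanced signal.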

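  • Construction of fast, accurate, low-distortion noise reduction filters using area sound pickup and deep learning

    2017  

    We studied a method for achieving accurate, low-distortion noise suppression for speech corrupted by diffuse noise. To this end, after separating the target sound and the diffuse noise with area sound pickup, a source separation technique the applicant has been studying, we proposed a method for estimating a filter that suppresses the noise components remaining in the target sound. Specifically, a deep neural network estimates the coefficients of a linear filter (strictly speaking, the a priori SNR) from the power spectra of the target sound and the noise separated by area sound pickup. When noise suppression performance under diffuse noise was evaluated by noise reduction rate and log-spectral distance, the proposed method improved on the conventional multichannel Wiener filter on both measures.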
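
    To illustrate how an a priori SNR estimate determines a linear filter coefficient, the sketch below uses the standard Wiener gain G = ξ / (1 + ξ). That the project's filter takes exactly this form is an assumption; the summary only says that the network estimates the a priori SNR. The SNR values here are supplied by hand rather than by a neural network, and the function names are hypothetical.

```python
from typing import List, Sequence


def wiener_gain(xi: float) -> float:
    """Wiener gain for an a priori SNR xi on a linear scale: G = xi / (1 + xi)."""
    if xi < 0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def apply_gains(power: Sequence[float], priori_snr: Sequence[float]) -> List[float]:
    """Scale the power of each frequency bin by the squared gain for that bin."""
    return [wiener_gain(xi) ** 2 * p for p, xi in zip(power, priori_snr)]


# A bin with equal speech and noise power (xi = 1) is attenuated by a gain of 0.5.
print(wiener_gain(1.0))  # prints 0.5
```

    Bins dominated by noise (small ξ) are attenuated strongly, while bins dominated by the target (large ξ) pass almost unchanged.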

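  • Research on constructing pattern recognition systems with metacognitive functions

    2016  

    By imitating the metacognitive functions of humans (knowing whether one knows something, and knowing how well one knows it), we aim to establish a pattern recognition scheme that performs robustly and well on unseen inputs without relying solely on data collection. In this project, through evaluation on speech recognition in noise, we focused on the techniques underlying "pattern recognition with metacognitive functions": recognition performance prediction and multi-stream pattern recognition algorithms. A large number of pattern recognition systems handling different phenomena were built in advance with DNNs, and the best system among them was selected based on the temporal variation of the DNN outputs (posterior probabilities) and the reconstruction error of an autoencoder. This achieved recognition that is robust to environmental changes.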

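  • Spontaneous speech recognition using Partly-Hidden Markov Models

    2004  

    In this research, we propose the Partly-Hidden Markov Model (PHMM) as a probabilistic model with greater representational power to replace the Hidden Markov Model (HMM), the model commonly used in speech recognition. The PHMM is a framework in which both the state and the output depend on past outputs, but until now the same structure has been used for every model category. This year, we therefore attempted to select the dependency structure between states and outputs in the PHMM optimally for each model category, based on a maximum weighted likelihood ratio criterion.

    In model structure selection based on likelihood ratio maximization, the difference between the log-likelihoods given by the correct category and the incorrect categories is computed directly and used as the objective function, and the model structure that maximizes it is selected. For data whose likelihood ratio is already large, improving the ratio is unlikely to change the recognition result, so it is more important to improve the ratio for data whose likelihood ratio is close to 0. We therefore applied a weighting in which the likelihood ratio is used as is when it is small and is truncated at a threshold when it is large. We call this the weighted likelihood ratio and selected model structures so as to maximize it. Furthermore, rather than maximizing the weighted likelihood ratio over the data belonging to each category separately, the method considers the combinations of model structures possible across all categories and maximizes the weighted likelihood ratio over the enormous number of resulting combinations. The combination of structures that gives the maximum weighted likelihood ratio is taken as the optimal structure. Because exhaustive search over such an enormous number of patterns is not realistic, we applied a genetic algorithm to obtain an approximation to the exhaustive-search solution.

    A continuous speech recognition experiment on academic lecture speech, conducted to evaluate the proposed model structure selection method, showed that it reduces the errors of a PHMM without model structure selection.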
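
    The truncation-based weighting described above can be sketched as follows. This is a toy illustration under stated assumptions: the function names, the threshold value, and the log-likelihood numbers are invented for the example, and the candidates are enumerated directly, whereas the summary describes a genetic algorithm over a far larger set of structure combinations.

```python
from typing import Dict, List, Tuple

# Each pair is (log-likelihood of the correct category,
#               log-likelihood of the competing incorrect category).
Scores = List[Tuple[float, float]]


def weighted_llr(correct: float, incorrect: float, threshold: float) -> float:
    """Log-likelihood difference, truncated at `threshold` when it is large."""
    return min(correct - incorrect, threshold)


def objective(scores: Scores, threshold: float) -> float:
    """Sum of weighted likelihood ratios over all data."""
    return sum(weighted_llr(c, w, threshold) for c, w in scores)


def select_structure(candidates: Dict[str, Scores], threshold: float) -> str:
    """Pick the candidate structure with the largest weighted likelihood ratio."""
    return max(candidates, key=lambda name: objective(candidates[name], threshold))


candidates = {
    "A": [(-10.0, -9.5), (-5.0, -30.0)],  # one misrecognized item, one very safe item
    "B": [(-10.0, -11.0), (-5.0, -8.0)],  # both items recognized with modest margins
}
print(select_structure(candidates, threshold=2.0))  # prints B
```

    Without truncation, structure A would win because of its single very large margin; with truncation, B is preferred because it improves the data near the decision boundary.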
