Updated: 2024/12/21


MAKINO Shoji
牧野 昭二
Affiliation
Faculty of Science and Engineering, Graduate School of Information, Production and Systems
Title
Specially Appointed Professor
Degree
Doctorate (Tohoku University)
Profile

Professor at the Graduate School of Information, Production and Systems, Waseda University. His research focuses on acoustic signal processing, in particular blind source separation and acoustic echo cancellation. He has served as a member of the IEEE Jack S. Kilby Signal Processing Medal Committee, the IEEE James L. Flanagan Speech & Audio Processing Award Committee, the IEEE SPS Fellow Evaluation Committee, and the IEEE SPS Board of Governors, as Chair of the IEEE SPS Audio and Acoustic Signal Processing Technical Committee, as General Chair of IEEE WASPAA 2007, and as an Associate Editor of the IEEE Transactions on Speech and Audio Processing. His honors include the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prize for Science and Technology, Research Category), the IEICE Honorary Membership, Distinguished Achievement and Contributions Award, and Achievement Award, the IEEE SPS Leo L. Beranek Meritorious Service Award, the ICA Unsupervised Learning Pioneer Award, and the IEEE MLSP Competition Award. He has been an IEEE Distinguished Lecturer and is a Fellow of the IEEE and the IEICE.

Career

  • Apr. 2021 - present    Professor, Graduate School of Information, Production and Systems, Waseda University

  • Apr. 2009 - Mar. 2021    Professor, Tsukuba Advanced Research Alliance (TARA) Center and Graduate School of Systems and Information Engineering, University of Tsukuba

  • Apr. 2014 - Mar. 2018    Visiting Professor, National Institute of Informatics

  • Apr. 2013 - Mar. 2018    Visiting Researcher, RIKEN

  • Apr. 2008 - Mar. 2009    Senior Research Scientist (Supervisor), NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation (NTT)

  • Dec. 2008 - Feb. 2009    Visiting Professor, University Erlangen-Nuremberg, Germany

  • Apr. 2004 - Mar. 2008    Visiting Professor, Graduate School of Information Science and Technology, Hokkaido University

  • Apr. 2003 - Mar. 2008    Executive Manager, Media Information Laboratory, NTT Communication Science Laboratories, NTT

  • Apr. 2006 - Mar. 2007    Part-time Lecturer, Graduate School of Information Science and Technology, The University of Tokyo

  • Apr. 2000 - Mar. 2003    Leader, Signal Processing Research Group, NTT Communication Science Laboratories, NTT

  • Jan. 1999 - Mar. 2000    Group Leader, NTT Lifestyle and Environmental Technology Laboratories (生活環境研究所), NTT

  • Jul. 1996 - Dec. 1998    Senior Research Scientist (Supervisor), NTT Multimedia Systems Laboratories (マルチメディアシステム総合研究所), NTT

  • Aug. 1987 - Jun. 1996    Senior Research Engineer, NTT Human Interface Laboratories, NTT

  • Apr. 1981 - Jul. 1987    Yokosuka Electrical Communication Laboratories, Nippon Telegraph and Telephone Corporation


Education

  • Mar. 1993    Tohoku University, Doctor of Engineering

  • Apr. 1979 - Mar. 1981    Tohoku University, Graduate School of Engineering, Department of Mechanical Engineering

  • Apr. 1975 - Mar. 1979    Tohoku University, Faculty of Engineering, Department of Mechanical Engineering II

Committee Memberships

  • 2019 - present    Japan Society for the Promotion of Science (JSPS)  Member of the Grants-in-Aid for Scientific Research Sub-Committee
  • 2019 - present    European Association for Signal Processing (EURASIP)  Member of the Special Area Team on Acoustic, Speech and Music Signal Processing
  • 2018 - present    Asia Pacific Signal and Information Processing Association  Member of the Signal and Information Processing Theory and Methods Technical Committee
  • May 2014 - present    IEICE  Advisor, Technical Committee on Engineering Acoustics
  • 2013 - present    Acoustical Society of Japan  Board Member
  • 2007 - present    IEICE  Fellow
  • 2005 - present    Acoustical Society of Japan  Councilor
  • Apr. 2004 - present    International Speech Communication Association (ISCA)  Member
  • 2004 - present    Institute of Electrical and Electronics Engineers (IEEE)  Fellow
  • 2003 - present    Acoustical Society of Japan  Delegate
  • 2003 - present    International ICA Steering Committee  Member
  • Apr. 2000 - present    European Association for Signal Processing (EURASIP)  Member
  • 1999 - present    International Workshop on Acoustic Echo and Noise Control  International IWAENC Standing Committee Member
  • Apr. 1989 - present    Institute of Electrical and Electronics Engineers (IEEE)  Member
  • Apr. 1988 - present    IEICE  Member
  • Apr. 1983 - present    Acoustical Society of Japan  Member
  • 2018 - 2020    IEEE Signal Processing Society  Member of the Board of Governors
  • 2019    JSPS  Reviewer (written review reports), Grants-in-Aid for Scientific Research (S)
  • 2018 - 2019    JSPS  Written-review examiner and evaluator, International Program Committee
  • 2018 - 2019    JSPS  Expert member, Research Fellowship Screening Committee
  • 2018 - 2019    2018 International Workshop on Acoustic Signal Enhancement  General Chair
  • 2017 - 2018    IEEE Signal Processing Society Japan Chapter  Chair
  • 2015 - 2018    Institute of Electrical and Electronics Engineers (IEEE)  Member of Jack S. Kilby Signal Processing Medal Committee
  • 2013 - 2015    JSPS  Expert member, Grants-in-Aid for Scientific Research Committee
  • 2013 - 2015    IEEE Signal Processing Magazine  Guest Editor
  • 2014    Acoustical Society of Japan  Chair, Selection Committee for the Itakura Prize Innovative Research Award
  • 2013 - 2014    IEEE Signal Processing Society  Technical Directions Board Member
  • 2013 - 2014    IEEE Signal Processing Society  Chair of the Audio and Acoustic Signal Processing Technical Committee
  • Jul. 2013    2013 International Conference of the IEEE Engineering in Medicine and Biology (EMBC2013)  Tutorial Speaker
  • 2012 - 2013    2012 IEEE International Conference on Acoustics, Speech, and Signal Processing  Plenary Chair
  • 2011 - 2012    2011 Annual Conference of the International Speech Communication Association  Tutorial Speaker
  • 2005 - 2012    European Association for Signal Processing  Associate Editor of the EURASIP JASP
  • 2009 - 2011    IEEE Japan Council  Awards Committee Member
  • 2008 - 2011    Institute of Electrical and Electronics Engineers (IEEE)  James L. Flanagan Speech & Audio Processing Award Committee Member
  • 2009 - 2010    IEICE  Member, Fellow Nomination Committee
  • 2009 - 2010    IEEE Signal Processing Society  Distinguished Lecturer
  • 2008 - 2009    2008 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays  Panelist
  • 2008    IEICE  Member, Paper Award Selection Committee
  • 2007 - 2008    2007 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  General Chair
  • 2007 - 2008    IEICE  President, Acoustics and Ultrasonics Subsociety, Engineering Sciences Society
  • 2007 - 2008    2007 IEEE International Conference on Acoustics, Speech and Signal Processing  Tutorial Speaker
  • 2007 - 2008    2007 International Conference on Independent Component Analysis and Signal Separation  Keynote Speaker
  • 2006 - 2008    IEICE  Chair, Technical Committee on Engineering Acoustics
  • 2006 - 2008    IEEE Signal Processing Society  Awards Board Member
  • 2006 - 2007    Acoustical Society of Japan  Member, Awaya Kiyoshi Science Promotion Award Selection Committee
  • 2005 - 2006    2005 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays  Panelist
  • 2002 - 2005    Institute of Electrical and Electronics Engineers (IEEE)  Associate Editor of the IEEE Trans. Speech and Audio Processing
  • 2001 - 2005    Acoustical Society of Japan  Member, Sato Prize Paper Award Selection Committee
  • 2003 - 2004    2003 International Workshop on Acoustic Echo and Noise Control  General Chair
  • 2002 - 2004    IEEE Signal Processing Society  Conference Board Member
  • 2013 - present    European project Embedded Audition for Robots  Advisory Board member
  • 2006 - present    International Advisory Panel Member
  • 2003 - present    Acoustical Society of Japan  Council member
  • 2020 - 2021    2020 European Signal Processing Conference  Special Session Organizer
  • 2020 - 2021    2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2020 - 2021    2020 European Signal Processing Conference  Area Chair
  • 2020 - 2021    2020 International Workshop on Acoustic Echo and Noise Control  Member of the Organizing Committee
  • 2019 - 2020    2019 European Signal Processing Conference  Special Session Organizer
  • 2019 - 2020    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2019 - 2020    2019 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)  Member of the Technical Committee
  • 2019 - 2020    IEEE Signal Processing Society  Member of the TC Review Committee
  • 2018 - 2020    IEEE Signal Processing Society  Member of the Long-Range Planning and Implementation Committee
  • 2018 - 2019    2018 European Signal Processing Conference  Special Session Organizer
  • 2018 - 2019    2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2018 - 2019    2018 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2018 - 2019    2018 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2017 - 2018    2017 European Signal Processing Conference  Special Session Organizer
  • 2017 - 2018    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2017 - 2018    2017 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2016 - 2017    Special Session Organizer
  • 2016 - 2017    2016 European Signal Processing Conference  Member of the Technical Program Committee
  • 2016 - 2017    2016 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2016 - 2017    Area Chair
  • 2016 - 2017    Area Chair
  • 2016 - 2017    IEEE Signal Processing Society  Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
  • 2012 - 2017    IEEE Signal Processing Society  Chair of the Fellow Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
  • 2015 - 2016    2015 European Signal Processing Conference  Special Session Organizer
  • 2015 - 2016    2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2015 - 2016    2015 AEARU Workshop on Computer Science and Web Technology  Member of the Program Committee
  • 2015 - 2016    2015 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2015 - 2016    IEEE Signal Processing Society Japan Chapter  Vice Chair
  • 2015 - 2016    2015 International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)  Special Sessions Chair
  • 2015 - 2016    2015 European Signal Processing Conference  Area Chair
  • 2010 - 2016    Asia Pacific Signal and Information Processing Association  Member of the Speech, Language, and Audio Technical Committee
  • 2015    IEEE Signal Processing Society  Past Chair of the Audio and Acoustic Signal Processing Technical Committee
  • 2015    2015 IEEE International Workshop on Applications of Signal Processing to Audio  Member of the Technical Program Committee
  • 2015    IEEE Signal Processing Society  Vice Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
  • 2014 - 2015    2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2014 - 2015    2014 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2014 - 2015    2014 Hands-free Speech Communication and Microphone Arrays  Member of the Technical Program Committee
  • 2014 - 2015    Symposia at the 2014 IEEE Global Conference on Signal and Information Processing  Member of the Organizing Committee
  • 2014 - 2015    2014 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2014 - 2015    2014 European Signal Processing Conference  Area Chair
  • 2013 - 2014    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2013 - 2014    2013 European Signal Processing Conference  Special Session Organizer
  • 2013 - 2014    2013 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2013 - 2014    2013 European Signal Processing Conference  Area Chair
  • 2012 - 2013    Special Session Organizer
  • 2012 - 2013    2012 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • Apr. 2011 - Mar. 2012    Acoustical Society of Japan  Guest Editor-in-Chief, special issue of the Journal of the Acoustical Society of Japan
  • 2011 - 2012    2011 Hands-free Speech Communication and Microphone Arrays  Member of the Technical Program Committee
  • 2011 - 2012    2011 European Signal Processing Conference  Member of the Technical Program Committee
  • 2011 - 2012    IEEE Signal Processing Society  Vice Chair of the Audio and Acoustic Signal Processing Technical Committee
  • 2011 - 2012    European Association for Signal Processing (EURASIP)  Guest Editor of the EURASIP Journal on Applied Signal Processing
  • 2010 - 2011    2010 Asia-Pacific Signal and Information Processing Conference  Member of the Technical Committee
  • 2010 - 2011    2010 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2010 - 2011    2010 IEEE International Symposium on Circuits and Systems  Track Chair
  • 2009 - 2010    2009 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Organizing Committee
  • 2009 - 2010    2009 IEEE International Symposium on Circuits and Systems  Track Chair
  • 2009 - 2010    2009 European Signal Processing Conference  Area Chair
  • 2009 - 2010    IEEE Circuits and Systems Society  Chair of the Blind Signal Processing Technical Committee
  • 2008 - 2010    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. Circuits and Systems-I
  • 1990 - 2010    IEEE Signal Processing Society  Member of the Audio and Acoustic Signal Processing Technical Committee
  • 2008 - 2009    2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays  Special Session Organizer
  • 2008 - 2009    2008 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2008 - 2009    2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays  Technical Co-Chair
  • 2008 - 2009    2008 Workshop on Statistical and Perceptual Audition  Co-Organizer
  • 2008 - 2009    2008 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2007 - 2009    IEICE  Guest Editor-in-Chief, special section of the IEICE Transactions
  • 2007 - 2008    2007 IEEE International Symposium on Circuits and Systems  Special Session Organizer
  • 2007 - 2008    IEICE  Vice President, Engineering Sciences Society
  • 2007 - 2008    2007 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2007 - 2008    Chair-Elect of the Blind Signal Processing Technical Committee
  • 2006 - 2008    IEICE  Guest Editor-in-Chief, special section of the IEICE Transactions
  • 2006 - 2007    2006 Asilomar Conference on Signals, Systems, and Computers  Special Session Organizer
  • 2006 - 2007    2006 European Signal Processing Conference  Special Session Organizer
  • 2006 - 2007    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Special Session Organizer
  • 2006 - 2007    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Special Session Organizer
  • 2006 - 2007    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Member of the International Program Committee
  • 2006 - 2007    2006 European Signal Processing Conference  Member of the Technical Program
  • 2006 - 2007    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Member of the Organizing Committee
  • 2006 - 2007    2006 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2006 - 2007    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Member of the Technical Committee
  • 2006 - 2007    2006 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2006 - 2007    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. Computers
  • 2006 - 2007    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Program Committee Chair
  • 2005 - 2007    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. ASLP
  • 2005 - 2006    2005 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2005 - 2006    2005 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Organizing Committee
  • 2005 - 2006    2005 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2004 - 2006    IEEE Circuits and Systems Society  Member of the Blind Signal Processing Technical Committee
  • Apr. 2003 - Mar. 2005    IEICE  Member, Technical Committee on Engineering Acoustics
  • 2004 - 2005    2004 International Congress on Acoustics  Special Session Organizer
  • 2004 - 2005    2004 IEEE International Conference on Acoustics, Speech and Signal Processing  Special Session Organizer
  • 2004 - 2005    2004 Workshop on Communication Scene Analysis  Program Chair
  • 2004 - 2005    2004 Workshop on Statistical and Perceptual Audio Processing  Member of the Technical Committee
  • 2004 - 2005    2004 International Congress on Acoustics  Member of the Program Committee
  • 2001 - 2005    Acoustical Society of Japan  Secretary (electroacoustics area), Paper Committee of the Journal of the Acoustical Society of Japan
  • 2003 - 2004    2003 IEEE International Workshop on Neural Networks for Signal Processing  Member of the Program Committee
  • 2003 - 2004    2003 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Program Committee
  • 2003 - 2004    2003 International Conference on Independent Component Analysis and Blind Signal Separation  Organizing Chair
  • Apr. 2001 - Mar. 2003    IEICE  Vice Chair, Technical Committee on Engineering Acoustics
  • 2002 - 2003    IEICE  Guest Editor, special section of the IEICE Transactions
  • 2002 - 2003    2002 China-Japan Joint Conference on Acoustics  Member of the Organizing Committee
  • 2002 - 2003    2002 IEEE International Workshop on Neural Networks for Signal Processing  Member of the Program Committee
  • 1999 - 2003    Institute of Electrical and Electronics Engineers (IEEE)  Senior Member
  • 2001 - 2002    2001 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • Apr. 1992 - Mar. 2001    IEICE  Member, Technical Committee on Engineering Acoustics
  • 1999 - 2000    1999 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 1995 - 1997    Acoustical Society of Japan  Member, Research Presentation Meeting Preparation Committee
  • Apr. 1990 - Mar. 1992    IEICE  Secretary, Technical Committee on Engineering Acoustics
  • 1990 - 1992    IEICE  Guest Editor, special section of the IEICE Transactions


Professional Memberships

  • Acoustical Society of Japan (ASJ)

  • Institute of Electronics, Information and Communication Engineers (IEICE)

  • APSIPA (Asia Pacific Signal and Information Processing Association)

  • ISCA (International Speech Communication Association)

  • EURASIP (European Association for Signal Processing)

  • IEEE (Institute of Electrical and Electronics Engineers)


Research Areas

  • Perceptual information processing / Intelligent robotics / Intelligent informatics

Research Keywords

  • Media Information Processing

  • Digital Signal Processing

  • Acoustic Signal Processing


Awards

  • Hoko Prize (報公賞), Hattori Hokokai Foundation
    Oct. 2018   Hattori Hokokai Foundation   Recipient: Shoji Makino

  • IEICE Distinguished Achievement and Contributions Award (功績賞)
    Jun. 2018   IEICE   Recipient: Shoji Makino

  • Acoustical Society of Japan Paper Award
    Mar. 2018   Acoustical Society of Japan   Recipient: Shoji Makino

  • IEICE Achievement Award (業績賞)
    Jun. 2017   IEICE   Recipient: Shoji Makino

  • Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prize for Science and Technology, Research Category)
    Apr. 2015   Recipient: Shoji Makino

  • Telecom System Technology Award
    Mar. 2015   Telecommunications Advancement Foundation   Recipient: Shoji Makino

  • IEEE Signal Processing Society Best Paper Award
    Jan. 2014   IEEE Signal Processing Society   Recipient: Shoji Makino

  • Distinguished Lecturer
    Jan. 2009   IEEE   Recipient: Shoji Makino

  • Fellow
    Sep. 2007   IEICE   Recipient: Shoji Makino

  • MLSP Competition Award
    Aug. 2007   IEEE   Recipient: Shoji Makino

  • Best Presentation Award at the SPIE Defense and Security Symposium
    Apr. 2006   SPIE   Recipient: Shoji Makino

  • ICA Unsupervised Learning Pioneer Award
    Apr. 2006   SPIE   Recipient: Shoji Makino

  • Paper Award
    May 2005   IEICE   Recipient: Shoji Makino

  • TELECOM System Technology Award
    Mar. 2004   Telecommunications Advancement Foundation   Recipient: Shoji Makino

  • Fellow
    Jan. 2004   IEEE   Recipient: Shoji Makino

  • Best Paper Award of the International Workshop on Acoustic Echo and Noise Control
    Sep. 2003   Recipient: Shoji Makino

  • Paper Award
    May 2002   IEICE   Recipient: Shoji Makino

  • Paper Award
    Mar. 2002   ASJ   Recipient: Shoji Makino

  • Achievement Award
    May 1997   IEICE   Recipient: Shoji Makino

  • Outstanding Technological Development Award
    May 1995   ASJ   Recipient: Shoji Makino

  • IEEE Signal Processing Society Notable Services and Contributions Award

    2019   IEEE Signal Processing Society   Recipient: Shoji Makino

  • IEEE Signal Processing Society Chapter Leadership Award

    Dec. 2018   IEEE Signal Processing Society   Recipient: Shoji Makino

  • Best Faculty Member Award of the University of Tsukuba

    Feb. 2016   Recipient: Shoji Makino

  • IEEE Signal Processing Society Outstanding Service Award

    Dec. 2014   IEEE Signal Processing Society   Recipient: Shoji Makino


 

Papers

  • Time-Frequency-Bin-Wise Linear Combination of Beamformers for Distortionless Signal Enhancement.

    Kouei Yamaoka, Nobutaka Ono, Shoji Makino

    IEEE/ACM Transactions on Audio, Speech and Language Processing   29   3461 - 3475  2021年

    DOI / Scopus (cited by 12)
  • Multichannel Signal Enhancement Algorithms for Assisted Listening Devices

    Simon Doclo, Walter Kellermann, Shoji Makino, Sven Nordholm

    IEEE SIGNAL PROCESSING MAGAZINE   32 ( 2 ) 18 - 30  2015年03月  [査読有り]

    DOI / Scopus (cited by 181)
  • Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   19 ( 3 ) 516 - 527  2011年03月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper presents a blind source separation method for convolutive mixtures of speech/audio sources. The method can even be applied to an underdetermined case where there are fewer microphones than sources. The separation operation is performed in the frequency domain and consists of two stages. In the first stage, frequency-domain mixture samples are clustered into each source by an expectation-maximization (EM) algorithm. Since the clustering is performed in a frequency bin-wise manner, the permutation ambiguities of the bin-wise clustered samples should be aligned. This is solved in the second stage by using the probability on how likely each sample belongs to the assigned class. This two-stage structure makes it possible to attain a good separation even under reverberant conditions. Experimental results for separating four speech signals with three microphones under reverberant conditions show the superiority of the new method over existing methods. We also report separation results for a benchmark data set and live recordings of speech mixtures.

    DOI / Scopus (cited by 312)
  • Frequency-Domain Pearson Distribution Approach for Independent Component Analysis (FD-Pearson-ICA) in Blind Source Separation

    Hiroko Kato Solvang, Yuichi Nagahara, Shoko Araki, Hiroshi Sawada, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   17 ( 4 ) 639 - 649  2009年05月  [査読有り]

     Abstract:

    In frequency-domain blind source separation (BSS) for speech with independent component analysis (ICA), a practical parametric Pearson distribution system is used to model the distribution of frequency-domain source signals. ICA adaptation rules have a score function determined by an approximated signal distribution. Approximation based on the data may produce better separation performance than we can obtain with ICA. Previously, conventional hyperbolic tangent (tanh) or generalized Gaussian distribution (GGD) was uniformly applied to the score function for all frequency bins, even though a wideband speech signal has different distributions at different frequencies. To deal with this, we propose modeling the signal distribution at each frequency by adopting a parametric Pearson distribution and employing it to optimize the separation matrix in the ICA learning process. The score function is estimated by the appropriate Pearson distribution parameters for each frequency bin. We devised three methods for Pearson distribution parameter estimation and conducted separation experiments with real speech signals convolved with actual room impulse responses (T(60) = 130 ms). Our experimental results show that the proposed frequency-domain Pearson-ICA (FD-Pearson-ICA) adapted well to the characteristics of frequency-domain source signals. By applying the FD-Pearson-ICA performance, the signal-to-interference ratio significantly improved by around 2-3 dB compared with conventional nonlinear functions. Even if the signal-to-interference ratio (SIR) values of FD-Pearson-ICA were poor, the performance based on a disparity measure between the true score function and estimated parametric score function clearly showed the advantage of FD-Pearson-ICA. Furthermore, we confirmed the optimum of the proposed approach for/optimized the proposed approach as regards separation performance. By combining individual distribution parameters directly estimated at low frequency with the appropriate parameters optimized at high frequency, it was possible to both reasonably improve the FD-Pearson-ICA performance without any significant increase in the computational burden by comparison with conventional nonlinear functions.

    DOI / Scopus (cited by 16)
  • Grouping separated frequency components by estimating propagation model parameters in frequency-domain blind source separation

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1592 - 1604  2007年07月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency.(T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.

    DOI / Scopus (cited by 108)
  • Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures

    Scott C. Douglas, Malay Gupta, Hiroshi Sawada, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1511 - 1520  2007年07月  [査読有り]

     Abstract:

    This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvarinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary, constraints on the multichannel separation filter. The techniques converge quickly to a separation solution without any step size selection or divergence difficulties, and unlike other methods, ours do not require special coefficient initialization procedures to obtain good separation performance. They also allow for the efficient reconstruction of individual signals as observed in the sensor measurements directly from the system parameters for single-input multiple-output blind source separation tasks. An analysis of one of the adaptive constraint procedures shows its fast convergence to a paraunitary filter bank solution. Numerical evaluations of the proposed algorithms and comparisons with several existing convolutive blind source separation techniques indicate the excellent relative performance of the proposed methods.

    DOI / Scopus (cited by 70)
  • Geometrically constrained independent component analysis

    Mirko Knaak, Shoko Araki, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 2 ) 715 - 726  2007年02月  [査読有り]

     Abstract:

    Acoustical signals are often corrupted by other speeches, sources, and background noise. This makes it necessary to use some form of preprocessing so that signal processing systems such as a speech recognizer or machine diagnosis can be effectively employed. In this contribution, we introduce and evaluate a new algorithm that uses independent component analysis (ICA) with a geometrical constraint [constrained ICA (CICA)]. It is based on the fundamental similarity between an adaptive beamformer and blind source separation with ICA, and does not suffer the permutation problem of ICA-algorithms. Unlike conventional ICA algorithms, CICA needs prior knowledge about the rough direction of the target signal. However, it is more robust against an erroneous estimation of the target direction than adaptive beamformers: CICA converges to the right solution as long as its look direction is closer to the target signal than to the jammer signal. A high degree of robustness is very important since the geometrical prior of an adaptive beamformer is always roughly estimated in a reverberant environment, even when the look direction is precise. The effectiveness and robustness of the new algorithms is proven theoretically, and shown experimentally for three sources and three microphones with several sets of real-world data.

    DOI / Scopus (cited by 44)
  • Blind extraction of dominant target sources using ICA and time-frequency masking

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   14 ( 6 ) 2165 - 2173  2006年11月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to sensors, to have dominant powers at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new sophisticated method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room, whose reverberation time was 130 ms, are presented to show the effectiveness and characteristics of the proposed method.

    DOI / Scopus (cited by 92)
  • Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters

    Scott C. Douglas, Hiroshi Sawada, Shoji Makino

    IEEE Transactions on Speech and Audio Processing   13 ( 1 ) 92 - 104  2005年01月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as convolutive blind source separation and multichannel blind deconvolution. When developing practical implementations of such methods, however, it is not clear how best to window the signals and truncate the filter impulse responses within the filtered gradient updates. In this paper, we show how inadequate use of truncation of the filter impulse responses and signal windowing within a well-known natural gradient algorithm for multichannel blind deconvolution and source separation can introduce a bias into its steady-state solution. We then provide modifications of this algorithm that effectively mitigate these effects for estimating causal FIR solutions to single- and multichannel equalization and source separation tasks. The new multichannel blind deconvolution algorithm requires approximately 6.5 multiply/adds per adaptive filter coefficient, making its computational complexity about 63% greater than the originally-proposed version. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for real-world acoustic sources, even for extremely-short separation filters.

    DOI / Scopus (cited by 79)
  • A robust and precise method for solving the permutation problem of frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   12 ( 5 ) 530 - 538  2004年09月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin should be aligned so that a separated signal in the time-domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even for low frequencies where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.

    DOI / Scopus (cited by 443)
  • The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech

    S Araki, R Mukai, S Makino, T Nishikawa, H Saruwatari

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   11 ( 2 ) 109 - 116  2003年03月  [査読有り]

     Abstract:

    Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good. enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T > P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size. that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for. the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, they mainly remove the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.

    DOI / Scopus (cited by 223)
  • Common-acoustical-pole and zero modeling of head-related transfer functions

    Y Haneda, S Makino, Y Kaneda, N Kitawaki

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   7 ( 2 ) 188 - 196  1999年03月  [査読有り]

     Abstract:

    Use of a common-acoustical-pole and zero model is proposed for modeling head-related transfer functions (HRTF's) for various directions of sound incidence. The HRTF's are expressed using the common acoustical poles, which do not depend on the source directions, and the zeros, which do, The common acoustical poles are estimated as they are common to HRTF's for various source directions; the estimated values of the poles agree well with the resonance frequencies of the ear canal. Because this model uses only the zeros to express the HRTF variations due to changes in source direction, it requires fewer parameters (the order of the zeros) that depend on the source direction than do the conventional all zero or pole/zero models. Furthermore, the proposed model can extract the zeros that are missed in the conventional models because of pole-zero cancellation. As a result, the directional dependence of the zeros can be traced well. Analysis of the zeros for HRTF's on the horizontal plane showed that the nonminimum-phase zero variation was well formulated using a simple pinna-reflection model, The common-acoustical-pole and zero (CAPZ) model is thus effective for modeling and analyzing HRTF's.

  • A block exact fast affine projection algorithm

    M Tanaka, S Makino, J Kojima

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   7 ( 1 ) 79 - 86  1999年01月  [査読有り]

     Abstract:

    This paper describes a block (affine) projection algorithm that has exactly the same convergence rate as the original sample-by-sample algorithm and smaller computational complexity than the fast affine projection algorithm. This is achieved by 1) introducing a correction term that compensates for the filter output difference between the sample-by-sample projection algorithm and the straightforward block projection algorithm, and 2) applying a fast finite impulse response (FIR) filtering technique to compute filter outputs and to update the filter.
    We describe how to choose a pair of block lengths that gives the longest filter length under a constraint on the total computational complexity and processing delay. An example shows that the filter length can be doubled if a delay of a few hundred samples is permissible.

  • The past, present, and future of audio signal processing

    T Chen, GW Elko, SJ Elliot, S Makino, JM Kates, M Bosi, JO Smith, M Kahrs

    IEEE SIGNAL PROCESSING MAGAZINE   14 ( 5 ) 30 - 57  1997年09月  [査読有り]

  • Common Acoustical Pole and Zero Modeling of Room Transfer Functions

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   2 ( 2 ) 320 - 328  1994年04月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    A new model for a room transfer function (RTF) by using common acoustical poles that correspond to resonance properties of a room is proposed. These poles are estimated as the common values of many RTF's corresponding to different source and receiver positions. Since there is one-to-one correspondence between poles and AR coefficients, these poles are calculated as common AR coefficients by two methods: i) using the least squares method, assuming all the given multiple RTF's have the same AR coefficients and ii) averaging each set of AR coefficients estimated from each RTF. The estimated poles agree well with the theoretical poles when estimated with the same order as the theoretical pole order. When estimated with a lower order than the theoretical pole order, the estimated poles correspond to the major resonance frequencies, which have high Q factors. Using the estimated common AR coefficients, the proposed method models the RTF's with different MA coefficients. This model is called the common-acoustical-pole and zero (CAPZ) model, and it requires far fewer variable parameters to represent RTF's than the conventional all-zero or pole/zero model. This model was used for an acoustic echo canceller at low frequencies, as one example. The acoustic echo canceller based on the proposed model requires half the variable parameters and converges 1.5 times faster than one based on the all-zero model, confirming the efficiency of the proposed model.

  • Exponentially Weighted Stepsize NLMS Adaptive Filter Based on the Statistics of a Room Impulse Response

    Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   1 ( 1 ) 101 - 108  1993年01月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper proposes a new normalized least-mean-squares (NLMS) adaptive algorithm with double the convergence speed, at the same computational load, of the conventional NLMS for an acoustic echo canceller. This algorithm, called the ES (exponentially weighted stepsize) algorithm, uses a different stepsize (feedback constant) for each weight of an adaptive transversal filter. These stepsizes are time-invariant and weighted proportional to the expected variation of a room impulse response. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. A transition formula is derived for the mean-squared coefficient error of the proposed algorithm. The mean stepsize determines the convergence condition, the convergence speed, and the final excess mean-squared error. The algorithm is modified for a practical multiple DSP structure, so that it requires only the same amount of computation as the conventional NLMS. The algorithm is implemented in a commercial acoustic echo canceller and its fast convergence is demonstrated.

    DOI / CiNii / Scopus (cited by 107)
  • Wavelength-Proportional Interpolation and Extrapolation of Virtual Microphone for Underdetermined Speech Enhancement

    Ryoga Jinzai, Kouei Yamaoka, Shoji Makino, Nobutaka Ono, Mitsuo Matsumoto, Takeshi Yamada

    APSIPA Transactions on Signal and Information Processing   12 ( 3 )  2023年

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    We previously proposed the virtual microphone technique to improve speech enhancement performance in underdetermined situations, in which the number of channels is virtually increased by estimating extra microphone signals at arbitrary positions along the straight line formed by real microphones. The effectiveness of the interpolation of virtual microphone signals for speech enhancement was experimentally confirmed. In this work, we apply the extrapolation of a virtual microphone as preprocessing of the maximum signal-to-noise ratio (SNR) beamformer and compare its speech enhancement performance (the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR)) with that of using the interpolation of a virtual microphone. Furthermore, we aim to improve speech enhancement performance by solving a trade-off relationship between performance at low and high frequencies, which can be controlled by adjusting the virtual microphone interval. We propose a new arrangement where a virtual microphone is placed at a distance from the reference real microphone proportional to the wavelength at each frequency. From the results of our experiment in an underdetermined situation, we confirmed speech enhancement performance using the extrapolation of a virtual microphone is higher than that of using the interpolation of a virtual microphone. Moreover, the proposed wavelength-proportional interpolation and extrapolation method improves speech enhancement performance compared with the interpolation and extrapolation. Furthermore, we present the directivity patterns of a spatial filter and confirmed the behavior that improves speech enhancement performance.

    DOI / Scopus

  • Low latency online blind source separation based on joint optimization with blind dereverberation

    Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2021-   506 - 510  2021年

     Abstract:

    This paper presents a new low-latency online blind source separation (BSS) algorithm. Although algorithmic delay of a frequency domain online BSS can be reduced simply by shortening the short-time Fourier transform (STFT) frame length, it degrades the source separation performance in the presence of reverberation. This paper proposes a method to solve this problem by integrating BSS with Weighted Prediction Error (WPE) based dereverberation. Although a simple cascade of online BSS after online WPE upgrades the separation performance, the overall optimality is not guaranteed. Instead, this paper extends a recently proposed batch processing algorithm that can jointly optimize dereverberation and separation so that it can perform online processing with low computational cost and little processing delay (< 12 ms). The results of a source separation experiment in a noisy car environment suggest that the proposed online method has better separation performance than the simple cascaded methods.

    DOI / Scopus (cited by 13)
  • SepNet: A deep separation matrix prediction network for multichannel audio source separation

    Shota Inoue, Hirokazu Kameoka, Li Li, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2021-   191 - 195  2021年

     Abstract:

    In this paper, we propose SepNet, a deep neural network (DNN) designed to predict separation matrices from multichannel observations. One well-known approach to blind source separation (BSS) involves independent component analysis (ICA). A recently developed method called independent low-rank matrix analysis (ILRMA) is one of its powerful variants. These methods allow the estimation of separation matrices based on deterministic iterative algorithms. Specifically, ILRMA is designed to update the separation matrix according to an update rule derived based on the majorization-minimization principle. Although ILRMA performs reasonably well under some conditions, there is still room for improvement in terms of both separation accuracy and computation time, especially for large-scale microphone arrays. The existence of a deterministic iterative algorithm that can find one of the stationary points of the BSS problem implies that a DNN can also play that role if designed and trained properly. Motivated by this, we propose introducing a DNN that learns to convert a predefined input (e.g., an identity matrix) into a true separation matrix in accordance with a multichannel observation. To enable it to find one of the multiple solutions corresponding to different permutations of the source indices, we further propose adopting a permutation invariant training strategy to train the network. By using a fully convolutional architecture, we can design the network so that the forward propagation can be computed efficiently. The experimental results revealed that SepNet was able to find separation matrices faster and with better separation accuracy than ILRMA for mixtures of two sources.

    DOI / Scopus (cited by 3)
  • Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis.

    Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

    APSIPA ASC     578 - 584  2021年

  • Speech emotion recognition based on attention weight correction using word-level confidence measure

    Jennifer Santoso, Takeshi Yamada, Shoji Makino, Kenkichi Ishizuka, Takekatsu Hiramura

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   1   301 - 305  2021年

     Abstract:

    Emotion recognition is essential for human behavior analysis and possible through various inputs such as speech and images. However, in practical situations, such as in call center analysis, the available information is limited to speech. This leads to the study of speech emotion recognition (SER). Considering the complexity of emotions, SER is a challenging task. Recently, automatic speech recognition (ASR) has played a role in obtaining text information from speech. The combination of speech and ASR results has improved the SER performance. However, ASR results are highly affected by speech recognition errors. Although there is a method to improve ASR performance on emotional speech, it requires the fine-tuning of ASR, which is costly. To mitigate the errors in SER using ASR systems, we propose the use of the combination of a self-attention mechanism and a word-level confidence measure (CM), which indicates the reliability of ASR results, to reduce the importance of words with a high chance of error. Experimental results confirmed that the combination of self-attention mechanism and CM reduced the effects of incorrectly recognized words in ASR results, providing a better focus on words that determine emotion recognition. Our proposed method outperformed the stateof- the-art methods on the IEMOCAP dataset.

    DOI / Scopus (cited by 15)
  • Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

    Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA ASC 2020     858 - 862  2020年12月  [査読有り]

  • Applying virtual microphones to triangular microphone array in in-car communication

    Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA ASC 2020     421 - 425  2020年12月  [査読有り]

  • Determined audio source separation with multichannel star generative adversarial network

    Li Li, Hirokazu Kameoka, Shoji Makino

    IEEE International Workshop on Machine Learning for Signal Processing, MLSP   2020-  2020年09月

     Abstract:

    This paper proposes a multichannel source separation approach, which uses a star generative adversarial network (StarGAN) to model power spectrograms of sources. Various studies have shown the significant contributions of a precise source model to the performance improvement in audio source separation, which indicates the importance of developing a better source model. In this paper, we explore the potential of StarGAN for modeling source spectrograms and investigate the effectiveness of the StarGAN source model in determined multichannel source separation by incorporating it into a frequency-domain independent component analysis (ICA) framework. The experimental results reveal that the proposed StarGAN-based method outperformed conventional methods that use non-negative matrix factorization (NMF) or a variational autoencoder (VAE) for source spectrogram modeling.

    DOI / Scopus (cited by 9)
  • DNNマスク推定に基づく畳み込みビームフォーマによる音源分離・残響除去・雑音除去の同時実現

    髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

    音講論集   3-1-9   285 - 288  2020年03月

  • 基底共有型半教師あり独立低ランク行列分析に基づく多チャネル補聴器システム

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    音講論集   1-1-22   217 - 220  2020年03月

  • 発話の時間変動に着目した音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     957 - 958  2020年03月

  • 空間特徴と音響特徴を併用する音響イベント検出の検討

    陳, 軼夫, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     1027 - 1030  2020年03月

  • 車室内コミュニケーション用低遅延音源分離の検討

    上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

    日本音響学会春季研究発表会講演論文集     213 - 216  2020年03月

  • 空間フィルタの自動推定による音響シーン識別の検討

    大野, 泰己, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-6   113 - 113  2020年03月

  • Generative Adversarial Networks を用いた半教師あり学習の音響イベント検出への適用

    合馬, 一弥, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-7   114 - 114  2020年03月

  • Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

    Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'20     175 - 178  2020年02月  [査読有り]

  • Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

    Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

    Proc. NCSP'20     645 - 648  2020年02月  [査読有り]

  • Blind source separation with low-latency for in-car communication

    Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

    Proc. NCSP'20     167 - 170  2020年02月  [査読有り]

  • 多チャンネル変分自己符号化器法による任意話者の音源分離

    李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

    信学技報   EA2019-77   79 - 84  2019年12月

  • Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

    Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura‡, Daichi, Saruwatari, Hiroshi, Makino, Shoji

    Proc. APSIPA     1874 - 1879  2019年11月  [査読有り]

  • Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

    Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

    Proc. Annual Conference of the International Society for Music Information Retrieval     784 - 790  2019年11月

  • Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2019     302 - 306  2019年11月  [査読有り]

  • Supervised determined source separation with multichannel variational autoencoder

    Kameoka, Hirokazu, Li, Li, Inoue, Shota, Makino, Shoji

    Neural Computation   31 ( 9 ) 1891 - 1914  2019年09月  [査読有り]

  • Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Makino, Shoji

    Proc. International Congress on Acoustics     6988 - 6995  2019年09月  [査読有り]

  • Gated convolutional neural network-based voice activity detection under high-level noise environments

    Li, Li, Kouei, Yamaoka, Yuki, Koshino, Mitsuo, Matsumoto, Makino, Shoji

    Proc. International Congress on Acoustics     2862 - 2869  2019年09月  [査読有り]

  • ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    音講論集   1-1-3   161 - 164  2019年09月

  • Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

    Proc. EUSIPCO 2019    2019年09月  [査読有り]

  • BLSTMと変調スペクトルを用いた発話特徴識別の検討

    サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     917 - 928  2019年09月

  • BLSTMを用いた音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     921 - 924  2019年09月

  • CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

    Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. EUSIPCO 2019    2019年09月  [査読有り]

  • Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    Proc. ICASSP2019     546 - 550  2019年05月

  • Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

    Inoue, Shota, Kameoka, Hirokazu, Li, Li, Seki, Shogo, Makino, Shoji

    Proc. ICASSP2019     96 - 100  2019年05月  [査読有り]

  • Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. ICASSP 2019     7908 - 7912  2019年05月  [査読有り]

  • Experimental evaluation of WaveRNN predictor for audio lossless coding

    Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'19     315 - 318  2019年03月  [査読有り]

  • MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    電子情報通信学会技術研究報告(SP)   SIP2018-130   149 - 154  2019年03月

  • 日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

    臼井, 桃香, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-10   137 - 137  2019年03月

  • Gated CNNを用いた劣悪な雑音環境下における音声区間検出

    李莉, 越野ゆき, 松本光雄, 牧野, 昭二

    電子情報通信学会技術研究報告   EA2018-124   19 - 24  2019年03月

  • Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    Proc. NCSP'19     260 - 263  2019年03月  [査読有り]

  • Categorizing error causes related to utterance characteristics in speech recognition

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'19     514 - 517  2019年03月  [査読有り]

  • 多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

    井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

    音講論集   2-Q-32   399 - 402  2019年03月

  • Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. NCSP'19     264 - 267  2019年03月  [査読有り]

  • 時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

    髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

    日本音響学会2019年春季研究発表会講演論文集   1-6-5   181 - 184  2019年03月

  • 音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

    李, 莉, 亀岡, 弘和, 牧野, 昭二

    音講論集   1-6-10   201 - 204  2019年03月

  • Microphone position realignment by extrapolation of virtual microphone

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2018     367 - 372  2018年11月  [査読有り]

  • Weakly labeled learning using BLSTM-CTC for sound event detection

    Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2018     1918 - 1923  2018年11月  [査読有り]

  • WaveRNNを利用した音声ロスレス符号化に関する検討と考察

    天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   2-4-9   1149 - 1152  2018年09月

  • Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

    Makino,Shoji

    Proc. IWAENC2018     71 - 75  2018年09月  [査読有り]

  • ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

    陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   1-1-21   149 - 152  2018年09月

  • 時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会秋季研究発表会講演論文集   1-Q-12   407 - 410  2018年09月

  • Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

    Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

    Proc. EUSIPCO 2018     1596 - 1600  2018年09月  [査読有り]

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   1-R-5   961 - 964  2018年09月

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    Journal of Signal Processing   22 ( 4 ) 199 - 202  2018年07月  [査読有り]

     概要を見る

    Acoustic scene classification (ASC) classifies the place or situation where an acoustic sound was recorded. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge prepared a task involving ASC. Some methods using convolutional neural networks (CNNs) were proposed in the DCASE 2017 Challenge. The best method independently performed convolution operations for the left, right, mid (addition of left and right channels), and side (subtraction of left and right channels) input channels to capture spatial features. On the other hand, we propose a new method of spatial feature extraction using CNNs. In the proposed method, convolutions are performed for the time-space (channel) domain and frequency-space domain in addition to the time-frequency domain to capture spatial features. We evaluate the effectiveness of the proposed method using the task in the DCASE 2017 Challenge. The experimental results confirmed that convolution operations for the frequency-space domain are effective for capturing spatial features. Furthermore, by using a combination of the three domains, the classification accuracy was improved by 2.19% compared with that obtained using the time-frequency domain alone.
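
    As an illustration of the multi-domain convolution idea described above, the sketch below (PyTorch; not the authors' implementation) treats a stereo spectrogram as a (channel, frequency, time) volume and runs three convolution branches over the time-frequency, frequency-space and time-space planes before concatenating their pooled features. The layer sizes, the 15-class output and the global-average pooling are assumptions chosen for readability.

```python
# Hypothetical sketch (PyTorch), not the authors' code: three convolution branches
# over the time-frequency, frequency-space and time-space planes of a stereo
# spectrogram, with pooled features concatenated for classification.
import torch
import torch.nn as nn

class MultiDomainCNN(nn.Module):
    def __init__(self, n_channels=2, n_classes=15, n_filters=16):
        super().__init__()
        # The input is treated as a 3-D volume (channel, frequency, time);
        # each branch convolves over a different pair of axes.
        self.tf_conv = nn.Conv3d(1, n_filters, kernel_size=(1, 3, 3), padding=(0, 1, 1))           # time-frequency
        self.fs_conv = nn.Conv3d(1, n_filters, kernel_size=(n_channels, 3, 1), padding=(0, 1, 0))  # frequency-space
        self.ts_conv = nn.Conv3d(1, n_filters, kernel_size=(n_channels, 1, 3), padding=(0, 0, 1))  # time-space
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, spec):                        # spec: (batch, channel, freq, time)
        x = spec.unsqueeze(1)                       # -> (batch, 1, channel, freq, time)
        feats = []
        for conv in (self.tf_conv, self.fs_conv, self.ts_conv):
            h = torch.relu(conv(x))
            feats.append(h.mean(dim=(2, 3, 4)))     # global average pooling per branch
        return self.fc(torch.cat(feats, dim=1))     # class logits

logits = MultiDomainCNN()(torch.randn(4, 2, 64, 128))   # four stereo log-mel spectrograms
print(logits.shape)                                     # torch.Size([4, 15])
```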

  • 畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

    高橋, 玄, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     67 - 70  2018年03月

  • 複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会春季研究発表会講演論文集     475 - 478  2018年03月

  • Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

    Mae, Narumi, Yamaoka, Kouei, Mitsui, Yoshiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. NCSP'18     371 - 374  2018年03月  [査読有り]

  • 複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

    松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

    電子情報通信学会技術研究報告     335 - 340  2018年03月

  • 音声認識における誤認識原因通知のための印象評定値推定の検討

    後藤, 孝宏, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     117 - 120  2018年03月

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     63 - 66  2018年03月

  • Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

    Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'18     192 - 195  2018年03月  [査読有り]

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'18     188 - 191  2018年03月  [査読有り]

  • Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. NCSP'18     351 - 354  2018年03月  [査読有り]

  • Sound source localization using binaural difference for hose-shaped rescue robot

    Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. APSIPA 2017     1 - 7  2017年12月  [査読有り]

  • Abnormal sound detection by two microphones using virtual microphone technique

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA 2017     1 - 5  2017年12月  [査読有り]

  • Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic Features

    Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

    Proc. APSIPA 2017     1 - 5  2017年12月  [査読有り]

  • Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

    Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

    Proc. IEEE GCCE 2017     141 - 145  2017年10月  [査読有り]

  • 音響ロスレス符号化MPEG-4 ALSにおけるハイレゾ音源向け線形予測次数最適化に関する検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     251 - 254  2017年09月

  • 非同期マイクロホンアレーにおける伝達関数ゲイン基底非負値行列因子分解を用いた遠方音源抑圧

    村瀬, 慶和, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

    日本音響学会誌   73 ( 9 ) 563 - 570  2017年09月  [査読有り]

     概要を見る

    ビームフォーミングなどの従来のアレー信号処理による雑音抑圧手法は,位相情報を活用した指向性制御に基づいており,特定方向から到来する雑音に対しては指向性の零点を向けることで高い効果が得られる。しかし,到来方向が特定できないような,いわゆる背景雑音の抑圧は,一般に難しかった。本論文では,伝達関数ゲイン基底NMFにより,遠方から到来する雑音を複数マイクを用いて効果的に抑圧する手法を提案する。提案手法では,背景雑音が遠方から到来することを仮定し,時間周波数領域における振幅情報のみに着目することで,様々な方向から到来する遠方音源を一つの混合音源としてモデル化する。次にこの振幅の混合モデルを従来提案されている制約付き伝達関数ゲイン基底NMFに適用し,遠方音源の抑圧を行う。更に,半教師あり伝達関数ゲイン基底NMFを適用し,遠方音源の抑圧を行う。本手法は振幅情報のみを用いているため,非同期録音機器を用いることができる。
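
    The toy sketch below illustrates the amplitude-only factorization idea summarized above, assuming plain Euclidean multiplicative NMF updates on a time-channel amplitude matrix and a heuristic target mask; the constrained and semi-supervised transfer-function-gain NMF of the paper itself is not reproduced here, and the two-channel data and basis count are assumptions.

```python
# Toy sketch (assumptions, not the published algorithm): NMF on an amplitude
# time-channel matrix.  X[m, n] is the amplitude of channel m in frame n; each
# basis column of G is a vector of per-channel gains, H holds its activations.
import numpy as np

def tf_gain_nmf(X, n_bases=2, n_iter=200, eps=1e-12):
    M, N = X.shape
    rng = np.random.default_rng(0)
    G = rng.random((M, n_bases)) + eps        # channel-gain bases
    H = rng.random((n_bases, N)) + eps        # activations
    for _ in range(n_iter):                   # Euclidean multiplicative updates
        H *= (G.T @ X) / (G.T @ G @ H + eps)
        G *= (X @ H.T) / (G @ H @ H.T + eps)
    return G, H

# Toy data: channel 0 is assumed to be close to the target source.
rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(2, 100)))
G, H = tf_gain_nmf(X, n_bases=2)

# Heuristic soft mask that keeps the basis with the largest relative gain in channel 0.
target = np.argmax(G[0] / (G.sum(axis=0) + 1e-12))
mask = (G[:, [target]] @ H[[target], :]) / (G @ H + 1e-12)
enhanced = mask * X
print(enhanced.shape)   # (2, 100)
```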

  • Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    Proc. MLSP     1 - 6  2017年09月  [査読有り]

  • Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

    Proc. EUSIPCO 2017     2388 - 2392  2017年08月  [査読有り]

  • Multiple far noise suppression in a real environment using transfer-function-gain NMF

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    Proc. EUSIPCO 2017     2378 - 2382  2017年08月  [査読有り]

  • Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

    Kodama, Takumi, Makino, Shoji

    Proc. IEEE Engineering in Medicine & Biology Society (EMBC)     3814 - 3817  2017年07月  [査読有り]

  • 教師信号を用いた非同期分散型マイクロホンアレーによる音源分離

    坂梨, 龍太郎, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

    日本音響学会誌   73 ( 6 ) 337 - 348  2017年06月  [査読有り]

     概要を見る

    近年,独立に動作する複数の録音機器を用いた非同期マイクロホンアレーが検討されている。非同期マイクロホンアレーは多チャネルのA/D変換器を必要としないので,従来より安価であり,マイクロホンの配置を柔軟に行えるという利点がある。一方,各チャネルの信号が時間的に同期して録音されておらず,また異なるA/D変換器を用いているために,録音開始時刻オフセットやサンプリング周波数ミスマッチ(以下ではミスマッチパラメータと総称する)が生じてしまう。従来,ミスマッチパラメータを推定して時間同期補償を行うために幾つかの手法が提案されている。しかし,処理が複雑であるために長時間の録音に対しては多大な処理時間を要するという問題がある。そこで本論文では,高速な時間同期補償と高性能な音源分離を実現するために,ミスマッチパラメータ推定と音源分離の両方を共通の教師信号を用いて行う枠組みを提案する。教師信号には,ある話者,あるいはある音源のみが音を発している区間である単一音源区間の信号を用いる。提案法では,時間的に離れた位置にある二つの単一音源区間の信号を手がかりに,ミスマッチパラメータを推定して時間同期補償を行う。更に,単一音源区間の信号を音源分離の教師信号として用い,またマイクロホンの分散型配置が可能という特徴を活用するように,音源分離手法であるSN比最大化ビームフォーマとDuong法を拡張する。実験の結果,提案法により十分な精度での時間同期補償が可能であり,また高い音源分離性能が得られることを確認した。
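
    The following is a hedged sketch of one ingredient mentioned above, a maximum-SNR beamformer computed from "single-source" segments used as teacher signals. The STFT shapes, the toy random data and the diagonal loading are assumptions; the mismatch-parameter estimation and the Duong-method extension of the paper are not covered.

```python
# Hedged sketch: a maximum-SNR beamformer per frequency bin, computed from frames
# labelled as "target only" and "noise only" (the teacher-signal idea above).
# The toy data, frame labels and diagonal loading are assumptions.
import numpy as np
from scipy.linalg import eigh

def max_snr_beamformer(X, target_frames, noise_frames):
    F, T, M = X.shape
    Y = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Xs = X[f, target_frames, :]                  # single-source (target) frames
        Xn = X[f, noise_frames, :]                   # noise-only frames
        Rs = Xs.conj().T @ Xs / len(target_frames)   # target spatial covariance (M, M)
        Rn = Xn.conj().T @ Xn / len(noise_frames)    # noise spatial covariance (M, M)
        Rn += 1e-6 * np.eye(M)                       # diagonal loading for stability
        _, vecs = eigh(Rs, Rn)                       # generalized eigenvalue problem
        w = vecs[:, -1]                              # eigenvector of the largest eigenvalue
        Y[f] = X[f] @ w.conj()                       # filter-and-sum output
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(129, 200, 3)) + 1j * rng.normal(size=(129, 200, 3))   # (freq, frames, mics)
Y = max_snr_beamformer(X, target_frames=np.arange(0, 50), noise_frames=np.arange(150, 200))
print(Y.shape)   # (129, 200)
```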

  • 柔軟索状ロボットにおける独立低ランク行列分析と統計的音声強調に基づく高品質ブラインド音源分離の開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    日本機械学会ロボティクス・メカトロニクス講演会   1P2-P04   1 - 4  2017年05月

     概要を見る

    In this paper, we propose a novel blind source separation method for the hose-shaped rescue robot based on independent low-rank matrix analysis and statistical speech enhancement. The rescue robot is aimed to detect victims' speech in a disaster area, wearing multiple microphones around the body. Different from the common microphone array, the positions of microphones are unknown, and the conventional beamformer cannot be utilized. In addition, the vibration noise (ego-noise) is generated when the robot moves, yielding the serious contamination in the observed signals. Therefore, it is important to eliminate the ego-noise in this system. This paper describes our newly developed software and hardware system of blind source separation for the robot noise reduction. Also, we report objective and subjective evaluation results showing that the proposed system outperforms the conventional methods in the source separation accuracy and perceptual sound quality via experiments with actual sounds observed in the rescue robot.

  • DNN-GMMと連結特徴量を用いた音響シーン識別の検討

    高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-P-1   135 - 138  2017年03月

  • 補助関数法による識別的NMFの基底学習アルゴリズム

    李莉, 亀岡弘和, 牧野昭二

    日本音響学会2017年春季研究発表会   1-P-4   519 - 522  2017年03月

  • 独立低ランク行列分析と統計的音声強調を用いた柔軟索状ロボットにおけるブラインド音源分離システムの開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    日本音響学会2017年春季研究発表会   1-P-3   517 - 518  2017年03月

  • SJ-CATにおける項目応答理論に基づく能力値推定の精度改善

    小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-6-3   247 - 250  2017年03月

  • 音響ロスレス符号化MPEG-4 ALSのハイレゾ音源適応の検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-P-42   381 - 382  2017年03月

  • Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

    Kodama, T, Makino, Shoji

    Proc. Biomedical and Health Informatics (BHI)     1 - 1  2017年02月  [査読有り]

  • Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

    Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno

    Journal of Robotics and Mechatronics   29 ( 1 ) 198 - 212  2017年02月

     概要を見る

    This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphoneequipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent lowrank matrix analysis. The two enhancement methods are integrated with Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

  • DISCRIMINATIVE NON-NEGATIVE MATRIX FACTORIZATION WITH MAJORIZATION-MINIMIZATION

    Li Li, Hirokazu Kameoka, Shoji Makino

    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017)     141 - 145  2017年  [査読有り]

     概要を見る

    Non-negative matrix factorization (NMF) is a powerful approach to single channel audio source separation. In a supervised setting, NMF is first applied to train the basis spectra of each sound source. At test time, NMF is applied to the spectrogram of a mixture signal using the pretrained spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra of each source is to minimize the objective function of NMF. However, the basis spectra obtained in this way do not ensure that the separated signal will be optimal at test time due to the inconsistency between the objective functions for training and separation (Wiener filtering). To address this, a framework called discriminative NMF (DNMF) has recently been proposed. In this work a multiplicative update algorithm was proposed for the basis training, however one drawback is that the convergence is not guaranteed. To overcome this drawback, this paper proposes using a majorization-minimization principle to develop a convergence-guaranteed algorithm for DNMF. Experimental results showed that the proposed algorithm outperformed standard NMF and DNMF using a multiplicative update algorithm as regards both the signal-to-distortion and signal-to-interference ratios.
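
    For context, the sketch below shows the standard supervised-NMF pipeline that the abstract contrasts with discriminative NMF: basis spectra are pretrained per source with KL-divergence multiplicative updates, only the activations are fitted to the mixture at test time, and the sources are separated with a Wiener filter. The toy spectrograms, basis count and iteration budget are assumptions; the majorization-minimization DNMF algorithm itself is not reproduced.

```python
# Illustrative baseline only: supervised NMF with KL multiplicative updates and
# Wiener-filter separation.  Toy magnitude spectrograms and basis counts are assumptions.
import numpy as np

def nmf_kl(V, W=None, K=8, n_iter=100, eps=1e-12, update_W=True):
    F, N = V.shape
    rng = np.random.default_rng(0)
    if W is None:
        W = rng.random((F, K)) + eps
    H = rng.random((W.shape[1], N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        if update_W:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
S1 = np.abs(rng.normal(size=(257, 300)))     # "training" spectrogram of source 1
S2 = np.abs(rng.normal(size=(257, 300)))     # "training" spectrogram of source 2

W1, _ = nmf_kl(S1, K=8)                      # train basis spectra per source
W2, _ = nmf_kl(S2, K=8)

V = S1 + S2                                  # mixture spectrogram at test time
W = np.concatenate([W1, W2], axis=1)
_, H = nmf_kl(V, W=W, update_W=False)        # fit activations only

V1_hat, V2_hat = W1 @ H[:8], W2 @ H[8:]      # per-source model spectrograms
S1_est = V * V1_hat / (V1_hat + V2_hat + 1e-12)   # Wiener-style mask
print(S1_est.shape)   # (257, 300)
```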

  • Blind source separation and multi-talker speech recognition with ad hoc microphone array using smartphones and cloud storage

    越智景子, 小野順貴, 宮部滋樹, 牧野昭二

    Acoustical Science and Technology    2017年  [査読有り]

  • Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

    Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2017-   1998 - 2002  2017年  [査読有り]

     概要を見る

    Spectral domain speech enhancement algorithms based on nonnegative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy, however they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-To-distortion ratio and the cepstral distance.

  • Full-body tactile P300-based brain-computer interface accuracy refinement

    Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

    Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART)     1 - 4  2016年12月  [査読有り]

  • 伝達関数ゲイン基底NMFを用いた遠方雑音抑圧の実環境での評価

    松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

    第31回信号処理シンポジウム   B3-1   231 - 235  2016年11月

  • Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

    Saruwatari, H, Takata, K, Ono, N, Makino, Shoji

    International Congress on Acoustics     1 - 10  2016年09月  [査読有り]

  • Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji, Ono, Nobutaka

    DCASE2016 Challenge     1 - 2  2016年09月

  • 雑音下音声認識における必要発話音量提示機能の実装と評価

    後藤,孝宏, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会   3-Q-12   117 - 120  2016年09月

  • ヴァーチャル多素子化に基づくSN比最大化ビームフォーマの残響に対する性能変化

    山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会   3-7-5   379 - 382  2016年09月

  • Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

    Kodama, T, Makino, Shoji, Rutkowski, T

    Proc. AEARU Young Researchers International Conference     1 - 4  2016年09月  [査読有り]

  • 日本語スピーキングテストSJ-CATにおける項目応答理論に基づく能力値推定の検証

    小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

    日本音響学会秋季研究発表会   3-Q-26   253 - 256  2016年09月

  • Amplitude-based speech enhancement with non-negative matrix factorization in time-channel domain for ad-hoc microphone array

    Chiba, H., Ono, N., Miyabe, S., Takahashi, Y., Yamada, T., Makino, S.

    J. Acoust. Soc. Jpn   72 ( 8 ) 462 - 470  2016年08月  [査読有り]

  • アドホックマイクロホンアレーにおける時間チャネル領域での非負値行列因子分解を用いた振幅ベースの音声強調

    千葉大将, 小野順貴, 宮部滋樹, 高橋祐, 山田武志, 牧野昭二

    日本音響学会誌   72 ( 8 ) 462 - 470  2016年08月  [査読有り]

     概要を見る

    本論文では,時間チャネル領域の非負値行列因子分解(NMF)による,非同期分散型録音の目的音強調手法について述べる。複数の録音機器による多チャネル信号は,機器ごとのサンプリング周波数の微小なずれが引き起こす位相差のドリフトのため,位相情報を用いるアレー信号処理は適さない。位相に比べると振幅の分析はドリフトの影響を大きく受けないことに着目し,戸上らが提案した時間チャネル領域のNMFによるチャネル間ゲイン差の分析(伝達関数ゲイン基底NMF)に基づく時間周波数マスクを用いる。また,基底数よりも十分大きなチャネル数が得られない条件の音声強調のための,基底を事前に学習する教師ありNMFについて議論する。

  • 音声のスペクトル領域とケプストラム領域における同時強調

    李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

    信学技報   SP2016-32   29 - 32  2016年08月

  • An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping

    Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Biing-Hwang Juang

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 6 ) 1152 - 1162  2016年06月  [査読有り]

     概要を見る

    MUltiple Signal Classification (MUSIC) is a standard technique for direction of arrival (DOA) estimation with high resolution. However, MUSIC cannot estimate DOAs accurately in the case of underdetermined conditions, where the number of sources exceeds the number of microphones. To overcome this drawback, an extension of MUSIC using cumulants called 2q-MUSIC has been proposed, but this method greatly suffers from the variance of the statistics, given as the temporal mean of the observation process, and requires long observation. In this paper, we propose a new approach for extending MUSIC that exploits higher-order moments of the signal for the underdetermined DOA estimation with smaller variance. We propose an estimation algorithm that nonlinearly maps the observed signal onto a space with expanded dimensionality and conducts MUSIC-based correlation analysis in the expanded space. Since the dimensionality of the noise subspace is increased by the mapping, the proposed method enables the estimation of DOAs in the case of underdetermined conditions. Furthermore, we describe the class of mapping that allows us to analyze the higher-order moments of the observed signal in the original space. We compare 2q-MUSIC and the proposed method through an experiment assuming that the true number of sources is known as prior information to evaluate in terms of the bias-variance tradeoff of the statistics and computational complexity. The results clarify that the proposed method has advantages for both computational complexity and estimation accuracy in short-time analysis, i.e., the time duration of the analyzed data is short.
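
    As background for the extension above, the following is a minimal sketch of standard narrowband MUSIC with a uniform linear array: eigendecompose the spatial correlation matrix, keep the noise subspace, and scan steering vectors for pseudospectrum peaks. The half-wavelength six-microphone geometry and the toy two-source data are assumptions.

```python
# Minimal narrowband MUSIC sketch with a uniform linear array (assumed geometry).
import numpy as np

def music_spectrum(X, n_sources, d_over_lambda=0.5):
    M, N = X.shape                                   # microphones x snapshots
    R = X @ X.conj().T / N                           # spatial correlation matrix
    _, vecs = np.linalg.eigh(R)                      # eigenvalues in ascending order
    En = vecs[:, : M - n_sources]                    # noise subspace
    angles = np.linspace(-90, 90, 181)
    p = np.zeros(len(angles))
    for i, theta in enumerate(np.deg2rad(angles)):
        a = np.exp(-2j * np.pi * d_over_lambda * np.arange(M) * np.sin(theta))  # steering vector
        p[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)                    # pseudospectrum
    return angles, p

# Two far-field sources at -20 and 40 degrees, six microphones, half-wavelength spacing.
rng = np.random.default_rng(0)
M, N = 6, 500
A = np.exp(-2j * np.pi * 0.5 * np.outer(np.arange(M), np.sin(np.deg2rad([-20, 40]))))
S = rng.normal(size=(2, N)) + 1j * rng.normal(size=(2, N))
X = A @ S + 0.1 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))

angles, p = music_spectrum(X, n_sources=2)
print(angles[np.argmax(p)])   # strongest pseudospectrum peak, near one true DOA
```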

  • 独立ベクトル分析とノイズキャンセラを用いた雑音抑圧の柔軟索状ロボットへの適用

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    日本機械学会ロボティクス・メカトロニクス講演会2016   1P1-08b3   1 - 4  2016年06月

     概要を見る

    This paper presents a noise reduction on a hose-shaped rescue robot. The hose-shaped rescue robot is one of rescue robots developed on Tough Robotics Challenge, and it is used for searching for victims by getting one's voice with its microphone-array. However, the ego noise, caused by its vibration motors, makes it difficult to get the human voice. We propose a noise reduction method using a blind source separation technique based on Independent Vector Analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego-noise signal from observed multi-channel signals using the IVA-based blind source separation technique, and (2) applying the noise cancellation to the estimated speech signal using the estimated ego-noise signal as a noise reference.
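
    The sketch below illustrates only step (2) of the method described above, using a normalized LMS adaptive filter as a simple stand-in for the noise canceller; the IVA-based separation of step (1) is not reproduced, and the filter length, step size and toy ego-noise path are assumptions.

```python
# Stand-in sketch of step (2): a normalized LMS noise canceller that subtracts the
# ego-noise estimate from the separated speech, given a noise reference.
import numpy as np

def nlms_noise_canceller(primary, noise_ref, n_taps=64, mu=0.2, eps=1e-8):
    w = np.zeros(n_taps)                       # adaptive FIR coefficients
    buf = np.zeros(n_taps)                     # most recent noise-reference samples
    out = np.zeros_like(primary)
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = noise_ref[n]
        e = primary[n] - w @ buf               # error = enhanced speech sample
        w += mu * e * buf / (buf @ buf + eps)  # NLMS update
        out[n] = e
    return out

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))                 # toy "speech"
noise_ref = rng.normal(size=8000)                                        # ego-noise reference
leaked = np.convolve(noise_ref, rng.normal(scale=0.3, size=16))[:8000]   # noise at the speech channel
enhanced = nlms_noise_canceller(speech + leaked, noise_ref)
print(np.std(leaked), np.std(enhanced[4000:] - speech[4000:]))           # residual noise shrinks
```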

  • ランク1 空間モデル制約付き多チャネルNMFを用いた柔軟索状ロボットにおける雑音抑圧

    高草木萌, 北村大地, 小野順貴, 山田武志, Makino, Shoji, 猿渡洋

    日本機械学会ロボティクス・メカトロニクス講演会   1A2-10a3   1 - 4  2016年06月

     概要を見る

    A hose-shaped rescue robot is one of the robots that are developed for disaster response in case of a large-scale disasters such as a great earthquake. The robot is suitable for entering narrow and dark places covered with rubble in the disaster site, and for finding inside it. This robot can transmit the ambient sound to its operator by using the built-in microphones. However, there is a serious problem that the inherent noise of this robot, such as the vibration sound or the fricative sound, is mixed into the transmitting voice, therefore disturbing the operator's hearing for a call of help from the victim of the disaster. In this paper, we apply the multichannel NMF (nonnegative matrix factorization) with the rank-1 spatial constraint (Rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.

  • 独立ベクトル分析とノイズキャンセラを用いた柔軟索状ロボットにおける雑音抑圧

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    電子情報通信学会総合大会   2016   58 - 58  2016年03月

  • 教師あり多チャネルNMFと統計的音声強調を用いた柔軟索状ロボットにおける音源分離

    高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

    日本音響学会2016年春季研究発表会   ( 3-3-2 ) 609 - 612  2016年03月

  • 非同期分散マイクロホンによるブラインド音源分離を用いた複数話者同時音声認識

    越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

    日本音響学会2016年春季研究発表会   ( 3-3-1 ) 607 - 608  2016年03月

  • Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

    Toyoda, Takuya, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'16     622 - 625  2016年03月  [査読有り]

  • ランク1空間モデル制約付き多チャネルNMFを用いた雑音抑圧の柔軟索状ロボットへの適用

    高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

    電子情報通信学会総合大会   2016   57 - 57  2016年03月

  • 振幅のみからの相関推定と雑音尖度に基づく空間サブトラクションアレーの減算係数最適化

    李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会2016年春季研究発表会     689 - 692  2016年03月

  • Performance estimation of noisy speech recognition using spectral distortion and recognition task complexity

    Ling Guo, Takeshi Yamada, Shigeki Miyabe, Shoji Makino, Nobuhiko Kitawaki

    Acoustical Science and Technology   37 ( 6 ) 286 - 294  2016年  [査読有り]

     概要を見る

    Previously, methods for estimating the performance of noisy speech recognition based on a spectral distortion measure have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, no consideration is given to any change in the components of a speech recognition system. To solve this problem, we propose a novel method for estimating the performance of noisy speech recognition, a major feature of which is the ability to accommodate the use of different noise reduction algorithms and recognition tasks by using two cepstral distances (CDs) and the square mean root perplexity (SMR-perplexity). First, we verified the effectiveness of the proposed distortion measure, i.e., the two CDs. The experimental results showed that the use of the proposed distortion measure achieves estimation accuracy equivalent to the use of the conventional distortion measures, the perceptual evaluation of speech quality (PESQ) and the signal-to-noise ratio (SNR) of noise-reduced speech, and has the advantage of being applicable to noise reduction algorithms that directly output the mel-frequency cepstral coefficient (MFCC) feature. We then evaluated the proposed method by performing a closed test and an open test (10-fold crossvalidation test). The results confirmed that the proposed method gives better estimates without being dependent on the differences among the noise reduction algorithms or the recognition tasks.
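
    As a small illustration of the cepstral-distance ingredient used above, the sketch below computes a frame-averaged, dB-scaled cepstral distortion between a reference and a processed signal from their low-order real cepstra. The frame length, hop size and cepstral order are assumptions; the SMR-perplexity term and the regression model of the paper are not covered.

```python
# Illustrative cepstral-distance computation between a reference and a processed
# signal (frame length, hop and cepstral order are assumptions).
import numpy as np

def cepstra(x, n_fft=512, hop=128, n_ceps=12, eps=1e-10):
    frames = np.stack([x[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(x) - n_fft, hop)])
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)
    ceps = np.fft.irfft(log_mag, axis=1)       # real cepstrum per frame
    return ceps[:, 1:n_ceps + 1]               # drop c0, keep low quefrencies

def cepstral_distance(ref, proc):
    c_ref, c_proc = cepstra(ref), cepstra(proc)
    n = min(len(c_ref), len(c_proc))
    d = c_ref[:n] - c_proc[:n]
    # Classical dB-scaled cepstral distortion, averaged over frames.
    return np.mean(10.0 / np.log(10.0) * np.sqrt(2.0 * np.sum(d ** 2, axis=1)))

rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
noisy = clean + 0.1 * rng.normal(size=16000)
print(cepstral_distance(clean, noisy))   # small positive distortion value
```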

  • Performance Estimation of Spontaneous Speech Recognition Using Non-Reference Acoustic Features

    Ling Guo, Takeshi Yamada, Shoji Makino

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 4  2016年  [査読有り]

     概要を見る

    To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. By using this method, it is allowed to monitor the recognition performance in providing speech recognition services. It can be also used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, the methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. The main feature is to use non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features by openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.

  • NOISE REDUCTION USING INDEPENDENT VECTOR ANALYSIS AND NOISE CANCELLATION FOR A HOSE-SHAPED RESCUE ROBOT

    Masaru Ishimura, Shoji Makino, Takeshi Yamada, Nobutaka Ono, Hiroshi Saruwatari

    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     1 - 5  2016年  [査読有り]

     概要を見る

    In this paper, we present noise reduction for a hose-shaped rescue robot. The robot is used for searching for disaster victims by capturing their voice with its microphone array. However, the ego noise generated by its vibration motors makes it difficult to distinguish human voices. To solve this problem, we propose a noise reduction method using a blind source separation technique based on independent vector analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego noise signal from observed multichannel signals using the IVA-based blind source separation technique, and (2) applying noise cancellation to the estimated speech signal using the estimated ego noise signal as a noise reference. The experimental evaluations show that this approach is effective for suppressing the ego noise.

  • Visual Motion Onset Brain-computer Interface

    Tomasz M. Rutkowski

    Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART)     1 - 4  2016年  [査読有り]

  • Nonlinear speech enhancement by virtual increase of channels and maximum SNR beamformer

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING   2016 ( 1 ) 1 - 8  2016年01月  [査読有り]

     概要を見る

    In this paper, we propose a new microphone array signal processing technique, which increases the number of microphones virtually by generating extra signal channels from real microphone signals. Microphone array signal processing methods such as speech enhancement are effective for improving the quality of various speech applications such as speech recognition and voice communication systems. However, the performance of speech enhancement and other signal processing methods depends on the number of microphones. Thus, special equipment such as a multichannel A/D converter or a microphone array is needed to achieve high processing performance. Therefore, our aim was to establish a technique for improving the performance of array signal processing with a small number of microphones and, in particular, to increase the number of channels virtually by synthesizing virtual microphone signals, or extra signal channels, from two channels of microphone signals. Each virtual microphone signal is generated by interpolating a short-time Fourier transform (STFT) representation of the microphone signals. The phase and amplitude of the signal are interpolated individually. The phase is linearly interpolated on the basis of a sound propagation model, and the amplitude is nonlinearly interpolated on the basis of beta divergence. We also performed speech enhancement experiments using a maximum signal-to-noise ratio (SNR) beamformer equipped with virtual microphones and evaluated the improvement in performance upon introducing virtual microphones.
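
    A hedged sketch of the virtual-microphone construction outlined above: one extra STFT channel is synthesized between two real microphones by interpolating the phase linearly and the amplitude non-linearly. Geometric amplitude interpolation is used here as a simple stand-in for the beta-divergence-based interpolation in the paper, and the array geometry is left implicit in the interpolation ratio.

```python
# Hedged sketch: synthesize one virtual microphone STFT between two real microphones;
# phase is interpolated linearly, amplitude geometrically (a stand-in for the
# beta-divergence-based amplitude interpolation of the paper).
import numpy as np

def virtual_mic(X1, X2, alpha=0.5, eps=1e-12):
    """X1, X2: complex STFTs of the two real microphones, shape (freq, frames).
    alpha: relative position of the virtual microphone (0 -> mic 1, 1 -> mic 2)."""
    amp = (np.abs(X1) + eps) ** (1 - alpha) * (np.abs(X2) + eps) ** alpha
    dphi = np.angle(X2 * np.conj(X1))       # inter-channel phase difference in (-pi, pi]
    phase = np.angle(X1) + alpha * dphi     # linear phase interpolation
    return amp * np.exp(1j * phase)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(257, 100)) + 1j * rng.normal(size=(257, 100))
X2 = rng.normal(size=(257, 100)) + 1j * rng.normal(size=(257, 100))
Xv = virtual_mic(X1, X2, alpha=0.5)         # virtual microphone halfway between the pair
print(Xv.shape)                             # (257, 100)
```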

  • EGO-NOISE REDUCTION FOR A HOSE-SHAPED RESCUE ROBOT USING DETERMINED RANK-1 MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION

    Moe Takakusaki, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Shoji Makino, Hiroshi Saruwatari

    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     1 - 4  2016年  [査読有り]

     概要を見る

    A hose-shaped rescue robot is one of the robots that have been developed for disaster response in times of large-scale disasters such as a massive earthquake. This robot is suitable for entering narrow and dark places covered with rubble in a disaster site and for finding victims inside it. It can transmit ambient sound captured by its built-in microphones to its operator. However, there is a serious problem, that is, the inherent noise of this robot, such as vibration sound or fricative sound, is mixed with the transmitted voice, thereby disturbing the operator's perception of a call for help from a disaster victim. In this paper, we apply the multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint (determined rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.

  • Multi-talker Speech Recognition Based on Blind Source Separation with Ad hoc Microphone Array Using Smartphones and Cloud Storage

    Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino

    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5     3369 - 3373  2016年  [査読有り]

     概要を見る

    In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array, which consists of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker's smartphone, which is automatically transferred to online cloud storage. Our prototype system is realized using iPhone and Dropbox. Although the signals recorded by different iPhones are not synchronized, the blind synchronization technique compensates both the differences in the time offset and the sampling frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into each speaker's voice. Finally, automatic speech recognition is applied to transcribe the speech. By experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces the speech overlap and improves the speech recognition performance.

  • Tactile Brain-computer Interface Using Classification of P300 Responses Evoked by Full Body Spatial Vibrotactile Stimuli

    Takumi Kodama, Shoji Makino, Tomasz M. Rutkowski

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 4  2016年  [査読有り]

     概要を見る

    In this study we propose a novel stimulus-driven brain-computer interface (BCI) paradigm, which generates control commands based on classification of somatosensory modality P300 responses. Six spatial vibrotactile stimulus patterns are applied to entire back and limbs of a user. The aim of the current project is to validate an effectiveness of the vibrotactile stimulus patterns for BCI purposes and to establish a novel concept of tactile modality communication link, which shall help locked-in syndrome (LIS) patients, who lose their sight and hearing due to sensory disabilities. We define this approach as a full-body BCI (fbBCI) and we conduct psychophysical stimulus evaluation and realtime EEG response classification experiments with ten healthy body-able users. The grand mean averaged psychophysical stimulus pattern recognition accuracy resulted at 98.18%, whereas the realtime EEG accuracy at 53.67%. The information-transfer-rate (ITR) scores of all the tested users ranged from 0.042 to 4.154 bit/minute.

  • Ego Noise Reduction for Hose-Shaped Rescue Robot Combining Independent Low-Rank Matrix Analysis and Noise Cancellation

    Narumi Mae, Daichi Kitamura, Masaru Ishimura, Takeshi Yamada, Shoji Makino

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 6  2016年  [査読有り]

     概要を見る

    In this paper, we present an ego noise reduction method for a hose-shaped rescue robot developed for search and rescue operations in large-scale disasters such as a massive earthquake. It can enter narrow and dark places covered with rubble in a disaster site and is used to search for disaster victims by capturing their voices with its microphone array. However, ego noises, such as vibration or fricative sounds, are mixed with the voices, and it is difficult to differentiate them from a call for help from a disaster victim. To solve this problem, we here propose a two-step noise reduction method as follows: (1) the estimation of both speech and ego noise signals from an observed multichannel signal by multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint, which was proposed by Kitamura et al., and (2) the application of noise cancellation to the estimated speech signal using the noise reference. Our evaluations show that this approach is effective for suppressing ego noise.

  • Unisoner:様々な歌手が同一楽曲を歌ったWeb上の多様な歌声を活用する合唱制作支援インタフェース

    都築,圭太, 中野,倫靖, 後藤,真孝, 山田,武志, 牧野,昭二

    情報処理学会論文誌   56 ( 12 ) 2370 - 2383  2015年12月  [査読有り]

     概要を見る

    本論文では,Web上で公開されている「1つの楽曲を様々な歌手が歌った歌声」から,合唱と呼ばれる作品を制作するためのインタフェースUnisonerを提案する.従来,このような合唱制作では,伴奏を抑制した各歌声波形を楽曲のフレーズごとに切り貼りし,音量の大小や左右のバランスを調整したうえで重ね合わせる必要があり,時間と労力がかかっていた.それに対してUnisonerでは,歌詞に基づいた楽曲内位置の指定と,歌手アイコンのドラッグアンドドロップ操作に基づいた音量調整を可能とするインタフェースによって,直感的かつ効率的に合唱を制作することができる.さらに,歌声のF0(基本周波数)とMFCC(Mel Frequency Cepstral Coefficient)に基づいた音響的な類似度や,MFCCに基づいた歌手性別の推定結果に加え,再生数などのWeb上のメタデータを活用した歌手検索機能も持つ.このような機能を実現するためには,伴奏をともなう歌声のF0推定手法や,歌声と歌詞のアラインメント手法が必要となるが,それらの推定結果に誤りが含まれることが問題となる.そこで本論文では,誤りを含む単一の歌声からの推定結果に対し,複数の歌声の推定結果を統合して誤りを削減する手法を提案する.評価実験の結果,Unisonerによって合唱制作時間が短縮されること,提案手法によりF0推定と歌詞アラインメントにおける誤りが減少することを確認した.This paper proposes Unisoner, an interface for assisting the creation of derivative choruses, in which voices of different singers singing the same song are overlapped on top of one shared accompaniment. In the past, it was time-consuming to create such choruses because creators had to manually cut and paste vocal fragments from different singers, and then adjust the volume and panning of each voice. Unisoner enables users to perform such editing tasks efficiently by selecting phrases using lyrics and by dragging and dropping the corresponding icons onto a virtual stage. Moreover, Unisoner can search vocals with acoustic similarity based on F0 and MFCC, estimated gender, and metadata such as the number of views. We use a vocal F0 estimation technique from polyphonic audio signals, and a technique to synchronize audio signals with lyrics. However, estimation errors occur using conventional techniques for F0 and lyric alignment, so we propose a novel method of reducing those errors by integrating the estimated results from many voices singing the same song. The experimental results confirmed that Unisoner can shorten the time for creating derivative choruses, and the proposed methods can reduce the estimation error of F0 and lyric alignment.

  • Unisoner: An interface for derivative chorus creation from various singing voices singing the same song on the web

    Tsuzuki, K., Nakano, T., Goto, M., Yamada, T., Makino, S.

    Journal of Information Processing   56 ( 12 ) 2370 - 2383  2015年12月  [査読有り]

     概要を見る

    本論文では,Web上で公開されている「1つの楽曲を様々な歌手が歌った歌声」から,合唱と呼ばれる作品を制作するためのインタフェースUnisonerを提案する.従来,このような合唱制作では,伴奏を抑制した各歌声波形を楽曲のフレーズごとに切り貼りし,音量の大小や左右のバランスを調整したうえで重ね合わせる必要があり,時間と労力がかかっていた.それに対してUnisonerでは,歌詞に基づいた楽曲内位置の指定と,歌手アイコンのドラッグアンドドロップ操作に基づいた音量調整を可能とするインタフェースによって,直感的かつ効率的に合唱を制作することができる.さらに,歌声のF0(基本周波数)とMFCC(Mel Frequency Cepstral Coefficient)に基づいた音響的な類似度や,MFCCに基づいた歌手性別の推定結果に加え,再生数などのWeb上のメタデータを活用した歌手検索機能も持つ.このような機能を実現するためには,伴奏をともなう歌声のF0推定手法や,歌声と歌詞のアラインメント手法が必要となるが,それらの推定結果に誤りが含まれることが問題となる.そこで本論文では,誤りを含む単一の歌声からの推定結果に対し,複数の歌声の推定結果を統合して誤りを削減する手法を提案する.評価実験の結果,Unisonerによって合唱制作時間が短縮されること,提案手法によりF0推定と歌詞アラインメントにおける誤りが減少することを確認した.This paper proposes Unisoner, an interface for assisting the creation of derivative choruses, in which voices of different singers singing the same song are overlapped on top of one shared accompaniment. In the past, it was time-consuming to create such choruses because creators had to manually cut and paste vocal fragments from different singers, and then adjust the volume and panning of each voice. Unisoner enables users to perform such editing tasks efficiently by selecting phrases using lyrics and by dragging and dropping the corresponding icons onto a virtual stage. Moreover, Unisoner can search vocals with acoustic similarity based on F0 and MFCC, estimated gender, and metadata such as the number of views. We use a vocal F0 estimation technique from polyphonic audio signals, and a technique to synchronize audio signals with lyrics. However, estimation errors occur using conventional techniques for F0 and lyric alignment, so we propose a novel method of reducing those errors by integrating the estimated results from many voices singing the same song. The experimental results confirmed that Unisoner can shorten the time for creating derivative choruses, and the proposed methods can reduce the estimation error of F0 and lyric alignment.

  • Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

    Chiba, H., Kamamoto, Y., Moriya, T., Harada, N., Miyabe, S., Yamada, T., Makino, S.

    IEICE Trans. Information and Systems   J98-D ( 10 ) 1301 - 1311  2015年10月  [査読有り]

  • CELPに基づく音声符号化向けのピッチ周波数に依存した適応ポストフィルタ

    千葉,大将, 鎌本,優, 守谷,健弘, 原田,登, 宮部,滋樹, 山田,武志, 牧野,昭二

    電子情報通信学会論文誌   J98-D ( 10 ) 1301 - 1311  2015年10月  [査読有り]

  • ノンリファレンスひずみ特徴量を用いた雑音下音声認識性能推定の検討

    郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会2015年秋季研究発表会     95 - 98  2015年09月

  • 日本語スピーキングテストSJ-CATにおける低スコア解答発話の検出の検討

    小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

    日本音響学会2015年秋季研究発表会     329 - 332  2015年09月

  • マイクロホンアレーの位相が観測できない条件でのチャネル間の相関係数の推定

    宮部滋樹, 小野順貴, 牧野,昭二

    回路とシステムワークショップ   28   347 - 352  2015年08月

  • Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

    Shoko Araki, Shoji Makino, Hiroshi Sawada, Ryo Mukai

    European Signal Processing Conference   06-10-   1991 - 1994  2015年04月

     概要を見る

    We propose a method for separating speech signals when sources outnumber the sensors. In this paper we mainly concentrate on the case of three sources and two sensors. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose the utilization of a directivity pattern based continuous mask, which removes a single source from the observations, and independent component analysis (ICA) to separate the remaining mixtures. Experimental results show that our proposed method can separate signals with little distortion even in a real reverberant environment of TR = 130 ms.

  • 認識性能予測に基づく雑音環境下音声認識のユーザビリティ改善の検討

    青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会     133 - 136  2015年03月

  • 非同期分散マイクロフォンアレーによる伝達関数ゲイン基底NMFを用いた拡散雑音抑圧

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会     557 - 560  2015年03月

  • Activity Report from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2015年03月  [査読有り]

  • Signal Processing Techniques for Assisted Listening

    Sven Nordholm, Walter Kellermann, Simon Doclo, Vesa Vaelimaeki, Shoji Makino, John R. Hershey

    IEEE SIGNAL PROCESSING MAGAZINE   32 ( 2 ) 16 - 17  2015年03月  [査読有り]

  • ステレオ録音に基づく移動音源モデルによる走行車両検出と走行方向推定

    遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会     717 - 720  2015年03月

  • 総合品質と明瞭性の客観推定に基づくスペクトルサブトラクションの減算係数の最適化

    中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会     333 - 336  2015年03月

  • ケプストラム距離とSMR-パープレキシティを用いた雑音下音声認識の性能推定の検討

    郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会     129 - 132  2015年03月

  • Spatial tactile brain-computer interface by applying vibration to user's shoulders and waist

    Kodama, T., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     41 - 42  2015年02月  [査読有り]

  • SSVEP brain-computer interface using green and blue lights

    Aminaka, D., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     39 - 40  2015年02月  [査読有り]

  • Spatial auditory brain-computer interface using head related impulse response

    Nakaizumi, C., Matsui, T., Mori, K., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     37 - 38  2015年02月  [査読有り]

  • Blind compensation of interchannel sampling frequency mismatch for ad hoc microphone array based on maximum likelihood estimation

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    SIGNAL PROCESSING   107 ( SI ) 185 - 196  2015年02月  [査読有り]

     概要を見る

    In this paper, we propose a novel method for the blind compensation of drift for the asynchronous recording of an ad hoc microphone array. Digital signals simultaneously observed by different recording devices have drift of the time differences between the observation channels because of the sampling frequency mismatch among the devices. On the basis of a model in which the time difference is constant within each short time frame but varies in proportion to the central time of the frame, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform (STFT) domain by a linear phase shift. By assuming that the sources are motionless and have stationary amplitudes, the observation is regarded as being stationary when drift does not occur. Thus, we formulate a likelihood to evaluate the stationarity in the STFT domain to evaluate the compensation of drift. The maximum likelihood estimation is obtained effectively by a golden section search. Using the estimated parameters, we compensate the drift by STFT analysis with a noninteger frame shift. The effectiveness of the proposed blind drift compensation method is evaluated in an experiment in which artificial drift is generated. (C) 2014 The Authors. Published by Elsevier B.V.
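
    The sketch below illustrates only the compensation step described above: once a relative sampling-frequency mismatch has been estimated, the accumulated time-difference drift is undone by a bin-wise linear phase shift that grows with the frame index. The maximum-likelihood golden-section search for the mismatch itself is omitted, and `eps_hat`, the hop size and the FFT length are assumptions.

```python
# Sketch of the compensation step only: undo the linear-phase drift caused by a
# given sampling-frequency mismatch estimate `eps_hat` (its estimation is omitted).
import numpy as np

def compensate_drift(X, eps_hat, hop, n_fft):
    """X: complex STFT of the mis-clocked channel, shape (n_fft // 2 + 1, frames).
    eps_hat: estimated relative sampling-frequency mismatch (e.g. 50 ppm -> 5e-5)."""
    n_bins, n_frames = X.shape
    k = np.arange(n_bins)[:, None]                  # frequency-bin index
    t_center = hop * np.arange(n_frames)[None, :]   # frame-centre sample index
    # The accumulated delay after t_center samples is eps_hat * t_center samples;
    # cancel it with the corresponding bin-wise linear phase term.
    phase = 2.0 * np.pi * k * eps_hat * t_center / n_fft
    return X * np.exp(1j * phase)

rng = np.random.default_rng(0)
X = rng.normal(size=(513, 200)) + 1j * rng.normal(size=(513, 200))
X_comp = compensate_drift(X, eps_hat=5e-5, hop=256, n_fft=1024)
print(X_comp.shape)   # (513, 200)
```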

  • Tactile pin-pressure brain-computer interface

    Shimizu, K., Mori, H., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     35 - 36  2015年02月  [査読有り]

  • Multi-command tactile brain-computer interface using the touch-sense glove

    Yajima, H., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     43 - 44  2015年02月  [査読有り]

  • Implementation and evaluation of an acoustic echo canceller using duo-filter control system

    Yoichi Haneda, Shoji Makino, Junji Kojima, Suehiro Shimauchi

    European Signal Processing Conference    2015年

     概要を見る

    The developed acoustic echo canceller uses an exponentially weighted step-size projection algorithm and a duo-filter control system to achieve fast convergence and high speech quality. The duo-filter control system has an adaptive filter and a fixed filter, and uses variable-loss insertion. Evaluation of this system with multi-channel A/D and D/A converters showed that (1) the convergence speed is under 1.5 seconds for speech input when the adaptive filter length is 125 ms, (2) the residual echo level is nearly as low as the ambient noise level (average: under -20 dB; maximum: under -35 dB), and (3) near-end speech is sent with no disturbance during double talk.
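
    As a rough illustration of the projection-type adaptation named above, the sketch below implements a plain affine projection echo canceller (a stand-in, not the exponentially weighted step-size algorithm of the paper); the projection order, filter length, step size and toy echo path are assumptions, and no double-talk control is included.

```python
# Stand-in sketch: an affine projection echo canceller (not the exponentially
# weighted step-size algorithm of the paper).  Toy echo path, no double-talk control.
import numpy as np

def apa_echo_canceller(far_end, mic, n_taps=128, order=4, mu=0.5, delta=1e-4):
    w = np.zeros(n_taps)
    e_out = np.zeros_like(mic)
    x_buf = np.zeros(n_taps + order - 1)            # recent far-end samples
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        if n < n_taps + order:                      # wait until the buffers are filled
            e_out[n] = mic[n]
            continue
        U = np.stack([x_buf[p:p + n_taps] for p in range(order)], axis=1)  # (n_taps, order)
        d = mic[n - np.arange(order)]               # last `order` microphone samples
        e = d - U.T @ w                             # a-priori errors
        w += mu * U @ np.linalg.solve(U.T @ U + delta * np.eye(order), e)  # APA update
        e_out[n] = e[0]                             # echo-cancelled output sample
    return e_out

rng = np.random.default_rng(0)
far_end = rng.normal(size=16000)
echo_path = rng.normal(scale=0.1, size=64)
mic = np.convolve(far_end, echo_path)[:16000]        # echo only, no near-end speech
residual = apa_echo_canceller(far_end, mic)
print(np.std(mic[-4000:]), np.std(residual[-4000:]))  # residual echo is much smaller
```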

  • Brain Evoked Potential Latencies Optimization for Spatial Auditory Brain-Computer Interface

    Cai,Zhenyu, Makino,Shoji, Rutkowski, Tomasz Maciej

    Cognitive Computation   7 ( 1 ) 34 - 43  2015年  [査読有り]

     概要を見る

    We propose a novel method for the extraction of discriminative features in electroencephalography (EEG) evoked potential latency. Based on our offline results, we present evidence indicating that a full surround sound auditory brain–computer interface (BCI) paradigm has potential for an online application. The auditory spatial BCI concept is based on an eight-directional audio stimuli delivery technique, developed by our group, which employs a loudspeaker array in an octagonal horizontal plane. The stimuli presented to the subjects vary in frequency and timbre. To capture brain responses, we utilize an eight-channel EEG system. We propose a methodology for finding and optimizing evoked response latencies in the P300 range in order later to classify them correctly and to elucidate the subject's chosen targets or ignored non-targets. To accomplish the above, we propose an approach based on an analysis of variance for feature selection. Finally, we identify the subjects' intended commands with a Naive Bayesian classifier for sorting the final responses. The results obtained with ten subjects in offline BCI experiments support our research hypothesis by providing higher classification accuracy.

  • Chromatic and High-frequency cVEP-based BCI Paradigm

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1906 - 1909  2015年  [査読有り]

     概要を見る

    We present results of an approach to a code-modulated visual evoked potential (cVEP) based brain-computer interface (BCI) paradigm using four high-frequency flashing stimuli. To generate higher frequency stimulation compared to the state-of-the-art cVEP-based BCIs, we propose to use the light-emitting diodes (LEDs) driven from a small micro-controller board hardware generator designed by our team. The high-frequency and green-blue chromatic flashing stimuli are used in the study in order to minimize a danger of a photosensitive epilepsy (PSE). We compare the green-blue chromatic cVEP-based BCI accuracies with the conventional white-black flicker based interface. The high-frequency cVEP responses are identified using a canonical correlation analysis (CCA) method.

  • Classification accuracy improvement of chromatic and high–frequency code–modulated visual evoked potential–based BCI

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9250   232 - 241  2015年  [査読有り]

     概要を見る

    © Springer International Publishing Switzerland 2015. We present results of a classification improvement approach for a code–modulated visual evoked potential (cVEP) based brain– computer interface (BCI) paradigm using four high–frequency flashing stimuli. Previously published research reports presented successful BCI applications of canonical correlation analysis (CCA) to steady–state visual evoked potential (SSVEP) BCIs. Our team already previously proposed the combined CCA and cVEP techniques’ BCI paradigm. The currently reported study presents the further enhanced results using a support vector machine (SVM) method in application to the cVEP–based BCI.

  • Fingertip Stimulus Cue-based Tactile Brain-computer Interface

    Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1059 - 1064  2015年  [査読有り]

     概要を見る

    The reported project aims to confirm whether a tactile glove fingertips' stimulator is effective for a brain-computer interface (BCI) paradigm using somatosensory event potential (SEP) responses with possible attentional modulation. The proposed simplified stimulator device is presented in detail together with psychophysical and EEG BCI experiment protocols. Results supporting the proposed simple tactile glove device are presented in form of online BCI classification accuracy results using shrinkage linear discriminant analysis (sLDA) technique. Finally, we discuss future possible paradigm improvement steps.

  • Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015   9237   421 - 428  2015年  [査読有り]

     概要を見る

    In this paper, we propose a method to estimate a correlation coefficient of two correlated complex signals on the condition that only the amplitudes are observed and the phases are missing. Our proposed method is based on a maximum likelihood estimation. We assume that the original complex random variables are generated from a zero-mean bivariate complex normal distribution. The likelihood of the correlation coefficient is formulated as a bivariate Rayleigh distribution by marginalization over the phases. Although the maximum likelihood estimator has no analytical form, an expectation-maximization (EM) algorithm can be formulated by treating the phases as hidden variables. We evaluate the accuracy of the estimation using artificial signal, and demonstrate the estimation of narrow-band correlation of a two-channel audio signal.

  • Inter-stimulus Interval Study for the Tactile Point-pressure Brain-computer Interface

    Kensuke Shimizu, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1910 - 1913  2015年  [査読有り]

     概要を見る

    The paper presents a study of an inter-stimulus interval (ISI) influence on a tactile point-pressure stimulus-based brain-computer interface's (tpBCI) classification accuracy. A novel tactile pressure generating tpBCI stimulator is also discussed, which is based on a three-by-three pins' matrix prototype. The six pin-linear patterns are presented to the user's palm during the online tpBCI experiments in an oddball style paradigm allowing for "the aha-responses" elucidation, within the event related potential (ERP). A subsequent classification accuracies' comparison is discussed based on two ISI settings in an online tpBCI application. A research hypothesis of classification accuracies' non-significant differences with various ISIs is confirmed based on the two settings of 120 ms and 300 ms, as well as with various numbers of ERP response averaging scenarios.

  • Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     599 - 603  2015年  [査読有り]

     概要を見る

    In this paper, we propose a method for suppressing a large number of interferences by using multichannel amplitude analysis based on nonnegative matrix factorization (NMF) and its effective semi-supervised training. For point-source interference reduction with an asynchronous microphone array, we propose amplitude-based speech enhancement in the time-channel domain, which we call transfer-function-gain NMF. Transfer-function-gain NMF is robust against drift, which disrupts inter-channel phase analysis. We use this method to suppress a large number of sources. We show that a mass of interferences can be modeled by a single basis, assuming that the noise sources are sufficiently far from the microphones so that their spatial characteristics become similar to each other. Since blind optimization of the NMF parameters does not work well with a sparse observation contaminated by constant heavy noise, we train the diffuse noise basis before the noise suppression using a speech-absent observation, which can be obtained easily using a simple voice activity detection technique. We confirmed the effectiveness of the proposed model and semi-supervised transfer-function-gain NMF in an experiment simulating a target source surrounded by diffuse noise.
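
    The toy sketch below mimics the semi-supervised idea for a single frequency band: the channel-by-frame amplitude matrix is factorized with multiplicative updates while a diffuse-noise gain basis pre-trained on a speech-absent segment is kept fixed. The Euclidean update rule, the rank-two model and all signals are assumptions of this sketch and may differ from the paper's formulation.

        # Toy sketch of semi-supervised transfer-function-gain NMF in the time-channel domain:
        # V (channels x frames, nonnegative amplitudes) ~= W H, each column of W holding the
        # per-channel transfer-function gains of one source; the noise column of W is fixed.
        import numpy as np

        rng = np.random.default_rng(3)
        n_ch, n_frames, eps = 4, 300, 1e-12

        # Synthetic ground truth: one target source and one diffuse-noise "source".
        g_target = rng.uniform(0.2, 1.0, (n_ch, 1))        # target transfer-function gains
        g_noise  = np.full((n_ch, 1), 0.6)                 # diffuse noise: similar gains everywhere
        act_t = np.abs(rng.standard_normal((1, n_frames))) * (rng.random((1, n_frames)) > 0.6)
        act_n = np.abs(rng.standard_normal((1, n_frames))) * 0.5
        V = g_target @ act_t + g_noise @ act_n             # observed channel amplitudes

        def nmf_mu(V, W, H, n_iter=200, fixed_cols=()):
            """Euclidean multiplicative updates; columns of W listed in fixed_cols stay fixed."""
            for _ in range(n_iter):
                H *= (W.T @ V) / (W.T @ W @ H + eps)
                W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
                W_new[:, list(fixed_cols)] = W[:, list(fixed_cols)]
                W = W_new
            return W, H

        # 1) Pre-train the noise basis on a speech-absent segment (here: frames with no target).
        noise_frames = V[:, act_t[0] == 0]
        Wn, _ = nmf_mu(noise_frames, rng.random((n_ch, 1)), rng.random((1, noise_frames.shape[1])))

        # 2) Factorize the full observation with [target basis | fixed noise basis].
        W0 = np.hstack([rng.random((n_ch, 1)), Wn])
        H0 = rng.random((2, n_frames))
        W, H = nmf_mu(V, W0, H0, fixed_cols=(1,))

        print("estimated target gains (normalized):", np.round(W[:, 0] / W[:, 0].max(), 2).tolist())
        print("true target gains      (normalized):", np.round(g_target[:, 0] / g_target.max(), 2).tolist())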

  • Variable Sound Elevation Features for Head-related Impulse Response Spatial Auditory BCI

    Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1094 - 1099  2015年  [査読有り]

     概要を見る

    This paper presents a study of classification and EEG feature improvement for a spatial auditory brain-computer interface (saBCI). This study provides a comprehensive test of head-related impulse response (HRIR) cues for the saBCI speller paradigm. We present a comparison with previously developed HRIR-based spatial auditory modalities. We propose and optimize three types of sound spatialization settings using variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) resulted in encouragingly good and stable P300 responses in online saBCI experiments. We analyze the differences and dispersions of saBCI command accuracies, as well as the individual user accuracies for various spatial sound locations. Our case study indicates that the participating users could perceive elevation in the saBCI experiments using the HRIR measured from a general head model.

  • Head-related Impulse Response Cues for Spatial Auditory Brain-computer Interface

    Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1071 - 1074  2015年  [査読有り]

     概要を見る

    This study provides a comprehensive test of head-related impulse response (HRIR) cues for a spatial auditory brain-computer interface (saBCI) speller paradigm. We present a comparison with the conventional virtual sound headphone-based spatial auditory modality. We propose and optimize three types of sound spatialization settings using variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) resulted in encouragingly good and stable P300 responses in online BCI experiments. Our case study indicated that users could perceive elevation in the saBCI experiments generated using the HRIR measured from a general head model. The saBCI accuracy and information transfer rate (ITR) scores were improved compared with the classical horizontal-plane-based virtual spatial sound reproduction modality, as far as the healthy users in the current pilot study are concerned.

  • EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9359   1 - 6  2015年  [査読有り]

     概要を見る

    © Springer International Publishing Switzerland 2015. We present improved visual BCI classification accuracy results after applying high- and low-pass filters to an electroencephalogram (EEG) containing code-modulated visual evoked potentials (cVEPs). The cVEP responses are applied to a four-command brain-computer interface (BCI) paradigm. The purpose of this project is to enhance BCI accuracy using only the single-trial cVEP response. We also aim to identify the most discriminable EEG bands suitable for the broadband visual stimuli. We report results from a pilot study optimizing the EEG filtering using infinite impulse response filters applied to feature extraction for a linear support vector machine (SVM) classification method. The goal of the presented study is to develop a faster and more reliable BCI to further enhance the symbiotic relationship between humans and computers.

    DOI

  • SVM Classification Study of Code-modulated Visual Evoked Potentials

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1065 - 1070  2015年  [査読有り]

     概要を見る

    We present a study of a support vector machine (SVM) application to a brain-computer interface (BCI) paradigm. Four SVM kernel functions are evaluated in order to maximize the classification accuracy of a four-class BCI paradigm utilizing the code-modulated visual evoked potential (cVEP) response within the captured EEG signals. Our previously published reports applied only the linear SVM, which already outperformed the more classical technique of canonical correlation analysis (CCA). In the current study we additionally test and compare the classification accuracies of polynomial, radial basis function, and sigmoid kernels, together with the classical linear (non-kernel-based) SVM, in application to the cVEP BCI.
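
    A generic sketch of the kernel comparison described above, using scikit-learn SVMs with linear, polynomial, RBF and sigmoid kernels on synthetic four-class features; the features, hyperparameters and cross-validation protocol are placeholders, not those of the study.

        # Minimal sketch: comparing SVM kernels on a synthetic four-class feature set.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        n_per_class, n_features, n_classes = 60, 32, 4
        X = np.vstack([rng.standard_normal((n_per_class, n_features)) + 0.5 * c
                       for c in range(n_classes)])
        y = np.repeat(np.arange(n_classes), n_per_class)

        for kernel in ("linear", "poly", "rbf", "sigmoid"):
            clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
            acc = cross_val_score(clf, X, y, cv=5).mean()
            print(f"{kernel:>7s} kernel: mean CV accuracy = {acc:.2f}")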

  • TDOA estimation by mapped SRP based on higher-order moment analysis

    Xiao-Dong,Zhai, Yuya,Sugimoto, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. APSIPA 2014    2014年12月  [査読有り]

  • Adaptive control of applying band-width for post filter of speech coder depending on pitch frequency

    Hironobu,Chiba, Yutaka,Kamamoto, Takehiro,Moriya, Noboru,Harada, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. Asilomar Conference on Signals, Systems, and Computers, Asilomar 2014    2014年11月  [査読有り]

  • ケプストラム距離を用いた雑音下音声認識の性能推定の検討

    郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会研究発表会講演論文集     61 - 62  2014年09月

  • Spatial tactile brain-computer interface paradigm applying vibration stimuli to large areas of user's back

    T.,Kodama, Makino,Shoji, T.M.,Rutkowski

    International Brain-Computer Interface Conference     1 - 4  2014年09月  [査読有り]

  • βダイバージェンスに基づく一般化振幅補間によるヴァーチャル多素子化を用いた目的音源強調

    片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     633 - 636  2014年09月

    CiNii

  • 伝達関数ゲイン基底NMFにおけるマイク数・マイク配置と目的音強調性能の関係

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     523 - 526  2014年09月

    CiNii

  • Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

    M.,Chang, K.,Mori, Makino,Shoji, Rutkowski, Tomasz Maciej

    Procedia Technology   18   25 - 31  2014年09月  [査読有り]

     概要を見る

    We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

    DOI

  • Head-related impulse response-based spatial auditory brain-computer interface

    C.,Nakaizumi, T.,Matsui, K.,Mori, Makino,Shoji, T.M.,Rutkowski

    International Brain-Computer Interface Conference     1 - 4  2014年09月  [査読有り]

  • 絶対値の観測のみを用いた2つの複素信号の相関係数推定

    宮部滋樹, 小野順貴, 牧野,昭二

    日本音響学会研究発表会講演論文集   ( 1-Q-40 ) 735 - 738  2014年09月

    CiNii

  • 教師なし伝達関数ゲイン基底NMFによる目的音強調における罰則項の特性評価

    千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     527 - 530  2014年09月

    CiNii

  • 分散型マイクロホンアレイを用いた交通車両検出とその車線推定の検討

    豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     643 - 646  2014年09月

    CiNii

  • Multi-stage declipping of clipping distortion based on length classification of clipped interval

    Chenlei,Li, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    日本音響学会研究発表会講演論文集     553 - 556  2014年09月

    CiNii

  • Unisoner: An interactive interface for derivative chorus creation from various singing voices on the web

    K.,Tsuzuki, T.,Nakano, M.,Goto, T.,Yamada, Makino,Shoji

    International Computer Music Conference joint with the Sound & Music Computing conference     790 - 797  2014年09月  [査読有り]

  • Unisoner: an interactive interface for derivative chorus creation from various singing voices on the Web

    Keita,Tsuzuki, Tomoyasu,Nakano, Masataka,Goto, Takeshi,Yamada, Shoji,Makino

    Proc. ICMC SMC 2014     790 - 797  2014年09月  [査読有り]

  • News from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2014年08月  [査読有り]

  • Electroencephalogram steady state response sonification focused on the spatial and temporal properties

    Makino,Shoji, T.,Kaniwa, H.,Terasawa

    International Conference on Auditory Display   ( LS7-1 ) 1 - 7  2014年06月  [査読有り]

  • EEG Steady State Response Sonification Focused on the Spatial and Temporal Properties

    Kaniwa, Teruaki, Terasawa, Hiroko, Matsubara, Masaki, Rutkowski, Tomasz, Makino, Shoji

    Proceedings of the 20th International Conference on Auditory Display 2014 (ICAD2014)     1 - 7  2014年06月  [査読有り]

  • 周波数依存到来時間差推定に基づく劣決定ブラインド音源分離の高速化

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田,武志, 牧野昭二, 中村篤

    日本音響学会誌   70 ( 6 ) 323 - 331  2014年06月  [査読有り]

     概要を見る

    本論文ではEMアルゴリズムを用いたスパース性に基づく劣決定ブラインド音源分離(BSS)の計算を高速化する手法を提案する。Izumi et al.は,雑音・残響下でも頑健な劣決定BSSを提案したが,到来時間差パラメータをイタレーションごとに離散全探索で求める更新則のために計算量に問題があった。そこで,到来時間差パラメータが周波数に依存すると捉えた,時間差パラメータが解析的に更新される計算量の少ない更新則を提案する。また,帯域重み付け平均による帯域非依存到来時間差推定によってパラメータ数を削減し,収束性を向上させる。実験により,提案手法が計算時間を1/10程度に削減することを確認した。

    CiNii

  • Multimedia Information Processing Combining Brain Science, Life Science, and Information Science

    Makino,Shoji

    USJI Universities Research Report   vol.32  2014年06月  [査読有り]

  • Reduction of computational cost in underdetermined blind source separation based on frequency-dependent time-difference-of-arrival estimation

    T.,Maruyama, S.,Araki, T.,Nakatani, S.,Miyabe, T.,Yamada, 牧野,昭二, A.,Nakamura

    J. Acoust. Soc. Jpn   vol. 70 ( no. 6 ) 323 - 331  2014年06月  [査読有り]

     概要を見る

    本論文ではEMアルゴリズムを用いたスパース性に基づく劣決定ブラインド音源分離(BSS)の計算を高速化する手法を提案する。Izumi et al.は,雑音・残響下でも頑健な劣決定BSSを提案したが,到来時間差パラメータをイタレーションごとに離散全探索で求める更新則のために計算量に問題があった。そこで,到来時間差パラメータが周波数に依存すると捉えた,時間差パラメータが解析的に更新される計算量の少ない更新則を提案する。また,帯域重み付け平均による帯域非依存到来時間差推定によってパラメータ数を削減し,収束性を向上させる。実験により,提案手法が計算時間を1/10程度に削減することを確認した。

    CiNii

  • Acoustic signal processing based on asynchronous and distributed microphone array

    N., Ono, S., Miyabe, S., Makino

    J. Acoust. Soc. Jpn   vol. 70 ( no. 7 ) 391 - 396  2014年06月  [査読有り]

  • Reduction of computational cost in underdetermined blind source separation based on frequency dependent time-difference-of-arrival estimation

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野, 昭二, 中村, 篤

    J. Acoust. Soc. Jpn   70 ( 6 ) 323 - 331  2014年06月  [査読有り]

     概要を見る

    本論文ではEMアルゴリズムを用いたスパース性に基づく劣決定ブラインド音源分離(BSS)の計算を高速化する手法を提案する。Izumi et al.は,雑音・残響下でも頑健な劣決定BSSを提案したが,到来時間差パラメータをイタレーションごとに離散全探索で求める更新則のために計算量に問題があった。そこで,到来時間差パラメータが周波数に依存すると捉えた,時間差パラメータが解析的に更新される計算量の少ない更新則を提案する。また,帯域重み付け平均による帯域非依存到来時間差推定によってパラメータ数を削減し,収束性を向上させる。実験により,提案手法が計算時間を1/10程度に削減することを確認した。

    CiNii

  • Ad-hoc microphone array - Acoustic signal processing using multiple mobile recording devices -

    N., Ono, K.L., Trung, S., Miyabe, S., Makino

    IEICE Fundamentals Review   vol. 7 ( no. 4 ) 336 - 347  2014年04月  [査読有り]

     概要を見る

    マイクロホンアレー信号処理は,複数のマイクロホンで取得した多チャネル信号を処理し,単一マイクロホンでは困難な,音源定位,音源強調,音源分離などを,音源の空間情報を用いることによって行う枠組みである.マイクロホンアレー信号処理においては,チャネル間の微小な時間差が空間情報の大きな手がかりであり,各チャネルを正確に同期させるために,従来は多チャネルA-D 変換器を備えた装置が必要であった.これに対し,我々の身の回りにある,ラップトップPC,ボイスレコーダ,スマートフォンなどの,同期していない録音機器によりマイクロホンアレー信号処理が可能になれば,その利便性は大きく,適用範囲を格段に広げることができる.本稿では、非同期録音機器を用いたマイクロホンアレー信号処理の新しい展開について,関連研究を概観しつつ,筆者らの取組みを紹介する.

    DOI CiNii

  • Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

    H.,Chiba, Y.,Kamamoto, T.,Moriya, N.,Harada, S.,Miyabe, T.,Yamada, S.,Makino

    IEICE Trans. Information and Systems    2014年04月  [査読有り]

  • 非負値行列分解と位相復元に基づくオーディオ符号化の多チャネル化

    劉必翔, 澤田宏, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会春季研究発表会     819 - 822  2014年03月

    CiNii

  • 種々の雑音抑圧手法と認識タスクに適用可能な音声認識性能推定法の検討

    郭レイ, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会春季研究発表会     13 - 14  2014年03月

  • ACELP用ポストフィルタのピッチ強調帯域及び利得の適応化

    千葉大将, 鎌本優, 守谷健弘, 原田登, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会春季研究発表会     387 - 388  2014年03月

  • 日本語スピーキングテストS-CATの文読み上げ問題における発話の冗長性・不完全性を考慮した自動採点の検討

    山畑勇人, 盧昊, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

    日本音響学会春季研究発表会     269 - 272  2014年03月

  • 日本語スピーキングテストS-CATの自由発話問題における発話文の難易度を考慮した自動採点の検討

    盧昊, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

    日本音響学会春季研究発表会     273 - 276  2014年03月

  • 分散型マイクロホンアレイを用いた交通量モニタリング

    豊田卓矢, 宮部滋樹, 山田,武志, 小野順貴, 牧野昭二

    電子情報通信学会総合大会講演論文集   2014   151  2014年03月

  • 非同期マイクロホンアレーの符号化録音におけるビットレートと同期性能の関係

    宮部,滋樹, 小野,順貴, 牧野,昭二, 高橋,祐

    音講論集   ( 3-2-8 ) 725 - 726  2014年03月

  • 伝達関数ゲイン基底NMFによる分散配置非同期録音における目的音強調の検討

    千葉大将, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二, 高橋祐

    日本音響学会春季研究発表会     757 - 760  2014年03月

    CiNii

  • Activity Report from the AASP-TC

    S.,Makino

    IEEE Signal Processing Society eNewsletter, TC News    2014年02月  [査読有り]

  • GENERALIZED AMPLITUDE INTERPOLATION BY beta-DIVERGENCE FOR VIRTUAL MICROPHONE ARRAY

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     149 - 153  2014年  [査読有り]

     概要を見る

    In this paper, we present a generalization of the virtual microphone array we previously proposed to increase the microphone elements by nonlinear interpolation. In the previous work, we generated a virtual observation from two actual microphones by an interpolation in the logarithmic domain. This corresponds to a linear interpolation of the phase and the geometric mean of the amplitude. In this paper, we generalize this interpolation using a linear interpolation of the phase and a nonlinear interpolation of the amplitude with adjustable nonlinearity based on beta-divergence. Improvement of the array signal processing performance is obtained by appropriate tuning of the parameter beta. We evaluate the improvement in speech enhancement using a maximum SNR beamformer.
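
    The sketch below implements the earlier complex-logarithmic interpolation that this abstract builds on: a virtual STFT observation is obtained by linearly interpolating the phase and taking a weighted geometric mean of the amplitude. The beta-divergence generalization of the amplitude interpolation is not reproduced here, and the dummy signals and interpolation weight are assumptions of the sketch.

        # Minimal sketch: a "virtual microphone" STFT observation between two real microphones,
        # obtained by interpolation in the complex-logarithmic domain (linear phase interpolation
        # and geometric-mean amplitude). Dummy random spectrograms stand in for real observations.
        import numpy as np

        rng = np.random.default_rng(5)
        n_freq, n_frames = 257, 100
        X1 = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
        X2 = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))

        def virtual_mic(X1, X2, alpha=0.5):
            """alpha = 0 returns X1, alpha = 1 returns X2; intermediate values place the
            virtual element between the two real microphones."""
            amp = np.abs(X1) ** (1 - alpha) * np.abs(X2) ** alpha          # geometric mean
            # Interpolate the phase along the shorter arc to avoid 2*pi wrapping artifacts.
            dphi = np.angle(X2 * np.conj(X1))
            phase = np.angle(X1) + alpha * dphi
            return amp * np.exp(1j * phase)

        Xv = virtual_mic(X1, X2, alpha=0.5)
        print("virtual observation shape:", Xv.shape)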

  • AMPLITUDE-BASED SPEECH ENHANCEMENT WITH NONNEGATIVE MATRIX FACTORIZATION FOR ASYNCHRONOUS DISTRIBUTED RECORDING

    Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Yu Takahashi, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     203 - 207  2014年  [査読有り]

     概要を見る

    In this paper, we investigate amplitude-based speech enhancement for asynchronous distributed recording. In an ad-hoc microphone array context, it is assumed that different asynchronous devices record the speech. As a result, the phase information is unreliable due to sampling frequency mismatch. For speech enhancement based on the amplitude information instead of the phase information, supervised nonnegative matrix factorization (NMF) is introduced in the time-channel domain. The basis vectors, which represent the gains of the transfer functions from a source to each microphone, are trained in advance using a single-source observation. The experimental evaluations show that this approach is robust against the sampling frequency mismatch.

  • Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

    Moonjeong Chang, Koichi Mori, Shoji Makino, Tomasz M. Rutkowski

    INTERNATIONAL WORKSHOP ON INNOVATIONS IN INFORMATION AND COMMUNICATION SCIENCE AND TECHNOLOGY, IICST 2014   18   25 - 31  2014年  [査読有り]

     概要を見る

    We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

    DOI

  • Chromatic SSVEP BCI Paradigm Targeting the Higher Frequency EEG Responses

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WP2-3-2 ) 1 - 7  2014年  [査読有り]

     概要を見る

    A novel approach to the steady-state visual evoked potential (SSVEP) based brain-computer interface (BCI) is presented in the paper. To minimize possible side effects of monochromatic-light SSVEP-based BCIs, we propose to utilize chromatic green-blue flicker stimuli at higher frequencies than those traditionally used. The resulting safer SSVEP responses are processed and classified with features drawn from EEG power spectra. Results obtained from healthy users support the research hypothesis of the chromatic and higher-frequency SSVEP. The feasibility of the proposed method is evaluated in a comparison of monochromatic versus chromatic SSVEP responses. We also present preliminary results with empirical mode decomposition (EMD) adaptive filtering, which resulted in improved classification accuracies.

  • P300 Responses Classification Improvement in Tactile BCI with Touch-sense Glove

    Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WP2-3-3 ) 1 - 7  2014年  [査読有り]

     概要を見る

    This paper reports on a project aiming to confirm whether a tactile stimulator, the "touch sense glove," is effective for a novel brain-computer interface (BCI) paradigm and whether tactile stimuli delivered to the fingers can be utilized to evoke event related potential (ERP) responses with possible attentional modulation. The tactile ERPs are expected to improve the BCI accuracy. The proposed new stimulator device is presented in detail together with the psychophysical and EEG BCI experiment protocols. Results supporting the proposed "touch sense glove" device are presented in the form of online BCI classification accuracies. Finally, we outline possible future paradigm improvements.

  • TDOA Estimation by Mapped Steered Response Power Analysis Utilizing Higher-Order Moments

    Xiao-Dong Zhai, Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FP-P1-3 ) 1 - 4  2014年  [査読有り]

     概要を見る

    In this paper, we propose a new estimation method for the time difference of arrival (TDOA) between two microphones with improved accuracy by exploiting higher-order moments. The proposed method analyzes the steered response power (SRP) of the observed signals after they are nonlinearly mapped onto a higher-dimensional space. Since the mapping operation enhances the linear independence between different vectors by increasing the dimensionality of the observed signals, the TDOA analysis achieves higher resolution. The results of an experiment comparing the TDOA estimation performance of the proposed method with that of conventional methods reveal the robustness of the proposed method against noise and reverberation.

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)    2014年  [査読有り]

     概要を見る

    In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. Our experimental evaluations show that the performance of supervised NMF can approach that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.

  • Tactile and Bone-conduction Auditory Brain Computer Interface for Vision and Hearing Impaired Users - Stimulus Pattern and BCI Accuracy Improvement

    Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FP2-6-3 ) 1 - 7  2014年  [査読有り]

     概要を見る

    This paper aims to improve tactile and bone-conduction brain-computer interface (tbaBCI) classification accuracy based on a new stimulus pattern search in order to trigger more separable P300 responses. We propose and investigate three approaches to modifying the spatial and frequency content of the stimuli. As a result of online tbaBCI classification accuracy tests with six subjects, we conclude that frequency modification of the previously reported single vibrotactile exciter-based patterns leads to borderline-significant statistical improvements.

  • Tactile Pressure Brain-computer Interface Using Point Matrix Pattern Paradigm

    Kensuke Shimizu, Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS)     473 - 477  2014年  [査読有り]

     概要を見る

    The paper presents a tactile pressure stimulus-based brain-computer interface (BCI) paradigm. 3 x 3 pressure-pin matrix stimulus patterns are presented to the subjects in an oddball paradigm allowing for "aha-response" generation to attended targets. The research hypothesis is confirmed by results from five subjects performing online BCI experiments. One of the users scored 100% accuracy in an online ten-averages-based BCI test. Three users scored above chance levels, while one remained on the chance-level border. The presented pilot study experiments and EEG results confirm the effectiveness of the proposed tactile pressure stimulus-based BCI.

  • TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY

    Takuya Toyoda, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     318 - 322  2014年

     概要を見る

    In this paper, we propose an easy and convenient method for traffic monitoring based on acoustic sensing of vehicle sound recorded by an ad-hoc microphone array. Since signals recorded by an ad-hoc microphone array are asynchronous, we perform channel synchronization by compensating for the difference between the start and the end of the recording and the sampling frequency mismatch. To monitor traffic, we estimate the number of vehicles by peak detection of the power envelopes, and classify the traffic lane from the difference between the propagation times to the microphones. We also demonstrate the effectiveness of the proposed method, evaluated in terms of the F-measure, using the results of an experiment in which we estimated the number of vehicles and classified the lane in which the vehicles were traveling.
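
    As a rough illustration of the vehicle-counting step, the sketch below peak-picks a smoothed short-term power envelope of a single simulated roadside channel with SciPy; the synchronization and lane-classification stages are not reproduced, and the signal model, frame length and thresholds are assumptions of this sketch.

        # Minimal sketch: counting vehicle pass-bys by peak detection on a power envelope.
        import numpy as np
        from scipy.signal import find_peaks

        rng = np.random.default_rng(6)
        fs, duration = 16000, 30                          # 30 s of simulated roadside audio
        t = np.arange(fs * duration) / fs
        x = 0.02 * rng.standard_normal(t.size)            # background noise
        for t0 in (5.0, 12.5, 21.0):                      # three simulated vehicle pass-bys
            x += np.exp(-0.5 * ((t - t0) / 0.8) ** 2) * rng.standard_normal(t.size) * 0.3

        frame = 4000                                      # 250 ms power frames
        power = (x[: t.size // frame * frame].reshape(-1, frame) ** 2).mean(axis=1)
        envelope = np.convolve(power, np.ones(5) / 5, mode="same")   # light smoothing

        peaks, _ = find_peaks(envelope, height=3 * np.median(envelope), distance=8)
        print("estimated number of vehicles:", len(peaks))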

  • Adaptive Post-Filtering Controlled by Pitch Frequency for CELP-based Speech Coder

    Hironobu Chiba, Yutaka Kamamoto, Takehiro Moriya, Noboru Harada, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS     838 - 842  2014年  [査読有り]

     概要を見る

    Most speech codecs utilize a post-filter that emphasizes pitch structures to enhance perceptual quality at the decoder. In particular, the bass post-filter used in ITU-T G.718 performs an adaptive pitch enhancement technique for a lower fixed frequency band. This paper describes a new post-filtering method in which the frequency band and the gain of the bass post-filter are adaptively controlled frame by frame depending on the pitch frequency of the decoded signal, to improve bass post-filter performance. We have confirmed the improvement of the speech quality with the developed method through objective and subjective evaluations.

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FA1-1-3 ) 1 - 5  2014年  [査読有り]

     概要を見る

    In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. Our experimental evaluations show that the performance of supervised NMF can approach that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.

  • Automatic Scoring Method for Open Answer Task in the SJ-CAT Speaking Test Considering Utterance Difficulty Level

    Hao Lu, Takeshi Yamada, Shingo Imai, Takahiro Shinozaki, Ryuichi Nisimura, Kenkichi Ishizuka, Shoji Makino, Nobuhiko Kitawaki

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WA1-1-3 ) 1 - 5  2014年  [査読有り]

     概要を見る

    In this paper, we propose an automatic scoring method for the open answer task of the Japanese speaking test SJ-CAT. The proposed method first extracts a set of features from an input answer utterance and then estimates the vocabulary richness score given by human raters, which ranges from 0 to 4, by employing SVR (support vector regression). We devised a novel set of features, namely text statistics weighted by word reliability, to assess the abundance of vocabulary and expression, and the degree of word relevance based on the hierarchical distance in a thesaurus to evaluate the suitability of vocabulary. We confirmed experimentally that the proposed method provides good estimates of the human richness score, with a correlation coefficient of 0.92 and an RMSE (root mean square error) of 0.56. We also showed that the proposed method is relatively robust to differences among examinees and among the questions used for training and testing.
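
    A minimal sketch of score estimation by support vector regression, evaluated with a correlation coefficient and RMSE as above; the features, targets and hyperparameters are synthetic placeholders rather than the SJ-CAT feature set.

        # Minimal sketch: estimating a human-rated score (0-4) from utterance-level features
        # with SVR, then reporting correlation and RMSE on a held-out set. Synthetic data only.
        import numpy as np
        from sklearn.svm import SVR
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(7)
        n_utt, n_feat = 400, 20
        X = rng.standard_normal((n_utt, n_feat))                       # e.g. text statistics, relevance
        score = np.clip(2.0 + 0.5 * X[:, :3].sum(axis=1) + 0.3 * rng.standard_normal(n_utt), 0, 4)

        X_tr, X_te, y_tr, y_te = train_test_split(X, score, test_size=0.25, random_state=0)
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)

        corr = np.corrcoef(pred, y_te)[0, 1]
        rmse = np.sqrt(np.mean((pred - y_te) ** 2))
        print(f"correlation = {corr:.2f}, RMSE = {rmse:.2f}")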

  • Auditory Brain-Computer Interface Paradigm with Head Related Impulse Response-based Spatial Cues

    Chisaki Nakaizumi, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

    Proc. International Conference on Signal Image Technonogy and Internet Based Systems   ( WS-MISA-01 ) 806 - 811  2013年12月  [査読有り]

     概要を見る

    The aim of this study is to provide a comprehensive test of the head-related impulse response (HRIR) for an auditory spatial speller brain-computer interface (BCI) paradigm. The study is conducted with six users in an experimental setup based on five Japanese hiragana vowels. Auditory evoked potentials resulted in encouragingly good and stable "aha-" or P300 responses in real-world online BCI experiments. Our case study indicated that the auditory HRIR spatial sound reproduction paradigm could be a viable alternative to the established multi-loudspeaker surround sound BCI-speller applications, as far as healthy pilot study users are concerned.

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • Unisoner: 同一楽曲を歌った異なる歌声を重ね合わせる合唱制作支援インタフェース

    都築圭太, 中野倫靖, 後藤真孝, 山田,武志, 牧野昭二

    第21回インタラクティブシステムとソフトウェアに関するワークショップ, WISS2013    2013年12月

  • Novel spatial tactile and bone-conduction auditory brain computer interface

    T.M.,Rutkowski, H.,Mori, S.,Makino, K.,Mori

    Proc. Neuroscience2013     79  2013年11月  [査読有り]

  • 様々な歌手が同じ曲を歌った歌声の多様さを活用するシステム

    都築圭太, 中野倫靖, 後藤真孝, 山田武志, 牧野昭二

    情報処理学会研究報告   2013-MUS-100-21   1 - 8  2013年09月

     概要を見る

    本稿では,Web 上で公開されている 「一つの曲を様々な歌手が歌った歌声」 を活用する二つのシステムを提案する.一つは,それらの歌声を重ね合わせる合唱生成支援システム,もう一つは,それらの歌声同士や自分の歌声を比較できる歌唱力向上支援システムである.従来,複数の楽曲を用いた鑑賞や創作支援,自分が歌うだけの歌唱力向上支援は研究されてきたが,同一曲を複数人が歌った歌声を活用した合唱生成や歌唱力向上支援はなかった.合唱生成支援システムでは,歌声の出現時刻と左右チャネルの音量をマウスで直感的に調整できる.直感的な操作と,それぞれの歌が完成された作品であることを利用することで,創作と同時に鑑賞を楽しむ 「創作鑑賞」 も可能となる.また,歌唱力向上支援システムでは,声質 (MFCC) と歌い回し (F0軌跡) が近い歌声同士を比較表示できる.Web 上で公開されていて再生数・マイリスト数があるため,それらの情報を活用しながら歌唱力向上に取り組める.これらのシス

  • 復号信号の特徴に応じたACELP用ポストフィルタの制御

    千葉大将, 守谷健弘, 鎌本優, 原田登, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会秋季研究発表会     319 - 320  2013年09月

  • Some advances in adaptive source separation

    J.T.,Chien, H.,Sawada, S.,Makino

    APSIPA Newsletter     7 - 9  2013年09月  [査読有り]

  • 複素対数補間を用いたヴァーチャル多素子化マイクロホンアレーの周波数依存素子配置最適化

    片平拓希, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会秋季研究発表会     609 - 610  2013年09月

  • 非整数サンプルシフトのフレーム分析を用いた非同期録音の同期化

    宮部,滋樹, 小野,順貴, 牧野,昭二

    音講論集   ( 1-1-9 ) 593 - 596  2013年09月

  • News from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2013年08月  [査読有り]

  • Network based complexity analysis in tactile brain computer interface task

    H.,Mori, Y.,Matsumoto, S.,Makino, Z.,Struzik, D.,Mandic, T.M.,Rutkowski

    Proc. EMBC2013   51 ( M-134 ) 1 - 1  2013年07月  [査読有り]

    DOI CiNii

  • Multi-command tactile and auditory brain computer interface based on head position stimulation

    H.,Mori, Y.,Matsumoto, Z.,Struzik, K.,Mori, S.,Makino, D.,Mandic, T.M.,Rutkowski

    Proc. International Brain-Computer Interface Meeting   ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2  2013年06月  [査読有り]

  • Spatial tactile and auditory brain computer interface based on head position stimulation

    T.M.,Rutkowski, H.,Mori, Y.,Matsumoto, Z.,Struzik, S.,Makino, D.,Mandic, K.,Mori

    Proc. Neuro2013    2013年06月  [査読有り]

  • Comparison of P300 responses in auditory, visual and audiovisual spatial speller BCI paradigms

    M.,Chang, N.,Nishikawa, Z.,Struzik, K.,Mori, S.,Makino, D.,Mandic, T.M.,Rutkowski

    Proc. International Brain-Computer Interface Meeting   ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2  2013年06月  [査読有り]

  • Blind compensation of inter-channel sampling frequency mismatch with maximum Likelihood estimation in STFT domain

    S.,Miyabe, N.,Ono, S.,Makino

    Proc. ICASSP2013     674 - 678  2013年05月  [査読有り]

     概要を見る

    This paper proposes a novel blind compensation of sampling frequency mismatch for an asynchronous microphone array. Digital signals simultaneously observed by different recording devices exhibit a drift of the time differences between the observation channels because of the sampling frequency mismatch among the devices. Based on the model that such a time difference is constant within each time frame but varies in proportion to the time frame index, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform domain by a linear phase shift. By assuming the sources are motionless and stationary, a likelihood of the sampling frequency mismatch is formulated. The maximum likelihood estimate is obtained efficiently by a golden section search.
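
    The sketch below shows only the compensation step described in this abstract: once a relative sampling-frequency mismatch is given, each STFT frame of the lagging channel is corrected by a linear phase shift proportional to the frame index. The mismatch value, hop size and sign convention are assumptions of this sketch; the paper estimates the mismatch blindly by maximum likelihood with a golden section search.

        # Minimal sketch: compensating a known sampling-frequency mismatch in the STFT domain.
        import numpy as np

        def compensate_sfo(stft, eps, hop, n_fft):
            """Apply a frame-dependent linear phase shift to cancel the accumulated drift.

            stft: complex array of shape (n_bins, n_frames) from an FFT of length n_fft
            eps:  relative sampling frequency mismatch of this channel (e.g. 50e-6 for 50 ppm)
            """
            n_bins, n_frames = stft.shape
            k = np.arange(n_bins)[:, None]               # frequency-bin index
            t = np.arange(n_frames)[None, :]             # frame index
            drift = eps * hop * t                        # accumulated drift in samples per frame
            return stft * np.exp(2j * np.pi * k * drift / n_fft)

        # Dummy usage with a random "observation".
        rng = np.random.default_rng(8)
        X2 = rng.standard_normal((257, 200)) + 1j * rng.standard_normal((257, 200))
        X2_sync = compensate_sfo(X2, eps=80e-6, hop=256, n_fft=512)
        print("compensated STFT shape:", X2_sync.shape)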

  • 多変量確率モデルによる脳波の信号分離

    栗花,悠輔, 宮部,滋樹, ルトコフスキ,トマシュ, 松本,佳泰, 山田,武志, 牧野,昭二

    電子情報通信学会技術研究報告. MBE, MEとバイオサイバネティックス   112 ( 479 ) 161 - 166  2013年03月

     概要を見る

    信号源分離の主流をなす枠組である独立成分分析は,無数の信号源が混合された脳波の観測信号から目的信号成分を高精度に分離するのは難しい.本稿では,脳内の個々の現象に関連する脳波の振幅変化を脳波イベントと定義し,無数の信号源により生成される脳波イベントの観測が,それぞれ短時間では局所的に零平均多変量正規分布に従うという確率モデルを定式化する.時間周波数領域で脳波イベントがスパースに発生すると仮定すると,観測信号の尤度は混合正規分布で表され,EMアルゴリズムによって脳波イベントのパラメタを推定することが可能になる.また,適切な超パラメタを持つディリクレ分布を各正規分布の発生確率に導入することにより,EMアルゴリズムで有意な脳波イベントの数とそのパラメタを推定することが可能となる.脳波分離実験により,適切な数の脳波イベントが分離できていることを確認した.

  • A network model for the embodied communication of musical emotions

    H.,Terasawa, R.,Hoshi-Shiba, T.,Shibayama, H.,Ohmura, K.,Furukawa, S.,Makino, K.,Okanoya

    Cognitive Studies   20 ( 1 ) 112 - 129  2013年03月  [査読有り]

     概要を見る

    Music induces a wide range of emotions. However, the influence of physiological functions on musical emotions needs further theoretical considerations. This paper summarizes the physical and physiological functions that are related to musical emotions, and proposes a model for the embodied communication of musical emotions based on a discussion on the transmission of musical emotions across people by sharing movements and gestures. In this model, human with musical emotion is represented with (1) the interfaces of perception and expression (senses, movements, facial and vocal expressions), (2) an internal system of neural activities including the mirror system and the hormonal secretion system that handles responses to musical activities, and (3) the musical emotion that is enclosed in the internal system. Using this model, music is the medium for transmitting emotions, and communication of musical emotions is the communication of internal emotions through music and perception/expression interfaces. Finally, we will discuss which aspect in music functions to encourage the communication of musical emotions by humans.

    DOI CiNii


  • Speech enhancement with ad-hoc microphone array using single source activity

    Ryutaro Sakanashi, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( OS.21-SLA.7.5 ) 1 - 6  2013年  [査読有り]

     概要を見る

    In this paper, we propose a method for synchronizing asynchronous channels in an ad-hoc microphone array based on single source activity for speech enhancement. An ad-hoc microphone array can include multiple recording devices, which do not communicate with each other. Therefore, their synchronization is a significant issue when using conventional microphone array techniques. We here assume that we know two or more segments (typically the beginning and the end of the recording) where only a single sound source is active. Based on this situation, we compensate for the difference between the start and end of the recording and the sampling frequency mismatch. We also describe experimental results for speech enhancement with a maximum SNR beamformer.
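
    For the enhancement stage mentioned above, the sketch below builds a maximum-SNR beamformer per frequency bin as the principal generalized eigenvector of a (target-plus-noise, noise-only) covariance pair; the covariances are formed from random dummy frames, whereas in the method above they would come from the synchronized single-source-active and noise segments.

        # Minimal sketch: a maximum-SNR (generalized eigenvector) beamformer for one frequency bin.
        import numpy as np
        from scipy.linalg import eigh

        def max_snr_filter(R_target, R_noise):
            """Return the filter maximizing the output SNR w^H R_t w / w^H R_n w."""
            eigvals, eigvecs = eigh(R_target, R_noise)   # generalized EVD, ascending eigenvalues
            return eigvecs[:, -1]                        # principal generalized eigenvector

        rng = np.random.default_rng(9)
        n_ch, n_frames = 4, 500
        noise = rng.standard_normal((n_ch, n_frames)) + 1j * rng.standard_normal((n_ch, n_frames))
        steer = rng.standard_normal(n_ch) + 1j * rng.standard_normal(n_ch)
        target = steer[:, None] * (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames))

        R_n = noise @ noise.conj().T / n_frames
        R_t = (target + noise) @ (target + noise).conj().T / n_frames
        w = max_snr_filter(R_t, R_n)
        out_snr = np.real(w.conj() @ (R_t - R_n) @ w) / np.real(w.conj() @ R_n @ w)
        print(f"output SNR of the max-SNR beamformer: {10 * np.log10(out_snr):.1f} dB")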

  • Performance estimation of noisy speech recognition using spectral distortion and SNR of noise-reduced speech

    Guo Ling, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    IEEE Region 10 Annual International Conference, Proceedings/TENCON    2013年  [査読有り]

     概要を見る

    To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using the PESQ (Perceptual Evaluation of Speech Quality) as a spectral distortion measure. However, there is the problem that the relationship between the recognition performance and the distortion value differs depending on the noise reduction algorithm used. To solve this problem, we propose a novel performance estimation method that uses an estimator defined as a function of the distortion value and the SNR (Signal to Noise Ratio) of noise-reduced speech. The estimator is applicable to different noise reduction algorithms without any modification. We confirmed the effectiveness of the proposed method by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. © 2013 IEEE.
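
    A toy sketch of the idea of a two-variable estimator: a simple polynomial regression that maps a spectral-distortion score and the SNR of the noise-reduced speech to recognition accuracy. The data points and functional form are synthetic assumptions, not the PESQ values, measured SNRs or AURORA-2J results used in the study.

        # Minimal sketch: fitting an accuracy estimator of the form f(distortion, SNR).
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(10)
        n = 120
        pesq = rng.uniform(1.0, 4.5, n)                    # distortion measure of enhanced speech
        snr = rng.uniform(-5.0, 20.0, n)                   # SNR of the noise-reduced speech (dB)
        accuracy = np.clip(20 * pesq + 1.5 * snr + rng.normal(0, 4, n), 0, 100)  # toy relation

        X = np.column_stack([pesq, snr])
        est = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
        est.fit(X, accuracy)
        print("predicted accuracy for distortion=3.0, SNR=10 dB:",
              round(float(est.predict([[3.0, 10.0]])[0]), 1), "%")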

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • Classification improvement of P300 response based auditory spatial speller brain-computer interface paradigm

    Moonjeong Chang, Shoji Makino, Tomasz M. Rutkowski

    IEEE Region 10 Annual International Conference, Proceedings/TENCON   ( S.I.2.1 ) 1 - 4  2013年  [査読有り]

     概要を見る

    The aim of the presented study is to provide a comprehensive test of the EEG evoked response potential (ERP) feature selection techniques for the spatial auditory BCI-speller paradigm, which creates a novel communication option for paralyzed subjects or able-bodied individuals requiring a direct brain-computer interfacing application. For rigor, the study is conducted with 16 BCI-naive healthy subjects in an experimental setup based on five Japanese hiragana characters in an offline processing mode. In our previous studies the spatial auditory stimuli related P300 responses resulted in encouragingly separable target vs. non-target latencies in averaged responses, yet that finding was not well reproduced in the online BCI single trial based settings. We present the case study indicating that the auditory spatial unimodal paradigm classification accuracy can be enhanced with an AUC based feature selection approach, as far as BCI-naive healthy subjects are concerned. © 2013 IEEE.

    DOI

    Scopus

    5
    被引用数
    (Scopus)
  • Bone-conduction-based brain computer interface paradigm - EEG signal processing, feature extraction and classification

    Daiki Aminaka, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

    Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013   ( WS-MISA-03 ) 818 - 824  2013年  [査読有り]

     概要を見る

    The paper presents a novel bone-conduction based brain-computer interface paradigm. Four sub-threshold acoustic frequency stimulus patterns are presented to the subjects in an oddball paradigm allowing for 'aha-responses' generation to the attended targets. This allows for successful implementation of the bone-conduction based brain-computer interface (BCI) paradigm. The concept is confirmed with seven subjects in online bone-conducted auditory Morse-code patterns spelling BCI paradigm. We report also brain electrophysiological signal processing and classification steps taken to achieve the successful BCI paradigm. We also present a finding of the response latency variability in a function of stimulus difficulty. © 2013 IEEE.

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • VIRTUALLY INCREASING MICROPHONE ARRAY ELEMENTS BY INTERPOLATION IN COMPLEX-LOGARITHMIC DOMAIN

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)   ( TH-L5.3 )  2013年

     概要を見る

    In this paper, we propose a new array signal processing technique for an underdetermined condition by increasing the number of observation channels. We introduce virtual observation as an estimate of the observed signals at positions where real microphones are not placed. Such signals at virtual observation channels are generated by the complex logarithmic interpolation of real observed signals. With the increased number of observation channels, conventional linear array signal processing methods can be applied to underdetermined conditions. As an example of the proposed array signal processing framework, we show experimental results of speech enhancement obtained with maximum SNR beamformers modified using the virtual observation.

  • Multi-command chest tactile brain computer interface for small vehicle robot navigation

    Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8211 LNAI   469 - 478  2013年  [査読有り]

     概要を見る

    The presented study explores the extent to which tactile stimuli delivered to five chest positions of a healthy user can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The five chest locations are used to evoke tactile brain potential responses, thus defining a tactile brain computer interface (tBCI). Experimental results with five subjects performing online tBCI provide a validation of the chest location tBCI paradigm, while the feasibility of the concept is illuminated through information-transfer rates. Additionally an offline classification improvement with a linear SVM classifier is presented through the case study. © Springer International Publishing 2013.

    DOI

    Scopus

    11
    被引用数
    (Scopus)
  • Classifying P300 responses to vowel stimuli for auditory brain-computer interface

    Yoshihiro Matsumoto, Shoji Makino, Koichi Mori, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.8 ) 1 - 5  2013年  [査読有り]

     概要を見る

    A brain-computer interface (BCI) is a technology for operating computerized devices based on brain activity and without muscle movement. BCI technology is expected to become a communication solution for amyotrophic lateral sclerosis (ALS) patients. Recently the BCI2000 package application has been commonly used by BCI researchers. The P300 speller included in the BCI2000 is an application allowing the calculation of a classifier necessary for the user to spell letters or sentences in a BCI-speller paradigm. The BCI-speller is based on visual cues, and requires muscle activities such as eye movements, impossible to execute by patients in a totally locked-in state (TLS), which is a terminal stage of the ALS illness. The purpose of our project is to solve this problem, and we aim to develop an auditory BCI as a solution. However, contemporary auditory BCI-spellers are much weaker compared with a visual modality. Therefore there is a necessity for improvement before practical application. In this paper, we focus on an approach related to the differences in responses evoked by various acoustic BCI-speller related stimulus types. In spite of various event related potential waveform shapes, typically a classifier in the BCI speller discriminates only between targets and non-targets, and hence it ignores valuable and possibly discriminative features. Therefore, we expect that the classification accuracy could be improved by using an independent classifier for each of the stimulus cue categories. In this paper, we propose two classifier training methods. The first one uses the data of the five stimulus cues independently. The second method incorporates weighting for each stimulus cue feature in relation to all of them. The results of the experiments reported show the effectiveness of the second method for classification improvement. © 2013 APSIPA.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
  • EMPLOYING MOMENTS OF MULTIPLE HIGH ORDERS FOR HIGH-RESOLUTION UNDERDETERMINED DOA ESTIMATION BASED ON MUSIC

    Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Fred Juang

    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)   ( PM-02 ) 1 - 4  2013年  [査読有り]

     概要を見る

    Several extensions of the MUltiple SIgnal Classification (MUSIC) algorithm exploiting higher-order statistics have been proposed to estimate directions of arrival (DOAs) with high resolution in underdetermined conditions. However, these methods entail a trade-off between two performance goals, namely robustness and resolution, in the choice of orders, because the use of higher-ordered statistics increases not only the resolution but also the statistical bias. To overcome this problem, this paper proposes a new extension of MUSIC using a nonlinear high-dimensional map, which corresponds to the joint analysis of moments of multiple orders and helps to realize both the robustness of low-ordered statistics and the high resolution of high-ordered statistics. Experimental results show that the proposed method can estimate DOAs more accurately than the conventional MUSIC extensions exploiting moments of a single high order.
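
    For context, the sketch below computes the standard second-order MUSIC pseudo-spectrum for a uniform linear array, which is the baseline that the proposed higher-order-moment extension builds on; the array geometry, source angles and noise level are illustrative assumptions, and this is not the proposed nonlinear-mapping method.

        # Minimal sketch: baseline (second-order) MUSIC DOA estimation for a uniform linear array.
        import numpy as np
        from scipy.signal import find_peaks

        rng = np.random.default_rng(11)
        n_mics, n_src, n_snap = 6, 2, 2000
        d_over_lambda = 0.5                                   # half-wavelength element spacing
        angles_true = np.deg2rad([-20.0, 35.0])

        def steering(theta):
            m = np.arange(n_mics)
            return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

        A = np.column_stack([steering(t) for t in angles_true])
        S = rng.standard_normal((n_src, n_snap)) + 1j * rng.standard_normal((n_src, n_snap))
        N = 0.1 * (rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap)))
        X = A @ S + N

        R = X @ X.conj().T / n_snap                           # spatial covariance (2nd-order moments)
        eigvals, eigvecs = np.linalg.eigh(R)                  # ascending eigenvalues
        En = eigvecs[:, : n_mics - n_src]                     # noise subspace

        scan = np.deg2rad(np.arange(-90, 91))
        p_music = np.array([1.0 / np.real(steering(t).conj() @ En @ En.conj().T @ steering(t))
                            for t in scan])
        peaks, _ = find_peaks(p_music)
        top = peaks[np.argsort(p_music[peaks])[-n_src:]]
        print("estimated DOAs (deg):", np.sort(np.rad2deg(scan[top])))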

  • OPTIMIZING FRAME ANALYSIS WITH NON-INTEGER SHIFT FOR SAMPLING MISMATCH COMPENSATION OF LONG RECORDING

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)   ( TM-09 ) 1 - 4  2013年  [査読有り]

     概要を見る

    This paper proposes a blind synchronization of an ad-hoc microphone array in the short-time Fourier transform (STFT) domain with optimized frame analysis centered at non-integer discrete time. We show that the drift caused by the sampling frequency mismatch of asynchronous observation channels can be disregarded within a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated roughly by finding two pairs of short intervals corresponding to the same continuous time. Using this estimate, the STFT analysis is synchronized roughly between channels with an optimized frame center. Since the optimized frame center is generally non-integer, we approximate the frame analysis by linear phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation then refines the compensation of the sampling frequency mismatch.

  • Spatial auditory BCI with ERP responses to front-back to the head stimuli distinction support

    Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.1 ) 1 - 8  2013年  [査読有り]

     概要を見る

    This paper presents recent results obtained with a new auditory spatial localization based BCI paradigm in which ERP shape differences at early latencies are employed to enhance classification accuracy in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing the possibility to differentiate early anterior contralateral responses to the spatial sources attended to. We also find that early brain responses indicate which direction, front or rear loudspeaker source, the subject attended to. Contemporary stimuli-driven BCI paradigms benefit most from the P300 ERP latencies in a so-called 'aha-response' setting. We show the further enhancement of the classification results in a spatial auditory paradigm, in which we incorporate N200 latencies. The results reveal that these early spatial auditory ERPs boost offline classification results of the BCI application. The offline BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information transfer rates. © 2013 APSIPA.

    DOI

    Scopus

    2
    被引用数
    (Scopus)
  • Adaptive processing and learning for audio source separation

    Jen-Tzung Chien, Hiroshi Sawada, Shoji Makino

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.42-SLA.13.3 ) 1 - 6  2013年  [査読有り]

     概要を見る

    This paper overviews a series of recent advances in adaptive processing and learning for audio source separation. In real world, speech and audio signal mixtures are observed in reverberant environments. Sources are usually more than mixtures. The mixing condition is occasionally changed due to the moving sources or when the sources are changed or abruptly present or absent. In this survey article, we investigate different issues in audio source separation including overdetermined/underdetermined problems, permutation alignment, convolutive mixtures, contrast functions, nonstationary conditions and system robustness. We provide a systematic and comprehensive view for these issues and address new approaches to overdetermined/underdetermined convolutive separation, sparse learning, nonnegative matrix factorization, information-theoretic learning, online learning and Bayesian approaches. © 2013 APSIPA.

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • Spatial auditory BCI paradigm based on real and virtual sound image generation

    Nozomu Nishikawa, Shoji Makino, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.7 ) 1 - 5  2013年  [査読有り]

     概要を見る

    This paper presents a novel concept of spatial auditory brain-computer interface utilizing real and virtual sound images. We report results obtained from psychophysical and EEG experiments with nine subjects utilizing a novel method of spatial real or virtual sound images as spatial auditory brain computer interface (BCI) cues. Real spatial sound sources result in better behavioral and BCI response classification accuracies, yet a direct comparison of partial results in a mixed experiment confirms the usability of the virtual sound images for the spatial auditory BCI. Additionally, we compare stepwise linear discriminant analysis (SWLDA) and support vector machine (SVM) classifiers in a single sequence BCI experiment. The interesting point of the mixed usage of real and virtual spatial sound images in a single experiment is that both stimuli types generate distinct event related potential (ERP) response patterns allowing for their separate classification. This discovery is the strongest point of the reported research and it brings the possibility to create new spatial auditory BCI paradigms. © 2013 APSIPA.

    DOI

    Scopus

    6
    被引用数
    (Scopus)
  • Multi-command tactile brain computer interface: A feasibility study

    Hiromu Mori, Yoshihiro Matsumoto, Victor Kryssanov, Eric Cooper, Hitoshi Ogawa, Shoji Makino, Zbigniew R. Struzik, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7989 LNCS   50 - 59  2013年  [査読有り]

     概要を見る

    The study presented explores the extent to which tactile stimuli delivered to the ten digits of a BCI-naive subject can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The ten fingertips are used to evoke somatosensory brain responses, thus defining a tactile brain computer interface (tBCI). Experimental results on subjects performing online (real-time) tBCI, using stimuli with a moderately fast inter-stimulus-interval (ISI), provide a validation of the tBCI prototype, while the feasibility of the concept is illuminated through information-transfer rates obtained through the case study. © 2013 Springer-Verlag.

    DOI

    Scopus

    14
    被引用数
    (Scopus)
  • EEG signal processing and classification for the novel tactile-force brain-computer interface paradigm

    Shota Kono, Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013   ( WS-MISA-02 ) 812 - 817  2013年  [査読有り]

     概要を見る

    The presented study explores the extent to which tactile-force stimulus delivered to a hand holding a force-feedback joystick can serve as a platform for a brain-computer interface (BCI). The four pressure directions are used to evoke tactile brain potential responses, thus defining a tactile-force brain computer interface (tfBCI). We present brain signal processing and classification procedures leading to successful online interfacing results. Experimental results with seven subjects performing online BCI experiments provide a validation of the hand location tfBCI paradigm, while the feasibility of the concept is illuminated through remarkable information-transfer rates. © 2013 IEEE.

    DOI

    Scopus

    4
    被引用数
    (Scopus)
  • Inter-subject differences in personalized technical ear training and the influence of an individually optimized training sequence

    Sungyoung Kim, Teruaki Kaniwa, Hiroko Terasawa, Takeshi Yamada, Shoji Makino

    Acoustical Science and Technology   34 ( 6 ) 424 - 431  2013年  [査読有り]

     概要を見る

    Technical ear training aims to improve the listening of sound engineers so they can skillfully modify and edit the structure of sound. Despite recent increasing interest in listening ability and subjective evaluation in audio- and acoustics-related fields and the subsequent appearance of various technical ear-training methods, the subject of how to provide efficient training for a self-trainee has not yet been studied. This paper investigated trainees' performances and showed that an (inherent or learned) ability to correctly describe spectral differences using the terms of a parametric equalizer (center frequency, Q, and gain) was different for each person. To cope with such individual differences in spectral identification, the authors proposed a novel method that adaptively controls the training task based on a trainee's prior performances. In detail, the method estimates the weakness of the trainee, and generates a training routine that focuses on that weakness. Subsequently, we tried to determine whether the proposed method (adaptive feedback) helps self-learners improve their performance in technical listening that involves identifying spectral differences. The results showed that the proposed method could assist trainees in improving their ability to identify differences more effectively than the counterpart group. Together with other features required for effective self-training, this adaptive feedback would assist a trainee in acquisition of timbre-identification ability. © 2013 The Acoustical Society of Japan.

    DOI

    Scopus

    9
    被引用数
    (Scopus)
  • Exhaustive structural comparison of protein-DNA binding surfaces

    R. Minai, T. Horiike, S. Makino

    GIW2012 (International Conference on Genome Informatics)   ( poster 29 )  2012年12月  [査読有り]

  • Full-reference objective quality evaluation for noise-reduced speech considering effect of musical noise

    Y. Fujita, T. Yamada, S. Makino, N. Kitawaki

    Oriental COCOSDA2012     300-305  2012年12月  [査読有り]

  • Foreword to special issue on recent mathematical advances in acoustic signal processing

    S. Makino

    The Journal of the Acoustical Society of Japan   68 ( 11 ) 557 - 558  2012年11月  [査読有り]

    CiNii

  • A multi-command spatial auditory BMI based on evoked EEG responses from real and virtual sound stimuli

    T.M. Rutkowski, Z. Cai, N. Nishikawa, Y. Matsumoto, S. Makino, D. Looney, D.P. Mandic, Z.R. Struzik, A.W. Przybyszewski

    Neuroscience2012     891.16/NN4  2012年10月  [査読有り]

  • Underdetermined DOA estimation by the non-linear MUSIC exploiting higher-order moments

    Y. Sugimoto, S. Miyabe, T. Yamada, S. Makino, F. Juang

    IWAENC2012   ( E-03 )  2012年09月  [査読有り]

  • In Search of a Perceptual Metric for Timbre: Dissimilarity Judgments among Synthetic Sounds with MFCC-Derived Spectral Envelopes

    Hiroko Terasawa, Jonathan Berger, Shoji Makino

    JOURNAL OF THE AUDIO ENGINEERING SOCIETY   60 ( 9 ) 674 - 685  2012年09月  [査読有り]

     概要を見る

    This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, that is, the perception specifically related to the spectral element of timbre. Mel-cepstrum (Mel-frequency cepstral coefficients or MFCCs) is chosen as a hypothetical metric for spectral envelope perception due to its desirable properties of linearity, orthogonality, and multidimensionality. The experimental results confirmed the relevance of Mel-cepstrum to the perceived timbre dissimilarity when the spectral envelopes of complex-tone synthetic sounds were systematically controlled. The first experiment measured the perceived dissimilarity when the stimuli were synthesized by varying only a single coefficient from MFCC. Linear regression analysis proved that each of the 12 MFCCs has a linear correlation with spectral envelope perception. The second experiment measured the perceived dissimilarity when the stimuli were synthesized by varying two of the MFCCs. Multiple regression analysis showed that the perceived dissimilarity can be explained in terms of the Euclidean distance of the MFCC values of the synthetic sounds. The quantitative and perceptual relevance between the MFCCs and spectral centroids is also discussed. These results suggest that MFCCs can be a metric representation of spectral envelope perception, where each of its orthogonal basis functions provides a linear match with human perception.
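    As a rough illustration of the metric idea above, the sketch below synthesizes two complex tones with different spectral envelopes and computes their distance in MFCC space. This is a minimal sketch only, assuming librosa for the MFCC computation (not a tool named in the paper), with illustrative parameter values.

```python
# Minimal sketch: Euclidean distance between time-averaged MFCC vectors as a
# spectral-envelope dissimilarity measure (librosa is an assumption of this sketch).
import numpy as np
import librosa

sr = 16000
t = np.arange(sr) / sr

def complex_tone(partial_amps, f0=220.0):
    """Synthesize a harmonic complex tone with the given partial amplitudes."""
    x = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
            for k, a in enumerate(partial_amps))
    return x / np.max(np.abs(x))

# Two stimuli that differ only in their spectral envelope.
x1 = complex_tone(np.ones(12))                       # flat envelope
x2 = complex_tone(1.0 / np.arange(1, 13))            # 1/k roll-off

def mean_mfcc(x, n_mfcc=13):
    m = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=n_mfcc)
    return m.mean(axis=1)                            # time-averaged MFCC vector

# Predicted dissimilarity: Euclidean distance in MFCC space (c0 excluded,
# since overall level is not part of the spectral-envelope shape).
d = np.linalg.norm(mean_mfcc(x1)[1:] - mean_mfcc(x2)[1:])
print(f"MFCC-space distance: {d:.2f}")
```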

  • Analysis of brain responses to spatial real and virtual sounds - A BCI/BMI approach

    N. Nishikawa, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012年06月  [査読有り]

  • Steady-state auditory responses application to BCI/BMI

    Y. Matsumoto, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012年06月  [査読有り]

  • Spatial auditory BCI/BMI paradigm

    Z. Cai, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012年06月  [査読有り]

  • フルランク空間相関行列モデルに基づく拡散性雑音除去

    礒, 佳樹, 荒木, 章子, 牧野, 昭二, 中谷, 智広, 澤田, 宏, 山田, 武志, 宮部, 滋樹, 中村, 篤

    電子情報通信学会総合大会講演論文集   2012 ( 0 ) 194  2012年03月

  • D-14-1 雑音抑圧音声の主観品質評価におけるミュージカルノイズの影響(D-14.音声,一般セッション)

    藤田, 悠希, 山田, 武志, 牧野, 昭二, 北脇, 信彦

    電子情報通信学会総合大会講演論文集   2012 ( 1 ) 185  2012年03月

  • Cepstral smoothing of separated signals for underdetermined speech separation

    Y.,Ansai, S.,Araki, S.,Makino, T.,Nakatani, T.,Yamada, A.,Nakamura, N.,Kitawaki

    The Journal of the Acoustical Society of Japan   68 ( 2 ) 74 - 85  2012年02月  [査読有り]

     概要を見る

    This paper proposes cepstral smoothing of separated signals (CSS), which aims to reduce the musical noise produced by sparseness-based source separation methods that use time-frequency binary masks (BM). CSS builds on the cepstral-domain smoothing idea of the recently proposed cepstral smoothing of spectral masks (CSM), together with the hypothesis that, from the viewpoint of controlling how speech characteristics are preserved in the cepstral representation, it is preferable to directly smooth the separated speech obtained with the BM rather than the mask itself. We also experimentally compare the conventional method (CSM), the proposed method (CSS), and other musical-noise reduction techniques. CSS achieved musical-noise reduction comparable to that of CSM while yielding separated signals with less distortion of the target speech.
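    The following minimal sketch illustrates the general idea of cepstral-domain smoothing of a separated (binary-masked) magnitude spectrum; the quefrency split and weights are illustrative placeholders, not the smoothing constants used in the paper.

```python
# Minimal sketch: smooth one magnitude-spectrum frame of a separated signal in the
# cepstral domain (illustrative weights; apply frame by frame to the masked STFT
# magnitude and resynthesize with the original phase).
import numpy as np

def cepstral_smooth(mag_frame, keep_low=20, weight_high=0.4):
    """mag_frame: magnitude spectrum of one frame (length n_fft//2 + 1)."""
    log_mag = np.log(np.maximum(mag_frame, 1e-12))
    cep = np.fft.irfft(log_mag)                      # real cepstrum (length n_fft)
    n = cep.size
    q = np.minimum(np.arange(n), n - np.arange(n))   # symmetric quefrency index
    w = np.where(q < keep_low, 1.0, weight_high)     # keep envelope, damp fine structure
    smoothed_log = np.fft.rfft(cep * w).real
    return np.exp(smoothed_log)
```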

    CiNii

  • NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)     269 - 272  2012年  [査読有り]

     概要を見る

    In this paper, we propose a new technique for sparseness-based underdetermined BSS that is based on the clustering of frequency-dependent time difference of arrival (TDOA) information and that can cope with diffused noise environments. Such a method with an EM algorithm has already been proposed; however, it requires a time-consuming exhaustive search for TDOA inference. To remove the need for such an exhaustive search, we propose a new technique by focusing on a stereo case. We derive an update rule for analytical TDOA estimation. This update rule eliminates the need for the exhaustive TDOA search, and therefore reduces the computational load. We show experimental results for separation performance and calculation time in comparison with those obtained with the conventional approach. The reported results validate the proposed method; that is, it achieves high performance without a high computational cost.

  • Spatial auditory BCI paradigm utilizing N200 and P300 responses

    Zhenyu Cai, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.6-BioSPS.1.4 ) 1-7  2012年  [査読有り]

     概要を見る

    The paper presents our recent results obtained with a new auditory spatial-localization-based BCI paradigm in which ERP shape differences at early latencies are employed to enhance the traditional P300 responses in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing that early anterior contralateral responses to attended spatial sources can be differentiated. Contemporary stimulus-driven BCI paradigms benefit mostly from the P300 ERP latencies in so-called "aha-response" settings. We show a further enhancement of the classification results in spatial auditory paradigms by incorporating the N200 latencies, which differentiate brain responses to sound locations lateral to the subject's head in the auditory space. The results reveal that these early spatial auditory ERPs boost the online classification results of the BCI application. Online BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information-transfer rates. © 2012 APSIPA.

  • Sonification of Muscular Activity in Human Movements Using the Temporal Patterns in EMG

    Masaki Matsubara, Hiroko Terasawa, Hideki Kadone, Kenji Suzuki, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.6-BioSPS.1.2 ) 1-5  2012年  [査読有り]

     概要を見る

    Biofeedback is currently considered as an effective method for medical rehabilitation. It aims to increase the awareness and recognition of the body's motion by feeding back the physiological information to the patients in real time. Our goal is to create an auditory biofeedback that aids understanding of the dynamic motion involving multiple muscular parts, with the ultimate aim of clinical rehabilitation use. In this paper, we report the development of a real-time sonification system using EMG, and we propose three sonification methods that represent the data in pitch, timbre, and the combination of polyphonic timbre and loudness. Our user evaluation test involves the task of timing and order identification and a questionnaire about the subjective comprehensibility and the preferences, leading to a discussion of the task performance and usability. The results show that the subjects can understand the order of the muscular activities at 63.7% accuracy on average. And the sonification method with polyphonic timbre and loudness provides an 85.2% accuracy score on average, showing its effectiveness. Regarding the preference of the sound design, we found that there is not a direct relationship between the task performance accuracy and the preference of sound in the proposed implementations.

  • Vibrotactile stimulus frequency optimization for the haptic BCI prototype

    Hiromu Mori, Yoshihiro Matsumito, Shoji Makino, Victor Kryssanov, Tomasz M. Rutkowski

    6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012     2150 - 2153  2012年  [査読有り]

     概要を見る

    The paper presents results from a psychophysical study conducted to optimize vibrotactile stimuli delivered to the subjects' fingertips in order to evoke the somatosensory responses to be utilized next in a haptic brain-computer interface (hBCI) paradigm. We also present the preliminary EEG evoked responses for the chosen stimulation frequency. The obtained results confirm our hypothesis that the hBCI paradigm concept is valid and that it will allow rapid stimulus presentation in order to improve the information-transfer rate (ITR) of the BCI. © 2012 IEEE.

    DOI

    Scopus

    15
    被引用数
    (Scopus)
  • AUTOMATIC SCORING METHOD CONSIDERING QUALITY AND CONTENT OF SPEECH FOR SCAT JAPANESE SPEAKING TEST

    Naoko Okubo, Yuto Yamahata, Takeshi Yamada, Shingo Imai, Kenkichi Ishizuka, Takahiro Shinozaki, Ryuichi Nisimura, Shoji Makino, Nobuhiko Kitawaki

    2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS     72 - 77  2012年  [査読有り]

     概要を見る

    We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is determined holistically by language teachers according to a rating standard. In that process, teachers carefully consider different factors but do not score each factor separately. We therefore analyze how each factor contributes to the overall score. The factors are divided into two categories: the quality of speech and the content of speech. The former includes pronunciation and intonation, and the latter includes representation and vocabulary. We then propose an automatic scoring method based on this analysis. Experimental results confirm that the proposed method gives relatively accurate estimates of the overall score.

  • Auditory steady-state response stimuli based BCI application - The optimization of the stimuli types and lengths

    Yoshihiro Matsumoto, Nozomu Nishikawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.13-BioSPS.2.3 ) 1-7  2012年  [査読有り]

     概要を見る

    We propose a method for improving the auditory BCI (aBCI) paradigm based on optimizing the ASSR stimuli by choosing each subject's best responses to AM-, flutter-, AM/FM- and click-envelope-modulated sounds. As the ASSR response features, we propose pairwise phase-locking values calculated from the EEG and then classified with a binary classifier to detect attended and ignored stimuli. We also report on the possibility of using stimuli as short as half a second, which is a step forward in ASSR-based aBCI. The presented results are helpful for optimizing the aBCI stimuli for each subject. © 2012 APSIPA.
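    For illustration, the sketch below computes pairwise phase-locking values between EEG channels in a narrow band around 40 Hz, the kind of feature vector that could then be fed to a binary classifier; the band, filter order, and simulated data are assumptions of this sketch, not the paper's settings.

```python
# Minimal sketch: pairwise phase-locking values (PLV) between EEG channels
# in a narrow band, as illustrative ASSR-style features.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def plv_matrix(eeg, fs, band=(38.0, 42.0)):
    """eeg: array (n_channels, n_samples); returns an (n_ch, n_ch) PLV matrix."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, eeg, axis=1), axis=1))
    n_ch = eeg.shape[0]
    plv = np.ones((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            # PLV = magnitude of the mean phase-difference phasor over time.
            plv[i, j] = plv[j, i] = np.abs(np.mean(np.exp(1j * (phase[i] - phase[j]))))
    return plv

# Example: pairwise PLVs on simulated 8-channel data (4 s at 256 Hz).
fs = 256
eeg = np.random.randn(8, 4 * fs)
features = plv_matrix(eeg, fs)[np.triu_indices(8, k=1)]   # vector of pairwise PLVs
```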

  • EEG steady state synchrony patterns sonification

    Teruaki Kaniwa, Hiroko Terasawa, Masaki Matsubara, Tomasz M. Rutkowski, Shoji Makino

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.6-BioSPS.1.5 ) 1-6  2012年  [査読有り]

     概要を見る

    This paper describes an application of a multichannel EEG sonification approach. We present results obtained with a multichannel sonification method tested on steady-state EEG responses. We elucidate brain synchrony patterns in the auditory domain by utilizing an EEG coherence measure. Transitions in the synchrony patterns are represented as timbre (i.e., spectro-temporal) deviations and as spatial movement of the sound cluster. A final sonification evaluation experiment with six subjects confirms the validity of the proposed brain-synchrony elucidation approach. © 2012 APSIPA.

  • Distance Attenuation Control of Spherical Loudspeaker Array

    Shigeki Miyabe, Takaya Hayashi, Takeshi Yamada, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.15-SLA.7.2 ) 1-4  2012年  [査読有り]

     概要を見る

    This paper describes the control of distance attenuation using a spherical loudspeaker array. Fisher et al. proposed radial filtering with a spherical microphone array to control the sensitivity to the distance from a sound source by modeling wave propagation in the spherical harmonic domain. Since transfer functions are unchanged when their inputs and outputs are swapped (reciprocity), the same radial-filtering theory developed for microphone arrays can be used to design distance-attenuation control filters for loudspeaker arrays. Experimental results confirmed that the proposed method is effective at low frequencies.

  • The spatial real and virtual sound stimuli optimization for the auditory BCI

    Nozomu Nishikawa, Yoshihiro Matsumoto, Shoji Makino, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.13-BioSPS.2.6 ) 1-9  2012年  [査読有り]

     概要を見る

    The paper presents results from a project aiming to create horizontally distributed surround sound sources and virtual sound images as auditory BCI (aBCI) stimuli. The purpose is to create evoked brain-wave response patterns that depend on whether a sound direction is attended or ignored. We propose using a modified version of the vector base amplitude panning (VBAP) approach to achieve this goal. The spatial sound stimulus system created in this way for the novel oddball aBCI paradigm allows us to create a multi-command experimental environment, with very encouraging results reported in this paper. We also present results showing that modulating the depth of the sound image also changes the subject responses. Finally, we compare the proposed virtual-sound approach with the traditional one based on real sound sources generated from the real loudspeaker directions. The obtained results confirm the hypothesis that brain responses to the spatial type and depth of sound sources can be modulated independently, which allows for the development of the novel multi-command aBCI. © 2012 APSIPA.
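    As a minimal sketch of the underlying panning principle, the code below computes standard 2-D VBAP gains for a loudspeaker pair; the loudspeaker angles are illustrative, and the paper's modified VBAP and depth control are not reproduced here.

```python
# Minimal sketch of 2-D vector base amplitude panning (VBAP): gains that place a
# virtual source between a pair of loudspeakers (illustrative angles).
import numpy as np

def vbap_pair_gains(source_az_deg, spk_az_deg=(30.0, -30.0)):
    """Return the two loudspeaker gains for a virtual source at source_az_deg."""
    p = np.array([np.cos(np.radians(source_az_deg)), np.sin(np.radians(source_az_deg))])
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))] for a in spk_az_deg])
    g = p @ np.linalg.inv(L)          # solve p = g @ L for the gain pair g
    return g / np.linalg.norm(g)      # power normalization

print(vbap_pair_gains(10.0))          # virtual image between the two loudspeakers
```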

  • Psychophysical responses comparison in spatial visual, audiovisual, and auditory BCI-spelling paradigms

    Moonjeong Chang, Nozomu Nishikawa, Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

    6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012     2154 - 2157  2012年  [査読有り]

     概要を見る

    The paper presents a pilot study conducted with spatial visual, audiovisual, and auditory brain-computer interface (BCI) based speller paradigms. The psychophysical experiments are conducted with healthy subjects in order to evaluate the difficulty and the possible variability in response accuracy. We also present preliminary EEG results in offline BCI mode. The obtained results validate the thesis that the spatial auditory-only paradigm performs as well as the traditional visual and audiovisual speller BCI tasks. © 2012 IEEE.

    DOI

    Scopus

    5
    被引用数
    (Scopus)
  • Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter

    Ryutaro Sakanashi, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.18-SLA.9.5 ) 1-6  2012年  [査読有り]

     概要を見る

    The multichannel Wiener filter proposed by Duong et al. can conduct underdetermined blind source separation (BSS) with low distortion. This method assumes that the observed signal is the superimposition of multichannel source images generated from multivariate normal distributions. The covariance matrix in each time-frequency slot is estimated by an EM algorithm which treats the source images as hidden variables. Using the estimated parameters, the source images are separated as the maximum a posteriori estimate. It is worth noting that this method does not assume the sparseness of sources, which is usually assumed in underdetermined BSS. In this paper we investigate the effectiveness of the three attributes of Duong's method, i.e., the source image model with a multivariate normal distribution, the observation model without a sparseness assumption, and source separation by a multichannel Wiener filter. We newly formulate three BSS methods with a similar source image model and different observation models assuming sparseness, and we compare them with Duong's method and conventional binary masking. Experimental results confirmed the effectiveness of all three attributes of Duong's method.
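    A minimal sketch of the separation step under this model is given below: each source image is recovered by a per-time-frequency multichannel Wiener filter, assuming the source variances and spatial covariance matrices have already been estimated (e.g., by the EM algorithm). Array shapes and the regularization constant are assumptions of this sketch.

```python
# Minimal sketch: per time-frequency multichannel Wiener filtering under the
# local Gaussian (full-rank spatial covariance) model, given estimated parameters.
import numpy as np

def mwf_separate(x_ft, v, R):
    """x_ft: mixture STFT (F, T, M); v: (J, F, T) source variances; R: (J, F, M, M)
    spatial covariances. Returns source-image estimates, shape (J, F, T, M)."""
    J, F, T = v.shape
    M = x_ft.shape[-1]
    out = np.zeros((J, F, T, M), dtype=complex)
    for f in range(F):
        for t in range(T):
            # Mixture covariance = sum of the source-image covariances.
            Rx = sum(v[j, f, t] * R[j, f] for j in range(J))
            Rx_inv = np.linalg.inv(Rx + 1e-9 * np.eye(M))
            for j in range(J):
                W = v[j, f, t] * R[j, f] @ Rx_inv        # Wiener gain for source j
                out[j, f, t] = W @ x_ft[f, t]            # MAP source-image estimate
    return out
```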

  • New analytical calculation and estimation of TDOA for underdetermined BSS in noisy environments

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.12-SLA.6.4 ) 1-6  2012年  [査読有り]

     概要を見る

    We have proposed a new algorithm for sparseness-based underdetermined blind source separation (BSS) that can cope with diffused noise environments. This algorithm includes a technique for estimating the time-difference-of-arrival (TDOA) parameter separately in individual frequency bins for each source. In this paper, we propose methods that integrate the frequency-bin-wise TDOA parameters to estimate the TDOA of each source. The accuracy of TDOA estimation with the proposed approach is shown experimentally in comparison with a conventional approach. The separation performance and calculation time of the proposed approach are also examined.
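    The sketch below illustrates the general idea of frequency-bin-wise TDOA estimation from the inter-channel phase followed by a simple integration across bins; the plain median used here is only a stand-in and is not the integration rule derived in the paper.

```python
# Minimal sketch: bin-wise TDOA estimates from the inter-channel phase, then a
# simple (median) integration across frequency bins.
import numpy as np

def binwise_tdoa(X1, X2, fs, n_fft):
    """X1, X2: STFTs of two channels for one source, shape (F, T), F = n_fft//2 + 1.
    Returns (per-bin TDOA estimates in seconds, integrated TDOA)."""
    freqs = np.arange(1, X1.shape[0]) * fs / n_fft            # skip DC (phase undefined)
    cross = np.mean(X1[1:] * np.conj(X2[1:]), axis=1)         # time-averaged cross-spectrum
    # Phase unwrapping is ignored here: valid only where |2*pi*f*tau| < pi
    # (low frequencies / small microphone spacing).
    tau_f = np.angle(cross) / (2 * np.pi * freqs)             # TDOA implied by each bin
    return tau_f, np.median(tau_f)
```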

  • Visualization of conversation flow in meetings by analysis of direction of arrivals and continuousness of utterance

    M. Katoh, Y. Sugimoto, S. Miyabe, S. Makino, T. Yamada, and N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-5  2011年11月  [査読有り]

  • New EEG components separation method: Data driven Huang-Hilbert transform application to auditory BMI paradigm

    T.M. Rutkowski, Q. Zhao, D.P. Mandic, Z. Cai, A. Cichocki, S. Makino, and A.W. Przybyszewski

    Neuroscience 2011     627.15/AAA32  2011年11月  [査読有り]

  • 周波数依存の時間差モデルによる劣決定BSS

    丸山, 卓郎, 荒木, 章子, 中谷, 智広, 宮部, 滋樹, 山田, 武志, 牧野, 昭二, 中村, 篤

    電子情報通信学会技術研究報告. EA, 応用音響   111 ( 306 ) 25 - 30  2011年11月

     概要を見る

    This study proposes an analytical update of the estimated time difference in sparseness-based underdetermined blind source separation (BSS) using the EM algorithm. Izumi et al. proposed an underdetermined BSS method that is robust even under noise and reverberation, but it must determine the time-difference parameter by an exhaustive discrete search, which is computationally expensive. Here we restrict the discussion to the two-channel BSS case and, by using a model in which the time difference depends on frequency, derive an analytical update rule for the time-difference parameter. With this approach the exhaustive search for the time difference is no longer needed, so the computation time can be reduced. We present experimental results comparing the separation performance and computational cost of the proposed method with those of the conventional method.

    CiNii

  • Performance estimation of noisy speech recognition based on short-term noise characteristics

    E. Morishita, T. Yamada, S. Makino, and N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2011年11月  [査読有り]

  • Performance estimation of noisy speech recognition considering the accuracy of acoustic models

    T. Takaoka, T. Yamada, S. Makino, and N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2011年11月  [査読有り]

  • A study on sound image control method for operational support of touch panel display

    Shigeyoshi Amano, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    Proc. APSIPA ASC 2011   ( Thu-PM.PS2 ) 1-1  2011年10月  [査読有り]

  • 雑音抑圧音声の主観・客観品質評価法

    山田, 武志, 牧野, 昭二, 北脇, 信彦

    日本音響学会誌   67 ( 10 ) 476 - 481  2011年10月  [査読有り]

    CiNii

  • Towards a personalized technical ear training program: An investigation of the effect of adaptive feedback

    T. Kaniwa, S. Kim, H. Terasawa, M. Ikeda, T. Yamada, and S. Makino

    Sound and Music Computing Conference     439-443  2011年07月  [査読有り]

  • C. elegans meets data sonification: Can we hear its elegant movement?

    H. Terasawa, Y. Takahashi, K. Hirota, T. Hamano, T. Yamada, A. Fukamizu, and S. Makino

    Sound and Music Computing Conference     77-82  2011年07月  [査読有り]

  • DOA Estimation for Multiple Sparse Sources with Arbitrarily Arranged Multiple Sensors

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY   63 ( 3 ) 265 - 275  2011年06月  [査読有り]

     概要を見る

    This paper proposes a method for estimating the direction of arrival (DOA) of multiple source signals for an underdetermined situation, where the number of sources N exceeds the number of sensors M (M < N). Some DOA estimation methods have already been proposed for underdetermined cases. However, since most of them restrict their microphone array arrangements, their DOA estimation ability is limited to a 2-dimensional plane. To deal with an underdetermined case where sources are distributed arbitrarily, we propose a method that can employ a 2- or 3-dimensional sensor array. Our new method employs the source sparseness assumption to handle an underdetermined case. Our formulation with the sensor coordinate vectors allows us to employ arbitrarily arranged sensors easily. We obtained promising experimental results for 2-dimensionally distributed sensors and sources with 3x4 and 3x5 (#sensors x #speech sources) configurations, and for a 3-dimensional case with 4x5, in a room (reverberation time (RT) of 120 ms). We also investigate the DOA estimation performance under several reverberant conditions.
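    As a rough illustration of how sensor coordinate vectors enter such a formulation, the sketch below estimates a far-field DOA vector by least squares from pairwise TDOAs and arbitrary 3-D sensor positions; the sparseness-based clustering of time-frequency points that precedes this step in the paper is omitted, and the example data are synthetic.

```python
# Minimal sketch: far-field DOA from pairwise TDOAs with arbitrarily placed sensors,
# via a least-squares fit using the sensor coordinate vectors.
import numpy as np

def doa_from_tdoas(sensor_xyz, tdoas, c=343.0, ref=0):
    """sensor_xyz: (M, 3) coordinates in metres; tdoas[m]: arrival time at sensor m
    minus arrival time at the reference sensor, in seconds.
    Returns a unit DOA vector pointing from the array toward the source."""
    d = sensor_xyz - sensor_xyz[ref]                 # sensor offsets from the reference
    mask = np.arange(len(tdoas)) != ref
    # Plane-wave model: tdoa_m = -(d_m . u) / c  =>  d_m . u = -c * tdoa_m
    u, *_ = np.linalg.lstsq(d[mask], -c * np.asarray(tdoas)[mask], rcond=None)
    return u / np.linalg.norm(u)

# Example with a 4-sensor 3-D array and synthetic TDOAs for a source at u_true.
mics = np.array([[0, 0, 0], [0.05, 0, 0], [0, 0.05, 0], [0, 0, 0.05]])
u_true = np.array([1.0, 1.0, 0.5]); u_true /= np.linalg.norm(u_true)
tdoas = -(mics - mics[0]) @ u_true / 343.0
print(doa_from_tdoas(mics, tdoas))                   # approximately u_true
```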

    DOI

    Scopus

    31
    被引用数
    (Scopus)
  • B-11-19 楽音と音声の双方に適用できる客観品質評価法の検討(B-11.コミュニケーションクオリティ,一般セッション)

    三上, 雄一郎, 山田, 武志, 牧野, 昭二, 北脇, 信彦

    電子情報通信学会総合大会講演論文集   2011 ( 2 ) 448  2011年02月

    CiNii

  • B-11-18 雑音抑圧音声の客観品質評価に用いる総合品質推定モデルの改良(B-11.コミュニケーションクオリティ,一般セッション)

    藤田, 悠希, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会講演論文集   2011 ( 2 ) 447 - 447  2011年02月

    CiNii

  • DCTと動きベクトルを極力継承して再量子化雑音を低減するインタレース映像用MPEG-2/H.264再符号化手法(画像・映像処理)

    吉留, 健, 上倉, 一人, 牧野, 昭二, 北脇, 信彦

    電子情報通信学会論文誌. D, 情報・システム   94 ( 2 ) 469 - 480  2011年02月  [査読有り]

     概要を見る

    We propose a method for reducing the quantization noise introduced when an MPEG-2 stream encoding interlaced video is transcoded to H.264, by exploiting the encoding information of the first-stage encoder. The method inherits the MPEG-2 DCT types and motion-compensation types in H.264 as far as possible; in addition, it identifies paired macroblocks that become inheritable once their frame vectors are converted to field vectors, judging them from the combination of DCT type and motion-compensation type, and performs the vector conversion to raise the inheritance rate. Experiments confirmed a PSNR improvement of 0.19 to 0.31 dB over a conventional method that does not use the encoding information.

    CiNii

  • Blind source separation of mixed speech in a high reverberation environment

    Keiju Iso, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Hiroshi Sawada, Takeshi Yamada, Atsushi Nakamura

    2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, HSCMA'11     36 - 39  2011年  [査読有り]

     概要を見る

    Blind source separation (BSS) is a technique for estimating and separating individual source signals from a mixed signal using only the information observed at each sensor. BSS is still being developed for mixed signals that are affected by reverberation. In this paper, we propose combining the reverberation-aware BSS method proposed by Duong et al. with the BSS method reported by Sawada et al., which does not consider reverberation, using the latter to provide the initial setting of the EM algorithm. The proposed method assumes the underdetermined case. In the experiment, we compare the proposed method with the conventional methods reported by Duong et al. and by Sawada et al., and demonstrate the effectiveness of the proposed method. © 2011 IEEE.

    DOI

    Scopus

    7
    被引用数
    (Scopus)
  • Spatial location and sound timbre as informative cues in auditory BCI/BMI - Electrodes position optimization for brain evoked potential enhancement

    Zhenyu Cai, Hiroko Terasawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011   ( Wed-PM.SS4 ) 222 - 227  2011年  [査読有り]

     概要を見る

    The paper introduces a novel auditory BCI/BMI paradigm based on combined sound timbre and horizontal-plane spatial locations as informative cues. The presented concept is based on responses to eight-directional audio stimuli with various tonal and environmental sound stimuli. The approach is based on monitoring brain electrical activity by means of the electroencephalogram (EEG). The spatial auditory stimulus previously developed by the authors is extended to sound stimuli varying in timbre, a feature that helps the subjects attend to the targets. The main achievement discussed in the paper is an offline BCI analysis based on optimizing the electrode locations on the scalp and the evoked-response latency to further improve the classification results. The new BCI paradigm developed in this way is more user-friendly and leads to better results compared with the previously utilized simple tonal or steady-state stimuli.

  • Restoration of Clipped Audio Signal Using Recursive Vector Projection

    Shin Miura, Hirofumi Nakajima, Shigeki Miyabe, Shoji Makino, Takeshi Yamada, Kazuhiro Nakadai

    2011 IEEE REGION 10 CONFERENCE TENCON 2011     394 - 397  2011年  [査読有り]

     概要を見る

    This paper proposes the restoration of clipped signals without prior knowledge. An interval of the signal that includes clipped samples is analyzed by recursive vector projection. By analyzing the samples neighboring the clipped interval and excluding the clipped interval itself from the similarity analysis, an estimate of the signal in the clipped interval is obtained as a by-product of the analysis. Since the estimate is consistent with the neighboring samples, the restored signal does not suffer from click noise. An evaluation of the clipping restoration with various audio signals confirmed that the proposed method improves the signal-to-noise ratio.

  • Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

    Kazuma Takeda, Hirokazu Kameoka, Hiroshi Sawada, Shoko Araki, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2011 IEEE REGION 10 CONFERENCE TENCON 2011     413 - 416  2011年  [査読有り]

     概要を見る

    This paper presents a new method for underdetermined Blind Source Separation (BSS), based on a concept called multichannel complex non-negative matrix factorization (NMF). The method assumes (1) that the time-frequency representations of sources have disjoint support (W-disjoint orthogonality of sources), and (2) that each source is modeled as a superposition of components whose amplitudes vary over time coherently across all frequencies (amplitude coherence of frequency components) in order to jointly solve the indeterminacy involved in the frequency domain underdetermined BSS problem. We confirmed experimentally that the present method performed reasonably well in terms of the signal-to-interference ratio when the mixing process was known.

  • Mora pitch level recognition for the development of a Japanese pitch accent acquisition system

    Greg Short, Keikichi Hirose, Takeshi Yamada, Nobuaki Minematsu, Nobuhiko Kitawaki, Shoji Makino

    Proc. International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques, Oriental COCOSDA 2010     1-6  2010年10月  [査読有り]

  • 雑音抑圧された音声の主観・客観品質評価法

    山田, 武志, 牧野, 昭二, 北脇, 信彦

    情報処理学会研究報告. SLP, 音声言語情報処理   2010 ( 7 ) 1 - 6  2010年10月

     概要を見る

    Suppressing the noise components superimposed on speech is effective for realizing high-quality speech communication in noisy environments. However, while noise suppression reduces the noise level, it also distorts the speech components and leaves residual noise that could not be suppressed. The characteristics of this distortion and residual noise vary with the nature of the noise and of the noise-suppression algorithm, and strongly affect the users' quality of experience. It is therefore essential to establish methods for appropriately evaluating the quality of noise-suppressed speech. This paper describes subjective and objective quality evaluation methods for noise-suppressed speech.

    CiNii

  • A VC-1 to H.264/AVC intra transcoding using encoding information to reduce re-quantization noise

    T. Yoshitome, Y. Nakajima, K. Kamikura, S. Makino, N. Kitawaki

    International Conference on Signal and Image Processing     170-177  2010年08月  [査読有り]

  • BS-5-4 雑音抑圧音声のMOSと単語了解度の客観推定(BS-5.QoE最前線-情報通信サービスにおけるユーザ体感品質-,シンポジウムセッション)

    山田, 武志, 北脇, 信彦, 牧野, 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2010 ( 2 ) - 19  2010年08月

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出(音響信号処理/聴覚/一般)

    加藤, 通朗, 杉本, 侑哉, 牧野, 昭二, 山田, 武志, 北脇, 信彦

    電子情報通信学会技術研究報告. EA, 応用音響   110 ( 171 ) 25 - 30  2010年08月

     概要を見る

    To play back meeting speech archives efficiently, it is important to detect in advance who is speaking, when, and in what manner. This paper proposes a method for automatically detecting signals that are short in duration and temporally intermittent, such as back-channel responses, while distinguishing them from continuous-speech segments and non-speech segments. The proposed method computes a time series of spatial power spectra from meeting speech recorded with a microphone array and applies principal component analysis to it. Intermittent signals produced by multiple talkers and sound sources are then detected on the basis of the principal component scores obtained for each direction. We verified the effectiveness of the proposed method for small meetings in which sparseness of simultaneous utterances can be assumed, such as one interviewer and two participants. The results suggest that continuous speech, back-channels, and non-speech can be detected from the top few principal component scores.
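    A minimal sketch of the analysis idea, principal component analysis of a time series of spatial power over steering directions, is given below; the spatial power matrix is simulated, and the 10-degree grid, component count, and inspection of per-direction loadings are assumptions of this sketch, not the paper's settings.

```python
# Minimal sketch: PCA (via SVD) of a spatial power-spectrum time series, looking at
# which steering directions dominate the leading components (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
T, D = 600, 36                                   # frames x steering directions (10 deg grid)
P = rng.random((T, D)) * 0.1                     # background spatial power (placeholder)
P[:, 9] += 1.0                                   # a continuously talking speaker near 90 deg
P[::50, 27] += 2.0                               # intermittent back-channels near 270 deg

P0 = P - P.mean(axis=0)                          # centre each direction over time
U, s, Vt = np.linalg.svd(P0, full_matrices=False)
loadings = Vt[:2] * s[:2, None]                  # per-direction loadings of the top-2 components

for k, comp in enumerate(loadings):
    peak = np.argmax(np.abs(comp))
    print(f"component {k + 1}: dominant direction {peak * 10} deg")
```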

    CiNii

  • Special Section on Blind Signal Processing and Its Applications

    Shoji Makino, Andrzej Cichocki, Wei Xing Zheng, Aurelio Uncini

    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS   57 ( 7 ) 1401 - 1403  2010年07月  [査読有り]

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • Underdetermined Blind Source Separation Using Acoustic Arrays

    Shoji Makino, Shoko Araki, Stefan Winter, Hiroshi Sawada

    Handbook on Array Processing and Sensor Networks     303 - 341  2010年04月  [査読有り]

    DOI

    Scopus

    8
    被引用数
    (Scopus)
  • B-11-1 IP網における音声の客観品質評価に用いる擬似音声信号の検討(B-11.コミュニケーションクオリティ,一般セッション)

    青島, 千佳, 北脇, 信彦, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会講演論文集   2010 ( 2 ) 435  2010年03月

  • B-11-2 雑音抑圧音声の総合品質推定モデルを適用したフルリファレンス客観品質評価法(B-11.コミュニケーションクオリティ,一般セッション)

    篠原, 佑基, 山田, 武志, 北脇, 信彦, 牧野, 昭二

    電子情報通信学会総合大会講演論文集   2010 ( 2 ) 436  2010年03月

  • MPEG-2/H.264 transcoding with vector conversion reducing re-quantization noise

    Takeshi Yoshitome, Kazuto Kamikura, Shoji Makino, Nobuhiko Kitawaki

    Proceedings - International Conference on Computer Communications and Networks, ICCCN     1-6  2010年  [査読有り]

     概要を見る

    We propose an MPEG-2 to H.264 transcoding method for interlace streams intermingled with frame and field macroblocks. This method uses the encoding information from an MPEG-2 stream and keeps as many DCT coefficients of the original MPEG-2 bitstream as possible. Experimental results show that the proposed method improves PSNR by about 0.19-0.31 dB compared with a conventional method. © 2010 IEEE.

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • Performance Estimation of Noisy Speech Recognition Considering Recognition Task Complexity

    Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4     2042 - 2045  2010年  [査読有り]

     概要を見る

    To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using a spectral distortion measure. However, there is the problem that recognition task complexity affects the relationship between the recognition performance and the distortion value. To solve this problem, this paper proposes a novel performance estimation method considering the recognition task complexity. We confirmed that the proposed method gives accurate estimates of the recognition performance for various recognition tasks by an experiment using noisy speech data recorded in a real room.

  • Comparison of MOS evaluation characteristics for Chinese, Japanese, and English in IP telephony

    Zhenyu Cai, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings     112 - 115  2010年  [査読有り]

     概要を見る

    Communication quality in IP telephony is rated in terms of the Mean Opinion Score (MOS), which is an Absolute Category Rating (ACR) scale. There is a problem when comparing subjectively evaluated MOSs in that the evaluation results are strongly affected by differences in language, the instruction words used for the evaluation, and the nationality of the evaluator. To solve these problems, ITU-T SG12 has started to investigate the cultural and language dependencies of subjective quality evaluations undertaken with the MOS method for speech/video/multimedia. In this paper, we present the results of a comparison of the MOS evaluation characteristics for Chinese, Japanese, and English. ©2010 IEEE.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
  • A study of artificial voices for telephonometry in the IP-based telecommunication networks

    Chika Aoshima, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    Proc. Tunisian-Japan Symposium on Science, Society & Technology    2009年11月  [査読有り]

  • Analysis of standardized speech database by considering long-term average spectrum

    Naoko Okubo, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2009年11月  [査読有り]

  • DOA estimation for multiple sparse sources with arbitrarily arranged multiple sensors

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    Journal of Signal Processing Systems     1 - 11  2009年10月  [査読有り]

    CiNii

  • ブラインド信号処理の技術とその応用論文小特集の発行にあたって

    牧野昭二

    信学論A   J92-A ( 5 ) 275 - 275  2009年05月  [査読有り]

    CiNii

  • Stereo Source Separation and Source Counting with MAP Estimation with Dirichlet Prior Considering Spatial Aliasing Problem

    Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   5441   742 - 750  2009年  [査読有り]

     概要を見る

    In this paper, we propose a novel sparse source separation method that can estimate the number of sources and time-frequency masks simultaneously, even when the spatial aliasing problem exists. Recently, many sparse Source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the phase difference of arrival (PDOA) between microphones with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori (MAP) estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. Moreover, to handle wide microphone spacing cases where the spatial aliasing problem occurs, the indeterminacy of modulus 2 pi k in the phase is also included in our model. Experimental results show good performance of our proposed method.

  • BLIND SPARSE SOURCE SEPARATION FOR UNKNOWN NUMBER OF SOURCES USING GAUSSIAN MIXTURE MODEL FITTING WITH DIRICHLET PRIOR

    Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS     33 - 36  2009年  [査読有り]

     概要を見る

    In this paper, we propose a novel sparse source separation method that can be applied even if the number of sources is unknown. Recently, many sparse source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the histogram of the estimated direction of arrival (DOA) with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. By using this prior, without any specific model selection process, our proposed method can estimate the number of sources and time-frequency masks simultaneously. Experimental results show the performance of our proposed method.
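    The sketch below illustrates the general mechanism of MAP-EM fitting of a one-dimensional GMM to DOA estimates with a Dirichlet prior (alpha < 1) on the mixture weights, so that redundant components shrink and the surviving components indicate the number of sources; the initialization, alpha value, and pruning threshold are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: MAP-EM for a 1-D GMM on frame-wise DOA estimates with a
# sparsity-inducing Dirichlet prior on the mixture weights.
import numpy as np

def map_em_gmm(doa, K=8, alpha=0.6, n_iter=100, prune=1e-3):
    N = doa.size
    mu = np.linspace(doa.min(), doa.max(), K)
    var = np.full(K, np.var(doa) / K)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities under the current Gaussians.
        lik = w * np.exp(-0.5 * (doa[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = lik / np.maximum(lik.sum(axis=1, keepdims=True), 1e-300)
        Nk = r.sum(axis=0)
        # M-step with a Dirichlet(alpha) prior on the weights (MAP estimate);
        # with alpha < 1, weights of surplus components are driven to zero.
        w = np.maximum(Nk + alpha - 1.0, 0.0)
        w /= w.sum()
        mu = (r * doa[:, None]).sum(axis=0) / np.maximum(Nk, 1e-12)
        var = (r * (doa[:, None] - mu) ** 2).sum(axis=0) / np.maximum(Nk, 1e-12) + 1e-6
    keep = w > prune
    return mu[keep], w[keep]

doa = np.concatenate([np.random.normal(-40, 3, 800), np.random.normal(25, 3, 1200)])
means, weights = map_em_gmm(doa)
print(len(means), "sources at", np.round(means, 1), "deg")
```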

  • Handling speaker position changes in a meeting diarization system by combining DOA clustering and speaker identification

    T. Hager, S. Araki, K. Ishizuka, M. Fujimoto, T. Nakatani, and S. Makino

    IWAENC2008     2-12  2008年09月  [査読有り]

    CiNii

  • Foreword to the special section on acoustic scene analysis and reproduction

    S. Makino

    IEICE Trans. Fundamentals   E91-A ( 6 ) 1301-1302  2008年06月  [査読有り]

  • 音源分離技術の最新動向

    澤田宏, 荒木章子, 牧野昭二

    電子情報通信学会誌   91 ( 4 ) 292 - 296  2008年04月  [査読有り]

     概要を見る

    Source separation is an indispensable technology for hands-free speech recognition in real environments and for machine understanding of acoustic scenes. So-called blind processing techniques, which require no prior knowledge such as source positions or speaker characteristics, have advanced greatly over the past decade. This article explains the basic techniques required for blind source separation, such as independent component analysis and sparseness, in an accessible manner, and describes research trends and the current state of the art.

    CiNii

  • A DOA based speaker diarization system for real meetings

    Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS     30 - 33  2008年  [査読有り]

     概要を見る

    This paper presents a speaker diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system utilized the generalized cross correlation method with the phase transform (GCC-PHAT) approach for the DOA estimation. Because the GCC-PHAT can estimate just one DOA per frame, it was difficult to handle speaker overlaps. This paper tries to deal with this issue by employing a DOA at each time-frequency slot (TFDOA), and reports how it improves diarization performance for real meetings / conversations recorded in a room with a reverberation time of 350 ms.
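    For reference, a minimal GCC-PHAT TDOA estimator of the kind such a DOA front end builds on is sketched below; the FFT length and search-range handling are generic choices of this sketch, not the system's actual implementation.

```python
# Minimal sketch: GCC-PHAT time-difference estimation between two microphone signals.
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    n = 2 * max(len(x1), len(x2))
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.maximum(np.abs(cross), 1e-12)        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2 if max_tau is None else min(int(max_tau * fs), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs  # delay of x1 relative to x2 (seconds)
```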

  • Speaker indexing and speech enhancement in real meetings/conversations

    Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12     93 - 96  2008年  [査読有り]

     概要を見る

    This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. Our proposed speaker indexing is realized by using a noise robust voice activity detector (VAD), a GCC-PHAT based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real recorded meetings/conversations recorded in a room with a reverberation time of 350 ms, and evaluates the performance by a standard measure: the diarization error rate (DER). Even for the real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP2008.
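    A minimal sketch of a per-frequency maximum-SNR beamformer is shown below: the weights are taken as the principal generalized eigenvector of the target-active and target-inactive spatial covariance matrices, which the VAD and indexing stages are assumed to have provided; the diagonal loading and the crude reference-channel scaling are assumptions of this sketch (the paper treats the scaling ambiguity properly).

```python
# Minimal sketch: maximum-SNR beamformer weights for one frequency bin from
# target-active and target-inactive STFT frames.
import numpy as np
from scipy.linalg import eigh

def max_snr_weights(X_target, X_noise):
    """X_target, X_noise: STFT frames (M, T) for one frequency bin, taken from
    segments where the target speaker is active / inactive. Returns w (M,)."""
    R_s = X_target @ X_target.conj().T / X_target.shape[1]
    R_n = X_noise @ X_noise.conj().T / X_noise.shape[1]
    R_n += 1e-6 * np.trace(R_n).real / R_n.shape[0] * np.eye(R_n.shape[0])
    eigvals, eigvecs = eigh(R_s, R_n)     # generalized eigenproblem R_s w = lambda R_n w
    w = eigvecs[:, -1]                    # eigenvector with the largest SNR gain
    return w / w[0]                       # crude reference-channel scaling
    # Per-bin beamformer output would then be y[t] = w.conj() @ x_bin[:, t].
```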

  • Missing feature speech recognition in a meeting situation with maximum SNR beamforming

    Dorothea Kolossa, Shoko Araki, Marc Delcroix, Tomohiro Nakatani, Reinhold Orglmeister, Shoji Makino

    PROCEEDINGS OF 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-10     3218 - +  2008年  [査読有り]

     概要を見る

    Especially for tasks like automatic meeting transcription, it would be useful to automatically recognize speech even while multiple speakers are talking simultaneously. For this purpose, speech separation can be performed, for example by using maximum SNR beamforming. However, even when good interferer suppression is attained, the interfering speech will still be recognizable during those intervals where the target speaker is silent. In order to avoid the consequential insertion errors, a new soft masking scheme is proposed, which works in the time domain by inducing a large damping on those temporal periods where the observed direction of arrival does not correspond to that of the target speaker. Even though the masking scheme is aggressive, the recognition accuracy can be improved significantly by means of missing-feature recognition, with relative error reductions on the order of 60% compared with maximum SNR beamforming alone, and the scheme is also successful for three simultaneously active speakers. Results are reported based on the SOLON speech recognizer, NTT's large vocabulary system [1], which is applied here to the recognition of artificially mixed data using real-room impulse responses and the entire clean test set of the Aurora 2 database.

  • Guest editors' introduction: Special section on emergent systems, algorithms, and architectures for speech-based human-machine interaction

    Rodrigo Capobianco Guido, Li Deng, Shoji Makino

    IEEE TRANSACTIONS ON COMPUTERS   56 ( 9 ) 1153 - 1155  2007年09月  [査読有り]

  • Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    SIGNAL PROCESSING   87 ( 8 ) 1833 - 1847  2007年08月  [査読有り]

     概要を見る

    This paper presents a new method for blind sparse source separation. Some sparse source separation methods, which. rely on source sparseness and an anechoic mixing model, have already been proposed. These methods utilize level ratios and phase differences between sensor observations as their features, and they separate signals by classifying them. However, some of the features cannot form clusters with a well-known clustering algorithm, e.g., the k-means. Moreover, most previous methods utilize a linear sensor array (or only two sensors), and therefore they cannot separate symmetrically positioned sources. To overcome such problems, we propose a new feature that can be clustered by the k-means algorithm and that can be easily applied to more than three sensors arranged non-linearly. We have obtained promising results for two- and three-dimensionally distributed speech separation with non-linear/non-uniform sensor arrays in a real room even in underdetermined situations. We also investigate the way in which the performance of such methods is affected by room reverberation, which may cause the sparseness and anechoic assumptions to collapse. (C) 2007 Elsevier B.V. All rights reserved.
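    The sketch below shows the generic shape of this family of methods: level-ratio and frequency-normalized phase-difference features are clustered with k-means and used to build time-frequency binary masks. The feature normalization here is a generic stand-in, not the specific feature proposed in the paper, and scikit-learn is an assumption of the sketch.

```python
# Minimal sketch: underdetermined separation by k-means clustering of level-ratio /
# phase-difference features and time-frequency binary masking.
import numpy as np
from sklearn.cluster import KMeans

def sparse_binary_mask_bss(X, n_src, fs, n_fft):
    """X: mixture STFT, shape (M, F, T). Returns masked STFTs (n_src, F, T)
    for reference channel 0."""
    M, F, T = X.shape
    ref = X[0]
    feats = []
    for m in range(1, M):
        ratio = np.abs(X[m]) / np.maximum(np.abs(ref), 1e-12)        # level ratio
        freqs = np.maximum(np.arange(F), 1)[:, None] * fs / n_fft
        phase = np.angle(X[m] * np.conj(ref)) / (2 * np.pi * freqs)  # normalized phase diff
        feats += [ratio.ravel(), phase.ravel()]
    feat = np.stack(feats, axis=1)
    labels = KMeans(n_clusters=n_src, n_init=10).fit_predict(feat).reshape(F, T)
    return np.stack([np.where(labels == k, ref, 0.0) for k in range(n_src)])
```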

    DOI CiNii

    Scopus

    228
    被引用数
    (Scopus)
  • Introduction to the special section on blind signal processing for speech and audio applications

    Shoji Makino, Te-Won Lee, Guy J. Brown

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1509 - 1510  2007年07月  [査読有り]

    DOI

    Scopus

    2
    被引用数
    (Scopus)
  • MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

    Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

    Eurasip Journal on Advances in Signal Processing   2007  2007年  [査読有り]

     概要を見る

    We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures,we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.

    DOI

    Scopus

    93
    被引用数
    (Scopus)
  • Blind audio source separation based on independent component analysis

    Shoji Makino, Hiroshi Sawada, Shoko Araki

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   4666   843 - 843  2007年  [査読有り]

  • Blind source separation based on a beamformer array and time frequency binary masking

    Jan Cermak, Shoko Araki, Hiroshi Sawada, Shoji Makino

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS     145 - 148  2007年  [査読有り]

     概要を見る

    This paper deals with a new technique for blind source separation (BSS) from convolutive mixtures. We present a three-stage separation system employing time-frequency binary masking, beamforming and a non-linear post processing technique. The experiments show that this system outperforms conventional time-frequency binary masking (TFBM) in both (over-)determined and underdetermined cases. Moreover it removes the musical noise and reduces interference in time-frequency slots extracted by TFBM.

  • MLSP 2007 data analysis competition: Frequency-domain blind source separation for convolutive mixtures of speech/audio signals

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP     45 - 50  2007年  [査読有り]

     概要を見る

    This paper describes the frequency-domain approach to the blind source separation of speech/audio signals that are convolutively mixed in a real room environment. With the application of short-time Fourier transforms, convolutive mixtures in the time domain can be approximated as multiple instantaneous mixtures in the frequency domain. We employ complex-valued independent component analysis (ICA) to separate the mixtures in each frequency bin. Then, the permutation ambiguity of the ICA solutions should be aligned so that the separated signals are constructed properly in the time domain. We propose a permutation alignment method based on clustering the activity sequences of the frequency-bin-wise separated signals. We became the overall winner of the MLSP 2007 Data Analysis Competition with the presented method. ©2007 IEEE.
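    As a simplified stand-in for the clustering-based alignment described above, the sketch below aligns permutations greedily across frequency bins by correlating per-bin activity (power-envelope) sequences with running centroids; the greedy sweep and the centroid update factor are assumptions of this sketch.

```python
# Minimal sketch: greedy permutation alignment across frequency bins by correlating
# per-bin activity sequences with running centroids.
import numpy as np
from itertools import permutations

def align_permutations(Y):
    """Y: bin-wise separated STFTs, shape (F, N, T), arbitrary source order per bin.
    Returns Y with a consistent source order across bins."""
    F, N, T = Y.shape
    env = np.abs(Y)                                    # activity sequence per bin and output
    env /= np.maximum(env.std(axis=2, keepdims=True), 1e-12)
    centroids = env[0].copy()                          # initialize with the first bin
    out = Y.copy()
    for f in range(1, F):
        best, best_score = tuple(range(N)), -np.inf
        for perm in permutations(range(N)):            # try every source ordering
            score = sum(np.corrcoef(centroids[n], env[f, p])[0, 1]
                        for n, p in enumerate(perm))
            if score > best_score:
                best, best_score = perm, score
        out[f] = Y[f, list(best)]
        centroids = 0.9 * centroids + 0.1 * env[f, list(best)]
    return out
```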

    DOI

    Scopus

    16
    被引用数
    (Scopus)
  • A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS     157 - 160  2007年  [査読有り]

     概要を見る

    This paper proposes a two-stage method for the blind separation of convolutively mixed sources. We employ time-frequency masking, which can be applied even to an underdetermined case where the number of sensors is insufficient for the number of sources. In the first stage of the method, frequency bin-wise mixtures are classified based on Gaussian mixture model fitting. In the second stage, the permutation ambiguities of the bin-wise classified signals are aligned by clustering the posterior probability sequences calculated in the first stage. Experimental results for separating four speeches with three microphones under reverberant conditions show the superiority of the proposed method over existing methods based on time-difference-of-arrival estimations or signal envelope clustering.

  • Measuring dependence of bin-wise separated signals for permutation alignment in frequency-domain BSS

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     3247 - 3250  2007年  [査読有り]

     概要を見る

    This paper presents a new method for grouping bin-wise separated signals for individual sources, i.e., solving the permutation problem, in the process of frequency-domain blind source separation. Conventionally, the correlation coefficient of separated signal envelopes is calculated to judge whether or not the separated signals originate from the same source. In this paper, we propose a new measure that represents the dominance of the separated signal in the mixtures, and use it for calculating the correlation coefficient, instead of a signal envelope. Such dominance measures exhibit dependence/independence more clearly than traditionally used signal envelopes. Consequently, a simple clustering algorithm with centroids works well for grouping separated signals. Experimental results were very appealing, as three sources including two coming from the same direction were separated properly with the new method.

  • Blind speech separation in a meeting situation with maximum SNR beamformers

    Shoko Araki, Hiroshi Sawada, Shoji Makino

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS     41 - 44  2007年  [査読有り]

     概要を見る

    We propose a speech separation method for a meeting situation, where each speaker speaks only some of the time and the number of active speakers changes from moment to moment. Many source separation methods have already been proposed; however, they consider a case where all the speakers keep speaking, which is not always true in a real meeting. In such cases, in addition to separation, speech detection and the classification of the detected speech according to speaker become important issues. For that purpose, we propose a method that employs a maximum signal-to-noise-ratio (MaxSNR) beamformer combined with a voice activity detector and online clustering. We also discuss the scaling ambiguity problem of the MaxSNR beamformer and provide solutions. We report some encouraging results for a real meeting in a room with a reverberation time of about 350 ms.

  • First stereo audio source separation evaluation campaign: Data, algorithms and results

    Emmanuel Vincent, Hiroshi Sawada, Pau Bofill, Shoji Makino, Justinian P. Rosca

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   4666   552 - +  2007年  [査読有り]

     概要を見る

    This article provides an overview of the first stereo audio source separation evaluation campaign, organized by the authors. Fifteen underdetermined stereo source separation algorithms have been applied to various audio data, including instantaneous, convolutive and real mixtures of speech or music sources. The data and the algorithms are presented and the estimated source signals are compared to reference signals using several objective performance criteria.

  • MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

    Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING   2007 ( 24717 ) 1 - 12  2007年  [査読有り]

     概要を見る

    We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions. Copyright (C) 2007 Stefan Winter et al.

    DOI

    Scopus

    93
    被引用数
    (Scopus)
  • Frequency domain blind source separation in a noisy environment

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    2006 Joint meeting of ASA and ASJ     1pSP1  2006年11月  [査読有り]

  • Normalized observation vector clustering approach for sparse source separation

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    EUSIPCO2006     Wed.5.4.4  2006年09月  [査読有り]

  • Underdetermined source separation by ICA and homomorphic signal processing

    S. Winter, W. Kellermann, H. Sawada, S. Makino

    IWAENC2006     Wed.Sep.8  2006年09月  [査読有り]

  • Performance evaluation of sparse source separation and DOA estimation with observation vector clustering in reverberant environments

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    IWAENC2006     Tue.Sep.4  2006年09月  [査読有り]

  • Blind sparse source separation with spatially smoothed time-frequency masking

    S. Araki, H. Sawada, R. Mukai, S. Makino

    IWAENC2006     Wed.Sep.9  2006年09月  [査読有り]

    CiNii

  • Parametric-Pearson-based independent component analysis for frequency-domain blind speech separation

    H. Kato, Y. Nagahara, S. Araki, H. Sawada, and S. Makino

    EUSIPCO2006     Tue.4.2.5  2006年09月  [査読有り]

  • Blind speech separation by combining beamformers and a time frequency binary mask

    J. Cermak, S. Araki, H. Sawada, S. Makino

    IWAENC2006     Tue.Sep.5 - 148  2006年09月  [査読有り]

    CiNii

  • Underdetermined source separation for colored sources

    S. Winter, W. Kellermann, H. Sawada, S. Makino

    EUSIPCO2006     Thu.3.1.6  2006年09月  [査読有り]

  • Musical noise reduction in time-frequency-binary-masking-based blind source separation systems

    J. Cermak, S. Araki, H. Sawada, and S. Makino

    Czech-German Workshop on Speech Processing    2006年09月  [査読有り]

  • Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector

    S Emura, Y Haneda, A Kataoka, S Makino

    SIGNAL PROCESSING   86 ( 6 ) 1157 - 1167  2006年06月  [査読有り]

     概要を見る

    Stereo echo cancellation requires a fast converging adaptive algorithm because the stereo input signals are highly cross correlated and the convergence rate of the misalignment is slow even after preprocessing for unique identification of stereo echo paths. To speed up the convergence, we propose enhancing the contribution of the decorrelated components in the preprocessed input-signal vector to adaptive updates. The adaptive filter coefficients are updated on the basis of either a single or multiple past enhanced input-signal vectors.
    For a single-vector update, we show how this enhancement improves the convergence rate by analyzing the behavior of the filter coefficient error in the mean. For a two-past-vector update, simulation showed that the proposed enhancement leads to a faster decrease in misalignment than the corresponding conventional second-order affine projection algorithm while computational complexities are almost the same. (c) 2005 Elsevier B.V. All rights reserved.

    DOI

    Scopus

    18
    被引用数
    (Scopus)
  • Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     4935 - 4938  2006年  [査読有り]

     概要を見る

    This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.

  • DOA estimation for multiple sparse sources with normalized observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     33 - +  2006年  [査読有り]

     概要を見る

    This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied, however, they are only applicable when M > N. Another conventional independent component analysis based method allows M > N, however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).
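
    The clustering idea in the entry above can be illustrated with a deliberately simplified two-microphone variant: each time-frequency point is mapped to a DOA candidate through its inter-channel phase difference and the candidates are clustered. The Python/NumPy sketch below makes several assumptions (a single microphone pair with a hypothetical 4 cm spacing, a tiny 1-D k-means, frequencies below the spatial-aliasing limit) and is not the authors' normalized-observation-vector method itself.

    import numpy as np

    def doa_by_phase_clustering(X1, X2, freqs, d=0.04, c=340.0, n_src=3, iters=50):
        # X1, X2: (F, T) STFT coefficients of two microphones; freqs: (F,) in Hz.
        # Map each time-frequency point to a DOA candidate and cluster them.
        phase = np.angle(X2 / (X1 + 1e-12))                      # (F, T)
        valid = (freqs > 100) & (freqs < c / (2 * d))            # avoid aliasing
        ratio = phase[valid] * c / (2 * np.pi * freqs[valid, None] * d)
        theta = np.degrees(np.arccos(np.clip(ratio, -1, 1))).ravel()

        centers = np.linspace(30.0, 150.0, n_src)                # initial DOAs
        for _ in range(iters):                                   # tiny 1-D k-means
            labels = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
            centers = np.array([theta[labels == k].mean() if np.any(labels == k)
                                else centers[k] for k in range(n_src)])
        return np.sort(centers)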

  • Blind source separation of many signals in the frequency domain

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     5827 - 5830  2006年  [査読有り]

     概要を見る

    This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.

  • Frequency domain blind source separation of a reduced amount of data using frequency normalization

    Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     5695 - 5698  2006年  [査読有り]

     概要を見る

    The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.

  • Blind source separation of convolutive mixtures - art. no. 624709

    Shoji Makino

    Independent Component Analyses, Wavelets, Unsupervised Smart Sensors, and Neural Networks IV   6247 ( 7 ) 24709 - 24709  2006年  [査読有り]

     概要を見る

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency-domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct separation filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

    DOI

    Scopus

    4
    被引用数
    (Scopus)
  • Geometrical interpretation of the PCA subspace approach for overdetermined blind source separation

    S. Winter, H. Sawada, S. Makino

    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING   2006 ( 71632 ) 1-11  2006年  [査読有り]

     概要を見る

    We discuss approaches for blind source separation where we can use more sensors than sources to obtain a better performance. The discussion focuses mainly on reducing the dimensions of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second is based on geometric considerations and selects a subset of sensors in accordance with the fact that a low frequency prefers a wide spacing, and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in the way that it emphasizes the outer sensors and yields superior results for high frequencies. These results provide a better understanding of the former method.

    DOI

    Scopus

    12
    被引用数
    (Scopus)
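
    A minimal sketch of the PCA-based dimension reduction compared in the entry above, written in Python/NumPy for one frequency bin: the M sensor channels are projected onto the leading eigenvectors of the spatial covariance matrix before ICA. Variable names are hypothetical and the geometry-based alternative is not shown.

    import numpy as np

    def pca_reduce(Xf, n_src):
        # Xf: (M, T) complex observations of one frequency bin, M > n_src.
        # Project onto the n_src leading eigenvectors of the spatial covariance
        # (whitening included), discarding the minor, noise-dominated components.
        Xc = Xf - Xf.mean(axis=1, keepdims=True)
        R = (Xc @ Xc.conj().T) / Xf.shape[1]           # spatial covariance
        eigval, eigvec = np.linalg.eigh(R)             # ascending eigenvalues
        lead_val = eigval[::-1][:n_src]
        lead_vec = eigvec[:, ::-1][:, :n_src]
        V = (lead_vec / np.sqrt(lead_val)).conj().T    # (n_src, M) projection
        return V @ Xc, V                               # reduced signals for ICA
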
  • Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     77 - +  2006年  [査読有り]

     概要を見る

    This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.

  • Frequency domain blind source separation of a reduced amount of data using frequency normalization

    Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     837 - +  2006年  [査読有り]

     概要を見る

    The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.

  • Underdetermined sparse source separation of convolutive mixtures with observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS     3594 - 3597  2006年  [査読有り]

     概要を見る

    We propose a new method for solving the underdetermined sparse signal separation problem. Some sparseness based methods have already been proposed. However, most of these methods utilized a linear sensor array (or only two sensors), and therefore they have certain limitations; e.g., they cannot separate symmetrically positioned sources. To allow the use of more than three sensors that can be arranged in a non-linear/non-uniform way, we propose a new method that includes the normalization and clustering of the observation vectors. Our proposed method can handle both underdetermined and (over-)determined cases. We show practical results for speech separation with non-linear/non-uniform sensor arrangements. We obtained promising experimental results for the cases of 3 x 4, 4 x 5 (#sensors x #sources) in a room (RT60 = 120 ms).

  • DOA estimation for multiple sparse sources with normalized observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     4891 - 4894  2006年  [査読有り]

     概要を見る

    This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied, however, they are only applicable when M > N. Another conventional independent component analysis based method allows M > N, however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).

  • On calculating the inverse of separation matrix in frequency-domain blind source separation

    H Sawada, S Araki, R Mukai, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION, PROCEEDINGS   3889   691 - 699  2006年  [査読有り]

     概要を見る

    For blind source separation (BSS) of convolutive mixtures, the frequency-domain approach is efficient and practical, because the convolutive mixtures are modeled with instantaneous mixtures at each frequency bin and simple instantaneous independent component analysis (ICA) can be employed to separate the mixtures. However, the permutation and scaling ambiguities of ICA solutions need to be aligned to obtain proper time-domain separated signals. This paper discusses the idea that calculating the inverses of separation matrices obtained by ICA is very important as regards aligning these ambiguities. This paper also shows the relationship between the ICA-based method and the time-frequency masking method for BSS, which becomes clear by calculating the inverses.
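
    The role of the inverse separation matrix discussed above can be made concrete with the standard projection-back scaling for the square (determined) case; the following Python/NumPy lines are a generic illustration, not the paper's derivation.

    import numpy as np

    def fix_scaling_with_inverse(W, ref_mic=0):
        # W: (N, N) separation matrix of one frequency bin (determined case).
        # Row i of the result reproduces source i as observed at `ref_mic`.
        A = np.linalg.inv(W)                 # estimated mixing matrix
        return np.diag(A[ref_mic, :]) @ W    # per-output scaling (projection back)

    # Column i of A is also the estimated steering vector of source i, which is
    # what links the ICA-based method to time-frequency masking in the entry above.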

  • Blind source separation of many signals in the frequency domain

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     969 - +  2006年  [査読有り]

     概要を見る

    This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.

  • Recognition of convolutive speech mixtures by missing feature techniques for ICA

    Dorothea Kolossa, Hiroshi Sawada, Ramon Fernandez Astudillo, Reinhold Orglmeister, Shoji Makino

    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5     1397 - +  2006年  [査読有り]

     概要を見る

    One challenging problem for robust speech recognition is the cocktail party effect, where multiple speaker signals are active simultaneously in an overlapping frequency range. In that case, independent component analysis (ICA) can separate the signals in reverberant environments, also. However, incurred feature distortions prove detrimental for speech recognition. To reduce consequential recognition errors, we describe the use of ICA for the additional estimation of uncertainty information. This information is subsequently used in missing feature speech recognition, which leads to far more correct and accurate recognition also in reverberant situations at RT60 = 300ms.

  • Blind separation and localization of speeches in a meeting situation

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5     1407 - +  2006年  [査読有り]

     概要を見る

    The technique of blind source separation (BSS) has been well studied. In this paper, we apply the BSS technique, particularly based on independent component analysis (ICA), to a meeting situation. The goal is to enhance the spoken utterances and to estimate the location of each speaker by means of multiple microphones. The technique may help us to take the minutes of a meeting.

  • Frequency-domain blind source separation of many speech signals using near-field and far-field models

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING   2006 ( 83683 ) 1 - 13  2006年  [査読有り]

     概要を見る

    We discuss the frequency-domain blind source separation (BSS) of convolutive mixtures when the number of source signals is large, and the potential source locations are omnidirectional. The most critical problem related to the frequency-domain BSS is the permutation problem, and geometric information is helpful as regards solving it. In this paper, we propose a method for obtaining proper geometric information with which to solve the permutation problem when the number of source signals is large and some of the signals come from the same or a similar direction. First, we describe a method for estimating the absolute DOA by using relative DOAs obtained by the solution provided by independent component analysis (ICA) and the far-field model. Next, we propose a method for estimating the spheres on which source signals exist by using ICA solution and the near-field model. We also address another problem with regard to frequency-domain BSS that arises from the circularity of discrete-frequency representation. We discuss the characteristics of the problem and present a solution for solving it. Experimental results using eight microphones in a room show that the proposed method can separate a mixture of six speech signals arriving from various directions, even when two of them come from the same direction. Copyright (C) 2006 Ryo Mukai et al.

    DOI

    Scopus

    26
    被引用数
    (Scopus)
  • Subband-based blind separation for convolutive mixtures of speech

    S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 12 ) 3593 - 3603  2005年12月  [査読有り]

     概要を見る

    We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.

    DOI

    Scopus

    21
    被引用数
    (Scopus)
  • Underdetermined blind separation for speech in real environments with F0 adaptive comb filtering

    F. Flego, S. Araki, H. Sawada, T. Nakatani, and S. Makino

    IWAENC2005     93-96  2005年09月  [査読有り]

  • Real-time blind source separation and DOA estimation using small 3-D microphone array

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    IWAENC2005     45-48  2005年09月  [査読有り]

  • Real-time blind extraction of dominant target sources from many background interference sources

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    IWAENC2005     73-76  2005年09月  [査読有り]

    CiNii

  • A novel blind source separation method with observation vector clustering

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    IWAENC2005     117-120  2005年09月  [査読有り]

  • Blind source separation of convolutive mixtures of audio signals in frequency domain

    S. Makino

    Advances in Circuits and Systems   ( 5 )  2005年08月  [査読有り]

  • Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation

    A Blin, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 7 ) 1693 - 1700  2005年07月  [査読有り]

     概要を見る

    This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberation time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is greatly reduced by the proposed method compared with the classical approaches.

    DOI

    Scopus

    19
    被引用数
    (Scopus)
  • Blind source separation of convolutive mixtures of speech in frequency domain

    S Makino, H Sawada, R Mukai, S Araki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 7 ) 1640 - 1655  2005年07月  [査読有り]  [招待有り]

     概要を見る

    This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results of 2 x 2, 3 x 3, 4 x 4, 6 x 8, and 2 x 2 (moving sources), (#sources x #microphones) in a room are promising.

    DOI

    Scopus

    57
    被引用数
    (Scopus)
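
    To make the overall flow described above concrete, here is a compact Python/NumPy skeleton of frequency-domain BSS for the determined case: an STFT per microphone, complex-valued ICA in each bin with a simple unit-modulus nonlinearity, and projection-back scaling. The permutation alignment and the circularity treatment that the paper actually solves are deliberately left as a placeholder, and all parameters are illustrative assumptions, not the authors' settings.

    import numpy as np

    def stft(x, nfft=1024, hop=256):
        # Hann-windowed STFT, shape (nfft // 2 + 1, n_frames).
        win = np.hanning(nfft)
        frames = [x[i:i + nfft] * win for i in range(0, len(x) - nfft, hop)]
        return np.fft.rfft(np.array(frames), axis=1).T

    def per_bin_ica(Xf, iters=200, mu=0.1):
        # Natural-gradient complex ICA in one bin with a unit-modulus nonlinearity.
        N, T = Xf.shape
        W = np.eye(N, dtype=complex)
        for _ in range(iters):
            Y = W @ Xf
            phi = Y / (np.abs(Y) + 1e-9)
            W += mu * (np.eye(N) - (phi @ Y.conj().T) / T) @ W
        return W

    def fd_bss(x_mics, nfft=1024, hop=256):
        # Determined case: as many sources as microphones.
        X = np.array([stft(x, nfft, hop) for x in x_mics])   # (M, F, T)
        M, F, T = X.shape
        Y = np.zeros_like(X)
        for f in range(F):
            W = per_bin_ica(X[:, f, :])
            A = np.linalg.inv(W)
            W = np.diag(A[0, :]) @ W       # scaling fix (projection back)
            Y[:, f, :] = W @ X[:, f, :]
            # TODO: align the source order of this bin with its neighbours
            # (the permutation problem solved in the papers of this list).
        return Y                            # separated spectrograms
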
  • Frequency-domain blind source separation without array geometry information

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    HSCMA2005     d13-d14  2005年03月  [査読有り]

  • Blind source separation and DOA estimation using small 3-D microphone array

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    HSCMA2005 (Joint Workshop on Hands-Free Speech Communication and Microphone Arrays)     d9-d10  2005年03月  [査読有り]

  • Source extraction from speech mixtures with null-directivity pattern based mask

    S. Araki, S. Makino, H. Sawada, and R. Mukai

    HSCMA2005     d1-d2  2005年03月  [査読有り]

  • Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    Proceedings - IEEE International Symposium on Circuits and Systems     5882 - 5885  2005年  [査読有り]

     概要を見る

    This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method. © 2005 IEEE.

    DOI

    Scopus

    12
    被引用数
    (Scopus)
  • Blind extraction of a dominant source signal from mixtures of many sources

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   III   III61 - III64  2005年  [査読有り]

     概要を見る

    This paper presents a method for enhancing a dominant target source that is close to sensors, and suppressing other interferences. The enhancement is performed blindly, i.e. without knowing the number of total sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage processing technique where a spatial filter is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. To obtain the spatial filter we employ independent component analysis and then select the component of the target source. Time-frequency masks in the second stage are obtained by calculating the angle between the basis vector corresponding to the target source and a sample vector. The experimental results for a simulated cocktail party situation were very encouraging. ©2005 IEEE.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
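
    The time-frequency masks in the two blind-extraction entries above are specified from the angle between the target's basis vector and each observation vector. A minimal per-bin Python/NumPy sketch of that criterion follows; the threshold and the names are assumptions, and the ICA stage that yields the basis vector is omitted.

    import numpy as np

    def angle_mask(b_target, Xf, theta_deg=30.0):
        # b_target: (M,) basis vector of the target source (a column of W^-1).
        # Xf: (M, T) observation vectors of one bin.  Keep the points whose
        # observation vector is within theta_deg of the target basis vector.
        b = b_target / np.linalg.norm(b_target)
        cosang = np.abs(b.conj() @ Xf) / (np.linalg.norm(Xf, axis=0) + 1e-12)
        return cosang >= np.cos(np.radians(theta_deg))    # boolean mask, (T,)
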
  • Multiple source localization using independent component analysis

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    IEEE Antennas and Propagation Society, AP-S International Symposium (Digest)   4 ( P3 ) 81 - 84  2005年  [査読有り]

     概要を見る

    This paper presents a method for estimating location information about multiple sources. The proposed method uses independent component analysis (ICA) as a main statistical tool. The nearfield model as well as the farfield model can be assumed in this method. As an application of the method, we show experimental results for the direction-of-arrival (DOA) estimation of three sources that were positioned 3-dimensionally. © 2005 IEEE.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
  • Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask

    S Araki, S Makino, H Sawada, R Mukai

    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   III   81 - 84  2005年  [査読有り]

     概要を見る

    Musical noise is a typical problem with blind source separation using a time-frequency mask. In this paper, we report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by the results of a listening test undertaken in a room with a reverberation time of RT60 = 130 ms.

  • Estimating the number of sources using independent component analysis

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    Acoustical Science and Technology   26 ( 5 ) 450 - 452  2005年  [査読有り]

     概要を見る

    A new approach for estimating the number of sources that employs independent component analysis (ICA) is discussed. Estimating the number of sources provides information for signal processing applications such as blind source separation (BSS) in the frequency domain. The new method can identify a noise component that includes reverberations by calculating the correlation of the envelopes. The results show that the characteristics of the proposed approach compare with the conventional eigenvalue-based method.

    DOI

    Scopus

    15
    被引用数
    (Scopus)
  • Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

    H Sawada, S Araki, R Mukai, S Makino

    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS   III   5882 - 5885  2005年  [査読有り]

     概要を見る

    This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method.

  • A spatio-temporal fastica algorithm for separating convolutive mixtures

    SC Douglas, H Sawada, S Makino

    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   V   165 - 168  2005年  [査読有り]

     概要を見る

    This paper presents a spatio-temporal extension of the well-known fastICA algorithm of Hyvarinen and Oja that is applicable to both convolutive blind source separation and multichannel blind deconvolution tasks. Our time-domain algorithm combines multichannel spatio-temporal prewhitening via multi-stage least-squares linear prediction with a fixed-point iteration involving a new adaptive technique for imposing paraunitary constraints on the multichannel separation filter. Our technique also allows for efficient reconstruction of individual signals as observed in the sensor measurements for single-input, multiple-output (SIMO) BSS tasks. Analysis and simulations verify the utility of the proposed methods.

  • Blind Source Separation of 3-D located many speech signals

    R Mukai, H Sawada, S Araki, S Makino

    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)     9 - 12  2005年  [査読有り]

     概要を見る

    This paper presents a prototype system for Blind Source Separation (BSS) of many speech signals and describes the techniques used in the system. Our system uses 8 microphones located at the vertexes of a 4 cm x 4 cm x 4 cm cube and has the ability to separate signals distributed in three-dimensional space. The mixed signals observed by the microphone array are processed by Independent Component Analysis (ICA) in the frequency domain and separated into a given number of signals (up to 8). We carried out experiments in an ordinary office and obtained more than 20 dB of SIR improvement.

  • On real and complex valued l(1)-norm minimization for overcomplete blind source separation

    S Winter, H Sawada, S Makino

    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)     86 - 89  2005年  [査読有り]

     概要を見る

    A maximum a-posteriori approach for overcomplete blind source separation based on Laplacian priors usually involves l(1)-norm minimization. It requires different approaches for real and complex numbers as they appear, for example, in the frequency domain. In this paper we compare a combinatorial approach for real numbers with a second-order cone programming approach for complex numbers.
    Although the combinatorial solution with a proven minimum number of zeros is not theoretically justified for complex numbers, its performance quality is comparable to the performance of the second-order cone programming (SOCP) solution. However, it has the advantage that it is faster for complex overcomplete BSS problems with low input/output dimensions.

  • Hierarchical clustering applied to overcomplete BSS for convolutive mixtures

    S. Winter, H. Sawada, S. Araki, and S. Makino

    SAPA2004 (ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing)   1 ( 3 ) 1-6  2004年10月  [査読有り]

  • Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

    S. Araki, S. Makino, H. Sawada, and R. Mukai

    EUSIPCO2004     1991-1994  2004年09月  [査読有り]

  • Blind source separation for moving speech signals using blockwise ICA and residual crosstalk subtraction

    R Mukai, H Sawada, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 8 ) 1941 - 1948  2004年08月  [査読有り]

     概要を見る

    This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.
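
    The second-stage postprocessing described above subtracts an estimate of the residual crosstalk power from each separated output. The Python/NumPy sketch below shows the generic spectral-subtraction gain only, using the other ICA outputs as a crude proxy for the crosstalk estimate; the paper's explicit crosstalk component estimation and blockwise batch ICA are not reproduced.

    import numpy as np

    def subtract_residual_crosstalk(Y_target, Y_others, alpha=1.0, floor=0.05):
        # Y_target: (F, T) spectrogram of one separated output.
        # Y_others: (K, F, T) spectrograms of the remaining outputs, used here
        # as a crude stand-in for the estimated residual crosstalk power.
        noise_pow = alpha * np.sum(np.abs(Y_others) ** 2, axis=0)
        target_pow = np.abs(Y_target) ** 2
        gain = np.sqrt(np.maximum(1.0 - noise_pow / (target_pow + 1e-12),
                                  floor ** 2))
        return gain * Y_target              # keep the phase of the ICA output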

  • Convolutive blind source separation for more than two sources in the frequency domain

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    Acoustical Science and Technology   25 ( 4 ) 296 - 298  2004年07月  [査読有り]

     概要を見る

    The use of the blind source separation (BSS) technique for the recovery of more than two sources in the frequency domain was presented. It was found that the frequency-domain BSS method was practically applicable to more than two sources by overcoming the problems of permutation and circularity. The minimization of error could be done by adjusting the scaling ambiguity of the independent component analysis (ICA) solution before windowing. The results show the effectiveness and efficiency of the BSS method and the separation of six sources with a planar array of eight sensors.

    DOI

    Scopus

    4
    被引用数
    (Scopus)
  • Underdetermined blind source separation for convolutive mixtures exploiting a sparseness-mixing matrix estimation (SMME)

    A. Blin, S. Araki, and S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3139-3142  2004年04月  [査読有り]

  • A causal frequency-domain implementation of a natural gradient multichannel blind deconvolution and source separation algorithm

    S. Douglas, H. Sawada, and S. Makino

    ICA2004 (International Congress on Acoustics)   I   85-88  2004年04月  [査読有り]

  • Solving the permutation and circularity problems of frequency-domain blind source separation

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    ICA2004 (International Congress on Acoustics)   I   89-92  2004年04月  [査読有り]

  • Algorithmic complexity based blind source separation for convolutive speech mixtures

    S. de la Kethulle, R. Mukai, H. Sawada, and S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3127-3130  2004年04月  [査読有り]

  • A solution for the permutation problem in frequency domain BSS using near- and far-field models

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3135-3138  2004年04月  [査読有り]

  • Underdetermined blind separation of convolutive mixtures of speech by combining time-frequency masks and ICA

    S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada

    ICA2004 (International Congress on Acoustics)   I   321-324  2004年04月  [査読有り]

  • Evaluation of separation and dereverberation performance in frequency domain blind source separation

    Ryo Mukai, Shoko Araki, Hiroshi Sawada, Shoji Makino

    Acoustical Science and Technology   25 ( 2 ) 119 - 126  2004年03月  [査読有り]

     概要を見る

    In this paper, we propose a new method for evaluating the separation and dereverberation performance of a convolutive blind source separation (BSS) system, and investigate a separating system obtained by employing frequency domain BSS based on independent component analysis (ICA). As a result, we reveal the acoustical characteristics of the frequency domain BSS for convolutive mixture of speech signals. We show that the separating system removes the direct sound of a jammer signal even when the frame length is relatively short, and it also reduces the reverberation of the jammer according to the frame length. We also confirm that the reverberation of the target is not reduced. Moreover, we propose a technique, suggested by the experimental results, for improving the quality of the separated signals by removing pre-echo noise.

    DOI

    Scopus

    9
    被引用数
    (Scopus)
  • Underdetermined blind separation for speech in real environments with sparseness and ICA

    S Araki, S Makino, A Blin, R Mukai, H Sawada

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   III   881 - 884  2004年  [査読有り]

     概要を見る

    In this paper, we propose a method for separating speech signals when there are more signals than sensors. Several methods have already been proposed for solving the underdetermined problem, and some of these utilize the sparseness of speech signals. These methods employ binary masks to extract the signals, and therefore, their extracted signals contain loud musical noise. To overcome this problem, we propose combining a sparseness approach and independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion even in reverberant conditions of T-R=130 and 200 ms.

  • Frequency domain blind source separation using small and large spacing sensor pairs

    R Mukai, H Sawada, S Araki, S Makino

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS   V   1 - 4  2004年  [査読有り]

     概要を見る

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometrical information for solving the permutation problem. Experimental results show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.

  • Convolutive blind source separation for more than two sources in the frequency domain

    H Sawada, R Mukai, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   III   885 - 888  2004年  [査読有り]

     概要を見る

    Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.

  • Audio source separation based on independent component analysis

    S Makino, S Araki, R Mukai, H Sawada

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS   V   668 - 671  2004年  [査読有り]

     概要を見る

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct unmixing filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

  • Near-field frequency domain blind source separation for convolutive mixtures

    R Mukai, H Sawada, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS   IV   49 - 52  2004年  [査読有り]

     概要を見る

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when source signals come from the same or similar directions. Geometric information such as the direction of arrival (DOA) is helpful for solving the permutation problem, and a combination of the DOA based and correlation based methods provides a robust and precise solution. However when signals come from similar directions, the DOA based approach fails, and we have to use only the correlation based method whose performance is unstable. In this paper, we show that an interpretation of the ICA solution by a near-field model yields information about spheres on which source signals exist, which can be used as an alternative to the DOA. Experimental results show that the proposed method can robustly separate a mixture of signals arriving from the same direction.

  • On coefficient delay in natural gradient blind deconvolution and source separation algorithms

    SC Douglas, H Sawada, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   634 - 642  2004年  [査読有り]

     概要を見る

    In this paper, we study the performance effects caused by coefficient delays in natural gradient blind deconvolution and source separation algorithms. We present a statistical analysis of the effect of coefficient delays within such algorithms, quantifying the relative loss in performance caused by such coefficient delays with respect to delayless algorithm updates. We then propose a simple change to one such algorithm to improve its convergence performance.

  • Overcomplete BSS for convolutive mixtures based on hierarchical clustering

    S Winter, H Sawada, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   652 - 660  2004年  [査読有り]

     概要を見る

    In this paper we address the problem of overcomplete BSS for convolutive mixtures following a two-step approach. In the first step the mixing matrix is estimated, which is then used to separate the signals in the second step. For estimating the mixing matrix we propose an algorithm based on hierarchical clustering, assuming that the source signals are sufficiently sparse. It has the advantage of working directly on the complex valued sample data in the frequency-domain. It also shows better convergence than algorithms based on self-organizing maps. The results are improved by reducing the variance of direction of arrival. Experiments show accurate estimations of the mixing matrix and very low musical tone noise.

  • Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

    SC Douglas, H Sawada, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   477 - 480  2004年  [査読有り]

     概要を見る

    Practical gradient-based adaptive algorithms for multichannel blind deconvolution and convolutive blind source separation typically employ FIR filters for the separation system. Inadequate use of signal truncation within these algorithms can introduce steady-state biases into their converged solutions that lead to degraded separation and deconvolution performances. In this paper, we derive a natural gradient multichannel blind deconvolution and source separation algorithm that mitigates these effects for estimating causal FIR solutions to these tasks. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for acoustic sources, even for extremely-short separation filters.

  • Convolutive blind source separation for more than two sources in the frequency domain

    H Sawada, R Mukai, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   25 ( 4 ) 885 - 888  2004年  [査読有り]

     概要を見る

    Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.

  • Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA

    S Araki, S Makino, H Sawada, R Mukai

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   898 - 905  2004年  [査読有り]

     概要を見る

    We propose a method for separating N speech signals with M sensors where N > M. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose using a directivity pattern based continuous mask, which masks N - M sources in the observations, and independent component analysis (ICA) to separate the remaining mixtures. We conducted experiments for N = 3 with M = 2 and N = 4 with M = 2, and obtained separated signals with little distortion.

  • Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

    SC Douglas, H Sawada, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   13 ( 1 ) 477 - 480  2004年  [査読有り]

     概要を見る

    Practical gradient-based adaptive algorithms for multichannel blind deconvolution and convolutive blind source separation typically employ FIR filters for the separation system. Inadequate use of signal truncation within these algorithms can introduce steady-state biases into their converged solutions that lead to degraded separation and deconvolution performances. In this paper, we derive a natural gradient multichannel blind deconvolution and source separation algorithm that mitigates these effects for estimating causal FIR solutions to these tasks. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for acoustic sources, even for extremely-short separation filters.

  • A sparseness - Mixing Matrix Estimation (SMME) solving the underdetermined BSS for convolutive mixtures

    A Blin, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS   IV   85 - 88  2004年  [査読有り]

     概要を見る

    We propose a method for blindly separating real environment speech signals with as little distortion as possible in the special case where speech signals outnumber sensors. Our idea consists in combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to perform a preliminary separation and to detect when only one source is active. This information is then used to estimate the mixing matrix. Then we remove one source from the observations and separate the residual signals with the inverse of the estimated mixing matrix. Experimental results in a real environment (T-R = 130 ms and 200 ms) show that our proposed method, which we call Sparseness Mixing Matrix Estimation (SMME), provides separated signals of better quality than those extracted by only using the sparseness property of the speech signal.

  • Frequency domain blind source separation for many speech signals

    R Mukai, H Sawada, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   461 - 469  2004年  [査読有り]

     概要を見る

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometric information for solving the permutation problem. Experimental results in a room (reverberation time T-R = 130 ms) with eight microphones show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.

  • Estimating the number of sources for frequency-domain blind source separation

    H Sawada, S Winter, R Mukai, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   610 - 617  2004年  [査読有り]

     概要を見る

    Blind source separation (BSS) for convolutive mixtures can be performed efficiently in the frequency domain, where independent component analysis (ICA) is applied separately in each frequency bin. To solve the permutation problem of frequency-domain BSS robustly, information regarding the number of sources is very important. This paper presents a method for estimating the number of sources from convolutive mixtures of sources. The new method estimates the power of each source or noise component by using ICA and a scaling technique to distinguish sources and noises. Also, a reverberant component can be identified by calculating the correlation of component envelopes. Experimental results for up to three sources show that the proposed method worked well in a reverberant condition whose reverberation time was 200 ms.

  • Underdetermined blind separation of convolutive mixtures of speech with binary masks and ICA

    S. Araki, S. Makino, H. Sawada, A. Blin, and R. Mukai

    NIPS2003 Workshop on ICA: Sparse Representations in Signal Processing   2 ( 7 ) 1-4  2003年12月  [査読有り]

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming for convolutive mixtures

    S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari

    EURASIP Journal on Applied Signal Processing   2003 ( 11 ) 1157-1166  2003年11月  [査読有り]

    CiNii

  • Blind source separation when speech signals outnumber sensors using a sparseness-mixing matrix estimation (SMME)

    A. Blin, S. Araki, and S. Makino

    IWAENC2003     211-214  2003年09月  [査読有り]

    CiNii

  • Blind separation of more speech than sensors with less distortion by combining sparseness and ICA

    S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada

    IWAENC2003     271-274  2003年09月  [査読有り]

    CiNii

  • Spectral smoothing for frequency-domain blind source separation

    H. Sawada, R. Mukai, S. de la Kethulle, S. Araki, and S. Makino

    IWAENC2003     311-314  2003年09月  [査読有り]

    CiNii

  • Blind source separation for convolutive mixtures based on complexity minimization

    S. de la Kethulle, R. Mukai, H. Sawada, and S. Makino

    IWAENC2003     303-306  2003年09月  [査読有り]

  • Array geometry arrangement for frequency domain blind source separation

    R. Mukai, H. Sawada, S. de la Kethulle, S. Araki, and S. Makino

    IWAENC2003     219-222  2003年09月  [査読有り]

    CiNii

  • Multistage ICA for blind source separation of real acoustic convolutive mixture

    T. Nishikawa, H. Saruwatari, K. Shikano, S. Araki, and S. Makino

    ICA2003     523-528  2003年04月  [査読有り]

    CiNii

  • Subband based blind source separation with appropriate processing for each frequency band

    S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari

    ICA2003     499-504  2003年04月  [査読有り]

    CiNii

  • Geometrical interpretation of the PCA subspace method for overdetermined blind source separation

    S. Winter, H. Sawada, and S. Makino

    ICA2003     775-780  2003年04月  [査読有り]

    CiNii

  • Real-time blind source separation for moving speakers using blockwise ICA and residual crosstalk subtraction

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    ICA2003     975-980  2003年04月  [査読有り]

    CiNii

  • On-line time-domain blind source separation of nonstationary convolved signals

    R. Aichner, H. Buchner, S. Araki, and S. Makino

    ICA2003     987-992  2003年04月  [査読有り]

  • A robust and precise method for solving the permutation problem of frequency-domain blind source separation

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    ICA2003     505-510  2003年04月  [査読有り]

    CiNii

  • Geometrically constrained ICA for robust separation of sound mixtures

    M. Knaak, S. Araki, and S. Makino

    ICA2003     951-956  2003年04月  [査読有り]

  • Polar coordinate based nonlinear function for frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 3 ) 590 - 596  2003年03月  [査読有り]

     概要を見る

    This paper discusses a nonlinear function for independent component analysis to process complex-valued signals in frequency-domain blind source separation. Conventionally, nonlinear functions based on the Cartesian coordinates are widely used. However, such functions have a convergence problem. In this paper, we propose a more appropriate nonlinear function that is based on the polar coordinates of a complex number. In addition, we show that the difference between the two types of functions arises from the assumed densities of independent components. Our discussion is supported by several experimental results for separating speech signals, which show that the polar type nonlinear functions behave better than the Cartesian type.
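
    For reference, the two types of nonlinearity contrasted in the entry above can be written in a few lines of Python/NumPy (a common textbook form, with unit gain assumed); the polar version can be dropped into a per-bin ICA update such as the skeleton sketched earlier in this list.

    import numpy as np

    def phi_cartesian(y):
        # Nonlinearity applied separately to the real and imaginary parts.
        return np.tanh(np.real(y)) + 1j * np.tanh(np.imag(y))

    def phi_polar(y):
        # Polar-coordinate form: squash the magnitude, keep the phase.
        return np.tanh(np.abs(y)) * np.exp(1j * np.angle(y))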

  • A robust approach to the permutation problem of frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   381 - 384  2003年  [査読有り]

     概要を見る

    This paper presents a robust and precise method for solving the permutation problem of frequency-domain blind source separation. It is based on two previous approaches: the direction of arrival estimation approach and the inter-frequency correlation approach. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit the both advantages. We also present a closed form formula to calculate a null direction, which is used in estimating the directions of source signals. Experimental results show that our method solved permutation problems almost perfectly for a situation that two sources were mixed in a room whose reverberation time was 300 ms.
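
    One of the two ingredients integrated in the entry above, the inter-frequency correlation of separated-signal envelopes, can be sketched as follows in Python/NumPy for two neighbouring bins; the DOA-based part and the closed-form null-direction formula are not reproduced, and an exhaustive permutation search is used only because N is small.

    import numpy as np
    from itertools import permutations

    def align_next_bin(Y_prev, Y_curr):
        # Y_prev, Y_curr: (N, T) separated signals of two neighbouring bins,
        # Y_prev already aligned.  Reorder Y_curr to maximize the summed
        # correlation of amplitude envelopes with Y_prev.
        def env(Y):
            A = np.abs(Y)
            A = A - A.mean(axis=1, keepdims=True)
            return A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)

        C = env(Y_prev) @ env(Y_curr).T                    # pairwise correlations
        N = Y_curr.shape[0]
        best = max(permutations(range(N)),
                   key=lambda p: sum(C[i, p[i]] for i in range(N)))
        return Y_curr[list(best)]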

  • Robust real-time blind source separation for moving speakers in a room

    R Mukai, H Sawada, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   469 - 472  2003年  [査読有り]

     概要を見る

    This paper describes a robust real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.

  • Geometrically constrained ICA for convolutive mixtures of sound

    M Knaak, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS   II   725 - 728  2003年  [査読有り]

     概要を見る

    The goal of this contribution is a new algorithm using independent component analysis with a geometrical constraint. The new algorithm solves the permutation problem of blind source separation of acoustic mixtures, and it is significantly less sensitive to the precision of the geometrical constraint than an adaptive beamformer. A high degree of robustness is very important since the steering vector is always roughly estimated in a reverberant environment, even when the look direction is precise. The new algorithm is based on FastICA and constrained optimization. It is theoretically and experimentally analyzed with respect to the roughness of the steering vector estimation by using impulse responses of a real room. The effectiveness of the algorithm for real-world mixtures is also shown in the case of three sources and three microphones.

  • Direction of arrival estimation for multiple source signals using independent component analysis

    H Sawada, R Mukai, S Makino

    SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS     411 - 414  2003年  [査読有り]

     概要を見る

    This paper presents a new method for estimating the directions of source signals. We assume a situation in which multiple source signals are mixed in a reverberant condition and observed at several sensors. The new method is based on independent component analysis, which separates mixed signals into original source signals. It can be applied where the number of sources is equal to the number of sensors, whereas the conventional methods based on sub-space analysis, such as the MUSIC algorithm, are applicable where there are fewer sources than sensors. Even in cases where the MUSIC algorithm can be applied, the new method is better at estimating the directions of sources if they are closely placed.
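
    As a rough illustration of the idea (notation assumed here; this is not the paper's exact closed-form null-direction formula): for a two-element array with spacing d, the mixing matrix estimate A(f) = W^{-1}(f) obtained from ICA gives

    \theta_k \approx \arccos\!\left( \frac{c \, \arg\!\bigl( A_{2k}(f) / A_{1k}(f) \bigr)}{2 \pi f \, d} \right),

    where c is the speed of sound and A_{jk}(f) is the estimated gain from source k to sensor j; the phase difference between the two sensor gains of one ICA output plays the role of the inter-sensor delay used in conventional DOA estimation.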

  • Subband based blind source separation for convolutive mixtures of speech

    S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   509 - 512  2003年  [査読有り]

     概要を見る

    Subband processing is applied to blind source, separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In our proposed subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can handle long reverberation. Subband BSS achieves better performance than frequency-domain BSS. Moreover, we propose efficient separation procedures that take into consideration the frequency characteristics of room reverberation and speech signals. We achieve this (3) by using longer unmixing filters in low frequency bands, and (4) by adopting overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized in the proposed subband BSS.

  • Geometrical understanding of the PCA subspace method for overdetermined blind source separation

    S Winter, H Sawada, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS     769 - 772  2003年  [査読有り]

     概要を見る

    In this paper, we discuss approaches for blind source separation where we can use more sensors than the number of sources for a better performance. The discussion focuses mainly on reducing the dimension of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second involves selecting a subset of sensors based on the fact that a low frequency prefers a wide spacing and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in the way that it emphasizes the outer sensors and yields superior results for high frequencies, which provides a better understanding of the former method.
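
    A minimal numpy sketch (an assumption for illustration, not taken from the paper) of the PCA-based dimension reduction discussed above: the M-channel observations at one frequency bin are projected onto the subspace of the N largest eigenvalues of the spatial covariance matrix before ICA is applied.

    import numpy as np

    def pca_reduce(X, n_sources):
        # X: complex M x T observations at one frequency bin (M sensors > N sources)
        R = X @ X.conj().T / X.shape[1]            # spatial covariance matrix
        vals, vecs = np.linalg.eigh(R)             # eigenvalues in ascending order
        E = vecs[:, -n_sources:]                   # dominant (signal) subspace
        D = np.diag(1.0 / np.sqrt(vals[-n_sources:]))
        return D @ E.conj().T @ X                  # whitened N x T data for ICA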

  • Natural gradient blind deconvolution and equalization using causal FIR filters

    SC Douglas, H Sawada, S Makino

    CONFERENCE RECORD OF THE THIRTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2   2 ( 3 ) 197 - 201  2003年  [査読有り]

     概要を見る

    Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as blind deconvolution and equalization. Practical implementations of such methods require truncation of the filter impulse responses within the gradient updates. In this paper, we show how truncation of these filter impulse responses can create convergence problems and introduce a bias into the steady-state solution of one such algorithm. We then show how this algorithm can be modified to effectively mitigate these effects for estimating causal FIR approximations to doubly-infinite IIR equalizers. Simulations indicate that the modified algorithm provides the convergence benefits of the natural gradient while still attaining good steady-state performance.

  • ICA-based blind source separation of sounds

    S. Makino, S. Araki, R. Mukai, H. Sawada, and H. Saruwatari

    JCA2002 (China-Japan Joint Conference on Acoustics)     83-86  2002年11月  [査読有り]

    CiNii

  • Digital technologies for controlling room acoustics

    M. Miyoshi, S. Makino

    JCA2002 (China-Japan Joint Conference on Acoustics)     19-24  2002年11月  [査読有り]

  • Blind source separation for convolutive mixtures of speech using subband processing

    S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari

    SMMSP2002 (International Workshop on Spectral Methods and Multirate Signal Processing)     195-202  2002年09月  [査読有り]

    CiNii

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming

    S Araki, Y Hinamoto, S Makino, T Nishikawa, R Mukai, H Saruwatari

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS     1785 - 1788  2002年  [査読有り]

     概要を見る

    Frequency domain Blind Source Separation (BSS) is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., Adaptive Beamformers (ABFs). The minimization of the off-diagonal components in the BSS update equation can be viewed as the minimization of the mean square error in the ABF. The unmixing matrix of the BSS and the filter coefficients of the ABF converge to the same solution in the mean square error sense if the two source signals are ideally independent. Therefore, the performance of the BSS is limited by that of the ABF. This understanding gives an interpretation of BSS from a physical point of view.
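
    A compact way to state the equivalence described above for the 2-source/2-sensor case x = Hs, y = Wx (a paraphrase with assumed notation, not the paper's exact derivation): the decorrelation-type BSS update drives the off-diagonal term E[y_1 y_2^*] to zero, while an adaptive beamformer for output 1 minimizes E[|w_{11} x_1 + w_{12} x_2|^2] when only the jammer s_2 is active; both criteria are met when

    w_{11} H_{12}(f) + w_{12} H_{22}(f) = 0,

    i.e., when the output steers a spatial null toward the jammer, which is exactly the least-mean-square solution of the ABF.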

  • Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming

    R Aichner, S Araki, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     445 - 454  2002年  [査読有り]

     概要を見る

    We propose a time-domain BSS algorithm that utilizes geometric information such as sensor positions and assumed locations of sources. The algorithm tackles the problem of convolved mixtures by explicitly exploiting the non-stationarity of the acoustic sources. The learning rule is based on second-order statistics and is derived by natural gradient minimization. The proposed initialization of the algorithm is based on the null beamforming principle. This method leads to improved separation performance, and the algorithm is able to estimate long unmixing FIR filters in the time domain due to the geometric initialization. We also propose a post-filtering method for dewhitening which is based on the scaling technique in frequency-domain BSS. The validity of the proposed method is shown by computer simulations. Our experimental results confirm that the algorithm is capable of separating real-world speech mixtures and can be applied to short learning data sets down to a few seconds. Our results also confirm that the proposed dewhitening post-filtering method maintains the spectral content of the original speech in the separated output.

  • Enhanced frequency-domain adaptive algorithm for stereo echo cancellation

    S Emura, Y Haneda, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1901 - 1904  2002年  [査読有り]

     概要を見る

    Highly cross-correlated input signals create the problem of slow convergence of misalignment in stereo echo cancellation even after undergoing non-linear preprocessing. We propose a new frequency-domain adaptive algorithm that improves the convergence rate by increasing the contribution of non-linearity in the adjustment vector. Computer simulation showed that it is effective when the non-linearity gain is small.

  • Polar coordinate based nonlinear function for frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   I   1001 - 1004  2002年  [査読有り]

     概要を見る

    This paper presents a new type of nonlinear function for independent component analysis to process complex-valued signals, which is used in frequency-domain blind source separation. The new function is based on the polar coordinates of a complex number, whereas the conventional one is based on the Cartesian coordinates. The new function is derived from the probability density function of frequency-domain signals that are assumed to be independent of the phase. We show that the difference between the two types of functions is in the assumed densities of independent components. Experimental results for separating speech signals show that the new nonlinear function behaves better than the conventional one.

  • Removal of residual cross-talk components in blind source separation using time-delayed spectral subtraction

    R Mukai, S Araki, H Sawada, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1789 - 1792  2002年  [査読有り]

     概要を見る

    This paper describes a post processing method to refine output signals obtained by Blind Source Separation (BSS). The performance of BSS using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the cross-talk components derived from the reverberation of the jammer signal. Utilizing this knowledge, we propose a new method, time-delayed non-stationary spectral subtraction, which removes the residual components from the separated signals precisely. The proposed method compensates for the weakness of BSS in a reverberant environment. Experimental results using speech signals show that the proposed method improves the signal-to-noise ratio by 3 to 5 dB.
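
    A minimal numpy sketch of time-delayed non-stationary spectral subtraction as described above; the parameter names and values (delay, alpha, beta) are illustrative assumptions, not the paper's settings.

    import numpy as np

    def remove_residual_crosstalk(Y_tgt, Y_jam, delay=2, alpha=1.0, beta=0.05):
        # Y_tgt, Y_jam: complex STFTs (freq x frames) of the target output and the
        # jammer-dominant output produced by the BSS stage
        P_tgt = np.abs(Y_tgt) ** 2
        P_jam = np.abs(Y_jam) ** 2
        # delay the jammer power spectrum by a few frames to model its reverberation
        P_jam_del = np.pad(P_jam, ((0, 0), (delay, 0)))[:, :P_jam.shape[1]]
        # subtract the delayed, scaled jammer power and floor the result
        P_clean = np.maximum(P_tgt - alpha * P_jam_del, beta * P_tgt)
        return np.sqrt(P_clean) * np.exp(1j * np.angle(Y_tgt))   # reuse the target phase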

  • Removal of residual crosstalk components in blind source separation using LMS filters

    R Mukai, S Araki, H Sawada, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     435 - 444  2002年  [査読有り]

     概要を見る

    The performance of Blind Source Separation (BSS) using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the residual crosstalk components derived from the reverberation of the jammer signal. This paper describes a post-processing method designed to refine output signals obtained by BSS.
    We propose a new method which uses LMS filters in the frequency domain to estimate the residual crosstalk components in separated signals. The estimated components are removed by non-stationary spectral subtraction. The proposed method removes the residual components precisely, and thus compensates for the weakness of BSS in a reverberant environment.
    Experimental results using speech signals show that the proposed method improves the signal-to-interference ratio by 3 to 5 dB.

  • Blind source separation with different sensor spacing and filter length for each frequency range

    H Sawada, S Araki, R Mukai, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     465 - 474  2002年  [査読有り]

     概要を見る

    This paper presents a method for blind source separation using several separating subsystems whose sensor spacing and filter length can be configured individually. Each subsystem is responsible for source separation of an allocated frequency range. With this mechanism, we can use appropriate sensor spacing as well as filter length for each frequency range. We obtained better separation performance than with the conventional method by using a wide sensor spacing and a long filter for a low frequency range, and a narrow sensor spacing and a short filter for a high frequency range.

  • Separation and dereverberation performance of frequency domain blind source separation

    R. Mukai, S. Araki, and S. Makino

    ICA2001     230-235  2001年12月  [査読有り]

  • A polar-coordinate based activation function for frequency domain blind source separation

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    ICA2001     663-668  2001年12月  [査読有り]

    CiNii

  • 実環境におけるブラインド音源分離技術を開発 -2人の声の同時聞き分けに成功-

    牧野昭二, 荒木章子, 向井良, 片桐滋

    電子情報通信学会誌   84 ( 11 ) 848 - 848  2001年11月  [査読有り]

    CiNii

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

    S. Araki, S. Makino, R. Mukai, and H. Saruwatari

    CRAC (A workshop on Consistent and Reliable acoustic cues for sound analysis)   2 ( 4 ) 1-4  2001年09月  [査読有り]

  • ICASSP2001会議報告

    牧野昭二, 荒木章子

    人工知能学会誌   16 ( 5 ) 736-737  2001年09月  [査読有り]

  • Adaptive filtering algorithm enhancing decorrelated additive signals for stereo echo cancellation

    S. Emura, Y. Haneda, and S. Makino

    IWAENC2001     67-70  2001年09月  [査読有り]

  • Separation and dereverberation performance of frequency domain blind source separation in a reverberant environment

    R. Mukai, S. Araki, and S. Makino

    IWAENC2001     127-130  2001年09月  [査読有り]

    CiNii

  • Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers

    S. Araki, S. Makino, R. Mukai, and H. Saruwatari

    Eurospeech2001     2595-2598  2001年09月  [査読有り]

  • Separation and dereverberation performance of frequency domain blind source separation for speech in a reverberant environment

    R. Mukai, S. Araki, and S. Makino

    Eurospeech2001 (European Conference on Speech Communication and Technology)     2599-2602  2001年09月  [査読有り]

  • 全指向性を持つスピーカ・マイクロホン一体型通話装置の設計

    中川 朗, 島内 末廣, 羽田 陽一, 青木 茂明, 牧野 昭二

    日本音響学会誌   57 ( 8 ) 509-516  2001年08月  [査読有り]

     概要を見る

    In hands-free teleconferencing, acoustic echo cancellers based on adaptive filters have been widely used to solve problems such as echo and howling. To make teleconferencing more convenient, a hands-free terminal that houses the loudspeaker and the microphones in a single enclosure is desirable. In such a device, however, the short distance between the loudspeaker and the microphones increases the acoustic echo that leaks from the loudspeaker into the microphones, which makes the acoustic echo canceller difficult to control. To address this increase in acoustic echo, this paper examines a four-microphone configuration and shows that, by controlling the phase of each microphone output signal, an omnidirectional integrated loudspeaker-microphone unit can be realized while reducing the acoustic echo from the loudspeaker by about 12 dB.

    CiNii

  • Fundamental limitation of frequency domain Blind Source Separation for convolutive mixture of speech

    S Araki, S Makino, T Nishikawa, H Saruwatari

    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS     2737 - 2740  2001年  [査読有り]

     概要を見る

    Despite several recent proposals to achieve Blind Source Separation (BSS) for realistic acoustic signals, separation performance is still insufficient. In particular, when the impulse response is long, performance is highly limited. In this paper, we show that it is useless to be constrained by the condition P << T, where T is the frame size of the FFT and P is the length of the room impulse response. From our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation of TR = 150 and 300 ms. We also clarified the reason for the poor performance of BSS in a long reverberant environment, finding that separation is achieved chiefly for the sound from the direction of the jammer because BSS cannot calculate the inverse of the room transfer function for both the target and jammer signals.

  • Stereophonic acoustic echo cancellation: An overview and recent solutions

    S. Makino

    Acoustical Science and Technology   22 ( 5 ) 325 - 333  2001年  [査読有り]

     概要を見る

    The fundamental problems of stereophonic acoustic echo cancellation were discussed and the recent solutions were reviewed. The stereo echo cancellation was achieved by linearly combining two monaural echo cancellers. A duo-filter control system including a continually running adaptive filter and a fixed filter was used for double-talk control. A second-order stereo projection algorithm was used in the adaptive filter and a stereo switch was also implemented.

    DOI CiNii

    Scopus

    11
    被引用数
    (Scopus)
  • Subjective assessment of the desired echo return loss for subband acoustic echo cancellers

    S Sakauchi, Y Haneda, S Makino, M Tanaka, Y Kaneda

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E83A ( 12 ) 2633 - 2639  2000年12月  [査読有り]

     概要を見る

    We investigated the dependence of the desired echo return loss on frequency for various hands-free telecommunication conditions by subjective assessment. The desired echo return loss as a function of frequency (DERLf) is an important factor in the design and performance evaluation of a subband echo canceller, and it is a measure of what is considered an acceptable echo caused by electrical loss in the transmission line. The DERLf during single talk was obtained as attenuated band-limited echo levels that subjects did not find objectionable when listening to the near-end speech and its band-limited echo under various hands-free telecommunication conditions. When we investigated the DERLf during double-talk, subjects also heard the speech in the far-end room from a loudspeaker. The echo was limited to a 250-Hz bandwidth assuming the use of a subband echo canceller. The test results showed that: (1) when the transmission delay was short (30 ms), the echo component around 2 to 3 kHz was the most objectionable to listeners; (2) as the transmission delay rose to 300 ms, the echo component around 1 kHz became the most objectionable; (3) when the room reverberation time was relatively long (about 500 ms), the echo component around 1 kHz was the most objectionable even if the transmission delay was short; and (4) the DERLf during double-talk was about 5 to 10 dB lower than that during single-talk. Use of these DERLf values will enable the design of more efficient subband echo cancellers.

  • A study of microphone system for hands-free teleconferencing units

    Akira Nakagawa, Suehiro Shimauchi, Shoji Makino

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   21 ( 1 ) 33 - 35  2000年  [査読有り]

    DOI

    Scopus

  • Channel-number-compressed multi-channel acoustic echo canceller for high-presence teleconferencing system with large display

    A Nakagawa, S Shimauchi, Y Haneda, S Aoki, S Makino

    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI     813 - 816  2000年  [査読有り]

     概要を見る

    Sound localization is important to make conversation easy between local and remote sites in a teleconference. This requires a multi-channel sound system having a multi-channel acoustic echo canceller (MAEC). The appropriate number of channels is determined from a trade-off between high presence and MAEC performance, so it is not possible to increase the channel number by much.
    We propose a channel-number-compressed MAEC to provide teleconferencing systems that exhibit high presence. The channel number of the MAEC inputs is compressed and that of its outputs is expanded.

  • Hybrid of acoustic echo cancellers and voice switching control for multi-channel applications

    S. Shimauchi, A. Nakagawa, Y. Haneda, and S. Makino

    IWAENC99     48-51  1999年09月  [査読有り]

    CiNii

  • Subband echo canceler with an exponentially weighted stepsize NLMS adaptive filter

    S Makino, Y Haneda

    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE   82 ( 3 ) 49 - 57  1999年03月  [査読有り]

     概要を見る

    This paper proposes a novel adaptive algorithm for an echo canceler. In this algorithm, the number of operations and memory capacity are equivalent to those of the conventional NLMS algorithm but the convergence speed is twice that using the conventional algorithm. This adaptive algorithm is referred to as subband ES (exponentially weighted stepsize). In the algorithm, the frequency bands of the received input signal and echo signal are divided into multiple subbands, and echo is independently canceled in each subband. Each adaptive filter in each subband has independent coefficients with an independent stepsize. The stepsize is time-independent and its weight is exponentially proportional to the change of the impulse response within the frequency region, such as the expected value of the difference between the waveforms of two impulse responses. As a result, the characteristic of the acoustic echo path in each frequency band is analyzed using the adaptive algorithm to improve the convergence characteristic. Using the results of computer simulation and experimental results obtained via an experimental setup with DSP, it is shown that the convergence speed with respect to input voice signal can be about 4 times faster when using echo cancellation based on the new algorithm than in conventional full-band echo cancellation based on the NLMS algorithm. (C) 1998 Scripta Technica, Electron Comm Jpn Pt 3, 82(3): 49-57, 1999.
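
    A minimal sketch of the per-tap exponentially weighted step size described above, written as a single NLMS-style update for one (sub)band; the decay factor gamma is an illustrative assumption tied to the echo-path energy decay, not a value from the paper.

    import numpy as np

    def es_nlms_step(w, x_buf, d, mu0=0.5, gamma=0.999, eps=1e-8):
        # w: adaptive filter taps (length L); x_buf: the L most recent input
        # samples, newest first; d: current microphone (desired) sample
        e = d - w @ x_buf                          # a-priori error
        mu = mu0 * gamma ** np.arange(len(w))      # step sizes decay like the echo path
        w = w + mu * e * x_buf / (x_buf @ x_buf + eps)
        return w, e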

  • A stereo echo canceller implemented using a stereo shaker and a duo-filter control system

    S Shimauchi, S Makino, Y Haneda, A Nakagawa, S Sakauchi

    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI     857 - 860  1999年  [査読有り]

     概要を見る

    Stereo echo cancellation has been achieved and used in daily teleconferencing. To overcome the non-uniqueness problem, a stereo shaker is introduced in eight frequency bands and adjusted so as to be inaudible and not affect stereo perception. A duo-filter control system including a continually running adaptive filter and a fixed filter is used for double-talk control. A second-order stereo projection algorithm is used in the adaptive filter. A stereo voice switch is also included. This stereo echo canceller was tested in two-way conversation in a conference room, and the strength of the stereo shaker was subjectively adjusted. A misalignment of 20 dB was obtained in the teleconferencing environment, and changing the talker's position in the transmission room did not affect the cancellation. This echo canceller is now used daily in a high-presence teleconferencing system and has been demonstrated to more than 300 attendees.

  • New configuration for a stereo echo canceller with nonlinear pre-processing

    S Shimauchi, Y Haneda, S Makino, Y Kaneda

    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6     3685 - 3688  1998年  [査読有り]

     概要を見る

    A new configuration for a stereo echo canceller with nonlinear pre-processing is proposed. The pre-processor which adds uncorrelated components to the original received stereo signals improves the adaptive filter convergence even in the conventional configuration. However, because of the inaudibility restriction, the preprocessed signals still include a large amount of the original stereo signals which are often highly cross-correlated. Therefore, the improvement is limited. To overcome this, our new stereo echo canceller includes exclusive adaptive filters whose inputs are the uncorrelated signals generated in the pre-processor. These exclusive adaptive filters converge to true solutions without suffering from cross-correlation between the original stereo signals. This is demonstrated through computer simulation results.

  • Subband acoustic echo canceller using two different analysis filters and 8th order projection algorithm

    A. Nakagawa, Y. Haneda, and S. Makino

    IWAENC97     140-143  1997年09月  [査読有り]

    CiNii

  • Subjective assessment of echo return loss required for subband acoustic echo cancellers

    S. Sakauchi, Y. Haneda, and S. Makino

    IWAENC97     152-155  1997年09月  [査読有り]

  • Multiple-point equalization of room transfer functions by using common acoustical poles

    Y Haneda, S Makino, Y Kaneda

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   5 ( 4 ) 325 - 333  1997年07月  [査読有り]

     概要を見る

    A multiple-point equalization filter using the common acoustical poles of room transfer functions is proposed. The common acoustical poles correspond to the resonance frequencies, which are independent of source and receiver positions. They are estimated as common autoregressive (AR) coefficients from multiple room transfer functions. The equalization is achieved with a finite impulse response (FIR) filter, which has the inverse characteristics of the common acoustical pole function. Although the proposed filter cannot recover the frequency response dips of the multiple room transfer functions, it can suppress their common peaks due to resonance; it is also less sensitive to changes in receiver position. Evaluation of the proposed equalization filter using measured room transfer functions shows that it can reduce the deviations in the frequency characteristics of multiple room transfer functions better than a conventional multiple-point inverse filter. Experiments show that the proposed filter enables 1-5 dB additional amplifier gain in a public address system without acoustic feedback at multiple receiver positions. Furthermore, the proposed filter reduces the reflected sound in room impulse responses without the pre-echo that occurs with a multiple-point inverse filter. A multiple-point equalization filter using common acoustical poles can thus equalize multiple room transfer functions by suppressing their common peaks.
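
    One way to sketch the "common AR coefficients" estimation in numpy is a joint linear-prediction fit over several measured impulse responses; the paper's estimator and equalizer design may differ in detail.

    import numpy as np

    def common_ar_equalizer(impulse_responses, order=20):
        # Stack the linear-prediction equations of all measured room impulse
        # responses and solve them jointly for one set of AR coefficients.
        rows, targets = [], []
        for h in impulse_responses:
            for n in range(order, len(h)):
                rows.append(h[n - order:n][::-1])      # the `order` previous samples
                targets.append(h[n])
        a, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
        # A(z) = 1 - sum_k a_k z^{-k}; used directly as an FIR filter, it suppresses
        # the resonance peaks (common poles) shared by all measured positions.
        return np.concatenate(([1.0], -a))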

  • Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path

    S Makino, K Strauss, S Shimauchi, Y Haneda, A Nakagawa

    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V     299 - 302  1997年  [査読有り]

     概要を見る

    This paper proposes a new subband stereo echo canceller that converges to the true echo path impulse response much faster than conventional stereo echo cancellers. Since signals are bandlimited and downsampled in the subband structure, the time interval between the subband signals becomes longer, so the variation of the cross-correlation between the stereo input signals becomes large. Consequently, convergence to the true solution is improved. Furthermore, the projection algorithm, or affine projection algorithm, is applied to further speed up the convergence. Computer simulations using stereo signals recorded in a conference room demonstrate that this method significantly improves convergence speed and almost solves the problem of stereo echo cancellation with low computational load.

  • Noise reduction for subband acoustic echo canceller

    J. Sasaki, Y. Haneda, and S. Makino

    Joint meeting, Acoustical Society of America and Acoustical Society of Japan     1285-1290  1996年12月  [査読有り]

    CiNii

  • Implementation and evaluation of an acoustic echo canceller using duo-filter control system

    Y. Haneda, S. Makino, J. Kojima, and S. Shimauchi

    EUSIPCO96 (European Signal Processing Conference)     1115-1118  1996年09月  [査読有り]

    CiNii

  • SSB subband echo canceller using low-order projection algorithm

    S Makino, J Noebauer, Y Haneda, A Nakagawa

    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6     945 - 948  1996年  [査読有り]

  • Stereo echo cancellation algorithm using imaginary input-output relationships

    S Shimauchi, S Makino

    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6     941 - 944  1996年  [査読有り]

  • A FAST PROJECTION ALGORITHM FOR ADAPTIVE FILTERING

    M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E78A ( 10 ) 1355 - 1361  1995年10月  [査読有り]

     概要を見る

    This paper proposes a new algorithm called the fast Projection algorithm, which reduces the computational complexity of the Projection algorithm from (p+1)L + O(p^3) to 2L + 20p (where L is the length of the estimation filter and p is the projection order). This algorithm has properties that lie between those of NLMS and RLS, i.e., less computational complexity than RLS but much faster convergence than NLMS for input signals like speech. The reduction of computation consists of two parts. One concerns calculating the pre-filtering vector, which originally took O(p^3) operations. Our new algorithm computes the pre-filtering vector recursively with about 15p operations. The other reduction is accomplished by introducing an approximation vector of the estimation filter. Experimental results for speech input show that the convergence speed of the Projection algorithm approaches that of RLS as the projection order increases with only a slight extra calculation complexity beyond that of NLMS, which indicates the efficiency of the proposed fast Projection algorithm.
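
    For reference, the plain (non-fast) projection / affine projection update that the paper accelerates looks like the following numpy sketch; the O(p^3) cost sits in the linear solve, which the fast algorithm replaces with roughly 15p recursive operations (those recursions are not shown here).

    import numpy as np

    def projection_step(w, X, d, mu=1.0, delta=1e-6):
        # w: filter taps (length L); X: L x p matrix whose columns are the p most
        # recent input vectors; d: the p most recent desired samples (newest first)
        e = d - X.T @ w                                              # a-priori errors
        g = np.linalg.solve(X.T @ X + delta * np.eye(X.shape[1]), e)  # O(p^3) part
        return w + mu * X @ g, e[0]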

  • Relationship between the 'ES family' algorithms and conventional adaptive algorithms

    S. Makino

    IWAENC95     11-14  1995年06月  [査読有り]

    CiNii

  • Implementation and evaluation of an acoustic echo canceller using the duo-filter control system

    Y. Haneda, S. Makino, J. Kojima, and S. Shimauchi

    IWAENC95     79-82  1995年06月  [査読有り]

  • エコーキャンセラは拡声装置のハウリングにも有効か?

    牧野昭二

    日本音響学会誌   51 ( 3 ) 248  1995年03月  [査読有り]

  • STEREO PROJECTION ECHO CANCELER WITH TRUE ECHO PATH ESTIMATION

    S SHIMAUCHI, S MAKINO

    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5     3059 - 3062  1995年  [査読有り]

  • FAST PROJECTION ALGORITHM AND ITS STEP-SIZE CONTROL

    M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5     945 - 948  1995年  [査読有り]

  • 高性能音響エコーキャンセラの開発

    小島順治, 牧野昭二, 羽田陽一, 島田末廣

    NTT R&D   44 ( 1 ) 39-44  1995年01月  [査読有り]

  • 室内音場伝達関数の共通極・零モデル化

    羽田陽一, 牧野昭二, 金田豊

    NTT R&D   44 ( 1 ) 53-58  1995年01月  [査読有り]

     概要を見る

    We propose a new model of room transfer functions (the common-pole/zero model) that uses common poles corresponding to the resonances of the room sound field. The common poles are estimated as common AR coefficients from multiple room transfer functions measured at different source and receiver positions. Because this model represents the multiple room transfer functions with the estimated common AR coefficients and with zeros that differ for each transfer function, it requires fewer parameters than the conventional all-zero and pole-zero models. Simulations of an echo canceller based on the common-pole/zero model showed that, compared with the conventional all-zero model, the order of the adaptive filter can be roughly halved and the convergence speed improved by a factor of about 1.5 in the band up to 800 Hz, confirming the effectiveness of the proposed model.

    CiNii

  • 1994年音響・音声・信号処理国際会議(ICASSP-94)報告

    牧野昭二, 他

    日本音響学会誌   50 ( 9 ) 759-760  1994年09月  [査読有り]

  • 音声エコーキャンセラのための適応信号処理の研究

    牧野昭二

    日本音響学会誌     75  1994年01月  [査読有り]

  • ARMA modeling of a room transfer function at low frequencies

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   15 ( 5 ) 353 - 355  1994年  [査読有り]

    DOI

    Scopus

    8
    被引用数
    (Scopus)
  • A NEW RLS ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE RESPONSE

    S MAKINO, Y KANEDA

    ICASSP-94 - PROCEEDINGS, VOL 3   III   373 - 376  1994年  [査読有り]

  • マイクロプロセッサを用いたプログラム制御形音声スイッチの設計

    及川弘, 西野正和, 山森和彦, 牧野昭二

    電子情報通信学会論文誌   J77-B-I ( 1 ) 66-74  1994年01月  [査読有り]

     概要を見る

    Voice switch (VS) circuits are widely used for echo suppression and howling prevention in communication equipment that realizes hands-free loudspeaking calls with microphones and loudspeakers. Recently, they have also often been used together with echo cancellers to complement the canceller's performance. Kato et al. studied in detail the operating characteristics and design of VS circuits implemented with analog circuits, and proposed an automatic loss-switching VS (ALS) that automatically adapts to changes in the acoustic and sidetone characteristics to reduce the switching loss. However, because the ALS is built only with analog circuits, its design is complicated and it is extremely difficult to make the switching loss any smaller. This paper therefore proposes program-controlled voice switches that realize the ALS functions by program control on a microprocessor (μP) and thereby achieve superior speech performance: (1) Type A, which applies the control to the entire speech band, and (2) Type B, which divides the speech band and applies the control to each sub-band.

    CiNii

  • Common acoustical poles independent of sound directions and modeling of head-related transfer functions

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   15 ( 4 ) 277 - 279  1994年  [査読有り]

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • EXPONENTIALLY WEIGHTED STEP-SIZE PROJECTION ALGORITHM FOR ACOUSTIC ECHO CANCELERS

    S MAKINO, Y KANEDA

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E75A ( 11 ) 1500 - 1508  1992年11月  [査読有り]

     概要を見る

    This paper proposes a new adaptive algorithm for acoustic echo cancellers with four times the convergence speed of the normalized LMS (NLMS) for a speech input, at almost the same computational load. This algorithm reflects both the statistics of the variation of a room impulse response and the whitening of the received input signal. This algorithm, called the ESP (exponentially weighted step-size projection) algorithm, uses a different step size for each coefficient of an adaptive transversal filter. These step sizes are time-invariant and weighted proportionally to the expected variation of a room impulse response. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. This algorithm also reflects the whitening of the received input signal, i.e., it removes the correlation between consecutive received input vectors. This process is effective for speech, which has a highly non-white spectrum. A geometric interpretation of the proposed algorithm is derived and the convergence condition is proved. A fast projection algorithm is introduced to reduce the computational complexity and modified for a practical multiple-DSP structure so that it requires almost the same computational load, 2L multiply-add operations, as the conventional NLMS. The algorithm is implemented in an acoustic echo canceller constructed with multiple DSP chips, and its fast convergence is demonstrated.

  • MODELING OF A ROOM TRANSFER FUNCTION USING COMMON ACOUSTICAL POLES

    Y HANEDA, S MAKINO, Y KANEDA

    ICASSP-92 - 1992 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   II   B213 - B216  1992年  [査読有り]

  • Subband echo canceller with an exponentially weighted step size NLMS adaptive filter

    S. Makino, Y. Haneda

    IWAENC91 (International Workshop on Acoustic Echo and Noise Control)     109-120  1991年09月  [査読有り]

  • 1990年音響・音声・信号処理国際会議(ICASSP90)報告

    広瀬啓吉, 中川聖一, 谷口智彦, 牧野昭二

    日本音響学会誌   46 ( 10 ) 869-870  1990年10月  [査読有り]

    CiNii

  • 最近の電話の音響技術 - エコー制御技術 -

    島田正治, 牧野昭二

    テレビジョン学会誌   44 ( 3 ) 222-227  1990年03月  [査読有り]

    DOI CiNii

  • ACOUSTIC ECHO CANCELER ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE RESPONSE

    S MAKINO, Y KANEDA

    ICASSP 90, VOLS 1-5     1133 - 1136  1990年  [査読有り]

  • Echo control in telecommunications

    Shoji Makino, Shoji Shimada

    Journal of the Acoustical Society of Japan (E)   11 ( 6 ) 309 - 316  1990年  [査読有り]

     概要を見る

    This paper reviews echo control techniques for telecommunications, emphasizing the principles and applications of both circuit and acoustic echo cancellers. First, echo generating mechanisms and echo problems are described for circuit and acoustic echoes. Circuit echo is caused by impedance mismatching in a hybrid coil. Acoustic echo is caused by acoustic coupling between loudspeakers and microphones in a room. The echo problem is severe when the round-trip propagation delay is long. In this case, the echo must be removed. Next, the basic principle of the echo canceller, adaptive filter structure and adaptive algorithm are discussed. Emphasis is focused on the construction and operation of an adaptive transversal filter using the NLMS (Normalized Least Mean Square) algorithm, which is the most popular for the echo canceller. Then, applications of circuit and acoustic echo cancellers are described. Circuit echo cancellers have been well studied and implemented in LSIs for many applications. Although acoustic echo cancellers have been introduced into audio teleconference systems, they still have some problems which must be solved. Therefore, they are now being studied intensely. Finally, this paper mentions the problems of echo cancellers and the direction of future work on them. The main targets for acoustic echo cancellers are improving the convergence speed, reducing the amount of hardware and bettering the double-talk control technique. © 1990, Acoustical Society of Japan. All rights reserved.

    DOI

    Scopus

    5
    被引用数
    (Scopus)
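
    As a concrete companion to the NLMS-based adaptive transversal filter discussed in the review above, here is a minimal numpy sketch of an acoustic echo canceller loop; the filter length, step size and regularization values are illustrative assumptions.

    import numpy as np

    def nlms_echo_canceller(x, d, L=512, mu=0.5, eps=1e-8):
        # x: far-end (loudspeaker) signal; d: microphone signal containing its echo
        w = np.zeros(L)                           # adaptive transversal filter
        e = np.zeros(len(d))                      # residual sent back to the far end
        for n in range(L, len(d)):
            x_buf = x[n - L:n][::-1]              # most recent L samples, newest first
            e[n] = d[n] - w @ x_buf               # subtract the echo replica from the mic
            w += mu * e[n] * x_buf / (x_buf @ x_buf + eps)   # NLMS coefficient update
        return e, w
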
  • Acoustic echo canceller algorithm based on room acoustic characteristics

    S. Makino, N. Koizumi

    WASPAA89 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics)   1 ( 1 ) 1-2  1989年10月  [査読有り]

  • Acoustic echo canceller with multiple echo paths

    Nobuo Koizumi, Shoji Makino, Hiroshi Oikawa

    Journal of the Acoustical Society of Japan (E)   10 ( 1 ) 39 - 45  1989年  [査読有り]

     概要を見る

    A new configuration of acoustic echo canceller for multiple microphone teleconferencing systems is proposed. It is designed for use with microphones whose gains switch or vary during teleconferencing according to the talker. This system requires memory for multiple echo paths, which enables the updating of filter coefficients when an echo path is changed due to the switching of the actuated microphone during talker alternation. In comparison to the single echo path model which uses only adaptation, this method maintains echo cancellation during abrupt changes of the echo path when the microphone alternates between talkers. Also in comparison to direct microphone output mixing, this method reduces the stationary residual echo level by the reduction of acoustic coupling. © 1989, Acoustical Society of Japan. All rights reserved.

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • エコーキャンセラの室内音場における適応特性の改善について

    牧野昭二, 小泉宣夫

    電子情報通信学会論文誌   J71-A ( 12 ) 2212-2214  1988年12月  [査読有り]

    CiNii

  • AUDIO TELECONFERENCING SET WITH MULTIPATH ECHO CANCELLER

    H OIKAWA, N KOIZUMI, S MAKINO

    REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES   36 ( 2 ) 217 - 223  1988年03月  [査読有り]

  • 複数反響路エコーキャンセラを用いた音声会議装置

    及川弘, 小泉宣夫, 牧野昭二

    研究実用化報告   37 ( 2 ) 191-197  1988年02月  [査読有り]

    CiNii

  • 周辺に段差を持つ圧電バイモルフ振動板の振動特性

    牧野昭二, 一ノ瀬裕

    日本音響学会誌   43 ( 3 ) 161-166  1987年03月  [査読有り]

    CiNii


書籍等出版物

  • Audio Source Separation

    Makino, Shoji( 担当: 単著)

    Springer International Publishing  2018年03月 ISBN: 9783319730318

  • Underdetermined blind source separation using acoustic arrays

    S. Makino, S. Araki, S. Winter, and H. Sawada( 担当: 単著)

    Wiley  2010年01月

  • Underdetermined blind source separation of convolutive mixtures by hierarchical clustering and L1-norm minimization

    S. Winter, W. Kellermann, H. Sawada, and S. Makino( 担当: その他)

    Springer  2007年09月

  • Frequency-domain blind source separation

    H. Sawada, S. Araki, and S. Makino( 担当: その他)

    Springer  2007年09月

  • K-means based underdetermined blind speech separation

    S. Araki, H. Sawada, and S. Makino( 担当: その他)

    Springer  2007年09月

  • Blind Speech Separation

    S. Makino, Te-Won Lee, H. Sawada( 担当: 編集)

    Springer  2007年09月 ISBN: 9781402064784

     概要を見る

    http://www.amazon.co.jp/Speech-Separation-Signals-Communication-Technology/dp/1402064780

  • Blind source separation of convolutive mixtures of audio signals in frequency domain

    S. Makino, H. Sawada, R. Mukai, and S. Araki( 担当: 単著)

    Springer  2006年05月

  • Speech Enhancement

    J. Benesty, S. Makino, J. Chen( 担当: 編集)

    Springer  2005年05月 ISBN: 354024039X

     概要を見る

    http://www.amazon.co.jp/Speech-Enhancement-Signals-Communication-Technology/dp/354024039X

  • Real-time blind source separation for moving speech signals

    R. Mukai, H. Sawada, S. Araki, and S. Makino( 担当: その他)

    Springer  2005年03月

  • Subband based blind source separation

    S. Araki, S. Makino( 担当: その他)

    Springer  2005年03月

  • Blind source separation of convolutive mixtures of speech

    S. Makino( 担当: 単著)

    Springer  2003年01月

  • IEICE Knowledge Base

    S.Makino( 担当: 分担執筆,  担当範囲: Blind audio source separation based on sparse component analysis)

    IEICE  2012年10月

  • 2011 IEEE REGION 10 CONFERENCE TENCON 2011

    Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji( 担当: 分担執筆,  担当範囲: Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source)

    IEEE  2011年01月 ISBN: 9781457702556

  • 2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS

    Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko( 担当: 分担執筆,  担当範囲: Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation)

    IEEE  2010年01月 ISBN: 9781424453092

  • 通信会議設備

    牧野, 昭二( 担当: 単著)

    フジ・テクノシステム  1999年10月

  • 音響エコーキャンセラのための適応信号処理の研究

    牧野, 昭二( 担当: 単著)

    1993年03月


講演・口頭発表等

  • Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

    Li, Li, Hirokazu, Kameoka, Makino, Shoji

    ICASSP   (Brighton, United Kingdom) 

    発表年月: 2019年05月

  • Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Shogo, Seki, Makino, Shoji

    ICASSP   (Brighton, United Kingdom) 

    発表年月: 2019年05月

  • Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    ICASSP 2019   (Brighton, ENGLAND) 

    発表年月: 2019年05月

  • NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

    Maruyama, Takuro, Araki, Shoko, Nakatani, Tomohiro, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji, Nakamura, Atsushi

    IEEE International Conference on Acoustics, Speech and Signal Processing   (Kyoto, JAPAN) 

    発表年月: 2012年03月

  • Audio source separation based on independent component analysis

    S. Makino, H. Sawada  [招待有り]

    Tutorial at the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing  

    発表年月: 2007年04月

  • Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

    Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

    APSIPA ASC 2020  

    発表年月: 2020年12月

  • Applying virtual microphones to triangular microphone array in in-car communication

    Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

    APSIPA ASC 2020  

    発表年月: 2020年12月

  • 空間フィルタの自動推定による音響シーン識別の検討

    大野, 泰己, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会  

    発表年月: 2020年03月

  • Generative Adversarial Networks を用いた半教師あり学習の音響イベント検出への適用

    合馬, 一弥, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会  

    発表年月: 2020年03月

  • 発話の時間変動に着目した音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2020年03月

  • 空間特徴と音響特徴を併用する音響イベント検出の検討

    陳, 軼夫, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2020年03月

  • 車室内コミュニケーション用低遅延音源分離の検討

    上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

    日本音響学会春季研究発表会  

    発表年月: 2020年03月

  • DNNマスク推定に基づく畳み込みビームフォーマによる音源分離・残響除去・雑音除去の同時実現

    髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

    日本音響学会2020年春季研究発表会  

    発表年月: 2020年03月

  • 基底共有型半教師あり独立低ランク行列分析に基づく多チャネル補聴器システム

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    日本音響学会2020年春季研究発表会  

    発表年月: 2020年03月

  • Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

    Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

    NCSP'20  

    発表年月: 2020年02月

  • Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

    Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

    NCSP'20  

    発表年月: 2020年02月

  • Blind source separation with low-latency for in-car communication

    Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

    NCSP'20  

    発表年月: 2020年02月

  • 多チャンネル変分自己符号化器法による任意話者の音源分離

    李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

    電子情報通信学会  

    発表年月: 2019年12月

  • Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

    Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura, Daichi, Saruwatari, Hiroshi, Makino, Shoji

    APSIPA   (Lanzhou, China) 

    発表年月: 2019年11月

  • Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    APSIPA ASC 2019   (Lanzhou, PEOPLES R CHINA) 

    発表年月: 2019年11月

  • Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

    Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

    ISMIR   (Delft, The Netherlands) 

    発表年月: 2019年11月

  • Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Makino, Shoji

    ICA   (AACHEN, GERMANY) 

    発表年月: 2019年09月

  • Gated convolutional neural network-based voice activity detection under high-level noise environments

    Li, Li, Kouei, Yamaoka, Yuki, Koshino, Mitsuo, Matsumoto, Makino, Shoji

    ICA   (AACHEN, GERMANY) 

    発表年月: 2019年09月

  • BLSTMと変調スペクトルを用いた発話特徴識別の検討

    サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2019年09月

  • BLSTMを用いた音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2019年09月

  • Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

    EUSIPCO 2019  

    発表年月: 2019年09月

  • CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

    Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    EUSIPCO 2019  

    発表年月: 2019年09月

  • ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    日本音響学会2019年秋季研究発表会  

    発表年月: 2019年09月

  • 日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

    臼井, 桃香, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会  

    発表年月: 2019年03月

  • MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    電子情報通信学会音声研究会  

    発表年月: 2019年03月

  • 時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

    髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

    日本音響学会2019年春季研究発表会  

    発表年月: 2019年03月

  • Experimental evaluation of WaveRNN predictor for audio lossless coding

    Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

    NCSP'19  

    発表年月: 2019年03月

  • Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    NCSP'19  

    発表年月: 2019年03月

  • Categorizing error causes related to utterance characteristics in speech recognition

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    NCSP'19  

    発表年月: 2019年03月

  • Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    NCSP'19  

    発表年月: 2019年03月

  • 音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

    李, 莉, 亀岡, 弘和, 牧野, 昭二

    日本音響学会2019年春季研究発表会  

    発表年月: 2019年03月

  • Gated CNNを用いた劣悪な雑音環境下における音声区間検出

    牧野, 昭二, 李莉, 越野ゆき, 松本光雄

    電子情報通信学会  

    発表年月: 2019年03月

  • 多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

    井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

    日本音響学会2019年春季研究発表会  

    発表年月: 2019年03月

  • Microphone position realignment by extrapolation of virtual microphone

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

    10th Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)   (Honolulu, HI) 

    発表年月: 2018年11月

  • Weakly labeled learning using BLSTM-CTC for sound event detection

    Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

    APSIPA ASC 2018  

    発表年月: 2018年11月

  • 時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

    Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

    EUSIPCO 2018   (Rome, ITALY) 

    発表年月: 2018年09月

  • Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

    Matsui, Yutaro, Nakatani, Tomohiro, Delcroix, Marc, Kinoshita, Keisuke, Ito, Nobutaka, Araki, Shoko, Makino, Shoji

    IWAENC2018  

    発表年月: 2018年09月

  • WaveRNNを利用した音声ロスレス符号化に関する検討と考察

    天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

    陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • 複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

    松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

    電子情報通信学会信号処理研究会  

    発表年月: 2018年03月

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • 複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • 音声認識における誤認識原因通知のための印象評定値推定の検討

    後藤, 孝宏, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • 畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

    高橋, 玄, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

    Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

    NCSP'18  

    発表年月: 2018年03月

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    NCSP'18  

    発表年月: 2018年03月

  • Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

    Mae, Narumi, Yamaoka, koei, Mitsui, Yosiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    NCSP'18  

    発表年月: 2018年03月

  • Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

    NCSP'18  

    発表年月: 2018年03月

  • Abnormal sound detection by two microphones using virtual microphone technique

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    APSIPA 2017   (Kuala Lumpur, MALAYSIA) 

    発表年月: 2017年12月

  • Sound source localization using binaural difference for hose-shaped rescue robot

    Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    APSIPA 2017   (Kuala Lumpur, MALAYSIA) 

    発表年月: 2017年12月

  • Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic Features

    Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

    APSIPA 2017   (Kuala Lumpur, MALAYSIA) 

    発表年月: 2017年12月

  • Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

    Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

    IEEE GCCE 2017   (Nagoya, JAPAN) 

    発表年月: 2017年10月

  • Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    MLSP   (Tokyo, Japan) 

    発表年月: 2017年09月

  • Multiple far noise suppression in a real environment using transfer-function-gain NMF

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    EUSIPCO 2017   (GREECE) 

    発表年月: 2017年08月

  • Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

    EUSIPCO 2017   (GREECE) 

    発表年月: 2017年08月

  • Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

    Li, Li, Kameoka, Hirokazu, Toda, Tomoki, Makino, Shoji

    Interspeech   (Stockholm, Sweden) 

    発表年月: 2017年08月

  • Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

    Kodama, Takumi, Makino, Shoji

    IEEE Engineering in Medicine & Biology Society (EMBC)   (Jeju Island, Korea) 

    発表年月: 2017年07月

  • 柔軟索状ロボットにおける独立低ランク行列分析と統計的音声強調に基づく高品質ブラインド音源分離の開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, 牧野昭二

    日本機械学会ロボティクス・メカトロニクス講演会  

    発表年月: 2017年05月

  • SJ-CATにおける項目応答理論に基づく能力値推定の精度改善

    小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • 音響ロスレス符号化MPEG-4 ALSのハイレゾ音源適応の検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • DNN-GMMと連結特徴量を用いた音響シーン識別の検討

    高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • Discriminative non-negative matrix factorization with majorization-minimization

    Li, L, Kameoka, H, Makino, Shoji

    HSCMA   (San Francisco, CA) 

    発表年月: 2017年03月

  • 補助関数法による識別的NMFの基底学習アルゴリズム

    李莉, 亀岡弘和, 牧野昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • 独立低ランク行列分析と統計的音声強調を用いた柔軟索状ロボットにおけるブラインド音源分離システムの開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, 牧野昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation

    Mae, N, Ishimura, M, Makino, Shoji, Kitamura, D, Ono, N, Yamada, T, Saruwatari, H

    13th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)   (Grenoble Alpes Univ, Grenoble, FRANCE) 

    発表年月: 2017年02月

  • Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

    Kodama, T, Makino, Shoji

    Biomedical and Health Informatics (BHI)  

    発表年月: 2017年02月

  • Performance estimation of spontaneous speech recognition using non-reference acoustic features

    Ling,Guo, Takeshi,Yamada, Shoji,Makino

    APSIPA2016   (Jeju, SOUTH KOREA) 

    発表年月: 2016年12月

  • Full-body tactile P300-based brain-computer interface accuracy refinement

    Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

    International Conference on Bio-engineering for Smart Technologies (BioSMART)  

    発表年月: 2016年12月

  • Tactile brain-computer interface using classification of P300 responses evoked by full body spatial vibrotactile stimuli

    Kodama, T, Makino, Shoji, Rutkowski, T

    APSIPA  

    発表年月: 2016年12月

  • Visual motion onset augmented reality brain-computer interface

    Shimizu, K, Kodama, T, Makino, Shoji, Rutkowski, T

    International Conference on Bio-engineering for Smart Technologies (BioSMART)  

    発表年月: 2016年12月

  • 伝達関数ゲイン基底NMFを用いた遠方雑音抑圧の実環境での評価

    松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

    第31回信号処理シンポジウム  

    発表年月: 2016年11月

  • 雑音下音声認識における必要発話音量提示機能の実装と評価

    後藤,孝宏, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • 日本語スピーキングテストSJ-CATにおける項目応答理論に基づく能力値推定の検証

    小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • ノンリファレンス特徴量を用いた自然発話音声認識の性能推定の検討

    郭,レイ, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • ヴァーチャル多素子化に基づくSN比最大化ビームフォーマの残響に対する性能変化

    山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • Ego-noise reduction for a hose-shaped rescue robot using determined Rank-1 multichannel nonnegative matrix factorization

    Moe,Takakusaki, Daichi,Kitamura, Nobutaka,Ono, Takeshi,Yamada, Shoji,Makino, Hiroshi,Saruwatari

    IWAENC2016  

    発表年月: 2016年09月

  • Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot

    Masaru,Ishimura, Shoji,Makino, Takeshi,Yamada, Nobutaka,Ono, Hiroshi,Saruwatari

    IWAENC2016   (Xian, PEOPLES R CHINA) 

    発表年月: 2016年09月

  • Multi-talker speech recognition based on blind source separation with ad hoc microphone array using smartphones and cloud storage

    Ochi, K, Ono, N, Miyabe, S, Makino, Shoji

    Interspeech   (San Francisco, CA) 

    発表年月: 2016年09月

  • Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

    Gen, Takahashi, Takeshi, Yamada, Shoji, Makino, Nobutaka, Ono

    Detection and Classification of Acoustic Scenes and Events  

    発表年月: 2016年09月

  • Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

    Saruwatari, H, Takata, K, Ono, N, Makino, Shoji  [招待有り]

    International Congress on Acoustics  

    発表年月: 2016年09月

  • Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

    Kodama, T, Makino, Shoji, Rutkowski, T

    AEARU Young Researchers International Conference  

    発表年月: 2016年09月

  • 音声のスペクトル領域とケプストラム領域における同時強調

    李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

    信学技報 EA2014-75  

    発表年月: 2016年08月

  • 独立ベクトル分析とノイズキャンセラを用いた雑音抑圧の柔軟索状ロボットへの適用

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    日本機械学会ロボティクス・メカトロニクス講演会2016  

    発表年月: 2016年06月

  • Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

    Takuya,Toyoda, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    NCSP'16  

    発表年月: 2016年03月

  • ランク1空間モデル制約付き多チャネルNMFを用いた雑音抑圧の柔軟索状ロボットへの適用

    高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

    電子情報通信学会総合大会  

    発表年月: 2016年03月

  • 振幅のみからの相関推定と雑音尖度に基づく空間サブトラクションアレーの減算係数最適化

    李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会2016年春季研究発表会  

    発表年月: 2016年03月

  • 独立ベクトル分析とノイズキャンセラを用いた柔軟索状ロボットにおける雑音抑圧

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    電子情報通信学会総合大会  

    発表年月: 2016年03月

  • 教師あり多チャネルNMFと統計的音声強調を用いた柔軟索状ロボットにおける音源分離

    高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

    日本音響学会2016年春季研究発表会  

    発表年月: 2016年03月

  • ランク1 空間モデル制約付き多チャネルNMFを用いた柔軟索状ロボットにおける雑音抑圧

    高草木萌, 北村大地, 小野順貴, 山田武志, 牧野昭二, 猿渡洋

    日本機械学会ロボティクス・メカトロニクス講演会  

    発表年月: 2016年03月

  • 非同期分散マイクロホンによるブラインド音源分離を用いた複数話者同時音声認識

    越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

    日本音響学会2016年春季研究発表会  

    発表年月: 2016年03月

  • SVM classification study of code-modulated visual evoked potentials

    D.,Aminaka, S.,Makino, T.M.,Rutkowski

    APSIPA   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

    Yoshikazu,Murase, Hironobu,Chiba, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    APSIPA2015   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • Fingertip stimulus cue-based tactile brain-computer interface

    H.,Yajima, S.,Makino, T.M.,Rutkowski

    APSIPA   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • Variable sound elevation features for head-related impulse response spatial auditory BCI

    C.,Nakaizumi, S.,Makino, T.M.,Rutkowski

    APSIPA   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

    D.,Aminaka, S.,Makino, T.M.,Rutkowski

    International Symbiotic Workshop (SYMBIOTIC)  

    発表年月: 2015年10月

  • 日本語スピーキングテストSJ-CATにおける低スコア解答発話の検出の検討

    小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

    日本音響学会2015年秋季研究発表会  

    発表年月: 2015年09月

  • ノンリファレンスひずみ特徴量を用いた雑音下音声認識性能推定の検討

    郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会2015年秋季研究発表会  

    発表年月: 2015年09月

  • Classification Accuracy Improvement of Chromatic and High-Frequency Code-Modulated Visual Evoked Potential-Based BCI

    Aminaka,Daiki, Makino,Shoji, Rutkowski, Tomasz M

    8th International Conference on Brain Informatics and Health (BIH)   (Royal Geog Soc, London, ENGLAND) 

    発表年月: 2015年08月

  • Estimating correlation coefficient between two complex signals without phase observation

    S.,Miyabe, N.,Ono, Makino,Shoji

    LVA/ICA  

    発表年月: 2015年08月

  • Chromatic and high-frequency cVEP-based BCI paradigm

    Aminaka,Daiki, Makino,Shoji, Rutkowski, Tomasz M

    Engineering in Medicine and Biology Conference (EMBC)  

    発表年月: 2015年08月

  • Head-related impulse response cues for spatial auditory brain-computer interface

    C.,Nakaizumi, S.,Makino, T.M.,Rutkowski

    Engineering in Medicine and Biology Conference (EMBC)  

    発表年月: 2015年08月

  • マイクロホンアレーの位相が観測できない条件でのチャネル間の相関係数の推定

    宮部滋樹, 小野順貴, 牧野,昭二

    回路とシステムワークショップ  

    発表年月: 2015年08月

  • Inter-stimulus interval study for the tactile point-pressure brain-computer interface

    K.,Shimizu, Makino,Shoji, T.M.,Rutkowski

    Engineering in Medicine and Biology Conference (EMBC)  

    発表年月: 2015年08月

  • ステレオ録音に基づく移動音源モデルによる走行車両検出と走行方向推定

    遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • 総合品質と明瞭性の客観推定に基づくスペクトルサブトラクションの減算係数の最適化

    中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • 非同期分散マイクロフォンアレーによる伝達関数ゲイン基底NMFを用いた拡散雑音抑圧

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • ケプストラム距離とSMR-パープレキシティを用いた雑音下音声認識の性能推定の検討

    郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • 2つの超ガウス性複素信号の位相観測を用いない相関係数推定

    宮部滋樹, 小野順貴, 牧野, 昭二

    信学技報EA2014-75  

    発表年月: 2015年03月

  • Spatial auditory BCI spellers using real and virtual surround sound systems

    M.,Chang, C.,Nakaizumi, K.,Mori, Makino,Shoji, T.M.,Rutkowski

    Conference on Systems Neuroscience and Rehabilitation (SNR2015)  

    発表年月: 2015年03月

  • 認識性能予測に基づく雑音環境下音声認識のユーザビリティ改善の検討

    青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu,Murase, Hironobu,Chiba, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    APSIPA 2014  

    発表年月: 2014年12月

  • 絶対値の観測のみを用いた2つの複素信号の相関係数推定

    宮部滋樹, 小野順貴, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • ケプストラム距離を用いた雑音下音声認識の性能推定の検討

    郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • 伝達関数ゲイン基底NMFにおけるマイク数・マイク配置と目的音強調性能の関係

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • βダイバージェンスに基づく一般化振幅補間によるヴァーチャル多素子化を用いた目的音源強調

    片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • 分散型マイクロホンアレイを用いた交通車両検出とその車線推定の検討

    豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • AMPLITUDE-BASED SPEECH ENHANCEMENT WITH NONNEGATIVE MATRIX FACTORIZATION FOR ASYNCHRONOUS DISTRIBUTED RECORDING

    Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Takahashi, Yu, Yamada, Takeshi, Makino, Shoji

    14th International Workshop on Acoustic Signal Enhancement (IWAENC)   (Antibes, FRANCE) 

    発表年月: 2014年09月

  • Multi-stage declipping of clipping distortion based on length classification of clipped interval

    Chenlei,Li, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • 教師なし伝達関数ゲイン基底NMFによる目的音強調における罰則項の特性評価

    千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • M2Mを用いた大規模データ収集システムの構築に関する研究

    牧野,昭二

    情報処理学会研究報告 計算機アーキテクチャ研究会(ARC)  

    発表年月: 2013年12月

  • VIRTUALLY INCREASING MICROPHONE ARRAY ELEMENTS BY INTERPOLATION IN COMPLEX-LOGARITHMIC DOMAIN

    Katahira, Hiroki, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    21st European Signal Processing Conference (EUSIPCO)   (Marrakesh, MOROCCO) 

    発表年月: 2013年09月

  • 非同期録音ブラインド同期のための線形位相補償の効率的最尤解探索

    宮部滋樹, 小野順貴, 牧野昭二  [招待有り]

    音講論集___2-10-4_  

    発表年月: 2013年03月

  • 複素対数補間によるヴァーチャル観測に基づく劣決定条件での音声強調

    片平拓希, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二  [招待有り]

    音講論集___2-10-6_  

    発表年月: 2013年03月

  • 日本語スピーキングテストSCATにおける文読み上げ・文生成問題の自動採点手法の改良

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___1-Q-52a_465-468  

    発表年月: 2013年03月

  • 楽音符号化品質に影響を及ぼす楽音信号の特徴量の検討

    松浦嶺, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-Q-11c_401-404  

    発表年月: 2013年03月

  • ACELPにおけるピッチシャープニングの特性評価

    千葉大将, 守谷健弘, 鎌本優, 原田登, 宮部滋樹, 山田武志, 牧野昭二  [招待有り]

    音講論集___1-7-18_  

    発表年月: 2013年03月

  • 身体機能の統合による音楽情動コミュニケーションモデル

    寺澤洋子, 星-芝, 玲子, 柴山拓郎, 大村英史, 古川聖, 牧野, 昭二, 岡ノ谷一夫

    認知科学  

    発表年月: 2013年

  • AUTOMATIC SCORING METHOD CONSIDERING QUALITY AND CONTENT OF SPEECH FOR SCAT JAPANESE SPEAKING TEST

    Okubo, Naoko, Yamahata, Yuto, Yamada, Takeshi, Imai, Shingo, Ishizuka, Kenkichi, Shinozaki, Takahiro, Nisimura, Ryuichi, Makino, Shoji, Kitawaki, Nobuhiko

    International Conference on Speech Database and Assessments (Oriental COCOSDA)   (11 Macau, PEOPLES R CHINA) 

    発表年月: 2012年12月

  • 日本語スピーキングテストにおける文生成問題の自動採点の検討

    大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___3-Q-16_395-396  

    発表年月: 2012年09月

  • ミュージカルノイズを考慮した雑音抑圧音声のFR型客観品質評価の検討

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦

    音講論集___3-P-5_127-130  

    発表年月: 2012年09月

  • 身体動作の連動性理解にむけた筋活動可聴化

    松原正樹, 寺澤洋子, 門根秀樹, 鈴木健嗣, 牧野昭二  [招待有り]

    音講論集___2-10-2_  

    発表年月: 2012年09月

  • 非同期録音信号の線形位相補償によるブラインド同期と音源分離への応用

    宮部滋樹, 小野順貴, 牧野昭二  [招待有り]

    音講論集___3-9-8_  

    発表年月: 2012年09月

  • 日本語スピーキングテストにおける文章読み上げ問題の自動採点の検討

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___3-Q-18_399-400  

    発表年月: 2012年09月

  • コヒーレンス解析による定常状態誘発反応の可聴化

    加庭輝明, 寺澤洋子, 松原正樹, T.M.,Rutkowski, 牧野昭二

    音講論集___2-10-2_919-922  

    発表年月: 2012年09月

  • 多チャンネルウィーナーフィルタを用いた音源分離における観測モデルの調査

    坂梨龍太郎, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___1-P-14,_757-760  

    発表年月: 2012年09月

  • 混合DOA モデルに基づく多チャンネル複素NMF による劣決定BSS

    武田和馬, 亀岡弘和, 澤田宏, 荒木章子, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___2-1-9_747-750  

    発表年月: 2012年03月

  • 日本語スピーキングテストにおける文生成問題の採点に影響を及ぼす要因の検討

    大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    信学総大___D-14-9_193  

    発表年月: 2012年03月

  • 日本語スピーキングテストにおける文章読み上げ問題の採点に影響を及ぼす要因の検討

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    信学総大___D-14-8_192  

    発表年月: 2012年03月

  • 雑音抑圧音声の主観品質評価におけるミュージカルノイズの影響

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦  [招待有り]

    信学総大___D-14-1_185  

    発表年月: 2012年03月

  • 音響モデルの精度を考慮した雑音下音声認識の性能推定の検討

    高岡隆守, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-P-13_149-150  

    発表年月: 2012年03月

  • 短時間雑音特性に基づく雑音下音声認識の性能推定の検討

    森下恵里, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-P-14_151-152  

    発表年月: 2012年03月

  • フルランク空間相関行列モデルに基づく拡散性雑音除去

    礒佳樹, 荒木章子, 牧野昭二, 中谷智広, 澤田宏, 山田武志, 宮部滋樹, 中村篤

    信学総大___A-10-9_194  

    発表年月: 2012年03月

  • 音量差に基づく音像生成における個人適応手法の有効性検証

    天野成祥, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-Q-1_895-898  

    発表年月: 2012年03月

  • 高次相関を用いた非線形MUSIC による高分解能方位推定

    杉本侑哉, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___3-1-6_763-766  

    発表年月: 2012年03月

  • 時間周波数領域におけるグリッド間の整合性に基づくクリッピングの除去

    三浦晋, 宮部滋樹, 山田武志, 牧野昭二, 中島弘史, 中臺一博

    音講論集___1-Q-10_843-846  

    発表年月: 2012年03月

  • Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

    Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    IEEE Region 10 Conference on TENCON   (INDONESIA) 

    発表年月: 2011年11月

  • Restoration of Clipped Audio Signal Using Recursive Vector Projection

    Miura, Shin, Nakajima, Hirofumi, Miyabe, Shigeki, Makino, Shoji, Yamada, Takeshi, Nakadai, Kazuhiro

    IEEE Region 10 Conference on TENCON   (INDONESIA) 

    発表年月: 2011年11月

  • 周波数依存の時間差モデルによる劣決定BSS

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野昭二, 中村篤

    信学技報___EA2011-86_25-30  

    発表年月: 2011年11月

  • 発話の連続性に基づいた音声信号の分類による会議音声の可視化

    加藤通朗, 杉本侑哉, 宮部滋樹, 牧野昭二, 山田武志, 北脇信彦

    音講論集___3-P-20_197-200  

    発表年月: 2011年09月

  • 雑音抑圧音声の総合品質推定モデルの改良とその客観品質評価への適用

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-Q-23_127-130  

    発表年月: 2011年09月

  • スピーカ間の音量差に基づく音像生成手法における個人適応の検討

    天野成祥, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-4-10_661-664  

    発表年月: 2011年09月

  • 楽音と音声の双方に適用できる客観品質評価法の検討

    三上, 雄一郎, 山田, 武志, 牧野, 昭二, 北脇, 信彦

    信学総大___B-11-19_448  

    発表年月: 2011年03月

  • 雑音抑圧音声の客観品質評価に用いる総合品質推定モデルの改良

    藤田, 悠希, 山田, 武志, 牧野, 昭二

    信学総大___B-11-18_447  

    発表年月: 2011年03月

  • スペクトル変形同定の聴覚トレーニングにおける適応的フィードバックの影響

    加庭, 輝明, 金, 成英, 寺澤, 洋子, 伊藤, 寿浩, 池田, 雅弘, 山田, 武志, 牧野, 昭二

    音講論集___2-1-1_1003-1006  

    発表年月: 2011年03月

  • クリッピングした音響信号の修復

    三浦, 晋, 中島, 弘史, 牧野, 昭二, 山田, 武志, 中臺, 一博

    音講論集___3-P-53(d)_941-944  

    発表年月: 2011年03月

  • 空間スペクトルを用いた時間断続信号の検出における主成分分析と周波数分析の比較評価

    加藤, 通朗, 杉本, 侑哉, 牧野, 昭二, 山田, 武志, 北脇, 信彦

    音講論集___3-P-8(d)_879-880  

    発表年月: 2011年03月

  • 空間スペクトルへの周波数分析の適用による時間断続信号の検出

    杉本, 侑哉, 加藤, 通朗, 牧野, 昭二, 山田, 武志

    音講論集___3-P-7(c)_877-878  

    発表年月: 2011年03月

  • 高残響下で混合された音声の音源分離に関する研究

    礒, 佳樹, 荒木, 章子, 牧野, 昭二, 中谷, 智広, 澤田, 宏, 山田, 武志, 中村, 篤

    音講論集___1-9-13_643-646  

    発表年月: 2011年03月

  • 音源のW-DO性を仮定した多チャンネル複素NMFによる劣決定BSS

    武田, 和馬, 亀岡, 弘和, 澤田, 宏, 荒木, 章子, 山田, 武志, 牧野, 昭二

    音講論集___1-Q-19(f)_801-804  

    発表年月: 2011年03月

  • 視覚障がい者のタッチパネル操作支援のための音像生成手法の検討

    天野, 成祥, 山田, 武志, 牧野, 昭二

    音講論集___3-P-7(c)_877-878  

    発表年月: 2011年03月

  • 雑音抑圧された音声の主観・客観品質評価法

    山田, 武志, 牧野, 昭二, 北脇, 信彦

    情報処理学会研究報告 音声言語情報処理(SLP)___2010-SLP-83 (7)_1-6  

    発表年月: 2010年10月

  • 雑音抑圧音声のMOSと単語了解度の客観推定

    山田, 武志, 北脇, 信彦, 牧野, 昭二

    信学ソ大___BS-5-4_S-19  

    発表年月: 2010年09月

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出

    加藤, 通朗, 杉本, 侑哉, 牧野, 昭二, 山田, 武志, 北脇, 信彦

    信学技報___EA2010-47_25-30  

    発表年月: 2010年08月

  • Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation

    Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko

    International Symposium on Circuits and Systems Nano-Bio Circuit Fabrics and Systems (ISCAS 2010)   (Paris, FRANCE) 

    発表年月: 2010年05月

  • 調波構造とHMM合成に基づく混合楽器音認識の検討

    山本裕貴, 山田武志, 北脇信彦, 牧野昭二

    音講論集___3-8-4_1003-1004  

    発表年月: 2010年03月

  • 雑音抑圧音声の総合品質推定モデルを適用したフルリファレンス客観品質評価法

    篠原佑基, 山田武志, 北脇信彦, 牧野昭二

    信学総大___B-11-2_436  

    発表年月: 2010年03月

  • 劣決定音源分離のための分離信号のケプストラムスムージング

    安齊祐美, 荒木章子, 牧野昭二, 中谷智広, 山田武志, 中村篤, 北脇信彦

    音講論集___2-P-25_847-850  

    発表年月: 2010年03月

  • 日本語学習支援のためのアクセント認識の検討

    ショートグレッグ, 山田武志, 北脇信彦, 牧野昭二

    音講論集___1-P-17_447-448  

    発表年月: 2010年03月

  • 雑音下音声認識の性能推定法の実環境における評価

    中島智弘, 山田武志, 北脇信彦, 牧野昭二

    音講論集___2-Q-4_241-244  

    発表年月: 2010年03月

  • IP網における音声の客観品質評価に用いる擬似音声信号の検討

    青島千佳, 北脇信彦, 山田武志, 牧野昭二  [招待有り]

    信学総大___B-11-1_435  

    発表年月: 2010年03月

  • IP網における客観品質評価に用いる擬似音声信号の検討

    青島千佳, 北脇信彦, 山田武志, 牧野昭二  [招待有り]

    QoSワークショップ___QW7-P-16_  

    発表年月: 2009年11月

  • 楽音と音声の双方に適用できるオーディオ信号の客観品質推定法の検討

    三上雄一郎, 北脇信彦, 山田武志, 牧野昭二

    QoSワークショップ___QW-7-P-15_  

    発表年月: 2009年11月

  • 雑音抑圧音声の総合品質推定モデルを用いたフルリファレンス客観品質評価法の検討

    篠原佑基, 山田武志, 北脇信彦, 牧野昭二

    QoSワークショップ___QW7-P-13_  

    発表年月: 2009年11月

  • 音声区間推定と時間周波数領域方向推定の統合による会議音声話者識別

    荒木, 章子, 藤本, 雅清, 石塚, 健太郎, 中谷, 智広, 澤田, 宏, 牧野, 昭二

    信学技報___EA2008-40_19-24  

    発表年月: 2008年07月

  • [フェロー記念講演]独立成分分析に基づくブラインド音源分離

    牧野, 昭二

    信学技報___EA2008-17_65-73  

    発表年月: 2008年05月

  • 周波数領域ICAにおける初期値の短時間データからの学習

    荒木, 章子, 伊藤, 信貴, 澤田, 宏, 小野, 順貴, 牧野, 昭二, 嵯峨山, 茂樹

    信学総大___A-10-6_208  

    発表年月: 2008年03月

  • 音声区間検出と方向情報を用いた会議音声話者識別システムとその評価

    荒木, 章子, 藤本, 雅清, 石塚, 健太郎, 澤田, 宏, 牧野, 昭二

    音講論集___1-10-1_1-4  

    発表年月: 2008年03月

  • 音声のスパース性を用いたUnderdetermined音源分離

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    信学総大___AS-4-5_S-46 - S-47  

    発表年月: 2008年03月

  • A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

    H., Sawada, S., Araki, and, S. Makino

    ICA2007, Stereo Audio Source Separation Evaluation Campaign____  

    発表年月: 2007年09月

  • Blind source separation based on time-frequency masking and maximum SNR beamformer array

    S., Araki, H., Sawada, and, S. Makino

    ICA2007, Stereo Audio Source Separation Evaluation Campaign____  

    発表年月: 2007年09月

  • Blind audio source separation based on independent component analysis

    S. Makino  [招待有り]

    Keynote Talk at the 2007 International Conference on Independent Component Analysis and Signal Separation  

    発表年月: 2007年09月

  • 話者分類とSN比最大化ビームフォーマに基づく会議音声強調

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___2-1-13_571-572  

    発表年月: 2007年03月

  • 事前学習を用いる周波数領域Pearson-ICAの高速化

    加藤, 比呂子, 永原, 裕一, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___1-5-22_549-550  

    発表年月: 2006年03月

  • 観測信号ベクトルのクラスタリングに基づくスパース信号の到来方向推定

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    音講論集___3-5-6_615-616  

    発表年月: 2006年03月

  • 独立成分分析に基づくブラインド音源分離

    牧野, 昭二, 荒木, 章子, 向井, 良, 澤田, 宏

    計測自動制御学会 中国支部 学術講演会____2-9  

    発表年月: 2005年11月

  • 多音源に対する周波数領域ブラインド音源分離

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    AIチャレンジ研究会___SIG-Challenge-0522-3_17-22  

    発表年月: 2005年10月

  • パラメトリックピアソン分布を用いた周波数領域ブラインド音源分離

    加藤, 比呂子, 永原, 裕一, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___2-2-4_593-594  

    発表年月: 2005年09月

  • 観測信号ベクトル正規化とクラスタリングによる音源分離手法とその評価

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    音講論集___2-2-3_591-592  

    発表年月: 2005年09月

  • 3次元マイクロホンアレイを用いた多音源ブラインド分離

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    信学ソ大___A-10-8_209  

    発表年月: 2005年09月

  • 多くの背景音からの主要音源のブラインド抽出

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    信学ソ大___A-10-9_210  

    発表年月: 2005年09月

  • 観測ベクトルのクラスタリングによるブラインド音源分離

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    信学ソ大___A-10-7_208  

    発表年月: 2005年09月

  • 独立成分分析を用いた音源数推定法

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___3-Q-20_753-754  

    発表年月: 2004年09月

  • A solution for the permutation problem in frequency domain BSS using near- and far-field models

    R., Mukai, H., Sawada, S., Araki, and, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-3_  

    発表年月: 2004年04月

  • Underdetermined blind source separation for convolutive mixtures of sparse signals

    S., Winter, H., Sawada, S., Araki, and, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-2_  

    発表年月: 2004年04月

  • Blind separation of more speech than sensors using time-frequency masks and ICA

    S., Araki, S., Makino, H., Sawada, and, R. Mukai

    CSA2004 (NTT Workshop on Communication Scene Analysis)___AU-4_  

    発表年月: 2004年04月

  • Blind source separation for convolutive mixtures in the frequency domain

    H., Sawada, R., Mukai, S., Araki, and, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-1_  

    発表年月: 2004年04月

  • 狭間隔・広間隔の複数マイクロホン対を用いた周波数領域ブラインド音源分離

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___3-P-16_627-628  

    発表年月: 2004年03月

  • 独立成分分析に基づくブラインド音源分離

    牧野, 昭二, 荒木, 章子, 向井, 良, 澤田, 宏

    ディジタル信号処理シンポジウム___A3-2_1-10  

    発表年月: 2003年11月

  • Blind Separation of More Speech Signals than Sensors using Time-frequency Masking and Mixing Matrix Estimation

    荒木, 章子, Audrey, Blin, 牧野, 昭二

    音講論集___1-P-4_585-586  

    発表年月: 2003年09月

  • 周波数領域BSSにおける近距離場モデルを用いたパーミュテーションの解法

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___1-P-6_589-590  

    発表年月: 2003年09月

  • 実環境における3音源以上のブラインド分離

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-19_547-548  

    発表年月: 2003年09月

  • 時間周波数マスキングとICAの併用による音源数 > マイク数の場合のブラインド音源分離

    荒木, 章子, 向井, 良, 澤田, 宏, 牧野, 昭二

    音講論集___1-P-5_587-588  

    発表年月: 2003年09月

  • 独立成分分析に基づくブラインド音源分離

    牧野, 昭二, 荒木, 章子, 向井, 良, 澤田, 宏

    信学技報___EA2003-45_17-24  

    発表年月: 2003年06月

  • ICA-based audio source separation

    S., Makino, S., Araki, R., Mukai, and, H. Sawada

    International Workshop on Microphone Array Systems - Theory and Practice____  

    発表年月: 2003年05月

  • 周波数領域ブラインド音源分離におけるpermutation問題の頑健な解法

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___3-P-25_777-778  

    発表年月: 2003年03月

  • 移動音源の低遅延実時間ブラインド分離

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___3-P-26_779-780  

    発表年月: 2003年03月

  • 帯域に適した分離手法を用いるサブバンド領域ブラインド音源分離

    荒木, 章子, 牧野, 昭二, Robert, Aichner, 西川, 剛樹, 猿渡, 洋

    音講論集___3-P-27_781-782  

    発表年月: 2003年03月

  • KL情報量最小化に基づく時間領域ICAと非定常信号の同時無相関化に基づく時間領域ICAの比較

    西川, 剛樹, 高谷, 智哉, 猿渡, 洋, 鹿野, 清宏, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-14_545-546  

    発表年月: 2002年09月

  • 死角型ビームフォーマを初期値に用いる時間領域ブラインド音源分離

    荒木, 章子, 牧野, 昭二, Robert, Aichner, 西川, 剛樹, 猿渡, 洋

    音講論集___2-5-13_543-544  

    発表年月: 2002年09月

  • ブラインド音源分離後の残留スペクトルの推定と除去

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-11_539-540  

    発表年月: 2002年09月

  • 周波数領域ブラインド音源分離におけるpermutation問題の解法

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-12_541-542  

    発表年月: 2002年09月

  • 周波数領域ICAと時間遅れスペクトル減算による残響下での実時間ブラインド音源分離

    向井, 良, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___1-Q-19_673-674  

    発表年月: 2002年03月

  • サブバンド処理によるブラインド音源分離に関する検討

    荒木, 章子, 牧野, 昭二, Robert, Aichner, 西川, 剛樹, 猿渡, 洋

    音講論集___3-4-9_619-620  

    発表年月: 2002年03月

  • 間隔の異なる複数のマイクペアによるブラインド音源分離

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    音講論集___3-4-10_621-622  

    発表年月: 2002年03月

  • ICA-based sound separation

    S., Makino, S., Araki, R., Mukai, H., Sawada, R., Aichner, H., Saruwatari, T., Nishikawa, and, Y. Hinamoto

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • Time domain blind source separation of non-stationary convolved signals with utilization of geometric beamforming

    R., Aichner, S., Araki, S., Makino, H., Sawada, T., Nishikawa, and, H. Saruwatari

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • Separation and dereverberation performance of frequency domain blind source separation

    R., Mukai, S., Araki, and, S. Makino

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

    S., Araki, S., Makino, R., Mukai, and, H. Saruwatari

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • A polar-coordinate based activation function for frequency domain blind source separation

    H., Sawada, R., Mukai, S., Araki, and, S. Makino

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • 周波数領域ブラインド音源分離と適応ビ-ムフォ-マの等価性について

    雛元, 洋一, 西川, 剛樹, 猿渡, 洋, 荒木, 章子, 牧野, 昭二, 向井, 良

    信学技報___EA2001-84_75-82  

    発表年月: 2001年11月

  • 非定常スペクトルサブトラクションによる音源分離後の残留雑音除去

    向井, 良, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___2-6-14_617-618  

    発表年月: 2001年10月

  • 周波数領域ブラインド音源分離のための極座標表示に基づく活性化関数

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___2-6-13_615-616  

    発表年月: 2001年10月

  • 周波数領域ブラインド音源分離と周波数領域適応ビームフォーマの関係について

    荒木, 章子, 牧野, 昭二, 向井, 良, 猿渡, 洋

    音講論集___2-6-12_613-614  

    発表年月: 2001年10月

  • 時間領域ICAと周波数領域ICAを併用した多段ICAによるブラインド音源分離

    猿渡, 洋, 西川, 剛樹, 荒木, 章子, 牧野, 昭二

    日本神経回路学会全国大会____99-100  

    発表年月: 2001年09月

  • 複素数に対する独立成分分析のための極座標表示に基づく活性化関数

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    日本神経回路学会全国大会____97-98  

    発表年月: 2001年09月

  • 実環境での混合音声に対する周波数領域ブラインド音源分離手法の性能限界

    荒木, 章子, 牧野, 昭二, 西川, 剛樹, 猿渡, 洋

    音講論集___3-7-4_567-568  

    発表年月: 2001年03月

  • 帯域分割型ICAを用いたBlind Source Separationにおける帯域分割数の最適化

    西川, 剛樹, 荒木, 章子, 牧野, 昭二, 猿渡, 洋

    音講論集___3-7-5_569-570  

    発表年月: 2001年03月

  • 実環境におけるブラインド音源分離と残響除去性能に関する検討

    向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___3-7-3_565-566  

    発表年月: 2001年03月

  • 周波数領域Blind Source Separationにおける帯域分割数の最適化

    西川, 剛樹, 荒木, 章子, 牧野, 昭二, 猿渡, 洋

    信学技報___EA2000-95_53-59  

    発表年月: 2001年01月

  • チャネル数変換型多チャネル音響エコーキャンセラ

    中川, 朗, 島内, 末廣, 羽田, 陽一, 青木, 茂明, 牧野, 昭二

    信学総大___A-4-51_140  

    発表年月: 2000年03月

  • ステレオエコーキャンセラにおける相互相関変動方法の検討

    鈴木, 邦和, 杉山, 精, 阪内, 澄宇, 島内, 末廣, 牧野, 昭二

    信学技報___EA99-86_25-32  

    発表年月: 1999年12月

  • 音響系の変動に着目したステレオ信号の相関低減方法

    鈴木, 邦和, 阪内, 澄宇, 島内, 末廣, 牧野, 昭二

    音講論集___1-6-12_453-454  

    発表年月: 1999年03月

  • ハンズフリー音声会議装置における複数マイクロホンの構成の検討

    中川, 朗, 島内, 末廣, 牧野, 昭二

    音講論集___2-6-7_493-494  

    発表年月: 1999年03月

  • 相互相関の変動付加処理に適したステレオエコーキャンセラの構成の検討

    島内, 末廣, 羽田, 陽一, 牧野, 昭二, 金田, 豊

    信学総大___A-4-12_121  

    発表年月: 1998年03月

  • Block fast projection algorithm with independent block sizes

    M., Tanaka, S., Makino, J., Kojima

    信学総大___TA-2-2_554-555  

    発表年月: 1997年03月

  • 射影アルゴリズムを用いたサブバンドステレオエコーキャンセラ

    牧野, 昭二, 島内, 末廣, 羽田, 陽一, 中川, 朗

    音講論集___2-7-18_549-550  

    発表年月: 1996年09月

  • サブバンドエコーキャンセラにおけるフィルタ更新ベクトルの平坦化の検討

    中川, 朗, 羽田, 陽一, 牧野, 昭二

    信学ソ大___A-87_88  

    発表年月: 1996年09月

  • 拡声通信システムにおける周波数帯域別所要エコー抑圧量の検討

    阪内, 澄宇, 牧野, 昭二

    音講論集___2-7-17_547-548  

    発表年月: 1996年09月

  • 高速射影アルゴリズムの多チャンネル系への適用

    島内, 末廣, 田中, 雅史, 牧野, 昭二

    信学総大___A-168_170  

    発表年月: 1996年03月

  • 'ES family'アルゴリズムと従来の適応アルゴリズムの関係について

    牧野, 昭二

    信学技報___DSP95-148_65-70  

    発表年月: 1996年01月

  • 高速FIRフィルタリング算法を利用した射影法

    田中, 雅史, 牧野, 昭二, 金田, 豊

    信学ソ大___A-79_81  

    発表年月: 1995年09月

  • サブバンドエコーキャンセラのプロトタイプフィルタの検討

    中川, 朗, 羽田, 陽一, 牧野, 昭二

    信学ソ大___A-73_75  

    発表年月: 1995年09月

  • 擬似入出力関係を利用したステレオ音響エコーキャンセラ用アルゴリズムの検討

    島内, 末廣, 牧野, 昭二

    音講論集___2-6-5_543-544  

    発表年月: 1995年09月

  • 複素射影サブバンドエコーキャンセラに関する検討

    中川, 朗, 羽田, 陽一, 牧野, 昭二

    音講論集___2-6-3_539-540  

    発表年月: 1995年09月

  • エコーキャンセラ用SSBサブバンド射影アルゴリズム

    牧野, 昭二, 羽田, 陽一, 中川, 朗

    音講論集___2-6-4_541-542  

    発表年月: 1995年09月

  • 真の音響エコー経路を推定するステレオ射影エコーキャンセラの検討

    島内, 末廣, 牧野, 昭二

    信学総大___A-220_220  

    発表年月: 1995年03月

  • ES射影アルゴリズムを用いたデュオフィルタ構成のエコーキャンセラの検討

    羽田, 陽一, 牧野, 昭二, 小島, 順治, 島内, 末廣

    音講論集___3-3-10_595-596  

    発表年月: 1995年03月

  • 音響エコーキャンセラ用デュオフィルタコントロールシステム

    羽田, 陽一, 牧野, 昭二, 田中, 雅史, 島内, 末廣, 小島, 順治

    信学総大___A-350_350  

    発表年月: 1995年03月

  • 高性能音響エコーキャンセラの開発

    小島, 順治, 牧野, 昭二, 羽田, 陽一, 島内, 末廣, 金田, 豊

    信学総大___A-348_348  

    発表年月: 1995年03月

  • ES射影アルゴリズムの音響エコーキャンセラへの適用

    牧野, 昭二, 羽田, 陽一, 田中, 雅史, 金田, 豊, 小島, 順治

    信学総大___A-349_349  

    発表年月: 1995年03月

  • エコーキャンセラの音声入力に対する収束速度改善方法の比較について

    牧野, 昭二

    音講論集___2-6-16_653-654  

    発表年月: 1994年10月

  • ステレオ信号の相互相関の変化に着目したステレオ射影エコーキャンセラの検討

    島内, 末廣, 牧野, 昭二

    音講論集___2-6-17_655-656  

    発表年月: 1994年10月

  • PMTC/N-ISDN用多地点エコーキャンセラの構成

    須田, 泰史, 藤野, 雄一, 牧野, 昭二, 小長井, 俊介, 川田, 真一

    信学全大___B-795_393  

    発表年月: 1994年09月

  • 室内音場伝達関数の共通極・零モデル化

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    信学技報___EA93-101_19-29  

    発表年月: 1994年03月

  • ES-RLSアルゴリズムと従来の適応アルゴリズムの関係について

    牧野, 昭二

    音講論集___1-5-12_471-472  

    発表年月: 1993年10月

  • 共通極を用いたスピーカ特性の多点イコライゼーションについて

    羽田, 陽一, 牧野, 昭二

    音講論集___1-5-18_483-484  

    発表年月: 1993年10月

  • 高次の射影アルゴリズムの演算量削減について

    田中, 雅史, 金田, 豊, 牧野, 昭二

    信学全大___A-101_1-103  

    発表年月: 1993年09月

  • 共通極を用いた多点イコライゼーションフィルタについて

    羽田, 陽一, 牧野, 昭二

    音講論集___3-9-17_491-492  

    発表年月: 1993年03月

  • 複数の室内音場伝達関数に共通な極の最小2乗推定について

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    信学全大___SA-11-4_1-489 - 1-490  

    発表年月: 1993年03月

  • 音響エコーキャンセラ用ES射影アルゴリズム

    牧野, 昭二, 金田, 豊

    信学技報___EA92-74_41-52  

    発表年月: 1992年11月

  • 室内インパルス応答の変動特性を反映させたES-RLSアルゴリズム

    牧野, 昭二, 金田, 豊

    音講論集___2-4-19_547-548  

    発表年月: 1992年10月

  • 音声入力に対する射影法の次数と収束特性について

    田中, 雅史, 牧野, 昭二, 金田, 豊

    音講論集___1-4-14_489-490  

    発表年月: 1992年10月

  • エコーキャンセラ用ES射影アルゴリズムの収束条件について

    牧野, 昭二, 金田, 豊

    信学全大___SA-9-6_1-301  

    発表年月: 1992年09月

  • 室内インパルス応答の統計的性質に基づく指数重み付けNLMS適応フィルタ

    牧野, 昭二, 金田, 豊

    信学技報___EA92-48_9-20  

    発表年月: 1992年08月

  • エコーキャンセラ用ES射影アルゴリズム

    牧野, 昭二, 金田, 豊

    信学全大___SA-7-11_1-472 - 1-473  

    発表年月: 1992年03月

  • 音響エコーキャンセラにおけるダブルトーク制御方式の検討

    中原, 宏之, 羽田, 陽一, 牧野, 昭二, 吉川, 昭吉郎

    音講論集___3-5-7_503-504  

    発表年月: 1992年03月

  • 音の到来方向によらない頭部伝達関数の共通極とモデル化について

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    音講論集___1-8-5_483-484  

    発表年月: 1991年10月

  • エコーキャンセラ用ES (Exponential Step) アルゴリズムの収束条件について

    牧野, 昭二, 金田, 豊

    音講論集___1-7-25_419-420  

    発表年月: 1991年03月

  • 室内音場伝達関数の極の推定について

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    音講論集___1-7-12_393-394  

    発表年月: 1991年03月

  • 帯域分割形指数重み付けアルゴリズムを用いた音響エコーキャンセラ

    牧野, 昭二, 羽田, 陽一

    信学全大___SA-9-4_1-255 - 1-256  

    発表年月: 1990年10月

  • 低周波領域における室内音場伝達関数のARMAモデルについて

    羽田, 陽一, 牧野, 昭二, 小泉, 宣夫

    音講論集___2-7-14_439-440  

    発表年月: 1990年03月

  • 指数重み付けによるエコーキャンセラ用適応アルゴリズム

    牧野, 昭二

    音講論集___3-6-5_517-518  

    発表年月: 1989年10月

  • エコーキャンセラの室内音場における適応特性改善について

    牧野, 昭二, 小泉, 宣夫

    信学技報___EA89-3_15-21  

    発表年月: 1989年04月

  • 拡声通話形の音声会議システム

    及川, 弘, 西野, 正和, 牧野, 昭二

    信学全大___B-548_2-243  

    発表年月: 1988年03月

  • エコーキャンセラの室内音場における適応特性の改善について

    牧野, 昭二, 小泉, 宣夫

    音講論集___1-5-13_355-356  

    発表年月: 1988年03月

  • 複数反響路を有する音響エコーキャンセラの構成法

    小泉, 宣夫, 牧野, 昭二, 及川, 弘

    信学技報___EA87-75_1-6  

    発表年月: 1988年01月

  • 複数反響路を有する音響エコーキャンセラ

    小泉, 宣夫, 牧野, 昭二, 及川, 弘

    信学部門全大___431_1-296  

    発表年月: 1987年09月

  • 音響エコーキャンセラの室内環境における消去特性について

    牧野, 昭二, 小泉, 宣夫

    信学技報___EA87-43_41-48  

    発表年月: 1987年08月

  • 直方体ブース内の障害物によるインパルス応答の変動について

    牧野, 昭二, 小泉, 宣夫

    音講論集___1-3-1_295-296  

    発表年月: 1987年03月

  • MTFによる音声会議でのマイクロホン配置の評価について

    小泉, 宣夫, 牧野, 昭二, 青木, 茂明

    音講論集___2-7-18_631-632  

    発表年月: 1986年10月

  • 音響エコーキャンセラの室内環境における定常特性について

    牧野, 昭二, 小泉, 宣夫

    音講論集___2-7-19_383-384  

    発表年月: 1985年10月

  • 室内残響特性を考慮した音声スイッチ切替特性の検討

    牧野, 昭二, 山森, 和彦

    音講論集___1-2-19_265-266  

    発表年月: 1984年10月

  • マイクロプロセッサ制御を用いた拡声電話機の構成法

    山森, 和彦, 松井, 弘行, 牧野, 昭二

    信学技報___EA84-41_15-21  

    発表年月: 1984年09月

  • 音声スイッチ回路損失制御波形の通話品質への影響

    石丸, 薫, 小川, 峰義, 牧野, 昭二

    信学部門全大___795_3-190  

    発表年月: 1984年09月

  • 周辺に段差を持つ圧電バイモルフ振動板の振動特性について

    一ノ瀬, 裕, 牧野, 昭二

    音講論集___1-6-5_287-288  

    発表年月: 1983年10月

  • ハンドセット小形化に関する一検討

    牧野, 昭二, 一ノ瀬, 裕

    音講論集___1-6-10_297-298  

    発表年月: 1983年10月


共同研究・競争的資金等の研究課題

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    研究期間:

    2020年04月
    -
    2021年03月
     

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2019年
    -
    2021年
     

    牧野 昭二

     概要を見る

    [検討項目1] 音の伝播の物理的なモデルに基づいて観測信号を補間し、実際には存在しない、いわばバーチャルな観測信号を作り出して素子数を擬似的に増やすことにより、音源数に依存することなく高品質な出力を得るための統一的なアレー信号処理を検討した。擬似観測の振幅は非線形補間により推定した。擬似観測を用いた音声強調の劣決定拡張により、擬似観測の基本的な検証を行った。さらに、バーチャルマイクロホンの動作原理の解明と高性能化を図った。今期は、国際会議発表2件、および、国内大会発表1件の研究成果を得た。
    [検討項目2] 音環境からの情報を利用した多チャネル信号処理アルゴリズムを開発した。既存のアルゴリズムを分散型マイクロホンアレーに対応できるように一般化し、さらに強力な最適化規範を導入した。分散型マイクロホンアレーにおけるサブアレーの同期手法を開発した。ブラインド音源分離/抽出アルゴリズムや多チャネル残響除去アルゴリズムを分散型マイクロホンアレーに対応できるように開発した。さらに、必要なマイクロホンを最小化して演算量を削減しながら、性能を最適化するためのマイクロホン選択手法も検討した。今期は、雑誌論文4件、国際会議発表7件、および、国内大会発表9件の研究成果を得た。
    [検討項目3] 強調された音源信号から抽出した特徴量に基づき、音環境を解析・理解した。音源信号に関する先見知識を利用し、特徴量次元での分類法も利用した。分類精度を向上させるために、深層学習などの最新の音声認識技術を活用した。今期は、国際会議発表1件、および、国内大会発表1件の研究成果を得た。
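
    The project summary above mentions synthesizing virtual (pseudo) observations whose amplitudes are estimated by non-linear interpolation between real microphones. Purely as a hedged illustration of that idea, and not the project's actual estimator, the sketch below builds a virtual-microphone STFT by interpolating log-amplitude and phase between two real channels (interpolation in the complex-logarithmic domain); the parameter `alpha` and the exact interpolation formulas are assumptions.

```python
import numpy as np

def virtual_microphone_stft(X1, X2, alpha=0.5, eps=1e-12):
    """Form a virtual-microphone observation between two real microphones.

    X1, X2 : complex STFTs (freq_bins x frames) of two real microphones
    alpha  : virtual position between mic 1 (0.0) and mic 2 (1.0)
    Log-amplitude and the wrapped phase difference are interpolated linearly,
    i.e. interpolation in the complex-logarithmic domain.  Illustrative
    assumption only, not the project's published estimator.
    """
    log_amp = (1.0 - alpha) * np.log(np.abs(X1) + eps) + alpha * np.log(np.abs(X2) + eps)
    dphase = np.angle(X2 * np.conj(X1))          # phase difference wrapped to (-pi, pi]
    phase = np.angle(X1) + alpha * dphase        # move alpha of the way toward mic 2
    return np.exp(log_amp + 1j * phase)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X1 = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    X2 = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    print(virtual_microphone_stft(X1, X2).shape)   # (257, 100)
```

    With `alpha = 0.5` this reduces to a geometric-mean amplitude and a mid-point phase between the two real microphones.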

  • マイクロホンアレーを用いた音情景解析の研究

    筑波大学・ドイツ学術交流会(DAAD)パートナーシップ・プログラム 

    研究期間:

    2017年04月
    -
    2018年03月
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 脳科学と情報科学を融合させたBMI構築のための多チャネル脳波信号処理技術の革新

    日本学術振興会  基盤研究(C)

    研究期間:

    2013年04月
    -
    2014年03月
     

    牧野 昭二, ルトコフスキ トマシュ, 宮部 滋樹, 寺澤 洋子, 山田 武志

     概要を見る

    本プロジェクトでは、人が音の空間性を認識するときの脳活動に着目し、基礎研究を行なうとともに、このような空間性を有する音の聴取時に観察される特徴ある脳活動を利用したブレインマシンインタフェースの開発を行なった。今年度は、空間聴覚刺激に対する事象関連電位の統計的特徴に基づいた電極と潜時の選択手法を提案し、識別率を向上させた。音の出力法の試みでは、スピーカによる実音源と仮想音源でP300反応に個人差があること、視覚刺激より振幅が小さいこと、頭部伝達関数を利用した空間聴覚刺激が後頭部にP300を誘発させることを確認した。
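
    The summary above reports selecting electrodes and latencies from statistical features of event-related potentials to raise classification accuracy. The following sketch is only a generic stand-in for that kind of pipeline, not the project's method: channel-latency features are ranked by a two-sample t-statistic between target and non-target epochs and fed to a minimal LDA classifier; the epoch shapes, the number of kept features, and the regularisation are assumptions.

```python
import numpy as np

def rank_features_by_tstat(epochs, labels):
    """Rank flattened (channel, sample) features by a two-sample t-statistic.

    epochs : array (trials, channels, samples);  labels : 1 = target, 0 = non-target.
    Generic illustration only, not the project's selection rule.
    """
    X = epochs.reshape(epochs.shape[0], -1)
    tgt, non = X[labels == 1], X[labels == 0]
    num = tgt.mean(0) - non.mean(0)
    den = np.sqrt(tgt.var(0) / len(tgt) + non.var(0) / len(non)) + 1e-12
    return np.argsort(-np.abs(num / den))

def fit_lda(X, y, reg=1e-3):
    """Minimal two-class LDA (shared covariance, ridge-regularised)."""
    m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
    cov = np.cov(X.T) + reg * np.eye(X.shape[1])
    w = np.linalg.solve(cov, m1 - m0)
    b = -0.5 * w @ (m1 + m0)
    return w, b

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    epochs = rng.standard_normal((200, 8, 64))      # trials x channels x samples
    labels = rng.integers(0, 2, 200)
    epochs[labels == 1, 3, 30:40] += 0.5            # synthetic "P300" bump
    keep = rank_features_by_tstat(epochs, labels)[:50]
    X = epochs.reshape(200, -1)[:, keep]
    w, b = fit_lda(X, labels)
    pred = (X @ w + b > 0).astype(int)
    print("training accuracy:", (pred == labels).mean())
```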

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2020年04月
    -
    2021年03月
     

  • スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

    日本学術振興会  基盤研究(A)

    研究期間:

    2020年04月
    -
    2021年03月
     

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

    日本学術振興会  基盤研究(A)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 大量音声データの事前学習に基づく ブラインド音源分離手法の高度化

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2019年04月
    -
    2020年02月
     

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    研究期間:

    2018年09月
    -
    2019年03月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2018年04月
    -
    2019年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2018年04月
    -
    2019年03月
     

  • 聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2018年04月
    -
    2019年02月
     

  • DNNを用いた音声音響符号化の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2018年04月
    -
    2019年02月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2017年04月
    -
    2018年03月
     

  • 音環境の認識と理解およびスマートホームセキュリティー、ロボット聴覚、等への応用

    NII  国内共同研究

    研究期間:

    2017年04月
    -
    2018年03月
     

  • 環境に適応するための音声強調系最適化

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2017年04月
    -
    2018年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2017年04月
    -
    2018年03月
     

  • DNNを用いた音声音響符号化の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2017年04月
    -
    2018年02月
     

  • 聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2017年04月
    -
    2018年02月
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    研究期間:

    2017年04月
    -
    2017年11月
     

  • マイクの指向性による、音声認識率の向上

    富士ソフト株式会社  国内共同研究

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2016年04月
    -
    2017年03月
     

  • マイクロホンアレー付き監視カメラを用い音響情報を統計数理的学習理論により解析するイベント検出とシーン解析

    NII  国内共同研究

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 音響情報と映像情報を統計数理的学習理論により融合するイベント検出とシーン解析

    筑波大学  研究基盤支援プログラム(Bタイプ)

    研究期間:

    2016年04月
    -
    2017年03月
     

  • マイクロホンアレーを用いた音情景解析の研究

    筑波大学・ドイツ学術交流会(DAAD)パートナーシップ・プログラム 

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 音声音響符号化音のプレフィルタ・ポストフィルタ処理による音質改善の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2016年04月
    -
    2017年02月
     

  • 音声のスペクトル領域とケプストラム領域における同時強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2016年04月
    -
    2017年02月
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    研究期間:

    2015年09月
    -
    2016年03月
     

  • 非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2015年04月
    -
    2016年03月
     

  • 音響センシングによる交通量モニタリング

    NII  国内共同研究

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 低遅延・低ビットレートの音声・音響統合符号化の検討

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

    日本学術振興会  基盤研究(A)

    研究期間:

    2014年04月
    -
    2015年03月
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    研究期間:

    2013年04月
    -
    2014年03月
     

  • 非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2013年04月
    -
    2014年03月
     

    小野 順貴, 牧野 昭二, 宮部 滋樹, 篠田 浩一

     概要を見る

    マイクロフォンアレイ信号処理は、複数のマイクで録音した信号を処理し、音の到来方向を推定したり、雑音の中から目的音を強調したりすることを可能にする重要な技術です。マイクロフォンアレイ信号処理では、チャンネル間の微小な時間差が重要な情報となっているため、従来は複数のマイクロフォンが同期して録音される必要がありました。これに対し本研究では、スマートフォン、ノートPC、ICレコーダーなど、同期していない複数の録音機器をアレイ信号処理に用いるために、録音信号を事前情報なしに同期させたり、録音信号からマイクロフォンの位置を推定したりする技術を開発しました。
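
    The summary above describes synchronizing recordings from unsynchronized devices without prior information. As a hedged sketch of only the simplest sub-problem (a fixed start-time offset, ignoring the sampling-frequency drift that the actual work also handles), the code below estimates the lag between two recordings from the peak of their cross-correlation; the sign convention and the demo signal lengths are assumptions.

```python
import numpy as np

def estimate_delay(x, y):
    """Estimate how many samples y is delayed relative to x (positive = y later).

    Minimal sketch: pick the peak of the full cross-correlation.  Real blind
    synchronization must also handle sampling-frequency drift and unreliable peaks.
    """
    c = np.correlate(x, y, mode="full")
    lags = np.arange(-(len(y) - 1), len(x))        # lag axis of np.correlate
    return -lags[np.argmax(c)]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    s = rng.standard_normal(8000)
    delay = 123
    x = s
    y = np.concatenate((np.zeros(delay), s))[: len(s)]   # y is x delayed by 123 samples
    print("estimated delay:", estimate_delay(x, y))       # expected: 123
```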

  • 複数録音機器による非同期録音信号の同期に関する研究

    ヤマハ株式会社  国内共同研究

    研究期間:

    2013年04月
    -
    2014年03月
     

  • 複素対数補間に基づくヴァーチャル観測を用いた劣決定アレイ信号処理

    NII  国内共同研究

    研究期間:

    2013年04月
    -
    2014年03月
     

  • 高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

    日本学術振興会  基盤研究(A)

    研究期間:

    2013年04月
    -
    2014年03月
     

    猿渡 洋, 鹿野 清宏, 戸田 智基, 川波 弘道, 小野 順貴, 宮部 滋樹, 牧野 昭二, 小山 翔一

     概要を見る

    本研究では、高次統計量追跡による自律カスタムメイド音声コミュニケーション拡張システムに関して研究を行った。具体的なシステムとして、ブラインド音源分離に基づく両耳補聴システムや声質変換に基づく発声補助システムを開発し、以下の成果が得られた。(1)両耳補聴システムに関しては、高精度かつ高速なブラインド音源分離及び統計的音声強調アルゴリズムを提案し、聴覚印象の不動点を活用した高品質な音声強調システムが実現できた。(2)発声補助システムに関しては、データベース間における発話のミスマッチを許容する声質変換処理を開発した。実環境模擬データベースを用いてその評価を行い、有効性を確認することが出来た。

  • 低遅延・低ビットレートの音声・音響統合符号化の検討

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2013年05月
    -
    2014年02月
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    研究期間:

    2012年09月
    -
    2013年03月
     

  • 脳科学と情報科学を融合させたBMI構築のための多チャネル脳波信号処理技術の革新

    日本学術振興会  基盤研究(C)

    研究期間:

    2012年04月
    -
    2013年03月
     

    牧野 昭二, ルトコフスキ トマシュ, 宮部 滋樹, 寺澤 洋子, 山田 武志

     概要を見る

    本プロジェクトでは、人が音の空間性を認識するときの脳活動に着目し、基礎研究を行なうとともに、このような空間性を有する音の聴取時に観察される特徴ある脳活動を利用したブレインマシンインタフェースの開発を行なった。今年度は、空間聴覚刺激に対する事象関連電位の統計的特徴に基づいた電極と潜時の選択手法を提案し、識別率を向上させた。音の出力法の試みでは、スピーカによる実音源と仮想音源でP300反応に個人差があること、視覚刺激より振幅が小さいこと、頭部伝達関数を利用した空間聴覚刺激が後頭部にP300を誘発させることを確認した。

  • 非同期録音機器を利用可能にするアレイ信号処理技術

    NII  国内共同研究

    研究期間:

    2012年04月
    -
    2013年03月
     

  • 高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

    日本学術振興会  基盤研究(A)

    研究期間:

    2012年04月
    -
    2013年03月
     

    猿渡 洋, 鹿野 清宏, 戸田 智基, 川波 弘道, 小野 順貴, 宮部 滋樹, 牧野 昭二, 小山 翔一

     概要を見る

    本研究では、高次統計量追跡による自律カスタムメイド音声コミュニケーション拡張システムに関して研究を行った。具体的なシステムとして、ブラインド音源分離に基づく両耳補聴システムや声質変換に基づく発声補助システムを開発し、以下の成果が得られた。(1)両耳補聴システムに関しては、高精度かつ高速なブラインド音源分離及び統計的音声強調アルゴリズムを提案し、聴覚印象の不動点を活用した高品質な音声強調システムが実現できた。(2)発声補助システムに関しては、データベース間における発話のミスマッチを許容する声質変換処理を開発した。実環境模擬データベースを用いてその評価を行い、有効性を確認することが出来た。

  • 脳科学と情報科学を融合させたBMI構築のための多チャネル脳波信号処理技術の革新

    日本学術振興会  基盤研究(C)

    研究期間:

    2011年04月
    -
    2012年03月
     

    牧野 昭二, ルトコフスキ トマシュ, 宮部 滋樹, 寺澤 洋子, 山田 武志

     概要を見る

    本プロジェクトでは、人が音の空間性を認識するときの脳活動に着目し、基礎研究を行なうとともに、このような空間性を有する音の聴取時に観察される特徴ある脳活動を利用したブレインマシンインタフェースの開発を行なった。今年度は、空間聴覚刺激に対する事象関連電位の統計的特徴に基づいた電極と潜時の選択手法を提案し、識別率を向上させた。音の出力法の試みでは、スピーカによる実音源と仮想音源でP300反応に個人差があること、視覚刺激より振幅が小さいこと、頭部伝達関数を利用した空間聴覚刺激が後頭部にP300を誘発させることを確認した。

  • 音声特性と聴覚特性を反映した音声強調処理技術の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2011年04月
    -
    2012年03月
     

  • 脳科学と情報科学を融合させたBCI構築のための多チャネル脳波信号処理の研究

    電気通信普及財団  出資金による受託研究

    研究期間:

    2011年04月
    -
    2012年03月
     

  • 脳科学,生命科学,情報科学を融合させた生体マルチメディア情報研究

    研究期間:

    2011年04月
    -
     
     

  • 音声特性と聴覚特性を反映した音声強調処理技術の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2010年04月
    -
    2011年03月
     

  • 音声特性と聴覚特性を反映した音声強調処理技術の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2009年04月
    -
    2010年03月
     

  • 生体信号処理と音響信号処理による生命科学研究の革新

    日本学術振興会  科学研究費助成事業

    研究期間:

    2010年
     
     
     

    牧野 昭二, WU Y. J.

  • 音声、音楽メディアのコンテンツ基盤技術の創出とイマーシブオーディオコミュニケーションの創生

    研究期間:

    2009年04月
    -
     
     


Misc

  • 畳込み混合のブラインド音源分離(<特集>独立成分分析とその応用特集号)

    牧野昭二, 荒木章子, 向井良, 澤田宏

    システム/制御/情報 : システム制御情報学会誌   48 ( 10 ) 401 - 408  2004年10月

    DOI CiNii

  • ブラインドな処理が可能な音源分離技術 (特集 コミュニケーションの壁を克服するための音声・音響処理技術)

    牧野昭二, 荒木章子, 向井良

    NTT技術ジャ-ナル   15 ( 12 ) 8 - 12  2003年12月

    CiNii

  • ステレオエコーキャンセラの課題と解決法

    牧野昭二, 島内末廣

    システム/制御/情報 : システム制御情報学会誌   46 ( 12 ) 724 - 732  2002年12月

    DOI CiNii

  • 混じりあった声を解く--遠隔発話の認識を目指して (特集論文1 人にやさしい対話型コンピュータ)

    牧野昭二, 向井良, 荒木章子

    NTT R & D   50 ( 12 ) 937 - 944  2001年12月

    CiNii

  • サブバンド信号処理 : 実時間動作化の奥の手

    牧野昭二

    日本音響学会誌   56 ( 12 ) 845 - 851  2000年12月

    DOI CiNii

  • 周波数帯域における音響エコー経路の変動特性を反映させたサブバンドESアルゴリズム

    牧野昭二, 羽田陽一

    電子情報通信学会論文誌. A, 基礎・境界   79 ( 6 ) 1138 - 1146  1996年06月

     概要を見る

    本論文は, 従来のNLMSアルゴリズムと同等の演算量と記憶容量で収束速度が約2倍の, 新しいエコーキャンセラ用適応アルゴリズムを提案するものである. サブバンド ES (exponentially weighted stepsize) アルゴリズムと名づけたこの適応アルゴリズムでは, 受話入力信号とエコー信号を複数の周波数帯域に分割し, それぞれの周波数帯域で独立にエコーを消去するサブバンドエコーキャンセラにおいて, それぞれの周波数帯域に設けた適応形トランスバーサルフィルタのそれぞれの係数に対して, 異なるステップサイズを用いている. これらのステップサイズは時不変で, その周波数帯域における室内インパルス応答の変化分, 例えば二つのインパルス応答波形の差, の期待値に比例して指数的に重み付けられている. その結果, 各周波数帯域における音響エコー経路の変動特性の違いを適応アルゴリズムに反映させ, 収束特性を改善することができる. ここでは, 室内音場のインパルス応答データを用いた計算機シミュレーション, およびDSPで構成した実験装置を用いた実時間評価実験を行い, NLMSアルゴリズムを用いた従来のフルバンドエコーキャンセラに比べて, 音声入力に対する収束速度を約4倍にできることを明らかにする.

    CiNii
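
    The abstract above gives each adaptive-filter coefficient a time-invariant step size that is exponentially weighted according to the expected variation of the room impulse response, inside a subband echo canceller. The sketch below shows only the exponentially weighted step-size idea in a full-band NLMS form; the subband decomposition and the measured per-band decay constants of the published algorithm are not reproduced, and `mu` and `decay` are assumed values.

```python
import numpy as np

def es_nlms(x, d, taps=256, mu=0.5, decay=0.01, eps=1e-8):
    """Exponentially weighted step-size (ES) NLMS sketch for echo cancellation.

    x : far-end (loudspeaker) signal,  d : microphone signal containing the echo.
    Each tap gets a fixed step size mu * exp(-decay * tap_index), reflecting the
    assumption that room-impulse-response variation decays along the taps.
    Full-band illustration only; the published algorithm applies this per
    subband with decay constants measured for each band.
    """
    w = np.zeros(taps)
    step = mu * np.exp(-decay * np.arange(taps))   # time-invariant per-tap steps
    e = np.zeros(len(d))
    for n in range(taps, len(d)):
        u = x[n - taps + 1: n + 1][::-1]           # [x[n], x[n-1], ..., x[n-taps+1]]
        e[n] = d[n] - w @ u
        w += step * u * e[n] / (u @ u + eps)       # normalised, per-tap weighted update
    return e

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.standard_normal(20000)
    h = np.exp(-0.02 * np.arange(256)) * rng.standard_normal(256)   # toy echo path
    d = np.convolve(x, h)[: len(x)]
    e = es_nlms(x, d)
    print("residual echo power:", np.mean(e[-1000:] ** 2))
```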

  • 音響エコ-キャンセラ用ES射影アルゴリズム (シ-ムレスな音響空間の実現を目指して<特集>)

    牧野昭二, 金田豊

    NTT R & D   44 ( 1 ) 45 - 52  1995年01月

    CiNii

  • 音響エコー経路の変動特性を反映させたRLS適応アルゴリズム

    牧野昭二, 金田豊

    日本音響学会誌   50 ( 1 ) 32 - 39  1993年12月

    CiNii

  • 2つの超ガウス性複素信号の位相観測を用いない相関係数推定 (信号処理)

    宮部 滋樹, 小野 順貴, 牧野 昭二

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 474 ) 19 - 24  2015年03月

     概要を見る

    本稿では,2つの有相関な複素振幅系列の位相が失われた観測から,元の位相を持った信号の相関係数を推定する方法について議論する。我々は以前に,2つの複素振幅系列が2変量複素正規分布に従うと仮定した確率モデルを立てて,位相差を隠れ変数とするEMアルゴリズムによって相関系列を推定する手法を提案した。しかし,優ガウス的であることが多い複素振幅とのモデルミスマッチにより,信号によっては推定精度が低下してしまうという問題があった。本稿では,2つの複素信号系列が2変量複素t分布に従うと仮定して,優ガウス的な信号の形状に適応的な最尤推定を定式化することにより,モデルミスマッチに頑健な相関係数推定を試みる。実験の結果,複素t分布モデルは信号によっては必ずしも複素正規分布モデルよりも高精度とは限らないが,適切なモデルを選択することにより,単純な絶対値の相関を用いるよりも高い精度の推定が得られることを確認した。

    CiNii

  • 非同期マイクロホンアレーのためのサンプリング周波数ミスマッチのブラインド補償 (応用音響)

    宮部 滋樹, 小野 順貴, 牧野 昭二

    電子情報通信学会技術研究報告 : 信学技報   112 ( 347 ) 11 - 16  2012年12月

     概要を見る

    本稿では,非同期マイクロホンアレーのためのチャネル間のサンプリング周波数ミスマッチをブラインドに推定し補償する手法について述べる.サンプリング周波数のミスマッチによるチャネル間の時間差の変化は短時間では一定となるため,フレーム毎に周波数領域で位相を操作することで補償する.また,音源が移動しないと仮定した最尤推定により,サンプリング周波数のミスマッチを推定する.実験により提案手法はアレー信号処理の性能を大幅に回復できることが確認された.

    CiNii
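
    The abstract above compensates a sampling-frequency mismatch by treating the drift as constant within each frame and manipulating the phase in the frequency domain. The sketch below applies that per-frame linear phase correction to an STFT, assuming the relative mismatch `eps_rel` is already known; the maximum-likelihood estimation of the mismatch described in the abstract is omitted, and the sign convention is an assumption.

```python
import numpy as np

def compensate_sfm(stft, eps_rel, hop, nfft):
    """Compensate sampling-frequency mismatch by per-frame linear phase shifts.

    stft    : complex STFT (nfft//2 + 1 bins x frames) of the mismatched channel
    eps_rel : relative mismatch, e.g. (fs_true - fs_nominal) / fs_nominal,
              assumed to be already estimated (the estimation step is omitted)
    hop     : STFT hop size in samples,  nfft : FFT length
    Within one frame the accumulated drift is treated as a constant delay, so it
    can be removed with a linear phase across frequency.  Illustrative sketch only.
    """
    n_bins, n_frames = stft.shape
    k = np.arange(n_bins)[:, None]                    # frequency bin index
    t = np.arange(n_frames)[None, :]                  # frame index
    drift = eps_rel * hop * t                         # accumulated delay in samples
    phase = np.exp(2j * np.pi * k * drift / nfft)     # undo the per-frame delay
    return stft * phase

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = rng.standard_normal((257, 200)) + 1j * rng.standard_normal((257, 200))
    Y = compensate_sfm(X, eps_rel=50e-6, hop=128, nfft=512)   # 50 ppm mismatch
    print(Y.shape)
```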

  • 単一音源区間情報を用いた非同期マイクロホンアレーによる音声強調 (応用音響)

    坂梨 龍太郎, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    電子情報通信学会技術研究報告 : 信学技報   112 ( 347 ) 17 - 22  2012年12月

     概要を見る

    非同期マイクロホンアレーは,携帯電話やボイスレコーダーなど複数の録音機器を用いることで,従来のマイクロホンアレーによる音響信号処理における拡張性による制約がなく,安価で柔軟な構成を行えるという利点がある.しかし,非同期マイクロホンアレーには,録音開始時刻やDOA情報が不明であり,機器間のサンプリング周波数における未知の個体差が存在するなどの問題点も挙げられる.特に,録音開始時刻のずれや機器間のサンプリング周波数個体差は,音響信号処理に重大な影響を与え,これを補償することが必要となる.本稿では,議事録作成のための会議録音など,予め音声強調を目的とした場面を想定し,ある特定の音源だけが音を生じている時間区間である,単一音源区間を録音信号に盛り込むことでそれを手がかりとした同期補償を提案する.

    CiNii

  • 球状スピーカアレーを用いた放射特性制御のシミュレーション

    林 貴哉, 宮部 滋樹, 山田 武志, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   112 ( 76 ) 19 - 24  2012年06月

     概要を見る

    本稿では球状スピーカアレーによる距離減衰制御システムについて述べる.従来研究において,球面調和領域における波の伝搬モデルを用いて球状マイクロホンアレーにより音源からの距離に対する感度を制御する放射特性フィルタが提案されている.入力と出力が入れ替わっても変化しないという伝達関数の性質から,マイクロホンアレーのための放射特性フィルタはラウドスピーカアレーによる距離減衰のフィルタ設計に直接応用することができる.無残響シミュレーションの結果,低い周波数では提案手法が有効であることを確認した.

    CiNii

  • 高次モーメント分析に基づく非線形MUSICによる劣決定方向推定

    杉本 侑哉, 宮部 滋樹, 山田 武志, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   112 ( 76 ) 49 - 54  2012年06月

     概要を見る

    本稿では,高次モーメント分析によってMUltiple SIgnal Classification(MUSIC)を劣決定条件の下での方向推定へと拡張した写像MUSICを提案する.写像MUSICは非線形関数により観測信号を高次元空間へと写像し,写像空間において共分散行列の分析を行う.このとき,写像の共分散行列は信号の高次クロスモーメント行列に相当する.高次元写像によって雑音部分空間の次元数が増大することから,方向推定の分解能が向上し,劣決定条件の下での方向推定が可能となる.信号の高次キュムラント分析に基づいた2q-MUSICとの理論・実験の両面からの比較によって写像MUSICの性質について議論し,現実的な条件の下ではより少ない計算量で同等の推定精度が得られることを示す.

    CiNii
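
    For context on the entry above, which extends MUSIC to underdetermined direction estimation by analysing nonlinearly mapped (higher-order-moment) observations, the sketch below computes only the conventional narrowband MUSIC pseudo-spectrum for a uniform linear array; the proposed mapping itself is not reproduced, and the array geometry and signal model are assumptions.

```python
import numpy as np

def music_pseudospectrum(X, n_src, d_over_lambda, angles_deg):
    """Conventional narrowband MUSIC pseudo-spectrum for a uniform linear array.

    X             : complex snapshots, shape (n_mics, n_snapshots)
    n_src         : assumed number of sources (must be < n_mics for plain MUSIC)
    d_over_lambda : microphone spacing divided by wavelength
    Baseline sketch only; the paper above replaces this covariance analysis by an
    analysis of nonlinearly mapped (higher-order-moment) observations.
    """
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]                   # sample spatial covariance
    eigval, eigvec = np.linalg.eigh(R)                # ascending eigenvalues
    En = eigvec[:, : n_mics - n_src]                  # noise subspace
    theta = np.deg2rad(np.asarray(angles_deg))
    m = np.arange(n_mics)[:, None]
    A = np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta)[None, :])
    proj = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / (proj + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n_mics, n_snap, true_doa = 8, 500, 20.0
    a = np.exp(-2j * np.pi * 0.5 * np.arange(n_mics) * np.sin(np.deg2rad(true_doa)))
    s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
    noise = rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap))
    X = np.outer(a, s) + 0.1 * noise
    angles = np.arange(-90, 91)
    P = music_pseudospectrum(X, n_src=1, d_over_lambda=0.5, angles_deg=angles)
    print("estimated DOA:", angles[np.argmax(P)])
```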

  • D-14-9 日本語スピーキングテストにおける文生成問題の採点に影響を及ぼす要因の検討(D-14.音声,一般セッション)

    大久保 梨思子, 山畑 勇人, 山田 武志, 今井 新悟, 石塚 賢吉, 篠崎 隆宏, 西村 竜一, 牧野 昭二, 北脇 信彦

    電子情報通信学会総合大会講演論文集   2012 ( 1 ) 193 - 193  2012年03月

    CiNii

  • D-14-8 日本語スピーキングテストにおける文章読み上げ問題の採点に影響を及ぼす要因の検討(D-14.音声,一般セッション)

    山畑 勇人, 大久保 梨思子, 山田 武志, 今井 新悟, 石塚 賢吉, 篠崎 隆宏, 西村 竜一, 牧野 昭二, 北脇 信彦

    電子情報通信学会総合大会講演論文集   2012 ( 1 ) 192 - 192  2012年03月

    CiNii

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出

    加藤 通朗, 杉本 侑哉, 牧野 昭二

    聴覚研究会資料   40 ( 7 ) 575 - 580  2010年08月

    CiNii

  • 音声区間推定と時間周波数領域方向推定の統合による会議音声話者識別

    荒木 章子, 藤本 雅清, 石塚 健太郎, 中谷 智広, 澤田 宏, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   108 ( 143 ) 19 - 24  2008年07月

     概要を見る

    我々は、会議状況において「いつ誰が話したか」を推定する方法を検討している。これは、音声区間検出器(VAD)で推定した音声存在確率と、音声区間における音声到来方向(DOA)の分類結果とを用いて、会議音声中の各話者の音声区間を推定するものである。これを本稿では話者識別と呼ぶ。本稿では、この性能向上を目的とし、2つの方法を提案する。提案1として、DOAを各時間周波数スロットで推定することで、特に複数人同時発話時の話者識別精度を向上させる。提案2として、VAD結果およびDOA情報を確率的に統合する方法を検討する。両提案法により、実際の会話音声データに対して、話者識別性能の向上が見られたので報告する。

    CiNii
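
    The abstract above estimates "who spoke when" by combining a voice activity detector with a classification of time-frequency-wise DOA estimates. The sketch below is only a toy stand-in for that pipeline on frame-level synthetic data: an energy threshold serves as the VAD and a one-dimensional k-means clusters frame DOAs; the probabilistic integration used in the actual system is not reproduced, and all thresholds are assumptions.

```python
import numpy as np

def simple_diarization(frame_energy, frame_doa, n_speakers=2, energy_thresh=0.1):
    """Toy 'who spoke when' sketch: energy-based VAD + 1-D k-means over frame DOAs.

    Returns -1 for frames judged as non-speech, otherwise a speaker index.
    Illustration only: the paper above estimates DOAs per time-frequency slot and
    integrates a statistical VAD with the DOA classification probabilistically.
    """
    speech = frame_energy > energy_thresh
    doas = frame_doa[speech]
    centers = np.quantile(doas, np.linspace(0.1, 0.9, n_speakers))  # k-means init
    for _ in range(50):
        assign = np.argmin(np.abs(doas[:, None] - centers[None, :]), axis=1)
        centers = np.array([doas[assign == k].mean() if np.any(assign == k) else centers[k]
                            for k in range(n_speakers)])
    labels = np.full(len(frame_doa), -1)
    labels[speech] = np.argmin(np.abs(doas[:, None] - centers[None, :]), axis=1)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n = 300
    spk = rng.integers(0, 2, n)
    doa = np.where(spk == 0, -30.0, 40.0) + 3.0 * rng.standard_normal(n)
    energy = np.where(rng.random(n) < 0.8, 1.0, 0.01)     # some silent frames
    print(simple_diarization(energy, doa)[:20])
```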

  • Special section on acoustic scene analysis and reproduction - Foreword

    Shoji Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E91A ( 6 ) 1301 - 1302  2008年06月

    その他  

  • 独立成分分析に基づくブラインド音源分離

    牧野 昭二

    電子情報通信学会技術研究報告. SIP, 信号処理   108 ( 70 ) 65 - 73  2008年05月

     概要を見る

    たくさんの音の中から聞きたい音を聞き分ける音源分離技術として,近年,独立成分分析(Independent Component Analysis, ICA)に基づく手法が脚光を浴びている.この手法は,音源位置の知識や目的音(妨害音)区間の切り出しを原理的に必要とせず,完全なブラインド分離が可能である.統計的処理であるICAは,物理的,音響的にはある種のブラックボックスであり,その中で何が行われているのか,何がどこまで分離できるのかがあまりわかっていなかった.我々はこれまでの研究により,統計的手法であるICAを音響信号処理的な観点から分析して物理的意味づけを与え,従来の音響信号処理技術との関係を解明した.そして,ICAに基づくブラインド音源分離が,適応ビームフォーマ(adaptive beamformer, ABF)と呼ばれるマイクロホンアレーと同じ動作原理を実現しており二乗誤差最小の意味で等価であることを明らかにした.2マイクのABFの支配的な動作は妨害音に1つの死角を向ける動作である.これより,様々な方向からの残響音を消せないことがICAが残響に弱い理由の一つであること,ABFがICAの性能の上限を与えることなどを明らかにした.しかしながら,ICAは音源位置の知識や妨害音区間の切り出しが不要で,音源信号が同時に鳴っていても全く問題ないという点で,ABFの高機能版と言える.

    CiNii
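
    The overview above explains blind source separation with frequency-domain ICA and its interpretation as an adaptive beamformer. As a minimal sketch of the frequency-domain ICA part only, the code below runs a natural-gradient complex ICA update independently in each frequency bin of a determined (2-mic, 2-source) mixture; the permutation and scaling alignment across bins that a complete system needs is deliberately left out, and the polar nonlinearity and learning rate are illustrative choices.

```python
import numpy as np

def fd_ica_per_bin(X, n_iter=200, lr=0.1, eps=1e-8):
    """Natural-gradient complex ICA applied independently in each frequency bin.

    X : observed STFT tensor of shape (n_mics, n_bins, n_frames); the number of
        sources is assumed equal to the number of microphones (determined case).
    Returns the separated STFT tensor and the per-bin demixing matrices.
    Minimal sketch only: the permutation and scaling ambiguities across bins,
    which a complete frequency-domain BSS system must resolve, are left untouched.
    """
    n_mics, n_bins, n_frames = X.shape
    W = np.tile(np.eye(n_mics, dtype=complex), (n_bins, 1, 1))
    for f in range(n_bins):
        Xf = X[:, f, :]
        Xf = Xf / np.sqrt(np.mean(np.abs(Xf) ** 2) + eps)   # normalise bin power
        for _ in range(n_iter):
            Y = W[f] @ Xf
            G = Y / (np.abs(Y) + eps)                        # phi(y) = y / |y|
            dW = (np.eye(n_mics) - (G @ Y.conj().T) / n_frames) @ W[f]
            W[f] = W[f] + lr * dW                            # natural-gradient step
    return np.einsum("fij,jft->ift", W, X), W

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n_bins, n_frames = 32, 400
    S = rng.laplace(size=(2, n_bins, n_frames)) + 1j * rng.laplace(size=(2, n_bins, n_frames))
    A = rng.standard_normal((n_bins, 2, 2)) + 1j * rng.standard_normal((n_bins, 2, 2))
    X = np.einsum("fij,jft->ift", A, S)                      # instantaneous mix per bin
    Y, W = fd_ica_per_bin(X)
    print(Y.shape)
```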

  • 周波数領域ICAにおける初期値の短時間データからの学習

    荒木章子, 伊藤信貴, 澤田宏, 小野順貴, 牧野昭二, 嵯峨山茂樹

    電子情報通信学会大会講演論文集   2008   208 - 208  2008年03月

    CiNii J-GLOBAL

  • 音声のスパース性を用いた Underdetermined 音源分離

    荒木章子, 澤田 宏, 牧野 昭二

    電子情報通信学会総合大会 基礎・境界講演論文集, 2008     S-46 - S-47  2008年

    CiNii

  • A-10-7 観測ベクトルのクラスタリングによるブラインド音源分離(A-10.応用音響,基礎・境界)

    荒木 章子, 澤田 宏, 向井 良, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2005   208 - 208  2005年09月

    CiNii

  • A-10-9 多くの背景音からの主要音源のブラインド抽出(A-10.応用音響,基礎・境界)

    澤田 宏, 荒木 章子, 向井 良, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2005   210 - 210  2005年09月

    CiNii

  • A-10-8 3次元マイクロホンアレイを用いた多音源ブラインド分離(A-10.応用音響,基礎・境界)

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2005   209 - 209  2005年09月

    CiNii

  • 移動音源の低遅延実時間ブラインド分離

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    日本音響学会研究発表会講演論文集   2003 ( 1 ) 779 - 780  2003年03月

    CiNii

  • 独立成分分析に基づくブラインド音源分離

    牧野昭二

    ディジタル信号処理シンポジウム   103 ( 129 ) 17 - 24  2003年

    記事・総説・解説・論説等(学術雑誌)  

     概要を見る

    私たちが普段それほど意識せずに行っている「聞きたい音を聞き分ける」という能力がコンピュータには欠けている.独立成分分析に基づく手法は,ある人が話している声と別の人の声,背景に流れる音楽,雑音等,それぞれの音は互いに統計的に独立であるという仮定により,複数のマイクで観測した信号を互いに独立な信号に分離すれば,それぞれのもとの音を復元できる,という原理に基づいている.この手法は,音源や混合系の情報を原理的に必要としない,いわゆるブラインドな分離が可能である.招待講演では,独立成分分析とは何か,ブラインド音源分離とは何か,どのようにして分離が達成されるのか,分離のメカニズムはどのようなものか,などについて,できるだけ直感的に分り易く説明する[1].

    CiNii

  • サブバンド処理によるブラインド音源分離に関する検討

    荒木 章子, AICHNER Robert, 牧野 昭二, 西川 剛樹, 猿渡 洋

    日本音響学会研究発表会講演論文集   2002 ( 1 ) 619 - 620  2002年03月

    CiNii

  • 間隔の異なる複数のマイクペアによるブラインド音源分離

    澤田 宏, 荒木 章子, 向井 良, 牧野 昭二

    日本音響学会研究発表会講演論文集   2002 ( 1 ) 621 - 622  2002年03月

    CiNii

  • 周波数領域ICAと時間遅れスペクトル減算による残響下での実時間ブラインド音源分離

    向井 良, 荒木 章子, 澤田 宏, 牧野 昭二

    日本音響学会研究発表会講演論文集   2002 ( 1 ) 673 - 674  2002年03月

    CiNii

  • 周波数領域ブラインド音源分離と周波数領域適応ビームフォーマの関係について

    荒木 章子, 牧野 昭二, 向井 良, 猿渡 洋

    日本音響学会研究発表会講演論文集   2001 ( 2 ) 613 - 614  2001年10月

    CiNii

  • 非定常スペクトルサブトラクションによる音源分離後の残留雑音除去

    向井 良, 荒木 章子, 澤田 宏, 牧野 昭二

    日本音響学会研究発表会講演論文集   2001 ( 2 ) 617 - 618  2001年10月

    CiNii

  • 実環境での混合音声に対する周波数領域ブラインド音源分離手法の性能限界

    荒木 章子, 牧野 昭二, 西川 剛樹, 猿渡 洋

    日本音響学会研究発表会講演論文集   2001 ( 1 ) 567 - 568  2001年03月

    CiNii

  • 実環境におけるブラインド音源分離と残響除去性能に関する検討

    向井 良, 荒木 章子, 牧野 昭二

    日本音響学会研究発表会講演論文集   2001 ( 1 ) 565 - 566  2001年03月

    CiNii

  • 帯域分割型 ICA を用いた Blind Source Separation における帯域分割数の最適化

    西川 剛樹, 荒木 章子, 牧野 昭二, 猿渡 洋

    日本音響学会研究発表会講演論文集   2001 ( 1 ) 569 - 570  2001年03月

    CiNii

  • 周波数領域 Blind Source Separation における帯域分割数の最適化

    西川 剛樹, 荒木 章子, 牧野 昭二, 猿渡 洋

    電子情報通信学会技術研究報告. EA, 応用音響   100 ( 580 ) 53 - 59  2001年01月

     概要を見る

    本稿では周波数領域Blind Source Separation(BSS)における帯域分割数の最適化について述べる.一般に, 従来の周波数領域ICAに基づくBSSは残響に弱い.一方, 残響除去を行う逆フィルタを構成する際, フィルタ長(もしくは帯域分割数)を増やすと残響抑圧性能が向上するということが知られている.そこでまず初めに, 残響下における分離性能を向上させるため, BSSにおける帯域分割数を増加させて音源分離実験を行った.音源分離実験の結果, 帯域分割数を過度に増やすと分離精度が劣化してしまうことが分かった.次に, この劣化原因を明らかにするため, 独立性を測る簡易な客観評価量を定義し, 帯域分割数と狭帯域信号間の独立性の関係を調べた.独立性評価実験の結果より, 帯域分割数を増やすと独立性が低下することが確認された.よって, 周波数領域ICAに基づくBSSにおいては, 最適な帯域分割数が存在することがわかった.

    CiNii

  • チャネル数変換型多チャネル音響エコーキャンセラ

    中川 朗, 島内 末廣, 羽田 陽一, 青木 茂明, 牧野 昭二

    電子情報通信学会総合大会講演論文集   2000   140 - 140  2000年03月

    CiNii

  • ステレオエコーキャンセラにおける相互相関変動方式の検討

    鈴木 邦和, 杉山 精, 阪内 澄宇, 島内 末廣, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   99 ( 518 ) 25 - 32  1999年12月

     概要を見る

    ステレオ音声による拡声通話システムで必要となるステレオエコーキャンセラでは,ステレオの受話信号間の相互相関が高い場合が多く,適応フィルタが真のエコー経路に収束しない,真値への収束速度が遅い,といった問題がある.これらの問題を解決するために,人為的に相関を変動させる前処理方式が数多く提案されているが,これらの方式は音声に歪みを伴うという欠点がある.本報告では,実際の通話時における遠端の送話者の微小な移動に着目し,音像の定位を乱すことなく相互相関を変動させる方式を提案する.さらに聴覚特性を考慮した最適化により,収束性能の向上と音声品質保持の両立が可能であるという検討結果を示す.

    CiNii

  • 音響系の変動に着目したステレオ信号の相関低減方法 -(第2報)聴覚特性を考慮した最適化

    鈴木 邦和, 阪内 澄宇, 島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1999 ( 2 ) 495 - 496  1999年09月

    CiNii

  • 音響系の変動に着目したステレオ信号の相関低減方法

    鈴木 邦和, 阪内 澄宇, 島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1999 ( 1 ) 453 - 454  1999年03月

    CiNii

  • ハンズフリー音声会議装置における複数マイクロホンの構成の検討

    中川 朗, 島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1999 ( 1 ) 493 - 494  1999年03月

    CiNii

  • 相互相関の変動付加処理に適したステレオエコーキャンセラの構成の検討

    島内 末廣, 羽田 陽一, 牧野 昭二, 金田 豊

    電子情報通信学会総合大会講演論文集   1998   121 - 121  1998年03月

    CiNii

  • ブロック長を独立にしたブロック高速射影法

    田中 雅史, 牧野 昭二, 小島 順治

    電子情報通信学会総合大会講演論文集   1997   554 - 555  1997年03月

     概要を見る

    Block processing is an effective approach for reducing the computational complexity of adaptive filtering algorithms, although it delays the adaptive filter output and degrades the convergence rate in some implementations. Recently, Benesty [1] proposed a solution to these problems. He introduced the idea of 'exact' block processing, which produces a filter output exactly the same as that of the corresponding sample-by-sample algorithm and has short delay by exploiting the fast FIR filtering method. Block processing can be applied to two parts of adaptive filtering algorithms, i.e. computing the filter output and updating the filter. Conventional 'exact' block algorithms have used the same block size for both parts. This short paper presents the 'exact' block projection algorithm [2] having two independent block sizes, which is listed in List 1. By showing the relation between the filter length and the output delay for a given computation power, we see that the independent block sizes extend the applicability of the 'exact' block fast projection algorithm to uses that tolerate a longer delay.

    CiNii
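
    The abstract above concerns an 'exact' block implementation of the projection algorithm with independent block sizes. The sketch below shows only the underlying sample-by-sample projection (affine projection) recursion of order p that such a block-exact algorithm reproduces; the block fast-FIR machinery itself is not implemented, and the regularisation `delta` and step size `mu` are assumptions.

```python
import numpy as np

def affine_projection(x, d, taps=128, order=4, mu=0.5, delta=1e-4):
    """Sample-by-sample projection (affine projection) algorithm of order p.

    x, d  : input and desired signals (e.g. far-end signal and echo)
    taps  : adaptive filter length L,  order : projection order p (p = 1 is NLMS)
    The 'exact' block algorithm in the entry above produces the same output as
    this recursion while reducing complexity via block fast-FIR filtering; that
    machinery is not reproduced in this sketch.
    """
    w = np.zeros(taps)
    e_out = np.zeros(len(d))
    for n in range(taps + order, len(d)):
        # columns are the p most recent regressor vectors [x[n-k], ..., x[n-k-L+1]]
        U = np.column_stack([x[n - k - taps + 1: n - k + 1][::-1] for k in range(order)])
        e = d[n - np.arange(order)] - U.T @ w          # p most recent a-priori errors
        g = np.linalg.solve(U.T @ U + delta * np.eye(order), e)
        w += mu * U @ g
        e_out[n] = e[0]
    return e_out

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    x = rng.standard_normal(8000)
    h = np.exp(-0.05 * np.arange(128)) * rng.standard_normal(128)   # toy echo path
    d = np.convolve(x, h)[: len(x)]
    e = affine_projection(x, d)
    print("residual power:", np.mean(e[-500:] ** 2))
```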

  • サブバンドエコーキャンセラにおけるフィルタ係数更新ベクトルの平坦化の検討

    中川 朗, 羽田 陽一, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   1996   88 - 88  1996年09月

     概要を見る

    サブバンドエコーキャンセラ(SBEC)では、間引き率を上げ分割数に近付けると、エリアジングの影響により定常消去量が劣化する。これを避けるために間引き率を下げると、適応フィルタへの入力信号に帯域通過フィルタの特性が影響し、収束速度が劣化する。筆者らはこの問題に対し、入力信号と反響信号に異なる特性の分割フィルタを設定する方法を既に提案した。本報告では、適応フィルタ係数の更新部への入力信号が固定の周波数特性を持つことに注目し、これを固定係数の平坦化フィルタで平坦化することによって収束特性を改善する方法を提案する。

    CiNii

  • 拡声通信システムにおける周波数帯域別所要エコー抑圧量の検討

    阪内 澄宇, 牧野 昭二

    日本音響学会研究発表会講演論文集   1996 ( 2 ) 547 - 548  1996年09月

    CiNii

  • 射影アルゴリズムを用いたサブバンドステレオエコーキャンセラ

    牧野 昭二, 島内 末廣, 羽田 陽一, 中川 朗

    日本音響学会研究発表会講演論文集   1996 ( 2 ) 549 - 550  1996年09月

    CiNii

  • 高速射影アルゴリズムの多チャンネル系への適用

    島内 末廣, 田中 雅史, 牧野 昭二

    電子情報通信学会総合大会講演論文集   1996   170 - 170  1996年03月

     概要を見る

    線形未知システムに対する入出力をもとに、そのシステムを同定する一手法として、次数の選択により、演算量に応じた同定速度が得られる射影アルゴリズムがある。高速算法の利用により、演算量はさらに低減可能である。また、ステレオ音響エコーキャンセラヘの適用等、多チャンネル系の同定法としても提案されている。本報告では、多チャンネル系に拡張された射影アルゴリズムに高速算法を適用する。

    CiNii

  • サブバンドエコーキャンセラのプロトタイプフィルタの検討

    中川 朗, 羽田 陽一, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   1995   75 - 75  1995年09月

     概要を見る

    サブバンドエコーキャンセラ(SBEC)は、音声の白色化効果による適応フィルタの収束速度向上、間引きによる演算量の低減が望める。その一方で、帯域分割/合成フィルタ処理による遅延や定常消去量の低下が問題となる。本報告では、図1に示すポリフェーズ型SBECの2つの帯域分割用プロトタイプフィルタA(z)、B(z)のフィルタ長および適応フィルタ長に着目し、収束特性の改善方法について検討した。

    CiNii

  • 擬似入出力関係を利用したステレオ音響エコーキャンセラ用アルゴリズムの検討

    島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1995 ( 2 ) 543 - 544  1995年09月

    CiNii

  • エコーキャンセラ用SSBサブバンド射影アルゴリズム

    牧野 昭二, 羽田 陽一, 中川 朗

    日本音響学会研究発表会講演論文集   1995 ( 2 ) 541 - 542  1995年09月

    CiNii

  • 複素射影サブバンドエコーキャンセラに関する検討

    中川 朗, 羽田 陽一, 牧野 昭二

    日本音響学会研究発表会講演論文集   1995 ( 2 ) 539 - 540  1995年09月

    CiNii

  • 真の音響エコー経路を推定するステレオ射影エコーキャンセラの検討

    島内 末廣, 牧野 昭二

    電子情報通信学会総合大会講演論文集   1995   220 - 220  1995年03月

     概要を見る

    ステレオ音声による通信会議等に不可欠となるステレオエコーキャンセラには、ステレオ信号の相互相関の影響により音響エコー経路の推定を誤る問題がある。このため、話者交替等の度に残留エコーの増大が起きる。本報告では、ステレオ信号の相互相関の変化を強制して利用するステレオ射影エコーキャンセラについて、真の音響エコー経路の推定への有効性と話者交替時の残留エコー増大の低減効果を示す。

    CiNii

  • ES射影アルゴリズムの音響エコーキャンセラへの適用

    牧野 昭二, 羽田 陽一, 田中 雅史, 金田 豊, 小島 順治

    電子情報通信学会総合大会講演論文集   1995   349 - 349  1995年03月

     概要を見る

    エコーキャンセラを実環境で安定に動作させるためには,受話音声の微小音区間に対する対策や,ダブルトーク対策が重要である.ここでは,ES射影アルゴリズムをDuo Filter構成のエコーキャンセラに適用し,速い収束と安定な動作を実現したので報告する.

    CiNii

  • ES射影アルゴリズムを用いたデュオフィルタ構成のエコーキャンセラの検討

    羽田 陽一, 牧野 昭二, 小島 順治, 島内 末廣

    日本音響学会研究発表会講演論文集   1995 ( 1 ) 595 - 596  1995年03月

    CiNii

  • 音響エコーキャンセラ用デュオフィルタコントロールシステム

    羽田陽一, 牧野 昭二, 田中 雅史, 島内 末廣, 小島 順治

    1995電子情報通信学会総合大会, March     350 - 350  1995年

     概要を見る

    音響エコーキャンセラを実環境で動作させるためには、(1)ダブルトーク検出を含め、適応動作制御を如何に行なうか。(2)適応フィルタが音響系を同定していない状態で如何にハウリングを抑えるか。の2点が特に重要となる。(1)のダブルトークの検出技術に関してはこれまで多くの研究がなされてきているが、特に優れた方式はなく、送話信号と受話信号のパワー比較などで行なわれている。また、(2)に関しては音声スイッチとの併用法が提案されているが、音響結合量を予測して最適な挿入損失量を与えないと、結果的に過大な挿入損失を与えてしまい、通話に切断感を与えてしまう。本報告では、ES射影アルゴリズムを用いたDuo Filter Control Systemを提案し、上記2つの問題を解決したので報告する。

    CiNii

  • 高速FIRフィルタリング算法を利用した射影法

    田中雅史, 牧野 昭二, 金田 豊

    信学ソ大, Sept. 1995     81 - 81  1995年

     概要を見る

    近年提案されている高速射影法の演算量は適応フィルタの次数をL、射影の次数をpとすると、約2L+20pであり、演算量2LのNLMS(学習同定法)とほぼ同程度の演算量の少ない手法といえる。しかし、音響エコーキャンセラのようにフィルタ長Lが数百、数千にもなる応用ではさらなる演算量の削減が要求される。本報告では、高速FIRフィルタリング算法を射影法に導入することで、さらに演算量を削減する方法を示す。この提案法では、高速FIRフィルタリング算法がブロック処理を行なうので推定誤差の出力が遅れるが、その他の収束特性は逐次処理に基づくオリジナルの射影法の性能が保たれる。

    CiNii
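
    Plugging numbers into the operation counts quoted above (about 2L multiplications per sample for NLMS and about 2L + 20p for the fast projection algorithm) shows how small the relative overhead is for echo-canceller-sized filters; the fast FIR (block) formulation then reduces the 2L term further at the cost of a block delay in the error output.

        # Rough per-sample multiply counts, using the figures quoted above.
        for L in (256, 1024, 4096):          # adaptive filter length
            for p in (2, 8, 16):             # projection order
                nlms = 2 * L
                fast_proj = 2 * L + 20 * p
                overhead = 100.0 * (fast_proj - nlms) / nlms
                print(f"L={L:5d} p={p:2d}  NLMS~{nlms:6d}  "
                      f"fast projection~{fast_proj:6d}  (+{overhead:.1f}%)")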

  • 音響エコーキャンセラのための適応信号処理の研究

    牧野昭二

    東北大学博士論文   71 ( 12 ) 2212 - 2214  1993年

    CiNii

  • 帯域分割形指数重み付けアルゴリズムを用いた音響エコーキャンセラ

    牧野昭二

    信学全大,SA-9-4    1990年

    CiNii

産業財産権

  • Device for blind source separation

    H. Sawada, S. Araki, R. Mukai, and S. Makino, 牧野, 昭二

    特許権

  • Device for blind source separation

    S. Araki, H. Sawada, S. Makino, and R. Mukai

    特許権

  • Apparatus, method and program for estimation of positional information on signal sources

    H. Sawada, R. Mukai, S. Araki, and S. Makino, 牧野, 昭二

    特許権

  • 音情報処理装置及びプログラム

    牧野, 昭二, 山岡洸瑛, 山田武志, 小野順貴

    特許権

  • 音響処理装置, 音響処理システム及び音響処理方法

    牧野昭二, 石村, 大, 前, 成美, 山田武志, 小野順貴

    特許権

  • 信号処理装置、信号処理方法、プログラム、記録媒体 (可変カットオフ周波数によるポストフィルタリング方法)

    鎌本,優, 守谷,健弘, 原田,登, 千葉,大将, 宮部,滋樹, 山田,武志, 牧野,昭二

    特許権

  • 音声信号処理装置及び方法

    小野,順貴, 宮部,滋樹, 牧野,昭二

    特許権

  • 信号処理装置、信号処理方法、プログラム (ピッチ周波数に依存する可変ゲインによるポストフィルタリング方法)

    鎌本,優, 守谷,健弘, 原田,登, 千葉,大将, 宮部,滋樹, 山田,武志, 牧野,昭二

    特許権

  • 方向情報分布推定装置, 音源数推定装置, 音源方向測定装置, 音源分離装置, それらの方法, それらのプログラム

    荒木, 章子, 中谷, 智広, 澤田, 宏, 牧野, 昭二

    特許権

  • 複数信号区間推定装置, 複数信号区間推定方法, そのプログラムおよび記録媒体

    荒木, 章子, 石塚, 健太郎, 藤本, 雅清, 中谷, 智広, 牧野, 昭二

    特許権

  • 複数信号区間推定装置とその方法と, プログラムとその記録媒体

    荒木, 章子, 石塚, 健太郎, 藤本, 雅清, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム, 記録媒体

    澤田, 宏, 荒木, 章子, 牧野, 昭二

    特許権

  • 多信号強調装置, 方法, プログラム及びその記録媒体

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    特許権

  • ブラインド信号抽出装置, その方法, そのプログラム, 及びそのプログラムを記録した記録媒体

    荒木, 章子, 澤田, 宏, Jan, Cermak, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体, 並びに, 信号到来方向推定装置, 信号到来方向推定方法, 信号到来方向推定プログラム及び記録媒体

    澤田, 宏, 牧野, 昭二, 荒木, 章子, 向井, 良

    特許権

  • 信号到来方向推定装置, 信号到来方向推定方法, 信号到来方向推定プログラム及び記録媒体

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    特許権

  • 信号到来方向推定方法, 装置, プログラムおよびこれを記録した記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    特許権

  • 信号抽出装置, 信号抽出方法, 信号抽出プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 牧野, 昭二, 澤田, 宏, 向井, 良

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号源数の推定方法, 推定装置, 推定プログラム及び記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 牧野, 昭二, 澤田, 宏, 向井, 良

    特許権

  • 信号分離方法, 信号分離装置, 信号分離プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離方法および装置ならびに信号分離プログラムおよびそのプログラムを記録した記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • ブラインド信号分離装置, ブラインド信号分離方法及びブラインド信号分離プログラム

    荒木, 章子, 牧野, 昭二, 向井, 良, 澤田, 宏

    特許権

  • ブラインド信号分離装置, ブラインド信号分離方法及びブラインド信号分離プログラム

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    特許権

  • ブラインド信号分離方法, ブラインド信号分離プログラム及び記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号到来方向推定方法, 装置, プログラムおよびこれを記録した記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • SmoothQuiet

    牧野, 昭二, 小島, 順治

    特許権

  • QuiteSmooth

    牧野, 昭二, 小島, 順治

    特許権

  • EchoCam

    牧野, 昭二, 小島 順治

    特許権

  • SUBBANDES

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • ESPARC

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    特許権

  • Radespa

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    特許権

  • DISCAS

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    特許権

  • ES射影アルゴリズム

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • デュオフィルタ

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • インテリジェント ロス コントローラ

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • フェールセーフ適応動作制御方式

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • スムーストーク

    牧野, 昭二, 小島, 順治

    特許権

 

現在担当している科目

担当経験のある科目(授業)

  • 情報科学概論Ⅱ

    筑波大学  

 

他学部・他研究科等兼任情報

  • 理工学術院   基幹理工学部

学内研究所・附属機関兼任歴

  • 2022年
    -
    2024年

    理工学術院総合研究所   兼任研究員

特定課題制度(学内資金)

  • ⾳環境の認識と理解のための⾰新的マイクロホンアレー基盤技術の研究

    2023年  

     概要を見る

    ブラインド処理と空間正則化処理に基づいてオンライン音源分離,残響除去,およびノイズ低減を実行する,計算効率の高い同時最適化アルゴリズムを提案した.まず,独立ベクトル抽出(IVE)と重み付き予測誤差残響除去(WPE)のブラインドオンライン同時最適化アルゴリズムを提案した.このオンラインアルゴリズムは,WPEを使用することで残響を低減できるため,短い分析フレームでも正確な分離を実現できた.次に,オンライン同時最適化をロバストな空間正則化で拡張した.DOA ベースの空間正則化を確実に機能させるためには,分離された信号のスケールを正規化することが非常に効果的であることを明らかにした.実験では,ブラインドオンライン同時最適化アルゴリズムが 8 ms のアルゴリズム遅延で分離精度を大幅に改善できることを確認した.さらに,提案した空間正則化オンライン同時最適化アルゴリズムが音源順序エラーを 0 % に低減することを確認した.
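
    The scale normalization that makes the DOA-based spatial regularization reliable is commonly implemented by 'projection back' onto a reference microphone. The sketch below shows that normalization step only, with assumed array shapes; it is one common realization, not necessarily the exact formulation used in this project.

        import numpy as np

        def projection_back(Y, W, ref_mic=0):
            """Rescale separated signals to their image at a reference microphone.

            Y : (F, T, N) separated STFT coefficients (frequency, frame, source)
            W : (F, N, M) demixing matrices per frequency bin
            Removes the arbitrary per-bin scale of the separated sources, which
            helps keep DOA-based spatial regularization well behaved.
            """
            A = np.linalg.pinv(W)                  # (F, M, N) estimated mixing matrices
            scale = A[:, ref_mic, :]               # each source's image at the reference mic
            return Y * scale[:, None, :]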

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    2022年  

     概要を見る

    空間正則化付き独立ベクトル抽出(SRIVE)は,事前推定した音響伝達関数を用いて,所望の出力順序になるように音源分離を行う.しかし,従来のSRIVEはスケール任意性や伝達関数の誤差による出力順序誘導への影響が十分に考慮されていなかった.本研究では,空間正則化に加えてさらに分離フィルタのスケールを小さくする正則化を導入することで上記の問題の解決を試みた.実験より,スケール正則化が分離性能(SDR)を維持しつつ,出力順序正答率を75%から100%に改善することを確かめた.
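
    One plausible way to combine the spatial regularization with the additional scale-shrinking regularization described above is as two penalty terms on each demixing filter; the sketch below writes their gradients (with respect to the conjugate filter) for a single source and frequency bin. The quadratic penalty forms and weights are illustrative assumptions, not the exact objective used in this project.

        import numpy as np

        def regularizer_gradient(w, a, lam_spatial=1.0, lam_scale=0.1):
            """Gradient (w.r.t. conj(w)) of two penalties added to an IVE-type objective.

            w : (M,) complex demixing filter for one source and one frequency bin
            a : (M,) pre-estimated steering / transfer-function vector for that source
            Spatial penalty  lam_spatial * |w^H a - 1|^2 : steers the output order by
                pulling the filter toward a distortionless response for its source.
            Scale penalty    lam_scale * ||w||^2         : shrinks the filter scale so
                that errors in the estimated transfer function do not dominate.
            """
            # d/d(conj w) of |w^H a - 1|^2 is (a^H w - 1) * a, with a^H w = np.vdot(a, w).
            grad_spatial = lam_spatial * (np.vdot(a, w) - 1.0) * a
            grad_scale = lam_scale * w                 # d/d(conj w) of ||w||^2 is w
            return grad_spatial + grad_scale

        # Example for M = 4 microphones; descend along -g inside the IVE update.
        rng = np.random.default_rng(0)
        w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        g = regularizer_gradient(w, a)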

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    2021年  

     概要を見る

    This research explores whether the newly proposed online algorithm that jointly optimizes weighted prediction error (WPE) and independent vector analysis (IVA) works well in separating moving sound sources in reverberant indoor environments. The moving source is first fixed and then rotated 60 degrees in a room at a speed of less than 10 cm/s, while the other remains fixed. Through the comparison of the online-AuxIVA, online-WPE+IVA (separate), and online-WPE+IVA (joint) algorithms, we can conclude that the online-WPE+IVA (joint) method has the best separation performance when the sources are fixed, but online-WPE+IVA (separate) is more stable and has better performance when removing moving sources from the mixed sound.
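
    For completeness, separation performance comparisons such as the one above are usually quantified with an SDR-type metric. A scale-invariant SDR can be computed as below; this is a generic metric implementation, not necessarily the exact evaluation protocol used in this project.

        import numpy as np

        def si_sdr(estimate, reference, eps=1e-12):
            """Scale-invariant signal-to-distortion ratio in dB."""
            ref = reference - reference.mean()
            est = estimate - estimate.mean()
            target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
            noise = est - target
            return 10.0 * np.log10((np.dot(target, target) + eps) /
                                   (np.dot(noise, noise) + eps))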