Updated on 2024/12/21

 
MAKINO, Shoji
 
Affiliation
Faculty of Science and Engineering, Graduate School of Information, Production, and Systems
Job title
Professor
Degree
Doctorate (Tohoku University)
Profile

Shoji Makino is a Professor at Waseda University and was previously a Professor at the University of Tsukuba. His research interests include adaptive filtering technologies and the realization of acoustic echo cancellation, blind source separation of convolutive mixtures of speech, and acoustic signal processing for speech and audio applications. He was the Chair of the TC on Blind Signal Processing of the IEEE CAS Society, General Chair of the IEEE WASPAA2007, Associate Editor of the IEEE Trans. SAP, Vice President of the Engineering Sciences Society of the IEICE, and the Chair of the TC on Engineering Acoustics of the IEICE.

Research Experience

  • 2021.04
    -
    Now

    Waseda University   Graduate School of Information, Production, and Systems   Professor

  • 2009.04
    -
    2021.03

    University of Tsukuba   Tsukuba Advanced Research Alliance and Graduate School of Systems and Information Engineering   Professor

  • 2014.04
    -
    2018.03

    National Institute of Informatics   Guest Professor

  • 2013.04
    -
    2018.03

    RIKEN   Visiting Researcher

  • 2008.04
    -
    2009.03

    NTT Communication Science Laboratories, Atsugi, Japan.   Senior Research Scientist, Supervisor

  • 2008.12
    -
    2009.02

    University Erlangen-Nuremberg, Germany.   Guest Professor

  • 2004.04
    -
    2008.03

    Hokkaido University, Sapporo, Japan.   Guest Professor

  • 2003.04
    -
    2008.03

    the NTT Communication Science Laboratories, Atsugi, Japan.   Media Information Laboratory   Executive Manager

  • 2006.04
    -
    2007.03

    The University of Tokyo   The Graduate School of Information Science and Technology

  • 2000.04
    -
    2003.03

    the NTT Communication Science Laboratories, Kyoto, Japan.   Group Leader at the Speech Open Laboratory   Senior Research Scientist, Supervisor

  • 1999.01
    -
    2000.03

    the NTT Lifestyle and Environmental Technology Laboratories, Atsugi, Japan.   Group Leader at the Multimedia Electronics Laboratory   Senior Research Engineer, Supervisor

  • 1996.07
    -
    1998.12

    the NTT Multi-Media System Laboratory Group, Yokosuka, Japan.   Strategic Planning   Senior Research Engineer, Supervisor

  • 1987.08
    -
    1996.06

    the NTT Human Interface Laboratories, Musashino, Japan.   Speech and Acoustics Laboratory   Senior Research Engineer, Supervisor

  • 1981.04
    -
    1987.07

    NTT Electrical Communication Laboratory, Yokosuka, Japan.   Research Engineer

Education Background

  • 1993.03
    -
     

    Tohoku University  

  • 1979.04
    -
    1981.03

    Tohoku University   Graduate School of Engineering   Mechanical Engineering  

  • 1975.04
    -
    1979.03

    Tohoku University   Faculty of Engineering  

Committee Memberships

  • 2019
    -
    Now

    Japan Society for the Promotion of Science (JSPS)  Member of the Grants-in-Aid for Scientific Research Sub-Committee

  • 2019
    -
    Now

    European Association for Signal Processing (EURASIP)  Member of the Special Area Team on Acoustic, Speech and Music Signal Processing

  • 2018
    -
    Now

    Asia Pacific Signal and Information Processing Association  Member of the Signal and Information Processing Theory and Methods Technical Committee

  • 2014.05
    -
    Now

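    Institute of Electronics, Information and Communication Engineers (IEICE)  Advisor of the Technical Committee on Engineering Acoustics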

  • 2013
    -
    Now

    Acoustical Society of Japan (ASJ)  Member of the Board of Directors

  • 2007
    -
    Now

    Institute of Electronics, Information and Communication Engineers (IEICE)  Fellow

  • 2005
    -
    Now

    Acoustical Society of Japan (ASJ)  Councilor

  • 2004.04
    -
    Now

    International Speech Communication Association (ISCA)  Member

  • 2004
    -
    Now

    Institute of Electrical and Electronics Engineers (IEEE)  Fellow

  • 2003
    -
    Now

    Acoustical Society of Japan (ASJ)  Representative

  • 2003
    -
    Now

    International ICA Steering Committee  Member

  • 2000.04
    -
    Now

    European Association for Signal Processing (EURASIP)  Member

  • 1999
    -
    Now

    International Workshop on Acoustic Echo and Noise Control  International IWAENC Standing Committee Member

  • 1989.04
    -
    Now

    Institute of Electrical and Electronics Engineers (IEEE)  Member

  • 1988.04
    -
    Now

    Institute of Electronics, Information and Communication Engineers (IEICE)  Member

  • 1983.04
    -
    Now

    Acoustical Society of Japan (ASJ)  Member

  • 2018
    -
    2020

    IEEE Signal Processing Society  Member of the Board of Governors

  • 2019
     
     

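    Japan Society for the Promotion of Science (JSPS)  Member of the Review Opinion Committee for Grants-in-Aid for Scientific Research (S)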

  • 2018
    -
    2019

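    Japan Society for the Promotion of Science (JSPS)  Document Reviewer and Document Evaluator for the International Program Committee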

  • 2018
    -
    2019

    Japan Society for the Promotion of Science (JSPS)  Expert Member of the Research Fellowship Screening Committee

  • 2018
    -
    2019

    2018 International Workshop on Acoustic Signal Enhancement  General Chair

  • 2017
    -
    2018

    IEEE Signal Processing Society Japan Chapter  Chair

  • 2015
    -
    2018

    Institute of Electrical and Electronics Engineers (IEEE)  Member of Jack S. Kilby Signal Processing Medal Committee

  • 2013
    -
    2015

    Japan Society for the Promotion of Science (JSPS)  Expert Member of the Grants-in-Aid for Scientific Research Committee

  • 2013
    -
    2015

    IEEE Signal Processing Magazine  Guest Editor

  • 2014
     
     

    Acoustical Society of Japan (ASJ)  Chair of the Selection Committee for the Itakura Prize Innovative Young Researcher Award

  • 2013
    -
    2014

    IEEE Signal Processing Society  Technical Directions Board Member

  • 2013
    -
    2014

    IEEE Signal Processing Society  Chair of the Audio and Acoustic Signal Processing Technical Committee

  • 2013.07
     
     

    2013 International Conference of the IEEE Engineering in Medicine and Biology (EMBC2013)  Tutorial Speaker

  • 2012
    -
    2013

    2012 IEEE International Conference on Acoustics, Speech, and Signal Processing  Plenary Chair

  • 2011
    -
    2012

    2011 Annual Conference of the International Speech Communication Association  Tutorial Speaker

  • 2005
    -
    2012

    European Association for Signal Processing  Associate Editor of the EURASIP JASP

  • 2009
    -
    2011

    IEEE Japan Council  Awards Committee Member

  • 2008
    -
    2011

    Institute of Electrical and Electronics Engineers (IEEE)  James L. Flanagan Speech & Audio Processing Award Committee Member

  • 2009
    -
    2010

    Institute of Electronics, Information and Communication Engineers (IEICE)  Member of the Fellow Nomination Committee

  • 2009
    -
    2010

    IEEE Signal Processing Society  Distinguished Lecturer

  • 2008
    -
    2009

    2008 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays  Panelist

  • 2008
     
     

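    Institute of Electronics, Information and Communication Engineers (IEICE)  Member of the Paper Award Selection Committee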

  • 2007
    -
    2008

    2007 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  General Chair

  • 2007
    -
    2008

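    Institute of Electronics, Information and Communication Engineers (IEICE)  Chair of the Acoustics and Ultrasonics Subsociety, Engineering Sciences Society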

  • 2007
    -
    2008

    2007 IEEE International Conference on Acoustics, Speech and Signal Processing  Tutorial Speaker

  • 2007
    -
    2008

    2007 International Conference on Independent Component Analysis and Signal Separation  Keynote Speaker

  • 2006
    -
    2008

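    Institute of Electronics, Information and Communication Engineers (IEICE)  Chair of the Technical Committee on Engineering Acoustics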

  • 2006
    -
    2008

    IEEE Signal Processing Society  Awards Board Member

  • 2006
    -
    2007

    Acoustical Society of Japan (ASJ)  Member of the Selection Committee for the Awaya Prize Young Researcher Award

  • 2005
    -
    2006

    2005 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays  Panelist

  • 2002
    -
    2005

    Institute of Electrical and Electronics Engineers (IEEE)  Associate Editor of the IEEE Trans. Speech and Audio Processing

  • 2001
    -
    2005

    Acoustical Society of Japan (ASJ)  Member of the Selection Committee for the Sato Prize Paper Award

  • 2003
    -
    2004

    2003 International Workshop on Acoustic Echo and Noise Control  General Chair

  • 2002
    -
    2004

    IEEE Signal Processing Society  Conference Board Member

  • 2013
    -
    Now

    European project Embedded Audition for Robots  Advisory Board member

  • 2006
    -
    Now

    International Advisory Panel Member

  • 2003
    -
    Now

    Acoustical Society of Japan  Council member

  • 2020
    -
    2021

    2020 European Signal Processing Conference  Special Session Organizer

  • 2020
    -
    2021

    2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2020
    -
    2021

    2020 European Signal Processing Conference  Area Chair

  • 2020
    -
    2021

    2020 International Workshop on Acoustic Echo and Noise Control  Member of the Organizing Committee

  • 2019
    -
    2020

    2019 European Signal Processing Conference  Special Session Organizer

  • 2019
    -
    2020

    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2019
    -
    2020

    2019 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)  Member of the Technical Committee

  • 2019
    -
    2020

    IEEE Signal Processing Society  Member of the TC Review Committee

  • 2018
    -
    2020

    IEEE Signal Processing Society  Member of the Long-Range Planning and Implementation Committee

  • 2018
    -
    2019

    2018 European Signal Processing Conference  Special Session Organizer

  • 2018
    -
    2019

    2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2018
    -
    2019

    2018 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2018
    -
    2019

    2018 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair

  • 2017
    -
    2018

    2017 European Signal Processing Conference  Special Session Organizer

  • 2017
    -
    2018

    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2017
    -
    2018

    2017 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair

  • 2016
    -
    2017

    Special Session Organizer

  • 2016
    -
    2017

    2016 European Signal Processing Conference  Member of the Technical Program Committee

  • 2016
    -
    2017

    2016 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2016
    -
    2017

    Area Chair

  • 2016
    -
    2017

    Area Chair

  • 2016
    -
    2017

    IEEE Signal Processing Society  Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee

  • 2012
    -
    2017

    IEEE Signal Processing Society  Chair of the Fellow Subcommittee of the Audio and Acoustic Signal Processing Technical Committee

  • 2015
    -
    2016

    2015 European Signal Processing Conference  Special Session Organizer

  • 2015
    -
    2016

    2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2015
    -
    2016

    2015 AEARU Workshop on Computer Science and Web Technology  Member of the Program Committee

  • 2015
    -
    2016

    2015 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair

  • 2015
    -
    2016

    IEEE Signal Processing Society Japan Chapter  Vice Chair

  • 2015
    -
    2016

    2015 International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)  Special Sessions Chair

  • 2015
    -
    2016

    2015 European Signal Processing Conference  Area Chair

  • 2010
    -
    2016

    Asia Pacific Signal and Information Processing Association  Member of the Speech, Language, and Audio Technical Committee

  • 2015
     
     

    IEEE Signal Processing Society  Past Chair of the Audio and Acoustic Signal Processing Technical Committee

  • 2015
     
     

    2015 IEEE International Workshop on Applications of Signal Processing to Audio  Member of the Technical Program Committee

  • 2015
     
     

    IEEE Signal Processing Society  Vice Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee

  • 2014
    -
    2015

    2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2014
    -
    2015

    2014 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2014
    -
    2015

    2014 Hands-free Speech Communication and Microphone Arrays  Member of the Technical Program Committee

  • 2014
    -
    2015

    Symposia at the 2014 IEEE Global Conference on Signal and Information Processing  Member of the Organizing Committee

  • 2014
    -
    2015

    2014 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair

  • 2014
    -
    2015

    2014 European Signal Processing Conference  Area Chair

  • 2013
    -
    2014

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer

  • 2013
    -
    2014

    2013 European Signal Processing Conference  Special Session Organizer

  • 2013
    -
    2014

    2013 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair

  • 2013
    -
    2014

    2013 European Signal Processing Conference  Area Chair

  • 2012
    -
    2013

    Special Session Organizer

  • 2012
    -
    2013

    2012 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2011.04
    -
    2012.03

    Acoustical Society of Japan (ASJ)  Guest Editor-in-Chief of a Special Section of the Journal of the Acoustical Society of Japan

  • 2011
    -
    2012

    2011 Hands-free Speech Communication and Microphone Arrays  Member of the Technical Program Committee

  • 2011
    -
    2012

    2011 European Signal Processing Conference  Member of the Technical Program Committee

  • 2011
    -
    2012

    IEEE Signal Processing Society  Vice Chair of the Audio and Acoustic Signal Processing Technical Committee

  • 2011
    -
    2012

    European Association for Signal Processing (EURASIP)  Guest Editor of the EURASIP Journal on Applied Signal Processing

  • 2010
    -
    2011

    2010 Asia-Pacific Signal and Information Processing Conference  Member of the Technical Committee

  • 2010
    -
    2011

    2010 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2010
    -
    2011

    2010 IEEE International Symposium on Circuits and Systems  Track Chair

  • 2009
    -
    2010

    2009 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Organizing Committee

  • 2009
    -
    2010

    2009 IEEE International Symposium on Circuits and Systems  Track Chair

  • 2009
    -
    2010

    2009 European Signal Processing Conference  Area Chair

  • 2009
    -
    2010

    IEEE Circuits and Systems Society  Chair of the Blind Signal Processing Technical Committee

  • 2008
    -
    2010

    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. Circuits and Systems-I

  • 1990
    -
    2010

    IEEE Signal Processing Society  Member of the Audio and Acoustic Signal Processing Technical Committee

  • 2008
    -
    2009

    2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays  Special Session Organizer

  • 2008
    -
    2009

    2008 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2008
    -
    2009

    2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays  Technical Co-Chair

  • 2008
    -
    2009

    2008 Workshop on Statistical and Perceptual Audition  Co-Organizer

  • 2008
    -
    2009

    2008 IEEE International Symposium on Circuits and Systems  Member of the Review Committee

  • 2007
    -
    2009

    Institute of Electronics, Information and Communication Engineers (IEICE)  Guest Editor-in-Chief of a Special Section of the IEICE Transactions

  • 2007
    -
    2008

    2007 IEEE International Symposium on Circuits and Systems  Special Session Organizer

  • 2007
    -
    2008

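    Institute of Electronics, Information and Communication Engineers (IEICE)  Vice President of the Engineering Sciences Society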

  • 2007
    -
    2008

    2007 IEEE International Symposium on Circuits and Systems  Member of the Review Committee

  • 2007
    -
    2008

    Chair-Elect of the Blind Signal Processing Technical Committee

  • 2006
    -
    2008

    Institute of Electronics, Information and Communication Engineers (IEICE)  Guest Editor-in-Chief of a Special Section of the IEICE Transactions

  • 2006
    -
    2007

    2006 Asilomar Conference on Signals, Systems, and Computers  Special Session Organizer

  • 2006
    -
    2007

    2006 European Signal Processing Conference  Special Session Organizer

  • 2006
    -
    2007

    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Special Session Organizer

  • 2006
    -
    2007

    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Special Session Organizer

  • 2006
    -
    2007

    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Member of the International Program Committee

  • 2006
    -
    2007

    2006 European Signal Processing Conference  Member of the Technical Program Committee

  • 2006
    -
    2007

    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Member of the Organizing Committee

  • 2006
    -
    2007

    2006 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2006
    -
    2007

    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Member of the Technical Committee

  • 2006
    -
    2007

    2006 IEEE International Symposium on Circuits and Systems  Member of the Review Committee

  • 2006
    -
    2007

    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. Computers

  • 2006
    -
    2007

    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Program Committee Chair

  • 2005
    -
    2007

    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. ASLP

  • 2005
    -
    2006

    2005 IEEE International Symposium on Circuits and Systems  Member of the Review Committee

  • 2005
    -
    2006

    2005 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Organizing Committee

  • 2005
    -
    2006

    2005 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 2004
    -
    2006

    IEEE Circuits and Systems Society  Member of the Blind Signal Processing Technical Committee

  • 2003.04
    -
    2005.03

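    Institute of Electronics, Information and Communication Engineers (IEICE)  Member of the Technical Committee on Engineering Acoustics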

  • 2004
    -
    2005

    2004 International Congress on Acoustics  Special Session Organizer

  • 2004
    -
    2005

    2004 IEEE International Conference on Acoustics, Speech and Signal Processing  Special Session Organizer

  • 2004
    -
    2005

    2004 Workshop on Communication Scene Analysis  Program Chair

  • 2004
    -
    2005

    2004 Workshop on Statistical and Perceptual Audio Processing  Member of the Technical Committee

  • 2004
    -
    2005

    2004 International Congress on Acoustics  Member of the Program Committee

  • 2001
    -
    2005

    Acoustical Society of Japan  Secretary for Electroacoustics, Editorial Committee of the Journal of the Acoustical Society of Japan

  • 2003
    -
    2004

    2003 IEEE International Workshop on Neural Networks for Signal Processing  Member of the Program Committee

  • 2003
    -
    2004

    2003 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Program Committee

  • 2003
    -
    2004

    2003 International Conference on Independent Component Analysis and Blind Signal Separation  Organizing Chair

  • 2001.04
    -
    2003.03

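    Institute of Electronics, Information and Communication Engineers (IEICE)  Vice Chair of the Technical Committee on Engineering Acoustics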

  • 2002
    -
    2003

    Institute of Electronics, Information and Communication Engineers (IEICE)  Guest Editor of a Special Section of the IEICE Transactions

  • 2002
    -
    2003

    2002 China-Japan Joint Conference on Acoustics  Member of the Organizing Committee

  • 2002
    -
    2003

    2002 IEEE International Workshop on Neural Networks for Signal Processing  Member of the Program Committee

  • 1999
    -
    2003

    Institute of Electrical and Electronics Engineers (IEEE)  Senior Member

  • 2001
    -
    2002

    2001 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 1992.04
    -
    2001.03

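    Institute of Electronics, Information and Communication Engineers (IEICE)  Member of the Technical Committee on Engineering Acoustics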

  • 1999
    -
    2000

    1999 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee

  • 1995
    -
    1997

    Acoustical Society of Japan  Member of the Research Meeting Preparation Committee

  • 1990.04
    -
    1992.03

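    Institute of Electronics, Information and Communication Engineers (IEICE)  Secretary of the Technical Committee on Engineering Acoustics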

  • 1990
    -
    1992

    Institute of Electronics, Information and Communication Engineers (IEICE)  Guest Editor of a Special Section of the IEICE Transactions

Professional Memberships

  •  
     
     

    ASJ (Acoustical Society of Japan)

  •  
     
     

    IEICE (Institute of Electronics, Information and Communication Engineers)

  •  
     
     

    APSIPA (Asia Pacific Signal and Information Processing Association)

  •  
     
     

    ISCA (International Speech Communication Association)

  •  
     
     

    EURASIP (European Association for Signal Processing)

  •  
     
     

    IEEE (Institute of Electrical and Electronics Engineers)

Research Areas

  • Perceptual information processing / Intelligent robotics / Intelligent informatics

Research Interests

  • Media Information Processing

  • Digital Signal Processing

  • Acoustic Signal Processing

Awards

  • Hoko Award

    2018.10   Hattori Hokokai Foundation  

  • Outstanding Contribution Award of the Institute of Electronics, Information, and Communication Engineers

    2018.06   Institute of Electronics, Information, and Communication Engineers  

  • Paper Award of the Acoustical Society of Japan

    2018.03   Acoustical Society of Japan  

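  • Achievement Award of the Institute of Electronics, Information and Communication Engineers

    2017.06   Institute of Electronics, Information and Communication Engineers  

    Winner: Shoji Makino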

  • Prizes for Science and Technology Research Category

    2015.04  

    Winner: Makino Shoji

    The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology

  • TELECOM System Technology Award

    2015.03   Telecommunications Advancement Foundation  

    Winner: Makino Shoji

  • IEEE Signal Processing Society Best Paper Award

    2014.01   IEEE Signal Processing Society  

    Winner: Makino Shoji

  • Distinguished Lecturer

    2009.01   IEEE  

    Winner: Shoji Makino

  • Fellow

    2007.09   IEICE  

    Winner: Shoji Makino

  • MLSP Competition Award

    2007.08   IEEE  

    Winner: Shoji Makino

  • Best Presentation Award at the SPIE Defense and Security Symposium

    2006.04   SPIE  

    Winner: Makino Shoji

  • ICA Unsupervised Learning Pioneer Award

    2006.04   SPIE  

    Winner: Makino Shoji

  • Paper Award

    2005.05   IEICE  

    Winner: Shoji Makino

  • TELECOM System Technology Award

    2004.03   Telecommunications Advancement Foundation  

    Winner: Shoji Makino

  • Fellow

    2004.01   IEEE  

    Winner: Shoji Makino

  • Best Paper Award of the International Workshop on Acoustic Echo and Noise Control

    2003.09  

    Winner: Makino Shoji

  • Paper Award

    2002.05   IEICE  

    Winner: Shoji Makino

  • Paper Award

    2002.03   ASJ  

    Winner: Shoji Makino

  • Achievement Award

    1997.05   IEICE  

    Winner: Shoji Makino

  • Outstanding Technological Development Award

    1995.05   ASJ  

    Winner: Shoji Makino

  • IEEE Signal Processing Society Notable Services and Contributions Award

    2019   IEEE Signal Processing Society  

    Winner: Makino Shoji

  • IEEE Signal Processing Society Chapter Leadership Award

    2018.12   IEEE Signal Processing Society  

    Winner: Shoji Makino

  • Best Faculty Member Award of the University of Tsukuba

    2016.02  

    Winner: Shoji Makino

  • IEEE Signal Processing Society Outstanding Service Award

    2014.12   IEEE Signal Processing Society  

    Winner: Makino Shoji

 

Papers

  • Time-Frequency-Bin-Wise Linear Combination of Beamformers for Distortionless Signal Enhancement.

    Kouei Yamaoka, Nobutaka Ono, Shoji Makino

    IEEE/ACM Transactions on Audio, Speech and Language Processing   29   3461 - 3475  2021

    Citations (Scopus): 12

  • Multichannel Signal Enhancement Algorithms for Assisted Listening Devices

    Simon Doclo, Walter Kellermann, Shoji Makino, Sven Nordholm

    IEEE SIGNAL PROCESSING MAGAZINE   32 ( 2 ) 18 - 30  2015.03  [Refereed]

    Citations (Scopus): 181

  • Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   19 ( 3 ) 516 - 527  2011.03  [Refereed]

    This paper presents a blind source separation method for convolutive mixtures of speech/audio sources. The method can even be applied to an underdetermined case where there are fewer microphones than sources. The separation operation is performed in the frequency domain and consists of two stages. In the first stage, frequency-domain mixture samples are clustered into each source by an expectation-maximization (EM) algorithm. Since the clustering is performed in a frequency bin-wise manner, the permutation ambiguities of the bin-wise clustered samples should be aligned. This is solved in the second stage by using the probability on how likely each sample belongs to the assigned class. This two-stage structure makes it possible to attain a good separation even under reverberant conditions. Experimental results for separating four speech signals with three microphones under reverberant conditions show the superiority of the new method over existing methods. We also report separation results for a benchmark data set and live recordings of speech mixtures.

    Citations (Scopus): 312

  • Frequency-Domain Pearson Distribution Approach for Independent Component Analysis (FD-Pearson-ICA) in Blind Source Separation

    Hiroko Kato Solvang, Yuichi Nagahara, Shoko Araki, Hiroshi Sawada, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   17 ( 4 ) 639 - 649  2009.05  [Refereed]

    In frequency-domain blind source separation (BSS) for speech with independent component analysis (ICA), a practical parametric Pearson distribution system is used to model the distribution of frequency-domain source signals. ICA adaptation rules have a score function determined by an approximated signal distribution. Approximation based on the data may produce better separation performance than we can obtain with ICA. Previously, a conventional hyperbolic tangent (tanh) or generalized Gaussian distribution (GGD) was uniformly applied to the score function for all frequency bins, even though a wideband speech signal has different distributions at different frequencies. To deal with this, we propose modeling the signal distribution at each frequency by adopting a parametric Pearson distribution and employing it to optimize the separation matrix in the ICA learning process. The score function is estimated by the appropriate Pearson distribution parameters for each frequency bin. We devised three methods for Pearson distribution parameter estimation and conducted separation experiments with real speech signals convolved with actual room impulse responses (T60 = 130 ms). Our experimental results show that the proposed frequency-domain Pearson-ICA (FD-Pearson-ICA) adapted well to the characteristics of frequency-domain source signals. With FD-Pearson-ICA, the signal-to-interference ratio (SIR) improved significantly, by around 2-3 dB, compared with conventional nonlinear functions. Even when the SIR values of FD-Pearson-ICA were poor, the performance based on a disparity measure between the true score function and the estimated parametric score function clearly showed the advantage of FD-Pearson-ICA. Furthermore, we optimized the proposed approach with regard to separation performance. By combining individual distribution parameters directly estimated at low frequencies with the appropriate parameters optimized at high frequencies, it was possible to reasonably improve the FD-Pearson-ICA performance without any significant increase in the computational burden compared with conventional nonlinear functions.

    Citations (Scopus): 16

  • Grouping separated frequency components by estimating propagation model parameters in frequency-domain blind source separation

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1592 - 1604  2007.07  [Refereed]

    This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.

    Citations (Scopus): 108

  • Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures

    Scott C. Douglas, Malay Gupta, Hiroshi Sawada, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1511 - 1520  2007.07  [Refereed]

    This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvärinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary constraints on the multichannel separation filter. The techniques converge quickly to a separation solution without any step size selection or divergence difficulties, and unlike other methods, ours do not require special coefficient initialization procedures to obtain good separation performance. They also allow for the efficient reconstruction of individual signals as observed in the sensor measurements directly from the system parameters for single-input multiple-output blind source separation tasks. An analysis of one of the adaptive constraint procedures shows its fast convergence to a paraunitary filter bank solution. Numerical evaluations of the proposed algorithms and comparisons with several existing convolutive blind source separation techniques indicate the excellent relative performance of the proposed methods.

    Citations (Scopus): 70

  • Geometrically constrained independent component analysis

    Mirko Knaak, Shoko Araki, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 2 ) 715 - 726  2007.02  [Refereed]

    Acoustical signals are often corrupted by other speech, other sources, and background noise. This makes it necessary to use some form of preprocessing so that signal processing systems such as a speech recognizer or machine diagnosis can be effectively employed. In this contribution, we introduce and evaluate a new algorithm that uses independent component analysis (ICA) with a geometrical constraint [constrained ICA (CICA)]. It is based on the fundamental similarity between an adaptive beamformer and blind source separation with ICA, and does not suffer from the permutation problem of ICA algorithms. Unlike conventional ICA algorithms, CICA needs prior knowledge about the rough direction of the target signal. However, it is more robust against an erroneous estimation of the target direction than adaptive beamformers: CICA converges to the right solution as long as its look direction is closer to the target signal than to the jammer signal. A high degree of robustness is very important since the geometrical prior of an adaptive beamformer is always roughly estimated in a reverberant environment, even when the look direction is precise. The effectiveness and robustness of the new algorithm are proven theoretically and shown experimentally for three sources and three microphones with several sets of real-world data.

    DOI

    Scopus

    44
    Citation
    (Scopus)
  • Blind extraction of dominant target sources using ICA and time-frequency masking

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   14 ( 6 ) 2165 - 2173  2006.11  [Refereed]

     View Summary

    This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to sensors, to have dominant powers at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new sophisticated method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room, whose reverberation time was 130 ms, are presented to show the effectiveness and characteristics of the proposed method.

    DOI

    Scopus

    92
    Citation
    (Scopus)
  • Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters

    Scott C. Douglas, Hiroshi Sawada, Shoji Makino

    IEEE Transactions on Speech and Audio Processing   13 ( 1 ) 92 - 104  2005.01  [Refereed]

     View Summary

    Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as convolutive blind source separation and multichannel blind deconvolution. When developing practical implementations of such methods, however, it is not clear how best to window the signals and truncate the filter impulse responses within the filtered gradient updates. In this paper, we show how inadequate use of truncation of the filter impulse responses and signal windowing within a well-known natural gradient algorithm for multichannel blind deconvolution and source separation can introduce a bias into its steady-state solution. We then provide modifications of this algorithm that effectively mitigate these effects for estimating causal FIR solutions to single- and multichannel equalization and source separation tasks. The new multichannel blind deconvolution algorithm requires approximately 6.5 multiply/adds per adaptive filter coefficient, making its computational complexity about 63% greater than the originally-proposed version. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for real-world acoustic sources, even for extremely-short separation filters.

    DOI

    Scopus

    79
    Citation
    (Scopus)
  • A robust and precise method for solving the permutation problem of frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   12 ( 5 ) 530 - 538  2004.09  [Refereed]

     View Summary

    Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin should be aligned so that a separated signal in the time-domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even for low frequencies where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.

    DOI

    Scopus

    443
    Citation
    (Scopus)
  • The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech

    S Araki, R Mukai, S Makino, T Nishikawa, H Saruwatari

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   11 ( 2 ) 109 - 116  2003.03  [Refereed]

     View Summary

    Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T > P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, it mainly removes the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.

    DOI

    Scopus

    223
    Citation
    (Scopus)
  • Common-acoustical-pole and zero modeling of head-related transfer functions

    Y Haneda, S Makino, Y Kaneda, N Kitawaki

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   7 ( 2 ) 188 - 196  1999.03  [Refereed]

     View Summary

    Use of a common-acoustical-pole and zero model is proposed for modeling head-related transfer functions (HRTF's) for various directions of sound incidence. The HRTF's are expressed using the common acoustical poles, which do not depend on the source directions, and the zeros, which do. The common acoustical poles are estimated as they are common to HRTF's for various source directions; the estimated values of the poles agree well with the resonance frequencies of the ear canal. Because this model uses only the zeros to express the HRTF variations due to changes in source direction, it requires fewer parameters (the order of the zeros) that depend on the source direction than do the conventional all zero or pole/zero models. Furthermore, the proposed model can extract the zeros that are missed in the conventional models because of pole-zero cancellation. As a result, the directional dependence of the zeros can be traced well. Analysis of the zeros for HRTF's on the horizontal plane showed that the nonminimum-phase zero variation was well formulated using a simple pinna-reflection model. The common-acoustical-pole and zero (CAPZ) model is thus effective for modeling and analyzing HRTF's.

  • A block exact fast affine projection algorithm

    M Tanaka, S Makino, J Kojima

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   7 ( 1 ) 79 - 86  1999.01  [Refereed]

     View Summary

    This paper describes a block (affine) projection algorithm that has exactly the same convergence rate as the original sample-by-sample algorithm and smaller computational complexity than the fast affine projection algorithm. This is achieved by 1) introducing a correction term that compensates for the filter output difference between the sample-by-sample projection algorithm and the straightforward block projection algorithm, and 2) applying a fast finite impulse response (FIR) filtering technique to compute filter outputs and to update the filter.
    We describe how to choose a pair of block lengths that gives the longest filter length under a constraint on the total computational complexity and processing delay. An example shows that the filter length can be doubled if a delay of a few hundred samples is permissible.

  • The past, present, and future of audio signal processing

    T Chen, GW Elko, SJ Elliot, S Makino, JM Kates, M Bosi, JO Smith, M Kahrs

    IEEE SIGNAL PROCESSING MAGAZINE   14 ( 5 ) 30 - 57  1997.09  [Refereed]

  • Common Acoustical Pole and Zero Modeling of Room Transfer Functions

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   2 ( 2 ) 320 - 328  1994.04  [Refereed]

     View Summary

    A new model for a room transfer function (RTF) by using common acoustical poles that correspond to resonance properties of a room is proposed. These poles are estimated as the common values of many RTF's corresponding to different source and receiver positions. Since there is one-to-one correspondence between poles and AR coefficients, these poles are calculated as common AR coefficients by two methods: i) using the least squares method, assuming all the given multiple RTF's have the same AR coefficients and ii) averaging each set of AR coefficients estimated from each RTF. The estimated poles agree well with the theoretical poles when estimated with the same order as the theoretical pole order. When estimated with a lower order than the theoretical pole order, the estimated poles correspond to the major resonance frequencies, which have high Q factors. Using the estimated common AR coefficients, the proposed method models the RTF's with different MA coefficients. This model is called the common-acoustical-pole and zero (CAPZ) model, and it requires far fewer variable parameters to represent RTF's than the conventional all-zero or pole/zero model. This model was used for an acoustic echo canceller at low frequencies, as one example. The acoustic echo canceller based on the proposed model requires half the variable parameters and converges 1.5 times faster than one based on the all-zero model, confirming the efficiency of the proposed model.

  • Exponentially Weighted Stepsize NLMS Adaptive Filter Based on the Statistics of a Room Impulse Response

    Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   1 ( 1 ) 101 - 108  1993.01  [Refereed]

     View Summary

    This paper proposes a new normalized least-mean-squares (NLMS) adaptive algorithm with double the convergence speed, at the same computational load, of the conventional NLMS for an acoustic echo canceller. This algorithm, called the ES (exponentially weighted stepsize) algorithm, uses a different stepsize (feedback constant) for each weight of an adaptive transversal filter. These stepsizes are time-invariant and weighted proportional to the expected variation of a room impulse response. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. A transition formula is derived for the mean-squared coefficient error of the proposed algorithm. The mean stepsize determines the convergence condition, the convergence speed, and the final excess mean-squared error. The algorithm is modified for a practical multiple DSP structure, so that it requires only the same amount of computation as the conventional NLMS. The algorithm is implemented in a commercial acoustic echo canceller and its fast convergence is demonstrated.

    DOI CiNii

    Scopus

    107
    Citation
    (Scopus)
  • Wavelength-Proportional Interpolation and Extrapolation of Virtual Microphone for Underdetermined Speech Enhancement

    Ryoga Jinzai, Kouei Yamaoka, Shoji Makino, Nobutaka Ono, Mitsuo Matsumoto, Takeshi Yamada

    APSIPA Transactions on Signal and Information Processing   12 ( 3 )  2023

     View Summary

    We previously proposed the virtual microphone technique to improve speech enhancement performance in underdetermined situations, in which the number of channels is virtually increased by estimating extra microphone signals at arbitrary positions along the straight line formed by real microphones. The effectiveness of the interpolation of virtual microphone signals for speech enhancement was experimentally confirmed. In this work, we apply the extrapolation of a virtual microphone as preprocessing of the maximum signal-to-noise ratio (SNR) beamformer and compare its speech enhancement performance (the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR)) with that of using the interpolation of a virtual microphone. Furthermore, we aim to improve speech enhancement performance by solving a trade-off relationship between performance at low and high frequencies, which can be controlled by adjusting the virtual microphone interval. We propose a new arrangement where a virtual microphone is placed at a distance from the reference real microphone proportional to the wavelength at each frequency. From the results of our experiment in an underdetermined situation, we confirmed speech enhancement performance using the extrapolation of a virtual microphone is higher than that of using the interpolation of a virtual microphone. Moreover, the proposed wavelength-proportional interpolation and extrapolation method improves speech enhancement performance compared with the interpolation and extrapolation. Furthermore, we present the directivity patterns of a spatial filter and confirmed the behavior that improves speech enhancement performance.

    DOI

    Scopus

  • Low latency online blind source separation based on joint optimization with blind dereverberation

    Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2021-   506 - 510  2021

     View Summary

    This paper presents a new low-latency online blind source separation (BSS) algorithm. Although algorithmic delay of a frequency domain online BSS can be reduced simply by shortening the short-time Fourier transform (STFT) frame length, it degrades the source separation performance in the presence of reverberation. This paper proposes a method to solve this problem by integrating BSS with Weighted Prediction Error (WPE) based dereverberation. Although a simple cascade of online BSS after online WPE upgrades the separation performance, the overall optimality is not guaranteed. Instead, this paper extends a recently proposed batch processing algorithm that can jointly optimize dereverberation and separation so that it can perform online processing with low computational cost and little processing delay (< 12 ms). The results of a source separation experiment in a noisy car environment suggest that the proposed online method has better separation performance than the simple cascaded methods.

    DOI

    Scopus

    13
    Citation
    (Scopus)
  • SepNet: A deep separation matrix prediction network for multichannel audio source separation

    Shota Inoue, Hirokazu Kameoka, Li Li, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2021-   191 - 195  2021

     View Summary

    In this paper, we propose SepNet, a deep neural network (DNN) designed to predict separation matrices from multichannel observations. One well-known approach to blind source separation (BSS) involves independent component analysis (ICA). A recently developed method called independent low-rank matrix analysis (ILRMA) is one of its powerful variants. These methods allow the estimation of separation matrices based on deterministic iterative algorithms. Specifically, ILRMA is designed to update the separation matrix according to an update rule derived based on the majorization-minimization principle. Although ILRMA performs reasonably well under some conditions, there is still room for improvement in terms of both separation accuracy and computation time, especially for large-scale microphone arrays. The existence of a deterministic iterative algorithm that can find one of the stationary points of the BSS problem implies that a DNN can also play that role if designed and trained properly. Motivated by this, we propose introducing a DNN that learns to convert a predefined input (e.g., an identity matrix) into a true separation matrix in accordance with a multichannel observation. To enable it to find one of the multiple solutions corresponding to different permutations of the source indices, we further propose adopting a permutation invariant training strategy to train the network. By using a fully convolutional architecture, we can design the network so that the forward propagation can be computed efficiently. The experimental results revealed that SepNet was able to find separation matrices faster and with better separation accuracy than ILRMA for mixtures of two sources.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis.

    Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

    APSIPA ASC     578 - 584  2021

  • Speech emotion recognition based on attention weight correction using word-level confidence measure

    Jennifer Santoso, Takeshi Yamada, Shoji Makino, Kenkichi Ishizuka, Takekatsu Hiramura

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   1   301 - 305  2021

     View Summary

    Emotion recognition is essential for human behavior analysis and possible through various inputs such as speech and images. However, in practical situations, such as in call center analysis, the available information is limited to speech. This leads to the study of speech emotion recognition (SER). Considering the complexity of emotions, SER is a challenging task. Recently, automatic speech recognition (ASR) has played a role in obtaining text information from speech. The combination of speech and ASR results has improved the SER performance. However, ASR results are highly affected by speech recognition errors. Although there is a method to improve ASR performance on emotional speech, it requires the fine-tuning of ASR, which is costly. To mitigate the errors in SER using ASR systems, we propose the use of the combination of a self-attention mechanism and a word-level confidence measure (CM), which indicates the reliability of ASR results, to reduce the importance of words with a high chance of error. Experimental results confirmed that the combination of self-attention mechanism and CM reduced the effects of incorrectly recognized words in ASR results, providing a better focus on words that determine emotion recognition. Our proposed method outperformed the state-of-the-art methods on the IEMOCAP dataset.

    DOI

    Scopus

    15
    Citation
    (Scopus)
  • Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

    Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA ASC 2020     858 - 862  2020.12  [Refereed]

  • Applying virtual microphones to triangular microphone array in in-car communication

    Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA ASC 2020     421 - 425  2020.12  [Refereed]

  • Determined audio source separation with multichannel star generative adversarial network

    Li Li, Hirokazu Kameoka, Shoji Makino

    IEEE International Workshop on Machine Learning for Signal Processing, MLSP   2020-  2020.09

     View Summary

    This paper proposes a multichannel source separation approach, which uses a star generative adversarial network (StarGAN) to model power spectrograms of sources. Various studies have shown the significant contributions of a precise source model to the performance improvement in audio source separation, which indicates the importance of developing a better source model. In this paper, we explore the potential of StarGAN for modeling source spectrograms and investigate the effectiveness of the StarGAN source model in determined multichannel source separation by incorporating it into a frequency-domain independent component analysis (ICA) framework. The experimental results reveal that the proposed StarGAN-based method outperformed conventional methods that use non-negative matrix factorization (NMF) or a variational autoencoder (VAE) for source spectrogram modeling.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • DNNマスク推定に基づく畳み込みビームフォーマによる音源分離・残響除去・雑音除去の同時実現

    髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

    音講論集   3-1-9   285 - 288  2020.03

  • 基底共有型半教師あり独立低ランク行列分析に基づく多チャネル補聴器システム

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    音講論集   1-1-22   217 - 220  2020.03

  • 発話の時間変動に着目した音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     957 - 958  2020.03

  • 空間特徴と音響特徴を併用する音響イベント検出の検討

    陳, 軼夫, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     1027 - 1030  2020.03

  • 車室内コミュニケーション用低遅延音源分離の検討

    上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

    日本音響学会春季研究発表会講演論文集     213 - 216  2020.03

  • 空間フィルタの自動推定による音響シーン識別の検討

    大野, 泰己, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-6   113 - 113  2020.03

  • Generative Adversarial Networks を用いた半教師あり学習の音響イベント検出への適用

    合馬, 一弥, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-7   114 - 114  2020.03

  • Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

    Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'20     175 - 178  2020.02  [Refereed]

  • Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

    Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

    Proc. NCSP'20     645 - 648  2020.02  [Refereed]

  • Blind source separation with low-latency for in-car communication

    Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

    Proc. NCSP'20     167 - 170  2020.02  [Refereed]

  • 多チャンネル変分自己符号化器法による任意話者の音源分離

    李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

    信学技報   EA2019-77   79 - 84  2019.12

  • Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

    Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura, Daichi, Saruwatari, Hiroshi, Makino, Shoji

    Proc. APSIPA     1874 - 1879  2019.11  [Refereed]

  • Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

    Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

    Proc. Annual Conference of the International Society for Music Information Retrieval     784 - 790  2019.11

  • Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2019     302 - 306  2019.11  [Refereed]

  • Supervised determined source separation with multichannel variational autoencoder

    Kameoka, Hirokazu, Li, Li, Inoue, Shota, Makino, Shoji

    Neural Computation   31 ( 9 ) 1891 - 1914  2019.09  [Refereed]

  • Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Makino, Shoji

    Proc. International Congress on Acoustics     6988 - 6995  2019.09  [Refereed]

  • Gated convolutional neural network-based voice activity detection under high-level noise environments

    Li, Li, Kouei, Yamaoka, Yuki, Koshino, Mitsuo, Matsumoto, Makino, Shoji

    Proc. International Congress on Acoustics     2862 - 2869  2019.09  [Refereed]

  • ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    音講論集   1-1-3   161 - 164  2019.09

  • Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

    Proc. EUSIPCO 2019    2019.09  [Refereed]

  • BLSTMと変調スペクトルを用いた発話特徴識別の検討

    サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     917 - 928  2019.09

  • BLSTMを用いた音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     921 - 924  2019.09

  • CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

    Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. EUSIPCO 2019    2019.09  [Refereed]

  • Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

    Li, Li, Hirokazu, Kameoka, Makino, Shoji

    Proc. ICASSP2019     546 - 550  2019.05

  • Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Shogo, Seki, Makino, Shoji

    Proc. ICASSP2019     96 - 100  2019.05  [Refereed]

  • Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. ICASSP 2019     7908 - 7912  2019.05  [Refereed]

  • Experimental evaluation of WaveRNN predictor for audio lossless coding

    Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'19     315 - 318  2019.03  [Refereed]

  • MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    電子情報通信学会技術研究報告(SP)   SIP2018-130   149 - 154  2019.03

  • 日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

    臼井, 桃香, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-10   137 - 137  2019.03

  • Gated CNNを用いた劣悪な雑音環境下における音声区間検出

    李莉, 越野ゆき, 松本光雄, 牧野, 昭二

    電子情報通信学会技術研究報告   EA2018-124   19 - 24  2019.03

  • Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    Proc. NCSP'19     260 - 263  2019.03  [Refereed]

  • Categorizing error causes related to utterance characteristics in speech recognition

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'19     514 - 517  2019.03  [Refereed]

  • 多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

    井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

    音講論集   2-Q-32   399 - 402  2019.03

  • Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. NCSP'19     264 - 267  2019.03  [Refereed]

  • 時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

    髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

    日本音響学会2019年春季研究発表会講演論文集   1-6-5   181 - 184  2019.03

  • 音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

    李, 莉, 亀岡, 弘和, 牧野, 昭二

    音講論集   1-6-10   201 - 204  2019.03

  • Microphone position realignment by extrapolation of virtual microphone

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2018     367 - 372  2018.11  [Refereed]

  • Weakly labeled learning using BLSTM-CTC for sound event detection

    Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2018     1918 - 1923  2018.11  [Refereed]

  • WaveRNNを利用した音声ロスレス符号化に関する検討と考察

    天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   2-4-9   1149 - 1152  2018.09

  • 10
    Citation
    (Scopus)
  • ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

    陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   1-1-21   149 - 152  2018.09

  • 時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会秋季研究発表会講演論文集   1-Q-12   407 - 410  2018.09

  • Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

    Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

    Proc. EUSIPCO 2018     1596 - 1600  2018.09  [Refereed]

  • Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

    Makino, Shoji

    Proc. IWAENC2018     71 - 75  2018.09  [Refereed]

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   1-R-5   961 - 964  2018.09

  • Acoustic Scene Classification Based on Spatial Feature Extraction Using Convolutional Neural Networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    Journal of Signal Processing   22 ( 4 ) 199 - 202  2018.07  [Refereed]

     View Summary

    Acoustic scene classification (ASC) classifies the place or situation where an acoustic sound was recorded. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge prepared a task involving ASC. Some methods using convolutional neural networks (CNNs) were proposed in the DCASE 2017 Challenge. The best method independently performed convolution operations for the left, right, mid (addition of left and right channels), and side (subtraction of left and right channels) input channels to capture spatial features. On the other hand, we propose a new method of spatial feature extraction using CNNs. In the proposed method, convolutions are performed for the time-space (channel) domain and frequency-space domain in addition to the time-frequency domain to capture spatial features. We evaluate the effectiveness of the proposed method using the task in the DCASE 2017 Challenge. The experimental results confirmed that convolution operations for the frequency-space domain are effective for capturing spatial features. Furthermore, by using a combination of the three domains, the classification accuracy was improved by 2.19% compared with that obtained using the tim

    DOI

  • 畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

    高橋, 玄, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     67 - 70  2018.03

  • 複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会春季研究発表会講演論文集     475 - 478  2018.03

  • Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

    Mae, Narumi, Yamaoka, Kouei, Mitsui, Yoshiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. NCSP'18     371 - 374  2018.03  [Refereed]

  • 複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

    松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

    電子情報通信学会技術研究報告     335 - 340  2018.03

  • 音声認識における誤認識原因通知のための印象評定値推定の検討

    後藤, 孝宏, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     117 - 120  2018.03

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     63 - 66  2018.03

  • Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

    Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'18     192 - 195  2018.03  [Refereed]

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'18     188 - 191  2018.03  [Refereed]

  • Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. NCSP'18     351 - 354  2018.03  [Refereed]

  • Sound source localization using binaural difference for hose-shaped rescue robot

    Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. APSIPA 2017     1 - 7  2017.12  [Refereed]

  • Abnormal sound detection by two microphones using virtual microphone technique

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA 2017     1 - 5  2017.12  [Refereed]

  • Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic features

    Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

    Proc. APSIPA 2017     1 - 5  2017.12  [Refereed]

  • Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

    Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

    Proc. IEEE GCCE 2017     141 - 145  2017.10  [Refereed]

  • 音響ロスレス符号化MPEG-4 ALSにおけるハイレゾ音源向け線形予測次数最適化に関する検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     251 - 254  2017.09

  • Far-noise suppression by transfer-function-gain non-negative matrix factorization in ad hoc microphone array

    村瀬, 慶和, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

    THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN   73 ( 9 ) 563 - 570  2017.09  [Refereed]

     View Summary

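    Conventional noise suppression methods based on array signal processing, such as beamforming, rely on directivity control that exploits phase information, and they are highly effective against noise arriving from a specific direction because a null of the directivity pattern can be steered toward it. However, suppressing so-called background noise, whose direction of arrival cannot be identified, has generally been difficult. In this paper, we propose a method that uses transfer-function-gain NMF to effectively suppress noise arriving from far away with multiple microphones. The proposed method assumes that the background noise arrives from far away and, by focusing only on amplitude information in the time-frequency domain, models far sources arriving from various directions as a single mixed source. This amplitude mixture model is then applied to the previously proposed constrained transfer-function-gain NMF to suppress the far sources. Furthermore, semi-supervised transfer-function-gain NMF is applied to suppress the far sources. Because the method uses only amplitude information, asynchronous recording devices can be used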

    DOI CiNii

  • Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    Proc. MLSP     1 - 6  2017.09  [Refereed]

  • Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

    Proc. EUSIPCO 2017     2388 - 2392  2017.08  [Refereed]

  • Multiple far noise suppression in a real environment using transfer-function-gain NMF

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    Proc. EUSIPCO 2017     2378 - 2382  2017.08  [Refereed]

  • Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

    Kodama, Takumi, Makino, Shoji

    Proc. IEEE Engineering in Medicine & Biology Society (EMBC)     3814 - 3817  2017.07  [Refereed]

  • 教師信号を用いた非同期分散型マイクロホンアレーによる音源分離

    坂梨, 龍太郎, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

    日本音響学会誌   73 ( 6 ) 337 - 348  2017.06  [Refereed]

    DOI CiNii

  • Development of High Quality Blind Source Separation Based on Independent Low-Rank Matrix Analysis and Statistical Speech Enhancement for Flexible Hose-Shaped Robot

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)   1P2-P04   1 - 4  2017.05

     View Summary

    In this paper, we propose a novel blind source separation method for a hose-shaped rescue robot based on independent low-rank matrix analysis and statistical speech enhancement. The rescue robot is designed to detect victims' speech in a disaster area and carries multiple microphones around its body. Unlike a common microphone array, the positions of the microphones are unknown, so a conventional beamformer cannot be used. In addition, vibration noise (ego-noise) is generated when the robot moves, seriously contaminating the observed signals. It is therefore important to eliminate the ego-noise in this system. This paper describes our newly developed software and hardware system of blind source separation for robot noise reduction. We also report objective and subjective evaluation results, obtained in experiments with actual sounds observed by the rescue robot, showing that the proposed system outperforms conventional methods in source separation accuracy and perceptual sound quality.

    DOI CiNii

  • DNN-GMMと連結特徴量を用いた音響シーン識別の検討

    高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-P-1   135 - 138  2017.03

  • 補助関数法による識別的NMFの基底学習アルゴリズム

    李莉, 亀岡弘和, 牧野昭二

    日本音響学会2017年春季研究発表会   1-P-4   519 - 522  2017.03

  • 独立低ランク行列分析と統計的音声強調を用いた柔軟索状ロボットにおけるブラインド音源分離システムの開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    日本音響学会2017年春季研究発表会   1-P-3   517 - 518  2017.03

  • SJ-CATにおける項目応答理論に基づく能力値推定の精度改善

    小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-6-3   247 - 250  2017.03

  • 音響ロスレス符号化MPEG-4 ALSのハイレゾ音源適応の検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-P-42   381 - 382  2017.03

  • Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

    Kodama, T, Makino, Shoji

    Proc. Biomedical and Health Informatics (BHI)     1 - 1  2017.02  [Refereed]

  • Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

    Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno

    Journal of Robotics and Mechatronics   29 ( 1 ) 198 - 212  2017.02

     View Summary

    This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphone-equipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent low-rank matrix analysis. The two enhancement methods are integrated with Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • DISCRIMINATIVE NON-NEGATIVE MATRIX FACTORIZATION WITH MAJORIZATION-MINIMIZATION

    Li Li, Hirokazu Kameoka, Shoji Makino

    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017)     141 - 145  2017  [Refereed]

     View Summary

    Non-negative matrix factorization (NMF) is a powerful approach to single-channel audio source separation. In a supervised setting, NMF is first applied to train the basis spectra of each sound source. At test time, NMF is applied to the spectrogram of a mixture signal using the pretrained spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra of each source is to minimize the objective function of NMF. However, the basis spectra obtained in this way do not ensure that the separated signal will be optimal at test time, owing to the inconsistency between the objective functions for training and separation (Wiener filtering). To address this, a framework called discriminative NMF (DNMF) has recently been proposed. In that work, a multiplicative update algorithm was proposed for the basis training; however, one drawback is that convergence is not guaranteed. To overcome this drawback, this paper proposes using a majorization-minimization principle to develop a convergence-guaranteed algorithm for DNMF. Experimental results showed that the proposed algorithm outperformed standard NMF and DNMF using a multiplicative update algorithm as regards both the signal-to-distortion and signal-to-interference ratios.

  • Blind source separation and multi-talker speech recognition with ad hoc microphone array using smartphones and cloud storage

    越智景子, 小野順貴, 宮部滋樹, 牧野昭二

    Acoustical Science and Technology    2017  [Refereed]

  • Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

    Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2017-   1998 - 2002  2017  [Refereed]

     View Summary

    Spectral domain speech enhancement algorithms based on non-negative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy; however, they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Full-body tactile P300-based brain-computer interface accuracy refinement

    Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

    Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART)     1 - 4  2016.12  [Refereed]

  • 伝達関数ゲイン基底NMFを用いた遠方雑音抑圧の実環境での評価

    松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

    第31回信号処理シンポジウム   B3-1   231 - 235  2016.11

  • Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

    Saruwatari, H, Takata, K, Ono, N, Makino, Shoji

    International Congress on Acoustics     1 - 10  2016.09  [Refereed]

  • Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

    Gen,Takahashi, Takeshi,Yamada, Shoji,Makino, Nobutaka,Ono

    DCASE2016 Challenge     1 - 2  2016.09

  • 雑音下音声認識における必要発話音量提示機能の実装と評価

    後藤,孝宏, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会   3-Q-12   117 - 120  2016.09

  • ヴァーチャル多素子化に基づくSN比最大化ビームフォーマの残響に対する性能変化

    山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会   3-7-5   379 - 382  2016.09

  • Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

    Kodama, T, Makino, Shoji, Rutkowski, T

    Proc. AEARU Young Researchers International Conference     1 - 4  2016.09  [Refereed]

  • 日本語スピーキングテストSJ-CATにおける項目応答理論に基づく能力値推定の検証

    小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

    日本音響学会秋季研究発表会   3-Q-26   253 - 256  2016.09

  • Amplitude-based speech enhancement with non-negative matrix factorization in time-channel domain for ad-hoc microphone array

    H., Chiba, N., Ono, S., Miyabe, Y., Takahashi, T., Yamada, S., Makino

    J. Acoust. Soc. Jpn   72 ( 8 ) 462 - 470  2016.08  [Refereed]

    CiNii

  • アドホックマイクロホンアレーにおける時間チャネル領域での非負値行列因子分解を用いた振幅ベースの音声強調

    千葉,大将, 小野,順貴, 宮部,滋樹, 高橋,祐, 山田,武志, 牧野,昭二

    日本音響学会誌   72 ( 8 ) 462 - 470  2016.08  [Refereed]

    CiNii

  • Amplitude-based speech enhancement with non-negative matrix factorization in time-channel domain for ad-hoc microphone array

    千葉大将, 小野順貴, 宮部滋樹, 高橋祐, 山田武志, 牧野昭二

    J. Acoust. Soc. Jpn   72 ( 8 ) 462 - 470  2016.08  [Refereed]

     View Summary

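    This paper describes a target sound enhancement method for asynchronous distributed recordings based on non-negative matrix factorization (NMF) in the time-channel domain. Array signal processing that uses phase information is unsuitable for multichannel signals captured by multiple recording devices, because small sampling-frequency mismatches among the devices cause the phase differences to drift. Noting that amplitude analysis is far less affected by this drift than phase analysis, we use a time-frequency mask based on the analysis of inter-channel gain differences by time-channel-domain NMF (transfer-function-gain NMF) proposed by Togami et al. We also discuss supervised NMF, in which the bases are trained in advance, for speech enhancement under conditions where the number of channels is not sufficiently larger than the number of bases.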

    DOI

  • 音声のスペクトル領域とケプストラム領域における同時強調

    李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

    信学技報   SP2016-32   29 - 32  2016.08

  • An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping

    Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Biing-Hwang Juang

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 6 ) 1152 - 1162  2016.06  [Refereed]

     View Summary

    MUltiple SIgnal Classification (MUSIC) is a standard technique for direction of arrival (DOA) estimation with high resolution. However, MUSIC cannot estimate DOAs accurately in the case of underdetermined conditions, where the number of sources exceeds the number of microphones. To overcome this drawback, an extension of MUSIC using cumulants called 2q-MUSIC has been proposed, but this method greatly suffers from the variance of the statistics, given as the temporal mean of the observation process, and requires long observation. In this paper, we propose a new approach for extending MUSIC that exploits higher-order moments of the signal for the underdetermined DOA estimation with smaller variance. We propose an estimation algorithm that nonlinearly maps the observed signal onto a space with expanded dimensionality and conducts MUSIC-based correlation analysis in the expanded space. Since the dimensionality of the noise subspace is increased by the mapping, the proposed method enables the estimation of DOAs in the case of underdetermined conditions. Furthermore, we describe the class of mapping that allows us to analyze the higher-order moments of the observed signal in the original space. We compare 2q-MUSIC and the proposed method through an experiment assuming that the true number of sources is known as prior information to evaluate in terms of the bias-variance tradeoff of the statistics and computational complexity. The results clarify that the proposed method has advantages for both computational complexity and estimation accuracy in short-time analysis, i.e., the time duration of the analyzed data is short.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Applying independent vector analysis and noise cancellation to noise reduction for a hose-shaped rescue robot

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)   1P1-08b3   1 - 4  2016.06

     View Summary

    This paper presents a noise reduction method for a hose-shaped rescue robot. The hose-shaped rescue robot is one of the rescue robots developed in the Tough Robotics Challenge, and it is used to search for victims by capturing their voices with its microphone array. However, the ego noise caused by its vibration motors makes it difficult to capture human voices. We propose a noise reduction method using a blind source separation technique based on Independent Vector Analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego-noise signal from observed multi-channel signals using the IVA-based blind source separation technique, and (2) applying the noise cancellation to the estimated speech signal using the estimated ego-noise signal as a noise reference.

    DOI

  • ランク1 空間モデル制約付き多チャネルNMFを用いた柔軟索状ロボットにおける雑音抑圧

    高草木萌, 北村大地, 小野順貴, 山田武志, Makino, Shoji, 猿渡洋

    日本機械学会ロボティクス・メカトロニクス講演会   1A2-10a3   1 - 4  2016.06

     View Summary

    A hose-shaped rescue robot is one of the robots developed for disaster response in large-scale disasters such as a great earthquake. The robot is suitable for entering narrow and dark places covered with rubble at a disaster site and for finding victims inside them. The robot can transmit ambient sound to its operator using its built-in microphones. However, there is a serious problem: the inherent noise of the robot, such as vibration sound or fricative sound, is mixed into the transmitted sound, making it hard for the operator to hear a call for help from a disaster victim. In this paper, we apply multichannel NMF (nonnegative matrix factorization) with the rank-1 spatial constraint (Rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of this inherent noise.

    DOI

  • A-5-2 Noise reduction for a hose-shaped rescue robot using independent vector analysis and noise cancellation

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    電子情報通信学会総合大会   2016   58 - 58  2016.03

    CiNii

  • 教師あり多チャネルNMFと統計的音声強調を用いた柔軟索状ロボットにおける音源分離

    高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

    日本音響学会2015年春季研究発表会   ( 3-3-2 ) 609 - 612  2016.03

  • 非同期分散マイクロホンによるブラインド音源分離を用いた複数話者同時音声認識

    越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

    日本音響学会2016年春季研究発表会   ( 3-3-1 ) 607 - 608  2016.03

  • Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

    Takuya,Toyoda, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. NCSP'16     622 - 625  2016.03  [Refereed]

  • A-5-1 Noise reduction using rank-1 multichannel NMF for a hose-shaped rescue robot

    高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

    Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference   2016   57 - 57  2016.03

    CiNii

  • 振幅のみからの相関推定と雑音尖度に基づく空間サブトラクションアレーの減算係数最適化

    李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会2016年春季研究発表会     689 - 692  2016.03

  • Performance estimation of noisy speech recognition using spectral distortion and recognition task complexity

    Ling Guo, Takeshi Yamada, Shigeki Miyabe, Shoji Makino, Nobuhiko Kitawaki

    Acoustical Science and Technology   37 ( 6 ) 286 - 294  2016  [Refereed]

     View Summary

    Previously, methods for estimating the performance of noisy speech recognition based on a spectral distortion measure have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, no consideration is given to any change in the components of a speech recognition system. To solve this problem, we propose a novel method for estimating the performance of noisy speech recognition, a major feature of which is the ability to accommodate the use of different noise reduction algorithms and recognition tasks by using two cepstral distances (CDs) and the square mean root perplexity (SMR-perplexity). First, we verified the effectiveness of the proposed distortion measure, i.e., the two CDs. The experimental results showed that the use of the proposed distortion measure achieves estimation accuracy equivalent to the use of the conventional distortion measures, the perceptual evaluation of speech quality (PESQ) and the signal-to-noise ratio (SNR) of noise-reduced speech, and has the advantage of being applicable to noise reduction algorithms that directly output the mel-frequency cepstral coefficient (MFCC) feature. We then evaluated the proposed method by performing a closed test and an open test (10-fold cross-validation test). The results confirmed that the proposed method gives better estimates without being dependent on the differences among the noise reduction algorithms or the recognition tasks.

    DOI

    Scopus

  • Performance Estimation of Spontaneous Speech Recognition Using Non-Reference Acoustic Features

    Ling Guo, Takeshi Yamada, Shoji Makino

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 4  2016  [Refereed]

     View Summary

    To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. Such a method makes it possible to monitor recognition performance while providing speech recognition services. It can also be used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, these methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. Its main feature is the use of non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features with openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.

  • NOISE REDUCTION USING INDEPENDENT VECTOR ANALYSIS AND NOISE CANCELLATION FOR A HOSE-SHAPED RESCUE ROBOT

    Masaru Ishimura, Shoji Makino, Takeshi Yamada, Nobutaka Ono, Hiroshi Saruwatari

    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     1 - 5  2016  [Refereed]

     View Summary

    In this paper, we present noise reduction for a hose-shaped rescue robot. The robot is used for searching for disaster victims by capturing their voice with its microphone array. However, the ego noise generated by its vibration motors makes it difficult to distinguish human voices. To solve this problem, we propose a noise reduction method using a blind source separation technique based on independent vector analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego noise signal from observed multichannel signals using the IVA-based blind source separation technique, and (2) applying noise cancellation to the estimated speech signal using the estimated ego noise signal as a noise reference. The experimental evaluations show that this approach is effective for suppressing the ego noise.

  • Visual Motion Onset Brain-computer Interface

    Tomasz M. Rutkowski

    Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART)     1 - 4  2016  [Refereed]

  • Nonlinear speech enhancement by virtual increase of channels and maximum SNR beamformer

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING   2016 ( 1 ) 1 - 8  2016.01  [Refereed]

     View Summary

    In this paper, we propose a new microphone array signal processing technique, which increases the number of microphones virtually by generating extra signal channels from real microphone signals. Microphone array signal processing methods such as speech enhancement are effective for improving the quality of various speech applications such as speech recognition and voice communication systems. However, the performance of speech enhancement and other signal processing methods depends on the number of microphones. Thus, special equipment such as a multichannel A/D converter or a microphone array is needed to achieve high processing performance. Therefore, our aim was to establish a technique for improving the performance of array signal processing with a small number of microphones and, in particular, to increase the number of channels virtually by synthesizing virtual microphone signals, or extra signal channels, from two channels of microphone signals. Each virtual microphone signal is generated by interpolating a short-time Fourier transform (STFT) representation of the microphone signals. The phase and amplitude of the signal are interpolated individually. The phase is linearly interpolated on the basis of a sound propagation model, and the amplitude is nonlinearly interpolated on the basis of beta divergence. We also performed speech enhancement experiments using a maximum signal-to-noise ratio (SNR) beamformer equipped with virtual microphones and evaluated the improvement in performance upon introducing virtual microphones.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • EGO-NOISE REDUCTION FOR A HOSE-SHAPED RESCUE ROBOT USING DETERMINED RANK-1 MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION

    Moe Takakusaki, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Shoji Makino, Hiroshi Saruwatari

    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     1 - 4  2016  [Refereed]

     View Summary

    A hose-shaped rescue robot is one of the robots that have been developed for disaster response in times of large-scale disasters such as a massive earthquake. This robot is suitable for entering narrow and dark places covered with rubble in a disaster site and for finding victims inside it. It can transmit ambient sound captured by its built-in microphones to its operator. However, there is a serious problem, that is, the inherent noise of this robot, such as vibration sound or fricative sound, is mixed with the transmitted voice, thereby disturbing the operator's perception of a call for help from a disaster victim. In this paper, we apply the multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint (determined rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.

  • Multi-talker Speech Recognition Based on Blind Source Separation with Ad hoc Microphone Array Using Smartphones and Cloud Storage

    Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino

    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5     3369 - 3373  2016  [Refereed]

     View Summary

    In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array, which consists of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker's smartphone and automatically transferred to online cloud storage. Our prototype system is realized using iPhone and Dropbox. Although the signals recorded by different iPhones are not synchronized, the blind synchronization technique compensates for both the differences in the time offset and the sampling frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into each speaker's voice. Finally, automatic speech recognition is applied to transcribe the speech. By experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces the speech overlap and improves the speech recognition performance.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • Tactile Brain-computer Interface Using Classification of P300 Responses Evoked by Full Body Spatial Vibrotactile Stimuli

    Takumi Kodama, Shoji Makino, Tomasz M. Rutkowski

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 4  2016  [Refereed]

     View Summary

    In this study, we propose a novel stimulus-driven brain-computer interface (BCI) paradigm that generates control commands based on the classification of somatosensory-modality P300 responses. Six spatial vibrotactile stimulus patterns are applied to the entire back and limbs of a user. The aim of the current project is to validate the effectiveness of the vibrotactile stimulus patterns for BCI purposes and to establish a novel concept of a tactile-modality communication link, which shall help locked-in syndrome (LIS) patients who lose their sight and hearing due to sensory disabilities. We define this approach as a full-body BCI (fbBCI), and we conduct psychophysical stimulus evaluation and real-time EEG response classification experiments with ten healthy able-bodied users. The grand mean psychophysical stimulus pattern recognition accuracy was 98.18%, whereas the real-time EEG accuracy was 53.67%. The information transfer rate (ITR) scores of the tested users ranged from 0.042 to 4.154 bit/minute.

  • Ego Noise Reduction for Hose-Shaped Rescue Robot Combining Independent Low-Rank Matrix Analysis and Noise Cancellation

    Narumi Mae, Daichi Kitamura, Masaru Ishimura, Takeshi Yamada, Shoji Makino

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 6  2016  [Refereed]

     View Summary

    In this paper, we present an ego noise reduction method for a hose-shaped rescue robot developed for search and rescue operations in large-scale disasters such as a massive earthquake. It can enter narrow and dark places covered with rubble in a disaster site and is used to search for disaster victims by capturing their voices with its microphone array. However, ego noises, such as vibration or fricative sounds, are mixed with the voices, and it is difficult to differentiate them from a call for help from a disaster victim. To solve this problem, we here propose a two-step noise reduction method as follows: (1) the estimation of both speech and ego noise signals from an observed multichannel signal by multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint, which was proposed by Kitamura et al., and (2) the application of noise cancellation to the estimated speech signal using the noise reference. Our evaluations show that this approach is effective for suppressing ego noise.

  • Unisoner: A chorus creation support interface that uses diverse singing voices on the Web from various singers singing the same song

    都築,圭太, 中野,倫靖, 後藤,真孝, 山田,武志, 牧野,昭二

    IPSJ Journal   56 ( 12 ) 2370 - 2383  2015.12  [Refereed]

    CiNii

  • Unisoner: An interface for derivative chorus creation from various singing voices singing the same song on the web

    K.,Tsuzuki, T.,Nakano, M.,Goto, T.,Yamada, S.,Makino

    Journal of Information Processing   56 ( 12 ) 2370 - 2383  2015.12  [Refereed]

    CiNii

  • Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

    H.,Chiba, Y.,Kamamoto, T.,Moriya, N.,Harada, S.,Miyabe, T.,Yamada, S.,Makino

    IEICE Trans. Information and Systems   J98-D ( 10 ) 1301 - 1311  2015.10  [Refereed]

  • Pitch-frequency-dependent adaptive post-filter for CELP-based speech coding

    千葉,大将, 鎌本,優, 守谷,健弘, 原田,登, 宮部,滋樹, 山田,武志, 牧野,昭二

    IEICE Transactions on Information and Systems (Japanese Edition)   J98-D ( 10 ) 1301 - 1311  2015.10  [Refereed]

  • A study on estimating speech recognition performance in noise using non-reference distortion features

    郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    Acoustical Society of Japan 2015 Autumn Meeting     95 - 98  2015.09

  • A study on detecting low-scoring answer utterances in the Japanese speaking test SJ-CAT

    小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

    Acoustical Society of Japan 2015 Autumn Meeting     329 - 332  2015.09

  • Estimation of the inter-channel correlation coefficient when the phase of a microphone array cannot be observed

    宮部滋樹, 小野順貴, 牧野,昭二

    Workshop on Circuits and Systems   28   347 - 352  2015.08

    CiNii

  • Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

    Shoko Araki, Shoji Makino, Hiroshi Sawada, Ryo Mukai

    European Signal Processing Conference   06-10-   1991 - 1994  2015.04

    We propose a method for separating speech signals when sources outnumber the sensors. In this paper we mainly concentrate on the case of three sources and two sensors. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose the utilization of a directivity pattern based continuous mask, which removes a single source from the observations, and independent component analysis (ICA) to separate the remaining mixtures. Experimental results show that our proposed method can separate signals with little distortion even in a real reverberant environment of T_R = 130 ms.

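  • A study on improving the usability of speech recognition in noisy environments based on recognition performance prediction

    青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    Acoustical Society of Japan 2015 Spring Meeting     133 - 136  2015.03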

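  • Diffuse noise suppression using transfer-function-gain-basis NMF with an asynchronous distributed microphone array

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    Acoustical Society of Japan 2015 Spring Meeting     557 - 560  2015.03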

  • Activity Report from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2015.03  [Refereed]

  • Signal Processing Techniques for Assisted Listening

    Sven Nordholm, Walter Kellermann, Simon Doclo, Vesa Vaelimaeki, Shoji Makino, John R. Hershey

    IEEE SIGNAL PROCESSING MAGAZINE   32 ( 2 ) 16 - 17  2015.03  [Refereed]

    DOI

    Scopus

    Citations (Scopus): 1
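  • Detection of moving vehicles and estimation of their travel direction using a moving sound source model based on stereo recording

    遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    Acoustical Society of Japan 2015 Spring Meeting     717 - 720  2015.03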

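  • Optimization of the subtraction coefficient of spectral subtraction based on objective estimation of overall quality and intelligibility

    中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    Acoustical Society of Japan 2015 Spring Meeting     333 - 336  2015.03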

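  • A study on estimating the performance of speech recognition in noise using cepstral distance and SMR-perplexity

    郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    Acoustical Society of Japan 2015 Spring Meeting     129 - 132  2015.03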

  • Spatial tactile brain-computer interface by applying vibration to user's shoulders and waist

    T.,Kodama, Makino,Shoji, T.M.,Rutkowski

    AEARU Workshop on Computer Science and Web Technology     41 - 42  2015.02  [Refereed]

  • SSVEP brain-computer interface using green and blue lights

    D.,Aminaka, Makino,Shoji, T.M.,Rutkowski

    AEARU Workshop on Computer Science and Web Technology     39 - 40  2015.02  [Refereed]

  • Spatial auditory brain-computer interface using head related impulse response

    C.,Nakaizumi, T.,Matsui, K.,Mori, Makino,Shoji, T.M.,Rutkowski

    AEARU Workshop on Computer Science and Web Technology     37 - 38  2015.02  [Refereed]

  • Blind compensation of interchannel sampling frequency mismatch for ad hoc microphone array based on maximum likelihood estimation

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    SIGNAL PROCESSING   107 ( SI ) 185 - 196  2015.02  [Refereed]

    In this paper, we propose a novel method for the blind compensation of drift for the asynchronous recording of an ad hoc microphone array. Digital signals simultaneously observed by different recording devices have drift of the time differences between the observation channels because of the sampling frequency mismatch among the devices. On the basis of a model in which the time difference is constant within each short time frame but varies in proportion to the central time of the frame, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform (STFT) domain by a linear phase shift. By assuming that the sources are motionless and have stationary amplitudes, the observation is regarded as being stationary when drift does not occur. Thus, we formulate a likelihood to evaluate the stationarity in the STFT domain to evaluate the compensation of drift. The maximum likelihood estimation is obtained effectively by a golden section search. Using the estimated parameters, we compensate the drift by STFT analysis with a noninteger frame shift. The effectiveness of the proposed blind drift compensation method is evaluated in an experiment in which artificial drift is generated. (C) 2014 The Authors. Published by Elsevier B.V.
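
    A minimal sketch of two ingredients named in the summary: the per-frame linear phase shift and the golden section search. The sign convention, the names `epsilon`, `frame_center` and `n_fft`, and the toy objective in the usage note are assumptions made for illustration. The paper's actual likelihood is not reproduced here.

```python
import cmath
import math


def linear_phase_factor(k: int, n_fft: int, frame_center: float,
                        epsilon: float) -> complex:
    """Phase factor for STFT bin k of one frame.

    Assumes the drift within a frame is a constant delay of
    epsilon * frame_center samples, where epsilon is the relative
    sampling frequency mismatch (hypothetical parameterization).
    """
    delay = epsilon * frame_center
    return cmath.exp(2j * math.pi * k * delay / n_fft)


def golden_section_maximize(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

    In use, `f` would be the stationarity likelihood evaluated after compensating every frame with `linear_phase_factor` for a candidate `epsilon`. A stand-in such as `lambda e: -(e - 3e-5) ** 2` is enough to exercise the search itself.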

    DOI

    Scopus

    Citations (Scopus): 55
  • Tactile pin-pressure brain-computer interface

    K.,Shimizu, H.,Mori, Makino,Shoji, T.M.,Rutkowski

    AEARU Workshop on Computer Science and Web Technology     35 - 36  2015.02  [Refereed]

  • Multi-command tactile brain-computer interface using the touch-sense glove

    H.,Yajima, Makino,Shoji, T.M.,Rutkowski

    AEARU Workshop on Computer Science and Web Technology     43 - 44  2015.02  [Refereed]

  • Implementation and evaluation of an acoustic echo canceller using duo-filter control system

    Yoichi Haneda, Shoji Makino, Junji Kojima, Suehiro Shimauchi

    European Signal Processing Conference    2015

    The developed acoustic echo canceller uses an exponentially weighted step-size projection algorithm and a duo-filter control system to achieve fast convergence and high speech quality. The duo-filter control system has an adaptive filter and a fixed filter, and uses variable-loss insertion. Evaluation of this system with multi-channel A/D and D/A converters showed that (1) the convergence speed is under 1.5 seconds for speech input when the adaptive filter length is 125 ms, (2) the residual echo level is nearly as low as the ambient noise level (average: under -20 dB; maximum: under -35 dB), and (3) near-end speech is sent with no disturbance during double talk.

  • Brain Evoked Potential Latencies Optimization for Spatial Auditory Brain-Computer Interface

    Tomasz M. Rutkowski

    Cognitive Computation   7 ( 1 ) 34 - 43  2015  [Refereed]

    © 2013, Springer Science+Business Media New York. We propose a novel method for the extraction of discriminative features in electroencephalography (EEG) evoked potential latency. Based on our offline results, we present evidence indicating that a full surround sound auditory brain–computer interface (BCI) paradigm has potential for an online application. The auditory spatial BCI concept is based on an eight-directional audio stimuli delivery technique, developed by our group, which employs a loudspeaker array in an octagonal horizontal plane. The stimuli presented to the subjects vary in frequency and timbre. To capture brain responses, we utilize an eight-channel EEG system. We propose a methodology for finding and optimizing evoked response latencies in the P300 range in order later to classify them correctly and to elucidate the subject’s chosen targets or ignored non-targets. To accomplish the above, we propose an approach based on an analysis of variance for feature selection. Finally, we identify the subjects’ intended commands with a Naive Bayesian classifier for sorting the final responses. The results obtained with ten subjects in offline BCI experiments support our research hypothesis by providing higher classification results and an improved information transfer rate compared with state-of-the-art solutions.

    DOI

    Scopus

    Citations (Scopus): 9
  • Chromatic and High-frequency cVEP-based BCI Paradigm

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1906 - 1909  2015  [Refereed]

    We present results of an approach to a code-modulated visual evoked potential (cVEP) based brain-computer interface (BCI) paradigm using four high-frequency flashing stimuli. To generate higher-frequency stimulation than state-of-the-art cVEP-based BCIs, we propose to use light-emitting diodes (LEDs) driven by a small micro-controller board hardware generator designed by our team. The high-frequency and green-blue chromatic flashing stimuli are used in the study in order to minimize the risk of photosensitive epilepsy (PSE). We compare the green-blue chromatic cVEP-based BCI accuracies with those of the conventional white-black flicker-based interface. The high-frequency cVEP responses are identified using a canonical correlation analysis (CCA) method.

  • Classification accuracy improvement of chromatic and high–frequency code–modulated visual evoked potential–based BCI

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9250   232 - 241  2015  [Refereed]

    © Springer International Publishing Switzerland 2015. We present results of a classification improvement approach for a code–modulated visual evoked potential (cVEP) based brain–computer interface (BCI) paradigm using four high–frequency flashing stimuli. Previously published research reports presented successful BCI applications of canonical correlation analysis (CCA) to steady–state visual evoked potential (SSVEP) BCIs. Our team previously proposed a BCI paradigm combining the CCA and cVEP techniques. The present study reports further improved results using a support vector machine (SVM) method applied to the cVEP–based BCI.

    DOI

    Scopus

    Citations (Scopus): 7
  • Fingertip Stimulus Cue-based Tactile Brain-computer Interface

    Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1059 - 1064  2015  [Refereed]

    The reported project aims to confirm whether a tactile glove fingertip stimulator is effective for a brain-computer interface (BCI) paradigm using somatosensory event potential (SEP) responses with possible attentional modulation. The proposed simplified stimulator device is presented in detail together with psychophysical and EEG BCI experiment protocols. Results supporting the proposed simple tactile glove device are presented in the form of online BCI classification accuracy results using a shrinkage linear discriminant analysis (sLDA) technique. Finally, we discuss possible future paradigm improvement steps.

  • Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015   9237   421 - 428  2015  [Refereed]

    In this paper, we propose a method to estimate a correlation coefficient of two correlated complex signals on the condition that only the amplitudes are observed and the phases are missing. Our proposed method is based on maximum likelihood estimation. We assume that the original complex random variables are generated from a zero-mean bivariate complex normal distribution. The likelihood of the correlation coefficient is formulated as a bivariate Rayleigh distribution by marginalization over the phases. Although the maximum likelihood estimator has no analytical form, an expectation-maximization (EM) algorithm can be formulated by treating the phases as hidden variables. We evaluate the accuracy of the estimation using an artificial signal, and demonstrate the estimation of the narrow-band correlation of a two-channel audio signal.

    DOI

    Scopus

    Citations (Scopus): 3
  • Inter-stimulus Interval Study for the Tactile Point-pressure Brain-computer Interface

    Kensuke Shimizu, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1910 - 1913  2015  [Refereed]

    The paper presents a study of an inter-stimulus interval (ISI) influence on a tactile point-pressure stimulus-based brain-computer interface's (tpBCI) classification accuracy. A novel tactile pressure generating tpBCI stimulator is also discussed, which is based on a three-by-three pins' matrix prototype. The six pin-linear patterns are presented to the user's palm during the online tpBCI experiments in an oddball style paradigm allowing for "the aha-responses" elucidation, within the event related potential (ERP). A subsequent classification accuracies' comparison is discussed based on two ISI settings in an online tpBCI application. A research hypothesis of classification accuracies' non-significant differences with various ISIs is confirmed based on the two settings of 120 ms and 300 ms, as well as with various numbers of ERP response averaging scenarios.

  • Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     599 - 603  2015  [Refereed]

    In this paper, we propose a method for suppressing a large number of interferences by using multichannel amplitude analysis based on nonnegative matrix factorization (NMF) and its effective semi-supervised training. For the point-source interference reduction of an asynchronous microphone array, we propose amplitude-based speech enhancement in the time-channel domain, which we call transfer-function-gain NMF. Transfer-function-gain NMF is a robust method against drift, which disrupts an inter-channel phase analysis. We use this method to suppress a large number of sources. We show that a mass of interferences can be modeled by a single basis assuming that the noise sources are sufficiently far from the microphones and the spatial characteristics become similar to each other. Since the blind optimization of the NMF parameters does not work well with merely sparse observation contaminated by the constant heavy noise, we train the diffuse noise basis in advance of the noise suppression using a speech absent observation, which can be obtained easily using a simple voice activity detection technique. We confirmed the effectiveness of our proposed model and semi-supervised transfer-function-gain NMF in an experiment simulating a target source that was surrounded by a diffuse noise.

  • Variable Sound Elevation Features for Head-related Impulse Response Spatial Auditory BCI

    Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1094 - 1099  2015  [Refereed]

    This paper presents a study of classification and EEG feature improvement for a spatial auditory brain-computer interface (saBCI). This study provides a comprehensive test of head-related impulse response (HRIR) cues for the saBCI speller paradigm. We present a comparison with previously developed HRIR-based spatial auditory modalities. We propose and optimize three types of sound spatialization settings using a variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participate in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) result in encouragingly good and stable P300 responses in online saBCI experiments. We analyze the differences and dispersions of saBCI command accuracies, as well as the individual user accuracies for various spatial sound locations. Our case study indicates that the participating users could perceive elevation in the saBCI experiments using the HRIR measured from a general head model.

  • Head-related Impulse Response Cues for Spatial Auditory Brain-computer Interface

    Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1071 - 1074  2015  [Refereed]

    This study provides a comprehensive test of head-related impulse response (HRIR) cues for a spatial auditory brain-computer interface (saBCI) speller paradigm. We present a comparison with the conventional virtual sound headphone-based spatial auditory modality. We propose and optimize three types of sound spatialization settings using a variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEP) resulted in encouragingly good and stable P300 responses in online BCI experiments. Our case study indicated that users could perceive elevation in the saBCI experiments generated using the HRIR measured from a general head model. The saBCI accuracy and information transfer rate (ITR) scores improved compared with the classical horizontal plane-based virtual spatial sound reproduction modality, as far as the healthy users in the current pilot study are concerned.

  • EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9359   1 - 6  2015  [Refereed]

    © Springer International Publishing Switzerland 2015. We present improved visual BCI classification accuracy results after applying high- and low-pass filters to an electroencephalogram (EEG) containing code-modulated visual evoked potentials (cVEPs). The cVEP responses are applied to the brain-computer interface (BCI) in a four-command paradigm mode. The purpose of this project is to enhance BCI accuracy using only the single trial cVEP response. We also aim at identification of the most discriminable EEG bands suitable for the broadband visual stimuli. We report results from a pilot study optimizing the EEG filtering using infinite impulse response filters in application to feature extraction for a linear support vector machine (SVM) classification method. The goal of the presented study is to develop a faster and more reliable BCI to further enhance the symbiotic relationships between humans and computers.

    DOI

  • SVM Classification Study of Code-modulated Visual Evoked Potentials

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1065 - 1070  2015  [Refereed]

    We present a study of a support vector machine (SVM) application to brain-computer interface (BCI) paradigm. Four SVM kernel functions are evaluated in order to maximize classification accuracy of a four classes-based BCI paradigm utilizing a code-modulated visual evoked potential (cVEP) response within the captured EEG signals. Our previously published reports applied only the linear SVM, which already outperformed a more classical technique of a canonical correlation analysis (CCA). In the current study we additionally test and compare classification accuracies of polynomial, radial basis and sigmoid kernels, together with the classical linear (non-kernel-based) SVMs in application to the cVEP BCI.

  • TDOA estimation by mapped SRP based on higher-order moment analysis

    Xiao-Dong,Zhai, Yuya,Sugimoto, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. APSIPA 2014    2014.12  [Refereed]

  • Adaptive control of applying band-width for post filter of speech coder depending on pitch frequency

    Hironobu,Chiba, Yutaka,Kamamoto, Takehiro,Moriya, Noboru,Harada, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. Asilomar Conference on Signals, Systems, and Computers, Asilomar 2014    2014.11  [Refereed]

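  • A study on estimating the performance of speech recognition in noise using cepstral distance

    郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    Proceedings of the Acoustical Society of Japan Meeting     61 - 62  2014.09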

  • Spatial tactile brain-computer interface paradigm applying vibration stimuli to large areas of user's back

    T.,Kodama, Makino,Shoji, T.M.,Rutkowski

    International Brain-Computer Interface Conference     1 - 4  2014.09  [Refereed]

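  • Target source enhancement using a virtual increase of microphone elements by generalized amplitude interpolation based on β-divergence

    片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    Proceedings of the Acoustical Society of Japan Meeting     633 - 636  2014.09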

    CiNii

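  • Relationship between the number and placement of microphones and target sound enhancement performance in transfer-function-gain-basis NMF

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    Proceedings of the Acoustical Society of Japan Meeting     523 - 526  2014.09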

    CiNii

  • Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

    M.,Chang, K.,Mori, Makino,Shoji, Rutkowski, Tomasz Maciej

    Procedia Technology   18   25 - 31  2014.09  [Refereed]

    We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

    DOI

  • Head-related impulse response-based spatial auditory brain-computer interface

    C.,Nakaizumi, T.,Matsui, K.,Mori, Makino,Shoji, T.M.,Rutkowski

    International Brain-Computer Interface Conference     1 - 4  2014.09  [Refereed]

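  • Estimation of the correlation coefficient of two complex signals using only observations of their absolute values

    宮部滋樹, 小野順貴, 牧野,昭二

    Proceedings of the Acoustical Society of Japan Meeting   ( 1-Q-40 ) 735 - 738  2014.09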

    CiNii

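  • Evaluation of the characteristics of penalty terms in target sound enhancement by unsupervised transfer-function-gain-basis NMF

    千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    Proceedings of the Acoustical Society of Japan Meeting     527 - 530  2014.09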

    CiNii

  • A study on traffic vehicle detection and lane estimation using a distributed microphone array

    豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    Proceedings of the Acoustical Society of Japan Meeting     643 - 646  2014.09

    CiNii

  • Multi-stage declipping of clipping distortion based on length classification of clipped interval

    Chenlei,Li, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proceedings of the Acoustical Society of Japan Meeting     553 - 556  2014.09

    CiNii

  • Unisoner: An interactive interface for derivative chorus creation from various singing voices on the web

    K.,Tsuzuki, T.,Nakano, M.,Goto, T.,Yamada, Makino,Shoji

    International Computer Music Conference joint with the Sound & Music Computing conference     790 - 797  2014.09  [Refereed]

  • Unisoner: an interactive interface for derivative chorus creation from various singing voices on the Web

    Keita,Tsuzuki, Tomoyasu,Nakano, Masataka,Goto, Takeshi,Yamada, Shoji,Makino

    Proc. ICMC SMC 2014     790 - 797  2014.09  [Refereed]

  • News from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2014.08  [Refereed]

  • Electroencephalogram steady state response sonification focused on the spatial and temporal properties

    Makino,Shoji, T.,Kaniwa, H.,Terasawa

    International Conference on Auditory Display   ( LS7-1 ) 1 - 7  2014.06  [Refereed]

  • EEG Steady State Response Sonification Focused on the Spatial and Temporal Properties

    Kaniwa, Teruaki, Terasawa, Hiroko, Matsubara, Masaki, Rutkowski, Tomasz, Makino, Shoji

    Proceedings of the 20th International Conference on Auditory Display 2014 (ICAD2014)     1 - 7  2014.06  [Refereed]

  • Reduction of computational cost in underdetermined blind source separation based on frequency-dependent time-difference-of-arrival estimation

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田,武志, 牧野昭二, 中村篤

    The Journal of the Acoustical Society of Japan   70 ( 6 ) 323 - 331  2014.06  [Refereed]

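    In this paper, we propose a method for speeding up sparseness-based underdetermined blind source separation (BSS) that uses the EM algorithm. Izumi et al. proposed an underdetermined BSS method that is robust even under noise and reverberation, but its computational cost was problematic because its update rule finds the time-difference-of-arrival (TDOA) parameters by an exhaustive discrete search at every iteration. We therefore propose a low-cost update rule that treats the TDOA parameters as frequency dependent, so that they are updated analytically. In addition, we reduce the number of parameters and improve convergence by estimating a frequency-independent TDOA through a band-weighted average. Experiments confirmed that the proposed method reduces the computation time to about one tenth.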

    CiNii

  • Multimedia Information Processing Combining Brain Science, Life Science, and Information Science

    Makino,Shoji

    USJI Universities Research Report   vol.32  2014.06  [Refereed]

  • Reduction of computational cost in underdetermined blind source separation based on frequency-dependent time-difference-of-arrival estimation

    T.,Maruyama, S.,Araki, T.,Nakatani, S.,Miyabe, T.,Yamada, S.,Makino, A.,Nakamura

    J. Acoust. Soc. Jpn   vol. 70 ( no. 6 ) 323 - 331  2014.06  [Refereed]

    CiNii

  • Acoustic signal processing based on asynchronous and distributed microphone array

    N., Ono, S., Miyabe, S., Makino

    J. Acoust. Soc. Jpn   vol. 70 ( no. 7 ) 391 - 396  2014.06  [Refereed]

  • Reduction of computational cost in underdetermined blind source separation based on frequency dependent time-difference-of-arrival estimation

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野, 昭二, 中村, 篤

    J. Acoust. Soc. Jpn   70 ( 6 ) 323 - 331  2014.06  [Refereed]

    CiNii

  • Ad-hoc microphone array - Acoustic signal processing using multiple mobile recording devices -

    N., Ono, K.L., Trung, S., Miyabe, S., Makino

    IEICE Fundamentals Review   vol. 7 ( no. 4 ) 336 - 347  2014.04  [Refereed]

    Microphone array signal processing is a framework for source localization, source enhancement and source separation with processing multichannel observations achieved using multiple microphones, which are difficult using a single microphone. In microphone array signal processing, the small time difference between channels is a very important cue to obtain spatial information. Therefore, a multichannel A-D converter has been conventionally essential for synchronized observation. On the other hand, if microphone array signal processing can be performed using asynchronous recording devices such as laptop PCs, voice recorders, or smart phones, which are easily available in daily life, it would enhance convenience and increase the number of possible applications markedly. In this review, focusing on a new trend of microphone array signal processing using asynchronously recording devices, we survey existing works and also introduce our approach.

    DOI CiNii

  • Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

    H.,Chiba, Y.,Kamamoto, T.,Moriya, N.,Harada, S.,Miyabe, T.,Yamada, S.,Makino

    IEICE Trans. Information and Systems    2014.04  [Refereed]

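  • Multichannel extension of audio coding based on nonnegative matrix factorization and phase reconstruction

    劉必翔, 澤田宏, 宮部滋樹, 山田武志, 牧野昭二

    Acoustical Society of Japan Spring Meeting     819 - 822  2014.03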

    CiNii

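  • A study on a speech recognition performance estimation method applicable to various noise suppression methods and recognition tasks

    郭レイ, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

    Acoustical Society of Japan Spring Meeting     13 - 14  2014.03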

  • Adaptation of the pitch emphasis band and gain of a post-filter for ACELP

    千葉大将, 鎌本優, 守谷健弘, 原田登, 宮部滋樹, 山田武志, 牧野昭二

    Acoustical Society of Japan Spring Meeting     387 - 388  2014.03

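  • A study on automatic scoring that accounts for redundancy and incompleteness of utterances in the sentence read-aloud task of the Japanese speaking test S-CAT

    山畑勇人, 盧昊, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

    Acoustical Society of Japan Spring Meeting     269 - 272  2014.03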

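  • A study on automatic scoring that accounts for the difficulty of uttered sentences in the free-speech task of the Japanese speaking test S-CAT

    盧昊, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

    Acoustical Society of Japan Spring Meeting     273 - 276  2014.03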

  • A-10-10 Traffic monitoring by using ad-hoc microphone arrays

    豊田卓矢, 宮部滋樹, 山田,武志, 小野順貴, 牧野昭二

    Proceedings of the IEICE General Conference   2014   151  2014.03

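  • Relationship between bit rate and synchronization performance in coded recording with an asynchronous microphone array

    宮部,滋樹, 小野,順貴, 牧野,昭二, 高橋,祐

    Proceedings of the Acoustical Society of Japan Meeting   ( 3-2-8 ) 725 - 726  2014.03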

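  • A study on target sound enhancement in distributed asynchronous recording by transfer-function-gain-basis NMF

    千葉大将, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二, 高橋祐

    Acoustical Society of Japan Spring Meeting     757 - 760  2014.03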

    CiNii

  • Activity Report from the AASP-TC

    S.,Makino

    IEEE Signal Processing Society eNewsletter, TC News    2014.02  [Refereed]

  • GENERALIZED AMPLITUDE INTERPOLATION BY beta-DIVERGENCE FOR VIRTUAL MICROPHONE ARRAY

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     149 - 153  2014  [Refereed]

    In this paper, we present a generalization of the virtual microphone array we previously proposed to increase the microphone elements by nonlinear interpolation. In the previous work, we generated a virtual observation from two actual microphones by an interpolation in the logarithmic domain. This corresponds to a linear interpolation of the phase and the geometric mean of the amplitude. In this paper, we generalize this interpolation using a linear interpolation of the phase and a nonlinear interpolation of the amplitude with adjustable nonlinearity based on beta-divergence. Improvement of the array signal processing performance is obtained by appropriate tuning of the parameter beta. We evaluate the improvement in speech enhancement using a maximum SNR beamformer.
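
    The earlier log-domain interpolation described in the summary (linear in phase, geometric mean in amplitude) can be sketched for a single time-frequency bin as follows. This is an illustration under assumptions: the function name, the weight `alpha`, and the choice to interpolate along the wrapped phase difference are not taken from the paper, and the β-divergence generalization of the amplitude is not implemented.

```python
import cmath
import math


def virtual_observation(x1: complex, x2: complex, alpha: float) -> complex:
    """Interpolate one STFT bin between two real microphones.

    alpha = 0 returns x1 and alpha = 1 returns x2. Values in between
    give a virtual microphone: phase interpolated linearly, amplitude
    interpolated as a weighted geometric mean.
    """
    # Phase difference wrapped to [-pi, pi] (an assumption of this sketch).
    dphi = cmath.phase(x2 * x1.conjugate())
    phase = cmath.phase(x1) + alpha * dphi
    amplitude = abs(x1) ** (1.0 - alpha) * abs(x2) ** alpha
    return cmath.rect(amplitude, phase)
```

    With amplitudes 4 and 1 and phases 0 and π/2, the midpoint `alpha = 0.5` has amplitude 2 (the geometric mean) and phase π/4.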

  • AMPLITUDE-BASED SPEECH ENHANCEMENT WITH NONNEGATIVE MATRIX FACTORIZATION FOR ASYNCHRONOUS DISTRIBUTED RECORDING

    Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Yu Takahashi, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     203 - 207  2014  [Refereed]

    In this paper, we investigate amplitude-based speech enhancement for asynchronous distributed recording. In an ad-hoc microphone array context, it is supposed that different asynchronous devices record speech. As a result, the phase information is unreliable due to sampling frequency mismatch. For speech enhancement based on the amplitude information instead of the phase information, supervised nonnegative matrix factorization (NMF) is introduced in the time-channel domain. The basis vectors, which represent the gain of the transfer function from a source to each microphone, are trained in advance by using single-source observations. The experimental evaluations show that this approach is robust against the sampling frequency mismatch.

  • Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

    Moonjeong Chang, Koichi Mori, Shoji Makino, Tomasz M. Rutkowski

    INTERNATIONAL WORKSHOP ON INNOVATIONS IN INFORMATION AND COMMUNICATION SCIENCE AND TECHNOLOGY, IICST 2014   18   25 - 31  2014  [Refereed]

    We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

  • Chromatic SSVEP BCI Paradigm Targeting the Higher Frequency EEG Responses

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WP2-3-2 ) 1 - 7  2014  [Refereed]

    A novel approach to steady-state visual evoked potential (SSVEP) based brain-computer interfaces (BCI) is presented in this paper. To minimize possible side effects of monochromatic-light SSVEP-based BCI, we propose to utilize chromatic green-blue flicker stimuli at frequencies higher than those traditionally used. The resulting safer SSVEP responses are processed and classified with features drawn from EEG power spectra. Results obtained from healthy users support the research hypothesis of the chromatic and higher-frequency SSVEP. The feasibility of the proposed method is evaluated in a comparison of monochromatic versus chromatic SSVEP responses. We also present preliminary results with empirical mode decomposition (EMD) adaptive filtering, which resulted in improved classification accuracies.

  • P300 Responses Classification Improvement in Tactile BCI with Touch-sense Glove

    Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WP2-3-3 ) 1 - 7  2014  [Refereed]

    This paper reports on a project aiming to confirm whether a tactile stimulator "touch sense glove" is effective for a novel brain-computer interface (BCI) paradigm and whether the tactile stimulus delivered to the fingers could be utilized to evoke event related potential (ERP) responses with possible attentional modulation. The tactile ERPs are expected to improve the BCI accuracy. The proposed new stimulator device is presented in detail together with psychophysical and EEG BCI experiment protocols. Results supporting the proposed "touch sense glove" device are presented in the form of online BCI classification accuracy results. Finally, we outline possible future paradigm improvements.

  • TDOA Estimation by Mapped Steered Response Power Analysis Utilizing Higher-Order Moments

    Xiao-Dong Zhai, Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FP-P1-3 ) 1 - 4  2014  [Refereed]

    In this paper, we propose a new estimation method for the time difference of arrival (TDOA) between two microphones with improved accuracy by exploiting higher-order moments. The proposed method analyzes the steered response power (SRP) of the observed signals after they are nonlinearly mapped onto a higher-dimensional space. Since the mapping operation enhances the linear independence between different vectors by increasing the dimensionality of the observed signals, the TDOA analysis achieves higher resolution. The results of an experiment comparing the TDOA estimation performance of the proposed method with that of the conventional methods reveal the robustness of the proposed method against noise and reverberation.

  • Tactile and Bone-conduction Auditory Brain Computer Interface for Vision and Hearing Impaired Users - Stimulus Pattern and BCI Accuracy Improvement

    Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FP2-6-3 ) 1 - 7  2014  [Refereed]

    This paper aims to improve tactile and bone-conduction brain computer interface (tbaBCI) classification accuracy based on a new stimulus pattern search in order to trigger more separable P300 responses. We propose and investigate three approaches to stimulus spatial and frequency content modification. As a result of the online tbaBCI classification accuracy tests with six subjects, we conclude that frequency modification in the previously reported single vibrotactile exciter-based patterns leads to improvements at the border of statistical significance.

  • Tactile Pressure Brain-computer Interface Using Point Matrix Pattern Paradigm

    Kensuke Shimizu, Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS)     473 - 477  2014  [Refereed]

    The paper presents a tactile pressure stimulus-based brain-computer interface (BCI) paradigm. Stimulus patterns from a 3 x 3 matrix of pressure pins are presented to the subjects in an oddball paradigm allowing for "aha-responses" generation to attended targets. The research hypothesis is confirmed by the results from five subjects performing online BCI experiments. One of the users scored 100% accuracy in an online BCI test based on ten averages. Three users scored above chance levels, while one remained on the chance level border. The presented pilot study experiments and EEG results confirm the effectiveness of the proposed tactile pressure stimulus based BCI.

  • TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY

    Takuya Toyoda, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     318 - 322  2014

    In this paper, we propose an easy and convenient method for traffic monitoring based on acoustic sensing with vehicle sound recorded by an ad-hoc microphone array. Since signals recorded by an ad-hoc microphone array are asynchronous, we perform channel synchronization by compensating for the difference between the start and the end of the recording and the sampling frequency mismatch. To monitor traffic, we estimate the number of vehicles by detecting peaks in the power envelopes, and classify the traffic lane from the difference between the propagation times to the microphones. We also demonstrate the effectiveness of our proposed method using the results of an experiment in which we estimated the number of vehicles and classified the lane in which the vehicles were traveling, evaluated by F-measure.

  • Adaptive Post-Filtering Controlled by Pitch Frequency for CELP-based Speech Coder

    Hironobu Chiba, Yutaka Kamamoto, Takehiro Moriya, Noboru Harada, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS     838 - 842  2014  [Refereed]

    Most speech codecs utilize a post-filter that emphasizes pitch structures to enhance perceptual quality at the decoder. In particular, the bass post-filter used in ITU-T G.718 performs adaptive pitch enhancement for a fixed lower frequency band. This paper describes a new post-filtering method in which the frequency band and the gain of the bass post-filter are adaptively controlled frame-by-frame depending on the pitch frequency of the decoded signal to improve bass post-filter performance. We have confirmed the improvement of the speech quality with the developed method through objective and subjective evaluations.

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FA1-1-3 ) 1 - 5  2014  [Refereed]

    In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using the transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. We undertook experimental evaluations, which show that the performance of supervised NMF can come close to that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.

  • Automatic Scoring Method for Open Answer Task in the SJ-CAT Speaking Test Considering Utterance Difficulty Level

    Hao Lu, Takeshi Yamada, Shingo Imai, Takahiro Shinozaki, Ryuichi Nisimura, Kenkichi Ishizuka, Shoji Makino, Nobuhiko Kitawaki

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WA1-1-3 ) 1 - 5  2014  [Refereed]

    In this paper, we propose an automatic scoring method for the open answer task of the Japanese speaking test SJ-CAT. The proposed method first extracts a set of features from an input answer utterance and then estimates the vocabulary richness score given by human raters, which ranges from 0 to 4, by employing SVR (support vector regression). We devised a novel set of features, namely text statistics weighted by word reliability, to assess the abundance of vocabulary and expression, and degree of word relevance based on the hierarchical distance in a thesaurus to evaluate the suitability of vocabulary. We confirmed experimentally that the proposed method provides good estimates of the human richness score, with a correlation coefficient of 0.92 and an RMSE (root mean square error) of 0.56. We also showed that the proposed method is relatively robust to differences among examinees and among questions used for training and testing.

  • Auditory Brain-Computer Interface Paradigm with Head Related Impulse Response-based Spatial Cues

    Chisaki Nakaizumi, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

    Proc. International Conference on Signal Image Technology and Internet Based Systems   ( WS-MISA-01 ) 806 - 811  2013.12  [Refereed]

    The aim of this study is to provide a comprehensive test of head related impulse response (HRIR) for an auditory spatial speller brain-computer interface (BCI) paradigm. The study is conducted with six users in an experimental setup based on five Japanese hiragana vowels. Auditory evoked potentials resulted in encouragingly good and stable "aha-" or P300-responses in real-world online BCI experiments. Our case study indicated that the auditory HRIR spatial sound reproduction paradigm could be a viable alternative to the established multi-loudspeaker surround sound BCI-speller applications, as far as healthy pilot study users are concerned.

    Scopus citations: 3

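  • Unisoner: An interface supporting chorus creation by overlaying different singing voices of the same song

    Keita Tsuzuki, Tomoyasu Nakano, Masataka Goto, Takeshi Yamada, Shoji Makino

    21st Workshop on Interactive Systems and Software, WISS2013    2013.12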

  • Novel spatial tactile and bone-conduction auditory brain computer interface

    T. M. Rutkowski, H. Mori, S. Makino, K. Mori

    Proc. Neuroscience2013     79  2013.11  [Refereed]

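  • Systems that exploit the diversity of singing voices of various singers singing the same song

    Keita Tsuzuki, Tomoyasu Nakano, Masataka Goto, Takeshi Yamada, Shoji Makino

    IPSJ SIG Technical Report   2013-MUS-100-21   1 - 8  2013.09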

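    This paper proposes two systems that make use of "singing voices of a single song sung by various singers" published on the Web. One is a chorus generation support system that overlays these singing voices; the other is a singing-skill improvement support system that lets users compare those voices with each other and with their own singing voice. Previous research has addressed appreciation and creation support using multiple songs, as well as singing-skill improvement support in which the user simply sings alone, but there has been no chorus generation or singing-skill improvement support that utilizes singing voices of the same song sung by multiple people. In the chorus generation support system, the onset time of each singing voice and the volume of the left and right channels can be adjusted intuitively with a mouse. By exploiting this intuitive operation and the fact that each singing voice is a finished work in itself, "creative appreciation", in which the user enjoys listening while creating, also becomes possible. In the singing-skill improvement support system, singing voices with similar voice quality (MFCC) and singing style (F0 trajectory) can be displayed side by side for comparison. Because the voices are published on the Web with play counts and My List counts, users can work on improving their singing skill while making use of this information. These systems…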

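  • Control of a post-filter for ACELP according to the characteristics of the decoded signal

    Hironobu Chiba, Takehiro Moriya, Yutaka Kamamoto, Noboru Harada, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    Autumn Meeting of the Acoustical Society of Japan     319 - 320  2013.09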

  • Some advances in adaptive source separation

    J. T. Chien, H. Sawada, S. Makino

    APSIPA Newsletter     7 - 9  2013.09  [Refereed]

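  • Frequency-dependent optimization of element arrangement for a virtually element-increased microphone array using complex logarithmic interpolation

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    Autumn Meeting of the Acoustical Society of Japan     609 - 610  2013.09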

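  • Synchronization of asynchronous recordings using frame analysis with non-integer sample shift

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    Proc. Acoustical Society of Japan Meeting   ( 1-1-9 ) 593 - 596  2013.09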

  • News from the AASP-TC

    Shoji Makino

    IEEE Signal Processing Society eNewsletter, TC News    2013.08  [Refereed]

  • Network based complexity analysis in tactile brain computer interface task

    H. Mori, Y. Matsumoto, S. Makino, Z. Struzik, D. Mandic, T. M. Rutkowski

    Proc. EMBC2013   51 ( M-134 ) 1 - 1  2013.07  [Refereed]

  • Multi-command tactile and auditory brain computer interface based on head position stimulation

    H. Mori, Y. Matsumoto, Z. Struzik, K. Mori, S. Makino, D. Mandic, T. M. Rutkowski

    Proc. International Brain-Computer Interface Meeting   ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2  2013.06  [Refereed]

  • Spatial tactile and auditory brain computer interface based on head position stimulation

    T. M. Rutkowski, H. Mori, Y. Matsumoto, Z. Struzik, S. Makino, D. Mandic, K. Mori

    Proc. Neuro2013    2013.06  [Refereed]

  • Comparison of P300 responses in auditory, visual and audiovisual spatial speller BCI paradigms

    M. Chang, N. Nishikawa, Z. Struzik, K. Mori, S. Makino, D. Mandic, T. M. Rutkowski

    Proc. International Brain-Computer Interface Meeting   ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2  2013.06  [Refereed]

  • Blind compensation of inter-channel sampling frequency mismatch with maximum Likelihood estimation in STFT domain

    S. Miyabe, N. Ono, S. Makino

    Proc. ICASSP2013     674 - 678  2013.05  [Refereed]

    This paper proposes a novel blind compensation of sampling frequency mismatch for asynchronous microphone arrays. Digital signals simultaneously observed by different recording devices exhibit a drift in the time differences between the observation channels because of the sampling frequency mismatch among the devices. Based on the model that the time difference is constant within each time frame but varies in proportion to the time frame index, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform domain by a linear phase shift. By assuming that the sources are motionless and stationary, a likelihood of the sampling frequency mismatch is formulated. The maximum likelihood estimate is obtained efficiently by a golden section search.

  • Signal Separation of EEG Using Multivariate Probabilistic Model

    Yusuke Kurihana, Shigeki Miyabe, Tomasz M. Rutkowski, Yoshihiro Matsumoto, Takeshi Yamada, Shoji Makino

    IEICE technical report. ME and bio cybernetics   112 ( 479 ) 161 - 166  2013.03

    With independent component analysis (ICA), one promising source separation framework, it is difficult to separate desired signal components from the EEG observation, where a vast number of sources are mixed. In this paper, we define the change of magnitude caused by each phenomenon inside the brain as an EEG event, and we formulate the probability model of the EEG event assuming that the observation of each EEG event follows a multivariate normal distribution locally in every short period. By assuming that each EEG event is distributed sparsely in the time-frequency domain, the likelihood of the observation is given by a Gaussian mixture model (GMM), and the parameters of the EEG events are estimated by an expectation-maximization (EM) algorithm. Also, by introducing a Dirichlet prior with an appropriate hyperparameter on the activation of each Gaussian component, the EM algorithm gains the ability to estimate both the number of significant EEG events and their parameters. An EEG separation experiment reveals that the proposed method can separate an appropriate number of EEG events.

  • A network model for the embodied communication of musical emotions

    H. Terasawa, R. Hoshi-Shiba, T. Shibayama, H. Ohmura, K. Furukawa, S. Makino, K. Okanoya

    Cognitive Studies   20 ( 1 ) 112 - 129  2013.03  [Refereed]

    Music induces a wide range of emotions. However, the influence of physiological functions on musical emotions needs further theoretical consideration. This paper summarizes the physical and physiological functions that are related to musical emotions, and proposes a model for the embodied communication of musical emotions based on a discussion of the transmission of musical emotions across people by sharing movements and gestures. In this model, a human with musical emotion is represented by (1) the interfaces of perception and expression (senses, movements, facial and vocal expressions), (2) an internal system of neural activities including the mirror system and the hormonal secretion system that handles responses to musical activities, and (3) the musical emotion that is enclosed in the internal system. Using this model, music is the medium for transmitting emotions, and communication of musical emotions is the communication of internal emotions through music and perception/expression interfaces. Finally, we discuss which aspects of music function to encourage the communication of musical emotions by humans.

  • Speech enhancement with ad-hoc microphone array using single source activity

    Ryutaro Sakanashi, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( OS.21-SLA.7.5 ) 1 - 6  2013  [Refereed]

    In this paper, we propose a method for synchronizing asynchronous channels in an ad-hoc microphone array based on single source activity for speech enhancement. An ad-hoc microphone array can include multiple recording devices, which do not communicate with each other. Therefore, their synchronization is a significant issue when using conventional microphone array techniques. We here assume that we know two or more segments (typically the beginning and the end of the recording) where only a single sound source is active. Based on this assumption, we compensate for the difference between the start and end of the recording and the sampling frequency mismatch. We also describe experimental results for speech enhancement with a maximum SNR beamformer.

  • Performance estimation of noisy speech recognition using spectral distortion and SNR of noise-reduced speech

    Guo Ling, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    IEEE Region 10 Annual International Conference, Proceedings/TENCON    2013  [Refereed]

    To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using the PESQ (Perceptual Evaluation of Speech Quality) as a spectral distortion measure. However, there is the problem that the relationship between the recognition performance and the distortion value differs depending on the noise reduction algorithm used. To solve this problem, we propose a novel performance estimation method that uses an estimator defined as a function of the distortion value and the SNR (Signal to Noise Ratio) of noise-reduced speech. The estimator is applicable to different noise reduction algorithms without any modification. We confirmed the effectiveness of the proposed method by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. © 2013 IEEE.

    Scopus citations: 3

  • Classification improvement of P300 response based auditory spatial speller brain-computer interface paradigm

    Moonjeong Chang, Shoji Makino, Tomasz M. Rutkowski

    IEEE Region 10 Annual International Conference, Proceedings/TENCON   ( S.I.2.1 ) 1 - 4  2013  [Refereed]

    The aim of the presented study is to provide a comprehensive test of the EEG evoked response potential (ERP) feature selection techniques for the spatial auditory BCI-speller paradigm, which creates a novel communication option for paralyzed subjects or able-bodied individuals requiring a direct brain-computer interfacing application. For rigor, the study is conducted with 16 BCI-naive healthy subjects in an experimental setup based on five Japanese hiragana characters in an offline processing mode. In our previous studies, the P300 responses related to spatial auditory stimuli resulted in encouragingly separable target vs. non-target latencies in averaged responses, yet that finding was not well reproduced in the online BCI single-trial settings. We present a case study indicating that the classification accuracy of the auditory spatial unimodal paradigm can be enhanced with an AUC based feature selection approach, as far as BCI-naive healthy subjects are concerned. © 2013 IEEE.

    Scopus citations: 5

  • Bone-conduction-based brain computer interface paradigm - EEG signal processing, feature extraction and classification

    Daiki Aminaka, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

    Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013   ( WS-MISA-03 ) 818 - 824  2013  [Refereed]

    The paper presents a novel bone-conduction based brain-computer interface paradigm. Four sub-threshold acoustic frequency stimulus patterns are presented to the subjects in an oddball paradigm allowing for 'aha-responses' generation to the attended targets. This allows for successful implementation of the bone-conduction based brain-computer interface (BCI) paradigm. The concept is confirmed with seven subjects in an online BCI paradigm for spelling with bone-conducted auditory Morse-code patterns. We also report the brain electrophysiological signal processing and classification steps taken to achieve the successful BCI paradigm. We also present a finding on the response latency variability as a function of stimulus difficulty. © 2013 IEEE.

    Scopus citations: 1

  • VIRTUALLY INCREASING MICROPHONE ARRAY ELEMENTS BY INTERPOLATION IN COMPLEX-LOGARITHMIC DOMAIN

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)   ( TH-L5.3 )  2013

    In this paper, we propose a new array signal processing technique for an underdetermined condition by increasing the number of observation channels. We introduce virtual observation as an estimate of the observed signals at positions where real microphones are not placed. Such signals at virtual observation channels are generated by the complex logarithmic interpolation of real observed signals. With the increased number of observation channels, conventional linear array signal processing methods can be applied to underdetermined conditions. As an example of the proposed array signal processing framework, we show experimental results of speech enhancement obtained with maximum SNR beamformers modified using the virtual observation.

  • Multi-command chest tactile brain computer interface for small vehicle robot navigation

    Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8211 LNAI   469 - 478  2013  [Refereed]

    The presented study explores the extent to which tactile stimuli delivered to five chest positions of a healthy user can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The five chest locations are used to evoke tactile brain potential responses, thus defining a tactile brain computer interface (tBCI). Experimental results with five subjects performing online tBCI provide a validation of the chest location tBCI paradigm, while the feasibility of the concept is illuminated through information-transfer rates. Additionally an offline classification improvement with a linear SVM classifier is presented through the case study. © Springer International Publishing 2013.

    Scopus citations: 11

  • Classifying P300 responses to vowel stimuli for auditory brain-computer interface

    Yoshihiro Matsumoto, Shoji Makino, Koichi Mori, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.8 ) 1 - 5  2013  [Refereed]

    A brain-computer interface (BCI) is a technology for operating computerized devices based on brain activity and without muscle movement. BCI technology is expected to become a communication solution for amyotrophic lateral sclerosis (ALS) patients. Recently the BCI2000 package application has been commonly used by BCI researchers. The P300 speller included in the BCI2000 is an application allowing the calculation of a classifier necessary for the user to spell letters or sentences in a BCI-speller paradigm. The BCI-speller is based on visual cues, and requires muscle activities such as eye movements, impossible to execute by patients in a totally locked-in state (TLS), which is a terminal stage of the ALS illness. The purpose of our project is to solve this problem, and we aim to develop an auditory BCI as a solution. However, contemporary auditory BCI-spellers are much weaker compared with a visual modality. Therefore there is a necessity for improvement before practical application. In this paper, we focus on an approach related to the differences in responses evoked by various acoustic BCI-speller related stimulus types. In spite of various event related potential waveform shapes, typically a classifier in the BCI speller discriminates only between targets and non-targets, and hence it ignores valuable and possibly discriminative features. Therefore, we expect that the classification accuracy could be improved by using an independent classifier for each of the stimulus cue categories. In this paper, we propose two classifier training methods. The first one uses the data of the five stimulus cues independently. The second method incorporates weighting for each stimulus cue feature in relation to all of them. The results of the experiments reported show the effectiveness of the second method for classification improvement. © 2013 APSIPA.

    Scopus citations: 20

  • EMPLOYING MOMENTS OF MULTIPLE HIGH ORDERS FOR HIGH-RESOLUTION UNDERDETERMINED DOA ESTIMATION BASED ON MUSIC

    Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Fred Juang

    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)   ( PM-02 ) 1 - 4  2013  [Refereed]

    Several extensions of the MUltiple SIgnal Classification (MUSIC) algorithm exploiting high-order statistics have been proposed to estimate directions of arrival (DOAs) with high resolution in underdetermined conditions. However, these methods entail a trade-off between two performance goals, namely, robustness and resolution, in the choice of orders, because the use of high-order statistics increases not only the resolution but also the statistical bias. To overcome this problem, this paper proposes a new extension of MUSIC using a nonlinear high-dimensional map, which corresponds to the joint analysis of moments of multiple orders and helps to realize both the robustness of low-order statistics and the high resolution of high-order statistics. Experimental results show that the proposed method can estimate DOAs more accurately than the conventional MUSIC extensions exploiting moments of a single high order.

  • OPTIMIZING FRAME ANALYSIS WITH NON-INTEGER SHIFT FOR SAMPLING MISMATCH COMPENSATION OF LONG RECORDING

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)   ( TM-09 ) 1 - 4  2013  [Refereed]

    This paper proposes blind synchronization of an ad-hoc microphone array in the short-time Fourier transform (STFT) domain with optimized frame analysis centered at non-integer discrete times. We show that the drift caused by the sampling frequency mismatch of asynchronous observation channels can be disregarded in a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated roughly by finding two pairs of short intervals corresponding to the same continuous time. Using the estimate, the STFT analysis is synchronized roughly between channels with optimized frame centers. Since the optimized frame center is generally non-integer, we approximate the frame analysis by linear phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation refines the compensation of the sampling frequency mismatch.

  • Spatial auditory BCI with ERP responses to front-back to the head stimuli distinction support

    Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.1 ) 1 - 8  2013  [Refereed]

    This paper presents recent results obtained with a new auditory spatial localization based BCI paradigm in which ERP shape differences at early latencies are employed to enhance classification accuracy in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing the possibility to differentiate early anterior contralateral responses to the spatial sources attended to. We also find that early brain responses indicate which direction, front or rear loudspeaker source, the subject attended to. Contemporary stimuli-driven BCI paradigms benefit most from the P300 ERP latencies in a so-called 'aha-response' setting. We show the further enhancement of the classification results in a spatial auditory paradigm, in which we incorporate N200 latencies. The results reveal that these early spatial auditory ERPs boost offline classification results of the BCI application. The offline BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information transfer rates. © 2013 APSIPA.

    Scopus citations: 2
  • Adaptive processing and learning for audio source separation

    Jen-Tzung Chien, Hiroshi Sawada, Shoji Makino

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.42-SLA.13.3 ) 1 - 6  2013  [Refereed]

    This paper overviews a series of recent advances in adaptive processing and learning for audio source separation. In the real world, speech and audio signal mixtures are observed in reverberant environments. There are usually more sources than mixtures. The mixing condition occasionally changes because sources move, are replaced, or abruptly appear or disappear. In this survey article, we investigate different issues in audio source separation including overdetermined/underdetermined problems, permutation alignment, convolutive mixtures, contrast functions, nonstationary conditions and system robustness. We provide a systematic and comprehensive view of these issues and address new approaches to overdetermined/underdetermined convolutive separation, sparse learning, nonnegative matrix factorization, information-theoretic learning, online learning and Bayesian approaches. © 2013 APSIPA.

    Scopus citations: 3
  • Spatial auditory BCI paradigm based on real and virtual sound image generation

    Nozomu Nishikawa, Shoji Makino, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.7 ) 1 - 5  2013  [Refereed]

    This paper presents a novel concept of spatial auditory brain-computer interface utilizing real and virtual sound images. We report results obtained from psychophysical and EEG experiments with nine subjects utilizing a novel method of spatial real or virtual sound images as spatial auditory brain computer interface (BCI) cues. Real spatial sound sources result in better behavioral and BCI response classification accuracies, yet a direct comparison of partial results in a mixed experiment confirms the usability of the virtual sound images for the spatial auditory BCI. Additionally, we compare stepwise linear discriminant analysis (SWLDA) and support vector machine (SVM) classifiers in a single sequence BCI experiment. The interesting point of the mixed usage of real and virtual spatial sound images in a single experiment is that both stimuli types generate distinct event related potential (ERP) response patterns allowing for their separate classification. This discovery is the strongest point of the reported research and it brings the possibility to create new spatial auditory BCI paradigms. © 2013 APSIPA.

    Scopus citations: 6
  • Multi-command tactile brain computer interface: A feasibility study

    Hiromu Mori, Yoshihiro Matsumoto, Victor Kryssanov, Eric Cooper, Hitoshi Ogawa, Shoji Makino, Zbigniew R. Struzik, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7989 LNCS   50 - 59  2013  [Refereed]

    The study presented explores the extent to which tactile stimuli delivered to the ten digits of a BCI-naive subject can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The ten fingertips are used to evoke somatosensory brain responses, thus defining a tactile brain computer interface (tBCI). Experimental results on subjects performing online (real-time) tBCI, using stimuli with a moderately fast inter-stimulus-interval (ISI), provide a validation of the tBCI prototype, while the feasibility of the concept is illuminated through information-transfer rates obtained through the case study. © 2013 Springer-Verlag.

    Scopus citations: 14
  • EEG signal processing and classification for the novel tactile-force brain-computer interface paradigm

    Shota Kono, Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013   ( WS-MISA-02 ) 812 - 817  2013  [Refereed]

    The presented study explores the extent to which tactile-force stimulus delivered to a hand holding a force-feedback joystick can serve as a platform for a brain-computer interface (BCI). The four pressure directions are used to evoke tactile brain potential responses, thus defining a tactile-force brain computer interface (tfBCI). We present brain signal processing and classification procedures leading to successful online interfacing results. Experimental results with seven subjects performing online BCI experiments provide a validation of the hand location tfBCI paradigm, while the feasibility of the concept is illuminated through remarkable information-transfer rates. © 2013 IEEE.

    Scopus citations: 4
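
Several of the BCI summaries in this list report information-transfer rates. The summaries do not say which definition was used, so the sketch below assumes the widely used Wolpaw formula, in which the bits per selection for N commands at accuracy P are log2(N) + P log2(P) + (1 - P) log2((1 - P) / (N - 1)). The function names and the example values (four commands, as in the four pressure directions of the tfBCI entry above) are illustrative only and are not results from the paper.

```python
import math


def bits_per_selection(n_commands, accuracy):
    """Wolpaw-style information per selection, in bits (assumed definition)."""
    if n_commands < 2:
        raise ValueError("need at least two commands")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_commands)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        # Errors are assumed to be spread evenly over the other commands.
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_commands - 1))
    return bits


def itr_bits_per_minute(n_commands, accuracy, selections_per_minute):
    """Information-transfer rate in bits per minute."""
    return bits_per_selection(n_commands, accuracy) * selections_per_minute


# Hypothetical example: four commands, perfect accuracy, 10 selections/minute.
example_itr = itr_bits_per_minute(4, 1.0, 10)
```

At chance-level accuracy (P = 1/N) the formula gives zero bits, which is a quick sanity check on any implementation.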
  • Inter-subject differences in personalized technical ear training and the influence of an individually optimized training sequence

    Sungyoung Kim, Teruaki Kaniwa, Hiroko Terasawa, Takeshi Yamada, Shoji Makino

    Acoustical Science and Technology   34 ( 6 ) 424 - 431  2013  [Refereed]

    Technical ear training aims to improve the listening of sound engineers so they can skillfully modify and edit the structure of sound. Despite recent increasing interest in listening ability and subjective evaluation in audio- and acoustic-related fields and the subsequent appearance of various technical ear-training methods, the subject of how to provide efficient training for a self-trainee has not yet been studied. This paper investigated trainees' performances and showed that an (inherent or learned) ability to correctly describe spectral differences using the terms of a parametric equalizer (center frequency, Q, and gain) was different for each person. To cope with such individual differences in spectral identification, the authors proposed a novel method that adaptively controls the training task based on a trainee's prior performances. In detail, the method estimates the weakness of the trainee and generates a training routine that focuses on that weakness. Subsequently, we tried to determine whether the proposed method (adaptive feedback) helps self-learners improve their performance in technical listening that involves identifying spectral differences. The results showed that the proposed method could assist trainees in improving their ability to identify differences more effectively than the counterpart group. Together with other features required for effective self-training, this adaptive feedback would assist a trainee in the acquisition of timbre-identification ability. © 2013 The Acoustical Society of Japan.

    Scopus citations: 9
  • Exhaustive structural comparison of protein-DNA binding surfaces

    R. Minai, T. Horiike, S. Makino

    GIW2012 (International Conference on Genome Informatics)   ( poster 29 )  2012.12  [Refereed]

  • Full-reference objective quality evaluation for noise-reduced speech considering effect of musical noise

    Y. Fujita, T. Yamada, S. Makino, N. Kitawaki

    Oriental COCOSDA2012     300-305  2012.12  [Refereed]

  • Foreword to special issue on recent mathematical advances in acoustic signal processing

    S. Makino

    The Journal of the Acoustical Society of Japan   68 ( 11 ) 557 - 558  2012.11  [Refereed]

  • A multi-command spatial auditory BMI based on evoked EEG responses from real and virtual sound stimuli

    T.M. Rutkowski, Z. Cai, N. Nishikawa, Y. Matsumoto, S. Makino, D. Looney, D.P. Mandic, Z.R. Struzik, A.W. Przybyszewski

    Neuroscience2012     891.16/NN4  2012.10  [Refereed]

  • Underdetermined DOA estimation by the non-linear MUSIC exploiting higher-order moments

    Y. Sugimoto, S. Miyabe, T. Yamada, S. Makino, F. Juang

    IWAENC2012   ( E-03 )  2012.09  [Refereed]

  • In Search of a Perceptual Metric for Timbre: Dissimilarity Judgments among Synthetic Sounds with MFCC-Derived Spectral Envelopes

    Hiroko Terasawa, Jonathan Berger, Shoji Makino

    JOURNAL OF THE AUDIO ENGINEERING SOCIETY   60 ( 9 ) 674 - 685  2012.09  [Refereed]

    This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, that is, the perception specifically related to the spectral element of timbre. Mel-cepstrum (Mel-frequency cepstral coefficients or MFCCs) is chosen as a hypothetical metric for spectral envelope perception due to its desirable properties of linearity, orthogonality, and multidimensionality. The experimental results confirmed the relevance of Mel-cepstrum to the perceived timbre dissimilarity when the spectral envelopes of complex-tone synthetic sounds were systematically controlled. The first experiment measured the perceived dissimilarity when the stimuli were synthesized by varying only a single coefficient from MFCC. Linear regression analysis proved that each of the 12 MFCCs has a linear correlation with spectral envelope perception. The second experiment measured the perceived dissimilarity when the stimuli were synthesized by varying two of the MFCCs. Multiple regression analysis showed that the perceived dissimilarity can be explained in terms of the Euclidean distance of the MFCC values of the synthetic sounds. The quantitative and perceptual relevance between the MFCCs and spectral centroids is also discussed. These results suggest that MFCCs can be a metric representation of spectral envelope perception, where each of its orthogonal basis functions provides a linear match with human perception.
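
The second experiment in this summary relates perceived dissimilarity to the Euclidean distance between MFCC vectors. A minimal sketch of that distance follows; the function name and the 12-coefficient example vectors are hypothetical, and the regression that maps this distance to listener judgments is not reproduced here.

```python
import math


def mfcc_distance(mfcc_a, mfcc_b):
    """Euclidean distance between two MFCC vectors of equal length."""
    if len(mfcc_a) != len(mfcc_b):
        raise ValueError("MFCC vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mfcc_a, mfcc_b)))


# Hypothetical stimuli that differ in two of twelve coefficients.
reference = [0.0] * 12
variant = [3.0, 4.0] + [0.0] * 10
distance = mfcc_distance(reference, variant)
```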

  • Analysis of brain responses to spatial real and virtual sounds - A BCI/BMI approach

    N. Nishikawa, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012.06  [Refereed]

  • Steady-state auditory responses application to BCI/BMI

    Y. Matsumoto, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012.06  [Refereed]

  • Spatial auditory BCI/BMI paradigm

    Z. Cai, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012.06  [Refereed]

  • Diffuse Noise Reduction using a Full-rank Spatial Covariance Model

    Keiju Iso, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Hiroshi Sawada, Takeshi Yamada, Shigeki Miyabe, Atsushi Nakamura

    Proceedings of the IEICE General Conference   2012 ( 0 ) 194  2012.03

  • D-14-1 Effect of Musical Noise on Subjective Quality Evaluation of Noise-Reduced Speech

    Yuki Fujita, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    Proceedings of the IEICE General Conference   2012 ( 1 ) 185  2012.03

  • Cepstral smoothing of separated signals for underdetermined speech separation

    Yumi Ansai, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Takeshi Yamada, Atsushi Nakamura, Nobuhiko Kitawaki

    The Journal of the Acoustical Society of Japan   68 ( 2 ) 74 - 85  2012.02  [Refereed]

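    This paper proposes cepstral smoothing of separated speech (CSS) to reduce musical noise in source separation methods that use a time-frequency binary mask (BM) based on the sparseness of source signals. CSS builds on two ideas: smoothing in the cepstral domain, as in the recently proposed cepstral smoothing of spectral masks (CSM), and the hypothesis that, for controlling the preservation of speech characteristics through the cepstral representation, it is preferable to directly smooth the separated speech obtained with the BM rather than the mask. We experimentally compare the conventional method (CSM) and the proposed method (CSS) with other musical-noise reduction methods. CSS reduced musical noise to the same degree as CSM while yielding separated signals with less distortion of the target speech.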

  • NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)     269 - 272  2012  [Refereed]

    In this paper, we propose a new technique for sparseness-based underdetermined BSS that is based on the clustering of the frequency-dependent time difference of arrival (TDOA) information and that can cope with diffused noise environments. Such a method with an EM algorithm has already been proposed; however, it required a time-consuming exhaustive search for TDOA inference. To remove the need for such an exhaustive search, we propose a new technique that focuses on the stereo case. We derive an update rule for analytical TDOA estimation. This update rule eliminates the need for the exhaustive TDOA search and therefore reduces the computational load. We show experimental results for separation performance and calculation time in comparison with those obtained with the conventional approach. The results validate the proposed method: it achieves high performance without a high computational cost.
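
The summary does not give the update rule itself, so the following is not the paper's method. It is a minimal sketch of the textbook relation that makes a closed-form TDOA possible in the stereo case: if one source dominates a time-frequency bin and channel 2 is a delayed copy of channel 1, the inter-channel phase difference at frequency f equals -2*pi*f*tau. The function name and the example values are assumptions, and the result is only unambiguous while |2*pi*f*tau| stays below pi.

```python
import cmath
import math


def tdoa_from_bin(x1, x2, freq_hz):
    """Delay of channel 2 relative to channel 1 from one STFT bin (seconds).

    Assumes x2 = x1 * exp(-j*2*pi*f*tau), i.e. a single dominant source and
    no spatial aliasing at this frequency.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    phase_difference = cmath.phase(x2 * x1.conjugate())
    return -phase_difference / (2.0 * math.pi * freq_hz)


# Hypothetical bin: 500 Hz component arriving 0.2 ms later at microphone 2.
true_tau = 0.0002
x1 = 1.0 + 0.5j
x2 = x1 * cmath.exp(-2j * math.pi * 500.0 * true_tau)
estimated_tau = tdoa_from_bin(x1, x2, 500.0)
```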

  • Spatial auditory BCI paradigm utilizing N200 and P300 responses

    Zhenyu Cai, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.6-BioSPS.1.4 ) 1-7  2012  [Refereed]

    The paper presents our recent results obtained with a new auditory spatial localization based BCI paradigm in which ERP shape differences at early latencies are employed to enhance the traditional P300 responses in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing the possibility of differentiating early anterior contralateral responses to attended spatial sources. Contemporary stimuli-driven BCI paradigms benefit mostly from the P300 ERP latencies in so-called "aha-response" settings. We show a further enhancement of the classification results in spatial auditory paradigms by incorporating the N200 latencies, which differentiate the brain responses to sound locations in the auditory space that are lateral in relation to the subject's head. The results reveal that those early spatial auditory ERPs boost online classification results of the BCI application. The online BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information-transfer rates. © 2012 APSIPA.

  • Sonification of Muscular Activity in Human Movements Using the Temporal Patterns in EMG

    Masaki Matsubara, Hiroko Terasawa, Hideki Kadone, Kenji Suzuki, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.6-BioSPS.1.2 ) 1-5  2012  [Refereed]

    Biofeedback is currently considered as an effective method for medical rehabilitation. It aims to increase the awareness and recognition of the body's motion by feeding back the physiological information to the patients in real time. Our goal is to create an auditory biofeedback that aids understanding of the dynamic motion involving multiple muscular parts, with the ultimate aim of clinical rehabilitation use. In this paper, we report the development of a real-time sonification system using EMG, and we propose three sonification methods that represent the data in pitch, timbre, and the combination of polyphonic timbre and loudness. Our user evaluation test involves the task of timing and order identification and a questionnaire about the subjective comprehensibility and the preferences, leading to a discussion of the task performance and usability. The results show that the subjects can understand the order of the muscular activities at 63.7% accuracy on average. And the sonification method with polyphonic timbre and loudness provides an 85.2% accuracy score on average, showing its effectiveness. Regarding the preference of the sound design, we found that there is not a direct relationship between the task performance accuracy and the preference of sound in the proposed implementations.

  • Vibrotactile stimulus frequency optimization for the haptic BCI prototype

    Hiromu Mori, Yoshihiro Matsumoto, Shoji Makino, Victor Kryssanov, Tomasz M. Rutkowski

    6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012     2150 - 2153  2012  [Refereed]

    The paper presents results from a psychophysical study conducted to optimize vibrotactile stimuli delivered to subject finger tips in order to evoke the somatosensory responses to be utilized next in a haptic brain computer interface (hBCI) paradigm. We also present the preliminary EEG evoked responses for the chosen stimulating frequency. The obtained results confirm our hypothesis that the hBCI paradigm concept is valid and it will allow for rapid stimuli presentation in order to improve information-transfer-rate (ITR) of the BCI. © 2012 IEEE.

    Scopus citations: 15
  • AUTOMATIC SCORING METHOD CONSIDERING QUALITY AND CONTENT OF SPEECH FOR SCAT JAPANESE SPEAKING TEST

    Naoko Okubo, Yuto Yamahata, Takeshi Yamada, Shingo Imai, Kenkichi Ishizuka, Takahiro Shinozaki, Ryuichi Nisimura, Shoji Makino, Nobuhiko Kitawaki

    2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS     72 - 77  2012  [Refereed]

    We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is holistically determined by language teachers according to a rating standard. In that process, teachers carefully consider different factors but do not rate the scores of them. We therefore analyze how each factor contributes to the overall score. The factors are divided into two categories: the quality of speech and the content of speech. The former includes pronunciation and intonation, and the latter representation and vocabulary. We then propose an automatic scoring method based on the analysis. Experimental results confirm that the proposed method gives relatively accurate estimates of the overall score.

  • Auditory steady-state response stimuli based BCI application - The optimization of the stimuli types and lengths

    Yoshihiro Matsumoto, Nozomu Nishikawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.13-BioSPS.2.3 ) 1-7  2012  [Refereed]

    We propose a method for improving the auditory BCI (aBCI) paradigm based on ASSR stimuli optimization, choosing each subject's best responses to AM-, flutter-, AM/FM- and click-envelope-modulated sounds. As ASSR response features, we propose pairwise phase-locking values calculated from the EEG, which are then classified with a binary classifier to detect attended and ignored stimuli. We also report on the possibility of using stimuli as short as half a second, which is a step forward in ASSR-based aBCI. The presented results are helpful for optimizing the aBCI stimuli for each subject. © 2012 APSIPA.
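
The pairwise phase-locking value used as a feature in this summary can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the instantaneous phases are taken as given (in practice they are often obtained with a Hilbert transform of band-passed EEG, which is not shown), and the channel names and function names are hypothetical.

```python
import cmath
from itertools import combinations


def phase_locking_value(phases_a, phases_b):
    """PLV of two equally long phase sequences (radians), between 0 and 1."""
    if not phases_a or len(phases_a) != len(phases_b):
        raise ValueError("phase sequences must be non-empty and equally long")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


def pairwise_plv(channel_phases):
    """PLV for every unordered pair of channels in a name -> phases mapping."""
    return {
        (first, second): phase_locking_value(
            channel_phases[first], channel_phases[second]
        )
        for first, second in combinations(sorted(channel_phases), 2)
    }


# Hypothetical phases: Cz and Pz keep a constant lag, so their PLV is 1.
features = pairwise_plv({
    "Cz": [0.1, 0.5, 0.9],
    "Pz": [0.4, 0.8, 1.2],
    "Fz": [0.0, 2.0, 4.0],
})
```

The resulting dictionary of pair values is the kind of feature vector that could then be passed to a binary classifier, as the summary describes.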

  • EEG steady state synchrony patterns sonification

    Teruaki Kaniwa, Hiroko Terasawa, Masaki Matsubara, Tomasz M. Rutkowski, Shoji Makino

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.6-BioSPS.1.5 ) 1-6  2012  [Refereed]

    This paper describes an application of a multichannel EEG sonification approach. We present results obtained with a multichannel-sonification method tested with steady-state EEG responses. We elucidate brain synchrony patterns in an auditory domain with utilization of the EEG coherence measure. The transitions in the synchrony patterns are represented as timbre (i.e., spectro-temporal) deviation and as spatial movement of the sound cluster. Our final sonification evaluation experiment with six subjects confirms the validity of the proposed brain synchrony-elucidation approach. © 2012 APSIPA.

  • Distance Attenuation Control of Spherical Loudspeaker Array

    Shigeki Miyabe, Takaya Hayashi, Takeshi Yamada, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.15-SLA.7.2 ) 1-4  2012  [Refereed]

    This paper describes the control of distance attenuation using a spherical loudspeaker array. Fisher et al. proposed radial filtering with a spherical microphone array to control the sensitivity to the distance from a sound source by modeling the propagation of waves in the spherical harmonic domain. Since transfer functions are not changed by swapping their inputs and outputs, the same theory of radial filtering for microphone arrays can be applied to the filter design for distance attenuation control with loudspeaker arrays. Experimental results confirmed that the proposed method is effective at low frequencies.

  • The spatial real and virtual sound stimuli optimization for the auditory BCI

    Nozomu Nishikawa, Yoshihiro Matsumoto, Shoji Makino, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.13-BioSPS.2.6 ) 1-9  2012  [Refereed]

    The paper presents results from a project aiming to create horizontally distributed surround sound sources and virtual sound images as auditory BCI (aBCI) stimuli. The purpose is to create evoked brain wave response patterns that depend on attended or ignored sound directions. We propose to use a modified version of the vector based amplitude panning (VBAP) approach to achieve this goal. The resulting spatial sound stimulus system for the novel oddball aBCI paradigm allows us to create a multi-command experimental environment, with very encouraging results reported in this paper. We also present results showing that modulating the sound image depth also changes the subject responses. Finally, we compare the proposed virtual sound approach with the traditional one based on real sound sources generated from real loudspeaker directions. The results confirm the hypothesis that brain responses to spatial types and depths of sound sources can be modulated independently, which allows for the development of the novel multi-command aBCI. © 2012 APSIPA.
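
The paper's modified VBAP is not described in this summary, so the sketch below shows only the standard two-loudspeaker (2-D) formulation as an assumption: the source direction is written as a weighted sum of the two loudspeaker unit vectors, and the weights are scaled to unit power. The function name and the loudspeaker angles are illustrative.

```python
import math


def vbap_2d_gains(source_deg, speaker1_deg, speaker2_deg):
    """Standard 2-D amplitude-panning gains for one loudspeaker pair.

    Solves p = g1 * l1 + g2 * l2 for the unit vectors of the source (p) and
    loudspeakers (l1, l2), then normalizes so that g1**2 + g2**2 == 1.
    """
    def unit(angle_deg):
        angle = math.radians(angle_deg)
        return math.cos(angle), math.sin(angle)

    px, py = unit(source_deg)
    l1x, l1y = unit(speaker1_deg)
    l2x, l2y = unit(speaker2_deg)
    det = l1x * l2y - l1y * l2x
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (px * l2y - py * l2x) / det  # Cramer's rule
    g2 = (l1x * py - l1y * px) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Hypothetical setup: loudspeakers at +45 and -45 degrees, image straight ahead.
center_gains = vbap_2d_gains(0.0, 45.0, -45.0)
```

A virtual image placed exactly at a loudspeaker direction receives all of its gain from that loudspeaker, which corresponds to the real-source condition the summary compares against.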

  • Psychophysical responses comparison in spatial visual, audiovisual, and auditory BCI-spelling paradigms

    Moonjeong Chang, Nozomu Nishikawa, Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

    6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012     2154 - 2157  2012  [Refereed]

    The paper presents a pilot study conducted with spatial visual, audiovisual and auditory brain-computer-interface (BCI) based speller paradigms. The psychophysical experiments are conducted with healthy subjects in order to evaluate task difficulty and possible variability in response accuracy. We also present preliminary EEG results in offline BCI mode. The obtained results support the thesis that the spatial auditory-only paradigm performs as well as the traditional visual and audiovisual speller BCI tasks. © 2012 IEEE.

    Scopus citations: 5
  • Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter

    Ryutaro Sakanashi, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.18-SLA.9.5 ) 1-6  2012  [Refereed]

    The multichannel Wiener filter proposed by Duong et al. can conduct underdetermined blind source separation (BSS) with low distortion. This method assumes that the observed signal is the superimposition of the multichannel source images generated from multivariate normal distributions. The covariance matrix in each time-frequency slot is estimated by an EM algorithm which treats the source images as the hidden variables. Using the estimated parameters, the source images are separated as the maximum a posteriori estimate. It is worth noting that this method does not assume the sparseness of sources, which is usually assumed in underdetermined BSS. In this paper we investigate the effectiveness of the three attributes of Duong's method, i.e., the source image model with multivariate normal distribution, the observation model without sparseness assumption, and the source separation by multichannel Wiener filter. We newly formulate three BSS methods with a similar source image model but a different observation model that assumes sparseness, and we compare them with Duong's method and the conventional binary masking. Experimental results confirmed the effectiveness of all three attributes of Duong's method.

  • New analytical calculation and estimation of TDOA for underdetermined BSS in noisy environments

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.12-SLA.6.4 ) 1-6  2012  [Refereed]

    We have proposed a new algorithm for sparseness-based underdetermined blind source separation (BSS) that can cope with diffused noise environments. This algorithm includes a technique for estimating the time-difference-of-arrival (TDOA) parameter separately in individual frequency bins for each source. In this paper, we propose methods that integrate the frequency-bin-wise TDOA parameter to estimate the TDOA of each source. The accuracy of TDOA estimation with the proposed approach is shown experimentally in comparison with a conventional approach. The separation performance and calculation time of the proposed approach are also examined.

  • Visualization of conversation flow in meetings by analysis of direction of arrivals and continuousness of utterance

    M. Katoh, Y. Sugimoto, S. Miyabe, S. Makino, T. Yamada, N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-5  2011.11  [Refereed]

  • New EEG components separation method: Data driven Huang-Hilbert transform application to auditory BMI paradigm

    T.M. Rutkowski, Q. Zhao, D.P. Mandic, Z. Cai, A. Cichocki, S. Makino, A.W. Przybyszewski

    Neuroscience 2011     627.15/AAA32  2011.11  [Refereed]

  • Underdetermined BSS in noisy environments with new analytical update rule for TDOA inference

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    Technical report of IEICE. EA   111 ( 306 ) 25 - 30  2011.11

    In this research, we propose a method to analytically update the estimate of the time difference of arrival (TDOA) in sparseness-based underdetermined blind source separation (BSS) with an EM algorithm. Izumi et al. proposed an underdetermined BSS method that can cope with diffuse noise environments. However, Izumi's method requires a discrete exhaustive search to update the TDOA parameter at every iteration and therefore incurs a high computational cost. In this paper, focusing on the stereo case, we obtain an analytical update of the TDOA parameters in each frequency bin using frequency-dependent TDOA modeling. This update rule eliminates the exhaustive TDOA search and therefore reduces the computational load. We show experimental results of separation performance and calculation time in comparison with those obtained with the conventional approach.

  • Performance estimation of noisy speech recognition based on short-term noise characteristics

    E. Morishita, T. Yamada, S. Makino, N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2011.11  [Refereed]

  • Performance estimation of noisy speech recognition considering the accuracy of acoustic models

    T. Takaoka, T. Yamada, S. Makino, N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2011.11  [Refereed]

  • A study on sound image control method for operational support of touch panel display

    Shigeyoshi Amano, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    Proc. APSIPA ASC 2011   ( Thu-PM.PS2 ) 1-1  2011.10  [Refereed]

  • Subjective and objective quality evaluation of noise-reduced speech

    Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    The Journal of the Acoustical Society of Japan   67 ( 10 ) 476 - 481  2011.10  [Refereed]

  • Towards a personalized technical ear training program: An investigation of the effect of adaptive feedback

    T. Kaniwa, S. Kim, H. Terasawa, M. Ikeda, T. Yamada, S. Makino

    Sound and Music Computing Conference     439-443  2011.07  [Refereed]

  • C. elegans meets data sonification: Can we hear its elegant movement?

    H. Terasawa, Y. Takahashi, K. Hirota, T. Hamano, T. Yamada, A. Fukamizu, S. Makino

    Sound and Music Computing Conference     77-82  2011.07  [Refereed]

  • DOA Estimation for Multiple Sparse Sources with Arbitrarily Arranged Multiple Sensors

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY   63 ( 3 ) 265 - 275  2011.06  [Refereed]

    This paper proposes a method for estimating the direction of arrival (DOA) of multiple source signals for an underdetermined situation, where the number of sources N exceeds the number of sensors M (M < N). Some DOA estimation methods have already been proposed for underdetermined cases. However, since most of them restrict their microphone array arrangements, their DOA estimation ability is limited to a 2-dimensional plane. To deal with an underdetermined case where sources are distributed arbitrarily, we propose a method that can employ a 2- or 3-dimensional sensor array. Our new method employs the source sparseness assumption to handle an underdetermined case. Our formulation with the sensor coordinate vectors allows us to employ arbitrarily arranged sensors easily. We obtained promising experimental results for 2-dimensionally distributed sensors and sources 3x4, 3x5 (#sensors x #speech sources), and for a 3-dimensional case with 4x5 in a room (reverberation time (RT) of 120 ms). We also investigate the DOA estimation performance under several reverberant conditions.

    DOI

    Scopus

    31
    Citation
    (Scopus)
  • B-11-19 A Study on Objective Quality Evaluation Method Applicable to Both Music and Speech

    Mikami,Yuichiro, Yamada,Takeshi, Makino,Shoji, Kitawaki,Nobuhiko

    Proceedings of the IEICE General Conference   2011 ( 2 ) 448  2011.02

    CiNii

  • B-11-18 An Improvement of Overall Quality Estimation Model for Objective Quality Evaluation of Noise-Reduced Speech

    Fujita,Yuki, Yamada,Takeshi, Makino,Shoji

    Proceedings of the IEICE General Conference   2011 ( 2 ) 447 - 447  2011.02

    CiNii

  • An MPEG-2 to H.264 Transcoding Preserving DCT Types and Motion Vectors to Suppress Re-Quantization Noise for Interlace Contents

    YOSHITOME,Takeshi, KAMIKURA,Kazuto, MAKINO,Shoji, KITAWAKI,Nobuhiko

    The IEICE transactions on information and systems (Japanese edition)   94 ( 2 ) 469 - 480  2011.02  [Refereed]

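    We propose a method that uses the first-stage encoding information to reduce the quantization noise introduced when an MPEG-2 stream of encoded interlaced video is transcoded to H.264. The method carries the MPEG-2 DCT types and motion compensation types over to H.264 as far as possible. In addition, from the combination of DCT type and motion compensation type, it identifies macroblock (MB) pairs that become inheritable if their frame vectors are converted to field vectors, and it raises the inheritance rate by performing that vector conversion. Experiments confirmed a PSNR improvement of 0.19 to 0.31 dB over a conventional method that does not use the encoding information.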

    CiNii

  • Blind source separation of mixed speech in a high reverberation environment

    Keiju Iso, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Hiroshi Sawada, Takeshi Yamada, Atsushi Nakamura

    2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, HSCMA'11     36 - 39  2011  [Refereed]

    Blind source separation (BSS) is a technique for estimating and separating individual source signals from a mixed signal using only information observed by each sensor. BSS is still being developed for mixed signals that are affected by reverberation. In this paper, we propose combining the BSS method that considers reverberation proposed by Duong et al. with the BSS method reported by Sawada et al., which does not consider reverberation, for the initial setting of the EM algorithm. This proposed method assumes the underdetermined case. In the experiment, we compare the proposed method with the conventional method reported by Duong et al. and that reported by Sawada et al., and demonstrate the effectiveness of the proposed method. © 2011 IEEE.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • Spatial location and sound timbre as informative cues in auditory BCI/BMI - Electrodes position optimization for brain evoked potential enhancement

    Zhenyu Cai, Hiroko Terasawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011   ( Wed-PM.SS4 ) 222 - 227  2011  [Refereed]

    The paper introduces a novel auditory BCI/BMI paradigm based on combined sound timbre and horizontal plane spatial locations as informative cues. The presented concept is based on responses to eight-directional audio stimuli with various tonal and environmental sound stimuli. The approach is based on monitoring brain electrical activity by means of the electroencephalogram (EEG). The spatial auditory stimulus previously developed by the authors is extended to sound stimuli that vary in timbre, a feature that helps the subjects attend to the targets. The main achievement discussed in the paper is an offline BCI analysis based on an optimization of electrode locations on the scalp and of evoked response latency to further improve classification results. The new BCI paradigm developed in this way is more user-friendly and leads to better results compared to previously utilized simple tonal or steady-state stimuli.

  • Restoration of Clipped Audio Signal Using Recursive Vector Projection

    Shin Miura, Hirofumi Nakajima, Shigeki Miyabe, Shoji Makino, Takeshi Yamada, Kazuhiro Nakadai

    2011 IEEE REGION 10 CONFERENCE TENCON 2011     394 - 397  2011  [Refereed]

    This paper proposes signal restoration from the clipping effect without prior knowledge. First, an interval of the signal including clipped samples is analyzed by recursive vector projection. By analyzing the neighboring samples of the clipped interval and excluding the clipped interval in the analysis of similarity, the signal in the clipped interval is estimated as a by-product of the analysis. Since the estimate is consistent with the neighboring samples, the restored signal does not suffer from click noise. Evaluation of the clipping restoration with various audio signals ascertained that the proposed method improves the signal-to-noise ratio.

  • Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

    Kazuma Takeda, Hirokazu Kameoka, Hiroshi Sawada, Shoko Araki, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2011 IEEE REGION 10 CONFERENCE TENCON 2011     413 - 416  2011  [Refereed]

    This paper presents a new method for underdetermined Blind Source Separation (BSS), based on a concept called multichannel complex non-negative matrix factorization (NMF). The method assumes (1) that the time-frequency representations of sources have disjoint support (W-disjoint orthogonality of sources), and (2) that each source is modeled as a superposition of components whose amplitudes vary over time coherently across all frequencies (amplitude coherence of frequency components) in order to jointly solve the indeterminacy involved in the frequency domain underdetermined BSS problem. We confirmed experimentally that the present method performed reasonably well in terms of the signal-to-interference ratio when the mixing process was known.

  • Mora pitch level recognition for the development of a Japanese pitch accent acquisition system

    Greg Short, Keikichi Hirose, Takeshi Yamada, Nobuaki Minematsu, Nobuhiko Kitawaki, Shoji Makino

    Proc. International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques, Oriental COCOSDA 2010     1-6  2010.10  [Refereed]

  • Subjective and Objective Quality Evaluation for Noise-Reduced Speech

    Yamada,Takeshi, Makino,Shoji, Kitawaki,Nobuhiko

    IPSJ SIG Notes   2010 ( 7 ) 1 - 6  2010.10

    To provide users with natural and intelligible speech in noisy environments, the use of a noise reduction algorithm, which reduces the noise component in the noisy input speech, can be effective. It is, however, well-known that any noise reduction algorithm unavoidably produces speech distortion and residual noise. Here, the critical issue is that the characteristics of these undesired byproducts vary according to the noise reduction algorithm used and the type of noise to be reduced. It is therefore essential to establish methods that can be used to evaluate the quality of noise-reduced speech. In this paper, we describe subjective and objective quality evaluation methods for noise-reduced speech.

    CiNii

  • A VC-1 to H.264/AVC intra transcoding using encoding information to reduce re-quantization noise

    T. Yoshitome, Y. Nakajima, K. Kamikura, S. Makino, N. Kitawaki

    International Conference on Signal and Image Processing     170-177  2010.08  [Refereed]

  • BS-5-4 Objective Estimation of MOS and Word Intelligibility for Noise-Reduced Speech

    Yamada,Takeshi, Kitawaki,Nobuhiko, Makino,Shoji

    Proceedings of the Society Conference of IEICE   2010 ( 2 ) - 19  2010.08

  • Scattered Speech Signal Detection by Principal Component Analysis for Spatial Power Spectrum

    KATOH,Michiaki, SUGIMOTO,Yuya, MAKINO,Shoji, YAMADA,Takeshi, KITAWAKI,Nobuhiko

    Technical report of IEICE. EA   110 ( 171 ) 25 - 30  2010.08

    To review meeting speech archives efficiently, it is important to detect "when, how and who talked" automatically and in advance. In this paper, we propose a method for automatically detecting a short and scattered signal, such as an expression of agreement, by using only acoustical information. The proposed method has two steps: 1) extract a spatial power spectrum frame-by-frame from the meeting speech archive recorded by a microphone array, and 2) detect the target signal by using an outlier detection algorithm based on principal component analysis. To evaluate the effectiveness of the proposed method, we conducted an experiment using a meeting speech archive recorded in a real room. The experimental results imply that we can distinguish a long utterance, a short utterance, and no utterance from only a few principal components.

    CiNii

  • Special Section on Blind Signal Processing and Its Applications

    Shoji Makino, Andrzej Cichocki, Wei Xing Zheng, Aurelio Uncini

    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS   57 ( 7 ) 1401 - 1403  2010.07  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Underdetermined Blind Source Separation Using Acoustic Arrays

    Shoji Makino, Shoko Araki, Stefan Winter, Hiroshi Sawada

    Handbook on Array Processing and Sensor Networks     303 - 341  2010.04  [Refereed]

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • B-11-1 A Study of Artificial Voices for Telephonometry in the IP-based Telecommunication Networks

    Aoshima,Chika, Kitawaki,Nobuhiko, Yamada,Takeshi, Makino,Shoji

    Proceedings of the IEICE General Conference   2010 ( 2 ) 435  2010.03

  • B-11-2 Full-reference Objective Quality Evaluation for Noise-reduced Speech Using Overall Quality Estimation Model

    Shinohara,Yuki, Yamada,Takeshi, Kitawaki,Nobuhiko, Makino,Shoji

    Proceedings of the IEICE General Conference   2010 ( 2 ) 436  2010.03

  • MPEG-2/H.264 transcoding with vector conversion reducing re-quantization noise

    Takeshi Yoshitome, Kazuto Kamikura, Shoji Makino, Nobuhiko Kitawaki

    Proceedings - International Conference on Computer Communications and Networks, ICCCN     1-6  2010  [Refereed]

    We propose an MPEG-2 to H.264 transcoding method for interlace streams intermingled with frame and field macroblocks. This method uses the encoding information from an MPEG-2 stream and keeps as many DCT coefficients of the original MPEG-2 bitstream as possible. Experimental results show that the proposed method improves PSNR by about 0.19-0.31 dB compared with a conventional method. © 2010 IEEE.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Performance Estimation of Noisy Speech Recognition Considering Recognition Task Complexity

    Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4     2042 - 2045  2010  [Refereed]

    To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using a spectral distortion measure. However, there is the problem that recognition task complexity affects the relationship between the recognition performance and the distortion value. To solve this problem, this paper proposes a novel performance estimation method considering the recognition task complexity. We confirmed that the proposed method gives accurate estimates of the recognition performance for various recognition tasks by an experiment using noisy speech data recorded in a real room.

  • Comparison of MOS evaluation characteristics for Chinese, Japanese, and English in IP telephony

    Zhenyu Cai, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings     112 - 115  2010  [Refereed]

    Communication quality in IP telephony is rated in terms of the Mean Opinion Score (MOS), which is an Absolute Category Rating (ACR) scale. There is a problem when comparing subjectively evaluated MOSs in that the evaluation results are strongly affected by differences in language, the instruction words used for the evaluation, and the nationality of the evaluator. To solve these problems, ITU-T SG12 has started to investigate the cultural and language dependencies of subjective quality evaluations undertaken with the MOS method for speech/video/multimedia. In this paper, we present the results of a comparison of the MOS evaluation characteristics for Chinese, Japanese, and English. ©2010 IEEE.

    DOI

    Scopus

    20
    Citation
    (Scopus)
  • A study of artificial voices for telephonometry in the IP-based telecommunication networks

    Chika Aoshima, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    Tunisian-Japan Symposium on Science, Society & Technology    2009.11  [Refereed]

  • Analysis of standardized speech database by considering long-term average spectrum

    Naoko Okubo, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2009.11  [Refereed]

  • DOA estimation for multiple sparse sources with arbitrarily arranged multiple sensors

    S. Araki, H. Sawada, R. Mukai, S. Makino

    Journal of Signal Processing Systems     1 - 11  2009.10  [Refereed]

    CiNii

  • Foreword to the special section on blind signal processing and its applications

    Shoji Makino

    IEICE Trans. Fundamentals   J92-A ( 5 ) 275 - 275  2009.05  [Refereed]

    CiNii

  • Stereo Source Separation and Source Counting with MAP Estimation with Dirichlet Prior Considering Spatial Aliasing Problem

    Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   5441   742 - 750  2009  [Refereed]

    In this paper, we propose a novel sparse source separation method that can estimate the number of sources and time-frequency masks simultaneously, even when the spatial aliasing problem exists. Recently, many sparse source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the phase difference of arrival (PDOA) between microphones with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori (MAP) estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. Moreover, to handle wide microphone spacing cases where the spatial aliasing problem occurs, the indeterminacy of modulus 2πk in the phase is also included in our model. Experimental results show good performance of our proposed method.

  • BLIND SPARSE SOURCE SEPARATION FOR UNKNOWN NUMBER OF SOURCES USING GAUSSIAN MIXTURE MODEL FITTING WITH DIRICHLET PRIOR

    Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS     33 - 36  2009  [Refereed]

    In this paper, we propose a novel sparse source separation method that can be applied even if the number of sources is unknown. Recently, many sparse source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the histogram of the estimated direction of arrival (DOA) with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. By using this prior, without any specific model selection process, our proposed method can estimate the number of sources and time-frequency masks simultaneously. Experimental results show the performance of our proposed method.

  • Handling speaker position changes in a meeting diarization system by combining DOA clustering and speaker identification

    T. Hager, S. Araki, K. Ishizuka, M. Fujimoto, T. Nakatani, S. Makino

    IWAENC2008     2-12  2008.09  [Refereed]

    CiNii

  • Foreword to the special section on acoustic scene analysis and reproduction

    S. Makino

    IEICE Trans. Fundamentals   E91-A ( 6 ) 1301-1302  2008.06  [Refereed]

  • Recent advances in audio source separation techniques

    H. Sawada, S. Araki, S. Makino

    Journal of IEICE   91 ( 4 ) 292 - 296  2008.04  [Refereed]

    CiNii

  • A DOA based speaker diarization system for real meetings

    Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS     30 - 33  2008  [Refereed]

    This paper presents a speaker diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system utilized the generalized cross correlation method with the phase transform (GCC-PHAT) approach for the DOA estimation. Because the GCC-PHAT can estimate just one DOA per frame, it was difficult to handle speaker overlaps. This paper tries to deal with this issue by employing a DOA at each time-frequency slot (TFDOA), and reports how it improves diarization performance for real meetings / conversations recorded in a room with a reverberation time of 350 ms.

  • Speaker indexing and speech enhancement in real meetings/conversations

    Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12     93 - 96  2008  [Refereed]

    This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. Our proposed speaker indexing is realized by using a noise robust voice activity detector (VAD), a GCC-PHAT based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real meetings/conversations recorded in a room with a reverberation time of 350 ms, and evaluates the performance by a standard measure: the diarization error rate (DER). Even for the real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP2008.

  • Missing feature speech recognition in a meeting situation with maximum SNR beamforming

    Dorothea Kolossa, Shoko Araki, Marc Delcroix, Tomohiro Nakatani, Reinhold Orglmeister, Shoji Makino

    PROCEEDINGS OF 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-10     3218 - +  2008  [Refereed]

    Especially for tasks like automatic meeting transcription, it would be useful to automatically recognize speech also while multiple speakers are talking simultaneously. For this purpose, speech separation can be performed, for example by using maximum SNR beamforming. However, even when good interferer suppression is attained, the interfering speech will still be recognizable during those intervals, where the target speaker is silent. In order to avoid the consequential insertion errors, a new soft masking scheme is proposed, which works in the time domain by inducing a large damping on those temporal periods, where the observed direction of arrival does not correspond to that of the target speaker. Even though the masking scheme is aggressive, by means of missing feature recognition the recognition accuracy can be improved significantly, with relative error reductions in the order of 60% compared to maximum SNR beamforming alone, and it is successful also for three simultaneously active speakers. Results are reported based on the SOLON speech recognizer, NTT's large vocabulary system [1], which is applied here for the recognition of artificially mixed data using real-room impulse responses and the entire clean test set of the Aurora 2 database.

  • Guest editors' introduction: Special section on emergent systems, algorithms, and architectures for speech-based human-machine interaction

    Rodrigo Capobianco Guido, Li Deng, Shoji Makino

    IEEE TRANSACTIONS ON COMPUTERS   56 ( 9 ) 1153 - 1155  2007.09  [Refereed]

  • Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    SIGNAL PROCESSING   87 ( 8 ) 1833 - 1847  2007.08  [Refereed]

    This paper presents a new method for blind sparse source separation. Some sparse source separation methods, which rely on source sparseness and an anechoic mixing model, have already been proposed. These methods utilize level ratios and phase differences between sensor observations as their features, and they separate signals by classifying them. However, some of the features cannot form clusters with a well-known clustering algorithm, e.g., the k-means. Moreover, most previous methods utilize a linear sensor array (or only two sensors), and therefore they cannot separate symmetrically positioned sources. To overcome such problems, we propose a new feature that can be clustered by the k-means algorithm and that can be easily applied to more than three sensors arranged non-linearly. We have obtained promising results for two- and three-dimensionally distributed speech separation with non-linear/non-uniform sensor arrays in a real room even in underdetermined situations. We also investigate the way in which the performance of such methods is affected by room reverberation, which may cause the sparseness and anechoic assumptions to collapse. (C) 2007 Elsevier B.V. All rights reserved.

    DOI CiNii

    Scopus

    228
    Citation
    (Scopus)
  • Introduction to the special section on blind signal processing for speech and audio applications

    Shoji Makino, Te-Won Lee, Guy J. Brown

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1509 - 1510  2007.07  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

    Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

    Eurasip Journal on Advances in Signal Processing   2007  2007  [Refereed]

    We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.

    DOI

    Scopus

    93
    Citation
    (Scopus)
  • Blind audio source separation based on independent component analysis

    Shoji Makino, Hiroshi Sawada, Shoko Araki

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   4666   843 - 843  2007  [Refereed]

  • Blind source separation based on a beamformer array and time frequency binary masking

    Jan Cermak, Shoko Araki, Hiroshi Sawada, Shoji Makino

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS     145 - 148  2007  [Refereed]

    This paper deals with a new technique for blind source separation (BSS) from convolutive mixtures. We present a three-stage separation system employing time-frequency binary masking, beamforming and a non-linear post processing technique. The experiments show that this system outperforms conventional time-frequency binary masking (TFBM) in both (over-)determined and underdetermined cases. Moreover it removes the musical noise and reduces interference in time-frequency slots extracted by TFBM.

  • MLSP 2007 data analysis competition: Frequency-domain blind source separation for convolutive mixtures of speech/audio signals

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP     45 - 50  2007  [Refereed]

    This paper describes the frequency-domain approach to the blind source separation of speech/audio signals that are convolutively mixed in a real room environment. With the application of short-time Fourier transforms, convolutive mixtures in the time domain can be approximated as multiple instantaneous mixtures in the frequency domain. We employ complex-valued independent component analysis (ICA) to separate the mixtures in each frequency bin. Then, the permutation ambiguity of the ICA solutions should be aligned so that the separated signals are constructed properly in the time domain. We propose a permutation alignment method based on clustering the activity sequences of the frequency bin-wise separated signals. We achieved the overall winner status of MLSP 2007 Data Analysis Competition based on the presented method. ©2007 IEEE.

    DOI

    Scopus

    16
    Citation
    (Scopus)
  • A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS     157 - 160  2007  [Refereed]

    This paper proposes a two-stage method for the blind separation of convolutively mixed sources. We employ time-frequency masking, which can be applied even to an underdetermined case where the number of sensors is insufficient for the number of sources. In the first stage of the method, frequency bin-wise mixtures are classified based on Gaussian mixture model fitting. In the second stage, the permutation ambiguities of the bin-wise classified signals are aligned by clustering the posterior probability sequences calculated in the first stage. Experimental results for separating four speeches with three microphones under reverberant conditions show the superiority of the proposed method over existing methods based on time-difference-of-arrival estimations or signal envelope clustering.

  • Measuring dependence of bin-wise separated signals for permutation alignment in frequency-domain BSS

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     3247 - 3250  2007  [Refereed]

    This paper presents a new method for grouping bin-wise separated signals for individual sources, i.e., solving the permutation problem, in the process of frequency-domain blind source separation. Conventionally, the correlation coefficient of separated signal envelopes is calculated to judge whether or not the separated signals originate from the same source. In this paper, we propose a new measure that represents the dominance of the separated signal in the mixtures, and use it for calculating the correlation coefficient, instead of a signal envelope. Such dominance measures exhibit dependence/independence more clearly than traditionally used signal envelopes. Consequently, a simple clustering algorithm with centroids works well for grouping separated signals. Experimental results were very appealing, as three sources including two coming from the same direction were separated properly with the new method.

  • Blind speech separation in a meeting situation with maximum SNR beamformers

    Shoko Araki, Hiroshi Sawada, Shoji Makino

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS     41 - 44  2007  [Refereed]

    We propose a speech separation method for a meeting situation, where each speaker sometimes speaks and the number of speakers changes every moment. Many source separation methods have already been proposed; however, they consider a case where all the speakers keep speaking, which is not always true in a real meeting. In such cases, in addition to separation, speech detection and the classification of the detected speech according to speaker become important issues. For that purpose, we propose a method that employs a maximum signal-to-noise (MaxSNR) beamformer combined with a voice activity detector and online clustering. We also discuss the scaling ambiguity problem as regards the MaxSNR beamformer, and provide solutions. We report some encouraging results for a real meeting in a room with a reverberation time of about 350 ms.

  • First stereo audio source separation evaluation campaign: Data, algorithms and results

    Emmanuel Vincent, Hiroshi Sawada, Pau Bofill, Shoji Makino, Justinian P. Rosca

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   4666   552 - +  2007  [Refereed]

    This article provides an overview of the first stereo audio source separation evaluation campaign, organized by the authors. Fifteen underdetermined stereo source separation algorithms have been applied to various audio data, including instantaneous, convolutive and real mixtures of speech or music sources. The data and the algorithms are presented and the estimated source signals are compared to reference signals using several objective performance criteria.

  • MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

    Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING   2007 ( 24717 ) 1 - 12  2007  [Refereed]

    We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions. Copyright (C) 2007 Stefan Winter et al.

    93 citations (Scopus)

  • Frequency domain blind source separation in a noisy environment

    R. Mukai, H. Sawada, S. Araki, S. Makino

    2006 Joint meeting of ASA and ASJ     1pSP1  2006.11  [Refereed]

  • Normalized observation vector clustering approach for sparse source separation

    S. Araki, H. Sawada, R. Mukai, S. Makino

    EUSIPCO2006     Wed.5.4.4  2006.09  [Refereed]

  • Underdetermined source separation by ICA and homomorphic signal processing

    S. Winter, W. Kellermann, H. Sawada, S. Makino

    IWAENC2006     Wed.Sep.8  2006.09  [Refereed]

  • Performance evaluation of sparse source separation and DOA estimation with observation vector clustering in reverberant environments

    S. Araki, H. Sawada, R. Mukai, S. Makino

    IWAENC2006     Tue.Sep.4  2006.09  [Refereed]

  • Blind sparse source separation with spatially smoothed time-frequency masking

    S. Araki, H. Sawada, R. Mukai, S. Makino

    IWAENC2006     Wed.Sep.9  2006.09  [Refereed]

  • Parametric-Pearson-based independent component analysis for frequency-domain blind speech separation

    H. Kato, Y. Nagahara, S. Araki, H. Sawada, S. Makino

    EUSIPCO2006     Tue.4.2.5  2006.09  [Refereed]

  • Blind speech separation by combining beamformers and a time frequency binary mask

    J. Cermak, S. Araki, H. Sawada, S. Makino

    IWAENC2006     Tue.Sep.5 - 148  2006.09  [Refereed]

  • Underdetermined source separation for colored sources

    S. Winter, W. Kellermann, H. Sawada, S. Makino

    EUSIPCO2006     Thu.3.1.6  2006.09  [Refereed]

  • Musical noise reduction in time-frequency-binary-masking-based blind source separation systems

    J. Cermak, S. Araki, H. Sawada, S. Makino

    Czech-German Workshop on Speech Processing    2006.09  [Refereed]

  • Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector

    S Emura, Y Haneda, A Kataoka, S Makino

    SIGNAL PROCESSING   86 ( 6 ) 1157 - 1167  2006.06  [Refereed]

    Stereo echo cancellation requires a fast converging adaptive algorithm because the stereo input signals are highly cross correlated and the convergence rate of the misalignment is slow even after preprocessing for unique identification of stereo echo paths. To speed up the convergence, we propose enhancing the contribution of the decorrelated components in the preprocessed input-signal vector to adaptive updates. The adaptive filter coefficients are updated on the basis of either a single or multiple past enhanced input-signal vectors.
    For a single-vector update, we show how this enhancement improves the convergence rate by analyzing the behavior of the filter coefficient error in the mean. For a two-past-vector update, simulation showed that the proposed enhancement leads to a faster decrease in misalignment than the corresponding conventional second-order affine projection algorithm while computational complexities are almost the same. (c) 2005 Elsevier B.V. All rights reserved.

    18 citations (Scopus)

  • Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     4935 - 4938  2006  [Refereed]

    This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.

  • DOA estimation for multiple sparse sources with normalized observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     33 - +  2006  [Refereed]

    This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied; however, they are only applicable when M > N. Another conventional independent component analysis based method allows M > N; however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).

  • Blind source separation of many signals in the frequency domain

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     5827 - 5830  2006  [Refereed]

    This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.

  • Frequency domain blind source separation of a reduced amount of data using frequency normalization

    Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     5695 - 5698  2006  [Refereed]

    The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.

  • Blind source separation of convolutive mixtures - art. no. 624709

    Shoji Makino

    Independent Component Analyses, Wavelets, Unsupervised Smart Sensors, and Neural Networks IV   6247 ( 7 ) 24709 - 24709  2006  [Refereed]

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency-domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct separation filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

    4 citations (Scopus)

  • Geometrical interpretation of the PCA subspace approach for overdetermined blind source separation

    S. Winter, H. Sawada, S. Makino

    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING   2006 ( 71632 ) 1-11  2006  [Refereed]

    We discuss approaches for blind source separation where we can use more sensors than sources to obtain a better performance. The discussion focuses mainly on reducing the dimensions of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second is based on geometric considerations and selects a subset of sensors in accordance with the fact that a low frequency prefers a wide spacing, and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in the way that it emphasizes the outer sensors and yields superior results for high frequencies. These results provide a better understanding of the former method.

    12 citations (Scopus)

  • Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     77 - +  2006  [Refereed]

    This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.

  • Frequency domain blind source separation of a reduced amount of data using frequency normalization

    Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     837 - +  2006  [Refereed]

    The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.

  • Underdetermined sparse source separation of convolutive mixtures with observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS     3594 - 3597  2006  [Refereed]

    We propose a new method for solving the underdetermined sparse signal separation problem. Some sparseness based methods have already been proposed. However, most of these methods utilized a linear sensor array (or only two sensors), and therefore they have certain limitations; e.g., they cannot separate symmetrically positioned sources. To allow the use of more than three sensors that can be arranged in a non-linear/non-uniform way, we propose a new method that includes the normalization and clustering of the observation vectors. Our proposed method can handle both underdetermined and (over-)determined cases. We show practical results for speech separation with non-linear/non-uniform sensor arrangements. We obtained promising experimental results for the cases of 3 x 4 and 4 x 5 (#sensors x #sources) in a room (RT60 = 120 ms).

  • DOA estimation for multiple sparse sources with normalized observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     4891 - 4894  2006  [Refereed]

    This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied; however, they are only applicable when M > N. Another conventional independent component analysis based method allows M > N; however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).

  • On calculating the inverse of separation matrix in frequency-domain blind source separation

    H Sawada, S Araki, R Mukai, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION, PROCEEDINGS   3889   691 - 699  2006  [Refereed]

    For blind source separation (BSS) of convolutive mixtures, the frequency-domain approach is efficient and practical, because the convolutive mixtures are modeled with instantaneous mixtures at each frequency bin and simple instantaneous independent component analysis (ICA) can be employed to separate the mixtures. However, the permutation and scaling ambiguities of ICA solutions need to be aligned to obtain proper time-domain separated signals. This paper discusses the idea that calculating the inverses of separation matrices obtained by ICA is very important as regards aligning these ambiguities. This paper also shows the relationship between the ICA-based method and the time-frequency masking method for BSS, which becomes clear by calculating the inverses.

  • Blind source separation of many signals in the frequency domain

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     969 - +  2006  [Refereed]

    This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.

  • Recognition of convolutive speech mixtures by missing feature techniques for ICA

    Dorothea Kolossa, Hiroshi Sawada, Ramon Fernandez Astudillo, Reinhold Orglmeister, Shoji Makino

    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5     1397 - +  2006  [Refereed]

    One challenging problem for robust speech recognition is the cocktail party effect, where multiple speaker signals are active simultaneously in an overlapping frequency range. In that case, independent component analysis (ICA) can separate the signals, even in reverberant environments. However, the feature distortions incurred prove detrimental for speech recognition. To reduce the consequent recognition errors, we describe the use of ICA for the additional estimation of uncertainty information. This information is subsequently used in missing feature speech recognition, which leads to far more correct and accurate recognition, even in reverberant situations at RT60 = 300 ms.

  • Blind separation and localization of speeches in a meeting situation

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5     1407 - +  2006  [Refereed]

    The technique of blind source separation (BSS) has been well studied. In this paper, we apply the BSS technique, particularly based on independent component analysis (ICA), to a meeting situation. The goal is to enhance the spoken utterances and to estimate the location of each speaker by means of multiple microphones. The technique may help us to take the minutes of a meeting.

  • Frequency-domain blind source separation of many speech signals using near-field and far-field models

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING   2006 ( 83683 ) 1 - 13  2006  [Refereed]

    We discuss the frequency-domain blind source separation (BSS) of convolutive mixtures when the number of source signals is large, and the potential source locations are omnidirectional. The most critical problem related to the frequency-domain BSS is the permutation problem, and geometric information is helpful as regards solving it. In this paper, we propose a method for obtaining proper geometric information with which to solve the permutation problem when the number of source signals is large and some of the signals come from the same or a similar direction. First, we describe a method for estimating the absolute DOA by using relative DOAs obtained by the solution provided by independent component analysis (ICA) and the far-field model. Next, we propose a method for estimating the spheres on which source signals exist by using ICA solution and the near-field model. We also address another problem with regard to frequency-domain BSS that arises from the circularity of discrete-frequency representation. We discuss the characteristics of the problem and present a solution for solving it. Experimental results using eight microphones in a room show that the proposed method can separate a mixture of six speech signals arriving from various directions, even when two of them come from the same direction. Copyright (C) 2006 Ryo Mukai et al.

    26 citations (Scopus)

  • Subband-based blind separation for convolutive mixtures of speech

    S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 12 ) 3593 - 3603  2005.12  [Refereed]

    We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.

    21 citations (Scopus)

  • Underdetermined blind separation for speech in real environments with F0 adaptive comb filtering

    F. Flego, S. Araki, H. Sawada, T. Nakatani, S. Makino

    IWAENC2005     93-96  2005.09  [Refereed]

  • Real-time blind source separation and DOA estimation using small 3-D microphone array

    R. Mukai, H. Sawada, S. Araki, S. Makino

    IWAENC2005     45-48  2005.09  [Refereed]

  • Real-time blind extraction of dominant target sources from many background interference sources

    H. Sawada, R. Mukai, S. Araki, S. Makino

    IWAENC2005     73-76  2005.09  [Refereed]

  • A novel blind source separation method with observation vector clustering

    S. Araki, H. Sawada, R. Mukai, S. Makino

    IWAENC2005     117-120  2005.09  [Refereed]

  • Blind source separation of convolutive mixtures of audio signals in frequency domain

    S. Makino

    Advances in Circuits and Systems   ( 5 )  2005.08  [Refereed]

  • Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation

    A Blin, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 7 ) 1693 - 1700  2005.07  [Refereed]

    This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberation time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that the proposed method reduces musical noise considerably compared with the classical approaches.

    19 citations (Scopus)

  • Blind source separation of convolutive mixtures of speech in frequency domain

    S Makino, H Sawada, R Mukai, S Araki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 7 ) 1640 - 1655  2005.07  [Refereed]  [Invited]

    This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results of 2 x 2, 3 x 3, 4 x 4, 6 x 8, and 2 x 2 (moving sources), (#sources x #microphones) in a room are promising.

    57 citations (Scopus)

  • Frequency-domain blind source separation without array geometry information

    H. Sawada, R. Mukai, S. Araki, S. Makino

    HSCMA2005     d13-d14  2005.03  [Refereed]

  • Blind source separation and DOA estimation using small 3-D microphone array

    R. Mukai, H. Sawada, S. Araki, S. Makino

    HSCMA2005 (Joint Workshop on Hands-Free Speech Communication and Microphone Arrays)     d9-d10  2005.03  [Refereed]

  • Source extraction from speech mixtures with null-directivity pattern based mask

    S. Araki, S. Makino, H. Sawada, R. Mukai

    HSCMA2005     d1-d2  2005.03  [Refereed]

  • Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    Proceedings - IEEE International Symposium on Circuits and Systems     5882 - 5885  2005  [Refereed]

    This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method. © 2005 IEEE.

    12 citations (Scopus)

  • Blind extraction of a dominant source signal from mixtures of many sources

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   III   III61 - III64  2005  [Refereed]

    This paper presents a method for enhancing a dominant target source that is close to sensors, and suppressing other interferences. The enhancement is performed blindly, i.e. without knowing the number of total sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage processing technique where a spatial filter is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. To obtain the spatial filter we employ independent component analysis and then select the component of the target source. Time-frequency masks in the second stage are obtained by calculating the angle between the basis vector corresponding to the target source and a sample vector. The experimental results for a simulated cocktail party situation were very encouraging. ©2005 IEEE.

    20 citations (Scopus)

  • Multiple source localization using independent component analysis

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    IEEE Antennas and Propagation Society, AP-S International Symposium (Digest)   4 ( P3 ) 81 - 84  2005  [Refereed]

    This paper presents a method for estimating location information about multiple sources. The proposed method uses independent component analysis (ICA) as a main statistical tool. The nearfield model as well as the farfield model can be assumed in this method. As an application of the method, we show experimental results for the direction-of-arrival (DOA) estimation of three sources that were positioned 3-dimensionally. © 2005 IEEE.

    20 citations (Scopus)

  • Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask

    S Araki, S Makino, H Sawada, R Mukai

    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   III   81 - 84  2005  [Refereed]

    Musical noise is a typical problem with blind source separation using a time-frequency mask. In this paper, we report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by the results of a listening test undertaken in a room with a reverberation time of RT60 = 130 ms.

  • Estimating the number of sources using independent component analysis

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    Acoustical Science and Technology   26 ( 5 ) 450 - 452  2005  [Refereed]

    A new approach for estimating the number of sources that employs independent component analysis (ICA) is discussed. Estimating the number of sources provides information for signal processing applications such as blind source separation (BSS) in the frequency domain. The new method can identify a noise component that includes reverberations by calculating the correlation of the envelopes. The results show that the characteristics of the proposed approach compare with the conventional eigenvalue-based method.

    15 citations (Scopus)

  • Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

    H Sawada, S Araki, R Mukai, S Makino

    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS   III   5882 - 5885  2005  [Refereed]

    This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method.

  • A spatio-temporal FastICA algorithm for separating convolutive mixtures

    SC Douglas, H Sawada, S Makino

    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   V   165 - 168  2005  [Refereed]

    This paper presents a spatio-temporal extension of the well-known fastICA algorithm of Hyvarinen and Oja that is applicable to both convolutive blind source separation and multichannel blind deconvolution tasks. Our time-domain algorithm combines multichannel spatio-temporal prewhitening via multi-stage least-squares linear prediction with a fixed-point iteration involving a new adaptive technique for imposing paraunitary constraints on the multichannel separation filter. Our technique also allows for efficient reconstruction of individual signals as observed in the sensor measurements for single-input, multiple-output (SIMO) BSS tasks. Analysis and simulations verify the utility of the proposed methods.

  • Blind Source Separation of 3-D located many speech signals

    R Mukai, H Sawada, S Araki, S Makino

    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)     9 - 12  2005  [Refereed]

    This paper presents a prototype system for Blind Source Separation (BSS) of many speech signals and describes the techniques used in the system. Our system uses 8 microphones located at the vertexes of a 4 cm x 4 cm x 4 cm cube and has the ability to separate signals distributed in three-dimensional space. The mixed signals observed by the microphone array are processed by Independent Component Analysis (ICA) in the frequency domain and separated into a given number of signals (up to 8). We carried out experiments in an ordinary office and obtained more than 20 dB of SIR improvement.

  • On real and complex valued l1-norm minimization for overcomplete blind source separation

    S Winter, H Sawada, S Makino

    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)     86 - 89  2005  [Refereed]

    A maximum a-posteriori approach for overcomplete blind source separation based on Laplacian priors usually involves l1-norm minimization. It requires different approaches for real and complex numbers as they appear, for example, in the frequency domain. In this paper we compare a combinatorial approach for real numbers with a second order cone programming approach for complex numbers.
    Although the combinatorial solution with a proven minimum number of zeros is not theoretically justified for complex numbers, its performance quality is comparable to the performance of the second order cone programming (SOCP) solution. However, it has the advantage that it is faster for complex overcomplete BSS problems with low input/output dimensions.

  • Hierarchical clustering applied to overcomplete BSS for convolutive mixtures

    S. Winter, H. Sawada, S. Araki, S. Makino

    SAPA2004 (ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing)   1 ( 3 ) 1-6  2004.10  [Refereed]

  • Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

    S. Araki, S. Makino, H. Sawada, R. Mukai

    EUSIPCO2004     1991-1994  2004.09  [Refereed]

  • Blind source separation for moving speech signals using blockwise ICA and residual crosstalk subtraction

    R Mukai, H Sawada, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 8 ) 1941 - 1948  2004.08  [Refereed]

    This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.

  • Convolutive blind source separation for more than two sources in the frequency domain

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    Acoustical Science and Technology   25 ( 4 ) 296 - 298  2004.07  [Refereed]

    The use of a blind source separation (BSS) technique for the recovery of more than two sources in the frequency domain was presented. It was found that the frequency-domain BSS method was practically applicable to more than two sources by overcoming the problems of permutation and circularity. The error could be minimized by adjusting the scaling ambiguity of the independent component analysis (ICA) solution before windowing. The results show the effectiveness and efficiency of the BSS method for separating six sources with a planar array of eight sensors.

    Citations (Scopus): 4
  • Underdetermined blind source separation for convolutive mixtures exploiting a sparseness-mixing matrix estimation (SMME)

    A. Blin, S. Araki, S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3139-3142  2004.04  [Refereed]

  • A causal frequency-domain implementation of a natural gradient multichannel blind deconvolution and source separation algorithm

    S. Douglas, H. Sawada, S. Makino

    ICA2004 (International Congress on Acoustics)   I   85-88  2004.04  [Refereed]

  • Solving the permutation and circularity problems of frequency-domain blind source separation

    H. Sawada, R. Mukai, S. Araki, S. Makino

    ICA2004 (International Congress on Acoustics)   I   89-92  2004.04  [Refereed]

  • Algorithmic complexity based blind source separation for convolutive speech mixtures

    S. de la Kethulle, R. Mukai, H. Sawada, S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3127-3130  2004.04  [Refereed]

  • A solution for the permutation problem in frequency domain BSS using near- and far-field models

    R. Mukai, H. Sawada, S. Araki, S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3135-3138  2004.04  [Refereed]

  • Underdetermined blind separation of convolutive mixtures of speech by combining time-frequency masks and ICA

    S. Araki, S. Makino, A. Blin, R. Mukai, H. Sawada

    ICA2004 (International Congress on Acoustics)   I   321-324  2004.04  [Refereed]

  • Evaluation of separation and dereverberation performance in frequency domain blind source separation

    Ryo Mukai, Shoko Araki, Hiroshi Sawada, Shoji Makino

    Acoustical Science and Technology   25 ( 2 ) 119 - 126  2004.03  [Refereed]

    In this paper, we propose a new method for evaluating the separation and dereverberation performance of a convolutive blind source separation (BSS) system, and investigate a separating system obtained by employing frequency domain BSS based on independent component analysis (ICA). As a result, we reveal the acoustical characteristics of the frequency domain BSS for convolutive mixture of speech signals. We show that the separating system removes the direct sound of a jammer signal even when the frame length is relatively short, and it also reduces the reverberation of the jammer according to the frame length. We also confirm that the reverberation of the target is not reduced. Moreover, we propose a technique, suggested by the experimental results, for improving the quality of the separated signals by removing pre-echo noise.

    Citations (Scopus): 9
  • Underdetermined blind separation for speech in real environments with sparseness and ICA

    S Araki, S Makino, A Blin, R Mukai, H Sawada

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   III   881 - 884  2004  [Refereed]

    In this paper, we propose a method for separating speech signals when there are more signals than sensors. Several methods have already been proposed for solving the underdetermined problem, and some of these utilize the sparseness of speech signals. These methods employ binary masks to extract the signals, and therefore, their extracted signals contain loud musical noise. To overcome this problem, we propose combining a sparseness approach and independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion even in reverberant conditions of T_R = 130 and 200 ms.

  • Frequency domain blind source separation using small and large spacing sensor pairs

    R Mukai, H Sawada, S Araki, S Makino

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS   V   1 - 4  2004  [Refereed]

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometrical information for solving the permutation problem. Experimental results show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.

  • Convolutive blind source separation for more than two sources in the frequency domain

    H Sawada, R Mukai, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   III   885 - 888  2004  [Refereed]

    Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.

  • Audio source separation based on independent component analysis

    S Makino, S Araki, R Mukai, H Sawada

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS   V   668 - 671  2004  [Refereed]

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct unmixing filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

  • Near-field frequency domain blind source separation for convolutive mixtures

    R Mukai, H Sawada, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS   IV   49 - 52  2004  [Refereed]

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when source signals come from the same or similar directions. Geometric information such as the direction of arrival (DOA) is helpful for solving the permutation problem, and a combination of the DOA based and correlation based methods provides a robust and precise solution. However, when signals come from similar directions, the DOA based approach fails, and we have to use only the correlation based method, whose performance is unstable. In this paper, we show that an interpretation of the ICA solution by a near-field model yields information about spheres on which source signals exist, which can be used as an alternative to the DOA. Experimental results show that the proposed method can robustly separate a mixture of signals arriving from the same direction.

  • On coefficient delay in natural gradient blind deconvolution and source separation algorithms

    SC Douglas, H Sawada, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   634 - 642  2004  [Refereed]

    In this paper, we study the performance effects caused by coefficient delays in natural gradient blind deconvolution and source separation algorithms. We present a statistical analysis of the effect of coefficient delays within such algorithms, quantifying the relative loss in performance caused by such coefficient delays with respect to delayless algorithm updates. We then propose a simple change to one such algorithm to improve its convergence performance.

  • Overcomplete BSS for convolutive mixtures based on hierarchical clustering

    S Winter, H Sawada, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   652 - 660  2004  [Refereed]

    In this paper we address the problem of overcomplete BSS for convolutive mixtures following a two-step approach. In the first step the mixing matrix is estimated, which is then used to separate the signals in the second step. For estimating the mixing matrix we propose an algorithm based on hierarchical clustering, assuming that the source signals are sufficiently sparse. It has the advantage of working directly on the complex valued sample data in the frequency-domain. It also shows better convergence than algorithms based on self-organizing maps. The results are improved by reducing the variance of direction of arrival. Experiments show accurate estimations of the mixing matrix and very low musical tone noise.

  • Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

    SC Douglas, H Sawada, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   477 - 480  2004  [Refereed]

    Practical gradient-based adaptive algorithms for multichannel blind deconvolution and convolutive blind source separation typically employ FIR filters for the separation system. Inadequate use of signal truncation within these algorithms can introduce steady-state biases into their converged solutions that lead to degraded separation and deconvolution performances. In this paper, we derive a natural gradient multichannel blind deconvolution and source separation algorithm that mitigates these effects for estimating causal FIR solutions to these tasks. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for acoustic sources, even for extremely-short separation filters.

  • Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA

    S Araki, S Makino, H Sawada, R Mukai

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   898 - 905  2004  [Refereed]

    We propose a method for separating N speech signals with M sensors where N > M. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose using a directivity pattern based continuous mask, which masks N - M sources in the observations, and independent component analysis (ICA) to separate the remaining mixtures. We conducted experiments for N = 3 with M = 2 and N = 4 with M = 2, and obtained separated signals with little distortion.

  • A sparseness - Mixing Matrix Estimation (SMME) solving the underdetermined BSS for convolutive mixtures

    A Blin, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS   IV   85 - 88  2004  [Refereed]

    We propose a method for blindly separating real environment speech signals with as little distortion as possible in the special case where speech signals outnumber sensors. Our idea consists in combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to perform a preliminary separation and to detect when only one source is active. This information is then used to estimate the mixing matrix. Then we remove one source from the observations and separate the residual signals with the inverse of the estimated mixing matrix. Experimental results in a real environment (T_R = 130 ms and 200 ms) show that our proposed method, which we call Sparseness Mixing Matrix Estimation (SMME), provides separated signals of better quality than those extracted by only using the sparseness property of the speech signal.

  • Frequency domain blind source separation for many speech signals

    R Mukai, H Sawada, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   461 - 469  2004  [Refereed]

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometric information for solving the permutation problem. Experimental results in a room (reverberation time T_R = 130 ms) with eight microphones show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.

  • Estimating the number of sources for frequency-domain blind source separation

    H Sawada, S Winter, R Mukai, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   610 - 617  2004  [Refereed]

    Blind source separation (BSS) for convolutive mixtures can be performed efficiently in the frequency domain, where independent component analysis (ICA) is applied separately in each frequency bin. To solve the permutation problem of frequency-domain BSS robustly, information regarding the number of sources is very important. This paper presents a method for estimating the number of sources from convolutive mixtures of sources. The new method estimates the power of each source or noise component by using ICA and a scaling technique to distinguish sources and noises. Also, a reverberant component can be identified by calculating the correlation of component envelopes. Experimental results for up to three sources show that the proposed method worked well in a reverberant condition whose reverberation time was 200 ms.

  • Underdetermined blind separation of convolutive mixtures of speech with binary masks and ICA

    S. Araki, S. Makino, H. Sawada, A. Blin, R. Mukai

    NIPS2003 Workshop on ICA: Sparse Representations in Signal Processing   2 ( 7 ) 1-4  2003.12  [Refereed]

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming for convolutive mixtures

    S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, H. Saruwatari

    EURASIP Journal on Applied Signal Processing   2003 ( 11 ) 1157-1166  2003.11  [Refereed]

    CiNii

  • Blind source separation when speech signals outnumber sensors using a sparseness-mixing matrix estimation (SMME)

    A. Blin, S. Araki, S. Makino

    IWAENC2003     211-214  2003.09  [Refereed]

    CiNii

  • Blind separation of more speech than sensors with less distortion by combining sparseness and ICA

    S. Araki, S. Makino, A. Blin, R. Mukai, H. Sawada

    IWAENC2003     271-274  2003.09  [Refereed]

    CiNii

  • Spectral smoothing for frequency-domain blind source separation

    H. Sawada, R. Mukai, S. de la Kethulle, S. Araki, S. Makino

    IWAENC2003     311-314  2003.09  [Refereed]

    CiNii

  • Blind source separation for convolutive mixtures based on complexity minimization

    S. de la Kethulle, R. Mukai, H. Sawada, S. Makino

    IWAENC2003     303-306  2003.09  [Refereed]

  • Array geometry arrangement for frequency domain blind source separation

    R. Mukai, H. Sawada, S. de la Kethulle, S. Araki, S. Makino

    IWAENC2003     219-222  2003.09  [Refereed]

    CiNii

  • Multistage ICA for blind source separation of real acoustic convolutive mixture

    T. Nishikawa, H. Saruwatari, K. Shikano, S. Araki, S. Makino

    ICA2003     523-528  2003.04  [Refereed]

    CiNii

  • Subband based blind source separation with appropriate processing for each frequency band

    S. Araki, S. Makino, R. Aichner, T. Nishikawa, H. Saruwatari

    ICA2003     499-504  2003.04  [Refereed]

    CiNii

  • Geometrical interpretation of the PCA subspace method for overdetermined blind source separation

    S. Winter, H. Sawada, S. Makino

    ICA2003     775-780  2003.04  [Refereed]

    CiNii

  • Real-time blind source separation for moving speakers using blockwise ICA and residual crosstalk subtraction

    R. Mukai, H. Sawada, S. Araki, S. Makino

    ICA2003     975-980  2003.04  [Refereed]

    CiNii

  • On-line time-domain blind source separation of nonstationary convolved signals

    R. Aichner, H. Buchner, S. Araki, S. Makino

    ICA2003     987-992  2003.04  [Refereed]

  • A robust and precise method for solving the permutation problem of frequency-domain blind source separation

    H. Sawada, R. Mukai, S. Araki, S. Makino

    ICA2003     505-510  2003.04  [Refereed]

    CiNii

  • Geometrically constrained ICA for robust separation of sound mixtures

    M. Knaak, S. Araki, S. Makino

    ICA2003     951-956  2003.04  [Refereed]

  • Polar coordinate based nonlinear function for frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 3 ) 590 - 596  2003.03  [Refereed]

    This paper discusses a nonlinear function for independent component analysis to process complex-valued signals in frequency-domain blind source separation. Conventionally, nonlinear functions based on the Cartesian coordinates are widely used. However, such functions have a convergence problem. In this paper, we propose a more appropriate nonlinear function that is based on the polar coordinates of a complex number. In addition, we show that the difference between the two types of functions arises from the assumed densities of independent components. Our discussion is supported by several experimental results for separating speech signals, which show that the polar type nonlinear functions behave better than the Cartesian type.

  • A robust approach to the permutation problem of frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   381 - 384  2003  [Refereed]

    This paper presents a robust and precise method for solving the permutation problem of frequency-domain blind source separation. It is based on two previous approaches: the direction of arrival estimation approach and the inter-frequency correlation approach. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit the advantages of both. We also present a closed-form formula to calculate a null direction, which is used in estimating the directions of source signals. Experimental results show that our method solved permutation problems almost perfectly in a situation where two sources were mixed in a room whose reverberation time was 300 ms.

  • Robust real-time blind source separation for moving speakers in a room

    R Mukai, H Sawada, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   469 - 472  2003  [Refereed]

    This paper describes a robust real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.

  • Geometrically constrained ICA for convolutive mixtures of sound

    M Knaak, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS   II   725 - 728  2003  [Refereed]

    The goal of this contribution is a new algorithm using independent component analysis with a geometrical constraint. The new algorithm solves the permutation problem of blind source separation of acoustic mixtures, and it is significantly less sensitive to the precision of the geometrical constraint than an adaptive beamformer. A high degree of robustness is very important since the steering vector is always roughly estimated in a reverberant environment, even when the look direction is precise. The new algorithm is based on FastICA and constrained optimization. It is theoretically and experimentally analyzed with respect to the roughness of the steering vector estimation by using impulse responses of a real room. The effectiveness of the algorithm for real-world mixtures is also shown in the case of three sources and three microphones.

  • Direction of arrival estimation for multiple source signals using independent component analysis

    H Sawada, R Mukai, S Makino

    SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS     411 - 414  2003  [Refereed]

    This paper presents a new method for estimating the directions of source signals. We assume a situation in which multiple source signals are mixed in a reverberant condition and observed at several sensors. The new method is based on independent component analysis, which separates mixed signals into original source signals. It can be applied where the number of sources is equal to the number of sensors, whereas the conventional methods based on sub-space analysis, such as the MUSIC algorithm, are applicable where there are fewer sources than sensors. Even in cases where the MUSIC algorithm can be applied, the new method is better at estimating the directions of sources if they are closely placed.

  • Subband based blind source separation for convolutive mixtures of speech

    S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   509 - 512  2003  [Refereed]

    Subband processing is applied to blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In our proposed subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can handle long reverberation. Subband BSS achieves better performance than frequency-domain BSS. Moreover, we propose efficient separation procedures that take into consideration the frequency characteristics of room reverberation and speech signals. We achieve this (3) by using longer unmixing filters in low frequency bands, and (4) by adopting overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized in the proposed subband BSS.

  • Geometrical understanding of the PCA subspace method for overdetermined blind source separation

    S Winter, H Sawada, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS     769 - 772  2003  [Refereed]

    In this paper, we discuss approaches for blind source separation where we can use more sensors than sources for better performance. The discussion focuses mainly on reducing the dimension of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second involves selecting a subset of sensors based on the fact that a low frequency prefers a wide spacing and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in that it emphasizes the outer sensors, and that it yields superior results for high frequencies, which provides a better understanding of the former method.

  • Natural gradient blind deconvolution and equalization using causal FIR filters

    SC Douglas, HO Sawada, S Makino

    CONFERENCE RECORD OF THE THIRTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2   2 ( 3 ) 197 - 201  2003  [Refereed]

    Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as blind deconvolution and equalization. Practical implementations of such methods require truncation of the filter impulse responses within the gradient updates. In this paper, we show how truncation of these filter impulse responses can create convergence problems and introduce a bias into the steady-state solution of one such algorithm. We then show how this algorithm can be modified to effectively mitigate these effects for estimating causal FIR approximations to doubly-infinite IIR equalizers. Simulations indicate that the modified algorithm provides the convergence benefits of the natural gradient while still attaining good steady-state performance.

  • ICA-based blind source separation of sounds

    S. Makino, S. Araki, R. Mukai, H. Sawada, H. Saruwatari

    JCA2002 (China-Japan Joint Conference on Acoustics)     83 - 86  2002.11  [Refereed]

  • Digital technologies for controlling room acoustics

    M. Miyoshi, S. Makino

    JCA2002 (China-Japan Joint Conference on Acoustics)     19-24  2002.11  [Refereed]

  • Blind source separation for convolutive mixtures of speech using subband processing

    S. Araki, S. Makino, R. Aichner, T. Nishikawa, H. Saruwatari

    SMMSP2002 (International Workshop on Spectral Methods and Multirate Signal Processing)     195-202  2002.09  [Refereed]

  • Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming

    R Aichner, S Araki, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     445 - 454  2002  [Refereed]

    We propose a time-domain BSS algorithm that utilizes geometric information such as sensor positions and assumed locations of sources. The algorithm tackles the problem of convolved mixtures by explicitly exploiting the non-stationarity of the acoustic sources. The learning rule is based on second-order statistics and is derived by natural gradient minimization. The proposed initialization of the algorithm is based on the null beamforming principle. This method leads to improved separation performance, and the algorithm is able to estimate long unmixing FIR filters in the time domain due to the geometric initialization. We also propose a post-filtering method for dewhitening which is based on the scaling technique in frequency-domain BSS. The validity of the proposed method is shown by computer simulations. Our experimental results confirm that the algorithm is capable of separating real-world speech mixtures and can be applied to short learning data sets down to a few seconds. Our results also confirm that the proposed dewhitening post-filtering method maintains the spectral content of the original speech in the separated output.

  • Enhanced frequency-domain adaptive algorithm for stereo echo cancellation

    S Emura, Y Haneda, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1901 - 1904  2002  [Refereed]

    Highly cross-correlated input signals create the problem of slow convergence of misalignment in stereo echo cancellation even after undergoing non-linear preprocessing. We propose a new frequency-domain adaptive algorithm that improves the convergence rate by increasing the contribution of non-linearity in the adjustment vector. Computer simulation showed that it is effective when the non-linearity gain is small.

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming

    S Araki, Y Hinamoto, S Makino, T Nishikawa, R Mukai, H Saruwatari

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1785 - 1788  2002  [Refereed]

    Frequency domain Blind Source Separation (BSS) is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., Adaptive Beamformers (ABFs). The minimization of the off-diagonal components in the BSS update equation can be viewed as the minimization of the mean square error in the ABF. The unmixing matrix of the BSS and the filter coefficients of the ABF converge to the same solution in the mean square error sense if the two source signals are ideally independent. Therefore, the performance of the BSS is limited by that of the ABF. This understanding gives an interpretation of BSS from a physical point of view.

  • Polar coordinate based nonlinear function for frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   I   1001 - 1004  2002  [Refereed]

    This paper presents a new type of nonlinear function for independent component analysis to process complex-valued signals, which is used in frequency-domain blind source separation. The new function is based on the polar coordinates of a complex number, whereas the conventional one is based on the Cartesian coordinates. The new function is derived from the probability density function of frequency-domain signals that are assumed to be independent of the phase. We show that the difference between the two types of functions is in the assumed densities of independent components. Experimental results for separating speech signals show that the new nonlinear function behaves better than the conventional one.

  • Removal of residual cross-talk components in blind source separation using time-delayed spectral subtraction

    R Mukai, S Araki, H Sawada, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1789 - 1792  2002  [Refereed]

    This paper describes a post-processing method to refine output signals obtained by Blind Source Separation (BSS). The performance of BSS using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the cross-talk components derived from the reverberation of the jammer signal. Utilizing this knowledge, we propose a new method, time-delayed non-stationary spectral subtraction, which removes the residual components from the separated signals precisely. The proposed method compensates for the weakness of BSS in a reverberant environment. Experimental results using speech signals show that the proposed method improves the signal-to-noise ratio by 3 to 5 dB.

  • Removal of residual crosstalk components in blind source separation using LMS filters

    R Mukai, S Araki, H Sawada, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     435 - 444  2002  [Refereed]

    The performance of Blind Source Separation (BSS) using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the residual crosstalk components derived from the reverberation of the jammer signal. This paper describes a post-processing method designed to refine output signals obtained by BSS.
    We propose a new method which uses LMS filters in the frequency domain to estimate the residual crosstalk components in separated signals. The estimated components are removed by non-stationary spectral subtraction. The proposed method removes the residual components precisely and thus compensates for the weakness of BSS in a reverberant environment.
    Experimental results using speech signals show that the proposed method improves the signal-to-interference ratio by 3 to 5 dB.

  • Blind source separation with different sensor spacing and filter length for each frequency range

    H Sawada, S Araki, R Mukai, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     465 - 474  2002  [Refereed]

    This paper presents a method for blind source separation using several separating subsystems whose sensor spacing and filter length can be configured individually. Each subsystem is responsible for source separation of an allocated frequency range. With this mechanism, we can use appropriate sensor spacing as well as filter length for each frequency range. We obtained better separation performance than with the conventional method by using a wide sensor spacing and a long filter for a low frequency range, and a narrow sensor spacing and a short filter for a high frequency range.

  • Separation and dereverberation performance of frequency domain blind source separation

    R. Mukai, S. Araki, S. Makino

    ICA2001     230-235  2001.12  [Refereed]

  • A polar-coordinate based activation function for frequency domain blind source separation

    H. Sawada, R. Mukai, S. Araki, S. Makino

    ICA2001     663-668  2001.12  [Refereed]

  • Blind source separation in a real room

    S. Makino, S. Araki, R. Mukai, S. Katagiri

    Journal of IEICE Japan   84 ( 11 ) 848 - 848  2001.11  [Refereed]

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

    S. Araki, S. Makino, R. Mukai, H. Saruwatari

    CRAC (A workshop on Consistent and Reliable acoustic cues for sound analysis)   2 ( 4 ) 1-4  2001.09  [Refereed]

  • ICASSP2001 conference report

    S. Makino, S. Araki

    Journal of Japanese Society for Artificial Intelligence   16 ( 5 ) 736-737  2001.09  [Refereed]

  • Adaptive filtering algorithm enhancing decorrelated additive signals for stereo echo cancellation

    S. Emura, Y. Haneda, S. Makino

    IWAENC2001     67-70  2001.09  [Refereed]

  • Separation and dereverberation performance of frequency domain blind source separation in a reverberant environment

    R. Mukai, S. Araki, S. Makino

    IWAENC2001     127-130  2001.09  [Refereed]

  • Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers

    S. Araki, S. Makino, R. Mukai, H. Saruwatari

    Eurospeech2001     2595-2598  2001.09  [Refereed]

  • Separation and dereverberation performance of frequency domain blind source separation for speech in a reverberant environment

    R. Mukai, S. Araki, S. Makino

    Eurospeech2001 (European Conference on Speech Communication and Technology)     2599-2602  2001.09  [Refereed]

  • A design of a hands-free communication unit using loudspeakers and microphones with a flat directional pattern

    A. Nakagawa, S. Shimauchi, Y. Haneda, S. Aoki, S. Makino

    J. Acoust. Soc. Jpn   57 ( 8 ) 509 - 516  2001.08  [Refereed]

  • Fundamental limitation of frequency domain Blind Source Separation for convolutive mixture of speech

    S Araki, S Makino, T Nishikawa, H Saruwatari

    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS     2737 - 2740  2001  [Refereed]

    Despite several recent proposals to achieve Blind Source Separation (BSS) for realistic acoustic signals, separation performance is still insufficient. In particular, when the impulse response is long, performance is highly limited. In this paper, we show that it is useless to be constrained by the condition P << T, where T is the frame size of the FFT and P is the length of the room impulse response. In our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation of T_R = 150 and 300 ms. We also clarify the reason for the poor performance of BSS in a long reverberant environment, finding that separation is achieved chiefly for the sound from the direction of the jammer because BSS cannot calculate the inverse of the room transfer function both for the target and jammer signals.

  • Stereophonic acoustic echo cancellation: An overview and recent solutions

    S. Makino

    Acoustical Science and Technology   22 ( 5 ) 325 - 333  2001  [Refereed]

    The fundamental problems of stereophonic acoustic echo cancellation were discussed and recent solutions were reviewed. Stereo echo cancellation was achieved by linearly combining two monaural echo cancellers. A duo-filter control system including a continually running adaptive filter and a fixed filter was used for double-talk control. A second-order stereo projection algorithm was used in the adaptive filter and a stereo voice switch was also implemented.

    Citations (Scopus): 11

  • Subjective assessment of the desired echo return loss for subband acoustic echo cancellers

    S Sakauchi, Y Haneda, S Makino, M Tanaka, Y Kaneda

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E83A ( 12 ) 2633 - 2639  2000.12  [Refereed]

    We investigated the dependence of the desired echo return loss on frequency for various hands-free telecommunication conditions by subjective assessment. The desired echo return loss as a function of frequency (DERLf) is an important factor in the design and performance evaluation of a subband echo canceller, and it is a measure of what is considered an acceptable echo caused by electrical loss in the transmission line. The DERLf during single-talk was obtained as attenuated band-limited echo levels that subjects did not find objectionable when listening to the near-end speech and its band-limited echo under various hands-free telecommunication conditions. When we investigated the DERLf during double-talk, subjects also heard the speech in the far-end room from a loudspeaker. The echo was limited to a 250-Hz bandwidth assuming the use of a subband echo canceller. The test results showed that: (1) when the transmission delay was short (30 ms), the echo component around 2 to 3 kHz was the most objectionable to listeners; (2) as the transmission delay rose to 300 ms, the echo component around 1 kHz became the most objectionable; (3) when the room reverberation time was relatively long (about 500 ms), the echo component around 1 kHz was the most objectionable even if the transmission delay was short; and (4) the DERLf during double-talk was about 5 to 10 dB lower than that during single-talk. Use of these DERLf values will enable the design of more efficient subband echo cancellers.

  • A study of microphone system for hands-free teleconferencing units

    Akira Nakagawa, Suehiro Shimauchi, Shoji Makino

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   21 ( 1 ) 33 - 35  2000  [Refereed]

  • Channel-number-compressed multi-channel acoustic echo canceller for high-presence teleconferencing system with large display

    A Nakagawa, S Shimauchi, Y Haneda, S Aoki, S Makino

    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI     813 - 816  2000  [Refereed]

    Sound localization is important to make conversation easy between local and remote sites in a teleconference. This requires a multi-channel sound system having a multi-channel acoustic echo canceller (MAEC). The appropriate number of channels is determined from a trade-off between high presence and MAEC performance, so it is not possible to increase the channel number by much.
    We propose a channel-number-compressed MAEC to provide teleconferencing systems that exhibit high presence. The channel number of the MAEC inputs is compressed and that of its outputs is expanded.

  • Hybrid of acoustic echo cancellers and voice switching control for multi-channel applications

    S. Shimauchi, A. Nakagawa, Y. Haneda, S. Makino

    IWAENC99     48-51  1999.09  [Refereed]

  • Subband echo canceler with an exponentially weighted stepsize NLMS adaptive filter

    S Makino, Y Haneda

    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE   82 ( 3 ) 49 - 57  1999.03  [Refereed]

    This paper proposes a novel adaptive algorithm for an echo canceler. In this algorithm, the number of operations and the memory capacity are equivalent to those of the conventional NLMS algorithm, but the convergence speed is twice that of the conventional algorithm. This adaptive algorithm is referred to as subband ES (exponentially weighted stepsize). In the algorithm, the frequency bands of the received input signal and echo signal are divided into multiple subbands, and echo is independently canceled in each subband. Each adaptive filter in each subband has independent coefficients with an independent stepsize. The stepsize is time-independent and its weight is exponentially proportional to the change of the impulse response within the frequency region, such as the expected value of the difference between the waveforms of two impulse responses. As a result, the characteristic of the acoustic echo path in each frequency band is reflected in the adaptive algorithm to improve the convergence characteristic. Using the results of computer simulation and experimental results obtained via an experimental setup with DSP, it is shown that the convergence speed with respect to input voice signal can be about 4 times faster when using echo cancellation based on the new algorithm than in conventional full-band echo cancellation based on the NLMS algorithm. © 1998 Scripta Technica, Electron Comm Jpn Pt 3, 82(3): 49-57, 1999.

  • A stereo echo canceller implemented using a stereo shaker and a duo-filter control system

    S Shimauchi, S Makino, Y Haneda, A Nakagawa, S Sakauchi

    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI     857 - 860  1999  [Refereed]

    Stereo echo cancellation has been achieved and used in daily teleconferencing. To overcome the non-uniqueness problem, a stereo shaker is introduced in eight frequency bands and adjusted so as to be inaudible and not affect stereo perception. A duo-filter control system including a continually running adaptive filter and a fixed filter is used for double-talk control. A second-order stereo projection algorithm is used in the adaptive filter. A stereo voice switch is also included. This stereo echo canceller was tested in two-way conversation in a conference room, and the strength of the stereo shaker was subjectively adjusted. A misalignment of 20 dB was obtained in the teleconferencing environment, and changing the talker's position in the transmission room did not affect the cancellation. This echo canceller is now used daily in a high-presence teleconferencing system and has been demonstrated to more than 300 attendees.

  • New configuration for a stereo echo canceller with nonlinear pre-processing

    S Shimauchi, Y Haneda, S Makino, Y Kaneda

    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6     3685 - 3688  1998  [Refereed]

    A new configuration for a stereo echo canceller with nonlinear pre-processing is proposed. The pre-processor which adds uncorrelated components to the original received stereo signals improves the adaptive filter convergence even in the conventional configuration. However, because of the inaudibility restriction, the preprocessed signals still include a large amount of the original stereo signals which are often highly cross-correlated. Therefore, the improvement is limited. To overcome this, our new stereo echo canceller includes exclusive adaptive filters whose inputs are the uncorrelated signals generated in the pre-processor. These exclusive adaptive filters converge to true solutions without suffering from cross-correlation between the original stereo signals. This is demonstrated through computer simulation results.

  • Subband acoustic echo canceller using two different analysis filters and 8th order projection algorithm

    A. Nakagawa, Y. Haneda, S. Makino

    IWAENC97     140-143  1997.09  [Refereed]

  • Subjective assessment of echo return loss required for subband acoustic echo cancellers

    S. Sakauchi, Y. Haneda, S. Makino

    IWAENC97     152-155  1997.09  [Refereed]

  • Multiple-point equalization of room transfer functions by using common acoustical poles

    Y Haneda, S Makino, Y Kaneda

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   5 ( 4 ) 325 - 333  1997.07  [Refereed]

    A multiple-point equalization filter using the common acoustical poles of room transfer functions is proposed. The common acoustical poles correspond to the resonance frequencies, which are independent of source and receiver positions. They are estimated as common autoregressive (AR) coefficients from multiple room transfer functions. The equalization is achieved with a finite impulse response (FIR) filter, which has the inverse characteristics of the common acoustical pole function. Although the proposed filter cannot recover the frequency response dips of the multiple room transfer functions, it can suppress their common peaks due to resonance; it is also less sensitive to changes in receiver position. Evaluation of the proposed equalization filter using measured room transfer functions shows that it can reduce the deviations in the frequency characteristics of multiple room transfer functions better than a conventional multiple-point inverse filter. Experiments show that the proposed filter enables 1-5 dB additional amplifier gain in a public address system without acoustic feedback at multiple receiver positions. Furthermore, the proposed filter reduces the reflected sound in room impulse responses without the pre-echo that occurs with a multiple-point inverse filter. A multiple-point equalization filter using common acoustical poles can thus equalize multiple room transfer functions by suppressing their common peaks.

  • Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path

    S Makino, K Strauss, S Shimauchi, Y Haneda, A Nakagawa

    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V     299 - 302  1997  [Refereed]

    This paper proposes a new subband stereo echo canceller that converges to the true echo path impulse response much faster than conventional stereo echo cancellers. Since signals are bandlimited and downsampled in the subband structure, the time interval between the subband signals becomes longer, so the variation of the cross-correlation between the stereo input signals becomes large. Consequently, convergence to the true solution is improved. Furthermore, the projection algorithm, or affine projection algorithm, is applied to further speed up the convergence. Computer simulations using stereo signals recorded in a conference room demonstrate that this method significantly improves convergence speed and almost solves the problem of stereo echo cancellation with low computational load.

  • Noise reduction for subband acoustic echo canceller

    J. Sasaki, Y. Haneda, S. Makino

    Joint meeting, Acoustical Society of America and Acoustical Society of Japan     1285-1290  1996.12  [Refereed]

  • Implementation and evaluation of an acoustic echo canceller using duo-filter control system

    Y. Haneda, S. Makino, J. Kojima, S. Shimauchi

    EUSIPCO96 (European Signal Processing Conference)     1115 - 1118  1996.09  [Refereed]

  • SSB subband echo canceller using low-order projection algorithm

    S Makino, J Noebauer, Y Haneda, A Nakagawa

    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6     945 - 948  1996  [Refereed]

  • Stereo echo cancellation algorithm using imaginary input-output relationships

    S Shimauchi, S Makino

    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6     941 - 944  1996  [Refereed]

  • A FAST PROJECTION ALGORITHM FOR ADAPTIVE FILTERING

    M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E78A ( 10 ) 1355 - 1361  1995.10  [Refereed]

    This paper proposes a new algorithm called the fast Projection algorithm, which reduces the computational complexity of the Projection algorithm from (p+1)L + O(p^3) to 2L + 20p, where L is the length of the estimation filter and p is the projection order. This algorithm has properties that lie between those of NLMS and RLS, i.e., less computational complexity than RLS but much faster convergence than NLMS for input signals like speech. The reduction of computation consists of two parts. One concerns calculating the pre-filtering vector, which originally took O(p^3) operations. Our new algorithm computes the pre-filtering vector recursively with about 15p operations. The other reduction is accomplished by introducing an approximation vector of the estimation filter. Experimental results for speech input show that the convergence speed of the Projection algorithm approaches that of RLS as the projection order increases with only a slight extra calculation complexity beyond that of NLMS, which indicates the efficiency of the proposed fast Projection algorithm.
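
    As a rough illustration of the operation counts quoted in the summary above, the sketch below evaluates both expressions for a few projection orders. The filter length L = 1000 and the constant in front of the p^3 term (taken as 1 here) are assumptions chosen only for illustration; the summary gives the cubic term only as an order of magnitude.

```python
def projection_ops(filter_len, order, cubic_constant=1.0):
    """Per-sample cost (p+1)L + c*p^3; c stands in for the unknown O(p^3) constant (assumed)."""
    return (order + 1) * filter_len + cubic_constant * order ** 3


def fast_projection_ops(filter_len, order):
    """Per-sample cost 2L + 20p as quoted in the summary."""
    return 2 * filter_len + 20 * order


L = 1000  # hypothetical filter length, not taken from the paper
for p in (2, 4, 8):
    full = projection_ops(L, p)
    fast = fast_projection_ops(L, p)
    print(f"p={p}: projection ~{full:.0f}, fast projection {fast}, ratio {full / fast:.2f}")
```

    Under these assumptions the cost of the fast variant stays close to 2L for small p, while the original cost grows roughly in proportion to p.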

  • Relationship between the 'ES family' algorithms and conventional adaptive algorithms

    S. Makino

    IWAENC95     11-14  1995.06  [Refereed]

  • Implementation and evaluation of an acoustic echo canceller using the duo-filter control system

    Y. Haneda, S. Makino, J. Kojima, S. Shimauchi

    IWAENC95     79-82  1995.06  [Refereed]

  • Can echo cancellers cancel howling in PA systems?

    牧野昭二

    J. Acoust. Soc. Jpn.   51 ( 3 ) 248  1995.03  [Refereed]

  • STEREO PROJECTION ECHO CANCELER WITH TRUE ECHO PATH ESTIMATION

    S SHIMAUCHI, S MAKINO

    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5     3059 - 3062  1995  [Refereed]

  • FAST PROJECTION ALGORITHM AND ITS STEP-SIZE CONTROL

    M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5     945 - 948  1995  [Refereed]

  • High-performance acoustic echo canceller development

    J. Kojima, S. Makino, Y. Haneda, S. Shimauchi

    NTT R&D   44 ( 1 ) 39-44  1995.01  [Refereed]

  • Common acoustical pole and zero modeling of room transfer functions

    Y. Haneda, S. Makino, Y. Kaneda

    NTT R&D   44 ( 1 ) 53-58 - 101  1995.01  [Refereed]

    A new model (the Common-Acoustical-Pole and Zero model, or CAPZ model) is proposed for a room transfer function (RTF) by using common acoustical poles that correspond to resonance properties of a room. These poles are estimated as the common AR coefficients of many RTFs corresponding to different source and receiver positions. Using the estimated common AR coefficients, the proposed method models the RTFs with different MA coefficients. This new model requires far fewer variable parameters to represent RTFs than the conventional all-zero or pole-zero model. The acoustic echo canceller based on the proposed model requires half the variable parameters and converges 1.5 times faster than one based on the all-zero model, confirming the efficiency of the proposed model.

  • Report on the 1994 International Conference on Acoustics, Speech, and Signal Processing

    S. Makino, et al.

    J. Acoust. Soc. Jpn.   50 ( 9 ) 759-760  1994.09  [Refereed]

  • Research on the adaptive signal processing for acoustic echo cancellation

    S. Makino

    J. Acoust. Soc. Jpn.     75  1994.01  [Refereed]

  • ARMA modeling of a room transfer function at low frequencies

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   15 ( 5 ) 353 - 355  1994  [Refereed]

    Citations (Scopus): 8

  • A NEW RLS ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE-RESPONSE

    S MAKINO, Y KANEDA

    ICASSP-94 - PROCEEDINGS, VOL 3   III   373 - 376  1994  [Refereed]

  • A new design for program controlled voice switching circuits using a microprocessor

    H. Oikawa, M. Nishino, K. Yamamori, S. Makino

    IEICE Trans. Fundamentals   J77-B-I ( 1 ) 66 - 74  1994.01  [Refereed]

  • Common acoustical poles independent of sound directions and modeling of head-related transfer functions

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   15 ( 4 ) 277 - 279  1994  [Refereed]

    Citations (Scopus): 1

  • EXPONENTIALLY WEIGHTED STEP-SIZE PROJECTION ALGORITHM FOR ACOUSTIC ECHO CANCELERS

    S MAKINO, Y KANEDA

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E75A ( 11 ) 1500 - 1508  1992.11  [Refereed]

    This paper proposes a new adaptive algorithm for acoustic echo cancellers that, for a speech input, converges four times faster than the normalized LMS (NLMS) at almost the same computational load. This algorithm reflects both the statistics of the variation of a room impulse response and the whitening of the received input signal. This algorithm, called the ESP (exponentially weighted step-size projection) algorithm, uses a different step size for each coefficient of an adaptive transversal filter. These step sizes are time-invariant and weighted proportional to the expected variation of a room impulse response. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. This algorithm also reflects the whitening of the received input signal, i.e., it removes the correlation between consecutive received input vectors. This process is effective for speech, which has a highly non-white spectrum. A geometric interpretation of the proposed algorithm is derived and the convergence condition is proved. A fast projection algorithm is introduced to reduce the computational complexity and modified for a practical multiple DSP structure so that it requires almost the same computational load, 2L multiply-add operations, as the conventional NLMS. The algorithm is implemented in an acoustic echo canceller constructed with multiple DSP chips, and its fast convergence is demonstrated.

  • MODELING OF A ROOM TRANSFER-FUNCTION USING COMMON ACOUSTICAL POLES

    Y HANEDA, S MAKINO, Y KANEDA

    ICASSP-92 - 1992 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   II   B213 - B216  1992  [Refereed]

  • Subband echo canceller with an exponentially weighted step size NLMS adaptive filter

    S. Makino, Y. Haneda

    IWAENC91 (International Workshop on Acoustic Echo and Noise Control)     109-120  1991.09  [Refereed]

  • Report on the 1990 International Conference on Acoustic, Speech, and Signal Processing

    K. Hirose, S. Nakagawa, T. Taniguchi, S. Makino

    J. Acoust. Soc. Jpn.   46 ( 10 ) 869 - 870  1990.10  [Refereed]

  • Recent techniques of circuit and acoustic echo control for telephony

    S. Shimada, S. Makino

    Journal of the Institute of Image Information and Television Engineers   44 ( 3 ) 222 - 227  1990.03  [Refereed]

  • ACOUSTIC ECHO CANCELER ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE-RESPONSE

    S MAKINO, Y KANEDA

    ICASSP 90, VOLS 1-5     1133 - 1136  1990  [Refereed]

  • Echo control in telecommunications

    Shoji Makino, Shoji Shimada

    Journal of the Acoustical Society of Japan (E)   11 ( 6 ) 309 - 316  1990  [Refereed]

    This paper reviews echo control techniques for telecommunications, emphasizing the principles and applications of both circuit and acoustic echo cancellers. First, echo generating mechanisms and echo problems are described for circuit and acoustic echoes. Circuit echo is caused by impedance mismatching in a hybrid coil. Acoustic echo is caused by acoustic coupling between loudspeakers and microphones in a room. The echo problem is severe when the round-trip propagation delay is long. In this case, the echo must be removed. Next, the basic principle of the echo canceller, adaptive filter structure and adaptive algorithm are discussed. Emphasis is focused on the construction and operation of an adaptive transversal filter using the NLMS (Normalized Least Mean Square) algorithm, which is the most popular for the echo canceller. Then, applications of circuit and acoustic echo cancellers are described. Circuit echo cancellers have been well studied and implemented in LSIs for many applications. Although acoustic echo cancellers have been introduced into audio teleconference systems, they still have some problems which must be solved. Therefore, they are now being studied intensely. Finally, this paper mentions the problems of echo cancellers and the direction of future work on them. The main targets for acoustic echo cancellers are improving the convergence speed, reducing the amount of hardware and bettering the double-talk control technique. © 1990, Acoustical Society of Japan. All rights reserved.

    Citations (Scopus): 5

  • Acoustic echo canceller algorithm based on room acoustic characteristics

    S. Makino, N. Koizumi

    WASPAA89 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics)   1 ( 1 ) 1-2  1989.10  [Refereed]

  • Acoustic echo canceller with multiple echo paths

    Nobuo Koizumi, Shoji Makino, Hiroshi Oikawa

    Journal of the Acoustical Society of Japan (E)   10 ( 1 ) 39 - 45  1989  [Refereed]

    A new configuration of acoustic echo canceller for multiple microphone teleconferencing systems is proposed. It is designed for use with microphones whose gains switch or vary during teleconferencing according to the talker. This system requires memory for multiple echo paths, which enables the updating of filter coefficients when an echo path is changed due to the switching of the actuated microphone during talker alternation. In comparison to the single echo path model which uses only adaptation, this method maintains echo cancellation during abrupt changes of the echo path when the microphone alternates between talkers. Also in comparison to direct microphone output mixing, this method reduces the stationary residual echo level by the reduction of acoustic coupling. © 1989, Acoustical Society of Japan. All rights reserved.

    Citations (Scopus): 3

  • Improvement on adaptation of an echo canceller in a room

    S. Makino, N. Koizumi

    IEICE Trans. Fundamentals   J71-A ( 12 ) 2212 - 2214  1988.12  [Refereed]

  • AUDIO TELECONFERENCING SET WITH MULTIPATH ECHO CANCELLER

    H OIKAWA, N KOIZUMI, S MAKINO

    REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES   36 ( 2 ) 217 - 223  1988.03  [Refereed]

  • Audio teleconferencing set with multi-path echo canceller

    H. Oikawa, N. Koizumi, S. Makino

    ECL Technical Journal   37 ( 2 ) 191 - 197  1988.02  [Refereed]

  • Vibration characteristics of a piezoelectric bimorph diaphragm with a step-shaped edge

    S. Makino, Y. Ichinose

    J. Acoust. Soc. Jpn.   43 ( 3 ) 161 - 166  1987.03  [Refereed]

Books and Other Publications

  • Audio Source Separation

    Makino, Shoji (Part: Sole author)

    Springer International Publishing  2018.03 ISBN: 9783319730318

  • Underdetermined blind source separation using acoustic arrays

    S. Makino, S. Araki, S. Winter, and H. Sawada (Part: Sole author)

    Wiley  2010.01

  • Underdetermined blind source separation of convolutive mixtures by hierarchical clustering and L1-norm minimization

    S. Winter, W. Kellermann, H. Sawada, S. Makino( Part: Other)

    Springer  2007.09

  • Frequency-domain blind source separation

    H. Sawada, S. Araki, S. Makino( Part: Other)

    Springer  2007.09

  • K-means based underdetermined blind speech separation

    S. Araki, H. Sawada, S. Makino( Part: Other)

    Springer  2007.09

  • Blind Speech Separation

    S. Makino, Te-Won Lee, H. Sawada( Part: Edit)

    Springer  2007.09 ISBN: 9781402064784

    http://www.amazon.co.jp/Speech-Separation-Signals-Communication-Technology/dp/1402064780

  • Blind source separation of convolutive mixtures of audio signals in frequency domain

    S. Makino, H. Sawada, R. Mukai, S. Araki( Part: Sole author)

    Springer  2006.05

  • Speech Enhancement

    J. Benesty, S. Makino, J. Chen( Part: Edit)

    Springer  2005.05 ISBN: 354024039X

    http://www.amazon.co.jp/Speech-Enhancement-Signals-Communication-Technology/dp/354024039X

  • Real-time blind source separation for moving speech signals

    R. Mukai, H. Sawada, S. Araki, S. Makino( Part: Other)

    Springer  2005.03

  • Subband based blind source separation

    S. Araki, S. Makino( Part: Other)

    Springer  2005.03

  • Blind source separation of convolutive mixtures of speech

    S. Makino (Part: Sole author)

    Springer  2003.01

  • IEICE Knowledge Base

    S.Makino( Part: Contributor, Blind audio source separation based on sparse component analysis)

    IEICE  2012.10

  • 2011 IEEE REGION 10 CONFERENCE TENCON 2011

    Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji( Part: Contributor, Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source)

    IEEE  2011.01 ISBN: 9781457702556

  • 2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS

    Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko( Part: Contributor, Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation)

    IEEE  2010.01 ISBN: 9781424453092

  • Teleconferencing equipment

    Makino, Shoji (Part: Sole author)

    Fuji Technosystem  1999.10

  • Research on adaptive signal processing for acoustic echo cancellers

    Makino, Shoji (Part: Sole author)

    1993.03

Presentations

  • Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    ICASSP  (Brighton, United Kingdom) 

    Presentation date: 2019.05

  • Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

    Inoue, Shota, Kameoka, Hirokazu, Li, Li, Seki, Shogo, Makino, Shoji

    ICASSP  (Brighton, United Kingdom) 

    Presentation date: 2019.05

  • Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    ICASSP 2019  (Brighton, ENGLAND) 

    Presentation date: 2019.05

  • NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

    Maruyama, Takuro, Araki, Shoko, Nakatani, Tomohiro, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji, Nakamura, Atsushi

    IEEE International Conference on Acoustics, Speech and Signal Processing  (Kyoto, JAPAN) 

    Presentation date: 2012.03

  • Audio source separation based on independent component analysis

    S. Makino, H. Sawada  [Invited]

    Tutorial at the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing 

    Presentation date: 2007.04

  • Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

    Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

    APSIPA ASC 2020 

    Presentation date: 2020.12

  • Applying virtual microphones to triangular microphone array in in-car communication

    Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

    APSIPA ASC 2020 

    Presentation date: 2020.12

  • Study on acoustic scene classification based on automatic estimation of spatial filters

    大野, 泰己, 山田, 武志, 牧野, 昭二

    IEICE General Conference 

    Presentation date: 2020.03

  • Application of semi-supervised learning using generative adversarial networks to sound event detection

    合馬, 一弥, 山田, 武志, 牧野, 昭二

    IEICE General Conference 

    Presentation date: 2020.03

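  • Study on estimation of speech recognition error segments focusing on the temporal variation of utterances

    舒, 禹清, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Spring Meeting 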

    Presentation date: 2020.03

  • Study on sound event detection using both spatial and acoustic features

    陳, 軼夫, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Spring Meeting 

    Presentation date: 2020.03

  • Study on low-latency source separation for in-car communication

    上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

    Acoustical Society of Japan Spring Meeting 

    Presentation date: 2020.03

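  • Simultaneous source separation, dereverberation, and denoising using a convolutional beamformer based on DNN mask estimation

    髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

    Acoustical Society of Japan 2020 Spring Meeting 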

    Presentation date: 2020.03

  • Multichannel hearing aid system based on basis-shared semi-supervised independent low-rank matrix analysis

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    Acoustical Society of Japan 2020 Spring Meeting 

    Presentation date: 2020.03

  • Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

    Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

    NCSP'20 

    Presentation date: 2020.02

  • Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

    Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

    NCSP'20 

    Presentation date: 2020.02

  • Blind source separation with low-latency for in-car communication

    Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

    NCSP'20 

    Presentation date: 2020.02

  • Source separation of arbitrary speakers using the multichannel variational autoencoder method

    李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

    IEICE 

    Presentation date: 2019.12

  • Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

    Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura, Daichi, Saruwatari, Hiroshi, Makino, Shoji

    APSIPA  (Lanzhou, China) 

    Presentation date: 2019.11

  • Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    APSIPA ASC 2019  (Lanzhou, China) 

    Presentation date: 2019.11

  • Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

    Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

    ISMIR  (Delft, The Netherlands) 

    Presentation date: 2019.11

  • Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

    Inoue, Shota, Kameoka, Hirokazu, Li, Li, Makino, Shoji

    ICA  (AACHEN, GERMANY) 

    Presentation date: 2019.09

  • Gated convolutional neural network-based voice activity detection under high-level noise environments

    Li, Li, Yamaoka, Kouei, Koshino, Yuki, Matsumoto, Mitsuo, Makino, Shoji

    ICA  (AACHEN, GERMANY) 

    Presentation date: 2019.09

  • Study on classification of utterance characteristics using BLSTM and modulation spectrum

    サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2019.09

  • Study on estimation of speech recognition error segments using BLSTM

    舒, 禹清, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2019.09

  • Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

    EUSIPCO 2019 

    Presentation date: 2019.09

  • CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

    Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    EUSIPCO 2019 

    Presentation date: 2019.09

  • Evaluation of a multichannel hearing aid system using rank-constrained spatial covariance model estimation

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    Acoustical Society of Japan 2019 Autumn Meeting 

    Presentation date: 2019.09

  • Study on automatic scoring using distributed representations of answer utterance text in a Japanese speaking test

    臼井, 桃香, 山田, 武志, 牧野, 昭二

    IEICE General Conference 

    Presentation date: 2019.03

  • Underdetermined speech enhancement by time-frequency switching of MVDR beamformers

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    IEICE Technical Committee on Speech 

    Presentation date: 2019.03

  • Underdetermined speech enhancement combining a time-frequency switching beamformer with a gated-CNN-based time-frequency mask

    髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

    Acoustical Society of Japan 2019 Spring Meeting 

    Presentation date: 2019.03

  • Experimental evaluation of WaveRNN predictor for audio lossless coding

    Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

    NCSP'19 

    Presentation date: 2019.03

  • Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    NCSP'19 

    Presentation date: 2019.03

  • Categorizing error causes related to utterance characteristics in speech recognition

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    NCSP'19 

    Presentation date: 2019.03

  • Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    NCSP'19 

    Presentation date: 2019.03

  • Fast semi-blind source separation using a multichannel variational autoencoder with a source class classifier

    李, 莉, 亀岡, 弘和, 牧野, 昭二

    Acoustical Society of Japan 2019 Spring Meeting 

    Presentation date: 2019.03

  • Voice activity detection in adverse noise environments using gated CNN

    牧野, 昭二, 李莉, 越野ゆき, 松本光雄

    IEICE 

    Presentation date: 2019.03

  • Integrated approach to source separation and dereverberation using a multichannel variational autoencoder

    井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

    Acoustical Society of Japan 2019 Spring Meeting 

    Presentation date: 2019.03

  • Microphone position realignment by extrapolation of virtual microphone

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

    10th Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)  (Honolulu, HI) 

    Presentation date: 2018.11

  • Weakly labeled learning using BLSTM-CTC for sound event detection

    Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

    APSIPA ASC 2018 

    Presentation date: 2018.11

  • Underdetermined speech enhancement using a time-frequency switching beamformer and time-frequency masking

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2018.09

  • Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

    Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

    EUSIPCO 2018  (Rome, ITALY) 

    Presentation date: 2018.09

  • Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

    Matsui, Yutaro, Nakatani, Tomohiro, Delcroix, Marc, Kinoshita, Keisuke, Ito, Nobutaka, Araki, Shoko, Makino, Shoji

    IWAENC2018 

    Presentation date: 2018.09

  • Study and discussion of lossless speech coding using WaveRNN

    天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2018.09

  • Virtual extension of microphone spacing by extrapolation of virtual microphones

    陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2018.09

  • Evaluation of the effectiveness of weakly labeled learning using BLSTM-CTC for sound event detection

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2018.09

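  • Performance evaluation of far source suppression by transfer-function-gain NMF in a meeting scenario with multiple types of recording devices

    松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

    IEICE Technical Committee on Signal Processing 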

    Presentation date: 2018.03

  • Study on weakly labeled learning using BLSTM-CTC for sound event detection

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Spring Meeting 

    Presentation date: 2018.03

  • Nonlinear microphone array based on a combination of multiple beamformers

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    Acoustical Society of Japan Spring Meeting 

    Presentation date: 2018.03

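  • Study on estimation of impression rating scores for notifying users of the causes of speech recognition errors

    後藤, 孝宏, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Spring Meeting 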

    Presentation date: 2018.03

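  • Study on acoustic scene classification based on spatial feature extraction using convolutional neural networks

    高橋, 玄, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan Spring Meeting 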

    Presentation date: 2018.03

  • Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

    Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

    NCSP'18 

    Presentation date: 2018.03

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    NCSP'18 

    Presentation date: 2018.03

  • Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

    Mae, Narumi, Yamaoka, Kouei, Mitsui, Yoshiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    NCSP'18 

    Presentation date: 2018.03

  • Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

    NCSP'18 

    Presentation date: 2018.03

  • Abnormal sound detection by two microphones using virtual microphone technique

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    APSIPA 2017  (Kuala Lumpur, MALAYSIA) 

    Presentation date: 2017.12

  • Sound source localization using binaural difference for hose-shaped rescue robot

    Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    APSIPA 2017  (Kuala Lumpur, MALAYSIA) 

    Presentation date: 2017.12

  • Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic Features

    Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

    APSIPA 2017  (Kuala Lumpur, MALAYSIA) 

    Presentation date: 2017.12

  • Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

    Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

    IEEE GCCE 2017  (Nagoya, JAPAN) 

    Presentation date: 2017.10

  • Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    MLSP  (Tokyo, Japan) 

    Presentation date: 2017.09

  • Multiple far noise suppression in a real environment using transfer-function-gain NMF

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    EUSIPCO 2017  (GREECE) 

    Presentation date: 2017.08

  • Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

    EUSIPCO 2017  (GREECE) 

    Presentation date: 2017.08

  • Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

    Li, Li, Kameoka, Hirokazu, Toda, Tomoki, Makino, Shoji

    Interspeech  (Stockholm, Sweden) 

    Presentation date: 2017.08

  • Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

    Kodama, Takumi, Makino, Shoji

    IEEE Engineering in Medicine & Biology Society (EMBC)  (Jeju Island, Korea) 

    Presentation date: 2017.07

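  • Development of high-quality blind source separation for a hose-shaped robot based on independent low-rank matrix analysis and statistical speech enhancement

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    JSME Conference on Robotics and Mechatronics 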

    Presentation date: 2017.05

  • Improving the accuracy of ability estimation based on item response theory in SJ-CAT

    小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

    Acoustical Society of Japan 2017 Spring Meeting 

    Presentation date: 2017.03

  • Study and discussion of adapting the lossless audio coding MPEG-4 ALS to high-resolution audio

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    Acoustical Society of Japan 2017 Spring Meeting 

    Presentation date: 2017.03

  • Study on acoustic scene classification using DNN-GMM and concatenated features

    高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

    Acoustical Society of Japan 2017 Spring Meeting 

    Presentation date: 2017.03

  • Discriminative non-negative matrix factorization with majorization-minimization

    Li, L, Kameoka, H, Makino, Shoji

    HSCMA  (San Francisco, CA) 

    Presentation date: 2017.03

  • Basis learning algorithm for discriminative NMF using the auxiliary function method

    李莉, 亀岡弘和, 牧野昭二

    Acoustical Society of Japan 2017 Spring Meeting 

    Presentation date: 2017.03

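  • Development of a blind source separation system for a hose-shaped robot using independent low-rank matrix analysis and statistical speech enhancement

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    Acoustical Society of Japan 2017 Spring Meeting 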

    Presentation date: 2017.03

  • Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation

    Mae, N, Ishimura, M, Makino, Shoji, Kitamura, D, Ono, N, Yamada, T, Saruwatari, H

    LVA/ICA  (Grenoble Alpes Univ, Grenoble, FRANCE) 

    Presentation date: 2017.02

  • Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

    Kodama, T, Makino, Shoji

    Biomedical and Health Informatics (BHI) 

    Presentation date: 2017.02

  • Performance estimation of spontaneous speech recognition using non-reference acoustic features

    Guo, Ling, Yamada, Takeshi, Makino, Shoji

    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)  (Jeju, SOUTH KOREA) 

    Presentation date: 2016.12

  • Full-body tactile P300-based brain-computer interface accuracy refinement

    Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

    International Conference on Bio-engineering for Smart Technologies (BioSMART) 

    Presentation date: 2016.12

  • Tactile brain-computer interface using classification of P300 responses evoked by full body spatial vibrotactile stimuli

    Kodama, T, Makino, Shoji, Rutkowski, T

    APSIPA 

    Presentation date: 2016.12

  • Visual motion onset augmented reality brain-computer interface

    Shimizu, K, Kodama, T, Makino, Shoji, Rutkowski, T

    International Conference on Bio-engineering for Smart Technologies (BioSMART) 

    Presentation date: 2016.12

  • Evaluation in real environments of far noise suppression using transfer-function-gain NMF

    松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

    31st Signal Processing Symposium 

    Presentation date: 2016.11

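  • Implementation and evaluation of a function for presenting the required utterance volume in noisy speech recognition

    後藤,孝宏, 山田,武志, 牧野,昭二

    Acoustical Society of Japan Autumn Meeting 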

    Presentation date: 2016.09

  • Verification of ability estimation based on item response theory in the Japanese speaking test SJ-CAT

    小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2016.09

  • Study on performance estimation of spontaneous speech recognition using non-reference features

    郭,レイ, 山田,武志, 牧野,昭二

    Acoustical Society of Japan Autumn Meeting 

    Presentation date: 2016.09

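  • Performance variation under reverberation of the maximum SNR beamformer based on virtual increase of channels

    山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

    Acoustical Society of Japan Autumn Meeting 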

    Presentation date: 2016.09

  • Ego-noise reduction for a hose-shaped rescue robot using determined Rank-1 multichannel nonnegative matrix factorization

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji, Saruwatari, Hiroshi

    IWAENC2016 

    Presentation date: 2016.09

  • Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot

    Ishimura, Masaru, Makino, Shoji, Yamada, Takeshi, Ono, Nobutaka, Saruwatari, Hiroshi

    IWAENC2016  (Xi'an, China) 

    Presentation date: 2016.09

  • Multi-talker speech recognition based on blind source separation with ad hoc microphone array using smartphones and cloud storage

    Ochi, K, Ono, N, Miyabe, S, Makino, Shoji

    Interspeech  (San Francisco, CA) 

    Presentation date: 2016.09

  • Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji, Ono, Nobutaka

    Detection and Classification of Acoustic Scenes and Events 

    Presentation date: 2016.09

  • Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

    Saruwatari, H, Takata, K, Ono, N, Makino, Shoji  [Invited]

    International Congress on Acoustics 

    Presentation date: 2016.09

  • Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

    Kodama, T, Makino, Shoji, Rutkowski, T

    AEARU Young Researchers International Conference 

    Presentation date: 2016.09

  • Simultaneous enhancement of speech in the spectral and cepstral domains

    李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

    IEICE Technical Report EA2014-75 

    Presentation date: 2016.08

  • Application of noise suppression using independent vector analysis and a noise canceller to a hose-shaped robot

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    JSME Conference on Robotics and Mechatronics 2016 

    Presentation date: 2016.06

  • Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

    Toyoda, Takuya, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    NCSP'16 

    Presentation date: 2016.03

  • Application of noise suppression using multichannel NMF with a rank-1 spatial model constraint to a hose-shaped robot

    高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

    IEICE General Conference 

    Presentation date: 2016.03

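  • Optimization of the subtraction coefficient of a spatial subtraction array based on amplitude-only correlation estimation and noise kurtosis

    李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

    Acoustical Society of Japan 2016 Spring Meeting 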

    Presentation date: 2016.03

  • Noise suppression for a hose-shaped robot using independent vector analysis and a noise canceller

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    IEICE General Conference 

    Presentation date: 2016.03

  • Source separation for a hose-shaped robot using supervised multichannel NMF and statistical speech enhancement

    高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

    Acoustical Society of Japan 2016 Spring Meeting 

    Presentation date: 2016.03

  • Noise suppression for a hose-shaped robot using multichannel NMF with a rank-1 spatial model constraint

    高草木萌, 北村大地, 小野順貴, 山田武志, 牧野昭二, 猿渡洋

    JSME Conference on Robotics and Mechatronics 

    Presentation date: 2016.03

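  • Simultaneous multi-talker speech recognition using blind source separation with asynchronous distributed microphones

    越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

    Acoustical Society of Japan 2016 Spring Meeting 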

    Presentation date: 2016.03

  • SVM classification study of code-modulated visual evoked potentials

    D. Aminaka, S. Makino, T. M. Rutkowski

    Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)  (Hong Kong, China) 

    Presentation date: 2015.12

  • Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

    Murase, Yoshikazu, Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)  (Hong Kong, China) 

    Presentation date: 2015.12

  • Fingertip stimulus cue-based tactile brain-computer interface

    H. Yajima, S. Makino, T. M. Rutkowski

    Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)  (Hong Kong, China) 

    Presentation date: 2015.12

  • Variable sound elevation features for head-related impulse response spatial auditory BCI

    C. Nakaizumi, S. Makino, T. M. Rutkowski

    Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)  (Hong Kong, China) 

    Presentation date: 2015.12

  • EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

    D. Aminaka, S. Makino, T. M. Rutkowski

    International Symbiotic Workshop (SYMBIOTIC) 

    Presentation date: 2015.10

  • Study on detection of low-score answer utterances in the Japanese speaking test SJ-CAT

    小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

    Acoustical Society of Japan 2015 Autumn Meeting 

    Presentation date: 2015.09

  • Study on performance estimation of noisy speech recognition using non-reference distortion features

    郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    Acoustical Society of Japan 2015 Autumn Meeting 

    Presentation date: 2015.09

  • Classification Accuracy Improvement of Chromatic and High-Frequency Code-Modulated Visual Evoked Potential-Based BCI

    Aminaka, Daiki, Makino, Shoji, Rutkowski, Tomasz M

    8th International Conference on Brain Informatics and Health (BIH)  (Royal Geog Soc, London, ENGLAND) 

    Presentation date: 2015.08

  • Estimating correlation coefficient between two complex signals without phase observation

    S. Miyabe, N. Ono, S. Makino

    LVA/ICA 

    Presentation date: 2015.08

  • Chromatic and high-frequency cVEP-based BCI paradigm

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Engineering in Medicine and Biology Conference (EMBC) 

    Presentation date: 2015.08

  • Head-related impulse response cues for spatial auditory brain-computer interface

    C. Nakaizumi, S. Makino, T.M. Rutkowski

    Engineering in Medicine and Biology Conference (EMBC) 

    Presentation date: 2015.08

  • マイクロホンアレーの位相が観測できない条件でのチャネル間の相関係数の推定

    宮部滋樹, 小野順貴, 牧野昭二

    回路とシステムワークショップ 

    Presentation date: 2015.08

  • Inter-stimulus interval study for the tactile point-pressure brain-computer interface

    K. Shimizu, Shoji Makino, T.M. Rutkowski

    Engineering in Medicine and Biology Conference (EMBC) 

    Presentation date: 2015.08

  • ステレオ録音に基づく移動音源モデルによる走行車両検出と走行方向推定

    遠藤 純基, 豊田 卓矢, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    日本音響学会2015年春季研究発表会 

    Presentation date: 2015.03

  • 総合品質と明瞭性の客観推定に基づくスペクトルサブトラクションの減算係数の最適化

    中里徹, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会 

    Presentation date: 2015.03

  • 非同期分散マイクロフォンアレーによる伝達関数ゲイン基底NMFを用いた拡散雑音抑圧

    村瀬 慶和, 千葉 大将, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    日本音響学会2015年春季研究発表会 

    Presentation date: 2015.03

  • ケプストラム距離とSMR-パープレキシティを用いた雑音下音声認識の性能推定の検討

    郭レイ, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会 

    Presentation date: 2015.03

  • 2つの超ガウス性複素信号の位相観測を用いない相関係数推定

    宮部滋樹, 小野順貴, 牧野昭二

    信学技報EA2014-75 

    Presentation date: 2015.03

  • Spatial auditory BCI spellers using real and virtual surround sound systems

    M. Chang, C. Nakaizumi, K. Mori, Shoji Makino, T.M. Rutkowski

    Conference on Systems Neuroscience and Rehabilitation (SNR2015) 

    Presentation date: 2015.03

  • 認識性能予測に基づく雑音環境下音声認識のユーザビリティ改善の検討

    青木智充, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会 

    Presentation date: 2015.03

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    APSIPA 2014 

    Presentation date: 2014.12

  • 絶対値の観測のみを用いた2つの複素信号の相関係数推定

    宮部滋樹, 小野順貴, 牧野昭二

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • ケプストラム距離を用いた雑音下音声認識の性能推定の検討

    郭 翎, 山田 武志, 宮部 滋樹, 牧野 昭二, 北脇 信彦

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • 伝達関数ゲイン基底NMFにおけるマイク数・マイク配置と目的音強調性能の関係

    村瀬 慶和, 千葉 大将, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • βダイバージェンスに基づく一般化振幅補間によるヴァーチャル多素子化を用いた目的音源強調

    片平 拓希, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • 分散型マイクロホンアレイを用いた交通車両検出とその車線推定の検討

    豊田 卓矢, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • Amplitude-based speech enhancement with nonnegative matrix factorization for asynchronous distributed recording

    Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Takahashi, Yu, Yamada, Takeshi, Makino, Shoji

    14th International Workshop on Acoustic Signal Enhancement (IWAENC)  (Antibes, FRANCE) 

    Presentation date: 2014.09

  • Multi-stage declipping of clipping distortion based on length classification of clipped interval

    Chenlei Li, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • 教師なし伝達関数ゲイン基底NMFによる目的音強調における罰則項の特性評価

    千葉 大将, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    日本音響学会研究発表会 

    Presentation date: 2014.09

  • M2Mを用いた大規模データ収集システムの構築に関する研究

    牧野 昭二

    情報処理学会研究報告 計算機アーキテクチャ研究会(ARC) 

    Presentation date: 2013.12

  • Virtually increasing microphone array elements by interpolation in complex-logarithmic domain

    Katahira, Hiroki, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    21st European Signal Processing Conference (EUSIPCO)  (Marrakesh, MOROCCO) 

    Presentation date: 2013.09

  • 非同期録音ブラインド同期のための線形位相補償の効率的最尤解探索

    宮部滋樹, 小野順貴, 牧野昭二  [Invited]

    音講論集___2-10-4_ 

    Presentation date: 2013.03

  • 複素対数補間によるヴァーチャル観測に基づく劣決定条件での音声強調

    片平拓希, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二  [Invited]

    音講論集___2-10-6_ 

    Presentation date: 2013.03

  • 日本語スピーキングテストSCATにおける文読み上げ・文生成問題の自動採点手法の改良

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___1-Q-52a_465-468 

    Presentation date: 2013.03

  • 楽音符号化品質に影響を及ぼす楽音信号の特徴量の検討

    松浦嶺, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-Q-11c_401-404 

    Presentation date: 2013.03

  • ACELPにおけるピッチシャープニングの特性評価

    千葉大将, 守谷健弘, 鎌本優, 原田登, 宮部滋樹, 山田武志, 牧野昭二  [Invited]

    音講論集___1-7-18_ 

    Presentation date: 2013.03

  • A network model for the embodied communication of musical emotions

    寺澤洋子, 星-芝玲子, 柴山拓郎, 大村英史, 古川聖, 牧野昭二, 岡ノ谷一夫

    Cognitive Studies 

    Presentation date: 2013

  • Automatic scoring method considering quality and content of speech for SCAT Japanese speaking test

    Okubo, Naoko, Yamahata, Yuto, Yamada, Takeshi, Imai, Shingo, Ishizuka, Kenkichi, Shinozaki, Takahiro, Nisimura, Ryuichi, Makino, Shoji, Kitawaki, Nobuhiko

    International Conference on Speech Database and Assessments (Oriental COCOSDA)  (Macau, PEOPLES R CHINA)

    Presentation date: 2012.12

  • 日本語スピーキングテストにおける文生成問題の自動採点の検討

    大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___3-Q-16_395-396 

    Presentation date: 2012.09

  • ミュージカルノイズを考慮した雑音抑圧音声のFR型客観品質評価の検討

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦

    音講論集___3-P-5_127-130 

    Presentation date: 2012.09

  • 身体動作の連動性理解にむけた筋活動可聴化

    松原正樹, 寺澤洋子, 門根秀樹, 鈴木健嗣, 牧野昭二  [Invited]

    音講論集___2-10-2_ 

    Presentation date: 2012.09

  • 非同期録音信号の線形位相補償によるブラインド同期と音源分離への応用

    宮部滋樹, 小野順貴, 牧野昭二  [Invited]

    音講論集___3-9-8_ 

    Presentation date: 2012.09

  • 日本語スピーキングテストにおける文章読み上げ問題の自動採点の検討

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___3-Q-18_399-400 

    Presentation date: 2012.09

  • コヒーレンス解析による定常状態誘発反応の可聴化

    加庭輝明, 寺澤洋子, 松原正樹, Tomasz M. Rutkowski, 牧野昭二

    音講論集___2002/10/2_919-922 

    Presentation date: 2012.09

  • 多チャンネルウィーナーフィルタを用いた音源分離における観測モデルの調査

    坂梨龍太郎, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___1-P-14_757-760

    Presentation date: 2012.09

  • 混合DOA モデルに基づく多チャンネル複素NMF による劣決定BSS

    武田和馬, 亀岡弘和, 澤田宏, 荒木章子, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___2-1-9_747-750 

    Presentation date: 2012.03

  • 日本語スピーキングテストにおける文生成問題の採点に影響を及ぼす要因の検討

    大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    信学総大___D-14-9_193 

    Presentation date: 2012.03

  • 日本語スピーキングテストにおける文章読み上げ問題の採点に影響を及ぼす要因の検討

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    信学総大___D-14-8_192 

    Presentation date: 2012.03

  • 雑音抑圧音声の主観品質評価におけるミュージカルノイズの影響

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦  [Invited]

    信学総大___D-14-1_185 

    Presentation date: 2012.03

  • 音響モデルの精度を考慮した雑音下音声認識の性能推定の検討

    高岡隆守, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-P-13_149-150 

    Presentation date: 2012.03

  • 短時間雑音特性に基づく雑音下音声認識の性能推定の検討

    森下恵里, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-P-14_151-152 

    Presentation date: 2012.03

  • フルランク空間相関行列モデルに基づく拡散性雑音除去

    礒佳樹, 荒木章子, 牧野昭二, 中谷智広, 澤田宏, 山田武志, 宮部滋樹, 中村篤

    信学総大___A-10-9_194 

    Presentation date: 2012.03

  • 音量差に基づく音像生成における個人適応手法の有効性検証

    天野成祥, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-Q-1_895-898 

    Presentation date: 2012.03

  • 高次相関を用いた非線形MUSIC による高分解能方位推定

    杉本侑哉, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___3-1-6_763-766 

    Presentation date: 2012.03

  • 時間周波数領域におけるグリッド間の整合性に基づくクリッピングの除去

    三浦晋, 宮部滋樹, 山田武志, 牧野昭二, 中島弘史, 中臺一博

    音講論集___1-Q-10_843-846 

    Presentation date: 2012.03

  • Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

    Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    IEEE Region 10 Conference on TENCON  (INDONESIA) 

    Presentation date: 2011.11

  • Restoration of Clipped Audio Signal Using Recursive Vector Projection

    Miura, Shin, Nakajima, Hirofumi, Miyabe, Shigeki, Makino, Shoji, Yamada, Takeshi, Nakadai, Kazuhiro

    IEEE Region 10 Conference on TENCON  (INDONESIA) 

    Presentation date: 2011.11

  • 周波数依存の時間差モデルによる劣決定BSS

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野昭二, 中村篤

    信学技報___EA2011-86_25-30 

    Presentation date: 2011.11

  • 発話の連続性に基づいた音声信号の分類による会議音声の可視化

    加藤通朗, 杉本侑哉, 宮部滋樹, 牧野昭二, 山田武志, 北脇信彦

    音講論集___3-P-20_197-200 

    Presentation date: 2011.09

  • 雑音抑圧音声の総合品質推定モデルの改良とその客観品質評価への適用

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-Q-23_127-130 

    Presentation date: 2011.09

  • スピーカ間の音量差に基づく音像生成手法における個人適応の検討

    天野成祥, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-4-10_661-664 

    Presentation date: 2011.09

  • 楽音と音声の双方に適用できる客観品質評価法の検討

    三上 雄一郎, 山田 武志, 牧野 昭二, 北脇 信彦

    信学総大___B-11-19_448 

    Presentation date: 2011.03

  • 雑音抑圧音声の客観品質評価に用いる総合品質推定モデルの改良

    藤田 悠希, 山田 武志, 牧野 昭二

    信学総大___B-11-18_447 

    Presentation date: 2011.03

  • スペクトル変形同定の聴覚トレーニングにおける適応的フィードバックの影響

    加庭 輝明, 金 成英, 寺澤 洋子, 伊藤 寿浩, 池田 雅弘, 山田 武志, 牧野 昭二

    音講論集___2-1-1_1003-1006 

    Presentation date: 2011.03

  • クリッピングした音響信号の修復

    三浦 晋, 中島 弘史, 牧野 昭二, 山田 武志, 中臺 一博

    音講論集___3-P-53(d)_941-944 

    Presentation date: 2011.03

  • 空間スペクトルを用いた時間断続信号の検出における主成分分析と周波数分析の比較評価

    加藤 通朗, 杉本 侑哉, 牧野 昭二, 山田 武志, 北脇 信彦

    音講論集___3-P-8(d)_879-880 

    Presentation date: 2011.03

  • 空間スペクトルへの周波数分析の適用による時間断続信号の検出

    杉本 侑哉, 加藤 通朗, 牧野 昭二, 山田 武志

    音講論集___3-P-7(c)_877-878 

    Presentation date: 2011.03

  • 高残響下で混合された音声の音源分離に関する研究

    礒 佳樹, 荒木 章子, 牧野 昭二, 中谷 智広, 澤田 宏, 山田 武志, 中村 篤

    音講論集___1-9-13_643-646 

    Presentation date: 2011.03

  • 音源のW-DO性を仮定した多チャンネル複素NMFによる劣決定BSS

    武田 和馬, 亀岡 弘和, 澤田 宏, 荒木 章子, 山田 武志, 牧野 昭二

    音講論集___1-Q-19(f)_801-804 

    Presentation date: 2011.03

  • 視覚障がい者のタッチパネル操作支援のための音像生成手法の検討

    天野 成祥, 山田 武志, 牧野 昭二

    音講論集___3-P-7(c)_877-878 

    Presentation date: 2011.03

  • 雑音抑圧された音声の主観・客観品質評価法

    山田 武志, 牧野 昭二, 北脇 信彦

    情報処理学会研究報告 音声言語情報処理(SLP)___2010-SLP-83 (7)_1-6 

    Presentation date: 2010.10

  • 雑音抑圧音声のMOSと単語了解度の客観推定

    山田 武志, 北脇 信彦, 牧野 昭二

    信学ソ大___BS-5-4_S-19 

    Presentation date: 2010.09

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出

    加藤 通朗, 杉本 侑哉, 牧野 昭二, 山田 武志, 北脇 信彦

    信学技報___EA2010-47_25-30 

    Presentation date: 2010.08

  • Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation

    Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko

    International Symposium on Circuits and Systems Nano-Bio Circuit Fabrics and Systems (ISCAS 2010)  (Paris, FRANCE) 

    Presentation date: 2010.05

  • 調波構造とHMM合成に基づく混合楽器音認識の検討

    山本裕貴, 山田武志, 北脇信彦, 牧野昭二

    音講論集___3-8-4_1003-1004 

    Presentation date: 2010.03

  • 雑音抑圧音声の総合品質推定モデルを適用したフルリファレンス客観品質評価法

    篠原佑基, 山田武志, 北脇信彦, 牧野昭二

    信学総大___B-11-2_436 

    Presentation date: 2010.03

  • 劣決定音源分離のための分離信号のケプストラムスムージング

    安齊祐美, 荒木章子, 牧野昭二, 中谷智広, 山田武志, 中村篤, 北脇信彦

    音講論集___2-P-25_847-850 

    Presentation date: 2010.03

  • 日本語学習支援のためのアクセント認識の検討

    ショートグレッグ, 山田武志, 北脇信彦, 牧野昭二

    音講論集___1-P-17_447-448 

    Presentation date: 2010.03

  • 雑音下音声認識の性能推定法の実環境における評価

    中島智弘, 山田武志, 北脇信彦, 牧野昭二

    音講論集___2-Q-4_241-244 

    Presentation date: 2010.03

  • IP網における音声の客観品質評価に用いる擬似音声信号の検討

    青島千佳, 北脇信彦, 山田武志, 牧野昭二  [Invited]

    信学総大___B-11-1_435 

    Presentation date: 2010.03

  • IP網における客観品質評価に用いる擬似音声信号の検討

    青島千佳, 北脇信彦, 山田武志, 牧野昭二  [Invited]

    QoSワークショップ___QW7-P-16_ 

    Presentation date: 2009.11

  • 楽音と音声の双方に適用できるオーディオ信号の客観品質推定法の検討

    三上雄一郎, 北脇信彦, 山田武志, 牧野昭二

    QoSワークショップ___QW-7-P-15_ 

    Presentation date: 2009.11

  • 雑音抑圧音声の総合品質推定モデルを用いたフルリファレンス客観品質評価法の検討

    篠原佑基, 山田武志, 北脇信彦, 牧野昭二

    QoSワークショップ___QW7-P-13_ 

    Presentation date: 2009.11

  • 音声区間推定と時間周波数領域方向推定の統合による会議音声話者識別

    荒木 章子, 藤本 雅清, 石塚 健太郎, 中谷 智広, 澤田 宏, 牧野 昭二

    信学技報___EA2008-40_19-24 

    Presentation date: 2008.07

  • [フェロー記念講演]独立成分分析に基づくブラインド音源分離

    牧野 昭二

    信学技報___EA2008-17_65-73 

    Presentation date: 2008.05

  • 周波数領域ICAにおける初期値の短時間データからの学習

    荒木 章子, 伊藤 信貴, 澤田 宏, 小野 順貴, 牧野 昭二, 嵯峨山 茂樹

    信学総大___A-10-6_208 

    Presentation date: 2008.03

  • 音声区間検出と方向情報を用いた会議音声話者識別システムとその評価

    荒木 章子, 藤本 雅清, 石塚 健太郎, 澤田 宏, 牧野 昭二

    音講論集___1-10-1_1-4 

    Presentation date: 2008.03

  • 音声のスパース性を用いたUnderdetermined音源分離

    荒木 章子, 澤田 宏, 牧野 昭二

    信学総大___AS-4-5_S-46 - S-47 

    Presentation date: 2008.03

  • A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

    H. Sawada, S. Araki, S. Makino

    ICA2007, Stereo Audio Source Separation Evaluation Campaign____ 

    Presentation date: 2007.09

  • Blind source separation based on time-frequency masking and maximum SNR beamformer array

    S. Araki, H. Sawada, S. Makino

    ICA2007, Stereo Audio Source Separation Evaluation Campaign____ 

    Presentation date: 2007.09

  • Blind audio source separation based on independent component analysis

    S. Makino  [Invited]

    Keynote Talk at the 2007 International Conference on Independent Component Analysis and Signal Separation 

    Presentation date: 2007.09

  • 話者分類とSN比最大化ビームフォーマに基づく会議音声強調

    荒木 章子, 澤田 宏, 牧野 昭二

    音講論集___2-1-13_571-572 

    Presentation date: 2007.03

  • 事前学習を用いる周波数領域Pearson-ICAの高速化

    加藤 比呂子, 永原 裕一, 荒木 章子, 澤田 宏, 牧野 昭二

    音講論集___1-5-22_549-550 

    Presentation date: 2006.03

  • 観測信号ベクトルのクラスタリングに基づくスパース信号の到来方向推定

    荒木 章子, 澤田 宏, 向井 良, 牧野 昭二

    音講論集___3-5-6_615-616 

    Presentation date: 2006.03

  • 独立成分分析に基づくブラインド音源分離

    牧野 昭二, 荒木 章子, 向井 良, 澤田 宏

    計測自動制御学会 中国支部 学術講演会____2-9 

    Presentation date: 2005.11

  • 多音源に対する周波数領域ブラインド音源分離

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    AIチャレンジ研究会___SIG-Challenge-0522-3_17-22 

    Presentation date: 2005.10

  • パラメトリックピアソン分布を用いた周波数領域ブラインド音源分離

    加藤 比呂子, 永原 裕一, 荒木 章子, 澤田 宏, 牧野 昭二

    音講論集___2-2-4_593-594 

    Presentation date: 2005.09

  • 観測信号ベクトル正規化とクラスタリングによる音源分離手法とその評価

    荒木 章子, 澤田 宏, 向井 良, 牧野 昭二

    音講論集___2-2-3_591-592 

    Presentation date: 2005.09

  • 3次元マイクロホンアレイを用いた多音源ブラインド分離

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    信学ソ大___A-10-8_209 

    Presentation date: 2005.09

  • 多くの背景音からの主要音源のブラインド抽出

    澤田 宏, 荒木 章子, 向井 良, 牧野 昭二

    信学ソ大___A-10-9_210 

    Presentation date: 2005.09

  • 観測ベクトルのクラスタリングによるブラインド音源分離

    荒木 章子, 澤田 宏, 向井 良, 牧野 昭二

    信学ソ大___A-10-7_208 

    Presentation date: 2005.09

  • 独立成分分析を用いた音源数推定法

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    音講論集___3-Q-20_753-754 

    Presentation date: 2004.09

  • A solution for the permutation problem in frequency domain BSS using near- and far-field models

    R. Mukai, H. Sawada, S. Araki, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-3_ 

    Presentation date: 2004.04

  • Underdetermined blind source separation for convolutive mixtures of sparse signals

    S. Winter, H. Sawada, S. Araki, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-2_ 

    Presentation date: 2004.04

  • Blind separation of more speech than sensors using time-frequency masks and ICA

    S. Araki, S. Makino, H. Sawada, R. Mukai

    CSA2004 (NTT Workshop on Communication Scene Analysis)___AU-4_ 

    Presentation date: 2004.04

  • Blind source separation for convolutive mixtures in the frequency domain

    H. Sawada, R. Mukai, S. Araki, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-1_ 

    Presentation date: 2004.04

  • 狭間隔・広間隔の複数マイクロホン対を用いた周波数領域ブラインド音源分離

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    音講論集___3-P-16_627-628 

    Presentation date: 2004.03

  • 独立成分分析に基づくブラインド音源分離

    牧野 昭二, 荒木 章子, 向井 良, 澤田 宏

    ディジタル信号処理シンポジウム___A3-2_1-10 

    Presentation date: 2003.11

  • Blind Separation of More Speech Signals than Sensors using Time-frequency Masking and Mixing Matrix Estimation

    Shoko Araki, Audrey Blin, Shoji Makino

    音講論集___1-P-4_585-586 

    Presentation date: 2003.09

  • 周波数領域BSSにおける近距離場モデルを用いたパーミュテーションの解法

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    音講論集___1-P-6_589-590 

    Presentation date: 2003.09

  • 実環境における3音源以上のブラインド分離

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    音講論集___2-5-19_547-548 

    Presentation date: 2003.09

  • 時間周波数マスキングとICAの併用による音源数 > マイク数の場合のブラインド音源分離

    荒木 章子, 向井 良, 澤田 宏, 牧野 昭二

    音講論集___1-P-5_587-588 

    Presentation date: 2003.09

  • ICA-Based audio source separation

    S. Makino, S. Araki, R. Mukai, H. Sawada

    Technical report of IEICE___EA2003-45_17-24 

    Presentation date: 2003.06

  • ICA-based audio source separation

    S. Makino, S. Araki, R. Mukai, H. Sawada

    International Workshop on Microphone Array Systems - Theory and Practice____ 

    Presentation date: 2003.05

  • 周波数領域ブラインド音源分離におけるpermutation問題の頑健な解法

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    音講論集___3-P-25_777-778 

    Presentation date: 2003.03

  • 移動音源の低遅延実時間ブラインド分離

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    音講論集___3-P-26_779-780 

    Presentation date: 2003.03

  • 帯域に適した分離手法を用いるサブバンド領域ブラインド音源分離

    荒木 章子, 牧野 昭二, Robert Aichner, 西川 剛樹, 猿渡 洋

    音講論集___3-P-27_781-782 

    Presentation date: 2003.03

  • KL情報量最小化に基づく時間領域ICAと非定常信号の同時無相関化に基づく時間領域ICAの比較

    西川 剛樹, 高谷 智哉, 猿渡 洋, 鹿野 清宏, 荒木 章子, 牧野 昭二

    音講論集___2-5-14_545-546 

    Presentation date: 2002.09

  • 死角型ビームフォーマを初期値に用いる時間領域ブラインド音源分離

    荒木 章子, 牧野 昭二, Robert Aichner, 西川 剛樹, 猿渡 洋

    音講論集___2-5-13_543-544 

    Presentation date: 2002.09

  • ブラインド音源分離後の残留スペクトルの推定と除去

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    音講論集___2-5-11_539-540 

    Presentation date: 2002.09

  • 周波数領域ブラインド音源分離におけるpermutation問題の解法

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    音講論集___2-5-12_541-542 

    Presentation date: 2002.09

  • 周波数領域ICAと時間遅れスペクトル減算による残響下での実時間ブラインド音源分離

    向井 良, 荒木 章子, 澤田 宏, 牧野 昭二

    音講論集___1-Q-19_673-674 

    Presentation date: 2002.03

  • サブバンド処理によるブラインド音源分離に関する検討

    荒木 章子, 牧野 昭二, Robert Aichner, 西川 剛樹, 猿渡 洋

    音講論集___3-4-9_619-620 

    Presentation date: 2002.03

  • 間隔の異なる複数のマイクペアによるブラインド音源分離

    澤田 宏, 荒木 章子, 向井 良, 牧野 昭二

    音講論集___3-4-10_621-622 

    Presentation date: 2002.03

  • ICA-based sound separation

    S. Makino, S. Araki, R. Mukai, H. Sawada, R. Aichner, H. Saruwatari, T. Nishikawa, Y. Hinamoto

    NTT Workshop on Comm. Scene Analysis____ 

    Presentation date: 2002.01

  • Time domain blind source separation of non-stationary convolved signals with utilization of geometric beamforming

    R. Aichner, S. Araki, S. Makino, H. Sawada, T. Nishikawa, H. Saruwatari

    NTT Workshop on Comm. Scene Analysis____ 

    Presentation date: 2002.01

  • Separation and dereverberation performance of frequency domain blind source separation

    R. Mukai, S. Araki, S. Makino

    NTT Workshop on Comm. Scene Analysis____ 

    Presentation date: 2002.01

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

    S. Araki, S. Makino, R. Mukai, H. Saruwatari

    NTT Workshop on Comm. Scene Analysis____ 

    Presentation date: 2002.01

  • A polar-coordinate based activation function for frequency domain blind source separation

    H. Sawada, R. Mukai, S. Araki, S. Makino

    NTT Workshop on Comm. Scene Analysis____ 

    Presentation date: 2002.01

  • 周波数領域ブラインド音源分離と適応ビームフォーマの等価性について

    雛元 洋一, 西川 剛樹, 猿渡 洋, 荒木 章子, 牧野 昭二, 向井 良

    信学技報___EA2001-84_75-82 

    Presentation date: 2001.11

  • 非定常スペクトルサブトラクションによる音源分離後の残留雑音除去

    向井 良, 荒木 章子, 澤田 宏, 牧野 昭二

    音講論集___2-6-14_617-618 

    Presentation date: 2001.10

  • 周波数領域ブラインド音源分離のための極座標表示に基づく活性化関数

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    音講論集___2-6-13_615-616 

    Presentation date: 2001.10

  • 周波数領域ブラインド音源分離と周波数領域適応ビームフォーマの関係について

    荒木 章子, 牧野 昭二, 向井 良, 猿渡 洋

    音講論集___2-6-12_613-614 

    Presentation date: 2001.10

  • 時間領域ICAと周波数領域ICAを併用した多段ICAによるブラインド音源分離

    猿渡 洋, 西川 剛樹, 荒木 章子, 牧野 昭二

    日本神経回路学会全国大会____99-100 

    Presentation date: 2001.09

  • 複素数に対する独立成分分析のための極座標表示に基づく活性化関数

    澤田 宏, 向井 良, 荒木 章子, 牧野 昭二

    日本神経回路学会全国大会____97-98 

    Presentation date: 2001.09

  • 実環境での混合音声に対する周波数領域ブラインド音源分離手法の性能限界

    荒木 章子, 牧野 昭二, 西川 剛樹, 猿渡 洋

    音講論集___3-7-4_567-568 

    Presentation date: 2001.03

  • 帯域分割型ICAを用いたBlind Source Separationにおける帯域分割数の最適化

    西川 剛樹, 荒木 章子, 牧野 昭二, 猿渡 洋

    音講論集___3-7-5_569-570 

    Presentation date: 2001.03

  • 実環境におけるブラインド音源分離と残響除去性能に関する検討

    向井 良, 荒木 章子, 牧野 昭二

    音講論集___3-7-3_565-566 

    Presentation date: 2001.03

  • 周波数領域Blind Source Separationにおける帯域分割数の最適化

    西川 剛樹, 荒木 章子, 牧野 昭二, 猿渡 洋

    信学技報___EA2000-95_53-59 

    Presentation date: 2001.01

  • チャネル数変換型多チャネル音響エコーキャンセラ

    中川 朗, 島内 末廣, 羽田 陽一, 青木 茂明, 牧野 昭二

    信学総大___A-4-51_140 

    Presentation date: 2000.03

  • ステレオエコーキャンセラにおける相互相関変動方法の検討

    鈴木 邦和, 杉山 精, 阪内 澄宇, 島内 末廣, 牧野 昭二

    信学技報___EA99-86_25-32 

    Presentation date: 1999.12

  • 音響系の変動に着目したステレオ信号の相関低減方法

    鈴木 邦和, 阪内 澄宇, 島内 末廣, 牧野 昭二

    音講論集___1-6-12_453-454 

    Presentation date: 1999.03

  • ハンズフリー音声会議装置における複数マイクロホンの構成の検討

    中川 朗, 島内 末廣, 牧野 昭二

    音講論集___2-6-7_493-494 

    Presentation date: 1999.03

  • 相互相関の変動付加処理に適したステレオエコーキャンセラの構成の検討

    島内 末廣, 羽田 陽一, 牧野 昭二, 金田 豊

    信学総大___A-4-12_121 

    Presentation date: 1998.03

  • Block fast projection algorithm with independent block sizes

    M. Tanaka, S. Makino, J. Kojima

    信学総大___TA-2-2_554-555 

    Presentation date: 1997.03

  • 射影アルゴリズムを用いたサブバンドステレオエコーキャンセラ

    牧野 昭二, 島内 末廣, 羽田 陽一, 中川 朗

    音講論集___2-7-18_549-550 

    Presentation date: 1996.09

  • サブバンドエコーキャンセラにおけるフィルタ更新ベクトルの平坦化の検討

    中川 朗, 羽田 陽一, 牧野 昭二

    信学ソ大___A-87_88 

    Presentation date: 1996.09

  • 拡声通信システムにおける周波数帯域別所要エコー抑圧量の検討

    阪内 澄宇, 牧野 昭二

    音講論集___2-7-17_547-548 

    Presentation date: 1996.09

  • 高速射影アルゴリズムの多チャンネル系への適用

    島内 末廣, 田中 雅史, 牧野 昭二

    信学総大___A-168_170 

    Presentation date: 1996.03

  • 'ES family'アルゴリズムと従来の適応アルゴリズムの関係について

    牧野 昭二

    信学技報___DSP95-148_65-70 

    Presentation date: 1996.01

  • 高速FIRフィルタリング算法を利用した射影法

    田中 雅史, 牧野 昭二, 金田 豊

    信学ソ大___A-79_81 

    Presentation date: 1995.09

  • サブバンドエコーキャンセラのプロトタイプフィルタの検討

    中川 朗, 羽田 陽一, 牧野 昭二

    信学ソ大___A-73_75 

    Presentation date: 1995.09

  • 擬似入出力関係を利用したステレオ音響エコーキャンセラ用アルゴリズムの検討

    島内 末廣, 牧野 昭二

    音講論集___2-6-5_543-544 

    Presentation date: 1995.09

  • 複素射影サブバンドエコーキャンセラに関する検討

    中川 朗, 羽田 陽一, 牧野 昭二

    音講論集___2-6-3_539-540 

    Presentation date: 1995.09

  • エコーキャンセラ用SSBサブバンド射影アルゴリズム

    牧野 昭二, 羽田 陽一, 中川 朗

    音講論集___2-6-4_541-542 

    Presentation date: 1995.09

  • 真の音響エコー経路を推定するステレオ射影エコーキャンセラの検討

    島内 末廣, 牧野 昭二

    信学総大___A-220_220 

    Presentation date: 1995.03

  • ES射影アルゴリズムを用いたデュオフィルタ構成のエコーキャンセラの検討

    羽田 陽一, 牧野 昭二, 小島 順治, 島内 末廣

    音講論集___3-3-10_595-596 

    Presentation date: 1995.03

  • 音響エコーキャンセラ用デュオフィルタコントロールシステム

    羽田 陽一, 牧野 昭二, 田中 雅史, 島内 末廣, 小島 順治

    信学総大___A-350_350 

    Presentation date: 1995.03

  • 高性能音響エコーキャンセラの開発

    小島 順治, 牧野 昭二, 羽田 陽一, 島内 末廣, 金田 豊

    信学総大___A-348_348 

    Presentation date: 1995.03

  • ES射影アルゴリズムの音響エコーキャンセラへの適用

    牧野 昭二, 羽田 陽一, 田中 雅史, 金田 豊, 小島 順治

    信学総大___A-349_349 

    Presentation date: 1995.03

  • エコーキャンセラの音声入力に対する収束速度改善方法の比較について

    牧野 昭二

    音講論集___2-6-16_653-654 

    Presentation date: 1994.10

  • ステレオ信号の相互相関の変化に着目したステレオ射影エコーキャンセラの検討

    島内 末廣, 牧野 昭二

    音講論集___2-6-17_655-656 

    Presentation date: 1994.10

  • PMTC/N-ISDN用多地点エコーキャンセラの構成

    須田 泰史, 藤野 雄一, 牧野 昭二, 小長井 俊介, 川田 真一

    信学全大___B-795_393 

    Presentation date: 1994.09

  • 室内音場伝達関数の共通極・零モデル化

    羽田 陽一, 牧野 昭二, 金田 豊

    信学技報___EA93-101_19-29 

    Presentation date: 1994.03

  • ES-RLSアルゴリズムと従来の適応アルゴリズムの関係について

    牧野 昭二

    音講論集___1-5-12_471-472 

    Presentation date: 1993.10

  • 共通極を用いたスピーカ特性の多点イコライゼーションについて

    羽田 陽一, 牧野 昭二

    音講論集___1-5-18_483-484 

    Presentation date: 1993.10

  • 高次の射影アルゴリズムの演算量削減について

    田中 雅史, 金田 豊, 牧野 昭二

    信学全大___A-101_1-103 

    Presentation date: 1993.09

  • 共通極を用いた多点イコライゼーションフィルタについて

    羽田 陽一, 牧野 昭二

    音講論集___3-9-17_491-492 

    Presentation date: 1993.03

  • 複数の室内音場伝達関数に共通な極の最小2乗推定について

    羽田 陽一, 牧野 昭二, 金田 豊

    信学全大___SA-11-4_1-489 - 1-490 

    Presentation date: 1993.03

  • 音響エコーキャンセラ用ES射影アルゴリズム

    牧野 昭二, 金田 豊

    信学技報___EA92-74_41-52 

    Presentation date: 1992.11

  • 室内インパルス応答の変動特性を反映させたES-RLSアルゴリズム

    牧野 昭二, 金田 豊

    音講論集___2-4-19_547-548 

    Presentation date: 1992.10

  • 音声入力に対する射影法の次数と収束特性について

    田中 雅史, 牧野 昭二, 金田 豊

    音講論集___1-4-14_489-490 

    Presentation date: 1992.10

  • エコーキャンセラ用ES射影アルゴリズムの収束条件について

    牧野 昭二, 金田 豊

    信学全大___SA-9-6_1-301 

    Presentation date: 1992.09

  • 室内インパルス応答の統計的性質に基づく指数重み付けNLMS適応フィルタ

    牧野 昭二, 金田 豊

    信学技報___EA92-48_9-20 

    Presentation date: 1992.08

  • エコーキャンセラ用ES射影アルゴリズム

    牧野 昭二, 金田 豊

    信学全大___SA-7-11_1-472 - 1-473 

    Presentation date: 1992.03

  • 音響エコーキャンセラにおけるダブルトーク制御方式の検討

    中原 宏之, 羽田 陽一, 牧野 昭二, 吉川 昭吉郎

    音講論集___3-5-7_503-504 

    Presentation date: 1992.03

  • 音の到来方向によらない頭部伝達関数の共通極とモデル化について

    羽田 陽一, 牧野 昭二, 金田 豊

    音講論集___1-8-5_483-484 

    Presentation date: 1991.10

  • エコーキャンセラ用ES (Exponential Step) アルゴリズムの収束条件について

    牧野 昭二, 金田 豊

    音講論集___1-7-25_419-420 

    Presentation date: 1991.03

  • 室内音場伝達関数の極の推定について

    羽田 陽一, 牧野 昭二, 金田 豊

    音講論集___1-7-12_393-394 

    Presentation date: 1991.03

  • 帯域分割形指数重み付けアルゴリズムを用いた音響エコーキャンセラ

    牧野 昭二, 羽田 陽一

    信学全大___SA-9-4_1-255 - 1-256 

    Presentation date: 1990.10

  • 低周波領域における室内音場伝達関数のARMAモデルについて

    羽田 陽一, 牧野 昭二, 小泉 宣夫

    音講論集___2-7-14_439-440 

    Presentation date: 1990.03

  • 指数重み付けによるエコーキャンセラ用適応アルゴリズム

    牧野 昭二

    音講論集___3-6-5_517-518 

    Presentation date: 1989.10

  • エコーキャンセラの室内音場における適応特性改善について

    牧野 昭二, 小泉 宣夫

    信学技報___EA89-3_15-21 

    Presentation date: 1989.04

  • 拡声通話形の音声会議システム

    及川 弘, 西野 正和, 牧野 昭二

    信学全大___B-548_2-243 

    Presentation date: 1988.03

  • エコーキャンセラの室内音場における適応特性の改善について

    牧野 昭二, 小泉 宣夫

    音講論集___1-5-13_355-356 

    Presentation date: 1988.03

  • 複数反響路を有する音響エコーキャンセラの構成法

    小泉 宣夫, 牧野 昭二, 及川 弘

    信学技報___EA87-75_1-6 

    Presentation date: 1988.01

  • 複数反響路を有する音響エコーキャンセラ

    小泉 宣夫, 牧野 昭二, 及川 弘

    信学部門全大___431_1-296 

    Presentation date: 1987.09

  • 音響エコーキャンセラの室内環境における消去特性について

    牧野 昭二, 小泉 宣夫

    信学技報___EA87-43_41-48 

    Presentation date: 1987.08

  • 直方体ブース内の障害物によるインパルス応答の変動について

    牧野 昭二, 小泉 宣夫

    音講論集___1-3-1_295-296 

    Presentation date: 1987.03

  • MTFによる音声会議でのマイクロホン配置の評価について

    小泉 宣夫, 牧野 昭二, 青木 茂明

    音講論集___2-7-18_631-632 

    Presentation date: 1986.10

  • 音響エコーキャンセラの室内環境における定常特性について

    牧野 昭二, 小泉 宣夫

    音講論集___2-7-19_383-384 

    Presentation date: 1985.10

  • 室内残響特性を考慮した音声スイッチ切替特性の検討

    牧野 昭二, 山森 和彦

    音講論集___1-2-19_265-266 

    Presentation date: 1984.10

  • マイクロプロセッサ制御を用いた拡声電話機の構成法

    山森 和彦, 松井 弘行, 牧野 昭二

    信学技報___EA84-41_15-21 

    Presentation date: 1984.09

  • 音声スイッチ回路損失制御波形の通話品質への影響

    石丸 薫, 小川 峰義, 牧野 昭二

    信学部門全大___795_3-190 

    Presentation date: 1984.09

  • 周辺に段差を持つ圧電バイモルフ振動板の振動特性について

    一ノ瀬 裕, 牧野 昭二

    音講論集___1-6-5_287-288 

    Presentation date: 1983.10

  • ハンドセット小形化に関する一検討

    牧野 昭二, 一ノ瀬 裕

    音講論集___1-6-10_297-298 

    Presentation date: 1983.10


Research Projects

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    Project Year :

    2020.04
    -
    2021.03
     

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    Project Year :

    2019
    -
    2021
     

    牧野 昭二

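    [Study item 1] Based on a physical model of sound propagation, we interpolated the observed signals to create virtual observations that do not physically exist, thereby artificially increasing the number of array elements, and we studied a unified array signal processing approach that yields high-quality output regardless of the number of sources. The amplitude of the virtual observations was estimated by nonlinear interpolation. We carried out a basic verification of the virtual observations through an underdetermined extension of speech enhancement that uses them. We also worked to clarify the operating principle of the virtual microphone and to improve its performance. In this period, the results were two international conference presentations and one domestic conference presentation.
    [Study item 2] We developed multichannel signal processing algorithms that exploit information from the acoustic environment. We generalized existing algorithms so that they can handle distributed microphone arrays, and introduced more powerful optimization criteria. We developed a synchronization method for sub-arrays in distributed microphone arrays. We extended blind source separation/extraction algorithms and multichannel dereverberation algorithms so that they can handle distributed microphone arrays. We also studied a microphone selection method that optimizes performance while minimizing the number of microphones required and reducing the computational cost. In this period, the results were four journal papers, seven international conference presentations, and nine domestic conference presentations.
    [Study item 3] We analyzed and interpreted the acoustic environment based on features extracted from the enhanced source signals. We exploited prior knowledge about the source signals and also used classification methods in the feature domain. To improve classification accuracy, we made use of state-of-the-art speech recognition techniques such as deep learning. In this period, the results were one international conference presentation and one domestic conference presentation.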

  • マイクロホンアレーを用いた音情景解析の研究

    筑波大学・ドイツ学術交流会(DAAD)パートナーシップ・プログラム 

    Project Year :

    2017.04
    -
    2018.03
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    Project Year :

    2014.04
    -
    2015.03
     

  • Innovation of multi-channel EEG signal processing technology for BMI development by fusion of information science and brain science

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2013.04
    -
    2014.03
     

    MAKINO Shoji, RUTKOWSKI Tomasz, MIYABE Shigeki, TERASAWA Hiroko, YAMADA Takeshi

    We advanced BMI development within the following two frameworks. (1) We proposed a method that selects the optimal latency and electrode using an F-value derived from the statistical characteristics of event-related potentials for spatial auditory stimuli. The proposed method improved the correct classification rate by 8%. (2) We verified that a combination of real and virtual sound sources produced by loudspeakers can evoke P300 responses. Large individual differences in P300 appearance were confirmed. To develop an alternative auditory BMI using virtual sound sources, we tried a headphone-based auditory BMI that uses head-related impulse responses from an open database. A clear P300 was observed in the occipital area. With a view to developing a multimodal BMI, we compared the P300 responses to spatial auditory stimuli, to visual stimuli, and to the combination of these modalities. The comparison revealed that the P300 amplitude for spatial auditory stimuli was smaller than for the other conditions.

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    Project Year :

    2020.04
    -
    2021.03
     

  • スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

    日本学術振興会  基盤研究(A)

    Project Year :

    2020.04
    -
    2021.03
     

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    Project Year :

    2019.04
    -
    2020.03
     

  • スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

    日本学術振興会  基盤研究(A)

    Project Year :

    2019.04
    -
    2020.03
     

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    Project Year :

    2019.04
    -
    2020.03
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    Project Year :

    2019.04
    -
    2020.03
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    Project Year :

    2019.04
    -
    2020.03
     

  • 大量音声データの事前学習に基づくブラインド音源分離手法の高度化

    NTT コミュニケーション科学基礎研究所  国内共同研究

    Project Year :

    2019.04
    -
    2020.02
     

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    Project Year :

    2018.09
    -
    2019.03
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    Project Year :

    2018.04
    -
    2019.03
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    Project Year :

    2018.04
    -
    2019.03
     

  • 聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    Project Year :

    2018.04
    -
    2019.02
     

  • DNNを用いた音声音響符号化の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    Project Year :

    2018.04
    -
    2019.02
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    Project Year :

    2017.04
    -
    2018.03
     

  • 音環境の認識と理解およびスマートホームセキュリティー、ロボット聴覚、等への応用

    NII  国内共同研究

    Project Year :

    2017.04
    -
    2018.03
     

  • 環境に適応するための音声強調系最適化

    NTT コミュニケーション科学基礎研究所  国内共同研究

    Project Year :

    2017.04
    -
    2018.03
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    Project Year :

    2017.04
    -
    2018.03
     

  • DNNを用いた音声音響符号化の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    Project Year :

    2017.04
    -
    2018.02
     

  • 聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    Project Year :

    2017.04
    -
    2018.02
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    Project Year :

    2017.04
    -
    2017.11
     

  • Improving speech recognition rates through microphone directivity

    FUJISOFT Incorporated  Domestic joint research

    Project Year :

    2016.04
    -
    2017.03
     

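  • Simulator construction and accuracy improvement of source separation for acoustic sensing with flexible robots

    Japan Science and Technology Agency (JST)  Impulsing Paradigm Change through Disruptive Technologies Program (ImPACT)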

    Project Year :

    2016.04
    -
    2017.03
     

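  • Deepening of array signal processing theory extended to asynchronous distributed channels and its real-world applications

    Japan Society for the Promotion of Science  Grant-in-Aid for Scientific Research (A)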

    Project Year :

    2016.04
    -
    2017.03
     

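  • Event detection and scene analysis that analyze acoustic information by statistical-mathematical learning theory using surveillance cameras equipped with microphone arrays

    NII  Domestic joint research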

    Project Year :

    2016.04
    -
    2017.03
     

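  • Co-creative acoustic sensing based on higher-order-statistics-controlled sparse signal representation and its application to social systems

    SECOM Science and Technology Foundation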

    Project Year :

    2016.04
    -
    2017.03
     

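  • Event detection and scene analysis fusing acoustic and video information by statistical-mathematical learning theory

    University of Tsukuba  Research Infrastructure Support Program (Type B)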

    Project Year :

    2016.04
    -
    2017.03
     

  • Study of acoustic scene analysis using microphone arrays

    University of Tsukuba and German Academic Exchange Service (DAAD) Partnership Program

    Project Year :

    2016.04
    -
    2017.03
     

  • Study of sound quality improvement of coded speech and audio by pre-filtering and post-filtering

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2016.04
    -
    2017.02
     

  • Study of simultaneous speech enhancement in the spectral and cepstral domains

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2016.04
    -
    2017.02
     

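  • Simulator construction and accuracy improvement of source separation for acoustic sensing with flexible robots

    Japan Science and Technology Agency (JST)  Impulsing Paradigm Change through Disruptive Technologies Program (ImPACT)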

    Project Year :

    2015.09
    -
    2016.03
     

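  • Study of microphone array signal processing enabling the use of asynchronous recording devices

    Japan Society for the Promotion of Science  Grant-in-Aid for Scientific Research (B)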

    Project Year :

    2015.04
    -
    2016.03
     

  • Traffic volume monitoring by acoustic sensing

    NII  Domestic joint research

    Project Year :

    2014.04
    -
    2015.03
     

  • Investigation of low-delay, low-bit-rate unified speech and audio coding

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2014.04
    -
    2015.03
     

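  • Study of microphone array signal processing enabling the use of asynchronous recording devices

    Japan Society for the Promotion of Science  Grant-in-Aid for Scientific Research (B)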

    Project Year :

    2014.04
    -
    2015.03
     

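  • Study of an autonomous custom-made augmented sound communication system based on higher-order statistics pursuit

    Japan Society for the Promotion of Science  Grant-in-Aid for Scientific Research (A)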

    Project Year :

    2014.04
    -
    2015.03
     

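  • Research and development of a brain-machine interface (BMI) using spatial sound information for ALS patients

    Ministry of Internal Affairs and Communications, Strategic Information and Communications R&D Promotion Programme (SCOPE)  Other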

    Project Year :

    2013.04
    -
    2014.03
     

  • Microphone Array Signal Processing with Asynchronous Recording Devices

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2013.04
    -
    2014.03
     

    ONO Nobutaka, MAKINO Shoji, MIYABE Shigeki, SHINODA Koichi

    Microphone array signal processing is an important technique for estimating the direction of arrival of a sound or enhancing a target sound in a noisy environment by processing multichannel signals. Because tiny time differences between channels carry important information, the conventional framework requires the multichannel signals to be recorded synchronously. In this study, we developed a technique to synchronize recorded signals or to estimate microphone positions without any a priori knowledge, so that asynchronous individual recording devices such as smartphones, laptop PCs, and IC recorders can be used.

  • Study on synchronization of asynchronously recorded signals from multiple recording devices

    Yamaha Corporation  Domestic joint research

    Project Year :

    2013.04
    -
    2014.03
     

  • Underdetermined array signal processing using virtual observations based on complex logarithmic interpolation

    NII  Domestic joint research

    Project Year :

    2013.04
    -
    2014.03
     

  • A study on custom-made augmented speech communication system based on higher-order statistics pursuit

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2013.04
    -
    2014.03
     

    Saruwatari Hiroshi, SHIKANO Kiyohiro, TODA Tomoki, KAWANAMI Hiromichi, ONO Nobutaka, MIYABE Shigeki, MAKINO Shoji, KOYAMA Shoichi

    In this study, we address an unsupervised custom-made augmented speech communication system based on higher-order statistics pursuit. The system consists of two parts: a binaural hearing aid using blind source separation and a speaking aid using speech conversion. The following results were obtained. (1) For the binaural hearing-aid system, we propose new algorithms for accurate and fast blind source separation and statistical speech conversion, yielding a high-quality speech enhancement system that utilizes a fixed point of auditory perception. (2) For the speaking-aid system, we propose a new speech conversion algorithm that is robust against mismatch between speech databases. An evaluation using a real-world sound database shows the efficacy of the proposed method.

  • Investigation of low-delay, low-bit-rate unified speech and audio coding

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2013.05
    -
    2014.02
     

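  • Research and development of a brain-machine interface (BMI) using spatial sound information for ALS patients

    Ministry of Internal Affairs and Communications, Strategic Information and Communications R&D Promotion Programme (SCOPE)  Other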

    Project Year :

    2012.09
    -
    2013.03
     

  • Innovation of multi-channel EEG signal processing technology for BMI development by fusion of information science and brain science

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2012.04
    -
    2013.03
     

    MAKINO Shoji, RUTKOWSKI Tomasz, MIYABE Shigeki, TERASAWA Hiroko, YAMADA Takeshi

    We advanced BMI development in the following two frameworks. (1) We proposed a method to select the optimal latency and electrode using the F-value, based on the statistical characteristics of event-related potentials for spatial auditory stimuli. The proposed method demonstrated an 8% improvement in the correct classification rate. (2) We tested a combination of real and virtual sound sources, presented through loudspeakers, for evoking P300 responses. A large individual difference in P300 appearance was confirmed. To develop an alternative auditory BMI using virtual sound sources, we tried a headphone-based auditory BMI using head impulse responses from an open database. A clear P300 was observed in the occipital area. With a view to developing a multimodal BMI, we compared the P300s evoked by spatial auditory stimuli, visual stimuli, and the combination of these modalities. The comparison revealed that the P300 amplitude for spatial auditory stimuli was smaller than for the others.

  • Array signal processing technology enabling the use of asynchronous recording devices

    NII  Domestic joint research

    Project Year :

    2012.04
    -
    2013.03
     

  • A study on custom-made augmented speech communication system based on higher-order statistics pursuit

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2012.04
    -
    2013.03
     

    Saruwatari Hiroshi, SHIKANO Kiyohiro, TODA Tomoki, KAWANAMI Hiromichi, ONO Nobutaka, MIYABE Shigeki, MAKINO Shoji, KOYAMA Shoichi

    In this study, we address an unsupervised custom-made augmented speech communication system based on higher-order statistics pursuit. The system consists of two parts: a binaural hearing aid using blind source separation and a speaking aid using speech conversion. The following results were obtained. (1) For the binaural hearing-aid system, we propose new algorithms for accurate and fast blind source separation and statistical speech conversion, yielding a high-quality speech enhancement system that utilizes a fixed point of auditory perception. (2) For the speaking-aid system, we propose a new speech conversion algorithm that is robust against mismatch between speech databases. An evaluation using a real-world sound database shows the efficacy of the proposed method.

  • Innovation of multi-channel EEG signal processing technology for BMI development by fusion of information science and brain science

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2011.04
    -
    2012.03
     

    MAKINO Shoji, RUTKOWSKI Tomasz, MIYABE Shigeki, TERASAWA Hiroko, YAMADA Takeshi

    We advanced BMI development in the following two frameworks. (1) We proposed a method to select the optimal latency and electrode using the F-value, based on the statistical characteristics of event-related potentials for spatial auditory stimuli. The proposed method demonstrated an 8% improvement in the correct classification rate. (2) We tested a combination of real and virtual sound sources, presented through loudspeakers, for evoking P300 responses. A large individual difference in P300 appearance was confirmed. To develop an alternative auditory BMI using virtual sound sources, we tried a headphone-based auditory BMI using head impulse responses from an open database. A clear P300 was observed in the occipital area. With a view to developing a multimodal BMI, we compared the P300s evoked by spatial auditory stimuli, visual stimuli, and the combination of these modalities. The comparison revealed that the P300 amplitude for spatial auditory stimuli was smaller than for the others.

  • Study of speech enhancement technology reflecting speech and auditory characteristics

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2011.04
    -
    2012.03
     

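  • Study of multichannel EEG signal processing for building BCIs by fusing brain science and information science

    The Telecommunications Advancement Foundation  Funded commissioned research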

    Project Year :

    2011.04
    -
    2012.03
     

  • Biological multimedia information research fusing brain science, life science, and information science

    Project Year :

    2011.04
    -
     
     

  • Study of speech enhancement technology reflecting speech and auditory characteristics

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2010.04
    -
    2011.03
     

  • Study of speech enhancement technology reflecting speech and auditory characteristics

    NTT Communication Science Laboratories  Domestic joint research

    Project Year :

    2009.04
    -
    2010.03
     

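  • Innovation in life science research through biosignal processing and acoustic signal processing

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research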

    Project Year :

    2010
     
     
     

    MAKINO Shoji, WU Y. J.

  • Creation of fundamental content technologies for speech and music media and of immersive audio communication

    Project Year :

    2009.04
    -
     
     

Misc

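  • Blind source separation of convolutive mixtures (Special issue: Independent component analysis and its applications)

    Makino Shoji, Araki Shoko, Mukai Ryo, Sawada Hiroshi

    Systems, Control and Information (Journal of the Institute of Systems, Control and Information Engineers)   48 ( 10 ) 401 - 408  2004.10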

    DOI CiNii

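  • Source separation technology capable of blind processing (Special feature: Speech and acoustic processing technologies for overcoming communication barriers)

    Makino Shoji, Araki Shoko, Mukai Ryo

    NTT Gijutsu Journal   15 ( 12 ) 8 - 12  2003.12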

    CiNii

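  • Problems of stereo echo cancellers and their solutions

    Makino Shoji, Shimauchi Suehiro

    Systems, Control and Information (Journal of the Institute of Systems, Control and Information Engineers)   46 ( 12 ) 724 - 732  2002.12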

    DOI CiNii

  • Untangling mixed voices: toward recognition of distant speech (Special feature papers 1: Human-friendly interactive computers)

    Makino Shoji, Mukai Ryo, Araki Shoko

    NTT R & D   50 ( 12 ) 937 - 944  2001.12

    CiNii

  • Subband signal processing: a trump card for real-time operation

    Makino Shoji

    The Journal of the Acoustical Society of Japan   56 ( 12 ) 845 - 851  2000.12

    DOI CiNii

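  • Subband ES algorithm reflecting the variation characteristics of acoustic echo paths in frequency bands

    Makino Shoji, Haneda Yoichi

    IEICE Transactions A (Japanese edition), Fundamentals and Boundaries   79 ( 6 ) 1138 - 1146  1996.06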

    CiNii

  • ES projection algorithm for acoustic echo cancellers (Special feature: Toward the realization of seamless acoustic space)

    Makino Shoji, Kaneda Yutaka

    NTT R & D   44 ( 1 ) 45 - 52  1995.01

    CiNii

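  • RLS adaptive algorithm reflecting the variation characteristics of acoustic echo paths

    Makino Shoji, Kaneda Yutaka

    The Journal of the Acoustical Society of Japan   50 ( 1 ) 32 - 39  1993.12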

    CiNii

  • Estimating correlation coefficients of two super-Gaussian complex signals without phase observation

    MIYABE Shigeki, ONO Nobutaka, MAKINO Shoji

    IEICE technical report. Signal processing   114 ( 474 ) 19 - 24  2015.03

    In this paper, we describe the estimation of a correlation coefficient between two complex signal sequences under the condition where the observation misses the phase. In our previous work, we formulated a probabilistic model which assumes that the complex amplitude sequences follow a bivariate complex normal distribution, and proposed a maximum likelihood estimation of the correlation coefficient by an EM algorithm which treats the phase difference as a hidden variable. However, complex signals are often super-Gaussian and cause a model mismatch with the Gaussian assumption, so the estimation accuracy depends on the signals. In this paper, we examine an estimation robust against the model mismatch by formulating a maximum likelihood estimation adaptive to the signal shapes, assuming that the two complex amplitude sequences follow a multivariate t distribution. Experimental results reveal that the complex t distribution model is not always better than the complex normal distribution model, depending on the signals, but by selecting the appropriate model, the maximum likelihood estimation can obtain a better result than a straightforward amplitude correlation estimator.

    CiNii

  • Blind Compensation of Sampling Frequency Mismatch for Unsynchronized Microphone Array

    Miyabe Shigeki, Ono Nobutaka, Makino Shoji

    Technical report of IEICE. EA   112 ( 347 ) 11 - 16  2012.12

    In this paper, we propose a method to estimate the mismatch of sampling frequencies between the observation channels of unsynchronized microphone arrays. Since the change in the time difference between channels can be regarded as constant over a short time interval, we compensate the phase in the frequency domain. Also, assuming that the sources do not move, we estimate the mismatch of sampling frequencies by maximum likelihood estimation. Experiments reveal that the proposed method recovers the performance of array signal processing.

    CiNii

  • Speech enhancement by asynchronous microphone array using the single source interval information

    Sakanashi Ryutaro, Ono Nobutaka, Miyabe Shigeki, Yamada Takeshi, Makino Shoji

    Technical report of IEICE. EA   112 ( 347 ) 17 - 22  2012.12

    An asynchronous microphone array, which uses multiple recording devices such as mobile phones and voice recorders, has the advantage of being free of the scalability constraints that conventional microphone arrays impose on audio signal processing, and it allows an inexpensive and flexible configuration. However, asynchronous microphone arrays pose several problems. For example, the recording start times and DOA information are unknown, as are the individual differences in sampling frequency between the devices. In particular, the offset in recording start time and the individual differences in sampling frequency between the devices can have a significant impact on the signal processing and must be compensated. In this paper, we assume that the purpose of speech enhancement is known in advance, such as recording a conference to create the minutes. We then propose synchronization and compensation using "single source interval information" embedded in the recording, that is, the time intervals in which only a single source is active.

    CiNii

  • Simulation of radial characteristic control with spherical speaker array

    HAYASHI Takaya, MIYABE Shigeki, YAMADA Takeshi, MAKINO Shoji

    Technical report of IEICE. EA   112 ( 76 ) 19 - 24  2012.06

    This paper describes control of distance attenuation using a spherical loudspeaker array. One research group proposed radial filtering with a spherical microphone array to control the sensitivity to the distance from a sound source by modeling the propagation of waves in the spherical harmonic domain. Since transfer functions do not change when the input and output are swapped, we can apply the radial filtering for microphone arrays to the filter design of distance attenuation control with loudspeaker arrays. Experimental results confirmed that the proposed method is effective at low frequencies.

    CiNii

  • Underdetermined DOA estimation by the non-linear MUSIC based on higher-order moment analysis

    SUGIMOTO Yuya, MIYABE Shigeki, YAMADA Takeshi, MAKINO Shoji

    Technical report of IEICE. EA   112 ( 76 ) 49 - 54  2012.06

    This paper describes a new approach to extend MUltiple SIgnal Classification (MUSIC) to underdetermined direction-of-arrival (DOA) estimation in high resolution by exploiting higher-order moments. The proposed method maps the observed signals onto higher-dimensional space nonlinearly, and analyzes the covariance matrix there. The covariance matrix in the higher-dimensional space corresponds to the higher-order cross moment matrix in the original space of the observed signals. Since the dimensionality of the noise subspace is increased by the mapping, the proposed method achieves higher resolution of DOA estimation than the standard MUSIC, and also achieves the ability to estimate DOAs in underdetermined conditions. We compared the property of the proposed method with the conventional 2q-MUSIC utilizing higher-order cumulants theoretically and experimentally.

    CiNii

  • D-14-9 Effective factors for grading short answer questions in Japanese speaking test

    Okubo Naoko, Yamahata Yuto, Yamada Takeshi, Imai Shingo, Ishizuka Kenkichi, Shinozaki Takahiro, Nishimura Ryuichi, Makino Shoji, Kitawaki Nobuhiko

    Proceedings of the IEICE General Conference   2012 ( 1 ) 193 - 193  2012.03

    CiNii

  • D-14-8 Effective factors for grading reading questions in Japanese speaking test

    Yamahata Yuto, Okubo Naoko, Yamada Takeshi, Imai Shingo, Ishizuka Kenkichi, Shinozaki Takahiro, Nishimura Ryuichi, Makino Shoji, Kitawaki Nobuhiko

    Proceedings of the IEICE General Conference   2012 ( 1 ) 192 - 192  2012.03

    CiNii

  • Speaker diarization for meetings by integrating speech presence probability estimation and time-frequency domain direction of arrival estimation

    ARAKI Shoko, FUJIMOTO Masakiyo, ISHIZUKA Kentaro, NAKATANI Tomohiro, SAWADA Hiroshi, MAKINO Shoji

    IEICE technical report   108 ( 143 ) 19 - 24  2008.07

    This paper presents a meeting diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. This paper proposes two methods for improving diarization performance. As the first proposal, we employ a DOA at each time-frequency slot (TFDOA) so that multiple DOAs can be estimated at a frame when multiple speakers speak simultaneously. The second proposal is to integrate VAD and DOA in a probabilistic way. This paper reports how such proposals improve diarization performance for real meetings/conversations.

    CiNii

  • Special section on acoustic scene analysis and reproduction - Foreword

    Shoji Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E91A ( 6 ) 1301 - 1302  2008.06

    Other  

  • Blind Audio Source Separation based on Independent Component Analysis

    MAKINO Shoji

    IEICE technical report   108 ( 70 ) 65 - 73  2008.05

    This paper describes a state-of-the-art method for the blind source separation (BSS) of convolutive mixtures of audio signals. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct separation filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

    CiNii

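  • Learning initial values for frequency-domain ICA from short-duration data

    Araki Shoko, Ito Nobutaka, Sawada Hiroshi, Ono Nobutaka, Makino Shoji, Sagayama Shigeki

    Proceedings of the IEICE General Conference   2008   208 - 208  2008.03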

    CiNii J-GLOBAL

  • AS-4-5 Sparseness based Underdetermined Blind Speech Separation

    Araki Shoko, Sawada Hiroshi, Makino Shoji

    Proceedings of the IEICE General Conference     S-46 - S-47  2008

    CiNii

  • A-10-7 Blind Signal Separation by Observation Vector Clustering

    Araki Shoko, Sawada Hiroshi, Mukai Ryo, Makino Shoji

    Proceedings of the Society Conference of IEICE   2005   208 - 208  2005.09

    CiNii

  • A-10-9 Blind Extraction of Dominant Target Sources from Many Background Interference Sources

    Sawada Hiroshi, Araki Shoko, Mukai Ryo, Makino Shoji

    Proceedings of the Society Conference of IEICE   2005   210 - 210  2005.09

    CiNii

  • A-10-8 Blind Source Separation of Many Speech Signals Using Small 3-D Microphone Array

    Mukai Ryo, Sawada Hiroshi, Araki Shoko, Makino Shoji

    Proceedings of the Society Conference of IEICE   2005   209 - 209  2005.09

    CiNii

  • Low-delay Real-time Blind Source Separation for Moving Speakers

    MUKAI Ryo, SAWADA Hiroshi, ARAKI Shoko, MAKINO Shoji

      2003 ( 1 ) 779 - 780  2003.03

    CiNii

  • Blind source separation based on independent component analysis

    Makino Shoji

    Digital Signal Processing Symposium   103 ( 129 ) 17 - 24  2003

    Article, review, commentary, editorial, etc. (scientific journal)  

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct unmixing filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

    CiNii

  • Blind source separation using SSB subband

    ARAKI S., AICHNER Robert, MAKINO S., NISHIKAWA T., SARUWATARI H.

      2002 ( 1 ) 619 - 620  2002.03

    CiNii

  • Blind source separation using pairs of microphones with different distances

    SAWADA Hiroshi, ARAKI Shoko, MUKAI Ryo, MAKINO Shoji

      2002 ( 1 ) 621 - 622  2002.03

    CiNii

  • Real time blind source separation in reverberant environment using frequency domain ICA and time-delayed spectral subtraction

    MUKAI Ryo, ARAKI Shoko, SAWADA Hiroshi, MAKINO Shoji

      2002 ( 1 ) 673 - 674  2002.03

    CiNii

  • Relationship between frequency domain blind source separation and frequency domain adaptive beamformers

    ARAKI S., MAKINO S., MUKAI R., SARUWATARI H.

      2001 ( 2 ) 613 - 614  2001.10

    CiNii

  • Suppression of residual cross-talk component using non-stationary spectral subtraction

    MUKAI Ryo, ARAKI Shoko, SAWADA Hiroshi, MAKINO Shoji

      2001 ( 2 ) 617 - 618  2001.10

    CiNii

  • Limitation of frequency domain Blind Source Separation for convolutive mixture of speech

      2001 ( 1 ) 567 - 568  2001.03

    CiNii

  • Blind source separation and removal of reverberation in the real environment

    MUKAI Ryo, ARAKI Shoko, MAKINO Shoji

      2001 ( 1 ) 565 - 566  2001.03

    CiNii

  • Optimization on the Number of Subbands in Blind Source Separation with Subband ICA

    NISHIKAWA T., ARAKI S., MAKINO S., SARUWATARI H.

      2001 ( 1 ) 569 - 570  2001.03

    CiNii

  • Optimization on the Number of Subbands in Frequency-Domain Blind Source Separation

    NISHIKAWA Tsuyoki, ARAKI Shoko, MAKINO Shoji, SARUWATARI Hiroshi

    Technical report of IEICE. EA   100 ( 580 ) 53 - 59  2001.01

    This paper describes an optimization strategy in terms of the number of subbands in frequency-domain blind source separation (BSS). In general, the separation performance of the conventional ICA-based BSS method degrades significantly under reverberant conditions. On the other hand, for the inverse filter used in dereverberation, it is known that higher performance can be achieved as the number of filter taps (or the number of subbands) increases. Accordingly, we first carry out BSS experiments in which the number of subbands in ICA is increased to improve the BSS performance. The results of the signal separation experiments reveal that the separation performance degrades when the number of subbands is exceedingly large, e.g., when 1024 or 2048 subbands are used. To show the cause of the degradation, we next define a simple objective measure to quantify independence, and investigate the relation between the number of subbands and the independence among narrowband sound sources. The results of the measurements clarify that the independence decreases as the number of subbands increases, and we conclude that an optimal number of subbands exists in BSS based on frequency-domain ICA.

    CiNii

  • A multi-channel acoustic echo canceller using channel number compressor and expander

    Nakagawa Akira, Shimauchi Suehiro, Haneda Yoichi, Aoki Shigeaki, Makino Shoji

    Proceedings of the IEICE General Conference   2000   140 - 140  2000.03

    CiNii

  • A study of decorrelation on a stereo echo canceller

    SUZUKI Kuniyasu, SUGIYAMA Kiyoshi, SAKAUCHI Sumitaka, SHIMAUCHI Suehiro, MAKINO Shoji

    Technical report of IEICE. EA   99 ( 518 ) 25 - 32  1999.12

    A stereo echo canceller is required for a stereo teleconferencing system. The main problems are that the adaptive filters often misconverge or, if they do not, that convergence is very slow because of the cross-correlation between the stereo signals. Several pre-processing methods that decorrelate the stereo signals to overcome this problem have been proposed, but these methods introduce distortion, resulting in low speech quality. In this paper, we focus on tiny movements of the far-end talker and propose a new method of decorrelating stereo signals without any confusion in sound image localization. We show that convergence can be further improved and speech quality maintained by optimizing using the characteristics of auditory perception.

    CiNii

  • Decorrelation of the stereo signals based on acoustic path variation. -2nd report. Optimizing using the characteristics of auditory perception.-

    SUZUKI Kuniyasu, SAKAUCHI Sumitaka, SHIMAUCHI Suehiro, MAKINO Shoji

      1999 ( 2 ) 495 - 496  1999.09

    CiNii

  • Decorrelation of the stereo signals based on acoustic path variation.

    SUZUKI Kuniyasu, SAKAUCHI Sumitaka, SHIMAUCHI Suehiro, MAKINO Shoji

      1999 ( 1 ) 453 - 454  1999.03

    CiNii

  • A study of microphone system for the hands-free tele-conferencing unit

    NAKAGAWA Akira, SHIMAUCHI Suehiro, MAKINO Shoji

      1999 ( 1 ) 493 - 494  1999.03

    CiNii

  • A Study on Configuration of Stereo Echo Canceller with Cross-Correlation Shaker

    Shimauchi Suehiro, Haneda Yoichi, Makino Shoji, Kaneda Yutaka

    Proceedings of the IEICE General Conference   1998   121 - 121  1998.03

    CiNii

  • Block Fast Projection Algorithm with Independent Block Sizes

    Tanaka Masashi, Makino Shoji, Kojima Junji

    Proceedings of the IEICE General Conference   1997   554 - 555  1997.03

    Block processing is an effective approach for reducing the computational complexity of adaptive filtering algorithms although it delays the adaptive filter output and degrades the convergence rate in some implementations. Recently, Benesty[1] proposed a solution to the problems. He introduced the idea of 'exact' block processing which produces the filter output exactly the same as that of the corresponding sample-by-sample algorithm and has short delay by facilitating the fast FIR filtering method. Block processing can be applied to two parts of the adaptive filtering algorithms, i.e. computing the filter output and updating the filter. Conventional 'exact' block algorithms have been using the identical block size for the two parts. This short paper presents the 'exact' block projection algorithm [2] having two independent block sizes, which is listed in List 1. We see, by showing the relation between the filter length and the output delay for a given computation power, that the independent block sizes extend the availability of the 'exact' block fast projection algorithm toward use with longer delay.

    CiNii

  • Whitening of the filter coefficient update-vector in the subband echo cancellers

    NAKAGAWA Akira, HANEDA Yoichi, MAKINO Shoji

    Proceedings of the Society Conference of IEICE   1996   88 - 88  1996.09

    CiNii

  • Consideration on frequency domain echo return loss required for audio teleconference systems

    SAKAUCHI Sumitaka, MAKINO Shoji

      1996 ( 2 ) 547 - 548  1996.09

    CiNii

  • Subband stereo echo canceller using projection algorithm with fast convergence to the true echo path.

    MAKINO Shoji, SHIMAUCHI Suehiro, HANEDA Yoichi, NAKAGAWA Akira

      1996 ( 2 ) 549 - 550  1996.09

    CiNii

  • Fast Projection Algorithm for Multi-Channel Systems

    Shimauchi Suehiro, Tanaka Masashi, Makino Shoji

    Proceedings of the IEICE General Conference   1996   170 - 170  1996.03

    CiNii

  • A study on prototype filter of subband echo canceller

    NAKAGAWA Akira, HANEDA Yoichi, MAKINO Shoji

    Proceedings of the Society Conference of IEICE   1995   75 - 75  1995.09

    CiNii

  • Study on the stereo echo cancellation algorithm using imaginary input-output relationships

    SHIMAUCHI Suehiro, MAKINO Shoji

      1995 ( 2 ) 543 - 544  1995.09

    CiNii

  • SSB subband projection algorithm for echo cancellers

    MAKINO Shoji, HANEDA Yoichi, NAKAGAWA Akira

      1995 ( 2 ) 541 - 542  1995.09

    CiNii

  • A study on the complex projection subband echo cancellers

    NAKAGAWA Akira, HANEDA Yoichi, MAKINO Shoji

      1995 ( 2 ) 539 - 540  1995.09

    CiNii

  • A study of stereo projection echo canceller with true echo path estimation

    Shimauchi Suehiro, Makino Shoji

    Proceedings of the IEICE General Conference   1995   220 - 220  1995.03

    CiNii

  • Study on the echo canceller using the ES Projection algorithm

    Makino Shoji, Haneda Yoichi, Tanaka Masashi, Kaneda Yutaka, Kojima Jyunji

    Proceedings of the IEICE General Conference   1995   349 - 349  1995.03

    CiNii

  • Study on the echo canceller based on the duo filter system using the ES Projection algorithm

    HANEDA Yoichi, MAKINO Shoji, KOJIMA Junji, SHIMAUCHI Suehiro

      1995 ( 1 ) 595 - 596  1995.03

    CiNii

  • Duo filter control system for acoustic echo cancellers

    Haneda Yoichi, Makino Shoji, Tanaka Masashi, Shimauchi Suehiro, Kojima Junji

    Proceedings of the IEICE General Conference     350 - 350  1995

    CiNii

  • Projection algorithm using fast FIR Filtering techniques

    Proceedings of the Society Conference of IEICE     81 - 81  1995

    CiNii

  • Study of adaptive signal processing for acoustic echo cancellers

    Makino Shoji

    Doctoral dissertation, Tohoku University   71 ( 12 ) 2212 - 2214  1993

    CiNii

  • Acoustic echo canceller using a subband exponentially weighted algorithm

    Makino Shoji

    IEICE National Convention, SA-9-4    1990

    CiNii

Industrial Property Rights

  • Device for blind source separation

    H. Sawada, S. Araki, R. Mukai, and S. Makino

    Patent

  • Device for blind source separation

    S. Araki, H. Sawada, S. Makino, and R. Mukai

    Patent

  • Apparatus, method and program for estimation of positional information on signal sources

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    Patent

  • Sound information processing device and program

    Makino Shoji, Yamaoka Kouei, Yamada Takeshi, Ono Nobutaka

    Patent

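  • Acoustic processing device, acoustic processing system, and acoustic processing method

    Makino Shoji, Ishimura Masaru, Mae Narumi, Yamada Takeshi, Ono Nobutaka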

    Patent

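  • Signal processing device, signal processing method, program, and recording medium (post-filtering method with variable cutoff frequency)

    Kamamoto Yutaka, Moriya Takehiro, Harada Noboru, Chiba Hironobu, Miyabe Shigeki, Yamada Takeshi, Makino Shoji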

    Patent

  • Speech signal processing device and method

    Ono Nobutaka, Miyabe Shigeki, Makino Shoji

    Patent

  • Signal processing device, signal processing method, and program (post-filtering method using a variable gain that depends on the pitch frequency)

    鎌本,優, 守谷,健弘, 原田,登, 千葉,大将, 宮部,滋樹, 山田,武志, 牧野,昭二

    Patent

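  • Direction information distribution estimation device, sound source number estimation device, sound source direction measurement device, sound source separation device, and methods and programs therefor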

    荒木, 章子, 中谷, 智広, 澤田, 宏, 牧野, 昭二

    Patent

  • Multiple-signal interval estimation device, multiple-signal interval estimation method, program therefor, and recording medium

    荒木, 章子, 石塚, 健太郎, 藤本, 雅清, 中谷, 智広, 牧野, 昭二

    Patent

  • Multiple-signal interval estimation device and method therefor, and program and recording medium therefor

    荒木, 章子, 石塚, 健太郎, 藤本, 雅清, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, program, and recording medium

    澤田, 宏, 荒木, 章子, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    澤田, 宏, 荒木, 章子, 牧野, 昭二

    Patent

  • Multi-signal enhancement device, method, program, and recording medium therefor

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    Patent

  • Blind signal extraction device, method therefor, program therefor, and recording medium on which the program is recorded

    荒木, 章子, 澤田, 宏, Jan, Cermak, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    Patent

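  • Signal separation device, signal separation method, signal separation program, and recording medium, and signal direction-of-arrival estimation device, signal direction-of-arrival estimation method, signal direction-of-arrival estimation program, and recording medium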

    澤田, 宏, 牧野, 昭二, 荒木, 章子, 向井, 良

    Patent

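  • Signal direction-of-arrival estimation device, signal direction-of-arrival estimation method, signal direction-of-arrival estimation program, and recording medium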

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    Patent

  • Signal direction-of-arrival estimation method, device, program, and recording medium on which the program is recorded

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    Patent

  • Signal extraction device, signal extraction method, signal extraction program, and recording medium

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    荒木, 章子, 牧野, 昭二, 澤田, 宏, 向井, 良

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    Patent

  • Method, device, and program for estimating the number of signal sources, and recording medium

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    Patent

  • Signal separation device, signal separation method, signal separation program, and recording medium

    荒木, 章子, 牧野, 昭二, 澤田, 宏, 向井, 良

    Patent

  • Signal separation method, signal separation device, signal separation program, and recording medium

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    Patent

  • Signal separation method and device, signal separation program, and recording medium on which the program is recorded

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    Patent

  • Blind signal separation device, blind signal separation method, and blind signal separation program

    荒木, 章子, 牧野, 昭二, 向井, 良, 澤田, 宏

    Patent

  • Blind signal separation device, blind signal separation method, and blind signal separation program

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    Patent

  • Blind signal separation method, blind signal separation program, and recording medium

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    Patent

  • Signal direction-of-arrival estimation method, device, program, and recording medium on which the program is recorded

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    Patent

  • SmoothQuiet

    牧野, 昭二, 小島, 順治

    Patent

  • QuiteSmooth

    牧野, 昭二, 小島, 順治

    Patent

  • EchoCam

    牧野, 昭二, 小島, 順治

    Patent

  • SUBBANDES

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    Patent

  • ESPARC

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    Patent

  • Radespa

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    Patent

  • DISCAS

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    Patent

  • ES projection algorithm

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    Patent

  • Duo filter

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    Patent

  • Intelligent loss controller

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    Patent

  • Fail-safe adaptive operation control scheme

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    Patent

  • Smooth Talk

    牧野, 昭二, 小島, 順治

    Patent


 


Teaching Experience

  • Introduction to Information Science II

    University of Tsukuba

 

Sub-affiliation

  • Faculty of Science and Engineering   School of Fundamental Science and Engineering

Research Institute

  • 2022
    -
    2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

Internal Special Research Projects

  • Research on innovative microphone array fundamental technologies for recognizing and understanding sound environments

    2023  

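    We proposed a computationally efficient joint optimization algorithm that performs online source separation, dereverberation, and noise reduction based on blind processing and spatially regularized processing. First, we proposed a blind online joint optimization algorithm for independent vector extraction (IVE) and weighted prediction error (WPE) dereverberation. Because WPE reduces reverberation, this online algorithm achieved accurate separation even with short analysis frames. Next, we extended the online joint optimization with robust spatial regularization. We showed that normalizing the scale of the separated signals is highly effective for making DOA-based spatial regularization work reliably. Experiments confirmed that the blind online joint optimization algorithm substantially improves separation accuracy with an algorithmic delay of 8 ms. They also confirmed that the proposed spatially regularized online joint optimization algorithm reduces source-ordering errors to 0%.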

  • Research on innovative microphone array fundamental technologies for recognizing and understanding sound environments

    2022  

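    Spatially regularized independent vector extraction (SRIVE) uses pre-estimated acoustic transfer functions to separate sources so that the outputs appear in a desired order. However, conventional SRIVE did not sufficiently account for how scale ambiguity and transfer-function errors affect the guidance of the output order. In this study, we attempted to solve this problem by introducing, in addition to spatial regularization, a regularization that reduces the scale of the separation filters. Experiments confirmed that scale regularization improves the rate of correct output ordering from 75% to 100% while maintaining separation performance (SDR).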

  • Research on innovative microphone array fundamental technologies for recognizing and understanding sound environments

    2021  

    This research explores whether the newly proposed online algorithm that jointly optimizes weighted prediction error (WPE) and independent vector analysis (IVA) works well in separating moving sound sources in reverberant indoor environments. The moving source is first fixed and then rotated 60 degrees in a room at a speed of less than 10 cm/s, while the other remains fixed. Through the comparison of the online-AuxIVA, online-WPE+IVA (separate), and online-WPE+IVA (joint) algorithms, we can conclude that the online-WPE+IVA (joint) method has the best separation performance when the sources are fixed, but online-WPE+IVA (separate) is more stable and has better performance when removing moving sources from the mixed sound.