Details of a Researcher - MAKINO, Shoji

MAKINO, Shoji

Scopus Paper Info

Paper Count: 281 Citation Count: 5620 h-index: 37

The data was downloaded from the Scopus API on October 06, 2025, via http://api.elsevier.com and http://www.scopus.com .

Google Scholar Information (Citations per year)

Citation Count: 12710 h-index: 55 i10-index: 186

Affiliation

Faculty of Science and Engineering, Graduate School of Information, Production, and Systems

Job title

Professor

Degree

Doctorate (Tohoku University)

Homepage URL

https://s.makino.w.waseda.jp/index.htm

Profile

Shoji Makino is a Professor at Waseda University and was previously a Professor at the University of Tsukuba. His research interests include adaptive filtering technologies and the realization of acoustic echo cancellation, blind source separation of convolutive mixtures of speech, and acoustic signal processing for speech and audio applications. He was the Chair of the TC on Blind Signal Processing of the IEEE CAS Society, General Chair of the IEEE WASPAA2007, Associate Editor of the IEEE Trans. SAP, Vice President of the Engineering Sciences Society of the IEICE, and the Chair of the TC on Engineering Acoustics of the IEICE.

Research Experience

2021.04

-

Now

Waseda University, Graduate School of Information, Production, and Systems. Professor
2009.04

-

2021.03

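University of Tsukuba, Tsukuba Advanced Research Alliance (TARA) Center and Graduate School of Systems and Information Engineering. Professor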
2014.04

-

2018.03

National Institute of Informatics Guest Professor
2013.04

-

2018.03

RIKEN. Visiting Researcher
2008.04

-

2009.03

NTT Communication Science Laboratories, Atsugi, Japan. Senior Research Scientist, Supervisor
2008.12

-

2009.02

University Erlangen-Nuremberg, Germany. Guest Professor
2004.04

-

2008.03

Hokkaido University, Sapporo, Japan. Guest Professor
2003.04

-

2008.03

NTT Communication Science Laboratories, Atsugi, Japan. Media Information Laboratory Executive Manager
2006.04

-

2007.03

The University of Tokyo The Graduate School of Information Science and Technology
2000.04

-

2003.03

NTT Communication Science Laboratories, Kyoto, Japan. Group Leader at the Speech Open Laboratory, Senior Research Scientist, Supervisor
1999.01

-

2000.03

NTT Lifestyle and Environmental Technology Laboratories, Atsugi, Japan. Group Leader at the Multimedia Electronics Laboratory, Senior Research Engineer, Supervisor
1996.07

-

1998.12

NTT Multi-Media System Laboratory Group, Yokosuka, Japan. Strategic Planning Senior Research Engineer, Supervisor
1987.08

-

1996.06

NTT Human Interface Laboratories, Musashino, Japan. Speech and Acoustics Laboratory Senior Research Engineer, Supervisor
1981.04

-

1987.07

NTT Electrical Communication Laboratory, Yokosuka, Japan. Research Engineer

Education Background

1993.03

-

　

Tohoku University
1979.04

-

1981.03

Tohoku University Graduate School of Engineering Mechanical Engineering
1975.04

-

1979.03

Tohoku University Faculty of Engineering

Committee Memberships

2019

-

Now

Japan Society for the Promotion of Science (JSPS) Member of the Grants-in-Aid for Scientific Research Sub-Committee
2019

-

Now

European Association for Signal Processing (EURASIP) Member of the Special Area Team on Acoustic, Speech and Music Signal Processing
2018

-

Now

Asia Pacific Signal and Information Processing Association Member of the Signal and Information Processing Theory and Methods Technical Committee
2014.05

-

Now

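Institute of Electronics, Information and Communication Engineers (IEICE) Advisor of the Technical Committee on Engineering Acoustics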
2013

-

Now

Acoustical Society of Japan Director
2007

-

Now

Institute of Electronics, Information and Communication Engineers (IEICE) Fellow
2005

-

Now

Acoustical Society of Japan Councilor
2004.04

-

Now

International Speech Communication Association (ISCA) Member
2004

-

Now

Institute of Electrical and Electronics Engineers (IEEE) Fellow
2003

-

Now

Acoustical Society of Japan Delegate
2003

-

Now

International ICA Steering Committee Member
2000.04

-

Now

European Association for Signal Processing (EURASIP) Member
1999

-

Now

International Workshop on Acoustic Echo and Noise Control International IWAENC Standing Committee Member
1989.04

-

Now

Institute of Electrical and Electronics Engineers (IEEE) Member
1988.04

-

Now

Institute of Electronics, Information and Communication Engineers (IEICE) Member
1983.04

-

Now

Acoustical Society of Japan Member
2018

-

2020

IEEE Signal Processing Society Member of the Board of Governors
2019

　

　

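Japan Society for the Promotion of Science (JSPS) Reviewer (written review opinions) for Grants-in-Aid for Scientific Research (S)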
2018

-

2019

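Japan Society for the Promotion of Science (JSPS) Document Reviewer and Document Evaluator of the International Program Committee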
2018

-

2019

Japan Society for the Promotion of Science (JSPS) Expert Member of the Research Fellowship Screening Committee
2018

-

2019

2018 International Workshop on Acoustic Signal Enhancement General Chair
2017

-

2018

IEEE Signal Processing Society Japan Chapter Chair
2015

-

2018

Institute of Electrical and Electronics Engineers (IEEE) Member of Jack S. Kilby Signal Processing Medal Committee
2013

-

2015

Japan Society for the Promotion of Science (JSPS) Expert Member of the Grants-in-Aid for Scientific Research Committee
2013

-

2015

IEEE Signal Processing Magazine Guest Editor
2014

　

　

Acoustical Society of Japan Chair of the Selection Committee for the Itakura Prize Innovative Young Researcher Award
2013

-

2014

IEEE Signal Processing Society Technical Directions Board Member
2013

-

2014

IEEE Signal Processing Society Chair of the Audio and Acoustic Signal Processing Technical Committee
2013.07

　

　

2013 International Conference of the IEEE Engineering in Medicine and Biology (EMBC2013) Tutorial Speaker
2012

-

2013

2012 IEEE International Conference on Acoustics, Speech, and Signal Processing Plenary Chair
2011

-

2012

2011 Annual Conference of the International Speech Communication Association Tutorial Speaker
2005

-

2012

European Association for Signal Processing Associate Editor of the EURASIP JASP
2009

-

2011

IEEE Japan Council Awards Committee Member
2008

-

2011

Institute of Electrical and Electronics Engineers (IEEE) James L. Flanagan Speech & Audio Processing Award Committee Member
2009

-

2010

Institute of Electronics, Information and Communication Engineers (IEICE) Member of the Fellow Nomination Committee
2009

-

2010

IEEE Signal Processing Society Distinguished Lecturer
2008

-

2009

2008 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays Panelist
2008

　

　

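Institute of Electronics, Information and Communication Engineers (IEICE) Member of the Paper Award Selection Committee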
2007

-

2008

2007 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics General Chair
2007

-

2008

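Institute of Electronics, Information and Communication Engineers (IEICE) President of the Acoustics and Ultrasonics Sub-Society of the Engineering Sciences Society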
2007

-

2008

2007 IEEE International Conference on Acoustics, Speech and Signal Processing Tutorial Speaker
2007

-

2008

2007 International Conference on Independent Component Analysis and Signal Separation Keynote Speaker
2006

-

2008

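Institute of Electronics, Information and Communication Engineers (IEICE) Chair of the Technical Committee on Engineering Acoustics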
2006

-

2008

IEEE Signal Processing Society Awards Board Member
2006

-

2007

Acoustical Society of Japan Member of the Selection Committee for the Awaya Prize Young Researcher Award
2005

-

2006

2005 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays Panelist
2002

-

2005

Institute of Electrical and Electronics Engineers (IEEE) Associate Editor of the IEEE Trans. Speech and Audio Processing
2001

-

2005

Acoustical Society of Japan Member of the Selection Committee for the Sato Prize Paper Award
2003

-

2004

2003 International Workshop on Acoustic Echo and Noise Control General Chair
2002

-

2004

IEEE Signal Processing Society Conference Board Member
2013

-

Now

European project Embedded Audition for Robots Advisory Board member
2006

-

Now

International Advisory Panel Member
2003

-

Now

Acoustical Society of Japan Council member
2020

-

2021

2020 European Signal Processing Conference Special Session Organizer
2020

-

2021

2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2020

-

2021

2020 European Signal Processing Conference Area Chair
2020

-

2021

2020 International Workshop on Acoustic Echo and Noise Control Member of the Organizing Committee
2019

-

2020

2019 European Signal Processing Conference Special Session Organizer
2019

-

2020

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2019

-

2020

2019 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) Member of the Technical Committee
2019

-

2020

IEEE Signal Processing Society Member of the TC Review Committee
2018

-

2020

IEEE Signal Processing Society Member of the Long-Range Planning and Implementation Committee
2018

-

2019

2018 European Signal Processing Conference Special Session Organizer
2018

-

2019

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2018

-

2019

2018 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2018

-

2019

2018 IEEE International Conference on Acoustics, Speech, and Signal Processing Area Chair
2017

-

2018

2017 European Signal Processing Conference Special Session Organizer
2017

-

2018

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2017

-

2018

2017 IEEE International Conference on Acoustics, Speech, and Signal Processing Area Chair
2016

-

2017

Special Session Organizer
2016

-

2017

2016 European Signal Processing Conference Member of the Technical Program Committee
2016

-

2017

2016 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2016

-

2017

Area Chair
2016

-

2017

Area Chair
2016

-

2017

IEEE Signal Processing Society Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
2012

-

2017

IEEE Signal Processing Society Chair of the Fellow Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
2015

-

2016

2015 European Signal Processing Conference Special Session Organizer
2015

-

2016

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2015

-

2016

2015 AEARU Workshop on Computer Science and Web Technology Member of the Program Committee
2015

-

2016

2015 IEEE International Conference on Acoustics, Speech, and Signal Processing Area Chair
2015

-

2016

IEEE Signal Processing Society Japan Chapter Vice Chair
2015

-

2016

2015 International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) Special Sessions Chair
2015

-

2016

2015 European Signal Processing Conference Area Chair
2010

-

2016

Asia Pacific Signal and Information Processing Association Member of the Speech, Language, and Audio Technical Committee
2015

　

　

IEEE Signal Processing Society Past Chair of the Audio and Acoustic Signal Processing Technical Committee
2015

　

　

2015 IEEE International Workshop on Applications of Signal Processing to Audio Member of the Technical Program Committee
2015

　

　

IEEE Signal Processing Society Vice Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
2014

-

2015

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2014

-

2015

2014 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2014

-

2015

2014 Hands-free Speech Communication and Microphone Arrays Member of the Technical Program Committee
2014

-

2015

Symposia at the 2014 IEEE Global Conference on Signal and Information Processing Member of the Organizing Committee
2014

-

2015

2014 IEEE International Conference on Acoustics, Speech, and Signal Processing Area Chair
2014

-

2015

2014 European Signal Processing Conference Area Chair
2013

-

2014

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference Special Session Organizer
2013

-

2014

2013 European Signal Processing Conference Special Session Organizer
2013

-

2014

2013 IEEE International Conference on Acoustics, Speech, and Signal Processing Area Chair
2013

-

2014

2013 European Signal Processing Conference Area Chair
2012

-

2013

Special Session Organizer
2012

-

2013

2012 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2011.04

-

2012.03

Acoustical Society of Japan Guest Editor-in-Chief of a special section of the Journal of the Acoustical Society of Japan
2011

-

2012

2011 Hands-free Speech Communication and Microphone Arrays Member of the Technical Program Committee
2011

-

2012

2011 European Signal Processing Conference Member of the Technical Program Committee
2011

-

2012

IEEE Signal Processing Society Vice Chair of the Audio and Acoustic Signal Processing Technical Committee
2011

-

2012

European Association for Signal Processing (EURASIP) Guest Editor of the EURASIP Journal on Applied Signal Processing
2010

-

2011

2010 Asia-Pacific Signal and Information Processing Conference Member of the Technical Committee
2010

-

2011

2010 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2010

-

2011

2010 IEEE International Symposium on Circuits and Systems Track Chair
2009

-

2010

2009 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics Member of the Organizing Committee
2009

-

2010

2009 IEEE International Symposium on Circuits and Systems Track Chair
2009

-

2010

2009 European Signal Processing Conference Area Chair
2009

-

2010

IEEE Circuits and Systems Society Chair of the Blind Signal Processing Technical Committee
2008

-

2010

Institute of Electrical and Electronics Engineers (IEEE) Guest Editor of the IEEE Trans. Circuits and Systems-I
1990

-

2010

IEEE Signal Processing Society Member of the Audio and Acoustic Signal Processing Technical Committee
2008

-

2009

2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays Special Session Organizer
2008

-

2009

2008 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2008

-

2009

2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays Technical Co-Chair
2008

-

2009

2008 Workshop on Statistical and Perceptual Audition Co-Organizer
2008

-

2009

2008 IEEE International Symposium on Circuits and Systems Member of the Review Committee
2007

-

2009

Institute of Electronics, Information and Communication Engineers (IEICE) Guest Editor-in-Chief of a special section of the IEICE Transactions
2007

-

2008

2007 IEEE International Symposium on Circuits and Systems Special Session Organizer
2007

-

2008

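Institute of Electronics, Information and Communication Engineers (IEICE) Vice President of the Engineering Sciences Society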
2007

-

2008

2007 IEEE International Symposium on Circuits and Systems Member of the Review Committee
2007

-

2008

Chair-Elect of the Blind Signal Processing Technical Committee
2006

-

2008

Institute of Electronics, Information and Communication Engineers (IEICE) Guest Editor-in-Chief of a special section of the IEICE Transactions
2006

-

2007

2006 Asilomar Conference on Signals, Systems, and Computers Special Session Organizer
2006

-

2007

2006 European Signal Processing Conference Special Session Organizer
2006

-

2007

2006 International Conference on Independent Component Analysis and Blind Signal Separation Special Session Organizer
2006

-

2007

2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan Special Session Organizer
2006

-

2007

2006 International Conference on Independent Component Analysis and Blind Signal Separation Member of the International Program Committee
2006

-

2007

2006 European Signal Processing Conference Member of the Technical Program Committee
2006

-

2007

2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan Member of the Organizing Committee
2006

-

2007

2006 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2006

-

2007

2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan Member of the Technical Committee
2006

-

2007

2006 IEEE International Symposium on Circuits and Systems Member of the Review Committee
2006

-

2007

Institute of Electrical and Electronics Engineers (IEEE) Guest Editor of the IEEE Trans. Computers
2006

-

2007

2006 International Conference on Independent Component Analysis and Blind Signal Separation Program Committee Chair
2005

-

2007

Institute of Electrical and Electronics Engineers (IEEE) Guest Editor of the IEEE Trans. ASLP
2005

-

2006

2005 IEEE International Symposium on Circuits and Systems Member of the Review Committee
2005

-

2006

2005 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics Member of the Organizing Committee
2005

-

2006

2005 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
2004

-

2006

IEEE Circuits and Systems Society Member of the Blind Signal Processing Technical Committee
2003.04

-

2005.03

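Institute of Electronics, Information and Communication Engineers (IEICE) Expert Member of the Technical Committee on Engineering Acoustics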
2004

-

2005

2004 International Congress on Acoustics Special Session Organizer
2004

-

2005

2004 IEEE International Conference on Acoustics, Speech and Signal Processing Special Session Organizer
2004

-

2005

2004 Workshop on Communication Scene Analysis Program Chair
2004

-

2005

2004 Workshop on Statistical and Perceptual Audio Processing Member of the Technical Committee
2004

-

2005

2004 International Congress on Acoustics Member of the Program Committee
2001

-

2005

Acoustical Society of Japan Secretary for Electroacoustics of the Editorial Committee of the Journal of the Acoustical Society of Japan
2003

-

2004

2003 IEEE International Workshop on Neural Networks for Signal Processing Member of the Program Committee
2003

-

2004

2003 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics Member of the Program Committee
2003

-

2004

2003 International Conference on Independent Component Analysis and Blind Signal Separation Organizing Chair
2001.04

-

2003.03

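Institute of Electronics, Information and Communication Engineers (IEICE) Vice Chair of the Technical Committee on Engineering Acoustics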
2002

-

2003

Institute of Electronics, Information and Communication Engineers (IEICE) Guest Editor of a special section of the IEICE Transactions
2002

-

2003

2002 China-Japan Joint Conference on Acoustics Member of the Organizing Committee
2002

-

2003

2002 IEEE International Workshop on Neural Networks for Signal Processing Member of the Program Committee
1999

-

2003

Institute of Electrical and Electronics Engineers (IEEE) Senior Member
2001

-

2002

2001 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
1992.04

-

2001.03

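Institute of Electronics, Information and Communication Engineers (IEICE) Expert Member of the Technical Committee on Engineering Acoustics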
1999

-

2000

1999 International Workshop on Acoustic Echo and Noise Control Member of the Technical Committee
1995

-

1997

Acoustical Society of Japan Member of the Research Meeting Preparation Committee
1990.04

-

1992.03

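Institute of Electronics, Information and Communication Engineers (IEICE) Secretary of the Technical Committee on Engineering Acoustics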
1990

-

1992

Institute of Electronics, Information and Communication Engineers (IEICE) Guest Editor of a special section of the IEICE Transactions

Professional Memberships

ASJ (Acoustical Society of Japan)
IEICE (Institute of Electronics, Information and Communication Engineers)
APSIPA (Asia Pacific Signal and Information Processing Association)
ISCA (International Speech Communication Association)
EURASIP (European Association for Signal Processing)
IEEE (Institute of Electrical and Electronics Engineers)

Research Areas

Perceptual information processing / Intelligent robotics / Intelligent informatics

Research Interests

Media Information Processing
Digital Signal Processing
Acoustic Signal Processing

Awards

Hoko Award
2018.10 Hattori Hokokai Foundation

Outstanding Contribution Award of the Institute of Electronics, Information, and Communication Engineers
2018.06 Institute of Electronics, Information, and Communication Engineers

Paper Award of the Acoustical Society of Japan
2018.03 Acoustical Society of Japan

Achievement Award of the Institute of Electronics, Information and Communication Engineers
2017.06 IEICE
Winner: Shoji Makino

Prizes for Science and Technology, Research Category
2015.04
Winner: Shoji Makino
The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology

TELECOM System Technology Award
2015.03 Telecommunications Advancement Foundation
Winner: Shoji Makino

IEEE Signal Processing Society Best Paper Award
2014.01 IEEE Signal Processing Society
Winner: Shoji Makino

Distinguished Lecturer
2009.01 IEEE
Winner: Shoji Makino

Fellow
2007.09 IEICE
Winner: Shoji Makino

MLSP Competition Award
2007.08 IEEE
Winner: Shoji Makino

Best Presentation Award at the SPIE Defense and Security Symposium
2006.04 SPIE
Winner: Shoji Makino

ICA Unsupervised Learning Pioneer Award
2006.04 SPIE
Winner: Shoji Makino

Paper Award
2005.05 IEICE
Winner: Shoji Makino

TELECOM System Technology Award
2004.03 Telecommunications Advancement Foundation
Winner: Shoji Makino

Fellow
2004.01 IEEE
Winner: Shoji Makino

Best Paper Award of the International Workshop on Acoustic Echo and Noise Control
2003.09
Winner: Shoji Makino

Paper Award
2002.05 IEICE
Winner: Shoji Makino

Paper Award
2002.03 ASJ
Winner: Shoji Makino

Achievement Award
1997.05 IEICE
Winner: Shoji Makino

Outstanding Technological Development Award
1995.05 ASJ
Winner: Shoji Makino

IEEE Signal Processing Society Notable Services and Contributions Award
2019 IEEE Signal Processing Society
Winner: Shoji Makino

IEEE Signal Processing Society Chapter Leadership Award
2018.12 IEEE Signal Processing Society
Winner: Shoji Makino

Best Faculty Member Award of the University of Tsukuba
2016.02
Winner: Shoji Makino

IEEE Signal Processing Society Outstanding Service Award
2014.12 IEEE Signal Processing Society
Winner: Shoji Makino

Papers

Time-Frequency-Bin-Wise Linear Combination of Beamformers for Distortionless Signal Enhancement.

Kouei Yamaoka, Nobutaka Ono, Shoji Makino

IEEE/ACM Transactions on Audio, Speech and Language Processing 29 3461 - 3475 2021

Citations (Scopus): 15

Multichannel Signal Enhancement Algorithms for Assisted Listening Devices

Simon Doclo, Walter Kellermann, Shoji Makino, Sven Nordholm

IEEE SIGNAL PROCESSING MAGAZINE 32 ( 2 ) 18 - 30 2015.03 [Refereed]

Citations (Scopus): 188

Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment

Hiroshi Sawada, Shoko Araki, Shoji Makino

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 19 ( 3 ) 516 - 527 2011.03 [Refereed]

This paper presents a blind source separation method for convolutive mixtures of speech/audio sources. The method can even be applied to an underdetermined case where there are fewer microphones than sources. The separation operation is performed in the frequency domain and consists of two stages. In the first stage, frequency-domain mixture samples are clustered into each source by an expectation-maximization (EM) algorithm. Since the clustering is performed in a frequency bin-wise manner, the permutation ambiguities of the bin-wise clustered samples should be aligned. This is solved in the second stage by using the probability on how likely each sample belongs to the assigned class. This two-stage structure makes it possible to attain a good separation even under reverberant conditions. Experimental results for separating four speech signals with three microphones under reverberant conditions show the superiority of the new method over existing methods. We also report separation results for a benchmark data set and live recordings of speech mixtures.

Citations (Scopus): 317

Frequency-Domain Pearson Distribution Approach for Independent Component Analysis (FD-Pearson-ICA) in Blind Source Separation

Hiroko Kato Solvang, Yuichi Nagahara, Shoko Araki, Hiroshi Sawada, Shoji Makino

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 17 ( 4 ) 639 - 649 2009.05 [Refereed]

In frequency-domain blind source separation (BSS) for speech with independent component analysis (ICA), a practical parametric Pearson distribution system is used to model the distribution of frequency-domain source signals. ICA adaptation rules have a score function determined by an approximated signal distribution. Approximation based on the data may produce better separation performance than we can obtain with ICA. Previously, a conventional hyperbolic tangent (tanh) or generalized Gaussian distribution (GGD) was uniformly applied to the score function for all frequency bins, even though a wideband speech signal has different distributions at different frequencies. To deal with this, we propose modeling the signal distribution at each frequency by adopting a parametric Pearson distribution and employing it to optimize the separation matrix in the ICA learning process. The score function is estimated by the appropriate Pearson distribution parameters for each frequency bin. We devised three methods for Pearson distribution parameter estimation and conducted separation experiments with real speech signals convolved with actual room impulse responses (T60 = 130 ms). Our experimental results show that the proposed frequency-domain Pearson-ICA (FD-Pearson-ICA) adapted well to the characteristics of frequency-domain source signals. By applying FD-Pearson-ICA, the signal-to-interference ratio (SIR) improved significantly, by around 2-3 dB, compared with conventional nonlinear functions. Even when the SIR values of FD-Pearson-ICA were poor, the performance based on a disparity measure between the true score function and the estimated parametric score function clearly showed the advantage of FD-Pearson-ICA. Furthermore, we optimized the proposed approach as regards separation performance. By combining individual distribution parameters directly estimated at low frequency with the appropriate parameters optimized at high frequency, it was possible to reasonably improve the FD-Pearson-ICA performance without any significant increase in the computational burden by comparison with conventional nonlinear functions.

Citations (Scopus): 16

Grouping separated frequency components by estimating propagation model parameters in frequency-domain blind source separation

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 15 ( 5 ) 1592 - 1604 2007.07 [Refereed]

This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.

Citations (Scopus): 109

Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures

Scott C. Douglas, Malay Gupta, Hiroshi Sawada, Shoji Makino

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 15 ( 5 ) 1511 - 1520 2007.07 [Refereed]

This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvärinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary constraints on the multichannel separation filter. The techniques converge quickly to a separation solution without any step size selection or divergence difficulties, and unlike other methods, ours do not require special coefficient initialization procedures to obtain good separation performance. They also allow for the efficient reconstruction of individual signals as observed in the sensor measurements directly from the system parameters for single-input multiple-output blind source separation tasks. An analysis of one of the adaptive constraint procedures shows its fast convergence to a paraunitary filter bank solution. Numerical evaluations of the proposed algorithms and comparisons with several existing convolutive blind source separation techniques indicate the excellent relative performance of the proposed methods.

Citations (Scopus): 70

Geometrically constrained independent component analysis

Mirko Knaak, Shoko Araki, Shoji Makino

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 15 ( 2 ) 715 - 726 2007.02 [Refereed]

Acoustical signals are often corrupted by other speech, other sources, and background noise. This makes it necessary to use some form of preprocessing so that signal processing systems such as a speech recognizer or machine diagnosis can be effectively employed. In this contribution, we introduce and evaluate a new algorithm that uses independent component analysis (ICA) with a geometrical constraint [constrained ICA (CICA)]. It is based on the fundamental similarity between an adaptive beamformer and blind source separation with ICA, and does not suffer from the permutation problem of ICA algorithms. Unlike conventional ICA algorithms, CICA needs prior knowledge about the rough direction of the target signal. However, it is more robust against an erroneous estimation of the target direction than adaptive beamformers: CICA converges to the right solution as long as its look direction is closer to the target signal than to the jammer signal. A high degree of robustness is very important since the geometrical prior of an adaptive beamformer is always roughly estimated in a reverberant environment, even when the look direction is precise. The effectiveness and robustness of the new algorithm are proven theoretically, and shown experimentally for three sources and three microphones with several sets of real-world data.

Citations (Scopus): 47

Blind extraction of dominant target sources using ICA and time-frequency masking

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 14 ( 6 ) 2165 - 2173 2006.11 [Refereed]

This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to sensors, to have dominant powers at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new sophisticated method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room, whose reverberation time was 130 ms, are presented to show the effectiveness and characteristics of the proposed method.

Citations (Scopus): 92

Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters

Scott C. Douglas, Hiroshi Sawada, Shoji Makino

IEEE Transactions on Speech and Audio Processing 13 ( 1 ) 92 - 104 2005.01 [Refereed]

Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as convolutive blind source separation and multichannel blind deconvolution. When developing practical implementations of such methods, however, it is not clear how best to window the signals and truncate the filter impulse responses within the filtered gradient updates. In this paper, we show how inadequate use of truncation of the filter impulse responses and signal windowing within a well-known natural gradient algorithm for multichannel blind deconvolution and source separation can introduce a bias into its steady-state solution. We then provide modifications of this algorithm that effectively mitigate these effects for estimating causal FIR solutions to single- and multichannel equalization and source separation tasks. The new multichannel blind deconvolution algorithm requires approximately 6.5 multiply/adds per adaptive filter coefficient, making its computational complexity about 63% greater than the originally-proposed version. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for real-world acoustic sources, even for extremely-short separation filters.

DOI

Scopus

79

Citation

(Scopus)
A robust and precise method for solving the permutation problem of frequency-domain blind source separation

H Sawada, R Mukai, S Araki, S Makino

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 12 ( 5 ) 530 - 538 2004.09 [Refereed]

　View Summary

Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin should be aligned so that a separated signal in the time-domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even for low frequencies where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.
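
The interfrequency-correlation idea in the summary above can be sketched as follows. This is a minimal illustration, not the paper's integrated method: it covers only envelope correlation (no DOA estimation or harmonics), it searches all permutations by brute force, and the names `correlation` and `align_bin` are hypothetical.

```python
from itertools import permutations


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant envelope carries no correlation information
    return cov / (var_a * var_b) ** 0.5


def align_bin(reference_envelopes, bin_envelopes):
    """Return the ordering of bin_envelopes that best matches the reference.

    Element i of the result is the index of the separated signal in this
    frequency bin that should be assigned to source i.
    """
    indices = range(len(bin_envelopes))
    return max(
        permutations(indices),
        key=lambda perm: sum(
            correlation(reference_envelopes[i], bin_envelopes[j])
            for i, j in enumerate(perm)
        ),
    )
```

Brute-force search is only practical for a small number of sources, since the number of permutations grows factorially.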

DOI

Scopus

448

Citation

(Scopus)
The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech

S Araki, R Mukai, S Makino, T Nishikawa, H Saruwatari

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 11 ( 2 ) 109 - 116 2003.03 [Refereed]

　View Summary

Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T > P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, they mainly remove the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.
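
One side of the frame-size trade-off described above, the amount of data left in each frequency bin, can be illustrated with a rough count. The sketch below is not taken from the paper: it assumes a plain short-time DFT with a fixed hop, and the function name `frames_per_bin` and its parameters are hypothetical.

```python
def frames_per_bin(num_samples, frame_length, hop):
    """Count the full frames, i.e. samples per frequency bin, for one signal."""
    if frame_length <= 0 or hop <= 0:
        raise ValueError("frame_length and hop must be positive")
    if frame_length > num_samples:
        return 0
    return 1 + (num_samples - frame_length) // hop


# Assumed example: 3 s of audio at 8 kHz with half-overlapping frames.
short_frames = frames_per_bin(24000, 512, 256)
long_frames = frames_per_bin(24000, 4096, 2048)
```

For a fixed amount of data, a longer frame covers more of the reverberation but leaves fewer samples in each bin for estimating statistics.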

DOI

Scopus

226

Citation

(Scopus)
Common-acoustical-pole and zero modeling of head-related transfer functions

Y Haneda, S Makino, Y Kaneda, N Kitawaki

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 7 ( 2 ) 188 - 196 1999.03 [Refereed]

　View Summary

Use of a common-acoustical-pole and zero model is proposed for modeling head-related transfer functions (HRTF's) for various directions of sound incidence. The HRTF's are expressed using the common acoustical poles, which do not depend on the source directions, and the zeros, which do. The common acoustical poles are estimated as they are common to HRTF's for various source directions; the estimated values of the poles agree well with the resonance frequencies of the ear canal. Because this model uses only the zeros to express the HRTF variations due to changes in source direction, it requires fewer parameters (the order of the zeros) that depend on the source direction than do the conventional all zero or pole/zero models. Furthermore, the proposed model can extract the zeros that are missed in the conventional models because of pole-zero cancellation. As a result, the directional dependence of the zeros can be traced well. Analysis of the zeros for HRTF's on the horizontal plane showed that the nonminimum-phase zero variation was well formulated using a simple pinna-reflection model. The common-acoustical-pole and zero (CAPZ) model is thus effective for modeling and analyzing HRTF's.
A block exact fast affine projection algorithm

M Tanaka, S Makino, J Kojima

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 7 ( 1 ) 79 - 86 1999.01 [Refereed]

　View Summary

This paper describes a block (affine) projection algorithm that has exactly the same convergence rate as the original sample-by-sample algorithm and smaller computational complexity than the fast affine projection algorithm. This is achieved by 1) introducing a correction term that compensates for the filter output difference between the sample-by-sample projection algorithm and the straightforward block projection algorithm, and 2) applying a fast finite impulse response (FIR) filtering technique to compute filter outputs and to update the filter.
We describe how to choose a pair of block lengths that gives the longest filter length under a constraint on the total computational complexity and processing delay. An example shows that the filter length can be doubled if a delay of a few hundred samples is permissible.
The past, present, and future of audio signal processing

T Chen, GW Elko, SJ Elliot, S Makino, JM Kates, M Bosi, JO Smith, M Kahrs

IEEE SIGNAL PROCESSING MAGAZINE 14 ( 5 ) 30 - 57 1997.09 [Refereed]
Common Acoustical Pole and Zero Modeling of Room Transfer Functions

Yoichi Haneda, Shoji Makino, Yutaka Kaneda

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 2 ( 2 ) 320 - 328 1994.04 [Refereed]

　View Summary

A new model for a room transfer function (RTF) by using common acoustical poles that correspond to resonance properties of a room is proposed. These poles are estimated as the common values of many RTF's corresponding to different source and receiver positions. Since there is one-to-one correspondence between poles and AR coefficients, these poles are calculated as common AR coefficients by two methods: i) using the least squares method, assuming all the given multiple RTF's have the same AR coefficients and ii) averaging each set of AR coefficients estimated from each RTF. The estimated poles agree well with the theoretical poles when estimated with the same order as the theoretical pole order. When estimated with a lower order than the theoretical pole order, the estimated poles correspond to the major resonance frequencies, which have high Q factors. Using the estimated common AR coefficients, the proposed method models the RTF's with different MA coefficients. This model is called the common-acoustical-pole and zero (CAPZ) model, and it requires far fewer variable parameters to represent RTF's than the conventional all-zero or pole/zero model. This model was used for an acoustic echo canceller at low frequencies, as one example. The acoustic echo canceller based on the proposed model requires half the variable parameters and converges 1.5 times faster than one based on the all-zero model, confirming the efficiency of the proposed model.
Exponentially Weighted Stepsize NLMS Adaptive Filter Based on the Statistics of a Room Impulse Response

Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 1 ( 1 ) 101 - 108 1993.01 [Refereed]

　View Summary

This paper proposes a new normalized least-mean-squares (NLMS) adaptive algorithm with double the convergence speed, at the same computational load, of the conventional NLMS for an acoustic echo canceller. This algorithm, called the ES (exponentially weighted stepsize) algorithm, uses a different stepsize (feedback constant) for each weight of an adaptive transversal filter. These stepsizes are time-invariant and weighted proportional to the expected variation of a room impulse response. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. A transition formula is derived for the mean-squared coefficient error of the proposed algorithm. The mean stepsize determines the convergence condition, the convergence speed, and the final excess mean-squared error. The algorithm is modified for a practical multiple DSP structure, so that it requires only the same amount of computation as the conventional NLMS. The algorithm is implemented in a commercial acoustic echo canceller and its fast convergence is demonstrated.
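
The per-tap stepsize idea in the summary above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published ES algorithm or its DSP implementation: the geometric decay constant, the normalization by input power, and the names `es_step_sizes` and `es_nlms` are all hypothetical choices.

```python
def es_step_sizes(num_taps, mean_step, decay):
    """Time-invariant per-tap step sizes that decay exponentially.

    The sizes are scaled so that their average equals mean_step.
    """
    raw = [decay ** i for i in range(num_taps)]
    scale = mean_step * num_taps / sum(raw)
    return [scale * r for r in raw]


def es_nlms(x, d, num_taps, step_sizes, eps=1e-8):
    """Adapt a transversal filter so its output tracks d; return the weights."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * bi for wi, bi in zip(w, buf))
        e_n = d_n - y_n
        norm = sum(b * b for b in buf) + eps
        # Early taps (large expected variation) move in larger steps than
        # late taps (small expected variation).
        w = [wi + mu * e_n * bi / norm
             for wi, mu, bi in zip(w, step_sizes, buf)]
    return w
```

Setting `decay` to 1.0 gives every tap the same stepsize, which reduces the sketch to a conventional NLMS update.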

DOI CiNii

Scopus

109

Citation

(Scopus)
Wavelength-Proportional Interpolation and Extrapolation of Virtual Microphone for Underdetermined Speech Enhancement

Ryoga Jinzai, Kouei Yamaoka, Shoji Makino, Nobutaka Ono, Mitsuo Matsumoto, Takeshi Yamada

APSIPA Transactions on Signal and Information Processing 12 ( 3 ) 2023

　View Summary

We previously proposed the virtual microphone technique to improve speech enhancement performance in underdetermined situations, in which the number of channels is virtually increased by estimating extra microphone signals at arbitrary positions along the straight line formed by real microphones. The effectiveness of the interpolation of virtual microphone signals for speech enhancement was experimentally confirmed. In this work, we apply the extrapolation of a virtual microphone as preprocessing of the maximum signal-to-noise ratio (SNR) beamformer and compare its speech enhancement performance (the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR)) with that of using the interpolation of a virtual microphone. Furthermore, we aim to improve speech enhancement performance by solving a trade-off relationship between performance at low and high frequencies, which can be controlled by adjusting the virtual microphone interval. We propose a new arrangement where a virtual microphone is placed at a distance from the reference real microphone proportional to the wavelength at each frequency. From the results of our experiment in an underdetermined situation, we confirmed speech enhancement performance using the extrapolation of a virtual microphone is higher than that of using the interpolation of a virtual microphone. Moreover, the proposed wavelength-proportional interpolation and extrapolation method improves speech enhancement performance compared with the interpolation and extrapolation. Furthermore, we present the directivity patterns of a spatial filter and confirmed the behavior that improves speech enhancement performance.
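
The wavelength-proportional arrangement described above can be written as a one-line rule. The sketch below is illustrative only: the proportionality constant `ratio`, the assumed speed of sound, and the function name `virtual_mic_distance` are hypothetical and are not values from the paper.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def virtual_mic_distance(frequency_hz, ratio, speed_of_sound=SPEED_OF_SOUND):
    """Distance from the reference microphone, proportional to the wavelength."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    wavelength = speed_of_sound / frequency_hz
    return ratio * wavelength
```

Under this rule the virtual microphone sits farther from the reference microphone at low frequencies and closer at high frequencies.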

DOI

Scopus
Low latency online blind source separation based on joint optimization with blind dereverberation

Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2021- 506 - 510 2021

　View Summary

This paper presents a new low-latency online blind source separation (BSS) algorithm. Although algorithmic delay of a frequency domain online BSS can be reduced simply by shortening the short-time Fourier transform (STFT) frame length, it degrades the source separation performance in the presence of reverberation. This paper proposes a method to solve this problem by integrating BSS with Weighted Prediction Error (WPE) based dereverberation. Although a simple cascade of online BSS after online WPE upgrades the separation performance, the overall optimality is not guaranteed. Instead, this paper extends a recently proposed batch processing algorithm that can jointly optimize dereverberation and separation so that it can perform online processing with low computational cost and little processing delay (< 12 ms). The results of a source separation experiment in a noisy car environment suggest that the proposed online method has better separation performance than the simple cascaded methods.

DOI

Scopus

17

Citation

(Scopus)
SepNet: A deep separation matrix prediction network for multichannel audio source separation

Shota Inoue, Hirokazu Kameoka, Li Li, Shoji Makino

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 2021- 191 - 195 2021

　View Summary

In this paper, we propose SepNet, a deep neural network (DNN) designed to predict separation matrices from multichannel observations. One well-known approach to blind source separation (BSS) involves independent component analysis (ICA). A recently developed method called independent low-rank matrix analysis (ILRMA) is one of its powerful variants. These methods allow the estimation of separation matrices based on deterministic iterative algorithms. Specifically, ILRMA is designed to update the separation matrix according to an update rule derived based on the majorization-minimization principle. Although ILRMA performs reasonably well under some conditions, there is still room for improvement in terms of both separation accuracy and computation time, especially for large-scale microphone arrays. The existence of a deterministic iterative algorithm that can find one of the stationary points of the BSS problem implies that a DNN can also play that role if designed and trained properly. Motivated by this, we propose introducing a DNN that learns to convert a predefined input (e.g., an identity matrix) into a true separation matrix in accordance with a multichannel observation. To enable it to find one of the multiple solutions corresponding to different permutations of the source indices, we further propose adopting a permutation invariant training strategy to train the network. By using a fully convolutional architecture, we can design the network so that the forward propagation can be computed efficiently. The experimental results revealed that SepNet was able to find separation matrices faster and with better separation accuracy than ILRMA for mixtures of two sources.

DOI

Scopus

3

Citation

(Scopus)
Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis.

Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

APSIPA ASC 578 - 584 2021
Speech emotion recognition based on attention weight correction using word-level confidence measure

Jennifer Santoso, Takeshi Yamada, Shoji Makino, Kenkichi Ishizuka, Takekatsu Hiramura

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 1 301 - 305 2021

　View Summary

Emotion recognition is essential for human behavior analysis and possible through various inputs such as speech and images. However, in practical situations, such as in call center analysis, the available information is limited to speech. This leads to the study of speech emotion recognition (SER). Considering the complexity of emotions, SER is a challenging task. Recently, automatic speech recognition (ASR) has played a role in obtaining text information from speech. The combination of speech and ASR results has improved the SER performance. However, ASR results are highly affected by speech recognition errors. Although there is a method to improve ASR performance on emotional speech, it requires the fine-tuning of ASR, which is costly. To mitigate the errors in SER using ASR systems, we propose the use of the combination of a self-attention mechanism and a word-level confidence measure (CM), which indicates the reliability of ASR results, to reduce the importance of words with a high chance of error. Experimental results confirmed that the combination of self-attention mechanism and CM reduced the effects of incorrectly recognized words in ASR results, providing a better focus on words that determine emotion recognition. Our proposed method outperformed the state-of-the-art methods on the IEMOCAP dataset.

DOI

Scopus

20

Citation

(Scopus)
Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

Proc. APSIPA ASC 2020 858 - 862 2020.12 [Refereed]
Applying virtual microphones to triangular microphone array in in-car communication

Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

Proc. APSIPA ASC 2020 421 - 425 2020.12 [Refereed]
Determined audio source separation with multichannel star generative adversarial network

Li Li, Hirokazu Kameoka, Shoji Makino

IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2020- 2020.09

　View Summary

This paper proposes a multichannel source separation approach, which uses a star generative adversarial network (StarGAN) to model power spectrograms of sources. Various studies have shown the significant contributions of a precise source model to the performance improvement in audio source separation, which indicates the importance of developing a better source model. In this paper, we explore the potential of StarGAN for modeling source spectrograms and investigate the effectiveness of the StarGAN source model in determined multichannel source separation by incorporating it into a frequency-domain independent component analysis (ICA) framework. The experimental results reveal that the proposed StarGAN-based method outperformed conventional methods that use non-negative matrix factorization (NMF) or a variational autoencoder (VAE) for source spectrogram modeling.

DOI

Scopus

9

Citation

(Scopus)
DNNマスク推定に基づく畳み込みビームフォーマによる音源分離・残響除去・雑音除去の同時実現

髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

音講論集 3-1-9 285 - 288 2020.03
基底共有型半教師あり独立低ランク行列分析に基づく多チャネル補聴器システム

宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

音講論集 1-1-22 217 - 220 2020.03
発話の時間変動に着目した音声認識誤り区間推定の検討

舒, 禹清, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会講演論文集 957 - 958 2020.03
空間特徴と音響特徴を併用する音響イベント検出の検討

陳, 軼夫, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会講演論文集 1027 - 1030 2020.03
車室内コミュニケーション用低遅延音源分離の検討

上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

日本音響学会春季研究発表会講演論文集 213 - 216 2020.03
空間フィルタの自動推定による音響シーン識別の検討

大野, 泰己, 山田, 武志, 牧野, 昭二

電子情報通信学会総合大会論文集(D) D-14-6 113 - 113 2020.03
Generative Adversarial Networks を用いた半教師あり学習の音響イベント検出への適用

合馬, 一弥, 山田, 武志, 牧野, 昭二

電子情報通信学会総合大会論文集(D) D-14-7 114 - 114 2020.03
Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

Proc. NCSP'20 175 - 178 2020.02 [Refereed]
Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

Proc. NCSP'20 645 - 648 2020.02 [Refereed]
Blind source separation with low-latency for in-car communication

Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

Proc. NCSP'20 167 - 170 2020.02 [Refereed]
多チャンネル変分自己符号化器法による任意話者の音源分離

李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

信学技報 EA2019-77 79 - 84 2019.12
Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura, Daichi, Saruwatari, Hiroshi, Makino, Shoji

Proc. APSIPA 1874 - 1879 2019.11 [Refereed]
Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

Proc. Annual Conference of the International Society for Music Information Retrieval 784 - 790 2019.11
Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

Proc. APSIPA ASC 2019 302 - 306 2019.11 [Refereed]
Supervised determined source separation with multichannel variational autoencoder

Kameoka, Hirokazu, Li, Li, Inoue, Shota, Makino, Shoji

Neural Computation 31 ( 9 ) 1891 - 1914 2019.09 [Refereed]
Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

Shota, Inoue, Hirokazu, Kameoka, Li, Li, Makino, Shoji

Proc. International Congress on Acoustics 6988 - 6995 2019.09 [Refereed]
Gated convolutional neural network-based voice activity detection under high-level noise environments

Li, Li, Kouei, Yamaoka, Yuki, Koshino, Mitsuo, Matsumoto, Makino, Shoji

Proc. International Congress on Acoustics 2862 - 2869 2019.09 [Refereed]
ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

音講論集 1-1-3 161 - 164 2019.09
Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

Proc. EUSIPCO 2019 2019.09 [Refereed]
BLSTMと変調スペクトルを用いた発話特徴識別の検討

サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会講演論文集 917 - 928 2019.09
BLSTMを用いた音声認識誤り区間推定の検討

舒, 禹清, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会講演論文集 921 - 924 2019.09
CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

Proc. EUSIPCO 2019 2019.09 [Refereed]
Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

Li, Li, Hirokazu, Kameoka, Makino, Shoji

Proc. ICASSP2019 546 - 550 2019.05
Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

Shota, Inoue, Hirokazu, Kameoka, Li, Li, Shogo, Seki, Makino, Shoji

Proc. ICASSP2019 96 - 100 2019.05 [Refereed]
Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

Proc. ICASSP 2019 7908 - 7912 2019.05 [Refereed]
Experimental evaluation of WaveRNN predictor for audio lossless coding

Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

Proc. NCSP'19 315 - 318 2019.03 [Refereed]
MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

電子情報通信学会技術研究報告（SP） SIP2018-130 149 - 154 2019.03
日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

臼井, 桃香, 山田, 武志, 牧野, 昭二

電子情報通信学会総合大会論文集（D） D-14-10 137 - 137 2019.03
Gated CNNを用いた劣悪な雑音環境下における音声区間検出

李莉, 越野ゆき, 松本光雄, 牧野, 昭二

電子情報通信学会技術研究報告 EA2018-124 19 - 24 2019.03
Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

Proc. NCSP'19 260 - 263 2019.03 [Refereed]
Categorizing error causes related to utterance characteristics in speech recognition

Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

Proc. NCSP'19 514 - 517 2019.03 [Refereed]
多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

音講論集 2-Q-32 399 - 402 2019.03
Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

Proc. NCSP'19 264 - 267 2019.03 [Refereed]
時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

日本音響学会2019年春季研究発表会講演論文集 1-6-5 181 - 184 2019.03
音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

李, 莉, 亀岡, 弘和, 牧野, 昭二

音講論集 1-6-10 201 - 204 2019.03
Microphone position realignment by extrapolation of virtual microphone

Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

Proc. APSIPA ASC 2018 367 - 372 2018.11 [Refereed]
Weakly labeled learning using BLSTM-CTC for sound event detection

Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

Proc. APSIPA ASC 2018 1918 - 1923 2018.11 [Refereed]
WaveRNNを利用した音声ロスレス符号化に関する検討と考察

天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会講演論文集 2-4-9 1149 - 1152 2018.09
Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

Makino, Shoji

Proc. IWAENC2018 71 - 75 2018.09 [Refereed]

DOI

Scopus

11

Citation

(Scopus)
ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会講演論文集 1-1-21 149 - 152 2018.09
時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

日本音響学会秋季研究発表会講演論文集 1-Q-12 407 - 410 2018.09
Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

Proc. EUSIPCO 2018 1596 - 1600 2018.09 [Refereed]
音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会講演論文集 1-R-5 961 - 964 2018.09
Acoustic Scene Classification Based on Spatial Feature Extraction Using Convolutional Neural Networks

Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

Journal of Signal Processing 22 ( 4 ) 199 - 202 2018.07 [Refereed]

　View Summary

Acoustic scene classification (ASC) classifies the place or situation where an acoustic sound was recorded. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge prepared a task involving ASC. Some methods using convolutional neural networks (CNNs) were proposed in the DCASE 2017 Challenge. The best method independently performed convolution operations for the left, right, mid (addition of left and right channels), and side (subtraction of left and right channels) input channels to capture spatial features. On the other hand, we propose a new method of spatial feature extraction using CNNs. In the proposed method, convolutions are performed for the time-space (channel) domain and frequency-space domain in addition to the time-frequency domain to capture spatial features. We evaluate the effectiveness of the proposed method using the task in the DCASE 2017 Challenge. The experimental results confirmed that convolution operations for the frequency-space domain are effective for capturing spatial features. Furthermore, by using a combination of the three domains, the classification accuracy was improved by 2.19% compared with that obtained using the tim

DOI
畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

高橋, 玄, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会講演論文集 67 - 70 2018.03
複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

日本音響学会春季研究発表会講演論文集 475 - 478 2018.03
Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

Mae, Narumi, Yamaoka, Kouei, Mitsui, Yoshiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

Proc. NCSP'18 371 - 374 2018.03 [Refereed]
複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

電子情報通信学会技術研究報告 335 - 340 2018.03
音声認識における誤認識原因通知のための印象評定値推定の検討

後藤, 孝宏, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会講演論文集 117 - 120 2018.03
音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会講演論文集 63 - 66 2018.03
Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

Proc. NCSP'18 192 - 195 2018.03 [Refereed]
Acoustic scene classification based on spatial feature extraction using convolutional neural networks

Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

Proc. NCSP'18 188 - 191 2018.03 [Refereed]
Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

Proc. NCSP'18 351 - 354 2018.03 [Refereed]
Sound source localization using binaural difference for hose-shaped rescue robot

Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

Proc. APSIPA 2017 1 - 7 2017.12 [Refereed]
Abnormal sound detection by two microphones using virtual microphone technique

Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

Proc. APSIPA 2017 1 - 5 2017.12 [Refereed]
Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic Features

Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

Proc. APSIPA 2017 1 - 5 2017.12 [Refereed]
Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

Proc. IEEE GCCE 2017 141 - 145 2017.10 [Refereed]
音響ロスレス符号化MPEG-4 ALSにおけるハイレゾ音源向け線形予測次数最適化に関する検討と考察

天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会講演論文集 251 - 254 2017.09
Far-noise suppression by transfer-function-gain non-negative matrix factorization in ad hoc microphone array

村瀬, 慶和, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN 73 ( 9 ) 563 - 570 2017.09 [Refereed]

　View Summary

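Conventional noise suppression methods based on array signal processing, such as beamforming, rely on directivity control that exploits phase information, and they are highly effective against noise arriving from a specific direction because a null of the directivity pattern can be steered toward it. However, suppressing so-called background noise, whose direction of arrival cannot be identified, has generally been difficult. In this paper, we propose a method that uses transfer-function-gain basis NMF to effectively suppress noise arriving from far away with multiple microphones. The proposed method assumes that background noise arrives from a distance and, by focusing only on amplitude information in the time-frequency domain, models far sources arriving from various directions as a single mixed source. This amplitude mixture model is then applied to the previously proposed constrained transfer-function-gain basis NMF to suppress the far sources. Furthermore, semi-supervised transfer-function-gain basis NMF is applied to suppress the far sources. Because the method uses only amplitude information, asynchronous recording devices can be used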

DOI CiNii
Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

Li, Li, Kameoka, Hirokazu, Makino, Shoji

Proc. MLSP 1 - 6 2017.09 [Refereed]
Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

Proc. EUSIPCO 2017 2388 - 2392 2017.08 [Refereed]
Multiple far noise suppression in a real environment using transfer-function-gain NMF

Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

Proc. EUSIPCO 2017 2378 - 2382 2017.08 [Refereed]
Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

Kodama, Takumi, Makino, Shoji

Proc. IEEE Engineering in Medicine & Biology Society (EMBC) 3814 - 3817 2017.07 [Refereed]
教師信号を用いた非同期分散型マイクロホンアレーによる音源分離

坂梨, 龍太郎, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

日本音響学会誌 73 ( 6 ) 337 - 348 2017.06 [Refereed]

DOI CiNii
Development of High Quality Blind Source Separation Based on Independent Low-Rank Matrix Analysis and Statistical Speech Enhancement for Flexible Hose-Shaped Robot

三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 1P2-P04 1 - 4 2017.05

　View Summary

In this paper, we propose a novel blind source separation method for the hose-shaped rescue robot based on independent low-rank matrix analysis and statistical speech enhancement. The rescue robot aims to detect victims' speech in a disaster area and carries multiple microphones around its body. Unlike a common microphone array, the positions of the microphones are unknown, so a conventional beamformer cannot be used. In addition, vibration noise (ego-noise) is generated when the robot moves, seriously contaminating the observed signals. Therefore, it is important to eliminate the ego-noise in this system. This paper describes our newly developed software and hardware system of blind source separation for robot noise reduction. We also report objective and subjective evaluation results, obtained in experiments with actual sounds observed by the rescue robot, showing that the proposed system outperforms the conventional methods in source separation accuracy and perceptual sound quality.

DOI CiNii
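A study of acoustic scene classification using DNN-GMM and concatenated features

高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

Acoustical Society of Japan 2017 Spring Meeting 2-P-1 135 - 138 2017.03
Basis training algorithm for discriminative NMF based on the auxiliary function method

李莉, 亀岡弘和, 牧野昭二

Acoustical Society of Japan 2017 Spring Meeting 1-P-4 519 - 522 2017.03
Development of a blind source separation system for a flexible hose-shaped robot using independent low-rank matrix analysis and statistical speech enhancement

三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

Acoustical Society of Japan 2017 Spring Meeting 1-P-3 517 - 518 2017.03
Improving the accuracy of ability estimation based on item response theory in SJ-CAT

小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

Acoustical Society of Japan 2017 Spring Meeting 2-6-3 247 - 250 2017.03
Study and discussion of adapting MPEG-4 ALS lossless audio coding to high-resolution audio sources

天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

Acoustical Society of Japan 2017 Spring Meeting 2-P-42 381 - 382 2017.03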
Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

Kodama, T, Makino, Shoji

Proc. Biomedical and Health Informatics (BHI) 1 - 1 2017.02 [Refereed]
Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno

Journal of Robotics and Mechatronics 29 ( 1 ) 198 - 212 2017.02

This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphone-equipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent low-rank matrix analysis. The two enhancement methods are integrated with Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

DOI

Scopus

9

Citation

(Scopus)
DISCRIMINATIVE NON-NEGATIVE MATRIX FACTORIZATION WITH MAJORIZATION-MINIMIZATION

Li Li, Hirokazu Kameoka, Shoji Makino

2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017) 141 - 145 2017 [Refereed]

Non-negative matrix factorization (NMF) is a powerful approach to single-channel audio source separation. In a supervised setting, NMF is first applied to train the basis spectra of each sound source. At test time, NMF is applied to the spectrogram of a mixture signal using the pretrained spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra of each source is to minimize the objective function of NMF. However, the basis spectra obtained in this way do not ensure that the separated signal will be optimal at test time due to the inconsistency between the objective functions for training and separation (Wiener filtering). To address this, a framework called discriminative NMF (DNMF) has recently been proposed. In that work, a multiplicative update algorithm was proposed for the basis training; however, one drawback is that convergence is not guaranteed. To overcome this drawback, this paper proposes using a majorization-minimization principle to develop a convergence-guaranteed algorithm for DNMF. Experimental results showed that the proposed algorithm outperformed standard NMF and DNMF using a multiplicative update algorithm as regards both the signal-to-distortion and signal-to-interference ratios.
Blind source separation and multi-talker speech recognition with ad hoc microphone array using smartphones and cloud storage

越智景子, 小野順貴, 宮部滋樹, 牧野昭二

Acoustical Science and Technology 2017 [Refereed]
Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017- 1998 - 2002 2017 [Refereed]

Spectral domain speech enhancement algorithms based on non-negative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy; however, they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.

DOI

Scopus

1

Citation

(Scopus)
Full-body tactile P300-based brain-computer interface accuracy refinement

Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART) 1 - 4 2016.12 [Refereed]
Evaluation of far noise suppression using transfer-function-gain NMF in a real environment

松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

31st Signal Processing Symposium B3-1 231 - 235 2016.11
Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

Saruwatari, H, Takata, K, Ono, N, Makino, Shoji

International Congress on Acoustics 1 - 10 2016.09 [Refereed]
Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

Gen,Takahashi, Takeshi,Yamada, Shoji,Makino, Nobutaka,Ono

DCASE2016 Challenge 1 - 2 2016.09
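Implementation and evaluation of a function for presenting the required speaking volume in noisy speech recognition

後藤,孝宏, 山田,武志, 牧野,昭二

Acoustical Society of Japan Autumn Meeting 3-Q-12 117 - 120 2016.09
Performance variation under reverberation of a maximum SNR beamformer based on virtual increase of channels

山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

Acoustical Society of Japan Autumn Meeting 3-7-5 379 - 382 2016.09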
Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

Kodama, T, Makino, Shoji, Rutkowski, T

Proc. AEARU Young Researchers International Conference 1 - 4 2016.09 [Refereed]
Verification of ability estimation based on item response theory in the Japanese speaking test SJ-CAT

小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

Acoustical Society of Japan Autumn Meeting 3-Q-26 253 - 256 2016.09
Amplitude-based speech enhancement with non-negative matrix factorization in time-channel domain for ad-hoc microphone array

H., Chiba, N., Ono, S., Miyabe, Y., Takahashi, T., Yamada, S., Makino

J. Acoust. Soc. Jpn 72 ( 8 ) 462 - 470 2016.08 [Refereed]

CiNii
Amplitude-based speech enhancement with non-negative matrix factorization in time-channel domain for ad-hoc microphone array

千葉大将, 小野順貴, 宮部滋樹, 高橋祐, 山田武志, 牧野昭二

J. Acoust. Soc. Jpn 72 ( 8 ) 462 - 470 2016.08 [Refereed]

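This paper describes a target sound enhancement method for asynchronous distributed recordings based on non-negative matrix factorization (NMF) in the time-channel domain. For multichannel signals recorded by multiple devices, array signal processing that relies on phase information is unsuitable because slight sampling frequency mismatches between devices cause the phase differences to drift. Noting that amplitude analysis is far less affected by this drift than phase analysis, we use a time-frequency mask based on the analysis of inter-channel gain differences by NMF in the time-channel domain (transfer-function-gain NMF), as proposed by Togami et al. We also discuss supervised NMF, in which the bases are trained in advance, for speech enhancement under conditions where the number of channels is not sufficiently larger than the number of bases.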

DOI
Simultaneous enhancement of speech in the spectral and cepstral domains

李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

IEICE Technical Report SP2016-32 29 - 32 2016.08
An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping

Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Biing-Hwang Juang

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E99A ( 6 ) 1152 - 1162 2016.06 [Refereed]

MUltiple Signal Classification (MUSIC) is a standard technique for direction of arrival (DOA) estimation with high resolution. However, MUSIC cannot estimate DOAs accurately in the case of underdetermined conditions, where the number of sources exceeds the number of microphones. To overcome this drawback, an extension of MUSIC using cumulants called 2q-MUSIC has been proposed, but this method greatly suffers from the variance of the statistics, given as the temporal mean of the observation process, and requires long observation. In this paper, we propose a new approach for extending MUSIC that exploits higher-order moments of the signal for the underdetermined DOA estimation with smaller variance. We propose an estimation algorithm that nonlinearly maps the observed signal onto a space with expanded dimensionality and conducts MUSIC-based correlation analysis in the expanded space. Since the dimensionality of the noise subspace is increased by the mapping, the proposed method enables the estimation of DOAs in the case of underdetermined conditions. Furthermore, we describe the class of mapping that allows us to analyze the higher-order moments of the observed signal in the original space. We compare 2q-MUSIC and the proposed method through an experiment assuming that the true number of sources is known as prior information to evaluate in terms of the bias-variance tradeoff of the statistics and computational complexity. The results clarify that the proposed method has advantages for both computational complexity and estimation accuracy in short-time analysis, i.e., the time duration of the analyzed data is short.

DOI

Scopus

3

Citation

(Scopus)
Applying independent vector analysis and noise cancellation to noise reduction for a hose-shaped rescue robot

石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 1P1-08b3 1 - 4 2016.06

This paper presents a noise reduction method for a hose-shaped rescue robot. The hose-shaped rescue robot is one of the rescue robots developed in the Tough Robotics Challenge, and it is used to search for victims by capturing their voices with its microphone array. However, the ego noise caused by its vibration motors makes it difficult to capture the human voice. We propose a noise reduction method using a blind source separation technique based on Independent Vector Analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego-noise signal from observed multi-channel signals using the IVA-based blind source separation technique, and (2) applying the noise cancellation to the estimated speech signal using the estimated ego-noise signal as a noise reference.

DOI
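Noise reduction for a flexible hose-shaped robot using multichannel NMF with a rank-1 spatial model constraint

高草木萌, 北村大地, 小野順貴, 山田武志, Makino, Shoji, 猿渡洋

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 1A2-10a3 1 - 4 2016.06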

A hose-shaped rescue robot is one of the robots developed for disaster response in large-scale disasters such as a great earthquake. The robot is suitable for entering narrow and dark places covered with rubble at a disaster site and for finding victims inside them. This robot can transmit the ambient sound to its operator by using its built-in microphones. However, there is a serious problem: the inherent noise of this robot, such as vibration or fricative sound, is mixed into the transmitted sound, making it hard for the operator to hear a victim's call for help. In this paper, we apply the multichannel NMF (nonnegative matrix factorization) with the rank-1 spatial constraint (Rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.

DOI
A-5-2 Noise reduction for a hose-shaped rescue robot using independent vector analysis and noise cancellation

石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

IEICE General Conference 2016 58 - 58 2016.03

CiNii
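Source separation for a flexible hose-shaped robot using supervised multichannel NMF and statistical speech enhancement

高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

Acoustical Society of Japan 2015 Spring Meeting ( 3-3-2 ) 609 - 612 2016.03
Simultaneous multi-talker speech recognition using blind source separation with asynchronous distributed microphones

越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

Acoustical Society of Japan 2016 Spring Meeting ( 3-3-1 ) 607 - 608 2016.03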
Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

Takuya,Toyoda, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

Proc. NCSP'16 622 - 625 2016.03 [Refereed]
A-5-1 Noise reduction using rank-1 multichannel NMF for a hose-shaped rescue robot

高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference 2016 57 - 57 2016.03

CiNii
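Optimization of the subtraction coefficient of a spatial subtraction array based on correlation estimation from amplitudes only and noise kurtosis

李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

Acoustical Society of Japan 2016 Spring Meeting 689 - 692 2016.03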
Performance estimation of noisy speech recognition using spectral distortion and recognition task complexity

Ling Guo, Takeshi Yamada, Shigeki Miyabe, Shoji Makino, Nobuhiko Kitawaki

Acoustical Science and Technology 37 ( 6 ) 286 - 294 2016 [Refereed]

Previously, methods for estimating the performance of noisy speech recognition based on a spectral distortion measure have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, no consideration is given to any change in the components of a speech recognition system. To solve this problem, we propose a novel method for estimating the performance of noisy speech recognition, a major feature of which is the ability to accommodate the use of different noise reduction algorithms and recognition tasks by using two cepstral distances (CDs) and the square mean root perplexity (SMR-perplexity). First, we verified the effectiveness of the proposed distortion measure, i.e., the two CDs. The experimental results showed that the use of the proposed distortion measure achieves estimation accuracy equivalent to the use of the conventional distortion measures, the perceptual evaluation of speech quality (PESQ) and the signal-to-noise ratio (SNR) of noise-reduced speech, and has the advantage of being applicable to noise reduction algorithms that directly output the mel-frequency cepstral coefficient (MFCC) feature. We then evaluated the proposed method by performing a closed test and an open test (10-fold cross-validation test). The results confirmed that the proposed method gives better estimates without being dependent on the differences among the noise reduction algorithms or the recognition tasks.

DOI

Scopus
Performance Estimation of Spontaneous Speech Recognition Using Non-Reference Acoustic Features

Ling Guo, Takeshi Yamada, Shoji Makino

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1 - 4 2016 [Refereed]

To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. Such a method makes it possible to monitor recognition performance while providing speech recognition services. It can also be used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, the methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. The main feature is to use non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features by openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.
NOISE REDUCTION USING INDEPENDENT VECTOR ANALYSIS AND NOISE CANCELLATION FOR A HOSE-SHAPED RESCUE ROBOT

Masaru Ishimura, Shoji Makino, Takeshi Yamada, Nobutaka Ono, Hiroshi Saruwatari

2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) 1 - 5 2016 [Refereed]

In this paper, we present noise reduction for a hose-shaped rescue robot. The robot is used for searching for disaster victims by capturing their voice with its microphone array. However, the ego noise generated by its vibration motors makes it difficult to distinguish human voices. To solve this problem, we propose a noise reduction method using a blind source separation technique based on independent vector analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego noise signal from observed multichannel signals using the IVA-based blind source separation technique, and (2) applying noise cancellation to the estimated speech signal using the estimated ego noise signal as a noise reference. The experimental evaluations show that this approach is effective for suppressing the ego noise.
Visual Motion Onset Brain-computer Interface

Tomasz M. Rutkowski

Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART) 1 - 4 2016 [Refereed]
Nonlinear speech enhancement by virtual increase of channels and maximum SNR beamformer

Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING 2016 ( 1 ) 1 - 8 2016.01 [Refereed]

In this paper, we propose a new microphone array signal processing technique, which increases the number of microphones virtually by generating extra signal channels from real microphone signals. Microphone array signal processing methods such as speech enhancement are effective for improving the quality of various speech applications such as speech recognition and voice communication systems. However, the performance of speech enhancement and other signal processing methods depends on the number of microphones. Thus, special equipment such as a multichannel A/D converter or a microphone array is needed to achieve high processing performance. Therefore, our aim was to establish a technique for improving the performance of array signal processing with a small number of microphones and, in particular, to increase the number of channels virtually by synthesizing virtual microphone signals, or extra signal channels, from two channels of microphone signals. Each virtual microphone signal is generated by interpolating a short-time Fourier transform (STFT) representation of the microphone signals. The phase and amplitude of the signal are interpolated individually. The phase is linearly interpolated on the basis of a sound propagation model, and the amplitude is nonlinearly interpolated on the basis of beta divergence. We also performed speech enhancement experiments using a maximum signal-to-noise ratio (SNR) beamformer equipped with virtual microphones and evaluated the improvement in performance upon introducing virtual microphones.

DOI

Scopus

22

Citation

(Scopus)
EGO-NOISE REDUCTION FOR A HOSE-SHAPED RESCUE ROBOT USING DETERMINED RANK-1 MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION

Moe Takakusaki, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Shoji Makino, Hiroshi Saruwatari

2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) 1 - 4 2016 [Refereed]

A hose-shaped rescue robot is one of the robots that have been developed for disaster response in times of large-scale disasters such as a massive earthquake. This robot is suitable for entering narrow and dark places covered with rubble in a disaster site and for finding victims inside it. It can transmit ambient sound captured by its built-in microphones to its operator. However, there is a serious problem, that is, the inherent noise of this robot, such as vibration sound or fricative sound, is mixed with the transmitted voice, thereby disturbing the operator's perception of a call for help from a disaster victim. In this paper, we apply the multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint (determined rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.
Multi-talker Speech Recognition Based on Blind Source Separation with Ad hoc Microphone Array Using Smartphones and Cloud Storage

Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5 3369 - 3373 2016 [Refereed]

In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array, which consists of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker's smartphone, which is automatically transferred to online cloud storage. Our prototype system is realized using iPhone and Dropbox. Although the signals recorded by different iPhones are not synchronized, the blind synchronization technique compensates both the differences in the time offset and the sampling frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into each speaker's voice. Finally, automatic speech recognition is applied to transcribe the speech. By experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces the speech overlap and improves the speech recognition performance.

DOI

Scopus

18

Citation

(Scopus)
Tactile Brain-computer Interface Using Classification of P300 Responses Evoked by Full Body Spatial Vibrotactile Stimuli

Takumi Kodama, Shoji Makino, Tomasz M. Rutkowski

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1 - 4 2016 [Refereed]

In this study we propose a novel stimulus-driven brain-computer interface (BCI) paradigm, which generates control commands based on classification of somatosensory modality P300 responses. Six spatial vibrotactile stimulus patterns are applied to the entire back and limbs of a user. The aim of the current project is to validate the effectiveness of the vibrotactile stimulus patterns for BCI purposes and to establish a novel concept of a tactile modality communication link, which shall help locked-in syndrome (LIS) patients who lose their sight and hearing due to sensory disabilities. We define this approach as a full-body BCI (fbBCI) and we conduct psychophysical stimulus evaluation and realtime EEG response classification experiments with ten healthy able-bodied users. The grand mean averaged psychophysical stimulus pattern recognition accuracy was 98.18%, whereas the realtime EEG accuracy was 53.67%. The information-transfer-rate (ITR) scores of all the tested users ranged from 0.042 to 4.154 bit/minute.
Ego Noise Reduction for Hose-Shaped Rescue Robot Combining Independent Low-Rank Matrix Analysis and Noise Cancellation

Narumi Mae, Daichi Kitamura, Masaru Ishimura, Takeshi Yamada, Shoji Makino

2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1 - 6 2016 [Refereed]

In this paper, we present an ego noise reduction method for a hose-shaped rescue robot developed for search and rescue operations in large-scale disasters such as a massive earthquake. It can enter narrow and dark places covered with rubble in a disaster site and is used to search for disaster victims by capturing their voices with its microphone array. However, ego noises, such as vibration or fricative sounds, are mixed with the voices, and it is difficult to differentiate them from a call for help from a disaster victim. To solve this problem, we here propose a two-step noise reduction method as follows: (1) the estimation of both speech and ego noise signals from an observed multichannel signal by multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint, which was proposed by Kitamura et al., and (2) the application of noise cancellation to the estimated speech signal using the noise reference. Our evaluations show that this approach is effective for suppressing ego noise.
Unisoner: An interface supporting chorus creation that utilizes diverse singing voices on the web from various singers singing the same song

都築,圭太, 中野,倫靖, 後藤,真孝, 山田,武志, 牧野,昭二

IPSJ Journal 56 ( 12 ) 2370 - 2383 2015.12 [Refereed]

CiNii
Unisoner: An interface for derivative chorus creation from various singing voices singing the same song on the web

K.,Tsuzuki, T.,Nakano, M.,Goto, T.,Yamada, S.,Makino

Journal of Information Processing 56 ( 12 ) 2370 - 2383 2015.12 [Refereed]

CiNii
Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

H.,Chiba, Y.,Kamamoto, T.,Moriya, N.,Harada, S.,Miyabe, T.,Yamada, S.,Makino

IEICE Trans. Information and Systems J98-D ( 10 ) 1301 - 1311 2015.10 [Refereed]
Pitch-frequency-dependent adaptive post-filter for CELP-based speech coding

千葉,大将, 鎌本,優, 守谷,健弘, 原田,登, 宮部,滋樹, 山田,武志, 牧野,昭二

IEICE Transactions on Information and Systems (Japanese Edition) J98-D ( 10 ) 1301 - 1311 2015.10 [Refereed]
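A study of performance estimation of noisy speech recognition using non-reference distortion features

郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

Acoustical Society of Japan 2015 Autumn Meeting 95 - 98 2015.09
A study of detecting low-score answer utterances in the Japanese speaking test SJ-CAT

小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

Acoustical Society of Japan 2015 Autumn Meeting 329 - 332 2015.09
Estimation of inter-channel correlation coefficients under conditions where the phase of the microphone array cannot be observed

宮部滋樹, 小野順貴, 牧野,昭二

Workshop on Circuits and Systems 28 347 - 352 2015.08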

CiNii
Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

Shoko Araki, Shoji Makino, Hiroshi Sawada, Ryo Mukai

European Signal Processing Conference 06-10- 1991 - 1994 2015.04

We propose a method for separating speech signals when sources outnumber the sensors. In this paper we mainly concentrate on the case of three sources and two sensors. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose the utilization of a directivity pattern based continuous mask, which removes a single source from the observations, and independent component analysis (ICA) to separate the remaining mixtures. Experimental results show that our proposed method can separate signals with little distortion even in a real reverberant environment of T_R = 130 ms.
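A study of improving the usability of speech recognition in noisy environments based on recognition performance prediction

青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

Acoustical Society of Japan 2015 Spring Meeting 133 - 136 2015.03
Diffuse noise suppression using transfer-function-gain NMF with an asynchronous distributed microphone array

村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

Acoustical Society of Japan 2015 Spring Meeting 557 - 560 2015.03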
Activity Report from the AASP-TC

Makino,Shoji

IEEE Signal Processing Society eNewsletter, TC News 2015.03 [Refereed]
Signal Processing Techniques for Assisted Listening

Sven Nordholm, Walter Kellermann, Simon Doclo, Vesa Vaelimaeki, Shoji Makino, John R. Hershey

IEEE SIGNAL PROCESSING MAGAZINE 32 ( 2 ) 16 - 17 2015.03 [Refereed]

DOI

Scopus

1

Citation

(Scopus)
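Detection of traveling vehicles and estimation of their direction of travel using a moving sound source model based on stereo recording

遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

Acoustical Society of Japan 2015 Spring Meeting 717 - 720 2015.03
Optimization of the subtraction coefficient of spectral subtraction based on objective estimation of overall quality and intelligibility

中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

Acoustical Society of Japan 2015 Spring Meeting 333 - 336 2015.03
A study of performance estimation of noisy speech recognition using cepstral distance and SMR-perplexity

郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

Acoustical Society of Japan 2015 Spring Meeting 129 - 132 2015.03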
Spatial tactile brain-computer interface by applying vibration to user's shoulders and waist

T.,Kodama, Makino,Shoji, T.M.,Rutkowski

AEARU Workshop on Computer Science and Web Technology 41 - 42 2015.02 [Refereed]
SSVEP brain-computer interface using green and blue lights

D.,Aminaka, Makino,Shoji, T.M.,Rutkowski

AEARU Workshop on Computer Science and Web Technology 39 - 40 2015.02 [Refereed]
Spatial auditory brain-computer interface using head related impulse response

C.,Nakaizumi, T.,Matsui, K.,Mori, Makino,Shoji, T.M.,Rutkowski

AEARU Workshop on Computer Science and Web Technology 37 - 38 2015.02 [Refereed]
Blind compensation of interchannel sampling frequency mismatch for ad hoc microphone array based on maximum likelihood estimation

Shigeki Miyabe, Nobutaka Ono, Shoji Makino

SIGNAL PROCESSING 107 ( SI ) 185 - 196 2015.02 [Refereed]

In this paper, we propose a novel method for the blind compensation of drift for the asynchronous recording of an ad hoc microphone array. Digital signals simultaneously observed by different recording devices have drift of the time differences between the observation channels because of the sampling frequency mismatch among the devices. On the basis of a model in which the time difference is constant within each short time frame but varies in proportion to the central time of the frame, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform (STFT) domain by a linear phase shift. By assuming that the sources are motionless and have stationary amplitudes, the observation is regarded as being stationary when drift does not occur. Thus, we formulate a likelihood to evaluate the stationarity in the STFT domain to evaluate the compensation of drift. The maximum likelihood estimation is obtained effectively by a golden section search. Using the estimated parameters, we compensate the drift by STFT analysis with a noninteger frame shift. The effectiveness of the proposed blind drift compensation method is evaluated in an experiment in which artificial drift is generated. (C) 2014 The Authors. Published by Elsevier B.V.

DOI

Scopus citations: 58
Tactile pin-pressure brain-computer interface

K.,Shimizu, H.,Mori, Makino,Shoji, T.M.,Rutkowski

AEARU Workshop on Computer Science and Web Technology 35 - 36 2015.02 [Refereed]
Multi-command tactile brain-computer interface using the touch-sense glove

H.,Yajima, Makino,Shoji, T.M.,Rutkowski

AEARU Workshop on Computer Science and Web Technology 43 - 44 2015.02 [Refereed]
Implementation and evaluation of an acoustic echo canceller using duo-filter control system

Yoichi Haneda, Shoji Makino, Junji Kojima, Suehiro Shimauchi

European Signal Processing Conference 2015

The developed acoustic echo canceller uses an exponentially weighted step-size projection algorithm and a duo-filter control system to achieve fast convergence and high speech quality. The duo-filter control system has an adaptive filter and a fixed filter, and uses variable-loss insertion. Evaluation of this system with multi-channel A/D and D/A converters showed that (1) the convergence speed is under 1.5 seconds for speech input when the adaptive filter length is 125 ms, (2) the residual echo level is nearly as low as the ambient noise level (average: under -20 dB; maximum: under -35 dB), and (3) near-end speech is sent with no disturbance during double talk.
Brain Evoked Potential Latencies Optimization for Spatial Auditory Brain-Computer Interface

Tomasz M. Rutkowski

Cognitive Computation 7 ( 1 ) 34 - 43 2015 [Refereed]

© 2013, Springer Science+Business Media New York. We propose a novel method for the extraction of discriminative features in electroencephalography (EEG) evoked potential latency. Based on our offline results, we present evidence indicating that a full surround sound auditory brain–computer interface (BCI) paradigm has potential for an online application. The auditory spatial BCI concept is based on an eight-directional audio stimuli delivery technique, developed by our group, which employs a loudspeaker array in an octagonal horizontal plane. The stimuli presented to the subjects vary in frequency and timbre. To capture brain responses, we utilize an eight-channel EEG system. We propose a methodology for finding and optimizing evoked response latencies in the P300 range in order later to classify them correctly and to elucidate the subject’s chosen targets or ignored non-targets. To accomplish the above, we propose an approach based on an analysis of variance for feature selection. Finally, we identify the subjects’ intended commands with a Naive Bayesian classifier for sorting the final responses. The results obtained with ten subjects in offline BCI experiments support our research hypothesis by providing higher classification results and an improved information transfer rate compared with state-of-the-art solutions.

DOI

Scopus citations: 9
Chromatic and High-frequency cVEP-based BCI Paradigm

Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC) 1906 - 1909 2015 [Refereed]

We present results of an approach to a code-modulated visual evoked potential (cVEP) based brain-computer interface (BCI) paradigm using four high-frequency flashing stimuli. To generate higher-frequency stimulation than state-of-the-art cVEP-based BCIs, we propose to use light-emitting diodes (LEDs) driven by a small micro-controller board hardware generator designed by our team. High-frequency, green-blue chromatic flashing stimuli are used in the study in order to minimize the risk of photosensitive epilepsy (PSE). We compare the accuracies of the green-blue chromatic cVEP-based BCI with those of the conventional white-black flicker-based interface. The high-frequency cVEP responses are identified using a canonical correlation analysis (CCA) method.
Classification accuracy improvement of chromatic and high–frequency code–modulated visual evoked potential–based BCI

Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9250 232 - 241 2015 [Refereed]

© Springer International Publishing Switzerland 2015. We present results of a classification improvement approach for a code–modulated visual evoked potential (cVEP) based brain–computer interface (BCI) paradigm using four high–frequency flashing stimuli. Previously published research reports presented successful BCI applications of canonical correlation analysis (CCA) to steady–state visual evoked potential (SSVEP) BCIs. Our team previously proposed a BCI paradigm combining the CCA and cVEP techniques. The current study presents further enhanced results obtained by applying a support vector machine (SVM) method to the cVEP–based BCI.

DOI

Scopus citations: 7
Fingertip Stimulus Cue-based Tactile Brain-computer Interface

Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1059 - 1064 2015 [Refereed]

The reported project aims to confirm whether a tactile glove fingertip stimulator is effective for a brain-computer interface (BCI) paradigm using somatosensory event potential (SEP) responses with possible attentional modulation. The proposed simplified stimulator device is presented in detail together with psychophysical and EEG BCI experiment protocols. Results supporting the proposed simple tactile glove device are presented in the form of online BCI classification accuracies obtained with a shrinkage linear discriminant analysis (sLDA) technique. Finally, we discuss possible future paradigm improvement steps.
Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

Shigeki Miyabe, Nobutaka Ono, Shoji Makino

LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015 9237 421 - 428 2015 [Refereed]

In this paper, we propose a method to estimate the correlation coefficient of two correlated complex signals under the condition that only the amplitudes are observed and the phases are missing. Our proposed method is based on maximum likelihood estimation. We assume that the original complex random variables are generated from a zero-mean bivariate complex normal distribution. The likelihood of the correlation coefficient is formulated as a bivariate Rayleigh distribution by marginalization over the phases. Although the maximum likelihood estimator has no analytical form, an expectation-maximization (EM) algorithm can be formulated by treating the phases as hidden variables. We evaluate the accuracy of the estimation using artificial signals, and demonstrate the estimation of the narrow-band correlation of a two-channel audio signal.

DOI

Scopus citations: 3
Inter-stimulus Interval Study for the Tactile Point-pressure Brain-computer Interface

Kensuke Shimizu, Shoji Makino, Tomasz M. Rutkowski

2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC) 1910 - 1913 2015 [Refereed]

The paper presents a study of an inter-stimulus interval (ISI) influence on a tactile point-pressure stimulus-based brain-computer interface's (tpBCI) classification accuracy. A novel tactile pressure generating tpBCI stimulator is also discussed, which is based on a three-by-three pins' matrix prototype. The six pin-linear patterns are presented to the user's palm during the online tpBCI experiments in an oddball style paradigm allowing for "the aha-responses" elucidation, within the event related potential (ERP). A subsequent classification accuracies' comparison is discussed based on two ISI settings in an online tpBCI application. A research hypothesis of classification accuracies' non-significant differences with various ISIs is confirmed based on the two settings of 120 ms and 300 ms, as well as with various numbers of ERP response averaging scenarios.
Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 599 - 603 2015 [Refereed]

In this paper, we propose a method for suppressing a large number of interferences by using multichannel amplitude analysis based on nonnegative matrix factorization (NMF) and its effective semi-supervised training. For the point-source interference reduction of an asynchronous microphone array, we propose amplitude-based speech enhancement in the time-channel domain, which we call transfer-function-gain NMF. Transfer-function-gain NMF is a robust method against drift, which disrupts an inter-channel phase analysis. We use this method to suppress a large number of sources. We show that a mass of interferences can be modeled by a single basis assuming that the noise sources are sufficiently far from the microphones and the spatial characteristics become similar to each other. Since the blind optimization of the NMF parameters does not work well with merely sparse observation contaminated by the constant heavy noise, we train the diffuse noise basis in advance of the noise suppression using a speech absent observation, which can be obtained easily using a simple voice activity detection technique. We confirmed the effectiveness of our proposed model and semi-supervised transfer-function-gain NMF in an experiment simulating a target source that was surrounded by a diffuse noise.
Variable Sound Elevation Features for Head-related Impulse Response Spatial Auditory BCI

Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1094 - 1099 2015 [Refereed]

This paper presents a study of classification and EEG feature improvement for a spatial auditory brain-computer interface (saBCI). This study provides a comprehensive test of head-related impulse response (HRIR) cues for the saBCI speller paradigm. We present a comparison with previously developed HRIR-based spatial auditory modalities. We propose and optimize three types of sound spatialization settings using a variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participate in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) result in encouragingly good and stable P300 responses in online saBCI experiments. We analyze the differences and dispersions of saBCI command accuracies, as well as the individual user accuracies for various spatial sound locations. Our case study indicates that the participating users could perceive elevation in the saBCI experiments using the HRIR measured from a general head model.
Head-related Impulse Response Cues for Spatial Auditory Brain-computer Interface

Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC) 1071 - 1074 2015 [Refereed]

This study provides a comprehensive test of head-related impulse response (HRIR) cues for a spatial auditory brain-computer interface (saBCI) speller paradigm. We present a comparison with the conventional virtual sound headphone-based spatial auditory modality. We propose and optimize three types of sound spatialization settings using a variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) resulted in encouragingly good and stable P300 responses in online BCI experiments. Our case study indicated that users could perceive elevation in the saBCI experiments generated using the HRIR measured from a general head model. The saBCI accuracy and information transfer rate (ITR) scores improved compared to the classical horizontal plane-based virtual spatial sound reproduction modality, as far as the healthy users in the current pilot study are concerned.
EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9359 1 - 6 2015 [Refereed]

© Springer International Publishing Switzerland 2015. We present visual BCI classification accuracy improved results after application of high- and low-pass filters to an electroencephalogram (EEG) containing code-modulated visual evoked potentials (cVEPs). The cVEP responses are applied for the brain-computer interface (BCI) in four commands paradigm mode. The purpose of this project is to enhance BCI accuracy using only the single trial cVEP response. We also aim at identification of the most discriminable EEG bands suitable for the broadband visual stimuli. We report results from a pilot study optimizing the EEG filtering using infinite impulse response filters in application to feature extraction for a linear support vector machine (SVM) classification method. The goal of the presented study is to develop a faster and more reliable BCI to further enhance the symbiotic relationships between humans and computers.

DOI
SVM Classification Study of Code-modulated Visual Evoked Potentials

Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 1065 - 1070 2015 [Refereed]

We present a study of a support vector machine (SVM) application to a brain-computer interface (BCI) paradigm. Four SVM kernel functions are evaluated in order to maximize the classification accuracy of a four-class BCI paradigm utilizing a code-modulated visual evoked potential (cVEP) response within the captured EEG signals. Our previously published reports applied only the linear SVM, which already outperformed the more classical technique of canonical correlation analysis (CCA). In the current study we additionally test and compare the classification accuracies of polynomial, radial basis and sigmoid kernels, together with the classical linear (non-kernel-based) SVMs in application to the cVEP BCI.
TDOA estimation by mapped SRP based on higher-order moment analysis

Xiao-Dong,Zhai, Yuya,Sugimoto, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

Proc. APSIPA 2014 2014.12 [Refereed]
Adaptive control of applying band-width for post filter of speech coder depending on pitch frequency

Hironobu,Chiba, Yutaka,Kamamoto, Takehiro,Moriya, Noboru,Harada, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

Proc. Asilomar Conference on Signals, Systems, and Computers, Asilomar 2014 2014.11 [Refereed]
A study on performance estimation of speech recognition in noise using cepstral distance

郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

Proceedings of the Acoustical Society of Japan Meeting 61 - 62 2014.09
Spatial tactile brain-computer interface paradigm applying vibration stimuli to large areas of user's back

T.,Kodama, Makino,Shoji, T.M.,Rutkowski

International Brain-Computer Interface Conference 1 - 4 2014.09 [Refereed]
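Target source enhancement using a virtual increase of microphone elements by generalized amplitude interpolation based on β-divergence

片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

Proceedings of the Acoustical Society of Japan Meeting 633 - 636 2014.09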

CiNii
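Relationship between the number and arrangement of microphones and target sound enhancement performance in transfer-function-gain-basis NMF

村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

Proceedings of the Acoustical Society of Japan Meeting 523 - 526 2014.09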

CiNii
Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

M.,Chang, K.,Mori, Makino,Shoji, Rutkowski, Tomasz Maciej

Procedia Technology 18 25 - 31 2014.09 [Refereed]

We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

DOI
Head-related impulse response-based spatial auditory brain-computer interface

C.,Nakaizumi, T.,Matsui, K.,Mori, Makino,Shoji, T.M.,Rutkowski

International Brain-Computer Interface Conference 1 - 4 2014.09 [Refereed]
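Estimation of the correlation coefficient between two complex signals using only observations of absolute values

宮部滋樹, 小野順貴, 牧野,昭二

Proceedings of the Acoustical Society of Japan Meeting ( 1-Q-40 ) 735 - 738 2014.09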

CiNii
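Evaluation of penalty term characteristics in target sound enhancement by unsupervised transfer-function-gain-basis NMF

千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

Proceedings of the Acoustical Society of Japan Meeting 527 - 530 2014.09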

CiNii
A study on traffic vehicle detection and lane estimation using a distributed microphone array

豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

Proceedings of the Acoustical Society of Japan Meeting 643 - 646 2014.09

CiNii
Multi-stage declipping of clipping distortion based on length classification of clipped interval

Chenlei,Li, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

Proceedings of the Acoustical Society of Japan Meeting 553 - 556 2014.09

CiNii
Unisoner: An interactive interface for derivative chorus creation from various singing voices on the web

K.,Tsuzuki, T.,Nakano, M.,Goto, T.,Yamada, Makino,Shoji

International Computer Music Conference joint with the Sound & Music Computing conference 790 - 797 2014.09 [Refereed]
Unisoner: an interactive interface for derivative chorus creation from various singing voices on the Web

Keita,Tsuzuki, Tomoyasu,Nakano, Masataka,Goto, Takeshi,Yamada, Shoji,Makino

Proc. ICMC SMC 2014 790 - 797 2014.09 [Refereed]
News from the AASP-TC

Makino,Shoji

IEEE Signal Processing Society eNewsletter, TC News 2014.08 [Refereed]
Electroencephalogram steady state response sonification focused on the spatial and temporal properties

Makino,Shoji, T.,Kaniwa, H.,Terasawa

International Conference on Auditory Display ( LS7-1 ) 1 - 7 2014.06 [Refereed]
EEG Steady State Response Sonification Focused on the Spatial and Temporal Properties

Kaniwa, Teruaki, Terasawa, Hiroko, Matsubara, Masaki, Rutkowski, Tomasz, Makino, Shoji

Proceedings of the 20th International Conference on Auditory Display 2014 (ICAD2014) 1 - 7 2014.06 [Refereed]
Reduction of computational cost in underdetermined blind source separation based on frequency-dependent time-difference-of-arrival estimation

丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田,武志, 牧野昭二, 中村篤

The Journal of the Acoustical Society of Japan 70 ( 6 ) 323 - 331 2014.06 [Refereed]

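In this paper, we propose a method for speeding up the computation of sparseness-based underdetermined blind source separation (BSS) using the EM algorithm. Izumi et al. proposed an underdetermined BSS method that is robust even under noise and reverberation, but its computational cost was a problem because its update rule finds the time-difference-of-arrival parameters by an exhaustive discrete search at every iteration. We therefore propose a low-cost update rule in which the time-difference-of-arrival parameters are regarded as frequency dependent and are updated analytically. In addition, we reduce the number of parameters and improve convergence by estimating a frequency-independent time difference of arrival through a band-weighted average. Experiments confirmed that the proposed method reduces the computation time to about 1/10.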

CiNii
Multimedia Information Processing Combining Brain Science, Life Science, and Information Science

Makino,Shoji

USJI Universities Research Report vol.32 2014.06 [Refereed]
Reduction of computational cost in underdetermined blind source separation based on frequency-dependent time-difference-of-arrival estimation

T.,Maruyama, S.,Araki, T.,Nakatani, S.,Miyabe, T.,Yamada, 牧野,昭二, A.,Nakamura

J. Acoust. Soc. Jpn vol. 70 ( no. 6 ) 323 - 331 2014.06 [Refereed]

CiNii
Acoustic signal processing based on asynchronous and distributed microphone array

N., Ono, S., Miyabe, S., Makino

J. Acoust. Soc. Jpn vol. 70 ( no. 7 ) 391 - 396 2014.06 [Refereed]
Reduction of computational cost in underdetermined blind source separation based on frequency dependent time-difference-of-arrival estimation

丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野, 昭二, 中村, 篤

J. Acoust. Soc. Jpn 70 ( 6 ) 323 - 331 2014.06 [Refereed]

CiNii
Ad-hoc microphone array - Acoustic signal processing using multiple mobile recording devices -

N., Ono, K.L., Trung, S., Miyabe, S., Makino

IEICE Fundamentals Review vol. 7 ( no. 4 ) 336 - 347 2014.04 [Refereed]

Microphone array signal processing is a framework for source localization, source enhancement, and source separation that processes multichannel observations obtained with multiple microphones; these tasks are difficult with a single microphone. In microphone array signal processing, the small time difference between channels is a very important cue for obtaining spatial information. Therefore, a multichannel A-D converter has conventionally been essential for synchronized observation. On the other hand, if microphone array signal processing could be performed using asynchronous recording devices such as laptop PCs, voice recorders, or smartphones, which are easily available in daily life, it would enhance convenience and markedly increase the number of possible applications. In this review, focusing on a new trend of microphone array signal processing using asynchronous recording devices, we survey existing works and also introduce our approach.

DOI CiNii
Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

H.,Chiba, Y.,Kamamoto, T.,Moriya, N.,Harada, S.,Miyabe, T.,Yamada, S.,Makino

IEICE Trans. Information and Systems 2014.04 [Refereed]
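Multichannel extension of audio coding based on nonnegative matrix factorization and phase reconstruction

劉必翔, 澤田宏, 宮部滋樹, 山田武志, 牧野昭二

Acoustical Society of Japan Spring Meeting 819 - 822 2014.03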

CiNii
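A study on a speech recognition performance estimation method applicable to various noise suppression methods and recognition tasks

郭レイ, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

Acoustical Society of Japan Spring Meeting 13 - 14 2014.03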
Adaptation of the pitch enhancement band and gain of a post-filter for ACELP

千葉大将, 鎌本優, 守谷健弘, 原田登, 宮部滋樹, 山田武志, 牧野昭二

Acoustical Society of Japan Spring Meeting 387 - 388 2014.03
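A study on automatic scoring that considers the redundancy and incompleteness of utterances in the sentence read-aloud task of the Japanese speaking test S-CAT

山畑勇人, 盧昊, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

Acoustical Society of Japan Spring Meeting 269 - 272 2014.03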
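A study on automatic scoring that considers the difficulty of uttered sentences in the free-speech task of the Japanese speaking test S-CAT

盧昊, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

Acoustical Society of Japan Spring Meeting 273 - 276 2014.03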
A-10-10 Traffic monitoring by using ad-hoc microphone arrays

豊田卓矢, 宮部滋樹, 山田,武志, 小野順貴, 牧野昭二

Proceedings of the IEICE General Conference 2014 151 2014.03
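Relationship between bit rate and synchronization performance in coded recordings of an asynchronous microphone array

宮部,滋樹, 小野,順貴, 牧野,昭二, 高橋,祐

Proceedings of the Acoustical Society of Japan Meeting ( 3-2-8 ) 725 - 726 2014.03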
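A study on target sound enhancement in distributed asynchronous recording by transfer-function-gain-basis NMF

千葉大将, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二, 高橋祐

Acoustical Society of Japan Spring Meeting 757 - 760 2014.03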

CiNii
Activity Report from the AASP-TC

S.,Makino

IEEE Signal Processing Society eNewsletter, TC News 2014.02 [Refereed]
GENERALIZED AMPLITUDE INTERPOLATION BY beta-DIVERGENCE FOR VIRTUAL MICROPHONE ARRAY

Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) 149 - 153 2014 [Refereed]

In this paper, we present a generalization of the virtual microphone array we previously proposed to increase the microphone elements by nonlinear interpolation. In the previous work, we generated a virtual observation from two actual microphones by an interpolation in the logarithmic domain. This corresponds to a linear interpolation of the phase and the geometric mean of the amplitude. In this paper, we generalize this interpolation using a linear interpolation of the phase and a nonlinear interpolation of the amplitude with adjustable nonlinearity based on beta-divergence. Improvement of the array signal processing performance is obtained by appropriate tuning of the parameter beta. We evaluate the improvement in speech enhancement using a maximum SNR beamformer.
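The abstract above describes the earlier log-domain interpolation as a linear interpolation of the phase combined with a geometric mean of the amplitude. A minimal sketch of that special case for a single STFT bin might look as follows. The function name `interpolate_virtual_mic`, the weight `alpha`, and the choice to interpolate the phase along the wrapped phase difference are assumptions made for illustration; the beta-divergence generalization is not shown because the abstract does not give its formula.

```python
import cmath
import math


def interpolate_virtual_mic(x1, x2, alpha=0.5):
    """Interpolate one complex STFT bin between two real microphones.

    alpha = 0 returns x1, alpha = 1 returns x2 (hypothetical convention).
    """
    # Weighted geometric mean of the two amplitudes.
    amp = (abs(x1) ** (1.0 - alpha)) * (abs(x2) ** alpha)
    # Phase difference wrapped to (-pi, pi]; an assumption that avoids
    # ambiguity when the raw phases straddle the +/-pi boundary.
    phase_diff = cmath.phase(x2 * x1.conjugate())
    # Linear interpolation of the phase, starting from the phase of x1.
    phase = cmath.phase(x1) + alpha * phase_diff
    return cmath.rect(amp, phase)


# Example: amplitudes 2 and 8, phases 0 and pi/2, midpoint weight.
v = interpolate_virtual_mic(complex(2.0, 0.0), complex(0.0, 8.0), 0.5)
```

With amplitudes 2 and 8 the midpoint amplitude is their geometric mean, 4, and the midpoint phase is pi/4, halfway between the two observed phases.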
AMPLITUDE-BASED SPEECH ENHANCEMENT WITH NONNEGATIVE MATRIX FACTORIZATION FOR ASYNCHRONOUS DISTRIBUTED RECORDING

Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Yu Takahashi, Takeshi Yamada, Shoji Makino

2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) 203 - 207 2014 [Refereed]

In this paper, we investigate amplitude-based speech enhancement for asynchronous distributed recording. In an ad-hoc microphone array context, it is supposed that different asynchronous devices record speech. As a result, the phase information is unreliable due to sampling frequency mismatch. For speech enhancement based on the amplitude information instead of the phase information, supervised nonnegative matrix factorization (NMF) is introduced in the time-channel domain. The basis vectors, which represent the gain of the transfer function from a source to each microphone, are trained in advance by using single-source observations. The experimental evaluations show that this approach is robust against the sampling frequency mismatch.
Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

Moonjeong Chang, Koichi Mori, Shoji Makino, Tomasz M. Rutkowski

INTERNATIONAL WORKSHOP ON INNOVATIONS IN INFORMATION AND COMMUNICATION SCIENCE AND TECHNOLOGY, IICST 2014 18 25 - 31 2014 [Refereed]

We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

DOI
Chromatic SSVEP BCI Paradigm Targeting the Higher Frequency EEG Responses

Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( WP2-3-2 ) 1 - 7 2014 [Refereed]

A novel approach to steady-state visual evoked potential (SSVEP) based brain-computer interface (BCI) is presented in the paper. To minimize possible side effects of the monochromatic light SSVEP-based BCI, we propose to utilize chromatic green-blue flicker stimuli at frequencies higher than those traditionally used. The resulting safer SSVEP responses are processed and classified with features drawn from EEG power spectra. Results obtained from healthy users support the research hypothesis of the chromatic and higher frequency SSVEP. The feasibility of the proposed method is evaluated in a comparison of monochromatic versus chromatic SSVEP responses. We also present preliminary results with empirical mode decomposition (EMD) adaptive filtering, which resulted in improved classification accuracies.
P300 Responses Classification Improvement in Tactile BCI with Touch-sense Glove

Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( WP2-3-3 ) 1 - 7 2014 [Refereed]

This paper reports on a project aiming to confirm whether a tactile stimulator "touch sense glove" is effective for a novel brain computer interface (BCI) paradigm and whether the tactile stimulus delivered to the fingers could be utilized to evoke event related potential (ERP) responses with possible attentional modulation. The tactile ERPs are expected to improve the BCI accuracy. The proposed new stimulator device is presented in detail together with psychophysical and EEG BCI experiment protocols. Results supporting the proposed "touch sense glove" device are presented in the form of online BCI classification accuracy results. Finally, we outline possible future paradigm improvements.
TDOA Estimation by Mapped Steered Response Power Analysis Utilizing Higher-Order Moments

Xiao-Dong Zhai, Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( FP-P1-3 ) 1 - 4 2014 [Refereed]

In this paper, we propose a new estimation method for the time difference of arrival (TDOA) between two microphones with improved accuracy by exploiting higher-order moments. The proposed method analyzes the steered response power (SRP) of the observed signals after they are nonlinearly mapped onto a higher-dimensional space. Since the mapping operation enhances the linear independence between different vectors by increasing the dimensionality of the observed signals, the TDOA analysis achieves higher resolution. The results of an experiment comparing the TDOA estimation performance of the proposed method with that of the conventional methods reveal the robustness of the proposed method against noise and reverberation.
On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) 2014 [Refereed]

In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using the transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. We undertook experimental evaluations to show that the performance of supervised NMF can come close to that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.
Tactile and Bone-conduction Auditory Brain Computer Interface for Vision and Hearing Impaired Users - Stimulus Pattern and BCI Accuracy Improvement

Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( FP2-6-3 ) 1 - 7 2014 [Refereed]

This paper aims to improve tactile and bone-conduction brain computer interface (tbaBCI) classification accuracy based on a new stimulus pattern search in order to trigger more separable P300 responses. We propose and investigate three approaches to stimulus spatial and frequency content modification. As a result of the online tbaBCI classification accuracy tests with six subjects, we conclude that frequency modification in the previously reported single vibrotactile exciter-based patterns leads to statistical improvements at the border of significance.
Tactile Pressure Brain-computer Interface Using Point Matrix Pattern Paradigm

Kensuke Shimizu, Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS) 473 - 477 2014 [Refereed]

The paper presents a tactile pressure stimulus-based brain-computer interface (BCI) paradigm. 3 x 3 pressure pins matrix stimulus patterns are presented to the subjects in an oddball paradigm allowing for "aha-responses" generation to attended targets. A research hypothesis is confirmed with the results with five subjects performing online BCI experiments. One of the users could score with 100% accuracy in online ten averages based BCI test. Three users scored above chance levels, while one remained on the chance level border. The presented pilot study experiments and EEG results confirm the effectiveness of the proposed tactile pressure stimulus based BCI.
TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY

Takuya Toyoda, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC) 318 - 322 2014

In this paper, we propose an easy and convenient method for traffic monitoring based on acoustic sensing with vehicle sound recorded by an ad-hoc microphone array. Since signals recorded by an ad-hoc microphone array are asynchronous, we perform channel synchronization by compensating for the difference between the start and the end of the recording and the sampling frequency mismatch. To monitor traffic, we estimate the number of the vehicles by employing the peak detection of the power envelopes, and classify the traffic lane from the difference between the propagation times of the microphones. We also demonstrate the effectiveness of our proposed method using the results of an experiment in which we estimated the number of vehicles and classified the lane in which the vehicles were traveling, according to F-measure.
Adaptive Post-Filtering Controlled by Pitch Frequency for CELP-based Speech Coder

Hironobu Chiba, Yutaka Kamamoto, Takehiro Moriya, Noboru Harada, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS 838 - 842 2014 [Refereed]

Most speech codecs utilize a post-filter that emphasizes pitch structures to enhance perceptual quality at the decoder. In particular, the bass post-filter used in ITU-T G.718 performs an adaptive pitch enhancement technique for a fixed lower frequency band. This paper describes a new post-filtering method in which the frequency band and the gain of the bass post-filter are adaptively controlled frame-by-frame depending on the pitch frequency of the decoded signal, in order to improve bass post-filter performance. We have confirmed the improvement of the speech quality with the developed method through objective and subjective evaluations.
On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( FA1-1-3 ) 1 - 5 2014 [Refereed]

In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using the transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. We undertook experimental evaluations to show that supervised NMF can achieve performance close to that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.
Automatic Scoring Method for Open Answer Task in the SJ-CAT Speaking Test Considering Utterance Difficulty Level

Hao Lu, Takeshi Yamada, Shingo Imai, Takahiro Shinozaki, Ryuichi Nisimura, Kenkichi Ishizuka, Shoji Makino, Nobuhiko Kitawaki

2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( WA1-1-3 ) 1 - 5 2014 [Refereed]

In this paper, we propose an automatic scoring method for the open answer task of the Japanese speaking test SJ-CAT. The proposed method first extracts a set of features from an input answer utterance and then estimates the vocabulary richness score assigned by human raters, which ranges from 0 to 4, by employing SVR (support vector regression). We devised a novel set of features, namely text statistics weighted by word reliability, to assess the abundance of vocabulary and expression, and the degree of word relevance based on the hierarchical distance in a thesaurus, to evaluate the suitability of vocabulary. We confirmed experimentally that the proposed method provides good estimates of the human richness score, with a correlation coefficient of 0.92 and an RMSE (root mean square error) of 0.56. We also showed that the proposed method is relatively robust to differences among examinees and among questions used for training and testing.
Auditory Brain-Computer Interface Paradigm with Head Related Impulse Response-based Spatial Cues

Chisaki Nakaizumi, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

Proc. International Conference on Signal Image Technology and Internet Based Systems ( WS-MISA-01 ) 806 - 811 2013.12 [Refereed]

The aim of this study is to provide a comprehensive test of head related impulse response (HRIR) for an auditory spatial speller brain-computer interface (BCI) paradigm. The study is conducted with six users in an experimental setup based on five Japanese hiragana vowels. Auditory evoked potentials resulted in encouragingly good and stable "aha-" or P300-responses in real-world online BCI experiments. Our case study indicated that the auditory HRIR spatial sound reproduction paradigm could be a viable alternative to the established multi-loudspeaker surround sound BCI-speller applications, as far as healthy pilot study users are concerned.

Citations (Scopus): 3
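Unisoner: An interface for supporting chorus creation by overlaying different singing voices performing the same song

Keita Tsuzuki, Tomoyasu Nakano, Masataka Goto, Takeshi Yamada, Shoji Makino

21st Workshop on Interactive Systems and Software (WISS2013) 2013.12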
Novel spatial tactile and bone-conduction auditory brain computer interface

T.M. Rutkowski, H. Mori, S. Makino, K. Mori

Proc. Neuroscience2013 79 2013.11 [Refereed]
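Systems that exploit the diversity of singing voices of various singers performing the same song

Keita Tsuzuki, Tomoyasu Nakano, Masataka Goto, Takeshi Yamada, Shoji Makino

IPSJ SIG Technical Report 2013-MUS-100-21 1 - 8 2013.09

This paper proposes two systems that make use of singing voices, published on the Web, of a single song sung by various singers. One is a chorus generation support system that overlays those singing voices; the other is a singing skill improvement support system that lets users compare those voices with one another and with their own voice. Previous research has addressed music appreciation and creation support using multiple songs, and singing skill improvement support based only on the user's own singing, but not chorus generation or singing skill improvement support that exploits the voices of multiple people singing the same song. In the chorus generation support system, the onset time of each voice and the volume of the left and right channels can be adjusted intuitively with a mouse. By combining intuitive operation with the fact that each vocal is itself a finished work, the system also enables "creative appreciation", in which users enjoy listening while creating. In the singing skill improvement support system, voices that are close in voice quality (MFCC) and singing style (F0 trajectory) can be displayed side by side for comparison. Because the voices are published on the Web and have play counts and My List counts, users can draw on this information while working to improve their singing.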
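Control of a post-filter for ACELP according to features of the decoded signal

Hironobu Chiba, Takehiro Moriya, Yutaka Kamamoto, Noboru Harada, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

Autumn Meeting of the Acoustical Society of Japan 319 - 320 2013.09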
Some advances in adaptive source separation

J.T. Chien, H. Sawada, S. Makino

APSIPA Newsletter 7 - 9 2013.09 [Refereed]
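Frequency-dependent optimization of element arrangement for a virtually multi-element microphone array using complex logarithmic interpolation

Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

Autumn Meeting of the Acoustical Society of Japan 609 - 610 2013.09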
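Synchronization of asynchronous recordings using frame analysis with non-integer sample shift

Shigeki Miyabe, Nobutaka Ono, Shoji Makino

Proceedings of the Meeting of the Acoustical Society of Japan ( 1-1-9 ) 593 - 596 2013.09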
News from the AASP-TC

Shoji Makino

IEEE Signal Processing Society eNewsletter, TC News 2013.08 [Refereed]
Network based complexity analysis in tactile brain computer interface task

H. Mori, Y. Matsumoto, S. Makino, Z. Struzik, D. Mandic, T.M. Rutkowski

Proc. EMBC2013 51 ( M-134 ) 1 - 1 2013.07 [Refereed]
Multi-command tactile and auditory brain computer interface based on head position stimulation

H. Mori, Y. Matsumoto, Z. Struzik, K. Mori, S. Makino, D. Mandic, T.M. Rutkowski

Proc. International Brain-Computer Interface Meeting ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2 2013.06 [Refereed]
Spatial tactile and auditory brain computer interface based on head position stimulation

T.M. Rutkowski, H. Mori, Y. Matsumoto, Z. Struzik, S. Makino, D. Mandic, K. Mori

Proc. Neuro2013 2013.06 [Refereed]
Comparison of P300 responses in auditory, visual and audiovisual spatial speller BCI paradigms

M. Chang, N. Nishikawa, Z. Struzik, K. Mori, S. Makino, D. Mandic, T.M. Rutkowski

Proc. International Brain-Computer Interface Meeting ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2 2013.06 [Refereed]
Blind compensation of inter-channel sampling frequency mismatch with maximum likelihood estimation in STFT domain

S. Miyabe, N. Ono, S. Makino

Proc. ICASSP2013 674 - 678 2013.05 [Refereed]

This paper proposes a novel blind compensation of sampling frequency mismatch for an asynchronous microphone array. Digital signals simultaneously observed by different recording devices exhibit drift in the time differences between the observation channels because of the sampling frequency mismatch among the devices. Based on the model that this time difference is constant within each time frame but varies in proportion to the time frame index, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform domain by a linear phase shift. By assuming that the sources are motionless and stationary, a likelihood of the sampling frequency mismatch is formulated. The maximum likelihood estimate is obtained effectively by a golden section search.
Signal Separation of EEG Using Multivariate Probabilistic Model

Yusuke Kurihana, Shigeki Miyabe, Tomasz M. Rutkowski, Yoshihiro Matsumoto, Takeshi Yamada, Shoji Makino

IEICE technical report. ME and bio cybernetics 112 ( 479 ) 161 - 166 2013.03

With independent component analysis (ICA), a promising source separation framework, it is difficult to separate desired signal components from EEG observations, in which a vast number of sources are mixed. In this paper, we define the change of magnitude caused by each phenomenon inside the brain as an EEG event, and we formulate a probability model of the EEG event by assuming that the observation of each EEG event locally follows a multivariate normal distribution in every short period. By assuming that each EEG event is distributed sparsely in the time-frequency domain, the likelihood of the observation is given by a Gaussian mixture model (GMM), and the parameters of the EEG events are estimated by an expectation-maximization (EM) algorithm. Also, by introducing a Dirichlet prior with an appropriate hyperparameter on the activation of each Gaussian component, the EM algorithm gains the ability to estimate both the number of significant EEG events and their parameters. An EEG separation experiment reveals that the proposed method can separate an appropriate number of EEG events.
A network model for the embodied communication of musical emotions

H. Terasawa, R. Hoshi-Shiba, T. Shibayama, H. Ohmura, K. Furukawa, S. Makino, K. Okanoya

Cognitive Studies 20 ( 1 ) 112 - 129 2013.03 [Refereed]

Music induces a wide range of emotions. However, the influence of physiological functions on musical emotions needs further theoretical consideration. This paper summarizes the physical and physiological functions that are related to musical emotions, and proposes a model for the embodied communication of musical emotions based on a discussion of the transmission of musical emotions across people by sharing movements and gestures. In this model, a human with musical emotion is represented by (1) the interfaces of perception and expression (senses, movements, facial and vocal expressions), (2) an internal system of neural activities, including the mirror system and the hormonal secretion system, that handles responses to musical activities, and (3) the musical emotion that is enclosed in the internal system. In this model, music is the medium for transmitting emotions, and communication of musical emotions is the communication of internal emotions through music and perception/expression interfaces. Finally, we discuss which aspects of music function to encourage the communication of musical emotions by humans.
Speech enhancement with ad-hoc microphone array using single source activity

Ryutaro Sakanashi, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) ( OS.21-SLA.7.5 ) 1 - 6 2013 [Refereed]

In this paper, we propose a method for synchronizing asynchronous channels in an ad-hoc microphone array based on single source activity for speech enhancement. An ad-hoc microphone array can include multiple recording devices, which do not communicate with each other. Therefore, their synchronization is a significant issue when using conventional microphone array techniques. Here we assume that we know two or more segments (typically the beginning and the end of the recording) where only a single sound source is active. Based on this assumption, we compensate for the difference between the start and end of the recording and the sampling frequency mismatch. We also describe experimental results for speech enhancement with a maximum SNR beamformer.
Performance estimation of noisy speech recognition using spectral distortion and SNR of noise-reduced speech

Guo Ling, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

IEEE Region 10 Annual International Conference, Proceedings/TENCON 2013 [Refereed]

To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using the PESQ (Perceptual Evaluation of Speech Quality) as a spectral distortion measure. However, there is the problem that the relationship between the recognition performance and the distortion value differs depending on the noise reduction algorithm used. To solve this problem, we propose a novel performance estimation method that uses an estimator defined as a function of the distortion value and the SNR (Signal to Noise Ratio) of noise-reduced speech. The estimator is applicable to different noise reduction algorithms without any modification. We confirmed the effectiveness of the proposed method by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. © 2013 IEEE.

Citations (Scopus): 3
Classification improvement of P300 response based auditory spatial speller brain-computer interface paradigm

Moonjeong Chang, Shoji Makino, Tomasz M. Rutkowski

IEEE Region 10 Annual International Conference, Proceedings/TENCON ( S.I.2.1 ) 1 - 4 2013 [Refereed]

The aim of the presented study is to provide a comprehensive test of the EEG evoked response potential (ERP) feature selection techniques for the spatial auditory BCI-speller paradigm, which creates a novel communication option for paralyzed subjects or able-bodied individuals requiring a direct brain-computer interfacing application. For rigor, the study is conducted with 16 BCI-naive healthy subjects in an experimental setup based on five Japanese hiragana characters in an offline processing mode. In our previous studies, the spatial auditory stimuli related P300 responses resulted in encouragingly separable target vs. non-target latencies in averaged responses, yet that finding was not well reproduced in the online BCI single trial based settings. We present a case study indicating that the auditory spatial unimodal paradigm classification accuracy can be enhanced with an AUC based feature selection approach, as far as BCI-naive healthy subjects are concerned. © 2013 IEEE.

Citations (Scopus): 5
Bone-conduction-based brain computer interface paradigm - EEG signal processing, feature extraction and classification

Daiki Aminaka, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013 ( WS-MISA-03 ) 818 - 824 2013 [Refereed]

The paper presents a novel bone-conduction based brain-computer interface paradigm. Four sub-threshold acoustic frequency stimulus patterns are presented to the subjects in an oddball paradigm, allowing 'aha-responses' to be generated to the attended targets. This allows for successful implementation of the bone-conduction based brain-computer interface (BCI) paradigm. The concept is confirmed with seven subjects in an online BCI paradigm based on spelling with bone-conducted auditory Morse-code patterns. We also report the brain electrophysiological signal processing and classification steps taken to achieve the successful BCI paradigm. We also present a finding on the variability of response latency as a function of stimulus difficulty. © 2013 IEEE.

Citations (Scopus): 1
VIRTUALLY INCREASING MICROPHONE ARRAY ELEMENTS BY INTERPOLATION IN COMPLEX-LOGARITHMIC DOMAIN

Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO) ( TH-L5.3 ) 2013

In this paper, we propose a new array signal processing technique for an underdetermined condition by increasing the number of observation channels. We introduce virtual observation as an estimate of the observed signals at positions where real microphones are not placed. Such signals at virtual observation channels are generated by the complex logarithmic interpolation of real observed signals. With the increased number of observation channels, conventional linear array signal processing methods can be applied to underdetermined conditions. As an example of the proposed array signal processing framework, we show experimental results of speech enhancement obtained with maximum SNR beamformers modified using the virtual observation.
Multi-command chest tactile brain computer interface for small vehicle robot navigation

Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8211 LNAI 469 - 478 2013 [Refereed]

The presented study explores the extent to which tactile stimuli delivered to five chest positions of a healthy user can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The five chest locations are used to evoke tactile brain potential responses, thus defining a tactile brain computer interface (tBCI). Experimental results with five subjects performing online tBCI provide a validation of the chest location tBCI paradigm, while the feasibility of the concept is illuminated through information-transfer rates. Additionally an offline classification improvement with a linear SVM classifier is presented through the case study. © Springer International Publishing 2013.

Citations (Scopus): 11
Classifying P300 responses to vowel stimuli for auditory brain-computer interface

Yoshihiro Matsumoto, Shoji Makino, Koichi Mori, Tomasz M. Rutkowski

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 ( OS.31-BioSiPS.2.8 ) 1 - 5 2013 [Refereed]

A brain-computer interface (BCI) is a technology for operating computerized devices based on brain activity and without muscle movement. BCI technology is expected to become a communication solution for amyotrophic lateral sclerosis (ALS) patients. Recently the BCI2000 package application has been commonly used by BCI researchers. The P300 speller included in the BCI2000 is an application allowing the calculation of a classifier necessary for the user to spell letters or sentences in a BCI-speller paradigm. The BCI-speller is based on visual cues, and requires muscle activities such as eye movements, impossible to execute by patients in a totally locked-in state (TLS), which is a terminal stage of the ALS illness. The purpose of our project is to solve this problem, and we aim to develop an auditory BCI as a solution. However, contemporary auditory BCI-spellers are much weaker compared with a visual modality. Therefore there is a necessity for improvement before practical application. In this paper, we focus on an approach related to the differences in responses evoked by various acoustic BCI-speller related stimulus types. In spite of various event related potential waveform shapes, typically a classifier in the BCI speller discriminates only between targets and non-targets, and hence it ignores valuable and possibly discriminative features. Therefore, we expect that the classification accuracy could be improved by using an independent classifier for each of the stimulus cue categories. In this paper, we propose two classifier training methods. The first one uses the data of the five stimulus cues independently. The second method incorporates weighting for each stimulus cue feature in relation to all of them. The results of the experiments reported show the effectiveness of the second method for classification improvement. © 2013 APSIPA.

Citations (Scopus): 20
EMPLOYING MOMENTS OF MULTIPLE HIGH ORDERS FOR HIGH-RESOLUTION UNDERDETERMINED DOA ESTIMATION BASED ON MUSIC

Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Fred Juang

2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) ( PM-02 ) 1 - 4 2013 [Refereed]

Several extensions of the MUltiple SIgnal Classification (MUSIC) algorithm exploiting high-order statistics have been proposed to estimate directions of arrival (DOAs) with high resolution in underdetermined conditions. However, these methods entail a trade-off between two performance goals, namely robustness and resolution, in the choice of orders, because the use of high-order statistics increases not only the resolution but also the statistical bias. To overcome this problem, this paper proposes a new extension of MUSIC using a nonlinear high-dimensional map, which corresponds to the joint analysis of moments of multiple orders and helps to realize both advantages: the robustness of low-order statistics and the high resolution of high-order statistics. Experimental results show that the proposed method can estimate DOAs more accurately than the conventional MUSIC extensions exploiting moments of a single high order.
OPTIMIZING FRAME ANALYSIS WITH NON-INTEGER SHIFT FOR SAMPLING MISMATCH COMPENSATION OF LONG RECORDING

Shigeki Miyabe, Nobutaka Ono, Shoji Makino

2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) ( TM-09 ) 1 - 4 2013 [Refereed]

This paper proposes a blind synchronization of an ad-hoc microphone array in the short-time Fourier transform (STFT) domain with the optimized frame analysis centered at non-integer discrete time. We show that the drift caused by sampling frequency mismatch of asynchronous observation channels can be disregarded in a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated roughly by finding two pairs of short intervals corresponding to the same continuous time. Using the estimate, STFT analysis is synchronized roughly between channels with an optimized frame center. Since the optimized frame center is generally non-integer, we approximate the frame analysis by linear phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation refines the compensation of sampling frequency mismatch.
Spatial auditory BCI with ERP responses to front-back to the head stimuli distinction support

Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 ( OS.31-BioSiPS.2.1 ) 1 - 8 2013 [Refereed]

This paper presents recent results obtained with a new auditory spatial localization based BCI paradigm in which ERP shape differences at early latencies are employed to enhance classification accuracy in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing the possibility to differentiate early anterior contralateral responses to the spatial sources attended to. We also find that early brain responses indicate which direction, front or rear loudspeaker source, the subject attended to. Contemporary stimuli-driven BCI paradigms benefit most from the P300 ERP latencies in a so-called 'aha-response' setting. We show the further enhancement of the classification results in a spatial auditory paradigm, in which we incorporate N200 latencies. The results reveal that these early spatial auditory ERPs boost offline classification results of the BCI application. The offline BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information transfer rates. © 2013 APSIPA.

Citations (Scopus): 2
Adaptive processing and learning for audio source separation

Jen-Tzung Chien, Hiroshi Sawada, Shoji Makino

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 ( OS.42-SLA.13.3 ) 1 - 6 2013 [Refereed]

This paper overviews a series of recent advances in adaptive processing and learning for audio source separation. In the real world, speech and audio signal mixtures are observed in reverberant environments. There are usually more sources than mixtures. The mixing condition occasionally changes when sources move, are replaced, or abruptly appear or disappear. In this survey article, we investigate different issues in audio source separation including overdetermined/underdetermined problems, permutation alignment, convolutive mixtures, contrast functions, nonstationary conditions and system robustness. We provide a systematic and comprehensive view of these issues and address new approaches to overdetermined/underdetermined convolutive separation, sparse learning, nonnegative matrix factorization, information-theoretic learning, online learning and Bayesian approaches. © 2013 APSIPA.

Citations (Scopus): 3
Spatial auditory BCI paradigm based on real and virtual sound image generation

Nozomu Nishikawa, Shoji Makino, Tomasz M. Rutkowski

2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013 ( OS.31-BioSiPS.2.7 ) 1 - 5 2013 [Refereed]

This paper presents a novel concept of spatial auditory brain-computer interface utilizing real and virtual sound images. We report results obtained from psychophysical and EEG experiments with nine subjects utilizing a novel method of spatial real or virtual sound images as spatial auditory brain computer interface (BCI) cues. Real spatial sound sources result in better behavioral and BCI response classification accuracies, yet a direct comparison of partial results in a mixed experiment confirms the usability of the virtual sound images for the spatial auditory BCI. Additionally, we compare stepwise linear discriminant analysis (SWLDA) and support vector machine (SVM) classifiers in a single sequence BCI experiment. The interesting point of the mixed usage of real and virtual spatial sound images in a single experiment is that both stimuli types generate distinct event related potential (ERP) response patterns allowing for their separate classification. This discovery is the strongest point of the reported research and it brings the possibility to create new spatial auditory BCI paradigms. © 2013 APSIPA.

Citations (Scopus): 6
Multi-command tactile brain computer interface: A feasibility study

Hiromu Mori, Yoshihiro Matsumoto, Victor Kryssanov, Eric Cooper, Hitoshi Ogawa, Shoji Makino, Zbigniew R. Struzik, Tomasz M. Rutkowski

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7989 LNCS 50 - 59 2013 [Refereed]

The study presented explores the extent to which tactile stimuli delivered to the ten digits of a BCI-naive subject can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The ten fingertips are used to evoke somatosensory brain responses, thus defining a tactile brain computer interface (tBCI). Experimental results on subjects performing online (real-time) tBCI, using stimuli with a moderately fast inter-stimulus-interval (ISI), provide a validation of the tBCI prototype, while the feasibility of the concept is illuminated through information-transfer rates obtained through the case study. © 2013 Springer-Verlag.

Citations (Scopus): 14
EEG signal processing and classification for the novel tactile-force brain-computer interface paradigm

Shota Kono, Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013 ( WS-MISA-02 ) 812 - 817 2013 [Refereed]

The presented study explores the extent to which tactile-force stimulus delivered to a hand holding a force-feedback joystick can serve as a platform for a brain-computer interface (BCI). The four pressure directions are used to evoke tactile brain potential responses, thus defining a tactile-force brain computer interface (tfBCI). We present brain signal processing and classification procedures leading to successful online interfacing results. Experimental results with seven subjects performing online BCI experiments provide a validation of the hand location tfBCI paradigm, while the feasibility of the concept is illuminated through remarkable information-transfer rates. © 2013 IEEE.

Citations (Scopus): 5
Inter-subject differences in personalized technical ear training and the influence of an individually optimized training sequence

Sungyoung Kim, Teruaki Kaniwa, Hiroko Terasawa, Takeshi Yamada, Shoji Makino

Acoustical Science and Technology 34 ( 6 ) 424 - 431 2013 [Refereed]

Technical ear training aims to improve the listening of sound engineers so they can skillfully modify and edit the structure of sound. Despite recent increasing interest in listening ability and subjective evaluation in audio- and acoustics-related fields and the subsequent appearance of various technical ear-training methods, the subject of how to provide efficient training for a self-trainee has not yet been studied. This paper investigated trainees' performances and showed that an (inherent or learned) ability to correctly describe spectral differences using the terms of a parametric equalizer (center frequency, Q, and gain) was different for each person. To cope with such individual differences in spectral identification, the authors proposed a novel method that adaptively controls the training task based on a trainee's prior performances. In detail, the method estimates the weakness of the trainee, and generates a training routine that focuses on that weakness. Subsequently, we tried to determine whether the proposed method, adaptive feedback, helps self-learners improve their performance in technical listening that involves identifying spectral differences. The results showed that the proposed method could assist trainees in improving their ability to identify differences more effectively than the counterpart group. Together with other features required for effective self-training, this adaptive feedback would assist a trainee in acquisition of timbre-identification ability. © 2013 The Acoustical Society of Japan.

DOI

Scopus

9

Citation

(Scopus)
Exhaustive structural comparison of protein-DNA binding surfaces

R. Minai, T. Horiike, S. Makino

GIW2012 (International Conference on Genome Informatics) ( poster 29 ) 2012.12 [Refereed]
Full-reference objective quality evaluation for noise-reduced speech considering effect of musical noise

Y. Fujita, T. Yamada, S. Makino, N. Kitawaki

Oriental COCOSDA2012 300-305 2012.12 [Refereed]
Foreword to special issue on recent mathematical advances in acoustic signal processing

S. Makino

The Journal of the Acoustical Society of Japan 68 ( 11 ) 557 - 558 2012.11 [Refereed]

CiNii
A multi-command spatial auditory BMI based on evoked EEG responses from real and virtual sound stimuli

T.M. Rutkowski, Z. Cai, N. Nishikawa, Y. Matsumoto, S. Makino, D. Looney, D.P. Mandic, Z.R. Struzik, A.W. Przybyszewski

Neuroscience2012 891.16/NN4 2012.10 [Refereed]
Underdetermined DOA estimation by the non-linear MUSIC exploiting higher-order moments

Y. Sugimoto, S. Miyabe, T. Yamada, S. Makino, and F. Juang

IWAENC2012 ( E-03 ) 2012.09 [Refereed]
In Search of a Perceptual Metric for Timbre: Dissimilarity Judgments among Synthetic Sounds with MFCC-Derived Spectral Envelopes

Hiroko Terasawa, Jonathan Berger, Shoji Makino

JOURNAL OF THE AUDIO ENGINEERING SOCIETY 60 ( 9 ) 674 - 685 2012.09 [Refereed]

　View Summary

This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, that is, the perception specifically related to the spectral element of timbre. Mel-cepstrum (Mel-frequency cepstral coefficients or MFCCs) is chosen as a hypothetical metric for spectral envelope perception due to its desirable properties of linearity, orthogonality, and multidimensionality. The experimental results confirmed the relevance of Mel-cepstrum to the perceived timbre dissimilarity when the spectral envelopes of complex-tone synthetic sounds were systematically controlled. The first experiment measured the perceived dissimilarity when the stimuli were synthesized by varying only a single coefficient from MFCC. Linear regression analysis proved that each of the 12 MFCCs has a linear correlation with spectral envelope perception. The second experiment measured the perceived dissimilarity when the stimuli were synthesized by varying two of the MFCCs. Multiple regression analysis showed that the perceived dissimilarity can be explained in terms of the Euclidean distance of the MFCC values of the synthetic sounds. The quantitative and perceptual relevance between the MFCCs and spectral centroids is also discussed. These results suggest that MFCCs can be a metric representation of spectral envelope perception, where each of its orthogonal basis functions provides a linear match with human perception.
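As a rough illustration of the distance measure named in the summary above, the following sketch computes the Euclidean distance between two MFCC vectors. The function name and the example vectors are hypothetical and are not taken from the paper; the sketch only shows the arithmetic, not the regression analysis.

```python
import math


def mfcc_distance(a, b):
    """Euclidean distance between two equal-length MFCC vectors."""
    if len(a) != len(b):
        raise ValueError("MFCC vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical 12-coefficient vectors that differ in two coefficients.
reference = [0.0] * 12
stimulus = [0.0] * 12
stimulus[2] = 3.0
stimulus[5] = 4.0
print(mfcc_distance(reference, stimulus))  # 5.0
```

Under the finding reported in the summary, a larger distance of this kind would correspond to a larger perceived dissimilarity between the two synthetic sounds.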
Analysis of brain responses to spatial real and virtual sounds - A BCI/BMI approach

N. Nishikawa, S. Makino, and T.M. Rutkowski

International Workshop on Brain Inspired Computing, BIC2012 2012.06 [Refereed]
Steady-state auditory responses application to BCI/BMI

Y. Matsumoto, S. Makino, and T.M. Rutkowski

International Workshop on Brain Inspired Computing, BIC2012 2012.06 [Refereed]
Spatial auditory BCI/BMI paradigm

Z. Cai, S. Makino, and T.M. Rutkowski

International Workshop on Brain Inspired Computing, BIC2012 2012.06 [Refereed]
Diffuse Noise Reduction using a Full-rank Spatial Covariance Model

Iso,Keiju, Araki,Shoko, Makino,Shoji, Nakatani,Tomohiro, Sawada,Hiroshi, Yamada,Takeshi, Miyabe,Shigeki, Nakamura,Atsushi

Proceedings of the IEICE General Conference 2012 ( 0 ) 194 2012.03
D-14-1 Effect of Musical Noise on Subjective Quality Evaluation of Noise-Reduced Speech

Fujita,Yuki, Yamada,Takeshi, Makino,Shoji, Kitawaki,Nobuhiko

Proceedings of the IEICE General Conference 2012 ( 1 ) 185 2012.03
Cepstral smoothing of separated signals for underdetermined speech separation

Ansai,Yumi, Araki,Shoko, Makino,Shoji, Nakatani,Tomohiro, Yamada,Takeshi, Nakamura,Atsushi, Kitawaki,Nobuhiko

The Journal of the Acoustical Society of Japan 68 ( 2 ) 74 - 85 2012.02 [Refereed]

　View Summary

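This paper proposes cepstral smoothing of separated signals (CSS) to reduce musical noise in source separation methods that use a time-frequency binary mask (BM) based on the sparseness of source signals. CSS builds on two ideas: smoothing in the cepstral domain, as used in the recently proposed cepstral smoothing of spectral masks (CSM), and the hypothesis that, from the viewpoint of controlling how well speech characteristics are preserved through the cepstral representation, it is preferable to smooth the separated speech obtained with the BM directly rather than the mask. We also experimentally compare the performance of the conventional method (CSM), the proposed method (CSS), and other musical-noise reduction methods. CSS achieved musical-noise reduction comparable to that of CSM while yielding separated signals with less distortion of the target speech.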

CiNii
NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 269 - 272 2012 [Refereed]

　View Summary

In this paper, we propose a new technique for sparseness-based underdetermined BSS that is based on the clustering of the frequency-dependent time difference of arrival (TDOA) information and that can cope with diffused noise environments. Such a method with an EM algorithm has already been proposed; however, it required a time-consuming exhaustive search for TDOA inference. To remove the need for such an exhaustive search, we propose a new technique by focusing on a stereo case. We derive an update rule for analytical TDOA estimation. This update rule eliminates the need for the exhaustive TDOA search, and therefore reduces the computational load. We show experimental results for separation performance and calculation time in comparison with those obtained with the conventional approach. The reported results validate our proposed method; that is, it achieves high performance without a high computational cost.
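For orientation only, the sketch below shows the textbook relation between the inter-channel phase difference in one frequency bin and a TDOA for a stereo pair. It is not the EM update rule derived in the paper; the function name, the sign convention (channel 2 lags channel 1 by a positive delay), and the assumption of no spatial aliasing are all choices made for this example.

```python
import cmath
import math


def tdoa_from_phase(x1, x2, freq_hz):
    """Delay of channel 2 relative to channel 1, in seconds, from one bin.

    Assumes the anechoic model x2 = x1 * exp(-j * 2*pi*f * tau) and that
    |2*pi*f*tau| < pi, so the phase is not wrapped (no spatial aliasing).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Phase of x2 relative to x1, in (-pi, pi].
    phase_diff = cmath.phase(x2 * x1.conjugate())
    return -phase_diff / (2.0 * math.pi * freq_hz)


# Hypothetical bin at 500 Hz with a true delay of 0.2 ms.
f, tau = 500.0, 2.0e-4
x1 = complex(1.0, 0.0)
x2 = x1 * cmath.exp(-2j * math.pi * f * tau)
print(tdoa_from_phase(x1, x2, f))  # close to 2.0e-4
```

A closed-form relation of this kind is what makes a per-bin TDOA computable without scanning a grid of candidate delays, although the paper's actual update additionally accounts for noise and source assignment.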
Spatial auditory BCI paradigm utilizing N200 and P300 responses

Zhenyu Cai, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 ( OS.6-BioSPS.1.4 ) 1-7 2012 [Refereed]

　View Summary

The paper presents our recent results obtained with a new auditory spatial localization based BCI paradigm in which the ERP shape differences at early latencies are employed to enhance the traditional P300 responses in an oddball experimental setting. The concept relies on the recent results in auditory neuroscience showing a possibility to differentiate early anterior contralateral responses to attended spatial sources. Contemporary stimuli-driven BCI paradigms benefit mostly from the P300 ERP latencies in so called "aha-response" settings. We show the further enhancement of the classification results in spatial auditory paradigms by incorporating the N200 latencies, which differentiate the brain responses to lateral, in relation to the subject head, sound locations in the auditory space. The results reveal that those early spatial auditory ERPs boost online classification results of the BCI application. The online BCI experiments with the multi-command BCI prototype support our research hypothesis with the higher classification results and the improved information-transfer-rates. © 2012 APSIPA.
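The summary above reports improved information-transfer rates without stating how they are computed. A definition commonly used in the BCI literature is the Wolpaw formula, sketched below; whether the paper uses exactly this definition is an assumption, and the function name is hypothetical.

```python
import math


def wolpaw_itr_bits_per_selection(n_commands, accuracy):
    """Wolpaw information-transfer rate in bits per selection.

    B = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    """
    if n_commands < 2:
        raise ValueError("need at least two commands")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_commands)
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_commands - 1))
    return bits


# Perfect accuracy with four commands yields log2(4) bits per selection.
print(wolpaw_itr_bits_per_selection(4, 1.0))  # 2.0
```

Multiplying the result by the number of selections per minute gives bits per minute, which is why both higher accuracy and faster stimulus presentation raise the rate.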
Sonification of Muscular Activity in Human Movements Using the Temporal Patterns in EMG

Masaki Matsubara, Hiroko Terasawa, Hideki Kadone, Kenji Suzuki, Shoji Makino

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( OS.6-BioSPS.1.2 ) 1-5 2012 [Refereed]

　View Summary

Biofeedback is currently considered as an effective method for medical rehabilitation. It aims to increase the awareness and recognition of the body's motion by feeding back the physiological information to the patients in real time. Our goal is to create an auditory biofeedback that aids understanding of the dynamic motion involving multiple muscular parts, with the ultimate aim of clinical rehabilitation use. In this paper, we report the development of a real-time sonification system using EMG, and we propose three sonification methods that represent the data in pitch, timbre, and the combination of polyphonic timbre and loudness. Our user evaluation test involves the task of timing and order identification and a questionnaire about the subjective comprehensibility and the preferences, leading to a discussion of the task performance and usability. The results show that the subjects can understand the order of the muscular activities at 63.7% accuracy on average. And the sonification method with polyphonic timbre and loudness provides an 85.2% accuracy score on average, showing its effectiveness. Regarding the preference of the sound design, we found that there is not a direct relationship between the task performance accuracy and the preference of sound in the proposed implementations.
Vibrotactile stimulus frequency optimization for the haptic BCI prototype

Hiromu Mori, Yoshihiro Matsumoto, Shoji Makino, Victor Kryssanov, Tomasz M. Rutkowski

6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012 2150 - 2153 2012 [Refereed]

　View Summary

The paper presents results from a psychophysical study conducted to optimize vibrotactile stimuli delivered to subject finger tips in order to evoke the somatosensory responses to be utilized next in a haptic brain computer interface (hBCI) paradigm. We also present the preliminary EEG evoked responses for the chosen stimulating frequency. The obtained results confirm our hypothesis that the hBCI paradigm concept is valid and it will allow for rapid stimuli presentation in order to improve information-transfer-rate (ITR) of the BCI. © 2012 IEEE.

DOI

Scopus

15

Citation

(Scopus)
AUTOMATIC SCORING METHOD CONSIDERING QUALITY AND CONTENT OF SPEECH FOR SCAT JAPANESE SPEAKING TEST

Naoko Okubo, Yuto Yamahata, Takeshi Yamada, Shingo Imai, Kenkichi Ishizuka, Takahiro Shinozaki, Ryuichi Nisimura, Shoji Makino, Nobuhiko Kitawaki

2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS 72 - 77 2012 [Refereed]

　View Summary

We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is holistically determined by language teachers according to a rating standard. In that process, teachers carefully consider different factors but do not score them individually. We therefore analyze how each factor contributes to the overall score. The factors are divided into two categories: the quality of speech and the content of speech. The former includes pronunciation and intonation, and the latter representation and vocabulary. We then propose an automatic scoring method based on the analysis. Experimental results confirm that the proposed method gives relatively accurate estimates of the overall score.
Auditory steady-state response stimuli based BCI application - The optimization of the stimuli types and lengths

Yoshihiro Matsumoto, Nozomu Nishikawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 ( OS.13-BioSPS.2.3 ) 1-7 2012 [Refereed]

　View Summary

We propose a method for improving the auditory BCI (aBCI) paradigm by optimizing ASSR stimuli: for each subject, we choose the best responses to AM-, flutter-, AM/FM-, and click-envelope-modulated sounds. As ASSR response features, we propose pairwise phase-locking values calculated from the EEG, which are then classified with a binary classifier to detect attended and ignored stimuli. We also report on the possibility of using stimuli as short as half a second, which is a step forward in ASSR-based aBCI. The presented results are helpful for optimization of the aBCI stimuli for each subject. © 2012 APSIPA.
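The phase-locking value mentioned in the summary above is commonly defined as the magnitude of the mean unit phasor of the phase difference between two channels. The sketch below follows that common definition and assumes the instantaneous phases have already been extracted from the EEG (for example with a Hilbert transform, which is not shown); the function name and the test data are hypothetical.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """PLV in [0, 1] between two equal-length sequences of phases (radians)."""
    if len(phases_a) != len(phases_b):
        raise ValueError("phase sequences must have the same length")
    if not phases_a:
        raise ValueError("phase sequences must not be empty")
    # Average the unit phasors of the sample-wise phase difference.
    total = sum(cmath.exp(1j * (pa - pb)) for pa, pb in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Two hypothetical channels with a constant phase lag are fully locked.
channel_a = [0.1 * n for n in range(100)]
channel_b = [0.1 * n - math.pi / 4 for n in range(100)]
print(phase_locking_value(channel_a, channel_b))  # close to 1.0
```

Computing this value for every pair of electrodes yields a feature vector per trial, which could then be passed to a binary classifier as the summary describes.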
EEG steady state synchrony patterns sonification

Teruaki Kaniwa, Hiroko Terasawa, Masaki Matsubara, Tomasz M. Rutkowski, Shoji Makino

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 ( OS.6-BioSPS.1.5 ) 1-6 2012 [Refereed]

　View Summary

This paper describes an application of a multichannel EEG sonification approach. We present results obtained with a multichannel-sonification method tested with steady-state EEG responses. We elucidate brain synchrony patterns in an auditory domain with utilization of the EEG coherence measure. The transitions in the synchrony patterns are represented as timbre (i.e., spectro-temporal) deviation and as spatial movement of the sound cluster. Our final sonification evaluation experiment with six subjects confirms the validity of the proposed brain synchrony-elucidation approach. © 2012 APSIPA.
Distance Attenuation Control of Spherical Loudspeaker Array

Shigeki Miyabe, Takaya Hayashi, Takeshi Yamada, Shoji Makino

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( OS.15-SLA.7.2 ) 1-4 2012 [Refereed]

　View Summary

This paper describes control of distance attenuation using a spherical loudspeaker array. Fisher et al. proposed radial filtering with a spherical microphone array to control the sensitivity to distance from a sound source by modeling the propagation of waves in the spherical harmonic domain. Since transfer functions are not changed by swapping their inputs and outputs, we can apply the same theory of radial filtering for microphone arrays to the filter design of distance attenuation control with loudspeaker arrays. Experimental results confirmed that the proposed method is effective at low frequencies.
The spatial real and virtual sound stimuli optimization for the auditory BCI

Nozomu Nishikawa, Yoshihiro Matsumoto, Shoji Makino, Tomasz M. Rutkowski

2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012 ( OS.13-BioSPS.2.6 ) 1-9 2012 [Refereed]

　View Summary

The paper presents results from a project aiming to create horizontally distributed surround sound sources and virtual sound images as auditory BCI (aBCI) stimuli. The purpose is to create evoked brain wave response patterns depending on attended or ignored sound directions. We propose to use a modified version of the vector based amplitude panning (VBAP) approach to achieve the goal. The resulting spatial sound stimulus system for the novel oddball aBCI paradigm allows us to create a multi-command experimental environment, with very encouraging results reported in this paper. We also present results showing that modulating the sound image depth also changes the subject responses. Finally, we compare the proposed virtual sound approach with the traditional one based on real sound sources generated from the real loudspeaker directions. The obtained results confirm the hypothesis that brain responses to spatial types and depths of sound sources can be modulated independently, which allows for the development of the novel multi-command aBCI. © 2012 APSIPA.
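The summary above refers to a modified VBAP approach without giving details. As a minimal sketch of standard two-dimensional VBAP only (not the modified version used in the paper), the gains for a loudspeaker pair can be obtained by expressing the source direction as a weighted sum of the two loudspeaker directions and normalizing for constant power. The function name and the angles are hypothetical.

```python
import math


def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Constant-power gains (g1, g2) for a source panned between two loudspeakers."""
    s = math.radians(source_deg)
    a = math.radians(spk1_deg)
    b = math.radians(spk2_deg)
    # Solve g1*l1 + g2*l2 = p for unit vectors l1, l2 (loudspeakers) and p (source).
    det = math.cos(a) * math.sin(b) - math.sin(a) * math.cos(b)
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g1 = (math.cos(s) * math.sin(b) - math.sin(s) * math.cos(b)) / det
    g2 = (math.cos(a) * math.sin(s) - math.sin(a) * math.cos(s)) / det
    # Normalize so that g1**2 + g2**2 == 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# A virtual source midway between loudspeakers at -45 and +45 degrees
# receives equal gains on both loudspeakers.
print(vbap_pair_gains(0.0, -45.0, 45.0))
```

A negative gain from this function indicates that the source direction lies outside the arc spanned by the pair, in which case a different loudspeaker pair would normally be selected.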
Psychophysical responses comparison in spatial visual, audiovisual, and auditory BCI-spelling paradigms

Moonjeong Chang, Nozomu Nishikawa, Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012 2154 - 2157 2012 [Refereed]

　View Summary

The paper presents a pilot study conducted with spatial visual, audiovisual and auditory brain-computer-interface (BCI) based speller paradigms. The psychophysical experiments are conducted with healthy subjects in order to evaluate task difficulty and possible variability in response accuracy. We also present preliminary EEG results in offline BCI mode. The obtained results validate the thesis that the spatial auditory-only paradigm performs as well as the traditional visual and audiovisual speller BCI tasks. © 2012 IEEE.

DOI

Scopus

5

Citation

(Scopus)
Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter

Ryutaro Sakanashi, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( OS.18-SLA.9.5 ) 1-6 2012 [Refereed]

　View Summary

The multichannel Wiener filter proposed by Duong et al. can conduct underdetermined blind source separation (BSS) with low distortion. This method assumes that the observed signal is the superimposition of the multichannel source images generated from multivariate normal distributions. The covariance matrix in each time-frequency slot is estimated by an EM algorithm which treats the source images as the hidden variables. Using the estimated parameters, the source images are separated as the maximum a posteriori estimate. It is worth noting that this method does not assume the sparseness of sources, which is usually assumed in underdetermined BSS. In this paper we investigate the effectiveness of the three attributes of Duong's method, i.e., the source image model with multivariate normal distribution, the observation model without sparseness assumption, and the source separation by multichannel Wiener filter. We newly formulate three BSS methods with a similar source image model and a different observation model assuming sparseness, and we compare them with Duong's method and the conventional binary masking. Experimental results confirmed the effectiveness of all three attributes of Duong's method.
New analytical calculation and estimation of TDOA for underdetermined BSS in noisy environments

Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) ( OS.12-SLA.6.4 ) 1-6 2012 [Refereed]

　View Summary

We have proposed a new algorithm for sparseness-based underdetermined blind source separation (BSS) that can cope with diffused noise environments. This algorithm includes a technique for estimating the time-difference-of-arrival (TDOA) parameter separately in individual frequency bins for each source. In this paper, we propose methods that integrate the frequency-bin-wise TDOA parameter to estimate the TDOA of each source. The accuracy of TDOA estimation with the proposed approach is shown experimentally in comparison with a conventional approach. The separation performance and calculation time of the proposed approach are also examined.
Visualization of conversation flow in meetings by analysis of direction of arrivals and continuousness of utterance

M. Katoh, Y. Sugimoto, S. Miyabe, S. Makino, T. Yamada, N. Kitawaki

Tunisian-Japan Symposium on Science, Society & Technology 1-5 2011.11 [Refereed]
New EEG components separation method: Data driven Huang-Hilbert transform application to auditory BMI paradigm

T.M. Rutkowski, Q. Zhao, D.P. Mandic, Z. Cai, A. Cichocki, S. Makino, A.W. Przybyszewski

Neuroscience 2011 627.15/AAA32 2011.11 [Refereed]
Underdetermined BSS in noisy environments with new analytical update rule for TDOA inference

MARUYAMA,Takuro, ARAKI,Shoko, NAKATANI,Tomohiro, MIYABE,Shigeki, YAMADA,Takeshi, MAKINO,Shoji, NAKAMURA,Atsushi

Technical report of IEICE. EA 111 ( 306 ) 25 - 30 2011.11

　View Summary

In this research, we propose a method to update the estimate of the time difference of arrival (TDOA) analytically in sparseness-based underdetermined blind source separation (BSS) with an EM algorithm. Izumi et al. proposed an underdetermined BSS method that can cope with diffuse noise environments. However, Izumi's method requires a discrete exhaustive search to update the TDOA parameter at every iteration, and therefore incurs a high computational cost. In this paper, focusing on the stereo case, we obtain an analytical update of the TDOA parameters in each frequency bin using frequency-dependent TDOA modeling. This update rule eliminates the exhaustive TDOA search, and therefore reduces the computational load. We show experimental results of separation performance and calculation time in comparison with those obtained with the conventional approach.

CiNii
Performance estimation of noisy speech recognition based on short-term noise characteristics

E. Morishita, T. Yamada, S. Makino, N. Kitawaki

Tunisian-Japan Symposium on Science, Society & Technology 1-4 2011.11 [Refereed]
Performance estimation of noisy speech recognition considering the accuracy of acoustic models

T. Takaoka, T. Yamada, S. Makino, N. Kitawaki

Tunisian-Japan Symposium on Science, Society & Technology 1-4 2011.11 [Refereed]
A study on sound image control method for operational support of touch panel display

Shigeyoshi Amano, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

Proc. APSIPA ASC 2011 ( Thu-PM.PS2 ) 1-1 2011.10 [Refereed]
Subjective and objective quality evaluation of noise-reduced speech

Yamada,Takeshi, Makino,Shoji, Kitawaki,Nobuhiko

The Journal of the Acoustical Society of Japan 67 ( 10 ) 476 - 481 2011.10 [Refereed]

CiNii
Towards a personalized technical ear training program: An investigation of the effect of adaptive feedback

T. Kaniwa, S. Kim, H. Terasawa, M. Ikeda, T. Yamada, S. Makino

Sound and Music Computing Conference 439-443 2011.07 [Refereed]
C. elegans meets data sonification: Can we hear its elegant movement?

H. Terasawa, Y. Takahashi, K. Hirota, T. Hamano, T. Yamada, A. Fukamizu, S. Makino

Sound and Music Computing Conference 77-82 2011.07 [Refereed]
DOA Estimation for Multiple Sparse Sources with Arbitrarily Arranged Multiple Sensors

Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY 63 ( 3 ) 265 - 275 2011.06 [Refereed]

　View Summary

This paper proposes a method for estimating the direction of arrival (DOA) of multiple source signals for an underdetermined situation, where the number of sources N exceeds the number of sensors M (M < N). Some DOA estimation methods have already been proposed for underdetermined cases. However, since most of them restrict their microphone array arrangements, their DOA estimation ability is limited to a 2-dimensional plane. To deal with an underdetermined case where sources are distributed arbitrarily, we propose a method that can employ a 2- or 3-dimensional sensor array. Our new method employs the source sparseness assumption to handle an underdetermined case. Our formulation with the sensor coordinate vectors allows us to employ arbitrarily arranged sensors easily. We obtained promising experimental results for 2-dimensionally distributed sensors and sources 3x4, 3x5 (#sensors x #speech sources), and for 3-dimensional case with 4x5 in a room (reverberation time (RT) of 120 ms). We also investigate the DOA estimation performance under several reverberant conditions.

DOI

Scopus

32

Citation

(Scopus)
B-11-19 A Study on Objective Quality Evaluation Method Applicable to Both Music and Speech

Mikami,Yuichiro, Yamada,Takeshi, Makino,Shoji, Kitawaki,Nobuhiko

Proceedings of the IEICE General Conference 2011 ( 2 ) 448 2011.02

CiNii
B-11-18 An Improvement of Overall Quality Estimation Model for Objective Quality Evaluation of Noise-Reduced Speech

Fujita,Yuki, Yamada,Takeshi, Makino,Shoji

Proceedings of the IEICE General Conference 2011 ( 2 ) 447 - 447 2011.02

CiNii
An MPEG-2 to H.264 Transcoding Preserving DCT Types and Motion Vectors to Suppress Re-Quantization Noise for Interlace Contents

YOSHITOME,Takeshi, KAMIKURA,Kazuto, MAKINO,Shoji, KITAWAKI,Nobuhiko

The IEICE transactions on information and systems (Japanese edition) 94 ( 2 ) 469 - 480 2011.02 [Refereed]

　View Summary

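We propose a method that uses first-stage encoding information to reduce the quantization noise introduced when an MPEG-2 stream of interlaced video is transcoded to H.264. The method carries the MPEG-2 DCT types and motion compensation types over to H.264 as far as possible. In addition, from the combination of DCT type and motion compensation type, it identifies macroblock pairs that become inheritable if their frame vectors are converted to field vectors, and applies this vector conversion to raise the inheritance rate. Experiments confirmed a PSNR improvement of 0.19 to 0.31 dB over a conventional method that does not use the encoding information.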

CiNii
Blind source separation of mixed speech in a high reverberation environment

Keiju Iso, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Hiroshi Sawada, Takeshi Yamada, Atsushi Nakamura

2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, HSCMA'11 36 - 39 2011 [Refereed]

　View Summary

Blind source separation (BSS) is a technique for estimating and separating individual source signals from a mixed signal using only information observed by each sensor. BSS is still being developed for mixed signals that are affected by reverberation. In this paper, we propose combining the BSS method that considers reverberation proposed by Duong et al. with the BSS method reported by Sawada et al., which does not consider reverberation, for the initial setting of the EM algorithm. This proposed method assumes the underdetermined case. In the experiment, we compare the proposed method with the conventional method reported by Duong et al. and that reported by Sawada et al., and demonstrate the effectiveness of the proposed method. © 2011 IEEE.

DOI

Scopus

7

Citation

(Scopus)
Spatial location and sound timbre as informative cues in auditory BCI/BMI - Electrodes position optimization for brain evoked potential enhancement

Zhenyu Cai, Hiroko Terasawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011 ( Wed-PM.SS4 ) 222 - 227 2011 [Refereed]

　View Summary

The paper introduces a novel auditory BCI/BMI paradigm based on combined sound timbre and horizontal plane spatial locations as informative cues. The presented concept is based on responses to eight-directional audio stimuli with various tonal and environmental sound stimuli. The approach is based on monitoring brain electrical activity by means of the electroencephalogram (EEG). The spatial auditory stimulus previously developed by the authors is extended to sound stimuli that vary in timbre, a feature that helps the subjects attend to the targets. The main achievement discussed in the paper is an offline BCI analysis based on an optimization of electrode locations on the scalp and evoked response latency to further improve classification results. The new BCI paradigm developed in this way is more user-friendly and leads to better results than the previously utilized simple tonal or steady-state stimuli.
Restoration of Clipped Audio Signal Using Recursive Vector Projection

Shin Miura, Hirofumi Nakajima, Shigeki Miyabe, Shoji Makino, Takeshi Yamada, Kazuhiro Nakadai

2011 IEEE REGION 10 CONFERENCE TENCON 2011 394 - 397 2011 [Refereed]

　View Summary

This paper proposes a method for restoring signals from clipping without prior knowledge. First, an interval of the signal that includes clipped samples is analyzed by recursive vector projection. By analyzing the neighboring samples of the clipped interval and excluding the clipped interval from the similarity analysis, the signal in the clipped interval is estimated as a by-product of the analysis. Since the estimate is consistent with the neighboring samples, the restored signal does not suffer from click noise. An evaluation of clipping restoration with various audio signals confirmed that the proposed method improves the signal-to-noise ratio.
Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

Kazuma Takeda, Hirokazu Kameoka, Hiroshi Sawada, Shoko Araki, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

2011 IEEE REGION 10 CONFERENCE TENCON 2011 413 - 416 2011 [Refereed]

　View Summary

This paper presents a new method for underdetermined Blind Source Separation (BSS), based on a concept called multichannel complex non-negative matrix factorization (NMF). The method assumes (1) that the time-frequency representations of sources have disjoint support (W-disjoint orthogonality of sources), and (2) that each source is modeled as a superposition of components whose amplitudes vary over time coherently across all frequencies (amplitude coherence of frequency components) in order to jointly solve the indeterminacy involved in the frequency domain underdetermined BSS problem. We confirmed experimentally that the present method performed reasonably well in terms of the signal-to-interference ratio when the mixing process was known.
Mora pitch level recognition for the development of a Japanese pitch accent acquisition system

Greg Short, Keikichi Hirose, Takeshi Yamada, Nobuaki Minematsu, Nobuhiko Kitawaki, Shoji Makino

Proc. International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques, Oriental COCOSDA 2010 1-6 2010.10 [Refereed]
Subjective and Objective Quality Evaluation for Noise-Reduced Speech

Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

IPSJ SIG Notes 2010 ( 7 ) 1 - 6 2010.10

　View Summary

To provide users with natural and intelligible speech in noisy environments, the use of a noise reduction algorithm, which reduces the noise component in the noisy input speech, can be effective. It is, however, well-known that any noise reduction algorithm unavoidably produces speech distortion and residual noise. Here, the critical issue is that the characteristics of these undesired byproducts vary according to the noise reduction algorithm used and the type of noise to be reduced. It is therefore essential to establish methods that can be used to evaluate the quality of noise-reduced speech. In this paper, we describe subjective and objective quality evaluation methods for noise-reduced speech.

CiNii
A VC-1 to H.264/AVC intra transcoding using encoding information to reduce re-quantization noise

T. Yoshitome, Y. Nakajima, K. Kamikura, S. Makino, N. Kitawaki

International Conference on Signal and Image Processing 170-177 2010.08 [Refereed]
BS-5-4 Objective Estimation of MOS and Word Intelligibility for Noise-Reduced Speech

Yamada,Takeshi, Kitawaki,Nobuhiko, Makino,Shoji

Proceedings of the Society Conference of IEICE 2010 ( 2 ) - 19 2010.08
Scattered Speech Signal Detection by Principal Component Analysis for Spatial Power Spectrum

KATOH,Michiaki, SUGIMOTO,Yuya, MAKINO,Shoji, YAMADA,Takeshi, KITAWAKI,Nobuhiko

Technical report of IEICE. EA 110 ( 171 ) 25 - 30 2010.08

　View Summary

To review meeting speech archives efficiently, it is important to detect in advance, and automatically, "when, how and who talked". In this paper, we propose a method for automatically detecting short and scattered signals, such as agreements, by using only acoustical information. The proposed method has two steps: 1) extract a spatial power spectrum frame-by-frame from the meeting speech archive recorded by a microphone array, and 2) detect the target signal by using an outlier detection algorithm based on principal component analysis. To evaluate the effectiveness of the proposed method, we conducted an experiment using a meeting speech archive recorded in a real room. The experimental results imply that we can distinguish a long utterance, a short utterance, and no utterance from only a few principal components.

CiNii
Special Section on Blind Signal Processing and Its Applications

Shoji Makino, Andrzej Cichocki, Wei Xing Zheng, Aurelio Uncini

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS 57 ( 7 ) 1401 - 1403 2010.07 [Refereed]

DOI

Scopus

1

Citation

(Scopus)
Underdetermined Blind Source Separation Using Acoustic Arrays

Shoji Makino, Shoko Araki, Stefan Winter, Hiroshi Sawada

Handbook on Array Processing and Sensor Networks 303 - 341 2010.04 [Refereed]

DOI

Scopus

8

Citation

(Scopus)
B-11-1 A Study of Artificial Voices for Telephonometry in the IP-based Telecommunication Networks

Aoshima,Chika, Kitawaki,Nobuhiko, Yamada,Takeshi, Makino,Shoji

Proceedings of the IEICE General Conference 2010 ( 2 ) 435 2010.03
B-11-2 Full-reference Objective Quality Evaluation for Noise-reduced Speech Using Overall Quality Estimation Model

Shinohara,Yuki, Yamada,Takeshi, Kitawaki,Nobuhiko, Makino,Shoji

Proceedings of the IEICE General Conference 2010 ( 2 ) 436 2010.03
MPEG-2/H.264 transcoding with vector conversion reducing re-quantization noise

Takeshi Yoshitome, Kazuto Kamikura, Shoji Makino, Nobuhiko Kitawaki

Proceedings - International Conference on Computer Communications and Networks, ICCCN 1-6 2010 [Refereed]

　View Summary

We propose an MPEG-2 to H.264 transcoding method for interlace streams intermingled with frame and field macroblocks. This method uses the encoding information from an MPEG-2 stream and keeps as many DCT coefficients of the original MPEG-2 bitstream as possible. Experimental results show that the proposed method improves PSNR by about 0.19-0.31 dB compared with a conventional method. © 2010 IEEE.

DOI

Scopus

1

Citation

(Scopus)
Performance Estimation of Noisy Speech Recognition Considering Recognition Task Complexity

Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 2042 - 2045 2010 [Refereed]

　View Summary

To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using a spectral distortion measure. However, there is the problem that recognition task complexity affects the relationship between the recognition performance and the distortion value. To solve this problem, this paper proposes a novel performance estimation method considering the recognition task complexity. We confirmed that the proposed method gives accurate estimates of the recognition performance for various recognition tasks by an experiment using noisy speech data recorded in a real room.
Comparison of MOS evaluation characteristics for Chinese, Japanese, and English in IP telephony

Zhenyu Cai, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings 112 - 115 2010 [Refereed]

　View Summary

Communication quality in IP telephony is rated in terms of the Mean Opinion Score (MOS), which is an Absolute Category Rating (ACR) scale. There is a problem when comparing subjectively evaluated MOSs in that the evaluation results are strongly affected by differences in language, the instruction words used for the evaluation, and the nationality of the evaluator. To solve these problems, ITU-T SG12 has started to investigate the cultural and language dependencies of subjective quality evaluations undertaken with the MOS method for speech/video/multimedia. In this paper, we present the results of a comparison of the MOS evaluation characteristics for Chinese, Japanese, and English. ©2010 IEEE.

DOI

Scopus

20

Citation

(Scopus)
A study of artificial voices for telephonometry in the IP-based telecommunication networks

Chika Aoshima, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

Tunisian-Japan Symposium on Science, Society & Technology 2009.11 [Refereed]
Analysis of standardized speech database by considering long-term average spectrum

Naoko Okubo, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

Tunisian-Japan Symposium on Science, Society & Technology 1-4 2009.11 [Refereed]
DOA estimation for multiple sparse sources with arbitrarily arranged multiple sensors

S. Araki, H. Sawada, R. Mukai, S. Makino

Journal of Signal Processing Systems 1 - 11 2009.10 [Refereed]

CiNii
Foreword to the special section on blind signal processing and its applications

Shoji Makino

IEICE Trans. Fundamentals J92-A ( 5 ) 275 - 275 2009.05 [Refereed]

CiNii
Stereo Source Separation and Source Counting with MAP Estimation with Dirichlet Prior Considering Spatial Aliasing Problem

Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS 5441 742 - 750 2009 [Refereed]

　View Summary

In this paper, we propose a novel sparse source separation method that can estimate the number of sources and time-frequency masks simultaneously, even when the spatial aliasing problem exists. Recently, many sparse source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the phase difference of arrival (PDOA) between microphones with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori (MAP) estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. Moreover, to handle wide microphone spacing cases where the spatial aliasing problem occurs, the indeterminacy of modulus 2πk in the phase is also included in our model. Experimental results show good performance of our proposed method.
BLIND SPARSE SOURCE SEPARATION FOR UNKNOWN NUMBER OF SOURCES USING GAUSSIAN MIXTURE MODEL FITTING WITH DIRICHLET PRIOR

Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS 33 - 36 2009 [Refereed]

　View Summary

In this paper, we propose a novel sparse source separation method that can be applied even if the number of sources is unknown. Recently, many sparse source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the histogram of the estimated direction of arrival (DOA) with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. By using this prior, without any specific model selection process, our proposed method can estimate the number of sources and time-frequency masks simultaneously. Experimental results show the performance of our proposed method.
Handling speaker position changes in a meeting diarization system by combining DOA clustering and speaker identification

T. Hager, S. Araki, K. Ishizuka, M. Fujimoto, T. Nakatani, S. Makino

IWAENC2008 2-12 2008.09 [Refereed]

CiNii
Foreword to the special section on acoustic scene analysis and reproduction

S. Makino

IEICE Trans. Fundamentals E91-A ( 6 ) 1301-1302 2008.06 [Refereed]
Recent advances in audio source separation techniques

H. Sawada, S. Araki, S. Makino

Journal of IEICE 91 ( 4 ) 292 - 296 2008.04 [Refereed]

CiNii
A DOA based speaker diarization system for real meetings

Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS 30 - 33 2008 [Refereed]

　View Summary

This paper presents a speaker diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system utilized the generalized cross correlation method with the phase transform (GCC-PHAT) approach for the DOA estimation. Because the GCC-PHAT can estimate just one DOA per frame, it was difficult to handle speaker overlaps. This paper tries to deal with this issue by employing a DOA at each time-frequency slot (TFDOA), and reports how it improves diarization performance for real meetings / conversations recorded in a room with a reverberation time of 350 ms.
Speaker indexing and speech enhancement in real meetings/conversations

Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 93 - 96 2008 [Refereed]

　View Summary

This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. Our proposed speaker indexing is realized by using a noise robust voice activity detector (VAD), a GCC-PHAT based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real meetings/conversations recorded in a room with a reverberation time of 350 ms, and evaluates the performance by a standard measure: the diarization error rate (DER). Even for the real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP2008.
Missing feature speech recognition in a meeting situation with maximum SNR beamforming

Dorothea Kolossa, Shoko Araki, Marc Delcroix, Tomohiro Nakatani, Reinhold Orglmeister, Shoji Makino

PROCEEDINGS OF 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-10 3218 - + 2008 [Refereed]

　View Summary

Especially for tasks like automatic meeting transcription, it would be useful to automatically recognize speech also while multiple speakers are talking simultaneously. For this purpose, speech separation can be performed, for example by using maximum SNR beamforming. However, even when good interferer suppression is attained, the interfering speech will still be recognizable during those intervals, where the target speaker is silent. In order to avoid the consequential insertion errors, a new soft masking scheme is proposed, which works in the time domain by inducing a large damping on those temporal periods, where the observed direction of arrival does not correspond to that of the target speaker. Even though the masking scheme is aggressive, by means of missing feature recognition the recognition accuracy can be improved significantly, with relative error reductions in the order of 60% compared to maximum SNR beamforming alone, and it is successful also for three simultaneously active speakers. Results are reported based on the SOLON speech recognizer, NTT's large vocabulary system [1], which is applied here for the recognition of artificially mixed data using real-room impulse responses and the entire clean test set of the Aurora 2 database.
Guest editors' introduction: Special section on emergent systems, algorithms, and architectures for speech-based human-machine interaction

Rodrigo Capobianco Guido, Li Deng, Shoji Makino

IEEE TRANSACTIONS ON COMPUTERS 56 ( 9 ) 1153 - 1155 2007.09 [Refereed]
Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors

Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

SIGNAL PROCESSING 87 ( 8 ) 1833 - 1847 2007.08 [Refereed]

　View Summary

This paper presents a new method for blind sparse source separation. Some sparse source separation methods, which rely on source sparseness and an anechoic mixing model, have already been proposed. These methods utilize level ratios and phase differences between sensor observations as their features, and they separate signals by classifying them. However, some of the features cannot form clusters with a well-known clustering algorithm, e.g., the k-means. Moreover, most previous methods utilize a linear sensor array (or only two sensors), and therefore they cannot separate symmetrically positioned sources. To overcome such problems, we propose a new feature that can be clustered by the k-means algorithm and that can be easily applied to more than three sensors arranged non-linearly. We have obtained promising results for two- and three-dimensionally distributed speech separation with non-linear/non-uniform sensor arrays in a real room even in underdetermined situations. We also investigate the way in which the performance of such methods is affected by room reverberation, which may cause the sparseness and anechoic assumptions to collapse. (C) 2007 Elsevier B.V. All rights reserved.

DOI CiNii

Scopus

229

Citation

(Scopus)
Introduction to the special section on blind signal processing for speech and audio applications

Shoji Makino, Te-Won Lee, Guy J. Brown

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING 15 ( 5 ) 1509 - 1510 2007.07 [Refereed]

DOI

Scopus

2

Citation

(Scopus)
MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

Eurasip Journal on Advances in Signal Processing 2007 2007 [Refereed]

　View Summary

We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.

DOI

Scopus

95

Citation

(Scopus)
Blind audio source separation based on independent component analysis

Shoji Makino, Hiroshi Sawada, Shoko Araki

INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS 4666 843 - 843 2007 [Refereed]
Blind source separation based on a beamformer array and time frequency binary masking

Jan Cermak, Shoko Araki, Hiroshi Sawada, Shoji Makino

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS 145 - 148 2007 [Refereed]

　View Summary

This paper deals with a new technique for blind source separation (BSS) from convolutive mixtures. We present a three-stage separation system employing time-frequency binary masking, beamforming and a non-linear post processing technique. The experiments show that this system outperforms conventional time-frequency binary masking (TFBM) in both (over-)determined and underdetermined cases. Moreover it removes the musical noise and reduces interference in time-frequency slots extracted by TFBM.
MLSP 2007 data analysis competition: Frequency-domain blind source separation for convolutive mixtures of speech/audio signals

Hiroshi Sawada, Shoko Araki, Shoji Makino

Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP 45 - 50 2007 [Refereed]

　View Summary

This paper describes the frequency-domain approach to the blind source separation of speech/audio signals that are convolutively mixed in a real room environment. With the application of short-time Fourier transforms, convolutive mixtures in the time domain can be approximated as multiple instantaneous mixtures in the frequency domain. We employ complex-valued independent component analysis (ICA) to separate the mixtures in each frequency bin. Then, the permutation ambiguity of the ICA solutions should be aligned so that the separated signals are constructed properly in the time domain. We propose a permutation alignment method based on clustering the activity sequences of the frequency bin-wise separated signals. We achieved the overall winner status of MLSP 2007 Data Analysis Competition based on the presented method. ©2007 IEEE.

DOI

Scopus

18

Citation

(Scopus)
A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

Hiroshi Sawada, Shoko Araki, Shoji Makino

2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS 157 - 160 2007 [Refereed]

　View Summary

This paper proposes a two-stage method for the blind separation of convolutively mixed sources. We employ time-frequency masking, which can be applied even to an underdetermined case where the number of sensors is insufficient for the number of sources. In the first stage of the method, frequency bin-wise mixtures are classified based on Gaussian mixture model fitting. In the second stage, the permutation ambiguities of the bin-wise classified signals are aligned by clustering the posterior probability sequences calculated in the first stage. Experimental results for separating four speeches with three microphones under reverberant conditions show the superiority of the proposed method over existing methods based on time-difference-of-arrival estimations or signal envelope clustering.
Measuring dependence of bin-wise separated signals for permutation alignment in frequency-domain BSS

Hiroshi Sawada, Shoko Araki, Shoji Makino

2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11 3247 - 3250 2007 [Refereed]

　View Summary

This paper presents a new method for grouping bin-wise separated signals for individual sources, i.e., solving the permutation problem, in the process of frequency-domain blind source separation. Conventionally, the correlation coefficient of separated signal envelopes is calculated to judge whether or not the separated signals originate from the same source. In this paper, we propose a new measure that represents the dominance of the separated signal in the mixtures, and use it for calculating the correlation coefficient, instead of a signal envelope. Such dominance measures exhibit dependence/independence more clearly than traditionally used signal envelopes. Consequently, a simple clustering algorithm with centroids works well for grouping separated signals. Experimental results were very appealing, as three sources including two coming from the same direction were separated properly with the new method.
Blind speech separation in a meeting situation with maximum SNR beamformers

Shoko Araki, Hiroshi Sawada, Shoji Makino

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS 41 - 44 2007 [Refereed]

　View Summary

We propose a speech separation method for a meeting situation, where each speaker sometimes speaks and the number of speakers changes every moment. Many source separation methods have already been proposed; however, they consider a case where all the speakers keep speaking: this is not always true in a real meeting. In such cases, in addition to separation, speech detection and the classification of the detected speech according to speaker become important issues. For that purpose, we propose a method that employs a maximum signal-to-noise (MaxSNR) beamformer combined with a voice activity detector and online clustering. We also discuss the scaling ambiguity problem as regards the MaxSNR beamformer, and provide their solutions. We report some encouraging results for a real meeting in a room with a reverberation time of about 350 ms.
First stereo audio source separation evaluation campaign: Data, algorithms and results

Emmanuel Vincent, Hiroshi Sawada, Pau Bofill, Shoji Makino, Justinian P. Rosca

INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS 4666 552 - + 2007 [Refereed]

　View Summary

This article provides an overview of the first stereo audio source separation evaluation campaign, organized by the authors. Fifteen underdetermined stereo source separation algorithms have been applied to various audio data, including instantaneous, convolutive and real mixtures of speech or music sources. The data and the algorithms are presented and the estimated source signals are compared to reference signals using several objective performance criteria.
MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and l(1)-norm minimization

Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING 2007 ( 24717 ) 1 - 12 2007 [Refereed]

　View Summary

We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the l(1)-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions. Copyright (C) 2007 Stefan Winter et al.

DOI

Scopus

95

Citation

(Scopus)
Frequency domain blind source separation in a noisy environment

R. Mukai, H. Sawada, S. Araki, S. Makino

2006 Joint meeting of ASA and ASJ 1pSP1 2006.11 [Refereed]
Normalized observation vector clustering approach for sparse source separation

S. Araki, H. Sawada, R. Mukai, S. Makino

EUSIPCO2006 Wed.5.4.4 2006.09 [Refereed]
Underdetermined source separation by ICA and homomorphic signal processing

S. Winter, W. Kellermann, H. Sawada, S. Makino

IWAENC2006 Wed.Sep.8 2006.09 [Refereed]
Performance evaluation of sparse source separation and DOA estimation with observation vector clustering in reverberant environments

S. Araki, H. Sawada, R. Mukai, S. Makino

IWAENC2006 Tue.Sep.4 2006.09 [Refereed]
Blind sparse source separation with spatially smoothed time-frequency masking

S. Araki, H. Sawada, R. Mukai, S. Makino

IWAENC2006 Wed.Sep.9 2006.09 [Refereed]

CiNii
Parametric-Pearson-based independent component analysis for frequency-domain blind speech separation

H. Kato, Y. Nagahara, S. Araki, H. Sawada, S. Makino

EUSIPCO2006 Tue.4.2.5 2006.09 [Refereed]
Blind speech separation by combining beamformers and a time frequency binary mask

J. Cermak, S. Araki, H. Sawada, S. Makino

IWAENC2006 Tue.Sep.5 - 148 2006.09 [Refereed]

CiNii
Underdetermined source separation for colored sources

S. Winter, W. Kellermann, H. Sawada, S. Makino

EUSIPCO2006 Thu.3.1.6 2006.09 [Refereed]
Musical noise reduction in time-frequency-binary-masking-based blind source separation systems

J. Cermak, S. Araki, H. Sawada, S. Makino

Czech-German Workshop on Speech Processing 2006.09 [Refereed]
Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector

S Emura, Y Haneda, A Kataoka, S Makino

SIGNAL PROCESSING 86 ( 6 ) 1157 - 1167 2006.06 [Refereed]

　View Summary

Stereo echo cancellation requires a fast converging adaptive algorithm because the stereo input signals are highly cross correlated and the convergence rate of the misalignment is slow even after preprocessing for unique identification of stereo echo paths. To speed up the convergence, we propose enhancing the contribution of the decorrelated components in the preprocessed input-signal vector to adaptive updates. The adaptive filter coefficients are updated on the basis of either a single or multiple past enhanced input-signal vectors.
For a single-vector update, we show how this enhancement improves the convergence rate by analyzing the behavior of the filter coefficient error in the mean. For a two-past-vector update, simulation showed that the proposed enhancement leads to a faster decrease in misalignment than the corresponding conventional second-order affine projection algorithm while computational complexities are almost the same. (c) 2005 Elsevier B.V. All rights reserved.

DOI

Scopus

18

Citation

(Scopus)
Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 4935 - 4938 2006 [Refereed]

　View Summary

This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.
DOA estimation for multiple sparse sources with normalized observation vector clustering

Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS 33 - + 2006 [Refereed]

　View Summary

This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied; however, they are only applicable when M > N. Another conventional independent component analysis based method allows M ≥ N; however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).
Blind source separation of many signals in the frequency domain

Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 5827 - 5830 2006 [Refereed]

　View Summary

This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.
Frequency domain blind source separation of a reduced amount of data using frequency normalization

Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 5695 - 5698 2006 [Refereed]

　View Summary

The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.
Blind source separation of convolutive mixtures - art. no. 624709

Shoji Makino

Independent Component Analyses, Wavelets, Unsupervised Smart Sensors, and Neural Networks IV 6247 ( 7 ) 24709 - 24709 2006 [Refereed]

　View Summary

This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency-domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct separation filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

DOI

Scopus

4

Citation

(Scopus)
Geometrical interpretation of the PCA subspace approach for overdetermined blind source separation

S. Winter, H. Sawada, S. Makino

EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING 2006 ( 71632 ) 1-11 2006 [Refereed]

　View Summary

We discuss approaches for blind source separation where we can use more sensors than sources to obtain a better performance. The discussion focuses mainly on reducing the dimensions of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second is based on geometric considerations and selects a subset of sensors in accordance with the fact that a low frequency prefers a wide spacing, and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in the way that it emphasizes the outer sensors and yields superior results for high frequencies. These results provide a better understanding of the former method.

DOI

Scopus

12

Citation

(Scopus)
Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS 77 - + 2006 [Refereed]

　View Summary

This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.
Frequency domain blind source separation of a reduced amount of data using frequency normalization

Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS 837 - + 2006 [Refereed]

　View Summary

The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.
Underdetermined sparse source separation of convolutive mixtures with observation vector clustering

Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS 3594 - 3597 2006 [Refereed]

　View Summary

We propose a new method for solving the underdetermined sparse signal separation problem. Some sparseness based methods have already been proposed. However, most of these methods utilized a linear sensor array (or only two sensors), and therefore they have certain limitations; e.g., they cannot separate symmetrically positioned sources. To allow the use of more than three sensors that can be arranged in a non-linear/non-uniform way, we propose a new method that includes the normalization and clustering of the observation vectors. Our proposed method can handle both underdetermined and (over-)determined cases. We show practical results for speech separation with non-linear/non-uniform sensor arrangements. We obtained promising experimental results for the cases of 3 x 4, 4 x 5 (#sensors x #sources) in a room (RT60 = 120 ms).
DOA estimation for multiple sparse sources with normalized observation vector clustering

Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 4891 - 4894 2006 [Refereed]

　View Summary

This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied; however, they are only applicable when M > N. Another conventional independent component analysis based method allows M ≥ N; however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).
On calculating the inverse of separation matrix in frequency-domain blind source separation

H Sawada, S Araki, R Mukai, S Makino

INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION, PROCEEDINGS 3889 691 - 699 2006 [Refereed]

　View Summary

For blind source separation (BSS) of convolutive mixtures, the frequency-domain approach is efficient and practical, because the convolutive mixtures are modeled with instantaneous mixtures at each frequency bin and simple instantaneous independent component analysis (ICA) can be employed to separate the mixtures. However, the permutation and scaling ambiguities of ICA solutions need to be aligned to obtain proper time-domain separated signals. This paper discusses the idea that calculating the inverses of separation matrices obtained by ICA is very important as regards aligning these ambiguities. This paper also shows the relationship between the ICA-based method and the time-frequency masking method for BSS, which becomes clear by calculating the inverses.
Blind source separation of many signals in the frequency domain

Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS 969 - + 2006 [Refereed]

　View Summary

This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.
Recognition of convolutive speech mixtures by missing feature techniques for ICA

Dorothea Kolossa, Hiroshi Sawada, Ramon Fernandez Astudillo, Reinhold Orglmeister, Shoji Makino

2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5 1397 - + 2006 [Refereed]

　View Summary

One challenging problem for robust speech recognition is the cocktail party effect, where multiple speaker signals are active simultaneously in an overlapping frequency range. In that case, independent component analysis (ICA) can separate the signals, even in reverberant environments. However, the feature distortions it incurs prove detrimental to speech recognition. To reduce the resulting recognition errors, we describe the use of ICA for the additional estimation of uncertainty information. This information is subsequently used in missing feature speech recognition, which leads to far more correct and accurate recognition, including in reverberant situations at RT60 = 300 ms.
Blind separation and localization of speeches in a meeting situation

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5 1407 - + 2006 [Refereed]

　View Summary

The technique of blind source separation (BSS) has been well studied. In this paper, we apply the BSS technique, particularly based on independent component analysis (ICA), to a meeting situation. The goal is to enhance the spoken utterances and to estimate the location of each speaker by means of multiple microphones. The technique may help us to take the minutes of a meeting.
Frequency-domain blind source separation of many speech signals using near-field and far-field models

Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING 2006 ( 83683 ) 1 - 13 2006 [Refereed]

　View Summary

We discuss the frequency-domain blind source separation (BSS) of convolutive mixtures when the number of source signals is large, and the potential source locations are omnidirectional. The most critical problem related to the frequency-domain BSS is the permutation problem, and geometric information is helpful as regards solving it. In this paper, we propose a method for obtaining proper geometric information with which to solve the permutation problem when the number of source signals is large and some of the signals come from the same or a similar direction. First, we describe a method for estimating the absolute DOA by using relative DOAs obtained by the solution provided by independent component analysis (ICA) and the far-field model. Next, we propose a method for estimating the spheres on which source signals exist by using the ICA solution and the near-field model. We also address another problem with regard to frequency-domain BSS that arises from the circularity of discrete-frequency representation. We discuss the characteristics of the problem and present a solution to it. Experimental results using eight microphones in a room show that the proposed method can separate a mixture of six speech signals arriving from various directions, even when two of them come from the same direction. Copyright (C) 2006 Ryo Mukai et al.

DOI

Scopus

26

Citation

(Scopus)
Subband-based blind separation for convolutive mixtures of speech

S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E88A ( 12 ) 3593 - 3603 2005.12 [Refereed]

　View Summary

We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals: (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.

DOI

Scopus

21

Citation

(Scopus)
Underdetermined blind separation for speech in real environments with F0 adaptive comb filtering

F. Flego, S. Araki, H. Sawada, T. Nakatani, S. Makino

IWAENC2005 93-96 2005.09 [Refereed]
Real-time blind source separation and DOA estimation using small 3-D microphone array

R. Mukai, H. Sawada, S. Araki, S. Makino

IWAENC2005 45-48 2005.09 [Refereed]
Real-time blind extraction of dominant target sources from many background interference sources

H. Sawada, R. Mukai, S. Araki, S. Makino

IWAENC2005 73-76 2005.09 [Refereed]

CiNii
A novel blind source separation method with observation vector clustering

S. Araki, H. Sawada, R. Mukai, S. Makino

IWAENC2005 117-120 2005.09 [Refereed]
Blind source separation of convolutive mixtures of audio signals in frequency domain

S. Makino

Advances in Circuits and Systems ( 5 ) 2005.08 [Refereed]
Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation

A Blin, S Araki, S Makino

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E88A ( 7 ) 1693 - 1700 2005.07 [Refereed]

　View Summary

This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberation time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that the proposed method greatly reduces musical noise compared with the classical approaches.

DOI

Scopus

20

Citation

(Scopus)
Blind source separation of convolutive mixtures of speech in frequency domain

S Makino, H Sawada, R Mukai, S Araki

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E88A ( 7 ) 1640 - 1655 2005.07 [Refereed] [Invited]

　View Summary

This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results of 2 x 2, 3 x 3, 4 x 4, 6 x 8, and 2 x 2 (moving sources), (#sources x #microphones) in a room are promising.

DOI

Scopus

58

Citation

(Scopus)
Frequency-domain blind source separation without array geometry information

H. Sawada, R. Mukai, S. Araki, S. Makino

HSCMA2005 d13-d14 2005.03 [Refereed]
Blind source separation and DOA estimation using small 3-D microphone array

R. Mukai, H. Sawada, S. Araki, S. Makino

HSCMA2005 (Joint Workshop on Hands-Free Speech Communication and Microphone Arrays) d9-d10 2005.03 [Refereed]
Source extraction from speech mixtures with null-directivity pattern based mask

S. Araki, S. Makino, H. Sawada, R. Mukai

HSCMA2005 d1-d2 2005.03 [Refereed]
Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

Proceedings - IEEE International Symposium on Circuits and Systems 5882 - 5885 2005 [Refereed]

　View Summary

This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method. © 2005 IEEE.

DOI

Scopus

12

Citation

(Scopus)
Blind extraction of a dominant source signal from mixtures of many sources

Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings III III61 - III64 2005 [Refereed]

　View Summary

This paper presents a method for enhancing a dominant target source that is close to sensors, and suppressing other interferences. The enhancement is performed blindly, i.e. without knowing the number of total sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage processing technique where a spatial filter is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. To obtain the spatial filter we employ independent component analysis and then select the component of the target source. Time-frequency masks in the second stage are obtained by calculating the angle between the basis vector corresponding to the target source and a sample vector. The experimental results for a simulated cocktail party situation were very encouraging. ©2005 IEEE.

DOI

Scopus

20

Citation

(Scopus)
Multiple source localization using independent component analysis

Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

IEEE Antennas and Propagation Society, AP-S International Symposium (Digest) 4 ( P3 ) 81 - 84 2005 [Refereed]

　View Summary

This paper presents a method for estimating location information about multiple sources. The proposed method uses independent component analysis (ICA) as a main statistical tool. The nearfield model as well as the farfield model can be assumed in this method. As an application of the method, we show experimental results for the direction-of-arrival (DOA) estimation of three sources that were positioned 3-dimensionally. © 2005 IEEE.

DOI

Scopus

20

Citation

(Scopus)
Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask

S Araki, S Makino, H Sawada, R Mukai

2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 III 81 - 84 2005 [Refereed]

　View Summary

Musical noise is a typical problem with blind source separation using a time-frequency mask. In this paper, we report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by the results of a listening test undertaken in a room with a reverberation time of RT60 = 130 ms.
Estimating the number of sources using independent component analysis

Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

Acoustical Science and Technology 26 ( 5 ) 450 - 452 2005 [Refereed]

　View Summary

A new approach for estimating the number of sources that employs independent component analysis (ICA) is discussed. Estimating the number of sources provides information for signal processing applications such as blind source separation (BSS) in the frequency domain. The new method can identify a noise component that includes reverberations by calculating the correlation of the envelopes. The results show that the characteristics of the proposed approach compare with the conventional eigenvalue-based method.

DOI

Scopus

15

Citation

(Scopus)
Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

H Sawada, S Araki, R Mukai, S Makino

2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS III 5882 - 5885 2005 [Refereed]

　View Summary

This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method.
A spatio-temporal FastICA algorithm for separating convolutive mixtures

SC Douglas, H Sawada, S Makino

2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 V 165 - 168 2005 [Refereed]

　View Summary

This paper presents a spatio-temporal extension of the well-known fastICA algorithm of Hyvarinen and Oja that is applicable to both convolutive blind source separation and multichannel blind deconvolution tasks. Our time-domain algorithm combines multichannel spatio-temporal prewhitening via multi-stage least-squares linear prediction with a fixed-point iteration involving a new adaptive technique for imposing paraunitary constraints on the multichannel separation filter. Our technique also allows for efficient reconstruction of individual signals as observed in the sensor measurements for single-input, multiple-output (SIMO) BSS tasks. Analysis and simulations verify the utility of the proposed methods.
Blind Source Separation of 3-D located many speech signals

R Mukai, H Sawada, S Araki, S Makino

2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) 9 - 12 2005 [Refereed]

　View Summary

This paper presents a prototype system for Blind Source Separation (BSS) of many speech signals and describes the techniques used in the system. Our system uses 8 microphones located at the vertexes of a 4 cm x 4 cm x 4 cm cube and has the ability to separate signals distributed in three-dimensional space. The mixed signals observed by the microphone array are processed by Independent Component Analysis (ICA) in the frequency domain and separated into a given number of signals (up to 8). We carried out experiments in an ordinary office and obtained more than 20 dB of SIR improvement.
On real and complex valued l(1)-norm minimization for overcomplete blind source separation

S Winter, H Sawada, S Makino

2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) 86 - 89 2005 [Refereed]

　View Summary

A maximum a-posteriori approach for overcomplete blind source separation based on Laplacian priors usually involves l(1)-norm minimization. It requires different approaches for real and complex numbers as they appear for example in the frequency domain. In this paper we compare a combinatorial approach for real numbers with a second order cone programming approach for complex numbers.
Although the combinatorial solution with a proven minimum number of zeros is not theoretically justified for complex numbers, its performance quality is comparable to the performance of the second order cone programming (SOCP) solution. However, it has the advantage that it is faster for complex overcomplete BSS problems with low input/output dimensions.
Hierarchical clustering applied to overcomplete BSS for convolutive mixtures

S. Winter, H. Sawada, S. Araki, S. Makino

SAPA2004 (ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing) 1 ( 3 ) 1-6 2004.10 [Refereed]
Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

S. Araki, S. Makino, H. Sawada, R. Mukai

EUSIPCO2004 1991-1994 2004.09 [Refereed]
Blind source separation for moving speech signals using blockwise ICA and residual crosstalk subtraction

R Mukai, H Sawada, S Araki, S Makino

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E87A ( 8 ) 1941 - 1948 2004.08 [Refereed]

　View Summary

This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.
Convolutive blind source separation for more than two sources in the frequency domain

Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

Acoustical Science and Technology 25 ( 4 ) 296 - 298 2004.07 [Refereed]

　View Summary

The use of the blind source separation (BSS) technique for the recovery of more than two sources in the frequency domain was presented. It was found that the frequency-domain BSS method was practically applicable to more than two sources once the problems of permutation and circularity were overcome. The error could be minimized by adjusting the scaling ambiguity of the independent component analysis (ICA) solution before windowing. The results show the effectiveness and efficiency of the BSS method in the separation of six sources with a planar array of eight sensors.

DOI

Scopus

4

Citation

(Scopus)
Underdetermined blind source separation for convolutive mixtures exploiting a sparseness-mixing matrix estimation (SMME)

A. Blin, S. Araki, S. Makino

ICA2004 (International Congress on Acoustics) IV 3139-3142 2004.04 [Refereed]
A causal frequency-domain implementation of a natural gradient multichannel blind deconvolution and source separation algorithm

S. Douglas, H. Sawada, S. Makino

ICA2004 (International Congress on Acoustics) I 85-88 2004.04 [Refereed]
Solving the permutation and circularity problems of frequency-domain blind source separation

H. Sawada, R. Mukai, S. Araki, S. Makino

ICA2004 (International Congress on Acoustics) I 89-92 2004.04 [Refereed]
Algorithmic complexity based blind source separation for convolutive speech mixtures

S. de la Kethulle, R. Mukai, H. Sawada, S. Makino

ICA2004 (International Congress on Acoustics) IV 3127-3130 2004.04 [Refereed]
A solution for the permutation problem in frequency domain BSS using near- and far-field models

R. Mukai, H. Sawada, S. Araki, S. Makino

ICA2004 (International Congress on Acoustics) IV 3135-3138 2004.04 [Refereed]
Underdetermined blind separation of convolutive mixtures of speech by combining time-frequency masks and ICA

S. Araki, S. Makino, A. Blin, R. Mukai, H. Sawada

ICA2004 (International Congress on Acoustics) I 321-324 2004.04 [Refereed]
Evaluation of separation and dereverberation performance in frequency domain blind source separation

Ryo Mukai, Shoko Araki, Hiroshi Sawada, Shoji Makino

Acoustical Science and Technology 25 ( 2 ) 119 - 126 2004.03 [Refereed]

　View Summary

In this paper, we propose a new method for evaluating the separation and dereverberation performance of a convolutive blind source separation (BSS) system, and investigate a separating system obtained by employing frequency domain BSS based on independent component analysis (ICA). As a result, we reveal the acoustical characteristics of the frequency domain BSS for convolutive mixture of speech signals. We show that the separating system removes the direct sound of a jammer signal even when the frame length is relatively short, and it also reduces the reverberation of the jammer according to the frame length. We also confirm that the reverberation of the target is not reduced. Moreover, we propose a technique, suggested by the experimental results, for improving the quality of the separated signals by removing pre-echo noise.

DOI

Scopus

9

Citation

(Scopus)
Underdetermined blind separation for speech in real environments with sparseness and ICA

S Araki, S Makino, A Blin, R Mukai, H Sawada

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS III 881 - 884 2004 [Refereed]

　View Summary

In this paper, we propose a method for separating speech signals when there are more signals than sensors. Several methods have already been proposed for solving the underdetermined problem, and some of these utilize the sparseness of speech signals. These methods employ binary masks to extract the signals, and therefore, their extracted signals contain loud musical noise. To overcome this problem, we propose combining a sparseness approach and independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion even in reverberant conditions of TR = 130 and 200 ms.
Frequency domain blind source separation using small and large spacing sensor pairs

R Mukai, H Sawada, S Araki, S Makino

2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS V 1 - 4 2004 [Refereed]

　View Summary

This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometrical information for solving the permutation problem. Experimental results show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.
Convolutive blind source separation for more than two sources in the frequency domain

H Sawada, R Mukai, S Araki, S Makino

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS III 885 - 888 2004 [Refereed]

Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.
Audio source separation based on independent component analysis

S Makino, S Araki, R Mukai, H Sawada

2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS V 668 - 671 2004 [Refereed]

This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct unmixing filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.
Near-field frequency domain blind source separation for convolutive mixtures

R Mukai, H Sawada, S Araki, S Makino

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS IV 49 - 52 2004 [Refereed]

This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when source signals come from the same or similar directions. Geometric information such as the direction of arrival (DOA) is helpful for solving the permutation problem, and a combination of the DOA-based and correlation-based methods provides a robust and precise solution. However, when signals come from similar directions, the DOA-based approach fails, and we have to use only the correlation-based method, whose performance is unstable. In this paper, we show that an interpretation of the ICA solution by a near-field model yields information about spheres on which source signals exist, which can be used as an alternative to the DOA. Experimental results show that the proposed method can robustly separate a mixture of signals arriving from the same direction.
On coefficient delay in natural gradient blind deconvolution and source separation algorithms

SC Douglas, H Sawada, S Makino

INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 634 - 642 2004 [Refereed]

In this paper, we study the performance effects caused by coefficient delays in natural gradient blind deconvolution and source separation algorithms. We present a statistical analysis of the effect of coefficient delays within such algorithms, quantifying the relative loss in performance caused by such coefficient delays with respect to delayless algorithm updates. We then propose a simple change to one such algorithm to improve its convergence performance.
Overcomplete BSS for convolutive mixtures based on hierarchical clustering

S Winter, H Sawada, S Araki, S Makino

INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 652 - 660 2004 [Refereed]

In this paper we address the problem of overcomplete BSS for convolutive mixtures following a two-step approach. In the first step the mixing matrix is estimated, which is then used to separate the signals in the second step. For estimating the mixing matrix we propose an algorithm based on hierarchical clustering, assuming that the source signals are sufficiently sparse. It has the advantage of working directly on the complex valued sample data in the frequency-domain. It also shows better convergence than algorithms based on self-organizing maps. The results are improved by reducing the variance of direction of arrival. Experiments show accurate estimations of the mixing matrix and very low musical tone noise.
Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

SC Douglas, H Sawada, S Makino

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS V 477 - 480 2004 [Refereed]

Practical gradient-based adaptive algorithms for multichannel blind deconvolution and convolutive blind source separation typically employ FIR filters for the separation system. Inadequate use of signal truncation within these algorithms can introduce steady-state biases into their converged solutions that lead to degraded separation and deconvolution performance. In this paper, we derive a natural gradient multichannel blind deconvolution and source separation algorithm that mitigates these effects for estimating causal FIR solutions to these tasks. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for acoustic sources, even for extremely short separation filters.
Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA

S Araki, S Makino, H Sawada, R Mukai

INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 898 - 905 2004 [Refereed]

We propose a method for separating N speech signals with M sensors where N > M. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose using a directivity pattern based continuous mask, which masks N - M sources in the observations, and independent component analysis (ICA) to separate the remaining mixtures. We conducted experiments for N = 3 with M = 2 and N = 4 with M = 2, and obtained separated signals with little distortion.
A sparseness - Mixing Matrix Estimation (SMME) solving the underdetermined BSS for convolutive mixtures

A Blin, S Araki, S Makino

2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS IV 85 - 88 2004 [Refereed]

We propose a method for blindly separating real-environment speech signals with as little distortion as possible in the special case where speech signals outnumber sensors. Our idea is to combine sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to perform a preliminary separation and to detect when only one source is active. This information is then used to estimate the mixing matrix. Then we remove one source from the observations and separate the residual signals with the inverse of the estimated mixing matrix. Experimental results in a real environment (T_R = 130 ms and 200 ms) show that our proposed method, which we call Sparseness Mixing Matrix Estimation (SMME), provides separated signals of better quality than those extracted by only using the sparseness property of the speech signal.
Frequency domain blind source separation for many speech signals

R Mukai, H Sawada, S Araki, S Makino

INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 461 - 469 2004 [Refereed]

This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometric information for solving the permutation problem. Experimental results in a room (reverberation time T_R = 130 ms) with eight microphones show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.
Estimating the number of sources for frequency-domain blind source separation

H Sawada, S Winter, R Mukai, S Araki, S Makino

INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION 3195 610 - 617 2004 [Refereed]

Blind source separation (BSS) for convolutive mixtures can be performed efficiently in the frequency domain, where independent component analysis (ICA) is applied separately in each frequency bin. To solve the permutation problem of frequency-domain BSS robustly, information regarding the number of sources is very important. This paper presents a method for estimating the number of sources from convolutive mixtures of sources. The new method estimates the power of each source or noise component by using ICA and a scaling technique to distinguish sources and noises. Also, a reverberant component can be identified by calculating the correlation of component envelopes. Experimental results for up to three sources show that the proposed method worked well in a reverberant condition whose reverberation time was 200 ms.
Underdetermined blind separation of convolutive mixtures of speech with binary masks and ICA

S. Araki, S. Makino, H. Sawada, A. Blin, R. Mukai

NIPS2003 Workshop on ICA: Sparse Representations in Signal Processing 2 ( 7 ) 1-4 2003.12 [Refereed]
Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming for convolutive mixtures

S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, H. Saruwatari

EURASIP Journal on Applied Signal Processing 2003 ( 11 ) 1157-1166 2003.11 [Refereed]

Blind source separation when speech signals outnumber sensors using a sparseness-mixing matrix estimation (SMME)

A. Blin, S. Araki, S. Makino

IWAENC2003 211-214 2003.09 [Refereed]

Blind separation of more speech than sensors with less distortion by combining sparseness and ICA

S. Araki, S. Makino, A. Blin, R. Mukai, H. Sawada

IWAENC2003 271-274 2003.09 [Refereed]

Spectral smoothing for frequency-domain blind source separation

H. Sawada, R. Mukai, S. de la Kethulle, S. Araki, S. Makino

IWAENC2003 311-314 2003.09 [Refereed]

Blind source separation for convolutive mixtures based on complexity minimization

S. de la Kethulle, R. Mukai, H. Sawada, S. Makino

IWAENC2003 303-306 2003.09 [Refereed]
Array geometry arrangement for frequency domain blind source separation

R. Mukai, H. Sawada, S. de la Kethulle, S. Araki, S. Makino

IWAENC2003 219-222 2003.09 [Refereed]

Multistage ICA for blind source separation of real acoustic convolutive mixture

T. Nishikawa, H. Saruwatari, K. Shikano, S. Araki, S. Makino

ICA2003 523-528 2003.04 [Refereed]

Subband based blind source separation with appropriate processing for each frequency band

S. Araki, S. Makino, R. Aichner, T. Nishikawa, H. Saruwatari

ICA2003 499-504 2003.04 [Refereed]

Geometrical interpretation of the PCA subspace method for overdetermined blind source separation

S. Winter, H. Sawada, S. Makino

ICA2003 775-780 2003.04 [Refereed]

Real-time blind source separation for moving speakers using blockwise ICA and residual crosstalk subtraction

R. Mukai, H. Sawada, S. Araki, S. Makino

ICA2003 975-980 2003.04 [Refereed]

On-line time-domain blind source separation of nonstationary convolved signals

R. Aichner, H. Buchner, S. Araki, S. Makino

ICA2003 987-992 2003.04 [Refereed]
A robust and precise method for solving the permutation problem of frequency-domain blind source separation

H. Sawada, R. Mukai, S. Araki, S. Makino

ICA2003 505-510 2003.04 [Refereed]

Geometrically constrained ICA for robust separation of sound mixtures

M. Knaak, S. Araki, S. Makino

ICA2003 951-956 2003.04 [Refereed]
Polar coordinate based nonlinear function for frequency-domain blind source separation

H Sawada, R Mukai, S Araki, S Makino

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E86A ( 3 ) 590 - 596 2003.03 [Refereed]

This paper discusses a nonlinear function for independent component analysis to process complex-valued signals in frequency-domain blind source separation. Conventionally, nonlinear functions based on the Cartesian coordinates are widely used. However, such functions have a convergence problem. In this paper, we propose a more appropriate nonlinear function that is based on the polar coordinates of a complex number. In addition, we show that the difference between the two types of functions arises from the assumed densities of independent components. Our discussion is supported by several experimental results for separating speech signals, which show that the polar type nonlinear functions behave better than the Cartesian type.
A robust approach to the permutation problem of frequency-domain blind source separation

H Sawada, R Mukai, S Araki, S Makino

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS V 381 - 384 2003 [Refereed]

This paper presents a robust and precise method for solving the permutation problem of frequency-domain blind source separation. It is based on two previous approaches: the direction of arrival estimation approach and the inter-frequency correlation approach. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit the advantages of both. We also present a closed-form formula to calculate a null direction, which is used in estimating the directions of source signals. Experimental results show that our method solved permutation problems almost perfectly in a situation where two sources were mixed in a room whose reverberation time was 300 ms.
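The inter-frequency correlation idea mentioned in this summary can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows how the output channels of one frequency bin might be reordered so that their amplitude envelopes correlate best with those of a reference bin. The function names `correlation` and `align_bin`, and the exhaustive search over permutations, are assumptions made for illustration.

```python
from itertools import permutations
from statistics import mean


def correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom > 0 else 0.0


def align_bin(reference_envelopes, bin_envelopes):
    """Return the channel order of bin_envelopes that best matches the reference.

    Both arguments are lists of amplitude envelopes, one per separated channel.
    Every permutation is scored by the sum of envelope correlations, which is
    only practical for a small number of sources.
    """
    n = len(reference_envelopes)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(
            correlation(reference_envelopes[i], bin_envelopes[p[i]])
            for i in range(n)
        ),
    )
    return list(best)
```

For two channels whose envelopes appear swapped relative to the reference bin, `align_bin` returns `[1, 0]`, meaning that output 1 of the current bin should be paired with source 0 of the reference. The summary describes combining this kind of correlation cue with direction of arrival estimates; that integration is not shown here.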
Robust real-time blind source separation for moving speakers in a room

R Mukai, H Sawada, S Araki, S Makino

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS V 469 - 472 2003 [Refereed]

This paper describes a robust real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in real time.
Geometrically constraint ICA for convolutive mixtures of sound

M Knaak, S Araki, S Makino

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS II 725 - 728 2003 [Refereed]

The goal of this contribution is a new algorithm using independent component analysis with a geometrical constraint. The new algorithm solves the permutation problem of blind source separation of acoustic mixtures, and it is significantly less sensitive to the precision of the geometrical constraint than an adaptive beamformer. A high degree of robustness is very important since the steering vector is always roughly estimated in the reverberant environment, even when the look direction is precise. The new algorithm is based on FastICA and constrained optimization. It is theoretically and experimentally analyzed with respect to the roughness of the steering vector estimation by using impulse responses of a real room. The effectiveness of the algorithm for real-world mixtures is also shown in the case of three sources and three microphones.
Direction of arrival estimation for multiple source signals using independent component analysis

H Sawada, R Mukai, S Makino

SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS 411 - 414 2003 [Refereed]

This paper presents a new method for estimating the directions of source signals. We assume a situation in which multiple source signals are mixed in a reverberant condition and observed at several sensors. The new method is based on independent component analysis, which separates mixed signals into original source signals. It can be applied where the number of sources is equal to the number of sensors, whereas the conventional methods based on sub-space analysis, such as the MUSIC algorithm, are applicable where there are fewer sources than sensors. Even in cases where the MUSIC algorithm can be applied, the new method is better at estimating the directions of sources if they are closely placed.
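As a rough illustration of how a direction might be read off once the mixing from one source to two sensors is known, the sketch below inverts a generic far-field (plane wave) model. This is an assumption made for illustration and not necessarily the formula used in the paper: the response ratio between the two sensors is taken to be a pure phase term determined by the sensor spacing and the arrival angle. The names `steering_ratio` and `doa_from_ratio`, the sign convention, and the 340 m/s sound speed are hypothetical choices.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def steering_ratio(theta, freq_hz, spacing_m):
    """Far-field response ratio a2/a1 for a source at angle theta (radians)."""
    delay = spacing_m * math.cos(theta) / SPEED_OF_SOUND
    return cmath.exp(2j * math.pi * freq_hz * delay)


def doa_from_ratio(ratio, freq_hz, spacing_m):
    """Invert the far-field model: recover theta (radians) from the ratio a2/a1."""
    cos_theta = (
        cmath.phase(ratio) * SPEED_OF_SOUND / (2 * math.pi * freq_hz * spacing_m)
    )
    # Clamp to the valid range to guard against rounding and model mismatch.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)
```

In an ICA-based setting the ratio would come from one column of the estimated mixing matrix (the inverse of the unmixing matrix) in a given frequency bin. The inversion is unambiguous only while the spacing is shorter than half a wavelength; above that frequency the phase wraps.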
Subband based blind source separation for convolutive mixtures of speech

S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS V 509 - 512 2003 [Refereed]

Subband processing is applied to blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed frame-shift is used to cover reverberation, the number of samples in each frequency bin decreases and the separation performance is degraded. In our proposed subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can handle long reverberation. Subband BSS achieves better performance than frequency-domain BSS. Moreover, we propose efficient separation procedures that take into consideration the frequency characteristics of room reverberation and speech signals. We achieve this (3) by using longer unmixing filters in low frequency bands, and (4) by adopting overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized in the proposed subband BSS.
Geometrical understanding of the PCA subspace method for overdetermined blind source separation

S Winter, H Sawada, S Makino

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS 769 - 772 2003 [Refereed]

In this paper, we discuss approaches for blind source separation where we can use more sensors than the number of sources for better performance. The discussion focuses mainly on reducing the dimension of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis (PCA), where noise reduction is achieved. The second involves selecting a subset of sensors based on the fact that a low frequency prefers a wide spacing and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method at low frequencies, in that it emphasizes the outer sensors, and yields superior results at high frequencies; this provides a better understanding of the PCA-based method.
Natural gradient blind deconvolution and equalization using causal FIR filters

SC Douglas, HO Sawada, S Makino

CONFERENCE RECORD OF THE THIRTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2 2 ( 3 ) 197 - 201 2003 [Refereed]

Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as blind deconvolution and equalization. Practical implementations of such methods require truncation of the filter impulse responses within the gradient updates. In this paper, we show how truncation of these filter impulse responses can create convergence problems and introduce a bias into the steady-state solution of one such algorithm. We then show how this algorithm can be modified to effectively mitigate these effects for estimating causal FIR approximations to doubly-infinite IIR equalizers. Simulations indicate that the modified algorithm provides the convergence benefits of the natural gradient while still attaining good steady-state performance.
ICA-based blind source separation of sounds

S. Makino, S. Araki, R. Mukai, H. Sawada, H. Saruwatari

JCA2002 (China-Japan Joint Conference on Acoustics) 83-86 2002.11 [Refereed]

Digital technologies for controlling room acoustics

M. Miyoshi, S. Makino

JCA2002 (China-Japan Joint Conference on Acoustics) 19-24 2002.11 [Refereed]
Blind source separation for convolutive mixtures of speech using subband processing

S. Araki, S. Makino, R. Aichner, T. Nishikawa, H. Saruwatari

SMMSP2002 (International Workshop on Spectral Methods and Multirate Signal Processing) 195-202 2002.09 [Refereed]

Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming

S Araki, Y Hinamoto, S Makino, T Nishikawa, R Mukai, H Saruwatari

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS 1785 - 1788 2002 [Refereed]

Frequency domain Blind Source Separation (BSS) is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., Adaptive Beamformers (ABFs). The minimization of the off-diagonal components in the BSS update equation can be viewed as the minimization of the mean square error in the ABF. The unmixing matrix of the BSS and the filter coefficients of the ABF converge to the same solution in the mean square error sense if the two source signals are ideally independent. Therefore, the performance of the BSS is limited by that of the ABF. This understanding gives an interpretation of BSS from a physical point of view.
Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming

R Aichner, S Araki, S Makino

NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS 445 - 454 2002 [Refereed]

We propose a time-domain BSS algorithm that utilizes geometric information such as sensor positions and assumed locations of sources. The algorithm tackles the problem of convolved mixtures by explicitly exploiting the non-stationarity of the acoustic sources. The learning rule is based on second-order statistics and is derived by natural gradient minimization. The proposed initialization of the algorithm is based on the null beamforming principle. This method leads to improved separation performance, and the algorithm is able to estimate long unmixing FIR filters in the time domain due to the geometric initialization. We also propose a post-filtering method for dewhitening which is based on the scaling technique in frequency-domain BSS. The validity of the proposed method is shown by computer simulations. Our experimental results confirm that the algorithm is capable of separating real-world speech mixtures and can be applied to short learning data sets down to a few seconds. Our results also confirm that the proposed dewhitening post-filtering method maintains the spectral content of the original speech in the separated output.
Enhanced frequency-domain adaptive algorithm for stereo echo cancellation

S Emura, Y Haneda, S Makino

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS II 1901 - 1904 2002 [Refereed]

Highly cross-correlated input signals create the problem of slow convergence of misalignment in stereo echo cancellation even after undergoing non-linear preprocessing. We propose a new frequency-domain adaptive algorithm that improves the convergence rate by increasing the contribution of non-linearity in the adjustment vector. Computer simulation showed that it is effective when the non-linearity gain is small.
Polar coordinate based nonlinear function for frequency-domain blind source separation

H Sawada, R Mukai, S Araki, S Makino

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS I 1001 - 1004 2002 [Refereed]

This paper presents a new type of nonlinear function for independent component analysis to process complex-valued signals, which is used in frequency-domain blind source separation. The new function is based on the polar coordinates of a complex number, whereas the conventional one is based on the Cartesian coordinates. The new function is derived from the probability density function of frequency-domain signals that are assumed to be independent of the phase. We show that the difference between the two types of functions is in the assumed densities of independent components. Experimental results for separating speech signals show that the new nonlinear function behaves better than the conventional one.
Removal of residual cross-talk components in blind source separation using time-delayed spectral subtraction

R Mukai, S Araki, H Sawada, S Makino

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS II 1789 - 1792 2002 [Refereed]

This paper describes a post-processing method to refine output signals obtained by Blind Source Separation (BSS). The performance of BSS using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the cross-talk components derived from the reverberation of the jammer signal. Utilizing this knowledge, we propose a new method, time-delayed non-stationary spectral subtraction, which removes the residual components from the separated signals precisely. The proposed method compensates for the weakness of BSS in a reverberant environment. Experimental results using speech signals show that the proposed method improves the signal-to-noise ratio by 3 to 5 dB.
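The subtraction step of such a post-processor can be sketched as follows. This is a minimal illustration of generic magnitude spectral subtraction, assuming that an estimate of the residual cross-talk magnitude is already available for each time-frequency point; how the paper obtains that estimate (the time-delayed, non-stationary part) is not reproduced here. The function name `spectral_subtract` and the `floor` parameter are hypothetical.

```python
import cmath


def spectral_subtract(separated, crosstalk_magnitude, floor=0.0):
    """Subtract an estimated cross-talk magnitude from each complex STFT value.

    separated: complex spectral values of one separated output.
    crosstalk_magnitude: non-negative magnitude estimates, same length.
    The phase of the separated signal is kept; magnitudes are floored so that
    they never become negative.
    """
    cleaned = []
    for value, estimate in zip(separated, crosstalk_magnitude):
        magnitude = max(abs(value) - estimate, floor)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return cleaned
```

For example, a value of `3+4j` (magnitude 5) with an estimated cross-talk magnitude of 2 keeps its phase and is scaled to magnitude 3, while a value whose magnitude is below the estimate is set to the floor.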
Removal of residual crosstalk components in blind source separation using LMS filters

R Mukai, S Araki, H Sawada, S Makino

NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS 435 - 444 2002 [Refereed]

The performance of Blind Source Separation (BSS) using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the residual crosstalk components derived from the reverberation of the jammer signal. This paper describes a post-processing method designed to refine output signals obtained by BSS. We propose a new method which uses LMS filters in the frequency domain to estimate the residual crosstalk components in separated signals. The estimated components are removed by non-stationary spectral subtraction. The proposed method removes the residual components precisely and thus compensates for the weakness of BSS in a reverberant environment. Experimental results using speech signals show that the proposed method improves the signal-to-interference ratio by 3 to 5 dB.
Blind source separation with different sensor spacing and filter length for each frequency range

H Sawada, S Araki, R Mukai, S Makino

NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS 465 - 474 2002 [Refereed]

This paper presents a method for blind source separation using several separating subsystems whose sensor spacing and filter length can be configured individually. Each subsystem is responsible for source separation of an allocated frequency range. With this mechanism, we can use appropriate sensor spacing as well as filter length for each frequency range. We obtained better separation performance than with the conventional method by using a wide sensor spacing and a long filter for a low frequency range, and a narrow sensor spacing and a short filter for a high frequency range.
Separation and dereverberation performance of frequency domain blind source separation

R. Mukai, S. Araki, S. Makino

ICA2001 230-235 2001.12 [Refereed]
A polar-coordinate based activation function for frequency domain blind source separation

H. Sawada, R. Mukai, S. Araki, S. Makino

ICA2001 663-668 2001.12 [Refereed]

CiNii
Blind source separation in a real room

S. Makino, S. Araki, R. Mukai, S. Katagiri

Journal of IEICE Japan 84 ( 11 ) 848 - 848 2001.11 [Refereed]

CiNii
Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

S. Araki, S. Makino, R. Mukai, H. Saruwatari

CRAC (A workshop on Consistent and Reliable acoustic cues for sound analysis) 2 ( 4 ) 1-4 2001.09 [Refereed]
ICASSP2001 conference report

S. Makino, S. Araki

Journal of Japanese Society for Artificial Intelligence 16 ( 5 ) 736-737 2001.09 [Refereed]
Adaptive filtering algorithm enhancing decorrelated additive signals for stereo echo cancellation

S. Emura, Y. Haneda, S. Makino

IWAENC2001 67-70 2001.09 [Refereed]
Separation and dereverberation performance of frequency domain blind source separation in a reverberant environment

R. Mukai, S. Araki, S. Makino

IWAENC2001 127-130 2001.09 [Refereed]

CiNii
Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers

S. Araki, S. Makino, R. Mukai, H. Saruwatari

Eurospeech2001 2595-2598 2001.09 [Refereed]
Separation and dereverberation performance of frequency domain blind source separation for speech in a reverberant environment

R. Mukai, S. Araki, S. Makino

Eurospeech2001 (European Conference on Speech Communication and Technology) 2599-2602 2001.09 [Refereed]
A design of a hands-free communication unit using loudspeakers and microphones with a flat directional pattern

A. Nakagawa, S. Shimauchi, Y. Haneda, S. Aoki, S. Makino

J. Acoust. Soc. Jpn 57 ( 8 ) 509-516 2001.08 [Refereed]

CiNii
Fundamental limitation of frequency domain Blind Source Separation for convolutive mixture of speech

S. Araki, S. Makino, T Nishikawa, H Saruwatari

2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS 2737 - 2740 2001 [Refereed]

Despite several recent proposals to achieve Blind Source Separation (BSS) for realistic acoustic signals, separation performance is still insufficient. In particular, when the room impulse response is long, performance is highly limited. In this paper, we show that it is useless to be constrained by the condition P ≪ T, where T is the FFT frame size and P is the length of the room impulse response. In our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation times of T_R = 150 and 300 ms. We also clarify the reason for the poor performance of BSS in a long reverberant environment: separation is achieved chiefly for the sound from the direction of the jammer, because BSS cannot calculate the inverse of the room transfer function for both the target and jammer signals.
Stereophonic acoustic echo cancellation: An overview and recent solutions

S. Makino

Acoustical Science and Technology 22 ( 5 ) 325 - 333 2001 [Refereed]

The fundamental problems of stereophonic acoustic echo cancellation were discussed and the recent solutions were reviewed. The stereo echo cancellation was achieved by linearly combining two monaural echo cancellers. A duo-filter control system including a continually running adaptive filter and a fixed filter was used for double-talk control. A second-order stereo projection algorithm was used in the adaptive filter, and a stereo voice switch was also implemented.

DOI CiNii

Citations (Scopus): 11
Subjective assessment of the desired echo return loss for subband acoustic echo cancellers

S Sakauchi, Y Haneda, S Makino, M Tanaka, Y Kaneda

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E83A ( 12 ) 2633 - 2639 2000.12 [Refereed]

We investigated the dependence of the desired echo return loss on frequency for various hands-free telecommunication conditions by subjective assessment. The desired echo return loss as a function of frequency (DERLf) is an important factor in the design and performance evaluation of a subband echo canceller, and it is a measure of what is considered an acceptable echo caused by electrical loss in the transmission line. The DERLf during single-talk was obtained as attenuated band-limited echo levels that subjects did not find objectionable when listening to the near-end speech and its band-limited echo under various hands-free telecommunication conditions. When we investigated the DERLf during double-talk, subjects also heard the speech in the far-end room from a loudspeaker. The echo was limited to a 250-Hz bandwidth assuming the use of a subband echo canceller. The test results showed that: (1) when the transmission delay was short (30 ms), the echo component around 2 to 3 kHz was the most objectionable to listeners; (2) as the transmission delay rose to 300 ms, the echo component around 1 kHz became the most objectionable; (3) when the room reverberation time was relatively long (about 500 ms), the echo component around 1 kHz was the most objectionable even if the transmission delay was short; and (4) the DERLf during double-talk was about 5 to 10 dB lower than that during single-talk. Use of these DERLf values will enable the design of more efficient subband echo cancellers.
A study of microphone system for hands-free teleconferencing units

Akira Nakagawa, Suehiro Shimauchi, Shoji Makino

Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) 21 ( 1 ) 33 - 35 2000 [Refereed]

DOI

Scopus
Channel-number-compressed multi-channel acoustic echo canceller for high-presence teleconferencing system with large display

A Nakagawa, S Shimauchi, Y Haneda, S Aoki, S Makino

2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI 813 - 816 2000 [Refereed]

Sound localization is important to make conversation easy between local and remote sites in a teleconference. This requires a multi-channel sound system having a multi-channel acoustic echo canceller (MAEC). The appropriate number of channels is determined from a trade-off between high presence and MAEC performance, so it is not possible to increase the channel number by much.
We propose a channel-number-compressed MAEC to provide teleconferencing systems that exhibit high presence. The channel number of the MAEC inputs is compressed and that of its outputs is expanded.
Hybrid of acoustic echo cancellers and voice switching control for multi-channel applications

S. Shimauchi, A. Nakagawa, Y. Haneda, S. Makino

IWAENC99 48-51 1999.09 [Refereed]

CiNii
Subband echo canceler with an exponentially weighted stepsize NLMS adaptive filter

S Makino, Y Haneda

ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE 82 ( 3 ) 49 - 57 1999.03 [Refereed]

This paper proposes a novel adaptive algorithm for an echo canceler. In this algorithm, the number of operations and memory capacity are equivalent to those of the conventional NLMS algorithm but the convergence speed is twice that using the conventional algorithm. This adaptive algorithm is referred to as subband ES (exponentially weighted stepsize). In the algorithm, the frequency bands of the received input signal and echo signal are divided into multiple subbands, and echo is independently canceled in each subband. Each adaptive filter in each subband has independent coefficients with an independent stepsize. The stepsize is time-independent and its weight is exponentially proportional to the change of the impulse response within the frequency region, such as the expected value of the difference between the waveforms of two impulse responses. As a result, the characteristic of the acoustic echo path in each frequency band is analyzed using the adaptive algorithm to improve the convergence characteristic. Using the results of computer simulation and experimental results obtained via an experimental setup with DSP, it is shown that the convergence speed with respect to input voice signal can be about 4 times faster when using echo cancellation based on the new algorithm than in conventional full-band echo cancellation based on the NLMS algorithm. (C) 1998 Scripta Technica, Electron Comm Jpn Pt 3, 82(3): 49-57, 1999.
A stereo echo canceller implemented using a stereo shaker and a duo-filter control system

S Shimauchi, S Makino, Y Haneda, A Nakagawa, S Sakauchi

ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI 857 - 860 1999 [Refereed]

Stereo echo cancellation has been achieved and used in daily teleconferencing. To overcome the non-uniqueness problem, a stereo shaker is introduced in eight frequency bands and adjusted so as to be inaudible and not affect stereo perception. A duo-filter control system including a continually running adaptive filter and a fixed filter is used for double-talk control. A second-order stereo projection algorithm is used in the adaptive filter. A stereo voice switch is also included. This stereo echo canceller was tested in two-way conversation in a conference room, and the strength of the stereo shaker was subjectively adjusted. A misalignment of 20 dB was obtained in the teleconferencing environment, and changing the talker's position in the transmission room did not affect the cancellation. This echo canceller is now used daily in a high-presence teleconferencing system and has been demonstrated to more than 300 attendees.
New configuration for a stereo echo canceller with nonlinear pre-processing

S Shimauchi, Y Haneda, S Makino, Y Kaneda

PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6 3685 - 3688 1998 [Refereed]

A new configuration for a stereo echo canceller with nonlinear pre-processing is proposed. The pre-processor which adds uncorrelated components to the original received stereo signals improves the adaptive filter convergence even in the conventional configuration. However, because of the inaudibility restriction, the preprocessed signals still include a large amount of the original stereo signals which are often highly cross-correlated. Therefore, the improvement is limited. To overcome this, our new stereo echo canceller includes exclusive adaptive filters whose inputs are the uncorrelated signals generated in the pre-processor. These exclusive adaptive filters converge to true solutions without suffering from cross-correlation between the original stereo signals. This is demonstrated through computer simulation results.
Subband acoustic echo canceller using two different analysis filters and 8th order projection algorithm

A. Nakagawa, Y. Haneda, S. Makino

IWAENC97 140-143 1997.09 [Refereed]

CiNii
Subjective assessment of echo return loss required for subband acoustic echo cancellers

S. Sakauchi, Y. Haneda, S. Makino

IWAENC97 152-155 1997.09 [Refereed]
Multiple-point equalization of room transfer functions by using common acoustical poles

Y Haneda, S Makino, Y Kaneda

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING 5 ( 4 ) 325 - 333 1997.07 [Refereed]

A multiple-point equalization filter using the common acoustical poles of room transfer functions is proposed. The common acoustical poles correspond to the resonance frequencies, which are independent of source and receiver positions. They are estimated as common autoregressive (AR) coefficients from multiple room transfer functions. The equalization is achieved with a finite impulse response (FIR) filter, which has the inverse characteristics of the common acoustical pole function. Although the proposed filter cannot recover the frequency response dips of the multiple room transfer functions, it can suppress their common peaks due to resonance; it is also less sensitive to changes in receiver position. Evaluation of the proposed equalization filter using measured room transfer functions shows that it can reduce the deviations in the frequency characteristics of multiple room transfer functions better than a conventional multiple-point inverse filter. Experiments show that the proposed filter enables 1-5 dB additional amplifier gain in a public address system without acoustic feedback at multiple receiver positions. Furthermore, the proposed filter reduces the reflected sound in room impulse responses without the pre-echo that occurs with a multiple-point inverse filter. A multiple-point equalization filter using common acoustical poles can thus equalize multiple room transfer functions by suppressing their common peaks.
Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path

S Makino, K Strauss, S Shimauchi, Y Haneda, A Nakagawa

1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V 299 - 302 1997 [Refereed]

This paper proposes a new subband stereo echo canceller that converges to the true echo path impulse response much faster than conventional stereo echo cancellers. Since signals are bandlimited and downsampled in the subband structure, the time interval between the subband signals becomes longer, so the variation of the cross-correlation between the stereo input signals becomes large. Consequently, convergence to the true solution is improved. Furthermore, the projection algorithm, or affine projection algorithm, is applied to further speed up the convergence. Computer simulations using stereo signals recorded in a conference room demonstrate that this method significantly improves convergence speed and almost solves the problem of stereo echo cancellation with low computational load.
Noise reduction for subband acoustic echo canceller

J. Sasaki, Y. Haneda, S. Makino

Joint meeting, Acoustical Society of America and Acoustical Society of Japan 1285-1290 1996.12 [Refereed]

CiNii
Implementation and evaluation of an acoustic echo canceller using duo-filter control system

Y. Haneda, S. Makino, J. Kojima, S. Shimauchi

EUSIPCO96 (European Signal Processing Conference) 1115-1118 1996.09 [Refereed]

CiNii
SSB subband echo canceller using low-order projection algorithm

S Makino, J Noebauer, Y Haneda, A Nakagawa

1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6 945 - 948 1996 [Refereed]
Stereo echo cancellation algorithm using imaginary input-output relationships

S Shimauchi, S Makino

1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6 941 - 944 1996 [Refereed]
A FAST PROJECTION ALGORITHM FOR ADAPTIVE FILTERING

M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E78A ( 10 ) 1355 - 1361 1995.10 [Refereed]

This paper proposes a new algorithm called the fast Projection algorithm, which reduces the computational complexity of the Projection algorithm from (p+1)L + O(p^3) to 2L + 20p (where L is the length of the estimation filter and p is the projection order). This algorithm has properties that lie between those of NLMS and RLS, i.e., less computational complexity than RLS but much faster convergence than NLMS for input signals like speech. The reduction of computation consists of two parts. One concerns calculating the pre-filtering vector, which originally took O(p^3) operations. Our new algorithm computes the pre-filtering vector recursively with about 15p operations. The other reduction is accomplished by introducing an approximation vector of the estimation filter. Experimental results for speech input show that the convergence speed of the Projection algorithm approaches that of RLS as the projection order increases with only a slight extra calculation complexity beyond that of NLMS, which indicates the efficiency of the proposed fast Projection algorithm.
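To give a feel for the operation counts quoted in this abstract, the short sketch below evaluates both expressions for one filter length. The function names are hypothetical, the constant in front of the p^3 term is assumed to be 1 (the abstract only gives the order), and the chosen L and p are example values rather than figures from the paper.

```python
def projection_cost(L, p):
    """Approximate per-sample cost of the Projection algorithm.

    Uses (p + 1) * L + p**3; the constant 1 on the cubic term is an assumption,
    since the abstract states only O(p^3).
    """
    return (p + 1) * L + p ** 3

def fast_projection_cost(L, p):
    """Per-sample cost of the fast Projection algorithm as quoted: 2L + 20p."""
    return 2 * L + 20 * p

# Example values (assumed): a 1000-tap filter with projection order 8.
L, p = 1000, 8
conventional = projection_cost(L, p)     # 9 * 1000 + 512
fast = fast_projection_cost(L, p)        # 2000 + 160
ratio = conventional / fast
```

Under these assumptions the fast variant needs roughly a quarter of the operations, and its cost stays close to the 2L of NLMS because the 20p term is small compared with L.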
Relationship between the 'ES family' algorithms and conventional adaptive algorithms

S. Makino

IWAENC95 11-14 1995.06 [Refereed]

CiNii
Implementation and evaluation of an acoustic echo canceller using the duo-filter control system

Y. Haneda, S. Makino, J. Kojima, S. Shimauchi

IWAENC95 79-82 1995.06 [Refereed]
Can echo cancellers cancel howling in PA systems?

牧野昭二

J. Acoust. Soc. Jpn. 51 ( 3 ) 248 1995.03 [Refereed]
STEREO PROJECTION ECHO CANCELER WITH TRUE ECHO PATH ESTIMATION

S SHIMAUCHI, S MAKINO

1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5 3059 - 3062 1995 [Refereed]
FAST PROJECTION ALGORITHM AND ITS STEP-SIZE CONTROL

M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5 945 - 948 1995 [Refereed]
High-performance acoustic echo canceller development

J. Kojima, S. Makino, Y. Haneda, S. Shimauchi

NTT R&D 44 ( 1 ) 39-44 1995.01 [Refereed]
Common acoustical pole and zero modeling of room transfer functions

Y. Haneda, S. Makino, Y. Kaneda

NTT R&D 44 ( 1 ) 53-58 - 101 1995.01 [Refereed]

A new model (the Common-Acoustical-Pole and Zero model, or CAPZ model) is proposed for a room transfer function (RTF) by using common acoustical poles that correspond to resonance properties of a room. These poles are estimated as the common AR coefficients of many RTFs corresponding to different source and receiver positions. Using the estimated common AR coefficients, the proposed method models the RTFs with different MA coefficients. This new model requires far fewer variable parameters to represent RTFs than the conventional all-zero or pole/zero model. The acoustic echo canceller based on the proposed model requires half the variable parameters and converges 1.5 times faster than one based on the all-zero model, confirming the efficiency of the proposed model.

CiNii
Report on the 1994 International Conference on Acoustic, Speech, and Signal Processing

S. Makino, et al.

J. Acoust. Soc. Jpn. 50 ( 9 ) 759-760 1994.09 [Refereed]
Research on the adaptive signal processing for acoustic echo cancellation

S. Makino

J. Acoust. Soc. Jpn. 75 1994.01 [Refereed]
ARMA modeling of a room transfer function at low frequencies

Yoichi Haneda, Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) 15 ( 5 ) 353 - 355 1994 [Refereed]

DOI

Citations (Scopus): 8
A NEW RLS ALGORITHM-BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE-RESPONSE

S MAKINO, Y KANEDA

ICASSP-94 - PROCEEDINGS, VOL 3 III 373 - 376 1994 [Refereed]
A new design for program controlled voice switching circuits using a microprocessor

H. Oikawa, M. Nishino, K. Yamamori, S. Makino

IEICE Trans. Fundamentals J77-B-I ( 1 ) 66-74 1994.01 [Refereed]

CiNii
Common acoustical poles independent of sound directions and modeling of head-related transfer functions

Yoichi Haneda, Shoji Makino, Yutaka Kaneda

Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) 15 ( 4 ) 277 - 279 1994 [Refereed]

DOI

Citations (Scopus): 1
EXPONENTIALLY WEIGHTED STEP-SIZE PROJECTION ALGORITHM FOR ACOUSTIC ECHO CANCELERS

S MAKINO, Y KANEDA

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E75A ( 11 ) 1500 - 1508 1992.11 [Refereed]

This paper proposes a new adaptive algorithm for acoustic echo cancellers that converges four times faster than the normalized LMS (NLMS) for a speech input at almost the same computational load. This algorithm reflects both the statistics of the variation of a room impulse response and the whitening of the received input signal. This algorithm, called the ESP (exponentially weighted step-size projection) algorithm, uses a different step size for each coefficient of an adaptive transversal filter. These step sizes are time-invariant and weighted proportional to the expected variation of a room impulse response. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. This algorithm also reflects the whitening of the received input signal, i.e., it removes the correlation between consecutive received input vectors. This process is effective for speech, which has a highly non-white spectrum. A geometric interpretation of the proposed algorithm is derived and the convergence condition is proved. A fast projection algorithm is introduced to reduce the computational complexity and modified for a practical multiple DSP structure so that it requires almost the same computational load, 2L multiply-add operations, as the conventional NLMS. The algorithm is implemented in an acoustic echo canceller constructed with multiple DSP chips, and its fast convergence is demonstrated.
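The per-coefficient step-size idea in this abstract can be sketched as follows. This is only the exponentially weighted step-size part applied to an NLMS-style update; the projection (whitening) part of ESP is not shown. The names `exponential_step_sizes` and `es_nlms_update`, the decay ratio `gamma`, and the normalization by input power are assumptions made for illustration and may differ from the published algorithm.

```python
def exponential_step_sizes(num_taps, mu0=1.0, gamma=0.9):
    """Time-invariant step sizes mu_i = mu0 * gamma**i.

    Early taps (large expected variation) get large steps; late taps get
    small ones. gamma is a hypothetical decay ratio, ideally matched to the
    energy decay of the room impulse response.
    """
    return [mu0 * gamma ** i for i in range(num_taps)]

def es_nlms_update(w, x_buf, d_n, steps, eps=1e-8):
    """One NLMS-style update in which tap i uses its own step size steps[i].

    w     : current filter coefficients
    x_buf : most recent input samples, newest first (same length as w)
    d_n   : current microphone (desired) sample
    """
    e_n = d_n - sum(wi * xi for wi, xi in zip(w, x_buf))   # a priori error
    norm = sum(xi * xi for xi in x_buf) + eps              # input power
    w_new = [wi + mu_i * e_n * xi / norm
             for wi, mu_i, xi in zip(w, steps, x_buf)]
    return w_new, e_n

steps = exponential_step_sizes(4, mu0=1.0, gamma=0.5)
w_new, err = es_nlms_update([0.0, 0.0], [1.0, 1.0], 2.0, [1.0, 0.5])
```

In the single update shown, both taps see the same input and error, yet the first tap moves twice as far as the second because its step size is twice as large.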
MODELING OF A ROOM TRANSFER-FUNCTION USING COMMON ACOUSTICAL POLES

Y HANEDA, S MAKINO, Y KANEDA

ICASSP-92 - 1992 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5 II B213 - B216 1992 [Refereed]
Subband echo canceller with an exponentially weighted step size NLMS adaptive filter

S. Makino, Y. Haneda

IWAENC91 (International Workshop on Acoustic Echo and Noise Control) 109-120 1991.09 [Refereed]
Report on the 1990 International Conference on Acoustic, Speech, and Signal Processing

K. Hirose, S. Nakagawa, T. Taniguchi, S. Makino

J. Acoust. Soc. Jpn. 46 ( 10 ) 869-870 1990.10 [Refereed]

CiNii
Recent techniques of circuit and acoustic echo control for telephony

S. Shimada, S. Makino

Journal of the Institute of Image Information and Television Engineers 44 ( 3 ) 222-227 1990.03 [Refereed]

DOI CiNii
ACOUSTIC ECHO CANCELER ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE-RESPONSE

S MAKINO, Y KANEDA

ICASSP 90, VOLS 1-5 1133 - 1136 1990 [Refereed]
Echo control in telecommunications

Shoji Makino, Shoji Shimada

Journal of the Acoustical Society of Japan (E) 11 ( 6 ) 309 - 316 1990 [Refereed]

This paper reviews echo control techniques for telecommunications, emphasizing the principles and applications of both circuit and acoustic echo cancellers. First, echo generating mechanisms and echo problems are described for circuit and acoustic echoes. Circuit echo is caused by impedance mismatching in a hybrid coil. Acoustic echo is caused by acoustic coupling between loudspeakers and microphones in a room. The echo problem is severe when the round-trip propagation delay is long. In this case, the echo must be removed. Next, the basic principle of the echo canceller, adaptive filter structure and adaptive algorithm are discussed. Emphasis is focused on the construction and operation of an adaptive transversal filter using the NLMS (Normalized Least Mean Square) algorithm, which is the most popular for the echo canceller. Then, applications of circuit and acoustic echo cancellers are described. Circuit echo cancellers have been well studied and implemented in LSIs for many applications. Although acoustic echo cancellers have been introduced into audio teleconference systems, they still have some problems which must be solved. Therefore, they are now being studied intensely. Finally, this paper mentions the problems of echo cancellers and the direction of future work on them. The main targets for acoustic echo cancellers are improving the convergence speed, reducing the amount of hardware and bettering the double-talk control technique. © 1990, Acoustical Society of Japan. All rights reserved.
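The adaptive transversal filter with NLMS that this review emphasizes can be sketched in a few lines. The following is a minimal illustration, not the implementation discussed in the paper: the function name `nlms_echo_canceller`, the tap count, the step size, and the toy 4-tap echo path are all assumptions, and the signals are noise-free white noise rather than speech.

```python
import random

def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Adapt a transversal filter with NLMS and return (residual, weights)."""
    w = [0.0] * num_taps       # adaptive coefficients (echo path estimate)
    x_buf = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for x_n, d_n in zip(far_end, mic):
        x_buf = [x_n] + x_buf[:-1]
        y_n = sum(wi * xi for wi, xi in zip(w, x_buf))   # echo replica
        e_n = d_n - y_n                                  # residual echo
        norm = sum(xi * xi for xi in x_buf) + eps        # input power
        g = step * e_n / norm                            # normalized gain
        w = [wi + g * xi for wi, xi in zip(w, x_buf)]
        residual.append(e_n)
    return residual, w

# Toy usage: a hypothetical 4-tap echo path driven by white noise.
random.seed(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [sum(h * (far_end[n - k] if n >= k else 0.0)
           for k, h in enumerate(echo_path))
       for n in range(len(far_end))]
residual, w = nlms_echo_canceller(far_end, mic)
```

With white-noise input and no near-end signal the coefficients approach the echo path closely. Speech input converges more slowly, which is the motivation for the faster algorithms listed elsewhere on this page.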

DOI

Citations (Scopus): 5
Acoustic echo canceller algorithm based on room acoustic characteristics

S. Makino, N. Koizumi

WASPAA89 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics) 1 ( 1 ) 1-2 1989.10 [Refereed]
Acoustic echo canceller with multiple echo paths

Nobuo Koizumi, Shoji Makino, Hiroshi Oikawa

Journal of the Acoustical Society of Japan (E) 10 ( 1 ) 39 - 45 1989 [Refereed]

A new configuration of acoustic echo canceller for multiple microphone teleconferencing systems is proposed. It is designed for use with microphones whose gains switch or vary during teleconferencing according to the talker. This system requires memory for multiple echo paths, which enables the updating of filter coefficients when an echo path is changed due to the switching of the actuated microphone during talker alternation. In comparison to the single echo path model which uses only adaptation, this method maintains echo cancellation during abrupt changes of the echo path when the microphone alternates between talkers. Also in comparison to direct microphone output mixing, this method reduces the stationary residual echo level by the reduction of acoustic coupling. © 1989, Acoustical Society of Japan. All rights reserved.

DOI

Citations (Scopus): 3
Improvement on adaptation of an echo canceller in a room

S. Makino, N. Koizumi

IEICE Trans. Fundamentals J71-A ( 12 ) 2212-2214 1988.12 [Refereed]

CiNii
AUDIO TELECONFERENCING SET WITH MULTIPATH ECHO CANCELLER

H OIKAWA, N KOIZUMI, S MAKINO

REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES 36 ( 2 ) 217 - 223 1988.03 [Refereed]
Audio teleconferencing set with multi-path echo canceller

H. Oikawa, N. Koizumi, S. Makino

ECL Technical Journal 37 ( 2 ) 191-197 1988.02 [Refereed]

CiNii
Vibration characteristics of a piezoelectric bimorph diaphragm with a step-shaped edge

S. Makino, Y. Ichinose

J. Acoust. Soc. Jpn. 43 ( 3 ) 161-166 1987.03 [Refereed]

CiNii

Books and Other Publications

Audio Source Separation

Makino,Shoji( Part： Sole author)

Springer International Publishing 2018.03 ISBN: 9783319730318
Underdetermined blind source separation using acoustic arrays

S. Makino, S. Araki, S. Winter, and H. Sawada( Part： Sole author)

Wiley 2010.01
Underdetermined blind source separation of convolutive mixtures by hierarchical clustering and L1-norm minimization

S. Winter, W. Kellermann, H. Sawada, S. Makino( Part： Other)

Springer 2007.09
Frequency-domain blind source separation

H. Sawada, S. Araki, S. Makino( Part： Other)

Springer 2007.09
K-means based underdetermined blind speech separation

S. Araki, H. Sawada, S. Makino( Part： Other)

Springer 2007.09
Blind Speech Separation

S. Makino, Te-Won Lee, H. Sawada( Part： Edit)

Springer 2007.09 ISBN: 9781402064784

http://www.amazon.co.jp/Speech-Separation-Signals-Communication-Technology/dp/1402064780
Blind source separation of convolutive mixtures of audio signals in frequency domain

S. Makino, H. Sawada, R. Mukai, S. Araki( Part： Sole author)

Springer 2006.05
Speech Enhancement

J. Benesty, S. Makino, J. Chen( Part： Edit)

Springer 2005.05 ISBN: 354024039X

http://www.amazon.co.jp/Speech-Enhancement-Signals-Communication-Technology/dp/354024039X
Real-time blind source separation for moving speech signals

R. Mukai, H. Sawada, S. Araki, S. Makino( Part： Other)

Springer 2005.03
Subband based blind source separation

S. Araki, S. Makino( Part： Other)

Springer 2005.03
Blind source separation of convolutive mixtures of speech

S. Makino( Part： Sole author)

Springer 2003.01
IEICE Knowledge Base

S.Makino( Part： Contributor, Blind audio source separation based on sparse component analysis)

IEICE 2012.10
2011 IEEE REGION 10 CONFERENCE TENCON 2011

Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji( Part： Contributor, Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source)

IEEE 2011.01 ISBN: 9781457702556
2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS

Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko( Part： Contributor, Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation)

IEEE 2010.01 ISBN: 9781424453092
Teleconferencing equipment

牧野, 昭二( Part： Sole author)

Fuji Technosystem 1999.10
Research on adaptive signal processing for acoustic echo cancellers

Makino,Shoji( Part： Sole author)

1993.03

Presentations

Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

Li, Li, Hirokazu, Kameoka, Makino, Shoji

ICASSP (Brighton, United Kingdom)

Presentation date： 2019.05
Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

Shota, Inoue, Hirokazu, Kameoka, Li, Li, Shogo, Seki, Makino, Shoji

ICASSP (Brighton, United Kingdom)

Presentation date： 2019.05
Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

ICASSP 2019 (Brighton, ENGLAND)

Presentation date： 2019.05
NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

Maruyama, Takuro, Araki, Shoko, Nakatani, Tomohiro, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji, Nakamura, Atsushi

IEEE International Conference on Acoustics, Speech and Signal Processing (Kyoto, JAPAN)

Presentation date： 2012.03
Audio source separation based on independent component analysis

S. Makino, H. Sawada [Invited]

Tutorial at the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing

Presentation date： 2007.04
Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

APSIPA ASC 2020

Presentation date： 2020.12
Applying virtual microphones to triangular microphone array in in-car communication

Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

APSIPA ASC 2020

Presentation date： 2020.12
A study of acoustic scene classification based on automatic estimation of spatial filters

大野, 泰己, 山田, 武志, 牧野, 昭二

IEICE General Conference

Presentation date： 2020.03
Application of semi-supervised learning using Generative Adversarial Networks to sound event detection

合馬, 一弥, 山田, 武志, 牧野, 昭二

IEICE General Conference

Presentation date： 2020.03
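A study of speech recognition error segment estimation focusing on the temporal variation of utterances

舒, 禹清, 山田, 武志, 牧野, 昭二

Spring Meeting of the Acoustical Society of Japan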

Presentation date： 2020.03
A study of sound event detection using both spatial and acoustic features

陳, 軼夫, 山田, 武志, 牧野, 昭二

Spring Meeting of the Acoustical Society of Japan

Presentation date： 2020.03
A study of low-latency source separation for in-car communication

上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

Spring Meeting of the Acoustical Society of Japan

Presentation date： 2020.03
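Simultaneous source separation, dereverberation, and noise reduction with a convolutional beamformer based on DNN mask estimation

髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

2020 Spring Meeting of the Acoustical Society of Japan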

Presentation date： 2020.03
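Multichannel hearing aid system based on basis-shared semi-supervised independent low-rank matrix analysis

宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

2020 Spring Meeting of the Acoustical Society of Japan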

Presentation date： 2020.03
Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

NCSP'20

Presentation date： 2020.02
Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

NCSP'20

Presentation date： 2020.02
Blind source separation with low-latency for in-car communication

Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

NCSP'20

Presentation date： 2020.02
Source separation of arbitrary speakers using the multichannel variational autoencoder method

李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

IEICE

Presentation date： 2019.12
Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura, Daichi, Saruwatari, Hiroshi, Makino, Shoji

APSIPA (Lanzhou, China)

Presentation date： 2019.11
Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

APSIPA ASC 2019 (Lanzhou, China)

Presentation date： 2019.11
Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

ISMIR (Delft, The Netherlands)

Presentation date： 2019.11
Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

Inoue, Shota, Kameoka, Hirokazu, Li, Li, Makino, Shoji

ICA (Aachen, Germany)

Presentation date： 2019.09
Gated convolutional neural network-based voice activity detection under high-level noise environments

Li, Li, Yamaoka, Kouei, Koshino, Yuki, Matsumoto, Mitsuo, Makino, Shoji

ICA (Aachen, Germany)

Presentation date： 2019.09
BLSTMと変調スペクトルを用いた発話特徴識別の検討

サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会

Presentation date： 2019.09
BLSTMを用いた音声認識誤り区間推定の検討

舒, 禹清, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会

Presentation date： 2019.09
Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

EUSIPCO 2019

Presentation date： 2019.09
CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

EUSIPCO 2019

Presentation date： 2019.09
ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

日本音響学会2019年秋季研究発表会

Presentation date： 2019.09
日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

臼井, 桃香, 山田, 武志, 牧野, 昭二

電子情報通信学会総合大会

Presentation date： 2019.03
MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

電子情報通信学会音声研究会

Presentation date： 2019.03
時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

日本音響学会2019年春季研究発表会

Presentation date： 2019.03
Experimental evaluation of WaveRNN predictor for audio lossless coding

Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

NCSP'19

Presentation date： 2019.03
Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

NCSP'19

Presentation date： 2019.03
Categorizing error causes related to utterance characteristics in speech recognition

Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

NCSP'19

Presentation date： 2019.03
Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

NCSP'19

Presentation date： 2019.03
音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

李, 莉, 亀岡, 弘和, 牧野, 昭二

日本音響学会2019年春季研究発表会

Presentation date： 2019.03
Gated CNNを用いた劣悪な雑音環境下における音声区間検出

牧野, 昭二, 李莉, 越野ゆき, 松本光雄

電子情報通信学会

Presentation date： 2019.03
多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

日本音響学会2019年春季研究発表会

Presentation date： 2019.03
Microphone position realignment by extrapolation of virtual microphone

Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (Honolulu, HI)

Presentation date： 2018.11
Weakly labeled learning using BLSTM-CTC for sound event detection

Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

APSIPA ASC 2018

Presentation date： 2018.11
時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

日本音響学会秋季研究発表会

Presentation date： 2018.09
Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

EUSIPCO 2018 (Rome, Italy)

Presentation date： 2018.09
Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

Matsui, Yutaro, Nakatani, Tomohiro, Delcroix, Marc, Kinoshita, Keisuke, Ito, Nobutaka, Araki, Shoko, Makino, Shoji

IWAENC2018

Presentation date： 2018.09
WaveRNNを利用した音声ロスレス符号化に関する検討と考察

天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会

Presentation date： 2018.09
ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会

Presentation date： 2018.09
音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

日本音響学会秋季研究発表会

Presentation date： 2018.09
複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

電子情報通信学会信号処理研究会

Presentation date： 2018.03
音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会

Presentation date： 2018.03
複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

日本音響学会春季研究発表会

Presentation date： 2018.03
音声認識における誤認識原因通知のための印象評定値推定の検討

後藤, 孝宏, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会

Presentation date： 2018.03
畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

高橋, 玄, 山田, 武志, 牧野, 昭二

日本音響学会春季研究発表会

Presentation date： 2018.03
Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

NCSP'18

Presentation date： 2018.03
Acoustic scene classification based on spatial feature extraction using convolutional neural networks

Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

NCSP'18

Presentation date： 2018.03
Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

Mae, Narumi, Yamaoka, Kouei, Mitsui, Yoshiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

NCSP'18

Presentation date： 2018.03
Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

NCSP'18

Presentation date： 2018.03
Abnormal sound detection by two microphones using virtual microphone technique

Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

APSIPA 2017 (Kuala Lumpur, Malaysia)

Presentation date： 2017.12
Sound source localization using binaural difference for hose-shaped rescue robot

Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

APSIPA 2017 (Kuala Lumpur, Malaysia)

Presentation date： 2017.12
Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic features

Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

APSIPA 2017 (Kuala Lumpur, Malaysia)

Presentation date： 2017.12
Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

IEEE GCCE 2017 (Nagoya, Japan)

Presentation date： 2017.10
Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

Li, Li, Kameoka, Hirokazu, Makino, Shoji

MLSP (Tokyo, Japan)

Presentation date： 2017.09
Multiple far noise suppression in a real environment using transfer-function-gain NMF

Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

EUSIPCO 2017 (Greece)

Presentation date： 2017.08
Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

EUSIPCO 2017 (Greece)

Presentation date： 2017.08
Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

Li, Li, Kameoka, Hirokazu, Toda, Tomoki, Makino, Shoji

Interspeech (Stockholm, Sweden)

Presentation date： 2017.08
Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

Kodama, Takumi, Makino, Shoji

IEEE Engineering in Medicine & Biology Society (EMBC) (Jeju Island, Korea)

Presentation date： 2017.07
柔軟索状ロボットにおける独立低ランク行列分析と統計的音声強調に基づく高品質ブラインド音源分離の開発

三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

日本機械学会ロボティクス・メカトロニクス講演会

Presentation date： 2017.05
SJ-CATにおける項目応答理論に基づく能力値推定の精度改善

小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

日本音響学会2017年春季研究発表会

Presentation date： 2017.03
音響ロスレス符号化MPEG-4 ALSのハイレゾ音源適応の検討と考察

天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

日本音響学会2017年春季研究発表会

Presentation date： 2017.03
DNN-GMMと連結特徴量を用いた音響シーン識別の検討

高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

日本音響学会2017年春季研究発表会

Presentation date： 2017.03
Discriminative non-negative matrix factorization with majorization-minimization

Li, L, Kameoka, H, Makino, Shoji

HSCMA (San Francisco, CA)

Presentation date： 2017.03
補助関数法による識別的NMFの基底学習アルゴリズム

李莉, 亀岡弘和, 牧野昭二

日本音響学会2017年春季研究発表会

Presentation date： 2017.03
独立低ランク行列分析と統計的音声強調を用いた柔軟索状ロボットにおけるブラインド音源分離システムの開発

三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

日本音響学会2017年春季研究発表会

Presentation date： 2017.03
Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation

Mae, N, Ishimura, M, Makino, Shoji, Kitamura, D, Ono, N, Yamada, T, Saruwatari, H

LVA/ICA (Université Grenoble Alpes, Grenoble, France)

Presentation date： 2017.02
Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

Kodama, T, Makino, Shoji

Biomedical and Health Informatics (BHI)

Presentation date： 2017.02
Performance estimation of spontaneous speech recognition using non-reference acoustic features

Guo, Ling, Yamada, Takeshi, Makino, Shoji

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) (Jeju, South Korea)

Presentation date： 2016.12
Full-body tactile P300-based brain-computer interface accuracy refinement

Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

International Conference on Bio-engineering for Smart Technologies (BioSMART)

Presentation date： 2016.12
Tactile brain-computer interface using classification of P300 responses evoked by full body spatial vibrotactile stimuli

Kodama, T, Makino, Shoji, Rutkowski, T

APSIPA

Presentation date： 2016.12
Visual motion onset augmented reality brain-computer interface

Shimizu, K, Kodama, T, Makino, Shoji, Rutkowski, T

International Conference on Bio-engineering for Smart Technologies (BioSMART)

Presentation date： 2016.12
伝達関数ゲイン基底NMFを用いた遠方雑音抑圧の実環境での評価

松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

第31回信号処理シンポジウム

Presentation date： 2016.11
雑音下音声認識における必要発話音量提示機能の実装と評価

後藤,孝宏, 山田,武志, 牧野,昭二

日本音響学会秋季研究発表会

Presentation date： 2016.09
日本語スピーキングテストSJ-CATにおける項目応答理論に基づく能力値推定の検証

小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

日本音響学会秋季研究発表会

Presentation date： 2016.09
ノンリファレンス特徴量を用いた自然発話音声認識の性能推定の検討

郭,レイ, 山田,武志, 牧野,昭二

日本音響学会秋季研究発表会

Presentation date： 2016.09
ヴァーチャル多素子化に基づくSN比最大化ビームフォーマの残響に対する性能変化

山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

日本音響学会秋季研究発表会

Presentation date： 2016.09
Ego-noise reduction for a hose-shaped rescue robot using determined Rank-1 multichannel nonnegative matrix factorization

Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji, Saruwatari, Hiroshi

IWAENC2016

Presentation date： 2016.09
Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot

Ishimura, Masaru, Makino, Shoji, Yamada, Takeshi, Ono, Nobutaka, Saruwatari, Hiroshi

IWAENC2016 (Xi'an, China)

Presentation date： 2016.09
Multi-talker speech recognition based on blind source separation with ad hoc microphone array using smartphones and cloud storage

Ochi, K, Ono, N, Miyabe, S, Makino, Shoji

Interspeech (San Francisco, CA)

Presentation date： 2016.09
Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

Takahashi, Gen, Yamada, Takeshi, Makino, Shoji, Ono, Nobutaka

Detection and Classification of Acoustic Scenes and Events

Presentation date： 2016.09
Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

Saruwatari, H, Takata, K, Ono, N, Makino, Shoji [Invited]

International Congress on Acoustics

Presentation date： 2016.09
Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

Kodama, T, Makino, Shoji, Rutkowski, T

AEARU Young Researchers International Conference

Presentation date： 2016.09
音声のスペクトル領域とケプストラム領域における同時強調

李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

信学技報 EA2014-75

Presentation date： 2016.08
独立ベクトル分析とノイズキャンセラを用いた雑音抑圧の柔軟索状ロボットへの適用

石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

日本機械学会ロボティクス・メカトロニクス講演会2016

Presentation date： 2016.06
Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

Toyoda, Takuya, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

NCSP'16

Presentation date： 2016.03
ランク1空間モデル制約付き多チャネルNMFを用いた雑音抑圧の柔軟索状ロボットへの適用

高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

電子情報通信学会総合大会

Presentation date： 2016.03
振幅のみからの相関推定と雑音尖度に基づく空間サブトラクションアレーの減算係数最適化

李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

日本音響学会2016年春季研究発表会

Presentation date： 2016.03
独立ベクトル分析とノイズキャンセラを用いた柔軟索状ロボットにおける雑音抑圧

石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

電子情報通信学会総合大会

Presentation date： 2016.03
教師あり多チャネルNMFと統計的音声強調を用いた柔軟索状ロボットにおける音源分離

高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

日本音響学会2016年春季研究発表会

Presentation date： 2016.03
ランク1 空間モデル制約付き多チャネルNMFを用いた柔軟索状ロボットにおける雑音抑圧

高草木萌, 北村大地, 小野順貴, 山田武志, 牧野昭二, 猿渡洋

日本機械学会ロボティクス・メカトロニクス講演会

Presentation date： 2016.03
非同期分散マイクロホンによるブラインド音源分離を用いた複数話者同時音声認識

越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

日本音響学会2016年春季研究発表会

Presentation date： 2016.03
SVM classification study of code-modulated visual evoked potentials

Aminaka, D., Makino, S., Rutkowski, T. M.

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (Hong Kong, China)

Presentation date： 2015.12
Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

Murase, Yoshikazu, Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (Hong Kong, China)

Presentation date： 2015.12
Fingertip stimulus cue-based tactile brain-computer interface

Yajima, H., Makino, S., Rutkowski, T. M.

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (Hong Kong, China)

Presentation date： 2015.12
Variable sound elevation features for head-related impulse response spatial auditory BCI

Nakaizumi, C., Makino, S., Rutkowski, T. M.

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (Hong Kong, China)

Presentation date： 2015.12
EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

Aminaka, D., Makino, S., Rutkowski, T. M.

International Symbiotic Workshop (SYMBIOTIC)

Presentation date： 2015.10
日本語スピーキングテストSJ-CATにおける低スコア解答発話の検出の検討

小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

日本音響学会2015年秋季研究発表会

Presentation date： 2015.09
ノンリファレンスひずみ特徴量を用いた雑音下音声認識性能推定の検討

郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

日本音響学会2015年秋季研究発表会

Presentation date： 2015.09
Classification Accuracy Improvement of Chromatic and High-Frequency Code-Modulated Visual Evoked Potential-Based BCI

Aminaka, Daiki, Makino, Shoji, Rutkowski, Tomasz M.

8th International Conference on Brain Informatics and Health (BIH) (Royal Geographical Society, London, England)

Presentation date： 2015.08
Estimating correlation coefficient between two complex signals without phase observation

Miyabe, S., Ono, N., Makino, Shoji

LVA/ICA

Presentation date： 2015.08
Chromatic and high-frequency cVEP-based BCI paradigm

Aminaka, Daiki, Makino, Shoji, Rutkowski, Tomasz M.

Engineering in Medicine and Biology Conference (EMBC)

Presentation date： 2015.08
Head-related impulse response cues for spatial auditory brain-computer interface

Nakaizumi, C., Makino, S., Rutkowski, T. M.

Engineering in Medicine and Biology Conference (EMBC)

Presentation date： 2015.08
マイクロホンアレーの位相が観測できない条件でのチャネル間の相関係数の推定

宮部滋樹, 小野順貴, 牧野,昭二

回路とシステムワークショップ

Presentation date： 2015.08
Inter-stimulus interval study for the tactile point-pressure brain-computer interface

Shimizu, K., Makino, Shoji, Rutkowski, T. M.

Engineering in Medicine and Biology Conference (EMBC)

Presentation date： 2015.08
ステレオ録音に基づく移動音源モデルによる走行車両検出と走行方向推定

遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

日本音響学会2015年春季研究発表会

Presentation date： 2015.03
総合品質と明瞭性の客観推定に基づくスペクトルサブトラクションの減算係数の最適化

中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

日本音響学会2015年春季研究発表会

Presentation date： 2015.03
非同期分散マイクロフォンアレーによる伝達関数ゲイン基底NMFを用いた拡散雑音抑圧

村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

日本音響学会2015年春季研究発表会

Presentation date： 2015.03
ケプストラム距離とSMR-パープレキシティを用いた雑音下音声認識の性能推定の検討

郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

日本音響学会2015年春季研究発表会

Presentation date： 2015.03
2つの超ガウス性複素信号の位相観測を用いない相関係数推定

宮部滋樹, 小野順貴, 牧野, 昭二

信学技報EA2014-75

Presentation date： 2015.03
Spatial auditory BCI spellers using real and virtual surround sound systems

Chang, M., Nakaizumi, C., Mori, K., Makino, Shoji, Rutkowski, T. M.

Conference on Systems Neuroscience and Rehabilitation (SNR2015)

Presentation date： 2015.03
認識性能予測に基づく雑音環境下音声認識のユーザビリティ改善の検討

青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

日本音響学会2015年春季研究発表会

Presentation date： 2015.03
On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

Murase, Yoshikazu, Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

APSIPA 2014

Presentation date： 2014.12
絶対値の観測のみを用いた2つの複素信号の相関係数推定

宮部滋樹, 小野順貴, 牧野,昭二

日本音響学会研究発表会

Presentation date： 2014.09
ケプストラム距離を用いた雑音下音声認識の性能推定の検討

郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

日本音響学会研究発表会

Presentation date： 2014.09
伝達関数ゲイン基底NMFにおけるマイク数・マイク配置と目的音強調性能の関係

村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

日本音響学会研究発表会

Presentation date： 2014.09
βダイバージェンスに基づく一般化振幅補間によるヴァーチャル多素子化を用いた目的音源強調

片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

日本音響学会研究発表会

Presentation date： 2014.09
分散型マイクロホンアレイを用いた交通車両検出とその車線推定の検討

豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

日本音響学会研究発表会

Presentation date： 2014.09
Amplitude-based speech enhancement with nonnegative matrix factorization for asynchronous distributed recording

Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Takahashi, Yu, Yamada, Takeshi, Makino, Shoji

14th International Workshop on Acoustic Signal Enhancement (IWAENC) (Antibes, France)

Presentation date： 2014.09
Multi-stage declipping of clipping distortion based on length classification of clipped interval

Li, Chenlei, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

日本音響学会研究発表会

Presentation date： 2014.09
教師なし伝達関数ゲイン基底NMFによる目的音強調における罰則項の特性評価

千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

日本音響学会研究発表会

Presentation date： 2014.09
M2Mを用いた大規模データ収集システムの構築に関する研究

牧野,昭二

情報処理学会研究報告計算機アーキテクチャ研究会（ARC）

Presentation date： 2013.12
Virtually increasing microphone array elements by interpolation in complex-logarithmic domain

Katahira, Hiroki, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

21st European Signal Processing Conference (EUSIPCO) (Marrakesh, Morocco)

Presentation date： 2013.09
非同期録音ブラインド同期のための線形位相補償の効率的最尤解探索

宮部滋樹, 小野順貴, 牧野昭二 [Invited]

音講論集___2-10-4_

Presentation date： 2013.03
複素対数補間によるヴァーチャル観測に基づく劣決定条件での音声強調

片平拓希, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二 [Invited]

音講論集___2-10-6_

Presentation date： 2013.03
日本語スピーキングテストSCATにおける文読み上げ・文生成問題の自動採点手法の改良

山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

音講論集___1-Q-52a_465-468

Presentation date： 2013.03
楽音符号化品質に影響を及ぼす楽音信号の特徴量の検討

松浦嶺, 山田武志, 牧野昭二, 北脇信彦

音講論集___1-Q-11c_401-404

Presentation date： 2013.03
ACELPにおけるピッチシャープニングの特性評価

千葉大将, 守谷健弘, 鎌本優, 原田登, 宮部滋樹, 山田武志, 牧野昭二 [Invited]

音講論集___1-7-18_

Presentation date： 2013.03
A network model for the embodied communication of musical emotions

寺澤洋子, 星-芝, 玲子, 柴山拓郎, 大村英史, 古川聖, 牧野, 昭二, 岡ノ谷一夫

Cognitive Studies

Presentation date： 2013
Automatic scoring method considering quality and content of speech for SCAT Japanese speaking test

Okubo, Naoko, Yamahata, Yuto, Yamada, Takeshi, Imai, Shingo, Ishizuka, Kenkichi, Shinozaki, Takahiro, Nisimura, Ryuichi, Makino, Shoji, Kitawaki, Nobuhiko

International Conference on Speech Database and Assessments (Oriental COCOSDA) (Macau, China)

Presentation date： 2012.12
日本語スピーキングテストにおける文生成問題の自動採点の検討

大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

音講論集___3-Q-16_395-396

Presentation date： 2012.09
ミュージカルノイズを考慮した雑音抑圧音声のFR型客観品質評価の検討

藤田悠希, 山田武志, 牧野昭二, 北脇信彦

音講論集___3-P-5_127-130

Presentation date： 2012.09
身体動作の連動性理解にむけた筋活動可聴化

松原正樹, 寺澤洋子, 門根秀樹, 鈴木健嗣, 牧野昭二 [Invited]

音講論集___2-10-2_

Presentation date： 2012.09
非同期録音信号の線形位相補償によるブラインド同期と音源分離への応用

宮部滋樹, 小野順貴, 牧野昭二 [Invited]

音講論集___3-9-8_

Presentation date： 2012.09
日本語スピーキングテストにおける文章読み上げ問題の自動採点の検討

山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

音講論集___3-Q-18_399-400

Presentation date： 2012.09
コヒーレンス解析による定常状態誘発反応の可聴化

加庭輝明, 寺澤洋子, 松原正樹, Tomasz M. Rutkowski, 牧野昭二

音講論集___2002/10/2_919-922

Presentation date： 2012.09
多チャンネルウィーナーフィルタを用いた音源分離における観測モデルの調査

坂梨龍太郎, 宮部滋樹, 山田武志, 牧野昭二

音講論集___1-P-14,_757-760

Presentation date： 2012.09
混合DOA モデルに基づく多チャンネル複素NMF による劣決定BSS

武田和馬, 亀岡弘和, 澤田宏, 荒木章子, 宮部滋樹, 山田武志, 牧野昭二

音講論集___2-1-9_747-750

Presentation date： 2012.03
日本語スピーキングテストにおける文生成問題の採点に影響を及ぼす要因の検討

大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

信学総大___D-14-9_193

Presentation date： 2012.03
日本語スピーキングテストにおける文章読み上げ問題の採点に影響を及ぼす要因の検討

山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

信学総大___D-14-8_192

Presentation date： 2012.03
雑音抑圧音声の主観品質評価におけるミュージカルノイズの影響

藤田悠希, 山田武志, 牧野昭二, 北脇信彦 [Invited]

信学総大___D-14-1_185

Presentation date： 2012.03
音響モデルの精度を考慮した雑音下音声認識の性能推定の検討

高岡隆守, 山田武志, 牧野昭二, 北脇信彦

音講論集___1-P-13_149-150

Presentation date： 2012.03
短時間雑音特性に基づく雑音下音声認識の性能推定の検討

森下恵里, 山田武志, 牧野昭二, 北脇信彦

音講論集___1-P-14_151-152

Presentation date： 2012.03
フルランク空間相関行列モデルに基づく拡散性雑音除去

礒佳樹, 荒木章子, 牧野昭二, 中谷智広, 澤田宏, 山田武志, 宮部滋樹, 中村篤

信学総大___A-10-9_194

Presentation date： 2012.03
音量差に基づく音像生成における個人適応手法の有効性検証

天野成祥, 山田武志, 牧野昭二, 北脇信彦

音講論集___2-Q-1_895-898

Presentation date： 2012.03
高次相関を用いた非線形MUSIC による高分解能方位推定

杉本侑哉, 宮部滋樹, 山田武志, 牧野昭二

音講論集___3-1-6_763-766

Presentation date： 2012.03
時間周波数領域におけるグリッド間の整合性に基づくクリッピングの除去

三浦晋, 宮部滋樹, 山田武志, 牧野昭二, 中島弘史, 中臺一博

音講論集___1-Q-10_843-846

Presentation date： 2012.03
Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

IEEE Region 10 Conference on TENCON (Indonesia)

Presentation date： 2011.11
Restoration of Clipped Audio Signal Using Recursive Vector Projection

Miura, Shin, Nakajima, Hirofumi, Miyabe, Shigeki, Makino, Shoji, Yamada, Takeshi, Nakadai, Kazuhiro

IEEE Region 10 Conference on TENCON (Indonesia)

Presentation date： 2011.11
周波数依存の時間差モデルによる劣決定BSS

丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野昭二, 中村篤

信学技報___EA2011-86_25-30

Presentation date： 2011.11
発話の連続性に基づいた音声信号の分類による会議音声の可視化

加藤通朗, 杉本侑哉, 宮部滋樹, 牧野昭二, 山田武志, 北脇信彦

音講論集___3-P-20_197-200

Presentation date： 2011.09
雑音抑圧音声の総合品質推定モデルの改良とその客観品質評価への適用

藤田悠希, 山田武志, 牧野昭二, 北脇信彦

音講論集___2-Q-23_127-130

Presentation date： 2011.09
スピーカ間の音量差に基づく音像生成手法における個人適応の検討

天野成祥, 山田武志, 牧野昭二, 北脇信彦

音講論集___2-4-10_661-664

Presentation date： 2011.09
楽音と音声の双方に適用できる客観品質評価法の検討

三上雄一郎, 山田武志, 牧野昭二, 北脇信彦

信学総大___B-11-19_448

Presentation date： 2011.03
雑音抑圧音声の客観品質評価に用いる総合品質推定モデルの改良

藤田悠希, 山田武志, 牧野昭二

信学総大___B-11-18_447

Presentation date： 2011.03
スペクトル変形同定の聴覚トレーニングにおける適応的フィードバックの影響

加庭輝明, 金成英, 寺澤洋子, 伊藤寿浩, 池田雅弘, 山田武志, 牧野昭二

音講論集___2-1-1_1003-1006

Presentation date： 2011.03
クリッピングした音響信号の修復

三浦晋, 中島弘史, 牧野昭二, 山田武志, 中臺一博

音講論集___3-P-53(d)_941-944

Presentation date： 2011.03
空間スペクトルを用いた時間断続信号の検出における主成分分析と周波数分析の比較評価

加藤通朗, 杉本侑哉, 牧野昭二, 山田武志, 北脇信彦

音講論集___3-P-8(d)_879-880

Presentation date： 2011.03
空間スペクトルへの周波数分析の適用による時間断続信号の検出

杉本侑哉, 加藤通朗, 牧野昭二, 山田武志

音講論集___3-P-7(c)_877-878

Presentation date： 2011.03
高残響下で混合された音声の音源分離に関する研究

礒佳樹, 荒木章子, 牧野昭二, 中谷智広, 澤田宏, 山田武志, 中村篤

音講論集___1-9-13_643-646

Presentation date： 2011.03
音源のW-DO性を仮定した多チャンネル複素NMFによる劣決定BSS

武田和馬, 亀岡弘和, 澤田宏, 荒木章子, 山田武志, 牧野昭二

音講論集___1-Q-19(f)_801-804

Presentation date： 2011.03
視覚障がい者のタッチパネル操作支援のための音像生成手法の検討

天野成祥, 山田武志, 牧野昭二

音講論集___3-P-7(c)_877-878

Presentation date： 2011.03
雑音抑圧された音声の主観・客観品質評価法

山田武志, 牧野昭二, 北脇信彦

情報処理学会研究報告音声言語情報処理（SLP）___2010-SLP-83 (7)_1-6

Presentation date： 2010.10
雑音抑圧音声のMOSと単語了解度の客観推定

山田武志, 北脇信彦, 牧野昭二

信学ソ大___BS-5-4_S-19

Presentation date： 2010.09
空間パワースペクトルの主成分分析に基づく時間断続信号の検出

加藤通朗, 杉本侑哉, 牧野昭二, 山田武志, 北脇信彦

信学技報___EA2010-47_25-30

Presentation date： 2010.08
Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation

Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko

International Symposium on Circuits and Systems Nano-Bio Circuit Fabrics and Systems (ISCAS 2010) (Paris, France)

Presentation date： 2010.05
調波構造とHMM合成に基づく混合楽器音認識の検討

山本裕貴, 山田武志, 北脇信彦, 牧野昭二

音講論集___3-8-4_1003-1004

Presentation date： 2010.03
雑音抑圧音声の総合品質推定モデルを適用したフルリファレンス客観品質評価法

篠原佑基, 山田武志, 北脇信彦, 牧野昭二

信学総大___B-11-2_436

Presentation date： 2010.03
劣決定音源分離のための分離信号のケプストラムスムージング

安齊祐美, 荒木章子, 牧野昭二, 中谷智広, 山田武志, 中村篤, 北脇信彦

音講論集___2-P-25_847-850

Presentation date： 2010.03
日本語学習支援のためのアクセント認識の検討

ショートグレッグ, 山田武志, 北脇信彦, 牧野昭二

音講論集___1-P-17_447-448

Presentation date： 2010.03
雑音下音声認識の性能推定法の実環境における評価

中島智弘, 山田武志, 北脇信彦, 牧野昭二

音講論集___2-Q-4_241-244

Presentation date： 2010.03
IP網における音声の客観品質評価に用いる擬似音声信号の検討

青島千佳, 北脇信彦, 山田武志, 牧野昭二 [Invited]

信学総大___B-11-1_435

Presentation date： 2010.03
IP網における客観品質評価に用いる擬似音声信号の検討

青島千佳, 北脇信彦, 山田武志, 牧野昭二 [Invited]

QoSワークショップ___QW7-P-16_

Presentation date： 2009.11
楽音と音声の双方に適用できるオーディオ信号の客観品質推定法の検討

三上雄一郎, 北脇信彦, 山田武志, 牧野昭二

QoSワークショップ___QW-7-P-15_

Presentation date： 2009.11
雑音抑圧音声の総合品質推定モデルを用いたフルリファレンス客観品質評価法の検討

篠原佑基, 山田武志, 北脇信彦, 牧野昭二

QoSワークショップ___QW7-P-13_

Presentation date： 2009.11
音声区間推定と時間周波数領域方向推定の統合による会議音声話者識別

荒木章子, 藤本雅清, 石塚健太郎, 中谷智広, 澤田宏, 牧野昭二

信学技報___EA2008-40_19-24

Presentation date： 2008.07
［フェロー記念講演］独立成分分析に基づくブラインド音源分離

牧野, 昭二

信学技報___EA2008-17_65-73

Presentation date： 2008.05
周波数領域ICAにおける初期値の短時間データからの学習

荒木章子, 伊藤信貴, 澤田宏, 小野順貴, 牧野昭二, 嵯峨山茂樹

信学総大___A-10-6_208

Presentation date： 2008.03
音声区間検出と方向情報を用いた会議音声話者識別システムとその評価

荒木章子, 藤本雅清, 石塚健太郎, 澤田宏, 牧野昭二

音講論集___1-10-1_1-4

Presentation date： 2008.03
音声のスパース性を用いたUnderdetermined音源分離

荒木章子, 澤田宏, 牧野昭二

信学総大___AS-4-5_S-46 - S-47

Presentation date： 2008.03
A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

H. Sawada, S. Araki, S. Makino

ICA2007, Stereo Audio Source Separation Evaluation Campaign

Presentation date： 2007.09
Blind source separation based on time-frequency masking and maximum SNR beamformer array

S. Araki, H. Sawada, S. Makino

ICA2007, Stereo Audio Source Separation Evaluation Campaign

Presentation date： 2007.09
Blind audio source separation based on independent component analysis

S. Makino [Invited]

Keynote Talk at the 2007 International Conference on Independent Component Analysis and Signal Separation

Presentation date： 2007.09
話者分類とSN比最大化ビームフォーマに基づく会議音声強調

荒木章子, 澤田宏, 牧野昭二

音講論集___2-1-13_571-572

Presentation date： 2007.03
事前学習を用いる周波数領域Pearson-ICAの高速化

加藤比呂子, 永原裕一, 荒木章子, 澤田宏, 牧野昭二

音講論集___1-5-22_549-550

Presentation date： 2006.03
観測信号ベクトルのクラスタリングに基づくスパース信号の到来方向推定

荒木章子, 澤田宏, 向井良, 牧野昭二

音講論集___3-5-6_615-616

Presentation date： 2006.03
独立成分分析に基づくブラインド音源分離

牧野昭二, 荒木章子, 向井良, 澤田宏

計測自動制御学会中国支部学術講演会____2-9

Presentation date： 2005.11
多音源に対する周波数領域ブラインド音源分離

澤田宏, 向井良, 荒木章子, 牧野昭二

ＡＩチャレンジ研究会___SIG-Challenge-0522-3_17-22

Presentation date： 2005.10
パラメトリックピアソン分布を用いた周波数領域ブラインド音源分離

加藤比呂子, 永原裕一, 荒木章子, 澤田宏, 牧野昭二

音講論集___2-2-4_593-594

Presentation date： 2005.09
観測信号ベクトル正規化とクラスタリングによる音源分離手法とその評価

荒木章子, 澤田宏, 向井良, 牧野昭二

音講論集___2-2-3_591-592

Presentation date： 2005.09
3次元マイクロホンアレイを用いた多音源ブラインド分離

向井良, 澤田宏, 荒木章子, 牧野昭二

信学ソ大___A-10-8_209

Presentation date： 2005.09
多くの背景音からの主要音源のブラインド抽出

澤田宏, 荒木章子, 向井良, 牧野昭二

信学ソ大___A-10-9_210

Presentation date： 2005.09
観測ベクトルのクラスタリングによるブラインド音源分離

荒木章子, 澤田宏, 向井良, 牧野昭二

信学ソ大___A-10-7_208

Presentation date： 2005.09
独立成分分析を用いた音源数推定法

澤田宏, 向井良, 荒木章子, 牧野昭二

音講論集___3-Q-20_753-754

Presentation date： 2004.09
A solution for the permutation problem in frequency domain BSS using near- and far-field models

R. Mukai, H. Sawada, S. Araki, S. Makino

CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-3_

Presentation date： 2004.04
Underdetermined blind source separation for convolutive mixtures of sparse signals

S. Winter, H. Sawada, S. Araki, S. Makino

CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-2_

Presentation date： 2004.04
Blind separation of more speech than sensors using time-frequency masks and ICA

S. Araki, S. Makino, H. Sawada, R. Mukai

CSA2004 (NTT Workshop on Communication Scene Analysis)___AU-4_

Presentation date： 2004.04
Blind source separation for convolutive mixtures in the frequency domain

H. Sawada, R. Mukai, S. Araki, S. Makino

CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-1_

Presentation date： 2004.04
狭間隔・広間隔の複数マイクロホン対を用いた周波数領域ブラインド音源分離

向井良, 澤田宏, 荒木章子, 牧野昭二

音講論集___3-P-16_627-628

Presentation date： 2004.03
独立成分分析に基づくブラインド音源分離

牧野昭二, 荒木章子, 向井良, 澤田宏

ディジタル信号処理シンポジウム___A3-2_1-10

Presentation date： 2003.11
Blind Separation of More Speech Signals than Sensors using Time-frequency Masking and Mixing Matrix Estimation

Shoko Araki, Audrey Blin, Shoji Makino

音講論集___1-P-4_585-586

Presentation date： 2003.09
周波数領域BSSにおける近距離場モデルを用いたパーミュテーションの解法

向井良, 澤田宏, 荒木章子, 牧野昭二

音講論集___1-P-6_589-590

Presentation date： 2003.09
実環境における3音源以上のブラインド分離

澤田宏, 向井良, 荒木章子, 牧野昭二

音講論集___2-5-19_547-548

Presentation date： 2003.09
時間周波数マスキングとICAの併用による音源数 > マイク数の場合のブラインド音源分離

荒木章子, 向井良, 澤田宏, 牧野昭二

音講論集___1-P-5_587-588

Presentation date： 2003.09
ICA-Based audio source separation

S. Makino, S. Araki, R. Mukai, H. Sawada

Technical report of IEICE___EA2003-45_17-24

Presentation date： 2003.06
ICA-based audio source separation

S. Makino, S. Araki, R. Mukai, H. Sawada

International Workshop on Microphone Array Systems - Theory and Practice

Presentation date： 2003.05
周波数領域ブラインド音源分離におけるpermutation問題の頑健な解法

澤田宏, 向井良, 荒木章子, 牧野昭二

音講論集___3-P-25_777-778

Presentation date： 2003.03
移動音源の低遅延実時間ブラインド分離

向井良, 澤田宏, 荒木章子, 牧野昭二

音講論集___3-P-26_779-780

Presentation date： 2003.03
帯域に適した分離手法を用いるサブバンド領域ブラインド音源分離

荒木章子, 牧野昭二, Robert Aichner, 西川剛樹, 猿渡洋

音講論集___3-P-27_781-782

Presentation date： 2003.03
KL情報量最小化に基づく時間領域ICAと非定常信号の同時無相関化に基づく時間領域ICAの比較

西川剛樹, 高谷智哉, 猿渡洋, 鹿野清宏, 荒木章子, 牧野昭二

音講論集___2-5-14_545-546

Presentation date： 2002.09
死角型ビームフォーマを初期値に用いる時間領域ブラインド音源分離

荒木章子, 牧野昭二, Robert Aichner, 西川剛樹, 猿渡洋

音講論集___2-5-13_543-544

Presentation date： 2002.09
ブラインド音源分離後の残留スペクトルの推定と除去

向井良, 澤田宏, 荒木章子, 牧野昭二

音講論集___2-5-11_539-540

Presentation date： 2002.09
周波数領域ブラインド音源分離におけるpermutation問題の解法

澤田宏, 向井良, 荒木章子, 牧野昭二

音講論集___2-5-12_541-542

Presentation date： 2002.09
周波数領域ICAと時間遅れスペクトル減算による残響下での実時間ブラインド音源分離

向井良, 荒木章子, 澤田宏, 牧野昭二

音講論集___1-Q-19_673-674

Presentation date： 2002.03
サブバンド処理によるブラインド音源分離に関する検討

荒木章子, 牧野昭二, Robert Aichner, 西川剛樹, 猿渡洋

音講論集___3-4-9_619-620

Presentation date： 2002.03
間隔の異なる複数のマイクペアによるブラインド音源分離

澤田宏, 荒木章子, 向井良, 牧野昭二

音講論集___3-4-10_621-622

Presentation date： 2002.03
ICA-based sound separation

S. Makino, S. Araki, R. Mukai, H. Sawada, R. Aichner, H. Saruwatari, T. Nishikawa, Y. Hinamoto

NTT Workshop on Comm. Scene Analysis

Presentation date： 2002.01
Time domain blind source separation of non-stationary convolved signals with utilization of geometric beamforming

R. Aichner, S. Araki, S. Makino, H. Sawada, T. Nishikawa, H. Saruwatari

NTT Workshop on Comm. Scene Analysis

Presentation date： 2002.01
Separation and dereverberation performance of frequency domain blind source separation

R. Mukai, S. Araki, S. Makino

NTT Workshop on Comm. Scene Analysis

Presentation date： 2002.01
Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

S. Araki, S. Makino, R. Mukai, H. Saruwatari

NTT Workshop on Comm. Scene Analysis

Presentation date： 2002.01
A polar-coordinate based activation function for frequency domain blind source separation

H. Sawada, R. Mukai, S. Araki, S. Makino

NTT Workshop on Comm. Scene Analysis

Presentation date： 2002.01
周波数領域ブラインド音源分離と適応ビームフォーマの等価性について

雛元洋一, 西川剛樹, 猿渡洋, 荒木章子, 牧野昭二, 向井良

信学技報___EA2001-84_75-82

Presentation date： 2001.11
非定常スペクトルサブトラクションによる音源分離後の残留雑音除去

向井良, 荒木章子, 澤田宏, 牧野昭二

音講論集___2-6-14_617-618

Presentation date： 2001.10
周波数領域ブラインド音源分離のための極座標表示に基づく活性化関数

澤田宏, 向井良, 荒木章子, 牧野昭二

音講論集___2-6-13_615-616

Presentation date： 2001.10
周波数領域ブラインド音源分離と周波数領域適応ビームフォーマの関係について

荒木章子, 牧野昭二, 向井良, 猿渡洋

音講論集___2-6-12_613-614

Presentation date： 2001.10
時間領域ICAと周波数領域ICAを併用した多段ICAによるブラインド音源分離

猿渡洋, 西川剛樹, 荒木章子, 牧野昭二

日本神経回路学会全国大会____99-100

Presentation date： 2001.09
複素数に対する独立成分分析のための極座標表示に基づく活性化関数

澤田宏, 向井良, 荒木章子, 牧野昭二

日本神経回路学会全国大会____97-98

Presentation date： 2001.09
実環境での混合音声に対する周波数領域ブラインド音源分離手法の性能限界

荒木章子, 牧野昭二, 西川剛樹, 猿渡洋

音講論集___3-7-4_567-568

Presentation date： 2001.03
帯域分割型ICAを用いたBlind Source Separationにおける帯域分割数の最適化

西川剛樹, 荒木章子, 牧野昭二, 猿渡洋

音講論集___3-7-5_569-570

Presentation date： 2001.03
実環境におけるブラインド音源分離と残響除去性能に関する検討

向井良, 荒木章子, 牧野昭二

音講論集___3-7-3_565-566

Presentation date： 2001.03
周波数領域Blind Source Separationにおける帯域分割数の最適化

西川剛樹, 荒木章子, 牧野昭二, 猿渡洋

信学技報___EA2000-95_53-59

Presentation date： 2001.01
チャネル数変換型多チャネル音響エコーキャンセラ

中川朗, 島内末廣, 羽田陽一, 青木茂明, 牧野昭二

信学総大___A-4-51_140

Presentation date： 2000.03
ステレオエコーキャンセラにおける相互相関変動方法の検討

鈴木邦和, 杉山精, 阪内澄宇, 島内末廣, 牧野昭二

信学技報___EA99-86_25-32

Presentation date： 1999.12
音響系の変動に着目したステレオ信号の相関低減方法

鈴木邦和, 阪内澄宇, 島内末廣, 牧野昭二

音講論集___1-6-12_453-454

Presentation date： 1999.03
ハンズフリー音声会議装置における複数マイクロホンの構成の検討

中川朗, 島内末廣, 牧野昭二

音講論集___2-6-7_493-494

Presentation date： 1999.03
相互相関の変動付加処理に適したステレオエコーキャンセラの構成の検討

島内末廣, 羽田陽一, 牧野昭二, 金田豊

信学総大___A-4-12_121

Presentation date： 1998.03
Block fast projection algorithm with independent block sizes

M. Tanaka, S. Makino, J. Kojima

信学総大___TA-2-2_554-555

Presentation date： 1997.03
射影アルゴリズムを用いたサブバンドステレオエコーキャンセラ

牧野昭二, 島内末廣, 羽田陽一, 中川朗

音講論集___2-7-18_549-550

Presentation date： 1996.09
サブバンドエコーキャンセラにおけるフィルタ更新ベクトルの平坦化の検討

中川朗, 羽田陽一, 牧野昭二

信学ソ大___A-87_88

Presentation date： 1996.09
拡声通信システムにおける周波数帯域別所要エコー抑圧量の検討

阪内澄宇, 牧野昭二

音講論集___2-7-17_547-548

Presentation date： 1996.09
高速射影アルゴリズムの多チャンネル系への適用

島内末廣, 田中雅史, 牧野昭二

信学総大___A-168_170

Presentation date： 1996.03
'ES family'アルゴリズムと従来の適応アルゴリズムの関係について

牧野昭二

信学技報___DSP95-148_65-70

Presentation date： 1996.01
高速FIRフィルタリング算法を利用した射影法

田中雅史, 牧野昭二, 金田豊

信学ソ大___A-79_81

Presentation date： 1995.09
サブバンドエコーキャンセラのプロトタイプフィルタの検討

中川朗, 羽田陽一, 牧野昭二

信学ソ大___A-73_75

Presentation date： 1995.09
擬似入出力関係を利用したステレオ音響エコーキャンセラ用アルゴリズムの検討

島内末廣, 牧野昭二

音講論集___2-6-5_543-544

Presentation date： 1995.09
複素射影サブバンドエコーキャンセラに関する検討

中川朗, 羽田陽一, 牧野昭二

音講論集___2-6-3_539-540

Presentation date： 1995.09
エコーキャンセラ用SSBサブバンド射影アルゴリズム

牧野昭二, 羽田陽一, 中川朗

音講論集___2-6-4_541-542

Presentation date： 1995.09
真の音響エコー経路を推定するステレオ射影エコーキャンセラの検討

島内末廣, 牧野昭二

信学総大___A-220_220

Presentation date： 1995.03
ES射影アルゴリズムを用いたデュオフィルタ構成のエコーキャンセラの検討

羽田陽一, 牧野昭二, 小島順治, 島内末廣

音講論集___3-3-10_595-596

Presentation date： 1995.03
音響エコーキャンセラ用デュオフィルタコントロールシステム

羽田陽一, 牧野昭二, 田中雅史, 島内末廣, 小島順治

信学総大___A-350_350

Presentation date： 1995.03
高性能音響エコーキャンセラの開発

小島順治, 牧野昭二, 羽田陽一, 島内末廣, 金田豊

信学総大___A-348_348

Presentation date： 1995.03
ES射影アルゴリズムの音響エコーキャンセラへの適用

牧野昭二, 羽田陽一, 田中雅史, 金田豊, 小島順治

信学総大___A-349_349

Presentation date： 1995.03
エコーキャンセラの音声入力に対する収束速度改善方法の比較について

牧野昭二

音講論集___2-6-16_653-654

Presentation date： 1994.10
ステレオ信号の相互相関の変化に着目したステレオ射影エコーキャンセラの検討

島内末廣, 牧野昭二

音講論集___2-6-17_655-656

Presentation date： 1994.10
PMTC/N-ISDN用多地点エコーキャンセラの構成

須田泰史, 藤野雄一, 牧野昭二, 小長井俊介, 川田真一

信学全大___B-795_393

Presentation date： 1994.09
室内音場伝達関数の共通極・零モデル化

羽田陽一, 牧野昭二, 金田豊

信学技報___EA93-101_19-29

Presentation date： 1994.03
ES-RLSアルゴリズムと従来の適応アルゴリズムの関係について

牧野昭二

音講論集___1-5-12_471-472

Presentation date： 1993.10
共通極を用いたスピーカ特性の多点イコライゼーションについて

羽田陽一, 牧野昭二

音講論集___1-5-18_483-484

Presentation date： 1993.10
高次の射影アルゴリズムの演算量削減について

田中雅史, 金田豊, 牧野昭二

信学全大___A-101_1-103

Presentation date： 1993.09
共通極を用いた多点イコライゼーションフィルタについて

羽田陽一, 牧野昭二

音講論集___3-9-17_491-492

Presentation date： 1993.03
複数の室内音場伝達関数に共通な極の最小2乗推定について

羽田陽一, 牧野昭二, 金田豊

信学全大___SA-11-4_1-489 - 1-490

Presentation date： 1993.03
音響エコーキャンセラ用ES射影アルゴリズム

牧野昭二, 金田豊

信学技報___EA92-74_41-52

Presentation date： 1992.11
室内インパルス応答の変動特性を反映させたES-RLSアルゴリズム

牧野昭二, 金田豊

音講論集___2-4-19_547-548

Presentation date： 1992.10
音声入力に対する射影法の次数と収束特性について

田中雅史, 牧野昭二, 金田豊

音講論集___1-4-14_489-490

Presentation date： 1992.10
エコーキャンセラ用ES射影アルゴリズムの収束条件について

牧野昭二, 金田豊

信学全大___SA-9-6_1-301

Presentation date： 1992.09
室内インパルス応答の統計的性質に基づく指数重み付けNLMS適応フィルタ

牧野昭二, 金田豊

信学技報___EA92-48_9-20

Presentation date： 1992.08
エコーキャンセラ用ES射影アルゴリズム

牧野昭二, 金田豊

信学全大___SA-7-11_1-472 - 1-473

Presentation date： 1992.03
音響エコーキャンセラにおけるダブルトーク制御方式の検討

中原宏之, 羽田陽一, 牧野昭二, 吉川昭吉郎

音講論集___3-5-7_503-504

Presentation date： 1992.03
音の到来方向によらない頭部伝達関数の共通極とモデル化について

羽田陽一, 牧野昭二, 金田豊

音講論集___1-8-5_483-484

Presentation date： 1991.10
エコーキャンセラ用ES (Exponential Step) アルゴリズムの収束条件について

牧野昭二, 金田豊

音講論集___1-7-25_419-420

Presentation date： 1991.03
室内音場伝達関数の極の推定について

羽田陽一, 牧野昭二, 金田豊

音講論集___1-7-12_393-394

Presentation date： 1991.03
帯域分割形指数重み付けアルゴリズムを用いた音響エコーキャンセラ

牧野昭二, 羽田陽一

信学全大___SA-9-4_1-255 - 1-256

Presentation date： 1990.10
低周波領域における室内音場伝達関数のARMAモデルについて

羽田陽一, 牧野昭二, 小泉宣夫

音講論集___2-7-14_439-440

Presentation date： 1990.03
指数重み付けによるエコーキャンセラ用適応アルゴリズム

牧野昭二

音講論集___3-6-5_517-518

Presentation date： 1989.10
エコーキャンセラの室内音場における適応特性改善について

牧野昭二, 小泉宣夫

信学技報___EA89-3_15-21

Presentation date： 1989.04
拡声通話形の音声会議システム

及川弘, 西野正和, 牧野昭二

信学全大___B-548_2-243

Presentation date： 1988.03
エコーキャンセラの室内音場における適応特性の改善について

牧野昭二, 小泉宣夫

音講論集___1-5-13_355-356

Presentation date： 1988.03
複数反響路を有する音響エコーキャンセラの構成法

小泉宣夫, 牧野昭二, 及川弘

信学技報___EA87-75_1-6

Presentation date： 1988.01
複数反響路を有する音響エコーキャンセラ

小泉宣夫, 牧野昭二, 及川弘

信学部門全大___431_1-296

Presentation date： 1987.09
音響エコーキャンセラの室内環境における消去特性について

牧野昭二, 小泉宣夫

信学技報___EA87-43_41-48

Presentation date： 1987.08
直方体ブース内の障害物によるインパルス応答の変動について

牧野昭二, 小泉宣夫

音講論集___1-3-1_295-296

Presentation date： 1987.03
MTFによる音声会議でのマイクロホン配置の評価について

小泉宣夫, 牧野昭二, 青木茂明

音講論集___2-7-18_631-632

Presentation date： 1986.10
音響エコーキャンセラの室内環境における定常特性について

牧野昭二, 小泉宣夫

音講論集___2-7-19_383-384

Presentation date： 1985.10
室内残響特性を考慮した音声スイッチ切替特性の検討

牧野昭二, 山森和彦

音講論集___1-2-19_265-266

Presentation date： 1984.10
マイクロプロセッサ制御を用いた拡声電話機の構成法

山森和彦, 松井弘行, 牧野昭二

信学技報___EA84-41_15-21

Presentation date： 1984.09
音声スイッチ回路損失制御波形の通話品質への影響

石丸薫, 小川峰義, 牧野昭二

信学部門全大___795_3-190

Presentation date： 1984.09
周辺に段差を持つ圧電バイモルフ振動板の振動特性について

一ノ瀬裕, 牧野昭二

音講論集___1-6-5_287-288

Presentation date： 1983.10
ハンドセット小形化に関する一検討

牧野昭二, 一ノ瀬裕

音講論集___1-6-10_297-298

Presentation date： 1983.10


Research Projects

次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

関東経済産業局中小企業経営支援等対策費補助金（戦略的基盤技術高度化支援事業）

Project Year :

2020.04

-

2021.03
音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

日本学術振興会基盤研究(B)

Project Year :

2019

-

2021

牧野昭二

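[Item 1] Based on a physical model of sound propagation, we interpolated the observed signals to create virtual observations that do not physically exist, thereby virtually increasing the number of array elements, and studied a unified array signal processing framework that yields high-quality output regardless of the number of sources. The amplitude of the virtual observations was estimated by nonlinear interpolation. We carried out a basic verification of the virtual observations through an underdetermined extension of speech enhancement that uses them. We also worked to clarify the operating principle of the virtual microphone and to improve its performance. The results for this period were two international conference presentations and one domestic conference presentation.
[Item 2] We developed multichannel signal processing algorithms that exploit information from the acoustic environment. We generalized existing algorithms so that they can handle distributed microphone arrays and introduced more powerful optimization criteria. We developed a synchronization method for the subarrays of a distributed microphone array. We extended blind source separation/extraction algorithms and multichannel dereverberation algorithms to distributed microphone arrays. We also studied microphone selection methods that optimize performance while minimizing the number of microphones required and thereby reducing the computational cost. The results for this period were four journal papers, seven international conference presentations, and nine domestic conference presentations.
[Item 3] We analyzed and interpreted the acoustic environment based on features extracted from the enhanced source signals. We used prior knowledge about the source signals and also applied classification in the feature domain. To improve classification accuracy, we made use of recent speech recognition techniques such as deep learning. The results for this period were one international conference presentation and one domestic conference presentation.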
マイクロホンアレーを用いた音情景解析の研究

筑波大学・ドイツ学術交流会（ＤＡＡＤ）パートナーシップ・プログラム

Project Year :

2017.04

-

2018.03
ＡＬＳ患者のための音の空間情報を利用したブレインマシンインタフェース(ＢＭＩ)の研究開発

総務省戦略的情報通信研究開発推進制度（ＳＣＯＰＥ）その他

Project Year :

2014.04

-

2015.03
Innovation of multi-channel EEG signal processing technology for BMI development by fusion of information science and brain science

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2013.04

-

2014.03

MAKINO Shoji, RUTKOWSKI Tomasz, MIYABE Shigeki, TERASAWA Hiroko, YAMADA Takeshi

We advanced BMI development within the following two frameworks. (1) We proposed a method that selects the optimal latency and electrode using the F-value, based on the statistical characteristics of event-related potentials for spatial auditory stimuli. The proposed method demonstrated an 8% improvement in the correct classification rate. (2) We examined a combination of real and virtual sound sources, presented through loudspeakers, for evoking P300 responses. A large individual difference in P300 appearance was confirmed. To develop an alternative auditory BMI using virtual sound sources, we tried a headphone-based auditory BMI using a head impulse response from an open database. A clear P300 was observed in the occipital area. With a view to developing a multimodal BMI, we compared the P300s evoked by spatial auditory stimuli, visual stimuli, and the combination of these modalities. The comparison revealed that the P300 amplitude for spatial auditory stimuli was smaller than for the others.
音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

日本学術振興会基盤研究(B)

Project Year :

2020.04

-

2021.03
スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

日本学術振興会基盤研究(A)

Project Year :

2020.04

-

2021.03
音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

日本学術振興会基盤研究(B)

Project Year :

2019.04

-

2020.03
スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

日本学術振興会基盤研究(A)

Project Year :

2019.04

-

2020.03
次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

関東経済産業局中小企業経営支援等対策費補助金（戦略的基盤技術高度化支援事業）

Project Year :

2019.04

-

2020.03
非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

日本学術振興会基盤研究(A)

Project Year :

2019.04

-

2020.03
高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

セコム科学技術振興財団

Project Year :

2019.04

-

2020.03
大量音声データの事前学習に基づくブラインド音源分離手法の高度化

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2019.04

-

2020.02
次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

関東経済産業局中小企業経営支援等対策費補助金（戦略的基盤技術高度化支援事業）

Project Year :

2018.09

-

2019.03
非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

日本学術振興会基盤研究(A)

Project Year :

2018.04

-

2019.03
高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

セコム科学技術振興財団

Project Year :

2018.04

-

2019.03
聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2018.04

-

2019.02
DNNを用いた音声音響符号化の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2018.04

-

2019.02
非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

日本学術振興会基盤研究(A)

Project Year :

2017.04

-

2018.03
音環境の認識と理解およびスマートホームセキュリティー、ロボット聴覚、等への応用

NII 国内共同研究

Project Year :

2017.04

-

2018.03
環境に適応するための音声強調系最適化

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2017.04

-

2018.03
高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

セコム科学技術振興財団

Project Year :

2017.04

-

2018.03
DNNを用いた音声音響符号化の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2017.04

-

2018.02
聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2017.04

-

2018.02
柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

国立研究開発法人科学技術振興機構 (JST) 革新的研究開発推進プログラム（ImPACT)

Project Year :

2017.04

-

2017.11
マイクの指向性による、音声認識率の向上

富士ソフト株式会社国内共同研究

Project Year :

2016.04

-

2017.03
柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

国立研究開発法人科学技術振興機構 (JST) 革新的研究開発推進プログラム（ImPACT)

Project Year :

2016.04

-

2017.03
非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

日本学術振興会基盤研究(A)

Project Year :

2016.04

-

2017.03
マイクロホンアレー付き監視カメラを用い音響情報を統計数理的学習理論により解析するイベント検出とシーン解析

NII 国内共同研究

Project Year :

2016.04

-

2017.03
高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

セコム科学技術振興財団

Project Year :

2016.04

-

2017.03
音響情報と映像情報を統計数理的学習理論により融合するイベント検出とシーン解析

筑波大学研究基盤支援プログラム（Ｂタイプ）

Project Year :

2016.04

-

2017.03
マイクロホンアレーを用いた音情景解析の研究

筑波大学・ドイツ学術交流会（ＤＡＡＤ）パートナーシップ・プログラム

Project Year :

2016.04

-

2017.03
音声音響符号化音のプレフィルタ・ポストフィルタ処理による音質改善の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2016.04

-

2017.02
音声のスペクトル領域とケプストラム領域における同時強調法の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2016.04

-

2017.02
柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

国立研究開発法人科学技術振興機構 (JST) 革新的研究開発推進プログラム（ImPACT)

Project Year :

2015.09

-

2016.03
非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

日本学術振興会基盤研究(B)

Project Year :

2015.04

-

2016.03
音響センシングによる交通量モニタリング

NII 国内共同研究

Project Year :

2014.04

-

2015.03
低遅延・低ビットレートの音声・音響統合符号化の検討

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2014.04

-

2015.03
非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

日本学術振興会基盤研究(B)

Project Year :

2014.04

-

2015.03
高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

日本学術振興会基盤研究(A)

Project Year :

2014.04

-

2015.03
ＡＬＳ患者のための音の空間情報を利用したブレインマシンインタフェース(ＢＭＩ)の研究開発

総務省戦略的情報通信研究開発推進制度（ＳＣＯＰＥ）その他

Project Year :

2013.04

-

2014.03
Microphone Array Signal Processing with Asynchronous Recording Devices

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2013.04

-

2014.03

ONO Nobutaka, MAKINO Shoji, MIYABE Shigeki, SHINODA Koichi

Microphone array signal processing is an important technique for estimating the direction of arrival of sound or enhancing a target sound in a noisy environment by processing multichannel signals. In microphone array signal processing, the tiny time differences between channels carry important information. Therefore, in the conventional framework, multichannel signals have to be recorded synchronously. In this study, by contrast, we developed techniques for synchronizing the recorded signals and for estimating microphone positions without any a priori knowledge, so that asynchronous individual recording devices such as smartphones, laptop PCs, and IC recorders can be used.
複数録音機器による非同期録音信号の同期に関する研究

ヤマハ株式会社国内共同研究

Project Year :

2013.04

-

2014.03
複素対数補間に基づくヴァーチャル観測を用いた劣決定アレイ信号処理

NII 国内共同研究

Project Year :

2013.04

-

2014.03
A study on custom-made augmented speech communication system based on higher-order statistics pursuit

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2013.04

-

2014.03

Saruwatari Hiroshi, SHIKANO Kiyohiro, TODA Tomoki, KAWANAMI Hiromichi, ONO Nobutaka, MIYABE Shigeki, MAKINO Shoji, KOYAMA Shoichi

In this study, we address an unsupervised custom-made augmented speech communication system based on higher-order statistics pursuit. The system consists of two parts: a binaural hearing aid using blind source separation and a speaking aid based on speech conversion. The following results were obtained. (1) For the binaural hearing-aid system, we proposed new algorithms for accurate and fast blind source separation and for statistical speech conversion, yielding a high-quality speech enhancement system that utilizes a fixed point of auditory perception. (2) For the speaking-aid system, we proposed a new speech conversion algorithm that is robust against mismatch between speech databases. An evaluation using a real-world sound database shows the efficacy of the proposed method.
低遅延・低ビットレートの音声・音響統合符号化の検討

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2013.05

-

2014.02
ＡＬＳ患者のための音の空間情報を利用したブレインマシンインタフェース(ＢＭＩ)の研究開発

総務省戦略的情報通信研究開発推進制度（ＳＣＯＰＥ）その他

Project Year :

2012.09

-

2013.03
Innovation of multi-channel EEG signal processing technology for BMI development by fusion of information science and brain science

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2012.04

-

2013.03

MAKINO Shoji, RUTKOWSKI Tomasz, MIYABE Shigeki, TERASAWA Hiroko, YAMADA Takeshi

We advanced BMI development within the following two frameworks. (1) We proposed a method that selects the optimal latency and electrode using the F-value, based on the statistical characteristics of event-related potentials for spatial auditory stimuli. The proposed method demonstrated an 8% improvement in the correct classification rate. (2) We examined a combination of real and virtual sound sources, presented through loudspeakers, for evoking P300 responses. A large individual difference in P300 appearance was confirmed. To develop an alternative auditory BMI using virtual sound sources, we tried a headphone-based auditory BMI using a head impulse response from an open database. A clear P300 was observed in the occipital area. With a view to developing a multimodal BMI, we compared the P300s evoked by spatial auditory stimuli, visual stimuli, and the combination of these modalities. The comparison revealed that the P300 amplitude for spatial auditory stimuli was smaller than for the others.
非同期録音機器を利用可能にするアレイ信号処理技術

NII 国内共同研究

Project Year :

2012.04

-

2013.03
A study on custom-made augmented speech communication system based on higher-order statistics pursuit

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2012.04

-

2013.03

Saruwatari Hiroshi, SHIKANO Kiyohiro, TODA Tomoki, KAWANAMI Hiromichi, ONO Nobutaka, MIYABE Shigeki, MAKINO Shoji, KOYAMA Shoichi

In this study, we address an unsupervised custom-made augmented speech communication system based on higher-order statistics pursuit. The system consists of two parts: a binaural hearing aid using blind source separation and a speaking aid based on speech conversion. The following results were obtained. (1) For the binaural hearing-aid system, we proposed new algorithms for accurate and fast blind source separation and for statistical speech conversion, yielding a high-quality speech enhancement system that utilizes a fixed point of auditory perception. (2) For the speaking-aid system, we proposed a new speech conversion algorithm that is robust against mismatch between speech databases. An evaluation using a real-world sound database shows the efficacy of the proposed method.
Innovation of multi-channel EEG signal processing technology for BMI development by fusion of information science and brain science

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2011.04

-

2012.03

MAKINO Shoji, RUTKOWSKI Tomasz, MIYABE Shigeki, TERASAWA Hiroko, YAMADA Takeshi

We advanced BMI development within the following two frameworks. (1) We proposed a method that selects the optimal latency and electrode using the F-value, based on the statistical characteristics of event-related potentials for spatial auditory stimuli. The proposed method demonstrated an 8% improvement in the correct classification rate. (2) We examined a combination of real and virtual sound sources, presented through loudspeakers, for evoking P300 responses. A large individual difference in P300 appearance was confirmed. To develop an alternative auditory BMI using virtual sound sources, we tried a headphone-based auditory BMI using a head impulse response from an open database. A clear P300 was observed in the occipital area. With a view to developing a multimodal BMI, we compared the P300s evoked by spatial auditory stimuli, visual stimuli, and the combination of these modalities. The comparison revealed that the P300 amplitude for spatial auditory stimuli was smaller than for the others.
音声特性と聴覚特性を反映した音声強調処理技術の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2011.04

-

2012.03
脳科学と情報科学を融合させたＢＣＩ構築のための多チャネル脳波信号処理の研究

電気通信普及財団出資金による受託研究

Project Year :

2011.04

-

2012.03
脳科学，生命科学，情報科学を融合させた生体マルチメディア情報研究

Project Year :

2011.04

-

　
音声特性と聴覚特性を反映した音声強調処理技術の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2010.04

-

2011.03
音声特性と聴覚特性を反映した音声強調処理技術の研究

NTT コミュニケーション科学基礎研究所国内共同研究

Project Year :

2009.04

-

2010.03
生体信号処理と音響信号処理による生命科学研究の革新

日本学術振興会科学研究費助成事業

Project Year :

2010

　

　

牧野昭二, WU Y. J.
音声、音楽メディアのコンテンツ基盤技術の創出とイマーシブオーディオコミュニケーションの創生

Project Year :

2009.04

-


Misc

畳込み混合のブラインド音源分離(<特集>独立成分分析とその応用特集号)

牧野昭二, 荒木章子, 向井良, 澤田宏

システム/制御/情報 : システム制御情報学会誌 48 ( 10 ) 401 - 408 2004.10

ブラインドな処理が可能な音源分離技術 (特集コミュニケーションの壁を克服するための音声・音響処理技術)

牧野昭二, 荒木章子, 向井良

NTT技術ジャーナル 15 ( 12 ) 8 - 12 2003.12

ステレオエコーキャンセラの課題と解決法

牧野昭二, 島内末廣

システム/制御/情報 : システム制御情報学会誌 46 ( 12 ) 724 - 732 2002.12

混じりあった声を解く--遠隔発話の認識を目指して (特集論文1 人にやさしい対話型コンピュータ)

牧野昭二, 向井良, 荒木章子

NTT R & D 50 ( 12 ) 937 - 944 2001.12

サブバンド信号処理 : 実時間動作化の奥の手

牧野昭二

日本音響学会誌 56 ( 12 ) 845 - 851 2000.12

周波数帯域における音響エコー経路の変動特性を反映させたサブバンドESアルゴリズム

牧野昭二, 羽田陽一

電子情報通信学会論文誌. A, 基礎・境界 79 ( 6 ) 1138 - 1146 1996.06

音響エコーキャンセラ用ES射影アルゴリズム (シームレスな音響空間の実現を目指して<特集>)

牧野昭二, 金田豊

NTT R & D 44 ( 1 ) 45 - 52 1995.01

音響エコー経路の変動特性を反映させたRLS適応アルゴリズム

牧野昭二, 金田豊

日本音響学会誌 50 ( 1 ) 32 - 39 1993.12

Estimating correlation coefficients of two super-Gaussian complex signals without phase observation

MIYABE Shigeki, ONO Nobutaka, MAKINO Shoji

IEICE technical report. Signal processing 114 ( 474 ) 19 - 24 2015.03

In this paper, we describe the estimation of the correlation coefficient between two complex signal sequences when the phase is missing from the observation. In our previous work, we formulated a probabilistic model that assumes the complex amplitude sequences follow a bivariate complex normal distribution, and proposed maximum likelihood estimation of the correlation coefficient by an EM algorithm that treats the phase difference as a hidden variable. However, complex signals are often super-Gaussian, which causes a mismatch with the Gaussian assumption, so the estimation accuracy depends on the signals. In this paper, we examine an estimator that is robust against this model mismatch by formulating a maximum likelihood estimation that adapts to the signal shape, assuming that the two complex amplitude sequences follow a multivariate t distribution. Experimental results reveal that, depending on the signals, the complex t distribution model is not always better than the complex normal distribution model, but that by selecting the appropriate model, the maximum likelihood estimation can obtain better results than a straightforward amplitude correlation estimator.

Blind Compensation of Sampling Frequency Mismatch for Unsynchronized Microphone Array

Miyabe Shigeki, Ono Nobutaka, Makino Shoji

Technical report of IEICE. EA 112 ( 347 ) 11 - 16 2012.12

In this paper, we propose a method for estimating the sampling frequency mismatch between the observation channels of an unsynchronized microphone array. Since the change in the time difference between channels can be regarded as constant over a short time interval, we compensate for the phase in the frequency domain. In addition, assuming that the sources do not move, we estimate the sampling frequency mismatch by maximum likelihood estimation. Experiments reveal that the proposed method recovers the performance of array signal processing.

Speech enhancement by asynchronous microphone array using the single source interval information

Sakanashi Ryutaro, Ono Nobutaka, Miyabe Shigeki, Yamada Takeshi, Makino Shoji

Technical report of IEICE. EA 112 ( 347 ) 17 - 22 2012.12

An asynchronous microphone array has the advantage that it can use multiple recording devices such as mobile phones and voice recorders. It is free from the scalability constraints of audio signal processing with a conventional microphone array, and it can be configured inexpensively and flexibly. However, an asynchronous microphone array has several problems. For example, the recording start times and the DOA information are unknown, and the sampling frequencies of the devices differ by unknown amounts. In particular, the offset in recording start time and the sampling frequency differences between devices can have a significant impact on signal processing and therefore need to be compensated for. In this paper, we assume in advance that the purpose is speech enhancement, for example when a conference is recorded in order to produce the minutes. We then propose a synchronization and compensation method that uses "the single source interval information," that is, the time intervals in the recording during which only one source produces sound.

Simulation of radial characteristic control with spherical speaker array

HAYASHI Takaya, MIYABE Shigeki, YAMADA Takeshi, MAKINO Shoji

Technical report of IEICE. EA 112 ( 76 ) 19 - 24 2012.06

This paper describes the control of distance attenuation using a spherical loudspeaker array. One research group proposed radial filtering with a spherical microphone array, which controls the sensitivity to the distance from a sound source by modeling wave propagation in the spherical harmonic domain. Since transfer functions do not change when the input and output are swapped, the radial filtering for microphone arrays can be applied to the design of filters for distance attenuation control with loudspeaker arrays. Experimental results confirmed that the proposed method is effective at low frequencies.

Underdetermined DOA estimation by the non-linear MUSIC based on higher-order moment analysis

SUGIMOTO Yuya, MIYABE Shigeki, YAMADA Takeshi, MAKINO Shoji

Technical report of IEICE. EA 112 ( 76 ) 49 - 54 2012.06

This paper describes a new approach that extends MUltiple SIgnal Classification (MUSIC) to high-resolution underdetermined direction-of-arrival (DOA) estimation by exploiting higher-order moments. The proposed method nonlinearly maps the observed signals onto a higher-dimensional space and analyzes the covariance matrix there. The covariance matrix in the higher-dimensional space corresponds to the higher-order cross moment matrix in the original space of the observed signals. Since the mapping increases the dimensionality of the noise subspace, the proposed method achieves higher resolution in DOA estimation than the standard MUSIC and can also estimate DOAs in underdetermined conditions. We compared the properties of the proposed method with those of the conventional 2q-MUSIC, which utilizes higher-order cumulants, both theoretically and experimentally.

D-14-9 Effective factors for grading short answer questions in Japanese speaking test

Okubo Naoko, Yamahata Yuto, Yamada Takeshi, Imai Shingo, Ishizuka Kenkichi, Shinozaki Takahiro, Nishimura Ryuichi, Makino Shoji, Kitawaki Nobuhiko

Proceedings of the IEICE General Conference 2012 ( 1 ) 193 - 193 2012.03

D-14-8 Effective factors for grading reading questions in Japanese speaking test

Yamahata Yuto, Okubo Naoko, Yamada Takeshi, Imai Shingo, Ishizuka Kenkichi, Shinozaki Takahiro, Nishimura Ryuichi, Makino Shoji, Kitawaki Nobuhiko

Proceedings of the IEICE General Conference 2012 ( 1 ) 192 - 192 2012.03

Scattered speech signal detection by principal component analysis for spatial power spectrum

40 ( 7 ) 575 - 580 2010.08

Speaker diarization for meetings by integrating speech presence probability estimation and time-frequency domain direction of arrival estimation

ARAKI Shoko, FUJIMOTO Masakiyo, ISHIZUKA Kentaro, NAKATANI Tomohiro, SAWADA Hiroshi, MAKINO Shoji

IEICE technical report 108 ( 143 ) 19 - 24 2008.07

This paper presents a meeting diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. This paper proposes two methods for improving diarization performance. As the first proposal, we employ a DOA at each time-frequency slot (TFDOA) so that multiple DOAs can be estimated at a frame when multiple speakers speak simultaneously. The second proposal is to integrate VAD and DOA in a probabilistic way. This paper reports how such proposals improve diarization performance for real meetings/conversations.

Special section on acoustic scene analysis and reproduction - Foreword

Shoji Makino

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E91A ( 6 ) 1301 - 1302 2008.06

Other
Blind Audio Source Separation based on Independent Component Analysis

MAKINO Shoji

IEICE technical report 108 ( 70 ) 65 - 73 2008.05

This paper describes a state-of-the-art method for the blind source separation (BSS) of convolutive mixtures of audio signals. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct separation filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

周波数領域ICAにおける初期値の短時間データからの学習

荒木章子, 伊藤信貴, 澤田宏, 小野順貴, 牧野昭二, 嵯峨山茂樹

電子情報通信学会大会講演論文集 2008 208 - 208 2008.03

AS-4-5 Sparseness based Underdetermined Blind Speech Separation

Araki Shoko, Sawada Hiroshi, Makino Shoji

Proceedings of the IEICE General Conference S-46 - S-47 2008

A-10-7 Blind Signal Separation by Observation Vector Clustering

Araki Shoko, Sawada Hiroshi, Mukai Ryo, Makino Shoji

Proceedings of the Society Conference of IEICE 2005 208 - 208 2005.09

A-10-9 Blind Extraction of Dominant Target Sources from Many Background Interference Sources

Sawada Hiroshi, Araki Shoko, Mukai Ryo, Makino Shoji

Proceedings of the Society Conference of IEICE 2005 210 - 210 2005.09

A-10-8 Blind Source Separation of Many Speech Signals Using Small 3-D Microphone Array

Mukai Ryo, Sawada Hiroshi, Araki Shoko, Makino Shoji

Proceedings of the Society Conference of IEICE 2005 209 - 209 2005.09

Low-delay Real-time Blind Source Separation for Moving Speakers

MUKAI Ryo, SAWADA Hiroshi, ARAKI Shoko, MAKINO Shoji

2003 ( 1 ) 779 - 780 2003.03

独立成分分析に基づくブラインド音源分離

牧野昭二

ディジタル信号処理シンポジウム 103 ( 129 ) 17 - 24 2003

Article, review, commentary, editorial, etc. (scientific journal)

This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct unmixing filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

CiNii
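
The nonlinear decorrelation mentioned in the summary above can be illustrated with a minimal sketch. It is not the method of the paper: it assumes an instantaneous (non-convolutive) two-channel mixture, toy Laplacian sources, a tanh nonlinearity, and a plain natural-gradient update. The function name, mixing matrix, step size, and iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, step=0.1):
    """Estimate an unmixing matrix W for an instantaneous mixture x (channels x samples)."""
    n_ch, n_samples = x.shape
    x = x - x.mean(axis=1, keepdims=True)      # remove the mean of each channel
    W = np.eye(n_ch)
    for _ in range(n_iter):
        y = W @ x                              # current source estimates
        phi = np.tanh(y)                       # nonlinearity for super-Gaussian sources
        corr = (phi @ y.T) / n_samples         # nonlinear correlation E[phi(y) y^T]
        # Natural-gradient update: at the fixed point corr equals the identity,
        # i.e. the outputs are nonlinearly decorrelated.
        W += step * (np.eye(n_ch) - corr) @ W
    return W

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000))               # two independent toy sources (assumption)
A = np.array([[1.0, 0.6], [0.5, 1.0]])         # hypothetical mixing matrix
x = A @ s                                      # observed mixtures
W = natural_gradient_ica(x)
G = W @ A                                      # global system, ideally a scaled permutation
print(np.round(G, 3))
```

If separation succeeds, each row of the printed matrix has one dominant entry, which means each output contains mostly one source. Real speech in a room is a convolutive mixture, which the summary handles in the frequency domain; this sketch does not cover that case.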
Blind source separation using SSB subband

ARAKI S., AICHNER Robert, MAKINO S., NISHIKAWA T., SARUWATARI H.

2002 ( 1 ) 619 - 620 2002.03

CiNii
Blind source separation using pairs of microphones with different distances

SAWADA Hiroshi, ARAKI Shoko, MUKAI Ryo, MAKINO Shoji

2002 ( 1 ) 621 - 622 2002.03

CiNii
Real time blind source separation in reverberant environment using frequency domain ICA and time-delayed spectral subtraction

MUKAI Ryo, ARAKI Shoko, SAWADA Hiroshi, MAKINO Shoji

2002 ( 1 ) 673 - 674 2002.03

CiNii
Relationship between frequency domain blind source separation and frequency domain adaptive beamformers

ARAKI S., MAKINO S., MUKAI R., SARUWATARI H.

2001 ( 2 ) 613 - 614 2001.10

CiNii
Suppression of residual cross-talk component using non-stationary spectral subtraction

MUKAI Ryo, ARAKI Shoko, SAWADA Hiroshi, MAKINO Shoji

2001 ( 2 ) 617 - 618 2001.10

CiNii
Limitation of frequency domain Blind Source Separation for convolutive mixture of speech

2001 ( 1 ) 567 - 568 2001.03

CiNii
Blind source separation and removal of reverberation in the real environment

MUKAI Ryo, ARAKI Shoko, MAKINO Shoji

2001 ( 1 ) 565 - 566 2001.03

CiNii
Optimization on the Number of Subbands in Blind Source Separation with Subband ICA

NISHIKAWA T., ARAKI S., MAKINO S., SARUWATARI H.

2001 ( 1 ) 569 - 570 2001.03

CiNii
Optimization on the Number of Subbands in Frequency-Domain Blind Source Separation

NISHIKAWA Tsuyoki, ARAKI Shoko, MAKINO Shoji, SARUWATARI Hiroshi

Technical report of IEICE. EA 100 ( 580 ) 53 - 59 2001.01

This paper describes an optimization strategy for the number of subbands in frequency-domain blind source separation (BSS). In general, the separation performance of the conventional ICA-based BSS method degrades significantly under reverberant conditions. On the other hand, for inverse filters used in dereverberation, it is known that higher performance can be achieved as the number of filter taps (or the number of subbands) increases. Accordingly, we first carry out BSS experiments in which the number of subbands in ICA is increased to improve BSS performance. The results reveal that the separation performance degrades when the number of subbands is exceedingly large, e.g., when 1024 or 2048 subbands are used. To identify the cause of the degradation, we next define a simple objective measure that quantifies independence and investigate the relation between the number of subbands and the independence among narrowband sound sources. The measurements show that independence decreases as the number of subbands increases, and we conclude that an optimal number of subbands exists in BSS based on frequency-domain ICA.

CiNii
A multi-channel acoustic echo canceller using channel number compressor and expander

Nakagawa Akira, Shimauchi Suehiro, Haneda Yoichi, Aoki Shigeaki, Makino Shoji

Proceedings of the IEICE General Conference 2000 140 - 140 2000.03

CiNii
A study of decorrelation on a stereo echo canceller

SUZUKI Kuniyasu, SUGIYAMA Kiyoshi, SAKAUCHI Sumitaka, SHIMAUCHI Suehiro, MAKINO Shoji

Technical report of IEICE. EA 99 ( 518 ) 25 - 32 1999.12

A stereo echo canceller is required for a stereo teleconferencing system. The main problem is that the adaptive filters often misconverge or, when they do converge, do so very slowly because of the cross-correlation between the stereo signals. Several pre-processing methods that decorrelate the stereo signals have been proposed to overcome this problem, but they introduce distortion that lowers speech quality. In this paper, we focus on tiny movements of the far-end talker and propose a new method of decorrelating the stereo signals without any confusion in sound image localization. We show that convergence can be further improved and speech quality maintained by optimizing the method using the characteristics of auditory perception.

CiNii
Decorrelation of the stereo signals based on acoustic path variation. - 2nd report. Optimizing using the characteristics of auditory perception. -

SUZUKI Kuniyasu, SAKAUCHI Sumitaka, SHIMAUCHI Suehiro, MAKINO Shoji

1999 ( 2 ) 495 - 496 1999.09

CiNii
Decorrelation of the stereo signals based on acoustic path variation.

SUZUKI Kuniyasu, SAKAUCHI Sumitaka, SHIMAUCHI Suehiro, MAKINO Shoji

1999 ( 1 ) 453 - 454 1999.03

CiNii
A study of microphone system for the hands-free tele-conferencing unit

NAKAGAWA Akira, SHIMAUCHI Suehiro, MAKINO Shoji

1999 ( 1 ) 493 - 494 1999.03

CiNii
A Study on Configuration of Stereo Echo Canceller with Cross-Correlation Shaker

Shimauchi Suehiro, Haneda Yoichi, Makino Shoji, Kaneda Yutaka

Proceedings of the IEICE General Conference 1998 121 - 121 1998.03

CiNii
Block Fast Projection Algorithm with Independent Block Sizes

Tanaka Masashi, Makino Shoji, Kojima Junji

Proceedings of the IEICE General Conference 1997 554 - 555 1997.03

Block processing is an effective approach for reducing the computational complexity of adaptive filtering algorithms, although it delays the adaptive filter output and degrades the convergence rate in some implementations. Recently, Benesty [1] proposed a solution to these problems. He introduced the idea of 'exact' block processing, which produces exactly the same filter output as the corresponding sample-by-sample algorithm and achieves a short delay by using the fast FIR filtering method. Block processing can be applied to two parts of an adaptive filtering algorithm: computing the filter output and updating the filter. Conventional 'exact' block algorithms have used the same block size for both parts. This short paper presents the 'exact' block projection algorithm [2] with two independent block sizes, which is listed in List 1. By showing the relation between the filter length and the output delay for a given computational power, we see that the independent block sizes extend the applicability of the 'exact' block fast projection algorithm toward use with longer delay.

CiNii
Whitening of the filter coefficient update-vector in the subband echo cancellers

NAKAGAWA Akira, HANEDA Yoichi, MAKINO Shoji

Proceedings of the Society Conference of IEICE 1996 88 - 88 1996.09

CiNii
Consideration on frequency domain echo return loss required for audio teleconference systems

SAKAUCHI Sumitaka, MAKINO Shoji

1996 ( 2 ) 547 - 548 1996.09

CiNii
Subband stereo echo canceller using projection algorithm with fast convergence to the true echo path.

MAKINO Shoji, SHIMAUCHI Suehiro, HANEDA Yoichi, NAKAGAWA Akira

1996 ( 2 ) 549 - 550 1996.09

CiNii
Fast Projection Algorithm for Multi-Channel Systems

Shimauchi Suehiro, Tanaka Masashi, Makino Shoji

Proceedings of the IEICE General Conference 1996 170 - 170 1996.03

CiNii
A study on prototype filter of subband echo canceller

NAKAGAWA Akira, HANEDA Yoichi, MAKINO Shoji

Proceedings of the Society Conference of IEICE 1995 75 - 75 1995.09

CiNii
Study on the stereo echo cancellation algorithm using imaginary input-output relationships

SHIMAUCHI Suehiro, MAKINO Shoji

1995 ( 2 ) 543 - 544 1995.09

CiNii
SSB subband projection algorithm for echo cancellers

MAKINO Shoji, HANEDA Yoichi, NAKAGAWA Akira

1995 ( 2 ) 541 - 542 1995.09

CiNii
A study on the complex projection subband echo cancellers

NAKAGAWA Akira, HANEDA Yoichi, MAKINO Shoji

1995 ( 2 ) 539 - 540 1995.09

CiNii
A study of stereo projection echo canceller with true echo path estimation

Shimauchi Suehiro, Makino Shoji

Proceedings of the IEICE General Conference 1995 220 - 220 1995.03

CiNii
Study on the echo canceller using the ES Projection algorithm

Makino Shoji, Haneda Yoichi, Tanaka Masashi, Kaneda Yutaka, Kojima Junji

Proceedings of the IEICE General Conference 1995 349 - 349 1995.03

CiNii
Study on the echo canceller based on the duo filter system using the ES Projection algorithm

HANEDA Yoichi, MAKINO Shoji, KOJIMA Junji, SHIMAUCHI Suehiro

1995 ( 1 ) 595 - 596 1995.03

CiNii
Duo filter control system for acoustic echo cancellers

Haneda Yoichi, Makino Shoji, Tanaka Masashi, Shimauchi Suehiro, Kojima Junji

Proceedings of the IEICE General Conference 350 - 350 1995

CiNii
Projection algorithm using fast FIR Filtering techniques

Proceedings of the Society Conference of IEICE 81 - 81 1995

CiNii
A Study on Adaptive Signal Processing for Acoustic Echo Cancellers

MAKINO Shoji

Doctoral dissertation, Tohoku University 71 ( 12 ) 2212 - 2214 1993

CiNii
Acoustic Echo Canceller Using a Subband Exponentially Weighted Algorithm

MAKINO Shoji

IEICE National Convention, SA-9-4 1990

CiNii


Industrial Property Rights

Device for blind source separation

H. Sawada, S. Araki, R. Mukai, and S. Makino

Patent
Device for blind source separation

S. Araki, H. Sawada, S. Makino, and R. Mukai

Patent
Apparatus, method and program for estimation of positional information on signal sources

H. Sawada, R. Mukai, S. Araki, and S. Makino

Patent
Sound information processing device and program

Makino Shoji, Yamaoka Kouei, Yamada Takeshi, Ono Nobutaka

Patent
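Acoustic processing device, acoustic processing system, and acoustic processing method

Makino Shoji, Ishimura Masaru, Mae Narumi, Yamada Takeshi, Ono Nobutaka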

Patent
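Signal processing device, signal processing method, program, and recording medium (post-filtering method with a variable cutoff frequency)

Kamamoto Yutaka, Moriya Takehiro, Harada Noboru, Chiba Hironobu, Miyabe Shigeki, Yamada Takeshi, Makino Shoji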

Patent
Speech signal processing device and method

Ono Nobutaka, Miyabe Shigeki, Makino Shoji

Patent
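Signal processing device, signal processing method, and program (post-filtering method with a variable gain that depends on the pitch frequency)

Kamamoto Yutaka, Moriya Takehiro, Harada Noboru, Chiba Hironobu, Miyabe Shigeki, Yamada Takeshi, Makino Shoji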

Patent
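Direction information distribution estimation device, sound source number estimation device, sound source direction measurement device, sound source separation device, and methods and programs therefor

Araki Shoko, Nakatani Tomohiro, Sawada Hiroshi, Makino Shoji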

Patent
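Multiple signal section estimation device, multiple signal section estimation method, program therefor, and recording medium

Araki Shoko, Ishizuka Kentaro, Fujimoto Masakiyo, Nakatani Tomohiro, Makino Shoji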

Patent
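Multiple signal section estimation device and method therefor, and program and recording medium therefor

Araki Shoko, Ishizuka Kentaro, Fujimoto Masakiyo, Makino Shoji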

Patent
Signal separation device, signal separation method, program, and recording medium

Sawada Hiroshi, Araki Shoko, Makino Shoji

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Sawada Hiroshi, Araki Shoko, Makino Shoji

Patent
Multi-signal enhancement device, method, program, and recording medium therefor

Araki Shoko, Sawada Hiroshi, Makino Shoji

Patent
Blind signal extraction device, method therefor, program therefor, and recording medium storing the program

Araki Shoko, Sawada Hiroshi, Jan Cermak, Makino Shoji

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Sawada Hiroshi, Araki Shoko, Mukai Ryo, Makino Shoji

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Sawada Hiroshi, Araki Shoko, Mukai Ryo, Makino Shoji

Patent
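Signal separation device, signal separation method, signal separation program, and recording medium; and signal direction-of-arrival estimation device, signal direction-of-arrival estimation method, signal direction-of-arrival estimation program, and recording medium

Sawada Hiroshi, Makino Shoji, Araki Shoko, Mukai Ryo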

Patent
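Signal direction-of-arrival estimation device, signal direction-of-arrival estimation method, signal direction-of-arrival estimation program, and recording medium

Araki Shoko, Sawada Hiroshi, Mukai Ryo, Makino Shoji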

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Araki Shoko, Sawada Hiroshi, Mukai Ryo, Makino Shoji

Patent
Signal direction-of-arrival estimation method, device, program, and recording medium storing the program

Sawada Hiroshi, Mukai Ryo, Araki Shoko, Makino Shoji

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Araki Shoko, Sawada Hiroshi, Mukai Ryo, Makino Shoji

Patent
Signal extraction device, signal extraction method, signal extraction program, and recording medium

Sawada Hiroshi, Araki Shoko, Mukai Ryo, Makino Shoji

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Araki Shoko, Makino Shoji, Sawada Hiroshi, Mukai Ryo

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Sawada Hiroshi, Mukai Ryo, Araki Shoko, Makino Shoji

Patent
Method, device, and program for estimating the number of signal sources, and recording medium

Sawada Hiroshi, Mukai Ryo, Araki Shoko, Makino Shoji

Patent
Signal separation device, signal separation method, signal separation program, and recording medium

Araki Shoko, Makino Shoji, Sawada Hiroshi, Mukai Ryo

Patent
Signal separation method, signal separation device, signal separation program, and recording medium

Sawada Hiroshi, Araki Shoko, Mukai Ryo, Makino Shoji

Patent
Signal separation method and device, and signal separation program and recording medium storing the program

Sawada Hiroshi, Mukai Ryo, Araki Shoko, Makino Shoji

Patent
Blind signal separation device, blind signal separation method, and blind signal separation program

Araki Shoko, Makino Shoji, Mukai Ryo, Sawada Hiroshi

Patent
Blind signal separation device, blind signal separation method, and blind signal separation program

Mukai Ryo, Sawada Hiroshi, Araki Shoko, Makino Shoji

Patent
Blind signal separation method, blind signal separation program, and recording medium

Sawada Hiroshi, Mukai Ryo, Araki Shoko, Makino Shoji

Patent
Signal direction-of-arrival estimation method, device, program, and recording medium storing the program

Sawada Hiroshi, Mukai Ryo, Araki Shoko, Makino Shoji

Patent
SmoothQuiet

Makino Shoji, Kojima Junji

Patent
QuiteSmooth

Makino Shoji, Kojima Junji

Patent
EchoCam

Makino Shoji, Kojima Junji

Patent
SUBBANDES

Makino Shoji, Haneda Yoichi, Kojima Junji

Patent
ESPARC

Haneda Yoichi, Makino Shoji, Kojima Junji

Patent
Radespa

Haneda Yoichi, Makino Shoji, Kojima Junji

Patent
DISCAS

Haneda Yoichi, Makino Shoji, Kojima Junji

Patent
ES projection algorithm

Makino Shoji, Haneda Yoichi, Kojima Junji

Patent
Duo filter

Makino Shoji, Haneda Yoichi, Kojima Junji

Patent
Intelligent loss controller

Makino Shoji, Haneda Yoichi, Kojima Junji

Patent
Fail-safe adaptive operation control scheme

Makino Shoji, Haneda Yoichi, Kojima Junji

Patent
SmoothTalk

Makino Shoji, Kojima Junji

Patent


Syllabus

Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Topics in Fundamental Science and Engineering C

School of Fundamental Science and Engineering

2025 spring semester
Topics in Fundamental Science and Engineering C

School of Fundamental Science and Engineering

2025 spring semester
Topics in Fundamental Science and Engineering C

School of Fundamental Science and Engineering

2025 spring semester
Topics in Fundamental Science and Engineering C

School of Fundamental Science and Engineering

2025 spring semester
Topics in Fundamental Science and Engineering C

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B (Spring Semester)

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A (Fall Semester)

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2025 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2025 spring semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2025 spring semester
Intelligent Acoustic Systems Research (Doctor's Thesis)

Graduate School of Information, Production and Systems

2025 full year
Intelligent Acoustic Systems Research (Fall)

Graduate School of Information, Production and Systems

2025 fall semester
Intelligent Acoustic Systems Research (Spring)

Graduate School of Information, Production and Systems

2025 spring semester
Intelligent Acoustic Systems D

Graduate School of Information, Production and Systems

2025 fall semester
Intelligent Acoustic Systems C

Graduate School of Information, Production and Systems

2025 spring semester
Intelligent Acoustic Systems B

Graduate School of Information, Production and Systems

2025 spring semester
Intelligent Acoustic Systems A

Graduate School of Information, Production and Systems

2025 fall semester
Acoustic Signal Processing

Graduate School of Information, Production and Systems

2025 fall semester
Machine Learning

Graduate School of Information, Production and Systems

2025 spring semester
Master's Thesis (Integrated Systems)(Fall)

Graduate School of Information, Production and Systems

2025 fall semester
Master's Thesis (Integrated Systems)(Spring)

Graduate School of Information, Production and Systems

2025 spring semester
Intelligent Acoustic Systems Research (Spring)

Graduate School of Information, Production and Systems

2025 spring semester
Intelligent Acoustic Systems

Graduate School of Information, Production and Systems

2025 fall semester
Intelligent Acoustic Systems Research (Fall)

Graduate School of Information, Production and Systems

2025 fall semester
Digital Signal Processing

Graduate School of Information, Production and Systems

2025 spring semester


Teaching Experience

Introduction to Information Science II

University of Tsukuba

Sub-affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Research Institute

2024

-

2026

Waseda Research Institute for Science and Engineering Concurrent Researcher

Internal Special Research Projects

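Research on innovative microphone-array fundamental technologies for recognizing and understanding sound environments

2024

We proposed a new method for extracting a single target speech signal from multichannel observations under underdetermined conditions, in which the number of speech signals in the observation exceeds the number of microphones. Conventional beamformers and blind source separation with spatial regularization have difficulty suppressing interfering speech in this situation. The switching minimum power distortionless response (Sw-MPDR) beamformer can handle underdetermined conditions by means of a switching mechanism, but its estimation accuracy drops sharply when it overfits to the steering vector determined by the direction of arrival (DOA) of the target speech. Spatially regularized independent vector extraction (SRIVE) can robustly enhance the target speech based on the DOA alone, but its performance degrades under underdetermined conditions. In this study, we extended these conventional methods to overcome their limitations. First, we introduced a time-varying Gaussian source model into the Sw-MPDR beamformer so that the target speech is enhanced effectively based on the DOA alone. Second, we introduced a switching mechanism into SRIVE to improve speech enhancement performance under underdetermined conditions. We call the proposed methods the switching weighted MPDR (Sw-wMPDR) beamformer and switching SRIVE (Sw-SRIVE), respectively. Experiments showed that both proposed methods outperform the conventional methods in DOA-based target speech enhancement under underdetermined conditions. This work received the 2024 IEEE Fukuoka Section Student Research Encouragement Award.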
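
As background for the MPDR beamformer named in the summary above, the following is a minimal sketch of the standard (non-switching, unweighted) MPDR weight computation for a single frequency bin, w = R^{-1} a / (a^H R^{-1} a). It does not implement Sw-MPDR, Sw-wMPDR, or Sw-SRIVE. The uniform linear array, the far-field steering vector, the diagonal loading, and all function names and parameter values are assumptions made for illustration.

```python
import numpy as np

def steering_vector(doa_deg, n_mics, spacing_m, freq_hz, c=343.0):
    """Far-field steering vector of a uniform linear array (hypothetical geometry)."""
    positions = np.arange(n_mics) * spacing_m
    delays = positions * np.cos(np.deg2rad(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def mpdr_weights(R, a, loading=1e-6):
    """MPDR weights w = R^{-1} a / (a^H R^{-1} a) for one frequency bin."""
    n = R.shape[0]
    # Small diagonal loading (an added assumption) keeps the solve well conditioned.
    R_loaded = R + loading * np.trace(R).real / n * np.eye(n)
    Ri_a = np.linalg.solve(R_loaded, a)
    return Ri_a / np.vdot(a, Ri_a)             # np.vdot conjugates its first argument

# Toy scene: target at 90 degrees, one interferer at 40 degrees, weak sensor noise.
a_t = steering_vector(90.0, 4, 0.04, 1000.0)
a_i = steering_vector(40.0, 4, 0.04, 1000.0)
R = (1.0 * np.outer(a_t, a_t.conj())
     + 10.0 * np.outer(a_i, a_i.conj())
     + 0.01 * np.eye(4))                       # observed spatial covariance matrix
w = mpdr_weights(R, a_t)
print("gain toward target:    ", abs(np.vdot(w, a_t)))
print("gain toward interferer:", abs(np.vdot(w, a_i)))
```

The distortionless constraint keeps the gain toward the assumed target direction at one while the output power is minimized. This also shows why an inaccurate steering vector is harmful, as the summary notes: the beamformer then treats the true target as something to suppress.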
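Research on innovative microphone-array fundamental technologies for recognizing and understanding sound environments

2023

We proposed computationally efficient joint optimization algorithms that perform online source separation, dereverberation, and noise reduction based on blind processing and spatially regularized processing. First, we proposed a blind online joint optimization algorithm for independent vector extraction (IVE) and weighted prediction error (WPE) dereverberation. Because WPE reduces reverberation, this online algorithm achieved accurate separation even with short analysis frames. Next, we extended the online joint optimization with robust spatial regularization. We found that normalizing the scale of the separated signals is very effective in making DOA-based spatial regularization work reliably. Experiments confirmed that the blind online joint optimization algorithm substantially improves separation accuracy with an algorithmic delay of 8 ms. They also confirmed that the proposed spatially regularized online joint optimization algorithm reduces source permutation errors to 0%.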
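Research on innovative microphone-array fundamental technologies for recognizing and understanding sound environments

2022

Spatially regularized independent vector extraction (SRIVE) performs source separation so that the outputs appear in a desired order, using acoustic transfer functions estimated in advance. However, conventional SRIVE did not sufficiently account for how scale ambiguity and transfer-function errors affect the guidance of the output order. In this study, we attempted to solve this problem by introducing, in addition to spatial regularization, a regularization term that keeps the scale of the separation filters small. Experiments confirmed that scale regularization raises the rate of correct output ordering from 75% to 100% while maintaining separation performance (SDR).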
Research on innovative microphone-array fundamental technologies for recognizing and understanding sound environments

2021

This research explores whether the newly proposed online algorithm that jointly optimizes weighted prediction error (WPE) and independent vector analysis (IVA) works well in separating moving sound sources in reverberant indoor environments. The moving source is first fixed and then rotated 60 degrees in a room at a speed of less than 10 cm/s, while the other remains fixed. Through the comparison of the online-AuxIVA, online-WPE+IVA (separate), and online-WPE+IVA (joint) algorithms, we can conclude that the online-WPE+IVA (joint) method has the best separation performance when the sources are fixed, but online-WPE+IVA (separate) is more stable and has better performance when removing moving sources from the mixed sound.