Updated: 2024/12/21


MAKINO Shoji
牧野 昭二
Affiliation
Faculty of Science and Engineering, Graduate School of Information, Production and Systems
Title
Specially Appointed Professor
Degree
Doctorate (Tohoku University)
Profile

Professor at the Graduate School of Information, Production and Systems, Waseda University. His research focuses on acoustic signal processing, in particular blind source separation and acoustic echo cancellation. He has served as a member of the IEEE Jack S. Kilby Signal Processing Medal Committee, the IEEE James L. Flanagan Speech & Audio Processing Award Committee, the IEEE SPS Fellow Evaluation Committee, and the IEEE SPS Board of Governors, as Chair of the IEEE SPS Audio and Acoustic Signal Processing Technical Committee, as General Chair of IEEE WASPAA 2007, and as an Associate Editor of the IEEE Transactions on Speech and Audio Processing. His honors include the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prize for Science and Technology, Research Category), the IEICE Honorary Membership, Distinguished Achievement and Contributions Award, and Achievement Award, the IEEE SPS Leo L. Beranek Meritorious Service Award, the ICA Unsupervised Learning Pioneer Award, and the IEEE MLSP Competition Award. He has been an IEEE Distinguished Lecturer and is a Fellow of the IEEE and the IEICE.

Career

  • Apr. 2021 - present    Professor, Graduate School of Information, Production and Systems, Waseda University

  • Apr. 2009 - Mar. 2021    Professor, Tsukuba Advanced Research Alliance (TARA) Center and Graduate School of Systems and Information Engineering, University of Tsukuba

  • Apr. 2014 - Mar. 2018    Visiting Professor, National Institute of Informatics

  • Apr. 2013 - Mar. 2018    Visiting Researcher, RIKEN

  • Apr. 2008 - Mar. 2009    Senior Research Scientist (Supervisor), NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation (NTT)

  • Dec. 2008 - Feb. 2009    Visiting Professor, University Erlangen-Nuremberg, Germany

  • Apr. 2004 - Mar. 2008    Visiting Professor, Graduate School of Information Science and Technology, Hokkaido University

  • Apr. 2003 - Mar. 2008    Executive Manager, Media Information Laboratory, NTT Communication Science Laboratories, NTT

  • Apr. 2006 - Mar. 2007    Part-time Lecturer, Graduate School of Information Science and Technology, The University of Tokyo

  • Apr. 2000 - Mar. 2003    Leader, Signal Processing Research Group, NTT Communication Science Laboratories, NTT

  • Jan. 1999 - Mar. 2000    Group Leader, NTT Lifestyle and Environmental Technology Laboratories (生活環境研究所), NTT

  • Jul. 1996 - Dec. 1998    Senior Research Scientist (Supervisor), NTT Multimedia Systems Laboratories (マルチメディアシステム総合研究所), NTT

  • Aug. 1987 - Jun. 1996    Senior Research Engineer, NTT Human Interface Laboratories, NTT

  • Apr. 1981 - Jul. 1987    Yokosuka Electrical Communication Laboratories, Nippon Telegraph and Telephone Corporation


Education

  • Mar. 1993    Tohoku University, Doctor of Engineering

  • Apr. 1979 - Mar. 1981    Tohoku University, Graduate School of Engineering, Department of Mechanical Engineering

  • Apr. 1975 - Mar. 1979    Tohoku University, Faculty of Engineering, Department of Mechanical Engineering II

Committee Memberships

  • 2019 - present    Japan Society for the Promotion of Science (JSPS)  Member of the Grants-in-Aid for Scientific Research Sub-Committee
  • 2019 - present    European Association for Signal Processing (EURASIP)  Member of the Special Area Team on Acoustic, Speech and Music Signal Processing
  • 2018 - present    Asia Pacific Signal and Information Processing Association  Member of the Signal and Information Processing Theory and Methods Technical Committee
  • May 2014 - present    IEICE  Advisor, Technical Committee on Engineering Acoustics
  • 2013 - present    Acoustical Society of Japan  Board Member
  • 2007 - present    IEICE  Fellow
  • 2005 - present    Acoustical Society of Japan  Councilor
  • Apr. 2004 - present    International Speech Communication Association (ISCA)  Member
  • 2004 - present    Institute of Electrical and Electronics Engineers (IEEE)  Fellow
  • 2003 - present    Acoustical Society of Japan  Delegate
  • 2003 - present    International ICA Steering Committee  Member
  • Apr. 2000 - present    European Association for Signal Processing (EURASIP)  Member
  • 1999 - present    International Workshop on Acoustic Echo and Noise Control  International IWAENC Standing Committee Member
  • Apr. 1989 - present    Institute of Electrical and Electronics Engineers (IEEE)  Member
  • Apr. 1988 - present    IEICE  Member
  • Apr. 1983 - present    Acoustical Society of Japan  Member
  • 2018 - 2020    IEEE Signal Processing Society  Member of the Board of Governors
  • 2019    JSPS  Reviewer (written review reports), Grants-in-Aid for Scientific Research (S)
  • 2018 - 2019    JSPS  Written-review examiner and evaluator, International Program Committee
  • 2018 - 2019    JSPS  Expert member, Research Fellowship Screening Committee
  • 2018 - 2019    2018 International Workshop on Acoustic Signal Enhancement  General Chair
  • 2017 - 2018    IEEE Signal Processing Society Japan Chapter  Chair
  • 2015 - 2018    Institute of Electrical and Electronics Engineers (IEEE)  Member of Jack S. Kilby Signal Processing Medal Committee
  • 2013 - 2015    JSPS  Expert member, Grants-in-Aid for Scientific Research Committee
  • 2013 - 2015    IEEE Signal Processing Magazine  Guest Editor
  • 2014    Acoustical Society of Japan  Chair, Selection Committee for the Itakura Prize Innovative Research Award
  • 2013 - 2014    IEEE Signal Processing Society  Technical Directions Board Member
  • 2013 - 2014    IEEE Signal Processing Society  Chair of the Audio and Acoustic Signal Processing Technical Committee
  • Jul. 2013    2013 International Conference of the IEEE Engineering in Medicine and Biology (EMBC2013)  Tutorial Speaker
  • 2012 - 2013    2012 IEEE International Conference on Acoustics, Speech, and Signal Processing  Plenary Chair
  • 2011 - 2012    2011 Annual Conference of the International Speech Communication Association  Tutorial Speaker
  • 2005 - 2012    European Association for Signal Processing  Associate Editor of the EURASIP JASP
  • 2009 - 2011    IEEE Japan Council  Awards Committee Member
  • 2008 - 2011    Institute of Electrical and Electronics Engineers (IEEE)  James L. Flanagan Speech & Audio Processing Award Committee Member
  • 2009 - 2010    IEICE  Member, Fellow Nomination Committee
  • 2009 - 2010    IEEE Signal Processing Society  Distinguished Lecturer
  • 2008 - 2009    2008 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays  Panelist
  • 2008    IEICE  Member, Paper Award Selection Committee
  • 2007 - 2008    2007 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  General Chair
  • 2007 - 2008    IEICE  President, Acoustics and Ultrasonics Subsociety, Engineering Sciences Society
  • 2007 - 2008    2007 IEEE International Conference on Acoustics, Speech and Signal Processing  Tutorial Speaker
  • 2007 - 2008    2007 International Conference on Independent Component Analysis and Signal Separation  Keynote Speaker
  • 2006 - 2008    IEICE  Chair, Technical Committee on Engineering Acoustics
  • 2006 - 2008    IEEE Signal Processing Society  Awards Board Member
  • 2006 - 2007    Acoustical Society of Japan  Member, Awaya Kiyoshi Science Promotion Award Selection Committee
  • 2005 - 2006    2005 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays  Panelist
  • 2002 - 2005    Institute of Electrical and Electronics Engineers (IEEE)  Associate Editor of the IEEE Trans. Speech and Audio Processing
  • 2001 - 2005    Acoustical Society of Japan  Member, Sato Prize Paper Award Selection Committee
  • 2003 - 2004    2003 International Workshop on Acoustic Echo and Noise Control  General Chair
  • 2002 - 2004    IEEE Signal Processing Society  Conference Board Member
  • 2013 - present    European project Embedded Audition for Robots  Advisory Board member
  • 2006 - present    International Advisory Panel Member
  • 2003 - present    Acoustical Society of Japan  Council member
  • 2020 - 2021    2020 European Signal Processing Conference  Special Session Organizer
  • 2020 - 2021    2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2020 - 2021    2020 European Signal Processing Conference  Area Chair
  • 2020 - 2021    2020 International Workshop on Acoustic Echo and Noise Control  Member of the Organizing Committee
  • 2019 - 2020    2019 European Signal Processing Conference  Special Session Organizer
  • 2019 - 2020    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2019 - 2020    2019 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)  Member of the Technical Committee
  • 2019 - 2020    IEEE Signal Processing Society  Member of the TC Review Committee
  • 2018 - 2020    IEEE Signal Processing Society  Member of the Long-Range Planning and Implementation Committee
  • 2018 - 2019    2018 European Signal Processing Conference  Special Session Organizer
  • 2018 - 2019    2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2018 - 2019    2018 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2018 - 2019    2018 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2017 - 2018    2017 European Signal Processing Conference  Special Session Organizer
  • 2017 - 2018    2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2017 - 2018    2017 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2016 - 2017    Special Session Organizer
  • 2016 - 2017    2016 European Signal Processing Conference  Member of the Technical Program Committee
  • 2016 - 2017    2016 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2016 - 2017    Area Chair
  • 2016 - 2017    Area Chair
  • 2016 - 2017    IEEE Signal Processing Society  Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
  • 2012 - 2017    IEEE Signal Processing Society  Chair of the Fellow Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
  • 2015 - 2016    2015 European Signal Processing Conference  Special Session Organizer
  • 2015 - 2016    2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2015 - 2016    2015 AEARU Workshop on Computer Science and Web Technology  Member of the Program Committee
  • 2015 - 2016    2015 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2015 - 2016    IEEE Signal Processing Society Japan Chapter  Vice Chair
  • 2015 - 2016    2015 International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)  Special Sessions Chair
  • 2015 - 2016    2015 European Signal Processing Conference  Area Chair
  • 2010 - 2016    Asia Pacific Signal and Information Processing Association  Member of the Speech, Language, and Audio Technical Committee
  • 2015    IEEE Signal Processing Society  Past Chair of the Audio and Acoustic Signal Processing Technical Committee
  • 2015    2015 IEEE International Workshop on Applications of Signal Processing to Audio  Member of the Technical Program Committee
  • 2015    IEEE Signal Processing Society  Vice Chair of the Nominations and Elections Subcommittee of the Audio and Acoustic Signal Processing Technical Committee
  • 2014 - 2015    2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2014 - 2015    2014 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2014 - 2015    2014 Hands-free Speech Communication and Microphone Arrays  Member of the Technical Program Committee
  • 2014 - 2015    Symposia at the 2014 IEEE Global Conference on Signal and Information Processing  Member of the Organizing Committee
  • 2014 - 2015    2014 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2014 - 2015    2014 European Signal Processing Conference  Area Chair
  • 2013 - 2014    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference  Special Session Organizer
  • 2013 - 2014    2013 European Signal Processing Conference  Special Session Organizer
  • 2013 - 2014    2013 IEEE International Conference on Acoustics, Speech, and Signal Processing  Area Chair
  • 2013 - 2014    2013 European Signal Processing Conference  Area Chair
  • 2012 - 2013    Special Session Organizer
  • 2012 - 2013    2012 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • Apr. 2011 - Mar. 2012    Acoustical Society of Japan  Guest Editor-in-Chief, special issue of the Journal of the Acoustical Society of Japan
  • 2011 - 2012    2011 Hands-free Speech Communication and Microphone Arrays  Member of the Technical Program Committee
  • 2011 - 2012    2011 European Signal Processing Conference  Member of the Technical Program Committee
  • 2011 - 2012    IEEE Signal Processing Society  Vice Chair of the Audio and Acoustic Signal Processing Technical Committee
  • 2011 - 2012    European Association for Signal Processing (EURASIP)  Guest Editor of the EURASIP Journal on Applied Signal Processing
  • 2010 - 2011    2010 Asia-Pacific Signal and Information Processing Conference  Member of the Technical Committee
  • 2010 - 2011    2010 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2010 - 2011    2010 IEEE International Symposium on Circuits and Systems  Track Chair
  • 2009 - 2010    2009 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Organizing Committee
  • 2009 - 2010    2009 IEEE International Symposium on Circuits and Systems  Track Chair
  • 2009 - 2010    2009 European Signal Processing Conference  Area Chair
  • 2009 - 2010    IEEE Circuits and Systems Society  Chair of the Blind Signal Processing Technical Committee
  • 2008 - 2010    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. Circuits and Systems-I
  • 1990 - 2010    IEEE Signal Processing Society  Member of the Audio and Acoustic Signal Processing Technical Committee
  • 2008 - 2009    2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays  Special Session Organizer
  • 2008 - 2009    2008 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2008 - 2009    2008 Joint Workshop on Hands-free Speech Communication and Microphone Arrays  Technical Co-Chair
  • 2008 - 2009    2008 Workshop on Statistical and Perceptual Audition  Co-Organizer
  • 2008 - 2009    2008 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2007 - 2009    IEICE  Guest Editor-in-Chief, special section of the IEICE Transactions
  • 2007 - 2008    2007 IEEE International Symposium on Circuits and Systems  Special Session Organizer
  • 2007 - 2008    IEICE  Vice President, Engineering Sciences Society
  • 2007 - 2008    2007 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2007 - 2008    Chair-Elect of the Blind Signal Processing Technical Committee
  • 2006 - 2008    IEICE  Guest Editor-in-Chief, special section of the IEICE Transactions
  • 2006 - 2007    2006 Asilomar Conference on Signals, Systems, and Computers  Special Session Organizer
  • 2006 - 2007    2006 European Signal Processing Conference  Special Session Organizer
  • 2006 - 2007    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Special Session Organizer
  • 2006 - 2007    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Special Session Organizer
  • 2006 - 2007    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Member of the International Program Committee
  • 2006 - 2007    2006 European Signal Processing Conference  Member of the Technical Program
  • 2006 - 2007    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Member of the Organizing Committee
  • 2006 - 2007    2006 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2006 - 2007    2006 Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan  Member of the Technical Committee
  • 2006 - 2007    2006 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2006 - 2007    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. Computers
  • 2006 - 2007    2006 International Conference on Independent Component Analysis and Blind Signal Separation  Program Committee Chair
  • 2005 - 2007    Institute of Electrical and Electronics Engineers (IEEE)  Guest Editor of the IEEE Trans. ASLP
  • 2005 - 2006    2005 IEEE International Symposium on Circuits and Systems  Member of the Review Committee
  • 2005 - 2006    2005 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Organizing Committee
  • 2005 - 2006    2005 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 2004 - 2006    IEEE Circuits and Systems Society  Member of the Blind Signal Processing Technical Committee
  • Apr. 2003 - Mar. 2005    IEICE  Member, Technical Committee on Engineering Acoustics
  • 2004 - 2005    2004 International Congress on Acoustics  Special Session Organizer
  • 2004 - 2005    2004 IEEE International Conference on Acoustics, Speech and Signal Processing  Special Session Organizer
  • 2004 - 2005    2004 Workshop on Communication Scene Analysis  Program Chair
  • 2004 - 2005    2004 Workshop on Statistical and Perceptual Audio Processing  Member of the Technical Committee
  • 2004 - 2005    2004 International Congress on Acoustics  Member of the Program Committee
  • 2001 - 2005    Acoustical Society of Japan  Secretary (electroacoustics area), Paper Committee of the Journal of the Acoustical Society of Japan
  • 2003 - 2004    2003 IEEE International Workshop on Neural Networks for Signal Processing  Member of the Program Committee
  • 2003 - 2004    2003 IEEE International Workshop on Applications of Signal Processing to Audio and Acoustics  Member of the Program Committee
  • 2003 - 2004    2003 International Conference on Independent Component Analysis and Blind Signal Separation  Organizing Chair
  • Apr. 2001 - Mar. 2003    IEICE  Vice Chair, Technical Committee on Engineering Acoustics
  • 2002 - 2003    IEICE  Guest Editor, special section of the IEICE Transactions
  • 2002 - 2003    2002 China-Japan Joint Conference on Acoustics  Member of the Organizing Committee
  • 2002 - 2003    2002 IEEE International Workshop on Neural Networks for Signal Processing  Member of the Program Committee
  • 1999 - 2003    Institute of Electrical and Electronics Engineers (IEEE)  Senior Member
  • 2001 - 2002    2001 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • Apr. 1992 - Mar. 2001    IEICE  Member, Technical Committee on Engineering Acoustics
  • 1999 - 2000    1999 International Workshop on Acoustic Echo and Noise Control  Member of the Technical Committee
  • 1995 - 1997    Acoustical Society of Japan  Member, Research Presentation Meeting Preparation Committee
  • Apr. 1990 - Mar. 1992    IEICE  Secretary, Technical Committee on Engineering Acoustics
  • 1990 - 1992    IEICE  Guest Editor, special section of the IEICE Transactions


Professional Memberships

  • Acoustical Society of Japan (ASJ)

  • Institute of Electronics, Information and Communication Engineers (IEICE)

  • APSIPA (Asia Pacific Signal and Information Processing Association)

  • ISCA (International Speech Communication Association)

  • EURASIP (European Association for Signal Processing)

  • IEEE (Institute of Electrical and Electronics Engineers)


Research Areas

  • Perceptual information processing / Intelligent robotics / Intelligent informatics

Research Keywords

  • Media Information Processing

  • Digital Signal Processing

  • Acoustic Signal Processing


Awards

  • Hoko Prize (報公賞), Hattori Hokokai Foundation
    Oct. 2018   Hattori Hokokai Foundation   Recipient: Shoji Makino

  • IEICE Distinguished Achievement and Contributions Award (功績賞)
    Jun. 2018   IEICE   Recipient: Shoji Makino

  • Acoustical Society of Japan Paper Award
    Mar. 2018   Acoustical Society of Japan   Recipient: Shoji Makino

  • IEICE Achievement Award (業績賞)
    Jun. 2017   IEICE   Recipient: Shoji Makino

  • Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (Prize for Science and Technology, Research Category)
    Apr. 2015   Recipient: Shoji Makino

  • Telecom System Technology Award
    Mar. 2015   Telecommunications Advancement Foundation   Recipient: Shoji Makino

  • IEEE Signal Processing Society Best Paper Award
    Jan. 2014   IEEE Signal Processing Society   Recipient: Shoji Makino

  • Distinguished Lecturer
    Jan. 2009   IEEE   Recipient: Shoji Makino

  • Fellow
    Sep. 2007   IEICE   Recipient: Shoji Makino

  • MLSP Competition Award
    Aug. 2007   IEEE   Recipient: Shoji Makino

  • Best Presentation Award at the SPIE Defense and Security Symposium
    Apr. 2006   SPIE   Recipient: Shoji Makino

  • ICA Unsupervised Learning Pioneer Award
    Apr. 2006   SPIE   Recipient: Shoji Makino

  • Paper Award
    May 2005   IEICE   Recipient: Shoji Makino

  • TELECOM System Technology Award
    Mar. 2004   Telecommunications Advancement Foundation   Recipient: Shoji Makino

  • Fellow
    Jan. 2004   IEEE   Recipient: Shoji Makino

  • Best Paper Award of the International Workshop on Acoustic Echo and Noise Control
    Sep. 2003   Recipient: Shoji Makino

  • Paper Award
    May 2002   IEICE   Recipient: Shoji Makino

  • Paper Award
    Mar. 2002   ASJ   Recipient: Shoji Makino

  • Achievement Award
    May 1997   IEICE   Recipient: Shoji Makino

  • Outstanding Technological Development Award
    May 1995   ASJ   Recipient: Shoji Makino

  • IEEE Signal Processing Society Notable Services and Contributions Award

    2019   IEEE Signal Processing Society   Recipient: Shoji Makino

  • IEEE Signal Processing Society Chapter Leadership Award

    Dec. 2018   IEEE Signal Processing Society   Recipient: Shoji Makino

  • Best Faculty Member Award of the University of Tsukuba

    Feb. 2016   Recipient: Shoji Makino

  • IEEE Signal Processing Society Outstanding Service Award

    Dec. 2014   IEEE Signal Processing Society   Recipient: Shoji Makino


 

Papers

  • Time-Frequency-Bin-Wise Linear Combination of Beamformers for Distortionless Signal Enhancement.

    Kouei Yamaoka, Nobutaka Ono, Shoji Makino

    IEEE/ACM Transactions on Audio, Speech and Language Processing   29   3461 - 3475  2021年

    DOI / Scopus (cited by 12)
  • Multichannel Signal Enhancement Algorithms for Assisted Listening Devices

    Simon Doclo, Walter Kellermann, Shoji Makino, Sven Nordholm

    IEEE SIGNAL PROCESSING MAGAZINE   32 ( 2 ) 18 - 30  2015年03月  [査読有り]

    DOI / Scopus (cited by 181)
  • Underdetermined Convolutive Blind Source Separation via Frequency Bin-Wise Clustering and Permutation Alignment

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   19 ( 3 ) 516 - 527  2011年03月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper presents a blind source separation method for convolutive mixtures of speech/audio sources. The method can even be applied to an underdetermined case where there are fewer microphones than sources. The separation operation is performed in the frequency domain and consists of two stages. In the first stage, frequency-domain mixture samples are clustered into each source by an expectation-maximization (EM) algorithm. Since the clustering is performed in a frequency bin-wise manner, the permutation ambiguities of the bin-wise clustered samples should be aligned. This is solved in the second stage by using the probability on how likely each sample belongs to the assigned class. This two-stage structure makes it possible to attain a good separation even under reverberant conditions. Experimental results for separating four speech signals with three microphones under reverberant conditions show the superiority of the new method over existing methods. We also report separation results for a benchmark data set and live recordings of speech mixtures.

    DOI / Scopus (cited by 312)
  • Frequency-Domain Pearson Distribution Approach for Independent Component Analysis (FD-Pearson-ICA) in Blind Source Separation

    Hiroko Kato Solvang, Yuichi Nagahara, Shoko Araki, Hiroshi Sawada, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   17 ( 4 ) 639 - 649  2009年05月  [査読有り]

     Abstract:

    In frequency-domain blind source separation (BSS) for speech with independent component analysis (ICA), a practical parametric Pearson distribution system is used to model the distribution of frequency-domain source signals. ICA adaptation rules have a score function determined by an approximated signal distribution. Approximation based on the data may produce better separation performance than we can obtain with ICA. Previously, conventional hyperbolic tangent (tanh) or generalized Gaussian distribution (GGD) was uniformly applied to the score function for all frequency bins, even though a wideband speech signal has different distributions at different frequencies. To deal with this, we propose modeling the signal distribution at each frequency by adopting a parametric Pearson distribution and employing it to optimize the separation matrix in the ICA learning process. The score function is estimated by the appropriate Pearson distribution parameters for each frequency bin. We devised three methods for Pearson distribution parameter estimation and conducted separation experiments with real speech signals convolved with actual room impulse responses (T(60) = 130 ms). Our experimental results show that the proposed frequency-domain Pearson-ICA (FD-Pearson-ICA) adapted well to the characteristics of frequency-domain source signals. By applying the FD-Pearson-ICA performance, the signal-to-interference ratio significantly improved by around 2-3 dB compared with conventional nonlinear functions. Even if the signal-to-interference ratio (SIR) values of FD-Pearson-ICA were poor, the performance based on a disparity measure between the true score function and estimated parametric score function clearly showed the advantage of FD-Pearson-ICA. Furthermore, we confirmed the optimum of the proposed approach for/optimized the proposed approach as regards separation performance. By combining individual distribution parameters directly estimated at low frequency with the appropriate parameters optimized at high frequency, it was possible to both reasonably improve the FD-Pearson-ICA performance without any significant increase in the computational burden by comparison with conventional nonlinear functions.

    DOI / Scopus (cited by 16)
  • Grouping separated frequency components by estimating propagation model parameters in frequency-domain blind source separation

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1592 - 1604  2007年07月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency.(T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.

    DOI / Scopus (cited by 108)
  • Spatio-temporal FastICA algorithms for the blind separation of convolutive mixtures

    Scott C. Douglas, Malay Gupta, Hiroshi Sawada, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1511 - 1520  2007年07月  [査読有り]

     Abstract:

    This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvarinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary, constraints on the multichannel separation filter. The techniques converge quickly to a separation solution without any step size selection or divergence difficulties, and unlike other methods, ours do not require special coefficient initialization procedures to obtain good separation performance. They also allow for the efficient reconstruction of individual signals as observed in the sensor measurements directly from the system parameters for single-input multiple-output blind source separation tasks. An analysis of one of the adaptive constraint procedures shows its fast convergence to a paraunitary filter bank solution. Numerical evaluations of the proposed algorithms and comparisons with several existing convolutive blind source separation techniques indicate the excellent relative performance of the proposed methods.

    DOI / Scopus (cited by 70)
  • Geometrically constrained independent component analysis

    Mirko Knaak, Shoko Araki, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 2 ) 715 - 726  2007年02月  [査読有り]

     Abstract:

    Acoustical signals are often corrupted by other speeches, sources, and background noise. This makes it necessary to use some form of preprocessing so that signal processing systems such as a speech recognizer or machine diagnosis can be effectively employed. In this contribution, we introduce and evaluate a new algorithm that uses independent component analysis (ICA) with a geometrical constraint [constrained ICA (CICA)]. It is based on the fundamental similarity between an adaptive beamformer and blind source separation with ICA, and does not suffer the permutation problem of ICA-algorithms. Unlike conventional ICA algorithms, CICA needs prior knowledge about the rough direction of the target signal. However, it is more robust against an erroneous estimation of the target direction than adaptive beamformers: CICA converges to the right solution as long as its look direction is closer to the target signal than to the jammer signal. A high degree of robustness is very important since the geometrical prior of an adaptive beamformer is always roughly estimated in a reverberant environment, even when the look direction is precise. The effectiveness and robustness of the new algorithms is proven theoretically, and shown experimentally for three sources and three microphones with several sets of real-world data.

    DOI / Scopus (cited by 44)
  • Blind extraction of dominant target sources using ICA and time-frequency masking

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   14 ( 6 ) 2165 - 2173  2006年11月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to sensors, to have dominant powers at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new sophisticated method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room, whose reverberation time was 130 ms, are presented to show the effectiveness and characteristics of the proposed method.

    DOI / Scopus (cited by 92)
  • Natural gradient multichannel blind deconvolution and speech separation using causal FIR filters

    Scott C. Douglas, Hiroshi Sawada, Shoji Makino

    IEEE Transactions on Speech and Audio Processing   13 ( 1 ) 92 - 104  2005年01月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as convolutive blind source separation and multichannel blind deconvolution. When developing practical implementations of such methods, however, it is not clear how best to window the signals and truncate the filter impulse responses within the filtered gradient updates. In this paper, we show how inadequate use of truncation of the filter impulse responses and signal windowing within a well-known natural gradient algorithm for multichannel blind deconvolution and source separation can introduce a bias into its steady-state solution. We then provide modifications of this algorithm that effectively mitigate these effects for estimating causal FIR solutions to single- and multichannel equalization and source separation tasks. The new multichannel blind deconvolution algorithm requires approximately 6.5 multiply/adds per adaptive filter coefficient, making its computational complexity about 63% greater than the originally-proposed version. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for real-world acoustic sources, even for extremely-short separation filters.

    DOI / Scopus (cited by 79)
  • A robust and precise method for solving the permutation problem of frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   12 ( 5 ) 530 - 538  2004年09月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    Blind source separation (BSS) for convolutive mixtures can be solved efficiently in the frequency domain, where independent component analysis (ICA) is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem: the permutation ambiguity of ICA in each frequency bin should be aligned so that a separated signal in the time-domain contains frequency components of the same source signal. This paper presents a robust and precise method for solving the permutation problem. It is based on two approaches: direction of arrival (DOA) estimation for sources and the interfrequency correlation of signal envelopes. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit their respective advantages. Furthermore, by utilizing the harmonics of signals, we make the new method robust even for low frequencies where DOA estimation is inaccurate. We also present a new closed-form formula for estimating DOAs from a separation matrix obtained by ICA. Experimental results show that our method provided an almost perfect solution to the permutation problem for a case where two sources were mixed in a room whose reverberation time was 300 ms.

    DOI / Scopus (cited by 443)
  • The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech

    S Araki, R Mukai, S Makino, T Nishikawa, H Saruwatari

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   11 ( 2 ) 109 - 116  2003年03月  [査読有り]

     Abstract:

    Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good. enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T > P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size. that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for. the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent like adaptive beamformers, they mainly remove the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.

    DOI / Scopus (cited by 223)
  • Common-acoustical-pole and zero modeling of head-related transfer functions

    Y Haneda, S Makino, Y Kaneda, N Kitawaki

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   7 ( 2 ) 188 - 196  1999年03月  [査読有り]

     Abstract:

    Use of a common-acoustical-pole and zero model is proposed for modeling head-related transfer functions (HRTF's) for various directions of sound incidence. The HRTF's are expressed using the common acoustical poles, which do not depend on the source directions, and the zeros, which do, The common acoustical poles are estimated as they are common to HRTF's for various source directions; the estimated values of the poles agree well with the resonance frequencies of the ear canal. Because this model uses only the zeros to express the HRTF variations due to changes in source direction, it requires fewer parameters (the order of the zeros) that depend on the source direction than do the conventional all zero or pole/zero models. Furthermore, the proposed model can extract the zeros that are missed in the conventional models because of pole-zero cancellation. As a result, the directional dependence of the zeros can be traced well. Analysis of the zeros for HRTF's on the horizontal plane showed that the nonminimum-phase zero variation was well formulated using a simple pinna-reflection model, The common-acoustical-pole and zero (CAPZ) model is thus effective for modeling and analyzing HRTF's.

  • A block exact fast affine projection algorithm

    M Tanaka, S Makino, J Kojima

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   7 ( 1 ) 79 - 86  1999年01月  [査読有り]

     Abstract:

    This paper describes a block (affine) projection algorithm that has exactly the same convergence rate as the original sample-by-sample algorithm and smaller computational complexity than the fast affine projection algorithm. This is achieved by 1) introducing a correction term that compensates for the filter output difference between the sample-by-sample projection algorithm and the straightforward block projection algorithm, and 2) applying a fast finite impulse response (FIR) filtering technique to compute filter outputs and to update the filter.
    We describe how to choose a pair of block lengths that gives the longest filter length under a constraint on the total computational complexity and processing delay. An example shows that the filter length can be doubled if a delay of a few hundred samples is permissible.

  • The past, present, and future of audio signal processing

    T Chen, GW Elko, SJ Elliot, S Makino, JM Kates, M Bosi, JO Smith, M Kahrs

    IEEE SIGNAL PROCESSING MAGAZINE   14 ( 5 ) 30 - 57  1997年09月  [査読有り]

  • Common Acoustical Pole and Zero Modeling of Room Transfer Functions

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   2 ( 2 ) 320 - 328  1994年04月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    A new model for a room transfer function (RTF) by using common acoustical poles that correspond to resonance properties of a room is proposed. These poles are estimated as the common values of many RTF's corresponding to different source and receiver positions. Since there is one-to-one correspondence between poles and AR coefficients, these poles are calculated as common AR coefficients by two methods: i) using the least squares method, assuming all the given multiple RTF's have the same AR coefficients and ii) averaging each set of AR coefficients estimated from each RTF. The estimated poles agree well with the theoretical poles when estimated with the same order as the theoretical pole order. When estimated with a lower order than the theoretical pole order, the estimated poles correspond to the major resonance frequencies, which have high Q factors. Using the estimated common AR coefficients, the proposed method models the RTF's with different MA coefficients. This model is called the common-acoustical-pole and zero (CAPZ) model, and it requires far fewer variable parameters to represent RTF's than the conventional all-zero or pole/zero model. This model was used for an acoustic echo canceller at low frequencies, as one example. The acoustic echo canceller based on the proposed model requires half the variable parameters and converges 1.5 times faster than one based on the all-zero model, confirming the efficiency of the proposed model.

  • Exponentially Weighted Stepsize NLMS Adaptive Filter Based on the Statistics of a Room Impulse Response

    Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   1 ( 1 ) 101 - 108  1993年01月  [査読有り]

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    This paper proposes a new normalized least-mean-squares (NLMS) adaptive algorithm with double the convergence speed, at the same computational load, of the conventional NLMS for an acoustic echo canceller. This algorithm, called the ES (exponentially weighted stepsize) algorithm, uses a different stepsize (feedback constant) for each weight of an adaptive transversal filter. These stepsizes are time-invariant and weighted proportional to the expected variation of a room impulse response. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. A transition formula is derived for the mean-squared coefficient error of the proposed algorithm. The mean stepsize determines the convergence condition, the convergence speed, and the final excess mean-squared error. The algorithm is modified for a practical multiple DSP structure, so that it requires only the same amount of computation as the conventional NLMS. The algorithm is implemented in a commercial acoustic echo canceller and its fast convergence is demonstrated.

    DOI / CiNii / Scopus (cited by 107)
  • Wavelength-Proportional Interpolation and Extrapolation of Virtual Microphone for Underdetermined Speech Enhancement

    Ryoga Jinzai, Kouei Yamaoka, Shoji Makino, Nobutaka Ono, Mitsuo Matsumoto, Takeshi Yamada

    APSIPA Transactions on Signal and Information Processing   12 ( 3 )  2023年

     Abstract (an illustrative code sketch based on this abstract appears after the publication list):

    We previously proposed the virtual microphone technique to improve speech enhancement performance in underdetermined situations, in which the number of channels is virtually increased by estimating extra microphone signals at arbitrary positions along the straight line formed by real microphones. The effectiveness of the interpolation of virtual microphone signals for speech enhancement was experimentally confirmed. In this work, we apply the extrapolation of a virtual microphone as preprocessing of the maximum signal-to-noise ratio (SNR) beamformer and compare its speech enhancement performance (the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR)) with that of using the interpolation of a virtual microphone. Furthermore, we aim to improve speech enhancement performance by solving a trade-off relationship between performance at low and high frequencies, which can be controlled by adjusting the virtual microphone interval. We propose a new arrangement where a virtual microphone is placed at a distance from the reference real microphone proportional to the wavelength at each frequency. From the results of our experiment in an underdetermined situation, we confirmed speech enhancement performance using the extrapolation of a virtual microphone is higher than that of using the interpolation of a virtual microphone. Moreover, the proposed wavelength-proportional interpolation and extrapolation method improves speech enhancement performance compared with the interpolation and extrapolation. Furthermore, we present the directivity patterns of a spatial filter and confirmed the behavior that improves speech enhancement performance.

    DOI / Scopus

  • Low latency online blind source separation based on joint optimization with blind dereverberation

    Tetsuya Ueda, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2021-   506 - 510  2021年

     Abstract:

    This paper presents a new low-latency online blind source separation (BSS) algorithm. Although algorithmic delay of a frequency domain online BSS can be reduced simply by shortening the short-time Fourier transform (STFT) frame length, it degrades the source separation performance in the presence of reverberation. This paper proposes a method to solve this problem by integrating BSS with Weighted Prediction Error (WPE) based dereverberation. Although a simple cascade of online BSS after online WPE upgrades the separation performance, the overall optimality is not guaranteed. Instead, this paper extends a recently proposed batch processing algorithm that can jointly optimize dereverberation and separation so that it can perform online processing with low computational cost and little processing delay (< 12 ms). The results of a source separation experiment in a noisy car environment suggest that the proposed online method has better separation performance than the simple cascaded methods.

    DOI / Scopus (cited by 13)
  • SepNet: A deep separation matrix prediction network for multichannel audio source separation

    Shota Inoue, Hirokazu Kameoka, Li Li, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   2021-   191 - 195  2021年

     Abstract:

    In this paper, we propose SepNet, a deep neural network (DNN) designed to predict separation matrices from multichannel observations. One well-known approach to blind source separation (BSS) involves independent component analysis (ICA). A recently developed method called independent low-rank matrix analysis (ILRMA) is one of its powerful variants. These methods allow the estimation of separation matrices based on deterministic iterative algorithms. Specifically, ILRMA is designed to update the separation matrix according to an update rule derived based on the majorization-minimization principle. Although ILRMA performs reasonably well under some conditions, there is still room for improvement in terms of both separation accuracy and computation time, especially for large-scale microphone arrays. The existence of a deterministic iterative algorithm that can find one of the stationary points of the BSS problem implies that a DNN can also play that role if designed and trained properly. Motivated by this, we propose introducing a DNN that learns to convert a predefined input (e.g., an identity matrix) into a true separation matrix in accordance with a multichannel observation. To enable it to find one of the multiple solutions corresponding to different permutations of the source indices, we further propose adopting a permutation invariant training strategy to train the network. By using a fully convolutional architecture, we can design the network so that the forward propagation can be computed efficiently. The experimental results revealed that SepNet was able to find separation matrices faster and with better separation accuracy than ILRMA for mixtures of two sources.

    DOI / Scopus (cited by 3)
  • Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis.

    Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino

    APSIPA ASC     578 - 584  2021年

  • Speech emotion recognition based on attention weight correction using word-level confidence measure

    Jennifer Santoso, Takeshi Yamada, Shoji Makino, Kenkichi Ishizuka, Takekatsu Hiramura

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   1   301 - 305  2021年

     Abstract:

    Emotion recognition is essential for human behavior analysis and possible through various inputs such as speech and images. However, in practical situations, such as in call center analysis, the available information is limited to speech. This leads to the study of speech emotion recognition (SER). Considering the complexity of emotions, SER is a challenging task. Recently, automatic speech recognition (ASR) has played a role in obtaining text information from speech. The combination of speech and ASR results has improved the SER performance. However, ASR results are highly affected by speech recognition errors. Although there is a method to improve ASR performance on emotional speech, it requires the fine-tuning of ASR, which is costly. To mitigate the errors in SER using ASR systems, we propose the use of the combination of a self-attention mechanism and a word-level confidence measure (CM), which indicates the reliability of ASR results, to reduce the importance of words with a high chance of error. Experimental results confirmed that the combination of self-attention mechanism and CM reduced the effects of incorrectly recognized words in ASR results, providing a better focus on words that determine emotion recognition. Our proposed method outperformed the stateof- the-art methods on the IEMOCAP dataset.

    DOI / Scopus (cited by 15)
  • Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

    Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA ASC 2020     858 - 862  2020年12月  [査読有り]

  • Applying virtual microphones to triangular microphone array in in-car communication

    Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA ASC 2020     421 - 425  2020年12月  [査読有り]

  • Determined audio source separation with multichannel star generative adversarial network

    Li Li, Hirokazu Kameoka, Shoji Makino

    IEEE International Workshop on Machine Learning for Signal Processing, MLSP   2020-  2020年09月

     Abstract:

    This paper proposes a multichannel source separation approach, which uses a star generative adversarial network (StarGAN) to model power spectrograms of sources. Various studies have shown the significant contributions of a precise source model to the performance improvement in audio source separation, which indicates the importance of developing a better source model. In this paper, we explore the potential of StarGAN for modeling source spectrograms and investigate the effectiveness of the StarGAN source model in determined multichannel source separation by incorporating it into a frequency-domain independent component analysis (ICA) framework. The experimental results reveal that the proposed StarGAN-based method outperformed conventional methods that use non-negative matrix factorization (NMF) or a variational autoencoder (VAE) for source spectrogram modeling.

    DOI / Scopus (cited by 9)
  • DNNマスク推定に基づく畳み込みビームフォーマによる音源分離・残響除去・雑音除去の同時実現

    髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

    音講論集   3-1-9   285 - 288  2020年03月

  • 基底共有型半教師あり独立低ランク行列分析に基づく多チャネル補聴器システム

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    音講論集   1-1-22   217 - 220  2020年03月

  • 発話の時間変動に着目した音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     957 - 958  2020年03月

  • 空間特徴と音響特徴を併用する音響イベント検出の検討

    陳, 軼夫, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     1027 - 1030  2020年03月

  • 車室内コミュニケーション用低遅延音源分離の検討

    上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

    日本音響学会春季研究発表会講演論文集     213 - 216  2020年03月

  • 空間フィルタの自動推定による音響シーン識別の検討

    大野, 泰己, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-6   113 - 113  2020年03月

  • Generative Adversarial Networks を用いた半教師あり学習の音響イベント検出への適用

    合馬, 一弥, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-7   114 - 114  2020年03月

  • Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

    Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'20     175 - 178  2020年02月  [査読有り]

  • Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

    Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

    Proc. NCSP'20     645 - 648  2020年02月  [査読有り]

  • Blind source separation with low-latency for in-car communication

    Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

    Proc. NCSP'20     167 - 170  2020年02月  [査読有り]

  • 多チャンネル変分自己符号化器法による任意話者の音源分離

    李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

    信学技報   EA2019-77   79 - 84  2019年12月

  • Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

    Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura‡, Daichi, Saruwatari, Hiroshi, Makino, Shoji

    Proc. APSIPA     1874 - 1879  2019年11月  [査読有り]

  • Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

    Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

    Proc. Annual Conference of the International Society for Music Information Retrieval     784 - 790  2019年11月

  • Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2019     302 - 306  2019年11月  [査読有り]

  • Supervised determined source separation with multichannel variational autoencoder

    Kameoka, Hirokazu, Li, Li, Inoue, Shota, Makino, Shoji

    Neural Computation   31 ( 9 ) 1891 - 1914  2019年09月  [査読有り]

  • Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Makino, Shoji

    Proc. International Congress on Acoustics     6988 - 6995  2019年09月  [査読有り]

  • Gated convolutional neural network-based voice activity detection under high-level noise environments

    Li, Li, Kouei, Yamaoka, Yuki, Koshino, Mitsuo, Matsumoto, Makino, Shoji

    Proc. International Congress on Acoustics     2862 - 2869  2019年09月  [査読有り]

  • ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    音講論集   1-1-3   161 - 164  2019年09月

  • Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

    Proc. EUSIPCO 2019    2019年09月  [査読有り]

  • BLSTMと変調スペクトルを用いた発話特徴識別の検討

    サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     917 - 928  2019年09月

  • BLSTMを用いた音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     921 - 924  2019年09月

  • CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

    Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. EUSIPCO 2019    2019年09月  [査読有り]

  • Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    Proc. ICASSP2019     546 - 550  2019年05月

  • Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

    Inoue, Shota, Kameoka, Hirokazu, Li, Li, Seki, Shogo, Makino, Shoji

    Proc. ICASSP2019     96 - 100  2019年05月  [査読有り]

  • Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. ICASSP 2019     7908 - 7912  2019年05月  [査読有り]

  • Experimental evaluation of WaveRNN predictor for audio lossless coding

    Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'19     315 - 318  2019年03月  [査読有り]

  • MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    電子情報通信学会技術研究報告(SP)   SIP2018-130   149 - 154  2019年03月

  • 日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

    臼井, 桃香, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会論文集(D)   D-14-10   137 - 137  2019年03月

  • Gated CNNを用いた劣悪な雑音環境下における音声区間検出

    李莉, 越野ゆき, 松本光雄, 牧野, 昭二

    電子情報通信学会技術研究報告   EA2018-124   19 - 24  2019年03月

  • Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    Proc. NCSP'19     260 - 263  2019年03月  [査読有り]

  • Categorizing error causes related to utterance characteristics in speech recognition

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'19     514 - 517  2019年03月  [査読有り]

  • 多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

    井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

    音講論集   2-Q-32   399 - 402  2019年03月

  • Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. NCSP'19     264 - 267  2019年03月  [査読有り]

  • 時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

    髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

    日本音響学会2019年春季研究発表会講演論文集   1-6-5   181 - 184  2019年03月

  • 音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

    李, 莉, 亀岡, 弘和, 牧野, 昭二

    音講論集   1-6-10   201 - 204  2019年03月

  • Microphone position realignment by extrapolation of virtual microphone

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2018     367 - 372  2018年11月  [査読有り]

  • Weakly labeled learning using BLSTM-CTC for sound event detection

    Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

    Proc. APSIPA ASC 2018     1918 - 1923  2018年11月  [査読有り]

  • WaveRNNを利用した音声ロスレス符号化に関する検討と考察

    天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   2-4-9   1149 - 1152  2018年09月

  • Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

    Makino,Shoji

    Proc. IWAENC2018     71 - 75  2018年09月  [査読有り]

  • ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

    陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   1-1-21   149 - 152  2018年09月

  • 時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会秋季研究発表会講演論文集   1-Q-12   407 - 410  2018年09月

  • Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

    Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

    Proc. EUSIPCO 2018     1596 - 1600  2018年09月  [査読有り]

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集   1-R-5   961 - 964  2018年09月

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    Journal of Signal Processing   22 ( 4 ) 199 - 202  2018年07月  [査読有り]

     概要を見る

    Acoustic scene classification (ASC) classifies the place or situation where an acoustic sound was recorded. The Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge prepared a task involving ASC. Some methods using convolutional neural networks (CNNs) were proposed in the DCASE 2017 Challenge. The best method independently performed convolution operations for the left, right, mid (addition of left and right channels), and side (subtraction of left and right channels) input channels to capture spatial features. On the other hand, we propose a new method of spatial feature extraction using CNNs. In the proposed method, convolutions are performed for the time-space (channel) domain and frequency-space domain in addition to the time-frequency domain to capture spatial features. We evaluate the effectiveness of the proposed method using the task in the DCASE 2017 Challenge. The experimental results confirmed that convolution operations for the frequency-space domain are effective for capturing spatial features. Furthermore, by using a combination of the three domains, the classification accuracy was improved by 2.19% compared with that obtained using the time-frequency domain alone.
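
    As an illustration of the multi-domain convolution idea described above, the sketch below (PyTorch; not the authors' implementation) treats a stereo spectrogram as a (channel, frequency, time) volume and runs three convolution branches over the time-frequency, frequency-space and time-space planes before concatenating their pooled features. The layer sizes, the 15-class output and the global-average pooling are assumptions chosen for readability.

```python
# Hypothetical sketch (PyTorch), not the authors' code: three convolution branches
# over the time-frequency, frequency-space and time-space planes of a stereo
# spectrogram, with pooled features concatenated for classification.
import torch
import torch.nn as nn

class MultiDomainCNN(nn.Module):
    def __init__(self, n_channels=2, n_classes=15, n_filters=16):
        super().__init__()
        # The input is treated as a 3-D volume (channel, frequency, time);
        # each branch convolves over a different pair of axes.
        self.tf_conv = nn.Conv3d(1, n_filters, kernel_size=(1, 3, 3), padding=(0, 1, 1))           # time-frequency
        self.fs_conv = nn.Conv3d(1, n_filters, kernel_size=(n_channels, 3, 1), padding=(0, 1, 0))  # frequency-space
        self.ts_conv = nn.Conv3d(1, n_filters, kernel_size=(n_channels, 1, 3), padding=(0, 0, 1))  # time-space
        self.fc = nn.Linear(3 * n_filters, n_classes)

    def forward(self, spec):                        # spec: (batch, channel, freq, time)
        x = spec.unsqueeze(1)                       # -> (batch, 1, channel, freq, time)
        feats = []
        for conv in (self.tf_conv, self.fs_conv, self.ts_conv):
            h = torch.relu(conv(x))
            feats.append(h.mean(dim=(2, 3, 4)))     # global average pooling per branch
        return self.fc(torch.cat(feats, dim=1))     # class logits

logits = MultiDomainCNN()(torch.randn(4, 2, 64, 128))   # four stereo log-mel spectrograms
print(logits.shape)                                     # torch.Size([4, 15])
```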

  • 畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

    高橋, 玄, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     67 - 70  2018年03月

  • 複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会春季研究発表会講演論文集     475 - 478  2018年03月

  • Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

    Mae, Narumi, Yamaoka, Kouei, Mitsui, Yoshiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. NCSP'18     371 - 374  2018年03月  [査読有り]

  • 複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

    松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

    電子情報通信学会技術研究報告     335 - 340  2018年03月

  • 音声認識における誤認識原因通知のための印象評定値推定の検討

    後藤, 孝宏, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     117 - 120  2018年03月

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会講演論文集     63 - 66  2018年03月

  • Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

    Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'18     192 - 195  2018年03月  [査読有り]

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'18     188 - 191  2018年03月  [査読有り]

  • Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. NCSP'18     351 - 354  2018年03月  [査読有り]

  • Sound source localization using binaural difference for hose-shaped rescue robot

    Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    Proc. APSIPA 2017     1 - 7  2017年12月  [査読有り]

  • Abnormal sound detection by two microphones using virtual microphone technique

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    Proc. APSIPA 2017     1 - 5  2017年12月  [査読有り]

  • Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic Features

    Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

    Proc. APSIPA 2017     1 - 5  2017年12月  [査読有り]

  • Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

    Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

    Proc. IEEE GCCE 2017     141 - 145  2017年10月  [査読有り]

  • 音響ロスレス符号化MPEG-4 ALSにおけるハイレゾ音源向け線形予測次数最適化に関する検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会講演論文集     251 - 254  2017年09月

  • 非同期マイクロホンアレーにおける伝達関数ゲイン基底非負値行列因子分解を用いた遠方音源抑圧

    村瀬, 慶和, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

    日本音響学会誌   73 ( 9 ) 563 - 570  2017年09月  [査読有り]

     概要を見る

    ビームフォーミングなどの従来のアレー信号処理による雑音抑圧手法は,位相情報を活用した指向性制御に基づいており,特定方向から到来する雑音に対しては指向性の零点を向けることで高い効果が得られる。しかし,到来方向が特定できないような,いわゆる背景雑音の抑圧は,一般に難しかった。本論文では,伝達関数ゲイン基底NMFにより,遠方から到来する雑音を複数マイクを用いて効果的に抑圧する手法を提案する。提案手法では,背景雑音が遠方から到来することを仮定し,時間周波数領域における振幅情報のみに着目することで,様々な方向から到来する遠方音源を一つの混合音源としてモデル化する。次にこの振幅の混合モデルを従来提案されている制約付き伝達関数ゲイン基底NMFに適用し,遠方音源の抑圧を行う。更に,半教師あり伝達関数ゲイン基底NMFを適用し,遠方音源の抑圧を行う。本手法は振幅情報のみを用いているため,非同期録音機器を用いることができる。
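
    The toy sketch below illustrates the amplitude-only factorization idea summarized above, assuming plain Euclidean multiplicative NMF updates on a time-channel amplitude matrix and a heuristic target mask; the constrained and semi-supervised transfer-function-gain NMF of the paper itself is not reproduced here, and the two-channel data and basis count are assumptions.

```python
# Toy sketch (assumptions, not the published algorithm): NMF on an amplitude
# time-channel matrix.  X[m, n] is the amplitude of channel m in frame n; each
# basis column of G is a vector of per-channel gains, H holds its activations.
import numpy as np

def tf_gain_nmf(X, n_bases=2, n_iter=200, eps=1e-12):
    M, N = X.shape
    rng = np.random.default_rng(0)
    G = rng.random((M, n_bases)) + eps        # channel-gain bases
    H = rng.random((n_bases, N)) + eps        # activations
    for _ in range(n_iter):                   # Euclidean multiplicative updates
        H *= (G.T @ X) / (G.T @ G @ H + eps)
        G *= (X @ H.T) / (G @ H @ H.T + eps)
    return G, H

# Toy data: channel 0 is assumed to be close to the target source.
rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(2, 100)))
G, H = tf_gain_nmf(X, n_bases=2)

# Heuristic soft mask that keeps the basis with the largest relative gain in channel 0.
target = np.argmax(G[0] / (G.sum(axis=0) + 1e-12))
mask = (G[:, [target]] @ H[[target], :]) / (G @ H + 1e-12)
enhanced = mask * X
print(enhanced.shape)   # (2, 100)
```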

  • Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    Proc. MLSP     1 - 6  2017年09月  [査読有り]

  • Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

    Proc. EUSIPCO 2017     2388 - 2392  2017年08月  [査読有り]

  • Multiple far noise suppression in a real environment using transfer-function-gain NMF

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    Proc. EUSIPCO 2017     2378 - 2382  2017年08月  [査読有り]

  • Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

    Kodama, Takumi, Makino, Shoji

    Proc. IEEE Engineering in Medicine & Biology Society (EMBC)     3814 - 3817  2017年07月  [査読有り]

  • 教師信号を用いた非同期分散型マイクロホンアレーによる音源分離

    坂梨, 龍太郎, 小野, 順貴, 宮部, 滋樹, 山田, 武志, 牧野, 昭二

    日本音響学会誌   73 ( 6 ) 337 - 348  2017年06月  [査読有り]

     概要を見る

    近年,独立に動作する複数の録音機器を用いた非同期マイクロホンアレーが検討されている。非同期マイクロホンアレーは多チャネルのA/D変換器を必要としないので,従来より安価であり,マイクロホンの配置を柔軟に行えるという利点がある。一方,各チャネルの信号が時間的に同期して録音されておらず,また異なるA/D変換器を用いているために,録音開始時刻オフセットやサンプリング周波数ミスマッチ(以下ではミスマッチパラメータと総称する)が生じてしまう。従来,ミスマッチパラメータを推定して時間同期補償を行うために幾つかの手法が提案されている。しかし,処理が複雑であるために長時間の録音に対しては多大な処理時間を要するという問題がある。そこで本論文では,高速な時間同期補償と高性能な音源分離を実現するために,ミスマッチパラメータ推定と音源分離の両方を共通の教師信号を用いて行う枠組みを提案する。教師信号には,ある話者,あるいはある音源のみが音を発している区間である単一音源区間の信号を用いる。提案法では,時間的に離れた位置にある二つの単一音源区間の信号を手がかりに,ミスマッチパラメータを推定して時間同期補償を行う。更に,単一音源区間の信号を音源分離の教師信号として用い,またマイクロホンの分散型配置が可能という特徴を活用するように,音源分離手法であるSN比最大化ビームフォーマとDuong法を拡張する。実験の結果,提案法により十分な精度での時間同期補償が可能であり,また高い音源分離性能が得られることを確認した。
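
    The following is a hedged sketch of one ingredient mentioned above, a maximum-SNR beamformer computed from "single-source" segments used as teacher signals. The STFT shapes, the toy random data and the diagonal loading are assumptions; the mismatch-parameter estimation and the Duong-method extension of the paper are not covered.

```python
# Hedged sketch: a maximum-SNR beamformer per frequency bin, computed from frames
# labelled as "target only" and "noise only" (the teacher-signal idea above).
# The toy data, frame labels and diagonal loading are assumptions.
import numpy as np
from scipy.linalg import eigh

def max_snr_beamformer(X, target_frames, noise_frames):
    F, T, M = X.shape
    Y = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Xs = X[f, target_frames, :]                  # single-source (target) frames
        Xn = X[f, noise_frames, :]                   # noise-only frames
        Rs = Xs.conj().T @ Xs / len(target_frames)   # target spatial covariance (M, M)
        Rn = Xn.conj().T @ Xn / len(noise_frames)    # noise spatial covariance (M, M)
        Rn += 1e-6 * np.eye(M)                       # diagonal loading for stability
        _, vecs = eigh(Rs, Rn)                       # generalized eigenvalue problem
        w = vecs[:, -1]                              # eigenvector of the largest eigenvalue
        Y[f] = X[f] @ w.conj()                       # filter-and-sum output
    return Y

rng = np.random.default_rng(0)
X = rng.normal(size=(129, 200, 3)) + 1j * rng.normal(size=(129, 200, 3))   # (freq, frames, mics)
Y = max_snr_beamformer(X, target_frames=np.arange(0, 50), noise_frames=np.arange(150, 200))
print(Y.shape)   # (129, 200)
```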

  • 柔軟索状ロボットにおける独立低ランク行列分析と統計的音声強調に基づく高品質ブラインド音源分離の開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    日本機械学会ロボティクス・メカトロニクス講演会   1P2-P04   1 - 4  2017年05月

     概要を見る

    In this paper, we propose a novel blind source separation method for the hose-shaped rescue robot based on independent low-rank matrix analysis and statistical speech enhancement. The rescue robot is aimed to detect victims' speech in a disaster area, wearing multiple microphones around the body. Different from the common microphone array, the positions of microphones are unknown, and the conventional beamformer cannot be utilized. In addition, the vibration noise (ego-noise) is generated when the robot moves, yielding the serious contamination in the observed signals. Therefore, it is important to eliminate the ego-noise in this system. This paper describes our newly developed software and hardware system of blind source separation for the robot noise reduction. Also, we report objective and subjective evaluation results showing that the proposed system outperforms the conventional methods in the source separation accuracy and perceptual sound quality via experiments with actual sounds observed in the rescue robot.

  • DNN-GMMと連結特徴量を用いた音響シーン識別の検討

    高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-P-1   135 - 138  2017年03月

  • 補助関数法による識別的NMFの基底学習アルゴリズム

    李莉, 亀岡弘和, 牧野昭二

    日本音響学会2017年春季研究発表会   1-P-4   519 - 522  2017年03月

  • 独立低ランク行列分析と統計的音声強調を用いた柔軟索状ロボットにおけるブラインド音源分離システムの開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, Makino, Shoji

    日本音響学会2017年春季研究発表会   1-P-3   517 - 518  2017年03月

  • SJ-CATにおける項目応答理論に基づく能力値推定の精度改善

    小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-6-3   247 - 250  2017年03月

  • 音響ロスレス符号化MPEG-4 ALSのハイレゾ音源適応の検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会2017年春季研究発表会   2-P-42   381 - 382  2017年03月

  • Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

    Kodama, T, Makino, Shoji

    Proc. Biomedical and Health Informatics (BHI)     1 - 1  2017年02月  [査読有り]

  • Low latency and high quality two-stage human-voice-enhancement system for a hose-shaped rescue robot

    Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno

    Journal of Robotics and Mechatronics   29 ( 1 ) 198 - 212  2017年02月

     概要を見る

    This paper presents the design and implementation of a two-stage human-voice enhancement system for a hose-shaped rescue robot. When a microphoneequipped hose-shaped robot is used to search for a victim under a collapsed building, human-voice enhancement is crucial because the sound captured by a microphone array is contaminated by the ego-noise of the robot. For achieving both low latency and high quality, our system combines online and offline human-voice enhancement, providing an overview first and then details on demand. The online enhancement is used for searching for a victim in real time, while the offline one facilitates scrutiny by listening to highly enhanced human voices. Our online enhancement is based on an online robust principal component analysis, and our offline enhancement is based on an independent lowrank matrix analysis. The two enhancement methods are integrated with Robot Operating System (ROS). Experimental results showed that both the online and offline enhancement methods outperformed conventional methods.

  • DISCRIMINATIVE NON-NEGATIVE MATRIX FACTORIZATION WITH MAJORIZATION-MINIMIZATION

    Li Li, Hirokazu Kameoka, Shoji Makino

    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017)     141 - 145  2017年  [査読有り]

     概要を見る

    Non-negative matrix factorization (NMF) is a powerful approach to single channel audio source separation. In a supervised setting, NMF is first applied to train the basis spectra of each sound source. At test time, NMF is applied to the spectrogram of a mixture signal using the pretrained spectra. The source signals can then be separated out using a Wiener filter. A typical way to train the basis spectra of each source is to minimize the objective function of NMF. However, the basis spectra obtained in this way do not ensure that the separated signal will be optimal at test time due to the inconsistency between the objective functions for training and separation (Wiener filtering). To address this, a framework called discriminative NMF (DNMF) has recently been proposed. In this work a multiplicative update algorithm was proposed for the basis training, however one drawback is that the convergence is not guaranteed. To overcome this drawback, this paper proposes using a majorization-minimization principle to develop a convergence-guaranteed algorithm for DNMF. Experimental results showed that the proposed algorithm outperformed standard NMF and DNMF using a multiplicative update algorithm as regards both the signal-to-distortion and signal-to-interference ratios.
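
    For context, the sketch below shows the standard supervised-NMF pipeline that the abstract contrasts with discriminative NMF: basis spectra are pretrained per source with KL-divergence multiplicative updates, only the activations are fitted to the mixture at test time, and the sources are separated with a Wiener filter. The toy spectrograms, basis count and iteration budget are assumptions; the majorization-minimization DNMF algorithm itself is not reproduced.

```python
# Illustrative baseline only: supervised NMF with KL multiplicative updates and
# Wiener-filter separation.  Toy magnitude spectrograms and basis counts are assumptions.
import numpy as np

def nmf_kl(V, W=None, K=8, n_iter=100, eps=1e-12, update_W=True):
    F, N = V.shape
    rng = np.random.default_rng(0)
    if W is None:
        W = rng.random((F, K)) + eps
    H = rng.random((W.shape[1], N)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        if update_W:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
S1 = np.abs(rng.normal(size=(257, 300)))     # "training" spectrogram of source 1
S2 = np.abs(rng.normal(size=(257, 300)))     # "training" spectrogram of source 2

W1, _ = nmf_kl(S1, K=8)                      # train basis spectra per source
W2, _ = nmf_kl(S2, K=8)

V = S1 + S2                                  # mixture spectrogram at test time
W = np.concatenate([W1, W2], axis=1)
_, H = nmf_kl(V, W=W, update_W=False)        # fit activations only

V1_hat, V2_hat = W1 @ H[:8], W2 @ H[8:]      # per-source model spectrograms
S1_est = V * V1_hat / (V1_hat + V2_hat + 1e-12)   # Wiener-style mask
print(S1_est.shape)   # (257, 300)
```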

  • Blind source separation and multi-talker speech recognition with ad hoc microphone array using smartphones and cloud storage

    越智景子, 小野順貴, 宮部滋樹, 牧野昭二

    Acoustical Science and Technology    2017年  [査読有り]

  • Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

    Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2017-   1998 - 2002  2017年  [査読有り]

     概要を見る

    Spectral domain speech enhancement algorithms based on nonnegative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy, however they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to have an effect in enhancing the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-To-distortion ratio and the cepstral distance.

  • Full-body tactile P300-based brain-computer interface accuracy refinement

    Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

    Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART)     1 - 4  2016年12月  [査読有り]

  • 伝達関数ゲイン基底NMFを用いた遠方雑音抑圧の実環境での評価

    松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

    第31回信号処理シンポジウム   B3-1   231 - 235  2016年11月

  • Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

    Saruwatari, H, Takata, K, Ono, N, Makino, Shoji

    International Congress on Acoustics     1 - 10  2016年09月  [査読有り]

  • Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji, Ono, Nobutaka

    DCASE2016 Challenge     1 - 2  2016年09月

  • 雑音下音声認識における必要発話音量提示機能の実装と評価

    後藤,孝宏, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会   3-Q-12   117 - 120  2016年09月

  • ヴァーチャル多素子化に基づくSN比最大化ビームフォーマの残響に対する性能変化

    山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会   3-7-5   379 - 382  2016年09月

  • Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

    Kodama, T, Makino, Shoji, Rutkowski, T

    Proc. AEARU Young Researchers International Conference     1 - 4  2016年09月  [査読有り]

  • 日本語スピーキングテストSJ-CATにおける項目応答理論に基づく能力値推定の検証

    小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

    日本音響学会秋季研究発表会   3-Q-26   253 - 256  2016年09月

  • Amplitude-based speech enhancement with non-negative matrix factorization in time-channel domain for ad-hoc microphone array

    Chiba, H., Ono, N., Miyabe, S., Takahashi, Y., Yamada, T., Makino, S.

    J. Acoust. Soc. Jpn   72 ( 8 ) 462 - 470  2016年08月  [査読有り]

  • アドホックマイクロホンアレーにおける時間チャネル領域での非負値行列因子分解を用いた振幅ベースの音声強調

    千葉大将, 小野順貴, 宮部滋樹, 高橋祐, 山田武志, 牧野昭二

    日本音響学会誌   72 ( 8 ) 462 - 470  2016年08月  [査読有り]

     概要を見る

    本論文では,時間チャネル領域の非負値行列因子分解(NMF)による,非同期分散型録音の目的音強調手法について述べる。複数の録音機器による多チャネル信号は,機器ごとのサンプリング周波数の微小なずれが引き起こす位相差のドリフトのため,位相情報を用いるアレー信号処理は適さない。位相に比べると振幅の分析はドリフトの影響を大きく受けないことに着目し,戸上らが提案した時間チャネル領域のNMFによるチャネル間ゲイン差の分析(伝達関数ゲイン基底NMF)に基づく時間周波数マスクを用いる。また,基底数よりも十分大きなチャネル数が得られない条件の音声強調のための,基底を事前に学習する教師ありNMFについて議論する。

  • 音声のスペクトル領域とケプストラム領域における同時強調

    李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

    信学技報   SP2016-32   29 - 32  2016年08月

  • An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping

    Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Biing-Hwang Juang

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 6 ) 1152 - 1162  2016年06月  [査読有り]

     概要を見る

    MUltiple Signal Classification (MUSIC) is a standard technique for direction of arrival (DOA) estimation with high resolution. However, MUSIC cannot estimate DOAs accurately in the case of underdetermined conditions, where the number of sources exceeds the number of microphones. To overcome this drawback, an extension of MUSIC using cumulants called 2q-MUSIC has been proposed, but this method greatly suffers from the variance of the statistics, given as the temporal mean of the observation process, and requires long observation. In this paper, we propose a new approach for extending MUSIC that exploits higher-order moments of the signal for the underdetermined DOA estimation with smaller variance. We propose an estimation algorithm that nonlinearly maps the observed signal onto a space with expanded dimensionality and conducts MUSIC-based correlation analysis in the expanded space. Since the dimensionality of the noise subspace is increased by the mapping, the proposed method enables the estimation of DOAs in the case of underdetermined conditions. Furthermore, we describe the class of mapping that allows us to analyze the higher-order moments of the observed signal in the original space. We compare 2q-MUSIC and the proposed method through an experiment assuming that the true number of sources is known as prior information to evaluate in terms of the bias-variance tradeoff of the statistics and computational complexity. The results clarify that the proposed method has advantages for both computational complexity and estimation accuracy in short-time analysis, i.e., the time duration of the analyzed data is short.
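
    As background for the extension above, the following is a minimal sketch of standard narrowband MUSIC with a uniform linear array: eigendecompose the spatial correlation matrix, keep the noise subspace, and scan steering vectors for pseudospectrum peaks. The half-wavelength six-microphone geometry and the toy two-source data are assumptions.

```python
# Minimal narrowband MUSIC sketch with a uniform linear array (assumed geometry).
import numpy as np

def music_spectrum(X, n_sources, d_over_lambda=0.5):
    M, N = X.shape                                   # microphones x snapshots
    R = X @ X.conj().T / N                           # spatial correlation matrix
    _, vecs = np.linalg.eigh(R)                      # eigenvalues in ascending order
    En = vecs[:, : M - n_sources]                    # noise subspace
    angles = np.linspace(-90, 90, 181)
    p = np.zeros(len(angles))
    for i, theta in enumerate(np.deg2rad(angles)):
        a = np.exp(-2j * np.pi * d_over_lambda * np.arange(M) * np.sin(theta))  # steering vector
        p[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)                    # pseudospectrum
    return angles, p

# Two far-field sources at -20 and 40 degrees, six microphones, half-wavelength spacing.
rng = np.random.default_rng(0)
M, N = 6, 500
A = np.exp(-2j * np.pi * 0.5 * np.outer(np.arange(M), np.sin(np.deg2rad([-20, 40]))))
S = rng.normal(size=(2, N)) + 1j * rng.normal(size=(2, N))
X = A @ S + 0.1 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))

angles, p = music_spectrum(X, n_sources=2)
print(angles[np.argmax(p)])   # strongest pseudospectrum peak, near one true DOA
```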

  • 独立ベクトル分析とノイズキャンセラを用いた雑音抑圧の柔軟索状ロボットへの適用

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    日本機械学会ロボティクス・メカトロニクス講演会2016   1P1-08b3   1 - 4  2016年06月

     概要を見る

    This paper presents a noise reduction on a hose-shaped rescue robot. The hose-shaped rescue robot is one of rescue robots developed on Tough Robotics Challenge, and it is used for searching for victims by getting one's voice with its microphone-array. However, the ego noise, caused by its vibration motors, makes it difficult to get the human voice. We propose a noise reduction method using a blind source separation technique based on Independent Vector Analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego-noise signal from observed multi-channel signals using the IVA-based blind source separation technique, and (2) applying the noise cancellation to the estimated speech signal using the estimated ego-noise signal as a noise reference.
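
    The sketch below illustrates only step (2) of the method described above, using a normalized LMS adaptive filter as a simple stand-in for the noise canceller; the IVA-based separation of step (1) is not reproduced, and the filter length, step size and toy ego-noise path are assumptions.

```python
# Stand-in sketch of step (2): a normalized LMS noise canceller that subtracts the
# ego-noise estimate from the separated speech, given a noise reference.
import numpy as np

def nlms_noise_canceller(primary, noise_ref, n_taps=64, mu=0.2, eps=1e-8):
    w = np.zeros(n_taps)                       # adaptive FIR coefficients
    buf = np.zeros(n_taps)                     # most recent noise-reference samples
    out = np.zeros_like(primary)
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = noise_ref[n]
        e = primary[n] - w @ buf               # error = enhanced speech sample
        w += mu * e * buf / (buf @ buf + eps)  # NLMS update
        out[n] = e
    return out

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))                 # toy "speech"
noise_ref = rng.normal(size=8000)                                        # ego-noise reference
leaked = np.convolve(noise_ref, rng.normal(scale=0.3, size=16))[:8000]   # noise at the speech channel
enhanced = nlms_noise_canceller(speech + leaked, noise_ref)
print(np.std(leaked), np.std(enhanced[4000:] - speech[4000:]))           # residual noise shrinks
```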

  • ランク1 空間モデル制約付き多チャネルNMFを用いた柔軟索状ロボットにおける雑音抑圧

    高草木萌, 北村大地, 小野順貴, 山田武志, Makino, Shoji, 猿渡洋

    日本機械学会ロボティクス・メカトロニクス講演会   1A2-10a3   1 - 4  2016年06月

     概要を見る

    A hose-shaped rescue robot is one of the robots that are developed for disaster response in case of a large-scale disasters such as a great earthquake. The robot is suitable for entering narrow and dark places covered with rubble in the disaster site, and for finding inside it. This robot can transmit the ambient sound to its operator by using the built-in microphones. However, there is a serious problem that the inherent noise of this robot, such as the vibration sound or the fricative sound, is mixed into the transmitting voice, therefore disturbing the operator's hearing for a call of help from the victim of the disaster. In this paper, we apply the multichannel NMF (nonnegative matrix factorization) with the rank-1 spatial constraint (Rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.

  • 独立ベクトル分析とノイズキャンセラを用いた柔軟索状ロボットにおける雑音抑圧

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    電子情報通信学会総合大会   2016   58 - 58  2016年03月

  • 教師あり多チャネルNMFと統計的音声強調を用いた柔軟索状ロボットにおける音源分離

    高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

    日本音響学会2016年春季研究発表会   ( 3-3-2 ) 609 - 612  2016年03月

  • 非同期分散マイクロホンによるブラインド音源分離を用いた複数話者同時音声認識

    越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

    日本音響学会2016年春季研究発表会   ( 3-3-1 ) 607 - 608  2016年03月

  • Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

    Toyoda, Takuya, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    Proc. NCSP'16     622 - 625  2016年03月  [査読有り]

  • ランク1空間モデル制約付き多チャネルNMFを用いた雑音抑圧の柔軟索状ロボットへの適用

    高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

    電子情報通信学会総合大会   2016   57 - 57  2016年03月

  • 振幅のみからの相関推定と雑音尖度に基づく空間サブトラクションアレーの減算係数最適化

    李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会2016年春季研究発表会     689 - 692  2016年03月

  • Performance estimation of noisy speech recognition using spectral distortion and recognition task complexity

    Ling Guo, Takeshi Yamada, Shigeki Miyabe, Shoji Makino, Nobuhiko Kitawaki

    Acoustical Science and Technology   37 ( 6 ) 286 - 294  2016年  [査読有り]

     概要を見る

    Previously, methods for estimating the performance of noisy speech recognition based on a spectral distortion measure have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, no consideration is given to any change in the components of a speech recognition system. To solve this problem, we propose a novel method for estimating the performance of noisy speech recognition, a major feature of which is the ability to accommodate the use of different noise reduction algorithms and recognition tasks by using two cepstral distances (CDs) and the square mean root perplexity (SMR-perplexity). First, we verified the effectiveness of the proposed distortion measure, i.e., the two CDs. The experimental results showed that the use of the proposed distortion measure achieves estimation accuracy equivalent to the use of the conventional distortion measures, the perceptual evaluation of speech quality (PESQ) and the signal-to-noise ratio (SNR) of noise-reduced speech, and has the advantage of being applicable to noise reduction algorithms that directly output the mel-frequency cepstral coefficient (MFCC) feature. We then evaluated the proposed method by performing a closed test and an open test (10-fold crossvalidation test). The results confirmed that the proposed method gives better estimates without being dependent on the differences among the noise reduction algorithms or the recognition tasks.
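
    As a small illustration of the cepstral-distance ingredient used above, the sketch below computes a frame-averaged, dB-scaled cepstral distortion between a reference and a processed signal from their low-order real cepstra. The frame length, hop size and cepstral order are assumptions; the SMR-perplexity term and the regression model of the paper are not covered.

```python
# Illustrative cepstral-distance computation between a reference and a processed
# signal (frame length, hop and cepstral order are assumptions).
import numpy as np

def cepstra(x, n_fft=512, hop=128, n_ceps=12, eps=1e-10):
    frames = np.stack([x[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(x) - n_fft, hop)])
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)
    ceps = np.fft.irfft(log_mag, axis=1)       # real cepstrum per frame
    return ceps[:, 1:n_ceps + 1]               # drop c0, keep low quefrencies

def cepstral_distance(ref, proc):
    c_ref, c_proc = cepstra(ref), cepstra(proc)
    n = min(len(c_ref), len(c_proc))
    d = c_ref[:n] - c_proc[:n]
    # Classical dB-scaled cepstral distortion, averaged over frames.
    return np.mean(10.0 / np.log(10.0) * np.sqrt(2.0 * np.sum(d ** 2, axis=1)))

rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
noisy = clean + 0.1 * rng.normal(size=16000)
print(cepstral_distance(clean, noisy))   # small positive distortion value
```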

  • Performance Estimation of Spontaneous Speech Recognition Using Non-Reference Acoustic Features

    Ling Guo, Takeshi Yamada, Shoji Makino

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 4  2016年  [査読有り]

     概要を見る

    To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. By using this method, it is allowed to monitor the recognition performance in providing speech recognition services. It can be also used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, the methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. The main feature is to use non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features by openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.

  • NOISE REDUCTION USING INDEPENDENT VECTOR ANALYSIS AND NOISE CANCELLATION FOR A HOSE-SHAPED RESCUE ROBOT

    Masaru Ishimura, Shoji Makino, Takeshi Yamada, Nobutaka Ono, Hiroshi Saruwatari

    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     1 - 5  2016年  [査読有り]

     概要を見る

    In this paper, we present noise reduction for a hose-shaped rescue robot. The robot is used for searching for disaster victims by capturing their voice with its microphone array. However, the ego noise generated by its vibration motors makes it difficult to distinguish human voices. To solve this problem, we propose a noise reduction method using a blind source separation technique based on independent vector analysis (IVA) and noise cancellation. Our method consists of two steps: (1) estimating a speech signal and an ego noise signal from observed multichannel signals using the IVA-based blind source separation technique, and (2) applying noise cancellation to the estimated speech signal using the estimated ego noise signal as a noise reference. The experimental evaluations show that this approach is effective for suppressing the ego noise.

  • Visual Motion Onset Brain-computer Interface

    Tomasz M. Rutkowski

    Proc. International Conference on Bio-engineering for Smart Technologies (BioSMART)     1 - 4  2016年  [査読有り]

  • Nonlinear speech enhancement by virtual increase of channels and maximum SNR beamformer

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING   2016 ( 1 ) 1 - 8  2016年01月  [査読有り]

     概要を見る

    In this paper, we propose a new microphone array signal processing technique, which increases the number of microphones virtually by generating extra signal channels from real microphone signals. Microphone array signal processing methods such as speech enhancement are effective for improving the quality of various speech applications such as speech recognition and voice communication systems. However, the performance of speech enhancement and other signal processing methods depends on the number of microphones. Thus, special equipment such as a multichannel A/D converter or a microphone array is needed to achieve high processing performance. Therefore, our aim was to establish a technique for improving the performance of array signal processing with a small number of microphones and, in particular, to increase the number of channels virtually by synthesizing virtual microphone signals, or extra signal channels, from two channels of microphone signals. Each virtual microphone signal is generated by interpolating a short-time Fourier transform (STFT) representation of the microphone signals. The phase and amplitude of the signal are interpolated individually. The phase is linearly interpolated on the basis of a sound propagation model, and the amplitude is nonlinearly interpolated on the basis of beta divergence. We also performed speech enhancement experiments using a maximum signal-to-noise ratio (SNR) beamformer equipped with virtual microphones and evaluated the improvement in performance upon introducing virtual microphones.
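
    A hedged sketch of the virtual-microphone construction outlined above: one extra STFT channel is synthesized between two real microphones by interpolating the phase linearly and the amplitude non-linearly. Geometric amplitude interpolation is used here as a simple stand-in for the beta-divergence-based interpolation in the paper, and the array geometry is left implicit in the interpolation ratio.

```python
# Hedged sketch: synthesize one virtual microphone STFT between two real microphones;
# phase is interpolated linearly, amplitude geometrically (a stand-in for the
# beta-divergence-based amplitude interpolation of the paper).
import numpy as np

def virtual_mic(X1, X2, alpha=0.5, eps=1e-12):
    """X1, X2: complex STFTs of the two real microphones, shape (freq, frames).
    alpha: relative position of the virtual microphone (0 -> mic 1, 1 -> mic 2)."""
    amp = (np.abs(X1) + eps) ** (1 - alpha) * (np.abs(X2) + eps) ** alpha
    dphi = np.angle(X2 * np.conj(X1))       # inter-channel phase difference in (-pi, pi]
    phase = np.angle(X1) + alpha * dphi     # linear phase interpolation
    return amp * np.exp(1j * phase)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(257, 100)) + 1j * rng.normal(size=(257, 100))
X2 = rng.normal(size=(257, 100)) + 1j * rng.normal(size=(257, 100))
Xv = virtual_mic(X1, X2, alpha=0.5)         # virtual microphone halfway between the pair
print(Xv.shape)                             # (257, 100)
```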

  • EGO-NOISE REDUCTION FOR A HOSE-SHAPED RESCUE ROBOT USING DETERMINED RANK-1 MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION

    Moe Takakusaki, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Shoji Makino, Hiroshi Saruwatari

    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     1 - 4  2016年  [査読有り]

     概要を見る

    A hose-shaped rescue robot is one of the robots that have been developed for disaster response in times of large-scale disasters such as a massive earthquake. This robot is suitable for entering narrow and dark places covered with rubble in a disaster site and for finding victims inside it. It can transmit ambient sound captured by its built-in microphones to its operator. However, there is a serious problem, that is, the inherent noise of this robot, such as vibration sound or fricative sound, is mixed with the transmitted voice, thereby disturbing the operator's perception of a call for help from a disaster victim. In this paper, we apply the multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint (determined rank-1 MNMF), which was proposed by Kitamura et al., to the reduction of the inherent noise.

  • Multi-talker Speech Recognition Based on Blind Source Separation with Ad hoc Microphone Array Using Smartphones and Cloud Storage

    Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino

    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5     3369 - 3373  2016年  [査読有り]

     概要を見る

    In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array, which consists of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker's smartphone, which is automatically transferred to online cloud storage. Our prototype system is realized using iPhone and Dropbox. Although the signals recorded by different iPhones are not synchronized, the blind synchronization technique compensates both the differences in the time offset and the sampling frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into each speaker's voice. Finally, automatic speech recognition is applied to transcribe the speech. By experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces the speech overlap and improves the speech recognition performance.

  • Tactile Brain-computer Interface Using Classification of P300 Responses Evoked by Full Body Spatial Vibrotactile Stimuli

    Takumi Kodama, Shoji Makino, Tomasz M. Rutkowski

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 4  2016年  [査読有り]

     概要を見る

    In this study we propose a novel stimulus-driven brain-computer interface (BCI) paradigm, which generates control commands based on classification of somatosensory modality P300 responses. Six spatial vibrotactile stimulus patterns are applied to entire back and limbs of a user. The aim of the current project is to validate an effectiveness of the vibrotactile stimulus patterns for BCI purposes and to establish a novel concept of tactile modality communication link, which shall help locked-in syndrome (LIS) patients, who lose their sight and hearing due to sensory disabilities. We define this approach as a full-body BCI (fbBCI) and we conduct psychophysical stimulus evaluation and realtime EEG response classification experiments with ten healthy body-able users. The grand mean averaged psychophysical stimulus pattern recognition accuracy resulted at 98.18%, whereas the realtime EEG accuracy at 53.67%. The information-transfer-rate (ITR) scores of all the tested users ranged from 0.042 to 4.154 bit/minute.

  • Ego Noise Reduction for Hose-Shaped Rescue Robot Combining Independent Low-Rank Matrix Analysis and Noise Cancellation

    Narumi Mae, Daichi Kitamura, Masaru Ishimura, Takeshi Yamada, Shoji Makino

    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1 - 6  2016年  [査読有り]

     概要を見る

    In this paper, we present an ego noise reduction method for a hose-shaped rescue robot developed for search and rescue operations in large-scale disasters such as a massive earthquake. It can enter narrow and dark places covered with rubble in a disaster site and is used to search for disaster victims by capturing their voices with its microphone array. However, ego noises, such as vibration or fricative sounds, are mixed with the voices, and it is difficult to differentiate them from a call for help from a disaster victim. To solve this problem, we here propose a two-step noise reduction method as follows: (1) the estimation of both speech and ego noise signals from an observed multichannel signal by multichannel nonnegative matrix factorization (NMF) with the rank-1 spatial constraint, which was proposed by Kitamura et al., and (2) the application of noise cancellation to the estimated speech signal using the noise reference. Our evaluations show that this approach is effective for suppressing ego noise.

  • Unisoner:様々な歌手が同一楽曲を歌ったWeb上の多様な歌声を活用する合唱制作支援インタフェース

    都築,圭太, 中野,倫靖, 後藤,真孝, 山田,武志, 牧野,昭二

    情報処理学会論文誌   56 ( 12 ) 2370 - 2383  2015年12月  [査読有り]

     概要を見る

    本論文では,Web上で公開されている「1つの楽曲を様々な歌手が歌った歌声」から,合唱と呼ばれる作品を制作するためのインタフェースUnisonerを提案する.従来,このような合唱制作では,伴奏を抑制した各歌声波形を楽曲のフレーズごとに切り貼りし,音量の大小や左右のバランスを調整したうえで重ね合わせる必要があり,時間と労力がかかっていた.それに対してUnisonerでは,歌詞に基づいた楽曲内位置の指定と,歌手アイコンのドラッグアンドドロップ操作に基づいた音量調整を可能とするインタフェースによって,直感的かつ効率的に合唱を制作することができる.さらに,歌声のF0(基本周波数)とMFCC(Mel Frequency Cepstral Coefficient)に基づいた音響的な類似度や,MFCCに基づいた歌手性別の推定結果に加え,再生数などのWeb上のメタデータを活用した歌手検索機能も持つ.このような機能を実現するためには,伴奏をともなう歌声のF0推定手法や,歌声と歌詞のアラインメント手法が必要となるが,それらの推定結果に誤りが含まれることが問題となる.そこで本論文では,誤りを含む単一の歌声からの推定結果に対し,複数の歌声の推定結果を統合して誤りを削減する手法を提案する.評価実験の結果,Unisonerによって合唱制作時間が短縮されること,提案手法によりF0推定と歌詞アラインメントにおける誤りが減少することを確認した.This paper proposes Unisoner, an interface for assisting the creation of derivative choruses, in which voices of different singers singing the same song are overlapped on top of one shared accompaniment. In the past, it was time-consuming to create such choruses because creators had to manually cut and paste vocal fragments from different singers, and then adjust the volume and panning of each voice. Unisoner enables users to perform such editing tasks efficiently by selecting phrases using lyrics and by dragging and dropping the corresponding icons onto a virtual stage. Moreover, Unisoner can search vocals with acoustic similarity based on F0 and MFCC, estimated gender, and metadata such as the number of views. We use a vocal F0 estimation technique from polyphonic audio signals, and a technique to synchronize audio signals with lyrics. However, estimation errors occur using conventional techniques for F0 and lyric alignment, so we propose a novel method of reducing those errors by integrating the estimated results from many voices singing the same song. The experimental results confirmed that Unisoner can shorten the time for creating derivative choruses, and the proposed methods can reduce the estimation error of F0 and lyric alignment.

  • Unisoner: An interface for derivative chorus creation from various singing voices singing the same song on the web

    Tsuzuki, K., Nakano, T., Goto, M., Yamada, T., Makino, S.

    Journal of Information Processing   56 ( 12 ) 2370 - 2383  2015年12月  [査読有り]

     概要を見る

    本論文では,Web上で公開されている「1つの楽曲を様々な歌手が歌った歌声」から,合唱と呼ばれる作品を制作するためのインタフェースUnisonerを提案する.従来,このような合唱制作では,伴奏を抑制した各歌声波形を楽曲のフレーズごとに切り貼りし,音量の大小や左右のバランスを調整したうえで重ね合わせる必要があり,時間と労力がかかっていた.それに対してUnisonerでは,歌詞に基づいた楽曲内位置の指定と,歌手アイコンのドラッグアンドドロップ操作に基づいた音量調整を可能とするインタフェースによって,直感的かつ効率的に合唱を制作することができる.さらに,歌声のF0(基本周波数)とMFCC(Mel Frequency Cepstral Coefficient)に基づいた音響的な類似度や,MFCCに基づいた歌手性別の推定結果に加え,再生数などのWeb上のメタデータを活用した歌手検索機能も持つ.このような機能を実現するためには,伴奏をともなう歌声のF0推定手法や,歌声と歌詞のアラインメント手法が必要となるが,それらの推定結果に誤りが含まれることが問題となる.そこで本論文では,誤りを含む単一の歌声からの推定結果に対し,複数の歌声の推定結果を統合して誤りを削減する手法を提案する.評価実験の結果,Unisonerによって合唱制作時間が短縮されること,提案手法によりF0推定と歌詞アラインメントにおける誤りが減少することを確認した.This paper proposes Unisoner, an interface for assisting the creation of derivative choruses, in which voices of different singers singing the same song are overlapped on top of one shared accompaniment. In the past, it was time-consuming to create such choruses because creators had to manually cut and paste vocal fragments from different singers, and then adjust the volume and panning of each voice. Unisoner enables users to perform such editing tasks efficiently by selecting phrases using lyrics and by dragging and dropping the corresponding icons onto a virtual stage. Moreover, Unisoner can search vocals with acoustic similarity based on F0 and MFCC, estimated gender, and metadata such as the number of views. We use a vocal F0 estimation technique from polyphonic audio signals, and a technique to synchronize audio signals with lyrics. However, estimation errors occur using conventional techniques for F0 and lyric alignment, so we propose a novel method of reducing those errors by integrating the estimated results from many voices singing the same song. The experimental results confirmed that Unisoner can shorten the time for creating derivative choruses, and the proposed methods can reduce the estimation error of F0 and lyric alignment.

  • Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

    Chiba, H., Kamamoto, Y., Moriya, T., Harada, N., Miyabe, S., Yamada, T., Makino, S.

    IEICE Trans. Information and Systems   J98-D ( 10 ) 1301 - 1311  2015年10月  [査読有り]

  • CELPに基づく音声符号化向けのピッチ周波数に依存した適応ポストフィルタ

    千葉,大将, 鎌本,優, 守谷,健弘, 原田,登, 宮部,滋樹, 山田,武志, 牧野,昭二

    電子情報通信学会論文誌   J98-D ( 10 ) 1301 - 1311  2015年10月  [査読有り]

  • ノンリファレンスひずみ特徴量を用いた雑音下音声認識性能推定の検討

    郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会2015年秋季研究発表会     95 - 98  2015年09月

  • 日本語スピーキングテストSJ-CATにおける低スコア解答発話の検出の検討

    小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

    日本音響学会2015年秋季研究発表会     329 - 332  2015年09月

  • マイクロホンアレーの位相が観測できない条件でのチャネル間の相関係数の推定

    宮部滋樹, 小野順貴, 牧野,昭二

    回路とシステムワークショップ   28   347 - 352  2015年08月

  • Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

    Shoko Araki, Shoji Makino, Hiroshi Sawada, Ryo Mukai

    European Signal Processing Conference   06-10-   1991 - 1994  2015年04月

     概要を見る

    We propose a method for separating speech signals when sources outnumber the sensors. In this paper we mainly concentrate on the case of three sources and two sensors. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose the utilization of a directivity pattern based continuous mask, which removes a single source from the observations, and independent component analysis (ICA) to separate the remaining mixtures. Experimental results show that our proposed method can separate signals with little distortion even in a real reverberant environment of TR = 130 ms.

  • 認識性能予測に基づく雑音環境下音声認識のユーザビリティ改善の検討

    青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会     133 - 136  2015年03月

  • 非同期分散マイクロフォンアレーによる伝達関数ゲイン基底NMFを用いた拡散雑音抑圧

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会     557 - 560  2015年03月

  • Activity Report from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2015年03月  [査読有り]

  • Signal Processing Techniques for Assisted Listening

    Sven Nordholm, Walter Kellermann, Simon Doclo, Vesa Vaelimaeki, Shoji Makino, John R. Hershey

    IEEE SIGNAL PROCESSING MAGAZINE   32 ( 2 ) 16 - 17  2015年03月  [査読有り]

  • ステレオ録音に基づく移動音源モデルによる走行車両検出と走行方向推定

    遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会     717 - 720  2015年03月

  • 総合品質と明瞭性の客観推定に基づくスペクトルサブトラクションの減算係数の最適化

    中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会     333 - 336  2015年03月

  • ケプストラム距離とSMR-パープレキシティを用いた雑音下音声認識の性能推定の検討

    郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会     129 - 132  2015年03月

  • Spatial tactile brain-computer interface by applying vibration to user's shoulders and waist

    Kodama, T., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     41 - 42  2015年02月  [査読有り]

  • SSVEP brain-computer interface using green and blue lights

    Aminaka, D., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     39 - 40  2015年02月  [査読有り]

  • Spatial auditory brain-computer interface using head related impulse response

    Nakaizumi, C., Matsui, T., Mori, K., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     37 - 38  2015年02月  [査読有り]

  • Blind compensation of interchannel sampling frequency mismatch for ad hoc microphone array based on maximum likelihood estimation

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    SIGNAL PROCESSING   107 ( SI ) 185 - 196  2015年02月  [査読有り]

     概要を見る

    In this paper, we propose a novel method for the blind compensation of drift for the asynchronous recording of an ad hoc microphone array. Digital signals simultaneously observed by different recording devices have drift of the time differences between the observation channels because of the sampling frequency mismatch among the devices. On the basis of a model in which the time difference is constant within each short time frame but varies in proportion to the central time of the frame, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform (STFT) domain by a linear phase shift. By assuming that the sources are motionless and have stationary amplitudes, the observation is regarded as being stationary when drift does not occur. Thus, we formulate a likelihood to evaluate the stationarity in the STFT domain to evaluate the compensation of drift. The maximum likelihood estimation is obtained effectively by a golden section search. Using the estimated parameters, we compensate the drift by STFT analysis with a noninteger frame shift. The effectiveness of the proposed blind drift compensation method is evaluated in an experiment in which artificial drift is generated. (C) 2014 The Authors. Published by Elsevier B.V.
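
    The sketch below illustrates only the compensation step described above: once a relative sampling-frequency mismatch has been estimated, the accumulated time-difference drift is undone by a bin-wise linear phase shift that grows with the frame index. The maximum-likelihood golden-section search for the mismatch itself is omitted, and `eps_hat`, the hop size and the FFT length are assumptions.

```python
# Sketch of the compensation step only: undo the linear-phase drift caused by a
# given sampling-frequency mismatch estimate `eps_hat` (its estimation is omitted).
import numpy as np

def compensate_drift(X, eps_hat, hop, n_fft):
    """X: complex STFT of the mis-clocked channel, shape (n_fft // 2 + 1, frames).
    eps_hat: estimated relative sampling-frequency mismatch (e.g. 50 ppm -> 5e-5)."""
    n_bins, n_frames = X.shape
    k = np.arange(n_bins)[:, None]                  # frequency-bin index
    t_center = hop * np.arange(n_frames)[None, :]   # frame-centre sample index
    # The accumulated delay after t_center samples is eps_hat * t_center samples;
    # cancel it with the corresponding bin-wise linear phase term.
    phase = 2.0 * np.pi * k * eps_hat * t_center / n_fft
    return X * np.exp(1j * phase)

rng = np.random.default_rng(0)
X = rng.normal(size=(513, 200)) + 1j * rng.normal(size=(513, 200))
X_comp = compensate_drift(X, eps_hat=5e-5, hop=256, n_fft=1024)
print(X_comp.shape)   # (513, 200)
```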

  • Tactile pin-pressure brain-computer interface

    Shimizu, K., Mori, H., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     35 - 36  2015年02月  [査読有り]

  • Multi-command tactile brain-computer interface using the touch-sense glove

    Yajima, H., Makino, Shoji, Rutkowski, T. M.

    AEARU Workshop on Computer Science and Web Technology     43 - 44  2015年02月  [査読有り]

  • Implementation and evaluation of an acoustic echo canceller using duo-filter control system

    Yoichi Haneda, Shoji Makino, Junji Kojima, Suehiro Shimauchi

    European Signal Processing Conference    2015年

     概要を見る

    The developed acoustic echo canceller uses an exponentially weighted step-size projection algorithm and a duo-filter control system to achieve fast convergence and high speech quality. The duo-filter control system has an adaptive filter and a fixed filter, and uses variable-loss insertion. Evaluation of this system with multi-channel A/D and D/A converters showed that (1) the convergence speed is under 1.5 seconds for speech input when the adaptive filter length is 125 ms, (2) the residual echo level is nearly as low as the ambient noise level (average: under -20 dB; maximum: under -35 dB), and (3) near-end speech is sent with no disturbance during double talk.
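
    As a rough illustration of the projection-type adaptation named above, the sketch below implements a plain affine projection echo canceller (a stand-in, not the exponentially weighted step-size algorithm of the paper); the projection order, filter length, step size and toy echo path are assumptions, and no double-talk control is included.

```python
# Stand-in sketch: an affine projection echo canceller (not the exponentially
# weighted step-size algorithm of the paper).  Toy echo path, no double-talk control.
import numpy as np

def apa_echo_canceller(far_end, mic, n_taps=128, order=4, mu=0.5, delta=1e-4):
    w = np.zeros(n_taps)
    e_out = np.zeros_like(mic)
    x_buf = np.zeros(n_taps + order - 1)            # recent far-end samples
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        if n < n_taps + order:                      # wait until the buffers are filled
            e_out[n] = mic[n]
            continue
        U = np.stack([x_buf[p:p + n_taps] for p in range(order)], axis=1)  # (n_taps, order)
        d = mic[n - np.arange(order)]               # last `order` microphone samples
        e = d - U.T @ w                             # a-priori errors
        w += mu * U @ np.linalg.solve(U.T @ U + delta * np.eye(order), e)  # APA update
        e_out[n] = e[0]                             # echo-cancelled output sample
    return e_out

rng = np.random.default_rng(0)
far_end = rng.normal(size=16000)
echo_path = rng.normal(scale=0.1, size=64)
mic = np.convolve(far_end, echo_path)[:16000]        # echo only, no near-end speech
residual = apa_echo_canceller(far_end, mic)
print(np.std(mic[-4000:]), np.std(residual[-4000:]))  # residual echo is much smaller
```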

  • Brain Evoked Potential Latencies Optimization for Spatial Auditory Brain-Computer Interface

    Cai,Zhenyu, Makino,Shoji, Rutkowski, Tomasz Maciej

    Cognitive Computation   7 ( 1 ) 34 - 43  2015年  [査読有り]

     概要を見る

    We propose a novel method for the extraction of discriminative features in electroencephalography (EEG) evoked potential latency. Based on our offline results, we present evidence indicating that a full surround sound auditory brain–computer interface (BCI) paradigm has potential for an online application. The auditory spatial BCI concept is based on an eight-directional audio stimuli delivery technique, developed by our group, which employs a loudspeaker array in an octagonal horizontal plane. The stimuli presented to the subjects vary in frequency and timbre. To capture brain responses, we utilize an eight-channel EEG system. We propose a methodology for finding and optimizing evoked response latencies in the P300 range in order later to classify them correctly and to elucidate the subject's chosen targets or ignored non-targets. To accomplish the above, we propose an approach based on an analysis of variance for feature selection. Finally, we identify the subjects' intended commands with a Naive Bayesian classifier for sorting the final responses. The results obtained with ten subjects in offline BCI experiments support our research hypothesis by providing higher classification accuracy.

  • Chromatic and High-frequency cVEP-based BCI Paradigm

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1906 - 1909  2015年  [査読有り]

     概要を見る

    We present results of an approach to a code-modulated visual evoked potential (cVEP) based brain-computer interface (BCI) paradigm using four high-frequency flashing stimuli. To generate higher frequency stimulation compared to the state-of-the-art cVEP-based BCIs, we propose to use the light-emitting diodes (LEDs) driven from a small micro-controller board hardware generator designed by our team. The high-frequency and green-blue chromatic flashing stimuli are used in the study in order to minimize a danger of a photosensitive epilepsy (PSE). We compare the green-blue chromatic cVEP-based BCI accuracies with the conventional white-black flicker based interface. The high-frequency cVEP responses are identified using a canonical correlation analysis (CCA) method.

  • Classification accuracy improvement of chromatic and high–frequency code–modulated visual evoked potential–based BCI

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9250   232 - 241  2015年  [査読有り]

     概要を見る

    © Springer International Publishing Switzerland 2015. We present results of a classification improvement approach for a code–modulated visual evoked potential (cVEP) based brain– computer interface (BCI) paradigm using four high–frequency flashing stimuli. Previously published research reports presented successful BCI applications of canonical correlation analysis (CCA) to steady–state visual evoked potential (SSVEP) BCIs. Our team already previously proposed the combined CCA and cVEP techniques’ BCI paradigm. The currently reported study presents the further enhanced results using a support vector machine (SVM) method in application to the cVEP–based BCI.

  • Fingertip Stimulus Cue-based Tactile Brain-computer Interface

    Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1059 - 1064  2015年  [査読有り]

     概要を見る

    The reported project aims to confirm whether a tactile glove fingertips' stimulator is effective for a brain-computer interface (BCI) paradigm using somatosensory event potential (SEP) responses with possible attentional modulation. The proposed simplified stimulator device is presented in detail together with psychophysical and EEG BCI experiment protocols. Results supporting the proposed simple tactile glove device are presented in form of online BCI classification accuracy results using shrinkage linear discriminant analysis (sLDA) technique. Finally, we discuss future possible paradigm improvement steps.

  • Estimating Correlation Coefficient Between Two Complex Signals Without Phase Observation

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION, LVA/ICA 2015   9237   421 - 428  2015年  [査読有り]

     概要を見る

    In this paper, we propose a method to estimate a correlation coefficient of two correlated complex signals on the condition that only the amplitudes are observed and the phases are missing. Our proposed method is based on a maximum likelihood estimation. We assume that the original complex random variables are generated from a zero-mean bivariate complex normal distribution. The likelihood of the correlation coefficient is formulated as a bivariate Rayleigh distribution by marginalization over the phases. Although the maximum likelihood estimator has no analytical form, an expectation-maximization (EM) algorithm can be formulated by treating the phases as hidden variables. We evaluate the accuracy of the estimation using artificial signal, and demonstrate the estimation of narrow-band correlation of a two-channel audio signal.

  • Inter-stimulus Interval Study for the Tactile Point-pressure Brain-computer Interface

    Kensuke Shimizu, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1910 - 1913  2015年  [査読有り]

     概要を見る

    The paper presents a study of an inter-stimulus interval (ISI) influence on a tactile point-pressure stimulus-based brain-computer interface's (tpBCI) classification accuracy. A novel tactile pressure generating tpBCI stimulator is also discussed, which is based on a three-by-three pins' matrix prototype. The six pin-linear patterns are presented to the user's palm during the online tpBCI experiments in an oddball style paradigm allowing for "the aha-responses" elucidation, within the event related potential (ERP). A subsequent classification accuracies' comparison is discussed based on two ISI settings in an online tpBCI application. A research hypothesis of classification accuracies' non-significant differences with various ISIs is confirmed based on the two settings of 120 ms and 300 ms, as well as with various numbers of ERP response averaging scenarios.

  • Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     599 - 603  2015年  [査読有り]

     概要を見る

    In this paper, we propose a method for suppressing a large number of interferences by using multichannel amplitude analysis based on nonnegative matrix factorization (NMF) and its effective semi-supervised training. For point-source interference reduction with an asynchronous microphone array, we propose amplitude-based speech enhancement in the time-channel domain, which we call transfer-function-gain NMF. Transfer-function-gain NMF is robust against drift, which disrupts inter-channel phase analysis. We use this method to suppress a large number of sources. We show that a mass of interferences can be modeled by a single basis, assuming that the noise sources are sufficiently far from the microphones so that their spatial characteristics become similar to each other. Since blind optimization of the NMF parameters does not work well with a sparse observation contaminated by constant heavy noise, we train the diffuse noise basis before the noise suppression using a speech-absent observation, which can be obtained easily using a simple voice activity detection technique. We confirmed the effectiveness of the proposed model and semi-supervised transfer-function-gain NMF in an experiment simulating a target source surrounded by diffuse noise.
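
    The toy sketch below mimics the semi-supervised idea for a single frequency band: the channel-by-frame amplitude matrix is factorized with multiplicative updates while a diffuse-noise gain basis pre-trained on a speech-absent segment is kept fixed. The Euclidean update rule, the rank-two model and all signals are assumptions of this sketch and may differ from the paper's formulation.

        # Toy sketch of semi-supervised transfer-function-gain NMF in the time-channel domain:
        # V (channels x frames, nonnegative amplitudes) ~= W H, each column of W holding the
        # per-channel transfer-function gains of one source; the noise column of W is fixed.
        import numpy as np

        rng = np.random.default_rng(3)
        n_ch, n_frames, eps = 4, 300, 1e-12

        # Synthetic ground truth: one target source and one diffuse-noise "source".
        g_target = rng.uniform(0.2, 1.0, (n_ch, 1))        # target transfer-function gains
        g_noise  = np.full((n_ch, 1), 0.6)                 # diffuse noise: similar gains everywhere
        act_t = np.abs(rng.standard_normal((1, n_frames))) * (rng.random((1, n_frames)) > 0.6)
        act_n = np.abs(rng.standard_normal((1, n_frames))) * 0.5
        V = g_target @ act_t + g_noise @ act_n             # observed channel amplitudes

        def nmf_mu(V, W, H, n_iter=200, fixed_cols=()):
            """Euclidean multiplicative updates; columns of W listed in fixed_cols stay fixed."""
            for _ in range(n_iter):
                H *= (W.T @ V) / (W.T @ W @ H + eps)
                W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
                W_new[:, list(fixed_cols)] = W[:, list(fixed_cols)]
                W = W_new
            return W, H

        # 1) Pre-train the noise basis on a speech-absent segment (here: frames with no target).
        noise_frames = V[:, act_t[0] == 0]
        Wn, _ = nmf_mu(noise_frames, rng.random((n_ch, 1)), rng.random((1, noise_frames.shape[1])))

        # 2) Factorize the full observation with [target basis | fixed noise basis].
        W0 = np.hstack([rng.random((n_ch, 1)), Wn])
        H0 = rng.random((2, n_frames))
        W, H = nmf_mu(V, W0, H0, fixed_cols=(1,))

        print("estimated target gains (normalized):", np.round(W[:, 0] / W[:, 0].max(), 2).tolist())
        print("true target gains      (normalized):", np.round(g_target[:, 0] / g_target.max(), 2).tolist())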

  • Variable Sound Elevation Features for Head-related Impulse Response Spatial Auditory BCI

    Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1094 - 1099  2015年  [査読有り]

     概要を見る

    This paper presents a study of classification and EEG feature improvement for a spatial auditory brain-computer interface (saBCI). This study provides a comprehensive test of head-related impulse response (HRIR) cues for the saBCI speller paradigm. We present a comparison with previously developed HRIR-based spatial auditory modalities. We propose and optimize three types of sound spatialization settings using variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) resulted in encouragingly good and stable P300 responses in online saBCI experiments. We analyze the differences and dispersions of saBCI command accuracies, as well as the individual user accuracies for various spatial sound locations. Our case study indicates that the participating users could perceive elevation in the saBCI experiments using the HRIR measured from a general head model.

  • Head-related Impulse Response Cues for Spatial Auditory Brain-computer Interface

    Chisaki Nakaizumi, Shoji Makino, Tomasz M. Rutkowski

    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC)     1071 - 1074  2015年  [査読有り]

     概要を見る

    This study provides a comprehensive test of head-related impulse response (HRIR) cues for a spatial auditory brain-computer interface (saBCI) speller paradigm. We present a comparison with the conventional virtual sound headphone-based spatial auditory modality. We propose and optimize three types of sound spatialization settings using variable elevation in order to evaluate the HRIR efficacy for the saBCI. Three experienced and seven naive BCI users participated in the three experimental setups based on ten presented Japanese syllables. The obtained EEG auditory evoked potentials (AEPs) resulted in encouragingly good and stable P300 responses in online BCI experiments. Our case study indicated that users could perceive elevation in the saBCI experiments generated using the HRIR measured from a general head model. The saBCI accuracy and information transfer rate (ITR) scores were improved compared with the classical horizontal-plane-based virtual spatial sound reproduction modality, as far as the healthy users in the current pilot study are concerned.

  • EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9359   1 - 6  2015年  [査読有り]

     概要を見る

    © Springer International Publishing Switzerland 2015. We present improved visual BCI classification accuracy results after applying high- and low-pass filters to an electroencephalogram (EEG) containing code-modulated visual evoked potentials (cVEPs). The cVEP responses are applied to a four-command brain-computer interface (BCI) paradigm. The purpose of this project is to enhance BCI accuracy using only the single-trial cVEP response. We also aim to identify the most discriminable EEG bands suitable for the broadband visual stimuli. We report results from a pilot study optimizing the EEG filtering using infinite impulse response filters applied to feature extraction for a linear support vector machine (SVM) classification method. The goal of the presented study is to develop a faster and more reliable BCI to further enhance the symbiotic relationship between humans and computers.

    DOI

  • SVM Classification Study of Code-modulated Visual Evoked Potentials

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)     1065 - 1070  2015年  [査読有り]

     概要を見る

    We present a study of a support vector machine (SVM) application to a brain-computer interface (BCI) paradigm. Four SVM kernel functions are evaluated in order to maximize the classification accuracy of a four-class BCI paradigm utilizing the code-modulated visual evoked potential (cVEP) response within the captured EEG signals. Our previously published reports applied only the linear SVM, which already outperformed the more classical technique of canonical correlation analysis (CCA). In the current study we additionally test and compare the classification accuracies of polynomial, radial basis function, and sigmoid kernels, together with the classical linear (non-kernel-based) SVM, in application to the cVEP BCI.
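
    A generic sketch of the kernel comparison described above, using scikit-learn SVMs with linear, polynomial, RBF and sigmoid kernels on synthetic four-class features; the features, hyperparameters and cross-validation protocol are placeholders, not those of the study.

        # Minimal sketch: comparing SVM kernels on a synthetic four-class feature set.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        n_per_class, n_features, n_classes = 60, 32, 4
        X = np.vstack([rng.standard_normal((n_per_class, n_features)) + 0.5 * c
                       for c in range(n_classes)])
        y = np.repeat(np.arange(n_classes), n_per_class)

        for kernel in ("linear", "poly", "rbf", "sigmoid"):
            clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
            acc = cross_val_score(clf, X, y, cv=5).mean()
            print(f"{kernel:>7s} kernel: mean CV accuracy = {acc:.2f}")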

  • TDOA estimation by mapped SRP based on higher-order moment analysis

    Xiao-Dong,Zhai, Yuya,Sugimoto, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. APSIPA 2014    2014年12月  [査読有り]

  • Adaptive control of applying band-width for post filter of speech coder depending on pitch frequency

    Hironobu,Chiba, Yutaka,Kamamoto, Takehiro,Moriya, Noboru,Harada, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    Proc. Asilomar Conference on Signals, Systems, and Computers, Asilomar 2014    2014年11月  [査読有り]

  • ケプストラム距離を用いた雑音下音声認識の性能推定の検討

    郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会研究発表会講演論文集     61 - 62  2014年09月

  • Spatial tactile brain-computer interface paradigm applying vibration stimuli to large areas of user's back

    T.,Kodama, Makino,Shoji, T.M.,Rutkowski

    International Brain-Computer Interface Conference     1 - 4  2014年09月  [査読有り]

  • βダイバージェンスに基づく一般化振幅補間によるヴァーチャル多素子化を用いた目的音源強調

    片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     633 - 636  2014年09月

    CiNii

  • 伝達関数ゲイン基底NMFにおけるマイク数・マイク配置と目的音強調性能の関係

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     523 - 526  2014年09月

    CiNii

  • Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

    M.,Chang, K.,Mori, Makino,Shoji, Rutkowski, Tomasz Maciej

    Procedia Technology   18   25 - 31  2014年09月  [査読有り]

     概要を見る

    We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

    DOI

  • Head-related impulse response-based spatial auditory brain-computer interface

    C.,Nakaizumi, T.,Matsui, K.,Mori, Makino,Shoji, T.M.,Rutkowski

    International Brain-Computer Interface Conference     1 - 4  2014年09月  [査読有り]

  • 絶対値の観測のみを用いた2つの複素信号の相関係数推定

    宮部滋樹, 小野順貴, 牧野,昭二

    日本音響学会研究発表会講演論文集   ( 1-Q-40 ) 735 - 738  2014年09月

    CiNii

  • 教師なし伝達関数ゲイン基底NMFによる目的音強調における罰則項の特性評価

    千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     527 - 530  2014年09月

    CiNii

  • 分散型マイクロホンアレイを用いた交通車両検出とその車線推定の検討

    豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会講演論文集     643 - 646  2014年09月

    CiNii

  • Multi-stage declipping of clipping distortion based on length classification of clipped interval

    Chenlei,Li, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    日本音響学会研究発表会講演論文集     553 - 556  2014年09月

    CiNii

  • Unisoner: An interactive interface for derivative chorus creation from various singing voices on the web

    K.,Tsuzuki, T.,Nakano, M.,Goto, T.,Yamada, Makino,Shoji

    International Computer Music Conference joint with the Sound & Music Computing conference     790 - 797  2014年09月  [査読有り]

  • Unisoner: an interactive interface for derivative chorus creation from various singing voices on the Web

    Keita,Tsuzuki, Tomoyasu,Nakano, Masataka,Goto, Takeshi,Yamada, Shoji,Makino

    Proc. ICMC SMC 2014     790 - 797  2014年09月  [査読有り]

  • News from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2014年08月  [査読有り]

  • Electroencephalogram steady state response sonification focused on the spatial and temporal properties

    Makino,Shoji, T.,Kaniwa, H.,Terasawa

    International Conference on Auditory Display   ( LS7-1 ) 1 - 7  2014年06月  [査読有り]

  • EEG Steady State Response Sonification Focused on the Spatial and Temporal Properties

    Kaniwa, Teruaki, Terasawa, Hiroko, Matsubara, Masaki, Rutkowski, Tomasz, Makino, Shoji

    Proceedings of the 20th International Conference on Auditory Display 2014 (ICAD2014)     1 - 7  2014年06月  [査読有り]

  • 周波数依存到来時間差推定に基づく劣決定ブラインド音源分離の高速化

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田,武志, 牧野昭二, 中村篤

    日本音響学会誌   70 ( 6 ) 323 - 331  2014年06月  [査読有り]

     概要を見る

    本論文ではEMアルゴリズムを用いたスパース性に基づく劣決定ブラインド音源分離(BSS)の計算を高速化する手法を提案する。Izumi et al.は,雑音・残響下でも頑健な劣決定BSSを提案したが,到来時間差パラメータをイタレーションごとに離散全探索で求める更新則のために計算量に問題があった。そこで,到来時間差パラメータが周波数に依存すると捉えた,時間差パラメータが解析的に更新される計算量の少ない更新則を提案する。また,帯域重み付け平均による帯域非依存到来時間差推定によってパラメータ数を削減し,収束性を向上させる。実験により,提案手法が計算時間を1/10程度に削減することを確認した。

    CiNii

  • Multimedia Information Processing Combining Brain Science, Life Science, and Information Science

    Makino,Shoji

    USJI Universities Research Report   vol.32  2014年06月  [査読有り]

  • Reduction of computational cost in underdetermined blind source separation based on frequency-dependent time-difference-of-arrival estimation

    T.,Maruyama, S.,Araki, T.,Nakatani, S.,Miyabe, T.,Yamada, 牧野,昭二, A.,Nakamura

    J. Acoust. Soc. Jpn   vol. 70 ( no. 6 ) 323 - 331  2014年06月  [査読有り]

     概要を見る

    本論文ではEMアルゴリズムを用いたスパース性に基づく劣決定ブラインド音源分離(BSS)の計算を高速化する手法を提案する。Izumi et al.は,雑音・残響下でも頑健な劣決定BSSを提案したが,到来時間差パラメータをイタレーションごとに離散全探索で求める更新則のために計算量に問題があった。そこで,到来時間差パラメータが周波数に依存すると捉えた,時間差パラメータが解析的に更新される計算量の少ない更新則を提案する。また,帯域重み付け平均による帯域非依存到来時間差推定によってパラメータ数を削減し,収束性を向上させる。実験により,提案手法が計算時間を1/10程度に削減することを確認した。

    CiNii

  • Acoustic signal processing based on asynchronous and distributed microphone array

    N., Ono, S., Miyabe, S., Makino

    J. Acoust. Soc. Jpn   vol. 70 ( no. 7 ) 391 - 396  2014年06月  [査読有り]

  • Reduction of computational cost in underdetermined blind source separation based on frequency dependent time-difference-of-arrival estimation

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野, 昭二, 中村, 篤

    J. Acoust. Soc. Jpn   70 ( 6 ) 323 - 331  2014年06月  [査読有り]

     概要を見る

    本論文ではEMアルゴリズムを用いたスパース性に基づく劣決定ブラインド音源分離(BSS)の計算を高速化する手法を提案する。Izumi et al.は,雑音・残響下でも頑健な劣決定BSSを提案したが,到来時間差パラメータをイタレーションごとに離散全探索で求める更新則のために計算量に問題があった。そこで,到来時間差パラメータが周波数に依存すると捉えた,時間差パラメータが解析的に更新される計算量の少ない更新則を提案する。また,帯域重み付け平均による帯域非依存到来時間差推定によってパラメータ数を削減し,収束性を向上させる。実験により,提案手法が計算時間を1/10程度に削減することを確認した。

    CiNii

  • Ad-hoc microphone array - Acoustic signal processing using multiple mobile recording devices -

    N., Ono, K.L., Trung, S., Miyabe, S., Makino

    IEICE Fundamentals Review   vol. 7 ( no. 4 ) 336 - 347  2014年04月  [査読有り]

     概要を見る

    マイクロホンアレー信号処理は,複数のマイクロホンで取得した多チャネル信号を処理し,単一マイクロホンでは困難な,音源定位,音源強調,音源分離などを,音源の空間情報を用いることによって行う枠組みである.マイクロホンアレー信号処理においては,チャネル間の微小な時間差が空間情報の大きな手がかりであり,各チャネルを正確に同期させるために,従来は多チャネルA-D 変換器を備えた装置が必要であった.これに対し,我々の身の回りにある,ラップトップPC,ボイスレコーダ,スマートフォンなどの,同期していない録音機器によりマイクロホンアレー信号処理が可能になれば,その利便性は大きく,適用範囲を格段に広げることができる.本稿では、非同期録音機器を用いたマイクロホンアレー信号処理の新しい展開について,関連研究を概観しつつ,筆者らの取組みを紹介する.

    DOI CiNii

  • Adaptive post-filtering method controlled by pitch frequency for CELP-based speech coding

    H.,Chiba, Y.,Kamamoto, T.,Moriya, N.,Harada, S.,Miyabe, T.,Yamada, S.,Makino

    IEICE Trans. Information and Systems    2014年04月  [査読有り]

  • 非負値行列分解と位相復元に基づくオーディオ符号化の多チャネル化

    劉必翔, 澤田宏, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会春季研究発表会     819 - 822  2014年03月

    CiNii

  • 種々の雑音抑圧手法と認識タスクに適用可能な音声認識性能推定法の検討

    郭レイ, 山田武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会春季研究発表会     13 - 14  2014年03月

  • ACELP用ポストフィルタのピッチ強調帯域及び利得の適応化

    千葉大将, 鎌本優, 守谷健弘, 原田登, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会春季研究発表会     387 - 388  2014年03月

  • 日本語スピーキングテストS-CATの文読み上げ問題における発話の冗長性・不完全性を考慮した自動採点の検討

    山畑勇人, 盧昊, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

    日本音響学会春季研究発表会     269 - 272  2014年03月

  • 日本語スピーキングテストS-CATの自由発話問題における発話文の難易度を考慮した自動採点の検討

    盧昊, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 牧野昭二, 北脇信彦

    日本音響学会春季研究発表会     273 - 276  2014年03月

  • 分散型マイクロホンアレイを用いた交通量モニタリング

    豊田卓矢, 宮部滋樹, 山田,武志, 小野順貴, 牧野昭二

    電子情報通信学会総合大会講演論文集   2014   151  2014年03月

  • 非同期マイクロホンアレーの符号化録音におけるビットレートと同期性能の関係

    宮部,滋樹, 小野,順貴, 牧野,昭二, 高橋,祐

    音講論集   ( 3-2-8 ) 725 - 726  2014年03月

  • 伝達関数ゲイン基底NMFによる分散配置非同期録音における目的音強調の検討

    千葉大将, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二, 高橋祐

    日本音響学会春季研究発表会     757 - 760  2014年03月

    CiNii

  • Activity Report from the AASP-TC

    S.,Makino

    IEEE Signal Processing Society eNewsletter, TC News    2014年02月  [査読有り]

  • GENERALIZED AMPLITUDE INTERPOLATION BY beta-DIVERGENCE FOR VIRTUAL MICROPHONE ARRAY

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     149 - 153  2014年  [査読有り]

     概要を見る

    In this paper, we present a generalization of the virtual microphone array we previously proposed to increase the microphone elements by nonlinear interpolation. In the previous work, we generated a virtual observation from two actual microphones by an interpolation in the logarithmic domain. This corresponds to a linear interpolation of the phase and the geometric mean of the amplitude. In this paper, we generalize this interpolation using a linear interpolation of the phase and a nonlinear interpolation of the amplitude with adjustable nonlinearity based on beta-divergence. Improvement of the array signal processing performance is obtained by appropriate tuning of the parameter beta. We evaluate the improvement in speech enhancement using a maximum SNR beamformer.
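
    The sketch below implements the earlier complex-logarithmic interpolation that this abstract builds on: a virtual STFT observation is obtained by linearly interpolating the phase and taking a weighted geometric mean of the amplitude. The beta-divergence generalization of the amplitude interpolation is not reproduced here, and the dummy signals and interpolation weight are assumptions of the sketch.

        # Minimal sketch: a "virtual microphone" STFT observation between two real microphones,
        # obtained by interpolation in the complex-logarithmic domain (linear phase interpolation
        # and geometric-mean amplitude). Dummy random spectrograms stand in for real observations.
        import numpy as np

        rng = np.random.default_rng(5)
        n_freq, n_frames = 257, 100
        X1 = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
        X2 = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))

        def virtual_mic(X1, X2, alpha=0.5):
            """alpha = 0 returns X1, alpha = 1 returns X2; intermediate values place the
            virtual element between the two real microphones."""
            amp = np.abs(X1) ** (1 - alpha) * np.abs(X2) ** alpha          # geometric mean
            # Interpolate the phase along the shorter arc to avoid 2*pi wrapping artifacts.
            dphi = np.angle(X2 * np.conj(X1))
            phase = np.angle(X1) + alpha * dphi
            return amp * np.exp(1j * phase)

        Xv = virtual_mic(X1, X2, alpha=0.5)
        print("virtual observation shape:", Xv.shape)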

  • AMPLITUDE-BASED SPEECH ENHANCEMENT WITH NONNEGATIVE MATRIX FACTORIZATION FOR ASYNCHRONOUS DISTRIBUTED RECORDING

    Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Yu Takahashi, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     203 - 207  2014年  [査読有り]

     概要を見る

    In this paper, we investigate amplitude-based speech enhancement for asynchronous distributed recording. In an ad-hoc microphone array context, it is assumed that different asynchronous devices record the speech. As a result, the phase information is unreliable due to sampling frequency mismatch. For speech enhancement based on the amplitude information instead of the phase information, supervised nonnegative matrix factorization (NMF) is introduced in the time-channel domain. The basis vectors, which represent the gains of the transfer functions from a source to each microphone, are trained in advance using a single-source observation. The experimental evaluations show that this approach is robust against the sampling frequency mismatch.

  • Spatial Auditory Two-step Input Japanese Syllabary Brain-computer Interface Speller

    Moonjeong Chang, Koichi Mori, Shoji Makino, Tomasz M. Rutkowski

    INTERNATIONAL WORKSHOP ON INNOVATIONS IN INFORMATION AND COMMUNICATION SCIENCE AND TECHNOLOGY, IICST 2014   18   25 - 31  2014年  [査読有り]

     概要を見る

    We present a concept and a pilot study of a two-step input speller application combined with a spatial auditory brain-computer interface (BCI) for locked-in syndrome (LIS) users. The application has been developed for 25 Japanese syllabary (hiragana) characters using a two-step input procedure, in order to create an easy-to-use BCI-speller interface. In the proposed procedure, the user first selects the representative letter of a subset, defining the second step. In the second step, the final choice is made. At each interfacing step, the user's intentional choices are classified based on the P300 event related potential (ERP) responses captured in the EEG, as in the classic oddball paradigm. The BCI experiment and EEG results of the pilot study confirm the effectiveness of the proposed spelling method. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

    DOI

  • Chromatic SSVEP BCI Paradigm Targeting the Higher Frequency EEG Responses

    Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WP2-3-2 ) 1 - 7  2014年  [査読有り]

     概要を見る

    A novel approach to the steady-state visual evoked potential (SSVEP) based brain-computer interface (BCI) is presented in the paper. To minimize possible side effects of monochromatic-light SSVEP-based BCIs, we propose to utilize chromatic green-blue flicker stimuli at higher frequencies than those traditionally used. The resulting safer SSVEP responses are processed and classified with features drawn from EEG power spectra. Results obtained from healthy users support the research hypothesis of the chromatic and higher-frequency SSVEP. The feasibility of the proposed method is evaluated in a comparison of monochromatic versus chromatic SSVEP responses. We also present preliminary results with empirical mode decomposition (EMD) adaptive filtering, which resulted in improved classification accuracies.

  • P300 Responses Classification Improvement in Tactile BCI with Touch-sense Glove

    Hiroki Yajima, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WP2-3-3 ) 1 - 7  2014年  [査読有り]

     概要を見る

    This paper reports on a project aiming to confirm whether a tactile stimulator, the "touch sense glove," is effective for a novel brain-computer interface (BCI) paradigm and whether tactile stimuli delivered to the fingers can be utilized to evoke event related potential (ERP) responses with possible attentional modulation. The tactile ERPs are expected to improve the BCI accuracy. The proposed new stimulator device is presented in detail together with the psychophysical and EEG BCI experiment protocols. Results supporting the proposed "touch sense glove" device are presented in the form of online BCI classification accuracies. Finally, we outline possible future paradigm improvements.

  • TDOA Estimation by Mapped Steered Response Power Analysis Utilizing Higher-Order Moments

    Xiao-Dong Zhai, Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FP-P1-3 ) 1 - 4  2014年  [査読有り]

     概要を見る

    In this paper, we propose a new estimation method for the time difference of arrival (TDOA) between two microphones with improved accuracy by exploiting higher-order moments. The proposed method analyzes the steered response power (SRP) of the observed signals after they are nonlinearly mapped onto a higher-dimensional space. Since the mapping operation enhances the linear independence between different vectors by increasing the dimensionality of the observed signals, the TDOA analysis achieves higher resolution. The results of an experiment comparing the TDOA estimation performance of the proposed method with that of conventional methods reveal the robustness of the proposed method against noise and reverberation.

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)    2014年  [査読有り]

     概要を見る

    In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. Our experimental evaluations show that the performance of supervised NMF can approach that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.

  • Tactile and Bone-conduction Auditory Brain Computer Interface for Vision and Hearing Impaired Users - Stimulus Pattern and BCI Accuracy Improvement

    Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FP2-6-3 ) 1 - 7  2014年  [査読有り]

     概要を見る

    This paper aims to improve tactile and bone-conduction brain-computer interface (tbaBCI) classification accuracy based on a new stimulus pattern search in order to trigger more separable P300 responses. We propose and investigate three approaches to modifying the spatial and frequency content of the stimuli. As a result of online tbaBCI classification accuracy tests with six subjects, we conclude that frequency modification of the previously reported single vibrotactile exciter-based patterns leads to borderline-significant statistical improvements.

  • Tactile Pressure Brain-computer Interface Using Point Matrix Pattern Paradigm

    Kensuke Shimizu, Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    2014 JOINT 7TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 15TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS)     473 - 477  2014年  [査読有り]

     概要を見る

    The paper presents a tactile pressure stimulus-based brain-computer interface (BCI) paradigm. 3 x 3 pressure-pin matrix stimulus patterns are presented to the subjects in an oddball paradigm allowing for "aha-response" generation to attended targets. The research hypothesis is confirmed by results from five subjects performing online BCI experiments. One of the users scored 100% accuracy in an online ten-averages-based BCI test. Three users scored above chance levels, while one remained on the chance-level border. The presented pilot study experiments and EEG results confirm the effectiveness of the proposed tactile pressure stimulus-based BCI.

  • TRAFFIC MONITORING WITH AD-HOC MICROPHONE ARRAY

    Takuya Toyoda, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC)     318 - 322  2014年

     概要を見る

    In this paper, we propose an easy and convenient method for traffic monitoring based on acoustic sensing of vehicle sound recorded by an ad-hoc microphone array. Since signals recorded by an ad-hoc microphone array are asynchronous, we perform channel synchronization by compensating for the difference between the start and the end of the recording and the sampling frequency mismatch. To monitor traffic, we estimate the number of vehicles by peak detection of the power envelopes, and classify the traffic lane from the difference between the propagation times to the microphones. We also demonstrate the effectiveness of the proposed method, evaluated in terms of the F-measure, using the results of an experiment in which we estimated the number of vehicles and classified the lane in which the vehicles were traveling.
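
    As a rough illustration of the vehicle-counting step, the sketch below peak-picks a smoothed short-term power envelope of a single simulated roadside channel with SciPy; the synchronization and lane-classification stages are not reproduced, and the signal model, frame length and thresholds are assumptions of this sketch.

        # Minimal sketch: counting vehicle pass-bys by peak detection on a power envelope.
        import numpy as np
        from scipy.signal import find_peaks

        rng = np.random.default_rng(6)
        fs, duration = 16000, 30                          # 30 s of simulated roadside audio
        t = np.arange(fs * duration) / fs
        x = 0.02 * rng.standard_normal(t.size)            # background noise
        for t0 in (5.0, 12.5, 21.0):                      # three simulated vehicle pass-bys
            x += np.exp(-0.5 * ((t - t0) / 0.8) ** 2) * rng.standard_normal(t.size) * 0.3

        frame = 4000                                      # 250 ms power frames
        power = (x[: t.size // frame * frame].reshape(-1, frame) ** 2).mean(axis=1)
        envelope = np.convolve(power, np.ones(5) / 5, mode="same")   # light smoothing

        peaks, _ = find_peaks(envelope, height=3 * np.median(envelope), distance=8)
        print("estimated number of vehicles:", len(peaks))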

  • Adaptive Post-Filtering Controlled by Pitch Frequency for CELP-based Speech Coder

    Hironobu Chiba, Yutaka Kamamoto, Takehiro Moriya, Noboru Harada, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    CONFERENCE RECORD OF THE 2014 FORTY-EIGHTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS     838 - 842  2014年  [査読有り]

     概要を見る

    Most speech codecs utilize a post-filter that emphasizes pitch structures to enhance perceptual quality at the decoder. In particular, the bass post-filter used in ITU-T G.718 performs an adaptive pitch enhancement technique for a lower fixed frequency band. This paper describes a new post-filtering method in which the frequency band and the gain of the bass post-filter are adaptively controlled frame by frame depending on the pitch frequency of the decoded signal, to improve bass post-filter performance. We have confirmed the improvement of the speech quality with the developed method through objective and subjective evaluations.

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu Murase, Hironobu Chiba, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( FA1-1-3 ) 1 - 5  2014年  [査読有り]

     概要を見る

    In this paper, we investigate the relationship between the way microphones are arranged and the degree to which speech is enhanced using transfer-function-gain non-negative matrix factorization (NMF), which is an amplitude-based speech enhancement method that is suitable for use with an asynchronous distributed microphone array. In an asynchronous distributed microphone array, recording devices can be placed freely and the number of devices can be easily increased. Therefore, it is important to determine the optimum microphone arrangement and the degree to which the performance is improved by using many microphones. Our experimental evaluations show that the performance of supervised NMF can approach that of ideal time-frequency masking with a sufficient number of microphones. We also show that the performance is better when more microphones are placed close to each source.

  • Automatic Scoring Method for Open Answer Task in the SJ-CAT Speaking Test Considering Utterance Difficulty Level

    Hao Lu, Takeshi Yamada, Shingo Imai, Takahiro Shinozaki, Ryuichi Nisimura, Kenkichi Ishizuka, Shoji Makino, Nobuhiko Kitawaki

    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( WA1-1-3 ) 1 - 5  2014年  [査読有り]

     概要を見る

    In this paper, we propose an automatic scoring method for the open answer task of the Japanese speaking test SJ-CAT. The proposed method first extracts a set of features from an input answer utterance and then estimates the vocabulary richness score given by human raters, which ranges from 0 to 4, by employing SVR (support vector regression). We devised a novel set of features, namely text statistics weighted by word reliability, to assess the abundance of vocabulary and expression, and the degree of word relevance based on the hierarchical distance in a thesaurus to evaluate the suitability of vocabulary. We confirmed experimentally that the proposed method provides good estimates of the human richness score, with a correlation coefficient of 0.92 and an RMSE (root mean square error) of 0.56. We also showed that the proposed method is relatively robust to differences among examinees and among the questions used for training and testing.
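
    A minimal sketch of score estimation by support vector regression, evaluated with a correlation coefficient and RMSE as above; the features, targets and hyperparameters are synthetic placeholders rather than the SJ-CAT feature set.

        # Minimal sketch: estimating a human-rated score (0-4) from utterance-level features
        # with SVR, then reporting correlation and RMSE on a held-out set. Synthetic data only.
        import numpy as np
        from sklearn.svm import SVR
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(7)
        n_utt, n_feat = 400, 20
        X = rng.standard_normal((n_utt, n_feat))                       # e.g. text statistics, relevance
        score = np.clip(2.0 + 0.5 * X[:, :3].sum(axis=1) + 0.3 * rng.standard_normal(n_utt), 0, 4)

        X_tr, X_te, y_tr, y_te = train_test_split(X, score, test_size=0.25, random_state=0)
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)

        corr = np.corrcoef(pred, y_te)[0, 1]
        rmse = np.sqrt(np.mean((pred - y_te) ** 2))
        print(f"correlation = {corr:.2f}, RMSE = {rmse:.2f}")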

  • Auditory Brain-Computer Interface Paradigm with Head Related Impulse Response-based Spatial Cues

    Chisaki Nakaizumi, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

    Proc. International Conference on Signal Image Technonogy and Internet Based Systems   ( WS-MISA-01 ) 806 - 811  2013年12月  [査読有り]

     概要を見る

    The aim of this study is to provide a comprehensive test of the head-related impulse response (HRIR) for an auditory spatial speller brain-computer interface (BCI) paradigm. The study is conducted with six users in an experimental setup based on five Japanese hiragana vowels. Auditory evoked potentials resulted in encouragingly good and stable "aha-" or P300 responses in real-world online BCI experiments. Our case study indicated that the auditory HRIR spatial sound reproduction paradigm could be a viable alternative to the established multi-loudspeaker surround sound BCI-speller applications, as far as healthy pilot study users are concerned.

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • Unisoner: 同一楽曲を歌った異なる歌声を重ね合わせる合唱制作支援インタフェース

    都築圭太, 中野倫靖, 後藤真孝, 山田,武志, 牧野昭二

    第21回インタラクティブシステムとソフトウェアに関するワークショップ, WISS2013    2013年12月

  • Novel spatial tactile and bone-conduction auditory brain computer interface

    T.M.,Rutkowski, H.,Mori, S.,Makino, K.,Mori

    Proc. Neuroscience2013     79  2013年11月  [査読有り]

  • 様々な歌手が同じ曲を歌った歌声の多様さを活用するシステム

    都築圭太, 中野倫靖, 後藤真孝, 山田武志, 牧野昭二

    情報処理学会研究報告   2013-MUS-100-21   1 - 8  2013年09月

     概要を見る

    本稿では,Web 上で公開されている 「一つの曲を様々な歌手が歌った歌声」 を活用する二つのシステムを提案する.一つは,それらの歌声を重ね合わせる合唱生成支援システム,もう一つは,それらの歌声同士や自分の歌声を比較できる歌唱力向上支援システムである.従来,複数の楽曲を用いた鑑賞や創作支援,自分が歌うだけの歌唱力向上支援は研究されてきたが,同一曲を複数人が歌った歌声を活用した合唱生成や歌唱力向上支援はなかった.合唱生成支援システムでは,歌声の出現時刻と左右チャネルの音量をマウスで直感的に調整できる.直感的な操作と,それぞれの歌が完成された作品であることを利用することで,創作と同時に鑑賞を楽しむ 「創作鑑賞」 も可能となる.また,歌唱力向上支援システムでは,声質 (MFCC) と歌い回し (F0軌跡) が近い歌声同士を比較表示できる.Web 上で公開されていて再生数・マイリスト数があるため,それらの情報を活用しながら歌唱力向上に取り組める.これらのシス

  • 復号信号の特徴に応じたACELP用ポストフィルタの制御

    千葉大将, 守谷健弘, 鎌本優, 原田登, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会秋季研究発表会     319 - 320  2013年09月

  • Some advances in adaptive source separation

    J.T.,Chien, H.,Sawada, S.,Makino

    APSIPA Newsletter     7 - 9  2013年09月  [査読有り]

  • 複素対数補間を用いたヴァーチャル多素子化マイクロホンアレーの周波数依存素子配置最適化

    片平拓希, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二

    日本音響学会秋季研究発表会     609 - 610  2013年09月

  • 非整数サンプルシフトのフレーム分析を用いた非同期録音の同期化

    宮部,滋樹, 小野,順貴, 牧野,昭二

    音講論集   ( 1-1-9 ) 593 - 596  2013年09月

  • News from the AASP-TC

    Makino,Shoji

    IEEE Signal Processing Society eNewsletter, TC News    2013年08月  [査読有り]

  • Network based complexity analysis in tactile brain computer interface task

    H.,Mori, Y.,Matsumoto, S.,Makino, Z.,Struzik, D.,Mandic, T.M.,Rutkowski

    Proc. EMBC2013   51 ( M-134 ) 1 - 1  2013年07月  [査読有り]

    DOI CiNii

  • Multi-command tactile and auditory brain computer interface based on head position stimulation

    H.,Mori, Y.,Matsumoto, Z.,Struzik, K.,Mori, S.,Makino, D.,Mandic, T.M.,Rutkowski

    Proc. International Brain-Computer Interface Meeting   ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2  2013年06月  [査読有り]

  • Spatial tactile and auditory brain computer interface based on head position stimulation

    T.M.,Rutkowski, H.,Mori, Y.,Matsumoto, Z.,Struzik, S.,Makino, D.,Mandic, K.,Mori

    Proc. Neuro2013    2013年06月  [査読有り]

  • Comparison of P300 responses in auditory, visual and audiovisual spatial speller BCI paradigms

    M.,Chang, N.,Nishikawa, Z.,Struzik, K.,Mori, S.,Makino, D.,Mandic, T.M.,Rutkowski

    Proc. International Brain-Computer Interface Meeting   ( DOI:10.3217/978-3-85125-260-6- ) 1 - 2  2013年06月  [査読有り]

  • Blind compensation of inter-channel sampling frequency mismatch with maximum Likelihood estimation in STFT domain

    S.,Miyabe, N.,Ono, S.,Makino

    Proc. ICASSP2013     674 - 678  2013年05月  [査読有り]

     概要を見る

    This paper proposes a novel blind compensation of sampling frequency mismatch for an asynchronous microphone array. Digital signals simultaneously observed by different recording devices exhibit a drift of the time differences between the observation channels because of the sampling frequency mismatch among the devices. Based on the model that such a time difference is constant within each time frame but varies in proportion to the time frame index, the effect of the sampling frequency mismatch can be compensated in the short-time Fourier transform domain by a linear phase shift. By assuming the sources are motionless and stationary, a likelihood of the sampling frequency mismatch is formulated. The maximum likelihood estimate is obtained efficiently by a golden section search.
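
    The sketch below shows only the compensation step described in this abstract: once a relative sampling-frequency mismatch is given, each STFT frame of the lagging channel is corrected by a linear phase shift proportional to the frame index. The mismatch value, hop size and sign convention are assumptions of this sketch; the paper estimates the mismatch blindly by maximum likelihood with a golden section search.

        # Minimal sketch: compensating a known sampling-frequency mismatch in the STFT domain.
        import numpy as np

        def compensate_sfo(stft, eps, hop, n_fft):
            """Apply a frame-dependent linear phase shift to cancel the accumulated drift.

            stft: complex array of shape (n_bins, n_frames) from an FFT of length n_fft
            eps:  relative sampling frequency mismatch of this channel (e.g. 50e-6 for 50 ppm)
            """
            n_bins, n_frames = stft.shape
            k = np.arange(n_bins)[:, None]               # frequency-bin index
            t = np.arange(n_frames)[None, :]             # frame index
            drift = eps * hop * t                        # accumulated drift in samples per frame
            return stft * np.exp(2j * np.pi * k * drift / n_fft)

        # Dummy usage with a random "observation".
        rng = np.random.default_rng(8)
        X2 = rng.standard_normal((257, 200)) + 1j * rng.standard_normal((257, 200))
        X2_sync = compensate_sfo(X2, eps=80e-6, hop=256, n_fft=512)
        print("compensated STFT shape:", X2_sync.shape)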

  • 多変量確率モデルによる脳波の信号分離

    栗花,悠輔, 宮部,滋樹, ルトコフスキ,トマシュ, 松本,佳泰, 山田,武志, 牧野,昭二

    電子情報通信学会技術研究報告. MBE, MEとバイオサイバネティックス   112 ( 479 ) 161 - 166  2013年03月

     概要を見る

    信号源分離の主流をなす枠組である独立成分分析は,無数の信号源が混合された脳波の観測信号から目的信号成分を高精度に分離するのは難しい.本稿では,脳内の個々の現象に関連する脳波の振幅変化を脳波イベントと定義し,無数の信号源により生成される脳波イベントの観測が,それぞれ短時間では局所的に零平均多変量正規分布に従うという確率モデルを定式化する.時間周波数領域で脳波イベントがスパースに発生すると仮定すると,観測信号の尤度は混合正規分布で表され,EMアルゴリズムによって脳波イベントのパラメタを推定することが可能になる.また,適切な超パラメタを持つディリクレ分布を各正規分布の発生確率に導入することにより,EMアルゴリズムで有意な脳波イベントの数とそのパラメタを推定することが可能となる.脳波分離実験により,適切な数の脳波イベントが分離できていることを確認した.

  • A network model for the embodied communication of musical emotions

    H.,Terasawa, R.,Hoshi-Shiba, T.,Shibayama, H.,Ohmura, K.,Furukawa, S.,Makino, K.,Okanoya

    Cognitive Studies   20 ( 1 ) 112 - 129  2013年03月  [査読有り]

     概要を見る

    Music induces a wide range of emotions. However, the influence of physiological functions on musical emotions needs further theoretical considerations. This paper summarizes the physical and physiological functions that are related to musical emotions, and proposes a model for the embodied communication of musical emotions based on a discussion on the transmission of musical emotions across people by sharing movements and gestures. In this model, human with musical emotion is represented with (1) the interfaces of perception and expression (senses, movements, facial and vocal expressions), (2) an internal system of neural activities including the mirror system and the hormonal secretion system that handles responses to musical activities, and (3) the musical emotion that is enclosed in the internal system. Using this model, music is the medium for transmitting emotions, and communication of musical emotions is the communication of internal emotions through music and perception/expression interfaces. Finally, we will discuss which aspect in music functions to encourage the communication of musical emotions by humans.

    DOI CiNii


  • Speech enhancement with ad-hoc microphone array using single source activity

    Ryutaro Sakanashi, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA)   ( OS.21-SLA.7.5 ) 1 - 6  2013年  [査読有り]

     概要を見る

    In this paper, we propose a method for synchronizing asynchronous channels in an ad-hoc microphone array based on single source activity for speech enhancement. An ad-hoc microphone array can include multiple recording devices, which do not communicate with each other. Therefore, their synchronization is a significant issue when using conventional microphone array techniques. We here assume that we know two or more segments (typically the beginning and the end of the recording) where only a single sound source is active. Based on this situation, we compensate for the difference between the start and end of the recording and the sampling frequency mismatch. We also describe experimental results for speech enhancement with a maximum SNR beamformer.
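
    For the enhancement stage mentioned above, the sketch below builds a maximum-SNR beamformer per frequency bin as the principal generalized eigenvector of a (target-plus-noise, noise-only) covariance pair; the covariances are formed from random dummy frames, whereas in the method above they would come from the synchronized single-source-active and noise segments.

        # Minimal sketch: a maximum-SNR (generalized eigenvector) beamformer for one frequency bin.
        import numpy as np
        from scipy.linalg import eigh

        def max_snr_filter(R_target, R_noise):
            """Return the filter maximizing the output SNR w^H R_t w / w^H R_n w."""
            eigvals, eigvecs = eigh(R_target, R_noise)   # generalized EVD, ascending eigenvalues
            return eigvecs[:, -1]                        # principal generalized eigenvector

        rng = np.random.default_rng(9)
        n_ch, n_frames = 4, 500
        noise = rng.standard_normal((n_ch, n_frames)) + 1j * rng.standard_normal((n_ch, n_frames))
        steer = rng.standard_normal(n_ch) + 1j * rng.standard_normal(n_ch)
        target = steer[:, None] * (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames))

        R_n = noise @ noise.conj().T / n_frames
        R_t = (target + noise) @ (target + noise).conj().T / n_frames
        w = max_snr_filter(R_t, R_n)
        out_snr = np.real(w.conj() @ (R_t - R_n) @ w) / np.real(w.conj() @ R_n @ w)
        print(f"output SNR of the max-SNR beamformer: {10 * np.log10(out_snr):.1f} dB")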

  • Performance estimation of noisy speech recognition using spectral distortion and SNR of noise-reduced speech

    Guo Ling, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    IEEE Region 10 Annual International Conference, Proceedings/TENCON    2013年  [査読有り]

     概要を見る

    To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using the PESQ (Perceptual Evaluation of Speech Quality) as a spectral distortion measure. However, there is the problem that the relationship between the recognition performance and the distortion value differs depending on the noise reduction algorithm used. To solve this problem, we propose a novel performance estimation method that uses an estimator defined as a function of the distortion value and the SNR (Signal to Noise Ratio) of noise-reduced speech. The estimator is applicable to different noise reduction algorithms without any modification. We confirmed the effectiveness of the proposed method by experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. © 2013 IEEE.
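
    A toy sketch of the idea of a two-variable estimator: a simple polynomial regression that maps a spectral-distortion score and the SNR of the noise-reduced speech to recognition accuracy. The data points and functional form are synthetic assumptions, not the PESQ values, measured SNRs or AURORA-2J results used in the study.

        # Minimal sketch: fitting an accuracy estimator of the form f(distortion, SNR).
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(10)
        n = 120
        pesq = rng.uniform(1.0, 4.5, n)                    # distortion measure of enhanced speech
        snr = rng.uniform(-5.0, 20.0, n)                   # SNR of the noise-reduced speech (dB)
        accuracy = np.clip(20 * pesq + 1.5 * snr + rng.normal(0, 4, n), 0, 100)  # toy relation

        X = np.column_stack([pesq, snr])
        est = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
        est.fit(X, accuracy)
        print("predicted accuracy for distortion=3.0, SNR=10 dB:",
              round(float(est.predict([[3.0, 10.0]])[0]), 1), "%")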

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • Classification improvement of P300 response based auditory spatial speller brain-computer interface paradigm

    Moonjeong Chang, Shoji Makino, Tomasz M. Rutkowski

    IEEE Region 10 Annual International Conference, Proceedings/TENCON   ( S.I.2.1 ) 1 - 4  2013年  [査読有り]

     概要を見る

    The aim of the presented study is to provide a comprehensive test of the EEG evoked response potential (ERP) feature selection techniques for the spatial auditory BCI-speller paradigm, which creates a novel communication option for paralyzed subjects or able-bodied individuals requiring a direct brain-computer interfacing application. For rigor, the study is conducted with 16 BCI-naive healthy subjects in an experimental setup based on five Japanese hiragana characters in an offline processing mode. In our previous studies the spatial auditory stimuli related P300 responses resulted in encouragingly separable target vs. non-target latencies in averaged responses, yet that finding was not well reproduced in the online BCI single trial based settings. We present the case study indicating that the auditory spatial unimodal paradigm classification accuracy can be enhanced with an AUC based feature selection approach, as far as BCI-naive healthy subjects are concerned. © 2013 IEEE.

    DOI

    Scopus

    5
    被引用数
    (Scopus)
  • Bone-conduction-based brain computer interface paradigm - EEG signal processing, feature extraction and classification

    Daiki Aminaka, Koichi Mori, Toshie Matsui, Shoji Makino, Tomasz M. Rutkowski

    Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013   ( WS-MISA-03 ) 818 - 824  2013年  [査読有り]

     概要を見る

    The paper presents a novel bone-conduction based brain-computer interface paradigm. Four sub-threshold acoustic frequency stimulus patterns are presented to the subjects in an oddball paradigm allowing for 'aha-responses' generation to the attended targets. This allows for successful implementation of the bone-conduction based brain-computer interface (BCI) paradigm. The concept is confirmed with seven subjects in online bone-conducted auditory Morse-code patterns spelling BCI paradigm. We report also brain electrophysiological signal processing and classification steps taken to achieve the successful BCI paradigm. We also present a finding of the response latency variability in a function of stimulus difficulty. © 2013 IEEE.

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • VIRTUALLY INCREASING MICROPHONE ARRAY ELEMENTS BY INTERPOLATION IN COMPLEX-LOGARITHMIC DOMAIN

    Hiroki Katahira, Nobutaka Ono, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2013 PROCEEDINGS OF THE 21ST EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO)   ( TH-L5.3 )  2013年

     概要を見る

    In this paper, we propose a new array signal processing technique for an underdetermined condition by increasing the number of observation channels. We introduce virtual observation as an estimate of the observed signals at positions where real microphones are not placed. Such signals at virtual observation channels are generated by the complex logarithmic interpolation of real observed signals. With the increased number of observation channels, conventional linear array signal processing methods can be applied to underdetermined conditions. As an example of the proposed array signal processing framework, we show experimental results of speech enhancement obtained with maximum SNR beamformers modified using the virtual observation.

  • Multi-command chest tactile brain computer interface for small vehicle robot navigation

    Hiromu Mori, Shoji Makino, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8211 LNAI   469 - 478  2013年  [査読有り]

     概要を見る

    The presented study explores the extent to which tactile stimuli delivered to five chest positions of a healthy user can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The five chest locations are used to evoke tactile brain potential responses, thus defining a tactile brain computer interface (tBCI). Experimental results with five subjects performing online tBCI provide a validation of the chest location tBCI paradigm, while the feasibility of the concept is illuminated through information-transfer rates. Additionally an offline classification improvement with a linear SVM classifier is presented through the case study. © Springer International Publishing 2013.

    DOI

    Scopus

    11
    被引用数
    (Scopus)
  • Classifying P300 responses to vowel stimuli for auditory brain-computer interface

    Yoshihiro Matsumoto, Shoji Makino, Koichi Mori, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.8 ) 1 - 5  2013年  [査読有り]

     概要を見る

    A brain-computer interface (BCI) is a technology for operating computerized devices based on brain activity and without muscle movement. BCI technology is expected to become a communication solution for amyotrophic lateral sclerosis (ALS) patients. Recently the BCI2000 package application has been commonly used by BCI researchers. The P300 speller included in the BCI2000 is an application allowing the calculation of a classifier necessary for the user to spell letters or sentences in a BCI-speller paradigm. The BCI-speller is based on visual cues, and requires muscle activities such as eye movements, impossible to execute by patients in a totally locked-in state (TLS), which is a terminal stage of the ALS illness. The purpose of our project is to solve this problem, and we aim to develop an auditory BCI as a solution. However, contemporary auditory BCI-spellers are much weaker compared with a visual modality. Therefore there is a necessity for improvement before practical application. In this paper, we focus on an approach related to the differences in responses evoked by various acoustic BCI-speller related stimulus types. In spite of various event related potential waveform shapes, typically a classifier in the BCI speller discriminates only between targets and non-targets, and hence it ignores valuable and possibly discriminative features. Therefore, we expect that the classification accuracy could be improved by using an independent classifier for each of the stimulus cue categories. In this paper, we propose two classifier training methods. The first one uses the data of the five stimulus cues independently. The second method incorporates weighting for each stimulus cue feature in relation to all of them. The results of the experiments reported show the effectiveness of the second method for classification improvement. © 2013 APSIPA.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
  • EMPLOYING MOMENTS OF MULTIPLE HIGH ORDERS FOR HIGH-RESOLUTION UNDERDETERMINED DOA ESTIMATION BASED ON MUSIC

    Yuya Sugimoto, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Fred Juang

    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)   ( PM-02 ) 1 - 4  2013年  [査読有り]

     概要を見る

    Several extensions of the MUltiple SIgnal Classification (MUSIC) algorithm exploiting higher-order statistics have been proposed to estimate directions of arrival (DOAs) with high resolution in underdetermined conditions. However, these methods entail a trade-off between two performance goals, namely robustness and resolution, in the choice of orders, because the use of higher-ordered statistics increases not only the resolution but also the statistical bias. To overcome this problem, this paper proposes a new extension of MUSIC using a nonlinear high-dimensional map, which corresponds to the joint analysis of moments of multiple orders and helps to realize both the robustness of low-ordered statistics and the high resolution of high-ordered statistics. Experimental results show that the proposed method can estimate DOAs more accurately than the conventional MUSIC extensions exploiting moments of a single high order.
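
    For context, the sketch below computes the standard second-order MUSIC pseudo-spectrum for a uniform linear array, which is the baseline that the proposed higher-order-moment extension builds on; the array geometry, source angles and noise level are illustrative assumptions, and this is not the proposed nonlinear-mapping method.

        # Minimal sketch: baseline (second-order) MUSIC DOA estimation for a uniform linear array.
        import numpy as np
        from scipy.signal import find_peaks

        rng = np.random.default_rng(11)
        n_mics, n_src, n_snap = 6, 2, 2000
        d_over_lambda = 0.5                                   # half-wavelength element spacing
        angles_true = np.deg2rad([-20.0, 35.0])

        def steering(theta):
            m = np.arange(n_mics)
            return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

        A = np.column_stack([steering(t) for t in angles_true])
        S = rng.standard_normal((n_src, n_snap)) + 1j * rng.standard_normal((n_src, n_snap))
        N = 0.1 * (rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap)))
        X = A @ S + N

        R = X @ X.conj().T / n_snap                           # spatial covariance (2nd-order moments)
        eigvals, eigvecs = np.linalg.eigh(R)                  # ascending eigenvalues
        En = eigvecs[:, : n_mics - n_src]                     # noise subspace

        scan = np.deg2rad(np.arange(-90, 91))
        p_music = np.array([1.0 / np.real(steering(t).conj() @ En @ En.conj().T @ steering(t))
                            for t in scan])
        peaks, _ = find_peaks(p_music)
        top = peaks[np.argsort(p_music[peaks])[-n_src:]]
        print("estimated DOAs (deg):", np.sort(np.rad2deg(scan[top])))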

  • OPTIMIZING FRAME ANALYSIS WITH NON-INTEGER SHIFT FOR SAMPLING MISMATCH COMPENSATION OF LONG RECORDING

    Shigeki Miyabe, Nobutaka Ono, Shoji Makino

    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)   ( TM-09 ) 1 - 4  2013年  [査読有り]

     概要を見る

    This paper proposes a blind synchronization of an ad-hoc microphone array in the short-time Fourier transform (STFT) domain with optimized frame analysis centered at non-integer discrete time. We show that the drift caused by the sampling frequency mismatch of asynchronous observation channels can be disregarded within a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated roughly by finding two pairs of short intervals corresponding to the same continuous time. Using this estimate, the STFT analysis is synchronized roughly between channels with an optimized frame center. Since the optimized frame center is generally non-integer, we approximate the frame analysis by linear phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation then refines the compensation of the sampling frequency mismatch.

  • Spatial auditory BCI with ERP responses to front-back to the head stimuli distinction support

    Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.1 ) 1 - 8  2013年  [査読有り]

     概要を見る

    This paper presents recent results obtained with a new auditory spatial localization based BCI paradigm in which ERP shape differences at early latencies are employed to enhance classification accuracy in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing the possibility to differentiate early anterior contralateral responses to the spatial sources attended to. We also find that early brain responses indicate which direction, front or rear loudspeaker source, the subject attended to. Contemporary stimuli-driven BCI paradigms benefit most from the P300 ERP latencies in a so-called 'aha-response' setting. We show the further enhancement of the classification results in a spatial auditory paradigm, in which we incorporate N200 latencies. The results reveal that these early spatial auditory ERPs boost offline classification results of the BCI application. The offline BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information transfer rates. © 2013 APSIPA.

    DOI

    Scopus

    2
    被引用数
    (Scopus)
  • Adaptive processing and learning for audio source separation

    Jen-Tzung Chien, Hiroshi Sawada, Shoji Makino

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.42-SLA.13.3 ) 1 - 6  2013年  [査読有り]

     概要を見る

    This paper overviews a series of recent advances in adaptive processing and learning for audio source separation. In real world, speech and audio signal mixtures are observed in reverberant environments. Sources are usually more than mixtures. The mixing condition is occasionally changed due to the moving sources or when the sources are changed or abruptly present or absent. In this survey article, we investigate different issues in audio source separation including overdetermined/underdetermined problems, permutation alignment, convolutive mixtures, contrast functions, nonstationary conditions and system robustness. We provide a systematic and comprehensive view for these issues and address new approaches to overdetermined/underdetermined convolutive separation, sparse learning, nonnegative matrix factorization, information-theoretic learning, online learning and Bayesian approaches. © 2013 APSIPA.

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • Spatial auditory BCI paradigm based on real and virtual sound image generation

    Nozomu Nishikawa, Shoji Makino, Tomasz M. Rutkowski

    2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2013   ( OS.31-BioSiPS.2.7 ) 1 - 5  2013年  [査読有り]

     概要を見る

    This paper presents a novel concept of spatial auditory brain-computer interface utilizing real and virtual sound images. We report results obtained from psychophysical and EEG experiments with nine subjects utilizing a novel method of spatial real or virtual sound images as spatial auditory brain computer interface (BCI) cues. Real spatial sound sources result in better behavioral and BCI response classification accuracies, yet a direct comparison of partial results in a mixed experiment confirms the usability of the virtual sound images for the spatial auditory BCI. Additionally, we compare stepwise linear discriminant analysis (SWLDA) and support vector machine (SVM) classifiers in a single sequence BCI experiment. The interesting point of the mixed usage of real and virtual spatial sound images in a single experiment is that both stimuli types generate distinct event related potential (ERP) response patterns allowing for their separate classification. This discovery is the strongest point of the reported research and it brings the possibility to create new spatial auditory BCI paradigms. © 2013 APSIPA.

    DOI

    Scopus

    6
    被引用数
    (Scopus)
  • Multi-command tactile brain computer interface: A feasibility study

    Hiromu Mori, Yoshihiro Matsumoto, Victor Kryssanov, Eric Cooper, Hitoshi Ogawa, Shoji Makino, Zbigniew R. Struzik, Tomasz M. Rutkowski

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7989 LNCS   50 - 59  2013年  [査読有り]

     概要を見る

    The study presented explores the extent to which tactile stimuli delivered to the ten digits of a BCI-naive subject can serve as a platform for a brain computer interface (BCI) that could be used in an interactive application such as robotic vehicle operation. The ten fingertips are used to evoke somatosensory brain responses, thus defining a tactile brain computer interface (tBCI). Experimental results on subjects performing online (real-time) tBCI, using stimuli with a moderately fast inter-stimulus-interval (ISI), provide a validation of the tBCI prototype, while the feasibility of the concept is illuminated through information-transfer rates obtained through the case study. © 2013 Springer-Verlag.

    DOI

    Scopus

    14
    被引用数
    (Scopus)
  • EEG signal processing and classification for the novel tactile-force brain-computer interface paradigm

    Shota Kono, Daiki Aminaka, Shoji Makino, Tomasz M. Rutkowski

    Proceedings - 2013 International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2013   ( WS-MISA-02 ) 812 - 817  2013年  [査読有り]

     概要を見る

    The presented study explores the extent to which tactile-force stimulus delivered to a hand holding a force-feedback joystick can serve as a platform for a brain-computer interface (BCI). The four pressure directions are used to evoke tactile brain potential responses, thus defining a tactile-force brain computer interface (tfBCI). We present brain signal processing and classification procedures leading to successful online interfacing results. Experimental results with seven subjects performing online BCI experiments provide a validation of the hand location tfBCI paradigm, while the feasibility of the concept is illuminated through remarkable information-transfer rates. © 2013 IEEE.

    DOI

    Scopus

    4
    被引用数
    (Scopus)
  • Inter-subject differences in personalized technical ear training and the influence of an individually optimized training sequence

    Sungyoung Kim, Teruaki Kaniwa, Hiroko Terasawa, Takeshi Yamada, Shoji Makino

    Acoustical Science and Technology   34 ( 6 ) 424 - 431  2013年  [査読有り]

     概要を見る

    Technical ear training aims to improve the listening of sound engineers so they can skillfully modify and edit the structure of sound. Despite recent increasing interest in listening ability and subjective evaluation in audio- and acoustics-related fields and the subsequent appearance of various technical ear-training methods, the subject of how to provide efficient training for a self-trainee has not yet been studied. This paper investigated trainees' performances and showed that an (inherent or learned) ability to correctly describe spectral differences using the terms of a parametric equalizer (center frequency, Q, and gain) was different for each person. To cope with such individual differences in spectral identification, the authors proposed a novel method that adaptively controls the training task based on a trainee's prior performances. In detail, the method estimates the weakness of the trainee, and generates a training routine that focuses on that weakness. Subsequently, we tried to determine whether the proposed method (adaptive feedback) helps self-learners improve their performance in technical listening that involves identifying spectral differences. The results showed that the proposed method could assist trainees in improving their ability to identify differences more effectively than the counterpart group. Together with other features required for effective self-training, this adaptive feedback would assist a trainee in acquisition of timbre-identification ability. © 2013 The Acoustical Society of Japan.

    DOI

    Scopus

    9
    被引用数
    (Scopus)
  • Exhaustive structural comparison of protein-DNA binding surfaces

    R. Minai, T. Horiike, S. Makino

    GIW2012 (International Conference on Genome Informatics)   ( poster 29 )  2012年12月  [査読有り]

  • Full-reference objective quality evaluation for noise-reduced speech considering effect of musical noise

    Y. Fujita, T. Yamada, S. Makino, N. Kitawaki

    Oriental COCOSDA2012     300-305  2012年12月  [査読有り]

  • Foreword to special issue on recent mathematical advances in acoustic signal processing

    S. Makino

    The Journal of the Acoustical Society of Japan   68 ( 11 ) 557 - 558  2012年11月  [査読有り]

    CiNii

  • A multi-command spatial auditory BMI based on evoked EEG responses from real and virtual sound stimuli

    T.M. Rutkowski, Z. Cai, N. Nishikawa, Y. Matsumoto, S. Makino, D. Looney, D.P. Mandic, Z.R. Struzik, A.W. Przybyszewski

    Neuroscience2012     891.16/NN4  2012年10月  [査読有り]

  • Underdetermined DOA estimation by the non-linear MUSIC exploiting higher-order moments

    Y. Sugimoto, S. Miyabe, T. Yamada, S. Makino, F. Juang

    IWAENC2012   ( E-03 )  2012年09月  [査読有り]

  • In Search of a Perceptual Metric for Timbre: Dissimilarity Judgments among Synthetic Sounds with MFCC-Derived Spectral Envelopes

    Hiroko Terasawa, Jonathan Berger, Shoji Makino

    JOURNAL OF THE AUDIO ENGINEERING SOCIETY   60 ( 9 ) 674 - 685  2012年09月  [査読有り]

     概要を見る

    This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, that is, the perception specifically related to the spectral element of timbre. Mel-cepstrum (Mel-frequency cepstral coefficients or MFCCs) is chosen as a hypothetical metric for spectral envelope perception due to its desirable properties of linearity, orthogonality, and multidimensionality. The experimental results confirmed the relevance of Mel-cepstrum to the perceived timbre dissimilarity when the spectral envelopes of complex-tone synthetic sounds were systematically controlled. The first experiment measured the perceived dissimilarity when the stimuli were synthesized by varying only a single coefficient from MFCC. Linear regression analysis proved that each of the 12 MFCCs has a linear correlation with spectral envelope perception. The second experiment measured the perceived dissimilarity when the stimuli were synthesized by varying two of the MFCCs. Multiple regression analysis showed that the perceived dissimilarity can be explained in terms of the Euclidean distance of the MFCC values of the synthetic sounds. The quantitative and perceptual relevance between the MFCCs and spectral centroids is also discussed. These results suggest that MFCCs can be a metric representation of spectral envelope perception, where each of its orthogonal basis functions provides a linear match with human perception.
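    As a rough illustration of the metric idea above, the sketch below synthesizes two complex tones with different spectral envelopes and computes their distance in MFCC space. This is a minimal sketch only, assuming librosa for the MFCC computation (not a tool named in the paper), with illustrative parameter values.

```python
# Minimal sketch: Euclidean distance between time-averaged MFCC vectors as a
# spectral-envelope dissimilarity measure (librosa is an assumption of this sketch).
import numpy as np
import librosa

sr = 16000
t = np.arange(sr) / sr

def complex_tone(partial_amps, f0=220.0):
    """Synthesize a harmonic complex tone with the given partial amplitudes."""
    x = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
            for k, a in enumerate(partial_amps))
    return x / np.max(np.abs(x))

# Two stimuli that differ only in their spectral envelope.
x1 = complex_tone(np.ones(12))                       # flat envelope
x2 = complex_tone(1.0 / np.arange(1, 13))            # 1/k roll-off

def mean_mfcc(x, n_mfcc=13):
    m = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=n_mfcc)
    return m.mean(axis=1)                            # time-averaged MFCC vector

# Predicted dissimilarity: Euclidean distance in MFCC space (c0 excluded,
# since overall level is not part of the spectral-envelope shape).
d = np.linalg.norm(mean_mfcc(x1)[1:] - mean_mfcc(x2)[1:])
print(f"MFCC-space distance: {d:.2f}")
```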

  • Analysis of brain responses to spatial real and virtual sounds - A BCI/BMI approach

    N. Nishikawa, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012年06月  [査読有り]

  • Steady-state auditory responses application to BCI/BMI

    Y. Matsumoto, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012年06月  [査読有り]

  • Spatial auditory BCI/BMI paradigm

    Z. Cai, S. Makino, T.M. Rutkowski

    International Workshop on Brain Inspired Computing, BIC2012    2012年06月  [査読有り]

  • フルランク空間相関行列モデルに基づく拡散性雑音除去

    礒, 佳樹, 荒木, 章子, 牧野, 昭二, 中谷, 智広, 澤田, 宏, 山田, 武志, 宮部, 滋樹, 中村, 篤

    電子情報通信学会総合大会講演論文集   2012 ( 0 ) 194  2012年03月

  • D-14-1 雑音抑圧音声の主観品質評価におけるミュージカルノイズの影響(D-14.音声,一般セッション)

    藤田, 悠希, 山田, 武志, 牧野, 昭二, 北脇, 信彦

    電子情報通信学会総合大会講演論文集   2012 ( 1 ) 185  2012年03月

  • Cepstral smoothing of separated signals for underdetermined speech separation

    Y.,Ansai, S.,Araki, S.,Makino, T.,Nakatani, T.,Yamada, A.,Nakamura, N.,Kitawaki

    The Journal of the Acoustical Society of Japan   68 ( 2 ) 74 - 85  2012年02月  [査読有り]

     概要を見る

    This paper proposes cepstral smoothing of separated signals (CSS), which aims to reduce the musical noise produced by sparseness-based source separation methods that use time-frequency binary masks (BM). CSS builds on the cepstral-domain smoothing idea of the recently proposed cepstral smoothing of spectral masks (CSM), together with the hypothesis that, from the viewpoint of controlling how speech characteristics are preserved in the cepstral representation, it is preferable to directly smooth the separated speech obtained with the BM rather than the mask itself. We also experimentally compare the conventional method (CSM), the proposed method (CSS), and other musical-noise reduction techniques. CSS achieved musical-noise reduction comparable to that of CSM while yielding separated signals with less distortion of the target speech.
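    The following minimal sketch illustrates the general idea of cepstral-domain smoothing of a separated (binary-masked) magnitude spectrum; the quefrency split and weights are illustrative placeholders, not the smoothing constants used in the paper.

```python
# Minimal sketch: smooth one magnitude-spectrum frame of a separated signal in the
# cepstral domain (illustrative weights; apply frame by frame to the masked STFT
# magnitude and resynthesize with the original phase).
import numpy as np

def cepstral_smooth(mag_frame, keep_low=20, weight_high=0.4):
    """mag_frame: magnitude spectrum of one frame (length n_fft//2 + 1)."""
    log_mag = np.log(np.maximum(mag_frame, 1e-12))
    cep = np.fft.irfft(log_mag)                      # real cepstrum (length n_fft)
    n = cep.size
    q = np.minimum(np.arange(n), n - np.arange(n))   # symmetric quefrency index
    w = np.where(q < keep_low, 1.0, weight_high)     # keep envelope, damp fine structure
    smoothed_log = np.fft.rfft(cep * w).real
    return np.exp(smoothed_log)
```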

    CiNii

  • NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)     269 - 272  2012年  [査読有り]

     概要を見る

    In this paper, we propose a new technique for sparseness-based underdetermined BSS that is based on the clustering of frequency-dependent time difference of arrival (TDOA) information and that can cope with diffused noise environments. Such a method with an EM algorithm has already been proposed; however, it requires a time-consuming exhaustive search for TDOA inference. To remove the need for such an exhaustive search, we propose a new technique by focusing on a stereo case. We derive an update rule for analytical TDOA estimation. This update rule eliminates the need for the exhaustive TDOA search, and therefore reduces the computational load. We show experimental results for separation performance and calculation time in comparison with those obtained with the conventional approach. The reported results validate the proposed method; that is, it achieves high performance without a high computational cost.

  • Spatial auditory BCI paradigm utilizing N200 and P300 responses

    Zhenyu Cai, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.6-BioSPS.1.4 ) 1-7  2012年  [査読有り]

     概要を見る

    The paper presents our recent results obtained with a new auditory spatial-localization-based BCI paradigm in which ERP shape differences at early latencies are employed to enhance the traditional P300 responses in an oddball experimental setting. The concept relies on recent results in auditory neuroscience showing that early anterior contralateral responses to attended spatial sources can be differentiated. Contemporary stimulus-driven BCI paradigms benefit mostly from the P300 ERP latencies in so-called "aha-response" settings. We show a further enhancement of the classification results in spatial auditory paradigms by incorporating the N200 latencies, which differentiate brain responses to sound locations lateral to the subject's head in the auditory space. The results reveal that these early spatial auditory ERPs boost the online classification results of the BCI application. Online BCI experiments with the multi-command BCI prototype support our research hypothesis with higher classification results and improved information-transfer rates. © 2012 APSIPA.

  • Sonification of Muscular Activity in Human Movements Using the Temporal Patterns in EMG

    Masaki Matsubara, Hiroko Terasawa, Hideki Kadone, Kenji Suzuki, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.6-BioSPS.1.2 ) 1-5  2012年  [査読有り]

     概要を見る

    Biofeedback is currently considered as an effective method for medical rehabilitation. It aims to increase the awareness and recognition of the body's motion by feeding back the physiological information to the patients in real time. Our goal is to create an auditory biofeedback that aids understanding of the dynamic motion involving multiple muscular parts, with the ultimate aim of clinical rehabilitation use. In this paper, we report the development of a real-time sonification system using EMG, and we propose three sonification methods that represent the data in pitch, timbre, and the combination of polyphonic timbre and loudness. Our user evaluation test involves the task of timing and order identification and a questionnaire about the subjective comprehensibility and the preferences, leading to a discussion of the task performance and usability. The results show that the subjects can understand the order of the muscular activities at 63.7% accuracy on average. And the sonification method with polyphonic timbre and loudness provides an 85.2% accuracy score on average, showing its effectiveness. Regarding the preference of the sound design, we found that there is not a direct relationship between the task performance accuracy and the preference of sound in the proposed implementations.

  • Vibrotactile stimulus frequency optimization for the haptic BCI prototype

    Hiromu Mori, Yoshihiro Matsumito, Shoji Makino, Victor Kryssanov, Tomasz M. Rutkowski

    6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012     2150 - 2153  2012年  [査読有り]

     概要を見る

    The paper presents results from a psychophysical study conducted to optimize vibrotactile stimuli delivered to the subjects' fingertips in order to evoke the somatosensory responses to be utilized next in a haptic brain-computer interface (hBCI) paradigm. We also present the preliminary EEG evoked responses for the chosen stimulation frequency. The obtained results confirm our hypothesis that the hBCI paradigm concept is valid and that it will allow rapid stimulus presentation in order to improve the information-transfer rate (ITR) of the BCI. © 2012 IEEE.

    DOI

    Scopus

    15
    被引用数
    (Scopus)
  • AUTOMATIC SCORING METHOD CONSIDERING QUALITY AND CONTENT OF SPEECH FOR SCAT JAPANESE SPEAKING TEST

    Naoko Okubo, Yuto Yamahata, Takeshi Yamada, Shingo Imai, Kenkichi Ishizuka, Takahiro Shinozaki, Ryuichi Nisimura, Shoji Makino, Nobuhiko Kitawaki

    2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS     72 - 77  2012年  [査読有り]

     概要を見る

    We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is determined holistically by language teachers according to a rating standard. In that process, teachers carefully consider different factors but do not score each factor separately. We therefore analyze how each factor contributes to the overall score. The factors are divided into two categories: the quality of speech and the content of speech. The former includes pronunciation and intonation, and the latter includes representation and vocabulary. We then propose an automatic scoring method based on this analysis. Experimental results confirm that the proposed method gives relatively accurate estimates of the overall score.

  • Auditory steady-state response stimuli based BCI application - The optimization of the stimuli types and lengths

    Yoshihiro Matsumoto, Nozomu Nishikawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.13-BioSPS.2.3 ) 1-7  2012年  [査読有り]

     概要を見る

    We propose a method for improving the auditory BCI (aBCI) paradigm based on optimizing the ASSR stimuli by choosing each subject's best responses to AM-, flutter-, AM/FM- and click-envelope-modulated sounds. As the ASSR response features, we propose pairwise phase-locking values calculated from the EEG and then classified with a binary classifier to detect attended and ignored stimuli. We also report on the possibility of using stimuli as short as half a second, which is a step forward in ASSR-based aBCI. The presented results are helpful for optimizing the aBCI stimuli for each subject. © 2012 APSIPA.
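    For illustration, the sketch below computes pairwise phase-locking values between EEG channels in a narrow band around 40 Hz, the kind of feature vector that could then be fed to a binary classifier; the band, filter order, and simulated data are assumptions of this sketch, not the paper's settings.

```python
# Minimal sketch: pairwise phase-locking values (PLV) between EEG channels
# in a narrow band, as illustrative ASSR-style features.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def plv_matrix(eeg, fs, band=(38.0, 42.0)):
    """eeg: array (n_channels, n_samples); returns an (n_ch, n_ch) PLV matrix."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, eeg, axis=1), axis=1))
    n_ch = eeg.shape[0]
    plv = np.ones((n_ch, n_ch))
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            # PLV = magnitude of the mean phase-difference phasor over time.
            plv[i, j] = plv[j, i] = np.abs(np.mean(np.exp(1j * (phase[i] - phase[j]))))
    return plv

# Example: pairwise PLVs on simulated 8-channel data (4 s at 256 Hz).
fs = 256
eeg = np.random.randn(8, 4 * fs)
features = plv_matrix(eeg, fs)[np.triu_indices(8, k=1)]   # vector of pairwise PLVs
```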

  • EEG steady state synchrony patterns sonification

    Teruaki Kaniwa, Hiroko Terasawa, Masaki Matsubara, Tomasz M. Rutkowski, Shoji Makino

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.6-BioSPS.1.5 ) 1-6  2012年  [査読有り]

     概要を見る

    This paper describes an application of a multichannel EEG sonification approach. We present results obtained with a multichannel sonification method tested on steady-state EEG responses. We elucidate brain synchrony patterns in the auditory domain by utilizing an EEG coherence measure. Transitions in the synchrony patterns are represented as timbre (i.e., spectro-temporal) deviations and as spatial movement of the sound cluster. A final sonification evaluation experiment with six subjects confirms the validity of the proposed brain-synchrony elucidation approach. © 2012 APSIPA.

  • Distance Attenuation Control of Spherical Loudspeaker Array

    Shigeki Miyabe, Takaya Hayashi, Takeshi Yamada, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.15-SLA.7.2 ) 1-4  2012年  [査読有り]

     概要を見る

    This paper describes the control of distance attenuation using a spherical loudspeaker array. Fisher et al. proposed radial filtering with a spherical microphone array to control the sensitivity to the distance from a sound source by modeling wave propagation in the spherical harmonic domain. Since transfer functions are unchanged when their inputs and outputs are swapped (reciprocity), the same radial-filtering theory developed for microphone arrays can be used to design distance-attenuation control filters for loudspeaker arrays. Experimental results confirmed that the proposed method is effective at low frequencies.

  • The spatial real and virtual sound stimuli optimization for the auditory BCI

    Nozomu Nishikawa, Yoshihiro Matsumoto, Shoji Makino, Tomasz M. Rutkowski

    2012 Conference Handbook - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2012   ( OS.13-BioSPS.2.6 ) 1-9  2012年  [査読有り]

     概要を見る

    The paper presents results from a project aiming to create horizontally distributed surround sound sources and virtual sound images as auditory BCI (aBCI) stimuli. The purpose is to create evoked brain-wave response patterns that depend on whether a sound direction is attended or ignored. We propose using a modified version of the vector base amplitude panning (VBAP) approach to achieve this goal. The spatial sound stimulus system created in this way for the novel oddball aBCI paradigm allows us to create a multi-command experimental environment, with very encouraging results reported in this paper. We also present results showing that modulating the depth of the sound image also changes the subject responses. Finally, we compare the proposed virtual-sound approach with the traditional one based on real sound sources generated from the real loudspeaker directions. The obtained results confirm the hypothesis that brain responses to the spatial type and depth of sound sources can be modulated independently, which allows for the development of the novel multi-command aBCI. © 2012 APSIPA.
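    As a minimal sketch of the underlying panning principle, the code below computes standard 2-D VBAP gains for a loudspeaker pair; the loudspeaker angles are illustrative, and the paper's modified VBAP and depth control are not reproduced here.

```python
# Minimal sketch of 2-D vector base amplitude panning (VBAP): gains that place a
# virtual source between a pair of loudspeakers (illustrative angles).
import numpy as np

def vbap_pair_gains(source_az_deg, spk_az_deg=(30.0, -30.0)):
    """Return the two loudspeaker gains for a virtual source at source_az_deg."""
    p = np.array([np.cos(np.radians(source_az_deg)), np.sin(np.radians(source_az_deg))])
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))] for a in spk_az_deg])
    g = p @ np.linalg.inv(L)          # solve p = g @ L for the gain pair g
    return g / np.linalg.norm(g)      # power normalization

print(vbap_pair_gains(10.0))          # virtual image between the two loudspeakers
```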

  • Psychophysical responses comparison in spatial visual, audiovisual, and auditory BCI-spelling paradigms

    Moonjeong Chang, Nozomu Nishikawa, Zhenyu Cai, Shoji Makino, Tomasz M. Rutkowski

    6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012     2154 - 2157  2012年  [査読有り]

     概要を見る

    The paper presents a pilot study conducted with spatial visual, audiovisual, and auditory brain-computer interface (BCI) based speller paradigms. The psychophysical experiments are conducted with healthy subjects in order to evaluate the difficulty and the possible variability in response accuracy. We also present preliminary EEG results in offline BCI mode. The obtained results validate the thesis that the spatial auditory-only paradigm performs as well as the traditional visual and audiovisual speller BCI tasks. © 2012 IEEE.

    DOI

    Scopus

    5
    被引用数
    (Scopus)
  • Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter

    Ryutaro Sakanashi, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.18-SLA.9.5 ) 1-6  2012年  [査読有り]

     概要を見る

    The multichannel Wiener filter proposed by Duong et al. can conduct underdetermined blind source separation (BSS) with low distortion. This method assumes that the observed signal is the superimposition of multichannel source images generated from multivariate normal distributions. The covariance matrix in each time-frequency slot is estimated by an EM algorithm which treats the source images as hidden variables. Using the estimated parameters, the source images are separated as the maximum a posteriori estimate. It is worth noting that this method does not assume the sparseness of sources, which is usually assumed in underdetermined BSS. In this paper we investigate the effectiveness of the three attributes of Duong's method, i.e., the source image model with a multivariate normal distribution, the observation model without a sparseness assumption, and source separation by a multichannel Wiener filter. We newly formulate three BSS methods with a similar source image model and different observation models assuming sparseness, and we compare them with Duong's method and conventional binary masking. Experimental results confirmed the effectiveness of all three attributes of Duong's method.
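    A minimal sketch of the separation step under this model is given below: each source image is recovered by a per-time-frequency multichannel Wiener filter, assuming the source variances and spatial covariance matrices have already been estimated (e.g., by the EM algorithm). Array shapes and the regularization constant are assumptions of this sketch.

```python
# Minimal sketch: per time-frequency multichannel Wiener filtering under the
# local Gaussian (full-rank spatial covariance) model, given estimated parameters.
import numpy as np

def mwf_separate(x_ft, v, R):
    """x_ft: mixture STFT (F, T, M); v: (J, F, T) source variances; R: (J, F, M, M)
    spatial covariances. Returns source-image estimates, shape (J, F, T, M)."""
    J, F, T = v.shape
    M = x_ft.shape[-1]
    out = np.zeros((J, F, T, M), dtype=complex)
    for f in range(F):
        for t in range(T):
            # Mixture covariance = sum of the source-image covariances.
            Rx = sum(v[j, f, t] * R[j, f] for j in range(J))
            Rx_inv = np.linalg.inv(Rx + 1e-9 * np.eye(M))
            for j in range(J):
                W = v[j, f, t] * R[j, f] @ Rx_inv        # Wiener gain for source j
                out[j, f, t] = W @ x_ft[f, t]            # MAP source-image estimate
    return out
```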

  • New analytical calculation and estimation of TDOA for underdetermined BSS in noisy environments

    Takuro Maruyama, Shoko Araki, Tomohiro Nakatani, Shigeki Miyabe, Takeshi Yamada, Shoji Makino, Atsushi Nakamura

    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)   ( OS.12-SLA.6.4 ) 1-6  2012年  [査読有り]

     概要を見る

    We have proposed a new algorithm for sparseness-based underdetermined blind source separation (BSS) that can cope with diffused noise environments. This algorithm includes a technique for estimating the time-difference-of-arrival (TDOA) parameter separately in individual frequency bins for each source. In this paper, we propose methods that integrate the frequency-bin-wise TDOA parameters to estimate the TDOA of each source. The accuracy of TDOA estimation with the proposed approach is shown experimentally in comparison with a conventional approach. The separation performance and calculation time of the proposed approach are also examined.
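    The sketch below illustrates the general idea of frequency-bin-wise TDOA estimation from the inter-channel phase followed by a simple integration across bins; the plain median used here is only a stand-in and is not the integration rule derived in the paper.

```python
# Minimal sketch: bin-wise TDOA estimates from the inter-channel phase, then a
# simple (median) integration across frequency bins.
import numpy as np

def binwise_tdoa(X1, X2, fs, n_fft):
    """X1, X2: STFTs of two channels for one source, shape (F, T), F = n_fft//2 + 1.
    Returns (per-bin TDOA estimates in seconds, integrated TDOA)."""
    freqs = np.arange(1, X1.shape[0]) * fs / n_fft            # skip DC (phase undefined)
    cross = np.mean(X1[1:] * np.conj(X2[1:]), axis=1)         # time-averaged cross-spectrum
    # Phase unwrapping is ignored here: valid only where |2*pi*f*tau| < pi
    # (low frequencies / small microphone spacing).
    tau_f = np.angle(cross) / (2 * np.pi * freqs)             # TDOA implied by each bin
    return tau_f, np.median(tau_f)
```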

  • Visualization of conversation flow in meetings by analysis of direction of arrivals and continuousness of utterance

    M. Katoh, Y. Sugimoto, S. Miyabe, S. Makino, T. Yamada, and N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-5  2011年11月  [査読有り]

  • New EEG components separation method: Data driven Huang-Hilbert transform application to auditory BMI paradigm

    T.M. Rutkowski, Q. Zhao, D.P. Mandic, Z. Cai, A. Cichocki, S. Makino, and A.W. Przybyszewski

    Neuroscience 2011     627.15/AAA32  2011年11月  [査読有り]

  • 周波数依存の時間差モデルによる劣決定BSS

    丸山, 卓郎, 荒木, 章子, 中谷, 智広, 宮部, 滋樹, 山田, 武志, 牧野, 昭二, 中村, 篤

    電子情報通信学会技術研究報告. EA, 応用音響   111 ( 306 ) 25 - 30  2011年11月

     概要を見る

    This study proposes an analytical update of the estimated time difference in sparseness-based underdetermined blind source separation (BSS) using the EM algorithm. Izumi et al. proposed an underdetermined BSS method that is robust even under noise and reverberation, but it must determine the time-difference parameter by an exhaustive discrete search, which is computationally expensive. Here we restrict the discussion to the two-channel BSS case and, by using a model in which the time difference depends on frequency, derive an analytical update rule for the time-difference parameter. With this approach the exhaustive search for the time difference is no longer needed, so the computation time can be reduced. We present experimental results comparing the separation performance and computational cost of the proposed method with those of the conventional method.

    CiNii

  • Performance estimation of noisy speech recognition based on short-term noise characteristics

    E. Morishita, T. Yamada, S. Makino, and N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2011年11月  [査読有り]

  • Performance estimation of noisy speech recognition considering the accuracy of acoustic models

    T. Takaoka, T. Yamada, S. Makino, and N. Kitawaki

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2011年11月  [査読有り]

  • A study on sound image control method for operational support of touch panel display

    Shigeyoshi Amano, Takeshi Yamada, Shoji Makino, Nobuhiko Kitawaki

    Proc. APSIPA ASC 2011   ( Thu-PM.PS2 ) 1-1  2011年10月  [査読有り]

  • 雑音抑圧音声の主観・客観品質評価法

    山田, 武志, 牧野, 昭二, 北脇, 信彦

    日本音響学会誌   67 ( 10 ) 476 - 481  2011年10月  [査読有り]

    CiNii

  • Towards a personalized technical ear training program: An investigation of the effect of adaptive feedback

    T. Kaniwa, S. Kim, H. Terasawa, M. Ikeda, T. Yamada, and S. Makino

    Sound and Music Computing Conference     439-443  2011年07月  [査読有り]

  • C. elegans meets data sonification: Can we hear its elegant movement?

    H. Terasawa, Y. Takahashi, K. Hirota, T. Hamano, T. Yamada, A. Fukamizu, and S. Makino

    Sound and Music Computing Conference     77-82  2011年07月  [査読有り]

  • DOA Estimation for Multiple Sparse Sources with Arbitrarily Arranged Multiple Sensors

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY   63 ( 3 ) 265 - 275  2011年06月  [査読有り]

     概要を見る

    This paper proposes a method for estimating the direction of arrival (DOA) of multiple source signals for an underdetermined situation, where the number of sources N exceeds the number of sensors M (M < N). Some DOA estimation methods have already been proposed for underdetermined cases. However, since most of them restrict their microphone array arrangements, their DOA estimation ability is limited to a 2-dimensional plane. To deal with an underdetermined case where sources are distributed arbitrarily, we propose a method that can employ a 2- or 3-dimensional sensor array. Our new method employs the source sparseness assumption to handle an underdetermined case. Our formulation with the sensor coordinate vectors allows us to employ arbitrarily arranged sensors easily. We obtained promising experimental results for 2-dimensionally distributed sensors and sources with 3x4 and 3x5 (#sensors x #speech sources) configurations, and for a 3-dimensional case with 4x5, in a room (reverberation time (RT) of 120 ms). We also investigate the DOA estimation performance under several reverberant conditions.
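    As a rough illustration of how sensor coordinate vectors enter such a formulation, the sketch below estimates a far-field DOA vector by least squares from pairwise TDOAs and arbitrary 3-D sensor positions; the sparseness-based clustering of time-frequency points that precedes this step in the paper is omitted, and the example data are synthetic.

```python
# Minimal sketch: far-field DOA from pairwise TDOAs with arbitrarily placed sensors,
# via a least-squares fit using the sensor coordinate vectors.
import numpy as np

def doa_from_tdoas(sensor_xyz, tdoas, c=343.0, ref=0):
    """sensor_xyz: (M, 3) coordinates in metres; tdoas[m]: arrival time at sensor m
    minus arrival time at the reference sensor, in seconds.
    Returns a unit DOA vector pointing from the array toward the source."""
    d = sensor_xyz - sensor_xyz[ref]                 # sensor offsets from the reference
    mask = np.arange(len(tdoas)) != ref
    # Plane-wave model: tdoa_m = -(d_m . u) / c  =>  d_m . u = -c * tdoa_m
    u, *_ = np.linalg.lstsq(d[mask], -c * np.asarray(tdoas)[mask], rcond=None)
    return u / np.linalg.norm(u)

# Example with a 4-sensor 3-D array and synthetic TDOAs for a source at u_true.
mics = np.array([[0, 0, 0], [0.05, 0, 0], [0, 0.05, 0], [0, 0, 0.05]])
u_true = np.array([1.0, 1.0, 0.5]); u_true /= np.linalg.norm(u_true)
tdoas = -(mics - mics[0]) @ u_true / 343.0
print(doa_from_tdoas(mics, tdoas))                   # approximately u_true
```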

    DOI

    Scopus

    31
    被引用数
    (Scopus)
  • B-11-19 楽音と音声の双方に適用できる客観品質評価法の検討(B-11.コミュニケーションクオリティ,一般セッション)

    三上, 雄一郎, 山田, 武志, 牧野, 昭二, 北脇, 信彦

    電子情報通信学会総合大会講演論文集   2011 ( 2 ) 448  2011年02月

    CiNii

  • B-11-18 雑音抑圧音声の客観品質評価に用いる総合品質推定モデルの改良(B-11.コミュニケーションクオリティ,一般セッション)

    藤田, 悠希, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会講演論文集   2011 ( 2 ) 447 - 447  2011年02月

    CiNii

  • DCTと動きベクトルを極力継承して再量子化雑音を低減するインタレース映像用MPEG-2/H.264再符号化手法(画像・映像処理)

    吉留, 健, 上倉, 一人, 牧野, 昭二, 北脇, 信彦

    電子情報通信学会論文誌. D, 情報・システム   94 ( 2 ) 469 - 480  2011年02月  [査読有り]

     概要を見る

    We propose a method for reducing the quantization noise introduced when an MPEG-2 stream encoding interlaced video is transcoded to H.264, by exploiting the encoding information of the first-stage encoder. The method inherits the MPEG-2 DCT types and motion-compensation types in H.264 as far as possible; in addition, it identifies paired macroblocks that become inheritable once their frame vectors are converted to field vectors, judging them from the combination of DCT type and motion-compensation type, and performs the vector conversion to raise the inheritance rate. Experiments confirmed a PSNR improvement of 0.19 to 0.31 dB over a conventional method that does not use the encoding information.

    CiNii

  • Blind source separation of mixed speech in a high reverberation environment

    Keiju Iso, Shoko Araki, Shoji Makino, Tomohiro Nakatani, Hiroshi Sawada, Takeshi Yamada, Atsushi Nakamura

    2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, HSCMA'11     36 - 39  2011年  [査読有り]

     概要を見る

    Blind source separation (BSS) is a technique for estimating and separating individual source signals from a mixed signal using only the information observed at each sensor. BSS is still being developed for mixed signals that are affected by reverberation. In this paper, we propose combining the reverberation-aware BSS method proposed by Duong et al. with the BSS method reported by Sawada et al., which does not consider reverberation, using the latter to provide the initial setting of the EM algorithm. The proposed method assumes the underdetermined case. In the experiment, we compare the proposed method with the conventional methods reported by Duong et al. and by Sawada et al., and demonstrate the effectiveness of the proposed method. © 2011 IEEE.

    DOI

    Scopus

    7
    被引用数
    (Scopus)
  • Spatial location and sound timbre as informative cues in auditory BCI/BMI - Electrodes position optimization for brain evoked potential enhancement

    Zhenyu Cai, Hiroko Terasawa, Shoji Makino, Takeshi Yamada, Tomasz M. Rutkowski

    APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011   ( Wed-PM.SS4 ) 222 - 227  2011年  [査読有り]

     概要を見る

    The paper introduces a novel auditory BCI/BMI paradigm based on combined sound timbre and horizontal-plane spatial locations as informative cues. The presented concept is based on responses to eight-directional audio stimuli with various tonal and environmental sound stimuli. The approach is based on monitoring brain electrical activity by means of the electroencephalogram (EEG). The spatial auditory stimulus previously developed by the authors is extended to sound stimuli varying in timbre, a feature that helps the subjects attend to the targets. The main achievement discussed in the paper is an offline BCI analysis based on optimizing the electrode locations on the scalp and the evoked-response latency to further improve the classification results. The new BCI paradigm developed in this way is more user-friendly and leads to better results compared with the previously utilized simple tonal or steady-state stimuli.

  • Restoration of Clipped Audio Signal Using Recursive Vector Projection

    Shin Miura, Hirofumi Nakajima, Shigeki Miyabe, Shoji Makino, Takeshi Yamada, Kazuhiro Nakadai

    2011 IEEE REGION 10 CONFERENCE TENCON 2011     394 - 397  2011年  [査読有り]

     概要を見る

    This paper proposes the restoration of clipped signals without prior knowledge. An interval of the signal that includes clipped samples is analyzed by recursive vector projection. By analyzing the samples neighboring the clipped interval and excluding the clipped interval itself from the similarity analysis, an estimate of the signal in the clipped interval is obtained as a by-product of the analysis. Since the estimate is consistent with the neighboring samples, the restored signal does not suffer from click noise. An evaluation of the clipping restoration with various audio signals confirmed that the proposed method improves the signal-to-noise ratio.

  • Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

    Kazuma Takeda, Hirokazu Kameoka, Hiroshi Sawada, Shoko Araki, Shigeki Miyabe, Takeshi Yamada, Shoji Makino

    2011 IEEE REGION 10 CONFERENCE TENCON 2011     413 - 416  2011年  [査読有り]

     概要を見る

    This paper presents a new method for underdetermined Blind Source Separation (BSS), based on a concept called multichannel complex non-negative matrix factorization (NMF). The method assumes (1) that the time-frequency representations of sources have disjoint support (W-disjoint orthogonality of sources), and (2) that each source is modeled as a superposition of components whose amplitudes vary over time coherently across all frequencies (amplitude coherence of frequency components) in order to jointly solve the indeterminacy involved in the frequency domain underdetermined BSS problem. We confirmed experimentally that the present method performed reasonably well in terms of the signal-to-interference ratio when the mixing process was known.

  • Mora pitch level recognition for the development of a Japanese pitch accent acquisition system

    Greg Short, Keikichi Hirose, Takeshi Yamada, Nobuaki Minematsu, Nobuhiko Kitawaki, Shoji Makino

    Proc. International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques, Oriental COCOSDA 2010     1-6  2010年10月  [査読有り]

  • 雑音抑圧された音声の主観・客観品質評価法

    山田, 武志, 牧野, 昭二, 北脇, 信彦

    情報処理学会研究報告. SLP, 音声言語情報処理   2010 ( 7 ) 1 - 6  2010年10月

     概要を見る

    Suppressing the noise components superimposed on speech is effective for realizing high-quality speech communication in noisy environments. However, while noise suppression reduces the noise level, it also distorts the speech components and leaves residual noise that could not be suppressed. The characteristics of this distortion and residual noise vary with the nature of the noise and of the noise-suppression algorithm, and strongly affect the users' quality of experience. It is therefore essential to establish methods for appropriately evaluating the quality of noise-suppressed speech. This paper describes subjective and objective quality evaluation methods for noise-suppressed speech.

    CiNii

  • A VC-1 to H.264/AVC intra transcoding using encoding information to reduce re-quantization noise

    T. Yoshitome, Y. Nakajima, K. Kamikura, S. Makino, N. Kitawaki

    International Conference on Signal and Image Processing     170-177  2010年08月  [査読有り]

  • BS-5-4 雑音抑圧音声のMOSと単語了解度の客観推定(BS-5.QoE最前線-情報通信サービスにおけるユーザ体感品質-,シンポジウムセッション)

    山田, 武志, 北脇, 信彦, 牧野, 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2010 ( 2 ) - 19  2010年08月

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出(音響信号処理/聴覚/一般)

    加藤, 通朗, 杉本, 侑哉, 牧野, 昭二, 山田, 武志, 北脇, 信彦

    電子情報通信学会技術研究報告. EA, 応用音響   110 ( 171 ) 25 - 30  2010年08月

     概要を見る

    To play back meeting speech archives efficiently, it is important to detect in advance who is speaking, when, and in what manner. This paper proposes a method for automatically detecting signals that are short in duration and temporally intermittent, such as back-channel responses, while distinguishing them from continuous-speech segments and non-speech segments. The proposed method computes a time series of spatial power spectra from meeting speech recorded with a microphone array and applies principal component analysis to it. Intermittent signals produced by multiple talkers and sound sources are then detected on the basis of the principal component scores obtained for each direction. We verified the effectiveness of the proposed method for small meetings in which sparseness of simultaneous utterances can be assumed, such as one interviewer and two participants. The results suggest that continuous speech, back-channels, and non-speech can be detected from the top few principal component scores.
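    A minimal sketch of the analysis idea, principal component analysis of a time series of spatial power over steering directions, is given below; the spatial power matrix is simulated, and the 10-degree grid, component count, and inspection of per-direction loadings are assumptions of this sketch, not the paper's settings.

```python
# Minimal sketch: PCA (via SVD) of a spatial power-spectrum time series, looking at
# which steering directions dominate the leading components (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
T, D = 600, 36                                   # frames x steering directions (10 deg grid)
P = rng.random((T, D)) * 0.1                     # background spatial power (placeholder)
P[:, 9] += 1.0                                   # a continuously talking speaker near 90 deg
P[::50, 27] += 2.0                               # intermittent back-channels near 270 deg

P0 = P - P.mean(axis=0)                          # centre each direction over time
U, s, Vt = np.linalg.svd(P0, full_matrices=False)
loadings = Vt[:2] * s[:2, None]                  # per-direction loadings of the top-2 components

for k, comp in enumerate(loadings):
    peak = np.argmax(np.abs(comp))
    print(f"component {k + 1}: dominant direction {peak * 10} deg")
```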

    CiNii

  • Special Section on Blind Signal Processing and Its Applications

    Shoji Makino, Andrzej Cichocki, Wei Xing Zheng, Aurelio Uncini

    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS   57 ( 7 ) 1401 - 1403  2010年07月  [査読有り]

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • Underdetermined Blind Source Separation Using Acoustic Arrays

    Shoji Makino, Shoko Araki, Stefan Winter, Hiroshi Sawada

    Handbook on Array Processing and Sensor Networks     303 - 341  2010年04月  [査読有り]

    DOI

    Scopus

    8
    被引用数
    (Scopus)
  • B-11-1 IP網における音声の客観品質評価に用いる擬似音声信号の検討(B-11.コミュニケーションクオリティ,一般セッション)

    青島, 千佳, 北脇, 信彦, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会講演論文集   2010 ( 2 ) 435  2010年03月

  • B-11-2 雑音抑圧音声の総合品質推定モデルを適用したフルリファレンス客観品質評価法(B-11.コミュニケーションクオリティ,一般セッション)

    篠原, 佑基, 山田, 武志, 北脇, 信彦, 牧野, 昭二

    電子情報通信学会総合大会講演論文集   2010 ( 2 ) 436  2010年03月

  • MPEG-2/H.264 transcoding with vector conversion reducing re-quantization noise

    Takeshi Yoshitome, Kazuto Kamikura, Shoji Makino, Nobuhiko Kitawaki

    Proceedings - International Conference on Computer Communications and Networks, ICCCN     1-6  2010年  [査読有り]

     概要を見る

    We propose an MPEG-2 to H.264 transcoding method for interlace streams intermingled with frame and field macroblocks. This method uses the encoding information from an MPEG-2 stream and keeps as many DCT coefficients of the original MPEG-2 bitstream as possible. Experimental results show that the proposed method improves PSNR by about 0.19-0.31 dB compared with a conventional method. © 2010 IEEE.

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • Performance Estimation of Noisy Speech Recognition Considering Recognition Task Complexity

    Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4     2042 - 2045  2010年  [査読有り]

     概要を見る

    To ensure a satisfactory QoE (Quality of Experience) and facilitate system design in speech recognition services, it is essential to establish a method that can be used to efficiently investigate recognition performance in different noise environments. Previously, we proposed a performance estimation method using a spectral distortion measure. However, there is the problem that recognition task complexity affects the relationship between the recognition performance and the distortion value. To solve this problem, this paper proposes a novel performance estimation method considering the recognition task complexity. We confirmed that the proposed method gives accurate estimates of the recognition performance for various recognition tasks by an experiment using noisy speech data recorded in a real room.

  • Comparison of MOS evaluation characteristics for Chinese, Japanese, and English in IP telephony

    Zhenyu Cai, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    2010 4th International Universal Communication Symposium, IUCS 2010 - Proceedings     112 - 115  2010年  [査読有り]

     概要を見る

    Communication quality in IP telephony is rated in terms of the Mean Opinion Score (MOS), which is an Absolute Category Rating (ACR) scale. There is a problem when comparing subjectively evaluated MOSs in that the evaluation results are strongly affected by differences in language, the instruction words used for the evaluation, and the nationality of the evaluator. To solve these problems, ITU-T SG12 has started to investigate the cultural and language dependencies of subjective quality evaluations undertaken with the MOS method for speech/video/multimedia. In this paper, we present the results of a comparison of the MOS evaluation characteristics for Chinese, Japanese, and English. ©2010 IEEE.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
  • A study of artificial voices for telephonometry in the IP-based telecommunication networks

    Chika Aoshima, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    Proc. Tunisian-Japan Symposium on Science, Society & Technology    2009年11月  [査読有り]

  • Analysis of standardized speech database by considering long-term average spectrum

    Naoko Okubo, Nobuhiko Kitawaki, Takeshi Yamada, Shoji Makino

    Tunisian-Japan Symposium on Science, Society & Technology     1-4  2009年11月  [査読有り]

  • DOA estimation for multiple sparse sources with arbitrarily arranged multiple sensors

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    Journal of Signal Processing Systems     1 - 11  2009年10月  [査読有り]

    CiNii

  • ブラインド信号処理の技術とその応用論文小特集の発行にあたって

    牧野昭二

    信学論A   J92-A ( 5 ) 275 - 275  2009年05月  [査読有り]

    CiNii

  • Stereo Source Separation and Source Counting with MAP Estimation with Dirichlet Prior Considering Spatial Aliasing Problem

    Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   5441   742 - 750  2009年  [査読有り]

     概要を見る

    In this paper, we propose a novel sparse source separation method that can estimate the number of sources and time-frequency masks simultaneously, even when the spatial aliasing problem exists. Recently, many sparse Source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the phase difference of arrival (PDOA) between microphones with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori (MAP) estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. Moreover, to handle wide microphone spacing cases where the spatial aliasing problem occurs, the indeterminacy of modulus 2 pi k in the phase is also included in our model. Experimental results show good performance of our proposed method.

  • BLIND SPARSE SOURCE SEPARATION FOR UNKNOWN NUMBER OF SOURCES USING GAUSSIAN MIXTURE MODEL FITTING WITH DIRICHLET PRIOR

    Shoko Araki, Tomohiro Nakatani, Hiroshi Sawada, Shoji Makino

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS     33 - 36  2009年  [査読有り]

     概要を見る

    In this paper, we propose a novel sparse source separation method that can be applied even if the number of sources is unknown. Recently, many sparse source separation approaches with time-frequency masks have been proposed. However, most of these approaches require information on the number of sources in advance. In our proposed method, we model the histogram of the estimated direction of arrival (DOA) with a Gaussian mixture model (GMM) with a Dirichlet prior. Then we estimate the model parameters by using the maximum a posteriori estimation based on the EM algorithm. In order to avoid one cluster being modeled by two or more Gaussians, we utilize a sparse distribution modeled by the Dirichlet distributions as the prior of the GMM mixture weight. By using this prior, without any specific model selection process, our proposed method can estimate the number of sources and time-frequency masks simultaneously. Experimental results show the performance of our proposed method.
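    The sketch below illustrates the general mechanism of MAP-EM fitting of a one-dimensional GMM to DOA estimates with a Dirichlet prior (alpha < 1) on the mixture weights, so that redundant components shrink and the surviving components indicate the number of sources; the initialization, alpha value, and pruning threshold are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: MAP-EM for a 1-D GMM on frame-wise DOA estimates with a
# sparsity-inducing Dirichlet prior on the mixture weights.
import numpy as np

def map_em_gmm(doa, K=8, alpha=0.6, n_iter=100, prune=1e-3):
    N = doa.size
    mu = np.linspace(doa.min(), doa.max(), K)
    var = np.full(K, np.var(doa) / K)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities under the current Gaussians.
        lik = w * np.exp(-0.5 * (doa[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = lik / np.maximum(lik.sum(axis=1, keepdims=True), 1e-300)
        Nk = r.sum(axis=0)
        # M-step with a Dirichlet(alpha) prior on the weights (MAP estimate);
        # with alpha < 1, weights of surplus components are driven to zero.
        w = np.maximum(Nk + alpha - 1.0, 0.0)
        w /= w.sum()
        mu = (r * doa[:, None]).sum(axis=0) / np.maximum(Nk, 1e-12)
        var = (r * (doa[:, None] - mu) ** 2).sum(axis=0) / np.maximum(Nk, 1e-12) + 1e-6
    keep = w > prune
    return mu[keep], w[keep]

doa = np.concatenate([np.random.normal(-40, 3, 800), np.random.normal(25, 3, 1200)])
means, weights = map_em_gmm(doa)
print(len(means), "sources at", np.round(means, 1), "deg")
```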

  • Handling speaker position changes in a meeting diarization system by combining DOA clustering and speaker identification

    T. Hager, S. Araki, K. Ishizuka, M. Fujimoto, T. Nakatani, and S. Makino

    IWAENC2008     2-12  2008年09月  [査読有り]

    CiNii

  • Foreword to the special section on acoustic scene analysis and reproduction

    S. Makino

    IEICE Trans. Fundamentals   E91-A ( 6 ) 1301-1302  2008年06月  [査読有り]

  • 音源分離技術の最新動向

    澤田宏, 荒木章子, 牧野昭二

    電子情報通信学会誌   91 ( 4 ) 292 - 296  2008年04月  [査読有り]

     概要を見る

    Source separation is an indispensable technology for hands-free speech recognition in real environments and for machine understanding of acoustic scenes. So-called blind processing techniques, which require no prior knowledge such as source positions or speaker characteristics, have advanced greatly over the past decade. This article explains the basic techniques required for blind source separation, such as independent component analysis and sparseness, in an accessible manner, and describes research trends and the current state of the art.

    CiNii

  • A DOA based speaker diarization system for real meetings

    Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS     30 - 33  2008年  [査読有り]

     概要を見る

    This paper presents a speaker diarization system that estimates who spoke when in a meeting. Our proposed system is realized by using a noise robust voice activity detector (VAD), a direction of arrival (DOA) estimator, and a DOA classifier. Our previous system utilized the generalized cross correlation method with the phase transform (GCC-PHAT) approach for the DOA estimation. Because the GCC-PHAT can estimate just one DOA per frame, it was difficult to handle speaker overlaps. This paper tries to deal with this issue by employing a DOA at each time-frequency slot (TFDOA), and reports how it improves diarization performance for real meetings / conversations recorded in a room with a reverberation time of 350 ms.
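    For reference, a minimal GCC-PHAT TDOA estimator of the kind such a DOA front end builds on is sketched below; the FFT length and search-range handling are generic choices of this sketch, not the system's actual implementation.

```python
# Minimal sketch: GCC-PHAT time-difference estimation between two microphone signals.
import numpy as np

def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    n = 2 * max(len(x1), len(x2))
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.maximum(np.abs(cross), 1e-12)        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2 if max_tau is None else min(int(max_tau * fs), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs  # delay of x1 relative to x2 (seconds)
```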

  • Speaker indexing and speech enhancement in real meetings/conversations

    Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12     93 - 96  2008年  [査読有り]

     概要を見る

    This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. Our proposed speaker indexing is realized by using a noise robust voice activity detector (VAD), a GCC-PHAT based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real recorded meetings/conversations recorded in a room with a reverberation time of 350 ms, and evaluates the performance by a standard measure: the diarization error rate (DER). Even for the real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP2008.
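    A minimal sketch of a per-frequency maximum-SNR beamformer is shown below: the weights are taken as the principal generalized eigenvector of the target-active and target-inactive spatial covariance matrices, which the VAD and indexing stages are assumed to have provided; the diagonal loading and the crude reference-channel scaling are assumptions of this sketch (the paper treats the scaling ambiguity properly).

```python
# Minimal sketch: maximum-SNR beamformer weights for one frequency bin from
# target-active and target-inactive STFT frames.
import numpy as np
from scipy.linalg import eigh

def max_snr_weights(X_target, X_noise):
    """X_target, X_noise: STFT frames (M, T) for one frequency bin, taken from
    segments where the target speaker is active / inactive. Returns w (M,)."""
    R_s = X_target @ X_target.conj().T / X_target.shape[1]
    R_n = X_noise @ X_noise.conj().T / X_noise.shape[1]
    R_n += 1e-6 * np.trace(R_n).real / R_n.shape[0] * np.eye(R_n.shape[0])
    eigvals, eigvecs = eigh(R_s, R_n)     # generalized eigenproblem R_s w = lambda R_n w
    w = eigvecs[:, -1]                    # eigenvector with the largest SNR gain
    return w / w[0]                       # crude reference-channel scaling
    # Per-bin beamformer output would then be y[t] = w.conj() @ x_bin[:, t].
```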

  • Missing feature speech recognition in a meeting situation with maximum SNR beamforming

    Dorothea Kolossa, Shoko Araki, Marc Delcroix, Tomohiro Nakatani, Reinhold Orglmeister, Shoji Makino

    PROCEEDINGS OF 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-10     3218 - +  2008年  [査読有り]

     概要を見る

    Especially for tasks like automatic meeting transcription, it would be useful to automatically recognize speech even while multiple speakers are talking simultaneously. For this purpose, speech separation can be performed, for example by using maximum SNR beamforming. However, even when good interferer suppression is attained, the interfering speech will still be recognizable during those intervals where the target speaker is silent. In order to avoid the consequential insertion errors, a new soft masking scheme is proposed, which works in the time domain by inducing a large damping on those temporal periods where the observed direction of arrival does not correspond to that of the target speaker. Even though the masking scheme is aggressive, the recognition accuracy can be improved significantly by means of missing-feature recognition, with relative error reductions on the order of 60% compared with maximum SNR beamforming alone, and the scheme is also successful for three simultaneously active speakers. Results are reported based on the SOLON speech recognizer, NTT's large vocabulary system [1], which is applied here to the recognition of artificially mixed data using real-room impulse responses and the entire clean test set of the Aurora 2 database.

  • Guest editors' introduction: Special section on emergent systems, algorithms, and architectures for speech-based human-machine interaction

    Rodrigo Capobianco Guido, Li Deng, Shoji Makino

    IEEE TRANSACTIONS ON COMPUTERS   56 ( 9 ) 1153 - 1155  2007年09月  [査読有り]

  • Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    SIGNAL PROCESSING   87 ( 8 ) 1833 - 1847  2007年08月  [査読有り]

     概要を見る

    This paper presents a new method for blind sparse source separation. Some sparse source separation methods, which. rely on source sparseness and an anechoic mixing model, have already been proposed. These methods utilize level ratios and phase differences between sensor observations as their features, and they separate signals by classifying them. However, some of the features cannot form clusters with a well-known clustering algorithm, e.g., the k-means. Moreover, most previous methods utilize a linear sensor array (or only two sensors), and therefore they cannot separate symmetrically positioned sources. To overcome such problems, we propose a new feature that can be clustered by the k-means algorithm and that can be easily applied to more than three sensors arranged non-linearly. We have obtained promising results for two- and three-dimensionally distributed speech separation with non-linear/non-uniform sensor arrays in a real room even in underdetermined situations. We also investigate the way in which the performance of such methods is affected by room reverberation, which may cause the sparseness and anechoic assumptions to collapse. (C) 2007 Elsevier B.V. All rights reserved.
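    The sketch below shows the generic shape of this family of methods: level-ratio and frequency-normalized phase-difference features are clustered with k-means and used to build time-frequency binary masks. The feature normalization here is a generic stand-in, not the specific feature proposed in the paper, and scikit-learn is an assumption of the sketch.

```python
# Minimal sketch: underdetermined separation by k-means clustering of level-ratio /
# phase-difference features and time-frequency binary masking.
import numpy as np
from sklearn.cluster import KMeans

def sparse_binary_mask_bss(X, n_src, fs, n_fft):
    """X: mixture STFT, shape (M, F, T). Returns masked STFTs (n_src, F, T)
    for reference channel 0."""
    M, F, T = X.shape
    ref = X[0]
    feats = []
    for m in range(1, M):
        ratio = np.abs(X[m]) / np.maximum(np.abs(ref), 1e-12)        # level ratio
        freqs = np.maximum(np.arange(F), 1)[:, None] * fs / n_fft
        phase = np.angle(X[m] * np.conj(ref)) / (2 * np.pi * freqs)  # normalized phase diff
        feats += [ratio.ravel(), phase.ravel()]
    feat = np.stack(feats, axis=1)
    labels = KMeans(n_clusters=n_src, n_init=10).fit_predict(feat).reshape(F, T)
    return np.stack([np.where(labels == k, ref, 0.0) for k in range(n_src)])
```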

    DOI CiNii

    Scopus

    228
    被引用数
    (Scopus)
  • Introduction to the special section on blind signal processing for speech and audio applications

    Shoji Makino, Te-Won Lee, Guy J. Brown

    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING   15 ( 5 ) 1509 - 1510  2007年07月  [査読有り]

    DOI

    Scopus

    2
    被引用数
    (Scopus)
  • MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

    Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

    Eurasip Journal on Advances in Signal Processing   2007  2007年  [査読有り]

     概要を見る

    We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures,we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.

    DOI

    Scopus

    93
    被引用数
    (Scopus)
  • Blind audio source separation based on independent component analysis

    Shoji Makino, Hiroshi Sawada, Shoko Araki

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   4666   843 - 843  2007年  [査読有り]

  • Blind source separation based on a beamformer array and time frequency binary masking

    Jan Cermak, Shoko Araki, Hiroshi Sawada, Shoji Makino

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS     145 - 148  2007年  [査読有り]

     概要を見る

    This paper deals with a new technique for blind source separation (BSS) from convolutive mixtures. We present a three-stage separation system employing time-frequency binary masking, beamforming and a non-linear post processing technique. The experiments show that this system outperforms conventional time-frequency binary masking (TFBM) in both (over-)determined and underdetermined cases. Moreover it removes the musical noise and reduces interference in time-frequency slots extracted by TFBM.

  • MLSP 2007 data analysis competition: Frequency-domain blind source separation for convolutive mixtures of speech/audio signals

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    Machine Learning for Signal Processing 17 - Proceedings of the 2007 IEEE Signal Processing Society Workshop, MLSP     45 - 50  2007年  [査読有り]

     概要を見る

    This paper describes the frequency-domain approach to the blind source separation of speech/audio signals that are convolutively mixed in a real room environment. With the application of short-time Fourier transforms, convolutive mixtures in the time domain can be approximated as multiple instantaneous mixtures in the frequency domain. We employ complex-valued independent component analysis (ICA) to separate the mixtures in each frequency bin. Then, the permutation ambiguity of the ICA solutions should be aligned so that the separated signals are constructed properly in the time domain. We propose a permutation alignment method based on clustering the activity sequences of the frequency-bin-wise separated signals. We became the overall winner of the MLSP 2007 Data Analysis Competition with the presented method. ©2007 IEEE.
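    As a simplified stand-in for the clustering-based alignment described above, the sketch below aligns permutations greedily across frequency bins by correlating per-bin activity (power-envelope) sequences with running centroids; the greedy sweep and the centroid update factor are assumptions of this sketch.

```python
# Minimal sketch: greedy permutation alignment across frequency bins by correlating
# per-bin activity sequences with running centroids.
import numpy as np
from itertools import permutations

def align_permutations(Y):
    """Y: bin-wise separated STFTs, shape (F, N, T), arbitrary source order per bin.
    Returns Y with a consistent source order across bins."""
    F, N, T = Y.shape
    env = np.abs(Y)                                    # activity sequence per bin and output
    env /= np.maximum(env.std(axis=2, keepdims=True), 1e-12)
    centroids = env[0].copy()                          # initialize with the first bin
    out = Y.copy()
    for f in range(1, F):
        best, best_score = tuple(range(N)), -np.inf
        for perm in permutations(range(N)):            # try every source ordering
            score = sum(np.corrcoef(centroids[n], env[f, p])[0, 1]
                        for n, p in enumerate(perm))
            if score > best_score:
                best, best_score = perm, score
        out[f] = Y[f, list(best)]
        centroids = 0.9 * centroids + 0.1 * env[f, list(best)]
    return out
```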

    DOI

    Scopus

    16
    被引用数
    (Scopus)
  • A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS     157 - 160  2007年  [査読有り]

     概要を見る

    This paper proposes a two-stage method for the blind separation of convolutively mixed sources. We employ time-frequency masking, which can be applied even to an underdetermined case where the number of sensors is insufficient for the number of sources. In the first stage of the method, frequency bin-wise mixtures are classified based on Gaussian mixture model fitting. In the second stage, the permutation ambiguities of the bin-wise classified signals are aligned by clustering the posterior probability sequences calculated in the first stage. Experimental results for separating four speeches with three microphones under reverberant conditions show the superiority of the proposed method over existing methods based on time-difference-of-arrival estimations or signal envelope clustering.

  • Measuring dependence of bin-wise separated signals for permutation alignment in frequency-domain BSS

    Hiroshi Sawada, Shoko Araki, Shoji Makino

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     3247 - 3250  2007年  [査読有り]

     概要を見る

    This paper presents a new method for grouping bin-wise separated signals for individual sources, i.e., solving the permutation problem, in the process of frequency-domain blind source separation. Conventionally, the correlation coefficient of separated signal envelopes is calculated to judge whether or not the separated signals originate from the same source. In this paper, we propose a new measure that represents the dominance of the separated signal in the mixtures, and use it for calculating the correlation coefficient, instead of a signal envelope. Such dominance measures exhibit dependence/independence more clearly than traditionally used signal envelopes. Consequently, a simple clustering algorithm with centroids works well for grouping separated signals. Experimental results were very appealing, as three sources including two coming from the same direction were separated properly with the new method.

  • Blind speech separation in a meeting situation with maximum SNR beamformers

    Shoko Araki, Hiroshi Sawada, Shoji Makino

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PTS 1-3, PROCEEDINGS     41 - 44  2007年  [査読有り]

     概要を見る

    We propose a speech separation method for a meeting situation, where each speaker speaks only some of the time and the number of active speakers changes from moment to moment. Many source separation methods have already been proposed; however, they consider a case where all the speakers keep speaking, which is not always true in a real meeting. In such cases, in addition to separation, speech detection and the classification of the detected speech according to speaker become important issues. For that purpose, we propose a method that employs a maximum signal-to-noise-ratio (MaxSNR) beamformer combined with a voice activity detector and online clustering. We also discuss the scaling ambiguity problem of the MaxSNR beamformer and provide solutions. We report some encouraging results for a real meeting in a room with a reverberation time of about 350 ms.

  • First stereo audio source separation evaluation campaign: Data, algorithms and results

    Emmanuel Vincent, Hiroshi Sawada, Pau Bofill, Shoji Makino, Justinian P. Rosca

    INDEPENDENT COMPONENT ANALYSIS AND SIGNAL SEPARATION, PROCEEDINGS   4666   552 - +  2007年  [査読有り]

     概要を見る

    This article provides an overview of the first stereo audio source separation evaluation campaign, organized by the authors. Fifteen underdetermined stereo source separation algorithms have been applied to various audio data, including instantaneous, convolutive and real mixtures of speech or music sources. The data and the algorithms are presented and the estimated source signals are compared to reference signals using several objective performance criteria.

  • MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and ℓ1-norm minimization

    Stefan Winter, Walter Kellermann, Hiroshi Sawada, Shoji Makino

    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING   2007 ( 24717 ) 1 - 12  2007年  [査読有り]

     概要を見る

    We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions. Copyright (C) 2007 Stefan Winter et al.

    DOI

    Scopus

    93
    被引用数
    (Scopus)
  • Frequency domain blind source separation in a noisy environment

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    2006 Joint meeting of ASA and ASJ     1pSP1  2006年11月  [査読有り]

  • Normalized observation vector clustering approach for sparse source separation

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    EUSIPCO2006     Wed.5.4.4  2006年09月  [査読有り]

  • Underdetermined source separation by ICA and homomorphic signal processing

    S. Winter, W. Kellermann, H. Sawada, S. Makino

    IWAENC2006     Wed.Sep.8  2006年09月  [査読有り]

  • Performance evaluation of sparse source separation and DOA estimation with observation vector clustering in reverberant environments

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    IWAENC2006     Tue.Sep.4  2006年09月  [査読有り]

  • Blind sparse source separation with spatially smoothed time-frequency masking

    S. Araki, H. Sawada, R. Mukai, S. Makino

    IWAENC2006     Wed.Sep.9  2006年09月  [査読有り]

    CiNii

  • Parametric-Pearson-based independent component analysis for frequency-domain blind speech separation

    H. Kato, Y. Nagahara, S. Araki, H. Sawada, and S. Makino

    EUSIPCO2006     Tue.4.2.5  2006年09月  [査読有り]

  • Blind speech separation by combining beamformers and a time frequency binary mask

    J. Cermak, S. Araki, H. Sawada, S. Makino

    IWAENC2006     Tue.Sep.5 - 148  2006年09月  [査読有り]

    CiNii

  • Underdetermined source separation for colored sources

    S. Winter, W. Kellermann, H. Sawada, S. Makino

    EUSIPCO2006     Thu.3.1.6  2006年09月  [査読有り]

  • Musical noise reduction in time-frequency-binary-masking-based blind source separation systems

    J. Cermak, S. Araki, H. Sawada, and S. Makino

    Czech-German Workshop on Speech Processing    2006年09月  [査読有り]

  • Stereo echo cancellation algorithm using adaptive update on the basis of enhanced input-signal vector

    S Emura, Y Haneda, A Kataoka, S Makino

    SIGNAL PROCESSING   86 ( 6 ) 1157 - 1167  2006年06月  [査読有り]

     概要を見る

    Stereo echo cancellation requires a fast converging adaptive algorithm because the stereo input signals are highly cross correlated and the convergence rate of the misalignment is slow even after preprocessing for unique identification of stereo echo paths. To speed up the convergence, we propose enhancing the contribution of the decorrelated components in the preprocessed input-signal vector to adaptive updates. The adaptive filter coefficients are updated on the basis of either a single or multiple past enhanced input-signal vectors.
    For a single-vector update, we show how this enhancement improves the convergence rate by analyzing the behavior of the filter coefficient error in the mean. For a two-past-vector update, simulation showed that the proposed enhancement leads to a faster decrease in misalignment than the corresponding conventional second-order affine projection algorithm while computational complexities are almost the same. (c) 2005 Elsevier B.V. All rights reserved.

    DOI

    Scopus

    18
    被引用数
    (Scopus)
  • Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     4935 - 4938  2006年  [査読有り]

     概要を見る

    This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.

  • DOA estimation for multiple sparse sources with normalized observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     33 - +  2006年  [査読有り]

     概要を見る

    This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied, however, they are only applicable when M > N. Another conventional independent component analysis based method allows M > N, however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).
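
    The clustering idea in the entry above can be illustrated with a deliberately simplified two-microphone variant: each time-frequency point is mapped to a DOA candidate through its inter-channel phase difference and the candidates are clustered. The Python/NumPy sketch below makes several assumptions (a single microphone pair with a hypothetical 4 cm spacing, a tiny 1-D k-means, frequencies below the spatial-aliasing limit) and is not the authors' normalized-observation-vector method itself.

    import numpy as np

    def doa_by_phase_clustering(X1, X2, freqs, d=0.04, c=340.0, n_src=3, iters=50):
        # X1, X2: (F, T) STFT coefficients of two microphones; freqs: (F,) in Hz.
        # Map each time-frequency point to a DOA candidate and cluster them.
        phase = np.angle(X2 / (X1 + 1e-12))                      # (F, T)
        valid = (freqs > 100) & (freqs < c / (2 * d))            # avoid aliasing
        ratio = phase[valid] * c / (2 * np.pi * freqs[valid, None] * d)
        theta = np.degrees(np.arccos(np.clip(ratio, -1, 1))).ravel()

        centers = np.linspace(30.0, 150.0, n_src)                # initial DOAs
        for _ in range(iters):                                   # tiny 1-D k-means
            labels = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
            centers = np.array([theta[labels == k].mean() if np.any(labels == k)
                                else centers[k] for k in range(n_src)])
        return np.sort(centers)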

  • Blind source separation of many signals in the frequency domain

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     5827 - 5830  2006年  [査読有り]

     概要を見る

    This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.

  • Frequency domain blind source separation of a reduced amount of data using frequency normalization

    Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     5695 - 5698  2006年  [査読有り]

     概要を見る

    The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.

  • Blind source separation of convolutive mixtures - art. no. 624709

    Shoji Makino

    Independent Component Analyses, Wavelets, Unsupervised Smart Sensors, and Neural Networks IV   6247 ( 7 ) 24709 - 24709  2006年  [査読有り]

     概要を見る

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency-domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct separation filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

    DOI

    Scopus

    4
    被引用数
    (Scopus)
  • Geometrical interpretation of the PCA subspace approach for overdetermined blind source separation

    S. Winter, H. Sawada, S. Makino

    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING   2006 ( 71632 ) 1-11  2006年  [査読有り]

     概要を見る

    We discuss approaches for blind source separation where we can use more sensors than sources to obtain a better performance. The discussion focuses mainly on reducing the dimensions of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second is based on geometric considerations and selects a subset of sensors in accordance with the fact that a low frequency prefers a wide spacing, and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in the way that it emphasizes the outer sensors and yields superior results for high frequencies. These results provide a better understanding of the former method.

    DOI

    Scopus

    12
    被引用数
    (Scopus)
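
    A minimal sketch of the PCA-based dimension reduction compared in the entry above, written in Python/NumPy for one frequency bin: the M sensor channels are projected onto the leading eigenvectors of the spatial covariance matrix before ICA. Variable names are hypothetical and the geometry-based alternative is not shown.

    import numpy as np

    def pca_reduce(Xf, n_src):
        # Xf: (M, T) complex observations of one frequency bin, M > n_src.
        # Project onto the n_src leading eigenvectors of the spatial covariance
        # (whitening included), discarding the minor, noise-dominated components.
        Xc = Xf - Xf.mean(axis=1, keepdims=True)
        R = (Xc @ Xc.conj().T) / Xf.shape[1]           # spatial covariance
        eigval, eigvec = np.linalg.eigh(R)             # ascending eigenvalues
        lead_val = eigval[::-1][:n_src]
        lead_vec = eigvec[:, ::-1][:, :n_src]
        V = (lead_vec / np.sqrt(lead_val)).conj().T    # (n_src, M) projection
        return V @ Xc, V                               # reduced signals for ICA
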
  • Solving the permutation problem of frequency-domain BSS when spatial aliasing occurs with wide sensor spacing

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     77 - +  2006年  [査読有り]

     概要を見る

    This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.

  • Frequency domain blind source separation of a reduced amount of data using frequency normalization

    Enrique Robledo-Arnuncio, Hiroshi Sawada, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     837 - +  2006年  [査読有り]

     概要を見る

    The problem of blind source separation (BSS) from convolutive mixtures is often addressed using independent component analysis in the frequency domain. The separation performance with this approach degrades significantly when only a short amount of data is available, since the estimation of the separation system becomes inaccurate. In this paper we present a novel approach to the frequency domain BSS using frequency normalization. Under the conditions of almost sparse sources and of dominant direct path in the mixing systems, we show that the new approach provides better performance than the conventional one when the amount of available data is small.

  • Underdetermined sparse source separation of convolutive mixtures with observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS     3594 - 3597  2006年  [査読有り]

     概要を見る

    We propose a new method for solving the underdetermined sparse signal separation problem. Some sparseness based methods have already been proposed. However, most of these methods utilized a linear sensor array (or only two sensors), and therefore they have certain limitations; e.g., they cannot separate symmetrically positioned sources. To allow the use of more than three sensors that can be arranged in a non-linear/non-uniform way, we propose a new method that includes the normalization and clustering of the observation vectors. Our proposed method can handle both underdetermined and (over-)determined cases. We show practical results for speech separation with non-linear/non-uniform sensor arrangements. We obtained promising experimental results for the cases of 3 x 4, 4 x 5 (#sensors x #sources) in a room (RT60 = 120 ms).

  • DOA estimation for multiple sparse sources with normalized observation vector clustering

    Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13     4891 - 4894  2006年  [査読有り]

     概要を見る

    This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace based methods, e.g., the MUSIC algorithm, have been widely studied, however, they are only applicable when M > N. Another conventional independent component analysis based method allows M > N, however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 x 4, 3 x 5 and 4 x 5 (#sensors x #speech sources) in a room (RT60 = 120 ms).

  • On calculating the inverse of separation matrix in frequency-domain blind source separation

    H Sawada, S Araki, R Mukai, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION, PROCEEDINGS   3889   691 - 699  2006年  [査読有り]

     概要を見る

    For blind source separation (BSS) of convolutive mixtures, the frequency-domain approach is efficient and practical, because the convolutive mixtures are modeled with instantaneous mixtures at each frequency bin and simple instantaneous independent component analysis (ICA) can be employed to separate the mixtures. However, the permutation and scaling ambiguities of ICA solutions need to be aligned to obtain proper time-domain separated signals. This paper discusses the idea that calculating the inverses of separation matrices obtained by ICA is very important as regards aligning these ambiguities. This paper also shows the relationship between the ICA-based method and the time-frequency masking method for BSS, which becomes clear by calculating the inverses.
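
    The role of the inverse separation matrix discussed above can be made concrete with the standard projection-back scaling for the square (determined) case; the following Python/NumPy lines are a generic illustration, not the paper's derivation.

    import numpy as np

    def fix_scaling_with_inverse(W, ref_mic=0):
        # W: (N, N) separation matrix of one frequency bin (determined case).
        # Row i of the result reproduces source i as observed at `ref_mic`.
        A = np.linalg.inv(W)                 # estimated mixing matrix
        return np.diag(A[ref_mic, :]) @ W    # per-output scaling (projection back)

    # Column i of A is also the estimated steering vector of source i, which is
    # what links the ICA-based method to time-frequency masking in the entry above.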

  • Blind source separation of many signals in the frequency domain

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS     969 - +  2006年  [査読有り]

     概要を見る

    This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA based method, but it has several advantages. First, it does not need sensor array geometry information. Secondly, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.

  • Recognition of convolutive speech mixtures by missing feature techniques for ICA

    Dorothea Kolossa, Hiroshi Sawada, Ramon Fernandez Astudillo, Reinhold Orglmeister, Shoji Makino

    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5     1397 - +  2006年  [査読有り]

     概要を見る

    One challenging problem for robust speech recognition is the cocktail party effect, where multiple speaker signals are active simultaneously in an overlapping frequency range. In that case, independent component analysis (ICA) can separate the signals in reverberant environments, also. However, incurred feature distortions prove detrimental for speech recognition. To reduce consequential recognition errors, we describe the use of ICA for the additional estimation of uncertainty information. This information is subsequently used in missing feature speech recognition, which leads to far more correct and accurate recognition also in reverberant situations at RT60 = 300ms.

  • Blind separation and localization of speeches in a meeting situation

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5     1407 - +  2006年  [査読有り]

     概要を見る

    The technique of blind source separation (BSS) has been well studied. In this paper, we apply the BSS technique, particularly based on independent component analysis (ICA), to a meeting situation. The goal is to enhance the spoken utterances and to estimate the location of each speaker by means of multiple microphones. The technique may help us to take the minutes of a meeting.

  • Frequency-domain blind source separation of many speech signals using near-field and far-field models

    Ryo Mukai, Hiroshi Sawada, Shoko Araki, Shoji Makino

    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING   2006 ( 83683 ) 1 - 13  2006年  [査読有り]

     概要を見る

    We discuss the frequency-domain blind source separation (BSS) of convolutive mixtures when the number of source signals is large, and the potential source locations are omnidirectional. The most critical problem related to the frequency-domain BSS is the permutation problem, and geometric information is helpful as regards solving it. In this paper, we propose a method for obtaining proper geometric information with which to solve the permutation problem when the number of source signals is large and some of the signals come from the same or a similar direction. First, we describe a method for estimating the absolute DOA by using relative DOAs obtained by the solution provided by independent component analysis (ICA) and the far-field model. Next, we propose a method for estimating the spheres on which source signals exist by using ICA solution and the near-field model. We also address another problem with regard to frequency-domain BSS that arises from the circularity of discrete-frequency representation. We discuss the characteristics of the problem and present a solution for solving it. Experimental results using eight microphones in a room show that the proposed method can separate a mixture of six speech signals arriving from various directions, even when two of them come from the same direction. Copyright (C) 2006 Ryo Mukai et al.

    DOI

    Scopus

    26
    被引用数
    (Scopus)
  • Subband-based blind separation for convolutive mixtures of speech

    S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 12 ) 3593 - 3603  2005年12月  [査読有り]

     概要を見る

    We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.

    DOI

    Scopus

    21
    被引用数
    (Scopus)
  • Underdetermined blind separation for speech in real environments with F0 adaptive comb filtering

    F. Flego, S. Araki, H. Sawada, T. Nakatani, and S. Makino

    IWAENC2005     93-96  2005年09月  [査読有り]

  • Real-time blind source separation and DOA estimation using small 3-D microphone array

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    IWAENC2005     45-48  2005年09月  [査読有り]

  • Real-time blind extraction of dominant target sources from many background interference sources

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    IWAENC2005     73-76  2005年09月  [査読有り]

    CiNii

  • A novel blind source separation method with observation vector clustering

    S. Araki, H. Sawada, R. Mukai, and S. Makino

    IWAENC2005     117-120  2005年09月  [査読有り]

  • Blind source separation of convolutive mixtures of audio signals in frequency domain

    S. Makino

    Advances in Circuits and Systems   ( 5 )  2005年08月  [査読有り]

  • Underdetermined blind separation of convolutive mixtures of speech using time-frequency mask and mixing matrix estimation

    A Blin, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 7 ) 1693 - 1700  2005年07月  [査読有り]

     概要を見る

    This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal was assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which affects a separated signal with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberation time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is greatly reduced by the proposed method compared with the classical approaches.

    DOI

    Scopus

    19
    被引用数
    (Scopus)
  • Blind source separation of convolutive mixtures of speech in frequency domain

    S Makino, H Sawada, R Mukai, S Araki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 7 ) 1640 - 1655  2005年07月  [査読有り]  [招待有り]

     概要を見る

    This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results of 2 x 2, 3 x 3, 4 x 4, 6 x 8, and 2 x 2 (moving sources), (#sources x #microphones) in a room are promising.

    DOI

    Scopus

    57
    被引用数
    (Scopus)
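
    To make the overall flow described above concrete, here is a compact Python/NumPy skeleton of frequency-domain BSS for the determined case: an STFT per microphone, complex-valued ICA in each bin with a simple unit-modulus nonlinearity, and projection-back scaling. The permutation alignment and the circularity treatment that the paper actually solves are deliberately left as a placeholder, and all parameters are illustrative assumptions, not the authors' settings.

    import numpy as np

    def stft(x, nfft=1024, hop=256):
        # Hann-windowed STFT, shape (nfft // 2 + 1, n_frames).
        win = np.hanning(nfft)
        frames = [x[i:i + nfft] * win for i in range(0, len(x) - nfft, hop)]
        return np.fft.rfft(np.array(frames), axis=1).T

    def per_bin_ica(Xf, iters=200, mu=0.1):
        # Natural-gradient complex ICA in one bin with a unit-modulus nonlinearity.
        N, T = Xf.shape
        W = np.eye(N, dtype=complex)
        for _ in range(iters):
            Y = W @ Xf
            phi = Y / (np.abs(Y) + 1e-9)
            W += mu * (np.eye(N) - (phi @ Y.conj().T) / T) @ W
        return W

    def fd_bss(x_mics, nfft=1024, hop=256):
        # Determined case: as many sources as microphones.
        X = np.array([stft(x, nfft, hop) for x in x_mics])   # (M, F, T)
        M, F, T = X.shape
        Y = np.zeros_like(X)
        for f in range(F):
            W = per_bin_ica(X[:, f, :])
            A = np.linalg.inv(W)
            W = np.diag(A[0, :]) @ W       # scaling fix (projection back)
            Y[:, f, :] = W @ X[:, f, :]
            # TODO: align the source order of this bin with its neighbours
            # (the permutation problem solved in the papers of this list).
        return Y                            # separated spectrograms
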
  • Frequency-domain blind source separation without array geometry information

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    HSCMA2005     d13-d14  2005年03月  [査読有り]

  • Blind source separation and DOA estimation using small 3-D microphone array

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    HSCMA2005 (Joint Workshop on Hands-Free Speech Communication and Microphone Arrays)     d9-d10  2005年03月  [査読有り]

  • Source extraction from speech mixtures with null-directivity pattern based mask

    S. Araki, S. Makino, H. Sawada, and R. Mukai

    HSCMA2005     d1-d2  2005年03月  [査読有り]

  • Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    Proceedings - IEEE International Symposium on Circuits and Systems     5882 - 5885  2005年  [査読有り]

     概要を見る

    This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method. © 2005 IEEE.

    DOI

    Scopus

    12
    被引用数
    (Scopus)
  • Blind extraction of a dominant source signal from mixtures of many sources

    Hiroshi Sawada, Shoko Araki, Ryo Mukai, Shoji Makino

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   III   III61 - III64  2005年  [査読有り]

     概要を見る

    This paper presents a method for enhancing a dominant target source that is close to sensors, and suppressing other interferences. The enhancement is performed blindly, i.e. without knowing the number of total sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage processing technique where a spatial filter is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. To obtain the spatial filter we employ independent component analysis and then select the component of the target source. Time-frequency masks in the second stage are obtained by calculating the angle between the basis vector corresponding to the target source and a sample vector. The experimental results for a simulated cocktail party situation were very encouraging. ©2005 IEEE.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
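
    The time-frequency masks in the two blind-extraction entries above are specified from the angle between the target's basis vector and each observation vector. A minimal per-bin Python/NumPy sketch of that criterion follows; the threshold and the names are assumptions, and the ICA stage that yields the basis vector is omitted.

    import numpy as np

    def angle_mask(b_target, Xf, theta_deg=30.0):
        # b_target: (M,) basis vector of the target source (a column of W^-1).
        # Xf: (M, T) observation vectors of one bin.  Keep the points whose
        # observation vector is within theta_deg of the target basis vector.
        b = b_target / np.linalg.norm(b_target)
        cosang = np.abs(b.conj() @ Xf) / (np.linalg.norm(Xf, axis=0) + 1e-12)
        return cosang >= np.cos(np.radians(theta_deg))    # boolean mask, (T,)
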
  • Multiple source localization using independent component analysis

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    IEEE Antennas and Propagation Society, AP-S International Symposium (Digest)   4 ( P3 ) 81 - 84  2005年  [査読有り]

     概要を見る

    This paper presents a method for estimating location information about multiple sources. The proposed method uses independent component analysis (ICA) as a main statistical tool. The nearfield model as well as the farfield model can be assumed in this method. As an application of the method, we show experimental results for the direction-of-arrival (DOA) estimation of three sources that were positioned 3-dimensionally. © 2005 IEEE.

    DOI

    Scopus

    20
    被引用数
    (Scopus)
  • Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask

    S Araki, S Makino, H Sawada, R Mukai

    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   III   81 - 84  2005年  [査読有り]

     概要を見る

    Musical noise is a typical problem with blind source separation using a time-frequency mask. In this paper, we report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by the results of a listening test undertaken in a room with a reverberation time of RT60 = 130 ms.

  • Estimating the number of sources using independent component analysis

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    Acoustical Science and Technology   26 ( 5 ) 450 - 452  2005年  [査読有り]

     概要を見る

    A new approach for estimating the number of sources that employs independent component analysis (ICA) is discussed. Estimating the number of sources provides information for signal processing applications such as blind source separation (BSS) in the frequency domain. The new method can identify a noise component that includes reverberations by calculating the correlation of the envelopes. The results show that the characteristics of the proposed approach compare with the conventional eigenvalue-based method.

    DOI

    Scopus

    15
    被引用数
    (Scopus)
  • Blind extraction of a dominant source from mixtures of many sources using ICA and time-frequency masking

    H Sawada, S Araki, R Mukai, S Makino

    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS   III   5882 - 5885  2005年  [査読有り]

     概要を見る

    This paper presents a method for enhancing a target source of interest and suppressing other interference sources. The target source is assumed to be close to sensors, to have dominant power at these sensors, and to have non-Gaussianity. The enhancement is performed blindly, i.e. without knowing the total number of sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. We propose a new sophisticated method for selecting the target source frequency components, and also a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room (reverberation time was 130 ms) are presented to show the effectiveness and characteristics of the proposed method.

  • A spatio-temporal fastica algorithm for separating convolutive mixtures

    SC Douglas, H Sawada, S Makino

    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   V   165 - 168  2005年  [査読有り]

     概要を見る

    This paper presents a spatio-temporal extension of the well-known fastICA algorithm of Hyvarinen and Oja that is applicable to both convolutive blind source separation and multichannel blind deconvolution tasks. Our time-domain algorithm combines multichannel spatio-temporal prewhitening via multi-stage least-squares linear prediction with a fixed-point iteration involving a new adaptive technique for imposing paraunitary constraints on the multichannel separation filter. Our technique also allows for efficient reconstruction of individual signals as observed in the sensor measurements for single-input, multiple-output (SIMO) BSS tasks. Analysis and simulations verify the utility of the proposed methods.

  • Blind Source Separation of 3-D located many speech signals

    R Mukai, H Sawada, S Araki, S Makino

    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)     9 - 12  2005年  [査読有り]

     概要を見る

    This paper presents a prototype system for Blind Source Separation (BSS) of many speech signals and describes the techniques used in the system. Our system uses 8 microphones located at the vertexes of a 4 cm x 4 cm x 4 cm cube and has the ability to separate signals distributed in three-dimensional space. The mixed signals observed by the microphone array are processed by Independent Component Analysis (ICA) in the frequency domain and separated into a given number of signals (up to 8). We carried out experiments in an ordinary office and obtained more than 20 dB of SIR improvement.

  • On real and complex valued l(1)-norm minimization for overcomplete blind source separation

    S Winter, H Sawada, S Makino

    2005 WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA)     86 - 89  2005年  [査読有り]

     概要を見る

    A maximum a-posteriori approach for overcomplete blind source separation based on Laplacian priors usually involves l(1)-norm minimization. It requires different approaches for real and complex numbers as they appear, for example, in the frequency domain. In this paper we compare a combinatorial approach for real numbers with a second-order cone programming approach for complex numbers.
    Although the combinatorial solution with a proven minimum number of zeros is not theoretically justified for complex numbers, its performance quality is comparable to the performance of the second-order cone programming (SOCP) solution. However, it has the advantage that it is faster for complex overcomplete BSS problems with low input/output dimensions.

  • Hierarchical clustering applied to overcomplete BSS for convolutive mixtures

    S. Winter, H. Sawada, S. Araki, and S. Makino

    SAPA2004 (ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing)   1 ( 3 ) 1-6  2004年10月  [査読有り]

  • Underdetermined blind speech separation with directivity pattern based continuous mask and ICA

    S. Araki, S. Makino, H. Sawada, and R. Mukai

    EUSIPCO2004     1991-1994  2004年09月  [査読有り]

  • Blind source separation for moving speech signals using blockwise ICA and residual crosstalk subtraction

    R Mukai, H Sawada, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 8 ) 1941 - 1948  2004年08月  [査読有り]

     概要を見る

    This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.
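
    The second-stage postprocessing described above subtracts an estimate of the residual crosstalk power from each separated output. The Python/NumPy sketch below shows the generic spectral-subtraction gain only, using the other ICA outputs as a crude proxy for the crosstalk estimate; the paper's explicit crosstalk component estimation and blockwise batch ICA are not reproduced.

    import numpy as np

    def subtract_residual_crosstalk(Y_target, Y_others, alpha=1.0, floor=0.05):
        # Y_target: (F, T) spectrogram of one separated output.
        # Y_others: (K, F, T) spectrograms of the remaining outputs, used here
        # as a crude stand-in for the estimated residual crosstalk power.
        noise_pow = alpha * np.sum(np.abs(Y_others) ** 2, axis=0)
        target_pow = np.abs(Y_target) ** 2
        gain = np.sqrt(np.maximum(1.0 - noise_pow / (target_pow + 1e-12),
                                  floor ** 2))
        return gain * Y_target              # keep the phase of the ICA output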

  • Convolutive blind source separation for more than two sources in the frequency domain

    Hiroshi Sawada, Ryo Mukai, Shoko Araki, Shoji Makino

    Acoustical Science and Technology   25 ( 4 ) 296 - 298  2004年07月  [査読有り]

     概要を見る

    The use of the blind source separation (BSS) technique for the recovery of more than two sources in the frequency domain was presented. It was found that the frequency-domain BSS method was practically applicable to more than two sources by overcoming the problems of permutation and circularity. The minimization of error could be done by adjusting the scaling ambiguity of the independent component analysis (ICA) solution before windowing. The results show the effectiveness and efficiency of the BSS method and the separation of six sources with a planar array of eight sensors.

    DOI

    Scopus

    4
    被引用数
    (Scopus)
  • Underdetermined blind source separation for convolutive mixtures exploiting a sparseness-mixing matrix estimation (SMME)

    A. Blin, S. Araki, and S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3139-3142  2004年04月  [査読有り]

  • A causal frequency-domain implementation of a natural gradient multichannel blind deconvolution and source separation algorithm

    S. Douglas, H. Sawada, and S. Makino

    ICA2004 (International Congress on Acoustics)   I   85-88  2004年04月  [査読有り]

  • Solving the permutation and circularity problems of frequency-domain blind source separation

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    ICA2004 (International Congress on Acoustics)   I   89-92  2004年04月  [査読有り]

  • Algorithmic complexity based blind source separation for convolutive speech mixtures

    S. de la Kethulle, R. Mukai, H. Sawada, and S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3127-3130  2004年04月  [査読有り]

  • A solution for the permutation problem in frequency domain BSS using near- and far-field models

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    ICA2004 (International Congress on Acoustics)   IV   3135-3138  2004年04月  [査読有り]

  • Underdetermined blind separation of convolutive mixtures of speech by combining time-frequency masks and ICA

    S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada

    ICA2004 (International Congress on Acoustics)   I   321-324  2004年04月  [査読有り]

  • Evaluation of separation and dereverberation performance in frequency domain blind source separation

    Ryo Mukai, Shoko Araki, Hiroshi Sawada, Shoji Makino

    Acoustical Science and Technology   25 ( 2 ) 119 - 126  2004年03月  [査読有り]

     概要を見る

    In this paper, we propose a new method for evaluating the separation and dereverberation performance of a convolutive blind source separation (BSS) system, and investigate a separating system obtained by employing frequency domain BSS based on independent component analysis (ICA). As a result, we reveal the acoustical characteristics of the frequency domain BSS for convolutive mixture of speech signals. We show that the separating system removes the direct sound of a jammer signal even when the frame length is relatively short, and it also reduces the reverberation of the jammer according to the frame length. We also confirm that the reverberation of the target is not reduced. Moreover, we propose a technique, suggested by the experimental results, for improving the quality of the separated signals by removing pre-echo noise.

    DOI

    Scopus

    9
    被引用数
    (Scopus)
  • Underdetermined blind separation for speech in real environments with sparseness and ICA

    S Araki, S Makino, A Blin, R Mukai, H Sawada

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   III   881 - 884  2004年  [査読有り]

     概要を見る

    In this paper, we propose a method for separating speech signals when there are more signals than sensors. Several methods have already been proposed for solving the underdetermined problem, and some of these utilize the sparseness of speech signals. These methods employ binary masks to extract the signals, and therefore, their extracted signals contain loud musical noise. To overcome this problem, we propose combining a sparseness approach and independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion even in reverberant conditions of T-R=130 and 200 ms.

  • Frequency domain blind source separation using small and large spacing sensor pairs

    R Mukai, H Sawada, S Araki, S Makino

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS   V   1 - 4  2004年  [査読有り]

     概要を見る

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometrical information for solving the permutation problem. Experimental results show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.

  • Convolutive blind source separation for more than two sources in the frequency domain

    H Sawada, R Mukai, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   III   885 - 888  2004年  [査読有り]

     概要を見る

    Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.

  • Audio source separation based on independent component analysis

    S Makino, S Araki, R Mukai, H Sawada

    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 5, PROCEEDINGS   V   668 - 671  2004年  [査読有り]

     概要を見る

    This paper introduces the blind source separation (BSS) of convolutive mixtures of acoustic signals, especially speech. A statistical and computational technique, called independent component analysis (ICA), is examined. By achieving nonlinear decorrelation, nonstationary decorrelation, or time-delayed decorrelation, we can find source signals only from observed mixed signals. Particular attention is paid to the physical interpretation of BSS from the acoustical signal processing point of view. Frequency-domain BSS is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., adaptive beamformers (ABFs). Although BSS can reduce reverberant sounds to some extent in the same way as ABF, it mainly removes the sounds from the jammer direction. This is why BSS has difficulties with long reverberation in the real world. If sources are not "independent," the dependence results in bias noise when obtaining the correct unmixing filter coefficients. Therefore, the performance of BSS is limited by that of ABF. Although BSS is upper bounded by ABF, BSS has a strong advantage over ABF. BSS can be regarded as an intelligent version of ABF in the sense that it can adapt without any information on the array manifold or the target direction, and sources can be simultaneously active in BSS.

  • Near-field frequency domain blind source separation for convolutive mixtures

    R Mukai, H Sawada, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS   IV   49 - 52  2004年  [査読有り]

     概要を見る

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when source signals come from the same or similar directions. Geometric information such as the direction of arrival (DOA) is helpful for solving the permutation problem, and a combination of the DOA based and correlation based methods provides a robust and precise solution. However when signals come from similar directions, the DOA based approach fails, and we have to use only the correlation based method whose performance is unstable. In this paper, we show that an interpretation of the ICA solution by a near-field model yields information about spheres on which source signals exist, which can be used as an alternative to the DOA. Experimental results show that the proposed method can robustly separate a mixture of signals arriving from the same direction.

  • On coefficient delay in natural gradient blind deconvolution and source separation algorithms

    SC Douglas, H Sawada, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   634 - 642  2004年  [査読有り]

     概要を見る

    In this paper, we study the performance effects caused by coefficient delays in natural gradient blind deconvolution and source separation algorithms. We present a statistical analysis of the effect of coefficient delays within such algorithms, quantifying the relative loss in performance caused by such coefficient delays with respect to delayless algorithm updates. We then propose a simple change to one such algorithm to improve its convergence performance.

  • Overcomplete BSS for convolutive mixtures based on hierarchical clustering

    S Winter, H Sawada, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   652 - 660  2004年  [査読有り]

     概要を見る

    In this paper we address the problem of overcomplete BSS for convolutive mixtures following a two-step approach. In the first step the mixing matrix is estimated, which is then used to separate the signals in the second step. For estimating the mixing matrix we propose an algorithm based on hierarchical clustering, assuming that the source signals are sufficiently sparse. It has the advantage of working directly on the complex valued sample data in the frequency-domain. It also shows better convergence than algorithms based on self-organizing maps. The results are improved by reducing the variance of direction of arrival. Experiments show accurate estimations of the mixing matrix and very low musical tone noise.

  • Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

    SC Douglas, H Sawada, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   477 - 480  2004年  [査読有り]

     概要を見る

    Practical gradient-based adaptive algorithms for multichannel blind deconvolution and convolutive blind source separation typically employ FIR filters for the separation system. Inadequate use of signal truncation within these algorithms can introduce steady-state biases into their converged solutions that lead to degraded separation and deconvolution performances. In this paper, we derive a natural gradient multichannel blind deconvolution and source separation algorithm that mitigates these effects for estimating causal FIR solutions to these tasks. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for acoustic sources, even for extremely-short separation filters.

  • Convolutive blind source separation for more than two sources in the frequency domain

    H Sawada, R Mukai, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS   25 ( 4 ) 885 - 888  2004年  [査読有り]

     概要を見る

    Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.

  • Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA

    S Araki, S Makino, H Sawada, R Mukai

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   898 - 905  2004年  [査読有り]

     概要を見る

    We propose a method for separating N speech signals with M sensors where N > M. Some existing methods employ binary masks to extract the signals, and therefore, the extracted signals contain loud musical noise. To overcome this problem, we propose using a directivity pattern based continuous mask, which masks N - M sources in the observations, and independent component analysis (ICA) to separate the remaining mixtures. We conducted experiments for N = 3 with M = 2 and N = 4 with M = 2, and obtained separated signals with little distortion.

  • Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

    SC Douglas, H Sawada, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   13 ( 1 ) 477 - 480  2004年  [査読有り]

     概要を見る

    Practical gradient-based adaptive algorithms for multichannel blind deconvolution and convolutive blind source separation typically employ FIR filters for the separation system. Inadequate use of signal truncation within these algorithms can introduce steady-state biases into their converged solutions that lead to degraded separation and deconvolution performances. In this paper, we derive a natural gradient multichannel blind deconvolution and source separation algorithm that mitigates these effects for estimating causal FIR solutions to these tasks. Numerical experiments verify the robust convergence performance of the new method both in multichannel blind deconvolution tasks for i.i.d. sources and in convolutive BSS tasks for acoustic sources, even for extremely-short separation filters.

  • A sparseness - Mixing Matrix Estimation (SMME) solving the underdetermined BSS for convolutive mixtures

    A Blin, S Araki, S Makino

    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS   IV   85 - 88  2004年  [査読有り]

     概要を見る

    We propose a method for blindly separating real environment speech signals with as little distortion as possible in the special case where speech signals outnumber sensors. Our idea consists in combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to perform a preliminary separation and to detect when only one source is active. This information is then used to estimate the mixing matrix. Then we remove one source from the observations and separate the residual signals with the inverse of the estimated mixing matrix. Experimental results in a real environment (T-R = 130 ms and 200 ms) show that our proposed method, which we call Sparseness Mixing Matrix Estimation (SMME), provides separated signals of better quality than those extracted by only using the sparseness property of the speech signal.

  • Frequency domain blind source separation for many speech signals

    R Mukai, H Sawada, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   461 - 469  2004年  [査読有り]

     概要を見る

    This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometric information for solving the permutation problem. Experimental results in a room (reverberation time T-R = 130 ms) with eight microphones show that the proposed method can separate a mixture of six speech signals that come from various directions, even when two of them come from the same direction.

  • Estimating the number of sources for frequency-domain blind source separation

    H Sawada, S Winter, R Mukai, S Araki, S Makino

    INDEPENDENT COMPONENT ANALYSIS AND BLIND SIGNAL SEPARATION   3195   610 - 617  2004年  [査読有り]

     概要を見る

    Blind source separation (BSS) for convolutive mixtures can be performed efficiently in the frequency domain, where independent component analysis (ICA) is applied separately in each frequency bin. To solve the permutation problem of frequency-domain BSS robustly, information regarding the number of sources is very important. This paper presents a method for estimating the number of sources from convolutive mixtures of sources. The new method estimates the power of each source or noise component by using ICA and a scaling technique to distinguish sources and noises. Also, a reverberant component can be identified by calculating the correlation of component envelopes. Experimental results for up to three sources show that the proposed method worked well in a reverberant condition whose reverberation time was 200 ms.

  • Underdetermined blind separation of convolutive mixtures of speech with binary masks and ICA

    S. Araki, S. Makino, H. Sawada, A. Blin, and R. Mukai

    NIPS2003 Workshop on ICA: Sparse Representations in Signal Processing   2 ( 7 ) 1-4  2003年12月  [査読有り]

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming for convolutive mixtures

    S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari

    EURASIP Journal on Applied Signal Processing   2003 ( 11 ) 1157-1166  2003年11月  [査読有り]

    CiNii

  • Blind source separation when speech signals outnumber sensors using a sparseness-mixing matrix estimation (SMME)

    A. Blin, S. Araki, and S. Makino

    IWAENC2003     211-214  2003年09月  [査読有り]

    CiNii

  • Blind separation of more speech than sensors with less distortion by combining sparseness and ICA

    S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada

    IWAENC2003     271-274  2003年09月  [査読有り]

    CiNii

  • Spectral smoothing for frequency-domain blind source separation

    H. Sawada, R. Mukai, S. de la Kethulle, S. Araki, and S. Makino

    IWAENC2003     311-314  2003年09月  [査読有り]

    CiNii

  • Blind source separation for convolutive mixtures based on complexity minimization

    S. de la Kethulle, R. Mukai, H. Sawada, and S. Makino

    IWAENC2003     303-306  2003年09月  [査読有り]

  • Array geometry arrangement for frequency domain blind source separation

    R. Mukai, H. Sawada, S. de la Kethulle, S. Araki, and S. Makino

    IWAENC2003     219-222  2003年09月  [査読有り]

    CiNii

  • Multistage ICA for blind source separation of real acoustic convolutive mixture

    T. Nishikawa, H. Saruwatari, K. Shikano, S. Araki, and S. Makino

    ICA2003     523-528  2003年04月  [査読有り]

    CiNii

  • Subband based blind source separation with appropriate processing for each frequency band

    S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari

    ICA2003     499-504  2003年04月  [査読有り]

    CiNii

  • Geometrical interpretation of the PCA subspace method for overdetermined blind source separation

    S. Winter, H. Sawada, and S. Makino

    ICA2003     775-780  2003年04月  [査読有り]

    CiNii

  • Real-time blind source separation for moving speakers using blockwise ICA and residual crosstalk subtraction

    R. Mukai, H. Sawada, S. Araki, and S. Makino

    ICA2003     975-980  2003年04月  [査読有り]

    CiNii

  • On-line time-domain blind source separation of nonstationary convolved signals

    R. Aichner, H. Buchner, S. Araki, and S. Makino

    ICA2003     987-992  2003年04月  [査読有り]

  • A robust and precise method for solving the permutation problem of frequency-domain blind source separation

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    ICA2003     505-510  2003年04月  [査読有り]

    CiNii

  • Geometrically constrained ICA for robust separation of sound mixtures

    M. Knaak, S. Araki, and S. Makino

    ICA2003     951-956  2003年04月  [査読有り]

  • Polar coordinate based nonlinear function for frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 3 ) 590 - 596  2003年03月  [査読有り]

     概要を見る

    This paper discusses a nonlinear function for independent component analysis to process complex-valued signals in frequency-domain blind source separation. Conventionally, nonlinear functions based on the Cartesian coordinates are widely used. However, such functions have a convergence problem. In this paper, we propose a more appropriate nonlinear function that is based on the polar coordinates of a complex number. In addition, we show that the difference between the two types of functions arises from the assumed densities of independent components. Our discussion is supported by several experimental results for separating speech signals, which show that the polar type nonlinear functions behave better than the Cartesian type.
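
    For reference, the two types of nonlinearity contrasted in the entry above can be written in a few lines of Python/NumPy (a common textbook form, with unit gain assumed); the polar version can be dropped into a per-bin ICA update such as the skeleton sketched earlier in this list.

    import numpy as np

    def phi_cartesian(y):
        # Nonlinearity applied separately to the real and imaginary parts.
        return np.tanh(np.real(y)) + 1j * np.tanh(np.imag(y))

    def phi_polar(y):
        # Polar-coordinate form: squash the magnitude, keep the phase.
        return np.tanh(np.abs(y)) * np.exp(1j * np.angle(y))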

  • A robust approach to the permutation problem of frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   381 - 384  2003年  [査読有り]

     概要を見る

    This paper presents a robust and precise method for solving the permutation problem of frequency-domain blind source separation. It is based on two previous approaches: the direction of arrival estimation approach and the inter-frequency correlation approach. We discuss the advantages and disadvantages of the two approaches, and integrate them to exploit the both advantages. We also present a closed form formula to calculate a null direction, which is used in estimating the directions of source signals. Experimental results show that our method solved permutation problems almost perfectly for a situation that two sources were mixed in a room whose reverberation time was 300 ms.
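
    One of the two ingredients integrated in the entry above, the inter-frequency correlation of separated-signal envelopes, can be sketched as follows in Python/NumPy for two neighbouring bins; the DOA-based part and the closed-form null-direction formula are not reproduced, and an exhaustive permutation search is used only because N is small.

    import numpy as np
    from itertools import permutations

    def align_next_bin(Y_prev, Y_curr):
        # Y_prev, Y_curr: (N, T) separated signals of two neighbouring bins,
        # Y_prev already aligned.  Reorder Y_curr to maximize the summed
        # correlation of amplitude envelopes with Y_prev.
        def env(Y):
            A = np.abs(Y)
            A = A - A.mean(axis=1, keepdims=True)
            return A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)

        C = env(Y_prev) @ env(Y_curr).T                    # pairwise correlations
        N = Y_curr.shape[0]
        best = max(permutations(range(N)),
                   key=lambda p: sum(C[i, p[i]] for i in range(N)))
        return Y_curr[list(best)]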

  • Robust real-time blind source separation for moving speakers in a room

    R Mukai, H Sawada, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   469 - 472  2003年  [査読有り]

     概要を見る

    This paper describes a robust real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in realtime.

  • Geometrically constrained ICA for convolutive mixtures of sound

    M Knaak, S Araki, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS   II   725 - 728  2003年  [査読有り]

     概要を見る

    The goal of this contribution is a new algorithm using independent component analysis with a geometrical constraint. The new algorithm solves the permutation problem of blind source separation of acoustic mixtures, and it is significantly less sensitive to the precision of the geometrical constraint than an adaptive beamformer. A high degree of robustness is very important since the steering vector is always roughly estimated in a reverberant environment, even when the look direction is precise. The new algorithm is based on FastICA and constrained optimization. It is theoretically and experimentally analyzed with respect to the roughness of the steering vector estimation by using impulse responses of a real room. The effectiveness of the algorithm for real-world mixtures is also shown in the case of three sources and three microphones.

  • Direction of arrival estimation for multiple source signals using independent component analysis

    H Sawada, R Mukai, S Makino

    SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS     411 - 414  2003年  [査読有り]

     概要を見る

    This paper presents a new method for estimating the directions of source signals. We assume a situation in which multiple source signals are mixed in a reverberant condition and observed at several sensors. The new method is based on independent component analysis, which separates mixed signals into original source signals. It can be applied where the number of sources is equal to the number of sensors, whereas the conventional methods based on sub-space analysis, such as the MUSIC algorithm, are applicable where there are fewer sources than sensors. Even in cases where the MUSIC algorithm can be applied, the new method is better at estimating the directions of sources if they are closely placed.
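
    As a rough illustration of the idea (notation assumed here; this is not the paper's exact closed-form null-direction formula): for a two-element array with spacing d, the mixing matrix estimate A(f) = W^{-1}(f) obtained from ICA gives

    \theta_k \approx \arccos\!\left( \frac{c \, \arg\!\bigl( A_{2k}(f) / A_{1k}(f) \bigr)}{2 \pi f \, d} \right),

    where c is the speed of sound and A_{jk}(f) is the estimated gain from source k to sensor j; the phase difference between the two sensor gains of one ICA output plays the role of the inter-sensor delay used in conventional DOA estimation.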

  • Subband based blind source separation for convolutive mixtures of speech

    S Araki, S Makino, R Aichner, T Nishikawa, H Saruwatari

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS   V   509 - 512  2003年  [査読有り]

     概要を見る

    Subband processing is applied to blind source, separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In our proposed subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can handle long reverberation. Subband BSS achieves better performance than frequency-domain BSS. Moreover, we propose efficient separation procedures that take into consideration the frequency characteristics of room reverberation and speech signals. We achieve this (3) by using longer unmixing filters in low frequency bands, and (4) by adopting overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized in the proposed subband BSS.

  • Geometrical understanding of the PCA subspace method for overdetermined blind source separation

    S Winter, H Sawada, S Makino

    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS     769 - 772  2003年  [査読有り]

     概要を見る

    In this paper, we discuss approaches for blind source separation where we can use more sensors than the number of sources for a better performance. The discussion focuses mainly on reducing the dimension of mixed signals before applying independent component analysis. We compare two previously proposed methods. The first is based on principal component analysis, where noise reduction is achieved. The second involves selecting a subset of sensors based on the fact that a low frequency prefers a wide spacing and a high frequency prefers a narrow spacing. We found that the PCA-based method behaves similarly to the geometry-based method for low frequencies in the way that it emphasizes the outer sensors and yields superior results for high frequencies, which provides a better understanding of the former method.
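
    A minimal numpy sketch (an assumption for illustration, not taken from the paper) of the PCA-based dimension reduction discussed above: the M-channel observations at one frequency bin are projected onto the subspace of the N largest eigenvalues of the spatial covariance matrix before ICA is applied.

    import numpy as np

    def pca_reduce(X, n_sources):
        # X: complex M x T observations at one frequency bin (M sensors > N sources)
        R = X @ X.conj().T / X.shape[1]            # spatial covariance matrix
        vals, vecs = np.linalg.eigh(R)             # eigenvalues in ascending order
        E = vecs[:, -n_sources:]                   # dominant (signal) subspace
        D = np.diag(1.0 / np.sqrt(vals[-n_sources:]))
        return D @ E.conj().T @ X                  # whitened N x T data for ICA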

  • Natural gradient blind deconvolution and equalization using causal FIR filters

    SC Douglas, H Sawada, S Makino

    CONFERENCE RECORD OF THE THIRTY-SEVENTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, VOLS 1 AND 2   2 ( 3 ) 197 - 201  2003年  [査読有り]

     概要を見る

    Natural gradient adaptation is an especially convenient method for adapting the coefficients of a linear system in inverse filtering tasks such as blind deconvolution and equalization. Practical implementations of such methods require truncation of the filter impulse responses within the gradient updates. In this paper, we show how truncation of these filter impulse responses can create convergence problems and introduce a bias into the steady-state solution of one such algorithm. We then show how this algorithm can be modified to effectively mitigate these effects for estimating causal FIR approximations to doubly-infinite IIR equalizers. Simulations indicate that the modified algorithm provides the convergence benefits of the natural gradient while still attaining good steady-state performance.

  • ICA-based blind source separation of sounds

    S. Makino, S. Araki, R. Mukai, H. Sawada, and H. Saruwatari

    JCA2002 (China-Japan Joint Conference on Acoustics)     83-86  2002年11月  [査読有り]

    CiNii

  • Digital technologies for controlling room acoustics

    M. Miyoshi, S. Makino

    JCA2002 (China-Japan Joint Conference on Acoustics)     19-24  2002年11月  [査読有り]

  • Blind source separation for convolutive mixtures of speech using subband processing

    S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari

    SMMSP2002 (International Workshop on Spectral Methods and Multirate Signal Processing)     195-202  2002年09月  [査読有り]

    CiNii

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming

    S Araki, Y Hinamoto, S Makino, T Nishikawa, R Mukai, H Saruwatari

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS     1785 - 1788  2002年  [査読有り]

     概要を見る

    Frequency domain Blind Source Separation (BSS) is shown to be equivalent to two sets of frequency domain adaptive microphone arrays, i.e., Adaptive Beamformers (ABFs). The minimization of the off-diagonal components in the BSS update equation can be viewed as the minimization of the mean square error in the ABF. The unmixing matrix of the BSS and the filter coefficients of the ABF converge to the same solution in the mean square error sense if the two source signals are ideally independent. Therefore, the performance of the BSS is limited by that of the ABF. This understanding gives an interpretation of BSS from a physical point of view.
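
    A compact way to state the equivalence described above for the 2-source/2-sensor case x = Hs, y = Wx (a paraphrase with assumed notation, not the paper's exact derivation): the decorrelation-type BSS update drives the off-diagonal term E[y_1 y_2^*] to zero, while an adaptive beamformer for output 1 minimizes E[|w_{11} x_1 + w_{12} x_2|^2] when only the jammer s_2 is active; both criteria are met when

    w_{11} H_{12}(f) + w_{12} H_{22}(f) = 0,

    i.e., when the output steers a spatial null toward the jammer, which is exactly the least-mean-square solution of the ABF.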

  • Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming

    R Aichner, S Araki, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     445 - 454  2002年  [査読有り]

     概要を見る

    We propose a time-domain BSS algorithm that utilizes geometric information such as sensor positions and assumed locations of sources. The algorithm tackles the problem of convolved mixtures by explicitly exploiting the non-stationarity of the acoustic sources. The learning rule is based on second-order statistics and is derived by natural gradient minimization. The proposed initialization of the algorithm is based on the null beamforming principle. This method leads to improved separation performance, and the algorithm is able to estimate long unmixing FIR filters in the time domain due to the geometric initialization. We also propose a post-filtering method for dewhitening which is based on the scaling technique in frequency-domain BSS. The validity of the proposed method is shown by computer simulations. Our experimental results confirm that the algorithm is capable of separating real-world speech mixtures and can be applied to short learning data sets down to a few seconds. Our results also confirm that the proposed dewhitening post-filtering method maintains the spectral content of the original speech in the separated output.

  • Enhanced frequency-domain adaptive algorithm for stereo echo cancellation

    S Emura, Y Haneda, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1901 - 1904  2002年  [査読有り]

     概要を見る

    Highly cross-correlated input signals create the problem of slow convergence of misalignment in stereo echo cancellation even after undergoing non-linear preprocessing. We propose a new frequency-domain adaptive algorithm that improves the convergence rate by increasing the contribution of non-linearity in the adjustment vector. Computer simulation showed that it is effective when the non-linearity gain is small.

  • Polar coordinate based nonlinear function for frequency-domain blind source separation

    H Sawada, R Mukai, S Araki, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   I   1001 - 1004  2002年  [査読有り]

     概要を見る

    This paper presents a new type of nonlinear function for independent component analysis to process complex-valued signals, which is used in frequency-domain blind source separation. The new function is based on the polar coordinates of a complex number, whereas the conventional one is based on the Cartesian coordinates. The new function is derived from the probability density function of frequency-domain signals that are assumed to be independent of the phase. We show that the difference between the two types of functions is in the assumed densities of independent components. Experimental results for separating speech signals show that the new nonlinear function behaves better than the conventional one.

  • Removal of residual cross-talk components in blind source separation using time-delayed spectral subtraction

    R Mukai, S Araki, H Sawada, S Makino

    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS   II   1789 - 1792  2002年  [査読有り]

     概要を見る

    This paper describes a post processing method to refine output signals obtained by Blind Source Separation (BSS). The performance of BSS using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the cross-talk components derived from the reverberation of the jammer signal. Utilizing this knowledge, we propose a new method, time-delayed non-stationary spectral subtraction, which removes the residual components from the separated signals precisely. The proposed method compensates for the weakness of BSS in a reverberant environment. Experimental results using speech signals show that the proposed method improves the signal-to-noise ratio by 3 to 5 dB.
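
    A minimal numpy sketch of time-delayed non-stationary spectral subtraction as described above; the parameter names and values (delay, alpha, beta) are illustrative assumptions, not the paper's settings.

    import numpy as np

    def remove_residual_crosstalk(Y_tgt, Y_jam, delay=2, alpha=1.0, beta=0.05):
        # Y_tgt, Y_jam: complex STFTs (freq x frames) of the target output and the
        # jammer-dominant output produced by the BSS stage
        P_tgt = np.abs(Y_tgt) ** 2
        P_jam = np.abs(Y_jam) ** 2
        # delay the jammer power spectrum by a few frames to model its reverberation
        P_jam_del = np.pad(P_jam, ((0, 0), (delay, 0)))[:, :P_jam.shape[1]]
        # subtract the delayed, scaled jammer power and floor the result
        P_clean = np.maximum(P_tgt - alpha * P_jam_del, beta * P_tgt)
        return np.sqrt(P_clean) * np.exp(1j * np.angle(Y_tgt))   # reuse the target phase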

  • Removal of residual crosstalk components in blind source separation using LMS filters

    R Mukai, S Araki, H Sawada, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     435 - 444  2002年  [査読有り]

     概要を見る

    The performance of Blind Source Separation (BSS) using Independent Component Analysis (ICA) declines significantly in a reverberant environment. The degradation is mainly caused by the residual crosstalk components derived from the reverberation of the jammer signal. This paper describes a post-processing method designed to refine output signals obtained by BSS.
    We propose a new method which uses LMS filters in the frequency domain to estimate the residual crosstalk components in separated signals. The estimated components are removed by non-stationary spectral subtraction. The proposed method removes the residual components precisely, and thus compensates for the weakness of BSS in a reverberant environment.
    Experimental results using speech signals show that the proposed method improves the signal-to-interference ratio by 3 to 5 dB.

  • Blind source separation with different sensor spacing and filter length for each frequency range

    H Sawada, S Araki, R Mukai, S Makino

    NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS     465 - 474  2002年  [査読有り]

     概要を見る

    This paper presents a method for blind source separation using several separating subsystems whose sensor spacing and filter length can be configured individually. Each subsystem is responsible for source separation of an allocated frequency range. With this mechanism, we can use appropriate sensor spacing as well as filter length for each frequency range. We obtained better separation performance than with the conventional method by using a wide sensor spacing and a long filter for a low frequency range, and a narrow sensor spacing and a short filter for a high frequency range.

  • Separation and dereverberation performance of frequency domain blind source separation

    R. Mukai, S. Araki, and S. Makino

    ICA2001     230-235  2001年12月  [査読有り]

  • A polar-coordinate based activation function for frequency domain blind source separation

    H. Sawada, R. Mukai, S. Araki, and S. Makino

    ICA2001     663-668  2001年12月  [査読有り]

    CiNii

  • 実環境におけるブラインド音源分離技術を開発 -2人の声の同時聞き分けに成功-

    牧野昭二, 荒木章子, 向井良, 片桐滋

    電子情報通信学会誌   84 ( 11 ) 848 - 848  2001年11月  [査読有り]

    CiNii

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

    S. Araki, S. Makino, R. Mukai, and H. Saruwatari

    CRAC (A workshop on Consistent and Reliable acoustic cues for sound analysis)   2 ( 4 ) 1-4  2001年09月  [査読有り]

  • ICASSP2001会議報告

    牧野昭二, 荒木章子

    人工知能学会誌   16 ( 5 ) 736-737  2001年09月  [査読有り]

  • Adaptive filtering algorithm enhancing decorrelated additive signals for stereo echo cancellation

    S. Emura, Y. Haneda, and S. Makino

    IWAENC2001     67-70  2001年09月  [査読有り]

  • Separation and dereverberation performance of frequency domain blind source separation in a reverberant environment

    R. Mukai, S. Araki, and S. Makino

    IWAENC2001     127-130  2001年09月  [査読有り]

    CiNii

  • Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers

    S. Araki, S. Makino, R. Mukai, and H. Saruwatari

    Eurospeech2001     2595-2598  2001年09月  [査読有り]

  • Separation and dereverberation performance of frequency domain blind source separation for speech in a reverberant environment

    R. Mukai, S. Araki, and S. Makino

    Eurospeech2001 (European Conference on Speech Communication and Technology)     2599-2602  2001年09月  [査読有り]

  • 全指向性を持つスピーカ・マイクロホン一体型通話装置の設計

    中川 朗, 島内 末廣, 羽田 陽一, 青木 茂明, 牧野 昭二

    日本音響学会誌   57 ( 8 ) 509-516  2001年08月  [査読有り]

     概要を見る

    In hands-free teleconferencing, acoustic echo cancellers based on adaptive filters have been widely used to solve problems such as echo and howling. To make teleconferencing more convenient, a hands-free terminal that houses the loudspeaker and the microphones in a single enclosure is desirable. In such a device, however, the short distance between the loudspeaker and the microphones increases the acoustic echo that leaks from the loudspeaker into the microphones, which makes the acoustic echo canceller difficult to control. To address this increase in acoustic echo, this paper examines a four-microphone configuration and shows that, by controlling the phase of each microphone output signal, an omnidirectional integrated loudspeaker-microphone unit can be realized while reducing the acoustic echo from the loudspeaker by about 12 dB.

    CiNii

  • Fundamental limitation of frequency domain Blind Source Separation for convolutive mixture of speech

    S Araki, S Makino, T Nishikawa, H Saruwatari

    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS     2737 - 2740  2001年  [査読有り]

     概要を見る

    Despite several recent proposals to achieve Blind Source Separation (BSS) for realistic acoustic signals, separation performance is still insufficient. In particular, when the impulse response is long, performance is highly limited. In this paper, we show that it is useless to be constrained by the condition P << T, where T is the frame size of the FFT and P is the length of the room impulse response. From our experiments, a frame size of 256 or 512 (32 or 64 ms at a sampling frequency of 8 kHz) is best even for long room reverberation of TR = 150 and 300 ms. We also clarified the reason for the poor performance of BSS in a long reverberant environment, finding that separation is achieved chiefly for the sound from the direction of the jammer because BSS cannot calculate the inverse of the room transfer function for both the target and jammer signals.

  • Stereophonic acoustic echo cancellation: An overview and recent solutions

    S. Makino

    Acoustical Science and Technology   22 ( 5 ) 325 - 333  2001年  [査読有り]

     概要を見る

    The fundamental problems of stereophonic acoustic echo cancellation were discussed and the recent solutions were reviewed. The stereo echo cancellation was achieved by linearly combining two monaural echo cancellers. A duo-filter control system including a continually running adaptive filter and a fixed filter was used for double-talk control. A second-order stereo projection algorithm was used in the adaptive filter and a stereo switch was also implemented.

    DOI CiNii

    Scopus

    11
    被引用数
    (Scopus)
  • Subjective assessment of the desired echo return loss for subband acoustic echo cancellers

    S Sakauchi, Y Haneda, S Makino, M Tanaka, Y Kaneda

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E83A ( 12 ) 2633 - 2639  2000年12月  [査読有り]

     概要を見る

    We investigated the dependence of the desired echo return loss on frequency for various hands-free telecommunication conditions by subjective assessment. The desired echo return loss as a function of frequency (DERLf) is an important factor in the design and performance evaluation of a subband echo canceller, and it is a measure of what is considered an acceptable echo caused by electrical loss in the transmission line. The DERLf during single talk was obtained as attenuated band-limited echo levels that subjects did not find objectionable when listening to the near-end speech and its band-limited echo under various hands-free telecommunication conditions. When we investigated the DERLf during double-talk, subjects also heard the speech in the far-end room from a loudspeaker. The echo was limited to a 250-Hz bandwidth assuming the use of a subband echo canceller. The test results showed that: (1) when the transmission delay was short (30 ms), the echo component around 2 to 3 kHz was the most objectionable to listeners; (2) as the transmission delay rose to 300 ms, the echo component around 1 kHz became the most objectionable; (3) when the room reverberation time was relatively long (about 500 ms), the echo component around 1 kHz was the most objectionable even if the transmission delay was short; and (4) the DERLf during double-talk was about 5 to 10 dB lower than that during single-talk. Use of these DERLf values will enable the design of more efficient subband echo cancellers.

  • A study of microphone system for hands-free teleconferencing units

    Akira Nakagawa, Suehiro Shimauchi, Shoji Makino

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   21 ( 1 ) 33 - 35  2000年  [査読有り]

    DOI

    Scopus

  • Channel-number-compressed multi-channel acoustic echo canceller for high-presence teleconferencing system with large display

    A Nakagawa, S Shimauchi, Y Haneda, S Aoki, S Makino

    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI     813 - 816  2000年  [査読有り]

     概要を見る

    Sound localization is important to make conversation easy between local and remote sites in a teleconference. This requires a multi-channel sound system having a multi-channel acoustic echo canceller (MAEC). The appropriate number of channels is determined from a trade-off between high presence and MAEC performance, so it is not possible to increase the channel number by much.
    We propose a channel-number-compressed MAEC to provide teleconferencing systems that exhibit high presence. The channel number of the MAEC inputs is compressed and that of its outputs is expanded.

  • Hybrid of acoustic echo cancellers and voice switching control for multi-channel applications

    S. Shimauchi, A. Nakagawa, Y. Haneda, and S. Makino

    IWAENC99     48-51  1999年09月  [査読有り]

    CiNii

  • Subband echo canceler with an exponentially weighted stepsize NLMS adaptive filter

    S Makino, Y Haneda

    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE   82 ( 3 ) 49 - 57  1999年03月  [査読有り]

     概要を見る

    This paper proposes a novel adaptive algorithm for an echo canceler. In this algorithm, the number of operations and memory capacity are equivalent to those of the conventional NLMS algorithm but the convergence speed is twice that using the conventional algorithm. This adaptive algorithm is referred to as subband ES (exponentially weighted stepsize). In the algorithm, the frequency bands of the received input signal and echo signal are divided into multiple subbands, and echo is independently canceled in each subband. Each adaptive filter in each subband has independent coefficients with an independent stepsize. The stepsize is time-independent and its weight is exponentially proportional to the change of the impulse response within the frequency region, such as the expected value of the difference between the waveforms of two impulse responses. As a result, the characteristic of the acoustic echo path in each frequency band is analyzed using the adaptive algorithm to improve the convergence characteristic. Using the results of computer simulation and experimental results obtained via an experimental setup with DSP, it is shown that the convergence speed with respect to input voice signal can be about 4 times faster when using echo cancellation based on the new algorithm than in conventional full-band echo cancellation based on the NLMS algorithm. (C) 1998 Scripta Technica, Electron Comm Jpn Pt 3, 82(3): 49-57, 1999.
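
    A minimal sketch of the per-tap exponentially weighted step size described above, written as a single NLMS-style update for one (sub)band; the decay factor gamma is an illustrative assumption tied to the echo-path energy decay, not a value from the paper.

    import numpy as np

    def es_nlms_step(w, x_buf, d, mu0=0.5, gamma=0.999, eps=1e-8):
        # w: adaptive filter taps (length L); x_buf: the L most recent input
        # samples, newest first; d: current microphone (desired) sample
        e = d - w @ x_buf                          # a-priori error
        mu = mu0 * gamma ** np.arange(len(w))      # step sizes decay like the echo path
        w = w + mu * e * x_buf / (x_buf @ x_buf + eps)
        return w, e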

  • A stereo echo canceller implemented using a stereo shaker and a duo-filter control system

    S Shimauchi, S Makino, Y Haneda, A Nakagawa, S Sakauchi

    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI     857 - 860  1999年  [査読有り]

     概要を見る

    Stereo echo cancellation has been achieved and used in daily teleconferencing. To overcome the non-uniqueness problem, a stereo shaker is introduced in eight frequency bands and adjusted so as to be inaudible and not affect stereo perception. A duo-filter control system including a continually running adaptive filter and a fixed filter is used for double-talk control. A second-order stereo projection algorithm is used in the adaptive filter. A stereo voice switch is also included. This stereo echo canceller was tested in two-way conversation in a conference room, and the strength of the stereo shaker was subjectively adjusted. A misalignment of 20 dB was obtained in the teleconferencing environment, and changing the talker's position in the transmission room did not affect the cancellation. This echo canceller is now used daily in a high-presence teleconferencing system and has been demonstrated to more than 300 attendees.

  • New configuration for a stereo echo canceller with nonlinear pre-processing

    S Shimauchi, Y Haneda, S Makino, Y Kaneda

    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6     3685 - 3688  1998年  [査読有り]

     概要を見る

    A new configuration for a stereo echo canceller with nonlinear pre-processing is proposed. The pre-processor which adds uncorrelated components to the original received stereo signals improves the adaptive filter convergence even in the conventional configuration. However, because of the inaudibility restriction, the preprocessed signals still include a large amount of the original stereo signals which are often highly cross-correlated. Therefore, the improvement is limited. To overcome this, our new stereo echo canceller includes exclusive adaptive filters whose inputs are the uncorrelated signals generated in the pre-processor. These exclusive adaptive filters converge to true solutions without suffering from cross-correlation between the original stereo signals. This is demonstrated through computer simulation results.

  • Subband acoustic echo canceller using two different analysis filters and 8th order projection algorithm

    A. Nakagawa, Y. Haneda, and S. Makino

    IWAENC97     140-143  1997年09月  [査読有り]

    CiNii

  • Subjective assessment of echo return loss required for subband acoustic echo cancellers

    S. Sakauchi, Y. Haneda, and S. Makino

    IWAENC97     152-155  1997年09月  [査読有り]

  • Multiple-point equalization of room transfer functions by using common acoustical poles

    Y Haneda, S Makino, Y Kaneda

    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING   5 ( 4 ) 325 - 333  1997年07月  [査読有り]

     概要を見る

    A multiple-point equalization filter using the common acoustical poles of room transfer functions is proposed. The common acoustical poles correspond to the resonance frequencies, which are independent of source and receiver positions. They are estimated as common autoregressive (AR) coefficients from multiple room transfer functions. The equalization is achieved with a finite impulse response (FIR) filter, which has the inverse characteristics of the common acoustical pole function. Although the proposed filter cannot recover the frequency response dips of the multiple room transfer functions, it can suppress their common peaks due to resonance; it is also less sensitive to changes in receiver position. Evaluation of the proposed equalization filter using measured room transfer functions shows that it can reduce the deviations in the frequency characteristics of multiple room transfer functions better than a conventional multiple-point inverse filter. Experiments show that the proposed filter enables 1-5 dB additional amplifier gain in a public address system without acoustic feedback at multiple receiver positions. Furthermore, the proposed filter reduces the reflected sound in room impulse responses without the pre-echo that occurs with a multiple-point inverse filter. A multiple-point equalization filter using common acoustical poles can thus equalize multiple room transfer functions by suppressing their common peaks.
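
    One way to sketch the "common AR coefficients" estimation in numpy is a joint linear-prediction fit over several measured impulse responses; the paper's estimator and equalizer design may differ in detail.

    import numpy as np

    def common_ar_equalizer(impulse_responses, order=20):
        # Stack the linear-prediction equations of all measured room impulse
        # responses and solve them jointly for one set of AR coefficients.
        rows, targets = [], []
        for h in impulse_responses:
            for n in range(order, len(h)):
                rows.append(h[n - order:n][::-1])      # the `order` previous samples
                targets.append(h[n])
        a, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(targets), rcond=None)
        # A(z) = 1 - sum_k a_k z^{-k}; used directly as an FIR filter, it suppresses
        # the resonance peaks (common poles) shared by all measured positions.
        return np.concatenate(([1.0], -a))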

  • Subband stereo echo canceller using the projection algorithm with fast convergence to the true echo path

    S Makino, K Strauss, S Shimauchi, Y Haneda, A Nakagawa

    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V     299 - 302  1997年  [査読有り]

     概要を見る

    This paper proposes a new subband stereo echo canceller that converges to the true echo path impulse response much faster than conventional stereo echo cancellers. Since signals are bandlimited and downsampled in the subband structure, the time interval between the subband signals becomes longer, so the variation of the cross-correlation between the stereo input signals becomes large. Consequently, convergence to the true solution is improved. Furthermore, the projection algorithm, or affine projection algorithm, is applied to further speed up the convergence. Computer simulations using stereo signals recorded in a conference room demonstrate that this method significantly improves convergence speed and almost solves the problem of stereo echo cancellation with low computational load.

  • Noise reduction for subband acoustic echo canceller

    J. Sasaki, Y. Haneda, and S. Makino

    Joint meeting, Acoustical Society of America and Acoustical Society of Japan     1285-1290  1996年12月  [査読有り]

    CiNii

  • Implementation and evaluation of an acoustic echo canceller using duo-filter control system

    Y. Haneda, S. Makino, J. Kojima, and S. Shimauchi

    EUSIPCO96 (European Signal Processing Conference)     1115-1118  1996年09月  [査読有り]

    CiNii

  • SSB subband echo canceller using low-order projection algorithm

    S Makino, J Noebauer, Y Haneda, A Nakagawa

    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6     945 - 948  1996年  [査読有り]

  • Stereo echo cancellation algorithm using imaginary input-output relationships

    S Shimauchi, S Makino

    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6     941 - 944  1996年  [査読有り]

  • A FAST PROJECTION ALGORITHM FOR ADAPTIVE FILTERING

    M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E78A ( 10 ) 1355 - 1361  1995年10月  [査読有り]

     概要を見る

    This paper proposes a new algorithm called the fast Projection algorithm, which reduces the computational complexity of the Projection algorithm from (p+1)L + O(p^3) to 2L + 20p (where L is the length of the estimation filter and p is the projection order). This algorithm has properties that lie between those of NLMS and RLS, i.e., less computational complexity than RLS but much faster convergence than NLMS for input signals like speech. The reduction of computation consists of two parts. One concerns calculating the pre-filtering vector, which originally took O(p^3) operations. Our new algorithm computes the pre-filtering vector recursively with about 15p operations. The other reduction is accomplished by introducing an approximation vector of the estimation filter. Experimental results for speech input show that the convergence speed of the Projection algorithm approaches that of RLS as the projection order increases with only a slight extra calculation complexity beyond that of NLMS, which indicates the efficiency of the proposed fast Projection algorithm.
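
    For reference, the plain (non-fast) projection / affine projection update that the paper accelerates looks like the following numpy sketch; the O(p^3) cost sits in the linear solve, which the fast algorithm replaces with roughly 15p recursive operations (those recursions are not shown here).

    import numpy as np

    def projection_step(w, X, d, mu=1.0, delta=1e-6):
        # w: filter taps (length L); X: L x p matrix whose columns are the p most
        # recent input vectors; d: the p most recent desired samples (newest first)
        e = d - X.T @ w                                              # a-priori errors
        g = np.linalg.solve(X.T @ X + delta * np.eye(X.shape[1]), e)  # O(p^3) part
        return w + mu * X @ g, e[0]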

  • Relationship between the 'ES family' algorithms and conventional adaptive algorithms

    S. Makino

    IWAENC95     11-14  1995年06月  [査読有り]

    CiNii

  • Implementation and evaluation of an acoustic echo canceller using the duo-filter control system

    Y. Haneda, S. Makino, J. Kojima, and S. Shimauchi

    IWAENC95     79-82  1995年06月  [査読有り]

  • エコーキャンセラは拡声装置のハウリングにも有効か?

    牧野昭二

    日本音響学会誌   51 ( 3 ) 248  1995年03月  [査読有り]

  • STEREO PROJECTION ECHO CANCELER WITH TRUE ECHO PATH ESTIMATION

    S SHIMAUCHI, S MAKINO

    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5     3059 - 3062  1995年  [査読有り]

  • FAST PROJECTION ALGORITHM AND ITS STEP-SIZE CONTROL

    M TANAKA, Y KANEDA, S MAKINO, J KOJIMA

    1995 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING - CONFERENCE PROCEEDINGS, VOLS 1-5     945 - 948  1995年  [査読有り]

  • 高性能音響エコーキャンセラの開発

    小島順治, 牧野昭二, 羽田陽一, 島田末廣

    NTT R&D   44 ( 1 ) 39-44  1995年01月  [査読有り]

  • 室内音場伝達関数の共通極・零モデル化

    羽田陽一, 牧野昭二, 金田豊

    NTT R&D   44 ( 1 ) 53-58  1995年01月  [査読有り]

     概要を見る

    We propose a new model of room transfer functions (the common-pole/zero model) that uses common poles corresponding to the resonances of the room sound field. The common poles are estimated as common AR coefficients from multiple room transfer functions measured at different source and receiver positions. Because this model represents the multiple room transfer functions with the estimated common AR coefficients and with zeros that differ for each transfer function, it requires fewer parameters than the conventional all-zero and pole-zero models. Simulations of an echo canceller based on the common-pole/zero model showed that, compared with the conventional all-zero model, the order of the adaptive filter can be roughly halved and the convergence speed improved by a factor of about 1.5 in the band up to 800 Hz, confirming the effectiveness of the proposed model.

    CiNii

  • 1994年音響・音声・信号処理国際会議(ICASSP-94)報告

    牧野昭二, 他

    日本音響学会誌   50 ( 9 ) 759-760  1994年09月  [査読有り]

  • 音声エコーキャンセラのための適応信号処理の研究

    牧野昭二

    日本音響学会誌     75  1994年01月  [査読有り]

  • ARMA modeling of a room transfer function at low frequencies

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda, Nobuo Koizumi

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   15 ( 5 ) 353 - 355  1994年  [査読有り]

    DOI

    Scopus

    8
    被引用数
    (Scopus)
  • A NEW RLS ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE RESPONSE

    S MAKINO, Y KANEDA

    ICASSP-94 - PROCEEDINGS, VOL 3   III   373 - 376  1994年  [査読有り]

  • マイクロプロセッサを用いたプログラム制御形音声スイッチの設計

    及川弘, 西野正和, 山森和彦, 牧野昭二

    電子情報通信学会論文誌   J77-B-I ( 1 ) 66-74  1994年01月  [査読有り]

     概要を見る

    Voice switch (VS) circuits are widely used for echo suppression and howling prevention in communication equipment that realizes hands-free loudspeaking calls with microphones and loudspeakers. Recently, they have also often been used together with echo cancellers to complement the canceller's performance. Kato et al. studied in detail the operating characteristics and design of VS circuits implemented with analog circuits, and proposed an automatic loss-switching VS (ALS) that automatically adapts to changes in the acoustic and sidetone characteristics to reduce the switching loss. However, because the ALS is built only with analog circuits, its design is complicated and it is extremely difficult to make the switching loss any smaller. This paper therefore proposes program-controlled voice switches that realize the ALS functions by program control on a microprocessor (μP) and thereby achieve superior speech performance: (1) Type A, which applies the control to the entire speech band, and (2) Type B, which divides the speech band and applies the control to each sub-band.

    CiNii

  • Common acoustical poles independent of sound directions and modeling of head-related transfer functions

    Yoichi Haneda, Shoji Makino, Yutaka Kaneda

    Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi)   15 ( 4 ) 277 - 279  1994年  [査読有り]

    DOI

    Scopus

    1
    被引用数
    (Scopus)
  • EXPONENTIALLY WEIGHTED STEP-SIZE PROJECTION ALGORITHM FOR ACOUSTIC ECHO CANCELERS

    S MAKINO, Y KANEDA

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E75A ( 11 ) 1500 - 1508  1992年11月  [査読有り]

     概要を見る

    This paper proposes a new adaptive algorithm for acoustic echo cancellers with four times the convergence speed of the normalized LMS (NLMS) for a speech input, at almost the same computational load. This algorithm reflects both the statistics of the variation of a room impulse response and the whitening of the received input signal. This algorithm, called the ESP (exponentially weighted step-size projection) algorithm, uses a different step size for each coefficient of an adaptive transversal filter. These step sizes are time-invariant and weighted proportionally to the expected variation of a room impulse response. As a result, the algorithm adjusts coefficients with large errors in large steps, and coefficients with small errors in small steps. The algorithm is based on the fact that the expected variation of a room impulse response becomes progressively smaller along the series by the same exponential ratio as the impulse response energy decay. This algorithm also reflects the whitening of the received input signal, i.e., it removes the correlation between consecutive received input vectors. This process is effective for speech, which has a highly non-white spectrum. A geometric interpretation of the proposed algorithm is derived and the convergence condition is proved. A fast projection algorithm is introduced to reduce the computational complexity and modified for a practical multiple-DSP structure so that it requires almost the same computational load, 2L multiply-add operations, as the conventional NLMS. The algorithm is implemented in an acoustic echo canceller constructed with multiple DSP chips, and its fast convergence is demonstrated.

  • MODELING OF A ROOM TRANSFER FUNCTION USING COMMON ACOUSTICAL POLES

    Y HANEDA, S MAKINO, Y KANEDA

    ICASSP-92 - 1992 INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5   II   B213 - B216  1992年  [査読有り]

  • Subband echo canceller with an exponentially weighted step size NLMS adaptive filter

    S. Makino, Y. Haneda

    IWAENC91 (International Workshop on Acoustic Echo and Noise Control)     109-120  1991年09月  [査読有り]

  • 1990年音響・音声・信号処理国際会議(ICASSP90)報告

    広瀬啓吉, 中川聖一, 谷口智彦, 牧野昭二

    日本音響学会誌   46 ( 10 ) 869-870  1990年10月  [査読有り]

    CiNii

  • 最近の電話の音響技術 - エコー制御技術 -

    島田正治, 牧野昭二

    テレビジョン学会誌   44 ( 3 ) 222-227  1990年03月  [査読有り]

    DOI CiNii

  • ACOUSTIC ECHO CANCELER ALGORITHM BASED ON THE VARIATION CHARACTERISTICS OF A ROOM IMPULSE RESPONSE

    S MAKINO, Y KANEDA

    ICASSP 90, VOLS 1-5     1133 - 1136  1990年  [査読有り]

  • Echo control in telecommunications

    Shoji Makino, Shoji Shimada

    Journal of the Acoustical Society of Japan (E)   11 ( 6 ) 309 - 316  1990年  [査読有り]

     概要を見る

    This paper reviews echo control techniques for telecommunications, emphasizing the principles and applications of both circuit and acoustic echo cancellers. First, echo generating mechanisms and echo problems are described for circuit and acoustic echoes. Circuit echo is caused by impedance mismatching in a hybrid coil. Acoustic echo is caused by acoustic coupling between loudspeakers and microphones in a room. The echo problem is severe when the round-trip propagation delay is long. In this case, the echo must be removed. Next, the basic principle of the echo canceller, adaptive filter structure and adaptive algorithm are discussed. Emphasis is focused on the construction and operation of an adaptive transversal filter using the NLMS (Normalized Least Mean Square) algorithm, which is the most popular for the echo canceller. Then, applications of circuit and acoustic echo cancellers are described. Circuit echo cancellers have been well studied and implemented in LSIs for many applications. Although acoustic echo cancellers have been introduced into audio teleconference systems, they still have some problems which must be solved. Therefore, they are now being studied intensely. Finally, this paper mentions the problems of echo cancellers and the direction of future work on them. The main targets for acoustic echo cancellers are improving the convergence speed, reducing the amount of hardware and bettering the double-talk control technique. © 1990, Acoustical Society of Japan. All rights reserved.

    DOI

    Scopus

    5
    被引用数
    (Scopus)
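
    As a concrete companion to the NLMS-based adaptive transversal filter discussed in the review above, here is a minimal numpy sketch of an acoustic echo canceller loop; the filter length, step size and regularization values are illustrative assumptions.

    import numpy as np

    def nlms_echo_canceller(x, d, L=512, mu=0.5, eps=1e-8):
        # x: far-end (loudspeaker) signal; d: microphone signal containing its echo
        w = np.zeros(L)                           # adaptive transversal filter
        e = np.zeros(len(d))                      # residual sent back to the far end
        for n in range(L, len(d)):
            x_buf = x[n - L:n][::-1]              # most recent L samples, newest first
            e[n] = d[n] - w @ x_buf               # subtract the echo replica from the mic
            w += mu * e[n] * x_buf / (x_buf @ x_buf + eps)   # NLMS coefficient update
        return e, w
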
  • Acoustic echo canceller algorithm based on room acoustic characteristics

    S. Makino, N. Koizumi

    WASPAA89 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics)   1 ( 1 ) 1-2  1989年10月  [査読有り]

  • Acoustic echo canceller with multiple echo paths

    Nobuo Koizumi, Shoji Makino, Hiroshi Oikawa

    Journal of the Acoustical Society of Japan (E)   10 ( 1 ) 39 - 45  1989年  [査読有り]

     概要を見る

    A new configuration of acoustic echo canceller for multiple microphone teleconferencing systems is proposed. It is designed for use with microphones whose gains switch or vary during teleconferencing according to the talker. This system requires memory for multiple echo paths, which enables the updating of filter coefficients when an echo path is changed due to the switching of the actuated microphone during talker alternation. In comparison to the single echo path model which uses only adaptation, this method maintains echo cancellation during abrupt changes of the echo path when the microphone alternates between talkers. Also in comparison to direct microphone output mixing, this method reduces the stationary residual echo level by the reduction of acoustic coupling. © 1989, Acoustical Society of Japan. All rights reserved.

    DOI

    Scopus

    3
    被引用数
    (Scopus)
  • エコーキャンセラの室内音場における適応特性の改善について

    牧野昭二, 小泉宣夫

    電子情報通信学会論文誌   J71-A ( 12 ) 2212-2214  1988年12月  [査読有り]

    CiNii

  • AUDIO TELECONFERENCING SET WITH MULTIPATH ECHO CANCELLER

    H OIKAWA, N KOIZUMI, S MAKINO

    REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES   36 ( 2 ) 217 - 223  1988年03月  [査読有り]

  • 複数反響路エコーキャンセラを用いた音声会議装置

    及川弘, 小泉宣夫, 牧野昭二

    研究実用化報告   37 ( 2 ) 191-197  1988年02月  [査読有り]

    CiNii

  • 周辺に段差を持つ圧電バイモルフ振動板の振動特性

    牧野昭二, 一ノ瀬裕

    日本音響学会誌   43 ( 3 ) 161-166  1987年03月  [査読有り]

    CiNii


書籍等出版物

  • Audio Source Separation

    Makino, Shoji( 担当: 単著)

    Springer International Publishing  2018年03月 ISBN: 9783319730318

  • Underdetermined blind source separation using acoustic arrays

    S. Makino, S. Araki, S. Winter, and H. Sawada( 担当: 単著)

    Wiley  2010年01月

  • Underdetermined blind source separation of convolutive mixtures by hierarchical clustering and L1-norm minimization

    S. Winter, W. Kellermann, H. Sawada, and S. Makino( 担当: その他)

    Springer  2007年09月

  • Frequency-domain blind source separation

    H. Sawada, S. Araki, and S. Makino( 担当: その他)

    Springer  2007年09月

  • K-means based underdetermined blind speech separation

    S. Araki, H. Sawada, and S. Makino( 担当: その他)

    Springer  2007年09月

  • Blind Speech Separation

    S. Makino, Te-Won Lee, H. Sawada( 担当: 編集)

    Springer  2007年09月 ISBN: 9781402064784

     概要を見る

    http://www.amazon.co.jp/Speech-Separation-Signals-Communication-Technology/dp/1402064780

  • Blind source separation of convolutive mixtures of audio signals in frequency domain

    S. Makino, H. Sawada, R. Mukai, and S. Araki( 担当: 単著)

    Springer  2006年05月

  • Speech Enhancement

    J. Benesty, S. Makino, J. Chen( 担当: 編集)

    Springer  2005年05月 ISBN: 354024039X

     概要を見る

    http://www.amazon.co.jp/Speech-Enhancement-Signals-Communication-Technology/dp/354024039X

  • Real-time blind source separation for moving speech signals

    R. Mukai, H. Sawada, S. Araki, and S. Makino( 担当: その他)

    Springer  2005年03月

  • Subband based blind source separation

    S. Araki, S. Makino( 担当: その他)

    Springer  2005年03月

  • Blind source separation of convolutive mixtures of speech

    S. Makino( 担当: 単著)

    Springer  2003年01月

  • IEICE Knowledge Base

    S.Makino( 担当: 分担執筆,  担当範囲: Blind audio source separation based on sparse component analysis)

    IEICE  2012年10月

  • 2011 IEEE REGION 10 CONFERENCE TENCON 2011

    Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji( 担当: 分担執筆,  担当範囲: Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source)

    IEEE  2011年01月 ISBN: 9781457702556

  • 2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS

    Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko( 担当: 分担執筆,  担当範囲: Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation)

    IEEE  2010年01月 ISBN: 9781424453092

  • 通信会議設備

    牧野, 昭二( 担当: 単著)

    フジ・テクノシステム  1999年10月

  • 音響エコーキャンセラのための適応信号処理の研究

    牧野, 昭二( 担当: 単著)

    1993年03月


講演・口頭発表等

  • Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

    Li, Li, Hirokazu, Kameoka, Makino, Shoji

    ICASSP   (Brighton, United Kingdom) 

    発表年月: 2019年05月

  • Joint separation and dereverberation of reverberant mixtures with multichannel variational autoencoder

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Shogo, Seki, Makino, Shoji

    ICASSP   (Brighton, United Kingdom) 

    発表年月: 2019年05月

  • Time-frequency-bin-wise switching of minimum variance distortionless response beamformer for underdetermined situations

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    ICASSP 2019   (Brighton, ENGLAND) 

    発表年月: 2019年05月

  • NEW ANALYTICAL UPDATE RULE FOR TDOA INFERENCE FOR UNDERDETERMINED BSS IN NOISY ENVIRONMENTS

    Maruyama, Takuro, Araki, Shoko, Nakatani, Tomohiro, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji, Nakamura, Atsushi

    IEEE International Conference on Acoustics, Speech and Signal Processing   (Kyoto, JAPAN) 

    発表年月: 2012年03月

  • Audio source separation based on independent component analysis

    S. Makino, H. Sawada  [招待有り]

    Tutorial at the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing  

    発表年月: 2007年04月

  • Study on geometrically constrained IVA with auxiliary function approach and VCD for in-car communication

    Goto, Kana, Li, Li, Takahashi, Riki, Makino, Shoji, Yamada, Takeshi

    APSIPA ASC 2020  

    発表年月: 2020年12月

  • Applying virtual microphones to triangular microphone array in in-car communication

    Segawa, Hanako, Takahashi, Riki, Jinzai, Ryoga, Makino, Shoji, Yamada, Takeshi

    APSIPA ASC 2020  

    発表年月: 2020年12月

  • 空間フィルタの自動推定による音響シーン識別の検討

    大野, 泰己, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会  

    発表年月: 2020年03月

  • Generative Adversarial Networks を用いた半教師あり学習の音響イベント検出への適用

    合馬, 一弥, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会  

    発表年月: 2020年03月

  • 発話の時間変動に着目した音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2020年03月

  • 空間特徴と音響特徴を併用する音響イベント検出の検討

    陳, 軼夫, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2020年03月

  • 車室内コミュニケーション用低遅延音源分離の検討

    上田, 哲也, 井上, 翔太, 牧野, 昭二, 松本, 光雄, 山田, 武志

    日本音響学会春季研究発表会  

    発表年月: 2020年03月

  • DNNマスク推定に基づく畳み込みビームフォーマによる音源分離・残響除去・雑音除去の同時実現

    髙橋理希, 中谷智広, 落合翼, 木下慶介, 池下林太郎, Marc, Delcroix, 荒木章子, 牧野, 昭二

    日本音響学会2020年春季研究発表会  

    発表年月: 2020年03月

  • 基底共有型半教師あり独立低ランク行列分析に基づく多チャネル補聴器システム

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    日本音響学会2020年春季研究発表会  

    発表年月: 2020年03月

  • Spatial feature extraction based on convolutional neural network with multiple microphone inputs for monitoring of domestic activities

    Kaneko, Yuki, Kurosawa, Rika, Yamada, Takeshi, Makino, Shoji

    NCSP'20  

    発表年月: 2020年02月

  • Underdetermined multichannel speech enhancement using time-frequency-bin-wise switching beamformer and gated CNN-based time-frequency mask for reverberant environments

    Takahashi, Riki, Yamaoka, Kouei, Li, Li, Makino, Shoji, Yamada, Takeshi, Matsumoto, Mitsuo

    NCSP'20  

    発表年月: 2020年02月

  • Blind source separation with low-latency for in-car communication

    Ueda, Tetsuya, Inoue, Shota, Makino, Shoji, Matsumoto, Mitsuo, Yamada, Takeshi

    NCSP'20  

    発表年月: 2020年02月

  • 多チャンネル変分自己符号化器法による任意話者の音源分離

    李莉, 亀岡弘和, 井上翔太, 牧野, 昭二

    電子情報通信学会  

    発表年月: 2019年12月

  • Evaluation of multichannel hearing aid system by rank-constrained spatial covariance matrix estimation

    Une, Masakazu, Kubo, Yuki, Takamune, Norihiro, Kitamura, Daichi, Saruwatari, Hiroshi, Makino, Shoji

    APSIPA   (Lanzhou, China) 

    発表年月: 2019年11月

  • Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    APSIPA ASC 2019   (Lanzhou, PEOPLES R CHINA) 

    発表年月: 2019年11月

  • Improving singing aid system for laryngectomees with statistical voice conversion and VAE-SPACE

    Li, Li, Toda, Tomoki, Morikawa, Kazuho, Kobayashi, Kazuhiro, Makino, Shoji

    ISMIR   (Delft, The Netherlands) 

    発表年月: 2019年11月

  • Joint separation, dereverberation and classification of multiple sources using multichannel variational autoencoder with auxiliary classifier

    Shota, Inoue, Hirokazu, Kameoka, Li, Li, Makino, Shoji

    ICA   (AACHEN, GERMANY) 

    発表年月: 2019年09月

  • Gated convolutional neural network-based voice activity detection under high-level noise environments

    Li, Li, Kouei, Yamaoka, Yuki, Koshino, Mitsuo, Matsumoto, Makino, Shoji

    ICA   (AACHEN, GERMANY) 

    発表年月: 2019年09月

  • BLSTMと変調スペクトルを用いた発話特徴識別の検討

    サントソ, ジェニファー, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2019年09月

  • BLSTMを用いた音声認識誤り区間推定の検討

    舒, 禹清, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2019年09月

  • Wavelength proportional arrangement of virtual microphones based on interpolation/extrapolation for underdetermined speech enhancement

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Makino, Shoji, Yamada, Takeshi

    EUSIPCO 2019  

    発表年月: 2019年09月

  • CNN-based virtual microphone signal estimation for MPDR beamforming in underdetermined situations

    Yamaoka, Kouei, Li, Li, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    EUSIPCO 2019  

    発表年月: 2019年09月

  • ランク制約付き空間共分散モデル推定を用いた多チャネル補聴器システムの評価

    宇根昌和, 久保優騎, 高宗典玄, 北村大地, 猿渡洋, 牧野, 昭二

    日本音響学会2019年秋季研究発表会  

    発表年月: 2019年09月

  • 日本語スピーキングテストにおける解答発話テキストの分散表現を用いた自動採点の検討

    臼井, 桃香, 山田, 武志, 牧野, 昭二

    電子情報通信学会総合大会  

    発表年月: 2019年03月

  • MVDRビームフォーマの時間周波数スイッチングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    電子情報通信学会音声研究会  

    発表年月: 2019年03月

  • 時間周波数スイッチングビームフォーマとGated CNNを用いた時間周波数マスクの組み合わせによる劣決定音声強調

    髙橋, 理希, 山岡, 洸瑛, 李, 莉, 牧野, 昭二, 山田, 武志

    日本音響学会2019年春季研究発表会  

    発表年月: 2019年03月

  • Experimental evaluation of WaveRNN predictor for audio lossless coding

    Amada, Shota, Sugiura, Ryosuke, Kamamoto, Yutaka, Harada, Noboru, Moriya, Takehiro, Yamada, Takeshi, Makino, Shoji

    NCSP'19  

    発表年月: 2019年03月

  • Noise suppression using beamformer and transfer-function-gain nonnegative matrix factorization with distributed stereo microphones

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    NCSP'19  

    発表年月: 2019年03月

  • Categorizing error causes related to utterance characteristics in speech recognition

    Santoso, Jennifer, Yamada, Takeshi, Makino, Shoji

    NCSP'19  

    発表年月: 2019年03月

  • Performance evaluation of time-frequency-bin-wise switching beamformer in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    NCSP'19  

    発表年月: 2019年03月

  • 音源クラス識別器つき多チャンネル変分自己符号化器を用いた高速セミブラインド音源分離

    李, 莉, 亀岡, 弘和, 牧野, 昭二

    日本音響学会2019年春季研究発表会  

    発表年月: 2019年03月

  • Gated CNNを用いた劣悪な雑音環境下における音声区間検出

    牧野, 昭二, 李莉, 越野ゆき, 松本光雄

    電子情報通信学会  

    発表年月: 2019年03月

  • 多チャンネル変分自己符号化器を用いた音源分離と残響除去の統合的アプローチ

    井上翔太, 亀岡弘和, 李莉, 関翔悟, 牧野, 昭二

    日本音響学会2019年春季研究発表会  

    発表年月: 2019年03月

  • Microphone position realignment by extrapolation of virtual microphone

    Jinzai, Ryoga, Yamaoka, Kouei, Matsumoto, Mitsuo, Yamada, Takeshi, Makino, Shoji

    10th Asia-Pacific-Signal-and-Information-Processing-Association Annual Summit and Conference (APSIPA ASC)   (Honolulu, HI) 

    発表年月: 2018年11月

  • Weakly labeled learning using BLSTM-CTC for sound event detection

    Matsuyoshi, Taiki, Komatsu, Tatsuya, Kondo, Reishi, Yamada, Takeshi, Makino, Shoji

    APSIPA ASC 2018  

    発表年月: 2018年11月

  • 時間周波数スイッチングビームフォーマと時間周波数マスキングによる劣決定音声強調

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • Time-frequency-bin-wise beamformer selection and masking for speech enhancement in underdetermined noisy scenarios

    Yamaoka, Kouei, Brendel, Andreas, Ono, Nobutaka, Makino, Shoji, Buerger, Michael, Yamada, Takeshi, Kellermann, Walter

    EUSIPCO 2018   (Rome, ITALY) 

    発表年月: 2018年09月

  • Online integration of DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming

    Matsui, Yutaro, Nakatani, Tomohiro, Delcroix, Marc, Kinoshita, Keisuke, Ito, Nobutaka, Araki, Shoko, Makino, Shoji

    IWAENC2018  

    発表年月: 2018年09月

  • WaveRNNを利用した音声ロスレス符号化に関する検討と考察

    天田, 将太, 杉浦, 亮介, 鎌本, 優, 原田, 登, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • ヴァーチャルマイクロフォンの外挿によるマイクロフォン間隔の仮想的拡張

    陣在, 遼河, 山岡, 洸瑛, 松本, 光雄, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習法の有効性評価

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会秋季研究発表会  

    発表年月: 2018年09月

  • 複数種録音端末を用いた会議の想定における伝達関数ゲイン基底NMFによる遠方音源抑圧の性能評価

    松井, 裕太郎, 牧野, 昭二, 小野, 順貴, 山田, 武志

    電子情報通信学会信号処理研究会  

    発表年月: 2018年03月

  • 音響イベント検出におけるBLSTM-CTCを用いた弱ラベル学習の検討

    松吉, 大輝, 小松, 達也, 近藤, 玲史, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • 複数ビームフォーマの組み合わせによる非線形マイクロフォンアレイ

    山岡, 洸瑛, 小野, 順貴, 牧野, 昭二, 山田, 武志

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • 音声認識における誤認識原因通知のための印象評定値推定の検討

    後藤, 孝宏, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • 畳み込みニューラルネットワークを用いた空間特徴抽出に基づく音響シーン識別の検討

    高橋, 玄, 山田, 武志, 牧野, 昭二

    日本音響学会春季研究発表会  

    発表年月: 2018年03月

  • Novel speech recognition interface based on notification of utterance volume required in changing noisy environment

    Goto, Takahiro, Yamada, Takeshi, Makino, Shoji

    NCSP'18  

    発表年月: 2018年03月

  • Acoustic scene classification based on spatial feature extraction using convolutional neural networks

    Takahashi, Gen, Yamada, Takeshi, Makino, Shoji

    NCSP'18  

    発表年月: 2018年03月

  • Ego noise reduction and sound source localization adapted to human ears using hose-shaped rescue robot

    Mae, Narumi, Yamaoka, koei, Mitsui, Yosiki, Matsumoto, Mitsuo, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    NCSP'18  

    発表年月: 2018年03月

  • Ego-noise reduction for hose-shaped rescue robot using basis-shared semi-supervised independent low-rank matrix analysis

    Takakusaki, Moe, Kitamura, Daichi, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi, Saruwatari, Hiroshi

    NCSP'18  

    発表年月: 2018年03月

  • Abnormal sound detection by two microphones using virtual microphone technique

    Yamaoka, Kouei, Ono, Nobutaka, Makino, Shoji, Yamada, Takeshi

    APSIPA 2017   (Kuala Lumpur, MALAYSIA) 

    発表年月: 2017年12月

  • Sound source localization using binaural difference for hose-shaped rescue robot

    Mae, Narumi, Mitsui, Yoshiki, Makino, Shoji, Kitamura, Daichi, Ono, Nobutaka, Yamada, Takeshi, Saruwatari, Hiroshi

    APSIPA 2017   (Kuala Lumpur, MALAYSIA) 

    発表年月: 2017年12月

  • Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic Features

    Takahashi, Gen, Yamada, Takeshi, Ono, Nobutaka, Makino, Shoji

    APSIPA 2017   (Kuala Lumpur, MALAYSIA) 

    発表年月: 2017年12月

  • Experimental evaluation of encoding parameters of MPEG-4 ALS for high-resolution audio

    Amada, Shota, Kamamoto, Yutaka, Harada, Noboru, Sugiura, Ryosuke, Moriya, Takehiro, Makino, Shoji, Yamada, Takeshi

    IEEE GCCE 2017   (Nagoya, JAPAN) 

    発表年月: 2017年10月

  • Mel-generalized cepstral regularization for discriminative non-negative matrix factorization

    Li, Li, Kameoka, Hirokazu, Makino, Shoji

    MLSP   (Tokyo, Japan) 

    発表年月: 2017年09月

  • Multiple far noise suppression in a real environment using transfer-function-gain NMF

    Matsui, Yutaro, Makino, Shoji, Ono, Nobutaka, Yamada, Takeshi

    EUSIPCO 2017   (GREECE) 

    発表年月: 2017年08月

  • Performance evaluation of nonlinear speech enhancement based on virtual increase of channels in reverberant environments

    Yamaoka, Kouei, Ono, Nobutaka, Yamada, Takeshi, Makino, Shoji

    EUSIPCO 2017   (GREECE) 

    発表年月: 2017年08月

  • Speech enhancement using non-negative spectrogram models with mel-generalized cepstral regularization

    Li, Li, Kameoka, Hirokazu, Toda, Tomoki, Makino, Shoji

    Interspeech   (Stockholm, Sweden) 

    発表年月: 2017年08月

  • Convolutional neural network architecture and input volume matrix design for ERP classifications in a tactile P300-based brain-computer interface

    Kodama, Takumi, Makino, Shoji

    IEEE Engineering in Medicine & Biology Society (EMBC)   (Jeju Island, Korea) 

    発表年月: 2017年07月

  • 柔軟索状ロボットにおける独立低ランク行列分析と統計的音声強調に基づく高品質ブラインド音源分離の開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, 牧野昭二

    日本機械学会ロボティクス・メカトロニクス講演会  

    発表年月: 2017年05月

  • SJ-CATにおける項目応答理論に基づく能力値推定の精度改善

    小野, 友暉, 山田, 武志, 菊地, 賢一, 今井, 新悟, 牧野, 昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • 音響ロスレス符号化MPEG-4 ALSのハイレゾ音源適応の検討と考察

    天田, 将太, 鎌本, 優, 原田, 登, 杉浦, 亮介, 守谷, 健弘, 山田, 武志, 牧野, 昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • DNN-GMMと連結特徴量を用いた音響シーン識別の検討

    高橋, 玄, 山田, 武志, 小野, 順貴, 牧野, 昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • Discriminative non-negative matrix factorization with majorization-minimization

    Li, L, Kameoka, H, Makino, Shoji

    HSCMA   (San Francisco, CA) 

    発表年月: 2017年03月

  • 補助関数法による識別的NMFの基底学習アルゴリズム

    李莉, 亀岡弘和, 牧野昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • 独立低ランク行列分析と統計的音声強調を用いた柔軟索状ロボットにおけるブラインド音源分離システムの開発

    三井祥幹, 溝口聡, 猿渡洋, 越智景子, 北村大地, 小野順貴, 石村大, 前成美, 高草木萌, 松井裕太郎, 山岡洸瑛, 牧野昭二

    日本音響学会2017年春季研究発表会  

    発表年月: 2017年03月

  • Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation

    Mae, N, Ishimura, M, Makino, Shoji, Kitamura, D, Ono, N, Yamada, T, Saruwatari, H

    13th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)   (Grenoble Alpes Univ, Grenoble, FRANCE) 

    発表年月: 2017年02月

  • Analysis of the brain activated distributions in response to full-body spatial vibrotactile stimuli using a tactile P300-based BCI paradigm

    Kodama, T, Makino, Shoji

    Biomedical and Health Informatics (BHI)  

    発表年月: 2017年02月

  • Performance estimation of spontaneous speech recognition using non-reference acoustic features

    Ling,Guo, Takeshi,Yamada, Shoji,Makino

    APSIPA2016   (Jeju, SOUTH KOREA) 

    発表年月: 2016年12月

  • Full-body tactile P300-based brain-computer interface accuracy refinement

    Kodama, T, Shimizu, K, Makino, Shoji, Rutkowski, T

    International Conference on Bio-engineering for Smart Technologies (BioSMART)  

    発表年月: 2016年12月

  • Tactile brain-computer interface using classification of P300 responses evoked by full body spatial vibrotactile stimuli

    Kodama, T, Makino, Shoji, Rutkowski, T

    APSIPA  

    発表年月: 2016年12月

  • Visual motion onset augmented reality brain-computer interface

    Shimizu, K, Kodama, T, Makino, Shoji, Rutkowski, T

    International Conference on Bio-engineering for Smart Technologies (BioSMART)  

    発表年月: 2016年12月

  • 伝達関数ゲイン基底NMFを用いた遠方雑音抑圧の実環境での評価

    松井,裕太郎, 牧野,昭二, 小野,順貴, 山田,武志

    第31回信号処理シンポジウム  

    発表年月: 2016年11月

  • 雑音下音声認識における必要発話音量提示機能の実装と評価

    後藤,孝宏, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • 日本語スピーキングテストSJ-CATにおける項目応答理論に基づく能力値推定の検証

    小野,友暉, 山田,武志, 菊地,賢一, 今井,新悟, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • ノンリファレンス特徴量を用いた自然発話音声認識の性能推定の検討

    郭,レイ, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • ヴァーチャル多素子化に基づくSN比最大化ビームフォーマの残響に対する性能変化

    山岡,洸瑛, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会秋季研究発表会  

    発表年月: 2016年09月

  • Ego-noise reduction for a hose-shaped rescue robot using determined Rank-1 multichannel nonnegative matrix factorization

    Moe,Takakusaki, Daichi,Kitamura, Nobutaka,Ono, Takeshi,Yamada, Shoji,Makino, Hiroshi,Saruwatari

    IWAENC2016  

    発表年月: 2016年09月

  • Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot

    Masaru,Ishimura, Shoji,Makino, Takeshi,Yamada, Nobutaka,Ono, Hiroshi,Saruwatari

    IWAENC2016   (Xian, PEOPLES R CHINA) 

    発表年月: 2016年09月

  • Multi-talker speech recognition based on blind source separation with ad hoc microphone array using smartphones and cloud storage

    Ochi, K, Ono, N, Miyabe, S, Makino, Shoji

    Interspeech   (San Francisco, CA) 

    発表年月: 2016年09月

  • Acoustic scene classification using deep neural network and frame-concatenated acoustic feature

    Gen, Takahashi, Takeshi, Yamada, Shoji, Makino, Nobutaka, Ono

    Detection and Classification of Acoustic Scenes and Events  

    発表年月: 2016年09月

  • Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation

    Saruwatari, H, Takata, K, Ono, N, Makino, Shoji  [招待有り]

    International Congress on Acoustics  

    発表年月: 2016年09月

  • Toward a QoL improvement of ALS patients: Development of the full-body P300-based tactile brain-computer interface

    Kodama, T, Makino, Shoji, Rutkowski, T

    AEARU Young Researchers International Conference  

    発表年月: 2016年09月

  • 音声のスペクトル領域とケプストラム領域における同時強調

    李莉, 亀岡弘和, 樋口卓哉, 猿渡洋, 牧野昭二

    信学技報 EA2014-75  

    発表年月: 2016年08月

  • 独立ベクトル分析とノイズキャンセラを用いた雑音抑圧の柔軟索状ロボットへの適用

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    日本機械学会ロボティクス・メカトロニクス講演会2016  

    発表年月: 2016年06月

  • Vehicle counting and lane estimation with ad-hoc microphone array in real road environments

    Takuya,Toyoda, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    NCSP'16  

    発表年月: 2016年03月

  • ランク1空間モデル制約付き多チャネルNMFを用いた雑音抑圧の柔軟索状ロボットへの適用

    高草木,萌, 北村,大地, 小野,順貴, 山田,武志, 牧野,昭二, 猿渡,洋

    電子情報通信学会総合大会  

    発表年月: 2016年03月

  • 振幅のみからの相関推定と雑音尖度に基づく空間サブトラクションアレーの減算係数最適化

    李,傑, 宮部,滋樹, 小野,順貴, 山田,武志, 牧野,昭二

    日本音響学会2016年春季研究発表会  

    発表年月: 2016年03月

  • 独立ベクトル分析とノイズキャンセラを用いた柔軟索状ロボットにおける雑音抑圧

    石村,大, 牧野,昭二, 山田,武志, 小野,順貴, 猿渡,洋

    電子情報通信学会総合大会  

    発表年月: 2016年03月

  • 教師あり多チャネルNMFと統計的音声強調を用いた柔軟索状ロボットにおける音源分離

    高田一真, 北村大地, 中嶋広明, 小山翔一, 猿渡洋, 小野順貴, 牧野,昭二

    日本音響学会2016年春季研究発表会  

    発表年月: 2016年03月

  • ランク1 空間モデル制約付き多チャネルNMFを用いた柔軟索状ロボットにおける雑音抑圧

    高草木萌, 北村大地, 小野順貴, 山田武志, 牧野昭二, 猿渡洋

    日本機械学会ロボティクス・メカトロニクス講演会  

    発表年月: 2016年03月

  • 非同期分散マイクロホンによるブラインド音源分離を用いた複数話者同時音声認識

    越智景子, 小野順貴, 宮部滋樹, 牧野,昭二

    日本音響学会2016年春季研究発表会  

    発表年月: 2016年03月

  • SVM classification study of code-modulated visual evoked potentials

    D.,Aminaka, S.,Makino, T.M.,Rutkowski

    APSIPA   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • Diffuse noise suppression with asynchronous microphone array based on amplitude additivity model

    Yoshikazu,Murase, Hironobu,Chiba, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    APSIPA2015   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • Fingertip stimulus cue-based tactile brain-computer interface

    H.,Yajima, S.,Makino, T.M.,Rutkowski

    APSIPA   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • Variable sound elevation features for head-related impulse response spatial auditory BCI

    C.,Nakaizumi, S.,Makino, T.M.,Rutkowski

    APSIPA   (PEOPLES R CHINA Hong Kong) 

    発表年月: 2015年12月

  • EEG filtering optimization for code-modulated chromatic visual evoked potential-based brain-computer interface

    D.,Aminaka, S.,Makino, T.M.,Rutkowski

    International Symbiotic Workshop (SYMBIOTIC)  

    発表年月: 2015年10月

  • 日本語スピーキングテストSJ-CATにおける低スコア解答発話の検出の検討

    小野,友暉, 山田,武志, 今井,新悟, 牧野,昭二

    日本音響学会2015年秋季研究発表会  

    発表年月: 2015年09月

  • ノンリファレンスひずみ特徴量を用いた雑音下音声認識性能推定の検討

    郭,レイ, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会2015年秋季研究発表会  

    発表年月: 2015年09月

  • Classification Accuracy Improvement of Chromatic and High-Frequency Code-Modulated Visual Evoked Potential-Based BCI

    Aminaka,Daiki, Makino,Shoji, Rutkowski, Tomasz M

    8th International Conference on Brain Informatics and Health (BIH)   (Royal Geog Soc, London, ENGLAND) 

    発表年月: 2015年08月

  • Estimating correlation coefficient between two complex signals without phase observation

    S.,Miyabe, N.,Ono, Makino,Shoji

    LVA/ICA  

    発表年月: 2015年08月

  • Chromatic and high-frequency cVEP-based BCI paradigm

    Aminaka,Daiki, Makino,Shoji, Rutkowski, Tomasz M

    Engineering in Medicine and Biology Conference (EMBC)  

    発表年月: 2015年08月

  • Head-related impulse response cues for spatial auditory brain-computer interface

    C.,Nakaizumi, S.,Makino, T.M.,Rutkowski

    Engineering in Medicine and Biology Conference (EMBC)  

    発表年月: 2015年08月

  • マイクロホンアレーの位相が観測できない条件でのチャネル間の相関係数の推定

    宮部滋樹, 小野順貴, 牧野,昭二

    回路とシステムワークショップ  

    発表年月: 2015年08月

  • Inter-stimulus interval study for the tactile point-pressure brain-computer interface

    K.,Shimizu, Makino,Shoji, T.M.,Rutkowski

    Engineering in Medicine and Biology Conference (EMBC)  

    発表年月: 2015年08月

  • ステレオ録音に基づく移動音源モデルによる走行車両検出と走行方向推定

    遠藤,純基, 豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • 総合品質と明瞭性の客観推定に基づくスペクトルサブトラクションの減算係数の最適化

    中里,徹, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • 非同期分散マイクロフォンアレーによる伝達関数ゲイン基底NMFを用いた拡散雑音抑圧

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • ケプストラム距離とSMR-パープレキシティを用いた雑音下音声認識の性能推定の検討

    郭,レイ, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • 2つの超ガウス性複素信号の位相観測を用いない相関係数推定

    宮部滋樹, 小野順貴, 牧野, 昭二

    信学技報EA2014-75  

    発表年月: 2015年03月

  • Spatial auditory BCI spellers using real and virtual surround sound systems

    M.,Chang, C.,Nakaizumi, K.,Mori, Makino,Shoji, T.M.,Rutkowski

    Conference on Systems Neuroscience and Rehabilitation (SNR2015)  

    発表年月: 2015年03月

  • 認識性能予測に基づく雑音環境下音声認識のユーザビリティ改善の検討

    青木,智充, 山田,武志, 宮部滋樹, 牧野昭二, 北脇信彦

    日本音響学会2015年春季研究発表会  

    発表年月: 2015年03月

  • On microphone arrangement for multichannel speech enhancement based on nonnegative matrix factorization in time-channel domain

    Yoshikazu,Murase, Hironobu,Chiba, Nobutaka,Ono, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    APSIPA 2014  

    発表年月: 2014年12月

  • 絶対値の観測のみを用いた2つの複素信号の相関係数推定

    宮部滋樹, 小野順貴, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • ケプストラム距離を用いた雑音下音声認識の性能推定の検討

    郭,翎, 山田,武志, 宮部,滋樹, 牧野,昭二, 北脇,信彦

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • 伝達関数ゲイン基底NMFにおけるマイク数・マイク配置と目的音強調性能の関係

    村瀬,慶和, 千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • βダイバージェンスに基づく一般化振幅補間によるヴァーチャル多素子化を用いた目的音源強調

    片平,拓希, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • 分散型マイクロホンアレイを用いた交通車両検出とその車線推定の検討

    豊田,卓矢, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • AMPLITUDE-BASED SPEECH ENHANCEMENT WITH NONNEGATIVE MATRIX FACTORIZATION FOR ASYNCHRONOUS DISTRIBUTED RECORDING

    Chiba, Hironobu, Ono, Nobutaka, Miyabe, Shigeki, Takahashi, Yu, Yamada, Takeshi, Makino, Shoji

    14th International Workshop on Acoustic Signal Enhancement (IWAENC)   (Antibes, FRANCE) 

    発表年月: 2014年09月

  • Multi-stage declipping of clipping distortion based on length classification of clipped interval

    Chenlei,Li, Shigeki,Miyabe, Takeshi,Yamada, Shoji,Makino

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • 教師なし伝達関数ゲイン基底NMFによる目的音強調における罰則項の特性評価

    千葉,大将, 小野,順貴, 宮部,滋樹, 山田,武志, 牧野,昭二

    日本音響学会研究発表会  

    発表年月: 2014年09月

  • M2Mを用いた大規模データ収集システムの構築に関する研究

    牧野,昭二

    情報処理学会研究報告 計算機アーキテクチャ研究会(ARC)  

    発表年月: 2013年12月

  • VIRTUALLY INCREASING MICROPHONE ARRAY ELEMENTS BY INTERPOLATION IN COMPLEX-LOGARITHMIC DOMAIN

    Katahira, Hiroki, Ono, Nobutaka, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    21st European Signal Processing Conference (EUSIPCO)   (Marrakesh, MOROCCO) 

    発表年月: 2013年09月

  • 非同期録音ブラインド同期のための線形位相補償の効率的最尤解探索

    宮部滋樹, 小野順貴, 牧野昭二  [招待有り]

    音講論集___2-10-4_  

    発表年月: 2013年03月

  • 複素対数補間によるヴァーチャル観測に基づく劣決定条件での音声強調

    片平拓希, 小野順貴, 宮部滋樹, 山田武志, 牧野昭二  [招待有り]

    音講論集___2-10-6_  

    発表年月: 2013年03月

  • 日本語スピーキングテストSCATにおける文読み上げ・文生成問題の自動採点手法の改良

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___1-Q-52a_465-468  

    発表年月: 2013年03月

  • 楽音符号化品質に影響を及ぼす楽音信号の特徴量の検討

    松浦嶺, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-Q-11c_401-404  

    発表年月: 2013年03月

  • ACELPにおけるピッチシャープニングの特性評価

    千葉大将, 守谷健弘, 鎌本優, 原田登, 宮部滋樹, 山田武志, 牧野昭二  [招待有り]

    音講論集___1-7-18_  

    発表年月: 2013年03月

  • 身体機能の統合による音楽情動コミュニケーションモデル

    寺澤洋子, 星-芝, 玲子, 柴山拓郎, 大村英史, 古川聖, 牧野, 昭二, 岡ノ谷一夫

    認知科学  

    発表年月: 2013年

  • AUTOMATIC SCORING METHOD CONSIDERING QUALITY AND CONTENT OF SPEECH FOR SCAT JAPANESE SPEAKING TEST

    Okubo, Naoko, Yamahata, Yuto, Yamada, Takeshi, Imai, Shingo, Ishizuka, Kenkichi, Shinozaki, Takahiro, Nisimura, Ryuichi, Makino, Shoji, Kitawaki, Nobuhiko

    International Conference on Speech Database and Assessments (Oriental COCOSDA)   (11 Macau, PEOPLES R CHINA) 

    発表年月: 2012年12月

  • 日本語スピーキングテストにおける文生成問題の自動採点の検討

    大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___3-Q-16_395-396  

    発表年月: 2012年09月

  • ミュージカルノイズを考慮した雑音抑圧音声のFR型客観品質評価の検討

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦

    音講論集___3-P-5_127-130  

    発表年月: 2012年09月

  • 身体動作の連動性理解にむけた筋活動可聴化

    松原正樹, 寺澤洋子, 門根秀樹, 鈴木健嗣, 牧野昭二  [招待有り]

    音講論集___2-10-2_  

    発表年月: 2012年09月

  • 非同期録音信号の線形位相補償によるブラインド同期と音源分離への応用

    宮部滋樹, 小野順貴, 牧野昭二  [招待有り]

    音講論集___3-9-8_  

    発表年月: 2012年09月

  • 日本語スピーキングテストにおける文章読み上げ問題の自動採点の検討

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    音講論集___3-Q-18_399-400  

    発表年月: 2012年09月

  • コヒーレンス解析による定常状態誘発反応の可聴化

    加庭輝明, 寺澤洋子, 松原正樹, T.M.,Rutkowski, 牧野昭二

    音講論集___2-10-2_919-922  

    発表年月: 2012年09月

  • 多チャンネルウィーナーフィルタを用いた音源分離における観測モデルの調査

    坂梨龍太郎, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___1-P-14,_757-760  

    発表年月: 2012年09月

  • 混合DOA モデルに基づく多チャンネル複素NMF による劣決定BSS

    武田和馬, 亀岡弘和, 澤田宏, 荒木章子, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___2-1-9_747-750  

    発表年月: 2012年03月

  • 日本語スピーキングテストにおける文生成問題の採点に影響を及ぼす要因の検討

    大久保梨思子, 山畑勇人, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    信学総大___D-14-9_193  

    発表年月: 2012年03月

  • 日本語スピーキングテストにおける文章読み上げ問題の採点に影響を及ぼす要因の検討

    山畑勇人, 大久保梨思子, 山田武志, 今井新悟, 石塚賢吉, 篠崎隆宏, 西村竜一, 牧野昭二, 北脇信彦

    信学総大___D-14-8_192  

    発表年月: 2012年03月

  • 雑音抑圧音声の主観品質評価におけるミュージカルノイズの影響

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦  [招待有り]

    信学総大___D-14-1_185  

    発表年月: 2012年03月

  • 音響モデルの精度を考慮した雑音下音声認識の性能推定の検討

    高岡隆守, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-P-13_149-150  

    発表年月: 2012年03月

  • 短時間雑音特性に基づく雑音下音声認識の性能推定の検討

    森下恵里, 山田武志, 牧野昭二, 北脇信彦

    音講論集___1-P-14_151-152  

    発表年月: 2012年03月

  • フルランク空間相関行列モデルに基づく拡散性雑音除去

    礒佳樹, 荒木章子, 牧野昭二, 中谷智広, 澤田宏, 山田武志, 宮部滋樹, 中村篤

    信学総大___A-10-9_194  

    発表年月: 2012年03月

  • 音量差に基づく音像生成における個人適応手法の有効性検証

    天野成祥, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-Q-1_895-898  

    発表年月: 2012年03月

  • 高次相関を用いた非線形MUSIC による高分解能方位推定

    杉本侑哉, 宮部滋樹, 山田武志, 牧野昭二

    音講論集___3-1-6_763-766  

    発表年月: 2012年03月

  • 時間周波数領域におけるグリッド間の整合性に基づくクリッピングの除去

    三浦晋, 宮部滋樹, 山田武志, 牧野昭二, 中島弘史, 中臺一博

    音講論集___1-Q-10_843-846  

    発表年月: 2012年03月

  • Underdetermined BSS With Multichannel Complex NMF Assuming W-Disjoint Orthogonality of Source

    Takeda, Kazuma, Kameoka, Hirokazu, Sawada, Hiroshi, Araki, Shoko, Miyabe, Shigeki, Yamada, Takeshi, Makino, Shoji

    IEEE Region 10 Conference on TENCON   (INDONESIA) 

    発表年月: 2011年11月

  • Restoration of Clipped Audio Signal Using Recursive Vector Projection

    Miura, Shin, Nakajima, Hirofumi, Miyabe, Shigeki, Makino, Shoji, Yamada, Takeshi, Nakadai, Kazuhiro

    IEEE Region 10 Conference on TENCON   (INDONESIA) 

    発表年月: 2011年11月

  • 周波数依存の時間差モデルによる劣決定BSS

    丸山卓郎, 荒木章子, 中谷智広, 宮部滋樹, 山田武志, 牧野昭二, 中村篤

    信学技報___EA2011-86_25-30  

    発表年月: 2011年11月

  • 発話の連続性に基づいた音声信号の分類による会議音声の可視化

    加藤通朗, 杉本侑哉, 宮部滋樹, 牧野昭二, 山田武志, 北脇信彦

    音講論集___3-P-20_197-200  

    発表年月: 2011年09月

  • 雑音抑圧音声の総合品質推定モデルの改良とその客観品質評価への適用

    藤田悠希, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-Q-23_127-130  

    発表年月: 2011年09月

  • スピーカ間の音量差に基づく音像生成手法における個人適応の検討

    天野成祥, 山田武志, 牧野昭二, 北脇信彦

    音講論集___2-4-10_661-664  

    発表年月: 2011年09月

  • 楽音と音声の双方に適用できる客観品質評価法の検討

    三上, 雄一郎, 山田, 武志, 牧野, 昭二, 北脇, 信彦

    信学総大___B-11-19_448  

    発表年月: 2011年03月

  • 雑音抑圧音声の客観品質評価に用いる総合品質推定モデルの改良

    藤田, 悠希, 山田, 武志, 牧野, 昭二

    信学総大___B-11-18_447  

    発表年月: 2011年03月

  • スペクトル変形同定の聴覚トレーニングにおける適応的フィードバックの影響

    加庭, 輝明, 金, 成英, 寺澤, 洋子, 伊藤, 寿浩, 池田, 雅弘, 山田, 武志, 牧野, 昭二

    音講論集___2-1-1_1003-1006  

    発表年月: 2011年03月

  • クリッピングした音響信号の修復

    三浦, 晋, 中島, 弘史, 牧野, 昭二, 山田, 武志, 中臺, 一博

    音講論集___3-P-53(d)_941-944  

    発表年月: 2011年03月

  • 空間スペクトルを用いた時間断続信号の検出における主成分分析と周波数分析の比較評価

    加藤, 通朗, 杉本, 侑哉, 牧野, 昭二, 山田, 武志, 北脇, 信彦

    音講論集___3-P-8(d)_879-880  

    発表年月: 2011年03月

  • 空間スペクトルへの周波数分析の適用による時間断続信号の検出

    杉本, 侑哉, 加藤, 通朗, 牧野, 昭二, 山田, 武志

    音講論集___3-P-7(c)_877-878  

    発表年月: 2011年03月

  • 高残響下で混合された音声の音源分離に関する研究

    礒, 佳樹, 荒木, 章子, 牧野, 昭二, 中谷, 智広, 澤田, 宏, 山田, 武志, 中村, 篤

    音講論集___1-9-13_643-646  

    発表年月: 2011年03月

  • 音源のW-DO性を仮定した多チャンネル複素NMFによる劣決定BSS

    武田, 和馬, 亀岡, 弘和, 澤田, 宏, 荒木, 章子, 山田, 武志, 牧野, 昭二

    音講論集___1-Q-19(f)_801-804  

    発表年月: 2011年03月

  • 視覚障がい者のタッチパネル操作支援のための音像生成手法の検討

    天野, 成祥, 山田, 武志, 牧野, 昭二

    音講論集___3-P-7(c)_877-878  

    発表年月: 2011年03月

  • 雑音抑圧された音声の主観・客観品質評価法

    山田, 武志, 牧野, 昭二, 北脇, 信彦

    情報処理学会研究報告 音声言語情報処理(SLP)___2010-SLP-83 (7)_1-6  

    発表年月: 2010年10月

  • 雑音抑圧音声のMOSと単語了解度の客観推定

    山田, 武志, 北脇, 信彦, 牧野, 昭二

    信学ソ大___BS-5-4_S-19  

    発表年月: 2010年09月

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出

    加藤, 通朗, 杉本, 侑哉, 牧野, 昭二, 山田, 武志, 北脇, 信彦

    信学技報___EA2010-47_25-30  

    発表年月: 2010年08月

  • Cepstral Smoothing of Separated Signals for Underdetermined Speech Separation

    Ansai, Yumi, Araki, Shoko, Makino, Shoji, Nakatani, Tomohiro, Yamada, Takeshi, Nakamura, Atsushi, Kitawaki, Nobuhiko

    International Symposium on Circuits and Systems Nano-Bio Circuit Fabrics and Systems (ISCAS 2010)   (Paris, FRANCE) 

    発表年月: 2010年05月

  • 調波構造とHMM合成に基づく混合楽器音認識の検討

    山本裕貴, 山田武志, 北脇信彦, 牧野昭二

    音講論集___3-8-4_1003-1004  

    発表年月: 2010年03月

  • 雑音抑圧音声の総合品質推定モデルを適用したフルリファレンス客観品質評価法

    篠原佑基, 山田武志, 北脇信彦, 牧野昭二

    信学総大___B-11-2_436  

    発表年月: 2010年03月

  • 劣決定音源分離のための分離信号のケプストラムスムージング

    安齊祐美, 荒木章子, 牧野昭二, 中谷智広, 山田武志, 中村篤, 北脇信彦

    音講論集___2-P-25_847-850  

    発表年月: 2010年03月

  • 日本語学習支援のためのアクセント認識の検討

    ショートグレッグ, 山田武志, 北脇信彦, 牧野昭二

    音講論集___1-P-17_447-448  

    発表年月: 2010年03月

  • 雑音下音声認識の性能推定法の実環境における評価

    中島智弘, 山田武志, 北脇信彦, 牧野昭二

    音講論集___2-Q-4_241-244  

    発表年月: 2010年03月

  • IP網における音声の客観品質評価に用いる擬似音声信号の検討

    青島千佳, 北脇信彦, 山田武志, 牧野昭二  [招待有り]

    信学総大___B-11-1_435  

    発表年月: 2010年03月

  • IP網における客観品質評価に用いる擬似音声信号の検討

    青島千佳, 北脇信彦, 山田武志, 牧野昭二  [招待有り]

    QoSワークショップ___QW7-P-16_  

    発表年月: 2009年11月

  • 楽音と音声の双方に適用できるオーディオ信号の客観品質推定法の検討

    三上雄一郎, 北脇信彦, 山田武志, 牧野昭二

    QoSワークショップ___QW-7-P-15_  

    発表年月: 2009年11月

  • 雑音抑圧音声の総合品質推定モデルを用いたフルリファレンス客観品質評価法の検討

    篠原佑基, 山田武志, 北脇信彦, 牧野昭二

    QoSワークショップ___QW7-P-13_  

    発表年月: 2009年11月

  • 音声区間推定と時間周波数領域方向推定の統合による会議音声話者識別

    荒木, 章子, 藤本, 雅清, 石塚, 健太郎, 中谷, 智広, 澤田, 宏, 牧野, 昭二

    信学技報___EA2008-40_19-24  

    発表年月: 2008年07月

  • [フェロー記念講演]独立成分分析に基づくブラインド音源分離

    牧野, 昭二

    信学技報___EA2008-17_65-73  

    発表年月: 2008年05月

  • 周波数領域ICAにおける初期値の短時間データからの学習

    荒木, 章子, 伊藤, 信貴, 澤田, 宏, 小野, 順貴, 牧野, 昭二, 嵯峨山, 茂樹

    信学総大___A-10-6_208  

    発表年月: 2008年03月

  • 音声区間検出と方向情報を用いた会議音声話者識別システムとその評価

    荒木, 章子, 藤本, 雅清, 石塚, 健太郎, 澤田, 宏, 牧野, 昭二

    音講論集___1-10-1_1-4  

    発表年月: 2008年03月

  • 音声のスパース性を用いたUnderdetermined音源分離

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    信学総大___AS-4-5_S-46 - S-47  

    発表年月: 2008年03月

  • A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures

    H., Sawada, S., Araki, and, S. Makino

    ICA2007, Stereo Audio Source Separation Evaluation Campaign____  

    発表年月: 2007年09月

  • Blind source separation based on time-frequency masking and maximum SNR beamformer array

    S., Araki, H., Sawada, and, S. Makino

    ICA2007, Stereo Audio Source Separation Evaluation Campaign____  

    発表年月: 2007年09月

  • Blind audio source separation based on independent component analysis

    S. Makino  [招待有り]

    Keynote Talk at the 2007 International Conference on Independent Component Analysis and Signal Separation  

    発表年月: 2007年09月

  • 話者分類とSN比最大化ビームフォーマに基づく会議音声強調

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___2-1-13_571-572  

    発表年月: 2007年03月

  • 事前学習を用いる周波数領域Pearson-ICAの高速化

    加藤, 比呂子, 永原, 裕一, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___1-5-22_549-550  

    発表年月: 2006年03月

  • 観測信号ベクトルのクラスタリングに基づくスパース信号の到来方向推定

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    音講論集___3-5-6_615-616  

    発表年月: 2006年03月

  • 独立成分分析に基づくブラインド音源分離

    牧野, 昭二, 荒木, 章子, 向井, 良, 澤田, 宏

    計測自動制御学会 中国支部 学術講演会____2-9  

    発表年月: 2005年11月

  • 多音源に対する周波数領域ブラインド音源分離

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    AIチャレンジ研究会___SIG-Challenge-0522-3_17-22  

    発表年月: 2005年10月

  • パラメトリックピアソン分布を用いた周波数領域ブラインド音源分離

    加藤, 比呂子, 永原, 裕一, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___2-2-4_593-594  

    発表年月: 2005年09月

  • 観測信号ベクトル正規化とクラスタリングによる音源分離手法とその評価

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    音講論集___2-2-3_591-592  

    発表年月: 2005年09月

  • 3次元マイクロホンアレイを用いた多音源ブラインド分離

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    信学ソ大___A-10-8_209  

    発表年月: 2005年09月

  • 多くの背景音からの主要音源のブラインド抽出

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    信学ソ大___A-10-9_210  

    発表年月: 2005年09月

  • 観測ベクトルのクラスタリングによるブラインド音源分離

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    信学ソ大___A-10-7_208  

    発表年月: 2005年09月

  • 独立成分分析を用いた音源数推定法

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___3-Q-20_753-754  

    発表年月: 2004年09月

  • A solution for the permutation problem in frequency domain BSS using near- and far-field models

    R., Mukai, H., Sawada, S., Araki, and, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-3_  

    発表年月: 2004年04月

  • Underdetermined blind source separation for convolutive mixtures of sparse signals

    S., Winter, H., Sawada, S., Araki, and, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-2_  

    発表年月: 2004年04月

  • Blind separation of more speech than sensors using time-frequency masks and ICA

    S., Araki, S., Makino, H., Sawada, and, R. Mukai

    CSA2004 (NTT Workshop on Communication Scene Analysis)___AU-4_  

    発表年月: 2004年04月

  • Blind source separation for convolutive mixtures in the frequency domain

    H., Sawada, R., Mukai, S., Araki, and, S. Makino

    CSA2004 (NTT Workshop on Communication Scene Analysis)___PAU-1_  

    発表年月: 2004年04月

  • 狭間隔・広間隔の複数マイクロホン対を用いた周波数領域ブラインド音源分離

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___3-P-16_627-628  

    発表年月: 2004年03月

  • 独立成分分析に基づくブラインド音源分離

    牧野, 昭二, 荒木, 章子, 向井, 良, 澤田, 宏

    ディジタル信号処理シンポジウム___A3-2_1-10  

    発表年月: 2003年11月

  • Blind Separation of More Speech Signals than Sensors using Time-frequency Masking and Mixing Matrix Estimation

    荒木, 章子, Audrey, Blin, 牧野, 昭二

    音講論集___1-P-4_585-586  

    発表年月: 2003年09月

  • 周波数領域BSSにおける近距離場モデルを用いたパーミュテーションの解法

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___1-P-6_589-590  

    発表年月: 2003年09月

  • 実環境における3音源以上のブラインド分離

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-19_547-548  

    発表年月: 2003年09月

  • 時間周波数マスキングとICAの併用による音源数 > マイク数の場合のブラインド音源分離

    荒木, 章子, 向井, 良, 澤田, 宏, 牧野, 昭二

    音講論集___1-P-5_587-588  

    発表年月: 2003年09月

  • 独立成分分析に基づくブラインド音源分離

    牧野, 昭二, 荒木, 章子, 向井, 良, 澤田, 宏

    信学技報___EA2003-45_17-24  

    発表年月: 2003年06月

  • ICA-based audio source separation

    S., Makino, S., Araki, R., Mukai, and, H. Sawada

    International Workshop on Microphone Array Systems - Theory and Practice____  

    発表年月: 2003年05月

  • 周波数領域ブラインド音源分離におけるpermutation問題の頑健な解法

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___3-P-25_777-778  

    発表年月: 2003年03月

  • 移動音源の低遅延実時間ブラインド分離

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___3-P-26_779-780  

    発表年月: 2003年03月

  • 帯域に適した分離手法を用いるサブバンド領域ブラインド音源分離

    荒木, 章子, 牧野, 昭二, Robert, Aichner, 西川, 剛樹, 猿渡, 洋

    音講論集___3-P-27_781-782  

    発表年月: 2003年03月

  • KL情報量最小化に基づく時間領域ICAと非定常信号の同時無相関化に基づく時間領域ICAの比較

    西川, 剛樹, 高谷, 智哉, 猿渡, 洋, 鹿野, 清宏, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-14_545-546  

    発表年月: 2002年09月

  • 死角型ビームフォーマを初期値に用いる時間領域ブラインド音源分離

    荒木, 章子, 牧野, 昭二, Robert, Aichner, 西川, 剛樹, 猿渡, 洋

    音講論集___2-5-13_543-544  

    発表年月: 2002年09月

  • ブラインド音源分離後の残留スペクトルの推定と除去

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-11_539-540  

    発表年月: 2002年09月

  • 周波数領域ブラインド音源分離におけるpermutation問題の解法

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___2-5-12_541-542  

    発表年月: 2002年09月

  • 周波数領域ICAと時間遅れスペクトル減算による残響下での実時間ブラインド音源分離

    向井, 良, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___1-Q-19_673-674  

    発表年月: 2002年03月

  • サブバンド処理によるブラインド音源分離に関する検討

    荒木, 章子, 牧野, 昭二, Robert, Aichner, 西川, 剛樹, 猿渡, 洋

    音講論集___3-4-9_619-620  

    発表年月: 2002年03月

  • 間隔の異なる複数のマイクペアによるブラインド音源分離

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    音講論集___3-4-10_621-622  

    発表年月: 2002年03月

  • ICA-based sound separation

    S., Makino, S., Araki, R., Mukai, H., Sawada, R., Aichner, H., Saruwatari, T., Nishikawa, and, Y. Hinamoto

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • Time domain blind source separation of non-stationary convolved signals with utilization of geometric beamforming

    R., Aichner, S., Araki, S., Makino, H., Sawada, T., Nishikawa, and, H. Saruwatari

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • Separation and dereverberation performance of frequency domain blind source separation

    R., Mukai, S., Araki, and, S. Makino

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • Equivalence between frequency domain blind source separation and frequency domain adaptive beamformers

    S., Araki, S., Makino, R., Mukai, and, H. Saruwatari

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • A polar-coordinate based activation function for frequency domain blind source separation

    H., Sawada, R., Mukai, S., Araki, and, S. Makino

    NTT Workshop on Comm. Scene Analysis____  

    発表年月: 2002年01月

  • 周波数領域ブラインド音源分離と適応ビ-ムフォ-マの等価性について

    雛元, 洋一, 西川, 剛樹, 猿渡, 洋, 荒木, 章子, 牧野, 昭二, 向井, 良

    信学技報___EA2001-84_75-82  

    発表年月: 2001年11月

  • 非定常スペクトルサブトラクションによる音源分離後の残留雑音除去

    向井, 良, 荒木, 章子, 澤田, 宏, 牧野, 昭二

    音講論集___2-6-14_617-618  

    発表年月: 2001年10月

  • 周波数領域ブラインド音源分離のための極座標表示に基づく活性化関数

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___2-6-13_615-616  

    発表年月: 2001年10月

  • 周波数領域ブラインド音源分離と周波数領域適応ビームフォーマの関係について

    荒木, 章子, 牧野, 昭二, 向井, 良, 猿渡, 洋

    音講論集___2-6-12_613-614  

    発表年月: 2001年10月

  • 時間領域ICAと周波数領域ICAを併用した多段ICAによるブラインド音源分離

    猿渡, 洋, 西川, 剛樹, 荒木, 章子, 牧野, 昭二

    日本神経回路学会全国大会____99-100  

    発表年月: 2001年09月

  • 複素数に対する独立成分分析のための極座標表示に基づく活性化関数

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    日本神経回路学会全国大会____97-98  

    発表年月: 2001年09月

  • 実環境での混合音声に対する周波数領域ブラインド音源分離手法の性能限界

    荒木, 章子, 牧野, 昭二, 西川, 剛樹, 猿渡, 洋

    音講論集___3-7-4_567-568  

    発表年月: 2001年03月

  • 帯域分割型ICAを用いたBlind Source Separationにおける帯域分割数の最適化

    西川, 剛樹, 荒木, 章子, 牧野, 昭二, 猿渡, 洋

    音講論集___3-7-5_569-570  

    発表年月: 2001年03月

  • 実環境におけるブラインド音源分離と残響除去性能に関する検討

    向井, 良, 荒木, 章子, 牧野, 昭二

    音講論集___3-7-3_565-566  

    発表年月: 2001年03月

  • 周波数領域Blind Source Separationにおける帯域分割数の最適化

    西川, 剛樹, 荒木, 章子, 牧野, 昭二, 猿渡, 洋

    信学技報___EA2000-95_53-59  

    発表年月: 2001年01月

  • チャネル数変換型多チャネル音響エコーキャンセラ

    中川, 朗, 島内, 末廣, 羽田, 陽一, 青木, 茂明, 牧野, 昭二

    信学総大___A-4-51_140  

    発表年月: 2000年03月

  • ステレオエコーキャンセラにおける相互相関変動方法の検討

    鈴木, 邦和, 杉山, 精, 阪内, 澄宇, 島内, 末廣, 牧野, 昭二

    信学技報___EA99-86_25-32  

    発表年月: 1999年12月

  • 音響系の変動に着目したステレオ信号の相関低減方法

    鈴木, 邦和, 阪内, 澄宇, 島内, 末廣, 牧野, 昭二

    音講論集___1-6-12_453-454  

    発表年月: 1999年03月

  • ハンズフリー音声会議装置における複数マイクロホンの構成の検討

    中川, 朗, 島内, 末廣, 牧野, 昭二

    音講論集___2-6-7_493-494  

    発表年月: 1999年03月

  • 相互相関の変動付加処理に適したステレオエコーキャンセラの構成の検討

    島内, 末廣, 羽田, 陽一, 牧野, 昭二, 金田, 豊

    信学総大___A-4-12_121  

    発表年月: 1998年03月

  • Block fast projection algorithm with independent block sizes

    M., Tanaka, S., Makino, J., Kojima

    信学総大___TA-2-2_554-555  

    発表年月: 1997年03月

  • 射影アルゴリズムを用いたサブバンドステレオエコーキャンセラ

    牧野, 昭二, 島内, 末廣, 羽田, 陽一, 中川, 朗

    音講論集___2-7-18_549-550  

    発表年月: 1996年09月

  • サブバンドエコーキャンセラにおけるフィルタ更新ベクトルの平坦化の検討

    中川, 朗, 羽田, 陽一, 牧野, 昭二

    信学ソ大___A-87_88  

    発表年月: 1996年09月

  • 拡声通信システムにおける周波数帯域別所要エコー抑圧量の検討

    阪内, 澄宇, 牧野, 昭二

    音講論集___2-7-17_547-548  

    発表年月: 1996年09月

  • 高速射影アルゴリズムの多チャンネル系への適用

    島内, 末廣, 田中, 雅史, 牧野, 昭二

    信学総大___A-168_170  

    発表年月: 1996年03月

  • 'ES family'アルゴリズムと従来の適応アルゴリズムの関係について

    牧野, 昭二

    信学技報___DSP95-148_65-70  

    発表年月: 1996年01月

  • 高速FIRフィルタリング算法を利用した射影法

    田中, 雅史, 牧野, 昭二, 金田, 豊

    信学ソ大___A-79_81  

    発表年月: 1995年09月

  • サブバンドエコーキャンセラのプロトタイプフィルタの検討

    中川, 朗, 羽田, 陽一, 牧野, 昭二

    信学ソ大___A-73_75  

    発表年月: 1995年09月

  • 擬似入出力関係を利用したステレオ音響エコーキャンセラ用アルゴリズムの検討

    島内, 末廣, 牧野, 昭二

    音講論集___2-6-5_543-544  

    発表年月: 1995年09月

  • 複素射影サブバンドエコーキャンセラに関する検討

    中川, 朗, 羽田, 陽一, 牧野, 昭二

    音講論集___2-6-3_539-540  

    発表年月: 1995年09月

  • エコーキャンセラ用SSBサブバンド射影アルゴリズム

    牧野, 昭二, 羽田, 陽一, 中川, 朗

    音講論集___2-6-4_541-542  

    発表年月: 1995年09月

  • 真の音響エコー経路を推定するステレオ射影エコーキャンセラの検討

    島内, 末廣, 牧野, 昭二

    信学総大___A-220_220  

    発表年月: 1995年03月

  • ES射影アルゴリズムを用いたデュオフィルタ構成のエコーキャンセラの検討

    羽田, 陽一, 牧野, 昭二, 小島, 順治, 島内, 末廣

    音講論集___3-3-10_595-596  

    発表年月: 1995年03月

  • 音響エコーキャンセラ用デュオフィルタコントロールシステム

    羽田, 陽一, 牧野, 昭二, 田中, 雅史, 島内, 末廣, 小島, 順治

    信学総大___A-350_350  

    発表年月: 1995年03月

  • 高性能音響エコーキャンセラの開発

    小島, 順治, 牧野, 昭二, 羽田, 陽一, 島内, 末廣, 金田, 豊

    信学総大___A-348_348  

    発表年月: 1995年03月

  • ES射影アルゴリズムの音響エコーキャンセラへの適用

    牧野, 昭二, 羽田, 陽一, 田中, 雅史, 金田, 豊, 小島, 順治

    信学総大___A-349_349  

    発表年月: 1995年03月

  • エコーキャンセラの音声入力に対する収束速度改善方法の比較について

    牧野, 昭二

    音講論集___2-6-16_653-654  

    発表年月: 1994年10月

  • ステレオ信号の相互相関の変化に着目したステレオ射影エコーキャンセラの検討

    島内, 末廣, 牧野, 昭二

    音講論集___2-6-17_655-656  

    発表年月: 1994年10月

  • PMTC/N-ISDN用多地点エコーキャンセラの構成

    須田, 泰史, 藤野, 雄一, 牧野, 昭二, 小長井, 俊介, 川田, 真一

    信学全大___B-795_393  

    発表年月: 1994年09月

  • 室内音場伝達関数の共通極・零モデル化

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    信学技報___EA93-101_19-29  

    発表年月: 1994年03月

  • ES-RLSアルゴリズムと従来の適応アルゴリズムの関係について

    牧野, 昭二

    音講論集___1-5-12_471-472  

    発表年月: 1993年10月

  • 共通極を用いたスピーカ特性の多点イコライゼーションについて

    羽田, 陽一, 牧野, 昭二

    音講論集___1-5-18_483-484  

    発表年月: 1993年10月

  • 高次の射影アルゴリズムの演算量削減について

    田中, 雅史, 金田, 豊, 牧野, 昭二

    信学全大___A-101_1-103  

    発表年月: 1993年09月

  • 共通極を用いた多点イコライゼーションフィルタについて

    羽田, 陽一, 牧野, 昭二

    音講論集___3-9-17_491-492  

    発表年月: 1993年03月

  • 複数の室内音場伝達関数に共通な極の最小2乗推定について

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    信学全大___SA-11-4_1-489 - 1-490  

    発表年月: 1993年03月

  • 音響エコーキャンセラ用ES射影アルゴリズム

    牧野, 昭二, 金田, 豊

    信学技報___EA92-74_41-52  

    発表年月: 1992年11月

  • 室内インパルス応答の変動特性を反映させたES-RLSアルゴリズム

    牧野, 昭二, 金田, 豊

    音講論集___2-4-19_547-548  

    発表年月: 1992年10月

  • 音声入力に対する射影法の次数と収束特性について

    田中, 雅史, 牧野, 昭二, 金田, 豊

    音講論集___1-4-14_489-490  

    発表年月: 1992年10月

  • エコーキャンセラ用ES射影アルゴリズムの収束条件について

    牧野, 昭二, 金田, 豊

    信学全大___SA-9-6_1-301  

    発表年月: 1992年09月

  • 室内インパルス応答の統計的性質に基づく指数重み付けNLMS適応フィルタ

    牧野, 昭二, 金田, 豊

    信学技報___EA92-48_9-20  

    発表年月: 1992年08月

  • エコーキャンセラ用ES射影アルゴリズム

    牧野, 昭二, 金田, 豊

    信学全大___SA-7-11_1-472 - 1-473  

    発表年月: 1992年03月

  • 音響エコーキャンセラにおけるダブルトーク制御方式の検討

    中原, 宏之, 羽田, 陽一, 牧野, 昭二, 吉川, 昭吉郎

    音講論集___3-5-7_503-504  

    発表年月: 1992年03月

  • 音の到来方向によらない頭部伝達関数の共通極とモデル化について

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    音講論集___1-8-5_483-484  

    発表年月: 1991年10月

  • エコーキャンセラ用ES (Exponential Step) アルゴリズムの収束条件について

    牧野, 昭二, 金田, 豊

    音講論集___1-7-25_419-420  

    発表年月: 1991年03月

  • 室内音場伝達関数の極の推定について

    羽田, 陽一, 牧野, 昭二, 金田, 豊

    音講論集___1-7-12_393-394  

    発表年月: 1991年03月

  • 帯域分割形指数重み付けアルゴリズムを用いた音響エコーキャンセラ

    牧野, 昭二, 羽田, 陽一

    信学全大___SA-9-4_1-255 - 1-256  

    発表年月: 1990年10月

  • 低周波領域における室内音場伝達関数のARMAモデルについて

    羽田, 陽一, 牧野, 昭二, 小泉, 宣夫

    音講論集___2-7-14_439-440  

    発表年月: 1990年03月

  • 指数重み付けによるエコーキャンセラ用適応アルゴリズム

    牧野, 昭二

    音講論集___3-6-5_517-518  

    発表年月: 1989年10月

  • エコーキャンセラの室内音場における適応特性改善について

    牧野, 昭二, 小泉, 宣夫

    信学技報___EA89-3_15-21  

    発表年月: 1989年04月

  • 拡声通話形の音声会議システム

    及川, 弘, 西野, 正和, 牧野, 昭二

    信学全大___B-548_2-243  

    発表年月: 1988年03月

  • エコーキャンセラの室内音場における適応特性の改善について

    牧野, 昭二, 小泉, 宣夫

    音講論集___1-5-13_355-356  

    発表年月: 1988年03月

  • 複数反響路を有する音響エコーキャンセラの構成法

    小泉, 宣夫, 牧野, 昭二, 及川, 弘

    信学技報___EA87-75_1-6  

    発表年月: 1988年01月

  • 複数反響路を有する音響エコーキャンセラ

    小泉, 宣夫, 牧野, 昭二, 及川, 弘

    信学部門全大___431_1-296  

    発表年月: 1987年09月

  • 音響エコーキャンセラの室内環境における消去特性について

    牧野, 昭二, 小泉, 宣夫

    信学技報___EA87-43_41-48  

    発表年月: 1987年08月

  • 直方体ブース内の障害物によるインパルス応答の変動について

    牧野, 昭二, 小泉, 宣夫

    音講論集___1-3-1_295-296  

    発表年月: 1987年03月

  • MTFによる音声会議でのマイクロホン配置の評価について

    小泉, 宣夫, 牧野, 昭二, 青木, 茂明

    音講論集___2-7-18_631-632  

    発表年月: 1986年10月

  • 音響エコーキャンセラの室内環境における定常特性について

    牧野, 昭二, 小泉, 宣夫

    音講論集___2-7-19_383-384  

    発表年月: 1985年10月

  • 室内残響特性を考慮した音声スイッチ切替特性の検討

    牧野, 昭二, 山森, 和彦

    音講論集___1-2-19_265-266  

    発表年月: 1984年10月

  • マイクロプロセッサ制御を用いた拡声電話機の構成法

    山森, 和彦, 松井, 弘行, 牧野, 昭二

    信学技報___EA84-41_15-21  

    発表年月: 1984年09月

  • 音声スイッチ回路損失制御波形の通話品質への影響

    石丸, 薫, 小川, 峰義, 牧野, 昭二

    信学部門全大___795_3-190  

    発表年月: 1984年09月

  • 周辺に段差を持つ圧電バイモルフ振動板の振動特性について

    一ノ瀬, 裕, 牧野, 昭二

    音講論集___1-6-5_287-288  

    発表年月: 1983年10月

  • ハンドセット小形化に関する一検討

    牧野, 昭二, 一ノ瀬, 裕

    音講論集___1-6-10_297-298  

    発表年月: 1983年10月


共同研究・競争的資金等の研究課題

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    研究期間:

    2020年04月
    -
    2021年03月
     

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2019年
    -
    2021年
     

    牧野 昭二

     概要を見る

    [検討項目1] 音の伝播の物理的なモデルに基づいて観測信号を補間し、実際には存在しない、いわばバーチャルな観測信号を作り出して素子数を擬似的に増やすことにより、音源数に依存することなく高品質な出力を得るための統一的なアレー信号処理を検討した。擬似観測の振幅は非線形補間により推定した。擬似観測を用いた音声強調の劣決定拡張により、擬似観測の基本的な検証を行った。さらに、バーチャルマイクロホンの動作原理の解明と高性能化を図った。今期は、国際会議発表2件、および、国内大会発表1件の研究成果を得た。
    [検討項目2] 音環境からの情報を利用した多チャネル信号処理アルゴリズムを開発した。既存のアルゴリズムを分散型マイクロホンアレーに対応できるように一般化し、さらに強力な最適化規範を導入した。分散型マイクロホンアレーにおけるサブアレーの同期手法を開発した。ブラインド音源分離/抽出アルゴリズムや多チャネル残響除去アルゴリズムを分散型マイクロホンアレーに対応できるように開発した。さらに、必要なマイクロホンを最小化して演算量を削減しながら、性能を最適化するためのマイクロホン選択手法も検討した。今期は、雑誌論文4件、国際会議発表7件、および、国内大会発表9件の研究成果を得た。
    [検討項目3] 強調された音源信号から抽出した特徴量に基づき、音環境を解析・理解した。音源信号に関する先見知識を利用し、特徴量次元での分類法も利用した。分類精度を向上させるために、深層学習などの最新の音声認識技術を活用した。今期は、国際会議発表1件、および、国内大会発表1件の研究成果を得た。
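
    The project summary above mentions synthesizing virtual (pseudo) observations whose amplitudes are estimated by non-linear interpolation between real microphones. Purely as a hedged illustration of that idea, and not the project's actual estimator, the sketch below builds a virtual-microphone STFT by interpolating log-amplitude and phase between two real channels (interpolation in the complex-logarithmic domain); the parameter `alpha` and the exact interpolation formulas are assumptions.

```python
import numpy as np

def virtual_microphone_stft(X1, X2, alpha=0.5, eps=1e-12):
    """Form a virtual-microphone observation between two real microphones.

    X1, X2 : complex STFTs (freq_bins x frames) of two real microphones
    alpha  : virtual position between mic 1 (0.0) and mic 2 (1.0)
    Log-amplitude and the wrapped phase difference are interpolated linearly,
    i.e. interpolation in the complex-logarithmic domain.  Illustrative
    assumption only, not the project's published estimator.
    """
    log_amp = (1.0 - alpha) * np.log(np.abs(X1) + eps) + alpha * np.log(np.abs(X2) + eps)
    dphase = np.angle(X2 * np.conj(X1))          # phase difference wrapped to (-pi, pi]
    phase = np.angle(X1) + alpha * dphase        # move alpha of the way toward mic 2
    return np.exp(log_amp + 1j * phase)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X1 = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    X2 = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    print(virtual_microphone_stft(X1, X2).shape)   # (257, 100)
```

    With `alpha = 0.5` this reduces to a geometric-mean amplitude and a mid-point phase between the two real microphones.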

  • マイクロホンアレーを用いた音情景解析の研究

    筑波大学・ドイツ学術交流会(DAAD)パートナーシップ・プログラム 

    研究期間:

    2017年04月
    -
    2018年03月
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 脳科学と情報科学を融合させたBMI構築のための多チャネル脳波信号処理技術の革新

    日本学術振興会  基盤研究(C)

    研究期間:

    2013年04月
    -
    2014年03月
     

    牧野 昭二, ルトコフスキ トマシュ, 宮部 滋樹, 寺澤 洋子, 山田 武志

     概要を見る

    本プロジェクトでは、人が音の空間性を認識するときの脳活動に着目し、基礎研究を行なうとともに、このような空間性を有する音の聴取時に観察される特徴ある脳活動を利用したブレインマシンインタフェースの開発を行なった。今年度は、空間聴覚刺激に対する事象関連電位の統計的特徴に基づいた電極と潜時の選択手法を提案し、識別率を向上させた。音の出力法の試みでは、スピーカによる実音源と仮想音源でP300反応に個人差があること、視覚刺激より振幅が小さいこと、頭部伝達関数を利用した空間聴覚刺激が後頭部にP300を誘発させることを確認した。
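
    The summary above reports selecting electrodes and latencies from statistical features of event-related potentials to raise classification accuracy. The following sketch is only a generic stand-in for that kind of pipeline, not the project's method: channel-latency features are ranked by a two-sample t-statistic between target and non-target epochs and fed to a minimal LDA classifier; the epoch shapes, the number of kept features, and the regularisation are assumptions.

```python
import numpy as np

def rank_features_by_tstat(epochs, labels):
    """Rank flattened (channel, sample) features by a two-sample t-statistic.

    epochs : array (trials, channels, samples);  labels : 1 = target, 0 = non-target.
    Generic illustration only, not the project's selection rule.
    """
    X = epochs.reshape(epochs.shape[0], -1)
    tgt, non = X[labels == 1], X[labels == 0]
    num = tgt.mean(0) - non.mean(0)
    den = np.sqrt(tgt.var(0) / len(tgt) + non.var(0) / len(non)) + 1e-12
    return np.argsort(-np.abs(num / den))

def fit_lda(X, y, reg=1e-3):
    """Minimal two-class LDA (shared covariance, ridge-regularised)."""
    m1, m0 = X[y == 1].mean(0), X[y == 0].mean(0)
    cov = np.cov(X.T) + reg * np.eye(X.shape[1])
    w = np.linalg.solve(cov, m1 - m0)
    b = -0.5 * w @ (m1 + m0)
    return w, b

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    epochs = rng.standard_normal((200, 8, 64))      # trials x channels x samples
    labels = rng.integers(0, 2, 200)
    epochs[labels == 1, 3, 30:40] += 0.5            # synthetic "P300" bump
    keep = rank_features_by_tstat(epochs, labels)[:50]
    X = epochs.reshape(200, -1)[:, keep]
    w, b = fit_lda(X, labels)
    pred = (X @ w + b > 0).astype(int)
    print("training accuracy:", (pred == labels).mean())
```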

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2020年04月
    -
    2021年03月
     

  • スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

    日本学術振興会  基盤研究(A)

    研究期間:

    2020年04月
    -
    2021年03月
     

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • スモールデータ機械学習理論に基づく音響拡張現実感及び音コミュニケーション能力拡張

    日本学術振興会  基盤研究(A)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2019年04月
    -
    2020年03月
     

  • 大量音声データの事前学習に基づく ブラインド音源分離手法の高度化

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2019年04月
    -
    2020年02月
     

  • 次世代自動車ハンズフリー通話システムのための音声強調信号処理技術の研究開発

    関東経済産業局  中小企業経営支援等対策費補助金(戦略的基盤技術高度化支援事業)

    研究期間:

    2018年09月
    -
    2019年03月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2018年04月
    -
    2019年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2018年04月
    -
    2019年03月
     

  • 聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2018年04月
    -
    2019年02月
     

  • DNNを用いた音声音響符号化の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2018年04月
    -
    2019年02月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2017年04月
    -
    2018年03月
     

  • 音環境の認識と理解およびスマートホームセキュリティー、ロボット聴覚、等への応用

    NII  国内共同研究

    研究期間:

    2017年04月
    -
    2018年03月
     

  • 環境に適応するための音声強調系最適化

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2017年04月
    -
    2018年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2017年04月
    -
    2018年03月
     

  • DNNを用いた音声音響符号化の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2017年04月
    -
    2018年02月
     

  • 聴覚特性を考慮した信号処理・機械学習アプローチによる音声強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2017年04月
    -
    2018年02月
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    研究期間:

    2017年04月
    -
    2017年11月
     

  • マイクの指向性による、音声認識率の向上

    富士ソフト株式会社  国内共同研究

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 非同期分散チャンネルへ展開するアレイ信号処理理論の深化と実世界応用

    日本学術振興会  基盤研究(A)

    研究期間:

    2016年04月
    -
    2017年03月
     

  • マイクロホンアレー付き監視カメラを用い音響情報を統計数理的学習理論により解析するイベント検出とシーン解析

    NII  国内共同研究

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 高次統計量制御スパース信号表現に基づく協創型音響センシング及びその社会システム応用

    セコム科学技術振興財団 

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 音響情報と映像情報を統計数理的学習理論により融合するイベント検出とシーン解析

    筑波大学  研究基盤支援プログラム(Bタイプ)

    研究期間:

    2016年04月
    -
    2017年03月
     

  • マイクロホンアレーを用いた音情景解析の研究

    筑波大学・ドイツ学術交流会(DAAD)パートナーシップ・プログラム 

    研究期間:

    2016年04月
    -
    2017年03月
     

  • 音声音響符号化音のプレフィルタ・ポストフィルタ処理による音質改善の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2016年04月
    -
    2017年02月
     

  • 音声のスペクトル領域とケプストラム領域における同時強調法の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2016年04月
    -
    2017年02月
     

  • 柔軟ロボット音響センシングにおけるシミュレータ構築及び音源分離処理の高精度化

    国立研究開発法人科学技術振興機構 (JST)  革新的研究開発推進プログラム(ImPACT)

    研究期間:

    2015年09月
    -
    2016年03月
     

  • 非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2015年04月
    -
    2016年03月
     

  • 音響センシングによる交通量モニタリング

    NII  国内共同研究

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 低遅延・低ビットレートの音声・音響統合符号化の検討

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2014年04月
    -
    2015年03月
     

  • 高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

    日本学術振興会  基盤研究(A)

    研究期間:

    2014年04月
    -
    2015年03月
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    研究期間:

    2013年04月
    -
    2014年03月
     

  • 非同期録音機器を利用可能にするマイクロフォンアレイ信号処理の研究

    日本学術振興会  基盤研究(B)

    研究期間:

    2013年04月
    -
    2014年03月
     

    小野 順貴, 牧野 昭二, 宮部 滋樹, 篠田 浩一

     概要を見る

    マイクロフォンアレイ信号処理は、複数のマイクで録音した信号を処理し、音の到来方向を推定したり、雑音の中から目的音を強調したりすることを可能にする重要な技術です。マイクロフォンアレイ信号処理では、チャンネル間の微小な時間差が重要な情報となっているため、従来は複数のマイクロフォンが同期して録音される必要がありました。これに対し本研究では、スマートフォン、ノートPC、ICレコーダーなど、同期していない複数の録音機器をアレイ信号処理に用いるために、録音信号を事前情報なしに同期させたり、録音信号からマイクロフォンの位置を推定したりする技術を開発しました。
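
    The summary above describes synchronizing recordings from unsynchronized devices without prior information. As a hedged sketch of only the simplest sub-problem (a fixed start-time offset, ignoring the sampling-frequency drift that the actual work also handles), the code below estimates the lag between two recordings from the peak of their cross-correlation; the sign convention and the demo signal lengths are assumptions.

```python
import numpy as np

def estimate_delay(x, y):
    """Estimate how many samples y is delayed relative to x (positive = y later).

    Minimal sketch: pick the peak of the full cross-correlation.  Real blind
    synchronization must also handle sampling-frequency drift and unreliable peaks.
    """
    c = np.correlate(x, y, mode="full")
    lags = np.arange(-(len(y) - 1), len(x))        # lag axis of np.correlate
    return -lags[np.argmax(c)]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    s = rng.standard_normal(8000)
    delay = 123
    x = s
    y = np.concatenate((np.zeros(delay), s))[: len(s)]   # y is x delayed by 123 samples
    print("estimated delay:", estimate_delay(x, y))       # expected: 123
```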

  • 複数録音機器による非同期録音信号の同期に関する研究

    ヤマハ株式会社  国内共同研究

    研究期間:

    2013年04月
    -
    2014年03月
     

  • 複素対数補間に基づくヴァーチャル観測を用いた劣決定アレイ信号処理

    NII  国内共同研究

    研究期間:

    2013年04月
    -
    2014年03月
     

  • 高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

    日本学術振興会  基盤研究(A)

    研究期間:

    2013年04月
    -
    2014年03月
     

    猿渡 洋, 鹿野 清宏, 戸田 智基, 川波 弘道, 小野 順貴, 宮部 滋樹, 牧野 昭二, 小山 翔一

     概要を見る

    本研究では、高次統計量追跡による自律カスタムメイド音声コミュニケーション拡張システムに関して研究を行った。具体的なシステムとして、ブラインド音源分離に基づく両耳補聴システムや声質変換に基づく発声補助システムを開発し、以下の成果が得られた。(1)両耳補聴システムに関しては、高精度かつ高速なブラインド音源分離及び統計的音声強調アルゴリズムを提案し、聴覚印象の不動点を活用した高品質な音声強調システムが実現できた。(2)発声補助システムに関しては、データベース間における発話のミスマッチを許容する声質変換処理を開発した。実環境模擬データベースを用いてその評価を行い、有効性を確認することが出来た。

  • 低遅延・低ビットレートの音声・音響統合符号化の検討

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2013年05月
    -
    2014年02月
     

  • ALS患者のための音の空間情報を利用したブレインマシンインタフェース(BMI)の研究開発

    総務省 戦略的情報通信研究開発推進制度(SCOPE)  その他

    研究期間:

    2012年09月
    -
    2013年03月
     

  • 脳科学と情報科学を融合させたBMI構築のための多チャネル脳波信号処理技術の革新

    日本学術振興会  基盤研究(C)

    研究期間:

    2012年04月
    -
    2013年03月
     

    牧野 昭二, ルトコフスキ トマシュ, 宮部 滋樹, 寺澤 洋子, 山田 武志

     概要を見る

    本プロジェクトでは、人が音の空間性を認識するときの脳活動に着目し、基礎研究を行なうとともに、このような空間性を有する音の聴取時に観察される特徴ある脳活動を利用したブレインマシンインタフェースの開発を行なった。今年度は、空間聴覚刺激に対する事象関連電位の統計的特徴に基づいた電極と潜時の選択手法を提案し、識別率を向上させた。音の出力法の試みでは、スピーカによる実音源と仮想音源でP300反応に個人差があること、視覚刺激より振幅が小さいこと、頭部伝達関数を利用した空間聴覚刺激が後頭部にP300を誘発させることを確認した。

  • 非同期録音機器を利用可能にするアレイ信号処理技術

    NII  国内共同研究

    研究期間:

    2012年04月
    -
    2013年03月
     

  • 高次統計量追跡による自律カスタムメイド音コミュニケーション拡張システムの研究

    日本学術振興会  基盤研究(A)

    研究期間:

    2012年04月
    -
    2013年03月
     

    猿渡 洋, 鹿野 清宏, 戸田 智基, 川波 弘道, 小野 順貴, 宮部 滋樹, 牧野 昭二, 小山 翔一

     概要を見る

    本研究では、高次統計量追跡による自律カスタムメイド音声コミュニケーション拡張システムに関して研究を行った。具体的なシステムとして、ブラインド音源分離に基づく両耳補聴システムや声質変換に基づく発声補助システムを開発し、以下の成果が得られた。(1)両耳補聴システムに関しては、高精度かつ高速なブラインド音源分離及び統計的音声強調アルゴリズムを提案し、聴覚印象の不動点を活用した高品質な音声強調システムが実現できた。(2)発声補助システムに関しては、データベース間における発話のミスマッチを許容する声質変換処理を開発した。実環境模擬データベースを用いてその評価を行い、有効性を確認することが出来た。

  • 脳科学と情報科学を融合させたBMI構築のための多チャネル脳波信号処理技術の革新

    日本学術振興会  基盤研究(C)

    研究期間:

    2011年04月
    -
    2012年03月
     

    牧野 昭二, ルトコフスキ トマシュ, 宮部 滋樹, 寺澤 洋子, 山田 武志

     概要を見る

    本プロジェクトでは、人が音の空間性を認識するときの脳活動に着目し、基礎研究を行なうとともに、このような空間性を有する音の聴取時に観察される特徴ある脳活動を利用したブレインマシンインタフェースの開発を行なった。今年度は、空間聴覚刺激に対する事象関連電位の統計的特徴に基づいた電極と潜時の選択手法を提案し、識別率を向上させた。音の出力法の試みでは、スピーカによる実音源と仮想音源でP300反応に個人差があること、視覚刺激より振幅が小さいこと、頭部伝達関数を利用した空間聴覚刺激が後頭部にP300を誘発させることを確認した。

  • 音声特性と聴覚特性を反映した音声強調処理技術の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2011年04月
    -
    2012年03月
     

  • 脳科学と情報科学を融合させたBCI構築のための多チャネル脳波信号処理の研究

    電気通信普及財団  出資金による受託研究

    研究期間:

    2011年04月
    -
    2012年03月
     

  • 脳科学,生命科学,情報科学を融合させた生体マルチメディア情報研究

    研究期間:

    2011年04月
    -
     
     

  • 音声特性と聴覚特性を反映した音声強調処理技術の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2010年04月
    -
    2011年03月
     

  • 音声特性と聴覚特性を反映した音声強調処理技術の研究

    NTT コミュニケーション科学基礎研究所  国内共同研究

    研究期間:

    2009年04月
    -
    2010年03月
     

  • 生体信号処理と音響信号処理による生命科学研究の革新

    日本学術振興会  科学研究費助成事業

    研究期間:

    2010年
     
     
     

    牧野 昭二, WU Y. J.

  • 音声、音楽メディアのコンテンツ基盤技術の創出とイマーシブオーディオコミュニケーションの創生

    研究期間:

    2009年04月
    -
     
     


Misc

  • 畳込み混合のブラインド音源分離(<特集>独立成分分析とその応用特集号)

    牧野昭二, 荒木章子, 向井良, 澤田宏

    システム/制御/情報 : システム制御情報学会誌   48 ( 10 ) 401 - 408  2004年10月

    DOI CiNii

  • ブラインドな処理が可能な音源分離技術 (特集 コミュニケーションの壁を克服するための音声・音響処理技術)

    牧野昭二, 荒木章子, 向井良

    NTT技術ジャ-ナル   15 ( 12 ) 8 - 12  2003年12月

    CiNii

  • ステレオエコーキャンセラの課題と解決法

    牧野昭二, 島内末廣

    システム/制御/情報 : システム制御情報学会誌   46 ( 12 ) 724 - 732  2002年12月

    DOI CiNii

  • 混じりあった声を解く--遠隔発話の認識を目指して (特集論文1 人にやさしい対話型コンピュータ)

    牧野昭二, 向井良, 荒木章子

    NTT R & D   50 ( 12 ) 937 - 944  2001年12月

    CiNii

  • サブバンド信号処理 : 実時間動作化の奥の手

    牧野昭二

    日本音響学会誌   56 ( 12 ) 845 - 851  2000年12月

    DOI CiNii

  • 周波数帯域における音響エコー経路の変動特性を反映させたサブバンドESアルゴリズム

    牧野昭二, 羽田陽一

    電子情報通信学会論文誌. A, 基礎・境界   79 ( 6 ) 1138 - 1146  1996年06月

     概要を見る

    本論文は, 従来のNLMSアルゴリズムと同等の演算量と記憶容量で収束速度が約2倍の, 新しいエコーキャンセラ用適応アルゴリズムを提案するものである. サブバンド ES (exponentially weighted stepsize) アルゴリズムと名づけたこの適応アルゴリズムでは, 受話入力信号とエコー信号を複数の周波数帯域に分割し, それぞれの周波数帯域で独立にエコーを消去するサブバンドエコーキャンセラにおいて, それぞれの周波数帯域に設けた適応形トランスバーサルフィルタのそれぞれの係数に対して, 異なるステップサイズを用いている. これらのステップサイズは時不変で, その周波数帯域における室内インパルス応答の変化分, 例えば二つのインパルス応答波形の差, の期待値に比例して指数的に重み付けられている. その結果, 各周波数帯域における音響エコー経路の変動特性の違いを適応アルゴリズムに反映させ, 収束特性を改善することができる. ここでは, 室内音場のインパルス応答データを用いた計算機シミュレーション, およびDSPで構成した実験装置を用いた実時間評価実験を行い, NLMSアルゴリズムを用いた従来のフルバンドエコーキャンセラに比べて, 音声入力に対する収束速度を約4倍にできることを明らかにする.

    CiNii
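
    The abstract above gives each adaptive-filter coefficient a time-invariant step size that is exponentially weighted according to the expected variation of the room impulse response, inside a subband echo canceller. The sketch below shows only the exponentially weighted step-size idea in a full-band NLMS form; the subband decomposition and the measured per-band decay constants of the published algorithm are not reproduced, and `mu` and `decay` are assumed values.

```python
import numpy as np

def es_nlms(x, d, taps=256, mu=0.5, decay=0.01, eps=1e-8):
    """Exponentially weighted step-size (ES) NLMS sketch for echo cancellation.

    x : far-end (loudspeaker) signal,  d : microphone signal containing the echo.
    Each tap gets a fixed step size mu * exp(-decay * tap_index), reflecting the
    assumption that room-impulse-response variation decays along the taps.
    Full-band illustration only; the published algorithm applies this per
    subband with decay constants measured for each band.
    """
    w = np.zeros(taps)
    step = mu * np.exp(-decay * np.arange(taps))   # time-invariant per-tap steps
    e = np.zeros(len(d))
    for n in range(taps, len(d)):
        u = x[n - taps + 1: n + 1][::-1]           # [x[n], x[n-1], ..., x[n-taps+1]]
        e[n] = d[n] - w @ u
        w += step * u * e[n] / (u @ u + eps)       # normalised, per-tap weighted update
    return e

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.standard_normal(20000)
    h = np.exp(-0.02 * np.arange(256)) * rng.standard_normal(256)   # toy echo path
    d = np.convolve(x, h)[: len(x)]
    e = es_nlms(x, d)
    print("residual echo power:", np.mean(e[-1000:] ** 2))
```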

  • 音響エコ-キャンセラ用ES射影アルゴリズム (シ-ムレスな音響空間の実現を目指して<特集>)

    牧野昭二, 金田豊

    NTT R & D   44 ( 1 ) 45 - 52  1995年01月

    CiNii

  • 音響エコー経路の変動特性を反映させたRLS適応アルゴリズム

    牧野昭二, 金田豊

    日本音響学会誌   50 ( 1 ) 32 - 39  1993年12月

    CiNii

  • 2つの超ガウス性複素信号の位相観測を用いない相関係数推定 (信号処理)

    宮部 滋樹, 小野 順貴, 牧野 昭二

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 474 ) 19 - 24  2015年03月

     概要を見る

    本稿では,2つの有相関な複素振幅系列の位相が失われた観測から,元の位相を持った信号の相関係数を推定する方法について議論する。我々は以前に,2つの複素振幅系列が2変量複素正規分布に従うと仮定した確率モデルを立てて,位相差を隠れ変数とするEMアルゴリズムによって相関系列を推定する手法を提案した。しかし,優ガウス的であることが多い複素振幅とのモデルミスマッチにより,信号によっては推定精度が低下してしまうという問題があった。本稿では,2つの複素信号系列が2変量複素t分布に従うと仮定して,優ガウス的な信号の形状に適応的な最尤推定を定式化することにより,モデルミスマッチに頑健な相関係数推定を試みる。実験の結果,複素t分布モデルは信号によっては必ずしも複素正規分布モデルよりも高精度とは限らないが,適切なモデルを選択することにより,単純な絶対値の相関を用いるよりも高い精度の推定が得られることを確認した。

    CiNii

  • 非同期マイクロホンアレーのためのサンプリング周波数ミスマッチのブラインド補償 (応用音響)

    宮部 滋樹, 小野 順貴, 牧野 昭二

    電子情報通信学会技術研究報告 : 信学技報   112 ( 347 ) 11 - 16  2012年12月

     概要を見る

    本稿では,非同期マイクロホンアレーのためのチャネル間のサンプリング周波数ミスマッチをブラインドに推定し補償する手法について述べる.サンプリング周波数のミスマッチによるチャネル間の時間差の変化は短時間では一定となるため,フレーム毎に周波数領域で位相を操作することで補償する.また,音源が移動しないと仮定した最尤推定により,サンプリング周波数のミスマッチを推定する.実験により提案手法はアレー信号処理の性能を大幅に回復できることが確認された.

    CiNii
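
    The abstract above compensates a sampling-frequency mismatch by treating the drift as constant within each frame and manipulating the phase in the frequency domain. The sketch below applies that per-frame linear phase correction to an STFT, assuming the relative mismatch `eps_rel` is already known; the maximum-likelihood estimation of the mismatch described in the abstract is omitted, and the sign convention is an assumption.

```python
import numpy as np

def compensate_sfm(stft, eps_rel, hop, nfft):
    """Compensate sampling-frequency mismatch by per-frame linear phase shifts.

    stft    : complex STFT (nfft//2 + 1 bins x frames) of the mismatched channel
    eps_rel : relative mismatch, e.g. (fs_true - fs_nominal) / fs_nominal,
              assumed to be already estimated (the estimation step is omitted)
    hop     : STFT hop size in samples,  nfft : FFT length
    Within one frame the accumulated drift is treated as a constant delay, so it
    can be removed with a linear phase across frequency.  Illustrative sketch only.
    """
    n_bins, n_frames = stft.shape
    k = np.arange(n_bins)[:, None]                    # frequency bin index
    t = np.arange(n_frames)[None, :]                  # frame index
    drift = eps_rel * hop * t                         # accumulated delay in samples
    phase = np.exp(2j * np.pi * k * drift / nfft)     # undo the per-frame delay
    return stft * phase

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    X = rng.standard_normal((257, 200)) + 1j * rng.standard_normal((257, 200))
    Y = compensate_sfm(X, eps_rel=50e-6, hop=128, nfft=512)   # 50 ppm mismatch
    print(Y.shape)
```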

  • 単一音源区間情報を用いた非同期マイクロホンアレーによる音声強調 (応用音響)

    坂梨 龍太郎, 小野 順貴, 宮部 滋樹, 山田 武志, 牧野 昭二

    電子情報通信学会技術研究報告 : 信学技報   112 ( 347 ) 17 - 22  2012年12月

     概要を見る

    非同期マイクロホンアレーは,携帯電話やボイスレコーダーなど複数の録音機器を用いることで,従来のマイクロホンアレーによる音響信号処理における拡張性による制約がなく,安価で柔軟な構成を行えるという利点がある.しかし,非同期マイクロホンアレーには,録音開始時刻やDOA情報が不明であり,機器間のサンプリング周波数における未知の個体差が存在するなどの問題点も挙げられる.特に,録音開始時刻のずれや機器間のサンプリング周波数個体差は,音響信号処理に重大な影響を与え,これを補償することが必要となる.本稿では,議事録作成のための会議録音など,予め音声強調を目的とした場面を想定し,ある特定の音源だけが音を生じている時間区間である,単一音源区間を録音信号に盛り込むことでそれを手がかりとした同期補償を提案する.

    CiNii

  • 球状スピーカアレーを用いた放射特性制御のシミュレーション

    林 貴哉, 宮部 滋樹, 山田 武志, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   112 ( 76 ) 19 - 24  2012年06月

     概要を見る

    本稿では球状スピーカアレーによる距離減衰制御システムについて述べる.従来研究において,球面調和領域における波の伝搬モデルを用いて球状マイクロホンアレーにより音源からの距離に対する感度を制御する放射特性フィルタが提案されている.入力と出力が入れ替わっても変化しないという伝達関数の性質から,マイクロホンアレーのための放射特性フィルタはラウドスピーカアレーによる距離減衰のフィルタ設計に直接応用することができる.無残響シミュレーションの結果,低い周波数では提案手法が有効であることを確認した.

    CiNii

  • 高次モーメント分析に基づく非線形MUSICによる劣決定方向推定

    杉本 侑哉, 宮部 滋樹, 山田 武志, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   112 ( 76 ) 49 - 54  2012年06月

     概要を見る

    本稿では,高次モーメント分析によってMUltiple SIgnal Classification(MUSIC)を劣決定条件の下での方向推定へと拡張した写像MUSICを提案する.写像MUSICは非線形関数により観測信号を高次元空間へと写像し,写像空間において共分散行列の分析を行う.このとき,写像の共分散行列は信号の高次クロスモーメント行列に相当する.高次元写像によって雑音部分空間の次元数が増大することから,方向推定の分解能が向上し,劣決定条件の下での方向推定が可能となる.信号の高次キュムラント分析に基づいた2q-MUSICとの理論・実験の両面からの比較によって写像MUSICの性質について議論し,現実的な条件の下ではより少ない計算量で同等の推定精度が得られることを示す.

    CiNii
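
    For context on the entry above, which extends MUSIC to underdetermined direction estimation by analysing nonlinearly mapped (higher-order-moment) observations, the sketch below computes only the conventional narrowband MUSIC pseudo-spectrum for a uniform linear array; the proposed mapping itself is not reproduced, and the array geometry and signal model are assumptions.

```python
import numpy as np

def music_pseudospectrum(X, n_src, d_over_lambda, angles_deg):
    """Conventional narrowband MUSIC pseudo-spectrum for a uniform linear array.

    X             : complex snapshots, shape (n_mics, n_snapshots)
    n_src         : assumed number of sources (must be < n_mics for plain MUSIC)
    d_over_lambda : microphone spacing divided by wavelength
    Baseline sketch only; the paper above replaces this covariance analysis by an
    analysis of nonlinearly mapped (higher-order-moment) observations.
    """
    n_mics = X.shape[0]
    R = X @ X.conj().T / X.shape[1]                   # sample spatial covariance
    eigval, eigvec = np.linalg.eigh(R)                # ascending eigenvalues
    En = eigvec[:, : n_mics - n_src]                  # noise subspace
    theta = np.deg2rad(np.asarray(angles_deg))
    m = np.arange(n_mics)[:, None]
    A = np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta)[None, :])
    proj = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / (proj + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n_mics, n_snap, true_doa = 8, 500, 20.0
    a = np.exp(-2j * np.pi * 0.5 * np.arange(n_mics) * np.sin(np.deg2rad(true_doa)))
    s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
    noise = rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap))
    X = np.outer(a, s) + 0.1 * noise
    angles = np.arange(-90, 91)
    P = music_pseudospectrum(X, n_src=1, d_over_lambda=0.5, angles_deg=angles)
    print("estimated DOA:", angles[np.argmax(P)])
```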

  • D-14-9 日本語スピーキングテストにおける文生成問題の採点に影響を及ぼす要因の検討(D-14.音声,一般セッション)

    大久保 梨思子, 山畑 勇人, 山田 武志, 今井 新悟, 石塚 賢吉, 篠崎 隆宏, 西村 竜一, 牧野 昭二, 北脇 信彦

    電子情報通信学会総合大会講演論文集   2012 ( 1 ) 193 - 193  2012年03月

    CiNii

  • D-14-8 日本語スピーキングテストにおける文章読み上げ問題の採点に影響を及ぼす要因の検討(D-14.音声,一般セッション)

    山畑 勇人, 大久保 梨思子, 山田 武志, 今井 新悟, 石塚 賢吉, 篠崎 隆宏, 西村 竜一, 牧野 昭二, 北脇 信彦

    電子情報通信学会総合大会講演論文集   2012 ( 1 ) 192 - 192  2012年03月

    CiNii

  • 空間パワースペクトルの主成分分析に基づく時間断続信号の検出

    加藤 通朗, 杉本 侑哉, 牧野 昭二

    聴覚研究会資料   40 ( 7 ) 575 - 580  2010年08月

    CiNii

  • 音声区間推定と時間周波数領域方向推定の統合による会議音声話者識別

    荒木 章子, 藤本 雅清, 石塚 健太郎, 中谷 智広, 澤田 宏, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   108 ( 143 ) 19 - 24  2008年07月

     概要を見る

    我々は、会議状況において「いつ誰が話したか」を推定する方法を検討している。これは、音声区間検出器(VAD)で推定した音声存在確率と、音声区間における音声到来方向(DOA)の分類結果とを用いて、会議音声中の各話者の音声区間を推定するものである。これを本稿では話者識別と呼ぶ。本稿では、この性能向上を目的とし、2つの方法を提案する。提案1として、DOAを各時間周波数スロットで推定することで、特に複数人同時発話時の話者識別精度を向上させる。提案2として、VAD結果およびDOA情報を確率的に統合する方法を検討する。両提案法により、実際の会話音声データに対して、話者識別性能の向上が見られたので報告する。

    CiNii
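
    The abstract above estimates "who spoke when" by combining a voice activity detector with a classification of time-frequency-wise DOA estimates. The sketch below is only a toy stand-in for that pipeline on frame-level synthetic data: an energy threshold serves as the VAD and a one-dimensional k-means clusters frame DOAs; the probabilistic integration used in the actual system is not reproduced, and all thresholds are assumptions.

```python
import numpy as np

def simple_diarization(frame_energy, frame_doa, n_speakers=2, energy_thresh=0.1):
    """Toy 'who spoke when' sketch: energy-based VAD + 1-D k-means over frame DOAs.

    Returns -1 for frames judged as non-speech, otherwise a speaker index.
    Illustration only: the paper above estimates DOAs per time-frequency slot and
    integrates a statistical VAD with the DOA classification probabilistically.
    """
    speech = frame_energy > energy_thresh
    doas = frame_doa[speech]
    centers = np.quantile(doas, np.linspace(0.1, 0.9, n_speakers))  # k-means init
    for _ in range(50):
        assign = np.argmin(np.abs(doas[:, None] - centers[None, :]), axis=1)
        centers = np.array([doas[assign == k].mean() if np.any(assign == k) else centers[k]
                            for k in range(n_speakers)])
    labels = np.full(len(frame_doa), -1)
    labels[speech] = np.argmin(np.abs(doas[:, None] - centers[None, :]), axis=1)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n = 300
    spk = rng.integers(0, 2, n)
    doa = np.where(spk == 0, -30.0, 40.0) + 3.0 * rng.standard_normal(n)
    energy = np.where(rng.random(n) < 0.8, 1.0, 0.01)     # some silent frames
    print(simple_diarization(energy, doa)[:20])
```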

  • Special section on acoustic scene analysis and reproduction - Foreword

    Shoji Makino

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E91A ( 6 ) 1301 - 1302  2008年06月

    その他  

  • 独立成分分析に基づくブラインド音源分離

    牧野 昭二

    電子情報通信学会技術研究報告. SIP, 信号処理   108 ( 70 ) 65 - 73  2008年05月

     概要を見る

    たくさんの音の中から聞きたい音を聞き分ける音源分離技術として,近年,独立成分分析(Independent Component Analysis, ICA)に基づく手法が脚光を浴びている.この手法は,音源位置の知識や目的音(妨害音)区間の切り出しを原理的に必要とせず,完全なブラインド分離が可能である.統計的処理であるICAは,物理的,音響的にはある種のブラックボックスであり,その中で何が行われているのか,何がどこまで分離できるのかがあまりわかっていなかった.我々はこれまでの研究により,統計的手法であるICAを音響信号処理的な観点から分析して物理的意味づけを与え,従来の音響信号処理技術との関係を解明した.そして,ICAに基づくブラインド音源分離が,適応ビームフォーマ(adaptive beamformer, ABF)と呼ばれるマイクロホンアレーと同じ動作原理を実現しており二乗誤差最小の意味で等価であることを明らかにした.2マイクのABFの支配的な動作は妨害音に1つの死角を向ける動作である.これより,様々な方向からの残響音を消せないことがICAが残響に弱い理由の一つであること,ABFがICAの性能の上限を与えることなどを明らかにした.しかしながら,ICAは音源位置の知識や妨害音区間の切り出しが不要で,音源信号が同時に鳴っていても全く問題ないという点で,ABFの高機能版と言える.

    CiNii
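
    The overview above explains blind source separation with frequency-domain ICA and its interpretation as an adaptive beamformer. As a minimal sketch of the frequency-domain ICA part only, the code below runs a natural-gradient complex ICA update independently in each frequency bin of a determined (2-mic, 2-source) mixture; the permutation and scaling alignment across bins that a complete system needs is deliberately left out, and the polar nonlinearity and learning rate are illustrative choices.

```python
import numpy as np

def fd_ica_per_bin(X, n_iter=200, lr=0.1, eps=1e-8):
    """Natural-gradient complex ICA applied independently in each frequency bin.

    X : observed STFT tensor of shape (n_mics, n_bins, n_frames); the number of
        sources is assumed equal to the number of microphones (determined case).
    Returns the separated STFT tensor and the per-bin demixing matrices.
    Minimal sketch only: the permutation and scaling ambiguities across bins,
    which a complete frequency-domain BSS system must resolve, are left untouched.
    """
    n_mics, n_bins, n_frames = X.shape
    W = np.tile(np.eye(n_mics, dtype=complex), (n_bins, 1, 1))
    for f in range(n_bins):
        Xf = X[:, f, :]
        Xf = Xf / np.sqrt(np.mean(np.abs(Xf) ** 2) + eps)   # normalise bin power
        for _ in range(n_iter):
            Y = W[f] @ Xf
            G = Y / (np.abs(Y) + eps)                        # phi(y) = y / |y|
            dW = (np.eye(n_mics) - (G @ Y.conj().T) / n_frames) @ W[f]
            W[f] = W[f] + lr * dW                            # natural-gradient step
    return np.einsum("fij,jft->ift", W, X), W

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n_bins, n_frames = 32, 400
    S = rng.laplace(size=(2, n_bins, n_frames)) + 1j * rng.laplace(size=(2, n_bins, n_frames))
    A = rng.standard_normal((n_bins, 2, 2)) + 1j * rng.standard_normal((n_bins, 2, 2))
    X = np.einsum("fij,jft->ift", A, S)                      # instantaneous mix per bin
    Y, W = fd_ica_per_bin(X)
    print(Y.shape)
```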

  • 周波数領域ICAにおける初期値の短時間データからの学習

    荒木章子, 伊藤信貴, 澤田宏, 小野順貴, 牧野昭二, 嵯峨山茂樹

    電子情報通信学会大会講演論文集   2008   208 - 208  2008年03月

    CiNii J-GLOBAL

  • 音声のスパース性を用いた Underdetermined 音源分離

    荒木章子, 澤田 宏, 牧野 昭二

    電子情報通信学会総合大会 基礎・境界講演論文集, 2008     S-46 - S-47  2008年

    CiNii

  • A-10-7 観測ベクトルのクラスタリングによるブラインド音源分離(A-10.応用音響,基礎・境界)

    荒木 章子, 澤田 宏, 向井 良, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2005   208 - 208  2005年09月

    CiNii

  • A-10-9 多くの背景音からの主要音源のブラインド抽出(A-10.応用音響,基礎・境界)

    澤田 宏, 荒木 章子, 向井 良, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2005   210 - 210  2005年09月

    CiNii

  • A-10-8 3次元マイクロホンアレイを用いた多音源ブラインド分離(A-10.応用音響,基礎・境界)

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   2005   209 - 209  2005年09月

    CiNii

  • 移動音源の低遅延実時間ブラインド分離

    向井 良, 澤田 宏, 荒木 章子, 牧野 昭二

    日本音響学会研究発表会講演論文集   2003 ( 1 ) 779 - 780  2003年03月

    CiNii

  • 独立成分分析に基づくブラインド音源分離

    牧野昭二

    ディジタル信号処理シンポジウム   103 ( 129 ) 17 - 24  2003年

    記事・総説・解説・論説等(学術雑誌)  

     概要を見る

    私たちが普段それほど意識せずに行っている「聞きたい音を聞き分ける」という能力がコンピュータには欠けている.独立成分分析に基づく手法は,ある人が話している声と別の人の声,背景に流れる音楽,雑音等,それぞれの音は互いに統計的に独立であるという仮定により,複数のマイクで観測した信号を互いに独立な信号に分離すれば,それぞれのもとの音を復元できる,という原理に基づいている.この手法は,音源や混合系の情報を原理的に必要としない,いわゆるブラインドな分離が可能である.招待講演では,独立成分分析とは何か,ブラインド音源分離とは何か,どのようにして分離が達成されるのか,分離のメカニズムはどのようなものか,などについて,できるだけ直感的に分り易く説明する[1].

    CiNii

  • サブバンド処理によるブラインド音源分離に関する検討

    荒木 章子, AICHNER Robert, 牧野 昭二, 西川 剛樹, 猿渡 洋

    日本音響学会研究発表会講演論文集   2002 ( 1 ) 619 - 620  2002年03月

    CiNii

  • 間隔の異なる複数のマイクペアによるブラインド音源分離

    澤田 宏, 荒木 章子, 向井 良, 牧野 昭二

    日本音響学会研究発表会講演論文集   2002 ( 1 ) 621 - 622  2002年03月

    CiNii

  • 周波数領域ICAと時間遅れスペクトル減算による残響下での実時間ブラインド音源分離

    向井 良, 荒木 章子, 澤田 宏, 牧野 昭二

    日本音響学会研究発表会講演論文集   2002 ( 1 ) 673 - 674  2002年03月

    CiNii

  • 周波数領域ブラインド音源分離と周波数領域適応ビームフォーマの関係について

    荒木 章子, 牧野 昭二, 向井 良, 猿渡 洋

    日本音響学会研究発表会講演論文集   2001 ( 2 ) 613 - 614  2001年10月

    CiNii

  • 非定常スペクトルサブトラクションによる音源分離後の残留雑音除去

    向井 良, 荒木 章子, 澤田 宏, 牧野 昭二

    日本音響学会研究発表会講演論文集   2001 ( 2 ) 617 - 618  2001年10月

    CiNii

  • 実環境での混合音声に対する周波数領域ブラインド音源分離手法の性能限界

    荒木 章子, 牧野 昭二, 西川 剛樹, 猿渡 洋

    日本音響学会研究発表会講演論文集   2001 ( 1 ) 567 - 568  2001年03月

    CiNii

  • 実環境におけるブラインド音源分離と残響除去性能に関する検討

    向井 良, 荒木 章子, 牧野 昭二

    日本音響学会研究発表会講演論文集   2001 ( 1 ) 565 - 566  2001年03月

    CiNii

  • 帯域分割型 ICA を用いた Blind Source Separation における帯域分割数の最適化

    西川 剛樹, 荒木 章子, 牧野 昭二, 猿渡 洋

    日本音響学会研究発表会講演論文集   2001 ( 1 ) 569 - 570  2001年03月

    CiNii

  • 周波数領域 Blind Source Separation における帯域分割数の最適化

    西川 剛樹, 荒木 章子, 牧野 昭二, 猿渡 洋

    電子情報通信学会技術研究報告. EA, 応用音響   100 ( 580 ) 53 - 59  2001年01月

     概要を見る

    本稿では周波数領域Blind Source Separation(BSS)における帯域分割数の最適化について述べる.一般に, 従来の周波数領域ICAに基づくBSSは残響に弱い.一方, 残響除去を行う逆フィルタを構成する際, フィルタ長(もしくは帯域分割数)を増やすと残響抑圧性能が向上するということが知られている.そこでまず初めに, 残響下における分離性能を向上させるため, BSSにおける帯域分割数を増加させて音源分離実験を行った.音源分離実験の結果, 帯域分割数を過度に増やすと分離精度が劣化してしまうことが分かった.次に, この劣化原因を明らかにするため, 独立性を測る簡易な客観評価量を定義し, 帯域分割数と狭帯域信号間の独立性の関係を調べた.独立性評価実験の結果より, 帯域分割数を増やすと独立性が低下することが確認された.よって, 周波数領域ICAに基づくBSSにおいては, 最適な帯域分割数が存在することがわかった.

    CiNii

  • チャネル数変換型多チャネル音響エコーキャンセラ

    中川 朗, 島内 末廣, 羽田 陽一, 青木 茂明, 牧野 昭二

    電子情報通信学会総合大会講演論文集   2000   140 - 140  2000年03月

    CiNii

  • ステレオエコーキャンセラにおける相互相関変動方式の検討

    鈴木 邦和, 杉山 精, 阪内 澄宇, 島内 末廣, 牧野 昭二

    電子情報通信学会技術研究報告. EA, 応用音響   99 ( 518 ) 25 - 32  1999年12月

     概要を見る

    ステレオ音声による拡声通話システムで必要となるステレオエコーキャンセラでは,ステレオの受話信号間の相互相関が高い場合が多く,適応フィルタが真のエコー経路に収束しない,真値への収束速度が遅い,といった問題がある.これらの問題を解決するために,人為的に相関を変動させる前処理方式が数多く提案されているが,これらの方式は音声に歪みを伴うという欠点がある.本報告では,実際の通話時における遠端の送話者の微小な移動に着目し,音像の定位を乱すことなく相互相関を変動させる方式を提案する.さらに聴覚特性を考慮した最適化により,収束性能の向上と音声品質保持の両立が可能であるという検討結果を示す.

    CiNii

  • 音響系の変動に着目したステレオ信号の相関低減方法 -(第2報)聴覚特性を考慮した最適化

    鈴木 邦和, 阪内 澄宇, 島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1999 ( 2 ) 495 - 496  1999年09月

    CiNii

  • 音響系の変動に着目したステレオ信号の相関低減方法

    鈴木 邦和, 阪内 澄宇, 島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1999 ( 1 ) 453 - 454  1999年03月

    CiNii

  • ハンズフリー音声会議装置における複数マイクロホンの構成の検討

    中川 朗, 島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1999 ( 1 ) 493 - 494  1999年03月

    CiNii

  • 相互相関の変動付加処理に適したステレオエコーキャンセラの構成の検討

    島内 末廣, 羽田 陽一, 牧野 昭二, 金田 豊

    電子情報通信学会総合大会講演論文集   1998   121 - 121  1998年03月

    CiNii

  • ブロック長を独立にしたブロック高速射影法

    田中 雅史, 牧野 昭二, 小島 順治

    電子情報通信学会総合大会講演論文集   1997   554 - 555  1997年03月

     概要を見る

    Block processing is an effective approach for reducing the computational complexity of adaptive filtering algorithms, although it delays the adaptive filter output and degrades the convergence rate in some implementations. Recently, Benesty [1] proposed a solution to these problems. He introduced the idea of 'exact' block processing, which produces a filter output exactly the same as that of the corresponding sample-by-sample algorithm and has short delay by exploiting the fast FIR filtering method. Block processing can be applied to two parts of adaptive filtering algorithms, i.e. computing the filter output and updating the filter. Conventional 'exact' block algorithms have used the same block size for both parts. This short paper presents the 'exact' block projection algorithm [2] having two independent block sizes, which is listed in List 1. By showing the relation between the filter length and the output delay for a given computation power, we see that the independent block sizes extend the applicability of the 'exact' block fast projection algorithm to uses that tolerate a longer delay.

    CiNii
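
    The abstract above concerns an 'exact' block implementation of the projection algorithm with independent block sizes. The sketch below shows only the underlying sample-by-sample projection (affine projection) recursion of order p that such a block-exact algorithm reproduces; the block fast-FIR machinery itself is not implemented, and the regularisation `delta` and step size `mu` are assumptions.

```python
import numpy as np

def affine_projection(x, d, taps=128, order=4, mu=0.5, delta=1e-4):
    """Sample-by-sample projection (affine projection) algorithm of order p.

    x, d  : input and desired signals (e.g. far-end signal and echo)
    taps  : adaptive filter length L,  order : projection order p (p = 1 is NLMS)
    The 'exact' block algorithm in the entry above produces the same output as
    this recursion while reducing complexity via block fast-FIR filtering; that
    machinery is not reproduced in this sketch.
    """
    w = np.zeros(taps)
    e_out = np.zeros(len(d))
    for n in range(taps + order, len(d)):
        # columns are the p most recent regressor vectors [x[n-k], ..., x[n-k-L+1]]
        U = np.column_stack([x[n - k - taps + 1: n - k + 1][::-1] for k in range(order)])
        e = d[n - np.arange(order)] - U.T @ w          # p most recent a-priori errors
        g = np.linalg.solve(U.T @ U + delta * np.eye(order), e)
        w += mu * U @ g
        e_out[n] = e[0]
    return e_out

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    x = rng.standard_normal(8000)
    h = np.exp(-0.05 * np.arange(128)) * rng.standard_normal(128)   # toy echo path
    d = np.convolve(x, h)[: len(x)]
    e = affine_projection(x, d)
    print("residual power:", np.mean(e[-500:] ** 2))
```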

  • サブバンドエコーキャンセラにおけるフィルタ係数更新ベクトルの平坦化の検討

    中川 朗, 羽田 陽一, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   1996   88 - 88  1996年09月

     概要を見る

    サブバンドエコーキャンセラ(SBEC)では、間引き率を上げ分割数に近付けると、エリアジングの影響により定常消去量が劣化する。これを避けるために間引き率を下げると、適応フィルタへの入力信号に帯域通過フィルタの特性が影響し、収束速度が劣化する。筆者らはこの問題に対し、入力信号と反響信号に異なる特性の分割フィルタを設定する方法を既に提案した。本報告では、適応フィルタ係数の更新部への入力信号が固定の周波数特性を持つことに注目し、これを固定係数の平坦化フィルタで平坦化することによって収束特性を改善する方法を提案する。

    CiNii

  • 拡声通信システムにおける周波数帯域別所要エコー抑圧量の検討

    阪内 澄宇, 牧野 昭二

    日本音響学会研究発表会講演論文集   1996 ( 2 ) 547 - 548  1996年09月

    CiNii

  • 射影アルゴリズムを用いたサブバンドステレオエコーキャンセラ

    牧野 昭二, 島内 末廣, 羽田 陽一, 中川 朗

    日本音響学会研究発表会講演論文集   1996 ( 2 ) 549 - 550  1996年09月

    CiNii

  • 高速射影アルゴリズムの多チャンネル系への適用

    島内 末廣, 田中 雅史, 牧野 昭二

    電子情報通信学会総合大会講演論文集   1996   170 - 170  1996年03月

     概要を見る

    線形未知システムに対する入出力をもとに、そのシステムを同定する一手法として、次数の選択により、演算量に応じた同定速度が得られる射影アルゴリズムがある。高速算法の利用により、演算量はさらに低減可能である。また、ステレオ音響エコーキャンセラヘの適用等、多チャンネル系の同定法としても提案されている。本報告では、多チャンネル系に拡張された射影アルゴリズムに高速算法を適用する。

    CiNii

  • サブバンドエコーキャンセラのプロトタイプフィルタの検討

    中川 朗, 羽田 陽一, 牧野 昭二

    電子情報通信学会ソサイエティ大会講演論文集   1995   75 - 75  1995年09月

     概要を見る

    サブバンドエコーキャンセラ(SBEC)は、音声の白色化効果による適応フィルタの収束速度向上、間引きによる演算量の低減が望める。その一方で、帯域分割/合成フィルタ処理による遅延や定常消去量の低下が問題となる。本報告では、図1に示すポリフェーズ型SBECの2つの帯域分割用プロトタイプフィルタA(z)、B(z)のフィルタ長および適応フィルタ長に着目し、収束特性の改善方法について検討した。

    CiNii

  • 擬似入出力関係を利用したステレオ音響エコーキャンセラ用アルゴリズムの検討

    島内 末廣, 牧野 昭二

    日本音響学会研究発表会講演論文集   1995 ( 2 ) 543 - 544  1995年09月

    CiNii

  • エコーキャンセラ用SSBサブバンド射影アルゴリズム

    牧野 昭二, 羽田 陽一, 中川 朗

    日本音響学会研究発表会講演論文集   1995 ( 2 ) 541 - 542  1995年09月

    CiNii

  • 複素射影サブバンドエコーキャンセラに関する検討

    中川 朗, 羽田 陽一, 牧野 昭二

    日本音響学会研究発表会講演論文集   1995 ( 2 ) 539 - 540  1995年09月

    CiNii

  • 真の音響エコー経路を推定するステレオ射影エコーキャンセラの検討

    島内 末廣, 牧野 昭二

    電子情報通信学会総合大会講演論文集   1995   220 - 220  1995年03月

     概要を見る

    ステレオ音声による通信会議等に不可欠となるステレオエコーキャンセラには、ステレオ信号の相互相関の影響により音響エコー経路の推定を誤る問題がある。このため、話者交替等の度に残留エコーの増大が起きる。本報告では、ステレオ信号の相互相関の変化を強制して利用するステレオ射影エコーキャンセラについて、真の音響エコー経路の推定への有効性と話者交替時の残留エコー増大の低減効果を示す。

    CiNii

  • ES射影アルゴリズムの音響エコーキャンセラへの適用

    牧野 昭二, 羽田 陽一, 田中 雅史, 金田 豊, 小島 順治

    電子情報通信学会総合大会講演論文集   1995   349 - 349  1995年03月

     概要を見る

    エコーキャンセラを実環境で安定に動作させるためには,受話音声の微小音区間に対する対策や,ダブルトーク対策が重要である.ここでは,ES射影アルゴリズムをDuo Filter構成のエコーキャンセラに適用し,速い収束と安定な動作を実現したので報告する.

    CiNii

  • ES射影アルゴリズムを用いたデュオフィルタ構成のエコーキャンセラの検討

    羽田 陽一, 牧野 昭二, 小島 順治, 島内 末廣

    日本音響学会研究発表会講演論文集   1995 ( 1 ) 595 - 596  1995年03月

    CiNii

  • 音響エコーキャンセラ用デュオフィルタコントロールシステム

    羽田陽一, 牧野 昭二, 田中 雅史, 島内 末廣, 小島 順治

    1995電子情報通信学会総合大会, March     350 - 350  1995年

     概要を見る

    音響エコーキャンセラを実環境で動作させるためには、(1)ダブルトーク検出を含め、適応動作制御を如何に行なうか。(2)適応フィルタが音響系を同定していない状態で如何にハウリングを抑えるか。の2点が特に重要となる。(1)のダブルトークの検出技術に関してはこれまで多くの研究がなされてきているが、特に優れた方式はなく、送話信号と受話信号のパワー比較などで行なわれている。また、(2)に関しては音声スイッチとの併用法が提案されているが、音響結合量を予測して最適な挿入損失量を与えないと、結果的に過大な挿入損失を与えてしまい、通話に切断感を与えてしまう。本報告では、ES射影アルゴリズムを用いたDuo Filter Control Systemを提案し、上記2つの問題を解決したので報告する。

    CiNii

  • 高速FIRフィルタリング算法を利用した射影法

    田中雅史, 牧野 昭二, 金田 豊

    信学ソ大, Sept. 1995     81 - 81  1995年

     概要を見る

    近年提案されている高速射影法の演算量は適応フィルタの次数をL、射影の次数をpとすると、約2L+20pであり、演算量2LのNLMS(学習同定法)とほぼ同程度の演算量の少ない手法といえる。しかし、音響エコーキャンセラのようにフィルタ長Lが数百、数千にもなる応用ではさらなる演算量の削減が要求される。本報告では、高速FIRフィルタリング算法を射影法に導入することで、さらに演算量を削減する方法を示す。この提案法では、高速FIRフィルタリング算法がブロック処理を行なうので推定誤差の出力が遅れるが、その他の収束特性は逐次処理に基づくオリジナルの射影法の性能が保たれる。

    CiNii
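
    Plugging numbers into the operation counts quoted above (about 2L multiplications per sample for NLMS and about 2L + 20p for the fast projection algorithm) shows how small the relative overhead is for echo-canceller-sized filters; the fast FIR (block) formulation then reduces the 2L term further at the cost of a block delay in the error output.

        # Rough per-sample multiply counts, using the figures quoted above.
        for L in (256, 1024, 4096):          # adaptive filter length
            for p in (2, 8, 16):             # projection order
                nlms = 2 * L
                fast_proj = 2 * L + 20 * p
                overhead = 100.0 * (fast_proj - nlms) / nlms
                print(f"L={L:5d} p={p:2d}  NLMS~{nlms:6d}  "
                      f"fast projection~{fast_proj:6d}  (+{overhead:.1f}%)")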

  • 音響エコーキャンセラのための適応信号処理の研究

    牧野昭二

    東北大学博士論文   71 ( 12 ) 2212 - 2214  1993年

    CiNii

  • 帯域分割形指数重み付けアルゴリズムを用いた音響エコーキャンセラ

    牧野昭二

    信学全大,SA-9-4    1990年

    CiNii

産業財産権

  • Device for blind source separation

    H. Sawada, S. Araki, R. Mukai, and S. Makino, 牧野, 昭二

    特許権

  • Device for blind source separation

    S. Araki, H. Sawada, S. Makino, and R. Mukai

    特許権

  • Apparatus, method and program for estimation of positional information on signal sources

    H. Sawada, R. Mukai, S. Araki, and S. Makino, 牧野, 昭二

    特許権

  • 音情報処理装置及びプログラム

    牧野, 昭二, 山岡洸瑛, 山田武志, 小野順貴

    特許権

  • 音響処理装置, 音響処理システム及び音響処理方法

    牧野昭二, 石村, 大, 前, 成美, 山田武志, 小野順貴

    特許権

  • 信号処理装置、信号処理方法、プログラム、記録媒体 (可変カットオフ周波数によるポストフィルタリング方法)

    鎌本,優, 守谷,健弘, 原田,登, 千葉,大将, 宮部,滋樹, 山田,武志, 牧野,昭二

    特許権

  • 音声信号処理装置及び方法

    小野,順貴, 宮部,滋樹, 牧野,昭二

    特許権

  • 信号処理装置、信号処理方法、プログラム (ピッチ周波数に依存する可変ゲインによるポストフィルタリング方法)

    鎌本,優, 守谷,健弘, 原田,登, 千葉,大将, 宮部,滋樹, 山田,武志, 牧野,昭二

    特許権

  • 方向情報分布推定装置, 音源数推定装置, 音源方向測定装置, 音源分離装置, それらの方法, それらのプログラム

    荒木, 章子, 中谷, 智広, 澤田, 宏, 牧野, 昭二

    特許権

  • 複数信号区間推定装置, 複数信号区間推定方法, そのプログラムおよび記録媒体

    荒木, 章子, 石塚, 健太郎, 藤本, 雅清, 中谷, 智広, 牧野, 昭二

    特許権

  • 複数信号区間推定装置とその方法と, プログラムとその記録媒体

    荒木, 章子, 石塚, 健太郎, 藤本, 雅清, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム, 記録媒体

    澤田, 宏, 荒木, 章子, 牧野, 昭二

    特許権

  • 多信号強調装置, 方法, プログラム及びその記録媒体

    荒木, 章子, 澤田, 宏, 牧野, 昭二

    特許権

  • ブラインド信号抽出装置, その方法, そのプログラム, 及びそのプログラムを記録した記録媒体

    荒木, 章子, 澤田, 宏, Jan, Cermak, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体, 並びに, 信号到来方向推定装置, 信号到来方向推定方法, 信号到来方向推定プログラム及び記録媒体

    澤田, 宏, 牧野, 昭二, 荒木, 章子, 向井, 良

    特許権

  • 信号到来方向推定装置, 信号到来方向推定方法, 信号到来方向推定プログラム及び記録媒体

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    特許権

  • 信号到来方向推定方法, 装置, プログラムおよびこれを記録した記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 澤田, 宏, 向井, 良, 牧野, 昭二

    特許権

  • 信号抽出装置, 信号抽出方法, 信号抽出プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 牧野, 昭二, 澤田, 宏, 向井, 良

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号源数の推定方法, 推定装置, 推定プログラム及び記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号分離装置, 信号分離方法, 信号分離プログラム及び記録媒体

    荒木, 章子, 牧野, 昭二, 澤田, 宏, 向井, 良

    特許権

  • 信号分離方法, 信号分離装置, 信号分離プログラム及び記録媒体

    澤田, 宏, 荒木, 章子, 向井, 良, 牧野, 昭二

    特許権

  • 信号分離方法および装置ならびに信号分離プログラムおよびそのプログラムを記録した記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • ブラインド信号分離装置, ブラインド信号分離方法及びブラインド信号分離プログラム

    荒木, 章子, 牧野, 昭二, 向井, 良, 澤田, 宏

    特許権

  • ブラインド信号分離装置, ブラインド信号分離方法及びブラインド信号分離プログラム

    向井, 良, 澤田, 宏, 荒木, 章子, 牧野, 昭二

    特許権

  • ブラインド信号分離方法, ブラインド信号分離プログラム及び記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • 信号到来方向推定方法, 装置, プログラムおよびこれを記録した記録媒体

    澤田, 宏, 向井, 良, 荒木, 章子, 牧野, 昭二

    特許権

  • SmoothQuiet

    牧野, 昭二, 小島, 順治

    特許権

  • QuiteSmooth

    牧野, 昭二, 小島, 順治

    特許権

  • EchoCam

    牧野, 昭二, 小島 順治

    特許権

  • SUBBANDES

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • ESPARC

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    特許権

  • Radespa

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    特許権

  • DISCAS

    羽田, 陽一, 牧野, 昭二, 小島, 順治

    特許権

  • ES射影アルゴリズム

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • デュオフィルタ

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • インテリジェント ロス コントローラ

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • フェールセーフ適応動作制御方式

    牧野, 昭二, 羽田, 陽一, 小島, 順治

    特許権

  • スムーストーク

    牧野, 昭二, 小島, 順治

    特許権

 

現在担当している科目

担当経験のある科目(授業)

  • 情報科学概論Ⅱ

    筑波大学  

 

他学部・他研究科等兼任情報

  • 理工学術院   基幹理工学部

学内研究所・附属機関兼任歴

  • 2022年
    -
    2024年

    理工学術院総合研究所   兼任研究員

特定課題制度(学内資金)

  • ⾳環境の認識と理解のための⾰新的マイクロホンアレー基盤技術の研究

    2023年  

     概要を見る

    ブラインド処理と空間正則化処理に基づいてオンライン音源分離,残響除去,およびノイズ低減を実行する,計算効率の高い同時最適化アルゴリズムを提案した.まず,独立ベクトル抽出(IVE)と重み付き予測誤差残響除去(WPE)のブラインドオンライン同時最適化アルゴリズムを提案した.このオンラインアルゴリズムは,WPEを使用することで残響を低減できるため,短い分析フレームでも正確な分離を実現できた.次に,オンライン同時最適化をロバストな空間正則化で拡張した.DOA ベースの空間正則化を確実に機能させるためには,分離された信号のスケールを正規化することが非常に効果的であることを明らかにした.実験では,ブラインドオンライン同時最適化アルゴリズムが 8 ms のアルゴリズム遅延で分離精度を大幅に改善できることを確認した.さらに,提案した空間正則化オンライン同時最適化アルゴリズムが音源順序エラーを 0 % に低減することを確認した.
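
    The scale normalization that makes the DOA-based spatial regularization reliable is commonly implemented by 'projection back' onto a reference microphone. The sketch below shows that normalization step only, with assumed array shapes; it is one common realization, not necessarily the exact formulation used in this project.

        import numpy as np

        def projection_back(Y, W, ref_mic=0):
            """Rescale separated signals to their image at a reference microphone.

            Y : (F, T, N) separated STFT coefficients (frequency, frame, source)
            W : (F, N, M) demixing matrices per frequency bin
            Removes the arbitrary per-bin scale of the separated sources, which
            helps keep DOA-based spatial regularization well behaved.
            """
            A = np.linalg.pinv(W)                  # (F, M, N) estimated mixing matrices
            scale = A[:, ref_mic, :]               # each source's image at the reference mic
            return Y * scale[:, None, :]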

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    2022年  

     概要を見る

    空間正則化付き独立ベクトル抽出(SRIVE)は,事前推定した音響伝達関数を用いて,所望の出力順序になるように音源分離を行う.しかし,従来のSRIVEはスケール任意性や伝達関数の誤差による出力順序誘導への影響が十分に考慮されていなかった.本研究では,空間正則化に加えてさらに分離フィルタのスケールを小さくする正則化を導入することで上記の問題の解決を試みた.実験より,スケール正則化が分離性能(SDR)を維持しつつ,出力順序正答率を75%から100%に改善することを確かめた.
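
    One plausible way to combine the spatial regularization with the additional scale-shrinking regularization described above is as two penalty terms on each demixing filter; the sketch below writes their gradients (with respect to the conjugate filter) for a single source and frequency bin. The quadratic penalty forms and weights are illustrative assumptions, not the exact objective used in this project.

        import numpy as np

        def regularizer_gradient(w, a, lam_spatial=1.0, lam_scale=0.1):
            """Gradient (w.r.t. conj(w)) of two penalties added to an IVE-type objective.

            w : (M,) complex demixing filter for one source and one frequency bin
            a : (M,) pre-estimated steering / transfer-function vector for that source
            Spatial penalty  lam_spatial * |w^H a - 1|^2 : steers the output order by
                pulling the filter toward a distortionless response for its source.
            Scale penalty    lam_scale * ||w||^2         : shrinks the filter scale so
                that errors in the estimated transfer function do not dominate.
            """
            # d/d(conj w) of |w^H a - 1|^2 is (a^H w - 1) * a, with a^H w = np.vdot(a, w).
            grad_spatial = lam_spatial * (np.vdot(a, w) - 1.0) * a
            grad_scale = lam_scale * w                 # d/d(conj w) of ||w||^2 is w
            return grad_spatial + grad_scale

        # Example for M = 4 microphones; descend along -g inside the IVE update.
        rng = np.random.default_rng(0)
        w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
        g = regularizer_gradient(w, a)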

  • 音環境の認識と理解のための革新的マイクロホンアレー基盤技術の研究

    2021年  

     概要を見る

    This research explores whether the newly proposed online algorithm that jointly optimizes weighted prediction error (WPE) and independent vector analysis (IVA) works well in separating moving sound sources in reverberant indoor environments. The moving source is first fixed and then rotated 60 degrees in a room at a speed of less than 10 cm/s, while the other remains fixed. Through the comparison of the online-AuxIVA, online-WPE+IVA (separate), and online-WPE+IVA (joint) algorithms, we can conclude that the online-WPE+IVA (joint) method has the best separation performance when the sources are fixed, but online-WPE+IVA (separate) is more stable and has better performance when removing moving sources from the mixed sound.
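
    For completeness, separation performance comparisons such as the one above are usually quantified with an SDR-type metric. A scale-invariant SDR can be computed as below; this is a generic metric implementation, not necessarily the exact evaluation protocol used in this project.

        import numpy as np

        def si_sdr(estimate, reference, eps=1e-12):
            """Scale-invariant signal-to-distortion ratio in dB."""
            ref = reference - reference.mean()
            est = estimate - estimate.mean()
            target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
            noise = est - target
            return 10.0 * np.log10((np.dot(target, target) + eps) /
                                   (np.dot(noise, noise) + eps))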