Kohei Yatabe (矢田部 浩平)

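Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job title

Lecturer (fixed-term)

Homepage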

http://www.acoust.ias.sci.waseda.ac.jp/members/1

Degree

  • Waseda University   Doctor of Engineering

 

Research Areas

  • Perceptual information processing

Research Keywords

  • Signal processing

  • Optical measurement

  • Acoustical engineering

Papers

  • Physical-model-based reconstruction of axisymmetric three-dimensional sound field from optical interferometric measurement

    Kenji Ishikawa, Kohei Yatabe, Yasuhiro Oikawa

    Measurement Science and Technology   32 (4)   April 2021   [Peer-reviewed]

    Optical interferometric measurement methods for sound fields have garnered considerable attention owing to their contactless nature. The capabilities of non-invasive measurement and reconstruction of three-dimensional sound fields are significant for characterizing acoustic transducers. However, three-dimensional reconstructions are typically time consuming because of the two-dimensional scanning and rotation of the measurement system. This paper presents a scan and rotation-free reconstruction of an axisymmetric sound field in the human hearing range. A physical-model-based algorithm is proposed to reconstruct an axisymmetric sound field from optical interferograms recorded using parallel phase-shifting interferometry and a high-speed polarization camera. We demonstrate that audible sound fields can be reconstructed from data measured in 10 ms. The proposed method is effective for the rapid evaluation of axially symmetric acoustic transducers.

  • Gamma Boltzmann Machine for Audio Modeling

    Toru Nakashika, Kohei Yatabe

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   29   2591-2605   2021   [Peer-reviewed]

    This paper presents an energy-based probabilistic model that handles nonnegative data in consideration of both linear and logarithmic scales. In audio applications, magnitude of time-frequency representation, including spectrogram, is regarded as one of the most important features. Such magnitude-based features have been extensively utilized in learning-based audio processing. Since a logarithmic scale is important in terms of auditory perception, the features are usually computed with a logarithmic function. That is, a logarithmic function is applied within the computation of features so that a learning machine does not have to explicitly model the logarithmic scale. We think in a different way and propose a restricted Boltzmann machine (RBM) that simultaneously models linear- and log-magnitude spectra. RBM is a stochastic neural network that can discover data representations without supervision. To manage both linear and logarithmic scales, we define an energy function based on both scales. This energy function results in a conditional distribution (of the observable data, given hidden units) that is written as the gamma distribution, and hence the proposed RBM is termed gamma-Bernoulli RBM. The proposed gamma-Bernoulli RBM was compared to the ordinary Gaussian-Bernoulli RBM by speech representation experiments. Both objective and subjective evaluations illustrated the advantage of the proposed model.

  • Determined BSS Based on Time-frequency Masking and Its Application to Harmonic Vector Analysis

    Kohei Yatabe, Daichi Kitamura

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   2021   [Peer-reviewed]

    This paper proposes harmonic vector analysis (HVA) based on a general algorithmic framework of audio blind source separation (BSS) that is also presented in this paper. BSS for a convolutive audio mixture is usually performed by multichannel linear filtering when the numbers of microphones and sources are equal (determined situation). This paper addresses such determined BSS based on batch processing. To estimate the demixing filters, effective modeling of the source signals is important. One successful example is independent vector analysis (IVA) that models the signals via co-occurrence among the frequency components in each source. To give more freedom to the source modeling, a general framework of determined BSS is presented in this paper. It is based on the plug-and-play scheme using a primal-dual splitting algorithm and enables us to model the source signals implicitly through a time-frequency mask. By using the proposed framework, determined BSS algorithms can be developed by designing masks that enhance the source signals. As an example of its application, we propose HVA by defining a time-frequency mask that enhances the harmonic structure of audio signals via sparsity of cepstrum. The experiments showed that HVA outperforms IVA and independent low-rank matrix analysis (ILRMA) for both speech and music signals. A MATLAB code is provided along with the paper for a reference.

  • Deep Griffin-Lim Iteration: Trainable Iterative Phase Reconstruction Using Neural Network

    Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

    IEEE Journal of Selected Topics in Signal Processing   15 (1)   37-50   January 2021   [Peer-reviewed]

    In this paper, we propose a phase reconstruction framework, named Deep Griffin-Lim Iteration (DeGLI). Phase reconstruction is a fundamental technique for improving the quality of sound obtained through some process in the time-frequency domain. It has been shown that the recent methods using deep neural networks (DNN) outperformed the conventional iterative phase reconstruction methods such as the Griffin-Lim algorithm (GLA). However, the computational cost of DNN-based methods is not adjustable at the time of inference, which may limit the range of applications. To address this problem, we combine the iterative structure of GLA with a DNN so that the computational cost becomes adjustable by changing the number of iterations of the proposed DNN-based component. A training method that is independent of the number of iterations for inference is also proposed to minimize the computational cost of the training. This training method, named sub-block training by denoising (SBTD), avoids recursive use of the DNN and enables training of DeGLI with a single sub-block (corresponding to one GLA iteration). Furthermore, we propose a complex DNN based on complex convolution layers with gated mechanisms and investigated its performance in terms of the proposed framework. Through several experiments, we found that DeGLI significantly improved both objective and subjective measures from GLA by incorporating the DNN, and its sound quality was comparable to those of neural vocoders.

  • Determination of frequency response of MEMS microphone from sound field measurements using optical phase-shifting interferometry method

    Denny Hermawanto, Kenji Ishikawa, Kohei Yatabe, Yasuhiro Oikawa

    Applied Acoustics   170   December 2020   [Peer-reviewed]

    Accurate determination of microphone sensitivity is important to build reliable acoustical instruments. The sensitivity is usually determined by calibration. However, because available microphone calibration methods determine the sensitivity from a mathematical model derived from the geometry of a conventional condenser microphone, they cannot be applied to the calibration of microelectromechanical systems (MEMS) microphone straightforwardly. To compromise this geometry difference with the available calibration methods, some authors have proposed the development of adapters that fits the conventional calibration apparatus and modified the calibration procedure. In this paper, we propose a different approach to calibrate the MEMS microphone. The sensitivity is calculated directly from the measurement of the sound field applied to the MEMS microphone and its output voltage. The projection of the sound field is measured by parallel phase-shifting interferometry (PPSI), and sound pressure on the MEMS microphone is obtained by tomographic reconstruction. Experimental calibration of a MEMS microphone was performed and validated using a microphone substitution method to evaluate the discrepancies of the sensitivity result. It is shown that the proposed method can be used to determine the frequency response of the MEMS microphone in the frequency range of 1000 Hz to 12000 Hz.

Books and Other Publications

Awards

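  • 13th Itakura Prize Innovative Young Researcher Award

    March 2018   Acoustical Society of Japan

  • Waseda University Teaching Award

    February 2018   Waseda University

  • 38th Awaya Prize Young Researcher Award

    September 2015   Acoustical Society of Japan

  • 8th Student Presentation Award

    March 2014   Acoustical Society of Japan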

Special Research Projects

  • Acoustic signal processing based on the structure of time-frequency analysis

    2020

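    Time-frequency analysis is widely used to analyze and process time-series data such as acoustic signals. Because it provides information on the frequency components at each time instant, specific components can be processed selectively, which makes it an indispensable tool, particularly in acoustic signal processing. Time-frequency analysis is subject to constraints arising from the uncertainty principle, and exploiting this structure makes it possible to take the information in the original data into account; so far, however, the structure of time-frequency analysis has rarely been exploited actively. In this research, we focused on the adjacency relations between time and frequency in the time-frequency domain and investigated new approaches to acoustic signal processing.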

  • Acoustic signal processing based on heuristic extension of proximal splitting methods

    2019

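    In signal processing, the desired processing of data is often achieved by reducing the task at hand to an optimization problem and solving it with some algorithm. Optimization problems become harder to solve when they contain multiple terms, but recent proximal splitting methods only need to handle each term separately, which makes them well suited to complex problems. In this research, we heuristically extended the optimization procedures used in previously proposed acoustic signal processing methods, and thereby realized new signal processing algorithms that are not bound by conventional formulations. Specifically, by generalizing the proximal operator in a proximal splitting algorithm for multichannel source separation, we made it possible to introduce operations that are difficult to formulate as proximal operators.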

  • Acoustic signal processing by proximal splitting optimization

    2018

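    In signal processing, the desired processing of data is often achieved by reducing the task at hand to an optimization problem and solving it with some algorithm. Optimization problems become harder to solve when they contain multiple terms, but recent proximal splitting methods only need to handle each term separately, which makes them well suited to complex problems. In this research, we realized various kinds of acoustic signal processing by solving problems in acoustics with proximal splitting algorithms. Specifically, we proposed methods for source separation, which estimates the original sources from a mixture; denoising, which removes noise from observed signals; sound source localization, which estimates the position where a sound is emitted; and phase retrieval, which generates phase from an amplitude spectrogram.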

  • Research on acoustic signal processing based on convex optimization

    2017

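    In signal processing, the desired processing of data is often achieved by reducing the problem at hand to an optimization problem and optimizing it with some algorithm. Optimization problems are broadly classified into nonconvex and convex problems, and convex problems are better behaved than nonconvex ones; for example, global optimality can be guaranteed. In this research, we formulated various problems in acoustic signal processing as convex optimization problems and solved them with convex optimization algorithms. Specifically, we proposed methods for source separation, which estimates the original sources from a mixture; denoising, which removes noise from observed signals; mode decomposition, which separates a sound into harmonic components; estimation of initial conditions in acoustic simulation; and envelope estimation of real signals, among others.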

 

Courses Currently Taught

 

Committee Memberships

  • June 2021 - present

    Acoustical Society of Japan   Summer Seminar Executive Committee Member

  • June 2021 - present

    Acoustical Society of Japan   Journal Editorial Committee Member

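  • June 2021 - present

    Institute of Electronics, Information and Communication Engineers (IEICE)   Engineering Sciences Society, Transactions Editorial Committee Member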

  • May 2021 - present

    Acoustical Society of Japan   Councilor

  • May 2018 - present

    Institute of Electronics, Information and Communication Engineers (IEICE)   Technical Committee on Signal Processing
