MORISHIMA, Shigeo

Affiliation

Faculty of Science and Engineering, School of Advanced Science and Engineering

Job title

Professor

Homepage URL

http://www.mlab.phys.waseda.ac.jp/

Concurrent Post

  • Faculty of Science and Engineering   Graduate School of Advanced Science and Engineering

  • Affiliated organization   Global Education Center

Research Institute

  • 2020 - 2022

    Research Institute for Science and Engineering, Waseda University   Concurrent Researcher

Education

  • 1982.04 - 1987.03

    University of Tokyo   Graduate School of Engineering   Department of Electronics

  • 1978.04 - 1982.03

    University of Tokyo   Faculty of Engineering   Department of Electronics

Degree

  • Doctor of Engineering

Research Experience

  • 2004.04 - Now

    Waseda University   Faculty of Science and Engineering   Professor

  • 2010.04 - 2014.03

    NICT   Spoken Language and Communication Research Institute   Invited Researcher

  • 1999.04 - 2010.03

    ATR   Invited Researcher

  • 2001.04 - 2004.03

    Seikei University   Faculty of Engineering   Professor

  • 1988.04 - 2001.03

    Seikei University   Faculty of Engineering   Associate Professor


Professional Memberships

  • THE SOCIETY FOR ART AND SCIENCE

  • JAPANESE ACADEMY OF FACIAL STUDIES

  • THE JAPANESE PSYCHOLOGICAL ASSOCIATION

  • THE INSTITUTE OF IMAGE INFORMATION AND TELEVISION ENGINEERS

  • ACOUSTICAL SOCIETY OF JAPAN


 

Research Areas

  • Intelligent informatics

Research Interests

  • Deep Learning

  • Audio Signal Processing

  • Face Image Processing

  • Multimedia Information Processing

  • Human Computer Interaction


Papers

  • LSTM-SAKT: LSTM-Encoded SAKT-like Transformer for Knowledge Tracing

    Takashi Oya, Shigeo Morishima

    CoRR   abs/2102.00845  2021.01

     View Summary

    This paper introduces the 2nd place solution for the Riiid! Answer Correctness Prediction competition on Kaggle, the world's largest data science competition website. The competition was held from October 16, 2020, to January 7, 2021, with 3395 teams and 4387 competitors. The main insights and contributions of this paper are as follows. (i) We pointed out that existing Transformer-based models suffer from a limit on the information their query/key/value can contain. To address this, we proposed a method that uses an LSTM to obtain the query/key/value and verified its effectiveness. (ii) We pointed out the 'inter-container' leakage problem, which arises in datasets where questions are sometimes served together. To address this, we presented special indexing/masking techniques that are useful with RNN variants and Transformers. (iii) We found that additional hand-crafted features are effective in overcoming a limitation of the Transformer, which can never consider samples older than the sequence length.
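
    As a minimal illustration of insight (i), the following PyTorch-style sketch derives the attention query/key/value from LSTM hidden states instead of plain linear projections. All layer sizes, names, and the single-layer structure are assumptions for illustration, not the authors' actual architecture.

        import torch
        import torch.nn as nn

        class LSTMEncodedAttention(nn.Module):
            # Hypothetical sketch: an LSTM summarizes the interaction
            # history, and its hidden states serve as query/key/value
            # for causal self-attention.
            def __init__(self, d_model=128, n_heads=8):
                super().__init__()
                self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
                self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

            def forward(self, x):  # x: (batch, seq_len, d_model)
                h, _ = self.lstm(x)  # hidden states can carry long history
                L = x.size(1)        # causal mask: -inf above the diagonal
                mask = torch.triu(torch.full((L, L), float('-inf')), diagonal=1)
                out, _ = self.attn(h, h, h, attn_mask=mask)  # Q = K = V = h
                return out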

  • Property analysis of adversarially robust representation

    Yoshihiro Fukuhara, Takahiro Itazuri, Hirokatsu Kataoka, Shigeo Morishima

    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering   87 ( 1 ) 83 - 91  2021.01

     View Summary

    In this paper, we address the open question: "What do adversarially robust models look at?" Recently, many works have reported a trade-off between standard accuracy and adversarial robustness. According to prior work, this trade-off is rooted in the fact that adversarially robust and standard accurate models might depend on very different sets of features. However, what kind of difference actually exists has not been well studied. In this paper, we analyze this difference visually and quantitatively through various experiments. Experimental results show that adversarially robust models look at things at a larger scale than standard models and pay less attention to fine textures. Furthermore, although it has been claimed that adversarially robust features are not compatible with standard accuracy, using them as pre-trained models even has a positive effect, particularly on low-resolution datasets.

    DOI

  • Self-supervised learning for visual summary identification in scientific publications

    Shintaro Yamamoto, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš, Shigeo Morishima

    CEUR Workshop Proceedings   2847   5 - 19  2021

     View Summary

    Providing visual summaries of scientific publications can increase information access for readers and thereby help deal with the exponential growth in the number of scientific publications. Nonetheless, efforts to provide visual publication summaries have been few and far between, focusing primarily on the biomedical domain. This is largely because of the limited availability of annotated gold standards, which hampers the application of robust, high-performing supervised learning techniques. To address these problems, we create a new benchmark dataset for selecting figures to serve as visual summaries of publications based on their abstracts, covering several domains in computer science. Moreover, we develop a self-supervised learning approach based on heuristic matching of inline references to figures with figure captions. Experiments in both the biomedical and computer science domains show that our model is able to outperform the state of the art despite being self-supervised and therefore not relying on any annotated training data.
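
    The self-supervision heuristic described above might look like the following sketch: sentences that explicitly cite a figure are paired with that figure's caption to form positive training pairs without human annotation. The regular expression and pairing rule are assumptions for illustration; the paper's exact heuristic may differ.

        import re

        FIG_REF = re.compile(r'\b(?:Fig\.|Figure)\s*(\d+)', re.IGNORECASE)

        def heuristic_pairs(sentences, captions):
            # captions: dict mapping figure number -> caption text.
            # Returns (sentence, caption) pairs usable as positives
            # for self-supervised matching.
            pairs = []
            for sent in sentences:
                for num in FIG_REF.findall(sent):
                    if int(num) in captions:
                        pairs.append((sent, captions[int(num)]))
            return pairs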

  • Do We Need Sound for Sound Source Localization?

    Takashi Oya, Shohei Iwase, Ryota Natsume, Takahiro Itazuri, Shugo Yamaguchi, Shigeo Morishima

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   12627 LNCS   119 - 136  2021

     View Summary

    In sound source localization that uses both visual and aural information, it remains unclear how much the image and sound modalities each contribute to the result, i.e., do we need both image and sound for sound source localization? To address this question, we develop an unsupervised learning system that solves sound source localization by decomposing the task into two steps: (i) “potential sound source localization”, which localizes possible sound sources using only visual information, and (ii) “object selection”, which identifies which objects are actually sounding using aural information. Our overall system achieves state-of-the-art performance in sound source localization, and more importantly, we find that despite the constraint on available information, the results of (i) achieve similar performance. From this observation and further experiments, we show that visual information is dominant in “sound” source localization when evaluated with the currently adopted benchmark dataset. Moreover, we show that the majority of sound-producing objects within the samples in this dataset can be inherently identified using only visual information, and thus that the dataset is inadequate to evaluate a system’s capability to leverage aural information. As an alternative, we present an evaluation protocol that enforces both visual and aural information to be leveraged, and verify this property through several experiments.

    DOI
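
    A rough sketch of the two-step decomposition described above, with assumed module names (the actual networks in the paper are more involved):

        import torch
        import torch.nn as nn

        class TwoStepLocalizer(nn.Module):
            # Hypothetical sketch: step (i) proposes potential sound
            # sources from the image alone; step (ii) gates those
            # proposals by audio-visual similarity, keeping only the
            # objects that are actually sounding.
            def __init__(self, vision_net, audio_net):
                super().__init__()
                self.vision_net = vision_net  # image -> (heatmap, region features)
                self.audio_net = audio_net    # spectrogram -> embedding

            def forward(self, image, audio):
                heatmap, regions = self.vision_net(image)  # step (i): vision only
                a = self.audio_net(audio)                  # aural embedding
                score = torch.einsum('brd,bd->br', regions, a)
                keep = torch.sigmoid(score)                # step (ii): object selection
                return heatmap, keep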

  • Song2Face: Synthesizing Singing Facial Animation from Audio

    Shohei Iwase, Takuya Kato, Shugo Yamaguchi, Yukitaka Tsuchiya, Shigeo Morishima

    SIGGRAPH Asia 2020 Technical Communications, SA 2020    2020.12  [Refereed]

     View Summary

    We present Song2Face, a deep neural network capable of producing singing facial animation from an input of singing voice and singer label. The network architecture is built upon our insight that, although facial expression when singing varies between individuals, singing voices store valuable information such as pitch, breath, and vibrato to which expressions may be attributed. Therefore, our network consists of an encoder that extracts relevant vocal features from audio, and a regression network conditioned on a singer label that predicts control parameters for facial animation. In contrast to prior audio-driven speech animation methods, which initially map audio to text-level features, we show that vocal features can be directly learned from singing voice without any explicit constraints. Our network is capable of producing movements for all parts of the face as well as rotational movement of the head itself. Furthermore, stylistic differences in expression between singers are captured via the singer label, so the resulting animation's singing style can be manipulated at test time.

    DOI
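
    The pipeline described above, an audio encoder followed by a regression head conditioned on a singer label, might be sketched as follows. The dimensions, GRU encoder, and concatenation-based conditioning are assumptions for illustration, not the paper's specification.

        import torch
        import torch.nn as nn

        class Song2FaceSketch(nn.Module):
            # Hypothetical sketch: encode vocal features from audio
            # frames, append a learned singer embedding, and regress
            # per-frame facial/head control parameters.
            def __init__(self, n_singers, d_audio=80, d_hid=256, n_params=64):
                super().__init__()
                self.encoder = nn.GRU(d_audio, d_hid, batch_first=True)
                self.singer_emb = nn.Embedding(n_singers, d_hid)
                self.head = nn.Linear(2 * d_hid, n_params)

            def forward(self, mel, singer_id):  # mel: (batch, frames, d_audio)
                h, _ = self.encoder(mel)
                s = self.singer_emb(singer_id)               # (batch, d_hid)
                s = s.unsqueeze(1).expand(-1, h.size(1), -1)
                return self.head(torch.cat([h, s], dim=-1))  # per-frame controls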


Misc

  • Property Analysis of Adversarially Robust Representation

    Yoshihiro Fukuhara, Takahiro Itazuri, Hirokatsu Kataoka, Shigeo Morishima

    Journal of the Japan Society for Precision Engineering (Web)   87 ( 1 )  2021

    J-GLOBAL

  • Multi-instrument part transcription based on deep clustering of spectrograms and pitchgrams

    Keitaro Tanaka, Takayuki Nakatsuka, Ryo Nishikimi, Kazuyoshi Yoshii, Shigeo Morishima

    IPSJ SIG Technical Reports (Web)   2020 ( MUS-128 )  2020

    J-GLOBAL

  • LineChaser: A smartphone-based assistance system for visually impaired people to stand in line

    Masaki Kuribayashi, Seita Kayukawa, Hironobu Takagi, Chieko Asakawa, Shigeo Morishima

    Japan Society for Software Science and Technology SIG Technical Report Series (Web)   ( 91 )  2020

    J-GLOBAL

  • Efficient measurement and real-time rendering of heterogeneous subsurface scattering using separable convolution kernels

    Tatsuya Yatagawa, Yasushi Yamaguchi, Shigeo Morishima

    Visual Computing 2019 Proceedings   P26  2019.06

  • What Do Adversarially Robust Models Look At?

    Takahiro Itazuri, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima

       2019.05



Awards

  • Best Paper Award

    2020.12   Japan Society for Software Science and Technology   Line Chaser: Smart Phone Based Blind People Assistance System for Line Tracing

    Winner: Masaki Kuribayashi, Seita Kayukawa, Hironobu Takagi, Chieko Asakawa, Shigeo Morishima

  • CG Japan Award

    2020.11   The Society for Art and Science   Innovative Research and its Application of CG, CV and Music Information Processing

    Winner: Shigeo Morishima

  • Hagura Award (Forum Eight Award)

    2020.11   State of the Art Technologies Expression Association   Automatic Singing and Dancing Photorealistic Character Animation Generation from Single Snapshot

    Winner: Shigeo Morishima, Naoya Iwamoto, Takuya Kato, Takayuki Nakatsuka, Shugo Yamaguchi

  • Best Paper Award Finalist

    2019.06   IEEE CVPR 2019   SiCloPe: Silhouette-Based Clothed People

    Winner: Ryota Natsume, Shunsuke Saito, Zeng Huang, Weikai Chen, Chongyang Ma, Hao Li, Shigeo Morishima

  • Interaction 2019 Best Paper Award

    2019.03   Information Processing Society of Japan

    Winner: Seita Kayukawa, Keita Higuchi, João Guerreiro, Shigeo Morishima, Yoichi Sato, Kris Kitani, Chieko Asakawa


Research Projects

  • Privacy-preserved AI framework based on distributed anonymization

    "Realization of a Super Smart Society (Society 5.0)" area, Innovative AI technologies for sophisticated integration of cyber and physical world

    Project Year: 2019.11 - 2026.03

    Hideo Saito, Shigeo Morishima

    Authorship: Coinvestigator(s)

  • Audio-visual music understanding based on the integration of recognition and generation processes

    Grant-in-Aid for Scientific Research (B)

    Project Year: 2019.04 - 2023.03

    Kazuyoshi Yoshii, Tatsuya Kawahara, Shigeo Morishima

     View Summary

    In FY2019, as a step toward quantifying music understanding by the auditory system, we first worked on statistical automatic music transcription based on the integration of generative and recognition models. Specifically, for the chord recognition task, we formulated the process by which an acoustic feature sequence is generated from a chord sequence as a probabilistic generative model, and introduced a recognition model that solves the inverse problem, i.e., estimates the chord sequence from the acoustic feature sequence, within the framework of amortized variational inference, devising a method to optimize both jointly. This enables semi-supervised learning that also exploits acoustic signals without chord labels, and can be seen as analogous to how a human listener recognizing chords simultaneously imagines what those chords would sound like and unconsciously checks that imagined sound against the original music. We also studied the symbolic side of music: for tasks such as piano fingering estimation and melody style transfer, we devised statistical frameworks that introduce fingering models or score models as prior distributions to obtain physically or musically plausible estimates. Toward quantifying speech understanding, we developed a speech enhancement method that uses a deep generative model of speech spectra as a prior distribution, and also devised an accurate and fast blind source separation technique, approaching the quantification of sound understanding from both the source-model and spatial-model sides. Meanwhile, as a first step toward quantifying dance video understanding by the visual system, we began work on human pose estimation in images, and realized the generation of high-quality, natural performance videos matched to input instrument sounds: by mediating between domains through human pose features, end-to-end learning that maps between sound and human video became possible.
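
    The joint optimization of generative and recognition models described above has the general shape of a semi-supervised amortized variational inference objective. A generic form (illustrative only, not the project's exact formulation), with X an acoustic feature sequence and Z a chord sequence, is:

        \mathcal{L}(\theta, \phi) =
            \sum_{(X, Z) \in \mathcal{D}_{\mathrm{lab}}} \log p_\theta(X, Z)
          + \sum_{X \in \mathcal{D}_{\mathrm{unlab}}}
            \mathbb{E}_{q_\phi(Z \mid X)}\!\left[ \log \frac{p_\theta(X, Z)}{q_\phi(Z \mid X)} \right]

    where p_\theta is the generative model (chords to acoustic features) and q_\phi is the amortized recognition model (acoustic features to chords); the second term is an evidence lower bound that lets unlabeled audio contribute to training.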

  • Analysis of Reality Distorted Spatio-Temporal Field to Improve Human Skill and Motivation

    Grant-in-Aid for Scientific Research (A)

    Project Year: 2019.04 - 2022.03

  • Next-generation speech translation research

    Grant-in-Aid for Scientific Research (S)

    Project Year: 2017.05 - 2022.03

  • Building Foundations and Developing Applications for Next-Generation Media Content Ecosystem Technologies

    ACCEL

    Project Year: 2016.04 - 2021.03

    Masataka Goto, Shigeo Morishima, Kazuyoshi Yoshii, Satoshi Nakamura

    Authorship: Coinvestigator(s)


 


Teaching Experience

  • Advanced Image Information Processing Engineering

    Waseda University

  • Digital Signal Processing

    Waseda University

  • Circuit Theory

    Waseda University, Seikei University

 

Committee Memberships

  • 2020.04 - Now

    ACM VRST 2021  Sponsorship Chair

  • 2018.04 - Now

    JST CREST  Advisor

  • 2016.05 - 2020.05

    Institute of Image Electronics Engineers of Japan, Visual Computing Technical Committee  Chair

  • 2018.12 - 2019.12

    ACM VRST 2019  Program Chair

  • 2018.12

    ACM SIGGRAPH ASIA 2018  VR/AR Advisor
