研究者詳細 - 尾形　哲也

写真a

オガタ　テツヤ

尾形　哲也

Scopus 論文情報

論文数: 387 Citation: 4977 h-index: 31

Click to view the Scopus page. The data was downloaded from Scopus API in October 12, 2025, via http://api.elsevier.com and http://www.scopus.com .

Google Scholar 情報（Citations per year）

Citation: 9135 h-index: 43 i10-index: 200

Click to view the Google Scholar page.

Scopus 情報

News & Topics

2023.06.05

令和5年度科学技術分野の文部科学大臣表彰　受賞コメント

2023.04.07

令和5年度科学技術分野の文部科学大臣表彰 5名の教員が受賞

2022.05.26

最先端の研究を発表「次代の中核研究者2021」

2022.04.15

深層予測学習型ロボット制御技術開発

2022.04.13

ロボット・テクノロジーから医療への挑戦：日医大・早大研究者交流会を開催

所属

理工学術院基幹理工学部

職名

教授

学位

博士(工学) ( 2000年03月早稲田大学 )

メールアドレス

ホームページ

https://ogata-lab.jp/ja/member_ja/ogata_ja.html

プロフィール

1993年早稲田大学理工学部機械工学科卒業、1997年日本学術振興会特別研究員（DC2）、1999年早稲田大学理工学部助手、2001年理化学研究所脳科学総合研究センター研究員、2003年京都大学大学院情報学研究科講師、2005年同准教授を経て、2012年より早稲田大学理工学術院基幹理工学部表現工学科教授。博士（工学）。2009年-2015年JSTさきがけ領域研究員。また2017年より産業総合技術研究所人工知能研究センター特定フェロー。2013年から2014年日本ロボット学会理事。2016年から2018年人工知能学会理事などを歴任。2020年より早稲田大学次世代ロボット研究機構AIロボット研究所所長。2024年より国立情報学研究所大規模言語モデル研究開発センター客員教授。2025年よりAIロボット協会理事長。2025年よりJST CREST「実環境知能システム」領域研究総括。深層学習、生成AIに代表される神経回路モデルとロボットシステムを用いた，認知ロボティクス研究，特に予測学習，模倣学習，マルチモーダル統合，言語学習，コミュニケーションなどの研究に従事．2021年IEEE ICRA2021 Best Paper Award In Cognitive Science、2023年文部科学大臣表彰科学技術賞（研究部門）などを受賞。

経歴

2024年10月

-

継続中

国立情報学研究所大規模言語モデル研究開発センター客員教授
2017年10月

-

継続中

国立研究開発法人産業技術総合研究所人工知能研究センター特定フェロー
2012年04月

-

継続中

早稲田大学理工学術院基幹理工学部表現工学科教授
2009年10月

-

2015年03月

国立研究開発法人科学技術振興機構さきがけ領域研究員
2005年06月

-

2012年03月

京都大学大学院情報学研究科知能情報学専攻准教授
2003年10月

-

2005年05月

京都大学大学院情報学研究科知能情報学専攻講師
2001年04月

-

2003年09月

特定国立研究開発法人理化学研究所脳科学総合研究センター研究員
1999年04月

-

2001年03月

早稲田大学理工学部機械工学科助手
1997年04月

-

1999年03月

独立行政法人日本学術振興会特別研究員 (DC2)

▼全件表示

学歴

1995年04月

-

1998年03月

早稲田大学大学院理工学研究科機械工学専攻

博士後期課程
1993年04月

-

1995年03月

早稲田大学大学院理工学研究科機械工学専攻

修士課程
1989年04月

-

1993年03月

早稲田大学理工学部機械工学科

委員歴

2017年04月

-

2025年07月

日本ディープラーニング協会理事
2018年04月

-

2020年03月

計測自動制御学会理事
2016年07月

-

2018年06月

人工知能学会理事
2018年04月

-

　

一般社団法人G1ディープラーニング研究会アドバイザリー・ボードメンバー
2013年01月

-

2014年12月

日本ロボット学会理事
2004年01月

-

　

Robotics Society of Japan Associate Editor of Advanced Robotics
2004年01月

-

　

日本ロボット学会欧文誌委員

▼全件表示

所属学協会

　

-

継続中

日本ディープラーニング協会
　

-

継続中

日本発達神経科学会
　

-

継続中

IEEE
　

-

継続中

人工知能学会
　

-

継続中

計測自動制御学会（フェロー）
　

-

継続中

バイオメカニズム学会
　

-

継続中

日本機械学会（フェロー）
　

-

継続中

日本ロボット学会（フェロー）
　

-

継続中

情報処理学会

▼全件表示

研究分野

ロボティクス、知能機械システム / 知能ロボティクス

研究キーワード

深層予測学習
認知ロボティクス

受賞

論文賞

2024年12月 FA財団 Learning-based Collision-free Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

受賞者： Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata
Best Paper Award

2024年09月 Advanced Robotics, The Robotics Society of Japan Learning-based Collision-free Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

受賞者： Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata
Best paper award Nomination Finalist

2024年01月 IEEE/SICE International Symposium on System Integration (SII 2024) Real-time Motion Generation and Data Augmentation for Grasping Moving Objects with Dynamic Speed and Position Changes

受賞者： Kenjiro Yamamoto, Hiroshi Ito, Hideyuki Ichiwara, Hiroki Mori, Tetsuya Ogata
Frontiers of Science Awards

2023年07月 The International Congress for Basic Science How to select and use tools?: Active perception of target objects using multimodal deep learning

受賞者： Namiko Saito, Tetsuya Ogata, Satoshi Funabashi, Hiroki Mori, Shigeki Sugano
文部科学大臣表彰科学技術賞（研究部門）

2023年04月文部科学省深層予測学習によるロボットのマルチタスク学習に関する研究

受賞者：尾形哲也
フェロー

2023年02月日本機械学会

受賞者：尾形哲也
SI2022優秀講演賞

2022年12月計測自動制御学会システムインテグレーション部門 VRデバイスに発生するノイズのCAEを用いた実時間フィルタリング

受賞者：橋本直樹, 陽品駒, 尾形哲也
フェロー

2022年09月計測自動制御学会

受賞者：尾形哲也
フェロー

2022年09月日本ロボット学会

受賞者：尾形哲也
学術業績賞

2022年06月日本機械学会ロボティクス・メカトロニクス部門

受賞者：尾形哲也
Best Paper Award Finalist

2022年01月 IEEE/SICE International Symposium on System Integration (SII 2022) Sensory-Motor Learning for Simultaneous Control of Motion and Force: Generating Rubbing Motion against Uneven Object

受賞者： Hiroshi Ito, Takumi Kurata, and Tetsuya Ogata
Best paper award

2022年01月 IEEE/SICE International Symposium on System Integration (SII 2022) Buttoning Task with a Dual-Arm Robot: An Exploratory Study on a Marker-based Algorithmic Method and Marker-less Machine Learning Methods

受賞者： Wakana Fujii, Kanata Suzuki, Tomoki Ando, Ai Tateishi, Hiroki Mori, Tetsuya Ogata
Best RoboCup Paper Award Nomination Finalist

2021年09月 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2021) Flexible Object Manipulation for Humanoid Robot Using Partially Binarized Auto-Encoder on FPGA

受賞者： Satoshi Ohara, Tetsuya Ogata, and Hiromitsu Awano
Best paper award on Cognitive Robotics

2021年06月 IEEE International Conference on Robotics and Automation (ICRA 2021) How to select and use tools? : Active Perception of Target Objects Using Multimodal Deep Learning

受賞者： Namiko Saito, Tetsuya Ogata, Satoshi Funabashi, Hiroki Mori, Shigeki Sugano
SI2020優秀講演賞

2020年12月計測自動制御学会SI部門データ変換に着目したミドルウェアモデルにおけるデバイス類の位置の扱いに関する議論

受賞者：菅佑樹, 森裕紀, 尾形哲也
全国大会優秀賞

2020年07月人工知能学会過去から未来までの文脈を考慮した神経回路モデルによるロボットの目標に基づいた柔軟な行動生成

受賞者：佐藤琢, 村田真悟, 出井勇人, 尾形哲也
全国大会優秀賞

2020年07月人工知能学会未知語に対応可能な言語と動作の統合表現獲得モデル

受賞者：豊田みのり, 森裕紀, 鈴木彼方, 林良彦, 尾形哲也
論文賞

2019年12月 FA財団 Dynamic motion learning for multi-DOF flexible-joint robots using active-passive motor babbling through deep learning

受賞者： Kuniyuki Takahashi, Tetsuya Ogata, Gordon Cheng, Shigeki Sugano
Best paper award

2019年09月日本ロボット学会欧文誌 Dynamic Motion Learning for Multi-DOF Flexible-Joint Robots Using Active-Passive Motor Babbling through Deep Learning

受賞者： Kuniyuki Takahashi, Tetsuya Ogata, Jun Nakanishi, Gordon Cheng, Shigeki Sugano
ROBOMECH表彰（産業・応用分野）

2019年06月日本機械学会ロボティクスメカトロニクス部門深層学習を用いた要素動作の統合手法の開発

受賞者：伊藤洋, 山本健次郎, 尾形哲也
IBM Academic Awards

2017年07月 IBM Achieving Robot-Behavior Adaptability Utilizing Deep Learning Model

受賞者： Tetsuya Ogata
Best paper award

2016年09月 International conference on artificial neural networks (ICANN2016) Dynamical Linking of Positive and Negative Sentences to Goal-oriented Robot Behavior by Hierarchical RNN

受賞者： Tatsuro Yamada, Shingo Murata, Hiroaki Arie, and Tetsuya Ogata
ティーチングアワード総長賞

2015年03月早稲田大学インタラクティブセンシング

受賞者：尾形哲也, 橋田朋子
Best paper award (Robotics)

2011年12月 IEEE/SICE International Symposium on System Integration (SII 2011) Exploring Movable Space using Rhythmical Active Touch in Disordered Obstacle Environment

受賞者： Kenri KODAKA, Tetsuya OGATA, Hirotaka OHTA, Shigeki SUGANO
NFT Award for Entertainment Robots and Systems

2010年10月 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2010) Robot Musical Accompaniment: Integrating Audio and Visual Cues for Real-time Synchronization with a Human Flutist

受賞者： Angelica Lim, Takeshi MIZUMOTO, Lois-Kenzo Cahier, Takuma OTSUKA, Toru TAKAHASHI, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO
Best Paper Award

2010年06月 International Conference on Industrial, Engineering and Other Applications of Applied Intelligence Systems (IEA/AIE-2010) Music-ensemble robot that is capable of playing the theremin while listening to the accompanied music

受賞者： Takuma OTSUKA, Takeshi MIZUMOTO, Kazuhiro NAKADAI, Toru TAKAHASHI, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO
NFT award on “Entertainment Robots and Systems” nomination finalist

2008年09月 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2008) A Robot Listens to Music and Counts Its Beats Aloud by Separating Music from Counting Voice

受賞者： Takeshi MIZUMOTO, Ryu TAKEDA, Kazuyoshi YOSHII, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO
財団設立10周年記念・特別研究助成

2006年02月栢森情報科学振興財団能動的知覚に基づくロボットの物体の動的認識

受賞者：尾形哲也
Best paper award

2005年06月 International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-2005) Distance Based Dynamic Interaction of Humanoid Robot with Multiple People

受賞者： Tsuyoshi TASAKI, Shohei MATSUMOTO, Hayato OHBA, Mitsuhiko TODA, Kazunori KOMATANI, Tetsuya OGATA, and Hiroshi G. OKUNO
学会論文賞

2000年03月日本機械学会情動モデルを有する自律ロボットWAMOEBA-2と人間との情緒交流

受賞者：尾形哲也, 菅野重樹

▼全件表示

メディア報道

日本でAI・ロボット人材が326万人も不足する理由…まったく文化が異なる両分野の融合への取り組み

インターネットメディア

執筆者：本人以外

BUSINESS JOURNAL

2025年08月
『日本のAIロボティクスは復活できるか』日本ロボット協会理事長の尾形先生が語る、世界との差と日本の国家戦略

インターネットメディア

執筆者：本人以外

柏原迅 | AIマーケティング戦略

2025年08月
ロボットAIの第一人者、尾形教授と語る「フィジカルAI」世界競争の最前線と日本の戦略 VCファーストライトが開催

インターネットメディア

執筆者：本人以外

ロボスタ

2025年08月
物流ロボ焦点は｢足｣から｢手｣に、育成視点が不可欠

インターネットメディア

執筆者：本人以外

LOGISTICS TODAY

2025年07月
ヒト型ロボットのパイオニア＝日本、フィジカルAI開発で世界に勝つ…産学連携でデータセット開発

インターネットメディア

執筆者：本人以外

BUSINESS JOURNAL

2025年07月
変革促す自律型AI 「攻守一体」で実装を, NIKKEI生成AIシンポジウム

インターネットメディア

執筆者：本人以外

NIKKEI BizGate

2025年07月
人型ロボ、AIで進化開発は歩行から作業スキルへ

インターネットメディア

執筆者：本人以外

日経ビジネス電子版

2025年07月
INTERVIEW｜AIロボット協会ロボットと人の相互作用を促し自律的に動く知能ロボティクス尾形哲也氏

執筆者：本人以外

日本経済新聞出版日経ムック AIで加速するエマージングテクノロジー

2025年06月
日本の人型AIロボットは世界で勝てるのか？ロボット開発の現在地と未来への提言【石黒浩×北野宏明×島田太郎×尾形哲也×井原慶子】

インターネットメディア

執筆者：本人以外

GLOBIS学び放題X知見録

2025年06月
AIロボットはビジネスをどう変えるか

新聞・雑誌

執筆者：本人以外

Best Partner

2025年06月
テクノロジー／AIロボット

テレビ・ラジオ番組

LucyFM茨城放送ダイバーシティニュース

2025年05月
AIロボットの動向と展望

インターネットメディア

執筆者：本人以外

InfoComニューズレター

2025年05月
AIロボットの開発促進に向けて指導 25年度は基盤モデルなどを開発

新聞・雑誌

執筆者：本人以外

電子デバイス産業新聞

9面

2025年05月
#184 ロボット(後編)

テレビ・ラジオ番組

TOKYO MX なんて美だ！

2025年05月
転換期を迎えたロボット・フィジカルAI開発、AIとシミュレーションがもたらす変化

インターネットメディア

執筆者：本人以外

Seizo Trend

2025年05月
#183 ロボット(前編)

テレビ・ラジオ番組

TOKYO MX なんて美だ！

2025年04月
AI×ロボットの基盤モデル構築へ AIロボット協会

新聞・雑誌

日刊工業新聞

2025年04月
国産AIロボット開発最前線, おはよう日本

テレビ・ラジオ番組

NHK総合

2025年04月
ロボット強国走り続ける中国 AIも融合少子化での人手不足補う

新聞・雑誌

執筆者：本人以外

朝日新聞

2025年04月
人型ロボのハーフ完走「すごい」のはなぜ？識者が語る期待と限界

インターネットメディア

執筆者：本人以外

朝日新聞電子版

2025年04月
“1人に1台 AIロボット” 開発最前線

インターネットメディア

執筆者：本人以外

NHK NEWS WEB

2025年04月
主役はハードからAIに、開発競争は歩行から作業スキルに移行

新聞・雑誌

日経エレクトロニクス

2025年04月
オールジャパンでロボAI開発へ、AIロボット協会「AIRoA」発足、メカ偏重への危機感後押し

新聞・雑誌

執筆者：本人以外

日経ロボティクス

2025年04月
汎用ロボットの実用化目指し産学官で「AIロボット協会」発足トヨタや日産も参加

インターネットメディア

執筆者：本人以外

日刊自動車新聞電子版

2025年04月
トヨタら日本勢が狙うAIロボの“Linux”、大規模基盤モデル構築へ

新聞・雑誌

執筆者：本人以外

日経クロステック

2025年04月
トヨタ･KDDI参加「AIロボット協会」､基盤整え開発加速

新聞・雑誌

執筆者：本人以外

日本経済新聞

2025年03月
AIとロボットの融合の促進へ推進団体が来年度から基盤づくりを本格化

インターネットメディア

執筆者：本人以外

電波新聞デジタル

2025年03月
【現地取材】物流現場などでロボットを柔軟に動かせるAI開発目指す

インターネットメディア

執筆者：本人以外

LOGI-BIZ Online

2025年03月
AIロボット協会が発足トヨタや日産も参画高度な汎用タイプ実現へ

インターネットメディア

執筆者：本人以外

日刊自動車新聞電子版

2025年03月
ロボットでAIを育てる/25年度にモデル公開/AIロボット協会が設立会見

インターネットメディア

執筆者：本人以外

建設通信新聞

2025年03月
日本の汎用ロボット開発の起爆剤となるか、基盤モデル構築目指すAIRoAが発足

インターネットメディア

執筆者：本人以外

MONOist

2025年03月
New company-university organization launched in Japan to develop ‘AI robots’

インターネットメディア

執筆者：本人以外

NHK WORLD Japan

2025年03月
「日本はロボット経験値が豊富」 AIロボットの開発・実装へ、トヨタ・KDDI・NECら参画の「AIRoA」が本格始動

インターネットメディア

BUSINESS NETWORK

2025年03月
米中で覇権争い AIロボット日本の勝ち筋は? ワールドビジネスサテライト

テレビ・ラジオ番組

執筆者：本人以外

WBS

2025年03月
“人間のような動き”AIロボット開発へ国内で新たな団体設立

インターネットメディア

執筆者：本人以外

NHK Web

2025年03月
ロボット動かすAI、トヨタや早大など開発 25年公開へ

新聞・雑誌

執筆者：本人以外

日本経済新聞

2025年03月
トヨタほか24社が汎用ロボット実現に挑む「AIロボット協会」が活動開始

インターネットメディア

執筆者：本人以外

PC Watch

2025年03月
人型ロボブームは本物か、識者に聞いた5つの疑問

新聞・雑誌

日経クロステック

2025年03月
第12回日経星新一賞 100年後の未来へ想像力が希望つなぐ

新聞・雑誌

執筆者：本人以外

日本経済新聞

24面

2025年03月
「風呂はロボットが掃除してくれる」未来予測反省会

テレビ・ラジオ番組

執筆者：本人

NHK BS

2025年03月
一般社団法人AIロボット協会（AIRoA）AI×ロボット分野で、ロボットデータエコシステム構築を目指し活動を開始

インターネットメディア

執筆者：本人以外

PR TIMES

2025年03月
第12回日経「星新一賞」、グランプリに吉野玄冬氏

新聞・雑誌

執筆者：本人以外

日本経済新聞

2025年02月
プレスリリース［岡山大学］次世代AIと医学の可能性シンポジウム～フィジカルインテリジェンス～を開催

インターネットメディア

PR TIMES

2025年02月
MIT学長に聞く世界トップの大学に求められること

テレビ・ラジオ番組

執筆者：本人以外

NHK総合おはよう日本

2024年12月
生成AIで「ロボット」はここまで進化した、早大尾形哲也教授が語る「2050年の世界」

インターネットメディア

ビジネス+IT

2024年11月
ロボットに指示・意図どう伝える？…インターフェース最適化へ新手法探る

新聞・雑誌

日刊工業新聞

ニューススイッチ

2024年11月
人型ロボットが自動車工場や物流倉庫へ、テスラは25年から数千台配備

インターネットメディア

執筆者：本人以外

日経クロステック

2024年11月
日立製作所、フロントラインワーカー（現場作業員）の業務を革新するAIロボットの研究・開発を紹介

インターネットメディア

執筆者：本人以外

CEATECニュース

2024年10月
【ゲスト: 早稲田大学教授尾形哲也】AI/ロボットが得意なこと・苦手なこととは？そして人間が人間に求め続けたいものとは？

インターネットメディア

井桁弘恵の#みみしゃか!!, 聴くテレ朝

2024年09月
約1720億パラメータ（GPT-3級）の大規模言語モデルのフルスクラッチ学習を行い、プレビュー版「LLM-jp-3 172B beta1」を公開～学習データを含めすべてオープンにしたモデルとしては世界最大〜

インターネットメディア

執筆者：本人以外

国立情報学研究所

2024年09月
【トクイテン×早稲田大学尾形哲也教授】AIロボット×農業オンライン対談イベント

インターネットメディア

執筆者：本人以外

PR TIMES

2024年09月
【ゲスト: 早稲田大学教授尾形哲也】「AIロボットを一人一台持つ時代」が来たら、人間は仕事や家事から解放される！？

インターネットメディア

井桁弘恵の#みみしゃか!!, 聴くテレ朝

2024年09月
日本ソーシャル・イノベーション学会第6回年次大会を開催

インターネットメディア

執筆者：本人以外

BIGLOBEニュース

2024年09月
E2E自動運転､テスラのみこむ中国震源は｢上海AI実験室｣

インターネットメディア

執筆者：本人以外

NIKKEI Mobility

2024年07月
共に進化するAIとロボット尾形哲也（早稲田大学 AIロボット研究所）

新聞・雑誌

執筆者：本人以外

婦人之友2024年8月号

アイデアのたね

2024年07月
第12回日経星新一賞

新聞・雑誌

執筆者：本人以外

日本経済新聞

22面

2024年07月
人工知能学会、中高生向けの無料AIセミナーを5都市で開催-トップ研究者がAIや人工知能の最前線を紹介

インターネットメディア

執筆者：本人以外

こどもとIT

2024年06月
ロボットは大規模基盤モデルでどう変わる？まだまだ「賢くなる」、最新研究の数々

インターネットメディア

執筆者：本人以外

Seizo Trend

2024年05月
尾形哲也先生が期待する「ロボット製作から世界を広げていく方法」

インターネットメディア

執筆者：本人以外

ヒューマンアカデミーこども教育総合研究所

2024年05月
【ICRA2024】大規模基盤モデルとロボットの連携による新たな可能性

インターネットメディア

執筆者：本人以外

モリカトロンAIラボ

2024年05月
進化するロボット、家事や医療も器用に「ICRA」3選

インターネットメディア

執筆者：本人以外

NIKKEI Tech Foresight

2024年05月
人型ロボットにAI のワザ「頼れる機械」を社会に尾形哲也（早稲田大学）

新聞・雑誌

日経サイエンス

2024年04月
早大教授「ロボット研究の未来、身体的なデータが重要」, 直談専門家に問う

新聞・雑誌

執筆者：本人

日経産業新聞

2024年03月
Google、生成AIをロボットの頭脳に話しかけて操作

新聞・雑誌

執筆者：本人以外

日本経済新聞

2024年03月
最新技術多角的に問う

新聞・雑誌

執筆者：本人以外

毎日新聞

2024年03月
AIの未来を展望する企画展「AIのアイ～AIが見る世界、AIと創る世界～」をSKIPシティ映像ミュージアムで2024年1月16日から開催

インターネットメディア

執筆者：本人以外

ロボスタ

2024年01月
エクサウィザーズ、「AIの適切性に関する有識者委員会」を設立

インターネットメディア

執筆者：本人以外

日本経済新聞電子版

2023年12月
ロボット作業で重要な「力感覚」…AI学習で試行錯誤、日本リードも海外猛追

新聞・雑誌

執筆者：本人以外

日刊工業新聞

2023年12月
ロボットVTuber「ハツキ」を国際ロボット展(iREX)で披露-早大の尾形研究室で開発-アニメ文化とヒューマノイドの融合、山洋電気が稼働展示

インターネットメディア

執筆者：本人以外

ロボスタ

2023年12月
2023国際ロボット展／早大、ジャケットをハンガーにかけるロボ模倣学習で技術

新聞・雑誌

執筆者：本人以外

日刊工業新聞

2023年11月
「国際ロボット展」コンシェルジュ業・料理・掃除・介護も…最新技術お披露目

テレビ・ラジオ番組

執筆者：本人以外

FNN FNNプライムオンライン

2023年11月
ロボット展に過去最多出展半導体めぐる国際競争激化

テレビ・ラジオ番組

執筆者：本人以外

TBS TBS NEWS DIG

2023年11月
神への挑戦―人知の向かう先は-五感で自ら判断するロボット AIは人間に近づくのか

新聞・雑誌

執筆者：本人以外

毎日新聞

2023年10月
マルチモーダルAIでロボットが飛躍的に進化、実験も自動化へ【美容業界における生成AIのインパクトを考える（５）】

インターネットメディア

執筆者：本人以外

BeautyTech.jp

2023年10月
人間のように「見たモノを”崩しそう、つぶしそう”と想像する力」をAIが獲得物体間に働く力を想起する能力産総研

インターネットメディア

執筆者：本人以外

ロボスタ

2023年09月
LLMとロボットが奏でる未来

会誌・広報誌

執筆者：本人以外

NII Today, 第100号

2023年09月
エクサウィザーズ、JAXAと研究開発したAIロボットシステムが柔軟物のファスナーの開閉作業を実現精度は100％

インターネットメディア

ロボスタ

2023年08月
JAXAと開発AIロボ、ファスナー開閉エクサウィザーズ

新聞・雑誌

執筆者：本人以外

日本経済新聞

2023年08月
エクサウィザーズとJAXA、柔軟物のファスナー開閉が可能なAIロボットを開発

インターネットメディア

執筆者：本人以外

マイナビニュース

2023年08月
エクサウィザーズ、曲線のファスナーも自動開閉できるAIロボットシステムをJAXAと開発

インターネットメディア

執筆者：本人以外

IoTNEWS

2023年08月
JAXAと研究開発したAIロボットシステムが100％の精度で曲線も含む柔軟物のファスナーの開閉作業を実現

インターネットメディア

執筆者：本人以外

PR TIMES

2023年08月
한국로봇학회, UR 학술대회 성료

インターネットメディア

執筆者：本人以外

Robot Media

2023年07月
专访AI知名学者早稻田大学教授尾形哲也：“无所不能”的ChatGPT，却办不到这件事

インターネットメディア

執筆者：本人以外

东方新话

2023年07月
日本早稻田大学教授尾形哲也：智能机器人“有效”比“像人”更重要

新聞・雑誌

執筆者：本人以外

中国新闻网

2023年07月
AIとヒトの未来

会誌・広報誌

執筆者：本人以外

早稲田ウィークリー

2023年06月
AIロボが人を追い越す日カギは触覚「10年で人並みに」

新聞・雑誌

執筆者：本人以外

日本経済新聞

2023年06月
シューイチプレミアム

テレビ・ラジオ番組

日本テレビ

2023年05月
Japanese robotics lags as AI captures global attention

新聞・雑誌

執筆者：本人以外

Nature

DOI: 10.1038/d41586-023-00668-z

2023年03月
ディープラーニング、人間と共存するロボットを作る: 早稲田大学尾形哲也教授インタビュー（本文は韓国語）

新聞・雑誌

執筆者：本人以外

MIT Technology Review（韓国版）

2023年02月
AIとロボットの共進化がもたらす未来

インターネットメディア

執筆者：本人以外

Deloitte AI Institute ブログ

2023年01月
EFFICIENT MULTITASK LEARNING POSSIBLE WITH A PREDICTIVE MODEL FOR DOOR OPENING AND ENTRY

インターネットメディア

執筆者：本人以外

SERVO MAGAZINE

2022 ISSUE-2

2023年01月
人型ロボ細やかな動作可能に

新聞・雑誌

執筆者：本人以外

読売新聞

みんなのカガク知のリレー（5面）

2022年12月
早大、触覚ハンド“握って”深層学習４本指にセンサー384個、物の持ち方最適化

新聞・雑誌

執筆者：本人以外

日刊工業新聞

2022年09月
Tetsuya Ogata “Cognitive Robotics”

インターネットメディア

IEEE Soft Robotics Podcast

2022年07月
大量の実画像データの収集が不要なAIを開発

その他

執筆者：本人以外

産業技術総合研究所

2022年06月
日立 × 早稲田の共同研究グループ。ロボットの探究が好きでたまらない4人の研究ストーリー

インターネットメディア

執筆者：本人以外

Qiita Zine

タイアップ

2022年06月
「意識」には質的な差

新聞・雑誌

執筆者：本人以外

宮崎日日新聞

奏論

2022年06月
AIに自己意思は困難

新聞・雑誌

執筆者：本人以外

山形新聞

奏論

2022年06月
学習で人間的要素を

新聞・雑誌

執筆者：本人以外

岐阜新聞

争論

2022年06月
機械学習で人間的な要素を

新聞・雑誌

執筆者：本人以外

京都新聞

ニュースを読み解く

2022年06月
人間に近い意識格段に困難

新聞・雑誌

執筆者：本人以外

中国新聞朝刊

交論高性能ロボット

2022年06月
機械学習で人間的に

新聞・雑誌

執筆者：本人以外

秋田さきがけ

奏論

2022年06月
「倫理基準」の確立必要

新聞・雑誌

執筆者：本人以外

新潟日報

争論

2022年06月
自然な動き習得促進意思持つ人工知能困難,

新聞・雑誌

執筆者：本人以外

茨城新聞

奏論

2022年06月
機械に意識持たせるのは困難

新聞・雑誌

執筆者：本人以外

神戸新聞

奏論

2022年06月
機械学習で人間的に

新聞・雑誌

執筆者：本人以外

岩手日報

奏論

2022年05月
第3回全校高等専門学校ディープラーニングコンテスト（審査結果）

新聞・雑誌

執筆者：本人以外

日本経済新聞

2022年05月
早稲田大教授尾形哲也氏機械学習で人間的要素意識持たせるのは困難

新聞・雑誌

執筆者：本人以外

山陰中央新報

2022年05月
「サイエンス探求AIロボットプラットフォーム」とは-ムーンショット3が目指す柔軟な知能を持ったロボット

インターネットメディア

執筆者：本人以外

ロボスタ

2022年05月
かのうちあやこの「ＮＥＸＴＥＣＨ」レポート

インターネットメディア

執筆者：本人以外

eWARRANT JOURNAL

2022年05月
ＡＩキティも作った…「パートナーロボット」を夢見る日本

インターネットメディア

執筆者：本人以外

韓国中央日報

コラム

2022年05月
スクランブルエッグ調理でロボの学び方解釈早大が新技術

新聞・雑誌

執筆者：本人以外

日刊工業新聞

2022年04月
Cool or Creepy? Video Shows AI Robot Taught How to Open Doors

インターネットメディア

執筆者：本人以外

Newsweek

2022年04月
早大、予測と現実の差を埋めるよう柔軟に行動できるロボット制御技術を開発

インターネットメディア

執筆者：本人以外

マイナビ TECH+

2022年04月
人間のように状況判断しながら動くロボットの制御技術あらわる！

インターネットメディア

執筆者：本人以外

bp-Affairs

2022年04月
高専DCON2022本戦

新聞・雑誌

執筆者：本人以外

日本経済新聞朝刊

2022年04月
早大と日立、作業内容や環境が変化しても行動をリアルタイムに決定・実行可能な深層予測学習型のロボット制御技術を開発

インターネットメディア

執筆者：本人以外

日本経済新聞

2022年04月
Robot learns to open doors by splitting the task into three easy steps

インターネットメディア

執筆者：本人以外

NewScientist

2022年04月
Japanese companies develop sophisticated robots built for companionship

インターネットメディア

執筆者：本人以外

The Japan News, Asia News Network

2022年03月
ロボもっと愛(AI)らしく

新聞・雑誌

執筆者：本人以外

読売新聞朝刊

2022年03月
Personal AI-based robots as lifetime human companions

その他

執筆者：本人

Science” webinar

2022年03月
深層予測学習と実ロボットによる身体知の実現

新聞・雑誌

執筆者：本人

日刊工業新聞

2022年03月
AIとロボットの共進化とは？研究の最前線に触れ、語り合うAI活用の未来

インターネットメディア

執筆者：本人以外

DL for DX

2022年02月
NVIDIA Partner Solution Connect 開催！

インターネットメディア

執筆者：本人以外

PR TIMES

2022年01月
等身大の自律人型フィギュアVTuberに注目現実側でもロボットとして稼働できる！

インターネットメディア

MoguLive

2022年01月
Can Elon Musk and Tesla really build a humanoid robot in 2022?

インターネットメディア

執筆者：本人以外

NewScientist

2021年12月
面倒な家事、ロボにお任せ

新聞・雑誌

執筆者：本人以外

日経産業新聞

マンスリー編集特集

2021年12月
「深層予測学習」でロボットを制御する, FUTURE STORY

会誌・広報誌

執筆者：本人以外

三機工業株式会社 Harmony

2021年11月
人工知能(AI)・ロボット活用による自動化, ロボ化で製造現場を変革

新聞・雑誌

執筆者：本人以外

日刊工業新聞朝刊

2021年10月
大学の勉強ってこんなにおもしろい！ vol. 128, AIロボットのゲンバ, 自ら判断・行動するロボットが家事や介護を担う未来

新聞・雑誌

執筆者：本人以外

株式会社四谷大塚 Dream Navi

2021年09月
ロボットの知能化を実現する「エクスペリエンス・ベースド・ロボティクス」とは

インターネットメディア

執筆者：本人以外

MONOist

https://monoist.atmarkit.co.jp/mn/articles/2107/09/news056.html

2021年07月
作業内容に合わせて操作法を変更して実行する深層学習型ロボット制御技術

インターネットメディア

執筆者：本人以外

わかる科学, つくばサイエンスニュース

http://www.tsukuba-sci.com/?column02=%e4%bd%9c%e6%a5%ad%e5%86%85%e5%ae%b9%e3%81%ab%e5%90%88%e3%82%8f%e3%81%9b%e3%81%a6%e6%93%8d%e4%bd%9c%e6%b3%95%e3%82%92%e5%a4%89%e6%9b%b4%e3%81%97%e3%81%a6%e5%ae%9f%e8%a1%8c%e3%81%99%e3%82%8b-%e6%b7%b1

2021年07月
～興味や知識レベルに応じて自由に選択・段階的に学べる～ JMOOC提供『AI活用人材育成講座』全8講座オンライン講座「gacco（R）（ガッコ）」にて6月30日開講

インターネットメディア

執筆者：本人以外

Dream News

https://www.dreamnews.jp/press/0000239399/

2021年06月
「癒やし系」の家庭用ロボ続々巣ごもりの話し相手にも

新聞・雑誌

執筆者：本人以外

朝日新聞デジタル

https://www.asahi.com/articles/ASP6H3CL9P69ULFA02Q.html

2021年06月
IEEE ICRA 2021 Awards (with videos and papers)

インターネットメディア

執筆者：本人以外

Robohub

https://robohub.org/ieee-icra-2021-awards-with-videos-and-papers/

2021年06月
自ら学習ロボ１台で家事早稲田大学AIロボット研究所所長尾形哲也さん, リレーおぴにおんソロで行こう7

新聞・雑誌

執筆者：本人以外

朝日新聞朝刊

https://www.asahi.com/articles/DA3S14931800.html

2021年06月
早大、知らない言葉でもデータから類推して作業できるロボット制御法を開発

インターネットメディア

執筆者：本人以外

マイナビニュース

https://news.mynavi.jp/article/20210531-1897267/

2021年05月
家事全般担う「相棒」開発早稲田大AIロボット研究所

新聞・雑誌

執筆者：本人以外

日本経済新聞朝刊

2021年05月
An artificial neural network to acquire grounded representations of robot actions and language

インターネットメディア

執筆者：本人以外

Tech Xplore

https://techxplore.com/news/2021-05-artificial-neural-network-grounded-representations.html

2021年05月
評価額は6億円｜1位は福井高専のエッジAIによる老朽化診断ツール｜DCON2021速報

インターネットメディア

執筆者：本人以外

AINOW

https://ainow.ai/2021/04/17/254647/

2021年04月
Robot learns to tie knots using only two fingers on each hand

インターネットメディア

執筆者：本人以外

NewScientist

2021年03月
神経回路モデル搭載ロボットで、ASDの認知行動異常を解明－早大ほか

インターネットメディア

執筆者：本人以外

医療NEWS

2020年08月
早大と国立精神・神経医療研究センター、神経発達障害の認知行動異常のメカニズムを解明

新聞・雑誌

執筆者：本人以外

日本経済新聞

2020年08月
ANA HD、グループ会社のavatarinにアドバイザー4名が就任

インターネットメディア

執筆者：本人以外

FlyTeamニュース

https://flyteam.jp/news/article/124644

2020年05月
ディープラーニングがロボットを多能にする，進化を続ける人工知能AI

新聞・雑誌

日経サイエンス

p.96

2020年03月
等身大の「動くフィギュア」を開発するCutieroidプロジェクトとは？

インターネットメディア

Gigazine

2020年02月
AIの最新技術とロボットを融合させた新しいモノづくりを実現！−早稲田大学理工学術院基幹理工学部表現工学科教授尾形哲也氏

会誌・広報誌

執筆者：本人以外

株式会社四谷大塚 Dream Navi

2020年01月
ロボットの知能化最前線ミラーニューロン、模倣学習+GAN最新研究「NEDO AI＆ROBOT NEXTシンポジウム」浅田氏・尾形氏・松原氏講演

インターネットメディア

執筆者：本人以外

ロボスタ

2020年01月
NEDO AI＆ROBOT NEXT シンポジウム、「次世代人工知能技術」や「次世代人工知能技術を搭載したロボット」講演概要

新聞・雑誌

執筆者：本人以外

週刊アスキー

2020年01月
複数動作を深層学習早大、双腕ロボでタオル畳み実証

新聞・雑誌

執筆者：本人以外

日刊工業新聞

2019年11月
分身ロボで出勤宇宙へカフェへ-第8部となりのロボ（1）

新聞・雑誌

執筆者：本人以外

日本経済新聞

2019年11月
世界的権威に聞く「ロボット×ディープラーニング最前線」

インターネットメディア

執筆者：本人以外

ROBOTEER

2019年10月
深層学習によるロボットの動作学習と応用可能性

インターネットメディア

執筆者：本人以外

WASEDA ONLINE 読売新聞

https://yab.yomiuri.co.jp/adv/wol/opinion/science_190924.html

2019年09月
「多機能型の家庭用ロボが登場」、早大尾形教授が語る2025年のAI

インターネットメディア

執筆者：本人以外

日経 xTECH

https://tech.nikkeibp.co.jp/atcl/nxt/column/18/00934/082600005/

2019年09月
ディープラーニングが革新するロボット産業・後編｜早稲田大学教授尾形哲也

Found

https://found.media/n/nf68aa1bd76c5

2019年09月
ディープラーニングが革新するロボット産業・前編｜早稲田大学教授尾形哲也

インターネットメディア

執筆者：本人以外

Found

https://found.media/n/naeafdbb8be34

2019年09月
【大学研究室Vol.37】ロボットの身体に人間らしい感覚を──。産業界などとの協働にも注力しながら、“知能ロボット”研究の未来を切り拓く

会誌・広報誌

Technologist’s Magazine

https://www.criprof.com/magazine/2019/08/22/post-6009/?fbclid=IwAR0Bt-ojYStk4mTI7eNK8RNoeq-O_eizKukpH1IWlatFj4FkLUE2nG25k7c

2019年08月
ディープラーニングとハードウェアで競う「高専版」マネーの虎──「サイエンスZERO」も密着、開催の裏側

インターネットメディア

執筆者：本人以外

Ledge.ai

https://ledge.ai/dcon-afterstory/

2019年07月
ロボ操縦ＡＩの研究加速深層学習で動作習得

新聞・雑誌

執筆者：本人以外

日刊工業新聞

https://www.nikkan.co.jp/articles/view/00523943?fbclid=IwAR2_lrbKqrH6Ww5VE8MB13DKfRZS0hCSaF_zKN0slpqtsmAMgyYCJgzmAfQ

2019年07月
日本ディープラーニング協会の新体制、5名の特別顧問が就任

インターネットメディア

執筆者：本人以外

AINOW

https://ainow.ai/2019/07/03/173052/

2019年07月
ロボット x AIの領域がブルーオーシャンである理由

インターネットメディア

執筆者：本人以外

AI新聞

https://aishinbun.com/clm/20190620/2147/

2019年06月
日本はもはやロボット大国ではない！？論文数で７位に転落

インターネットメディア

執筆者：本人以外

AI新聞

https://aishinbun.com/nocategory/20190613/2136/

2019年06月
あらゆる業界にAI浸透 AI/SUM閉幕

新聞・雑誌

執筆者：本人以外

日本経済新聞

https://www.nikkei.com/article/DGXMZO44176680V20C19A4XY0000/

2019年04月
【官民総力戦】日経新聞社主催のグローバルAIサミット「AI/SUM（アイサム）」開幕

インターネットメディア

執筆者：本人以外

Ledge.ai

https://ledge.ai/aisum-day-1/

2019年04月
AIのロボット応用事例を紹介群馬産業技術センター講演会

新聞・雑誌

執筆者：本人以外

上毛新聞

2019年04月
日立製作所共同開発の新拠点「協創の森」創設

テレビ・ラジオ番組

執筆者：本人以外

テレビ朝日テレ朝news

https://news.tv-asahi.co.jp/news_economy/articles/000151983.html

2019年04月
早稲田大学尾形哲也教授インタビュー人工知能を基盤とする日常生活支援ロボットの研究開発

その他

執筆者：本人以外

国立研究開発法人新エネルギー・産業技術総合開発機構次世代人工知能・ロボット中核技術開発～紹介ハンドブック～（2018年度版）

https://www.nedo.go.jp/library/pamphlets/ZZ_pamphlets_00009.html

2019年03月
AI活用の壁は”アクション”で乗り越える【イベントレポート後編】

インターネットメディア

執筆者：本人以外

情報畑でつかまえて|NTTテクノロスブログ

https://www.ntt-tx.co.jp/column/feature_blog/20190131_2/

2019年02月
学習済みの動作を組み合わせてロボット全身の自律制御を行う深層学習技術

会誌・広報誌

執筆者：本人以外

日立評論技術革新サービス&プラットフォーム：研究開発

http://www.hitachihyoron.com/jp/archive/2010s/2019/01/24/index.html#sec04

2019年02月
“できない”AIを使いこなす3つのポイント【イベントレビュー前編】

インターネットメディア

執筆者：本人以外

情報畑でつかまえて|NTTテクノロスブログ

https://www.ntt-tx.co.jp/column/feature_blog/20190131/

2019年02月
NTTテクノクロスフェア2018 Crossing for the Next

新聞・雑誌

執筆者：本人以外

週刊東洋経済

https://toyokeizai.net/articles/-/253785?page=2

2018年12月
対談 AIロボットの挑戦尾形哲也／田原総一朗

会誌・広報誌

執筆者：本人以外

早稲田学報

2018年11月
Artificial intelligence: the new ghost in the machine,

インターネットメディア

執筆者：本人以外

Engineering and Technology

https://eandt.theiet.org/content/articles/2018/10/artificial-intelligence-the-new-ghost-in-the-machine/

2018年10月
Googleなど各社は、画像から音響や材質質感・３次元立体映像を推定する技術をどう商用展開するか

インターネットメディア

執筆者：本人以外

AINOW

http://ainow.ai/2018/10/10/148428/

2018年10月
Kampai to AI! GTC Japan Celebrates Robotics Innovations

インターネットメディア

執筆者：本人以外

NVIDIA Blogs

https://blogs.nvidia.com/blog/2018/09/07/gtc-japan-2018/

2018年09月
NVIDIAのCEOが先進技術の新機能を発表

インターネットメディア

執筆者：本人以外

bp-Affairs

https://bp-affairs.com/news/2018/09/20180903-7962.html

2018年09月
NVIDIA CEO ジェンスンフアン、ロボティクス、AI、自動運転のための新機能を発表

インターネットメディア

執筆者：本人以外

PR TIMES

https://prtimes.jp/main/html/rd/p/000000090.000012662.html

2018年08月
AIで切り開く新たな未来ーロボット制御から精神疾患治療までサイエンティフィック・アメリカン主催日経サイエンス共催

新聞・雑誌

執筆者：本人以外

日経サイエンス

https://www.natureasia.com/ja-jp/ndigest/v15/n9/AI%E3%81%A7%E5%88%87%E3%82%8A%E9%96%8B%E3%81%8F%E6%96%B0%E3%81%9F%E3%81%AA%E6%9C%AA%E6%9D%A5/93829

2018年08月
GPUとディープラーニング、AI関連技術の国内最大級のイベント「GTC Japan 2018」9月13・14日に開催

インターネットメディア

執筆者：本人以外

ロボスタ

https://robotstart.info/2018/08/10/gtcjapan2018.html

2018年08月
「正解」を示さなくてもなぜAIが学べるのか-経営者のためのAI入門（3）

インターネットメディア

執筆者：本人以外

JBpress

https://jbpress.ismedia.jp/articles/-/53579

2018年07月
AI人材に求められるもの、2018年度人工知能学会全国大会

インターネットメディア

執筆者：本人以外

日経XTREND

http://trend.nikkeibp.co.jp/atcl/contents/watch/00013/00047/

2018年07月
失業するかもしれない…AI脅威論の払拭を模索する研究者たち-産総研がＡＩ三本柱戦略

インターネットメディア

執筆者：本人以外

ニューススイッチ

https://newswitch.jp/p/13360

2018年06月
サイエンスView

新聞・雑誌

執筆者：本人以外

読売新聞朝刊

2018年06月
学習済みの複数の動作を自律的に組み合わせてロボット全身の制御を行う深層学習技術を開発ー動作習得に必要な期間の大幅短縮と動作バリエーションの飛躍的な増大を実現

会誌・広報誌

執筆者：本人以外

日立製作所ニュースリリース

http://www.hitachi.co.jp/New/cnews/month/2018/05/0531.html

2018年05月
学習データを取り換えるだけで、様々な動作を実現する汎用ロボット

インターネットメディア

執筆者：本人以外

IoTNEWS.JP

https://iotnews.jp/archives/89974

2018年04月
直談専門家に問うロボとAIの融合日本，ハード面で強み

新聞・雑誌

執筆者：本人以外

日経産業新聞

2018年04月
人工知能が未来を変える！ＡＩ大解剖スペシャル第二回

テレビ・ラジオ番組

BS-TBS

http://www.bs-tbs.co.jp/genre/detail/?mid=ai2018

2018年03月
人工知能が未来を変える！ＡＩ大解剖スペシャル第一回

テレビ・ラジオ番組

BS-TBS

http://www.bs-tbs.co.jp/genre/detail/?mid=ai2018

2018年03月
デンソーウェーブら、Science Robotics Meetingで「双腕型マルチモーダルAIロボ」を展示

インターネットメディア

執筆者：本人以外

ロボスタ

https://robotstart.info/2018/03/08/densowave-srm.html

2018年03月
デンソーウェーブ、ベッコフオートメーションと共同で米国サイエンス誌主催「Science Robotics Meeting in Japan2018」に双腕型マルチモーダルAIロボットを出展～「マルチモーダルAIロボットの誕生と成長」を語る座談会を同時開催～

インターネットメディア

執筆者：本人以外

PRTIMES

https://prtimes.jp/main/html/rd/p/000000028.000013815.html

2018年03月
Review 尾形哲也教授

新聞・雑誌

執筆者：本人以外

早稲田理工 by AERA 2018

https://publications.asahi.com/ecs/detail/?item_id=19796

2018年02月
ＡＩの死角（上）感覚・常識、学びにくく

新聞・雑誌

執筆者：本人以外

日本経済新聞朝刊

https://www.nikkei.com/article/DGKKZO26246240Y8A120C1TJM000/

2018年01月
人工知能、「超人」へ

新聞・雑誌

執筆者：本人以外

日経エレクトロニクス

https://xtech.nikkei.com/dm/atcl/mag/15/00189/

2018年01月
経営ひと言／早稲田大学・尾形哲也教授

新聞・雑誌

執筆者：本人以外

日刊工業新聞

2018年01月
認知ロボティクスで、多用途で活躍できるロボットを開発する〜尾形哲也・早稲田大学基幹理工学部教授

インターネットメディア

執筆者：本人以外

IGPI Top Researchers

http://top-researchers.com/?s=%E5%B0%BE%E5%BD%A2%E5%93%B2%E4%B9%9F

2018年01月
Video Friday: Happy Robot Holidays, AI Folding Laundry, and RoboThespian’s TED Talk

インターネットメディア

執筆者：本人以外

IEEE Spectrum

https://spectrum.ieee.org/automaton/robotics/robotics-hardware/video-friday-happy-robot-holidays-ai-folding-laundry-robothespian-ted-talk

2017年12月
出川哲朗のアイ・アム・スタディー

テレビ・ラジオ番組

日本テレビ

2017年12月
双腕型ロボットが自動でタオルをたたみサラダを盛り付ける、AI学習はVRシステム

インターネットメディア

執筆者：本人以外

MONOist

https://monoist.itmedia.co.jp/mn/articles/1711/30/news055.html

2017年11月
VRでやって見せればAIで動作を覚えるロボット-プログラムレスで複雑な動きも

インターネットメディア

執筆者：本人以外

日経テクノロジーONLINE

https://xtech.nikkei.com/dm/atcl/event/15/091100141/112900012/

2017年11月
【ここまできた！】初公開の「汎用」マルチモーダルAIロボットアームはここが凄い！深層学習と予測学習を使い、VRでティーチング！

インターネットメディア

執筆者：本人以外

ロボスタ

https://robotstart.info/2017/11/29/denso-mmaira.html

2017年11月
今後の「AI・ロボット」の発展（寄稿）

新聞・雑誌

執筆者：本人

日刊工業新聞

2017年11月
デンソーウェーブ、ベッコフオートメーション、エクサウィザーズ、ディープラーニングでロボットアームをリアルタイム制御する双腕型マルチモーダルAIロボットを開発

インターネットメディア

執筆者：本人以外

PRTIMES

https://prtimes.jp/main/html/rd/p/000000024.000013815.html

2017年11月
CEATEC 2017ロボットレポート（後編）――双腕ロボットが大活躍

インターネットメディア

執筆者：本人以外

MONOist

https://monoist.itmedia.co.jp/mn/articles/1711/06/news019.html

2017年11月
君は未来から来た友達

新聞・雑誌

執筆者：本人以外

読売新聞夕刊

2017年11月
衝撃！未来テクノロジー２０３０年世界はこう変わる

テレビ・ラジオ番組

BSジャパン

https://www.bs-tvtokyo.co.jp/official/miraitechnology/

2017年10月
社会実装に向け着実に進化、CEATEC 2017で見たAI

インターネットメディア

執筆者：本人以外

EE Times Japan

https://eetimes.itmedia.co.jp/ee/articles/1710/12/news078.html

2017年10月
産総研がタオルたたむロボット、「強化学習より短時間で学習」

インターネットメディア

執筆者：本人以外

日経クロステック

https://xtech.nikkei.com/dm/atcl/event/15/091100139/101100087/

2017年10月
CEATEC 2017で見た「明日の技術」いろいろ

インターネットメディア

執筆者：本人以外

マイナビニュース

https://news.mynavi.jp/article/20171009-ceatec02/3

2017年10月
「CEATEC JAPAN 2017」で「日本ディープラーニング協会」設立発表会開催

インターネットメディア

執筆者：本人以外

CarWatch

https://car.watch.impress.co.jp/docs/news/1084506.html

2017年10月
日本ディープラーニング協会が発足、技術者育成へ

新聞・雑誌

執筆者：本人以外

日本経済新聞

https://www.nikkei.com/article/DGXMZO2193011005102017000000/

2017年10月
日本ディープラーニング協会が設立、2020年までに3万人の技術者育成を目指す

新聞・雑誌

執筆者：本人以外

日経BP

http://itpro.nikkeibp.co.jp/atcl/news/17/100402400/

2017年10月
変わる学びの形態，大学でのアクティブラーニングの実例

新聞・雑誌

東進進学情報

2017年07月
人工知能とロボット技術の最前線第５回神経モデルとロボットの深淵なる関係

新聞・雑誌

執筆者：本人以外

オーム社ロボコンマガジン

2017年07月
AI・ロボット開発，これが日本の勝利の法則

その他

執筆者：本人以外

扶桑社

2017年03月
明日のAIを見にいこう

会誌・広報誌

執筆者：本人以外

経済産業省 METI Journal

2017年02月
世界を変えるニッポンの技術 SFの世界が現実に！？

新聞・雑誌

執筆者：本人以外

朝日新聞出版 AERA,

https://dot.asahi.com/aera/2017010600169.html?page=4

2017年01月
IEEEプレスセミナー：ディープラーニングが「意図をくみ取る」ロボットを実現する

インターネットメディア

執筆者：本人以外

TechFactory

https://www.atpress.ne.jp/news/118679

2016年11月
インタビュー早稲田大学理工学術院・尾形哲也教授（下）

新聞・雑誌

執筆者：本人以外

日刊工業新聞社機械設計11月別冊

https://www.nikkan.co.jp/articles/view/00407174

2016年11月
インタビュー早稲田大学理工学術院尾形哲也教授（上）

新聞・雑誌

執筆者：本人以外

日刊工業新聞社機械設計11月別冊

https://www.nikkan.co.jp/articles/view/00406428

2016年11月
早大尾形教授とベッコフ川野社長対談、IoTによるAIとロボットの融合は何をもたらすか

インターネットメディア

執筆者：本人以外

ビジネス＋IT

https://www.sbbit.jp/article/cont1/32784

2016年10月
SFリアル「アトムと暮らす日」

テレビ・ラジオ番組

執筆者：本人以外

NHK Eテレ23

2016年08月
有識者インタビュー人工知能（AI）の現状と未来

会誌・広報誌

執筆者：本人以外

総務省平成28年版情報通信白書

https://www.soumu.go.jp/johotsusintokei/whitepaper/ja/h28/pdf/n4200000.pdf

2016年07月
人工知能の大革命！ディープラーニング

テレビ・ラジオ番組

執筆者：本人以外

NHK Eテレ NHKサイエンスZERO

2016年06月
人と協調するロボット、衛星画像からの予測… 、期待がかかる国内の人工知能研究者

日経BigData

2016年01月
人工知能の実力（中）「深層学習」で自ら賢く

新聞・雑誌

執筆者：本人以外

日本経済新聞

https://www.nikkei.com/article/DGKKZO89777120X20C15A7TJM000/

2015年07月
ディープラーニングは万能か【第３部：タスク別編】

新聞・雑誌

執筆者：本人以外

日経エレクトロニクス

https://xtech.nikkei.com/dm/article/MAG/20150501/416852/

2015年06月
第8回ディープラーニング

会誌・広報誌

執筆者：本人

東京都立産業技術研究センター TIRI NEWS

https://www.iri-tokyo.jp/uploaded/attachment/2235.pdf

▼全件表示

論文

ForceMapping: learning visual-force features from vision for soft objects manipulation

Abdullah Mustafa, Ryo Hanai, Ixchel Ramirez, Floris Erich, Ryoichi Nakajo, Yukiyasu Domae, Tetsuya Ogata

Advanced Robotics 2025年05月 [査読有り]

担当区分：最終著者

DOI

Scopus
Enhancement of long-horizon task planning via active and passive modification in large language models

Kazuki Hori, Kanata Suzuki, Tetsuya Ogata

Scientific Reports 15 ( 1 ) 2025年02月 [査読有り]

担当区分：最終著者

　概要を見る

Abstract

This study proposes a method for generating complex and long-horizon off-line task plans using large language models (LLMs). Although several studies have been conducted in recent years on robot task planning using LLMs, the planning results tend to be simple, consisting of ten or fewer action commands, depending on the task. In the proposed method, the LLM actively collects missing information by asking questions, and the task plan is upgraded with one dialog example. One of the contributions of this study is a Q&A process in which ambiguity judgment is left to the LLM. By sequentially eliminating ambiguities contained in long-horizon tasks through dialogue, our method increases the amount of information included in movement plans. This study aims to further refine action plans obtained from active modification through dialogue by passive modification, and few studies have addressed these issues for long-horizon robot tasks. In our experiments, we define the number of items in the task planning as information for robot task execution, and we demonstrate the effectiveness of the proposed method through dialogue experiments using a cooking task as the subject. And as a result of the experiment, the amount of information could be increased by the proposed method.

DOI

Scopus

1

被引用数

(Scopus)
Teleoperation Experience Like VR Games: Generating Object-Grasping Motions Based on Predictive Learning

Ryuya Shuto, Pin-Chu Yang, Naoki Hashimoto, Mohammed Al-Sada, Tetsuya Ogata

2025 IEEE/SICE International Symposium on System Integration (SII) 1310 - 1317 2025年01月 [査読有り]

担当区分：最終著者

DOI
Computational Simulation of Wisconsin Card Sorting Task by using Variational Recurrent Neural Network based on the free energy principle

Daiki Goto, Hayato Idei, Tetsuya Ogata

2025 IEEE/SICE International Symposium on System Integration (SII) 21 - 28 2025年01月 [査読有り]

担当区分：最終著者

DOI
Deformation Analysis and Prediction of Drop-Stitch Reinforced Inflatable Robot Link for 1DOF and 2DOF Motion

Gangadhara Naga Sai Gubbala, Masato Nagashima, Hiroki Mori, Young Ah Seong, Hiroki Sato, Ryuma Niiyama, Yuki Suga, Tetsuya Ogata

2025 IEEE/SICE International Symposium on System Integration (SII) 88 - 95 2025年01月 [査読有り]

担当区分：最終著者

DOI
Developing a Framework for Natural Human Movement Mimicry of Low-Dynamic Motions in Mobile-Based Humanoids

Simon Gormuzov, Yushi Wang, Pinchu Yang, Tamon Miyake, Tetsuya Ogata, Shigeki Sugano

2025 IEEE/SICE International Symposium on System Integration (SII) 942 - 948 2025年01月 [査読有り]

DOI
Predicting human behavior using knowledge information in jig operation and robot collaborative action generation.

Mone Tamaki, Ryoichi Nakajo, Natsuki Yamanobe, Yukiyasu Domae, Tetsuya Ogata

SII 1041 - 1046 2025年 [査読有り]

担当区分：最終著者

DOI

Scopus
3D Space Perception via Disparity Learning Using Stereo Images and an Attention Mechanism: Real-Time Grasping Motion Generation for Transparent Objects

Xianbo Cai, Hiroshi Ito, Hyogo Hiruma, Tetsuya Ogata

IEEE Robotics and Automation Letters 2024年12月 [査読有り]

担当区分：最終著者

DOI

Scopus
Multi-Fingered Dragging of Unknown Objects and Orientations Using Distributed Tactile Information Through Vision-Transformer and LSTM

T. Ueno, S. Funabashi, H. Ito, A. Schmitz, S. Kulkarni, T. Ogata, S. Sugano

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 7445 - 7452 2024年10月 [査読有り]

DOI
Sensorimotor Attention and Language-based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM

Kanata Suzuki, Tetsuya Ogata

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 11872 - 11878 2024年10月 [査読有り]

担当区分：最終著者

DOI
Real-time Coordinated Motion Generation: A Hierarchical Deep Predictive Learning Model for Bimanual Tasks

Genki Shikada, Simon Armleder, Hiroshi Ito, Gordon Cheng, Tetsuya Ogata

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 496 - 503 2024年10月 [査読有り]

担当区分：最終著者

DOI
Augmenting Compliance With Motion Generation Through Imitation Learning Using Drop-Stitch Reinforced Inflatable Robot Arm With Rigid Joints

Gangadhara Naga Sai Gubbala, Masato Nagashima, Hiroki Mori, Young Ah Seong, Hiroki Sato, Ryuma Niiyama, Yuki Suga, Tetsuya Ogata

IEEE Robotics and Automation Letters 2024年10月 [査読有り]

担当区分：最終著者

DOI

Scopus

4

被引用数

(Scopus)
Special issue on real-world robot applications of the foundation models.

Kento Kawaharazuka, Tatsuya Matsushima, Shuhei Kurita, Chris Paxton 0001, Andy Zeng 0001, Tetsuya Ogata, Tadahiro Taniguchi

Advanced Robotics 38 ( 18 ) 1231 - 1231 2024年09月

DOI

Scopus
Retry-behavior Emergence for Robot-Motion Learning Without Teaching and Subtask Design

Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Tetsuya Ogata

2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM) 178 - 183 2024年07月 [査読有り]

担当区分：最終著者

DOI
Future shapes present: autonomous goal-directed and sensory-focused mode switching in a Bayesian allostatic network model

Hayato Idei, Jun Tani, Tetsuya Ogata, Yuichi Yamashita

2024年04月 [査読有り]

　概要を見る

Abstract

Trade-offs between moving to achieve goals and perceiving the surrounding environment highlight the complexity of continually adapting behaviors. The need to switch between goal-directed and sensory-focused modes, along with the goal emergence phenomenon, challenges conventional optimization frameworks, necessitating heuristic solutions. In this study, we propose a Bayesian recurrent neural network framework for homeostatic behavior adaptation via hierarchical multimodal integration. In it, the meta-goal of “minimizing predicted future sensory entropy” underpins the dynamic self-organization of future sensorimotor goals and their precision regarding the increasing sensory uncertainty due to unusual physiological conditions. We demonstrated that after learning a hierarchical predictive model of a dynamic environment through random exploration, our Bayesian agent autonomously switched self-organized behavior between goal-directed feeding and sensory-focused resting. It increased feeding before anticipated food shortages, explaining predictive energy regulation (allostasis) in animals. Our modeling framework opens new avenues for studying brain information processing and anchoring continual behavioral adaptations.

DOI
Tactile Object Property Recognition Using Geometrical Graph Edge Features and Multi-Thread Graph Convolutional Network

Shardul Kulkarni, Satoshi Funabashi, Alexander Schmitz, Tetsuya Ogata, Shigeki Sugano

IEEE Robotics and Automation Letters 2024年04月 [査読有り]

DOI

Scopus

5

被引用数

(Scopus)
Visual Imitation Learning of Non-Prehensile Manipulation Tasks with Dynamics-Supervised Models

Abdullah Mustafa, Ryo Hanai, Ixchel G. Ramirez-Alpizar, Floris Erich, Ryoichi Nakajo, Yukiyasu Domae, Tetsuya Ogata

IEEE International Conference on Automation Science and Engineering 3872 - 3879 2024年 [査読有り]

担当区分：最終著者

　概要を見る

Unlike quasi-static robotic manipulation tasks like pick-and-place, dynamic tasks such as non-prehensile manipulation pose greater challenges, especially for vision-based control. Successful control requires the extraction of features relevant to the target task. In visual imitation learning settings, these features can be learnt by backpropagating the policy loss through the vision backbone. Yet, this approach tends to learn task-specific features with limited generalizability. Alternatively, learning world models can realize more generalizable vision backbones. Utilizing the learnt features, task-specific policies are subsequently trained. Commonly, these models are trained solely to predict the next RGB state from the current state and action taken. But only-RGB prediction might not fully-capture the task-relevant dynamics. In this work, we hypothesize that direct supervision of target dynamic states (Dynamics Mapping) can learn better dynamics-informed world models. Beside the next RGB reconstruction, the world model is also trained to directly predict position, velocity, and acceleration of environment rigid bodies. To verify our hypothesis, we designed a non-prehensile 2D environment tailored to two tasks: "Balance-Reaching"and "Bin-Dropping". When trained on the first task, dynamics mapping enhanced the task performance under different training configurations (Decoupled, Joint, End-to-End) and policy architectures (Feedforward, Recurrent). Notably, its most significant impact was for world model pretraining boosting the success rate from 21% to 85%. Although frozen dynamics-informed world models could generalize well to a task with in-domain dynamics, but poorly to a one with out-of-domain dynamics. Code available at: https://github.com/Automation-Research-Team/DynamicsMapping-2D.

DOI

Scopus

1

被引用数

(Scopus)
Work Tempo Instruction Framework for Balancing Human Workload and Productivity in Repetitive Task.

Naoki Shirakura, Natsuki Yamanobe, Tsubasa Maruyama, Yukiyasu Domae, Tetsuya Ogata

HRI (Companion) 980 - 984 2024年 [査読有り]

担当区分：最終著者

DOI

Scopus
Automatic Segmentation of Continuous Time-Series Data Based on Prediction Error Using Deep Predictive Learning.

Suzuka Harada, Ryoichi Nakajo, Kei Kase, Tetsuya Ogata

SII 928 - 933 2024年 [査読有り]

担当区分：最終著者

DOI

Scopus
Generating Long-Horizon Task Actions by Leveraging Predictions of Environmental States.

Hiroto Iino, Kei Kase, Ryoichi Nakajo, Naoya Chiba, Hiroki Mori, Tetsuya Ogata

SII 478 - 483 2024年 [査読有り]

担当区分：最終著者

DOI

Scopus
Real-Time Motion Generation and Data Augmentation for Grasping Moving Objects with Dynamic Speed and Position Changes.

Kenjiro Yamamoto, Hiroshi Ito, Hideyuki Ichiwara, Hiroki Mori, Tetsuya Ogata

SII 390 - 397 2024年 [査読有り]

担当区分：最終著者

DOI

Scopus

3

被引用数

(Scopus)
Interactively Robot Action Planning with Uncertainty Analysis and Active Questioning by Large Language Model.

Kazuki Hori, Kanata Suzuki, Tetsuya Ogata

SII 85 - 91 2024年 [査読有り]

担当区分：最終著者

DOI

Scopus

9

被引用数

(Scopus)
Tactile Transfer Learning and Object Recognition With a Multifingered Hand Using Morphology Specific Convolutional Neural Networks

Satoshi Funabashi, Gang Yan, Fei Hongyi, Alexander Schmitz, Lorenzo Jamone, Tetsuya Ogata, Shigeki Sugano

IEEE Transactions on Neural Networks and Learning Systems 1 - 15 2024年 [査読有り]

DOI

Scopus

14

被引用数

(Scopus)
Modality Attention for Prediction-Based Robot Motion Generation: Improving Interpretability and Robustness of Using Multi-Modality

Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

IEEE Robotics and Automation Letters 8 ( 12 ) 8271 - 8278 2023年12月 [査読有り]

担当区分：最終著者

DOI
Uncertainty-Aware Haptic Shared Control With Humanoid Robots for Flexible Object Manipulation

Takumi Hara, Takashi Sato, Tetsuya Ogata, Hiromitsu Awano

IEEE Robotics and Automation Letters 8 ( 10 ) 6435 - 6442 2023年10月 [査読有り]

DOI
Multi-Timestep-Ahead Prediction with Mixture of Experts for Embodied Question Answering

Kanata Suzuki, Yuya Kamiwano, Naoya Chiba, Hiroki Mori, Tetsuya Ogata

Artificial Neural Networks and Machine Learning – ICANN 2023 243 - 255 2023年09月 [査読有り]

担当区分：最終著者

DOI

Scopus

1

被引用数

(Scopus)
Structured Motion Generation with Predictive Learning: Proposing Subgoal for Long-Horizon Manipulation

Namiko Saito, João Moura, Tetsuya Ogata, Marina Y. Aoyama, Shingo Murata, Shigeki Sugano, Sethu Vijayakumar

2023 IEEE International Conference on Robotics and Automation (ICRA) 2023年05月 [査読有り]

DOI
Multimodal Time Series Learning of Robots Based on Distributed and Integrated Modalities: Verification with a Simulator and Actual Robots

Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

2023 IEEE International Conference on Robotics and Automation (ICRA) 2023年05月 [査読有り]

担当区分：最終著者

DOI
Visual Spatial Attention and Proprioceptive Data-Driven Reinforcement Learning for Robust Peg-in-Hole Task Under Variable Conditions

Andre Yuji Yasutomi, Hideyuki Ichiwara, Hiroshi Ito, Hiroki Mori, Tetsuya Ogata

IEEE Robotics and Automation Letters 8 ( 3 ) 1834 - 1841 2023年03月 [査読有り]

担当区分：最終著者, 責任著者

DOI
Learning-based collision-free planning on arbitrary optimization criteria in the latent space through cGANs

Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Advanced Robotics 1 - 13 2023年02月 [査読有り]

担当区分：最終著者, 責任著者

DOI

Scopus

4

被引用数

(Scopus)
深層予測学習を用いた双腕ロボットによる柔軟物体操作: 近未来予測とリアルタイム動作生成

伊藤洋, SIMON Armleder, 鹿田玄輝, 蔡賢博, GORDON Cheng, 尾形哲也

人工知能学会全国大会論文集 JSAI2023 1G4OS21a04 - 1G4OS21a04 2023年

　概要を見る

本研究では、過去の学習経験に基づいて、実世界に適した動作をリアルタイムに予測することで、未学習の環境や作業対象物に対応することが可能な深層予測学習を用いた双腕ロボットによる柔軟物体操作を提案する。柔軟物体操作としてタオル掛けタスクを対象に、ロボットは机の上に置かれたタオルを把持し、物干し竿にかける動作を実現する。ロボットは視覚運動情報に基づき近未来の状況を予測し、現実との誤差を最小にするように動作を生成し、学習時と現実の差を許容しながらリアルタイムに動作を調整し続けることで、未学習の状況下でも柔軟に作業することが可能である。

DOI
深層予測学習を用いたロボット動作の複合生成

鈴木彼方, 伊藤洋, 山田竜郎, 加瀬敬唯, 尾形哲也

日本ロボット学会誌 40 ( 9 ) 772 - 777 2022年11月 [招待有り]

担当区分：最終著者, 責任著者

DOI
深層予測学習：背景と今後

尾形哲也

日本ロボット学会誌 40 ( 9 ) 761 - 765 2022年11月 [招待有り]

担当区分：筆頭著者, 責任著者

DOI
Use of Action Label in Deep Predictive Learning for Robot Manipulation

Kei Kase, Chikara Utsumi, Yukiyasu Domae, Tetsuya Ogata

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022年10月 [査読有り]

担当区分：最終著者, 責任著者

DOI
Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Prediction for Robot Pose Prediction

Hyogo Hiruma, Hiroki Mori, Hiroshi Ito, Tetsuya Ogata

IECON 2022 – 48th Annual Conference of the IEEE Industrial Electronics Society 2022年10月 [査読有り]

担当区分：最終著者

DOI
Learning Bidirectional Translation Between Descriptions and Actions With Small Paired Data

Minori Toyoda, Kanata Suzuki, Yoshihiko Hayashi, Tetsuya Ogata

IEEE Robotics and Automation Letters 7 ( 4 ) 10930 - 10937 2022年10月 [査読有り] [国際誌]

担当区分：最終著者

DOI
Emergence of sensory attenuation based upon the free-energy principle

Hayato Idei, Wataru Ohata, Yuichi Yamashita, Tetsuya Ogata, Jun Tani

Scientific Reports 12 ( 1 ) 14542 - 14542 2022年08月 [査読有り] [国際誌]

　概要を見る

Abstract

The brain attenuates its responses to self-produced exteroceptions (e.g., we cannot tickle ourselves). Is this phenomenon, known as sensory attenuation, enabled innately, or acquired through learning? Here, our simulation study using a multimodal hierarchical recurrent neural network model, based on variational free-energy minimization, shows that a mechanism for sensory attenuation can develop through learning of two distinct types of sensorimotor experience, involving self-produced or externally produced exteroceptions. For each sensorimotor context, a particular free-energy state emerged through interaction between top-down prediction with precision and bottom-up sensory prediction error from each sensory area. The executive area in the network served as an information hub. Consequently, shifts between the two sensorimotor contexts triggered transitions from one free-energy state to another in the network via executive control, which caused shifts between attenuating and amplifying prediction-error-induced responses in the sensory areas. This study situates emergence of sensory attenuation (or self-other distinction) in development of distinct free-energy states in the dynamic hierarchical neural system.

DOI PubMed

Scopus

13

被引用数

(Scopus)
Deep Active Visual Attention for Real-Time Robot Motion Generation: Emergence of Tool-Body Assimilation and Adaptive Tool-Use

Hyogo Hiruma, Hiroshi Ito, Hiroki Mori, Tetsuya Ogata

IEEE Robotics and Automation Letters 7 ( 3 ) 8550 - 8557 2022年07月 [査読有り]

担当区分：最終著者

DOI
Robot Task Learning With Motor Babbling Using Pseudo Rehearsal

Kei Kase, Ai Tateishi, Tetsuya Ogata

IEEE Robotics and Automation Letters 7 ( 3 ) 8377 - 8382 2022年07月 [査読有り]

担当区分：最終著者

DOI

Scopus

2

被引用数

(Scopus)
Efficient multitask learning with an embodied predictive model for door opening and entry with whole-body control

Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

Science Robotics 7 ( 65 ) 2022年04月 [査読有り] [国際誌]

担当区分：最終著者, 責任著者

　概要を見る

Robots need robust models to effectively perform tasks that humans do on a daily basis. These models often require substantial developmental costs to maintain because they need to be adjusted and adapted over time. Deep reinforcement learning is a powerful approach for acquiring complex real-world models because there is no need for a human to design the model manually. Furthermore, a robot can establish new motions and optimal trajectories that may not have been considered by a human. However, the cost of learning is an issue because it requires a huge amount of trial and error in the real world. Here, we report a method for realizing complicated tasks in the real world with low design and teaching costs based on the principle of prediction error minimization. We devised a module integration method by introducing a mechanism that switches modules based on the prediction error of multiple modules. The robot generates appropriate motions according to the door’s position, color, and pattern with a low teaching cost. We also show that by calculating the prediction error of each module in real time, it is possible to execute a sequence of tasks (opening door outward and passing through) by linking multiple modules and responding to sudden changes in the situation and operating procedures. The experimental results show that the method is effective at enabling a robot to operate autonomously in the real world in response to changes in the environment.

DOI

Scopus

60

被引用数

(Scopus)
Multi-Fingered In-Hand Manipulation With Various Object Properties Using Graph Convolutional Networks and Distributed Tactile Sensors

Satoshi Funabashi, Tomoki Isobe, Fei Hongyi, Atsumu Hiramoto, Alexander Schmitz, Shigeki Sugano, Tetsuya Ogata

IEEE Robotics and Automation Letters 7 ( 2 ) 2102 - 2109 2022年04月 [査読有り]

担当区分：最終著者

DOI

Scopus

35

被引用数

(Scopus)
Utilization of Image/Force/Tactile Sensor Data for Object-Shape-Oriented Manipulation: Wiping Objects With Turning Back Motions and Occlusion

Namiko Saito, Takumi Shimizu, Tetsuya Ogata, Shigeki Sugano

IEEE Robotics and Automation Letters 7 ( 2 ) 968 - 975 2022年04月 [査読有り]

DOI

Scopus

13

被引用数

(Scopus)
Special issue on Symbol Emergence in Robotics and Cognitive Systems (I)

Tadahiro Taniguchi, Takayuki Nagai, Shingo Shimoda, Angelo Cangelosi, Yiannis Demiris, Yutaka Matsuo, Kenji Doya, Tetsuya Ogata, Lorenzo Jamone, Yukie Nagai, Emre Ugur, Daichi Mochihashi, Yuuya Unno, Kazuo Okanoya, Takashi Hashimoto

Advanced Robotics 36 ( 1-2 ) 1 - 2 2022年

DOI

Scopus
Special issue on symbol emergence in robotics and cognitive systems (II)

Tadahiro Taniguchi, Takayuki Nagai, Shingo Shimoda, Angelo Cangelosi, Yiannis Demiris, Yutaka Matsuo, Kenji Doya, Tetsuya Ogata, Lorenzo Jamone, Yukie Nagai, Emre Ugur, Daichi Mochihashi, Yuuya Unno, Kazuo Okanoya, Takashi Hashimoto

Advanced Robotics 36 ( 5-6 ) 217 - 218 2022年

DOI

Scopus
Special issue on symbol emergence in robotics and cognitive systems (II).

Tadahiro Taniguchi, Takayuki Nagai, Shingo Shimoda, Angelo Cangelosi, Yiannis Demiris, Yutaka Matsuo, Kenji Doya, Tetsuya Ogata, Lorenzo Jamone, Yukie Nagai, Emre Ugur, Daichi Mochihashi, Yuuya Unno, Kazuo Okanoya, Takashi Hashimoto

Advanced Robotics 36 ( 5-6 ) 217 - 218 2022年

DOI

Scopus
Special issue on Symbol Emergence in Robotics and Cognitive Systems (I).

Tadahiro Taniguchi, Takayuki Nagai, Shingo Shimoda, Angelo Cangelosi, Yiannis Demiris, Yutaka Matsuo, Kenji Doya, Tetsuya Ogata, Lorenzo Jamone, Yukie Nagai, Emre Ugur, Daichi Mochihashi, Yuuya Unno, Kazuo Okanoya, Takashi Hashimoto

Advanced Robotics 36 ( 1-2 ) 1 - 2 2022年

DOI

Scopus
Generating Humanoid Robot Motions based on a Procedural Animation IK Rig Method.

Pin-Chu Yang, Satoshi Funabashi, Mohammed Al-Sada, Tetsuya Ogata

IEEE/SICE International Symposium on System Integration(SII) 491 - 498 2022年

DOI

Scopus

5

被引用数

(Scopus)
モダリティ注意による深層予測学習の解釈性とノイズロバスト性の向上―日立-早大の共同研究開発事例―

一藁秀行, 伊藤洋, 山本健次郎, 森裕紀, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2022 2A2-H11 2022年

DOI
Point Cloud Pre-training with Natural 3D Structures.

Ryosuke Yamada, Hirokatsu Kataoka, Naoya Chiba, Yukiyasu Domae, Tetsuya Ogata

Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 21251 - 21261 2022年 [査読有り]

担当区分：最終著者

DOI

Scopus

31

被引用数

(Scopus)
Time Pressure Based Human Workload and Productivity Compatible System for Human-Robot Collaboration.

Naoki Shirakura, Ryuichi Takase, Natsuki Yamanobe, Yukiyasu Domae, Tetsuya Ogata

Proceedings of IEEE International Conference on Automation Science and Engineering (CASE) 2022 659 - 666 2022年 [査読有り]

担当区分：最終著者

DOI

Scopus

2

被引用数

(Scopus)
Sensory-Motor Learning for Simultaneous Control of Motion and Force: Generating Rubbing Motion against Uneven Object.

Hiroshi Ito, Takumi Kurata, Tetsuya Ogata

IEEE/SICE International Symposium on System Integration(SII) 408 - 415 2022年 [査読有り]

担当区分：最終著者

DOI

Scopus

3

被引用数

(Scopus)
Integrated Learning of Robot Motion and Sentences: Real-Time Prediction of Grasping Motion and Attention based on Language Instructions.

Hiroshi Ito, Hideyuki Ichiwara, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

ICRA 5404 - 5410 2022年 [査読有り]

担当区分：最終著者

DOI

Scopus

8

被引用数

(Scopus)
Contact-Rich Manipulation of a Flexible Object based on Deep Predictive Learning using Vision and Tactility.

Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

ICRA 5375 - 5381 2022年 [査読有り]

担当区分：最終著者

DOI

Scopus

14

被引用数

(Scopus)
Leveraging Motor Babbling for Efficient Robot Learning

Kei Kase, Noboru Matsumoto, Tetsuya Ogata

Journal of Robotics and Mechatronics 33 ( 5 ) 1063 - 1074 2021年10月 [査読有り]

担当区分：最終著者

　概要を見る

Deep robotic learning by learning from demonstration allows robots to mimic a given demonstration and generalize their performance to unknown task setups. However, this generalization ability is heavily affected by the number of demonstrations, which can be costly to manually generate. Without sufficient demonstrations, robots tend to overfit to the available demonstrations and lose the robustness offered by deep learning. Applying the concept of motor babbling – a process similar to that by which human infants move their bodies randomly to obtain proprioception – is also effective for allowing robots to enhance their generalization ability. Furthermore, the generation of babbling data is simpler than task-oriented demonstrations. Previous researches use motor babbling in the concept of pre-training and fine-tuning but have the problem of the babbling data being overwritten by the task data. In this work, we propose an RNN-based robot-control framework capable of leveraging targetless babbling data to aid the robot in acquiring proprioception and increasing the generalization ability of the learned task data by learning both babbling and task data simultaneously. Through simultaneous learning, our framework can use the dynamics obtained from babbling data to learn the target task efficiently. In the experiment, we prepare demonstrations of a block-picking task and aimless-babbling data. With our framework, the robot can learn tasks faster and show greater generalization ability when blocks are at unknown positions or move during execution.

DOI

Scopus

2

被引用数

(Scopus)
Tool-Use Model to Reproduce the Goal Situations Considering Relationship Among Tools, Objects, Actions and Effects Using Multimodal Deep Neural Networks

Namiko Saito, Tetsuya Ogata, Hiroki Mori, Shingo Murata, Shigeki Sugano

Frontiers in Robotics and AI 8 2021年09月 [査読有り]

　概要を見る

We propose a tool-use model that enables a robot to act toward a provided goal. It is important to consider features of the four factors; tools, objects actions, and effects at the same time because they are related to each other and one factor can influence the others. The tool-use model is constructed with deep neural networks (DNNs) using multimodal sensorimotor data; image, force, and joint angle information. To allow the robot to learn tool-use, we collect training data by controlling the robot to perform various object operations using several tools with multiple actions that leads different effects. Then the tool-use model is thereby trained and learns sensorimotor coordination and acquires relationships among tools, objects, actions and effects in its latent space. We can give the robot a task goal by providing an image showing the target placement and orientation of the object. Using the goal image with the tool-use model, the robot detects the features of tools and objects, and determines how to act to reproduce the target effects automatically. Then the robot generates actions adjusting to the real time situations even though the tools and objects are unknown and more complicated than trained ones.

DOI

Scopus

5

被引用数

(Scopus)
Paradoxical sensory reactivity induced by functional disconnection in a robot model of neurodevelopmental disorder

Hayato Idei, Shingo Murata, Yuichi Yamashita, Tetsuya Ogata

Neural Networks 138 150 - 163 2021年06月 [査読有り]

担当区分：最終著者

　概要を見る

Neurodevelopmental disorders are characterized by heterogeneous and non-specific nature of their clinical symptoms. In particular, hyper- and hypo-reactivity to sensory stimuli are diagnostic features of autism spectrum disorder and are reported across many neurodevelopmental disorders. However, computational mechanisms underlying the unusual paradoxical behaviors remain unclear. In this study, using a robot controlled by a hierarchical recurrent neural network model with predictive processing and learning mechanism, we simulated how functional disconnection altered the learning process and subsequent behavioral reactivity to environmental change. The results show that, through the learning process, long-range functional disconnection between distinct network levels could simultaneously lower the precision of sensory information and higher-level prediction. The alteration caused a robot to exhibit sensory-dominated and sensory-ignoring behaviors ascribed to sensory hyper- and hypo-reactivity, respectively. As long-range functional disconnection became more severe, a frequency shift from hyporeactivity to hyperreactivity was observed, paralleling an early sign of autism spectrum disorder. Furthermore, local functional disconnection at the level of sensory processing similarly induced hyporeactivity due to low sensory precision. These findings suggest a computational explanation for paradoxical sensory behaviors in neurodevelopmental disorders, such as coexisting hyper- and hypo-reactivity to sensory stimulus. A neurorobotics approach may be useful for bridging various levels of understanding in neurodevelopmental disorders and providing insights into mechanisms underlying complex clinical symptoms. (c) 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

DOI PubMed

Scopus

8

被引用数

(Scopus)
Development of a Basic Educational Kit for Robotic System with Deep Neural Networks

Momomi Kanamura, Kanata Suzuki, Yuki Suga, Tetsuya Ogata

Sensors 21 ( 11 ) 3804 - 3804 2021年05月 [査読有り]

担当区分：責任著者

　概要を見る

In many robotics studies, deep neural networks (DNNs) are being actively studied due to their good performance. However, existing robotic techniques and DNNs have not been systematically integrated, and packages for beginners are yet to be developed. In this study, we proposed a basic educational kit for robotic system development with DNNs. Our goal was to educate beginners in both robotics and machine learning, especially the use of DNNs. Initially, we required the kit to (1) be easy to understand, (2) employ experience-based learning, and (3) be applicable in many areas. To clarify the learning objectives and important parts of the basic educational kit, we analyzed the research and development (R&D) of DNNs and divided the process into three steps of data collection (DC), machine learning (ML), and task execution (TE). These steps were configured under a hierarchical system flow with the ability to be executed individually at the development stage. To evaluate the practicality of the proposed system flow, we implemented it for a physical robotic grasping system using robotics middleware. We also demonstrated that the proposed system can be effectively applied to other hardware, sensor inputs, and robot tasks.

DOI PubMed

Scopus

3

被引用数

(Scopus)
From Anime To Reality: Embodying An Anime Character As A Humanoid Robot

Mohammed Al Al Sada, Pin-Chu Yang, Chang Chieh Chiu, Tito Pradhono Tomo, Mhd Yamen Saraiji, Tetsuya Ogata, Tatsuo Nakajima

Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems 2021年05月 [査読有り]

DOI

Scopus

2

被引用数

(Scopus)
Embodying Pre-Trained Word Embeddings Through Robot Actions

Minori Toyoda, Kanata Suzuki, Hiroki Mori, Yoshihiko Hayashi, Tetsuya Ogata

IEEE Robotics and Automation Letters 6 ( 2 ) 4225 - 4232 2021年04月 [査読有り]

担当区分：責任著者

DOI

Scopus

12

被引用数

(Scopus)
Compensation for Undefined Behaviors during Robot Task Execution by Switching Controllers Depending on Embedded Dynamics in RNN

Kanata Suzuki, Hiroki Mori, Tetsuya Ogata

IEEE Robotics and Automation Letters 6 ( 2 ) 3475 - 3482 2021年04月 [査読有り]

担当区分：最終著者, 責任著者

　概要を見る

Robotic applications require both correct task performance and compensation for undefined behaviors. Although deep learning is a promising approach to perform complex tasks, the response to undefined behaviors that are not reflected in the training dataset remains challenging. In a human-robot collaborative task, the robot may adopt an unexpected posture due to collisions and other unexpected events. Therefore, robots should be able to recover from disturbances for completing the execution of the intended task. We propose a compensation method for undefined behaviors by switching between two controllers. Specifically, the proposed method switches between learning-based and model-based controllers depending on the internal representation of a recurrent neural network that learns task dynamics. We applied the proposed method to a pick-And-place task and evaluated the compensation for undefined behaviors. Experimental results from simulations and on a real robot demonstrate the effectiveness and high performance of the proposed method.

DOI

Scopus

17

被引用数

(Scopus)
How to Select and Use Tools? : Active Perception of Target Objects Using Multimodal Deep Learning

Namiko Saito, Tetsuya Ogata, Satoshi Funabashi, Hiroki Mori, Shigeki Sugano

IEEE Robotics and Automation Letters 6 ( 2 ) 2517 - 2524 2021年04月 [査読有り]

　概要を見る

Selection of appropriate tools and use of them when performing daily tasks is a critical function for introducing robots for domestic applications. In previous studies, however, adaptability to target objects was limited, making it difficult to accordingly change tools and adjust actions. To manipulate various objects with tools, robots must both understand tool functions and recognize object characteristics to discern a tool-object-action relation. We focus on active perception using multimodal sensorimotor data while a robot interacts with objects, and allow the robot to recognize their extrinsic and intrinsic characteristics. We construct a deep neural networks (DNN) model that learns to recognize object characteristics, acquires tool-object-action relations, and generates motions for tool selection and handling. As an example tool-use situation, the robot performs an ingredients transfer task, using a turner or ladle to transfer an ingredient from a pot to a bowl. The results confirm that the robot recognizes object characteristics and servings even when the target ingredients are unknown. We also examine the contributions of images, force, and tactile data and show that learning a variety of multimodal information results in rich perception for tool use.

DOI

Scopus

43

被引用数

(Scopus)
Binary Neural Network in Robotic Manipulation: Flexible Object Manipulation for Humanoid Robot Using Partially Binarized Auto-Encoder on FPGA.

Satoshi Ohara, Tetsuya Ogata, Hiromitsu Awano

CoRR abs/2107.00209 2021年
Comparison of Consolidation Methods for Predictive Learning of Time Series

Ryoichi Nakajo, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12798 113 - 120 2021年

　概要を見る

In environments where various tasks are sequentially given to deep neural networks (DNNs), training methods are needed that enable DNNs to learn the given tasks continuously. A DNN is typically trained by a single dataset, and continuous learning of subsequent datasets causes the problem of catastrophic forgetting. Previous studies have reported results for consolidation learning methods in recognition tasks and reinforcement learning problems. However, those methods were validated on only a few examples of predictive learning for time series. In this study, we applied elastic weight consolidation (EWC) and pseudo-rehearsal to the predictive learning of time series and compared their learning results. Evaluating the latent space after the consolidation learning revealed that the EWC method acquires properties of the pre-training and subsequent datasets with the same distribution, and the pseudo-rehearsal method distinguishes the properties and acquires them with different distributions.

DOI

Scopus

2

被引用数

(Scopus)
Binary Neural Network in Robotic Manipulation: Flexible Object Manipulation for Humanoid Robot Using Partially Binarized Auto-Encoder on FPGA.

Satoshi Ohara, Tetsuya Ogata, Hiromitsu Awano

IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS) 6010 - 6015 2021年

DOI

Scopus

1

被引用数

(Scopus)
Contact-Rich Manipulation of a Flexible Object based on Deep Predictive Learning using Vision and Tactility.

Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

CoRR abs/2112.06442 2021年
言語指示に基づいた注意予測による把持動作生成—日立-早大の共同研究開発事例

伊藤洋, 一藁秀行, 山本健次郎, 森裕紀, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2021 1P3-D05 2021年

DOI CiNii
触覚センサと深層学習を用いた布バッグのジッパー開け動作―日立-早大の共同研究開発事例―

一藁秀行, 伊藤洋, 山本健次郎, 森裕紀, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2021 1P3-D04 2021年

DOI CiNii
Viewpoint Planning Based on Uncertainty Maps Created from the Generative Query Network

Kelvin Lukman, Hiroki Mori, Tetsuya Ogata

Advances in Intelligent Systems and Computing 37 - 48 2021年 [招待有り]

担当区分：最終著者

DOI
In-air Knotting of Rope using Dual-Arm Robot based on Deep Learning.

Kanata Suzuki, Momomi Kanamura, Yuki Suga, Hiroki Mori, Tetsuya Ogata

CoRR abs/2103.09402 2021年
Spatial Attention Point Network for Deep-learning-based Robust Autonomous Robot Motion Generation.

Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

CoRR abs/2103.01598 2021年
Stable deep reinforcement learning method by predicting uncertainty in rewards as a subtask.

Kanata Suzuki, Tetsuya Ogata

CoRR abs/2101.06906 2021年
Tactile-based curiosity maximizes tactile-rich object-oriented actions even without any extrinsic rewards

Hiroki Mori, Masayuki Masuda, Tetsuya Ogata

ICDL-EpiRob 2020 - 10th IEEE International Conference on Development and Learning and Epigenetic Robotics 2020年10月 [査読有り]

担当区分：最終著者

　概要を見る

This study proposed a hypothesis regarding the emergence of object-oriented action via tactile-based curiosity. The hypothesis is such that a curious exploration driven by tactile sensation leads tactile-rich object-oriented actions, while there are no explicit rewards or other designated intentional purposes. Experiments were with the curiosity model named the disagreement model from the reinforcement learning research field and with a simple physics robotic simulation with visual and tactile sensory information. The experimental results indicated that the tactile sensation induces object-oriented actions such as hitting and pecking by the body parts that have tactile sensors. We deduced that the hypothesis could be extended to discussions regarding the acquisition of dexterous skillful object manipulation in human development.

DOI

Scopus
Wiping 3D-objects using deep learning model based on image/force/joint information

Namiko Saito, Danyang Wang, Tetsuya Ogata, Hiroki Mori, Shigeki Sugano

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 10152 - 10157 2020年10月 [査読有り]

　概要を見る

We propose a deep learning model for a robot to wipe 3D-objects. Wiping of 3D-objects requires recognizing the shapes of objects and planning the motor angle adjustments for tracing the objects. Unlike previous research, our learning model does not require pre-designed computational models of target objects. The robot is able to wipe the objects to be placed by using image, force, and arm joint information. We evaluate the generalization ability of the model by confirming that the robot handles untrained cube and bowl shaped-objects. We also find that it is necessary to use both image and force information to recognize the shape of and wipe 3D objects consistently by comparing changes in the input sensor data to the model. To our knowledge, this is the first work enabling a robot to use learning sensorimotor information alone to trace various unknown 3D-shape.

DOI

Scopus

14

被引用数

(Scopus)
Variable in-hand manipulations for tactile-driven robot hand via CNN-LSTM

Satoshi Funabashi, Shun Ogasa, Tomoki Isobe, Tetsuya Ogata, Alexander Schmitz, Tito Pradhono Tomo, Shigeki Sugano

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 9472 - 9479 2020年10月 [査読有り]

　概要を見る

Performing various in-hand manipulation tasks, without learning each individual task, would enable robots to act more versatile, while reducing the effort for training. However, in general it is difficult to achieve stable in-hand manipulation, because the contact state between the fingertips becomes difficult to model, especially for a robot hand with anthropomorphically shaped fingertips. Rich tactile feedback can aid the robust task execution, but on the other hand it is challenging to process high-dimensional tactile information. In the current paper we use two fingers of the Allegro hand, and each fingertip is anthropomorphically shaped and equipped not only with 6-axis force-torque (F/T) sensors, but also with uSkin tactile sensors, which provide 24 tri-axial measurements per fingertip. A convolutional neural network is used to process the high dimensional uSkin information, and a long short-term memory (LSTM) handles the time-series information. The network is trained to generate two different motions ("twist"and "push"). The desired motion is provided as a task-parameter to the network, with twist defined as -1 and push as +1. When values between -1 and +1 are used as the task parameter, the network is able to generate untrained motions in-between the two trained motions. Thereby, we can achieve multiple untrained manipulations, and can achieve robustness with high-dimensional tactile feedback.

DOI

Scopus

13

被引用数

(Scopus)
Stable in-grasp manipulation with a low-cost robot hand by using 3-axis tactile sensors with a CNN

Satoshi Funabashi, Tomoki Isobe, Shun Ogasa, Tetsuya Ogata, Alexander Schmitz, Tito Pradhono Tomo, Shigeki Sugano

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 9166 - 9173 2020年10月 [査読有り]

　概要を見る

The use of tactile information is one of the most important factors for achieving stable in-grasp manipulation. Especially with low-cost robotic hands that provide low-precision control, robust in-grasp manipulation is challenging. Abundant tactile information could provide the required feed-back to achieve reliable in-grasp manipulation also in such cases. In this research, soft distributed 3-axis skin sensors ("uSkin") and 6-axis F/T (force/torque) sensors were mounted on each fingertip of an Allegro Hand to provide rich tactile information. These sensors yielded 78 measurements for each fingertip (72 measurements from the uSkin and 6 measurements from the 6-axis F/T sensor). However, such high-dimensional tactile information can be difficult to process because of the complex contact states between the grasped object and the fingertips. Therefore, a convolutional neural network (CNN) was employed to process the tactile information. In this paper, we explored the importance of the different sensors for achieving in-grasp manipulation. Successful in-grasp manipulation with untrained daily objects was achieved when both 3-axis uSkin and 6-axis F/T information was provided and when the information was processed using a CNN.

DOI

Scopus

19

被引用数

(Scopus)
Homogeneous Intrinsic Neuronal Excitability Induces Overfitting to Sensory Noise: A Robot Model of Neurodevelopmental Disorder

Hayato Idei, Shingo Murata, Yuichi Yamashita, Tetsuya Ogata

Frontiers in Psychiatry 11 2020年08月 [査読有り]

担当区分：最終著者

　概要を見る

Neurodevelopmental disorders, including autism spectrum disorder, have been intensively investigated at the neural, cognitive, and behavioral levels, but the accumulated knowledge remains fragmented. In particular, developmental learning aspects of symptoms and interactions with the physical environment remain largely unexplored in computational modeling studies, although a leading computational theory has posited associations between psychiatric symptoms and an unusual estimation of information uncertainty (precision), which is an essential aspect of the real world and is estimated through learning processes. Here, we propose a mechanistic explanation that unifies the disparate observations via a hierarchical predictive coding and developmental learning framework, which is demonstrated in experiments using a neural network-controlled robot. The results show that, through the developmental learning process, homogeneous intrinsic neuronal excitability at the neural level induced via self-organization changes at the information processing level, such as hyper sensory precision and overfitting to sensory noise. These changes led to multifaceted alterations at the behavioral level, such as inflexibility, reduced generalization, and motor clumsiness. In addition, these behavioral alterations were accompanied by fluctuating neural activity and excessive development of synaptic connections. These findings might bridge various levels of understandings in autism spectrum and other neurodevelopmental disorders and provide insights into the disease processes underlying observed behaviors and brain activities in individual patients. This study shows the potential of neurorobotics frameworks for modeling how psychiatric disorders arise from dynamic interactions among the brain, body, and uncertain environments.

DOI

Scopus

13

被引用数

(Scopus)
HATSUKI : An anime character like robot figure platform with anime-style expressions and imitation learning based action generation

Pin-Chu Yang, Mohammed Al-Sada, Chang-Chieh Chiu, Kevin Kuo, Tito Pradhono Tomo, Kanata Suzuki, Nelson Yalta, Kuo-Hao Shu, Tetsuya Ogata

2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) 2020年08月 [査読有り]

担当区分：最終著者

DOI
Applying Uncertainty Maps created from Generative Query Network for a Viewpoint Planner

Kelvin Lukman, Hiroki Mori, Tetsuya Ogata

第34回人工知能学会全国大会 34 2G1-ES-4-04 2020年06月

DOI
未知語に対応可能な言語と動作の統合表現獲得モデル

豊田みのり, 森裕紀, 鈴木彼方, 林良彦, 尾形哲也

第34回人工知能学会全国大会 2D4-OS-18a-04 2020年06月
Transferable Task Execution from Pixels through Deep Planning Domain Learning

Kei Kase, Chris Paxton, Hammad Mazhar, Tetsuya Ogata, Dieter Fox

2020 IEEE International Conference on Robotics and Automation (ICRA) 2020年05月 [査読有り]

DOI
料理ロボットのための道具の選択・使用深層学習モデル – 道具と食材の配置に応じた料理のよそい動作の実現

斎藤菜美子, 呉雨恒, 尾形哲也, 森裕紀, 王丹阳, 陽品駒, 菅野重樹

情報処理学会第82回全国大会 5U-08 2020年03月
位置情報を明示的に扱う空間的注意機構モデルによる物体位置・角度推定の汎化

昼間彪吾, 森裕紀, 尾形哲也

情報処理学会第82回全国大会 2020年03月
Evaluation of Generalization Performance of Visuo-Motor Learning by Analyzing Internal State Structured from Robot Motion

Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

New Generation Computing 38 ( 1 ) 7 - 22 2020年03月 [査読有り]

担当区分：最終著者

DOI

Scopus

3

被引用数

(Scopus)
不確実性の推定と自閉スペクトラム症ー神経ロボティクス実験による症状シミュレーションー

出井勇人, 村田真悟, 尾形哲也, 山下祐一

精神医学 62 ( 2 ) 219 - 229 2020年02月 [査読有り]

　概要を見る

＜文献概要＞脳の計算理論の発展に伴い,それを精神疾患の病態理解に応用する試みへの期待が,近年,高まっている。特に,脳の予測符号化モデルに基づいた研究では,感覚情報の不確実性を推定する神経機能の変調と精神疾患との関連を示唆する概念的な仮説が提示されてきたが,実際の臨床で観察される行動レベルでの病像との間にはギャップが存在している。本論文では,神経ロボティクスの手法を用いることで,実験的にこの理論と臨床像との橋渡しを行った研究を紹介する。実験では,自閉スペクトラム症に特徴的な限定された反復的な行動様式や行為の停止といった多様な症状が,不確実性の過小評価と過大評価による異なるプロセスから生じ得ることが示され,精神症状のmultifinal性とequifinal性の一側面が捉えられた。また,病態理解にとどまらず,症状改善のための環境調整的な介入への理論的示唆といった,計算論的精神医学の臨床研究への貢献可能性が示された。
ゲームエンジンを使用したロボット模倣学習を効率化するプラットフォーム開発：—Autonomous Humanoid Figure ”Hatsuki” Mk.I

正陽品駒, 鈴木彼方, 邱章傑, トモティトプラドノ, ヤルタネルソン, カクケビン, 舒國豪, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2020 2A1-D04 2020年

DOI CiNii
深層学習を用いた実ロボットの反射動作学習—日立-早大の共同研究開発事例

伊藤洋, 山本健次郎, 森裕紀, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2020 2A1-A11 2020年

DOI CiNii
HATSUKI : An anime character like robot figure platform with anime-style expressions and imitation learning based action generation.

Pin-Chu Yang, Mohammed Al-Sada, Chang-Chieh Chiu, Kevin Kuo, Tito Pradhono Tomo, Kanata Suzuki, Nelson Yalta, Kuo-Hao Shu, Tetsuya Ogata

CoRR abs/2003.14121 2020年
Undefined-behavior guarantee by switching to model-based controller according to the embedded dynamics in Recurrent Neural Network.

Kanata Suzuki, Hiroki Mori, Tetsuya Ogata

CoRR abs/2003.04862 2020年
Transferable Task Execution from Pixels through Deep Planning Domain Learning.

Kei Kase, Chris Paxton, Hammad Mazhar, Tetsuya Ogata, Dieter Fox

CoRR abs/2003.03726 2020年
Development of a Basic Educational Kit for Robot Development Using Deep Neural Networks

Momomi Kanamura, Yuki Suga, Tetsuya Ogata

2020 IEEE/SICE International Symposium on System Integration (SII) 2020年01月 [査読有り]

担当区分：最終著者

DOI
Stable Deep Reinforcement Learning Method by Predicting Uncertainty in Rewards as a Subtask

Kanata Suzuki, Tetsuya Ogata

Neural Information Processing 651 - 662 2020年 [査読有り]

担当区分：最終著者

DOI

Scopus

1

被引用数

(Scopus)
Visualization of Focal Cues for Visuomotor Coordination by Gradient-based Methods: A Recurrent Neural Network Shifts the Attention Depending on Task Requirements

Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Shuki Goto, Tetsuya Ogata

Proceedings of the 2020 IEEE/SICE International Symposium on System Integration, SII 2020 188 - 194 2020年01月 [査読有り]

担当区分：最終著者

　概要を見る

For an autonomous robot to flexibly move in response to various tasks or environmental changes, an attention mechanism is required that is based on the robot's behavioral experience. In this paper, we visualize how attention is acquired inside a neural network learned using supervised learning and describe how to acquire a suitable representation for performing a task. Our experimental evaluation shows that the attention was automatically acquired for objects that are needed to perform tasks by learning the time-series of both vision and motor information rather than only vision information. By multimodal learning, the attention is robust against unlearned conditions which background changes or obstacles.

DOI

Scopus

1

被引用数

(Scopus)
Disentanglement in conceptual space during sensorimotor interaction

Junpei Zhong, Tetsuya Ogata, Angelo Cangelosi, Chenguang Yang

Cognitive Computation and Systems 1 ( 4 ) 103 - 112 2019年12月

DOI

Scopus

2

被引用数

(Scopus)
Real-time liquid pouring motion generation: End-to-end sensorimotor coordination for unknown liquid dynamics trained with deep neural networks

Namiko Saito, Nguyen Ba Dai, Tetsuya Ogata, Hiroki Mori, Shigeki Sugano

IEEE International Conference on Robotics and Biomimetics, ROBIO 2019 1077 - 1082 2019年12月 [査読有り]

　概要を見る

We propose a sensorimotor dynamical system model for pouring unknown liquids. With our system, a robot holds and shakes a bottle to estimate the characteristics of the contained liquid, such as viscosity and fill level, without calculating to determine their parameters. Next, the robot pours a specified amount of the liquid into another container. The system needs to integrate information on the robot's actions, the liquids, the container, and the surrounding environment to perform the estimation and execute a continuous pouring motion using the same model. We use deep neural networks (DNN) to construct the system. The DNN model repeats prediction and execution of the actions to be taken in the next time step based on the input sensorimotor data, including camera images, force sensor data, and joint angles. At the same time, the DNN model acquires liquid characteristics in the internal state. We confirmed that the DNN model can control the robot to pour a desired amount of liquid with unknown viscosity and fill level.

DOI

Scopus

8

被引用数

(Scopus)
A Bi-directional Multiple Timescales LSTM Model for Grounding of Actions and Verbs

Alexandre Antunes, Alban Laflaquiere, Tetsuya Ogata, Angelo Cangelosi

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2614 - 2621 2019年11月 [査読有り] [国際誌] [国際共著]

　概要を見る

In this paper we present a neural architecture to learn a bi-directional mapping between actions and language. We implement a Multiple Timescale Long Short-Term Memory (MT-LSTM) network comprised of 7 layers with different timescale factors, to connect actions to language without explicitly learning an intermediate representation. Instead, the model self-organizes such representations at the level of a slow-varying latent layer, linking action branch and language branch at the center. We train the model in a bi-directional way, learning how to produce a sentence from a certain action sequence input and, simultaneously, how to generate an action sequence given a sentence as input. Furthermore we show this model preserves some of the generalization behaviour of Multiple Timescale Recurrent Neural Networks (MTRNN) in generating sentences and actions that were not explicitly trained. We compare this model with a number of different baseline models, confirming the importance of both the bi-directional training and the multiple timescales architecture. Finally, the network was evaluated on motor actions performed by an iCub robot and their corresponding letter-based description. The results of these experiments are presented at the end of the paper.

DOI

Scopus

5

被引用数

(Scopus)
Learning Multiple Sensorimotor Units to Complete Compound Tasks using an RNN with Multiple Attractors

Kei Kase, Ryoichi Nakajo, Hiroki Mori, Tetsuya Ogata

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 4244 - 4249 2019年11月 [査読有り]

担当区分：最終著者

　概要を見る

As the complexity of the robot's tasks increases, we can consider many general tasks in a compound form that consists of shorter tasks. Therefore, for robots to generate various tasks, they need to be able to execute shorter tasks in succession, appropriately to the situation. With the design principle to construct the architecture for robots to execute complex tasks compounded with multiple subtasks, this study proposes a visuomotor-control framework with the characteristics of a state machine to train shorter tasks as sensorimotor units. The design procedure of training framework consists of 4 steps: (1) segment entire task into appropriate subtasks, (2) define subtasks as states and transitions in a state machine, (3) collect subtasks data, and (4) train neural networks: (a) autoencoder to extract visual features, (b) a single recurrent neural network to generate subtasks to realize a pseud-state-machine model with a constraint in hidden values. We implemented this framework on two different robots to allow their performance of repetitive tasks with error-recovery motion, subsequently, confirming the ability of the robot to switch the sensorimotor units from visual input at the attractors of the hidden values created by the constraint.

DOI

Scopus

6

被引用数

(Scopus)
Large-scale data collection for goal-directed drawing task with self-report psychiatric symptom questionnaires via crowdsourcing

Shingo Murata, Hikaru Yanagida, Kentaro Katahira, Shinsuke Suzuki, Tetsuya Ogata, Yuichi Yamashita

IEEE International Conference on Systems, Man and Cybernetics (SMC) 2019-October 3859 - 3865 2019年10月 [査読有り]

　概要を見る

Drawing is a representative human cognitive ability and may mirror cognitive characteristics including those associated with psychiatric symptoms. Therefore, analysis of drawing data collected from various populations such as healthy people and psychiatric patients may be beneficial for better understanding human cognition. However, collecting such large-scale data about the relationship between drawing and cognitive/personality traits offline-in a laboratory-is a difficult issue. To overcome this issue, we devised a novel experimental paradigm involving a goal-directed drawing task conducted online-on the eb-with participants recruited via a crowdsourcing platform. With the assistance of 1155 participants with differing levels of psychiatric symptoms, we collected a total of 194, 040 trajectory data and answers to seven different self-report psychiatric symptom questionnaires comprising 181 items. We visualized the collected trajectory data and performed an exploratory factor analysis on the correlation matrix of the psychiatric symptom questionnaire items. Our results suggest that there were associations between psychiatric symptoms represented by specific psychiatric factors and atypical behavior observed while performing the goal-directed drawing task. This indicates the efficacy of a dimensional approach to large-scale online experiments with respect to clinical psychiatry.

DOI

Scopus

5

被引用数

(Scopus)
CNN-based multichannel end-to-end speech recognition for everyday home environments

Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata

European Signal Processing Conference 2019-September 2019年09月 [査読有り] [国際誌] [国際共著]

担当区分：最終著者

　概要を見る

Casual conversations involving multiple speakers and noises from surrounding devices are common in everyday environments, which degrades the performances of automatic speech recognition systems. These challenging characteristics of environments are the target of the CHiME-5 challenge. By employing a convolutional neural network (CNN)-based multichannel end-to-end speech recognition system, this study attempts to overcome the presents difficulties in everyday environments. The system comprises of an attention-based encoder-decoder neural network that directly generates a text as an output from a sound input. The multichannel CNN encoder, which uses residual connections and batch renormalization, is trained with augmented data, including white noise injection. The experimental results show that the word error rate is reduced by 8.5% and 0.6% absolute from a single channel end-to-end and the best baseline (LF-MMI TDNN) on the CHiME-5 corpus, respectively.

DOI

Scopus

12

被引用数

(Scopus)
CNNRNNPBを用いたOne-Shotによる模倣動作生成

伊藤洋, 山本健次郎, 森裕紀, 尾形哲也

日本ロボット学会第37回学術講演会予稿集 37th 1A3-06 2019年09月

J-GLOBAL
双腕ロボットに向けた再帰型神経回路モデルを用いたドラミングタスクの学習，日本ロボット学会第37回学術講演会

中島佳昭, 加瀬敬唯, 森裕紀, Claudio Zito, Andrey Barsky, 尾形哲也

日本ロボット学会第37回学術講演会予稿集 1A2-05 2019年09月
深層学習を用いた視覚運動モデルの異なる入出力情報によるロボット動作生成の比較

松本昇, 加瀬敬唯, 森裕紀, 尾形哲也

日本ロボット学会第37回学術講演会予稿集 1A2-03 2019年09月
Looking Back and Ahead: Adaptation and Planning by Gradient Descent

Shingo Murata, Hiroki Sawa, Shigeki Sugano, Tetsuya Ogata

2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) 2019年08月 [査読有り]

担当区分：最終著者

DOI
Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation

Nelson Yalta, Shinji Watanabe, Kazuhiro Nakadai, Tetsuya Ogata

2019 International Joint Conference on Neural Networks (IJCNN) 2019年07月 [査読有り]

担当区分：最終著者

DOI
End-to-end Learning Method for Self-Driving Cars with Trajectory Recovery Using a Path-following Function

Tadashi Onishi, Toshiyuki Motoyoshi, Yuki Suga, Hiroki Mori, Tetsuya Ogata

2019 International Joint Conference on Neural Networks (IJCNN) 2019年07月 [査読有り]

担当区分：最終著者

DOI
End-to-End自動運転モデル改善のための画像認識サブタスクの設計と評価

石晶,李志豪, 本吉俊之, 大西直, 森裕紀, 尾形哲也

第33回人工知能学会全国大会予稿集 1L2-J-11-01 2019年06月
Conditional Generative Adversarial Networks によるロボットアームの障害物回避軌道計画

鳥島亮太, 森裕紀, 高橋城志, 岡野原大輔, 尾形哲也

日本機械学会ロボティクスメカトロニクス講演会予稿集 1P2-A10 2019年06月
モータ関節角と電流値を用いた再帰型神経回路モデルによるペグ挿入動作生成

倉田拓実, 伊藤洋, 森裕紀, 山本健次郎, 尾形哲也

日本機械学会ロボティクスメカトロニクス講演会予稿集 2019 1P2-A10 2019年06月

DOI CiNii
Path following algorithm for skid-steering mobile robot based on adaptive discontinuous posture control

Ibrahim, F., Abouelsoud, A.A., Fath Elbab, A.M.R., Ogata, T.

Advanced Robotics 33 ( 9 ) 439 - 453 2019年05月 [査読有り]

担当区分：最終著者

DOI

Scopus

19

被引用数

(Scopus)
Morphology-Specific Convolutional Neural Networks for Tactile Object Recognition with a Multi-Fingered Hand

Satoshi Funabashi, Gang Yan, Andreas Geier, Alexander Schmitz, Tetsuya Ogata, Shigeki Sugano

2019 International Conference on Robotics and Automation (ICRA) 2019年05月 [査読有り]

DOI
Adaptive Drawing Behavior by Visuomotor Learning Using Recurrent Neural Networks

Kazuma Sasaki, Tetsuya Ogata

IEEE Transactions on Cognitive and Developmental Systems 11 ( 1 ) 119 - 128 2019年03月 [査読有り]

担当区分：最終著者

DOI

Scopus

7

被引用数

(Scopus)
Encoding Longer-term Contextual Sensorimotor Information in a Predictive Coding Model

Junpei Zhong, Tetsuya Ogata, Angelo Cangelosi

Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence, SSCI 2018 160 - 167 2019年01月 [査読有り]

　概要を見る

Studies suggest that the difference of the sensorimotor events can be recorded with the fast- and slower-changing neural activities in the hierarchical brain areas, in which they have bi-directional connections. The slow-changing representations attempt to predict the activities on the faster level by relaying categorized sensorimotor events. On the other hand, the incoming sensory information corrects such event-based prediction on the higher level by the novel or surprising signal. From this motivation, we propose a predictive hierarchical artificial neural network model which is implemented the differentiated temporal parameters for neural updates. Also, both the fast- and slow-changing neural activities are modulated by the active motor activities. This model is examined in the driving dataset, recorded in various events, which incorporates the image sequences and the ego-motion of the vehicle. Experiments show that the model encodes the driving scenarios on the higher-level where the neuron recorded the long-term context.

DOI

Scopus

1

被引用数

(Scopus)
Tool-Use Model Considering Tool Selection by a Robot Using Deep Learning

Namiko Saito, Kitae Kim, Shingo Murata, Tetsuya Ogata, Shigeki Sugano

IEEE-RAS International Conference on Humanoid Robots 2018-November 814 - 819 2019年01月 [査読有り]

　概要を見る

We propose a tool-use model that can select tools that require neither labeling nor modeling of the environment and actions. With this model, a robot can choose a tool by itself and perform the operation that matches a human command and the environmental situation. To realize this, we use deep learning to train sensory motor data recorded during tool selection and tool use as experienced by a robot. The experience includes two types of selection, namely according to function and according to size, thereby allowing the robot to handle both situations. For evaluation, the robot is required to generate motion either in an untrained situation or using an untrained tool. We confirm that the robot can choose and use a tool that is suitable for achieving the target task.

DOI

Scopus

10

被引用数

(Scopus)
From natural to artificial embodied intelligence: is Deep Learning the solution (NII Shonan Meeting 137).

Lorenzo Jamone, Tetsuya Ogata, Beata J. Grzyb

NII Shonan Meet. Rep. 2019 2019年
Editorial: Machine learning methods for high-level cognitive capabilities in robotics

Tadahiro Taniguchi, Emre Ugur, Tetsuya Ogata, Takayuki Nagai, Yiannis Demiris

Frontiers in Neurorobotics 13 2019年

DOI

Scopus

1

被引用数

(Scopus)
Achieving Human–Robot Collaboration with Dynamic Goal Inference by Gradient Descent

Shingo Murata, Wataru Masuda, Jiayi Chen, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano

In Proceedings of the 26th International Conference on Neural Information Processing (ICONIP 2019) 11954 LNCS 579 - 590 2019年 [査読有り]

DOI

Scopus

10

被引用数

(Scopus)
Multisensory Learning Framework for Robot Drumming.

Andrey Barsky, Claudio Zito, Hiroki Mori, Tetsuya Ogata, Jeremy L. Wyatt

CoRR abs/1907.09775 2019年 [査読有り]
深層学習を用いた実機ロボットアームの高精度動作生成

後藤守規, 伊藤洋, 森裕紀, 山本健次郎, 尾形哲也

計測自動制御学会システムインテグレーション部門講演会SI2018 予稿集 19th 3A3-07 2018年12月

J-GLOBAL
A Neurorobotics Simulation of Autistic Behavior Induced by Unusual Sensory Precision

Hayato Idei, Shingo Murata, Yiwen Chen, Yuichi Yamashita, Jun Tani, Tetsuya Ogata

Computational Psychiatry 2 164 - 164 2018年12月 [査読有り]

担当区分：最終著者

DOI
The new ghost in the machine [Artificial Intelligence]

C. Edwards

Engineering & Technology 13 ( 10 ) 50 - 53 2018年11月 [招待有り]

DOI
Motion Switching With Sensory and Instruction Signals by Designing Dynamical Systems Using Deep Neural Network

Kanata Suzuki, Hiroki Mori, Tetsuya Ogata

IEEE Robotics and Automation Letters 3 ( 4 ) 3481 - 3488 2018年10月 [査読有り]

担当区分：最終著者

DOI

Scopus

18

被引用数

(Scopus)
Paired recurrent autoencoders for bidirectional translation between robot actions and linguistic descriptions

Yamada, T., Matsunaga, H., Ogata, T.

IEEE Robotics and Automation Letters 3 ( 4 ) 3441 - 3448 2018年10月 [査読有り]

担当区分：最終著者

DOI

Scopus

57

被引用数

(Scopus)
Message from the conference chairs

Tetsuya Ogata, Angelo Cangelosi

2018 Joint IEEE 8th International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2018 2018年09月

DOI

Scopus
Dynamic Motion Generation by Flexible-Joint Robot based on Deep Learning using Images

Yuheng Wu, Kuniyuki Takahashi, Hiroki Yamada, Kitae Kim, Shingo Murata, Shigeki Sugano, Tetsuya Ogata

2018 Joint IEEE 8th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) 2018年09月 [査読有り]

担当区分：最終著者

DOI
Detecting features of tools, objects, and actions from effects in a robot using deep learning

Namiko Saito, Kitae Kim, Shingo Murata, Tetsuya Ogata, Shigeki Sugano

2018 Joint IEEE 8th International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2018 91 - 96 2018年09月 [査読有り]

　概要を見る

We propose a tool-use model that can detect the features of tools, target objects, and actions from the provided effects of object manipulation. We construct a model that enables robots to manipulate objects with tools, using infant learning as a concept. To realize this, we train sensory-motor data recorded during a tool-use task performed by a robot with deep learning. Experiments include four factors: (1) tools, (2) objects, (3) actions, and (4) effects, which the model considers simultaneously. For evaluation, the robot generates predicted images and motions given information of the effects of using unknown tools and objects. We confirm that the robot is capable of detecting features of tools, objects, and actions by learning the effects and executing the task.

DOI

Scopus

3

被引用数

(Scopus)
Learning to Achieve Different Levels of Adaptability for Human–Robot Collaboration Utilizing a Neuro-Dynamical System

Shingo Murata, Yuxi Li, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano

IEEE Transactions on Cognitive and Developmental Systems 10 ( 3 ) 1 - 1 2018年09月 [査読有り]

DOI

Scopus

16

被引用数

(Scopus)
Acquisition of Viewpoint Transformation and Action Mappings via Sequence to Sequence Imitative Learning by Deep Neural Networks

Ryoichi Nakajo, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

Frontiers in Neurorobotics 12 2018年07月 [査読有り]

担当区分：最終著者

DOI

Scopus

2

被引用数

(Scopus)
AFA-PredNet: The Action Modulation Within Predictive Coding

Junpei Zhong, Angelo Cangelosi, Xinzheng Zhang, Tetsuya Ogata

2018 International Joint Conference on Neural Networks (IJCNN) 2018年07月 [査読有り]

担当区分：最終著者

DOI
End-to-End Visuomotor Learning of Drawing Sequences using Recurrent Neural Networks

Kazuma Sasaki, Tetsuya Ogata

2018 International Joint Conference on Neural Networks (IJCNN) 2018年07月 [査読有り]

担当区分：最終著者

DOI
Put-in-Box Task Generated from Multiple Discrete Tasks by aHumanoid Robot Using Deep Learning

Kei Kase, Kanata Suzuki, Pin-Chu Yang, Hiroki Mori, Tetsuya Ogata

2018 IEEE International Conference on Robotics and Automation (ICRA) 2018年05月 [査読有り]

担当区分：最終著者

DOI
Understanding natural language sentences with word embedding and multi-modal interaction

Junpei Zhong, Tetsuya Ogata, Angelo Cangelosi, Chenguang Yang

7th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, ICDL-EpiRob 2017 2018-January 184 - 189 2018年04月 [査読有り]

　概要を見る

Understanding and grounding human commands with natural languages have been a fundamental requirement for service robotic applications. Although there have been several attempts toward this goal, the bottleneck still exists to store and process the corpora of natural language in an interaction system. Currently, the neural- and statistical-based (N&S) natural language processing have shown potential to solve this problem. With the availability of large data-sets nowadays, these processing methods are able to extract semantic relationships while parsing a corpus of natural language (NL) text without much human design, compared with the rule-based language processing methods. In this paper, we show that how two N&S based word embedding methods, called Word2vec and GloVe, can be used in natural language understanding as pre-training tools in a multi-modal environment. Together with two different multiple time-scale recurrent neural models, they form hybrid neural language understanding models for a robot manipulation experiment.

DOI

Scopus

7

被引用数

(Scopus)
RNNを備えた二台ロボット間インタラクションの複雑性解析

澤弘樹, 山田竜郎, 村田真悟, 森裕紀, 尾形哲也, 菅野重樹

情報処理学会第80回全国大会, 予稿集 5N-07 2018年03月
Effective input order of dynamics learning tree

Chyon Hae Kim, Shohei Hama, Ryo Hirai, Kuniyuki Takahashi, Hiroki Yamada, Tetsuya Ogata, Shigeki Sugano

Advanced Robotics 32 ( 3 ) 122 - 136 2018年02月 [査読有り]

　概要を見る

In this paper, we discuss about the learning performance of dynamics learning tree (DLT) while mainly focusing on the implementation on robot arms. We propose an input-order-designing method for DLT. DLT has been applied to the modeling of boat, vehicle, and humanoid robot. However, the relationship between the input order and the performance of DLT has not been investigated. In the proposed method, a developer is able to design an effective input order intuitively. The proposed method was validated in the model learning tasks on a simulated robot manipulator, a real robot manipulator, and a simulated vehicle. The first/second manipulator was equipped with flexible arm/finger joints that made uncertainty around the trajectories of manipulated objects. In all of the cases, the proposed method improved the performance of DLT.

DOI

Scopus

2

被引用数

(Scopus)
CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments.

Nelson Yalta, Shinji Watanabe, Takaaki Hori, Kazuhiro Nakadai, Tetsuya Ogata

CoRR abs/1811.02735 2018年 [査読有り]
深層学習を用いた要素動作の統合手法の開発

伊藤洋, 山本健次郎, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2018 1A1-D09 2018年

DOI CiNii
Rethinking Self-driving: Multi-task Knowledge for Better Generalization and Accident Explanation Ability.

Zhihao Li, Toshiyuki Motoyoshi, Kazuma Sasaki, Tetsuya Ogata, Shigeki Sugano

CoRR abs/1809.11100 2018年
Detecting Features of Tools, Objects, and Actions from Effects in a Robot using Deep Learning.

Namiko Saito, Kitae Kim, Shingo Murata, Tetsuya Ogata, Shigeki Sugano

CoRR abs/1809.08613 2018年
Weakly Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation.

Nelson Yalta, Shinji Watanabe 0001, Kazuhiro Nakadai, Tetsuya Ogata

CoRR abs/1807.01126 2018年
Encoding Longer-term Contextual Multi-modal Information in a Predictive Coding Model.

Junpei Zhong, Tetsuya Ogata, Angelo Cangelosi

CoRR abs/1804.06774 2018年
AFA-PredNet: The action modulation within predictive coding.

Junpei Zhong, Angelo Cangelosi, Xinzheng Zhang 0001, Tetsuya Ogata

CoRR abs/1804.03826 2018年
Deep 3D Pose Dictionary: 3D Human Pose Estimation from Single RGB Image Using Deep Convolutional Neural Network

Reda Elbasiony, Walid Gomaa, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11141 LNCS 310 - 320 2018年 [査読有り]

担当区分：最終著者

DOI

Scopus
Encoding Longer-Term Contextual Information with Predictive Coding and Ego-Motion

Junpei Zhong, Angelo Cangelosi, Tetsuya Ogata, Xinzheng Zhang

Complexity 2018 1 - 15 2018年 [査読有り]

　概要を見る

Studies suggest that, within the hierarchical architecture, the topological higher level possibly represents the scenarios of the current sensory events with slower changing activities. They attempt to predict the neural activities on the lower level by relaying the predicted information after the scenario of the sensorimotor event has been determined. On the other hand, the incoming sensory information corrects such prediction of the events on the higher level by the fast-changing novel or surprising signal. From this point, we propose a predictive hierarchical artificial neural network model that examines this hypothesis on neurorobotic platforms. It integrates the perception and action in the predictive coding framework. Moreover, in this neural network model, there are different temporal scales of predictions existing on different levels of the hierarchical predictive coding architecture, which defines the temporal memories in recording the events occurring. Also, both the fast- and the slow-changing neural activities are modulated by the motor action. Therefore, the slow-changing neurons can be regarded as the representation of the recent scenario which the sensorimotor system has encountered. The neurorobotic experiments based on the architecture were also conducted.

DOI

Scopus

4

被引用数

(Scopus)
Four-Part Harmonization: Comparison of a Bayesian Network and a Recurrent Neural Network

Yamada, T., Kitahara, T., Arie, H., Ogata, T.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11265 LNCS 213 - 225 2018年 [査読有り]

担当区分：最終著者

DOI

Scopus

1

被引用数

(Scopus)
Discontinuous Stabilizing Control of Skid-Steering Mobile Robot (SSMR)

Ibrahim, F., Abouelsoud, A.A., Fath El Bab, A.M.R., Ogata, T.

Journal of Intelligent and Robotic Systems: Theory and Applications 95 ( 2 ) 253 - 266 2018年 [査読有り]

担当区分：最終著者

DOI

Scopus

8

被引用数

(Scopus)
Sensorimotor input as a language generalisation tool: a neurorobotics model for generation and generalisation of noun-verb combinations with sensorimotor inputs

Zhong, J., Peniak, M., Tani, J., Ogata, T., Cangelosi, A.

Autonomous Robots 43 ( 5 ) 1271 - 1290 2018年 [査読有り]

DOI

Scopus

11

被引用数

(Scopus)
Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions

Tatsuro Yamada, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

Frontiers in Neurorobotics 11 2017年12月 [査読有り]

担当区分：最終著者

DOI

Scopus

12

被引用数

(Scopus)
Dynamic motion learning for multi-DOF flexible-joint robots using active–passive motor babbling through deep learning

Kuniyuki Takahashi, Tetsuya Ogata, Jun Nakanishi, Gordon Cheng, Shigeki Sugano

Advanced Robotics 31 ( 18 ) 1002 - 1015 2017年09月 [査読有り]

DOI

Scopus

17

被引用数

(Scopus)
Reduced behavioral flexibility by aberrant sensory precision in autism spectrum disorder: A neurorobotics experiment

Hayato Idei, Shingo Murata, Yiwen Chen, Yuichi Yamashita, Jun Tani, Tetsuya Ogata

2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) 271 - 276 2017年09月 [査読有り]

担当区分：最終著者

　概要を見る

Recently, the importance of the application of computational models utilized in cognitive neuroscience to psychiatric disorders has been recognized. This study utilizes a recurrent neural network model to test aberrant sensory precision, a normative theory of autism spectrum disorder. We particularly focus on the effects of increased and decreased sensory precision on adaptive behavior based on a prediction error minimization mechanism. To distinguish dysfunction at the behavioral and network levels, we employ a humanoid robot driven by a neural network and observe ball-playing interactions with a human experimenter. Experimental results show that behavioral rigidity characteristic of autism spectrum disorder-including stopping movement and repetitive behavior-was generated from both increased and decreased sensory precision, but through different processes at the network level. These results may provide a system-level explanation of different types of behavioral rigidity in psychiatric diseases such as compulsions and stereotypies. The results also support a system-level model for autism spectrum disorder that suggests core deficits in estimating the uncertainty of sensory evidence.

DOI
Tool-body assimilation model considering grasping motion through deep learning

Kuniyuki Takahashi, Kitae Kim, Tetsuya Ogata, Shigeki Sugano

Robotics and Autonomous Systems 91 115 - 127 2017年05月 [査読有り]

　概要を見る

We propose a tool-body assimilation model that considers grasping during motor babbling for using tools. A robot with tool-use skills can be useful in human–robot symbiosis because this allows the robot to expand its task performing abilities. Past studies that included tool-body assimilation approaches were mainly focused on obtaining the functions of the tools, and demonstrated the robot starting its motions with a tool pre-attached to the robot. This implies that the robot would not be able to decide whether and where to grasp the tool. In real life environments, robots would need to consider the possibilities of tool-grasping positions, and then grasp the tool. To address these issues, the robot performs motor babbling by grasping and nongrasping the tools to learn the robot's body model and tool functions. In addition, the robot grasps various parts of the tools to learn different tool functions from different grasping positions. The motion experiences are learned using deep learning. In model evaluation, the robot manipulates an object task without tools, and with several tools of different shapes. The robot generates motions after being shown the initial state and a target image, by deciding whether and where to grasp the tool. Therefore, the robot is capable of generating the correct motion and grasping decision when the initial state and a target image are provided to the robot.

DOI

Scopus

43

被引用数

(Scopus)
Toward abstraction from multi-modal data: Empirical studies on multiple time-scale recurrent models

Junpei Zhong, Angelo Cangelosi, Tetsuya Ogata

2017 International Joint Conference on Neural Networks (IJCNN) 2017年05月 [査読有り]

担当区分：最終著者

DOI
Repeatable Folding Task by Humanoid Robot Worker Using Deep Learning

Yang, P.-C., Sasaki, K., Suzuki, K., Kase, K., Sugano, S., Ogata, T.

IEEE Robotics and Automation Letters 2 ( 2 ) 397 - 403 2017年04月 [査読有り]

担当区分：最終著者

DOI

Scopus

172

被引用数

(Scopus)
Learning to Perceive the World as Probabilistic or Deterministic via Interaction with Others: A Neuro-Robotics Experiment

Shingo Murata, Yuichi Yamashita, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano, Jun Tani

IEEE Transactions on Neural Networks and Learning Systems 28 ( 4 ) 830 - 848 2017年04月 [査読有り]

　概要を見る

We suggest that different behavior generation schemes, such as sensory reflex behavior and intentional proactive behavior, can be developed by a newly proposed dynamic neural network model, named stochastic multiple timescale recurrent neural network (S-MTRNN). The model learns to predict subsequent sensory inputs, generating both their means and their uncertainty levels in terms of variance (or inverse precision) by utilizing its multiple timescale property. This model was employed in robotics learning experiments in which one robot controlled by the S-MTRNN was required to interact with another robot under the condition of uncertainty about the other's behavior. The experimental results show that self-organized and sensory reflex behavior-based on probabilistic prediction-emerges when learning proceeds without a precise specification of initial conditions. In contrast, intentional proactive behavior with deterministic predictions emerges when precise initial conditions are available. The results also showed that, in situations where unanticipated behavior of the other robot was perceived, the behavioral context was revised adequately by adaptation of the internal neural dynamics to respond to sensory inputs during sensory reflex behavior generation. On the other hand, during intentional proactive behavior generation, an error regression scheme by which the internal neural activity was modified in the direction of minimizing prediction errors was needed for adequately revising the behavioral context. These results indicate that two different ways of treating uncertainty about perceptual events in learning, namely, probabilistic modeling and deterministic modeling, contribute to the development of different dynamic neuronal structures governing the two types of behavior generation schemes.

DOI PubMed

Scopus

37

被引用数

(Scopus)
Emergence of interactive behaviors between two robots by prediction error minimization mechanism

Yiwen Chen, Shingo Murata, Hiroaki Arie, Tetsuya Ogata, Jun Tani, Shigeki Sugano

2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2016 302 - 307 2017年02月 [査読有り]

　概要を見る

This study demonstrates that the prediction error minimization (PEM) mechanism can account for the emergence of reciprocal interaction between two cognitive agents. During interactive processes, alternation of forming and deforming interactions may be triggered by various internal and external causes. We focus in particular on external causes derived from a dynamic and uncertain environment. Two small humanoid robots controlled by an identical dynamic neural network model using the PEM mechanism were trained to achieve a set of coherent ball-playing interactions between them. The two robots predict each other in a top-down way while they try to minimize the prediction errors derived from the unstable ball dynamics or the external cause in a bottom-up way by using the PEM mechanism. The experimental results showed that switching among the set of trained interactive ball plays between the two robots appears spontaneously. The analysis clarified how each complementary behavior can be generated via mutual adaptation between the two robots by undertaking top-down and bottom-up interaction in each individual dynamic neural network model by using the PEM mechanism.

DOI

Scopus

7

被引用数

(Scopus)
Analysis of imitative interactions between humans and a robot with a neuro-dynamical system

Shingo Murata, Kai Hirano, Hiroaki Arie, Shigeki Sugano, Tetsuya Ogata

SII 2016 - 2016 IEEE/SICE International Symposium on System Integration 343 - 348 2017年02月 [査読有り]

担当区分：最終著者

　概要を見る

Human communicative behavior is both dynamic and bidirectional. This study aims to analyze such behavior by conducting imitative interactions between human subjects and a humanoid robot that has a neuro-dynamical system. For this purpose, we take a robot-centered approach in which the change in robot performance according to difference in human partner is analyzed, rather than adopting the typical human-centered approach. A small humanoid robot equipped with a neuro-dynamical system learns imitative arm movement patterns and interacts with humans after the learning process. We analyze the interactive phenomena by different methods, including principal component analysis and use of a recurrence plot. Through this analysis, we demonstrate that different classes of interactions can be observed in the contextual dynamics of the neuro-dynamical system.

DOI

Scopus

1

被引用数

(Scopus)
Sound source localization using deep learning models

Nelson Yalta, Kazuhiro Nakadai, Tetsuya Ogata

Journal of Robotics and Mechatronics 29 ( 1 ) 37 - 48 2017年02月 [査読有り]

担当区分：最終著者

　概要を見る

This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we employ deep residual networks, which have recently shown remarkable performance in image classification tasks even when the training period is shorter than that of other models. Deep residual networks are used to process audio input similar to multiple signal classification (MUSIC) methods. We show that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.

DOI

Scopus

107

被引用数

(Scopus)
手描きスケッチを扱う深層学習モデル (Imaging Today 人工知能における学習技術)

佐々木一磨, 尾形哲也

日本画像学会誌 = Journal of the Imaging Society of Japan 56 ( 2 ) 177 - 186 2017年

CiNii
ロボットミドルウェアにおける物体認識・リーチングのためのフレームワークの提案

太田博己, 安里太緒, 菅佑樹, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2017 2A2 - K11 2017年

　概要を見る

In this study, we propose a framework for object recognition and reaching motion in robot systems with robot middleware. There exist many elemental technologies for robot manipulation system such as object recognition and reaching. However, each technology is not systematically integrated for component-based development as open and common frameworks are undeveloped. In order to realize systematic integration of object recognition and reaching motion, we built a framework of robot middleware and proposed it as open framework. This framework is oriented reusable, exchangeable and extensible system and does not depend on specific robot middleware platforms. We implemented the proposed framework in a pick-and-place system with robot arm, and validate the system working.

DOI CiNii
ロボットシミュレーション環境構築フレームワーク「RTM-Unity Sim」の開発

大西直, 佐々木一磨, 本吉俊之, 菅佑樹, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2017 2A2 - J09 2017年

　概要を見る

In recent years, several robot simulators and open source game engines have been developed. Open source game engines such as Unity and Unreal Engine 4 help to create the simulation environment. Therefore, we implemented a simulator environment by connecting a game engine and a robot middleware. The requirements for the simulator are follows: (1) To create a simulation environment with the high degree of freedom. (2) To support the multiple OS. In this study, we selected Unity and OpenRTM-aist. To assess the machine learning performance, the training data can be collected and the learned model can be verified in the simulator environment. As an example of the implementation of the simulation environment using this framework, we developed a simulator for autonomous driving systems.

DOI CiNii
MTRNNを用いたEnd to End Learningによる移動行動学習システムの開発

日永田佑介, 伊藤洋, 山本健二郎, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2017 2A2 - D11 2017年

　概要を見る

This paper presents an autonomous mobile system using E2E learning. We examined the autonomous mobile system which could move in real environment for service robot and autonomous mobile vehicles. As a result, we proposed a system using MTRNN for E2E learning method of the autonomous mobile system. In addition, we developed the mobile robot system for the experiments. We suggested the machine learning simulator which could collect enormous learning data at low cost and developed it. By the above, we developed learning system of the autonomous mobile system by E2E learning.

DOI CiNii
神経回路モデルを用いた感覚不確実性の予測による状況変化に対する適応的行動生成

増田航, 村田真悟, 富岡咲希, 尾形哲也, 菅野重樹

ロボティクス・メカトロニクス講演会講演概要集 2017 1P2 - N08 2017年

　概要を見る

Robots working in the real environment need to respond to necessary sensory inputs. However, if the sensory inputs are not necessary for action generation, robots need to stably generate action without being affected by unnecessary sensory inputs. To realize such adaptive action generation against situational changes, robots should automatically decide how much sensory inputs are necessary for action generation. In this research, we propose a method which automatically decides the ratio of actual and predicted sensory inputs based on predicted sensory uncertainty. As a result of robot experiments, the robot with proposed mechanism could conduct adaptive action generation against situational changes.

DOI CiNii
Online Motion Generation with Sensory Information and Instructions by Hierarchical RNN.

Kanata Suzuki, Hiroki Mori, Tetsuya Ogata

CoRR abs/1712.05109 2017年
General problem solving with category theory.

Francisco J. Arjonilla, Tetsuya Ogata

CoRR abs/1709.04825 2017年
Toward Abstraction from Multi-modal Data: Empirical Studies on Multiple Time-scale Recurrent Models.

Junpei Zhong, Angelo Cangelosi, Tetsuya Ogata

CoRR abs/1702.05441 2017年
Learning of labeling room space for mobile robots based on visual motor experience

Tatsuro Yamada, Saki Ito, Hiroaki Arie, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10613 LNCS 35 - 42 2017年 [査読有り]

担当区分：最終著者

　概要を見る

A model was developed to allow a mobile robot to label the areas of a typical domestic room, using raw sequential visual and motor data, no explicit information on location was provided, and no maps were constructed. The model comprised a deep autoencoder and a recurrent neural network. The model was demonstrated to (1) learn to correctly label areas of different shapes and sizes, (2) be capable of adapting to changes in room shape and rearrangement of items in the room, and (3) attribute different labels to the same area, when approached from different angles. Analysis of the internal representations of the model showed that a topological structure corresponding to the room structure was self-organized as the trajectory of the internal activations of the network.

DOI

Scopus

2

被引用数

(Scopus)
Mixing actual and predicted sensory states based on uncertainty estimation for flexible and robust robot behavior

Shingo Murata, Wataru Masuda, Saki Tomioka, Tetsuya Ogata, Shigeki Sugano

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10613 LNCS 11 - 18 2017年 [査読有り]

　概要を見る

In this paper, we propose a method to dynamically modulate the input state of recurrent neural networks (RNNs) so as to realize flexible and robust robot behavior. We employ the so-called stochastic continuous-time RNN (S-CTRNN), which can learn to predict the mean and variance (or uncertainty) of subsequent sensorimotor information. Our proposed method uses this estimated uncertainty to determine a mixture ratio for combining actual and predicted sensory states of network input. The method is evaluated by conducting a robot learning experiment in which a robot is required to perform a sensory-dependent task and a sensory-independent task. The sensory-dependent task requires the robot to incorporate meaningful sensory information, and the sensory-independent task requires the robot to ignore irrelevant sensory information. Experimental results demonstrate that a robot controlled by our proposed method exhibits flexible and robust behavior, which results from dynamic modulation of the network input on the basis of the estimated uncertainty of actual sensory states.

DOI

Scopus

2

被引用数

(Scopus)
深層学習とマニピュレーション

尾形哲也

日本ロボット学会誌 35 ( 1 ) 28 - 31 2017年 [招待有り]

CiNii
Learning to Perceive the World as Probabilistic or Deterministic via Interaction With Others: A Neuro-Robotics Experiment.

Shingo Murata, Yuichi Yamashita, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano, Jun Tani

IEEE Trans. Neural Networks Learn. Syst. 28 ( 4 ) 830 - 848 2017年 [査読有り]

　概要を見る

We suggest that different behavior generation schemes, such as sensory reflex behavior and intentional proactive behavior, can be developed by a newly proposed dynamic neural network model, named stochastic multiple timescale recurrent neural network (S-MTRNN). The model learns to predict subsequent sensory inputs, generating both their means and their uncertainty levels in terms of variance (or inverse precision) by utilizing its multiple timescale property. This model was employed in robotics learning experiments in which one robot controlled by the S-MTRNN was required to interact with another robot under the condition of uncertainty about the other's behavior. The experimental results show that self-organized and sensory reflex behavior-based on probabilistic prediction-emerges when learning proceeds without a precise specification of initial conditions. In contrast, intentional proactive behavior with deterministic predictions emerges when precise initial conditions are available. The results also showed that, in situations where unanticipated behavior of the other robot was perceived, the behavioral context was revised adequately by adaptation of the internal neural dynamics to respond to sensory inputs during sensory reflex behavior generation. On the other hand, during intentional proactive behavior generation, an error regression scheme by which the internal neural activity was modified in the direction of minimizing prediction errors was needed for adequately revising the behavioral context. These results indicate that two different ways of treating uncertainty about perceptual events in learning, namely, probabilistic modeling and deterministic modeling, contribute to the development of different dynamic neuronal structures governing the two types of behavior generation schemes.

DOI

Scopus

37

被引用数

(Scopus)
An effective visual programming tool for learning and using robotics middleware

Nishimura Yumi, Suga Yuki, Ogata. Tetsuya

2016 IEEE/SICE International Symposium on System Integration (SII) 2016年12月 [査読有り]

担当区分：最終著者

DOI
A reusability-based hierarchical fault-detection architecture for robot middleware and its implementation in an autonomous mobile robot system

Tao Asato, Yuki Suga, Tetsuya Ogata

2016 IEEE/SICE International Symposium on System Integration (SII) 2016年12月 [査読有り]

担当区分：最終著者

DOI
Achieving Different Levels of Adaptability for Human–Robot Collaboration Utilizing a Neuro-Dynamical System

Yuxi Li, Shingo Murata, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano

Workshop on Bio-inspired Social Robot Learning in Home Scenarios, The 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016) 1 - 6 2016年10月 [査読有り]
神経回路モデルによるロボットの行動と言語の統合学習

尾形哲也

計測と制御 55 ( 10 ) 872 - 877 2016年10月 [招待有り]

DOI CiNii
Symbol emergence in robotics: A survey

Tadahiro Taniguchi, Takayuki Nagai, Tomoaki Nakamura, Naoto Iwahashi, Tetsuya Ogata, Hideki Asoh

Advanced Robotics 30 ( 11-12 ) 706 - 728 2016年06月 [査読有り]

　概要を見る

Humans can learn a language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form symbol systems and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted regarding the construction of robotic systems and machine learning methods that can learn a language through embodied multimodal interaction with their environment and other systems. Understanding human?-social interactions and developing a robot that can smoothly communicate with human users in the long term require an understanding of the dynamics of symbol systems. The embodied cognition and social interaction of participants gradually alter a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER represents a constructive approach towards a symbol emergence system. The symbol emergence system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e. humans and developmental robots. In this paper, specifically, we describe some state-of-art research topics concerning SER, such as multimodal categorization, word discovery, and double articulation analysis. They enable robots to discover words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions for research in SER.

DOI

Scopus

103

被引用数

(Scopus)
ロボティクスと深層学習(<特集>ニューラルネットワーク研究のフロンティア)

尾形哲也

人工知能:人工知能学会誌 31 ( 2 ) 210 - 215 2016年03月 [招待有り]

CiNii
CNNによる二次元物体画像から実ロボットでの把持動作生成

鈴木彼方, 新古眞純, 陽品駒, 高橋城志, 菅野重樹, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2016 2P1-12b7 2016年

DOI CiNii
編集後記

尾形哲也

計測と制御 55 ( 10 ) 908 - 908 2016年

DOI CiNii
Sensorimotor Input as a Language Generalisation Tool: A Neurorobotics Model for Generation and Generalisation of Noun-Verb Combinations with Sensorimotor Inputs.

Junpei Zhong, Martin Peniak, Jun Tani, Tetsuya Ogata, Angelo Cangelosi

CoRR abs/1605.03261 2016年
Dynamical Integration of Language and Behavior in a Recurrent Neural Network for Human–Robot Interaction

Tatsuro Yamada, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

Frontiers in Neurorobotics 10 ( JUL ) 5 - 5 2016年 [査読有り]

担当区分：最終著者

DOI

Scopus

29

被引用数

(Scopus)
Self and non-self discrimination mechanism based on predictive learning with estimation of uncertainty

Ryoichi Nakajo, Maasa Takahashi, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9950 LNCS 228 - 235 2016年 [査読有り]

担当区分：最終著者

　概要を見る

In this paper, we propose a model that can explain the mechanism of self and non-self discrimination. Infants gradually develop their abilities for self–other cognition through interaction with the environment. Predictive learning has been widely used to explain the mechanism of infants’ development. We hypothesized that infants’ cognitive abilities are developed through predictive learning and the uncertainty estimation of their sensory-motor inputs. We chose a stochastic continuous time recurrent neural network, which is a dynamical neural network model, to predict uncertainties as variances. From the perspective of cognitive developmental robotics, a predictive learning experiment with a robot was performed. The results indicate that training made the robot predict the regions related to its body more easily. We confirmed that self and non-self cognitive abilities might be acquired through predictive learning with uncertainty estimation.

DOI

Scopus

2

被引用数

(Scopus)
Classification of photo and sketch images using convolutional neural networks

Kazuma Sasaki, Madoka Yamakawa, Kana Sekiguchi, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9887 LNCS 283 - 290 2016年 [査読有り]

担当区分：最終著者

　概要を見る

Content-Based Image Retrieval (CBIR) system enables us to access images using only images as queries, instead of keywords. Photorealistic images, and hand-drawn sketch image can be used as a queries as well. Recently, convolutional neural networks (CNNs) are used to discriminate images including sketches. However, the tasks are limited to classifying only one type of images, either photo or sketch images, due to the lack of a large dataset of sketch images and the large difference of their visual characteristics. In this paper, we introduce a simple way to prepare training datasets, which can enable the CNN model to classify both types of images by color transforming photo and illustration images. Through the training experiment, we show that the proposed method contributes to the improvement of classification accuracy.

DOI

Scopus

3

被引用数

(Scopus)
Body model transition by tool grasping during motor babbling using deep learning and RNN

Kuniyuki Takahashi, Hadi Tjandra, Tetsuya Ogata, Shigeki Sugano

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9886 LNCS 166 - 174 2016年 [査読有り]

　概要を見る

We propose a method of tool use considering the transition process of a body model from not grasping to grasping a tool using a single model. In our previous research, we proposed a tool-body assimilation model in which a robot autonomously learns tool functions using a deep neural network (DNN) and recurrent neural network (RNN) through experiences of motor babbling. However, the robot started its motion already holding the tools. In real-life situations, the robot would make decisions regarding grasping (handling) or not grasping (manipulating) a tool. To achieve this, the robot performs motor babbling without the tool pre-attached to the hand with the same motion twice, in which the robot handles the tool or manipulates without graping it. To evaluate the model, we have the robot generate motions with showing the initial and target states. As a result, the robot could generate the correct motions with grasping decisions.

DOI

Scopus
Dynamical linking of positive and negative sentences to goal-oriented robot behavior by hierarchical RNN

Tatsuro Yamada, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9886 LNCS 339 - 346 2016年 [査読有り]

担当区分：最終著者

　概要を見る

Meanings of language expressions are constructed not only from words grounded in real-world matters, but also from words such as “not” that participate in the construction by working as logical operators. This study proposes a connectionist method for learning and internally representing functions that deal with both of these word groups, and grounding sentences constructed from them in corresponding behaviors just by experiencing raw sequential data of an imposed task. In the experiment, a robot implemented with a recurrent neural network is required to ground imperative positive and negative sentences given as a sequence of words in corresponding goal-oriented behavior. Analysis of the internal representations reveals that the network fulfilled the requirement by extracting XOR problems implicitly included in the target sequences and solving them by learning to represent the logical operations in its nonlinear dynamics in a self-organizing manner.

DOI

Scopus

2

被引用数

(Scopus)
Visual motor integration of robot's drawing behavior using recurrent neural network.

Kazuma Sasaki, Kuniaki Noda, Tetsuya Ogata

Robotics Auton. Syst. 86 184 - 195 2016年 [査読有り]

担当区分：最終著者

　概要を見る

Drawing is a way of visually expressing our feelings, knowledge, and situation. People draw pictures to share information with other human beings. This study investigates visuomotor memory (VM), which is a reusable memory storing drawing behavioral data. We propose a neural network-based model for acquiring a computational memory that can replicate VM through self-organized learning of a robot's actual drawing experiences. To design the model, we assume that VM has the following two characteristics: (1) it is formed by bottom-up learning and integration of temporal drawn pictures and motion data, and (2) it allows the observers to associate drawing motions from pictures. The proposed model comprises a deep neural network for dimensionally compressing temporal drawn images and a continuous-time recurrent neural network for integration learning of drawing motions and temporal drawn images. Two experiments are conducted on unicursal shape learning to investigate whether the proposed model can learn the function without any shape information for visual processing. Based on the first experiment, the model can learn 15 drawing sequences for three types of pictures, acquiring associative memory for drawing motions through the bottom-up learning process. Thus, it can associate drawing motions from untrained drawn images. In the second experiment, four types of pictures are trained, with four distorted variations per type. In this case, the model can organize the different shapes based on their distortions by utilizing both the image information and the drawing motions, even if visual characteristics are not shared. (C) 2016 The Authors. Published by Elsevier B.V.

DOI

Scopus

27

被引用数

(Scopus)
Sound source separation for robot audition using deep learning

Kuniaki Noda, Naoya Hashimoto, Kazuhiro Nakadai, Tetsuya Ogata

IEEE-RAS International Conference on Humanoid Robots 2015-December 389 - 394 2015年12月 [査読有り]

担当区分：最終著者

　概要を見る

Noise robust speech recognition is crucial for effective human-machine interaction in real-world environments. Sound source separation (SSS) is one of the most widely used approaches for addressing noise robust speech recognition by extracting a target speaker's speech signal while suppressing simultaneous unintended signals. However, conventional SSS algorithms, such as independent component analysis or nonlinear principal component analysis, are limited in modeling complex projections with scalability. Moreover, conventional systems required designing an independent subsystem for noise reduction (NR) in addition to the SSS. To overcome these issues, we propose a deep neural network (DNN) framework for modeling the separation function (SF) of an SSS system. By training a DNN to predict clean sound features of a target sound from corresponding multichannel deteriorated sound feature inputs, we enable the DNN to model the SF for extracting the target sound without prior knowledge regarding the acoustic properties of the surrounding environment. Moreover, the same DNN is trained to function simultaneously as a NR filter. Our proposed SSS system is evaluated using an isolated word recognition task and a large vocabulary continuous speech recognition task when either nondirectional or directional noise is accumulated in the target speech. Our evaluation results demonstrate that DNN performs noticeably better than the baseline approach, especially when directional noise is accumulated with a low signal-to-noise ratio.

DOI

Scopus

12

被引用数

(Scopus)
記号創発ロボティクスによる人間機械コラボレーション基盤創成 (クラウドネットワークロボット)

長井隆行, 谷口忠大, 尾形哲也

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115 ( 375 ) 23 - 27 2015年12月

CiNii
Effective motion learning for a flexible-joint robot using motor babbling

Kuniyuki Takahashi, Tetsuya Ogata, Hiroki Yamada, Hadi Tjandra, Shigeki Sugano

IEEE International Conference on Intelligent Robots and Systems 2015-December 2723 - 2728 2015年12月 [査読有り]

　概要を見る

We propose a method for realizing effective dynamic motion learning in a flexible-joint robot using motor babbling. Flexible-joint robots have recently attracted attention because of their adaptiveness, safety, and, in particular, dynamic motions. It is difficult to control robots that require dynamic motion. In past studies, attractors and oscillators were designed as motion primitives of an assumed task in advance. However, it is difficult to adapt to unintended environmental changes using such methods. To overcome this problem, we use a recurrent neural network (RNN) that does not require predetermined parameters. In this research, we propose a method for facilitating effective learning. First, a robot learns simple motions via motor babbling, acquiring body dynamics using a recurrent neural network (RNN). Motor babbling is the process of movement that infants use to acquire their own body dynamics during their early days. Next, the robot learns additional motions required for a target task using the acquired body dynamics. For acquiring these body dynamics, the robot uses motor babbling with its redundant flexible joints to learn motion primitives. This redundancy implies that there are numerous possible motion patterns. In comparison to a basic learning task, the motion primitives are simply modified to adjust to the task. Next, we focus on the types of motions used in motor babbling. We classify the motions into two motion types, passive motion and active motion. Passive motion involves inertia without any torque input, whereas active motion involves a torque input. The robot acquires body dynamics from the passive motion and a means of torque generation from the active motion. As a result, we demonstrate the importance of performing prior learning via motor babbling before learning a task. In addition, task learning is made more efficient by dividing the motion into two types of motor babbling patterns.

DOI

Scopus

8

被引用数

(Scopus)
Neural network based model for visual-motor integration learning of robot's drawing behavior: Association of a drawing motion from a drawn image

Kazuma Sasaki, Hadi Tjandra, Kuniaki Noda, Kuniyuki Takahashi, Tetsuya Ogata

IEEE International Conference on Intelligent Robots and Systems 2015-December 2736 - 2741 2015年12月 [査読有り]

担当区分：最終著者

　概要を見る

In this study, we propose a neural network based model for learning a robot's drawing sequences in an unsupervised manner. We focus on the ability to learn visual-motor relationships, which can work as a reusable memory in association of drawing motion from a picture image. Assuming that a humanoid robot can draw a shape on a pen tablet, the proposed model learns drawing sequences, which comprises drawing motion and drawn picture image frames. To learn raw pixel data without any given specific features, we utilized a deep neural network for compressing large dimensional picture images and a continuous time recurrent neural network for integration of motion and picture images. To confirm the ability of the proposed model, we performed an experiment for learning 15 sequences comprising three types of shapes. The model successfully learns all the sequences and can associate a drawing motion from a not trained picture image and a trained picture with similar success. We also show that the proposed model self-organizes its behavior according to types shapes.

DOI

Scopus

13

被引用数

(Scopus)
Attractor representations of language-behavior structure in a recurrent neural network for human-robot interaction

Tatsuro Yamada, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

IEEE International Conference on Intelligent Robots and Systems 2015-December 4179 - 4184 2015年12月 [査読有り]

担当区分：最終著者

　概要を見る

In recent years there has been increased interest in studies that explore integrative learning of language and other modalities by using neural network models. However, for practical application to human-robot interaction, the acquired semantic structure between language and meaning has to be available immediately and repeatably whenever necessary, just as in everyday communication. As a solution to this problem, this study proposes a method in which a recurrent neural network self-organizes cyclic attractors that reflect semantic structure and represent interaction flows in its internal dynamics. To evaluate this method we design a simple task in which a human verbally directs a robot, which responds appropriately. Training the network with training data that represent the interaction series, the cyclic attractors that reflect the semantic structure is self-organized. The network first receives a verbal direction, and its internal state moves according to the first half of the cyclic attractors with branch structures corresponding to semantics. After that, the internal state reaches a potential to generate appropriate behavior. Finally, the internal state moves to the second half and converges on the initial point of the cycle while generating the appropriate behavior. By self-organizing such an internal structure in its forward dynamics, the model achieves immediate and repeatable response to linguistic directions. Furthermore, the network self-organizes a fixed-point attractor, and so able to wait for directions. It can thus repeat the interaction flexibly without explicit turn-taking signs.

DOI

Scopus

5

被引用数

(Scopus)
Acquisition of viewpoint representation in imitative learning from own sensory-motor experiences

Ryoichi Nakajo, Shingo Murata, Hiroaki Arie, Tetsuya Ogata

5th Joint International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2015 326 - 331 2015年12月 [査読有り]

担当区分：最終著者

　概要を見る

This paper introduces an imitative model that enables a robot to acquire viewpoints of the self and others from its own sensory-motor experiences. This is important for recognizing and imitating actions generated from various directions. Existing methods require coordinate transformations input by human designers or complex learning modules to acquire a viewpoint. In the proposed model, several neurons dedicated to generated actions and viewpoints of the self and others are added to a dynamic nueral network model reffered as continuous time recurrent neural network (CTRNN). The training data are labeled with types of actions and viewpoints, and are linked to each internal state. We implemented this model in a robot and trained the model to perform actions of object manipulation. Representations of behavior and viewpoint were formed in the internal states of the CTRNN. In addition, we analyzed the initial values of the internal states that represent the viewpoint information. We confirmed the distinction of the observational perspective of other's actions self-organized in the space of the initial values. Combining the initial values of the internal states that describe the behavior and the viewpoint, the system can generate unlearned data.

DOI

Scopus

7

被引用数

(Scopus)
Predictive learning with uncertainty estimation for modeling infants' cognitive development with caregivers: A neurorobotics experiment

Shingo Murata, Saki Tomioka, Ryoichi Nakajo, Tatsuro Yamada, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano

5th Joint International Conference on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2015 302 - 307 2015年12月 [査読有り]

　概要を見る

Dynamic interactions with caregivers are essential for infants to develop cognitive abilities, including aspects of action, perception, and attention. We hypothesized that these abilities can be acquired through the predictive learning of sensory inputs including their uncertainty (inverse precision) in terms of variance. To examine our hypothesis from the perspective of cognitive developmental robotics, we conducted a neurorobotics experiment involving a ball-playing interaction task between a human experimenter representing a caregiver and a small humanoid robot representing an infant. The robot was equipped with a dynamic generative model called a stochastic continuous-time recurrent neural network (S-CTRNN). The S-CTRNN learned to generate predictions about both the visuo-proprioceptive states of the robot and the uncertainty of these states by minimizing a negative log-likelihood consisting of log-uncertainty and precision-weighted prediction error. The experimental results showed that predictive learning with uncertainty estimation enabled the robot to acquire infant-like cognitive abilities through dynamic interactions with the experimenter. We also discuss the effects of infant-directed modifications observed in caregiver-infant interactions on the development of these abilities.

DOI

Scopus

3

被引用数

(Scopus)
Audio-visual speech recognition using deep learning

Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata

Applied Intelligence 42 ( 4 ) 722 - 737 2015年06月 [査読有り]

担当区分：最終著者

　概要を見る

Audio-visual speech recognition (AVSR) system is thought to be one of the most promising solutions for reliable speech recognition, particularly when the audio is corrupted by noise. However, cautious selection of sensory features is crucial for attaining high recognition performance. In the machine-learning community, deep learning approaches have recently attracted increasing attention because deep neural networks can effectively extract robust latent features that enable various recognition algorithms to demonstrate revolutionary generalization capabilities under diverse application conditions. This study introduces a connectionist-hidden Markov model (HMM) system for noise-robust AVSR. First, a deep denoising autoencoder is utilized for acquiring noise-robust audio features. By preparing the training data for the network with pairs of consecutive multiple steps of deteriorated audio features and the corresponding clean features, the network is trained to output denoised audio features from the corresponding features deteriorated by noise. Second, a convolutional neural network (CNN) is utilized to extract visual features from raw mouth area images. By preparing the training data for the CNN as pairs of raw images and the corresponding phoneme label outputs, the network is trained to predict phoneme labels from the corresponding mouth area input images. Finally, a multi-stream HMM (MSHMM) is applied for integrating the acquired audio and visual HMMs independently trained with the respective features. By comparing the cases when normal and denoised mel-frequency cepstral coefficients (MFCCs) are utilized as audio features to the HMM, our unimodal isolated word recognition results demonstrate that approximately 65 % word recognition rate gain is attained with denoised MFCCs under 10 dB signal-to-noise-ratio (SNR) for the audio signal input. Moreover, our multimodal isolated word recognition results utilizing MSHMM with denoised MFCCs and acquired visual features demonstrate that an additional word recognition rate gain is attained for the SNR conditions below 10 dB.

DOI

Scopus

523

被引用数

(Scopus)
2P1-S06 再帰型神経回路モデルによる分散予測を用いた柔軟関節ロボットの身体ダイナミクスの探索

鈴木彼方, 高橋城志, Hadi Tjandra, 村田真悟, 菅野重樹, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2015 "2P1 - S06(1)"-"2P1-S06(2)" 2015年05月

　概要を見る

We propose an exploratory motor babbling utilizing variance prediction of recurrent neural network as a method to explore body dynamics of robot with flexible joints. With conventional research methods, practical use with real robots is difficult because of the large numbers of required motor babbling motions. With the proposed method, over-fitting of predictable motions is reduced by sequentially learning appropriate motion for body model from unpredictable motions. To evaluate the method, an experiment where the robot additionally learns crank turning task after the exploration of body dynamics were conducted. The results show that the proposed method is capable of efficient motion generation in any given motion task.

CiNii
1A1-K01 RTミドルウエアのためのRTC・RTSリポジトリ管理フレームワークwasanbon : リポジトリ情報収集機能とRTCのビルド管理システムの開発

菅佑樹, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2015 "1A1 - K01(1)"-"1A1-K01(2)" 2015年05月

　概要を見る

We are currently developing an open-framework for RT-middleware which advances the reusability of RT-component and RT-system. Using the RT-middleware, though the softwares of the robotics elements like actuators, controllers, and sensors, are encapsulated into RT-component which can be easily reused, the way how to collect the source-codes of RTCs is not well-discussed yet. In our open-framework, both a repository management and a build support system are implemented. In this paper, the RTC source-code build-status management system which uses both the repository management and the build support system is shown.

CiNii
1A1-J02 自律移動ロボットフレームワークの提案とRTミドルウェアへの実装

安里太緒, 菅佑樹, 尾形哲也

ロボティクス・メカトロニクス講演会講演概要集 2015 "1A1 - J02(1)"-"1A1-J02(2)" 2015年05月

　概要を見る

In this paper, we propose a system framework of a mapping and navigation robot which does not depend on specific robot middleware platforms. Recently, robot middleware have attracted attention widely because that is expected to reduce the cost of robot development. However, many robot middleware platforms have been developed, so it must to define a common framework to design robot system without depending on specific platform. Therefore, we propose an autonomous mobile robot framework which independent on specific robot middleware platforms, hi addition, this model are implemented in RT-Middleware and experimentally evaluated.

CiNii
Preferential training of neurodynamical model based on predictability of target dynamics

Shun Nishide, Harumitsu Nobuta, Hiroshi G. Okuno, Tetsuya Ogata

Advanced Robotics 29 ( 9 ) 587 - 596 2015年05月 [査読有り]

担当区分：最終著者

　概要を見る

Intrinsic motivation is one of the keys in implementing the mechanism of interest to robots. In this paper, we present a method to apply intrinsic motivation in dynamics learning with predictable and unpredictable targets in view. The robots arm is used for the predictable target and the humans arm is used for the unpredictable target in the experiment. The learning algorithm based on intrinsic motivation will automatically set a larger weight to targets that would contribute to decreasing the training error, while setting a smaller weight to others. A neurodynamical model, namely multiple timescales recurrent neural network (MTRNN), is utilized for studying the robots arm/external object dynamics. Training of MTRNN is done using the back propagation through time (BPTT) algorithm. We modify the BPTT algorithm by the following two steps. (1) Evaluate predictability of robots arm/objects using training error of MTRNN. (2) Assign a preference ratio, which represents the weight of the training, to each object based on predictability. The proposed training method would focus more on reducing training error of predictable objects compared to normal BPTT, where training error is equally treated for every object. Experiments were conducted using an actual robot platform, moving the robots arm while a human moves his arm in the robots camera view. The results of the experiment showed that the proposed training method could achieve smaller training error of the robots arm visuomotor dynamics, which is predictable from the robots motor command, compared to general training with BPTT. Evaluation of MTRNN as a forward model to predict untrained data showed that the proposed model is capable of predicting the robots hand motion, specifically with larger number of nodes.

DOI

Scopus

1

被引用数

(Scopus)
Tool-body assimilation model based on body babbling and neurodynamical system

Kuniyuki Takahashi, Tetsuya Ogata, Hadi Tjandra, Yuki Yamaguchi, Shigeki Sugano

Mathematical Problems in Engineering 2015 2015年02月 [査読有り]

　概要を見る

We propose the new method of tool use with a tool-body assimilation model based on body babbling and a neurodynamical system for robots to use tools. Almost all existing studies for robots to use tools require predetermined motions and tool features; the motion patterns are limited and the robots cannot use novel tools. Other studies fully search for all available parameters for novel tools, but this leads to massive amounts of calculations. To solve these problems, we took the following approach: we used a humanoid robot model to generate random motions based on human body babbling. These rich motion experiences were used to train recurrent and deep neural networks for modeling a body image. Tool features were self-organized in parametric bias, modulating the body image according to the tool in use. Finally, we designed a neural network for the robot to generate motion only from the target image. Experiments were conducted with multiple tools for manipulating a cylindrical target object. The results show that the tool-body assimilation model is capable of motion generation.

DOI

Scopus

10

被引用数

(Scopus)
Tactile object recognition using deep learning and dropout

Alexander Schmitz, Yusuke Bansho, Kuniaki Noda, Hiroyasu Iwata, Tetsuya Ogata, Shigeki Sugano

IEEE-RAS International Conference on Humanoid Robots 2015-February 1044 - 1050 2015年02月 [査読有り]

　概要を見る

Recognizing grasped objects with tactile sensors is beneficial in many situations, as other sensor information like vision is not always reliable. In this paper, we aim for multimodal object recognition by power grasping of objects with an unknown orientation and position relation to the hand. Few robots have the necessary tactile sensors to reliably recognize objects: in this study the multifingered hand of TWENDY-ONE is used, which has distributed skin sensors covering most of the hand, 6 axis F/T sensors in each fingertip, and provides information about the joint angles. Moreover, the hand is compliant. When using tactile sensors, it is not clear what kinds of features are useful for object recognition. Recently, deep learning has shown promising results. Nevertheless, deep learning has rarely been used in robotics and to our best knowledge never for tactile sensing, probably because it is difficult to gather many samples with tactile sensors. Our results show a clear improvement when using a denoising autoencoder with dropout compared to traditional neural networks. Nevertheless, a higher number of layers did not prove to be beneficial.

DOI

Scopus

86

被引用数

(Scopus)
Special Issue on Cutting Edge of Robotics in Japan 2015

Tetsuya Ogata

Advanced Robotics 29 ( 1 ) 1 - 1 2015年01月

DOI

Scopus
Symbol Emergence in Robotics: A Survey.

Tadahiro Taniguchi, Takayuki Nagai, Tomoaki Nakamura, Naoto Iwahashi, Tetsuya Ogata, Hideki Asoh

CoRR abs/1509.08973 2015年
Efficient motor babbling using variance predictions from a recurrent neural network

Kuniyuki Takahashi, Kanata Suzuki, Tetsuya Ogata, Hadi Tjandra, Shigeki Sugano

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9491 26 - 33 2015年 [査読有り]

　概要を見る

We propose an exploratory form of motor babbling that uses variance predictions from a recurrent neural network as a method to acquire the body dynamics of a robot with flexible joints. In conventional research methods, it is difficult to construct real robots because of the large number of motor babbling motions required. In motor babbling, different motions may be easy or difficult to predict. The variance is large in difficult-to-predict motions, whereas the variance is small in easy-topredict motions. We use a Stochastic Continuous Timescale Recurrent Neural Network to predict the accuracy and variance of motions. Using the proposed method, a robot can explore motions based on variance. To evaluate the proposed method, experiments were conducted in which the robot learns crank turning and door opening/closing tasks after exploring its body dynamics. The results show that the proposed method is capable of efficient motion generation for any given motion tasks.

DOI

Scopus

1

被引用数

(Scopus)
Generation of sensory reflex behavior versus intentional proactive behavior in robot learning of cooperative interactions with others

Shingo Murata, Yuichi Yamashita, Hiroaki Arie, Tetsuya Ogata, Jun Tani, Shigeki Sugano

IEEE ICDL-EPIROB 2014 - 4th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics 242 - 248 2014年12月 [査読有り]

　概要を見る

This paper investigates the essential difference between two types of behavior generation schemes, namely, sensory reflex behavior generation and intentional proactive behavior generation, by proposing a dynamic neural network model referred to as stochastic multiple-timescale recurrent neural network (S-MTRNN). The proposed model was employed in an experiment involving robots learning to cooperate with others under the condition of potential unpredictability of the others' behaviors. The results of the learning experiment showed that sensory reflex behavior was generated by a self-organizing probabilistic prediction mechanism when the initial sensitivity characteristics in the network dynamics were not utilized in the learning process. In contrast, proactive behavior with a deterministic prediction mechanism was developed when the initial sensitivity was utilized. It was further shown that in situations where unexpected behaviors of others were observed, the behavioral context was re-situated by adaptation of the internal neural dynamics by means of simple sensory reflexes in the former case. In the latter case, the behavioral context was re-situated by error regression of the internal neural activity rather than by sensory reflex. The role of the top-down and bottom-up interactions in dealing with unexpected situations is discussed.

DOI

Scopus

4

被引用数

(Scopus)
Insertion of Pause in Drawing from Babbling for Robot’s Developmental Imitation Learning.

Shun Nishide, Keita Mochizuki, Hiroshi G. Okuno, Tetsuya Ogata

Proceedings of 2014 IEEE International Conference on Robots and Automation (ICRA 2014) 4785 - 4791 2014年09月 [査読有り]

担当区分：最終著者

DOI

Scopus

15

被引用数

(Scopus)
人間の描画発達に基づくロボットの描画模倣学習モデルの構築

西出俊, 望月敬太, 奥乃博, 尾形哲也

日本ロボット学会学術講演会予稿集(CD-ROM) 32nd ROMBUNNO.2I2-04 2014年09月

J-GLOBAL
Learning to generate proactive and reactive behavior using a dynamic neural network model with time-varying variance prediction mechanism

Shingo Murata, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano, Jun Tani

Advanced Robotics 28 ( 17 ) 1189 - 1203 2014年09月 [査読有り]

　概要を見る

This paper discusses a possible neurodynamic mechanism that enables self-organization of two basic behavioral modes, namely a proactive mode and a reactive mode, and of autonomous switching between these modes depending on the situation. In the proactive mode, actions are generated based on an internal prediction, whereas in the reactive mode actions are generated in response to sensory inputs in unpredictable situations. In order to investigate how these two behavioral modes can be self-organized and how autonomous switching between the two modes can be achieved, we conducted neurorobotics experiments by using our recently developed dynamic neural network model that has a capability to learn to predict time-varying variance of the observable variables. In a set of robot experiments under various conditions, the robot was required to imitate others movements consisting of alternating predictable and unpredictable patterns. The experimental results showed that the robot controlled by the neural network model was able to proactively imitate predictable patterns and reactively follow unpredictable patterns by autonomously switching its behavioral modes. Our analysis revealed that variance prediction mechanism can lead to self-organization of these abilities with sufficient robustness and generalization capabilities. © 2014 © 2014 Taylor & Francis and The Robotics Society of Japan.

DOI

Scopus

9

被引用数

(Scopus)
Deep neural network を用いたヒューマノイドロボットの適応的行動選択

野田邦昭, 有江浩明, 菅佑樹, 尾形哲也

，GPU Technology Conference Japan 2014年07月
Multimodal integration learning of robot behavior using deep neural networks

Kuniaki Noda, Hiroaki Arie, Yuki Suga, Tetsuya Ogata

Robotics and Autonomous Systems 62 ( 6 ) 721 - 736 2014年06月 [査読有り]

担当区分：最終著者

　概要を見る

For humans to accurately understand the world around them, multimodal integration is essential because it enhances perceptual precision and reduces ambiguity. Computational models replicating such human ability may contribute to the practical use of robots in daily human living environments; however, primarily because of scalability problems that conventional machine learning algorithms suffer from, sensory-motor information processing in robotic applications has typically been achieved via modal-dependent processes. In this paper, we propose a novel computational framework enabling the integration of sensory-motor time-series data and the self-organization of multimodal fused representations based on a deep learning approach. To evaluate our proposed model, we conducted two behavior-learning experiments utilizing a humanoid robot; the experiments consisted of object manipulation and bell-ringing tasks. From our experimental results, we show that large amounts of sensory-motor information, including raw RGB images, sound spectrums, and joint angles, are directly fused to generate higher-level multimodal representations. Further, we demonstrated that our proposed framework realizes the following three functions: (1) cross-modal memory retrieval utilizing the information complementation capability of the deep autoencoder; (2) noise-robust behavior recognition utilizing the generalization capability of multimodal features; and (3) multimodal causality acquisition and sensory-motor prediction based on the acquired causality. © 2014 Elsevier B.V. All rights reserved.

DOI

Scopus

150

被引用数

(Scopus)
身体バブリングと再帰結合型神経回路モデルによる道具身体化〜深層学習による画像特徴量抽出〜

高橋城志, 尾形哲也, Hadi Tjandra, 野田邦昭, 村田真悟, 有江浩明, 菅野重樹

第28回人工知能学会全国大会 1I4-OS-09a-4 2014年05月
予測精度の予測に基づいた能動的・受動的な適応行動の生成学習

村田真悟, 山下祐一, 有江浩明, 尾形哲也, 谷淳, 菅野重樹

第28回人工知能学会全国大会 2K4-OS-04a-3 2014年05月
Deep neural network による映像・音響・運動データの統合と共起

野田邦昭, 有江浩明, 菅佑樹, 尾形哲也

第28回人工知能学会全国大会 3H4-OS-24b-3 2014年05月
異なる神経メカニズムによる能動的・受動的行動の選択

村田真悟, 山下祐一, 有江浩明, 尾形哲也, 谷淳, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 3P2-Q03 2014年05月
神経回路モデルと身体バブリングによる道具身体化と道具機能の獲得

高橋城志, 尾形哲也, TjandraHadi, 野田邦昭, 村田真悟, 有江浩明, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 3P2-P02 2014年05月
Deep neural network を用いた感覚運動統合メカニズムによるヒューマノイドロボットの物体操作行動認識

野田邦昭, 有江浩明, 菅佑樹, 尾形哲也

日本機械学会ロボティクスメカトロニクス講演会 3P2-P03 2014年05月
神経力学モデルと身体バブリングに基づく道具身体化と動作生成

Hadi Tjandra, 高橋城志, 村田真悟, 有江浩明, 山口雄紀, 尾形哲也, 菅野重樹

情報処理学会第76回全国大会 2014 ( 1 ) 1S - 4 2014年03月

　概要を見る

神経回路モデルを用いた道具身体化と動作の獲得を提案する.従来の道具使用モデルでは，道具の特徴量，身体モデル，動作を設計する必要があった．我々は，事前設計を必要としない手法を神経回路モデルで行う．道具の特徴量は，Self Organized Mapを用いて画像から自己組織的に獲得する．動作については，人間の発達過程にみられる身体バブリングを参考に学習させる．具体的には，ランダム動作中の感覚運動データから，神経回路モデルMultipleTime-scales Neural Network(MTRNN)に自己の身体モデルを獲得させる．道具使用時は学習済みのMTRNNにParametric Biasを付加することで獲得した身体モデルを変更することなく動作生成を実現する．実験はシミュレーション上で, 素手による物体操作で身体モデルの学習後，T字型の道具による物体操作を行った.結果として, 目標画像を与えることでその目標状態に近くなるように引き寄せ動作を生成することが確認できた．

CiNii
ロボットによる描画運動発達モデルと軌道の重み付き区間認識・学習を利用した精度向上

望月敬太, 西出俊, 奥乃博, 尾形哲也

情報処理学会第76回全国大会 3C - 5 2014年03月
S-CTRNNを用いた複数時系列パターンの記憶学習

村田真悟, 有江浩明, 尾形哲也, 谷淳, 菅野重樹

情報処理学会第76回全国大会 3C - 6 2014年03月
Deep Neural Networkを用いたマルチモーダル音声認識の為の特徴量学習

山口雄紀, 野田邦昭, 中臺一博, 奥乃博, 尾形哲也

情報処理学会第76回全国大会 5S - 3 2014年03月
Tool-body assimilation model using a neuro-dynamical system for acquiring representation of tool function and motion.

Kuniyuki Takahashi, Tetsuya Ogata, Hadi Tjandra, Yuki Yamaguchi, Yuki Suga, Shigeki Sugano

IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM ( PM2-3 ) 1255 - 1260 2014年

　概要を見る

In this paper, we propose a tool-body assimilation model that implements a multiple time-scales recurrent neural network (MTRNN). Our model allows a robot to acquire the representation of a tool function and the required motion without having any prior knowledge of the tool. It is composed of five modules: image feature extraction, body model, tool dynamics feature, tool recognition, and motion recognition. Self-organizing maps (SOM) are used for image feature extraction from raw images. The MTRNN is used for body model learning. Parametric bias (PB) nodes are used to learn tool dynamic features. The PB nodes are attached to the neurons of the MTRNN to modulate the body model. A hierarchical neural network (HNN) is implemented for tool and motion recognition. Experiments were conducted using OpenHRP3, a robotics simulator, with multiple tools. The results show that the tool-body assimilation model is capable of recognizing tools, including those having an unlearned shape, and acquires the required motions accordingly. © 2014 IEEE.

DOI

Scopus

4

被引用数

(Scopus)
Applying intrinsic motivation for visuomotor learning of robot arm motion

Shun Nishide, Harumitsu Nobuta, Hiroshi G. Okuno, Tetsuya Ogata

2014 11th International Conference on Ubiquitous Robots and Ambient Intelligence, URAI 2014 364 - 367 2014年 [査読有り]

担当区分：最終著者

　概要を見る

In this paper, we present a method to apply intrinsic motivation for improving visuomotor learning of robot's arm with external object in view. Multiple Timescales Recurrent Neural Network (MTRNN) is utilized for learning the robot arm/external object dynamics. Training of MTRNN is done using the Back Propagation Through Time (BPTT) algorithm. BPTT algorithm is modified as follows. 1. Evaluate predictability of robot arm/objects using training error of MTRNN. 2. Assign a preference ratio to each object based on predictability. The preference ratio represents the weight of each object to training. Experiments were conducted using an actual robot moving the arm while a human moves his arm in the robot's camera view. The result of the experiment showed that the proposed method presents better training result of robot arm visuomotor dynamics compared to general training with BPTT.

DOI

Scopus
Lipreading using convolutional neural network

Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 1149 - 1153 2014年 [査読有り]

担当区分：最終著者

　概要を見る

In recent automatic speech recognition studies, deep learning architecture applications for acoustic modeling have eclipsed conventional sound features such as Mel-frequency cepstral co- efficients. However, for visual speech recognition (VSR) stud- ies, handcrafted visual feature extraction mechanisms are still widely utilized. In this paper, we propose to apply a convo- lutional neural network (CNN) as a visual feature extraction mechanism for VSR. By training a CNN with images of a speaker's mouth area in combination with phoneme labels, the CNN acquires multiple convolutional filters, used to extract vi- sual features essential for recognizing phonemes. Further, by modeling the temporal dependencies of the generated phoneme label sequences, a hidden Markov model in our proposed sys- Tem recognizes multiple isolated words. Our proposed system is evaluated on an audio-visual speech dataset comprising 300 Japanese words with six different speakers. The evaluation re- sults of our isolated word recognition experiment demonstrate that the visual features acquired by the CNN significantly out- perform those acquired by conventional dimensionality com- pression approaches, including principal component analysis.
Tool-body assimilation model using a neuro-dynamical system for acquiring representation of tool function and motion

Kuniyuki Takahshi, Tetsuya Ogata, Hadi Tjandra, Yuki Yamaguchi, Yuki Suga, Shigeki Sugano

IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM ( PM2-3 ) 1255 - 1260 2014年 [査読有り]

　概要を見る

In this paper, we propose a tool-body assimilation model that implements a multiple time-scales recurrent neural network (MTRNN). Our model allows a robot to acquire the representation of a tool function and the required motion without having any prior knowledge of the tool. It is composed of five modules: image feature extraction, body model, tool dynamics feature, tool recognition, and motion recognition. Self-organizing maps (SOM) are used for image feature extraction from raw images. The MTRNN is used for body model learning. Parametric bias (PB) nodes are used to learn tool dynamic features. The PB nodes are attached to the neurons of the MTRNN to modulate the body model. A hierarchical neural network (HNN) is implemented for tool and motion recognition. Experiments were conducted using OpenHRP3, a robotics simulator, with multiple tools. The results show that the tool-body assimilation model is capable of recognizing tools, including those having an unlearned shape, and acquires the required motions accordingly. © 2014 IEEE.

DOI

Scopus

4

被引用数

(Scopus)
The interaction between a robot and multiple people based on spatially mapping of friendliness and motion parameters

Tsuyoshi Tasaki, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics 28 ( 1 ) 39 - 51 2014年 [査読有り]

　概要を見る

We aim to achieve interaction between a robot and multiple people. For this, robots should localize people, select an interaction partner, and act appropriately for him/her. It is difficult to deal with all these problems using only the sensors installed into the robots. We focus on that people use a rough interaction distance among other people . We divide this interaction area into different spaces based on both the interaction distances and sensor abilities of robots. Our robots localize people roughly within this divided space. To select an interaction partner, they map friendliness holding the interaction history onto the divided space, and integrate the sensor information. Furthermore, we developed a method for appropriately changing the motions, sizes, and speeds based on the distance. Our robots regard the divided spaces as Q-Learning states, and learn the motion parameters. Our robot interacted with 27 visitors. It localized a partner with an F-value of 0.76 through integration, which is higher than that of a single sensor. A factor analysis was performed on the results from questionnaires. Exciting and Friendly were the representatives of the first and second factors, respectively. For both factors, a motion with friendliness provided higher impression scores than that without friendliness. © 2013 Taylor & Francis and The Robotics Society of Japan.

DOI

Scopus

4

被引用数

(Scopus)
Tool-body assimilation model based on body babbling and a neuro-dynamical system for motion generation

Kuniyuki Takahashi, Tetsuya Ogata, Hadi Tjandra, Shingo Murata, Hiroaki Arie, Shigeki Sugano

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8681 LNCS 363 - 370 2014年 [査読有り]

　概要を見る

We propose a model for robots to use tools without predetermined parameters based on a human cognitive model. Almost all existing studies of robot using tool require predetermined motions and tool features, so the motion patterns are limited and the robots cannot use new tools. Other studies use a full search for new tools; however, this entails an enormous number of calculations. We built a model for tool use based on the phenomenon of tool-body assimilation using the following approach: We used a humanoid robot model to generate random motion, based on human body babbling. These rich motion experiences were then used to train a recurrent neural network for modeling a body image. Tool features were self-organized in the parametric bias modulating the body image according to the used tool. Finally, we designed the neural network for the robot to generate motion only from the target image. © 2014 Springer International Publishing Switzerland.

DOI

Scopus

3

被引用数

(Scopus)
Learning and recognition of multiple fluctuating temporal patterns using S-CTRNN

Shingo Murata, Hiroaki Arie, Tetsuya Ogata, Jun Tani, Shigeki Sugano

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8681 LNCS 9 - 16 2014年 [査読有り]

　概要を見る

In the present study, we demonstrate the learning and recognition capabilities of our recently proposed recurrent neural network (RNN) model called stochastic continuous-time RNN (S-CTRNN). S-CTRNN can learn to predict not only the mean but also the variance of the next state of the learning targets. The network parameters consisting of weights, biases, and initial states of context neurons are optimized through maximum likelihood estimation (MLE) using the gradient descent method. First, we clarify the essential difference between the learning capabilities of conventional CTRNN and S-CTRNN by analyzing the results of a numerical experiment in which multiple fluctuating temporal patterns were used as training data, where the variance of the Gaussian noise varied among the patterns. Furthermore, we also show that the trained S-CTRNN can recognize given fluctuating patterns by inferring the initial states that can reproduce the patterns through the same MLE scheme as that used for network training. © 2014 Springer International Publishing Switzerland.

DOI

Scopus

9

被引用数

(Scopus)
RTシステムの再利用性を高めるためのオープンフレームワークの開発

菅佑樹, 尾形哲也

計測自動制御学会システムインテグレーション部門講演会 SI2013 3G1 - 2 2013年12月
神経回路モデルを用いた道具身体化による道具機能と動作の獲得

高橋城志, Tjandra Hadi, 山口雄紀, 菅佑樹, 菅野重樹, 尾形哲也

計測自動制御学会システムインテグレーション部門講演会 SI2013 2K1 - 5 2013年12月
マルチメディア向けグラフィカル統合開発環境「Max」とRTCを繋ぐブリッジプラグインの開発

佐々木一磨, 寺田翔太, 有江浩明, 野田邦昭, 菅佑樹, 尾形哲也

計測自動制御学会システムインテグレーション部門講演会 SI2013 1B3 - 1 2013年12月
レコードスケッチ

寺田翔太, 佐々木一磨, 有江浩明, 野田邦昭, 菅佑樹, 尾形哲也

計測自動制御学会システムインテグレーション部門講演会 SI2013 1B2 - 6 2013年12月
相槌認識による聞き手の理解状態推定を利用したインタラクションのためのロボット動作制御

田崎豪, 尾形哲也, 奥乃博

ヒューマンインタフェース学会論文誌 15 ( 4 ) 363 - 374 2013年11月

CiNii
Altered Prediction of Uncertainty Induced by Network Disequilibrium: A Neuro-Robotics Study

Shingo Murata, Yuichi Yamashita, Tetsuya Ogata, Hiroaki Arie, Jun Tani, Shigeki Sugano

Computational Psychiatry 2013, Abstract, Miami, USA 2013年10月
スパース再帰神経回路モデルによる人物の行動学習

西出俊, 奥乃博, 尾形哲也

日本ロボット学会第31回学術講演会 2C2 - 01 2013年09月
停止活動を活用した描画運動におけるロボットの発達的模倣学習

望月敬太, 西出俊, 奥乃博, 尾形哲也

日本ロボット学会第31回学術講演会 1C2 - 06 2013年09月
Deep neural networkを用いたヒューマノイドロボットによる物体操作行動の記憶学習と行動生成

野田邦昭, 有江浩明, 菅佑樹, 尾形哲也

人工知能学会全国大会（第27回） 2G4-OS-19a-2 2013年06月
再帰型神経回路モデルによる予測可能性を利用した自己・他者の識別

有江浩明, 野田邦昭, 菅佑樹,谷淳, 尾形哲也

人工知能学会全国大会（第27回） 3J3-OS-20b-1 2013年06月
空間の能動的認知と身体の拡張

尾形哲也

第57回システム制御情報学会研究発表講演会(SCI’13) 126 - 1 2013年05月
RTミドルウエア利用者のためのオープンフレームワークの開発

菅佑樹, 尾形哲也

日本機械学会ロボティクスメカトロニクス講演会 1P1 - C03 2013年05月
Deep neural networkを用いた連想記憶メカニズムによるヒューマノイドロボットの適応的行動選択

野田邦昭, 有江浩明, 菅佑樹, 尾形哲也

日本機械学会ロボティクスメカトロニクス講演会 1P1 - B01 2013年05月
再帰型神経回路モデルを用いた引き込みによる適応的な行為生成

有江浩明, 野田邦昭, 菅佑樹,谷淳, 尾形哲也

日本機械学会ロボティクスメカトロニクス講演会 1P1 - B03 2013年05月
全探索と人間のアフォーダンスとの定量的差異の検証

高橋城志, 尾形哲也, 岩田浩康, 菅野重樹

第13回日本赤ちゃん学会学術集会 2013年05月
Integration of behaviors and languages with a hierarchal structure self-organized in a neuro-dynamical model

Tetsuya Ogata, Hiroshi G. Okuno

2013 IEEE Workshop on Robotic Intelligence in Informationally Structured Space (RiiSS) 89 - 95 2013年04月 [査読有り]

担当区分：筆頭著者, 責任著者

　概要を見る

This paper proposes an approach for robots to ac-quire language grounding in their robot's sensory-motor flow using neuro-dynamical models. We trained our neuro-dynamical model over a set of sentences represented as sequences of characters. For the integrated recognition, we introduced a cognitive hypothesis for integrated recognition where a human's brain separately processed the 'structure' and 'contents' of a sentence. Our model was trained with the spelling of words and their semantic role emerged in the first model. As a result of binding the model with sensory-motion patterns, we confirmed that it could associate proper word spellings with a sensory-motor flows and a semantic roles, even if an observed flow had not been learned. © 2013 IEEE.

DOI
多層神経回路モデルによる共感覚現象の学習と連想

山口雄紀, 野田邦明, 西出俊, 奥乃博, 尾形哲也

情報処理学会第75回全国大会 2013年03月
神経回路モデルを用いたロボットの描画運動における発達的模倣学習

望月敬太, 西出俊, 奥乃博, 尾形哲也

情報処理学会第75回全国大会 1R - 5 2013年03月
再帰結合神経回路モデルを用いた予測可能性による段階的対象選択学習

信田春満, 西出俊, 奥乃博, 尾形哲也

情報処理学会第75回全国大会 3R - 2 2013年03月
Robust Multipitch Analyzer against Initialization based on Latent Harmonic Allocation using Overtone Corpus

Daichi Sakaue, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

Journal of Information Processing 21 ( 2 ) 246 - 255 2013年 [査読有り]

　概要を見る

We present a Bayesian analysis method that estimates the harmonic structure of musical instruments in music signals on the basis of psychoacoustic evidence. Since the main objective of multipitch analysis is joint estimation of the fundamental frequencies and their harmonic structures, the performance of harmonic structure estimation significantly affects fundamental frequency estimation accuracy. Many methods have been proposed for estimating the harmonic structure accurately, but no method has been proposed that satisfies all these requirements: robust against initialization, optimization-free, and psychoacoustically appropriate and thus easy to develop further. Our method satisfies these requirements by explicitly incorporating Terhardt's virtual pitch theory within a Bayesian framework. It does this by automatically learning the valid weight range of the harmonic components using a MIDI synthesizer. The bounds are termed "overtone corpus." Modeling demonstrated that the proposed overtone corpus method can stably estimate the harmonic structure of 40 musical pieces for a wide variety of initial settings. © 2013 Information Processing Society of Japan.

DOI CiNii

Scopus
Developmental human-robot imitation learning of drawing with a neuro dynamical system

Keita Mochizuki, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

Proceedings - 2013 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2013 2336 - 2341 2013年 [査読有り]

担当区分：最終著者

　概要を見る

This paper mainly deals with robot developmental learning on drawing and discusses the influences of physical embodiment to the task. Humans are said to develop their drawing skills through five phases: 1) Scribbling, 2) Fortuitous Realism, 3) Failed Realism, 4) Intellectual Realism, 5) Visual Realism. We implement phases 1) and 3) into the humanoid robot NAO, holding a pen, using a neuro dynamical model, namely Multiple Timescales Recurrent Neural Network (MTRNN). For phase 1), we used random arm motion of the robot as body babbling to associate motor dynamics with pen position dynamics. For phase 3), we developed incremental imitation learning to imitate and develop the robot's drawing skill using basic shapes: circle, triangle, and rectangle. We confirmed two notable features from the experiment. First, the drawing was better performed for shapes requiring arm motions used in babbling. Second, performance of clockwise drawing of circle was good from beginning, which is a similar phenomenon that can be observed in human development. The results imply the capability of the model to create a developmental robot relating to human development. © 2013 IEEE.

DOI

Scopus

32

被引用数

(Scopus)
Intersensory causality modeling using deep neural networks

Kuniaki Noda, Hiroaki Arie, Yuki Suga, Tetsuya Ogata

Proceedings - 2013 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2013 1995 - 2000 2013年 [査読有り]

担当区分：最終著者

　概要を見る

Our brain is known to enhance perceptual precision and reduce ambiguity about sensory environment by integrating multiple sources of sensory information acquired from different modalities, such as vision, auditory and somatic sensation. From an engineering perspective, building a computational model that replicates this ability to integrate multimodal information and to self-organize the causal dependency among them, represents one of the central challenges in robotics. In this study, we propose such a model based on a deep learning framework and we evaluate the proposed model by conducting a bell ring task using a small humanoid robot. Our experimental results demonstrate that (1) the cross-modal memory retrieval function of the proposed method succeeds in generating visual sequence from the corresponding sound and bell ring motion, and (2) the proposed method leads to accurate causal dependencies among the sensory-motor sequence. © 2013 IEEE.

DOI

Scopus

4

被引用数

(Scopus)
Multimodal integration learning of object manipulation behaviors using deep neural networks

Kuniaki Noda, Hiroaki Arie, Yuki Suga, Testuya Ogata

IEEE International Conference on Intelligent Robots and Systems 1728 - 1733 2013年 [査読有り]

担当区分：最終著者

　概要を見る

This paper presents a novel computational approach for modeling and generating multiple object manipulation behaviors by a humanoid robot. The contribution of this paper is that deep learning methods are applied not only for multimodal sensor fusion but also for sensory-motor coordination. More specifically, a time-delay deep neural network is applied for modeling multiple behavior patterns represented with multi-dimensional visuomotor temporal sequences. By using the efficient training performance of Hessian-free optimization, the proposed mechanism successfully models six different object manipulation behaviors in a single network. The generalization capability of the learning mechanism enables the acquired model to perform the functions of cross-modal memory retrieval and temporal sequence prediction. The experimental results show that the motion patterns for object manipulation behaviors are successfully generated from the corresponding image sequence, and vice versa. Moreover, the temporal sequence prediction enables the robot to interactively switch multiple behaviors in accordance with changes in the displayed objects. © 2013 IEEE.

DOI

Scopus

14

被引用数

(Scopus)
Learning and association of synaesthesia phenomenon using deep neural networks

Yuki Yamaguchi, Kuniaki Noda, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

2013 IEEE/SICE International Symposium on System Integration, SII 2013 659 - 664 2013年 [査読有り]

担当区分：最終著者

　概要を見る

Robots are required to process multimodal information because the information in the real world comes from various modal inputs. However, there exist only a few robots integrating multimodal information. Humans can recognize the environment effectively by cross-modal processing. We focus on modeling synaesthesia phenomenon known to be a cross-modal perception of humans. Recently, deep neural networks (DNNs) have gained more attention and successfully applied to process high-dimensional data composed not only of single modality but also of multimodal information. We introduced DNNs to construct multimodal association model which can reconstruct one modality from the other modality. Our model is composed of two DNNs: one for image compression and the other for audio-visual sequential learning. We tried to reproduce synaesthesia phenomenon by training our model with the multimodal data acquired from psychological experiment. Cross-modal association experiment showed that our model can reconstruct the same or similar images from sound as synaesthetes, those who experience synaesthesia. The analysis of middle layers of DNNs representing multimodal features implied that DNNs self-organized the difference of perception between individual synaesthetes. © 2013 IEEE.

DOI

Scopus
RTミドルウエアを用いたテレプレゼンスロボット用フレームワークの開発

菅佑樹, 尾形哲也

SI 2012 2012年12月
再帰型神経回路モデルを用いた内発的動機付けによる身体モデルの優先的学習

信田春満, 西出俊, 奥乃博, 尾形哲也

SI 2012 2012年12月
Tool-Body Assimilation Model using Neuro-Dynamical System for Acquiring Representation of Tool Function

Yuki YAMAGUCHI, Harumitsu NOBUTA, Shun NISHIDE, Hiroshi G. OKUNO, Tetsuya OGATA

IROS2012 Workshop on Cognitive Neuroscience Robotics ( PM2-3 ) 6 2012年10月 [査読有り]

担当区分：最終著者
Developmental Human-Robot Imitation Learning with Phased Structuring in Neuro Dynamical System

Keita MOCHIZUKI, Harumitsu NOBUTA, Shun NISHIDE, Hiroshi G. OKUNO, Tetsuya OGATA

IROS2012 Workshop on Cognitive Neuroscience Robotics ( Pos-3 ) 6 2012年10月 [査読有り]

担当区分：最終著者
Automatic allocation of training data for speech understanding based on multiple model combinations

Kazunori Komatani, Mikio Nakano, Masaki Katsumaru, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

IEICE Transactions on Information and Systems E95-D ( 9 ) 2298 - 2307 2012年09月 [査読有り]

　概要を見る

The optimal way to build speech understanding modules depends on the amount of training data available. When only a small amount of training data is available, effective allocation of the data is crucial to preventing overfitting of statistical methods. We have developed a method for allocating a limited amount of training data in accordance with the amount available. Our method exploits rule-based methods for when the amount of data is small, which are included in our speech understanding framework based on multiple model combinations, i.e., multiple automatic speech recognition (ASR) modules and multiple language understanding (LU) modules, and then allocates training data preferentially to the modules that dominate the overall performance of speech understanding. Experimental evaluation showed that our allocation method consistently outperforms baseline methods that use a single ASR module and a single LU module while the amount of training data increases. Copyright © 2012 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus
Automated violin fingering transcription through analysis of an audio recording

Akira Maezawa, Katsutoshi Itoyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Computer Music Journal 36 ( 3 ) 57 - 72 2012年09月 [査読有り]

　概要を見る

We present a method to recuperate fingerings for a given piece of violin music in order to recreate the timbre of a given audio recording of the piece. This is achieved by first analyzing an audio signal to determine the most likely sequence of two-dimensional fingerboard locations (string number and location along the string), which recovers elements of violin fingering relevant to timbre. This sequence is then used as a constraint for finding an ergonomic sequence of finger placements that satisfies both the sequence of notated pitch and the given fingerboard-location sequence. Fingerboard-location-sequence estimation is based on estimation of a hidden Markov model, each state of which represents a particular fingerboard location and emits a Gaussian mixture model of the relative strengths of harmonics. The relative strengths of harmonics are estimated from a polyphonic mixture using score-informed source segregation, and compensates for discrepancies between observed data and training data through mean normalization. Fingering estimation is based on the modeling of a cost function for a sequence of finger placements. We tailor our model to incorporate the playing practices of the violin. We evaluate the performance of the fingerboard-location estimator with a polyphonic mixture, and with recordings of a violin whose timbral characteristics differ significantly from that of the training data. We subjectively evaluate the fingering estimator and validate the effectiveness of tailoring the fingering model towards the violin. © 2012 Massachusetts Institute of Technology.

DOI

Scopus

11

被引用数

(Scopus)
発達的模倣学習における神経力学モデルの段階的構造化と獲得プリミティブの解析

望月敬太, 信田春満, 西出俊奥乃博, 尾形哲也

日本ロボット学会第30回学術講演会 4N3 - 4 2012年09月
神経力学モデルを用いたロボットの道具身体化機構

山口雄紀, 信田春満, 西出俊, 奥乃博, 尾形哲也

日本ロボット学会第30回学術講演会 4N3 - 3 2012年09月
MTRNNを用いたロボットの物体操作における挙動表現特徴量の自己組織化

西出俊, 奥乃博, 尾形哲也

日本ロボット学会第30回学術講演会 3N1 - 7 2012年09月
人とロボットの合奏のための多人数合奏の主導権推定

水本武志, 尾形哲也, 奥乃博

日本ロボット学会第30回学術講演会 3D2 - 3 2012年09月
神経力学モデルによる自己身体領域抽出と視覚運動系の自己組織化

信田春満, 河本献太, 野田邦昭, 佐部浩太郎, 西出俊, 奥乃博, 尾形哲也

日本ロボット学会第30回学術講演会 2H3 - 2 2012年09月
OpenRTM-aist 体験用開発ツール「RT System Builder」の開発

菅佑樹, 尾形哲也

日本ロボット学会第30回学術講演会 2B1 - 7 2012年09月
同時複数音源に対する擬音語による音源選択システム

山村祐介, 高橋徹, 尾形哲也, 奥乃博

情報処理学会全国大会講演論文集 74th ( 2 ) 2.587-2.588 2012年03月

J-GLOBAL
楽曲印象軌跡に基づく楽曲検索システムの実装と評価

西川直毅, 糸山克寿, 藤原弘将, 後藤真孝, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 1S - 7, 6 2012年03月
神経回路モデルを用いた道具身体化モデルによる道具機能表現の獲得

山口雄紀, 信田春満, 西出俊, 奥乃博, 尾形哲也

情報処理学会第74回全国大会 4Q - 3, 7 2012年03月
Kinectによる楽器マスキングを用いた視聴覚統合ビートトラッキング

糸原達彦, 水本武志, 大塚琢馬, 中臺一博, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 5ZD 2012年03月
段階的に構造化する神経回路モデルを用いたロボットと人間の発達的インタラクション

望月敬太, 信田春満, 西出俊奥乃博, 尾形哲也

情報処理学会第74回全国大会 5ZA 2012年03月
The DESIRE Model: Cross-modal emotion analysis and expression for robots

Angelica Lim, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 5ZA 2012年03月
複数音源下での擬音語による音源選択システム,

山村祐介, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 4U 2012年03月
パーティクルフィルタを用いた動的環境下の複数音源追跡

黄楊暘, 大塚琢馬, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 4U 2012年03月
Complex Infinite Sparse Factor Analysisによる周波数領域での音声信号のブラインド音源分離

柳楽浩平, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 4U 2012年03月
倍音コーパスを用いた初期値依存性の低い多重基本周波数推定法

阪上大地, 糸山克寿, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 4S - 7, 7 2012年03月
押弦制約付きギター演奏自動採譜システム

矢澤一樹, 阪上大地, 糸山克寿, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 4S - 6, 7 2012年03月
柔軟関節をもつ人間型ロボットにおける神経力学モデルを用いたダイナミック動作の模倣

壷内将之, 尾形哲也, 奥乃博, 西出俊, 信田春満

情報処理学会第74回全国大会 5P - 7, 8 2012年03月
再帰型神経回路モデルを用いた視野変化予測と場所知覚ニューロンの発現

信田春満, 河本献太, 野田邦昭, 佐部浩太郎, 奥乃博, 尾形哲也

情報処理学会第74回全国大会 5P - 8, 8 2012年03月
アクセント特徴量を用いた歌声と朗読音声の識別システム

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 6U - 9, 8 2012年03月
発話中の方言変化に頑健な方言変換システム

平山直樹, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第74回全国大会 6U - 8, 8 2012年03月
音響特徴・ベース音・和音遷移を用いた自動和音認識

糸山克寿, 尾形哲也, 奥乃博

第94回音楽情報科学研究会,舘山寺温泉, 情報処理学会 Vol.2012-MUS-94, pp 2012年02月
スペクトル変化量のピーク間隔・F0・MFCCを用いた歌声と朗読音声の自動識別システム

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

第94回音楽情報科学研究会,舘山寺温泉, 情報処理学会 Vol.2012-MUS-94, pp 2012年02月
Sound sources selection system by using onomatopoeic querries from multiple sound sources

Yusuke Yamamura, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 2364 - 2369 2012年 [査読有り]

　概要を見る

Our motivation is to develop a robot that treats auditory information in real environment because auditory information is useful for animated communications or understanding our surroundings. Interactions by using sound information need an aquisition of it and a proper sound source reference between a user and a robot leads to it. Such sound source reference is difficult due to multiple sound sources generating in real environemnt, and we use onomatopoeic representations as a representation for the reference. This paper shows a system that selects a sound source specified by a user from multiple sound sources. Users use onomatopoeias in the specification, and our system separates a mixed sound and converts separated sounds into onomatopoeias for the selection. Onomatopoeais have the ambiguity that each user gives each expression to a certain sound and we create an original similarity based on Minimum Edit Distance and acoustic features for solving its problem. In experiments, our system receives a mixed sound consisting of three sounds and a user's query as inputs, and checks a count of a consistency of a sound source selected by a system and a sound source specified by a user in 100 tests. The result shows © 2012 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Tool-body assimilation of humanoid robot using a neurodynamical system

Shun Nishide, Jun Tani, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata

IEEE Transactions on Autonomous Mental Development 4 ( 2 ) 139 - 149 2012年 [査読有り]

担当区分：最終著者

　概要を見る

Researches in the brain science field have uncovered the human capability to use tools as if they are part of the human bodies (known as tool-body assimilation) through trial and experience. This paper presents a method to apply a robot's active sensing experience to create the tool-body assimilation model. The model is composed of a feature extraction module, dynamics learning module, and a tool-body assimilation module. Self-organizing map (SOM) is used for the feature extraction module to extract object features from raw images. Multiple time-scales recurrent neural network (MTRNN) is used as the dynamics learning module. Parametric bias (PB) nodes are attached to the weights of MTRNN as second-order network to modulate the behavior of MTRNN based on the properties of the tool. The generalization capability of neural networks provide the model the ability to deal with unknown tools. Experiments were conducted with the humanoid robot HRP-2 using no tool, I-shaped, T-shaped, and L-shaped tools. The distribution of PB values have shown that the model has learned that the robot's dynamic properties change when holding a tool. Motion generation experiments show that the tool-body assimilation model is capable of applying to unknown tools to generate goal-oriented motions. © 2012 IEEE.

DOI

Scopus

31

被引用数

(Scopus)
Efficient blind dereverberation and echo cancellation based on independent component analysis for actual acoustic signals

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Neural Computation 24 ( 1 ) 234 - 272 2012年 [査読有り]

　概要を見る

This letter presents a new algorithm for blind dereverberation and echo cancellation based on independent component analysis (ICA) for actual acoustic signals. We focus on frequency domain ICA (FD-ICA) because its computational cost and speed of learning convergence are sufficiently reasonable for practical applications such as hands-free speech recognition. In applying conventional FD-ICA as a preprocessing of automatic speech recognition in noisy environments, one of the most critical problems is how to copewith reverberations. To extract a clean signal from the reverberant observation, we model the separation process in the shorttime Fourier transform domain and apply the multiple input/output inverse-filtering theorem (MINT) to the FD-ICA separation model. A naive implementation of this method is computationally expensive, because its time complexity is the second order of reverberation time. Therefore, themain issue in dereverberation is to reduce the high computational cost of ICA. In this letter, wereduce the computational complexity to the linear order of the reverberation time by using two techniques: (1) a separation model based on the independence of delayed observed signals with MINT and (2) spatial sphering for preprocessing. Experiments show that the computational cost grows in proportion to the linear order of the reverberation time and that ourmethod improves the word correctness of automatic speech recognition by 10 to 20 points in a RT20 =670 ms reverberant environment. © 2011 Massachusetts Institute of Technology.

DOI PubMed

Scopus

14

被引用数

(Scopus)
A musical robot that synchronizes with a coplayer using non-verbal cues

Angelica Lim, Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics 26 ( 3-4 ) 363 - 381 2012年 [査読有り]

　概要を見る

Music has long been used to strengthen bonds between humans. In our research, we develop musical coplayer robots with the hope that music may improve human-robot symbiosis as well. In this paper, we underline the importance of non-verbal, visual communication for ensemble synchronization at the start, during and end of a piece. We propose three cues for interplayer communication, and present a thereminplaying, singing robot that can detect them and adapt its play to a human flutist. Experiments with two naive flutists suggest that the system can recognize naturally occurring flutist gestures without requiring specialized user training. In addition, we show how the use of audio-visual aggregation can allow a robot to adapt to tempo changes quickly. © 2012 Koninklijke Brill NV, Leiden and The Robotics Society of Japan.

DOI

Scopus

9

被引用数

(Scopus)
A multimodal tempo and beat-tracking system based on audiovisual information from live guitar performances

Tatsuhiko Itohara, Takuma Otsuka, Takeshi Mizumoto, Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

Eurasip Journal on Audio, Speech, and Music Processing 2012 ( 1 ) 6 - 6 2012年 [査読有り]

　概要を見る

The aim of this paper is to improve beat-tracking for live guitar performances. Beat-tracking is a function to estimate musical measurements, for example musical tempo and phase. This method is critical to achieve a synchronized ensemble performance such as musical robot accompaniment. Beat-tracking of a live guitar performance has to deal with three challenges: tempo fluctuation, beat pattern complexity and environmenta noise. To cope with these problems, we devise an audiovisual integration method for beat-tracking. The auditory beat features are estimated in terms of tactus (phase) and tempo (period) by Spectro-Temporal Pattern Matching (STPM), robust against stationary noise. The visual beat features are estimated by tracking the position of the hand relative to the guitar using optical flow, mean shift and the Hough transform. Both estimated features are integrated using a particle filter to aggregate the multimodal information based on a beat location model and a hand's trajectory model. Experimental results confirm that our beat-tracking improves the F-measure by 8.9 points on average over the Murata beat-tracking method, which uses STPM and rule-based beat detection. The results also show that the system is capable of real-time processing with a suppressed number of particles while preserving the estimation accuracy. We demonstrate an ensemble with the humanoid HRP-2 that plays the theremin with a human guitarist. © 2012 Itohara et al; licensee Springer.

DOI

Scopus

3

被引用数

(Scopus)
Towards expressive musical robots: A cross-modal framework for emotional gesture, voice and music

Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

Eurasip Journal on Audio, Speech, and Music Processing 2012 ( 1 ) 3 - 3 2012年 [査読有り]

　概要を見る

It has been long speculated that expression of emotions from different modalities have the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to implement this theory across three modalities, inspired by the polyvalence and repeatability of robotics. We propose a unifying framework to generate emotions across voice, gesture, and music, by representing emotional states as a 4-parameter tuple of speed, intensity, regularity, and extent (SIRE). Our results show that a simple 4-tuple can capture four emotions recognizable at greater than chance across gesture and voice, and at least two emotions across all three modalities. An application for multi-modal, expressive music robots is discussed. © 2012 Lim et al; licensee Springer.

DOI

Scopus

25

被引用数

(Scopus)
A GMM sound source model for blind speech separation in under-determined conditions

Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7191 LNCS 446 - 453 2012年

　概要を見る

This paper focuses on blind speech separation in under-determined conditions, that is, in the case when there are more sound sources than microphones. We introduce a sound source model based on the Gaussian mixture model (GMM) to represent a speech signal in the time-frequency domain, and derive rules for updating the model parameters using the auxiliary function method. Our GMM sound source model consists of two kinds of Gaussians: sharp ones representing harmonic parts and smooth ones representing nonharmonic parts. Experimental results reveal that our method outperforms the method based on non-negative matrix factorization (NMF) by 0.7dB in the signal-to-distortion ratio (SDR), and by 1.7dB in the signal-to-interference ratio (SIR). This means that our method effectively removes interference coming from other talkers. © 2012 Springer-Verlag.

DOI

Scopus

2

被引用数

(Scopus)
Complex extension of infinite sparse factor analysis for blind speech separation

Kohei Nagira, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7191 LNCS 388 - 396 2012年

　概要を見る

We present a method of blind source separation (BSS) for speech signals using a complex extension of infinite sparse factor analysis (ISFA) in the frequency domain. Our method is robust against delayed signals that usually occur in real environments, such as reflections, short-time reverberations, and time lags of signals arriving at microphones. ISFA is a conventional non-parametric Bayesian method of BSS, which has only been applied to time domain signals because it can only deal with real signals. Our method uses complex normal distributions to estimate source signals and mixing matrix. Experimental results indicate that our method outperforms the conventional ISFA in the average signal-to-distortion ratio (SDR). © 2012 Springer-Verlag.

DOI

Scopus

3

被引用数

(Scopus)
Initialization-robust multipitch estimation based on latent harmonic allocation using overtone corpus

Daichi Sakaue, Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 425 - 428 2012年 [査読有り]

　概要を見る

We present a new method for modeling the overtone structures of musical instruments that uses an overtone corpus generated using a MIDI synthesizer. Since multipitch estimation requires a joint estimation of F0's and their overtone structures, one of the most important problems is the overtone structure modeling. Latent harmonic allocation (LHA), a promising multipitch estimation method, is difficult to use for various applications because it requires appropriate prior distributions of the overtone structures, which cannot be determined from statistical evidence. Our method uses an overtone corpus to avoid the problem of setting prior distributions and instead restricts the lower and upper bounds of each overtone weight. The bounds are determined from reference signals generated by a MIDI synthesizer. Experimental results demonstrated that the overtone structures were stably and accurately estimated for a wide variety of initial settings. © 2012 IEEE.

DOI

Scopus

4

被引用数

(Scopus)
Incremental probabilistic geometry estimation for robot scene understanding

Louis Kenzo Cahier, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 3625 - 3630 2012年 [査読有り]

　概要を見る

Our goal is to give mobile robots a rich representation of their environment as fast as possible. Current mapping methods such as SLAM are often sparse, and scene reconstruction methods using tilting laser scanners are relatively slow. In this paper, we outline a new method for iterative construction of a geometric mesh using streaming time-of-flight range data. Our results show that our algorithm can produce a stable representation after 6 frames, with higher accuracy than raw time-of-flight data. © 2012 IEEE.

DOI

Scopus

2

被引用数

(Scopus)
Rhythm-based adaptive localization in incomplete RFID landmark environments

Kenri Kodaka, Tetsuya Ogata, Shigeki Sugano

Proceedings - IEEE International Conference on Robotics and Automation 2108 - 2114 2012年 [査読有り]

　概要を見る

This paper proposes a novel hybrid-structured model for the adaptive localization of robots combining a stochastic localization model and a rhythmic action model, for avoiding vacant spaces of landmarks efficiently. In regularly arranged landmark environments, robots may not be able to detect any landmarks for a long time during a straight-like movement. Consequently, locally diverse and smooth movement patterns need to be generated to keep the position estimation stable. Conventional approaches aiming at the probabilistic optimization cannot rapidly generate the detailed movement pattern due to a huge computational cost; therefore a simple but diverse movement structure needs to be introduced as an alternative option. We solve this problem by combining a particle filter as the stochastic localization module and the dynamical action model generating a zig-zagging motion. The validation experiments, where virtual-line-tracing tasks are exhibited on a floor-installed RFID environment, show that introducing the proposed rhythm pattern can improve a minimum error boundary and a velocity performance for arbitrary tolerance errors can be improved by the rhythm amplitude adaptation fed back by the localization deviation. © 2012 IEEE.

DOI

Scopus
Adaptive pitch control for robot thereminist using unscented Kalman filter

Takeshi Mizumoto, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Studies in Computational Intelligence 431 19 - 24 2012年 [査読有り]

　概要を見る

We present an adaptive pitch control method for a theremin playing robot in ensemble. The problem of the theremin playing is its sensitivity to the environment. This degrades the pitch accuracy because its pitch characteristics are time varying caused by, such as a co-player motion during the ensemble. We solve this problem using a state space model of this characteristics and an unscented Kalman filter. Experimental results show that our method reduces the pitch error the EKF and block-wise update method by 90% and 77% on average, and the robot can play a musical score of 72.9 cent error on average. © Springer-Verlag Berlin Heidelberg 2012.

DOI

Scopus

1

被引用数

(Scopus)
Automatic chord recognition based on probabilistic integration of acoustic features, bass sounds, and chord transition

Katsutoshi Itoyama, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7345 LNAI 58 - 67 2012年

　概要を見る

We have developed a method that identifies musical chords in polyphonic musical signals. As musical chords mainly represent the harmony of music and are related to other musical elements such as melody and rhythm, we should be able to recognize chords more effectively if this interrelationship is taken into consideration. We use bass pitches as clues for improving chord recognition. The proposed chord recognition system is constructed based on Viterbi-algorithm- based maximum a posteriori estimation that uses a posterior probability based on chord features, chord transition patterns, and bass pitch distributions. Experimental results with 150 Beatles songs that has keys and no modulation showed that the recognition rate was 73.7% on average. © 2012 Springer-Verlag.

DOI

Scopus

1

被引用数

(Scopus)
Self-organization of object features representing motion using Multiple Timescales Recurrent Neural Network

Shun Nishide, Jun Tani, Hiroshi G. Okuno, Tetsuya Ogata

Proceedings of the International Joint Conference on Neural Networks 1 - 8 2012年 [査読有り]

　概要を見る

Affordance theory suggests that humans recognize the environment based on invariants. Invariants are features that describe the environment offering behavioral information to humans. Two types of invariants exist, structural invariants and transformational invariants. In our previous paper, we developed a method that self- organizes transformational invariants, or motion features, from camera images based on robot's experiences. The model used a bi-directional technique combining a recurrent neural network for dynamics learning, namely Recurrent Neural Network with Parametric Bias (RNNPB), and a hierarchical neural network for feature extraction. The bi-directional training method developed in the previous work was effective in clustering the motion of objects, but the analysis did not give good segregation results of the self-organized features (transformational invariants) among different motion types. In this paper, we present a refined model which integrates dynamics learning and feature extraction in a single model. The refined model is comprised of Multiple Timescales Recurrent Neural Network (MTRNN), which possesses better learning capability than RNNPB. Self-organization result of four types of motions have proved the model's capability to create clusters of object motions. The analysis showed that the model extracted feature sequences with different characteristics for four object motion types. © 2012 IEEE.

DOI

Scopus

2

被引用数

(Scopus)
Body area segmentation from visual scene based on predictability of neuro-dynamical system

Harumitsu Nobuta, Kenta Kawamoto, Kuniaki Noda, Kohtaro Sabe, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

Proceedings of the International Joint Conference on Neural Networks 1 - 8 2012年 [査読有り]

　概要を見る

We propose neural models for segmenting the area of a body from visual scene based on predictability. Neuroscience has shown that a prediction model in brain, which predicts sensory-feedback from motor command, can divide the sensory-feedback into the self-motion derived feedback and other derived feedback. The prediction model is important for prediction control of the body. Previous studies in robotics of the prediction model assumed that a robot can recognize the position of its body (e.g. its hand) and that the view contains only that body part. In our models, motor commands and visual feedback (pixel image that includes not only a hand but also object and background) are input into a neural network model and then the body area is segmented and prediction model of body is acquired. Our model contains two parts: 1) An object detection model obtains a conversion system between object positions and the pixel image. 2) A movement prediction model predicts hand-object positions from motor commands and identifies the body. We confirmed that our models can segment the body/object area based on their pixel textures and discriminate between them by using prediction error. © 2012 IEEE.

DOI

Scopus
Who is the leader in a multiperson ensemble? - Multiperson human-robot ensemble model with leaderness - Multiperson h

Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1413 - 1419 2012年 [査読有り]

　概要を見る

This paper presents a state space model for a multiperson ensemble and an estimation method of the onset timings, tempos, and leaders. In a multiperson ensemble, determining one explicit leader is difficult because (1) participants' rhythms are mutually influenced and (2) they compete with each other. Most ensemble studies however assumed that one leader exists at a time and the others just follow the leader. To deal with the multiple and time-varying leaders, we define leaderness indicating the power to influence the others as the product of the tempo stability and the distance from the ensemble tempo. This definition means that a leader should have a strong desire to change the current tempo. Using the leaderness, we present a state space model of a multiperson ensemble and an unscented Kalman filter based estimation method. The model consists of the leaderness update, the ensemble tempo update, the individual tempo update, and the onset timing adaptation, each of which has a relationship to psychological results of an ensemble. We evaluate our method using simulation and human behavior. The simulation results show that our model is stable for various initial tempos and the number of participants. For the human behavior, pairs and triads of participants are asked to tap keys in synchronization with the others. The results show that the leaderness successfully indicate the dynamics of the leaders, and the onset errors are 181msec and 241msec for pairs and triads on average, respectively, which are comparable to those of humans (153msec and 227msec for pairs and triads, respectively.) © 2012 IEEE.

DOI

Scopus

3

被引用数

(Scopus)
Improvement of audio-visual score following in robot ensemble with human guitarist

Tatsuhiko Itohara, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno

IEEE-RAS International Conference on Humanoid Robots 574 - 579 2012年 [査読有り]

　概要を見る

Our goal is to create an ensemble between human guitarists and music robots, e.g., singing and playing instruments robots. Such robots need to detect the tempo and beat time of the music. Score following and beat tracking, which requires and does not requires a score, are commonly used for this purpose. Score following is an incremental audio-to-score alignment. Although most score following methods assume that players have a precise score, most scores for guitarists have only melody and chord sequences without any beat patterns. An audio-visual beat tracking for guitarists is reported that improves the accuracy of beat detection. However, the result of this method is still insufficient because it uses only onset information, not pitch information, and because the hand tracking shows low accuracy. In this paper, we report a multimodal score following for a guitar performance, an extension of an audio-visual beat tracking method. The main difference is to use chord sequences to improve tracking of audio signals and depth information to improve tracking of guitar playing. Chord sequences are used for the calculation of chord correlation between the input and a score. Depth information is used in the guitar plane masking by three dimensional Hough transform, for the stable detection of a hand. Finally, the system extracts score positions and tempos by a particle-filter based integration of audio and visual features, The resulting score following system improves the tempo and the score position of a performance by 0.2 [sec] compared to an existing system. © 2012 IEEE.

DOI

Scopus

7

被引用数

(Scopus)
音楽共演ロボット: 開始・終了キューの画像認識による人間のフルート奏者との実時間同期

リムアンジェリカ, 水本武志, 大塚琢馬, 古谷ルイ賢造, 尾形哲也, 奥乃博

情報処理学会論文誌 52 ( 12 ) 3599 - 3610 2011年12月

　概要を見る

聞く，見るは，共演者が仲間の演奏者とテンポを合わせて演奏するために重要なスキルである．画像キュー（cue，合図）を検知し，他の演奏者に耳を傾けることによって，演奏者はいっせいに演奏を始め，テンポの緩急の合図に合わせ，さらに，いっせいに演奏を終えることができる．本稿では，人間のフルート奏者がアンサンブルリーダを担い，ロボットは伴奏者として人間の演奏に追従する問題を扱う．まず，フルート奏者の3種類のジェスチャを提案し，画像キューによる認識，音響ビートと画像キューとの統合によるテンポ推定について述べ，テルミン演奏共演ロボットのジェスチャ認識について報告する．初期実験で3タイプの画像キューが83%以上で検出できること，また，画像キューと音響ビート検出とを組み合わせることにより，テンポ検出が0.5秒以内に行えればビート追跡が安定することが分かった．この結果，フルート奏者の指示に合わせて共演者音楽ロボットがテルミンを演奏し，歌を歌うことが可能となった．Listening and watching are important skills for co-players to play in time with fellow musicians. By detecting visual cues and listening to other players, musicians can start together, stop together, and follow a leader's visual cues of changes in tempo. In this paper, we formalize three visual cues for the case of flutists, and describe how our thereminist robot co-player system detects them. Initial experiments show over 83% detection rates for our 3 types of visual cues. Additionally, by coupling visual cues and acoustic beat detection, the robot can extract a tempo in half a second. The resulting robot co-player can play theremin and sing a song with the lead of a human flutist.

CiNii
音声対話システムにおける簡略表現認識のための自動語彙拡張

森信介, 駒谷和範, 勝丸真樹, 尾形哲也, 奥乃博

情報処理学会論文誌 52 ( 12 ) 3398 - 3407 2011年12月
発語行為レベルの情報をユーザ発話の解釈に用いる音声対話システム

駒谷和範, 松山匡子, 武田龍, 高橋徹, 尾形哲也, 奥乃博

情報処理学会論文誌 52 ( 12 ) 3374 - 3385 2011年12月
フレーズ置換のための調波非調波GMM・ NMF・残響推定に基づく音源分離・演奏合成

安良岡直希, 吉岡拓也, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 52 ( 12 ) 3839 - 3852 2011年12月
An interactive musical ensemble with the NAO Thereminist

Angelica Lim, 水本武志, 大塚琢馬, 糸原達彦, 中臺一博, 尾形哲也, 奥乃博

第34回 AI チャレンジ研究会,人工知能学会 2011年12月
マルチロボットによるKinectを用いた同期合奏

糸原達彦, 水本武志, Angelica Lim, 大塚琢馬, 中村圭佑, 長谷川雄二, 中臺一博, 尾形哲也, 奥乃博

第34回 AI チャレンジ研究会, 人工知能学会 SIG-Challenge-B102-10, pp.4-49~4-54 2011年12月

CiNii
ブラインド音源分離のためのInfinite Sparse Factor Analysisの複素拡張

柳楽浩平, 高橋徹, 尾形哲也, 奥乃博

第34回 AI チャレンジ研究会, 人工知能学会 SIG-Challenge-B102-9, pp.4-43~4-48 2011年12月

J-GLOBAL
音源定位手法MUSICのベイズ拡張

大塚琢馬, 中臺一博, 尾形哲也, 奥乃博

第34回 AI チャレンジ研究会, 人工知能学会 SIG-Challenge-B102-6, pp.4-25~4-30 2011年12月
Infinite Sparse Factor Analysis の複素拡張による音声信号のブラインド音源分離

柳楽浩平, 高橋徹, 尾形哲也, 奥乃博

日本音響学会関西支部第14回若手研究者交流研究発表会 2011年12月
More cowbell! A musical ensemble with the NAO thereminist

Angelica Lim, Takeshi MIZUMOTO, Takuma OTSUKA, Tatsuhiko ITOHARA, Kazuhiro NAKADAI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2011) 2011年09月 [査読有り]
Improved Statistical Model-Based Voice Activity Detection with Noise Reduction for the SIG-2 Humanoid Robot

Uihyun Kim, Toru TAKAHASHI, Tetsuya OGATA, Hiroshi G. OKUNO

日本ロボット学会第29回学術講演会 1Q1 - 7 2011年09月
Fast Incremental Probabilistic Surface Recognition for Robot Scene Understanding

Louis Kenzo Cahier, Tetsuya OGATA, Hiroshi G. OKUNO

日本ロボット学会第29回学術講演会 1Q2 - 3 2011年09月
神経力学モデルによる文字列からの言語構造の自己組織化とロボット運動感覚との統合

尾形哲也, 日下航, 奥乃博

日本ロボット学会第29回学術講演会 1A3 - 6 2011年09月
mproving social telepresence by converting emotional voice to robot gesture

Angelica Lim, Tetsuya OGATA, Hiroshi G. OKUNO

Angelica Lim, Tetsuya OGATA, Hiroshi G. OKUNO 1Q3 - 7 2011年09月
MUSIC 法を用いた音源定位のベイズ拡張

大塚琢馬, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 3A3 - 2 2011年09月

CiNii
対話データの再帰結合神経回路による学習と相槌タイミング予測

佐野正太郎, 西出俊, 奥乃博, 尾形哲也

日本ロボット学会第29回学術講演会 3O2 - 1 2011年09月
Improving social telepresence by converting emotional voice to robot gesture

Angelica Lim, Tetsuya OGATA, Hiroshi G. OKUNO

日本ロボット学会第29回学術講演会 1Q3 - 7 2011年09月
MUSIC 法を用いた音源定位のベイズ拡張

大塚琢馬, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 3A3 - 2 2011年09月
神経力学モデルによる身体図式に基づく空間地図の獲得

信田春満, 西出俊, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 3N1 - 5 2011年09月
分散的ランドマーク環境における適応リズムによる移動ロボットの誤差低減

小鷹研理, 尾形哲也, 菅野重樹

日本ロボット学会第29回学術講演会 3N1 - 2 2011年09月
ノンパラメトリックベイズによる時間周波数領域における音声信号のブラインド音源分離

柳楽浩平, 高橋徹, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 29th 3A2 - 5 2011年09月

J-GLOBAL
調波・非調波音源モデルを用いたマイク数以上の音源分離

平澤恭治, 安良岡直希, 高橋徹, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 3A2 - 4 2011年09月
パーティクルフィルタを用いたギター演奏の視聴覚統合ビートトラッキング

糸原達彦, 大塚琢馬, 水本武志, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 3A2 - 2 2011年09月
テルミン演奏ロボットのためのUnscented Kalman Filter による適応的音高制御

水本武志, 尾形哲也, 奥乃博

日本ロボット学会第29回学術講演会 3A2 - 1 2011年09月
ぺた語義:京大における Lisp を使ったプログラミング教育

湯淺太一, 奥乃博, 尾形哲也

情報処理 52 ( 9 ) 1191 - 1194 2011年08月

CiNii
MAHL:演奏者間のインタラクション分析のためのスコアアライメント手法の提案

前澤陽, 糸山克寿, 尾形哲也, 奥乃博

研究報告音楽情報科学（MUS） 2011 ( 19 ) 1 - 6 2011年07月

　概要を見る

本稿では、楽器パート毎に、楽譜と音響信号のアライメントを算出する手法を提案する。本手法では、各楽器パートに共通の、自己回帰過程に従うテンポモデルを持たせる。各楽器パートの時系列は隠れセミマルコフモデルに従い、状態継続長の事前分布としてテンポモデルを持つ。また、音響信号の出力は潜在的調波配分法に従う。パート間の揺らぎを持たせない場合の、アライメントの性能を評価し、アライメント手法としての有用性が確認された。また、演奏における発音タイミングの揺らぎがモデル化できることが示唆された。This paper presents a method to align an audio signal and individual music instrument parts comprising a music score. Such method allows a machine to analyze temporal interaction of music performers. Proposed method is based on fitting multiple Hidden Semi-Markov Models (HSMM) to the observed audio signal, each HSMM of which emits Latent Harmonic Allocation parameters. Each HSMM corresponds to a music instrument part, and the state duration probability is conditioned on an auto-regressive tempo model. Evaluation suggests usefulness as score alignment method, and hints at the usefulness as multiple part alignment method.

CiNii
整合性基準に基づく多対多オーディオアライメント

前澤陽, 糸山克寿, 尾形哲也, 奥乃博

第91回音楽情報科学研究会, 情報処理学会 Vol.2011 2011年07月
歌詞と音響特徴量を用いた楽曲印象軌跡推定法の設計と評価

西川直毅, 糸山克寿, 藤原弘将, 後藤真孝, 尾形哲也, 奥乃博

第91回音楽情報科学研究会,情報処理学会 Vol.2011 2011年07月
"情"が作る真のコミュニケーション (「動き・かたち」と「思考」のサイエンスロボットイノベーション) -- (人間を科学する--ヒューマノイドの知・情・意)

菅野重樹, 尾形哲也

別冊日経サイエンス ( 179 ) 128 - 134 2011年06月

CiNii
Evaluation of Spoken Dialogue System that uses Utterance Timing to Interprete User Utterances

Kazunori KOMATANI, Kyoko MATSUYAMA, Ryu TAKEDA, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of International Workshop on Spoken Dialogue Systems (IWSDS2011) 315 - 325 2011年06月
神経力学モデルによるロボットの言語・運動の統合的認知

尾形哲也, 日下航, 奥乃博

2011年度人工知能学会全国大会 3B1 2011年06月

CiNii
擬音語と環境音の音響的関係性を考慮した環境音to擬音語変換システム

山川暢英, 北原鉄朗, 高橋徹, 尾形哲也, 奥乃博

2011年度人工知能学会全国大会 25 1C2 - 4 2011年06月

CiNii
2A1-D09 自己形態主張カスタマイズロボットの開発 : ユーザの自己効力感向上がインタラクションに与える効果の検証(コミュニケーション-ロボット)

山崎由美子, 守良真, 菅祐樹, 尾形哲也, 菅野重樹

ロボティクス・メカトロニクス講演会講演概要集 2011 "2A1 - D09(1)"-"2A1-D09(2)" 2011年05月

　概要を見る

Our goal is to crate a robot which can communicate with people for a long time. To make this possible, we are developing a customizable communication robot "WEAR (Waseda Extendable ARchitecture)". We also provided modules, and experimental subjects can customize the robot by inserting them into the module-port. At the same time, the robot can recognize the customize, and it also can express its autonomy by rejecting the customization by ejecting the module. In this paper, we proposed the communication model to grow self-efficacy. And as the result of the experiments, we found that the proposed communication model is preferred by user than all-accept communication model.

CiNii
Emergence of hierarchical structure mirroring linguistic composition in a recurrent neural network

Wataru Hinoshita, Hiroaki Arie, Jun Tani, Hiroshi G. Okuno, Tetsuya Ogata

Neural Networks 24 ( 4 ) 311 - 320 2011年05月 [査読有り]

　概要を見る

We show that a Multiple Timescale Recurrent Neural Network (MTRNN) can acquire the capabilities to recognize, generate, and correct sentences by self-organizing in a way that mirrors the hierarchical structure of sentences: characters grouped into words, and words into sentences. The model can control which sentence to generate depending on its initial states (generation phase) and the initial states can be calculated from the target sentence (recognition phase). In an experiment, we trained our model over a set of unannotated sentences from an artificial language, represented as sequences of characters. Once trained, the model could recognize and generate grammatical sentences, even if they were not learned. Moreover, we found that our model could correct a few substitution errors in a sentence, and the correction performance was improved by adding the errors to the training sentences in each training iteration with a certain probability. An analysis of the neural activations in our model revealed that the MTRNN had self-organized, reflecting the hierarchical linguistic structure by taking advantage of the differences in timescale among its neurons: in particular, neurons that change the fastest represented "characters", those that change more slowly, "words", and those that change the slowest, "sentences". © 2011 Elsevier Ltd.

DOI PubMed

Scopus

29

被引用数

(Scopus)
ベース音高と和音特徴の統合に基づく和音系列認識

須見康平, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 52 ( 4 ) 1803 - 1812 2011年04月

CiNii
発語行為レベルの情報を用いた音声対話システムの構築とデータ分析

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

人工知能学会言語・音声理解と対話処理研究会資料 61st 7 - 12 2011年03月

J-GLOBAL
視聴覚統合ビートトラッキングを用いた音楽ロボットとギターとの合奏システム

糸原達彦, 大塚琢馬, 水本武志, 高橋徹, 尾形哲也, 奥乃博

全国大会講演論文集 2011 ( 1 ) 235 - 237 2011年03月

　概要を見る

合奏において、ビートトラッキングは動作タイミングの取得の基礎となる技術である。ギターとの合奏において、ビートトラッキングは演奏テンポの揺らぎや裏拍ビートを含む多様なリズムへの頑健性、つまり(1)テンポと(2)音符長の両方の変動に対する追従性が要求される。しかし従来の手法では両立できなかった。本研究では視聴覚情報統合により、両者の変動追従性向上を実現する。(1)の問題にはSTPMという聴覚情報を用いた手法を適用する。(2)の問題はギター演奏動作の周期性を利用し手の位置情報を取得、それとSTPMで得られる信頼度関数とに粒子フィルタを適用することで解決する。

CiNii
拍長の連続性を考慮した潜在的調波配分法に基づくスコアアライメント手法

前澤陽, 後藤真孝, 尾形哲也, 奥乃博

日本音響学会研究発表会講演論文集(CD-ROM) 2011 ROMBUNNO.3-1-15 2011年03月

J-GLOBAL
予測可能性による身体識別及び身体図式獲得

信田春満, 日下航, 尾形哲也, 高橋徹, 奥乃博

情報処理学会全国大会講演論文集 73rd ( 2 ) 2.125-2.126 2011年03月

J-GLOBAL
ロボット聴覚のためのMatching Pursuitによる複数環境音の同定

山川鴨英, 北原鉄朗, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 6P - 3 2011年03月
L1ノルム最小化による劣決定音源分離のための線形計画と二次錐計画の比較評価

平澤恭治, 武田龍, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 6P - 2 2011年03月

　概要を見る

我々はマイク数より音源数が多い状況(劣決定状況)に対処するため, 伝達関数が既知の際にL1ノルム最小化を用いて音源分離を行なう手法を改良し, その有効性を確認してきた. しかし我々の手法は高速化のために線形計画(LP)による近似計算を利用しており, 理論的裏付けのある二次錐計画(SOCP)との誤差が提案手法の性能を低下させている可能性があった. 本稿ではLPによる近似解とSOCPによる厳密解を比較し, LPによる近似解が十分に実用可能な精度を保っていることを確認した.

CiNii
音源数同定とブラインド音源分離を同時に行うinfinite ICA

柳楽浩平, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 6P - 1 2011年03月
Audio-visual musical instrument recognition

Angelica Lim, 中村圭佑, 中臺一博, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 5R - 9 2011年03月

　概要を見る

Is this person playing a violin or a flute? Classification of musical instrument performances is usually carried out using audio features such as spectral coefficients. We propose augmenting the typical audio feature set with visual features. We show that a combination of audio features and video perform better than audio alone, and verify this multimodal recognition approach on a real-time robot platform.

CiNii
累積頻度重みを適用したパーティクルフィルタによる実時間楽譜追従

大塚琢馬, 中臺一博, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 5R - 7 2011年03月

　概要を見る

パーティクルフィルタによる楽譜追従は，音響信号と楽譜との距離に基づくパーティクル重みの計算によって追従性能が大きく左右される．従来のベクトル内積計算やシグモイド関数を用いた重み計算手法では，音響信号の非調波成分や楽器の音色のバリエーションにより，楽譜位置推定が正しい場合，誤った場合でそれぞれの重みに大きな差が生じず，最終的に推定された楽譜位置に誤差が含まれるという問題点があった．本稿では，過去に計算された距離の累積頻度から重みを動的に計算し，正しい楽譜位置ではより高い重みを計算する．評価実験では，累積頻度を用いた重み計算法が，従来の重み計算法よりも楽譜追従精度で改善することが確認された．

CiNii
伝達関数のスパース性仮定に基づく音楽音響信号中のディレイエフェクトブラインド推定

阪上大地, 安良岡直希, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 5R - 5 2011年03月

　概要を見る

本稿では楽器音響信号に加えられたディレイエフェクトの強度及び時刻の推定手法について述べる。ディレイ信号は残響の一種と見なせるが、反響音の生起が連続的でなく高々数回しかない点が通常の残響と異なるため、従来の残響推定アルゴリズムでは推定精度が不十分である。我々は、ディレイ音を記述する伝達関数の係数がスパースであるという制約条件の元で残響推定を行うことにより、ディレイエフェクトの強度及び時刻を高精度に推定する。

CiNii
潜在的調波配分法に基づく隠れセミマルコフモデルを用いたベイズ的スコアアライメント

前澤陽, 後藤真孝, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 5R - 4 2011年03月
歌詞と音響特徴量を用いた楽曲の印象軌跡推定

西川直毅, 糸山克寿, 藤原弘将, 後藤真孝, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 5R - 3 2011年03月
調波パラメトリックNMFによる楽器演奏音響信号の分析合成

安良岡直希, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 5R - 1 2011年03月

　概要を見る

本稿では調波パラメトリックNonnegative Matrix Factorization (HPNMF) と呼ぶ新しい振幅スペクトログラムモデリング手法を用いた音源分離と演奏合成法について述べる．HPNMFでは，振幅スペクトログラムを直接因子分解するのではなく，各時刻のスペクトルを調波Gaussian Mixtureによりモデル化した上でその各倍音強度パラメータを楽曲全体で因子分解する．これにより基本周波数パラメータをNMFの枠組みの外側で適応でき，通常のNMFが苦手とするビブラート信号などを効率的に推定できる．HPNMFを用いて多重奏からの特定楽器パート音源分離と演奏音響信号再合成が高精度に実現されることを示す．

CiNii
Classification of Harmonic and Textural Keyboard Playing Style Using Acoustic Features

Jooyoung Ahn, 前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 4C - 2, 2011年03月

　概要を見る

Keyboard playing is a widely used method to represent musical idea,which is played in either harmonic or textural styles. The goal ofthis paper is to classify such style of the user's keyboard playingfrom its audio signal. Because the acoustic features for suchclassification is poorly studied, we defined acoustic features whichrepresent harmonic and textural playing style, and classified actualpractical keyboard playings.

CiNii
Speaker Localization Using Two-Channel Microphone on the SIG-2 Humanoid Robot

Uihyun Kim, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

情報処理学会第73回全国大会 2011 ( 1 ) 4C - 1 2011年03月

　概要を見る

This paper describes a speaker localization system using two microphonesembedded in the head of the SIG-2 humanoid robot. Nowadays, studies based onmicrophone arrays utilize many microphones to obtain high detectionperformance in acoustical engineering. However, using multiple microphoneswould raise costs and hinder maintenance. To achieve good results in speakerlocalization only with two low-cost microphones, we employ the statisticalmodel-based voice activity detection (VAD) and a sound-source-directionestimator based on the generalized cross-correlation with phase transform(GCC-PHAT). The experimental results show that SIG-2 can localize a callerreliably when the speaker calls robot's name in the azimuth.

CiNii
神経力学モデルの引込みによる相槌タイミング予測

佐野正太郎, 尾形哲也, 日下航, 高橋徹, 奥乃博

情報処理学会第73回全国大会 4P - 7 2011年03月
誤認識頻発状況下で選択肢列挙を行う音声対話システムとその評価

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 4P - 4 2011年03月
神経回路モデルによる言語とロボット動作の相互連想学習

日下航, 尾形哲也, 高橋徹, 奥乃博

情報処理学会第73回全国大会 3Q - 1 2011年03月
視聴覚統合ビートトラッキングを用いた音楽ロボットとギターとの合奏システム

糸原達彦, 大塚琢馬, 水本武志, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 1ZB - 2 2011年03月
F0・音韻長・パワー制御による歌声らしさ・話声らしさの変化の評価

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 2R - 6 2011年03月

　概要を見る

歌声，話声，歌声と話声の中間的な音声，歌舞伎や能の音声のそれぞれの歌声らしさ，話声らしさを評価する．人間の歌声らしさ・話声らしさに関する知覚は連続的に変化すると考えられるため，中間的に知覚される音声が存在する．従来の研究では歌声か，話声かのみを考えており，中間的な音声や歌声・話声以外を考慮していない．計算機で中間的な音声の評価ができれば人間の歌声らしさ・話声らしさの知覚機構解明に貢献できると考えられる．本報告ではF0・音韻長・パワーを制御することで中間的な音声を作る．合成した音声に対し聴取実験を行い，どのような制御が歌声らしさ・話声らしさ知覚に影響を与えるのか聴取実験で評価する．

CiNii
Time-of-flight camera based Probabilistic Polygonal Mesh mapping

Louis-Kenzo Cahier, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第73回全国大会 1T - 5 2011年03月
再帰結合神経回路モデルへのスパース構造導入による学習能力の向上

粟野皓光, 尾形哲也, 有江浩明, 谷淳, 高橋徹, 奥乃博

情報処理学会第73回全国大会 2011 ( 1 ) 1Q - 4 2011年03月

　概要を見る

本稿では，再帰結合型神経回路モデルへのスパース結合導入による性能向上を示す．近年，多様な時系列パターンを学習可能な，スパース結合型神経回路が着目を集めている．しかし通常これらのモデルは内部の結合重みが固定されており，学習能力には限界がある．我々は，異なる時定数のニューロン群からなる再帰結合型神経回路モデル，MTRNNの一部結合をスパース化し，全結合を学習可能としたモデルの性能評価を行った．スパース化率の異なるMTRNNに，アルファベット列からなる文章を学習させ，未知文及びノイズ文の認識・生成能力の評価を行った．実験の結果，スパース結合とすることで，全結合の場合よりも性能を向上できることが確認された．

CiNii
神経力学モデルによる予測可能性を用いた身体識別

信田春満, 尾形哲也, 日下航, 高橋徹, 奥乃博

情報処理学会第73回全国大会 1Q - 1 2011年03月
Preface

Tetsuya Ogata, Tetsuo Sawaragi, Tadahiro Taniguchi

Advanced Robotics 25 ( 17 ) 2125 - 2126 2011年

DOI

Scopus
Robot Audition based on Multiple-Input Independent Component Analysis for Recognizing Barge-In Speech under Reverberation

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

JRSJ 27 ( 7 ) 782 - 792 2011年 [査読有り]

　概要を見る

This paper presents a new method based on independent component analysis (ICA) for enhancing a target source and suppressing other interfering sound sources, supposed that the latter are known. The method can provides in a reverberant environment a barge-in-able robot audition system; that is, the user can talk to the robot at any time even when the robot speaks. Our method separates and dereverberates the user's speech and the robot's one by using Multiple Input ICA. The critical issue for real-time processing is to reduce the computational complexity of Multiple Input ICA to the linear order of the reverberation time, which has not been proposed so far. We attain it by exploit the property of the independence relationship between late observed signals and late speech signals. Experimental results show that 1) the computational complexity of our method is less than the na&iuml;ve Multiple Input ICA method, and that 2) our method improves word correctness of automatic speech recognition under barge-in and reverberant situations; by at most 40 points for reverberation time of 240[ms] and 30 points for 670[ms].

DOI CiNii
Phoneme Acquisition based on Vowel Imitation Model using Recurrent Neural Network and Physical Vocal Tract Model

Hisashi Kanda, Tetsuya Ogata, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno

JRSJ 27 ( 7 ) 802 - 813 2011年 [査読有り]

　概要を見る

This paper proposes a continuous vowel imitation system that explains the process of phoneme acquisition by infants from the dynamical systems perspective. Almost existing models concerning this process dealt with discrete phoneme sequences. Human infants, however, have no knowledge of phoneme innately. They perceive speech sounds as continuous acoustic signals. The imitation target of this study is continuous acoustic signals including unknown numbers and kinds of phonemes. The key ideas of the model are (1) the use of a physical vocal tract model called the Maeda model for embodying the motor theory of speech perception, (2) the use of a dynamical system called the Recurrent Neural Network with Parametric Bias (RNNPB) trained with both dynamics of the acoustic signals and articulatory movements of the Maeda model, and (3) the segmenting method of a temporal sequence using the prediction error of the RNNPB model. The experiments of our model demonstrated following results: (a) the self-organization of the vowel structure into attractors of RNNPB model, (b) the improvement of vowel imitation using movement of the Maeda model, and (c) the generation of clear vowels based on the bubbling process trained with a few random utterances. These results suggest that our model reflects the process of phoneme acquisition.

DOI CiNii
Real-time audio-to-score alignment using particle filter for coplayer music robots

Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Eurasip Journal on Advances in Signal Processing 2011 2011年 [査読有り]

　概要を見る

Our goal is to develop a coplayer music robot capable of presenting a musical expression together with humans. Although many instrument-performing robots exist, they may have difficulty playing with human performers due to the lack of the synchronization function. The robot has to follow differences in humans' performance such as temporal fluctuations to play with human performers. We classify synchronization and musical expression into two levels: (1) melody level and (2) rhythm level to cope with erroneous synchronizations. The idea is as follows: When the synchronization with the melody is reliable, respond to the pitch the robot hears, when the synchronization is uncertain, try to follow the rhythm of the music. Our method estimates the score position for the melody level and the tempo for the rhythm level. The reliability of the score position estimation is extracted from the probability distribution of the score position. The experimental results demonstrate that our method outperforms the existing score following system in 16 songs out of 20 polyphonic songs. The error in the prediction of the score position is reduced by 69 on average. The results also revealed that the switching mechanism alleviates the error in the estimation of the score position. Copyright © 2011 Takuma Otsuka, et al.

DOI

Scopus

30

被引用数

(Scopus)
People Detection Based on Spatial Mapping of Friendliness and Floor Boundary Points for a Mobile Navigation Robot

Tasaki Tsuyoshi, Ozaki Fumio, Matsuhira Nobuto, Ogata Tetsuya, Okuno Hiroshi G.

Journal of Robotics 2011 1 - 10 2011年 [査読有り]

　概要を見る

Navigation robots must single out partners requiring navigation and move in the cluttered environment where people walk around. Developing such robots requires two different people detections: detecting partners and detecting all moving people around the robots. For detecting partners, we design divided spaces based on the spatial relationships and sensing ranges. Mapping the friendliness of each divided space based on the stimulus from the multiple sensors to detect people calling robots positively, robots detect partners on the highest friendliness space. For detecting moving people, we regard objects' floor boundary points in an omnidirectional image as obstacles. We classify obstacles as moving people by comparing movement of each point with robot movement using odometry data, dynamically changing thresholds to detect. Our robot detected 95.0% of partners while it stands by and interacts with people and detected 85.0% of moving people while robot moves, which was four times higher than previous methods did.

DOI CiNii
Towards Written Text Recognition Based on Handwriting Experiences Using a Recurrent Neural Network.

Shun Nishide, Jun Tani, Hiroshi G. Okuno, Tetsuya Ogata

Adv. Robotics 25 ( 17 ) 2173 - 2187 2011年 [査読有り]

　概要を見る

In this paper, we propose a model for recognizing written text through prediction of a handwriting sequence. The approach is based on findings in the brain sciences field. When recognizing written text, humans are said to unintentionally trace its handwriting sequence in their brains. Likewise, we aim to create a model that predicts a handwriting sequence from a static image of written text. The predicted handwriting sequence would be used to recognize the text. As the first step towards the goal, we created a model using neural networks, and evaluated the learning and recognition capability of the model using single Japanese characters. First, the handwriting image sequences for training are self-organized into image features using a self-organizing map. The self-organized image features are used to train the neuro-dynamics learning model. For recognition, we used both trained and untrained image sequences to evaluate the capability of the model to adapt to unknown data. The results of two experiments using 10 Japanese characters show the effectivity of the model. (C) Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2011

DOI
Polyphonic audio-to-score alignment based on Bayesian latent harmonic allocation hidden Markov model

Akira Maezawa, Hiroshi G. Okuno, Tetsuya Ogata, Masataka Goto

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 185 - 188 2011年 [査読有り]

　概要を見る

This paper presents a Bayesian method for temporally aligning a music score and an audio rendition. A critical problem in audio-to-score alignment is in dealing with the wide variety of timbre and volume of the audio rendition. In contrast with existing works that achieve this through ad-hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method by modeling music performance as a Bayesian Hidden Markov Model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After attenuating reverberation, variational Bayes method is used to iteratively adapt the alignment, instrument tone model and the volume balance at each position of the score. The method is evaluated using sixty works of classical music of a variety of instrumentation ranging from solo piano to full orchestra. We verify that our method improves the alignment accuracy compared to dynamic time warping based on chroma vector for orchestral music, or our method employed in a maximum likelihood setting. © 2011 IEEE.

DOI

Scopus

15

被引用数

(Scopus)
Simultaneous processing of sound source separation and musical instrument identification using Bayesian spectral modeling

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 3816 - 3819 2011年 [査読有り]

　概要を見る

This paper presents a method of both separating audio mixtures into sound sources and identifying the musical instruments of the sources. A statistical tone model of the power spectrogram, called an integrated model, is defined and source separation and instrument identification are carried out on the basis of Bayesian inference. Since, the parameter distributions of the integrated model depend on each instrument, the instrument name is identified by selecting the one that has the maximum relative instrument weight. Experimental results showed correct instrument identification enables precise source separation even when many overtones overlap. © 2011 IEEE.

DOI

Scopus

14

被引用数

(Scopus)
Cluster self-organization of known and unknown environmental sounds using recurrent neural network

Yang Zhang, Shun Nishide, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6791 LNCS ( PART 1 ) 167 - 175 2011年 [査読有り]

　概要を見る

Our goal is to develop a system that is able to learn and classify environmental sounds for robots working in the real world. In the real world, two main restrictions pertain in learning. First, the system has to learn using only a small amount of data in a limited time because of hardware restrictions. Second, it has to adapt to unknown data since it is virtually impossible to collect samples of all environmental sounds. We used a neuro-dynamical model to build a prediction and classification system which can self-organize sound classes into parameters by learning samples. The proposed system searches space of parameters for classifying. In the experiment, we evaluated the accuracy of classification for known and unknown sound classes. © 2011 Springer-Verlag.

DOI

Scopus

1

被引用数

(Scopus)
Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures

Yasuharu Hirasawa, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6703 LNAI ( PART 1 ) 348 - 358 2011年

　概要を見る

In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with separated speech signals is poor due to its spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses a L1-norm minimization method in order to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation. © 2011 Springer-Verlag.

DOI

Scopus
Environmental sound recognition for robot audition using matching-pursuit

Nobuhide Yamakawa, Toru Takahashi, Tetsuro Kitahara, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6704 LNAI ( PART 2 ) 1 - 10 2011年

　概要を見る

Our goal is to achieve a robot audition system that is capable of recognizing multiple environmental sounds and making use of them in human-robot interaction. The main problems in environmental sound recognition in robot audition are: (1) recognition under a large amount of background noise including the noise from the robot itself, and (2) the necessity of robust feature extraction against spectrum distortion due to separation of multiple sound sources. This paper presents the environmental recognition of two sound sources fired simultaneously using matching pursuit (MP) with the Gabor wavelet, which extracts salient audio features from a signal. The two environmental sounds come from different directions, and they are localized by multiple signal classification and, using their geometric information, separated by geometric source separation with the aid of measured head-related transfer functions. The experimental results show the noise-robustness of MP although the performance depends on the properties of the sound sources. © 2011 Springer-Verlag.

DOI

Scopus

18

被引用数

(Scopus)
Fast and simple iterative algorithm of Lp-norm minimization for under-determined speech separation

Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 1745 - 1748 2011年 [査読有り]

　概要を見る

This paper presents an efficient algorithm to solve Lp-norm minimization problem for under-determined speech separation; that is, for the case that there are more sound sources than microphones. We employ an auxiliary function method in order to derive update rules under the assumption that the amplitude of each sound source follows generalized Gaussian distribution. Experiments reveal that our method solves the L1-norm minimization problem ten times faster than a general solver, and also solves Lp-norm minimization problem efficiently, especially when the parameter p is small; when p is not more than 0.7, it runs in real-time without loss of separation quality. Copyright © 2011 ISCA.
Bayesian extension of MUSIC for sound source localization and tracking

Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 3109 - 3112 2011年 [査読有り]

　概要を見る

This paper presents a Bayesian extension of MUSIC-based sound source localization (SSL) and tracking method. SSL is important for distant speech enhancement and simultaneous speech separation for improving speech recognition, as well as for auditory scene analysis by mobile robots. One of the draw- backs of existing SSL methods is the necessity of careful param- eter tunings, e.g., the sound source detection threshold depend- ing on the reverberation time and the number of sources. Our contribution consists of (1) automatic parameter estimation in the variational Bayesian framework and (2) tracking of sound sources with reliability. Experimental results demonstrate our method robustly tracks multiple sound sources in a reverberant environment with RT20 = 840 (ms). Copyright © 2011 ISCA.
Particle-filter based audio-visual beat-tracking for music robot ensemble with human guitarist

Tatsuhiko Itohara, Takuma Otsuka, Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 118 - 124 2011年

　概要を見る

This paper presents an audio-visual beat-tracking method for ensemble robots with a human guitarist. Beat-tracking, or estimation of tempo and beat times of music, is critical to the high quality of musical ensemble performance. Since a human plays the guitar in out-beat in back beat and syncopation, the main problems of beat-tracking of a human's guitar playing are twofold: tempo changes and varying note lengths. Most conventional methods have not addressed human's guitar playing. Therefore, they lack the adaptation of either of the problems. To solve the problems simultaneously, our method uses not only audio but visual features. We extract audio features with Spectro-Temporal Pattern Matching (STPM) and visual features with optical flow, mean shift and Hough transform. Our beat-tracking estimates tempo and beat time using a particle filter; both acoustic feature of guitar sounds and visual features of arm motions are represented as particles. The particle is determined based on prior distribution of audio and visual features, respectively Experimental results confirm that our integrated audio-visual approach is robust against tempo changes and varying note lengths. In addition, they also show that estimation convergence rate depends only a little on the number of particles. The real-time factor is 0.88 when the number of particles is 200, and this shows out method works in real-time. © 2011 IEEE.

DOI

Scopus

12

被引用数

(Scopus)
Improvement of speaker localization by considering multipath interference of sound wave for binaural robot audition

Ui Hyun Kim, Takeshi Mizumoto, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 2910 - 2915 2011年 [査読有り]

　概要を見る

This paper presents an improved speaker localization method based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT) for binaural robot audition. The problem with the conventional direction-of-arrival (DOA) estimation based on the GCC-PHAT method is a multipath interference whereby a sound wave travels to microphones via the front-head path and the back-head path in binaural robot audition. This paper describes a new time delay factor for the GCC-PHAT method to compensate multipath interference on the assumption of spherical robot head. In addition, the restriction of the time difference of arrival (TDOA) estimation by the sampling frequency is also solved by applying the maximum likelihood (ML) estimation in frequency domain. Experiments conducted in the SIG-2 humanoid robot show that the proposed method reduces localization errors by 17.8 degrees on average and by over 35 degrees in side directions comparing to the conventional DOA estimation. © 2011 IEEE.

DOI

Scopus

8

被引用数

(Scopus)
Classification of known and unknown environmental sounds based on self-organized space using a recurrent neural network

Yang Zhang, Tetsuya Ogata, Shun Nishide, Toru Takahashi, Hiroshi G. Okuno

Advanced Robotics 25 ( 17 ) 2127 - 2141 2011年 [査読有り]

　概要を見る

Our goal is to develop a system to learn and classify environmental sounds for robots working in the real world. In the real world, two main restrictions pertain in learning. (i) Robots have to learn using only a small amount of data in a limited time because of hardware restrictions. (ii) The system has to adapt to unknown data since it is virtually impossible to collect samples of all environmental sounds. We used a neuro-dynamical model to build a prediction and classification system. This neuro-dynamical model can self-organize sound classes into parameters by learning samples. The sound classification space, constructed by these parameters, is structured for the sound generation dynamics and obtains clusters not only for known classes, but also unknown classes. The proposed system searches on the basis of the sound classification space for classifying. In the experiment, we evaluated the accuracy of classification for both known and unknown sound classes. © 2011 Koninklijke Brill NV, Leiden.

DOI

Scopus

2

被引用数

(Scopus)
Handwriting prediction based character recognition using recurrent neural network

Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata, Jun Tani

Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics 2549 - 2554 2011年 [査読有り]

　概要を見る

Humans are said to unintentionally trace handwriting sequences in their brains based on handwriting experiences when recognizing written text. In this paper, we propose a model for predicting handwriting sequence for written text recognition based on handwriting experiences. The model is first trained using image sequences acquired while writing text. The image features of sequences are self-organized from the images using Self-Organizing Map. The feature sequences are used to train a neuro-dynamics learning model. For recognition, the text image is input into the model for predicting the handwriting sequence and recognition of the text. We conducted two experiments using ten Japanese characters. The results of the experiments show the effectivity of the model. © 2011 IEEE.

DOI

Scopus

13

被引用数

(Scopus)
Incremental bayesian audio-to-score alignment with flexible harmonic structure models

Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011 525 - 530 2011年 [査読有り]

　概要を見る

Music information retrieval, especially the audio-to-score alignment problem, often involves a matching problem between the audio and symbolic representations. We must cope with uncertainty in the audio signal generated from the score in a symbolic representation such as the variation in the timbre or temporal fluctuations. Existing audio-to-score alignment methods are sometimes vulnerable to the uncertainty in which multiple notes are simultaneously played with a variety of timbres because these methods rely on static observation models. For example, a chroma vector or a fixed harmonic structure template is used under the assumption that musical notes in a chord are all in the same volume and timbre. This paper presents a particle filterbased audio-to-score alignment method with a flexible observation model based on latent harmonic allocation. Our method adapts to the harmonic structure for the audio-toscore matching based on the observation of the audio signal through Bayesian inference. Experimental results with 20 polyphonic songs reveal that our method is effective when more number of instruments are involved in the ensemble. © 2011 International Society for Music Information Retrieval.
Converting emotional voice to motion for robot telepresence

Angelica Lim, Tetsuya Ogata, Hiroshi G. Okuno

IEEE-RAS International Conference on Humanoid Robots 472 - 479 2011年 [査読有り]

　概要を見る

In this paper we present a new method for producing affective motion for humanoid robots. The NAO robot, like other humanoids, does not possess facial features to convey emotion. Instead, our proposed system generates pose-independent robot movement using a description of emotion through speed, intensity, regularity and extent (DESIRE). We show how the DESIRE framework can link the emotional content of voice and gesture, without the need for an emotion recognition system. Our results show that DESIRE movement can be used to effectively convey at least four emotions with user agreement 60-75%, and that voices converted to motion through SIRE maintained the same emotion significantly higher than chance, even across cultures (German to Japanese). Additionally, portrayals recognized as happiness were rated significantly easier to understand with motion over voice alone. © 2011 IEEE.

DOI

Scopus

21

被引用数

(Scopus)
A musical mood trajectory estimation method using lyrics and acoustic features

Naoki Nishikawa, Katsutoshi Itoyama, Hiromasa Fujihara, Masataka Goto, Tetsuya Ogata, Hiroshi G. Okuno

MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - MIRUM 2011 Workshop, MIRUM'11 51 - 56 2011年 [査読有り]

　概要を見る

In this paper, we present a new method that represents an overall musical time-varying impression of a song by a pair of mood trajectories estimated from lyrics and audio signals. The mood trajectory of the lyrics is obtained by using the probabilistic latent semantic analysis (PLSA) to estimate topics (representing impressions) from words in the lyrics. The mood trajectory of the audio signals is estimated from acoustic features by using the multiple linear regression analysis. In our experiments, the mood trajectories of 100 songs in Last.fm's Best of 2010 were estimated. The detailed analysis of the 100 songs confirms that acoustic features provide more accurate mood trajectory and the 21% resulting mood trajectories are matched to realistic musical mood available at Last.fm. © 2011 ACM.

DOI

Scopus

6

被引用数

(Scopus)
Use of a sparse structure to improve learning performance of recurrent neural networks

Hiromitsu Awano, Shun Nishide, Hiroaki Arie, Jun Tani, Toru Takahashi, Hiroshi G. Okuno, Tetsuya Ogata

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7064 LNCS ( PART 3 ) 323 - 331 2011年 [査読有り]

　概要を見る

The objective of our study is to find out how a sparse structure affects the performance of a recurrent neural network (RNN). Only a few existing studies have dealt with the sparse structure of RNN with learning like Back Propagation Through Time (BPTT). In this paper, we propose a RNN with sparse connection and BPTT called Multiple time scale RNN (MTRNN). Then, we investigated how sparse connection affects generalization performance and noise robustness. In the experiments using data composed of alphabetic sequences, the MTRNN showed the best generalization performance when the connection rate was 40%. We also measured sparseness of neural activity and found out that sparseness of neural activity corresponds to generalization performance. These results means that sparse connection improved learning performance and sparseness of neural activity would be used as metrics of generalization performance. © 2011 Springer-Verlag.

DOI

Scopus

4

被引用数

(Scopus)
Exploring movable space using rhythmical active touch in disordered obstacle environment

Kenri Kodaka, Tetsuya Ogata, Hirotaka Ohta, Shigeki Sugano

2011 IEEE/SICE International Symposium on System Integration, SII 2011 485 - 490 2011年

　概要を見る

We propose a novel navigation system for adaptively exploring an obstacle space using diverse ways of touching an object. Conventional navigation models are typically based on the avoidance of obstacles, i.e., avoiding collision. However, actual disordered space may be full of various kinds of obstacles. To reach a destination in such a space, a robot requires an active approach for avoiding a deadlock with obstacles or changing the obstacle configuration to find an open space using diverse ways of touching an object. We solved this problem by generating locally diverse moving patterns by using an action model with rhythmical oscillation in addition to a localization model using a particle filter. The proposed model was demonstrated to be effective through an experiment where a robot navigated to a destination behind partially movable obstacles using rhythmical active touch. © 2011 IEEE.

DOI

Scopus
Predicting listener back-channels for human-agent interaction using neuro-dynamical model

Shotaro Sano, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

2011 IEEE/SICE International Symposium on System Integration, SII 2011 18 - 23 2011年

　概要を見る

The goal of our work is to create natural verbal interaction between humans and speech dialogue agents. In this paper, we focus on generations of back-channel for speech dialogue agents the same way humans do. To create such a system, the system needs to predict the appropriate timing of back-channel on the basis of the human's speech. For the prediction model, we use a neuro-dynamical system called a multiple timescale recurrent neural network (MTRNN). The model is trained using an actual corpus of a poster session of the IMADE project using the presenter's prosodic and visual information as features. Using the model, we conducted back-channel timing prediction experiments. The results showed that our system could predict back-channel timing about 0.5 seconds before generation of back-channel response. Comparing the results with the actual back-channel timing in the corpus, the system showed 37.1% of recall, 31.7% of precision, and 34.2% of F-measure. These results show the model to effectively predict and generate back-channel responses. © 2011 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Identification of self-body based on dynamic predictability using neuro-dynamical system

Harumitsu Nobuta, Shun Nishide, Hiroshi G. Okuno, Tetsuya Ogata

2011 IEEE/SICE International Symposium on System Integration, SII 2011 256 - 261 2011年

　概要を見る

The goal of our work is to acquire an internal model through a robot's experience. The internal model has the ability for mutual conversion between motor commands and movement of the body (e.g. hand) in view. Unlike other works, which assume the robot's body to be extracted in its view, we assume that external moving objects are also included in its view. We introduce predictability as a measure to segregate such objects from the robot's body: the robot's body is predictable while moving objects are not. Prediction is conducted using a neuro-dynamical system called the multiple timescales recurrent neural network (MTRNN). The prediction results of the robot's body are compared with the actual motion to distinguish the robot's body from other objects. For evaluation, we conducted an experiment with the robot moving its hand while moving objects were in view. The results of the experiment showed that the prediction of the robot's hand is 3.86 times as accurate as that of others on average. These results show the effectiveness of using predictability as a measure to acquire an internal model in an environment that includes both a robot's body and other moving objects in view. © 2011 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Selecting help messages by using robust grammar verification for handling out-of-grammar utterances in spoken dialogue systems

Kazunori Komatani, Yuichiro Fukubayashi, Satoshi Ikeda, Tetsuya Ogata, Hiroshi G. Okuno

IEICE Transactions on Information and Systems E93-D ( 12 ) 3359 - 3367 2010年12月 [査読有り]

　概要を見る

We address the issue of out-of-grammar (OOG) utterances in spoken dialogue systems by generating help messages. Help message generation for OOG utterances is a challenge because language understanding based on automatic speech recognition (ASR) of OOG utterances is usually erroneous; important words are often misrecognized or missing from such utterances. Our grammar verification method uses a weighted finite-state transducer, to accurately identify the grammar rule that the user intended to use for the utterance, even if important words are missing from the ASR results. We then use a ranking algorithm, RankBoost, to rank help message candidates in order of likely usefulness. Its features include the grammar verification results and the utterance history representing the user's experience. Copyright © 2010.

DOI

Scopus

1

被引用数

(Scopus)
コミュニケーションロボットWAMOEBA-3の開発

菅野重樹, 尾形哲也, 菅佑樹, 西佑起

第11回システムインテグレーション部門講演会 (SI2010), 計測自動制御学会 1E3 - 3 2010年12月
自己形態主張型カスタマイズロボットの開発

山崎由美子, 守良真, 菅佑樹, 尾形哲也, 菅野重樹

第11回システムインテグレーション部門講演会 (SI2010),計測自動制御学会 2010 1E3 - 4 2010年12月

　概要を見る

Our goal is to create a robot which can communicate with people for a long time. To make this possible, we are developing a customizable communication robot "WEAR (Waseda Extendable ARchitecture)". We also provided modules, and experimental subjects can customize the robot by inserting them into the module-port. At the same time, the robot can recognize the customize, and it also can express its autonomy by rejecting the customization by ejecting the module. In this paper, we propose to introduce a requesting behavior (making sound and moving the module-port) into the robot as well as the rejecting behavior. As the result of experiments, we found that the impression of the sound differs among subjects in the view point of understanding the robot's request.

CiNii
再帰神経回路による環境音の構造化と識別

張陽, 尾形哲也, 高橋徹, 奥乃博

第11回システムインテグレーション部門講演会 (SI2010), 計測自動制御学会 2I1 - 5 2010年12月
ロボット間マルチモーダルコミュニケーションの記号過程

尾形哲也, 日下航

システム/制御/情報 : システム制御情報学会誌 = Systems, control and information 54 ( 11 ) 433 - 438 2010年11月

CiNii
テルミンの音高・音量特性のモデルに基づくテルミン演奏ロボットの開発

水本武志, 辻野広司, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 51 ( 10 ) 2007 - 2019 2010年10月

　概要を見る

本論文では，テルミンを演奏するロボットのためのテルミンの特性モデルと演奏動作生成手法について報告する．テルミンとは，演奏者の手の位置を動かして演奏する電子楽器である．楽器との物理的接触なしで連続的に音高と音量を操作できるので，ハードウェア構成が異なるロボットにも適用可能であるという点で，移植性が高い．テルミン演奏ロボットの主たる課題は，(1)動作生成の物理的な基準点がないので，演奏法学習に要する学習サンプル数が多くなること，および(2)演奏特性が静電的環境によって変化するので，適応的な演奏動作生成が必要であることの2点である．これらの課題に対して，我々は環境の影響をパラメータで表現した音高・音量特性モデルを構築し，少数の測定で音域内の任意の音を演奏できる制御手法を開発した．実験の結果，約12点の測定で音高が任意に制御できること，環境が変化しても所望の音高や音量で演奏できることを3種類のロボットで確認した．We present a theremin player robot towards an ensemble between humans and robots. A theremin, whose pitch and volume change continuously, can be played without any physical contacts. We thus expect that a robot system has high portability because it requires only few physical constraints. The problems for theremin playing are: (1) we have no physical reference points and (2) an environment affects sound characteristics seriously. To solve them, we develop a model-based feedforward arm control method based on our novel models of theremin's pitch and volume characteristics, which method realizes play an arbitrary sound using a few measurements. Experimental results show that our method works under four environments and three different robots.

CiNii
テルミンの音高・音量特性のモデルに基づくテルミン演奏ロボットの開発

水本武志, 辻野広司, 高橋徹, 駒谷和範, 尾形哲也

情報処理学会論文誌 51 ( 10 ) 2008 - 2019 2010年10月

CiNii
Inter-modality mapping in robot with recurrent neural network

Tetsuya Ogata, Shun Nishide, Hideki Kozima, Kazunori Komatani, Hiroshi G. Okuno

Pattern Recognition Letters 31 ( 12 ) 1560 - 1569 2010年09月 [査読有り]

　概要を見る

A system for mapping between different sensory modalities was developed for a robot system to enable it to generate motions expressing auditory signals and sounds generated by object movement. A recurrent neural network model with parametric bias, which has good generalization ability, is used as a learning model. Since the correspondences between auditory signals and visual signals are too numerous to memorize, the ability to generalize is indispensable. This system was implemented in the "Keepon" robot, and the robot was shown horizontal reciprocating or rotating motions with the sound of friction and falling or overturning motion with the sound of collision by manipulating a box object. Keepon behaved appropriately not only from learned events but also from unknown events and generated various sounds in accordance with observed motions. © 2009 Elsevier B.V. All rights reserved.

DOI

Scopus

23

被引用数

(Scopus)
多重奏音響信号中の歌唱音声の歌詞を自由に差し替える歌詞置換システム

安良岡直希, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

音響学会秋季学術講演会 2010年09月
Multimodal gesture recognition for robot musical accompaniment

Lim Angelica, Mizumoto Takeshi, Cahier Louis-Kenzo, Otsuka Takuma, Takahashi Toru, Ogata Tetsuya, Okuno Hiroshi G

日本ロボット学会第28回学術講演会, 名古屋工業大学 1C1 - 4 2010年09月

CiNii
調波構造を用いたL1ノルム最小化に基づく劣決定音源分離手法の性能評価

平澤恭治, 高橋徹, 尾形哲也, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 1H2 - 3 2010年09月
ロボット聴覚のためのMatching-Pursuitによる環境音の分離音認識

山川暢英, 高橋徹, 北原鉄朗, 尾形哲也, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 1H2 - 4 2010年09月
Dynamic Recognition of Enviromental Sounds with Recurrent Neural Network

張陽, 尾形哲也, 高橋徹, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 1H2 - 7 2010年09月
リサンプル-ブロック処理と並列化に基づくICAの実時間実装

武田龍, 中臺一博, 高橋徹, 尾形哲也, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 1H3 - 1, 2010年09月
打楽器とロボットの合奏のための結合振動子モデルに基づく打撃時刻予測手法

水本武志, 中臺一博, 大塚琢馬, 高橋徹, 尾形哲也, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 1H3 - 2 2010年09月
音楽ロボットのためのパーティクルフィルタを用いたテンポ・楽譜追従手法

大塚琢馬, 中臺一博, 高橋徹, 尾形哲也, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 1H3 - 6 2010年09月
Probabilistic polygonal mesh for 3D SLAM

Cahier Louis-Kenzo, Takahashi Toru, Ogata Tetsuya, Okuno Hiroshi G

日本ロボット学会第28回学術講演会, 名古屋工業大学 2B2 - 4 2010年09月
確信度を用いた物体配置作業における人間ロボット協調

粟野皓光, 尾形哲也, 西出俊, 高橋徹, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 3J1 - 6 2010年09月
自己形態主張型カスタマイズロボットの開発?音を用いたカスタマイズ要求主張行動の検証?

山崎由美子, 守良真, 近藤裕樹, 菅祐樹, 尾形哲也, 菅野重樹

日本ロボット学会第28回学術講演会, 名古屋工業大学 3C2 - 5 2010年09月
能動知覚経験に基づく物体特徴量の自己組織化と予測信頼性に基づく動作生成

西出俊, 尾形哲也, 谷淳, 高橋徹, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 3A2 - 5 2010年09月
MTRNNを用いた階層的言語構造の創発

日下航, 有江浩明, 谷淳, 尾形哲也, 高橋徹, 奥乃博

日本ロボット学会第28回学術講演会, 名古屋工業大学 3A2 - 7 2010年09月
バージイン許容音声対話システムにおけるユーザ発話の分析と指示対象同定への応用

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

研究報告音声言語情報処理（SLP） 2010 ( 21 ) 1 - 6 2010年07月

　概要を見る

本稿は，バージイン許容列挙型音声対話におけるユーザ発話の分析と，分析結果を応用した指示対象同定手法の拡張について報告する．バージイン許容音声対話では，個々のユーザやシステムの発話内容によってユーザの発話タイミングや発話表現が異なる．そこでこれらを事前確率として反映させ，発話意図解釈の性能向上を図る．我々はまず，ニュース読み上げとクイズの2つの列挙型対話システムで収集したユーザ発話 1584 発話を分析し，ユーザの参照表現発話率が個々のユーザやシステムの列挙項目長に依存することを明らかにした．さらに，これらの特性を指示対象同定の枠組みに組み込み，タイミングと音声認識結果の解釈の事前確率として反映させる．この事前確率の推定には，ロジスティック回帰を用いる．事前確率として一定値を用いた場合に比べて，指示対象同定精度が最大 6.2 ポイント向上することを実験により確認した．This paper reports the extension of identification method based on analyses of user utterance in barge-in-able spoken dialogue system which reads out items. Generally, user's behaviors such as barge-in timing and utterance expressions vary in accordance with the user's preference and the content of system utterances. To interpret users' intention robustly, first, we analyze 1584 utterances collected by our systems with quiz and news-listing tasks and reveal that the ratio of using referential expressions depends on individual users and average lengths of listed items. Second, we incorporate this tendency as a prior probability into our probabilistic framework for identifying user's intended item. This prior probability is calculated by logistic regression. Experimental results show that our method improves the identification accuracy by as many as 6.2 points in the best case over the non-informative prior.

CiNii
バージイン許容音声対話システムにおけるユーザ発話の分析と指示対象同定への応用

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

第回音声言語情報処理研究会情処研報,情報処理学会 Vol.2010 ( No.10 ) 2010 2010年07月
多重奏音響信号中の演奏をユーザー指定の旋律に差し替えるフレーズ置換システム

安良岡直希, 糸山克寿, 吉岡拓也, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

音楽情報科学研究会, つくば, 情報処理学会 Vol.2010-MUS-, No., pp. 2010年07月
SpeakBySinging: 歌声を話声に変換する話声合成システム

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

音楽情報科学研究会,つくば, 情報処理学会 Vol.2010-MUS-86, No., pp. 2010年07月
複数の言語モデルと言語理解モデルによる音声理解の高精度化

勝丸真樹, 中野幹生, 駒谷和範, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition) 93 ( 6 ) 879 - 888 2010年06月

　概要を見る

本論文では,音声対話システムにおいて,複数の言語モデルと複数の言語理解モデルを用いることで,高精度な音声理解を行う手法について述べる.ユーザの発話によって適した言語モデル・言語理解モデルの組合せは異なることから,単一の音声理解方式で様々な発話に対して高精度な音声理解を実現することは難しい.そこで本論文では,まず,複数の言語モデルと言語理解モデルを用いて複数の理解結果を得ることで,理解結果の候補を得る.次に,得られた複数の理解結果に対して,ロジスティック回帰に基づき発話単位の信頼度を付与し,その信頼度が最も高い理解結果を選択する.本論文では,言語モデルとして文法モデルとN-gramモデルの2種類,言語理解モデルとしてFinite-State Transducer(FST)とWeighted FST(WFST),Keyphrase-Extractorの3種類を用いた.評価実験では,言語モデル・言語理解モデルのいずれかを複数用いた場合と比較して,コンセプト理解精度の向上が得られた.また,従来のROVER法による理解結果の統合と比較し,本手法の有効性を示した.

CiNii
RNNを備えた２体のロボット間における身体性に基づいた動的コミュニケーションの創発

日下航, 尾形哲也, 小嶋秀樹, 高橋徹, 奥乃博

日本ロボット学会誌 28 ( 4 ) 532 - 543 2010年05月
複数の言語モデルと言語理解モデルによるラピッドプロトタイピング向け音声理解

勝丸真樹, 駒谷和範, 中野幹生, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

情報処理学会研究報告(CD-ROM) 2009 ( 6 ) ROMBUNNO.SLP-80,5 2010年04月

J-GLOBAL
実環境音声認識のためのロボット聴覚システム開発とパラメータチューニング

高橋徹, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 29 - 30 2010年03月

CiNii
ロボット音声対話におけるSemi-blind ICAを用いた自己発話キャンセル

武田龍, 中臺一博, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 27 - 28 2010年03月

CiNii
楽器音イコライザによる楽曲音響特徴変動と類似楽曲検索への応用

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 25 - 26 2010年03月

CiNii
バージイン許容音声対話におけるLSMによる許容発話範囲の拡張

松山匡子, 駒谷和範, 高橋徹, 尾形哲也, 奥乃博

全国大会講演論文集 72 129 - 130 2010年03月

CiNii
MTRNNを用いた単語と文法の階層的自己組織化による文の認識・生成

日下航, 有江浩明, 谷淳, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

全国大会講演論文集 72 525 - 526 2010年03月

CiNii
RNNを用いた行為予測による人間とロボットの協調物体配置

粟野皓光, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

全国大会講演論文集 72 395 - 396 2010年03月

CiNii
F0・振幅・音韻長の制御により歌声を話声に変換する話声合成システムSpeakBySinging

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 295 - 296 2010年03月

CiNii
環境音から擬音語への自動変換における特徴量抽出法の検討

山川暢英, 北原鉄朗, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 257 - 258 2010年03月

CiNii
複数自由度を用いて音高特性モデルに基づく音高制御を行うテルミン演奏ロボットの開発

水本武志, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 203 - 204 2010年03月

CiNii
ユーザの文法知識を状態に加えたPOMDPに基づく音声対話システム

穐山空道, 駒谷和範, 高橋徹, 尾形哲也, 奥乃博

全国大会講演論文集 72 291 - 292 2010年03月

CiNii
調波非調波GMMに基づくMIDI演奏音響信号に対する音色・演奏表情操作

安良岡直希, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 183 - 184 2010年03月

CiNii
スペクトル推定を用いたマイク数以上の同時発話に対する音声認識

平澤恭治, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 253 - 254 2010年03月

CiNii
複数の言語モデルと言語理解モデルによる音声理解手法のラピッドプロトタイピングへの適用

勝丸真樹, 駒谷和範, 中野幹生, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

全国大会講演論文集 72 243 - 244 2010年03月

CiNii
クラシック音楽理解力拡張インタフェースを目指して : 複数の演奏家による解釈共通旋律と解釈相違旋律の推定

前澤陽, 後藤真孝, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 72 ( 0 ) 143 - 144 2010年03月

CiNii J-GLOBAL
"Score Following by Particle Filtering for Music Robots, "

大塚琢馬, 中臺一博, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 5R - 7 2010年03月
調波非調波GMMに基づくMIDI演奏音響信号に対する音色・演奏表情操作

安良岡直希, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.5T - 5 2010年03月
クラシック音楽理解能力拡張インターフェイスのための同音旋律音量推定手法と主旋律推定への応用

前澤陽, 後藤真孝, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.3T - 1 2010年03月
楽器音イコライザによる楽曲音響特徴変動と類似楽曲検索への応用

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.6J - 6 2010年03月
F0・振幅・音韻長の制御により歌声を話声に変換する話声合成システムSpeakBySinging

阿曽慎平, 齋藤毅, 後藤真孝, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.6U - 1 2010年03月
MTRNNを用いた単語と文法の階層的自己組織化による文の認識・生成

日下航, 有江浩明, 谷淳, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

情報処理学会第72回全国大会 p.6W - 8 2010年03月
RNNを用いた行為予測による人間とロボットの協調物体配置

粟野皓光, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

情報処理学会第72回全国大会 p.5V - 6 2010年03月
環境音から擬音語への自動変換における特徴量抽出法の検討

山川暢英, 北原鉄朗, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.3U - 9 2010年03月
実環境音声認識のためのロボット聴覚システム開発とパラメータチューニング

高橋徹, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.6J - 8 2010年03月
ロボット音声対話におけるSemi-blind ICAを用いた自己発話キャンセル

武田龍, 中臺一博, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.6J - 7 2010年03月
複数自由度を用いて音高特性モデルに基づく音高制御を行うテルミン演奏ロボットの開発

水本武志, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.6T - 8 2010年03月
スペクトル推定を用いたマイク数以上の同時発話に対する音声認識

平澤恭治, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.3U - 7 2010年03月
ユーザの文法知識を状態に加えたPOMDPに基づく音声対話システム

穐山空道, 駒谷和範, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.5U - 9 2010年03月
バージイン許容音声対話におけるLSMによる許容発話範囲の拡張

松山匡子, 駒谷和範, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.5U - 9 2010年03月
Robot Musical Accompaniment: Real-time Synchronization using Visual Cue Recognition

Angelica Lim, 水本武志, 大塚琢馬, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 72 p.6T - 7 2010年03月

CiNii
複数の言語モデルと言語理解モデルによる音声理解手法のラピッドプロトタイピングへの適用

勝丸真樹, 駒谷和範, 中野幹生, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

情報処理学会第72回全国大会 p.3U - 2 2010年03月
複数の言語モデルと言語理解モデルによるラピッドプロトタイピング向け音声理解

勝丸真樹, 駒谷和範, 中野幹生, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

研究報告音声言語情報処理（SLP） 2010 ( 5 ) 1 - 6 2010年02月

　概要を見る

本稿では，少量の学習データでも高精度な音声理解を実現する手法について述べる．学習データが少ない場合，単一の音声理解方式による精度は低い傾向にある．そこで本手法では，まず，複数の言語モデルと言語理解モデルを用いて複数の理解結果を得ることで，対処可能な発話を増やす．次に，得られた複数の理解結果に対して，ロジスティック回帰に基づき発話単位の信頼度を付与し，その信頼度が最も高い理解結果を選択する．ロジスティック回帰には，学習データ増加時の回帰係数の変化量に着目することで，必要最低限の学習データを割り当てる．評価実験では，学習データが少ない場合でも，単一の音声理解方式と比較して，本手法が高い音声理解精度を得られることを示す．We aim to improve a speech understanding module with a small amount of training data. High performance is not obtained by single speech understanding methods especially when the amount of available training data is small. We utilize multiple language models (LMs) and language understanding models (LUMs) to cover various user utterances. Then, the most appropriate speech understanding result is selected from several candidates on the basis of confidence measures calculated by logistic regressions. We determine necessary amount of training data for the regressions by focusing on changes in their coefficients when the training data increases. We evaluate our method for various amounts of training data and confirm that our method outperforms every single speech understanding method even when only a small amount of training data is available.

CiNii
1P1-C18 ハードウェアをカスタマイズできるコミュニケーションロボットに関する研究 : ロボットの理想形態への理解度推定に基づく動作生成

守良真, 山崎由美子, 近藤裕樹, 菅佑樹, 尾形哲也, 菅野重樹

ロボティクス・メカトロニクス講演会講演概要集 2010 "1P1 - C18(1)"-"1P1-C18(3)" 2010年

　概要を見る

Our goal is to create a robot which can communicate with people for a long time. We developed the customizable communication robot "WEAR" which is installed the system which can reject of user's customization based on its judgement. In this paper, we carried out an experiment to verify the user's impression when WEAR changes it's motion based on user's understanding of the judgement. As the results, we confirmed that when WEAR changes it's motion, users more interested in WEAR, compared to didn't changes motion.

CiNii
Soft missing-feature mask generation for robot audition.

Toru Takahashi 0001, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Paladyn J. Behav. Robotics 1 ( 1 ) 37 - 47 2010年

DOI

Scopus
Voice-awareness control for a humanoid robot consistent with its body posture and movements.

Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi 0001, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Paladyn J. Behav. Robotics 1 ( 1 ) 80 - 88 2010年

DOI

Scopus

6

被引用数

(Scopus)
Upper-limit evaluation of robot audition based on ICA-BSS in multi-source, barge-in and highly reverberant conditions

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 4366 - 4371 2010年 [査読有り]

　概要を見る

This paper presents the upper-limit evaluation of robot audition based on ICA-BSS in multi-source, barge-in and highly reverberant conditions. The goal is that the robot can automatically distinguish a target speech from its own speech and other sound sources in a reverberant environment. We focus on the multi-channel semi-blind ICA (MCSB-ICA), which is one of the sound source separation methods with a microphone array, to achieve such an audition system because it can separate sound source signals including reverberations with few assumptions on environments. The evaluation of MCSB-ICA has been limited to robot's speech separation and reverberation separation. In this paper, we evaluate MCSB-ICA extensively by applying it to multi-source separation problems under common reverberant environments. Experimental results prove that MCSB-ICA outperforms conventional ICA by 30 points in automatic speech recognition performance. ©2010 IEEE.

DOI

Scopus

4

被引用数

(Scopus)
Improvement in listening capability for humanoid robot HRP-2

Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 470 - 475 2010年 [査読有り]

　概要を見る

This paper describes improvement of sound source separation for a simultaneous automatic speech recognition (ASR) system of a humanoid robot. A recognition error in the system is caused by a separation error and interferences of other sources. In separability, an original geometric source separation (GSS) is improved. Our GSS uses a measured robot's head related transfer function (HRTF) to estimate a separation matrix. As an original GSS uses a simulated HRTF calculated based on a distance between microphone and sound source, there is a large mismatch between the simulated and the measured transfer functions. The mismatch causes a severe degradation of recognition performance. Faster convergence speed of separation matrix reduces separation error. Our approach gives a nearer initial separation matrix based on a measured transfer function from an optimal separation matrix than a simulated one. As a result, we expect that our GSS improves the convergence speed. Our GSS is also able to handle an adaptive step-size parameter. These new features are added into open source robot audition software (OSS) called "HARK" which is newly updated as version 1.0.0. The HARK has been installed on a HRP-2 humanoid with an 8-element microphone array. The listening capability of HRP-2 is evaluated by recognizing a target speech signal which is separated from a simultaneous speech signal by three talkers. The word correct rate (WCR) of ASR improves by 5 points under normal acoustic environments and by 10 points under noisy environments. Experimental results show that HARK 1.0.0 improves the robustness against noises. ©2010 IEEE.

DOI

Scopus

6

被引用数

(Scopus)
Improving identification accuracy by extending acceptable utterances in spoken dialogue system using barge-in timing

Kyoko Matsuyama, Kazunori Komatani, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6097 LNAI ( PART 2 ) 585 - 594 2010年 [査読有り]

　概要を見る

We describe a novel dialogue strategy enabling robust interaction under noisy environments where automatic speech recognition (ASR) results are not necessarily reliable. We have developed a method that exploits utterance timing together with ASR results to interpret user intention, that is, to identify one item that a user wants to indicate from system enumeration. The timing of utterances containing referential expressions is approximated by Gamma distribution, which is integrated with ASR results by expressing both of them as probabilities. In this paper, we improve the identification accuracy by extending the method. First, we enable interpretation of utterances including ordinal numbers, which appear several times in our data collected from users. Then we use proper acoustic models and parameters, improving the identification accuracy by 4.0% in total. We also show that Latent Semantic Mapping (LSM) enables more expressions to be handled in our framework. © 2010 Springer-Verlag.

DOI

Scopus

1

被引用数

(Scopus)
Violin fingering estimation based on violin pedagogical fingering model constrained by bowed sequence estimation from audio input

Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6098 LNAI ( PART 3 ) 249 - 259 2010年 [査読有り]

　概要を見る

This work presents an automated violin fingering estimation method that facilitates a student violinist acquire the "sound" of his/her favorite recording artist created by the artist's unique fingering. Our method realizes this by analyzing an audio recording played by the artist, and recuperating the most playable fingering that recreates the aural characteristics of the recording. Recovering the aural characteristics requires the bowed string estimation of an audio recording, and using the estimated result for optimal fingering decision. The former requires high accuracy and robustness against the use of different violins or brand of strings; and the latter needs to create a natural fingering for the violinist. We solve the first problem by detecting estimation errors using rule-based algorithms, and by adapting the estimator to the recording based on mean normalization. We solve the second problem by incorporating, in addition to generic stringed-instrument model used in existing studies, a fingering model that is based on pedagogical practices of violin playing, defined on a sequence of two or three notes. The accuracy of the bowed string estimator improved by 21 points in a realistic situation (38% → 59%) by incorporating error correction and mean normalization. Subjective evaluation of the optimal fingering decision algorithm by seven violinists on 22 musical excerpts showed that compared to the model used in existing studies, our proposed model was preferred over existing one (p=0.01), but no significant preference towards proposed method defined on sequence of two notes versus three notes was observed (p=0.05). © 2010 Springer-Verlag.

DOI

Scopus

8

被引用数

(Scopus)
Music-ensemble robot that is capable of playing the Theremin while listening to the accompanied music

Takuma Otsuka, Takeshi Mizumoto, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6096 LNAI ( PART 1 ) 102 - 112 2010年 [査読有り]

　概要を見る

Our goal is to achieve a musical ensemble among a robot and human musicians where the robot listens to the music with its own microphones. The main issues are (1) robust beat-tracking since the robot hears its own generated sounds in addition to the accompanied music, and (2) robust synchronizing its performance with the accompanied music even if humans' musical performance fluctuates. This paper presents a music-ensemble Thereminist robot implemented on the humanoid HRP-2 with the following three functions: (1) self-generated Theremin sound suppression by semi-blind Independent Component Analysis, (2) beat tracking robust against tempo fluctuation in humans' performance, and (3) feedforward control of Theremin pitch. Experimental results with a human drummer show the capability of this robot for the adaptation to the temporal fluctuation in his performance. © 2010 Springer-Verlag.

DOI

Scopus

8

被引用数

(Scopus)
Recognition and generation of sentences through self-organizing linguistic hierarchy using MTRNN

Wataru Hinoshita, Hiroaki Arie, Jun Tani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6098 LNAI ( PART 3 ) 42 - 51 2010年 [査読有り]

　概要を見る

We show that a Multiple Timescale Recurrent Neural Network (MTRNN) can acquire the capabilities of recognizing and generating sentences by self-organizing a hierarchical linguistic structure. There have been many studies aimed at finding whether a neural system such as the brain can acquire languages without innate linguistic faculties. These studies have found that some kinds of recurrent neural networks could learn grammar. However, these models could not acquire the capability of deterministically generating various sentences, which is an essential part of language functions. In addition, the existing models require a word set in advance to learn the grammar. Learning languages without previous knowledge about words requires the capability of hierarchical composition such as characters to words and words to sentences, which is the essence of the rich expressiveness of languages. In our experiment, we trained our model to learn language using only a sentence set without any previous knowledge about words or grammar. Our experimental results demonstrated that the model could acquire the capabilities of recognizing and deterministically generating grammatical sentences even if they were not learned. The analysis of neural activations in our model revealed that the MTRNN had self-organized the linguistic structure hierarchically by taking advantage of differences in the time scale among its neurons, more concretely, neurons that change the fastest represented "characters," those that change more slowly represented "words," and those that change the slowest represented "sentences." © 2010 Springer-Verlag.

DOI

Scopus

1

被引用数

(Scopus)
Design and implementation of two-level synchronization for an interactive music robot

Takuma Otsuka, Kazuhiro Nakadai, Tom Takahashi, Kazunori Komatanj, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the National Conference on Artificial Intelligence 2 1238 - 1244 2010年 [査読有り]

　概要を見る

Our goal is to develop an interactive music robot, i.e., a robot that presents a musical expression together with humans. A music interaction requires two important functions: synchronization with the music and musical expression, such as singing and dancing. Many instrument-performing robots are only capable of the latter function, they may have difficulty in playing live with human performers. The synchronization function is critical for the interaction. We classify synchronization and musical expression into two levels: (1) the rhythm level and (2) the melody level. Two issues in achieving two-layer synchronization and musical expression are: (1) simultaneous estimation of the rhythm structure and the current part of the music and (2) derivation of the estimation confidence to switch behavior between the rhythm level and the melody level. This paper presents a score following algorithm, incremental audio to score alignment, that conforms to the two-level synchronization design using a particle filter. Our method estimates the score position for the melody level and the tempo for the rhythm level. The reliability of the score position estimation is extracted from the probability distribution of the score position. Experiments are carried out using polyphonic jazz songs. The results confirm that our method switches levels in accordance with the difficulty of the score estimation. When the tempo of the music is less than 120 (beats per minute; bpm), the estimated score positions are accurate and reported; when the tempo is over 120 (bpm), the system tends to report only the tempo to suppress the error in the reported score position predictions. Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Automatic allocation of training data for rapid prototyping of speech understanding based on multiple model combination

Kazunori Komatani, Masaki Katsumaru, Mikio Nakano, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

Coling 2010 - 23rd International Conference on Computational Linguistics, Proceedings of the Conference 2 579 - 587 2010年

　概要を見る

The optimal choice of speech understanding method depends on the amount of training data available in rapid prototyping. A statistical method is ultimately chosen, but it is not clear at which point in the increase in training data a statistical method become effective. Our framework combines multiple automatic speech recognition (ASR) and language understanding (LU) modules to provide a set of speech understanding results and selects the best result among them. The issue is how to allocate training data to statistical modules and the selection module in order to avoid overfitting in training and obtain better performance. This paper presents an automatic training data allocation method that is based on the change in the coefficients of the logistic regression functions used in the selection module. Experimental evaluation showed that our allocation method outperformed baseline methods that use a single ASR module and a single LU module at every point while training data increase.
Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy

Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 3050 - 3053 2010年 [査読有り]

　概要を見る

In our barge-in-able spoken dialogue system, the user's behaviors such as barge-in timing and utterance expressions vary according to his/her characteristics and situations. The system adapts to the behaviors by modeling them. We analyzed 1584 utterances collected by our systems of quiz and news-listing tasks and showed that ratio of using referential expressions depends on individual users and average lengths of listed items. This tendency was incorporated as a prior probability into our method and improved the identification accuracy of the user's intended items. © 2010 ISCA.
Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition

Nobuhide Yamakawa, Tetsuro Kitahara, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 2342 - 2345 2010年 [査読有り]

　概要を見る

Research on environmental sound recognition has not shown great development in comparison with that on speech and musical signals. One of the reasons is that the sound category of environmental sounds covers a broad range of acoustical natures. We classified them in order to explore suitable recognition techniques for each characteristic. We focus on impulsive sounds and their non-stationary feature within and between analytic frames. We used matching-pursuit as a framework to use wavelet analysis for extracting temporal variation of audio features inside a frame. We also investigated the validity of modeling decaying patterns of sounds using Hidden markov models. Experimental results indicate that sounds with multiple impulsive signals are recognized better by using time-frequency analyzing bases than by frequency domain analysis. Classification of sound classes with a long and clear decaying pattern improves when HMMs with multiple number of hidden states are applied. © 2010 ISCA.
SpeakBySinging: Converting singing voices to speaking voices while retaining voice timbre

Shimpei Aso, Takeshi Saitou, Masataka Goto, Katsutoshi Itoyama, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

13th International Conference on Digital Audio Effects, DAFx 2010 Proceedings 14 - 121 2010年

　概要を見る

This paper describes a singing-to-speaking synthesis system called "SpeakBySinging" that can synthesize a speaking voice from an input singing voice and the song lyrics. The system controls three acoustic features that determine the difference between speaking and singing voices: the fundamental frequency (F0), phoneme duration, and power (volume). By changing these features of a singing voice, the system synthesizes a speaking voice while retaining the timbre of the singing voice. The system first analyzes the singing voice to extract the F0 contour, the duration of each phoneme of the lyrics, and the power. These features are then converted to target values that are obtained by feeding the lyrics into a traditional text-to-speech (TTS) system. The system finally generates a speaking voice that preserves the timbre of the singing voice but has speech-like features. Experimental results show that SpeakBySinging can convert singing voices into speaking voices whose timbre is almost the same as the original singing voices.
Human-robot cooperation in arrangement of objects using confidence measure of neuro-dynamical system

Hiromitsu Awano, Tetsuya Ogata, Shun Nishide, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno

Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics 2533 - 2538 2010年 [査読有り]

　概要を見る

The objective of our study was to develop dynamic collaboration between a human and a robot. Most conventional studies have created pre-designed rule-based collaboration systems to determine the timing and behavior of robots to participate in tasks. Our aim is to introduce the confidence of the task as a criterion for robots to determine their timing and behavior. In this paper, we report the effectiveness of applying reproduction accuracy as a measure for quantitatively evaluating confidence in an object arrangement task. Our method is comprised of three phases. First, we obtain human-robot interaction data through the Wizard of OZ method. Second, the obtained data are trained using a neuro-dynamical system, namely, the Multiple Time-scales Recurrent Neural Network (MTRNN). Finally, the prediction error in MTRNN is applied as a confidence measure to determine the robot's behavior. The robot participated in the task when its confidence was high, while it just observed when its confidence was low. Training data were acquired using an actual robot platform, Hiro. The method was evaluated using a robot simulator. The results revealed that motion trajectories could be precisely reproduced with a high degree of confidence, demonstrating the effectiveness of the method. ©2010 IEEE.

DOI

Scopus

5

被引用数

(Scopus)
Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions

Yasuharu Hirasawa, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings 450 - 457 2010年 [査読有り]

　概要を見る

In real-world situations, a robot may often encounter "under- determined" situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use a maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, the performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points. ©2010 IEEE.

DOI

Scopus
Motion generation based on reliable predictability using self-organized object features

Shun Nishide, Tetsuya Ogata, Jun Tani, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings 3453 - 3458 2010年 [査読有り]

　概要を見る

Predictability is an important factor for determining robot motions. This paper presents a model to generate robot motions based on reliable predictability evaluated through a dynamics learning model which self-organizes object features. The model is composed of a dynamics learning module, namely Recurrent Neural Network with Parametric Bias (RNNPB), and a hierarchical neural network as a feature extraction module. The model inputs raw object images and robot motions. Through bi-directional training of the two models, object features which describe the object motion are self-organized in the output of the hierarchical neural network, which is linked to the input of RNNPB. After training, the model searches for the robot motion with high reliable predictability of object motion. Experiments were performed with the robot's pushing motion with a variety of objects to generate sliding, falling over, bouncing, and rolling motions. For objects with single motion possibility, the robot tended to generate motions that induce the object motion. For objects with two motion possibilities, the robot evenly generated motions that induce the two object motions. ©2010 IEEE.

DOI

Scopus
An improvement in automatic speech recognition using soft missing feature masks for robot audition

Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings 964 - 969 2010年 [査読有り]

　概要を見る

We describe integration of preprocessing and automatic speech recognition based on Missing-Feature-Theory (MFT) to recognize a highly interfered speech signal, such as the signal in a narrow angle between a desired and interfered speakers. As a speech signal separated from a mixture of speech signals includes the leakage from other speech signals, recognition performance of the separated speech degrades. An important problem is estimating the leakage in time-frequency components. Once the leakage is estimated, we can generate missing feature masks (MFM) automatically by using our method. A new weighted sigmoid function is introduced for our MFM generation method. An experiment shows that a word correct rate improves from 66 % to 74 % by using our MFM generation method tuned by a search base approach in the parameter space. ©2010 IEEE.

DOI

Scopus

3

被引用数

(Scopus)
Robot musical accompaniment: Integrating audio and visual cues for real-time synchronization with a human flutist

Angelica Lim, Takeshi Mizumoto, Louis Kenzo Cahier, Takuma Otsuka, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings 1964 - 1969 2010年 [査読有り]

　概要を見る

Musicians often have the following problem: they have a music score that requires 2 or more players, but they have no one with whom to practice. So far, score-playing music robots exist, but they lack adaptive abilities to synchronize with fellow players' tempo variations. In other words, if the human speeds up their play, the robot should also increase its speed. However, computer accompaniment systems allow exactly this kind of adaptive ability. We present a first step towards giving these accompaniment abilities to a music robot. We introduce a new paradigm of beat tracking using 2 types of sensory input - visual and audio - using our own visual cue recognition system and state-of-the-art acoustic onset detection techniques. Preliminary experiments suggest that by coupling these two modalities, a robot accompanist can start and stop a performance in synchrony with a flutist, and detect tempo changes within half a second. ©2010 IEEE.

DOI

Scopus

29

被引用数

(Scopus)
Human-robot ensemble between robot thereminist and human percussionist using coupled oscillator model

Takeshi Mizumoto, Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings 1957 - 1963 2010年 [査読有り]

　概要を見る

This paper presents a novel synchronizing method for a human-robot ensemble using coupled oscillators. We define an ensemble as a synchronized performance produced through interactions between independent players. To attain better synchronized performance, the robot should predict the human's behavior to reduce the difference between the human's and robot's onset timings. Existing studies in such synchronization only adapts to onset intervals, thus, need a considerable time to synchronize. We use a coupled oscillator model to predict the human's behavior. Experimental results show that our method reduces the average of onset time errors; when we use a metronome, a tempo-varying metronome or a human drummer, errors are reduced by 38%, 10% or 14% on the average, respectively. These results mean that the prediction of human's behaviors is effective for the synchronized performance. ©2010 IEEE.

DOI

Scopus

20

被引用数

(Scopus)
Speedup and performance improvement of ica-based robot audition by parallel and resampling-based block-wise processing

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings 1949 - 1956 2010年 [査読有り]

　概要を見る

This paper describes a speedup and performance improvement of multi-channel semi-blind ICA (MCSB-ICA) with parallel and resampling-based block-wise processing. MCSB-ICA is an integrated method of sound source separation that accomplishes blind source separation, blind dereverberation, and echo cancellation. This method enables robots to separate user's speech signals from observed signals including the robot's own speech, other speech and their reverberations without a priori information. The main problem when MCSB-ICA is applied to robot audition is its high computational cost. We tackle this by multi-threading programming, and the two main issues are 1) the design of parallel processing and 2) incremental implementation. These are solved by a) multiple-stack-based parallel implementation, and b) resampling-based overlaps and block-wise separation. The experimental results proved that our method reduced the real-time factor to less than 0.5 with an eight-core CPU, and it improves the performance of automatic speech recognition by 2.10 points compared with the single-stack-based parallel implementation without the resampling technique. ©2010 IEEE.

DOI

Scopus

2

被引用数

(Scopus)
Method of discriminating known and unknown environmental sounds using recurrent neural network

Yang Zhang, Tetsuya Ogata, Shun Nishide, Toru Takahashi, Hiroshi G. Okuno

SCIS and ISIS 2010 - Joint 5th International Conference on Soft Computing and Intelligent Systems and 11th International Symposium on Advanced Intelligent Systems 378 - 383 2010年

　概要を見る

This paper describes our method to classify nonspeech environmental sounds for robots working. In the real world, two main restrictions pertain in learning. First, robots have to learn using only a small amount of sounds in a limited time and space because of restrictions. Second, it has to detect unknown sounds to avoid false classification since it is virtually impossible to collect samples of all environmental sounds. Most of the previous methods require a huge number of samples of all target sounds, including noises, for training stochastic models such as the Gaussian mixture model. In contrast, we use a neurodynamical model to build a prediction and classification system. The neuro-dynamical system can be trained with a small amount of sounds and generalize others by inferring the sound generation dynamics. After training, a self-organized space is structured for the sound generation dynamics. The proposed system classify on the basis of the self-organized space. The prediction results of sounds are used for determining unknown sounds in our system. In this paper, we show the results of preliminary experiments on the proposed model's classification of known and unknown sound classes.
Query-by-example music information retrieval by score-informed source separation and remixing technologies

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Eurasip Journal on Advances in Signal Processing 2010 2010年 [査読有り]

　概要を見る

We describe a novel query-by-example (QBE) approach in music information retrieval that allows a user to customize query examples by directly modifying the volume of different instrument parts. The underlying hypothesis of this approach is that the musical mood of retrieved results changes in relation to the volume balance of different instruments. On the basis of this hypothesis, we aim to clarify the relationship between the change in the volume balance of a query and the genre of the retrieved pieces, called genre classification shift. Such an understanding would allow us to instruct users in how to generate alternative queries without finding other appropriate pieces. Our QBE system first separates all instrument parts from the audio signal of a piece with the help of its musical score, and then it allows users remix these parts to change the acoustic features that represent the musical mood of the piece. Experimental results showed that the genre classification shift was actually caused by the volume change in the vocal, guitar, and drum parts. © 2010 Katsutoshi Itoyama et al.

DOI

Scopus

10

被引用数

(Scopus)
Missing-feature-theory-based robust simultaneous speech recognition system with non-clean speech acoustic model

Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 2730 - 2735 2009年12月

　概要を見る

A humanoid robot must recognize a target speech signal while people around the robot chat with them in real-world. To recognize the target speech signal, robot has to separate the target speech signal among other speech signals and recognize the separated speech signal. As separated signal includes distortion, automatic speech recognition (ASR) performance degrades. To avoid the degradation, we trained an acoustic model from non-clean speech signals to adapt acoustic feature of distorted signal and adding white noise to separated speech signal before extracting acoustic feature. The issues are (1) To determine optimal noise level to add the training speech signals, and (2) To determine optimal noise level to add the separated signal. In this paper, we investigate how much noises should be added to clean speech data for training and how speech recognition performance improves for different positions of three talkers with soft masking. Experimental results show that the best performance is obtained by adding white noises of 30 dB. The ASR with the acoustic model outperforms with ASR with the clean acoustic model by 4 points. © 2009 IEEE.

DOI

Scopus

5

被引用数

(Scopus)
Incremental polyphonic audio to score alignment using beat tracking for singer robots

Takuma Otsuka, Kazumasa Murata, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 2289 - 2296 2009年12月 [査読有り]

　概要を見る

We aim at developing a singer robot capable of listening to music with its own "ears" and interacting with a human's musical performance. Such a singer robot requires at least three functions: listening to the music, understanding what position in the music is being performed, and generating a singing voice. In this paper, we focus on the second function, that is, the capability to align an audio signal to its musical score represented symbolically. Issues underlying the score alignment problem are: (1) diversity in the sounds of various musical instruments, (2) difference between the audio signal and the musical score, (3) fluctuation in tempo of the musical performance. Our solutions to these issues are as follows: (1) the design of features based on a chroma vector in the 12-tone model and onset of the sound, (2) defining the rareness for each tone based on the idea that scarcely used tone is salient in the audio signal, and (3) the use of a switching Kalman filter for robust tempo estimation. The experimental result shows that our score alignment method improves the average of cumulative absolute errors in score alignment by 29% using 100 popular music tunes compared to the beat tracking without score alignment. © 2009 IEEE.

DOI

Scopus

14

被引用数

(Scopus)
Phoneme acquisition model based on vowel imitation using recurrent neural network

Hisashi Kanda, Tetsuya Ogata, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 5388 - 5393 2009年12月 [査読有り]

　概要を見る

A phoneme-acquisition system was developed using a computational model that explains the developmental process of human infants in the early period of acquiring language. There are two important findings in constructing an infant's acquisition of phonemes: (1) an infant's vowel like cooing tends to invoke utterances that are imitated by its caregiver, and (2) maternal imitation effectively reinforces infant vocalization. Therefore, we hypothesized that infants can acquire phonemes to imitate their caregivers' voices by trial and error, i. e., infants use self-vocalization experience to search for imitable and unimitable elements in their caregivers' voices. On the basis of this hypothesis, we constructed a phoneme acquisition process using interaction involving vowel imitation between a human and an infant model. Our infant model had a vocal tract system, called the Maeda model, and an auditory system implemented by using Mel-Frequency Cepstral Coefficients (MFCCs) through STRAIGHT analysis. We applied Recurrent Neural Network with Parametric Bias (RNNPB) to learn the experience of self-vocalization, to recognize the human voice, and to produce the sound imitated by the infant model. To evaluate imitable and unimitable sounds, we used the prediction error of the RNNPB model. The experimental results revealed that as imitation interactions were repeated, the formants of sounds imitated by our system moved closer to those of human voices, and our system could self-organize the same vowels in different continuous sounds. This suggests that our system can reflect the process of phoneme acquisition. © 2009 IEEE.

DOI

Scopus

6

被引用数

(Scopus)
Emergence of evolutionary interaction with voice and motion between two robots using RNN

Wataru Hinoshita, Tetsuya Ogata, Hideki Kozima, Hisashi Kanda, Toru Takahashi, Hiroshi G. Okuno

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 4186 - 4192 2009年12月 [査読有り]

　概要を見る

We propose a model of evolutionary interaction between two robots where signs used for communication emerge through mutual adaptation. Signs used in human interaction, e.g., language, gestures and eye contact change and evolve in form and meaning through repeated use. To create flexible human-like interaction systems, it is necessary to deal with signs as a dynamic property and to construct a framework in which signs emerge from mutual adaptation by agents. Our target is multi-modal interaction using voice and motion between two robots where a voice/motion pattern is used as a sign referring to a motion/voice pattern. To enable evolutionary signs (voice and motion patterns) to be recognized and generated, we utilized a dynamics model: Multiple Timescale Recurrent Neural Network (MTRNN). To enable the robots to interpret signs, we utilized hierarchical neural networks, which transform dynamics model parameters of voice/motion into those of motion/voice. In our experiment, two robots modified their own interpretation of signs constantly through mutual adaptation in interaction where they responded to the other's voice with motion one after the other. As a result of the experiment, we found that the interaction kept evolving through the robots' repeated and alternate miscommunications and readaptations, and this induced the emergence of diverse new signs that depended on the robots' body dynamics through the generalization capability of MTRNN. © 2009 IEEE.

DOI

Scopus

5

被引用数

(Scopus)
Step-size parameter adaptation of multi-channel semi-blind ICA with piecewise linear model for barge-in-able robot audition

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 2277 - 2282 2009年12月 [査読有り]

　概要を見る

This paper describes a step-size parameter adaptation technique of multi-channel semi-blind independent component analysis (MCSB-ICA) for a "barge-in-able" robot audition system. By "barge-in", we mean that the user can speak simultaneously when the robot is speaking. We focused on MCSB-ICA to achieve such an audition system because it can separate a user's and a robot's speech under reverberant environments. The problem with MCSB-ICA for robot audition is the slow speed of convergence in estimating a separation filter due to its step-size parameters. Many optimization methods cannot be adopted because their computational costs are proportional to the 2nd order of the reverberation time. Our method yields adaptive step-size parameters with MCSB-ICA at low computational costs. It is based on three techniques; 1) recursive expression of the separation process, 2) a piecewise linear model of the step-size of the separation filter, and 3) adaptive step-size parameters with a sub-ICA-filter. Experimental results show that our approach attains faster convergence speed and lower computational costs than those with a fixed step-size parameter. © 2009 IEEE.

DOI

Scopus

6

被引用数

(Scopus)
Modeling tool-body assimilation using second-order recurrent neural network

Shun Nishide, Tatsuhiro Nakagawa, Tetsuya Ogata, Jun Tani, Toru Takahashi, Hiroshi G. Okuno

2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 5376 - 5381 2009年12月 [査読有り]

　概要を見る

Tool-body assimilation is one of the intelligent human abilities. Through trial and experience, humans are capable of using tools as if they are part of their own bodies. This paper presents a method to apply a robot's active sensing experience for creating the tool-body assimilation model. The model is composed of a feature extraction module, dynamics learning module, and a tool recognition module. Self-Organizing Map (SOM) is used for the feature extraction module to extract object features from raw images. Multiple Time-scales Recurrent Neural Network (MTRNN) is used as the dynamics learning module. Parametric Bias (PB) nodes are attached to the weights of MTRNN as second-order network to modulate the behavior of MTRNN based on the tool. The generalization capability of neural networks provide the model the ability to deal with unknown tools. Experiments are performed with HRP-2 using no tool, I-shaped, T-shaped, and L-shaped tools. The distribution of PB values have shown that the model has learned that the robot's dynamic properties change when holding a tool. The results of the experiment show that the tool-body assimilation model is capable of applying to unknown objects to generate goal-oriented motions. © 2009 IEEE.

DOI

Scopus

12

被引用数

(Scopus)
Thereminist Robot: Development of a Robot Theremin Player based on a Theremin”s Pitch Model

Takeshi Mizumoto, Hiroshi Tsujino, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Special SIGMUS Symposium 2297 - 2302 2009年12月 [査読有り]

DOI

Scopus

19

被引用数

(Scopus)
神経回路モデルの感覚・行為予測に基づく空間認知モデル

尾形哲也, 西出俊

計測と制御 = Journal of the Society of Instrument and Control Engineers 48 ( 12 ) 852 - 857 2009年12月

CiNii
弦の音響差異を考慮したバイオリン演奏音響信号に対する運指推定

前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

日本音響学会関西支部第12回若手研究者交流研究発表会 2009年12月
ハードウェアをカスタマイズできるコミュニケーションロボットに関する研究

守良真, 由美子, 近藤裕樹, 菅佑樹, 尾形哲也, 菅野重樹

第10回システムインテグレーション部門講演会 (SI2009), 計測自動制御学会 2F1 - 2 2009年12月
MTRNNによる環境音の予測識別

張陽, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

第10回システムインテグレーション部門講演会 (SI2009), 計測自動制御学会 1H4 - 5 2009年12月
京都大学大学院情報学研究科音声メディア分野

尾形哲也

バイオメカニズム学会誌 = Journal of the Society of Biomechanisms 33 ( 4 ) 284 - 286 2009年11月

CiNii
特集「発声と音声認知のメカニズムの理解を目指して」に寄せて

尾形哲也

バイオメカニズム学会誌 = Journal of the Society of Biomechanisms 33 ( 4 ) 224 - 224 2009年11月

DOI CiNii
Self-organization of dynamic object features based on bidirectional training

Shun Nishide, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

Advanced Robotics 23 ( 15 ) 2035 - 2057 2009年10月 [査読有り]

　概要を見る

This paper presents a method to self-organize object features that describe object dynamics using bidirectional training. The model is composed of a dynamics learning module and a feature extraction module. Recurrent Neural Network with Parametric Bias (RNNPB) is utilized for the dynamics learning module, learning and self-organizing the sequences of robot and object motions. A hierarchical neural network is linked to the input of RNNPB as the feature extraction module for self-organizing object features that describe the object motions. The two modules are simultaneously trained through bidirectional training using image and motion sequences acquired from the robot's active sensing with objects. Experiments are performed with the robot's pushing motion with a variety of objects to generate sliding, falling over, bouncing and rolling motions. The results have shown that the model is capable of self-organizing object dynamics based on the self-organized features. © Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009.

DOI

Scopus

4

被引用数

(Scopus)
Target speech detection and separation for communication with humanoid robots in noisy home environments

Hyun Don Kim, Jinsung Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics 23 ( 15 ) 2093 - 2111 2009年10月 [査読有り]

　概要を見る

People usually talk face to face when they communicate with their partner. Therefore, in robot audition, the recognition of the front talker is critical for smooth interactions. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (SNR) beamformer. This VAD based on CSCC can classify speech signals that are retrieved at the frontal region of two microphones embedded on the robot. The system works in real-time without needing training filter coefficients given in advance even in a noisy environment (SNR > 0 dB). It can cope with speech noise generated from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that our system enhanced extracted target speech signals more than 12 dB (SNR) and the success rate of automatic speech recognition for Japanese words was increased by about 17 points. © Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009.

DOI

Scopus

1

被引用数

(Scopus)
音響信号とコンテキスト制約を併用したバイオリンの演奏弦系列の推定

前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

日本音響学会研究発表会講演論文集(CD-ROM) 2009 ROMBUNNO.2-5-15 2009年09月

J-GLOBAL
Enabling A User To Specify An Item At Any Time During System Enumeration

Kyoko MATSUYAMA, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of International Conference on Spoken Language Processing (Interspeech-2009) 4 - 1 2009年09月
RNNを備えた2体のロボット間における身体性に基づいた動的コミュニケーションモデル

日下航, 尾形哲也, 小嶋秀樹, 高橋徹, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月
二次リカレントニューラルネットワークを用いた道具身体化モデルの構築

西出俊, 中川達裕, 尾形哲也, 谷淳, 高橋徹, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月
ロボット音声対話におけるバージン発話の指示対象同定

松山匡子, 駒谷和範, 武田龍, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月
テルミン演奏ロボットのための音高依存性を考慮した音量モデル

水本武志, 辻野広司, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月
Voice-Awareness Control Consistent with Robot”s Body Movements

大塚琢馬, 中臺一博, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月
頭部音響伝達関数を用いたGSSによる3話者同時発話認識?HARK 1.0.0 の新機能?

高橋徹, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月

CiNii
声道物理モデルの母音列繰り返し模倣による音素獲得シミュレーション (基調講演)

尾形哲也, 神田尚, 高橋徹, 駒谷和範, 奥乃博

日本ロボット学会第27回学術講演会, 横浜国立大学 2009年09月
Binaural active audition for humanoid robots to localise speech over entire azimuth range

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, HIROShi G. Okuno

Applied Bionics and Biomechanics 6 ( 3-4 ) 355 - 367 2009年09月

　概要を見る

We applied motion theory to robot audition to improve the inadequate performance. Motions are critical for overcoming the ambiguity and sparseness of information obtained by two microphones. To realise this, we first designed a sound source localisation system integrated with cross-power spectrum phase (CSP) analysis and an EM algorithm. The CSP of sound signals obtained with only two microphones was used to localise the sound source without having to measure impulse response data. The expectation-maximisation (EM) algorithm helped the system to cope with several moving sound sources and reduce localisation errors. We then proposed a way of constructing a database for moving sounds to evaluate binaural sound source localisation. We evaluated our sound localisation method using artificial moving sounds and confirmed that it could effectively localise moving sounds slower than 1.125 rad/s. Consequently, we solved the problem of distinguishing whether sounds were coming from the front or rear by rotating and/or tipping the robot's head that was equipped with only two microphones. Our system was applied to a humanoid robot called SIG2, and we confirmed its ability to localise sounds over the entire azimuth range as the success rates for sound localisation in the front and rear areas were 97.6% and 75.6% respectively. © 2009 Taylor & Francis.

DOI

Scopus

5

被引用数

(Scopus)
Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, HiroshiG.Okuno

情報処理学会論文誌 50 ( 7 ) 1757 - 1767 2009年07月

　概要を見る

We describe an improved way of estimating parameters for an integrated weighted-mixture model consisting of both harmonic and inharmonic tone models. Our final goal is to build an instrument equalizer (music remixer) that enables a user to change the volume of parts of polyphonic sound mixtures. To realize the instrument equalizer, musical signals must be separated into each musical instrument part. We have developed a score-informed sound source separation method using the integrated model. A remaining but critical problem is to find a way to deal with timbre varieties caused by various performance styles and instrument bodies because our method used template sounds to represent their timbre. Template sounds are generated from a MIDI tone generator based on an aligned score. Difference of instrument bodies between mixed signals and template sounds causes timbre difference and decreases separation performance. To solve this problem, we train probabilistic distributions of timbre features using various sounds to reduce template dependency. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we can estimate model parameters that express the timbre more accurately. Experimental results show that separation performance improved from 4.89 to 8.48 dB.We describe an improved way of estimating parameters for an integrated weighted-mixture model consisting of both harmonic and inharmonic tone models. Our final goal is to build an instrument equalizer (music remixer) that enables a user to change the volume of parts of polyphonic sound mixtures. To realize the instrument equalizer, musical signals must be separated into each musical instrument part. We have developed a score-informed sound source separation method using the integrated model. A remaining but critical problem is to find a way to deal with timbre varieties caused by various performance styles and instrument bodies because our method used template sounds to represent their timbre. Template sounds are generated from a MIDI tone generator based on an aligned score. Difference of instrument bodies between mixed signals and template sounds causes timbre difference and decreases separation performance. To solve this problem, we train probabilistic distributions of timbre features using various sounds to reduce template dependency. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we can estimate model parameters that express the timbre more accurately. Experimental results show that separation performance improved from 4.89 to 8.48 dB.

CiNii
音声対話システムにおける文法検証結果と発話履歴に基づくヘルプメッセージ候補のランキング

駒谷和範, 池田智志, 福林雄一朗, 尾形哲也, 奥乃博

情報処理学会音声言語情報処理研究会, 飯坂温泉,情報処理学会. 2009 ( 12 ) 1 - 6 2009年07月

　概要を見る

本稿では，ヘルプメッセージを提示することにより，音声対話システムにおける想定外発話の問題に取り組む．想定外発話に対する音声認識結果では，重要な単語が誤認識されていたり，ユーザが十分な情報を発話しない場合があるため，想定外発話に対するヘルプメッセージ生成は困難な課題である．本稿ではまず，重みつき有限状態トランスデューサ (WFST) を用いて，想定外発話に対する文法検証手法を開発した．この文法検証では，音声認識結果中の重要な単語が抜けている場合でも，ユーザが意図した表現に最も近い文法を同定する．さらに，この文法検証結果やユーザの経験を表す対話履歴を特徴として用いて，RankBoost アルゴリズムによりヘルプメッセージ候補の順序付けを行った．We address an issue of out-of-grammar (OOG) utterances in spoken dialogue systems by generating help messages for novice users. Help generation for OOG utterances is a challenging problem because language understanding (LU) results based on automatic speech recognition (ASR) results for such utterances are always erroneous as important words are often misrecognized or missed from such utterances. We first develop grammar verification for OOG utterances by using a Weighted Finite-State Transducer (WFST). It robustly identifies a grammar rule that a user intends to utter, even when some important words are missed from the ASR result. We then adopt a ranking algorithm, RankBoost, whose features include the grammar verification results and the utterance history representing the user's experience.

CiNii
音響信号と音楽的制約を統合したバイオリンの演奏弦系列の推定

前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

音楽情報科学研究会, 情報処理学会 Vol.2009-MUS-81 ( No.5 ) 1 - 6 2009年07月
複数楽器混合モデルのパラメータ推定と楽器名同定への応用

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

音楽情報科学研究会, 情報処理学会 Vol.2009-MUS-81 ( No.13 ) 1 - 6 2009年07月
残差スペクトルモデルによる伴奏・残響成分抑制に基づいた楽器演奏分析合成の高精度化

安良岡直希, 安部武宏, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

音楽情報科学研究会 Vol.2009-MUS-81 ( No.10 ) 1 - 6 2009年07月

　概要を見る

本報告書では，楽器演奏音響信号の分析合成における，入力中の伴奏音や残響成分を抑制した分析手法を報告する．対象演奏パートの楽譜情報に合致しないスペクトル成分を表現する残差スペクトルモデルを導入し, これを用いて伴奏や残響を含む音響信号から対象の演奏を効率よく分離する. 調波非調波統合音モデルに用いた演奏分析をこの分離と同時に行い, 分析された音モデルを用いて未知楽譜への演奏を合成する.評価実験では, 伴奏付き演奏に対する分析精度が本手法によりスペクトル距離において平均 35.2% 改善し, また残響を含む演奏に対する分析合成精度の低下を回避できる事が確認された.This paper presents a musical performance analysis-and-synthesis method using residual model for reduction of accompaniment or sound reverberation. The residual model is designed for representing spectrum that the score does not convey about the performance. This leads to an efficient extraction of a performed part from accompanied and/or reverberant audio source. The extraction is performed simultaneously with estimation of musical tone models that represent both harmonic and inharmonic sound of the performance. Using the estimated tone models, a new performance sound corresponding to a new given score is synthesized. An experiment showed that the spectral distance of one instrument part extracted from polyphonic audio source improved by 35.0 points by incorporating the residual model. Another result showed the effectiveness of our method under reverberant source.

CiNii
多重奏楽曲の楽器音量バランス変化による音楽ジャンルシフト

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

音楽情報科学研究会 Vol.2009-MUS-8 ( No.3 ) 1 - 6 2009年07月
バージイン発話タイミングモデルを導入した指示対象同定

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

情報処理学会研究報告(CD-ROM) 2009 ( 1 ) ROMBUNNO.NL-191,14 2009年06月

J-GLOBAL
市民参画のための公的討議の議事録閲覧支援システム

白松俊, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会全国大会 (JSAI2009) 3I1 - 1 2009年06月
バージイン発話タイミングモデルを導入した指示対象同定

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

研究報告自然言語処理（NL） 2009 ( 14 ) 1 - 7 2009年05月

　概要を見る

自然な会話を実現できる音声対話システムでは，ユーザが自由なタイミングや言語表現で発話できることが望ましい．我々は，ユーザが任意のタイミングでシステム発話に割り込み（バージイン）できる手法を開発している．本手法では，Independent Component Analysis (ICA) に基づくセミブラインド音源分離を利用している．本稿では，システムが列挙する項目に対してユーザがバージイン発話で指定した対象を同定するために，ユーザのバージイン発話から得られるタイミング情報を用いて解釈する新手法について報告する．まず，ユーザが参照表現を用いて発話する場合のタイミング分布を，予備調査の結果に基づき，ガンマ分布で近似する．次に，システムの読み上げる各項目に対して，ユーザ発話がそのタイミングで解釈されるべき場合とその音声認識結果で解釈されるべき場合とをそれぞれ確率として表現する．これら2つの確率を統合し，最も尤度の高い項目をユーザの指示対象と同定する．システムが列挙する項目の一つを指定するユーザのバージイン発話400発話に対して，本手法が２つのベースライン手法(音声認識結果のみから指示対象を同定する手法，及び，ユーザの発話タイミングのみから指示対象を同定する手法) よりも高精度に同定できることを実験により確認した．In conversational dialogue systems, the user prefers to speak at any time and to use natural expressions. We have developed an Independent Component Analysis (ICA) based semiblind source separation method, which allows users to barge-in over system utterances at any time. We create a novel method from timing information derived from barge-in utterances to identify one item that a user indicates during system enumeration. First, we determine the timing distribution of user utterances containing referential expressions and then approximate it using gamma distribution. Second, we represent both the utterance timing and automatic speech recognition (ASR) results as probabilities of the desired selection from the system's enumeration. We then integrate these two probabilities to identify the item having the maximum likelihood of selection. Experimental results using 400 utterances indicated that our method outperformed two methods used as a baseline (one of ASR results only and one of utterance timing only) in identification accuracy.

CiNii
バージイン発話タイミングを導入した指示対象同定

松山匡子, 駒谷和範, 武田龍, 尾形哲也, 奥乃博

情報処理学会音声言語研究会 2009年05月
神経回路モデルを用いた音声模倣モデルによる音声バブリングと母音獲得過程シミュレーション

神田尚, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

全国大会講演論文集 71 ( 0 ) 133 - 134 2009年03月

CiNii J-GLOBAL
ベース音高確率とクロマベクトルの相関を考慮した和音進行認識

高野秀樹, 須見康平, 糸山克寿, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 71 ( 0 ) 241 - 242 2009年03月

CiNii J-GLOBAL
歌唱ロボットのためのビート情報とメロディ・ハーモニー情報の統合による音楽音響信号と楽譜の実時間同期手法の開発

大塚琢馬, 村田和真, 武田龍, 中臺一博, 高橋徹, 尾形哲也, 奥乃博

全国大会講演論文集 71 ( 0 ) 243 - 244 2009年03月

CiNii
音色特徴の歪みを回避した楽器音の音高・音長操作手法

安部武宏, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 50 ( 3 ) 1054 - 1066 2009年03月
話題遷移図の可視化と話題遷移に応じた関連情報提示による議事録閲覧支援

白松俊, 駒谷和範, 尾形哲也, 高橋徹, 奥乃博

言語処理学会第15回年次大会 D2 - 1 2009年03月
RNNを備えた2対の小型ロボット間の首振り動作と音声によるインタラクションにおける共有シンボルの創発

日下航, 神田尚, 尾形哲也, 小嶋秀樹, 奥乃博

情報処理学会第71回全国大会 71 325 - 326 2009年03月

CiNii
神経回路モデルを用いた音声模倣モデルによる音声バブリングと音声獲得過程シミュレーション

神田尚, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

情報処理学会第71回全国大会 2009年03月
実環境音声対話システムにおけるバージイン発話タイミングを活用した指示対象の同定

松山匡子, 駒谷和範, 白松俊, 武田龍, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
音声認識と言語理解を動的に選択する音声理解フレームワーク

勝丸真樹, 中野幹生, 駒谷和範, 成松宏美, 船腰孝太郎, 辻野広司, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
音声対話システムにおける想定外発話の文法検証を用いた対話行為推定に基づくヘルプ生成,

池田智志, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
顔追跡による音環境可視化システムのアウエアネスの改善

久保田祐史, 白松俊, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
フィールドにおける音源定位のための音声視覚化デバイス「カエルホタル」の設計

水本武志, 合原一究, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
Probabilistic Classification of Monophonic Instrument Playing Techniques

前澤陽, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
楽器の内部モデルに基づくフィードフォワード制御によるテルミン演奏ロボットの開発

水本武志, 辻野広司, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
ロボットによる卓上物体操作のためのRNNを用いた道具身体化モデルの構築

中川達裕, 尾形哲也, 谷淳, 高橋徹, 奥乃博

情報処理学会第71回全国大会 2009年03月
ソフトマスクと音響モデル適応を用いた3話者同時発話音声認識

高橋徹, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
A Music Retrieval Approach from Alternative Genres of Query by Adjusting Instrument Volume

王凱平, 糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
連続発音中の音色変化に着目した未学習譜面情への演奏信号生成

安良岡直希, 安部武宏, 糸山克寿, 高橋徹, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 71 217 - 218 2009年03月

CiNii
ベース音高とクロマベクトルの相関に基づいた和音進行認識

高野秀樹, 須見康平, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
音色特徴量に基づく調波・非調波統合モデルによる楽器音モーフィング

安部武宏, 糸山克寿, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第71回全国大会 2009年03月
マルチドメイン音声対話システムにおけるトピック推定と対話履歴の統合によるドメイン選択手法

池田智志, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 50 ( 2 ) 488 - 500 2009年02月

CiNii
複数の言語モデル・言語理解方式を用いた音声理解の高精度化

勝丸真樹, 中野幹生, 駒谷和範, 成松宏美, 船腰孝太郎, 辻野広司, 高橋徹, 尾形哲也, 奥乃博

第75回音声言語情報処理研究会, 2009-SLP-75 (9), 情処研報 Vol.2009 ( No.10 ) 45 - 50 2009年02月
市民参画のための公的討議の議事録閲覧支援システム

白松俊, 駒谷和範, 尾形哲也

論文集 23 1 - 4 2009年

CiNii
Parameter Estimation for Harmonic and Inharmonic Models by Using Timbre Feature Distributions

Itoyama Katsutoshi, Goto Masataka, Komatani Kazunori, Ogata Tetsuya, G. Okuno Hiroshi

Information and Media Technologies 4 ( 3 ) 672 - 682 2009年

　概要を見る

We describe an improved way of estimating parameters for an integrated weighted-mixture model consisting of both harmonic and inharmonic tone models. Our final goal is to build an instrument equalizer (music remixer) that enables a user to change the volume of parts of polyphonic sound mixtures. To realize the instrument equalizer, musical signals must be separated into each musical instrument part. We have developed a score-informed sound source separation method using the integrated model. A remaining but critical problem is to find a way to deal with timbre varieties caused by various performance styles and instrument bodies because our method used template sounds to represent their timbre. Template sounds are generated from a MIDI tone generator based on an aligned score. Difference of instrument bodies between mixed signals and template sounds causes timbre difference and decreases separation performance. To solve this problem, we train probabilistic distributions of timbre features using various sounds to reduce template dependency. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we can estimate model parameters that express the timbre more accurately. Experimental results show that separation performance improved from 4.89 to 8.48dB.

DOI CiNii
自己形態主張を行うカスタマイズ可能なコミュニケーションロボットの研究

守良真, 近藤裕樹, 奥出京司郎, 菅佑樹, 尾形哲也, 菅野重樹

日本機械学会ロボティクス・メカトロニクス部門(ROBOMEC2009) 1P1-F11 2009年
ハードウェアをカスタマイズできるコミュニケーションロボットにおける研究

近藤裕樹, 守良真, 奥出京司郎, 菅佑樹, 尾形哲也, 菅野重樹

日本機械学会ロボティクス・メカトロニクス部門(ROBOMEC2009) 1P1-E21 2009年
Enabling a user to specify an item at any time during system enumeration - Item identification for barge-in-able conversational dialogue systems

Kyoko Matsuyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 252 - 255 2009年 [査読有り]

　概要を見る

In conversational dialogue systems, users prefer to speak at any time and to use natural expressions. We have developed an Independent Component Analysis (ICA) based semi-blind source separation method, which allows users to barge-in over system utterances at any time. We created a novel method from timing information derived from barge-in utterances to identify one item that a user indicates during system enumeration. First, we determine the timing distribution of user utterances containing referential expressions and then approximate it using a gamma distribution. Second, we represent both the utterance timing and automatic speech recognition (ASR) results as probabilities of the desired selection from the system's enumeration. We then integrate these two probabilities to identify the item having the maximum likelihood of selection. Experimental results using 400 utterances indicated that our method outperformed two methods used as a baseline (one of ASR results only and one of utterance timing only) in identification accuracy. Copyright © 2009 ISCA.
Ranking help message candidates based on robust grammar verification results and utterance history in spoken dialogue systems

Kazunori Komatani, Satoshi Ikeda, Yuichiro Fukubayashi, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the SIGDIAL 2009 Conference: 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue 314 - 321 2009年 [査読有り]

　概要を見る

We address an issue of out-of-grammar (OOG) utterances in spoken dialogue systems by generating help messages for novice users. Help generation for OOG utterances is a challenging problem because language understanding (LU) results based on automatic speech recognition (ASR) results for such utterances are always erroneous as important words are often misrecognized or missed from such utterances. We first develop grammar verification for OOG utterances on the basis of a Weighted Finite-State Transducer (WFST). It robustly identifies a grammar rule that a user intends to utter, even when some important words are missed from the ASR result. We then adopt a ranking algorithm, RankBoost, whose features include the grammar verification results and the utterance history representing the user's experience. © 2009 Association for Computational Linguistics.

DOI

Scopus

2

被引用数

(Scopus)
A Speech Understanding Framework that Uses Multiple Language Models and Multiple Understanding Models.

Masaki Katsumaru, Mikio Nakano, Kazunori Komatani, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31 - June 5, 2009, Boulder, Colorado, USA, Short Papers 133 - 136 2009年 [査読有り]
残響下でのバージイン発話認識のための多入力独立成分分析を応用したロボット聴覚

武田龍, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会誌 27 ( 7 ) 782 - 792 2009年
人工神経回路モデルと声道物理モデルを用いた母音模倣モデルに基づく音素獲得シミュレーション

神田尚, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

日本ロボット学会誌 27 ( 7 ) 802 - 813 2009年
Autonomous Motion Generation Based on Reliable Predictability.

Shun Nishide, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

J. Robotics Mechatronics 21 ( 4 ) 478 - 488 2009年

DOI

Scopus

4

被引用数

(Scopus)
Adjusting occurrence probabilities of automatically-generated abbreviated words in spoken dialogue systems

Masaki Katsumaru, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5579 LNAI 481 - 490 2009年 [査読有り]

　概要を見る

Users often abbreviate long words when using spoken dialogue systems, which results in automatic speech recognition (ASR) errors. We define abbreviated words as sub-words of an original word and add them to the ASR dictionary. The first problem we face is that proper nouns cannot be correctly segmented by general morphological analyzers, although long and compound words need to be segmented in agglutinative languages such as Japanese. The second is that, as vocabulary size increases, adding many abbreviated words degrades the ASR accuracy. We have developed two methods, (1) to segment words by using conjunction probabilities between characters, and (2) to adjust occurrence probabilities of generated abbreviated words on the basis of the following two cues: phonological similarities between the abbreviated and original words and frequencies of abbreviated words in Web documents. Our method improves ASR accuracy by 34.9 points for utterances containing abbreviated words without degrading the accuracy for utterances containing original words. © 2009 Springer Berlin Heidelberg Spoken dialogue systems*abbreviated words*adjusting occurrence probabilities.

DOI

Scopus

2

被引用数

(Scopus)
Automatic estimation of reverberation time with robot speech to improve ICA-based robot audition

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

9th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS09 250 - 255 2009年

　概要を見る

This paper presents an ICA-based robot audition system which estimates the reverberation time of the environment automatically by using the robot's own speech. The system is based on multi-channel semi-blind independent component analysis (MCSB-ICA), a source separation method using a microphone array that can separate user and robot speech under reverberant environments. Perception of the reverberation time (RT) is critical, because an inappropriate RT degrades separation performance and increases processing time. Unlike most previous methods that assume the RT is given in advance, our method estimates an RT by using the echo's intensity of the robot's own speech. It has three steps: speaks a sentence in a new environment, calculates the relative powers of the echoes, and estimates the RT using linear regression of them. Experimental results show that this method sets an appropriate RT for MCSB-ICA for real-world environments and that word correctness is improved by up to 6 points and processing time is reduced by up to 60%. ©2009 IEEE.

DOI

Scopus

2

被引用数

(Scopus)
ICA-based efficient blind dereverberation and echo cancellation method for barge-in-able robot audition

Ryu Takeda, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 3677 - 3680 2009年 [査読有り]

　概要を見る

This paper describes a new method that allows "Barge-In" in various environments for robot audition. "Barge-in" means that a user begins to speak simultaneously while a robot is speaking. To achieve the function, we must deal with problems on blind dereverberation and echo cancellation at the same time. We adopt Independent Component Analysis (ICA) because it essentially provides a natural framework for these two problems. To deal with reverberation, we apply a Multiple Input/Output INverse-filtering Theorem-based model of observation to the frequency domain ICA. The main problem is its high-computational cost of ICA. We reduce the computational complexity to the linear order of reverberation time by using two techniques: 1) a separation model based on observed signal independence, and 2) enforced spatial sphering for preprocessing. The experimental results revealed that our method improved word correctness of reverberant speech by 10-20 points. ©2009 IEEE.

DOI

Scopus

18

被引用数

(Scopus)
Human tracking system integrating sound and face localization using an expectation-maximization algorithm in real environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Advanced Robotics 23 ( 6 ) 629 - 653 2009年 [査読有り]

　概要を見る

We have developed a human tracking system for use by robots that integrate sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources. Moreover, they are incapable of coping with various types of background noise. Our system, the cross-power spectrum phase analysis of sound signals obtained with only two microphones, is used to localize the sound source without having to use prior information such as impulse response data. An expectation- maximization (EM) algorithm is used to help the system cope with several moving sound sources. The problem of distinguishing whether sounds are coming from the front or back is also solved with only two microphones by rotating the robot's head. A developed method that uses facial skin colors classified by another EM algorithm enables the system to detect faces in various poses. It can compensate for the error in the sound localization for a speaker and also identify noise signals entering from undesired directions by detecting a human face. A developed probability-based method is used to integrate the auditory and visual information in order to produce a reliable tracking path in real-time. Experiments using a robot showed that our system can localize two sounds at the same time and track a communication partner while dealing with various types of background noise. © 2009 Koninklijke Brill NV.

DOI

Scopus

14

被引用数

(Scopus)
Continuous vocal imitation with self-organized vowel spaces in recurrent neural network

Hisashi Kanda, Tetsuya Ogata, Toru Takahashi, Kazunori Komatani, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 4438 - 4443 2009年 [査読有り]

　概要を見る

A continuous vocal imitation system was developed using a computational model that explains the process of phoneme acquisition by infants. Human infants perceive speech sounds not as discrete phoneme sequences but as continuous acoustic signals. One of critical problems in phoneme acquisition is the design for segmenting these continuous speech sounds. The key idea to solve this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive speech sound units corresponding to phonemes. To segment acoustic signal with articulatory movement, we apply the segmenting method to our system by Recurrent Neural Network with Parametric Bias (RNNPB). This method determines the multiple segmentation boundaries in a temporal sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as kind of phonemes. Our system was implemented by using a physical vocal tract model, called the Maeda model. Experimental results demonstrated that our system can self-organize the same phonemes in different continuous sounds, and can imitate vocal sound involving arbitrary numbers of vowels using the vowel space in the RNNPB. This suggests that our model reflects theprocess of phoneme acquisition. © 2009 IEEE.

DOI

Scopus

13

被引用数

(Scopus)
Prediction and Imitation of Other”s Motions by Reusing Own Forward-Inverse Model in Robots

Tetsuya OGATA, Ryunosuke YOKOYA, Jun TANI, Kazunori KOMATANI, Hiroshi G. OKUNO

Proceedings of IEEE-RAS International Conference on Robots and Automation (ICRA-2009) 4144 - 4149 2009年 [査読有り]

DOI

Scopus

10

被引用数

(Scopus)
Visualization-based approaches to support context sharing towards public involvement support system

Shun Shiramatsu, Yuji Kubota, Kazunori Komatani, Tetsuya Ogata, Toru Takahashi, Hiroshi G. Okuno

Studies in Computational Intelligence 214 111 - 117 2009年 [査読有り]

　概要を見る

In order to facilitate public involvement in the consensus building process needed for community development, a lot of time and effort needs to be spent on assessing and sharing public concerns. This paper presents new approaches for support for context sharing that involve visualizing public meeting records. The first approach is to visualize the transition of topics to enable the user to grasp an overview and to find specific arguments. The second is to visualize topic-related information to enable the user to understand background. The third is to visualize the auditory scene to enable the user to find and to listen to paralinguistic (prosodic) information contained in audio recordings. These approaches are designed on the basis of Visual Information-Seeking Mantra, "Overview first, zoom and filter, then details on demand." These approaches support citizens and stakeholders to find, to track, and to understand target arguments from the records of a public meeting. © 2009 Springer-Verlag Berlin Heidelberg.

DOI

Scopus

2

被引用数

(Scopus)
Parameter estimation for harmonic and inharmonic models by using timbre feature distributions

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Journal of Information Processing 17 ( 7 ) 191 - 201 2009年

　概要を見る

We describe an improved way of estimating parameters for an integrated weighted-mixture model consisting of both harmonic and inharmonic tone models. Our final goal is to build an instrument equalizer (music remixer) that enables a user to change the volume of parts of polyphonic sound mixtures. To realize the instrument equalizer, musical signals must be separated into each musical instrument part. We have developed a score-informed sound source separation method using the integrated model. A remaining but critical problem is to find a way to deal with timbre varieties caused by various performance styles and instrument bodies because our method used template sounds to represent their timbre. Template sounds are generated from a MIDI tone generator based on an aligned score. Difference of instrument bodies between mixed signals and template sounds causes timbre difference and decreases separation performance. To solve this problem, we train probabilistic distributions of timbre features using various sounds to reduce template dependency. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we can estimate model parameters that express the timbre more accurately. Experimental results show that separation performance improved from 4.89 to 8.48 dB.

DOI

Scopus
Analysis of motion searching based on reliable predictability using recurrent neural network

Shun Nishide, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM 192 - 197 2009年 [査読有り]

　概要を見る

Reliable predictability is one of the main factors that determine human behaviors. The authors developed a model that searches and generates robot motions based on reliable predictability. Training of the model consists of three phases. In the first phase, the model trains a sequential learner, namely Recurrent Neural Network with Parametric Bias, to self-organize robot and object dynamics. In the second phase, Steepest Descent Method is utilized to search for robot motion that induces the most predictable object motion. In the third phase, a hierarchical neural network is trained to link object image with the searched motion. Experiments were conducted with cylindrical objects. Analysis of the results have shown that the robot has acquired the most reliable robot motion, shifting it according to the posture of the object. Twenty motion generation experiments have resulted in generation of robot motion that induces consistent rolling motion of the objects. ©2009 IEEE.

DOI

Scopus
Development of a meeting browser towards supporting public involvement

Shun Shiramatsu, Tadachika Ozono, Toramatsu Shintani, Kazunori Komatani, Tetsuya Ogata, Toru Takahashi, Hiroshi G. Okuno

Proceedings - 12th IEEE International Conference on Computational Science and Engineering, CSE 2009 4 717 - 722 2009年

　概要を見る

This paper presents novel methods for support for browsing a long meeting record towards supporting public involvement. Facilitating public involvement in the consensus building process for community development needs a lot of effort and time for sharing context and concerns among citizens and stakeholders. A record of public meeting often becomes too long to overview and to understand for people who did not participate in it. The two issues we addressed relate to how to best provide support for these people. First, support for overviewing the changes in a long meeting to track and to find intended arguments. Second, support for understanding the background of arguments. The approaches to the issues are first, to visualize the transition of topics in the meeting, and second provide information related to a transient topic specified by a user. The meeting browser we developed is designed on the basis of Visual Information-Seeking Mantra, "Overview first, zoom and filter, then details on demand." To visualize a dynamic topic flow, a graph for visualizing the topic transition, SalienceGraph is used to track the dynamic transition of the salience of a word. To visualize related information, the search engine based on SalienceGraph retrieves passages related to a transient topic from past meeting records or documents. These approaches support citizens and stakeholders to find, to track, and to understand a target argument from a long meeting record. © 2009 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Query-by-Example music retrieval approach based on musical genre shift by changing instrument volume

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 12th International Conference on Digital Audio Effects, DAFx 2009 205 - 212 2009年

　概要を見る

We describe a novel Query-by-Example (QBE) approach in Music Information Retrieval, which allows a user to customize query examples by directly modifying the volume of different instrument parts. The underlying hypothesis is that the musical genre shifts (changes) in relation to the volume balance of different instruments. On the basis of this hypothesis, we aim to clarify the relationship between the change of the volume balance of a query and the shift in the musical genre of retrieved similar pieces, and thus help instruct a user in generating alternative queries without choosing other pieces. Our QBE system first separates all instrument parts from the audio signal of a piece with the help of its musical score, and then lets a user remix those parts to change acoustic features that represent musical mood of the piece. The distribution of those features is modeled by the Gaussian Mixture Model for each musical piece, and the Earth Movers Distance between mixtures of different pieces is used as the degree of their mood similarity. Experimental results showed that the shift was actually caused by the volume change of vocal, guitar, and drums.
Improving speech understanding accuracy with limited training data using multiple language models and multiple understanding models

Masaki Katsumaru, Mikio Nakano, Kazunori Komatani, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2735 - 2738 2009年 [査読有り]

　概要を見る

We aim to improve a speech understanding module with a small amount of training data. A speech understanding module uses a language model (LM) and a language understanding model (LUM). A lot of training data are needed to improve the models. Such data collection is, however, difficult in an actual process of development. We therefore design and develop a new framework that uses multiple LMs and LUMs to improve speech understanding accuracy under various amounts of training data. Even if the amount of available training data is small, each LM and each LUM can deal well with different types of utterances and more utterances are understood by using multiple LM and LUM. As one implementation of the framework, we develop a method for selecting the most appropriate speech understanding result from several candidates. The selection is based on probabilities of correctness calculated by logistic regressions. We evaluate our framework with various amounts of training data. Copyright © 2009 ISCA.
Thereminist Robot: Development of a Robot Theremin Player with Feedforward and Feedback Arm Control based on a Theremin”s Pitch Model (Invited paper)

Takeshi MIZUMOTO, Hiroshi TSUJINO, Toru TAKAHASHI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2009) 2297 - 2302 2009年 [査読有り]
Changing timbre and phrase in existing musical performances as you like - Manipulations of single part using harmonic and inharmonic models

Naoki Yasuraoka, Takehiro Abe, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums 203 - 212 2009年

　概要を見る

This paper presents a new music manipulation method that can change the timbre and phrases of an existing instrumental performance in a polyphonic sound mixture. This method consists of three primitive functions: 1) extracting and analyzing of a single instrumental part from polyphonic music signals, 2) mixing the instrument timbre with another, and 3) rendering a new phrase expression for another given score. The resulting customized part is re-mixed with the remaining parts of the original performance to generate new polyphonic music signals. A single instrumental part is extracted by using an integrated tone model that consists of harmonic and inharmonic tone models with the aid of the score of the single instrumental part. The extraction incorporates a residual model for the single instrumental part in order to avoid crosstalk between instrumental parts. The extracted model parameters are classified into their averages and deviations. The former is treated as instrument timbre and is customized by mixing, while the latter is treated as phrase expression and is customized by rendering. We evaluated our method in three experiments. The first experiment focused on introduction of the residual model, and it showed that the model parameters are estimated more accurately by 35.0 points. The second focused on timbral customization, and it showed that our method is more robust by 42.9 points in spectral distance compared with a conventional sound analysis-synthesis method, STRAIGHT. The third focused on the acoustic fidelity of customizing performance, and it showed that rendering phrase expression according to the note sequence leads to more accurate performance by 9.2 points in spectral distance in comparison with a rendering method that ignores the note sequence. Copyright 2009 ACM.

DOI

Scopus

10

被引用数

(Scopus)
Voice quality manipulation for humanoid robots consistent with their head movements

Takuma Otsuka, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

9th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS09 405 - 410 2009年

　概要を見る

This paper presents voice-quality control of humanoid robots based on a new model of spectral envelope modification corresponding to the vertical head motions, and left-right sound-pressure modulation corresponding to the horizontal head motions. We assume that a pitch-axis rotation, or a vertical head motion, and a yaw-axis rotation, or a horizontal head motion, affect the voice quality independently. Spectral envelope modification model is constructed based on the analysis of human vocalizations. Left-right sound-pressure modulation model is established through the measurements of impulse responses using a pair of microphones. The experiments are carried out using two humanoid robots HRP-2 and Robovie-R2. Experimental results show that our method presents the change in the voice quality derived from pitch-axis head movement in a robot-to-robot dialogue situation when the interval between the robots are 50 cm. It is also confirmed that an observable modulation in the voice quality declines as the distance between the robots becomes large. The voice-cast directionality caused by yaw-axis rotation is observable using our model even when the robots stand as far as 150 cm away. ©2009 IEEE.

DOI

Scopus

4

被引用数

(Scopus)
Bowed string sequence estimation of a violin based on adaptive audio signal classification and context-dependent error correction

Akira Maezawa, Katsutoshi Itoyama, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

ISM 2009 - 11th IEEE International Symposium on Multimedia 9 - 16 2009年 [査読有り]

　概要を見る

The sequence of strings played on a bowed string instrument is essential to understanding of the fingering. Thus, its estimation is required for machine understanding of violin playing. Audio-based identification is the only viable way to realize this goal for existing music recordings. A naïve implementation using audio classification alone, however, is inaccurate and is not robust against variations in string or instruments. We develop a bowed string sequence estimation method by combining audio-based bowed string classification and context-dependent error correction. The robustness against different setups of instruments improves by normalizing the F0-dependent features using the average feature of a recording. The performance of error correction is evaluated using an electric violin with two different brands of strings and and an acoustic violin. By incorporating mean normalization, the recognition error of recognition accuracy due to changing the string alleviates by 8 points, and that due to change of instrument by 12 points. Error correction decreases the error due to change of string by 8 points and that due to different instrument by 9 points. © 2009 IEEE.

DOI

Scopus

8

被引用数

(Scopus)
ロボットによるRNNPBを用いた高予測信頼性動作探索とその解析

西出俊, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第9回システムインテグレーション部門講演会(SI2008), 計測自動制御学会 81 - 82 2008年12月
連続音響信号と構音情報の分節化に基づく母音音声模倣モデル

神田尚, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

第9回システムインテグレーション部門講演会(SI2008), 計測自動制御学会 639 - 640 2008年12月
RNNを用いた構音運動文節化に基づく連続母音模倣モデル

神田尚, 尾形哲也, 高橋徹, 駒谷和範, 奥乃博

(社)音響学会関西支部第11回若手研究者交流研究発表会 2008年12月
楽器音イコライザ：楽器パートの音量を操作可能なオーディオプレイヤー

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

(社)音響学会関西支部第11回若手研究者交流研究発表会 2008年12月
Vocal Imitation Model with Segmenting and Composing Capability of Vowel Structure using Recurrent Neural Network

Hisashi KANDA, Tetsuya OGATA, Toru TAKAHASHI, Kazunori KOMATANI, Hiroshi G. OKUNO

第28回 AI チャレンジ研究会, SIG-Challenge-A802-2, 人工知能学会 7 - 12 2008年11月
Managing out-of-grammar utterances by topic estimation with domain extensibility in multi-domain spoken dialogue systems

Kazunori Komatani, Satoshi Ikeda, Tetsuya Ogata, Hiroshi G. Okuno

Speech Communication 50 ( 10 ) 863 - 870 2008年10月 [査読有り]

　概要を見る

Spoken dialogue systems must inevitably deal with out-of-grammar utterances. We address this problem in multi-domain spoken dialogue systems, which deal with more tasks than a single-domain system. We defined a topic by augmenting a domain about which users want to find more information, and we developed a method of recovering out-of-grammar utterances based on topic estimation, i.e., by providing a help message in the estimated domain. Moreover, domain extensibility, that is, the ability to add new domains to the system, should be inherently retained in multi-domain systems. To estimate domains without sacrificing extensibility, we collected documents from the Web as training data. Since the data contained a certain amount of noise, we used latent semantic mapping (LSM), which enables robust topic estimation by removing the effects of noise from the data. Experimental results showed that our method improved topic estimation accuracy by 23.2 points for data including out-of-grammar utterances. © 2008 Elsevier B.V. All rights reserved.

DOI

Scopus

7

被引用数

(Scopus)
A game-theoretic model of referential coherence and its empirical verification using large Japanese and English corpora

Shun Shiramatsu, Kazunori Komatani, Kôiti Hasida, Tetsuya Ogata, Hiroshi G. Okuno

ACM Transactions on Speech and Language Processing 5 ( 3 ) 1 - 27 2008年10月

　概要を見る

Referential coherence represents the smoothness of discourse resulting from topic continuity and pronominalization. Rational individuals prefer a referentially coherent structure of discourse when they select a language expression and its interpretation. This is a preference for cooperation in communication. By what principle do they share coherent expressions and interpretations? Centering theory is the standard theory of referential coherence [Grosz et al. 1995]. Although it is well designed on the bases of first-order inference rules [Joshi and Kuhn 1979], it does not embody a behavioral principle for the cooperation evident in communication. Hasida [1996] proposed a game-theoretic hypothesis in relation to this issue. We aim to empirically verify Hasida's hypothesis by using corpora of multiple languages. We statistically design language-dependent parameters by using a corpus of the target language. This statistical design enables us to objectively absorb language-specific differences and to verify the universality of Hasida's hypothesis by using corpora. We empirically verified our model by using large Japanese and English corpora. The result proves the language universality of the hypothesis.

DOI

Scopus

4

被引用数

(Scopus)
ミッシングフィーチャ理論に基づく複数話者同時発話音声認識における音響特徴量とマスクの検討

高橋徹, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

日本音響学会 2008年秋季研究発表会講演論文集 95 - 06 2008年10月
独立成分分析に基づく適応フィルタのロボット聴覚への応用

武田龍, 中臺一博, 駒谷和範, 尾形哲也

日本ロボット学会誌 26 ( 6 ) 529 - 536 2008年09月
Synthesis Approach for Manipulating Pitch of a Musical Instrument Sound with Considering Timbral Characteristics

Takehiro ABE, Katsutoshi ITOYAMA, Kazuyoshi YOSHII, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceeding of the 11th International Conference on Digital Audio Effects (DAFx-08) 2008年09月
Analysis of Reliable Predictability based Motion Generation using RNNPB

Shun NISHIDE, Tetsuya OGATA, Jun TANI, Kazunori KOMATANI, Hiroshi G. OKUNO

Proceedings of International Conference on Soft Computing and Intelligent Systems and International Symposium on advanced Intelligent Systems (SCIS&ISIS 2008) 305 - 310 2008年09月
ロボット聴覚のためのソフトマスク生成法による周辺話者音声認識率の改善

高橋徹, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

第26回日本ロボット学会学術講演会 1A1 - 01 2008年09月
聴覚機能を持つ音楽ロボットのためのアーキテクチャの設計とビートカウントロボットへの適用

水本武志, 武田龍, 吉井和佳, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

第26回日本ロボット学会学術講演会 1A1 - 02 2008年09月
ロボットにおける自己身体の順逆モデルを再利用した他者行為の予測と模倣

尾形哲也, 横矢龍之介, 谷淳, 駒谷和範, 奥乃博

第26回日本ロボット学会学術講演会 1J1 - 04 2008年09月
RNN を用いた連続音響信号からの母音構造と遷移情報の抽出

神田尚, 尾形哲也, 駒谷和範, 奥乃博

第26回日本ロボット学会学術講演会 1A2 - 01 2008年09月
独立成分分析を応用したロボット聴覚による残響下におけるバージイン発話認識

武田龍, 中臺一博, 高橋徹, 駒谷和範, 尾形哲也, 奥乃博

第26回日本ロボット学会学術講演会 1A2 - 02 2008年09月
物体挙動予測モデルによる動画像特徴量の自己組織化

西出俊, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第26回日本ロボット学会学術講演会 2N2 - 04 2008年09月
形態主張型コミュニケーションロボットにおける研究—形態主張行動の解り易さのユーザへの影響に関する研究—

近藤裕樹, 奥出京司郎, 岩丸大二郎, 守良真, 菅佑樹, 尾形哲也, 菅野重樹

第26回日本ロボット学会学術講演会 3J2 - 02 2008年09月
音声対話システムにおけるラピッドプロトタイピングを指向したWFSTに基づく言語理解

福林雄一朗, 駒谷和範, 中野幹生, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

情報処理学会論文誌 49 ( 8 ) 2762 - 2772 2008年08月

　概要を見る

音声対話システムの開発の初期段階において，言語理解部は，(i) 構築が容易，(ii)様々な表現に対して頑健という2条件を満たす必要がある．本論文では，大量のコーパス収集や人手での詳細な言語理解ルールの記述を行うことなしに，簡単に言語理解部を構築する（ラピッドプロトタイピング）手法について述べる．本手法では，音声認識誤りを含む入力に対して，Weighted Finite State Transducer（WFST）により言語理解結果を出力する．この際の重みは複数種類を定義したうえで，学習データに基づき最適な重みづけを選択する．この重みづけは従来のWFSTを利用した手法に比べて簡単であるため，少ない学習データで動作する．本手法を2つのドメインで評価した結果，本手法では100発話程度の学習で，ベースライン手法より高いコンセプト正解精度が得られた．開発の初期段階にある新たなドメインであっても，この程度の量の発話を集めることは容易であり，本手法は言語理解部のラピッドタイピングに適している．Language understanding (LU) modules for spoken dialogue systems in an early phase of their development need to be (i) easy to construct, and (ii) robust against various expressions. In this paper, we describe a method for constructing LU modules easily without a large amount of corpus or complicated handcrafted rules. An LU result is selected with Weighted Finite State Transducer from an automatic speech recognition output that may contain speech recognition errors. We designed several weighting schemes. A weighting scheme is determined by using training data. Since these weighting schemes are simpler than conventional methods, our method does not need a large amount of data for determining an optimal scheme. We evaluated our method in two different domains. The results revealed that our method outperformed baseline methods with less than one hundred utterances as training data, which can be reasonably prepared for new domains. This shows that our method is appropriate for a rapid prototyping of LU modules.

CiNii
音高による音色変化を考慮した楽器音の音高・音長操作手法

安部武宏, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

音楽情報科学研究会, 2008-MUS-76, 情報処理学会 Vol.2008 2008年08月
楽器音イコライザによる音色の類似度に基づく楽曲検索システム

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

音楽情報科学研究会, 2008-MUS-76, 情報処理学会 Vol.2008 2008年08月
音素獲得に向けたリカレントニューラルネットワークによる音響信号と構音運動の分節化

神田尚, 尾形哲也, 駒谷和範, 奥乃博

日本機械学会ロボティクスメカトロニクス講演会 2P1 - G03 2008年06月
ユーザのカスタマイズを受容・拒否できる機構を持つロボットシステムの開発

近藤裕樹, 坂上徳翁, 奥出京司郎, 岩丸大二郎, 菅佑樹, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 1P1 - G11 2008年06月
神経調節機能を参考とした自律エージェントの神経制御器の開発

菅佑樹, 小林大三, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2P1 - G05 2008年06月
物体操作に関する脳の情報処理構造を参考にした運動学習モデル

有江浩明, 尾形哲也, 谷淳, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2P1 2008年06月
自己組織化回路素子SONEの制御回路構造形成メカニズム

金天海, 阿部博行, 出澤純一, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2P2 - G11 2008年06月
カスタマイズ可能なロボットにおける形態主張の効果と検証

岩丸大二郎, 奥出京司郎, 近藤裕樹, 坂上徳翁, 菅佑樹, 尾形哲也, 菅野重樹

人工知能学会全国大会 (JSAI2008) 1I1 - 04 2008年06月
: SalienceGraph: 参照確率に基づく話題遷移図の可視化

白松俊, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会全国大会 (JSAI2008) 22 1H1 - 1 2008年06月

CiNii
音声対話システムにおける簡略表現認識のための誤認識増加を抑制する自動語彙拡張

勝丸真樹, 駒谷和範, 尾形哲也, 奥乃博

第71回音声言語情報処理研究会, 情報処理学会 2008 ( 46 ) 71 - 76 2008年05月

　概要を見る

音声対話システムにおいてユーザはしばしば名称の一部を省略して発話し，音声認識誤りを招く．我々は，「簡略表現」を単語の一部を省略した表現として定義し，簡略表現を音声認識辞書に自動追加する．簡略表現の取得には，日本語では複合語を分割する必要があるが，形態素解析器のみの分割では固有名詞は必ずしも正確に分割できない．さらに，多くの簡略表現を辞書に追加すると，語彙サイズの増加により音声認識精度が劣化する．我々は，前者の問題に対し，形態素解析器に加えて辞書内の文字間の連接確率を用いることで固有名詞の複合語を分割した．後者の問題に対しては，生成した簡略表現の生起確率を，既存辞書の語との音素列の類似度に応じて操作した．本手法で簡略表現を自動追加した結果，既存辞書内の語彙のみを含む発話に対する単語正解精度の劣化を 0.1 ポイントに抑えながら，簡略表現を含む発話の単語正解精度を，既存の辞書による場合と比較して 24.2 ポイント向上させた．Users often abbreviate long words when using spoken dialogue systems, which results in automatic speech recognition (ASR) errors. We define abbreviated words as sub-words of the original word, and automatically add them into an ASR dictionary. Two issues arise during this vocabulary expansion. The first problem is that proper nouns cannot be correctly segmented by general morphological analyzers, although long and compounded words need to be segmented in agglutinative languages such as Japanese. The second is that, as the vocabulary size increases, adding many abbreviated words degrades the ASR accuracy. We develop two methods, (1) to segment words by using conjunction probabilities between characters, and (2) to manipulate occurrence probabilities of generated abbreviated words on the basis of the phonological similarities between abbreviated and original words. By our method, the ASR accuracy was improved by 24.2 points for utterances containing abbreviated words, with only a 0.1 point degradation of ASR accuracy for those containing words in the original dictionary.

CiNii
Predicting object dynamics from visual images through active sensing experiences

Shun Nishide, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

Advanced Robotics 22 ( 5 ) 527 - 546 2008年04月 [査読有り]

　概要を見る

Prediction of dynamic features is an important task for determining the manipulation strategies of an object. This paper presents a technique for predicting dynamics of objects relative to the robot's motion from visual images. During the training phase, the authors use the recurrent neural network with parametric bias (RNNPB) to self-organize the dynamics of objects manipulated by the robot into the PB space. The acquired PB values, static images of objects and robot motor values are input into a hierarchical neural network to link the images to dynamic features (PB values). The neural network extracts prominent features that each induce object dynamics. For prediction of the motion sequence of an unknown object, the static image of the object and robot motor value are input into the neural network to calculate the PB values. By inputting the PB values into the closed loop RNNPB, the predicted movements of the object relative to the robot motion are calculated recursively. Experiments were conducted with the humanoid robot Robovie-IIs pushing objects at different heights. The results of the experiment predicting the dynamics of target objects proved that the technique is efficient for predicting the dynamics of the objects. © 2008 VSP.

DOI

Scopus

22

被引用数

(Scopus)
音楽と映像の調和度計算モデルを用いたクロスメディア検索

斎藤博己, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 70 465 - 466 2008年03月

CiNii
顔の動作に追従したインタフェースを持つ音環境可視化システム

久保田祐史, 吉田雅敏, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 70 ( 0 ) 273 - 274 2008年03月

CiNii J-GLOBAL
トピック推定と対話履歴の統合によるドメイン選択を行うマルチドメイン音声対話システム

池田智志, 駒谷和範, 尾形哲也, 奥乃博

全国大会講演論文集 70 ( 0 ) 139 - 140 2008年03月

CiNii J-GLOBAL
Motion from sound: Intermodal neural network mapping

Tetsuya Ogata, Hiroshi G. Okuno, Hideki Kozima

IEEE Intelligent Systems 23 ( 2 ) 76 - 78 2008年03月 [査読有り]

　概要を見る

A technological method has been developed for intermodal mapping to generate robot motion from various sounds as well as to generate sounds from motions. The procedure consists of two phases, first the learning phase in which it observes some events together with associated sounds and then memorizes those sounds along with the motions of the sound source. Second phase is the interacting phase in which the robot receives limited sensory information from a single modality as input and associates this with different modality and expresses it. The recurrent-neural-network model with parametric bias (RNNPB) is applied that uses the current state-vector as input for outputting the next state-vector. The RNNPB model can self-organize the values that encode the input dynamics into special parametric-bias modes to reproduce he multimodal sensory flow.

DOI

Scopus
楽譜情報を援用した多重奏音楽音響信号の音源分離と調波・非調波統合モデルの制約付パラメータ推定の同時実現

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 49 ( 3 ) 1465 - 1479 2008年03月

　概要を見る

本論文では，多重奏の音楽音響信号とその楽曲に含まれるすべての単音の音高・音長・音量・発音時刻・楽器の種類の組である楽譜情報を入力として，単音ごとの音響信号を出力する音源分離手法と，そのための制約付きモデルパラメータ推定手法について述べる．本分離手法では，Standard MIDIb File（SMF）などから抽出された楽譜情報を用いることで混合音のパワースペクトルを単音ごとに分離し，調波構造と非調波構造のそれぞれを表現する2 つのモデルを統合した新たな重み付き混合モデルを用いることで，単音に複数の調波構造が含まれることを防ぎ，かつ音高を超えた楽器音の音色類似性を考慮することを実現する．モデルパラメータは，楽譜情報に基づいてMIDI 音源から生成したテンプレート音によって初期化し，EM アルゴリズムを用いた最大事後確率推定により反復推定する．さらに，モデルの過学習を防ぎ，同一楽器の単音のモデルに類似した音色を持たせるための制約条件も同時に用いる．ポピュラー音楽のSMF を用いた評価実験で，本手法によりSNR が0.4 ? 0.9dB向上することを確認した．This paper describes a sound sourse separation method for polyphonic sound mixtures of musical signals which include both harmonic instrument sounds and inharmonic instrument sounds, and a constrained parameter estimation method by using a score which includes pitch, duration, volume, onset time, and instrument of each note as prior information. We separate a power spectrum of sound mixtures into each musical note by using an integrated weighted-mixture model consisting of both harmonic-structure and inharmonic-structure tone models (generative models for the power spectrogram). The integrated model realize a parameter estimation method under a constraint of parameter similarity in the same musical instruments. We initialize model parameters using template sounds which are recorded from a MIDI tone generator. On the basis of the Maximum A Posteriori Probability estimation using the EM algorithm, we estimated all parameters of this integrated model under several original constraints for preventing over-training and maintaining intra-instrument consistency. Using standard MIDI files as prior information of the model parameters, we confirmed that the integrated model increased the SNR by 0.4 窶骭 0.9 dB.

CiNii
ロボットの順逆モデルの変換による他者行為予測と模倣

横矢龍之介, 尾形哲也, 西出俊, 谷淳, 駒谷和範, 奥乃博

情報処理学会第70回全国大会 2008年03月
ベース音高を考慮したポピュラー音楽に対する和音進行認識

須見康平, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
楽器固有の音響的特徴を考慮した楽器音の音高操作手法

安部武宏, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
複数楽器個体による事前分布を用いた調波・非調波統合モデルのパラメータ推定

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
音楽と自分の声を聞き分けながらビートに合わせて発声するロボットの開発

水本武志, 武田龍, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
顔の動作に追従したGUIインタフェースを持つ音環境可視化システム

久保田祐史, 吉田雅敏, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
音声対話システムにおけるユーザの固有名詞の簡略化に対処する語彙拡張

勝丸真樹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
マルチドメインシステムにおけるトピック推定と対話履歴の統合によるドメイン選択の高精度化

池田智志, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
音声対話システムにおけるWFSTに基づく文法検証を利用した動的ヘルプ生成

福林雄一朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
音声対話システムにおける誤り原因の階層的分類とその推定に基づく発話誘導

駒谷和範, 福林雄一朗, 池田智志, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
ロボット音声対話のためのMFTとICAによるバージイン許容機能の評価

武田龍, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
音楽と映像の調和度計算モデルを用いたクロスメディア探索

斎藤博己, 糸山克寿, 吉井和佳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
音源定位結果と音声認識結果をHMDに統合呈示する聴覚障害者向け音環境理解支援システム

徳田浩一, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第70回全国大会 2008年03月
神経回路モデルによる動作・言語変換を利用した人間ロボット音声協調

張陽, 尾形哲也, 谷淳, 村瀬昌満, 駒谷和範, 奥乃博

情報処理学会第70回全国大会 2008年03月
RNNPBによる音響模倣・分節化を用いた音素獲得モデルの提案

神田尚, 尾形哲也, 駒谷和範, 奥乃博

情報処理学会第70回全国大会 2008年03月
新近性効果の減数曲線を加味した顕現性計算手法に基づく話題遷移の可視化

白松俊, 駒谷和範, 尾形哲也, 奥乃博

言語処理学会第14回年次大会 432 - 435 2008年03月
音声対話システムにおける動的ヘルプ生成を指向したWFSTに基づく文法検証によるユーザ知識推定

福林雄一朗, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会言語・音声理解と対話処理研究会資料 52nd 45 - 50 2008年02月

J-GLOBAL
An efficient hybrid music recommender system using an incrementally trainable probabilistic generative model

Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE Transactions on Audio, Speech and Language Processing 16 ( 2 ) 435 - 447 2008年02月 [査読有り]

　概要を見る

This paper presents a hybrid music recommender system that ranks musical pieces while efficiently maintaining collaborative and content-based data, i.e., rating scores given by users and acoustic features of audio signals. This hybrid approach overcomes the conventional tradeoff between recommendation accuracy and variety of recommended artists. Collaborative filtering, which is used on e-commerce sites, cannot recommend nonbrated pieces and provides a narrow variety of artists. Content-based filtering does not have satisfactory accuracy because it is based on the heuristics that the user's favorite pieces will have similar musical content despite there being exceptions. To attain a higher recommendation accuracy along with a wider variety of artists, we use a probabilistic generative model that unifies the collaborative and content-based data in a principled way. This model can explain the generative mechanism of the observed data in the probability theory. The probability distribution over users, pieces, and features is decomposed into three conditionally independent ones by introducing latent variables. This decomposition enables us to efficiently and incrementally adapt the model for increasing numbers of users and rating scores. We evaluated our system by using audio signals of commercial CDs and their corresponding rating scores obtained from an e-commerce site. The results revealed that our system accurately recommended pieces including nonrated ones from a wide variety of artists and maintained a high degree of accuracy even when new users and rating scores were added. © 2008 IEEE.

DOI

Scopus

147

被引用数

(Scopus)
自己モデルの投影に基づくロボットによる他者発見と動作模倣

横矢龍之介, 尾形哲也, 谷淳, 駒谷和範

ヒューマンインタフェース学会論文誌 10 ( 1 ) 59 - 71 2008年02月 [査読有り]
Hybrid Collaborative and Content-based Music Recommendation Using Incrementally-trainable Probabilistic Generative Model

Kazuyoshi YOSHII, Masataka GOTO, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

IEEE Transactions on Audio, Speech and Language Processing 12 ( 2 ) 435 - 447 2008年02月 [査読有り]
Meaning Games, In New Frontiers in Artificial Intelligence, JSAI 2007 Conference and Workshops

Koiti HASHIDA, Shun SHIRAMATSU, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Revised Selected Papers, Lecture Notes in Artificial Intelligence 4914 228 - 241 2008年02月
Advanced Robotics: Preface

Maria Chiara Carrozza, Tetsuya Ogata, Eugenio Guglielmelli

Advanced Robotics 22 ( 1 ) 1 - 2 2008年01月

DOI

Scopus
Cheek to Chip: Dancing Robots and AI's Future.

Jean-Julien Aucouturier, Katsushi Ikeuchi, Hirohisa Hirukawa, Shinichiro Nakaoka, Takaaki Shiratori, Shunsuke Kudoh, Fumio Kanehiro, Tetsuya Ogata, Hideki Kozima, Hiroshi G. Okuno, Marek P. Michalowski, Yuta Ogai, Takashi Ikegami, Kazuhiro Kosuge, Takahiro Takeda, Yasuhisa Hirata

IEEE Intell. Syst. 23 ( 2 ) 74 - 84 2008年 [査読有り]

DOI

Scopus
Meaning games

Kôiti Hasida, Shun Shiramatsu, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4914 LNAI 228 - 241 2008年 [査読有り]

　概要を見る

Communication can be accounted for in game-theoretic terms. The meaning game is proposed to formalize intentional communication in which the sender sends a message and the receiver attempts to infer its intended meaning. Using large Japanese and English corpora, the present paper demonstrates that centering theory is derived from a meaning game. This suggests that there are no language-specific rules on referential coherence. More generally speaking, language use seems to employ Pareto-optimal ESSs (evolutionarily stable strategies) of potentially very complex meaning games. There is still much to do before this complexity is elucidated in scientific terms, but game theory provides statistical and analytic means by which to advance the study on semantics and pragmatics of natural languages and other communication modalities. © 2008 Springer-Verlag Berlin Heidelberg.

DOI

Scopus

1

被引用数

(Scopus)
2P1-G03 Segmenting Sound Signals and Articulatory Movement using Recurrent Neural Network toward Phoneme Acquisition

神田尚, 尾形哲也, 駒谷和範, 奥乃博

ロボティクス・メカトロニクス講演会講演概要集 2008 ( 0 ) _2P1 - G03_1-_2P1-G03_4 2008年

　概要を見る

This paper proposes a computational model for phoneme acquisition by infants. Infants perceive speech not as discrete phoneme sequences but as continuous acoustic signals. One of critical problems in phoneme acquisition is the design for segmenting these continuous speech. The key idea to solve this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive sound units corresponding to phonemes. To segment acoustic signal with articulatory movement, our system was implemented by using a physical vocal tract model, called the Maeda model, and applying a segmenting method using Recurrent Neural Network with Parametric Bias (RNNPB). This method determines segmentation boundaries in a sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as kind of phonemes. Experimental results demonstrated that our system could self-organize the same phonemes in different continuous sounds. This suggests that our model reflects the process of phoneme acquisition.

CiNii
Structural feature extraction based on active sensing experiences

Shun Nishide, Tetsuya Ogata, Ryunosuke Yokoya, Kazunori Komatani, Hiroshi G. Okuno, Jun Tani

Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008 169 - 172 2008年 [査読有り]

　概要を見る

Affordance is a feature of an object or environment that implies how to interact with it. Based on affordance theory, humans are said to perceive invariant structures for cognizing the object/environment for generating behaviors. In this paper, the authors present a method to extract invariant structures of objects from visual raw images, based on object manipulation experiences using a humanoid robot. The method consists of two training phases. The first phase utilizes Recurrent Neural Network with Parametric Bias (RNNPB) to self-organize dynamical object features extracted during active sensing with objects. The second phase trains a hierarchical neural network attached to RNNPB for associating object images and robot motions with self-organized object features. Analysis of the model has uncovered static objects features that are closely related to dynamic object motions, such as round or stable. © 2008 IEEE.

DOI

Scopus

2

被引用数

(Scopus)
Robot audition from the viewpoint of computational auditory scene analysis

Hiroshi G. Okuno, Tetsuya Ogata, Kazunori Komatani

Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008 35 - 40 2008年 [査読有り]

　概要を見る

We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer human interaction by manipulating real-world sound signals. The objective of our research is the understanding of an arbitrary sound mixture including music and environmental sounds as well as voiced speech, obtained by robot's ears (microphones) embedded on the robot. Three main issues in computational auditory scene analysis are sound source localization, separation, and recognition of separated sounds for a mixture of speech signals as well as polyphonic music signals. The Missing Feature Theory (MFT) approach integrates sound source separation and automatic speech recognition by generating missing feature masks. This robot audition system has been successfully ported to three kinds of robots, SIG2, Robovie R2 and Honda ASIMO. A robot recognizes three simultaneous speeches such as placing a meal order ora referee for RockPaper-Scissors Sound Games with a delay of less than 2 seconds. The real-time beat tracking system is also developed for robot audition. A robot hears music, understands and predicts its musical beats to behave in accordance with the beat times in real-time. © 2008 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Evaluation of two-channel-based sound source localization using 3D moving sound creation tool

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - International Conference on Informatics Education and Research for Knowledge-Circulating Society, ICKS 2008 209 - 212 2008年 [査読有り]

　概要を見る

We proposed the way that can repeatedly evaluate the localization methods for moving sounds in the same condition regardless of a kind of methods and a number of microphones. And, we developed two-channel-based sound source localization integrated with a cross-power spectrum phase (CSP) analysis and EM algorithm. This one can localize several moving sounds and reduce localization error. Many sound source localization methods have already been developed. However, they could not be evaluated for moving sound in the same condition because it is hard to build database for moving sounds with accurate track information whenever making experiments. Also, to localize several moving sounds, conventional methods need a lot of microphone and/or prior information such as impulse response data. In this paper, we evaluated our sound localization method using 3D moving sound creation tool and confirmed that our method with two microphones can well localize the voices of a moving talker without impulse response data. © 2008 IEEE.

DOI

Scopus

3

被引用数

(Scopus)
Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics

Takehiro Abe, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - 11th International Conference on Digital Audio Effects, DAFx 2008 249 - 256 2008年 [査読有り]

　概要を見る

This paper presents an analysis-manipulation method that can generate musical instrument sounds with arbitrary pitches and durations from the sound of a given musical instrument (called seed) without distorting its timbrai characteristics. Based on psychoacoustical knowledge of the auditory effects of timbres, we defined timbrai features based on the spectrogram of the sound of a musical instrument as (i) the relative amplitudes of the harmonic peaks, (ii) the distribution of the inharmonic component, and (iii) temporal envelopes. First, to analyze the timbrai features of a seed, it was separated into harmonic and inharmonic components using Itoyama's integrated model. For pitch manipulation, we took into account the pitch-dependency of features (i) and (ii). We predicted the values of each feature by using a cubic polynomial that approximated the distribution of these features over pitches. To manipulate duration, we focused on preserving feature (iii) in the attack and decay duration of a seed. Therefore, only steady durations were expanded or shrunk. In addition, we propose a method for reproducing the properties of vibrato. Experimental results demonstrated the quality of the synthesized sounds produced using our method. The spectral and MFCC distances between the synthesized sounds and actual sounds of 32 instruments were reduced by 64.70% and 32.31%, respectively.
Vowel imitation using vocal tract model and recurrent neural network

Hisashi Kanda, Tetsuya Ogata, Kazunori Komatani, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4985 LNCS ( PART 2 ) 222 - 232 2008年 [査読有り]

　概要を見る

A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate speech sounds produced by adults, whose vocal tracts have physical properties (i.e., articulatory motions) differing from those of infants' vocal tracts. To solve this problem, a model based on the motor theory of speech perception, was constructed. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds that have not actually been generated by the system. The system was implemented by using Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model, called Maeda model. Experimental results demonstrated that the system was sufficiently robust with respect to individual differences in speech sounds and could imitate unexperienced vowel sounds. © 2008 Springer-Verlag Berlin Heidelberg.

DOI

Scopus

1

被引用数

(Scopus)
Rapid Prototyping of Robust Language Understanding Modules for Spoken Dialogue Systems.

Yuichiro Fukubayashi, Kazunori Komatani, Mikio Nakano, Kotaro Funakoshi, Hiroshi Tsujino, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008) 210 - 216 2008年 [査読有り]
Object dynamics prediction and motion generation based on reliable predictability

Shun Nishide, Tetsuya Ogata, Ryunosuke Yokoya, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 1608 - 1614 2008年 [査読有り]

　概要を見る

Consistency of object dynamics, which is related to reliable predictability, is an important factor for generating object manipulation motions. This paper proposes a technique to generate autonomous motions based on consistency of object dynamics. The technique resolves two issues: construction of an object dynamics prediction model and evaluation of consistency. The authors utilize Recurrent Neural Network with Parametric Bias to self-organize the dynamics, and link static images to the self-organized dynamics using a hierarchical neural network to deal with the first issue. For evaluation of consistency, the authors have set an evaluation function based on object dynamics relative to robot motor dynamics. Experiments have shown that the method is capable of predicting 90% of unknown object dynamics. Motion generation experiments have proved that the technique is capable of generating autonomous pushing motions that generate consistent rolling motions. ©2008 IEEE.

DOI

Scopus

12

被引用数

(Scopus)
Two-channel-based voice activity detection for humanoid robots in noisy home environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 3495 - 3501 2008年 [査読有り]

　概要を見る

The purpose of this research is to accurately classify the speech signals originating from the front even in noisy home environments. This ability can help robots to improve speech recognition and to spot keywords. We therefore developed a new voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals that are received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. Also, it can work in real time without training filter coefficients beforehand even in noisy environments (SNR > 0 dB) and can cope with speech noises generated by audio-visual equipments such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrated with cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones. ©2008 IEEE.

DOI

Scopus

6

被引用数

(Scopus)
A portable robot audition software system for multiple simultaneous speech signals

H. G. Okuno, S. Yamamoto, K. Nakadai, J. M. Valin, T. Ogata, K. Komatani

Proceedings - European Conference on Noise Control 483 - 488 2008年 [査読有り]

　概要を見る

Since a robot is deployed in various kinds of environments, the robot audition system should work with minimum prior information on environments to localize, separate and recognize utterances by multiple simultaneous talkers. For example, it should not assume either the number of speakers, the location of speakers for sound source separation (SSS), or specially tuned acoustic model for automatic speech recognition (ASR). We developed \HARK" portable robot audition that uses eight microphones installed on the surface of robot's body such as Honda ASIMO, and SIG-2 and Robovie-R2 at Kyoto University. HARK integrates SSS and ASR by using the Missing-Feature Theory. For SSS, we use Geometric Source Separation and multi-channel post-filter to separate each utterance. Since separated speech signals are distorted due to interfering talkers and sound source separation, multi-channel post-filter enhanced speech signals. At this process, we create a missing feature mask that specifies which acoustic features are reliable in time-frequency domain. Multi-band Julius, a missing-feature-theory based ASR, uses this mask to avoid the inuence of unreliable features in recognizing such distorted speech signals. The system demonstrated a waitress robot that accepts meal orders placed by three actual human talkers.
Integrating topic estimation and dialogue history for domain selection in multi-domain spoken dialogue systems

Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5027 LNAI 294 - 304 2008年 [査読有り]

　概要を見る

We present a method of robust domain selection against out-of-grammar (OOG) utterances in multi-domain spoken dialogue systems. These utterances cause language-understanding errors because of a limited set of grammar and vocabulary of the systems, and deteriorate the domain selection. This is critical for multi-domain spoken dialogue systems to determine a system's response. We first define a topic as a domain from which the user wants to retrieve information, and estimate it as the user's intention. This topic estimation is enabled by using a large amount of sentences collected from the Web and Latent Semantic Mapping (LSM). The results are reliable even for OOG utterances. We then integrated both the topic estimation results and the dialogue history to construct a robust domain classifier against OOG utterances. The idea of integration is based on the fact that the reliability of the dialogue history is often impeded by language-understanding errors caused by OOG utterances, from which using topic estimation obtains useful information. Experimental results using 2191 utterances showed that our integrated method reduced domain selection errors by 14.3%. © 2008 Springer-Verlag Berlin Heidelberg.

DOI

Scopus

3

被引用数

(Scopus)
Development of user-adaptive value system of learning function using interactive EC

Yuki Suga, Shigeki Sugano, Yoshinori Ikuma, Tetsuya Ogata

IFAC Proceedings Volumes (IFAC-PapersOnline) 17 ( 1 PART 1 ) 9156 - 9161 2008年

　概要を見る

Our goal is to create a user-adaptive communication-robot. We are developing a system for evaluating human-robot interactions. Although such evaluation is indispensable for learning algorithms, users' preferences are too difficult to model because they are subjective. In this study, we used the interactive evolutionary computation (IEC) to configure the value system of a learning communicationrobot. The IEC is a genetic algorithm whose fitness function is performed by the user. In our experiment, we encoded the values of sensors (reward or punishment) into genes, and subjects interacted with the learning robot. Through the interaction, the subjects evaluated the robot by touching its sensors, and the robot learned appropriate combinations between input and output. Afterward, the subjects gave their scores to the experimenter, and the scores were regarded as the fitness values of the corresponding genes. These sequences were continued until the 4 generation, and then the subjects compared three of their best genes and two of the experimenter's. We found that the user-adaptive value system is suitable for the communication-robot. Copyright © 2007 International Federation of Automatic Control All Rights Reserved.

DOI

Scopus
Human-Adaptive Robot Interaction Using Interactive EC with Human-Machine Hybrid Evaluation.

Yuki Suga, Tetsuya Ogata, Shigeki Sugano

J. Robotics Mechatronics 20 ( 4 ) 610 - 620 2008年

DOI

Scopus
Soft Missing-Feature Mask Generation for Simultaneous Speech Recognition System in Robots

Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 992 - + 2008年 [査読有り]
Instrument equalizer for query-by-example retrieval: Improving sound source separation based on integrated harmonic and inharmonic models

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISMIR 2008 - 9th International Conference on Music Information Retrieval 133 - 138 2008年 [査読有り]

　概要を見る

This paper describes a music remixing interface, called Instrument Equalizer, that allows users to control the volume of each instrument part within existing audio recordings in real time. Although query-by-example retrieval systems need a user to prepare favorite examples (songs) in general, our interface gives a user to generate examples from existing ones by cutting or boosting some instrument/vocal parts, resulting in a variety of retrieved results. To change the volume, all instrument parts are separated from the input sound mixture using the corresponding standard MIDI file. For the separation, we used an integrated tone (timbre) model consisting of harmonic and inharmonic models that are initialized with template sounds recorded from a MIDI sound generator. The remaining but critical problem here is to deal with various performance styles and instrument bodies that are not given in the template sounds. To solve this problem, we train probabilistic distributions of timbre features by using various sounds. By adding a new constraint of maximizing the likelihood of timbre features extracted from each tone model, we succeeded in estimating model parameters that better express actual timbre.
Automatic chord recognition based on probabilistic integration of chord transition and bass pitch estimation

Kouhei Sumi, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISMIR 2008 - 9th International Conference on Music Information Retrieval 39 - 44 2008年 [査読有り]

　概要を見る

This paper presents a method that identifies musical chords in polyphonic musical signals. As musical chords mainly represent the harmony of music and are related to other musical elements such as melody and rhythm, the performance of chord recognition should improve if this interrelationship is taken into consideration. Nevertheless, this interrelationship has not been utilized in the literature as far as the authors are aware. In this paper, bass lines are utilized as clues for improving chord recognition because they can be regarded as an element of the melody. A probabilistic framework is devised to uniformly integrate bass lines extracted by using bass pitch estimation into a hypothesis-search-based chord recognition. To prune the hypothesis space of the search, the hypothesis reliability is defined as the weighted sum of three reliabilities: the likelihood of Gaussian Mixture Models for the observed features, the joint probability of chord and bass pitch, and the chord transition N-gram probability. Experimental results show that our method recognized the chord sequences of 150 songs in twelve Beatles albums; the average frame-rate accuracy of the results was 73.4%.
Segmenting acoustic signal with articulatory movement using recurrent neural network for phoneme acquisition

Hisashi Kanda, Tetsuya Ogata, Kazunori Komatani, Hiroshi G. Okuno

2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1712 - 1717 2008年 [査読有り]

　概要を見る

This paper proposes a computational model for phoneme acquisition by infants. Human infants perceive speech sounds not as discrete phoneme sequences but as continuous acoustic signals. One of critical problems in phoneme acquisition is the design for segmenting these continuous speech sounds. The key idea to solve this problem is that articulatory mechanisms such as the vocal tract help human beings to perceive speech sound units corresponding to phonemes. That is, the ability to distinguish phonemes is learned by recognizing unstable points in the dynamics of continuous sound with articulatory movement. We have developed a vocal imitation system embodying the relationship between articulatory movements and sounds produced by the movements. To segment acoustic signal with articulatory movement, we apply the segmenting method to our system by Recurrent Neural Network with Parametric Bias (RNNPB). This method determines the multiple segmentation boundaries in a temporal sequence using the prediction error of the RNNPB model, and the PB values obtained by the method can be encoded as kind of phonemes. Our system was implemented by using a physical vocal tract model, called the Maeda model. Experimental results demonstrated that our system can self-organize the same phonemes in different continuous sounds. This suggests that our model reflects the process of phoneme acquisition. ©2008 IEEE.

DOI

Scopus

7

被引用数

(Scopus)
Target speech detection and separation for humanoid robots in sparse dialogue with noisy home environments

Hyun Don Kim, Jinsung Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1705 - 1711 2008年 [査読有り]

　概要を見る

In normal human communication, people face the speaker when listening and usually pay attention to the speaker' face. Therefore, in robot audition, the recognition of the front talker is critical for smooth interactions. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise (Max-SNR) beamformer. This VAD based on CSCC can classify speech signals that are retrieved at the frontal region of two microphones embedded on the robot. The system works in real-time without needing training filter coefficients given in advance even in a noisy environment (SNR > 0 dB). It can cope with speech noise generated from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that our system enhanced extracted target speech signals more than 12 dB (SNR) and the success rate of automatic speech recognition for Japanese words was increased about 17 points. ©2008 IEEE.

DOI

Scopus

16

被引用数

(Scopus)
Design and evaluation of two-channel-based sound source localization over entire azimuth range for moving talkers

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2197 - 2203 2008年 [査読有り]

　概要を見る

We propose a way to evaluate various sound localization systems for moving sounds under the same conditions. To construct a database for moving sounds, we developed a moving sound creation tool using the API library developed by the ARINIS Company. We developed a two-channel-based sound source localization system integrated with a cross-power spectrum phase (CSP) analysis and EM algorithm. The CSP of sound signals obtained with only two microphones is used to localize the sound source without having to use prior information such as impulse response data. The EM algorithm helps the system cope with several moving sound sources and reduce localization error. We evaluated our sound localization method using artificial moving sounds and confirmed that it can well localize moving sounds slower than 1.125 rad/sec. Finally, we solve the problem of distinguishing whether sounds are coming from the front or back by rotating a robot's head equipped with only two microphones. Our system was applied to a humanoid robot called SIG2, and we confirmed its ability to localize sounds over the entire azimuth range. ©2008 IEEE.

DOI

Scopus

7

被引用数

(Scopus)
A robot listens to music and counts its beats aloud by separating music from counting voice

Takeshi Mizumoto, Ryu Takeda, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1538 - 1543 2008年 [査読有り]

　概要を見る

This paper presents a beat-counting robot that can count musical beats aloud, i.e., speak "one, two, three, four, one, two, ..." along music, while listening to music by using its own ears. Music-understanding robots that interact with humans should be able not only to recognize music internally, but also to express their own internal states. To develop our beat-counting robot, we have tackled three issues: (1) recognition of hierarchical beat structures, (2) expression of these structures by counting beats, and (3) suppression of counting voice (self-generated sound) in sound mixtures recorded by ears. The main issue is (3) because the interference of counting voice in music causes the decrease of the beat recognition accuracy. So we designed the architecture for music-understanding robot that is capable of dealing with the issue of self-generated sounds. To solve these issues, we took the following approaches: (1) beat structure prediction based on musical knowledge on chords and drums, (2) speed control of counting voice according to music tempo via a vocoder called STRAIGHT, and (3) semi-blind separation of sound mixtures into music and counting voice via an adaptive filter based on ICA (Independent Component Analysis) that uses the waveform of the counting voice as a prior knowledge. Experimental result showed that suppressing robot's own voice improved music recognition capability. ©2008 IEEE.

DOI

Scopus

14

被引用数

(Scopus)
Barge-in-able robot audition based on ICA and missing feature theory under semi-blind situation

Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1718 - 1723 2008年 [査読有り]

　概要を見る

This paper describes a robot audition system that allows the user to barge-in; that is, the user can speak simultaneously when the robot is speaking. Our "barge-in-able" system consists of two stages: (1) cancellation of robot speech and (2) recognition of the separated user speech under the "semi-blind situation". The semi-blind situation is where a robot's speech signal is known but a user's speech signal is not. The first stage is achieved by using an adaptive filter based on time-frequency domain Independent Component Analysis, because that can separate robot speech more robustly against noise than conventional echo cancellers. To improve performance in online processing, we utilized known source normalization and the exponentially weighted stepsize method. The second stage is achieved by automatic speech recognition (ASR) based on the missing feature theory which provides robust recognition by exploiting the reliability of speech features distorted due to noise and/or separation. The semi-blind situation simplifies the estimation of such reliabilities. Experiments demonstrated that our system improved word correctness of ASR by 10.0 %. ©2008 IEEE.

DOI

Scopus

14

被引用数

(Scopus)
Active sensing based dynamical object feature extraction

Shun Nishide, Tetsuya Ogata, Ryunosuke Yokoya, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1 - 7 2008年 [査読有り]

　概要を見る

This paper presents a method to autonomously extract object features that describe their dynamics from active sensing experiences. The model is composed of a dynamics learning module and a feature extraction module. Recurrent Neural Network with Parametric Bias (RNNPB) is utilized for the dynamics learning module, learning and self-organizing the sequences of robot and object motions. A hierarchical neural network is linked to the input of RNNPB as the feature extraction module for extracting object features that describe the object motions. The two modules are simultaneously trained using image and motion sequences acquired from the robot's active sensing with objects. Experiments are performed with the robot's pushing motion with a variety of objects to generate sliding, falling over, bouncing, and rolling motions. The results have shown that the model is capable of extracting features that distinguish the characteristics of object dynamics. ©2008 IEEE.

DOI

Scopus

3

被引用数

(Scopus)
Extensibility verification of robust domain selection against out-of-grammar utterances in multi-domain spoken dialogue system

Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 487 - 490 2008年 [査読有り]

　概要を見る

We developed a robust domain selection method and verified its extensibility. An issue in domain selection is its robustness against out-of-grammar utterances. It is essential to generate correct system responses because such utterances often cause domain selection errors. We therefore integrated the topic estimation results and the dialogue history to construct a robust domain classifier. Another issue is that domain selection should be performed within an extensible framework, because the system is often modified and extended. That is, the classifier should still have high performance without reconstructing it after adding new domains. The extensibility of our method was not experimentally verified yet, because it requires a lot of effort to collect new dialogue data after extending the system. Therefore, we verified extensibility without collecting new data. We constructed the classifier by leaving out some domains in the dialogue data and then evaluated its accuracy as the classifier for the data where the left-out domains were virtually added. Copyright © 2008 ISCA.
Expanding Vocabulary for Recognizing User”s Abbreviations of Proper Nouns without Increasing ASR Error Rates in Spoken Dialogue Systems

Masaki KATSUMARU, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of International Conference on Spoken Language Processing (Interspeech-2008) 187 - 190 2008年 [査読有り]
Reinforcement Signal Propagation Algorithm for Logic Circuit.

Chyon Hae Kim, Tetsuya Ogata, Shigeki Sugano

J. Robotics Mechatronics 20 ( 5 ) 757 - 774 2008年 [査読有り]

DOI

Scopus

3

被引用数

(Scopus)
SalienceGraph: Visualizing salience dynamics of written discourse by using reference probability and PLSA

Shun Shiramatsu, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5351 LNAI 890 - 902 2008年 [査読有り]

　概要を見る

Since public involvement in the decision-making process for community development needs a lot of efforts and time, support tools for speeding up the consensus building process among stakeholders are required. This paper presents a new method for finding, tracking and visualizing participants' concerns (topics) from the record of a public debate. For finding topics, we use the salience of a term, which is computed as its reference probability based on referential coherence in the Centering Theory. Our system first annotates a debate record or minute into Global Document Annotation (GDA) format automatically, and then computes the salience of each term from the GDA-annotated text sentence by sentence. Then, by using the Probalilistic Latent Semantic Analytsis (PLSA), our system reduces the dimensions of the vector of salience values of terms into a set of major latent topics. For tracking topics, we use the salience dynamics, which is computed as the temporal change of joint attention to the major latent topics with additional user-supplied terms. The resulting graph is called SalienceGraph. For visualizing SalienceGraph, we use 3D visualizer with GUI designed by "overview first, zoom and filter, then details on demand" principle. SalienceGraph provides more accurate trajectory of topics than conventional TF•IDF. © 2008 Springer Berlin Heidelberg.

DOI

Scopus

5

被引用数

(Scopus)
Design and implementation of 3d auditory scene visualizer towards auditory awarenesswith face tracking

Yuji Kubota, Masatoshi Yoshida, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - 10th IEEE International Symposium on Multimedia, ISM 2008 468 - 476 2008年 [査読有り]

　概要を見る

If machine audition can recognize an auditory scene containing simultaneous and moving talkers, what kinds of awareness will people gain from an auditory scene visualizer? This paper presents the design and implementation of 3D Auditory Scene Visualizer based on the visual information seeking mantra, i.e., "overview first, zoom and filter, then details on demand". The machine audition system called HARK captures 3D sounds with a microphone array, localizes and separates sounds, and recognizes separated sounds by automatic speech recognition (ASR). The 3D visualizer implemented in Java 3D displays each sound stream as a beam originating from the center of the microphones (overview mode), shows temporal snapshots with/without specifying focusing areas (zoom and filter mode), and shows detailed information about a particular sound stream (details on demand). In the details-ondemand mode, ASR results are displayed in a "karaoke" manner, i.e., character-by-character. This three-mode visualization will give the user auditory awareness enhanced by HARK. In addition, a face-tracking system automatically changes the focus of attention by tracking the user's face. The resulting system is portable and can be deployed in any place, so it is expected to give more vivid awareness than expensive high-fidelity auditory scene reproduction systems. © 2008 IEEE.

DOI

Scopus

13

被引用数

(Scopus)
3D Auditory Scene Visualizer with face tracking: Design and implementation for auditory awareness compensation

Yuji Kubota, Shun Shiramatsu, Masatoshi Yoshida, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 2nd International Symposium on Universal Communication, ISUC 2008 42 - 49 2008年 [査読有り]

　概要を見る

This paper presents the design and implementation of 3D Auditory Scene Visualizer based on the visual information seeking mantra, "overview first, zoom and filter, then details on demand". The machine audition system called HARK captures 3D sounds with a microphone array. The natural language processing called SalienceGraph visualizes topic transition by using discourse salience. The 3D visualizer implemented in Java 3D displays topic transition and each sound stream as a beam originating from the microphones (overview mode), shows temporal snapshots with/without specifying focusing areas (zoom-andfilter mode), and shows detailed information about a particular sound stream (details-on-demand mode). This threemode visualization will give the user auditory awareness enhanced by HARK and SalienceGraph. In addition, a facetracking system automatically determines the user's intention by tracking the user's face. The resulting system will enable users to manage and browse auditory scene files effectively, so it should acceleration and support the information explosion to compensate the lack of auditory awareness. © 2008 IEEE.

DOI

Scopus

3

被引用数

(Scopus)
Soft missing-feature mask generation for simultaneous speech recognition system in robots

Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 1 ( 1 ) 992 - 995 2008年 [査読有り]

　概要を見る

This paper addresses automatic soft missing-feature mask (MFM) generation based on a leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight for probability calculation in a recognition process. In a previous work, a threshold-base-zero-or-one function was applied to decide if spectral parameter can be reliable or not for each frequency bin. The function is extended into a weighted sigmoid function which has two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process which static and dynamic features are input. The ratio can be implemented as a part of soft mask. The average recognition rate based on a soft MFM improved by about 5% for all directions from a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart. Copyright © 2008 ISCA.

DOI

Scopus
人工神経回路モデルによるインタラクション創発システム実現に向けて

尾形哲也

日本神経回路学会誌 = The Brain & neural networks 14 ( 4 ) 282 - 292 2007年12月

DOI CiNii
Experience-based imitation using RNNPB

Ryunosuke Yokoya, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

Advanced Robotics 21 ( 12 ) 1351 - 1367 2007年12月 [査読有り]

　概要を見る

Robot imitation is a useful and promising alternative to robot programming. Robot imitation involves two crucial issues. The first is how a robot can imitate a human whose physical structure and properties differ greatly from its own. The second is how the robot can generate various motions from finite programmable patterns (generalization). This paper describes a novel approach to robot imitation based on its own physical experiences. We considered the target task of moving an object on a table. For imitation, we focused on an active sensing process in which the robot acquires the relation between the object's motion and its own arm motion. For generalization, we applied the RNNPB (recurrent neural network with parametric bias) model to enable recognition/generation of imitation motions. The robot associates the arm motion which reproduces the observed object's motion presented by a human operator. Experimental results proved the generalization capability of our method, which enables the robot to imitate not only motion it has experienced, but also unknown motion through nonlinear combination of the experienced motions. © 2007 VSP.

DOI

Scopus

21

被引用数

(Scopus)
能動知覚経験に基づく物体挙動連想と動作生成

西出俊, 尾形哲也, 横矢龍之介, 谷淳, 駒谷和範, 奥乃博

第8回システムインテグレーション部門講演会 (SI2007), 計測自動制御学会 1C1 - 3 2007年12月
ニューラルネットによる腱駆動ロボットアームの制御

有江浩明, 尾形哲也, 谷淳, 菅野重樹

第8回システムインテグレーション部門講演会 (SI2007) ,計測自動制御学会 1B4 - 6 2007年12月
自己モデルの再利用に基づくロボットによる他者の発見と模倣

横矢龍之介, 尾形哲也, 西出俊, 谷淳, 駒谷和範, 奥乃博

第8回システムインテグレーション部門講演会 (SI2007), 計測自動制御学会 2C1 - 2 2007年12月
マルチドメインシステムにおけるトピック推定と対話履歴の統合によるドメイン選択の高精度化

池田智志, 駒谷和範, 尾形哲也, 奥乃博

音声言語シンポジウム, 信学技法 NLC2007-80 ( SP2007-143 ) 277 - 282 2007年12月

　概要を見る

本論文では，マルチドメイン音声対話システムにおいて，システム想定外発話に対しても頑健に，応答すべきドメインを決定する方法について述べる．想定外発話は言語理解誤りを引き起こし，ドメイン選択誤りの原因となる．そこで本論文では，まず，『ユーザが意図したドメイン』をトピックとして定義し，Web から大量に収集した学習文書と，Latent Semantic Mapping を用いてトピックを推定する．次に，対話履歴とトピック推定を決定木を用いて統合し，想定外発話に頑健なドメイン選択器を構成した．トピック推定結果は，想定外発話に頑健であるが文脈情報を含まない．一方で，対話履歴は，想定外発話に頑健でないが文脈情報を含むため，これら２つは相補的に働く．話者 10 名 2191 発話を用いた評価実験により，従来手法からドメイン選択誤りを 14.3％削減した．We present a method of robust domain selection against out-of-grammar (OOG) utterances in multi-domain spoken dialogue systems. We first define a topic as a domain from which the user wants to retrieve information, and estimate it as the user's intention. This topic estimation is enabled by using a large amount of sentences collected from the Web and Latent Semantic Mapping (LSM). Topic estimation results are reliable even for OOG utterances. We then integrated both topic estimation results and dialogue history to construct a robust domain classifier against OOG utterances. The experimental results using 2191 utterances showed that our integrated method reduced domain selection errors by 14.3％.

CiNii
実世界の力学構造に基づいた疑似シンボル生成と言語動作相互変換

尾形哲也, 谷淳, 駒谷和範, 奥乃博

システム・情報部門学術講演会講演論文集 (SSI2007), 計測自動制御学会 2A2-3 211 - 216 2007年11月
Advanced Robotics: Preface

Maria Chiara Carrozza, Tetsuya Ogata, Eugenio Guglielmelli

Advanced Robotics 21 ( 10 ) 1093 - 1095 2007年10月

DOI

Scopus

2

被引用数

(Scopus)
Reinforcement learning of a continuous motor sequence with hidden states

Hiroaki Arie, Tetsuya Ogata, Jun Tani, Shigeki Sugano

Advanced Robotics 21 ( 10 ) 1215 - 1229 2007年10月 [査読有り]

　概要を見る

Reinforcement learning is the scheme for unsupervised learning in which robots are expected to acquire behavior skills through self-explorations based on reward signals. There are some difficulties, however, in applying conventional reinforcement learning algorithms to motion control tasks of a robot because most algorithms are concerned with discrete state space and based on the assumption of complete observability of the state. Real-world environments often have partial observablility; therefore, robots have to estimate the unobservable hidden states. This paper proposes a method to solve these two problems by combining the reinforcement learning algorithm and a learning algorithm for a continuous time recurrent neural network (CTRNN). The CTRNN can learn spatio-temporal structures in a continuous time and space domain, and can preserve the contextual flow by a self-organizing appropriate internal memory structure. This enables the robot to deal with the hidden state problem. We carried out an experiment on the pendulum swing-up task without rotational speed information. As a result, this task is accomplished in several hundred trials using the proposed algorithm. In addition, it is shown that the information about the rotational speed of the pendulum, which is considered as a hidden state, is estimated and encoded on the activation of a context neuron. © 2007 VSP.

DOI

Scopus

13

被引用数

(Scopus)
ゲーム理論に基づく参照結束性のモデル化と日本語・英語の大規模コーパスを用いた統計的検証

白松俊, 駒谷和範, 橋田浩一, 尾形哲也, 奥乃博

自然言語処理 14 ( 5 ) 199 - 239 2007年10月 [査読有り]

CiNii
自己組織化回路素子SONEにおけるノイズの抑制

金天海, 出澤純一, 尾形哲也, 菅野重樹

日本ロボット学会誌 = Journal of Robotics Society of Japan 25 ( 6 ) 913 - 920 2007年09月

　概要を見る

In the recent years, neural networks or other learing networks are frequently used in the field of robotics. However, the needed conditions of the learning system are not fulfilled enough in autonomous robot, because the variety of the needed conditions let it difficult to accomplish. So, integration of the functions is inevitable to create an effective learning system in autonomous robot. In traditional methods, it was difficult to accomplish "autonomous exploration of the effective output", "simple external parameters", and "low calculation cost" together in a learning system. Thus, we proposed a new learning method self-organizing network elements (SONE) against this problem. All of these conditions are fulfilled by SONE, however there is a need to enhance the ability against noises. Therefore, we propose a technique to restrain noises in SONE. In our experiments, more resistance against noises was confirmed with this technique. Also in a robot simulation, the performance of the robot was improved by this novel method.

DOI CiNii
自己組織化回路素子SONEにおけるノイズの抑制

金天海, 出澤純一, 尾形哲也, 菅野重樹

日本ロボット学会誌 25 ( 6 ) 115 - 122 2007年09月 [査読有り]
独立成分分析に基づく適応フィルタのロボット聴覚への応用

武田龍, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

第25回日本ロボット学会学術講演会 1N16 2007年09月

CiNii
声道物理モデルとリカレントニューラルネットワークによる母音模倣

神田尚, 尾形哲也, 駒谷和範, 奥乃博

第25回日本ロボット学会学術講演会 1N17 2007年09月
リカレントニューラルネットワークによる複数文章とロボット動作の双方向変換

尾形哲也, 村瀬昌満, 谷淳, 駒谷和範, 奥乃博

第25回日本ロボット学会学術講演会 1C26 2007年09月
物体挙動の予測信頼性に基づく自律的な動作生成

西出俊, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第25回日本ロボット学会学術講演会 1C36 2007年09月
対話型進化的計算による強化学習器の報酬系の獲得

菅佑樹, 生熊良規, 尾形哲也, 菅野重樹

第25回日本ロボット学会学術講演会,2O18, 2007年09月
自己モデルの投影に基づくロボットによる模倣動作の自律的獲得

横矢龍之介, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第25回日本ロボット学会学術講演会 3N27 2007年09月
マルチドメイン音声対話システムにおける想定外発話への対処のためのWebを用いたシステム知識の拡張

池田智志, 駒谷和範, 尾形哲也, 奥乃博

情報技術レターズ (第6回情報科学技術フォーラム(FIT2007)講演論文集) LE - 007 2007年09月
バージインを許容するロボット音声対話のためのICAを用いたセミブラインド音源分離

武田龍, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報科学技術フォーラム FIT 2007 ( 2 ) 261 - 262 2007年08月

CiNii J-GLOBAL
LE-008 音声認識結果とコンセプトへの重みづけによるWFSTに基づく音声言語理解の高精度化(自然言語・音声・音楽)

福林雄一朗, 駒谷和範, 中野幹生, 船越孝太郎, 尾形哲也, 奥乃博

情報科学技術レターズ 6 133 - 134 2007年08月

CiNii
2007 IEEE国際会議 Robotics and Automation

尾形哲也

バイオメカニズム学会誌 = Journal of the Society of Biomechanisms 31 ( 3 ) 162 - 163 2007年08月

CiNii
楽曲推薦システムの効率性とスケーラビリティの改善のための確率的推薦モデルのインクリメンタル学習法

吉井和佳, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会研究報告 2007 ( 81(MUS-71) ) 19 - 26 2007年08月

　概要を見る

本稿では，ハイブリッド型楽曲推薦システムの効率性とスケーラビリティを改善する手法について述べる．我々は以前，確率的推薦モデルを用いて「ユーザの楽曲評価」と「楽曲の音響的特徴」とを同時に考慮するハイブリッド型推薦システムを開発し，高精度な推薦ができることを実証したが，効率性とスケーラビリティに関する問題が残されていた．すなわち，新規評価や新規ユーザ，新規楽曲が追加されるたびにモデル全体を再学習する必要があった．また，モデル学習にかかる計算コストがユーザ数と楽曲数に比例して増加するため，現実的なサイズのデータを扱うことができなかった．そこで本稿では，効率性を改善するため，モデルをわずかなコストで部分的に更新できるインクリメンタル学習法を提案する．さらに，スケーラビリティを改善するため，まず少数の代表的な仮想ユーザと仮想楽曲に対して「コアモデル」を構築し，次にインクリメンタル学習法によって実際のユーザと楽曲をコアモデルに追加登録していく手法を提案する．実験の結果，上記二つの改良を施した推薦システムでは，推薦精度が向上することが分かった．We aimed at improving the efficiency and scalability of a hybrid music recommender system. Although this system was proved to make accurate recommendations by using a probabilistic model that integrates rating scores provided by users acoustic features of musical pieces, it lacks efficiency and scalability. That is, the entire model needs to be re-trained from scratch whenever a new score, user, or piece is added. Furthermore, the system cannot deal with practical numbers of users and pieces. To improve efficiency, we propose an incremental method that partially updates the model at low computational cost. To enhance scalability, we propose a method that first constructs a small "core" model over fewer virtual representatives created from real users and pieces, and then adds the real users and pieces to the core model by using the incremental method. The experimental results revealed that the proposed system was not only efficient and scalable but also outperformed the original system in terms of accuracy.

CiNii J-GLOBAL
音色特徴量分布の利用による調波・非調波併用モデルのパラメータ推定

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会音楽情報処理研究会研究報告, 2007-SIG71-26 2007 ( 81 ) 161 - 166 2007年08月

　概要を見る

本稿では，調波・非調波統合モデルのモデルパラメータ推定における楽器の奏法と個体差に対する問題点を改善する手法について述べる．我々は，CDなどの複雑な多重奏音楽音響信号中の調波構造を持つ楽器音と持たない楽器音を同時に分離するために調波・非調波統合モデルおよびそのモデルパラメータの推定手法を設計し，調波構造モデルや非調波構造モデルをそれぞれ単独で用いるよりも統合モデルを用いることで分離性能が向上することを実験で示した．本システムで実楽曲を扱うためには，楽器の多様な奏法や個体差に対処する必要がある．そのための問題点は， MIDIは奏法を扱うことがほとんど不可能な点，単一のテンプレート音を用いると個体差に対処できない点である．そこで本稿ではこれらの問題点を解決するため，楽器音認識で用いられる音色特徴量の確率分布を用いてモデルパラメータを推定する手法を提案する．音色特徴量の確率分布は多数のデータを用いて学習されるため，特定のテンプレート音の影響を除去したパラメータ推定が可能になる．モデルから抽出した音色特徴量の確率分布に対する尤度を最大化するような制約を追加することで，モデルが表現する楽器音の特徴を多く満たすようなモデルパラメータが推定される．実験によって，音色特徴量分布の尤度を考慮することで分離`性能が向上することを確認した．This paper describes an improved parameter estimation method for an integrated weighted-mixture model con sisting of both harmonic-structure and inharmonic tone models. Although we have developed a sound source separation method by using the integrated model, this method has difficulties to deal with various performance styles and individual differences of musical instruments. To solve this problem, we propose a new parameter esti mation method by using probabilistic distributions of musical timbre features. Since the probabilistic distributions are trained by using various audio signals, dependency from particular template sounds decreases. By adding a new constraint of maximizing the likelihood of the probabilistic distributions of timbre features extracted from an estimated model, the model parameters can be estimated so that they can well express musical timbre features. The experimental results showed that the performance of separation improved.

CiNii
ドメイン拡張性を備えたトピック推定に基づく発話誘導を行うマルチドメイン音声対話システム

池田智志, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会研究会資料 SIG-SLUD-A701-10 (7/23) 50 83 - 88 2007年07月

CiNii
会話文脈に応じた関連情報提示タスクのための文脈類似度計算手法の開発

白松俊, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会研究会資料 SIG-SLUD-A701-10 (7/23) 57 - 62 2007年07月
自己組織化回路素子へのフリップフロップ素子導入による時系列学習

金天海, 出澤純一, 尾形哲也, 菅野重樹

第21回人工知能学会全国大会 21 3G6 - 1 2007年06月

CiNii
2A1-B10 ニューラルネットによる腱駆動ロボットアームの制御 : ランダムな運動からの逆モデルの学習(移動知)

有江浩明, ブルウィーラーベアート, 尾形哲也, 谷淳, 菅野重樹

ロボティクス・メカトロニクス講演会講演概要集 2007 "2A1 - B10(1)"-"2A1-B10(2)" 2007年05月

　概要を見る

In order to conduct the long time exploration-based learning experiments with a physical robot, there was necessity to build a durable robot. Therefore, we built a novel robot with a tendon-based actuation mechanism which can afford elasticity at each joint of the robot. Each of the arm joints is actuated by an antagonistic pair of two motor-spring assemblies. By controlling the tension in the spring with a classic PID controller, the joints are being torque-controlled. Because of inherent nonlinear characteristics of the developed system, standard PID control schemas cannot be applied to the system. Therefore, we employ the RNN to construct the inverse model which plays the role of controller. In this paper, we report the result of preliminary experiment using this controller.

CiNii
マルチドメイン音声対話システムにおける対話履歴を利用したドメイン選択

神田直之, 駒谷和範, 中野幹生, 中臺一博, 辻野広司, 尾形哲也, 奥乃博

情報処理学会論文誌 48 ( 5 ) 1980 - 1989 2007年05月
Meaning-Game-based Centering Model with Statistical Definition of Utility of Referential Expression and Its Verification Using Japanese and English Corpora

Shun SHIRAMATSU, Kazunori KOMATANI, Koiti HASIDA, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of the 6th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC2007) 121 - 126 2007年05月 [査読有り]
可聴音波を用いたAHによる遮蔽物の検出と距離推定法

丹羽治彦, 尾形哲也, 駒谷和範, 奥乃博

日本音響学会研究発表会講演論文集(CD-ROM) 2007 1-10-7 2007年03月

J-GLOBAL
音を視覚化する録音再生システム

吉田雅敏, 海尻聡, 山本俊一, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会全国大会講演論文集 69th ( 2 ) 2.577-2.578 2007年03月

J-GLOBAL
自己身体モデルの投影に基づく模倣行為中における他者の発見

横矢龍之介, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

情報処理学会全国大会講演論文集 69th ( 2 ) 2.445-2.446 2007年03月

J-GLOBAL
Drumix: An Audio Player with Functions of Realtime Drum-Part Rearrangement for Active Music Listening

Kazuyoshi YOSHII, Masataka GOTO, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Journal of Information Processing Society of Japan 48 ( 3 ) 1229 - 1239 2007年03月 [査読有り]
ICA と MFT に基づく音声認識における Soft Mask を用いた性能評価

武田龍, 山本俊一, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 6ZB - 6 2007年03月
音環境を可視化する録音再生システム

吉田雅敏, 海尻聡, 山本俊一, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 6ZB - 2 2007年03月

CiNii
物体静止画像から動的特徴を抽出する神経回路モデルの学習と解析

西出俊, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

情報処理学会第69回全国大会 6B - 4 2007年03月
マルチドメイン音声対話システムにおけるシステム想定外発話のトピック推定に基づく発話誘導

池田智志, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 5Q - 6 2007年03月
音声対話システムにおける発話検証を利用したシステム想定外発話の誤受理抑制

福林雄一朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 2 5Q - 5 2007年03月

CiNii
EMアルゴリズムとパーティクルフィルタの統合によるリアルタイム複数の人物追跡システム実現

金鉉燉, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 4B - 3 2007年03月
聴覚障害児の授業支援のためのHMDによる音声認識結果提示システムの設計

徳田浩一, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 3ZB - 4 2007年03月
Onomatree:擬音語と木構造を併用した環境音検索インターフェース

清水敬太, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 3N - 7 2007年03月
自己組織化回路素子(SONE)への教師あり学習の付与

金天海, 尾形哲也, 菅野重樹

情報処理学会第69回全国大会 2Q - 3 2007年03月
マルチメディアコンテンツにおける音楽と映像の調和に関する分析

西山正紘, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 2N - 6 2007年03月
楽譜情報を用いたNMFによる音楽音響信号の音源分離

糸山克寿, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第69回全国大会 2N - 1 2007年03月

CiNii
RNNPBによる自然言語列と動作列の意味的結合と人間ロボットインタラクション

村瀬昌満, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

情報処理学会第69回全国大会 1R - 4 2007年03月
自己身体モデルの投影に基づくロボットによる他者の動作予測

横矢龍之介, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

情報処理学会第69回全国大会 1R - 3 2007年03月
人間型声道モデルと神経回路モデルを利用した母音模倣

神田尚, 尾形哲也, 駒谷和範, 奥乃博

情報処理学会第69回全国大会 1Q - 2 2007年03月

CiNii
可聴域音波を用いたAcoustical Holography (AH) による遮蔽物の検出と距離計測法

丹羽治彦, 駒谷和範, 尾形哲也, 奥乃博

音響学会春季講演会 1 2007年03月
歌声の分離と音響モデルの分離歌声への適用に基づく音楽音響信号と歌詞の時間的対応付け手法

藤原弘将, 後藤真孝, 緒方淳, 駒谷和範, 尾形哲也, 奥乃博

音響学会春季講演会 3 2007年03月
コーパスからの関連語獲得に基づく連想を加味した顕現性の推定

白松俊, 駒谷和範, 尾形哲也, 奥乃博

言語処理学会第13回年次大会 2007年03月
音声認識結果とコンセプトへの重みづけによるWFSTに基づく音声言語理解の高精度化

福林雄一朗, 駒谷和範, 中野幹生, 船越孝太郎, 辻野広司, 尾形哲也, 奥乃博

第66回音声言語情報処理研究会, 2007-SLP-66 (8), 2007-NL-179 (8) 情報処理学会 2007 ( 47 ) 43 - 48 2007年03月
多重奏音楽音響信号の音源分離のための調波・非調波モデルの制約付きパラメータ推定

糸山克寿, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

音楽情報処理研究会, 2007-MUS-70 (13), 2007-EC-7 (13) 2007 ( 37 ) 81 - 88 2007年03月
マルチメディアコンテンツにおける音楽と映像の調和度計算モデル

西山正紘, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

音楽情報処理研究会, 2007-MUS-,Vol.2007, No., 情報処理学会 2007 ( 15 ) 31 - 36 2007年02月

　概要を見る

本稿では、アクセント構造およびムードの一致に基づいて、音楽と映像の調和の度合い（調和度）を計算する枠組を提案する。一般に、音楽と映像の調和要因としては、時間的なアクセントの一致による時間的調和と、ムードの一致による意味的調和の２つが存在する。従来の研究では、それぞれの要因のみしか扱っておらず、両要因を統一的に扱った事例は存在しない。そこで本稿では、音楽と映像の調和度を、アクセント構造の一致に基づいて定量化した調和度とムードの一致に基づいて定量化した調和度の重み付き線形和で表現する。アクセント構造の一致は音楽と映像それぞれの特徴量系列間の相関に基づいて、ムードの一致はそれぞれの特徴部分空間内における相互の特徴量の連想に基づいて定量化する。実映像作品を対象とし、本手法の有効性を実験により評価した。In this paper, we propose a framework that understands congruency between music and video based on similarity of accent structure and mood. There are two types of congruency between music and video: temporal congruency related to synchronization of accents and semantic congruency related to similarity of mood. Previous works, however, have dealt only with either congruency. We model the temporal congruency based on the correlation between accent feature sequences extracted from audio and visual content, and the semantic congruency based on mutual mapping between two feature spaces representing music and video respectively. Then, we integrate the two types of congruency as a weighted linear sum. Our experiments with real-world content show the effects of our method.

CiNii
音声対話システムにおけるヘルプ生成のためのシステム想定外発話の誤受理抑制

福林雄一朗, 駒谷和範, 尾形哲也, 奥乃博

第65回音声言語情報処理研究会, Vol.2007, No., 情報処理学会 2007 ( 11 ) 61 - 66 2007年02月

　概要を見る

音声対話システムにおいて、音声認識誤りによる誤作動は避けられない。特に、初心者や慣れていないユーザの発話には、システムが想定していない文法や語彙が多く含まれ、音声認識誤りを引き起こす。これへの対処として我々は、ユーザの状態推定に基づく動的なヘルプ生成を目指してきた 1)。しかしシステム想定外発話に対する認識誤りの誤受理は、ユーザの状態推定に多大な悪影響を及ぼす。さらに音声認識結果が信用できない発話からも、ユーザの状態を推定できる必要がある。この問題に対して、我々は発話検証技術 2) の導入により解決を図る。この際に、検証用モデルとして、他タスクでの統計的言語モデルが使用可能かどうかを実験的に確認した。さらに検証用モデルとの尤度差が、発話の棄却判定だけでなく、語彙外発話と文法外発話との判別にも使用可能であることを示す。これにより、内容語が認識できていない場合でもユーザの状態を更新でき、ユーザへのヘルプ生成の精度向上が見込める。In spoken dialogue systems, false acceptances (FA) caused by automatic speech recognition (ASR) errors are inevitable. Especially, when a novice user uses the systems, he/she often makes out-of-grammar or out-of-vocabulary utterances, which cause ASR errors. We have developed a method for generating dynamic helps by estimating the user's mental model 1). However, these FAs badly affect the estimation of the model. It should be also possible that the model can be estimated even when ASR results are not reliable. To solve this problem, we incorporate a method of utterance verification. We confirmed that several statistical language models generally used are available as verification models. Furthermore, the differences of scores between two recognizers are helpful not only for rejecting ASR errors but also for distinguishing between out-of-grammar and out-of-vocabulary utterances. This result shows that user's mental model can be updated even when content words are not correctly recognized, and accordingly accuracy of the generated helps will be improved.

CiNii
Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music

Kitahara Tetsuro, Goto Masataka, Komatani Kazunori, Ogata Tetsuya, Okuno Hiroshi G.

Information and Media Technologies 2 ( 1 ) 279 - 291 2007年

　概要を見る

This paper presents a new technique for recognizing musical instruments in polyphonic music. Since conventional musical instrument recognition in polyphonic music is performed notewise, i.e., for each note, accurate estimation of the onset time and fundamental frequency (F0) of each note is required. However, these estimations are generally not easy in polyphonic music, and thus estimation errors severely deteriorated the recognition performance. Without these estimations, our technique calculates the temporal trajectory of instrument existence probabilities for every possible F0. The instrument existence probability is defined as the product of a nonspecific instrument existence probabilitycalculated using the PreFEst and a conditional instrument existence probability calculated using hidden Markov models. The instrument existence probability is visualized as a spectrogram-like graphical representation called the instrogram and is applied to MPEG-7 annotation and instrumentation-similarity-based music information retrieval. Experimental results from both synthesized music and real performance recordings have shown that instrograms achieved MPEG-7 annotation (instrument identification) with a precision rate of 87.5% for synthesized music and 69.4% for real performances on average and that the instrumentation similarity measure reflected the actual instrumentation better than an MFCC-based measure.

DOI CiNii
Drumix: An Audio Player with Real-time Drum-part Rearrangement Functions for Active Music Listening

Yoshii Kazuyoshi, Goto Masataka, Komatani Kazunori, Ogata Tetsuya, Okuno Hiroshi G.

Information and Media Technologies 2 ( 2 ) 601 - 611 2007年

　概要を見る

This paper presents a highly functional audio player, called Drumix, that allows a listener to control the volume, timbre, and rhythmic patterns (drum patterns)of bass and snare drums within existing audio recordings in real time. A demand for active music listening has recently emerged. If the drum parts of popular songs could be manipulated, listeners could have new musical experiences by freely changing their impressions of the pieces (e.g., making the drum performance more energetic)instead of passively listening to them. To achieve this, Drumix provides three functions for rearranging drum parts, i.e., a volume control function that enables users to cut or boost the volume of each drum with a drum-specific volume slider, a timbre change function that allows them to replace the original timbre of each drum with another selected from a drop-down list, and a drum-pattern editing function that enables them to edit repetitive patterns of drum onsets on a graphical representation of their scores. Special musical skills are not required to use these functions. Subjective experiments revealed that Drumix could add a new dimension to the way listeners experience music.

DOI CiNii
Computational auditory scene analysis and its application to robot audition: Five years experience

Hiroshi G. Okuno, Tetsuya Ogata, Kazunori Komatani

Proceedings - Second International Conference on Informatics Research for Development of Knowledge Society Infrastructure, ICKS 2007 69 - 76 2007年 [査読有り]

　概要を見る

We have been engaged in research on computational auditory scene analysis to attain sophisticated robot/computer human interaction by manipulating real-world sound signals. The objective of our research is the understanding of an arbitrary sound mixture including non-speech sounds and music as well as voiced speech, obtained by robot's ears, that is, microphones embedded in the robot. We have coped with three main issues in computational auditory scene analysis, that is, sound source localization, separation, and recognition of separated sounds for a mixture of speech signals as well as polyphonic music signals. This paper overviews our results in robot audition, in particular, Missing Feature Theory based integration of sound source separation and automatic speech recognition, and those in music information processing, in particular, drum sound equalizer. © 2007 IEEE.

DOI

Scopus

17

被引用数

(Scopus)
Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ipsjdc 3 1 - 13 2007年 [査読有り]

　概要を見る

This paper presents a new technique for recognizing musical instruments in polyphonic music. Since conventional musical instrument recognition in polyphonic music is performed notewise, i.e., for each note, accurate estimation of the onset time and fundamental frequency (F0) of each note is required. However, these estimations are generally not easy in polyphonic music, and thus estimation errors severely deteriorated the recognition performance. Without these estimations, our technique calculates the temporal trajectory of instrument existence probabilities for every possible F0. The instrument existence probability is defined as the product of a nonspecific instrument existence probabilitycalculated using the PreFEst and a conditional instrument existence probability calculated using hidden Markov models. The instrument existence probability is visualized as a spectrogram-like graphical representation called the instrogram and is applied to MPEG-7 annotation and instrumentation-similarity-based music information retrieval. Experimental results from both synthesized music and real performance recordings have shown that instrograms achieved MPEG-7 annotation (instrument identification) with a precision rate of 87.5% for synthesized music and 69.4% for real performances on average and that the instrumentation similarity measure reflected the actual instrumentation better than an MFCC-based measure.

DOI CiNii
Improving efficiency and scalability of model-based music recommender system based on incremental training

Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 8th International Conference on Music Information Retrieval, ISMIR 2007 89 - 94 2007年 [査読有り]

　概要を見る

We aimed at improving the efficiency and scalability of a hybrid music recommender system based on a probabilistic generative model that integrates both collaborative data (rating scores provided by users) and content-based data (acoustic features of musical pieces). Although the hybrid system was proved to make accurate recommendations, it lacks efficiency and scalability. In other words, the entire model needs to be re-trained from scratch whenever a new score, user, or piece is added. Furthermore, the system cannot deal with practical numbers of users and pieces on an enterprise scale. To improve efficiency, we propose an incremental method that partially updates the model at low computational cost. To enhance scalability, we propose a method that first constructs a small "core" model over fewer virtual representatives created from real users and pieces, and then adds the real users and pieces to the core model by using the incremental method. The experimental results revealed that the proposed system was not only efficient and scalable but also outperformed the original system in terms of accuracy. ©2007 Austrian Computer Society (OCG).
Drumix: An Audio Player with Real-time Drum-part Rearrangement Functions for Active Music Listening

Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ipsjdc 3 134 - 144 2007年 [査読有り]

　概要を見る

This paper presents a highly functional audio player, called Drumix, that allows a listener to control the volume, timbre, and rhythmic patterns (drum patterns)of bass and snare drums within existing audio recordings in real time. A demand for active music listening has recently emerged. If the drum parts of popular songs could be manipulated, listeners could have new musical experiences by freely changing their impressions of the pieces (e.g., making the drum performance more energetic)instead of passively listening to them. To achieve this, Drumix provides three functions for rearranging drum parts, i.e., a volume control function that enables users to cut or boost the volume of each drum with a drum-specific volume slider, a timbre change function that allows them to replace the original timbre of each drum with another selected from a drop-down list, and a drum-pattern editing function that enables them to edit repetitive patterns of drum onsets on a graphical representation of their scores. Special musical skills are not required to use these functions. Subjective experiments revealed that Drumix could add a new dimension to the way listeners experience music.

DOI CiNii
Instrument identification in polyphonic music: Feature weighting to minimize influence of sound overlaps

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Eurasip Journal on Advances in Signal Processing 2007 2007年 [査読有り]

　概要を見る

We provide a new solution to the problem of feature variations caused by the overlapping of sounds in instrument identification in polyphonic music. When multiple instruments simultaneously play, partials (harmonic components) of their sounds overlap and interfere, which makes the acoustic features different from those of monophonic sounds. To cope with this, we weight features based on how much they are affected by overlapping. First, we quantitatively evaluate the influence of overlapping on each feature as the ratio of the within-class variance to the between-class variance in the distribution of training data obtained from polyphonic sounds. Then, we generate feature axes using a weighted mixture that minimizes the influence via linear discriminant analysis. In addition, we improve instrument identification using musical context. Experimental results showed that the recognition rates using both feature weighting and musical context were 84.1 % for duo, 77.6 % for trio, and 72.3 % for quartet; those without using either were 53.4, 49.6, and 46.5 % , respectively.

DOI

Scopus

68

被引用数

(Scopus)
音源分離との統合によるミッシングフィーチャマスク自動生成に基づく同時発話音声認識

山本俊一, 中臺一博, 中野幹生, 辻野広司, Jean-Marc Valin, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会誌 25 ( 1 ) 92 - 102 2007年01月 [査読有り]
Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music

Tetsuro KITAHARA, Masataka GOTO, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Journal of Information Processing Society of Japan 48 ( 1 ) 214 - 216 2007年01月
Integration and adaptation of harmonic and inharmonic models for separating polyphonic musical signals

Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1 57 - 60 2007年 [査読有り]

　概要を見る

This paper describes a sound source separation method for polyphonic sound mixtures o music to build an instrument equalizer for remixing multiple tracks separated from compact-disc recordings by changing the volume level of each track. Although such, mixtures usually include both harmonic and inharmonic sounds, the difficulties in dealing with both types of sounds together have not been addressed in most previous methods that have focused on either of the two types separately. We therefore developed an integrated weighted-mixture model consisting of both harmonic-structure and inharmonic-structure tone models (generative models for the power spectrogram). On the basis of the MAP estimation using the EM algorithm, we estimated all model parameters of this, integrated model under several original constraints for preventing over-training and maintaining intra-instrument consistency. Using standard MIDI files as prior information of the model parameters, We applied this model to compact-disc recordings and achieved the instrument equalizer. © 2007 IEEE.

DOI

Scopus

26

被引用数

(Scopus)
Enhancement of self organizing network elements for supervised learning

Chyon Hae Kim, Tetsuya Ogata, Shigeki Sugano

Proceedings - IEEE International Conference on Robotics and Automation 92 - 98 2007年 [査読有り]

　概要を見る

We have proposed self-organizing network elements (SONE) as a learning method for robots to meet the requirements of autonomous exploration of effective output, simple external parameters, and low calculation costs. SONE can be used as an algorithm for obtaining network topology by propagating reinforcement signals between the elements of a network. Traditionally, the analysis of fundamental features in SONE and their application to supervised learning tasks were difficult because the learning method of SONE was limited to reinforcement learning. Here the abilities of generalization, incremental learning, and temporal sequence learning were evaluated using a supervised learning method with SONE. Moreover, the proposed method enabled our SONE to be applied to a greater variety of tasks. © 2007 IEEE.

DOI

Scopus
Distance estimation of hidden objects based on acoustical holography by applying acoustic diffraction of audible sound

Haruhiko Niwa, Tetsuya Ogata, Kazunori Komatani, Okuno G. Hiroshi

Proceedings - IEEE International Conference on Robotics and Automation 423 - 428 2007年 [査読有り]

　概要を見る

Occlusion is a problem for range finders; ranging systems using cameras or lasers cannot be used to estimate distance to an object (hidden object) that is occluded by another (obstacle). We developed a method to estimate the distance to the hidden object by applying acoustic diffraction of audible sound. Our method is based on time-of-flight (TOF), which has been used in ultrasound ranging systems. We determined the best frequency of audible sound and designed its optimal modulated signal for our system. We determined that the system estimates the distance to the hidden object as well as the obstacle. However, the measurement signal obtained from the hidden object was weak. Thus, interference from sound signals reflected from other objects or walls was not negligible. Therefore, we combined acoustical holography (AH) and TOF, which enabled a partial analysis of the reflection sound intensity field around the obstacle and hidden object. Our method was effective for ranging two objects of the same size within a 1.2 m depth range. The accuracy of our method was 3 cm for the obstacle, and 6 cm for the hidden object. ©2007 IEEE.

DOI

Scopus

5

被引用数

(Scopus)
Human-robot cooperation using quasi-symbols generated by RNNPB model

Tetsuya Ogata, Shohei Matsumoto, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 2156 - 2161 2007年 [査読有り]

　概要を見る

We describe a means of human robot interaction based not on natural language but on "quasi symbols," which represent sensory-motor dynamics in the task and/or environment. It thus overcomes a key problem of using natural language for human-robot interaction - the need to understand the dynamic context The quasi-symbols used are motion primitives corresponding to the attractor dynamics of the sensory-motor flow. These primitives are extracted from the observed data using the recurrent neural network with parametric bias (RNNPB) model. Binary representations based on the model parameters were implemented as quasi symbols in a humanoid robot, Robovie. The experiment task was robot-arm operation on a table. The quasi-symbols acquired by learning enabled the robot to perform novel motions. A person was able to control the arm through speech interaction using these quasi-symbols. These quasi symbols formed a hierarchical structure corresponding to the number of nodes in the model. The meaning of some of the quasi-symbols depended on the context, indicating that they are useful for human-robot interaction. © 2007 IEEE.

DOI

Scopus

19

被引用数

(Scopus)
Predicting object dynamics from visual images through active sensing experiences

Shun Nishide, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 2501 - 2506 2007年 [査読有り]

　概要を見る

Prediction of dynamic features is an important task for determining the manipulation strategies of an object. This paper presents a technique for predicting dynamics of objects relative to the robot's motion from visual images. During the learning phase, the authors use Recurrent Neural Network with Parametric Bias (RNNPB) to self-organize the dynamics of objects manipulated by the robot into the PB space. The acquired PB values, static images of objects, and robot motor values are input into a hierarchical neural network to link the static images to dynamic features (PB values). The neural network extracts prominent features that induce each object dynamics. For prediction of the motion sequence of an unknown object, the static image of the object and robot motor value are input into the neural network to calculate the PB values. By inputting the PB values into the closed loop RNNPB, the predicted movements of the object relative to the robot motion are calculated sequentially. Experiments were conducted with the humanoid robot Robovie-IIs pushing objects at different heights. Reducted grayscale images and shoulder pitch angles were input into the neural network to predict the dynamics of target objects. The results of the experiment proved that the technique is efficient for predicting the dynamics of the objects. © 2007 IEEE.

DOI

Scopus

7

被引用数

(Scopus)
Evaluation of two simultaneous continuous speech recognition with ICA BSS and MFT-based ASR

Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4570 LNAI 384 - 394 2007年 [査読有り]

　概要を見る

An adaptation of independent component analysis (ICA) and missing feature theory (MFT)-based ASR for two simultaneous continuous speech recognition is described. We have reported on the utility of a system with isolated word recognition, but the performance of the MFT-based ASR is affected by the configuration, such as an acoustic model. The system needs to be evaluated under a more general condition. It first separates the sound sources using ICA. Then, spectral distortion in the separated sounds is estimated to generate missing feature masks (MFMs). Finally, the separated sounds are recognized by MFT-based ASR. We estimate spectral distortion in the temporal-frequency domain in terms of feature vectors, and we generate MFMs. We tested an isolated word and the continuous speech recognition with a cepstral and spectral feature. The resulting system outperformed the baseline robot audition system by 13 and 6 points respectively on the spectral features. © Springer-Verlag Berlin Heidelberg 2007.

DOI

Scopus

1

被引用数

(Scopus)
Real-time auditory and visual talker tracking through integrating EM algorithm and particle filter

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4570 LNAI 280 - 290 2007年 [査読有り]

　概要を見る

This paper presents techniques that enable a talker tracking for effective human-robot interaction. We propose new way of integrating an EM algorithm and a particle filter to select an appropriate path for tracking the talker. It can easily adapt to new kinds of information for tracking the talker with our system. This is because our system estimates the position of the desired talker through means, variances, and weights calculated from EM training regardless of the numbers or kinds of information. In addition, to enhance a robot's ability to track a talker in real-world environments, we applied the particle filter to talker tracking after executing the EM algorithm. We also integrated a variety of auditory and visual information regarding sound localization, face localization, and the detection of lip movement. Moreover, we applied a sound classification function that allows our system to distinguish between voice, music, or noise. We also developed a vision module that can locate moving objects. © Springer-Verlag Berlin Heidelberg 2007.

DOI

Scopus

4

被引用数

(Scopus)
Auditory and visual integration based localization and tracking of multiple moving sounds in daily-life environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Workshop on Robot and Human Interactive Communication 399 - 404 2007年 [査読有り]

　概要を見る

This paper presents techniques that enable talker tracking for effective human-robot interaction. To track moving people in daily-life environments, localizing multiple moving sounds is necessary so that robots can locate talkers. However, the conventional method requires an array of microphones and impulse response data. Therefore, we propose a way to integrate a cross-power spectrum phase analysis (CSP) method and an expectation-maximization (EM) algorithm. The CSP can localize sound sources using only two microphones and does not need impulse response data. Moreover, the EM algorithm increases the system's effectiveness and allows it to cope with multiple sound sources. We confirmed that the proposed method performs better than the conventional method. In addition, we added a particle filter to the tracking process to produce a reliable tracking path and the particle filter is able to integrate audio-visual information effectively. Furthermore, the applied particle filter is able to track people while dealing with various noises that are even loud sounds in the daily-life environments. ©2007 IEEE.

DOI

Scopus

2

被引用数

(Scopus)
Topic Estimation with Domain Extensibility for Guiding User”s Out-of-Grammar Utterance in Multi-Domain Spoken Dialogue Systems

Satoshi IKEDA, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of International Conference on Spoken Language Processing (Interspeech-2007) 3 2057 - 2060 2007年 [査読有り]
Introducing utterance verification in spoken dialogue system to improve dynamic help generation for novice users

Kazunori Komatani, Yuichiro Fukubayashi, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue 202 - 205 2007年 [査読有り]

　概要を見る

A method is presented that helps novice users understand the language expressions that a system can accept, even from unacceptable utterances made that may contain automatic speech recognition errors. We have developed a method that dynamically generates help messages, which can avoid further unacceptable utterances from being made, by estimating a users' knowledge from their utterances. To improve the accuracy of the estimation, we developed a method to estimate a user's knowledge from utterance verification results. This method estimates whether a user knows an utterance pattern that the system considers acceptable, and suppresses useless help messages from being generated. © 2007 Association for Computational Linguistics.
Exploiting known sound source signals to improve ICA-based robot audition in speech separation and recognition

Ryu Takeda, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1757 - 1762 2007年 [査読有り]

　概要を見る

This paper describes a new semi-blind source separation (semi-BSS) technique with independent component analysis (ICA) for enhancing a target source of interest and for suppressing other known interference sources. The semi-BSS technique is necessary for double-talk free robot audition systems in order to utilize known sound source signals such as self speech, music, or TV-sound, through a line-in or ubiquitous network. Unlike the conventional semi-BSS with ICA, we use the time-frequency domain convolution model to describe the reflection of the sound and a new mixing process of sounds for ICA. In other words, we consider that reflected sounds during some delay time are different from the original. ICA then separates the reflections as other interference sources. The model enables us to eliminate the frame size limitations of the frequency-domain ICA, and ICA can separate the known sources under a highly reverberative environment. Experimental results show that our method outperformed the conventional semi-BSS using ICA under simulated normal and highly reverberative environments. ©2007 IEEE.

DOI

Scopus

18

被引用数

(Scopus)
Two-way translation of compound sentences and arm motions by recurrent neural networks

Tetsuya Ogata, Masamitsu Murase, Jim Tani, Kazunori Komatani, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1858 - 1863 2007年 [査読有り]

　概要を見る

We present a connectionist model that combines motions and language based on the behavioral experiences of a real robot. Two models of recurrent neural network with parametric bias (RNNPB) were trained using motion sequences and linguistic sequences. These sequences were combined using their respective parameters so that the robot could handle many-to-many relationships between motion sequences and linguistic sequences. Motion sequences were articulated into some primitives corresponding to given linguistic sequences using the prediction error of the RNNPB model. The experimental task in which a humanoid robot moved its arm on a table demonstrated that the robot could generate a motion sequence corresponding to given linguistic sequence even if the motions or sequences were not included in the training data, and vice versa. ©2007 IEEE.

DOI

Scopus

48

被引用数

(Scopus)
Vocal imitation using physical vocal tract model

Hisashi Kanda, Tetsuya Ogata, Kazunori Komatani, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1846 - 1851 2007年 [査読有り]

　概要を見る

A vocal imitation system was developed using a computational model that supports the motor theory of speech perception. A critical problem in vocal imitation is how to generate speech sounds produced by adults, whose vocal tracts have physical properties (i.e., articulatory motions) differing from those of infants' vocal tracts. To solve this problem, a model based on the motor theory of speech perception, was constructed. This model suggests that infants simulate the speech generation by estimating their own articulatory motions in order to interpret the speech sounds of adults. Applying this model enables the vocal imitation system to estimate articulatory motions for unexperienced speech sounds that have not actually been generated by the system. The system was implemented by using Recurrent Neural Network with Parametric Bias (RNNPB) and a physical vocal tract model, called the Maeda model. Experimental results demonstrated that the system was sufficiently robust with respect to individual differences in speech sounds and could imitate unexperienced vowel sounds. ©2007 IEEE.

DOI

Scopus

7

被引用数

(Scopus)
A biped robot that keeps steps in time with musical beats while listening to music with its own ears

Kazuyoshi Yoshii, Kazuhiro Nakadai, Toyotaka Torii, Yuji Hasegawa, Hiroshi Tsujino, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1743 - 1750 2007年 [査読有り]

　概要を見る

We aim at enabling a biped robot to interact with humans through real-world music in daily-life environments, e.g., to autonomously keep its steps (stamps) in time with musical beats. To achieve this, the robot should be able to robustly predict the beat times in real time while listening to musical performance with its own ears (head-embedded microphones). However, this has not previously been addressed in most studies on music-synchronized robots due to the difficulty in predicting the beat times in real-world music. To solve this problem, we implemented a beat-tracking method developed in the field of music information processing. The predicted beat times are then used by a feedback-control method that adjusts the robot's step intervals to synchronize its steps in time with the beats. The experimental results show that the robot can adjust its steps in time with the beat times as the tempo changes. The resulting robot needed about 25 [s] to recognize the tempo change after it and then synchronize its steps. ©2007 IEEE.

DOI

Scopus

38

被引用数

(Scopus)
Discovery of other individuals by projecting a self-model through imitation

Ryunosuke Yokoya, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1009 - 1014 2007年 [査読有り]

　概要を見る

This paper proposes a novel model which enables a humanoid robot infant to discover other individual (e.g. human parent). In this work, the authors define "other individual" as an actor which can be predicted by a self-model. For modeling the developmental process of discovering ability, the following three approaches are employed. (i) Projection of a self-model for predicting other individual's actions. (ii) Mediation by a physical object between self and other individual. (iii) Introduction of infant imitation by parent. For creating the self-model of a robot, we apply Recurrent Neural Network with Parametric Bias (RNNPB) model which can learn the robot's body dynamics. For the other-model of a human, conventional hierarchical neural networks are attached to the RNNPB model as "conversion modules". Our target task is a moving an object. For evaluation of our model, human discovery experiments by the robot projecting its self-model were conducted. The results demonstrated that our method enabled the robot to predict the human's motions, and to estimate the human's position fairly accurately, which proved its adequacy. ©2007 IEEE.

DOI

Scopus

10

被引用数

(Scopus)
Auditory and visual integration based localization and tracking of humans in daily-life environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 2021 - 2027 2007年 [査読有り]

　概要を見る

The purpose of this research is to develop techniques that enable robots to choose and track a desired person for interaction in daily-life environments. Therefore, localizing multiple moving sounds and human faces is necessary so that robots can locate a desired person. For sound source localization, we used a cross-power spectrum phase analysis (CSP) method and showed that CSP can localize sound sources only using two microphones and does not need impulse response data. An expectation-maximization (EM) algorithm was shown to enable a robot to cope with multiple moving sound sources. For face localization, we developed a method that can reliably detect several faces using the skin color classification obtained by using the EM algorithm. To deal with a change in color state according to illumination condition and various skin colors, the robot can obtain new skin color features of faces detected by OpenCV, an open vision library, for detecting human faces. Finally, we developed a probability based method to integrate auditory and visual information and to produce a reliable tracking path in real time. Furthermore, the developed system chose and tracked people while dealing with various background noises that are considered loud, even in the daily-life environments. ©2007 IEEE.

DOI

Scopus

16

被引用数

(Scopus)
Design and implementation of a robot audition system for automatic speech recognition of simultaneous speech

Shun'ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007, Proceedings 111 - 116 2007年 [査読有り]

　概要を見る

This paper addresses robot audition that can cope with speech that has a low signal-to-noise ratio (SNR) in real time by using robot-embedded microphones. To cope with such a noise, we exploited two key ideas; Preprocessing consisting of sound source localization and separation with a microphone array, and system integration based on missing feature theory (MFT). Preprocessing improves the SNR of a target sound signal using geometric source separation with multichannel post-filter. MFT uses only reliable acoustic features in speech recognition and masks unreliable parts caused by errors in preprocessing. MFT thus provides smooth integration between preprocessing and automatic speech recognition. A real-time robot audition system based on these two key ideas is constructed for Honda ASIMO and Humanoid SIG2 with 8-ch microphone arrays. The paper also reports the improvement of ASR performance by using two and three simultaneous speech signals. © 2007 IEEE.

DOI
多重奏を対象とした音源同定：混合音テンプレートを用いた音の重なりに頑健な特徴量への重みづけおよび音楽的文脈の利用

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

電子情報通信学会論文誌 J89-D ( 12 ) 2721 - 2733 2006年12月 [査読有り]

　概要を見る

本論文では,多重奏に対する音源同定において不可避な課題である「音の重なりによる特徴変動」について新たな解決法を提案する.多重奏では複数の楽器が同時に発音するため,各々の周波数成分が重なって干渉し,音響的特徴が変動する.本研究では,混合音から抽出した学習データに対して,各特徴量のクラス内分散・クラス間分散比を求めることで,周波数成分の重なりの影響の大きさを定量的に評価する.そして,線形判別分析を用いることで,これを最小化するように特徴量を重み付けした新たな特徴量軸を生成する.これにより,周波数成分の重なりの影響をできるだけ小さくした特徴空間が得られる.更に,音楽的文脈を利用することで音源同定の更なる高精度化を図る.実楽器音データベースから作成した二重奏〜四重奏の音響信号を用いた実験により,二重奏では50.9%から84.1%へ,三重奏では46.1%から77.6%へ,四重奏では43.1%から72.3%へ認識率の改善を得,本手法の有効性を確認した.

CiNii
隠れ状態を有する連続な状態空間での強化学習法の提案

鈴木貴晴, 有江浩明, 尾形哲也, 谷淳, 菅野重樹

第7回システムインテグレーション部門講演会 (SI2006), 2C2-7, 計測自動制御学会 2006年12月
自己組織化回路素子SONEへの教師あり学習機能の付与

金天海, 尾形哲也, 菅野重樹

第7回システムインテグレーション部門講演会 (SI2006), 2C2-6, 計測自動制御学会 2006年12月
自己組織化回路素子SONEにおけるフリップフロップ素子の導入によるシーケンスの分節化と統合

出澤純一, 金天海, 尾形哲也, 菅野重樹

第7回システムインテグレーション部門講演会 (SI2006), 2C2-5, 計測自動制御学会 2006年12月
IECを用いた適応的なインタラクションシステムの実現

小林大三, 遠藤ちひろ, 松本猛, 菅佑樹, 尾形哲也, 菅野重樹

第7回システムインテグレーション部門講演会 (SI2006), 1L3-5, 計測自動制御学会 2006年12月
能動知覚経験に基づく物体静止画像からの挙動推定

西出俊, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第7回システムインテグレーション部門講演会 (SI2006), 1B2-4, 計測自動制御学会 2006年12月
視聴覚情報統合及びEMアルゴリズムを用いた人物追跡システム実現

Hyun-Don Kim, 駒谷和範, 尾形哲也, 奥乃博

第24回 AI チャレンジ研究会, 人工知能学会 SIG-Challenge-0624-8 51 - 58 2006年11月
ICAによる音源分離とMFTに基づく音声認識の同時発話認識による評価,

武田龍, 山本俊一, 駒谷和範, 尾形哲也, 奥乃博

第24回 AI チャレンジ研究会, 人工知能学会 SIG-Challenge-0624-2 9 - 16 2006年11月

CiNii
RNNPBによる視聴覚情報変換を利用したロボットの身体・音声表現

尾形哲也, 小嶋秀樹, 駒谷和範, 奥乃博

言語理解とコミュニケーション研究会他, TL2006-22 NLC2006-18 PRMU2006-99 pp.27-32(TL), pp.27-32(NLC), pp.45-50(PRMU), 電子情報通信学会 106 ( 298 ) 27 - 32 2006年10月

　概要を見る

本論文は,神経回路モデルを利用したマルチモーダル変換手法を提案し,これを用いた映像からの音響信号生成と音響信号からのロボットジェスチャ生成の結果について報告する.マルチモーダル変換実現に際し,実環境の視聴覚信号全て構造的に保有するデータベースを作ることは不可能であるため,情報の汎化能力が不可欠となる.そこで我々は高い汎化能力を有するRNNPBと呼ばれる神経回路モデルを採用した.本手法を小型ロボットKeeponに導入し,実験を行った.箱状物体を用いて摩擦音・衝突音を生じる,4種類のイベントをKeeponに学習させた結果,既知イベントの映像・音響信号の変換だけでなく,未知イベントの映像・音響信号変換も実現した.

CiNii
可聴音波を用いたAHによる遮蔽物の検出と距離推定法

丹羽治彦, 尾形哲也, 駒谷和範, 奥乃博

電子情報通信学会技術研究報告 106 ( 267(EA2006 48-54) ) 1 - 6 2006年09月

　概要を見る

本研究では、可聴域の音波の回折現象を利用してオクルージョン問題における遮蔽された物体の検出とその距離を計測する手法について報告する。まず可聴域音波を使用した場合の距離計測の性能について調査した。計測手法は基本的に超音波距離計測で用いられるTOF法に基づいた相互相関法を用いた。次に2物体を同軸かつ正面に配置した場合の同手法による同時距離推定を検証した。その問題点として背後の物体からの反射波信号のSNRが小さいというものがあった。そこで背後の物体からの反射波を強調して獲得するために、3次元音場解析法であるAH法を適用した。その結果背後物体からの反射信号のSNRの増加を可能にした。本手法は奥行き1.2mの範囲において同サイズの2物体に対して有効であり、その距離推定精度は正面の物体には3cm程度、背後の物体には6cm程度であった。

CiNii J-GLOBAL
Instrogramを用いた楽器構成に基づく類似楽曲検索

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

日本音響学会研究発表会講演論文集(CD-ROM) 2006 2-7-13 2006年09月

J-GLOBAL
Generation of Robot Motions from Environmental Sounds using Inter-modality Mapping by RNNPB

Tetsuya OGATA, Yuya HATTORI, Hideki KOZIMA, Kazunori KOMATANI, Hiroshi G. OKUNO

Proc. of International Workshop on Epigenetic Robotics 95 - 102 2006年09月 [査読有り]

担当区分：筆頭著者

CiNii
パラメータ最適化による実環境同時発話認識向上とそのオンライン処理の実装

山本俊一, 中臺一博, 中野幹生, 辻野広司, JEAN-MARC VALIN, 駒谷和範, 尾形哲也, 奥乃博

第24回日本ロボット学会学術講演 2006年09月
ICAとミッシングフィーチャマスク自動生成によるロボット聴覚

武田龍, 山本俊一, 駒谷和範, 尾形哲也, 奥乃博

第24回日本ロボット学会学術講演会 2006年09月
CTRNNを用いた強化学習法による連続な行動出力の獲得

有江浩明, 尾形哲也, 谷淳, 菅野重樹

第24回日本ロボット学会学術講演会 2006年09月
リカレントニューラルネットワークによるロボットの異種感覚モダリティ変換

尾形哲也, 服部佑哉, 小嶋秀樹, 駒谷和範, 奥乃博

第24回日本ロボット学会学術講演会 2006年09月
ユーザの変化する主観に対応するコミュニケーションロボットの行動獲得

松本猛, 遠藤ちひろ, 小林大三, 菅佑樹, 尾形哲也, 菅野重樹

第24回日本ロボット学会学術講演会 2006年09月
人間ロボット協調のためのRNNPBによる疑似シンボルの獲得とその階層性の解析

村瀬昌満, 松本祥平, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第24回日本ロボット学会学術講演会 2006年09月
ロボットの身体経験に基づくRNNPBを用いた模倣動作の自律的獲得

横矢龍之介, 尾形哲也, 谷淳, 駒谷和範, 奥乃博

第24回日本ロボット学会学術講演会 2006年09月
自己組織化論理回路における対ノイズ性能の向上

金天海, 出澤純一, 尾形哲也, 菅野重樹

第24回日本ロボット学会学術講演会 2006年09月
人間とプログラムのハイブリッド評価を用いた対話型進化的計算コミュニケーションロボットの行動獲得への応用

菅佑樹, 尾形哲也, 菅野重樹

第24回日本ロボット学会学術講演会 2006年09月
可聴音波を用いたAH法による遮断された物体の検出と距離計測

丹羽治彦, 尾形哲也, 駒谷和範, 奥乃博

日本音響学会応用音響研究会 2006年09月
lnstrogram：発音時刻検出とＦＯ推定の不要な楽器音認識手法

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会研究報告音楽情報科学（MUS） 2006 ( 90 ) 69 - 76 2006年08月

　概要を見る

本稿では，Instrogramと呼ばれる楽器存在確率の時間・周波数表現に基づく新たな楽器音認識手法を提案する．従来の多くの楽器音認識では単音を処理単位としていたため，各単音の発音時刻や基本周波数(ＦＯ）を正確に推定する必要があった．しかし，混合音におけるそれらの推定は難しく，推定誤りによって楽器音認識精度は大きく低下していた．本手法では，これらの推定をせずに，あらゆるＦＯに関して楽器存在確率の時系列を求め，Instrogramと呼ばれるスペクトログラムのような視覚表現として可視化する．実験の結果，Instrogramが実際の楽器構成を表していることを確認した．さらに，Instrogram間の類似度計算に基づいた類似楽曲検索も実現した．This paper describes a new musical instrument recognition method based on the time-frequency representation of instrumentation called an instrogram. Because the conventional instrument recognition is performed for each note, accurate estimation of the onset time and fundamental frequency (FO) is required. These estimation is, however, difficult in polyphonic music, and thus their errors deteriorated the recognition performance severely. Without these estimation , our method calculates the temporal trajectory of instrument existence probabilities for every possible FO and visualizes them as a spectrogram-like graphical representation，called an instrogram. Experimental results show that instrograms represent actual instrumentation. We have also achieved music information retrieval by calculating the similarity between instrograms．

CiNii
音楽音響信号と歌詞の時間的対応付け手法:歌声の分離と母音のViterbiアラインメント

藤原弘将, 後藤真孝, 緒方淳, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会研究報告 2006 ( 90(MUS-66) ) 37 - 44 2006年08月

　概要を見る

本稿では，伴奏音を含む音楽音響信号と対応する歌詞の時間的な対応付け手法について述べる．クリーンな音声信号とその発話内容の時間的対応付けを推定をするViterbi アラインメント手法はこれまでも存在したが，歌声と同時に演奏される伴奏音の悪影響で市販 CD 中の歌声には適用できなかった．本稿では，この問題を解決するため，歌声の調波構造を抽出・再合成することで混合音中の歌声を分離する手法，歌声・非歌声状態を行き来する隠れマルコフモデル (HMM)を用いた歌声区間検出手法，音響モデルを分離歌声に適応させることで Viterbi アラインメントを適用する手法を提案する．日本語のポピュラー音楽を用いた評価実験を行い，本手法により10曲中8曲について十分な精度で音楽と歌詞の対応付けが出来ることを確かめた．This paper describes a method that can automatically synchronize between polyphonic musical audio signals and corresponding lyrics. Although there were methods that can synchronize between monophonic speech signals and corresponding text transcriptions by using Viterbi alignment techniques, they cannot be applied to vocals in CD recordings because accompaniment sounds often overlap with vocals. To align lyrics with such vocals, we therefore developed three methods: a method for segregating vocals from polyphonic sound mixtures by extracting and resynthesizing the vocal melody, a method for detecting vocal sections using a Hidden Markov Model (HMM) that transitions back and forth between vocal and non-vocal state, and a method for adapting a speech-recognizer phone model to segregated vocal signals. Experimental results for 10 Japanese popular-music songs showed that our system can synchronize between music and lyrics with satisfactory accuracy for 8 songs.

CiNii J-GLOBAL
第68回情報処理学会全国大会

尾形哲也

バイオメカニズム学会誌 = Journal of the Society of Biomechanisms 30 ( 3 ) 169 - 169 2006年08月

CiNii
ユーザの評価と音響的特徴との確率的統合に基づくハイブリッド型楽曲推薦システム

吉井和佳, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

第66回音楽情報処理研究会, 2006-MUS, 情報処理学会 2006年08月

CiNii
Instrogram: 発音時刻検出とF0推定の不要な楽器音認識手法

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

第66回音楽情報処理研究会, 2006-MUS, 情報処理学会 2006 69 - 75 2006年08月

CiNii
伴奏音抑制と高信頼度フレーム選択に基づく楽曲中の歌声の歌手名同定手法

藤原弘将, 北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

第66回音楽情報処理研究会, 2006-MUS, 情報処理学会 2006年08月
音声対話システムにおける発話パターンを教示するヘルプの動的生成

福林雄一郎, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会研究会資料SIG-SLUD-A601-03 人工知能学会 13 - 18 2006年07月

CiNii
伴奏音抑制と高信頼度フレーム選択に基づく楽曲の歌手名同定手法

藤原弘将, 北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 47 ( 6 ) 1831 - 1843 2006年06月 [査読有り]

　概要を見る

本論文では，実世界の音楽音響信号に対する歌手名の同定手法について述べる．歌手名の同定を行う際に大きな問題となるのは，混在する伴奏音の影響である．本論文ではこの問題を解決するため，伴奏音抑制と高信頼度フレーム選択の手法を提案する．前者では，優勢なメロディの調波構造を抽出し再合成することで，伴奏音の影響を低減させることができる．後者は，歌声と非歌声を表す2 種類の混合正規分布を用いて，それぞれのフレームが歌声として信頼できるか否かを判定するものである．実験の結果，本手法によって，10 歌手40 曲に対して95%の識別率を達成し，本手法を用いない場合と比較して誤り率を約89%削減した．また，20 歌手256 曲に対する実験の結果，約93%の識別率を達成し，誤り率を約65%削減した．This paper describes a method for automatic singer identification from polyphonic musical audio signals. The main problem in automatically identifying singers is negative influence caused by accompaniment sounds. To solve this problem, we developed two methods: accompaniment sound reduction and reliable frame selection. The former makes it possible to reduce accompaniment sounds by extracting and resynthesizing harmonic components of the predominant melody. The latter judges whether each frame of the obtained melody is reliable or not by using two Gaussian mixture models for vocal and non-vocal frames. Experimental results with 256 songs by 20 singers showed that our method reduced 65% of classification errors, and achieved an accuracy of 93%.

CiNii
データベース検索における対話文脈を利用した音声言語理解

神田直之, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会論文誌 47 ( 6 ) 1802 - 1811 2006年06月 [査読有り]
Robot Imitation from Active-Sensing Experiences

Ryunosuke YOKOYA, Tetsuya OGATA, Jun TANI, Kazunori KOMATANI, Hiroshi G. OKUNO

International Conference on Development and Learning (ICDL 2006) 2006年06月 [査読有り]
遺伝的アルゴリズムを用いたパラメータ最適化による話者位置に基づく同時発話認識の向上

山本俊一, 中臺一博, 中野幹生, 辻野広司, VALIN Jean-Marc, 武田龍, 駒谷和範, 尾形哲也, 奥乃博

ヒューマンインタフェース学会論文誌 8 ( 2 ) 203 - 212 2006年05月 [査読有り]

CiNii
ドラムパートのリアルタイム編集機能付きオーディオプレイヤー

吉井和佳, 後藤真考, 駒谷和範, 尾形哲也, 奥乃博

インタラクション2006, インタラクティブセッション, 情報処理学会 2006年05月

CiNii
IECを用いたコミュニケーションロボットにおける人間の主観変化への適応

遠藤ちひろ, 小林大三, 松本猛, 菅佑樹, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2006 "2P1 - D18(1)"-"2P1-D18(2)" 2006年05月

　概要を見る

As the learning function of a communication robot, we are using Interactive Evolutionary Computation (IEC). The IEC is an evolutionary computation where the fitness function is performed by a user, so it is available for the learning of the user's subjective preference. In this paper, we show the adaptability of the IEC to the interaction between various users and a robot. We installed the IEC function into a real robot named WAMOEBA-3. In the experiment, various interactions were shown because we did not tell subjects anything about the robot. However, the robot's motion changed to the behavior the users liked and the fitness values are gradually increased. Therefore, the adaptability of the IEC is confirmed.

CiNii
人間とのコミュニケーションを通したロボットの内分泌系モデルと情動表出の段階的進化

菅佑樹, 高田広太郎, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2006年05月
自己組織化ネットワーク素子群における対ノイズ性能向上

出澤純一, 金天海, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2006 "2A1 - E03(1)"-"2A1-E03(4)" 2006年05月

　概要を見る

The network learning method named Self-Organizing Network Elements (SONE) was proposed and has been investigated by us. Autonomous exploration of effective output, simple external parameters, and low calculation costs are achieved as functions of a robot system with this method. However, there is the necessity of improving performance against noises for learning more complicated tasks. Therefore, we propose the technique for adjusting the thresholds in Self-Organizing Logic Circuit in this paper. In our experiments, 3Bit operation tasks and controlling Khepera robot, the generation of new elements was appropriately controlled with this technique, and also the network performance against noises is improved.

CiNii
CTRNNを用いた連続な状態空間における強化学習法の提案

有江浩明, 尾形哲也, 谷淳, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2006 "2A1 - E07(1)"-"2A1-E07(2)" 2006年05月

　概要を見る

There are some difficulties to apply traditional reinforcement learning algorithms to robot motion control task. Because most algorithms are concerned with discrete state space and based on the assumption of complete observability of the state. This paper deals with these problems by combining the reinforcement learning algorithm and CTRNN learning algorithm (BPTT). We carried out an experiment on pendulum swing up task without rotational speed information. It is shown that the information about rotational speed of pendulum, which is considered as a hidden state, is estimated and encoded on the activation of a context unit.

CiNii
ローカルルールに基づいた理論回路の自己組織アルゴリズム

金天海, 尾形哲也, 菅野重樹

計測自動制御学会論文誌 42 ( 4 ) 334 - 341 2006年04月 [査読有り]

　概要を見る

We developed a learning system for autonomous robots that allows for autonomous exploration of the effective output, and has simple external parameters and a low calculation cost. We propose the concept of self-organizing network elements (SONE) for creating learning systems with these characteristics. We created and evaluated a self-organizing logic circuit by using this concept. Our results indicated this learning system had the characteristics.

DOI CiNii
多重奏中特定パートの自動採譜における複数特徴量の自動重み付け

糸山克寿, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
表題音楽アノテーションのための階層的物語タグの設計

西山正紘, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
擬音語表現を利用した環境音のためのXMLタグの設計と自動付与

田口明裕, 北原鉄朗, 石原一志, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
GAによる話者位置への同時発話認識システムの最適化

山本俊一, 中臺一博, 中野幹生, 辻野広司, Jean-Marc VALIN, 武田龍, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
音源定位及び唇の動き検出による複数ユーザ環境における発話者認識

Hyun-Don KIM, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
強化学習による人間位置に基づいたロボットの挙動選択

田崎豪, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
複数ドメイン音声対話システムにおける対話履歴を利用したドメイン選択の高精度化

神田直之, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
音声対話システムにおけるユーザの誤り原因の推定に基づく動的ヘルプ生成

福林雄一朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
ICAによる音源分離とミッシングフィーチャーマスクによる同時発話認識

武田龍, 山本俊一, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第68回全国大会 2006年03月
RNNPBを用いたモダリティ間マッピングによるロボットの動作生成

服部佑哉, 駒谷和範, 尾形哲也, 小嶋秀樹, 奥乃博

情報処理学会第68回全国大会 2006年03月
RNNPBを用いて獲得した疑似シンボルによる人間とロボットの協調の実現

松本祥平, 駒谷和範, 尾形哲也, 谷淳, 奥乃博

情報処理学会第68回全国大会 2006年03月
アクティブセンシングを用いたロボットによる模倣動作の自律的獲得

横矢龍之介, 駒谷和範, 尾形哲也, 谷淳, 奥乃博

情報処理学会第68回全国大会 2006年03月

CiNii
調波構造抽出と高信頼度フレーム選択を用いた雑音下での話者識別

藤原弘将, 北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

日本音響学会春期研究発表会 2006年03月
Instrogram: 楽器存在確率に基づく音楽視覚表現法

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

日本音響学会春期研究発表会 2006年03月
ゲーム理論に基づく中心化理論の再定式化と日英コーパスを用いた統計的検証

白松俊, 駒谷和範, 尾形哲也, 橋田浩一, 奥乃博

言語処理学会第12回年次大会 2006年03月
2P1-D19 人間-ロボット間コミュニケーションのための自律系と情動表出の段階的進化

菅佑樹, 高田広太郎, 尾形哲也, 菅野重樹

ロボティクス・メカトロニクス講演会講演概要集 2006 "2P1 - D19(1)"-"2P1-D19(2)" 2006年

　概要を見る

Since quantitative evaluation of the human-robot interaction is quite difficult, we are using the Interactive Evolutionary Computation (IEC) to realize the adaptive interaction robot. However, if the IEC is applied to a complicated robot, the search space becomes very vast. Therefore, in this paper, we propose an incremental evolution technique where the evolution is carried out in two steps. First, the robot is evaluated automatically on the basis of the principle of Self-Preservation. Next, the robot is evaluated by the users. As the results of the experiment, the advantage of our proposed method is confirmed.

CiNii
Acquisition of Motion Primitives of Robot in Human-Navigation Task

Ogata Tetsuya, Sugano Shigeki, Tani Jun

Information and Media Technologies 1 ( 1 ) 305 - 313 2006年

　概要を見る

A novel approach to human-robot collaboration based on quasi-symbolic expressions is proposed. The target task is navigation in which a person with his or her eyes covered and a humanoid robot collaborate in a context-dependent manner. The robot uses a recurrent neural net with parametric bias (RNNPB) model to acquire the behavioral primitives, which are sensory-motor units, composing the whole task. The robot expresses the PB dynamics as primitives using symbolic sounds, and the person influences these dynamics through tactile sensors attached to the robot. Experiments with six participants demonstrated that the level of influence the person has on the PB dynamics is strongly related to task performance, the person's subjective impressions, and the prediction error of the RNNPB model (task stability). Simulation experiments demonstrated that the subjective impressions of the correspondence between the utterance sounds (the PB values) and the motions were well reproduced by the rehearsal of the RNNPB model.

DOI CiNii
An error correction framework based on drum pattern periodicity for improving drum sound detection

Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 5095 - 5098 2006年 [査読有り]

　概要を見る

This paper presents a framework for correcting errors of automatic drum sound detection focusing on the periodicity of drum patterns. We define drum patterns as periodic structures found in onset sequences of bass and snare drum sounds. Our framework extracts periodic drum patterns from imperfect onset sequences of detected drum sounds (bottom-up processing) and corrects errors using the periodicity of the drum patterns (top-down processing). We implemented this framework on our drum-sound detection system. We first obtained onset sequences of the drum sounds with our system and extracted drum patterns. On the basis of our observation that the same drum patterns tend to be repeated, we detected time points which deviate from the periodicity as error candidates. Finally, we verified each error candidate to judge whether it is an actual onset or not. Experiments of drum sound detection for polyphonic audio signals of popular CD recordings showed that our correction framework improved the average detection accuracy from 77.4% to 80.7%.
Instrogram: A new musical instrument recognition technique without using onset detection nor F0 estimation

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 5 229 - 232 2006年 [査読有り]

　概要を見る

This paper describes a new technique for recognizing musical instruments in polyphonic music. Because the conventional framework for musical instrument recognition in polyphonic music had to estimate the onset time and fundamental frequency (F0) of each note, instrument recognition strictly suffered from errors of onset detection and F0 estimation. Unlike such a note-based processing framework, our technique calculates the temporal trajectory of instrument existence probabilities for every possible F0, and the results are visualized with a spectrogram-like graphical representation called instrogram. The instrument existence probability is defined as the product of a nonspecific instrument existence probability calculated using PreFEst and a conditional instrument existence probability calculated using the hidden Markov model. Experimental results show that the obtained instrograms reflect the actual instrumentations and facilitate instrument recognition. © 2006 IEEE.

DOI
F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and viterbi search

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 5 253 - 256 2006年 [査読有り]

　概要を見る

This paper describes a method for estimating F0s of vocal from polyphonic audio signals. Because melody is sung by a singer in many musical pieces, the estimation of F0s of the vocal part is useful for many applications. Based on existing multiple-F0 estimation method, we evaluate the vocal probabilities of the harmonic structure of each F0 candidate. In order to calculate the vocal probabilities of the harmonic structure, we extract and resynthesize the harmonic structure by using a sinusoidal model and extract feature vectors. Then, we evaluate the vocal probability by using vocal and non-vocal Gaussian mixture models (GMMs). Finally, we track F0 trajectories using these probabilities based on Viterbi search. Experimental results show that our method improves estimation accuracy from 78.1% to 84.3%, which is 28.3% reduction of misestimation. © 2006 IEEE.

DOI
Leak energy based missing feature mask generation for ICA and GSS and its evaluation with simultaneous speech recognition.

Shun'ichi Yamamoto, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean-Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proc. of ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2006) 42 - 47 2006年
Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears

Ryu Takeda, Shun'ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 878 - 885 2006年 [査読有り]

　概要を見る

Robot audition is a critical technology in making robots symbiosis with people. Since we hear a mixture of sounds in our daily lives, sound source localization and separation, and recognition of separated sounds are three essential capabilities. Sound source localization has been recently studied well for robots, while the other capabilities still need extensive studies. This paper reports the robot audition system with a pair of omni-directional microphones embedded in a humanoid to recognize two simultaneous talkers. It first separates sound sources by Independent Component Analysis (ICA) with single-input multiple-output (SIMO) model. Then, spectral distortion for separated sounds is estimated to identify reliable and unreliable components of the spectrogram. This estimation generates the missing feature masks as spectrographic masks. These masks are then used to avoid influences caused by spectral distortion in automatic speech recognition based on missing-feature method. The novel ideas of our system reside in estimates of spectral distortion of temporal-frequency domain in terms of feature vectors. In addition, we point out that the voice-activity detection (VAD) is effective to overcome the weak point of ICA against the changing number of talkers. The resulting system outperformed the baseline robot audition system by 15 %. © 2006 IEEE.

DOI

Scopus

11

被引用数

(Scopus)
Dynamic perception after visually guided grasping by a human-like autonomous robot

Mototaka Suzuki, Kuniaki Noda, Yuki Suga, Tetsuya Ogata, Shigeki Sugano

Advanced Robotics 20 ( 2 ) 233 - 254 2006年 [査読有り]

　概要を見る

We will explore dynamic perception following the visually guided grasping of several objects by a human-like autonomous robot. This competency serves for object categorization. Physical interaction with the hand-held object gives the neural network of the robot the rich, coherent and multi-modal sensory input. Multi-layered self-organizing maps are designed and examined in static and dynamic conditions. The results of the tests in the former condition show its capability of robust categorization against noise. The network also shows better performance than a single-layered map does. In the latter condition we focus on shaking behavior by moving only the forearm of the robot. In some combinations of grasping style and shaking radius the network is capable of categorizing two objects robustly. The results show that the network capability to achieve the task largely depends on how to grasp and how to move the objects. These results together with a preliminary simulation are promising toward the self-organization of a high degree of autonomous dynamic object categorization. © 2006 Taylor & Francis Group, LLC.

DOI

Scopus
可聴域音波の広指向性と回折を利用した同時複数物体定位と遮蔽物検出

丹羽治彦, 尾形哲也, 駒谷和範, 奥乃博

電子情報通信学会超音波研究会 105 ( 553 ) 55 - 60 2006年01月

　概要を見る

本論では広指向性をもつ可聴域の音波を使用し、その反射音を計測することにより広範囲にある複数物体に対し1計測で位置推定を行う手法について報告する。複数物体においてオクルージョンが発生する場合も音波の回折現象を利用することにより、遮蔽された物体の位置検出をも可能であることも検証している。本手法の基本は、可聴域音波の反射音場を音響伝播理論であるAcoustical Holography(AH)を用いて予測し、音強度分布より物体の位置を推定するものである。さらに実測に耐える手法も構築し、単一物体、複数物体、複数(オクルージョン)物体についての定位実験をシミュレーション・実環境それぞれで行い、その有効性を示した。

CiNii
Multi-domain spoken dialogue system with extensibility and robustness against speech recognition errors

Kazunori Komatani, Naoyuki Kanda, Mikio Nakano, Kazuhiro Nakadai, Hiroshi Tsujino, Tetsuya Ogata, Hiroshi G. Okuno

COLING/ACL 2006 - SIGdial06: 7th SIGdial Workshop on Discourse and Dialogue, Proceedings of the Workshop 9 - 17 2006年 [査読有り]

　概要を見る

We developed a multi-domain spoken dialogue system that can handle user requests across multiple domains. Such systems need to satisfy two requirements: extensibility and robustness against speech recognition errors. Extensibility is required to allow for the modification and addition of domains independent of other domains. Robustness against speech recognition errors is required because such errors are inevitable in speech recognition. However, the systems should still behave appropriately, even when their inputs are erroneous. Our system was constructed on an extensible architecture and is equipped with a robust and extensible domain selection method. Domain selection was based on three choices: (I) the previous domain, (II) the domain in which the speech recognition result can be accepted with the highest recognition score, and (III) other domains. With the third choice we newly introduced, our system can prevent dialogues from continuously being stuck in an erroneous domain. Our experimental results, obtained with 10 subjects, showed that our method reduced the domain selection errors by 18.3%, compared to a conventional method. © 2006 Association for Computational Linguistics.

DOI

Scopus

34

被引用数

(Scopus)
Instrogram: A new musical instrument recognition technique without using onset detection nor F0 estimation

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 5087 - 5090 2006年 [査読有り]

　概要を見る

This paper describes a new technique for recognizing musical instruments in polyphonic music. Because the conventional framework for musical instrument recognition in polyphonic music had to estimate the onset time and fundamental frequency (FO) of each note, instrument recognition strictly suffered from errors of onset detection and FO estimation. Unlike such a note-based processing framework, our technique calculates the temporal trajectory of instrument existence probabilities for every possible FO, and the results are visualized with a spectrogram-like graphical representation called instrogram. The instrument existence probability is defined as the product of a nonspecific instrument existence probability calculated using PreFEst and a conditional instrument existence probability calculated using the hidden Markov model. Experimental results show that the obtained instrograms reflect the actual instrumentations and facilitate instrument recognition.
F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 5111 - 5114 2006年 [査読有り]

　概要を見る

This paper describes a method for estimating FOs of vocal from polyphonic audio signals. Because melody is sung by a singer in many musical pieces, the estimation of FOs of the vocal part is useful for many applications. Based on existing multiple-170 estimation method, we evaluate the vocal probabilities of the harmonic structure of each FO candidate. In order to calculate the vocal probabilities of the harmonic structure, we extract and resynthesize the harmonic structure by using a sinusoidal model and extract feature vectors. Then, we evaluate the vocal probability by using vocal and non-vocal Gaussian mixture models (GMMs). Finally, we track FO trajectories using these probabilities based on Viterbi search. Experimental results show that our method improves estimation accuracy from 78.1% to 84.3%, which is 28.3% reduction of misestimation.
An error correction framework based on drum pattern periodicity for improving drum sound detection

Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 5 237 - 240 2006年 [査読有り]

　概要を見る

This paper presents a framework for correcting errors of automatic drum sound detection focusing on the periodicity of drum patterns. We define drum patterns as periodic structures found in onset sequences of bass and snare drum sounds. Our framework extracts periodic drum patterns from imperfect onset sequences of detected drum sounds (bottom-up processing) and corrects errors using the periodicity of the drum patterns (top-down processing). We implemented this framework on our drum-sound detection system. We first obtained onset sequences of the drum sounds with our system and extracted drum patterns. On the basis of our observation that the same drum patterns tend to be repeated, we detected time points which deviate from the periodicity as error candidates. Finally, we verified each error candidate to judge whether it is an actual onset or not. Experiments of drum sound detection for polyphonic audio signals of popular CD recordings showed that our correction framework improved the average detection accuracy from 77.4% to 80.7%. © 2006 IEEE.

DOI
Genetic Algorithm based Improvement of Robot”s Hearing Capabilities in Separating and Recognizing Simultaneous Speech Signals

Shun'ichi YAMAMOTO, Kazuhiro NAKADAI, Miko NAKANO, Hiroshi TSUJINO, Jean-Marc Varin, Ryu TAKEDA, Tetsuya OGATA, Kazunori KOMATANI, Hiroshi G. OKUNO

Nineteenth International Conference on Industrial, Engineering and Other Applications of Applied Intelligence Systems (IEA/AIE-2006) 4031 LNAI 207 - 217 2006年 [査読有り]

DOI

Scopus

8

被引用数

(Scopus)
Recognition of simultaneous speech by estimating reliability of separated signals for robot audition

Shun'ichi Yamamoto, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4099 LNAI 484 - 494 2006年 [査読有り]

　概要を見る

"Listening to several things at once" is a people's dream and one goal of AI and robot audition, because people can listen to at most two things at once according to psychophysical observations. Current noise reduction techniques cannot help to achieve this goal because they assume quasi-stationary noises, not interfering speech signals. Since robots are used in various environments, robot audition systems require minimum a priori information about their acoustic environments and speakers. We evaluate a missing feature theory approach that interfaces between sound source separation (SSS) and automatic speech recognition. The essential part is the estimate of reliability of each feature of separated sounds. We tested two kinds of robot audition systems that use SSS: independent component analysis (ICA) with two microphones, and geometric source separation (GSS) with eight microphones. For each SSS, automatic missing feature mask generation is developed. The recognition accuracy of two simultaneous speech improved to an average of 67.8 and 88.0% for ICA and GSS, respectively. © Springer-Verlag Berlin Heidelberg 2006.

DOI

Scopus

3

被引用数

(Scopus)
Improvement against noises in self-organizing logic circuit

Chyon Hae Kim, Jyun Ichi Idesawa, Shigeki Sugano, Tetsuya Ogata

Proceedings of IEEE ICIA 2006 - 2006 IEEE International Conference on Information Acquisition 53 - 58 2006年 [査読有り]

　概要を見る

We proposed and evaluated a network learning method called self-organizing network elements (SONE). Autonomous exploration of effective output, the use of simple external parameters, and low calculation costs were functions achieved for a robot system with this method. However, there is the need to improve performance against noises for learning more complicated tasks. Therefore, we propose a technique to adjust thresholds in a self-organizing logic circuit based on SONE. In our experiments, performing 3-bit operation tasks, controlling a Khepera robot, and generating new elements was appropriately controlled with this technique. Also, network performance against noises was improved as a result of using the proposed method. ©2006 IEEE.

DOI

Scopus
Reinforcement learning algorithm with CTRNN in continuous action space

Hiroaki Arie, Jun Namikawa, Tetsuya Ogata, Jun Tani, Shigeki Sugano

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 4232 LNCS 387 - 396 2006年 [査読有り]

　概要を見る

There are some difficulties in applying traditional reinforcement learning algorithms to motion control tasks of robot. Because most algorithms are concerned with discrete actions and based on the assumption of complete observability of the state. This paper deals with these two problems by combining the reinforcement learning algorithm and CTRNN learning algorithm. We carried out an experiment on the pendulum swing-up task without rotational speed information. It is shown that the information about the rotational speed, which is considered as a hidden state, is estimated and encoded on the activation of a context neuron. As a result, this task is accomplished in several hundred trials using the proposed algorithm. © Springer-Verlag Berlin Heidelberg 2006.

DOI

Scopus

2

被引用数

(Scopus)
Human-Robot Interaction System Using Interactive EC

Y. Suga, C. Endo, T. Ogata, S. Sugano

IEEE/RSJ International conference on Intelligent Robots and Systems 3663 - 3668 2006年 [査読有り]

DOI

Scopus

2

被引用数

(Scopus)
Efficient organization of network topology based on reinforcement signals

Chyon Hae Kim, Shigeki Sugano, Tetsuya Ogata

IEEE International Conference on Intelligent Robots and Systems 3154 - 3159 2006年 [査読有り]

　概要を見る

We developed a learning system for autonomous robots that allows for autonomous exploration of the effective output, and has simple external parameters and a low calculation cost. We propose the concept of self-organizing network elements (SONE) for creating learning systems with these characteristics. We created and evaluated a self-organizing logic circuit by using this concept. Our results indicated this learning system had the characteristics. © 2006 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Real-time robot audition system that recognizes simultaneous speech in the real world

Shun'ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 5333 - 5338 2006年 [査読有り]

　概要を見る

This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported Missing Feature Theory (MFT) based integration of Sound Source Separation (SSS) and Automatic Speech Recognition (ASR) for building robust robot audition. We demonstrated that a MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems; being offline, hand-tuning of system parameters, and failure in Voice Activity Detection (VAD). To attain online processing, we introduced FlowDesigner-based architecture to integrate sound source localization (SSL), SSS and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed Genetic Algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated new VAD based on a power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to low signal-to-noise ratio of simultaneous speech. We, then, constructed a robot audition system for Honda ASIMO. As a result, we showed that the system worked online and fast, and had a better performance in robustness and accuracy through experiments on recognition of simultaneous speech in a noisy and echoic environment. © 2006 IEEE.

DOI

Scopus

55

被引用数

(Scopus)
Multiple acoustical holography method for localization of objects in broad range using audible sound

Haruhiko Niwa, Tetsuya Ogata, Kazunori Komatani, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 1145 - 1150 2006年 [査読有り]

　概要を見る

This paper describes a new acoustic localization method using audible sound, which can be applied over a broader range of search directions. In the field of Robotics, most conventional indoor localization systems based on sonar range finders use ultrasound to obtain a highly accurate distance. Because ultrasound has high directivity, many measurements are required to localize objects in a large space. To achieve localization with one-time measurement, we use audible sounds. We then calculate an intensity field of the reflection sound to estimate object positions. Although acoustical holography (AH) is a well-known technique to do this, it has problems in that it generates false images. We propose multiple AH (MAH) to solve this problem. The method is used to divide a measurement plane into sub-planes and to apply AH to each sub-plane. By integrating the results of applying AH to the sub-planes, false images can be suppressed because the positions of the false images differ depending on the position of the sub-plane. In addition, we use multiple frequencies to advance an accuracy of the localization based on MAH in a real environment. We constructed a localization system with only one speaker and a microphone array. In both simulation and actual experiments, we confirmed that MAH was effective method for the suppression of false image and could be used within the range of an angle view of 120 deg. ©2006 IEEE.

DOI

Scopus

1

被引用数

(Scopus)
Experience based imitation using RNNPB

Ryunosuke Yokoya, Tetsuya Ogata, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

IEEE International Conference on Intelligent Robots and Systems 3669 - 3674 2006年 [査読有り]

　概要を見る

Robot imitation is a useful and promising alternative to robot programming. Robot imitation involves two crucial issues. The first is how a robot can imitate a human whose physical structure and properties differ greatly from its own. The second is how the robot can generate various motions from finite programmable patterns (generalization). This paper describes a novel approach to robot imitation based on its own physical experiences. Let us consider a target task of moving an object on a table. For imitation, we focused on an active sensing process in which the robot acquires the relation between the object's motion and its own arm motion. For generalization, we applied a recurrent neural network with parametric bias (RNNPB) model to enable recognition/generation of imitation motions. The robot associates the arm motion which reproduces the observed object's motion presented by a human operator. Experimental results demonstrated that our method enabled the robot to imitate not only motion it has experienced but also unknown motion, which proved its capability for generalization. © 2006 IEEE.

DOI

Scopus

13

被引用数

(Scopus)
Dynamic Help Generation by Estimating User”s Mental Model in Spoken Dialogue Systems

Yuichiro FUKUBAYASHI, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proc. of International Conference on Spoken Language Processing (Interspeech-2006) 4 1946 - 1949 2006年 [査読有り]
Improving speech recognition of two simultaneous speech signals by integrating ICA BSS and automatic missing feature mask generation

Rya Takeda, Shu N.Ichi Yamamoto, Kazunori Komatoni, Tetsuya Ogata, Hiroshi G. Okuno

INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP 5 2302 - 2305 2006年 [査読有り]

　概要を見る

Robot audition systems require capabilities for sound source separation and the recognition of separated sounds, since we hear a mixture of sounds in our daily lives, especially mixed of speech. We report a robot audition system with a pair of omni-directional microphones embedded in a humanoid that recognizes two simultaneous talkers. It first separates the sound sources by Independent Component Analysis (ICA) with the single-input multiple-output (SIMO) model. Then, spectral distortion in the separated sounds is then estimated to generate missing feature masks. Finally, the separated sounds are recognized by missing-feature theory (MFT) for Automatic Speech Recognition (ASR). The novel aspects of our system involve estimates of spectral distortion in the temporal-frequency domain in terms of feature vectors and based on estimates error in SIMO-ICA signals. The resulting system outperformed the baseline robot audition system by 7 %.
Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP 3 1459 - 1462 2006年 [査読有り]

　概要を見る

We present methods for automatic speaker identification in noisy environments. To improve noise robustness of speaker identification, we developed two methods, the harmonic structure extraction method and the reliable frame weighting method. The harmonic structure extraction method enables the speaker of input speech signals to be identified after environmental noise has been reduced. This method first extracts harmonic components of the speech from the sound mixtures and then resynthesizes a clean speech signal by using a sinusoidal model driven by harmonic components. The reliable frame weighting method then determines how each frame of the resynthesized speech is reliable (i.e. little influenced by environmental noises) by using two Gaussian mixture models for the speech and noise. The speaker can be robustly identified by attaching importance to reliable frames. Experimental results with thirty speakers showed that our method was able to reduce the influences of environmental noise and achieved an error rate of 10.7%, while the error rate for a conventional method was 18.9%.
Automatic feature weighting in automatic transcription of specified part in polyphonic music

Katsutoshi Itoyama, Tetsuro Kitahara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISMIR 2006 - 7th International Conference on Music Information Retrieval 172 - 175 2006年 [査読有り]

　概要を見る

We studied the problem of automatic music transcription (AMT) for polyphonic music. AMT is an important task for music information retrieval because AMT results enable retrieving musical pieces, high-level annotation, demixing, etc. We attempted to transcribe a part played by an instrument specified by users (specified part tracking). Only two timbre models are required in the specified part tracking to identify the specified musical instrument even when the number of instruments increases. This transcription is formulated into a time-series classification problem with multiple features. We furthermore attempted to automatically estimate weights of the features, because the importance of these features varies for each musical signal. We estimated quasi-optimal weights of the features using a genetic algorithm for each musical signal. We tested our AMT system using trio stereo musical signals. Accuracies with our feature weighting method were 69.8% on average, whereas those without feature weighting were 66.0%. © 2006 University of Victoria.
Hybrid Collaborative and Content-based Music Recommendation Using Probabilistic Model with Latent User Preferences

Kazuyoshi YOSHII, Masataka GOTO, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proceedings of 8th International Conference on Musical Information Retreival (ISMIR-2007) 296 - 301 2006年 [査読有り]
Evolutionary approach for designing the behavior generator of communication robot

Yuki Suga, Tetsuya Ogata, Shigeki Sugano

2006 SICE-ICASE International Joint Conference 2104 - 2109 2006年 [査読有り]

　概要を見る

Our goal is to create the robot system which interacts with human users keeping their interest during a long period. We focus on the Interactive Evolutionary Computation (IEC) technique to achieve this goal. Although the IEC enables users to design various systems which reflect their subjective preferences, it forces users to evaluate a huge number of individuals in the genetic pool during the evolution period. To solve this problem, we propose a refined IEC technique, named Human-Machine Hybrid Evaluation (HMHE), which selects the representative genes for user evaluation and estimates the evaluation results of the other genes. It can increase the population size without increasing the users' evaluation processes. We carried out some simulations where a humanoid robot with our method interacted with a user. The experimental results demonstrated that the HMHE could continue to generate the various robot behaviors by adapting to the transition of user's subjective preferences. © 2006 ICASE.

DOI

Scopus
Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals

Hiromasa Fujihara, Masataka Goto, Jun Ogata, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISM 2006 - 8th IEEE International Symposium on Multimedia 257 - 264 2006年 [査読有り]

　概要を見る

This paper describes a system that can automatically synchronize between polyphonic musical audio signals and corresponding lyrics. Although there were methods that can synchronize between monophonie speech signals and corresponding text transcriptions by using Viterbi alignment techniques, they cannot be applied to vocals in CD recordings because accompaniment sounds often overlap with vocals. To align lyrics with such vocals, we therefore developed three methods: a method for segregating vocals from polyphonic sound mixtures, a method for detecting vocal sections, and a method for adapting a speech-recognizer phone model to segregated vocal signals. Experimental results for 10 Japanese popular-music songs showed that our system can synchronize between music and lyrics with satisfactory accuracy for 8 songs. © 2006 IEEE.

DOI

Scopus

60

被引用数

(Scopus)
Musical instrument recognizer "instrogram" and its application to music retrieval based on instrumentation similarity

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISM 2006 - 8th IEEE International Symposium on Multimedia 265 - 272 2006年 [査読有り]

　概要を見る

Instrumentation is an important cue in retrieving musical content. Conventional methods for instrument recognition performing notewise require accurate estimation of the onset time and fundamental frequency (F0) for each note, which is not easy in polyphonic music. This paper presents a non-notewise method for instrument recognition in polyphonic musical audio signals. Instead of such note-wise estimation, our method calculates the temporal trajectory of instrument existence probabilities for every F0 and visualizes it as a spectrogram-like graphical representation, called an instrogram. This method can avoid the influence by errors of onset detection and F0 estimation because it does not use them. We also present methods for MPEG-7-based instrument annotation and music information retrieval based on the similarity between instrograms. Experimental results with realistic music show the average accuracy of 76.2% for the instrument annotation and that the instrogram-based similarity measure represents the actual instrumentation similarity better than an MFCC-based one. ©2006 IEEE.

DOI

Scopus

18

被引用数

(Scopus)
ロボットによる周囲状況把握のための雑音下での環境音認識

海尻聡, 石原一志, 北原鉄朗, Valin Jean-Marc, 駒谷和範, 尾形哲也, 奥乃博

第6回システムインテグレーション部門講演会 (SI2005), 計測自動制御学会 2005年12月
音源分離と音源認識を統合した同時発話認識システムの同時発話文による評価

山本俊一, 中臺一博, 中野幹生, 辻野広司, Valin Jean-Marc, 駒谷和範, 尾形哲也, 奥乃博

第6回システムインテグレーション部門講演会 (SI2005), 計測自動制御学会 2005年12月
自己組織化論理回路における学習アルゴリズムに関する解析

金天海, 出澤純一, 尾形哲也, 菅野重樹

第6回システムインテグレーション部門講演会 (SI2005), 計測自動制御学会 2005年12月
超音波センサアレイを用いたアクティブセンシングによる3次元物体の位置形状認識

丹羽治彦, 駒谷和範, 尾形哲也, 奥乃博

音響学会関西支部第8回若手研究者交流研究発表会 2005年12月
多重奏音源同定における音の重なりに対する頑健性の改善

北原鉄朗, 後藤真考, 駒谷和範, 尾形哲也, 奥乃博

音響学会関西支部第8回若手研究者交流研究発表会 2005年12月
非線形振動子による引き込みを利用した仮想空間における歩行

小鷹研理, 尾形哲也, 奥乃博

ヒューマンインタフェース学会論文誌 7 ( 4 ) 443 - 451 2005年11月
ヒューマノイドを対象としたミッシングフィーチャー理論による分離音の連続音声認識の向上

山本俊一, VALIN Jean‐Marc, 中台一博, 中野幹生, 辻野広司, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会学術講演会予稿集(CD-ROM) 23rd 3C35 2005年09月

J-GLOBAL
モダリティ間マッピングを用いた環境音を表現するロボットジェスチャー生成

服部佑哉, 小嶋秀樹, 駒谷和範, 尾形哲也, 奥乃博

第23回日本ロボット学会学術講演会 2005年09月
人間との対話のためのロボットによる疑似シンボルの獲得とリアルタイム表情認識

松本祥平, 大谷拓, 駒谷和範, 尾形哲也, 奥乃博

第23回日本ロボット学会学術講演会 2005年09月
身体の動力学的特性に基づいた運動プリミティブの獲得-2自由度リンク系のランダムな運動の分節化の試み

有江浩明, 尾形哲也, 菅野重樹

第23回日本ロボット学会学術講演会 2005年09月
ロボットによる複数人同時インタラクションのための人間新密度空間マッピング

田崎豪, 駒谷和範, 尾形哲也, 奥乃博

第23回日本ロボット学会学術講演会 2005年09月
RNNPBによるマルチモーダルダイナミクス表現に基づいたヒューマノイドの物体能動知覚

尾形哲也, 大庭隼人, 谷淳, 駒谷和範, 奥乃博

第23回日本ロボット学会学術講演会 2005年09月
複数のカルマンフィルタを用いた複数移動話者追跡と精度評価

村瀬昌満, 山本俊一, Valin Jean-Marc, 中臺一博, 山田健太郎, 駒谷和範, 尾形哲也, 奥乃博

第23回日本ロボット学会学術講演会 2005年09月
実ロボットを用いた対話型進化的計算によるコミュニケーション学習

生熊良規, 長尾大輔, 菅佑樹, 尾形哲也, 菅野重樹

第23回日本ロボット学会学術講演会 2005年09月
ミッシングフィーチャー理論に基づく音源分離と音声認識インタフェースの評価

山本俊一, Valin Jean-Marc, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

第23回日本ロボット学会学術講演会 2005年09月
混合音からの特徴量テンプレート作成と音楽的文脈利用による多重奏の音源同定

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

音響学会秋期研究発表会 3-10-15 2005年09月
ドラムパターン推定によるドラム音認識誤り補正手法

吉井和佳, 後藤真考, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会, 音楽情報科学研究報告 2005-MUS-61-16, Vol. 2005 ( No. 82 ) 2005年08月
Generating Confirmation to Distinguish Phonologically Confusing Word Pairs in Spoken Dialogue Systems

Kazunori KOMATANI, Ryoji HAMABE, Tetsuya OGATA, Hiroshi G. OKUNO

Proc. of 4th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems 40 - 45 2005年07月

CiNii
Robot Gesture Generation from Environmental Sounds Using Inter-modality Mapping

Yuya HATTORI, Hideki KOZIMA, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proc. International Workshop on Epigenetic Robotics 139 - 140 2005年05月

CiNii
音声対話システムにおける音韻的類似表現の混同を防ぐための確認の自動生成

浜辺良二, 駒谷和範, 尾形哲也, 奥乃博

第56回音声言語情報処理研究会 2005-SLP-Vol. 2005 2005年05月

CiNii
ロボツトとの協調作業のためのRNNによる擬似シンボルの獲得

大谷拓, 大庭隼人, 駒谷和範, 尾形哲也, 谷淳, 奥乃博

情報処理学会全国大会講演論文集 67th ( 2 ) 229 - 230 2005年03月

J-GLOBAL
環境音の擬音語変換のための環境音用音素の設計

石原一志, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会論文誌 20 ( 3 ) 229 - 236 2005年03月
音声対話システムにおける音韻的類似度の混同を防ぐための確認の生成

浜辺良二, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
データベース検索タスクの文脈的制約を用いた音声対話システムの実験的評価

神田直之, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
RNNによるロボットの物体操作のための物体ダイナミクスの抽出

大庭隼人, 駒谷和範, 尾形哲也, 谷淳, 奥乃博

情報処理学会第67回全国大会 2005年03月
ロボットとの協調作業のためのRNNによる疑似シンボルの獲得

大谷拓, 駒谷和範, 尾形哲也, 谷淳, 奥乃博

情報処理学会第67回全国大会 2005年03月
音声情報と顔画像情報の統合によるパラ言語レベルの感情認識

松本祥平, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
ロボットによる人間新密度の空間マッピングとインタラクションへの適用

田崎豪, 松本祥平, 大庭隼人, 村瀬昌満, 大谷拓, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
方向と距離の変化に対してロバストな唇認識

山口健, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
マイクロフォンアレイを用いて分離した環境音による状況認識

海尻聡, 石原一志, Jean-Marc Valin, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
カルマンフィルタによる音声の時系列特徴を用いた複数移動話者の追跡

村瀬昌満, 山本俊一, Jean-Marc Valin, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
マイクロフォンアレイによる分離音認識のためのミッシングフィーチャーマスク自動生成

山本俊一, Jean-Marc Valin, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
ドメイン間マッピングによる環境音に呼応したロボット動作生成

服部佑哉, 小嶋秀樹, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
擬音語自動認識に基づいた環境音検索システム

石原一志, 服部佑哉, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
簡易的インタフェースを用いた仮想空間移動のための歩行感覚の補償

小鷹研理, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
歌声の調波構造抽出を用いた歌手名同定

藤原弘将, 北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
調波構造の抑制によるドラムス発音時刻検出の頑健化

吉井和佳, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
多重奏の音源同定のための混合音からのテンプレート作成法

北原鉄朗, 後藤真孝, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第67回全国大会 2005年03月
超音波センサアレイを用いたアクティブセンシングによる3次元物体の位置形状認識

丹羽治彦, 駒谷和範, 尾形哲也, 奥乃博

音響学会春期講演会 2005年03月
音楽音響信号による室内音場共振周波数のブラインンド推定

吉岡拓也, 引地孝文, 三好正人, 駒谷和範, 尾形哲也, 奥乃博

音響学会春期講演会 2005年03月
データベース検索タスクの文脈的制約を用いた音声対話システムの実験的評価

神田直之, 駒谷和範, 尾形哲也, 奥乃博

音声言語情報処理研究会(情報研報) 2005 ( 12 ) 107 - 112 2005年02月

　概要を見る

データベース検索タスクにおける音声対話システムにおいて，音声認識誤りの棄却や意味曖昧性の解消のために文脈的制約を取り入れる手法について述べる．まず，データベース検索タスクの対話は「検索条件の指定」から「情報の提示要求」へと移行するとモデル化する．さらに，検索条件をその入力の順序に従って木構造状に管理する．言語理解部ではこれらのモデルから得られる特徴を決定木学習により文脈的制約として取り入れる．提案手法をレストランデータベース検索システムとして実装し，20名の被験者による評価実験を行った．実験の結果，提案手法に基づく文脈的特徴を加えることで，13.4\%の意味理解誤り削減が認められた．We describe how to introduce contextual information in spoken dialogue systems for database retrieval task. In this paper, we model dialogues in the database retrieval task as consisting of two modes: ``specifying retrieval conditions'' and ``requesting detailed information about specific entries''. Furthermore, we manage retrieval conditions as a tree structure. Based on those models, we introduce decision tree learning using features reflecting the situations in the task as well as those derived from current utterances. By using the output of the decision tree, the system can appropriately select words from a speech recognition result even when it contains some errors. The experimental result showed that our method could identify users' intentions 13.4% better than that without the contexual information.

CiNii
ミッシングフィーチャ理論を適用した同時発話認識システムの同時発話文による評価

山本俊一, VALIN Jean‐Marc, 中台一博, 中野幹生, 辻野広司, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会AIチャレンジ研究会 22nd 101 - 106 2005年

J-GLOBAL
Sound-imitation word recognition for environmental sounds disambiguation in determining phonemes of sound-imitation words

Kazushi Ishihara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Transactions of the Japanese Society for Artificial Intelligence 20 ( 3 ) 229 - 236 2005年 [査読有り]

　概要を見る

Environmental sounds are very helpful in understanding environmental situations and in telling the approach of danger, and sound-imitation words (sound-related onomatopoeia) are important expressions to inform such sounds in human communication, especially in Japanese language. In this paper, we design a method to recognize sound-imitation words (SIWs) for environmental sounds. Critical issues in recognizing SIW are how to divide an environmental sound into recognition units and how to resolve representation ambiguity of the sounds. To solve these problems, we designed three-stage procedure that transforms environmental sounds into sound-imitation words, and phoneme group expressions that can represent ambiguous sounds. The three-stage procedure is as follows: (1) a whole waveform is divided into some chunks, (2) the chunks are transformed into sound-imitation syllables by phoneme recognition, (3) a sound-imitation word is constructed from sound-imitation syllables according to the requirements of the Japanese language. Ambiguity problem is that an environmental sound is often recognized differently by different listeners even under the same situation. Phoneme group expressions are new phonemes for environmental sounds, and they can express multiple sound-imitation words by one word. We designed two sets of phoneme groups: "a set of basic phoneme group" and "a set of articulation-based phoneme group" to absorb the ambiguity. Based on subjective experiments, the set of basic phoneme groups proved more appropriate to represent environmental sounds than the articulation-based one or a set of normal Japaneses phonemes.

DOI

Scopus

2

被引用数

(Scopus)
Open-end human-robot interaction from the dynamical systems perspective: Mutual adaptation and incremental learning

Tetsuya Ogata, Shigeki Sugano, Jun Tani

Advanced Robotics 19 ( 6 ) 651 - 670 2005年 [査読有り]

担当区分：筆頭著者

　概要を見る

In this paper, we experimentally investigated the open-end interaction generated by the mutual adaptation between humans and robot. Its essential characteristic, incremental learning, is examined using the dynamical systems approach. Our research concentrated on the navigation system of a specially developed humanoid robot called Robovie and seven human subjects whose eyes were covered, making them dependent on the robot for directions. We used the usual feed-forward neural network (FFNN) without recursive connections and the recurrent neural network (RNN) for the robot control. Although the performances obtained with both the RNN and the FFNN improved in the early stages of learning, as the subject changed the operation by learning on its own, all performances gradually became unstable and failed. Next, we used a 'consolidation-learning algorithm' as a model of the hippocampus in the brain. In this method, the RNN was trained by both new data and the rehearsal outputs of the RNN not to damage the contents of current memory. The proposed method enabled the robot to improve performance even when learning continued for a long time (openend). The dynamical systems analysis of RNNs supports these differences and also showed that the collaboration scheme was developed dynamically along with succeeding phase transitions. © VSP and Robotics Society of Japan 2005.

DOI

Scopus

23

被引用数

(Scopus)
Enhanced robot speech recognition based on microphone array source separation and missing feature theory

Shun'ichi Yamamoto, Jean Marc Valin, Kazuhiro Nakadai, Jean Rouat, François Michaud, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Conference on Robotics and Automation 2005 1477 - 1482 2005年 [査読有り]

　概要を見る

A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. While the first two are frequently addressed, the last one has not been studied so much. We present a system that gives a humanoid robot the ability to localize, separate and recognize simultaneous sound sources. A microphone array is used along with a real-time dedicated implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that gives us a further reduction of interferences from other sources. An automatic speech recognizer (ASR) based on the Missing Feature Theory (MFT) recognizes separated sounds in real-time by generating missing feature masks automatically from the post-filtering step. The main advantage of this approach for humanoid robots resides in the fact that the ASR with a clean acoustic model can adapt the distortion of separated sound by consulting the post-filter feature masks. Recognition rates are presented for three simultaneous speakers located at 2m from the robot. Use of both the post-filter and the missing feature mask results in an average reduction in error rate of 42% (relative). ©2005 IEEE.

DOI

Scopus

54

被引用数

(Scopus)
Distance-based dynamic interaction of humanoid robot with multiple people

Tsuyoshi Tasaki, Shohei Matsumoto, Hayato Ohba, Mitsuhiko Toda, Kazuhiro Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 3533 LNAI 111 - 120 2005年 [査読有り]

　概要を見る

Research on human-robot interaction is getting an increasing amount of attention. Because almost all the research has dealt with communication between one robot and one person, quite little is known about communication between a robot and multiple people. We developed a method that enables robots to communicate with multiple people by selecting an interactive partner using criteria based on the concept of proxemics. In this method, a robot changes active sensory-motor modalities based on the interaction distance between itself and a person. Our method was implemented in a humanoid robot, SIG2, using a subsumption architecture. SIG2 has various sensory-motor modalities to interact with humans. A demonstration of SIG2 showed that the proposed method works well during interaction with multiple people. © Springer-Verlag Berlin Heidelberg 2005.

DOI

Scopus

12

被引用数

(Scopus)
Self-organizing algorithm for logic circuit based on local rules

Chyon Hae Kim, Tetsuya Ogata, Shigeki Sugano

IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM 2 1192 - 1197 2005年 [査読有り]

　概要を見る

This study discusses a learning algorithm for autonomous robots that has five characteristics including autonomous exploration of effective output, low calculation costs, capability for multi-tasking, reusing past knowledge, and handling time series. We propose the use of self-organizing network elements (SONE) as a method for creating learning systems that provide these characteristics. Using this method, we created and evaluated a Self-Organizing Logic Circuit. The results of our experiments showed that this learning system met the requirements by being capable of creating a basic logic circuit, learning additional knowledge, controlling a simple robot in a simulation, and solving a maze problem. ©2005 IEEE.
Making a robot recognize three simultaneous sentences in real-time

Shun'Ichi Yamamoto, Kazuhiro Nakadai, Jean Marc Valin, Jean Rouat, François Michaud, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 4040 - 4045 2005年 [査読有り]

　概要を見る

A humanoid robot under real-world environments usually hears mixtures of sounds, and thus three capabilities are essential for robot audition; sound source localization, separation, and recognition of separated sounds. We have adopted the missing feature theory (MFT) for automatic recognition of separated speech, and developed the robot audition system. A microphone array is used along with a real-time dedicated implementation of Geometric Source Separation (GSS) and a multi-channel post-filter that gives us a further reduction of interferences from other sources. The automatic speech recognition based on MFT recognizes separated sounds by generating missing feature masks automatically from the post-filtering step. The main advantage of this approach for humanoid robots resides in the fact that the ASR with a clean acoustic model can adapt the distortion of separated sound by consulting the post-filter feature masks. In this paper, we used the improved Julius as an MFT-based automatic speech recognizer (ASR). The Julius is a real-time large vocabulary continuous speech recognition (LVCSR) system. We performed the experiment to evaluate our robot audition system. In this experiment, the system recognizes a sentence, not an isolated word. We showed the improvement in the system performance through three simultaneous speech recognition on the humanoid SIG2. © 2005 IEEE.

DOI

Scopus

31

被引用数

(Scopus)
Interactive evolution of human-robot communication in real world

Yuki Suga, Yoshinori Ikuma, Daisuke Nagao, Shigeki Sugano, Tetsuya Ogata

2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1438 - 1443 2005年 [査読有り]

　概要を見る

This paper describes how to implement interactive evolutionary computation (IEC) into a human-robot communication system. IEC is an evolutionary computation (EC) in which the fitness function is performed by human assessors. We used IEC to configure the human-robot communication system. We have already simulated IEC's application. In this paper, we implemented IEC into a real robot. Since this experiment leads considerable burdens on both the robot and experimental subjects, we propose the human-machine hybrid evaluation (HMHE) to increase the diversity within the genetic pool without increasing the number of interactions. We used a communication robot, WAMOEBA-3 (Waseda Artificial Mind On Emotion BAse), which is appropriate for this experiment. In the experiment, human assessors interacted with WAMOEBA- 3 in various ways. The fitness values increased gradually, and assessors felt the robot learnt the motions they desired. Therefore, it was confirmed that the IEC is most suitable as the communication learning system. © 2005 IEEE.

DOI

Scopus

23

被引用数

(Scopus)
Spatially mapping of friendliness for human-robot interaction

Tsuyoshi Tasaki, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1277 - 1282 2005年 [査読有り]

　概要を見る

It is important that robots interact with multiple people. However, most research has dealt with only interaction between one robot and one person and assumed that the distance between them does not change. This paper focuses on the spatial relationships between a robot and multiple people during interaction. Based on the distance between them, our robot selects appropriate functions to use. It does this using a method we developed for spatially mapping the "friendliness" of each space around the robot. The robot interacts with the highest friendliness spaces (people) selectively, thereby enabling interaction between the robot and multiple people. Our humanoid robot, SIG2 which the proposed method was implemented into, interacted with about 30 visitors, at the Kyoto University Museum. The results obtained using questionnaires after interaction showed that the actions of SIG2 were easy to understand even when it interacted with multiple people at the same time and that SIG2 behaved in a friendly manner. © 2005 IEEE.

DOI

Scopus

15

被引用数

(Scopus)
Extracting multi-modal dynamics of objects using RNNPB

Tetsuya Ogata, Hayato Ohba, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 160 - 165 2005年 [査読有り]

　概要を見る

Dynamic features play an important role in recognizing objects that have similar static features in colors and or shapes. This paper focuses on active sensing that exploits dynamic feature of an object. An extended version of the robot, Robovie-IIs, moves an object by its arm to obtain its dynamic features. Its issue is how to extract symbols from various kinds of temporal states of the object. We use the recurrent neural network with parametric bias (RNNPB) that generates selforganized nodes in the parametric bias space. The RNNPB with 42 neurons was trained with the data of sounds, trajectories, and tactile sensors generated while the robot was moving/hitting an object with its own arm. The clusters of 20 kinds of objects were successfully self-organized. The experiments with unknown (not trained) objects demonstrated that our method configured them in the PB space appropriately, which proves its generalization capability. © 2005 IEEE.

DOI

Scopus

24

被引用数

(Scopus)
Multiple moving speaker tracking by microphone array on mobile robot

Masamitsu Murase, Shunichi Yamamoto, Jean Marc Valin, Kazuhiro Nakadai, Kentaro Yamada, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

9th European Conference on Speech Communication and Technology 249 - 252 2005年

　概要を見る

Real-world applications often require tracking multiple moving speakers for improving human-robot interactions and/or sound source separation. This paper presents multiple moving speaker tracking using an 8ch microphone array system installed on a mobile robot. This problem is difficult because the system does not assume that sound sources and/or the microphone array are fixed. Our solutions consist of two key ideas - time delay of arrival estimation, and multiple Kalman filters. The former localizes multiple sound sources based on beamforming in real time. Non-linear movements are tracked by using a set of Kalman filters with different history lengths in order to reduce errors in tracking multiple moving speakers under noisy and echoic environments. For quantitative evaluation of the tracking, motion references of sound sources and a mobile robot, called SIG2, were measured accurately by ultrasonic 3D tag sensors. As a result, we showed that the system tracked three simultaneous sound sources even when SIG2 moved in a room with large reverberation due to glass walls.
Contextual constraints based on dialogue models in database search task for spoken dialogue systems

Kazunori Komatani, Naoyuki Kanda, Tetsuya Ogata, Hiroshi G. Okuno

9th European Conference on Speech Communication and Technology 877 - 880 2005年

　概要を見る

This paper describes the incorporation of contextual information into spoken dialogue systems in the database search task. Appropriate dialogue modeling is required to manage automatic speech recognition (ASR) errors using dialogue-level information. We define two dialogue models: a model for dialogue flow and a model of structured dialogue history. The model for dialogue flow assumes dialogues in the database search task consist of only two modes. In the structured dialogue history model, query conditions are maintained as a tree structure, taking into consideration their inputted order. The constraints derived from these models are integrated by using a decision tree learning, so that the system can determine a dialogue act of the utterance and whether each content word should be accepted or rejected, even when it contains ASR errors. The experimental result showed that our method could interpret content words better than conventional one without the contextual information. Furthermore, it was also shown that our method was domain-independent because it achieved equivalent accuracy in another domain with-out any more training.
Singer identification based on accompaniment sound reduction and reliable frame selection

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISMIR 2005 - 6th International Conference on Music Information Retrieval 329 - 336 2005年

　概要を見る

This paper describes a method for automatic singer identification from polyphonic musical audio signals including sounds of various instruments. Because singing voices play an important role in musical pieces with a vocal part, the identification of singer names is useful for music information retrieval systems. The main problem in automatically identifying singers is the negative influences caused by accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection. The former method makes it possible to identify the singer of a singing voice after reducing accompaniment sounds. It first extracts harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody by using a sinusoidal model driven by those components. The latter method then judges whether each frame of the obtained melody is reliable (i.e. little influenced by accompaniment sound) or not by using two Gaussian mixture models for vocal and non-vocal frames. It enables the singer identification using only reliable vocal portions of musical pieces. Experimental results with forty popular-music songs by ten singers showed that our method was able to reduce the influences of accompaniment sounds and achieved an accuracy of 95%, while the accuracy for a conventional method was 53%. © 2005 Queen Mary, University of London.
Instrument identification in polyphonic music: Feature weighting with mixed sounds, pitch-dependent timbre modeling, and use of musical context

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

ISMIR 2005 - 6th International Conference on Music Information Retrieval 558 - 563 2005年

　概要を見る

This paper addresses the problem of identifying musical instruments in polyphonic music. Musical instrument identification (MII) is an improtant task in music information retrieval because MII results make it possible to automatically retrieving certain types of music (e.g., piano sonata, string quartet). Only a few studies, however, have dealt with MII in polyphonic music. In MII in polyphonic music, there are three issues: feature variations caused by sound mixtures, the pitch dependency of timbres, and the use of musical context. For the first issue, templates of feature vectors representing timbres are extracted from not only isolated sounds but also sound mixtures. Because some features are not robust in the mixtures, features are weighted according to their robustness by using linear discriminant analysis. For the second issue, we use an F0-dependent multivariate normal distribution, which approximates the pitch dependency as a function of fundamental frequency. For the third issue, when the instrument of each note is identified, the a priori probablity of the note is calculated from the a posteriori probabilities of temporally neighboring notes. Experimental results showed that recognition rates were improved from 60.8% to 85.8% for trio music and from 65.5% to 91.1% for duo music. © 2005 Queen Mary, University of London.
Walking with body-sense in virtual space using the nonlinear oscillator

Kenri Kodaka, Tetsuya Ogata, Hiroshi G. Okuno

Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics 1 324 - 329 2005年 [査読有り]

　概要を見る

This paper presents a novel construction of a locomotion system that compensates for walking-sense with simple interfaces using hands or fingers. The realization of moving with body-sense in virtual space has required high-quality system designs including walking cancellations, haptic feedback, and high-resolution displays. However, such approaches result in increasing costs of calculation and space, which obstruct the spread of VR technology. We therefore propose a new framework of locomotion systems with simple interfaces that give users "passivity and restraint", which are essential components of walking-sense. They are realized with mutual entrainment between a nonlinear oscillator and users' input. Two experiments were conducted for evaluation. The first one showed that our system gives users sense of distance based on a body standard and sense of rhythm with stable input. The second one demonstrated that the users of our system can experience a subjective body-sense and a sense of velocity. © 2005 IEEE.

DOI
Acquisition of Motion Primitives of Robot in Human-Navigation Task -Towards Human Robot Interaction based on ”Quasi-Symbol

Tetsuya OGATA, Shigeki SUGANO, Jun TANI

人工知能学会論文誌 20 ( 3 ) 188 - 196 2005年

DOI

Scopus

7

被引用数

(Scopus)
Dynamic communication of humanoid robot with multiple people based on interaction distance

Tsuyoshi Tasaki, Shohei Matsumoto, Hayato Ohba, Shunichi Yamamoto, Mitsuhiko Toda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Transactions of the Japanese Society for Artificial Intelligence 20 ( 3 ) 209 - 219 2005年

　概要を見る

Research on human-robot interaction is getting an increasing amount of attention. Since most research has dealt with communication between one robot and one person, quite few researchers have studied communication between a robot and multiple people. This paper presents a method that enables robots to communicate with multiple people using the "selection priority of the interactive partner" based on the concept of Proxemics. In this method, a robot changes active sensory-motor modalities based on the interaction distance between itself and a person. Our method was implemented into a humanoid robot, SIG2. SIG2 has various sensory-motor modalities to interact with humans. A demonstration of SIG2 showed that our method selected an appropriate interaction partner during interaction with multiple people.

DOI

Scopus

6

被引用数

(Scopus)
Extracting Multimodal Dynamics of Objects Using RNNPB.

Tetsuya Ogata, Hayato Ohba, Jun Tani, Kazunori Komatani, Hiroshi G. Okuno

J. Robotics Mechatronics 17 ( 6 ) 681 - 688 2005年 [査読有り]

担当区分：筆頭著者

DOI

Scopus

5

被引用数

(Scopus)
ローカルルールに基づいたネットワーク素子の生成淘汰アルゴリズム

金天海, 尾形哲也, 菅野重樹

計測自動制御学会第5回システムインテグレーション部門学術講演会 2004年12月
情緒交流ロボットWAMOEBA-3 – 人間-ロボット間コミュニケーション研究プラットフォームの開発

有江浩明, 菅佑樹, 尾形哲也, 菅野重樹

計測自動制御学会第5回システムインテグレーション部門学術講演会 2004年12月
Dynamic Communication between Multiple People and Robots Based on Interaction Distance

Tsuyoshi TASAKI, Shohei MATSUMOTO, Hayato OHBA, Mitsuhiko TODA, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proc. of The second International Workshop on Man-Machine Symbiotic Systems 327 - 337 2004年11月 [査読有り]
Acquisition of Motion Primitives of Robot in Human-Navigation Task: Towards Human-Robot Interaction Based on ”Quasi-Symbol

Tetsuya OGATA, Shigeki SUGANO, Jun TANI

Proc. of The second International Workshop on Man-Machine Symbiotic Systems 315 - 326 2004年11月 [査読有り]

担当区分：筆頭著者
Development of Emotional Communication Robot, WAMOEBA-3

Yuki SUGA, Tetsuya OGATA, Shigeki SUGANO

Proc. of International Conference on Advanced Mechatronics (ICAM 2004) 413 - 418 2004年10月 [査読有り]

CiNii
ヒューマノイドSIG2の近接学に基づく複数の人とのインタラクション

田崎豪, 松本祥平, 大庭隼人, 戸田充彦, 駒谷和範, 尾形哲也, 奥乃博

日本ロボット学会学術講演会予稿集(CD-ROM) 22nd 1E33 2004年09月

J-GLOBAL
Flexible assembly work cooperating system based on work state identifications by a self-organizing map

Yasuhisa Hayakawa, Tetsuya Ogata, Shigeki Sugano

IEEE/ASME Transactions on Mechatronics 9 ( 3 ) 520 - 528 2004年09月 [査読有り]

　概要を見る

This paper presents a method of realizing flexible assembly work cooperation where the assembler is free to carry out the work, without constraints in the process. To realize such systems, there exists an issue of identifying work states during the assembly and to determine when and what kind of support is necessary. As an approach to solve such issues we took a self-organizing approach in constructing a work model, as an abstract model describing typical work states during the assembly. The necessity of support is judged by detecting uncommon work states occurring, and the type of support is determined by detecting the work state. Examples of work state identifications by the self-organized map are shown. We carried out experiments to evaluate the judgment of situational necessity of support and to verify the correct identification rate of typical work states. Finally a robotic support system was constructed that gives supports of autonomously holding and handing out assembly pieces by the judging of situational necessity of support. © 2004 IEEE.

DOI

Scopus

7

被引用数

(Scopus)
データベース検索タスクにおける文脈的制約を用いた音声対話システム

神田直之, 駒谷和範, 尾形哲也, 奥乃博

FIT-2004「情報科学技術レターズ」 2004年09月
ロボットでの利用を目的とした顔画像情報と音声情報の統合による感情認識

松本祥平, 山口健, 駒谷和範, 尾形哲也, 奥乃博

第22回日本ロボット学会学術講演会 22 3D14 2004年09月

CiNii
ロボットと複数の人とのインタラクションにおける距離に基づくコミュニケーション

田崎豪, 松本祥平, 大庭隼人, 戸田充彦, 駒谷和範, 尾形哲也, 奥乃博

第22回日本ロボット学会学術講演会 2004年09月
ミッシングフィーチャー理論に基づく音源分離と音声認識のインターフェースの評価

山本俊一, 中臺一博, 辻野広司, 駒谷和範, 尾形哲也, 奥乃博

第22回日本ロボット学会学術講演会 2004年09月
動作プリミティブを介した人間とロボットの協調

尾形哲也, 菅野重樹, 谷淳

第22回日本ロボット学会学術講演会 2004年09月
データベース検索音声対話システムにおける履歴を考慮した検索条件の管理

神田直之, 駒谷和範, 尾形哲也, 奥乃博

情報科学技術フォーラム FIT 2004 131 - 132 2004年08月

CiNii J-GLOBAL
和音区間検出と和音名同定の相互依存性を解決する和音認識手法

吉岡拓也, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会研究報告 2004 ( 84(MUS-56) ) 33 - 40 2004年08月

　概要を見る

本稿では，音楽音響信号を対象とした和音進行と調の認識手法について述べる．和音区間検出，和音名同定及び調認識の間に存在する相互依存関係により，これらの処理を同時的に行う処理モデルが求められる．このために，我々は，複数生成された和音進行と調に関する評価値付きの仮説の中から，最も評価値が大きい仮説を認識結果として出力する手法を提案する．仮説の評価値は，従来研究で主に用いられてきた音響的特徴に加えて，和音遷移パターン，ベース音にもとづいて計算する．評価実験の結果，10曲のポピュラー音楽のCDから抽出した音響信号に対して，72.1%の認識率を達成した．This paper describes a method that recognizes musical chords and keys from musical audio signals. The mutual dependency among chord-boundary detection, chord-symbol identification, and key identification requires a model where these three are concurrently processed. We propose a method that generates various hypotheses of chord progressions and keys, and selects the most plausible one out of them as the recognition result. The certainty of a hypothesis is evaluated based not only on acoustic features used in most previous methods but also on chord transition patterns and bass sounds. Experimental results on 10 popular music songs show 72.1% average accuracy.

CiNii J-GLOBAL
一次反射音が移動の感覚に与える影響について

小鷹研理, 駒谷和範, 尾形哲也, 奥乃博

日本認知科学会第21回大会 R - 22 2004年07月
Interactive ECを用いたコミュニケーションロボットのための反射的行動の獲得(進化・学習とロボティクス3)

菅佑樹, 尾形哲也, 菅野重樹

ロボティクス・メカトロニクス講演会講演概要集 2004 188 - 188 2004年06月

CiNii
環境音の擬音語変換のための環境音用音素の設計

石原一志, 中谷智宏, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会全国大会(18回) 18 1E2 - 03 2004年06月

CiNii
多方向の唇画像を利用した音声認識

山口健, 山本俊一, 駒谷和範, 尾形哲也, 奥乃博

人工知能学会全国大会(18回) 1E2 - 02 2004年06月

CiNii
データベース検索タスクにおける文脈的制約を用いた音声対話システム

神田直之, 駒谷和範, 尾形哲也, 奥乃博

第41回言語音声理解と対話処理研究会, 人工知能学会 2004年06月
Interactive ECを用いた自律移動ロボットの進化に関する基礎研究

菅佑樹, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2P2 2004年06月
動作模倣による人間とロボットのインタラクション-手先軌道データを用いた動作予測システムの構築

菅佑樹, 秋和祐介, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2A1 2004年06月
音楽音響信号を対象とした和音進行の認識

吉岡拓也, 北原鉄朗, 駒谷和範, 尾形哲也, 奥乃博

日本音響学会, MA 研究会 2004年06月
音楽音響信号を対象とした和音変化時刻と和音名の同時認識

吉岡拓也, 吉井和佳, 北原鉄朗, 櫻庭洋平, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
ロボットの挙動選択のための顔情報と音声情報を統合した感情判別

松本祥平, 山口健, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
連続環境音の繰り返し構造の認識

服部佑哉, 石原一志, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
マルチモーダル情報による相槌の認識とロボット対話への応用

田崎豪, 山口健, 戸田充彦, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
音声対話システムにおける話題の構造を用いた効率的な対話管理,

神田直之, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
ミッシングフィーチャー理論による三話者同時発話認識の向上

山本俊一, 中臺一博, 辻野広司, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
マルチモーダル情報統合によるヒューマノイドロボットの挙動選択

戸田充彦, 中臺一博, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
唇情報を利用した混合音声の分離

山口健, 駒谷和範, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
反射音が自己運動感覚に与える影響の考察

小鷹研理, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
環境音の擬音語変換における音素決定曖昧性の解消

石原一志, 服部佑哉, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
自動採譜における音色と定位と音楽知識を統合したパート形成

櫻庭洋平, 尾形哲也, 奥乃博

情報処理学会第66回全国大会 2004年03月
"情"が作る真のコミュニケーション (特集ロボットから人間を読み解く)

菅野重樹, 尾形哲也

日経サイエンス 34 ( 1 ) 34 - 40 2004年01月

　概要を見る

「将来，ロボットに必要な機能は何か」と聞かれたら，多くの研究者は人間とのコミュニケーション能力だと答えるだろう。工場のロボットなら柵で囲い込めばよいが，一緒に作業をしたり人間をサポートするロボットではそうはいかない。

CiNii
Development of Emotional Communication Robot, WAMOEBA-3(Power Assist and Nursing 1,Session: TA1-C)

SUGA Yuki, ARIE Hiroaki, OGATA Tetsuya, SUGANO Shigeki

The Abstracts of the international conference on advanced mechatronics : toward evolutionary fusion of IT and mechatronics : ICAM 2004 41 - 41 2004年

DOI CiNii
Open-end human robot interaction from the dynamical systems perspective: Mutual adaptation and incremental learning

Tetsuya Ogata, Shigeki Sugano, Jun Tani

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) 3029 435 - 444 2004年 [査読有り]

　概要を見る

This paper describes interactive learning between human subjects and robot using the dynamical systems approach. Our research concentrated on the navigation system of a humanoid robot and human subjects whose eyes were covered. We used the recurrent neural network (RNN) for the robot control. We used a "consolidation-learning algorithm" as a model of hippocampus in brain. In this method, the RNN was trained by both a new data and the rehearsal outputs of the RNN, not to damage the contents of current memory. The proposed method enabled the robot to improve the performance even when learning continued for a long time (open-end). The dynamical systems analysis of RNNs supports these differences.

DOI

Scopus

11

被引用数

(Scopus)
Computational Auditory Scene Analysis and Its Application to Robot Audition

Hiroshi G. Okuno, Tetsuya Ogata, Kazunori Komatani, Kazuhiro Nakadai

Proceedings - International Conference on Informatics Research for Development of Knowledge Society Infrastructure, ICKS 2004 73 - 80 2004年 [査読有り]

　概要を見る

We are engaged in research on computational auditory scene analysis to attain sophisticated robot (computer) human interaction by recognizing auditory awareness. The objective of our research is the understanding of an arbitrary sound mixture including non-speech sounds and music as well as voiced speech, obtained by robot's ears (or microphones embedded in the robot). The main issues are sound source localization, separation, and recognition at signal processing levels, and signal-to-symbol transformation at the interface level to symbol processing levels. The latter is critical in developmental communication and we are developing an automatic onomatopoeia recognition system. This paper overviews our activities in robot audition, in particular, active direction-pass filter (ADPF) that separates sounds originating from a specific direction by integrating sound source localization and visual processing. ADPF is implemented on three kinds of robots and demonstrates separating and recognizing three simultaneous speeches with a pair of microphones.

DOI

Scopus

15

被引用数

(Scopus)
Automatic sound-imitation word recognition from environmental sounds focusing on ambiguity problem in determining phonemes

Kazushi Ishihara, Tomohiro Nakatani, Tetsuya Ogata, Hiroshi G. Okuno

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) 3157 909 - 918 2004年 [査読有り]

　概要を見る

Sound-imitation words (SIWs), or onomatopoeia, are important for computer human interactions and the automatic tagging of sound archives. The main problem in automatic SIW recognition is ambiguity in the determining phonemes, since different listener hears the same environmental sound as a different SIW even under the same situation. To solve this problem, we designed a set of new phonemes, called the basic phoneme-group set, to represent environmental sounds in addition to a set of the articulation-based phoneme-groups. Automatic SIW recognition based on Hidden Markov Model (HMM) with the basic phoneme-groups is allowed to generate plural SIWs in order to absorb ambiguities caused by listener- and situation-dependency. Listening experiments with seven subjects proved that automatic SIW recognition based on the basic phoneme-groups outperformed that based on the articulation-based phoneme-groups and that based on Japanese phonemes. The proposed system proved more adequate to use computer interactions. © Springer-Verlag Berlin Heidelberg 2004.

DOI

Scopus

12

被引用数

(Scopus)
Dynamic communication of humanoid robot with multiple people based on interaction distance

Tsuyoshi Tasaki, Shohei Matsumoto, Hayato Ohba, Mitsuhiko Toda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Workshop on Robot and Human Interactive Communication 71 - 76 2004年 [査読有り]

　概要を見る

Research on human-robot interaction is getting an increasing amount of attention. Since almost all the research has dealt with only communication between one robot and one person, there have been quite few discussions about communication between a robot and multiple people. This paper proposes a method which enables robots to communicate with multiple people using the 'selection priority of the interactive partner' based on the concept of 'Proxemics'. In this method, a robot changes active sensory-motor modalities based on the 'interaction distance' information. The proposed method is implemented into a humanoid robot SIG2 using subsumption architecture. SIG2 has various sensory-motor modalities to interact with humans. A demonstration of SIG2 showed that the proposed method works well during interaction with multiple people. © 2004 IEEE.
Repeat recognition for environmental sounds

Yuya Hattori, Kazushi Ishihara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proceedings - IEEE International Workshop on Robot and Human Interactive Communication 83 - 88 2004年 [査読有り]

　概要を見る

This paper focuses on recognition of repeats in continuous environmental sounds. Environmental sounds, that are very informative in our daily lives, often repeat several times. Repeat recognition for environmental sounds is essential for compact representation of those sounds and for forecasting the future. In our method, input environmental sound signal is partitioned into several units according to the shapes of the power envelopes, and the auditory distance between every pair of units is computed. Repeating parts are then detected by using the approximate matching algorithm. Experimental results showed the 73.3% of recognition rate in counting the repeats of 30 environmental sounds, the length of each of which ranged from 20 seconds to 2 minutes. © 2004 IEEE.
Imitation based human-robot interaction-roles of joint attention and motion prediction

Yusuke Akiwa, Yuki Suga, Tetsuya Ogata, Shigeki Sugano

Proceedings - IEEE International Workshop on Robot and Human Interactive Communication 283 - 288 2004年 [査読有り]

　概要を見る

Behavior imitation is crucial for the acquisition of intelligence as well as in communication. This paper describes two kinds of experiments of human-robot communication based on behavior imitation. One compared results obtained when the robot did and did not predict the experimental subject 's behaviors by using past datasets, and the other compared results obtained with and without target objects in the simulator environment. The result of former experiment showed that the prediction of the subject's behaviors increase the subject's interest. The result of the latter experiment confirmed that the presence of objects facilitates joint attention and make human-robot communication possible even when the robot uses a simple imitation mechanism. This result shows that in human-robot communication, human not only recognizes the behaviors of the robot passively but also adapts to the situation actively. In conclusion, it is confirmed that motion prediction and the presence of objects for joint attention are important for human-robot communication. © 2004 IEEE.
Acquisition of reactive motion for communication robots using interactive EC

Yuki Suga, Tetsuya Ogata, Shigeki Sugano

2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2 1198 - 1203 2004年 [査読有り]

　概要を見る

We've developed an emotional communication robot, WAMOEBA, using behavior-based techniques. We also proposed motor-agent (MA) model, which is an autonomous distributed-control algorithm constructed of simple sensor-motor coordination. Though it enables WAMOEBA to behave in various ways, the weight of the combinations between different motor agents is influenced by the preferences of the developer. We usually use machine-learning algorithms to automatically configure these parameters for communication robots. However, this makes it difficult to define the quantitative evaluation required for communication. We therefore used the method of interactive evolutionary computation (IEC), which can be applied to problems involving quantitative evaluation. IEC does not require to define a fitness function; this task is performed by users. But the biggest problem with using IEC is human fatigue, which causes insufficiency of individuals and generations for convergence of EC. To fix this problem, we use the prediction function that automatically calculates the fitness values of genes from some samples that have received the human subjective evaluation. Then we carried out the behavior acquisition experiment using the IEC simulation system with the prediction function. As the results of experiments, it is confirmed that diversifying the genetic pool is an efficient way for generating a variety of behavior.

DOI
Human-robot communication using multiple recurrent neural networks

Yoshihiro Sakamoto, Tetsuya Ogata, Shigeki Sugano

2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2 1574 - 1579 2004年 [査読有り]

　概要を見る

On the methodology of robotic design from the traditional view of communication which is assumed as a symbol process, robots are forced to confront the symbol grounding problem. However, if communication is assumed as the analog dynamics and robots are driven by it, robots can avoid the problem and be situated in the environment and to other agents. In this paper we will introduce a new communication system constructed from the view of dynamical systems to achieve the situatedness. This system is that there is a robot in a virtual environment and the control of the robot is shared by human operation using a joystick and a robot controller. As the controller, we adopt multiple recurrent neural networks (MRNN) which are able to cope with complex environments and broad communication that single recurrent net cannot cope with. We conduct two experiments in order to evaluate the effectiveness of MRNN to a low level communication task such as nonverbal interaction. First, we examine the effect of the number of RNNs contained in MRNN. Second, we examine the effect of the context dependency of MRNN. These experiments show the capability of MRNN as a new-type controller of communication robot.

DOI
Human-robot collaboration using behavioral primitives

Tetsuya Ogata, Masaki Matsunaga, Shigeki Sugano, Jun Tani

2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2 1592 - 1597 2004年 [査読有り]

担当区分：筆頭著者

　概要を見る

A novel approach to human-robot collaboration based on quasi-symbolic expressions is proposed. The target task is navigation in which a person with his or her covered and a humanoid robot collaborate in a context-dependent manner. The robot uses a recurrent neural net with parametric bias (RNNPB) model to acquire the behavioral primitives, which are sensory-motor units, composing the whole task. The robot expresses the PB dynamics as primitives using symbolic sounds, and the person influences these dynamics through tactile sensors attached to the robot Experiments with six participants demonstrated that the level of influence the person has on the PB dynamics is strongly related to task performance, the person's subjective impressions, and the prediction error of the RNNPB model (task stability). Simulation experiments demonstrated that the subjective impressions of the correspondence between the utterance sounds (the PB values) and the motions were well reproduced by the rehearsal of the RNNPB model.

DOI
Disambiguation in determining phonemes of sound-imitation words for environmental sound recognition

Kazushi Ishihara, Yuya Hattori, Tomohiro Nakatani, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

8th International Conference on Spoken Language Processing, ICSLP 2004 1485 - 1488 2004年 [査読有り]

　概要を見る

Onomatopoeia, or sound-imitation words (SIWs) are important in informing sound events in human-computer communication. One problem is listener-dependency in recognizing environmental sounds by means of SIWs, that is, different listener hears the same environmental sound as a different SIW even under the same condition. Therefore, the use of usual Japanese phonemes is not adequate to express SIWs. To cope with this ambiguity problem of phoneme determination, we designed a set of new phonemes, referred to as the basic phoneme-groups, to represent environmental sounds. The basic phoneme-group consists of one or more Japanese phonemes, and thus the ambiguity problem is resolved based on it by generating one or more SIWs for a sound event. An HMM-based scheme is adopted to recognize SIWs using the phoneme-groups. Listening experiments with seven subjects showed that automatic SIW recognition based on the basic phoneme-groups outperformed ones based on the other types of phonemes. The recall and precision rate were 56.4% and 72.2%, respectively.
Robot Motion Control using Listener”s Back-Channels and Head Gesture Information

Tsuyoshi TASAKI, Takeshi YAMAGUCHI, Kazunori KOMATANI, Tetsuya OGATA, Hiroshi G. OKUNO

Proc. of International Conference on Spoken Language Processing (ICSLP-2004) 1033 - 1036 2004年 [査読有り]
Automatic Chord Transcription with Concurrent Recognition of Chord Symbols and Boundaries.

Takuya Yoshioka, Tetsuro Kitahara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Proc. of 2004 International Conference on Musical Information Retrieval (ISMIR-2004) 100 - 105 2004年 [査読有り]
Constructivist approach to human-robot emotional communication - Design of evolutionary function for WAMOEBA-3

Yuki Suga, Hiroaki Arie, Tetsuya Ogata, Shigeki Sugano

2004 4th IEEE-RAS International Conference on Humanoid Robots 2 869 - 884 2004年 [査読有り]

　概要を見る

By applying a self-preservation function and communication capability, we investigated the emergence of the emotional behavior of robots to propose an "evaluation function for self-preservation," a "model of endocrine system," and a "MA model." We also developed a, new hardware platform, WAMOEBA-3, to install our new knowledge into the robot. WAMOEBA-3, a wheel type, independent robot, was designed for easy maintenance and customization. As a new function for WAMOEBA-3, we introduced an evolutionary function for the acquisition of reactive motion that uses an interactive evolutionary computation method. We show the results of simulation experiments and discuss real world applications. © 2004 IEEE.

DOI
Computational Auditory Scene Analysis and Its Application to Robot Audition

Hiroshi G. Okuno, Tetsuya Ogata, Kazunori Komatani, Kazuhiro Nakadai

Proceedings - International Conference on Informatics Research for Development of Knowledge Society Infrastructure, ICKS 2004 73 - 80 2004年

　概要を見る

We are engaged in research on computational auditory scene analysis to attain sophisticated robot (computer) human interaction by recognizing auditory awareness. The objective of our research is the understanding of an arbitrary sound mixture including non-speech sounds and music as well as voiced speech, obtained by robot's ears (or microphones embedded in the robot). The main issues are sound source localization, separation, and recognition at signal processing levels, and signal-to-symbol transformation at the interface level to symbol processing levels. The latter is critical in developmental communication and we are developing an automatic onomatopoeia recognition system. This paper overviews our activities in robot audition, in particular, active direction-pass filter (ADPF) that separates sounds originating from a specific direction by integrating sound source localization and visual processing. ADPF is implemented on three kinds of robots and demonstrates separating and recognizing three simultaneous speeches with a pair of microphones.

DOI

Scopus

15

被引用数

(Scopus)
コミュニケーションを指向した反射的行動の獲得-対話型進化計算を用いたアプローチの提案

菅佑樹, 尾形哲也, 菅野重樹

計測自動制御学会第4回システムインテグレーション部門学術講演会 1004 - 1005 2003年12月
対話型進化的計算手法を用いたロボットの反射行動の獲得

菅佑樹, 尾形哲也, 菅野重樹

第21回日本ロボット学会学術講演会 3J31 2003年09月
人間とロボットのインタラクティブな学習に関する研究

尾形哲也, 真砂紀孝, 菅野重樹

第21回日本ロボット学会学術講演会 1I26 2003年09月
Collaboration Development through Interactive Learning between Human and Robot

Tetsuya OGATA, Noritaka MASAGO, Shigeki SUGANO, Jun TANI

Proc. of 3rd International Workshop on Epigenetic Robotics 99 - 106 2003年08月 [査読有り]

担当区分：筆頭著者
情緒交流ロボットWAMOEBA-3の開発?トルクセンサ内蔵関節の設計

土屋尚文, 有江浩明, 日野貴司, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2003 2P2 - 124 2003年05月

　概要を見る

これまでの研究で導入されたモータエージェントの考えを発展させ, 各関節に小型の制御用マイコンとサーボを搭載した"分散制御システム"を構築することを念頭に, アーム・頸部の関節機構の設計・開発を行った。

CiNii
メカノクリーチャー -生物から学ぶデザインテクノロジー-

尾形哲也, 菅野重樹

日本機会学会 172 - 203 2003年 [招待有り]

担当区分：筆頭著者
Robust modeling of dynamic environment based on robot embodiment

Kuniaki Noda, Mototaka Suzuki, Naofumi Tsuchiya, Yuki Suga, Tetsuya Ogata, Shigeki Sugano

Proceedings - IEEE International Conference on Robotics and Automation 3 3565 - 3570 2003年 [査読有り]

　概要を見る

Recent studies on embodied cognitive science have shown us the possibility of emergence of more complex and nontrivial behaviors with quite simple designs if the designer takes the dynamics of the system-environment interaction into account properly. In this paper, we report our tentative classification experiments of several objects using the human-like autonomous robot, "WAMOEBA-2Ri". As modeling the environment, we focus on not only static aspects of the environment but also dynamic aspects of it including that of the system own. The visualized results of this experiment shows the integration of multimodal sensor dataset acquired by the system-environment interaction ("grasping") enable robust categorization of several objects. Finally, in discussion, we demonstrate a possible application to making "invariance in motion" emerge consequently by extending this approach.

DOI
Flexible assembly work cooperation based on work state identifications by a self-organizing map

Y. Hayakawa, T. Ogata, S. Sugano

IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM 2 1031 - 1036 2003年 [査読有り]

　概要を見る

This study presents a method of realizing flexible assembly work cooperation in cases where neither the assembly process, nor the final form of the completed task is pre-defined in advance. To realize such systems, there exists an issue of identifying work states during the assembly, and determine when and what kind of support is necessary. As an approach of solving such issues, identifying the work states from a work model built by self-organizing assembly motions of human is taken. Examples of the work state identifications and a support system, which judges the situational necessity of support and selects whether to hand out or holds assembly parts are shown. The support is carried out based on the work states identified by the self-organized map. Experiments indicate that work state identification by a self-organizing map is effective in flexibly cooperating with human during assembly work.

DOI

Scopus

3

被引用数

(Scopus)
Interactive Learning in Human-Robot Collaboration

Tetsuya Ogata, Noritaka Masago, Shigeki Sugano, Jun Tani

IEEE International Conference on Intelligent Robots and Systems 1 162 - 167 2003年 [査読有り]

　概要を見る

In this paper, we investigated interactive learning between human subjects and robot experimentally, and its essential characteristics are examined using the dynamical systems approach. Our research concentrated on the navigation system of a specially developed humanoid robot called Robovie and seven human subjects whose eyes were covered, making them dependent on the robot for directions. We compared the usual feed-forward neural net-work (FFNN) without recursive connections and the recurrent neural network (RNN). Although the performances obtained with both the RNN and the FFNN improved in the early stages of learning, as the subject changed the operation by learning on its own, all performances gradually became unstable and failed. Results of a questionnaire given to the subjects confirmed that the FFNN gives better mental impressions, especially from the aspect of operability. When the robot used a consolidation-learning algorithm using the rehearsal outputs of the RNN, the performance improved even when interactive learning continued for a long time. The questionnaire results then also confirmed that the subject's mental impressions of the RNN improved significantly. The dynamical systems analysis of RNNs support these differences.

DOI
身体性に基づいた環境・ロボット自身における新奇性検出

野田邦昭, 鈴木基高, 尾形哲也, 菅野重樹

第20回日本ロボット学会学術講演会， 1C31 2002年10月
模倣を主体とした人間とロボットのコミュニケーション?コミュニケーションにおける共同注意の役割

大竹正海, 坂本義弘, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2002 1P1 - K03 2002年06月

　概要を見る

本研究では, 人間とロボットの対面コミュニケーションの実現を目的とし, 最も原始的なコミュニケーション形態である「模倣行動」に着目し研究を行ってきた。その過程において発達心理学の分野で重要とされる「共同注意」に新たに着目した。そして, 指差し動作から人間行動に空間的な情報を付加することによって, 共同注意を促す模倣形態を考案しロボットの行動出力アルゴリズムを構築した。本稿では構築した仮想空間上のロボットと人間とのコミュニケーション実験を通して共同注意の特徴や役割について考察した結果を報告する。

CiNii
分散エージェントを用いた全身協調による動作生成

小宮孝章, 野田邦昭, 土屋尚文, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2002 2P1 - D06 2002年06月

　概要を見る

本研究では「ロボットと人間との情緒交流」の実現を目的とし, これまでにコミュニケーションに必要不可欠な"行動の多様性"を実現するアルゴリズムとして, ビヘイビアベース手法を拡張した独自の「モータェージェント」を提案してきた。本稿ては現在のモータエージェントが離散的動作しか生成できない, という問題点を回避するために, 新たに各エージェントの出力を重ね合せる手法を提案する。シミュレーションによる評価実験の結果, 提案手法により全身関節の並列, 連続動作が実現され, 動的環境の適応能力が向上したことを確認した。

CiNii
Influence of the Eye Motions in Human-Robot Communication and Motion Generation based on the Robot Body Structure

Tetsuya OGATA, Takaaki KOMIYA, Kuniaki NODA, Shigeki SUGANO

Proc. of IEEE/RAS International Conference on Humanoid Robots (Humanoid 2001) 83 - 89 2001年11月 [査読有り]

担当区分：筆頭著者
模倣を主体とした人間─ロボットコミュニケーションに関する基礎実験

大竹正海, 山本大介, 尾形哲也, 菅野重樹

第19回日本ロボット学会学術講演会 109 - 110 2001年09月
内分泌系モデルによる強化学習パラメータの動的調整

尾形哲也, 松本典剛, 菅野重樹

第19回日本ロボット学会学術講演会 1239 - 1240 2001年09月
身体性に基づいた状態表現機能を持つロボットと人間とのコミュニケーション

野田邦昭, 井田真高, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 1P1 - D10 2001年06月
(1) 情動モデルを有する自律ロボット WAMOEBA-2 と人間との情緒交流

尾形哲也, 菅野重樹

日本機械学會誌 104 ( 990 ) 259 - 259 2001年05月

CiNii
感性情報の動的変化に対応したモーションプラニングの実現と評価

尾形哲也, 志村明俊, 渋谷恒司, 菅野重樹

第６回ロボティクスシンポジア 402 - 407 2001年03月
Motion generation of the autonomous robot based on body structure

Tetsuya Ogata, Takaaki Komiya, Shigeki Sugano

IEEE International Conference on Intelligent Robots and Systems 4 2338 - 2343 2001年 [査読有り]

担当区分：筆頭著者

　概要を見る

This study aims to investigate the intelligence which can make robots adapt to the human environment. This paper points out the problems of the behavior-based robot, and proposes the methods which can generate whole body motions based on body structure and integrates the reflection motions to make the behaviors continuous. The motion performances are compared in two kinds of environments, such as a dynamic environment and a static environment, by using a simulator of the autonomous robot WAMOEBA-2Ri developed in this research. Finally, we show that the integration parameters of the proposal method reflect the body structure of the robot and environmental structures.

DOI

Scopus

2

被引用数

(Scopus)
情動表現生成のためのロボット構造とシステムの統合

尾形哲也, 菅野重樹

計測自動制御学会SIシンポジウム 35 - 36 2000年12月
フェーズの印象変化を考慮したバイオリン演奏アルゴリズムの構築

尾形哲也, 志村明俊, 宇野格, 渋谷恒司, 菅野重樹

第21回バイオメカニズム学会学術講演会 173 - 176 2000年11月
情緒交流と自律行動生成のためのロボット構造

尾形哲也, 菅野重樹

第６回 IFTMM会議シンポジウム 14 - 17 2000年11月
Emotional communication between humans and the autonomous robot W AMOEBA-2 (Waseda amoeba) which has the emotion model

Tetsuya Ogata, Shigeki Sugano

JSME International Journal, Series C: Mechanical Systems, Machine Elements and Manufacturing 43 ( 3 ) 568 - 574 2000年09月 [招待有り]

担当区分：筆頭著者

　概要を見る

In this paper, we discuss the communication between autonomous robots and humans through the development of a robot which has an emotion model. The model depicts the endocrine system of humans and has four kinds of hormone parameters to adjust various internal conditions such as motor output, cooling fan output and sensor gain. We surveyed 126 visitors at the '97 International Robot Exhibition held in Tokyo, Japan (Oct. 1997) to evaluate their psychological impressions of the robot. As a result, the human friendliness of the robot was confirmed and some factors of human-robot emotional communication were discovered.

DOI

Scopus

28

被引用数

(Scopus)
Emotional Communication Robot: WAMOEBA-2R – Emotion Model and Evaluation Experiments-

Tetsuya OGATA, Shigeki SUGANO

Proc. of IEEE/RAS International Conference on Humanoid Robots (Humanoid 2000) 93 2000年09月 [査読有り]

担当区分：筆頭著者
人間と自律ロボットのコミュニケーションに関する実験的考察?システム設計と心理評価の異母集団比較

尾形哲也, 松山佳彦, 小宮孝章, 井田真高, 野田邦昭, 菅野重樹

第18回日本ロボット学会学術講演会 479 - 480 2000年09月
身体性に基づくロボットと人間とのコミュニケーション

尾形哲也, 菅野重樹

bit別冊「身体性とコンピュータ」,岡田美智男,三嶋博之,佐々木正人編,共立出版 195 - 207 2000年08月
The adaptive motion by the endocrine system model in an autonomous robot

Tetsuya OGATA, Shigeki SUGANO

Proc of International Symposium on Adaptive Motion of Animals and Machines E30 2000年08月 [査読有り]

担当区分：筆頭著者
自律ロボット WAMOEBA-2Rの開発?アームシステムの搭載と心理実験

尾形哲也, 松山佳彦, 小宮孝章, 井田真高, 野田邦昭, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2000 1A1 - 44 2000年05月

　概要を見る

本研究では生理学の知見を基に, ロボットハードウェアをベースとした行動知能モデル, コミュニケーション知能モデルを提案, その有効性を検証することを目的としている。本稿では開発した自律ロボットWAMOEBA-2R(ワメーバ)のアームハンドシステムの概要, 制御手法と, '99国際ロボット展においてWAMOEBA-2が実現した, 人間との物理的インタラクションに関する評価結果を述べる。

CiNii
未知作業支援に適用可能な作業状態モデル生成手法に関する研究

第５回ロボティクスシンポジア pp.260-265 2000年03月
情緒交流ロボットWamoeba-2Rの開発〜システム構成と評価実験

第５回ロボティクスシンポジア pp.68-73 2000年03月
情緒交流ロボットWamoeba-2Rの開発?システム構成と評価実験

尾形哲也, 松山佳彦, 小宮孝章, 井田真高, 野田邦昭, 菅野重樹

第５回ロボティクスシンポジア pp.68-73 68 - 73 2000年03月
未知作業支援に適用可能な作業状態モデル生成手法に関する研究

早川泰久, 佐竹賢亮, 内田信宏, 尾形哲也, 菅野重樹

第５回ロボティクスシンポジア pp.260-265 260 - 265 2000年03月
Assembly Support Based on Human Model -Provision of Physical Support According to Implicit Desire for Support-.

Yasuhisa Hayakawa, Ikuo Kitagishi, Yusuke Kira, Kensuke Satake, Tetsuya Ogata, Shigeki Sugano

J. Robotics Mechatronics 12 ( 2 ) 118 - 125 2000年 [査読有り]

DOI

Scopus

4

被引用数

(Scopus)
Robotic co-operation system based on a self-organization approached human work model

Yasuhisa Hayakawa, Tetsuya Ogata, Shigeki Sugano

Proceedings - IEEE International Conference on Robotics and Automation 4 4057 - 4062 2000年 [査読有り]

　概要を見る

This study presents a method of human co-operating systems, which can determine when support behavior is necessary by a human work model. We focus on assembly work as the target and propose a self-organizing approach of human work models by sampled human information by vision sensors. The support is determined according to the work model. Such a system would realize provision of support without a strict model of the assembly target and enable support in cases where neither the assembly process, nor the final form of the completed task, is known to the system in advance. First, a method of measuring human information and extracting states where support is necessary, from the human work model is presented. Next, a support system for assembly work co-operation according to the work model, with physical interaction capabilities is described. Experiments were carried out to evaluate and verify the effectiveness of the system. The results show that the constructed assembly support system is effective in both improving performance and increasing friendliness.

DOI
Acquisition of internal representation in robots - toward human-robot communication using primitive language

Tetsuya Ogata, Yoshihiro Matsuyama, Shigeki Sugano

Advanced Robotics 14 ( 4 ) 277 - 291 2000年 [査読有り]

担当区分：筆頭著者

　概要を見る

This research aims to clarify behavior intelligence and human cooperation intelligence of robots by emotion models which are based on the robot's hardware structure. In this paper, a human's mental image (internal expression) is given consideration as a method for emotional expression of robots. The hypothesis model for the acquisition of the internal expression of robots and experimental results using a real autonomous robot are described.

DOI

Scopus

3

被引用数

(Scopus)
A violin playing algorithm considering the change of phrase impression.

Tetsuya Ogata, Akitoshi Shimura, Koji Shibuya, Shigeki Sugano

SMC 2000 CONFERENCE PROCEEDINGS: 2000 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOL 1-5 2 1342 - 1347 2000年 [査読有り]

担当区分：筆頭著者

　概要を見る

This study focused on the dynamics of KANSEI information and aims to propose the algorithm of motion planning using KANSEI Concretely: the violin playing is regarded as the target motion which will be greatly influenced by KANSEI. This study introduced the multi-agent algorithm in which four physical bowing parameters are agents to adapt the impression transition smoothly maintaining the relationships between the parameters. Ire realized the violin performance suiting for the timbre words by introducing the proposed agent algorithm into the bowing machine developed in this research. As a result of the experiments, it was confirmed that there were various playing performances according to a single impression transition.

DOI
Development of arm system for human-robot emotional communication

T. Ogata, T. Komiya, S. Sugano

IECON Proceedings (Industrial Electronics Conference) 1 475 - 480 2000年 [査読有り]

　概要を見る

This study aims to clarify the cooperation intelligence of robots with an emotion model based on the human biological system. The paper describes the functions of the arm system of the behavior-based robot, WAMOEBA-2, which has an emotion model and can communicate with humans. The specifications of the arm system were determined based on a study of human robot physical interaction. Each joint of the arm is equipped with a torque sensor and the arm is controlled by a distributed agent network system. The network architecture is acquired in a neural network by feedback-error-learning algorithm. In the experiment, human beings can play with the arm system by physical interaction.

DOI

Scopus

2

被引用数

(Scopus)
Development of emotional communication robot: WAMOEBA-2R Experimental evaluation of the emotional communication between robots and humans

T. Ogata, Y. Matsuyama, T. Komiya, M. Ida, K. Noda, S. Sugano

IEEE International Conference on Intelligent Robots and Systems 1 175 - 180 2000年 [査読有り]

担当区分：筆頭著者

　概要を見る

This study aims to clarify the cooperation intelligence of robots. This paper describes the autonomous robot named WAMOEBA-2R which can communicate with humans by both an informational and physical way. WAMOEBA-2R has two arms of which each joint has a torque sensor to realize the physical interaction with humans, the function of the voice recognition and the face recognition. The arms are controlled by a distributed agent network system. The network architecture is acquired in a neural network by the feedback-error-learning algorithm. We surveyed 150 visitors at the '99 International Robot Exhibition held in Tokyo (Oct. 1999) to evaluate their psychological impressions of WAMOEBA-2R. As the result, some factors of the human-robot emotional communication were discovered.

DOI
人間とロボットの情緒的コミュニケーションの実験的評価-アーム・ハンドによる人間との物理的インタラクション

尾形哲也, 菅野重樹

システム制御情報学会論文誌 13 ( 12 ) 566 - 574 2000年 [査読有り]

担当区分：筆頭著者

　概要を見る

This study aims to investigate the intelligence which realizes emotional and physical communication between human and robots. This paper describes the autonomous robot named WAMOEBA 2R which can communicate with humans by both informational and physical way. WAMOEBA-2R has the two arms of which each joint has torque sensor controlled by distributed agent network system. The coefficients in the network architecture are autonomously acquired by using the feedback-error-learning algorithm. We surveyed 150 visitors at the '99 International Robot Exhibition held in Tokyo (Oct. 1999) and 74 visitors at the '99 Tottori Industrial Technology Fair held in Tottori (Nov. 1999) to evaluate their psychological impressions of WAMOEBA-2R. As a result, some factors of the human-robot emotional communication were discovered.

DOI CiNii
Analysis of Design Process by the Observation of Human Motion

Tetsuya OGATA, Yasuhisa HAYAKAWA, Kensuke SATAKE, Shigeki SUGANO

Proc. of International Workshop on Emergent Synthesis (IWES”99) pp.167-172 167 - 172 1999年12月 [招待有り]

担当区分：筆頭著者
人間-ロボット相互関係分析に基づくマニピュレータ支援行動生成

第20回バイオメカニズム学会学術講演会 pp.380-381 1999年11月
人間-ロボット相互関係分析に基づくマニピュレータ支援行動生成

早川泰久, 内田信宏, 佐竹賢亮, 尾形哲也, 菅野重樹

第20回バイオメカニズム学会学術講演会 pp.380-381 380 - 381 1999年11月
Extraction of Human Intention for Human Co-operating Systems -Prototype Assembling Work Support Robot System according to Human Intention-

Yasuhisa HAYAKAWA, Yusuke KIRA, Tetsuya OGATA, Shigeki SUGANO

Proc. of International Conference on Advanced Robotics (ICAR”99) pp. 199-204 199 - 204 1999年10月 [査読有り]
身体運動観察に基づく組立作業過程の記号化と分析

第17回日本ロボット学会学術講演会 pp.1067-1068 1999年09月
人間とロボットの物理的コミュニケーション〜アームシステムの開発と分散エージェントによる行動獲得

第17回日本ロボット学会学術講演会 pp. 423-424 1999年09月
人間とロボットの物理的コミュニケーション〜アームシステムの開発と分散エージェントによる行動獲得

尾形哲也, 小宮孝章, 菅野重樹

第17回日本ロボット学会学術講演会 pp. 423-424 423 - 424 1999年09月
身体運動観察に基づく組立作業過程の記号化と分析

早川泰久, 佐竹賢亮, 尾形哲也, 菅野重樹

，第17回日本ロボット学会学術講演会 pp.1067-1068 1067 - 1068 1999年09月
情緒交流ロボットWamoeba-2のアームハンド機構の開発〜分散エージェントによる動作生成

尾形哲也, 日塔潔, 小宮孝章, 松本典剛, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2P2-79-103 2P2 1999年06月
頭部・手部位置および把持情報の構造化による暗黙意図の抽出

日本機械学会ロボティクスメカトロニクス講演会 2A1-76-118 1999年06月
頭部・手部位置および把持情報の構造化による暗黙意図の抽出

早川泰久, 吉良雄介, 佐竹賢亮, 内田信宏, 尾形哲也, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2A1 1999年06月
情緒交流ロボット Wamoeba-2のアームハンド機構の開発?分散エージェントによる動作生成?

尾形哲也, 日塔潔, 小宮孝章, 松本典剛, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2P2-79-103 2P2 1999年06月
内分泌系モデルを組込んだロボットハードウェア

尾形哲也, 菅野重樹

バイオメカニズム学会誌 23 ( 2 ) 106 - 111 1999年05月 [査読有り]

担当区分：筆頭著者

DOI CiNii
情動モデルを有する自律ロボットWAMOEBA-2と人間との情緒交流

尾形哲也, 菅野重樹

日本機械学会論文誌Ｃ編 65 ( 633 ) 1900 - 1906 1999年05月 [査読有り]

担当区分：筆頭著者
組立作業支援を目的としたシンセシスモデル構築に関する研究—組立動作情報を基にした時系列NNによる構造化

早川泰久, 吉良雄介, 尾形哲也, 菅野重樹

精密工学会春季大会学術講演会講演論文集 PP534 ( 1 ) 534 - 534 1999年03月

CiNii
組立作業支援を目的としたシンセシスモデル構築に関する研究?組立動作情報を基にした時系列NNによる構造化?

早川泰久, 吉良雄介, 尾形哲也, 菅野重樹

精密工学会春季大会学術講演会講演論文集 PP534 ( 1 ) 534 - 534 1999年03月

CiNii
ロボットと人間の情緒交流—ロボットにおける前言語発生の考察

尾形哲也, 松山佳彦, 菅野重樹

第4回ロボティクス・シンポジア PP44-49 44 - 49 1999年03月 [査読有り]

担当区分：筆頭著者
Emotional Communication between Humans and the Autonomous Robot WAMOEBA-2 (Waseda Amoeba) which has the Emotion Model.

Tetsuya OGATA, Shigeki SUGANO

TRANSACTIONS OF THE JAPAN SOCIETY OF MECHANICAL ENGINEERS Series C 65 ( 633 ) 1900 - 1906 1999年 [査読有り] [招待有り]

担当区分：筆頭著者

　概要を見る

This study discusses the communication between autonomous robots and humans through the development of a robot which has an emotion model. The model refers to the internal secretion system of humans and it has four kinds of the hormone parameters to use to adjust various internal conditions such as motor output, cooling fan output and sensor gain. We surveyed 126 visitors at ’97 International Robot Exhibition held in Tokyo (Oct. 1997) in order to exaluate psychological impressions of the robot. As the result, the human friendliness of the robot was confirmed and some factors concerning with the human-robot emotional communication were discovered. © 1999, The Japan Society of Mechanical Engineers. All rights reserved.

DOI

Scopus

3

被引用数

(Scopus)
Emotional communication between humans and the autonomous robot which has the emotion model

Tetsuya Ogata, Shigeki Sugano

Proceedings - IEEE International Conference on Robotics and Automation 4 3177 - 3182 1999年 [査読有り]

担当区分：筆頭著者

　概要を見る

This study discusses the communication between autonomous robots and humans through the development of a robot which has an emotion model. The model refers to the internal secretion system of humans and it has four kinds of the hormone parameters to use to adjust various internal conditions such as motor output, cooling fan output and sensor gain. We surveyed 126 visitors at '97 International Robot Exhibition held in Tokyo, Japan (Oct. 1997) in order to evaluate psychological impressions of the robot. As a result, the human friendliness of the robot was confirmed and some factors of the human-robot emotional communication were discovered.

DOI
Human Robot Communication by Physical Interaction – Distributed Agent Control System and The Learning Algorithm -

Tetsuya OGATA, Takaaki KOMIYA, Shigeki SUGANO

Proc. of IEEE International Conference on Systems, Man, and Cybernetics (SMC”99) 2 1005 - 1010 1999年 [査読有り]

担当区分：筆頭著者
Emotional Communication Between Humans and Robots - Consideration of Primitivee Language in Robots

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'99) 2 870 - 875 1999年 [査読有り]

担当区分：筆頭著者

DOI
人間のシンセシス活動の抽出とモデル化—人間機械協調システムを用いた一考察

日本学術振興会未来開拓事業人間指向シンセシスの科学早稲田慶応成果報告会（第2回）講演論文集 PP70-73 1998年12月
人間のシンセシス活動の抽出とモデル化?人間機械協調システムを用いた一考察?

尾形哲也, 早川泰久, 吉良祐介, 菅野重樹

日本学術振興会未来開拓事業人間指向シンセシスの科学早稲田慶応成果報告会（第2回）講演論文集 70 - 73 1998年12月
ロボットにおけるホルモン系モデルとその影響—身体制御と環境適応

バイオメカニズム学会学術講演会 PP71-72 1998年11月
組立作業支援システムの構造—人間指向シンセシスの解釈

日本機械学会設計工学・システム部門講演会 PP169-170 1998年11月
組立作業支援システムの構造—人間指向シンセシスの解釈

尾形哲也, 早川泰久, 菅野重樹

日本機械学会設計工学・システム部門講演会 PP169-170 169 - 170 1998年11月
ロボットにおけるホルモン系モデルとその影響—身体制御と環境適応

尾形哲也, 菅野重樹

バイオメカニズム学会学術講演会 PP71-72 71 - 72 1998年11月
Mechanisms of Internal Secretion System for Intellectual Robots -Towards an Emergence of Emotion in Robots

Tetsuya OGATA, Shigeki SUGANO

Proc. of IEEE International Workshop on Robot and Human Communication (ROMAN”98) PP50-55 50 - 55 1998年10月 [査読有り]

担当区分：筆頭著者
Emergence of Primitive Verbal Communication in Robots

5th International Conference on Soft Computing and Information / Intelligent Systems PP284-287 284 - 287 1998年10月 [査読有り]

担当区分：筆頭著者
ロボットの前言語的表現とコミュニケーション

第16回日本ロボット学会学術講演会 PP645-646 1998年09月
ロボットにおける感覚行動マップと行動計画への展開

第16回日本ロボット学会学術講演会 PP429-430 1998年09月
ロボットにおける感覚行動マップと行動計画への展開

尾形哲也, 日塔潔, 菅野重樹

第16回日本ロボット学会学術講演会 PP429-430 429 - 430 1998年09月
ロボットの前言語的表現とコミュニケーション

尾形哲也, 松山佳彦, 大塚卓美, 菅野重樹

第16回日本ロボット学会学術講演会 PP645-646 645 - 646 1998年09月
ロボットの身体性に基づく感情モデルと内部表象獲得モデル

日本機械学会ロボティクスメカトロニクス講演会 2CII4-3 1998年06月
ロボットの自律系モデルによる情緒表現と評価実験

日本機械学会ロボティクスメカトロニクス講演会 2BII1-6 1998年06月
ロボットの身体性に基づく感情モデルと内部表象獲得モデル

尾形哲也, 大塚卓美, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2CII4-3 2CII4 - 3 1998年06月
ロボットと人間の情緒交流—自律系モデルの効果と試行実験

第3回ロボティクス・シンポジア PP41-46 1998年05月
ロボットと人間の情緒交流—自律系モデルの効果と試行実験

尾形哲也, 菅野重樹

第3回ロボティクス・シンポジア PP41-46 41 - 46 1998年05月
Acquisition of Holophrastic Speech in Autonomous Robots-Toward the Emergence of Verbal Communication in Robots

Tetsuya OGATA, Shigeki SUGANO

Toward a Science of Consciousness "Tucson III" 250 1998年04月

担当区分：筆頭著者
人間共存ロボットにおける心的コミュニケーション

菅野重樹, 渋谷恒司, 尾形哲也

電気学会全国大会講演論文集 S18 - 21 1998年03月
ロボットの自律系モデルによる情緒表現と評価実験

尾形哲也, 大塚卓美, 菅野重樹

日本機械学会ロボティクスメカトロニクス講演会 2BII1-6 2BII1 - 6 1998年03月
行動型自律ロボットWAMOEBA-2と人間との情緒的インタラクション実験

菅野重樹, 渋谷恒司, 尾形哲也

第３回重点領域研究「知能ロボット」シンポジウム予稿集 149 - 152 1998年01月
Communication between behavior-based robots with emotion model and humans.

Tetsuya Ogata, Shigeki Sugano

1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5 2 1095 - 1100 1998年 [査読有り]

担当区分：筆頭著者

　概要を見る

This study discusses the communication between autonomous robots and humans through the development of a robot which has an emotion model. The model refers to the internal secretion system of humans and it has four kinds of the hormone parameters to use to adjust various internal conditions such as motor output, cooling fan output and sensor gain. We surveyed 126 visitors at '97 International Robot Exhibition held in Tokyo, Japan (Oct. 1997) in order to evaluate psychological impressions of the robot. As a result, the human friendliness of the robot was confirmed and some factors of the human-robot emotional communication were discovered.

DOI
シンセシスのための心的言語処理

尾形哲也, 菅野重樹

日本学術振興会未来開拓事業人間指向シンセシスの科学早稲田慶応成果報告会（第1回）講演論文集 9 - 10 1997年12月
ロボットにおける自律的情動反応の生成

尾形哲也, 菅野重樹

第15回ロボット学会学術講演会予稿集 2 385 - 386 1997年09月
自律ロボットの人間との対話表現の獲得

尾形哲也, 大塚卓美, 菅野重樹

第15回ロボット学会学術講演会予稿集 2 385 - 386 1997年09月
異種ロボット間のインタラクション試行実験

尾形哲也, 川端邦明, 藤井輝夫, 淺間一, 遠藤勲, 菅野重樹

第15回ロボット学会学術講演会予稿集 3 833 - 834 1997年09月
Emergence of emotional expression of robots based on the internal secretion system

Tetsuya OGATA, Shigeki SUGANO

The Brain and Self Workshop: Toward a Science of Consciousness 61 1997年08月 [査読有り]

担当区分：筆頭著者
人間と機械の心的交流

菅野重樹, 尾形哲也

計測自動制御学会学術講演会予稿集 1327 - 1328 1997年08月
ロボットにおける自律系?WAMOEBA-2の自律系ハードウェアとその影響?

尾形哲也, 菅野重樹

日本機械学会ロボティクス・メカトロニクス講演会”97 予稿集A 567 - 568 1997年06月
ニューラルネットワークによるビヘイビアネットワークの自己組織化

尾形哲也, 北岸郁雄, 菅野重樹

日本機械学会ロボティクス・メカトロニクス講演会”97予稿集B 1169 - 1172 1997年06月
ロボットにおける自律系の導入?ハードウェア構成とアルゴリズムの提案?

菅野重樹, 尾形哲也, 渋谷恒司

第２回重点領域研究「知能ロボット」シンポジウム予稿集 111 - 114 1997年01月

CiNii
Mechanical systems for autonomic nervous system in robots

Tetsuya Ogata, Shigeki Sugano

IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM 113 1997年 [査読有り]

担当区分：筆頭著者

　概要を見る

The similarities between the human autonomic nervous system and the hardware mechanisms of robots are discussed. The hardware of the autonomic nervous system incorporated into the independent autonomous mobile robot WAMOEBA-2 (Waseda Artificial Mind On Emotion BAse) which has been developed for the realization of the emotional communication between humans and robots is presented. WAMOEBA-2 has four kinds of hormone parameters using the original algorithm, the evaluation function for self-preservation. These parameters influence various parts of the mechanical system so that WAMOEBA-2 can cope with its environment and `express emotions'.
自己保存に基づくロボットの行動生成方法論と機械モデルの実体化

尾形哲也, 菅野重樹

日本ロボット学会誌 15 ( 5 ) 710 - 721 1997年 [査読有り]

担当区分：筆頭著者

　概要を見る

The objective of this work is to develop a new robot intelligence for human-machine communication and environment-machine interaction based on the self-preservation function which should be involved in human mind. In this paper, the system chart expressing the human brain information processing, and the development of an autonomous mobile robot "WAMOEBA-1R" (Waseda Artificial Mind On Emotion BAse) are described. The concept of the WAMOEBA-1R design is that robots should have Self-Preservation Evaluation Function as an emotion function. Further more, the method to evaluate the whole system are described from the viewpoint of the animal psychology. As a result of the experiments, WAMOEBA-1R showed specific reactions with color emotion appearances to some simulations. WAMOEBA-1R had the sense of values about colors and sounds based on Self-Preservation, and achieved new type human-machine communications by expressing an emotion color.

DOI CiNii
Generation of behavior automaton on neural network

Tetsuya Ogata, Kazuki Hayashi, Ikuo Kitagishi, Shigeki Sugano

IEEE International Conference on Intelligent Robots and Systems (IROS97) 2 608 - 613 1997年 [査読有り]

担当区分：筆頭著者

　概要を見る

To plan behavior procedures, it is necessary for an agent to have a world model concerning the temporal sequences information. In this paper, a temporal information learning algorithm is proposed with a three layer neural network implemented the `effectiveness of simulation accumulation' algorithm. This algorithm can construct `behavior automaton' in the neural network. From the results of some learning experiments using a mobile robot simulation, the generated automaton express the complexity of the simulation environments. The robot agent acquires a behavior automaton for obstacle avoidance behavior which is influenced by the simulation environment.

DOI
Emotional behavior adjustment system in robots

Tetsuya Ogata, Shigeki Sugano

Robot and Human Communication - Proceedings of the IEEE International Workshop 352 - 357 1997年 [査読有り]

担当区分：筆頭著者

　概要を見る

To cope with dynamically changing environments, creatures have not only `motor nervous system' but `autonomic nervous system' as an internal adjustment mechanism. However conventional robots have been equipped with only the motor nervous system. We developed an autonomous mobile robot implemented a internal adjustment model based on self-preservation. This model involves the original system named `Behavior-Adjustment System', which influences a whole condition of the robot. This paper describes the concrete methodology for application of the behavior-adjustment system to robots. We confirmed the feasibility of the proposed system for the mobile robot in two practical experiments; `obstacle avoidance problem' and `effect of the voltage of battery'.
Sense syncretic model toward the construction of "robot-original-language"

T. Ogata, S. Sugano

Proceedings 5th IEEE International Workshop on Robot and Human Communication. RO-MAN'96 TSUKUBA 1996年11月 [査読有り]

担当区分：筆頭著者

DOI
ロボットにおける自律系の機構

尾形哲也, 中村好伸, 菅野重樹

第14回ロボット学会学術講演会予稿集 389 - 390 1996年11月
ロボットの自律的行動計画の生成?神経回路による時系列オートマトンの獲得とその応用?

尾形哲也, 林一樹, 大塚卓美, 北岸郁雄, 菅野重樹

第14回ロボット学会学術講演会予稿集 2 699 - 700 1996年11月
人間とロボットの情緒交流に関する研究-情緒の表出とその形態に関する一考察-

尾形哲也, 中村好伸, 菅野重樹

第14回ロボット学会学術講演会予稿集 3 1149 - 1150 1996年11月

担当区分：筆頭著者
移動ロボットにおける行動計画のための時系列情報獲得-刺激蓄積効果による時系列ネットワークの生成-

尾形哲也, 林一樹, 菅野重樹

日本機械学会ロボティクス・メカトロニクス講演会”96予稿集 133 - 136 1996年06月

担当区分：筆頭著者
人間とロボットの情緒交流に関する研究-評価ロボット"WAMOEBA-2"の設計と製作-

尾形哲也, 菅野重樹

日本機械学会ロボティクス・メカトロニクス講演会”96予稿集 449 - 452 1996年06月

担当区分：筆頭著者
人間共存ロボットにおける情緒の意義

菅野重樹, 尾形哲也

日本機械学会，No.96-8，第１回ロボメカ・シンポジア講演論文集 97 - 100 1996年05月
ロボット工学を考える

白井良明, 音田弘, 前田浩一, 尾形哲也, 坪内孝司, 奥山健一, 横井一仁, 川内陽志生, 市川誠

日本ロボット学会誌 14 ( 3 ) 326 - 329 1996年04月

DOI CiNii
Emergence of mind in robots for human interface - research methodology and robot model

Shigeki Sugano, Tetsuya Ogata

Proceedings - IEEE International Conference on Robotics and Automation 2 1191 - 1198 1996年 [査読有り]

担当区分：最終著者

　概要を見る

The objective of this work is to develop the technology for human-machine communication through the research of the emergence of mind in mechanical systems. In this paper, the hypothesis about the emergence of mind are proposed. First, the system chart expressing the human brain information processing, and the development of an autonomous mobile robot 'WAMOEBA-IR' (Waseda Artificial Mind On Emotion BAse) are described. The conception of the WAMOEBA-IR design is that robots should have Self-Preservation Evaluation Function. Further more, the method to evaluate the whole system are described from the viewpoint of the animal psychology. As a result of the experiments, WAMOEBA-IR showed specific emotional reactions with color appearances to some situations. WAMOEBA-IR has the sense of values about colors and sounds based on Self-Preservation as the first step of the emergence of mind.

DOI
情緒交流ロボットWAMOEBA-2の設計

菅野重樹, 尾形哲也

第１回重点領域研究「知能ロボット」シンポジウム予稿集 117 - 120 1996年01月
ロボットが心を持つ可能性に関する研究-内外感覚の統合-

尾形哲也, 菅野重樹

第13回ロボット学会学術講演会予稿集 1195 - 1196 1995年11月

担当区分：筆頭著者
ロボットにおける自律系と移動方向判断アルゴリズム

尾形哲也, 菅野重樹

第13回ロボット学会学術講演会予稿 265 - 266 1995年11月
人間とロボットの情緒交流に関する研究-ロボットの内部状態の表出-

尾形哲也, 菅野重樹

日本機械学会ロボティクス・メカトロニクス講演会”95予稿集 877 - 878 1995年06月

担当区分：筆頭著者
ロボットにおける心の発生-志向性と行動-

尾形哲也, 菅野重樹

日本機械学会第72期通常総会講演論文集(IV) 271 - 272 1995年04月

担当区分：筆頭著者
ロボットにおける心の発生-記憶情報処理の導入-

尾形哲也, 山本健次郎, 菅野重樹

第12回ロボット学会学術講演会予稿集 129 - 130 1994年11月

担当区分：筆頭著者
三面図の暖昧性除去における二分決定グラフの利用

望月敬太, 西出俊, 奥乃博, 尾形哲也

全国大会講演論文集 49 ( 0 ) 9 - 10 1994年09月

　概要を見る

三面図は3次元物体の2次元平面上での表現として,歴史的に広く用いられてきた経緯がある.ただ,本来3次元空間に存在するものを2次元平面上へ写像するため,三面図が曖昧になるケースが存在し,その際には対応する3次元モデルが一意に定まらないという不都合が生じる.この不都合を克服するために,対応する3次元モデルを一意に生成するためにヒューリスティクスを用いる研究も存在するが,我々はこれまで三面図の曖昧性の除去に物体表現の観点から取り組んできた.3次元物体のその他の表現法として,位相的に異なるすべての可能な面図を節点とし,それらの間の可能な遷移関係に対して枝を張ったaspect graphと呼ばれるものがある.我々は,aspect graphが有する表現力の完全性を情報量の冗長性をできるだけ避けながら三面図に反映させること,つまり生成用の表現と認識用の表現の統合を目指す観点にたって研究を進めている.なお本稿で扱う3次元物体のクラスは,多面体のうち複数に分離していないもの(多様体)に限るものとする.

CiNii J-GLOBAL
ロボットにおける心の発生-第1報：評価用機械モデルの設計・製作-

菅野重樹, 玉本淳一, 山本健次郎, 尾形哲也, 加藤一郎

第11回ロボット学会学術講演会予稿集 763 - 766 1993年11月

CiNii
ロボットにおける心の発生-第2報：自己保存評価関数と基礎実験-

菅野重樹, 玉本淳一, 山本健次郎, 尾形哲也, 加藤一郎

第11回ロボット学会学術講演会予稿集 767 - 770 1993年11月

▼全件表示

書籍等出版物

ロボット工学ハンドブック

日本ロボット学会( 担当：編集, 担当範囲: 第III編主査)

コロナ社 2023年03月 ISBN: 9784339046793
Cognitive robotics

Cangelosi, Angelo, 浅田, 稔( 担当：共著, 担当範囲: Machine Learning for Cognitive Robotics (Chapter 9))

MIT Press 2022年05月 ISBN: 9780262046831
有斐閣現代心理学辞典

子安, 増生, 丹野, 義彦, 箱田, 裕司( 担当：分担執筆)

有斐閣 2021年02月 ISBN: 9784641002661
「こころ」とアーティフィシャル・マインド

河合, 俊雄, 吉岡, 洋, 西垣, 通, 尾形, 哲也, 長尾, 真

創元社 2021年02月 ISBN: 9784422117577
AI人工知能特集号

MITテクノロジーレビュー編集部( 担当：共著)

角川アスキー総合研究所,KADOKAWA (発売) 2020年09月 ISBN: 9784049110517
AI事典

中島, 秀之, 浅田, 稔, 橋田, 浩一, 松原, 仁, 山川, 宏(人工知能), 栗原, 聡, 松尾, 豊( 担当：分担執筆, 担当範囲: 「深層学習によるインタラクション・記号の創発」，「多種感覚による“能動的”認識」，「行動学習」)

近代科学社 2019年12月 ISBN: 9784764906044
発達ロボティクスハンドブック : ロボットで探る認知発達の仕組み

Cangelosi, Angelo, Schlesinger, Matthew, 萩原, 良信, 荒川, 直哉, 長井, 隆行, 尾形, 哲也, 稲邑, 哲也, 岩橋, 直人, 杉浦, 孔明, 牧野, 武文, 岡田, 浩之, 谷口, 忠大

福村出版 2019年01月 ISBN: 9784571230592
人工知能がもたらす技術の革新と社会の変貌

情報処理推進機構AI白書編集委員会, 情報処理推進機構( 担当：分担執筆, 担当範囲: 1.3.6節)

角川アスキー総合研究所,KADOKAWA (発売) 2017年07月 ISBN: 9784048996075
ディープラーニングがロボットを変える

尾形, 哲也

日刊工業新聞社 2017年07月 ISBN: 9784526077326
人工知能・機械学習・ディープラーニング関連技術とその活用 : 何ができるのか?何が必要なのか?産業利用を考える人のための

情報機構( 担当：分担執筆, 担当範囲: 第3節1項)

情報機構 2016年06月 ISBN: 9784865021110
メカノクリーチャ : 生物から学ぶデザインテクノロジー

日本機械学会( 担当：分担執筆, 担当範囲: 第7章)

コロナ社 2003年04月 ISBN: 4339045683
知の創成 : 身体性認知科学への招待

Pfeifer, Rolf, Scheier, Christian, 石黒, 章夫, 小林, 宏, 細田, 耕( 担当：共訳, 担当範囲: 第12章)

共立出版 2001年11月 ISBN: 9784320120327
身体性とコンピュータ

岡田, 美智男, 三嶋, 博之, 佐々木, 正人( 担当：共著, 担当範囲: 身体性に基づくロボットと人間とのコミュニケーション)

共立出版 2001年08月 ISBN: 4320120205
What should be computed to understand and model brain function? : from robotics, soft computing, biology and neuroscience to cognitive philosophy

喜多村, 直( 担当：共著, 担当範囲: Consideration of Emotion Model and Primitive Language of Robots, Chapter 1)

World Scientific 2001年 ISBN: 9810245181

▼全件表示

講演・口頭発表等

End-to-end制御のロボットによる多様なタスク汎化への挑戦と今後の展望

尾形哲也 [招待有り]

【第88回人工知能セミナー】フィジカルAI：ロボット

発表年月： 2025年09月
ヒューマノイドの進化と課題、日本の現在位置～ヒューマノイド研究の最前線、AIロボット協会･尾形理事長に聞く

尾形哲也 [招待有り]

ロボスタ, オンライン

発表年月： 2025年09月
ロボット基盤モデルによる汎用AIロボットへ向けて

尾形哲也 [招待有り]

DX&AI Forum 2025 Fall 東京

発表年月： 2025年09月
データ駆動型ロボットシステムの基礎研究と社会実装へ向けて

尾形哲也 [招待有り]

第43回日本ロボット学会学術講演会

発表年月： 2025年09月
テクノロジーと表現の融合による感動体験

尾形哲也, 橋田朋子, 原口竜也, 芦ヶ原隆之 [招待有り]

ソニーグループ寄附講座「クリエイティブエンタテインメント学」開講記念シンポジウム「表現工学とエンタテインメントの未来」

発表年月： 2025年09月
From Voice to Touch: The Next Frontier in Human-Robot Interaction

Tetsuya Ogata [招待有り]

Workshop on Co-Designing Human-Centered Robotics for Enhanced Social Teaming and Well-Being (CO-HABIT WORKSHOP 2025), IEEE RO-MAN2025

発表年月： 2025年08月
ロボットAIの第一人者と語る、世界競争の最前線と日本の戦略

尾形哲也, 頼嘉満 [招待有り]

ファーストライト・キャピタル株式会社主催

発表年月： 2025年07月
日本版『ロボット基盤モデル』が拓くAIロボットの社会実装

尾形哲也 [招待有り]

未来モノづくり国際EXPO

発表年月： 2025年07月
Physical AIへの挑戦：AIロボット協会とJST CRESTの構想

尾形哲也 [招待有り]

ROKLive 2025 Japan

発表年月： 2025年07月
オープンなロボット基盤モデル構築と展望

尾形哲也 [招待有り]

SSKセミナー

発表年月： 2025年06月
Open Ecosystem for Foundational Robot Models

Tetsuya Ogata [招待有り]

The International Symposium on Computer Architecture (ISCA2025), Workshop on Architecture Support for Embodied AI Systems

発表年月： 2025年06月
実世界へ進出するAI～ロボット基盤モデルへの挑戦

尾形哲也 [招待有り]

NIKKEI 生成AIシンポジウム

発表年月： 2025年06月
AIロボット基盤モデルの普及と活用推進に向けて

尾形哲也 [招待有り]

Interop25 Tokyo

発表年月： 2025年06月
AI×ヒューマノイドの可能性

尾形哲也 [招待有り]

第11回G1ベンチャー, グロービス経営大学院

発表年月： 2025年06月
Physical AIの学術的課題と社会実装へ向けて

尾形哲也 [招待有り]

身体化AIシンポジウム

発表年月： 2025年06月
フィジカルAIシステムの研究開発～身体性に基づく知能の研究～

尾形哲也 [招待有り]

第39回人工知能学会全国大会

発表年月： 2025年05月
Learning Tasks and Interactions for AI Robots Assisting Human Daily Life

Tetsuya Ogata [招待有り]

The 2nd Workshop on Nonverbal Cues for Human-Robot Cooperative Intelligence, IEEE International Conference on Robotics and Automation (ICRA2025)

発表年月： 2025年05月
次世代のロボット基盤モデルの展望とその応用可能性

尾形哲也 [招待有り]

NVIDIA GTC

発表年月： 2025年03月
ロボット基盤モデルにおける学習と推論のアプローチ

尾形哲也 [招待有り]

第8回Generative AI勉強会

発表年月： 2025年02月
発達ロボティクスの視点から見たロボット基盤モデル

尾形哲也 [招待有り]

ものづくりAI・データ活用DAY2025

発表年月： 2025年01月
Predictive Inference for Efficient AI on Robots

Tetsuya Ogata [招待有り]

Workshop on Predictive Inference for Efficient AI on Robots, IEEE/SICE International Symposium on System Integration (SII2025)

発表年月： 2025年01月
ロボットのための生成AIと能動的推論

尾形哲也 [招待有り]

GPU UNITE 2024

発表年月： 2024年12月
生成AIによる実環境インタラクションと汎用ロボットの可能性

尾形哲也 [招待有り]

AI・人工知能EXPO【秋】

発表年月： 2024年11月

開催年月：
2024年11月

　

　
総合知による人類への貢献

尾形哲也 [招待有り]

早稲田オープン・イノベーション・フォーラム2024

発表年月： 2024年11月
Toward a General-Purpose AI Robot to Assist Humans

Tetsuya Ogata [招待有り]

EASE Fall School

発表年月： 2024年11月

開催年月：
2024年11月

　

　
Workshop on Collecting, Managing and Utilizing Data through Embodied Robots

Tetsuya Ogata [招待有り]

IEEE/RSJ IROS2024

発表年月： 2024年10月
人と技術のソーシャル・イノベーション：〇〇社会をデザインする

尾形哲也 [招待有り]

日本ソーシャル・イノベーション学会

発表年月： 2024年09月
AI とロボットの進化の現状と将来展望

尾形哲也 [招待有り]

第 93 回 WIN 定例講演会・第 48 人間情報学会講演会

発表年月： 2024年09月
「製造業におけるAI技術の未来を探る：基礎から産業応用まで」

尾形哲也 [招待有り]

IEEE ITS Society 名古屋チャプタ

発表年月： 2024年08月
人間を支援する汎用型のAIロボットへ向けて

尾形哲也 [招待有り]

次世代向け2024年夏AIセミナー, 人工知能学会

発表年月： 2024年08月
Design of Deep Learning for Robotics and Applications

Tetsuya Ogata [招待有り]

IEEE WIE SL Section in collaboration with the IEEE RAS Sri Lanka Chapter and the IEEE CIS Sri Lanka Chapter

発表年月： 2024年06月
AIとロボットが融合する未来〜次世代ロボットのビジョン〜

尾形哲也 [招待有り]

電子機器トータルソリューション展 2024

発表年月： 2024年06月
次世代AIモデルの研究開発へ

尾形哲也, 谷口忠大, 三宅陽一郎, 牛久祥孝, 山川宏 [招待有り]

人工知能学会全国大会企画セッション

発表年月： 2024年05月
⼤規模マルチモーダル物理基盤モデルとしてのロボティクス

尾形哲也 [招待有り]

第152回ロボット工学セミナー

発表年月： 2024年05月
Deep predictive learning for robot task learning in real world

Tetsuya Ogata [招待有り]

MyoSymposium’24, IEEE International Conference on Robotics and Automation (ICRA2024)

発表年月： 2024年05月
Implementation of deep predictive learning for multiple robot tasks

Tetsuya Ogata [招待有り]

2nd Workshop on Mobile Manipulation and Embodied Intelligence (MOMA.v2), IEEE International Conference on Robotics and Automation (ICRA2024)

発表年月： 2024年05月
次世代 AI としての知能の身体化 -AI ロボットの新展開

尾形哲也 [招待有り]

東北大学人工知能エレクトロニクス卓越大学院プログラム講演会

発表年月： 2024年05月
実世界での人間との共生のためのロボット知能へ向けて

尾形哲也 [招待有り]

CREST「共生インタラクション」領域2023年度終了課題成果発表シンポジウム

発表年月： 2024年03月
Enhancing Robot Performance: Deep Predictive Learning for Adaptive Perception and Action

Tetsuya Ogata [招待有り]

3rd International Conference on Image Processing and Robotics (ICIPRob2024)

発表年月： 2024年03月
Dynamic adaptability in AI with active Inference for real-world robots

Tetsuya Ogata [招待有り]

The 10th IEEJ International Workshop on Sensing, Actuation, Motion Control, and Optimization (SAMCON2024)

発表年月： 2024年03月
AIロボットがもたらす技術革新と未来社会

尾形哲也 [招待有り]

TSC10周年記念特別セミナー, モノづくり日本会議

発表年月： 2024年02月
多様なロボット動作を可能とする深層予測学習とスマートロボットの開発展望

尾形哲也 [招待有り]

ロボット・AIシンポジウム2024名古屋

発表年月： 2024年02月
Bridging the Gap: AI’s Transition to Real-World Tasks

Tetsuya Ogata [招待有り]

4th Nobel Turing Challenge Initiative Workshop

発表年月： 2024年02月
目標３菅野プロジェクト「一人に一台一生寄り添うスマートロボット」

尾形哲也 [招待有り]

ムーンショット早稲田デー

発表年月： 2024年02月
次世代LLMとしてのLMMとロボットへの展開

尾形哲也 [招待有り]

JATES科学技術と経済の会

発表年月： 2024年02月
生成AIとロボティクスの融合と展望

尾形哲也 [招待有り]

新春PMセミナー2024

発表年月： 2024年01月
Deep Learning for Robotics: Enhancing Adaptive Perception and Action through Predictive Models

Tetsuya Ogata [招待有り]

29th International Symposium on Artificial Life and Robotics (AROB)

発表年月： 2024年01月
能動的推論と深層予測学習

尾形哲也 [招待有り]

CAN2024プレ・キックオフシンポジウム

発表年月： 2023年12月
AIとロボットの融合のための深層予測学習と実装事例

尾形哲也 [招待有り]

NEDOロボット・AIフォーラム2023

発表年月： 2023年11月
汎用ロボットとサステナビリティ

尾形哲也 [招待有り]

早稲田オープン・イノベーション・フォーラム（WOI2023）

発表年月： 2023年11月
Predictive Coding-inspired Robotics: Advancing Adaptability through Deep Predictive Learning

Tetsuya Ogata [招待有り]

Workshop on World Models and Predictive Coding in Cognitive Robotics (in IROS2023)

発表年月： 2023年10月
Deep Predictive Learning in Robotics: Optimizing Models for Adaptive Perception and Action

Tetsuya Ogata [招待有り]

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2023)

発表年月： 2023年10月
予測符号化原理に基づく実ロボットの知能化と事例

尾形哲也 [招待有り]

応用脳科学コンソーシアム

発表年月： 2023年09月
生成AIと次世代AIとしてのロボット応用

尾形哲也 [招待有り]

EWE三月会2023年9月例会

発表年月： 2023年09月
Embodied AI with the Concept of Active Inference

Tetsuya Ogata [招待有り]

International Workshop on Active Inference (IWAI2023)

発表年月： 2023年09月
Deep Predictive Learning: Empowering Robots for Complex Tasks and Revolutionary Applications

Tetsuya Ogata [招待有り]

SICE Annual Conference (SICE2023)

発表年月： 2023年09月
AI & Robotics for Accessible Intelligent Society

Tetsuya Ogata [招待有り]

World Artificial Intelligence Conference (WAIC2023)

発表年月： 2023年07月
Applications of Deep Predictive Learning for Real-World Robots

Tetsuya Ogata [招待有り]

International Conference on Ubiquitous Robotics (UR 2023)

発表年月： 2023年06月

開催年月：
2023年06月

　

　
脳科学とAIをつなぐ新たなパラダイムの出現～「自由エネルギー原理」で考える脳の特性～

乾敏郎, 磯村拓哉, 尾形哲也, 金井良太 [招待有り]

応用脳科学コンソーシアムキックオフシンポジウム

発表年月： 2023年05月
深層予測学習のコンセプトと人間操作に基づくロボットスキル学習

尾形哲也 [招待有り]

第31回日本医学会総会

発表年月： 2023年04月
理工学から見た『文理融合』教育の可能性

尾形哲也 [招待有り]

早稲田大学法学部「先端科学技術と法」シンポジウム

発表年月： 2023年03月
AIとの融合でロボットはどこまで進化できるのか-ヒューマノイド（人間型）ロボット技術の現在とこれから-

尾形哲也 [招待有り]

第2回SENSPIREフォーラム

発表年月： 2023年01月
能動的推論のための深層予測学習モデル〜ロボット動作生成への応用成功事例〜

尾形哲也 [招待有り]

LabTech Talk vol.78

発表年月： 2023年01月
The State of AI Embeded Robotics: How AI (Deep Learning) is Applied to the Smart Robot (AIREC Case)

Tetsuya Ogata [招待有り]

AI Summit 2022 Seoul

発表年月： 2022年12月
Deep Predictive Learning for Humanoid Intelligence

Tetsuya Ogata [招待有り]

OIST/Humanoids2022 Joint Workshop

発表年月： 2022年12月
深層予測学習によるロボットの物体操作学習

尾形哲也 [招待有り]

精密工学会超精密位置決め専門委員会定例会

発表年月： 2022年11月
Toward Personal AI-based robots with Deep Predictive Learning

Tetsuya Ogata [招待有り]

Workshop on Trends and advances in integrating machine learning and automated reasoning for intelligent robots and systems, IEEE International Conference on Intelligent Robots and Systems (IROS2022)

発表年月： 2022年10月
Deep Predictive Learning for AI robots that can learn and act alongside humans

Tetsuya Ogata [招待有り]

Big Challenge Forum, IEEE International Conference on Intelligent Robots and Systems (IROS2022)

発表年月： 2022年10月
深層学習による予測情報処理に基づいたロボットの物体操作の学習

尾形哲也 [招待有り]

第43回日本レーザー医学会総会

発表年月： 2022年10月

開催年月：
2022年10月

　

　
予測符号化モデルとしての深層予測学習とロボット知能化

尾形哲也 [招待有り]

第7回全脳アーキテクチャシンポジウム

発表年月： 2022年10月
実世界で人と共生できるロボットの実現に向けて

尾形哲也 [招待有り]

追手門学院大学心理学部人工知能・認知科学専攻公開講座「人と触れ合う人工知能の現在と未来」

発表年月： 2022年10月
実世界で活動するロボットのための深層予測学習の概念と応用事例

尾形哲也 [招待有り]

第24回Science Cafe

発表年月： 2022年09月
自由エネルギー原理から考える知能ロボティクス

尾形哲也 [招待有り]

応用脳科学コンソーシアム

発表年月： 2022年09月
実ロボットのための深層予測学習の実装と応用

尾形哲也 [招待有り]

DAシンポジウム2022

発表年月： 2022年09月

開催年月：
2022年08月

-

2022年09月
身体を持ったAI – 深層予測学習のコンセプトと応用

尾形哲也 [招待有り]

神奈川県立産業技術総合研究所先端科学技術セミナー2022 ロボティクス編

発表年月： 2022年09月
予測符号化原理に基づく実ロボットの知能化と事例

尾形哲也 [招待有り]

シンギュラリティサロン

発表年月： 2022年07月
深層予測学習に基づくロボット知能化 – 身体による世界の能動的知覚

尾形哲也 [招待有り]

MIRU2022

発表年月： 2022年07月
Intelligence of Real Robots Based on Predictive Coding Principle

Tetsuya Ogata [招待有り]

International Symposium on Artificial Intelligence and Brain Science 2022

発表年月： 2022年07月

開催年月：
2022年07月

　

　
Toward Embodied Intelligence with Deep Predictive Learning and Real Robots

Tetsuya Ogata [招待有り]

Neuro2022

発表年月： 2022年07月

開催年月：
2022年06月

-

2022年07月
深層予測学習の概念とロボット応用事例

尾形哲也 [招待有り]

日本設計工学会春季大会

発表年月： 2022年05月
深層予測学習によるAIとロボットの共進化と実世界応用

尾形哲也 [招待有り]

AI・人工知能EXPO

発表年月： 2022年05月
Neurorobotics model studies based on the policy of prediction error minimization

Tetsuya Ogata [招待有り]

A JAPANESE-GERMAN CONFERENCE

発表年月： 2022年05月
AIロボットが拓く新しい社会

尾形哲也 [招待有り]

第２回合同シンポジウム『総合知による社会課題の解決：スマート社会を導く物流システム・AIロボット技術』

発表年月： 2022年03月
身体知（Embodied Intelligence）とロボット知能ー人文社会科学との協働への期待ー

尾形哲也 [招待有り]

「人と情報テクノロジーの共生のための人工知能の哲学2.0の構築」総括シンポジウム

発表年月： 2022年03月
Deep Predictive Learning for Embodied Intelligence

Tetsuya Ogata [招待有り]

Embodied Intelligence Conference 2022

発表年月： 2022年03月

開催年月：
2022年03月

　

　
深層予測学習に基づく実ロボットの知能化

尾形哲也 [招待有り]

理化学研究所GRPシンポジウム2022

発表年月： 2022年03月
深層予測学習によるロボットの知能化コンセプトと事例

尾形哲也 [招待有り]

DENSO Robotics Expo (DREx2022)

発表年月： 2022年03月
ロボットによる生産革新のための深層学習・予測学習を規範とした行動学習の取り組み方

尾形哲也 [招待有り]

2022国際ロボット展併催セミナー

発表年月： 2022年03月
深層予測学習を用いたロボットの操作スキル学習

尾形哲也 [招待有り]

第10回日本婦人科ロボット手術学会

発表年月： 2022年01月

開催年月：
2022年01月

　

　
AI Roboticsの現在から未来への展望

尾形哲也 [招待有り]

未来の医療を創る”医療人2030″育成プロジェクト【第2部】医療人2030育成プログラム

発表年月： 2021年12月
深層予測学習による身体知の実現に向けて ― データから経験の学習へ

尾形哲也 [招待有り]

人工知能学会合同研究会

発表年月： 2021年11月
AIロボットの社会実装とエッジ活用

尾形哲也 [招待有り]

ARC Processor “Virtual” Summit 2021

発表年月： 2021年11月
スマートロボットによる環境との柔軟なインタラクションの実現

尾形哲也 [招待有り]

第8回AI×ロボティクス(2) AIとロボットの共進化，情報処理学会連続セミナー

発表年月： 2021年10月
Toward Embodied Intelligence with Predictive Learning – From Data to Experiences

Tetsuya Ogata [招待有り]

5th Workshop on Semantic Policy and Action Representations for Autonomous Robots, IEEE International Conference on Intelligent Robots and Systems (IROS2021)

発表年月： 2021年09月
深層学習によるロボット知能の革新

尾形哲也 [招待有り]

市村賞受賞記念フォーラム2021

発表年月： 2021年07月
深層学習によるロボットの運動-言語の学習とインタラクション

尾形哲也 [招待有り]

情報学展望２，京都大学OPENCOURSEWARE

発表年月： 2021年07月
ロボットシステムと AI 技術～深層学習のロボット応用

尾形哲也 [招待有り]

システム構築のためのＡＩ講座, システムイノベーションセンター

発表年月： 2021年07月
Deep Predictive Learning: Real-Time Motion Adaptation for Prediction Error Minimization

Tetsuya Ogata [招待有り]

Virtual Workshop on Robot Learning in Real-world Applications: Beyond Proof of Concept, IEEE International Conference on Robotics and Automation (ICRA2021)

発表年月： 2021年05月
ディープラーニングが革新するロボットの知能化と産業

尾形哲也 [招待有り]

AI・人工知能EXPO

発表年月： 2021年04月
AIとロボットの共進化に向けて

尾形哲也 [招待有り]

第6回次世代ロボット研究機構シンポジウム，早稲田大学次世代ロボット研究機構

発表年月： 2021年02月
Exawizardsにおける AIロボット共同研究の紹介

尾形哲也 [招待有り]

NeurIPS2020オンライン読み会, エクサウィザーズ

発表年月： 2021年01月
Deep Predictive Learning and Robot Applications

Tetsuya Ogata [招待有り]

Knowledge Based Reinforcement Learning Workshop, International Joint Conference on Artificial Intelligence (IJCAI)

発表年月： 2021年01月
情動・感情と身体知の考察とインタラクションへの展開

尾形哲也 [招待有り]

感情とAI冬のワークショップ

発表年月： 2020年12月
議論参加

尾形哲也 [招待有り]

AIと民事法の交差点―完全自動運転車の責任，早稲田大学法学会主催第66回模擬裁判（民事）

発表年月： 2020年12月
予測誤差最小化モデルを基盤としたロボット動作学習

尾形哲也 [招待有り]

科学技術未来戦略ワークショップ「脳型AIアクセラレータ～柔軟な高度情報処理と超低消費電力化の両立～」，文部科学省科学技術振興機構（JST）CRDS

発表年月： 2020年11月
Acquisition of Representations for Linguistic-Behavioral Integration by Deep Learning Models

Tetsuya Ogata [招待有り]

Workshop on Trends and advances in machine learning and automated reasoning for intelligent robots and systems (AI&R 2020), The 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020)

発表年月： 2020年10月
階層型深層学習モデルによるロボットの動作学習ー異なる環境変化への動的適応ー

尾形哲也 [招待有り]

第41回IBISML研究会，電子情報通信学会

発表年月： 2020年10月
ディスカッション3:人知と機械計算の正しい活用

尾形哲也 [招待有り]

第38回日本ロボット学会学術講演会オーガナイズドセッション

発表年月： 2020年10月
エクスペリエンス・ベースド・ロボティクスの提案ーロボットの身体経験の学習から言語理解へー

尾形哲也 [招待有り]

NVIDIA GTC DIGITAL Japan 2020

発表年月： 2020年10月
Deep Learning for Robotics - From data to experiences -

Tetsuya Ogata [招待有り]

merging Research Trends on Robotics and it’s Applications (ERTRA 2020), India-Japan Cooperative Science Program (IJCSP)

発表年月： 2020年10月
話題提供

尾形哲也 [招待有り]

OS09: プロジェクション科学の基盤拡充を目指して：関連諸科学との対話，日本認知科学会

発表年月： 2020年09月
AI人材教育の現状と今後について，AI時代に必要な人物像とその教育について考える

尾形哲也 [招待有り]

サーティファイWebセミナー

発表年月： 2020年08月
ディープラーニングのロボット応用事例ーデータからエクスペリエンスへ

尾形哲也 [招待有り]

Deep Learning Digital Conference，Deep Learning Lab & CDLE (Community of Deep Learning Evangelists)

発表年月： 2020年08月
私立大学から見た社会人博士

尾形哲也 [招待有り]

社会人学位取得奨励シンポジウム「社会人博士が活躍する社会へむけて」

発表年月： 2020年07月
Panel

Tetsuya Ogata [招待有り]

Workshop on Closing the Academia to Real-World Gap in Service Robotics, International Conference on Robotics: Science and Systems (RSS 2020)

発表年月： 2020年07月
深層学習によるロボットの運動-言語の学習とインタラクション

尾形哲也 [招待有り]

情報学展望２，京都大学OPENCOURSEWARE

発表年月： 2020年06月
ロボットにおける自律性情動反応モデルと感情の考察

尾形哲也 [招待有り]

企画セッション：感情とAI～感情研究の夜明け～，人工知能学会全国大会

発表年月： 2020年06月
話題提供

尾形哲也 [招待有り]

オンライン国際シンポジウム「パラダイムチェンジにおけるレジリエントな共創社会に向けて」，LINK-J

発表年月： 2020年05月
ディープラーニング×ロボティクスー知能ロボットの現在とこれからー

尾形哲也 [招待有り]

Emerging Technology Nite #16, MIT Technology Review

発表年月： 2020年03月
深層学習モデルのロボットへの応用

尾形哲也 [招待有り]

科学技術未来戦略ワークショップ「深層学習と知識・記号推論の融合によるAI基盤技術の発展」，文部科学省科学技術振興機構（JST）CRDS

発表年月： 2020年01月
人工知能を基盤とする日常生活支援ロボットの研究開発

尾形哲也 [招待有り]

NEDO AI＆ROBOT NEXT シンポジウム

発表年月： 2020年01月
深層学習のロボット動作学習への応用と研究事例

尾形哲也 [招待有り]

serBOTinQ & T-AI-Comジョイントセミナー

発表年月： 2019年12月
ロボットによる生産革新のための深層学習・予測学習を規範とした行動学習の実践的手法

尾形哲也 [招待有り]

2019国際ロボット展併催セミナー

発表年月： 2019年12月
未来からのクエスチョン「AIテクノロジーへの期待と課題を語ろう：先端研究・ニュービジネス・人材育成」

尾形哲也 [招待有り]

QWSアカデミア

発表年月： 2019年11月
Deep Learning for Robot Motion Generation – Dynamic Goal Inference by Gradient Descent

Tetsuya Ogata [招待有り]

Workshop on Learning Representations for Planning and Control, The 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)

発表年月： 2019年11月
ディープラーニングによるロボット研究の新しい展開

尾形哲也 [招待有り]

公開講座／ソフトサイエンスシリーズ第41回「AIとロボット」

発表年月： 2019年10月
深層学習と運動感覚学習ー認知発達ロボティクスの視点からー

尾形哲也 [招待有り]

第4回京都こころ会議シンポジウム「こころとArtificial Mind」

発表年月： 2019年10月
深層学習によるロボットの感覚運動学習と応用事例

尾形哲也 [招待有り]

脳型情報処理機械論，東京大学大学院情報理工学系研究科

発表年月： 2019年10月
生産革新のためのDeep Learningの使い方と知能システムの行動学習の取り組み方～Deep Learningの手法からマルチモーダル、深層強化学習、動作生成モデルまで～

尾形哲也 [招待有り]

日刊工業新聞社セミナー

発表年月： 2019年09月
Deep Learning in Robots from the perspective of Cognitive Developmental Robotics

Tetsuya Ogata [招待有り]

International Symposium on Machine Intelligence for Future Society 2019

発表年月： 2019年09月
深層学習モデルによるロボットの感覚運動学習

尾形哲也 [招待有り]

ヒューマノイド・ロボティクス2019夏の学校

発表年月： 2019年09月
Deep Predictive Learning for Robot System and Application Examples

Tetsuya Ogata [招待有り]

The 4th World Congress of Robotics (WCR-2019)

発表年月： 2019年09月
“Algorithms: Define AI for the Future”——International Frontier Algorithms Summit

Tetsuya Ogata [招待有り]

World Artificial Intelligence Conference 2019

発表年月： 2019年08月
ディープラーニングの基礎と応用事例－基本からロボットシステムへの展開まで－

尾形哲也 [招待有り]

情報機構セミナー

発表年月： 2019年08月
Language Grounding in Robot Behavior by Deep Learning

Tetsuya Ogata [招待有り]

3rd Workshop on Language Learning at the 9th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics (ICDL-EPIROB 2019)

発表年月： 2019年08月
深層学習と運動感覚学習─認知発達ロボティクスの視点から─

尾形哲也 [招待有り]

京都大学こころの未来研究センター第5回こころ研究会

発表年月： 2019年08月
深層予測学習を用いた感覚運動系の獲得と共同研究事例

尾形哲也 [招待有り]

精密工学会画像応用技術専門委員会定例研究会

発表年月： 2019年07月
深層学習によるロボットの運動-言語の学習とインタラクション

尾形哲也 [招待有り]

情報学展望２，京都大学OPENCOURSEWARE

発表年月： 2019年07月
深層学習と認知ロボティクス

尾形哲也 [招待有り]

早稲田地球再生塾シンポジウム2019「脳科学と感性科学の融合」

発表年月： 2019年07月
深層学習によるロボット動作の知能化と応用例

尾形哲也 [招待有り]

第 152 回微小光学研究会「AI で拡げる微小光学」

発表年月： 2019年06月
深層学習を用いた生活支援ロボットの模倣学習

尾形哲也 [招待有り]

人工知能学会全国大会NEDO企画セッション

発表年月： 2019年06月
深層学習の概要とロボット応用事例

尾形哲也 [招待有り]

日本フルードパワーシステム学術講演会

発表年月： 2019年05月
Predictive learning with deep neural network for robot systems

Tetsuya Ogata [招待有り]

IoT Enabling Sensing/Network/AI and Photonics Conference 2019 (IoT-SNAP2019)

発表年月： 2019年04月
AI（深層学習）のロボット応用における企業との連携事例

尾形哲也 [招待有り]

群馬県立群馬産業技術センターリニューアル記念事業ＡＩ技術講演会

発表年月： 2019年04月
Deep leaning Models for Applications in Robotics

Tetsuya Ogata [招待有り]

JST-CREST / IEEE-RAS Spring School on “Social and Artificial Intelligence for User-Friendly Robots” (SoAIR 2019)

発表年月： 2019年03月
Recent and Future AI Engineering and Its Applications to Robotics

Tetsuya Ogata [招待有り]

The 1st Workshop on Active Matter for Soft Robotics

発表年月： 2019年03月
深層学習によるロボットシステム自動化に向けて

尾形哲也 [招待有り]

研磨の基礎科学とイノベーション化専門委員会研究会

発表年月： 2019年03月
予測不確実性に基づく認知と行動変化ーニューロロボティクスの視点から

尾形哲也 [招待有り]

「深層学習の先にあるもの – 記号推論との融合を目指して(2)」公開シンポジウム

発表年月： 2019年03月
深層学習（Deep Learning）によるマルチモーダル学習とロボットの行動学習

尾形哲也 [招待有り]

トリケップスセミナー

発表年月： 2019年02月
ディープラーニングのロボット応用のアプローチと応用事例

尾形哲也 [招待有り]

日本監査役協会講演会

発表年月： 2019年01月
AI（深層学習）とロボットの統合研究と今後の展望，ＡＩ・ロボットで描く未来

尾形哲也 [招待有り]

名城大学平成30年度先端科学セミナー

発表年月： 2018年12月
Workshop at EAAIC 2018

尾形哲也 [招待有り]

Engineering Applications of Artificial Intelligence Conference (マレーシア) UTM

発表年月： 2018年12月
ディープラーニングによる新しい実世界システムに向けて－ロボット研究を事例として

尾形哲也 [招待有り]

スイス・リーアニュアルフォーラム 2018 スイス再保険会社

発表年月： 2018年11月
ロボットへの深層学習利用による福祉応用の可能性

尾形哲也 [招待有り]

やまぐち介護・福祉機器研究会介護・福祉機器開発セミナー (山口県) やまぐち介護・福祉機器研究会

発表年月： 2018年11月
深層予測学習の活用による多様なロボット動作学習

尾形哲也 [招待有り]

平成30年度第３回電子デバイス事業化フォーラム (福山) 中国地域創造研究センター

発表年月： 2018年11月
Invited Talk at Workshop of IEEE-RAS International Conference on Humanoid Robots 2018

尾形哲也 [招待有り]

IEEE-RAS International Conference on Humanoid Robots 2018 IEEE-RAS

発表年月： 2018年11月
深層予測学習を利用したロボットの適応的動作生成－様々なロボットへの応用－

尾形哲也 [招待有り]

『TSC Foresight』セミナー NEDO

発表年月： 2018年10月
ディープラーニングがロボットを変える，その先にある未来

尾形哲也 [招待有り]

NTTテクノクロスフェア2018 NTT

発表年月： 2018年10月
Neural models for linguistic and behavioral integration learning in robots

尾形哲也 [招待有り]

The 28th Annual Conference of the Japanese Neural Network Society (JNNS2018) The Japanese Neural Network Society (JNNS2018)

発表年月： 2018年10月
Deep Neural Models for Robot Systems based on Predictive Learning

尾形哲也 [招待有り]

The 28th Annual Conference of the Japanese Neural Network Society (JNNS2018) The Japanese Neural Network Society

発表年月： 2018年10月
深層学習・予測学習を規範としたロボット行動学習

尾形哲也 [招待有り]

Japan Robot Week2018併催セミナー (東京) 日刊工業新聞社

発表年月： 2018年10月
AI Robot Technology for the Ageing Society

尾形哲也 [招待有り]

MIRAI Seminar 2018 Workshop (東京) Waseda University

発表年月： 2018年10月
Recurrent Neural Models for Translation between Robot Actions and Language,

尾形哲也 [招待有り]

Workshop at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018) (マドリッド) IEEE/RSJ

発表年月： 2018年10月
ディープニューラルネットの力学系構造設計による複数動作の統合

尾形哲也 [招待有り]

NVIDIA GTC Japan 2018 (東京) NVIDIA

発表年月： 2018年09月
AI・ロボット研究における領域融合～マニピューレーションとコミュニケーション～

[招待有り]

SICE2018併設イベント計測自動制御学会

発表年月： 2018年09月
ディープラーニングはロボット動作をどう変えるか

尾形哲也 [招待有り]

IGPI Tech Day 2018 IGPI

発表年月： 2018年09月
ディープラーニングの基礎とロボットの環境認識・行動学習への応用

尾形哲也 [招待有り]

サイエンス&テクノロジーセミナーサイエンス&テクノロジー

発表年月： 2018年08月
深層学習による実ロボットの効率的な動作教示

尾形哲也 [招待有り]

とやまロボット技術研究会富山県新世紀産業機構

発表年月： 2018年08月
深層学習によるロボット動作の予測・模倣学習

尾形哲也 [招待有り]

CEDEC2018 (横浜)

発表年月： 2018年08月
ディープラーニングの基礎と応用事例－実ロボット行動学習への応用へ－

尾形哲也 [招待有り]

情報機構セミナー (東京) 情報機構

発表年月： 2018年08月
深層学習の概説と、”Deep Cognitive Systems”による認知ロボティクスについて

尾形哲也 [招待有り]

丸の内AI倶楽部 (東京) 丸の内AI倶楽部

発表年月： 2018年07月
ディープラーニング（Deep Learning）とロボットの行動系列学習～概要と今後の展望

尾形哲也 [招待有り]

トリケップスセミナー (東京) オーム社

発表年月： 2018年07月
Dynamical Integration of Language and Robot Actions by Deep Learning

尾形哲也 [招待有り]

Second International Workshop on Symbolic-Neural Learning (SNL-2018) (名古屋)

発表年月： 2018年07月
長期インタラクション創発を可能とする知能化空間の設計論

尾形哲也 [招待有り]

情報処理学会創立50周年記念全国大会，JSTさきがけセッション

発表年月： 2010年03月
Model of Tool-Body Assimilation based on Neuro-Dynamical System

Tetsuya Ogata [招待有り]

JSPS-DFG Round Table on ‘Cooperative Technology in future: Cognitive Technical Systems’

発表年月： 2010年02月
Informatics for Interaction Emergence by Synthetic Approach with Neuro-Dynamical System

Tetsuya Ogata [招待有り]

Workshop on “Synergistic Intelligence: approach to human intelligence through understanding and design of cognitive development”, IEEE/RSJ IROS

発表年月： 2009年10月
声道物理モデルの母音列繰り返し模倣による音素獲得シミュレーション

尾形哲也 [招待有り]

日本ロボット学会第27回学術講演会

発表年月： 2009年09月
Dynamics of Human-Robot Interaction with Robot Audition

Tetsuya Ogata [招待有り]

The First Tsinghua-Kyoto Symposium on Intelligent Technologies and Information Management for Knowledge Society

発表年月： 2009年08月
A Synthetic Approach to Dynamical Systems with Symbol Processing in Multimodal Interaction

Tetsuya Ogata [招待有り]

CoTeSys Spring Workshop 2009

発表年月： 2009年04月
私が考える構成論的な知の理解

尾形哲也

第14回ロボティクスシンポジア特別企画『プレシンポジア：提言日本のロボティクス研究について』

発表年月： 2009年03月
ロボットの順逆モデル転用による他者視点獲得と動作模倣

尾形哲也 [招待有り]

学術創成研究「記号過程を内包した動的適応システムの設計論」&日本ロボット学会ロボティック・サイエンス研究会合同研究会

発表年月： 2009年01月
ロボットの自己モデル再利用による異種モダリティ変換と模倣

尾形哲也 [招待有り]

統計数理研究所共同研究集会-動的システムの情報論(8)「ロボットと生命における境界」

発表年月： 2008年12月
空間からの行為情報抽出への力学系アプローチ

尾形哲也 [招待有り]

計測自動制御学会相互作用と賢さ部会研究会

発表年月： 2008年10月
話題提供

尾形哲也 [招待有り]

自己・他者認知の統合的メカニズム：脳・発達・ロボティクスからの挑戦，日本心理学会第72回大会

発表年月： 2008年09月
動的環境認知と記号創発への構成論的アプローチ

尾形哲也 [招待有り]

学術創成研究費「記号過程を内包した動的適応システムの設計論」平成19年度研究成果発表会

発表年月： 2008年03月
インタラクション創発システム情報学の提案

尾形哲也 [招待有り]

第8回システムインテグレーション部門講演会 (SI2007)

発表年月： 2007年12月
脳型ロボットシステムによる動的環境知覚と模倣インタラクション

尾形哲也 [招待有り]

日本知能情報ファジィ学会関東支部第58回ファジィフロント

発表年月： 2007年09月
Human Robot Interaction from the Dynamical Systems Perspective

Tetsuya Ogata [招待有り]

JSPS Japanese-German Colloquium on Robotics

発表年月： 2006年11月
公開シンポジウム「行為の認識と生成」

尾形哲也 [招待有り]

電子情報通信学会ヒューマンコミュニケーショングループシンポジウム

発表年月： 2006年03月
人間とロボットの相互適応と原始シンボルによるインタラクション

尾形哲也 [招待有り]

日本ロボット学会ロボット工学セミナー「ヒューマノイドインタラクションテクノロジー」

発表年月： 2006年03月
人間とロボットの飽きないコミュニケーションに向けて

尾形哲也 [招待有り]

情報処理学会関西支部大会,

発表年月： 2005年10月
模倣・相互適応と人間-ロボットインタラクション

尾形哲也 [招待有り]

京都大学大学院情報学研究科第7回情報学シンポジウム「予測とシステム同定」

発表年月： 2004年12月
Generation of the Joint Attention in the Human-Robot Interaction and Internal Representation

Tetsuya Ogata [招待有り]

Workshop on “Sensory-motor Co-ordination in Human-Robot Interaction”, IEEE/RSJ IROS

発表年月： 2002年10月

▼全件表示

共同研究・競争的資金等の研究課題

一人に一台一生寄り添うスマートロボットAIREC

内閣府ムーンショット

研究期間:

2020年10月

-

2025年03月

菅野重樹

　概要を見る

柔軟な機械ハードウェアと多様な仕事を学習できる独自のAIとを組み合わせたロボット進化技術を確立します。それにより2050年には、家事、接客はもとより、人材不足が迫る福祉、医療などの現場で、人と一緒に活動できる汎用型AIロボットの実現により、人・ロボット共生社会を実現します。
実世界に埋め込まれる人間中心の人工知能技術の研究開発

NEDO 人と共に進化する次世代人工知能に関する技術開発事業

研究期間:

2020年04月

-

2022年03月
Dense 3-axis tactile sensing and AI to implement human-like manual skills in robots

日本学術振興会科学研究費助成事業基盤研究(B)

研究期間:

2019年04月

-

2022年03月

シュミッツアレクサンダー, 尾形哲也, 玉城絵美, Somlor Sophon

　概要を見る

In this research we develop a smart sensing system that enables robot hands to achieve human-like manipulation skills. Key components are 1. dense 3-axis tactile sensors for robot hands and 2. learning algorithms exploiting massive 3-axis tactile data for intelligent force control.
 
We integrated the tactile skin sensors in grippers and robot hands. Using a novel joint (with a remote center of motion mechanism) we could achieve full coverage of the palmar side of the fingers with sensors in one gripper. Furthermore, we instrumented human hands with the sensors, to enable skill transfer from human to robot hands in the future. We used the skin sensors integrated in the robot hands for various machine learning experiments. In particular, we used deep convolutional neural networks for tactile object recognition as well as for in-hand manipulation.
記号創発ロボティクスによる人間機械コラボレーション基盤創成

科学技術振興機構 CREST

研究期間:

2015年10月

-

2021年03月
日常生活支援ロボット

NEDO 次世代人工知能・ロボット中核技術開発

研究期間:

2017年04月

-

2020年03月
深層学習を用いたロボットの動作プリミティブの獲得と行動生成

日本学術振興会科学研究費助成事業基盤研究(A)

研究期間:

2015年04月

-

2020年03月

尾形哲也, 有江浩明

　概要を見る

近年，深層学習が多様な領域で利用されているが，その適用範囲は電子化されたデータ処理に特化されており，実世界での作業性が十分に得られていない．一方，ロボットを用いた生活支援が強く期待されている．近年は，汎用型のロボットOSを利用した多機能型汎用ロボットの可能性が着目されている．そこで本研究では，ロボットOSと深層学習を用いたロボット動作学習を実現し，ロボットの知能化と開発コスト削減を行った．
具体的には，乳幼児の発達学習に関連した認知発達ロボティクス研究の成果を活用し，模倣学習，予測符号化などの概念を元に，実ロボットのから得られる感覚運動情報（経験）を深層学習によりモデル化する手法を提案した．
構成論的発達科学－胎児からの発達原理の解明に基づく発達障害のシステム的理解－

日本学術振興会科学研究費助成事業

研究期間:

2012年06月

-

2017年03月

國吉康夫, 明和政子, 熊谷晋一郎, 長井志江, 小西行郎, 乾敏郎, 浅田稔, 板倉昭二, 尾形哲也, 船曳康子

　概要を見る

本領域では，ロボティクス，医学，心理学，神経科学，発達障害当事者研究等が連携し，胎児からの連続的発達過程を観測，モデル化，シミュレーション，解釈・検証し，発達原理の解明と発達障害の新たな理解を得ることを目指した．
総括班では，領域内融合と成果共有のための全体会議，国際シンポジウムや一般向講演会他のアウトリーチ活動，Web情報開示，若手研究者育成，外部評価等を多数実施した．この結果，領域内の分野を超えた融合と共同研究が活発に行われ，その中で若手研究者も活躍し成長した．当初計画に沿った成果に加え，想定範囲外の新展開と成果も得られた．そして，本新学術領域が活力ある形で確立され，発展を続けている．
構成論的発達科学－胎児からの発達原理の解明に基づく発達障害のシステム的理解－

日本学術振興会科学研究費助成事業新学術領域研究(研究領域提案型)

研究期間:

2012年06月

-

2017年03月

國吉康夫
社会的認知発達モデルとそれに基づく発達障害者支援システム構成論

日本学術振興会科学研究費助成事業新学術領域研究(研究領域提案型)

研究期間:

2012年06月

-

2017年03月

長井志江, 田中文英, 尾形哲也, 吉川雄一郎, 西出俊, 浅田稔

　概要を見る

社会的認知発達の基盤として「感覚・運動信号の予測学習」に基づく理論を提案し，計算論的視点から認知発達の構成的理解と発達障害者支援システムの開発に取り組んだ．乳幼児－養育者相互作用の解析では，乳幼児の身体的・社会的随伴性の発達と，それを促進する養育者の行動変容を明らかにした．計算論的発達モデルの研究では，多様な認知機能が予測学習に基づき獲得されること，モデルのパラメータ変動が発達障害を生み出すことを明らかにした．発達障害者の理解・支援システムの開発では，社会性の問題を引き起こす感覚・運動の非定型性を定量化・再現するシステムを実現した．これらの成果は認知発達原理の解明に大きく貢献している．
長期インタラクション創発を可能とする知能化空間の設計論

科学技術振興機構さきがけ

研究期間:

2009年10月

-

2015年03月
ロボットの能動知覚に基づく物体挙動予測モデルと道具使用

日本学術振興会科学研究費助成事業基盤研究(B)

研究期間:

2009年

-

2012年

尾形哲也, 西出俊

　概要を見る

本研究課題では,ロボットによる多様な一般物体の認識と道具使用を目的として,身体を用いた能動的知覚行為による物体挙動予測のモデルを実ロボットに具現化した.具体的には再帰結合型神経回路モデルと人間型ロボットを利用した,(1) 物体挙動に伴う音響信号の予測と分類,(2) 予測信頼性に基づく身体と物体の選択学習,さらに(3) 道具身体化モデルを構築した.本研究の成果は,工学的応用のみならず認知科学等の学術的な研究への展開も期待できる.
音環境理解に基づくロボット聴覚の構築

日本学術振興会科学研究費助成事業基盤研究(S)

研究期間:

2007年

-

2011年

奥乃博, 尾形哲也, 駒谷和範, 高橋徹, 白松俊, 中臺一博, 北原鉄朗, 糸山克寿, 浅野太, 浅野太

　概要を見る

音環境理解の主要機能である混合音の音源定位・分離・認識技術を開発し,ロボット聴覚ソフトウエアHARKとして公開し,国内外で複数の講習会を実施した. HARKを応用し,複数話者同時発話を認識する聖徳太子ロボット,ユーザの割込発話を許容する対話処理などを開発し,その有効性を実証した.さらに,多重奏音楽演奏から書くパート演奏を聞き分ける技術,実時間楽譜追跡機能を開発し,人と共演をする音楽ロボットなどに応用した。
記号過程を内包した動的適応システムの設計論

日本学術振興会科学研究費助成事業学術創成研究費

研究期間:

2007年

-

2011年

椹木哲夫, 土屋和雄, 門内輝行, 冨田直秀, 横小路泰義, 尾形哲也, 青柳富誌生, 水山元, 中西弘明, 堀口由貴男, 青井伸也, 谷口忠大

　概要を見る

複雑なシステムの中におかれたヒトや生体は,自らを取り巻くところの環境や社会を能動的に意味づけ,価値づけ,自らの棲む世界として秩序化していくことができる.本研究課題では,このような自律的主体の「多様性の生成と選択」の機構を「記号過程」に求め,記号の生成・利用のダイナミズムの観点から,生体細胞から環境適応機械(ロボット),社会組織に亘る様々なレベルにおける適応システムの同型性を見いだし,個々のシステム要素が外部・内部の物理的環境との相互作用を介して機能が形成される一般的過程について追究した. 5カ年の成果により,目的をもって生きる存在としての自律的な主体(人,ロボット,細胞)が, 他者主体を含む環境との相互作用を通して,意味の世界を創出して伝達する仕組み(記号過程)を解明し,システムが人を育て,人がシステムを育てる相互主導性を担保できるシステムの設計論を確立することができた。
オープンエンドな人間とロボットの協調における音声インタラクション創発に関する研究

日本学術振興会科学研究費助成事業若手研究(A)

研究期間:

2005年

-

2007年

尾形哲也

　概要を見る

本課題では,ロボットと人間との相互協調系について"力学系の視点からの解析"と"人間とのインタラクション実験"によって,その特徴を明らかにすることを目的としている.
人間同士の相互協調を考える際,互いの行為モデルを獲得することは不可欠である.しかし人間は複雑な行為モデルを有するため,その完全なモデルを獲得する事は不可能であり,現実には可能な限り早く"有効な近似モデル"を獲得することが重要となる.平成19年度はこれまでの研究をベースに,「他者」モデル獲得をテーマとして研究を展開した.具体的に本研究では「幼児が外部環境や協調者(親)を理解する際,自己身体の順逆モデルを変換し,再利用することで学習コストを軽減している」という仮説を立て,ロボットを幼児に見立てた物体操作模倣(Emulation)発達過程モデルを提案した.
提案モデルは以下の5つの学習段階を持つ.(step 1)ロボットによる自己身体・物体操作のバブリングと再起結合神経回路モデルを用いた順逆モデル獲得,(step 2)協調者とロボットにおける同一物体操作によるインタラクション(三項関係),(step 3)step 2における協調者によるロボットの模倣,(step 4)混合エキスパート神経回路モデルによる入出力(視野)変換モデルの自己組織化,(step 5)ロボットによる協調者行為予測のための変換モデル選択と模倣.
実験の結果,ロボットは協調者予測のための変換モデルを少数回インタラクションで獲得し,これを用い他者(協調者)の視点での物体操作模倣が可能であることが分かった.さらにインタラクション中に未出現の物体操作の模倣,物体操作中の静止画像からの全体動作連想などが可能であることを確認した.これらを模倣に関する知見(Piaget,1996やRizzolatti,1998など)と比較し,その妥当性を議論した.今後はより高次の模倣(MimicやImitation)への拡張と人間とロボットの相互適応系への応用に取り組んでいく.
環境音響を利用したロボットの動作生成

日本学術振興会科学研究費助成事業萌芽研究

研究期間:

2005年

-

2006年

尾形哲也

　概要を見る

本研究では"異種の感覚モダリティのマッピング"を工学的手法により実現することを目的とする.特に視覚と聴覚の変換に着目し,音響信号を表現するロボットモーションや音響モデルを再生する声道モデルダイナミクスを生成する.本技術は,多自由度ロボットのダンスパターンの自動生成や,環境音の擬音語化などに応用可能と期待される.さらに"共感覚"などの心理学の知見との関連に関する議論も期待できる.
我々は平成17年度に小型ロボットKeepon(NICT開発)を用いた動作と音響信号の変換モデルを提案した.しかしその音響信号生成は白色信号をベースとした単純なモデルを用いていたため,きわめて多様性に乏しいものであった.そこで18年度は多様な音響信号生成,さらに幼児の発音過程のモデル化を目的として,平成17年度提案したモダリティ変換モデルに,声道モデル(Maedaモデル,1990)を適用した音声模倣モデルを構築した.具体的には声道モデルによる音声バブリングを人工神経回路で学習し,その体験をもとに入力音声を再生するモデルである.
人間の実発話(母音)模倣をターゲットとして提案モデルの特性分析実験を行った.その結果,2母音の発話入力がバブリング済である場合,人間が弁別可能な高い品質で音声模倣を行えることを確認した.3母音の音声模倣では,提案モデルの汎化能力によりバブリングしていない音声パターンにおける模倣も実現した.また音響信号のみを学習した神経回路モデルと,音響信号と声道動作を学習した神経回路モデルとを比較し「音声知覚の運動理論」を支持する結果が得られた.
人間とロボットの相互学習系におけるインタラクションの創発・発達に関する研究

日本学術振興会科学研究費助成事業特定領域研究

研究期間:

2004年

-

2005年

尾形哲也

　概要を見る

実ロボットシステムにおいて従来のパターン認識技術(状況分類)は、その行動生成に直接役に立たない。本研究では認識を"対象の挙動を予測する行為"として捉える立場をとり、これに関する実験研究を行った。具体的には、感覚運動時系列情報(ダイナミクス)に基づく分節化表現(プリミティブ)に着目し、ロボットが実世界で獲得するプリミティブ表現の分析を行った。
学習・認識・生成を同時に行う手法として、パラメータ付再帰結合神経回路モデル(RNNPB)を利用した。RNNPBは複数のプリミティブを自己組織的に獲得し、その切り替えをパラメータにより制御する予測学習器である。
(1)ロボットによるダイナミクス規範能動知覚:
RNNPBによる物体能動知覚機能をロボットRobovie-IIsに導入した。対象への接触時に生じる音、動き、圧力、モータ関節角度のダイナミクスをRNNPBのパラメータ空間に表現させた。20種類以上の物体をパラメータ空間でクラスタリングが可能であること、さらに未知物体のクラスタリングも可能であることを確認した。
(2)ロボットによる擬似シンボル生成とその構造・解釈:
ロボットと人間による卓上物体の協調移動を対象に、RNNPBが獲得するプリミティブの解析を行った。モータ関節角度と物体位置情報を観測させ、RNNPBのパラメータを閾値処理することにより擬似シンボルを生成した。実験の結果、パラメータ数に応じた擬似シンボルを生成可能であることが確認された。
提案した擬似シンボルは"ロボットが自ら生成・利用できる"点で有効であるが、タスクが時間的に変化していく場合は対応しきれない。今後追加学習&汎化を可能とする枠組みを考慮する必要がある。また現時点の擬似シンボルは感覚運動ダイナミクスのアトラクターの境界条件にすぎない。今後、文法・論理構造など自然言語との関係をより詳細に検討していく必要がある。
多自由度ロボットの身体構造に基づく行動生成法に関する研究

日本学術振興会科学研究費助成事業奨励研究(A)

研究期間:

2001年

　

　

尾形哲也
人間ロボット間の情緒的コミュニケーションに関する研究

日本学術振興会科学研究費助成事業萌芽的研究

研究期間:

1999年

-

2001年

菅野重樹, 尾形哲也

　概要を見る

本研究は,ロボットと人間との円滑なコミュニケーションの実現を目指し,ロボットの行動モデルの構成法,および人間とロボットの「情緒的コミュニケーション(人間の感情移入を伴うコミュニケーション)」に関する,実用的知見を得ることを目的としている.昨年度までに,内分泌系を参考とした感情モデルを持つ自律ロボットWAMOEBA-2Ri(ワメーバ,Waseda Artificial Mind On Emotion Base)を開発してきた.
本年度は更なるコミュニケーション機能向上を目指して,視覚機能およびアームシステムの機能向上を行い,物体の自律的な把持動作・操り動作を実現した.さらにコミュニケーションにおけるシンボル(言語)の扱いについて議論を行うためにセンサ-モータ協調の原理(Sensory-Motor Coordination)について考察し,新たに階層型のコホネン型SOM(Self-Organizing Map)を構築,WAMOEBA-2Riへの導入を行った.
本モデルは,下位に設置した複数のSOMが視聴覚などのセンサ情報,モータコマンドなどの運動情報を,それぞれ独立に分節化し,さらにそこで獲得した中間情報を上位SOMが一つのパターンとして分類(統合化)する.その結果,対象は感覚-運動のコンテキストとしてモデル化される.
WAMOEBA-2Riのシミュレーションを用いて,把持動作中の各種情報(物体の色彩,手先幅-把持圧力の関係,アームにかかるトルクなど)にノイズを加え,そのとき階層型SOMが獲得する内部表現について分析を行った.その結果,次元の異なる情報間の干渉を防ぐことにより,対象の多様なモデル化(表現)が可能となることが確認された.
自己保存に基づくロボットの行動発現に関する研究

日本学術振興会科学研究費助成事業特別研究員奨励費

研究期間:

1998年

　

　

尾形哲也

▼全件表示

Misc

身体バブリングに基づく事前学習モデルを用いたマニピュレーション動作の継続学習

山北夏聖, 山北夏聖, 中條亨一, 中條亨一, 加瀬敬唯, 堂前幸康, 尾形哲也, 尾形哲也

日本ロボット学会学術講演会予稿集(CD-ROM) 42nd 2024年

J-GLOBAL
治具操作における知識情報を活用した人の行動予測とロボットの協調動作生成

玉木萌心, 玉木萌心, 中條亨一, 中條亨一, 山野辺夏樹, 堂前幸康, 尾形哲也, 尾形哲也

日本ロボット学会学術講演会予稿集(CD-ROM) 42nd 2024年

J-GLOBAL
繰り返し作業における作業テンポの指示と作業負荷・生産性の関係調査

白倉尚貴, 山野辺夏樹, 丸山翼, 堂前幸康, 尾形哲也, 尾形哲也

計測自動制御学会システムインテグレーション部門講演会(CD-ROM) 24th 2023年

J-GLOBAL
GANsの潜在空間を用いた多自由度ロボット最適軌道計画の高速化のための最適化アルゴリズムの比較

飯野寛人, 飯野寛人, 千葉直也, 森裕紀, 森裕紀, 尾形哲也, 尾形哲也

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2023 2023年

J-GLOBAL
混合エキスパートによる複数先視野予測モジュールを用いたEmbodied Question Answeringタスクの学習

上和野雄也, 鈴木彼方, 鈴木彼方, 千葉直也, 千葉直也, 森裕紀, 尾形哲也, 尾形哲也

日本機械学会ロボティクス・メカトロニクス講演会講演論文集(CD-ROM) 2023 2023年

J-GLOBAL
深層学習によるロボット知能化のためのモーションキャプチャ教示システム

大山, 知理, 鈴木, 彼方, 陽, 品駒, 尾形, 哲也

第79回全国大会講演論文集 2017 ( 1 ) 147 - 148 2017年03月

　概要を見る

本研究では，人間型ロボットの動作学習を目的として構築した遠隔操作による動作教示システムの概要を報告する．従来，ロボットの動作教示には手先トルクセンサーを用いたダイレクトティーチングが一般的であるが，この方式では画像入力と連携した複雑な動作を教示することは容易ではない．そこでHMDとモーションキャプチャにより人間がロボットを直感的に遠隔操作することで複雑な動作を効率的に試行させ，その際の画像データと関節データを深層学習モデルの教示データとして利用する．提案システムをNextageに導入した結果，双腕協調による物体マニピュレーションタスクの学習が可能となることが確認された．

CiNii
Wizard of Ozと深層学習によるロボットの柔軟物折り畳み作業

陽品駒, 佐々木一磨, 鈴木彼方, 加瀬敬唯, 高橋城志, 高橋城志, 菅野重樹, 尾形哲也, 尾形哲也

日本ロボット学会学術講演会予稿集(CD-ROM) 34th 2016年

J-GLOBAL
RTCによる深層学習モデルと柔軟関節ロボットの統合~道具身体化モデルの学習データ収集と動作実現~

陽品駒, 高橋城志, 高橋城志, 尾形哲也, 菅祐樹, 菅野重樹

日本ロボット学会学術講演会予稿集(CD-ROM) 33rd 2015年

J-GLOBAL

▼全件表示

産業財産権

情報処理装置、情報処理方法、及びプログラム

特許第6955733号

浅谷学嗣, 桐谷太郎, 宮崎祐太, 尾形哲也

特許権

J-GLOBAL
制御装置、システム、学習装置および制御方法

森裕紀, 鳥島亮太, 尾形哲也, 高橋城志, 岡野原大輔

特許権

J-GLOBAL
情報処理システムおよび情報処理方法、並びにプログラム

4472506

菅野重樹, 金天海, 尾形哲也

特許権

J-GLOBAL

現在担当している科目

Intermedia Art and Science Laboratory A

基幹理工学部

2025年秋学期
理工学基礎実験２Ａ　表現

基幹理工学部

2025年春学期
Intermedia Art and Science Laboratory B

基幹理工学部

2025年春学期
卒業論文・制作Ａ

基幹理工学部

2025年春学期
プロジェクト学習２　18前再　【前年度成績S評価者用】

基幹理工学部

2025年通年
卒業論文・制作Ｂ　（春学期）

基幹理工学部

2025年春学期
プロジェクト学習Ａ　【前年度成績S評価者用】

基幹理工学部

2025年春学期
キャリアデザインＢ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
プロジェクト学習Ａ

基幹理工学部

2025年春学期
キャリアデザインＡ

基幹理工学部

2025年春学期
ロボティクス表現デザイン　　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
ロボティクス表現デザイン

基幹理工学部

2025年秋学期
キャリアデザインＢ　18前再

基幹理工学部

2025年秋学期
キャリアデザインＡ　18前再　【前年度成績S評価者用】

基幹理工学部

2025年春学期
キャリアデザインＡ　18前再

基幹理工学部

2025年春学期
キャリアデザインＢ　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
キャリアデザインＢ

基幹理工学部

2025年秋学期
キャリアデザインＡ　【前年度成績S評価者用】

基幹理工学部

2025年春学期
表現工学基礎（科学）　　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
表現工学基礎（科学）

基幹理工学部

2025年秋学期
動的知能表現システム基礎

基幹理工学部

2025年春学期
プロジェクト学習２　18前再

基幹理工学部

2025年通年
インタラクティブ・センシング

基幹理工学部

2025年春学期
プロジェクト学習Ｂ　【前年度成績S評価者用】

基幹理工学部

2025年秋学期
プロジェクト学習Ｂ

基幹理工学部

2025年秋学期
卒業論文・制作Ａ　（秋学期）

基幹理工学部

2025年秋学期
卒業論文・制作Ｂ

基幹理工学部

2025年秋学期
ソニーグループ寄附講座　クリエイティブエンタテインメント学I　―概念化―

基幹理工学部

2025年秋クォーター
Graduation Thesis B (Fall)

基幹理工学部

2025年秋学期
Graduation Thesis B (Fall) [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis B (Spring)

基幹理工学部

2025年春学期
Graduation Thesis A (Fall)

基幹理工学部

2025年秋学期
Graduation Thesis A (Fall) [S Grade]

基幹理工学部

2025年秋学期
Graduation Thesis A (Spring) [S Grade]

基幹理工学部

2025年春学期
Graduation Thesis A (Spring)

基幹理工学部

2025年春学期
Graduation Thesis B (Spring) [S Grade]

基幹理工学部

2025年春学期
実体情報学演習Ｄ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｃ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｂ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ａ

大学院基幹理工学研究科

2025年春学期
信頼性と設計

大学院基幹理工学研究科

2025年秋学期
実体情報学概論

大学院基幹理工学研究科

2025年春学期
Master's Thesis (Department of Intermedia Studies)

大学院基幹理工学研究科

2025年通年
修士論文（表現）

大学院基幹理工学研究科

2025年通年
実体情報学特別演習

大学院基幹理工学研究科

2025年通年
実体情報学特別演習　23前再

大学院基幹理工学研究科

2025年通年
実体情報学演習Ｊ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｉ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｈ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｇ

大学院基幹理工学研究科

2025年春学期
実体情報学演習Ｆ

大学院基幹理工学研究科

2025年秋学期
実体情報学演習Ｅ

大学院基幹理工学研究科

2025年春学期
動的知能表現システム演習Ｄ

大学院基幹理工学研究科

2025年秋学期
動的知能表現システム演習Ｃ

大学院基幹理工学研究科

2025年春学期
動的知能表現システム演習Ｂ

大学院基幹理工学研究科

2025年秋学期
動的知能表現システム演習Ａ

大学院基幹理工学研究科

2025年春学期
Seminar on Intelligence Dynamics and Representaｔion System D

大学院基幹理工学研究科

2025年秋学期
Seminar on Intelligence Dynamics and Representaｔion System C

大学院基幹理工学研究科

2025年春学期
Seminar on Intelligence Dynamics and Representaｔion System B

大学院基幹理工学研究科

2025年秋学期
Seminar on Intelligence Dynamics and Representaｔion System A

大学院基幹理工学研究科

2025年春学期
Research on Intelligence Dynamics and Representation Systems

大学院基幹理工学研究科

2025年通年
Intelligence Dynamics and Representation System, Advanced

大学院基幹理工学研究科

2025年春学期
動的知能表現システム特論

大学院基幹理工学研究科

2025年春学期
動的知能表現システム研究

大学院基幹理工学研究科

2025年通年
実体情報学特別演習　23前再

大学院創造理工学研究科

2025年通年
実体情報学演習Ｊ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｉ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｈ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｇ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｄ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｃ

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｂ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ａ

大学院創造理工学研究科

2025年春学期
信頼性と設計

大学院創造理工学研究科

2025年秋学期
実体情報学概論

大学院創造理工学研究科

2025年春学期
実体情報学演習Ｆ

大学院創造理工学研究科

2025年秋学期
実体情報学演習Ｅ

大学院創造理工学研究科

2025年春学期
表現工学上級演習Ｂ

大学院基幹理工学研究科

2025年秋学期
動的知能表現システム研究

大学院基幹理工学研究科

2025年通年
表現工学上級演習Ａ

大学院基幹理工学研究科

2025年春学期
実体情報学特別演習

大学院創造理工学研究科

2025年通年
実体情報学演習Ｈ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｂ

大学院先進理工学研究科

2025年秋学期
信頼性と設計

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ａ

大学院先進理工学研究科

2025年春学期
実体情報学概論

大学院先進理工学研究科

2025年春学期
実体情報学特別演習

大学院先進理工学研究科

2025年通年
実体情報学演習Ｊ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｇ

大学院先進理工学研究科

2025年春学期
実体情報学演習Ｆ

大学院先進理工学研究科

2025年秋学期
実体情報学演習Ｅ

大学院先進理工学研究科

2025年春学期
実体情報学演習Ｄ

大学院先進理工学研究科

2025年秋学期
実体情報学特別演習　23前再

大学院先進理工学研究科

2025年通年
実体情報学演習Ｉ

大学院先進理工学研究科

2025年春学期
実体情報学演習Ｃ

大学院先進理工学研究科

2025年春学期

▼全件表示

社会貢献活動

Technical Advisor

Integral AI, Inc.
2021年

-

継続中
ACT-X研究「AI活用学問革新」領域アドバイザー

科学技術振興機構
2020年

-

継続中
アドバイザー

株式会社アバターイン
2020年

-

継続中
アドバイザー

IGPI テクノロジー
2018年

-

継続中
さきがけ研究「社会デザイン」領域アドバイザー

科学技術振興機構
2017年

-

継続中
理事

日本ディープラーニング協会
2017年

-

継続中
ACT-I研究「情報と未来」領域アドバイザー

科学技術振興機構
2016年

-

継続中
アドバイザー

株式会社エクサウィザーズ
2016年

-

継続中

▼全件表示

学術貢献活動

技術戦略研究センター（TSC）フェロー

審査・学術的助言

新エネルギー・産業技術総合開発機構（NEDO）

2023年

-

継続中
研究開発戦略センター第１AI・情報分野委員会委員

審査・学術的助言

科学技術振興機構

2022年

-

継続中
AdCom member (At-Large)

学会・研究会等

IEEE Robotics and Automation Society

2024年

-

2026年
科学技術・学術審議会専門委員

審査・学術的助言

文部科学省

2024年

-

2025年
国家戦略分野の若手研究者及び博士後期課程学生の育成事業（ＢＯＯＳＴ）運営委員

審査・学術的助言

科学技術振興機構

2024年

　

　
「次世代人工知能・ロボットの中核となるインテグレート技術開発」技術推進委員(2022-2023)

審査・学術的助言

新エネルギー・産業技術総合開発機構（NEDO）

2022年

-

2023年
Member of National Committee

Advanced Robotics

2004年

-

2022年
企画委員

学会・研究会等

人工知能学会

2018年

-

2020年
理事

学会・研究会等

計測自動制御学会

2018年

-

2020年
ロボットによる社会変革推進会議構成員

審査・学術的助言

内閣府/文部科学省/厚生労働省/経済産業省

2019年

-

　
Area Chair

大会・シンポジウム等

The Conference on Robot Learning (CoRL)

2019年

-

　
ロボットによる社会変革推進会議構成員

審査・学術的助言

内閣府/文部科学省/厚生労働省/経済産業省

2019年

-

　
会誌編集委員

学会・研究会等

人工知能学会

2015年

-

2019年
General Chair

大会・シンポジウム等

IEEE International Conference on Development and Learning

2018年

-

　
卓越研究員候補者選考審査委員

審査・学術的助言

文部科学省

2018年

-

　
理事

学会・研究会等

人工知能学会

2016年

-

2018年
会誌出版委員会委員

学会・研究会等

計測自動制御学会

2015年

-

2016年
理事

学会・研究会等

日本ロボット学会

2013年

-

2014年
学会賞委員会委員

審査・学術的助言

計測自動制御学会

2013年

-

2014年
ロボティック・サイエンス研究専門委員会幹事

学術調査

日本ロボット学会

2008年

-

2014年
Secretary General

大会・シンポジウム等

IEEE International Conference on Intelligent Robots and Systems

2013年

-

　
表彰委員会委員

審査・学術的助言

日本機械学会ロボティクス・メカトロニクス部門

2012年

-

2013年
代議員

学会・研究会等

人工知能学会

2011年

-

2013年
「ロボット共生社会実現に向けたロボットの知能発達」先導的研究開発委員会幹事

学術調査

日本学術振興会

2010年

-

2013年
評議員

学会・研究会等

日本ロボット学会

2007年

-

2012年
論文誌編集委員

学会・研究会等

ヒューマンインタフェース学会

2009年

-

2011年

▼全件表示

他学部・他研究科等兼任情報

附属機関・学校グローバル・エデュケーション・センター
理工学術院大学院基幹理工学研究科

学内研究所・附属機関兼任歴

2025年

-

2029年

AIロボット研究所プロジェクト研究所所長
2025年

-

2026年

データ科学センター兼任センター員
2024年

-

2026年

理工学術院総合研究所兼任研究員
2024年

-

2026年

カーボンニュートラル社会研究教育センター兼任センター員
2024年

-

2026年

リサーチ・イノベーション・センター　オープンイノベーション推進セクション兼任センター員

特定課題制度（学内資金）

階層型神経回路モデルにおける予測可能性を利用した自己身体モデルの獲得

2015年

　概要を見る

本研究では，ロボットの学習機構の軸となる機構としての，自己身体モデルについて，特に視野内の自己領域と外部物体とを区別する基礎モデルを提案し，認知モデルとの対応と理解，及びロボットシステムへの応用を目指している．我々は特に，再起結合型神経回モデル（RNN）の一種であるStochastic ContinuousTime Recurrent Neural Network(S-CTRNN)を用いた方法を提案した． S-CTRNNは時系列変化の予測のみならず，その不確実性を分散として予測することが可能なモデルである．このS-CTRNNを人間型ロボットに実装し，視野内の自己のハンドとボールとのインタラクションを観察，学習させる実験を行った．その結果分散予測によって，自己身体と外部物体の運動の区別を行える可能性が示された．