Updated 2024/07/21


FENG Qi (馮 起)
Affiliation
Faculty of Science and Engineering, Waseda Research Institute for Science and Engineering
Position
Junior Researcher (Research Institute Lecturer)
Degrees
Doctor of Engineering (September 2022, Waseda University)
Master of Engineering (September 2019, Waseda University)
Bachelor of Engineering (September 2017, Waseda University)
Homepage
Profile

I conduct research on computer graphics (CG) and computer vision (CV) using deep learning and data-generation methods. I also apply CG/CV techniques to tackle important and challenging problems in virtual reality (VR) and augmented reality (AR).

 

Papers

  • Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability

    Taichi Higasa, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    ACM International Conference Proceeding Series     292 - 296  October 2023

    Abstract

    Language learners should regularly engage in reading challenging materials as part of their study routine. Nevertheless, constantly referring to dictionaries is time-consuming and distracting. This paper presents a novel gaze-driven sentence simplification system designed to enhance reading comprehension while maintaining their focus on the content. Our system incorporates machine learning models tailored to individual learners, combining eye gaze features and linguistic features to assess sentence comprehension. When the system identifies comprehension difficulties, it provides simplified versions by replacing complex vocabulary and grammar with simpler alternatives via GPT-3.5. We conducted an experiment with 19 English learners, collecting data on their eye movements while reading English text. The results demonstrated that our system is capable of accurately estimating sentence-level comprehension. Additionally, we found that GPT-3.5 simplification improved readability in terms of traditional readability metrics and individual word difficulty, paraphrasing across different linguistic levels.

    DOI

    Scopus

    Cited by 1 (Scopus)
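
    As a rough illustration of the sentence-level comprehension estimation described above, the sketch below trains a per-learner classifier on concatenated gaze and linguistic features. It is a minimal stand-in rather than the authors' implementation; the feature names, values, and model choice are assumptions.

```python
# Hypothetical sketch: per-learner sentence-comprehension classifier combining
# eye-gaze features with linguistic features. Features and labels are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Per-sentence gaze features (e.g. total fixation time in seconds, regression count).
gaze = np.array([[2.1, 3], [0.6, 0], [1.8, 2], [0.4, 1]], dtype=float)
# Per-sentence linguistic features (e.g. sentence length, mean word-frequency rank).
ling = np.array([[24, 3200], [9, 800], [21, 2900], [8, 650]], dtype=float)
labels = np.array([1, 0, 1, 0])  # 1 = the learner did not comprehend the sentence

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(np.hstack([gaze, ling]), labels)

# Probability that a new sentence was not comprehended; a simplification step
# (e.g. an LLM paraphrase) would be triggered when this probability is high.
print(model.predict_proba(np.array([[1.9, 2, 23, 3100]]))[0, 1])
```
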
  • Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning

    Sara Kashiwagi, Keitaro Tanaka, Qi Feng, Shigeo Morishima

    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH   2023-August   3397 - 3401  2023

    Abstract

    This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to leverage the shared literal content between normal and silent speech and present a metric learning approach based on visemes. Specifically, we aim to map the input of two speech types close to each other in a latent space if they have similar viseme representations. By minimizing the Kullback-Leibler divergence of the predicted viseme probability distributions between and within the two speech types, our model effectively learns and predicts viseme identities. Our evaluation demonstrates that our method improves the accuracy of silent VSR, even when limited training data is available.

    DOI

    Scopus
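
    The viseme-based objective above can be illustrated with a short loss sketch: paired normal and silent clips of the same utterance are pulled together via a symmetrised KL divergence over their predicted viseme distributions. The tensor shapes, the number of visemes, and the symmetric form are assumptions rather than the published loss.

```python
# Hedged sketch of a viseme-level consistency loss between normal and silent
# speech predictions; not the authors' exact formulation.
import torch
import torch.nn.functional as F

def viseme_consistency_loss(logits_normal, logits_silent):
    """logits_*: (batch, time, num_visemes) viseme logits for paired clips."""
    p = F.log_softmax(logits_normal, dim=-1)
    q = F.log_softmax(logits_silent, dim=-1)
    # Symmetrised KL divergence between the two predicted viseme distributions.
    kl_pq = F.kl_div(q, p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(p, q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)

loss = viseme_consistency_loss(torch.randn(2, 50, 13), torch.randn(2, 50, 13))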

  • Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation

    Qi Feng, Hubert P.H. Shum, Shigeo Morishima

    Proceedings - 2023 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2023     405 - 414  2023

    Abstract

    Pre-captured immersive environments using omnidirectional cameras provide a wide range of virtual reality applications. Previous research has shown that manipulating the eye height in egocentric virtual environments can significantly affect distance perception and immersion. However, the influence of eye height in pre-captured real environments has received less attention due to the difficulty of altering the perspective after finishing the capture process. To explore this influence, we first propose a pilot study that captures real environments with multiple eye heights and asks participants to judge the egocentric distances and immersion. If a significant influence is confirmed, an effective image-based approach to adapt pre-captured real-world environments to the user's eye height would be desirable. Motivated by the study, we propose a learning-based approach for synthesizing novel views for omnidirectional images with altered eye heights. This approach employs a multitask architecture that learns depth and semantic segmentation in two formats, and generates high-quality depth and semantic segmentation to facilitate the inpainting stage. With the improved omnidirectional-aware layered depth image, our approach synthesizes natural and realistic visuals for eye height adaptation. Quantitative and qualitative evaluation shows favorable results against state-of-the-art methods, and an extensive user study verifies improved perception and immersion for pre-captured real-world environments.

    DOI

    Scopus
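
    The eye-height adaptation above can be approximated, ignoring the learned layered representation and inpainting, by reprojecting each panorama pixel through its depth to a vertically shifted viewpoint. The sketch below assumes an equirectangular layout, a y-up coordinate frame, and metric depth, and it leaves holes and occlusion ordering unhandled.

```python
# Minimal sketch of warping a pre-captured panorama to a new eye height by
# reprojecting each pixel through its depth; hole filling is omitted.
import numpy as np

def warp_to_new_eye_height(rgb, depth, delta_h):
    """rgb: (H, W, 3), depth: (H, W) in metres, delta_h: eye-height change in metres."""
    H, W, _ = rgb.shape
    j, i = np.meshgrid(np.arange(W), np.arange(H))
    lon = (j + 0.5) / W * 2 * np.pi - np.pi
    lat = np.pi / 2 - (i + 0.5) / H * np.pi
    # Ray direction per pixel, then the 3D point it sees.
    d = np.stack([np.cos(lat) * np.sin(lon), np.sin(lat), np.cos(lat) * np.cos(lon)], -1)
    pts = depth[..., None] * d
    pts[..., 1] -= delta_h                      # raise the eye by delta_h
    r = np.linalg.norm(pts, axis=-1)
    new_lat = np.arcsin(np.clip(pts[..., 1] / r, -1, 1))
    new_lon = np.arctan2(pts[..., 0], pts[..., 2])
    ni = np.clip(((np.pi / 2 - new_lat) / np.pi * H).astype(int), 0, H - 1)
    nj = np.clip(((new_lon + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)
    out = np.zeros_like(rgb)
    out[ni, nj] = rgb[i, j]                     # nearest-pixel forward splat
    return out

warped = warp_to_new_eye_height(np.zeros((256, 512, 3)), np.ones((256, 512)), 0.3)
```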

  • Pointing out Human Answer Mistakes in a Goal-Oriented Visual Dialogue

    Ryosuke Oshima, Seitaro Shinagawa, Hideki Tsunashima, Qi Feng, Shigeo Morishima

    Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023     4665 - 4670  2023

    Abstract

    Effective communication between humans and intelligent agents has promising applications for solving complex problems. One such approach is visual dialogue, which leverages multimodal context to assist humans. However, real-world scenarios occasionally involve human mistakes, which can cause intelligent agents to fail. While most prior research assumes perfect answers from human interlocutors, we focus on a setting where the agent points out unintentional mistakes for the interlocutor to review, better reflecting real-world situations. In this paper, we show that human answer mistakes depend on question type and QA turn in the visual dialogue by analyzing a previously unused data collection of human mistakes. We demonstrate the effectiveness of those factors for the model's accuracy in a pointing-human-mistake task through experiments using a simple MLP model and a Visual Language Model.

    DOI

    Scopus
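
    A toy version of the simple MLP baseline mentioned above could predict the probability that an answer is a mistake from the question type and the QA turn index. The feature encoding, category count, and layer sizes below are illustrative assumptions, not the published model.

```python
# Illustrative sketch: predict whether a human answer in a goal-oriented visual
# dialogue is a mistake from the question type and the QA turn index.
import torch
import torch.nn as nn

NUM_QUESTION_TYPES = 8   # hypothetical number of question categories
MAX_TURNS = 20

class MistakeMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_QUESTION_TYPES + 1, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, question_type, turn):
        one_hot = nn.functional.one_hot(question_type, NUM_QUESTION_TYPES).float()
        turn_feat = (turn.float() / MAX_TURNS).unsqueeze(-1)   # normalised turn index
        return torch.sigmoid(self.net(torch.cat([one_hot, turn_feat], dim=-1)))

model = MistakeMLP()
p_mistake = model(torch.tensor([2, 5]), torch.tensor([3, 11]))  # batch of two QA pairs
```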

  • 3D car shape reconstruction from a contour sketch using GAN and lazy learning

    Naoki Nozawa, Hubert P.H. Shum, Qi Feng, Edmond S.L. Ho, Shigeo Morishima

    Visual Computer   38 ( 4 ) 1317 - 1330  April 2022

    Abstract

    3D car models are heavily used in computer games, visual effects, and even automotive designs. As a result, producing such models with minimal labour costs is increasingly more important. To tackle the challenge, we propose a novel system to reconstruct a 3D car using a single sketch image. The system learns from a synthetic database of 3D car models and their corresponding 2D contour sketches and segmentation masks, allowing effective training with minimal data collection cost. The core of the system is a machine learning pipeline that combines the use of a generative adversarial network (GAN) and lazy learning. GAN, being a deep learning method, is capable of modelling complicated data distributions, enabling the effective modelling of a large variety of cars. Its major weakness is that as a global method, modelling the fine details in the local region is challenging. Lazy learning works well to preserve local features by generating a local subspace with relevant data samples. We demonstrate that the combined use of GAN and lazy learning is able to produce high-quality results, in which different types of cars with complicated local features can be generated effectively with a single sketch. Our method outperforms existing ones using other machine learning structures such as the variational autoencoder.

    DOI

    Scopus

    Cited by 16 (Scopus)
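
    The lazy-learning component described above can be sketched as a query-time step that gathers the k nearest training sketches and constrains a global prediction with their local subspace. The feature dimensions, data, and PCA-based projection below are placeholders, not the paper's exact procedure.

```python
# Hedged sketch of "lazy learning": build a local subspace from the query's
# nearest training samples and project a global prediction onto it.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
sketch_feats = rng.normal(size=(500, 64))   # features of training contour sketches
shape_params = rng.normal(size=(500, 32))   # corresponding 3D shape parameters

def refine_with_local_subspace(global_pred, query_feat, k=20, n_components=8):
    nn = NearestNeighbors(n_neighbors=k).fit(sketch_feats)
    _, idx = nn.kneighbors(query_feat[None])
    local = shape_params[idx[0]]
    pca = PCA(n_components=n_components).fit(local)
    # Project the global (e.g. GAN) prediction onto the neighbours' subspace so
    # that locally relevant samples constrain the fine details.
    return pca.inverse_transform(pca.transform(global_pred[None]))[0]

refined = refine_with_local_subspace(rng.normal(size=32), rng.normal(size=64))
```
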
  • 360 Depth Estimation in the Wild - The Depth360 Dataset and the SegFuse Network

    Qi Feng, Hubert P.H. Shum, Shigeo Morishima

    Proceedings - 2022 IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2022     664 - 673  2022

    Abstract

    Single-view depth estimation from omnidirectional images has gained popularity with its wide range of applications such as autonomous driving and scene reconstruction. Although data-driven learning-based methods demonstrate significant potential in this field, scarce training data and ineffective 360 estimation algorithms are still two key limitations hindering accurate estimation across diverse domains. In this work, we first establish a large-scale dataset with varied settings called Depth360 to tackle the training data problem. This is achieved by exploring the use of a plenteous source of data, 360 videos from the internet, using a test-time training method that leverages unique information in each omnidirectional sequence. With novel geometric and temporal constraints, our method generates consistent and convincing depth samples to facilitate single-view estimation. We then propose an end-to-end two-branch multi-task learning network, SegFuse, that mimics the human eye to effectively learn from the dataset and estimate high-quality depth maps from diverse monocular RGB images. With a peripheral branch that uses equirectangular projection for depth estimation and a foveal branch that uses cubemap projection for semantic segmentation, our method predicts consistent global depth while maintaining sharp details at local regions. Experimental results show favorable performance against the state-of-the-art methods.

    DOI

    Scopus

    Cited by 9 (Scopus)
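
    The two-branch design described above can be outlined schematically: one branch consumes the equirectangular image for depth, the other a cubemap stack for segmentation, and their features are fused. The channel counts, layer depths, and the assumption that cubemap features are resampled onto the equirectangular grid before fusion are simplifications, not the published SegFuse architecture.

```python
# Schematic two-branch multitask sketch (depth head + segmentation head);
# placeholder layer sizes, not the authors' network.
import torch
import torch.nn as nn

class TwoBranchSketch(nn.Module):
    def __init__(self, num_classes=13):
        super().__init__()
        self.equi_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Six cube faces stacked channel-wise, assumed resampled onto the
        # equirectangular grid so both branches share a spatial layout.
        self.cube_branch = nn.Sequential(
            nn.Conv2d(3 * 6, 32, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(64, 1, 1)
        self.seg_head = nn.Conv2d(64, num_classes, 1)

    def forward(self, equi, cube_as_equi):
        f = torch.cat([self.equi_branch(equi), self.cube_branch(cube_as_equi)], dim=1)
        return self.depth_head(f), self.seg_head(f)

net = TwoBranchSketch()
depth, seg = net(torch.randn(1, 3, 256, 512), torch.randn(1, 18, 256, 512))
```
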
  • Audio–visual object removal in 360-degree videos

    Ryo Shimamura, Qi Feng, Yuki Koyama, Takayuki Nakatsuka, Satoru Fukayama, Masahiro Hamasaki, Masataka Goto, Shigeo Morishima

    The Visual Computer   36 ( 10-12 ) 2117 - 2128  October 2020

    Abstract

    We present a novel concept, audio–visual object removal in 360-degree videos, in which a target object in a 360-degree video is removed in both the visual and auditory domains synchronously. Previous methods have solely focused on the visual aspect of object removal using video inpainting techniques, resulting in videos with unreasonable remaining sounds corresponding to the removed objects. We propose a solution which incorporates direction acquired during the video inpainting process into the audio removal process. More specifically, our method identifies the sound corresponding to the visually tracked target object and then synthesizes a three-dimensional sound field by subtracting the identified sound from the input 360-degree video. We conducted a user study showing that our multi-modal object removal supporting both visual and auditory domains could significantly improve the virtual reality experience, and our method could generate sufficiently synchronous, natural and satisfactory 360-degree videos.

    DOI

    Scopus

    Cited by 6 (Scopus)
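
    The audio removal step above can be illustrated with a simplified first-order ambisonic model: encode the identified source at the visually tracked direction and subtract it from the captured sound field. The format (ACN/SN3D first-order ambisonics) and the omission of the source-identification step are assumptions made for this sketch only.

```python
# Simplified illustration: subtract an identified object's sound, encoded at a
# known direction, from a first-order ambisonic (W, Y, Z, X) mixture.
import numpy as np

def encode_foa(mono, azimuth, elevation):
    """Encode a mono signal into 4-channel first-order ambisonics (ACN/SN3D)."""
    w = mono
    y = mono * np.sin(azimuth) * np.cos(elevation)
    z = mono * np.sin(elevation)
    x = mono * np.cos(azimuth) * np.cos(elevation)
    return np.stack([w, y, z, x], axis=0)

def remove_source(foa_mix, identified_mono, azimuth, elevation):
    """Subtract the identified object's sound from the ambisonic mixture."""
    return foa_mix - encode_foa(identified_mono, azimuth, elevation)

sr = 48000
mix = np.zeros((4, sr))                                   # one second of silent FOA audio
obj = 0.1 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # the object's identified sound
cleaned = remove_source(mix + encode_foa(obj, 0.5, 0.1), obj, 0.5, 0.1)
```
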
  • Resolving hand‐object occlusion for mixed reality with joint deep learning and model optimization

    Qi Feng, Hubert P. H. Shum, Shigeo Morishima

    Computer Animation and Virtual Worlds   31 ( 4-5 )  July 2020

    Abstract

    By overlaying virtual imagery onto the real world, mixed reality facilitates diverse applications and has drawn increasing attention. Enhancing physical in-hand objects with a virtual appearance is a key component for many applications that require users to interact with tools such as surgery simulations. However, due to complex hand articulations and severe hand-object occlusions, resolving occlusions in hand-object interactions is a challenging topic. Traditional tracking-based approaches are limited by strong ambiguities from occlusions and changing shapes, while reconstruction-based methods show a poor capability of handling dynamic scenes. In this article, we propose a novel real-time optimization system to resolve hand-object occlusions by spatially reconstructing the scene with estimated hand joints and masks. To acquire accurate results, we propose a joint learning process that shares information between two models and jointly estimates hand poses and semantic segmentation. To facilitate the joint learning system and improve its accuracy under occlusions, we propose an occlusion-aware RGB-D hand data set that mitigates the ambiguity through precise annotations and photorealistic appearance. Evaluations show more consistent overlays compared with literature, and a user study verifies a more realistic experience.

    DOI

    Scopus

    Cited by 6 (Scopus)
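
    The joint estimation idea above, shared features feeding both a hand-joint regressor and a segmentation head, can be sketched as follows. The layer sizes, number of joints, and RGB-D input resolution are assumptions rather than the published architecture.

```python
# Schematic joint hand-pose + hand-mask network over RGB-D input; a sketch of
# the shared-feature idea only, with placeholder sizes.
import torch
import torch.nn as nn

class JointHandNet(nn.Module):
    def __init__(self, num_joints=21):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),   # RGB-D input
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.mask_head = nn.Conv2d(64, 1, 1)                       # hand-mask logits
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_joints * 3),
        )

    def forward(self, rgbd):
        feat = self.encoder(rgbd)
        return self.pose_head(feat), self.mask_head(feat)

pose, mask = JointHandNet()(torch.randn(1, 4, 128, 128))
```
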
  • Foreground-aware Dense Depth Estimation for 360 Images

    Qi Feng, Hubert P. H. Shum, Ryo Shimamura, Shigeo Morishima

    Journal of WSCG   28 ( 1-2 ) 79 - 88  2020

    Abstract

    With 360 imaging devices becoming widely accessible, omnidirectional content has gained popularity in multiple fields. The ability to estimate depth from a single omnidirectional image can benefit applications such as robotics navigation and virtual reality. However, existing depth estimation approaches produce sub-optimal results on real-world omnidirectional images with dynamic foreground objects. On the one hand, capture-based methods cannot obtain the foreground due to the limitations of the scanning and stitching schemes. On the other hand, it is challenging for synthesis-based methods to generate highly-realistic virtual foreground objects that are comparable to the real-world ones. In this paper, we propose to augment datasets with realistic foreground objects using an image-based approach, which produces a foreground-aware photorealistic dataset for machine learning algorithms. By exploiting a novel scale-invariant RGB-D correspondence in the spherical domain, we repurpose abundant non-omnidirectional datasets to include realistic foreground objects with correct distortions. We further propose a novel auxiliary deep neural network to estimate both the depth of the omnidirectional images and the mask of the foreground objects, where the two tasks facilitate each other. A new local depth loss considers small regions of interests and ensures that their depth estimations are not smoothed out during the global gradient’s optimization. We demonstrate the system using human as the foreground due to its complexity and contextual importance, while the framework can be generalized to any other foreground objects. Experimental results demonstrate more consistent global estimations and more accurate local estimations compared with state-of-the-arts.

    DOI
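
    One plausible reading of the local depth loss described above is an error term evaluated only inside small foreground regions of interest, so that they are not averaged away by the global objective. The weighting below is an assumption; the paper's exact formulation may differ.

```python
# Hedged sketch of a region-of-interest depth loss that keeps small foreground
# regions from being smoothed out by the global loss.
import torch

def local_depth_loss(pred, target, roi_masks):
    """pred, target: (B, H, W) depth; roi_masks: (B, N, H, W) binary ROI masks."""
    err = (pred.unsqueeze(1) - target.unsqueeze(1)).abs()          # (B, N, H, W) after broadcast
    per_roi = (err * roi_masks).sum(dim=(2, 3)) / roi_masks.sum(dim=(2, 3)).clamp(min=1)
    return per_roi.mean()

loss = local_depth_loss(torch.rand(2, 64, 128), torch.rand(2, 64, 128),
                        (torch.rand(2, 3, 64, 128) > 0.9).float())
```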

  • Resolving occlusion for 3D object manipulation with hands in mixed reality

    Qi Feng, Hubert P.H. Shum, Shigeo Morishima

    Proceedings of the ACM Symposium on Virtual Reality Software and Technology, VRST    November 2018

    Abstract

    Due to the need to interact with virtual objects, hand-object interaction has become an important element in mixed reality (MR) applications. In this paper, we propose a novel approach that handles the occlusion of augmented 3D object manipulation with hands by exploiting the nature of hand poses combined with tracking-based and model-based methods, achieving a complete mixed reality experience without the need for heavy computation, complex manual segmentation, or special gloves. The experimental results show a frame rate faster than real time and high accuracy of the rendered virtual appearance, and a user study verifies a more immersive experience compared to past approaches. We believe that the proposed method can improve a wide range of mixed reality applications that involve hand-object interactions.

    DOI

    Scopus

    Cited by 9 (Scopus)
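
    The occlusion handling above ultimately amounts to compositing: virtual pixels are drawn only where the estimated hand mask does not cover them, so real fingers occlude the augmented object. The sketch below shows only that final step with placeholder inputs; the pose-driven segmentation itself is not reproduced.

```python
# Minimal compositing sketch: keep real hand pixels on top of the virtual object.
import numpy as np

def composite(camera_rgb, virtual_rgb, virtual_mask, hand_mask):
    """camera_rgb, virtual_rgb: (H, W, 3); masks: (H, W) boolean."""
    show_virtual = virtual_mask & ~hand_mask        # hand pixels stay real
    out = camera_rgb.copy()
    out[show_virtual] = virtual_rgb[show_virtual]
    return out

H, W = 240, 320
frame = composite(np.zeros((H, W, 3)), np.ones((H, W, 3)),
                  np.ones((H, W), bool), np.zeros((H, W), bool))
```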
