Updated on 2022/06/30

SIMO SERRA, Edgar
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Associate Professor
Concurrent Post

  • Faculty of Political Science and Economics   School of Political Science and Economics

Education

  • 2011.08
    -
    2015.07

    BarcelonaTech (UPC)   PhD in Automatic Control, Robotics and Vision   Institute of Industrial and Control Engineering  

  • 2005.08
    -
    2011.05

    BarcelonaTech (UPC)   Superior Industrial Engineering  

Degree

  • BarcelonaTech (UPC)   PhD in Automatic Control, Robotics and Vision

Research Experience

  • 2018.09
    -
    Now

    Waseda University   Faculty of Science and Engineering

  • 2018.09
    -
    2021.03

    Japan Science and Technology Agency (JST)   PRESTO Researcher (concurrent post)

  • 2018.04
    -
    2018.08

    Japan Science and Technology Agency (JST)   PRESTO Researcher (full-time)

  • 2017.04
    -
    2018.03

    Waseda University   Research Institute for Science and Engineering

  • 2015.08
    -
    2017.03

    Waseda University   Graduate School of Fundamental Science and Engineering

  • 2011.09
    -
    2015.07

    BarcelonaTech (UPC)   PhD Student (FPI scholarship)


Research Areas

  • Intelligent informatics

Research Interests

  • Computer Graphics

  • Computer Vision

  • Machine Learning

Papers

  • Generating Digital Painting Lighting Effects via RGB-space Geometry

    Lvmin Zhang, Edgar Simo-Serra, Yi Ji, Chunping Liu

    ACM Transactions on Graphics (Presented at SIGGRAPH)   39 ( 2 )  2020.01  [Refereed]

    DOI

  • DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement

    Satoshi Iizuka, Edgar Simo-Serra

    ACM Transactions on Graphics (SIGGRAPH Asia)   38 ( 6 )  2019.11  [Refereed]

    DOI

  • Real-Time Data-Driven Interactive Rough Sketch Inking

    Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    ACM Transactions on Graphics (SIGGRAPH)    2018.08  [Refereed]

    DOI

  • Mastering sketching: Adversarial augmentation for structured prediction

    Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    ACM Transactions on Graphics   37 ( 1 )  2018.01  [Refereed]

     View Summary

    We present an integral framework for training sketch simplification networks that convert challenging rough sketches into clean line drawings. Our approach augments a simplification network with a discriminator network, training both networks jointly so that the discriminator network discerns whether a line drawing is real training data or the output of the simplification network, which, in turn, tries to fool it. This approach has two major advantages: first, because the discriminator network learns the structure in line drawings, it encourages the output sketches of the simplification network to be more similar in appearance to the training sketches. Second, we can also train the networks with additional unsupervised data: by adding rough sketches and line drawings that do not correspond to each other, we can improve the quality of the sketch simplification. Thanks to a difference in the architecture, our approach has advantages over similar adversarial training approaches in stability of training and the aforementioned ability to utilize unsupervised training data. We show how our framework can be used to train models that significantly outperform the state of the art in the sketch simplification task, despite using the same architecture for inference. We also present an approach to optimize for a single image, which improves accuracy at the cost of additional computation time. Finally, we show that, using the same framework, it is possible to train the network to perform the inverse problem, i.e., convert simple line sketches into pencil drawings, which is not possible using the standard mean squared error loss. We validate our framework with two user tests, in which our approach is preferred to the state of the art in sketch simplification 88.9% of the time.

    DOI
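The adversarial augmentation above pairs a supervised reconstruction loss with a discriminator-fooling term. As a rough illustration only (the function names and the weighting factor `alpha` are hypothetical, not taken from the paper), the combined objective can be sketched as:

```python
import numpy as np

def bce(p, y):
    # Binary cross-entropy between predicted probabilities p and targets y.
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def joint_loss(pred, target, d_on_pred, alpha=1.0):
    # Supervised MSE to the clean line drawing, plus an adversarial term
    # rewarding outputs that the discriminator rates as real training data.
    mse = float(((pred - target) ** 2).mean())
    adv = bce(d_on_pred, np.ones_like(d_on_pred))  # generator wants "real"
    return mse + alpha * adv
```

With a perfect reconstruction and an undecided discriminator (probability 0.5 everywhere), the loss reduces to the adversarial term alone.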

  • Joint gap detection and inpainting of line drawings

    Kazuma Sasaki, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017     5768 - 5776  2017.11  [Refereed]

     View Summary

    We propose a novel data-driven approach for automatically detecting and completing gaps in line drawings with a Convolutional Neural Network. In the case of existing inpainting approaches for natural images, masks indicating the missing regions are generally required as input. Here, we show that line drawings have enough structures that can be learned by the CNN to allow automatic detection and completion of the gaps without any such input. Thus, our method can find the gaps in line drawings and complete them without user interaction. Furthermore, the completion realistically conserves thickness and curvature of the line segments. All the necessary heuristics for such realistic line completion are learned naturally from a dataset of line drawings, where various patterns of line completion are generated on the fly as training pairs to improve the model generalization. We evaluate our method qualitatively on a diverse set of challenging line drawings and also provide quantitative results with a user study, where it significantly outperforms the state of the art.

    DOI

  • 3D Human Pose Tracking Priors using Geodesic Mixture Models

    Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer

    International Journal of Computer Vision   122 ( 2 ) 388 - 408  2017.04  [Refereed]

     View Summary

    We present a novel approach for learning a finite mixture model on a Riemannian manifold in which Euclidean metrics are not applicable and one needs to resort to geodesic distances consistent with the manifold geometry. For this purpose, we draw inspiration from a variant of the expectation-maximization algorithm that uses a minimum message length criterion to automatically estimate the optimal number of components from multivariate data lying in a Euclidean space. In order to use this approach on Riemannian manifolds, we propose a formulation in which each component is defined on a different tangent space, thus avoiding the problems associated with the loss of accuracy produced when linearizing the manifold with a single tangent space. Our approach can be applied to any type of manifold for which it is possible to estimate its tangent space. Additionally, we consider using shrinkage covariance estimation to improve the robustness of the method, especially when dealing with very sparsely distributed samples. We evaluate the approach on a number of situations, going from data clustering on manifolds to combining pose and kinematics of articulated bodies for 3D human pose tracking. In all cases, we demonstrate remarkable improvement compared to several chosen baselines.

    DOI
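The per-component tangent spaces above can be illustrated on the simplest curved manifold, the unit circle in the plane. A minimal numpy sketch of the corresponding log and exp maps (illustrative only, not the paper's implementation):

```python
import numpy as np

def sphere_log(base, x):
    # Log map on the unit circle: tangent vector at `base` pointing towards
    # x, with length equal to the geodesic (arc) distance.
    c = np.clip(np.dot(base, x), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < 1e-12:
        return np.zeros(2)
    v = x - c * base  # component of x orthogonal to base
    return theta * v / np.linalg.norm(v)

def sphere_exp(base, v):
    # Exp map: walk from `base` along the tangent vector v on the circle.
    t = np.linalg.norm(v)
    if t < 1e-12:
        return base.copy()
    return np.cos(t) * base + np.sin(t) * v / t
```

Fitting each mixture component in the tangent space at its own mean avoids the distortion incurred by linearizing the whole manifold at a single point.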

  • Globally and locally consistent image completion

    Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    ACM Transactions on Graphics   36 ( 4 )  2017  [Refereed]

     View Summary

    We present a novel approach for image completion that results in images that are both locally and globally consistent. With a fully-convolutional neural network, we can complete images of arbitrary resolutions by filling-in missing regions of any shape. To train this image completion network to be consistent, we use global and local context discriminators that are trained to distinguish real images from completed ones. The global discriminator looks at the entire image to assess if it is coherent as a whole, while the local discriminator looks only at a small area centered at the completed region to ensure the local consistency of the generated patches. The image completion network is then trained to fool both context discriminator networks, which requires it to generate images that are indistinguishable from real ones with regard to overall consistency as well as in details. We show that our approach can be used to complete a wide variety of scenes. Furthermore, in contrast with patch-based approaches such as PatchMatch, our approach can generate fragments that do not appear elsewhere in the image, which allows us to naturally complete images of objects with familiar and highly specific structures, such as faces.

    DOI
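A small sketch of what the local discriminator sees: a fixed-size crop centered on the completed region, while the global discriminator receives the full image (the helper below is a hypothetical illustration, not the paper's code):

```python
import numpy as np

def local_patch(image, mask, patch=4):
    # Crop a square patch centered on the completed (masked) region,
    # clamped so the crop stays inside the image bounds.
    ys, xs = np.nonzero(mask)
    cy, cx = int(ys.mean()), int(xs.mean())
    h, w = image.shape[:2]
    y0 = min(max(cy - patch // 2, 0), h - patch)
    x0 = min(max(cx - patch // 2, 0), w - patch)
    return image[y0:y0 + patch, x0:x0 + patch]
```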

  • Let there be color!: Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification

    Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    ACM Transactions on Graphics   35 ( 4 )  2016.07  [Refereed]

     View Summary

    We present a novel technique to automatically colorize grayscale images that combines both global priors and local image features. Based on Convolutional Neural Networks, our deep network features a fusion layer that allows us to elegantly merge local information dependent on small image patches with global priors computed using the entire image. The entire framework, including the global and local priors as well as the colorization model, is trained in an end-to-end fashion. Furthermore, our architecture can process images of any resolution, unlike most existing approaches based on CNN. We leverage an existing large-scale scene classification database to train our model, exploiting the class labels of the dataset to more efficiently and discriminatively learn the global priors. We validate our approach with a user study and compare against the state of the art, where we show significant improvements. Furthermore, we demonstrate our method extensively on many different types of images, including black-and-white photography from over a hundred years ago, and show realistic colorizations.

    DOI
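The fusion layer described above merges a single global feature vector with a spatial map of local features; in spirit, it amounts to tiling the global vector over every location and concatenating (a simplified numpy sketch, not the actual network code):

```python
import numpy as np

def fusion_layer(local_feats, global_feats):
    # local_feats: (H, W, C_local); global_feats: (C_global,).
    # Tile the global vector over all H x W positions and concatenate,
    # giving an (H, W, C_local + C_global) fused feature map.
    h, w, _ = local_feats.shape
    tiled = np.broadcast_to(global_feats, (h, w, global_feats.shape[0]))
    return np.concatenate([local_feats, tiled], axis=-1)
```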

  • Learning to simplify: Fully convolutional networks for rough sketch cleanup

    Edgar Simo-Serra, Satoshi Iizuka, Kazuma Sasaki, Hiroshi Ishikawa

    ACM Transactions on Graphics   35 ( 4 )  2016.07  [Refereed]

     View Summary

    In this paper, we present a novel technique to simplify sketch drawings based on learning a series of convolution operators. In contrast to existing approaches that require vector images as input, we allow the more general and challenging input of rough raster sketches such as those obtained from scanning pencil sketches. We convert the rough sketch into a simplified version which is then amenable to vectorization. This is all done in a fully automatic way without user intervention. Our model consists of a fully convolutional neural network which, unlike most existing convolutional neural networks, is able to process images of any dimensions and aspect ratio as input, and outputs a simplified sketch which has the same dimensions as the input image. In order to teach our model to simplify, we present a new dataset of pairs of rough and simplified sketch drawings. By leveraging convolution operators in combination with efficient use of our proposed dataset, we are able to train our sketch simplification model. Our approach naturally overcomes the limitations of existing methods, e.g., vector images as input and long computation time, and we show that meaningful simplifications can be obtained for many different test cases. Finally, we validate our results with a user study in which we greatly outperform similar approaches and establish the state of the art in sketch simplification of raster images.

    DOI

  • Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction

    Edgar Simo-Serra and Hiroshi Ishikawa

    Conference in Computer Vision and Pattern Recognition (CVPR)    2016.06  [Refereed]

    DOI

  • Neuroaesthetics in Fashion: Modeling the Perception of Fashionability

    Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, Raquel Urtasun

    Conference in Computer Vision and Pattern Recognition (CVPR)    2015.06  [Refereed]

    DOI

  • Discriminative learning of deep convolutional feature point descriptors

    Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, Francesc Moreno-Noguer

    Proceedings of the IEEE International Conference on Computer Vision   2015   118 - 126  2015.02  [Refereed]

     View Summary

    Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs with the combination of a stochastic sampling of the training set and an aggressive mining strategy biased towards patches that are hard to classify. By using the L2 distance during both training and testing we develop 128-D descriptors whose Euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT. We demonstrate consistent performance gains over the state of the art, and generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes. Our descriptors are efficient to compute and amenable to modern GPUs, and are publicly available.

    DOI
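The aggressive mining step above biases training towards pairs that are hard to classify. Under the simplifying assumption that descriptor i in one set matches descriptor i in the other, the hardest non-matching neighbor can be read off the pairwise L2 distances (a hypothetical helper for illustration):

```python
import numpy as np

def hardest_negatives(desc_a, desc_b):
    # Squared L2 distances between all pairs of descriptors.
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude the matching (positive) pairs
    # For each descriptor in desc_a, the closest NON-matching descriptor.
    return d2.argmin(axis=1)
```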

  • DaLI: Deformation and Light Invariant Descriptor

    Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer

    International Journal of Computer Vision (IJCV)   115 ( 2 ) 136 - 154  2015.02  [Refereed]

    DOI

  • Kinematic Synthesis using Tree Topologies

    Edgar Simo-Serra and Alba Perez-Gracia

    Mechanism and Machine Theory (MAMT)   72   94 - 113  2014.02  [Refereed]

    DOI

  • A Joint Model for 2D and 3D Pose Estimation from a Single Image

    Edgar Simo-Serra, Ariadna Quattoni, Carme Torras, Francesc Moreno-Noguer

    Conference in Computer Vision and Pattern Recognition (CVPR)    2013.06  [Refereed]

    DOI

  • Single Image 3D Human Pose Estimation from Noisy Observations

    Edgar Simo-Serra, Arnau Ramisa, Guillem Alenyà, Carme Torras, Francesc Moreno-Noguer

    Conference in Computer Vision and Pattern Recognition (CVPR)    2012.06  [Refereed]

    DOI

  • Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval

    Shuhei Yokoo, Kohei Ozaki, Edgar Simo-Serra, Satoshi Iizuka

    Conference in Computer Vision and Pattern Recognition Workshops (CVPRW)    2020.06  [Refereed]

  • Regularized Adversarial Training for Single-shot Virtual Try-On

    Kotaro Kikuchi, Kota Yamaguchi, Edgar Simo-Serra, Tetsunori Kobayashi

    International Conference on Computer Vision Workshops (ICCVW)    2019.11  [Refereed]

  • Understanding the Effects of Pre-training for Object Detectors via Eigenspectrum

    Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki

    International Conference on Computer Vision Workshops (ICCVW)    2019.10  [Refereed]

  • Virtual Thin Slice: 3D Conditional GAN-based Super-resolution for CT Slice Interval

    Akira Kudo, Yoshiro Kitamura, Yuanzhong Li, Satoshi Iizuka, Edgar Simo-Serra

    International Conference on Medical Image Computing and Computer Assisted Intervention Workshops (MICCAIW)    2019.10  [Refereed]

    DOI

  • Analysis of the Pre-training Effect of Object Detection CNNs Based on Eigenvalue Distributions

    Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki

    22nd Meeting on Image Recognition and Understanding (MIRU)    2019.08

  • Temporal Distance Matrices for Workout Form Assessment

    Ryoji Ogata, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    22nd Meeting on Image Recognition and Understanding (MIRU, short oral)    2019.07  [Refereed]

  • Regularizing Adversarial Training for Single-shot Object Placement

    Kotaro Kikuchi, Kota Yamaguchi, Edgar Simo-Serra, Tetsunori Kobayashi

    22nd Meeting on Image Recognition and Understanding (MIRU, short oral)    2019.07  [Refereed]

  • DeepRemaster: Digital Remastering of Video Using Temporal Source-Reference Attention

    Satoshi Iizuka, Edgar Simo-Serra

    Visual Computing / Graphics and CAD Joint Symposium (oral)    2019.06  [Refereed]

  • Optimization-Based Data Generation for Photo Enhancement

    Mayu Omiya, Yusuke Horiuchi, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    Conference in Computer Vision and Pattern Recognition Workshops (CVPRW)    2019.06  [Refereed]

  • Temporal Distance Matrices for Squat Classification

    Ryoji Ogata, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    Conference in Computer Vision and Pattern Recognition Workshops (CVPRW)    2019.06  [Refereed]

  • Re-staining Pathology Images by FCNN

    Masayuki Fujitani, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Hirokazu Kobayashi, Chika Iwamoto, Kenoki Ohuchida, Makoto Hashizume, Hidekata Hontani, Hiroshi Ishikawa

    International Conference on Machine Vision Applications (MVA)    2019.05  [Refereed]

    DOI

  • Spectral Normalization and Relativistic Adversarial Training for Conditional Pose Generation with Self-Attention

    Yusuke Horiuchi, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    International Conference on Machine Vision Applications (MVA)    2019.05  [Refereed]

    DOI

  • Learning Photo Enhancement by Black-Box Model Optimization Data Generation

    Mayu Omiya, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    SIGGRAPH Asia Technical Brief    2018.11  [Refereed]

    DOI

  • Stain Transformation of Pathology Images Using FCNNs

    Masayuki Fujitani, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Hirokazu Kobayashi, Chika Iwamoto, Kenoki Ohuchida, Makoto Hashizume, Hidekata Hontani, Hiroshi Ishikawa

    21st Meeting on Image Recognition and Understanding (MIRU, oral)    2018.08  [Refereed]

  • Reflection Removal from Images by Joint Estimation of the Background and Reflection Components

    Ryosuke Sato, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    21st Meeting on Image Recognition and Understanding (MIRU, oral)    2018.08  [Refereed]

  • Multi-class Segmentation of Aerial Photographs Using Recurrent Convolutional Neural Networks

    Hiroki Takahashi, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    21st Meeting on Image Recognition and Understanding (MIRU)    2018.08

  • High-quality Automatic Photo Enhancement by Learning Enhancement Parameters

    Mayu Omiya, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    21st Meeting on Image Recognition and Understanding (MIRU)    2018.08

  • Recognition and Speed-up of Postal Label Detection with SSD

    Ryoji Ogata, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    21st Meeting on Image Recognition and Understanding (MIRU)    2018.08

  • Learning to restore deteriorated line drawing

    Kazuma Sasaki, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    Visual Computer   34 ( 6-8 ) 1077 - 1085  2018.06  [Refereed]

     View Summary

    We propose a fully automatic approach to restore aged line drawings. We decompose the task into two subtasks: the line extraction subtask, which aims to extract line fragments and remove the paper texture background, and the restoration subtask, which fills in possible gaps and deterioration of the lines to produce a clean line drawing. Our approach is based on a convolutional neural network that consists of two sub-networks corresponding to the two subtasks. They are trained as part of a single framework in an end-to-end fashion. We also introduce a new dataset consisting of manually annotated sketches by Leonardo da Vinci which, in combination with a synthetic data generation approach, allows training the network to restore deteriorated line drawings. We evaluate our method on challenging 500-year-old sketches and compare with existing approaches with a user study, in which it is found that our approach is preferred 72.7% of the time.

    DOI

  • Special Issue "Image Processing for Manga and Line Drawings": Automatic Inking of Rough Sketches

    Edgar Simo-Serra, Satoshi Iizuka

    The Journal of the Institute of Image Information and Television Engineers, May 2018 issue    2018.05

  • Multi-modal joint embedding for fashion product retrieval

    A. Rubio, Longlong Yu, E. Simo-Serra, F. Moreno-Noguer

    Proceedings - International Conference on Image Processing, ICIP     400 - 404  2018.02  [Refereed]

     View Summary

    Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem, akin to finding a needle in a haystack. In this paper, we leverage both the images and textual metadata and propose a joint multi-modal embedding that maps both the text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to effectively perform retrieval in this latent space, which is both efficient and accurate. We train this embedding using large-scale real-world e-commerce data by both minimizing the distance between related products and using auxiliary classification networks that encourage the embedding to have semantic meaning. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset. We also provide an analysis of the different metadata.

    DOI

  • Multi-modal Embedding for Main Product Detection in Fashion

    LongLong Yu, Edgar Simo-Serra, Francesc Moreno-Noguer, Antonio Rubio

    Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017     2236 - 2242  2018.01  [Refereed]

     View Summary

    We present an approach to detect the main product in fashion images by exploiting the textual metadata associated with each image. Our approach is based on a Convolutional Neural Network and learns a joint embedding of object proposals and textual metadata to predict the main product in the image. We additionally use several complementary classification and overlap losses in order to improve training stability and performance. Our tests on a large-scale dataset taken from eight e-commerce sites show that our approach outperforms strong baselines and is able to accurately detect the main product in a wide diversity of challenging fashion images.

    DOI

  • What Makes a Style: Experimental Analysis of Fashion Prediction

    Moeko Takagi, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017     2247 - 2253  2018.01  [Refereed]

     View Summary

    In this work, we perform an experimental analysis of the differences of both how humans and machines see and distinguish fashion styles. For this purpose, we propose an expert-curated new dataset for fashion style prediction, which consists of 14 different fashion styles each with roughly 1,000 images of worn outfits. The dataset, with a total of 13,126 images, captures the diversity and complexity of modern fashion styles. We perform an extensive analysis of the dataset by benchmarking a wide variety of modern classification networks, and also perform an in-depth user study with both fashion-savvy and fashion-naïve users. Our results indicate that, although classification networks are able to outperform naive users, they are still far from the performance of savvy users, for which it is important to not only consider texture and color, but subtle differences in the combination of garments.

    DOI

  • Multi-label Fashion Image Classification with Minimal Human Supervision

    Naoto Inoue, Edgar Simo-Serra, Toshihiko Yamasaki, Hiroshi Ishikawa

    Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017     2261 - 2267  2018.01  [Refereed]

     View Summary

    We tackle the problem of multi-label classification of fashion images, learning from noisy data with minimal human supervision. We present a new dataset of full body poses, each with a set of 66 binary labels corresponding to the information about the garments worn in the image obtained in an automatic manner. As the automatically-collected labels contain significant noise, we manually correct the labels for a small subset of the data, and use these correct labels for further training and evaluation. We build upon a recent approach that both cleans the noisy labels and learns to classify, and introduce simple changes that can significantly improve the performance.

    DOI

  • Adaptive Energy Selection For Content-Aware Image Resizing

    Kazuma Sasaki, Yuya Nagahama, Zheng Ze, Satoshi Iizuka, Edgar Simo-Serra, Yoshihiko Mochizuki, Hiroshi Ishikawa

    Asian Conference on Pattern Recognition (ACPR)    2017.11  [Refereed]

    DOI

  • Stain Transformation of Pathology Images with a CNN Trained on a Dataset That Accounts for Image Similarity

    Masayuki Fujitani, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    Technical Committee on Healthcare and Medical Information Communication Technology (MICT)    2017.11

  • Classification of Fashion Outfits Using Deep Learning

    Moeko Takagi, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    20th Meeting on Image Recognition and Understanding (MIRU)    2017.08

  • Building Segmentation in Aerial Photographs Using Fully Convolutional Neural Networks with a Recurrent Structure

    Hiroki Takahashi, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    20th Meeting on Image Recognition and Understanding (MIRU, oral)    2017.08  [Refereed]

  • Banknote portrait detection using convolutional neural network

    Ryutaro Kitagawa, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Matsuki, Naotake Natori, Hiroshi Ishikawa

    Proceedings of the 15th IAPR International Conference on Machine Vision Applications, MVA 2017     440 - 443  2017.07  [Refereed]

     View Summary

    Banknotes generally have different designs according to their denominations. Thus, if the characteristics of each design can be recognized, they can be used for sorting banknotes by denomination. The portrait on a banknote is one such characteristic that can be used for classification. A sorting system can be designed that recognizes the portrait on each banknote and sorts it accordingly. In this paper, our aim is to automate the configuration of such a sorting system by automatically detecting portraits in sample banknotes, so that it can be quickly deployed in a new target country. We use Convolutional Neural Networks to detect portraits in a completely new set of banknotes, robust to variations in the way they are shown, such as the size and orientation of the face.

    DOI

  • Deep Mario

    Ryutaro Kitagawa, Edgar Simo-Serra, Satoshi Iizuka, Yoshihiko Mochizuki, Hiroshi Ishikawa

    Visual Computing / Graphics and CAD Joint Symposium (oral)    2017.06  [Refereed]

  • Automatic Photo Enhancement Using a Regression-based Enhancement Model

    Mayu Omiya, Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    Visual Computing / Graphics and CAD Joint Symposium (oral)    2017.06  [Refereed]

  • Room reconstruction from a single spherical image by higher-order energy minimization

    Kosuke Fukano, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Akihiro Sugimoto, Hiroshi Ishikawa

    Proceedings - International Conference on Pattern Recognition     1768 - 1773  2017.04  [Refereed]

     View Summary

    We propose a method for understanding a room from a single spherical image, i.e., reconstructing and identifying structural planes forming the ceiling, the floor, and the walls in a room. A spherical image records the light that falls onto a single viewpoint from all directions and does not require correlating geometrical information from multiple images, which facilitates robust and precise reconstruction of the room structure. In our method, we detect line segments from a given image, and classify them into two groups: segments that form the boundaries of the structural planes and those that do not. We formulate this problem as a higher-order energy minimization problem that combines the various measures of likelihood that one, two, or three line segments are part of the boundary. We minimize the energy with graph cuts to identify segments forming boundaries, from which we estimate the structural planes in 3D. Experimental results on synthetic and real images confirm the effectiveness of the proposed method.

    DOI

  • Detection by classification of buildings in multispectral satellite imagery

    Tomohiro Ishii, Edgar Simo-Serra, Satoshi Iizuka, Yoshihiko Mochizuki, Akihiro Sugimoto, Hiroshi Ishikawa, Ryosuke Nakamura

    Proceedings - International Conference on Pattern Recognition     3344 - 3349  2017.04  [Refereed]

     View Summary

    We present an approach for the detection of buildings in multispectral satellite images. Unlike 3-channel RGB images, satellite imagery contains additional channels corresponding to different wavelengths. Approaches that do not use all channels are unable to fully exploit these images for optimal performance. Furthermore, care must be taken due to the large bias in classes, e.g., most of the Earth is covered in water and thus it will be dominant in the images. Our approach consists of training a Convolutional Neural Network (CNN) from scratch to classify multispectral image patches taken by satellites as whether or not they belong to a class of buildings. We then adapt the classification network to detection by converting the fully-connected layers of the network to convolutional layers, which allows the network to process images of any resolution. The dataset bias is compensated by subsampling negatives and tuning the detection threshold for optimal performance. We have constructed a new dataset using images from the Landsat 8 satellite for detecting solar power plants and show our approach is able to significantly outperform the state-of-the-art. Furthermore, we provide an in-depth evaluation of the seven different spectral bands provided by the satellite images and show it is critical to combine them to obtain good results.

    DOI
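The classification-to-detection conversion mentioned above replaces fully-connected layers with convolutions so the classifier can slide over feature maps of arbitrary size; for a 1x1 receptive field this is just a per-location matrix product (a schematic sketch under that assumption, not the trained network):

```python
import numpy as np

def fc_as_conv(features, fc_weights, fc_bias):
    # features: (H, W, C); fc_weights: (C, K); fc_bias: (K,).
    # Applying the fully-connected classifier at every spatial location
    # turns per-patch classification into dense detection scores.
    return features @ fc_weights + fc_bias
```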

  • BASS: Boundary-Aware Superpixel Segmentation

    Antonio Rubio, Longlong Yu, Edgar Simo-Serra, Francesc Moreno-Noguer

    Proceedings - International Conference on Pattern Recognition     2824 - 2829  2017.04  [Refereed]

     View Summary

    We propose a new superpixel algorithm based on exploiting the boundary information of an image, as objects in images can generally be described by their boundaries. Our proposed approach initially estimates the boundaries and uses them to place superpixel seeds in the areas in which they are more dense. Afterwards, we minimize an energy function in order to expand the seeds into full superpixels. In addition to standard terms such as color consistency and compactness, we propose using the geodesic distance which concentrates small superpixels in regions of the image with more information, while letting larger superpixels cover more homogeneous regions. By both improving the initialization using the boundaries and coherency of the superpixels with geodesic distances, we are able to maintain the coherency of the image structure with fewer superpixels than other approaches. We show the resulting algorithm to yield smaller Variation of Information metrics in seven different datasets while maintaining Undersegmentation Error values similar to the state-of-the-art methods.

    DOI

  • Multi-Modal Fashion Product Retrieval

    Antonio Rubio, Longlong Yu, Edgar Simo-Serra, Francesc Moreno-Noguer

    The 6th Workshop on Vision and Language    2017.04  [Refereed]

    DOI

  • Banknote Portrait Detection Using Convolutional Neural Networks

    Ryutaro Kitagawa, Yoshihiko Mochizuki, Satoshi Iizuka, Edgar Simo-Serra, Naotake Natori, Hiroshi Matsuki, Hiroshi Ishikawa

    19th Meeting on Image Recognition and Understanding (MIRU)    2016.08

  • Automatic Completion of Line Drawings Using Fully Convolutional Neural Networks

    Kazuma Sasaki, Satoshi Iizuka, Edgar Simo-Serra, Hiroshi Ishikawa

    19th Meeting on Image Recognition and Understanding (MIRU)    2016.08

  • Structured Prediction with Output Embeddings for Semantic Image Annotation

    Ariadna Quattoni, Arnau Ramisa, Pranava Swaroop Madhyastha, Edgar Simo-Serra, Francesc Moreno-Noguer

    Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT, Short)    2016.06  [Refereed]

  • Automatic Recognition of Ground Objects in Earth Observation Satellite Imagery

    Ryosuke Nakamura, Tomohiro Ishii, Hirokazu Nosato, Hidenori Sakanashi, Edgar Simo-Serra, Yoshihiko Mochizuki, Satoshi Iizuka, Hiroshi Ishikawa

    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI)    2016.06

  • Understanding Human-Centric Images: From Geometry to Fashion

    Edgar Simo-Serra

    PhD Thesis    2015.07  [Refereed]

  • Efficient Monocular Pose Estimation for Complex 3D Models

    Antonio Rubio, Michael Villamizar, Luis Ferraz, Adrián Peñate-Sánchez, Arnau Ramisa, Edgar Simo-Serra, Alberto Sanfeliu, Francesc Moreno-Noguer

    International Conference in Robotics and Automation (ICRA)    2015.05  [Refereed]

    DOI

  • Lie Algebra-Based Kinematic Prior for 3D Human Pose Tracking

    Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer

    International Conference on Machine Vision Applications (MVA)    2015.05  [Refereed]

    DOI

  • A High Performance CRF Model for Clothes Parsing

    Edgar Simo-Serra, Sanja Fidler, Francesc Moreno-Noguer, Raquel Urtasun

    Asian Conference on Computer Vision (ACCV)    2014.11  [Refereed]

    DOI

  • Geodesic Finite Mixture Models

    Edgar Simo-Serra, Carme Torras, Francesc Moreno-Noguer

    British Machine Vision Conference (BMVC)    2014.09  [Refereed]

    DOI

  • Kinematic Synthesis of Multi-Fingered Robotic Hands for Finite and Infinitesimal Tasks

    Edgar Simo-Serra, Alba Perez-Gracia, Hyosang Moon, Nina Robson

    Advances in Robot Kinematics (ARK)    2012.06  [Refereed]

    DOI

  • Design of Non-Anthropomorphic Robotic Hands for Anthropomorphic Tasks

    Edgar Simo-Serra, Francesc Moreno-Noguer, Alba Perez-Gracia

    ASME International Design Engineering Technical Conferences (IDETC)    2011.07  [Refereed]

    DOI

  • Kinematic Model of the Hand using Computer Vision

    Edgar Simo-Serra

    Degree Thesis    2011.05  [Refereed]


Misc

  • Erratum to "Kinematic Synthesis Using Tree Topologies" [Mechanism and Machine Theory 72 (2013) 94]

    Edgar Simo-Serra, Alba Perez-Gracia

    Mechanism and Machine Theory   73   314 - 315  2014.03

    Other  

    DOI

Awards

  • Selected for "Remarkable Contributions to Science and Technology 2018" (Nice Step Researchers)

    2018.11   Ministry of Education, Culture, Sports, Science and Technology (MEXT)

  • Best Paper Award

    2017.10   International Conference on Computer Vision - Computer Vision for Fashion Workshop (ICCV-CVF)  

  • Special Doctoral Award

    2017.09   BarcelonaTech  

  • Innovative Technologies 2016 Special Prize "Culture"

    2016.10   Ministry of Economy, Trade and Industry (METI)

  • Best Doctoral Thesis Award

    2016.10   Catalan Association for Artificial Intelligence  

  • ACIA Award for Best Dissemination of Artificial Intelligence Research 2015

    2015.10  

  • Best Paper Award at the International Conference on Machine Vision Applications (MVA) 2015

    2015.05  


Research Projects

  • Augmenting Content Creation with Interactive Personalization AI

    Japan Science and Technology Agency (JST)

    Project Year :

    2018.04
    -
    2021.03
     

  • Interactive Content Creation with Artificial Intelligence Based on Unsupervised Learning

    Japan Science and Technology Agency (JST)

    Project Year :

    2016.12
    -
    2018.03
     

Presentations

  • Image Processing Techniques Using Deep Learning

    Edgar Simo-Serra  [Invited]

    Nice Step Researchers 2018 Lecture

    Presentation date: 2019.06

  • Assisted Inking of Rough Sketches with Interactive Neural Networks

    Edgar Simo-Serra  [Invited]

    JST CREST HCI for Machine Learning Symposium 

    Presentation date: 2019.03

  • Supporting Content Creation with Interactive Neural Networks

    Edgar Simo-Serra  [Invited]

    173rd CGVI Research Meeting

    Presentation date: 2019.03

  • Towards Augmenting Creative Processes with Machine Learning

    Edgar Simo-Serra  [Invited]

    Computer Science Student Workshop 

    Presentation date: 2018.08

  • Automatic Conversion to Line Drawings via Adversarial Data Augmentation

    Edgar Simo-Serra  [Invited]

    Visual Computing Symposium

    Presentation date: 2018.06

  • Smart Inker: Assisted Inking of Rough Sketches

    Edgar Simo-Serra  [Invited]

    Visual Computing Symposium

    Presentation date: 2018.06

  • Semi-Supervised Learning of Sketch Simplification

    Edgar Simo-Serra  [Invited]

    The Deep Learning Workshop 2018 

    Presentation date: 2018.03

  • Fundamentals of Deep Learning and Toward Its Adoption

    Edgar Simo-Serra  [Invited]

    IEICE General Conference

    Presentation date: 2018.03

  • Image Transformation Using Deep Networks

    Edgar Simo-Serra  [Invited]

    69th Stereo Club Tokyo Regular Meeting (Spring) + Workshop on Advanced Visual Expression

    Presentation date: 2018.03

  • Automatic Colorization of Black-and-White Photographs Using Deep Networks

    Edgar Simo-Serra  [Invited]

    3D Forum

    Presentation date: 2018.01

  • Exploiting the Web to Understand Fashion

    Edgar Simo-Serra  [Invited]

    ICCV2017 Computer Vision for Fashion Workshop 

    Presentation date: 2017.10

  • Leveraging the Web for Fashion and Image Understanding

    Edgar Simo-Serra  [Invited]

    Workshop on E-Commerce and Entertainment Computing (ECEC) 

    Presentation date: 2017.09

  • Image Transformation with Deep Learning

    Edgar Simo-Serra  [Invited]

    CVIM Research Meeting, September 2017

    Presentation date: 2017.09

  • Image Generation with Deep Learning

    Edgar Simo-Serra  [Invited]

    IAIP Regular Research Meeting on Big Data and Image Recognition

    Presentation date: 2017.07

  • Frontiers of Image Processing and Computer Graphics by Deep Learning

    Edgar Simo-Serra  [Invited]

    Computer Graphics International 2017 

    Presentation date: 2017.06

  • Image Generation with Deep Learning

    Edgar Simo-Serra  [Invited]

    42nd Optics Symposium

    Presentation date: 2017.06

  • Towards Mastering the Image: Deep Learning for Computer Graphics

    Edgar Simo-Serra  [Invited]

    Frontiers of Images and Artificial Intelligence

    Presentation date: 2017.01

  • Image Generation with Fully Convolutional Neural Networks

    Edgar Simo-Serra  [Invited]

    4th MSR Intern Alumni Meetup

    Presentation date: 2016.12

  • Frontiers of Image Generation with Deep Learning (Applications)

    Edgar Simo-Serra  [Invited]

    3rd Autumn Meeting of the Federation of Imaging Societies

    Presentation date: 2016.11

  • Understanding Human-Centric Images: From Geometry to Fashion

    Edgar Simo-Serra  [Invited]

    19th International Conference of the Catalan Association for Artificial Intelligence 

    Presentation date: 2016.10

  • Fashion Style in 128 Floats: Joint Ranking and Classification using Weak Data for Feature Extraction

    Edgar Simo-Serra  [Invited]

    MIRU2016: 19th Meeting on Image Recognition and Understanding

    Presentation date: 2016.08

  • Learning to Simplify Sketches

    Edgar Simo-Serra  [Invited]

    DCAJ Presentation "Industrial Application of Content Technology in Japan"

    Presentation date: 2016.07

  • Automatic Simplification of Rough Sketches Using Fully Convolutional Neural Networks

    Edgar Simo-Serra  [Invited]

    Visual Computing Symposium

    Presentation date: 2016.06


 

Syllabus


 

Committee Memberships

  • 2018.04
    -
    2022.03

    IPSJ Special Interest Group on Computer Vision and Image Media (SIG-CVIM)  Technical Committee Member