Updated: 2022/12/08

Heming Sun
孫 鶴鳴 (ソン カクメイ)
Scopus publication data
Papers: 0  Citations: 0  h-index: 14

Citation count is the number of citations to papers published in the given year.

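Affiliation
Faculty of Science and Engineering, Waseda Research Institute for Science and Engineering
Job title
Junior Researcher (Assistant Professor)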

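Concurrent affiliations in other faculties and graduate schools

  • Faculty of Science and Engineering   School of Fundamental Science and Engineering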

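Education

  • 2014.04 - 2017.03

    Waseda University   Graduate School of Information, Production and Systems   Doctoral program

  • 2012.09 - 2014.03

    Shanghai Jiao Tong University   Graduate School of Electronic Information and Electrical Engineering   Master's program

  • 2010.09 - 2012.09

    Waseda University   Graduate School of Information, Production and Systems   Master's program

  • 2007.09 - 2011.07

    Shanghai Jiao Tong University   School of Electronic Information and Electrical Engineering

Degree

  • 2017.03   Waseda University   Doctor of Engineering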

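Career

  • 2019.10 - Present

    Japan Science and Technology Agency

  • 2018.09 - Present

    Waseda University   Junior Researcher

  • 2017.04 - 2018.09

    NEC Corporation   Central Research Laboratories   Researcher

  • 2016.07 - 2016.09

    The University of Tokyo

  • 2015.08 - 2015.09

    University of California, Davis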

Professional memberships

  • IEEE

  • The Institute of Electronics, Information and Communication Engineers (IEICE)

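Research areas

  • Electron devices and electronic equipment

  • Computer systems

  • Perceptual information processing

Research keywords

  • Video processing

  • Large-scale integrated circuits

  • Deep learning

  • High-performance computing

Papers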

  • Improving Latent Quantization of Learned Image Compression with Gradient Scaling

    Heming Sun, Lu Yu, Jiro Katto

    IEEE International Conference on Visual Communications and Image Processing    2022.12  [Refereed]

    Role: Lead author

  • F-LIC: FPGA-based Learned Image Compression with a Fine-grained Pipeline

    Heming Sun, Qingyang Yi, Fangzheng Lin, Lu Yu, Jiro Katto, Masahiro Fujita

    IEEE Asian Solid-State Circuits Conference    2022.11  [Refereed]

    Role: Lead author

  • Learned Video Compression with Residual Prediction and Feature-aided Loop Filter

    Chao Liu, Heming Sun, Xiaoyang Zeng, Yibo Fan

    IEEE International Conference on Image Processing    2022.10

  • Memory-Efficient Learned Image Compression with Pruned Hyperprior Module

    Ao Luo, Heming Sun, Jinming Liu, Jiro Katto

    IEEE International Conference on Image Processing    2022.10

  • Streaming-capable High-performance Architecture of Learned Image Compression Codecs

    Fangzheng Lin, Heming Sun, Jiro Katto

    IEEE International Conference on Image Processing    2022.10

  • Deep Image Compression Based on Multi-scale Deformable Convolution

    Daowen Li, Yingming Li, Heming Sun, Lu Yu

    Journal of Visual Communication and Image Representation    2022.08  [Refereed]

  • Improving Multiple Machine Vision Tasks in the Compressed Domain

    Jinming Liu, Heming Sun, Jiro Katto

    International Conference on Pattern Recognition    2022.08

  • A QP-adaptive Mechanism for CNN-based Filter in Video Coding

    Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE International Symposium on Circuits and Systems    2022.05

  • Fast Intra Mode Decision for VVC Based on Histogram of Oriented Gradient

    Aorui Gou, Heming Sun, Jiro Katto, Tingting Li, Xiaoyang Zeng, Yibo Fan

    IEEE International Symposium on Circuits and Systems    2022.05

  • An Area-efficient Unified Transform Architecture for VVC

    Zhijian Hao, Qi Zheng, Yibo Fan, Guoqing Xiang, Peng Zhang, Heming Sun

    IEEE International Symposium on Circuits and Systems    2022.05

  • QA-Filter: A QP-Adaptive Convolutional Neural Network Filter for Video Coding

    Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE Transactions on Image Processing   31   3032 - 3045  2022.04

    Convolutional neural network (CNN)-based filters have achieved great success in video coding. However, in most previous works, individual models were needed for each quantization parameter (QP) band, which is impractical due to limited storage resources. To explore this, our work consists of two parts. First, we propose a frequency and spatial QP-adaptive mechanism (FSQAM), which can be directly applied to the (vanilla) convolution to help any CNN filter handle different quantization noise. From the frequency domain, a FQAM that introduces the quantization step (Qstep) into the convolution is proposed. When the quantization noise increases, the ability of the CNN filter to suppress noise improves. Moreover, SQAM is further designed to compensate for the FQAM from the spatial domain. Second, based on FSQAM, a QP-adaptive CNN filter called QA-Filter that can be used under a wide range of QP is proposed. By factorizing the mixed features to high-frequency and low-frequency parts with the pair of pooling and upsampling operations, the QA-Filter and FQAM can promote each other to obtain better performance. Compared to the H.266/VVC baseline, average 5.25% and 3.84% BD-rate reductions for luma are achieved by QA-Filter with default all-intra (AI) and random-access (RA) configurations, respectively. Additionally, an up to 9.16% BD-rate reduction is achieved on the luma of sequence BasketballDrill. Besides, FSQAM achieves measurably better BD-rate performance compared with the previous QP map method.

    Citations (Scopus): 1

  • An Efficient Low-Complexity Convolutional Neural Network Filter

    Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE Multimedia   29 ( 2 ) 83 - 93  2022

    Convolutional neural network (CNN) filters have achieved significant performance in video artifacts reduction. However, the high complexity of existing methods makes them difficult to be applied in actual usage. In this article, an efficient low-complexity CNN filter is proposed. We utilized depth separable convolution merged with the batch normalization as the backbone of our proposed CNN filter and presented a frame-level residual mapping (RM) to use one network to filter both intra- and intersamples. It is known that there will be an oversmoothing problem for the interframes if we directly use the filter trained with intrasamples. In this article, the proposed RM can effectively solve the oversmoothing problem. Besides, RM is flexible and can be combined with other learning-based filters. The experimental results show that our proposed method achieves a significant Bjøntegaard-delta (BD)-rate reduction than H.265/high efficiency video coding. The experiments show that the proposed network achieves about 1.2% BD-rate reduction and 79.1% decrease in FLOPs than VR-CNN. Our performance is better with less complexity than the previous work. The measurement on H.266/versatile video coding and ablation studies also ensure the effectiveness of the proposed method.

  • Research and examination on implementation of super-resolution models using deep learning with INT8 precision

    Shota Hirose, Naoki Wada, Jiro Katto, Heming Sun

    4th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2022 - Proceedings     133 - 137  2022

    Fixed-point arithmetic is a technique for treating weights and intermediate values as integers in deep learning. Since deep learning models generally store each weight as a 32-bit floating-point value, storing by 8-bit integers can reduce the size of the model. In addition, memory usage can be reduced, and inference can be much faster by hardware acceleration when special hardware for int8 inference is provided. On the other hand, when inferences are carried out by fixed-point weights, accuracy of the model is reduced due to loss of dynamic range of the weights and intermediate layer values. For this reason, inference frameworks such as TensorRT and TensorFlow Lite, provide a function called "calibration" to suppress the deterioration of the accuracy caused by quantization by measuring the distribution of input data and numerical values in the intermediate layer when quantization is performed. In this paper, after quantizing a pre-trained model that performs super-resolution, speed and accuracy are measured using TensorRT. As a result, the trade-off between the runtime and the accuracy is confirmed. The effect of calibration is also confirmed.

    Citations (Scopus): 1

  • Forward and Backward Warping for Optical Flow-Based Frame Interpolation

    Joi Shimizu, Heming Sun, Jiro Katto

    4th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2022 - Proceedings     82 - 86  2022

    Frame interpolation methods generate intermediate frames by taking consecutive frames as inputs. This enables the generation of high frame rate videos from low frame rate videos. Recently, many deep learning-based frame interpolation methods have been proposed. One way of frame interpolation is by using the bi-directional optical flow. In many cases, these methods use backward warping to warp the input images to the desired frame. However, forward warping can also be used to warp the input frames. In this paper, we propose a frame interpolation method that utilizes both forward warping and backward warping. Experimental results show that utilizing both warping methods can enhance the performance compared to only using backward warping.

  • ViT-GAN: Using Vision Transformer as Discriminator with Adaptive Data Augmentation

    Shota Hirose, Naoki Wada, Jiro Katto, Heming Sun

    2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021     185 - 189  2021.06

    These days, attention is thought to be an efficient way to recognize an image. Vision Transformer (ViT) uses a Transformer for images and has very high performance in image recognition. ViT has fewer parameters than Big Transfer (BiT) and Noisy Student. Therefore, we consider that Self-Attention-based networks are slimmer than convolution-based networks. We use a ViT as a Discriminator in a Generative Adversarial Network (GAN) to get the same performance with a smaller model. We name it ViT-GAN. Besides, we find parameter sharing is very useful to make parameter-efficient ViT. However, the performances of ViT heavily depend on the number of data samples. Therefore, we propose a new method of Data Augmentation. Our Data Augmentation, in which the strength of Data Augmentation varies adaptively, helps ViT for faster convergence and better performance. With our Data Augmentation, we show ViT-based discriminator can achieve almost the same FID but the number of the parameters of the discriminator is 35% fewer than the original discriminator.

    Citations (Scopus): 3

  • Learned Image Compression with Fixed-point Arithmetic

    Heming Sun, Lu Yu, Jiro Katto

    Picture Coding Symposium    2021.06  [Refereed]

    Role: Lead author

    Citations (Scopus): 2

  • A Hardware Architecture for Adaptive Loop Filter in VVC Decoder

    Xin Wang, Heming Sun, Jiro Katto, Yibo Fan

    Proceedings of International Conference on ASIC    2021

    Adaptive Loop Filter (ALF) is a new technique proposed by the latest video coding standard Versatile Video Coding (VVC). To the best of our knowledge, this paper is the first implementation to design a hardware architecture of ALF in VVC decoder. The implementation reduces 60% of the memory access, saves approximately 50% of the hardware resources including adders and multipliers cost, increases the throughput and makes the hardware configurable for luma and chroma components. The synthesis result demonstrates that the architecture achieves a throughput of 4k@120fps and a maximum frequency of 367MHz under the TSMC 65nm process.

    Citations (Scopus): 1

  • Fast Object Detection in HEVC Intra Compressed Domain

    Liuhong Chen, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    European Signal Processing Conference   2021-August   756 - 760  2021

    Conventional object detection methods are in the pixel domain and require full decoding with high computational complexity. In this paper, we propose a fast object detection method in the intra compressed domain of High Efficiency Video Coding (HEVC), which significantly accelerates the object detection process that uses compressed video images. Considering the characteristics of various coding features, we select 3 types of data for object detection, including partitioning depths, prediction modes, and residuals. To achieve a more discriminative representation of the residuals, we design an iterative restoration algorithm that can generate the details of the original image and reduce the noise in the residuals. Extensive evaluations on multiple HEVC test sequences and large-scale object detection dataset BDD100K confirm the effectiveness of our method. With a slight reduction in detection accuracy, our compressed domain detection system runs 1.8 times faster than the pixel domain.

    Citations (Scopus): 2

  • Learning in Compressed Domain for Faster Machine Vision Tasks

    Jinming Liu, Heming Sun, Jiro Katto

    2021 International Conference on Visual Communications and Image Processing, VCIP 2021 - Proceedings    2021

    Learned image compression (LIC) has illustrated good ability for reconstruction quality driven tasks (e.g. PSNR, MS-SSIM) and machine vision tasks such as image understanding. However, most LIC frameworks are based on pixel domain, which requires the decoding process. In this paper, we develop a learned compressed domain framework for machine vision tasks. 1) By sending the compressed latent representation directly to the task network, the decoding computation can be eliminated to reduce the complexity. 2) By sorting the latent channels by entropy, only selective channels will be transmitted to the task network, which can reduce the bitrate. As a result, compared with the traditional pixel domain methods, we can reduce about 1/3 multiply-add operations (MACs) and 1/5 inference time while keeping the same accuracy. Moreover, proposed channel selection can contribute to at most 6.8% bitrate saving.

  • Deep Pedestrian Density Estimation For Smart City Monitoring.

    Kazuki Murayama, Kenji Kanai, Masaru Takeuchi, Heming Sun, Jiro Katto

    ICIP     230 - 234  2021

  • Accelerating convolutional neural network inference based on a reconfigurable sliced systolic array

    Yixuan Zeng, Heming Sun, Jiro Katto, Yibo Fan

    Proceedings - IEEE International Symposium on Circuits and Systems   2021-May  2021

    Convolutional neural networks (CNNs) have achieved great successes on many computer vision tasks, such as image recognition, video processing, and target detection. In recent years, many hardware designs have been devoted to accelerating CNN inference. In order to further speed up CNN inference and reduce data waste, this work proposed a reconfigurable sliced systolic array: 1) Depending on the number of network nodes in each layer, the slice mode could be dynamically configured to achieve high throughput and resource utilization. 2) To take full advantage of convolution reuse and weight reuse, this work designed a tile-column sliding (TCS) processing dataflow. 3) A four-stage for loop algorithm was employed, which divides the CNN calculation into several parts based on the input nodes and output nodes. The entire CNN inference is carried out using integer-only arithmetic originated from TensorLite. Experimental results prove that these strategies lead to significant improvement in inference performance and energy efficiency.

    Citations (Scopus): 1

  • Approximated reconfigurable transform architecture for VVC

    Yixuan Zeng, Heming Sun, Jiro Katto, Yibo Fan

    Proceedings - IEEE International Symposium on Circuits and Systems   2021-May  2021

    As the demand for high-resolution videos grows, the next generation video coding standard Versatile Video Coding introduces many new proposals, including Adaptive Multiple Transforms (AMT), to improve coding efficiency. This paper presents a reconfigurable transform core for the VVC standard where the implementation of 1D DST-VII and DCT-VIII for all transform sizes are enabled. To offer a very low circuit complexity, a simple approximation strategy with a little coding performance loss is proposed. An 8×8 Processing Element (PE) array is employed as the core computational unit, where each PE can be configured dynamically based on the transform type. In addition, the transforms of larger sizes can be realized in the finite PE units with the Partitioned Matrix Multiplication (PMM) scheme. The experimental and synthesis results show that this design can save at least 29.1% area compared with other works in literature with the negligible degradation of video quality and a slight increase in the bit rate.

    Citations (Scopus): 5

  • Fully Neural Network Mode Based Intra Prediction of Variable Block Size

    Heming Sun, Lu Yu, Jiro Katto

    IEEE International Conference on Visual Communications and Image Processing (VCIP)     21 - 24  2020.12  [Refereed]

    Role: Lead author

    Citations (Scopus): 3

  • Enhanced Intra Prediction for Video Coding by Using Multiple Neural Networks

    Heming Sun, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto

    IEEE Transactions on Multimedia   22 ( 11 ) 2764 - 2779  2020.11  [Refereed]

    Role: Lead author

    This paper enhances the intra prediction by using multiple neural network modes (NM). Each NM serves as an end-to-end mapping from the neighboring reference blocks to the current coding block. For the provided NMs, we present two schemes (appending and substitution) to integrate the NMs with the traditional modes (TM) defined in high efficiency video coding (HEVC). For the appending scheme, each NM is corresponding to a certain range of TMs. The categorization of TMs is based on the expected prediction errors. After determining the relevant TMs for each NM, we present a probability-aware mode signaling scheme. The NMs with higher probabilities to be the best mode are signaled with fewer bits. For the substitution scheme, we propose to replace the highest and lowest probable TMs. New most probable mode (MPM) generation method is also employed when substituting the lowest probable TMs. Experimental results demonstrate that using multiple NMs will improve the coding efficiency apparently compared with the single NM. Specifically, proposed appending scheme with seven NMs can save 2.6%, 3.8%, and 3.1% BD-rate for Y, U, and V components compared with using single NM in the state-of-the-art works.

  • HEVC video coding with deep learning based frame interpolation

    Joi Shimizu, Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020     433 - 434  2020.10

    Recent researches in video frame interpolation show great progress. In this paper, we propose a novel video compression method which incorporates deep learning based frame interpolation into HEVC which is the current video compression standard. Experimental results show that our approach can outperform HEVC in some scenarios.

    Citations (Scopus): 1

  • End-To-End Learned Image Compression With Fixed Point Weight Quantization

    Heming Sun, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto

    2020 IEEE International Conference on Image Processing (ICIP)   2020-October   3359 - 3363  2020.10  [Refereed]

    Role: Lead author

    Learned image compression (LIC) has reached the traditional hand-crafted methods such as JPEG2000 and BPG in terms of the coding gain. However, the large model size of the network prohibits the usage of LIC on resource-limited embedded systems. This paper presents a LIC with 8-bit fixed-point weights. First, we quantize the weights in groups and propose a non-linear memory-free codebook. Second, we explore the optimal grouping and quantization scheme. Finally, we develop a novel weight clipping fine tuning scheme. Experimental results illustrate that the coding loss caused by the quantization is small, while around 75% model size can be reduced compared with the 32-bit floating-point anchor. As far as we know, this is the first work to explore and evaluate the LIC fully with fixed-point weights, and our proposed quantized LIC is able to outperform BPG in terms of MS-SSIM.

  • Scalable Learned Image Compression With A Recurrent Neural Networks-Based Hyperprior

    Rige Su, Zhengxue Cheng, Heming Sun, Jiro Katto

    2020 IEEE International Conference on Image Processing (ICIP)   2020-October   3369 - 3373  2020.10  [Refereed]

    Recently learned image compression has achieved many great progresses, such as representative hyperprior and its variants based on convolutional neural networks (CNNs). However, CNNs are not fit for scalable coding and multiple models need to be trained separately to achieve variable rates. In this paper, we incorporate differentiable quantization and accurate entropy models into recurrent neural networks (RNNs) architectures to achieve a scalable learned image compression. First, we present an RNN architecture with quantization and entropy coding. To realize the scalable coding, we allocate the bits to multiple layers, by adjusting the layer-wise lambda values in Lagrangian multiplier-based rate-distortion optimization function. Second, we add an RNN-based hyperprior to improve the accuracy of entropy models for multiple-layer residual representations. Experimental results demonstrate that our performance can be comparable with recent CNN-based hyperprior methods on Kodak dataset. Besides, our method is a scalable and flexible coding approach, to achieve multiple rates using one single model, which is very appealing.

  • A Pipelined 2D Transform Architecture Supporting Mixed Block Sizes for the VVC Standard

    Yibo Fan, Yixuan Zeng, Heming Sun, Jiro Katto, Xiaoyang Zeng

    IEEE Transactions on Circuits and Systems for Video Technology   30 ( 9 ) 3289 - 3295  2020.09  [Refereed]

    Citations (Scopus): 17

  • A Learning-Based Low Complexity in-Loop Filter for Video Coding

    Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)    2020.07  [Refereed]

    With the continuous development of mobile devices, it becomes possible for people to demand higher definition videos. To alleviate the pressure of deploying the video codec in mobile multimedia, a learning-based low complexity in-loop filter is proposed in this paper. Depthwise separable convolution is combined with batch normalization to construct this model. To enhance its performance, the knowledge from a pre-trained teacher model is transferred to it. However, the over-smoothing problem in the inter frames caused by double enhancing effect remains. To solve this, a Wiener-based filtering algorithm that tries to restore the distortion from the learned residuals is designed and introduces an adequate filtering effect. The experimental results show that our proposed methods achieve considerable BD-rate reduction than HEVC anchor. Compared with the previous state-of-the-art work VR-CNN, our model achieves 1.65% extra BD-rate reduction, 79.1% decrease in FLOPs, 25% decrease in encoding complexity, and 70% decoding complexity decrease.

  • Low Bitrate Image Compression with Discretized Gaussian Mixture Likelihoods

    Zhengxue Cheng, Heming Sun, Jiro Katto

    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)   2020-June   543 - 546  2020.06  [Refereed]

    In this paper, we provide a detailed description on our submitted method Kattolab to Workshop and Challenge on Learned Image Compression (CLIC) 2020. Our method mainly incorporates discretized Gaussian Mixture Likelihoods to previous state-of-the-art learned compression algorithms. Besides, we also describe the acceleration strategies and bit optimization with the low-rate constraint. Experimental results have demonstrated that our approach Kattolab achieves 0.9761 in terms of MS-SSIM at the rate constraint of 0.15 bpp during the validation phase.

  • Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)     7936 - 7945  2020.06  [Refereed]

    Image compression is a fundamental research field and many well-known compression standards have been developed for many decades. Recently, learned compression methods exhibit a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into network architecture to enhance the performance. Experimental results demonstrate our proposed method achieves a state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge our approach is the first work to achieve comparable performance with latest compression standard Versatile Video Coding (VVC) regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM.

  • An Image Compression Framework with Learning-based Filter

    Heming Sun, Chao Liu, Jiro Katto, Yibo Fan

    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)   2020-June   602 - 606  2020.06  [Refereed]

    Role: Lead author

    In this paper, a coding framework VIP-ICT-Codec is introduced. Our method is based on the VTM (Versatile Video Coding Test Model). First, we propose a color space conversion from RGB to YUV domain by using a PCA-like operation. A method for the PCA mean calculation is proposed to de-correlate the residual components of YUV channels. Besides, the correlation of UV components is compensated considering that they share the same coding tree in VVC. We also learn a residual mapping to alleviate the over-filtered and under-filtered problem of specific images. Finally, we regard the rate control as an unconstraint Lagrangian problem to reach the target bpp. The results show that we achieve 32.625dB at the validation phase.

  • Learned Lossless Image Compression with A Hyperprior and Discretized Gaussian Mixture Likelihoods

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)   2020-May   2158 - 2162  2020.05  [Refereed]

    Lossless image compression is an important task in the field of multimedia communication. Traditional image codecs typically support lossless mode, such as WebP, JPEG2000, FLIF. Recently, deep learning based approaches have started to show the potential at this point. HyperPrior is an effective technique proposed for lossy image compression. This paper generalizes the hyperprior from lossy model to lossless compression, and proposes a L2-norm term into the loss function to speed up training procedure. Besides, this paper also investigated different parameterized models for latent codes, and propose to use Gaussian mixture likelihoods to achieve adaptive and flexible context models. Experimental results validate our method can outperform existing deep learning based lossless compression, and outperform the JPEG2000 and WebP for JPG images.

  • Energy Compaction-Based Image Compression Using Convolutional AutoEncoder

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    IEEE Transactions on Multimedia   22 ( 4 ) 860 - 873  2020.04  [Refereed]

    Citations (Scopus): 24

  • Approximate FPGA-Based Multipliers Using Carry-Inexact Elementary Modules.

    Yi Guo, Heming Sun, Ping Lei, Shinji Kimura

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   103-A ( 9 ) 1054 - 1062  2020

    Citations (Scopus): 1

  • A Fast QTMT Partition Decision Strategy for VVC Intra Prediction

    Yibo Fan, Jun'An Chen, Heming Sun, Jiro Katto, Ming'E Jing

    IEEE Access   8   107900 - 107911  2020  [Refereed]

    Different from the traditional quaternary tree (QT) structure utilized in the previous generation video coding standard H.265/HEVC, a brand new partition structure named quadtree with nested multi-type tree (QTMT) is applied in the latest codec H.266/VVC. The introduction of QTMT brings in superior encoding performance at the cost of great time-consuming. Therefore, a fast intra partition algorithm based on variance and Sobel operator is proposed in this paper. The proposed method settles the novel asymmetrical partition issue in VVC by well balancing the reduction of computational complexity and the loss of encoding quality. To be more concrete, we first terminate further splitting of a coding unit (CU) when the texture of it is judged as smooth. Then, we use Sobel operator to extract gradient features to decide whether to split this CU by QT, thus terminating further MT partitions. Finally, a completely novel method to choose only one partition from five QTMT partitions is applied. Obviously, homogeneous area tends to use a larger CU as a whole to do prediction while CUs with complicated texture are prone to be divided into small sub-CUs and these sub-CUs usually have different textures from each other. We calculate the variance of variance of each sub-CU to decide which partition will distinguish the sub-textures best. Our method is embedded into the latest VVC official reference software VTM-7.0. Comparing to anchor VTM-7.0, our method saves the encoding time by 49.27% on average at the cost of only 1.63% BDBR increase. As a traditional scheme based on variance and gradient to decrease the computational complexity in VVC intra coding, our method outperforms other relative existing state-of-the-art methods, including traditional machine learning and convolution neural network methods.

  • CNN Based Optimal Intra Prediction Mode Estimation in Video Coding

    Ryota Yokoyama, Masahiko Tahara, Masaru Takeuchi, Heming Sun, Yasutaka Matsuo, Jiro Katto

    IEEE International Conference on Consumer Electronics (ICCE)   2020-January  2020.01  [Refereed]

    Citations (Scopus): 2

  • Small-Area and Low-Power FPGA-Based Multipliers using Approximate Elementary Modules.

    Yi Guo, Heming Sun, Shinji Kimura

    Asia and South Pacific Design Automation Conference (ASP-DAC)     599 - 604  2020  [Refereed]

    Citations (Scopus): 7

  • Fast Variance- and Gradient-based QTMT Partition Decision Algorithm in VVC Intra Coding

    Jun’an Chen, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE International Conference on Visual Communications and Image Processing (VCIP)    2019.12  [Refereed]

    Citations (Scopus): 28

  • Dual Learning-based Video Coding with Inception Dense Blocks

    Chao Liu, Heming Sun, Jun’an Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    Picture Coding Symposium (PCS)    2019.11  [Refereed]

    Citations (Scopus): 5

  • Road Infrastructure Monitoring System using E-Bikes and Its Extensions for Smart Community

    Jiro Katto, Masaru Takeuchi, Kenji Kanai, Heming Sun

    Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM     43 - 44  2019.10

    In this paper, we present a road infrastructure monitoring system that uses e-bikes to support smart communities. Smartphones and IoT (Internet of Things) devices are attached to the e-bikes and powered from the e-bike batteries. Image processing techniques based on deep learning are applied and run on the IoT devices or on a cloud backbone. A prototype system is implemented that covers the full flow from image capture to web browsing of the processed images.

    Citations (Scopus): 6
  • A Gamut Extension Method considering Color Information Restoration using Convolutional Neural Networks

    Masaru Takeuchi, Yusuke Sakamoto, Ryota Yokoyama, Heming Sun, Yasutaka Matsuo, Jiro Katto

    IEEE International Conference on Image Processing (ICIP)   2019-September   774 - 778  September 2019  [Peer-reviewed]

    Citations (Scopus): 3
  • Perceptual Quality Study on Deep Learning based Image Compression

    Zhengxue Cheng, Pinar Akyazi, Heming Sun, Jiro Katto, Touradj Ebrahimi

    IEEE International Conference on Image Processing (ICIP)   2019-September   719 - 723  September 2019  [Peer-reviewed]

    Citations (Scopus): 10
  • Deep Residual Learning for Image Compression

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    Conference on Computer Vision and Pattern Recognition (CVPR) Workshops    June 2019  [Peer-reviewed]

  • Learning Image and Video Compression through Spatial-Temporal Energy Compaction

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    Conference on Computer Vision and Pattern Recognition (CVPR)   2019-June   10063 - 10072  June 2019  [Peer-reviewed]

    Citations (Scopus): 43
  • A Minimal Adder-oriented 1D DST-VII/DCT-VIII Hardware Implementation for VVC Standard

    Yixuan Zeng, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE International System-on-chip Conference (ISOCC)   2019-September   176 - 180  June 2019  [Peer-reviewed]

    Citations (Scopus): 3
  • Design of Low-Cost Approximate Multipliers Based on Probability-Driven Inexact Compressors.

    Yi Guo, Heming Sun, Ping Lei, Shinji Kimura

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   102-A ( 12 ) 1781 - 1791  2019  [Peer-reviewed]

  • Gamut-Extension Methods Considering Color Information Restoration

    Masaru Takeuchi, Yusuke Sakamoto, Ryota Yokoyama, Heming Sun, Yasutaka Matsuo, Jiro Katto

    IEEE Access   7   80146 - 80158  2019  [Peer-reviewed]

    Abstract:

    Recently, ultra high definition television (UHDTV) services delivered via satellites and the Internet have become popular. However, broadcasting companies and storage devices still hold extensive volumes of high definition television (HDTV) and standard definition television (SDTV) content. Herein we propose two color space conversion (also known as gamut mapping) methods from BT.709 (used for current HDTV broadcast) to BT.2020 (used for UHDTV broadcast) that restore or estimate lost color information. One of our methods anisotropically diffuses the BT.709 chromaticities with regard to the direction of the original chromaticities in the BT.2020 color space, generating chromaticities outside the BT.709 gamut. The other learns an end-to-end conversion from a BT.709 image to a BT.2020 image and restores lost color information using a convolutional neural network (CNN). Using these methods along with BT.709 images, we obtain BT.2020 images with chromaticities from the BT.709 color gamut.

  • Approximate DCT Design for Video Encoding Based on Novel Truncation Scheme.

    Heming Sun, Zhengxue Cheng, Amir Masoud Gharehbaghi, Shinji Kimura, Masahiro Fujita

    IEEE Trans. Circuits Syst. I Regul. Pap.   66-I ( 4 ) 1517 - 1530  2019  [Peer-reviewed]

    Citations (Scopus): 21
  • Energy-Efficient and High-Speed Approximate Signed Multipliers with Sign-Focused Compressors.

    Yi Guo, Heming Sun, Shinji Kimura

    IEEE International System-on-chip Conference (ISOCC)     330 - 335  2019  [Peer-reviewed]

    Citations (Scopus): 1
  • Deep Convolutional AutoEncoder-based Lossy Image Compression

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    Picture Coding Symposium     253 - 257  June 2018  [Peer-reviewed]

  • Performance Comparison of Convolutional AutoEncoders, Generative Adversarial Networks and Super-Resolution for Image Compression

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops     2613 - 2616  June 2018  [Peer-reviewed]

  • Lossy Image Compression using Deep Convolutional AutoEncoder

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    IEICE technical report   118 ( 73 ) 15 - 20  June 2018

  • Sparse ternary connect: Convolutional neural networks using ternarized weights with enhanced sparsity.

    Canran Jin, Heming Sun, Shinji Kimura

    Asia and South Pacific Design Automation Conference (ASP-DAC)     190 - 195  2018  [Peer-reviewed]

    Citations (Scopus): 4
  • Design of Power and Area Efficient Lower-Part-OR Approximate Multiplier.

    Yi Guo, Heming Sun, Shinji Kimura

    IEEE Region 10 Conference (TENCON)     2110 - 2115  2018  [Peer-reviewed]

    Citations (Scopus): 9
  • Low-Cost Approximate Multiplier Design using Probability-Driven Inexact Compressors.

    Yi Guo, Heming Sun, Li Guo, Shinji Kimura

    IEEE Asia Pacific Conference on Circuits and Systems     291 - 294  2018  [Peer-reviewed]

    Citations (Scopus): 21
  • Fast Algorithm and VLSI Architecture of Rate Distortion Optimization in H.265/HEVC

    Heming Sun, Dajiang Zhou, Landan Hu, Shinji Kimura, Satoshi Goto

    IEEE TRANSACTIONS ON MULTIMEDIA   19 ( 11 ) 2375 - 2390  November 2017  [Peer-reviewed]

    Abstract:

    In H.265/high efficiency video coding (HEVC) encoding, rate distortion optimization (RDO) is an important cost function for mode decision and coding structure decision. Despite being near-optimum in terms of coding efficiency, RDO suffers from high complexity. To address this problem, this paper presents a fast RDO algorithm and its very large scale integration (VLSI) implementation for both intra- and inter-frame coding. The proposed algorithm employs a quantization-free framework that significantly reduces the complexity of rate and distortion optimization. Meanwhile, it maintains a low degradation of coding efficiency by taking the syntax element organization and probability model of HEVC into consideration. The algorithm is also designed with hardware architecture in mind to support an efficient VLSI implementation. When implemented in the HEVC test model, the proposed algorithm achieves 62% RDO time reduction with 1.85% coding efficiency loss for the "all-intra" configuration. The hardware implementation achieves 1.6x higher normalized throughput relative to previous works, and it can support a throughput of 8K@30fps (for four fine-processed modes per prediction unit) with 256k logic gates when working at 200 MHz.

    Citations (Scopus): 12
  • Time-efficient and TSV-aware 3D gated clock tree synthesis based on self-tuning spectral clustering

    Fan Yang, Minghao Lin, Heming Sun, Shinji Kimura

    Midwest Symposium on Circuits and Systems   2017-   1200 - 1203  September 2017  [Peer-reviewed]

    Abstract:

    3D gated clock tree synthesis (CTS) mainly consists of three steps: 1) abstract clock topology generation; 2) layer embedding for minimal TSV allocation; and 3) clock tree routing with gate and buffer insertion. In this paper, a self-tuning spectral clustering based nearest-neighbor selection (SSC-NNS) algorithm with a parallel structure is proposed to achieve high time efficiency in clock tree topology generation. In addition, a postorder traversal based layer embedding (PTLE) strategy is adopted for determining the embedding layer of internal nodes with minimal TSV usage. Experimental results show that the proposed method achieves 32% and 82% runtime reduction on ISPD2009 and IBM benchmarks respectively compared with the state-of-the-art 3D work. Besides, the TSV count is also reduced by 46% on ISPD2009 benchmarks.

  • A low-cost approximate 32-point transform architecture

    Heming Sun, Zhengxue Cheng, Amir Masoud Gharehbaghi, Shinji Kimura, Masahiro Fujita

    Proceedings - IEEE International Symposium on Circuits and Systems     1 - 4  September 2017  [Peer-reviewed]

    Abstract:

    This paper presents an area-efficient approximate method for the 32-point transform, which is one of the most area-consuming parts in High Efficiency Video Coding (HEVC) applications. Compared to prior work, this work reduces the hardware cost of the transform by 1) eliminating all the arithmetic operations on the 6 least significant bits (LSBs), 2) presenting a low-delay method for generating carry propagation from the remaining 5 LSBs, and 3) truncating the most significant bits (MSBs) according to the position of the component. In the implementation of a 32-point forward transform, the experimental results show that 27% of the area can be saved and that the coding efficiency loss caused by the approximation is only 0.044% compared with the original.

    Citations (Scopus): 1
  • High Accuracy 8×8 Approximate Multiplier based on OR Operation

    Yi Guo, Heming Sun, Canran Jin, Shinji Kimura

    IEICE technical report   116 ( 478 ) 19 - 24  March 2017

  • Accelerating HEVC Inter Prediction with Improved Merge Mode Handling

    Zhengxue Cheng, Heming Sun, Dajiang Zhou, Shinji Kimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E100A ( 2 ) 546 - 554  February 2017  [Peer-reviewed]

    Abstract:

    High Efficiency Video Coding (HEVC/H.265) achieves a 50% bit rate reduction over the H.264/AVC standard at comparable quality, at the cost of high computational complexity. Merge mode is one of the most important new features introduced in HEVC's inter prediction. Merge mode and traditional inter mode consume about 90% of the total encoding time. To address this high complexity, this paper utilizes the merge mode to accelerate inter prediction with four strategies. 1) A merge candidate decision based on the sum of absolute transformed difference (SATD) cost is proposed. 2) An early merge termination is presented with more than 90% accuracy. 3) Due to the compensation effect of merge candidates, symmetric motion partition (SMP) mode is disabled for non-8x8 coding units (CUs). 4) A fast coding unit filtering strategy is proposed to reduce the number of CUs that need to be fine-processed. Experimental results demonstrate that our fast strategies achieve 35.4%-58.7% time reduction with 0.68%-1.96% BD-rate increment in the RA case. Compared with similar works, the proposed strategies are not only among the best performing in average-case complexity reduction, but also notably better in the worst cases.

    Citations (Scopus): 4
  • An 8K H.265/HEVC Video Decoder Chip With a New System Pipeline Design

    Dajiang Zhou, Shihao Wang, Heming Sun, Jianbin Zhou, Jiayi Zhu, Yijin Zhao, Jinjia Zhou, Shuping Zhang, Shinji Kimura, Takeshi Yoshimura, Satoshi Goto

    IEEE JOURNAL OF SOLID-STATE CIRCUITS   52 ( 1 ) 113 - 126  January 2017  [Peer-reviewed]

    Abstract:

    8K ultra-HD is being promoted as the next-generation video specification. While the High Efficiency Video Coding (HEVC) standard greatly enhances the feasibility of 8K with a doubled compression ratio, its implementation is a challenge, owing to ultrahigh-throughput requirements and increased complexity per pixel. The latter comes from the new features of HEVC. At the system level, the most challenging of them is the enlarged and highly variable-size coding/prediction/transform units, which significantly increase the requirement for on-chip memory as pipeline buffers and the difficulty in maintaining pipeline utilization. This paper presents an HEVC decoder chip featuring a system pipeline that works at a nonunified and variable granularity. The pipeline saves on-chip memory with a novel block-in-block-out queue system and a parameter delivery network, while allowing overhead-free and fully pipelined operation of the processing components. With the system pipeline design combined with various component-level optimizations, the proposed decoder in 40 nm achieves a maximum throughput of 4 Gpixels/s or 8K 120 frames/s for the low-delay-P configuration of HEVC, 7.5-55 times faster than prior works. It supports 8K 60 frames/s for the low-delay and random-access configurations. In a normalized comparison, it also shows 3.1-3.6 times better area efficiency and 31%-55% superior energy efficiency.

    Citations (Scopus): 20
  • A Low-Power VLSI Architecture for HEVC De-Quantization and Inverse Transform

    Heming Sun, Dajiang Zhou, Shuping Zhang, Shinji Kimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 12 ) 2375 - 2387  December 2016  [Peer-reviewed]

    Abstract:

    In this paper, we present a low-power system for the de-quantization and inverse transform of HEVC. Firstly, we present a low-delay circuit to process the coded results of the syntax elements, and then reduce the number of multipliers from 16 to 4 for the de-quantization process of each 4x4 block. Secondly, we give two efficient data mapping schemes for the memory between de-quantization and inverse transform, and the memory for transpose. Thirdly, the zero information is utilized throughout the whole system. For the two memory parts, the write and read operations of zero blocks/rows/coefficients can all be skipped to save power. The results show that up to 86% of the power consumption of the memory part can be saved under the Random-access configuration with common QPs. For the logic part, the proposed de-quantization architecture reduces area consumption by 77%. Overall, our system can support real-time coding of 8K x 4K 120fps video sequences, and the normalized area consumption is reduced by 68% compared with the latest work.

    Citations (Scopus): 1
  • A 4Gpixel/s 8/10b H.265/HEVC Video Decoder Chip for 8K Ultra HD Applications

    Dajiang Zhou, Shihao Wang, Heming Sun, Jianbin Zhou, Jiayi Zhu, Yijin Zhao, Jinjia Zhou, Shuping Zhang, Shinji Kimura, Takeshi Yoshimura, Satoshi Goto

    2016 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC)   59   266 - U369  2016  [Peer-reviewed]

    Citations (Scopus): 23
  • Power-Efficient and Slew-Aware Three Dimensional Gated Clock Tree Synthesis

    Minghao Lin, Heming Sun, Shinji Kimura

    2016 IFIP/IEEE INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION (VLSI-SOC)     1 - 6  2016  [Peer-reviewed]

    Abstract:

    This paper presents a three dimensional (3D) gated clock tree synthesis (CTS) approach, which consists of two steps: 1) abstract tree topology generation; and 2) 3D gated and buffered clock routing. 3D Pair Matching (3D-PM) algorithm is proposed to generate the initial tree topology and then the proposed TSV-minimization algorithm is applied to generate TSV-aware tree topology. Based on TSV-aware tree topology, 3D gated and buffered clock tree routing is done using the proposed 3D Gated and Buffered Deferred-Merge Embedding (3D-GB-DME) algorithm. The slew constraint satisfaction is considered and the clock skew is minimized in our approach. Experimental results show that the proposed method achieves 29.11% power reduction compared with the state-of-the-art 2D work.

    Citations (Scopus): 7
  • Human Detection Method Based on Non-Redundant Gradient Semantic Local Binary Patterns

    Jiu Xu, Ning Jiang, Wenxin Yu, Heming Sun, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 8 ) 1735 - 1742  August 2015  [Peer-reviewed]

    Abstract:

    In this paper, a feature named Non-Redundant Gradient Semantic Local Binary Patterns (NRGSLBP) is proposed for human detection as a modified version of the conventional Semantic Local Binary Patterns (SLBP). Calculations of this feature are performed for both intensity and gradient magnitude image so that texture and gradient information are combined. Moreover, and to the best of our knowledge, non-redundant patterns are adopted on SLBP for the first time, allowing better discrimination. Compared with SLBP, no additional cost of the feature dimensions of NRGSLBP is necessary, and the calculation complexity is considerably smaller than that of other features. Experimental results on several datasets show that the detection rate of our proposed feature outperforms those of other features such as Histogram of Orientated Gradient (HOG), Histogram of Templates (HOT), Bidirectional Local Template Patterns (BLTP), Gradient Local Binary Patterns (GLBP), SLBP and Covariance matrix (COV).

    Citations (Scopus): 3
  • A fast level filtering algorithm for inter prediction in HEVC encoder

    Zhengxue Cheng, Heming Sun, Landan Hu, Shinji Kimura

    International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC)     404 - 407  June 2015  [Peer-reviewed]

  • HARDWARE-ORIENTED RATE-DISTORTION OPTIMIZATION ALGORITHM FOR HEVC INTRA-FRAME ENCODER

    Landan Hu, Heming Sun, Dajiang Zhou, Shinji Kimura

    2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)     1 - 6  2015  [Peer-reviewed]

    Abstract:

    Digital video is widely used in mobile applications, where video compression technology is necessary to store or transmit the videos. High Efficiency Video Coding (HEVC) achieves the highest compression ratio but at a huge computational cost, the majority of which is rate-distortion (RD) cost calculation. This paper presents a low-complexity RD estimation method for HEVC intra prediction using the following schemes. 1) The transformed coefficients rather than the quantized coefficients are used for RD estimation. 2) For the rate part, the position after the last non-zero quantized coefficient is considered to improve the accuracy of estimation, and a header-bit estimation method is presented to save about 82% of the complexity of header-bit calculation. 3) For the distortion part, the scaling parameter of quantization is modified to a power of two so that the bit depth of multiplication can be reduced from 15 to 5 in the worst case. 4) For 4x4 transform units, we consider the transform skip mode, which is neglected in prior research. Our proposal achieves a 72.22% time reduction of rate-distortion optimization (RDO) compared with the original HEVC Test Model, while the BD-rate is only 1.76%.

    Citations (Scopus): 5
  • Merge Mode Based Fast Inter Prediction for HEVC

    Zhengxue Cheng, Heming Sun, Dajiang Zhou, Shinji Kimura

    2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)    2015  [Peer-reviewed]

    Abstract:

    The latest High Efficiency Video Coding (HEVC/H.265) standard achieves a 50% bit rate reduction over the H.264/AVC standard at comparable quality, but at the cost of high computational complexity. Inter prediction accounts for a large share of the complexity, and merge mode is one of the most important new features introduced in HEVC. To address this issue, this paper utilizes the merge mode to accelerate inter prediction with three fast mode decision methods. 1) A merge candidate decision is proposed to select the best merge mode by Sum of Absolute Transformed Difference (SATD) cost to reduce the merge time. 2) An early merge termination, also based on SATD cost, is presented with more than 90% accuracy. 3) Based on the efficient merge mode, symmetric motion partition (SMP) modes can be disabled for non-8x8 coding units (CUs). Experimental results demonstrate that our work achieves 53.1%-54.2% time reduction on average with 1.57%-2.30% BD-rate increment. Besides, our method achieves an improvement of 18%-30% time reduction with 0.89%-2.85% BD-rate increment when combined with other existing approaches.

    Citations (Scopus): 1
  • A Low-Cost VLSI Architecture of Multiple-Size IDCT for H.265/HEVC

    Heming Sun, Dajiang Zhou, Peilin Liu, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 12 ) 2467 - 2476  December 2014  [Peer-reviewed]

    Abstract:

    In this paper, we present an area-efficient 4/8/16/32-point inverse discrete cosine transform (IDCT) architecture for an HEVC decoder. Compared with previous work, this work reduces the hardware cost in two respects. First, we reduce the logic cost of the 1D IDCT by proposing a reordered parallel-in serial-out (RPISO) scheme. By using the RPISO scheme, we can reduce the required calculations for butterfly inputs in each cycle. Second, we reduce the area of the transpose architecture by proposing a cyclic data mapping scheme that achieves 100% I/O utilization of each SRAM. To design a fully pipelined 2D IDCT architecture, we propose a pipelining schedule for row and column transform. The results show that the normalized area by maximum throughput for the logical IDCT part can be reduced by 25%, and the memory area can be reduced by 62%. The maximum throughput reaches 1248 Mpixels/s, which can support real-time decoding of a 4K x 2K 60 fps video sequence.

    Citations (Scopus): 12
  • A fast mode selection algorithm for HEVC intra prediction

    Heming Sun, Satoshi Goto

    International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC)     449 - 451  July 2014  [Peer-reviewed]

  • Fast Prediction Unit Selection and Mode Selection for HEVC Intra Prediction

    Heming Sun, Dajiang Zhou, Peilin Liu, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 2 ) 510 - 519  February 2014  [Peer-reviewed]

    Abstract:

    As a next-generation video compression standard, High Efficiency Video Coding (HEVC) achieves enhanced coding performance relative to prior standards such as H.264/AVC. In the new standard, the improved intra prediction plays an important role in bit rate saving. Meanwhile, it also involves significantly increased complexity, due to the adoption of a highly flexible coding unit structure and a large number of angular prediction modes. In this paper, we present a low-complexity intra prediction algorithm for HEVC. We first propose a fast preprocessing stage based on a simplified cost model. Based on its results, a fast prediction unit selection scheme reduces the number of prediction unit (PU) levels that require fine processing from 5 to 2. To supply the PU size decision with appropriate thresholds, a fast training method is also designed. Also based on the preprocessing results, an efficient mode selection scheme reduces the maximum number of angular modes to evaluate from 35 to 8. This achieves further algorithm acceleration by eliminating the need to perform fine Hadamard cost calculation. We also propose a 32 x 32 PU compensation scheme to alleviate the mismatch of cost functions for large transform units, which effectively improves coding performance for high-resolution sequences. In comparison with HM 7.0, the proposed algorithm achieves over 50% complexity reduction in terms of encoding time, with the corresponding bit rate increase lower than 2.0%. Moreover, the achieved complexity reduction is relatively stable and independent of sequence characteristics.

    Citations (Scopus): 4
  • Low-complexity rate-distortion optimization algorithms for HEVC intra prediction

    Zhe Sheng, Dajiang Zhou, Heming Sun, Satoshi Goto

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8325 ( 1 ) 541 - 552  2014  [Peer-reviewed]

    Abstract:

    HEVC achieves better coding efficiency than prior standards, but also involves dramatically increased complexity. The complexity increase for intra prediction is especially intensive due to a highly flexible quad-tree coding structure and a large number of prediction modes. The encoder employs rate-distortion optimization (RDO) to select the optimal coding mode, and RDO accounts for a large portion of intra encoding complexity. Moreover, HEVC has a stronger dependency on RDO than H.264/AVC. To reduce the computational complexity and to implement a real-time system, this paper presents two low-complexity RDO algorithms for HEVC intra prediction. The structure of RDO is simplified by the proposed rate and distortion estimators, and some hardware-unfriendly modules are facilitated. Compared with the original RDO procedure, the two proposed algorithms reduce RDO time by 46% and 64% respectively with acceptable coding efficiency loss. © 2014 Springer International Publishing.

    Citations (Scopus): 31
  • VLSI ARCHITECTURE OF HEVC INTRA PREDICTION FOR 8K UHDTV APPLICATIONS

    Jianbin Zhou, Dajiang Zhou, Heming Sun, Satoshi Goto

    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)     1273 - 1277  2014  [Peer-reviewed]

    Abstract:

    This paper presents an efficient VLSI architecture of intra prediction for an 8Kx4K HEVC decoder. It supports all 35 intra prediction modes and prediction sizes ranging from 4x4 to 64x64. This work proposes a Cyclic SRAM Banks based Parallel Reference Sample Fetching (CSB-PRSF) scheme, which guarantees enough reference samples for prediction and reduces the number of registers used for storing reference samples. To guarantee high throughput, 16 pixels are predicted by 4x4 Block Based Pipelining, and the dependency between neighboring blocks is eliminated by Hybrid Data Forwarding and Block Reordering.
    This architecture is synthesized using 90nm technology; the maximum working frequency is 469 MHz, with an area of 72.1K gates. Running at 397 MHz, the architecture can support 4320p@120fps HEVC intra decoding, with full modes and full sizes.

    Citations (Scopus): 8
  • AN AREA-EFFICIENT 4/8/16/32-POINT INVERSE DCT ARCHITECTURE FOR UHDTV HEVC DECODER

    Heming Sun, Dajiang Zhou, Jiayi Zhu, Shinji Kimura, Satoshi Goto

    2014 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING CONFERENCE     197 - 200  2014  [Peer-reviewed]

    Abstract:

    This paper presents a new VLSI architecture for the HEVC inverse discrete cosine transform (IDCT). Compared to prior art, this work reduces hardware cost by 1) reducing the computational logic of the 1-D IDCTs with a reordered parallel-in serial-out (RPISO) scheme that shares the inputs of the butterfly structure, and 2) reducing the area of the transpose buffer with a cyclic memory organization that achieves 100% I/O utilization of the SRAMs. In the implementation of a unified 4/8/16/32-point IDCT, the proposed schemes demonstrate 35% and 62% reduction of logic and memory costs, respectively. The IDCT implementation can support real-time decoding of 4Kx2K 60fps video with a total hardware cost of 357,250 µm² for the 2-D IDCT and 80,988 µm² for the transpose memory in a 90nm process.

    Citations (Scopus): 14
  • Real-Time Human Detection Based on Multi-Scale Bidirectional Local Template Patterns

    Jiu Xu, Ning Jiang, Heming Sun, Axel Beaugendre, Satoshi Goto

    IIEEJ Transactions on Electronics and Visual Computing   1 ( 1 ) 28 - 37  December 2013  [Peer-reviewed]

  • Multi-scale Bidirectional Local Template Patterns for Real-time Human Detection

    Jiu Xu, Ning Jiang, Xinwei Xue, Heming Sun, Wenxin Yu, Satoshi Goto

    2013 IEEE 15TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP)     379 - 383  2013  [Peer-reviewed]

    Abstract:

    In this paper, a feature named multi-scale bidirectional local template patterns (MBLTP) is proposed for human detection. As an extension of bidirectional local template patterns (BLTP), MBLTP not only integrates the textural and gradient information according to the four predefined templates but also calculates information for additional feature vectors by adjusting the scale of the training samples. These additional feature vectors contain multi-scale information on the samples, which can make the feature more discriminative than its original form. Experimental results on the INRIA dataset show that the detection rate of our proposed MBLTP feature outperforms those of other features such as the multi-level histogram of oriented gradients (multi-level HOG), multi-scale block histogram of template (MB-HOT), and HOG-LBP. Moreover, to make our feature meet real-time requirements, an implementation based on a graphics processing unit (GPU) is adopted to accelerate the calculation.

  • A low-complexity HEVC intra prediction algorithm based on level and mode filtering

    Heming Sun, Dajiang Zhou, Satoshi Goto

    Proceedings - IEEE International Conference on Multimedia and Expo     1085 - 1090  2012  [Peer-reviewed]

    Abstract:

    HEVC achieves better coding efficiency than prior standards, but also involves increased complexity. For intra prediction, complexity is especially intensive due to a highly flexible coding unit structure and a large number of prediction modes. This paper presents a low-complexity intra prediction algorithm for HEVC. A fast preprocessing stage based on a simplified cost model is proposed. Based on its results, a level filtering scheme reduces the number of prediction unit levels that require fine processing from 5 to 2. To supply the level filtering decision with appropriate thresholds, a fast training method is also designed. A mode filtering scheme further reduces the maximum number of angular modes to be evaluated from 34 to 9. Complexity reduction from HM 3.0 is over 50% and stable for various sequences, which makes the proposed algorithm suitable for real-time applications. The corresponding bit rate increase is lower than 2.5%. © 2012 IEEE.

    Citations (Scopus): 52

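Awards

  • Ando Hiroshi Memorial Academic Encouragement Award (安藤博記念学術奨励賞)

    June 2022   The Foundation of Ando Laboratory (一般財団法人 安藤研究所)

  • Picture Coding Symposium Top-10 Paper

    June 2021

  • IEEE VCIP Best Paper Award

    December 2020

  • Research Encouragement Award (研究奨励賞)

    January 2020   Takayanagi Kenjiro Foundation (公益財団法人 高柳健次郎財団)

  • Picture Coding Symposium Grand Challenge on Short Video Coding Silver Award

    November 2019

  • CVPR Workshop and Challenge on Learned Image Compression MOS 5th Place

    June 2019

  • Telecom System Technology Student Award (テレコムシステム技術学生賞)

    March 2018   The Telecommunications Advancement Foundation (公益財団法人 電気通信普及財団)

  • VDEC Design Award, Encouragement Prize (VDEC デザインアワード奨励賞)

    September 2017   VLSI Design and Education Center (VDEC), The University of Tokyo

  • ISSCC 2016 Takuo Sugano Award for Outstanding Far-East Paper

    February 2016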

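Research Projects (Joint Research and Competitive Funding)

  • Low-complexity research for next-generation VVC standard and its neural network extension

    Japan Society for the Promotion of Science (JSPS)  Grants-in-Aid for Scientific Research (KAKENHI), Early-Career Scientists

    Project period: April 2021 - March 2023

    Heming Sun

  • Innovative Video Compression System Applying Real-Time, Low-Power Deep Learning

    Promotion of Strategic Research and Development  Strategic Basic Research Programs  PRESTO

    Project period: October 2019 - March 2023

    Heming Sun

    Abstract:

    Deep learning is being studied as a way to improve video compression ratios and is expected to surpass the existing compression standard (HEVC). From a practical standpoint, however, processing on general-purpose GPUs cannot achieve real-time coding. This research therefore develops a dedicated FPGA/ASIC hardware accelerator for deep-learning-based video compression and realizes a system that maximizes compression ratio, throughput, and power efficiency through coordination between algorithm and architecture.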

Lectures and Oral Presentations

  • Advances in Design and Implementation of End-to-End Learned Image and Video Compression

    Wen-Hsiao Peng, Heming Sun

    IEEE ISCAS

    Presentation date: May 2021

  • Deep Learning Method for Image Compression

    Heming Sun

    Information Processing Society of Japan (IPSJ) Short Intensive Seminar (SC 29)

    Presentation date: February 2021

 

Courses Currently Taught

Courses Taught

  • Theory and Applications of Neural Networks

    September 2019 - Present

  • Fundamentals of Programming

    April 2019 - Present

 

Committee Memberships

  • December 2022

    Picture Coding Symposium Special Session Chair

  • May 2022

    IEEE ISCAS Session Chair

  • December 2021

    IEEE VCIP Area Chair

  • May 2021

    IEEE ISCAS Session Chair

  • November 2019

    Picture Coding Symposium Special Session Chair

Academic Contribution Activities

  • ACM Transactions on Reconfigurable Technology and Systems

  • International Journal of Computer Vision

  • IEEE Signal Processing Letters

  • IEEE Transactions on Pattern Analysis and Machine Intelligence

  • IEEE Transactions on Multimedia

  • IEEE Transactions on Circuits and Systems I: Regular Papers

  • IEEE Transactions on Circuits and Systems—II: Express Briefs

  • IEEE Transactions on Circuits and Systems for Video Technology
