Updated: 2022/08/17

Heming Sun (孫 鶴鳴, ソン カクメイ)
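Affiliation
Faculty of Science and Engineering, Waseda Research Institute for Science and Engineering
Job title
Junior Researcher (Assistant Professor)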

Concurrent Post

  • Faculty of Science and Engineering   School of Fundamental Science and Engineering

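Education

  • 2014/04
    -
    2017/03

    Waseda University   Graduate School of Information, Production and Systems   Doctoral program

  • 2012/09
    -
    2014/03

    Shanghai Jiao Tong University   Graduate School of Electronic Information and Electrical Engineering   Master's program

  • 2010/09
    -
    2012/09

    Waseda University   Graduate School of Information, Production and Systems   Master's program

  • 2007/09
    -
    2011/07

    Shanghai Jiao Tong University   School of Electronic Information and Electrical Engineering

Degree

  • 2017/03   Waseda University   Doctor of Engineering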

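Career

  • 2019/10
    -
    Present

    Japan Science and Technology Agency

  • 2018/09
    -
    Present

    Waseda University   Junior Researcher

  • 2017/04
    -
    2018/09

    NEC Corporation   Central Research Laboratories   Researcher

  • 2016/07
    -
    2016/09

    The University of Tokyo

  • 2015/08
    -
    2015/09

    University of California, Davis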

Professional Memberships

  • IEEE

  • The Institute of Electronics, Information and Communication Engineers (IEICE)

 

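Research Areas

  • Electron devices and electronic equipment

  • Computer systems

  • Perceptual information processing

Research Keywords

  • Video processing

  • Large-scale integrated circuits (LSI)

  • Deep learning

  • High-performance computing

Papers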

  • ViT-GAN: Using Vision Transformer as Discriminator with Adaptive Data Augmentation

    Shota Hirose, Naoki Wada, Jiro Katto, Heming Sun

    2021 3rd International Conference on Computer Communication and the Internet, ICCCI 2021     185 - 189  2021/06

    Attention is now widely regarded as an efficient way to recognize images. Vision Transformer (ViT) applies a Transformer to images and achieves very high performance in image recognition. ViT has fewer parameters than Big Transfer (BiT) and Noisy Student, which suggests that self-attention-based networks are slimmer than convolution-based networks. We use a ViT as the discriminator in a Generative Adversarial Network (GAN) to obtain the same performance with a smaller model, and name the result ViT-GAN. We also find that parameter sharing is very useful for making ViT parameter-efficient. However, the performance of ViT depends heavily on the number of data samples. We therefore propose a new data augmentation method in which the augmentation strength varies adaptively, which helps ViT converge faster and perform better. With this data augmentation, we show that a ViT-based discriminator can achieve almost the same FID while having 35% fewer parameters than the original discriminator.

  • Learned Image Compression with Fixed-point Arithmetic

    Heming Sun, Lu Yu, Jiro Katto

    Picture Coding Symposium    2021/06  [Refereed]

    Role: Lead author

  • Deep Pedestrian Density Estimation For Smart City Monitoring.

    Kazuki Murayama, Kenji Kanai, Masaru Takeuchi, Heming Sun, Jiro Katto

    ICIP     230 - 234  2021

  • Accelerating convolutional neural network inference based on a reconfigurable sliced systolic array

    Yixuan Zeng, Heming Sun, Jiro Katto, Yibo Fan

    Proceedings - IEEE International Symposium on Circuits and Systems   2021-May  2021

    Convolutional neural networks (CNNs) have achieved great successes on many computer vision tasks, such as image recognition, video processing, and target detection. In recent years, many hardware designs have been devoted to accelerating CNN inference. In order to further speed up CNN inference and reduce data waste, this work proposed a reconfigurable sliced systolic array: 1) Depending on the number of network nodes in each layer, the slice mode could be dynamically configured to achieve high throughput and resource utilization. 2) To take full advantage of convolution reuse and weight reuse, this work designed a tile-column sliding (TCS) processing dataflow. 3) A four-stage for loop algorithm was employed, which divides the CNN calculation into several parts based on the input nodes and output nodes. The entire CNN inference is carried out using integer-only arithmetic originated from TensorLite. Experimental results prove that these strategies lead to significant improvement in inference performance and energy efficiency.

  • Approximated reconfigurable transform architecture for VVC

    Yixuan Zeng, Heming Sun, Jiro Katto, Yibo Fan

    Proceedings - IEEE International Symposium on Circuits and Systems   2021-May  2021

    As the demand for high-resolution videos grows, the next generation video coding standard Versatile Video Coding introduces many new proposals, including Adaptive Multiple Transforms (AMT), to improve coding efficiency. This paper presents a reconfigurable transform core for the VVC standard where the implementation of 1D DST-VII and DCT-VIII for all transform sizes are enabled. To offer a very low circuit complexity, a simple approximation strategy with a little coding performance loss is proposed. An 8×8 Processing Element (PE) array is employed as the core computational unit, where each PE can be configured dynamically based on the transform type. In addition, the transforms of larger sizes can be realized in the finite PE units with the Partitioned Matrix Multiplication (PMM) scheme. The experimental and synthesis results show that this design can save at least 29.1% area compared with other works in literature with the negligible degradation of video quality and a slight increase in the bit rate.

  • Fully Neural Network Mode Based Intra Prediction of Variable Block Size

    Heming Sun, Lu Yu, Jiro Katto

    IEEE International Conference on Visual Communications and Image Processing (VCIP)    2020/12  [Refereed]

    Role: Lead author

  • Enhanced Intra Prediction for Video Coding by Using Multiple Neural Networks

    Heming Sun, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto

    IEEE Transactions on Multimedia   22 ( 11 ) 2764 - 2779  2020/11  [Refereed]

    Role: Lead author

    This paper enhances intra prediction by using multiple neural network modes (NMs). Each NM serves as an end-to-end mapping from the neighboring reference blocks to the current coding block. For the provided NMs, we present two schemes (appending and substitution) to integrate the NMs with the traditional modes (TMs) defined in High Efficiency Video Coding (HEVC). In the appending scheme, each NM corresponds to a certain range of TMs, and the TMs are categorized according to the expected prediction errors. After determining the relevant TMs for each NM, we present a probability-aware mode signaling scheme in which the NMs with higher probabilities of being the best mode are signaled with fewer bits. In the substitution scheme, we propose to replace the most and least probable TMs. A new most probable mode (MPM) generation method is also employed when substituting the least probable TMs. Experimental results demonstrate that using multiple NMs clearly improves the coding efficiency compared with a single NM. Specifically, the proposed appending scheme with seven NMs saves 2.6%, 3.8%, and 3.1% BD-rate for the Y, U, and V components compared with using a single NM in state-of-the-art works.

  • HEVC video coding with deep learning based frame interpolation

    Joi Shimizu, Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    2020 IEEE 9th Global Conference on Consumer Electronics, GCCE 2020     433 - 434  2020/10

    Recent research on video frame interpolation has shown great progress. In this paper, we propose a novel video compression method that incorporates deep learning based frame interpolation into HEVC, the current video compression standard. Experimental results show that our approach can outperform HEVC in some scenarios.

  • End-To-End Learned Image Compression With Fixed Point Weight Quantization

    Heming Sun, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto

    2020 IEEE International Conference on Image Processing (ICIP)   2020-October   3359 - 3363  2020/10  [Refereed]

    Role: Lead author

    Learned image compression (LIC) has caught up with traditional hand-crafted methods such as JPEG2000 and BPG in terms of coding gain. However, the large model size of the network prohibits the use of LIC on resource-limited embedded systems. This paper presents an LIC with 8-bit fixed-point weights. First, we quantize the weights in groups and propose a non-linear memory-free codebook. Second, we explore the optimal grouping and quantization scheme. Finally, we develop a novel weight-clipping fine-tuning scheme. Experimental results illustrate that the coding loss caused by the quantization is small, while the model size is reduced by around 75% compared with the 32-bit floating-point anchor. As far as we know, this is the first work to explore and evaluate LIC fully with fixed-point weights, and our proposed quantized LIC is able to outperform BPG in terms of MS-SSIM.

  • Scalable Learned Image Compression With A Recurrent Neural Networks-Based Hyperprior

    Rige Su, Zhengxue Cheng, Heming Sun, Jiro Katto

    2020 IEEE International Conference on Image Processing (ICIP)   2020-October   3369 - 3373  2020/10  [Refereed]

    Learned image compression has recently made great progress, as exemplified by the hyperprior and its variants based on convolutional neural networks (CNNs). However, CNNs are not well suited to scalable coding, and multiple models need to be trained separately to achieve variable rates. In this paper, we incorporate differentiable quantization and accurate entropy models into recurrent neural network (RNN) architectures to achieve scalable learned image compression. First, we present an RNN architecture with quantization and entropy coding. To realize scalable coding, we allocate the bits to multiple layers by adjusting the layer-wise lambda values in the Lagrangian multiplier-based rate-distortion optimization function. Second, we add an RNN-based hyperprior to improve the accuracy of entropy models for multiple-layer residual representations. Experimental results demonstrate that our performance is comparable with recent CNN-based hyperprior methods on the Kodak dataset. In addition, our method is a scalable and flexible coding approach that achieves multiple rates using a single model.

  • A Pipelined 2D Transform Architecture Supporting Mixed Block Sizes for the VVC Standard

    Yibo Fan, Yixuan Zeng, Heming Sun, Jiro Katto, Xiaoyang Zeng

    IEEE Transactions on Circuits and Systems for Video Technology    2020/09  [Refereed]

  • A Learning-Based Low Complexity in-Loop Filter for Video Coding

    Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)    2020/07  [Refereed]

    With the continuous development of mobile devices, people can demand higher-definition videos. To alleviate the pressure of deploying the video codec in mobile multimedia, a learning-based low-complexity in-loop filter is proposed in this paper. Depthwise separable convolution is combined with batch normalization to construct the model. To enhance its performance, knowledge from a pre-trained teacher model is transferred to it. However, the over-smoothing problem in inter frames caused by the double enhancing effect remains. To solve this, we design a Wiener-based filtering algorithm that tries to restore the distortion from the learned residuals and introduces an adequate filtering effect. The experimental results show that our proposed methods achieve a considerable BD-rate reduction compared with the HEVC anchor. Compared with the previous state-of-the-art work VR-CNN, our model achieves 1.65% extra BD-rate reduction, a 79.1% decrease in FLOPs, a 25% decrease in encoding complexity, and a 70% decrease in decoding complexity.

  • Low Bitrate Image Compression with Discretized Gaussian Mixture Likelihoods

    Zhengxue Cheng, Heming Sun, Jiro Katto

    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)   2020-June   543 - 546  2020/06  [Refereed]

    In this paper, we provide a detailed description of our submitted method, Kattolab, to the Workshop and Challenge on Learned Image Compression (CLIC) 2020. Our method mainly incorporates discretized Gaussian Mixture Likelihoods into previous state-of-the-art learned compression algorithms. We also describe the acceleration strategies and bit optimization under the low-rate constraint. Experimental results demonstrate that our approach Kattolab achieves 0.9761 in terms of MS-SSIM at the rate constraint of 0.15 bpp during the validation phase.

  • Learned Image Compression With Discretized Gaussian Mixture Likelihoods and Attention Modules

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)     7936 - 7945  2020/06  [Refereed]

    Image compression is a fundamental research field and many well-known compression standards have been developed for many decades. Recently, learned compression methods exhibit a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into network architecture to enhance the performance. Experimental results demonstrate our proposed method achieves a state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge our approach is the first work to achieve comparable performance with latest compression standard Versatile Video Coding (VVC) regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM.

  • An Image Compression Framework with Learning-based Filter

    Heming Sun, Chao Liu, Jiro Katto, Yibo Fan

    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)   2020-June   602 - 606  2020/06  [Refereed]

    Role: Lead author

    In this paper, a coding framework, VIP-ICT-Codec, is introduced. Our method is based on the VTM (Versatile Video Coding Test Model). First, we propose a color space conversion from the RGB to the YUV domain by using a PCA-like operation. A method for the PCA mean calculation is proposed to de-correlate the residual components of the YUV channels. In addition, the correlation of the UV components is compensated, considering that they share the same coding tree in VVC. We also learn a residual mapping to alleviate the over-filtering and under-filtering problems of specific images. Finally, we regard the rate control as an unconstrained Lagrangian problem to reach the target bpp. The results show that we achieve 32.625 dB at the validation phase.

  • Learned Lossless Image Compression with A Hyperprior and Discretized Gaussian Mixture Likelihoods

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)   2020-May   2158 - 2162  2020/05  [Refereed]

    Lossless image compression is an important task in the field of multimedia communication. Traditional image codecs such as WebP, JPEG2000, and FLIF typically support a lossless mode. Recently, deep learning based approaches have started to show potential in this area. The hyperprior is an effective technique proposed for lossy image compression. This paper generalizes the hyperprior from lossy to lossless compression and adds an L2-norm term to the loss function to speed up the training procedure. This paper also investigates different parameterized models for latent codes and proposes to use Gaussian mixture likelihoods to achieve adaptive and flexible context models. Experimental results validate that our method can outperform existing deep learning based lossless compression methods, and outperform JPEG2000 and WebP for JPG images.

  • Energy Compaction-Based Image Compression Using Convolutional AutoEncoder

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    IEEE Transactions on Multimedia    2020/04  [Refereed]

  • Approximate FPGA-Based Multipliers Using Carry-Inexact Elementary Modules.

    Yi Guo, Heming Sun, Ping Lei, Shinji Kimura

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   103-A ( 9 ) 1054 - 1062  2020

  • A Fast QTMT Partition Decision Strategy for VVC Intra Prediction

    Yibo Fan, Jun'An Chen, Heming Sun, Jiro Katto, Ming'E Jing

    IEEE Access   8   107900 - 107911  2020  [Refereed]

    Unlike the traditional quaternary tree (QT) structure used in the previous-generation video coding standard H.265/HEVC, a brand new partition structure named quadtree with nested multi-type tree (QTMT) is applied in the latest codec, H.266/VVC. QTMT brings superior encoding performance at the cost of greatly increased encoding time. Therefore, a fast intra partition algorithm based on variance and the Sobel operator is proposed in this paper. The proposed method addresses the novel asymmetrical partition issue in VVC by balancing the reduction in computational complexity against the loss of encoding quality. More concretely, we first terminate further splitting of a coding unit (CU) when its texture is judged to be smooth. Then, we use the Sobel operator to extract gradient features to decide whether to split this CU by QT, thus terminating further MT partitions. Finally, a novel method to choose only one partition from the five QTMT partitions is applied. A homogeneous area tends to be predicted as a single larger CU, while CUs with complicated texture tend to be divided into small sub-CUs whose textures usually differ from each other. We calculate the variance of the variances of the sub-CUs to decide which partition best distinguishes the sub-textures. Our method is embedded into the latest official VVC reference software, VTM-7.0. Compared with the VTM-7.0 anchor, our method reduces the encoding time by 49.27% on average at the cost of only a 1.63% BDBR increase. As a traditional scheme based on variance and gradient to decrease the computational complexity in VVC intra coding, our method outperforms other related state-of-the-art methods, including traditional machine learning and convolutional neural network methods.

  • CNN Based Optimal Intra Prediction Mode Estimation in Video Coding

    Ryota Yokoyama, Masahiko Tahara, Masaru Takeuchi, Heming Sun, Yasutaka Matsuo, Jiro Katto

    IEEE International Conference on Consumer Electronics (ICCE)    2020/01  [Refereed]

  • Small-Area and Low-Power FPGA-Based Multipliers using Approximate Elementary Modules.

    Yi Guo, Heming Sun, Shinji Kimura

    Asia and South Pacific Design Automation Conference (ASP-DAC)     599 - 604  2020  [Refereed]

  • Fast Variance- and Gradient-based QTMT Partition Decision Algorithm in VVC Intra Coding

    Jun’an Chen, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE International Conference on Visual Communications and Image Processing (VCIP)    2019/12  [Refereed]

  • Dual Learning-based Video Coding with Inception Dense Blocks

    Chao Liu, Heming Sun, Jun’an Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    Picture Coding Symposium (PCS)    2019/11  [Refereed]

  • Road Infrastructure Monitoring System using E-Bikes and Its Extensions for Smart Community

    Jiro Katto, Masaru Takeuchi, Kenji Kanai, Heming Sun

    Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM     43 - 44  2019/10

    In this paper, we present a road infrastructure monitoring system using e-bikes to support smart community. Smartphone and IoT (Internet of Things) devices are attached to e-bikes, and electric power is supplied from the batteries of e-bikes. Image processing techniques based on deep learning are applied and run on IoT devices or cloud backbone. A prototype system is implemented, which performs from image capturing to web browsing of the processed images.

  • A Gamut Extension Method considering Color Information Restoration using Convolutional Neural Networks

    Masaru Takeuchi, Yusuke Sakamoto, Ryota Yokoyama, Heming Sun, Yasutaka Matsuo, Jiro Katto

    IEEE International Conference on Image Processing (ICIP)    2019/09  [Refereed]

  • Perceptual Quality Study on Deep Learning based Image Compression

    Zhengxue Cheng, Pinar Akyazi, Heming Sun, Jiro Katto, Touradj Ebrahimi

    IEEE International Conference on Image Processing (ICIP)    2019/09  [Refereed]

  • Deep Residual Learning for Image Compression

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    Conference on Computer Vision and Pattern Recognition (CVPR) Workshops    2019/06  [Refereed]

  • Learning Image and Video Compression through Spatial-Temporal Energy Compaction

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    Conference on Computer Vision and Pattern Recognition (CVPR)    2019/06  [Refereed]

  • A Minimal Adder-oriented 1D DST-VII/DCT-VIII Hardware Implementation for VVC Standard

    Yixuan Zeng, Heming Sun, Jiro Katto, Xiaoyang Zeng, Yibo Fan

    IEEE International System-on-chip Conference (ISOCC)    2019/06  [Refereed]

  • Design of Low-Cost Approximate Multipliers Based on Probability-Driven Inexact Compressors.

    Yi Guo, Heming Sun, Ping Lei, Shinji Kimura

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   102-A ( 12 ) 1781 - 1791  2019  [Refereed]

  • Gamut-Extension Methods Considering Color Information Restoration

    Masaru Takeuchi, Yusuke Sakamoto, Ryota Yokoyama, Heming Sun, Yasutaka Matsuo, Jiro Katto

    IEEE Access   7   80146 - 80158  2019  [Refereed]

    Recently, ultra high definition television (UHDTV) services have become popular using satellites and the Internet. However, there are expansive volumes of high definition television (HDTV) and standard definition television (SDTV) contents held by broadcasting companies and in storage devices. Herein we propose two color space conversion (also known as gamut mapping) methods from BT.709 (used for current HDTV broadcast) to BT.2020 (used for UHDTV broadcast) that restore or estimate lost color information. One of our methods anisotropically diffuses the BT.709 chromaticities with regard to the direction of the original chromaticities in the BT.2020 color space, generating chromaticities out of BT.709 gamut. The other learns an end-to-end conversion method from a BT.709 image to a BT.2020 image and restores lost color information using convolutional neural network (CNN). Using these methods along with BT.709 images, we obtain BT.2020 images with chromaticities from the BT.709 color gamut.

  • Approximate DCT Design for Video Encoding Based on Novel Truncation Scheme.

    Heming Sun, Zhengxue Cheng, Amir Masoud Gharehbaghi, Shinji Kimura, Masahiro Fujita

    IEEE Trans. Circuits Syst. I Regul. Pap.   66-I ( 4 ) 1517 - 1530  2019  [Refereed]

  • Energy-Efficient and High-Speed Approximate Signed Multipliers with Sign-Focused Compressors.

    Yi Guo, Heming Sun, Shinji Kimura

    IEEE International System-on-chip Conference (ISOCC)     330 - 335  2019  [Refereed]

  • Deep Convolutional AutoEncoder-based Lossy Image Compression

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    Picture Coding Symposium     253 - 257  2018/06  [Refereed]

  • Performance Comparison of Convolutional AutoEncoders, Generative Adversarial Networks and Super-Resolution for Image Compression

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops     2613 - 2616  2018/06  [Refereed]

  • Lossy Image Compression using Deep Convolutional AutoEncoder

    Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

    IEICE technical report   118 ( 73 ) 15 - 20  2018/06

  • Sparse ternary connect: Convolutional neural networks using ternarized weights with enhanced sparsity.

    Canran Jin, Heming Sun, Shinji Kimura

    Asia and South Pacific Design Automation Conference (ASP-DAC)     190 - 195  2018  [Refereed]

  • Design of Power and Area Efficient Lower-Part-OR Approximate Multiplier.

    Yi Guo, Heming Sun, Shinji Kimura

    IEEE Region 10 Conference (TENCON)     2110 - 2115  2018  [Refereed]

  • Low-Cost Approximate Multiplier Design using Probability-Driven Inexact Compressors.

    Yi Guo, Heming Sun, Li Guo, Shinji Kimura

    IEEE Asia Pacific Conference on Circuits and Systems     291 - 294  2018  [Refereed]

  • Fast Algorithm and VLSI Architecture of Rate Distortion Optimization in H.265/HEVC

    Heming Sun, Dajiang Zhou, Landan Hu, Shinji Kimura, Satoshi Goto

    IEEE Transactions on Multimedia   19 ( 11 ) 2375 - 2390  2017/11  [Refereed]

    In H.265/High Efficiency Video Coding (HEVC) encoding, rate distortion optimization (RDO) is an important cost function for mode decision and coding structure decision. Despite being near-optimum in terms of coding efficiency, RDO suffers from high complexity. To address this problem, this paper presents a fast RDO algorithm and its very large scale integration (VLSI) implementation for both intra- and inter-frame coding. The proposed algorithm employs a quantization-free framework that significantly reduces the complexity of rate and distortion optimization. Meanwhile, it maintains a low degradation of coding efficiency by taking the syntax element organization and probability model of HEVC into consideration. The algorithm is also designed with hardware architecture in mind to support an efficient VLSI implementation. When implemented in the HEVC test model, the proposed algorithm achieves a 62% RDO time reduction with a 1.85% coding efficiency loss for the "all-intra" configuration. The hardware implementation achieves 1.6x higher normalized throughput relative to previous works, and it can support a throughput of 8k@30fps (for four fine-processed modes per prediction unit) with 256k logic gates when working at 200 MHz.

  • Time-efficient and TSV-aware 3D gated clock tree synthesis based on self-tuning spectral clustering

    Fan Yang, Minghao Lin, Heming Sun, Shinji Kimura

    Midwest Symposium on Circuits and Systems   2017-   1200 - 1203  2017/09  [Refereed]

    3D gated clock tree synthesis (CTS) mainly consists of three steps: 1) abstract clock topology generation; 2) layer embedding for minimal TSV allocation; and 3) clock tree routing with gate and buffer insertion. In this paper, a self-tuning spectral clustering based nearest-neighbor selection (SSC-NNS) algorithm with a parallel structure is proposed to achieve high time efficiency in clock tree topology generation. In addition, a postorder traversal based layer embedding (PTLE) strategy is adopted for determining the embedding layer of internal nodes with minimal TSV usage. Experimental results show that the proposed method achieves 32% and 82% runtime reduction on the ISPD2009 and IBM benchmarks, respectively, compared with the state-of-the-art 3D work. The TSV count is also reduced by 46% on the ISPD2009 benchmarks.

  • A low-cost approximate 32-point transform architecture

    Heming Sun, Zhengxue Cheng, Amir Masoud Gharehbaghi, Shinji Kimura, Masahiro Fujita

    Proceedings - IEEE International Symposium on Circuits and Systems     1 - 4  2017/09  [Refereed]

    This paper presents an area-efficient approximate method for the 32-point transform, which is one of the most area-consuming parts in High Efficiency Video Coding (HEVC) applications. Compared to prior work, this work reduces the hardware cost of the transform by 1) eliminating all the arithmetic operations of the 6 least significant bits (LSBs), 2) presenting a low-delay method for generating carry propagation from the remaining 5 LSBs, and 3) truncating the most significant bits (MSBs) according to the position of the component. In the implementation of a 32-point forward transform, the experimental results show that 27% of the area can be saved and that the coding efficiency loss caused by the approximation is only 0.044% compared with the original.

  • High Accuracy 8×8 Approximate Multiplier based on OR Operation

    Yi Guo, Heming Sun, Canran Jin, Shinji Kimura

    IEICE technical report   116 ( 478 ) 19 - 24  2017/03

  • Accelerating HEVC Inter Prediction with Improved Merge Mode Handling

    Zhengxue Cheng, Heming Sun, Dajiang Zhou, Shinji Kimura

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 2 ) 546 - 554  2017/02  [Refereed]

    High Efficiency Video Coding (HEVC/H.265) achieves a 50% bit rate reduction compared with the H.264/AVC standard at comparable quality, at the cost of high computational complexity. Merge mode is one of the most important new features introduced in HEVC's inter prediction. Merge mode and traditional inter mode consume about 90% of the total encoding time. To address this high complexity, this paper utilizes the merge mode to accelerate inter prediction with four strategies. 1) A merge candidate decision based on the sum of absolute transformed difference (SATD) cost is proposed. 2) An early merge termination is presented with more than 90% accuracy. 3) Due to the compensation effect of merge candidates, symmetric motion partition (SMP) mode is disabled for non-8x8 coding units (CUs). 4) A fast coding unit filtering strategy is proposed to reduce the number of CUs that need to be fine-processed. Experimental results demonstrate that our fast strategies achieve 35.4%-58.7% time reduction with a 0.68%-1.96% BD-rate increment in the RA case. Compared with similar works, the proposed strategies are not only among the best performing in average-case complexity reduction, but also notably better in the worst cases.

  • An 8K H.265/HEVC Video Decoder Chip With a New System Pipeline Design

    Dajiang Zhou, Shihao Wang, Heming Sun, Jianbin Zhou, Jiayi Zhu, Yijin Zhao, Jinjia Zhou, Shuping Zhang, Shinji Kimura, Takeshi Yoshimura, Satoshi Goto

    IEEE Journal of Solid-State Circuits   52 ( 1 ) 113 - 126  2017/01  [Refereed]

    8K ultra-HD is being promoted as the next-generation video specification. While the High Efficiency Video Coding (HEVC) standard greatly enhances the feasibility of 8K with a doubled compression ratio, its implementation is a challenge, owing to ultrahigh-throughput requirements and increased complexity per pixel. The latter comes from the new features of HEVC. At the system level, the most challenging of them is the enlarged and highly variable-size coding/prediction/transform units, which significantly increase the requirement for on-chip memory as pipeline buffers and the difficulty in maintaining pipeline utilization. This paper presents an HEVC decoder chip featuring a system pipeline that works at a nonunified and variable granularity. The pipeline saves on-chip memory with a novel block-in-block-out queue system and a parameter delivery network, while allowing overhead-free and fully pipelined operation of the processing components. With the system pipeline design combined with various component-level optimizations, the proposed decoder in 40 nm achieves a maximum throughput of 4 Gpixels/s or 8K 120 frames/s for the low-delay-P configuration of HEVC, 7.5-55 times faster than prior works. It supports 8K 60 frames/s for the low-delay and random-access configurations. In a normalized comparison, it also shows 3.1-3.6 times better area efficiency and 31%-55% superior energy efficiency.

  • A Low-Power VLSI Architecture for HEVC De-Quantization and Inverse Transform

    Heming Sun, Dajiang Zhou, Shuping Zhang, Shinji Kimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 12 ) 2375 - 2387  December 2016  [Peer-reviewed]

    In this paper, we present a low-power system for the de-quantization and inverse transform of HEVC. Firstly, we present a low-delay circuit to process the coded results of the syntax elements, and then reduce the number of multipliers from 16 to 4 for the de-quantization process of each 4x4 block. Secondly, we give two efficient data mapping schemes for the memory between de-quantization and inverse transform, and the memory for transpose. Thirdly, the zero information is utilized throughout the whole system. For the two memory parts, the write and read operations of zero blocks/rows/coefficients can all be skipped to save power. The results show that up to 86% of the power consumption of the memory part can be saved under the Random-access configuration with common QPs. For the logic part, the proposed de-quantization architecture reduces area by 77%. Overall, our system can support real-time coding of 8K x 4K 120fps video sequences, and the normalized area can be reduced by 68% compared with the latest work.

  • A 4Gpixel/s 8/10b H.265/HEVC Video Decoder Chip for 8K Ultra HD Applications

    Dajiang Zhou, Shihao Wang, Heming Sun, Jianbin Zhou, Jiayi Zhu, Yijin Zhao, Jinjia Zhou, Shuping Zhang, Shinji Kimura, Takeshi Yoshimura, Satoshi Goto

    2016 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC)   59   266 - U369  2016  [Peer-reviewed]

  • Power-Efficient and Slew-Aware Three Dimensional Gated Clock Tree Synthesis

    Minghao Lin, Heming Sun, Shinji Kimura

    2016 IFIP/IEEE INTERNATIONAL CONFERENCE ON VERY LARGE SCALE INTEGRATION (VLSI-SOC)     1 - 6  2016  [Peer-reviewed]

    This paper presents a three-dimensional (3D) gated clock tree synthesis (CTS) approach, which consists of two steps: 1) abstract tree topology generation; and 2) 3D gated and buffered clock routing. A 3D Pair Matching (3D-PM) algorithm is proposed to generate the initial tree topology, and the proposed TSV-minimization algorithm is then applied to generate a TSV-aware tree topology. Based on the TSV-aware tree topology, 3D gated and buffered clock tree routing is done using the proposed 3D Gated and Buffered Deferred-Merge Embedding (3D-GB-DME) algorithm. Our approach takes slew constraint satisfaction into account and minimizes the clock skew. Experimental results show that the proposed method achieves 29.11% power reduction compared with the state-of-the-art 2D work.

  • Human Detection Method Based on Non-Redundant Gradient Semantic Local Binary Patterns

    Jiu Xu, Ning Jiang, Wenxin Yu, Heming Sun, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 8 ) 1735 - 1742  August 2015  [Peer-reviewed]

    In this paper, a feature named Non-Redundant Gradient Semantic Local Binary Patterns (NRGSLBP) is proposed for human detection as a modified version of the conventional Semantic Local Binary Patterns (SLBP). Calculations of this feature are performed for both intensity and gradient magnitude images so that texture and gradient information are combined. Moreover, to the best of our knowledge, non-redundant patterns are adopted on SLBP for the first time, allowing better discrimination. Compared with SLBP, NRGSLBP requires no additional feature dimensions, and its calculation complexity is considerably smaller than that of other features. Experimental results on several datasets show that the detection rate of our proposed feature outperforms those of other features such as Histogram of Orientated Gradient (HOG), Histogram of Templates (HOT), Bidirectional Local Template Patterns (BLTP), Gradient Local Binary Patterns (GLBP), SLBP and Covariance matrix (COV).

  • A fast level filtering algorithm for inter prediction in HEVC encoder

    Zhengxue Cheng, Heming Sun, Landan Hu, Shinji Kimura

    International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC)     404 - 407  June 2015  [Peer-reviewed]

  • HARDWARE-ORIENTED RATE-DISTORTION OPTIMIZATION ALGORITHM FOR HEVC INTRA-FRAME ENCODER

    Landan Hu, Heming Sun, Dajiang Zhou, Shinji Kimura

    2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)     1 - 6  2015  [Peer-reviewed]

    Digital video is widely used in mobile applications, where video compression technology is necessary to store or transmit the videos. High Efficiency Video Coding (HEVC) achieves the highest compression ratio at the cost of huge computational complexity, most of which comes from rate-distortion (RD) cost calculation. This paper presents a low-complexity RD estimation method for HEVC intra prediction based on the following schemes. 1) The transformed coefficients rather than the quantized coefficients are used for RD estimation. 2) For the rate part, the position after the last non-zero quantized coefficient is considered to improve the accuracy of estimation, and a header-bit estimation method is presented to save about 82% of the complexity of header-bit calculation. 3) For the distortion part, the scaling parameter of quantization is modified to a power of two so that the bit depth of multiplication can be reduced from 15 to 5 in the worst case. 4) For 4x4 transform units, we consider the transform skip mode, which is neglected in prior research. Our proposal achieves a 72.22% time reduction in rate-distortion optimization (RDO) compared with the original HEVC Test Model, while the BD-rate is only 1.76%.

  • Merge Mode Based Fast Inter Prediction for HEVC

    Zhengxue Cheng, Heming Sun, Dajiang Zhou, Shinji Kimura

    2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP)    2015  [Peer-reviewed]

    The latest High Efficiency Video Coding (HEVC/H.265) standard achieves a 50% bit-rate reduction over the H.264/AVC standard at comparable quality, but at the cost of high computational complexity. Inter prediction accounts for a large share of this complexity, and merge mode is one of the most important new features introduced in HEVC. To address this issue, this paper utilizes the merge mode to accelerate inter prediction with three fast mode decision methods. 1) A merge candidate decision is proposed that selects the best merge mode by Sum of Absolute Transformed Difference (SATD) cost to reduce the merge time. 2) An early merge termination, also based on SATD cost, is presented with more than 90% accuracy. 3) Based on the efficient merge mode, symmetric motion partition (SMP) modes can be disabled for non-8x8 coding units (CUs). Experimental results demonstrate that our work can achieve 53.1%-54.2% time reduction on average with 1.57%-2.30% BD-rate increment. In addition, our method achieves an improvement of 18%-30% time reduction with 0.89%-2.85% BD-rate increment when combined with other existing approaches.

  • A Low-Cost VLSI Architecture of Multiple-Size IDCT for H.265/HEVC

    Heming Sun, Dajiang Zhou, Peilin Liu, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 12 ) 2467 - 2476  December 2014  [Peer-reviewed]

    In this paper, we present an area-efficient 4/8/16/32-point inverse discrete cosine transform (IDCT) architecture for an HEVC decoder. Compared with previous work, this work reduces the hardware cost in two respects. First, we reduce the logic cost of the 1D IDCT by proposing a reordered parallel-in serial-out (RPISO) scheme. By using the RPISO scheme, we can reduce the required calculations for butterfly inputs in each cycle. Second, we reduce the area of the transpose architecture by proposing a cyclic data mapping scheme that can achieve 100% I/O utilization of each SRAM. To design a fully pipelined 2D IDCT architecture, we propose a pipelining schedule for the row and column transforms. The results show that the area normalized by maximum throughput can be reduced by 25% for the logic part of the IDCT, and the memory area can be reduced by 62%. The maximum throughput reaches 1248 Mpixels/s, which can support real-time decoding of a 4K x 2K 60 fps video sequence.

  • A fast mode selection algorithm for HEVC intra prediction

    Heming Sun, Satoshi Goto

    International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC)     449 - 451  July 2014  [Peer-reviewed]

  • Fast Prediction Unit Selection and Mode Selection for HEVC Intra Prediction

    Heming Sun, Dajiang Zhou, Peilin Liu, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 2 ) 510 - 519  February 2014  [Peer-reviewed]

    As a next-generation video compression standard, High Efficiency Video Coding (HEVC) achieves enhanced coding performance relative to prior standards such as H.264/AVC. In the new standard, the improved intra prediction plays an important role in bit rate saving. Meanwhile, it also involves significantly increased complexity, due to the adoption of a highly flexible coding unit structure and a large number of angular prediction modes. In this paper, we present a low-complexity intra prediction algorithm for HEVC. We first propose a fast preprocessing stage based on a simplified cost model. Based on its results, a fast prediction unit selection scheme reduces the number of prediction unit (PU) levels that require fine processing from 5 to 2. To supply the PU size decision with appropriate thresholds, a fast training method is also designed. Also based on the preprocessing results, an efficient mode selection scheme reduces the maximum number of angular modes to evaluate from 35 to 8. This achieves further algorithm acceleration by eliminating the need to perform fine Hadamard cost calculation. We also propose a 32 x 32 PU compensation scheme to alleviate the mismatch of cost functions for large transform units, which effectively improves coding performance for high-resolution sequences. In comparison with HM 7.0, the proposed algorithm achieves over 50% complexity reduction in terms of encoding time, with the corresponding bit rate increase lower than 2.0%. Moreover, the achieved complexity reduction is relatively stable and independent of sequence characteristics.

  • Low-complexity rate-distortion optimization algorithms for HEVC intra prediction

    Zhe Sheng, Dajiang Zhou, Heming Sun, Satoshi Goto

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   8325 ( 1 ) 541 - 552  2014  [Peer-reviewed]

    HEVC achieves better coding efficiency than prior standards, but also involves dramatically increased complexity. The complexity increase for intra prediction is especially intensive due to a highly flexible quad-tree coding structure and a large number of prediction modes. The encoder employs rate-distortion optimization (RDO) to select the optimal coding mode, and RDO accounts for a large portion of intra encoding complexity. Moreover, HEVC depends more strongly on RDO than H.264/AVC does. To reduce the computational complexity and to implement a real-time system, this paper presents two low-complexity RDO algorithms for HEVC intra prediction. The structure of RDO is simplified by the proposed rate and distortion estimators, and some hardware-unfriendly modules are facilitated. Compared with the original RDO procedure, the two proposed algorithms reduce RDO time by 46% and 64% respectively with acceptable coding efficiency loss. © 2014 Springer International Publishing.

  • VLSI ARCHITECTURE OF HEVC INTRA PREDICTION FOR 8K UHDTV APPLICATIONS

    Jianbin Zhou, Dajiang Zhou, Heming Sun, Satoshi Goto

    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)     1273 - 1277  2014  [Peer-reviewed]

    This paper presents an efficient VLSI architecture of intra prediction for an 8Kx4K HEVC decoder. It supports all 35 intra prediction modes and prediction sizes ranging from 4x4 to 64x64. This work proposes Cyclic SRAM Banks based Parallel Reference Sample Fetching (CSB-PRSF), which guarantees enough reference samples for prediction and reduces the number of registers used for storing reference samples. To guarantee high throughput, 16 pixels are predicted by 4x4 Block Based Pipelining, and dependency between neighboring blocks is eliminated by Hybrid Data Forwarding and Block Reordering.
    This architecture is synthesized using 90 nm technology, and the maximum working frequency is 469 MHz, with an area of 72.1K gates. Running at 397 MHz, the architecture can support 4320p@120fps HEVC intra decoding, with full modes and full sizes.

  • AN AREA-EFFICIENT 4/8/16/32-POINT INVERSE DCT ARCHITECTURE FOR UHDTV HEVC DECODER

    Heming Sun, Dajiang Zhou, Jiayi Zhu, Shinji Kimura, Satoshi Goto

    2014 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING CONFERENCE     197 - 200  2014  [Peer-reviewed]

    This paper presents a new VLSI architecture for HEVC inverse discrete cosine transform (IDCT). Compared to prior art, this work reduces hardware cost by 1) reducing the computational logic of 1-D IDCTs with a reordered parallel-in serial-out (RPISO) scheme that shares the inputs of the butterfly structure, and 2) reducing the area of the transpose buffer with a cyclic memory organization that achieves 100% I/O utilization of the SRAMs. In the implementation of a unified 4/8/16/32-point IDCT, the proposed schemes demonstrate 35% and 62% reduction of logic and memory costs, respectively. The IDCT implementation can support real-time decoding of 4Kx2K 60fps video with a total hardware cost of 357,250 µm² for the 2-D IDCT and 80,988 µm² for the transpose memory in a 90 nm process.

  • Real-Time Human Detection Based on Multi-Scale Bidirectional Local Template Patterns

    Jiu Xu, Ning Jiang, Heming Sun, Axel Beaugendre, Satoshi Goto

    IIEEJ Transactions on Electronics and Visual Computing   1 ( 1 ) 28 - 37  December 2013  [Peer-reviewed]

  • Multi-scale Bidirectional Local Template Patterns for Real-time Human Detection

    Jiu Xu, Ning Jiang, Xinwei Xue, Heming Sun, Wenxin Yu, Satoshi Goto

    2013 IEEE 15TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP)     379 - 383  2013  [Peer-reviewed]

    In this paper, a feature named multi-scale bidirectional local template patterns (MBLTP) is proposed for human detection. As an extension of bidirectional local template patterns (BLTP), MBLTP not only integrates the textural and gradient information according to the four predefined templates but also calculates information for additional feature vectors by adjusting the scale of the training samples. These additional feature vectors contain multi-scale information on the samples, which can make the feature more discriminative than its original form. Experimental results on the INRIA dataset show that the detection rate of our proposed MBLTP feature outperforms those of other features such as the multi-level histogram of orientated gradient (multi-level HOG), multi-scale block histogram of template (MB-HOT), and HOG-LBP. Moreover, to make our feature meet real-time requirements, an implementation based on a graphics processing unit (GPU) is adopted to accelerate the calculation.

  • A low-complexity HEVC intra prediction algorithm based on level and mode filtering

    Heming Sun, Dajiang Zhou, Satoshi Goto

    Proceedings - IEEE International Conference on Multimedia and Expo     1085 - 1090  2012  [Peer-reviewed]

    HEVC achieves better coding efficiency than prior standards, but also involves increased complexity. For intra prediction, complexity is especially intensive due to a highly flexible coding unit structure and a large number of prediction modes. This paper presents a low-complexity intra prediction algorithm for HEVC. A fast preprocessing stage based on a simplified cost model is proposed. Based on its results, a level filtering scheme reduces the number of prediction unit levels that require fine processing from 5 to 2. To supply the level filtering decision with appropriate thresholds, a fast training method is also designed. A mode filtering scheme further reduces the maximum number of angular modes to be evaluated from 34 to 9. Complexity reduction from HM 3.0 is over 50% and stable for various sequences, which makes the proposed algorithm suitable for real-time applications. The corresponding bit rate increase is lower than 2.5%. © 2012 IEEE.

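Awards

  • IEEE VCIP Best Paper Award

    December 2020

  • Research Encouragement Award

    January 2020   Takayanagi Kenjiro Foundation (Public Interest Incorporated Foundation)

  • Picture Coding Symposium Grand Challenge on Short Video Coding Silver Award

    November 2019

  • CVPR Workshop and Challenge on Learned Image Compression, 5th place in MOS

    June 2019

  • Telecom System Technology Student Award

    March 2018   The Telecommunications Advancement Foundation (Public Interest Incorporated Foundation)

  • VDEC Design Award, Encouragement Prize

    September 2017   VLSI Design and Education Center (VDEC), The University of Tokyo

  • ISSCC 2016 Takuo Sugano Award for Outstanding Far-East Paper

    February 2016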

Joint Research and Competitive Research Funding Projects

  • An innovative video compression system based on real-time, low-power deep learning

Lectures and Oral Presentations

  • Advances in Design and Implementation of End-to-End Learned Image and Video Compression

    Wen-Hsiao Peng, Heming Sun

    IEEE ISCAS

    Presentation date: May 2021

  • Deep Learning Method for Image Compression

    Heming Sun

    Information Processing Society of Japan (IPSJ) Short Intensive Seminar (SC 29)

    Presentation date: February 2021

 

Courses Currently Taught

Courses Taught

  • Theory and Applications of Neural Networks

    September 2019 - present

  • Fundamentals of Programming

    April 2019 - present

 

Committee Memberships

  • May 2021

    IEEE ISCAS  Session Chair

  • November 2019

    Picture Coding Symposium  Special Session Chair

Academic Contribution Activities

  • ACM Transactions on Reconfigurable Technology and Systems

  • International Journal of Computer Vision

  • IEEE Signal Processing Letters

  • IEEE Transactions on Pattern Analysis and Machine Intelligence

  • IEEE Transactions on Multimedia

  • IEEE Transactions on Circuits and Systems I: Regular Papers

  • IEEE Transactions on Circuits and Systems—II: Express Briefs

  • IEEE Transactions on Circuits and Systems for Video Technology
