Updated on 2022/05/19

SHI, Youhua
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor

Concurrent Post

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

Research Institute

  • 2020
    -
    2022

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

Education

  •  
    -
    2005

    Waseda University   Graduate School, Division of Engineering  

Degree

  • Doctor of Engineering

Professional Memberships

  • IEICE

  • IPSJ

  • IEEE

 

Research Areas

  • Information security

  • Electron device and electronic equipment

  • Computer system

Research Interests

  • Reliable and fault-tolerant computing, Cryptography, Video Processing

Papers

  • Power-Efficient Deep Convolutional Neural Network Design Through Zero-Gating PEs and Partial-Sum Reuse Centric Dataflow.

    Lin Ye, Jinghao Ye, Masao Yanagisawa, Youhua Shi

    IEEE Access   9   17411 - 17420  2021

  • A High-Performance Symmetric Hybrid Form Design for High-Order FIR Filters.

    Jinghao Ye, Masao Yanagisawa, Youhua Shi

    2020 IEEE Asia Pacific Conference on Circuits and Systems(APCCAS)     121 - 124  2020

  • Transition Detector-Based Radiation-Hardened Latch for Both Single- and Multiple-Node Upsets.

    Saki Tajima, Masao Yanagisawa, Youhua Shi

    IEEE Trans. Circuits Syst. II Express Briefs   67-II ( 6 ) 1114 - 1118  2020

    © 2004-2012 IEEE. This brief presents an output transition detector-based radiation-hardened latch (TDRHL) for reliability improvement. With an error recovery assistant logic and an in-situ transition detector, for any radiation-induced single- and double-node upsets, the proposed TDRHL can 1) provide full self-recovery capability and 2) generate a warning signal for architecture-level recovery when soft errors cause the latch output to flip. The evaluation results show that TDRHL outperforms state-of-the-art double-node upset tolerant designs with additional error detection capability, and up to 5.0X power-delay-product improvement can be achieved.

  • Faithfully Truncated Adder-Based Area-Power Efficient FIR Design with Predefined Output Accuracy.

    Jinghao Ye, Masao Yanagisawa, Youhua Shi

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   103-A ( 9 ) 1063 - 1070  2020

    © 2020 The Institute of Electronics, Information and Communication Engineers. To solve the area and power problems in Finite Impulse Response (FIR) implementations, a faithfully truncated adder-based FIR design is presented in this paper for significant area and power savings while the predefined output accuracy can still be obtained. As a solution to the accuracy loss caused by truncated adders, a static error analysis on the utilization of truncated adders in FIRs was performed. According to the mathematical analysis, we show that, with a given accuracy constraint, the optimal truncated adder configuration for an area-power efficient FIR design can be effortlessly determined. Evaluation results on various FIR implementations using the proposed faithfully truncated adder designs showed that up to 35.4% and 27.9% savings in area and power consumption can be achieved with less than 1 ulp accuracy loss for uniformly distributed random inputs. Moreover, as a case study for normally distributed signals, a fixed 6-tap FIR for electrocardiogram (ECG) signal filtering was implemented, in which even with the number of truncated bits increased up to 10, the mean absolute error (E) can be guaranteed to be less than 1 ulp while up to 29.7% and 25.3% savings in area and power can be obtained.

  • A Power-Efficient Soft Error Hardened Latch Design with In-Situ Error Detection Capability

    Saki Tajima, Masao Yanagisawa, Youhua Shi

    Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics   2019-November   53 - 56  2019.11

    © 2019 IEEE. A power-efficient single node upset hardened latch design with in-situ error detection capability, EDSL, is proposed in this paper for reliability improvement against soft errors. Our simulation results show that the proposed EDSL design can not only recover from any incurred single node upset, but also provide in-situ error detection capability when the latch output is upset. When compared with state-of-the-art error-detection-based and SNU resilient designs, the proposed EDSL latch can achieve up to 72.25% and 79.74% reduction of power-delay-product respectively, which clearly shows the effectiveness of the proposed method.

  • An Adder-Segmentation-based FIR for High Speed Signal Processing.

    Jinghao Ye, Masao Yanagisawa, Youhua Shi

    Proceedings of International Conference on ASIC     1 - 4  2019

    © 2019 IEEE. An advanced adder-segmentation-based FIR filter design for high speed signal processing is proposed in this paper. In the proposed method, the critical path delay is shortened through adder segmentation. An analysis for the optimization of adder segmentation is also proposed, which can be used for critical path delay balance to maximize the performance of FIR filters. The evaluation results show that the proposed design can achieve up to 30.7% and 22.8% reduction in area-delay-product (ADP) and energy-delay-product (EDP) when compared with the existing FIR filters.

  • A Zero-Gating Processing Element Design for Low-Power Deep Convolutional Neural Networks.

    Lin Ye, Jinghao Ye, Masao Yanagisawa, Youhua Shi

    Proceedings - APCCAS 2019: 2019 IEEE Asia Pacific Conference on Circuits and Systems: Innovative CAS Towards Sustainable Energy and Technology Disruption     317 - 320  2019

    © 2019 IEEE. Convolutional neural networks (CNNs) have shown great success in many areas such as object detection and pattern recognition. However, the high computational complexity of state-of-the-art deep CNNs makes them extremely difficult to run on resource-constrained mobile and wearable devices. To address this design challenge, in this paper we first analyzed the filter weights of pre-trained models from four state-of-the-art CNNs. We found that in all the CNNs that we analyzed, from about 20% (AlexNet) to 43% (VGG-19) of the weights are zeros, which leads to large amounts of redundant computation. Then, based on this observation, a zero-gating processing element (PE) design was proposed for low-power deep CNNs, in which the vast number of zeros in both activation maps and filter weights is exploited to eliminate redundant computation for power reduction. We implemented our proposal with VGG-16 using the ImageNet dataset. Experiments were conducted to evaluate area and total power consumption. Compared with the baseline PE design without zero-gating, the proposed zero-gating PE achieves an overall 37% power saving while the corresponding area overhead is less than 8%.

  • A Bit-Segmented Adder Chain based Symmetric Transpose Two-Block FIR Design for High-Speed Signal Processing.

    Jinghao Ye, Masao Yanagisawa, Youhua Shi

    Proceedings - APCCAS 2019: 2019 IEEE Asia Pacific Conference on Circuits and Systems: Innovative CAS Towards Sustainable Energy and Technology Disruption     29 - 32  2019

    © 2019 IEEE. A high-speed FIR filter structure is proposed in this paper by utilizing bit-segmentation adders and a symmetric transpose 2-block FIR structure. First, a bit-segmented adder chain-based design is proposed with bit-segmentation adders. Second, a basic unit design of the symmetric transpose block FIR is proposed to reduce the critical path delay. The evaluation results show that, when compared with a state-of-the-art high-speed CSD multiplier-based FIR filter design, the proposed design requires 14.1% less area while providing a 7.9% frequency improvement, a 10.2% reduction in power consumption, a 22.8% reduction in energy-delay-product and a 20.4% reduction in area-delay-product, which shows the effectiveness of the proposed method.

  • Static Error Analysis and Optimization of Faithfully Truncated Adders for Area-Power Efficient FIR Designs.

    Jinghao Ye, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    Proceedings - IEEE International Symposium on Circuits and Systems   2019-May   1 - 4  2019  [Refereed]

    © 2019 IEEE. Faithfully truncated adders are used for low-cost FIR implementations in this paper, which improves on state-of-the-art CSD-based FIR filter designs for further area and power reduction while meeting the accuracy requirement. As a solution to the accuracy loss caused by truncated adders, this paper performs a static error analysis of truncated adders. Furthermore, based upon our mathematical analysis, we show that, with a given accuracy constraint, an optimal truncated adder configuration can be effortlessly determined for area-power efficient FIR designs. Evaluation results on various FIR designs showed that a 16.8%~35.4% reduction in area and an 11.8%~27.9% saving in power can be achieved with the proposed optimal truncated adder designs within an average error of 1 ulp.

  • Hardware Trojan Detection Utilizing Machine Learning Approaches.

    Kento Hasegawa, Youhua Shi, Nozomu Togawa

    Proceedings - 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications and 12th IEEE International Conference on Big Data Science and Engineering, Trustcom/BigDataSE 2018     1891 - 1896  2018  [Refereed]

    © 2018 IEEE. Hardware security has become a serious concern in recent years. Due to outsourcing in hardware production, malicious circuits (or hardware Trojans) can be easily inserted into hardware products by attackers. Since hardware Trojans are tiny and stealthy, their detection is difficult. Under these circumstances, numerous hardware-Trojan detection methods have been proposed. In this paper, we give an overview of hardware-Trojan detection and review the hardware-Trojan detection methods using machine learning, which is one of the state-of-the-art approaches.

  • A Low Power Soft Error Hardened Latch with Schmitt-Trigger-Based C-Element.

    Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   101-A ( 7 ) 1025 - 1034  2018  [Refereed]

    Copyright © 2018 The Institute of Electronics, Information and Communication Engineers. To deal with the reliability issue caused by soft errors, this paper proposed a low power soft error hardened latch (SHC) design using a novel Schmitt-Trigger-based C-element for reliable low power applications. Unlike state-of-the-art soft error tolerant latches that are usually based on hardware redundancy with large area overhead and high power consumption, the proposed SHC latch is implemented through double-sampling and node-checking using a novel Schmitt-Trigger-based C-element, which can help to reduce the area overhead and the corresponding power consumption as well. The evaluation results show that the total number of transistors of the proposed SHC latch is only increased by 2 when compared to the conventional unhardened C2MOS latch, while up to 20.35% and 82.96% power reduction can be achieved when compared to the conventional un-hardened C2MOS latch and the existing soft error tolerant HiPeR design, respectively.

  • Extension and Performance/Accuracy Formulation for Optimal GeAr-Based Approximate Adder Designs.

    Ken Hayamizu, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   101-A ( 7 ) 1014 - 1024  2018  [Refereed]

    Copyright © 2018 The Institute of Electronics, Information and Communication Engineers. Approximate computing is a promising solution for future energy-efficient designs because it can provide great improvements in performance, area and/or energy consumption over traditional exact-computing designs for non-critical error-tolerant applications. However, the most challenging issue in designing approximate circuits is how to guarantee the pre-specified computation accuracy while achieving energy reduction and performance improvement. To address this problem, this paper starts from the state-of-the-art general approximate adder model (GeAr) and extends it for more possible approximate design candidates by relaxing the design restrictions. And then a maximum-error-distance-based performance/accuracy formulation, which can be used to select the performance/energy-accuracy optimal design from the extended design space, is proposed. Our evaluation results show the effectiveness of the proposed method in terms of area overhead, performance, energy consumption, and computation accuracy.

  • A low cost and high speed CSD-based symmetric transpose block FIR implementation.

    Jinghao Ye, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    Proceedings of International Conference on ASIC   2017-   311 - 314  2017  [Refereed]

    In this paper, a low cost and high speed CSD-based symmetric transpose block FIR design was proposed for low cost digital signal processing. First, the existing area-efficient CSD-based multiplier was optimized by considering the reusability and the symmetry of coefficients for area reduction. Second, the position of the input register was changed for high speed transpose block FIR processing in which half of the number of required multipliers can be saved. When compared with the existing block FIR designs, the proposed FIR design can increase the data rate from 238.66 MHz to 373.13 MHz while saving 10.89% area and 21.30% energy consumption as well.

  • Soft error tolerant latch designs with low power consumption (invited paper).

    Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    Proceedings of International Conference on ASIC   2017-   52 - 55  2017  [Refereed]

    As semiconductor technology continues scaling down, the reliability issue has become much more critical than ever before. Unlike traditional hard errors, which are caused by permanent physical damage and cannot be recovered in the field, soft errors are caused by radiation or voltage/current fluctuations that lead to transient changes in internal node states, and thus they can be viewed as temporary errors. However, due to the unpredictable occurrence of soft errors, it is desirable to develop soft error tolerant designs. For this reason, soft error tolerant design techniques have gained great research interest. In this paper, we explain the soft error mechanism and then review the existing soft error tolerant design techniques with particular emphasis on the SEH family, because these designs can achieve low power consumption and small performance overhead as well.

  • In-situ Trojan authentication for invalidating hardware-Trojan functions.

    Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN ISQED 2016     152 - 157  2016  [Refereed]

    Because we do not know who will create hardware Trojans (HTs), or when and where they will be inserted, it is very difficult to correctly and completely detect all the real HTs in untrusted ICs, and thus it is desirable to incorporate in-situ HT invalidating functions into untrusted ICs as a countermeasure against HTs. This paper proposes an in-situ Trojan authentication technique for gate-level netlists to avoid security leakage. In the proposed approach, an untrusted IC operates in authentication mode and normal mode. In the authentication mode, an embedded Trojan authentication circuit monitors the bit-flipping count of a suspicious Trojan net within a pre-defined constant number of clock cycles and identifies whether it is a real Trojan or not. If the authentication condition is satisfied, the suspicious Trojan net is validated. Otherwise, it is invalidated and HT functions are masked. By doing this, even untrusted netlists with HTs can still be used in the normal mode without security leakage. By setting the appropriate authentication condition using training sets from Trust-HUB gate-level benchmarks, the proposed technique successfully invalidates only the HTs in the training sets. Furthermore, by embedding the in-situ Trojan authentication circuit into a Trojan-inserted AES crypto netlist, the netlist can run securely and correctly even if HTs exist, with an area overhead of just 1.5% and no delay overhead.

  • A delay variation and floorplan aware high-level synthesis algorithm with body biasing.

    Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN ISQED 2016     75 - 80  2016  [Refereed]

    In this paper, we propose a delay variation and floorplan aware high-level synthesis algorithm with body biasing, which minimizes the average leakage energy of manufactured chips. To realize a floorplan-oriented high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture), one of the DR architectures. The HDR architecture divides the chip area into small partitions called huddles, and we can control the body bias voltage for every huddle. During high-level synthesis, we iteratively obtain the expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing the expected leakage energy of the entire circuit but can increase the latency. We assign CDFG nodes in critical paths to the huddles with larger expected leakage energy and those in non-critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 38.9% without latency and yield degradation compared with typical-case design with body biasing.

  • Timing monitoring paths selection for wide voltage IC.

    Weiwei Shan, Wentao Dai, Youhua Shi, Peng Cao, Xiaoyan Xiang

    IEICE Electron. Express   13 ( 8 ) 20160095 - 20160095  2016  [Refereed]

    Wide-voltage-range circuits have attracted widespread attention, and in-situ timing monitoring based adaptive voltage scaling (AVS) has become necessary to reduce the design margin in such circuits. However, the severe PVT variations across the near-threshold to super-threshold range cause too many critical paths to be monitored. Here, an activation-oriented monitoring-path selection method is proposed to reduce the number of monitored paths for wide-voltage ICs. The minimum delay value of the longest activated path is found by dynamic timing analysis and set as the selection threshold. Paths that are longer than this threshold according to STA are selected to be monitored. Applied to a 40 nm AVS System-on-Chip, the method reduces the monitored paths to only 22% of all critical paths, with remarkable power gains under 0.6V-1.1V.

  • A Process-Variation-Aware Multi-Scenario High-Level Synthesis Algorithm for Distributed-Register Architectures

    Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2015 28TH IEEE INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (SOCC)     7 - 12  2015  [Refereed]

    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a process-variation-aware low-latency and multi-scenario high-level synthesis algorithm targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

  • A Score-Based Classification Method for Identifying Hardware-Trojans at Gate-Level Netlists

    Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2015 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE)     465 - 470  2015  [Refereed]

    Recently, digital ICs are often designed by outside vendors to reduce design costs in the semiconductor industry, which may introduce a severe risk that malicious attackers implement Hardware Trojans (HTs) on them. Since the IC design phase generates only a single design result, an RT-level or gate-level netlist for example, we cannot assume an HT-free netlist or a Golden netlist, and so it is too difficult to identify whether a generated netlist is HT-free or HT-inserted. In this paper, we propose a score-based classification method for identifying HT-free or HT-inserted gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but instead detects a net included in HTs, which is called a Trojan net. Firstly, we observe Trojan nets from several HT-inserted benchmarks and extract several of their features. Secondly, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as "HT-inserted" and all the HT-free gate-level benchmarks as "HT-free" in approximately three hours for each benchmark.

  • Improved Monitoring-Path Selection Algorithm for Suspicious Timing Error Prediction based Timing Speculation

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

    As process technology scales down, timing speculation techniques such as Razor and STEP have emerged as alternative solutions to reduce the margins required due to various variation effects. Unlike Razor, STEP is a prediction-based timing speculation method that predicts suspicious timing errors before they really appear, and thus it can result in more performance improvement. Therefore, an improved monitoring-path selection algorithm for STEP-based timing speculation is proposed in this paper, in which candidate monitoring paths are selected based on short-path removal and path length estimation. Experimental results show that the proposed algorithm realizes an average of 1.71X overclocking compared with worst-case based designs.

  • A low-power soft error tolerant latch scheme.

    Saki Tajima, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

    As process technology continues scaling, low power and reliability of integrated circuits are becoming more critical than ever before. In particular, the reduction of node capacitance and operating voltage for low power consumption makes circuits more sensitive to soft errors induced by high-energy particles. In this paper, a soft-error tolerant latch called TSPC-SEH is proposed for soft error tolerance with low power consumption. The simulation results show that the proposed TSPC-SEH latch can achieve up to 42% power consumption reduction and 54% delay improvement compared to the existing soft error tolerant SEH and DICE designs.

  • A Hardware-Trojans Identifying Method Based on Trojan Net Scoring at Gate-Level Netlists.

    Masaru Oya, Youhua Shi, Noritaka Yamashita, Toshihiko Okamura, Yukiyasu Tsunoo, Satoshi Goto, Masao Yanagisawa, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   98-A ( 12 ) 2537 - 2546  2015  [Refereed]

    Outsourcing IC design and fabrication is an effective way to reduce design cost, but it may cause severe security risks. In particular, malicious outside vendors may implement Hardware Trojans (HTs) on ICs. When we focus on the IC design phase, we cannot assume an HT-free netlist or a Golden netlist, and it is too difficult to identify whether a given netlist is HT-free or not. In this paper, we propose a score-based hardware-Trojan identifying method for gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but instead detects a net included in HTs, which is called a Trojan net. Firstly, we observe Trojan nets from several HT-inserted benchmarks and extract several of their features. Secondly, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks and ISCAS85 benchmarks as well as HT-free and HT-inserted AES gate-level netlists. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as "HT-inserted" and all the HT-free gate-level benchmarks as "HT-free" in approximately three hours for each benchmark.

  • An Effective Suspicious Timing-Error Prediction Circuit Insertion Algorithm Minimizing Area Overhead.

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   98-A ( 7 ) 1406 - 1418  2015  [Refereed]

    As process technologies advance, timing-error correction techniques have become important as well. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points, of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) in the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus require too much area overhead. How to determine these check points is very important. In this paper, we propose an effective STEPC insertion algorithm minimizing area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize inserted STEPC counts. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.

  • An Energy-Efficient Floorplan Driven High-Level Synthesis Algorithm for Multiple Clock Domains Design.

    Shin-ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   98-A ( 7 ) 1376 - 1391  2015  [Refereed]

    In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing configuration and placement of huddles. Experimental results show that the proposed method achieves 32.5% energy-saving compared with the existing single clock domain based methods.

  • A Floorplan-Aware High-Level Synthesis Technique with Delay-Variation Tolerance

    Kazushi Kawamura, Yuta Hagio, Youhua Shi, Nozomu Togawa

    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC)     122 - 125  2015  [Refereed]

    To realize a better trade-off between performance and yield rate in recent LSI designs, it is necessary to deal with the increasing ratio of interconnect delay as well as delay variation. In this paper, a novel floorplan-aware high-level synthesis technique with delay-variation tolerance is proposed. By utilizing floorplan-driven architectures, interconnect delays can be estimated and then handled even in high-level synthesis. Our technique makes it possible to realize two scheduling/binding results (one is a non-delayed result and the other is a delayed result) simultaneously on a chip with small area/performance overhead, and either one of them can be selected according to the post-silicon delay variation. Experimental results demonstrate that our technique can reduce delayed scheduling/binding latency by up to 32.3% compared with conventional approaches.

  • A universal delay line circuit for variation resilient IC with self-calibrated time-to-digital converter

    Shuai Shao, Youhua Shi, Wentao Dai, Jianyi Meng, Weiwei Shan

    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC)     126 - 129  2015  [Refereed]

    A universal delay monitor used to imitate the real critical paths is developed for variation-resilient integrated circuits. This monitor is constructed based on different proportions of logic cells and interconnects. The delay of the monitor is detected by a time-to-digital converter, which keeps the sampling results precise. To reduce the deviation of the sampling results caused by PVT, a novel time-to-digital converter with a self-calibration mechanism is developed. Adaptive voltage scaling based on this variation-resilient method is applied to an ARM7-based System on a Chip in a 0.18 μm CMOS process with a 112 MHz signoff frequency and an area of 1.3 × 1.3 mm². The simulation results show that it achieves a 43.42% gain in power consumption under the FF corner at -25 °C compared to the fixed 1.8 V traditional design.

  • Scan-based Side-channel Attack against Symmetric Key Ciphers Using Scan Signatures

    Mika Fujishiro, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC)     309 - 312  2015  [Refereed]

    There are a number of studies on side-channel attacks, which use information leaked from the physical implementation of a cryptosystem. A scan-based side-channel attack utilizes scan chains, a design-for-test technique, and retrieves the secret information inside the cryptosystem. In this paper, scan-based side-channel attack methods using scan signatures against symmetric key ciphers, such as block ciphers and stream ciphers, are presented to show the risk of scan-based attacks.

  • Throughput Driven Check Point Selection in Suspicious Timing Error Prediction based Designs

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE 5TH LATIN AMERICAN SYMPOSIUM ON CIRCUITS AND SYSTEMS (LASCAS)     1 - 4  2014  [Refereed]

    In this paper, a throughput-driven design technique is proposed, in which a suspicious timing error prediction circuit is inserted to monitor the signal transitions at some selected check points. Unlike previous works where timing errors are detected after their occurrence, the proposed method uses the real intermediate signal transitions for timing error prediction. The check point selection affects both the maximal operation frequency and the suspicious timing error overestimation rate, both of which have an effect on the overall throughput, and thus an analysis of the check point selection is also given. In our work, the circuit can be overclocked by a factor of 2 or more with negligible area overhead while guaranteeing always-correct output.

    DOI

  • In-situ Timing Monitoring Methods for Variation-Resilient Designs

    Youhua Shi, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     735 - 738  2014  [Refereed]

     View Summary

    With technology scaling, process, voltage, and temperature (PVT) variations pose great challenges to integrated circuit design. Conventionally, LSI circuits are designed with a pessimistic timing margin to guarantee "always correct" operation even under worst-case conditions. However, due to increasing PVT variations, an unacceptably large design guard band must be reserved to avoid timing errors on the critical paths of circuits, which leads to very inefficient designs in terms of power and performance. For this reason, in-situ timing monitoring techniques have gained great research interest. In this paper, we review existing variation-resilient design techniques with particular emphasis on in-situ timing monitoring techniques, including both detection-based and prediction-based methods. The effectiveness of in-situ timing monitoring techniques is discussed. Finally, we show an example of an in-situ timing monitoring technique called STEP with applications to general pipeline designs.

    DOI

  • Secure scan design using improved random order and its evaluations

    Masaru Oya, Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     555 - 558  2014  [Refereed]

     View Summary

    Scan test using scan chains is one of the most important DFT techniques. However, scan-based attacks are reported which can retrieve the secret key in crypto circuits by using scan chains. Secure scan architecture is strongly required to protect scan chains from scan-based attacks. This paper proposes an improved version of random order as a secure scan architecture. In improved random order, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a subchain to scan out. Testability and security of the proposed improved random order are also discussed in the paper, and the implementation results demonstrate the effectiveness of the proposed method.

    DOI

  • An Area-Overhead-Oriented Monitoring-Path Selection Algorithm for Suspicious Timing Error Prediction

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     300 - 303  2014  [Refereed]

     View Summary

    As process technologies advance, the importance of timing error correction techniques is increasing as well. In this paper, we propose an area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction circuits (STEPCs). A STEPC predicts timing errors by monitoring the middle points of several speed-paths in a circuit. However, predicting timing errors across an entire circuit requires many STEPCs, which incurs a high area overhead. Our proposed method moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of the inserted STEPCs. Our proposed algorithm reduces the required number of STEPCs to 1/19 and their area to 1/5 compared with a naive algorithm. Furthermore, our algorithm realizes 2.25X overclocking compared with just inserting STEPCs into several speed-paths.

    DOI

  • Floorplan Driven Architecture and High-Level Synthesis Algorithm for Dynamic Multiple Supply Voltages

    Shin-ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E96A ( 12 ) 2597 - 2611  2013.12  [Refereed]

     View Summary

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.

    DOI

  • An Energy-efficient High-level Synthesis Algorithm Incorporating Interconnection Delays and Dynamic Multiple Supply Voltages

    Shin-ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    2013 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION, AND TEST (VLSI-DAT)     1 - 4  2013  [Refereed]

     View Summary

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture) that integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis. Next, we propose a high-level synthesis algorithm for AVHDR architectures. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which abstracts modules placed close to each other, are naturally generated using floorplanning. Low-supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Experimental results show that our algorithm achieves 50% energy-saving compared with conventional algorithms.

    DOI

  • Secure scan design with dynamically configurable connection

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC     256 - 262  2013  [Refereed]

     View Summary

    Scan test is a powerful test technique which can control and observe the internal states of the circuit under test through scan chains. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Therefore, new secure test methods are required to satisfy both testability and security requirements. In this paper, a secure scan design is proposed that meets the security requirement as a countermeasure against scan-based attacks while still maintaining high testability like normal scan testing. In our method, the internal scan chain is divided into several sub-chains, and the connection order of the sub-chains can be dynamically changed. In addition, this paper proposes how to decide the connection order of those sub-chains so that it cannot be identified by an attacker. The proposed method is implemented on an AES circuit to show its effectiveness, and a security analysis is given to show how the proposed approach can be used as a countermeasure against known scan-based attacks.

    DOI

  • Suspicious Timing Error Prediction with In-Cycle Clock Gating

    Youhua Shi, Hiroaki Igarashi, Nozomu Togawa, Masao Yanagisawa

    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2013)     335 - 340  2013  [Refereed]

     View Summary

    Conventionally, circuits are designed with a pessimistic timing margin to solve delay variation problems, which guarantees "always correct" operation. However, because such worst-case conditions occur rarely, the traditional pessimistic design method is becoming one of the main obstacles for designers to achieve higher performance and/or ultra-low power consumption. By monitoring timing error occurrence during circuit operation, adaptive timing error detection and recovery methods have recently gained wide interest as a promising solution. As an extension of existing research, in this paper we propose a suspicious timing error prediction method for performance or energy efficiency improvement in pipeline designs. Experimental results show that, when compared with typical margin designs, the proposed method can 1) achieve up to 1.41X throughput improvement with in-situ timing error prediction ability; and 2) allow the design to be overclocked by up to 1.88X with "always correct" outputs.

    DOI

  • Concurrent Faulty Clock Detection for Crypto Circuits against Clock Glitch based DFA

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2013 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     1432 - 1435  2013  [Refereed]

     View Summary

    In this paper, a concurrent faulty clock detection method is proposed for crypto circuits against clock glitch based differential fault analysis (DFA). In the proposed method, a non-logic buffer-based delay chain is inserted, and by monitoring the delay along the delay chain, a possible clock glitch based DFA can be detected. Experimental results on an AES circuit show that the proposed method can successfully detect clock glitch based attacks, and that the required area overhead is only 0.47%, which is much smaller than in previous works.

    DOI

  • Scan-Based Attack on AES through Round Registers and Its Countermeasure

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E95A ( 12 ) 2338 - 2346  2012.12  [Refereed]

     View Summary

    Scan-based side channel attacks on hardware implementations of cryptographic algorithms have been shown to be a great security threat. Unlike existing scan-based attacks, in our work we observed that, in addition to the secret-related registers, some non-secret registers also carry the potential of being misused to help an attacker retrieve secret keys. In this paper, we first present a scan-based side channel attack method on AES that makes use of the round counter registers, which were not paid attention to in previous works, to show the potential security threat in designs with scan chains. We then discuss the requirements for secure DFT and propose a secure scan scheme for crypto hardware implementations that preserves all the advantages and simplicity of traditional scan test while significantly improving security with negligible design overhead.

    DOI

  • Dynamically Changeable Secure Scan Architecture against Scan-Based Side Channel Attack

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2012 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     155 - 158  2012  [Refereed]

     View Summary

    Scan test, which is one of the useful design-for-testability techniques, is effective for LSIs including cryptographic circuits. It can observe and control the internal states of the circuit under test by using scan chains. However, scan chains present a significant security risk of information leakage through scan-based attacks, which retrieve the secret keys of cryptographic LSIs. In this paper, a secure scan architecture against scan-based attacks that still has high testability is proposed. In our method, scan data is dynamically changed by adding a latch to any FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, which demonstrates the effectiveness of the proposed method.

    DOI

  • State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Side Channel Attack on RSA Circuit

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     607 - 610  2012  [Refereed]

     View Summary

    Scan test is one of the useful design-for-testability techniques, which can detect circuit failures efficiently. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Testability and security therefore conflict with each other, and there is a need for an efficient design-for-testability circuit that satisfies both testability and security requirements. In this paper, a secure scan architecture against scan-based attacks is proposed to achieve high security without compromising testability. In our method, the scan structure is dynamically changed by adding a latch to any FFs in the scan chain. We analyze an RSA circuit implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    DOI

  • MH4 : multiple-supply-voltages aware high-level synthesis for high-integrated and high-frequency circuits for HDR architectures

    Shin-ya Abe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEICE ELECTRONICS EXPRESS   9 ( 17 ) 1414 - 1422  2012  [Refereed]

     View Summary

    In this paper, we propose a multiple-supply-voltages aware high-level synthesis algorithm for HDR architectures, which realizes high-speed and highly efficient circuits. We propose three new techniques: virtual area estimation, virtual area adaptation, and floorplanning-directed huddling, and integrate them into our HDR architecture synthesis algorithm. Virtual area estimation/adaptation effectively estimates a huddle area by gradually reducing it during iterations, which improves the convergence of our algorithm. Floorplanning-directed huddling determines huddle composition very effectively by performing floorplanning and functional unit assignment inside huddles simultaneously. Experimental results show that our algorithm achieves about 29% run-time saving compared with the conventional algorithms, and obtains a solution that cannot be obtained by our original algorithm even if a very tight clock constraint is given.

    DOI

  • Robust Secure Scan Design Against Scan-Based Differential Cryptanalysis

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS   20 ( 1 ) 176 - 181  2012.01  [Refereed]

     View Summary

    Scan technology carries the potential risk of being misused as a "side channel" to leak out the secrets of crypto cores. The existing scan-based attacks can be viewed as one kind of differential cryptanalysis, which takes advantage of scan chains to observe the bit changes between pairs of chosen plaintexts so as to identify the secret keys. To address this design/test challenge, this paper proposes a robust secure scan structure design for crypto cores as a countermeasure against scan-based attacks, maintaining high security without compromising testability.

    DOI

  • Improved Launch for Higher TDF Coverage With Fewer Test Patterns

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS   29 ( 8 ) 1294 - 1299  2010.08  [Refereed]

     View Summary

    Due to the limitations of scan structure, the second vector in transition delay test is usually applied either by shift operation or by functional launch, which possibly results in unsatisfying transition delay fault (TDF) coverage. To overcome such a limitation for higher TDF coverage, a novel improved launch delay test technique that combines the pros of launch-on-shift and launch-on-capture tests is introduced in this paper. The proposed method can achieve near perfect TDF coverage with fewer test patterns without the need for a global fast scan enable signal. Experimental results on ISCAS89 and ITC99 benchmark circuits are included to show the effectiveness of the proposed method.

    DOI

  • State-dependent Changeable Scan Architecture against Scan-based Side Channel Attacks

    Ryuta Nara, Hiroshi Atobe, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS     1867 - 1870  2010  [Refereed]

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, the scan path could be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan path to complicate the scan structure has been recently presented. Unfortunately, it still carries the potential of being attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops are state-dependent, which can cause the output of each state-dependent scan FF to be inverted or not, making it more difficult to discover the internal scan architecture.

    DOI

  • VLSI Implementation of a Fast Intra Prediction Algorithm for H.264/AVC Encoding

    Youhua Shi, Kenta Tokumitsu, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     1139 - 1142  2010  [Refereed]

     View Summary

    Intra-frame coding is one of the most important technologies in H.264/AVC, which made significant contributions to the enhancement of coding efficiency of H.264/AVC at the cost of computation complexity. To address this problem, in this paper we present an efficient VLSI implementation of a computation efficient intra prediction algorithm for H.264/AVC encoding. Unlike most of existing fast intra-mode selection techniques, in the proposed method the directional differences are computed using a few selected original pixels to obtain the candidate modes with the minimal direction cost. The proposed method is hardware-friendly and provides more processing parallelism for H.264 intra-frame encoding with less overhead and less power consumption, which is expected to be utilized as a favourable accelerator hardware module in a real-time HDTV (1920x1080p) H.264 encoder.

    DOI

  • X-Handling for Current X-Tolerant Compactors with More Unknowns and Maximal Compaction

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 12 ) 3119 - 3127  2009.12  [Refereed]

     View Summary

    This paper presents a novel X-handling technique, which removes the effect of unknowns on the compacted test response with maximal compaction ratio. The proposed method is combined with current X-tolerant compactors and inserts masking cells on scan paths to selectively mask X's. By doing this, the number of unknown responses in each scan-out cycle can be reduced to a level that the target X-tolerant compactor can tolerate with guaranteed error detection. It guarantees no test loss due to the effect of X's, and also achieves the maximal compaction that the target response compactor can provide. Moreover, because the masking cells are only inserted on the scan paths, the designs suffer no performance degradation. Experimental results demonstrate the effectiveness of the proposed method.

    DOI

  • Unified Dual-Radix Architecture for Scalable Montgomery Multiplications in GF(P) and GF(2^n)

    Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 9 ) 2304 - 2317  2009.09  [Refereed]

     View Summary

    Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. A Montgomery multiplier is commonly used to compute the modular multiplications and requires scalability because the bit length of operands varies depending on the security level. In addition, ECC is performed in GF(P) or GF(2^n), and a unified architecture for multipliers in GF(P) and GF(2^n) is required. However, in previous works, changing the frequency is necessary to deal with the delay-time difference between GF(P) and GF(2^n) multipliers because the critical path of the GF(P) multiplier is longer. This paper proposes a unified dual-radix architecture for scalable Montgomery multiplications in GF(P) and GF(2^n). The proposed architecture unifies four parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.28 μs. The implementation result shows that the area of the proposal is almost the same as that of previous works: 39 kgates.

    DOI

  • Design-for-Secure-Test for Crypto Cores

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    ITC: 2009 INTERNATIONAL TEST CONFERENCE     618 - 618  2009  [Refereed]

     View Summary

    Scan technology carries the potential of being misused as a "side channel" to leak out the secret information of crypto cores. To address such a design challenge, this paper proposes a design-for-secure-test (DFST) solution for crypto cores by adding a stimuli-launched flip-flop into the traditional scan flip-flop to maintain the high test quality without compromising the security.

    DOI

  • A Unified Test Compression Technique for Scan Stimulus and Unknown Masking Data with No Test Loss

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E91A ( 12 ) 3514 - 3523  2008.12  [Refereed]

     View Summary

    This paper presents a unified test compression technique for scan stimulus and unknown masking data with seamless integration of test generation, test compression, and all unknown response masking for high-quality manufacturing test cost reduction. Unlike prior test compression methods, the proposed approach considers the unknown responses during the test pattern generation procedure, and then selectively encodes the less specified bits (either 1s or 0s) in each scan slice for compression while at the same time masking the unknown responses before sending them to the response compactor. The proposed test scheme can dramatically reduce test data volume as well as the number of required test channels by using only c tester channels to drive N internal scan chains, where c = ⌈log₂ N⌉ + 2. In addition, because all the unknown responses can be exactly masked before entering the response compactor, test loss due to unknown responses is eliminated. Experimental results on both benchmark circuits and larger designs indicate the effectiveness of the proposed technique.

    DOI

  • A secure test technique for pipelined advanced encryption standard

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 3 ) 776 - 780  2008.03  [Refereed]

    In this paper, we presented a Design-for-Secure-Test (DFST) technique for pipelined AES to guarantee both the security and the test quality during testing. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.

    DOI

  • Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n)

    Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     667 - 672  2008  [Refereed]

    Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability since the bit length of operands varies depending on the security level. Also, ECC is performed in GF(P) or GF(2^n), and unified architectures for GF(P) and GF(2^n) multipliers are needed. However, in previous works, changing frequency or dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) circuits of the multiplier because the critical path of the GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n). The proposed architecture unifies 4 parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.23 μs.

    DOI
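
    For readers unfamiliar with the operation named in the abstract above, the following is a minimal software sketch of textbook Montgomery multiplication in GF(P). It is not the paper's scalable dual-radix hardware architecture: it performs the reduction in one full-width step rather than word by word, it does not cover GF(2^n), and all function names are assumptions made for this example. It needs Python 3.8 or later for the modular inverse via pow.

```python
def montgomery_setup(modulus: int, bits: int):
    """Precompute constants for R = 2**bits (modulus must be odd and < R)."""
    r = 1 << bits
    if modulus % 2 == 0 or modulus >= r:
        raise ValueError("modulus must be odd and smaller than R")
    # n_prime satisfies modulus * n_prime == -1 (mod R).
    n_prime = (-pow(modulus, -1, r)) % r
    # R^2 mod modulus converts operands into Montgomery form.
    r2 = (r * r) % modulus
    return n_prime, r2


def montgomery_multiply(a: int, b: int, modulus: int, bits: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod modulus for 0 <= a, b < modulus."""
    mask = (1 << bits) - 1
    t = a * b
    # m is chosen so that t + m * modulus is divisible by R.
    m = ((t & mask) * n_prime) & mask
    u = (t + m * modulus) >> bits
    # u lies in [0, 2 * modulus), so one conditional subtraction suffices.
    return u - modulus if u >= modulus else u


def modmul(x: int, y: int, modulus: int, bits: int) -> int:
    """Compute x * y mod modulus by going through Montgomery form."""
    n_prime, r2 = montgomery_setup(modulus, bits)
    x_m = montgomery_multiply(x % modulus, r2, modulus, bits, n_prime)  # x * R
    y_m = montgomery_multiply(y % modulus, r2, modulus, bits, n_prime)  # y * R
    z_m = montgomery_multiply(x_m, y_m, modulus, bits, n_prime)         # x * y * R
    return montgomery_multiply(z_m, 1, modulus, bits, n_prime)          # x * y


if __name__ == "__main__":
    p = 2**255 - 19  # an example 255-bit prime, chosen only for illustration
    print(modmul(123456789, 987654321, p, 256) == (123456789 * 987654321) % p)
```

    The point of the technique is that the reduction uses only masking and shifting by R, with no division by the modulus, which is what makes it attractive for hardware multipliers.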

  • GECOM: Test data compression combined with all unknown response masking

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     537 - 542  2008  [Refereed]

    This paper introduces GECOM technology, a novel test compression method with seamless integration of test GEneration, test COmpression (i.e. integrated compression on scan stimulus and masking bits) and all unknown scan responses Masking for manufacturing test cost reduction. Unlike most prior methods, the proposed method considers the unknown responses during the ATPG procedure and selectively encodes the specified 1 or 0 bits (either 1s or 0s) in scan slices for compression while at the same time masking the unknown responses before sending them to the response compactor. The proposed GECOM technology consists of the GECOM architecture and the GECOM ATPG technique. In the GECOM architecture, for a circuit with N internal scan chains, only c tester channels, where c = ⌈log2 N⌉ + 2, are required. GECOM ATPG generates test patterns for the GECOM architecture so that not only could the scan inputs be efficiently compressed but also all the unknown responses would be masked. Experimental results on both benchmark circuits and real industrial designs indicated the effectiveness of the proposed GECOM technique.

    DOI

  • Unknown Response Masking with Minimized Observable Response Loss and Mask Data

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4     1779 - +  2008  [Refereed]

    This paper presents a new unknown response masking technique to minimize the effect on test loss due to over-masking. Unlike previous works where the scan responses are masked before entering the response compactor, the proposed method could mask the Xs when they are transformed on the scan path. Meanwhile, the masking cells are inserted along the scan paths, thus they would have no degradation on the performance of the designs. In addition, the test data required to mask unknown responses is only one bit for each test pattern. Experimental results show the effectiveness of the proposed method.

    DOI

  • Design for secure test - A case study on pipelined Advanced Encryption Standard

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     149 - 152  2007  [Refereed]

    Cryptography plays an important role in the security of data transmission. To ensure the correctness of crypto hardware, we should conduct testing at fabrication and infield. However, the state-of-the-art scan-based test techniques, to achieve high test qualities, need to increase the testability of the circuit under test, which carries a potential of being misused to reveal the secret information of the crypto hardware. Thus, to develop efficient test strategies for crypto hardware to achieve high test quality without compromising security becomes an important task. In this paper we discuss the development of a Design-for-Secure-Test (DFST) technique for pipelined AES to overcome the above contradiction between security and test quality in testing crypto hardware. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.

    DOI

  • Low-cost IP core test using multiple-mode loading scan chain and scan chain clusters

    Gang Zeng, Youhua Shi, Toshinori Takabatake, Masao Yanagisawa, Hideo Ito

    21ST IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT-TOLERANCE IN VLSI SYSTEMS, PROCEEDINGS     136 - +  2006  [Refereed]

    A fixing-shifting encoding (FSE) method is proposed to reduce the test cost of IP cores. The FSE method reduces test cost by supporting multiple-mode loading of test data, i.e., parallel loading, left-direction, and right-direction serial loading for each test slice. Furthermore, the FSE method, which utilizes only two test channels, can support a large number of internal scan chains and achieve further reduction in test cost when combined with a scan chain clustering method. As a non-intrusive and automatic test pattern generation (ATPG) independent solution, the approach is applicable to IP core testing because it requires neither redesign of the core under test (CUT) nor running any additional ATPG for the encoding procedure. In addition, the decoder has low hardware overhead, and its design is independent of the CUT. Experimental results for some large ISCAS 89 benchmarks and an industry ASIC design have proven the efficiency of the proposed approach.

    DOI

  • FCSCAN: An efficient multiscan-based test compression technique for test cost reduction

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    ASP-DAC 2006: 11TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, PROCEEDINGS     653 - 658  2006  [Refereed]

    This paper proposes a new multiscan-based test input data compression technique by employing a Fan-out Compression Scan Architecture (FCSCAN) for test cost reduction. The basic idea of FCSCAN is to target the minority specified 1 or 0 bits (either 1 or 0) in scan slices for compression. Due to the low specified bit density in the test cube set, FCSCAN can significantly reduce input test data volume and the number of required test channels so as to reduce test cost. The FCSCAN technique is easy to implement with small hardware overhead and does not need any special ATPG for test generation. In addition, based on the theoretical compression efficiency analysis, improved procedures are also proposed for FCSCAN to achieve further compression. Experimental results on both benchmark circuits and one real industrial design indicate that a drastic reduction in test cost can indeed be achieved.

    DOI

  • Selective Low-Care Coding: A Means for Test Data Compression in Circuits with Multiple Scan Chains.

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions   89-A ( 4 ) 996 - 1004  2006  [Refereed]

    DOI

  • Selective low-care coding: A means for test data compression in circuits with multiple scan chains

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E89-A ( 4 ) 996 - 1003  2006

    This paper presents a test input data compression technique, Selective Low-Care Coding (SLC), which can be used to significantly reduce input test data volume as well as the external test channel requirement for multiscan-based designs. In the proposed SLC scheme, we explored the linear dependencies of the internal scan chains, and instead of encoding all the specified bits in test cubes, only a smaller number of specified bits are selected for encoding, thus greater compression can be expected. Experiments on the larger benchmark circuits show that a drastic reduction in test data volume, with corresponding savings in test application time, can indeed be achieved even for a well-compacted test set. Copyright © 2006 The Institute of Electronics, Information and Communication Engineers.

    DOI

  • Low power test compression technique for designs with multiple scan chains

    YH Shi, N Togawa, S Kimura, M Yanagisawa, T Ohtsuki

    14TH ASIAN TEST SYMPOSIUM, PROCEEDINGS     386 - 389  2005  [Refereed]

    This paper presents a new DFT technique that can significantly reduce test data volume as well as scan-in power consumption for multiscan-based designs. It can also help to reduce test time and tester channel requirements with small hardware overhead. In the proposed approach, we start with a pre-computed test cube set and fill the don't-cares with proper values for joint reduction of test data volume and scan power consumption. In addition, we explore the linear dependencies of the scan chains to construct a fanout structure using only inverters to achieve further compression. Experimental results for the larger ISCAS'89 benchmarks show the efficiency of the proposed technique.

    DOI

  • A selective scan chain reconfiguration through run-length coding for test data compression and scan power reduction

    Y Shi, S Kimura, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 12 ) 3208 - 3215  2004.12  [Refereed]

    Test data volume and power consumption for scan-based designs are two major concerns in system-on-a-chip testing. However, test set compaction by filling the don't-cares will invariably increase the scan-in power dissipation for scan testing, so the goals of test data reduction and low-power scan testing appear to conflict. Therefore, in this paper we present a selective scan chain reconfiguration method for test data compression and scan-in power reduction. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. After the scan chain reconfiguration, a dictionary is built to indicate the run-length of each compatible class, and only the scan-in data for each class needs to be transferred from the ATE to the CUT so as to reduce test data volume. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach overcomes the limitations of traditional run-length coding techniques, and leads to highly reduced test data volume with significant power savings during scan testing in all cases.

  • A hybrid dictionary test data compression for multiscan-based designs

    Y Shi, S Kimura, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 12 ) 3193 - 3199  2004.12  [Refereed]

    In this paper, we present a test data compression technique to reduce test data volume for multiscan-based designs. In our method, the internal scan chains are divided into equal-sized groups and two dictionaries are built to encode either an entire slice or a subset of the slice. Depending on the codeword, the decompressor may load all scan chains or may load only a group of the scan chains, which can enhance the effectiveness of dictionary-based compression. In contrast to previous dictionary coding techniques, even for a CUT with a large number of scan chains, the proposed approach can achieve a satisfactory reduction in test data volume with a reasonably small dictionary. Experimental results showed that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.

  • Alternative Run-Length Coding through scan chain reconfiguration for joint minimization of test data volume and power consumption in scan test

    YH Shi, S Kimura, N Togawa, M Yanagisawa, T Ohtsuki

    13TH ASIAN TEST SYMPOSIUM, PROCEEDINGS     432 - 437  2004  [Refereed]

    Test data volume and scan power are two major concerns in SoC test. In this paper we present an alternative run-length coding method through scan chain reconfiguration to reduce both test data volume and scan-in power consumption. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. To extract the compatible scan cells, we apply a heuristic algorithm that solves the graph coloring problem; a simple greedy algorithm is then used to configure the scan chain for the minimization of scan power. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach leads to highly reduced test data volume with significant power savings during scan test.

    DOI
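
    The abstract above describes grouping scan cells into compatible classes by solving a graph coloring problem. The sketch below shows one plausible reading of that step, not the authors' algorithm: two cells are taken to conflict when some test pattern assigns them opposite specified values, with 'X' standing for a don't-care, and a simple first-fit greedy coloring of the resulting conflict graph yields the classes. The data layout and all names are assumptions made for this example.

```python
from typing import Dict, List


def compatible(col_a: str, col_b: str) -> bool:
    """Two scan cells are compatible if no pattern gives them opposite values.

    Each string holds one cell's value per test pattern: '0', '1' or 'X'.
    """
    return all(a == b or "X" in (a, b) for a, b in zip(col_a, col_b))


def greedy_compatible_classes(columns: List[str]) -> List[List[int]]:
    """Group cell indices into classes of mutually compatible cells.

    First-fit greedy coloring: each cell takes the lowest color not already
    used by a cell it conflicts with. The result is valid but not
    necessarily the minimum number of classes.
    """
    colors: Dict[int, int] = {}
    for i, col in enumerate(columns):
        used = {colors[j] for j in colors if not compatible(col, columns[j])}
        color = 0
        while color in used:
            color += 1
        colors[i] = color
    classes: Dict[int, List[int]] = {}
    for i, color in colors.items():
        classes.setdefault(color, []).append(i)
    return [classes[c] for c in sorted(classes)]


if __name__ == "__main__":
    # Four scan cells observed over three test patterns (made-up data).
    cells = ["1X0", "10X", "0X1", "X11"]
    print(greedy_compatible_classes(cells))
```

    With the made-up data in the usage example, cells 0 and 1 fall into one class and cells 2 and 3 into another, so two scan-in values per pattern would suffice instead of four.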

  • A hybrid dictionary test data compression for multiscan-based designs

    Youhua Shi, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E87-A   3193 - 3199  2004.01

    In this paper, we present a test data compression technique to reduce test data volume for multiscan-based designs. In our method, the internal scan chains are divided into equal-sized groups and two dictionaries are built to encode either an entire slice or a subset of the slice. Depending on the codeword, the decompressor may load all scan chains or may load only a group of the scan chains, which can enhance the effectiveness of dictionary-based compression. In contrast to previous dictionary coding techniques, even for a CUT with a large number of scan chains, the proposed approach can achieve a satisfactory reduction in test data volume with a reasonably small dictionary. Experimental results showed that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.

  • A selective scan chain reconfiguration through run-length coding for test data compression and scan power reduction

    Youhua Shi, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E87-A   3208 - 3214  2004.01

    Test data volume and power consumption for scan-based designs are two major concerns in system-on-a-chip testing. However, test set compaction by filling the don't-cares will invariably increase the scan-in power dissipation for scan testing, so the goals of test data reduction and low-power scan testing appear to conflict. Therefore, in this paper we present a selective scan chain reconfiguration method for test data compression and scan-in power reduction. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. After the scan chain reconfiguration, a dictionary is built to indicate the run-length of each compatible class, and only the scan-in data for each class needs to be transferred from the ATE to the CUT so as to reduce test data volume. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach overcomes the limitations of traditional run-length coding techniques, and leads to highly reduced test data volume with significant power savings during scan testing in all cases.

  • Reducing test data volume for multiscan-based designs through single/sequence mixed encoding

    Y Shi, S Kimura, N Togawa, M Yanagisawa, T Ohtsuki

    2004 47TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, CONFERENCE PROCEEDINGS     445 - 448  2004  [Refereed]

    This paper presents a new test data compression technique for multiscan-based designs through dictionary-based encoding of single scan-inputs or sequences of scan-inputs. In spite of its simplicity, it achieves a significant reduction in test data volume. Unlike some previous approaches to test data compression, our approach eliminates the need for additional synchronization and handshaking between the CUT and the ATE, so it is especially suitable for integration in a low-cost test scheme for SoC test. In addition, in contrast to previous dictionary-based coding techniques, even for a CUT with a small number of scan chains, the proposed approach can achieve a satisfactory reduction in test data volume. Experimental results showed that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.

  • A built-in reseeding technique for LFSR-based test pattern generation

    Y Shi, Z Zhang, S Kimura, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 12 ) 3056 - 3062  2003.12  [Refereed]

    Reseeding techniques have been proposed to improve the fault coverage in pseudo-random testing. However, most previous work on reseeding is based on storing the seeds in an external tester or in a ROM. In this paper we present a built-in reseeding technique for LFSR-based test pattern generation. The proposed structure can run both in pseudorandom mode and in reseeding mode. In addition, our method requires no storage for the seeds, since in reseeding mode the seeds can be generated automatically in hardware. In this paper we also propose an efficient grouping algorithm based on simulated annealing to optimize test vector grouping. Experimental results for benchmark circuits indicate the superiority of our technique over other reseeding methods with respect to test length and area overhead. Moreover, since the theoretical properties of LFSRs are preserved, our method could be beneficially used in conjunction with any other techniques proposed so far.
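
    To make the terms in the abstract above concrete, the sketch below models a small Fibonacci LFSR used as a pseudo-random test pattern generator, together with a plain reseeding loop. It does not model the paper's contribution, in which seeds are generated in hardware without storage; here the seeds are simply passed in as a list. The tap positions, register width and function names are assumptions chosen for this example.

```python
from typing import Iterator, List, Sequence


def lfsr_patterns(seed: int, taps: Sequence[int], width: int, count: int) -> Iterator[int]:
    """Yield `count` successive states of a Fibonacci LFSR.

    `taps` lists 1-indexed bit positions (1 = least significant bit) that
    are XORed to form the feedback bit shifted in at the low end.
    """
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("the all-zero state never changes in an XOR LFSR")
    for _ in range(count):
        yield state
        feedback = 0
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & mask


def reseeded_patterns(seeds: Sequence[int], taps: Sequence[int],
                      width: int, per_seed: int) -> List[int]:
    """Run the LFSR for `per_seed` patterns from each seed in turn."""
    patterns: List[int] = []
    for seed in seeds:
        patterns.extend(lfsr_patterns(seed, taps, width, per_seed))
    return patterns


if __name__ == "__main__":
    # 4-bit register with taps at bits 4 and 3 (polynomial x^4 + x^3 + 1).
    print(len(set(lfsr_patterns(0b0001, (4, 3), 4, 15))))
    print(reseeded_patterns([0b0001, 0b1010], (4, 3), 4, 3))
```

    Reseeding lets the generator jump to a different region of the state sequence instead of stepping through every intermediate state, which is why it can shorten the test length needed to reach hard-to-detect faults.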

  • Multiple test set generation method for LFSR-Based BIST

    YH Shi, Z Zhe

    ASP-DAC 2003: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE     863 - 868  2003  [Refereed]

    In this paper we propose a new reseeding method for LFSR-based test pattern generation suitable for circuits with random pattern resistant faults. The distinguishing feature of our method is that the proposed test pattern generator (TPG) can work both in normal LFSR mode, to generate pseudorandom test vectors, and in jumping mode, to make the TPG jump from a state to the required state (the seed of the next group). Experimental results indicate its superiority over other known reseeding techniques with respect to the length of the test sequence and the required area overhead.

    DOI

  • A Built-in Reseeding Technique for LFSR-Based Test Pattern Generation

    Youhua Shi, Zhe Zhang, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E86-A   3056 - 3062  2003.01

    Reseeding techniques have been proposed to improve the fault coverage in pseudo-random testing. However, most previous work on reseeding is based on storing the seeds in an external tester or in a ROM. In this paper we present a built-in reseeding technique for LFSR-based test pattern generation. The proposed structure can run both in pseudorandom mode and in reseeding mode. In addition, our method requires no storage for the seeds, since in reseeding mode the seeds can be generated automatically in hardware. In this paper we also propose an efficient grouping algorithm based on simulated annealing to optimize test vector grouping. Experimental results for benchmark circuits indicate the superiority of our technique over other reseeding methods with respect to test length and area overhead. Moreover, since the theoretical properties of LFSRs are preserved, our method could be beneficially used in conjunction with any other techniques proposed so far.

  • A new low power BIST methodology by altering the structure of linear feedback shift registers

    R Li, C Hu, J Yang, Z Zhang, YH Shi, LX Shi

    2001 4TH INTERNATIONAL CONFERENCE ON ASIC PROCEEDINGS   25   646 - 649  2001

    In this paper a new low power BIST methodology by altering the structure of linear feedback shift register (LFSR) is proposed. In pseudo-random test mode, the efficiency of the vectors decreases sharply as the test progresses. For low power consumption during test mode, the proposed approach ignores the non-detecting vectors by altering the structure of LFSR. Note that altering the structure of LFSR is efficient, and it has no impact on the fault coverage.

  • A new software for test logic optimization in DFT

    Z Zhang, C Hu, R Li, YH Shi, LX Shi

    2001 4TH INTERNATIONAL CONFERENCE ON ASIC PROCEEDINGS     654 - 657  2001  [Refereed]

    This paper presents a new software tool named ASIC2000TA, developed for design for test (DFT) with the aim of optimizing test logic. The software consists of two modules: a test analysis module and a DFT module. The test analysis module can examine a circuit's testability, generate test vectors and perform fault simulation; some of its algorithms are described. The DFT module automatically inserts test logic into a gate-level netlist, including full scan and partial scan, for which a greedy search algorithm is discussed. Electronic design intermediate format (EDIF) acts as the interface between ASIC2000TA and Cadence. Finally, an experiment with ASIC2000TA is presented.

Misc

  • Timing-error-tolerant AES cipher

    吉田 慎之介, 史 又華, 柳澤 政生

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 465 ) 73 - 78  2016.02

    CiNii

  • In-situ Hardware-Trojan Authentication for Invalidating Malicious Functions

    大屋 優, 史 又華, 柳澤 政生

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 465 ) 79 - 84  2016.02

    CiNii

  • A Quantitative Criterion of Gate-Level Netlist Vulnerability

    大屋 優, 史 又華, 山下 哲孝

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 339 ) 141 - 146  2015.12

    CiNii

  • A Quantitative Criterion of Gate-Level Netlist Vulnerability

    大屋 優, 史 又華, 山下 哲孝

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 338 ) 141 - 146  2015.12

    CiNii

  • A low-power soft error tolerant latch scheme on 15nm process

    田島 咲季, 史 又華, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 338 ) 123 - 127  2015.12

    CiNii

  • A low-power soft error tolerant latch scheme on 15nm process

    田島 咲季, 史 又華, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 339 ) 123 - 127  2015.12

    CiNii

  • A-9-2 Low-power soft-error tolerant New-SEH latch scheme

    TAJIMA Saki, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao

    Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference   2015   106 - 106  2015.08

    CiNii

  • AES Encryption Circuit against Clock Glitch based Fault Analysis

    平野 大輔, 史 又華, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   115 ( 21 ) 51 - 55  2015.05

    CiNii

  • AES Encryption Circuit against Clock Glitch based Fault Analysis

    平野 大輔, 史 又華, 戸川 望, 柳澤 政生

    情報処理学会研究報告. SLDM, [システムLSI設計技術]   2015 ( 10 ) 1 - 5  2015.05

    Recently, fault analysis has attracted a lot of attention as a new kind of side-channel attack, in which malicious faults are generally injected by attackers through clock glitch generation, voltage change, or laser manipulation during the execution of a crypto circuit. As existing countermeasures against fault analysis, area-redundant and time-redundant methods have been proposed. However, they cause large area overhead or time overhead. Therefore, in this paper, we propose an AES circuit design that can detect timing faults caused by malicious clock glitches. Experimental results show that the proposed method can detect 100% of timing faults at only 4.9% post-layout area overhead.

    CiNii

  • A low-power soft error tolerant latch scheme

    TAJIMA Saki, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 476 ) 55 - 60  2015.03

    With recent technology scaling, reduced reliability due to soft errors and increased power consumption have emerged as inevitable problems for logic circuits. We propose a low-power and highly soft-error-tolerant latch, called the TSPC-SEH latch, based on the Soft Error Hardened (SEH) latch and True Single Phase Clock (TSPC). Compared with the SEH latch and the DICE latch, the proposed latch achieves a 42% power reduction and a 54% delay reduction.

    CiNii

  • A Score-Based Hardware-Trojan Identification Method for Gate-Level Netlists

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 476 ) 165 - 170  2015.03

    Recently, digital ICs are designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers can implement Hardware Trojans (HTs) on them. In this paper, we propose an HT identification method for gate-level netlists without using a Golden netlist. Firstly, we extract several features specific to Trojan nets using several HT-inserted benchmarks. Secondly, we assign scores to the Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to identify HTs. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as HT-inserted and all the HT-free gate-level benchmarks as HT-free in approximately three hours for each benchmark.

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 426 ) 157 - 162  2015.01

    Recently, digital ICs are designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers can implement Hardware Trojans (HTs) on them. In particular, HTs can be easily inserted during the design phase, but their detection is very difficult during this phase. This is why previous research has had to assume Golden netlists and activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several features, and our method detects the nets having these features. Our approach assumes neither Golden netlists nor activation of HTs. We can successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Computer systems   114 ( 427 ) 157 - 162  2015.01

    Recently, digital ICs are designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers can implement Hardware Trojans (HTs) on them. In particular, HTs can be easily inserted during the design phase, but their detection is very difficult during this phase. This is why previous research has had to assume Golden netlists and activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several features, and our method detects the nets having these features. Our approach assumes neither Golden netlists nor activation of HTs. We can successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report   114 ( 428 ) 157 - 162  2015.01

    Recently, digital ICs are designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers can implement Hardware Trojans (HTs) on them. In particular, HTs can be easily inserted during the design phase, but their detection is very difficult during this phase. This is why previous research has had to assume Golden netlists and activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several features, and our method detects the nets having these features. Our approach assumes neither Golden netlists nor activation of HTs. We can successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    大屋 優, 史 又華, 柳澤 政生, 戸川 望

    情報処理学会研究報告. SLDM, [システムLSI設計技術]   2015 ( 28 ) 1 - 6  2015.01

    Recently, digital ICs have increasingly been designed by outside vendors to reduce costs in the semiconductor industry. This introduces the risk that malicious attackers can implant Hardware Trojans (HTs) in them. In particular, HTs can be easily inserted during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume golden netlists and the activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs share several features, and our method detects the nets that have these features. Our approach assumes neither golden netlists nor the activation of HTs. We successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • Design of Flip-Flop with Timing Error Tolerance

    SUZUKI Taito, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 328 ) 45 - 50  2014.11

    As integrated circuits are miniaturized, variation in circuit operating conditions grows, and the supply-voltage and clock-frequency margins required in a design increase. To relax these margins, circuit structures with timing error tolerance are being actively studied. In this paper, we propose two new transistor-level Time Borrowing Flip-Flops (TBFFs) that realize timing error tolerance by dynamically switching from flip-flop to latch operation. HSPICE simulation results show that the proposed TBFFs can achieve up to 28.1% power reduction compared with existing work.

    CiNii

  • Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

    KAWAMURA Kazushi, ABE Shinya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 51 - 56  2014.11

    The propagation delay along each path inside an LSI varies widely depending on the input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm that identifies the paths to be optimized based on input data and reconfigures them. Our algorithm first identifies these paths by incorporating timing error prediction circuits into a target circuit and running the resulting circuit. The paths are then dynamically reconfigured within an accuracy constraint, with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can increase performance by up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

    CiNii

  • An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

    YOSHIDA Shinnosuke, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 57 - 62  2014.11

    As process technologies advance, process and delay variations complicate timing design, and in-situ timing error correction techniques are strongly required. Suspicious timing error prediction (STEP) predicts timing errors by monitoring checkpoints with STEP circuits (STEPCs), so how checkpoints are inserted is very important. We previously proposed a network-flow-based checkpoint insertion algorithm for STEP. However, that algorithm may ignore long paths and insert checkpoints near the output. In this paper, we improve how short paths are ignored and how labels are set by estimating path lengths. As a result, only short paths are ignored and checkpoints are inserted near the center of all long paths. We evaluate our algorithm by applying it to four benchmark circuits. Experimental results show that our proposed algorithm realizes an average of 1.71X overclocking compared with inserting no STEPC. Furthermore, our improved algorithm realizes an average of 1.15X overclocking compared with our original algorithm.

    CiNii

  • High speed design of sub-threshold circuit by using DTMOS

    FUKUDOME Yuji, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 328 ) 117 - 121  2014.11

    Low power consumption can be achieved by operating circuits in the sub-threshold region. However, operating speed is low in this region, so the tradeoff between power and speed must be considered carefully. In this work, we present DTMOS implementations that realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed can be improved by 30%-45%, and that an average energy reduction of 15% can be achieved when Vdd ranges from 0.2 V to 0.3 V.

    CiNii

  • A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 135 - 140  2014.11

    Recently, digital ICs have increasingly been designed by outside vendors to reduce design costs in the semiconductor industry. This introduces the risk that malicious attackers implant Hardware Trojans (HTs) into ICs. HTs are particularly easy to insert during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume golden netlists and the activation of HTs. This paper proposes an HT detection method based on detecting LSLG nets, which have low switching probabilities. Our approach assumes neither golden netlists nor the activation of HTs. We find that all HT-inserted gate-level netlists from the Trust-HUB benchmarks include a small number of LSLG nets. It takes approximately ten minutes to detect LSLG nets in each benchmark.

    CiNii

  • Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 203 - 208  2014.11

    An HDR-mcv architecture, which integrates multiple supply voltages and multiple clock domains into high-level synthesis and makes it possible to estimate interconnection delay effects during high-level synthesis, has been proposed together with a corresponding synthesis algorithm. They assign voltages and clock frequencies to huddles, which are the partitions used for interconnection delay estimation during high-level synthesis. However, this voltage and clock assignment may incur energy overhead due to the increased number of clock trees. In this paper, we propose a new HDR-mcv architecture in which supply voltages are assigned to functional logic and clock synchronization logic separately. We then propose a high-level synthesis algorithm for the architecture that assigns clock frequencies and supply voltages on the basis of placement and energy information. Experimental results show that the proposed method achieves 50% energy savings compared with the conventional HDR-mcv architecture and 60% energy savings compared with existing high-level synthesis methods.

    CiNii

  • Design of Flip-Flop with Timing Error Tolerance

    SUZUKI Taito, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    IEICE technical report. Dependable computing   114 ( 329 ) 45 - 50  2014.11

    As integrated circuits are miniaturized, variation in circuit operating conditions grows, and the supply-voltage and clock-frequency margins required in a design increase. To relax these margins, circuit structures with timing error tolerance are being actively studied. In this paper, we propose two new transistor-level Time Borrowing Flip-Flops (TBFFs) that realize timing error tolerance by dynamically switching from flip-flop to latch operation. HSPICE simulation results show that the proposed TBFFs can achieve up to 28.1% power reduction compared with existing work.

    CiNii

  • Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

    KAWAMURA Kazushi, ABE Shinya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 51 - 56  2014.11

    The propagation delay along each path inside an LSI varies widely depending on the input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm that identifies the paths to be optimized based on input data and reconfigures them. Our algorithm first identifies these paths by incorporating timing error prediction circuits into a target circuit and running the resulting circuit. The paths are then dynamically reconfigured within an accuracy constraint, with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can increase performance by up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

    CiNii

  • An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

    YOSHIDA Shinnosuke, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 57 - 62  2014.11

    As process technologies advance, process and delay variations complicate timing design, and in-situ timing error correction techniques are strongly required. Suspicious timing error prediction (STEP) predicts timing errors by monitoring checkpoints with STEP circuits (STEPCs), so how checkpoints are inserted is very important. We previously proposed a network-flow-based checkpoint insertion algorithm for STEP. However, that algorithm may ignore long paths and insert checkpoints near the output. In this paper, we improve how short paths are ignored and how labels are set by estimating path lengths. As a result, only short paths are ignored and checkpoints are inserted near the center of all long paths. We evaluate our algorithm by applying it to four benchmark circuits. Experimental results show that our proposed algorithm realizes an average of 1.71X overclocking compared with inserting no STEPC. Furthermore, our improved algorithm realizes an average of 1.15X overclocking compared with our original algorithm.

    CiNii

  • High speed design of sub-threshold circuit by using DTMOS

    FUKUDOME Yuji, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    IEICE technical report. Dependable computing   114 ( 329 ) 117 - 121  2014.11

    Low power consumption can be achieved by operating circuits in the sub-threshold region. However, operating speed is low in this region, so the tradeoff between power and speed must be considered carefully. In this work, we present DTMOS implementations that realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed can be improved by 30%-45%, and that an average energy reduction of 15% can be achieved when Vdd ranges from 0.2 V to 0.3 V.

    CiNii

  • A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 135 - 140  2014.11

    Recently, digital ICs have increasingly been designed by outside vendors to reduce design costs in the semiconductor industry. This introduces the risk that malicious attackers implant Hardware Trojans (HTs) into ICs. HTs are particularly easy to insert during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume golden netlists and the activation of HTs. This paper proposes an HT detection method based on detecting LSLG nets, which have low switching probabilities. Our approach assumes neither golden netlists nor the activation of HTs. We find that all HT-inserted gate-level netlists from the Trust-HUB benchmarks include a small number of LSLG nets. It takes approximately ten minutes to detect LSLG nets in each benchmark.

    CiNii

  • Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 203 - 208  2014.11

    An HDR-mcv architecture, which integrates multiple supply voltages and multiple clock domains into high-level synthesis and makes it possible to estimate interconnection delay effects during high-level synthesis, has been proposed together with a corresponding synthesis algorithm. They assign voltages and clock frequencies to huddles, which are the partitions used for interconnection delay estimation during high-level synthesis. However, this voltage and clock assignment may incur energy overhead due to the increased number of clock trees. In this paper, we propose a new HDR-mcv architecture in which supply voltages are assigned to functional logic and clock synchronization logic separately. We then propose a high-level synthesis algorithm for the architecture that assigns clock frequencies and supply voltages on the basis of placement and energy information. Experimental results show that the proposed method achieves 50% energy savings compared with the conventional HDR-mcv architecture and 60% energy savings compared with existing high-level synthesis methods.

    CiNii

  • Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

    Author not found

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 2 ) 1 - 6  2014.11

    The propagation delay along each path inside an LSI varies widely depending on the input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm that identifies the paths to be optimized based on input data and reconfigures them. Our algorithm first identifies these paths by incorporating timing error prediction circuits into a target circuit and running the resulting circuit. The paths are then dynamically reconfigured within an accuracy constraint, with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can increase performance by up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

    CiNii

  • Design of Flip-Flop with Timing Error Tolerance

    鈴木 大渡, 史 又華, 戸川 望, 宇佐美 公良, 柳澤 政生

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 1 ) 1 - 6  2014.11

    As integrated circuits are miniaturized, variation in circuit operating conditions grows, and the supply-voltage and clock-frequency margins required in a design increase. To relax these margins, circuit structures with timing error tolerance are being actively studied. In this paper, we propose two new transistor-level Time Borrowing Flip-Flops (TBFFs) that realize timing error tolerance by dynamically switching between flip-flop and latch operation. HSPICE simulation results show that, compared with existing work, the proposed TBFFs can achieve up to 28.1% power reduction and up to 20.6% energy reduction.

    CiNii

  • An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

    吉田 慎之介, 史 又華, 柳澤 政生, 戸川 望

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 3 ) 1 - 6  2014.11

    As process technologies advance, process and delay variations complicate timing design, and in-situ timing error correction techniques are strongly required. Suspicious timing error prediction (STEP) predicts timing errors by monitoring checkpoints with STEP circuits (STEPCs), so how checkpoints are inserted is very important. We previously proposed a network-flow-based checkpoint insertion algorithm for STEP. However, that algorithm may ignore long paths and insert checkpoints near the output. In this paper, we improve how short paths are ignored and how labels are set by estimating path lengths. As a result, only short paths are ignored and checkpoints are inserted near the center of all long paths. We evaluate our algorithm by applying it to four benchmark circuits. Experimental results show that our proposed algorithm realizes an average of 1.71X overclocking compared with inserting no STEPC. Furthermore, our improved algorithm realizes an average of 1.15X overclocking compared with our original algorithm.

    CiNii

  • High speed design of sub-threshold circuit by using DTMOS

    福留 祐治, 史 又華, 戸川 望, 宇佐美 公良, 柳澤 政生

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 21 ) 1 - 5  2014.11

    Low power consumption can be achieved by operating circuits in the sub-threshold region. However, operating speed is low in this region, so the tradeoff between power and speed must be considered carefully. In this work, we present DTMOS implementations that realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed can be improved by 30%-45%, and that an average energy reduction of 15% can be achieved when Vdd ranges from 0.2 V to 0.3 V.

    CiNii

  • A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

    大屋 優, 史 又華, 柳澤 政生, 戸川 望

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 24 ) 1 - 6  2014.11

    Recently, digital ICs have increasingly been designed by outside vendors to reduce design costs in the semiconductor industry. This introduces the risk that malicious attackers implant Hardware Trojans (HTs) into ICs. HTs are particularly easy to insert during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume golden netlists and the activation of HTs. This paper proposes an HT detection method based on detecting LSLG nets, which have low switching probabilities. Our approach assumes neither golden netlists nor the activation of HTs. We find that all HT-inserted gate-level netlists from the Trust-HUB benchmarks include a small number of LSLG nets. It takes approximately ten minutes to detect LSLG nets in each benchmark.

    CiNii

  • Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

    阿部 晋矢, 史 又華, 宇佐美 公良, 柳澤 政生, 戸川 望

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 40 ) 1 - 6  2014.11

    An HDR-mcv architecture, which integrates multiple supply voltages and multiple clock domains into high-level synthesis and makes it possible to estimate interconnection delay effects during high-level synthesis, has been proposed together with a corresponding synthesis algorithm. They assign voltages and clock frequencies to huddles, which are the partitions used for interconnection delay estimation during high-level synthesis. However, this voltage and clock assignment may incur energy overhead due to the increased number of clock trees. In this paper, we propose a new HDR-mcv architecture in which supply voltages are assigned to functional logic and clock synchronization logic separately. We then propose a high-level synthesis algorithm for the architecture that assigns clock frequencies and supply voltages on the basis of placement and energy information. Experimental results show that the proposed method achieves 50% energy savings compared with the conventional HDR-mcv architecture and 60% energy savings compared with existing high-level synthesis methods.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 231 ) 7 - 12  2014.10

    The increase in energy consumption that accompanies improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    IEICE technical report. Image engineering   114 ( 233 ) 7 - 12  2014.10

    The increase in energy consumption that accompanies improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. ICD   114 ( 232 ) 7 - 12  2014.10

    The increase in energy consumption that accompanies improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    Takayuki Nii, Youhua Shi, Nozomu Togawa, Kimiyoshi Usami, Masao Yanagisawa

    研究報告システムとLSIの設計技術(SLDM)   2014 ( 2 ) 1 - 6  2014.09

    The increase in energy consumption that accompanies improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • An Effective Robust Design for Large Delay Variation Using Suspicious Timing-Error Prediction Scheme

    吉田 慎之介, 史 又華, 柳澤 政生, 戸川 望

    DAシンポジウム2014論文集   2014   61 - 66  2014.08

    CiNii

  • Latch-based AES Encryption Circuit Against Fault Analysis

    SHI Youhua, TANIGUCHI Hiroaki, TOGAWA Nozomu, YANAGISAWA Masao

    Technical report of IEICE. VLD   113 ( 454 ) 37 - 42  2014.03

    In general, cryptography is considered secure because it is based on complicated mathematical theories. In recent years, however, attacks that target hardware implementations rather than cryptographic algorithms, such as fault analysis, have posed new security threats. Cryptographic circuits are prone to fault analysis, which aims to retrieve secret data by means of malicious fault injection. Clock adjustment, voltage changes, and laser manipulation can be used to inject malicious faults during the execution of a crypto circuit. As countermeasures against fault analysis, area-redundant methods such as triple modular redundancy (TMR) and timing-redundant methods have been proposed, at the cost of area or throughput. In this paper, we propose a latch-based AES encryption circuit, with 18.1% area overhead and 5% throughput improvement, that can detect all possible errors in the fault analysis region of clock-glitch-based fault analysis. In addition to detecting fault analysis, the proposed method prevents the transmission and use of erroneous results, and can thus guarantee the correctness of the final encrypted outputs.

    CiNii

  • Secure scan design using improved random order scans and its evaluations

    OYA Masaru, ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 454 ) 43 - 48  2014.03

    Scan test using scan chains is one of the most important DFT techniques. On the other hand, scan-based attacks have been reported that can retrieve the secret key in crypto circuits by using scan chains. A secure scan architecture is strongly required to protect scan chains from such attacks. In this paper, we propose an improved version of random order scans as a secure scan architecture. In our improved random order scans, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a sub-chain to scan out using enable signals. We also discuss the testability and security of our improved random order scans and demonstrate their effectiveness through implementation results.

    CiNii

  • Experiment and Analysis on Temperature Dependence of Delay and Energy for Subthreshold Circuits

    KUSHIDA Hiroki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   113 ( 454 ) 147 - 151  2014.03

    Low-voltage design has been used to reduce the energy dissipation of mobile network equipment. However, as the supply voltage is reduced into the subthreshold region, performance degradation and environmental variations become the primary design challenges. In this paper, we implement a super-pipelined multiplier for subthreshold supply voltages. With super-pipelining, performance and energy efficiency can be improved. Moreover, experimental evaluations of the temperature dependence of delay and energy are conducted for analysis.

    CiNii

  • 51.鋼船規則検査要領R編における改正点の解説 : 窒素発生装置から発生する高濃度ガスの排出場所(2014年版鋼船規則及び関連検査要領等における改正点の解説,技術規則解説)

    [記載なし], 阿部 晋矢, 史 又華, 柳澤 政生, 戸川 望

    日本海事協會會誌   ( 306 ) 69 - 69  2014

    The propagation delay along each path inside an LSI varies widely depending on the input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm that identifies the paths to be optimized based on input data and reconfigures them. Our algorithm first identifies these paths by incorporating timing error prediction circuits into a target circuit and running the resulting circuit. The paths are then dynamically reconfigured within an accuracy constraint, with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can increase performance by up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

    CiNii

  • Suspicious timing error prediction using check points

    IGARASHI Hiroaki, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   113 ( 321 ) 39 - 44  2013.11

    With advances in process technologies, the timing design of LSIs has become more difficult, and timing error countermeasure techniques are becoming increasingly important. Existing timing error detection/correction methods complicate timing design because of their complex structure. Furthermore, their error correction relies on re-running operations, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP method), which predicts timing errors and corrects them with a simple structure. STEP checks for timing errors by observing several checkpoints on signal paths. Since STEP is a timing error prediction method, it may produce false positives, and reducing them is one of the largest problems. In this paper, we propose a method that reduces false positives by optimizing the checkpoints. The experimental results show that the operating frequency is increased by up to 2.4 times and throughput is improved by up to 45%.

    CiNii

  • Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   113 ( 321 ) 263 - 268  2013.11

    In this paper, we propose a clock energy-efficient high-level synthesis algorithm for the HDR-mcd architecture. In HDR-mcd, the entire chip is divided into several huddles. Huddles make it possible to synchronize different clock domains, where interconnection delay must be taken into account during high-level synthesis. In our iterative-improvement-based algorithm, low-frequency clocks are assigned to non-critical huddles under resource and latency constraints to improve energy efficiency. Experimental results show that the proposed method achieves a 20% clock energy saving and a 10% total energy saving compared with existing methods that consider clock gating.

    CiNii

  • Suspicious timing error prediction using check points

    IGARASHI Hiroaki, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 320 ) 39 - 44  2013.11

    As process technologies advance, the timing design of LSIs has become more difficult, and timing error countermeasures are becoming increasingly important. Existing timing error detection/correction methods have complex structures, which complicates timing design. Furthermore, they correct errors by re-running operations, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP), which predicts timing errors and corrects them with a simple structure. STEP predicts timing errors by observing several checkpoints on signal paths. Since STEP is a prediction method, false positives can occur, and reducing them is one of the largest problems. In this paper, we propose a method that reduces false positives by optimizing the checkpoints. The experimental results show that the operating frequency is increased by up to 2.4 times and the throughput is improved by up to 45%.

    CiNii

  • Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 320 ) 263 - 268  2013.11

    In this paper, we propose a clock energy-efficient high-level synthesis algorithm for the HDR-mcd architecture. In HDR-mcd, the entire chip is divided into several huddles. Huddles make it possible to synchronize different clock domains, where interconnection delay must be taken into account during high-level synthesis. In our iterative-improvement-based algorithm, low-frequency clocks are assigned to non-critical huddles under resource and latency constraints to improve energy efficiency. Experimental results show that the proposed method achieves a 20% clock energy saving and a 10% total energy saving compared with existing methods that consider clock gating.

    CiNii

  • Suspicious timing error prediction using check points

    IGARASHI Hiroaki, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, System LSI Design Methodology (SLDM)   2013 ( 8 ) 1 - 6  2013.11

    As process technologies advance, the timing design of LSIs has become more difficult, and timing error countermeasures are becoming increasingly important. Existing timing error detection/correction methods have complex structures, which complicates timing design. Furthermore, they correct errors by re-running operations, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP), which predicts timing errors and corrects them with a simple structure. STEP predicts timing errors by observing several checkpoints on signal paths. Since STEP is a prediction method, false positives can occur, and reducing them is one of the largest problems. In this paper, we propose a method that reduces false positives by optimizing the checkpoints. The experimental results show that the operating frequency is increased by up to 2.4 times and the throughput is improved by up to 45%.

    CiNii

  • Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

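    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, System LSI Design Methodology (SLDM)   2013 ( 47 ) 1 - 6  2013.11

    Clock signals account for a large share of the energy consumed by an LSI, and techniques such as multiple clock domains and clock gating have been proposed. In this paper, we propose a high-level synthesis method for reducing clock energy, targeting the multiple-clock-domain-oriented HDR-mcd architecture. The proposed method uses regions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning lower-frequency clocks as long as the resource and timing constraints are satisfied. Computer experiments confirm that, compared with a conventional method that considers only clock gating, the proposed method reduces clock tree energy by about 30% and total energy by about 25%. In this paper, we propose a clock energy-efficient high-level synthesis algorithm for HDR-mcd architecture. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay is required and should be considered during high-level synthesis. In our iterative improvement based algorithm, low-frequency clocks are assigned to non-critical huddles under resource and latency constraints for energy efficiency improvement. Experimental results show that the proposed method achieves 20% clock energy-saving and 10% total energy-saving compared with the existing methods considering clock gating.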

    CiNii

  • A Consideration on Hardware Trojan Detection Specifying Trojan Path

    Atobe Yuta, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2013  2013.09

    CiNii

  • Data Recoverable AES Circuit Against Differential Fault Analysis

    Taniguchi Hiroaki, Shi Youhua, Togawa Nozomu, Yanagisawa Masao

    Proceedings of the Society Conference of IEICE   2013  2013.09

    CiNii

  • Energy-Efficient High-Level Synthesis with Multiple Clock Domain for HDR-mcd

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi

    Proceedings of the Workshop on Circuits and Systems   26   185 - 190  2013.07

    CiNii

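  • Floorplan-Aware Low-Power High-Level Synthesis Method Oriented to Multiple Clock Domains (Computer Systems: Workshop on Embedded Technologies and Networks, ETNET2013)

    ABE Shin-ya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE Technical Report   112 ( 481 ) 115 - 120  2013.03

    In this paper, we propose HDR-mcd, an extension of the HDR architecture for applying multiple clock domains. We then propose a multiple-clock-domain-oriented low-power high-level synthesis method targeting HDR-mcd. The proposed method follows a synthesis flow that feeds back floorplan information and improves the result iteratively. In doing so, it uses regions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning lower-frequency clocks as long as the resource and timing constraints are satisfied. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.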

    CiNii

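  • Floorplan-Aware Low-Power High-Level Synthesis Method Oriented to Multiple Clock Domains

    ABE Shin-ya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, System LSI Design Methodology (SLDM)   2013 ( 20 ) 1 - 6  2013.03

    In this paper, we propose HDR-mcd, an extension of the HDR architecture for applying multiple clock domains. We then propose a multiple-clock-domain-oriented low-power high-level synthesis method targeting HDR-mcd. The proposed method follows a synthesis flow that feeds back floorplan information and improves the result iteratively. In doing so, it uses regions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning lower-frequency clocks as long as the resource and timing constraints are satisfied. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.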

    CiNii

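  • Floorplan-Aware Low-Power High-Level Synthesis Method Oriented to Multiple Clock Domains

    ABE Shin-ya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, Embedded Systems (EMB)   2013 ( 20 ) 1 - 6  2013.03

    In this paper, we propose HDR-mcd, an extension of the HDR architecture for applying multiple clock domains. We then propose a multiple-clock-domain-oriented low-power high-level synthesis method targeting HDR-mcd. The proposed method follows a synthesis flow that feeds back floorplan information and improves the result iteratively. In doing so, it uses regions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning lower-frequency clocks as long as the resource and timing constraints are satisfied. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.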

    CiNii

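  • Floorplan-Aware Low-Power High-Level Synthesis Method Oriented to Multiple Clock Domains (Behavioral Synthesis, Workshop on Embedded Technologies and Networks, ETNET2013)

    ABE Shin-ya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE Technical Report. CPSY, Computer Systems   112 ( 481 ) 115 - 120  2013.03

    In this paper, we propose HDR-mcd, an extension of the HDR architecture for applying multiple clock domains. We then propose a multiple-clock-domain-oriented low-power high-level synthesis method targeting HDR-mcd. The proposed method follows a synthesis flow that feeds back floorplan information and improves the result iteratively. In doing so, it uses regions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning lower-frequency clocks as long as the resource and timing constraints are satisfied. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.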

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 45 - 50  2012.11

    Cryptographic LSIs are widely used to perform confidential operations and must therefore be secure themselves. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, the scan chains used in scan test carry the risk of being misused to leak secret information. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. Experimental results on various crypto circuit implementations show that the proposed method is effective and that neither the secret key nor the testability is compromised.

    CiNii

  • SAAV: Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 135 - 140  2012.11

    An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning, and it converges without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which reduces these overheads as the iteration proceeds. Experimental results show that our algorithm achieves a 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and a 65.7% energy saving compared with conventional algorithms.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 45 - 50  2012.11

    Cryptographic LSIs are widely used to perform confidential operations and must therefore be secure themselves. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, the scan chains used in scan test carry the risk of being misused to leak secret information. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. Experimental results on various crypto circuit implementations show that the proposed method is effective and that neither the secret key nor the testability is compromised.

    CiNii

  • SAAV: Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 135 - 140  2012.11

    An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning, and it converges without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which reduces these overheads as the iteration proceeds. Experimental results show that our algorithm achieves a 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and a 65.7% energy saving compared with conventional algorithms.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, System LSI Design Methodology (SLDM)   2012 ( 9 ) 1 - 6  2012.11

    Cryptographic LSIs are used to perform confidential operations and must therefore be secure themselves. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, the scan chains used in scan test carry the risk of being misused to leak secret information, and scan-based attacks on various cryptographic circuits have been reported. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. We implemented the proposed method on RSA, AES, and DES cryptographic circuits and evaluated it. The experimental results show that the method is effective for various crypto circuits and that neither the secret key nor the testability is compromised.

    CiNii

  • SAAV: Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, System LSI Design Methodology (SLDM)   2012 ( 24 ) 1 - 6  2012.11

    An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm follows a synthesis flow that feeds back floorplan information and iteratively improves scheduling/binding and floorplanning, and it converges without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which reduces these overheads as the iteration proceeds. Experimental results show that our algorithm achieves an energy saving of up to 6.2% compared with the conventional high-level synthesis algorithm for AVHDR architectures and of up to 65.7% compared with conventional high-level synthesis algorithms.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. ICD   112 ( 247 ) 95 - 100  2012.10

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE Technical Report. ICD, Integrated Circuits   112 ( 247 ) 95 - 100  2012.10

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Signal processing   112 ( 246 ) 95 - 100  2012.10

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 245 ) 95 - 100  2012.10

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Image engineering   112 ( 248 ) 95 - 100  2012.10

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Research Report, System LSI Design Methodology (SLDM)   2012 ( 18 ) 1 - 6  2012.10

    Scan test is a widely used design-for-testability technique with high fault coverage that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through the scan chains used in scan test. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop) has therefore been proposed, which achieves high security against scan-based attacks without compromising testability. In the SDSFF, the timing at which the latch added to the scan FF is updated is an important problem. In this paper, we propose an update timing that enables online testing without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in a scan chain. We implemented the proposed secure scan architecture on an RSA circuit and evaluated it, and we show that neither the secret key nor the testability is compromised. According to the results, even with 100 SDSFFs, the introduced area overhead is at most 0.555%, which is less than that of the conventional method.

    CiNii

  • A-3-4 AES Cryptosystem Using Clock Falling Edge Against DFA

    Igarashi Hiroaki, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2012  2012.08

    CiNii

  • A-3-5 Secure Scan Architecture Using State Dependent Scan Flip-Flop with Feedback

    Atobe Yuta, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2012  2012.08

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Signal processing   112 ( 115 ) 115 - 120  2012.06

    Scan test is a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs and can thus detect circuit failures efficiently. On the other hand, scan-based attacks, which use scan chains to retrieve the secret keys of cryptographic LSIs, must also be considered. Since testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to arbitrary FFs in the scan chain so that the scan data is changed, depending on the circuit state, into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. Testers can still perform a normal scan test because they know the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Mathematical Systems Science and its Applications : IEICE technical report   112 ( 116 ) 115 - 120  2012.06

    Scan test is a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs and can thus detect circuit failures efficiently. On the other hand, scan-based attacks, which use scan chains to retrieve the secret keys of cryptographic LSIs, must also be considered. Since testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to arbitrary FFs in the scan chain so that the scan data is changed, depending on the circuit state, into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. Testers can still perform a normal scan test because they know the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Circuits and systems   112 ( 113 ) 115 - 120  2012.06

    Scan test is a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs and can thus detect circuit failures efficiently. On the other hand, scan-based attacks, which use scan chains to retrieve the secret keys of cryptographic LSIs, must also be considered. Since testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to arbitrary FFs in the scan chain so that the scan data is changed, depending on the circuit state, into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. Testers can still perform a normal scan test because they know the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 114 ) 115 - 120  2012.06

    Scan test is a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs and can thus detect circuit failures efficiently. On the other hand, scan-based attacks, which use scan chains to retrieve the secret keys of cryptographic LSIs, must also be considered. Since testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to arbitrary FFs in the scan chain so that the scan data is changed, depending on the circuit state, into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. Testers can still perform a normal scan test because they know the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • An Energy-efficient ASIP Synthesis Method Using Scratchpad Memory and Code Placement Optimization

    SHIMADA Yoshinori, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   110 ( 432 ) 25 - 30  2011.02

    In this paper, we propose an energy-efficient ASIP synthesis method using scratchpad memory. Because a significant amount of power is consumed in the instruction memory, developing an energy-efficient memory structure is important for reducing the overall power consumption of the system. Our method is based on the idea of using scratchpad memory with code placement optimization. The proposed memory architecture can copy data from instruction memory to scratchpad memory under the control of the on-chip program counter. Given the CFG of an input application, the proposed code placement optimization decides both the code allocation and the required scratchpad memory size so as to minimize energy. This reduces the total energy consumption because the number of instruction memory accesses is reduced. Experimental results on Mediabench show the effectiveness of the proposed method, which reduces energy consumption by 47.9% on average.

    CiNii

  • Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

    ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2008 ( 111 ) 55 - 59  2008.11

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can also be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attack. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted, which makes it more difficult to discover the internal scan architecture. We analyze an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side channel attack.

    CiNii

  • Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

    ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 55 - 59  2008.11

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can also be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attack. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted, which makes it more difficult to discover the internal scan architecture. We analyze an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side channel attack.

    CiNii

  • Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

    ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 55 - 59  2008.11

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can also be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attack. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted, which makes it more difficult to discover the internal scan architecture. We analyze an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side channel attack.

    CiNii

  • An Energy-efficient ASIP Synthesis Method Based on Reducing Bit-width of Instruction Memory

    KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 509 ) 25 - 30  2008.03

     View Summary

    This paper proposes an energy-efficient ASIP synthesis method based on reducing the bit-width of instruction memory. VLIW-type processors can execute several instructions concurrently. However, the instruction memory of such processors requires a long bit-width, which wastefully increases power and energy consumption. Therefore, reducing the bit-width of instruction memory can realize high-performance and energy-efficient processors. The bit-width of an instruction memory depends on the instruction encoding format, which is composed of the opcode and the operands of an instruction. The opcode bit-width depends on the number of instructions in the instruction set, and the operand bit-width depends on the number of general-purpose registers. Moreover, to reduce the opcode bit-width, we introduce the concept of a combined instruction, which is handled as one instruction and is composed of several instructions issued concurrently in the VLIW slots. We develop an energy-efficient ASIP synthesis system that includes three algorithms: an opcode bit-width reduction algorithm, an operand bit-width reduction algorithm, and an energy minimization algorithm. Experimental results confirm a 9% to 12% reduction in energy consumption for a whole processor system including memories.

    CiNii

  • An energy-efficient ASIP synthesis method based on reducing bit-width of instruction memory

    KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 506 ) 25 - 30  2008.03

     View Summary

    This paper proposes an energy-efficient ASIP synthesis method based on reducing the bit-width of instruction memory. VLIW-type processors can execute several instructions concurrently. However, the instruction memory of such processors requires a long bit-width, which wastefully increases power and energy consumption. Therefore, reducing the bit-width of instruction memory can realize high-performance and energy-efficient processors. The bit-width of an instruction memory depends on the instruction encoding format, which is composed of the opcode and the operands of an instruction. The opcode bit-width depends on the number of instructions in the instruction set, and the operand bit-width depends on the number of general-purpose registers. Moreover, to reduce the opcode bit-width, we introduce the concept of a combined instruction, which is handled as one instruction and is composed of several instructions issued concurrently in the VLIW slots. We develop an energy-efficient ASIP synthesis system that includes three algorithms: an opcode bit-width reduction algorithm, an operand bit-width reduction algorithm, and an energy minimization algorithm. Experimental results confirm a 9% to 12% reduction in energy consumption for a whole processor system including memories.

    CiNii

  • Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

    TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 101 ) 43 - 48  2007.06

     View Summary

    Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), which is a form of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability, since the bit length of operands varies depending on the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous works. However, either a frequency change or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable, dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed using the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles increased by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

    CiNii

  • Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

    TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 105 ) 43 - 48  2007.06

     View Summary

    Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), which is a form of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability, since the bit length of operands varies depending on the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous works. However, either a frequency change or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable, dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed using the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles increased by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

    CiNii

  • Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

    TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 103 ) 43 - 48  2007.06

     View Summary

    Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), which is a form of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability, since the bit length of operands varies depending on the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous works. However, either a frequency change or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable, dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed using the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles increased by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

    CiNii

  • CoDaMa: An XML-based Framework for Manipulating CDFGs

    KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2007 ( 2 ) 73 - 78  2007.01

     View Summary

    This paper proposes an XML-based framework to manipulate CDFGs (Control Data Flow Graphs) for HW/SW (Hardware/Software) co-synthesis systems or high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis systems or high-level synthesis systems, CDFGs are often adopted as an internal representation of input application programs. These systems explore the design space automatically with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, with the increased scale of recent SoC (System on a Chip) applications, synthesis systems require more advanced functions to be implemented, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct the systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

    CiNii

  • CoDaMa : An XML-based Framework for Manipulating CDFGs

    KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 456 ) 19 - 24  2007.01

     View Summary

    This paper proposes an XML-based framework to manipulate CDFGs (Control Data Flow Graphs) for HW/SW (Hardware/Software) co-synthesis systems or high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis systems or high-level synthesis systems, CDFGs are often adopted as an internal representation of input application programs. These systems explore the design space automatically with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, with the increased scale of recent SoC (System on a Chip) applications, synthesis systems require more advanced functions to be implemented, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct the systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

    CiNii

  • CoDaMa : An XML-based Framework for Manipulating CDFGs

    KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 458 ) 19 - 24  2007.01

     View Summary

    This paper proposes an XML-based framework to manipulate CDFGs (Control Data Flow Graphs) for HW/SW (Hardware/Software) co-synthesis systems or high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis systems or high-level synthesis systems, CDFGs are often adopted as an internal representation of input application programs. These systems explore the design space automatically with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, with the increased scale of recent SoC (System on a Chip) applications, synthesis systems require more advanced functions to be implemented, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct the systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

    CiNii

  • CoDaMa : An XML-based Framework for Manipulating CDFGs

    KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 454 ) 19 - 24  2007.01

     View Summary

    This paper proposes an XML-based framework to manipulate CDFGs (Control Data Flow Graphs) for HW/SW (Hardware/Software) co-synthesis systems or high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis systems or high-level synthesis systems, CDFGs are often adopted as an internal representation of input application programs. These systems explore the design space automatically with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, with the increased scale of recent SoC (System on a Chip) applications, synthesis systems require more advanced functions to be implemented, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct the systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

    CiNii

  • A Forwarding Unit Optimization Method for Application Processors

    HIURA Toshihiro, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2006 ( 126 ) 181 - 186  2006.11

     View Summary

    To meet the requirements of application-specific processor design, such as area, cost, performance, and design time, we have been developing a HW/SW co-design system called SPADES, which can generate an application-specific processor with minimum area under a constraint on the execution time of an application. In SPADES, we reduce the area by reducing unnecessary HW units and then changing the instruction set. However, changing the instruction set affects the processor architecture. In addition, the forwarding unit easily becomes the critical path when the processor architecture becomes complex. Thus, in this paper, we focus on optimizing the forwarding unit by making a tradeoff between area and delay without any changes to the instruction set. We also propose a new forwarding unit architecture, called the foresight judgment type forwarding unit, which can be incorporated into SPADES to generate an HDL description automatically without requiring any knowledge of our system. Experimental results show that the proposed method is more suitable for generating an optimized forwarding unit in HW/SW co-design systems.

    CiNii

  • A Forwarding Unit Optimization Method for Application Processors

    HIURA Toshihiro, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 389 ) 49 - 54  2006.11

     View Summary

    To meet the requirements of application-specific processor design, such as area, cost, performance, and design time, we have been developing a HW/SW co-design system called SPADES, which can generate an application-specific processor with minimum area under a constraint on the execution time of an application. In SPADES, we reduce the area by reducing unnecessary HW units and then changing the instruction set. However, changing the instruction set affects the processor architecture. In addition, the forwarding unit easily becomes the critical path when the processor architecture becomes complex. Thus, in this paper, we focus on optimizing the forwarding unit by making a tradeoff between area and delay without any changes to the instruction set. We also propose a new forwarding unit architecture, called the foresight judgment type forwarding unit, which can be incorporated into SPADES to generate an HDL description automatically without requiring any knowledge of our system. Experimental results show that the proposed method is more suitable for generating an optimized forwarding unit in HW/SW co-design systems.

    CiNii

  • A Forwarding Unit Optimization Method for Application Processors

    HIURA Toshihiro, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 392 ) 49 - 54  2006.11

     View Summary

    To meet the requirements of application-specific processor design, such as area, cost, performance, and design time, we have been developing a HW/SW co-design system called SPADES, which can generate an application-specific processor with minimum area under a constraint on the execution time of an application. In SPADES, we reduce the area by reducing unnecessary HW units and then changing the instruction set. However, changing the instruction set affects the processor architecture. In addition, the forwarding unit easily becomes the critical path when the processor architecture becomes complex. Thus, in this paper, we focus on optimizing the forwarding unit by making a tradeoff between area and delay without any changes to the instruction set. We also propose a new forwarding unit architecture, called the foresight judgment type forwarding unit, which can be incorporated into SPADES to generate an HDL description automatically without requiring any knowledge of our system. Experimental results show that the proposed method is more suitable for generating an optimized forwarding unit in HW/SW co-design systems.

    CiNii

Industrial Property Rights

Awards

  • IEEK Best Paper Award

    2012.11  

Research Projects

  • Automatic False Path Identification and Test Synthesis System Development to Avoid Overtesting

  • Design Methods for Crypto LSI Implementations and Testing

  • Research on delay test techniques for ultra-low power designs

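  • Research on variation-tolerant LSI design techniques based on timing error prediction

    Grants-in-Aid for Scientific Research (Waseda University)  Grant-in-Aid for Scientific Research (C)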

Presentations

  • Application and evaluation of CNN with approximate adders

    井上 雄太, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2018.05

  • A low power SRAM design with leakage power reduction

    伊藤 卓, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2018.05

  • MOSs SP-SSHI for low frequency piezoelectric energy harvesting

    杉山 貴紀, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2018.05

  • Soft error tolerant latch designs with low power consumption (invited paper)

    Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    Proceedings of International Conference on ASIC 

    Presentation date: 2018.01

     View Summary

    © 2017 IEEE. As semiconductor technology continues scaling down, the reliability issue has become much more critical than ever before. Unlike traditional hard-errors caused by permanent physical damage which can't be recovered in field, soft errors are caused by radiation or voltage/current fluctuations that lead to transient changes on internal node states, thus they can be viewed as temporary errors. However, due to the unpredictable occurrence of soft errors, it is desirable to develop soft error tolerant designs. For this reason, soft error tolerant design techniques have gained great research interest. In this paper, we will explain the soft error mechanism and then review the existing soft error tolerant design techniques with particular emphasis on SEH family because they can achieve low power consumption and small performance overhead as well.

  • A low cost and high speed CSD-based symmetric transpose block FIR implementation

    Jinghao Ye, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    Proceedings of International Conference on ASIC 

    Presentation date: 2018.01

     View Summary

    © 2017 IEEE. In this paper, a low cost and high speed CSD-based symmetric transpose block FIR design was proposed for low cost digital signal processing. First, the existing area-efficient CSD-based multiplier was optimized by considering the reusability and the symmetry of coefficients for area reduction. Second, the position of the input register was changed for high speed transpose block FIR processing in which half of the number of required multipliers can be saved. When compared with the existing block FIR designs, the proposed FIR design can increase the data rate from 238.66 MHz to 373.13 MHz while saving 10.89% area and 21.30% energy consumption as well.

  • Design of a soft error detection latch using internal node

    中垣 直道, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2017.05

  • C-element based soft-error hardened latch designs

    田島 咲季, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2017.05

  • Maximum error distance-based optimization of GeAr circuits

    早水 謙, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2017.05

  • Self-powered switching magnetic transformer circuit for energy harvesting systems

    川合 洋平, 戸川 望, 柳澤 政生, 史 又華

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2017.05

  • Improved monitoring-path selection algorithm for suspicious timing error prediction based timing speculation

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015 

    Presentation date: 2016.07

     View Summary

    © 2015 IEEE. As process technology scales down, timing speculation techniques such as Razor and STEP have emerged as alternative solutions to reduce the margins required by various variation effects. Unlike Razor, STEP is a prediction-based timing speculation method that predicts suspicious timing errors before they actually appear, and thus it can yield a larger performance improvement. In this paper, an improved monitoring-path selection algorithm for STEP-based timing speculation is proposed, in which candidate monitoring paths are selected based on short-path removal and path length estimation. Experimental results show that the proposed algorithm realizes an average of 1.71X overclocking compared with worst-case based designs.

  • A low-power soft error tolerant latch scheme

    Saki Tajima, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015 

    Presentation date: 2016.07

     View Summary

    © 2015 IEEE. As process technology continues scaling, low power and reliability of integrated circuits are becoming more critical than ever before. In particular, the reduction of node capacitance and operating voltage for low power consumption makes circuits more sensitive to soft errors induced by high-energy particles. In this paper, a soft-error tolerant latch called TSPC-SEH is proposed for soft error tolerance with low power consumption. The simulation results show that the proposed TSPC-SEH latch can achieve up to 42% power consumption reduction and 54% delay improvement compared to the existing soft error tolerant SEH and DICE designs.

  • In-situ Trojan authentication for invalidating hardware-Trojan functions

    Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings - International Symposium on Quality Electronic Design, ISQED 

    Presentation date: 2016.05

     View Summary

    © 2016 IEEE. Because we do not know who will create hardware Trojans (HTs), or when and where they will be inserted, it is very difficult to correctly and completely detect all the real HTs in untrusted ICs, and thus it is desirable to incorporate in-situ HT invalidating functions into untrusted ICs as a countermeasure against HTs. This paper proposes an in-situ Trojan authentication technique for gate-level netlists to avoid security leakage. In the proposed approach, an untrusted IC operates in authentication mode and normal mode. In the authentication mode, an embedded Trojan authentication circuit monitors the bit-flipping count of a suspicious Trojan net within a pre-defined constant number of clock cycles and identifies whether it is a real Trojan or not. If the authentication condition is satisfied, the suspicious Trojan net is validated. Otherwise, it is invalidated and HT functions are masked. By doing this, even untrusted netlists with HTs can still be used in the normal mode without security leakage. By setting the appropriate authentication condition using training sets from Trust-HUB gate-level benchmarks, the proposed technique successfully invalidates only the HTs in the training sets. Furthermore, when the in-situ Trojan authentication circuit is embedded into a Trojan-inserted AES crypto netlist, the netlist can run securely and correctly even if HTs exist, with an area overhead of just 1.5% and no delay overhead.

  • A delay variation and floorplan aware high-level synthesis algorithm with body biasing

    Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings - International Symposium on Quality Electronic Design, ISQED 

    Presentation date: 2016.05

     View Summary

    © 2016 IEEE. In this paper, we propose a delay variation and floorplan aware high-level synthesis algorithm with body biasing, which minimizes the average leakage energy of manufactured chips. To realize a floorplan-oriented high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture), one of the DR architectures. The HDR architecture divides the chip area into small partitions called huddles, and we can control the body bias voltage of each huddle. During high-level synthesis, we iteratively obtain the expected leakage energy of every huddle when a body bias voltage is applied. A huddle with smaller expected leakage energy contributes to reducing the expected leakage energy of the entire circuit but can increase the latency. We assign CDFG nodes in critical paths to the huddles with larger expected leakage energy and those in non-critical paths to the huddles with smaller expected leakage energy. We expect this to minimize the total leakage energy of a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 38.9% without latency and yield degradation compared with typical-case design with body biasing.

  • Fast and Low-power Soft-error Tolerant Fast-SEH Latch

    田島 咲季, 史 又華, 戸川 望, 柳澤 政生

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2016.05

  • A process-variation-aware multi-scenario high-level synthesis algorithm for distributed-register architectures

    Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    International System on Chip Conference 

    Presentation date: 2016.02

     View Summary

    © 2015 IEEE. In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a process-variation-aware low-latency and multi-scenario high-level synthesis algorithm targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

  • Scan-based side-channel attack against symmetric key ciphers using scan signatures

    Mika Fujishiro, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 

    Presentation date: 2015.09

    © 2015 IEEE. There are a number of studies on side-channel attacks, which use information leaked from the physical implementation of a cryptosystem. A scan-based side-channel attack utilizes scan chains, one of the design-for-test techniques, and retrieves the secret information inside the cryptosystem. In this paper, scan-based side-channel attack methods using scan signatures against symmetric key ciphers, such as block ciphers and stream ciphers, are presented to show the risk of scan-based attacks.

  • FPGA-based SHA-3 acceleration on a 32-bit processor via instruction set extension

    Yi Wang, Youhua Shi, Chao Wang, Yajun Ha

    Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 

    Presentation date: 2015.09

    © 2015 IEEE. As embedded systems play more and more important roles in the Internet of Things (IoT), the integration of cryptographic functionalities is urgently needed to ensure data and information security. Recently, Keccak was declared the winner of the third generation of the Secure Hash Algorithm (SHA-3). However, implementing SHA-3 on a specific 32-bit processor failed to meet the performance requirement. On the other hand, implementing it as a cryptographic coprocessor consumes a lot of extra area and requires a customized driver program. Although implementing Keccak on a 64-bit platform is more efficient, this platform is not suitable for embedded implementation. In this paper, we propose a novel SHA-3 implementation using instruction set extension based on a 32-bit LEON3 processor (an open source processor), with the goals of reducing execution cycles and code size. Experimental results show that the proposed design reduces execution cycles by around 87% and code size by 10.5% compared to reference designs. Our design takes up only 9.44% extra area with negligible speed overhead compared to the standard LEON3 processor. Compared to the existing hardware accelerators, our proposed design occupies only half of the area resources and does not require extra driver programs to be developed when integrated into the overall system.

  • A floorplan-aware high-level synthesis technique with delay-variation tolerance

    Kazushi Kawamura, Yuta Hagio, Youhua Shi, Nozomu Togawa

    Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 

    Presentation date: 2015.09

    © 2015 IEEE. To realize a better trade-off between performance and yield rate in recent LSI designs, it is necessary to deal with the increasing ratio of interconnect delay as well as delay variation. In this paper, a novel floorplan-aware high-level synthesis technique with delay-variation tolerance is proposed. By utilizing floorplan-driven architectures, interconnect delays can be estimated and then handled even in high-level synthesis. Our technique makes it possible to realize two scheduling/binding results (one is a non-delayed result and the other is a delayed result) simultaneously on a chip with small area/performance overhead, and either one of them can be selected according to the post-silicon delay variation. Experimental results demonstrate that our technique can reduce delayed scheduling/binding latency by up to 32.3% compared with conventional approaches.

  • A universal delay line circuit for variation resilient IC with self-calibrated time-to-digital converter

    Shuai Shao, Youhua Shi, Wentao Dai, Jianyi Meng, Weiwei Shan

    Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 

    Presentation date: 2015.09

    © 2015 IEEE. A universal delay monitor that imitates the real critical paths is developed for variation-resilient integrated circuits. This monitor is constructed from different proportions of logic cells and interconnects. The delay of the monitor is detected by a time-to-digital converter, which keeps the sampling results precise. To reduce the deviation of the sampling results caused by PVT, a novel time-to-digital converter with a self-calibration mechanism is developed. Adaptive voltage scaling based on this variation-resilient method is applied to an ARM7-based System on a Chip in a 0.18 μm CMOS process with a 112M signoff frequency and an area of 1.3 × 1.3 mm². The simulation results show that it has a 43.42% gain in power consumption under the FF corner at -25°C compared to the traditional fixed 1.8 V design.

  • A Score-Based Classification Method for Identifying Hardware-Trojans Inserted/Free Gate-Level Netlists

    Presentation date: 2015.03

  • A score-based classification method for identifying Hardware-Trojans at gate-level netlists

    Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings -Design, Automation and Test in Europe, DATE 

    Presentation date: 2015.01

    © 2015 EDAA. Recently, digital ICs are often designed by outside vendors to reduce design costs in the semiconductor industry, which introduces a severe risk that malicious attackers implement Hardware Trojans (HTs) in them. Since the IC design phase generates only a single design result, an RT-level or gate-level netlist for example, we cannot assume an HT-free netlist, or Golden netlist, and it is therefore difficult to identify whether a generated netlist is HT-free or HT-inserted. In this paper, we propose a score-based classification method for identifying HT-free or HT-inserted gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but instead detects a net included in HTs, which is called a Trojan net. Firstly, we observe Trojan nets from several HT-inserted benchmarks and extract several of their features. Secondly, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as 'HT-inserted' and all the HT-free gate-level benchmarks as 'HT-free' in approximately three hours for each benchmark.
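    The per-net scoring and thresholding described in this summary can be illustrated with a minimal sketch. The feature names, their scores, and the rule "a netlist is HT-inserted if its highest-scoring net reaches the threshold" are all assumptions made for illustration; the summary does not list the actual Trojan net features or the threshold used.

```python
# Hypothetical feature names and scores (not taken from the paper).
FEATURE_SCORES = {
    "rare_fanin": 2,
    "constant_like": 1,
    "drives_primary_output": 1,
}


def net_score(features):
    """Sum the scores of the features observed on a single net."""
    return sum(FEATURE_SCORES.get(name, 0) for name in features)


def classify_netlist(nets, threshold):
    """Classify a netlist from its per-net feature sets.

    `nets` maps a net name to the set of feature names seen on that net.
    Assumed rule: the netlist is 'HT-inserted' if any net scores at or
    above `threshold`, otherwise it is 'HT-free'.
    """
    max_score = max((net_score(f) for f in nets.values()), default=0)
    return "HT-inserted" if max_score >= threshold else "HT-free"


# Toy netlists with made-up nets and features.
clean = {"n1": set(), "n2": {"constant_like"}}
suspect = {"n1": set(), "n7": {"rare_fanin", "constant_like"}}
```

    With a threshold of 3, `classify_netlist(clean, 3)` yields 'HT-free' and `classify_netlist(suspect, 3)` yields 'HT-inserted', since only net `n7` accumulates a score of 3.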

  • In-situ timing monitoring methods for variation-resilient designs

    Youhua Shi, Nozomu Togawa

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 

    Presentation date: 2015.01

    © 2014 IEEE. With technology scaling, process, voltage, and temperature (PVT) variations pose great challenges to integrated circuit designs. Conventionally, LSI circuits are designed by adding a pessimistic timing margin to guarantee 'always correct' operations even under worst-case conditions. However, due to the increasing PVT variations, an unacceptably large design guard band must be reserved to avoid timing errors on the critical paths of circuits, which leads to very inefficient designs in terms of power and performance. For this reason, in-situ timing monitoring techniques have gained great research interest. In this paper, we review existing variation-resilient design techniques with particular emphasis on in-situ timing monitoring techniques, including both detection-based and prediction-based methods. The effectiveness of in-situ timing monitoring techniques is discussed. Finally, we show an example of an in-situ timing monitoring technique called STEP with applications to general pipeline designs.

  • An area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 

    Presentation date: 2015.01

    © 2014 IEEE. As process technologies advance, the importance of timing error correction techniques is increasing as well. In this paper, we propose an area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction circuits (STEPCs). A STEPC predicts timing errors by monitoring the middle points of several speed-paths in a circuit. However, we need many STEPCs with a high area overhead to predict timing errors in an overall circuit. Our proposed method moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs. Our proposed algorithm reduces the required number of STEPCs to 1/19 and their area to 1/5 compared with a naive algorithm. Furthermore, our algorithm realizes 2.25X overclocking compared with just inserting STEPCs into several speed-paths.

  • Secure scan design using improved random order and its evaluations

    Masaru Oya, Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 

    Presentation date: 2015.01

    © 2014 IEEE. Scan test using scan chains is one of the most important DFT techniques. However, scan-based attacks have been reported that can retrieve the secret key in crypto circuits by using scan chains. A secure scan architecture is strongly required to protect scan chains from scan-based attacks. This paper proposes an improved version of random order as a secure scan architecture. In improved random order, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a sub-chain to scan out. Testability and security of the proposed improved random order are also discussed in the paper, and the implementation results demonstrate the effectiveness of the proposed method.

  • In-situ Timing Monitoring Methods for Variation-Resilient Designs

    Presentation date: 2014.11

  • An Area-Overhead-Oriented Monitoring-Path Selection Algorithm for Suspicious Timing Error Prediction

    Presentation date: 2014.11

  • Secure Scan Design Using Improved Random Order and its Evaluations

    Presentation date: 2014.11

  • A Network-flow-based Checkpoint Insertion Algorithm for Suspicious Timing Error Prediction Scheme

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2014.08

  • InTimeTune: A Throughput Driven Timing Speculation Architecture for Overscaled Designs

    Presentation date: 2014.06

  • Throughput Driven Check Point Selection in Suspicious Timing Error Prediction based Designs

    Presentation date: 2014.02

  • Throughput driven check point selection in suspicious timing error prediction based designs

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE 5th Latin American Symposium on Circuits and Systems, LASCAS 2014 - Conference Proceedings 

    Presentation date: 2014.01

    In this paper, a throughput-driven design technique is proposed, in which a suspicious timing error prediction circuit is inserted to monitor the signal transitions at some selected check points. Unlike previous works where timing errors are detected after their occurrence, the proposed method uses the real intermediate signal transitions for timing error prediction. The check point selection affects both the maximal operating frequency and the suspicious timing error overestimation rate, both of which have an effect on the overall throughput; thus an analysis of the check point selection is also given. In our work, the circuit can be overclocked by a factor of 2 or more with negligible area overhead while guaranteeing always-correct output. © 2014 IEEE.

  • Secure Scan Design with Dynamically Configurable Connection

    Presentation date: 2013.12

  • Predication based Timing Speculation Technique for Throughput Improvement

    Presentation date: 2013.11

  • Concurrent faulty clock detection for crypto circuits against clock glitch based DFA

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems 

    Presentation date: 2013.09

    In this paper, a concurrent faulty clock detection method is proposed for crypto circuits against clock glitch based differential fault analysis (DFA). In the proposed method, a non-logic buffer-based delay chain is inserted, and then, by monitoring the delay along the delay chain, a possible clock glitch based DFA can be detected. Experimental results on an AES circuit show that the proposed method can successfully detect clock glitch based attacks, and the required area overhead is only 0.47%, which is much smaller than that of previous works. © 2013 IEEE.

  • An energy-efficient high-level synthesis algorithm incorporating interconnection delays and dynamic multiple supply voltages

    Shin Ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    2013 International Symposium on VLSI Design, Automation, and Test, VLSI-DAT 2013 

    Presentation date: 2013.08

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture) that integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis. Next, we propose a high-level synthesis algorithm for AVHDR architectures. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which abstracts modules placed close to each other, are naturally generated using floorplanning. Low-supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Experimental results show that our algorithm achieves 50% energy-saving compared with conventional algorithms. © 2013 IEEE.

  • Random Order Scan Design against Scan-Based Attacks

    Yuta Atobe, Youhua Shi, Masao Yanagisawa

    Proceedings of the Workshop on Circuits and Systems

    Presentation date: 2013.07

  • Suspicious timing error prediction with in-cycle clock gating

    Youhua Shi, Hiroaki Igarashi, Nozomu Togawa, Masao Yanagisawa

    Proceedings - International Symposium on Quality Electronic Design, ISQED 

    Presentation date: 2013.07

    Conventionally, circuits are designed with a pessimistic timing margin to solve delay variation problems, which guarantees 'always correct' operations. However, because such a worst-case condition occurs rarely, the traditional pessimistic design method is becoming one of the main obstacles for designers to achieve higher performance and/or ultra-low power consumption. By monitoring timing error occurrence during circuit operation, adaptive timing error detection and recovery methods have recently gained wide interest as a promising solution. As an extension of existing research, in this paper, we propose a suspicious timing error prediction method for performance or energy efficiency improvement in pipeline designs. Experimental results show that, when compared with typical margin designs, the proposed method can 1) achieve up to 1.41X throughput improvement with in-situ timing error prediction ability; and 2) allow the design to be overclocked by up to 1.88X with 'always correct' outputs. © 2013 IEEE.

  • Floorplan Driven Architectures and High-level Synthesis Algorithm for Dynamic Multiple Supply Voltages

    Presentation date: 2013.06

  • Concurrent Faulty Clock Detection for Crypto Circuits Against Clock Glitch Based DFA

    Presentation date: 2013.05

  • DR24 An Energy-efficient High-level Synthesis Algorithm Incorporating Interconnection Delays and Dynamic Multiple Supply Voltages

    Presentation date: 2013.04

  • Suspicious Timing Error Detection and Recovery with In-Cycle Clock Gating

    Presentation date: 2013.03

  • Secure scan design with dynamically configurable connection

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 

    Presentation date: 2013.01

    Scan test is a powerful test technique which can control and observe the internal states of the circuit under test through scan chains. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Therefore, new secure test methods are required to satisfy both testability and security requirements. In this paper, a secure scan design is proposed to meet the security requirement as a countermeasure against scan-based attacks, while still maintaining high testability like normal scan testing. In our method, the internal scan chain is divided into several sub chains, and the connection order of the sub chains can be dynamically changed. In addition, a way of deciding the connection order of those sub chains so that it cannot be identified by an attacker is also proposed in this paper. The proposed method is implemented on an AES circuit to show its effectiveness, and a security analysis is also given to show how the proposed approach can be used as a countermeasure against known scan-based attacks. © 2013 IEEE.

  • State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Side Channel Attack on RSA Circuit

    Presentation date: 2012.12

  • State dependent scan flip-flop with key-based configuration against scan-based side channel attack on RSA circuit

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 

    Presentation date: 2012.12

    Scan test is one of the useful design-for-testability techniques, which can detect circuit failures efficiently. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Testability and security therefore contradict each other, and there is a need for an efficient design-for-testability circuit that satisfies both testability and security requirements. In this paper, a secure scan architecture against scan-based attacks is proposed to achieve high security without compromising testability. In our method, the scan structure is dynamically changed by adding a latch to any FFs in the scan chain. We analyze an RSA circuit implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks. © 2012 IEEE.

  • Dynamically changeable secure scan architecture against scan-based side channel attack

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    ISOCC 2012 - 2012 International SoC Design Conference 

    Presentation date: 2012.12

    Scan test, which is one of the useful design-for-testability techniques, is effective for LSIs including cryptographic circuits. It can observe and control the internal states of the circuit under test by using a scan chain. However, the scan chain presents a significant security risk of information leakage through scan-based attacks, which retrieve the secret keys of cryptographic LSIs. In this paper, a secure scan architecture against scan-based attacks which still has high testability is proposed. In our method, scan data is dynamically changed by adding a latch to any FFs in the scan chain. We show the effectiveness of the proposed method: by using it, neither the secret key nor the testability of an RSA circuit implementation is compromised. © 2012 IEEE.

  • Dynamically Changeable Architecture against Scan-Based Side Channel Attack Using State Dependent Scan Flip-Flop on RSA Circuit

    Presentation date: 2012.11

  • VLSI implementation of a fast intra prediction algorithm for H.264/AVC encoding

    Youhua Shi, Kenta Tokumitsu, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 

    Presentation date: 2010.12

    Intra-frame coding is one of the most important technologies in H.264/AVC, and it contributes significantly to the coding efficiency of H.264/AVC at the cost of computational complexity. To address this problem, in this paper we present an efficient VLSI implementation of a computation-efficient intra prediction algorithm for H.264/AVC encoding. Unlike most existing fast intra-mode selection techniques, in the proposed method the directional differences are computed using a few selected original pixels to obtain the candidate modes with the minimal direction cost. The proposed method is hardware-friendly and provides more processing parallelism for H.264 intra-frame encoding with less overhead and less power consumption, and it is expected to be utilized as a favourable accelerator hardware module in a real-time HDTV (1920×1080p) H.264 encoder. © 2010 IEEE.

  • State-dependent changeable scan architecture against scan-based side channel attacks

    Ryuta Nara, Hiroshi Atobe, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    ISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems 

    Presentation date: 2010.08

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, the scan path could be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan path to complicate the scan structure has recently been presented. Unfortunately, it still carries the potential of being attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops are state-dependent, which causes the output of each State-dependent Scan FF to be inverted or not, so as to make it more difficult to discover the internal scan architecture. ©2010 IEEE.

  • Design-for-secure-test for crypto cores

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings - International Test Conference 

    Presentation date: 2009.12

    Scan technology carries the potential of being misused as a "side channel" to leak out the secret information of crypto cores. To address such a design challenge, this paper proposes a design-for-secure-test (DFST) solution for crypto cores by adding a stimuli-launched flip-flop into the traditional scan flip-flop to maintain the high test quality without compromising the security. © 2009 IEEE.

  • Unknown response masking with minimized observable response loss and mask data

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 

    Presentation date: 2008.12

    This paper presents a new unknown response masking technique to minimize the effect on test loss due to over-masking. Unlike previous works where the scan responses are masked before entering the response compactor, the proposed method could mask the Xs when they are transformed on the scan path. Meanwhile, the masking cells are inserted along the scan paths, thus they would have no degradation on the performance of the designs. In addition, the test data required to mask unknown responses is only one bit for each test pattern. Experimental results show the effectiveness of the proposed method. © 2008 IEEE.

  • GECOM: Test data compression combined with all unknown response masking

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 

    Presentation date: 2008.08

    This paper introduces GECOM technology, a novel test compression method with seamless integration of test GEneration, test COmpression (i.e. integrated compression on scan stimulus and masking bits) and all unknown scan responses Masking for manufacturing test cost reduction. Unlike most prior methods, the proposed method considers the unknown responses during the ATPG procedure and selectively encodes the specified 1 or 0 bits (either 1s or 0s) in scan slices for compression, while at the same time masking the unknown responses before sending them to the response compactor. The proposed GECOM technology consists of the GECOM architecture and the GECOM ATPG technique. In the GECOM architecture, for a circuit with N internal scan chains, only c tester channels, where c = ⌈log2 N⌉ + 2, are required. GECOM ATPG generates test patterns for the GECOM architecture, so that not only can the scan inputs be efficiently compressed but all the unknown responses are also masked. Experimental results on both benchmark circuits and real industrial designs indicate the effectiveness of the proposed GECOM technique. ©2008 IEEE.

  • Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n)

    Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 

    Presentation date: 2008.08

    Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability, since the bit length of the operands varies depending on the security level. Also, ECC is performed in GF(P) or GF(2^n), and unified architectures for GF(P) and GF(2^n) multipliers are needed. However, in previous works, changing the frequency or using a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) circuits of the multiplier, because the critical path of the GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n). The proposed architecture unifies 4 parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a 256-bit GF(P) Montgomery multiplication in 0.23 μs. ©2008 IEEE.
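    For readers unfamiliar with the operation this architecture accelerates, the following is a minimal software sketch of textbook Montgomery multiplication in GF(P). It is not the paper's hardware design: it processes whole integers rather than radix-2^16 words, it does not cover GF(2^n), and the function names are chosen here for illustration. It assumes an odd modulus p smaller than R = 2^k and Python 3.8 or later (for the modular inverse via `pow`).

```python
def montgomery_setup(p, k):
    """Precompute constants for R = 2**k, with p odd and p < R."""
    r = 1 << k
    p_inv_neg = (-pow(p, -1, r)) % r  # -p^(-1) mod R
    r2 = (r * r) % p                  # R^2 mod p, used to enter the Montgomery domain
    return p_inv_neg, r2


def mont_mul(a, b, p, k, p_inv_neg):
    """Return a * b * R^(-1) mod p for 0 <= a, b < p (Montgomery reduction)."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * p_inv_neg) & mask  # chosen so that t + m*p is divisible by R
    u = (t + m * p) >> k                 # exact division by R
    return u - p if u >= p else u        # single conditional subtraction


def modmul(x, y, p, k):
    """Compute x * y mod p by going through the Montgomery domain."""
    p_inv_neg, r2 = montgomery_setup(p, k)
    xm = mont_mul(x, r2, p, k, p_inv_neg)    # x * R mod p
    ym = mont_mul(y, r2, p, k, p_inv_neg)    # y * R mod p
    zm = mont_mul(xm, ym, p, k, p_inv_neg)   # x * y * R mod p
    return mont_mul(zm, 1, p, k, p_inv_neg)  # leave the Montgomery domain
```

    The reduction replaces division by p with shifts and masks by a power of two, which is what makes the operation attractive for hardware multipliers such as the one described above.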

  • Design for secure test - A case study on pipelined advanced encryption standard

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings - IEEE International Symposium on Circuits and Systems 

    Presentation date: 2007.09

    Cryptography plays an important role in the security of data transmission. To ensure the correctness of crypto hardware, we should conduct testing at fabrication and in the field. However, the state-of-the-art scan-based test techniques, to achieve high test quality, need to increase the testability of the circuit under test, which carries the potential of being misused to reveal the secret information of the crypto hardware. Thus, developing efficient test strategies for crypto hardware that achieve high test quality without compromising security becomes an important task. In this paper we discuss the development of a Design-for-Secure-Test (DFST) technique for pipelined AES to overcome the above contradiction between security and test quality in testing crypto hardware. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits. © 2007 IEEE.

  • Low-cost IP core test using multiple-mode loading scan chain and scan chain clusters

    Gang Zeng, Youhua Shi, Toshinori Takabatake, Masao Yanagisawa, Hideo Ito

    Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems 

    Presentation date: 2006.12

    A fixing-shifting encoding (FSE) method is proposed to reduce test cost of IP cores. The FSE method reduces test cost by supporting multiple-mode loading test data, i.e., parallel loading, left-direction, and right-direction serial loading for each test slice data. Furthermore, the FSE that utilizes only two test channels can support a large number of internal scan chains and achieve further reduction in test cost by combining with scan chain clustering method. As a non-intrusive and automatic test pattern generation (ATPG) independent solution, the approach is applicable to IP core testing because it requires neither redesign of the core under test (CUT) nor running any additional ATPG for the encoding procedure. In addition, the decoder has low hardware overhead, and its design is independent of the CUT. Experimental results for some large ISCAS 89 benchmarks and an industry ASIC design have proven the efficiency of the proposed approach. © 2006 IEEE.

  • FCSCAN: An efficient multiscan-based test compression technique for test cost reduction

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 

    Presentation date: 2006.09

    This paper proposes a new multiscan-based test input data compression technique that employs a Fan-out Compression Scan Architecture (FCSCAN) for test cost reduction. The basic idea of FCSCAN is to target the minority specified 1 or 0 bits (either 1 or 0) in scan slices for compression. Due to the low specified bit density in the test cube set, FCSCAN can significantly reduce the input test data volume and the number of required test channels so as to reduce test cost. The FCSCAN technique is easy to implement with small hardware overhead and does not need any special ATPG for test generation. In addition, based on a theoretical compression efficiency analysis, improved procedures are also proposed for FCSCAN to achieve further compression. Experimental results on both benchmark circuits and one real industrial design indicate that a drastic reduction in test cost can indeed be achieved. © 2006 IEEE.

  • Low power test compression technique for designs with multiple scan chains

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings of the Asian Test Symposium 

    Presentation date: 2005.12

    This paper presents a new DFT technique that can significantly reduce test data volume as well as scan-in power consumption for multiscan-based designs. It can also help to reduce test time and tester channel requirements with small hardware overhead. In the proposed approach, we start with a pre-computed test cube set and fill the don't-cares with proper values for joint reduction of test data volume and scan power consumption. In addition, we explore the linear dependencies of the scan chains to construct a fanout structure using only inverters to achieve further compression. Experimental results for the larger ISCAS'89 benchmarks show the efficiency of the proposed technique. © 2005 IEEE.

  • Alternative run-length coding through scan chain reconfiguration for joint minimization of test data volume and power consumption in scan test

    Youhua Shi, Shinji Kimura, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings of the Asian Test Symposium 

    Presentation date: 2004.12

     View Summary

    Test data volume and scan power are two major concerns in SoC test. In this paper we present an alternative run-length coding method through scan chain reconfiguration to reduce both test data volume and scan-in power consumption. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. To extract the compatible scan cells, we apply a heuristic algorithm that solves the graph coloring problem; a simple greedy algorithm is then used to configure the scan chain to minimize scan power. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach leads to highly reduced test data volume with significant power savings during scan test.
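
    The grouping of scan cells into compatible classes can be sketched as graph coloring on a conflict graph. The code below is a minimal illustration under assumptions of my own: test patterns are strings over `0`, `1`, and `X`, two cells conflict when some pattern requires opposite specified values, and a largest-degree-first greedy coloring stands in for the heuristic, which the summary does not specify. All names are hypothetical.

```python
from itertools import combinations


def conflicts(col_a: str, col_b: str) -> bool:
    """Two scan cells conflict if some pattern needs 0 in one and 1 in the other."""
    return any(a != b and "X" not in (a, b) for a, b in zip(col_a, col_b))


def compatible_classes(patterns: list[str]) -> list[list[int]]:
    """Greedy colouring of the conflict graph; each colour is one compatible class."""
    n_cells = len(patterns[0])
    # Column i holds the values required of scan cell i across all patterns.
    columns = ["".join(p[i] for p in patterns) for i in range(n_cells)]
    neighbours = {i: set() for i in range(n_cells)}
    for i, j in combinations(range(n_cells), 2):
        if conflicts(columns[i], columns[j]):
            neighbours[i].add(j)
            neighbours[j].add(i)
    colour = {}
    # Colour the most constrained cells first (largest-degree-first heuristic).
    for cell in sorted(range(n_cells), key=lambda c: -len(neighbours[c])):
        used = {colour[n] for n in neighbours[cell] if n in colour}
        colour[cell] = next(c for c in range(n_cells) if c not in used)
    classes = {}
    for cell in range(n_cells):
        classes.setdefault(colour[cell], []).append(cell)
    return [classes[c] for c in sorted(classes)]


if __name__ == "__main__":
    print(compatible_classes(["0X1X", "X01X", "1X0X"]))
```

    Cells in the same class never need opposite values in any pattern, so they can share scan-in data; how the classes are then ordered in the chain to minimize power is a separate step not shown here.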

  • Reducing test data volume for multiscan-based designs through single/sequence mixed encoding

    Youhua Shi, Shinji Kimura, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    Midwest Symposium on Circuits and Systems 

    Presentation date: 2004.12

     View Summary

    This paper presents a new test data compression technique for multiscan-based designs through dictionary-based encoding of single scan-inputs or sequences of scan-inputs. In spite of its simplicity, it achieves a significant reduction in test data volume. Unlike some previous approaches to test data compression, our approach eliminates the need for additional synchronization and handshaking between the CUT and the ATE, so it is especially suitable for integration into a low-cost test scheme for SoC test. In addition, in contrast to previous dictionary-based coding techniques, the proposed approach can achieve a satisfactory reduction in test data volume even for a CUT with a small number of scan chains. Experimental results show that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.

  • Multiple test set generation method for LFSR-based BIST

    Youhua Shi, Zhe Zhang

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 

    Presentation date: 2003.01

     View Summary

    © 2003 IEEE. In this paper we propose a new reseeding method for LFSR-based test pattern generation suitable for circuits with random pattern resistant faults. The distinguishing feature of our method is that the proposed test pattern generator (TPG) can work both in normal LFSR mode, to generate pseudorandom test vectors, and in jumping mode, to make the TPG jump from a state to the required state (the seed of the next group). Experimental results indicate its superiority over other known reseeding techniques with respect to the length of the test sequence and the required area overhead.
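
    The two operating modes can be illustrated in software with a toy model. The sketch below assumes a 4-bit Fibonacci LFSR with feedback from the two most significant bits, and it models jumping mode simply as loading the next seed into the register; the hardware jump mechanism of the paper is not described in the summary and is not reproduced here. The class and function names are hypothetical.

```python
class ReseedableLFSR:
    """4-bit Fibonacci LFSR; the chosen feedback taps give a maximal period of 15."""

    def __init__(self, seed: int) -> None:
        self.load(seed)

    def load(self, seed: int) -> None:
        """Stand-in for 'jumping mode': force the register to the next seed."""
        if not 0 < seed < 16:
            raise ValueError("seed must be a non-zero 4-bit value")
        self.state = seed

    def step(self) -> int:
        """Normal LFSR mode: shift once and return the new state as a test vector."""
        feedback = ((self.state >> 3) ^ (self.state >> 2)) & 1
        self.state = ((self.state << 1) | feedback) & 0xF
        return self.state


def generate(seeds: list[int], vectors_per_seed: int) -> list[int]:
    """Emit a short pseudorandom run from each seed in turn."""
    lfsr = ReseedableLFSR(seeds[0])
    out = []
    for seed in seeds:
        lfsr.load(seed)
        out.append(lfsr.state)
        out.extend(lfsr.step() for _ in range(vectors_per_seed - 1))
    return out


if __name__ == "__main__":
    print(generate([0b0001, 0b1010], 3))
```

    Reseeding lets a short run from each seed reach vectors that a single long pseudorandom sequence would take many cycles to produce, which is the motivation for targeting random pattern resistant faults this way.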


Specific Research

  • Creation of highly energy-efficient DNN circuit design technology for edge computing

    2021   葉静浩

     View Summary

    Driven by the explosive growth of available data and powerful computing resources, deep neural networks (DNNs) have achieved remarkable breakthroughs recently. As DNN models become more diverse for various applications, how to obtain an optimal accelerator design for specific NN models while maintaining high energy efficiency with limited hardware resources becomes an emerging challenge. Unfortunately, few systematic approaches have been proposed yet. To address this design challenge, a model-defined energy-efficient DNN accelerator design through design space exploration and architecture optimization is proposed. Firstly, two dual data reuse approaches are proposed to improve on-chip data utilization efficiency. Secondly, a layer-wise design space exploration framework is developed to precisely determine the optimal tiling configuration and the corresponding data reuse strategy for target neural network models even with on-chip hardware resource constraints, which can minimize the amount of data movement between off-chip DRAM and on-chip GLB. Thirdly, an energy-efficient accelerator design with on-chip dual data reuse, centered ifmap/weight buffers, distributed psum buffers, and optimal resource configuration techniques is presented for GLB access reduction and energy efficiency improvement. Compared with the state-of-the-art accelerators, the proposed design can improve energy efficiency by up to 2.7X and 3.6X for AlexNet and VGG, respectively.

  • Development of energy harvesting interface circuits for wearable devices

    2020  

     View Summary

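    In recent years, the Internet of Things (IoT) has spread rapidly across many fields, driven by growing smartphone demand and by the miniaturization and falling cost of devices that accompany technological progress. Energy harvesting (EH) technology is attracting considerable attention as a way to solve the power supply problem of IoT devices. However, the energy obtained from an individual energy harvester (for example, a triboelectric or piezoelectric element) is extremely weak, so highly efficient EH interface circuit design techniques are needed. This research therefore developed an EH interface circuit for wearable devices. As a result, a battery-free wearable device capable of wireless transmission, powered by human body motion, was realized using the proposed circuit.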

  • Research and development of long-term highly reliable, ultra-low-power memory for the digital society

    2019  

     View Summary

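    In the digital society, the amount of data is growing explosively, so memory circuits are becoming ever more important. However, memory power consumption is increasing, because scaling has increased transistor performance variation and the soft error rate, and because the growing capacity of SRAM (Static Random Access Memory) has degraded yield. Memory design techniques that achieve long-term high reliability and ultra-low power consumption are therefore urgently needed to realize the future digital society. This research carried out research and development of circuit design techniques aimed at long-term high reliability and low power consumption for memory circuits (SRAM circuits in particular). Specifically, design techniques that reduce power consumption while improving long-term stability were proposed, and their effectiveness was evaluated.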

  • Research and development of LSI design technology realizing approximate computing for big data processing

    2018  

     View Summary

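    In recent years, attention to IoT (Internet of Things), big data, and artificial intelligence has been growing. A major problem in analyzing and processing such massive data is the large power consumption that results from the heavy computational load and long execution time. At the same time, big data workloads are inherently error-tolerant in many situations, and fully precise computation is often unnecessary. This research therefore studied LSI design techniques that realize approximate computing for big data processing that is both massive and inherently error-tolerant. In particular, it proposed (1) a formulation of performance and accuracy metrics for approximate adder circuits that takes error distance into account, (2) a low-power FIR circuit based on bit-width reduction, and (3) a bit-width reduction method for CNNs that takes arithmetic overflow into account.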

  • Creation of smart-case LSI design technology for the use of natural energy

    2014  

     View Summary

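    This research focused on LSI (large-scale integrated circuit) design technology and carried out research and development of smart-case LSI design techniques that adapt to unstable and weak natural energy sources and achieve optimal operation according to the situation. In particular, "I: ultra-low-energy LSI design technology" and "II: design technology with a run-time self-adjustment function" were proposed as innovative techniques that solve the problems of existing LSI design technology. Rather than following the existing worst-case-based LSI design approach, this research developed a fundamental system LSI design technology in which the circuit adjusts itself during operation to get the most out of processing performance, power consumption, and reliability.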

  • Research on dependable low-voltage LSI design technology

    2011  

     View Summary

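    As information and communication equipment grows more capable, rising power consumption is becoming a major problem. Lowering the supply voltage is the most effective way to reduce the power consumption of LSI circuits. Because the operating power of a CMOS circuit is proportional to the square of the voltage, reducing the voltage to 1/3 cuts power consumption to roughly 1/10 in simple terms. Under low-voltage conditions, however, CMOS circuit operation becomes unstable and is affected by LSI manufacturing variation, noise, and similar factors, so that problems such as reduced operating margins and malfunctions increase sharply compared with the present. To realize a safe and eco-friendly ambient information society in the future, we therefore believe there is a strong need for a dependable low-voltage LSI design platform that integrates high-reliability design techniques with automated LSI design techniques capable of lowering the operating voltage of CMOS transistors, the key elements of information communication and processing, below the threshold voltage. The purpose of this research is to develop highly reliable, dependable ultra-low-voltage LSI design technology. Unlike existing research (custom design), the goal is to use automated design to reduce design complexity and design cycle time and to raise the reliability of the whole circuit. A further goal is to reduce energy relative to existing research through real chip design and to improve design timing variation in the low-voltage region. This fiscal year, the following research items were mainly pursued. (1) Automated ultra-low-voltage LSI design technology: for designing circuits that operate in the low-voltage (subthreshold) region, work was carried out on (i) building delay and power models for the subthreshold region; (ii) proposing a synthesis method that, for subthreshold operation, runs transistor-level simulations with an existing process library and can select the supply voltage that minimizes energy; and (iii) building an automated low-voltage, low-energy-oriented LSI synthesis flow from the higher level (RTL), based on the proposed optimal-energy voltage selection method. Various algorithms were implemented on a computer and evaluated experimentally. Unlike existing custom design, the minimum-energy supply voltage can be selected automatically at synthesis time, and the effectiveness was confirmed by applying the method to benchmark circuits. Automated design also reduced design complexity and design cycle time. (2) Dependable LSI design technology: research was carried out on (i) analysis of delay, temperature variation, and supply voltage variation during LSI circuit operation, and (ii) techniques for detecting and controlling delay variation caused by voltage fluctuation. As a result, on the theoretical side, 80% or more of the delay errors arising on logic paths could be detected.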
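
    The voltage-scaling estimate in this summary (CMOS operating power proportional to the square of the supply voltage) can be checked with a one-line calculation. The sketch below assumes switched capacitance, activity, and clock frequency are held fixed, which is a simplification; the function name is hypothetical.

```python
def relative_dynamic_power(voltage_scale: float) -> float:
    """Dynamic CMOS power relative to nominal, assuming P is proportional to V**2."""
    return voltage_scale ** 2


if __name__ == "__main__":
    # Scaling the supply to one third gives 1/9 of the power, i.e. roughly 1/10.
    print(f"{relative_dynamic_power(1 / 3):.3f}")
```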

  • Research on design for testability of systems-on-chip

    2005  

     View Summary

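    Ultra-large-scale integration and extreme miniaturization of LSIs have made it possible to implement an entire information system on a single chip. However, higher integration increases the number of points that must be checked for faults and the number of patterns needed to test them, so testing whether a manufactured chip operates correctly is becoming ever more difficult. Because the test time per chip is proportional to the number of test patterns, a system-on-a-chip (SoC) that integrates multiple functional modules takes time proportional to the number of integrated modules, and the test time becomes very long. As a result, SoC test cost is rising at a pace that threatens to exceed manufacturing cost, and test quality is also declining, so testing could become a factor that hinders the growth of the semiconductor industry. Research on low-cost, high-quality design-for-testability methods for SoCs has therefore become important. Against this background, this research studies test data compression techniques and design-for-testability methods for reducing test time. The proposed method consists of a decompressor that is inserted into the design and feeds many internal scan chains from a small number of scan channels. Compared with state-of-the-art scan and test data compression techniques, it can reduce test data volume and test time to as little as 1/20. The results were presented at academic conferences. In addition, support for testing multiple fault types and the details of fault analysis methods were examined.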


 

Syllabus
