Details of a Researcher

SHI, Youhua

Scopus Paper Info

Paper Count: 82 Citation Count: 582 h-index: 13

The data was downloaded from the Scopus API on May 19, 2026, via http://api.elsevier.com and http://www.scopus.com.

Google Scholar Information (Citations per year)

Citation Count: 798 h-index: 14 i10-index: 22

Scopus Information

Affiliation

Faculty of Science and Engineering, School of Fundamental Science and Engineering

Job title

Professor

Degree

Doctor of Engineering ( Waseda University )

Homepage URL

http://www.eps.sci.waseda.ac.jp/teachers_popup/shi.html

Education Background

- 2005  Waseda University Graduate School, Division of Engineering

Professional Memberships

IEICE
IPSJ
IEEE
The Japan Society of Applied Physics
The Japanese Society for Artificial Intelligence

Research Areas

Computer system / Information security / Electron device and electronic equipment

Research Interests

Trustworthy computing, energy harvesting, and intelligent system design

Awards

IEEK Best Paper Award

2012.11

Papers

Bennet's Doubler-Extended Converter With Optimized Bias for Enhanced Energy Extraction From Triboelectric Nanogenerators

Yirui Su, Youhua Shi

IEEE Transactions on Power Electronics 2025.09

Citations (Scopus): 2
A Novel Security Threat Model for Automated AI Accelerator Generation Platforms

Chao Guo, Youhua Shi

IEEE Access 2025

Citations (Scopus): 2
DSE-Based Hardware Trojan Attack for Neural Network Accelerators on FPGAs

Chao Guo, Masao Yanagisawa, Youhua Shi

IEEE Transactions on Neural Networks and Learning Systems 2024.10

Over the past few years, the emergence and development of design space exploration (DSE) have shortened the deployment cycle of deep neural networks (DNNs). As a result, with these open-source DSE tools, we can automatically compute the optimal configuration and generate the corresponding accelerator intellectual properties (IPs) from the pretrained neural network models and hardware constraints. However, to date, the security of DSE has received little attention. Therefore, we explore this issue from an adversarial perspective and propose an automated hardware Trojan (HT) generation framework embedded within DSE. The framework uses an evolutionary algorithm (EA) to analyze user-input data to automatically generate the attack code before placing it in the final output accelerator IPs. The proposed HT is sufficiently stealthy and suitable for both single- and multi-FPGA (field-programmable gate array) designs. It can also implement controlled accuracy degradation attacks and specified category attacks. We conducted experiments on LeNet, VGG-16, and YOLO and found that for the LeNet model trained on the CIFAR-10 dataset, attacking only one kernel resulted in 97.3% of images being classified in the category specified by the adversary and reduced accuracy by 59.58%. Moreover, for the VGG-16 model trained on the ImageNet dataset, attacking eight kernels can cause up to 96.53% of the images to be classified into the category specified by the adversary and causes the model's accuracy to decrease to 2.5%. Finally, for the YOLO model trained on the PASCAL VOC dataset, attacking with eight kernels can cause the model to identify the target as the specified category and cause slight perturbations to the bounding boxes. Compared to the uncompromised designs, the look-up table (LUT) overhead of the proposed HT design does not exceed 0.6%.

Citations (Scopus): 4
A Dual-Output Rectifier-Based Self-Powered Interface Circuit for Triboelectric Nanogenerators

Yirui Su, Masao Yanagisawa, Youhua Shi

IEEE Transactions on Power Electronics 39 ( 6 ) 6630 - 6634 2024.06

Triboelectric nanogenerators (TENGs) offer a cost-effective solution for harvesting energy in Internet of Things (IoT) devices. However, their practical application is limited due to extremely high output voltage and low intrinsic capacitance, alongside the non-self-powered nature of current interface circuits and low transfer efficiency resulting from output voltage asymmetry. To address these issues, this letter introduces a dual-output rectifier-based interface circuit designed to rectify TENG output into two distinct voltage magnitudes, optimizing for energy harvesting and switching generation. The experimental results validate our approach, showing gains of 2.75 and 2.34 times in terms of maximum output power over a full-wave rectifier (FWR)-based design at 2 and 3 Hz, respectively. Furthermore, under identical frequency and load conditions (1 MΩ at 2 and 3 Hz), the output gains reached 152 and 160 times that of the FWR. Our approach brings about a significant advancement in TENG integration for low-frequency and low-load IoT devices. This letter is accompanied by two videos demonstrating the charge response of a 1 MΩ load at 2 and 3 Hz, respectively.

Citations (Scopus): 11
Transition Detector-Based Radiation-Hardened Latch for Both Single- And Multiple-Node Upsets

Saki Tajima, Masao Yanagisawa, Youhua Shi

IEEE Transactions on Circuits and Systems II: Express Briefs 67 ( 6 ) 1114 - 1118 2020.06

This brief presents an output transition detector-based radiation-hardened latch (TDRHL) for reliability improvement. With an error recovery assistant logic and an in-situ transition detector, for any radiation-induced single- and double-node upsets, the proposed TDRHL can 1) provide full self-recovery capability and 2) generate a warning signal for architecture-level recovery when soft errors cause the latch output to flip. The evaluation results show that TDRHL outperforms state-of-the-art double-node upset tolerant designs with additional error detection capability, and up to 5.0X power-delay-product improvement can be achieved.

Citations (Scopus): 24
Robust secure scan design against scan-based differential cryptanalysis

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEEE Transactions on Very Large Scale Integration (VLSI) Systems 20 ( 1 ) 176 - 181 2012.01

Scan technology carries the potential risk of being misused as a side channel to leak out the secrets of crypto cores. The existing scan-based attacks could be viewed as one kind of differential cryptanalysis, which takes advantage of scan chains to observe the bit changes between pairs of chosen plaintexts so as to identify the secret keys. To address such a design/test challenge, this paper proposes a robust secure scan structure design for crypto cores as a countermeasure against scan-based attacks to maintain high security without compromising the testability. © 2006 IEEE.

Citations (Scopus): 24
Improved launch for higher TDF coverage with fewer test patterns

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 29 ( 8 ) 1294 - 1299 2010.08

Due to the limitations of scan structure, the second vector in transition delay test is usually applied either by shift operation or by functional launch, which can result in unsatisfactory transition delay fault (TDF) coverage. To overcome this limitation and achieve higher TDF coverage, a novel improved launch delay test technique that combines the advantages of launch-on-shift and launch-on-capture tests is introduced in this paper. The proposed method can achieve near-perfect TDF coverage with fewer test patterns without the need for a global fast scan enable signal. Experimental results on ISCAS89 and ITC99 benchmark circuits are included to show the effectiveness of the proposed method. © 2010 IEEE.
Robust Training to Secure Automated AI Accelerator Generation Against Malicious Platforms

Chao Guo, Youhua Shi

2026

Strategy for Improving Cycle of Maximized Energy Output of Triboelectric Nanogenerators

Yirui Su, Masao Yanagisawa, Youhua Shi

Proceedings of the 2023 International Conference on IC Design and Technology, ICICDT 2023 131 - 135 2023

The triboelectric nanogenerator (TENG) is a new energy harvesting technology proposed in recent years. However, the output energy per cycle from a TENG to a load has been proven to be confined within the Cycle of Maximized Energy Output (CMEO), which is commonly used as a standard to evaluate the performance of a TENG. In this work, a new energy extraction strategy is proposed that utilizes the energy of the TENG's negative half cycle to complete the pre-biasing for the TENG's positive half cycle, which enables the output of the TENG per cycle to go beyond the CMEO limit. The simulation results show that the output energy per cycle of this strategy is 1.74 times that of CMEO.

Citations (Scopus): 1
An Area-Power-Efficient Multiplier-less Processing Element Design for CNN Accelerators

Jiaxiang Li, Masao Yanagisawa, Youhua Shi

Proceedings of International Conference on ASIC 2023

Machine learning has achieved remarkable success in various domains. However, the computational demands and memory requirements of these models pose challenges for deployment on privacy-secured or wearable edge devices. To address this issue, we propose an area-power-efficient multiplier-less processing element (PE) in this paper. Prior to implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model. We analyze the effectiveness of this quantization method in preserving the accuracy of the original model and present the standard and a specialized diagram illustrating the schematics of the proposed PE. Our evaluation results demonstrate that our design achieves approximately 30% lower power consumption and 35% smaller core area compared to a conventional multiplication-and-accumulation (MAC) PE. Moreover, the applied quantization reduces the model size and operand bit-width, resulting in reduced on-chip memory usage and energy consumption for memory accesses.

Citations (Scopus): 1
Scalable Hardware Efficient Architecture for Parallel FIR Filters with Symmetric Coefficients

Jinghao Ye, Masao Yanagisawa, Youhua Shi

Electronics (Switzerland) 11 ( 20 ) 2022.10

Symmetric convolutions can be utilized for potential hardware resource reduction. However, they have not been realized in state-of-the-art transposed block FIR designs. Therefore, we explore the feasibility of symmetric convolution in transposed parallel FIRs and propose a scalable hardware efficient parallel architecture. The proposed design inserts delay elements after multipliers for temporal reuse of intermediate tap products. By doing this, the number of required multipliers can be reduced by half. As a result, we can achieve up to 3.2× and 1.64× area efficiency improvements over the modern transposed block method on reconfigurable and fixed designs, respectively. These results confirm the effectiveness of the proposed STB-FIR architecture for hardware-efficient, high-speed signal processing.

Citations (Scopus): 5
Dataflow Optimization through Exploring Single-Layer and Inter-Layer Data Reuse in Memory-Constrained Accelerators

Jinghao Ye, Masao Yanagisawa, Youhua Shi

Electronics (Switzerland) 11 ( 15 ) 2022.08

Off-chip memory access has become the performance and energy bottleneck in memory-constrained neural network accelerators. To provide a solution for the energy efficient processing of various neural network models, this paper proposes a dataflow optimization method for modern neural networks by exploring the opportunity of single-layer and inter-layer data reuse to minimize the amount of off-chip memory access in memory-constrained accelerators. A mathematical analysis of three inter-layer data reuse methods is first presented. Then, a comprehensive exploration to determine the optimal data reuse strategy from single-layer and inter-layer data reuse approaches is proposed. The result shows that when compared to the existing single-layer-based exploration method, SmartShuttle, the proposed approach can achieve up to 20.5% and 32.5% of off-chip memory access reduction for ResNeXt-50 and DenseNet-121, respectively.

Power-Efficient Deep Convolutional Neural Network Design through Zero-Gating PEs and Partial-Sum Reuse Centric Dataflow

Lin Ye, Jinghao Ye, Masao Yanagisawa, Youhua Shi

IEEE Access 9 17411 - 17420 2021

Convolution neural networks (CNNs) have shown great success in many areas such as object detection and pattern recognition at the cost of extremely high computational complexity and significant external memory access, which makes state-of-the-art deep CNNs difficult to implement on resource-constrained portable/wearable devices with limited battery capacity. To address this design challenge, a power-efficient CNN design through zero-gating processing elements (PEs) and partial-sum reuse centric dataflow is proposed in this paper. Unlike the existing works which either only consider the zeros in activation maps or use off-chip training process for on-chip computation reduction, a zero-gating PE design is proposed to avoid unnecessary on-chip computation by taking advantage of the large number of zeros in both the filter weights of pre-trained models and the activation maps. Furthermore, a partial-sum reuse centric dataflow is also proposed for off-chip DRAM access reduction. The evaluation results show that the overall power consumption of PE arrays with our proposal can be reduced by 37% and 14% at the cost of 8% and 1% area overhead when compared to the baseline PE design and the existing only-activation-gated design (i.e., that in Eyeriss), respectively. Moreover, the proposed method can achieve 35% and 47% DRAM access reduction with the corresponding 14% and 49% energy savings for AlexNet and VGG-16 when compared to that in Eyeriss.

Citations (Scopus): 5
A Reconfigurable Area and Energy Efficient Hardware Accelerator of Five High-order Operators for Vision Sensor Based Robot Systems

Qianjin Wang, Yi Zhan, Bingqiang Liu, Jiajun Wu, Youhua Shi, Guoyi Yu, Chao Wang

2021 IEEE International Conference on Integrated Circuits, Technologies and Applications, ICTA 2021 189 - 190 2021

This paper proposes a reconfigurable hardware accelerator design of five major high-order operators for vision sensor based robot systems. These five high-order operators include convolution, median filtering, Euclidean distance, vector inner-product and iToF, which are intensively used in robot vision algorithms. In this work, a reconfigurable hardware accelerator design method for multiple high-order operators is proposed. FPGA implementation results show that the proposed design has achieved area efficiency with 17.54% reduced LUTs and 44.02% reduced FFs against the baseline hardware design of the five high-order operators. Case studies of Laplace edge-detection and iToF benchmark demonstrate the energy efficiency of proposed design with 19.70% and 6.2% reduction in energy consumption, respectively.

A High-Performance Symmetric Hybrid Form Design for High-Order FIR Filters

Jinghao Ye, Masao Yanagisawa, Youhua Shi

Proceedings of 2020 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2020 121 - 124 2020.12

In this paper, a symmetric hybrid form for high performance finite impulse response (FIR) filters with symmetric coefficients is proposed, which can be utilized in both fixed and reconfigurable FIR implementations to solve the driving capacity problem caused by the high fanout signals in the existing symmetric transposed form based FIR architecture. The evaluation results show that, when compared with the existing high speed FIR designs such as the symmetric systolic form in [13] and the hybrid form in [1], the proposed form can achieve significant area and power savings with great ADP and PDP reduction. Moreover, when compared with the symmetric systolic form in [13] the required latency can be approximately reduced by 33.3%, which clearly shows the performance improvement of the proposed method.

Citations (Scopus): 1
Faithfully truncated adder-based area-power efficient FIR design with predefined output accuracy

Jinghao Ye, Masao Yanagisawa, Youhua Shi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E103A ( 9 ) 1063 - 1070 2020.09

To solve the area and power problems in Finite Impulse Response (FIR) implementations, a faithfully truncated adder-based FIR design is presented in this paper for significant area and power savings while the predefined output accuracy can still be obtained. As a solution to the accuracy loss caused by truncated adders, a static error analysis on the utilization of truncated adders in FIRs was performed. According to the mathematical analysis, we show that, with a given accuracy constraint, the optimal truncated adder configuration for an area-power efficient FIR design can be effortlessly determined. Evaluation results on various FIR implementations using the proposed faithfully truncated adder designs showed that up to 35.4% and 27.9% savings in area and power consumption can be achieved with less than 1 ulp accuracy loss for uniformly distributed random inputs. Moreover, as a case study for normally distributed signals, a fixed 6-tap FIR for electrocardiogram (ECG) signal filtering was implemented, in which even with the number of truncated bits increased up to 10, the mean absolute error (E) can be guaranteed to be less than 1 ulp while up to 29.7% and 25.3% savings in area and power can be obtained.

Citations (Scopus): 1
A Power-Efficient Soft Error Hardened Latch Design with In-Situ Error Detection Capability

Saki Tajima, Masao Yanagisawa, Youhua Shi

Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics 2019-November 53 - 56 2019.11

A power-efficient single node upset hardened latch design with in-situ error detection capability, EDSL, is proposed in this paper for reliability improvement against soft errors. Our simulation results show that the proposed EDSL design can not only recover from any incurred single node upset, but also provide in-situ error detection capability when the latch output is upset. When compared with state-of-the-art error-detection-based and SNU resilient designs, the proposed EDSL latch can achieve up to 72.25% and 79.74% reduction of power-delay-product respectively, which clearly shows the effectiveness of the proposed method.

A Zero-Gating Processing Element Design for Low-Power Deep Convolutional Neural Networks

Lin Ye, Jinghao Ye, Masao Yanagisawa, Youhua Shi

Proceedings - APCCAS 2019: 2019 IEEE Asia Pacific Conference on Circuits and Systems: Innovative CAS Towards Sustainable Energy and Technology Disruption 317 - 320 2019.11

Convolution neural networks (CNNs) have shown great success in many areas such as object detection and pattern recognition. However, the high computational complexity of state-of-the-art deep CNNs makes them extremely difficult to run on resource-constrained mobile and wearable devices. To address this design challenge, in this paper we first analyzed the filter weights of pre-trained models from four state-of-the-art CNNs. We found that in all the CNNs that we analyzed, from about 20% (AlexNet) to 43% (VGG-19) of the weights are zeros, which leads to large amounts of redundant computation. Then, based on this observation, a zero-gating processing element (PE) design was proposed for low-power deep CNNs, in which the vast number of zeros in both activation maps and filter weights is exploited to eliminate redundant computation for power reduction. We implemented our proposal with VGG-16 using the ImageNet dataset. Experiments were conducted to evaluate area and total power consumption. Compared with the baseline PE design without zero-gating, the proposed zero-gating PE can achieve an overall 37% power saving while the corresponding area overhead is less than 8%.

Citations (Scopus): 5
A Bit-Segmented Adder Chain based Symmetric Transpose Two-Block FIR Design for High-Speed Signal Processing

Jinghao Ye, Masao Yanagisawa, Youhua Shi

Proceedings - APCCAS 2019: 2019 IEEE Asia Pacific Conference on Circuits and Systems: Innovative CAS Towards Sustainable Energy and Technology Disruption 29 - 32 2019.11

A high-speed FIR filter structure is proposed in this paper by utilizing bit-segmentation adders and a symmetric transpose 2-block FIR structure. First, a bit-segmented adder chain-based design is proposed with bit-segmentation adders. Second, a basic unit design of symmetric transpose block FIR is proposed to reduce the critical path delay. The evaluation results show that, when compared with a state-of-the-art high-speed CSD multiplier-based FIR filter design, the proposed design requires 14.1% less area while providing 7.9% frequency improvement, 10.2% reduction of power consumption, 22.8% reduction of energy-delay-product and 20.4% reduction of area-delay-product, which shows the effectiveness of the proposed method.

Citations (Scopus): 1
An adder-segmentation-based FIR for high speed signal processing

Jinghao Ye, Masao Yanagisawa, Youhua Shi

Proceedings of International Conference on ASIC 2019.10

An advanced adder-segmentation-based FIR filter design for high speed signal processing is proposed in this paper. In the proposed method, the critical path delay is shortened through adder segmentation. An analysis for the optimization of adder segmentation is also proposed, which can be used for critical path delay balance to maximize the performance of FIR filters. The evaluation results show that the proposed design can achieve up to 30.7% and 22.8% reduction in area-delay-product (ADP) and energy-delay-product (EDP) when compared with the existing FIR filters.

Citations (Scopus): 1
Static error analysis and optimization of faithfully truncated adders for area-power efficient FIR designs

Jinghao Ye, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

Proceedings - IEEE International Symposium on Circuits and Systems 2019-May 2019

Faithfully truncated adders are used for low cost FIR implementations in this paper, which improves state-of-the-art CSD-based FIR filter designs for further area and power reduction while meeting the accuracy requirement. As a solution to the accuracy loss caused by truncated adders, this paper performed a static error analysis of truncated adders. Furthermore, based upon our mathematical analysis, we show that, with a given accuracy constraint, an optimal truncated adder configuration can be effortlessly determined for area-power efficient FIR designs. Evaluation results on various FIR designs showed that 16.8%~35.4% reduction in area and 11.8%~27.9% in power saving can be achieved with the proposed optimal truncated adder designs within an average error of 1 ulp.

Citations (Scopus): 6
Hardware Trojan Detection Utilizing Machine Learning Approaches

Kento Hasegawa, Youhua Shi, Nozomu Togawa

Proceedings - 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications and 12th IEEE International Conference on Big Data Science and Engineering, Trustcom/BigDataSE 2018 1891 - 1896 2018.09

Hardware security has become a serious concern in recent years. Due to outsourcing in hardware production, malicious circuits (or hardware Trojans) can be easily inserted into hardware products by attackers. Since hardware Trojans are tiny and stealthy, their detection is difficult. Under these circumstances, numerous hardware-Trojan detection methods have been proposed. In this paper, we present an overview of hardware-Trojan detection and review the hardware-Trojan detection methods using machine learning, which is one of the state-of-the-art approaches.

Citations (Scopus): 42
Extension and performance/accuracy formulation for optimal GeAr-based approximate adder designs

Ken Hayamizu, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E101A ( 7 ) 1014 - 1024 2018.07

Approximate computing is a promising solution for future energy-efficient designs because it can provide great improvements in performance, area and/or energy consumption over traditional exact-computing designs for non-critical error-tolerant applications. However, the most challenging issue in designing approximate circuits is how to guarantee the pre-specified computation accuracy while achieving energy reduction and performance improvement. To address this problem, this paper starts from the state-of-the-art general approximate adder model (GeAr) and extends it to cover more possible approximate design candidates by relaxing the design restrictions. Then, a maximum-error-distance-based performance/accuracy formulation, which can be used to select the performance/energy-accuracy optimal design from the extended design space, is proposed. Our evaluation results show the effectiveness of the proposed method in terms of area overhead, performance, energy consumption, and computation accuracy.

Citations (Scopus): 2
A low power soft error hardened latch with schmitt-trigger-based C-Element

Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E101A ( 7 ) 1025 - 1034 2018.07

To deal with the reliability issue caused by soft errors, this paper proposed a low power soft error hardened latch (SHC) design using a novel Schmitt-Trigger-based C-element for reliable low power applications. Unlike state-of-the-art soft error tolerant latches that are usually based on hardware redundancy with large area overhead and high power consumption, the proposed SHC latch is implemented through double-sampling and node-checking using a novel Schmitt-Trigger-based C-element, which can help to reduce the area overhead and the corresponding power consumption as well. The evaluation results show that the total number of transistors of the proposed SHC latch is only increased by 2 when compared to the conventional unhardened C2MOS latch, while up to 20.35% and 82.96% power reduction can be achieved when compared to the conventional un-hardened C2MOS latch and the existing soft error tolerant HiPeR design, respectively.

Citations (Scopus): 7
A low cost and high speed CSD-based symmetric transpose block FIR implementation

Jinghao Ye, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

Proceedings of International Conference on ASIC 2017-October 311 - 314 2017.07

In this paper, a low cost and high speed CSD-based symmetric transpose block FIR design was proposed for low cost digital signal processing. First, the existing area-efficient CSD-based multiplier was optimized by considering the reusability and the symmetry of coefficients for area reduction. Second, the position of the input register was changed for high speed transpose block FIR processing in which half of the number of required multipliers can be saved. When compared with the existing block FIR designs, the proposed FIR design can increase the data rate from 238.66 MHz to 373.13 MHz while saving 10.89% area and 21.30% energy consumption as well.

Citations (Scopus): 8
Soft error tolerant latch designs with low power consumption (invited paper)

Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

Proceedings of International Conference on ASIC 2017-October 52 - 55 2017.07

As semiconductor technology continues scaling down, the reliability issue has become much more critical than ever before. Unlike traditional hard-errors caused by permanent physical damage which can't be recovered in field, soft errors are caused by radiation or voltage/current fluctuations that lead to transient changes on internal node states, thus they can be viewed as temporary errors. However, due to the unpredictable occurrence of soft errors, it is desirable to develop soft error tolerant designs. For this reason, soft error tolerant design techniques have gained great research interest. In this paper, we will explain the soft error mechanism and then review the existing soft error tolerant design techniques with particular emphasis on SEH family because they can achieve low power consumption and small performance overhead as well.

Citations (Scopus): 2
Improved monitoring-path selection algorithm for suspicious timing error prediction based timing speculation

Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015 2016.07

As process technology scales down, timing speculation techniques such as Razor and STEP have emerged as alternative solutions to reduce the margins required due to various variation effects. Unlike Razor, STEP is a prediction-based timing speculation method that predicts suspicious timing errors before they actually appear, and thus it can result in more performance improvement. Therefore, an improved monitoring-path selection algorithm for STEP-based timing speculation is proposed in this paper, in which candidate monitoring paths are selected based on short path removal and path length estimation. Experimental results show that the proposed algorithm realizes an average of 1.71X overclocking compared with worst-case based designs.
A delay variation and floorplan aware high-level synthesis algorithm with body biasing

Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - International Symposium on Quality Electronic Design, ISQED 2016-May 75 - 80 2016.05

In this paper, we propose a delay variation and floorplan aware high-level synthesis algorithm with body biasing, which minimizes the average leakage energy of manufactured chips. To realize a floorplan-oriented high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture), one of the DR architectures. HDR architecture divides the chip area into small partitions called a huddle and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing expected leakage energy of the entire circuit but can increase the latency. We assign CDFG nodes in critical paths to the huddles with larger expected leakage energy and those in non-critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 38.9% without latency and yield degradation compared with typical-case design with body biasing.

Citations (Scopus): 1
In-situ Trojan authentication for invalidating hardware-Trojan functions

Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - International Symposium on Quality Electronic Design, ISQED 2016-May 152 - 157 2016.05

Because we do not know who will create hardware Trojans (HTs), or when and where they would be inserted, it is very difficult to correctly and completely detect all the real HTs in untrusted ICs, and thus it is desirable to incorporate in-situ HT invalidating functions into untrusted ICs as a countermeasure against HTs. This paper proposes an in-situ Trojan authentication technique for gate-level netlists to avoid security leakage. In the proposed approach, an untrusted IC operates in authentication mode and normal mode. In the authentication mode, an embedded Trojan authentication circuit monitors the bit-flipping count of a suspicious Trojan net within a pre-defined constant number of clock cycles and identifies whether it is a real Trojan or not. If the authentication condition is satisfied, the suspicious Trojan net is validated. Otherwise, it is invalidated and HT functions are masked. By doing this, even untrusted netlists with HTs can still be used in the normal mode without security leakage. By setting the appropriate authentication condition using training sets from Trust-HUB gate-level benchmarks, the proposed technique successfully invalidates only the HTs in the training sets. Furthermore, by embedding the in-situ Trojan authentication circuit into a Trojan-inserted AES crypto netlist, it can run securely and correctly even if HTs exist, with an area overhead of just 1.5% and no delay overhead.

Citations (Scopus): 6
Timing monitoring paths selection for wide voltage IC

Weiwei Shan, Wentao Dai, Youhua Shi, Peng Cao, Xiaoyan Xiang

IEICE Electronics Express 13 ( 8 ) 2016.03

　View Summary

Wide-voltage-range circuits have attracted widespread attention, and in-situ timing monitoring based adaptive voltage scaling (AVS) has become necessary to reduce the design margin. However, the severe PVT variations from near-threshold to super-threshold operation cause too many critical paths to be monitored. Here, an activation-oriented monitoring-path selection method is proposed to reduce the number of monitored paths for wide-voltage ICs. The minimum delay value of the longest activated path is found by dynamic timing analysis and set as the selection threshold. Paths whose delay under static timing analysis (STA) exceeds this threshold are selected for monitoring. Applied to a 40 nm AVS System-on-Chip, the method reduces the monitored paths to only 22% of all critical paths, with remarkable power gains over 0.6 V–1.1 V.
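
The selection rule in this summary can be sketched in a few lines. The data layout is an assumption: `activated_delays` maps each activated path to the delays observed for it in dynamic timing analysis, `sta_delays` maps every candidate path to its STA delay, and "longest activated path" is read as the activated path with the largest observed delay.

```python
def selection_threshold(activated_delays):
    """Minimum observed delay of the longest activated path."""
    longest = max(activated_delays, key=lambda p: max(activated_delays[p]))
    return min(activated_delays[longest])


def select_monitored_paths(sta_delays, activated_delays):
    """Keep only the paths whose STA delay exceeds the threshold."""
    threshold = selection_threshold(activated_delays)
    return sorted(p for p, d in sta_delays.items() if d > threshold)


# Hypothetical delays in nanoseconds, for illustration only.
activated = {"p1": [0.8, 0.9], "p2": [1.1, 1.3], "p3": [0.5]}
sta = {"p1": 1.0, "p2": 1.4, "p3": 0.7, "p4": 1.2}
print(select_monitored_paths(sta, activated))
```

Note that `p4` is selected although it was never activated in simulation: its STA delay is above the threshold, so it cannot be ruled out.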

DOI

Scopus

6

Citation

(Scopus)
A process-variation-aware multi-scenario high-level synthesis algorithm for distributed-register architectures

Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

International System on Chip Conference 2016-February 7 - 12 2016.02

　View Summary

In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a process-variation-aware low-latency and multi-scenario high-level synthesis algorithm targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

DOI

Scopus

2

Citation

(Scopus)
A hardware-trojans identifying method based on trojan net scoring at gate-level netlists

Masaru Oya, Youhua Shi, Noritaka Yamashita, Toshihiko Okamura, Yukiyasu Tsunoo, Satoshi Goto, Masao Yanagisawa, Nozomu Togawa

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E98A ( 12 ) 2537 - 2546 2015.12

　View Summary

Outsourcing IC design and fabrication is one of the effective solutions to reduce design cost, but it may cause severe security risks. In particular, malicious outside vendors may implement Hardware Trojans (HTs) on ICs. When we focus on the IC design phase, we cannot assume an HT-free netlist or a Golden netlist, and it is too difficult to identify whether a given netlist is HT-free or not. In this paper, we propose a score-based hardware-Trojan identifying method at gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist; instead, it detects a net included in HTs, which is called a Trojan net. First, we observe Trojan nets from several HT-inserted benchmarks and extract several of their features. Second, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks and ISCAS85 benchmarks as well as HT-free and HT-inserted AES gate-level netlists. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as "HT-inserted" and all the HT-free gate-level benchmarks as "HT-free" in approximately three hours for each benchmark.
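
The scoring scheme outlined in this summary (score each net by the Trojan-net features it exhibits, then compare against a threshold) might look roughly like the following. The feature names and score values are hypothetical placeholders, not the features used in the paper, and classifying by the maximum net score is an assumed reading of the threshold step.

```python
# Hypothetical feature scores; the paper's actual features are not listed here.
FEATURE_SCORES = {
    "rare_toggle": 3,
    "large_fanin_cone": 2,
    "near_primary_input": 1,
}


def net_score(features):
    """Sum the scores of the features observed on one net."""
    return sum(FEATURE_SCORES.get(f, 0) for f in features)


def classify_netlist(nets, threshold):
    """Label a netlist by comparing its highest net score to the threshold."""
    max_score = max((net_score(f) for f in nets.values()), default=0)
    return "HT-inserted" if max_score >= threshold else "HT-free"


clean = {"n1": ["near_primary_input"], "n2": []}
suspect = {"n1": [], "n7": ["rare_toggle", "large_fanin_cone"]}
print(classify_netlist(clean, threshold=4))
print(classify_netlist(suspect, threshold=4))
```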

DOI

Scopus

12

Citation

(Scopus)
A floorplan-aware high-level synthesis technique with delay-variation tolerance

Kazushi Kawamura, Yuta Hagio, Youhua Shi, Nozomu Togawa

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 122 - 125 2015.09

　View Summary

To realize a better trade-off between performance and yield rate in recent LSI designs, it is necessary to deal with the increasing ratio of interconnect delay as well as delay variation. In this paper, a novel floorplan-aware high-level synthesis technique with delay-variation tolerance is proposed. By utilizing floorplan-driven architectures, interconnect delays can be estimated and then handled even in high-level synthesis. Applying our technique makes it possible to realize two scheduling/binding results (one is a non-delayed result and the other is a delayed result) simultaneously on a chip with small area/performance overhead, and either one of them can be selected according to the post-silicon delay variation. Experimental results demonstrate that our technique can reduce delayed scheduling/binding latency by up to 32.3% compared with conventional approaches.

DOI

Scopus

2

Citation

(Scopus)
A universal delay line circuit for variation resilient IC with self-calibrated time-to-digital converter

Shuai Shao, Youhua Shi, Wentao Dai, Jianyi Meng, Weiwei Shan

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 126 - 129 2015.09

　View Summary

A universal delay monitor used to imitate the real critical paths is developed for variation-resilient integrated circuits. This monitor is constructed based on different proportions of logic cells and interconnects. The delay of the monitor is detected by a time-to-digital converter, which keeps the sampling results precise. To reduce the deviation of the sampling results caused by PVT, a novel time-to-digital converter with a self-calibration mechanism is developed. Adaptive voltage scaling based on this variation-resilient method is applied to an ARM7-based System on a Chip in a 0.18 μm CMOS process with a 112 MHz signoff frequency and an area of 1.3 × 1.3 mm². The simulation results show that it has a 43.42% gain in power consumption under the FF corner at -25°C compared to the fixed 1.8 V traditional design.

DOI

Scopus
FPGA-based SHA-3 acceleration on a 32-bit processor via instruction set extension

Yi Wang, Youhua Shi, Chao Wang, Yajun Ha

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 305 - 308 2015.09

　View Summary

As embedded systems play more and more important roles in the Internet of Things (IoT), the integration of cryptographic functionalities is urgently needed to ensure data and information security. Recently, Keccak was declared the winner of the third generation of the Secure Hashing Algorithm (SHA-3). However, implementing SHA-3 on a specific 32-bit processor failed to meet the performance requirement. On the other hand, implementing it as a cryptographic coprocessor consumes a lot of extra area and requires a customized driver program. Although implementing Keccak on a 64-bit platform is more efficient, this platform is not suitable for embedded implementation. In this paper, we propose a novel SHA-3 implementation using instruction set extension based on a 32-bit LEON3 processor (an open source processor), with the goals of reducing execution cycles and code size. Experimental results show that the proposed design reduces execution cycles by around 87% and code size by 10.5% compared to reference designs. Our design takes up only 9.44% extra area with negligible speed overhead compared to the standard LEON3 processor. Compared to the existing hardware accelerators, our proposed design occupies only half of the area resources and does not require extra driver programs to be developed when integrated into the overall system.

DOI

Scopus

13

Citation

(Scopus)
Scan-based side-channel attack against symmetric key ciphers using scan signatures

Mika Fujishiro, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015 309 - 312 2015.09

　View Summary

There are a number of studies on a side-channel attack which uses information exploited from the physical implementation of a cryptosystem. A scan-based side-channel attack utilizes scan chains, one of design-for-test techniques and retrieves the secret information inside the cryptosystem. In this paper, scan-based side-channel attack methods against symmetric key ciphers such as block ciphers and stream ciphers using scan signatures are presented to show the risk of scan-based attacks.

DOI

Scopus

1

Citation

(Scopus)
An energy-efficient floorplan driven high-level synthesis algorithm for multiple clock domains design

Shin Ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E98A ( 7 ) 1376 - 1391 2015.07

　View Summary

In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing configuration and placement of huddles. Experimental results show that the proposed method achieves 32.5% energy-saving compared with the existing single clock domain based methods.

DOI

Scopus

1

Citation

(Scopus)
An effective suspicious timing-error prediction circuit insertion algorithm minimizing area overhead

Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E98A ( 7 ) 1406 - 1418 2015.07

　View Summary

As process technologies advance, timing-error correction techniques have become important as well. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points, of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) in the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus require too much area overhead. How to determine these check points is very important. In this paper, we propose an effective STEPC insertion algorithm minimizing area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize inserted STEPC counts. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.
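
To illustrate how a max-flow and min-cut formulation can minimize the number of check points, the sketch below computes a minimum vertex cut between primary inputs and primary outputs of a small graph. It is not the paper's algorithm: the node-splitting construction, the unit cost per node, and the function name `min_vertex_cut` are assumptions, and timing constraints on where a STEPC may sit are ignored.

```python
from collections import deque


def min_vertex_cut(edges, sources, sinks):
    """Smallest set of internal nodes that every source-to-sink path crosses.

    Each internal node v is split into (v, "in") -> (v, "out") with capacity 1,
    all other edges are unbounded, and max flow is found by BFS augmentation
    (Edmonds-Karp). Assumes every path contains at least one internal node.
    """
    inf = float("inf")
    cap = {}

    def add(u, v, c):
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0) + c
        cap.setdefault(v, {}).setdefault(u, 0)  # residual (reverse) edge

    nodes = {n for e in edges for n in e}
    for n in nodes:
        # Sources and sinks cannot host a check point in this sketch.
        add((n, "in"), (n, "out"), inf if n in sources or n in sinks else 1)
    for u, v in edges:
        add((u, "out"), (v, "in"), inf)
    for s in sources:
        add("S", (s, "in"), inf)
    for t in sinks:
        add((t, "out"), "T", inf)

    def bfs():
        parent = {"S": None}
        queue = deque(["S"])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if "T" not in parent:
            break
        # Walk back from the sink, find the bottleneck, and push flow.
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow

    reachable = set(parent)  # source side of the residual graph
    return sorted(n for n in nodes
                  if (n, "in") in reachable and (n, "out") not in reachable)


edges = [("a", "m"), ("b", "m"), ("m", "x"), ("m", "y"), ("x", "o1"), ("y", "o2")]
print(min_vertex_cut(edges, {"a", "b"}, {"o1", "o2"}))
```

In the example, a naive placement on `x` and `y` needs two check points, while the cut moves the position upstream to the shared node `m` and needs one.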

DOI

Scopus

3

Citation

(Scopus)
A score-based classification method for identifying Hardware-Trojans at gate-level netlists

Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - Design, Automation and Test in Europe, DATE 2015-April 465 - 470 2015.04

　View Summary

Recently, digital ICs are often designed by outside vendors to reduce design costs in the semiconductor industry, which may introduce severe risks that malicious attackers implement Hardware Trojans (HTs) on them. Since the IC design phase generates only a single design result, an RT-level or gate-level netlist for example, we cannot assume an HT-free netlist or a Golden netlist, and it is then too difficult to identify whether a generated netlist is HT-free or HT-inserted. In this paper, we propose a score-based classification method for identifying HT-free or HT-inserted gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist; instead, it detects a net included in HTs, which is called a Trojan net. First, we observe Trojan nets from several HT-inserted benchmarks and extract several of their features. Second, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as 'HT-inserted' and all the HT-free gate-level benchmarks as 'HT-free' in approximately three hours for each benchmark.

DOI

Scopus

121

Citation

(Scopus)
Secure scan design using improved random order and its evaluations

Masaru Oya, Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 2015-February ( February ) 555 - 558 2015.02

　View Summary

Scan test using scan chains is one of the most important DFT techniques. However, scan-based attacks are reported which can retrieve the secret key in crypto circuits by using scan chains. Secure scan architecture is strongly required to protect scan chains from scan-based attacks. This paper proposes an improved version of random order as a secure scan architecture. In improved random order, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a subchain to scan out. Testability and security of the proposed improved random order are also discussed in the paper, and the implementation results demonstrate the effectiveness of the proposed method.

DOI

Scopus

8

Citation

(Scopus)
In-situ timing monitoring methods for variation-resilient designs

Youhua Shi, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 2015-February ( February ) 735 - 738 2015.02

　View Summary

With technology scaling, process, voltage, and temperature (PVT) variations pose great challenges on integrated circuit designs. Conventionally, LSI circuits are designed by adding pessimistic timing margin to guarantee 'always correct' operations even under worst-case conditions. However, due to the increasing PVT variations, unacceptable larger design guard band should be reserved to avoid timing errors on critical paths of circuits, which will therefore lead to very inefficient designs in terms of power and performance. For this reason, in-situ timing monitoring technique has gained great research interest. In this paper, we will review existing variation-resilient design techniques with particular emphasis on in-situ timing monitoring techniques including both detection and prediction-based methods. The effectiveness of in-situ timing monitoring techniques will be discussed. Finally, we show an example of in-situ timing monitoring technique called STEP with applications to general pipeline designs.

DOI

Scopus
An area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction

Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 2015-February ( February ) 300 - 303 2015.02

　View Summary

As process technologies advance, the importance of timing error correction techniques is increasing as well. In this paper, we propose an area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction circuits (STEPCs). A STEPC predicts timing errors by monitoring the middle points of several speed-paths in a circuit. However, we need many STEPCs with a high area overhead to predict timing errors in an overall circuit. Our proposed method moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs. Our proposed algorithm reduces the required number of STEPCs to 1/19 and their area to 1/5 compared with a naive algorithm. Furthermore, our algorithm realizes 2.25X overclocking compared with just inserting STEPCs into several speed-paths.

DOI

Scopus

1

Citation

(Scopus)
Throughput driven check point selection in suspicious timing error prediction based designs

Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

2014 IEEE 5th Latin American Symposium on Circuits and Systems, LASCAS 2014 - Conference Proceedings 2014

　View Summary

In this paper, a throughput-driven design technique is proposed, in which a suspicious timing error prediction circuit is inserted to monitor the signal transitions at some selected check points. Unlike previous works where timing errors are detected after their occurrence, the proposed method tries to use the real intermediate signal transitions for timing error prediction. The check point selection affects both the maximal operation frequency and the suspicious timing error overestimation rate, both of which have an effect on the overall throughput, so an analysis of the check point selection is also given. In our work, the circuit can be overclocked by a factor of 2 or more with negligible area overhead while guaranteeing always-correct output. © 2014 IEEE.

DOI

Scopus
Floorplan driven architecture and high-level synthesis algorithm for dynamic multiple supply voltages

Shin Ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E96-A ( 12 ) 2597 - 2611 2013.12

　View Summary

In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms. Copyright © 2013 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus

2

Citation

(Scopus)
Secure scan design with dynamically configurable connection

Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 256 - 262 2013

　View Summary

Scan test is a powerful test technique which can control and observe the internal states of the circuit under test through scan chains. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Therefore, new secure test methods are required to satisfy both testability and security requirements. In this paper, a secure scan design is proposed to achieve adequate security as a countermeasure against scan-based attacks, while still maintaining high testability like normal scan testing. In our method, the internal scan chain is divided into several sub-chains, and the connection order of the sub-chains can be dynamically changed. In addition, a method for deciding the connection order of those sub-chains so that it cannot be identified by an attacker is also proposed in this paper. The proposed method is implemented on an AES circuit to show its effectiveness, and a security analysis is also given to show how the proposed approach can be used as a countermeasure against known scan-based attacks. © 2013 IEEE.
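
The effect of a dynamically changeable sub-chain order can be shown with a toy model. This is a sketch under simple assumptions: the chain length is divisible by the number of sub-chains, the connection order is given as a permutation of sub-chain indices, and the helper names are invented for illustration. How the order is generated and kept secret is outside this model.

```python
def split_chain(chain, num_sub):
    """Split a scan chain into equal-length sub-chains (length must divide evenly)."""
    size = len(chain) // num_sub
    return [chain[i * size:(i + 1) * size] for i in range(num_sub)]


def scan_out(chain, num_sub, order):
    """Data seen at scan-out when sub-chains are connected in `order`."""
    subs = split_chain(chain, num_sub)
    return [bit for idx in order for bit in subs[idx]]


def restore(observed, num_sub, order):
    """A tester who knows `order` can put the observed data back in place."""
    subs = split_chain(observed, num_sub)
    original = [None] * num_sub
    for position, idx in enumerate(order):
        original[idx] = subs[position]
    return [bit for sub in original for bit in sub]


state = [1, 2, 3, 4, 5, 6]          # labels stand in for flip-flop contents
observed = scan_out(state, 3, (2, 0, 1))
print(observed)                      # [5, 6, 1, 2, 3, 4]
print(restore(observed, 3, (2, 0, 1)))
```

Testability is preserved for anyone who knows the current order, while an attacker who sees only `observed` cannot directly map scanned-out bits to register positions.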

DOI

Scopus

35

Citation

(Scopus)
Suspicious timing error prediction with in-cycle clock gating

Youhua Shi, Hiroaki Igarashi, Nozomu Togawa, Masao Yanagisawa

Proceedings - International Symposium on Quality Electronic Design, ISQED 335 - 340 2013

　View Summary

Conventionally, circuits are designed with pessimistic timing margins to solve delay variation problems, which guarantees 'always correct' operations. However, because such a worst-case condition occurs rarely, the traditional pessimistic design method is becoming one of the main obstacles for designers to achieve higher performance and/or ultra-low power consumption. By monitoring timing error occurrence during circuit operation, adaptive timing error detection and recovery methods have gained wide interest recently as a promising solution. As an extension of existing research, in this paper, we propose a suspicious timing error prediction method for performance or energy efficiency improvement in pipeline designs. Experimental results show that, when compared with typical margin designs, the proposed method can 1) achieve up to 1.41X throughput improvement with in-situ timing error prediction ability; and 2) allow the design to be overclocked by up to 1.88X with 'always correct' outputs. © 2013 IEEE.

DOI

Scopus

17

Citation

(Scopus)
An energy-efficient high-level synthesis algorithm incorporating interconnection delays and dynamic multiple supply voltages

Shin Ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

2013 International Symposium on VLSI Design, Automation, and Test, VLSI-DAT 2013 2013

　View Summary

In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture) that integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis. Next, we propose a high-level synthesis algorithm for AVHDR architectures. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which abstracts modules placed close to each other, are naturally generated using floorplanning. Low-supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Experimental results show that our algorithm achieves 50% energy-saving compared with conventional algorithms. © 2013 IEEE.

DOI

Scopus
Concurrent faulty clock detection for crypto circuits against clock glitch based DFA

Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - IEEE International Symposium on Circuits and Systems 1432 - 1435 2013

　View Summary

In this paper, a concurrent faulty clock detection method is proposed for crypto circuits against clock glitch based differential fault analysis (DFA). In the proposed method, a non-logic buffer-based delay chain is inserted, and then by monitoring the delay along the delay chain, a possible clock glitch based DFA can be detected. Experimental results on an AES circuit show that the proposed method can successfully detect clock glitch based attacks, and the required area overhead is only 0.47%, which is much smaller than in previous works. © 2013 IEEE.
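
A behavioural reading of the delay-chain idea: if a clock edge arrives before a signal launched at the previous edge could have travelled down the chain, the clock period was shortened. The sketch below models only that comparison. The function name, the use of integer time units, and the reduction of the chain to a single `chain_delay` value are assumptions; the actual circuit is not described at this level in the summary.

```python
def detect_glitches(edge_times, chain_delay):
    """Return the indices of clock edges that arrive sooner than `chain_delay`
    after the previous edge, i.e. suspected glitch edges."""
    return [
        i
        for i in range(1, len(edge_times))
        if edge_times[i] - edge_times[i - 1] < chain_delay
    ]


# Nominal period of 10 time units with one injected early edge at t = 23.
edges = [0, 10, 20, 23, 33]
print(detect_glitches(edges, chain_delay=8))
```

Choosing `chain_delay` slightly below the nominal period lets normal cycles pass while flagging any cycle short enough to cause a setup violation in the protected logic.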

DOI

Scopus

19

Citation

(Scopus)
Scan-based attack on AES through round registers and its countermeasure

Youhua Shi, Nozomu Togawa, Masao Yanagisawa

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E95-A ( 12 ) 2338 - 2346 2012.12

　View Summary

Scan-based side channel attacks on hardware implementations of cryptographic algorithms have been shown to be a serious security threat. Unlike existing scan-based attacks, in our work we observed that, in addition to the secret-related registers, some non-secret registers also carry the potential of being misused to help a hacker retrieve secret keys. In this paper, we first present a scan-based side channel attack method on AES that makes use of the round counter registers, which received no attention in previous works, to show the potential security threat in designs with scan chains. We then discuss the requirements for secure DFT and propose a secure scan scheme for crypto hardware implementations that preserves all the advantages and simplicity of traditional scan test while significantly improving security with negligible design overhead. Copyright © 2012 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus

1

Citation

(Scopus)
MH⁴: Multiple-supply-voltages aware high-level synthesis for high-integrated and high-frequency circuits for HDR architectures

Shin Ya Abe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEICE Electronics Express 9 ( 17 ) 1414 - 1422 2012

　View Summary

In this paper, we propose a multiple-supply-voltages aware high-level synthesis algorithm for HDR architectures which realizes high-speed and high-efficient circuits. We propose three new techniques: virtual area estimation, virtual area adaptation, and floorplanning-directed huddling, and integrate them into our HDR architecture synthesis algorithm. Virtual area estimation/adaptation effectively estimates a huddle area by gradually reducing it during iterations, which improves the convergence of our algorithm. Floorplanning-directed huddling determines huddle composition very effectively by performing floorplanning and functional unit assignment inside huddles simultaneously. Experimental results show that our algorithm achieves about 29% run-time-saving compared with the conventional algorithms, and obtains a solution which cannot be obtained by our original algorithm even if a very tight clock constraint is given. © IEICE 2012.

DOI

Scopus

14

Citation

(Scopus)
Dynamically changeable secure scan architecture against scan-based side channel attack

Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

ISOCC 2012 - 2012 International SoC Design Conference 155 - 158 2012

　View Summary

Scan test, one of the most useful design-for-testability techniques, is effective for LSIs including cryptographic circuits. It can observe and control the internal states of the circuit under test by using scan chains. However, scan chains present a significant security risk of information leakage through scan-based attacks, which retrieve the secret keys of cryptographic LSIs. In this paper, a secure scan architecture against scan-based attacks that still has high testability is proposed. In our method, scan data is dynamically changed by adding a latch to any FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, which demonstrates the effectiveness of the proposed method. © 2012 IEEE.

DOI

Scopus

42

Citation

(Scopus)
State dependent scan flip-flop with key-based configuration against scan-based side channel attack on RSA circuit

Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 607 - 610 2012

　View Summary

Scan test is one of the most useful design-for-testability techniques and can detect circuit failures efficiently. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Testability and security therefore conflict with each other, and there is a need for an efficient design-for-testability circuit that satisfies both testability and security requirements. In this paper, a secure scan architecture against scan-based attacks is proposed to achieve high security without compromising testability. In our method, the scan structure is dynamically changed by adding a latch to any FFs in the scan chain. We analyze an RSA circuit implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks. © 2012 IEEE.

DOI

Scopus

19

Citation

(Scopus)
State-dependent changeable scan architecture against scan-based side channel attacks

Ryuta Nara, Hiroshi Atobe, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

ISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems 1867 - 1870 2010

　View Summary

Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan path would be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique by inserting inverters into the internal scan path to complicate the scan structure has been recently presented. Unfortunately, it still carries the potential of being attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose secure scan architecture, called dynamic variable secure scan, against scan-based side channel attack. The modified scan flip-flops are state-dependent, which could cause the output of each State-dependent Scan FF to be inverted or not so as to make it more difficult to discover the internal scan architecture. ©2010 IEEE.

DOI

Scopus

10

Citation

(Scopus)
VLSI implementation of a fast intra prediction algorithm for H.264/AVC encoding

Youhua Shi, Kenta Tokumitsu, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 1139 - 1142 2010

　View Summary

Intra-frame coding is one of the most important technologies in H.264/AVC, which made significant contributions to the enhancement of coding efficiency of H.264/AVC at the cost of computation complexity. To address this problem, in this paper we present an efficient VLSI implementation of a computation efficient intra prediction algorithm for H.264/AVC encoding. Unlike most of existing fast intra-mode selection techniques, in the proposed method the directional differences are computed using a few selected original pixels to obtain the candidate modes with the minimal direction cost. The proposed method is hardware-friendly and provides more processing parallelism for H.264 intra-frame encoding with less overhead and less power consumption, which is expected to be utilized as a favourable accelerator hardware module in a real-time HDTV (1920×1080p) H.264 encoder. © 2010 IEEE.

DOI

Scopus

2

Citation

(Scopus)
Design-for-secure-test for crypto cores

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings - International Test Conference 2009.12

　View Summary

Scan technology carries the potential of being misused as a "side channel" to leak out the secret information of crypto cores. To address such a design challenge, this paper proposes a design-for-secure-test (DFST) solution for crypto cores by adding a stimuli-launched flip-flop into the traditional scan flip-flop to maintain the high test quality without compromising the security. © 2009 IEEE.

DOI

Scopus

7

Citation

(Scopus)
X-handling for current X-tolerant compactors with more unknowns and maximal compaction

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E92-A ( 12 ) 3119 - 3127 2009.12

　View Summary

This paper presents a novel X-handling technique, which removes the effect of unknowns on compacted test response with maximal compaction ratio. The proposed method combines with the current X-tolerant compactors and inserts masking cells on scan paths to selectively mask X's. By doing this, the number of unknown responses in each scan-out cycle could be reduced to a reasonable level such that the target X-tolerant compactor would tolerate with guaranteed possible error detection. It guarantees no test loss due to the effect of X's, and achieves the maximal compaction that the target response compactor could provide as well. Moreover, because the masking cells are only inserted on the scan paths, it has no performance degradation of the designs. Experimental results demonstrate the effectiveness of the proposed method. Copyright © 2009 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus
Unified dual-radix architecture for scalable Montgomery multiplications in GF(P) and GF(2ⁿ)

Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E92-A ( 9 ) 2304 - 2317 2009.09

　View Summary

Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. A Montgomery multiplier is commonly used to compute the modular multiplications and requires scalability because the bit length of operands varies depending on the security level. In addition, ECC is performed in GF(P) or GF(2ⁿ), and a unified architecture for multipliers in GF(P) and GF(2ⁿ) is required. However, in previous works, changing the frequency is necessary to deal with the delay-time difference between GF(P) and GF(2ⁿ) multipliers because the critical path of the GF(P) multiplier is longer. This paper proposes a unified dual-radix architecture for scalable Montgomery multiplications in GF(P) and GF(2ⁿ). The proposed architecture unifies four parallel radix-2¹⁶ multipliers in GF(P) and a radix-2⁶⁴ multiplier in GF(2ⁿ) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.28 μs. The implementation result shows that the area of the proposal is almost the same as that of previous works: 39 kgates. Copyright © 2009 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus
A secure test technique for pipelined advanced encryption standard

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Information and Systems E91-D ( 3 ) 776 - 780 2008.03

　View Summary

In this paper, we presented a Design-for-Secure-Test (DFST) technique for pipelined AES to guarantee both the security and the test quality during testing. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits. Copyright © 2008 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus

3

Citation

(Scopus)
A unified test compression technique for scan stimulus and unknown masking data with no test loss

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E91-A ( 12 ) 3514 - 3523 2008

　View Summary

This paper presents a unified test compression technique for scan stimulus and unknown masking data with seamless integration of test generation, test compression and all unknown response masking for high quality manufacturing test cost reduction. Unlike prior test compression methods, the proposed approach considers the unknown responses during the test pattern generation procedure, and then selectively encodes the less specified bits (either 1s or 0s) in each scan slice for compression while at the same time masking the unknown responses before sending them to the response compactor. The proposed test scheme could dramatically reduce test data volume as well as the number of required test channels by using only c tester channels to drive N internal scan chains, where c = ⌈log₂ N⌉ + 2. In addition, because all the unknown responses could be exactly masked before entering the response compactor, test loss due to unknown responses would be eliminated. Experimental results on both benchmark circuits and larger designs indicated the effectiveness of the proposed technique. Copyright © 2008 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus
Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2ⁿ)

Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 697 - 702 2008

　View Summary

Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability since the bit length of operands varies depending on the security level. Also, ECC is performed in GF(P) or GF(2ⁿ), and unified architectures for GF(P) and GF(2ⁿ) multipliers are needed. However, in previous works, changing frequency or dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2ⁿ) circuits of the multiplier because the critical path of the GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2ⁿ). The proposed architecture unifies 4 parallel radix-2¹⁶ multipliers in GF(P) and a radix-2⁶⁴ multiplier in GF(2ⁿ) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.23 μs. © 2008 IEEE.

DOI

Scopus

4

Citation

(Scopus)
GECOM: Test data compression combined with all unknown response masking

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 577 - 582 2008

　View Summary

This paper introduces GECOM technology, a novel test compression method with seamless integration of test GEneration, test COmpression (i.e. integrated compression on scan stimulus and masking bits) and all unknown scan responses Masking for manufacturing test cost reduction. Unlike most prior methods, the proposed method considers the unknown responses during the ATPG procedure and selectively encodes the specified 1 or 0 bits (either 1s or 0s) in scan slices for compression while at the same time masking the unknown responses before sending them to the response compactor. The proposed GECOM technology consists of the GECOM architecture and the GECOM ATPG technique. In the GECOM architecture, for a circuit with N internal scan chains, only c tester channels, where c = ⌈log₂ N⌉ + 2, are required. GECOM ATPG generates test patterns for the GECOM architecture so that not only can the scan inputs be efficiently compressed but all the unknown responses are also masked. Experimental results on both benchmark circuits and real industrial designs indicated the effectiveness of the proposed GECOM technique. © 2008 IEEE.

DOI

Scopus

5

Citation

(Scopus)
Unknown response masking with minimized observable response loss and mask data

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS 1779 - 1781 2008

　View Summary

This paper presents a new unknown response masking technique to minimize the effect on test loss due to over-masking. Unlike previous works where the scan responses are masked before entering the response compactor, the proposed method could mask the Xs when they are transformed on the scan path. Meanwhile, the masking cells are inserted along the scan paths, thus they would have no degradation on the performance of the designs. In addition, the test data required to mask unknown responses is only one bit for each test pattern. Experimental results show the effectiveness of the proposed method. © 2008 IEEE.

DOI

Scopus
Design for secure test - A case study on pipelined advanced encryption standard

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings - IEEE International Symposium on Circuits and Systems 149 - 152 2007

　View Summary

Cryptography plays an important role in the security of data transmission. To ensure the correctness of crypto hardware, we should conduct testing at fabrication and in the field. However, state-of-the-art scan-based test techniques, to achieve high test quality, need to increase the testability of the circuit under test, which carries the potential of being misused to reveal the secret information of the crypto hardware. Thus, developing efficient test strategies for crypto hardware that achieve high test quality without compromising security becomes an important task. In this paper we discuss the development of a Design-for-Secure-Test (DFST) technique for pipelined AES to overcome the above contradiction between security and test quality in testing crypto hardware. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits. © 2007 IEEE.

DOI

Scopus

3

Citation

(Scopus)
Selective low-care coding: A means for test data compression in circuits with multiple scan chains

Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E89-A ( 4 ) 996 - 1003 2006.04

　View Summary

This paper presents a test input data compression technique, Selective Low-Care Coding (SLC), which can be used to significantly reduce input test data volume as well as the external test channel requirement for multiscan-based designs. In the proposed SLC scheme, we explore the linear dependencies of the internal scan chains, and instead of encoding all the specified bits in test cubes, only a smaller number of specified bits are selected for encoding, so greater compression can be expected. Experiments on the larger benchmark circuits show that a drastic reduction in test data volume, with corresponding savings in test application time, can indeed be achieved even for a well-compacted test set. Copyright © 2006 The Institute of Electronics, Information and Communication Engineers.

DOI

Scopus

2

Citation

(Scopus)
FCSCAN: An efficient multiscan-based test compression technique for test cost reduction

Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 2006 653 - 658 2006

　View Summary

This paper proposes a new multiscan-based test input data compression technique by employing a Fan-out Compression Scan Architecture (FCSCAN) for test cost reduction. The basic idea of FCSCAN is to target the minority specified 1 or 0 bits (either 1 or 0) in scan slices for compression. Due to the low specified bit density in the test cube set, FCSCAN can significantly reduce input test data volume and the number of required test channels so as to reduce test cost. The FCSCAN technique is easy to implement with small hardware overhead and does not need any special ATPG for test generation. In addition, based on the theoretical compression efficiency analysis, improved procedures are also proposed for FCSCAN to achieve further compression. Experimental results on both benchmark circuits and one real industrial design indicate that a drastic reduction in test cost can indeed be achieved. © 2006 IEEE.
Low-cost IP core test using multiple-mode loading scan chain and scan chain clusters

Gang Zeng, Youhua Shi, Toshinori Takabatake, Masao Yanagisawa, Hideo Ito

Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems 136 - 144 2006

　View Summary

A fixing-shifting encoding (FSE) method is proposed to reduce test cost of IP cores. The FSE method reduces test cost by supporting multiple-mode loading test data, i.e., parallel loading, left-direction, and right-direction serial loading for each test slice data. Furthermore, the FSE that utilizes only two test channels can support a large number of internal scan chains and achieve further reduction in test cost by combining with scan chain clustering method. As a non-intrusive and automatic test pattern generation (ATPG) independent solution, the approach is applicable to IP core testing because it requires neither redesign of the core under test (CUT) nor running any additional ATPG for the encoding procedure. In addition, the decoder has low hardware overhead, and its design is independent of the CUT. Experimental results for some large ISCAS 89 benchmarks and an industry ASIC design have proven the efficiency of the proposed approach. © 2006 IEEE.

DOI

Scopus

2

Citation

(Scopus)
Low power test compression technique for designs with multiple scan chains

Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asian Test Symposium 2005 386 - 389 2005

　View Summary

This paper presents a new DFT technique that can significantly reduce test data volume as well as scan-in power consumption for multiscan-based designs. It can also help to reduce test time and tester channel requirements with small hardware overhead. In the proposed approach, we start with a pre-computed test cube set and fill the don't-cares with proper values for joint reduction of test data volume and scan power consumption. In addition, we explore the linear dependencies of the scan chains to construct a fanout structure only with inverters to achieve further compression. Experimental results for the larger ISCAS'89 benchmarks show the efficiency of the proposed technique. © 2005 IEEE.

DOI

Scopus

17

Citation

(Scopus)
A hybrid dictionary test data compression for multiscan-based designs

Youhua Shi, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E87-A ( 12 ) 3193 - 3199 2004.12

　View Summary

In this paper, we present a test data compression technique to reduce test data volume for multiscan-based designs. In our method the internal scan chains are divided into equal-sized groups and two dictionaries are built to encode either an entire slice or a subset of the slice. Depending on the codeword, the decompressor may load all scan chains or may load only a group of the scan chains, which can enhance the effectiveness of dictionary-based compression. In contrast to previous dictionary coding techniques, even for a CUT with a large number of scan chains, the proposed approach can achieve a satisfactory reduction in test data volume with a reasonably smaller dictionary. Experimental results showed that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.
A selective scan chain reconfiguration through run-length coding for test data compression and scan power reduction

Youhua Shi, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E87-A ( 12 ) 3208 - 3214 2004.12

　View Summary

Test data volume and power consumption for scan-based designs are two major concerns in system-on-a-chip testing. However, test set compaction by filling the don't-cares will invariably increase the scan-in power dissipation for scan testing, so the goals of test data reduction and low-power scan testing appear to conflict. Therefore, in this paper we present a selective scan chain reconfiguration method for test data compression and scan-in power reduction. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. After the scan chain reconfiguration, a dictionary is built to indicate the run-length of each compatible class, and only the scan-in data for each class needs to be transferred from the ATE to the CUT so as to reduce test data volume. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach overcomes the limitations of traditional run-length coding techniques, and leads to highly reduced test data volume with significant power savings during scan testing in all cases.
Reducing test data volume for multiscan-based designs through single/sequence mixed encoding

Youhua Shi, Shinji Kimura, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Midwest Symposium on Circuits and Systems 2 2004

　View Summary

This paper presents a new test data compression technique for multiscan-based designs through dictionary-based encoding on the single or sequences scan-inputs. In spite of its simplicity, it achieves significant reduction in test data volume. Unlike some previous approaches to test data compression, our approach eliminates the need for additional synchronization and handshaking between the CUT and the ATE, so it is especially suitable for integration in a low-cost test scheme for SoC test. In addition, in contrast to previous dictionary-based coding techniques, even for a CUT with a small number of scan chains, the proposed approach can achieve a satisfactory reduction in test data volume. Experimental results showed that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.
Alternative run-length coding through scan chain reconfiguration for joint minimization of test data volume and power consumption in scan test

Youhua Shi, Shinji Kimura, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asian Test Symposium 432 - 437 2004

　View Summary

Test data volume and scan power are two major concerns in SoC test. In this paper we present an alternative run-length coding method through scan chain reconfiguration to reduce both test data volume and scan-in power consumption. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. To extract the compatible scan cells we apply a heuristic algorithm by solving the graph coloring problem; and then a simple greedy algorithm is used to configure the scan chain for the minimization of scan power. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach leads to highly reduced test data volume with significant power savings during scan test.
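
The grouping of scan cells into compatible classes described in this summary can be sketched as a graph coloring problem. The sketch below is only an illustration under stated assumptions: two scan cells are taken to be compatible when, in every test cube, their values are equal or at least one of them is a don't-care (`X`), and a generic largest-degree-first greedy coloring stands in for the paper's heuristic, which the summary does not specify. All function names are hypothetical.

```python
def compatible(col_a: str, col_b: str) -> bool:
    """Two scan cells are compatible if they never need opposite specified values."""
    return all(a == b or a == "X" or b == "X" for a, b in zip(col_a, col_b))


def group_scan_cells(test_cubes):
    """Partition scan cell indices into pairwise-compatible classes.

    test_cubes: list of equal-length strings over '0', '1', 'X',
    one string per test cube, one character per scan cell.
    """
    num_cells = len(test_cubes[0])
    # Column i holds the values required of scan cell i across all cubes.
    columns = ["".join(cube[i] for cube in test_cubes) for i in range(num_cells)]

    # Conflict graph: an edge joins two cells that are NOT compatible.
    conflicts = {i: set() for i in range(num_cells)}
    for i in range(num_cells):
        for j in range(i + 1, num_cells):
            if not compatible(columns[i], columns[j]):
                conflicts[i].add(j)
                conflicts[j].add(i)

    # Greedy coloring, most-constrained cells first (a stand-in heuristic).
    order = sorted(range(num_cells), key=lambda i: -len(conflicts[i]))
    color = {}
    for cell in order:
        used = {color[n] for n in conflicts[cell] if n in color}
        c = 0
        while c in used:
            c += 1
        color[cell] = c

    # Cells sharing a color form one compatible class.
    classes = {}
    for cell, c in color.items():
        classes.setdefault(c, []).append(cell)
    return [sorted(members) for _, members in sorted(classes.items())]


if __name__ == "__main__":
    cubes = ["10X1", "X01X", "1XX1"]
    print(group_scan_cells(cubes))
```

For the three example cubes, cell 1 conflicts with every other cell, so it forms its own class while cells 0, 2 and 3 share one. Cells in the same class can be driven by the same scan-in value, which is what allows a single run of data per class.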

DOI

Scopus

2

Citation

(Scopus)
A Built-in Reseeding Technique for LFSR-Based Test Pattern Generation

Youhua Shi, Zhe Zhang, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E86-A ( 12 ) 3056 - 3062 2003.12

　View Summary

Reseeding techniques have been proposed to improve the fault coverage in pseudo-random testing. However, most previous work on reseeding is based on storing the seeds in an external tester or in a ROM. In this paper we present a built-in reseeding technique for LFSR-based test pattern generation. The proposed structure can run both in pseudorandom mode and in reseeding mode. Besides, our method requires no storage for the seeds since in reseeding mode the seeds can be generated automatically in hardware. In this paper we also propose an efficient grouping algorithm based on simulated annealing to optimize test vector grouping. Experimental results for benchmark circuits indicate the superiority of our technique over other reseeding methods with respect to test length and area overhead. Moreover, since the theoretical properties of LFSRs are preserved, our method could be beneficially used in conjunction with any other techniques proposed so far.
Multiple test set generation method for LFSR-based BIST

Youhua Shi, Zhe Zhang

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 2003-January 863 - 868 2003

　View Summary

In this paper we propose a new reseeding method for LFSR-based test pattern generation suitable for circuits with random pattern resistant faults. The key feature of our method is that the proposed test pattern generator (TPG) can work both in normal LFSR mode, to generate pseudorandom test vectors, and in jumping mode, to make the TPG jump from a state to the required state (the seed of the next group). Experimental results indicate its superiority over other known reseeding techniques with respect to the length of the test sequence and the required area overhead.
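
The two operating modes mentioned in this summary can be illustrated with a small software model. This is a minimal sketch, not the paper's hardware: it assumes a 4-bit Fibonacci LFSR with taps at bits 4 and 3 (a maximal-length configuration), and it models the jump simply as loading the next group's seed, whereas the paper's TPG produces that jump in hardware. The names `lfsr_step` and `generate_patterns` are hypothetical.

```python
def lfsr_step(state: int, taps=(4, 3), width: int = 4) -> int:
    """Advance a Fibonacci LFSR by one clock (normal LFSR mode).

    The feedback bit is the XOR of the tapped bits (1-indexed from the LSB);
    the register shifts left and the feedback enters at bit 0.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def generate_patterns(seeds, vectors_per_group, taps=(4, 3), width=4):
    """Generate test vectors group by group.

    Each group starts from its own seed (modelling the jump to the required
    state) and then runs in normal LFSR mode for vectors_per_group vectors.
    """
    patterns = []
    for seed in seeds:
        state = seed  # jumping mode, modelled here as a direct load
        for _ in range(vectors_per_group):
            patterns.append(state)
            state = lfsr_step(state, taps, width)  # normal LFSR mode
    return patterns


if __name__ == "__main__":
    for vector in generate_patterns([0b0001, 0b1010], 3):
        print(format(vector, "04b"))
```

With the seeds `0b0001` and `0b1010` and three vectors per group, the model produces 0001, 0010, 0100 and then 1010, 0101, 1011. The second group starts at a state the first would only have reached after several more clocks, which is how reseeding shortens the test sequence.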

DOI

Scopus

9

Citation

(Scopus)
New low power BIST methodology by altering the structure of linear feedback shift registers

Rui Li, Chen Hu, Jun Yang, Zhe Zhang, Youhua Shi

Dianzi Qijian/Journal of Electron Devices 25 ( 3 ) 245 2002.09
Simulated annealing algorithm applied in low power BIST scheme

Chen Hu, Zhe Zhang, Youhua Shi, Jun Yang, Longxing Shi

Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition) 32 ( 2 ) 177 - 180 2002.03

　View Summary

An approach based on the simulated annealing algorithm is proposed to group test vectors near-optimally within a test pattern sequence of a given length, so as to decrease the number of test vectors. Through reseeding, this approach makes the linear feedback shift register (LFSR) generate optimized groups of vectors, reducing the power consumption without any loss of fault coverage. The experimental results show that power consumption can be reduced by more than 70% while keeping the fault coverage unchanged. In addition, the test time is greatly shortened by the decreased number of test vectors, which is important for real-time devices.
A new software for test logic optimization in DFT

Zhe Zhang, Chen Hu, Rui Li, Youhua Shi, Longxing Shi

International Conference on ASIC, Proceedings 654 - 657 2001

　View Summary

This paper presents a new software tool named ASIC2000TA, developed for design for test (DFT) and aimed at optimizing test logic. The software consists of two modules: a test analysis module and a DFT module. The test analysis module can examine a circuit's testability, generate test vectors and perform fault simulation; some of its algorithms are described. The DFT module automatically inserts test logic, including full scan and partial scan, into a gate-level netlist; a greedy search algorithm used for this purpose is discussed. Electronic design intermediate format (EDIF) acts as the interface between ASIC2000TA and Cadence. Finally, an experiment with ASIC2000TA is presented.
A new self-test structure for at-speed test of crosstalk in SoC busses

Jun Yang, Chen Hu, Youhua Shi, Zhe Zhang, Longxing Shi

International Conference on ASIC, Proceedings 633 - 636 2001

　View Summary

The use of deep submicron process technologies increases the probability of crosstalk faults in the buses of a system-on-a-chip (SoC). Though a self-testing methodology based on the MA fault model has been developed, the area overhead of its test logic is excessive. This paper proposes a new Error Detector (ED) and new test patterns whose overhead is reduced to only approximately 50% of that of the old methodology on average. A behavioral fault simulation is used to validate the self-testing structure described in this paper.
A new low power BIST methodology by altering the structure of linear feedback shift registers

Rui Li, Chen Hu, Jun Yang, Zhe Zhang, Youhua Shi, Longxing Shi

International Conference on ASIC, Proceedings 646 - 649 2001

　View Summary

In this paper a new low power BIST methodology by altering the structure of the linear feedback shift register (LFSR) is proposed. In pseudo-random test mode, the efficiency of the vectors decreases sharply as the test progresses. For low power consumption during test mode, the proposed approach ignores the non-detecting vectors by altering the structure of the LFSR. Note that altering the structure of the LFSR is efficient, and it has no impact on the fault coverage.


Presentations

Application and evaluation of CNN with approximate adders

井上雄太, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2018.05
A low power SRAM design with leakage power reduction

伊藤卓, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2018.05
MOSs SP-SSHI for low frequency piezoelectric energy harvesting

杉山貴紀, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2018.05
Soft error tolerant latch designs with low power consumption (invited paper)

Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

Proceedings of International Conference on ASIC

Presentation date： 2018.01

　View Summary

© 2017 IEEE. As semiconductor technology continues scaling down, the reliability issue has become much more critical than ever before. Unlike traditional hard-errors caused by permanent physical damage which can't be recovered in field, soft errors are caused by radiation or voltage/current fluctuations that lead to transient changes on internal node states, thus they can be viewed as temporary errors. However, due to the unpredictable occurrence of soft errors, it is desirable to develop soft error tolerant designs. For this reason, soft error tolerant design techniques have gained great research interest. In this paper, we will explain the soft error mechanism and then review the existing soft error tolerant design techniques with particular emphasis on SEH family because they can achieve low power consumption and small performance overhead as well.
A low cost and high speed CSD-based symmetric transpose block FIR implementation

Jinghao Ye, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

Proceedings of International Conference on ASIC

Presentation date： 2018.01

　View Summary

© 2017 IEEE. In this paper, a low cost and high speed CSD-based symmetric transpose block FIR design was proposed for low cost digital signal processing. First, the existing area-efficient CSD-based multiplier was optimized by considering the reusability and the symmetry of coefficients for area reduction. Second, the position of the input register was changed for high speed transpose block FIR processing in which half of the number of required multipliers can be saved. When compared with the existing block FIR designs, the proposed FIR design can increase the data rate from 238.66 MHz to 373.13 MHz while saving 10.89% area and 21.30% energy consumption as well.
Design of a soft error detection latch using internal node

中垣直道, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2017.05
C-element based soft-error hardened latch designs

田島咲季, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2017.05
Maximum error distance-based optimization of GeAr circuits

早水謙, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2017.05
Self-powered switching magnetic transformer circuit for energy harvesting systems

川合洋平, 戸川望, 柳澤政生, 史又華

回路とシステムワークショップ論文集 Workshop on Circuits and Systems

Presentation date： 2017.05
Improved monitoring-path selection algorithm for suspicious timing error prediction based timing speculation

Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015

Presentation date： 2016.07

　View Summary

© 2015 IEEE. As process technology scales down, timing speculation techniques such as Razor and STEP have emerged as alternative solutions to reduce the margins required due to various variation effects. Unlike Razor, STEP is a prediction-based timing speculation method that predicts suspicious timing errors before they actually appear, and thus it can result in more performance improvement. Therefore, an improved monitoring-path selection algorithm for STEP-based timing speculation is proposed in this paper, in which candidate monitoring-paths are selected based on short-path removal and path length estimation. Experimental results show that the proposed algorithm realizes an average of 1.71X overclocking compared with worst-case based designs.
A low-power soft error tolerant latch scheme

Saki Tajima, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015

Presentation date： 2016.07

　View Summary

© 2015 IEEE. As process technology continues scaling, low power and reliability of integrated circuits are becoming more critical than ever before. In particular, the reduction of node capacitance and operating voltage for low power consumption makes circuits more sensitive to soft errors induced by high-energy particles. In this paper, a soft-error tolerant latch called TSPC-SEH is proposed for soft error tolerance with low power consumption. The simulation results show that the proposed TSPC-SEH latch can achieve up to 42% power consumption reduction and 54% delay improvement compared to the existing soft error tolerant SEH and DICE designs.
In-situ Trojan authentication for invalidating hardware-Trojan functions

Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - International Symposium on Quality Electronic Design, ISQED

Presentation date： 2016.05

　View Summary

© 2016 IEEE. Because we do not know who will create hardware Trojans (HTs), or when and where they would be inserted, it is very difficult to correctly and completely detect all the real HTs in untrusted ICs, and thus it is desirable to incorporate in-situ HT invalidating functions into untrusted ICs as a countermeasure against HTs. This paper proposes an in-situ Trojan authentication technique for gate-level netlists to avoid security leakage. In the proposed approach, an untrusted IC operates in authentication mode and normal mode. In the authentication mode, an embedded Trojan authentication circuit monitors the bit-flipping count of a suspicious Trojan net within a pre-defined constant number of clock cycles and identifies whether it is a real Trojan or not. If the authentication condition is satisfied, the suspicious Trojan net is validated. Otherwise, it is invalidated and HT functions are masked. By doing this, even untrusted netlists with HTs can still be used in the normal mode without security leakage. By setting the appropriate authentication condition using training sets from Trust-HUB gate-level benchmarks, the proposed technique successfully invalidates only the HTs in the training sets. Furthermore, when the in-situ Trojan authentication circuit is embedded into a Trojan-inserted AES crypto netlist, the netlist can run securely and correctly even if HTs exist, with an area overhead of just 1.5% and no delay overhead.
A delay variation and floorplan aware high-level synthesis algorithm with body biasing

Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - International Symposium on Quality Electronic Design, ISQED

Presentation date： 2016.05

　View Summary

© 2016 IEEE. In this paper, we propose a delay variation and floorplan aware high-level synthesis algorithm with body biasing, which minimizes the average leakage energy of manufactured chips. To realize a floorplan-oriented high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture), one of the DR architectures. HDR architecture divides the chip area into small partitions called a huddle and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain expected leakage energy for every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing expected leakage energy of the entire circuit but can increase the latency. We assign CDFG nodes in critical paths to the huddles with larger expected leakage energy and those in non-critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 38.9% without latency and yield degradation compared with typical-case design with body biasing.
Fast and Low-power Soft-error Tolerant Fast-SEH Latch

Saki Tajima, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

Proceedings of the Workshop on Circuits and Systems

Presentation date： 2016.05
A process-variation-aware multi-scenario high-level synthesis algorithm for distributed-register architectures

Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

International System on Chip Conference

Presentation date： 2016.02

　View Summary

© 2015 IEEE. In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a process-variation-aware low-latency and multi-scenario high-level synthesis algorithm targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.
Scan-based side-channel attack against symmetric key ciphers using scan signatures

Mika Fujishiro, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015

Presentation date： 2015.09

　View Summary

© 2015 IEEE. There are a number of studies on side-channel attacks, which use information leaked from the physical implementation of a cryptosystem. A scan-based side-channel attack utilizes scan chains, one of the design-for-test techniques, and retrieves the secret information inside the cryptosystem. In this paper, scan-based side-channel attack methods using scan signatures against symmetric key ciphers, such as block ciphers and stream ciphers, are presented to show the risk of scan-based attacks.
FPGA-based SHA-3 acceleration on a 32-bit processor via instruction set extension

Yi Wang, Youhua Shi, Chao Wang, Yajun Ha

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015

Presentation date： 2015.09

　View Summary

© 2015 IEEE. As embedded systems play more and more important roles in the Internet of Things (IoT), the integration of cryptographic functionalities is urgently needed to ensure data and information security. Recently, Keccak was declared the winner of the third generation of Secure Hashing Algorithm (SHA-3). However, implementing SHA-3 on a specific 32-bit processor failed to meet the performance requirement. On the other hand, implementing it as a cryptographic coprocessor consumes a lot of extra area and requires a customized driver program. Although implementing Keccak on a 64-bit platform is more efficient, this platform is not suitable for embedded implementation. In this paper, we propose a novel SHA-3 implementation using instruction set extension based on a 32-bit LEON3 processor (an open source processor), with the goals of reducing execution cycles and code size. Experimental results show that the proposed design reduces execution cycles by around 87% and code size by 10.5% compared to reference designs. Our design takes up only 9.44% extra area with negligible speed overhead compared to the standard LEON3 processor. Compared to the existing hardware accelerators, our proposed design occupies only half of the area resources and does not require extra driver programs to be developed when integrated into the overall system.
A floorplan-aware high-level synthesis technique with delay-variation tolerance

Kazushi Kawamura, Yuta Hagio, Youhua Shi, Nozomu Togawa

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015

Presentation date： 2015.09

　View Summary

© 2015 IEEE. To realize a better trade-off between performance and yield rate in recent LSI designs, it is necessary to deal with the increasing ratio of interconnect delay as well as delay variation. In this paper, a novel floorplan-aware high-level synthesis technique with delay-variation tolerance is proposed. By utilizing floorplan-driven architectures, interconnect delays can be estimated and then handled even in high-level synthesis. Our technique makes it possible to realize two scheduling/binding results (one is a non-delayed result and the other is a delayed result) simultaneously on a chip with small area/performance overhead, and either one of them can be selected according to the post-silicon delay variation. Experimental results demonstrate that our technique can reduce delayed scheduling/binding latency by up to 32.3% compared with conventional approaches.
A universal delay line circuit for variation resilient IC with self-calibrated time-to-digital converter

Shuai Shao, Youhua Shi, Wentao Dai, Jianyi Meng, Weiwei Shan

Proceedings of the 2015 IEEE International Conference on Electron Devices and Solid-State Circuits, EDSSC 2015

Presentation date： 2015.09

　View Summary

© 2015 IEEE. A universal delay monitor that imitates the real critical paths is developed for variation-resilient integrated circuits. The monitor is constructed from different proportions of logic cells and interconnects. The delay of the monitor is detected by a time-to-digital converter, which keeps the sampling results precise. To reduce the deviation of the sampling results caused by PVT variations, a novel time-to-digital converter with a self-calibration mechanism is developed. This variation-resilient method, based on adaptive voltage scaling, is applied to an ARM7-based system on a chip in a 0.18 μm CMOS process with a 112M signoff frequency and an area of 1.3 × 1.3 mm². The simulation results show a 43.42% gain in power consumption at the FF corner, -25°C, compared to the traditional fixed 1.8 V design.
A Score-Based Classification Method for Identifying Hardware-Trojans Inserted/Free Gate-Level Netlists

Presentation date： 2015.03
A score-based classification method for identifying Hardware-Trojans at gate-level netlists

Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - Design, Automation and Test in Europe, DATE

Presentation date： 2015.01

　View Summary

© 2015 EDAA. Recently, digital ICs are often designed by outside vendors to reduce design costs in the semiconductor industry, which introduces a severe risk that malicious attackers implement Hardware Trojans (HTs) in them. Since the IC design phase generates only a single design result, an RT-level or gate-level netlist for example, we cannot assume an HT-free netlist or a Golden netlist, and it is therefore very difficult to identify whether a generated netlist is HT-free or HT-inserted. In this paper, we propose a score-based classification method for identifying HT-free or HT-inserted gate-level netlists without using a Golden netlist. Our proposed method does not directly detect the HTs themselves in a gate-level netlist; instead, it detects a net included in the HTs, which is called a Trojan net. First, we observe Trojan nets from several HT-inserted benchmarks and extract several of their features. Second, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as 'HT-inserted' and all the HT-free gate-level benchmarks as 'HT-free' in approximately three hours for each benchmark.
In-situ timing monitoring methods for variation-resilient designs

Youhua Shi, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS

Presentation date： 2015.01

　View Summary

© 2014 IEEE. With technology scaling, process, voltage, and temperature (PVT) variations pose great challenges for integrated circuit designs. Conventionally, LSI circuits are designed by adding a pessimistic timing margin to guarantee 'always correct' operations even under worst-case conditions. However, due to the increasing PVT variations, an unacceptably large design guard band must be reserved to avoid timing errors on the critical paths of circuits, which leads to very inefficient designs in terms of power and performance. For this reason, in-situ timing monitoring techniques have gained great research interest. In this paper, we review existing variation-resilient design techniques with particular emphasis on in-situ timing monitoring techniques, including both detection-based and prediction-based methods. The effectiveness of in-situ timing monitoring techniques is discussed. Finally, we show an example of an in-situ timing monitoring technique called STEP with applications to general pipeline designs.
An area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction

Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS

Presentation date： 2015.01

　View Summary

© 2014 IEEE. As process technologies advance, the importance of timing error correction techniques is increasing as well. In this paper, we propose an area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction circuits (STEPCs). A STEPC predicts timing errors by monitoring the middle points of several speed-paths in a circuit. However, we need many STEPCs with a high area overhead to predict timing errors in an overall circuit. Our proposed method moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs. Our proposed algorithm reduces the required number of STEPCs to 1/19 and their area to 1/5 compared with a naive algorithm. Furthermore, our algorithm realizes 2.25X overclocking compared with just inserting STEPCs into several speed-paths.
Secure scan design using improved random order and its evaluations

Masaru Oya, Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS

Presentation date： 2015.01

　View Summary

© 2014 IEEE. Scan test using scan chains is one of the most important DFT techniques. However, scan-based attacks that can retrieve the secret key in crypto circuits by using scan chains have been reported. A secure scan architecture is strongly required to protect scan chains from scan-based attacks. This paper proposes an improved version of random order as a secure scan architecture. In improved random order, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a sub-chain to scan out. Testability and security of the proposed improved random order are also discussed in the paper, and the implementation results demonstrate the effectiveness of the proposed method.
In-situ Timing Monitoring Methods for Variation-Resilient Designs

Presentation date： 2014.11
An Area-Overhead-Oriented Monitoring-Path Selection Algorithm for Suspicious Timing Error Prediction

Presentation date： 2014.11
Secure Scan Design Using Improved Random Order and its Evaluations

Presentation date： 2014.11
A Network-flow-based Checkpoint Insertion Algorithm for Suspicious Timing Error Prediction Scheme

Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa

Proceedings of the Workshop on Circuits and Systems

Presentation date： 2014.08
InTimeTune: A Throughput Driven Timing Speculation Architecture for Overscaled Designs

Presentation date： 2014.06
Throughput Driven Check Point Selection in Suspicious Timing Error Prediction based Designs

Presentation date： 2014.02
Throughput driven check point selection in suspicious timing error prediction based designs

Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

2014 IEEE 5th Latin American Symposium on Circuits and Systems, LASCAS 2014 - Conference Proceedings

Presentation date： 2014.01

　View Summary

In this paper, a throughput-driven design technique is proposed, in which a suspicious timing error prediction circuit is inserted to monitor the signal transitions at some selected check points. Unlike previous works where timing errors are detected after their occurrence, the proposed method uses the real intermediate signal transitions for timing error prediction. The check point selection affects both the maximal operation frequency and the suspicious timing error overestimation rate, both of which have an effect on the overall throughput, so an analysis of the check point selection is also given. In our work, the circuit can be overclocked by a factor of 2 or more with negligible area overhead while guaranteeing always-correct output. © 2014 IEEE.
Secure Scan Design with Dynamically Configurable Connection

Presentation date： 2013.12
Predication based Timing Speculation Technique for Throughput Improvement

Presentation date： 2013.11
Concurrent faulty clock detection for crypto circuits against clock glitch based DFA

Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings - IEEE International Symposium on Circuits and Systems

Presentation date： 2013.09

　View Summary

In this paper, a concurrent faulty clock detection method is proposed for crypto circuits against clock glitch based differential fault analysis (DFA). In the proposed method, a nonlogic buffer-based delay chain is inserted, and by monitoring the delay along the delay chain, a possible clock glitch based DFA can be detected. Experimental results on an AES circuit show that the proposed method can successfully detect clock glitch based attacks, and the required area overhead is only 0.47%, which is much smaller than in previous works. © 2013 IEEE.
An energy-efficient high-level synthesis algorithm incorporating interconnection delays and dynamic multiple supply voltages

Shin Ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

2013 International Symposium on VLSI Design, Automation, and Test, VLSI-DAT 2013

Presentation date： 2013.08

　View Summary

In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture) that integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis. Next, we propose a high-level synthesis algorithm for AVHDR architectures. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which abstracts modules placed close to each other, are naturally generated using floorplanning. Low-supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Experimental results show that our algorithm achieves 50% energy-saving compared with conventional algorithms. © 2013 IEEE.
Random Order Scan Design against Scan-Based Attacks

Yuta Atobe, Youhua Shi, Masao Yanagisawa

Proceedings of the Workshop on Circuits and Systems

Presentation date： 2013.07
Suspicious timing error prediction with in-cycle clock gating

Youhua Shi, Hiroaki Igarashi, Nozomu Togawa, Masao Yanagisawa

Proceedings - International Symposium on Quality Electronic Design, ISQED

Presentation date： 2013.07

　View Summary

Conventionally, circuits are designed with a pessimistic timing margin to solve delay variation problems, which guarantees 'always correct' operations. However, because such a worst-case condition occurs rarely, the traditional pessimistic design method is becoming one of the main obstacles for designers to achieve higher performance and/or ultra-low power consumption. By monitoring timing error occurrence during circuit operation, adaptive timing error detection and recovery methods have recently gained wide interest as a promising solution. As an extension of existing research, in this paper, we propose a suspicious timing error prediction method for performance or energy efficiency improvement in pipeline designs. Experimental results show that, when compared with typical margin designs, the proposed method can 1) achieve up to 1.41X throughput improvement with in-situ timing error prediction ability; and 2) allow the design to be overclocked by up to 1.88X with 'always correct' outputs. © 2013 IEEE.
Floorplan Driven Architectures and High-level Synthesis Algorithm for Dynamic Multiple Supply Voltages

Presentation date： 2013.06
Concurrent Faulty Clock Detection for Crypto Circuits Against Clock Glitch Based DFA

Presentation date： 2013.05
DR24 An Energy-efficient High-level Synthesis Algorithm Incorporating Interconnection Delays and Dynamic Multiple Supply Voltages

Presentation date： 2013.04
Suspicious Timing Error Detection and Recovery with In-Cycle Clock Gating

Presentation date： 2013.03
Secure scan design with dynamically configurable connection

Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC

Presentation date： 2013.01

　View Summary

Scan test is a powerful test technique which can control and observe the internal states of the circuit under test through scan chains. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Therefore, new secure test methods are required to satisfy both testability and security requirements. In this paper, a secure scan design is proposed to achieve adequate security as a countermeasure against scan-based attacks, while still maintaining high testability like normal scan testing. In our method, the internal scan chain is divided into several sub-chains, and the connection order of the sub-chains can be dynamically changed. In addition, a way to decide the connection order of those sub-chains so that it cannot be identified by an attacker is also proposed in this paper. The proposed method is implemented on an AES circuit to show its effectiveness, and a security analysis is also given to show how the proposed approach can be used as a countermeasure against known scan-based attacks. © 2013 IEEE.
State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Side Channel Attack on RSA Circuit

Presentation date： 2012.12
State dependent scan flip-flop with key-based configuration against scan-based side channel attack on RSA circuit

Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS

Presentation date： 2012.12

　View Summary

Scan test is one of the useful design-for-testability techniques, which can detect circuit failures efficiently. However, it has been reported that it is possible to retrieve secret keys from cryptographic LSIs through scan chains. Testability and security therefore conflict with each other, and there is a need for an efficient design-for-testability circuit that satisfies both testability and security requirements. In this paper, a secure scan architecture against scan-based attacks is proposed to achieve high security without compromising testability. In our method, the scan structure is dynamically changed by adding a latch to any FFs in the scan chain. We analyze an RSA circuit implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks. © 2012 IEEE.
Dynamically changeable secure scan architecture against scan-based side channel attack

Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

ISOCC 2012 - 2012 International SoC Design Conference

Presentation date： 2012.12

　View Summary

Scan test, one of the useful design-for-testability techniques, is effective for LSIs including cryptographic circuits. It can observe and control the internal states of the circuit under test by using a scan chain. However, the scan chain presents a significant security risk of information leakage through scan-based attacks, which retrieve the secret keys of cryptographic LSIs. In this paper, a secure scan architecture against scan-based attacks that still has high testability is proposed. In our method, scan data is dynamically changed by adding a latch to any FFs in the scan chain. We show that, by using the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, which demonstrates the effectiveness of the proposed method. © 2012 IEEE.
Dynamically Changeable Architecture against Scan-Based Side Channel Attack Using State Dependent Scan Flip-Flop on RSA Circuit

Presentation date： 2012.11
VLSI implementation of a fast intra prediction algorithm for H.264/AVC encoding

Youhua Shi, Kenta Tokumitsu, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS

Presentation date： 2010.12

　View Summary

Intra-frame coding is one of the most important technologies in H.264/AVC, and it has contributed significantly to the coding efficiency of H.264/AVC at the cost of computational complexity. To address this problem, in this paper we present an efficient VLSI implementation of a computation-efficient intra prediction algorithm for H.264/AVC encoding. Unlike most existing fast intra-mode selection techniques, in the proposed method the directional differences are computed using a few selected original pixels to obtain the candidate modes with the minimal direction cost. The proposed method is hardware-friendly and provides more processing parallelism for H.264 intra-frame encoding with less overhead and less power consumption, and it is expected to be used as a favourable accelerator hardware module in a real-time HDTV (1920×1080p) H.264 encoder. © 2010 IEEE.
State-dependent changeable scan architecture against scan-based side channel attacks

Ryuta Nara, Hiroshi Atobe, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

ISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems

Presentation date： 2010.08

　View Summary

Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, the scan path could be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan path to complicate the scan structure has recently been presented. Unfortunately, it still carries the potential of being attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops are state-dependent, which causes the output of each State-dependent Scan FF to be inverted or not so as to make it more difficult to discover the internal scan architecture. © 2010 IEEE.
Design-for-secure-test for crypto cores

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings - International Test Conference

Presentation date： 2009.12

　View Summary

Scan technology carries the potential of being misused as a "side channel" to leak out the secret information of crypto cores. To address such a design challenge, this paper proposes a design-for-secure-test (DFST) solution for crypto cores by adding a stimuli-launched flip-flop into the traditional scan flip-flop to maintain the high test quality without compromising the security. © 2009 IEEE.
Unknown response masking with minimized observable response loss and mask data

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS

Presentation date： 2008.12

　View Summary

This paper presents a new unknown response masking technique to minimize the effect on test loss due to over-masking. Unlike previous works where the scan responses are masked before entering the response compactor, the proposed method could mask the Xs when they are transformed on the scan path. Meanwhile, the masking cells are inserted along the scan paths, thus they would have no degradation on the performance of the designs. In addition, the test data required to mask unknown responses is only one bit for each test pattern. Experimental results show the effectiveness of the proposed method. © 2008 IEEE.
GECOM: Test data compression combined with all unknown response masking

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Presentation date： 2008.08

　View Summary

This paper introduces GECOM technology, a novel test compression method with seamless integration of test GEneration, test COmpression (i.e. integrated compression on scan stimulus and masking bits) and all unknown scan responses Masking for manufacturing test cost reduction. Unlike most prior methods, the proposed method considers the unknown responses during the ATPG procedure and selectively encodes the specified 1 or 0 bits (either 1s or 0s) in scan slices for compression, while at the same time masking the unknown responses before sending them to the response compactor. The proposed GECOM technology consists of the GECOM architecture and the GECOM ATPG technique. In the GECOM architecture, for a circuit with N internal scan chains, only c tester channels, where c = [log2 N] +2, are required. GECOM ATPG generates test patterns for the GECOM architecture so that not only can the scan inputs be efficiently compressed but all the unknown responses are also masked. Experimental results on both benchmark circuits and real industrial designs indicate the effectiveness of the proposed GECOM technique. © 2008 IEEE.
Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2ⁿ)

Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Presentation date： 2008.08

　View Summary

Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication, and it requires scalability since the bit length of operands varies depending on the security level. Also, ECC is performed in GF(P) or GF(2ⁿ), and unified architectures for GF(P) and GF(2ⁿ) multipliers are needed. However, in previous works, a changing frequency or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2ⁿ) circuits of the multiplier, because the critical path of the GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2ⁿ). The proposed architecture unifies 4 parallel radix-2¹⁶ multipliers in GF(P) and a radix-2⁶⁴ multiplier in GF(2ⁿ) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.23 μs. © 2008 IEEE.
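For readers unfamiliar with the operation this summary refers to, the following Python sketch shows textbook Montgomery multiplication in GF(P) on whole integers. It is only a software illustration of the arithmetic, not the dual-radix hardware architecture of the paper; the function names are hypothetical, the GF(2ⁿ) case (carry-less arithmetic) is not shown, and the three-argument `pow` with a negative exponent assumes Python 3.8 or later.

```python
def montgomery_setup(p, k):
    """Precompute the constant for Montgomery arithmetic with R = 2**k.
    Requires an odd modulus p < R. Returns p' with p * p' = -1 (mod R)."""
    r = 1 << k
    return (-pow(p, -1, r)) % r


def mont_mul(a_bar, b_bar, p, k, p_neg_inv):
    """Return a_bar * b_bar * R^-1 mod p without dividing by p."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & mask) * p_neg_inv) & mask   # chosen so t + m*p is divisible by R
    u = (t + m * p) >> k                  # exact division by R is a shift
    return u - p if u >= p else u         # single conditional subtraction


def modmul_via_montgomery(a, b, p, k):
    """Compute a * b mod p by converting into and out of Montgomery form."""
    p_neg_inv = montgomery_setup(p, k)
    a_bar = (a << k) % p                  # a * R mod p
    b_bar = (b << k) % p                  # b * R mod p
    prod_bar = mont_mul(a_bar, b_bar, p, k, p_neg_inv)
    return mont_mul(prod_bar, 1, p, k, p_neg_inv)   # leave Montgomery form


p = 2**255 - 19                           # an example 255-bit prime modulus
print(modmul_via_montgomery(1234567, 7654321, p, 256) == (1234567 * 7654321) % p)
```

A hardware multiplier of radix 2ʷ processes the operands w bits at a time instead of in one step, which is where the radix choice discussed in the summary affects the critical path.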
Design for secure test - A case study on pipelined advanced encryption standard

Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings - IEEE International Symposium on Circuits and Systems

Presentation date： 2007.09

　View Summary

Cryptography plays an important role in the security of data transmission. To ensure the correctness of crypto hardware, we should conduct testing at fabrication and in the field. However, to achieve high test quality, state-of-the-art scan-based test techniques need to increase the testability of the circuit under test, which carries the potential of being misused to reveal the secret information of the crypto hardware. Thus, developing efficient test strategies for crypto hardware that achieve high test quality without compromising security becomes an important task. In this paper we discuss the development of a Design-for-Secure-Test (DFST) technique for pipelined AES to overcome the above contradiction between security and test quality in testing crypto hardware. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits. © 2007 IEEE.
Low-cost IP core test using multiple-mode loading scan chain and scan chain clusters

Gang Zeng, Youhua Shi, Toshinori Takabatake, Masao Yanagisawa, Hideo Ito

Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems

Presentation date： 2006.12

　View Summary

A fixing-shifting encoding (FSE) method is proposed to reduce test cost of IP cores. The FSE method reduces test cost by supporting multiple-mode loading test data, i.e., parallel loading, left-direction, and right-direction serial loading for each test slice data. Furthermore, the FSE that utilizes only two test channels can support a large number of internal scan chains and achieve further reduction in test cost by combining with scan chain clustering method. As a non-intrusive and automatic test pattern generation (ATPG) independent solution, the approach is applicable to IP core testing because it requires neither redesign of the core under test (CUT) nor running any additional ATPG for the encoding procedure. In addition, the decoder has low hardware overhead, and its design is independent of the CUT. Experimental results for some large ISCAS 89 benchmarks and an industry ASIC design have proven the efficiency of the proposed approach. © 2006 IEEE.
FCSCAN: An efficient multiscan-based test compression technique for test cost reduction

Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Presentation date： 2006.09

　View Summary

This paper proposes a new multiscan-based test input data compression technique that employs a Fan-out Compression Scan Architecture (FCSCAN) for test cost reduction. The basic idea of FCSCAN is to target the minority specified 1 or 0 bits (either 1 or 0) in scan slices for compression. Due to the low specified bit density in the test cube set, FCSCAN can significantly reduce the input test data volume and the number of required test channels so as to reduce test cost. The FCSCAN technique is easy to implement with small hardware overhead and does not need any special ATPG for test generation. In addition, based on a theoretical compression efficiency analysis, improved procedures are also proposed for FCSCAN to achieve further compression. Experimental results on both benchmark circuits and one real industrial design indicate that a drastic reduction in test cost can indeed be achieved. © 2006 IEEE.
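The idea of targeting only the minority specified bits in a scan slice can be sketched in Python as follows. The summary does not give the actual FCSCAN code format, so the representation used here (a minority value plus a list of positions, with every other bit filled with the opposite value) is an assumption for illustration, and all names are hypothetical.

```python
def encode_slice(slice_bits):
    """Encode one scan slice given as a string over '0', '1', 'X' (don't-care).
    Only the positions of the less frequent specified value are kept."""
    ones = [i for i, b in enumerate(slice_bits) if b == '1']
    zeros = [i for i, b in enumerate(slice_bits) if b == '0']
    if len(ones) <= len(zeros):
        return '1', ones
    return '0', zeros


def decode_slice(minority, positions, width):
    """Rebuild a fully specified slice: minority value at the stored
    positions, the opposite value everywhere else."""
    fill = '0' if minority == '1' else '1'
    out = [fill] * width
    for i in positions:
        out[i] = minority
    return ''.join(out)


def covers(cube, filled):
    """True if the filled slice agrees with every specified bit of the cube."""
    return all(c == 'X' or c == f for c, f in zip(cube, filled))


cube = 'XX1X0X0X'
minority, positions = encode_slice(cube)
filled = decode_slice(minority, positions, len(cube))
print(minority, positions, filled, covers(cube, filled))
```

Because test cubes contain few specified bits, the position list is usually short, which is the source of the compression in this simplified model.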
Low power test compression technique for designs with multiple scan chains

Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asian Test Symposium

Presentation date： 2005.12

This paper presents a new DFT technique that can significantly reduce test data volume as well as scan-in power consumption for multiscan-based designs. It can also help to reduce test time and tester channel requirements with small hardware overhead. In the proposed approach, we start with a pre-computed test cube set and fill the don't-cares with proper values for joint reduction of test data volume and scan power consumption. In addition, we exploit the linear dependencies of the scan chains to construct a fanout structure that uses only inverters to achieve further compression. Experimental results for the larger ISCAS'89 benchmarks show the efficiency of the proposed technique. © 2005 IEEE.
Alternative run-length coding through scan chain reconfiguration for joint minimization of test data volume and power consumption in scan test

Youhua Shi, Shinji Kimura, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Proceedings of the Asian Test Symposium

Presentation date： 2004.12

Test data volume and scan power are two major concerns in SoC test. In this paper, we present an alternative run-length coding method through scan chain reconfiguration to reduce both test data volume and scan-in power consumption. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. To extract the compatible scan cells, we apply a heuristic algorithm that solves the graph coloring problem; a simple greedy algorithm is then used to configure the scan chain to minimize scan power. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach leads to highly reduced test data volume with significant power savings during scan test.
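
As an illustration of the compatibility analysis described in this abstract, here is a minimal Python sketch. It assumes test cubes are strings over '0', '1', and 'X' (don't-care), treats two scan cells as compatible when no cube assigns them opposite specified values, and colors the resulting conflict graph with a simple highest-degree-first greedy heuristic. The function names and the heuristic are illustrative assumptions, not the paper's algorithm, and the power-aware scan chain configuration step is not modeled.

```python
from itertools import combinations


def compatible(col_a, col_b):
    """Two scan cells are compatible if no cube gives them opposite specified values."""
    return all(a == b or "X" in (a, b) for a, b in zip(col_a, col_b))


def group_scan_cells(test_cubes):
    """Split scan cells into compatible classes by greedy graph coloring.

    test_cubes: list of equal-length strings over '0', '1', 'X' (hypothetical format).
    Returns a list of classes, each a sorted list of scan cell indices.
    """
    n_cells = len(test_cubes[0])
    # Column i holds the values that scan cell i takes across all cubes.
    columns = [[cube[i] for cube in test_cubes] for i in range(n_cells)]

    # Conflict graph: an edge joins two cells that can never share a value.
    conflicts = {i: set() for i in range(n_cells)}
    for i, j in combinations(range(n_cells), 2):
        if not compatible(columns[i], columns[j]):
            conflicts[i].add(j)
            conflicts[j].add(i)

    # Greedy coloring, most-constrained cells first (an assumed heuristic).
    color = {}
    for cell in sorted(range(n_cells), key=lambda c: -len(conflicts[c])):
        used = {color[n] for n in conflicts[cell] if n in color}
        color[cell] = next(k for k in range(n_cells) if k not in used)

    classes = {}
    for cell, k in color.items():
        classes.setdefault(k, []).append(cell)
    return [sorted(members) for members in classes.values()]
```

Because every pair of cells inside one class is compatible, all cells of a class can be driven by the same value in each cube, which is what makes the subsequent encoding shorter.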
Reducing test data volume for multiscan-based designs through single/sequence mixed encoding

Youhua Shi, Shinji Kimura, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

Midwest Symposium on Circuits and Systems

Presentation date： 2004.12

This paper presents a new test data compression technique for multiscan-based designs that applies dictionary-based encoding to single scan-inputs or sequences of scan-inputs. In spite of its simplicity, it achieves a significant reduction in test data volume. Unlike some previous approaches to test data compression, our approach eliminates the need for additional synchronization and handshaking between the CUT and the ATE, so it is especially suitable for integration into a low-cost test scheme for SoC test. In addition, in contrast to previous dictionary-based coding techniques, the proposed approach can achieve a satisfactory reduction in test data volume even for a CUT with a small number of scan chains. Experimental results show that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.
Multiple test set generation method for LFSR-based BIST

Youhua Shi, Zhe Zhang

Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Presentation date： 2003.01

In this paper we propose a new reseeding method for LFSR-based test pattern generation suitable for circuits with random pattern resistant faults. The distinguishing feature of our method is that the proposed test pattern generator (TPG) can work both in normal LFSR mode, to generate pseudorandom test vectors, and in jumping mode, to make the TPG jump from a state to the required state (the seed of the next group). Experimental results indicate its superiority over other known reseeding techniques with respect to the length of the test sequence and the required area overhead. © 2003 IEEE.
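
The two operating modes mentioned in this abstract can be pictured with a small behavioral model. The Python sketch below is an assumption-laden illustration: it uses a generic Fibonacci LFSR, models "jumping mode" simply as loading a stored seed at the start of each group, and uses made-up names (`lfsr_step`, `generate_patterns`). It does not reproduce the paper's TPG hardware or its seed selection.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one clock.

    taps are 0-based bit positions whose XOR is shifted into bit 0.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def generate_patterns(seeds, patterns_per_seed, taps, width):
    """Emit pseudorandom patterns group by group.

    Within a group the generator runs in normal LFSR mode; between groups
    it "jumps" by loading the next seed instead of the LFSR successor state.
    """
    patterns = []
    for seed in seeds:
        state = seed  # jumping mode: force the required state
        for _ in range(patterns_per_seed):
            patterns.append(state)
            state = lfsr_step(state, taps, width)  # normal LFSR mode
    return patterns


# 4-bit example with feedback from the two most significant bits.
print(generate_patterns([0b0001, 0b1100], 3, taps=(3, 2), width=4))
```

In this toy setting, each seed starts a short pseudorandom group, so hard-to-detect faults can be targeted by choosing seeds whose groups contain the required vectors.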

Research Projects

Efficient self-powered energy harvesting circuit designs

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2018.04

-

2021.03

SHI Youhua

Research and development on efficient energy harvesting (EH) interface circuit designs has been conducted for piezoelectric energy harvesting from the human body. The main achievements can be summarized as follows: (1) efficient EH interface circuit designs have been proposed by implementing a multiple bias-flip-based E-SECE circuit and scalable EH circuit optimization for multi-source piezoelectric energy harvesting, (2) a wideband design technique with an introduced switching delay has been proposed to improve power and frequency bandwidth in piezoelectric energy harvesting, and (3) a wearable device capable of battery-free wireless transmission has been successfully realized by using the proposed piezo-based EH interface circuit.
Research on delay test techniques for ultra-low power designs

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2011

-

2013

SHI Youhua

Recently, low power VLSI designs have gained considerable research attention from both industry and academia. Consequently, reliability has become an important design issue in state-of-the-art low power designs. To address this design challenge, we conducted research on 1) reliable sub-threshold circuit design, 2) wire delay aware low power synthesis methods, and 3) a timing error detection method to guarantee the reliability of low power VLSI designs.
Design Methods for Crypto LSI Implementations and Testing

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2009

-

2011

YANAGISAWA Masao, NARA Ryuta, SHI Youhua

Scan test has been widely adopted as the default testing technique in most LSI designs, including crypto cores. However, scan chains might be used as a "side channel" to recover the secret keys from hardware implementations of cryptographic algorithms. In this research, we propose the SD-SFF (State Dependent Scan Flip Flop), which significantly improves security with negligible design requirements for crypto hardware implementations.
Automatic False Path Identification and Test Synthesis System Development to Avoid Overtesting

Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research

Project Year :

2007

-

2009

SHI Youhua

Progress in LSI design and manufacturing technology makes it possible to integrate more functional blocks into a chip with high speed and low power consumption. However, it also leads to many new design challenges, one of which is design and test in the presence of false paths. Therefore, in this research, a new analysis and test synthesis system was developed for the low-cost design and test of next-generation LSIs. Using this system, novel test techniques were developed, specifically response compaction techniques and non-overtesting delay test methods.
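Research on variation-tolerant LSI design techniques based on timing error prediction

Grants-in-Aid for Scientific Research (Waseda University), Grant-in-Aid for Scientific Research (C)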

Misc

Energy-efficient and Real-time FPGA-based YOLOv6 accelerator for Object Detection

X. Sha, Z. Liu, Y. Meng, M. Yanagisawa, Y. Shi

129 - 134 2023.08 [Refereed]

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Optimizing Hardware-Friendly Object Detection Network for Edge Devices

Z. Liu, X. Sha, H. Tao, M. Yanagisawa, Y. Shi

124 - 128 2023.08 [Refereed]

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
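A dynamic sign language recognition system using an attention module deployable on edge devices

孟悦捷, 柳澤政生, 史又華

The 37th Annual Conference of the Japanese Society for Artificial Intelligence 2023.07 [Refereed]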

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
Detection of deepfake videos using an attention mask

小野尚紀, 史又華

The 37th Annual Conference of the Japanese Society for Artificial Intelligence 2023.07 [Refereed]

Authorship：Last author, Corresponding author

Research paper, summary (national, other academic conference)
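A study of a speech emotion recognition system using TFNN

新崎正人, 柳澤政生, 史又華

Japanese Society for Artificial Intelligence, 121st Meeting of the Special Interest Group on Fundamental Problems in Artificial Intelligence 2022.09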

Internal/External technical report, pre-print, etc.
Design of a self-powered triboelectric energy harvesting circuit

山本圭乃, 蘇怡瑞, 柳澤政生, 史又華

Proceedings of the Workshop on Circuits and Systems 35 53 - 58 2022.08 [Refereed]

Research paper, summary (national, other academic conference)
Experiments with actual piezoelectric devices for energy harvesting from human motion

山口航, 柳澤政生, 史又華

Proceedings of the Workshop on Circuits and Systems 35 263 - 268 2022.08 [Refereed]

Research paper, summary (national, other academic conference)
A low power SRAM design with leakage power reduction

伊藤卓, 戸川望, 柳澤政生, 史又華

Proceedings of the Workshop on Circuits and Systems 31 197 - 202 2018.05 [Refereed]

CiNii
MOSs SP-SSHI for low frequency piezoelectric energy harvesting

杉山貴紀, 戸川望, 柳澤政生, 史又華

Proceedings of the Workshop on Circuits and Systems 31 86 - 91 2018.05 [Refereed]

CiNii
Application and evaluation of CNN with approximate adders

井上雄太, 戸川望, 柳澤政生, 史又華

Proceedings of the Workshop on Circuits and Systems 31 191 - 196 2018.05 [Refereed]

CiNii
C-element based soft-error hardened latch designs

30 214 - 219 2017.05 [Refereed]

CiNii
Design of a soft error detection latch using internal node

30 220 - 225 2017.05 [Refereed]

CiNii
Maximum error distance-based optimization of GeAr circuits

30 7 - 12 2017.05 [Refereed]

CiNii
Self-powered switching magnetic transformer circuit for energy harvesting systems

30 1 - 6 2017.05 [Refereed]

CiNii
Fast and Low-power Soft-error Tolerant Fast-SEH Latch

29 220 - 224 2016.05 [Refereed]

CiNii
Timing-error-tolerant AES cipher

吉田慎之介, 史又華, 柳澤政生

IEICE technical report 115 ( 465 ) 73 - 78 2016.02

CiNii
In-situ Hardware-Trojan Authentication for Invalidating Malicious Functions

大屋優, 史又華, 柳澤政生

IEICE technical report 115 ( 465 ) 79 - 84 2016.02

CiNii
A Quantitative Criterion of Gate-Level Netlist Vulnerability

大屋優, 史又華, 山下哲孝

IEICE technical report 115 ( 339 ) 141 - 146 2015.12

CiNii
A Quantitative Criterion of Gate-Level Netlist Vulnerability

大屋優, 史又華, 山下哲孝

IEICE technical report 115 ( 338 ) 141 - 146 2015.12

CiNii
A low-power soft error tolerant latch scheme on 15nm process

田島咲季, 史又華, 戸川望

IEICE technical report 115 ( 338 ) 123 - 127 2015.12

CiNii
A low-power soft error tolerant latch scheme on 15nm process

田島咲季, 史又華, 戸川望

IEICE technical report 115 ( 339 ) 123 - 127 2015.12

CiNii
A-9-2 Low-power soft-error tolerant New-SEH latch scheme

TAJIMA Saki, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao

Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference 2015 106 - 106 2015.08

CiNii
A Low-Energy-Overhead and Interconnection-Delay Aware High-level Synthesis Algorithm for Delay Variation Compensation with Body Biasing

( 2015 ) 23 - 28 2015.08

CiNii
AES Encryption Circuit against Clock Glitch based Fault Analysis

平野大輔, 史又華, 戸川望

IEICE technical report 115 ( 21 ) 51 - 55 2015.05

CiNii
AES Encryption Circuit against Clock Glitch based Fault Analysis

平野大輔, 史又華, 戸川望, 柳澤政生

IPSJ SIG Technical Reports. SLDM [System LSI Design Methodology] 2015 ( 10 ) 1 - 5 2015.05

Recently, fault analysis has attracted a lot of attention as a new kind of side channel attack, in which attackers inject malicious faults through clock glitch generation, voltage change, or laser manipulation during the execution of a crypto circuit. Area-redundant and time-redundant methods have been proposed as countermeasures against fault analysis. However, they cause large area overhead or time overhead. Therefore, in this paper, we propose an AES circuit design that can detect timing faults caused by malicious clock glitches. Experimental results show that the proposed method can detect 100% of timing faults at only 4.9% post-layout area overhead.

CiNii
A low-power soft error tolerant latch scheme

TAJIMA Saki, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao

Technical report of IEICE. VLD 114 ( 476 ) 55 - 60 2015.03

With recent technology scaling, reduced reliability due to soft errors and increased power consumption have become unavoidable problems for logic circuits. We propose a low-power latch with high soft-error tolerance, called the TSPC-SEH latch, which is based on the Soft Error Hardened (SEH) latch and True Single Phase Clock (TSPC) logic. Compared with the SEH latch and the DICE latch, the proposed latch achieves 42% power reduction and 54% delay reduction.

DOI CiNii
A Score-Based Hardware-Trojan Identification Method for Gate-Level Netlists

OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 114 ( 476 ) 165 - 170 2015.03

Recently, digital ICs are often designed by outside vendors to reduce costs in the semiconductor industry. This introduces the risk that malicious attackers implement Hardware Trojans (HTs) in them. In this paper, we propose an HT identification method for gate-level netlists that does not use a golden netlist. First, we extract several features specific to Trojan nets from several HT-inserted benchmarks. Second, we assign scores to the Trojan net features and sum them for each net in the benchmarks. We can then find a score threshold for identifying HTs. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as HT-inserted and all the HT-free gate-level benchmarks as HT-free, in approximately three hours per benchmark.
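
The score-and-threshold idea in this abstract can be sketched in a few lines of Python. Everything concrete here is a placeholder: the feature names, the weights, and the threshold are hypothetical values chosen for illustration, since the abstract does not list the actual Trojan net features or scores.

```python
def net_score(features, weights):
    """Sum the scores of the Trojan-like features observed on one net."""
    return sum(weights[f] for f in features if f in weights)


def classify_netlist(nets, weights, threshold):
    """Flag a netlist as HT-inserted if any net reaches the score threshold.

    nets: mapping from net name to the set of feature names seen on it
          (hypothetical representation of an analyzed gate-level netlist).
    Returns (is_ht_inserted, per_net_scores).
    """
    scores = {name: net_score(feats, weights) for name, feats in nets.items()}
    max_score = max(scores.values(), default=0)
    return max_score >= threshold, scores


# Hypothetical feature weights and netlists, for illustration only.
WEIGHTS = {"low_switching": 2, "wide_fanin_cone": 1, "feeds_mux_select": 1}

suspect = {
    "n1": {"wide_fanin_cone"},
    "n2": {"low_switching", "wide_fanin_cone", "feeds_mux_select"},
}
clean = {"n1": {"wide_fanin_cone"}, "n2": set()}

print(classify_netlist(suspect, WEIGHTS, threshold=3))
print(classify_netlist(clean, WEIGHTS, threshold=3))
```

In a real flow, the threshold would be calibrated on known HT-inserted and HT-free benchmarks so that the two groups separate cleanly.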

CiNii
A Hardware Trojan Detection Method based on Trojan Net Features

OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 114 ( 426 ) 157 - 162 2015.01

Recently, digital ICs are often designed by outside vendors to reduce costs in the semiconductor industry. This introduces the risk that malicious attackers implement Hardware Trojans (HTs) in them. In particular, HTs can be easily inserted during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume golden netlists and the activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several characteristic features, and our method detects the nets having these features. Our approach assumes neither golden netlists nor the activation of HTs. We successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmarks. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

CiNii
A Hardware Trojan Detection Method based on Trojan Net Features

OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Computer systems 114 ( 427 ) 157 - 162 2015.01

CiNii
A Hardware Trojan Detection Method based on Trojan Net Features

OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report 114 ( 428 ) 157 - 162 2015.01

CiNii
A Hardware Trojan Detection Method based on Trojan Net Features

大屋優, 史又華, 柳澤政生, 戸川望

IPSJ SIG Technical Reports. SLDM [System LSI Design Methodology] 2015 ( 28 ) 1 - 6 2015.01

CiNii
Design of Flip-Flop with Timing Error Tolerance

SUZUKI Taito, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

Technical report of IEICE. VLD 114 ( 328 ) 45 - 50 2014.11

As integrated circuits are miniaturized, variation in circuit operating conditions grows, and the supply voltage and clock frequency margins required for a design increase. To relax these margins, circuit structures with timing error tolerance are being actively studied. In this paper, we propose two new transistor-level Time Borrowing Flip-Flops (TBFFs) that realize timing error tolerance by dynamically switching from flip-flop to latch operation. HSPICE simulation results show that the proposed TBFFs can achieve up to 28.1% power reduction compared with existing work.

CiNii
Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

KAWAMURA Kazushi, ABE Shinya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 114 ( 328 ) 51 - 56 2014.11

The propagation delay along each path inside an LSI varies widely depending on the input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm that identifies the paths to be optimized based on input data and reconfigures these paths. Our algorithm first identifies the paths to be optimized by incorporating timing error prediction circuits into a target circuit and running it in practice. These paths are then dynamically reconfigured within an accuracy constraint with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can achieve a performance increase of up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

CiNii
An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

YOSHIDA Shinnosuke, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 114 ( 328 ) 57 - 62 2014.11

As process technologies advance, process and delay variations complicate timing design, and in-situ timing error correction techniques are strongly required. Suspicious timing error prediction (STEP) predicts timing errors by monitoring checkpoints with STEP circuits (STEPCs), so how checkpoints are inserted is very important. We have previously proposed a network-flow-based checkpoint insertion algorithm for STEP. However, that algorithm may ignore long paths and insert checkpoints near the output. In this paper, we improve how short paths are ignored and how labels are set by estimating path lengths. As a result, only short paths are ignored and checkpoints are inserted near the center of all long paths. We evaluate our algorithm by applying it to four benchmark circuits. Experimental results show that the proposed algorithm realizes an average of 1.71X overclocking compared with a design without STEPCs. Furthermore, the improved algorithm realizes an average of 1.15X overclocking compared with our original algorithm.

CiNii
High speed design of sub-threshold circuit by using DTMOS

FUKUDOME Yuji, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

Technical report of IEICE. VLD 114 ( 328 ) 117 - 121 2014.11

Low power consumption is achieved by operating circuits in the sub-threshold region. However, operating speed becomes slow in this region, so the tradeoff between power and speed must be considered carefully. In this work, we present DTMOS implementations that realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed can be improved by 30% to 45%, and an average energy reduction of 15% can be achieved when Vdd ranges from 0.2 V to 0.3 V.

CiNii
A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 114 ( 328 ) 135 - 140 2014.11

Recently, digital ICs are often designed by outside vendors to reduce design costs in the semiconductor industry. This introduces the risk that malicious attackers implement Hardware Trojans (HTs) in ICs. HTs are particularly easy to insert during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume golden netlists and the activation of HTs. This paper proposes an HT detection method based on detecting LSLG nets, which have low switching probabilities. Our approach assumes neither golden netlists nor the activation of HTs. We find that all HT-inserted gate-level netlists from the Trust-HUB benchmarks include a small number of LSLG nets. It takes approximately ten minutes to detect LSLG nets in each benchmark.
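
To make "low switching probability" concrete, the following Python sketch estimates signal probabilities in a tiny combinational netlist and flags nets that rarely toggle. It rests on textbook simplifications that are assumptions here, not the paper's definition of LSLG nets: primary inputs are 1 with probability 0.5, gate inputs are treated as independent, and a net with signal probability p is taken to switch with probability 2p(1 - p). The netlist format and function names are hypothetical.

```python
from math import prod


def signal_probabilities(primary_inputs, gates):
    """Propagate the probability of each net being 1.

    gates: list of (output_net, gate_type, input_nets) in topological order,
           with gate_type one of "AND", "OR", "NOT" (hypothetical format).
    """
    p = {name: 0.5 for name in primary_inputs}
    for out, kind, ins in gates:
        vals = [p[i] for i in ins]
        if kind == "AND":
            p[out] = prod(vals)
        elif kind == "OR":
            p[out] = 1 - prod(1 - v for v in vals)
        elif kind == "NOT":
            p[out] = 1 - vals[0]
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return p


def low_switching_nets(primary_inputs, gates, threshold):
    """Return nets whose estimated switching probability is below threshold."""
    p = signal_probabilities(primary_inputs, gates)
    return sorted(n for n, v in p.items() if 2 * v * (1 - v) < threshold)


inputs = list("abcdefgh")
gates = [
    ("trigger", "AND", inputs),   # fires only when all eight inputs are 1
    ("normal", "OR", ["a", "b"]),
]
print(low_switching_nets(inputs, gates, threshold=0.05))
```

A wide AND such as `trigger` is the classic shape of a rarely activated Trojan trigger, which is why its estimated switching probability falls far below that of ordinary logic.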

CiNii
Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 114 ( 328 ) 203 - 208 2014.11

An HDR-mcv architecture, which integrates multiple supply voltages and multiple clock domains into high-level synthesis and enables interconnection delay effects to be estimated during high-level synthesis, has been proposed together with a corresponding synthesis algorithm. The algorithm assigns voltages and clock frequencies to huddles, which are the partitions used for interconnection delay estimation during high-level synthesis. However, this voltage and clock assignment may incur energy overhead due to the increased number of clock trees. In this paper, we propose a new HDR-mcv architecture in which supply voltages are assigned to functional logic and clock synchronization logic separately. We then propose a high-level synthesis algorithm for the architecture that assigns clock frequencies and supply voltages on the basis of placement and energy information. Experimental results show that the proposed method achieves 50% energy saving compared with the conventional HDR-mcv architecture and 60% energy saving compared with existing high-level synthesis methods.

CiNii
Design of Flip-Flop with Timing Error Tolerance

SUZUKI Taito, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

IEICE technical report. Dependable computing 114 ( 329 ) 45 - 50 2014.11

CiNii
Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

KAWAMURA Kazushi, ABE Shinya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 114 ( 329 ) 51 - 56 2014.11

CiNii
An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

YOSHIDA Shinnosuke, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 114 ( 329 ) 57 - 62 2014.11

CiNii
High speed design of sub-threshold circuit by using DTMOS

FUKUDOME Yuji, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

IEICE technical report. Dependable computing 114 ( 329 ) 117 - 121 2014.11

CiNii
A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 114 ( 329 ) 135 - 140 2014.11

CiNii
Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 114 ( 329 ) 203 - 208 2014.11

CiNii
Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

KAWAMURA Kazushi, ABE Shinya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

SIG Technical Reports: System and LSI Design Methodology (SLDM) 2014 ( 2 ) 1 - 6 2014.11

The propagation delay along each path inside an LSI varies widely depending on the input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm that identifies the paths to be optimized based on input data and reconfigures these paths. Our algorithm first identifies the paths to be optimized by incorporating timing error prediction circuits into a target circuit and running it in practice. These paths are then dynamically reconfigured within an accuracy constraint with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can achieve a performance increase of up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

CiNii
Design of Flip-Flop with Timing Error Tolerance

鈴木大渡, 史又華, 戸川望, 宇佐美公良, 柳澤政生

研究報告システムとLSIの設計技術（SLDM） 2014 ( 1 ) 1 - 6 2014.11

　View Summary

As integrated circuits are scaled down, circuit variation grows, and the supply voltage and clock frequency margins required for a design increase. To relax these margins, circuit structures that tolerate timing errors are being actively studied. In this paper, we propose two transistor-level structures for a Time Borrowing Flip-Flop (TBFF), which achieves timing error tolerance by switching dynamically between flip-flop and latch operation. HSPICE simulation results show that the proposed TBFF achieves up to 28.1% power reduction compared with existing works (the Japanese abstract states an energy reduction of up to 20.6%).

CiNii
An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

吉田慎之介, 史又華, 柳澤政生, 戸川望

研究報告システムとLSIの設計技術（SLDM） 2014 ( 3 ) 1 - 6 2014.11

　View Summary

As process technologies advance, process and delay variations complicate timing design, and in-situ timing error correction techniques are strongly required. Suspicious timing error prediction (STEP) predicts timing errors by monitoring checkpoints with STEP circuits (STEPCs), so where the checkpoints are inserted is very important. We have previously proposed a network-flow-based checkpoint insertion algorithm for STEP that takes circuit area into account. That algorithm ignores short paths to reduce the number of STEPCs, but it could also ignore long paths, and because it labels positions according to short paths, the checkpoints could be biased toward the latter half of a path, near the output. In this paper, we improve the short-path search and the labeling by estimating path lengths. As a result, only short paths are ignored, timing errors on long paths that previously received no STEPC can be predicted, and checkpoints are inserted near the center of all long paths. We evaluate the improved algorithm by applying it to four benchmark circuits. Experimental results show that it realizes an average of 1.71X overclocking compared with inserting no STEPC and an average of 1.15X overclocking compared with our original algorithm.

CiNii
High speed design of sub-threshold circuit by using DTMOS

福留祐治, 史又華, 戸川望, 宇佐美公良, 柳澤政生

研究報告システムとLSIの設計技術（SLDM） 2014 ( 21 ) 1 - 5 2014.11

　View Summary

Low power consumption can be achieved by operating circuits in the sub-threshold region. However, operating speed degrades in this region, so the tradeoff between power and speed should be considered carefully. In this work, we present DTMOS implementations that realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed is improved by 30% to 45% and that energy is reduced by 15% on average at Vdd = 0.2 V and 0.3 V.

CiNii
A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

大屋優, 史又華, 柳澤政生, 戸川望

研究報告システムとLSIの設計技術（SLDM） 2014 ( 24 ) 1 - 6 2014.11

　View Summary

Recently, the design and manufacture of digital ICs have increasingly been outsourced to third-party vendors to reduce costs. This introduces the risk that malicious attackers insert hardware Trojans (HTs) into ICs. HTs are particularly easy to insert during the design phase, yet they are difficult to detect in that phase. Previous research on gate-level netlists therefore assumes golden netlists and the activation of HTs. This paper proposes an HT detection method that requires neither a golden netlist nor HT activation and instead detects LSLG nets, which are nets with low switching probabilities. Although LSLG nets account for only a few percent of the nets in a netlist, the method successfully detected part of the HT in every HT-inserted gate-level netlist published in the Abstraction Gate Level category of the Trust-HUB benchmarks. Detecting LSLG nets takes on the order of ten minutes for each benchmark.
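The idea of flagging nets with low switching probability can be illustrated with a minimal Python sketch. It is not the authors' LSLG detection procedure: the netlist format, the function names `signal_probabilities` and `low_switching_nets`, the independence assumption on signals, the estimate 2p(1-p) for switching probability, and the threshold are all assumptions made here for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical netlist format (an assumption for illustration):
# output net -> (gate type, list of input nets), listed in topological order.
Netlist = Dict[str, Tuple[str, List[str]]]


def signal_probabilities(primary_inputs: List[str], netlist: Netlist) -> Dict[str, float]:
    """Estimate P(net = 1) for every net, assuming independent, unbiased inputs."""
    prob = {name: 0.5 for name in primary_inputs}
    for net, (kind, ins) in netlist.items():
        p = [prob[i] for i in ins]
        if kind == "AND":
            value = 1.0
            for x in p:
                value *= x
        elif kind == "OR":
            none_high = 1.0
            for x in p:
                none_high *= 1.0 - x
            value = 1.0 - none_high
        elif kind == "NOT":
            value = 1.0 - p[0]
        elif kind == "XOR":  # two-input XOR only
            value = p[0] * (1.0 - p[1]) + p[1] * (1.0 - p[0])
        else:
            raise ValueError(f"unsupported gate type: {kind}")
        prob[net] = value
    return prob


def low_switching_nets(prob: Dict[str, float], threshold: float) -> List[str]:
    """Return nets whose estimated switching probability 2p(1-p) is below threshold."""
    return sorted(n for n, p in prob.items() if 2.0 * p * (1.0 - p) < threshold)


# Toy circuit: a rarely-true 8-input trigger XORed into a data signal.
inputs = [f"a{i}" for i in range(8)] + ["d"]
netlist: Netlist = {
    "t1": ("AND", ["a0", "a1", "a2", "a3"]),
    "t2": ("AND", ["a4", "a5", "a6", "a7"]),
    "trig": ("AND", ["t1", "t2"]),
    "out": ("XOR", ["d", "trig"]),
}
prob = signal_probabilities(inputs, netlist)
print(low_switching_nets(prob, threshold=0.05))
```

In this toy circuit only the trigger net falls below the threshold, which is the kind of net a Trojan trigger would typically drive. A real netlist has reconvergent fan-out, so the independence assumption gives only an approximation.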

CiNii
Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

阿部晋矢, 史又華, 宇佐美公良, 柳澤政生, 戸川望

研究報告システムとLSIの設計技術（SLDM） 2014 ( 40 ) 1 - 6 2014.11

　View Summary

An HDR-mcv architecture, which integrates multiple supply voltages and multiple clock domains into high-level synthesis and enables interconnection delay effects to be estimated during high-level synthesis, has been proposed together with a corresponding synthesis algorithm. That method assigns voltages and clock frequencies to huddles, which are the partitions used for interconnection delay estimation during high-level synthesis. However, this voltage and clock assignment may incur energy overhead because of the increased number of clock trees. In this paper, we propose a new HDR-mcv architecture in which supply voltages are assigned to functional logic and clock synchronization logic separately, so that multiple clock domains and multiple supply voltages can be applied together without increasing the number of clock trees. We then propose a high-level synthesis algorithm for the architecture, which assigns clock frequencies and supply voltages on the basis of placement and energy information. Experimental results show that the proposed method achieves about 50% energy saving compared with the conventional HDR-mcv architecture and about 60% energy saving compared with the existing high-level synthesis methods.

CiNii
Local pulse generation in variable stages pipeline designs for low energy consumption

NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

Technical report of IEICE. VLD 114 ( 231 ) 7 - 12 2014.10

　View Summary

Increased energy consumption resulting from improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it reduces glitches by using a special LDS-cell (Latch D-FF selector cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with the conventional VSP.

CiNii
Local pulse generation in variable stages pipeline designs for low energy consumption

NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

IEICE technical report. Image engineering 114 ( 233 ) 7 - 12 2014.10

　View Summary

Increased energy consumption resulting from improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it reduces glitches by using a special LDS-cell (Latch D-FF selector cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with the conventional VSP.

CiNii
Local pulse generation in variable stages pipeline designs for low energy consumption

NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

Technical report of IEICE. ICD 114 ( 232 ) 7 - 12 2014.10

　View Summary

Increased energy consumption resulting from improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it reduces glitches by using a special LDS-cell (Latch D-FF selector cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with the conventional VSP.

CiNii
Local pulse generation in variable stages pipeline designs for low energy consumption

Takayuki Nii, Youhua Shi, Nozomu Togawa, Kimiyoshi Usami, Masao Yanagisawa

研究報告システムとLSIの設計技術（SLDM） 2014 ( 2 ) 1 - 6 2014.09

　View Summary

Increased energy consumption resulting from improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it reduces glitches by using a special LDS-cell (Latch D-FF selector cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that energy is reduced by 3.08% compared with the conventional VSP.

CiNii
An Effective Robust Design for Large Delay Variation Using Suspicious Timing-Error Prediction Scheme

吉田慎之介, 史又華, 柳澤政生, 戸川望

DAシンポジウム2014論文集 2014 61 - 66 2014.08

CiNii
A Network-flow-based Checkpoint Insertion Algorithm for Suspicious Timing Error Prediction Scheme

27 416 - 421 2014.08

CiNii
Latch-based AES Encryption Circuit Against Fault Analysis

SHI Youhua, TANIGUCHI Hiroaki, TOGAWA Nozomu, YANAGISAWA Masao

Technical report of IEICE. VLD 113 ( 454 ) 37 - 42 2014.03

　View Summary

Cryptography is generally considered secure because it is based on complicated mathematical theories. In recent years, however, attacks that target hardware implementations rather than the crypto algorithms themselves, such as fault analysis, have posed new security threats. Cryptographic circuits are prone to fault analysis, which attempts to retrieve secret data by means of malicious fault injection. Clock adjustment, voltage changes, and laser manipulation can be used to inject malicious faults during the execution of a crypto circuit. As countermeasures against fault analysis, area-redundant methods such as triple modular redundancy (TMR) and timing-redundant methods have been proposed at the cost of area or throughput. In this paper, we propose a latch-based AES encryption circuit, with 18.1% area overhead and 5% throughput improvement, which can detect all possible errors within the fault analysis region of clock-glitch-based fault analysis. In addition to detecting fault analysis, the proposed method prevents the transmission and use of erroneous results and can therefore guarantee the correctness of the final encrypted outputs.

CiNii
Secure scan design using improved random order scans and its evaluations

OYA Masaru, ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 113 ( 454 ) 43 - 48 2014.03

　View Summary

Scan test using scan chains is one of the most important DFT techniques. However, scan-based attacks that retrieve the secret key of a crypto circuit through its scan chains have been reported, so a secure scan architecture is strongly required to protect scan chains from such attacks. In this paper, we propose an improved version of random order scans as a secure scan architecture. In our improved random order scans, a scan chain is partitioned into multiple sub-chains, and the structure of the scan chain changes dynamically as enable signals select which sub-chain to scan out. We also discuss the testability and security of our improved random order scans and demonstrate their effectiveness through implementation results.
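The sub-chain selection described in this abstract can be pictured with a small behavioral model in Python. This is only a sketch of the ordering effect, not the authors' circuit: the helper names `partition` and `scan_out`, the flip-flop labels, and the rule that an empty sub-chain is simply skipped are assumptions made for illustration.

```python
from typing import List, Sequence


def partition(chain: Sequence[str], sizes: Sequence[int]) -> List[List[str]]:
    """Split one scan chain into consecutive sub-chains of the given sizes."""
    if sum(sizes) != len(chain):
        raise ValueError("sub-chain sizes must cover the whole chain")
    subs, start = [], 0
    for size in sizes:
        subs.append(list(chain[start:start + size]))
        start += size
    return subs


def scan_out(sub_chains: Sequence[Sequence[str]], enables: Sequence[int]) -> List[str]:
    """Shift out one value per cycle from the sub-chain picked by the enable signal.

    The last element of each sub-chain is treated as the one nearest the scan
    output. Cycles that select an already empty sub-chain are skipped, which is
    a simplification of real hardware behavior.
    """
    remaining = [list(sc) for sc in sub_chains]
    observed = []
    for selected in enables:
        if remaining[selected]:
            observed.append(remaining[selected].pop())
    return observed


chain = ["f0", "f1", "f2", "f3", "f4", "f5"]
subs = partition(chain, [2, 4])

# Two different enable sequences expose the same flip-flops in different orders.
order_a = scan_out(subs, [0, 1, 1, 0, 1, 1])
order_b = scan_out(subs, [1, 1, 1, 1, 0, 0])
print(order_a)
print(order_b)
```

Because the observed order depends on the enable sequence, an attacker who does not know that sequence cannot directly map scanned-out bits to flip-flop positions, while a tester who knows it can still reorder the data.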

CiNii
Experiment and Analysis on Temperature Dependence of Delay and Energy for Subthreshold Circuits

KUSHIDA Hiroki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

Technical report of IEICE. VLD 113 ( 454 ) 147 - 151 2014.03

　View Summary

Low voltage design has been used in order to reduce the energy dissipation of mobile network equipment. However, as supply voltage reduces into subthreshold region, performance degradation and environment variations become the primary design challenges. In this paper, we implemented a super-pipelined multiplier for subthreshold supply voltage. With super-pipeline, the performance and energy efficiency can be improved. Moreover, experimental evaluations on the temperature dependences of delay and energy are also conducted for analysis.

CiNii
Suspicious timing error prediction using check points

IGARASHI Hiroaki, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 113 ( 321 ) 39 - 44 2013.11

　View Summary

With advances in process technology, the timing design of LSIs has become more difficult, and timing error countermeasure techniques are becoming more important. Existing timing error detection/correction methods complicate timing design because of their complex structures. Furthermore, they correct errors by re-running operations, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP), which predicts timing errors and corrects them with a simple structure. STEP checks for possible timing errors by observing several checkpoints on signal paths. Because STEP is a prediction method, false positives can occur, and reducing them is one of the largest problems. In this paper, we propose a method that reduces false positives by optimizing the checkpoints. Experimental results show that the operating frequency is increased by up to 2.4 times and throughput is improved by up to 45%.

CiNii
Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 113 ( 321 ) 263 - 268 2013.11

　View Summary

In this paper, we propose a clock-energy-efficient high-level synthesis algorithm for the HDR-mcd architecture. In HDR-mcd, the entire chip is divided into several huddles. Huddles make it possible to account for synchronization between different clock domains, where interconnection delay must be considered during high-level synthesis. In our iterative-improvement-based algorithm, low-frequency clocks are assigned to non-critical huddles under resource and latency constraints to improve energy efficiency. Experimental results show that the proposed method achieves 20% clock energy saving and 10% total energy saving compared with existing methods that consider clock gating.

CiNii
Suspicious timing error prediction using check points

IGARASHI Hiroaki, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 113 ( 320 ) 39 - 44 2013.11

　View Summary

With advances in process technology, the timing design of LSIs has become more difficult, and timing error countermeasure techniques are becoming more important. Existing timing error detection/correction methods complicate timing design because of their complex structures. Furthermore, they correct errors by re-running operations, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP), which predicts timing errors and corrects them with a simple structure. STEP checks for possible timing errors by observing several checkpoints on signal paths. Because STEP is a prediction method, false positives can occur, and reducing them is one of the largest problems. In this paper, we propose a method that reduces false positives by optimizing the checkpoints. Experimental results show that the operating frequency is increased by up to 2.4 times and throughput is improved by up to 45%.

CiNii
Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 113 ( 320 ) 263 - 268 2013.11

　View Summary

In this paper, we propose a clock-energy-efficient high-level synthesis algorithm for the HDR-mcd architecture. In HDR-mcd, the entire chip is divided into several huddles. Huddles make it possible to account for synchronization between different clock domains, where interconnection delay must be considered during high-level synthesis. In our iterative-improvement-based algorithm, low-frequency clocks are assigned to non-critical huddles under resource and latency constraints to improve energy efficiency. Experimental results show that the proposed method achieves 20% clock energy saving and 10% total energy saving compared with existing methods that consider clock gating.

CiNii
Suspicious timing error prediction using check points

五十嵐博昭, 史又華, 柳澤政生, 戸川望

研究報告システムLSI設計技術（SLDM） 2013 ( 8 ) 1 - 6 2013.11

　View Summary

With advances in process technology, the timing design of LSIs has become more difficult, and timing error countermeasure techniques are becoming more important. Existing timing error detection/correction methods complicate timing design because of their complex structures. Furthermore, they correct errors by re-running operations, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP), which predicts timing errors and corrects them with a simple structure and a lower correction cost. STEP checks for possible timing errors by observing several checkpoints on signal paths. Because STEP is a prediction method, false positives can occur, and reducing them is one of the largest problems. In this paper, we propose a method that reduces false positives by optimizing the checkpoints. Experimental results show that the operating frequency is increased by up to 2.4 times and throughput is improved by up to 45%.

CiNii
Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

阿部晋矢, 史又華, 宇佐美公良, 柳澤政生, 戸川望

研究報告システムLSI設計技術（SLDM） 2013 ( 47 ) 1 - 6 2013.11

　View Summary

Clock signals account for a large share of the total energy consumed by an LSI, and techniques such as multiple clock domains and clock gating have been proposed. In this paper, we propose a clock-energy-efficient high-level synthesis algorithm for the multiple-clock-domain-oriented HDR-mcd architecture. In HDR-mcd, the entire chip is divided into partitions called huddles, within which communication in one clock cycle is guaranteed. Huddles are used to predict the effect of interconnection delay and to account for synchronization between different clock domains during high-level synthesis. In our iterative-improvement-based algorithm, a clock is assigned to each huddle, and low-frequency clocks are assigned to non-critical huddles under resource and latency constraints to improve energy efficiency. Experimental results show that the proposed method achieves 20% clock energy saving and 10% total energy saving compared with existing methods that consider only clock gating (the Japanese abstract states savings of about 30% in clock-tree energy and about 25% in total energy).

CiNii
A Consideration on Hardware Trojan Detection Specifying Trojan Path

Atobe Yuta, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

Proceedings of the Society Conference of IEICE 2013 48 - 48 2013.09

CiNii
Data Recoverable AES Circuit Against Differential Fault Analysis

Taniguchi Hiroaki, Shi Youhua, Togawa Nozomu, Yanagisawa Masao

Proceedings of the Society Conference of IEICE 2013 49 - 49 2013.09

CiNii
Random Order Scan Design against Scan-Based Attacks

26 448 - 453 2013.07

CiNii
Energy-Efficient High-Level Synthesis with Multiple Clock Domain for HDR-mcd

阿部晋矢, 史又華, 宇佐美公良

回路とシステムワークショップ論文集 Workshop on Circuits and Systems 26 185 - 190 2013.07

CiNii
フロアプランを考慮したマルチクロックドメイン指向の低電力化高位合成手法 (コンピュータシステム組込み技術とネットワークに関するワークショップETNET2013)

阿部晋矢, 史又華, 柳澤政生, 戸川望

電子情報通信学会技術研究報告 : 信学技報 112 ( 481 ) 115 - 120 2013.03

　View Summary

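In this paper, we propose HDR-mcd, an extension of the HDR architecture that supports multiple clock domains. We then propose a low-power high-level synthesis method for HDR-mcd oriented to multiple clock domains. The proposed method follows a synthesis flow that feeds back floorplan information and iteratively refines the result. It uses partitions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning low-frequency clocks as far as the resource and timing constraints allow. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.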

CiNii
フロアプランを考慮したマルチクロックドメイン指向の低電力化高位合成手法

阿部晋矢, 史又華, 柳澤政生, 戸川望

研究報告システムLSI設計技術（SLDM） 2013 ( 20 ) 1 - 6 2013.03

　View Summary

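In this paper, we propose HDR-mcd, an extension of the HDR architecture that supports multiple clock domains. We then propose a low-power high-level synthesis method for HDR-mcd oriented to multiple clock domains. The proposed method follows a synthesis flow that feeds back floorplan information and iteratively refines the result. It uses partitions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning low-frequency clocks as far as the resource and timing constraints allow. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.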

CiNii
フロアプランを考慮したマルチクロックドメイン指向の低電力化高位合成手法

阿部晋矢, 史又華, 柳澤政生, 戸川望

研究報告組込みシステム（EMB） 2013 ( 20 ) 1 - 6 2013.03

　View Summary

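In this paper, we propose HDR-mcd, an extension of the HDR architecture that supports multiple clock domains. We then propose a low-power high-level synthesis method for HDR-mcd oriented to multiple clock domains. The proposed method follows a synthesis flow that feeds back floorplan information and iteratively refines the result. It uses partitions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning low-frequency clocks as far as the resource and timing constraints allow. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.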

CiNii
フロアプランを考慮したマルチクロックドメイン指向の低電力化高位合成手法(動作合成,組込み技術とネットワークに関するワークショップETNET2013)

阿部晋矢, 史又華, 柳澤政生, 戸川望

電子情報通信学会技術研究報告. CPSY, コンピュータシステム 112 ( 481 ) 115 - 120 2013.03

　View Summary

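In this paper, we propose HDR-mcd, an extension of the HDR architecture that supports multiple clock domains. We then propose a low-power high-level synthesis method for HDR-mcd oriented to multiple clock domains. The proposed method follows a synthesis flow that feeds back floorplan information and iteratively refines the result. It uses partitions called huddles, within which communication in one clock cycle is guaranteed, to predict the effect of interconnection delay and to realize high-level synthesis that takes synchronization between different clocks into account. A clock is assigned to each huddle, and power is reduced by assigning low-frequency clocks as far as the resource and timing constraints allow. Computer experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.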

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 112 ( 320 ) 45 - 50 2012.11

　View Summary

Secure cryptographic LSIs are widely used to perform confidential operations. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, scan chains carry the risk of being misused to leak secret information. Therefore, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security against scan-based attacks without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan FF is updated is a problem. In this paper, we propose an update timing that enables online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in the scan chain. Experimental results on various crypto circuit implementations show that the proposed method compromises neither the secret key nor testability and demonstrate its effectiveness.

CiNii
SAAV:Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 112 ( 320 ) 135 - 140 2012.11

　View Summary

An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning and converges without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which relaxes these overheads as the iterations proceed. Experimental results show that our algorithm achieves 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and 65.7% energy saving compared with conventional algorithms.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 112 ( 321 ) 45 - 50 2012.11

　View Summary

Secure cryptographic LSIs are widely used to perform confidential operations. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, scan chains carry the risk of being misused to leak secret information. Therefore, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security against scan-based attacks without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan FF is updated is a problem. In this paper, we propose an update timing that enables online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value fixed at design time with arbitrary FFs in the scan chain. Experimental results on various crypto circuit implementations show that the proposed method compromises neither the secret key nor testability and demonstrate its effectiveness.

CiNii
SAAV:Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Dependable computing 112 ( 321 ) 135 - 140 2012.11

　View Summary

An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning and converges without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which relaxes these overheads as the iterations proceed. Experimental results show that our algorithm achieves 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and 65.7% energy saving compared with conventional algorithms.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

跡部悠太, 史又華, 柳澤政生, 戸川望

研究報告システムLSI設計技術（SLDM） 2012 ( 9 ) 1 - 6 2012.11

　View Summary

Cryptographic LSIs are used to perform confidential operations and therefore must themselves be secure. Scan test is a design-for-testability technique with high fault coverage whose importance has grown with the increasing scale of LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, scan-based attacks on various crypto circuits have been reported, so scan chains carry the risk of being misused to leak secret information. Therefore, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security against scan-based attacks without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan flip-flop is updated is an important problem. In this paper, we propose an update timing that enables online test without sacrificing security. The update timing is determined by the result of comparing a KEY value fixed at design time with arbitrary flip-flops in the scan chain. We implemented the proposed method on RSA, AES, and DES crypto circuits and evaluated it. Experimental results show that the proposed method compromises neither the secret key nor testability and is effective for various crypto circuits.

CiNii
SAAV : Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

阿部晋矢, 史又華, 宇佐美公良, 柳澤政生, 戸川望

研究報告システムLSI設計技術（SLDM） 2012 ( 24 ) 1 - 6 2012.11

　View Summary

An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning with floorplan feedback, and it converges without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which relaxes these overheads as the iterations proceed. Experimental results show that our algorithm achieves up to 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and up to 65.7% energy saving compared with conventional high-level synthesis algorithms.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. ICD 112 ( 247 ) 95 - 100 2012.10

　View Summary

Scan test is a useful design-for-testability technique that detects circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. To counter such scan-based attacks, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan FF is updated is a problem. In this paper, we propose an update timing that enables online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, fixed at design time, with arbitrary FFs in the scan chain. We show that with the proposed method neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs the area overhead is 0.555%, which is less than that of the conventional method.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

電子情報通信学会技術研究報告. ICD, 集積回路 112 ( 247 ) 95 - 100 2012.10

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Signal processing 112 ( 246 ) 95 - 100 2012.10

　View Summary

Scan test is a useful design-for-testability technique that detects circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. To counter such scan-based attacks, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan FF is updated is a problem. In this paper, we propose an update timing that enables online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, fixed at design time, with arbitrary FFs in the scan chain. We show that with the proposed method neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs the area overhead is 0.555%, which is less than that of the conventional method.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 112 ( 245 ) 95 - 100 2012.10

　View Summary

Scan test is a useful design-for-testability technique that detects circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. To counter such scan-based attacks, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan FF is updated is a problem. In this paper, we propose an update timing that enables online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, fixed at design time, with arbitrary FFs in the scan chain. We show that with the proposed method neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs the area overhead is 0.555%, which is less than that of the conventional method.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Image engineering 112 ( 248 ) 95 - 100 2012.10

　View Summary

Scan test is a useful design-for-testability technique that detects circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. To counter such scan-based attacks, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan FF is updated is a problem. In this paper, we propose an update timing that enables online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, fixed at design time, with arbitrary FFs in the scan chain. We show that with the proposed method neither the secret key nor the testability of an RSA circuit implementation is compromised. According to the results, even with 100 SDSFFs the area overhead is 0.555%, which is less than that of the conventional method.

CiNii
Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

跡部悠太, 史又華, 柳澤政生, 戸川望

研究報告システムLSI設計技術（SLDM） 2012 ( 18 ) 1 - 6 2012.10

　View Summary

Scan test is a commonly used design-for-testability technique with high fault coverage. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through the scan chains used in scan test. To counter such scan-based attacks, a secure scan architecture using SDSFF (State Dependent Scan Flip-Flop), which achieves high security without compromising testability, has been proposed. In SDSFF, the timing at which the latch added to each scan flip-flop is updated is an important problem. In this paper, we propose an update timing that enables online test without sacrificing security. The update timing is determined by comparing arbitrary flip-flops in the scan chain with a KEY value fixed at design time. We implemented the proposed secure scan architecture on an RSA circuit and evaluated it, showing that neither the secret key nor the testability of the implementation is compromised. According to the results, even with 100 SDSFFs the area overhead is at most 0.555%, which is less than that of the conventional method.

CiNii
A-3-4 AES Cryptosystem Using Clock Falling Edge Against DFA

Igarashi Hiroaki, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

Proceedings of the Society Conference of IEICE 2012 51 - 51 2012.08

CiNii
A-3-5 Secure Scan Architecture Using State Dependent Scan Flip-Flop with Feedback

Atobe Yuta, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

Proceedings of the Society Conference of IEICE 2012 52 - 52 2012.08

CiNii
Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Signal processing 112 ( 115 ) 115 - 120 2012.06

　View Summary

Scan test, a useful design-for-testability technique that can control and observe the FFs (flip-flops) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks that use the scan chain to retrieve the secret keys of cryptographic LSIs have been considered. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while keeping high testability. In our method, a latch is added to arbitrary FFs in the scan chain so that the scan data changes state-dependently into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. The tester can still perform a normal scan test because the tester knows the structure of the extended circuit. We analyzed an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

CiNii
Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Mathematical Systems Science and its Applications : IEICE technical report 112 ( 116 ) 115 - 120 2012.06

　View Summary

Scan test, a useful design-for-testability technique that can control and observe the FFs (flip-flops) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks that use the scan chain to retrieve the secret keys of cryptographic LSIs have been considered. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while keeping high testability. In our method, a latch is added to arbitrary FFs in the scan chain so that the scan data changes state-dependently into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. The tester can still perform a normal scan test because the tester knows the structure of the extended circuit. We analyzed an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

CiNii
Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

IEICE technical report. Circuits and systems 112 ( 113 ) 115 - 120 2012.06

　View Summary

Scan test, a useful design-for-testability technique that can control and observe the FFs (flip-flops) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks that use the scan chain to retrieve the secret keys of cryptographic LSIs have been considered. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while keeping high testability. In our method, a latch is added to arbitrary FFs in the scan chain so that the scan data changes state-dependently into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. The tester can still perform a normal scan test because the tester knows the structure of the extended circuit. We analyzed an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

CiNii
Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

Technical report of IEICE. VLD 112 ( 114 ) 115 - 120 2012.06

　View Summary

Scan test, a useful design-for-testability technique that can control and observe the FFs (flip-flops) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks that use the scan chain to retrieve the secret keys of cryptographic LSIs have been considered. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while keeping high testability. In our method, a latch is added to arbitrary FFs in the scan chain so that the scan data changes state-dependently into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. The tester can still perform a normal scan test because the tester knows the structure of the extended circuit. We analyzed an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

CiNii
An Energy-efficient ASIP Synthesis Method Using Scratchpad Memory and Code Placement Optimization

SHIMADA Yoshinori, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 110 ( 432 ) 25 - 30 2011.02

　View Summary

In this paper, we propose an energy-efficient ASIP synthesis method using scratchpad memory. Because a significant amount of power is consumed in the instruction memory, developing an energy-efficient memory structure is important for reducing the overall power consumption of the system. Our method is based on using scratchpad memory with code placement optimization. The proposed memory architecture can copy data from instruction memory to scratchpad memory under the control of the on-chip program counter. Given the CFG of an input application, the proposed code placement optimization decides both the code allocation and the required scratchpad memory size so as to minimize energy. The total energy consumption is reduced because the number of instruction memory accesses is reduced. Experimental results on Mediabench show the effectiveness of the proposed method, which reduces energy consumption by 47.9% on average.
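One way to picture the code placement decision in this abstract is as a 0/1 knapsack problem: pick the basic blocks whose placement in a scratchpad of fixed capacity avoids the most instruction-memory fetches. The sketch below is an illustration under that assumption and is not the authors' algorithm. The helper `choose_blocks`, the block names, sizes, and fetch counts are all hypothetical, and energy saved is approximated simply by the number of fetches avoided.

```python
from typing import List, Tuple

# A block is (name, size_in_words, execution_count); all values are hypothetical.
Block = Tuple[str, int, int]


def choose_blocks(blocks: List[Block], capacity: int) -> Tuple[int, List[str]]:
    """Return (fetches_avoided, chosen_block_names) for a scratchpad of
    `capacity` words, using a standard 0/1 knapsack dynamic program."""
    # best[c] holds the best (fetches_avoided, names) using at most c words.
    best: List[Tuple[int, List[str]]] = [(0, []) for _ in range(capacity + 1)]
    for name, size, count in blocks:
        # Assumption: each execution of a block fetches `size` instruction words.
        gain = size * count
        # Iterate capacities downward so each block is used at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


if __name__ == "__main__":
    example = [("loop_a", 4, 100), ("init", 6, 1), ("loop_b", 3, 80)]
    print(choose_blocks(example, capacity=8))
```

Repeating the call for several capacities gives a rough view of how the benefit grows with scratchpad size, which is the other quantity the abstract says the optimization decides.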

CiNii
Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

情報処理学会研究報告システムLSI設計技術（SLDM） 2008 ( 111 ) 55 - 59 2008.11

　View Summary

Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. A design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted, which makes the internal scan architecture more difficult to discover. We analyzed an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side channel attacks.

CiNii
Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 108 ( 298 ) 55 - 59 2008.11

　View Summary

Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. A design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted, which makes the internal scan architecture more difficult to discover. We analyzed an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side channel attacks.

CiNii
Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 108 ( 299 ) 55 - 59 2008.11

　View Summary

Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. A design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted, which makes the internal scan architecture more difficult to discover. We analyzed an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side channel attacks.

CiNii
An Energy-efficient ASIP Synthesis Method Based on Reducing Bit-width of Instruction Memory

KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 107 ( 509 ) 25 - 30 2008.03

　View Summary

This paper proposes an energy-efficient ASIP synthesis method based on reducing the bit-width of the instruction memory. VLIW-type processors can execute several instructions concurrently. However, their instruction memory requires a long bit-width, which wastefully increases power and energy consumption. Reducing the bit-width of the instruction memory can therefore realize high-performance and energy-efficient processors. The bit-width of an instruction memory depends on the instruction encoding format, which is composed of the opcode and the operands of an instruction. The opcode bit-width depends on the number of instructions in the instruction set, and the operand bit-width depends on the number of general-purpose registers. Moreover, to reduce the opcode bit-width, we introduce the concept of a combined instruction, which is handled as one instruction and is composed of several instructions issued concurrently in the VLIW slots. We develop an energy-efficient ASIP synthesis system comprising three algorithms: an opcode bit-width reduction algorithm, an operand bit-width reduction algorithm, and an energy minimization algorithm. Experimental results confirm a 9% to 12% reduction in energy consumption for the whole processor system, including memories.
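The dependence of instruction width on instruction count and register count can be made concrete with a little arithmetic. The sketch below assumes a plain fixed-width binary encoding in which an opcode field needs ceil(log2(number of instructions)) bits and each register operand needs ceil(log2(number of registers)) bits; immediate fields and the combined-instruction encoding from the abstract are ignored. The function names and the example figures are hypothetical and are not taken from the paper.

```python
def field_bits(choices: int) -> int:
    """Bits needed to distinguish `choices` values in a plain binary field."""
    if choices < 1:
        raise ValueError("choices must be positive")
    # (choices - 1).bit_length() equals ceil(log2(choices)) for choices >= 2.
    return max(1, (choices - 1).bit_length())


def slot_bits(num_instructions: int, num_registers: int, operands: int) -> int:
    """Width of one instruction slot: one opcode field plus register operands."""
    return field_bits(num_instructions) + operands * field_bits(num_registers)


def vliw_word_bits(slots: int, num_instructions: int,
                   num_registers: int, operands: int) -> int:
    """Width of a VLIW instruction word with `slots` identical slots."""
    return slots * slot_bits(num_instructions, num_registers, operands)


if __name__ == "__main__":
    # Hypothetical 4-slot machine before and after trimming the instruction
    # set (64 -> 20 instructions) and register file (32 -> 16 registers).
    print(vliw_word_bits(4, 64, 32, 3))  # 4 * (6 + 3*5) = 84 bits
    print(vliw_word_bits(4, 20, 16, 3))  # 4 * (5 + 3*4) = 68 bits
```

In this toy setting, trimming unused instructions and registers narrows every word fetched from instruction memory, which is the effect the abstract attributes to its opcode and operand bit-width reduction steps.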

CiNii
An energy-efficient ASIP synthesis method based on reducing bit-width of instruction memory

KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 107 ( 506 ) 25 - 30 2008.03

　View Summary

This paper proposes an energy-efficient ASIP synthesis method based on reducing the bit-width of the instruction memory. VLIW-type processors can execute several instructions concurrently. However, their instruction memory requires a long bit-width, which wastefully increases power and energy consumption. Reducing the bit-width of the instruction memory can therefore realize high-performance and energy-efficient processors. The bit-width of an instruction memory depends on the instruction encoding format, which is composed of the opcode and the operands of an instruction. The opcode bit-width depends on the number of instructions in the instruction set, and the operand bit-width depends on the number of general-purpose registers. Moreover, to reduce the opcode bit-width, we introduce the concept of a combined instruction, which is handled as one instruction and is composed of several instructions issued concurrently in the VLIW slots. We develop an energy-efficient ASIP synthesis system comprising three algorithms: an opcode bit-width reduction algorithm, an operand bit-width reduction algorithm, and an energy minimization algorithm. Experimental results confirm a 9% to 12% reduction in energy consumption for the whole processor system, including memories.

CiNii
Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 107 ( 101 ) 43 - 48 2007.06

　View Summary

Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), a form of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and must be scalable because the bit length of operands varies with the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous work. However, either a change of frequency or a dual-radix architecture is necessary to handle the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable, dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed with the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles added by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.
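For readers unfamiliar with Montgomery multiplication, the sketch below shows the textbook bit-serial (radix-2) form for the prime-field case only. It is not the dual-radix, parallelized architecture proposed in the paper, and it does not cover GF(2^n). The function names and the small modulus in the example are hypothetical choices for illustration.

```python
def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Return a * b * 2^(-k) mod n for odd n < 2^k and 0 <= a, b < n,
    processing one bit of `a` per step (radix-2 Montgomery multiplication)."""
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b   # add b when bit i of a is set
        if s & 1:                 # make s even by adding the odd modulus
            s += n
        s >>= 1                   # exact division by 2
    # s stays below 2n throughout, so one conditional subtraction suffices.
    return s - n if s >= n else s


def to_mont(x: int, n: int, k: int) -> int:
    """Map x to the Montgomery domain: x * 2^k mod n."""
    r2 = pow(2, 2 * k, n)         # R^2 mod n, where R = 2^k
    return mont_mul(x, r2, n, k)


def from_mont(x: int, n: int, k: int) -> int:
    """Map x back from the Montgomery domain: x * 2^(-k) mod n."""
    return mont_mul(x, 1, n, k)


if __name__ == "__main__":
    n, k = 97, 7                  # small odd modulus with n < 2^7
    a, b = 45, 76
    product = mont_mul(to_mont(a, n, k), to_mont(b, n, k), n, k)
    print(from_mont(product, n, k), (a * b) % n)
```

A higher radix handles several bits of the operand per step, so fewer steps are needed but each step is a longer combinational path; that trade-off is what the abstract refers to when it applies a lower radix to the GF(P) hardware.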

CiNii
Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 107 ( 105 ) 43 - 48 2007.06

　View Summary

Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), a form of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and must be scalable because the bit length of operands varies with the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous work. However, either a change of frequency or a dual-radix architecture is necessary to handle the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable, dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed with the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles added by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

CiNii
Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 107 ( 103 ) 43 - 48 2007.06

　View Summary

Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), a form of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and must be scalable because the bit length of operands varies with the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous work. However, either a change of frequency or a dual-radix architecture is necessary to handle the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable, dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed with the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles added by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

CiNii
CoDaMa: An XML-based Framework for Manipulating CDFGs

KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

情報処理学会研究報告システムLSI設計技術（SLDM） 2007 ( 2 ) 73 - 78 2007.01

　View Summary

This paper proposes an XML-based framework for manipulating CDFGs (Control Data Flow Graphs) in HW/SW (Hardware/Software) co-synthesis systems and high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis and high-level synthesis systems, CDFGs are often adopted as the internal representation of input application programs. These systems automatically explore the design space with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, as recent SoC (System on a Chip) applications grow in scale, synthesis systems must implement more advanced functions, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

CiNii
CoDaMa : An XML-based Framework for Manipulation CDFGs

KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 106 ( 456 ) 19 - 24 2007.01

　View Summary

This paper proposes an XML-based framework for manipulating CDFGs (Control Data Flow Graphs) in HW/SW (Hardware/Software) co-synthesis systems and high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis and high-level synthesis systems, CDFGs are often adopted as the internal representation of input application programs. These systems automatically explore the design space with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, as recent SoC (System on a Chip) applications grow in scale, synthesis systems must implement more advanced functions, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

CiNii
CoDaMa : An XML-based Framework for Manipulating CDFGs

KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 106 ( 458 ) 19 - 24 2007.01

　View Summary

This paper proposes an XML-based framework for manipulating CDFGs (Control Data Flow Graphs) in HW/SW (Hardware/Software) co-synthesis systems and high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis and high-level synthesis systems, CDFGs are often adopted as the internal representation of input application programs. These systems automatically explore the design space with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, as recent SoC (System on a Chip) applications grow in scale, synthesis systems must implement more advanced functions, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

CiNii
CoDaMa : An XML-based Framework for Manipulation CDFGs

KOARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 106 ( 454 ) 19 - 24 2007.01

　View Summary

This paper proposes an XML-based framework for manipulating CDFGs (Control Data Flow Graphs) in HW/SW (Hardware/Software) co-synthesis systems and high-level synthesis systems. A CDFG is composed of a CFG (Control Flow Graph) and DFGs (Data Flow Graphs). In HW/SW co-synthesis and high-level synthesis systems, CDFGs are often adopted as the internal representation of input application programs. These systems automatically explore the design space with various optimization algorithms in order to synthesize hardware and software that satisfy performance requirements and design constraints. However, as recent SoC (System on a Chip) applications grow in scale, synthesis systems must implement more advanced functions, which increases development effort. In the proposed framework, developers implement algorithms as modules and construct synthesis systems by combining the modules, which improves development productivity. Developers can implement algorithms and construct systems easily because the framework uses XML descriptions as the intermediate representation of application programs and provides the input/output interface.

CiNii
A Forwarding Unit Optimization Method for Application Processors

HIURA Toshihiro, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

情報処理学会研究報告システムLSI設計技術（SLDM） 2006 ( 126 ) 181 - 186 2006.11

　View Summary

To meet the requirements of application-specific processor design, such as area, cost, performance, and design time, we have been developing a HW/SW co-design system called SPADES, which generates an application-specific processor with minimum area under a constraint on the execution time of an application. In SPADES, we reduce the area by removing unnecessary HW units and then changing the instruction set. However, changing the instruction set affects the processor architecture. Moreover, the forwarding unit easily becomes the critical path as the processor architecture becomes complex. In this paper, we therefore focus on optimizing the forwarding unit by trading off area against delay without any changes to the instruction set. We also propose a new forwarding unit architecture, called the foresight judgment type forwarding unit, which can be incorporated into SPADES to generate HDL descriptions automatically without any knowledge of our system. Experimental results show that the proposed method is more suitable for generating the optimized forwarding unit in HW/SW co-design systems.

CiNii
A Forwarding Unit Optimization Method for Application Processors

HIURA Toshihiro, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 106 ( 389 ) 49 - 54 2006.11

　View Summary

To meet the requirements of application-specific processor design, such as area, cost, performance, and design time, we have been developing a HW/SW co-design system called SPADES, which can generate an application-specific processor with minimum area under a constraint on the execution time of an application. In SPADES, we reduce area by removing unnecessary HW units and then changing the instruction set. However, changing the instruction set affects the processor architecture. In addition, the forwarding unit easily becomes the critical path as the processor architecture becomes complex. In this paper, we therefore focus on optimizing the forwarding unit by trading off area against delay without any changes to the instruction set. We also propose a new forwarding unit architecture, called the foresight judgment type forwarding unit, which can be incorporated into SPADES to generate HDL descriptions automatically without any knowledge of our system. Experimental results show that the proposed method is better suited to generating optimized forwarding units in HW/SW co-design systems.

CiNii
A Forwarding Unit Optimization Method for Application Processors

HIURA Toshihiro, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

IEICE technical report 106 ( 392 ) 49 - 54 2006.11

　View Summary

To meet the requirements of application-specific processor design, such as area, cost, performance, and design time, we have been developing a HW/SW co-design system called SPADES, which can generate an application-specific processor with minimum area under a constraint on the execution time of an application. In SPADES, we reduce area by removing unnecessary HW units and then changing the instruction set. However, changing the instruction set affects the processor architecture. In addition, the forwarding unit easily becomes the critical path as the processor architecture becomes complex. In this paper, we therefore focus on optimizing the forwarding unit by trading off area against delay without any changes to the instruction set. We also propose a new forwarding unit architecture, called the foresight judgment type forwarding unit, which can be incorporated into SPADES to generate HDL descriptions automatically without any knowledge of our system. Experimental results show that the proposed method is better suited to generating optimized forwarding units in HW/SW co-design systems.

CiNii


Industrial Property Rights

Signal processing device and signal processing method

SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, IGARASHI Hiroaki

Patent

J-GLOBAL
Fault attack detection circuit and cryptographic processing device

TOGAWA Nozomu, IGARASHI Hiroaki, SHI Youhua

Patent

J-GLOBAL

Syllabus

Master's Thesis (Department of Electronic and Physical Systems)

Graduate School of Fundamental Science and Engineering

2026 full year
Master's Thesis (Department of Electronic and Physical Systems)

Graduate School of Fundamental Science and Engineering

2026 full year
IoT System Design

Graduate School of Fundamental Science and Engineering

2026 spring semester
Hardware for Machine Learning

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Intelligent System Design D

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Intelligent System Design B

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Intelligent System Design A

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Intelligent System Design D

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Intelligent System Design C

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Intelligent System Design A

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Intelligent System Design B

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Integrated System Design C

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Integrated System Design B

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Integrated System Design A

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Integrated System Design D

Graduate School of Fundamental Science and Engineering

2026 fall semester
Research on Integrated System Design

Graduate School of Fundamental Science and Engineering

2026 full year
Hardware for Machine Learning

Graduate School of Fundamental Science and Engineering

2026 fall semester
Research on Intelligent System Design

Graduate School of Fundamental Science and Engineering

2026 full year
Seminar on Intelligent System Design C

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Integrated System Design B

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Integrated System Design A

Graduate School of Fundamental Science and Engineering

2026 spring semester
Seminar on Integrated System Design D

Graduate School of Fundamental Science and Engineering

2026 fall semester
Seminar on Integrated System Design C

Graduate School of Fundamental Science and Engineering

2026 spring semester
Hardware for Machine Learning

Graduate School of Fundamental Science and Engineering

2026 fall semester
Research on Intelligent System Design

Graduate School of Fundamental Science and Engineering

2026 full year
Research on Integrated System Design

Graduate School of Fundamental Science and Engineering

2026 full year
IoT System Design

Graduate School of Creative Science and Engineering

2026 spring semester
Research on Intelligent System Design

Graduate School of Fundamental Science and Engineering

2026 full year
Research on Integrated System Design

Graduate School of Fundamental Science and Engineering

2026 full year
Hardware for Machine Learning

Graduate School of Fundamental Science and Engineering

2026 fall semester
IoT System Design

Graduate School of Advanced Science and Engineering

2026 spring semester
Physical Electronics Laboratory B

School of Fundamental Science and Engineering

2026 spring quarter
Physical Electronics Seminar B

School of Fundamental Science and Engineering

2026 spring quarter
Physical Electronics Laboratory A

School of Fundamental Science and Engineering

2026 winter quarter
Physical Electronics Seminar A

School of Fundamental Science and Engineering

2026 winter quarter
Introduction to Electronic and physical systems [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Electronic and Physical Systems Practice A [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Electronic and Physical Systems Practice B

School of Fundamental Science and Engineering

2026 fall semester
Electronic and Physical Systems Practice A

School of Fundamental Science and Engineering

2026 spring semester
Logic Circuits

School of Fundamental Science and Engineering

2026 spring semester
Logic Circuits [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
IoT System Design

School of Fundamental Science and Engineering

2026 spring semester
IoT System Design

School of Fundamental Science and Engineering

2026 spring semester
Electronic and Physical Systems Laboratory B

School of Fundamental Science and Engineering

2026 spring semester
IoT System Design

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis B

School of Fundamental Science and Engineering

2026 spring semester
LSI Architecture

School of Fundamental Science and Engineering

2026 fall quarter
Bachelor Thesis B

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis B [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2026 fall semester
Bachelor Thesis A

School of Fundamental Science and Engineering

2026 spring semester
Electronic and Physical Systems Laboratory B [S Grade]

School of Fundamental Science and Engineering

2026 spring semester
Introduction to Electronic and physical systems

School of Fundamental Science and Engineering

2026 spring semester
Special Seminar on Electronic and Physical Systems

School of Fundamental Science and Engineering

2026 fall semester
Electronic and Physical Systems Laboratory C [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Electronic and Physical Systems Laboratory C

School of Fundamental Science and Engineering

2026 fall semester
Electronic and Physical Systems Laboratory A [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Electronic and Physical Systems Laboratory A

School of Fundamental Science and Engineering

2026 fall semester
Electronic and Physical Systems Practice B [S Grade]

School of Fundamental Science and Engineering

2026 fall semester
Electronic Circuits B [S Grade]

School of Fundamental Science and Engineering

2026 summer quarter
Electronic Circuits B

School of Fundamental Science and Engineering

2026 summer quarter
Electronic Circuits A [S Grade]

School of Fundamental Science and Engineering

2026 spring quarter
Electronic Circuits A

School of Fundamental Science and Engineering

2026 spring quarter


Sub-affiliation

Faculty of Science and Engineering Graduate School of Fundamental Science and Engineering

Research Institute

2024 - 2026

Waseda Research Institute for Science and Engineering Concurrent Researcher

2024 - 2026

Waseda Center for a Carbon Neutral Society Concurrent Researcher

Internal Special Research Projects

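Development of an Ocean Energy Power Generation System Using TENGs

2025

　View Summary

The ocean covers about 70% of the Earth's surface and has an extremely large potential for energy generation. In particular, ocean energy arises from the motion and gravitational interaction of the Earth, Moon, and Sun, and can be used as a stable, long-term source of electric power. In this research, we worked on developing an ocean energy power generation system based on the triboelectric nanogenerator (TENG), which exploits the triboelectric charging phenomenon. In particular, we built a compact, lightweight, high-efficiency harvesting system capable of recovering energy from irregular, low-frequency wave and ocean-current motion. Specifically, we designed rotary and sliding TENG systems and optimized the material selection, the electrode structure, and the connection scheme for multiple TENG elements. Furthermore, assuming use in a marine environment, we developed a buoy structure designed for resistance to water and salt damage, together with an interface circuit for energy harvesting and storage. In evaluating the prototype devices, we confirmed that stable power output is obtained under low-frequency vibration conditions, and analyzed how the operating frequency and phase difference among multiple TENGs affect the output characteristics. As a result, we showed that, in irregular, low-frequency wave and ocean-current environments, using rotary TENGs makes self-powering of distributed Internet of Things (IoT) devices feasible.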
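Elucidation of AI System Vulnerabilities and Development of Countermeasures

2025 GUO, Chao

　View Summary

In recent years, with the rapid development of artificial intelligence (AI) technology, AI accelerators optimized for deep learning models have rapidly grown in importance as a way to achieve high-performance, low-cost, low-power inference in edge environments. As models become more complex and diverse, automated AI accelerator generation platforms have come into wide use in place of conventional manual accelerator design. However, systematic research on the security vulnerabilities inherent in these platforms themselves has been extremely limited. This research therefore focused on the security vulnerabilities of automated AI accelerator generation platforms and studied their threat model, attack methods, and defense methods. Specifically, we showed that, by searching across layers for attack-vulnerable parameters using only output information from the inference process and no gradient information, a hardware Trojan (HT) with extremely low overhead and high stealth can be automatically searched for, generated, and inserted. Applying the proposed method to several representative AI models, we confirmed that, compared with existing methods, it achieves both improved attack resilience and reduced performance degradation. Furthermore, we demonstrated that the attack can be extended beyond a single FPGA to multi-FPGA designs. In addition, going beyond the attacker's perspective, we also proposed a model-level defense method that, from the defender's standpoint, can counter malicious generation platforms. The results of this research are useful as a foundational technology for building highly reliable, highly secure AI systems in the future.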
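Development of an Integrated Power Generation and Sensing System

2024

　View Summary

At present, as IoT devices are required to move away from battery dependence, energy harvesting technology is attracting attention for sensors and wearable devices capable of ultra-low-power operation. In particular, the triboelectric nanogenerator (TENG), a technology that converts the mechanical energy of human body motion into electric power, is expected to find applications in textiles and implantable biomedical devices. On the other hand, the TENG's nonlinear, time-dependent internal capacitance, asymmetric output, and high-voltage, very-low-current characteristics hinder maximization of energy conversion efficiency and pose challenges for circuit design. This research therefore aims to solve these TENG issues and propose a new solution that integrates energy harvesting, sensing, and information processing. In this research, in addition to prototyping a vertical contact-separation TENG (CS-TENG) and a rotary TENG (RS-TENG), we designed a high-efficiency interface circuit and developed an integrated power generation and sensing system. In demonstration experiments, at 1.5 Hz and 3 Hz operation under a 1 MΩ load, the system achieved output power improvements of 264 times and 168 times, respectively, over a conventional full-wave rectifier (FWR) circuit. Furthermore, by implementing sensor driving and wireless communication functions, we demonstrated a breakthrough toward realizing an integrated energy harvesting and sensing system. These results have been submitted to IEEE Transactions on Power Electronics, and industrial application is expected as a new energy-autonomous solution for ultra-low-power devices.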
Prototyping of Triboelectric Nanogenerators and Design of a Self-Powered Interface Circuit

2023

　View Summary

Triboelectric nanogenerators (TENGs) are emerging as a promising, cost-effective energy harvesting approach for Internet of Things (IoT) applications. However, their practical deployment faces challenges due to extremely high output voltages, ultra-low intrinsic capacitance, the necessity for non-self-powered interface circuits, and ultra-low transfer efficiency due to output voltage asymmetry. Addressing these issues, this research introduces a novel dual-output rectifier (DOR)-based interface circuit designed to efficiently convert TENG outputs into two different voltage levels, optimizing energy harvesting and switching generation. Our approach leverages energy from the TENG's transition from separation to contact in the negative half cycle to produce a step-down switching control signal. Concurrently, energy generated during the positive half cycle, from contact to separation, is temporarily stored at a high voltage level. This energy is later stepped down and directed to the load via a flyback converter, upon reaching a threshold that activates the control module, optimizing energy transfer efficiency. The effectiveness of our approach was demonstrated using self-manufactured vertical contact-separation TENGs (CS-TENGs) featuring a spring-assisted separation structure comprising two copper sheets and a polytetrafluoroethylene (PTFE) sheet, which occupies a 120 mm × 90 mm effective contact area. The PTFE layer is 0.1 mm thick, allowing for a maximum displacement of 1.2 mm in our experiments. The experimental results demonstrate significant improvements, achieving 2.75 and 2.34 times the maximum output power compared to a full-wave rectifier (FWR)-based design at 2 Hz and 3 Hz, respectively. Additionally, under the same frequency and load conditions (1 MΩ at 2 and 3 Hz), the output gains are 152 and 160 times greater than the FWR's.
Our approach brings about a significant advancement in integrating TENGs for low-frequency and low-load IoT devices, demonstrating its potential for wider practical application. The corresponding achievements have been accepted for publication in IEEE Transactions on Power Electronics.
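Design of a High-Efficiency Interface Circuit for Triboelectric Power Generation

2022 SU, Yirui

　View Summary

The triboelectric nanogenerator (TENG), one of the energy harvesting technologies, can be fabricated inexpensively, has high output power density, and offers good stability in the low-frequency range. On the other hand, TENG output is high-voltage and low-current, and the internal resistance is extremely high, so it is not suited to powering Internet of Things (IoT) devices directly. To solve these problems, this research prototyped TENG elements and proposed a self-powered interface circuit that can reduce the required optimal load while maintaining high power conversion efficiency. Evaluation on actual hardware showed that the circuit achieves more than twice the output power of existing work even with a lower load resistance. The results of this research are highly promising for future application to a variety of IoT devices.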
Creation of Highly Energy-Efficient DNN Circuit Design Technology for Edge Computing

2021 YE, Jinghao

　View Summary

Driven by the explosive growth of available data and powerful computing resources, deep neural networks (DNNs) have achieved remarkable breakthroughs recently. As DNN models become more diverse for various applications, how to obtain an optimal accelerator design for specific NN models while maintaining high energy efficiency with limited hardware resources becomes an emerging challenge. Unfortunately, few systematic approaches have been proposed yet. To address this design challenge, a model-defined energy-efficient DNN accelerator design through design space exploration and architecture optimization is proposed. Firstly, two dual data reuse approaches are proposed to improve on-chip data utilization efficiency. Secondly, a layer-wise design space exploration framework is developed to precisely determine the optimal tiling configuration and the corresponding data reuse strategy for target neural network models even with on-chip hardware resource constraints, which can minimize the amount of data movement between off-chip DRAM and on-chip GLB. Thirdly, an energy-efficient accelerator design with on-chip dual data reuse, centered ifmap/weight buffers, distributed psum buffers, and optimal resource configuration techniques is presented for GLB access reduction and energy efficiency improvement. Compared with state-of-the-art accelerators, the proposed design improves energy efficiency by up to 2.7X and 3.6X for AlexNet and VGG, respectively.
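Development of an Energy Harvesting Interface Circuit for Wearable Devices

2020

　View Summary

In recent years, the Internet of Things (IoT) has spread rapidly across many fields, driven by growing smartphone demand and by the miniaturization and falling cost of devices that come with technological progress. To solve the power supply problem of IoT devices, energy harvesting (EH) technology has attracted great attention. However, the energy obtained from an individual energy harvester (for example, a triboelectric or piezoelectric element) is very weak, so high-efficiency EH interface circuit design technology is needed. This research therefore developed an EH interface circuit for wearable devices. As a result, using the proposed circuit, we realized a battery-free wearable device capable of wireless transmission powered by human body motion.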
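Research and Development of Long-Term Highly Reliable, Ultra-Low-Power Memory for the Digital Society

2019

　View Summary

In the digital society, the volume of data is increasing explosively, so memory circuits are becoming ever more important. However, memory power consumption is increasing because scaling has increased transistor performance variation and the soft error rate, and because the growing capacity of the memory known as SRAM (Static Random Access Memory) has worsened yield. For this reason, developing memory design technology that provides long-term high reliability and ultra-low power consumption is an urgent task for realizing the future digital society. In this research, we carried out research and development of circuit design technology aimed at long-term high reliability and low power consumption for memory circuits (SRAM circuits in particular). In particular, we proposed design techniques that reduce power consumption while improving long-term stability, and evaluated their effectiveness.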
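Research and Development of LSI Design Technology to Realize Approximate Computing for Big Data Processing

2018

　View Summary

In recent years, attention to the IoT (Internet of Things), big data, and artificial intelligence has been growing. A major problem in analyzing and processing such massive data is the high power consumption that results from the large amount of computation and long execution time. On the other hand, there are many situations in the big data field that are inherently error-tolerant and do not require fully accurate computation. This research therefore studied LSI design technology that realizes approximate computing for big data processing that is massive and inherently error-tolerant. In particular, we proposed ① a formulation of performance and accuracy metrics for approximate adder circuits that takes error distance into account, ② a low-power FIR circuit based on bit-width reduction, and ③ a bit-width reduction method for CNNs that takes arithmetic overflow into account.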
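
The first contribution in the summary above concerns accuracy metrics based on error distance. As a rough illustration only, the sketch below computes the mean error distance of a generic lower-part-OR style approximate adder by exhaustive enumeration. The adder structure, the function names, and the bit widths are assumptions chosen for illustration, not the circuits or the metric formulation proposed in the project.

```python
def approx_add(a: int, b: int, k: int) -> int:
    """Approximate adder (illustrative): the low k bits are ORed instead of
    added, and the remaining high bits are added exactly with no carry-in."""
    mask = (1 << k) - 1
    low = (a | b) & mask
    high = ((a >> k) + (b >> k)) << k
    return high | low


def mean_error_distance(width: int, k: int) -> float:
    """Average |exact - approximate| over all pairs of width-bit inputs."""
    n = 1 << width
    total = 0
    for a in range(n):
        for b in range(n):
            total += abs((a + b) - approx_add(a, b, k))
    return total / (n * n)


# With k = 0 the adder is exact; a larger k saves carry logic but adds error.
for k in range(0, 4):
    print(k, mean_error_distance(4, k))
```

In this particular structure the error for one input pair equals the bitwise AND of the two low parts, so the metric grows with the number of approximated bits, which is the accuracy side of the accuracy versus power trade-off.
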
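Creation of Smart-Case LSI Design Technology for the Use of Natural Energy

2014

　View Summary

This research focused on LSI (large-scale integrated circuit) design technology and carried out research and development of smart-case LSI design technology that adapts to unstable, weak natural energy and realizes optimal operation according to the situation. In particular, as innovative techniques that solve the problems of existing LSI design technology, we proposed "I: ultra-low-energy LSI design technology" and "II: design technology with a run-time self-adjustment function." Rather than following the existing worst-case-based LSI design method, this research developed a foundational system LSI design technology in which the circuit makes the most of its processing performance, power consumption, and reliability through self-adjustment during operation.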
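Research on Dependable Low-Voltage LSI Design Technology

2011

　View Summary

As information and communication equipment becomes more capable, rising power consumption is becoming a major problem. The most effective way to reduce the power consumption of LSI circuits is to lower the LSI supply voltage. Because the dynamic power of a CMOS circuit is proportional to the square of the voltage, reducing the voltage to 1/3 reduces power consumption, in simple terms, to roughly 1/10. Under low-voltage conditions, however, CMOS circuit operation becomes unstable and is affected by LSI manufacturing variation, noise, and other factors, so that failures such as reduced operating margin and malfunction increase enormously compared with the present situation. In other words, to realize a safe and eco-friendly ambient information society in the future, we believe there is a strong need for a dependable low-voltage LSI design platform that integrates automated LSI design technology, capable of lowering the operating voltage of CMOS transistors (the key elements of information communication and processing) to below the threshold voltage, with high-reliability design technology.
This research aims to develop dependable ultra-low-voltage LSI design technology with high reliability. Unlike existing work (custom design), the goal is to use automated design to reduce design complexity and design cycle time and to raise the reliability of the circuit as a whole. A further goal is, through real chip design, to reduce energy relative to existing work and to reduce design timing variation in the low-voltage region.
This fiscal year, we mainly carried out the following research items.
(1) Automated ultra-low-voltage LSI design technology: for the design of circuits operating in the low-voltage (subthreshold) region, we worked on ① building delay and power models for the subthreshold region; ② proposing a synthesis method that, for subthreshold operation, runs transistor-level simulations with an existing process library and can select the supply voltage that minimizes energy; and ③ building, on the basis of the proposed optimal-energy voltage selection method, an automatic low-energy-oriented LSI synthesis flow from a higher level (RTL) using low voltage. We implemented various algorithms on a computer and ran evaluation experiments. Unlike existing custom design, the flow can automatically select the minimum-energy supply voltage at synthesis time, and we confirmed its effectiveness by applying it to benchmark circuits. Automated design also made it possible to reduce design complexity and design cycle time.
(2) Dependable LSI design technology: we ① analyzed delay, temperature changes, and supply voltage changes during LSI circuit operation, and ② studied techniques for detecting and controlling the delay variation caused by voltage fluctuation. As a research result, on the theoretical side, delay errors occurring on more than 80% of logic paths could be detected.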
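
The summary above states that CMOS dynamic power is proportional to the square of the supply voltage, so cutting the voltage to one third cuts power to roughly one tenth. A minimal sketch of that arithmetic follows, assuming the simplified relation P ∝ V² with switching frequency and load capacitance held constant; the function name is illustrative only.

```python
def dynamic_power_ratio(voltage_ratio: float) -> float:
    """Relative dynamic power after scaling the supply voltage,
    under the simplified model P proportional to V squared."""
    return voltage_ratio ** 2


ratio = dynamic_power_ratio(1 / 3)
print(f"{ratio:.3f}")  # prints 0.111, i.e. 1/9, roughly one tenth
```

In practice the clock frequency usually has to drop along with the voltage, so the actual saving per unit time can be larger, while energy per operation follows the squared term more closely.
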
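Research on Design for Testability of Systems-on-Chip

2005

　View Summary

With the ultra-large scale and ultra-fine feature sizes of LSIs, it has become possible to implement an entire information system on one chip. However, higher integration increases the number of points that must be checked for faults and the number of patterns needed to test each point, so testing whether a manufactured chip operates correctly is becoming more and more difficult. Because the test time per chip is proportional to the number of test patterns, a system-on-a-chip (SoC) that integrates multiple functional modules takes time proportional to the number of integrated modules, and the test time becomes very long. As a result, SoC test cost is rising at a pace that threatens to exceed manufacturing cost, and test quality is also declining, so testing could become a factor that hinders the growth of the semiconductor industry. Research on low-cost, high-quality design-for-testability methods for SoCs has therefore become important. Against this background, this research studies test data compression techniques and design-for-testability methods for reducing test time. The proposed method consists of a decompressor that is inserted into the design and feeds a large number of internal scan chains from a small number of scan channels. Compared with state-of-the-art scan and test data compression techniques, it can reduce the test data volume and test time to as little as 1/20. The research results were presented at academic conferences. We also addressed tests for multiple fault types and examined fault analysis methods in detail.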
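
The summary above describes a decompressor that drives many short internal scan chains from a few external scan channels. A back-of-the-envelope sketch of why this shortens test time follows. It assumes a simplified model in which shift time is the pattern count times the longest chain length, ignoring capture cycles and any change in pattern count; all numbers and the function name are hypothetical and are not results from the project.

```python
import math


def scan_test_cycles(patterns: int, flip_flops: int, chains: int) -> int:
    """Approximate shift cycles: every pattern is shifted through the
    longest scan chain (simplified model, capture cycles ignored)."""
    longest_chain = math.ceil(flip_flops / chains)
    return patterns * longest_chain


# Hypothetical design: 8000 scan flip-flops, 1000 test patterns.
baseline = scan_test_cycles(patterns=1000, flip_flops=8000, chains=4)
with_decompressor = scan_test_cycles(patterns=1000, flip_flops=8000, chains=80)
reduction = baseline / with_decompressor
print(baseline, with_decompressor, reduction)
```

Under this model the reduction equals the ratio of internal chains to external channels, which is why the chain count behind the decompressor is the main design lever.
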
