Updated on 2024/04/25


 
TOGAWA, Nozomu
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor
Degree
Dr. Eng. ( Waseda University )
Profile

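Graduated from the School of Science and Engineering, Waseda University, in 1992. Completed the doctoral program at the Graduate School of Science and Engineering, Waseda University, in 1997 and received the Dr. Eng. degree in the same year. After serving as a research associate and lecturer at Waseda University and as an associate professor at The University of Kitakyushu, among other positions, he has been a professor at the Faculty of Science and Engineering, Waseda University, since 2009. His specialties include integrated system design, quantum computation, and security.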

Research Experience

  • 2009.04
    -
    Now

    Waseda University   Faculty of Science and Engineering

  • 2005.04
    -
    2009.03

    Waseda University   Faculty of Science and Engineering

  • 2001.04
    -
    2005.03

    The University of Kitakyushu   Faculty of Environmental Engineering

  • 2000.04
    -
    2001.03

    Waseda University   Advanced Research Institute for Science and Engineering

  • 1997.04
    -
    2000.03

    Waseda University   School of Science and Engineering

Education Background

  • 1994.04
    -
    1997.03

    Waseda University   Graduate School of Science and Engineering   Electrical Engineering  

  • 1992.04
    -
    1994.03

    Waseda University   Graduate School of Science and Engineering  

  • 1988.04
    -
    1992.03

    Waseda University   School of Science and Engineering  

Committee Memberships

  • 2020.06
    -
    Now

    IEICE Technical Committee on VLSI Design Technologies  Member

  • 2018.04
    -
    Now

    National center of Incident readiness and Strategy for Cybersecurity (NISC)  Member, Expert Panel on Research and Development Strategy

  • 2017.01
    -
    Now

    Ministry of Internal Affairs and Communications, Cybersecurity Task Force  Member

  • 2022.01
    -
    2023.12

    IEEE Circuits and Systems Society, Japan Joint Chapter  Chair

  • 2020.06
    -
    2022.06

    IEICE Engineering Sciences Society  Special Committee Member

  • 2018.04
    -
    2022.03

    IPSJ Special Interest Group on Intelligent Transport Systems and Smart Community  Member

  • 2020.01
    -
    2021.12

    IEEE Circuits and Systems Society, Japan Joint Chapter  Vice Chair

  • 2013.01
    -
    2021.01

    IEEE/ACM Asia South Pacific Design Automation Conference  Steering Committee Secretary

  • 2019.06
    -
    2020.06

    IEICE Technical Committee on VLSI Design Technologies  Chair

  • 2018.01
    -
    2019.12

    IEEE Circuits and Systems Society, Japan Joint Chapter  Secretary

  • 2018.06
    -
    2019.05

    IEICE Technical Committee on VLSI Design Technologies  Vice Chair

  • 2015.04
    -
    2019.03

    IPSJ Transactions on System LSI Design Methodology  Editor-in-Chief

  • 2014.06
    -
    2017.05

    IEICE Engineering Sciences Society  Special Committee Member

  • 2013.04
    -
    2017.03

    IPSJ Special Interest Group on System and LSI Design Methodology  Member

  • 2010.05
    -
    2016.05

    IEICE Technical Committee on VLSI Design Technologies  Expert Member

  • 2011.05
    -
    2014.06

    IEICE Accreditation Committee  Secretary

  • 2013.04
    -
    2014.03

    IPSJ Journal Editorial Committee, Fundamentals Group  Group Chair

  • 2011.04
    -
    2013.03

    IPSJ Special Interest Group on System LSI Design Methodology  Secretary

  • 2010.05
    -
    2012.05

    IEICE Engineering Sciences Society  Secretary

  • 2005.04
    -
    2012.05

    IEICE Technical Committee on Reconfigurable Systems  Expert Member

  • 2008.05
    -
    2011.05

    IEICE Accreditation Committee  Member

  • 2009.04
    -
    2011.03

    IPSJ Special Interest Group on System LSI Design Methodology  Member

  • 2008.05
    -
    2010.05

    IEICE Technical Committee on VLSI Design Technologies  Secretary

  • 2003.05
    -
    2008.05

    IEICE Technical Committee on VLSI Design Technologies  Expert Member

  • 2004.05
    -
    2006.05

    IEICE Journal Editorial Committee  Member


Professional Memberships

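  • ACM

  • IEEE

  • Information Processing Society of Japan (IPSJ)

  • The Institute of Electronics, Information and Communication Engineers (IEICE)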

Research Areas

  • Computer system / Information security

Research Interests

  • integrated system design

  • quantum computation

  • information security

Awards

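  • The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology, Prize for Science and Technology (Research Category)

    2018.04   Ministry of Education, Culture, Sports, Science and Technology (MEXT)   Research on innovative design technologies for integrated circuits and their security applications

    Winner: Nozomu Togawa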

  • Best Paper Award

    2017.09   IEEE ICCE-Berlin   A robust scan-based side-channel attack method against HMAC-SHA-256 circuits

    Winner: Daisuke Oku, Masao Yanagisawa, Nozomu Togawa

  • Best Paper Award

    2016.10   IEEE International SoC Conference   A high-performance circuit design algorithm using data dependent approximation

    Winner: Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

  • Telecom System Technology Award

    2011.03   The Telecommunications Advancement Foundation  

    Winner: Nozomu Togawa

  • Funai Academic Award, Funai Foundation for Information Technology

    2010.04  

  • Marubun Research Encouragement Award, Marubun Research Promotion Foundation

    2010.03  

  • IEEE DAC/ISSCC Student Design Contest, 1st Place

    2006.07  

  • Takeda Research Encouragement Award (Excellence Prize), The Takeda Foundation

    2001.12  

  • 21st Niwa Memorial Award (FY1997), Niwa Memorial Association

    1998.02  

  • 9th Ando Hiroshi Memorial Academic Encouragement Award, Foundation of Ando Laboratory

    1996.06  

  • Research Encouragement Award, 8th Karuizawa Workshop on Circuits and Systems, IEICE

    1996.04  

  • Waseda University Azusa Ono Memorial Award for Academic Studies (FY1995)

    1996.03  

  • Waseda University Isao Okawa Memorial Award (FY1995)

    1996.03  

  • Best Paper Award (IEEE Asia and South Pacific Design Automation Conference 1995)

    1995.08  

  • 10th Telecom System Technology Student Award, The Telecommunications Advancement Foundation

    1995.03  


 

Papers

  • Ising machine approach to the lecturer–student assignment problem

    Sora Tomita, Tatsuhiko Shirai, Nozomu Togawa

    IEEE Access    2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • A GPU-Based Ising Machine With a Multi-Spin-Flip Capability for Constrained Combinatorial Optimization

    Satoru Jimbo, Tatsuhiko Shirai, Nozomu Togawa, Masato Motomura, Kazushi Kawamura

    IEEE Access    2024  [Refereed]

    DOI

    Scopus

  • Postprocessing Variationally Scheduled Quantum Algorithm for Constrained Combinatorial Optimization Problems

    Tatsuhiko Shirai, Nozomu Togawa

    IEEE Transactions on Quantum Engineering    2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • Optimization of Practical Time-Dependent Vehicle Routing Problem by Ising Machines.

    Yui Tsuyumine, Kenichi Masuda, Takeshi Hachikawa, Tsuyoshi Haga, Yuta Yachi, Tatsuhiko Shirai, Masashi Tawada, Nozomu Togawa

    ICCE     1 - 5  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • Time-Dependent Multi-Objective Trip Planning by Ant Colony Optimization with Route API.

    Etsushi Saeki, Siya Bao, Toshinori Takayama, Nozomu Togawa

    ICCE     1 - 2  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • Evaluation of Ensemble Learning Models for Hardware-Trojan Identification at Gate-level Netlists.

    Ryotaro Negishi, Nozomu Togawa

    ICCE     1 - 6  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • An Interaction Coefficient Control Method for Setting Initial Solutions to Ising Machines.

    Soma Kawakami, Kentaro Ohno, Dema Ba, Satoshi Yagi, Junji Teramoto, Nozomu Togawa

    ICCE     1 - 2  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • Carrying-Mode-Free Stair Ascent and Descent Estimation using Smartphones.

    Dai Kajimoto, Etsushi Saeki, Siya Bao, Nozomu Togawa

    ICCE     1 - 6  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • Gen-Power: Anomaly Detection in IoT Devices Utilizing Generated Power Waveforms.

    Kota Hisafuru, Nozomu Togawa

    ICCE     1 - 6  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • Hybrid Iterative Annealing Method Using a Quantum Annealer and a Classical Computer.

    Keisuke Fukada, Tatsuhiko Shirai, Nozomu Togawa

    ICCE     1 - 6  2024  [Refereed]

    Authorship:Last author

    DOI

    Scopus

  • An Anomalous Behavior Detection Method Utilizing IoT Power Waveform Shapes.

    Kota Hisafuru, Kazunari Takasaki, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   107 ( 1 ) 75 - 86  2024.01  [Refereed]

    Authorship:Last author

    DOI

  • Hardware-Trojan Detection at Gate-Level Netlists Using a Gradient Boosting Decision Tree Model and Its Extension Using Trojan Probability Propagation.

    Ryotaro Negishi, Tatsuki Kurihara, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   107 ( 1 ) 63 - 74  2024.01  [Refereed]

    Authorship:Last author

    DOI

  • Giving a Quasi-Initial Solution to Ising Machines by Controlling External Magnetic Field Coefficients.

    Soma Kawakami, Kentaro Ohno, Dema Ba, Satoshi Yagi, Junji Teramoto, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   107 ( 1 ) 52 - 62  2024.01  [Refereed]

    Authorship:Last author

    DOI

  • Ising-Machine-Based Solver for Constrained Graph Coloring Problems.

    Soma Kawakami, Yosuke Mukasa, Siya Bao, Dema Ba, Junya Arai, Satoshi Yagi, Junji Teramoto, Nozomu Togawa

    IEICE Trans. Fundam. Electron. Commun. Comput. Sci.   107 ( 1 ) 38 - 51  2024.01  [Refereed]

    Authorship:Last author

    DOI

  • Node-wise Hardware Trojan Detection Based on Graph Learning

    Kento Hasegawa, Kazuki Yamashita, Seira Hidano, Kazuhide Fukushima, Kazuo Hashimoto, Nozomu Togawa

    IEEE Transactions on Computers    2024  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Hybrid Optimization Method Using Simulated-Annealing-Based Ising Machine and Quantum Annealer

    Shuta Kikuchi, Nozomu Togawa, Shu Tanaka

    Journal of the Physical Society of Japan    2023.12

    DOI

    Scopus

  • An Efficient Combined Bit-Width Reducing Method for Ising Models.

    Yuta Yachi, Masashi Tawada, Nozomu Togawa

    IEICE Trans. Inf. Syst.   106 ( 4 ) 495 - 508  2023.04  [Refereed]

    DOI

  • Multi-Spin-Flip Engineering in an Ising Machine.

    Tatsuhiko Shirai, Nozomu Togawa

    IEEE Trans. Computers   72 ( 3 ) 759 - 771  2023.03  [Refereed]

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • R-HTDetector: Robust Hardware-Trojan Detection Based on Adversarial Training.

    Kento Hasegawa, Seira Hidano, Kohei Nozawa, Shinsaku Kiyomoto, Nozomu Togawa

    IEEE Trans. Computers   72 ( 2 ) 333 - 345  2023.02  [Refereed]

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Multi-Day Intermodal Travel Planning for Urban Cities Using Ising Machines

    Siya Bao, Nozomu Togawa

    IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC     54 - 60  2023  [Refereed]

    Authorship:Last author

    The multi-day intermodal travel planning problem (MITPP) is an optimization problem (OP) that generates the optimal sequences of points of interest (POIs) and hotels while searching for the most suitable transport modes between POIs and hotels. Conventional methods and solvers using von Neumann computers provide good approximate solutions to OPs, but the computation time grows exponentially when dealing with large-scale or complex OPs. Meanwhile, Ising machines or quantum annealing machines are non-von Neumann computers that are designed to solve complex OPs. In this paper, we focus on solving the MITPP by a two-phase Ising-based method. The first phase, POI clustering, generates POI clusters for sightseeing days, and the second phase, POI routing, generates travel routes for each day with the optimal transport modes. Practical factors such as POI satisfaction, POI duration, hotel fee, and transportation fee are included in the MITPP. We map these elements onto quadratic unconstrained binary optimization (QUBO) models. For evaluation, we use a real-world dataset in Sapporo, Japan. Empirical results confirm that the proposed method can effectively solve the MITPP both in terms of solution quality and execution time and outperforms a conventional solver, a conventional method, and the latest Ising-based method.

    DOI

    Scopus

  • Smart Device-Based PDR Methods for Indoor Localization

    Siya Bao, Nozomu Togawa

    Machine Learning for Indoor Localization and Navigation     27 - 48  2023.01

    Smart devices, such as smartphones and smartwatches, are indispensable nowadays for everyone’s daily life due to their mobility and powerful computation capability. Sensors embedded in these devices are relatively low-cost and convenient to carry. Consequently, leveraging the sensors embedded in smart devices has provided new opportunities for indoor PDR developments. In this chapter, we first introduce various types of smart devices and device-based carrying modes. We then describe the types and functionalities of sensors built into these devices, as well as common steps and evaluation metrics in smart device-based PDR methods. Several methods are summarized based on the usage of smart devices, sensors, techniques, and performances. Lastly, we present challenges and issues that remain for current smart device-based PDR methods.

    DOI

    Scopus

  • An Ising-Machine-Based Solver of Vehicle Routing Problem With Balanced Pick-Up

    Siya Bao, Masashi Tawada, Shu Tanaka, Nozomu Togawa

    IEEE Transactions on Consumer Electronics    2023  [Refereed]

    Authorship:Last author

    Vehicle routing applications are ubiquitous in the field of pick-up and delivery service. We focus on the vehicle routing problem with balanced pick-up, called VRPBP, which originates from the package pick-up service. The aim of the problem is not only to efficiently explore the shortest travel route but also to balance loads between depots and vehicles. These problems can be regarded as optimization problems, and recent developments in Ising machines, including quantum annealing machines, bring us a new opportunity to solve complex real-world optimization problems. In this paper, a two-phase method and a three-phase method using Ising machines are proposed for solving the VRPBP. As the applicability of current Ising machines is limited due to the small size of Ising spins and connectivities, we partition the complex problem into two or three sub-problems, and the key elements of each sub-problem are mapped onto quadratic unconstrained binary optimization (QUBO) models to fit in the structure of the Ising machines. We first compared the performances of the Ising machine on the standard TSP and CVRP datasets with a conventional state-of-the-art solver and three conventional methods. Then, we evaluated the performances of the proposed methods compared with five conventional methods for solving the VRPBP. The results confirm the effectiveness of the two proposed methods in solving vehicle-routing-related optimization problems.

    DOI

    Scopus

  • QuDASH: Quantum-Inspired Rate Adaptation Approach for DASH Video Streaming.

    Bo Wei, Hang Song, Makoto Nakamura, Koichi Kimura, Nozomu Togawa, Jiro Katto

    IEEE Access   11   118462 - 118473  2023  [Refereed]

    DOI

    Scopus

  • Trip Planning Based on subQUBO Annealing.

    Tatsuya Noguchi, Keisuke Fukada, Siya Bao, Nozomu Togawa

    IEEE Access   11   100383 - 100395  2023  [Refereed]

    DOI

    Scopus

  • Spin-Variable Reduction Method for Handling Linear Equality Constraints in Ising Machines

    Tatsuhiko Shirai, Nozomu Togawa

    IEEE Transactions on Computers    2023  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Dynamical Process of a Bit-Width Reduced Ising Model With Simulated Annealing.

    Shuta Kikuchi, Nozomu Togawa, Shu Tanaka

    IEEE Access   11   95493 - 95506  2023  [Refereed]

    DOI

    Scopus

  • Fast Hyperparameter Tuning for Ising Machines.

    Matthieu Parizy, Norihiro Kakuko, Nozomu Togawa

    ICCE     1 - 6  2023  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Quasi-Initial Solution Giving Method for Ising Machines by Controlling External Magnetic Field Coefficients.

    Soma Kawakami, Kentaro Ohno, Dema Ba, Satoshi Yagi, Junji Teramoto, Nozomu Togawa

    ICCE     1 - 6  2023  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A Constrained Graph Coloring Solver Based on Ising Machines.

    Soma Kawakami, Yosuke Mukasa, Siya Bao, Dema Ba, Junya Arai, Satoshi Yagi, Junji Teramoto, Nozomu Togawa

    ICCE     1 - 6  2023  [Refereed]

    DOI

    Scopus

  • Cardinality Constrained Portfolio Optimization on an Ising Machine.

    Matthieu Parizy, Przemyslaw Sadowski, Nozomu Togawa

    SOCC     1 - 6  2022  [Refereed]

    DOI

    Scopus

  • Effective Hardware-Trojan Feature Extraction Against Adversarial Attacks at Gate-Level Netlists.

    Kazuki Yamashita, Tomohiro Kato, Kento Hasegawa, Seira Hidano, Kazuhide Fukushima, Nozomu Togawa

    IOLTS     1 - 7  2022  [Refereed]

    DOI

    Scopus

  • An Anomalous Behavior Detection Method for IoT Devices Based on Power Waveform Shapes.

    Kota Hisafuru, Kazunari Takasaki, Nozomu Togawa

    IOLTS     1 - 7  2022  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Hardware-Trojan Detection at Gate-level Netlists using Gradient Boosting Decision Tree Models.

    Ryotaro Negishi, Tatsuki Kurihara, Nozomu Togawa

    ICCE-Berlin     1 - 6  2022  [Refereed]

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Autonomous driving system with feature extraction using a binarized autoencoder.

    Kota Hisafuru, Ryotaro Negishi, Soma Kawakami, Dai Sato, Kazuki Yamashita, Keisuke Fukada, Nozomu Togawa

    FPT     1 - 4  2022  [Refereed]

    DOI

    Scopus

  • Multi-Objective Trip Planning Based on Ant Colony Optimization Utilizing Trip Records.

    Etsushi Saeki, Siya Bao, Toshinori Takayama, Nozomu Togawa

    IEEE Access   10   127825 - 127844  2022  [Refereed]

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Hybrid Annealing Method Based on subQUBO Model Extraction With Multiple Solution Instances.

    Yuta Atobe, Masashi Tawada, Nozomu Togawa

    IEEE Transactions on Computers   71 ( 10 ) 2606 - 2619  2022  [Refereed]

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Hardware-Trojan Detection Based on the Structural Features of Trojan Circuits Using Random Forests.

    Tatsuki Kurihara, Nozomu Togawa

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   105-A ( 7 ) 1049 - 1060  2022  [Refereed]

    DOI

  • QUBO Matrix Distorting Method for Consumer Applications.

    Tomokazu Yoshimura, Tatsuhiko Shirai, Masashi Tawada, Nozomu Togawa

    ICCE     1 - 6  2022  [Refereed]

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Efficient Coefficient Bit-Width Reduction Method for Ising Machines.

    Yuta Yachi, Yousuke Mukasa, Masashi Tawada, Nozomu Togawa

    ICCE     1 - 6  2022  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Carrying-mode Free Indoor Positioning Using Smartphone and Smartwatch and Its Evaluations.

    Tomoya Wakaizumi, Nozomu Togawa

    J. Inf. Process.   30   52 - 65  2022  [Refereed]

    DOI

  • An Anomalous Behavior Detection Method Based on Power Analysis Utilizing Steady State Power Waveform Predicted by LSTM

    Kazunari Takasaki, Ryoichi Kida, Nozomu Togawa

    Proceedings - 2021 IEEE 27th International Symposium on On-Line Testing and Robust System Design, IOLTS 2021     1 - 7  2021.06  [Refereed]

    Hardware security issues have emerged in recent years as Internet of Things (IoT) devices have rapidly spread. Power analysis is one of the methods to detect anomalous operations, but it is hard to apply it to IoT devices where an operating system and various software programs are running and hence its power waveforms become more complex. In this paper, we propose an anomalous behavior detection method utilizing application-specific power behaviors extracted by steady-state power waveform, which is generated by LSTM (long short-term memory). The proposed method is based on extracting application-specific power behaviors by predicting steady-state power waveforms. At that time, by using LSTM, we can effectively predict steady-state power waveforms, even if they include one or more cycled waveforms and/or they are composed of many complex waveforms. In the experiment, we implement three normal application programs and one anomalous application program on a single board computer and apply the proposed method to it. The experimental results demonstrate that the proposed method successfully detects the anomalous power behavior of an anomalous application program, while the existing method cannot.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Hardware-trojan classification based on the structure of trigger circuits utilizing random forests

    Tatsuki Kurihara, Nozomu Togawa

    Proceedings - 2021 IEEE 27th International Symposium on On-Line Testing and Robust System Design, IOLTS 2021     1 - 4  2021.06  [Refereed]

    Recently, with the spread of Internet of Things (IoT) devices, embedded hardware devices have been used in a variety of everyday electrical items. Due to the increased demand for embedded hardware devices, some of the IC design and manufacturing steps have been outsourced to third-party vendors. Since malicious third-party vendors may insert malicious circuits, called hardware Trojans, into their products, an effective hardware Trojan detection method is strongly required. In this paper, we propose 25 hardware-Trojan features based on the structure of trigger circuits for machine-learning-based hardware Trojan detection. Combining the proposed features with 11 existing hardware-Trojan features, we utilize 36 hardware-Trojan features in total for classification. Then we classify the nets in an unknown netlist into a set of normal nets and Trojan nets based on the random-forest classifier. The experimental results demonstrate that the average true positive rate (TPR) is 63.6% and the average true negative rate (TNR) is 100.0%. The proposed method improves the average TPR by 14.7 points while maintaining the average TNR compared to existing state-of-the-art methods. In particular, the proposed method successfully finds Trojan nets in several benchmark circuits that are not found by the existing method.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • Data Augmentation for Machine Learning-Based Hardware Trojan Detection at Gate-Level Netlists

    Kento Hasegawa, Seira Hidano, Kohei Nozawa, Shinsaku Kiyomoto, Nozomu Togawa

    Proceedings - 2021 IEEE 27th International Symposium on On-Line Testing and Robust System Design, IOLTS 2021     1 - 4  2021.06  [Refereed]

    Due to the rapid growth in the information and telecommunications industries, an untrusted vendor might compromise the complicated supply chain by inserting hardware Trojans (HTs). Although hardware Trojan detection methods at gate-level netlists employing machine learning have been developed, the training dataset is insufficient. In this paper, we propose a data augmentation method for machine-learning-based hardware Trojan detection. Our proposed method replaces a gate in a hardware Trojan circuit with logically equivalent gates. The experimental results demonstrate that our proposed method successfully enhances the classification performance with all the classifiers in terms of the true positive rates (TPRs).

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • An Approach to the Vehicle Routing Problem with Balanced Pick-up Using Ising Machines

    Siya Bao, Masashi Tawada, Shu Tanaka, Nozomu Togawa

    2021 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2021 - Proceedings    2021.04  [Refereed]

    Vehicle routing problems (VRPs) can be solved as optimization problems. Practical applications of VRPs arise in various areas including manufacturing, supply chain, and tourism. Conventional approaches using von Neumann computers obtain good approximate solutions to the optimization problems, but they incur high computation costs in large-scale or complex problems due to the combinatorial explosion. In contrast, Ising machines or quantum annealing machines are non-von Neumann computers that are designed to solve complex optimization problems. In this paper, we propose an Ising-machine-based approach for the vehicle routing problem with balanced pick-up (VRPBP). The development of the VRPBP is motivated by postal item pick-up services in the real world. Our approach includes various features of VRP variants. We propose a 2-phase approach to solve the VRPBP, and key elements in each phase are mapped onto quadratic unconstrained binary optimization (QUBO) forms. Specifically, the first phase is the clustering phase, which is an extension of the knapsack problem with additional distance and load balancing concerns. The second phase is mapped to the traveling salesman problem. Experimental results of our approach are evaluated in terms of solution quality and computation time compared with conventional approaches.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Mapping induced subgraph isomorphism problems to Ising models and its evaluations by an Ising machine

    Natsuhito Yoshimura, Masashi Tawada, Shu Tanaka, Junya Arai, Satoshi Yagi, Hiroyuki Uchiyama, Nozomu Togawa

    IEICE Transactions on Information and Systems   E104.D ( 4 ) 481 - 489  2021.04  [Refereed]

    Ising machines have attracted attention as they are expected to solve combinatorial optimization problems at high speed with Ising models corresponding to those problems. An induced subgraph isomorphism problem is a decision problem that determines whether a specific graph structure is included in a whole graph or not. The problem can be represented by equality constraints in terms of a combinatorial optimization problem. By using the penalty functions corresponding to the equality constraints, we can apply an Ising machine to the induced subgraph isomorphism problem. The induced subgraph isomorphism problem appears in many practical problems, for example, finding a particular malicious circuit in a device or a particular network structure of chemical bonds in a compound. However, due to the limitation of the number of spin variables in the current Ising machines, reducing the number of spin variables is a major concern. Here, we propose an efficient Ising model mapping method to solve the induced subgraph isomorphism problem by Ising machines. Our proposed method theoretically solves the induced subgraph isomorphism problem. Furthermore, the number of spin variables in the Ising model generated by our proposed method is theoretically smaller than that of the conventional method. Experimental results demonstrate that our proposed method can successfully solve the induced subgraph isomorphism problem by using the Ising-model-based simulated annealing and a real Ising machine.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Solving constrained slot placement problems using an Ising machine and its evaluations

    Sho Kanamaru, Kazushi Kawamura, Shu Tanaka, Yoshinori Tomita, Nozomu Togawa

    IEICE Transactions on Information and Systems   E104.D ( 2 ) 226 - 236  2021.02  [Refereed]

    Ising machines have attracted attention because they are expected to obtain better solutions to various combinatorial optimization problems at high speed by mapping the problems to natural phenomena. A slot-placement problem is one of the combinatorial optimization problems, regarded as a quadratic assignment problem, which relates to the optimal logic-block placement in a digital circuit as well as optimal delivery planning. Here, we propose a mapping to the Ising model for solving a slot-placement problem with additional constraints, called a constrained slot-placement problem, where several item pairs must be placed within a given distance. Since the behavior of Ising machines is stochastic and we map the problem to the Ising model using the penalty method, the obtained solution does not always satisfy the slot-placement constraint, unlike conventional methods such as conventional simulated annealing. To resolve the problem, we propose an interpretation method in which a feasible solution is generated by post-processing procedures. We measured the execution time of an Ising machine and compared it with the execution time of simulated annealing when solutions with almost the same accuracy are obtained. As a result, we found that the Ising machine is faster than the simulated annealing that we implemented.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • An Indoor Positioning Method using Smartphone and Smartwatch Independent of Carrying Modes

    Tomoya Wakaizumi, Nozomu Togawa

    Digest of Technical Papers - IEEE International Conference on Consumer Electronics   2021-January  2021.01  [Refereed]

    A pedestrian dead reckoning method, or PDR method for short, is one of the positioning methods for indoor environments, which estimates a user's positions by using sensors such as acceleration and angular velocity sensors. When a smartphone is used as a PDR sensor device, it has various carrying modes, such as holding it directly and carrying it inside a pocket. How to deal with these various carrying modes is a major concern in PDR using a smartphone. In this paper, we propose a PDR method based on a combination of a smartphone and a smartwatch. By synchronizing smartphone and smartwatch sensors effectively, the proposed method can successfully reduce drift errors and thus estimate the user's positions accurately, compared to just using a smartphone. Furthermore, even when the user carries his/her smartphone in various carrying modes, the proposed method still realizes accurate PDR. The experimental results demonstrate that the positioning errors are reduced by approximately 87.5% on average compared to the existing method.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Visiting-Route Recommendation in Amusement Parks and its Evaluations by an Ising Machine

    Yosuke Mukasa, Tomoya Wakaizumi, Shu Tanaka, Nozomu Togawa

    Digest of Technical Papers - IEEE International Conference on Consumer Electronics   2021-January   1 - 6  2021.01  [Refereed]

    In an amusement park, an attraction-visiting route considering the waiting time and traveling time improves visitors' satisfaction and experience. We focus on Ising machines to solve the problem, which are recently expected to solve combinatorial optimization problems at high speed by mapping the problems to Ising models or quadratic unconstrained binary optimization (QUBO) models. We propose a mapping of the visiting-route recommendation problem in amusement parks to a QUBO model for solving it using Ising machines. By using an actual Ising machine, we could obtain feasible solutions 15 times faster with almost the same accuracy as the simulated annealing method for the visiting-route recommendation problem.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Reducing Writing Energy Consumption for Non-Volatile Registers Utilizing Frequent Patterns of Sequential Bits on RISC-V Architecture

    Shota Matsuno, Masashi Tawada, Nozomu Togawa

    Digest of Technical Papers - IEEE International Conference on Consumer Electronics   2021-January   1 - 6  2021.01  [Refereed]

    Single-board computers have become widespread and are used in a variety of situations, where they may be required to operate under low-energy conditions or with an unstable power supply. Utilizing non-volatile memory (NVM), which retains data without power, is one effective solution to this issue. However, compared to volatile memory such as SRAM and DRAM, NVM consumes more energy in writing operations. In this paper, we propose an effective energy reduction method for RISC-V architecture, targeting a type of NVM called spin-transfer torque RAM (STT-RAM). First, we thoroughly investigate the bit patterns written to registers in RISC-V architecture for various typical application programs and find that most of them can be classified into three patterns, in which most bits of the 32-bit data written are 0s. Second, we propose an energy-reduced register-writing method utilizing these frequent writing bit patterns. In this method, when the data to be written fall into one of the three frequent bit patterns above, we write only the bit pattern type into the extra bits and do not write the actual data into registers, and hence we can substantially reduce the write energy of NVM registers. Experimental results on RISC-V architecture demonstrate that the energy consumption is reduced by 12.5%-53.8% by using our proposed method compared to the baseline architecture.

    DOI

    Scopus

  • Multi-day Travel Planning Using Ising Machines for Real-world Applications

    Siya Bao, Masashi Tawada, Shu Tanaka, Nozomu Togawa

    24th IEEE International Intelligent Transportation Systems Conference(ITSC)     3704 - 3709  2021  [Refereed]

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • A PDR Method Combining Smartphone and Smartwatch based on Multi-Scenario Map Matching

    Tomoya Wakaizumi, Nozomu Togawa

    GCCE     308 - 309  2021  [Refereed]

    DOI

    Scopus

  • An autonomous driving system utilizing image processing accelerated by FPGA

    Kazunari Takasaki, Kota Hisafuru, Ryotaro Negishi, Kazuki Yamashita, Keisuke Fukada, Tomoya Wakaizumi, Nozomu Togawa

    FPT     1 - 4  2021  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Toward Learning Robust Detectors from Imbalanced Datasets Leveraging Weighted Adversarial Training

    Kento Hasegawa, Seira Hidano, Shinsaku Kiyomoto, Nozomu Togawa

    CANS     392 - 411  2021  [Refereed]

    DOI

    Scopus

  • A Three-Stage Annealing Method Solving Slot-Placement Problems Using an Ising Machine

    Keisuke Fukada, Matthieu Parizy, Yoshinori Tomita, Nozomu Togawa

    IEEE Access   9   134413 - 134426  2021  [Refereed]

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Experimental evaluations of parallel tempering on an Ising machine

    Yosuke Mukasa, Shu Tanaka, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   14   27 - 29  2021  [Refereed]

     View Summary

    Ising machines have recently attracted much attention because they are expected to solve combinatorial optimization problems efficiently. We focus on an Ising machine whose algorithm is based on parallel tempering (PT), and experimentally evaluate the performance of the Ising machine for MIN-CUT problems. Experimental results show that the Ising machine outperforms a famous graph partitioning solver in terms of the quality of solution and the time-to-target-solution.

    DOI

    Scopus

  • Performance Comparison of Typical Binary-Integer Encodings in an Ising Machine

    Kensuke Tamura, Tatsuhiko Shirai, Hosho Katsura, Shu Tanaka, Nozomu Togawa

    IEEE Access   9   81032 - 81039  2021  [Refereed]

     View Summary

    The differences in performance among binary-integer encodings in an Ising machine, which can solve combinatorial optimization problems, are investigated. Many combinatorial optimization problems can be mapped to finding the lowest-energy (ground) state of an Ising model or its equivalent model, the Quadratic Unconstrained Binary Optimization (QUBO). Since the Ising model and QUBO consist of binary variables, integers are often expressed with binary variables when using Ising machines. A typical example is a combinatorial optimization problem under inequality constraints. Here, the quadratic knapsack problem is adopted as a prototypical problem with an inequality constraint. It is solved using typical binary-integer encodings: one-hot encoding, binary encoding, and unary encoding. Unary encoding shows the best performance for large-sized problems.

    DOI

    Scopus

    25
    Citation
    (Scopus)
  • Generating adversarial examples for hardware-trojan detection at gate-level netlists

    Kohei Nozawa, Kento Hasegawa, Seira Hidano, Shinsaku Kiyomoto, Kazuo Hashimoto, Nozomu Togawa

    Journal of Information Processing   29   236 - 246  2021  [Refereed]

     View Summary

    Recently, the great demand for integrated circuits (ICs) has driven third parties to be involved in IC design and manufacturing steps. At the same time, the threat of a malicious circuit, called a hardware Trojan, being injected by third parties has been increasing. Machine learning is one of the powerful solutions for detecting hardware Trojans. However, a weakness of such machine-learning-based classification methods against adversarial examples (AEs), which cause misclassification by adding perturbations to input samples, has been reported. This paper first proposes a framework for generating adversarial examples for hardware-Trojan detection at gate-level netlists utilizing neural networks. The proposed framework replaces hardware Trojan circuits with logically equivalent ones, making them difficult to detect. Second, we propose a Trojan-net concealment degree (TCD) and a modification evaluating value (MEV) as measures of the amount of modifications. Finally, based on the MEV, we select adversarial modification patterns to apply to the circuits against hardware-Trojan detection. The experimental results using benchmarks demonstrate that the proposed framework successfully decreases the true positive rate (TPR) by a maximum of 30.15 points.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • A route recommendation method considering individual user’s preferences by Monte-Carlo tree search and its evaluations

    Yuta Ishizaki, Yurie Koyama, Toshinori Takayama, Nozomu Togawa

    Journal of Information Processing   29   81 - 92  2021  [Refereed]

     View Summary

    As smartphones and tablets have become widespread, route recommendation and guidance services have become commonplace. Conventional route recommendation and guidance services try to give the best routes in terms of route length, time required, and train/bus fares, but different users are given the same route when they input the same parameters. However, each user has various preferences regarding safety and comfort. It is strongly desirable to reflect the user’s preferences in route recommendation and recommend the most preferable route to every user. Since a user’s preferences are extremely vague and complicated, how to evaluate them in route recommendation is one of the key problems. In this paper, we propose a route recommendation method, called the P-UCT method, that considers individual user’s preferences utilizing Monte-Carlo tree search. In the proposed method, we first extract route features based on the route recommendation history of every user and construct a route evaluator based on a Support Vector Machine (SVM). After that, the method generates a random route from a start point to an end point by Monte-Carlo tree search. The route evaluator determines how well every generated route matches the user’s preferences. By repeating the evaluation, the method obtains the route that is expected to be closest to the user’s preferences. Experimental results demonstrate that the proposed method outperforms the existing method in terms of average evaluation scores. They also demonstrate that the proposed method provides a recommended route reflecting the user’s individual preferences even if it learns the recommended route history of areas in different situations.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • A capacitance measurement device for running hardware devices and its evaluations

    Makoto Nishizawa, Kento Hasegawa, Nozomu Togawa

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E103A ( 9 ) 1018 - 1027  2020.09  [Refereed]

     View Summary

    In the IoT (Internet-of-Things) era, the number and variety of hardware devices are continuously increasing. Several IoT devices are utilized in infrastructure equipment. How to maintain such IoT devices is a serious concern. Capacitance measurement is one of the powerful ways to detect anomalous states in the structure of hardware devices. In particular, measuring capacitance while the hardware device is running is a major challenge, but no such research has been proposed so far. This paper proposes a capacitance measuring device which measures device capacitance in operation. We first combine an AC (alternating current) voltage signal with the DC (direct current) supply voltage signal to generate a fluctuating signal. We supply the fluctuating signal to the target device instead of the DC supply voltage. By effectively filtering the observed current in the target device, the filtered current becomes proportional to the capacitance value, and thus we can measure the target device capacitance even when it is running. We have implemented the proposed capacitance measuring device on a printed wiring board with a size of 95 mm × 70 mm and evaluated the power consumption and accuracy of the capacitance measurement. The experimental results demonstrate that the power consumption of the proposed capacitance measuring device in low-power mode is reduced by 65% compared with measuring mode and that the proposed device successfully measures capacitance at a resolution of 0.002 µF.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Trojan-net classification for gate-level hardware design utilizing boundary net structures

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    IEICE Transactions on Information and Systems   E103D ( 7 ) 1618 - 1622  2020.07  [Refereed]

     View Summary

    Cybersecurity has become a serious concern in our daily lives. The malicious functions inserted into hardware devices have been well known as hardware Trojans. In this letter, we propose a hardware-Trojan classification method at gate-level netlists utilizing boundary net structures. We first use a machine-learning-based hardware-Trojan detection method and classify the nets in a given netlist into a set of normal nets and a set of Trojan nets. Based on the classification results, we investigate the net structures around the boundary between normal nets and Trojan nets, and extract the features of the nets mistakenly identified to be normal nets or Trojan nets. Finally, based on the extracted features of the boundary nets, we again classify the nets in a given netlist into a set of normal nets and a set of Trojan nets. The experimental results demonstrate that our proposed method outperforms an existing machine-learning-based hardware-Trojan detection method in terms of its true positive rate.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • An Anomalous Behavior Detection Method for IoT Devices by Extracting Application-Specific Power Behaviors

    Kazunari Takasaki, Kento Hasegawa, Ryoichi Kida, Nozomu Togawa

    Proceedings - 2020 26th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2020     1 - 4  2020.07  [Refereed]

     View Summary

    With the widespread use of Internet of Things (IoT) devices in recent years, we utilize a variety of hardware devices in our daily life. On the other hand, hardware security issues are emerging. Power analysis is one of the methods to detect anomalous operations, but it is hard to apply it to IoT devices where an operating system and various software programs are running. In this paper, we propose an anomalous behavior detection method for an IoT device by extracting application-specific power behaviors. First, we measure the power consumption of an IoT device and obtain the power waveform. Next, we extract an application-specific power waveform by eliminating a steady factor from the obtained power waveform. Finally, we extract feature values from the application-specific power waveform and detect anomalous behavior by utilizing the local outlier factor (LOF) method. The experimental results using a single-board computer demonstrate that the proposed method successfully detects the anomalous power behavior of an anomalous application program.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Evaluation on Hardware-Trojan Detection at Gate-Level IP Cores Utilizing Machine Learning Methods

    Tatsuki Kurihara, Kento Hasegawa, Nozomu Togawa

    Proceedings - 2020 26th IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2020     1 - 4  2020.07  [Refereed]

     View Summary

    Recently, with the spread of Internet of Things (IoT) devices, embedded hardware devices have been used in a variety of everyday electrical items. Due to the increased demand for embedded hardware devices, some of the IC design and manufacturing steps have been outsourced to third-party vendors. Since malicious third-party vendors may insert hardware Trojans into their products, developing an effective hardware Trojan detection method is strongly required. In this paper, we evaluate hardware Trojan detection methods using neural networks and random forests at gate-level intellectual property (IP) cores that contain more than 10,000 nets. First, we extract 11 features for each net in a given netlist, and learn them with neural networks and random forests. Then, we classify the nets in an unknown netlist into a set of normal nets and Trojan nets based on the learned classifiers. The experimental results demonstrate that the average true positive rate becomes 84.6% and the average true negative rate becomes 95.1%, which is sufficiently high accuracy compared to existing evaluations.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • Designing stochastic number generators sharing a random number source based on the randomization function

    Masashi Tawada, Nozomu Togawa

    NEWCAS 2020 - 18th IEEE International New Circuits and Systems Conference, Proceedings     271 - 274  2020.06  [Refereed]

     View Summary

    In this study, we propose a novel stochastic number generator architecture and prove that the resulting circuit can deliver independent stochastic numbers and improve the accuracy of the calculation results obtained using some recent conventional stochastic computing-based arithmetic circuits. This study is motivated by the increasingly important role of stochastic computing in various fields, such as the digital circuit design, where the stochastic number generators are responsible for a significant share of the hardware cost.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Document-level sentiment classification in Japanese by stem-based segmentation with category and data-source information

    Siya Bao, Nozomu Togawa

    Proceedings - 14th IEEE International Conference on Semantic Computing, ICSC 2020     311 - 314  2020.02  [Refereed]

     View Summary

    Existing studies focus on text information while ignoring category and data source information, both of which are verified in this paper to be important in interpreting sentiments in travel comments. Furthermore, the unique linguistic characteristics of Japanese make it difficult to apply conventional token-based word segmentation methods directly to Japanese comments. In this paper, we propose a method of stem-based segmentation based on Japanese linguistic characteristics and incorporate it with category and data source information into a hierarchical network model for document-level sentiment classification. Empirical results show that our proposed model outperforms existing models on a real-world dataset.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Multi-Resolutional Image Format Using Stochastic Numbers and Its Hardware Implementation

    Ryota Ishikawa, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    2020 IEEE 11th Latin American Symposium on Circuits and Systems, LASCAS 2020     1 - 4  2020.02  [Refereed]

     View Summary

    The popularization of IoT devices has made image processing very common for users. Because there is a wide variety of IoT devices, many image formats are used in hardware. Converting image formats in hardware is relatively complicated compared with other calculations. This paper focuses on the conversion of image resolution, especially image reduction. By expressing images with stochastic numbers, this paper proposes an image format in which a single piece of data can be treated as an image at several resolutions. From experimental evaluations, we found that the proposed image format enables image reduction by pixel averaging to be implemented in hardware at lower cost than conventional pixel averaging using binary numbers. Also, image magnification using the proposed image format can restore the original image, while conventional image magnification cannot.

    DOI

    Scopus

  • Adversarial examples for hardware-trojan detection at gate-level netlists

    Kohei Nozawa, Kento Hasegawa, Seira Hidano, Shinsaku Kiyomoto, Kazuo Hashimoto, Nozomu Togawa

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11980 LNCS   341 - 359  2020  [Refereed]

     View Summary

    Recently, due to the increase of outsourcing in integrated circuit (IC) design and manufacturing, the threat of a malicious circuit, called a hardware Trojan, being injected by third parties has been increasing. Machine learning is known to produce powerful models for detecting hardware Trojans. However, it has recently been reported that such machine-learning-based detection is weak against adversarial examples (AEs), which cause misclassification by adding perturbations to input data. Referring to the existing studies on adversarial examples, most of which are discussed in the field of image processing, this paper first proposes a framework for generating adversarial examples for hardware-Trojan detection for gate-level netlists utilizing neural networks. The proposed framework replaces hardware Trojan circuits with logically equivalent circuits, making them difficult to detect. Second, we define the Trojan-net concealment degree (TCD) as a possibility of misclassification, and the modification evaluating value (MEV) as a measure of the amount of modifications. Third, based on the MEV, we select adversarial modification patterns to apply to the circuits against hardware-Trojan detection. The experimental results using benchmarks demonstrate that the proposed framework successfully decreases the true positive rate (TPR) by at most 30.15 points.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • How to Reduce the Bit-width of an Ising Model by Adding Auxiliary Spins

    Daisuke Oku, Masashi Tawada, Shu Tanaka, Nozomu Togawa

    IEEE Transactions on Computers   71 ( 1 ) 223 - 234  2020  [Refereed]

     View Summary

    Annealing machines have been developed as non-von Neumann computers aimed at solving combinatorial optimization problems efficiently. To use annealing machines for solving combinatorial optimization problems, we have to represent the objective function and constraints by an Ising model, which is a theoretical model in statistical physics. Further, it is necessary to transform the Ising model according to the hardware limitations. In the transformation, the process of effectively reducing the bit-widths of coefficients in the Ising model has hardly been studied so far. Thus, when we consider the Ising model with a large bit-width, a naive method, which means right bit-shift, has to be applied. Since it is expected that obtaining highly accurate solutions is difficult by the naive method, it is necessary to construct a method for efficiently reducing the bit-width. This paper proposes methods for reducing the bit-widths of interaction and external magnetic field coefficients in the Ising model and proves that the reduction gives theoretically the same ground state of the original Ising model. The experimental evaluations also demonstrate the effectiveness of our proposed methods.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • Guiding Principle for Minor-Embedding in Simulated-Annealing-Based Ising Machines

    Tatsuhiko Shirai, Shu Tanaka, Nozomu Togawa

    IEEE Access   8   210490 - 210502  2020  [Refereed]

     View Summary

    We propose a novel type of minor-embedding (ME) in simulated-annealing-based Ising machines. The Ising machines can solve combinatorial optimization problems. Many combinatorial optimization problems are mapped to find the ground (lowest-energy) state of the logical Ising model. When connectivity is restricted on Ising machines, ME is required for mapping from the logical Ising model to a physical Ising model, which corresponds to a specific Ising machine. Herein we discuss the guiding principle of ME design to achieve a high performance in Ising machines. We derive the proposed ME based on a theoretical argument of statistical mechanics. The performance of the proposed ME is compared with two existing types of MEs for different benchmarking problems. Simulated annealing shows that the proposed ME outperforms existing MEs for all benchmarking problems, especially when the distribution of the degree in a logical Ising model has a large standard deviation. This study validates the guiding principle of using statistical mechanics for ME to realize fast and high-precision solvers for combinatorial optimization problems.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • A new LDPC code decoding method: Expanding the scope of Ising machines

    Masashi Tawada, Shu Tanaka, Nozomu Togawa

    Digest of Technical Papers - IEEE International Conference on Consumer Electronics   2020-January   1 - 6  2020.01  [Refereed]

     View Summary

    The decoding of low-density parity-check (LDPC) codes has previously been considered as a combinatorial optimization problem (COP). However, no prior work has gone so far as to convert the LDPC code into a quadratic unconstrained binary optimization (QUBO) problem. Thus, a new method is created: one that converts the LDPC code to a QUBO problem, inputs the QUBO problem into Ising machines (computers based on the Ising model that are designed to solve the QUBO problem), obtains the QUBO solution, and converts it to an LDPC solution. By utilizing an actual Ising machine, LDPC solutions with a code length of 256 bits have been obtained with an accuracy of 93.9% at an average annealing time of 214.0 ms. The benefit of this methodology goes beyond its theoretical contribution of obtaining LDPC solutions more accurately. It has only been a few years since the Ising machine was developed. Therefore, this method expands the current scope of studies involving Ising machines, helping current and future researchers unlock their full range of capabilities and possibilities.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Theory of Ising Machines and a Common Software Platform for Ising Machines

    Shu Tanaka, Yoshiki Matsuda, Nozomu Togawa

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC   2020-January   659 - 666  2020.01  [Refereed]

     View Summary

    Ising machines are a new type of non-von Neumann computer that specializes in solving combinatorial optimization problems efficiently. The input to an Ising machine is the energy function of the Ising model or its quadratic unconstrained binary optimization form, and the machine searches for a state that minimizes the energy function. We describe the theory of Ising machines and the present status of Ising machines, their software, and their applications.

    DOI

    Scopus

    12
    Citation
    (Scopus)
  • FPGA-based Heterogeneous Solver for Three-Dimensional Routing

    Kento Hasegawa, Ryota Ishikawa, Makoto Nishizawa, Kazushi Kawamura, Masashi Tawada, Nozomu Togawa

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC   2020-January   11 - 12  2020.01  [Refereed]

     View Summary

    A heuristic algorithm is one of the approaches to solving an NP-hard problem. In order to enhance the capability of the system, heterogeneous computing is often adopted. In this paper, we propose an FPGA-based heterogeneous solver for three-dimensional routing. The proposed system is implemented on multiple FPGA boards and a single-board computer. The experimental results demonstrate that the proposed system outperforms a single FPGA system.

    DOI

    Scopus

  • Scalable stochastic number duplicators for accuracy-flexible arithmetic circuit design

    Ryota Ishikawa, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   13   10 - 20  2020  [Refereed]

     View Summary

    Stochastic computing is a computation method which can implement arithmetic operations by simple logic circuits. This method uses stochastic numbers, whose values are defined by the rate at which 1s appear in their bit streams. By the nature of stochastic computing, changing the length of the input stochastic numbers changes the accuracy of the whole circuit. However, in some implementations with re-convergence paths, the circuit itself causes errors, i.e., the length of the input stochastic numbers does not change the circuit's accuracy. This paper proposes a stochastic number duplicator whose outputs differ every time and are all independent. This stochastic number duplicator has a structure that scales by changing the number of flip-flops used for bit re-arrangement. From the experimental evaluations and discussions, we clarify that the proposed stochastic number duplicator enables accuracy-flexible circuits.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A travel decision support algorithm: Landmark activity extraction from Japanese travel comments

    Siya Bao, Masao Yanagisawa, Nozomu Togawa

    Studies in Computational Intelligence   849   109 - 123  2020  [Refereed]

     View Summary

    To help people smoothly and efficiently make travel decisions, we utilize the advantages of travel comments posted by thousands of other travelers. In this paper, we analyze the feasibility of exploring landmark activity queries and representative examples from Japanese travel comments. Contributions in this paper include a framework for extracting activity concerned keywords and queries, quantifying the relationship between landmark activities and comment contents. An evaluation of activity-example extraction is conducted in two case studies through 18,939 travel comments.

    DOI

    Scopus

  • Implementation of a ROS-Based Autonomous Vehicle on an FPGA Board

    Kento Hasegawa, Kazunari Takasaki, Makoto Nishizawa, Ryota Ishikawa, Kazushi Kawamura, Nozomu Togawa

    2019 International Conference on Field-Programmable Technology (ICFPT)    2019.12

    DOI

  • A fully-connected Ising model embedding method and its evaluation for CMOS annealing machines

    Oku, D., Terada, K., Hayashi, M., Yamaoka, M., Tanaka, S., Togawa, N.

    IEICE Transactions on Information and Systems   E102D ( 9 )  2019

    DOI

    Scopus

    19
    Citation
    (Scopus)
  • Personalized Landmark Recommendation for Language-Specific Users by Open Data Mining

    Siya Bao, Masao Yanagisawa, Nozomu Togawa

    Studies in Computational Intelligence   791   107 - 121  2019  [Refereed]

     View Summary

    This paper proposes a personalized landmark recommendation algorithm aiming at exploring new insights into the determinants of landmark satisfaction prediction. We gather 1,219,048 user-generated comments in Tokyo, Shanghai and New York from four travel websites. We find that users have diverse satisfaction on landmarks. Based on those findings, we propose an effective algorithm for personalized landmark satisfaction prediction. Our algorithm provides the top-6 landmarks with the highest satisfaction to users for a one-day trip plan. Our proposed algorithm has better performance than previous studies from the viewpoints of landmark recommendation and landmark satisfaction prediction.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Bicycle behavior recognition using 3-axis acceleration sensor and 3-axis gyro sensor equipped with smartphone

    Usami, Y., Ishikawa, K., Takayama, T., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E102A ( 8 ) 953 - 965  2019  [Refereed]

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Static Error Analysis and Optimization of Faithfully Truncated Adders for Area-Power Efficient FIR Designs

    Jinghao Ye, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    IEEE International Symposium on Circuits and Systems, ISCAS 2019, Sapporo, Japan, May 26-29, 2019   2019-May   1 - 4  2019  [Refereed]

     View Summary

    Faithfully truncated adders are used for low-cost FIR implementations in this paper, which improves state-of-the-art CSD-based FIR filter designs for further area and power reduction while meeting the accuracy requirement. As a solution to the accuracy loss caused by truncated adders, this paper performs a static error analysis of truncated adders. Furthermore, based upon our mathematical analysis, we show that, with a given accuracy constraint, an optimal truncated adder configuration can be effortlessly determined for area-power efficient FIR designs. Evaluation results on various FIR designs show that a 16.8%-35.4% reduction in area and an 11.8%-27.9% power saving can be achieved with the proposed optimal truncated adder designs within an average error of 1 ulp.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Efficient Ising Model Mapping to Solving Slot Placement Problem

    Sho Kanamaru, Daisuke Oku, Masashi Tawada, Shu Tanaka, Masato Hayashi, Masanao Yamaoka, Masao Yanagisawa, Nozomu Togawa

    IEEE International Conference on Consumer Electronics, ICCE 2019, Las Vegas, NV, USA, January 11-13, 2019     1 - 6  2019  [Refereed]

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • An FPGA Implementation Method based on Distributed-register Architectures

    Koichi Fujiwara, Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    IPSJ Trans. System LSI Design Methodology   12   38 - 41  2019  [Refereed]

    DOI

    Scopus

  • A robust indoor/outdoor detection method based on spatial and temporal features of sparse GPS measured positions

    Iwata, S., Ishikawa, K., Takayama, T., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E102A ( 6 ) 860 - 865  2019  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A multiple cyclic-route generation method with route length constraint considering point-of-interests

    Nishimura, T., Ishikawa, K., Takayama, T., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E102A ( 4 ) 641 - 653  2019  [Refereed]

    DOI

    Scopus

  • Personalized landmark recommendation algorithm based on language-specific satisfaction prediction using heterogeneous open data sources

    Siya Bao, Masao Yanagisawa, Nozomu Togawa

    Proceedings - 2018 10th International Conference on Computational Intelligence and Communication Networks, CICN 2018     70 - 76  2018.08

     View Summary

    This paper proposes a personalized landmark recommendation algorithm based on the prediction of users' satisfaction on landmarks. We have accumulated 270,239 user-generated comments from the travel websites Ctrip, Jaran and TripAdvisor for 196 landmarks in Tokyo, Japan. We find that users do have different satisfaction on landmarks depending on their commonly used languages and travel websites. Then we establish a database for landmarks with abundant and accurate landmark type and landmark satisfaction information. Finally, we propose an effective personalized landmark satisfaction prediction algorithm based on users' landmark type, language and travel website preferences. After that, the landmarks with the top-6 highest satisfaction are provided to the user for a one-day visit plan in Tokyo. Experimental results demonstrate that the proposed algorithm can recommend landmarks that fit the user's preferences and that it successfully predicts the user's landmark satisfaction with an error rate below 7%, which is superior to previous studies.

    DOI

    Scopus

  • Robust AES circuit design for delay variation using suspicious timing error prediction

    Yuki Yahagi, Masao Yanagisawa, Nozomu Togawa

    Proceedings - International SoC Design Conference 2017, ISOCC 2017     101 - 102  2018.05  [Refereed]

     View Summary

    This paper proposes a robust AES (advanced encryption standard) circuit for delay variation. In our proposed AES circuit, suspicious timing error prediction circuits (STEPCs) and their associating gating circuit are incorporated into a normal AES circuit to predict timing errors. STEPCs are inserted between inter-module connections and thus we can monitor almost all of the signal paths between registers and effectively prevent timing errors. The simulation results demonstrate that our AES circuit with STEPCs can be overclocked by up to 1.66X with just 8.05% area overheads.

    DOI

    Scopus

  • A selector-based FFT processor and its FPGA implementation

    Yuya Hirai, Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    Proceedings - International SoC Design Conference 2017, ISOCC 2017     88 - 89  2018.05  [Refereed]

     View Summary

    Fast Fourier transform (FFT) is used in various applications such as signal processing, and a high-speed FFT processor is in strong demand. In this paper, we propose a high-speed FFT processor based on selector logics. The selector-based FFT processor is constructed by focusing on the subtract-multiplication operations and partly applying selector logics to them. Furthermore, we implement the selector-based FFT processor on a Xilinx FPGA. Experimental results show that our proposed FFT processor can improve the processing speed by up to 21% and also reduce the number of LUTs by up to 33% compared with a naive FFT processor.

    DOI

    Scopus

  • A loop structure optimization targeting high-level synthesis of fast number theoretic transform

    Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    Proceedings - International Symposium on Quality Electronic Design, ISQED   2018-   106 - 111  2018.05  [Refereed]

     View Summary

    Multiplication with a large number of digits is heavily used when processing data encrypted by a fully homomorphic encryption, which is a bottleneck in computation time. An algorithm utilizing fast number theoretic transform (FNTT) is known as a high-speed multiplication algorithm and the further speeding up is expected by implementing the FNTT process on an FPGA. A high-level synthesis tool enables efficient hardware implementation even for FNTT with a large number of points. In this paper, we propose a methodology for optimizing the loop structure included in a software description of FNTT so that the performance of the synthesized FNTT processor can be maximized. The loop structure optimization is considered in terms of loop flattening and trip count reduction. We implement a 65,536-point FNTT processor with the loop structure optimization on an FPGA, and demonstrate that it can be executed 6.9 times faster than the execution on a CPU.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • A stayed location estimation method for sparse GPS positioning information based on positioning accuracy and short-time cluster removal

    Sae Iwata, Tomoyuki Nitta, Toshinori Takayama, Masao Yanagisawa, Nozomu Togawa

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E101A ( 5 ) 831 - 843  2018.05  [Refereed]

     View Summary

    Cell phones with GPS function as well as GPS loggers are widely used and users' geographic information can be easily obtained. However, battery consumption in these mobile devices is still a main concern, and thus GPS positioning data cannot be obtained very frequently. In this paper, a stayed location estimation method for sparse GPS positioning information is proposed. After generating initial clusters from a sequence of measured positions, the effective radius is set for every cluster based on positioning accuracy and the clusters are merged effectively using it. After that, short-time clusters are removed temporarily but measured positions included in them are not removed. Then the clusters are merged again, taking all the measured positions into consideration. This process is performed twice, in other words, two-stage short-time cluster removal is performed, and finally accurate stayed location estimation is realized even when the GPS positioning interval is five minutes or more. Experiments demonstrate that the total distance error between the estimated stayed location and the true stayed location is reduced by more than 33% and also the proposed method much improves F1 measure compared to conventional state-of-the-art methods.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A hardware-Trojan classification method utilizing boundary net structures

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    2018 IEEE International Conference on Consumer Electronics, ICCE 2018   2018-   1 - 4  2018.03  [Refereed]

     View Summary

    Recently, cybersecurity has become a serious concern for us. For example, the threats of hardware Trojans (malfunctions inserted into hardware devices) have appeared. Since hardware vendors often outsource parts of their hardware products to third-party vendors, the risk of hardware-Trojan insertion has been increased. Especially in the hardware design step, malicious vendors have a chance to insert hardware Trojans easily. In this paper, we propose a hardware-Trojan classification method utilizing boundary net structures. To begin with, we use a machine-learning-based hardware-Trojan detection method and classify the nets in a given netlist into a set of normal nets and that of Trojan nets. Based on the classification, we investigate the nets around the boundary between normal nets and Trojan nets and extract the features of the nets identified to be normal nets or Trojan nets mistakenly. Finally, using the classification results of machine-learning-based hardware-Trojan detection and the extracted features of the boundary nets, we classify the nets in a given netlist into a set of normal nets and that of Trojan nets again. The experimental results demonstrate that our method outperforms an existing machine-learning-based hardware-Trojan detection method in terms of true positive rate.

    DOI

    Scopus

    23
    Citation
    (Scopus)
  • Road-illuminance level inference across road networks based on Bayesian analysis

    Siya Bao, Masao Yanagisawa, Nozomu Togawa

    2018 IEEE International Conference on Consumer Electronics, ICCE 2018   2018-   1 - 6  2018.03  [Refereed]

     View Summary

    This paper proposes a road-illuminance level inference method based on the naive Bayesian analysis. We investigate quantities and types of road lights and landmarks with a large set of roads in real environments and reorganize them into two safety classes, safe or unsafe, with seven road attributes. Then we carry out data learning using three types of datasets according to different groups of the road attributes. Experimental results demonstrate that the proposed method successfully classifies a set of roads with seven attributes into safe ones and unsafe ones with the accuracy of more than 85%, which is superior to other machine-learning based methods and a manual-based method.

    DOI

    Scopus

  • Scan-based side-channel attack against HMAC-SHA-256 circuits based on isolating bit-transition groups using scan signatures

    Daisuke Oku, Masao Yanagisawa, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   11   16 - 28  2018.02  [Refereed]

     View Summary

    A scan chain is used by scan-path test, one of design-for-test techniques, which can control and observe internal registers in an LSI chip. On the other hand, attention has focused on scan-based side-channel attacks, which can restore secret information by exploiting the scan data obtained from a scan chain inside the crypto chip during cryptographic processing. In this paper, we propose a scan-based attack method against a hash generator circuit called HMAC-SHA-256. Our proposed method is composed of three steps: Firstly, we isolate 64 bit-transition groups from scan data using scan signatures based on the property of the HMAC-SHA-256 algorithm. Secondly, we classify these 64 bit-transition groups into 32 pairs. Lastly, we find out the correspondence between the scan data and the internal registers in the target HMAC-SHA-256 circuit. Our proposed method restores the secret information by the three steps above, even if the scan chain includes registers other than the target hash generator circuit and hence it becomes too long. Experimental results show that our proposed method successfully restores two secret keys of the HMAC-SHA-256 circuit using up to 425 input messages in 7.5 hours.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Designing hardware trojans and their detection based on a SVM-based approach

    Tomotaka Inoue, Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    Proceedings of International Conference on ASIC   2017-   811 - 814  2018.01  [Refereed]

     View Summary

    Since hardware production has become inexpensive and international, hardware vendors often outsource their products to third-party vendors. Due to this situation, malicious vendors can easily insert malfunctions (also known as 'hardware Trojans') into their products. In this paper, we experimentally evaluate a machine-learning-based hardware-Trojan detection method using several hardware Trojans we designed. To begin with, we design three types of hardware Trojans and insert them into simple RS232 transceiver circuits. After that, we learn known netlists, where we know which nets are Trojan ones or normal ones beforehand, using a machine-learning-based hardware-Trojan detection method with a support vector machine (SVM) classifier. Finally, we classify the nets in the designed hardware-Trojan-inserted netlists into a set of Trojan nets and that of normal nets using the learned classifier. The experimental results demonstrate that the hardware-Trojan detection method with the SVM-based approach can detect a part of the hardware Trojans we designed.

    DOI

    Scopus

    31
    Citation
    (Scopus)
  • A low cost and high speed CSD-based symmetric transpose block FIR implementation

    Jinghao Ye, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    Proceedings of International Conference on ASIC   2017-   311 - 314  2018.01  [Refereed]

     View Summary

    In this paper, a low cost and high speed CSD-based symmetric transpose block FIR design was proposed for low cost digital signal processing. First, the existing area-efficient CSD-based multiplier was optimized by considering the reusability and the symmetry of coefficients for area reduction. Second, the position of the input register was changed for high speed transpose block FIR processing in which half of the number of required multipliers can be saved. When compared with the existing block FIR designs, the proposed FIR design can increase the data rate from 238.66 MHz to 373.13 MHz while saving 10.89% area and 21.30% energy consumption as well.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • Floorplan-driven high-level synthesis using volatile/non-volatile registers for hybrid energy-harvesting systems

    Daiki Asai, Masao Yanagisawa, Nozomu Togawa

    Proceedings of International Conference on ASIC   2017-   64 - 67  2018.01  [Refereed]

     View Summary

    In this paper, we propose a floorplan-driven high-level synthesis algorithm utilizing both volatile and non-volatile registers for hybrid energy-harvesting systems. In our algorithm, we firstly introduce an idea of safety line candidates. Based on them, we perform safety-line (SL) scheduling so that every operation does not cross the safety line candidates and then perform volatile/non-volatile register binding so that all the data crossing the safety line candidates are stored into non-volatile registers. We can safely restore all the data and re-start the circuit operation from every safety line candidate, even if the power shut-off occurs while running the circuit. Experimental results show that our algorithm reduces average latency by 30.76% and the average energy consumption by 24.94% compared to the naive algorithm when sufficient energy is given (normal mode). Experimental results also show that our algorithm reduces average latency by 30.58% compared to the naive algorithm by reducing rollback execution if a small amount of energy is given (energy-harvesting mode).

    DOI

    Scopus

  • Soft error tolerant latch designs with low power consumption (invited paper)

    Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    Proceedings of International Conference on ASIC   2017-   52 - 55  2018.01  [Refereed]

     View Summary

    As semiconductor technology continues scaling down, the reliability issue has become much more critical than ever before. Unlike traditional hard-errors caused by permanent physical damage which can't be recovered in field, soft errors are caused by radiation or voltage/current fluctuations that lead to transient changes on internal node states, thus they can be viewed as temporary errors. However, due to the unpredictable occurrence of soft errors, it is desirable to develop soft error tolerant designs. For this reason, soft error tolerant design techniques have gained great research interest. In this paper, we will explain the soft error mechanism and then review the existing soft error tolerant design techniques with particular emphasis on SEH family because they can achieve low power consumption and small performance overhead as well.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Detecting the Existence of Malfunctions in Microcontrollers Utilizing Power Analysis.

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    24th IEEE International Symposium on On-Line Testing And Robust System Design, IOLTS 2018, Platja D'Aro, Spain, July 2-4, 2018     97 - 102  2018  [Refereed]

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • An Effective Stochastic Number Duplicator and Its Evaluations Using Composite Arithmetic Circuits.

    Ryota Ishikawa, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    24th IEEE International Symposium on On-Line Testing And Robust System Design, IOLTS 2018, Platja D'Aro, Spain, July 2-4, 2018     53 - 56  2018  [Refereed]

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Hardware Trojan Detection Utilizing Machine Learning Approaches.

    Kento Hasegawa, Youhua Shi, Nozomu Togawa

    17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications / 12th IEEE International Conference On Big Data Science And Engineering, TrustCom/BigDataSE 2018, New York, NY, USA, August 1-3, 2018     1891 - 1896  2018  [Refereed]

     View Summary

    © 2018 IEEE. Hardware security has become a serious concern in recent years. Due to the outsourcing in hardware production, malicious circuits (or hardware Trojans) can be easily inserted into hardware products by attackers. Since hardware Trojans are tiny and stealthy, their detection is difficult. Under the circumstances, numerous hardware-Trojan detection methods have been proposed. In this paper, we elaborate the overview of hardware-Trojan detection and review the hardware-Trojan detection methods using machine learning which is one of the state-of-the-art approaches.

    DOI

    Scopus

    30
    Citation
    (Scopus)
  • Message from the Editor-in-Chief

    Togawa, N.

    IPSJ Transactions on System LSI Design Methodology   11   1 - 1  2018  [Refereed]

    DOI

    Scopus

  • An Ising model mapping to solve rectangle packing problem.

    Kotaro Terada, Daisuke Oku, Sho Kanamaru, Shu Tanaka, Masato Hayashi, Masanao Yamaoka, Masao Yanagisawa, Nozomu Togawa

    2018 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan, April 16-19, 2018     1 - 4  2018  [Refereed]

    DOI

    Scopus

    21
    Citation
    (Scopus)
  • A relaxed bit-write-reducing and error-correcting code for non-volatile memories

    Kojo, T., Tawada, M., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E101A ( 7 ) 1045 - 1052  2018  [Refereed]

    DOI

    Scopus

  • A Low Power Soft Error Hardened Latch with Schmitt-Trigger-Based C-Element.

    Saki Tajima, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    IEICE Transactions   101-A ( 7 ) 1025 - 1034  2018  [Refereed]

     View Summary

    Copyright © 2018 The Institute of Electronics, Information and Communication Engineers. To deal with the reliability issue caused by soft errors, this paper proposed a low power soft error hardened latch (SHC) design using a novel Schmitt-Trigger-based C-element for reliable low power applications. Unlike state-of-the-art soft error tolerant latches that are usually based on hardware redundancy with large area overhead and high power consumption, the proposed SHC latch is implemented through double-sampling and node-checking using a novel Schmitt-Trigger-based C-element, which can help to reduce the area overhead and the corresponding power consumption as well. The evaluation results show that the total number of transistors of the proposed SHC latch is only increased by 2 when compared to the conventional unhardened C2MOS latch, while up to 20.35% and 82.96% power reduction can be achieved when compared to the conventional un-hardened C2MOS latch and the existing soft error tolerant HiPeR design, respectively.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Extension and Performance/Accuracy Formulation for Optimal GeAr-Based Approximate Adder Designs.

    Ken Hayamizu, Nozomu Togawa, Masao Yanagisawa, Youhua Shi

    IEICE Transactions   101-A ( 7 ) 1014 - 1024  2018  [Refereed]

     View Summary

    Copyright © 2018 The Institute of Electronics, Information and Communication Engineers. Approximate computing is a promising solution for future energy-efficient designs because it can provide great improvements in performance, area and/or energy consumption over traditional exact-computing designs for non-critical error-tolerant applications. However, the most challenging issue in designing approximate circuits is how to guarantee the pre-specified computation accuracy while achieving energy reduction and performance improvement. To address this problem, this paper starts from the state-of-the-art general approximate adder model (GeAr) and extends it for more possible approximate design candidates by relaxing the design restrictions. And then a maximum-error-distance-based performance/accuracy formulation, which can be used to select the performance/energy-accuracy optimal design from the extended design space, is proposed. Our evaluation results show the effectiveness of the proposed method in terms of area overhead, performance, energy consumption, and computation accuracy.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Stochastic number duplicators based on bit re-arrangement using randomized bit streams

    Ishikawa, R., Tawada, M., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E101A ( 7 ) 1002 - 1013  2018  [Refereed]

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • A Multiple Cyclic-Route Generation Method for Strolling Based on Point-of-Interests.

    Tensei Nishimura, Kazuaki Ishikawa, Toshinori Takayama, Masao Yanagisawa, Nozomu Togawa

    8th IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin 2018, Berlin, Germany, September 2-5, 2018     1 - 2  2018  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Robust Indoor/Outdoor Detection Method based on Sparse GPS Positioning Information.

    Sae Iwata, Kazuaki Ishikawa, Toshinori Takayama, Masao Yanagisawa, Nozomu Togawa

    8th IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin 2018, Berlin, Germany, September 2-5, 2018     1 - 4  2018  [Refereed]

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Designing Subspecies of Hardware Trojans and Their Detection Using Neural Network Approach.

    Tomotaka Inoue, Kento Hasegawa, Yuki Kobayashi, Masao Yanagisawa, Nozomu Togawa

    8th IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin 2018, Berlin, Germany, September 2-5, 2018     1 - 4  2018  [Refereed]

    DOI

    Scopus

    15
    Citation
    (Scopus)
  • Landmark Seasonal Travel Distribution and Activity Prediction Based on Language-specific Analysis.

    Siya Bao, Masao Yanagisawa, Nozomu Togawa

    IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018     3628 - 3637  2018  [Refereed]

     View Summary

    © 2018 IEEE. Online media communities have globally spanned and have increasingly accelerated the development of intelligent travel recommendation systems in both academic and industrial fields. However, there is a bottleneck that differences in users' seasonal travel distributions (when to visit) in various language groups are ignored. This paper proposes a seasonal activity prediction algorithm based on user comments over the period of 2012 to 2017 in different language groups. We take the advantage of online user comments which provide visiting time for each landmark and detailed activity description. With the accumulation of 417,787 user comments on TripAdvisor for 300 landmarks in three big cities, we analyze the language-specific differences in travel distributions. After that, prediction of future travel distribution for each language group is generated. Then potential peak and off seasons of each landmark are distinguished and representative seasonal activities are extracted through comment contents for peak and off seasons, respectively. Experimental results in the three cities show that the proposed algorithm is more accurate in terms of peak season detection and seasonal activity prediction than previous studies.

    DOI

    Scopus

  • Capacitance Measurement of Running Hardware Devices and its Application to Malicious Modification Detection.

    Makoto Nishizawa, Kento Hasegawa, Nozomu Togawa

    2018 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2018, Chengdu, China, October 26-30, 2018     362 - 365  2018  [Refereed]

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Empirical evaluation and optimization of hardware-Trojan classification for gate-level netlists based on multi-layer neural networks

    Hasegawa, K., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E101A ( 12 ) 2320 - 2326  2018  [Refereed]

    DOI

    Scopus

    10
    Citation
    (Scopus)
  • An accurate indoor positioning algorithm using particle filter based on the proximity of bluetooth beacons

    Ryoya Momose, Tomoyuki Nitta, Masao Yanagisawa, Nozomu Togawa

    2017 IEEE 6th Global Conference on Consumer Electronics, GCCE 2017   2017-   1 - 5  2017.12  [Refereed]

     View Summary

    Indoor positioning without GPS is one of the most important problems in indoor pedestrian navigation. In this paper, we propose an accurate indoor positioning algorithm using a particle filter based on a floormap, where we use the proximity of the Bluetooth beacons as well as acceleration and geomagnetic sensors. In designing the likelihood function in the particle filter, we effectively use the proximity of the Bluetooth beacons, which gives only a rough distance to the target beacon but is more stable than conventional RSSI-based distance estimation. In addition to that, by effectively utilizing a floormap, the accumulated positioning errors due to the acceleration and geomagnetic sensors are much reduced. Moreover, when the radio waves from the Bluetooth beacons are blocked by obstacles, we can also take it into account in designing the likelihood function in the particle filter. Experimental results demonstrate that our algorithm can reduce the indoor positioning errors by up to 79% compared to several conventional algorithms.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • A stayed location estimation method for sparse GPS positioning information

    Sae Iwata, Tomoyuki Nitta, Toshinori Takayama, Masao Yanagisawa, Nozomu Togawa

    2017 IEEE 6th Global Conference on Consumer Electronics, GCCE 2017   2017-   1 - 5  2017.12  [Refereed]

     View Summary

    Cell phones with GPS function as well as GPS loggers are widely used and we can easily obtain users' geographic information. However, battery consumption in these mobile devices is still a main concern, and thus we cannot obtain GPS positioning data very frequently. In this paper, we propose a stayed location estimation method for sparse GPS positioning data. After generating initial clusters from a sequence of measured positions, we set the effective radius for every cluster based on positioning accuracy and merge the clusters effectively using it. After that, we temporarily remove short-time clusters but do not remove measured positions included in them. Then we merge the clusters again, taking all the measured positions into consideration. We perform this process twice, i.e., we perform two-stage short-time cluster removal, and finally realize accurate stayed location estimation even when the GPS positioning interval is five minutes or more. Experiments demonstrate that the total distance error between the estimated stayed location and the true stayed location is reduced by more than 50% compared to a conventional state-of-the-art method.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Personalized one-day travel with multi-nearby-landmark recommendation

    Siya Bao, Masao Yanagisawa, Nozomu Togawa

    IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin   2017-   239 - 242  2017.12  [Refereed]

     View Summary

    Travel route recommendation can strongly influence users' satisfaction and the success of touristic businesses. This paper proposes a personalized travel recommendation algorithm with time planning. We use landmark categorization and region clustering to obtain effective elements. Then we build a travel map to generate all possible travel routes. Our proposed algorithm has higher precision in landmark recommendation and time planning than previous algorithms.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • A robust scan-based side-channel attack method against HMAC-SHA-256 circuits

    Daisuke Oku, Masao Yanagisawa, Nozomu Togawa

    IEEE International Conference on Consumer Electronics - Berlin, ICCE-Berlin   2017-   79 - 84  2017.12  [Refereed]

     View Summary

    A scan-based side-channel attack is still a real threat against a crypto circuit as well as a hash generator circuit, which can restore secret information by exploiting the scan data obtained from scan chains inside the chip during its processing. In this paper, we propose a scan-based attack method against a hash generator circuit called HMAC-SHA-256. Our proposed method restores the secret information by finding out the correspondence between the scan data obtained from a scan chain and the internal registers in the target HMAC-SHA-256 circuit, even if the scan chain includes registers other than the target hash generator circuit and an attacker does not know well the hash generation timing. Experimental results show that our proposed method successfully restores two secret keys of the HMAC-SHA-256 circuit in at most 6 hours.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • A bitwidth-aware high-level synthesis algorithm using operation chainings for tiled-DR architectures

    Kotaro Terada, Masao Yanagisawa, Nozomu Togawa

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 12 ) 2911 - 2924  2017.12  [Refereed]

     View Summary

    As application hardware designs and implementations in a short term are required, high-level synthesis is an increasingly essential EDA technique nowadays. In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and thus distributed-register and -controller architectures (DR architectures) have been proposed in order to cope with this problem. It is also profitable to take data-bitwidth into account in high-level synthesis. In this paper, we propose a bitwidth-aware high-level synthesis algorithm using operation chainings targeting Tiled-DR architectures. Our proposed algorithm optimizes bitwidths of functional units and utilizes the vacant tiles by adding some extra functional units to realize effective operation chainings to generate high performance circuits without increasing the total area. Experimental results show that our proposed algorithm reduces the overall latency by up to 47% compared to the conventional approach without area overheads by eliminating unnecessary bitwidths and adding efficient extra FUs for Tiled-DR architectures.

    DOI

    Scopus

  • A safe and comprehensive route finding algorithm for pedestrians based on lighting and landmark conditions

    Siya Bao, Tomoyuki Nitta, Masao Yanagisawa, Nozomu Togawa

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 11 ) 2439 - 2450  2017.11  [Refereed]

     View Summary

    In this paper, we propose a safe and comprehensive route finding algorithm for pedestrians based on lighting and landmark conditions. Safety and comprehensiveness can be predicted by the five possible indicators: (1) lighting conditions, (2) landmark visibility, (3) landmark effectiveness, (4) turning counts along a route, and (5) road widths. We first investigate impacts of these five indicators on pedestrians' perceptions on safety and comprehensiveness during route findings. After that, a route finding algorithm is proposed for pedestrians. In the algorithm, we design the score based on the indicators (1), (2), (3), and (5) above and also introduce a turning count reduction strategy for the indicator (4). Thus we find out a safe and comprehensive route through them. In particular, we design daytime score and nighttime score differently and find out an appropriate route depending on the time periods. Experimental simulation results demonstrate that the proposed algorithm obtains higher scores compared to several existing algorithms. We also demonstrate that the proposed algorithm is able to find out safe and comprehensive routes for pedestrians in real environments in accordance with questionnaire results.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Effective write-reduction method for MLC non-volatile memory

    Masashi Tawada, Shinji Kimura, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems     1 - 4  2017.09  [Refereed]

     View Summary

    Recently, the requirement for non-volatile memory on embedded systems has increased because it can be combined with normally-off and power gating technologies. However, it has a lower endurance than volatile memory. When data is encoded as a write-reduction code appropriately, the endurance of non-volatile memory can be enhanced by writing the encoded data into the memory. We propose a highly effective write-reduction method for a multi-level cell (MLC) non-volatile memory focusing on the write-reduction code (WRC) as the optimal bit-write reduction method. The WRC can be applied only to single-level cell non-volatile memory. The proposed method generates a cell-write reduction code based on the WRC, where each cell holds multiple bits of data. Our proposed method achieves a cell-write reduction by 31.6% compared to the conventional method.

    DOI

    Scopus

  • Trojan-feature extraction at gate-level netlists and its application to hardware-Trojan detection using random forest classifier

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems     1 - 4  2017.09  [Refereed]

    Recently, due to the increase in outsourcing in IC design, it has been reported that malicious third-party vendors often insert hardware Trojans into their ICs. How to detect them is a major concern in the IC design process. The features of hardware-Trojan infected nets (or Trojan nets) in ICs often differ from those of normal nets. To classify all the nets in netlists designed by third-party vendors into Trojan ones and normal ones, we have to extract effective Trojan features from Trojan nets. In this paper, we first propose 51 Trojan features which describe Trojan nets in netlists. Based on the importance values obtained from the random forest classifier, we extract the best set of 11 Trojan features out of the 51 features which can effectively detect Trojan nets, maximizing the F-measures. By using the 11 extracted Trojan features, the machine-learning-based hardware-Trojan classifier achieves up to a 100% true positive rate as well as a 100% true negative rate in several Trust-HUB benchmarks and obtains an average F-measure of 74.6%, which are the best values among existing machine-learning-based hardware-Trojan detection methods.

    DOI

    Scopus

    129
    Citation
    (Scopus)
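
    The summary above reports true positive rate, true negative rate, and F-measure side by side. As a minimal sketch of how these metrics relate, the following computes them from a confusion matrix. The function name and the counts are hypothetical and are not taken from the paper; they only illustrate why a 100% true positive rate does not by itself imply a 100% F-measure.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return TPR, TNR, precision and F-measure for a binary net classifier."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on Trojan nets
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # recall on normal nets
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # flagged nets that are real Trojans
    denom = precision + tpr
    f_measure = 2 * precision * tpr / denom if denom else 0.0
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f_measure": f_measure}


# Hypothetical netlist: 8 Trojan nets, 992 normal nets, 4 false alarms.
metrics = classification_metrics(tp=8, fp=4, tn=988, fn=0)
print(metrics)
```

    Because normal nets vastly outnumber Trojan nets in such a netlist, a handful of false alarms barely lowers the true negative rate but lowers precision, and therefore the F-measure, considerably.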
  • Trojan-feature extraction at gate-level netlists and its application to hardware-Trojan detection using random forest classifier

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems   100-A ( 12 ) 2857 - 2868  2017.09  [Refereed]

    Recently, due to the increase in outsourcing in IC design, it has been reported that malicious third-party vendors often insert hardware Trojans into their ICs. How to detect them is a major concern in the IC design process. The features of hardware-Trojan infected nets (or Trojan nets) in ICs often differ from those of normal nets. To classify all the nets in netlists designed by third-party vendors into Trojan ones and normal ones, we have to extract effective Trojan features from Trojan nets. In this paper, we first propose 51 Trojan features which describe Trojan nets in netlists. Based on the importance values obtained from the random forest classifier, we extract the best set of 11 Trojan features out of the 51 features which can effectively detect Trojan nets, maximizing the F-measures. By using the 11 extracted Trojan features, the machine-learning-based hardware-Trojan classifier achieves up to a 100% true positive rate as well as a 100% true negative rate in several Trust-HUB benchmarks and obtains an average F-measure of 74.6%, which are the best values among existing machine-learning-based hardware-Trojan detection methods.

    DOI

    Scopus

    129
    Citation
    (Scopus)
  • Hardware Trojans classification for gate-level netlists using multi-layer neural networks

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    2017 IEEE 23rd International Symposium on On-Line Testing and Robust System Design, IOLTS 2017     227 - 232  2017.09  [Refereed]

    Recently, due to the increase in outsourcing in IC design and manufacturing, it has been reported that malicious third-party IC vendors often insert hardware Trojans into their products. Especially in the IC design step, detecting hardware Trojans is strongly required because malicious third-party vendors can easily insert hardware Trojans into their products. In this paper, we propose a machine-learning-based hardware-Trojan detection method for gate-level netlists using multi-layer neural networks. First, we extract 11 Trojan-net feature values for each net in a netlist. After that, we classify the nets in an unknown netlist into a set of Trojan nets and a set of normal nets using multi-layer neural networks. We obtained up to a 100% true positive rate with our proposed method.

    DOI

    Scopus

    99
    Citation
    (Scopus)
  • Hardware Trojan detection and classification based on steady state learning

    Masaru Oya, Masao Yanagisawa, Nozomu Togawa

    2017 IEEE 23rd International Symposium on On-Line Testing and Robust System Design, IOLTS 2017     215 - 220  2017.09  [Refereed]

    In this paper, we propose a logic-testing based HT detection and classification method utilizing steady state learning. We first observe that HTs stay hidden while random test patterns are applied for a short time, but most of them can be activated during very long-term random circuit operation. Hence it is natural to learn the steady signal-transition states of every suspicious Trojan net in a netlist by performing short-term random simulation. After that, we simulate or emulate the netlist for a very long time by giving random test patterns and obtain a set of signal-transition states. By discovering the correlation between them, our method detects HTs and identifies their behavior. Experimental results demonstrate that our method can successfully identify all the real Trojan nets as Trojan nets and all the normal nets as normal nets, while other existing logic-testing HT detection methods cannot detect some of them.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Floorplan Aware High-Level Synthesis Algorithm with Body Biasing for Delay Variation Compensation

    Koki Igawa, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E100A ( 7 ) 1439 - 1451  2017.07  [Refereed]

    In this paper, we propose a floorplan-aware high-level synthesis algorithm with body biasing for delay variation compensation, which minimizes the average leakage energy of manufactured chips. In order to realize floorplan-aware high-level synthesis, we utilize the huddle-based distributed register architecture (HDR architecture). The HDR architecture divides the chip area into small partitions called huddles, and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain the expected leakage energy of every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes more to reducing the expected leakage energy of the entire circuit but can increase the latency. We therefore assign control-data flow graph (CDFG) nodes in critical paths to the huddles with larger expected leakage energy and those in non-critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 39.7% without latency and yield degradation compared with typical-case design with body biasing.

    DOI

    Scopus

  • A Hardware-Trojan Classification Method Using Machine Learning at Gate-Level Netlists Based on Trojan Features

    Kento Hasegawa, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E100A ( 7 ) 1427 - 1438  2017.07  [Refereed]

    Due to the increase in outsourcing by IC vendors, we face a serious risk that malicious third-party vendors can very easily insert hardware Trojans into their IC products. However, detecting hardware Trojans is very difficult because today's ICs are huge and complex. In this paper, we propose a hardware-Trojan classification method for gate-level netlists to identify hardware-Trojan infected nets (or Trojan nets) using a support vector machine (SVM) or a neural network (NN). First, we extract five hardware-Trojan features from each net in a netlist. These feature values are so complicated that we cannot give simple, fixed threshold values to them. Hence, second, we represent them as a five-dimensional vector and learn them using an SVM or NN. Finally, we can successfully classify all the nets in an unknown netlist into Trojan ones and normal ones based on the learned classifiers. We have applied our machine-learning-based hardware-Trojan classification method to Trust-HUB benchmarks. The results demonstrate that our method increases the true positive rate compared to the existing state-of-the-art results in most of the cases. In some cases, our method achieves a true positive rate of 100%, which shows that all the Trojan nets in an unknown netlist are completely detected by our method.

    DOI

    Scopus

    39
    Citation
    (Scopus)
  • Efficient Multiplexer Networks for Field-Data Extractors and Their Evaluations

    Koki Ito, Kazushi Kawamura, Yutaka Tamiya, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E100A ( 4 ) 1015 - 1028  2017.04  [Refereed]

    As seen in stream data processing, it is often necessary to extract a particular data field from bulk data, for which we can use a field-data extractor. In particular, an (M, N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using multiplexers (MUXs). However, the number of required MUXs grows rapidly as the input/output byte widths increase. It is known that partitioning a MUX network reduces the number of MUXs. In this paper, we first consider a multi-layered MUX network, which is generated by repeatedly partitioning a MUX network into a collection of single-layered MUX networks. We show that the multi-layered MUX network is equivalent to a barrel shifter from which redundant MUXs and wires are removed, and we prove that its number of required MUXs is the smallest among MUX-network-partitioning based field-data extractors. Next, we propose a rotator-based MUX network for a field-data extractor, which is based on reading out particular data in an input register to a rotator. The byte width of the rotator is the same as that of its output register and hence we no longer require any extra wires or MUXs. By rotating the input data appropriately, we finally obtain correctly ordered data in an output register. Experimental results show that a multi-layered MUX network reduces the number of required gates to construct a field-data extractor by up to 97.0% compared with the one using a naive approach and its delay becomes 1.8 ns-2.3 ns. A rotator-based MUX network with a control circuit also reduces the number of required gates to construct a field-data extractor by up to 97.3% compared with the one using a naive approach and its delay becomes 2.1 ns-2.9 ns.

    DOI

    Scopus
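
    The summary above relates the multi-layered MUX network to a barrel shifter. As a rough behavioural sketch (not the paper's gate-level construction), an (M, N)-field-data extractor can be modelled as a stack of layers, where layer k either passes the data through or shifts it by 2^k bytes depending on one bit of the start offset. The function name `extract_layered` is hypothetical.

```python
def extract_layered(reg: bytes, n: int, start: int) -> bytes:
    """Read n consecutive bytes of reg beginning at byte offset start.

    Each loop iteration models one layer of 2-to-1 multiplexers that is
    controlled by bit k of the start offset, as in a logarithmic barrel shifter.
    """
    m = len(reg)
    if not 0 < n <= m or not 0 <= start <= m - n:
        raise ValueError("window must lie inside the register")
    data = list(reg)
    for k in range((m - n).bit_length()):   # number of layers grows with log2(M - N)
        if (start >> k) & 1:
            shift = 1 << k
            data = data[shift:] + [0] * shift   # shift toward byte 0, pad with zeros
    return bytes(data[:n])                      # only the first n bytes are wired out


register = bytes(range(16))                     # M = 16
print(extract_layered(register, 4, 5).hex())    # bytes 5..8 of the register
```

    In this model only the first n byte lanes are ever observed, which hints at why many MUXs and wires of a full barrel shifter are redundant for this purpose.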

  • Trojan-net feature extraction and its application to hardware-Trojan detection for gate-level netlists using random forest

    Hasegawa, K., Yanagisawa, M., Togawa, N.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 12 )  2017

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • A Bit-Write-Reducing and Error-Correcting Code Generation Method by Clustering ECC Codewords for Non-Volatile Memories

    Tatsuro Kojo, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 12 ) 2398 - 2411  2016.12  [Refereed]

    Non-volatile memories are attracting attention as a promising alternative in memory design. Data stored in them may still be corrupted due to crosstalk and radiation. We can restore the data by using error-correcting codes, which require extra bits to correct bit errors. Further, non-volatile memories consume ten to a hundred times more energy than normal memories in bit-writing. When we configure them using error-correcting codes, it is therefore necessary to reduce the number of written bits. In this paper, we propose a method to generate a bit-write-reducing code with error-correcting ability. We first pick an error-correcting code which can correct t-bit errors. We cluster its codewords and generate a cluster graph satisfying the S-bit flip conditions. We assign a data value to be written to each cluster. In other words, we generate a one-to-many mapping from each data value to the codewords in the cluster. We prove that, if the cluster graph is a complete graph, every data value in a memory cell can be rewritten into another data value by flipping at most S bits while keeping the error-correcting ability of t bits. We further propose an efficient method to cluster error-correcting codewords. Experimental results show that the bit-write-reducing and error-correcting codes generated by our proposed method efficiently reduce energy consumption. This paper proposes the world's first theoretically near-optimal bit-write-reducing code with error-correcting ability based on efficient coding theories.

    DOI

    Scopus

    1
    Citation
    (Scopus)
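
    The one-to-many mapping described in the summary above can be illustrated with a toy example. The sketch below assumes a hand-made 6-bit code with two clusters (it is not a code generated by the paper's method): the four codewords have minimum Hamming distance 3, so single-bit errors are correctable, and each data value can be reached from any stored codeword by flipping at most S = 3 bits. The names `CLUSTERS` and `choose_codeword` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Toy one-to-many mapping: data value -> cluster of codewords (assumption).
CLUSTERS = {
    0: [0b000000, 0b111000],
    1: [0b000111, 0b111111],
}


def choose_codeword(current: int, data: int) -> int:
    """Pick the codeword for data that needs the fewest bit flips from current."""
    return min(CLUSTERS[data], key=lambda cw: hamming(current, cw))


stored = 0b111000                       # currently encodes data value 0
target = choose_codeword(stored, 1)     # rewrite the cell to data value 1
print(f"{target:06b}", hamming(stored, target))
```

    Writing the nearest codeword of the target cluster, rather than a single fixed codeword per data value, is what bounds the number of flipped bits per write in this sketch.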
  • A Highly-Adaptable and Small-Sized In-Field Power Analyzer for Low-Power IoT Devices

    Ryosuke Kitayama, Takashi Takenaka, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 12 ) 2348 - 2362  2016.12  [Refereed]

    Power analysis for IoT devices is strongly required to protect against attacks from malicious attackers. It is also very important to reduce the power consumption of IoT devices themselves. In this paper, we propose a highly-adaptable and small-sized in-field power analyzer for low-power IoT devices. The proposed power analyzer has the following advantages: (A) it realizes signal-averaging noise reduction with synchronization signal lines and thus can reduce noise over a wide frequency range; (B) it partitions a long-term power analysis process into several analysis segments and measures the voltages and currents of each analysis segment using a small amount of data memory. By combining these analysis segments, we can obtain long-term analysis results; (C) it has two amplifiers that amplify current signals adaptively depending on their magnitude. Hence the maximum readable current can be increased while keeping the minimum readable current small enough. Since none of (A), (B), and (C) requires complicated mechanisms or circuits, the proposed power analyzer is implemented on just a 2.5 cm x 3.3 cm board, which is the smallest among existing power analyzers for IoT devices. We have measured the power and energy consumption of the AES encryption process on an IoT device and demonstrated that the proposed power analyzer has measurement errors of only up to 1.17% compared to a high-precision oscilloscope.

    DOI

    Scopus

    1
    Citation
    (Scopus)
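
    Advantage (A) in the summary above relies on signal averaging. As a minimal sketch of the general technique, assuming that repeated runs are already aligned by a synchronization signal and that the noise is zero-mean and independent between runs, averaging K traces reduces the noise amplitude by roughly a factor of sqrt(K). The waveform, noise level, and function names below are made up for illustration.

```python
import math
import random


def average_traces(traces):
    """Point-wise mean of equally long, already aligned traces."""
    k = len(traces)
    return [sum(column) / k for column in zip(*traces)]


def rms_error(trace, reference):
    """Root-mean-square deviation of trace from reference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(trace, reference)) / len(reference))


rng = random.Random(0)                                      # fixed seed for repeatability
clean = [1.0 + 0.5 * math.sin(2 * math.pi * i / 50) for i in range(200)]
traces = [[v + rng.gauss(0.0, 1.0) for v in clean] for _ in range(64)]
averaged = average_traces(traces)

print(rms_error(traces[0], clean), rms_error(averaged, clean))
```

    With 64 traces the expected improvement is about a factor of 8, which is why the averaged trace tracks the clean waveform much more closely than any single run.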
  • Hardware-Trojans Rank: Quantitative Evaluation of Security Threats at Gate-Level Netlists by Pattern Matching

    Masaru Oya, Noritaka Yamashita, Toshihiko Okamura, Yukiyasu Tsunoo, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 12 ) 2335 - 2347  2016.12  [Refereed]

    Since digital ICs are often designed and fabricated by third parties at any phase today, we must eliminate the risk that malicious attackers implement hardware Trojans (HTs) on them. In particular, they can easily insert HTs during the design phase. This paper proposes an HT rank, which is a new quantitative analysis criterion against HTs at gate-level netlists. We have carefully analyzed all the gate-level netlists in the Trust-HUB benchmark suite and found several Trojan net features in them. Then we design three types of Trojan points: feature point, count point, and location point. By assigning these points to every net and summing them up, we obtain the maximum Trojan point in a gate-level netlist. This point gives our HT rank. The HT rank can be calculated from net features alone, and we do not perform any logic simulation or random test. When all the gate-level netlists in the Trust-HUB, ISCAS85, ISCAS89, and ITC99 benchmark suites, as well as several OpenCores designs and HT-free and HT-inserted AES netlists, are ranked by our HT rank, we can completely distinguish HT-inserted ones (whose HT rank is ten or more) from HT-free ones (whose HT rank is nine or less). The HT rank is the world's first quantitative criterion which distinguishes HT-inserted netlists from HT-free ones in all the gate-level netlists in Trust-HUB, ISCAS85, ISCAS89, and ITC99.

    DOI

    Scopus

    10
    Citation
    (Scopus)
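
    The ranking rule in the summary above (sum three kinds of points per net, take the maximum over all nets, compare against ten) can be sketched as follows. How each point is derived from net features is not given in the summary, so the point values and net names below are purely hypothetical; only the threshold of ten comes from the text.

```python
from dataclasses import dataclass

HT_THRESHOLD = 10   # from the summary: rank >= 10 means HT-inserted


@dataclass
class NetPoints:
    name: str
    feature: int    # feature point (value assumed for illustration)
    count: int      # count point (value assumed for illustration)
    location: int   # location point (value assumed for illustration)

    @property
    def total(self) -> int:
        return self.feature + self.count + self.location


def ht_rank(nets) -> int:
    """HT rank of a netlist: the maximum summed Trojan point over its nets."""
    return max((net.total for net in nets), default=0)


def is_ht_inserted(nets) -> bool:
    return ht_rank(nets) >= HT_THRESHOLD


suspicious = [NetPoints("n1", 2, 1, 0), NetPoints("trig", 6, 3, 2)]
clean = [NetPoints("n1", 2, 1, 0), NetPoints("n2", 4, 2, 1)]
print(ht_rank(suspicious), is_ht_inserted(suspicious))
print(ht_rank(clean), is_ht_inserted(clean))
```

    Since the rank depends only on static net features, a check of this shape needs no logic simulation, which matches the claim in the summary.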
  • Multi-scenario high-level synthesis for dynamic delay variation and its evaluation on FPGA platforms

    Koki Igawa, Masao Yanagisawa, Nozomu Togawa

    IEICE ELECTRONICS EXPRESS   13 ( 18 ) 20160641  2016.09  [Refereed]

    Multi-scenario high-level synthesis for distributed register/controller architecture has been proposed targeting static delay variation. In this paper, we extend it and propose a floorplan-driven high-level synthesis algorithm which can be applied to dynamic delay variation by effectively using an error prediction technique, where pre-error registers are introduced to local registers in every circuit block. Experimental results show that the proposed algorithm using two and three scenarios on an FPGA chip reduces the number of required control steps by 17.6% and 25.5% on average, respectively, compared to worst-case high-level synthesis, at the expense of increasing lookup tables and flip-flops. Moreover, we implement a multi-scenario elliptic-wave-filter (EWF) circuit with three scenarios synthesized by our proposed algorithm onto an FPGA chip and run it in an environment with varying supply voltages, which cause dynamic delay variation. The FPGA implementation experiments also demonstrate that the EWF circuit runs effectively on the real FPGA chip. As far as we know, this is the world's first experiment in which a multi-scenario circuit runs in a real dynamic delay variation environment.

    DOI

    Scopus

  • Bi-Partitioning Based Multiplexer Network for Field-Data Extractors

    Koki Ito, Kazushi Kawamura, Yutaka Tamiya, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 7 ) 1410 - 1414  2016.07  [Refereed]

    An (M,N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using a multiplexer (MUX) network. It is used in packet analysis and/or stream data processing for video/audio data. In this letter, we propose an efficient MUX network for an (M,N)-field-data extractor. By bi-partitioning a simple MUX network into an upper one and a lower one, we can theoretically reduce the number of required MUXs without increasing the MUX network depth. Experimental results show that we can reduce the gate count by up to 92% compared to a naive approach.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Interconnection-Delay and Clock-Skew Estimate Modelings for Floorplan-Driven High-Level Synthesis Targeting FPGA Designs

    Koichi Fujiwara, Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 7 ) 1294 - 1310  2016.07  [Refereed]

    Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called "IDEF" and "CSEF". In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.

    DOI

  • A Multi-Scenario High-Level Synthesis Algorithm for Variation-Tolerant Floorplan-Driven Design

    Koki Igawa, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 7 ) 1278 - 1293  2016.07  [Refereed]

    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a multi-scenario high-level synthesis algorithm for variation-tolerant floorplan-driven design targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. At that time, we can explicitly take into account interconnection delays by using distributed-register architectures. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

    DOI

  • Indoor Navigation Based on Real-time Direction Information Generation Using Wearable Glasses

    Ryota Iwanaji, Tomoyuki Nitta, Kazuaki Ishikawa, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-ASIA (ICCE-ASIA)    2016  [Refereed]

    Indoor areas such as office buildings and railway stations have almost no landmarks and how to navigate pedestrians to their destination points there is one of the big challenges. In this paper, we propose an indoor navigation system based on real-time direction information generation using wearable glasses. The proposed system effectively calculates a pedestrian's direction in a real-time manner using sensors and superimposes the direction to proceed on wearable glasses. Hence it navigates pedestrians to their right direction without using landmarks. Experiments demonstrate the effectiveness of the proposed navigation system.

  • A High-level Synthesis Algorithm for FPGA Designs Optimizing Critical Path with Interconnection-delay and Clock-skew Consideration

    Koichi Fujiwara, Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    2016 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT)    2016  [Refereed]

    High-level synthesis for FPGA designs (FPGA-HLS) is recently required in various applications. Since wire delays are becoming a design bottleneck in FPGA, we need to handle interconnection delays and clock skews in FPGA-HLS flow. In this paper, we propose an FPGA-HLS algorithm optimizing critical path with interconnection-delay and clock-skew consideration. By utilizing HDR architecture, we floorplan circuit modules in HLS flow and, based on the result, estimate interconnection delays and clock skews. To reduce the critical-path delay(s) of a circuit, we propose two novel methods for FPGA-HLS. Experimental results demonstrate that our algorithm can improve circuit performance by up to 24% compared with conventional approaches.

  • Rotator-Based Multiplexer Network Synthesis for Field-Data Extractors

    Koki Ito, Kazushi Kawamura, Yutaka Tamiya, Masao Yanagisawa, Nozomu Togawa

    2016 29TH IEEE INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (SOCC)     194 - 199  2016  [Refereed]

    As seen in stream data processing, it is often necessary to extract a particular data field from bulk data, for which we can use a field-data extractor. In particular, an (M, N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using multiplexers (MUXs). However, the number of required MUXs grows rapidly as the input/output byte lengths increase. It is known that partitioning a MUX network reduces the number of MUXs. In this paper, we first consider a multi-layered MUX network, which is generated by repeatedly partitioning a MUX network into a collection of single-layered MUX networks. We show that the multi-layered MUX network is equivalent to a barrel shifter from which redundant MUXs and wires are removed, and we prove that its number of required MUXs is the smallest among MUX-network-partitioning based field-data extractors. Next, we propose a rotator-based MUX network for a field-data extractor, which reads out particular data in an input register to a rotator. The size of the rotator is the same as that of its output register and hence we no longer require any extra wires or MUXs. By rotating the input data appropriately, we finally obtain correctly ordered data in an output register. Experimental results show that our rotator-based MUX network reduces the required number of gates to implement a field-data extractor by up to 33% compared with the one using a multi-layered MUX network.

    DOI

    Scopus

    1
    Citation
    (Scopus)
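
    One way to read the rotator-based scheme in the summary above is sketched below. It assumes that slot j of an N-byte rotator is loaded only from input bytes whose index is congruent to j modulo N, and that a final rotation by (start mod N) restores the byte order. This is an interpretation for illustration, not the paper's circuit, and `extract_with_rotator` is a hypothetical name.

```python
def extract_with_rotator(reg: bytes, n: int, start: int) -> bytes:
    """Read n consecutive bytes of reg beginning at start via an n-byte rotator."""
    m = len(reg)
    if not 0 < n <= m or not 0 <= start <= m - n:
        raise ValueError("window must lie inside the register")
    rotator = [0] * n
    for i in range(start, start + n):
        rotator[i % n] = reg[i]         # n consecutive indices hit n distinct slots
    r = start % n
    return bytes(rotator[r:] + rotator[:r])   # rotate so the field starts at slot 0


register = bytes(range(8))              # M = 8
print(list(extract_with_rotator(register, 3, 5)))
```

    Under this reading the intermediate storage never needs to be wider than the output, which is consistent with the statement that the rotator has the same size as the output register.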
  • In-situ Trojan Authentication for Invalidating Hardware-Trojan Functions

    Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN ISQED 2016     152 - 157  2016  [Refereed]

    Because we do not know who will create hardware Trojans (HTs), or when and where they will be inserted, it is very difficult to correctly and completely detect all the real HTs in untrusted ICs. It is therefore desirable to incorporate in-situ HT invalidating functions into untrusted ICs as a countermeasure against HTs. This paper proposes an in-situ Trojan authentication technique for gate-level netlists to avoid security leakage. In the proposed approach, an untrusted IC operates in authentication mode and normal mode. In the authentication mode, an embedded Trojan authentication circuit monitors the bit-flipping count of a suspicious Trojan net within a predefined constant number of clock cycles and identifies whether it is a real Trojan or not. If the authentication condition is satisfied, the suspicious Trojan net is validated. Otherwise, it is invalidated and HT functions are masked. By doing this, even untrusted netlists with HTs can still be used in the normal mode without security leakage. By setting an appropriate authentication condition using training sets from Trust-HUB gate-level benchmarks, the proposed technique successfully invalidates only the HTs in the training sets. Furthermore, by embedding the in-situ Trojan authentication circuit into a Trojan-inserted AES crypto netlist, the netlist can run securely and correctly even if HTs exist, where the area overhead is just 1.5% with no delay overhead.

    DOI

    Scopus

    6
    Citation
    (Scopus)
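
    The authentication mode described in the summary above counts bit flips of a suspicious net over a fixed number of clock cycles. The sketch below is a software model under two explicit assumptions that the summary does not state: the authentication condition is taken to be "at least `min_flips` transitions in the window", and masking is modelled as forcing the net to a constant safe value. All names and numbers are hypothetical.

```python
def count_flips(samples) -> int:
    """Number of 0->1 or 1->0 transitions between consecutive clock cycles."""
    return sum(a != b for a, b in zip(samples, samples[1:]))


def authenticate(samples, window: int, min_flips: int) -> bool:
    """Validate a net if it toggles often enough within the window (assumed condition)."""
    return count_flips(samples[:window]) >= min_flips


def net_output(value: int, validated: bool, safe_value: int = 0) -> int:
    """In normal mode, pass validated nets through and mask invalidated ones."""
    return value if validated else safe_value


busy_net = [0, 1] * 8              # toggles every cycle
quiet_net = [0] * 15 + [1]         # flips once, as a rarely activated trigger might
print(authenticate(busy_net, 16, 4), authenticate(quiet_net, 16, 4))
print(net_output(1, authenticate(quiet_net, 16, 4)))
```

    The threshold would in practice be tuned on training netlists, as the summary notes for the Trust-HUB benchmarks.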
  • A Delay Variation and Floorplan Aware High-level Synthesis Algorithm with Body Biasing

    Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN ISQED 2016     75 - 80  2016  [Refereed]

    In this paper, we propose a delay variation and floorplan aware high-level synthesis algorithm with body biasing, which minimizes the average leakage energy of manufactured chips. To realize floorplan-oriented high-level synthesis, we utilize a huddle-based distributed register architecture (HDR architecture), one of the DR architectures. The HDR architecture divides the chip area into small partitions called huddles, and we can control a body bias voltage for every huddle. During high-level synthesis, we iteratively obtain the expected leakage energy of every huddle when applying a body bias voltage. A huddle with smaller expected leakage energy contributes to reducing the expected leakage energy of the entire circuit but can increase the latency. We assign CDFG nodes in critical paths to the huddles with larger expected leakage energy and those in non-critical paths to the huddles with smaller expected leakage energy. We expect to minimize the entire leakage energy in a manufactured chip without increasing its latency. Experimental results show that our algorithm reduces the average leakage energy by up to 38.9% without latency and yield degradation compared with typical-case design with body biasing.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A High-performance Circuit Design Algorithm using Data Dependent Approximation

    Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    2016 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     95 - 96  2016  [Refereed]

    This paper proposes a high-performance circuit design algorithm using input data dependent approximation. In our algorithm, STEPCs (Suspicious Timing Error Prediction Circuits) are utilized for identifying the paths to be optimized inside a circuit efficiently. Experimental results targeting a set of basic adders show that our algorithm can achieve performance increase by up to 11.1% within the error rate of 2.1% compared to a conventional design technique.

    DOI

    Scopus

  • Hash-Table and Balanced-Tree Based FIB Architecture for CCN Routers

    Kenta Shimazaki, Takashi Aoki, Takahiro Hatano, Takuya Otsuka, Akihiko Miyazaki, Toshitaka Tsuda, Nozomu Togawa

    2016 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     67 - 68  2016  [Refereed]

    Recently, content centric networking (CCN) has attracted attention as a next-generation network in which every router forwards a packet to another router and also functions as a server. A CCN router has a forwarding table called the FIB (Forwarding Information Base), but its table look-up can become a bottleneck. In this paper, we propose a FIB data structure for CCN routers which can reduce the number of comparisons in its table look-up. Our proposed FIB is composed of a Bloom filter and a hash table, and each hash entry is connected to a balanced binary search tree. By using our FIB, the number of comparisons does not increase much even if hash collisions occur. Experimental results demonstrate the effectiveness of the proposed FIB over several existing methods.

    DOI

    Scopus

    2
    Citation
    (Scopus)
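
    The three-stage structure in the summary above (Bloom filter, hash table, per-entry search tree) can be sketched in software as follows. Two simplifications are assumed: each bucket is a sorted list searched with `bisect` as a stand-in for a balanced binary search tree, and lookup is an exact match on the name rather than the prefix matching a real CCN router would need. The class name `Fib`, its parameters, and the example names are hypothetical.

```python
import bisect
import hashlib


class Fib:
    def __init__(self, n_buckets: int = 8, bloom_bits: int = 256, n_hashes: int = 3):
        self.buckets = [[] for _ in range(n_buckets)]   # sorted lists of (name, face)
        self.bloom = 0                                  # Bloom filter as a bit mask
        self.bloom_bits = bloom_bits
        self.n_hashes = n_hashes

    def _hashes(self, name: str):
        return [int.from_bytes(hashlib.sha256(f"{i}:{name}".encode()).digest()[:8], "big")
                for i in range(self.n_hashes)]

    def insert(self, name: str, face: int) -> None:
        hashes = self._hashes(name)
        for h in hashes:
            self.bloom |= 1 << (h % self.bloom_bits)
        bucket = self.buckets[hashes[0] % len(self.buckets)]
        i = bisect.bisect_left(bucket, (name,))   # (name,) sorts before (name, face)
        if i < len(bucket) and bucket[i][0] == name:
            bucket[i] = (name, face)
        else:
            bucket.insert(i, (name, face))

    def lookup(self, name: str):
        hashes = self._hashes(name)
        if any(((self.bloom >> (h % self.bloom_bits)) & 1) == 0 for h in hashes):
            return None                            # definitely absent: skip the table
        bucket = self.buckets[hashes[0] % len(self.buckets)]
        i = bisect.bisect_left(bucket, (name,))    # O(log n) comparisons per bucket
        if i < len(bucket) and bucket[i][0] == name:
            return bucket[i][1]
        return None


fib = Fib(n_buckets=1)                             # one bucket forces hash collisions
for face, name in enumerate(["/waseda/video", "/waseda/audio", "/example/doc"]):
    fib.insert(name, face)
print(fib.lookup("/waseda/audio"), fib.lookup("/missing"))
```

    Even with every name colliding into one bucket, the ordered bucket keeps the number of comparisons logarithmic in the bucket size, which is the property the summary attributes to the balanced tree.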
  • Scalable and Small-Sized Power Analyzer Design with Signal-Averaging Noise Reduction for Low-Power IoT Devices

    Ryosuke Kitayama, Takashi Takenaka, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     978 - 981  2016  [Refereed]

    Power analysis for IoT devices is strongly required to reduce power consumption and realize secure communications. In this paper, we propose a scalable and small-sized power analyzer with signal-averaging noise reduction for low-power IoT devices. The proposed power analyzer reduces noise over a wide frequency range by using a signal averaging method and is implemented on just a 2 cm x 3 cm board, which is the smallest among existing power analyzers for IoT devices. It further has the following advantages: (a) It has a two-level amplifier that amplifies current signals adaptively depending on their magnitude. Hence the maximum readable current can be increased while keeping the minimum readable current small enough. (b) If long-time analysis is required, it can be partitioned into several analysis segments. The proposed power analyzer can measure the currents and voltages of each analysis segment using a small amount of data memory. After that, by combining these analysis segments using a timer module, we can obtain long-time analysis results. We have analyzed the power and energy consumption of the encryption processes of the AES block cipher on an IoT device and demonstrated that the proposed power analyzer has only a 1.8% measurement error compared with a high-precision oscilloscope.

    DOI

    Scopus

  • Redesign for Untrusted Gate-level Netlists

    Masaru Oya, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE 22ND INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS)     219 - 220  2016  [Refereed]

    This paper proposes a redesign technique that converts untrusted netlists into trusted netlists. Our approach consists of two phases, a detection phase and an invalidation phase. The detection phase picks out suspicious hardware Trojans (HTs) by pattern matching. The invalidation phase modifies the suspicious HTs so that they are not activated. In the invalidation phase, one of three invalidation techniques is selected by analyzing the location of the suspicious malicious nets. Appropriately applying the invalidation technique to the nets can correctly invalidate HTs. In our results, the proposed technique successfully invalidates HTs on several Trust-HUB benchmarks without HT activations. The results clearly demonstrate that our redesign technique is very effective in removing HT risks.

    DOI

    Scopus

  • Hardware Trojans Classification for Gate-level Netlists based on Machine Learning

    Kento Hasegawa, Masaru Oya, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE 22ND INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS)     203 - 206  2016  [Refereed]

    Recently, we face a serious risk that malicious third-party vendors can very easily insert hardware Trojans into their IC products, while it is very difficult to analyze huge and complex ICs. In this paper, we propose a hardware-Trojan classification method to identify hardware-Trojan infected nets (or Trojan nets) using a support vector machine (SVM). Firstly, we extract five hardware-Trojan features from each net in a netlist. Secondly, since we cannot effectively give simple and fixed threshold values to them to detect hardware Trojans, we represent them as a five-dimensional vector and learn them by using the SVM. Finally, we can successfully classify the set of all nets in an unknown netlist into Trojan ones and normal ones based on the learned SVM classifier. We have applied our SVM-based hardware-Trojan classification method to Trust-HUB benchmarks, and the results demonstrate that our method substantially increases the true positive rate compared to the existing state-of-the-art results in most cases. In some cases, our method achieves a true positive rate of 100%, which shows that all the Trojan nets in a netlist are completely detected by our method.

    DOI

    Scopus

    123
    Citation
    (Scopus)
  • Pedestrian Navigation Based on Landmark Recognition Using Glass-type Wearable Devices

    Ryoya Yano, Tomoyuki Nitta, Kazuaki Ishikawa, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE 5TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS     1 - 2  2016  [Refereed]

    In this paper, we propose a pedestrian navigation system based on landmark recognition. Our proposed system utilizes a glass-type wearable device and gives a correspondence between a map and a real-world landscape. By recognizing a landmark position effectively, a pedestrian can easily know where to turn at each turning position and hence he/she can reach his/her goal without losing his/her way. Experimental results demonstrate that the proposed system can increase the landmarks which pedestrians can recognize and thus gives comprehensive navigation effectively.

    DOI

    Scopus

  • Comprehensive Deformed Map Generation for Wristwatch-type Wearable Devices Based on Landmark-based Partitioning

    Keisuke Kono, Tomoyuki Nitta, Kazuaki Ishikawa, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE 5TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS     1 - 2  2016  [Refereed]

    Recently, wristwatch-type wearable devices have been developed and geographic information services have become widely available on them. In this paper, we propose a comprehensive deformed map generation algorithm for wristwatch-type wearable devices. Our algorithm first normalizes a pedestrian route to 0 degrees, 45 degrees, or 90 degrees so that the pedestrian can see the route without tilting the wristwatch-type wearable device on his/her wrist. Second, our algorithm partitions the normalized map so that several landmarks overlap between the partitioned sub-maps. Hence the sub-maps can be displayed at a large size on wristwatch-type wearable devices and the pedestrian can recognize his/her location even when the displayed sub-map changes. Experiments demonstrate the effectiveness of our deformed map generation algorithm on wristwatch-type wearable devices.
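
    One plausible reading of the first step (normalizing route directions to multiples of 45 degrees) is sketched below. The summary does not say how the normalization is done, so this is an assumption: each segment keeps its length and its direction is snapped to the nearest multiple of 45 degrees. The function name `snap_route` and the sample coordinates are hypothetical.

```python
import math


def snap_route(points, step_deg=45.0):
    """Rebuild a polyline so that every segment direction is a multiple of step_deg.

    Segment lengths are preserved; only the directions change (an assumption).
    """
    snapped = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        snapped_angle = math.radians(round(angle / step_deg) * step_deg)
        px, py = snapped[-1]
        snapped.append((px + length * math.cos(snapped_angle),
                        py + length * math.sin(snapped_angle)))
    return snapped


# A slightly crooked route: roughly east, then roughly north.
route = [(0.0, 0.0), (10.0, 1.0), (11.0, 10.0)]
snapped = snap_route(route)
```

    Because later points are rebuilt from the snapped directions, the deformed route drifts from true coordinates, which is acceptable for a schematic map but not for positioning.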

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • A Safe and Comprehensive Route Finding Method for Pedestrian Based on Lighting and Landmark

    Siya Bao, Tomoyuki Nitta, Kazuaki Ishikawa, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE 5TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS     1 - 5  2016  [Refereed]

    This paper proposes a safe and comprehensive route finding method for pedestrians. We evaluate five factors that relieve pedestrians' fear of darkness. Based upon the evaluation, we propose a comprehensive route finding method that takes road width and the reduction of turning points into consideration. The experimental results in real outdoor environments under different lighting situations confirm that the proposed method can obtain safe and comprehensive routes for pedestrians.

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • Implementation Evaluation of Scan-based Attack against a Trivium Cipher Circuit

    Daisuke Oku, Masao Yanagisawa, Nozomu Togawa

    2016 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     220 - 223  2016  [Refereed]

    Scan-path test, which is one of the design-for-test techniques using a scan chain, can control and observe internal registers in an LSI chip. However, attackers can also use it to retrieve secret information from cipher circuits. Recently, scan-based attacks using a scan chain inside an LSI chip have been reported, which can restore secret information by analyzing the scan data during cryptographic processing. In this paper, we pick up a scan-based attack method against a Trivium cipher, one of the synchronous stream ciphers, and evaluate it using the FPGA platform called SASEBO-GII. We implement the Trivium cipher on the FPGA chip and perform the scan-based attack against it. We demonstrate that the scan-based attack can successfully restore the secret information in the FPGA chip within several minutes, even if the FPGA chip contains several circuits other than the Trivium cipher circuit, which reveals that the scan-based attack against the Trivium cipher is not only a simulation threat but also a real threat.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Scan-Based Side-Channel Attack on the Camellia Block Cipher Using Scan Signatures

    Huiqian Jiang, Mika Fujishiro, Hirokazu Kodera, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 12 ) 2547 - 2555  2015.12  [Refereed]

    Camellia is a block cipher jointly developed by Mitsubishi and NTT of Japan. It is designed to be suitable for both software and hardware implementations. One of the design-for-test techniques using scan chains is called scan-path test, in which testers can observe and control the registers inside the LSI chip directly in order to check whether the LSI chip operates correctly. Recently, a scan-based side-channel attack has been reported which retrieves the secret information from the cryptosystem using scan chains. In this paper, we propose a scan-based attack method on the Camellia cipher using scan signatures. Our proposed method is based on the equivalent transformation of the Camellia algorithm and the possible key candidate reduction in order to retrieve the secret key. Experimental results show that our proposed method successfully retrieves its 128-bit secret key using 960 plaintexts even if the scan chain includes the Camellia cipher and other circuits, and also successfully retrieves its secret key on the SASEBO-GII board, which is a side-channel attack standard evaluation board.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Hardware-Trojans Identifying Method Based on Trojan Net Scoring at Gate-Level Netlists

    Masaru Oya, Youhua Shi, Noritaka Yamashita, Toshihiko Okamura, Yukiyasu Tsunoo, Satoshi Goto, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 12 ) 2537 - 2546  2015.12  [Refereed]

    Outsourcing IC design and fabrication is one of the effective solutions to reduce design cost, but it may cause severe security risks. Particularly, malicious outside vendors may implement Hardware Trojans (HTs) on ICs. When we focus on the IC design phase, we cannot assume an HT-free netlist or a Golden netlist, and it is too difficult to identify whether a given netlist is HT-free or not. In this paper, we propose a score-based hardware-Trojans identifying method at gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but instead detects a net included in HTs, which is called a Trojan net. Firstly, we observe Trojan nets in several HT-inserted benchmarks and extract several of their features. Secondly, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks and ISCAS85 benchmarks as well as HT-free and HT-inserted AES gate-level netlists. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as "HT-inserted" and all the HT-free gate-level benchmarks as "HT-free" in approximately three hours for each benchmark.

    DOI

    Scopus

    12
    Citation
    (Scopus)
  • ECC-Based Bit-Write Reduction Code Generation for Non-Volatile Memory

    Masashi Tawada, Shinji Kimura, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 12 ) 2494 - 2504  2015.12  [Refereed]

    Non-volatile memory has many advantages such as high density and low leakage power, but it consumes more writing energy than SRAM. It is quite necessary to reduce writing energy in non-volatile memory design. In this paper, we propose write-reduction codes based on error-correcting codes and reduce writing energy in non-volatile memory by decreasing the number of writing bits. When data is written into memory cells, we do not write it directly but encode it into a codeword. In our write-reduction codes, every data value corresponds to an information vector in an error-correcting code, and an information vector corresponds not to a single codeword but to a set of write-reduction codewords. Given the data to be written and the current memory bits, we can deterministically select a particular write-reduction codeword corresponding to the data to be written, where the maximum number of flipped bits is theoretically minimized. Then the number of writing bits into memory cells will also be minimized. Experimental results demonstrate that we have achieved writing-bits reduction by an average of 51% and energy reduction by an average of 33% compared to non-encoded memory.
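
    The core idea of mapping each data value to a set of codewords and choosing the one closest to the current memory contents can be shown with a much simpler scheme than the ECC-based codes of this paper. The sketch below is an assumption-laden stand-in: it uses an inversion flag (in the spirit of Flip-N-Write), so every data word has exactly two candidate codewords, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def candidates(data, n_bits):
    """The two codewords for a data word: stored as-is, or inverted with flag = 1."""
    mask = (1 << n_bits) - 1
    plain = (data & mask) << 1             # flag bit (LSB) = 0
    inverted = ((~data & mask) << 1) | 1   # flag bit (LSB) = 1
    return [plain, inverted]


def encode(data, current, n_bits):
    """Pick the candidate codeword that flips the fewest currently stored bits."""
    return min(candidates(data, n_bits), key=lambda cw: hamming(cw, current))


def decode(codeword, n_bits):
    """Recover the data word from a stored codeword."""
    mask = (1 << n_bits) - 1
    payload = codeword >> 1
    return (~payload & mask) if codeword & 1 else payload


n_bits = 8
current = 0b111111110            # cells currently hold 0xFF with flag 0
stored = encode(0x00, current, n_bits)
flipped = hamming(stored, current)
```

    For an n-bit word plus one flag bit, the two candidates' flip counts always sum to n + 1, so the chosen codeword flips at most half of the stored bits (rounded down). The paper's codes additionally keep error-correcting ability, which this stand-in does not.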

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Code Generation Limiting Maximum and Minimum Hamming Distances for Non-Volatile Memories

    Tatsuro Kojo, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 12 ) 2484 - 2493  2015.12  [Refereed]

    Data stored in non-volatile memories may be destroyed due to crosstalk and radiation, but we can restore the data by using error-correcting codes. However, non-volatile memories consume a large amount of energy in writing. How to reduce the maximum number of writing bits even when using error-correcting codes is one of the challenges in non-volatile memory design. In this paper, we first propose Doughnut code, which is based on state encoding limiting maximum and minimum Hamming distances. After that, we propose a code expansion method, which improves maximum and minimum Hamming distances. When we apply our code expansion method to Doughnut code, we can obtain a code which reduces maximum-flipped bits and has error-correcting ability equal to Hamming code. Experimental results show that the proposed code efficiently reduces the number of maximum-writing bits.
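
    The two quantities this summary constrains can be computed directly for any small codebook. The sketch below does not construct Doughnut code; it only measures the minimum pairwise Hamming distance (which governs error detection and correction) and the maximum pairwise distance (which bounds how many bits flip when one codeword is overwritten by another). The example codebook, the even-weight words of length 3, is a hypothetical illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def distance_bounds(codewords):
    """Return (minimum, maximum) Hamming distance over all pairs of distinct codewords."""
    distances = [hamming(a, b) for a, b in combinations(codewords, 2)]
    return min(distances), max(distances)


# Even-weight words of length 3: any two differ in exactly two positions.
even_weight = [0b000, 0b011, 0b101, 0b110]
print(distance_bounds(even_weight))  # (2, 2)
```

    In this toy codebook a single-bit error is always detectable (minimum distance 2), and rewriting any stored codeword to any other flips only two of the three bits (maximum distance 2).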

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A floorplan-driven high-level synthesis algorithm with multiple-operation chainings based on path enumeration

    Kotaro Terada, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems   2015-   2129 - 2132  2015.07  [Refereed]

     View Summary

    As process technologies advance, interconnection delays are not negligible even in high-level synthesis and regular-distributed-register (RDR) architecture has been proposed to cope with this problem. In this paper, we propose a floorplan-driven high-level synthesis algorithm using multiple-operation chainings composed of two or more operations, and reduce the overall latency targeting RDR architecture. Our algorithm enumerates multiple-operation-chaining path candidates before performing scheduling/binding. Based on them, we find out optimal ones taking into account RDR floorplan information. Experimental results show that our algorithm successfully reduces the latency by up to 30.4% compared to the conventional approaches.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • An Effective Suspicious Timing-Error Prediction Circuit Insertion Algorithm Minimizing Area Overhead

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 7 ) 1406 - 1418  2015.07  [Refereed]

    As process technologies advance, timing-error correction techniques have become important as well. A suspicious timing-error prediction (STEP) technique has been proposed recently, which predicts timing errors by monitoring the middle points, or check points, of several speed-paths in a circuit. However, if we insert STEP circuits (STEPCs) in the middle points of all the paths from primary inputs to primary outputs, we need many STEPCs and thus require too much area overhead. How to determine these check points is very important. In this paper, we propose an effective STEPC insertion algorithm minimizing area overhead. Our proposed algorithm moves the STEPC insertion positions to minimize inserted STEPC counts. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs and reduce the required number of STEPCs to 1/10-1/80 and their area to 1/5-1/8 compared with a naive algorithm. Furthermore, our algorithm realizes 1.12X-1.5X overclocking compared with just inserting STEPCs into several speed-paths.
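
    The max-flow and min-cut step can be illustrated on a toy graph. The sketch below is a generic Edmonds-Karp minimum cut, not the paper's formulation: how the circuit is turned into a flow network is not given in the summary, so the unit capacities on internal connections and the node names (`S`, `T`, `in1`, `a`, `b`, ...) are assumptions. Cut edges are read as candidate check-point positions that together intercept every input-to-output path.

```python
from collections import defaultdict, deque


def min_cut(edges, source, sink):
    """Return the edges of a minimum source-sink cut, found with Edmonds-Karp."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap

    def bfs_parents():
        # Breadth-first search over edges that still have residual capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            break  # no augmenting path left: the flow is maximum
        path = []
        v = sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck

    # Nodes still reachable from the source form one side of the minimum cut.
    reachable = set(parents)
    return [(u, v) for u, v, _ in edges if u in reachable and v not in reachable]


BIG = 10 ** 9  # effectively uncuttable helper edges to the super source and sink
edges = [
    ("S", "in1", BIG), ("S", "in2", BIG),
    ("in1", "a", 1), ("in2", "a", 1),
    ("a", "b", 1),
    ("b", "out1", 1), ("b", "out2", 1),
    ("out1", "T", BIG), ("out2", "T", BIG),
]
cut = min_cut(edges, "S", "T")
```

    In this toy circuit all four input-to-output paths share the connection from `a` to `b`, so a single check point there covers every path, instead of one per path.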

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Floorplan-Driven High-Level Synthesis Algorithm for Multiplexer Reduction Targeting FPGA Designs

    Koichi Fujiwara, Kazushi Kawamura, Shin-ya Abe, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 7 ) 1392 - 1405  2015.07  [Refereed]

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock trading and reconfigurable network processing. In HLS for FPGA designs, we need to consider module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-driven HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider module floorplan in HLS. In order to reduce multiplexer cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and latency by up to 22% compared with conventional approaches while the number of required control steps is almost the same.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • An Energy-Efficient Floorplan Driven High-Level Synthesis Algorithm for Multiple Clock Domains Design

    Shin-ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 7 ) 1376 - 1391  2015.07  [Refereed]

    In this paper, we first propose an HDR-mcd architecture, which integrates periodically all-in-phase based multiple clock domains and multi-cycle interconnect communication into high-level synthesis. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay should be considered during high-level synthesis. Next, we propose a high-level synthesis algorithm for HDR-mcd, which can reduce energy consumption by optimizing configuration and placement of huddles. Experimental results show that the proposed method achieves 32.5% energy-saving compared with the existing single clock domain based methods.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A High-Level Synthesis Algorithm with Inter-Island Distance Based Operation Chainings for RDR Architectures

    Kotaro Terada, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 7 ) 1366 - 1375  2015.07  [Refereed]

    In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and regular-distributed-register architectures (RDR architectures) have been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm using operation chainings which reduces the overall latency targeting RDR architectures. Our algorithm consists of three steps: The first step enumerates candidate operations for chaining. The second step introduces maximal chaining distance (MCD), which gives the maximal allowable inter-island distance on RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously based on the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for RDR architecture. Experimental results show that our proposed algorithm reduces the latency by up to 40.0% compared to the original approach, and by up to 25.0% compared to a conventional approach. Our algorithm also reduces the number of registers and the number of multiplexers compared to the conventional approaches in some cases.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Energy-efficient High-level Synthesis for HDR Architecture with Multi-stage Clock Gating

    Akasaka Hiroyuki, Abe Shin-ya, Yanagisawa Masao, Togawa Nozomu

    IMT   10 ( 1 ) 1 - 7  2015

    With the miniaturization and high performance of current and future LSIs, demand for portable devices has increased considerably. In particular, problems of battery runtime and device overheating have arisen. In addition, with the downscaling of the LSI design process, the ratio of interconnection delay to gate delay has continued to increase. High-level synthesis that estimates interconnection delays and reduces energy consumption is essential. In this paper, we propose a high-level synthesis algorithm based on HDR architectures (huddle-based distributed register architectures) utilizing multi-stage clock gating. By increasing the number of clock gating stages in each huddle, we increase the number of control steps at which we can apply clock gating to registers. We can determine the configuration of the clock gating with optimized energy consumption. The experimental results demonstrate that our proposed algorithm reduces energy consumption by up to 27.7% compared with conventional algorithms.

    DOI CiNii

  • Fast source optimization by clustering algorithm based on lithography properties

    Masashi Tawada, Takaki Hashimoto, Keishi Sakanushi, Shigeki Nojima, Toshiya Kotani, Masao Yanagisawa, Nozomu Togawa

    DESIGN-PROCESS-TECHNOLOGY CO-OPTIMIZATION FOR MANUFACTURABILITY IX   9427  2015  [Refereed]

    Lithography is a technology to make circuit patterns on a wafer. UV light diffracted by a photomask forms optical images on a photoresist. Then, the photoresist melts where the amount of exposed UV light exceeds the threshold. The UV light diffracted by a photomask passes through a lens and exposes the photoresist on the wafer. Its lightness and darkness generate patterns on the photoresist. As the technology node advances, the feature sizes on the photoresist become much smaller. Diffracted UV light is dispersed on the wafer, and exposing photoresists has thus become more difficult. Exposure source optimization (SO) techniques for optimizing the illumination shape have been studied. Although an exposure source has hundreds of grid-points, all previous works deal with them one by one. They therefore consume too much running time, which greatly increases design time. How to reduce the number of parameters to be optimized in SO is the key to decreasing source optimization time. In this paper, we propose a variation-resilient and high-speed cluster-based exposure source optimization algorithm. We focus on image log slope (ILS) and use it for generating clusters. When an optical image formed by a source shape has a small ILS value at an EPE (edge placement error) evaluation point, dose/focus variation strongly affects the EPE value. When an optical image formed by a source shape has a large ILS value at an evaluation point, dose/focus variation affects the EPE value less. In our algorithm, we cluster several grid-points with similar ILS values and reduce the number of parameters to be simultaneously optimized in SO. Our clustering algorithm is composed of two steps: In STEP 1, we cluster grid-points into four groups based on the ILS values of grid-points at each evaluation point. In STEP 2, we generate super clusters from the clusters generated in STEP 1. We consider the set of grid-points in each cluster to be a single light source element. As a result, we can solve the SO problem very fast. Experimental results demonstrate that our algorithm runs faster than a conventional algorithm while keeping the EPE values.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Floorplan-Aware High-Level Synthesis Technique with Delay-Variation Tolerance

    Kazushi Kawamura, Yuta Hagio, Youhua Shi, Nozomu Togawa

    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC)     122 - 125  2015  [Refereed]

    To realize a better trade-off between performance and yield rate in recent LSI designs, it is necessary to deal with the increasing ratio of interconnect delay as well as delay variation. In this paper, a novel floorplan-aware high-level synthesis technique with delay-variation tolerance is proposed. By utilizing floorplan-driven architectures, interconnect delays can be estimated and then handled even in high-level synthesis. Applying our technique makes it possible to realize two scheduling/binding results (one is a non-delayed result and the other is a delayed result) simultaneously on a chip with small area/performance overhead, and either one of them can be selected according to the post-silicon delay variation. Experimental results demonstrate that our technique can reduce delayed scheduling/binding latency by up to 32.3% compared with conventional approaches.

  • Scan-based Side-channel Attack against Symmetric Key Ciphers Using Scan Signatures

    Mika Fujishiro, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC)     309 - 312  2015  [Refereed]

    There are a number of studies on side-channel attacks, which use information obtained from the physical implementation of a cryptosystem. A scan-based side-channel attack utilizes scan chains, one of the design-for-test techniques, and retrieves the secret information inside the cryptosystem. In this paper, scan-based side-channel attack methods against symmetric key ciphers such as block ciphers and stream ciphers using scan signatures are presented to show the risk of scan-based attacks.

  • Partitioning-Based Multiplexer Network Synthesis for Field-Data Extractors

    Koki Ito, Yutaka Tamiya, Masao Yanagisawa, Nozomu Togawa

    2015 28TH IEEE INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (SOCC)     263 - 268  2015  [Refereed]

    As seen in packet analysis of TCP/IP offload engine and stream data processing for video/audio data, it is necessary to extract a particular data field from bulk data, where we can use a field-data extractor. Particularly, an (M, N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using multiplexers. However, the number of required multiplexers increases too much as the input/output byte lengths increase. How to reduce the number of its required multiplexers is a major challenge. In this paper, we propose an efficient multiplexer network synthesis method for an (M, N)-field-data extractor. Our method is based on inserting an (N + B - 1)-byte virtual intermediate register into a multiplexer network and partitioning it into an upper network and a lower network. Our method theoretically reduces the number of required multiplexers without increasing the multiplexer network depth. We also propose how to determine the size of the virtual intermediate register that minimizes the number of required multiplexers. Experimental results show that our method reduces the required number of gates to implement a field-data extractor by up to 92% compared with the one using a naive multiplexer network.
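
    A behavioral model makes the (M, N)-field-data extractor and the growth of the naive multiplexer network concrete. The sketch below does not implement the partitioning method of the paper. The cost model is an assumption chosen for illustration: each output byte selects among its M - N + 1 possible source bytes, and a k-to-1 selector is counted as k - 1 byte-wide two-input multiplexers. The function names are hypothetical.

```python
def extract_field(register, offset, n):
    """Behavioral model: read n consecutive bytes starting at offset."""
    if not 0 <= offset <= len(register) - n:
        raise ValueError("field does not fit inside the register")
    return register[offset:offset + n]


def naive_mux2_count(m, n):
    """Byte-wide 2-to-1 multiplexers needed by a direct (naive) network.

    Output byte j can be driven by input bytes j .. j + (m - n), that is
    (m - n + 1) sources, and a k-to-1 selector needs k - 1 two-input multiplexers.
    """
    return n * (m - n)


register = bytes(range(16))               # a 16-byte register holding 0x00 .. 0x0F
field = extract_field(register, 3, 4)     # bytes 3, 4, 5, 6
cost_16_4 = naive_mux2_count(16, 4)       # 4 outputs x 12 multiplexers each
```

    Under this cost model the naive count grows with the product of N and M - N, which is why a restructured network pays off for long registers.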

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Process-Variation-Aware Multi-Scenario High-Level Synthesis Algorithm for Distributed-Register Architectures

    Koki Igawa, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2015 28TH IEEE INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (SOCC)     7 - 12  2015  [Refereed]

    In order to tackle a process-variation problem, we can define several scenarios, each of which corresponds to a particular LSI behavior, such as a typical-case scenario and a worst-case scenario. By designing a single LSI chip which realizes multiple scenarios simultaneously, we can have a process-variation-tolerant LSI chip. In this paper, we propose a process-variation-aware low-latency and multi-scenario high-level synthesis algorithm targeting new distributed-register architectures, called HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them onto a single chip. We first schedule/bind each of the scenarios independently. After that, we commonize the scheduling/binding results for the typical-case and worst-case scenarios and thus generate a commonized area-minimized floorplan result. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 50% without increasing the latency of the worst-case scenario, compared with several existing methods.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A Floorplan-Driven High-Level Synthesis Algorithm with Multiple-Operation Chainings based on Path Enumeration

    Kotaro Terada, Masao Yanagisawa, Nozomu Togawa

    2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     2129 - 2132  2015  [Refereed]

    As process technologies advance, interconnection delays are not negligible even in high-level synthesis and regular-distributed-register (RDR) architecture has been proposed to cope with this problem. In this paper, we propose a floorplan-driven high-level synthesis algorithm using multiple-operation chainings composed of two or more operations, and reduce the overall latency targeting RDR architecture. Our algorithm enumerates multiple-operation-chaining path candidates before performing scheduling/binding. Based on them, we find out optimal ones taking into account RDR floorplan information. Experimental results show that our algorithm successfully reduces the latency by up to 30.4% compared to the conventional approaches.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Bit-Write-Reducing and Error-Correcting Code Generation by Clustering Error-Correcting Codewords for Non-Volatile Memories

    Tatsuro Kojo, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    2015 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD)     682 - 689  2015  [Refereed]

    Non-volatile memories are attracting attention as a promising alternative in memory design. Data stored in them may still be destroyed due to crosstalk and radiation. We can restore the data by using error-correcting codes, which require extra bits to correct bit errors. Further, non-volatile memories consume ten to a hundred times more energy than normal memories in bit-writing. When we configure them using error-correcting codes, it is quite necessary to reduce writing bits. In this paper, we propose a method to generate a bit-write-reducing code with error-correcting ability. We first pick up an error-correcting code which can correct t-bit errors. We cluster its codewords and generate a cluster graph satisfying the S-bit flip conditions. We assign a data value to be written to each cluster. In other words, we generate a one-to-many mapping from each data value to the codewords in the cluster. We prove that, if the cluster graph is a complete graph, every data value in a memory cell can be re-written into another data value by flipping at most S bits while keeping the error-correcting ability at t bits. We further propose an efficient method to cluster error-correcting codewords. Experimental results demonstrate that, when we apply our bit-write-reducing code to MediaBench applications, it can reduce writing-bit counts by up to 28.2% and also energy consumption of non-volatile memory cells by up to 27.9% compared to existing error-correcting codes keeping the same error-correcting ability. This paper proposes the world's first theoretically near-optimal bit-write-reducing code with error-correcting ability based on the efficient coding theories.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Effective Parallel Algorithm for GPGPU-Accelerated Explicit Routing Optimization

    Ko Kikuta, Eiji Oki, Naoaki Yamanaka, Nozomu Togawa, Hidenori Nakazato

    2015 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM)     1 - 6  2015  [Refereed]

    The recent development of network technologies that offer centralized control of explicit routes opens the door to the online optimization of explicit routing. For this kind of Traffic Engineering optimization, raising the calculation speed by using multi-core processors with effective parallel algorithms is a key goal. This paper proposes an effective parallel algorithm for general-purpose computing on graphics processing units (GPGPU); its massively parallel style promises strong acceleration of calculation speed. The proposed algorithm parallelizes not only the search method of the Genetic Algorithm, but also its fitness functions, which calculate the network congestion ratio, so as to fully utilize the power of modern GPGPUs. Concurrently, each execution is designed for thread-block execution on the GPU with consideration of thread occupancy, local resources, and SIMT execution to maximize GPU performance. Evaluations show that the proposed algorithm offers, on average, a ninefold speedup compared to the conventional CPU approach.
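
    The fitness value mentioned in this summary, the network congestion ratio, is commonly defined as the highest link utilization in the network, and that definition is assumed here. The sketch below is a plain sequential version for a single candidate solution and says nothing about the paper's GPU kernels; the function name, the topology, and the traffic figures are hypothetical.

```python
from collections import defaultdict


def congestion_ratio(routes, demands, capacities):
    """Maximum link utilization when each flow follows its explicit route.

    routes:     flow id -> list of nodes along the path
    demands:    flow id -> traffic volume
    capacities: (node, node) directed link -> capacity
    """
    load = defaultdict(float)
    for flow, path in routes.items():
        for link in zip(path, path[1:]):
            load[link] += demands[flow]
    return max((load[link] / capacities[link] for link in load), default=0.0)


capacities = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 10.0}
demands = {"f1": 6.0, "f2": 6.0}

shared = {"f1": ["A", "B", "C"], "f2": ["A", "B", "C"]}   # both flows on one path
spread = {"f1": ["A", "B", "C"], "f2": ["A", "C"]}        # second flow rerouted

shared_ratio = congestion_ratio(shared, demands, capacities)
spread_ratio = congestion_ratio(spread, demands, capacities)
```

    A genetic algorithm over explicit routes would evaluate this function once per individual, so the per-individual evaluations are independent and a natural target for parallel execution.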

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Landmark-based Route Recommendation Method for Pedestrian Walking Strategies

    Siya Bao, Tomoyuki Nitta, Daisuke Shindou, Masao Yanagisawa, Nozomu Togawa

    2015 IEEE 4TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE)     672 - 673  2015  [Refereed]

     View Summary

    This paper proposes a landmark-based route recommendation method for enjoyable walking atmosphere strategies by accumulating and analyzing geographical information. We utilize landmark categorization and region clustering to obtain effective elements. Experimental results demonstrate that our proposed method improves walking environment quality and confirm that it is applicable in both urban and rural areas.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • A Visible Corner-Landmark Based Route Finding Algorithm for Pedestrian Navigation

    Kengo Takeda, Tomoyuki Nitta, Daisuke Shindou, Masao Yanagisawa, Nozomu Togawa

    2015 IEEE 4TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE)     601 - 602  2015  [Refereed]

     View Summary

    Although many GPS-based pedestrian navigation systems have been released, their instructions at decision points are not sufficient. This is mainly due to the lack of landmark information, which may cause pedestrians to pass decision points or misunderstand when to turn. This paper proposes a visible corner-landmark based route finding algorithm. The proposed algorithm is based on visibility edges for landmarks and can obtain a pedestrian route that has visible landmarks at its corner points. Experiments demonstrate that the proposed algorithm can maximize the number of visible corner landmarks included in the generated routes.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Score-Based Classification Method for Identifying Hardware-Trojans at Gate-Level Netlists

    Masaru Oya, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2015 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE)     465 - 470  2015  [Refereed]

     View Summary

    Recently, digital ICs are often designed by outside vendors to reduce design costs in the semiconductor industry, which introduces a severe risk that malicious attackers implement hardware Trojans (HTs) on them. Since the IC design phase generates only a single design result, such as an RT-level or gate-level netlist, we cannot assume an HT-free netlist (a Golden netlist), and it is therefore difficult to identify whether a generated netlist is HT-free or HT-inserted. In this paper, we propose a score-based classification method for identifying HT-free or HT-inserted gate-level netlists without using a Golden netlist. Our proposed method does not directly detect HTs themselves in a gate-level netlist but instead detects a net included in HTs, which is called a Trojan net. First, we observe Trojan nets in several HT-inserted benchmarks and extract several of their features. Second, we give scores to the extracted Trojan net features and sum them up for each net in the benchmarks. Then we can find a score threshold to classify HT-free and HT-inserted netlists. Based on these scores, we can successfully classify HT-free and HT-inserted netlists in all the Trust-HUB gate-level benchmarks. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as "HT-inserted" and all the HT-free gate-level benchmarks as "HT-free" in approximately three hours per benchmark.
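    The scoring idea in this summary can be sketched roughly as follows. The feature names, their scores, the threshold value, and the rule that a netlist is flagged when its highest-scoring net reaches the threshold are all hypothetical choices for illustration; the summary does not state the actual features or scores used in the paper.

```python
# Hypothetical Trojan-net features and scores (not from the paper).
FEATURE_SCORES = {
    "large_fan_in": 2,
    "near_flip_flop": 1,
    "constant_cycle": 1,
}


def net_score(features):
    """Sum the scores of the features observed on one net."""
    return sum(FEATURE_SCORES.get(name, 0) for name in features)


def classify_netlist(nets, threshold):
    """Classify a netlist from its per-net feature sets.

    nets: dict mapping net name -> set of observed feature names.
    Assumption: the netlist is flagged if any net scores >= threshold.
    """
    max_score = max((net_score(f) for f in nets.values()), default=0)
    return "HT-inserted" if max_score >= threshold else "HT-free"


suspicious = {"n1": {"large_fan_in", "near_flip_flop"}, "n2": set()}
clean = {"n1": {"near_flip_flop"}, "n2": set()}
result_suspicious = classify_netlist(suspicious, threshold=3)
result_clean = classify_netlist(clean, threshold=3)
```

    Because the decision depends only on the scores of individual nets, a sketch like this needs no Golden netlist for comparison, which matches the motivation given in the summary.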

  • A Bit-Write Reduction Method based on Error-Correcting Codes for Non-Volatile Memories

    Masashi Tawada, Shinji Kimura, Masao Yanagisawa, Nozomu Togawa

    2015 20TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC)     496 - 501  2015  [Refereed]

     View Summary

    Non-volatile memory has many advantages over SRAM. However, one of its largest problems is that it consumes a large amount of energy in writing. In this paper, we propose a bit-write reduction method based on error-correcting codes for non-volatile memories. When data is written into a memory cell, we do not write it directly but encode it into a codeword. We focus on error-correcting codes and generate new codes called write-reduction codes. In our write-reduction codes, each data value corresponds to an information vector in an error-correcting code, and an information vector corresponds not to a single codeword but to a set of write-reduction codewords. Given the data to be written and the current memory bits, we can deterministically select a particular write-reduction codeword corresponding to that data, such that the maximum number of flipped bits is theoretically minimized. The number of bits written into memory cells is then also minimized. We perform several experimental evaluations and demonstrate up to 72% energy reduction.
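    The selection step described in this summary (choosing, among several codewords that represent the same data, the one closest to the current memory contents) can be sketched as below. The 4-bit toy mapping `TOY_CODE` is hypothetical and has no error-correcting ability; it only illustrates the one-to-many mapping and the minimum-Hamming-distance choice, not the code construction proposed in the paper.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def select_codeword(current_bits: int, candidates: list[int]) -> int:
    """Pick the candidate codeword that flips the fewest memory bits."""
    return min(candidates, key=lambda cw: hamming_distance(current_bits, cw))


# Hypothetical toy code: each data value maps to a set of codewords.
TOY_CODE = {
    0: [0b0000, 0b1111],
    1: [0b0011, 0b1100],
}


def write(current_bits: int, data: int) -> tuple[int, int]:
    """Return (new memory bits, number of flipped bits) for writing data."""
    chosen = select_codeword(current_bits, TOY_CODE[data])
    return chosen, hamming_distance(current_bits, chosen)


new_bits, flips = write(0b1110, 0)
```

    In this toy case, writing data 0 over memory contents 1110 selects 1111 and flips one bit, whereas a one-to-one mapping to 0000 would flip three.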

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Improved Monitoring-Path Selection Algorithm for Suspicious Timing Error Prediction based Timing Speculation

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

     View Summary

    As process technology scales down, timing speculation techniques such as Razor and STEP have emerged as alternative solutions to reduce the margins required by various variation effects. Unlike Razor, STEP is a prediction-based timing speculation method that predicts suspicious timing errors before they actually appear, and thus it can yield a larger performance improvement. In this paper, an improved monitoring-path selection algorithm for STEP-based timing speculation is proposed, in which candidate monitoring paths are selected based on short-path removal and path length estimation. Experimental results show that the proposed algorithm realizes an average of 1.71X overclocking compared with worst-case based designs.

    DOI

    Scopus

  • A low-power soft error tolerant latch scheme

    Saki Tajima, Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

     View Summary

    As process technology continues scaling, low power and reliability of integrated circuits are becoming more critical than ever before. In particular, the reduction of node capacitance and operating voltage for low power consumption makes circuits more sensitive to soft errors induced by high-energy particles. In this paper, a soft-error tolerant latch called TSPC-SEH is proposed for soft error tolerance with low power consumption. The simulation results show that the proposed TSPC-SEH latch can achieve up to 42% power consumption reduction and 54% delay improvement compared to the existing soft error tolerant SEH and DICE designs.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Small-Sized and Noise-Reducing Power Analyzer Design for Low-Power IoT Devices

    Ryosuke Kitayama, Takashi Takenaka, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

     View Summary

    Power analysis for IoT devices is strongly required to reduce power consumption and realize secure communications, and it calls for a small-sized power analyzer that can reduce noise over a wide frequency range. In this paper, we propose a small-sized and noise-reduced power analyzer for IoT devices. We utilize a signal averaging method to reduce noise over a wide frequency range. In doing so, how to implement a synchronous process between a power analyzer and a target IoT device becomes the key problem. We solve this problem by (a) using synchronization signals generated by a general-purpose I/O interface of a microprocessor and (b) introducing a data-order correction process. We analyze the power/energy consumption of the encryption process of the LED block cipher on the IoT device and obtain an average power of 146.3 mW and energy of 3.84 mJ. Although the proposed power analyzer is implemented on a board of only 5 cm x 5 cm, these results have only 5% error compared to a high-precision oscilloscope.
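    The signal averaging step mentioned in this summary can be illustrated with a minimal sketch. It assumes the traces have already been aligned by a synchronization signal so that sample i of every trace corresponds to the same moment in the measured process; the function name and the list-of-lists layout are hypothetical, and the data-order correction process from the paper is not modeled.

```python
def average_traces(traces):
    """Point-wise average of equally long, already synchronized traces.

    traces: list of power traces, each a list of samples.
    Averaging repeated measurements suppresses uncorrelated noise while
    keeping the repeatable part of the signal.
    """
    if not traces:
        raise ValueError("at least one trace is required")
    length = len(traces[0])
    if any(len(trace) != length for trace in traces):
        raise ValueError("all traces must have the same length")
    # zip(*traces) groups the i-th sample of every trace together.
    return [sum(samples) / len(traces) for samples in zip(*traces)]


averaged = average_traces([[1.0, 2.0], [3.0, 4.0]])
```

    Without synchronization, samples from different moments would be averaged together, which is why the summary identifies the synchronous process as the key problem.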

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Image Synthesis Circuit Design Using Selector-logic-based Alpha Blending and Its FPGA Implementation

    Keita Igarashi, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

     View Summary

    Alpha blending is an image synthesis technique that synthesizes a new image by summing up weighted input images and realizes a transparency effect. In this paper, we focus on alpha blending using selector logics and implement it on an FPGA board. By applying selector logics to the alpha blending operation, its total number of product terms is decreased, and thus circuit size and circuit delay are improved simultaneously. In our implementation, original pixel values are stored in a memory on the FPGA board and then a new pixel value is synthesized based on input transmittance factors. We realize approximately 23% speed-up and 8% area reduction simultaneously using selector-logic based alpha blending.
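    For reference, the arithmetic that alpha blending performs on one pixel can be sketched as below. This is the plain weighted-sum definition in software, not the selector-logic circuit from the paper; the 8-bit value range, the 8-bit weight, and the rounding rule are assumptions made for this sketch.

```python
def alpha_blend(fg: int, bg: int, alpha: int) -> int:
    """Blend two 8-bit pixel values with an 8-bit weight.

    alpha = 255 returns fg, alpha = 0 returns bg (assumed convention).
    """
    for value in (fg, bg, alpha):
        if not 0 <= value <= 255:
            raise ValueError("fg, bg and alpha must be in 0..255")
    # Weighted sum of the two inputs, rounded to the nearest integer.
    return (alpha * fg + (255 - alpha) * bg + 127) // 255


blended = alpha_blend(200, 100, 128)
```

    The two multiplications per pixel are the product terms that a hardware implementation has to realize, which is the cost the selector-logic approach in the summary aims to reduce.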

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Clock Skew Estimate Modeling for FPGA High-level Synthesis and Its Application

    Koichi Fujiwara, Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    PROCEEDINGS OF 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2015  [Refereed]

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications. The clock network in an FPGA has already been built before any circuit is implemented, which may lead to a large impact of clock skews and thus degrade the operating frequency. In this paper, we formulate a clock skew estimate model for FPGA-HLS (CSEF). CSEF is an accurate model to estimate clock skews in the HLS flow. CSEF is then integrated into a floorplan-aware HLS algorithm targeting FPGA designs. Experimental results demonstrate that our HLS algorithm can realize FPGA designs which reduce the latency by up to 19% compared with conventional approaches.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Scan-Based Side-Channel Attack on the LED Block Cipher Using Scan Signatures

    Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 12 ) 2434 - 2442  2014.12  [Refereed]

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, a scan-based side-channel attack has been reported which retrieves the secret information inside a cryptosystem by utilizing scan chains, a design-for-test technique. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI chip and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 36 plaintexts on average if the scan chain is only connected to the LED block cipher. These experimental results also show that the key is successfully retrieved even if the scan chain includes an additional 130,000 bits of data.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Scan-Based Attack against Trivium Stream Cipher Using Scan Signatures

    Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 7 ) 1444 - 1451  2014.07  [Refereed]

     View Summary

    Trivium is a synchronous stream cipher using three shift registers. It is designed to have a simple structure and to run at high speed. A scan-based side-channel attack retrieves secret information using scan chains, a design-for-test technique. In this paper, a scan-based side-channel attack method against Trivium using scan signatures is proposed. In our method, we reconstruct previous internal states of Trivium one by one, starting from the internal state at the moment a ciphertext is generated. When we retrieve the internal state, we focus on a particular 1-bit position in a collection of scan chains, and thus we can attack Trivium even if the scan chain includes registers other than the internal state registers of Trivium. Experimental results show that our proposed method successfully retrieves a plaintext from a ciphertext generated by Trivium.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Foreword: Special section on VLSI design and CAD algorithms

    Yamada, A., Higami, Y., Takagi, K., Amagasaki, M., Ikeda, M., Ishihara, T., Ito, K., Usami, K., Okada, K., Kajihara, S., Kaneko, M., Kawaguchi, H., Kimura, S., Kurokawa, A., Shibata, Y., Seto, K., Song, T., Takashima, Y., Takahashi, A., Takenaka, T., Togawa, N., Tomiyama, H., Nakatake, S., Nakamura, Y., Hashimoto, M., Hamaguchi, K., Higuchi, H., Hirose, T., Fukuda, D., Matsumoto, T., Miura, Y., Minato, S.-I., Minami, F., Yamashita, S., Yuminaka, Y., Yoshikawa, M., Watanabe, T.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E97A ( 12 ) 2366 - 2366  2014

  • A Delay-variation-aware High-level Synthesis Algorithm for RDR Architectures

    Hagio Yuta, Yanagisawa Masao, Togawa Nozomu

    IMT   9 ( 4 ) 446 - 455  2014

     View Summary

    As device feature size drops, interconnection delays often exceed gate delays. We have to incorporate interconnection delays even in high-level synthesis. Using RDR architectures is one of the effective solutions to this problem. At the same time, process and delay variation has also become a serious problem which may result in timing errors. How to deal with this problem is another key issue in high-level synthesis. In this paper, we propose a delay-variation-aware high-level synthesis algorithm for RDR architectures. We first obtain a non-delayed scheduling/binding result and, based on it, we also obtain a delayed scheduling/binding result. By adding several extra functional units to vacant RDR islands, we can obtain a delayed scheduling/binding result whose latency is not much increased compared with the non-delayed one. After that, we make the two scheduling/binding results similar by repeatedly modifying them. We can finally realize non-delayed and delayed scheduling/binding results simultaneously on an RDR architecture with almost no area/performance overheads, and we can select either one of them depending on post-silicon delay variation. Experimental results show that our algorithm successfully reduces delayed scheduling/binding latency by up to 42.9% compared with the conventional approach.

    DOI CiNii

  • Scan-based attack on the LED block cipher using scan signatures

    Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems     1460 - 1463  2014  [Refereed]

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, a scan-based side-channel attack has been reported which retrieves the secret information inside a cryptosystem by utilizing scan chains, a design-for-test technique. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI chip and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 73 plaintexts on average if the scan chain is only connected to the LED block cipher. These experimental results also show that the key is successfully retrieved even if the scan chain includes about 4000 additional 1-bit registers.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Linear and bi-linear interpolation circuits using selector logics and their evaluations

    Masashi Shio, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems     1436 - 1439  2014  [Refereed]

     View Summary

    Interpolation is a technique that estimates a value between existing data points, and it is often used for image scaling and correction of distortion. Linear interpolation is an interpolation technique which interpolates in-between values by linearly connecting two known values. Bi-linear interpolation is another interpolation technique, which interpolates a value linearly from its four surrounding values. Both of them are used practically in many cases. In this paper, we propose high-speed and small-sized linear and bi-linear interpolation circuits based on selector logics. The proposed linear and bi-linear interpolation circuits reduce carry propagation delays by using selector logics and thus realize fast and small-sized circuits. We have implemented our linear and bi-linear interpolation circuits in several ways and evaluated each of them. We find that a selector-based bi-linear interpolation circuit in which partial products are summed up by using the arithmetic operator reduces its area by up to 42% and its delay by up to 18% compared with a conventional design.
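    The two interpolation operations named in this summary can be written in software as follows. This is a floating-point sketch of the standard definitions, not the selector-logic circuits proposed in the paper; the function names and the argument order of the four corner values are assumptions for illustration.

```python
def lerp(v0: float, v1: float, t: float) -> float:
    """Linear interpolation between v0 (t = 0) and v1 (t = 1)."""
    return (1 - t) * v0 + t * v1


def bilerp(v00: float, v10: float, v01: float, v11: float,
           tx: float, ty: float) -> float:
    """Bi-linear interpolation from four surrounding values.

    v00 and v10 are the values at y = 0 (x = 0 and x = 1);
    v01 and v11 are the values at y = 1 (x = 0 and x = 1).
    """
    # Interpolate along x on both rows, then along y between the rows.
    row0 = lerp(v00, v10, tx)
    row1 = lerp(v01, v11, tx)
    return lerp(row0, row1, ty)


mid_value = lerp(10.0, 20.0, 0.25)
center = bilerp(0.0, 10.0, 20.0, 30.0, 0.5, 0.5)
```

    Bi-linear interpolation is three linear interpolations, so any speed or area gain in the linear building block carries over to the bi-linear circuit.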

    DOI

    Scopus

    10
    Citation
    (Scopus)
  • Throughput Driven Check Point Selection in Suspicious Timing Error Prediction based Designs

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE 5TH LATIN AMERICAN SYMPOSIUM ON CIRCUITS AND SYSTEMS (LASCAS)     1 - 4  2014  [Refereed]

     View Summary

    In this paper, a throughput-driven design technique is proposed, in which a suspicious timing error prediction circuit is inserted to monitor the signal transitions at some selected check points. Unlike previous works where timing errors are detected after their occurrence, the proposed method uses the real intermediate signal transitions for timing error prediction. The check point selection affects both the maximal operating frequency and the suspicious timing error overestimation rate, both of which influence the overall throughput; thus an analysis of the check point selection is also given. In our work, the circuit can be overclocked by a factor of 2 or more with negligible area overhead while guaranteeing always-correct output.

    DOI

    Scopus

  • Scan-based Attack on the LED Block Cipher Using Scan Signatures

    Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     1460 - 1463  2014  [Refereed]

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, a scan-based side-channel attack has been reported which retrieves the secret information inside a cryptosystem by utilizing scan chains, a design-for-test technique. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI chip and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 73 plaintexts on average if the scan chain is only connected to the LED block cipher. These experimental results also show that the key is successfully retrieved even if the scan chain includes about 4000 additional 1-bit registers.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Linear and Bi-linear Interpolation Circuits Using Selector Logics and their Evaluations

    Masashi Shio, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     1436 - 1439  2014  [Refereed]

     View Summary

    Interpolation is a technique that estimates a value between existing data points, and it is often used for image scaling and correction of distortion. Linear interpolation is an interpolation technique which interpolates in-between values by linearly connecting two known values. Bi-linear interpolation is another interpolation technique, which interpolates a value linearly from its four surrounding values. Both of them are used practically in many cases. In this paper, we propose high-speed and small-sized linear and bi-linear interpolation circuits based on selector logics. The proposed linear and bi-linear interpolation circuits reduce carry propagation delays by using selector logics and thus realize fast and small-sized circuits. We have implemented our linear and bi-linear interpolation circuits in several ways and evaluated each of them. We find that a selector-based bi-linear interpolation circuit in which partial products are summed up by using the arithmetic operator reduces its area by up to 42% and its delay by up to 18% compared with a conventional design.

    DOI

    Scopus

    10
    Citation
    (Scopus)
  • In-situ Timing Monitoring Methods for Variation-Resilient Designs

    Youhua Shi, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     735 - 738  2014  [Refereed]

     View Summary

    With technology scaling, process, voltage, and temperature (PVT) variations pose great challenges to integrated circuit design. Conventionally, LSI circuits are designed by adding a pessimistic timing margin to guarantee "always correct" operation even under worst-case conditions. However, due to increasing PVT variations, an unacceptably large design guard band must be reserved to avoid timing errors on critical paths, which leads to very inefficient designs in terms of power and performance. For this reason, in-situ timing monitoring techniques have gained great research interest. In this paper, we review existing variation-resilient design techniques with particular emphasis on in-situ timing monitoring techniques, including both detection-based and prediction-based methods. The effectiveness of in-situ timing monitoring techniques is discussed. Finally, we show an example of an in-situ timing monitoring technique called STEP with applications to general pipeline designs.

    DOI

    Scopus

  • Secure scan design using improved random order and its evaluations

    Masaru Oya, Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     555 - 558  2014  [Refereed]

     View Summary

    Scan test using scan chains is one of the most important DFT techniques. However, scan-based attacks have been reported which can retrieve the secret key in crypto circuits by using scan chains. A secure scan architecture is strongly required to protect scan chains from scan-based attacks. This paper proposes an improved version of random order as a secure scan architecture. In improved random order, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a sub-chain to scan out. Testability and security of the proposed improved random order are also discussed in the paper, and the implementation results demonstrate the effectiveness of the proposed method.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • A Write-Reducing and Error-Correcting Code Generation Method for Non-Volatile Memories

    Tatsuro Kojo, Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     304 - 307  2014  [Refereed]

     View Summary

    Data stored in non-volatile memories may be corrupted by crosstalk and radiation, but we can restore the data by using error-correcting codes. However, non-volatile memories consume a large amount of energy in writing. How to reduce the number of written bits even when using error-correcting codes is one of the challenges in non-volatile memory design. In this paper, we propose a new write-reducing and error-correcting code, called Doughnut code. Doughnut code is based on state encoding that limits maximum and minimum Hamming distances. We then propose a code expansion method, which improves minimum and maximum Hamming distances by expanding a write-reducing code. When we apply our code expansion method to Doughnut code, we obtain a write-reducing code whose error-correcting ability is equal to that of Hamming code. Experimental results show that the proposed write-reducing code reduces the number of written bits by up to 36% compared to Hamming code.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • An Area-Overhead-Oriented Monitoring-Path Selection Algorithm for Suspicious Timing Error Prediction

    Shinnosuke Yoshida, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     300 - 303  2014  [Refereed]

     View Summary

    As process technologies advance, the importance of timing error correction techniques is increasing as well. In this paper, we propose an area-overhead-oriented monitoring-path selection algorithm for suspicious timing error prediction circuits (STEPCs). A STEPC predicts timing errors by monitoring the middle points of several speed-paths in a circuit. However, we need many STEPCs with a high area overhead to predict timing errors in an overall circuit. Our proposed method moves the STEPC insertion positions to minimize the number of inserted STEPCs. We apply a max-flow and min-cut approach to determine the optimal positions of inserted STEPCs. Our proposed algorithm reduces the required number of STEPCs to 1/19 and their area to 1/5 compared with a naive algorithm. Furthermore, our algorithm realizes 2.25X overclocking compared with just inserting STEPCs into several speed-paths.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Scan-based Side-Channel Attack on Camellia Cipher Using Scan Signatures

    Huiqian Jiang, Mika Fujishiro, Hirokazu Kodera, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     252 - 255  2014  [Refereed]

     View Summary

    Camellia, a block cipher jointly developed by Mitsubishi and NTT of Japan, is suitable for both software and hardware implementations and more secure than the AES cipher. One of the design-for-test techniques using scan chains is called scan-path test, in which testers can observe and control registers inside the LSI chip directly. Recently, a scan-based side-channel attack has been reported which retrieves the secret information from a cryptosystem using scan chains. In this paper, we propose a scan-based attack method on the Camellia cipher using scan signatures. Our proposed method is based on equivalent transformation of the Camellia algorithm and key pattern reduction in order to retrieve the secret key. Experimental results show that our proposed method successfully retrieves its 128-bit secret key using 960 plaintexts if the scan chain is only connected to the Camellia cipher and also successfully retrieves its key on SASEBO-GII, which is a side-channel attack standard evaluation board.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Floorplan-Driven High-Level Synthesis Algorithm with Operation Chainings Using Chaining Enumeration

    Kotaro Terada, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     248 - 251  2014  [Refereed]

     View Summary

    In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and the RDR (Regular-Distributed-Register) architecture has been proposed to cope with this problem. In this paper, we propose a high-level synthesis algorithm using operation chainings which reduces the overall latency targeting RDR architectures. Our algorithm consists of three steps. The first step enumerates candidates for chaining. The second step introduces the maximal chaining distance (MCD), which gives the maximum allowable distance on the RDR architecture between chaining candidate operations. The last step performs list-scheduling and binding simultaneously using the results of the two preceding steps. Our algorithm enumerates feasible chaining candidates and selects the best ones for the RDR architecture. Experimental results show that our algorithm reduces the latency by up to 28.6%, the number of registers by up to 37.5%, and the number of multiplexers by up to 25.0% compared to the conventional approaches.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Floorplan-Aware High-level Synthesis Algorithm for Multiplexer Reduction Targeting FPGA Designs

    Koichi Fujiwara, Shinya Abe, Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     244 - 247  2014  [Refereed]

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs are required in various applications such as computerized stock trading and reconfigurable network processing. In HLS for FPGA designs, we need to consider module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-aware HLS algorithm for multiplexer reduction targeting FPGA designs. By utilizing distributed-register architectures called HDR, we can easily consider module floorplan in HLS. In order to reduce multiplexer cost, we propose two novel binding methods called datapath-oriented scheduling/FU binding and datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the number of slices by up to 47% and circuit delay by up to 16% compared with the conventional approach.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A delay-variation-aware high-level synthesis algorithm for RDR architectures

    Yuta Hagio, Masao Yanagisawa, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   7   81 - 90  2014  [Refereed]

     View Summary

    As device feature size drops, interconnection delays often exceed gate delays. We have to incorporate interconnection delays even in high-level synthesis. Using RDR architectures is one of the effective solutions to this problem. At the same time, process and delay variation has also become a serious problem which may result in timing errors. How to deal with this problem is another key issue in high-level synthesis. In this paper, we propose a delay-variation-aware high-level synthesis algorithm for RDR architectures. We first obtain a non-delayed scheduling/binding result and, based on it, we also obtain a delayed scheduling/binding result. By adding several extra functional units to vacant RDR islands, we can obtain a delayed scheduling/binding result whose latency is not much increased compared with the non-delayed one. After that, we make the two scheduling/binding results similar by repeatedly modifying them. We can finally realize non-delayed and delayed scheduling/binding results simultaneously on an RDR architecture with almost no area/performance overheads, and we can select either one of them depending on post-silicon delay variation. Experimental results show that our algorithm successfully reduces delayed scheduling/binding latency by up to 42.9% compared with the conventional approach.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Energy-efficient high-level synthesis for HDR architecture with multi-stage clock gating

    Hiroyuki Akasaka, Shin-Ya Abe, Masao Yanagisawa, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   7   74 - 80  2014  [Refereed]

    With the miniaturization and increasing performance of current and future LSIs, demand for portable devices has grown considerably. In particular, problems of battery runtime and device overheating have arisen. In addition, as the LSI design process shrinks, the ratio of interconnection delay to gate delay has continued to increase. High-level synthesis that estimates interconnection delays and reduces energy consumption is therefore essential. In this paper, we propose a high-level synthesis algorithm based on HDR architectures (huddle-based distributed-register architectures) utilizing multi-stage clock gating. By increasing the number of clock gating stages in each huddle, we increase the number of control steps at which we can apply clock gating to registers. We can then determine the clock gating configuration that optimizes energy consumption. The experimental results demonstrate that our proposed algorithm reduces energy consumption by up to 27.7% compared with conventional algorithms.

    DOI

    Scopus

  • Floorplan Driven Architecture and High-Level Synthesis Algorithm for Dynamic Multiple Supply Voltages

    Shin-ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E96A ( 12 ) 2597 - 2611  2013.12  [Refereed]

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delay into high-level synthesis. In AVHDR architecture, voltages can be dynamically assigned for energy reduction. In other words, low supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Next, an AVHDR-based high-level synthesis algorithm is proposed. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, the modules in each huddle can be placed close to each other and the corresponding AVHDR architecture can be generated and optimized with floorplanning information. Experimental results show that on average our algorithm achieves 43.9% energy-saving compared with conventional algorithms.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A High-Speed Trace-Driven Cache Configuration Simulator for Dual-Core Processor L1 Caches

    Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E96A ( 6 ) 1283 - 1292  2013.06  [Refereed]

    Recently, multi-core processors have been used very often in embedded systems. Since the application programs running on embedded systems are quite limited, there must exist an optimal cache memory configuration in terms of power and area. Simulating application programs on various cache configurations is one of the best options to determine the optimal one. Multi-core cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation. In this paper, we propose a very fast dual-core L1 cache configuration simulation algorithm. We first propose a new data structure in which a single instance represents two or more multi-core cache configurations with different cache associativities. After that, we propose a new multi-core cache configuration simulation algorithm using our new data structure together with new theorems. Experimental results demonstrate that our algorithm obtains exact simulation results but runs 20 times faster than a conventional approach.

    DOI

    Scopus

  • Scan-based Attack against DES and Triple DES Cryptosystems Using Scan Signatures

    Kodera Hirokazu, Yanagisawa Masao, Togawa Nozomu

    IMT   8 ( 3 ) 867 - 874  2013

    A scan-path test is one of the useful design-for-test techniques, in which testers can observe and control registers inside the target LSI chip directly. On the other hand, the risk of side-channel attacks against cryptographic LSIs and modules has been pointed out. In particular, scan-based attacks which retrieve secret keys by analyzing scan data obtained from scan chains have been attracting attention. In this paper, we propose two scan-based attack methods against DES and Triple DES using scan signatures. Our proposed methods focus on particular bit-column data in a set of scan data and observe how they change when several plaintexts are given. Based on this property, we first introduce the idea of a scan signature and apply it to DES cryptosystems. In DES cryptosystems, we can retrieve secret keys by partitioning the S-BOX process into eight independent sub-processes and reducing the number of round key candidates from 2^48 to 2^6 × 8 = 512. In Triple DES cryptosystems, three secret keys are used to encrypt plaintexts. We retrieve them one by one, using a technique similar to that for DES cryptosystems. Although some problems occur when retrieving the second and third secret keys, our proposed method effectively resolves them. Our proposed methods can retrieve secret keys even if a scan chain includes registers other than those of the crypto module and attackers do not know when the encryption is actually performed in the crypto module. Experimental results demonstrate that we successfully retrieve the secret keys of a DES cryptosystem using at most 32 plaintexts and those of a Triple DES cryptosystem using at most 36 plaintexts.

    DOI CiNii

  • Energy-efficient High-level Synthesis for HDR Architectures with Clock Gating Based on Concurrency-oriented Scheduling

    Akasaka Hiroyuki, Abe Shin-ya, Yanagisawa Masao, Togawa Nozomu

    IMT   8 ( 4 ) 913 - 923  2013

    With the miniaturization of LSIs and their increasing performance, demand for highly functional portable devices has grown significantly. At the same time, battery lifetime and device overheating are becoming major design problems that hamper further LSI integration. On the other hand, the ratio of interconnection delay to gate delay has continued to increase as device feature size decreases. We have to estimate interconnection delays and reduce energy consumption even in the high-level synthesis stage. In this paper, we propose a high-level synthesis algorithm for huddle-based distributed-register architectures (HDR architectures) with clock gating based on concurrency-oriented scheduling/functional unit binding. We assume coarse-grained clock gating applied to huddles, and we focus on the number of control steps, or gating steps, at which we can apply clock gating to the registers in every huddle. We propose two methods to increase gating steps. The first is to schedule and bind operations so that they are performed at the same time. By adjusting the clock gating timings in the high-level synthesis stage, we expect to enhance the effect of clock gating more than by applying clock gating after logic synthesis. The second is to synthesize huddles such that each synthesized huddle includes registers which have similar or identical clock gating timings. At this point, we determine the clock gating timings so as to minimize the total energy consumption, including clock tree energy. The experimental results show that our proposed algorithm reduces energy consumption by a maximum of 23.8% compared with several conventional algorithms.

    DOI CiNii

  • Concurrent faulty clock detection for crypto circuits against clock glitch based DFA

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems     1432 - 1435  2013  [Refereed]

    In this paper, a concurrent faulty clock detection method is proposed for crypto circuits against clock glitch based differential fault analysis (DFA). In the proposed method, a non-logic buffer-based delay chain is inserted, and then by monitoring the delay along the delay chain, a possible clock glitch based DFA can be detected. Experimental results on an AES circuit show that the proposed method can successfully detect clock glitch based attacks, and the required area overhead is only 0.47%, which is much smaller than in previous work. © 2013 IEEE.

    DOI

    Scopus

    13
    Citation
    (Scopus)
  • High-Level Synthesis with Post-Silicon Delay Tuning for RDR Architectures

    Yuta Hagio, Masao Yanagisawa, Nozomu Togawa

    2013 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     194 - 197  2013  [Refereed]

    In this paper, we propose a high-level synthesis algorithm with post-silicon delay tuning for RDR architectures. We first obtain a non-delayed scheduling/binding result and a delayed scheduling/binding result. By adding several extra functional units to vacant RDR islands, we have a delayed scheduling/binding result so that its latency cannot be increased compared with the non-delayed one. After that, we similarize the two scheduling/binding results by repeatedly modifying their results. We can finally realize non-delayed and delayed scheduling/binding results simultaneously on RDR architecture with almost no area/performance overheads and we can select either one of them depending on post-silicon delay variation. Experimental results show that our algorithm successfully reduces delayed scheduling/binding latency by up to 42.9% compared with the conventional approach.

  • An Energy-efficient High-level Synthesis Algorithm Incorporating Interconnection Delays and Dynamic Multiple Supply Voltages

    Shin-ya Abe, Youhua Shi, Kimiyoshi Usami, Masao Yanagisawa, Nozomu Togawa

    2013 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION, AND TEST (VLSI-DAT)     1 - 4  2013  [Refereed]

    In this paper, we propose an adaptive voltage huddle-based distributed-register architecture (AVHDR architecture) that integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis. Next, we propose a high-level synthesis algorithm for AVHDR architectures. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which abstracts modules placed close to each other, are naturally generated using floorplanning. Low-supply voltages are assigned to non-critical operations, and leakage power is cut off by turning off the power supply to the sleeping functional units. Experimental results show that our algorithm achieves 50% energy-saving compared with conventional algorithms.

    DOI

    Scopus

  • Secure scan design with dynamically configurable connection

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC     256 - 262  2013  [Refereed]

    Scan test is a powerful test technique which can control and observe the internal states of the circuit under test through scan chains. However, it has been reported that it's possible to retrieve secret keys from cryptographic LSIs through scan chains. Therefore, new secure test methods are required to satisfy both testability and security requirements. In this paper, a secure scan design is proposed that meets adequate security requirements as a countermeasure against scan-based attacks while still maintaining high testability like normal scan testing. In our method, the internal scan chain is divided into several sub-chains, and the connection order of the sub-chains can be dynamically changed. In addition, this paper proposes how to decide the connection order of those sub-chains so that it can't be identified by an attacker. The proposed method is implemented on an AES circuit to show its effectiveness, and a security analysis is also given to show how the proposed approach can be used as a countermeasure against known scan-based attacks. © 2013 IEEE.

    DOI

    Scopus

    30
    Citation
    (Scopus)
  • Suspicious Timing Error Prediction with In-Cycle Clock Gating

    Youhua Shi, Hiroaki Igarashi, Nozomu Togawa, Masao Yanagisawa

    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED 2013)     335 - 340  2013  [Refereed]

    Conventionally, circuits are designed with a pessimistic timing margin to solve delay variation problems, which guarantees "always correct" operations. However, because such a worst-case condition rarely occurs, the traditional pessimistic design method is becoming one of the main obstacles for designers to achieve higher performance and/or ultra-low power consumption. By monitoring timing error occurrence during circuit operation, adaptive timing error detection and recovery methods have recently gained wide interest as a promising solution. As an extension of existing research, in this paper, we propose a suspicious timing error prediction method for performance or energy efficiency improvement in pipeline designs. Experimental results show that, when compared with typical margin designs, the proposed method can 1) achieve up to 1.41X throughput improvement with in-situ timing error prediction ability; and 2) allow the design to be overclocked by up to 1.88X with "always correct" outputs.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • A partial redundant fault-secure high-level synthesis algorithm for RDR architectures

    Kazushi Kawamura, Sho Tanaka, Masao Yanagisawa, Nozomu Togawa

    Proceedings - IEEE International Symposium on Circuits and Systems     1736 - 1739  2013  [Refereed]

    In this paper, we propose a partial redundant fault-secure high-level synthesis algorithm for RDR architectures, where we duplicate a part of the original CDFG and maximize its reliability under a timing constraint. Firstly, our algorithm allocates some new additional functional units to vacant spaces on RDR islands for recomputation and increases the number of duplicated operation nodes. Secondly, it minimizes the number of inserted comparator nodes through re-scheduling/re-binding the recomputation CDFG's nodes. As a result, we will obtain a scheduled/bound recomputation CDFG and renewed functional unit allocation with high reliability. Experimental results demonstrate that our algorithm improves reliability by up to 52% compared with the conventional approach. © 2013 IEEE.

    DOI

    Scopus

  • Concurrent Faulty Clock Detection for Crypto Circuits against Clock Glitch based DFA

    Hiroaki Igarashi, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2013 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     1432 - 1435  2013  [Refereed]

    In this paper, a concurrent faulty clock detection method is proposed for crypto circuits against clock glitch based differential fault analysis (DFA). In the proposed method, a non-logic buffer-based delay chain is inserted, and then by monitoring the delay along the delay chain, a possible clock glitch based DFA can be detected. Experimental results on an AES circuit show that the proposed method can successfully detect clock glitch based attacks, and the required area overhead is only 0.47%, which is much smaller than in previous work.

    DOI

    Scopus

    13
    Citation
    (Scopus)
  • Energy Evaluation for Two-level On-chip Cache with Non-Volatile Memory on Mobile Processors

    Shota Matsuno, Masashi Tawada, Masao Yanagisawa, Shinji Kimura, Nozomu Togawa, Tadahiko Sugibayashi

    2013 IEEE 10TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2013  [Refereed]

    As the leakage power of traditional SRAM becomes larger, the ratio of static energy to the total energy of the memory architecture also becomes larger. Non-volatile memory (NVM) has many advantages over SRAM, such as high density, low leakage power, and non-volatility, but consumes too much write energy. In this paper, we evaluate the energy consumption of a two-level cache that partly uses NVM on mobile processors and confirm that it effectively reduces energy consumption.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Scan-based Attack against Trivium Stream Cipher Independent of Scan Structure

    Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa

    2013 IEEE 10TH INTERNATIONAL CONFERENCE ON ASIC (ASICON)     1 - 4  2013  [Refereed]

    Trivium is a synchronous stream cipher using three shift registers, which runs at high speed with a simple structure. A scan-based side-channel attack retrieves secret information using scan chains, one of the design-for-test techniques. In this paper, a scan-based side-channel attack method against Trivium using scan signatures is proposed. In our method, we focus on a particular 1-bit position in a collection of scan chains, so that we can attack Trivium even if the scan chain includes registers other than the internal state registers of Trivium. Experimental results show that our proposed method successfully retrieves a plaintext from a ciphertext.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Scan-based attack against DES and Triple DES cryptosystems using scan signatures

    Hirokazu Kodera, Masao Yanagisawa, Nozomu Togawa

    Journal of Information Processing   21 ( 3 ) 572 - 579  2013  [Refereed]

    A scan-path test is one of the useful design-for-test techniques, in which testers can observe and control registers inside the target LSI chip directly. On the other hand, the risk of side-channel attacks against cryptographic LSIs and modules has been pointed out. In particular, scan-based attacks which retrieve secret keys by analyzing scan data obtained from scan chains have been attracting attention. In this paper, we propose two scan-based attack methods against DES and Triple DES using scan signatures. Our proposed methods focus on particular bit-column data in a set of scan data and observe how they change when several plaintexts are given. Based on this property, we first introduce the idea of a scan signature and apply it to DES cryptosystems. In DES cryptosystems, we can retrieve secret keys by partitioning the S-BOX process into eight independent sub-processes and reducing the number of round key candidates from 2^48 to 2^6 × 8 = 512. In Triple DES cryptosystems, three secret keys are used to encrypt plaintexts. We retrieve them one by one, using a technique similar to that for DES cryptosystems. Although some problems occur when retrieving the second and third secret keys, our proposed method effectively resolves them. Our proposed methods can retrieve secret keys even if a scan chain includes registers other than those of the crypto module and attackers do not know when the encryption is actually performed in the crypto module. Experimental results demonstrate that we successfully retrieve the secret keys of a DES cryptosystem using at most 32 plaintexts and those of a Triple DES cryptosystem using at most 36 plaintexts. © 2013 Information Processing Society of Japan.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Energy-efficient High-level Synthesis for HDR Architectures with Clock Gating Based on Concurrency-oriented Scheduling.

    Hiroyuki Akasaka, Shin-ya Abe, Masao Yanagisawa, Nozomu Togawa

    IPSJ Trans. System LSI Design Methodology   6   101 - 111  2013  [Refereed]

    With the miniaturization of LSIs and their increasing performance, demand for highly functional portable devices has grown significantly. At the same time, battery lifetime and device overheating are becoming major design problems that hamper further LSI integration. On the other hand, the ratio of interconnection delay to gate delay has continued to increase as device feature size decreases. We have to estimate interconnection delays and reduce energy consumption even in the high-level synthesis stage. In this paper, we propose a high-level synthesis algorithm for huddle-based distributed-register architectures (HDR architectures) with clock gating based on concurrency-oriented scheduling/functional unit binding. We assume coarse-grained clock gating applied to huddles, and we focus on the number of control steps, or gating steps, at which we can apply clock gating to the registers in every huddle. We propose two methods to increase gating steps. The first is to schedule and bind operations so that they are performed at the same time. By adjusting the clock gating timings in the high-level synthesis stage, we expect to enhance the effect of clock gating more than by applying clock gating after logic synthesis. The second is to synthesize huddles such that each synthesized huddle includes registers which have similar or identical clock gating timings. At this point, we determine the clock gating timings so as to minimize the total energy consumption, including clock tree energy. The experimental results show that our proposed algorithm reduces energy consumption by a maximum of 23.8% compared with several conventional algorithms.

    DOI CiNii

  • A thermal-aware high-level synthesis algorithm for RDR architectures through binding and allocation

    Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E96-A ( 1 ) 312 - 321  2013  [Refereed]

    With process technology scaling, heat in ICs is becoming a serious issue. Since high temperature adversely affects reliability, design costs, and leakage power, it is necessary to incorporate thermal-aware synthesis into IC design flows. In particular, hot spots, where a chip is locally overheated, are a serious concern, and reducing the peak temperature inside a chip is very important. On the other hand, the increase in average interconnect delays is also becoming a serious issue. By using RDR architectures (Regular-Distributed-Register architectures), the interconnect delays can be easily estimated and their influence can be much reduced even in high-level synthesis. In this paper, we propose a thermal-aware high-level synthesis algorithm for RDR architectures. The RDR architecture divides the entire chip into islands, each of which has a uniform area. Our algorithm balances the energy consumption among islands through re-binding to functional units. By allocating some new additional functional units to vacant areas on islands, our algorithm further balances the energy consumption among islands and thus reduces the peak temperature. Experimental results demonstrate that our algorithm reduces the peak temperature by up to 9.1% compared with the conventional approach. Copyright © 2013 The Institute of Electronics, Information and Communication Engineers.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Scan-Based Attack on AES through Round Registers and Its Countermeasure

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E95A ( 12 ) 2338 - 2346  2012.12  [Refereed]

    Scan-based side-channel attacks on hardware implementations of cryptographic algorithms have been shown to be a serious security threat. Unlike existing scan-based attacks, in our work we observed that, besides the secret-related registers, some non-secret registers also carry the potential of being misused to help a hacker retrieve secret keys. In this paper, we first present a scan-based side-channel attack method on AES that makes use of the round counter registers, which received no attention in previous work, to show the potential security threat in designs with scan chains. We then discuss the issues of secure DFT requirements and propose a secure scan scheme for crypto hardware implementations that preserves all the advantages and simplicity of traditional scan test while significantly improving security with negligible design overhead.

    DOI

    Scopus

  • A Locality-Aware Hybrid NoC Configuration Algorithm Utilizing the Communication Volume among IP Cores

    Seungju Lee, Masao Yanagisawa, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E95A ( 9 ) 1538 - 1549  2012.09  [Refereed]

    Network-on-chip (NoC) architectures have emerged as a promising solution to the lack of scalability in multi-processor systems-on-chips (MPSoCs). With the explosive growth in the usage of multimedia applications, it is expected that NoC serves as a multimedia server supporting multi-class services. In this paper, we propose a configuration algorithm for a hybrid bus-NoC architecture together with simulation results. Our target architecture is a hybrid bus-NoC architecture, called busmesh NoC (BMNoC), which is a generalized version of a hybrid NoC with local buses. In our BMNoC configuration algorithm, cores which have a heavy communication volume between them are mapped to a cluster node (CN) and connected by a local bus. CNs can communicate with each other via edge switches (ESes) and mesh routers (MRs). With this hierarchical communication network, our proposed algorithm can improve latency compared with conventional methods. Applying our algorithm to several realistic applications demonstrates better performance than earlier studies and the feasibility of our proposed algorithm.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Energy-efficient High-level Synthesis for HDR Architectures

    Abe Shin-ya, Yanagisawa Masao, Togawa Nozomu

    IMT   7 ( 4 ) 1319 - 1330  2012

    As battery runtime and overheating problems for portable devices can no longer be ignored, energy-aware LSI design is strongly required. Moreover, interconnection delay should be explicitly considered because it exceeds gate delay as semiconductor devices are downsized. We must take account of energy efficiency and interconnection delays even in high-level synthesis. In this paper, we first propose a huddle-based distributed-register architecture (HDR architecture), an island-based distributed-register architecture for multi-cycle interconnect communications where we can develop several energy-saving techniques. Next, we propose an energy-efficient high-level synthesis algorithm for HDR architectures focusing on multiple supply voltages. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which is composed of functional units, registers, a controller, and level converters, are very naturally generated using floorplanning results. By assigning high supply voltage to critical huddles and low supply voltage to non-critical huddles, we can finally have energy-efficient floorplan-aware high-level synthesis. Experimental results show that our algorithm achieves 45% energy-saving compared with the conventional distributed-register architectures and conventional algorithms.

    DOI CiNii

  • Dynamically Changeable Secure Scan Architecture against Scan-Based Side Channel Attack

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2012 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     155 - 158  2012  [Refereed]

    Scan test, one of the useful design-for-testability techniques, is effective for LSIs including cryptographic circuits. It can observe and control the internal states of the circuit under test by using a scan chain. However, a scan chain presents a significant security risk of information leakage through scan-based attacks, which retrieve the secret keys of cryptographic LSIs. In this paper, a secure scan architecture against scan-based attacks which still has high testability is proposed. In our method, scan data is dynamically changed by adding a latch to any FFs in the scan chain. We show that, by using the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, which demonstrates the effectiveness of the proposed method.

    DOI

    Scopus

    37
    Citation
    (Scopus)
  • Energy-efficient High-level Synthesis for HDR Architectures with Clock Gating

    Hiroyuki Akasaka, Masao Yanagisawa, Nozomu Togawa

    2012 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     135 - 138  2012  [Refereed]

    With the miniaturization of LSIs and their increasing performance, demand for highly functional portable devices has grown significantly. At the same time, problems of battery runtime and device overheating have arisen. On the other hand, the ratio of interconnection delay to gate delay has continued to increase as device feature size decreases. We have to estimate the interconnection delay and reduce energy consumption even in the high-level synthesis stage. Recently, an HDR architecture and its associated power-optimized high-level synthesis algorithm have been proposed, which can effectively estimate interconnection delays by introducing the idea of "huddles" into an LSI chip. The algorithm utilizes multiple supply voltages and achieves power-optimized LSI synthesis but does not take clock gating into account. In this paper, we propose a high-level synthesis algorithm based on HDR architectures utilizing clock gating. First, we focus on the number of control steps at which we can apply clock gating to registers. Second, we synthesize the huddles such that each synthesized huddle includes registers which have similar or exactly the same clock gating timings. The experimental results show that our proposed algorithm reduces energy consumption by a maximum of 14.9% compared with the conventional algorithm.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A Novel BMNoC Configuration Algorithm Utilizing Communication Volume and Locality among Cores

    Seungju Lee, Nozomu Togawa, Takashi Aoki, Akira Onozawa

    2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012)     1668 - 1671  2012  [Refereed]

    Network-on-chip (NoC) architectures have emerged as a promising solution to the lack of scalability in multi-processor systems-on-chips (MPSoCs). In this paper, we propose a novel BMNoC configuration algorithm together with simulation results. Our BMNoC configuration algorithm analyses the data traffic of the target application and determines which cores to place in each cluster based on their communication volume and locality. Furthermore, the simulation results show better latency than earlier studies and the feasibility of BMNoC.

    DOI

    Scopus

  • An Energy-efficient High-level Synthesis Algorithm for Huddle-based Distributed-Register Architectures

    Shin-ya Abe, Masao Yanagisawa, Nozomu Togawa

    2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012)     576 - 579  2012  [Refereed]

    In this paper, we first propose a huddle-based distributed-register architecture (HDR architecture), an island-based distributed-register architecture for multi-cycle interconnect communications where we can develop several energy-saving techniques. Next, we propose an energy-efficient high-level synthesis algorithm for HDR architectures focusing on multiple supply voltages. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which is composed of functional units, registers, controller, and level converters, are very naturally generated using floorplanning results. By assigning high supply voltage to critical huddles and low supply voltage to non-critical huddles, we can finally have energy-efficient floorplan-aware high-level synthesis. Experimental results show that our algorithm achieves 45% energy-saving compared with the conventional distributed-register architectures and conventional algorithms.

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Side Channel Attack on RSA Circuit

    Yuta Atobe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     607 - 610  2012  [Refereed]

    Scan test is one of the useful design-for-testability techniques, as it can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. Testability and security therefore conflict with each other, and there is a need for an efficient design-for-testability circuit that satisfies both testability and security requirements. In this paper, a secure scan architecture against scan-based attacks is proposed to achieve high security without compromising testability. In our method, the scan structure is dynamically changed by adding a latch to any FFs in the scan chain. We analyze an RSA circuit implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • Weighted Adders with Selector Logics for Super-resolution and Its FPGA-based Evaluation

    Hiromine Yoshihara, Masao Yanagisawa, Nozomu Togawa

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     603 - 606  2012  [Refereed]

    Super-resolution is a technique to remove the noise of observed images and restore their high frequencies. We focus on reconstruction-based super-resolution. Reconstruction incurs a large computation cost since it requires many images. In this paper, we propose a fast weighted adder for reconstruction-based super-resolution. From the viewpoint of reducing partial products, we propose two approaches to speed up a weighted adder. First, we use selector logics to halve its partial products. Second, we propose a weights-range limit method utilizing a negative term. By applying our proposed approaches to a weighted adder, we can reduce carry propagations, and our weighted adder can be implemented as a faster circuit than conventional ones. Experimental evaluations demonstrate that our weighted adder improves the performance by a maximum of 29.9% and reduces the LUT count by a maximum of 592, compared to conventional implementations.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Scan-based Attack against DES Cryptosystems Using Scan Signatures

    Hirokazu Kodera, Masao Yanagisawa, Nozomu Togawa

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     599 - 602  2012  [Refereed]

    With the high integration of LSIs in recent years, the importance of design-for-test techniques has been increasing. A scan-path test is one of the useful design-for-test techniques, in which testers can observe and control registers inside the target LSI chip directly. On the other hand, the risk of side-channel attacks against cryptographic LSIs and modules has been pointed out. In particular, scan-based attacks, which retrieve secret keys by analyzing scan data obtained from scan chains, have been attracting attention. In this paper, we propose a scan-based attack method against DES using scan signatures. Our proposed method is based on focusing on particular bit-column data in a set of scan data and observing their changes when given several plaintexts. We can retrieve secret keys by partitioning the S-BOX process into eight independent sub-processes and reducing the number of round key candidates from 2^48 to 2^6 × 8 = 512. Our proposed method can retrieve secret keys even if a scan chain includes registers other than those of the crypto module and attackers do not know when the encryption is actually performed in the crypto module. Experimental results demonstrate that we successfully retrieve the secret keys of a DES cryptosystem using at most 32 plaintexts.

    DOI

    Scopus

    29
    Citation
    (Scopus)
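
    The candidate-count reduction quoted in the summary above (2^48 to 2^6 × 8 = 512) can be checked with a few lines of arithmetic. This is only an illustrative sketch, not the authors' attack: it assumes the standard DES structure of a 48-bit round key feeding eight S-boxes with 6 key bits each, and the constant names are hypothetical.

```python
# Assumed DES parameters: a 48-bit round key split across eight S-boxes.
ROUND_KEY_BITS = 48
SBOX_COUNT = 8
BITS_PER_SBOX = ROUND_KEY_BITS // SBOX_COUNT  # 6 key bits per S-box

# Guessing the whole round key at once.
joint_candidates = 2 ** ROUND_KEY_BITS

# Guessing each S-box's 6 key bits independently, one S-box after another.
independent_candidates = SBOX_COUNT * 2 ** BITS_PER_SBOX

print(joint_candidates)        # 281474976710656
print(independent_candidates)  # 512
```

    Treating the eight sub-processes independently turns a product of search spaces into a sum, which is where the reduction comes from.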
  • A Hybrid NoC Architecture Utilizing Packet Transmission Priority Control Method

    Seungju Lee, Nozomu Togawa, Yusuke Sekihara, Takashi Aoki, Akira Onozawa

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     404 - 407  2012  [Refereed]

    Network-on-chip architectures have emerged as a promising solution to the lack of scalability in multi-processor systems-on-chips (MPSoCs). With the explosive growth in the usage of multimedia applications, NoCs are expected to serve as multimedia servers supporting multi-class services. Recently, a busmesh NoC (BMNoC) has been proposed. The BMNoC architecture, which analyses the data traffic and takes the localities between cores into account, improves the system performance in terms of latency as compared with conventional NoCs. In this paper, we propose a novel BMNoC utilizing packet transmission priority control methods. Our proposed BMNoC is a generalized and simplified version of a hybrid NoC which is composed of local buses and global mesh routers. Several realistic applications applied to our algorithm show better performance than previous studies and demonstrate the feasibility of our proposed architecture.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Robust Secure Scan Design Against Scan-Based Differential Cryptanalysis

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS   20 ( 1 ) 176 - 181  2012.01  [Refereed]

    Scan technology carries the potential risk of being misused as a "side channel" to leak out the secrets of crypto cores. The existing scan-based attacks could be viewed as one kind of differential cryptanalysis, which takes advantages of scan chains to observe the bit changes between pairs of chosen plaintexts so as to identify the secret keys. To address such a design/test challenge, this paper proposes a robust secure scan structure design for crypto cores as a countermeasure against scan-based attacks to maintain high security without compromising the testability.

    DOI

    Scopus

    24
    Citation
    (Scopus)
  • Energy-efficient high-level synthesis for HDR architectures

    Shin-Ya Abe, Masao Yanagisawa, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   5   106 - 117  2012  [Refereed]

    As battery runtime and overheating problems for portable devices become too significant to ignore, energy-aware LSI design is strongly required. Moreover, interconnection delay should be explicitly considered because it exceeds gate delay as semiconductor devices are downsized. We must take account of energy efficiency and interconnection delays even in high-level synthesis. In this paper, we first propose a huddle-based distributed-register architecture (HDR architecture), an island-based distributed-register architecture for multi-cycle interconnect communications where we can develop several energy-saving techniques. Next, we propose an energy-efficient high-level synthesis algorithm for HDR architectures focusing on multiple supply voltages. Our algorithm is based on iterative improvement of scheduling/binding and floorplanning. In the iteration process, huddles, each of which is composed of functional units, registers, a controller, and level converters, are very naturally generated using floorplanning results. By assigning high supply voltage to critical huddles and low supply voltage to non-critical huddles, we can finally have energy-efficient floorplan-aware high-level synthesis. Experimental results show that our algorithm achieves 45% energy-saving compared with the conventional distributed-register architectures and conventional algorithms. © 2012 Information Processing Society of Japan.

    DOI

    Scopus

    13
    Citation
    (Scopus)
  • A fast weighted adder by reducing partial product for reconstruction in super-resolution

    Hiromine Yoshihara, Masao Yanagisawa, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   5   96 - 105  2012  [Refereed]

    In recent years, it has become necessary to convert conventional low-resolution images to high-resolution ones at low cost. Super-resolution is a technique to remove the noise of observed images and restore their high frequencies. We focus on reconstruction-based super-resolution. Reconstruction incurs a large computation cost since it requires many images. In this paper, we propose a fast weighted adder for reconstruction-based super-resolution. From the viewpoint of reducing partial products, we propose two approaches to speed up a weighted adder. First, we use selector logics to halve its partial products. Second, we propose a weights-range limit method utilizing a negative term. By applying our proposed approaches to a weighted adder, we can reduce carry propagations, and our weighted adder can be implemented as a faster circuit than conventional ones. Experimental evaluations demonstrate that our weighted adder reduces its delay time by a maximum of 25.29% and its area to a maximum of 1/3, compared to conventional implementations. © 2012 Information Processing Society of Japan.

    DOI

    Scopus

  • MH4 : multiple-supply-voltages aware high-level synthesis for high-integrated and high-frequency circuits for HDR architectures

    Shin-ya Abe, Youhua Shi, Masao Yanagisawa, Nozomu Togawa

    IEICE ELECTRONICS EXPRESS   9 ( 17 ) 1414 - 1422  2012  [Refereed]

    In this paper, we propose a multiple-supply-voltage-aware high-level synthesis algorithm for HDR architectures which realizes high-speed and highly efficient circuits. We propose three new techniques: virtual area estimation, virtual area adaptation, and floorplanning-directed huddling, and integrate them into our HDR architecture synthesis algorithm. Virtual area estimation/adaptation effectively estimates a huddle area by gradually reducing it during iterations, which improves the convergence of our algorithm. Floorplanning-directed huddling determines huddle composition very effectively by performing floorplanning and functional unit assignment inside huddles simultaneously. Experimental results show that our algorithm achieves about 29% run-time saving compared with the conventional algorithms, and obtains a solution that our original algorithm cannot obtain even when a very tight clock constraint is given.

    DOI

    Scopus

    14
    Citation
    (Scopus)
  • Greedy Algorithm for the On-Chip Decoupling Capacitance Optimization to Satisfy the Voltage Drop Constraint

    Mikiko Sode Tanaka, Nozomu Togawa, Masao Yanagisawa, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E94A ( 12 ) 2482 - 2489  2011.12  [Refereed]

    With the progress of process technology in recent years, low voltage power supplies have become quite predominant. With this, the voltage margin has decreased and therefore the on-chip decoupling capacitance optimization that satisfies the voltage drop constraint becomes more important. In addition, the reduction of the on-chip decoupling capacitance area will reduce the chip area and, therefore, manufacturing costs. Hence, we propose an algorithm that satisfies the voltage drop constraint and at the same time, minimizes the total on-chip decoupling capacitance area. The proposed algorithm uses the idea of the network algorithm where the path which has the most influence on voltage drop is found. Voltage drop is improved by adding the on-chip capacitance to the node on the path. The proposed algorithm is efficient and effectively adds the on-chip capacitance where it has the greatest influence on the voltage drop. Experimental results demonstrate that, with the proposed algorithm, a real-size power/ground network could be optimized in just a few minutes, which is quite practical. Compared with the conventional algorithm, we confirmed that the total on-chip decoupling capacitance area of the power/ground network was reducible by about 40-50%.

    DOI

    Scopus

  • Speeding-up exact and fast FIFO-based cache configuration simulation

    Masashi Tawada, Masao Yanagisawa, Nozomu Togawa

    IEICE ELECTRONICS EXPRESS   8 ( 14 ) 1161 - 1167  2011.07  [Refereed]

    The number of sets, block size, and associativity determine a processor's cache configuration. Particularly in embedded systems, the cache configuration can be optimized since the target applications are quite limited. Recently, the CRCB method has been proposed for LRU-based (Least Recently Used-based) cache configuration simulation, which can calculate cache hit/miss counts accurately and very quickly while changing the three parameters. However, many recent processors use FIFO-based (First-In-First-Out-based) caches instead of LRU-based caches because of their hardware costs. In this paper, we propose a fast cache configuration simulation method for embedded applications that use FIFO as a cache replacement policy. We first prove several properties for FIFO-based caches and then propose a simulation method that can process two or more FIFO-based cache configurations with different cache associativities simultaneously. Experimental results show that our proposed method can obtain accurate cache hits/misses and runs up to 32% faster than the conventional simulators.

    DOI

    Scopus

    3
    Citation
    (Scopus)
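
    To make the three configuration parameters and the FIFO/LRU difference concrete, the following is a minimal sketch of a plain, one-configuration-at-a-time cache simulator. It is a baseline of the kind such papers aim to speed up, not the CRCB method or the authors' algorithm; the function name `simulate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def simulate(addresses, num_sets, block_size, assoc, policy="FIFO"):
    """Count (hits, misses) of a set-associative cache for a byte-address trace."""
    sets = [OrderedDict() for _ in range(num_sets)]  # one ordered tag store per set
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(tag)  # LRU refreshes recency; FIFO keeps insertion order
        else:
            misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)  # evict the entry at the front of the order
            lines[tag] = True
    return hits, misses


trace = [0, 1, 0, 2, 0]  # hypothetical trace; block_size=1 so each address is a block
fifo_result = simulate(trace, num_sets=1, block_size=1, assoc=2, policy="FIFO")
lru_result = simulate(trace, num_sets=1, block_size=1, assoc=2, policy="LRU")
print(fifo_result, lru_result)  # (1, 4) (2, 3)
```

    Exploring a design space with this baseline means calling `simulate` once per combination of `num_sets`, `block_size`, and `assoc`, which is the repeated work that simultaneous multi-configuration simulation is meant to avoid.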
  • Greedy Optimization Algorithm for the Power/Ground Network Design to Satisfy the Voltage Drop Constraint

    Mikiko Sode Tanaka, Nozomu Togawa, Masao Yanagisawa, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E94A ( 4 ) 1082 - 1090  2011.04  [Refereed]

    With the progress of process technology in recent years, low voltage power supplies have become quite predominant. With this, the voltage margin has decreased and therefore the power/ground design that satisfies the voltage drop constraint becomes more important. In addition, the reduction of the power/ground total wiring area and the number of layers will reduce manufacturing and designing costs. So, we propose an algorithm that satisfies the voltage drop constraint and at the same time, minimizes the power/ground total wiring area. The proposed algorithm uses the idea of a network algorithm [1] where the edge which has the most influence on voltage drop is found. Voltage drop is improved by changing the resistance of the edge. The proposed algorithm is efficient and effectively updates the edge with the greatest influence on the voltage drop. From experimental results, compared with the conventional algorithm, we confirmed that the total wiring area of the power/ground was reducible by about 1/3. Also, the experimental data shows that the proposed algorithm satisfies the voltage drop constraint in cases where the conventional algorithm cannot.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A Fast Selector-Based Subtract-Multiplication Unit and Its Application to Butterfly Unit

    Tsukamoto Youhei, Yanagisawa Masao, Ohtsuki Tatsuo, Togawa Nozomu

    Information and Media Technologies   6 ( 2 ) 276 - 285  2011

    Large-scale network and multimedia application LSIs include application-specific arithmetic units. A multiply-accumulator unit (MAC unit), which is one of these optimized units, arranges partial products and decreases carry propagations. However, there is no method similar to MAC to execute “subtract-multiplication”. In this paper, we propose a high-speed subtract-multiplication unit that decreases the latency of a subtract operation by bit-level transformation using selector logics. By using bit-level transformation, its partial products are calculated directly. The proposed subtract-multiplication units can be applied to any type of system using subtract-multiplications, and a butterfly operation in FFT is one of their suitable applications. We apply them effectively to Radix-2 butterfly units and Radix-4 butterfly units. Experimental results show that our proposed operation units using selector logics improve the performance by up to 13.92%, compared to a conventional approach.

    DOI CiNii

  • Exact, Fast and Flexible L1 Cache Configuration Simulation for Embedded Systems

    Tawada Masashi, Yanagisawa Masao, Ohtsuki Tatsuo, Togawa Nozomu

    Information and Media Technologies   6 ( 4 ) 1076 - 1091  2011

    Since the target applications running on an embedded processor are quite limited in embedded systems, we can optimize its cache configuration based on the number of sets, block size, and associativity. An extremely fast cache configuration simulation method, CRCB (Configuration Reduction approach by the Cache Behavior), has recently been proposed, which can calculate cache hit/miss counts accurately for possible cache configurations when the three parameters above are changed. The CRCB method assumes an LRU-based (Least Recently Used-based) cache, but many recent processors use a FIFO-based (First In First Out-based) cache or a PLRU-based (Pseudo LRU-based) cache because of its hardware cost. In this paper, we propose exact and fast L1 cache configuration simulation algorithms for embedded applications that use PLRU or FIFO as a cache replacement policy. First, we prove that the CRCB method can be applied not only to LRU but also to other cache replacement policies including FIFO and PLRU. Second, we prove several properties for FIFO- and PLRU-based caches and propose associated cache simulation algorithms which can accurately simulate more than one cache configuration with different cache associativities simultaneously for FIFO or PLRU. Finally, experimental results demonstrate that our cache configuration simulation algorithms obtain accurate cache hit/miss counts and run up to 249 times faster than a conventional cache simulator.

    DOI CiNii

  • Exact and Fast L1 Cache Configuration Simulation for Embedded Systems with FIFO/PLRU Cache Replacement Policies

    Masashi Tawada, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    2011 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT)     247 - 250  2011  [Refereed]

    Since target applications in embedded systems are limited, we can optimize the cache configuration. A very fast and exact cache simulation algorithm, CRCB, has been recently proposed. CRCB assumes LRU as a cache replacement policy, but a FIFO- or PLRU-based cache is often used due to its low hardware cost. This paper proposes exact and fast L1 cache simulation algorithms for PLRU- or FIFO-based caches. First, we prove that CRCB can be applied to FIFO and PLRU. Next, we show several properties for FIFO- and PLRU-based caches and propose their associated cache-simulation speed-up algorithms. Experiments demonstrate that our algorithms run up to 300 times faster than a well-known cache simulator.

  • Exact, fast and flexible L1 cache configuration simulation for embedded systems

    Masashi Tawada, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   4   166 - 181  2011  [Refereed]

    Since the target applications running on an embedded processor are quite limited in embedded systems, we can optimize its cache configuration based on the number of sets, block size, and associativity. An extremely fast cache configuration simulation method, CRCB (Configuration Reduction approach by the Cache Behavior), has recently been proposed, which can calculate cache hit/miss counts accurately for possible cache configurations when the three parameters above are changed. The CRCB method assumes an LRU-based (Least Recently Used-based) cache, but many recent processors use a FIFO-based (First In First Out-based) cache or a PLRU-based (Pseudo LRU-based) cache because of its hardware cost. In this paper, we propose exact and fast L1 cache configuration simulation algorithms for embedded applications that use PLRU or FIFO as a cache replacement policy. First, we prove that the CRCB method can be applied not only to LRU but also to other cache replacement policies including FIFO and PLRU. Second, we prove several properties for FIFO- and PLRU-based caches and propose associated cache simulation algorithms which can accurately simulate more than one cache configuration with different cache associativities simultaneously for FIFO or PLRU. Finally, experimental results demonstrate that our cache configuration simulation algorithms obtain accurate cache hit/miss counts and run up to 249 times faster than a conventional cache simulator. © 2011 Information Processing Society of Japan.

    DOI

    Scopus

  • A fault-secure high-level synthesis algorithm for RDR architectures

    Sho Tanaka, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   4   150 - 165  2011  [Refereed]

    As device feature size decreases, improving reliability against soft errors becomes quite necessary. A fault-secure system, in which concurrent error detection is realized, is one of the solutions to this problem. On the other hand, average interconnection delays exceed gate delays, which leads to a serious timing closure problem. By using a regular-distributed-register architecture (RDR architecture), we can estimate interconnection delays very accurately and their influence can be much reduced even in behavioral-level design. In this paper, we propose a fault-secure high-level synthesis algorithm for an RDR architecture. In fault-secure high-level synthesis, a recomputation CDFG as well as a normal-computation CDFG must be scheduled to control steps and bound to functional units. First, our algorithm re-uses vacant areas on RDR islands to allocate additional functional units for the recomputation CDFG. Second, we propose an efficient edge-break algorithm which considers the scheduling/binding of comparison nodes. We can have small-latency scheduling/binding for both the normal CDFG and recomputation CDFG. Our algorithm reduces the required control steps by up to 53% compared with the conventional approach. © 2011 Information Processing Society of Japan.

    DOI

    Scopus

    12
    Citation
    (Scopus)
  • A fast selector-based subtract-multiplication unit and its application to butterfly unit

    Youhei Tsukamoto, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    IPSJ Transactions on System LSI Design Methodology   4   60 - 69  2011  [Refereed]

    Large-scale network and multimedia application LSIs include application-specific arithmetic units. A multiply-accumulator unit (MAC unit), which is one of these optimized units, arranges partial products and decreases carry propagations. However, there is no method similar to MAC to execute "subtract-multiplication". In this paper, we propose a high-speed subtract-multiplication unit that decreases the latency of a subtract operation by bit-level transformation using selector logics. By using bit-level transformation, its partial products are calculated directly. The proposed subtract-multiplication units can be applied to any type of system using subtract-multiplications, and a butterfly operation in FFT is one of their suitable applications. We apply them effectively to Radix-2 butterfly units and Radix-4 butterfly units. Experimental results show that our proposed operation units using selector logics improve the performance by up to 13.92%, compared to a conventional approach. © 2011 Information Processing Society of Japan.

    DOI

    Scopus

    1
    Citation
    (Scopus)
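
    As a software-level illustration of where a subtract-multiplication appears in an FFT, the sketch below writes a Radix-2 butterfly as a sum plus one `(a - b) * w` operation. It assumes a decimation-in-frequency formulation, which the summary above does not specify, and it says nothing about the selector-logic hardware itself; all function names are hypothetical.

```python
import cmath


def subtract_multiply(a, b, w):
    """The fused operation (a - b) * w that a subtract-multiplication unit computes."""
    return (a - b) * w


def radix2_butterfly(a, b, w):
    """Radix-2 butterfly (decimation in frequency): a sum and a subtract-multiplication."""
    return a + b, subtract_multiply(a, b, w)


def fft_dif(x):
    """Recursive radix-2 FFT built only from butterflies; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    sums, diffs = [], []
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        s, d = radix2_butterfly(x[k], x[k + half], twiddle)
        sums.append(s)
        diffs.append(d)
    out = [0] * n
    out[0::2] = fft_dif(sums)   # even-indexed outputs
    out[1::2] = fft_dif(diffs)  # odd-indexed outputs
    return out


impulse_spectrum = fft_dif([1, 0, 0, 0])   # flat spectrum of ones
constant_spectrum = fft_dif([1, 1, 1, 1])  # all energy in bin 0
```

    Every butterfly in this formulation performs exactly one subtract-multiplication, so shortening that operation's latency affects the whole transform.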
  • Scan vulnerability in elliptic curve cryptosystems

    Ryuta Nara, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IPSJ Transactions on System LSI Design Methodology   4   47 - 59  2011  [Refereed]

    A scan-path test is one of the most important testing techniques, but it can be used as a side-channel attack against a cryptography circuit. Scan-based attacks are techniques to decipher a secret key using scanned data obtained from a cryptography circuit. Public-key cryptography, such as RSA and elliptic curve cryptosystem (ECC), is extensively used but conventional scan-based attacks cannot be applied to it, because it has a complicated algorithm as well as a complicated architecture. This paper proposes a scan-based attack which enables us to decipher a secret key in ECC. The proposed method is based on detecting intermediate values calculated in ECC. We focus on a 1-bit sequence which is specific to some intermediate values. By monitoring the 1-bit sequence in the scan path, we can find out the register position specific to the intermediate value in it and we can know whether this intermediate value is calculated or not in the target ECC circuit. By using several intermediate values, we can decipher a secret key. The experimental results demonstrate that a secret key in a practical ECC circuit can be deciphered using 29 points over the elliptic curve E within 40 seconds. © 2011 Information Processing Society of Japan.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • Scan-Based Side-Channel Attack against RSA Cryptosystems Using Scan Signatures

    Ryuta Nara, Kei Satoh, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E93A ( 12 ) 2481 - 2489  2010.12  [Refereed]

    Scan-based side-channel attacks retrieve a secret key in a cryptography circuit by analyzing scanned data. Since they must be considerable threats to a cryptosystem LSI, we have to protect cryptography circuits from them. RSA is one of the most important cryptography algorithms because it effectively realizes a public-key cryptography system. RSA is extensively used, but conventional scan-based side-channel attacks cannot be applied to it because it has a complicated algorithm. This paper proposes a scan-based side-channel attack which enables us to retrieve a secret key in an RSA circuit. The proposed method is based on detecting intermediate values calculated in an RSA circuit. We focus on a 1-bit time-sequence which is specific to some intermediate values. By monitoring the 1-bit time-sequence in the scan path, we can find out the register position specific to the intermediate value and we can know whether this intermediate value is calculated or not in the target RSA circuit. We can retrieve a secret key one bit by one bit from MSB to LSB. The experimental results demonstrate that a 1,024-bit secret key used in the target RSA circuit can be retrieved using 30.2 input messages within 98.3 seconds and its 2,048-bit secret key can be retrieved using 34.4 input messages within 634.0 seconds.

    DOI

    Scopus

    71
    Citation
    (Scopus)
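
    The MSB-to-LSB recovery idea in the summary above can be illustrated with a toy software model. This sketch is not the scan-signature method: it assumes left-to-right square-and-multiply exponentiation and replaces the scan-chain observation with a simple "was this intermediate value ever computed?" set lookup. The function names and the tiny parameters are hypothetical.

```python
def exponentiation_trace(m, d, n):
    """Left-to-right square-and-multiply; returns every value the accumulator holds."""
    bits = bin(d)[2:]
    acc = m % n                  # the most significant bit of d is always 1
    trace = [acc]
    for bit in bits[1:]:
        acc = acc * acc % n      # square for every bit
        trace.append(acc)
        if bit == "1":
            acc = acc * m % n    # extra multiply only when the bit is 1
            trace.append(acc)
    return trace


def recover_exponent(m, n, observed, bit_length):
    """Rebuild d from MSB to LSB by asking whether each candidate value was observed."""
    prefix = 1
    for _ in range(bit_length - 1):
        candidate = pow(m, 2 * prefix + 1, n)  # expected in the trace if the next bit is 1
        prefix = 2 * prefix + (1 if candidate in observed else 0)
    return prefix


# Toy parameters, far too small for real RSA.
n, m, d = 3233, 2, 0b1011
observed = set(exponentiation_trace(m, d, n))
recovered = recover_exponent(m, n, observed, d.bit_length())
print(bin(recovered))  # 0b1011
```

    In this model each key bit costs one membership test; in a real circuit the attacker has to infer that membership from scanned register contents, which is the part the paper addresses.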
  • Improved Launch for Higher TDF Coverage With Fewer Test Patterns

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS   29 ( 8 ) 1294 - 1299  2010.08  [Refereed]

    Due to the limitations of scan structure, the second vector in transition delay test is usually applied either by shift operation or by functional launch, which possibly results in unsatisfying transition delay fault (TDF) coverage. To overcome such a limitation for higher TDF coverage, a novel improved launch delay test technique that combines the pros of launch-on-shift and launch-on-capture tests is introduced in this paper. The proposed method can achieve near perfect TDF coverage with fewer test patterns without the need for a global fast scan enable signal. Experimental results on ISCAS89 and ITC99 benchmark circuits are included to show the effectiveness of the proposed method.

    DOI

    Scopus

  • State-dependent Changeable Scan Architecture against Scan-based Side Channel Attacks

    Ryuta Nara, Hiroshi Atobe, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS     1867 - 1870  2010  [Refereed]

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, the scan path could be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan path to complicate the scan structure has recently been presented. Unfortunately, it still carries the potential of being attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side channel attacks. The modified scan flip-flops are state-dependent, which could cause the output of each State-dependent Scan FF to be inverted or not so as to make it more difficult to discover the internal scan architecture.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • Performance-driven High-level Synthesis with floorplan for GDR Architectures and its Evaluation

    Akira Ohchi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS     921 - 924  2010  [Refereed]

    In this paper, we propose a high-level synthesis method targeting a generalized distributed-register architecture in which we introduce shared/local registers and global/local controllers. In our architecture, functional units on a critical path use local registers and local controllers, and functional units on a non-critical path use shared registers and a global controller. Our method is based on iterative improvement of scheduling/binding and floorplanning. Using this iterative flow, we obtain a generalized distributed-register architecture where its scheduling/binding as well as floorplanning are simultaneously optimized. Experimental results show that an 8.6% performance improvement can be achieved compared to the conventional high-performance method.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Scan-Based Attack against Elliptic Curve Cryptosystems

    Ryuta Nara, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2010 15TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC 2010)     402 - 407  2010  [Refereed]

    Scan-based attacks are techniques to decipher a secret key using scanned data obtained from a cryptography circuit. Public-key cryptography, such as RSA and elliptic curve cryptosystem (ECC), is extensively used but conventional scan-based attacks cannot be applied to it, because it has a complicated algorithm as well as a complicated architecture. This paper proposes a scan-based attack which enables us to decipher a secret key in ECC. The proposed method is based on detecting intermediate values calculated in ECC. By monitoring the 1-bit sequence in the scan path, we can find out the register position specific to the intermediate value in it and we can know whether this intermediate value is calculated or not in the target ECC circuit. By using several intermediate values, we can decipher a secret key. The experimental results demonstrate that a secret key in a practical ECC circuit can be deciphered using 29 points over the elliptic curve E within 40 seconds.

    DOI

    Scopus

    65
    Citation
    (Scopus)
  • VLSI Implementation of a Fast Intra Prediction Algorithm for H.264/AVC Encoding

    Youhua Shi, Kenta Tokumitsu, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     1139 - 1142  2010  [Refereed]

    Intra-frame coding is one of the most important technologies in H.264/AVC; it contributes significantly to the coding efficiency of H.264/AVC at the cost of computational complexity. To address this problem, in this paper we present an efficient VLSI implementation of a computationally efficient intra prediction algorithm for H.264/AVC encoding. Unlike most existing fast intra-mode selection techniques, in the proposed method the directional differences are computed using a few selected original pixels to obtain the candidate modes with the minimal direction cost. The proposed method is hardware-friendly and provides more processing parallelism for H.264 intra-frame encoding with less overhead and less power consumption, and it is expected to be utilized as a favourable accelerator hardware module in a real-time HDTV (1920x1080p) H.264 encoder.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • A Fast Selector-Based Subtract-Multiplication Unit and Its Application to Radix-2 Butterfly Unit

    Youhei Tsukamoto, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     1083 - 1086  2010  [Refereed]

    Large-scale network and multimedia application LSIs include application-specific arithmetic units. A multiply-accumulator unit (MAC unit), which is one of these optimized units, arranges partial products and decreases carry propagations. However, there is no method similar to MAC to execute "subtract-multiplication". In this paper, we propose a high-speed subtract-multiplication unit that decreases the latency of a subtract operation by bit-level transformation using selector logics. By using bit-level transformation, its partial products are calculated directly. The proposed subtract-multiplication units can be applied to any type of system using subtract-multiplications, and a butterfly operation in FFT is one of their suitable applications. Experimental results show that our proposed arithmetic units using selector logics improve the performance by 13.92% compared to a conventional approach.

    DOI

    Scopus

  • BusMesh NoC: A Novel NoC Architecture Comprised of Bus-based Connection and Global Mesh Routers

    SeungJu Lee, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     712 - 715  2010  [Refereed]

    Network-on-chip (NoC) architectures have emerged as a promising solution to the lack of scalability in multi-processor systems-on-chips (MPSoCs). In this paper, a BusMesh network-on-chip (BMNoC) architecture is proposed, together with simulation results. It is comprised of bus-based connection and global mesh routers to enhance the performance of on-chip communication. Furthermore, running MPEG-4, H.264, and a hybrid application mixing MPEG-4 and H.264 on our architecture shows better performance than earlier studies and demonstrates the feasibility of BMNoC.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • A Two-Level Cache Design Space Exploration System for Embedded Applications

    Nobuaki Tojo, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 12 ) 3238 - 3247  2009.12  [Refereed]

    Recently, a two-level cache, consisting of an L1 cache and an L2 cache, is commonly used in a processor. Particularly in an embedded system where a single application or a class of applications is repeatedly executed on a processor, its cache configuration can be customized such that an optimal one is achieved. An optimal two-level cache configuration which minimizes overall memory access time or memory energy consumption can be obtained by varying the three cache parameters: the number of sets, a line size, and an associativity, for the L1 cache and the L2 cache. In this paper, we first extend the L1 cache simulation algorithm so that we can explore two-level cache configurations. Second, we propose two-level cache design space exploration algorithms: CRCB-T1 and CRCB-T2, each of which is based on applying the Cache Inclusion Property to two-level cache configurations. Each of the proposed algorithms realizes exact cache simulation but decreases the number of cache hit/miss judgments by a factor of several thousands. Experimental results show that, by using our approach, the number of cache hit/miss judgments required to optimize cache configurations is reduced to 1/50-1/5500 compared to the exhaustive approach. As a result, our proposed approach runs an average of 1398.25 times faster in total compared to the exhaustive approach. Our proposed cache simulation approach achieves the world's fastest two-level cache design space exploration.

    DOI

    Scopus

  • A Scan-Based Attack Based on Discriminators for AES Cryptosystems

    Ryuta Nara, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 12 ) 3229 - 3237  2009.12  [Refereed]

    A scan chain is one of the most important testing techniques, but it can be used for side-channel attacks against a cryptography LSI. We focus on scan-based attacks, in which scan chains are targeted for side-channel attacks. The conventional scan-based attacks only consider a scan chain composed solely of the registers in a cryptography circuit. However, a cryptography LSI usually includes many circuits such as memories, microprocessors, and other circuits. This means that the conventional attacks cannot be applied to a practical scan chain composed of various types of registers. In this paper, we propose a scan-based attack which makes it possible to decipher the secret key in an AES cryptography LSI composed of an AES circuit and other circuits. By focusing on the bit pattern of a specific register and monitoring its change, our scan-based attack eliminates the influence of registers included in circuits other than AES. Our attack does not depend on scan chain architecture, and it can decipher practical AES cryptography LSIs.

    DOI

    Scopus

    47
    Citation
    (Scopus)
  • Floorplan-Aware High-Level Synthesis for Generalized Distributed-Register Architectures

    Akira Ohchi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 12 ) 3169 - 3179  2009.12  [Refereed]

    As device feature size decreases, interconnection delay becomes the dominating factor of circuit total delay. Distributed-register architectures can reduce the influence of interconnection delay. They may, however, increase circuit area because they require many local registers. Moreover, original distributed-register architectures do not consider control signal delay, which may be the bottleneck in a circuit. In this paper, we propose a high-level synthesis method targeting a generalized distributed-register architecture in which we introduce shared/local registers and global/local controllers. Our method is based on iterative improvement of scheduling/binding and floorplanning. First, we prepare shared-register groups with global controllers, each of which corresponds to a single functional unit. As iterations proceed, we use local registers and local controllers for functional units on a critical path. Shared-register groups physically located close to each other are merged into a single group. Accordingly, global controllers are merged. Finally, our method obtains a generalized distributed-register architecture whose scheduling/binding and floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 4.7% while maintaining circuit performance equal to that obtained with original distributed-register architectures.

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • X-Handling for Current X-Tolerant Compactors with More Unknowns and Maximal Compaction

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 12 ) 3119 - 3127  2009.12  [Refereed]

    This paper presents a novel X-handling technique, which removes the effect of unknowns on the compacted test response with maximal compaction ratio. The proposed method is combined with current X-tolerant compactors and inserts masking cells on scan paths to selectively mask X's. By doing this, the number of unknown responses in each scan-out cycle can be reduced to a reasonable level such that the target X-tolerant compactor can tolerate them with guaranteed error detection. It guarantees no test loss due to the effect of X's, and also achieves the maximal compaction that the target response compactor can provide. Moreover, because the masking cells are only inserted on the scan paths, it causes no performance degradation of the designs. Experimental results demonstrate the effectiveness of the proposed method.

    DOI

    Scopus

  • Unified Dual-Radix Architecture for Scalable Montgomery Multiplications in GF(P) and GF(2^n)

    Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 9 ) 2304 - 2317  2009.09  [Refereed]

    Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. A Montgomery multiplier is commonly used to compute the modular multiplications and requires scalability because the bit length of operands varies depending on the security level. In addition, ECC is performed in GF(P) or GF(2^n), and a unified architecture for multipliers in GF(P) and GF(2^n) is required. However, in previous works, changing frequency is necessary to deal with the delay-time difference between GF(P) and GF(2^n) multipliers because the critical path of the GF(P) multiplier is longer. This paper proposes a unified dual-radix architecture for scalable Montgomery multiplications in GF(P) and GF(2^n). The proposed architecture unifies four parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.28 μs. The implementation result shows that the area of the proposal is almost the same as that of previous works: 39 kgates.

    DOI

    Scopus
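
    The Montgomery multiplication named in the abstract above can be illustrated in software. The following is a minimal sketch of the textbook algorithm for an odd modulus (the GF(P) case only), assuming R = 2^r_bits; it is not the paper's dual-radix hardware architecture, it does not cover GF(2^n), and all function names are hypothetical.

```python
def montgomery_setup(n: int, r_bits: int) -> int:
    """Return n' = -n^{-1} mod R, where R = 2**r_bits (assumes n odd, n < R)."""
    if n % 2 == 0 or n >= (1 << r_bits):
        raise ValueError("n must be odd and smaller than R")
    r = 1 << r_bits
    return (-pow(n, -1, r)) % r  # modular inverse via pow() needs Python 3.8+


def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^{-1} mod n for 0 <= t < n * R."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> r_bits          # exact division by R
    return u - n if u >= n else u      # single conditional subtraction


def mont_mul(a_bar: int, b_bar: int, n: int, r_bits: int, n_prime: int) -> int:
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, r_bits, n_prime)


def modmul_via_montgomery(a: int, b: int, n: int, r_bits: int) -> int:
    """Compute a * b mod n (0 <= a, b < n) by going through Montgomery form."""
    n_prime = montgomery_setup(n, r_bits)
    a_bar = (a << r_bits) % n          # a * R mod n
    b_bar = (b << r_bits) % n          # b * R mod n
    c_bar = mont_mul(a_bar, b_bar, n, r_bits, n_prime)
    return redc(c_bar, n, r_bits, n_prime)  # leave Montgomery form
```

    For example, `modmul_via_montgomery(15, 44, 97, 8)` agrees with `(15 * 44) % 97`. The reduction only uses masking and shifting by R, which is why the method suits hardware multipliers.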

  • An L1 Cache Design Space Exploration System for Embedded Applications

    Nobuaki Tojo, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 6 ) 1442 - 1453  2009.06  [Refereed]

    In an embedded system where a single application or a class of applications is repeatedly executed on a processor, its cache configuration can be customized such that an optimal one is achieved. We can have an optimal cache configuration which minimizes overall memory access time by varying the three cache parameters: the number of sets, a line size, and an associativity. In this paper, we first propose two cache simulation algorithms: CRCB1 and CRCB2, based on the Cache Inclusion Property. They realize exact cache simulation but decrease the number of cache hit/miss judgments dramatically. We further propose three more cache design space exploration algorithms: CRMF1, CRMF2, and CRMF3, based on our experimental observations. They can find an almost optimal cache configuration from the viewpoint of access time. By using our approach, the number of cache hit/miss judgments required for optimizing cache configurations is reduced to 1/10-1/50 compared to conventional approaches. As a result, our proposed approach runs an average of 3.2 times faster and a maximum of 5.3 times faster in total compared to the fastest approach proposed so far. Our proposed cache simulation approach achieves the world's fastest cache design space exploration when optimizing total memory access time.

    DOI

    Scopus
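
    The design space described in the abstract above (number of sets, line size, associativity) can be made concrete with a small simulator. The following is a minimal sketch of the exhaustive baseline, assuming LRU replacement and counting misses only; it is not CRCB1, CRCB2, or any CRMF algorithm, and the function names are hypothetical.

```python
from collections import OrderedDict
from itertools import product


def count_misses(addresses, num_sets, line_size, associativity):
    """Simulate one set-associative LRU cache and return the number of misses."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size       # memory block holding this address
        index = block % num_sets        # set selected by the block number
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= associativity:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses


def explore(addresses, set_counts, line_sizes, associativities):
    """Exhaustively simulate every configuration and return its miss count."""
    return {
        (s, l, a): count_misses(addresses, s, l, a)
        for s, l, a in product(set_counts, line_sizes, associativities)
    }
```

    With LRU replacement and a fixed number of sets and line size, every hit in a cache with fewer ways is also a hit in a cache with more ways. This is the kind of inclusion relation that lets a simulator skip redundant hit/miss judgments, although how the cited algorithms exploit it is not shown here.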

  • Design-for-Secure-Test for Crypto Cores

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    ITC: 2009 INTERNATIONAL TEST CONFERENCE     618 - 618  2009  [Refereed]

    Scan technology carries the potential of being misused as a "side channel" to leak out the secret information of crypto cores. To address such a design challenge, this paper proposes a design-for-secure-test (DFST) solution for crypto cores by adding a stimuli-launched flip-flop into the traditional scan flip-flop to maintain the high test quality without compromising the security.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • Exact and Fast L1 Cache Simulation for Embedded Systems

    Nobuaki Tojo, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    PROCEEDINGS OF THE ASP-DAC 2009: ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 2009     817 - 822  2009  [Refereed]

    In recent years, the gap between the cycle time of processors and memory access time has been increasing. One solution to this problem is to use a cache, but just using a large cache may not reduce the total memory access time. We can have an optimal cache configuration which minimizes overall memory access time by varying the three cache parameters: a cache set size, a line size, and an associativity. In this paper, we propose two exact cache simulation algorithms: CRCB1 and CRCB2, based on the Cache Inclusion Property. They realize exact cache simulation but increase simulation speed dramatically. By using our approach, the number of cache hit/miss judgments required for simulating all the cache configurations is reduced to 31.4%-93.6% compared to conventional approaches. As a result, our proposed approach runs an average of 1.8 times faster and a maximum of 3.3 times faster in total compared to the fastest approach proposed so far. Our proposed exact cache simulation approach achieves the world's fastest L1 cache simulation.

    DOI

    Scopus

    25
    Citation
    (Scopus)
  • A Unified Test Compression Technique for Scan Stimulus and Unknown Masking Data with No Test Loss

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E91A ( 12 ) 3514 - 3523  2008.12  [Refereed]

    This paper presents a unified test compression technique for scan stimulus and unknown masking data with seamless integration of test generation, test compression, and all unknown response masking for high-quality manufacturing test cost reduction. Unlike prior test compression methods, the proposed approach considers the unknown responses during the test pattern generation procedure, and then selectively encodes the less specified bits (either 1s or 0s) in each scan slice for compression while at the same time masking the unknown responses before sending them to the response compactor. The proposed test scheme could dramatically reduce test data volume as well as the number of required test channels by using only c tester channels to drive N internal scan chains, where c = ⌈log₂ N⌉ + 2. In addition, because all the unknown responses could be exactly masked before entering the response compactor, test loss due to unknown responses would be eliminated. Experimental results on both benchmark circuits and larger designs indicated the effectiveness of the proposed technique.

    DOI

    Scopus

  • Floorplan-driven high-level synthesis for distributed/shared-register architectures

    Akira Ohchi, Shunitsu Kohara, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IPSJ Transactions on System LSI Design Methodology   1   78 - 90  2008.08  [Refereed]

    In this paper, we propose a high-level synthesis method targeting distributed/shared-register architectures. Our method repeats (1) scheduling/FU binding, (2) register allocation, (3) register binding, and (4) module placement. By feeding back floorplan information from (4) to (1), our method obtains a distributed/shared-register architecture whose scheduling/binding and floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 13.2% while maintaining circuit performance equal to that obtained with distributed-register architectures. © 2008 Information Processing Society of Japan.

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • Low power LDPC code decoder architecture based on intermediate message compression technique

    Kazunori Shimizu, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E91A ( 4 ) 1054 - 1061  2008.04  [Refereed]

    Reducing the power dissipation of an LDPC code decoder is a major challenge in applying it to practical digital communication systems. In this paper, we propose a low-power LDPC code decoder architecture based on an intermediate message compression technique with the following features: (i) An intermediate message compression technique enables the decoder to reduce the required memory capacity and write power dissipation. (ii) A clock-gated shift-register-based intermediate message memory architecture enables the decoder to decompress the compressed messages in a single clock cycle while reducing the read power dissipation. The combination of the above two techniques enables the decoder to reduce the power dissipation while keeping the decoding throughput. The simulation results show that the proposed architecture improves the power efficiency by up to 52% and 18% compared to that of the decoder based on the overlapped schedule and the rapid convergence schedule without the proposed techniques, respectively.

    DOI

    Scopus

  • A secure test technique for pipelined advanced encryption standard

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 3 ) 776 - 780  2008.03  [Refereed]

    In this paper, we present a Design-for-Secure-Test (DFST) technique for pipelined AES to guarantee both security and test quality during testing. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Floorplan-Driven High-Level Synthesis for Distributed/Shared-Register Architectures

    Ohchi Akira, Kohara Shunitsu, Togawa Nozomu, Yanagisawa Masao, Ohtsuki Tatsuo

    Information and Media Technologies   3 ( 4 ) 691 - 703  2008

    In this paper, we propose a high-level synthesis method targeting distributed/shared-register architectures. Our method repeats (1) scheduling/FU binding, (2) register allocation, (3) register binding, and (4) module placement. By feeding back floorplan information from (4) to (1), our method obtains a distributed/shared-register architecture whose scheduling/binding and floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 13.2% while maintaining circuit performance equal to that obtained with distributed-register architectures.

    DOI CiNii

  • High-level synthesis algorithms with floorplaning for distributed/shared-register architectures

    Akira Ohchi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT), PROCEEDINGS OF TECHNICAL PROGRAM     164 - 167  2008  [Refereed]

    In this paper, we propose a high-level synthesis method targeting distributed/shared-register architectures. Our method repeats (1) scheduling/FU binding, (2) register allocation, (3) register binding, and (4) module placement. By feeding back floorplan information from (4) to (1), our method obtains a distributed/shared-register architecture whose scheduling/binding and floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 13.6% while maintaining circuit performance equal to that obtained with distributed-register architectures.

  • Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n)

    Kazuyuki Tanimura, Ryuta Nara, Shunitsu Kohara, Kazunori Shimizu, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     667 - 672  2008  [Refereed]

    Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used as a technique for modular multiplication and requires scalability since the bit length of operands varies depending on the security level. Also, ECC is performed in GF(P) or GF(2^n), and unified architectures for GF(P) and GF(2^n) multipliers are needed. However, in previous works, changing frequency or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) circuits of the multiplier because the critical path of the GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n). The proposed architecture unifies 4 parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency, so that clock dividers to deal with the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles increased by the dual-radix approach. Consequently, the proposed architecture computes a GF(P) 256-bit Montgomery multiplication in 0.23 μs.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • GECOM: Test data compression combined with all unknown response masking

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     537 - 542  2008  [Refereed]

    This paper introduces GECOM technology, a novel test compression method with seamless integration of test GEneration, test COmpression (i.e. integrated compression on scan stimulus and masking bits), and all unknown scan responses Masking for manufacturing test cost reduction. Unlike most prior methods, the proposed method considers the unknown responses during the ATPG procedure and selectively encodes the specified bits (either 1s or 0s) in scan slices for compression while at the same time masking the unknown responses before sending them to the response compactor. The proposed GECOM technology consists of the GECOM architecture and the GECOM ATPG technique. In the GECOM architecture, for a circuit with N internal scan chains, only c tester channels, where c = ⌈log₂ N⌉ + 2, are required. GECOM ATPG generates test patterns for the GECOM architecture so that not only could the scan inputs be efficiently compressed but all the unknown responses would also be masked. Experimental results on both benchmark circuits and real industrial designs indicated the effectiveness of the proposed GECOM technique.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Unknown Response Masking with Minimized Observable Response Loss and Mask Data

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4     1779 - +  2008  [Refereed]

    This paper presents a new unknown response masking technique to minimize the effect on test loss due to over-masking. Unlike previous works where the scan responses are masked before entering the response compactor, the proposed method could mask the Xs when they are transformed on the scan path. Meanwhile, the masking cells are inserted along the scan paths, thus they would have no degradation on the performance of the designs. In addition, the test data required to mask unknown responses is only one bit for each test pattern. Experimental results show the effectiveness of the proposed method.

    DOI

    Scopus

  • Dynamically Reconfigurable Architecture for Multi-Rate Compatible Regular LDPC Decoding

    Akiyuki Nagashima, Yuta Imai, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4     705 - 708  2008  [Refereed]

    Recently, the demand for high-speed wireless network service on mobile devices has been rapidly increasing. Error correcting codes are used to enhance network communication quality. In particular, LDPC (Low Density Parity Check) codes show high throughput and achieve information rates very close to the Shannon limit. In this paper, we propose a dynamically reconfigurable architecture for multi-rate compatible regular LDPC decoding. Our proposed decoder deals with multi-rate codes by introducing a multi-rate compatible 1st-2nd minimum searching unit. The proposed decoder shows better throughput over a wide range of S/N ratios compared to conventional rate-fixed LDPC decoders.

    DOI

    Scopus

    1
    Citation
    (Scopus)
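
    The "1st-2nd minimum searching" mentioned in the abstract above can be illustrated in software. The following is a minimal sketch, assuming a min-sum style check-node update in which each outgoing magnitude is the minimum over all other incoming magnitudes; the abstract does not state which decoding rule the paper uses, so that context and all names here are assumptions, and signs are ignored for brevity.

```python
def two_smallest(magnitudes):
    """Return (min1, min2, index_of_min1) found in a single pass."""
    if len(magnitudes) < 2:
        raise ValueError("at least two inputs are required")
    min1 = min2 = float("inf")
    idx = -1
    for i, m in enumerate(magnitudes):
        if m < min1:
            min2 = min1          # old smallest becomes the second smallest
            min1, idx = m, i
        elif m < min2:
            min2 = m
    return min1, min2, idx


def check_node_magnitudes(magnitudes):
    """Magnitude sent on each edge: the minimum of all the other inputs."""
    min1, min2, idx = two_smallest(magnitudes)
    # Only the edge that supplied min1 must fall back to min2.
    return [min2 if i == idx else min1 for i in range(len(magnitudes))]
```

    Keeping just the two smallest values is enough to produce every outgoing magnitude, which is why a dedicated first and second minimum search is a natural hardware building block for this style of decoder.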
  • FIR Filter Design on Flexible Engine/Generic ALU Array and Its Dedicated Synthesis Algorithm

    Ryo Tamura, Masayuki Honma, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki, Makoto Satoh

    2008 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2008), VOLS 1-4     701 - +  2008  [Refereed]

    Reconfigurable processors are those whose contexts are dynamically reconfigured while they are working. We focus on a reconfigurable processor called FE-GA (Flexible Engine/Generic ALU array) for digital media processing. Currently, FE-GA does not have a dedicated behavior synthesis tool. In this paper, we design FIR filters and propose an algorithm to map them onto FE-GA automatically. Given the order and coefficients of an FIR filter, the algorithm generates dedicated assembly code which represents the given FIR filter for FE-GA. Then an editor called FEEditor reads the generated assembly code and implements the corresponding FIR filter on FE-GA. The proposed algorithm achieves automatic mapping of FIR filters of all orders within the range of the specification of the FE-GA architecture. Furthermore, it is proved that the minimum number of cycles is achieved to execute FIR filtering if there is no thread switching.

    DOI

    Scopus

    5
    Citation
    (Scopus)
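
    For reference, the computation that an FIR filter of a given order and coefficient set performs, as discussed in the abstract above, can be written directly. The following is a minimal software sketch of the direct-form sum y[n] = Σ h[k]·x[n−k], assuming zero initial state; it says nothing about how the paper maps the filter onto FE-GA, and the function name is hypothetical.

```python
def fir_filter(coefficients, samples):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], zero initial state."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coefficients):
            if n - k >= 0:                 # samples before x[0] are taken as zero
                acc += h * samples[n - k]
        output.append(acc)
    return output
```

    Feeding a unit impulse such as `[1, 0, 0, 0]` into the filter returns the coefficients followed by zeros, which is a quick sanity check for any implementation.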
  • 携帯機器向けMPEG-A Photo Playerのメタデータ生成システムのハードウェア化に関する一考察

    元橋雅人, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006 ( 552 ) 31 - 36  2007.03

    The number of multimedia contents possessed by users is growing dramatically due to the recent dissemination of imaging devices. Recent research and development has focused on handling such content more efficiently, for example through searching and grouping. In this paper, we focus on a digital photo album that allows searching and grouping, and conduct fundamental research on bringing its functions to mobile devices. At present, the function is not practically implemented due to technical difficulties with meta-data generation processing. However, it is highly likely that mobile device suppliers will provide various services based on meta-data, and hence we mainly discuss a meta-data generation processing system in this research. The constructed system produces meta-data standardized by MPEG-A Photo Player, and it aims to generate the data within 1 second for the purpose of mobile device implementation. Regarding the most significant bottleneck, cluster substitution, we resolve the problem by speeding up the Radon transformation and reducing image size in order to reduce the amount of processing. In addition, we introduce custom hardware for histogram construction, which shortens the running time so that data generation completes within 1 second. As a result, our system runs in 0.87 seconds, satisfying the 1-second constraint.

    CiNii

  • アプリケーションプロセッサ向けデータキャッシュ構成最適化システムとその評価

    堀内一央, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006  2007.03

    CiNii

  • SIMD型プロセッサコア最適化設計のための多重ループに対応したSIMD命令合成手法

    中島裕貴, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006 ( 551 ) 13 - 18  2007.03

    The hardware/software cosynthesis system named SPADES, which synthesizes a processor with packed SIMD type instructions, needs a parallelizing compiler for such a processor. The parallelizing compiler targets a virtual processor that has all available hardware units. It exploits instruction-level parallelism using packed SIMD type instructions and outputs the fastest scheduled assembly code. The output of the parallelizing compiler decides the initial configuration of the processor. This paper proposes a parallelizing algorithm for nested loops and a packed SIMD instruction generation algorithm. The proposed algorithm extracts instruction-level parallelism from nested loops in the input application and makes it possible to generate packed SIMD type instructions. Experimental results show the effectiveness of the proposed algorithm.

    CiNii

  • SIMD型プロセッサコアを対象としたハードウェア/ソフトウェア分割フレームワーク

    大東真崇, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006 ( 551 ) 7 - 12  2007.03

    This paper proposes a hardware/software (HW/SW) partitioning framework for the HW/SW cosynthesis system named SPADES. SPADES is a system that automatically synthesizes a processor core specialized for an application. The synthesized processor core is just sufficient in area and performance. The HW/SW partitioner, a core part of SPADES, first decides the hardware configuration that processes the application at maximum speed. It then reduces hardware units (functional units, registers, and so on) to explore the optimum hardware configuration. The proposed framework makes it possible to explore optimum hardware configurations and to synthesize a much smaller SIMD processor core specialized for an application. In addition, because the proposed framework is divided into modules, the system can be changed or extended without difficulty. The experimental results show the effectiveness of the proposed framework.

    CiNii

  • SIMD型プロセッサコア設計におけるプロセッシングユニット最適化手法

    繁田裕之, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006 ( 551 ) 1 - 6  2007.03

    Processor cores which process image and sound data can achieve higher speed, smaller area, and lower power than general-purpose processors by adding operation units specialized for the application. We call these operation units "processing units". Because it takes a long time to design processing units for each application, a system that synthesizes processing units automatically is required. This paper proposes an algorithm to optimize processing units for a SIMD-type processor core. Our proposal synthesizes processing units automatically by clustering arithmetic and logic operation nodes from the application's control data flow graph (CDFG). The synthesized processing units are embedded in the processor core. We then explore the optimum hardware architecture by reconstructing processing units simultaneously with processor cores. The experimental results show the effectiveness of the proposed algorithm.

    CiNii

  • Power-Efficient LDPC Code Decoder Architecture

    Kazunori Shimizu, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    ISLPED'07: PROCEEDINGS OF THE 2007 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN     359 - 362  2007  [Refereed]

     View Summary

    This paper proposes a power-efficient LDPC decoder architecture which features (1) a FIFO-buffering-based rapid convergence schedule, which enables the decoder to accelerate the decoding throughput without increasing the required number of memory bits, and (2) an intermediate message compression technique based on a clock-gated shift register, which reduces the read and write power dissipation for the intermediate messages. Simulation results show that the proposed decoder achieves 1.66 times faster decoding throughput and improves the power efficiency (defined as the power dissipation per Mbps) by up to 52% compared to the decoder based on the conventional overlapped schedule.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Design for secure test - A case study on pipelined Advanced Encryption Standard

    Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     149 - 152  2007  [Refereed]

     View Summary

    Cryptography plays an important role in the security of data transmission. To ensure the correctness of crypto hardware, we should conduct testing at fabrication and in the field. However, to achieve high test quality, state-of-the-art scan-based test techniques need to increase the testability of the circuit under test, which carries the potential of being misused to reveal the secret information of the crypto hardware. Thus, developing efficient test strategies for crypto hardware that achieve high test quality without compromising security becomes an important task. In this paper we discuss the development of a Design-for-Secure-Test (DFST) technique for pipelined AES to overcome the above contradiction between security and test quality in testing crypto hardware. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.

    DOI

  • XMLをベースとしたCDFGマニピュレーションフレームワーク: CoDaMa

    小原俊逸, 史又華, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006-97   19 - 24  2007.01

    CiNii

  • 楕円曲線暗号向けGF(2m)上のDigit-Serial乗算器の設計

    奈良竜太, 小原俊逸, 清水一範, 戸川望, 池永剛, 柳澤政生, 後藤敏, 大附辰夫

    電子情報通信学会技術研究報告   VLD2006-89 ( 455 ) 25 - 30  2007.01

    CiNii

  • Power-efficient LDPC decoder architecture based on accelerated message-passing schedule

    Kazunori Shimizu, Tatsuyuki Ishikawa, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E89A ( 12 ) 3602 - 3612  2006.12  [Refereed]

     View Summary

    In this paper, we propose a power-efficient LDPC decoder architecture based on an accelerated message-passing schedule. The proposed decoder architecture is characterized as follows: (i) Partitioning a pipelined operation so that intermediate messages are not read and written simultaneously enables the accelerated message-passing schedule to be implemented with single-port SRAMs. (ii) FIFO-based buffering reduces the number of SRAM banks and words of the LDPC decoder based on the accelerated message-passing schedule. The proposed LDPC decoder keeps a single message for each non-zero bit in the parity check matrix, as in a classical schedule, while achieving the accelerated message-passing schedule. Implementation results in 0.18 µm CMOS technology show that the proposed decoder architecture reduces the area of the LDPC decoder by 43% and the power dissipation by 29% compared to the conventional architecture based on the accelerated message-passing schedule.

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • アプリケーションプロセッサのフォワーディングユニット最適化手法

    日浦敏俊, 小原俊逸, 史又華, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ( VLD2006-80 )  2006.11

  • 動的再構成可能なマルチレート対応LDPC符号復号器の実装

    今井優太, 清水一範, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   RECONF2006-43 ( 393 ) 35 - 40  2006.11

     View Summary

    Recently, demand for high-speed wireless network services on mobile devices has been gradually increasing. Error-correcting codes are one method of enhancing network communication quality. In this paper, we propose a multi-rate compatible LDPC (Low Density Parity Check) decoder architecture; LDPC is a new code that can achieve high throughput in linear time. Our proposed decoder supports multi-rate codes by implementing the Multi-Rate compatible 1st-2nd Minimum Searching Circuit. With this method, the decoder shows better throughput over a wide range of S/N ratios compared to the rate-fixed LDPC decoder.

    CiNii

  • 歩行者ナビゲーションにおける微小画面での視認性とユーザの迷いにくさを考慮した略地図生成手法

    二宮直也, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ITS2006-34 ( 266 ) 53 - 58  2006.09

     View Summary

    The use of map services for pedestrians has expanded with the spread of location information services and Internet services on cellular phones. There has been various research on automatically generating effective deformed maps for mobile devices with small displays. The existing techniques are based on making road shapes horizontal and vertical and on quantizing intersection angles. The deformed maps they generate have a high level of visibility, but they are not easy for users to understand. In this paper, we propose a road shape transformation algorithm based on cognitive science. It can generate deformed maps that are legible on a small display and whose routes are easy to understand. By applying our proposed algorithm to road-network data of about 400 nodes, we confirmed that it works efficiently.

    CiNii

  • 屋内用歩行者ナビゲーションにおける歩行者の嗜好を反映させる経路探索手法

    荒井亨, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ITS2006-33 ( 266 ) 47 - 52  2006.09

     View Summary

    Navigation services for pedestrians using mobile devices, which guide a user to his/her requested destination, have become common these days. However, such services are available only for outdoor environments. In this paper, we focus on indoor environments as the target of a navigation service and propose network data structures specific to indoor environments such as an underground shopping center or a department store. Based on these network data structures, we propose a route searching algorithm which satisfies individual preferences. To show the effectiveness of our proposed algorithm, we carry out two types of simulation and show that we obtain the optimal route for indoor navigation.

    CiNii

  • 屋内向け歩行者ナビゲーションにおけるユーザの嗜好性と混雑状況を考慮した目的地決定手法

    小林和馬, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ITS2006-32 ( 266 ) 41 - 45  2006.09

     View Summary

    In recent years, research on pedestrian navigation systems using cellular phones has been increasing along with the improvement of cellular phone communication networks. In such systems, navigation that takes the user's demands into account can be expected to improve usability. In this paper, we propose a destination speculating system that considers individual preferences and congestion for indoor pedestrian navigation, aiming at an intelligent navigation system adapted to users. The system uses the user's preferences to find a destination for the navigation. It treats the user's food preferences, past records, and congestion in stores as important elements in deciding the destination. According to these elements, the system selects shops that satisfy the user's preferences. The destination is decided through interaction between the system and the user. By constructing the proposed system, we verified that our proposal is effective.

    CiNii

  • 車車間・路車間通信技術を用いた車線別の渋滞情報の検出手法

    大高宏介, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ITS2006-18 ( 265 ) 19 - 24  2006.09

     View Summary

    As ITS technology evolves, the measurement accuracy and the technology of route guidance are improving. However, because the accuracy of the estimated travel time from the starting point to the destination is not high enough, how to acquire accurate congestion information is a problem. In particular, because differences in the congestion situation between lanes greatly influence the calculation of the travel time, when the congestion level differs for each lane it is necessary to detect congestion information for each lane at an intersection without causing the problems seen in conventional congestion-detecting methods. We therefore propose a method to detect congestion information for each lane in real time at an intersection on a general road by using Vehicle-to-Vehicle and Road-to-Vehicle Communication technology. The time required to pass through the congestion is calculated using information gathered by iterative communication among cars, starting from a beacon. We then show the effectiveness of this method by simulation.

    CiNii

  • H.264符号化向けDSPにおける動き予測演算器の設計

    高橋豊和, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ( VLD2006 )  2006.06

  • アプリケーションプロセッサの面積/遅延見積もり手法

    山崎大輔, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   CAS2006-1 ( VLD2006-14, SIP2006-24 )  2006.06

  • Selective low-care coding: A means for test data compression in circuits with multiple scan chains

    YH Shi, N Togawa, S Kimura, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E89A ( 4 ) 996 - 1004  2006.04  [Refereed]

     View Summary

    This paper presents a test input data compression technique, Selective Low-Care Coding (SLC), which can be used to significantly reduce input test data volume as well as the external test channel requirement for multiscan-based designs. In the proposed SLC scheme, we explore the linear dependencies of the internal scan chains, and instead of encoding all the specified bits in test cubes, only a smaller number of specified bits is selected for encoding, so greater compression can be expected. Experiments on the larger benchmark circuits show that a drastic reduction in test data volume, with corresponding savings in test application time, can indeed be achieved even for a well-compacted test set.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Partially-parallel LDPC decoder achieving high-efficiency message-passing schedule

    K Shimizu, T Ishikawa, N Togawa, T Ikenaga, S Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E89A ( 4 ) 969 - 978  2006.04  [Refereed]

     View Summary

    In this paper, we propose a partially-parallel LDPC decoder which achieves a high-efficiency message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently. (ii) The proposed parallel pipelined bit functional unit enables the column operation module to compute every message in each bit node which is updated by the row operations. These column operations can be performed without extending the single iterative decoding delay when the row and column operations are performed concurrently. Therefore, the proposed decoder performs the column operations more frequently in a single iterative decoding, and achieves a high-efficiency message-passing schedule within the limited decoding delay time. Hardware implementation on an FPGA and simulation results show that the proposed partially-parallel LDPC decoder improves the decoding throughput and bit error performance with a small hardware overhead.

    DOI

    Scopus

    9
    Citation
    (Scopus)
  • Hardware architecture of efficient message-passing schedule based on modified min-sum algorithm for decoding LDPC codes

    清水一範, 石川達之, 戸川望, 池永剛, 後藤敏

    Proc. Synthesis and System Integration of Mixed Technologies (SASIMI 2006)    2006.04

  • A pipelined functional unit generation method in HW/SW cosynthesis for SIMD processor cores

    小原俊逸, 栗原輝, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    Proc. Synthesis and System Integration of Mixed Technologies (SASIMI 2006)    2006.04

  • アプリケーションプロセッサのデータキャッシュ構成最適化手法

    堀内一央, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第19回回路とシステム軽井沢ワークショップ論文集   19   583 - 588  2006.04

    CiNii

  • FIFOバッファによる高効率Message-Passingスケジュールを用いたLDPC復号器

    清水一範, 石川達之, 戸川望, 池永剛, 後藤敏

    電子情報通信学会第19回回路とシステム軽井沢ワークショップ論文集   19   211 - 216  2006.04

    CiNii

  • A fast elliptic curve cryptosystem LSI embedding word-based Montgomery multiplier

    J Uchida, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON ELECTRONICS   E89C ( 3 ) 243 - 249  2006.03  [Refereed]

     View Summary

    Elliptic curve cryptosystems are expected to become the next standard for public-key cryptosystems. The security level of elliptic curve cryptosystems depends on the difficulty of the discrete logarithm problem on elliptic curves. The security level of an elliptic curve cryptosystem with a 160-bit public key is equivalent to that of an RSA system with a 1024-bit public key. We propose an elliptic curve cryptosystem LSI architecture embedding word-based Montgomery multipliers. Montgomery multiplication is an efficient method for finite field multiplication. We can design a scalable architecture for an elliptic curve cryptosystem by selecting the structure of the word-based Montgomery multipliers. Experimental results demonstrate the effectiveness and efficiency of the proposed architecture. In the hardware evaluation using a 0.18 µm CMOS library, the high-speed design using 126 Kgates with 20 x 8-bit multipliers achieved an operation time of 3.6 ms for a 160-bit point multiplication.

    DOI

    Scopus

  • 歩行者向け地図情報配信システムにおける道路交通標識を用いた位置特定手法

    中口智史, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ( ITS2005-114 )  2006.03

  • SIMD型プロセッサコアの自動合成におけるパイプライン構成最適化手法

    栗原輝, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   105 ( VLD2005-115, ICD2005-232 ) 43 - 48  2006.03

     View Summary

    This paper proposes an algorithm to optimize the pipeline architecture of processor core to be synthesized. The algorithm can be integrated into the synthesis system for an application-specific SIMD processor core. A SIMD processor core has SIMD functional units whose critical path delay is relatively large and it usually determines operating frequency. By applying pipelining technique to the processor core, its operating frequency can be increased without adding too much area. However, increasing the number of pipeline stages does not always lead to reduction in execution time because of hazards, thus the optimum pipeline architecture should be searched during the processor core synthesis. In the algorithm, first a set of pipeline architectures with different number of pipeline stages is defined. Next, for each defined pipeline architecture, the number of hardware units which are added to the processor core is optimized to satisfy the given timing constraint. Last, the pipeline architecture with the smallest area among the defined pipeline architectures is selected as an optimum solution. We also show the promising experimental results on the algorithm evaluation.

    CiNii

  • 動的フローに対応したネットワークプロセッサの改良とその評価

    田淵英孝, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ( VLD2005-112, ICD2005-229 )  2006.03

  • 設計ナビゲーション機構を有するシステムLSI設計のためのHW/SW分割システム

    小島洋平, 戸川望, 橘昌良, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   105 ( VLD2005-111, ICD2005-228 ) 19 - 24  2006.03

     View Summary

    In this paper, we propose a new hardware/software partitioning system for system LSI design. This system has an IP database and a design navigation mechanism. The IP database changes the number and kind of enumerated IPs according to the constraints. By reusing the IPs in the database, we can reduce the need to design new modules and shorten the design period. By reducing the number of enumerated IPs, the search time can be shortened. The design navigation interactively helps the designer improve his/her design when a solution that satisfies the constraints is not obtained. By using the design navigation, the designer can examine the bottleneck and easily improve the architecture. We confirmed the effectiveness of the proposed system through computer experiments.

    CiNii

  • 高速移動体のためのハンドオフメッセージ数を最小化した高速ハンドオフ手法

    伊藤光司, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   ( IN2005-222 )  2006.03

  • ASIC implementation of LDPC decoder accelerating message-passing schedule

    SHIMIZU Kazunori

    IEEE International Solid State Circuits Conference (ISSCC), DAC/ISSCC2006 Student Design Contest (Conceptual Category: 1st Place Winner), San Francisco    2006.02

    CiNii

  • ASIC implementation of LDPC decoder accelerating message-passing schedule

    清水一範, 石川達之, 戸川望, 池永剛, 後藤敏

    IEEE International Solid State Circuits Conference (ISSCC), DAC/ISSCC2006 Student Design Contest (Conceptual Category: 1st Place Winner), San Francisco    2006.02

  • Special section on VLSI Design and CAD Algorithms

    Onodera, H., Ikeda, M., Ishihara, T., Isshiki, T., Inoue, K., Okada, K., Kajihara, S., Kaneko, M., Kawaguchi, H., Kimura, S., Kuga, M., Kurokawa, A., Sato, T., Shibuya, T., Shiraishi, Y., Takagi, K., Takahashi, A., Takeuchi, Y., Togawa, N., Tomiyama, H., Nakamura, Y., Hamaguchi, K., Miura, Y., Minato, S.-I., Yamaguchi, R., Yamada, M., Yuminaka, Y., Watanabe, T., Hashimoto, M., Miyazaki, M.

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E89-A ( 12 ) 3377 - 3377  2006

    DOI

    Scopus

  • A parallel LSI architecture for LDPC decoder improving message-passing schedule

    Kazunori Shimizu, Tatsuyuki Ishikawa, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS     5099 - +  2006  [Refereed]

     View Summary

    This paper proposes a parallel LSI architecture for an LDPC decoder which improves the message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently. (ii) The proposed parallel pipelined bit functional unit enables the decoder to perform every column operation using the messages which are updated by the row operations. These column operations can be performed without extending the single iterative decoding delay. Hardware implementation and simulation results show that the proposed decoder improves the decoding throughput and bit error performance with a small hardware overhead.

    DOI

  • FCSCAN: An efficient multiscan-based test compression technique for test cost reduction

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    ASP-DAC 2006: 11TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, PROCEEDINGS     653 - 658  2006  [Refereed]

     View Summary

    This paper proposes a new multiscan-based test input data compression technique that employs a Fan-out Compression Scan Architecture (FCSCAN) for test cost reduction. The basic idea of FCSCAN is to target the minority specified bits (either 1 or 0) in scan slices for compression. Due to the low specified-bit density in the test cube set, FCSCAN can significantly reduce input test data volume and the number of required test channels so as to reduce test cost. The FCSCAN technique is easy to implement with small hardware overhead and does not need any special ATPG for test generation. In addition, based on a theoretical compression efficiency analysis, improved procedures are also proposed for FCSCAN to achieve further compression. Experimental results on both benchmark circuits and one real industrial design indicate that a drastic reduction in test cost can indeed be achieved.

    DOI

  • An interface-circuit synthesis method with configurable processor core in IP-based SoC designs

    Shunitsu Kohara, Naoki Tomono, Jumpei Uchida, Yuichiro Miyaoka, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    ASP-DAC 2006: 11TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, PROCEEDINGS     594 - 599  2006  [Refereed]

     View Summary

    In SoC designs, efficient communication between the hardware IPs and the on-chip processor becomes very important; however, the interface is usually affected by the processor core specification. Thus, in this paper, we focus on developing an efficient interface circuit architecture for communication between the on-chip processor and embedded hardware IP cores. We also propose a method to synthesize it. Experimental results from designing an MPEG-4 encode application show that our method can obtain optimal interface circuits and works well.

    DOI

  • Memory-efficient accelerating schedule for LDPC decoder

    Kazunori Shimizu, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    2006 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS     1317 - +  2006  [Refereed]

     View Summary

    This paper proposes a memory-efficient accelerating schedule for LDPC decoder. Important properties of the proposed techniques are as follows: (i) Partitioning a pipelined operation not to read and write intermediate messages simultaneously enables the accelerated message-passing schedule to be implemented with single-port memories. (ii) FIFO-based buffering reduces the number of memory banks and words for the decoder based on the accelerated message-passing schedule. The proposed decoder reduces the memories for intermediate messages by half compared to the conventional one based on the accelerated message-passing schedule.

    DOI

    Scopus

  • Selective low-care coding: A means for test data compression in circuits with multiple scan chains

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E89-A ( 4 ) 996 - 1003  2006

     View Summary

    This paper presents a test input data compression technique, Selective Low-Care Coding (SLC), which can be used to significantly reduce input test data volume as well as the external test channel requirement for multiscan-based designs. In the proposed SLC scheme, we explore the linear dependencies of the internal scan chains, and instead of encoding all the specified bits in test cubes, only a smaller number of specified bits is selected for encoding, so greater compression can be expected. Experiments on the larger benchmark circuits show that a drastic reduction in test data volume, with corresponding savings in test application time, can indeed be achieved even for a well-compacted test set. Copyright © 2006 The Institute of Electronics, Information and Communication Engineers.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • FCSCAN: An efficient multiscan-based test compression technique for test cost reduction

    Youhua Shi, Nozomu Togawa, Shinji Kimura, Masao Yanagisawa, Tatsuo Ohtsuki

    ASP-DAC 2006: 11TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, PROCEEDINGS     653 - 658  2006

     View Summary

    This paper proposes a new multiscan-based test input data compression technique by employing a Fan-out Compression Scan Architecture (FCSCAN) for test cost reduction. The basic idea of FCSCAN is to target the minority specified 1 or 0 bits (either 1 or 0) in scan slices for compression. Due to the low specified bit density in test cube set, FCSCAN can significantly reduce input test data volume and the number of required test channels so as to reduce test cost. The FCSCAN technique is easy to be implemented with small hardware overhead and does not need any special ATPG for test generation. In addition, based on the theoretical compression efficiency analysis, improved procedures are also proposed for the FCSCAN to achieve further compression. Experimental results on both benchmark circuits and one real industrial design indicate that drastic reduction in test cost can be indeed achieved.

  • An interface-circuit synthesis method with configurable processor core in IP-based SoC designs

    Shunitsu Kohara, Naoki Tomono, Jumpei Uchida, Yuichiro Miyaoka, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    ASP-DAC 2006: 11TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, PROCEEDINGS     594 - 599  2006

    In SoC designs, efficient communication between the hardware IPs and the on-chip processor becomes very important; however, the interface is usually affected by the processor core specification. Thus, in this paper, we focus on developing an efficient interface circuit architecture for communication between the on-chip processor and embedded hardware IP cores. We also propose a method to synthesize it. Experimental results from designing an MPEG-4 encoding application show that our method works well and can obtain optimal interface circuits.

  • MPEG-4形状符号化/復号化に対応したDSP組み込み向け専用演算器の設計

    古宇多朋史, 小原俊逸, 史又華, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会組込みシステムシンポジウム2006論文集(ESS2006)    2006

  • 連携処理を考慮したネットワークプロセッサ合成システム

    中山敬史, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会DAシンポジウム2006論文集    2006

  • レジスタ分散・共有併用型アーキテクチャを対象としたフロアプランを考慮した高位合成手法

    大智輝, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会DAシンポジウム2006論文集    2006

  • SIMD型プロセッサコアの自動合成のためのパイプライン演算ユニット生成手法

    栗原輝, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会論文誌   vol. 47 ( no. 6 )  2006

  • H.264符号化向けDSPにおける動き予測演算器の設計

    高橋豊和, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   CAS2006-10 ( VLD2006-23, SIP2006-33 ) 13 - 18  2006

    The improved coding efficiency of H.264/AVC comes at the cost of higher computational complexity, most of which is related to motion estimation. Some new features, such as multiple reference frames, variable block size motion compensation, and quarter-pel accuracy motion compensation, have been adopted to improve coding performance; however, they increase the processing time. On the other hand, to speed up motion estimation, many architectures that can implement integer-pel motion estimation have also been proposed. However, it is difficult to improve the processing performance of such architectures in a memory-bandwidth-restricted architecture such as a DSP datapath, due to the irregular memory access. In this paper, we propose an integer-pel motion estimator on a DSP that adopts a pixel subsampling technique to reduce hardware cost. In addition, we change the subsampling pattern from the commonly used chessboard-like pattern to a vertical-striped pattern, which speeds up motion estimation by reducing memory access cycles. The proposed architecture can process 86.5 CIF frames per second at a 200 MHz operating frequency.

    CiNii

  • A parallel LSI architecture for LDPC decoder improving message-passing schedule

    Kazunori Shimizu, Tatsuyuki Ishikawa, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    2006 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, PROCEEDINGS     5099 - +  2006

    This paper proposes a parallel LSI architecture for an LDPC decoder which improves the message-passing schedule. The proposed LDPC decoder is characterized as follows: (i) The column operations follow the row operations in a pipelined architecture to ensure that the row and column operations are performed concurrently. (ii) The proposed parallel pipelined bit functional unit enables the decoder to perform every column operation using the messages which are updated by the row operations. These column operations can be performed without extending the single iterative decoding delay. Hardware implementation and simulation results show that the proposed decoder improves the decoding throughput and bit error performance with a small hardware overhead.

  • FCSCAN: An efficient multiscan-based data compression technique for test cost reduction

    史又華, 戸川望, 木村晋二, 柳澤政生, 大附辰夫

    Proc. IEEE Asia and South Pacific Design Automation Conference 2006 (ASP-DAC 2006)     653 - 658  2006.01

  • An interface-circuit synthesizer with configurable processor core in IP-based SOC design

    小原俊逸, 友野直紀, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    Proc. IEEE Asia and South Pacific Design Automation Conference 2006 (ASP-DAC 2006)     594 - 599  2006.01

  • 重回帰分析により得られた1次式によるインダクタンスを考慮した配線遅延の見積り

    鈴木康成, マルタディナタ アンワル, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   105 ( VLD2005-72 ) 67 - 72  2005.12

    In recent DSM (Deep SubMicron) technology, we need to take into consideration some important points, such as floorplanning and interconnect resistance. It has been shown that the inductance effect on clock, power, bus, and macroblock interconnect is considerably large. In this paper, we propose a new method to estimate the 50% delay of a single interconnect by using an approximate equation obtained by multiple regression analysis. The proposed method achieved higher accuracy with fewer operations than a conventional method.

    CiNii

  • レジスタ分散・共有アーキテクチャを対象としたフロアプラン指向高位合成手法

    大智輝, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   105 ( VLD2005-66 ) 31 - 36  2005.12

    As device feature size decreases, interconnection delay becomes the dominating factor of total delay. By using Distributed-Register architectures, we can synthesize circuits with register-to-register data transfer and reduce the influence of interconnect delay. However, Distributed-Register architectures have the problem that circuit area increases as the number of registers increases. In this paper, we propose a high-level synthesis method targeting Distributed/Shared-Register architectures. Our method repeats the (1) scheduling, (2) register allocation, (3) register binding, and (4) module placement processes, and feeds back floorplan information from (4). This method can reduce circuit area while maintaining circuit performance equal to that of Distributed-Register architectures. We show the effectiveness of the proposed method through experimental results.

    CiNii

  • SIMD型プロセッサの自動合成におけるパイプライン演算ユニット生成手法

    栗原輝, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会DAシンポジウム2005論文集     25 - 30  2005.08

    CiNii

  • 画像処理向けシステムLSI設計における設計ナビゲーションを考慮したHW/SW分割システム

    小島洋平, 戸川望, 橘昌良, 柳澤政生, 大附辰夫

    情報処理学会DAシンポジウム2005論文集     19 - 24  2005.08

  • Reconfigurable adaptive FEC system based on Reed-Solomon code with interleaving

    K Shimizu, N Togawa, T Ikenaga, S Goto

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E88D ( 7 ) 1526 - 1537  2005.07  [Refereed]

    This paper proposes a reconfigurable adaptive FEC system based on Reed-Solomon (RS) code with interleaving. In adaptive FEC schemes, the error correction capability t is changed dynamically according to the communication channel condition. For a given error correction capability t, we can implement an optimal RS decoder composed of minimum hardware units for each t. If the hardware units of the RS decoder can be reduced for any given error correction capability t, we can embed as large a deinterleaver as possible into the RS decoder for each t. Reconfiguring the RS decoder embedded with the expanded deinterleaver dynamically for each error correction capability t allows us to decode larger interleaved codes, which are error correction codes more robust to burst errors. In a reliable transport protocol, experimental results show that our system achieves up to 65% lower packet error rate and 5.9% higher data transmission throughput compared to the adaptive FEC scheme on a conventional fixed hardware system. In an unreliable transport protocol, our system achieves up to 76% better bit error performance with a higher code rate compared to the adaptive FEC scheme on a conventional fixed hardware system.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A SIMD instruction set and functional unit synthesis algorithm with SIMD operation decomposition

    N Togawa, K Tachikake, Y Miyaoka, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E88D ( 7 ) 1340 - 1349  2005.07  [Refereed]

    This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assembly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. The promising experimental results are also shown.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Reconfigurable adaptive FEC system based on Reed-Solomon code with interleaving

    Kazunori Shimizu, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    IEICE Trans. on Information and Systems   E88-D ( 7 ) 1538 - 1545  2005.07

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • A SIMD instruction set and functional unit synthesis algorithm with SIMD operation decomposition

    IEICE Trans. on Information and Systems   E88-D ( 7 ) 1340 - 1349  2005.07

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • レジスタ分散型アーキテクチャを対象とするフロアプランとタイミング制約を考慮した高位合成手法

    田中真, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会論文誌   46 ( 6 ) 1383 - 1394  2005.05

  • Sub-operation parallelism optimization in SIMD processor core synthesis

    H Kawazu, J Uchida, Y Miyaoka, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 4 ) 876 - 884  2005.04  [Refereed]

    A b-bit SIMD functional unit has n k-bit sub-functional units in itself, where b = k × n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a processor core necessarily execute n-parallel operations. Depending on the application program, some of them execute only n/2-parallel operations or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 k-bit sub-functional units or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a processor core with small area under the given timing constraint. We expect to obtain processor core configurations of smaller area under the same timing constraint than with a conventional system. Promising experimental results are also shown.

    DOI

    Scopus

  • IP再利用を考慮したシステムLSI設計におけるインタフェース回路生成システム

    小原俊逸, 友野直紀, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第18回回路とシステム軽井沢ワークショップ論文集     581 - 586  2005.04

  • SIMD型プロセッサコア向けHW/SW協調合成システムにおけるパイプライン演算ユニット生成手法

    栗原輝, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第18回回路とシステム軽井沢ワークショップ論文集     575 - 580  2005.04

  • A selective care bits coding method for test data compression

    史又華, 戸川望, 木村晋二, 柳澤政生, 大附辰夫

    電子情報通信学会第18回回路とシステム軽井沢ワークショップ論文集     241 - 246  2005.04

  • 信頼度の伝播効率を改善する部分並列LDPC復号器の実装と評価

    清水一範, 石川達之, 戸川望, 池永剛, 後藤敏

    電子情報通信学会第18回回路とシステム軽井沢ワークショップ論文集     181 - 186  2005.04

    CiNii

  • インダクタンスを考慮した配線遅延の近似式による見積もり

    鈴木康成, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第18回回路とシステム軽井沢ワークショップ論文集     1 - 6  2005.04

  • ネットワークプロセッサ合成システムの改良とその評価

    升本英行, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2004  2005.03

  • 動的フローに適応したネットワークプロセッサ設計とその評価

    細田宗一郎, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2004 ( 709 ) 79 - 84  2005.03

    This paper proposes a network processor which configures its behavior adaptively based on dynamic communication data-flows. The network processor consists of an input processing unit, an encryption unit, an output processing unit, and a Dynamic-Micro Packet Processor (D-MPP). The D-MPP dynamically detects the bottleneck process based on command queues and cycle counts for each process. By assigning the D-MPP to the bottleneck process, the throughput of the whole network processor can be improved. In this paper, the effectiveness of the D-MPP is shown through implementations and evaluations.

    CiNii

  • ワードベースモンゴメリ乗算器を搭載した高速楕円曲線暗号LSI

    内田純平, 奈良竜太, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2004 ( 708 ) 5 - 10  2005.03

    The elliptic curve cryptosystem is expected to become the next standard public-key cryptosystem. Its security depends on the difficulty of the discrete logarithm problem over elliptic curves. The security of an elliptic curve cryptosystem with a 160-bit public key is equivalent to that of an RSA cryptosystem with a 1024-bit public key. We propose an elliptic curve cryptosystem LSI architecture embedding a word-based Montgomery multiplier. Montgomery multiplication is an efficient method for modular multiplication. We can design a scalable architecture for an elliptic curve cryptosystem by selecting the structure of the word-based Montgomery multipliers. Experimental results demonstrate the effectiveness and efficiency of the proposed architecture. In the hardware evaluation using a 0.18 μm CMOS standard cell library, the high-speed design using 126 Kgates with an 8-bit multiplier achieved an operation time of 3.6 ms.

    CiNii

  • 面積制約を考慮したマルチスレッドプロセッサの合成手法

    麻生雄一, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告    2005.03

  • Sub-operation parallelism optimization in SIMD processor synthesis and its experimental evaluations

    N Togawa, Y Miyaoka, H Kawazu, M Yanagisawa, J Uchida, T Ohtsuki

    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS     3499 - 3502  2005  [Refereed]

    In this paper, we propose a sub-operation parallelism optimization algorithm in SIMD processor synthesis. Given an initial assembly code and timing constraints, our algorithm synthesizes a processor core with sub-operation parallelism optimization for SIMD functional units. First, we consider an initial processor which has sufficient hardware units for executing the initial assembly code. The initial processor core includes the maximum sub-operation parallelism for each SIMD functional unit. By gradually reducing sub-operation parallelism, we can finally obtain a processor core with small area that meets the given timing constraints. We show the effectiveness of our proposed algorithm through experimental results.

    DOI

    Scopus

  • Partially-parallel LDPC decoder based on high-efficiency message-passing algorithm

    K Shimizu, T Ishikawa, N Togawa, T Ikenaga, S Goto

    2005 IEEE International Conference on Computer Design: VLSI in Computers & Processors, Proceedings     503 - 510  2005  [Refereed]

    This paper proposes a partially-parallel LDPC decoder based on a high-efficiency message-passing algorithm. Our proposed partially-parallel LDPC decoder performs the column operations for bit nodes in conjunction with the row operations for check nodes. The bit functional unit with a pipeline architecture in our LDPC decoder allows us to perform column operations for every bit node connected to each of the check nodes, which are updated by the row operations in parallel. Our proposed LDPC decoder improves the timing at which the column operations are performed; accordingly, it improves the message-passing efficiency within the limited number of iterations for decoding. We implemented the proposed partially-parallel LDPC decoder on an FPGA and simulated its decoding performance. Practical simulation shows that our proposed LDPC decoder reduces the number of iterations for decoding and improves the bit error performance with a small hardware overhead.

    DOI

    Scopus

    31
    Citation
    (Scopus)
  • Low power test compression technique for designs with multiple scan chains

    YH Shi, N Togawa, S Kimura, M Yanagisawa, T Ohtsuki

    14TH ASIAN TEST SYMPOSIUM, PROCEEDINGS     386 - 389  2005  [Refereed]

    This paper presents a new DFT technique that can significantly reduce test data volume as well as scan-in power consumption for multiscan-based designs. It can also help to reduce test time and tester channel requirements with small hardware overhead. In the proposed approach, we start with a pre-computed test cube set and fill the don't-cares with proper values for joint reduction of test data volume and scan power consumption. In addition, we explore the linear dependencies of the scan chains to construct a fanout structure only with inverters to achieve further compression. Experimental results for the larger ISCAS'89 benchmarks show the efficiency of the proposed technique.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • Reconfigurable adaptive FEC system with interleaving

    Kazunori Shimizu, Nozomu Togawa, Takeshi Ikenaga, Satoshi Goto

    ASP-DAC 2005: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     1252 - 1255  2005  [Refereed]

    This paper proposes a reconfigurable adaptive FEC system with interleaving. For adaptive FEC schemes, we can implement an optimal RS decoder composed of minimum hardware units for any given error correction capability t. If the hardware units of the RS decoder can be reduced for any given t, we can embed as large a deinterleaver as possible into the RS decoder for each t. Reconfiguring the RS decoder embedded with the expanded deinterleaver dynamically for each t allows us to decode larger interleaved codes, which are FEC codes more robust to burst errors. Our reconfigurable adaptive FEC system with interleaving achieves a better packet error rate and higher throughput than fixed hardware systems.

  • A processor core synthesis system in IP-based SoC design

    Naoki Tomono, Shunitsu Kohara, Jumpei Uchida, Yuichiro Miyaoka, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    ASP-DAC 2005: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     286 - 291  2005  [Refereed]

    This paper proposes a new design methodology for SoCs reusing hardware IPs. In our approach, after system-level HW/SW partitioning, we use IPs for hardware parts, but synthesize a new processor core instead of reusing a processor core IP. The system performs efficient parallel execution of hardware and software by taking account of the response time of each hardware IP obtained by the proposed calculation algorithm. We can use optimal hardware IPs selected by the proposed hardware IP selection algorithm. The experimental results show the effectiveness of our new design methodology.

  • Sub-operation parallelism optimization in SIMD processor core synthesis

    Hideki Kawazu, Jumpei Uchida, Yuichiro Miyaoka, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E88-A ( 4 ) 876 - 883  2005

    A b-bit SIMD functional unit has n k-bit sub-functional units in itself, where b = k × n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a processor core necessarily execute n-parallel operations. Depending on the application program, some of them execute only n/2-parallel operations or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 k-bit sub-functional units or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a processor core with small area under the given timing constraint. We expect to obtain processor core configurations of smaller area under the same timing constraint than with a conventional system. Promising experimental results are also shown. Copyright © 2005 The Institute of Electronics, Information and Communication Engineers.

    DOI

    Scopus

  • Sub-operation parallelism optimization in SIMD processor synthesis and its experimental evaluations

    Nozomu Togawa, Hideki Kawazu, Jumpei Uchida, Yuichiro Miyaoka, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings - IEEE International Symposium on Circuits and Systems     3499 - 3502  2005

    In this paper, we propose a sub-operation parallelism optimization algorithm in SIMD processor synthesis. Given an initial assembly code and timing constraints, our algorithm synthesizes a processor core with sub-operation parallelism optimization for SIMD functional units. First, we consider an initial processor which has sufficient hardware units for executing the initial assembly code. The initial processor core includes the maximum sub-operation parallelism for each SIMD functional unit. By gradually reducing sub-operation parallelism, we can finally obtain a processor core with small area that meets the given timing constraints. We show the effectiveness of our proposed algorithm through experimental results. © 2005 IEEE.

    DOI

    Scopus

  • A Processor Core Synthesis System in IP-based SoC Design

    友野直紀, 小原俊逸, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    Proceedings of the ASP-DAC 2005    2005.01

  • FPGA-based reconfigurable adaptive FEC

    K Shimizu, J Uchida, Y Miyaoka, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 12 ) 3036 - 3046  2004.12

    In this paper, we propose a reconfigurable adaptive FEC system. In adaptive FEC schemes, the error correction capability t is changed dynamically according to the communication channel condition. If a particular error correction capability t is given, we can implement an FEC decoder which is optimal for t by taking the number of operations into consideration. Thus, reconfiguring the optimal FEC decoder dynamically for each error correction capability allows us to maximize the throughput of each decoder within a limited hardware resource. Based on this concept, our reconfigurable adaptive FEC system can reduce the packet dropping rate more efficiently than conventional fixed hardware systems. We can improve data transmission throughput for a reliable transport protocol. Practical simulation results are also shown.

  • High-level power optimization based on thread partitioning

    J Uchida, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 12 ) 3075 - 3082  2004.12

    This paper proposes a thread partitioning algorithm for low-power high-level synthesis. The algorithm is applied to high-level synthesis systems in which we can describe parallel-behaving circuit blocks (threads) explicitly. First, it focuses on a local register file RF in a thread. It partitions a thread into two sub-threads, one of which has RF and the other of which does not have RF. The partitioned sub-threads need to be synchronized with each other to keep the data dependency of the original thread. Since the partitioned sub-threads have waiting time for synchronization, gated clocks can be applied to each sub-thread. We can then synthesize a low-power circuit with a low area overhead compared to the original circuit. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

  • レジスタ分散型アーキテクチャを対象とするフロアプランを考慮した高位合成手法

    田中真, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2004  2004.12

    CiNii

  • A sub-operation parallelism optimization algorithm in HW/SW partitioning for SIMD processor cores

    川津秀樹, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    SASIMI2004     483 - 490  2004.10

  • IP再利用を考慮したシステムLSIにおけるプロセッサコア合成システム

    友野直紀, 小原俊逸, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2004     19 - 24  2004.07

  • フロアプランとタイミング制約に基づくレジスタ間データ転送を考慮した高位合成手法

    田中真, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2004     283 - 288  2004.07

  • A hardware/software cosynthesis algorithm for processors with heterogeneous datapaths

    Y Miyaoka, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E87A ( 4 ) 830 - 836  2004.04

    This paper proposes a hardware/software cosynthesis algorithm for processors with heterogeneous registers. Given a CDFG corresponding to an application program and a timing constraint, the algorithm generates a processor configuration that minimizes the processor area, together with assembly code for that processor. First, the algorithm configures a datapath that can execute several data-dependent DFG nodes in one cycle. This datapath can execute the application program in the fewest cycles. A branch-and-bound algorithm is then applied, and all candidate numbers of functional units and memory banks are tried. For each assumed number of functional units and memory banks, an appropriate number of heterogeneous registers and connections to functional units and registers are explored. Experimental results show the effectiveness and efficiency of the algorithm.

  • SIMD型プロセッサコア向けHW/SW分割におけるSIMD型演算最適化手法

    川津秀樹, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第17回 回路とシステム(軽井沢)ワークショップ     579 - 584  2004.04

  • ネットワークプロセッサ合成システム

    松浦努, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2003-145   55 - 60  2004.03

  • HW/SW分割システムにおける仮想IP類推手法

    小田雄一, 内田純平, 宮岡祐一郎, 戸川望, 橘昌良, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2003-158 ( 703 ) 47 - 52  2004.03

    In this paper, we propose a virtual IP analogizing algorithm. The algorithm enumerates virtual IPs by analogy with existing IPs, based on "the parallelism" and "the iterative process". We focus on image processing applications such as MPEG-4, JPEG, and JPEG2000, and analyze the processes (DCT, quantization, etc.) that constitute these applications. We then apply "an algorithm based on the parallelism" or "an algorithm based on the iterative process" to each process. Applying the proposed algorithm expands the design space. We implement the proposed algorithm on a computer and show its effectiveness.

    CiNii

  • Packed SIMD型命令を持つプロセッサ合成システムのためのリターゲッタブルコンパイラ

    加藤久晴, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2003-157   41 - 46  2004.03

  • 面積制約を考慮したCAMプロセッサ最適化手法

    石川裕一朗, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2003-152   13 - 18  2004.03

  • インターリーブを考慮したReconfigurable Adaptive FEC

    清水一範, 内田純平, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2003-151 ( 703 ) 7 - 12  2004.03

    In this paper, we propose a Reconfigurable Adaptive FEC system with interleaving. For a given error correction capability t, we can realize an FEC CODEC system with the optimal interleave depth by dynamically reconfiguring its architecture. The proposed Reconfigurable Adaptive FEC system can generate optimally interleaved RS codes, which are more robust to burst errors than those generated by fixed hardware systems. In particular, we focus on the optimization of RS decoders and propose an error correction architecture with interleaving. We evaluated its effectiveness and efficiency through practical simulation.

    CiNii
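
Why interleaving makes a code more robust to burst errors, as the summary above states, can be seen in a small sketch of a generic block interleaver. The function names and the row/column layout are assumptions for illustration and are not taken from the paper's architecture.

```python
from typing import List, Sequence


def interleave(symbols: Sequence[int], depth: int) -> List[int]:
    """Write `depth` codewords as rows, then transmit column by column."""
    if depth <= 0 or len(symbols) % depth != 0:
        raise ValueError("symbol count must be a positive multiple of depth")
    width = len(symbols) // depth  # symbols per codeword
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(symbols: Sequence[int], depth: int) -> List[int]:
    """Invert `interleave`, restoring the row-by-row codeword order."""
    if depth <= 0 or len(symbols) % depth != 0:
        raise ValueError("symbol count must be a positive multiple of depth")
    width = len(symbols) // depth
    out = [0] * len(symbols)
    i = 0
    for c in range(width):
        for r in range(depth):
            out[r * width + c] = symbols[i]
            i += 1
    return out


def max_errors_per_codeword(length: int, depth: int, start: int, burst: int) -> int:
    """Worst-case symbol errors in one codeword for a channel burst."""
    if start < 0 or start + burst > length:
        raise ValueError("burst must lie inside the transmitted block")
    hits = [0] * depth
    for pos in range(start, start + burst):
        hits[pos % depth] += 1  # transmitted position pos belongs to row pos % depth
    return max(hits)


data = list(range(24))
restored = deinterleave(interleave(data, 4), 4)
with_interleave = max_errors_per_codeword(24, 4, start=5, burst=8)
without_interleave = max_errors_per_codeword(24, 1, start=5, burst=8)
print(restored == data, with_interleave, without_interleave)
```

With depth 4, an 8-symbol burst leaves at most 2 symbol errors in any one codeword, so a decoder whose correction capability t is at least 2 could still recover every codeword in this example.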

  • 携帯機器を対象としたJava動的コンパイラにおけるプロファイリングシステム

    船田雅史, 内田純平, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会研究報告,2004-MBL-28   2004 ( 21 ) 55 - 62  2004.03

    This paper proposes a lightweight profiling system for a Java dynamic compiler on handheld devices. The system detects methods that are frequently invoked in an application (hot methods) while a Java virtual machine is running. A hot method is compiled into native code by the compiler and stored in a heap area. The profiler determines the heap area used for native code, which makes it possible to reduce garbage collection. Our technique can profile method information with a 3% overhead in the processing time of the Java virtual machine. As a result of compiling hot methods, we achieve an approximately 7-fold speedup on average while keeping garbage collections to approximately twice those of the original virtual machine.

    CiNii

  • Alternative Run-Length Coding through scan chain reconfiguration for joint minimization of test data volume and power consumption in scan test

    YH Shi, S Kimura, N Togawa, M Yanagisawa, T Ohtsuki

    13TH ASIAN TEST SYMPOSIUM, PROCEEDINGS     432 - 437  2004  [Refereed]

    Test data volume and scan power are two major concerns in SoC test. In this paper, we present an alternative run-length coding method through scan chain reconfiguration to reduce both test data volume and scan-in power consumption. The proposed method analyzes the compatibility of the internal scan cells for a given test set and then divides the scan cells into compatible classes. To extract the compatible scan cells, we apply a heuristic algorithm that solves the graph coloring problem; a simple greedy algorithm is then used to configure the scan chain to minimize scan power. Experimental results for the larger ISCAS'89 benchmarks show that the proposed approach leads to highly reduced test data volume with significant power savings during scan test.

    DOI

    Scopus

    2
    Citation
    (Scopus)
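
The grouping of scan cells into compatible classes via graph coloring, mentioned in the summary above, can be sketched as follows. This is only an illustration under stated assumptions: test vectors are strings over `0`, `1`, and a don't-care `X`, two cells are treated as compatible when their values never conflict, and the coloring is a plain largest-degree-first greedy heuristic rather than the heuristic used in the paper.

```python
from typing import Dict, List, Set


def compatible(col_a: str, col_b: str) -> bool:
    """Two scan cells are compatible if no test vector assigns them opposite values."""
    return all(a == b or a == "X" or b == "X" for a, b in zip(col_a, col_b))


def group_scan_cells(test_set: List[str]) -> List[List[int]]:
    """Greedily color the conflict graph; each color is one compatible class."""
    num_cells = len(test_set[0])
    # Column i collects the value of scan cell i across all test vectors.
    columns = ["".join(vec[i] for vec in test_set) for i in range(num_cells)]
    conflicts: Dict[int, Set[int]] = {
        i: {j for j in range(num_cells)
            if j != i and not compatible(columns[i], columns[j])}
        for i in range(num_cells)
    }
    color: Dict[int, int] = {}
    # Visit the most constrained cells first (largest-degree-first heuristic).
    for cell in sorted(range(num_cells), key=lambda c: -len(conflicts[c])):
        used = {color[nb] for nb in conflicts[cell] if nb in color}
        color[cell] = next(c for c in range(num_cells) if c not in used)
    classes: Dict[int, List[int]] = {}
    for cell, c in color.items():
        classes.setdefault(c, []).append(cell)
    return [sorted(members) for _, members in sorted(classes.items())]


# Hypothetical test set: three vectors over four scan cells.
vectors = ["0X10", "01X1", "X110"]
groups = group_scan_cells(vectors)
print(groups)
```

Cells that share a color never conflict with one another, so within each class any specified values agree in every vector of this toy test set.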
  • Instruction set and functional unit synthesis for SIMD processor cores

    N Togawa, K Tachikake, Y Miyaoka, M Yanagisawa, T Ohtsuki

    ASP-DAC 2004: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE     743 - 750  2004  [Refereed]

    This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assembly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. The promising experimental results are also shown.

  • A cosynthesis algorithm for application specific processors with heterogeneous datapaths

    Y Miyaoka, N Togawa, M Yanagisawa, T Ohtsuki

    ASP-DAC 2004: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE     250 - 255  2004  [Refereed]

    This paper proposes a hardware/software cosynthesis algorithm for processors with heterogeneous registers. Given a CDFG corresponding to an application program and a timing constraint, the algorithm generates a processor configuration that minimizes the processor area, together with assembly code for that processor. First, the algorithm configures a datapath that can execute several data-dependent DFG nodes in one cycle. This datapath can execute the application program in the fewest cycles. A branch-and-bound algorithm is then applied, and all candidate numbers of functional units and memory banks are tried. For each assumed number of functional units and memory banks, an appropriate number of heterogeneous registers and connections to functional units and registers are explored. Experimental results show the effectiveness and efficiency of the algorithm.

  • A thread partitioning algorithm in low power high-level synthesis

    J Uchida, N Togawa, M Yanagisawa, T Ohtsuki

    ASP-DAC 2004: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE     74 - 79  2004  [Refereed]

    This paper proposes a thread partitioning algorithm for low-power high-level synthesis. The algorithm is applied to high-level synthesis systems in which circuit blocks that behave in parallel (threads) can be described explicitly. First, it focuses on a local register file RF in a thread. It partitions the thread into two sub-threads, one of which has RF and the other of which does not. The partitioned sub-threads need to be synchronized with each other to preserve the data dependencies of the original thread. Since the partitioned sub-threads spend time waiting for synchronization, gated clocks can be applied to each sub-thread. We can then synthesize a low-power circuit with a low area overhead compared to the original circuit. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

  • A reconfigurable adaptive FEC system for reliable wireless communications

    K Shimizu, N Togawa, T Ikenaga, M Yanagisawa, S Goto, T Ohtsuki

    PROCEEDINGS OF THE 2004 IEEE ASIA-PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, VOL 1 AND 2     13 - 16  2004

    This paper proposes a reconfigurable adaptive FEC system. For adaptive FEC schemes, we can implement an FEC decoder which is optimal for error correction capability t by taking the number of operations into consideration. Reconfiguring the optimal FEC decoder dynamically for each t allows us to maximize the throughput of each decoder within a limited hardware resource. Our system can reduce packet dropping rate more efficiently than conventional fixed hardware systems for a reliable transport protocol.

  • Experimental evaluation of high-level energy optimization based on thread partitioning

    J Uchida, Y Miyaoka, N Togawa, M Yanagisawa, T Ohtsuki

    PROCEEDINGS OF THE 2004 IEEE ASIA-PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, VOL 1 AND 2     161 - 164  2004

    This paper presents a thread partitioning algorithm for high-level synthesis systems that generate low-energy circuits. In the algorithm, we partition a thread into two sub-threads, one of which has a register file RF and the other of which does not. The partitioned sub-threads need to be synchronized with each other to preserve the data dependencies of the original thread. Since the partitioned sub-threads spend time waiting for synchronization, gated clocks can be applied to each sub-thread. We achieve a 33% energy reduction when we apply our proposed algorithm to a JPEG encoder.

  • An efficient algorithm/architecture codesign for image encoders

    J Choi, N Togawa, T Ikenaga, S Goto, M Yanagisawa, T Ohtsuki

    2004 47TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, CONFERENCE PROCEEDINGS     469 - 472  2004

    We describe the optimization of a complex video encoder system based on a target architecture. We implemented an MPEG-4 encoder using a hardware/software codesign approach, mapped together based on a target architecture. We proposed a target architecture template and an optimization methodology. In our design flow, we searched for the bottleneck module constraining the system. After investigating the computational complexity, quality, and simplicity of the candidate algorithms, we chose the best algorithm for hardware implementation and then mapped it onto hardware with different architectures, examining which architecture best suits the algorithm and which architecture is best for its components. We chose one of the architectures that meets the constraints and also made tradeoffs among speed, chip area, and memory bandwidth across the different architectures. The proposed system architecture was used to reduce design decisions and iterations, and provided flexible and scalable systems. The evaluations resulted in effective optimization of the motion estimation module and better tradeoffs that optimized the overall system.

  • Reducing test data volume for multiscan-based designs through single/sequence mixed encoding

    Y Shi, S Kimura, N Togawa, M Yanagisawa, T Ohtsuki

    2004 47TH MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II, CONFERENCE PROCEEDINGS     445 - 448  2004

    This paper presents a new test data compression technique for multiscan-based designs through dictionary-based encoding of single scan inputs or sequences of scan inputs. In spite of its simplicity, it achieves a significant reduction in test data volume. Unlike some previous approaches to test data compression, our approach eliminates the need for additional synchronization and handshaking between the CUT and the ATE, so it is especially suitable for integration into a low-cost test scheme for SoC test. In addition, in contrast to previous dictionary-based coding techniques, the proposed approach can achieve a satisfactory reduction in test data volume even for a CUT with a small number of scan chains. Experimental results show that the proposed test scheme works particularly well for the large ISCAS'89 benchmarks.

  • A hardware/software partitioning algorithm for processor cores with packed SIMD-type instructions

    N Togawa, K Tachikake, Y Miyaoka, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 12 ) 3218 - 3224  2003.12

    This letter proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given compiled assembly code including SIMD instructions and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core together with new assembly code. First, we assume for each operation type a super SIMD functional unit that can execute all the SIMD instructions. Second, we remove a SIMD instruction or "sub-function" from each super functional unit, one by one, while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally find a SIMD functional unit configuration as well as a processor core architecture. Promising experimental results are also shown.

  • A retargetable simulator generator for DSP processor cores with packed SIMD-type instructions

    N Togawa, K Kasahara, Y Miyaoka, J Choi, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 12 ) 3099 - 3109  2003.12

     View Summary

    A packed SIMD type operation or a SIMD operation is n-parallel b/n-bit sub-operations executed by the modified n-bit functional unit. Such a functional unit is called a SIMD functional unit and a processor core which can execute SIMD operations is called a SIMD processor core. SIMD operations can be effectively applied to image processing applications. This paper focuses on hardware/software cosynthesis of SIMD processor cores and particularly proposes a new simulator generator which simulates pipelined instructions for a SIMD processor. Generally, a SIMD functional unit has many options, so there can be a great many different SIMD functional unit instances. However, since our hardware/software cosynthesis system synthesizes a special-purpose processor core for an input application program, it uses very limited SIMD functional unit instances. In the proposed approach, we consider a SIMD operation to be a set of SIMD sub-operations. By adding up the appropriate SIMD sub-operations, we construct a single SIMD operation. Then a SIMD functional unit behavior can be characterized by a collection of SIMD operations. This approach has the advantage that, if we have a small number of behavior libraries for SIMD sub-operations, we can instantiate a particular SIMD functional unit behavior. Experimental results demonstrate the effectiveness of the proposed approach.
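
    The notion of a packed SIMD operation as n parallel b/n-bit sub-operations can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `packed_add`, the default values b = 32 and n = 4, the unsigned lane interpretation, and the optional saturation flag are all assumptions made for illustration.

```python
def packed_add(x: int, y: int, b: int = 32, n: int = 4, saturate: bool = False) -> int:
    """Add two b-bit words as n independent unsigned lanes of b // n bits each.

    Hypothetical helper for illustration only; not the authors' simulator.
    """
    if b % n != 0:
        raise ValueError("b must be divisible by n")
    width = b // n                 # bit width of one sub-operation
    mask = (1 << width) - 1        # selects a single lane
    result = 0
    for lane in range(n):
        shift = lane * width
        a = (x >> shift) & mask    # lane of the first operand
        c = (y >> shift) & mask    # lane of the second operand
        s = a + c
        if saturate:
            s = min(s, mask)       # clamp instead of wrapping around
        else:
            s &= mask              # wrap around; no carry into the next lane
        result |= s << shift
    return result
```

    With the defaults, each 32-bit word holds four 8-bit lanes, and a carry out of one lane never reaches its neighbour, which is the property that separates a packed operation from an ordinary 32-bit addition.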

  • 面積制約付きCAMプロセッサ合成手法

    石川裕一朗, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術研究報告   VLD2003-89 ( 478 ) 115 - 120  2003.11

     View Summary

    We have been building a hardware/software cosynthesis system for a processor core with a content addressable memory (CAM). The system takes as input an application program written in C and outputs an optimal hardware configuration of a CAM processor which executes the input application program. This paper extends our hardware/software cosynthesis system to incorporate area constraints for a CAM processor. We reduce the CAM processor's area by replacing CAM with RAM. The system computes the number of CAM words which minimizes the execution time while meeting the area constraints. Experimental results for a practical application program show that the system can output the processor configuration which executes the application program fastest while meeting the area constraints.

    CiNii

  • 面積制約を考慮したCAMプロセッサ向けハードウェア/ソフトウェア協調設計手法

    石川裕一朗, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   IE2003-98 ( 380 ) 83 - 88  2003.10

     View Summary

    We have been building a hardware/software cosynthesis system for a processor core with a content addressable memory (CAM). The system takes as input an application program written in C and outputs an optimal hardware configuration of a CAM processor which executes the input application program. This paper extends our hardware/software cosynthesis system to incorporate area constraints for a CAM processor. The system computes the number of CAM words which minimizes the execution time while meeting the area constraints. We reduce the CAM processor's area by replacing CAM with RAM according to the number of words that the system computed. Experimental results for a practical application program show that the system can output the processor configuration which executes the application program fastest while meeting the area constraints.

    CiNii

  • FPGAを用いたReconfigurable Adaptive FECの実装と評価

    清水一範, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   Reconf2003-9  2003.09

  • 分岐距離による再送手法選択式マルチキャスト

    山田泰弘, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   CQ2003-58 ( 290 ) 29 - 32  2003.09

     View Summary

    Reliable multicast is an effective technology in IP multicast. Various retransmission methods, each suited to a particular network situation and multicast group size, have been proposed so far. We propose a reliable multicast protocol which chooses the retransmission method suitable for the network situation and the number of receiving hosts, both of which change over time. Our protocol selects the retransmission method by Divergent Distance, which is the number of hops to other receiving hosts. Comparing our protocol with other reliable multicast protocols, we confirmed that our protocol is superior to them.

    CiNii

  • 公共空間におけるハンドオフ時間短縮を考慮したBluetoothネットワークの手順に関する一検討

    寺崎暁, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   CQ2003-58   25 - 28  2003.09

  • 動的再構成可能システムによるAdaptive FECの実装

    清水一範, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2003     25 - 30  2003.07

  • システムLSI設計における定性的側面を考慮したハードウェア/ソフトウェア分割システム

    小田雄一, 宮岡祐一郎, 戸川望, 橘昌良, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2003     169 - 174  2003.07

    CiNii

  • 冗長記述を利用したVHDLへの透かし埋め込み手法

    久保ゆきこ, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2003     37 - 42  2003.07

  • System Architecture based on Hardware/Software Codesign for Optimization of Video Encoders

    崔鎮求, 戸川望, 柳澤政生, 大附辰夫

    The 2003 International Technical Conference on Circuits/Systems,Computers and Communications    2003.06

  • A hardware/software cosynthesis system for processor cores with content addressable memories

    N Togawa, T Totsuka, T Wakui, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E86A ( 5 ) 1082 - 1092  2003.05

     View Summary

    Content addressable memory (CAM) is one of the functional memories which realize word-parallel equivalence search. Since a CAM unit is generally used in a particular application program, we consider that appropriate design for CAM units is required depending on the requirements for the application program. This paper proposes a hardware/software cosynthesis system for CAM processors. The input of the system is an application program written in C including CAM functions and a constraint for execution time (or CAM processor area). Its output is hardware descriptions of a synthesized processor and a binary code executed on it. Based on the branch-and-bound method, the system determines which CAM functions are realized in hardware and which are realized in software, while meeting the given timing constraint (or area constraint) and minimizing the CAM processor area (or execution time of the application program). We expect that we can realize optimal CAM processor design for an application program. Experimental results for several application programs show that we can obtain a CAM processor whose area is minimum while meeting the given timing constraint.

  • ネットワークスイッチング処理を対象としたCAMプロセッサ自動合成システム

    田中英夫, 戸川望, 柳澤政生, 大附辰夫

    回路とシステム(軽井沢)ワークショップ     435 - 440  2003.04

  • 不規則なデータパスを持つプロセッサのハードウェア/ソフトウェア協調合成手法

    宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    回路とシステム(軽井沢)ワークショップ     441 - 446  2003.04

  • An Instruction-Set Simulator Generator for SIMD Processor Cores

    宮岡祐一郎, 戸川望, 笠原亨介, 崔鎮求, 柳澤政生, 大附辰夫

    Proceedings of SASIMI2003     160 - 167  2003.04

  • 閾値検索機能付きCAMプロセッサの最適化手法

    戸塚崇夫, 宮岡祐一郎, 石川裕一朗, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-158   19 - 24  2003.03

  • SIMD型プロセッサコア向けHW/SW分割におけるSIMD型演算最適化手法

    太刀掛宏一, 宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-157   13 - 18  2003.03

  • 高位合成システムにおけるスレッド分割を用いた低消費電力化手法

    内田純平, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-156   7 - 12  2003.03

  • A hardware/software partitioning algorithm for SIMD processor cores

    K Tachikake, N Togawa, Y Miyaoka, J Choi, M Yanagisawa, T Ohtsuki

    ASP-DAC 2003: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE     135 - 140  2003  [Refereed]

     View Summary

    This paper proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given a compiled assembly code including SIMD instructions, a timing constraint of execution time, and available hardware units, the proposed algorithm synthesizes an area-optimized processor core with a new assembly code. Firstly, we assume an initial processor core on which an input assembly code can run with the shortest execution time. Secondly we reduce a hardware unit added to a processor core one by one while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally obtain a processor core architecture with small area under the given timing constraint. We expect that we can obtain a processor core which has appropriate SIMD functional units for running the input application program. The promising experimental results are also shown.

    DOI

    Scopus

    2
    Citation
    (Scopus)
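
    The iterative reduction described in the summary above (start from the fastest configuration, then remove hardware units one by one while the timing constraint holds) can be sketched as a greedy loop. This is a minimal illustration, not the authors' algorithm: the function `reduce_units`, the `exec_time` callback (which stands in for regenerating the assembly code and measuring its execution time), and the rule of removing the largest-area feasible unit first are all assumptions.

```python
from typing import Callable, Dict, FrozenSet


def reduce_units(
    units: Dict[str, float],
    exec_time: Callable[[FrozenSet[str]], float],
    time_limit: float,
) -> FrozenSet[str]:
    """Greedily drop hardware units while the timing constraint stays satisfied.

    units      maps a unit name to its area (hypothetical numbers).
    exec_time  returns the execution time for a given configuration; in a real
               flow this would re-target the assembly code to the configuration.
    """
    current = frozenset(units)  # initial core: every available unit
    if exec_time(current) > time_limit:
        raise ValueError("initial configuration violates the timing constraint")
    while True:
        # Units whose removal still meets the timing constraint.
        feasible = [u for u in sorted(current)
                    if exec_time(current - {u}) <= time_limit]
        if not feasible:
            return current
        # Assumed heuristic: remove the unit that saves the most area.
        victim = max(feasible, key=lambda u: units[u])
        current = current - {victim}
```

    The loop stops at a configuration from which no single unit can be removed without violating the constraint, so the result is small in area but not guaranteed to be globally minimal.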
  • ハードウェアIPの応答時間を考慮したプロセッサコアのハードウェア/ソフトウェア分割手法

    田川博規, 小原俊逸, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-136 ( 609 ) 37 - 42  2003.01

     View Summary

    This paper proposes a hardware/software partitioning algorithm based on the response time of hardware IPs. We have been developing a new design approach which first determines the hardware IPs and then co-synthesizes a processor core. Our approach realizes an application-specific system LSI including a processor core that contains only the necessary functionalities. We can remove unnecessary functionalities by hardware/software partitioning for microprocessors based on the response time of hardware IPs. Our algorithm obtains the response time of hardware IPs at the instruction level, which realizes efficient parallel execution of hardware and software. The experimental results show the effectiveness of the proposed algorithm and our new design approach.

    CiNii

  • ハードウェアIPの応答時間を考慮したプロセッサコア合成システム

    小原俊逸, 田川博規, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-135 ( 609 ) 31 - 36  2003.01

     View Summary

    This paper proposes a processor core synthesis system based on the response time of hardware IPs, and a framework for system LSI design built on the synthesis system. When designing a system LSI using hardware IPs, IPs with the necessary and sufficient performance for the system LSI are not always provided. Our approach is as follows: after system-level hardware/software partitioning, we use IPs for hardware, but not processor core IPs for software. Instead, we use a processor core which is automatically synthesized by the proposed synthesis system and has just enough performance. We design a JPEG encoder within the framework and the results demonstrate its effectiveness and efficiency.

    CiNii

  • MPEG-4コアプロファイル符号化向けDSP

    石本剛, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-134   25 - 30  2003.01

  • A high-level energy-optimizing algorithm for system VLSIs based on area/time/power estimation

    S Noda, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E85A ( 12 ) 2655 - 2666  2002.12

     View Summary

    This paper proposes a high-level energy-optimizing algorithm which can synthesize low energy system VLSIs. Given an initial system hardware obtained from an abstract behavioral description, the proposed algorithm applies to it the three energy reduction techniques, 1) reducing supply voltage, 2) selecting lower energy modules, and 3) applying gated clocks. By incorporating our area/delay/power estimation, the proposed algorithm can obtain low energy system VLSIs meeting the constraints of area, delay, and execution time. The proposed algorithm has been incorporated into a high-level synthesis system and experimental results demonstrate effectiveness and efficiency of the algorithm.

  • An algorithm and a flexible architecture for fast block-matching motion estimation

    J Choi, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E85A ( 12 ) 2603 - 2611  2002.12

     View Summary

    Motion estimation can choose the most suitable algorithm for different kinds of motion types, formats, and characteristics. The video encoding system can be optimized for quality, speed, and power consumption. In this paper, we propose a reconfigurable approach to a motion estimation algorithm and hardware architecture. The proposed algorithm determines the motion type and then selects an adapted block-matching algorithm for different kinds of motion sequences. The quality of our algorithm is better than that of the TSS and the BBGDS algorithm, or comparable to the performance of the better of the two, and the computational complexity of our algorithm is significantly less than that of the TSS. We also propose a hardware architecture for realizing two kinds of motion estimation in the same hardware. We implemented the flexible and reconfigurable hardware architecture by using an address generator unit, a delay unit, and parameters, and by using the hardware description language (VHDL) and the SYNOPSYS synthesis design tools. We analyze the performance of the algorithm and present an adapted algorithm for a low-cost real-time application.

  • 閾値検索機能を持つCAMプロセッサの自動合成システム

    戸塚崇夫, 石川裕一朗, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-113 ( 476 ) 197 - 192  2002.11

     View Summary

    This paper proposes a behavioral synthesis system for a processor core with an extended content addressable memory (CAM). The input of the system is an application program written in C including CAM functions. Its outputs are hardware descriptions of a synthesized processor and a binary code executed on it. An extended content addressable memory realizes not only conventional equivalent search but parallel threshold search such as less-than search and greater-than search. By utilizing ten types of these extended CAM cell arrays, our system synthesizes a CAM processor which can execute an input application program in a short time with small processor area. Experimental results for two practical application programs show the effectiveness of the proposed system and extended CAM processors.

    CiNii
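
    The difference between conventional equivalence search and the threshold searches mentioned in the summary above can be modelled in software as follows. This is a minimal sketch under stated assumptions: the function `cam_search` and its mode names ("eq", "lt", "gt") are hypothetical, and the loop examines words sequentially, whereas CAM hardware compares all stored words in parallel.

```python
from typing import List


def cam_search(words: List[int], key: int, mode: str = "eq") -> List[int]:
    """Return the indices of all stored words that match `key`.

    mode "eq" models equivalence search; "lt" and "gt" model the
    less-than and greater-than threshold searches. Illustrative only.
    """
    compare = {
        "eq": lambda w: w == key,
        "lt": lambda w: w < key,
        "gt": lambda w: w > key,
    }
    if mode not in compare:
        raise ValueError(f"unknown search mode: {mode}")
    match = compare[mode]
    # Hardware evaluates every word at once; this model scans them in order.
    return [i for i, w in enumerate(words) if match(w)]
```

    For example, with stored words [5, 12, 7, 12, 3], an equivalence search for 12 matches indices 1 and 3, while a less-than search with threshold 7 matches indices 0 and 4.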

  • 動的再構成可能システムによるプロトコルブースタの実装

    清水一範, 陳暁梅, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-103   127 - 132  2002.11

  • ストリーミングを主目的としたアクセスネットワークでの最大許容遅延を考慮した制御方式

    柳澤政生, 佐藤隆之, 戸川望, 大附辰夫

    電子情報通信学会技術報告,MoMuC   2-Jul ( 251 ) 13 - 18  2002.07

     View Summary

    This paper considers an access network environment for data streaming using Bluetooth and Ethernet. In general, since far more packets are lost in a wireless network than in a wired network, degradation of streaming data takes place in a wireless network. The human ear does not perceive a delay of approximately 100 [ms]. Now, consider the delay time tolerable to streaming data. If error correction is possible within the tolerable delay time, degraded streaming data can be restored. In an access network using Bluetooth, transmission speed in the wired network is usually faster than in the wireless network where a Bluetooth link exists. Therefore, we consider that increasing the number of packets used for error correction does not affect streaming data in the wired network. This method is suitable for an access network using Ethernet and Bluetooth. This paper introduces the maximum tolerable delay time for streaming data and proposes an access network control method for data streaming based on the maximum tolerable delay time. Experimental results demonstrate the effectiveness and efficiency of the proposed method.

    CiNii

  • 仮想IP類推機構を有する動画像処理向けシステムVLSIのためのハードウェア/ソフトウェア分割システム

    小田雄一, 磯田新平, 戸川望, 橘昌良, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2002     173 - 178  2002.07

  • Packed SIMD 型命令を持つプロセッサを対象としたハードウェア/ソフトウェア協調合成システムのための並列化コンパイル手法

    鈴木伸治, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-78 ( 168 ) 79 - 84  2002.06

     View Summary

    Consider synthesizing a processor with packed SIMD type instructions using a hardware/software cosynthesis system. The system needs a parallelizing compiler for such a processor. The parallelizing compiler targets a virtual processor that has all available hardware units. It exploits instruction-level parallelism using packed SIMD type instructions and outputs assembly code. The output of the parallelizing compiler decides the initial configuration of the processor. This paper proposes a packed SIMD generation algorithm and an instruction merge algorithm. The packed SIMD generation algorithm packs and aligns low-precision data in a register and generates packed SIMD type instructions. The instruction merge algorithm merges several packed SIMD type instructions to generate packed SIMD type instructions that include saturation and shift operations. Experimental results demonstrate the effectiveness and efficiency of the algorithms.

    CiNii

  • Packed SIMD型命令セットを持った画像処理プロセッサのためのハードウェア/ソフトウェア分割手法

    太刀掛宏一, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2002-53 ( 168 ) 85 - 90  2002.06

     View Summary

    This paper proposes a hardware/software partitioning algorithm for image processors with packed SIMD type instructions. An image processing application includes packed SIMD type instructions. Each packed SIMD type instruction in the application is executed by one of the packed SIMD type functional units in the processor core. We can change the hardware configuration and the cost of a packed SIMD type functional unit according to the set of packed SIMD type instructions executed by the functional unit. The total hardware cost of the processor is reduced by hardware/software partitioning which selects an appropriate set of packed SIMD type instructions for each packed SIMD type functional unit. The experimental results show the effectiveness of the proposed algorithm.

    CiNii

  • A Software/Hardware Codesign for MPEG Encoder

    崔鎮求, 戸川望, 柳澤政生, 大附辰夫

    FIT(Forum on Information Technology)2002    2002.06

  • System-level Function and Architecture Codesign for Optimization of MPEG Encoder

    崔鎮求, 戸川望, 柳澤政生, 大附辰夫

    ITC-CSCC'02    2002.06

  • モバイル環境における一対多通信 -シミュレーションによるFTPとSRMの比較-

    佐藤隆之, 柳生健吾, 戸川望, 大附辰夫

    電子情報通信学会技術報告,MoMuC   2-Jun   33 - 38  2002.05

  • ディジタル信号処理向けプロセッサのためのシミュレータ生成手法

    笠原亨介, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会論文誌   vol.43 No.5   1202 - 1213  2002.05

  • Packed SIMD型命令を持つプロセッサを対象としたハードウェア/ソフトウェア協調合成システムのためのハードウェアユニット生成手法

    宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会論文誌   vol.43 No.5 ( 5 ) 1191 - 1201  2002.05

     View Summary

    This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of digital signal processors with packed SIMD type instructions. Given a set of instructions, the proposed algorithm extracts the set of subfunctions required by the hardware unit and generates more than one architecture candidate for hardware units. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short and thus it can be easily incorporated into the processor core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    CiNii

  • High-level area/delay/power estimation for low power system VLSIs with gated clocks

    S Noda, N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E85A ( 4 ) 827 - 834  2002.04

     View Summary

    In high-level synthesis for system VLSIs, power consumption is efficiently reduced by applying gated clocks. Since using gated clocks reduces power consumption but increases area/delay, estimating the tradeoff between power and area/delay when applying gated clocks is very important. In this paper, we discuss the amount of variation in area, delay, and power caused by applying gated clocks. We propose a simple gate-level circuit model and estimation equations. We vary parameters in our proposed circuit model, and evaluate power consumption by back-annotating gate-level simulation results to the original circuit. This paper also proposes a conditional expression for applying gated clocks. The expression shows whether or not we can reduce power consumption by applying gated clocks. We confirm the accuracy of the proposed estimation equations by experiments.

  • DSPプロセッサコアのハードウェア/ソフトウェア協調合成システムのための演算語長縮小化手法

    田川博規, 嶋下和宏, 戸川望, 柳澤政生, 大附辰夫

    回路とシステム軽井沢ワークショップ     429 - 434  2002.04

  • 制御処理ハードウェア高位合成のためのコントロールデータフローグラフ変形手法

    石井哲雄, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2001-165 ( 695 ) 41 - 48  2002.03

     View Summary

    In this paper, a control/data flow graph (CDFG) transformation algorithm is proposed for a high-level synthesis system targeted at control-based hardware. A scheduler in our system generates state transition graphs (STGs) from a CDFG. The transformation algorithm focuses on reducing the execution time of hardware. The proposed transformation algorithm consists of two techniques. One reduces the number of transitions in an STG by replicating sub-CDFGs, and the other reduces the clock cycle time by executing arithmetic operations in parallel with memory accesses. Experimental results for several control-based hardware designs demonstrate the effectiveness and efficiency of the algorithm.

    CiNii

  • IP再利用を考慮した動画像処理システムVLSI向けハードウェア/ソフトウェア分割設計支援システム

    磯田新平, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2001-164 ( 695 ) 33 - 40  2002.03

     View Summary

    In this paper, we propose a new hardware/software partitioning system which has 1) an analogy enumeration mechanism and 2) a qualitative objective evaluation system. This partitioning system is intended for video processing VLSIs. In this system, IPs designed in the past are accumulated in a database and used for analogy enumeration. Analogy enumeration is a function which estimates, from the IPs accumulated in the database, the change in area/delay when the degree of parallel processing is changed. Qualitative objectives are objectives other than area/delay/power, such as the ease of connecting with other hardware parts. Introducing 1) virtually increases the number of solution candidates, so a designer can easily obtain a solution which satisfies the area/delay constraints. Introducing 2) allows low-design-cost candidates to be chosen from solution candidates with equivalent quantitative objectives. We built a system including these functions and show its validity by a computer experiment.

    CiNii

  • Packed SIMD 型演算器を持つディジタル信号処理プロセッサのためのリターゲッタブルシミュレータ生成手法

    笠原亨介, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2001-162   24 - 17  2002.03

  • VLSI architecture for a flexible motion estimation with parameters

    J Choi, N Togawa, M Yanagisawa, T Ohtsuki

    ASP-DAC/VLSI DESIGN 2002: 7TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE AND 15TH INTERNATIONAL CONFERENCE ON VLSI DESIGN, PROCEEDINGS     452 - 457  2002  [Refereed]

     View Summary

    If motion estimation can choose the most suitable algorithm according to the changing characteristics of input image signals, we gain several benefits: improved quality and performance, reduced power consumption, and an optimized system. In this paper we propose a reconfigurable approach to motion estimation algorithm and architecture. The proposed algorithm determines the motion type and then selects an adapted algorithm in order to improve the quality and performance of images. We implemented the flexible and reconfigurable architecture in hardware with an address generator unit, a delay unit, and parameters. Our architecture supports more than one block-matching algorithm, and its parameters allow the system to be optimized. We are implementing our architecture using a hardware description language (VHDL) and synthesis design tools. We analyze the performance of the architecture and present its adaptation to algorithms for a low-cost real-time application.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • An algorithm of hardware unit generation for processor core synthesis with packed SIMD type instructions

    Y Miyaoka, A Choi, N Togawa, M Yanagisawa, T Ohtsuki

    APCCAS 2002: ASIA-PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS, VOL 1, PROCEEDINGS     171 - 176  2002  [Refereed]

     View Summary

    Let us consider synthesizing a processor core with SIMD instructions using a hardware/software cosynthesis system. The system is required to configure functional units executing SIMD instructions and obtain the area and delay of the functional units to evaluate the synthesized processor core. This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of processors with SIMD instructions. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts a set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short and thus it can be easily incorporated into the processor core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    DOI

    Scopus

  • An algorithm of hardware unit generation for processor core synthesis with packed SIMD type instructions

    Y. Miyaoka, J. Choi, N. Togawa, M. Yanagisawa, T. Ohtsuki

    IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS   1   171 - 176  2002

     View Summary

    The authors consider the synthesis of a processor core with SIMD instructions by a hardware/software cosynthesis system. The system is required to configure functional units executing SIMD instructions and obtain the area and delay of the functional units to evaluate the synthesized processor core. This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of processors with SIMD instructions. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts a set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short and thus it can be easily incorporated into the processor core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    DOI

    Scopus

  • ロジック入力用レベルシフトコンパレーター設計考察

    宮崎英敏, 戸川望, 柳澤政生, 大附辰夫, 茨木栄武, 新谷悟

    電子回路研究会,ETC-02-16     13 - 17  2002.01

  • システムVLSIのための高位面積/遅延/消費電力見積もりに基づく低消費電力指向高位合成手法

    野田真一, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2001-144 ( 577 ) 93 - 100  2002.01

     View Summary

    This paper proposes a new high-level synthesis system which can synthesize low-power system VLSIs under constraints on area, delay, and execution time. In the proposed system, initial system hardware is first obtained from an abstract behavioral description. Then three power reduction techniques, 1) reducing the power supply voltage, 2) selecting lower-power modules, and 3) applying gated clocks, are applied to it. However, while these techniques can reduce power dissipation, they may increase the area, delay, and/or execution time of the synthesized hardware. In this paper, we propose a power optimization algorithm which incorporates area/delay/power estimation, with which we can obtain synthesized hardware meeting given area/delay/power constraints. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    CiNii

  • A new hardware/software partitioning algorithm for DSP processor cores with two types of register files

    N Togawa, T Sakurai, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E84A ( 11 ) 2802 - 2807  2001.11

     View Summary

    This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint of execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of register files. Moreover the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing performance penalty. A generated processor core will have small area compared with processor cores which have a single register file or those which consider only one type of functional units for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • Area and delay estimation in hardware/software cosynthesis for digital signal processor cores

    N Togawa, Y Kataoka, Y Miyaoka, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E84A ( 11 ) 2639 - 2647  2001.11

     View Summary

    Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role, since the partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions, based on the execution time of an input application program and the area of the synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that the total area of a processor core can be derived from the sum of the area of a processor kernel and the area of additional hardware units. The area of a processor kernel is mainly obtained from the minimum kernel area and the overheads for adding hardware units and registers. The area of a hardware unit is mainly obtained from its type and operation bit width. For delay estimation, we show that the critical path delay of a processor core can be derived from the delay of the hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized area and delay.
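
    The structure of the estimation described in this summary can be illustrated with a minimal sketch. The summary does not give the actual equations, so the linear overhead terms, the `HardwareUnit` fields, and all numeric values below are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HardwareUnit:
    """Hypothetical record for one additional hardware unit."""
    name: str
    area: float                 # estimated area (arbitrary units)
    delay_ns: float             # estimated delay through the unit
    on_critical_path: bool = False


def estimate_core_area(kernel_min_area: float,
                       overhead_per_unit: float,
                       overhead_per_register: float,
                       num_registers: int,
                       units: List[HardwareUnit]) -> float:
    """Total area = kernel area + sum of hardware unit areas.

    The kernel area is modeled (as an assumption) as the minimum kernel
    area plus a linear overhead per added unit and per register.
    """
    kernel_area = (kernel_min_area
                   + overhead_per_unit * len(units)
                   + overhead_per_register * num_registers)
    return kernel_area + sum(u.area for u in units)


def estimate_critical_delay(units: List[HardwareUnit]) -> float:
    """Critical path delay taken from the slowest unit on the critical path."""
    delays = [u.delay_ns for u in units if u.on_critical_path]
    return max(delays) if delays else 0.0


units = [HardwareUnit("mul", 500.0, 8.0, on_critical_path=True),
         HardwareUnit("alu", 200.0, 3.0)]
total_area = estimate_core_area(1000.0, 10.0, 5.0, 16, units)
critical_delay = estimate_critical_delay(units)
```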

  • メモリとのインターフェース仕様を考慮した演算語長縮小に基づくプロセッサコアのハードウェア/ソフトウェア協調合成システム

    嶋下和宏, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2001-110 ( 467 ) 127 - 132  2001.11

     View Summary

    Consider reducing the area of a processor by shortening the operation word length from n to n/2. In this case, we generally need to execute an operation instruction at least twice in order to obtain an n-bit result. However, if the internal variables of an application program use only n/2 bits, we need to execute the operation instruction only once. We have proposed a hardware/software cosynthesis system for processors. That system assumes that the data length of an application program equals the operation word length of the processor core. This paper proposes an algorithm for shortening the operation word length. The algorithm repeatedly replaces each n-bit operation instruction with one or more n/2-bit operation instructions depending on the precision of the internal variables.

    CiNii
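
    The idea of replacing an n-bit operation with one or more n/2-bit operations can be sketched for addition as follows. This is only an illustration of the general principle, not the algorithm of the report; the function names and the choice of n = 32 are assumptions.

```python
def add_with_half_width_ops(a: int, b: int, n: int = 32) -> int:
    """Compute an n-bit wrapping add using two n/2-bit adds and a carry."""
    half = n // 2
    mask = (1 << half) - 1
    lo = (a & mask) + (b & mask)            # first n/2-bit add
    carry = lo >> half                      # carry out of the low half
    hi = ((a >> half) & mask) + ((b >> half) & mask) + carry  # second add
    return ((hi & mask) << half) | (lo & mask)


def half_width_ops_needed(a: int, b: int, n: int = 32) -> int:
    """Return 1 if the operands and the sum fit in n/2 bits, otherwise 2."""
    limit = 1 << (n // 2)
    return 1 if max(a, b, a + b) < limit else 2
```

    When the variables are known to stay within n/2 bits, a single narrow operation suffices, which is the case the summary exploits.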

  • A Hardware/Software Cosynthesis System for CAM Processors

    戸川望, 涌井達彦, 柳澤政生, 大附辰夫

    SASIMI2001    2001.10

    CiNii

  • Packed SIMD型命令を持つプロセッサを対象としたハードウェア/ソフトウェア協調合成システムのためのハードウェアユニット生成手法

    宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2001     223 - 228  2001.07

    CiNii

  • ディジタル信号処理向けプロセッサのためのシミュレータ生成手法

    笠原亨介, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会 DAシンポジウム 2001     137 - 142  2001.07

    CiNii

  • Implementation of Motion Estimation IP Core for MPEG Encoder

    崔鎮求, 戸川望, 柳澤政生, 大附辰夫

    ITC-CSCC 2001    2001.07

    CiNii

  • An area/time optimizing algorithm in high-level synthesis of control-based hardwares

    N Togawa, M Ienaga, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E84A ( 5 ) 1166 - 1176  2001.05

     View Summary

    This paper proposes an area/time optimizing algorithm in a high-level synthesis system for control-based hardware. Given a call graph whose nodes correspond to control flows of an application program, the algorithm generates a set of state-transition graphs which represents the input call graph under area and timing constraints. In the algorithm, state-transition graphs which satisfy only the timing constraint are generated first, and then they are transformed so that they also satisfy the area constraint. Since the algorithm is applied directly to control-flow graphs, it can deal with control flows such as bitwise processes and conditional branches. Further, the algorithm synthesizes more than one hardware architecture candidate from a single call graph for an application program. Designers of an application program can select several good hardware architectures among the candidates depending on multiple design criteria. Experimental results for several pieces of control-based hardware demonstrate the effectiveness and efficiency of the algorithm.

  • ディジタル信号処理向けプロセッサコアのPacked SIMD型ハードウェアユニット生成手法

    宮岡祐一郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2001-2 ( 45 ) 7 - 13  2001.05

     View Summary

    Consider synthesizing a processor core with packed SIMD type instructions using a hardware/software cosynthesis system. The system is required to configure functional units executing packed SIMD type instructions and obtain the area and delay of the functional units to evaluate the synthesized processor core. This paper proposes a hardware unit generation algorithm for packed SIMD type functional units. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts a set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short and thus it can be easily incorporated into the processor core synthesis system. Experimental results for packed SIMD type functional units demonstrate the effectiveness and efficiency of the algorithm.

    CiNii

  • Gated Clockによる低消費電力化システムVLSIの高位面積/遅延/消費電力見積り

    野田真一, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第14回 回路とシステム(軽井沢)ワークショップ     591 - 596  2001.04

  • ソフトIPのための保護アルゴリズム

    堀川哲郎, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会第14回 回路とシステム(軽井沢)ワークショップ     411 - 416  2001.04

  • システムLSIを対象としたハードウェア/ソフトウェア分割システム

    小田龍之介, 磯田新平, 戸川望, 橘昌良, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-140 ( 646 ) 37 - 42  2001.03

     View Summary

    This paper proposes a hardware/software partitioning system for system LSIs. The system is composed of an architecture database and an implementation binder. The database associates an algorithm name with implementation cases. The input of the implementation binder is an application modeled by a functional block diagram and a constraint on its execution time. The diagram consists of functional modules and their connections. Each module has an algorithm name. The binder enumerates more than one module-case binding, all of which meet the given time constraint. We apply the system to an MPEG-4 encoder, and the results demonstrate its effectiveness and efficiency.

    CiNii
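
    The binder described in this summary can be pictured with a small sketch that enumerates module-case bindings under a time constraint. The database contents, the case names, and the assumption that the total execution time is the plain sum of the module times are all hypothetical; the report's actual timing model is not given in the summary.

```python
from itertools import product
from typing import Dict, List


def enumerate_bindings(modules: Dict[str, str],
                       database: Dict[str, List[dict]],
                       time_limit: float) -> List[Dict[str, str]]:
    """Enumerate every module-to-case binding whose total time meets the limit.

    modules:  module id -> algorithm name
    database: algorithm name -> list of implementation cases
              (each case is a dict with "name" and "time")
    """
    module_ids = list(modules)
    choices = [database[modules[m]] for m in module_ids]
    feasible = []
    for combo in product(*choices):
        # Assumption: modules run one after another, so times add up.
        if sum(case["time"] for case in combo) <= time_limit:
            feasible.append({m: case["name"]
                             for m, case in zip(module_ids, combo)})
    return feasible


# Hypothetical database with one software and one hardware case per algorithm.
database = {
    "dct": [{"name": "dct_sw", "time": 10}, {"name": "dct_hw", "time": 2}],
    "me":  [{"name": "me_sw", "time": 20}, {"name": "me_hw", "time": 5}],
}
modules = {"m0": "me", "m1": "dct"}
bindings = enumerate_bindings(modules, database, time_limit=15)
```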

  • 画像処理を対象としたPacked SIMD型命令セットを持つプロセッサのハードウェア/ソフトウェア協調合成システムにおける並列化Cコンパイラ

    野々垣直浩, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-139 ( 646 ) 31 - 36  2001.03

     View Summary

    Many current general-purpose processors and digital signal processors have extended instructions to enhance their performance on image/video processing applications. The extended functionality comes primarily from the addition of packed SIMD type instructions. These instructions aim at exploiting subword parallelism. A packed SIMD type instruction set includes hundreds of instructions, but a small subset of them is enough to implement most image/video processing applications. Thus we can significantly reduce the area of a processor within an execution time constraint if application-specific synthesis is applied to it. In this paper, we propose a hardware/software cosynthesis system for processors with a packed SIMD type instruction set and an in-register SIMD parallelization algorithm for its compiler. The input of the system is an application description written in C and application data, and the output is hardware descriptions of a synthesized processor core, an application binary code executed on the processor core, and a software environment. The compiler generates object code assuming a processor core with all the available hardware units. It exploits instruction-level and subword-level parallelism and attempts to minimize execution time. The experimental results show the effectiveness of the compiler.

    CiNii
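
    Subword parallelism, which packed SIMD type instructions exploit, can be illustrated in software with a well-known bit trick that adds four 8-bit lanes packed into one 32-bit word. This is a generic illustration and not the compiler algorithm of the report; the function names are assumptions.

```python
def packed_add8(x: int, y: int) -> int:
    """Add four 8-bit lanes packed in 32-bit words, wrapping within each lane."""
    high = 0x80808080   # top bit of every lane
    low = 0x7F7F7F7F    # lower seven bits of every lane
    # Adding only the low seven bits keeps carries inside each lane.
    partial = (x & low) + (y & low)
    # The top bit of each lane is then fixed up with XOR (carry-less add).
    return (partial ^ ((x ^ y) & high)) & 0xFFFFFFFF


def lanewise_add8(x: int, y: int) -> int:
    """Reference version: add lane by lane, one byte at a time."""
    result = 0
    for shift in (0, 8, 16, 24):
        lane = (((x >> shift) & 0xFF) + ((y >> shift) & 0xFF)) & 0xFF
        result |= lane << shift
    return result
```

    The packed version performs the four additions with one wide add, which is the kind of saving a packed SIMD functional unit provides in hardware.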

  • 制御処理ハードウェアの高位合成システムのための面積/遅延見積もり手法

    余田貴幸, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会研究報告   2001-SLDM-100-4,pp.25-32 ( 12 ) 25 - 32  2001.02

     View Summary

    This paper proposes an area/delay estimation technique in high-level synthesis for control-flow-based hardware. In area/delay estimation, the input is the state-transition graph generated by the area/time optimization, and the output is the estimated area and delay for that state-transition graph. Our estimation technique gives the area and delay, including the control part of the hardware, using an estimation equation. The equation is determined by the number of operations, the number of states, and the types of operations. Experimental results for several pieces of control-based hardware demonstrate the effectiveness and efficiency of the technique.

    CiNii

  • RC等価回路に基づくクロストーク低減配線手法

    曽根原理仁, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会研究報告   2001-SLDM-100-3,pp.17-24   17 - 24  2001.02

    CiNii

  • Area/delay estimation for digital signal processor cores

    Y Miyaoka, Y Kataoka, N Togawa, M Yanagisawa, T Ohtsuki

    PROCEEDINGS OF THE ASP-DAC 2001: ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 2001     156 - 161  2001  [Refereed]

     View Summary

    Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for a processor kernel can be mainly obtained by minimum area for a processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2ns when comparing estimated area and delay with logic-synthesized area and delay.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • 発見的算法と分枝限定法を用いた時間的予測に基づくリソースバイディング

    中村洋, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-119 ( 532 ) 17 - 24  2001.01

     View Summary

    This paper proposes a resource binding algorithm based on computation time estimation in the high-level synthesis system for digital signal processing. In the algorithm, a heuristic based binder is first executed and then a branch-and-bound based binder is executed. The computation time to run the algorithm depends on the number of resource assignments which the heuristic based binder determines. Thus we can estimate computation time to run the algorithm by varying the number of such resource assignments. In the algorithm, for a given constraint of computation time, we first obtain the number of resource assignments which the heuristic based binder determines based on the computation time estimation. Then we actually execute the heuristic based binder. After that, we execute the branch-and-bound based binder for the rest of the resource assignments. Experimental results demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • FPGAを用いた動的再構成可能システムを対象とするスケジューリング手法

    石飛貴志, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-115   33 - 40  2001.01

    CiNii

  • パラメータ付けされた動的再構成可能システムとその応用

    香西伸治, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-114 ( 531 ) 25 - 32  2001.01

     View Summary

    Recently, dynamically reconfigurable systems, in which a part of the system can be reconfigured in-system, have been proposed. The hardware configuration of a conventional dynamically reconfigurable system is fixed and cannot be changed. Therefore, the functions of the system configuration can be redundant or insufficient. In this work, we propose a scalable dynamically reconfigurable system with configuration parameters which resolves this problem. The proposed system is composed of a PCI interface, Function Units which execute operations, and a control unit which controls the PCI interface and the Function Units. Device configuration parameters such as processing speed, device size, reconfiguration time, and the number of pins, as well as the connections among devices in the Function Units, can be determined depending on the application. The device parameters have costs depending on performance, and users can determine the system configuration and device parameters so that the application runs as fast as possible under a given cost constraint. In this work, we evaluate the effectiveness of this system by applying it to image processing and packet processing applications.

    CiNii

  • CAM processor synthesis based on behavioral descriptions

    N Togawa, T Wakui, T Yoden, M Terajima, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E83A ( 12 ) 2464 - 2473  2000.12

     View Summary

    CAM (Content Addressable Memory) units are generally designed so that they can be applied to a variety of application programs. However, if a particular application runs on CAM units, some functions in the CAM units may be used often and other functions may never be used. We consider that an appropriate design for CAM units is required depending on the requirements of a given application program. This paper proposes a CAM processor synthesis system based on behavioral descriptions. The input of the system is an application program written in C including CAM functions, and its output is hardware descriptions of a synthesized processor and a binary code executed on it. Since the system determines the functions in the CAM units and synthesizes a CAM processor depending on the requirements of an application program, we expect that a synthesized CAM processor can execute the application program with small processor area and delay. Experimental results demonstrate its efficiency and effectiveness.

  • CAMプロセッサを対象とするハードウェア/ソフトウェア協調合成システム

    涌井達彦, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-84   89 - 94  2000.11

    CiNii

  • 機能メモリを使用したプロセッサの面積/遅延見積もり手法

    余傅達彦, 戸川望, 柳澤政生, 大附辰夫

    電子情報通信学会技術報告   VLD2000-83   83 - 88  2000.11

  • 制御処理ハードウェアの高位合成システムのための面積/時間最適化アルゴリズム

    家長真行, 戸川望, 柳澤政生, 大附辰夫

    情報処理学会DAシンポジウム2000     27 - 32  2000.07

  • A Behavioral Synthesis System for Processors with Content Addressable Memories

    涌井達彦, 余傅達彦, 寺島信, 戸川望, 柳澤政生, 大附辰夫

    Proc. SASIMI 2000     56 - 63  2000.04

  • A layout area/delay estimation method in behavioral synthesis of system VLSIs

    諏訪勝, 戸川望, 柳澤政生, 大附辰夫

    IEICE 13th Karuizawa Workshop on Circuits and Systems     125 - 130  2000.04

  • A hardware/software cosynthesis system for digital signal processor cores with two types of register files

    N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E83A ( 3 ) 442 - 451  2000.03

    In digital signal processing, the bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. A processor core for digital signal processing is therefore required to have two types of register files, one of which is used by input and output variables and the other by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, an object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore, it can have two types of register files, RF1 and RF2. The bit width and number of registers in RF1 or RF2 will be determined based on a given application program. Thus a synthesized processor core will have a small area while keeping high precision of intermediate operations, compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.

  • Design and evaluation of dedicated processors in a map data distribution system for pedestrians

    伊澤義貴, 濱未希子, 柳澤政生, 大附辰夫

    IEICE Technical Report   VLD99-267 ( 658 ) 15 - 22  2000.03

    A map data distribution system for pedestrians is composed of map servers storing vector map data, base stations having antennas, and multiple personal digital assistants. A person carrying a personal digital assistant obtains his or her positional information from GPS and sends it to a map server via a base station. The map server searches map data around the personal digital assistant's position. Then the map server sends the retrieved map data back to the personal digital assistant. This paper considers the problem of searching for vector map data crossing or included in a given query rectangle in the map data distribution system and proposes two types of processors to solve this problem: a processor with an associative memory and a processor with a segment search unit. These processors have been written in a hardware description language and logic-synthesized. The areas and the critical delays show their efficiency and effectiveness.

  • A dynamically reconfigurable system using FPGAs and its application to encryption algorithms

    羽切崇, 戸川望, 柳澤政生, 大附辰夫

    IEICE Technical Report   VLD99-266  2000.03

  • An area/time optimizing algorithm in high-level synthesis for control-based hardwares

    Nozomu Togawa, Masayuki Ienaga, Masao Yanagisawa, Tatsuo Ohtsuki

    Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC     309 - 312  2000  [Refereed]

    This paper proposes an area/time optimizing algorithm in high-level synthesis for control-based hardware. Given a call graph in which each node corresponds to a control flow of an application program, the algorithm generates a set of state-transition graphs which represents the input call graph under area and timing constraints. In the algorithm, state-transition graphs which satisfy only the timing constraint are generated first, and they are then transformed so that they also satisfy the area constraint. Since the algorithm is directly applied to control-flow graphs, it can deal with control flows such as bit-wise processes and conditional branches. Further, the algorithm synthesizes more than one hardware architecture candidate from a single call graph for an application program. Designers of an application program can select several good hardware architectures among the candidates depending on multiple design criteria. Experimental results for several control-based hardware designs demonstrate the effectiveness and efficiency of the algorithm.

    Citations (Scopus): 6

  • A hardware/software partitioning algorithm for digital signal processor cores with two types of register files

    N Togawa, T Sakurai, M Yanagisawa, T Ohtsuki

    2000 IEEE ASIA-PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS     544 - 547  2000

    This paper proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint on execution time, the proposed algorithm generates a processor core configuration together with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of the register files. Moreover, the algorithm considers two or more functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing a performance penalty. A generated processor core will have a small area compared with processor cores which have a single register file or those which have only one functional unit for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • A Behavioral Synthesis System for Processors with Content Addressable Memories

    Proc. SASIMI 2000     56 - 63  2000

  • An area/time optimizing algorithm in high-level synthesis for control-based hardwares

    戸川望, 家長真行, 柳澤政生, 大附辰夫

    Proceedings of IEEE Asia and South Pacific Design Automation Conference 2000 (ASP-DAC 2000)    2000.01

  • A simultaneous placement and routing algorithm for FPGAs with power optimization

    Journal of Circuits, Systems and Computers   9;1,2   99 - 112  1999.12

  • A hardware/software cosynthesis system for digital signal processor cores

    N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E82A ( 11 ) 2325 - 2337  1999.11

    This paper proposes a hardware/software cosynthesis system for digital signal processor cores and a hardware/software partitioning algorithm which is one of the key issues for the system. The target processor has a VLIW-type core which can be composed of a processor kernel, multiple data memory buses (X-bus and Y-bus), hardware loop units, addressing units, and multiple functional units. The processor kernel includes five pipeline stages (RISC-type kernel) or three pipeline stages (DSP-type kernel). Given an application program written in the C language and a set of application data, the system synthesizes a processor core by selecting an appropriate kernel (RISC-type or DSP-type kernel) and required hardware units according to the application program/data and the hardware costs. The system also generates the object code for the application program and a software environment (compiler and simulator) for the processor core. The experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program and the synthesized processor cores execute most application programs with the minimum number of clock cycles compared with several existing processors.

  • A hardware/software partitioning method for digital signal processors with two types of register files

    IEICE Technical Report   VLD99-76  1999.11

  • An area/delay estimation method for digital signal processor cores

    KATAOKA Yoshiharu, YOSHIZAWA Dai, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE Technical Report   VLD99-75 ( 475 ) 1 - 8  1999.11

    A hardware/software cosynthesis system for digital signal processors with two types of register files requires certain evaluation values in the phase of hardware/software partitioning. These evaluation values are the execution time of a given application program and the hardware cost of a generated processor core. In order to obtain these evaluation values, we configure a variety of hardware units in advance, and the results are logic-synthesized and analyzed to establish estimation equations. We propose techniques for deriving reliable equations which estimate both the delay and the area of the target processor core. For the area estimation, we show that the total area can be derived as the sum of the area of a processor kernel and the area of additional hardware units. The processor kernel area is determined by two independent components: (1) the area corresponding to an overhead when extra hardware units are added; (2) the size of general-purpose registers. We have compared the derived estimation values with the logic-synthesized data obtained in advance. Errors of the area estimation are less than 2%. For the delay estimation, we can reduce estimation errors by focusing on the functional units on a critical path. Errors of the delay estimation are all less than 2 ns.

  • A hardware/software cosynthesis system for digital signal processor cores

    IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences   E82-A;11   2325 - 2337  1999.11

  • An area/time optimizing algorithm for a high-level synthesis system for control-based hardware

    IENAGA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE Technical Report   VLD99-66 ( 317 ) 15 - 22  1999.09

    This paper proposes an area/time optimizing algorithm in high-level synthesis for control-based hardware. Given a call graph in which each node corresponds to a control flow of an application program, the algorithm generates a set of state-transition graphs which represents the input call graph under area and timing constraints. In the algorithm, state-transition graphs which satisfy only the timing constraint are generated first, and they are then transformed so that they also satisfy the area constraint. Since the algorithm is directly applied to control-flow graphs, it can deal with control flows such as bit-wise processes and conditional branches. Further, the algorithm synthesizes more than one hardware architecture candidate from a single call graph for an application program. Designers of an application program can select several good hardware architectures among the candidates according to multiple design criteria. Experimental results for several control-based hardware designs demonstrate the effectiveness and efficiency of the algorithm.

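  • A hardware description generation method for control-dominated processing

    Proc. IPSJ DA Symposium '99    1999.07

  • A high-level synthesis system for control-dominated hardware and its application

    Proc. IPSJ DA Symposium '99    1999.07

  • A hardware/software cosynthesis system for digital signal processors with two types of register files

    Proc. IEICE 12th Karuizawa Workshop on Circuits and Systems    1999.04

  • A resource binding method based on branch and bound that guarantees optimal solutions

    IPSJ Journal   40;4   1565 - 1577  1999.04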

  • A depth-constrained technology mapping algorithm for logic-blocks composed of tree-structured LUTs

    N Togawa, K Ara, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E82A ( 3 ) 473 - 482  1999.03

    This paper proposes a fast depth-constrained technology mapping algorithm for logic-blocks composed of tree-structured lookup tables. First, we propose a technology mapping algorithm which minimizes the number of logic-blocks if an input Boolean network is a tree. Second, we propose a technology mapping algorithm which minimizes logic depth for any input Boolean network. Finally, we combine those two technology mapping algorithms and propose an algorithm which realizes technology mapping whose depth is bounded by a given upper bound d(c). Experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • A reconfigurable system using FPGAs and its applications

    IEICE Technical Report   VLD98;143  1999.03

  • A simultaneous placement and global routing algorithm for FPGAS with power optimization

    N Togawa, K Ukai, M Yanagisawa, T Ohtsuki

    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS   9 ( 1-2 ) 99 - 112  1999.02  [Refereed]

    This paper proposes a simultaneous placement and global routing algorithm for FPGAs with power optimization. The algorithm is based on hierarchical bipartitioning of layout regions and sets of logic-blocks. When bipartitioning a layout region, pseudo-blocks are introduced to preserve connections if there exist connections between bipartitioned logic-block sets. A global route is represented by a sequence of pseudo-blocks. Since pseudo-blocks and logic-blocks can be dealt with equally, placement and global routing are processed simultaneously. The algorithm gives weights to nets with high switching probabilities and attempts to assign the blocks connected by weighted nets to the same region. Thus their length is shortened and the power consumption of a whole circuit can be reduced. The experimental results demonstrate the effectiveness and efficiency of the algorithm.

  • A hardware/software cosynthesis system for digital signal processors with two types of register files and its parallelizing compiler

    NAKAMURA Tsuyoshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE Technical Report   FTS98;132   71 - 78  1999.02

    In digital signal processing, intermediate results require greater bit width than input data in order to keep high precision for arithmetic operations. If a digital signal processor has two types of register files, digital signal processing applications can keep high precision for arithmetic operations with a small amount of processor area. This paper proposes a hardware/software cosynthesis system which synthesizes digital signal processors with two types of register files, together with its compiler. The input of the system is an application program written in C and application data, and its output is hardware descriptions of a synthesized processor core, an application binary code executed on the processor core, and a software environment. The proposed compiler generates an assembly code for a processor core with all the available hardware units which can be added to the processor core. It extracts from an input application program those instructions which can be executed concurrently and attempts to minimize its execution time. Moreover, it generates an assembly code which keeps the required precision for arithmetic operations, since the proposed compiler assigns two types of data to two types of register files. The experimental results show the effectiveness of the system and the compiler.

  • A hardware/software partitioning algorithm for processor cores of digital signal processing

    N Togawa, T Sakurai, M Yanagisawa, T Ohtsuki

    PROCEEDINGS OF ASP-DAC '99     335 - 338  1999  [Refereed]

    A hardware/software cosynthesis system for processor cores of digital signal processing has been developed. This paper focuses on a hardware/software partitioning algorithm which is one of the key issues in the system. Given an input assembly code generated by the compiler in the system, the proposed hardware/software partitioning algorithm first determines the types and the numbers of required hardware units, such as multiple functional units, hardware loop units, and particular addressing units, for a processor core (initial resource allocation). Second, the hardware units determined at initial resource allocation are reduced one by one while the assembly code meets a given timing constraint (configuration of a processor core). The execution time of the assembly code becomes longer but the hardware costs for a processor core to execute it become smaller. Finally, it outputs an optimized assembly code and a processor configuration. Experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program/data.

    Citations (Scopus): 2

  • A high-level synthesis system for digital signal processing based on data-flow graph enumeration

    N Togawa, T Hisaki, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E81A ( 12 ) 2563 - 2575  1998.12

    This paper proposes a high-level synthesis system for datapath design of digital signal processing hardware. The system consists of four phases: (1) DFG (data-flow graph) generation, (2) scheduling, (3) resource binding, and (4) HDL (hardware description language) generation. In (1), the system does not generate only one best DFG representing a given behavioral description of hardware, but more than one good DFG representing it. In (2) and (3), several synthesis tools can be incorporated into the system depending on the required objectives. Thus we can obtain more than one datapath candidate for a behavioral description, each with its area and performance evaluation. In (4), the best datapath design is selected among those candidates and its hardware description is generated. The experimental results of applying the system to several benchmarks show its effectiveness and efficiency.

  • A simultaneous placement and global routing method for FPGA macro blocks

    IPSJ SIG Technical Report   98-DA;90  1998.12

  • Maple-opt: A performance-oriented simultaneous technology mapping, placement, and global routing algorithm for FPGA's

    N Togawa, M Yanagisawa, T Ohtsuki

    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS   17 ( 9 ) 803 - 818  1998.09  [Refereed]

    A new field programmable gate array (FPGA) design algorithm, Maple-opt, is proposed for technology mapping, placement, and global routing subject to a given upper bound of critical signal path delay. The basic procedure of Maple-opt is viewed as top-down hierarchical bipartition of a layout region. In each bipartitioning step, technology mapping onto logic blocks of FPGA's, their placement, and global routing are determined simultaneously, which leads to a more congestion-balanced layout for routing. In addition, Maple-opt is capable of estimating a lower bound of the delay for a constrained path and of extracting critical paths based on the difference between the lower bounds and given constraint values in each bipartitioning step. Two delay-reduction procedures for the critical paths are applied: routing delay reduction and logic-block delay reduction. The routing delay reduction is done by assigning each constrained path to a single subregion when bipartitioning a region. The logic-block delay reduction is done by mapping each constrained path onto a smaller number of logic blocks. Experimental results for benchmark circuits demonstrate that Maple-opt reduces the maximum number of tracks per channel by a maximum of 38% compared with existing algorithms while satisfying almost all the path delay constraints.

    Citations (Scopus): 7

  • A resource binding method that guarantees optimal solutions

    Proc. IPSJ DA Symposium '98    1998.07

  • A fast scheduling algorithm based on gradual time-frame reduction for datapath synthesis

    N Togawa, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E81A ( 6 ) 1231 - 1241  1998.06

    This paper proposes a fast scheduling algorithm based on gradual time-frame reduction for datapath synthesis of digital signal processing hardware. The objective of the algorithm is to minimize the costs for functional units and registers and to maximize connectivity under given computation time and initiation interval. Incorporating the connectivity in a scheduling stage can reduce multiplexer counts in resource binding. The algorithm maximizes connectivity while maintaining low time complexity and obtains datapath designs with small total hardware costs in the high-level synthesis environment. The algorithm also resolves inter-iteration data dependencies and thus realizes pipelined datapaths. The experimental results demonstrate that the proposed algorithm reduces the multiplexer counts after resource binding while maintaining low costs for functional units and registers compared with eight conventional schedulers.

  • An FPGA layout reconfiguration algorithm based on global routes for engineering changes in system design specifications

    N Togawa, K Hagi, M Yanagisawa, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E81A ( 5 ) 873 - 884  1998.05

    Rapid system prototyping is one of the main applications for field-programmable gate arrays (FPGAs). At the stage of rapid system prototyping, design specifications can often be changed since they cannot be determined completely. This paper focuses on layout design changes and proposes a layout reconfiguration algorithm for FPGAs. The target FPGA architecture is developed for transport processing. In order to implement a wider variety of circuits flexibly, it has three-input lookup tables (LUTs) as minimum logic cells. Since its logic granularity is finer than that of conventional FPGAs, it requires more routing resources to connect them, and minimization of routing congestion is indispensable. In layout reconfiguration, the main problem is to add LUTs to initial layouts. Our algorithm consists of two steps. For a given placement and global routing of LUTs, in Step 1 an added LUT is placed while allowing its position to overlap that of a preplaced LUT; then in Step 2 preplaced LUTs are moved to their adjacent positions so that the overlap of the LUT positions can be resolved. Global routes are updated corresponding to the reconfiguration of placement. The algorithm keeps routing congestion small by evaluating global routes directly in both Steps 1 and 2. In particular, in Step 2, if the minimum number of preplaced LUTs are moved to their adjacent positions, our algorithm minimizes routing congestion. Experimental results demonstrate that, if the number of added LUTs is at most 20% of the number of initial LUTs, our algorithm generates reconfigured layouts whose routing congestion is as small as that obtained by executing a conventional placement and global routing algorithm. The run time of our algorithm is within approximately one second.

  • A clock routing optimization method based on delay sensitivity analysis of distributed-parameter circuits

    IPSJ SIG Technical Report   98-DA;88 ( 43 ) 21 - 28  1998.05

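  • A depth-constrained technology mapping method for tree-connected LUTs

    Proc. IEICE 11th Karuizawa Workshop on Circuits and Systems    1998.04

  • An automatic hardware description generation method for pipelined processors

    IEICE Technical Report   VLD97;117  1998.03

  • A parallelizing compiler in an automatic synthesis system for digital signal processors

    IEICE Technical Report   VLD97;116  1998.03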

  • ディジタル信号処理向けプロセッサのハードウェア/ソフトウェア協調合成システム

    TOGAWA Nozomu, SAKURAI Takashi, YANAGISAWA Masao, OHTSUKI Tatsuo

    電子情報通信学会技術研究報告   VLD97;115 ( 576 ) 17 - 24  1998.03

     View Summary

    This paper proposes a hardware/software cosynthesis system for processor cores of digital signal processing and a hardware/software partitioning algorithm which is one of the key issues for the system. The target processor has a VLIW-type core which can be composed of a processor kernel, multiple data memory buses (X-bus and Y-bus), hardware loops, addressing units, and multiple functional units. The processor kernel includes five pipeline stages (RISC-type kernel) or three pipeline stages (DSP-kernel). The system synthesizes a processor core by selecting the required hardware units among them based on a given application program and data and the hardware costs. As a result, the synthesized processor core can range from RISC to DSP. The experimental results show the effectiveness of our system and hardware/software partitioning algorithm.

    CiNii

  • 平成9年度(第21回)丹羽記念賞

    丹羽記念会    1998.02

  • An incremental placement and global routing algorithm for field-programmable gate arrays

    N Togawa, K Hagi, M Yanagisawa, T Ohtsuki

    PROCEEDINGS OF THE ASP-DAC '98 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1998 WITH EDA TECHNO FAIR '98     519 - 526  1998  [Refereed]

     View Summary

    Rapid system prototyping is one of the main applications for field-programmable gate arrays (FPGAs). At the stage of rapid system prototyping, design specifications can often be changed since they cannot always be determined completely. In this paper, layout design change is focused on and a layout reconfiguration algorithm is proposed for FPGAs. In layout reconfiguration, the main problem is to add LUTs to initial layouts. Our algorithm consists of two steps: For given placement and global routing of LUTs, Step 1 places an added LUT with allowing that the position of the added LUT may overlap that of a preplaced LUT; Then Step 2 moves preplaced LUTs to their adjacent positions so that the overlap of the LUT positions can be resolved. Global routes are updated corresponding to reconfiguration of placement. The algorithm keeps routing congestion small by evaluating global routes directly both in Steps 1 and 2. Especially in Step 2, if the minimum number of preplaced LUTs are moved to their adjacent positions, our algorithm minimizes routing congestion. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    DOI

  • A high-level synthesis system for digital signal processing based on enumerating data-flow graphs

    N Togawa, T Hisaki, M Yanagisawa, T Ohtsuki

    PROCEEDINGS OF THE ASP-DAC '98 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1998 WITH EDA TECHNO FAIR '98     265 - 274  1998  [Refereed]

     View Summary

    This paper proposes a high-level synthesis system for datapath design of digital signal processing hardwares. The system consists of four phases: (1) DFG (data-flow graph) generation, (2) scheduling, (3) resource binding, and (4) HDL (hardware description language) generation. In (1), the system does not generate only one best DFG representing a given behavioral description of a hardware, but more than one good DFGs representing it. In (2) and (3), several synthesis tools can be incorporated into the system depending on the required objectives. Thus we can obtain more than one datapath candidates for a behavioral description with their area and performance evaluation. In (4), the best datapath design is selected among those candidates and its hardware description is generated. The experimental results for applying the system to several benchmarks show the effectiveness and efficiency.

    DOI

  • A Fast Scheduling Algorithm Based on Gradual Time-Frame Reduction for Datapath Synthesis

    IEICE Trans on Fundamentals of Electronics, Communications and Computer Sciences   E81-A/6   1231 - 1240  1998

  • A simultaneous placement and global routing algorithm for FPGAs with power optimization

    N Togawa, K Ukai, M Yanagisawa, T Ohtsuki

    APCCAS '98 - IEEE ASIA-PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS     125 - 128  1998

     View Summary

    This paper proposes a simultaneous placement and global routing algorithm for FPGAs with power optimization. The algorithm is based on hierarchical bipartitioning of layout regions and sets of logic-blocks. When bipartitioning a layout region, pseudo-blocks are introduced to preserve connections if there exist connections between bipartitioned logic-block sets. A global route is represented by a sequence of pseudo-blocks. Since pseudo-blocks and logic-blocks can be dealt with equally, placement and global routing are processed simultaneously. The algorithm gives weights to the nets with high switching probabilities and assigns the blocks connected by weighted nets to the same region. Thus their length is shortened and the power consumption of a whole circuit can be reduced. The experimental results demonstrate the effectiveness and efficiency of the algorithm.

  • Maple-opt: A performance-oriented simultaneous technology mapping, placement, and global routing algorithm for FPGA's

    Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki

    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems   17 ( 9 ) 803 - 818  1998

     View Summary

    A new field programmable gate array (FPGA) design algorithm, Maple-opt, is proposed for technology mapping, placement, and global routing subject to a given upper bound of critical signal path delay. The basic procedure of Maple-opt is viewed as top-down hierarchical bipartition of a layout region. In each bipartitioning step, technology mapping onto logic blocks of FPGA's, their placement, and global routing are determined simultaneously, which leads to a more congestion-balanced layout for routing. In addition, Maple-opt is capable of estimating a lower bound of the delay for a constrained path and of extracting critical paths based on the difference between the lower bounds and given constraint values in each bipartitioning step. Two delay-reduction procedures for the critical paths are applied: routing delay reduction and logic-block delay reduction. The routing delay reduction is done by assigning each constrained path to a single subregion when bipartitioning a region. The logic-block delay reduction is done by mapping each constrained path onto a smaller number of logic blocks. Experimental results for benchmark circuits demonstrate that Maple-opt reduces the maximum number of tracks per channel by a maximum of 38% compared with existing algorithms while satisfying almost all the path delay constraints. © 1998 IEEE.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • A performance-oriented circuit partitioning algorithm with logic-block replication for multi-FPGA systems

    N Togawa, M Sato, T Ohtsuki

    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS   7 ( 5 ) 373 - 393  1997.10  [Refereed]

     View Summary

    In this paper, we extend the circuit partitioning algorithm which we had proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is within a specified upper bound imposed on it. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: (0) detection of critical paths; (1) bipartitioning of a set of primary inputs and outputs; and (2) bipartitioning of a set of logic-blocks. In (0), the algorithm computes the lower bounds of delays for paths with path delay constraints and detects the critical paths based on the difference between the lower and upper bounds dynamically in every bipartitioning procedure. The delays of the critical paths are reduced with higher priority. In (1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally in (2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints while maintaining the maximum number of required I/O blocks per chip small compared with conventional algorithms.

    DOI

    Scopus

  • A performance-oriented simultaneous placement and global routing algorithm for transport-processing FPGAs

    N Togawa, M Sato, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E80A ( 10 ) 1795 - 1806  1997.10

     View Summary

    In layout design of transport-processing FPGAs, it is required that not only routing congestion is kept small but also circuits implemented on them operate with higher operation frequency. This paper extends the proposed simultaneous placement and global routing algorithm for transport-processing FPGAs whose objective is to minimize routing congestion and proposes a new algorithm in which the length of each critical signal path (path length) is limited within a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (LookUp Table) sets to be placed. In each bipartitioning, the algorithm first searches the paths with tighter path length constraints by estimating their path lengths. Second the algorithm proceeds the bipartitioning so that the path lengths of critical paths can be reduced. The algorithm is applied to transport-processing circuits and compared with conventional approaches. The results demonstrate that the algorithm satisfies the path length constraints for 11 out of 13 circuits, though it increases routing congestion by an average of 20%. After detailed routing, it achieves 100% routing for all the circuits and decreases a circuit delay by an average of 23%.

  • Fast scheduling and allocation algorithms for entropy CODEC

    K Suzuki, N Togawa, M Sato, T Ohtsuki

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E80D ( 10 ) 982 - 992  1997.10

     View Summary

    Entropy coding/decoding are implemented on FPGAs as a fast and flexible system in which high-level synthesis technologies are key issues. In this paper, we propose scheduling and allocation algorithms for behavioral descriptions of entropy CODEC. The scheduling algorithm employs a control-flow graph as input and finds a solution with minimal hardware cost and execution time by merging nodes in the control-flow graph. The allocation algorithm assigns operations to operators with various bit lengths. As a result, register-transfer level descriptions are efficiently obtained from behavioral descriptions of entropy CODEC with complicated control flow and variable bit lengths. Experimental results demonstrate that our algorithms synthesize the same circuits as manually designed within one second.

  • 機能メモリを使用したプロセッサを対象とするハードウェア/ソフトウェア協調合成システム

    電子情報通信学会技術研究報告   CPSY98;85  1997.09

  • ディジタル信号処理を対象とした高位合成システムにおける高速なスケジューリングアルゴリズム

    情報処理学会DAシンポジウム'97論文集    1997.07

  • スケッチレイアウトシステムにおけるBGAパッケージ配線手法

    回路実装学会誌   12;4   241 - 246  1997.07

    DOI

  • FPGAを対象とした低消費電力指向配置・概略配線同時処理手法

    UKAI Kaoru, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    電子情報通信学会技術研究報告   VLD97;42 ( 141 ) 191 - 198  1997.06

     View Summary

    FPGAs are increasingly used as interfaces which connect core chips and peripheral LSIs, which can increase the flexibility of system LSIs. Since FPGAs consume more power than conventional gate arrays, a design methodology with power optimization is required for FPGAs in order to reduce power consumption in the entire system LSIs. In this paper, we propose a simultaneous placement and global routing algorithm for FPGAs with power optimization. The algorithm is based on hierarchical bipartitioning of layout regions and sets of logic-blocks to be placed. If there exist connections between bipartitioned logic-block sets, pairs of pseudo-blocks are introduced to preserve the connections. A global route is represented by a sequence of pseudo-blocks. Since pseudo-blocks and logic-blocks can be dealt with equally, placement and global routing are processed simultaneously. The algorithm attaches weights to the nets with high switching activities and assigns the blocks connected by weighted nets to the same region. Thus their length is shortened and power consumption of a whole circuit can be reduced. The experimental results demonstrate the efficiency and effectiveness of the algorithm.

    CiNii

  • システム設計仕様部分的変更を実現する概略配線径路を考慮したFPGA向けレイアウト再構成手法

    電子情報通信学会第10回回路とシステム軽井沢ワークショップ論文集    1997.04

  • A circuit partitioning algorithm with path delay constraints for multi-FPGA systems

    N Togawa, M Sato, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E80A ( 3 ) 494 - 505  1997.03

     View Summary

    In this paper, we extend the circuit partitioning algorithm which we have proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is within a specified upper bound imposed on it. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: 0) detection of critical paths; 1) bipartitioning of a set of primary inputs and outputs; and 2) bipartitioning of a set of logic-blocks. In 0), the algorithm computes the lower bounds of delays for paths with path delay constraints and detects the critical paths based on the difference between the lower and upper bound dynamically in every bipartitioning procedure. The delays of the critical paths are reduced with higher priority. In 1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally in 2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints with maintaining the maximum number of required I/O blocks per chip small compared with conventional algorithms.

  • スケッチレイアウトシステムにおけるBGAパッケージ配線手法

    電子情報通信学会VLSI設計技術研究会   VLD96;96  1997.03

  • 接続コストの最小化を目的とした高速アロケーション手法

    KATO Kenkichi, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    電子情報通信学会VLSI設計技術研究会   VLD96;96 ( 556 ) 1 - 8  1997.03

     View Summary

    In this paper, we propose a fast resource allocation algorithm for data path design of digital signal processors, whose objective is minimizing interconnection costs of the architecture at the register-transfer level. The proposed algorithm is composed of (1) register assignment by bipartite weighted matching, (2) functional unit assignment by interconnection probability, and (3) register reassignment by interconnection probability. Since the algorithm takes account of interconnection costs in (1), (2), and (3), it obtains near-optimal solutions in a short time. The experimental results show the effectiveness and efficiency of our algorithm.
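
    As an illustration of step (1) only, the sketch below treats register assignment as a minimum-cost assignment between variables and registers. The cost matrix and the function name min_cost_assignment are hypothetical, and exhaustive search stands in for a proper weighted bipartite matching solver; it is not the procedure from the paper.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Assign each variable (row) to a distinct register (column) at minimum total cost.

    cost[v][r] is an assumed interconnection cost of storing variable v in register r.
    Exhaustive search is used for clarity; it is only practical for tiny instances.
    """
    n_vars, n_regs = len(cost), len(cost[0])
    best_total, best_regs = None, None
    # Try every way of picking n_vars distinct registers, in order.
    for regs in permutations(range(n_regs), n_vars):
        total = sum(cost[v][r] for v, r in enumerate(regs))
        if best_total is None or total < best_total:
            best_total, best_regs = total, regs
    return best_total, list(best_regs)


# Hypothetical 3-variable, 3-register cost matrix.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = min_cost_assignment(cost)
print(total, assignment)  # assignment[v] is the register chosen for variable v
```

    For realistic problem sizes, a polynomial-time matching method such as the Hungarian algorithm would replace the exhaustive loop.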

    CiNii

  • A Circuit Partitioning Algorithm with Replication Capability for Multi-FPGA Systems

    IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences   E78-A/13   1118 - 1123  1997

  • A performance-oriented circuit partitioning algorithm with logic-block replication for multi-FPGA systems

    Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki

    Journal of Circuits, Systems and Computers   7 ( 5 ) 373 - 393  1997

     View Summary

    In this paper, we extend the circuit partitioning algorithm which we had proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is within a specified upper bound imposed on it. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: (0) detection of critical paths; (1) bipartitioning of a set of primary inputs and outputs; and (2) bipartitioning of a set of logic-blocks. In (0), the algorithm computes the lower bounds of delays for paths with path delay constraints and detects the critical paths based on the difference between the lower and upper bounds dynamically in every bipartitioning procedure. The delays of the critical paths are reduced with higher priority. In (1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally in (2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints while maintaining the maximum number of required I/O blocks per chip small compared with conventional algorithms.

    DOI

    Scopus

  • Simultaneous placement and global routing for transport-processing FPGA layout

    N Togawa, M Sato, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E79A ( 12 ) 2140 - 2150  1996.12

     View Summary

    Transport-processing FPGAs have been proposed for flexible telecommunication systems. Since those FPGAs have finer granularity of logic functions to implement circuits on them, the amount of routing resources tends to increase. In order to keep routing congestion small, it is necessary to execute placement and routing simultaneously. This paper proposes a simultaneous placement and global routing algorithm for transport-processing FPGAs whose primary objective is minimizing routing congestion. The algorithm is based on hierarchical bipartition of layout regions and sets of LUTs (LookUp Tables) to be placed. It achieves bipartitioning which leads to small routing congestion by applying a network flow technique to it and computing a maximum flow and a minimum cut. If there exist connections between bipartitioned LUT sets, pairs of pseudo-terminals are introduced to preserve the connections. A sequence of pseudo-terminals represents a global route of each net. As a result, both placement of LUTs and global routing are determined when hierarchical bipartitioning procedures are finished. The proposed algorithm has been implemented and applied to practical transport-processing circuits. The experimental results demonstrate that it decreases routing congestion by an average of 37% compared with a conventional algorithm and achieves 100% routing for the circuits for which the conventional algorithm causes unrouted nets.
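
    The maximum-flow/minimum-cut idea mentioned above can be sketched generically as follows. This is a textbook Edmonds-Karp routine on a toy graph, not the paper's formulation: the node names, the anchors "s" and "t" standing for the two subregions, the edge weights, and the helper names undirected and min_cut_bipartition are all assumptions made for illustration.

```python
from collections import deque


def undirected(edges):
    """Build a symmetric capacity map from (u, v, weight) triples."""
    cap = {}
    for u, v, w in edges:
        cap.setdefault(u, {})[v] = w
        cap.setdefault(v, {})[u] = w
    return cap


def min_cut_bipartition(capacity, source, sink):
    """Return (max_flow, source_side) using Edmonds-Karp augmenting paths."""
    # Residual graph: copy capacities and make sure every edge has a reverse entry.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    max_flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            # Nodes still reachable from the source form one side of a minimum cut.
            return max_flow, set(parents)

        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck


# Toy netlist: four LUTs (a-d) tied to two hypothetical region anchors.
nets = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "c", 1),
        ("b", "d", 1), ("c", "d", 2), ("c", "t", 3), ("d", "t", 2)]
flow, left = min_cut_bipartition(undirected(nets), "s", "t")
print(flow, sorted(left))
```

    In this toy case the cut separates LUTs a and b from c and d, so only the two weight-1 nets cross between the subregions.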

  • Dharmaアーキテクチャに基づくFPGAチップの試作

    マイクロエレクトロニクス研究開発機構第15回研究交流会    1996.12

  • Scheduling and Allocation Algorithms for Entropy CODEC

    SUZUKI K.

    Proceedings of the Sixth Workshop on Synthesis and System Integration of Mixed Technologies (SASIMI'96)     149 - 154  1996.11

    CiNii

  • パス長制約を考慮した通信処理用FPGA向け配置・概略配線同時処理手法

    TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    情報処理学会設計自動化研究会   DA96;81 ( 299 ) 9 - 16  1996.10

     View Summary

    In layout design of transport-processing FPGAs, it is required that not only routing congestion is kept small but also circuits implemented on them operate with higher operation frequency. This paper extends the proposed simultaneous placement and global routing algorithm for transport-processing FPGAs whose objective is to minimize routing congestion and proposes a new algorithm in which the length of each critical signal path (path length) is limited within a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (LookUp Table) sets to be placed. Each bipartitioning procedure consists of three phases: (0) estimation of path lengths, (1) bipartitioning of a set of terminals, and (2) bipartitioning of a set of LUTs. After searching the paths with tighter path length constraints by estimating path lengths in (0), (1) and (2) are executed so that their path lengths are reduced with higher priority and thus path length constraints are not violated. The experimental results demonstrate the efficiency and effectiveness of the algorithm.

    CiNii

  • 高位合成システムを用いた画像符号化アルゴリズムのハードウェア合成法

    情報処理学会DAシンポジウム'96論文集    1996.08

  • データパス設計を対象とした高位合成システム

    情報処理学会DAシンポジウム'96論文集    1996.08

    CiNii

  • 安藤研究所第9回安藤博記念学術奨励賞

       1996.06

  • 通信処理用FPGAを対象とした配置・概略配線同時処理手法

    TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    情報処理学会設計自動化研究会   DA96;80   15 - 22  1996.05

     View Summary

    This paper proposes a simultaneous placement and global routing algorithm for transport-processing FPGAs whose primary objective is minimizing routing congestion. The algorithm is based on hierarchical bipartition of layout regions and sets of LUTs (LookUp Tables) to be placed. It achieves bipartitioning which leads to small routing congestion by applying a network flow technique to it and computing a maximum flow and a minimum cut. If there exist connections between bipartitioned LUT sets, pairs of pseudo-terminals are introduced to preserve the connections. A sequence of pseudo-terminals represents a global route of each net. As a result, both placement of LUTs and global routing are determined when hierarchical bipartitioning procedures are finished. The experimental results for practical transport-processing circuits show its efficiency and effectiveness.

    CiNii

  • プリント配線板を対象とした二層均等化スペーシング手法

    情報処理学会設計自動化研究会   DA96;80 ( 51 ) 9 - 14  1996.05

    CiNii

  • 電子情報通信学会第8回回路とシステム軽井沢ワークショップ研究奨励賞

       1996.04

  • A simultaneous technology mapping, placement, and global routing algorithm for FPGAs with path delay constraints

    N Togawa, M Sato, T Ohtsuki

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E79A ( 3 ) 321 - 329  1996.03  [Refereed]

     View Summary

    In this paper, we propose a new FPGA design algorithm, Maple-opt, in which technology mapping, placement, and global routing are executed so that the delay of each critical signal path in an input circuit is within a specified upper bound imposed on it. The basic algorithm of Maple-opt is top-down hierarchical bi-partitioning of regions. Technology mapping onto logic-blocks of FPGAs, their placement, and global routing are determined simultaneously in each hierarchical process. This simultaneity leads to a less congested layout for routing. In addition, Maple-opt computes a lower bound of delay for each path with a constraint value and determines critical paths based on the difference between the lower bound and the constraint value dynamically in each hierarchical process. Two delay reduction processes are executed for the critical paths; one is routing delay reduction and the other is logic-block delay reduction. Routing delay reduction is realized such that, when bi-partitioning a region, each constrained path is assigned to one subregion. Logic-block delay reduction is realized such that each constrained path is mapped onto fewer logic-blocks. Experimental results for some benchmark circuits show its efficiency and effectiveness.

  • A simultaneous placement and global routing algorithm with path length constraints for transport-processing FPGAs

    N Togawa, M Sato, T Ohtsuki

    PROCEEDINGS OF THE ASP-DAC '97 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1997     569 - 578  1996  [Refereed]

     View Summary

    In layout design of transport-processing FPGAs, it is required that not only routing congestion is kept small but also circuits implemented on them operate with higher operation frequency. This paper extends the proposed simultaneous placement and global routing algorithm for transport-processing FPGAs whose objective is to minimize routing congestion and proposes a new algorithm in which the length of each critical signal path (path length) is limited within a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (LookUp Table) sets to be placed. Each bipartitioning procedure consists of three phases: (0) estimation of path lengths, (1) bipartitioning of a set of terminals, and (2) bipartitioning of a set of LUTs. After searching the paths with tighter path length constraints by estimating path lengths in (0), (1) and (2) are executed so that their path lengths are reduced with higher priority and thus path length constraints are not violated. The algorithm has been implemented and applied to transport-processing circuits compared with conventional approaches. The results demonstrate that the algorithm resolves path length constraints for 11 out of 13 circuits, though it increases routing congestion by an average of 20%. After detailed routing, it achieves 100% routing for all the circuits and decreases a circuit delay by an average of 23%.

    DOI

  • A performance-oriented circuit partitioning algorithm with logic-block replication for multi-FPGA systems

    N Togawa, M Sato, T Ohtsuki

    APCCAS '96 - IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS '96     294 - 297  1996

     View Summary

    This paper proposes a circuit partitioning algorithm in which the delay of each critical signal path is kept within a specified upper bound. Its core is recursive bipartitioning of a circuit, which consists of three stages: 0) detection of critical paths; 1) bipartitioning of a set of primary inputs and outputs; and 2) bipartitioning of a set of logic-blocks. In 0), the algorithm detects the critical paths based on the lower bounds of their delays. The delays of the critical paths are reduced with higher priority. In 1), the algorithm attempts to assign the primary input and output on each critical path to one chip. In 2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique with logic-block replication. The experimental results demonstrate that it satisfies almost all path delay constraints while keeping the maximum number of required I/O blocks per chip small compared with conventional algorithms.

  • Maple-opt: a simultaneous technology mapping, placement, and global routing algorithm for FPGAs with performance optimization.

    Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki

    Proceedings of the 1995 Conference on Asia Pacific Design Automation, Makuhari, Massa, Chiba, Japan, August 29 - September 1, 1995     319 - 327  1995  [Refereed]

    DOI CiNii

  • MAPLE - A SIMULTANEOUS TECHNOLOGY MAPPING, PLACEMENT, AND GLOBAL ROUTING ALGORITHM FOR FIELD-PROGRAMMABLE GATE ARRAYS

    N TOGAWA, M SATO, T OHTSUKI

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E77A ( 12 ) 2028 - 2038  1994.12  [Refereed]

     View Summary

    Technology mapping algorithms for LUT (look-up table) based FPGAs have been proposed to transform a Boolean network into logic-blocks. However, since those algorithms take no layout information into account, they do not always lead to good results. In this paper, a simultaneous technology mapping, placement and global routing algorithm for FPGAs, Maple, is presented. Maple is an extended version of a simultaneous placement and global routing algorithm for FPGAs, which is based on recursive partitioning of layout regions and block sets. Maple inherits its basic process and executes technology mapping simultaneously in each recursive step. Therefore, the mapping can be done using the placement and global routing information. Experimental results for some benchmark circuits demonstrate its efficiency and effectiveness.

  • MAPLE - A SIMULTANEOUS TECHNOLOGY MAPPING, PLACEMENT, AND GLOBAL ROUTING ALGORITHM FOR FPGAS

    N TOGAWA, M SATO, T OHTSUKI

    APCCAS '94 - 1994 IEEE ASIA-PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS     554 - 559  1994  [Refereed]

  • A SIMULTANEOUS PLACEMENT AND GLOBAL ROUTING ALGORITHM FOR FPGAS

    N TOGAWA, M SATO, T OHTSUKI

    1994 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 1     A483 - A486  1994  [Refereed]

    DOI

  • A simultaneous technology mapping, placement, and global routing algorithm for field-programmable gate arrays.

    Nozomu Togawa, Masao Sato, Tatsuo Ohtsuki

    Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 1994, San Jose, California, USA, November 6-10, 1994     156 - 163  1994  [Refereed]

    DOI CiNii


Books and Other Publications

  • CMOS VLSI 回路設計 応用編

    ウェスト,ハリス著, 宇佐美公良, 池田誠, 小林和淑監訳, 戸川望他分担共訳( Part: Joint translator)

    丸善出版  2014.01 ISBN: 9784621087206

  • 組込みシステム概論

    戸川望編著

    CQ出版  2008.02 ISBN: 9784789845502

Research Projects

  • 量子・古典ハイブリッドテストベッドの利用環境整備

    戦略的イノベーション創造プログラム(SIP)第三期

    Project Year :

    2023.10
    -
    2028.03
     

  • 再構成アクセラレータのための近似最適化手法

    日本学術振興会  科学研究費助成事業

    Project Year :

    2023.04
    -
    2026.03
     

    木村 晋二, 戸川 望, 孫 鶴鳴

  • 量子計算及びイジング計算システムの統合型研究開発

    NEDO 高効率・高速処理を可能とするAIチップ・次世代コンピューティングの技術開発

    Project Year :

    2021.04
    -
    2026.03
     

  • 攻撃に耐性を持つ機械学習モデルによる設計工程ハードウェアトロイ検知

    日本学術振興会  科学研究費助成事業

    Project Year :

    2022.04
    -
    2025.03
     

    戸川 望, 木村 晋二

  • 地理空間情報を自在に操るイジング計算機の新展開

    科学技術振興機構  戦略的な研究開発の推進 戦略的創造研究推進事業 CREST

    Project Year :

    2019
    -
    2024
     

    戸川 望

     View Summary

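    This research focuses on advancing geospatial information processing, which is indispensable for realizing Society 5.0. It fundamentally transforms the programming paradigm of von Neumann computing and establishes Ising programming for geospatial information processing. Many geospatial information processing problems, such as optimal multi-point tour route search under multiple kinds of constraints, are mapped onto the Ising model and embedded in real Ising machines in order to solve geospatial information processing problems of practical scale with practical constraints.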

  • 次世代アクセラレータ基盤に係る研究開発

    戦略的イノベーション創造プログラム(SIP)第二期

    Project Year :

    2019.10
    -
    2023.03
     

  • Hardware-Trojan Detection for Integrated Circuit Design Data based on Machine Learning

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research Grant-in-Aid for Scientific Research (B)

    Project Year :

    2019.04
    -
    2022.03
     

    Togawa Nozomu

     View Summary

    Recently, as Internet of Things (IoT) devices become widespread, the demand for embedded hardware devices has been increasing. In order to produce embedded hardware devices more inexpensively, the manufacturing bases have been internationalized, and several processes in the IC design and manufacturing steps have been outsourced to third-party vendors. Under the circumstances, a hardware Trojan, which is a malicious function circuit inserted into a hardware device, may be inserted into IC products by the malicious third-party vendors, and therefore the risk of hardware Trojans has arisen. In this research, we have developed a machine-learning-based hardware Trojan detection method to detect known and unknown hardware Trojans effectively and efficiently.

  • Optimum Data Representation and Its Accuracy Assurance for Reconfigurable Accelerators

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2018.04
    -
    2021.03
     

    Kimura Shinji

     View Summary

    This project addresses optimum data representation and its accuracy assurance for reconfigurable accelerators, including reconfigurable hardware modules such as FPGAs (Field Programmable Gate Arrays). A reconfigurable accelerator can construct dedicated hardware accelerators depending on the application. In the optimization of data representation for reconfigurable accelerators, area, delay, and power are optimized within the error tolerance of the application. For image processing and image recognition applications, new data representation methods, operational units for those representations, and their evaluation methods have been devised and evaluated.

  • イジングマシン共通ソフトウェア基盤の研究開発

    NEDO  高効率・高速処理を可能とするAIチップ・次世代コンピューティングの技術開発

    Project Year :

    2018.09
    -
    2021.03
     

    戸川 望

  • 設計・製造におけるチップの脆弱性検知手法の研究開発

    総務省・ICT重点技術の研究開発プロジェクト

    Project Year :

    2019.09
    -
    2020.03
     

  • Abstract LSI Model and Its Associated High-Level Synthesis Algorithm for Deep Submicron Technologies

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(基盤研究(B))

    Project Year :

    2010
    -
    2012
     

    戸川 望

     View Summary

    In this research, we first developed an abstract LSI model that introduces "logical connection" and "physical connection" among registers, controllers, and functional units inside an LSI chip. Using our abstract LSI model, we obtain a well-defined interface between high-level design and physical-level design. Second, we developed a high-level synthesis algorithm for our abstract LSI model, which realizes physical-synthesis-aware high-level synthesis. Our simulation results demonstrate that our abstract LSI model and its associated high-level synthesis outperform several conventional LSI synthesis methods.

  • Indoor Route Search and Display Methods based on Cognitive Science and Printed Circuit Board Wiring

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(挑戦的萌芽研究)

    Project Year :

    2009
    -
    2011
     

    戸川 望

     View Summary

    Mobile communication services are growing rapidly thanks to the development of mobile communication networks and the downsizing of mobile devices. However, such services are available only in outdoor environments, since it is difficult to trace positions indoors. Moreover, there is little research on automatically modeling indoor environments. In this research, we propose a method to automatically model indoor environments by making use of PCB wiring techniques. Based on this modeling, we propose an indoor pedestrian navigation system that includes route searching and route guidance. We have shown the effectiveness of our proposed systems through practical experiments.

  • 選択論理を利用した超高速な差積演算回路とその実応用回路の設計

    産学が連携した研究開発成果の展開 研究成果展開事業 研究成果最適展開支援プログラム(A-STEP) 探索タイプ

    Project Year :

    2010
     
     
     

    戸川 望

     View Summary

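    The difference-product operation appears frequently as a basic operation in important application circuits such as the fast Fourier transform and super-resolution processing, but the ordering dependency between subtraction and multiplication increases its computation time. To address this, the applicant succeeded in expanding the difference-product operation in binary form and expressing its intermediate terms in the form xizi+yizi (xi, yi, and zi are 1-bit variables). This form, called selector logic, generates no carry; computing it quickly as preprocessing halves the number of intermediate terms, that is, it reduces the computation time or the area of the difference-product operation by up to one half. Building on this result, this project first sped up the butterfly operation, the basic operation of the fast Fourier transform, and then sped up and reduced the area of weighted addition, the basic operation of super-resolution processing. The butterfly operation was sped up by 11% or more, and weighted addition was sped up by 25% or more with an area reduction of up to 50%. Going forward, faster and smaller basic operations are expected to benefit directly the OFDM (orthogonal frequency-division multiplexing) scheme used in digital television standards, super-resolution processing, and 3D image processing, and through them to bring major advances to, for example, next-generation digital television broadcasting.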

  • Next-Generation High-Level Synthesis System Based on Deep Submicron Technology

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(若手研究(B))

    Project Year :

    2006
    -
    2008
     

    戸川 望

  • 高信頼IoT社会を実現する分散型基盤アーキテクチャの研究開発

    NEDO  エネルギー・環境新技術先導プログラム

    戸川 望

  • 極低エネルギー化を実現する統合化システムLSI設計技術

    NEDO  先導的産業技術創出事業

    戸川 望

  • IoT部品・機器・ネットワークの階層横断セキュリティ技術の研究開発

    総務省  戦略的情報通信研究開発推進事業(SCOPE)

    戸川 望

  • 設計工程に侵入したハードウェアトロイの検出と耐ハードウェアトロイ設計技術の研究開発

    総務省  戦略的情報通信研究開発推進事業(SCOPE)

    戸川 望

  • 不揮発メモリの書込みビット数を厳密に最小化する符号化とノーマリオフ計算への応用

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(挑戦的研究(萌芽))

    戸川 望

  • 画像処理向け組込みプロセッサのハ-ドウェア/ソフトウェア協調設計手法に関する研究

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(奨励研究(A))

    戸川 望

  • FPGAを対象とした動的再構成可能システムとその設計環境に関する研究

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(奨励研究(A))

    戸川 望

  • 高速大容量ネットワークプロセッサ設計システムに関する研究

    科学研究費助成事業(北九州市立大学)  科学研究費助成事業(若手研究(B))

    戸川 望

  • セレクタ論理を利用し部分積項数を半減する差積演算回路設計とその画像処理応用

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(挑戦的萌芽研究)

    戸川 望

  • 大域的超低エネルギー化を実現するLSI抽象モデルと上位下位統合化LSI設計技術

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(基盤研究(B))

    戸川 望


Misc

  • イジングモデルの係数削減による量子イジングマシンの出力改善の評価

    谷地, 悠太, 多和田, 雅師, 戸川, 望

    DAシンポジウム2023論文集   2023   141 - 148  2023.08

     View Summary

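    A quantum Ising machine takes an Ising model as input and solves a combinatorial optimization problem by searching for its ground state. However, an error arises between the coefficients of the input Ising model and the coefficients that the quantum Ising machine actually solves. Because of this error, the quantum Ising machine does not necessarily output the ground state of the input Ising model. This paper shows how the coefficients of the Ising model affect the probability that a quantum Ising machine obtains the ground state. Experiments show that this probability decreases as the ratio between the maximum and minimum absolute values of the coefficients of the input Ising model grows. In addition, based on the same experimental results, we examine a method for estimating the magnitude of the error that arises when an Ising model is input to a quantum Ising machine. Furthermore, experiments show that reducing the ratio between the maximum and minimum absolute values of the coefficients of the input Ising model raises the probability that the quantum Ising machine obtains the ground state. These results suggest that reducing the coefficients of the input Ising model may be an effective approach to raising the probability that a quantum Ising machine obtains the ground state.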
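
    The quantity this summary tracks, the ratio between the largest and smallest absolute coefficient values of an Ising model, can be illustrated with a minimal sketch. The function name `coefficient_ratio`, the dictionary layout for the linear terms `h` and couplings `J`, and the choice to ignore zero coefficients are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Tuple


def coefficient_ratio(h: Dict[int, float], J: Dict[Tuple[int, int], float]) -> float:
    """Return max|c| / min|c| over all nonzero coefficients c of an Ising model.

    h maps a spin index to its linear coefficient and J maps a spin pair to
    its coupling coefficient. Treating h and J together and skipping zero
    coefficients are assumptions of this sketch.
    """
    magnitudes = [abs(v) for v in list(h.values()) + list(J.values()) if v != 0]
    if not magnitudes:
        raise ValueError("model has no nonzero coefficients")
    return max(magnitudes) / min(magnitudes)


# Hypothetical two-spin model: the largest magnitude is 4.0, the smallest 0.5.
example_h = {0: 1.0, 1: -0.5}
example_J = {(0, 1): 4.0}
print(coefficient_ratio(example_h, example_J))
```

    A larger value of this ratio corresponds to the regime in which the summary reports a lower probability of reaching the ground state.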

  • 量子ビット読み出し時間を削減するトポロジ周期性活用のマイナ埋め込み手法

    多和田, 雅師, 戸川, 望

    DAシンポジウム2023論文集   2023   167 - 172  2023.08

     View Summary

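    Minor embedding increases latency when quantum annealing is executed. To reduce this latency, one strategy is to prepare in advance a minor embedding pattern from a complete graph to the hardware topology. Existing methods treat the input logical Ising model as a complete graph and generate a minor embedding pattern so that minor embedding can be skipped at run time. We found that variation among individual qubits increases the readout time of existing methods. This paper proposes a minor embedding method for complete graphs that takes the variation among individual qubits into account and minimizes the readout time. The proposed method exploits the periodicity of the hardware topology of quantum annealing machines: it shifts the original minor embedding pattern unit cell by unit cell and searches for the minor embedding pattern with the minimum readout time. Computational experiments confirmed that, compared with the existing method, the proposed method reduces the readout time, which is part of the execution time of quantum annealing.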

  • ACOによる時間変化に対応した旅行計画最適化手法

    佐伯, 越志, 鮑, 思雅, 高山, 敏典, 戸川, 望

    マルチメディア,分散,協調とモバイルシンポジウム2023論文集   2023   490 - 503  2023.06

     View Summary

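    With the promotion of the tourism industry and advances in information science and technology, technologies that assist users in travel planning are being developed. Travel planning must generate routes that satisfy users by simultaneously optimizing multiple objective functions such as popularity and cost. Moreover, to give users detailed travel information and generate routes that are easy to follow, time-dependent travel times and time-dependent values of tourist spots should be considered. For example, when public transportation is used, the travel time varies with the departure time because of timetables and routes. The value of a tourist spot also changes with the visiting time, as with spots known for their night views, spots that hold events, and spots with opening hours. This paper proposes an optimization method for the time-dependent multi-objective travel planning problem that accounts for time-varying values in travel planning and can optimize multiple objective functions. The proposed method solves the problem with ant colony optimization by setting up ants that weigh the objective functions differently and by attaching a time attribute to the pheromone. In particular, it handles the time-dependent value of tourist spots by using timestamped travel histories of past users, and it handles time-varying travel times by using a detailed route API. Anticipating longer response times when the detailed route API is used, it also introduces a technique to reduce the number of API calls. In evaluation experiments, the proposed method generated travel routes that better optimize time-varying values than existing methods.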

  • 歩行特性を利用したスマートフォン階段昇降推定

    梶本, 大, 佐伯, 越志, 鮑, 思雅, 戸川, 望

    マルチメディア,分散,協調とモバイルシンポジウム2023論文集   2023   329 - 335  2023.06

     View Summary

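    We estimate our own position on a daily basis, with GPS (Global Positioning System) and other means. However, in environments where GPS is unavailable, relative positioning methods such as PDR (Pedestrian Dead Reckoning), which uses the sensors of a mobile device, are needed. In complex indoor spaces in particular, pedestrians move not only horizontally but also vertically. Vertical movement must then be estimated accurately not only for means of travel such as elevators and escalators, where the pedestrian experiences little shaking or vibration, but also for stairs, where shaking and vibration occur irregularly. This paper proposes a stair ascent and descent estimation method using a smartphone. The proposed method uses the pedestrian's gait characteristics to detect the horizontal parts of floors and cancel the error of the barometric sensor, and thereby estimates the floor within a staircase with high accuracy. Furthermore, by exploiting the property that barometric sensor readings do not depend on the orientation of the smartphone, it achieves stair ascent and descent estimation that is independent of smartphone orientation. In evaluation experiments, the proposed method estimated stair ascent and descent with lower estimation error than existing methods.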

  • 部分QUBOアニーリングによる複数日旅程最適化問題の解法

    野口竜弥, 深田佳佑, 鮑思雅, 戸川望

    情報処理学会研究報告(Web)   2023 ( ITS-092 )  2023

    J-GLOBAL

  • ACOによる多目的要求に対応した旅行計画最適化手法

    佐伯, 越志, 鮑, 思雅, 高山, 敏典, 戸川, 望

    マルチメディア,分散,協調とモバイルシンポジウム2022論文集   2022   1556 - 1569  2022.07

     View Summary

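    With the promotion of the tourism industry and advances in information science and technology, travel planning services are being developed. The travel planning targeted by such services must generate routes that satisfy users by simultaneously optimizing multiple objective functions such as satisfaction and cost. In particular, because many past users have planned similar or partially similar itineraries, how to reuse the travel routes of past users is a key to travel planning. To satisfy user requirements for travel planning, this paper proposes a travel planning optimization method that is based on the multi-objective orienteering problem and explicitly uses the travel routes of past users. By using ant colony optimization, the proposed method enables travel planning that explicitly reflects the travel routes of past users. It then solves the multi-objective orienteering problem by varying the behavior of the ants in ant colony optimization according to the various objective functions. In evaluation experiments, the method generated travel routes that are closer to those of past travelers and better satisfy user requirements than existing methods.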

  • Analysis for Hybrid Method using Quantum Annealing Machine and Nonquantum Type Ising Machine

    菊池脩太, 戸川望, 田中宗

    日本物理学会講演概要集(CD-ROM)   77 ( 2 )  2022

    J-GLOBAL

  • Multi-spin flip method for an Ising machine

    白井達彦, 戸川望

    日本物理学会講演概要集(CD-ROM)   77 ( 1 )  2022

    J-GLOBAL

  • 歩行時の加速度の周期性によるスマートグラス端末姿勢推定手法

    佐藤大生, 戸川望

    電子情報通信学会技術研究報告(Web)   122 ( 190(ITS2022 6-9) )  2022

    J-GLOBAL

  • 係数分割によるイジングモデルのビット幅削減の検討

    谷地悠太, 多和田雅師, 戸川望

    情報処理学会研究報告(Web)   2022 ( SLDM-199 )  2022

    J-GLOBAL

  • Analysis of dynamical process on the bit-width reduced Ising model

    菊池脩太, 戸川望, 田中宗

    日本物理学会講演概要集(CD-ROM)   77 ( 1 )  2022

    J-GLOBAL

  • イジングマシンを用いた複数日にまたがる観光地選出手法

    鮑思雅, 戸川望

    電子情報通信学会大会講演論文集(CD-ROM)   2022  2022

    J-GLOBAL

  • Hardware-Trojan Classification at Practical Trojan Netlists Utilizing Random Forests

    コンピュータセキュリティシンポジウム2021論文集     9 - 16  2021.10

    CiNii

  • Power-Analysis Based Anomalous Behavior Detection Utilizing Steady State Power Waveform Generated by LSTM with Many Output Dimensions

    コンピュータセキュリティシンポジウム2021論文集     17 - 24  2021.10

    CiNii

  • Hardware-Trojan Detection Utilizing Graph Neural Networks at Gate-Level Netlists

    コンピュータセキュリティシンポジウム2021論文集     1 - 8  2021.10

    CiNii

  • ストカスティック数を用いた絶対値関数及び不連続関数の実装と評価

    石川 遼太, 多和田 雅師, 戸川 望

    DAシンポジウム2021論文集   ( 2021 ) 65 - 70  2021.08

    CiNii

  • スマートフォンとスマートウォッチを併用したPDRによる屋内位置推定

    若泉 朋弥, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2020論文集   ( 2020 ) 1290 - 1302  2020.06

    CiNii

  • モンテカルロ木探索を用いたユーザ個人の嗜好を考慮した経路推薦手法の高速化

    石崎 雄太, 高山 敏典, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2020論文集   ( 2020 ) 1303 - 1310  2020.06

    CiNii

  • メタヒューリスティクスの制約なし二次形式二値変数最適化問題への適用 (システム数理と応用)

    多和田 雅師, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 470 ) 43 - 48  2020.03

    CiNii

  • イジング計算機による3次元直方体パッキング問題の解法 (VLSI設計技術)

    金丸 翔, 寺田 晃太朗, 川村 一志, 田中 宗, 富田 憲範, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 443 ) 173 - 178  2020.03

    CiNii

  • トリガ回路の性質にもとづく特徴量を利用したニューラルネットワークによるハードウェアトロイ識別 (VLSI設計技術)

    井上 智貴, 長谷川 健人, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 443 ) 227 - 232  2020.03

    CiNii

  • イジングマシンを用いたアミューズメントパークの経路最適化手法 (VLSI設計技術)

    武笠 陽介, 若泉 朋弥, 田中 宗, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 443 ) 167 - 172  2020.03

    CiNii

  • 乱数化関数を用いた乱数生成回路を共有するストカスティック数生成器 (VLSI設計技術)

    多和田 雅師, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 443 ) 163 - 166  2020.03

    CiNii

  • Performance comparison of integer encoding methods in Ising machines

    Tamura Kensuke, Shirai Tatsuhiko, Katsura Hosho, Tanaka Shu, Togawa Nozomu

    Meeting Abstracts of the Physical Society of Japan   75.1   2351 - 2351  2020

    DOI CiNii

  • イジングモデルによる類似誘導部分グラフ同型問題の解法 (VLSI設計技術) -- (デザインガイア2019 : VLSI設計の新しい大地)

    吉村 夏一, 多和田 雅師, 田中 宗, 新井 淳也, 巴 徳瑪, 八木 哲志, 内山 寛之, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 282 ) 103 - 108  2019.11

    CiNii

  • ストカスティック計算におけるステップ関数の実装と評価 (VLSI設計技術) -- (デザインガイア2019 : VLSI設計の新しい大地)

    石川 遼太, 多和田 雅師, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 282 ) 69 - 74  2019.11

    CiNii

  • 低密度パリティ検査符号復号問題を制約なし二次形式二値変数最適化問題に変換した解法

    多和田 雅師, 田中 宗, 戸川 望

    DAシンポジウム2019論文集   ( 2019 ) 45 - 50  2019.08

    CiNii

  • スリープ状態をもつ組込みシステムを対象とした電力解析にもとづく異常動作検知とその実証的評価

    長谷川 健人, 近松 聖, 戸川 望

    DAシンポジウム2019論文集   ( 2019 ) 93 - 98  2019.08

    CiNii

  • ストカスティック数を用いた再帰的分割による解像度解釈可変な画像形式 (VLSI設計技術)

    石川 遼太, 多和田 雅師, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   119 ( 154 ) 71 - 76  2019.07

    CiNii

  • モンテカルロ木探索によるユーザ個人の嗜好を考慮した経路推薦手法とその評価

    石崎 雄太, 高山 敏典, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2019論文集   ( 2019 ) 854 - 862  2019.06

    CiNii

  • 動的な歩幅更新をベースとするマップマッチングによるPDR手法

    西村 天晴, 高山 敏典, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2019論文集   ( 2019 ) 1663 - 1669  2019.06

    CiNii

  • スマートフォン搭載センサを用いた自転車の挙動認識の向上

    宇佐見 友理, 石川 和明, 高山 敏典, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2019論文集   ( 2019 ) 1670 - 1675  2019.06

    CiNii

  • 2ⁿRRR : 高度な並び替えにより誤り耐性を強化したストカスティック数複製器 (VLSI設計技術) -- (デザインガイア2018 : VLSI設計の新しい大地)

    石川 遼太, 多和田 雅師, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   118 ( 334 ) 95 - 100  2018.12

    CiNii

  • 高位合成時のモジュール分割におけるバッファコスト最小化問題とその解法

    大場 諒介, 川村 一志, 田宮 豊, 柳澤 政生, 戸川 望

    DAシンポジウム2018論文集   ( 2018 ) 63 - 68  2018.08

    CiNii

  • 低電力化電気容量検出装置を用いた動作中の不正デバイス検知

    西澤 誠人, 長谷川 健人, 柳澤 政生, 戸川 望

    DAシンポジウム2018論文集   ( 2018 ) 112 - 117  2018.08

    CiNii

  • マイクロコントローラのスリープ状態に着目した消費電力にもとづく悪意のある機能の発現検知

    長谷川 健人, 柳澤 政生, 戸川 望

    DAシンポジウム2018論文集   ( 2018 ) 118 - 123  2018.08

    CiNii

  • スマートフォン搭載3軸加速度センサと3軸ジャイロセンサを用いた自転車の挙動認識

    宇佐見 友理, 石川 和明, 高山 敏典, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2018論文集   ( 2018 ) 32 - 42  2018.06

    CiNii

  • POIを考慮した経路長指定の複数巡回経路探索手法

    西村 天晴, 石川 和明, 高山 敏典, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2018論文集   ( 2018 ) 1612 - 1621  2018.06

    CiNii

  • 再収斂による計算誤りに耐性を持つストカスティック数複製器を用いた活性化関数の実装と評価

    石川遼太, 多和田雅師, 柳澤政生, 戸川望

    電子情報通信学会技術研究報告   118 ( 83 ) 167 - 172  2018.06

    CiNii J-GLOBAL

  • イジング計算機によるスロット配置問題の解法

    金丸翔, 於久太祐, 多和田雅師, 田中宗, 林真人, 山岡雅直, 柳澤政生, 戸川望

    電子情報通信学会技術研究報告   118 ( 85(MSS2018 1-36) ) 161‐166  2018.06

    J-GLOBAL

  • 亜種ハードウェアトロイの設計とそのニューラルネットワークを用いた検出

    井上智貴, 長谷川健人, 小林悠記, 柳澤政生, 戸川望

    電子情報通信学会技術研究報告   118 ( 85(MSS2018 1-36) ) 173‐178  2018.06

    J-GLOBAL

  • リーク削減による低消費電力SRAMの設計—A low power SRAM design with leakage power reduction

    伊藤 卓, 戸川 望, 柳澤 政生, 史 又華

    回路とシステムワークショップ論文集 Workshop on Circuits and Systems   31   197 - 202  2018.05

    CiNii

  • 低周波圧電エネルギーハーベスティングにおけるMOSs SP-SSHI手法—MOSs SP-SSHI for low frequency piezoelectric energy harvesting

    杉山 貴紀, 戸川 望, 柳澤 政生, 史 又華

    回路とシステムワークショップ論文集 Workshop on Circuits and Systems   31   86 - 91  2018.05

    CiNii

  • CNNに対する概算加算器の適用と評価—Application and evaluation of CNN with approximate adders

    井上 雄太, 戸川 望, 柳澤 政生, 史 又華

    回路とシステムワークショップ論文集 Workshop on Circuits and Systems   31   191 - 196  2018.05

    CiNii

  • 効率的なストカスティック数複製器と合成関数回路を用いたその評価

    石川遼太, 多和田雅師, 柳澤政生, 戸川望

    電子情報通信学会技術研究報告   117 ( 480 ) 209 - 214  2018.03

    CiNii J-GLOBAL

  • 鍵長128ビット,192ビット,256ビットの軽量暗号CLEFIAに対するスキャンベース攻撃手法

    於久太祐, 多和田雅師, 柳澤政生, 戸川望

    電子情報通信学会技術研究報告   117 ( 480 ) 251 - 256  2018.03

    CiNii J-GLOBAL

  • 鍵長128ビット,192ビット,256ビットの軽量暗号CLEFIAに対するスキャンベース攻撃手法 (コンピュータシステム) -- (組込み技術とネットワークに関するワークショップETNET2018)

    於久 太祐, 多和田 雅師, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 479 ) 251 - 256  2018.03

    CiNii

  • 複数エリアへの近接度を用いたパーティクルフィルタによる屋内測位手法の適用

    百瀬凌也, 石川和明, 柳澤政生, 戸川望

    電子情報通信学会大会講演論文集(CD-ROM)   2018   ROMBUNNO.A‐14‐7  2018.03

    J-GLOBAL

  • 凍結ビットパタンの偏りを利用した高速Polar符号復号器とそのハードウェア実装の検討

    多和田雅師, 神谷典史, 井手口裕太, 井上浩明, 戸川望

    電子情報通信学会大会講演論文集(CD-ROM)   2018   ROMBUNNO.A‐1‐12  2018.03

    J-GLOBAL

  • LSIの配線問題―DAシンポジウムの配線問題解法コンテスト―2 機械学習とFPGAを用いた配線問題解法への取り組み

    川村一志, 長谷川健人, 多和田雅師, 戸川望

    情報処理   59 ( 3 ) 228‐231  2018.02

    J-GLOBAL

  • 低周波圧電エネルギーハーベスティングにおけるMOSs SP‐SSHI手法

    杉山貴紀, 戸川望, 柳澤政生, SHI Youhua

    回路とシステムワークショップ論文集(CD-ROM)   31st   ROMBUNNO.A2‐1  2018

    J-GLOBAL

  • リーク削減による低消費電力SRAMの設計

    伊藤卓, 戸川望, 柳澤政生, SHI Youhua

    回路とシステムワークショップ論文集(CD-ROM)   31st   ROMBUNNO.C4‐3  2018

    J-GLOBAL

  • トリガ条件の異なるハードウェアトロイの設計とSVMを用いた検出 (VLSI設計技術) -- (デザインガイア2017 : VLSI設計の新しい大地)

    井上 智貴, 長谷川 健人, 小林 悠記, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 273 ) 133 - 138  2017.11

    CiNii

  • 暗号回路に挿入されたハードウェアトロイとその抑止回路のFPGA実装 (VLSI設計技術) -- (デザインガイア2017 : VLSI設計の新しい大地)

    長谷川 健人, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   117 ( 273 ) 139 - 144  2017.11

    CiNii

  • 環境発電動作を想定した揮発・不揮発レジスタ併用型フロアプラン指向高位合成手法

    浅井 大輝, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 57 - 62  2017.08

    CiNii

  • 高ポイント高速数論変換に対する高位合成のためのループ構造最適化

    川村 一志, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 63 - 68  2017.08

    CiNii

  • スキャンシグネチャを用いた周辺回路を含む軽量暗号CLEFIAに対するスキャンベース攻撃

    於久 太祐, 多和田 雅師, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 116 - 121  2017.08

    CiNii

  • 不揮発性メモリを対象とした低書き込みメモリ暗号化手法

    多和田 雅師, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 122 - 126  2017.08

    CiNii

  • ネットの周辺情報を考慮した機械学習によるハードウェアトロイ識別

    長谷川 健人, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 127 - 132  2017.08

    CiNii

  • 20KスピンCMOSアニーリングマシンを対象とした完全結合イジングモデルマッピング手法と評価

    寺田 晃太朗, 田中 宗, 林 真人, 山岡 雅直, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 163 - 168  2017.08

    CiNii

  • セレクタ論理を適用したFFTプロセッサのFPGA実装評価

    平井 勇也, 川村 一志, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 180 - 185  2017.08

    CiNii

  • 遅延変動に対しロバストなAES暗号回路の設計

    矢作 裕基, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 210 - 215  2017.08

    CiNii

  • 乱数によるビット並び替えに基づくストカスティック数複製器

    石川 遼太, 多和田 雅師, 柳澤 政生, 戸川 望

    DAシンポジウム2017論文集   ( 2017 ) 169 - 174  2017.08

    CiNii

  • 近接度を用いたパーティクルフィルタによる高精度屋内測位手法

    百瀬 凌也, 新田 知之, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2017論文集   ( 2017 ) 514 - 522  2017.06

    CiNii

  • 疎な GPS 測位情報を対象にした測位精度と短時間滞在除去に基づく滞在地推定手法

    岩田 紗瑛, 新田 知之, 高山 敏典, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2017論文集   ( 2017 ) 523 - 531  2017.06

    CiNii

  • 連続してハッシュ値を出力しないHMAC-SHA-256回路へのスキャンベース攻撃手法 (ディペンダブルコンピューティング) -- (組込み技術とネットワークに関するワークショップETNET2017)

    於久 太祐, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 511 ) 129 - 134  2017.03

    CiNii

  • 連続してハッシュ値を出力しないHMAC-SHA-256回路へのスキャンベース攻撃手法 (コンピュータシステム) -- (組込み技術とネットワークに関するワークショップETNET2017)

    於久 太祐, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 510 ) 129 - 134  2017.03

    CiNii

  • ネットの特徴量を用いた多層ニューラルネットワークによるハードウェアトロイ識別 (コンピュータシステム) -- (組込み技術とネットワークに関するワークショップETNET2017)

    長谷川 健人, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 510 ) 135 - 140  2017.03

    CiNii

  • PDRの測位誤差補正のためのマルチシナリオ化マップマッチング手法 (画像工学)

    岩名地 良太, 新田 知之, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 464 ) 387 - 392  2017.02

    CiNii

  • An Evaluation Method of Road Illuminance Levels Using Road Lights and Landmarks

    BAO Siya, YANAGISAWA Masao, TOGAWA Nozomu

    電子情報通信学会大会講演論文集(CD-ROM)   2017  2017

    J-GLOBAL

  • Proposal of pH-sensor device capable of operating only with NFC energy harvesting

    MIYABAYASHI Shun, OSAKA Tetsuya, TAWADA Masashi, TOGAWA Nozomu, KATAOKA Kosuke, ASAHI Toru, IWATA Hiroyasu, HAYATA Hiroki, IWASE Eiji, FUJIE Toshinori, TAKEOKA Shinji, OHASHI Keishi, SATO Shin, KUROIWA Shigeki, MOMMA Toshiyuki

    The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)   2017 ( 0 ) 1A1 - L10  2017

     View Summary

    Skin-attachable devices are essential to realizing personalized skin health through continuous monitoring of an individual's skin surface pH. This paper describes an approach to measuring the skin surface pH anytime and anywhere, simply by holding an NFC-enabled phone over a pH-sensor device that operates only on NFC energy harvesting. Since NFC supplies the power in place of batteries, the proposed device becomes smaller, lighter, and thinner. It can therefore be attached to the skin by using an ultrathin polymer film called a nanosheet. Moreover, a low-power circuit is proposed that implements the constant current circuit and the wireless communication function.

    DOI CiNii

  • セレクタ論理に帰着させたバタフライ演算器のFPGA実装評価 (VLSI設計技術) -- (デザインガイア2016 : VLSI設計の新しい大地)

    伊東 光希, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 330 ) 67 - 72  2016.11

    CiNii

  • 動作中のIoTデバイスに対する電気容量変化の測定を用いた不正改変検知装置の設計 (VLSI設計技術) -- (デザインガイア2016 : VLSI設計の新しい大地)

    北山 遼育, 竹中 崇, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 330 ) 129 - 134  2016.11

    CiNii

  • 経年劣化を考慮したフロアプラン統合化高位合成手法 (VLSI設計技術) -- (デザインガイア2016 : VLSI設計の新しい大地)

    井川 昂輝, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   116 ( 330 ) 141 - 146  2016.11

    CiNii

  • スキャンシグネチャを用いたスキャンデータ解析に基づくHMAC-SHA-256ハッシュ回路のスキャンベース攻撃

    於久 太祐, 多和田 雅師, 柳澤 政生, 戸川 望

    DAシンポジウム2016論文集   2016 ( 2 ) 2 - 7  2016.09

    CiNii

  • Random Forestを用いたネットリスト特徴選択と機械学習によるハードウェアトロイ識別

    長谷川 健人, 柳澤 政生, 戸川 望

    DAシンポジウム2016論文集   2016 ( 3 ) 8 - 13  2016.09

    CiNii

  • リードソロモン符号に基づいたマルチレベルセル不揮発性メモリ書き込み削減

    多和田 雅師, 柳澤 政生, 戸川 望

    DAシンポジウム2016論文集   2016 ( 31 ) 163 - 168  2016.09

    CiNii

  • From the EDA Perspective

      99 ( 9 ) 901 - 906  2016.09

    CiNii

  • 歩行者の方向判断基準を用いた腕時計型ウェアラブル端末向け略地図生成手法

    河野 圭亮, 新田 知之, 石川 和明, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2016論文集   ( 2016 ) 411 - 418  2016.07

    CiNii

  • 眼鏡型ウェアラブル端末を用いたランドマーク確認に基づく屋外歩行者ナビゲーション

    矢野 椋也, 新田 知之, 石川 和明, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2016論文集   ( 2016 ) 419 - 427  2016.07

    CiNii

  • 歩行者の視点情報に基づく屋内経路案内

    岩名地 良太, 新田 知之, 石川 和明, 柳澤 政生, 戸川 望

    マルチメディア,分散協調とモバイルシンポジウム2016論文集   ( 2016 ) 1748 - 1756  2016.07

    CiNii

  • A-6-4 Improvement and Evaluation of Selector-logic-based Volume Rendering Circuits for FPGAs

    Igarashi Keita, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference   2016   78 - 78  2016.03

    CiNii

  • A-6-5 Evaluation of A Floorplan-aware High-level Synthesis Algorithm Optimizing Critical Path for FPGA Designs

    Fujiwara Koichi, Kawamura Kazushi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference   2016   79 - 79  2016.03

    CiNii

  • A-3-7 Worst-case Bit-Write-Reducing and Error-Correcting Code Generation by One-to-many Mapping for Non-Volatile Memories

    Kojo Tatsuro, Tawada Masashi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference   2015   52 - 52  2015.08

    CiNii

  • A-9-2 Low-power soft-error tolerant New-SEH latch scheme

    TAJIMA Saki, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao

    Proceedings of the IEICE Engineering Sciences Society/NOLTA Society Conference   2015   106 - 106  2015.08

    CiNii

  • AES Encryption Circuit against Clock Glitch based Fault Analysis

      2015 ( 10 ) 1 - 5  2015.05

     View Summary

    Recently, fault analysis has attracted a lot of attention as a new kind of side-channel attack, in which attackers inject malicious faults through clock glitch generation, voltage change, or laser manipulation during the execution of a crypto circuit. Area-redundant and time-redundant methods have been proposed as countermeasures against fault analysis, but they cause large area overhead or time overhead. Therefore, in this paper, we propose an AES circuit design that can detect timing faults caused by malicious clock glitches. Experimental results show that the proposed method can detect 100% of timing faults at only 4.9% post-layout area overhead.

    CiNii

  • 製造ばらつきと配線遅延を同時に考慮した低レイテンシ指向のマルチシナリオ高位合成の評価 (ディペンダブルコンピューティング)

    井川 昂輝, 阿部 晋矢, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 507 ) 155 - 160  2015.03

     View Summary

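    As a solution to ever-increasing process variation and interconnect delay, we have proposed a multi-scenario high-level synthesis method targeting the HDR architecture. Partitioning the whole chip into regions called huddles, within which interconnect delay has no effect, makes it possible to predict interconnect delay appropriately at the high-level synthesis stage. In addition, the delay variation of functional units caused by process variation is handled as scenarios. A Typical scenario, for the case where functional unit delays are typical, and a Worst scenario, for the case where they are worst, are synthesized simultaneously onto a single chip, and switching between scenarios according to the characteristics of the manufactured chip achieves both high yield and high performance. The proposed method minimizes the number of control steps in each scenario and reduces the total area through a process called commonization, which aligns inter-huddle data communication and inter-module connections across scenarios. In this paper, we evaluate the latency under each operating condition by computational experiments in comparison with the conventional method. We also calculate the probability that a chip can operate in the Typical scenario from the delay distribution of the functional units and evaluate the expected latency. We confirmed that the proposed method reduces the expected latency by up to 35% compared with the conventional method.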

    CiNii
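
    The expected latency mentioned in the summary above can be illustrated with a small worked sketch. It assumes exactly two scenarios, that a chip unable to run the Typical scenario falls back to the Worst scenario, and that latency is counted in control steps; the function name and the numbers are hypothetical and do not come from the paper.

```python
def expected_latency(p_typical: float, latency_typical: float, latency_worst: float) -> float:
    """Expected latency of a two-scenario design.

    p_typical is the probability that a manufactured chip can operate in the
    Typical scenario; otherwise the chip is assumed to run the Worst scenario.
    """
    if not 0.0 <= p_typical <= 1.0:
        raise ValueError("p_typical must be a probability in [0, 1]")
    return p_typical * latency_typical + (1.0 - p_typical) * latency_worst


# Hypothetical figures: half of the chips run the 10-step Typical schedule,
# the other half fall back to the 20-step Worst schedule.
print(expected_latency(0.5, 10, 20))
```

    Under these assumptions, raising the probability of Typical operation or shortening either schedule lowers the expected latency.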

  • Improved scan-based side-channel attack on the LED block cipher

    FUJISHIRO MIKA, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Dependable computing   114 ( 507 ) 149 - 154  2015.03

     View Summary

    The LED (Light Encryption Device) block cipher, one of the lightweight block ciphers, is very compact in hardware. Although the conventional scan-based side-channel attack method on LED can retrieve a 64-bit secret key, it cannot retrieve a 128-bit secret key. In this paper, an improved scan-based attack method on the LED block cipher is proposed. Experimental results show that our proposed method successfully retrieves the 128-bit secret key using 145 plaintexts on average if the scan chain is connected only to the LED block cipher.

    CiNii

  • Improved scan-based side-channel attack on the LED block cipher

    FUJISHIRO MIKA, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Computer systems   114 ( 506 ) 149 - 154  2015.03

     View Summary

    The LED (Light Encryption Device) block cipher, one of the lightweight block ciphers, is very compact in hardware. Although the conventional scan-based side-channel attack method on LED can retrieve a 64-bit secret key, it cannot retrieve a 128-bit secret key. In this paper, an improved scan-based attack method on the LED block cipher is proposed. Experimental results show that our proposed method successfully retrieves the 128-bit secret key using 145 plaintexts on average if the scan chain is connected only to the LED block cipher.

    CiNii

  • A low-power soft error tolerant latch scheme

    TAJIMA Saki, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 476 ) 55 - 60  2015.03

     View Summary

    With recent technology scaling, reduced reliability due to soft errors and increased power consumption have become unavoidable problems for logic circuits. We propose a low-power latch with high soft-error tolerance, called the TSPC-SEH latch, which is based on the Soft Error Hardened (SEH) latch and True Single Phase Clock (TSPC). Compared with the SEH latch and the DICE latch, the proposed latch achieves a 42% power reduction and a 54% delay reduction.

    DOI CiNii

  • A Score-Based Hardware-Trojan Identification Method for Gate-Level Netlists

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 476 ) 165 - 170  2015.03

     View Summary

    Recently, digital ICs have been designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers implement Hardware Trojans (HTs) in them. In this paper, we propose an HT identification method for gate-level netlists that does not use a Golden netlist. First, we extract several features specific to Trojan nets using several HT-inserted benchmarks. Second, we assign scores to the Trojan net features and sum them up for each net in the benchmarks. We can then find a score threshold to identify HTs. Experimental results demonstrate that our method successfully identifies all the HT-inserted gate-level benchmarks as HT-inserted and all the HT-free gate-level benchmarks as HT-free, in approximately three hours per benchmark.

    CiNii
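
    The scoring step described in the abstract above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the feature names, their score values, and the threshold are hypothetical placeholders, because the abstract does not list them.

```python
from typing import Dict, Iterable, Set

# Hypothetical feature scores (assumption); the abstract does not give real ones.
FEATURE_SCORES: Dict[str, int] = {
    "large_fanin_cone": 2,
    "far_from_primary_output": 1,
    "feeds_rare_mux_select": 3,
}


def net_score(features: Iterable[str]) -> int:
    """Sum the scores of the features observed on a single net."""
    return sum(FEATURE_SCORES.get(name, 0) for name in features)


def max_net_score(netlist: Dict[str, Set[str]]) -> int:
    """Return the highest net score in a netlist (0 for an empty netlist)."""
    return max((net_score(feats) for feats in netlist.values()), default=0)


def is_ht_inserted(netlist: Dict[str, Set[str]], threshold: int) -> bool:
    """Flag a netlist as HT-inserted if any net reaches the score threshold."""
    return max_net_score(netlist) >= threshold
```

    In this sketch a netlist is a mapping from net name to the set of features observed on that net. The threshold would be chosen from benchmark results, as the abstract describes.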

  • Improved scan-based side-channel attack on the LED block cipher

      2015 ( 47 ) 1 - 6  2015.02

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Although the conventional scan-based side-channel attack on LED can retrieve a 64-bit secret key, it cannot retrieve a 128-bit secret key. In this paper, an improved scan-based attack method on the LED block cipher is proposed. Experimental results show that the proposed method successfully retrieves the 128-bit secret key using 145 plaintexts on average if the scan chain is connected only to the LED block cipher.

    CiNii

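  • Evaluation of Low-Latency-Oriented Multi-Scenario High-Level Synthesis Considering Both Process Variation and Interconnection Delay

    IGAWA Koki, ABE Shinya, YANAGISAWA Masao, TOGAWA Nozomu

    IPSJ SIG Technical Reports. SLDM [System LSI Design Methodology]   2015 ( 48 ) 1 - 6  2015.02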

     View Summary

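    As a solution to ever-increasing process variation and interconnection delay, we have proposed a multi-scenario high-level synthesis method targeting HDR architectures. Partitioning the whole chip into regions called huddles, within which interconnection delay has no effect, makes it possible to estimate interconnection delay appropriately at the high-level synthesis stage. In addition, the delay variation of functional units caused by process variation is handled as scenarios. A Typical scenario, in which functional-unit delays are in the typical case, and a Worst scenario, in which they are in the worst case, are synthesized simultaneously onto a single chip, and switching between the scenarios according to the characteristics of the manufactured chip achieves both high yield and high performance. The proposed method minimizes the number of control steps of each scenario and reduces the overall area through a process called commonization, which aligns inter-huddle data communication and inter-module connections across scenarios. In this paper, we compare the latency under each operating condition with that of conventional methods through computational experiments. We also calculate the probability that a chip can operate in the Typical scenario from the delay distribution of the functional units, and evaluate the expected latency. We confirmed that the proposed method reduces the expected latency by up to 35% compared with conventional methods.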

    CiNii
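
    The expected-latency evaluation mentioned in the abstract above can be illustrated with a small calculation. The sketch assumes a simple two-outcome model that the abstract does not state explicitly: a manufactured chip runs the Typical scenario with probability `p_typical` and otherwise falls back to the Worst scenario. The function name and the numbers are illustrative only.

```python
def expected_latency(p_typical: float,
                     latency_typical: float,
                     latency_worst: float) -> float:
    """Expected latency under a two-scenario model (assumption).

    p_typical is the probability that the chip can run the Typical
    scenario; otherwise the Worst scenario is used.
    """
    if not 0.0 <= p_typical <= 1.0:
        raise ValueError("p_typical must be a probability in [0, 1]")
    return p_typical * latency_typical + (1.0 - p_typical) * latency_worst


# Illustrative numbers only: 75% of chips run the faster Typical scenario.
print(expected_latency(0.75, 8.0, 12.0))  # 9.0
```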

  • A-3-1 Interconnection Delay Modeling for Floorplan-Driven High-Level Synthesis Targeting FPGAs

    Fujiwara Koichi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE General Conference   2015   80 - 80  2015.02

    CiNii

  • A-3-8 Implementation and Evaluation of Selector-logic-based Alpha Blending Circuits for FPGAs

    Igarashi Keita, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE General Conference   2015   87 - 87  2015.02

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 426 ) 157 - 162  2015.01

     View Summary

    Recently, digital ICs have been designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers implement Hardware Trojans (HTs) in them. In particular, HTs can be easily inserted during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume Golden netlists and the activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several features, and our method detects the nets having these features. Our approach assumes neither Golden netlists nor the activation of HTs. We successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Computer systems   114 ( 427 ) 157 - 162  2015.01

     View Summary

    Recently, digital ICs have been designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers implement Hardware Trojans (HTs) in them. In particular, HTs can be easily inserted during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume Golden netlists and the activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several features, and our method detects the nets having these features. Our approach assumes neither Golden netlists nor the activation of HTs. We successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • A Hardware Trojan Detection Method based on Trojan Net Features

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IPSJ SIG Technical Reports. SLDM [System LSI Design Methodology]   2015 ( 28 ) 1 - 6  2015.01

     View Summary

    Recently, digital ICs have been designed by outside vendors to reduce costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers implement Hardware Trojans (HTs) in them. In particular, HTs can be easily inserted during the design phase, but detecting them during this phase is very difficult. This is why previous research has had to assume Golden netlists and the activation of HTs. This paper proposes an HT detection method based on Trojan net features. Most nets in HTs have several features, and our method detects the nets having these features. Our approach assumes neither Golden netlists nor the activation of HTs. We successfully detect a Trojan net in each of the HT-inserted gate-level netlists from the Trust-HUB benchmark. It takes approximately thirty minutes to detect Trojan nets in each benchmark.

    CiNii

  • A Field Data Extractor Configuration Based on Multiplexer Tree Partitioning

    ITO Koki, KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu, TAMIYA Yutaka

    Technical report of IEICE. VLD   114 ( 328 ) 197 - 202  2014.11

     View Summary

    As seen in packet analysis for TCP/IP offload engines and stream data processing in encoders/decoders for video data, it is often necessary to extract part of the data from data whose fields change dynamically, and a field-data extractor can be used for this purpose. In particular, an (M, N) field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input and output using multiplexers. However, the number of required multiplexers grows rapidly as the input/output byte lengths increase, so reducing the number of required multiplexers is a major challenge. In this paper, we propose an efficient multiplexer-tree configuration method for an (M, N) field-data extractor. Our method is based on inserting an (N+B-1)-byte virtual intermediate register into a multiplexer tree and partitioning it into an upper tree and a lower tree. Our method thereby theoretically reduces the number of required multiplexers without increasing the multiplexer-tree depth. We also propose how to determine the size of the virtual intermediate register that minimizes the number of required multiplexers. Experimental results show that our method reduces the number of gates required to implement a field-data extractor by up to 92% compared with a naive multiplexer-tree configuration.

    CiNii
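
    As a point of reference for the abstract above, the behavior of an (M, N) field-data extractor and the cost of a naive configuration can be modeled in a few lines. This sketch is not the proposed partitioning method. The multiplexer count is an assumption made here for illustration: each of the N output bytes selects among M-N+1 candidate register bytes, and a k-input selection is built from k-1 two-input, byte-wide multiplexers.

```python
def extract_field(register: bytes, offset: int, n: int) -> bytes:
    """Behavioral model: read n consecutive bytes starting at offset."""
    m = len(register)
    if n < 0 or not 0 <= offset <= m - n:
        raise ValueError("field does not fit inside the register")
    return register[offset:offset + n]


def naive_mux2_count(m: int, n: int) -> int:
    """Two-input byte-wide multiplexers in a naive configuration (assumption).

    Each of the n outputs chooses among m - n + 1 candidates, which
    takes m - n two-input multiplexers per output.
    """
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    return n * (m - n)


register = bytes(range(8))             # an 8-byte register: 00 01 ... 07
print(extract_field(register, 2, 3))   # b'\x02\x03\x04'
print(naive_mux2_count(8, 3))          # 15
```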

  • An Effective Robust Design Using Improved Checkpoint Insertion Algorithm for Suspicious Timing-Error Prediction Scheme and its Evaluations

    YOSHIDA Shinnosuke, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 57 - 62  2014.11

     View Summary

    As process technologies advance, process and delay variations complicate timing design, and in-situ timing-error correction techniques are strongly required. Suspicious timing error prediction (STEP) predicts timing errors by monitoring checkpoints with STEP circuits (STEPCs), so how checkpoints are inserted is very important. We have proposed a network-flow-based checkpoint insertion algorithm for STEP. However, our algorithm may ignore long paths and insert checkpoints near the output. In this paper, we improve how short paths are ignored and how labels are set by estimating path lengths. As a result, only short paths are ignored and checkpoints are inserted near the center of all long paths. We evaluate our algorithm by applying it to four benchmark circuits. Experimental results show that our proposed algorithm realizes an average of 1.71X overclocking compared with inserting no STEPC. Furthermore, our improved algorithm realizes an average of 1.15X overclocking compared with our original algorithm.

    CiNii

  • Energy-efficient High-level Synthesis Algorithm targeting HDR-mcv Architecture with Multiple Clock Domains and Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 203 - 208  2014.11

     View Summary

    An HDR-mcv architecture, which integrates multiple supply voltages and multiple clock domains into high-level synthesis and enables us to estimate interconnection delay effects during high-level synthesis, has been proposed together with a corresponding synthesis algorithm. They assign voltages and clock frequencies to huddles, which are the partitions used for interconnection delay estimation during high-level synthesis. However, the voltage and clock assignment may incur energy overhead due to the increased number of clock trees. In this paper, we propose a new HDR-mcv architecture in which supply voltages are assigned to functional logic and clock synchronization logic separately. Next, we propose a high-level synthesis algorithm for the architecture, which can assign clock frequencies and supply voltages on the basis of placement and energy information. Experimental results show that the proposed method achieves 50% energy savings compared with the conventional HDR-mcv architecture and 60% energy savings compared with existing high-level synthesis methods.

    CiNii

  • A Process-Variation-Tolerant and Low-Latency Multi-Scenario High-Level Synthesis Algorithm for HDR Architectures

    IGAWA Koki, ABE Shinya, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 105 - 110  2014.11

     View Summary

    In this paper, we propose a process-variation-tolerant and low-latency multi-scenario high-level synthesis algorithm for HDR architectures. We assume two scenarios, a typical-case scenario and a worst-case scenario, and realize them on a single chip. By using distributed-register architectures called HDR architectures, we can take interconnection delays into account in high-level synthesis. We first schedule/bind each of the scenarios independently. After that, we commonize the typical-case scenario and the worst-case scenario and synthesize a commonized scheduling/binding result. Experimental results show that our algorithm reduces the latency of the typical-case scenario by up to 33% compared with previous methods.

    CiNii

  • Energy evaluation of bit-write reduction method based on state encoding limiting maximum and minimum Hamming distances for non-volatile memories

    KOJO Tatsuro, TAWADA Masashi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 221 - 226  2014.11

     View Summary

    Data stored in non-volatile memories may be corrupted by crosstalk and radiation, but the data can be restored by using error-correcting codes. However, non-volatile memories consume a large amount of energy in writing. How to reduce the number of written bits even when using error-correcting codes is one of the challenges in non-volatile memory design. We have proposed the Doughnut code, which is a new bit-write-reducing and error-correcting code. In addition, we have proposed a code expansion method. When we apply our code expansion method to the Doughnut code, we obtain expanded Doughnut codes. Expanded Doughnut codes are error-correcting codes that can reduce the number of written bits. In this paper, we present experimental evaluations of our proposed expanded Doughnut codes from the viewpoint of energy reduction. Experimental results show that the write-reducing code reduces energy consumption by up to 32% compared with the Hamming code.

    CiNii
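
    The link between Hamming distance and write cost that motivates the work above can be made concrete with a short sketch. It assumes a write scheme in which only the bits that differ between the stored word and the new word are actually written, so the number of bit writes equals the Hamming distance. That scheme and the function names are assumptions for illustration. The Doughnut code itself is not reproduced here.

```python
from typing import Iterable


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def total_bit_writes(codewords: Iterable[int], initial: int = 0) -> int:
    """Bits written when codewords overwrite one memory word in sequence.

    Assumes only differing bits are written (assumption), so each
    update costs the Hamming distance to the previously stored word.
    """
    writes = 0
    current = initial
    for word in codewords:
        writes += hamming_distance(current, word)
        current = word
    return writes


# 0000 -> 1011 (3 bits), 1011 -> 0011 (1 bit), 0011 -> 1100 (4 bits)
print(total_bit_writes([0b1011, 0b0011, 0b1100]))  # 8
```

    Under this model, an encoding that keeps consecutive codewords close in Hamming distance lowers the number of bit writes.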

  • Small-Sized Encoder/Decoder Circuit Design for Bit-Write Reduction Targeting Non-Volatile Memories

    TAWADA Masashi, KIMURA Shinji, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 227 - 232  2014.11

     View Summary

    Non-volatile memory has many advantages such as low leakage power and non-volatility. However, a non-volatile memory consumes a large amount of energy in writing, and the maximum number of bit rewrites is limited. We have proposed a Hamming-code-based bit-write reduction method using data encoding/decoding, but its encoder/decoder becomes too large. In this paper, we propose a small-sized encoder/decoder circuit design for the bit-write reduction codes. In this design, we simplify data encoding/decoding by using code redundancy. Experimental results show the efficiency of our encoder/decoder design.

    CiNii

  • Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

    KAWAMURA Kazushi, ABE Shinya, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 51 - 56  2014.11

     View Summary

    The propagation delay along each path inside an LSI varies widely depending on input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm, which identifies paths to be optimized based on input data and reconfigures these paths. Our algorithm first identifies the paths to be optimized by incorporating timing error prediction circuits into a target circuit and running them in practice. These paths are then dynamically reconfigured within an accuracy constraint with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can achieve a performance increase of up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

    CiNii

  • Design of Flip-Flop with Timing Error Tolerance

    SUZUKI Taito, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 328 ) 45 - 50  2014.11

     View Summary

    Owing to the miniaturization of integrated circuits, variations in circuit operating conditions are growing, and the supply-voltage and clock-frequency margins required in a design are increasing. To mitigate these margins, circuit structures with timing error tolerance are being actively studied. In this paper, we propose two new Time Borrowing Flip-Flops (TBFFs) at the transistor level that realize timing error tolerance by dynamically switching from flip-flop to latch. HSPICE simulation results show that the proposed TBFFs can achieve up to 28.1% power reduction compared with existing works.

    CiNii

  • A Floorplan-aware High-level Synthesis Algorithm Utilizing Interconnection Delay Characteristics in FPGA Designs

    FUJIWARA Koichi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 99 - 104  2014.11

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in applications such as image processing and computerized stock trading. With recent process scaling in FPGAs, interconnection delays have become dominant in total circuit delays even though I/O buffers and wire buffers are provided, and each FPGA has different interconnection delay characteristics. We need to consider interconnection delays based on the interconnection delay characteristics in FPGA designs. In this paper, we propose a floorplan-aware high-level synthesis algorithm utilizing interconnection delay characteristics targeting FPGA designs. Our target architecture is HDR, a distributed-register architecture, which lets us estimate interconnection delays correctly by utilizing the interconnection delay characteristics of an FPGA chip. Further, we reduce the number of multiplexers generated and also limit the total number of multiplexer inputs in the HLS process. Experimental results demonstrate that our algorithm can realize FPGA designs that reduce the latency by up to 6% compared with our previous approach.

    CiNii

  • High speed design of sub-threshold circuit by using DTMOS

    FUKUDOME Yuji, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 328 ) 117 - 121  2014.11

     View Summary

    Low power consumption is achieved by operating circuits in the sub-threshold region. However, in the sub-threshold region the operating speed becomes slow, and the tradeoff between power and speed should be considered carefully. In this work, we present DTMOS implementations to realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed can be improved by 30%-45%, and an average energy reduction of 15% can be achieved when V_dd ranges from 0.2 V to 0.3 V.

    CiNii

  • A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 135 - 140  2014.11

     View Summary

    Recently, digital ICs have been designed by outside vendors to reduce design costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers implement Hardware Trojans (HTs) in ICs. HTs are easily inserted during the design phase in particular, but HT detection is very difficult during this phase. This is why previous research has had to assume Golden netlists and the activation of HTs. This paper proposes an HT detection method that works by detecting LSLG nets, which have low switching probabilities. Our approach assumes neither Golden netlists nor the activation of HTs. We successfully find that all HT-inserted gate-level netlists from the Trust-HUB benchmarks include a small number of LSLG nets. It takes approximately ten minutes to detect LSLG nets in each benchmark.

    CiNii
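
    To illustrate what a low switching probability means for a net, the sketch below uses a common textbook signal-probability model. The abstract does not say how switching probabilities or LSLG nets are computed, so everything here is an assumption: inputs are independent, each input is 1 with a given probability, and values in consecutive cycles are independent, which gives a toggle probability of 2p(1-p).

```python
from math import prod
from typing import Sequence


def and_output_probability(input_probs: Sequence[float]) -> float:
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    return prod(input_probs)


def switching_probability(p_one: float) -> float:
    """Probability that a net toggles between two independent cycles."""
    return 2.0 * p_one * (1.0 - p_one)


# A 4-input AND gate fed by unbiased inputs is 1 only 1/16 of the time.
p = and_output_probability([0.5, 0.5, 0.5, 0.5])
print(p)                         # 0.0625
print(switching_probability(p))  # 0.1171875
```

    Under this model, nets driven by wide AND-like trigger conditions toggle rarely, which matches the intuition that trigger nets in a dormant Trojan are seldom exercised.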

  • A High-level Synthesis Algorithm with Delay Variation Tolerance Optimization for RDR Architectures

    HAGIO Yuta, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 328 ) 209 - 214  2014.11

     View Summary

    In this paper, we propose a high-level synthesis algorithm with delay variation tolerance optimization for RDR architectures. We first obtain a non-delayed scheduling/binding result and a delayed scheduling/binding result independently. When we obtain two scheduling/binding results, we use two variation rates, the typical variation rate and the worst variation rate, and maximize them without increasing the latency. By adding several extra functional units to vacant RDR islands, we have a delayed scheduling/binding result so that its latency cannot be increased compared with the non-delayed one. After that, we similarize the two scheduling/binding results by repeatedly modifying their results. We can finally realize non-delayed and delayed scheduling/binding results simultaneously on RDR architecture with almost no area/performance overheads and we can select either one of them depending on post-silicon delay variation. Experimental results show that our algorithm successfully reduces delayed scheduling/binding latency by up to 16.7% compared with the conventional approach.

    CiNii

  • Design of Flip-Flop with Timing Error Tolerance

    SUZUKI Taito, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    IEICE technical report. Dependable computing   114 ( 329 ) 45 - 50  2014.11

     View Summary

    Owing to the miniaturization of integrated circuits, variations in circuit operating conditions are growing, and the supply-voltage and clock-frequency margins required in a design are increasing. To mitigate these margins, circuit structures with timing error tolerance are being actively studied. In this paper, we propose two new Time Borrowing Flip-Flops (TBFFs) at the transistor level that realize timing error tolerance by dynamically switching from flip-flop to latch. HSPICE simulation results show that the proposed TBFFs can achieve up to 28.1% power reduction compared with existing works.

    CiNii

  • A Floorplan-aware High-level Synthesis Algorithm Utilizing Interconnection Delay Characteristics in FPGA Designs

    FUJIWARA Koichi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 99 - 104  2014.11

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in applications such as image processing and computerized stock trading. With recent process scaling in FPGAs, interconnection delays have become dominant in total circuit delays even though I/O buffers and wire buffers are provided, and each FPGA has different interconnection delay characteristics. We need to consider interconnection delays based on the interconnection delay characteristics in FPGA designs. In this paper, we propose a floorplan-aware high-level synthesis algorithm utilizing interconnection delay characteristics targeting FPGA designs. Our target architecture is HDR, a distributed-register architecture, which lets us estimate interconnection delays correctly by utilizing the interconnection delay characteristics of an FPGA chip. Further, we reduce the number of multiplexers generated and also limit the total number of multiplexer inputs in the HLS process. Experimental results demonstrate that our algorithm can realize FPGA designs that reduce the latency by up to 6% compared with our previous approach.

    CiNii

  • High speed design of sub-threshold circuit by using DTMOS

    FUKUDOME Yuji, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    IEICE technical report. Dependable computing   114 ( 329 ) 117 - 121  2014.11

     View Summary

    Low power consumption is achieved by operating circuits in the sub-threshold region. However, in the sub-threshold region the operating speed becomes slow, and the tradeoff between power and speed should be considered carefully. In this work, we present DTMOS implementations to realize high speed and low power in the sub-threshold region. Transistor-level simulation results show that the operating speed can be improved by 30%-45%, and an average energy reduction of 15% can be achieved when V_dd ranges from 0.2 V to 0.3 V.

    CiNii

  • A Hardware Trojans Detection Method focusing on Nets in Hardware Trojans in Gate-Level Netlists

    OYA Masaru, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 135 - 140  2014.11

     View Summary

    Recently, digital ICs have been designed by outside vendors to reduce design costs in the semiconductor industry. This circumstance introduces the risk that malicious attackers implement Hardware Trojans (HTs) in ICs. HTs are easily inserted during the design phase in particular, but HT detection is very difficult during this phase. This is why previous research has had to assume Golden netlists and the activation of HTs. This paper proposes an HT detection method that works by detecting LSLG nets, which have low switching probabilities. Our approach assumes neither Golden netlists nor the activation of HTs. We successfully find that all HT-inserted gate-level netlists from the Trust-HUB benchmarks include a small number of LSLG nets. It takes approximately ten minutes to detect LSLG nets in each benchmark.

    CiNii

  • A High-level Synthesis Algorithm with Delay Variation Tolerance Optimization for RDR Architectures

    HAGIO Yuta, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   114 ( 329 ) 209 - 214  2014.11

     View Summary

    In this paper, we propose a high-level synthesis algorithm with delay variation tolerance optimization for RDR architectures. We first obtain a non-delayed scheduling/binding result and a delayed scheduling/binding result independently. When we obtain two scheduling/binding results, we use two variation rates, the typical variation rate and the worst variation rate, and maximize them without increasing the latency. By adding several extra functional units to vacant RDR islands, we have a delayed scheduling/binding result so that its latency cannot be increased compared with the non-delayed one. After that, we similarize the two scheduling/binding results by repeatedly modifying their results. We can finally realize non-delayed and delayed scheduling/binding results simultaneously on RDR architecture with almost no area/performance overheads and we can select either one of them depending on post-silicon delay variation. Experimental results show that our algorithm successfully reduces delayed scheduling/binding latency by up to 16.7% compared with the conventional approach.

    CiNii

  • Data Dependent Optimization using Suspicious Timing Error Prediction for Reconfigurable Approximation Circuits

    Author not found

    IPSJ SIG Technical Reports: System and LSI Design Methodology (SLDM)   2014 ( 2 ) 1 - 6  2014.11

     View Summary

    The propagation delay along each path inside an LSI varies widely depending on input data, and this property can be exploited to design high-performance approximation circuits with a negligible error rate. In this paper, we propose a novel approximation circuit design algorithm, which identifies paths to be optimized based on input data and reconfigures these paths. Our algorithm first identifies the paths to be optimized by incorporating timing error prediction circuits into a target circuit and running them in practice. These paths are then dynamically reconfigured within an accuracy constraint with the objective of maximizing performance. Experimental results targeting a set of basic adders show that our algorithm can achieve a performance increase of up to 18.5% within an acceptable error of 2.1% compared with conventional design techniques.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   114 ( 231 ) 7 - 12  2014.10

     View Summary

    The increase in energy consumption due to improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that the energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    IEICE technical report. Image engineering   114 ( 233 ) 7 - 12  2014.10

     View Summary

    The increase in energy consumption due to improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that the energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    NII Takayuki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. ICD   114 ( 232 ) 7 - 12  2014.10

     View Summary

    The increase in energy consumption due to improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that the energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • Local pulse generation in variable stages pipeline designs for low energy consumption

    Takayuki Nii, Youhua Shi, Nozomu Togawa, Kimiyoshi Usami, Masao Yanagisawa

    IPSJ SIG Technical Reports: System and LSI Design Methodology (SLDM)   2014 ( 2 ) 1 - 6  2014.09

     View Summary

    The increase in energy consumption due to improved performance has become a problem in mobile terminals, and various low-energy design techniques have been proposed. The Variable Stages Pipeline (VSP) technique is one of them; it can reduce glitches by using a special LDS-cell (Latch D-FF selector-cell). However, glitches that occur during the low clock phase are still propagated to the next stages. In this paper, we propose a method for variable stages pipeline designs that applies local pulse generation and clock gating in LE mode for further energy reduction. We applied the proposed method to a multiplier, and experimental results show that the energy is reduced by 3.08% compared with conventional VSP.

    CiNii

  • A-3-12 Low Area Overhead Fault-Secure High-Level Synthesis for Floorplan-Driven Architectures

    Kawamura Kazushi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2014   56 - 56  2014.09

    CiNii

  • A-3-13 A High-level Synthesis Algorithm with Delay Variation Tolerance Maximization for RDR Architectures

    Hagio Yuta, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2014   57 - 57  2014.09

    CiNii

  • A-17-10 Start/Target-Point Determination Methods for Indoor Pedestrian Navigation System based on Visibility Graphs

    Machida Satoshi, Yanagisawa Masao, Togawa Nozomu, Nitta Tomoyuki, Shindo Daisuke, Tanaka Kiyotaka

    Proceedings of the Society Conference of IEICE   2014   119 - 119  2014.09

    CiNii

  • A-17-11 Indoor Positioning System using Sensors and Bluetooth Beacons based on Visibility Graphs for Mobile Devices

    Fujita Hiroshi, Yanagisawa Masao, Togawa Nozomu, Nitta Tomoyuki, Shindo Daisuke, Tanaka Kiyotaka

    Proceedings of the Society Conference of IEICE   2014   120 - 120  2014.09

    CiNii

  • A-17-12 A Link Shaping Algorithm based on Feature Point Extraction using Cubic Bezier Curve

    Orihara Terutaka, Yanagisawa Masao, Togawa Nozomu, Nitta Tomoyuki, Shindou Daisuke, Tanaka Kiyotaka

    Proceedings of the Society Conference of IEICE   2014   121 - 121  2014.09

    CiNii

  • A Floorplan-driven High-Level Synthesis Algorithm for Reducing Multiplexer Inputs Targeting FPGAs

    FUJIWARA Koichi, ABE Shinya, KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    電子情報通信学会技術研究報告 = IEICE technical report : 信学技報   114 ( 126 ) 219 - 224  2014.07

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in situations where specifications must be improved in a short time, such as computerized stock trading. In HLS for FPGA designs, we need to consider the module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-aware HLS algorithm for reducing multiplexer inputs targeting FPGA designs. By utilizing a distributed-register architecture called HDR, we can easily consider the module floorplan in HLS. In order to reduce multiplexer inputs, we propose a novel binding method called datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs that reduce the number of slices by up to 33%, and by 13% on average, compared with the conventional approach.

    CiNii

  • A Floorplan-driven High-Level Synthesis Algorithm for Reducing Multiplexer Inputs Targeting FPGAs

    FUJIWARA Koichi, ABE Shinya, KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    Mathematical Systems Science and its Applications : IEICE technical report   114 ( 125 ) 219 - 224  2014.07

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in situations where specifications must be improved in a short time, such as computerized stock trading. In HLS for FPGA designs, we need to consider the module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-aware HLS algorithm for reducing multiplexer inputs targeting FPGA designs. By utilizing a distributed-register architecture called HDR, we can easily consider the module floorplan in HLS. In order to reduce multiplexer inputs, we propose a novel binding method called datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs that reduce the number of slices by up to 33%, and by 13% on average, compared with the conventional approach.

    CiNii

  • A Floorplan-driven High-Level Synthesis Algorithm for Reducing Multiplexer Inputs Targeting FPGAs

    FUJIWARA Koichi, ABE Shinya, KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Circuits and systems   114 ( 122 ) 219 - 224  2014.07

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in situations where specifications must be improved in a short time, such as computerized stock trading. In HLS for FPGA designs, we need to consider the module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-aware HLS algorithm for reducing multiplexer inputs targeting FPGA designs. By utilizing a distributed-register architecture called HDR, we can easily consider the module floorplan in HLS. In order to reduce multiplexer inputs, we propose a novel binding method called datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs that reduce the number of slices by up to 33%, and by 13% on average, compared with the conventional approach.

    CiNii

  • A Floorplan-driven High-Level Synthesis Algorithm for Reducing Multiplexer Inputs Targeting FPGAs

    FUJIWARA Koichi, ABE Shinya, KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   114 ( 123 ) 219 - 224  2014.07

     View Summary

    Recently, high-level synthesis (HLS) techniques for FPGA designs have been required in situations where specifications must be improved in a short time, such as computerized stock trading. In HLS for FPGA designs, we need to consider the module floorplan and reduce multiplexer cost concurrently. In this paper, we propose a floorplan-aware HLS algorithm for reducing multiplexer inputs targeting FPGA designs. By utilizing a distributed-register architecture called HDR, we can easily consider the module floorplan in HLS. In order to reduce multiplexer inputs, we propose a novel binding method called datapath-oriented register binding. Experimental results demonstrate that our algorithm can realize FPGA designs that reduce the number of slices by up to 33%, and by 13% on average, compared with the conventional approach.

    CiNii

  • Improved scan-based side-channel attack on the LED block cipher independent of scan structure

    FUJISHIRO Mika, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 454 ) 31 - 36  2014.03

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. The conventional scan-based side-channel attack method on LED cannot retrieve the secret key if the scan chain length in the LED LSI is about 30,000 bits or more. In this paper, an improved scan-based attack method on the LED block cipher that is independent of scan structure is proposed. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 36 plaintexts on average if the scan chain is connected only to the LED block cipher. These experimental results also show that the key is successfully retrieved even if the scan chain includes an additional 130,000 1-bit registers.

    CiNii

  • Latch-based AES Encryption Circuit Against Fault Analysis

    SHI Youhua, TANIGUCHI Hiroaki, TOGAWA Nozomu, YANAGISAWA Masao

    Technical report of IEICE. VLD   113 ( 454 ) 37 - 42  2014.03

     View Summary

    In general, cryptography is considered to be secure because it is based on complicated mathematical theories. In recent years, however, attacks that target hardware implementations rather than the crypto algorithms themselves, such as fault analysis methods, have posed new security threats. Cryptographic circuits are prone to fault analysis, which intends to retrieve secret data by means of malicious fault injection. Clock adjustment, voltage change, and laser manipulation can be used to inject malicious faults during the execution of a crypto circuit. As countermeasures against fault analysis, area-redundant methods such as triple modular redundancy (TMR) and timing-redundant methods have been proposed at the cost of area or throughput. In this paper, we propose a latch-based AES encryption circuit, with 18.1% area overhead and 5% throughput improvement, which can detect all the possible errors during the fault analysis region of clock-glitch-based fault analysis. In addition to fault analysis detection, the proposed method can also prevent the transmission and use of erroneous results, and can thus guarantee the correctness of the final encrypted outputs.

    CiNii

  • Secure scan design using improved random order scans and its evaluations

    OYA Masaru, ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 454 ) 43 - 48  2014.03

     View Summary

    Scan test using scan chains is one of the most important DFT techniques. On the other hand, scan-based attacks that can retrieve the secret key in crypto circuits by using scan chains have been reported. A secure scan architecture is strongly required to protect scan chains from scan-based attacks. In this paper, we propose an improved version of random order scans as a secure scan architecture. In our improved random order scans, a scan chain is partitioned into multiple sub-chains. The structure of the scan chain changes dynamically by selecting a sub-chain to scan out using enable signals. We also discuss the testability and security of our improved random order scans and demonstrate their effectiveness through implementation results.

    CiNii

  • Exposure source optimization by clustering for lithography

    TAWADA Masashi, YANAGISAWA Masao, TOGAWA Nozomu, HASHIMOTO Takaki, SAKANUSHI Keishi, NOJIMA Shigeki, KOTANI Toshiya

    Technical report of IEICE. VLD   113 ( 454 ) 105 - 110  2014.03

     View Summary

    In lithography, we generate patterns on a wafer through a photomask, where the generated patterns have to be close to the ideal patterns by optimizing the photomask as well as the exposure source. One of the most important tasks here is to speed up exposure source optimization in order to obtain an overall optimized photomask and exposure source. In this paper, we propose a speed-up method for exposure source optimization by clustering for lithography. In our method, we cluster several source grid-points utilizing the lithography property and reduce the number of parameters to be optimized simultaneously. Experimental results demonstrate that our method achieves an 8X speed-up compared with a conventional method.

    CiNii

  • Experiment and Analysis on Temperature Dependence of Delay and Energy for Subthreshold Circuits

    KUSHIDA Hiroki, SHI Youhua, TOGAWA Nozomu, USAMI Kimiyoshi, YANAGISAWA Masao

    Technical report of IEICE. VLD   113 ( 454 ) 147 - 151  2014.03

     View Summary

    Low-voltage design has been used in order to reduce the energy dissipation of mobile network equipment. However, as the supply voltage is reduced into the subthreshold region, performance degradation and environmental variations become the primary design challenges. In this paper, we implemented a super-pipelined multiplier for subthreshold supply voltage. With super-pipelining, the performance and energy efficiency can be improved. Moreover, experimental evaluations of the temperature dependence of delay and energy are also conducted for analysis.

    CiNii

  • A Locality-Driven Task Mapping Algorithm for Multi-FPGA Systems

    KATANO Hiroki, LEE SeungJu, TOGAWA Nozomu, AOKI Takashi, SEKIHARA Yusuke, NAKANISHI Mamoru

    Technical report of IEICE. VLD   113 ( 416 ) 143 - 148  2014.01

     View Summary

    Recently, a scalable and reconfigurable multi-FPGA system has been proposed which consists of two or more boards, each of which consists of one router FPGA chip and five general-purpose FPGA chips. The five general-purpose FPGA chips are connected to form a ring, and the router FPGA chip performs inter-board communications. How to map a task graph onto such a multi-FPGA system is one of the challenging problems. In this paper, we propose a task mapping algorithm for a multi-FPGA system. Since the multi-FPGA system has a hierarchical structure, we have to find locality in a given task graph. In our proposed algorithm, we focus on the communication rate between tasks and try to assign tasks that communicate heavily with each other to the same FPGA chip one by one. Experimental results demonstrate the effectiveness of our proposed algorithm.

    CiNii

  • An Area Constraint-Based Fault-Secure HLS Algorithm for RDR Architectures Considering Trade-Off between Reliability and Time Overhead

    KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   113 ( 321 ) 129 - 134  2013.11

     View Summary

    With process technology scaling, decreasing reliability caused by soft errors as well as increasing average interconnection delays are becoming serious issues. The fault-secure design technique, which utilizes concurrent error detection, is one of the approaches to overcome reliability degradation, and we can design systems based on a trade-off between reliability and several kinds of overhead by giving partial redundancy to operations. In this paper, we propose a partially redundant fault-secure high-level synthesis algorithm for RDR architectures. Our proposed algorithm receives a fixed area constraint and various time constraints as inputs, and aims at maximizing reliability under them. Experimental results demonstrate that our algorithm improves reliability by up to 44% with zero time and area overhead compared with the conventional approach. They also show that we can realize complete duplication of operations with zero area overhead and about 50% time overhead.

    CiNii

  • Suspicious timing error prediction using check points

    IGARASHI Hiroaki, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   113 ( 321 ) 39 - 44  2013.11

     View Summary

    Due to advanced process technologies, timing design of LSIs has become more difficult, and the importance of timing error countermeasure techniques is increasing as well. Existing timing error detection/correction methods make timing design difficult since they have complex structures. Furthermore, their error correction is realized by a re-run operation, which results in low throughput. We have proposed a suspicious timing error prediction method (STEP method), which predicts a timing error and corrects it with a simple structure. STEP is based on checking timing errors by observing several checkpoints on signal paths. Since STEP is a timing error prediction method, we may have false positives, and reducing them is one of the largest problems. In this paper, we propose a method to reduce the false positives by optimizing the checkpoints. The experimental results show that the operating frequency is increased by up to 2.4 times and the throughput is improved by up to 45%.

    CiNii

  • Energy Evaluation of Writing Reduction Method for Non-Volatile Memory

    TAWADA Masashi, KIMURA Shinji, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   113 ( 321 ) 141 - 146  2013.11

     View Summary

    Non-volatile memory has many advantages over SRAM, such as high density, low leakage power, and non-volatility. However, one of its largest problems is that it consumes a large amount of energy in writing. It is therefore necessary to reduce the number of written bits and thus decrease the writing energy. We have proposed a memory writing reduction method based on error-correcting codes. When data is written into memory, we do not write it directly but encode it into a codeword. The number of bits written into memory is then limited in data writing. In this paper, we present several experimental evaluations from the viewpoint of energy reduction and discuss the effectiveness of our proposed writing-reduction codes.

    CiNii

  • Clock Energy-efficient High-level Synthesis and Experimental Evaluation for HDR-mcd Architecture

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   113 ( 321 ) 263 - 268  2013.11

     View Summary

    In this paper, we propose a clock energy-efficient high-level synthesis algorithm for HDR-mcd architecture. In HDR-mcd, an entire chip is divided into several huddles. Huddles can realize synchronization between different clock domains in which interconnection delay is required and should be considered during high-level synthesis. In our iterative improvement based algorithm, low-frequency clocks are assigned to non-critical huddles under resource and latency constraints for energy efficiency improvement. Experimental results show that the proposed method achieves 20% clock energy-saving and 10% total energy-saving compared with the existing methods considering clock gating.

    CiNii

  • Scan-based Attack on the LED Block Cipher Using Scan Signatures

    FUJISHIRO Mika, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Image engineering   113 ( 237 ) 47 - 52  2013.10

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, scan-based side-channel attacks have been reported. They retrieve the secret information inside the cryptosystem utilizing scan chains, one of the design-for-test techniques. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 73 plaintexts on average. These experimental results also show that the key is successfully retrieved even if the scan chain includes some 4,000 additional 1-bit registers.

    CiNii

  • Scan-based Attack on the LED Block Cipher Using Scan Signatures

    FUJISHIRO Mika, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. ICD   113 ( 236 ) 47 - 52  2013.10

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, scan-based side-channel attacks have been reported. They retrieve the secret information inside the cryptosystem utilizing scan chains, one of the design-for-test techniques. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 73 plaintexts on average. These experimental results also show that the key is successfully retrieved even if the scan chain includes some 4,000 additional 1-bit registers.

    CiNii

  • A High-Level Synthesis Algorithm with Post-Silicon Delay Tuning for RDR Architectures and its Experimental Evaluations

    HAGIO Yuta, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 235 ) 41 - 46  2013.09

     View Summary

    As device feature size drops, interconnection delays often exceed gate delays. We have to incorporate interconnection delays even in high-level synthesis. Using RDR architectures is one of the effective solutions to this problem. At the same time, process and delay variation has also become a serious problem, which may result in timing errors. How to deal with this problem is another key issue in high-level synthesis. Thus, we have proposed a high-level synthesis algorithm with post-silicon delay tuning for RDR architectures. In this paper, we evaluate our high-level synthesis algorithm by comparing it with several existing algorithms in several situations. Experimental results show that our algorithm successfully reduces delayed scheduling/binding latency by up to 42.9% compared with the conventional approach.

    CiNii

  • Scan-based Attack on the LED Block Cipher Using Scan Signatures

    FUJISHIRO Mika, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 235 ) 47 - 52  2013.09

     View Summary

    The LED (Light Encryption Device) block cipher, a lightweight block cipher, is very compact in hardware. Its encryption process is composed of AES-like rounds. Recently, scan-based side-channel attacks have been reported. They retrieve the secret information inside the cryptosystem utilizing scan chains, one of the design-for-test techniques. In this paper, a scan-based attack method on the LED block cipher using scan signatures is proposed. In our proposed method, we focus on a particular 16-bit position in scanned data obtained from an LED LSI and retrieve its secret key using scan signatures. Experimental results show that our proposed method successfully retrieves its 64-bit secret key using 73 plaintexts on average. These experimental results also show that the key is successfully retrieved even if the scan chain includes some 4,000 additional 1-bit registers.

    CiNii

  • A Bi-Linear Interpolation Unit Using Selector Logics

    SHIO Masashi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 235 ) 53 - 58  2013.09

     View Summary

    Bi-linear interpolation is an interpolation technique that interpolates a value linearly from its four surrounding points. Bi-linear interpolation is often used for image scaling and correction of distortion. In this paper, we propose a high-speed bi-linear interpolation circuit that reduces carry propagation delay by using selector logics. We have implemented our bi-linear interpolation circuit in several ways and evaluated each of them.

    CiNii

  • A Road-network Shaping Algorithm for Smoothly-connected Deformed Map Generation

    Orihara Terutaka, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2013   143 - 143  2013.09

    CiNii

  • Data Recoverable AES Circuit Against Differential Fault Analysis

    Taniguchi Hiroaki, Shi Youhua, Togawa Nozomu, Yanagisawa Masao

    Proceedings of the Society Conference of IEICE   2013   49 - 49  2013.09

    CiNii

  • A Consideration on Hardware Trojan Detection Specifying Trojan Path

    Atobe Yuta, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2013   48 - 48  2013.09

    CiNii

  • Evaluation of energy consumption for two-level cache using Non-Volatile Memory for IL1 and UL2 caches

    MATSUNO Shota, TAWADA Masashi, YANAGISAWA Masao, KIMURA Shinji, TOGAWA Nozomu, SUGIBAYASHI Tadahiko

    Mathematical Systems Science and its Applications : IEICE technical report   113 ( 121 ) 89 - 94  2013.07

     View Summary

    Non-volatile memory has advantages such as low leakage energy and non-volatility compared with SRAM or DRAM, which have high leakage energy. Non-volatile memory is strongly expected to be used for realizing normally-off systems. Non-volatile memory, however, consumes more energy to write than SRAM or DRAM. In this paper, we evaluate the energy consumption of a cache memory in an embedded processor with non-volatile memories. In our evaluation, we assume that their write energy is 1.0x to 10.0x that of SRAM. Experimental evaluations demonstrate that using non-volatile memories in a cache is a better choice in some cases, even when the write energy of non-volatile memories is 10.0x that of SRAM.

    CiNii

  • A non-volatile memory writing reduction method based on state encoding limiting maximum Hamming distance

    TAWADA Masashi, KIMURA Shinji, YANAGISAWA Masao, TOGAWA Nozomu

    Mathematical Systems Science and its Applications : IEICE technical report   113 ( 121 ) 95 - 100  2013.07

     View Summary

    Non-volatile memory has many advantages over SRAM, such as high density, low leakage power, and non-volatility. However, one of its largest problems is that it consumes a large amount of energy in writing. It is therefore necessary to reduce the number of written bits and thus decrease the writing energy. In this paper, we propose a memory writing reduction method based on state encoding that limits the maximum Hamming distance. When data is written into memory, we do not write it directly but encode it into a codeword. Then we write the codeword into memory. At this time, we encode the data into a codeword while limiting its maximum Hamming distance from any other codeword. If the maximum Hamming distance is limited among all the codewords, the number of flipped bits is also limited, and so the number of written bits is reduced. We show several experimental evaluations and discuss the effectiveness of our proposed algorithm.

    CiNii

  • Evaluation of energy consumption for two-level cache using Non-Volatile Memory for IL1 and UL2 caches

    MATSUNO Shota, TAWADA Masashi, YANAGISAWA Masao, KIMURA Shinji, TOGAWA Nozomu, SUGIBAYASHI Tadahiko

    IEICE technical report. Circuits and systems   113 ( 118 ) 89 - 94  2013.07

     View Summary

    Non-volatile memory has advantages such as low leakage energy and non-volatility compared with SRAM or DRAM, which have high leakage energy. Non-volatile memory is strongly expected to be used for realizing normally-off systems. Non-volatile memory, however, consumes more energy to write than SRAM or DRAM. In this paper, we evaluate the energy consumption of a cache memory in an embedded processor with non-volatile memories. In our evaluation, we assume that their write energy is 1.0x to 10.0x that of SRAM. Experimental evaluations demonstrate that using non-volatile memories in a cache is a better choice in some cases, even when the write energy of non-volatile memories is 10.0x that of SRAM.

    CiNii

  • A non-volatile memory writing reduction method based on state encoding limiting maximum Hamming distance

    TAWADA Masashi, KIMURA Shinji, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Circuits and systems   113 ( 118 ) 95 - 100  2013.07

     View Summary

    Non-volatile memory has many advantages over SRAM, such as high density, low leakage power, and non-volatility. However, one of its largest problems is that it consumes a large amount of energy in writing. It is therefore necessary to reduce the number of written bits and thus decrease the writing energy. In this paper, we propose a memory writing reduction method based on state encoding that limits the maximum Hamming distance. When data is written into memory, we do not write it directly but encode it into a codeword. Then we write the codeword into memory. At this time, we encode the data into a codeword while limiting its maximum Hamming distance from any other codeword. If the maximum Hamming distance is limited among all the codewords, the number of flipped bits is also limited, and so the number of written bits is reduced. We show several experimental evaluations and discuss the effectiveness of our proposed algorithm.

    CiNii

  • Evaluation of energy consumption for two-level cache using Non-Volatile Memory for IL1 and UL2 caches

    MATSUNO Shota, TAWADA Masashi, YANAGISAWA Masao, KIMURA Shinji, TOGAWA Nozomu, SUGIBAYASHI Tadahiko

    IEICE technical report. Signal processing   113 ( 120 ) 89 - 94  2013.07

     View Summary

    Non-volatile memory has advantages such as low leakage energy and non-volatility compared with SRAM or DRAM, which have high leakage energy. Non-volatile memory is strongly expected to be used for realizing normally-off systems. Non-volatile memory, however, consumes more energy to write than SRAM or DRAM. In this paper, we evaluate the energy consumption of a cache memory in an embedded processor with non-volatile memories. In our evaluation, we assume that their write energy is 1.0x to 10.0x that of SRAM. Experimental evaluations demonstrate that using non-volatile memories in a cache is a better choice in some cases, even when the write energy of non-volatile memories is 10.0x that of SRAM.

    CiNii

  • Evaluation of energy consumption for two-level cache using Non-Volatile Memory for IL1 and UL2 caches

    MATSUNO Shota, TAWADA Masashi, YANAGISAWA Masao, KIMURA Shinji, TOGAWA Nozomu, SUGIBAYASHI Tadahiko

    Technical report of IEICE. VLD   113 ( 119 ) 89 - 94  2013.07

     View Summary

    Non-volatile memory has advantages such as low leakage energy and non-volatility compared with SRAM or DRAM, which have high leakage energy. Non-volatile memory is strongly expected to be used for realizing normally-off systems. Non-volatile memory, however, consumes more energy to write than SRAM or DRAM. In this paper, we evaluate the energy consumption of a cache memory in an embedded processor with non-volatile memories. In our evaluation, we assume that their write energy is 1.0x to 10.0x that of SRAM. Experimental evaluations demonstrate that using non-volatile memories in a cache is a better choice in some cases, even when the write energy of non-volatile memories is 10.0x that of SRAM.

    CiNii

  • A Method for Generating Highly Visible Simplified Area Maps Considering Specific Shapes

    折原照崇, 柳澤政生, 戸川望

    マルチメディア、分散協調とモバイルシンポジウム2013論文集   2013   2036 - 2043  2013.07

    CiNii

  • VNS: An Indoor Navigation System Based on Visibility Graphs

    町田理, 町田直哉, 柳澤政生, 戸川望

    マルチメディア、分散協調とモバイルシンポジウム2013論文集   2013   688 - 701  2013.07

    CiNii

  • Accuracy Evaluation of Sensor-Based Current Position Estimation on Mobile Devices

    藤田博, 柳澤政生, 戸川望

    マルチメディア、分散協調とモバイルシンポジウム2013論文集   2013   175 - 181  2013.07

    CiNii

  • A Landmark-Displaying Navigation System for Pedestrians

    岩田裕樹, 柳澤政生, 戸川望

    マルチメディア、分散協調とモバイルシンポジウム2013論文集   2013   702 - 716  2013.07

    CiNii

  • A Linear Interpolation Unit Using Selector Logics

    SHIO Masashi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 30 ) 49 - 54  2013.05

     View Summary

    Interpolation is a technique that estimates a value between existing data points, and it is often used for image scaling and correction of distortion. Linear interpolation is an interpolation technique that computes in-between values by linearly connecting two known values. It is used in practice in many cases because its computation cost is comparatively small. In this paper, we propose a high-speed linear interpolation circuit based on selector logics. The proposed linear interpolation circuit reduces carry propagation delay by using selector logics and thereby realizes fast operation. We have implemented our linear interpolation circuit in several ways and evaluated each of them. We find that a selector-based linear interpolation circuit in which the partial products are summed up by using the arithmetic operator reduces delay by a maximum of 16% compared with a linear interpolation circuit synthesized by using arithmetic operators only.

    CiNii

  • Scan-based Attack against Trivium Stream Cipher Using Scan Signatures

    FUJISHIRO Mika, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 30 ) 61 - 66  2013.05

     View Summary

    Trivium is a synchronous stream cipher using three shift registers. It is designed to have a simple structure and to run at high speed. A scan-based side-channel attack retrieves secret information using scan chains, one of the design-for-test techniques. Since a conventional scan-based attack against Trivium assumes that a scan chain connects only the registers in Trivium, it is difficult to apply it to a practical Trivium LSI chip. In this paper, a scan-based attack method against Trivium using scan signatures is proposed. In our method, we focus on a particular 1-bit position in a collection of scan chains, and we can then attack Trivium even if the scan chain includes registers other than the internal state registers in Trivium. Experimental results show that our proposed method successfully retrieves a plaintext from a ciphertext.

    CiNii

  • A Zero Time and Area Overhead Fault-Secure High-Level Synthesis Algorithm for RDR Architectures

    KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   113 ( 30 ) 67 - 72  2013.05

     View Summary

    In this paper, we propose a zero time and area overhead fault-secure high-level synthesis algorithm for RDR architectures. We duplicate some operations under a given time and area constraint and improve reliability by detecting the faults caused by soft errors. Experimental results demonstrate that our algorithm improves reliability by up to 37.73% with zero time and area overhead compared with the conventional approach.

    CiNii

  • A Zero Time and Area Overhead Fault-Secure High-Level Synthesis Algorithm for RDR Architectures

    Kazushi Kawamura, Masao Yanagisawa, Nozomu Togawa

    研究報告システムLSI設計技術(SLDM)   2013 ( 12 ) 1 - 6  2013.05

     View Summary

    In this paper, we propose a zero time and area overhead fault-secure high-level synthesis algorithm for RDR architectures. We duplicate some operations under a given time and area constraint and improve reliability by detecting the faults caused by soft errors. Experimental results demonstrate that our algorithm improves reliability by up to 37.73% with zero time and area overhead compared with the conventional approach.

    CiNii

  • A Linear Interpolation Unit Using Selector Logics

    Masashi Shio, Masao Yanagisawa, Nozomu Togawa

    研究報告システムLSI設計技術(SLDM)   2013 ( 9 ) 1 - 6  2013.05

     View Summary

    Interpolation is a technique that estimates a value between existing data points, and it is often used for image scaling and distortion correction. Linear interpolation is an interpolation technique that computes in-between values by linearly connecting two known values. It is used in practice in many cases because its computation cost is comparatively small. In this paper, we propose a high-speed linear interpolation circuit based on selector logics. The proposed linear interpolation circuit reduces carry propagation delay by using selector logics and thereby realizes fast operation. We have implemented our linear interpolation circuit in several ways and evaluated each of them. We find that a selector-based linear interpolation circuit whose partial products are summed up by using the arithmetic operator reduces delay by a maximum of 16% compared with a linear interpolation circuit synthesized by using arithmetic operators only.

    CiNii

  • Scan-based Attack against Trivium Stream Cipher Using Scan Signatures

    Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa

    研究報告システムLSI設計技術(SLDM)   2013 ( 11 ) 1 - 6  2013.05

     View Summary

    Trivium is a synchronous stream cipher using three shift registers. It is designed to have a simple structure and to run at high speed. A scan-based side-channel attack retrieves secret information using scan chains, one of the design-for-test techniques. Since the conventional scan-based attack against Trivium assumes that a scan chain connects only the registers in Trivium, it is difficult to apply to a practical Trivium LSI chip. In this paper, a scan-based attack method against Trivium using scan signatures is proposed. In our method, we focus on a particular 1-bit position in a collection of scan chains, so that we can attack Trivium even if the scan chain includes registers other than the internal state registers of Trivium. Experimental results show that our proposed method successfully retrieves a plaintext from a ciphertext.

    CiNii

  • Accuracy Evaluation of Trace-based Cache Simulation for Two-core L1 Caches

    MASASHI TAWADA, MASAO YANAGISAWA, NOZOMU TOGAWA

    IEICE technical report. Dependable computing   112 ( 482 ) 85 - 90  2013.03

     View Summary

    In trace-based cache simulation, we perform cache simulation based on a particular memory access trace obtained by cycle-accurate memory simulation. While cycle-accurate simulation takes too much time to run, trace-based cache simulation runs very fast, so we can evaluate many cache configurations in a short time. Let us consider a multi-core processor cache. We can obtain a memory access trace by using cycle-accurate memory simulation, but the trace can change when we consider another multi-core processor cache configuration. One of the main concerns in trace-based cache simulation applied to multi-core processor caches is its accuracy when the cache configuration that the memory access trace assumes differs from the one the trace-based cache simulation targets. In this paper, using several benchmark applications, we evaluate how much memory access traces affect cache configuration simulation when the simulated cache configurations differ from the one that the memory access traces assume.

    CiNii

  • A Floorplan-Aware Low-Power High-Level Synthesis Method Targeting Multiple Clock Domains (Computer Systems: Workshop on Embedded Technology and Networks, ETNET2013)

    阿部 晋矢, 史 又華, 柳澤 政生, 戸川 望

    電子情報通信学会技術研究報告 : 信学技報   112 ( 481 ) 115 - 120  2013.03

     View Summary

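    In this paper, we propose HDR-mcd, an extension of the HDR architecture aimed at applying multiple clock domains. We then propose a low-power high-level synthesis method oriented toward multiple clock domains that targets HDR-mcd. The proposed method uses a synthesis flow that feeds back floorplan information and iteratively refines the result. In doing so, it uses partitions called huddles, within which communication is guaranteed to complete in one clock cycle, to predict the effect of interconnection delay and to realize high-level synthesis that accounts for synchronization between different clocks. A clock is assigned to each huddle, and power is reduced by assigning low-frequency clocks within the range that satisfies the resource and timing constraints. Computational experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.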

    CiNii

  • Accuracy Evaluation of Trace-based Cache Simulation for Two-core L1 Caches

    MASASHI TAWADA, MASAO YANAGISAWA, NOZOMU TOGAWA

    IEICE technical report. Computer systems   112 ( 481 ) 85 - 90  2013.03

     View Summary

    In trace-based cache simulation, we perform cache simulation based on a particular memory access trace obtained by cycle-accurate memory simulation. While cycle-accurate simulation takes too much time to run, trace-based cache simulation runs very fast, so we can evaluate many cache configurations in a short time. Let us consider a multi-core processor cache. We can obtain a memory access trace by using cycle-accurate memory simulation, but the trace can change when we consider another multi-core processor cache configuration. One of the main concerns in trace-based cache simulation applied to multi-core processor caches is its accuracy when the cache configuration that the memory access trace assumes differs from the one the trace-based cache simulation targets. In this paper, using several benchmark applications, we evaluate how much memory access traces affect cache configuration simulation when the simulated cache configurations differ from the one that the memory access traces assume.

    CiNii

  • A Floorplan-Aware Low-Power High-Level Synthesis Method Targeting Multiple Clock Domains

    阿部晋矢, 史又華, 柳澤政生, 戸川望

    研究報告組込みシステム(EMB)   2013 ( 20 ) 1 - 6  2013.03

     View Summary

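    In this paper, we propose HDR-mcd, an extension of the HDR architecture aimed at applying multiple clock domains. We then propose a low-power high-level synthesis method oriented toward multiple clock domains that targets HDR-mcd. The proposed method uses a synthesis flow that feeds back floorplan information and iteratively refines the result. In doing so, it uses partitions called huddles, within which communication is guaranteed to complete in one clock cycle, to predict the effect of interconnection delay and to realize high-level synthesis that accounts for synchronization between different clocks. A clock is assigned to each huddle, and power is reduced by assigning low-frequency clocks within the range that satisfies the resource and timing constraints. Computational experiments confirm that the proposed method reduces energy consumption by about 25% compared with a conventional distributed-register architecture that considers only a single clock.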

    CiNii

  • Accuracy Evaluation of Trace-based Cache Simulation for Two-core L1 Caches

    多和田 雅師, 柳澤 政生, 戸川 望

    研究報告システムLSI設計技術(SLDM)   2013 ( 15 ) 1 - 6  2013.03

     View Summary

    In general, cycle-accurate simulation of how a cache behaves when an application runs on a processor takes a long time. If instead a memory access trace is obtained by cycle-accurate simulation under a particular assumed cache configuration, and cache behavior is then simulated from that trace, the simulation time can be made extremely short. Here, trace-based cache simulation means simulating how the cache behaves under the assumption that the processor accesses memory according to the memory access trace. In a multi-core architecture, however, memory accesses in principle change depending on the assumed cache configuration. When trace-based simulation is applied to a multi-core architecture, if the cache configuration assumed when obtaining the memory access trace differs from the one assumed in the trace-based simulation, the trace-based simulation results do not match the cycle-accurate simulation results. In this paper, we evaluate how closely trace-based simulation matches cycle-accurate simulation when these two cache configurations differ.

    In trace-based cache simulation, we perform cache simulation based on a particular memory access trace obtained by cycle-accurate memory simulation. While cycle-accurate simulation takes too much time to run, trace-based cache simulation runs very fast, so we can evaluate many cache configurations in a short time. Let us consider a multi-core processor cache. We can obtain a memory access trace by using cycle-accurate memory simulation, but the trace can change when we consider another multi-core processor cache configuration. One of the main concerns in trace-based cache simulation applied to multi-core processor caches is its accuracy when the cache configuration that the memory access trace assumes differs from the one the trace-based cache simulation targets. In this paper, using several benchmark applications, we evaluate how much memory access traces affect cache configuration simulation when the simulated cache configurations differ from the one that the memory access traces assume.

    CiNii

  • Accuracy Evaluation of Trace-based Cache Simulation for Two-core L1 Caches

    多和田 雅師, 柳澤 政生, 戸川 望

    研究報告組込みシステム(EMB)   2013 ( 15 ) 1 - 6  2013.03

     View Summary

    In general, cycle-accurate simulation of how a cache behaves when an application runs on a processor takes a long time. If instead a memory access trace is obtained by cycle-accurate simulation under a particular assumed cache configuration, and cache behavior is then simulated from that trace, the simulation time can be made extremely short. Here, trace-based cache simulation means simulating how the cache behaves under the assumption that the processor accesses memory according to the memory access trace. In a multi-core architecture, however, memory accesses in principle change depending on the assumed cache configuration. When trace-based simulation is applied to a multi-core architecture, if the cache configuration assumed when obtaining the memory access trace differs from the one assumed in the trace-based simulation, the trace-based simulation results do not match the cycle-accurate simulation results. In this paper, we evaluate how closely trace-based simulation matches cycle-accurate simulation when these two cache configurations differ.

    In trace-based cache simulation, we perform cache simulation based on a particular memory access trace obtained by cycle-accurate memory simulation. While cycle-accurate simulation takes too much time to run, trace-based cache simulation runs very fast, so we can evaluate many cache configurations in a short time. Let us consider a multi-core processor cache. We can obtain a memory access trace by using cycle-accurate memory simulation, but the trace can change when we consider another multi-core processor cache configuration. One of the main concerns in trace-based cache simulation applied to multi-core processor caches is its accuracy when the cache configuration that the memory access trace assumes differs from the one the trace-based cache simulation targets. In this paper, using several benchmark applications, we evaluate how much memory access traces affect cache configuration simulation when the simulated cache configurations differ from the one that the memory access traces assume.

    CiNii

  • A-17-13 Universal Navigation System Using Landmarks on Mobile Terminal

    Ishida Tsuyoshi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE General Conference   2013   244 - 244  2013.03

    CiNii

  • A-17-7 Pedestrian Friendly Route for Small Display Device using Deformed Map

    Tsuji Kazuki, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE General Conference   2013   238 - 238  2013.03

    CiNii

  • B-6-56 Duty Cycle Optimization for SMAC Protocol Minimizing Energy Consumption in Sensor Networks

    Ohgishi Ryuji, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the IEICE General Conference   2013 ( 2 ) 56 - 56  2013.03

    CiNii

  • A Temperature-Aware High-Level Synthesis Algorithm for Regular-Distributed-Register Architectures based on Accurate Energy Consumption Estimation

    KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 13 - 18  2012.11

     View Summary

    With process technology scaling, heat problems in IC chips, as well as increasing average interconnection delays, are becoming serious issues. Recently, we have proposed a binding and allocation algorithm for regular-distributed-register architectures (RDR architectures) with the objective of minimizing the peak temperature. In this paper, we propose an improved thermal-aware high-level synthesis algorithm for RDR architectures. The RDR architecture divides the entire chip regularly into islands. Firstly, our algorithm balances the energy consumption among islands through re-binding to functional units. Secondly, it accurately estimates the energy consumption in each island and minimizes the maximum energy consumption among islands through re-allocating new additional functional units. Experimental results demonstrate that our algorithm reduces the peak temperature by up to 15.42% compared with the conventional approaches.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 45 - 50  2012.11

     View Summary

    Secure cryptographic LSIs are widely used to perform confidential operations. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, scan chains used for scan test carry the risk of being misused to leak secret information. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop), which defends against scan-based attacks and achieves high security without compromising testability, has therefore been proposed. A remaining problem in the SDSFF is the update timing of the latch added to the scan FF. In this paper, we propose an update timing that supports online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, decided at design time, with arbitrary FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of various crypto circuit implementations is compromised. Experimental results on various crypto implementations show the effectiveness of the proposed method.

    CiNii

  • Scan-based Attack against Camellia Cryptosystems

    KODERA Hirokazu, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 51 - 56  2012.11

     View Summary

    Camellia is a common key cryptosystem and it has higher tolerance for cryptanalysis than AES. In addition, Camellia has a processing speed equivalent to that of AES. Because Camellia can share encryption processing with decryption processing and doesn't use arithmetic operations, it can be implemented in hardware with a small number of gates. Recently, scan-based attacks have been reported that retrieve secret keys from scanned data obtained from a scan chain. There are no reports on scan-based attacks against Camellia. In this paper, we propose a scan-based attack method against Camellia. Camellia has an 18-round Feistel structure which repeats the round function 18 times. In our proposed method, attackers input two plaintexts to a Camellia cryptosystem LSI and obtain two sets of scanned data. By XORing them, the influence of the S-function in the round function can be removed. We focus on specific bit-column data of the XORed scanned data and, by observing the transitions of the corresponding registers, attackers retrieve four equivalent keys and restore the secret key in Camellia. We showed that secret keys of Camellia are restored with our proposed method.

    CiNii

  • Energy-efficient High-level Synthesis Considering Clock Design for HDR Architectures

    AKASAKA Hiryuki, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 129 - 134  2012.11

     View Summary

    With the miniaturization of LSIs and their increasing performance, demand for high-functionality portable devices has grown significantly. At the same time, problems of battery runtime and device overheating have arisen. In addition, the ratio of interconnection delay to gate delay has continued to increase as device feature size decreases. We have to estimate the interconnection delay and reduce energy consumption even at the high-level synthesis stage. In this paper, we propose high-level synthesis considering clock design for HDR architectures with concurrency-oriented scheduling. Firstly, we focus on the number of control steps at which clock gating can be applied to registers, and we schedule and bind operations so that they are performed at the same time. By adjusting the clock gating timings at the high-level synthesis stage, we enhance the effect of clock gating compared with applying clock gating after logic synthesis. Secondly, we determine the clock gating timings so as to minimize total energy consumption, including clock tree energy. The experimental results show that our proposed algorithm reduces energy consumption by a maximum of 21.2% compared with several conventional algorithms.

    CiNii

  • SAAV: Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 320 ) 135 - 140  2012.11

     View Summary

    An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning and can converge without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur some area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which relaxes these overheads as the iteration proceeds. Experimental results show that our algorithm achieves 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and 65.7% energy saving compared with conventional algorithms.

    CiNii

  • A Temperature-Aware High-Level Synthesis Algorithm for Regular-Distributed-Register Architectures based on Accurate Energy Consumption Estimation

    KAWAMURA Kazushi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 13 - 18  2012.11

     View Summary

    With process technology scaling, heat problems in IC chips, as well as increasing average interconnection delays, are becoming serious issues. Recently, we have proposed a binding and allocation algorithm for regular-distributed-register architectures (RDR architectures) with the objective of minimizing the peak temperature. In this paper, we propose an improved thermal-aware high-level synthesis algorithm for RDR architectures. The RDR architecture divides the entire chip regularly into islands. Firstly, our algorithm balances the energy consumption among islands through re-binding to functional units. Secondly, it accurately estimates the energy consumption in each island and minimizes the maximum energy consumption among islands through re-allocating new additional functional units. Experimental results demonstrate that our algorithm reduces the peak temperature by up to 15.42% compared with the conventional approaches.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration against Scan-Based Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 45 - 50  2012.11

     View Summary

    Secure cryptographic LSIs are widely used to perform confidential operations. Scan test has become the most widely adopted test technique for ensuring the correctness of manufactured LSIs; through the scan chains, the internal states of the circuit under test (CUT) can be controlled and observed externally. However, scan chains used for scan test carry the risk of being misused to leak secret information. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop), which defends against scan-based attacks and achieves high security without compromising testability, has therefore been proposed. A remaining problem in the SDSFF is the update timing of the latch added to the scan FF. In this paper, we propose an update timing that supports online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, decided at design time, with arbitrary FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of various crypto circuit implementations is compromised. Experimental results on various crypto implementations show the effectiveness of the proposed method.

    CiNii

  • Scan-based Attack against Camellia Cryptosystems

    KODERA Hirokazu, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 51 - 56  2012.11

     View Summary

    Camellia is a common key cryptosystem and it has higher tolerance for cryptanalysis than AES. In addition, Camellia has a processing speed equivalent to that of AES. Because Camellia can share encryption processing with decryption processing and doesn't use arithmetic operations, it can be implemented in hardware with a small number of gates. Recently, scan-based attacks have been reported that retrieve secret keys from scanned data obtained from a scan chain. There are no reports on scan-based attacks against Camellia. In this paper, we propose a scan-based attack method against Camellia. Camellia has an 18-round Feistel structure which repeats the round function 18 times. In our proposed method, attackers input two plaintexts to a Camellia cryptosystem LSI and obtain two sets of scanned data. By XORing them, the influence of the S-function in the round function can be removed. We focus on specific bit-column data of the XORed scanned data and, by observing the transitions of the corresponding registers, attackers retrieve four equivalent keys and restore the secret key in Camellia. We showed that secret keys of Camellia are restored with our proposed method.

    CiNii

  • Energy-efficient High-level Synthesis Considering Clock Design for HDR Architectures

    AKASAKA Hiryuki, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 129 - 134  2012.11

     View Summary

    With the miniaturization of LSIs and their increasing performance, demand for high-functionality portable devices has grown significantly. At the same time, problems of battery runtime and device overheating have arisen. In addition, the ratio of interconnection delay to gate delay has continued to increase as device feature size decreases. We have to estimate the interconnection delay and reduce energy consumption even at the high-level synthesis stage. In this paper, we propose high-level synthesis considering clock design for HDR architectures with concurrency-oriented scheduling. Firstly, we focus on the number of control steps at which clock gating can be applied to registers, and we schedule and bind operations so that they are performed at the same time. By adjusting the clock gating timings at the high-level synthesis stage, we enhance the effect of clock gating compared with applying clock gating after logic synthesis. Secondly, we determine the clock gating timings so as to minimize total energy consumption, including clock tree energy. The experimental results show that our proposed algorithm reduces energy consumption by a maximum of 21.2% compared with several conventional algorithms.

    CiNii

  • SAAV: Energy-efficient High-level Synthesis Algorithm targeting Adaptive Voltage Huddle-based Distributed Register Architecture with Dynamic Multiple Supply Voltages

    ABE Shin-ya, SHI Youhua, USAMI Kimiyoshi, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   112 ( 321 ) 135 - 140  2012.11

     View Summary

    An adaptive voltage huddle-based distributed-register architecture (AVHDR architecture), which integrates dynamic multiple supply voltages and interconnection delays into high-level synthesis, and a synthesis algorithm for AVHDR architectures have been proposed. This algorithm is based on iterative improvement of scheduling/binding and floorplanning and can converge without oscillation by using a virtual-area-based iterative refinement flow. However, virtual areas may incur some area and interconnection delay overheads. In this paper, we propose virtual area adaptation, which relaxes these overheads as the iteration proceeds. Experimental results show that our algorithm achieves 6.2% energy saving compared with the conventional algorithm for AVHDR architectures and 65.7% energy saving compared with conventional algorithms.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. ICD   112 ( 247 ) 95 - 100  2012.10

     View Summary

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop), which defends against scan-based attacks and achieves high security without compromising testability, has therefore been proposed. A remaining problem in the SDSFF is the update timing of the latch added to the scan FF. In this paper, we propose an update timing that supports online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, decided at design time, with arbitrary FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, and we demonstrate its effectiveness. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    電子情報通信学会技術研究報告. ICD, 集積回路   112 ( 247 ) 95 - 100  2012.10

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Signal processing   112 ( 246 ) 95 - 100  2012.10

     View Summary

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop), which defends against scan-based attacks and achieves high security without compromising testability, has therefore been proposed. A remaining problem in the SDSFF is the update timing of the latch added to the scan FF. In this paper, we propose an update timing that supports online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, decided at design time, with arbitrary FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, and we demonstrate its effectiveness. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 245 ) 95 - 100  2012.10

     View Summary

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop), which defends against scan-based attacks and achieves high security without compromising testability, has therefore been proposed. A remaining problem in the SDSFF is the update timing of the latch added to the scan FF. In this paper, we propose an update timing that supports online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, decided at design time, with arbitrary FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, and we demonstrate its effectiveness. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • Secure Scan Architecture Using State Dependent Scan Flip-Flop with Key-Based Configuration on RSA Circuit

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Image engineering   112 ( 248 ) 95 - 100  2012.10

     View Summary

    Scan test is a useful design-for-testability technique that can detect circuit failures efficiently. However, it has been reported that secret keys can be retrieved from cryptographic LSIs through scan chains. A secure scan architecture using the SDSFF (State Dependent Scan Flip-Flop), which defends against scan-based attacks and achieves high security without compromising testability, has therefore been proposed. A remaining problem in the SDSFF is the update timing of the latch added to the scan FF. In this paper, we propose an update timing that supports online test without sacrificing security. In our method, the latches are updated according to the result of comparing a KEY value, decided at design time, with arbitrary FFs in the scan chain. We show that, with the proposed method, neither the secret key nor the testability of an RSA circuit implementation is compromised, and we demonstrate its effectiveness. According to the results, even with 100 SDSFFs, the introduced area overhead is 0.555%, which is less than that of the conventional method.

    CiNii

  • A-17-7 Pedestrian Navigation Considering Easiness of Route Understanding using Deformed-map

    Tsuji Kazuki, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2012   166 - 166  2012.08

    CiNii

  • A-3-4 AES Cryptosystem Using Clock Falling Edge Against DFA

    Igarashi Hiroaki, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2012   51 - 51  2012.08

    CiNii

  • A-3-1 Evaluation of L1 and L2 Caches Configuration using Non-Volatile Memory for IL1 and UL2 Caches

    Matsuno Shota, Tawada Masashi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2012   48 - 48  2012.08

    CiNii

  • A-3-5 Secure Scan Architecture Using State Dependent Scan Flip-Flop with Feedback

    Atobe Yuta, Shi Youhua, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2012   52 - 52  2012.08

    CiNii

  • Multiple Supply Voltage aware High-level Synthesis for High-integrated and High-frequency Circuits

    阿部 晋矢, 柳澤 政生, 戸川 望

    回路とシステムワークショップ論文集 Workshop on Circuits and Systems   25   160 - 165  2012.07

    CiNii

  • Data Structures Representing Multiple Cache Configurations and Its Associated Fast and Exact Two-core Cache Configuration Simulation

    多和田 雅師, 柳澤 政生, 戸川 望

    回路とシステムワークショップ論文集 Workshop on Circuits and Systems   25   414 - 419  2012.07

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Signal processing   112 ( 115 ) 115 - 120  2012.06

     View Summary

    Scan test, one of the useful design-for-testability techniques, can control and observe the FFs (flip-flops) inside LSIs and can detect circuit failures efficiently. On the other hand, scan-based attacks, which use scan chains to retrieve the secret keys of cryptographic LSIs, have been considered. Generally, testability and security are contradictory, so there is a need for an efficient design-for-testability circuit that satisfies both. In this paper, a secure scan architecture against scan-based attacks that retains high testability is proposed. In our method, a latch is added to arbitrary FFs in the scan chain so that the scan data changes in a state-dependent manner into data that is unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. The tester can still perform a normal scan test because the tester knows the structure of the extended circuit. We analyzed an RSA implementation to show the effectiveness of the proposed method and discussed how our approach resists scan-based attacks.

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Mathematical Systems Science and its Applications : IEICE technical report   112 ( 116 ) 115 - 120  2012.06

    Scan test, a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks, which use the scan chain to retrieve the secret keys of cryptographic LSIs, have been reported. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to FFs in the scan chain so that the scan data changes depending on the circuit state and becomes unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. A tester can still carry out a normal scan test because the tester knows the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Circuits and systems   112 ( 113 ) 115 - 120  2012.06

    Scan test, a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks, which use the scan chain to retrieve the secret keys of cryptographic LSIs, have been reported. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to FFs in the scan chain so that the scan data changes depending on the circuit state and becomes unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. A tester can still carry out a normal scan test because the tester knows the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • Secure Scan Architecture on RSA Circuit Using State Dependent Scan Flip Flop against Scan-Based Side Channel Attack

    ATOBE Yuta, SHI Youhua, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 114 ) 115 - 120  2012.06

    Scan test, a useful design-for-testability technique that can control and observe the flip-flops (FFs) inside LSIs, detects circuit failures efficiently. On the other hand, scan-based attacks, which use the scan chain to retrieve the secret keys of cryptographic LSIs, have been reported. Because testability and security are generally contradictory, an efficient design-for-testability circuit that satisfies both is needed. In this paper, we propose a secure scan architecture that resists scan-based attacks while retaining high testability. In our method, latches are added to FFs in the scan chain so that the scan data changes depending on the circuit state and becomes unintelligible to attackers. Changing the values of the FFs dynamically changes the scan data. A tester can still carry out a normal scan test because the tester knows the structure of the extended circuit. We analyze an RSA implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based attacks.

    CiNii

  • Multiple-supply-voltages aware high-speed and high-efficiency high-level synthesis for HDR architecture

    阿部 晋矢, 柳澤 政生, 戸川 望

    研究報告システムLSI設計技術(SLDM)   2012 ( 2 ) 1 - 6  2012.05

    With the advent of highly integrated, highly functional LSI fabrication technologies, LSI design must take energy efficiency and interconnection delay into account. Multiple supply voltages, one of the low-power techniques, are more effective the earlier in the design flow they are considered, and high-level synthesis must also account for floorplanning, a lower-level design step, and the resulting interconnection delays. The HDR architecture has been proposed as a platform that integrates energy efficiency and interconnection delays into high-level synthesis. In this paper, we propose a new multiple-supply-voltage-aware, high-speed and high-efficiency high-level synthesis method for HDR architectures. We propose two new techniques, "virtual area estimation" and "floorplanning-directed huddling", and integrate them into an HDR architecture synthesis algorithm. "Virtual area estimation" reduces the oscillation of huddle areas during iterations, which impedes convergence in conventional methods. "Floorplanning-directed huddling" determines the huddle structure efficiently by resolving floorplanning and functional unit assignment inside huddles at the same time. Experimental results show that our algorithm reduces run time by about 40% compared with conventional methods.

    CiNii

  • Multiple-supply-voltages aware high-speed and high-efficiency high-level synthesis for HDR architecture

    ABE Shin-ya, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   112 ( 71 ) 7 - 12  2012.05

    The HDR architecture has been proposed as a platform that integrates energy efficiency and interconnection delays into high-level synthesis. In this paper, we propose a new multiple-supply-voltage-aware, high-speed and high-efficiency high-level synthesis method for HDR architectures. We propose two new techniques, "virtual area estimation" and "floorplanning-directed huddling", and integrate them into an HDR architecture synthesis algorithm. "Virtual area estimation" reduces the oscillation of huddle areas during iterations, which impedes convergence in conventional methods. "Floorplanning-directed huddling" determines the huddle structure efficiently by resolving floorplanning and functional unit assignment inside huddles at the same time. Experimental results show that our algorithm reduces run time by about 40% compared with conventional methods.

    CiNii

  • A-3-7 Control Method for Image Recognition Hardware with Multiple Video Inputs

    Otsuka Takuya, Hosoya Eiichi, Aoki Takashi, Onozawa Akira, Lee Seungju, Togawa Nozomu

    Proceedings of the IEICE General Conference   2012   91 - 91  2012.03

    CiNii

  • Fast and Exact Cache Configuration Simulation for Two-core L1 Cache

    多和田 雅師, 柳澤 政生, 戸川 望

    研究報告システムLSI設計技術(SLDM)   2012 ( 3 ) 1 - 6  2012.02

    Multicore processors are now widely used in embedded systems. Because the application programs that run on an embedded system are limited, there exists an optimal cache memory in terms of speed, power, and area. Simulating the application programs on various cache configurations is one of the best ways to determine the optimal cache configuration at design time. Multicore cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation, and while speed-up techniques have been studied for single-core processors, little progress has been made for multicore processors. In this paper, we propose a very fast two-core L1 cache configuration simulation algorithm. Because multicore processors use a cache coherency protocol, even similar cache configurations often have different internal states. We therefore exploit properties of the coherency protocol's state transitions and of cache associativity to propose a new data structure in which a single structure represents two or more multicore cache configurations with different cache associativities, which narrows the range of searches and updates. We then propose a new multicore cache configuration simulation algorithm that uses this data structure together with new theorems.

    CiNii

  • Partial Redundant Fault Secure High Level Synthesis for RDR Architecture

    田中 翔, 柳澤 政生, 戸川 望

    研究報告システムLSI設計技術(SLDM)   2012 ( 4 ) 1 - 6  2012.02

    As device feature sizes decrease, improving reliability against soft errors becomes essential. A fault-secure system, which realizes concurrent error detection, is one solution to this problem. On the other hand, the average interconnect delay now exceeds the gate delay, which leads to the timing closure problem and makes it necessary to predict interconnect delays during high-level synthesis. In contrast to conventional centralized-register architectures, whose wire lengths are indeterminate, a regular-distributed-register architecture (RDR architecture) distributes registers evenly across the chip, so interconnect delays can be estimated very accurately and their influence can be greatly reduced even at the behavioral level. In this paper, we propose a partial redundant fault-secure high-level synthesis algorithm for an RDR architecture, which duplicates some of the operation nodes of the input CDFG to maximize reliability within the latency constraint. In fault-secure high-level synthesis, a re-computation CDFG, a copy of part of the normal-computation CDFG, must be scheduled and bound to functional units. First, our algorithm reuses vacant areas on RDR islands to allocate additional functional units for the re-computation CDFG, which improves area efficiency and increases the number of operation nodes that can be duplicated. Second, we propose a scheduling and binding algorithm that minimizes the number of inserted comparator nodes and thereby suppresses unnecessary functional unit operations. Experimental results show the effectiveness of the proposed algorithm: it reduces the soft error rate by an average of 57% compared with a non-fault-secure approach.

    CiNii

  • A Fast Interpolation Unit Using Selector Logics

    岩田 愛実, 吉原 弘峰, 柳澤 政生, 戸川 望

    研究報告システムLSI設計技術(SLDM)   2012 ( 7 ) 1 - 6  2012.02

    Interpolation is a technique that fills the gaps between existing data points and is often applied to image scaling, super-resolution, and fisheye image correction. Cubic spline interpolation obtains a cubic function from four existing points and fills the gaps between them very smoothly and precisely. However, it is time-consuming because it requires many data points and complex calculations. Speeding up cubic spline interpolation is the key to realizing a practical, real-time image scaling system. In this paper, we embed selector logic in the interpolation unit to reduce carry-propagation delay. We first focus on linear interpolation, which interpolates from two neighboring points, and propose a high-speed linear interpolation circuit based on "selector logics." Second, we propose a high-speed cubic spline interpolation circuit composed of our proposed linear interpolation circuits. Experimental results demonstrate that our linear interpolation circuit reduces delay by up to 15% compared with a conventional linear interpolation unit designed with arithmetic operators, and that our cubic spline interpolation circuit reduces delay by up to 25% compared with a conventional cubic spline interpolation unit.

    CiNii

  • Fast and Exact Cache Configuration Simulation for Two-core L1 Cache

    多和田 雅師, 柳澤 政生, 戸川 望

    研究報告組込みシステム(EMB)   2012 ( 3 ) 1 - 6  2012.02

    Multicore processors are now widely used in embedded systems. Because the application programs that run on an embedded system are limited, there exists an optimal cache memory in terms of speed, power, and area. Simulating the application programs on various cache configurations is one of the best ways to determine the optimal cache configuration at design time. Multicore cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation, and while speed-up techniques have been studied for single-core processors, little progress has been made for multicore processors. In this paper, we propose a very fast two-core L1 cache configuration simulation algorithm. Because multicore processors use a cache coherency protocol, even similar cache configurations often have different internal states. We therefore exploit properties of the coherency protocol's state transitions and of cache associativity to propose a new data structure in which a single structure represents two or more multicore cache configurations with different cache associativities, which narrows the range of searches and updates. We then propose a new multicore cache configuration simulation algorithm that uses this data structure together with new theorems.

    CiNii

  • Partial Redundant Fault Secure High Level Synthesis for RDR Architecture

    田中 翔, 柳澤 政生, 戸川 望

    研究報告組込みシステム(EMB)   2012 ( 4 ) 1 - 6  2012.02

    As device feature sizes decrease, improving reliability against soft errors becomes essential. A fault-secure system, which realizes concurrent error detection, is one solution to this problem. On the other hand, the average interconnect delay now exceeds the gate delay, which leads to the timing closure problem and makes it necessary to predict interconnect delays during high-level synthesis. In contrast to conventional centralized-register architectures, whose wire lengths are indeterminate, a regular-distributed-register architecture (RDR architecture) distributes registers evenly across the chip, so interconnect delays can be estimated very accurately and their influence can be greatly reduced even at the behavioral level. In this paper, we propose a partial redundant fault-secure high-level synthesis algorithm for an RDR architecture, which duplicates some of the operation nodes of the input CDFG to maximize reliability within the latency constraint. In fault-secure high-level synthesis, a re-computation CDFG, a copy of part of the normal-computation CDFG, must be scheduled and bound to functional units. First, our algorithm reuses vacant areas on RDR islands to allocate additional functional units for the re-computation CDFG, which improves area efficiency and increases the number of operation nodes that can be duplicated. Second, we propose a scheduling and binding algorithm that minimizes the number of inserted comparator nodes and thereby suppresses unnecessary functional unit operations. Experimental results show the effectiveness of the proposed algorithm: it reduces the soft error rate by an average of 57% compared with a non-fault-secure approach.

    CiNii

  • A Fast Interpolation Unit Using Selector Logics

    岩田 愛実, 吉原 弘峰, 柳澤 政生, 戸川 望

    研究報告組込みシステム(EMB)   2012 ( 7 ) 1 - 6  2012.02

    Interpolation is a technique that fills the gaps between existing data points and is often applied to image scaling, super-resolution, and fisheye image correction. Cubic spline interpolation obtains a cubic function from four existing points and fills the gaps between them very smoothly and precisely. However, it is time-consuming because it requires many data points and complex calculations. Speeding up cubic spline interpolation is the key to realizing a practical, real-time image scaling system. In this paper, we embed selector logic in the interpolation unit to reduce carry-propagation delay. We first focus on linear interpolation, which interpolates from two neighboring points, and propose a high-speed linear interpolation circuit based on "selector logics." Second, we propose a high-speed cubic spline interpolation circuit composed of our proposed linear interpolation circuits. Experimental results demonstrate that our linear interpolation circuit reduces delay by up to 15% compared with a conventional linear interpolation unit designed with arithmetic operators, and that our cubic spline interpolation circuit reduces delay by up to 25% compared with a conventional cubic spline interpolation unit.

    CiNii

  • Fast and Exact Cache Configuration Simulation for Two-core L1 Cache

    TAWADA MASASHI, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Dependable computing   111 ( 462 ) 13 - 18  2012.02

    Multicore processors are now widely used in embedded systems. Because the application programs that run on an embedded system are limited, there exists an optimal cache memory in terms of power and area. Simulating the application programs on various cache configurations is one of the best ways to determine the optimal cache configuration. Multicore cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation. In this paper, we propose a very fast two-core L1 cache configuration simulation algorithm. We first propose a new data structure in which a single structure represents two or more multicore cache configurations with different cache associativities. We then propose a new multicore cache configuration simulation algorithm that uses this data structure together with new theorems.

    CiNii

  • Partial Redundant Fault Secure High Level Synthesis for RDR Architecture

    TANAKA SHO, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Dependable computing   111 ( 462 ) 19 - 24  2012.02

    As device feature sizes decrease, improving reliability against soft errors becomes essential. A fault-secure system, which realizes concurrent error detection, is one solution to this problem. On the other hand, the average interconnect delay now exceeds the gate delay, which leads to the timing closure problem. With a regular-distributed-register architecture (RDR architecture), interconnect delays can be estimated very accurately and their influence can be greatly reduced even at the behavioral level. In this paper, we propose a partial redundant fault-secure high-level synthesis algorithm for an RDR architecture. In fault-secure high-level synthesis, a re-computation CDFG, a copy of part of the normal-computation CDFG, must be scheduled and bound to functional units. First, our algorithm reuses vacant areas on RDR islands to allocate additional functional units for the re-computation CDFG. Second, we propose a scheduling algorithm that minimizes the number of inserted comparator nodes. Experimental results show the effectiveness of the proposed algorithm: it reduces the soft error rate by an average of 57% compared with a non-fault-secure approach.

    CiNii

  • A Fast Interpolation Unit Using Selector Logics

    IWATA MANAMI, YOSHIHARA HIROMINE, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Dependable computing   111 ( 462 ) 37 - 42  2012.02

    Interpolation is a technique that fills the gaps between existing data points and is often applied to image scaling and super-resolution. Cubic spline interpolation obtains a cubic function from four existing points and fills the gaps between them very smoothly and precisely. However, it is time-consuming because it requires many data points and complex calculations. Speeding up cubic spline interpolation is the key to realizing a practical image scaling system. In this paper, we first focus on linear interpolation and propose a high-speed linear interpolation circuit based on "selector logics." Second, we propose a high-speed cubic spline interpolation circuit composed of our proposed linear interpolation circuits. Experimental results demonstrate that our linear interpolation circuit improves performance by 15% and our cubic interpolation circuit improves performance by 25%, compared to a conventional interpolation design.

    CiNii

  • Fast and Exact Cache Configuration Simulation for Two-core L1 Cache

    TAWADA MASASHI, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Computer systems   111 ( 461 ) 13 - 18  2012.02

    Multicore processors are now widely used in embedded systems. Because the application programs that run on an embedded system are limited, there exists an optimal cache memory in terms of power and area. Simulating the application programs on various cache configurations is one of the best ways to determine the optimal cache configuration. Multicore cache configuration simulation, however, is much more complicated and takes much more time than single-core cache configuration simulation. In this paper, we propose a very fast two-core L1 cache configuration simulation algorithm. We first propose a new data structure in which a single structure represents two or more multicore cache configurations with different cache associativities. We then propose a new multicore cache configuration simulation algorithm that uses this data structure together with new theorems.

    CiNii

  • Partial Redundant Fault Secure High Level Synthesis for RDR Architecture

    TANAKA SHO, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Computer systems   111 ( 461 ) 19 - 24  2012.02

    As device feature sizes decrease, improving reliability against soft errors becomes essential. A fault-secure system, which realizes concurrent error detection, is one solution to this problem. On the other hand, the average interconnect delay now exceeds the gate delay, which leads to the timing closure problem. With a regular-distributed-register architecture (RDR architecture), interconnect delays can be estimated very accurately and their influence can be greatly reduced even at the behavioral level. In this paper, we propose a partial redundant fault-secure high-level synthesis algorithm for an RDR architecture. In fault-secure high-level synthesis, a re-computation CDFG, a copy of part of the normal-computation CDFG, must be scheduled and bound to functional units. First, our algorithm reuses vacant areas on RDR islands to allocate additional functional units for the re-computation CDFG. Second, we propose a scheduling algorithm that minimizes the number of inserted comparator nodes. Experimental results show the effectiveness of the proposed algorithm: it reduces the soft error rate by an average of 57% compared with a non-fault-secure approach.

    CiNii

  • A Fast Interpolation Unit Using Selector Logics

    IWATA MANAMI, YOSHIHARA HIROMINE, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Computer systems   111 ( 461 ) 37 - 42  2012.02

    Interpolation is a technique that fills the gaps between existing data points and is often applied to image scaling and super-resolution. Cubic spline interpolation obtains a cubic function from four existing points and fills the gaps between them very smoothly and precisely. However, it is time-consuming because it requires many data points and complex calculations. Speeding up cubic spline interpolation is the key to realizing a practical image scaling system. In this paper, we first focus on linear interpolation and propose a high-speed linear interpolation circuit based on "selector logics." Second, we propose a high-speed cubic spline interpolation circuit composed of our proposed linear interpolation circuits. Experimental results demonstrate that our linear interpolation circuit improves performance by 15% and our cubic interpolation circuit improves performance by 25%, compared to a conventional interpolation design.

    CiNii

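  • Evaluation of Two-Level Cache Configurations with Non-Volatile Memory Using Fast Cache Configuration Simulation (キャッシュ構成の高速シミュレーションを利用した不揮発メモリによる二階層キャッシュ構成の評価)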

    松野翔太, 多和田雅師, 柳澤政生, 戸川望

    情報処理学会シンポジウム論文集   2012 ( 5 )  2012

    J-GLOBAL

  • Scan-based Attack against Triple DES Cryptosystems Using Scan Signatures

    KODERA Hirokazu, YANAGISAWA Masao, TOGAWA Nozomu

    Technical report of IEICE. VLD   111 ( 324 ) 7 - 12  2011.11

    Scan-path test is a useful design-for-test technique that can observe and control registers inside LSIs. On the other hand, a scan-based attack, which retrieves secret keys from scanned data, is considered one of the strongest side-channel attacks. In this paper, we propose a scan-based attack method against Triple DES cryptosystems using a "scan signature". In our method, several plaintexts are input into a Triple DES module and the attacker obtains the scanned data. The attacker then observes a specific bit line (the scan signature) of the scanned data to retrieve a secret key. The Triple DES algorithm uses three secret keys. The first secret key can be retrieved in the same way as a secret key is retrieved from a DES module; the main challenge is retrieving the second and third keys. Our method retrieves them by using the retrieved first key and setting an appropriate scan signature. Experimental results show that our method successfully retrieves all three secret keys of a Triple DES module using at most 43 plaintexts.

    CiNii

  • Scan-based Attack against Triple DES Cryptosystems Using Scan Signatures

    KODERA Hirokazu, YANAGISAWA Masao, TOGAWA Nozomu

    IEICE technical report. Dependable computing   111 ( 325 ) 7 - 12  2011.11

    Scan-path test is a useful design-for-test technique that can observe and control registers inside LSIs. On the other hand, a scan-based attack, which retrieves secret keys from scanned data, is considered one of the strongest side-channel attacks. In this paper, we propose a scan-based attack method against Triple DES cryptosystems using a "scan signature". In our method, several plaintexts are input into a Triple DES module and the attacker obtains the scanned data. The attacker then observes a specific bit line (the scan signature) of the scanned data to retrieve a secret key. The Triple DES algorithm uses three secret keys. The first secret key can be retrieved in the same way as a secret key is retrieved from a DES module; the main challenge is retrieving the second and third keys. Our method retrieves them by using the retrieved first key and setting an appropriate scan signature. Experimental results show that our method successfully retrieves all three secret keys of a Triple DES module using at most 43 plaintexts.

    CiNii

  • Scan-based Attack against Triple DES Cryptosystems Using Scan Signatures

    KODERA Hirokazu, YANAGISAWA Masao, TOGAWA Nozomu

    電子情報通信学会技術研究報告. DC, ディペンダブルコンピューティング : IEICE technical report   111 ( 325 ) 7 - 12  2011.11

    CiNii

  • Scan-based Attack against Triple DES Cryptosystems Using Scan Signatures

    KODERA Hirokazu, YANAGISAWA Masao, TOGAWA Nozomu

    電子情報通信学会技術研究報告. VLD, VLSI設計技術   111 ( 324 ) 7 - 12  2011.11

    CiNii

  • Scan-based Attack against Triple DES Cryptosystems Using Scan Signatures

    小寺 博和, 柳澤 政生, 戸川 望

    研究報告システムLSI設計技術(SLDM)   2011 ( 2 ) 1 - 6  2011.11

    Scan-path test is a useful design-for-test technique that can directly observe and control registers inside LSIs from outside, which makes it very helpful for LSI verification. On the other hand, a scan-based attack, which retrieves secret keys from the scanned data obtained through the test scan chain, is considered one of the strongest side-channel attacks. Scan-based attack methods have been proposed against the symmetric-key ciphers DES and AES and against the public-key ciphers RSA and elliptic curve cryptography, but none has been reported against Triple DES. In this paper, we propose a scan-based attack method against Triple DES cryptosystems using a "scan signature". In our method, several plaintexts are input into a Triple DES module and the attacker obtains the scanned data. The attacker then observes a specific bit line (the scan signature) of the scanned data to retrieve a secret key. The key can be retrieved even when the scan chain includes registers outside the cryptographic LSI and even when the operation timing of the cryptographic LSI is unknown. The Triple DES algorithm uses three secret keys. The first secret key can be retrieved in the same way as a secret key is retrieved from a DES module; the main challenge is retrieving the second and third keys. Our method retrieves them by using the retrieved first key and setting an appropriate scan signature. Experimental results show that our method successfully retrieves all three secret keys of a Triple DES module using at most 43 plaintexts.

    CiNii

  • Scan-based Attack against DES Cryptosystems Independent of Scan-structure

    KODERA HIROKAZU, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Signal processing   111 ( 257 ) 61 - 66  2011.10

    Side-channel attacks against cryptographic modules and LSIs have become a practical threat. In particular, a scan-based attack, which retrieves secret keys from scan data, is considered one of the strongest side-channel attacks. In this paper, we propose a scan-based attack method against DES cryptosystems. In our method, several plaintexts are input into a DES module, and the attacker then retrieves the secret key by observing a specific bit line of the scanned data. Because the values of this bit line depend on the secret key, the attacker can use them to analyze the key. Even when the attacker does not know the scan chain structure implemented on the DES module, and even when the scan chain includes registers other than those of the DES crypto module, our method can successfully retrieve the secret key. Several experimental evaluations confirm the effectiveness of the proposed method.

    CiNii

  • Multiple Supply Voltages aware High-level Synthesis for HDR architecture

    ABE SHIN-YA, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Signal processing   111 ( 257 ) 95 - 100  2011.10

    As battery runtime and overheating problems in portable devices can no longer be ignored, energy-aware LSI design is strongly required. Moreover, interconnect delay should be considered explicitly because it exceeds gate delay as semiconductor devices are downsized. Energy efficiency and interconnect delay must therefore be taken into account even in high-level synthesis. Recently, a huddle-based distributed-register architecture (HDR architecture), a kind of island-based distributed-register architecture for multi-cycle interconnect communications, and its associated synthesis algorithm have been proposed. The algorithm is composed of scheduling/FU binding, huddling, unhuddling, and floorplanning. However, the original scheduling/FU binding minimizes execution time rather than energy consumption directly. In this paper, we propose a new scheduling/FU binding algorithm that aims to minimize energy consumption, considering multiple supply voltages, for HDR architectures. Experimental results show that our algorithm achieves 45.1% energy saving compared with conventional distributed-register architectures and conventional algorithms, and 15.9% energy saving compared with the conventional algorithm for HDR architecture.

    CiNii

  • Scan-based Attack against DES Cryptosystems Independent of Scan-structure

    KODERA HIROKAZU, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Image engineering   111 ( 259 ) 61 - 66  2011.10

     View Summary

    Side-channel attacks against cryptographic modules and LSIs have become a practical threat. In particular, a scan-based attack, which retrieves secret keys from scan data, is considered one of the strongest side-channel attacks. In this paper, a scan-based attack method against DES cryptosystems is proposed. In our method, several plaintexts are input into a DES module, and an attacker then retrieves the secret key by observing a specific bit line of the scanned data. Because the values of that bit line depend on the secret key, the attacker can use them to analyze the key. Even when the attacker does not know the scan chain structure implemented on the DES module, and even when the scan chain includes registers other than those of the DES crypto module, our proposed method can successfully retrieve the secret key. Several experimental evaluations confirm the effectiveness of the proposed method.

    CiNii

  • Multiple Supply Voltages aware High-level Synthesis for HDR architecture

    ABE SHIN-YA, YANAGISAWA MASAO, TOGAWA NOZOMU

    IEICE technical report. Image engineering   111 ( 259 ) 95 - 100  2011.10

     View Summary

    As battery runtime and overheating problems for portable devices can no longer be ignored, energy-aware LSI design is strongly required. Moreover, interconnect delay should be considered explicitly because it exceeds gate delay as semiconductor devices are downsized. We must take energy efficiency and interconnect delay into account even in high-level synthesis. Recently, a huddle-based distributed-register architecture (HDR architecture), a kind of island-based distributed-register architecture for multi-cycle interconnect communications, and its associated synthesis algorithm have been proposed. The algorithm is composed of scheduling/FU binding, huddling, unhuddling, and floorplanning. However, the original scheduling/FU binding minimizes execution time rather than energy consumption directly. In this paper we propose a new scheduling/FU binding algorithm that minimizes energy consumption by considering multiple supply voltages for HDR architectures. Experimental results show that our algorithm achieves 45.1% energy savings compared with conventional distributed-register architectures and conventional algorithms, and 15.9% energy savings compared with the conventional algorithm for the HDR architecture.

    CiNii

  • Scan-based Attack against DES Cryptosystems Independent of Scan-structure

    KODERA HIROKAZU, YANAGISAWA MASAO, TOGAWA NOZOMU

    Technical report of IEICE. ICD   111 ( 258 ) 61 - 66  2011.10

     View Summary

    Side-channel attacks against cryptographic modules and LSIs have become a practical threat. In particular, a scan-based attack, which retrieves secret keys from scan data, is considered one of the strongest side-channel attacks. In this paper, a scan-based attack method against DES cryptosystems is proposed. In our method, several plaintexts are input into a DES module, and an attacker then retrieves the secret key by observing a specific bit line of the scanned data. Because the values of that bit line depend on the secret key, the attacker can use them to analyze the key. Even when the attacker does not know the scan chain structure implemented on the DES module, and even when the scan chain includes registers other than those of the DES crypto module, our proposed method can successfully retrieve the secret key. Several experimental evaluations confirm the effectiveness of the proposed method.

    CiNii

  • Multiple Supply Voltages aware High-level Synthesis for HDR architecture

    ABE SHIN-YA, YANAGISAWA MASAO, TOGAWA NOZOMU

    Technical report of IEICE. ICD   111 ( 258 ) 95 - 100  2011.10

     View Summary

    As battery runtime and overheating problems for portable devices can no longer be ignored, energy-aware LSI design is strongly required. Moreover, interconnect delay should be considered explicitly because it exceeds gate delay as semiconductor devices are downsized. We must take energy efficiency and interconnect delay into account even in high-level synthesis. Recently, a huddle-based distributed-register architecture (HDR architecture), a kind of island-based distributed-register architecture for multi-cycle interconnect communications, and its associated synthesis algorithm have been proposed. The algorithm is composed of scheduling/FU binding, huddling, unhuddling, and floorplanning. However, the original scheduling/FU binding minimizes execution time rather than energy consumption directly. In this paper we propose a new scheduling/FU binding algorithm that minimizes energy consumption by considering multiple supply voltages for HDR architectures. Experimental results show that our algorithm achieves 45.1% energy savings compared with conventional distributed-register architectures and conventional algorithms, and 15.9% energy savings compared with the conventional algorithm for the HDR architecture.

    CiNii

  • A-3-1 Adaptive parallel interpolation architecture using motion vectors

    Kurioka Daiki, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2011   75 - 75  2011.08

    CiNii

  • A-3-3 Super-resolution by Using Weighted Adders with Selector Logics and Its Experimental Comparisons

    Yoshihara Hiromine, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2011   77 - 77  2011.08

    CiNii

  • A-3-11 A consideration of exact cache configuration simulation for two-core processors

    Tawada Masashi, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2011   85 - 85  2011.08

    CiNii

  • A-3-13 Performance comparison between shared bus and bus matrix in network processors

    Deguchi Kensuke, Yanagisawa Masao, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2011   87 - 87  2011.08

    CiNii

  • Reconstruction Hardware Design in Super-resolution Using Weighted Adders with Selector Logics

    吉原 弘峰, 柳澤 政生, 戸川 望

    Proceedings of the Workshop on Circuits and Systems   24 ( 266 ) 431 - 436  2011.08

    CiNii

  • Spatial Cognition in Indoor Environments for Pedestrian Navigation

    杉岡基行, 柳澤政生, 戸川望

    Proceedings of the Multimedia, Distributed, Cooperative, and Mobile Symposium 2011   2011   1065 - 1079  2011.06

    CiNii

  • Indoor Environment Modeling and a Flexible Walking-Route Generation Method

    町田直哉, 柳澤政生, 戸川望

    Proceedings of the Multimedia, Distributed, Cooperative, and Mobile Symposium 2011   2011   1057 - 1064  2011.06

    CiNii

  • Super-resolution by Using Weighted Adders with Selector Logics

    YOSHIHARA Hiromine, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    IEICE technical report   111 ( 40 ) 27 - 32  2011.05

     View Summary

    In recent years, the popularity of television sets and computers with large screens has led to more opportunities to watch moving pictures on high-resolution liquid crystal displays (LCDs), where low-resolution images must be converted to high-resolution ones at low cost. Super-resolution is a technique that removes noise from observed images and restores their high-frequency components. We focus on reconstruction-based super-resolution because it can restore the brightness of the original images. It produces a high-resolution image from a set of low-resolution images. Reconstruction incurs a large computation cost because it requires many images. Since reconstruction-based algorithms need more information on images, it is necessary to improve the performance of arithmetic circuits specific to reconstruction-based super-resolution. In this paper, we propose reconstruction-based super-resolution using a weighted adder. Our weighted adder is implemented using selector logic, so that carry propagation is reduced and the performance of reconstruction-based super-resolution improves. Experimental results demonstrate that our proposed weighted adder circuit improves performance by 13% and reduces area by 32% compared to conventional weighted adders.

    CiNii

  • DS-2-4 Fast Angular Intra-Prediction Mode Decision Method based on Edge Information

    Tokumitsu Kenta, Chono Keiichi, Senzaki Kenta, Senda Yuzo, Togawa Nozomu, Yanagisawa Masao, Ohtsuki Tatsuo

    Proceedings of the IEICE General Conference   2011 ( 2 ) "S - 7"  2011.02

    CiNii

  • Exact, Fast and Flexible Two-level Cache Simulation for Embedded Systems

    TAWADA Masashi, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    IEICE technical report   110 ( 432 ) 13 - 18  2011.02

     View Summary

    In hierarchical cache configurations, the L1 cache uses LRU as its replacement policy, whereas L2 and/or L3 caches use FIFO because of its low hardware cost. This paper proposes a fast cache configuration simulation method for hierarchical configurations composed of an LRU-based L1 cache and a FIFO-based L2 cache. In our proposed method, we fix the L1 data cache and simultaneously simulate several L1 instruction cache configurations and L2 unified cache configurations while varying their cache parameters. By exploiting L1/L2 cache properties, we can skip the simulation of several cache configurations yet still obtain exact cache hit/miss counts for all L1/L2 cache configurations. Experimental evaluations demonstrate that our proposed method speeds up simulation by up to 1900 times.

    CiNii

  • An Energy-efficient ASIP Synthesis Method Using Scratchpad Memory and Code Placement Optimization

    SHIMADA Yoshinori, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   110 ( 432 ) 25 - 30  2011.02

     View Summary

    In this paper, we propose an energy-efficient ASIP synthesis method using scratchpad memory. Because a significant amount of power is consumed in the instruction memory, developing an energy-efficient memory structure is important for reducing the overall power consumption of the system. Our method is based on the idea of using scratchpad memory with code placement optimization. The proposed memory architecture can copy data from instruction memory to scratchpad memory under the control of an on-chip program counter. Given the CFG of an input application, the proposed code placement optimization decides both the code allocation and the required scratchpad memory size for energy minimization. As a result, total energy consumption is reduced because the number of instruction memory accesses is reduced. Experimental results on Mediabench show the effectiveness of the proposed method, which reduces energy consumption by 47.9% on average.

    CiNii

  • Speeding-up Exact and Fast L1 Cache Configuration Simulation based on FIFO Replacement Policy

    TAWADA Masashi, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    IEICE technical report   110 ( 317 ) 55 - 60  2010.11

     View Summary

    The number of sets, block size, and associativity determine a processor's cache configuration. Particularly in embedded systems, the cache configuration can be optimized because the target applications are limited. Recently, the CRCB approach has been proposed for LRU-based cache configuration simulation; it can calculate the cache hit/miss rate accurately and very quickly while varying the three parameters described above. However, many recent processors use FIFO-based caches instead of LRU-based caches. In this paper, we propose a faster cache configuration simulation method for embedded applications with FIFO as the cache replacement policy. We first prove several properties of FIFO-based caches and then propose a simulation method that can process two or more FIFO-based cache configurations with different associativities simultaneously. Experimental results show that our proposed method obtains accurate cache hit/miss counts and is on average 18% faster than conventional simulators.
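    To make the three parameters in this summary concrete, the following is a minimal sketch of a plain one-configuration-at-a-time simulator, i.e. the kind of exhaustive baseline that such methods aim to speed up. It is not the authors' method; the function name `simulate_cache` and the address trace are assumptions made for illustration.

```python
from collections import OrderedDict

def simulate_cache(addresses, num_sets, block_size, assoc, policy="FIFO"):
    """Return (hits, misses) for a set-associative cache over a byte-address trace."""
    if policy not in ("FIFO", "LRU"):
        raise ValueError("policy must be 'FIFO' or 'LRU'")
    # One ordered dict per set; insertion order doubles as the eviction order.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            if policy == "LRU":
                lines.move_to_end(tag)  # refresh recency; FIFO keeps the order
        else:
            misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)  # evict the entry at the front
            lines[tag] = True
    return hits, misses

# Hypothetical trace: one set, two ways, one-byte blocks.
trace = [0, 1, 0, 2, 0]
fifo_result = simulate_cache(trace, num_sets=1, block_size=1, assoc=2, policy="FIFO")
lru_result = simulate_cache(trace, num_sets=1, block_size=1, assoc=2, policy="LRU")
```

    On this trace the two policies differ: under FIFO the hit on address 0 does not protect it from eviction, whereas under LRU it does. Sweeping `assoc` in a loop re-reads the whole trace for every configuration, which is the cost that simultaneous simulation avoids.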

    CiNii

  • Speeding-up Exact and Fast L1 Cache Configuration Simulation based on FIFO Replacement Policy

    TAWADA Masashi, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    IEICE technical report   110 ( 316 ) 55 - 60  2010.11

     View Summary

    The number of sets, block size, and associativity determine a processor's cache configuration. Particularly in embedded systems, the cache configuration can be optimized because the target applications are limited. Recently, the CRCB approach has been proposed for LRU-based cache configuration simulation; it can calculate the cache hit/miss rate accurately and very quickly while varying the three parameters described above. However, many recent processors use FIFO-based caches instead of LRU-based caches. In this paper, we propose a faster cache configuration simulation method for embedded applications with FIFO as the cache replacement policy. We first prove several properties of FIFO-based caches and then propose a simulation method that can process two or more FIFO-based cache configurations with different associativities simultaneously. Experimental results show that our proposed method obtains accurate cache hit/miss counts and is on average 18% faster than conventional simulators.

    CiNii

  • A-3-6 Evaluation experiment on scan-based attack against RSA cryptosystem

    Nara Ryuta, Yanagisawa Masao, Ohtsuki Tatsuo, Togawa Nozomu

    Proceedings of the Society Conference of IEICE   2010   68 - 68  2010.08

    CiNii

  • Advances in VLSI Technologies for Ultra-Low-Power Computing : Ultra Low Power SoC Design Technologies for Media Processing

    GOTO Satoshi, IKENAGA Takeshi, YOSHIMURA Takeshi, KIMURA Shinji, TOGAWA Nozomu

    IPSJ Magazine   51 ( 7 ) 837 - 845  2010.07

    CiNii

  • High-Level Synthesis with Floorplan for GDR Architectures and its Evaluation

    OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   110 ( 36 ) 19 - 24  2010.05

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor in total circuit delay. Floorplan information such as placement and interconnection delay must therefore be handled even at the high-level synthesis stage. In this paper, we propose a high-level synthesis method targeting a generalized distributed-register (GDR) architecture, in which we introduce shared/local registers and global/local controllers. In the GDR architecture, adding a local register/controller to a bottleneck FU yields a high-performance circuit, while sharing a register/controller among non-bottleneck FUs reduces circuit area. Our method automatically selects the register/controller configuration according to the target application and constraints. Experimental results show that a 7.6% area reduction can be achieved compared to the conventional floorplan-aware high-level synthesis method.

    CiNii

  • A Pedestrian Positioning System Using Road Traffic Signs and Landmarks

    児島 伴幸, 山根 和也, 柳澤 政生, 大附 辰夫, 戸川 望

    IPSJ Journal   51 ( 3 ) 899 - 913  2010.03

     View Summary

    Mobile-GPS is generally used for pedestrian positioning on mobile devices such as mobile phones and PDAs. Positioning errors of mobile-GPS can be caused by several factors, such as "multipath"; however, these errors have not been investigated sufficiently. In this paper, we first investigate the positioning errors of mobile-GPS around Takadanobaba station, an area that contains both urban and residential districts. Our results show that mobile-GPS can produce a positioning error of approximately 80 meters at the maximum, which in an urban area corresponds to roughly two to three roads and may confuse pedestrians. Second, we propose a highly accurate pedestrian positioning method that corrects mobile-GPS positioning using road traffic signs and landmarks. Because it relies on existing infrastructure (road traffic signs, landmarks, and mobile-GPS on mobile devices), the method keeps additional infrastructure to a minimum. Assuming that the user is positioned at a traffic sign, the method determines the user's position from nearby road traffic signs that the user finds, whose colors and shapes are sent to a server. It starts by locating the approximate position of the user with mobile-GPS. Next, the user selects the road traffic sign that he or she found, which narrows the current-location candidates to five or fewer, and then selects a landmark near the candidates, which identifies a unique current location. The method is implemented with CGI and was evaluated using mobile phones from NTT Docomo and au by KDDI. In this field investigation, the method identified the user's current location with 98% accuracy, confirming its effectiveness.
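    The narrowing flow in this summary (coarse GPS fix, then a road traffic sign, then a landmark) can be sketched as below. This is only an illustration under stated assumptions, not the published system: the function `locate`, the data layout, the planar coordinates in meters, and the 50-meter landmark radius are all hypothetical; only the 80-meter GPS error comes from the summary.

```python
import math

def locate(gps_fix, signs, landmarks, sign_kind, landmark_name,
           gps_radius=80.0, near=50.0):
    """Return the unique sign position matching the user's choices, or None."""
    # Step 1: keep signs of the chosen kind that lie within the GPS error radius.
    candidates = [s for s in signs
                  if s["kind"] == sign_kind
                  and math.dist(s["pos"], gps_fix) <= gps_radius]
    # Step 2: keep candidates that have the chosen landmark nearby (assumed radius).
    spots = [l["pos"] for l in landmarks if l["name"] == landmark_name]
    matches = [s for s in candidates
               if any(math.dist(s["pos"], p) <= near for p in spots)]
    # Step 3: accept the result only when exactly one candidate remains.
    return matches[0]["pos"] if len(matches) == 1 else None

# Hypothetical map data, coordinates in meters on a local plane.
signs = [
    {"kind": "stop", "pos": (10.0, 0.0)},
    {"kind": "stop", "pos": (-40.0, 30.0)},
    {"kind": "stop", "pos": (200.0, 0.0)},
    {"kind": "no_parking", "pos": (5.0, 5.0)},
]
landmarks = [
    {"name": "post office", "pos": (20.0, 10.0)},
    {"name": "bank", "pos": (-50.0, 40.0)},
]
position = locate((0.0, 0.0), signs, landmarks, "stop", "post office")
```

    Returning `None` when zero or several candidates remain reflects the requirement that the landmark step must single out one location; a real system would ask the user for another sign or landmark instead.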

    CiNii

  • Changing Organization through Continuous Data Collection with Business Microscope

    ARA Koji, SATO Nobuo, YANO Kazuo, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 462 ) 43 - 47  2010.03

     View Summary

    We demonstrate how sensor technology changes our business through the observation of human behavior.

    CiNii

  • A Pedestrian Positioning System Using Road Traffic Signs and Landmarks Based on Current Location Recognition

    KOJIMA Tomoyuki, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    IEICE technical report   109 ( 414 ) 153 - 158  2010.02

     View Summary

    Mobile-GPS is generally used for pedestrian positioning on mobile devices such as mobile phones and PDAs. However, mobile-GPS can produce errors of approximately 100 meters because of serious influences such as "multipath" in urban areas. When the plotted present location is approximately 100 meters away from the true location on the map, mobile-GPS may seriously confuse the pedestrian. In this paper, we propose an improved pedestrian positioning system using mobile-GPS, road traffic signs, and landmarks based on current location recognition. First, our system determines the position of the road traffic signs using the road traffic signs and a landmark that the pedestrian finds in the real world. Second, our system displays the positions of the road traffic signs and the landmark on the map. The pedestrian can then recognize the present location easily, because the real world corresponds to the map on which the road traffic sign and landmark are displayed. Our system treats identical road traffic signs within a small area as one cluster and partitions urban areas into high-rise building areas and others. Compared with the original system, usability is improved because the pedestrian selects a road traffic sign only twice in high-rise building areas and only once in other urban areas. Finally, we confirm the effectiveness of the improved system through a simulation-based investigation.

    CiNii

  • A Pedestrian Positioning System Using Road Traffic Signs and Landmarks Based on Current Location Recognition

    KOJIMA Tomoyuki, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    IEICE technical report   109 ( 415 ) 153 - 158  2010.02

     View Summary

    Mobile-GPS is generally used for pedestrian positioning on mobile devices such as mobile phones and PDAs. However, mobile-GPS can produce errors of approximately 100 meters because of serious influences such as "multipath" in urban areas. When the plotted present location is approximately 100 meters away from the true location on the map, mobile-GPS may seriously confuse the pedestrian. In this paper, we propose an improved pedestrian positioning system using mobile-GPS, road traffic signs, and landmarks based on current location recognition. First, our system determines the position of the road traffic signs using the road traffic signs and a landmark that the pedestrian finds in the real world. Second, our system displays the positions of the road traffic signs and the landmark on the map. The pedestrian can then recognize the present location easily, because the real world corresponds to the map on which the road traffic sign and landmark are displayed. Our system treats identical road traffic signs within a small area as one cluster and partitions urban areas into high-rise building areas and others. Compared with the original system, usability is improved because the pedestrian selects a road traffic sign only twice in high-rise building areas and only once in other urban areas. Finally, we confirm the effectiveness of the improved system through a simulation-based investigation.

    DOI CiNii

  • Connectivity-based and load-balanced cluster routing for mobile ad hoc networks

    板橋 裕介, 戸川 望, 柳澤 政生

    IEICE technical report   109 ( 381 ) 85 - 90  2010.01

    CiNii

  • Multicast routing protocol with collision avoidance in multi-group wireless ad-hoc networks

    竹内 博是, 戸川 望, 柳澤 政生

    IEICE technical report   109 ( 381 ) 95 - 100  2010.01

    CiNii

  • A Dedicated Functional Unit Synthesis Algorithm with MISO Structures based on Partial Matching

    HASHIMOTO Norihiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 395 ) 89 - 94  2010.01

     View Summary

    Demand for application-specific processors has been increasing recently; however, designing a processor for each application takes a long time. We therefore need an automatic synthesis system for application-specific processors. In this paper, we propose a dedicated functional unit synthesis algorithm for an application-specific processor. Our algorithm synthesizes a dedicated functional unit with a MISO (Multiple Input, Single Output) structure. Additionally, our algorithm performs partial matching, which lets a dedicated unit execute functions even if it corresponds only partially to a subgraph of the CDFG (Control-Data Flow Graph). This is realized by executing the unnecessary functions with 0 or 1 as an input. Our algorithm achieved a 52% time reduction compared to previous approaches.
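    The idea of feeding 0 or 1 to unneeded functions can be illustrated with a toy example. This is a sketch under assumptions rather than the synthesis algorithm itself: the unit `dedicated_unit` computing (a + b) * c is hypothetical, chosen because 0 and 1 are the identity elements of addition and multiplication.

```python
def dedicated_unit(a, b, c):
    """Hypothetical MISO unit: three inputs, one output, computes (a + b) * c."""
    return (a + b) * c

def add_only(a, b):
    # Partial match on the adder: the multiplier passes its input through when c = 1.
    return dedicated_unit(a, b, 1)

def mul_only(a, c):
    # Partial match on the multiplier: the adder passes a through when b = 0.
    return dedicated_unit(a, 0, c)

full = dedicated_unit(1, 2, 3)   # whole subgraph matched
partial_add = add_only(3, 4)     # only the addition is needed
partial_mul = mul_only(3, 4)     # only the multiplication is needed
```

    With this trick one dedicated unit can cover three different subgraphs, which is why partial matching can increase how often a synthesized unit is usable.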

    CiNii

  • A Dedicated Functional Unit Synthesis Algorithm with MISO Structures based on Partial Matching

    HASHIMOTO Norihiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 394 ) 89 - 94  2010.01

     View Summary

    Demand for application-specific processors has been increasing recently; however, designing a processor for each application takes a long time. We therefore need an automatic synthesis system for application-specific processors. In this paper, we propose a dedicated functional unit synthesis algorithm for an application-specific processor. Our algorithm synthesizes a dedicated functional unit with a MISO (Multiple Input, Single Output) structure. Additionally, our algorithm performs partial matching, which lets a dedicated unit execute functions even if it corresponds only partially to a subgraph of the CDFG (Control-Data Flow Graph). This is realized by executing the unnecessary functions with 0 or 1 as an input. Our algorithm achieved a 52% time reduction compared to previous approaches.

    CiNii

  • A Dedicated Functional Unit Synthesis Algorithm with MISO Structures based on Partial Matching

    HASHIMOTO Norihiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 393 ) 89 - 94  2010.01

     View Summary

    Demand for application-specific processors has been increasing recently; however, designing a processor for each application takes a long time. We therefore need an automatic synthesis system for application-specific processors. In this paper, we propose a dedicated functional unit synthesis algorithm for an application-specific processor. Our algorithm synthesizes a dedicated functional unit with a MISO (Multiple Input, Single Output) structure. Additionally, our algorithm performs partial matching, which lets a dedicated unit execute functions even if it corresponds only partially to a subgraph of the CDFG (Control-Data Flow Graph). This is realized by executing the unnecessary functions with 0 or 1 as an input. Our algorithm achieved a 52% time reduction compared to previous approaches.

    CiNii

  • A Pedestrian Positioning System Using Road Traffic Signs and Landmarks Based on Current Location Recognition

    KOJIMA Tomoyuki, YANAGISAWA Masao, OHTSUKI Tatsuo, TOGAWA Nozomu

    ITE Technical Report   34 ( 0 ) 153 - 158  2010

     View Summary

    Mobile-GPS is generally used for pedestrian positioning on mobile devices such as mobile phones and PDAs. However, mobile-GPS can produce errors of approximately 100 meters because of serious influences such as "multipath" in urban areas. When the plotted present location is approximately 100 meters away from the true location on the map, mobile-GPS may seriously confuse the pedestrian. In this paper, we propose an improved pedestrian positioning system using mobile-GPS, road traffic signs, and landmarks based on current location recognition. First, our system determines the position of the road traffic signs using the road traffic signs and a landmark that the pedestrian finds in the real world. Second, our system displays the positions of the road traffic signs and the landmark on the map. The pedestrian can then recognize the present location easily, because the real world corresponds to the map on which the road traffic sign and landmark are displayed. Our system treats identical road traffic signs within a small area as one cluster and partitions urban areas into high-rise building areas and others. Compared with the original system, usability is improved because the pedestrian selects a road traffic sign only twice in high-rise building areas and only once in other urban areas. Finally, we confirm the effectiveness of the improved system through a simulation-based investigation.

    DOI CiNii

  • Two-level Cache Simulation with L2 Unified Cache for Embedded Applications

    KOBAYASHI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 316 ) 37 - 42  2009.11

     View Summary

    In this paper, we propose a two-level cache simulation method with an L2 unified cache for embedded applications. It simulates the L1 instruction cache, L1 data cache, and L2 unified cache accurately in a short period of time by repeating the simulation for the L1/L2 instruction (data) cache several times. Additionally, by using several cache properties, we can obtain the number of cache hits/misses without simulating several cache configurations. Overall, our proposed approach runs up to 3662.93 times faster than the conventional exhaustive approach.

    CiNii

  • Simulation-Based Bus Width Optimization for Two-Level Caches

    WATANABE Shinta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 316 ) 43 - 48  2009.11

     View Summary

    In this paper, we propose a simulation-based bus width and cache configuration optimization approach for two-level caches. First, we show that the cache hit/miss judgment and the bus width optimization can be considered independently. Second, the cache hit/miss judgments can be made efficiently by applying our CRCB techniques. We then show several properties of caches and bus widths and propose an effective bus width optimization approach based on them. We have developed a system that optimizes the cache and bus configuration so that total memory access time or total energy consumption is minimized. Overall, our proposed approach runs up to 835.91 times faster than the simple exhaustive approach.

    CiNii

  • Two-level Cache Simulation with L2 Unified Cache for Embedded Applications

    KOBAYASHI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 315 ) 37 - 42  2009.11

     View Summary

    In this paper, we propose a two-level cache simulation method with an L2 unified cache for embedded applications. It simulates the L1 instruction cache, L1 data cache, and L2 unified cache accurately in a short period of time by repeating the simulation for the L1/L2 instruction (data) cache several times. Additionally, by using several cache properties, we can obtain the number of cache hits/misses without simulating several cache configurations. Overall, our proposed approach runs up to 3662.93 times faster than the conventional exhaustive approach.

    CiNii

  • Simulation-Based Bus Width Optimization for Two-Level Caches

    WATANABE Shinta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 315 ) 43 - 48  2009.11

     View Summary

    In this paper, we propose a simulation-based bus width and cache configuration optimization approach for two-level caches. First, we show that the cache hit/miss judgment and the bus width optimization can be considered independently. Second, the cache hit/miss judgments can be made efficiently by applying our CRCB techniques. We then show several properties of caches and bus widths and propose an effective bus width optimization approach based on them. We have developed a system that optimizes the cache and bus configuration so that total memory access time or total energy consumption is minimized. Overall, our proposed approach runs up to 835.91 times faster than the simple exhaustive approach.

    CiNii

  • High-speed sub-multiplication arithmetic unit design by selector logic and novel butterfly unit as an application

    塚本 洋平, 戸川 望, 柳澤 政生

    IEICE technical report   109 ( 227 ) 101 - 106  2009.10

    CiNii

  • High-speed Sub-Multiplication Arithmetic Unit Design by Selector Logic and Novel Butterfly Unit As an Application

    TSUKAMOTO YOUHEI, TOGAWA NOZOMU, YANAGISAWA MASAO, OHTSUKI TATSUO, TONOMURA MOTONOBU

    Technical Report on System LSI Design Methodology (SLDM)   2009 ( 18 ) 1 - 6  2009.10

     View Summary

    Large-scale network and multimedia application LSIs include application-specific arithmetic circuits. A multiply-accumulator (MAC), one of these optimized circuits, extends partial-product addition and reduces carry propagation, so that it computes with a delay equivalent to a single multiplication. However, there has been no comparable method for subtract-multiplication: the partial products cannot be determined until the carry of the subtraction has propagated, so the total delay is the sum of the subtraction and multiplication delays. In this paper, we propose a high-speed subtract-multiplier that reduces the latency of the subtract operation by bit-level transformation using selector logic. Partial products are calculated directly by the bit-level transformation, and their total number is compressed to approximately half. The proposed subtract-multiplier can be applied to any kind of system using subtract-multiplications, and the butterfly operation in FFT, whose critical path consists of complex subtraction and multiplication, is a suitable application. Experimental results show that our proposed butterfly operation circuit improves performance by 33.0% compared to a conventional one.

    CiNii

  • High-speed Sub-Multiplication Arithmetic Unit Design by Selector Logic and Novel Butterfly Unit As an Application

    TSUKAMOTO YOUHEI, TOGAWA NOZOMU, YANAGISAWA MASAO, OHTSUKI TATSUO, TONOMURA MOTONOBU

    IEICE technical report   109 ( 226 ) 101 - 106  2009.10

    CiNii

  • High Throughput Irregular LDPC Decoder Based on High-Efficiency Column Operation Unit for IEEE 802.11n Standard

    NAGASHIMA Akiyuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   109 ( 201 ) 51 - 56  2009.09

     View Summary

    Low Density Parity Check (LDPC) code is expected to be an error-correcting code for next-generation networks, since it shows high error-correcting performance and is incorporated into IEEE 802.11n, the next wireless network standard. In this paper, we propose a multi-rate compatible irregular LDPC decoder that enhances column operation parallelism. Focusing on the column-wise parallelism of column operations, our LDPC decoder increases the utilization of operational units and throughput by calculating all inputs simultaneously. The decoder achieves a 28% increase in throughput compared to recent architectures.

    CiNii

  • DFG Mapping for Flexible Engine/Generic ALU Array and Its Dedicated Synthesis Algorithm

    TAMURA Ryo, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, SATOH Makoto

    IEICE technical report   109 ( 201 ) 57 - 62  2009.09

     View Summary

    Reconfigurable processors are processors whose contexts are dynamically reconfigured while they are running. We focus on a reconfigurable processor for digital media processing called FE-GA (Flexible Engine/Generic ALU Array). Currently, FE-GA does not have a dedicated behavioral synthesis tool. In this paper, we address DFG (Data Flow Graph) mapping and propose an algorithm that automatically maps DFGs onto the calculation cell array, which is arranged in a grid. Furthermore, our algorithm can handle DFGs of any size by using thread switching, a characteristic feature of FE-GA. For a given DFG consisting of addition operations, the algorithm generates dedicated assembly code that represents the DFG circuit for FE-GA. The proposed algorithm achieves automatic mapping of addition DFGs of any size within the range of the FE-GA architecture specification.

    CiNii

  • A Weighted-Sum Circuit Using Selector Logic By Transforming Bit-Level Operations

    HARA Tomoaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TONOMURA Motonobu

    IEICE technical report   109 ( 34 ) 7 - 12  2009.05

     View Summary

    Consider a weighted-sum operation whose weights sum to one. This operation can be applied to various kinds of image processing, such as alpha blending and video overlay. In this paper, we propose a weighted-sum circuit that uses selector logic obtained by transforming bit-level operations. Our weighted-sum circuit reduces carry propagation and thus decreases the critical path delay. Experimental results show that our proposed weighted-sum circuit improves performance by 17% compared with naive implementations.

    CiNii

  • Delay Reduction Algorithm by Balancing Distribution of Traffic for Odd-Even Turn Model in NoC

    WAKITA Shingo, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 478 ) 153 - 158  2009.03

     View Summary

    To maintain the quality of communication between nodes in a Network-on-Chip (NoC), the average delay of forwarding a packet from a source node to its destination node must be kept low. Adaptive routing in an NoC consists of a routing function, which selects route candidates, and a selection function, which chooses the candidate that minimizes communication delay according to the distribution of traffic on the routes in use. Currently, the odd-even turn model is the most popular routing function. However, because the odd-even turn model does not consider load distribution when selecting routes, it may fail to avoid channels on which load is concentrated, and delay may grow as a result. In this paper, we therefore propose an approach that achieves both traffic distribution and delay reduction. The proposed approach introduces the concept of a restricted area, which contains the region where traffic concentrates owing to the route selection behavior of the odd-even turn model, and limits the use of channels in the restricted area.

    CiNii

  • A Task Mapping Algorithm for Task Chaining Network Processor by Backtracking

    SAITO Keita, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 478 ) 147 - 152  2009.03

     View Summary

    To meet increasing demands for link speed and complex network applications, network processors are required because they are faster than general-purpose processors and more flexible than ASICs. Task chaining is a widely used technique that allows a network application to be partitioned into multiple modules and assigned to processor cores for pipeline processing; however, it requires mapping multiple tasks onto different processing elements. To preserve the high speed and programmability of the network processor, task mapping must not take a long time and must extract high performance from the processor. In this paper, we present a backtracking-based search approach to solve this problem. To perform task mapping in a short time while also improving solution quality, we apply existing heuristics for reducing communication cost and balancing load, and then limit the search range to the neighborhood of the solution obtained by those techniques. Experimental results show that the proposed method finds solutions with higher throughput than the existing technique.

    CiNii

  • A Low Energy ASIP Synthesis Method Based on Reducing Instruction Memory Access

    KOBAYASHI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 413 ) 147 - 152  2009.01

     View Summary

    In this paper, we propose an energy-efficient ASIP synthesis method based on reducing instruction memory access. Since the instruction memory is one of the main energy consumers in an ASIP, reducing the energy consumed in instruction memory is an important problem. We propose a vertical combined instruction that stores two or more sequentially issued instructions in a single instruction. We then propose a method to synthesize vertical combined instructions from a scheduled CDFG. Since the number of instruction memory accesses is reduced, energy consumption is also reduced. Experimental results confirm a reduction of approximately 41.9% in the energy consumption of the whole processor system, including memories.

    CiNii

  • A Multi-layer Bus Architecture Optimization Algorithm for MPSoC in Embedded Systems

    YOSHIDA Harunobu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TACHIBANA Masayoshi

    IEICE technical report   108 ( 413 ) 141 - 146  2009.01

     View Summary

    In this paper, we propose an on-chip bus optimization algorithm for a multi-layer bus architecture. Our algorithm efficiently searches for an optimal selection of the number and bit size of buses, the CPU-bus connection topology, and the priority of each CPU, subject to the time constraint for given embedded applications. The running time of applications must be estimated while taking the effect of memory access conflicts into account. Before considering the effect of memory access conflicts, our approach removes configurations that violate the constraints. By reducing the design space in this way, we can obtain an optimal configuration in a shorter time. Our algorithm is 8.55 times faster than the exhaustive approach.

    CiNii

  • A Fast SIMD Processing Unit Synthesis Method with Optimal Pipeline Architecture for Application-specific Processors

    WATANABE Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 413 ) 99 - 104  2009.01

     View Summary

    Small area, high performance, and high productivity are required for application-specific processors in embedded systems. This paper proposes a fast SIMD processing unit synthesis method with an optimal pipeline architecture, applied to a processor core in SPADES, a hardware/software (HW/SW) co-synthesis system for application-specific processors. In the proposed method, if a pipelined SIMD processing unit with minimum delay is not on the critical path of a processor core, pipeline registers are inserted at the optimal positions, which cause the minimum area increase within the critical path delay of the processor core. The method can therefore reduce the area increase compared with the conventional method. Since the proposed method finds the optimal solution quickly, it is also effective for exploring processor architecture configurations. Finally, the SIMD operation unit generation system in which the proposed method is embedded generates an HDL description of a SIMD processing unit. The experimental results show the effectiveness of this method.

    CiNii

  • Fast Module Placement in Floorplan-aware High-level Synthesis

    SATO Wataru, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 413 ) 93 - 98  2009.01

     View Summary

    As device feature size decreases, interconnect delay becomes the dominant factor in total delay. It is therefore necessary to consider the floorplan at the high-level synthesis stage. At the same time, time-to-market conditions are becoming more severe, so designs must be completed in a short time. It is therefore desirable to execute floorplan-aware high-level synthesis in a short time. In this paper, we propose a high-speed module placement algorithm that uses information from high-level synthesis, for a system that executes high-level synthesis and floorplanning repeatedly. The algorithm performs fast placement that accounts for interconnect delay between modules by a constructive method using information from the scheduling/FU binding process. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • A Low Energy ASIP Synthesis Method Based on Reducing Instruction Memory Access

    KOBAYASHI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 414 ) 147 - 152  2009.01

     View Summary

    In this paper, we propose an energy-efficient ASIP synthesis method based on reducing instruction memory access. Since the instruction memory is one of the main energy consumers in an ASIP, reducing the energy consumed in instruction memory is an important problem. We propose a vertical combined instruction that stores two or more sequentially issued instructions in a single instruction. We then propose a method to synthesize vertical combined instructions from a scheduled CDFG. Since the number of instruction memory accesses is reduced, energy consumption is also reduced. Experimental results confirm a reduction of approximately 41.9% in the energy consumption of the whole processor system, including memories.

    CiNii

  • A Multi-layer Bus Architecture Optimization Algorithm for MPSoC in Embedded Systems

    YOSHIDA Harunobu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TACHIBANA Masayoshi

    IEICE technical report   108 ( 414 ) 141 - 146  2009.01

     View Summary

    In this paper, we propose an on-chip bus optimization algorithm for a multi-layer bus architecture. Our algorithm efficiently searches for an optimal selection of the number and bit size of buses, the CPU-bus connection topology, and the priority of each CPU, subject to the time constraint for given embedded applications. The running time of applications must be estimated while taking the effect of memory access conflicts into account. Before considering the effect of memory access conflicts, our approach removes configurations that violate the constraints. By reducing the design space in this way, we can obtain an optimal configuration in a shorter time. Our algorithm is 8.55 times faster than the exhaustive approach.

    CiNii

  • A Fast SIMD Processing Unit Synthesis Method with Optimal Pipeline Architecture for Application-specific Processors

    WATANABE Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 414 ) 99 - 104  2009.01

     View Summary

    Small area, high performance, and high productivity are required for application-specific processors in embedded systems. This paper proposes a fast SIMD processing unit synthesis method with an optimal pipeline architecture, applied to a processor core in SPADES, a hardware/software (HW/SW) co-synthesis system for application-specific processors. In the proposed method, if a pipelined SIMD processing unit with minimum delay is not on the critical path of a processor core, pipeline registers are inserted at the optimal positions, which cause the minimum area increase within the critical path delay of the processor core. The method can therefore reduce the area increase compared with the conventional method. Since the proposed method finds the optimal solution quickly, it is also effective for exploring processor architecture configurations. Finally, the SIMD operation unit generation system in which the proposed method is embedded generates an HDL description of a SIMD processing unit. The experimental results show the effectiveness of this method.

    CiNii

  • Fast Module Placement in Floorplan-aware High-level Synthesis

    SATO Wataru, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 414 ) 93 - 98  2009.01

     View Summary

    As device feature size decreases, interconnect delay becomes the dominant factor in total delay. It is therefore necessary to consider the floorplan at the high-level synthesis stage. At the same time, time-to-market conditions are becoming more severe, so designs must be completed in a short time. It is therefore desirable to execute floorplan-aware high-level synthesis in a short time. In this paper, we propose a high-speed module placement algorithm that uses information from high-level synthesis, for a system that executes high-level synthesis and floorplanning repeatedly. The algorithm performs fast placement that accounts for interconnect delay between modules by a constructive method using information from the scheduling/FU binding process. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • A Fast SIMD Processing Unit Synthesis Method with Optimal Pipeline Architecture for Application-specific Processors

    WATANABE Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    研究報告システムLSI設計技術(SLDM)   2009 ( 7 ) 99 - 104  2009.01

     View Summary

    Small area, high performance, and high productivity are required for application-specific processors in embedded systems. This paper proposes a fast SIMD processing unit synthesis method with an optimal pipeline architecture, applied to a processor core in SPADES, a hardware/software (HW/SW) co-synthesis system for application-specific processors. In the proposed method, if a pipelined SIMD processing unit with minimum delay is not on the critical path of a processor core, pipeline registers are inserted at the optimal positions, which cause the minimum area increase within the critical path delay of the processor core. The method can therefore reduce the area increase compared with the conventional method. Since the proposed method finds the optimal solution quickly, it is also effective for exploring processor architecture configurations. Finally, the SIMD operation unit generation system in which the proposed method is embedded generates an HDL description of a SIMD processing unit. The experimental results show the effectiveness of this method.

    CiNii

  • Fast Module Placement in Floorplan-aware High-level Synthesis

    SATO Wataru, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    研究報告システムLSI設計技術(SLDM)   2009 ( 7 ) 93 - 98  2009.01

     View Summary

    As device feature size decreases, interconnect delay becomes the dominant factor in total delay. It is therefore necessary to consider the floorplan at the high-level synthesis stage. At the same time, time-to-market conditions are becoming more severe, so designs must be completed in a short time. It is therefore desirable to execute floorplan-aware high-level synthesis in a short time. In this paper, we propose a high-speed module placement algorithm that uses information from high-level synthesis, for a system that executes high-level synthesis and floorplanning repeatedly. The algorithm performs fast placement that accounts for interconnect delay between modules by a constructive method using information from the scheduling/FU binding process. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • A Low Energy ASIP Synthesis Method Based on Reducing Instruction Memory Access

    KOBAYASHI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    研究報告システムLSI設計技術(SLDM)   2009 ( 7 ) 147 - 152  2009.01

     View Summary

    In this paper, we propose an energy-efficient ASIP synthesis method based on reducing instruction memory access. Since the instruction memory is one of the main energy consumers in an ASIP, reducing the energy consumed in instruction memory is an important problem. We propose a vertical combined instruction that stores two or more sequentially issued instructions in a single instruction. We then propose a method to synthesize vertical combined instructions from a scheduled CDFG. Since the number of instruction memory accesses is reduced, energy consumption is also reduced. Experimental results confirm a reduction of approximately 41.9% in the energy consumption of the whole processor system, including memories.

    CiNii

  • A Multi-layer Bus Architecture Optimization Algorithm for MPSoC in Embedded Systems

    YOSHIDA Harunobu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TACHIBANA Masayoshi

    研究報告システムLSI設計技術(SLDM)   2009 ( 7 ) 141 - 146  2009.01

     View Summary

    In this paper, we propose an on-chip bus optimization algorithm for a multi-layer bus architecture. Our algorithm efficiently searches for an optimal selection of the number and bit size of buses, the CPU-bus connection topology, and the priority of each CPU, subject to the time constraint for given embedded applications. The running time of applications must be estimated while taking the effect of memory access conflicts into account. Before considering the effect of memory access conflicts, our approach removes configurations that violate the constraints. By reducing the design space in this way, we can obtain an optimal configuration in a shorter time. Our algorithm is 8.55 times faster than the exhaustive approach.

    CiNii

  • A Low Energy ASIP Synthesis Method Based on Reducing Instruction Memory Access

    KOBAYASHI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 412 ) 147 - 152  2009.01

     View Summary

    In this paper, we propose an energy-efficient ASIP synthesis method based on reducing instruction memory access. Since the instruction memory is one of the main energy consumers in an ASIP, reducing the energy consumed in instruction memory is an important problem. We propose a vertical combined instruction that stores two or more sequentially issued instructions in a single instruction. We then propose a method to synthesize vertical combined instructions from a scheduled CDFG. Since the number of instruction memory accesses is reduced, energy consumption is also reduced. Experimental results confirm a reduction of approximately 41.9% in the energy consumption of the whole processor system, including memories.

    CiNii

  • A Multi-layer Bus Architecture Optimization Algorithm for MPSoC in Embedded Systems

    YOSHIDA Harunobu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TACHIBANA Masayoshi

    IEICE technical report   108 ( 412 ) 141 - 146  2009.01

     View Summary

    In this paper, we propose an on-chip bus optimization algorithm for a multi-layer bus architecture. Our algorithm efficiently searches for an optimal selection of the number and bit size of buses, the CPU-bus connection topology, and the priority of each CPU, subject to the time constraint for given embedded applications. The running time of applications must be estimated while taking the effect of memory access conflicts into account. Before considering the effect of memory access conflicts, our approach removes configurations that violate the constraints. By reducing the design space in this way, we can obtain an optimal configuration in a shorter time. Our algorithm is 8.55 times faster than the exhaustive approach.

    CiNii

  • A Fast SIMD Processing Unit Synthesis Method with Optimal Pipeline Architecture for Application-specific Processors

    WATANABE Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 412 ) 99 - 104  2009.01

     View Summary

    Small area, high performance, and high productivity are required for application-specific processors in embedded systems. This paper proposes a fast SIMD processing unit synthesis method with an optimal pipeline architecture, applied to a processor core in SPADES, a hardware/software (HW/SW) co-synthesis system for application-specific processors. In the proposed method, if a pipelined SIMD processing unit with minimum delay is not on the critical path of a processor core, pipeline registers are inserted at the optimal positions, which cause the minimum area increase within the critical path delay of the processor core. The method can therefore reduce the area increase compared with the conventional method. Since the proposed method finds the optimal solution quickly, it is also effective for exploring processor architecture configurations. Finally, the SIMD operation unit generation system in which the proposed method is embedded generates an HDL description of a SIMD processing unit. The experimental results show the effectiveness of this method.

    CiNii

  • Fast Module Placement in Floorplan-aware High-level Synthesis

    SATO Wataru, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 412 ) 93 - 98  2009.01

     View Summary

    As device feature size decreases, interconnect delay becomes the dominant factor in total delay. It is therefore necessary to consider the floorplan at the high-level synthesis stage. At the same time, time-to-market conditions are becoming more severe, so designs must be completed in a short time. It is therefore desirable to execute floorplan-aware high-level synthesis in a short time. In this paper, we propose a high-speed module placement algorithm that uses information from high-level synthesis, for a system that executes high-level synthesis and floorplanning repeatedly. The algorithm performs fast placement that accounts for interconnect delay between modules by a constructive method using information from the scheduling/FU binding process. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • A Load-Balancing Anycast Route Selection Method for Reducing Control Packets

    YOKOTA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 359 ) 13 - 18  2008.12

     View Summary

    In anycast communications, a client can automatically communicate with the most suitable server among multiple servers that offer a specific application. Routes from a client to a server are selected based on the number of hops or the processing time of a server, but router loads must also be considered when there are many clients. We propose a load-balancing anycast route selection method for reducing control packets. The proposed method, based on the Core-Based Tree (CBT) method, constructs partial trees composed of ridge routers on which loads concentrate. By using partial trees, the loads of ridge routers are balanced and the number of control packets for exchanging load information is reduced. As a result, the amount of packets in the whole network decreases, and the total communication time can be shortened. We evaluate the proposed method and investigate the optimal number of partial trees by simulation.

    CiNii

  • A fast handoff method by using NEMO for high-speed mobile terminals

    TANAKA Atsuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 359 ) 89 - 94  2008.12

     View Summary

    Network Mobility (NEMO) is a method for using the Internet inside cars or trains. NEMO achieves a maximum throughput of 54 Mbps, but it frequently incurs intolerable handoff times. Although the handoff time can be reduced by using F-HMIPv6 for NEMO, this approach cannot operate with high-speed mobile terminals such as the Shinkansen. We propose a fast handoff method using NEMO for high-speed mobile terminals. In the proposed method, the MR (Mobile Router) in the train obtains the IP address of the next AP (Access Point) before receiving the L2 beacon by using the train's speed information. The proposed method enables fast handoff with a MAP (Mobile Anchor Point) for high-speed mobile terminals such as the Shinkansen. We demonstrate the advantages of our method through simulation results using NS-2.

    CiNii

  • A Multiplexer Reducing Algorithm in Floorplan-Aware High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 145 - 150  2008.11

     View Summary

    In high-level synthesis for resource-shared architectures, multiplexers are inserted between registers and functional units as a result of binding. Multiplexer reduction is necessary to improve the area and performance of the synthesized circuit. In this paper, we propose multiplexer reduction algorithms in floorplan-aware high-level synthesis for distributed-register architectures. These algorithms can reduce the number of multiplexers compared with conventional high-level synthesis. We show the effectiveness of the proposed algorithms through experimental results.

    CiNii

  • A Two-level Cache and Scratch Pad Memory Simulation for Embedded Systems

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 97 - 102  2008.11

     View Summary

    In an embedded system where a single application or a class of applications is repeatedly executed on a processor, the memory configuration can be customized so that an optimal one is achieved. An optimal two-level cache and scratch pad memory configuration, which minimizes overall memory access time or energy consumption, can be obtained by varying seven parameters: the number of sets, the line size, and the associativity of each of the L1 and L2 caches, and the size of the scratch pad memory. In this paper, we propose two-level cache and scratch pad memory design space exploration algorithms: CRCB-T and CRCB-S. Overall, our proposed approach runs up to 3172.94 times faster than the conventional exhaustive approach.

    CiNii

  • A Power Masking Method of AES Circuit by Using Cross Bar Switch to Switch S-Box Circuit.

    KAWAHATA Nobuyuki, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 61 - 66  2008.11

     View Summary

    AES is one of the common key cryptosystems often used in embedded systems, IC chips, and other devices. Their common keys must be kept secret. However, a key can be recovered by a side-channel attack, a method of cracking cryptosystems by analyzing physical quantities generated during encryption processing. Among side-channel attacks, differential power analysis (DPA) is known as the most dangerous attack method. AES circuits therefore need to be designed to resist DPA. To design a DPA-resistant AES circuit, we propose a power-masking SubBytes circuit that switches among several S-Boxes, each of which has different power characteristics. We present our evaluation and results.

    CiNii

  • Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

    ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 55 - 59  2008.11

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can be used to discover the internals of crypto hardware, which presents a significant security risk of information leakage. An interesting design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side-channel attacks. The modified scan flip-flops are state-dependent scan flip-flops (SDSFFs): the output of each SDSFF may or may not be inverted depending on the state, which makes the internal scan architecture more difficult to discover. We analyze an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side-channel attacks.

    CiNii

  • Scan-based Attack for an AES-LSI included with other IPs

    NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 49 - 53  2008.11

     View Summary

    The threat of side-channel attacks against cryptography LSIs has been pointed out. In particular, scan-based attacks, which exploit the scan chain, are attracting attention. Scan chains are among the most important testing techniques, but they can also be used to attack cryptography LSIs. Conventional scan-based attacks consider only scan chains composed of the registers of the cryptography circuit. However, a cryptography LSI usually contains many IPs, such as memories, microprocessors, and other circuits. Because a real scan chain consists of many kinds of registers, it is unclear whether conventional scan-based attacks succeed. In this paper, we propose a scan-based attack that can recover the secret key of an AES-LSI that includes other IPs. By focusing on the bit pattern of a specific register and monitoring its changes, our method eliminates the influence of the registers of other circuits. Therefore, our scan-based attack does not depend on the architecture of the scan chain, and it can crack real cryptography LSIs that include other IPs.

    CiNii

  • A Parallel Hardware Engine for Generating Deformed Maps

    ARAHATA Akita, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 43 - 48  2008.11

     View Summary

    Recently, the distribution of map information to mobile devices has become highly popular; however, these maps are generally intended for PC use and are not suitable for mobile device displays. Because map information must by nature be updated in real time, preparing easy-to-read deformed maps in advance is impractical. Moreover, although many automatic map deformation methods have been proposed, it is difficult to tailor a deformed map to user preferences when the map is processed on servers. Deforming map data on the mobile device itself requires a massive amount of processing, and the limited throughput of mobile devices makes the required processing time virtually prohibitive. In this paper, we propose a parallel processing hardware engine for map deformation on mobile devices. We reduce processing time by implementing in hardware the processing that was the bottleneck of map deformation. The proposed parallel processing hardware engine can deform map data within just 1 second on a mobile phone.

    CiNii

  • Multi-Rate Compatible High Throughput Irregular LDPC Decoder Based on High-Efficiency Column Operation Unit

    NAGASHIMA Akiyuki, IMAI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 111 ) 37 - 42  2008.11

     View Summary

    Low Density Parity Check (LDPC) codes are expected to serve as error correcting codes for next-generation networks, since they show high error correcting performance and are incorporated in IEEE 802.11n, the next wireless network standard. In this paper, we propose a multi-rate compatible irregular LDPC decoder that enhances the parallelism of column operations. Focusing on the column-wise parallelism of column operations, the decoder raises the utilization of its operation units and its throughput by processing all inputs simultaneously. The decoder achieves 12% savings in area and an 81% increase in throughput compared with recent architectures.

    CiNii

  • A Multiplexer Reducing Algorithm in Floorplan-Aware High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 145 - 150  2008.11

     View Summary

    In high-level synthesis for resource-shared architectures, multiplexers are inserted between registers and functional units as a result of binding. Reducing the number of multiplexers is necessary to improve the area and performance of the synthesized circuit. In this paper, we propose multiplexer reduction algorithms in floorplan-aware high-level synthesis for distributed-register architectures. These algorithms reduce the number of multiplexers compared with conventional high-level synthesis. We show the effectiveness of the proposed algorithms through experimental results.

    CiNii

  • A Two-level Cache and Scratch Pad Memory Simulation for Embedded Systems

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 97 - 102  2008.11

     View Summary

    In an embedded system where a single application or a class of applications is repeatedly executed on a processor, the memory configuration can be customized so that an optimal one is obtained. An optimal two-level cache and scratch pad memory configuration, which minimizes overall memory access time or energy consumption, can be found by varying seven parameters: the number of sets, the line size, and the associativity of each of the L1 and L2 caches, and the size of the scratch pad memory. In this paper, we propose two design space exploration algorithms for two-level caches and scratch pad memory: CRCB-T and CRCB-S. Overall, the proposed approach runs up to 3172.94 times faster than the conventional exhaustive approach.

    CiNii

  • A Power Masking Method of AES Circuit by Using Cross Bar Switch to Switch S-Box Circuit

    KAWAHATA Nobuyuki, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 61 - 66  2008.11

     View Summary

    AES is one of the common-key cryptosystems often used in embedded systems, IC chips, and other devices. Their common keys must be kept secret from others. However, a key can be revealed by a side-channel attack, a method of breaking cryptosystems by analyzing physical quantities generated during encryption processing. Among side-channel attacks, differential power analysis (DPA) is known as the most dangerous. AES circuits therefore need to be designed with resistance to DPA. To design a DPA-resistant AES circuit, we propose a power-masking SubBytes circuit that switches among several S-Boxes, each of which has different power characteristics. We present our evaluation and its results.

    CiNii

  • Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

    ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 55 - 59  2008.11

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can also be used to discover the internals of crypto hardware, which presents a significant risk of information leakage. A design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side-channel attacks. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted depending on the state, which makes it more difficult to discover the internal scan architecture. We analyze an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side-channel attacks.

    CiNii

  • Scan-based Attack for an AES-LSI included with other IPs

    NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 49 - 53  2008.11

     View Summary

    The threat of side-channel attacks against cryptography LSIs has been pointed out. In particular, scan-based attacks, which exploit the scan chain, are attracting attention. Scan chains are one of the most important testing techniques, but they can also be used to attack cryptography LSIs. Conventional scan-based attacks consider only scan chains composed of the registers of the cryptography circuit. However, a cryptography LSI usually includes many other IPs such as memories, microprocessors, and other circuits. Because a real scan chain consists of many kinds of registers, it is unclear whether conventional scan-based attacks succeed against it. In this paper, we propose a scan-based attack that can recover the secret key of an AES-LSI that includes other IPs. By focusing on the bit pattern of a specific register and monitoring its changes, our method eliminates the influence of the registers of other circuits. Therefore, our scan-based attack does not depend on the architecture of the scan chain and can break real cryptography LSIs that include other IPs.

    CiNii

  • A Parallel Hardware Engine for Generating Deformed Maps

    ARAHATA Akira, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 43 - 48  2008.11

     View Summary

    Recently, distribution of map information to mobile devices has become very popular; however, those maps are generally intended for PCs and are not suitable for the displays of mobile devices. Because map information must be updated in real time, it is impractical to prepare easy-to-read deformed maps in advance. Moreover, although many methods for automatic deformation of map data have been proposed, it is difficult to tailor a deformed map to each user's preferences when the map is processed on servers. On the other hand, deforming map data requires a massive amount of processing, so the low throughput of mobile devices makes the processing time impractically long. In this paper, we propose a parallel processing hardware engine for map deformation on mobile devices. We reduce the processing time by implementing in hardware the processing that was the bottleneck of map deformation. The proposed parallel processing hardware engine can deform map data within 1 second on a mobile phone.

    CiNii

  • Multi-Rate Compatible High Throughput Irregular LDPC Decoder Based on High-Efficiency Column Operation Unit

    NAGASHIMA Akiyuki, IMAI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 298 ) 37 - 42  2008.11

     View Summary

    Low Density Parity Check (LDPC) codes are expected to serve as error correcting codes for next-generation networks, since they show high error correcting performance and are incorporated in IEEE 802.11n, the next wireless network standard. In this paper, we propose a multi-rate compatible irregular LDPC decoder that enhances the parallelism of column operations. Focusing on the column-wise parallelism of column operations, the decoder raises the utilization of its operation units and its throughput by processing all inputs simultaneously. The decoder achieves 12% savings in area and an 81% increase in throughput compared with recent architectures.

    CiNii

  • A Multiplexer Reducing Algorithm in Floorplan-Aware High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 145 - 150  2008.11

     View Summary

    In high-level synthesis for resource-shared architectures, multiplexers are inserted between registers and functional units as a result of binding. Reducing the number of multiplexers is necessary to improve the area and performance of the synthesized circuit. In this paper, we propose multiplexer reduction algorithms in floorplan-aware high-level synthesis for distributed-register architectures. These algorithms reduce the number of multiplexers compared with conventional high-level synthesis. We show the effectiveness of the proposed algorithms through experimental results.

    CiNii

  • A Two-level Cache and Scratch Pad Memory Simulation for Embedded Systems

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 97 - 102  2008.11

     View Summary

    In an embedded system where a single application or a class of applications is repeatedly executed on a processor, the memory configuration can be customized so that an optimal one is obtained. An optimal two-level cache and scratch pad memory configuration, which minimizes overall memory access time or energy consumption, can be found by varying seven parameters: the number of sets, the line size, and the associativity of each of the L1 and L2 caches, and the size of the scratch pad memory. In this paper, we propose two design space exploration algorithms for two-level caches and scratch pad memory: CRCB-T and CRCB-S. Overall, the proposed approach runs up to 3172.94 times faster than the conventional exhaustive approach.

    CiNii

  • A Power Masking Method of AES Circuit by Using Cross Bar Switch to Switch S-Box Circuit

    KAWAHATA Nobuyuki, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 61 - 66  2008.11

     View Summary

    AES is one of the common-key cryptosystems often used in embedded systems, IC chips, and other devices. Their common keys must be kept secret from others. However, a key can be revealed by a side-channel attack, a method of breaking cryptosystems by analyzing physical quantities generated during encryption processing. Among side-channel attacks, differential power analysis (DPA) is known as the most dangerous. AES circuits therefore need to be designed with resistance to DPA. To design a DPA-resistant AES circuit, we propose a power-masking SubBytes circuit that switches among several S-Boxes, each of which has different power characteristics. We present our evaluation and its results.

    CiNii

  • Dynamically Variable Secure Scan Architecture against Scan-based Side Channel Attack on Cryptography LSIs

    ATOBE Hiroshi, NARA Ryuta, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 55 - 59  2008.11

     View Summary

    Scan test is a powerful and popular test technique because it can control and observe the internal states of the circuit under test. However, scan chains can also be used to discover the internals of crypto hardware, which presents a significant risk of information leakage. A design-for-test technique that inserts inverters into the internal scan chains to complicate the scan structure has recently been presented. Unfortunately, it can still be attacked through statistical analysis of the information scanned out from chips. Therefore, in this paper we propose a secure scan architecture, called dynamic variable secure scan, against scan-based side-channel attacks. The modified scan flip-flops (SDSFFs) are state-dependent: the output of each SDSFF may or may not be inverted depending on the state, which makes it more difficult to discover the internal scan architecture. We analyze an AES implementation to show the effectiveness of the proposed method and discuss how our approach resists scan-based side-channel attacks.

    CiNii

  • Scan-based Attack for an AES-LSI included with other IPs

    NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 49 - 53  2008.11

     View Summary

    The threat of side-channel attacks against cryptography LSIs has been pointed out. In particular, scan-based attacks, which exploit the scan chain, are attracting attention. Scan chains are one of the most important testing techniques, but they can also be used to attack cryptography LSIs. Conventional scan-based attacks consider only scan chains composed of the registers of the cryptography circuit. However, a cryptography LSI usually includes many other IPs such as memories, microprocessors, and other circuits. Because a real scan chain consists of many kinds of registers, it is unclear whether conventional scan-based attacks succeed against it. In this paper, we propose a scan-based attack that can recover the secret key of an AES-LSI that includes other IPs. By focusing on the bit pattern of a specific register and monitoring its changes, our method eliminates the influence of the registers of other circuits. Therefore, our scan-based attack does not depend on the architecture of the scan chain and can break real cryptography LSIs that include other IPs.

    CiNii

  • A Parallel Hardware Engine for Generating Deformed Maps

    ARAHATA Akira, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 43 - 48  2008.11

     View Summary

    Recently, distribution of map information to mobile devices has become very popular; however, those maps are generally intended for PCs and are not suitable for the displays of mobile devices. Because map information must be updated in real time, it is impractical to prepare easy-to-read deformed maps in advance. Moreover, although many methods for automatic deformation of map data have been proposed, it is difficult to tailor a deformed map to each user's preferences when the map is processed on servers. On the other hand, deforming map data requires a massive amount of processing, so the low throughput of mobile devices makes the processing time impractically long. In this paper, we propose a parallel processing hardware engine for map deformation on mobile devices. We reduce the processing time by implementing in hardware the processing that was the bottleneck of map deformation. The proposed parallel processing hardware engine can deform map data within 1 second on a mobile phone.

    CiNii

  • Multi-Rate Compatible High Throughput Irregular LDPC Decoder Based on High-Efficiency Column Operation Unit

    NAGASHIMA Akiyuki, IMAI Yuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 299 ) 37 - 42  2008.11

     View Summary

    Low Density Parity Check (LDPC) codes are expected to serve as error correcting codes for next-generation networks, since they show high error correcting performance and are incorporated in IEEE 802.11n, the next wireless network standard. In this paper, we propose a multi-rate compatible irregular LDPC decoder that enhances the parallelism of column operations. Focusing on the column-wise parallelism of column operations, the decoder raises the utilization of its operation units and its throughput by processing all inputs simultaneously. The decoder achieves 12% savings in area and an 81% increase in throughput compared with recent architectures.

    CiNii

  • A hybrid routing protocol using location information for Mobile Ad Hoc Networks

    三浦 俊祐, 戸川 望, 柳澤 政生

    IEICE technical report   108 ( 251 ) 17 - 22  2008.10

    CiNii

  • 特集「組込みシステム工学」の編集にあたって

    平山 雅之, 戸川 望

    情報処理学会論文誌   49 ( 10 ) 3450 - 3450  2008.10

    CiNii

  • Area/Delay Estimation for SIMD Processor Cores

    山崎 大輔, 小原 俊逸, 戸川 望, 柳澤 政生, 大附 辰夫

    情報処理学会論文誌   49 ( 10 ) 3462 - 3481  2008.10

     View Summary

    In the synthesis of an ASIP (Application Specific Instruction Processor), the processor architecture is optimized for a target application, and its hardware and software parts are designed at the same time. Running logic synthesis for every candidate configuration during architecture exploration takes an enormous amount of time, so fast area/delay estimation without logic synthesis is required. Accurate estimation is also important, because large errors between the estimated and synthesized values may lead to an inadequate solution. This paper proposes area/delay estimation equations for SIMD processor cores with configurable SIMD functional units and addressing units. The equations are derived by partitioning the processor core and its associated hardware units into functional parts and parameterizing them, and they yield the area and delay of a desired architecture without logic synthesis. Compared with logic synthesis results, the estimated area of the processor cores has a relative error of 2.25% on average, and the estimated delay has an error of 0.54 ns on average.

    CiNii

  • A DFG Mapping Algorithm for Flexible Engine/Generic ALU Array

    HONMA Masayuki, TAMURA Ryo, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, SATOH Makoto

    IEICE technical report   108 ( 224 ) 7 - 12  2008.09

     View Summary

    Reconfigurable processors are processors whose contexts are dynamically reconfigured while they are working. We focus on a reconfigurable processor for digital media processing called FE-GA (Flexible Engine/Generic ALU array). At present, FE-GA does not have a dedicated development tool. Thus, in this paper, we propose an algorithm that automatically maps a data flow graph (DFG) onto it. For a given DFG, the algorithm generates a mapping result and translates it into dedicated assembly code that represents the DFG for FE-GA. An editor called FEEditor then reads the generated assembly code and implements the corresponding calculation on FE-GA. The proposed algorithm maps the nodes one by one onto the calculation cell array of FE-GA, proceeding from the input side to the output side of the DFG. The first node is preferentially mapped at the upper left. Each of the other nodes is mapped depending on the positions of the nodes that have already been mapped onto FE-GA. The algorithm repeats this step. We apply the proposed algorithm to 8 DFGs and evaluate the numbers of cycles and the runtimes. The proposed algorithm successfully maps all 8 DFGs.

    CiNii

  • FFT Design for Flexible Engine/Generic ALU Array and Its Dedicated Synthesis Algorithm

    TAMURA Ryo, HONMA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, SATOH Makoto

    IEICE technical report   108 ( 224 ) 13 - 18  2008.09

     View Summary

    Reconfigurable processors are those whose contexts are dynamically reconfigured while they are working. We focus on a reconfigurable processor for digital media processing called FE-GA (Flexible Engine/Generic ALU array). Currently, FE-GA does not have a dedicated behavior synthesis tool. In this paper, we design FFT filters and propose an algorithm that automatically maps them onto FE-GA. For a given number of data points of an FFT filter, the algorithm generates dedicated assembly code that represents the given FFT circuit for FE-GA. An editor called FEEditor then reads the generated assembly code and implements the corresponding FFT filter on FE-GA. The proposed algorithm achieves automatic mapping of FFT filters for all numbers of data points within the range of the FE-GA architecture specification.

    CiNii

  • Design and Evaluation of a Butterfly Circuit Using Selector Logic by Bit-Level Transformation

    NAMURA Takeshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TONOMURA Motonobu

    IEICE technical report   108 ( 224 ) 31 - 36  2008.09

     View Summary

    An arithmetic circuit using selector logic has been proposed as an approach to high-performance computation. In this paper, we propose a radix-2 butterfly circuit architecture using selector logic with bit-level transformation to meet the high computational requirements of FFT. Our butterfly circuit reduces carry propagation compared with conventional butterfly circuits. Experimental results show that, in area-constrained design, the proposed butterfly circuit improves performance by 37.1% to 49.8% with no area overhead compared with conventional butterfly circuits.

    CiNii

  • A Deformed Map Generation Algorithm for Small Displays Based on Cognitive Science and Its Stochastic Evaluations

    NINOMIYA Naoya, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences (Japanese edition) A   91 ( 9 ) 869 - 882  2008.09

     View Summary

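    With the spread of location information services and Internet services on cellular phones, the use of map services for pedestrians is expanding. Accordingly, techniques for automatically generating deformed maps suited to mobile devices with small displays are being actively studied. Conventional methods, which are based on making road shapes horizontal or vertical and quantizing intersection angles, can generate well-designed, grid-like deformed maps, but such maps are not necessarily ones with which users are unlikely to get lost. In this paper, we focus on cognitive science, namely how pedestrians perceive road shapes and landmarks, and propose a deformed map generation method that reflects these findings. The method first extracts the road network data along a route by using the route search result. Second, it simplifies the extracted road network data on the basis of cognitive science. Finally, it draws the deformed map with vector graphics. A questionnaire survey conducted after the evaluation experiments confirmed that about 70% to 90% of the users of maps generated by the proposed method rated them "good" in terms of visibility and how unlikely they were to get lost.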

    CiNii

  • A Fast Deformed Area Map Generation Algorithm Based on Road Network Partitioning

    MATSUMOTO Kazuya, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 171 ) 25 - 30  2008.07

     View Summary

    As cellular phones become smaller and more powerful, GPS-based navigation systems on cellular phones have spread, and demand for them is increasing not only in urban areas but also in suburban areas. It is necessary to generate a deformed map suited to a cellular phone, whose display is small and whose processing performance is low. In this paper, we propose a deformed map generation algorithm that can be applied not only to urban areas composed mainly of straight lines but also to suburban areas that include both straight lines and curves. The proposed algorithm divides the road network of an entire area into groups and substitutes a straight line or a curve for each group. At the same time, it removes nodes and links that are not needed in a deformed map. Applying the proposed algorithm to ten urban areas and ten suburban areas, we confirmed that easily recognizable deformed maps were generated and that their data size was reduced not only in the urban areas but also in the suburban areas with many curves.

    CiNii

  • User's Route Preference Investigation in Indoor Environment and Its Associative Route Searching Method

    YAMAGISHI Takahiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 171 ) 31 - 36  2008.07

     View Summary

    Recently, mobile telecommunication services have been greatly extended and advanced with the spread of cellular phones. However, research on practical pedestrian navigation services has been limited to outdoor environments. In this paper, we focus on navigation services in indoor environments, whose structure is more complicated than that of outdoor environments, and propose a route searching method that reflects each user's route preferences to provide the most suitable route. First, we propose a network data structure specialized for indoor environments using a visibility graph. Next, we investigate which route preference items should be taken into account and show that there is over 70% demand for short-distance routes and more than 80% demand for stairs/escalators/elevators, from elderly persons in particular. In addition, there is over 60% demand for crowd-avoiding routes. Therefore, we propose a route searching method that considers "short-distance routes", "stairs/escalators/elevators", and "a congestion situation". To show the effectiveness of the proposed method, we carry out an on-site investigation and show through several types of simulation experiments that the most suitable route is output for each user.

    CiNii

  • Recognition Rate Investigation of Various Photography Conditions for Pedestrian Positioning Using Road Traffic Signs

    KOJIMA Tomoyuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   108 ( 171 ) 37 - 42  2008.07

     View Summary

    Pedestrian navigation systems are being developed with the spread of GPS. However, GPS can have an error of approximately 100 meters in urban areas because of influences such as the ionosphere and multipath propagation. In urban areas, a 100-meter error may cause serious confusion because a 100-meter square includes several streets. We have been proposing a pedestrian positioning system that uses a camera phone and road traffic signs to achieve an accuracy of several meters. The positioning system compares traffic signs photographed by a camera phone with the map database around the user's position obtained by the mobile phone from GPS coordinates, and gives the user his/her accurate position. One of the most important subsystems of this positioning system is a recognition system for road traffic signs. Such systems have been developed for cars, but not for users with mobile phones. We are developing two types of recognition systems for road traffic signs. In this paper, we investigate their recognition rates under various photography conditions. In this investigation, we use photographs including road signs taken with a camera phone. In particular, we show that the mobile light at night and backlight in the daytime drastically affect the recognition rate.

    CiNii

  • Radix-2 Butterfly Circuit Architecture Using Selector Logic

    NAMURA Takeshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TONOMURA Motonobu

    IEICE technical report   108 ( 22 ) 25 - 30  2008.05

     View Summary

    An arithmetic circuit using selector logic has been proposed as an approach to high-performance computation. In this paper, we propose a radix-2 butterfly circuit architecture using selector logic. Our butterfly circuit reduces carry propagation compared with conventional butterfly circuits. Experimental results show that the proposed butterfly circuit improves performance by 21.8% compared with conventional butterfly circuits.

    CiNii

  • Radix-2 Butterfly Circuit Architecture Using Selector Logic

    NAMURA Takeshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, TONOMURA Motonobu

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 38 ) 25 - 30  2008.05

     View Summary

    An arithmetic circuit using selector logic has been proposed as an approach to high-performance computation. In this paper, we propose a radix-2 butterfly circuit architecture using selector logic. Our butterfly circuit reduces carry propagation compared with conventional butterfly circuits. Experimental results show that the proposed butterfly circuit improves performance by 21.8% compared with conventional butterfly circuits.

    CiNii

  • An L1 cache optimization algorithm for application processor cores

    東條 信明, 戸川 望, 柳澤 政生

    回路とシステム軽井沢ワークショップ論文集   21   243 - 248  2008.04

    CiNii

  • A route selection method based on router loads in anycast communications

    YOKOTA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 443 ) 13 - 18  2008.04

     View Summary

    In anycast communications, a client can automatically communicate with the most suitable server among multiple servers that offer a specific application. A route to the most suitable server can be selected based on the number of hops or the processing time of a server, but router loads caused by network congestion must also be considered when there are many clients. We propose a route selection method based on router loads in anycast communications. The proposed method, which is based on the Core-Based Tree (CBT) method, selects a route by building a Cover tree composed of not only the CBT tree but also ridge routers. Because it takes into account both the processing time of servers and router loads, the proposed method can select a route with smaller router loads than the existing method even when network traffic increases. We evaluate the proposed method by simulation and show its effectiveness.

    CiNii

  • A fast handoff method with reducing latency and packet loss for mobile communications

    TANAKA Atsuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 443 ) 41 - 46  2008.04

     View Summary

    These days, mobile nodes are required to stay connected to the Internet in express trains or cars. In this paper, we propose a method for reducing latency and packet loss in both intra-MAP and inter-MAP handoffs. The AR (Access Router) obtains in advance the IP-layer parameters of the AP (Access Point) to which the mobile node is going to connect next, which makes it possible to reduce handoff latency and packet loss. We simulate our method using the network simulator NS-2 and confirm that it reduces latency and packet loss, enabling VoIP communication to continue even if an inter-MAP handoff occurs.

    CiNii

  • LAMR : Load-Aware Multipath Routing in Ad hoc Networks

    SHIMIZU Yuji, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 443 ) 51 - 56  2008.04

     View Summary

    In this paper, we propose load-aware multipath routing (LAMR) in ad hoc networks. Conventional routing protocols in ad hoc networks have two problems: they select the shortest path without taking into account the traffic load of nodes, and they are vulnerable to link breaks because they establish only a single path. LAMR combines SMR, which establishes multiple maximally disjoint paths, and LASR, which establishes a single path that takes into account the traffic load of nodes. In LAMR, when paths are being established, intermediate nodes use an RREQ drop algorithm that compares their number of data packets per unit time with those of neighboring nodes and decides whether to drop the RREQ. A destination node selects the two maximally disjoint paths from the paths established by RREQs. Computer simulation results show that LAMR improves the packet arrival ratio and reduces the number of control packets compared with the conventional protocols.

    CiNii

  • Application-Oriented Dynamic Reconfigurable Network Processor Architecture and Its Optimization Method

    OHTA Motonori, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 511 ) 47 - 52  2008.03

     View Summary

    In this paper, we propose an application-oriented dynamic reconfigurable network processor architecture and its optimization method. The proposed network processor consists of input and output processing processors, application-specific hardware, and dynamic processors, which can detect the processing bottleneck of each packet during execution. The dynamic processors improve the throughput of the network processor because they selectively execute the highest-load processing so that the bottleneck is removed. To obtain the most suitable hardware architecture when designing a network processor for an actual application, we developed a network simulator. Using it, we can evaluate the number of dynamic processors that is most suitable for a target application. With DES as the target application, we found that the most suitable number of dynamic processors is six. Furthermore, we present the advantages of the application-oriented dynamic reconfigurable network processor in comparison with existing products.

    CiNii

  • Application-oriented dynamic reconfigurable network processor architecture and its optimization method

    OHTA Motonori, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 508 ) 47 - 52  2008.03

     View Summary

    In this paper, we propose an application-oriented dynamic reconfigurable network processor architecture and its optimization method. The proposed network processor consists of input and output processing processors, application-specific hardware, and dynamic processors, which can detect the processing bottleneck of each packet when it is executed. The dynamic processors improve the throughput of the network processor because they selectively execute the highest-load processing, which removes the bottleneck. To design network processors for actual applications, we developed a network simulator for finding the most suitable hardware architecture. With it, we can evaluate the most suitable number of dynamic processors for a target application. With DES as the target application, we found that the most suitable number of dynamic processors is six. Furthermore, we show the advantages of the application-oriented dynamic reconfigurable network processor in comparison with existing products.

    CiNii

  • An Energy-efficient ASIP Synthesis Method Based on Reducing Bit-width of Instruction Memory

    KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 509 ) 25 - 30  2008.03

     View Summary

    This paper proposes an energy-efficient ASIP synthesis method based on reducing the bit-width of the instruction memory. VLIW-type processors can execute several instructions concurrently. However, the instruction memory of such processors requires a long bit-width, which wastefully increases power and energy consumption. Reducing the bit-width of the instruction memory can therefore realize high-performance and energy-efficient processors. The bit-width of an instruction memory depends on the instruction encoding format, which is composed of the opcode and the operands of an instruction. The opcode bit-width depends on the number of instructions in the instruction set, and the operand bit-width depends on the number of general-purpose registers. Moreover, to reduce the opcode bit-width, we introduce the concept of a combined instruction, which is handled as one instruction and is composed of several instructions issued concurrently in the VLIW slots. We develop an energy-efficient ASIP synthesis system that includes three algorithms: an opcode bit-width reduction algorithm, an operand bit-width reduction algorithm, and an energy minimization algorithm. Experimental results confirm a 9% to 12% reduction in energy consumption for the whole processor system, including memories.
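
The dependence of instruction-memory width on instruction count and register count can be made concrete with a small calculation. This is only a sketch under simplifying assumptions that are not in the abstract: every operand is a register index, each slot carries a fixed number of operands, immediates are ignored, and a combined instruction is modeled as a single opcode that selects one of the slot combinations actually used. All function and parameter names are hypothetical.

```python
def field_bits(count):
    """Bits needed to give each of `count` items a distinct code (minimum 1)."""
    return max(1, (count - 1).bit_length())


def slot_width(num_instructions, num_registers, operands_per_instruction=3):
    """Width of one instruction: an opcode field plus register-operand fields."""
    opcode_bits = field_bits(num_instructions)
    operand_bits = field_bits(num_registers)
    return opcode_bits + operands_per_instruction * operand_bits


def vliw_word_width(slots, num_instructions, num_registers,
                    operands_per_instruction=3, used_combinations=None):
    """Width of one VLIW word in the instruction memory.

    With `used_combinations` set, the per-slot opcodes are replaced by one
    combined opcode that indexes the slot combinations actually used
    (an assumed reading of the "combined instruction" idea).
    """
    operand_total = slots * operands_per_instruction * field_bits(num_registers)
    if used_combinations is None:
        opcode_total = slots * field_bits(num_instructions)
    else:
        opcode_total = field_bits(used_combinations)
    return opcode_total + operand_total
```

Under these assumptions, a 4-slot word with 64 instructions and 32 registers is 84 bits wide. Reducing the register file to 16 entries gives 72 bits, and encoding 200 used slot combinations as combined opcodes gives 68 bits.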

    CiNii

  • An energy-efficient ASIP synthesis method based on reducing bit-width of instruction memory

    KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 506 ) 25 - 30  2008.03

     View Summary

    This paper proposes an energy-efficient ASIP synthesis method based on reducing the bit-width of the instruction memory. VLIW-type processors can execute several instructions concurrently. However, the instruction memory of such processors requires a long bit-width, which wastefully increases power and energy consumption. Reducing the bit-width of the instruction memory can therefore realize high-performance and energy-efficient processors. The bit-width of an instruction memory depends on the instruction encoding format, which is composed of the opcode and the operands of an instruction. The opcode bit-width depends on the number of instructions in the instruction set, and the operand bit-width depends on the number of general-purpose registers. Moreover, to reduce the opcode bit-width, we introduce the concept of a combined instruction, which is handled as one instruction and is composed of several instructions issued concurrently in the VLIW slots. We develop an energy-efficient ASIP synthesis system that includes three algorithms: an opcode bit-width reduction algorithm, an operand bit-width reduction algorithm, and an energy minimization algorithm. Experimental results confirm a 9% to 12% reduction in energy consumption for the whole processor system, including memories.

    CiNii

  • A Multiplexer Reduction Algorithm in High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 2 ) 85 - 90  2008.01

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor in total delay. In addition, as the total number of gates and the number of wires per unit area increase, the number of multiplexers needed for wiring control increases. By using a distributed-register architecture, we can synthesize circuits with register-to-register data transfer and reduce the influence of interconnection delay. However, as the number of wires required to connect registers increases, the number of multiplexers needed also increases. In this paper, we propose a multiplexer reduction algorithm in high-level synthesis for distributed-register architectures. The algorithm reduces the number of multiplexers for each functional unit, and the wiring connections between local registers, by optimizing port re-assignment. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • An L1 Data Cache Optimization Algorithm for Application Processor Cores

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 2 ) 155 - 160  2008.01

     View Summary

    One major factor in improving the performance of embedded processors is the use of data and instruction caches. In this paper, we propose an L1 data cache optimization algorithm that selects a suitable cache configuration for a given embedded application. Our algorithm can handle an area constraint by introducing CRMF (Configuration Reduction approach by the Miss Factor) and CRCB (Configuration Reduction approach by the Cache Behavior). It finally selects the best cache size, block size, and associativity under the area constraint for the target application. We demonstrate the effectiveness of our algorithm by applying it to Mediabench.
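
The search problem described here (pick cache size, block size, and associativity under an area limit) can be sketched as a brute-force baseline. This is not CRMF or CRCB, whose details the abstract does not give; it simply simulates every candidate configuration on an address trace. The LRU replacement policy, the use of capacity in bytes as a stand-in for area, and all names are assumptions.

```python
from collections import OrderedDict
from itertools import product


def count_misses(trace, cache_size, block_size, assoc):
    """Count misses of a set-associative LRU cache on a byte-address trace."""
    num_sets = cache_size // (block_size * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_size
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = True
    return misses


def best_config(trace, sizes, block_sizes, assocs, max_size):
    """Exhaustively pick (size, block, assoc) with the fewest misses.

    `max_size` stands in for the area constraint (assumption: area grows
    with capacity). Ties are broken in favor of the smaller cache.
    """
    best = None
    for size, block, assoc in product(sizes, block_sizes, assocs):
        if size > max_size or size < block * assoc:
            continue  # too large for the budget, or not even one full set
        key = (count_misses(trace, size, block, assoc), size)
        if best is None or key < best[0]:
            best = (key, (size, block, assoc))
    return None if best is None else best[1]
```

Exhaustive simulation becomes expensive as the configuration space grows, which is why an approach that reduces the set of configurations before evaluation is attractive.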

    CiNii

  • A Processor Kernel Generation Method for Application-specific Processors

    HIURA Toshihiro, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2008 ( 2 ) 161 - 166  2008.01

     View Summary

    This paper proposes a processor kernel generation method for a HW/SW co-design system named SPADES. SPADES automatically synthesizes processor cores specialized for an application. Application-specific processors in embedded systems require low cost, small area, high performance, and high productivity. One effective way to improve processor performance is to integrate hardware units such as SIMD functional units, MAC functional units, a hardware loop unit, an addressing unit, and extra data memory, and it is important to select the appropriate hardware units for each target application. In our work, we divide the application-specific processor into functional parts that are customized for the additional hardware units, and our method generates and merges them. The description of the processor core is composed of the descriptions of the functional parts. In the experimental results, the VHDL descriptions of the processor cores can be generated in 3 seconds.

    CiNii

  • A Multiplexer Reduction Algorithm in High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 419 ) 7 - 12  2008.01

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor in total delay. In addition, as the total number of gates and the number of wires per unit area increase, the number of multiplexers needed for wiring control increases. By using a distributed-register architecture, we can synthesize circuits with register-to-register data transfer and reduce the influence of interconnection delay. However, as the number of wires required to connect registers increases, the number of multiplexers needed also increases. In this paper, we propose a multiplexer reduction algorithm in high-level synthesis for distributed-register architectures. The algorithm reduces the number of multiplexers for each functional unit, and the wiring connections between local registers, by optimizing port re-assignment. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • An L1 Data Cache Optimization Algorithm for Application Processor Cores

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 419 ) 77 - 82  2008.01

     View Summary

    One major factor in improving the performance of embedded processors is the use of data and instruction caches. In this paper, we propose an L1 data cache optimization algorithm that selects a suitable cache configuration for a given embedded application. Our algorithm can handle an area constraint by introducing CRMF (Configuration Reduction approach by the Miss Factor) and CRCB (Configuration Reduction approach by the Cache Behavior). It finally selects the best cache size, block size, and associativity under the area constraint for the target application. We demonstrate the effectiveness of our algorithm by applying it to Mediabench.

    CiNii

  • A Processor Kernel Generation Method for Application-specific Processors

    HIURA Toshihiro, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 419 ) 83 - 88  2008.01

     View Summary

    This paper proposes a processor kernel generation method for a HW/SW co-design system named SPADES. SPADES automatically synthesizes processor cores specialized for an application. Application-specific processors in embedded systems require low cost, small area, high performance, and high productivity. One effective way to improve processor performance is to integrate hardware units such as SIMD functional units, MAC functional units, a hardware loop unit, an addressing unit, and extra data memory, and it is important to select the appropriate hardware units for each target application. In our work, we divide the application-specific processor into functional parts that are customized for the additional hardware units, and our method generates and merges them. The description of the processor core is composed of the descriptions of the functional parts. In the experimental results, the VHDL descriptions of the processor cores can be generated in 3 seconds.

    CiNii

  • A Multiplexer Reduction Algorithm in High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 417 ) 7 - 12  2008.01

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor in total delay. In addition, as the total number of gates and the number of wires per unit area increase, the number of multiplexers needed for wiring control increases. By using a distributed-register architecture, we can synthesize circuits with register-to-register data transfer and reduce the influence of interconnection delay. However, as the number of wires required to connect registers increases, the number of multiplexers needed also increases. In this paper, we propose a multiplexer reduction algorithm in high-level synthesis for distributed-register architectures. The algorithm reduces the number of multiplexers for each functional unit, and the wiring connections between local registers, by optimizing port re-assignment. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • An L1 Data Cache Optimization Algorithm for Application Processor Cores

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 417 ) 77 - 82  2008.01

     View Summary

    One major factor in improving the performance of embedded processors is the use of data and instruction caches. In this paper, we propose an L1 data cache optimization algorithm that selects a suitable cache configuration for a given embedded application. Our algorithm can handle an area constraint by introducing CRMF (Configuration Reduction approach by the Miss Factor) and CRCB (Configuration Reduction approach by the Cache Behavior). It finally selects the best cache size, block size, and associativity under the area constraint for the target application. We demonstrate the effectiveness of our algorithm by applying it to Mediabench.

    CiNii

  • A Processor Kernel Generation Method for Application-specific Processors

    HIURA Toshihiro, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 417 ) 83 - 88  2008.01

     View Summary

    This paper proposes a processor kernel generation method for a HW/SW co-design system named SPADES. SPADES automatically synthesizes processor cores specialized for an application. Application-specific processors in embedded systems require low cost, small area, high performance, and high productivity. One effective way to improve processor performance is to integrate hardware units such as SIMD functional units, MAC functional units, a hardware loop unit, an addressing unit, and extra data memory, and it is important to select the appropriate hardware units for each target application. In our work, we divide the application-specific processor into functional parts that are customized for the additional hardware units, and our method generates and merges them. The description of the processor core is composed of the descriptions of the functional parts. In the experimental results, the VHDL descriptions of the processor cores can be generated in 3 seconds.

    CiNii

  • A Multiplexer Reduction Algorithm in High-level Synthesis for Distributed-Register Architectures

    ENDO Tetsuya, OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 415 ) 7 - 12  2008.01

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor in total delay. In addition, as the total number of gates and the number of wires per unit area increase, the number of multiplexers needed for wiring control increases. By using a distributed-register architecture, we can synthesize circuits with register-to-register data transfer and reduce the influence of interconnection delay. However, as the number of wires required to connect registers increases, the number of multiplexers needed also increases. In this paper, we propose a multiplexer reduction algorithm in high-level synthesis for distributed-register architectures. The algorithm reduces the number of multiplexers for each functional unit, and the wiring connections between local registers, by optimizing port re-assignment. We show the effectiveness of the proposed algorithm through experimental results.

    CiNii

  • An L1 Data Cache Optimization Algorithm for Application Processor Cores

    TOJO Nobuaki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 415 ) 77 - 82  2008.01

     View Summary

    One major factor in improving the performance of embedded processors is the use of data and instruction caches. In this paper, we propose an L1 data cache optimization algorithm that selects a suitable cache configuration for a given embedded application. Our algorithm can handle an area constraint by introducing CRMF (Configuration Reduction approach by the Miss Factor) and CRCB (Configuration Reduction approach by the Cache Behavior). It finally selects the best cache size, block size, and associativity under the area constraint for the target application. We demonstrate the effectiveness of our algorithm by applying it to Mediabench.

    CiNii

  • A Processor Kernel Generation Method for Application-specific Processors

    HIURA Toshihiro, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 415 ) 83 - 88  2008.01

     View Summary

    This paper proposes a processor kernel generation method for a HW/SW co-design system named SPADES. SPADES automatically synthesizes processor cores specialized for an application. Application-specific processors in embedded systems require low cost, small area, high performance, and high productivity. One effective way to improve processor performance is to integrate hardware units such as SIMD functional units, MAC functional units, a hardware loop unit, an addressing unit, and extra data memory, and it is important to select the appropriate hardware units for each target application. In our work, we divide the application-specific processor into functional parts that are customized for the additional hardware units, and our method generates and merges them. The description of the processor core is composed of the descriptions of the functional parts. In the experimental results, the VHDL descriptions of the processor cores can be generated in 3 seconds.

    CiNii

  • A Hardware Engine for Generation Deformed Map

    ARAHATA Akira, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2007 ( 114 ) 175 - 180  2007.11

     View Summary

    A map image designed for a computer display is too complex to show on a mobile phone LCD, so a deformed map is necessary on a mobile phone. Because the map image must be updated in real time, automatic generation of deformed maps has been proposed. However, generating a deformed map on the network server does not reflect individual preferences, and generating it on the network client puts load on the mobile phone. This paper presents a hardware engine for generating deformed maps on a mobile phone. We analyzed deformed map generation and identified the bottleneck of the processing. Based on this, we propose an ALU suited to a mobile phone. With the proposed ALU embedded in a mobile phone, deformed map generation can be executed with 20% to 50% of the processing previously required.

    CiNii

  • A power masking multiplier based on galois field for composite field AES

    KAWAHATA Nobuyuki, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2007 ( 114 ) 109 - 114  2007.11

     View Summary

    AES is a common-key cryptosystem used mainly in embedded systems, IC chips, and similar devices, and the common key must not be known to others. However, the common key can be cracked by a side-channel attack (SCA). SCA, which cracks the common key by measuring and analyzing physical quantities during encryption processing, has been proposed and identified as a danger to the security of AES. Among SCAs, differential power analysis (DPA) is regarded as the most dangerous and realistic attack on AES. Hence, the SubBytes circuit needs to be designed to resist DPA. To design an anti-DPA SubBytes circuit, we propose a power masking multiplier based on the Galois field for composite field AES. With the multiplier, we design an inverse-element circuit over the composite Galois field and use it to design a low-area SubBytes circuit. We report the evaluation results.

    CiNii

  • A Hardware Engine for Generation Deformed Map

    ARAHATA Akira, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 336 ) 61 - 66  2007.11

     View Summary

    A map image designed for a computer display is too complex to show on a mobile phone LCD, so a deformed map is necessary on a mobile phone. Because the map image must be updated in real time, automatic generation of deformed maps has been proposed. However, generating a deformed map on the network server does not reflect individual preferences, and generating it on the network client puts load on the mobile phone. This paper presents a hardware engine for generating deformed maps on a mobile phone. We analyzed deformed map generation and identified the bottleneck of the processing. Based on this, we propose an ALU suited to a mobile phone. With the proposed ALU embedded in a mobile phone, deformed map generation can be executed with 20% to 50% of the processing previously required.

    CiNii

  • A Hardware Engine for Generation Deformed Map

    ARAHATA Akira, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 339 ) 61 - 66  2007.11

     View Summary

    A map image designed for a computer display is too complex to show on a mobile phone LCD, so a deformed map is necessary on a mobile phone. Because the map image must be updated in real time, automatic generation of deformed maps has been proposed. However, generating a deformed map on the network server does not reflect individual preferences, and generating it on the network client puts load on the mobile phone. This paper presents a hardware engine for generating deformed maps on a mobile phone. We analyzed deformed map generation and identified the bottleneck of the processing. Based on this, we propose an ALU suited to a mobile phone. With the proposed ALU embedded in a mobile phone, deformed map generation can be executed with 20% to 50% of the processing previously required.

    CiNii

  • A Multi-Rate Compatible Irregular LDPC Decoder Enhancing Column Operation Parallelism

    IMAI Yuta, SHIMIZU Kazunori, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 342 ) 19 - 24  2007.11

     View Summary

    Recently, demand for downloading digital content over wireless networks has been increasing dramatically as portable music players and mobile phones have become more highly functional and broadcasting has been digitalized. It is therefore essential to support high communication quality even when the communication environment is unstable. Low-density parity-check (LDPC) codes are expected to be the error-correcting codes of the next generation because they show high error-correcting performance, and many studies have been carried out on this topic. At present, LDPC codes are incorporated in IEEE 802.11n, the next wireless network standard. In this paper, we propose an area-saving LDPC decoder that achieves high decoding performance in an unstable wireless communication environment. This is done by sharing adders within the column operation module among different information rates. Our method also increases the parallelism of operations as the information rate gets higher, so the decoder shows higher decoding throughput than conventional decoders.

    CiNii

  • A power masking multiplier based on galois field for composite field AES

    KAWAHATA Nobuyuki, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 335 ) 37 - 42  2007.11

     View Summary

    AES is a common-key cryptosystem used mainly in embedded systems, IC chips, and similar devices, and the common key must not be known to others. However, the common key can be cracked by a side-channel attack (SCA). SCA, which cracks the common key by measuring and analyzing physical quantities during encryption processing, has been proposed and identified as a danger to the security of AES. Among SCAs, differential power analysis (DPA) is regarded as the most dangerous and realistic attack on AES. Hence, the SubBytes circuit needs to be designed to resist DPA. To design an anti-DPA SubBytes circuit, we propose a power masking multiplier based on the Galois field for composite field AES. With the multiplier, we design an inverse-element circuit over the composite Galois field and use it to design a low-area SubBytes circuit. We report the evaluation results.

    CiNii

  • A power masking multiplier based on galois field for composite field AES

    KAWAHATA Nobuyuki, NARA Ryuta, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 338 ) 37 - 42  2007.11

     View Summary

    AES is a common-key cryptosystem used mainly in embedded systems, IC chips, and similar devices, and the common key must not be known to others. However, the common key can be cracked by a side-channel attack (SCA). SCA, which cracks the common key by measuring and analyzing physical quantities during encryption processing, has been proposed and identified as a danger to the security of AES. Among SCAs, differential power analysis (DPA) is regarded as the most dangerous and realistic attack on AES. Hence, the SubBytes circuit needs to be designed to resist DPA. To design an anti-DPA SubBytes circuit, we propose a power masking multiplier based on the Galois field for composite field AES. With the multiplier, we design an inverse-element circuit over the composite Galois field and use it to design a low-area SubBytes circuit. We report the evaluation results.

    CiNii

  • Embedded systems dissection room (2) Dissecting the IC card compatible automatic ticket gate

    TOGAWA Nozomu

    Interface   33 ( 11 ) 149 - 156  2007.11

    CiNii

  • Embedded systems dissection room (new series, part 1) Dissecting the digital single-lens reflex camera

    TOGAWA Nozomu

    Interface   33 ( 10 ) 148 - 154  2007.10

    CiNii

  • A Load Balancing Method of Hierarchical Network Mobility by using Application Type and Data Size for Mobile Communications

    TSUKIGI Eiji, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告高度交通システム(ITS)   2007 ( 90 ) 65 - 70  2007.09

     View Summary

    Network Mobility (NEMO) is a method for using the Internet inside cars or trains. However, packet losses, delay, and load concentration on a router during handover cause quality degradation. We propose a load balancing method that uses application type and data size in multi-layered Mobility Anchor Points (MAPs). Our method does not depend on the MN's speed, so load balancing over MAPs can be realized even where all MNs move at the same speed, as in a train. In addition, we can guarantee a constant delay time for applications such as IP telephony or video streaming. We show the effectiveness of the proposed method using the network simulator OPNET.

    CiNii

  • Positioning Method Using Road Traffic Sign to Correct GPS Error for Pedestrian Navigation

    OHIRA Hidetaka, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告高度交通システム(ITS)   2007 ( 90 ) 71 - 76  2007.09

     View Summary

    GPS has significant errors caused by multipath, poor satellite reception, and so on. We propose a positioning method that corrects GPS error for pedestrian navigation. Our system is composed of a camera-equipped cellular phone, GPS, and a map server, all of which are existing infrastructure and devices. A user takes a picture that includes a traffic sign with the cellular phone camera and sends it to the map server. The map server identifies the road traffic sign in the picture. Because the user must be somewhere from which the sign is visible, the map server corrects the GPS error using the position of the sign and identifies the user's position. One of the most important processes in this system is identification of the road traffic sign, which must be both fast and accurate. We propose a technique that speeds up identification and raises its accuracy, based on the observation that the user photographs the sign at the center of the image. When the map server receives a sign picture, it identifies the sign by color recognition, shape recognition using the Hough transform, and symbol recognition using template matching. The proposed system reduces the amount of communication data by lowering the image resolution, so that the time from sending the image to receiving the result stays within 3 seconds. In color recognition, high accuracy is achieved by putting a high weight on pixels near the center of the image. In shape recognition, the Hough transform is sped up by using the direction of edges and the fact that a round sign is at the center of the image. In template matching, speed-up is achieved by narrowing the candidates according to the color and shape of the sign.
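
The center-weighted color recognition step can be illustrated with a short sketch. The abstract only states that pixels near the image center receive a high weight; the Gaussian weighting, the `sigma_fraction` parameter, and the assumption that each pixel has already been classified into a named color are illustrative choices, not the authors' method.

```python
import math


def dominant_color(labels, sigma_fraction=0.25):
    """Pick the sign color by a center-weighted vote over pixel color labels.

    `labels` is a 2D list holding one color name per pixel. Each pixel votes
    with a Gaussian weight that decays with distance from the image center
    (assumed weighting; sigma is a fraction of the longer image side).
    """
    height, width = len(labels), len(labels[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    sigma = sigma_fraction * max(height, width)
    votes = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            dist_sq = (y - cy) ** 2 + (x - cx) ** 2
            weight = math.exp(-dist_sq / (2 * sigma ** 2))
            votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)
```

With this weighting, a sign that fills only the middle of the frame can still outvote a background color that covers more pixels overall.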

    CiNii

  • A Positioning Method Using Camera Phone and Landmarks for Pedestrian Navigation

    HONDA Masahito, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告高度交通システム(ITS)   2007 ( 90 ) 77 - 82  2007.09

     View Summary

    Navigation services for pedestrians on mobile phones have been increasing in recent years. Pedestrian positioning generally uses GPS or base-station information on a mobile phone. However, in urban areas, significant signal errors may be caused by several factors, such as multipath and a reduced number of available satellites. In this paper, we propose a new positioning method that achieves highly accurate positioning at low cost. With our method, all the user needs to do is take pictures of road signs and send them to a map server using a mobile phone. The server sends back the user's position using the positioning data of the road signs. If there are several candidate positions, the server asks a question about landmarks that the user can see. From the answer to this question, the server can give the user an accurate position. Since our positioning method does not rely only on GPS or base-station information, we remove these error factors in urban areas. We confirm the effectiveness of the proposed method through experiments in a real environment.

    CiNii

  • A Travel Time Calculation Method Considering Congestion of Course Direction

    OHTAKA Kousuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2007 ( 15 ) 15 - 20  2007.09

     View Summary

    To calculate the travel time of a vehicle from a departure point to a destination, the link travel times offered by VICS or telematics services are used. However, the accuracy of these link travel times is low, so the accuracy of the resulting travel time is also low. In this paper, we propose a method that calculates travel time with higher accuracy than traditional methods. By storing link travel times based on driving records for each course direction, our method can calculate travel times that take congestion by course direction into account, and it covers all roads, including highways. In addition, we show the effectiveness of our method through evaluation with a traffic simulator.
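
The idea of storing link travel times per course direction can be sketched as a small lookup structure. The abstract does not define the data model, so this sketch assumes that a "course direction" is the direction taken when leaving a link (for example "left", "straight", or "right") and that stored records are averaged. The class and method names are hypothetical.

```python
from collections import defaultdict


class DirectionalTravelTimes:
    """Link travel times from driving records, kept per course direction."""

    def __init__(self):
        self._records = defaultdict(list)

    def record(self, link_id, direction, seconds):
        """Store one observed traversal time for a (link, direction) pair."""
        self._records[(link_id, direction)].append(seconds)

    def link_time(self, link_id, direction):
        """Mean recorded time for a link when leaving it in `direction`."""
        samples = self._records.get((link_id, direction))
        if not samples:
            raise KeyError(f"no driving record for {link_id!r} / {direction!r}")
        return sum(samples) / len(samples)

    def route_time(self, route):
        """Total time over a route given as (link_id, direction) pairs."""
        return sum(self.link_time(link, direction) for link, direction in route)
```

Keeping right-turn and straight-through records separate lets a route that turns right at a congested intersection receive a longer estimate than one that goes straight over the same link.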

    CiNii

  • A Load Balancing Method of Hierarchical Network Mobility by using Application Type and Data Size for Mobile Communications

    TSUKIGI Eiji, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2007 ( 15 ) 65 - 70  2007.09

    CiNii

  • Positioning Method Using Road Traffic Sign to Correct GPS Error for Pedestrian Navigation

    OHIRA Hidetaka, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2007 ( 15 ) 71 - 76  2007.09

    CiNii

  • A Positioning Method Using Camera Phone and Landmarks for Pedestrian Navigation

    HONDA Masahito, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2007 ( 15 ) 77 - 82  2007.09

    CiNii

  • A Travel Time Calculation Method Considering Congestion of Course Direction

    OHTAKA Kousuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告高度交通システム(ITS)   2007 ( 90 ) 15 - 20  2007.09

     View Summary

    To calculate the travel time of a vehicle from a departure point to a destination, link travel times offered by VICS or telematics services are used. However, the accuracy of link travel times is so low that the accuracy of the resulting travel time is also low. In this paper, we propose a method to calculate travel time with higher accuracy than traditional methods. By storing link travel times based on driving records for each course direction, our method can calculate travel times that take congestion in each course direction into account, and it covers all roads including highways. In addition, we show the effectiveness of our method through evaluation with a traffic simulator.

    CiNii

  • A Load Balancing Method of Hierarchical Network Mobility by using Application Type and Data Size for Mobile Communications

    TSUKIGI Eiji, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 216 ) 21 - 26  2007.09

     View Summary

    Network Mobility (NEMO) is a method for using the Internet inside cars or trains. However, packet loss, delay, and load concentration on a router during handover degrade quality. We propose a load balancing method that uses application type and data size in a multi-layered Mobility Anchor Point (MAP) hierarchy. Our method does not depend on the speed of MNs, so load balancing over MAPs can be realized even where all MNs move at the same speed, as in a train. In addition, we can guarantee constant delay time for applications such as IP telephony or video streaming. We show the effectiveness of the proposed method using the network simulator OPNET.

    CiNii

  • Positioning Method Using Road Traffic Sign to Correct GPS Error for Pedestrian Navigation

    OHIRA Hidetaka, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 216 ) 27 - 32  2007.09

     View Summary

    GPS has significant errors caused by multipath, poor satellite reception, and so on. We propose a positioning method that can correct GPS error for pedestrian navigation. Our system is composed of a cellular phone with a camera, GPS, and a map server, all of which are existing infrastructure and devices. A user takes a picture including a traffic sign with the cellular phone camera and sends it to the map server. The map server identifies the road traffic sign from the picture. Because the user must be in the area from which the sign is visible, the map server corrects the GPS error using the position of the sign and identifies the user's position. One of the most important processes of this system is the identification of the road traffic sign, which must be both fast and accurate. We propose a fast and accurate technique for identifying road traffic signs. This technique exploits the fact that the user takes the picture with the sign at the center of the image. When the map server receives a sign picture, it identifies the sign by color recognition, shape recognition using the Hough transform, and symbol recognition using template matching. The proposed system reduces communication data by lowering the image resolution so that the time from sending the image to receiving the result is kept within 3 seconds. In color recognition, high accuracy is achieved by giving high weight to pixels near the center of the image. In shape recognition, the Hough transform is sped up by using edge directions and the fact that a round sign is at the center of the image. In template matching, speed-up is achieved by narrowing the candidates according to the color and shape of the sign.

    CiNii

  • A Positioning Method Using Camera Phone and Landmarks for Pedestrian Navigation

    HONDA Masahito, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 216 ) 33 - 38  2007.09

     View Summary

    Navigation services for pedestrians on mobile phones have been increasing in recent years. Pedestrian positioning generally uses GPS or base-station information on the mobile phone. However, in urban areas, significant signal errors may be caused by several factors, such as multipath and a reduced number of available satellites. In this paper, we propose a new positioning method that achieves highly accurate positioning at low cost. With our method, all users need to do is take pictures of road signs and send them to a map server using their mobile phones. The server sends back the user's position using the positioning data of the road signs. If there are several candidates for the present data, the server asks a question about landmarks that the user can see. Based on the user's answer, the server can give the user an accurate position. Since our positioning method does not rely solely on GPS or base-station information, we remove the error factors in urban areas. We confirm the effectiveness of the proposed method through experiments in a real environment.

    CiNii

  • A Travel Time Calculation Method Considering Congestion of Course Direction

    OHTAKA Kousuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 215 ) 15 - 20  2007.09

     View Summary

    To calculate the travel time of a vehicle from a departure point to a destination, link travel times offered by VICS or telematics services are used. However, the accuracy of link travel times is so low that the accuracy of the resulting travel time is also low. In this paper, we propose a method to calculate travel time with higher accuracy than traditional methods. By storing link travel times based on driving records for each course direction, our method can calculate travel times that take congestion in each course direction into account, and it covers all roads including highways. In addition, we show the effectiveness of our method through evaluation with a traffic simulator.

    CiNii

  • Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

    TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 101 ) 43 - 48  2007.06

     View Summary

    Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and requires scalability, since the bit length of operands varies depending on the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous works. However, either changing the frequency or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed using the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles increased by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

    CiNii
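
For readers unfamiliar with the operation named in the abstract above, the following is a minimal software sketch of textbook Montgomery multiplication over the integers modulo an odd number. It is not the word-serial, dual-radix hardware architecture of the report; the function names `montgomery_multiply` and `modmul`, and the parameter `r_bits`, are hypothetical and chosen only for illustration. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_multiply(a_bar, b_bar, n, r_bits):
    """Return a_bar * b_bar * R^-1 mod n, with R = 2**r_bits.

    Assumes n is odd, n < R, and 0 <= a_bar, b_bar < n.
    """
    r_mask = (1 << r_bits) - 1
    # n_prime satisfies n * n_prime ≡ -1 (mod R)
    n_prime = (-pow(n, -1, 1 << r_bits)) & r_mask
    t = a_bar * b_bar
    # Choose m so that t + m*n is divisible by R
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> r_bits  # exact division by R
    # A single conditional subtraction brings u into [0, n)
    return u - n if u >= n else u


def modmul(a, b, n, r_bits):
    """Compute a * b mod n by going through the Montgomery domain."""
    r = 1 << r_bits
    a_bar = (a * r) % n  # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = montgomery_multiply(a_bar, b_bar, n, r_bits)
    # Multiplying by 1 strips the remaining factor of R
    return montgomery_multiply(prod_bar, 1, n, r_bits)
```

The point of the technique is that the reduction uses only masks and shifts by a power of two instead of a division by `n`, which is why it suits hardware multipliers.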

  • Filter Design for Flexible Engine/Generic ALU Array and Its Dedicated Synthesis Algorithm

    HONMA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, SATOH Makoto

    IEICE technical report   107 ( 100 ) 67 - 72  2007.06

     View Summary

    Reconfigurable processors are processors whose contexts are dynamically reconfigured while they are running. We focus on a reconfigurable processor called FE-GA (Flexible Engine/Generic ALU array) for digital media processing. At present, FE-GA does not have a dedicated development tool. Thus, in this paper, we design FIR filters and propose an algorithm to map them onto FE-GA automatically. Given the degree and coefficients of an FIR filter, the algorithm generates dedicated assembly code that represents the filter for FE-GA. An editor called FEEditor then reads the generated assembly code and implements the corresponding FIR filter on FE-GA. The proposed algorithm achieves automatic mapping of FIR filters of all degrees within the range of the FE-GA architecture specification. Furthermore, it is proved that the minimum number of cycles to execute FIR filtering is achieved if there is no thread switch.

    CiNii

  • Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

    TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 105 ) 43 - 48  2007.06

     View Summary

    Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and requires scalability, since the bit length of operands varies depending on the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous works. However, either changing the frequency or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed using the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles increased by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

    CiNii

  • Scalable Dual-Radix Unified Montgomery Multiplier in GF(P) and GF(2^n)

    TANIMURA Kazuyuki, NARA Ryuta, KOHARA Shunitsu, SHI Youhua, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 103 ) 43 - 48  2007.06

     View Summary

    Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), which is a type of public-key cryptography. Montgomery multiplication is commonly used for modular multiplication and requires scalability, since the bit length of operands varies depending on the security level. ECC is performed in GF(P) or GF(2^n), and scalable unified architectures have been proposed in previous works. However, either changing the frequency or a dual-radix architecture is necessary to deal with the delay-time difference between the GF(P) and GF(2^n) parts of the multiplier, because the critical path of the GF(P) hardware is longer. This paper proposes an algorithm and architecture for a scalable dual-radix unified Montgomery multiplier in GF(P) and GF(2^n). The proposed architecture unifies four parallelized radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) hardware shortens its critical path and allows numbers in the two fields to be computed using the same multiplier. Moreover, the parallelized architecture in GF(P) reduces the clock cycles increased by the dual-radix approach, achieving the fastest scalable unified Montgomery multiplier yet reported.

    CiNii

  • Filter Design for Flexible Engine/Generic ALU Array and Its Dedicated Synthesis Algorithm

    HONMA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, SATOH Makoto

    IEICE technical report   107 ( 104 ) 67 - 72  2007.06

     View Summary

    Reconfigurable processors are processors whose contexts are dynamically reconfigured while they are running. We focus on a reconfigurable processor called FE-GA (Flexible Engine/Generic ALU array) for digital media processing. At present, FE-GA does not have a dedicated development tool. Thus, in this paper, we design FIR filters and propose an algorithm to map them onto FE-GA automatically. Given the degree and coefficients of an FIR filter, the algorithm generates dedicated assembly code that represents the filter for FE-GA. An editor called FEEditor then reads the generated assembly code and implements the corresponding FIR filter on FE-GA. The proposed algorithm achieves automatic mapping of FIR filters of all degrees within the range of the FE-GA architecture specification. Furthermore, it is proved that the minimum number of cycles to execute FIR filtering is achieved if there is no thread switch.

    CiNii

  • Filter Design for Flexible Engine/Generic ALU Array and Its Dedicated Synthesis Algorithm

    HONMA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo, SATOH Makoto

    IEICE technical report   107 ( 102 ) 67 - 72  2007.06

     View Summary

    Reconfigurable processors are processors whose contexts are dynamically reconfigured while they are running. We focus on a reconfigurable processor called FE-GA (Flexible Engine/Generic ALU array) for digital media processing. At present, FE-GA does not have a dedicated development tool. Thus, in this paper, we design FIR filters and propose an algorithm to map them onto FE-GA automatically. Given the degree and coefficients of an FIR filter, the algorithm generates dedicated assembly code that represents the filter for FE-GA. An editor called FEEditor then reads the generated assembly code and implements the corresponding FIR filter on FE-GA. The proposed algorithm achieves automatic mapping of FIR filters of all degrees within the range of the FE-GA architecture specification. Furthermore, it is proved that the minimum number of cycles to execute FIR filtering is achieved if there is no thread switch.

    CiNii

  • An SIMD MSD Multiplier based on variable GF(2^m) for Elliptic Curve Cryptosystems

    NARA Ryuta, SHIMIZU Kazunori, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告システムLSI設計技術(SLDM)   2007 ( 39 ) 57 - 61  2007.05

     View Summary

    Elliptic curve cryptosystem (ECC) hardware is often required to operate with variable key lengths. Digit-serial multipliers for ECC enable the hardware to accelerate finite field operations. However, the lack of flexibility of digit-serial multipliers is a major challenge for building ECC hardware that operates with variable key lengths. In this paper, we propose an SIMD MSD multiplier based on variable GF(2^m) for ECC. Adjusting the parallelism of the SIMD MSD multiplier according to the field length enables us to accelerate ECC scalar multiplication throughput. The proposed multiplier operates on the five field lengths recommended by NIST, where two multiplications can be performed simultaneously for the small field lengths. Implementation results show that the proposed multiplier reduces the hardware area by up to 1/3 at the same throughput, while achieving up to about twice the multiplication throughput of conventional multipliers for variable field lengths.

    CiNii

  • An SIMD MSD Multiplier based on variable GF(2^m) for Elliptic Curve Cryptosystems

    NARA Ryuta, SHIMIZU Kazunori, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   107 ( 32 ) 25 - 29  2007.05

     View Summary

    Elliptic curve cryptosystem (ECC) hardware is often required to operate with variable key lengths. Digit-serial multipliers for ECC enable the hardware to accelerate finite field operations. However, the lack of flexibility of digit-serial multipliers is a major challenge for building ECC hardware that operates with variable key lengths. In this paper, we propose an SIMD MSD multiplier based on variable GF(2^m) for ECC. Adjusting the parallelism of the SIMD MSD multiplier according to the field length enables us to accelerate ECC scalar multiplication throughput. The proposed multiplier operates on the five field lengths recommended by NIST, where two multiplications can be performed simultaneously for the small field lengths. Implementation results show that the proposed multiplier reduces the hardware area by up to 1/3 at the same throughput, while achieving up to about twice the multiplication throughput of conventional multipliers for variable field lengths.

    CiNii

  • Low power LDPC decoder design based on accelerated message-passing schedule

    清水 一範, 戸川 望, 池永 剛

    回路とシステム軽井沢ワークショップ論文集   20   331 - 336  2007.04

    CiNii

  • An Elliptic Area Search Algorithm Based on a Mobile User Destination

    YAMAMOTO Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2007 ( 1 ) 25 - 30  2007.03

    CiNii

  • An Elliptic Area Search Algorithm Based on a Mobile User Destination

    YAMAMOTO Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 616 ) 25 - 30  2007.03

     View Summary

    Location-based information services, which provide neighborhood information based on a mobile user's location, have become common these days. However, with such services it is difficult to distinguish important information from less important information, because they provide too much information. Picking out the important information provided by a location-based information service is therefore a problem. In this paper, we propose an elliptic area search algorithm based on a mobile user's location and his/her destination in order to obtain important information. Our proposed algorithm searches for information within an elliptic area whose two foci correspond to the mobile user's location and destination. We show its effectiveness through experiments and compare it with existing systems.

    CiNii
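
The geometric idea in the abstract above (an ellipse whose foci are the user's location and destination) can be sketched as follows. This is only an illustration under stated assumptions: positions are treated as planar `(x, y)` coordinates, and the function names and the `stretch` parameter, which sets the major-axis length as a multiple of the focus-to-focus distance, are hypothetical and do not come from the report.

```python
import math


def in_elliptic_area(point, location, destination, major_axis):
    """True if `point` lies in the ellipse with foci `location` and `destination`.

    A point is inside an ellipse exactly when the sum of its distances
    to the two foci does not exceed the major-axis length.
    """
    return math.dist(point, location) + math.dist(point, destination) <= major_axis


def elliptic_search(items, location, destination, stretch=1.2):
    """Return names of items inside the ellipse (stretch > 1 is assumed)."""
    major_axis = stretch * math.dist(location, destination)
    return [
        name
        for name, pos in items
        if in_elliptic_area(pos, location, destination, major_axis)
    ]
```

Compared with a circular search around the current location, an elliptic area of this kind favors items lying along the way to the destination and excludes items behind the user.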

  • A Clustering Technique for Energy Consumption Reduction in Wireless Sensor Network

    HIROSE Fumiaki, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 577 ) 41 - 46  2007.03

     View Summary

    Recently, wireless sensor networks have been expected to serve as a new communications infrastructure technology in the ubiquitous society. Traditional network protocols such as TCP/IP were designed with throughput and delay constraints as the top priority, and energy consumption was put on the back burner. In contrast, protocols for sensor networks should be designed with energy consumption as the top priority, because of the difficulty of managing the physical power supply to terminals and the need for miniaturization and cost reduction. Therefore, in this paper, we propose a sensor network protocol that prolongs the lifetime of the entire network and improves the rate of data delivery to the base station. This technique extends the LEACH protocol for cluster-based sensor networks, improves communication efficiency by reducing the overhead of extra communication, and generates the best clusters among terminals.

    CiNii

  • A routing selection method for improving the response time in anycast communications

    YANG Xia, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 577 ) 381 - 386  2007.03

     View Summary

    We propose a novel method to select a proper server in anycast communications. The proposed method is based on the core-based tree method, but takes into consideration not only the delay time of the route but also the processing time of the server. We carried out simulations using the network simulator OPNET. The simulation results show that the proposed system can reduce the delay time by 30% or more. Furthermore, both better load balance and better throughput can be obtained.

    CiNii

  • A Deformed Map Generation Algorithm Considering Visibility in a Small Display and Easiness of Route Understanding for Pedestrian Navigation

    NINOMIYA Naoya, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告高度交通システム(ITS)   2006 ( 103 ) 111 - 116  2006.09

     View Summary

    The use of map services for pedestrians has expanded with the spread of location information services and Internet services on cellular phones. Various studies have addressed automatically generating effective deformed maps for mobile devices with small displays. Existing techniques are based on making road shapes horizontal and vertical and quantizing intersection angles. Deformed maps generated by them have a high level of visibility, but they are not easy for users to understand. In this paper, we propose a road shape transformation algorithm based on cognitive science. It can generate deformed maps that are legible on a small display and make routes easy to understand. By applying our proposed algorithm to road-network data with about 400 nodes, we confirmed that it works efficiently.

    CiNii

  • A Detection Method of Line Congestion Information by Using Vehicle-to-Vehicle and Road-to-Vehicle Communication

    OHTAKA Kousuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2006 ( 20 ) 19 - 24  2006.09

    CiNii

  • A Detection Method of Line Congestion Information by Using Vehicle-to-Vehicle and Road-to-Vehicle Communication

    OHTAKA Kousuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    情報処理学会研究報告高度交通システム(ITS)   2006 ( 103 ) 19 - 24  2006.09

     View Summary

    As ITS technology evolves, measurement accuracy and route guidance technology are improving. However, because the accuracy of the estimated time required from a starting point to a destination is not high enough, how to acquire accurate congestion information remains a problem. In particular, because differences in congestion between lanes greatly influence the calculation of the time required, when the congestion level differs between lanes it is necessary to detect congestion information for each lane at an intersection without causing the problems seen in conventional congestion-detection methods. We therefore propose a method to detect congestion information for each lane in real time at intersections on general roads by using Vehicle-to-Vehicle and Road-to-Vehicle communication technology. The time required to pass through the congestion is calculated using information gathered by iterative communication among cars, starting from a beacon. We then show the effectiveness of this method through simulation.

    CiNii

  • A Detection Method of Line Congestion Information by Using Vehicle-to-Vehicle and Road-to-Vehicle Communication

    OHTAKA Kousuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 265 ) 19 - 24  2006.09

     View Summary

    As ITS technology evolves, measurement accuracy and route guidance technology are improving. However, because the accuracy of the estimated time required from a starting point to a destination is not high enough, how to acquire accurate congestion information remains a problem. In particular, because differences in congestion between lanes greatly influence the calculation of the time required, when the congestion level differs between lanes it is necessary to detect congestion information for each lane at an intersection without causing the problems seen in conventional congestion-detection methods. We therefore propose a method to detect congestion information for each lane in real time at intersections on general roads by using Vehicle-to-Vehicle and Road-to-Vehicle communication technology. The time required to pass through the congestion is calculated using information gathered by iterative communication among cars, starting from a beacon. We then show the effectiveness of this method through simulation.

    CiNii

  • A Functional Unit Design of Motion Estimator on DSP for H.264/AVC Encoding

    TAKAHASHI Toyokazu, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 114 ) 13 - 18  2006.06

     View Summary

    The improved coding efficiency in H.264/AVC comes at the cost of higher computational complexity, most of which is related to motion estimation. Some new features, such as multiple reference frames, variable block size motion compensation, and quarter-pel accuracy motion compensation, have been adopted to improve coding performance; however, they increase the processing time. To speed up motion estimation, many architectures that implement integer-pel motion estimation have been proposed. However, it is difficult to improve the processing performance of such architectures in a memory-bandwidth-restricted architecture such as a DSP datapath, due to irregular memory access. In this paper, we propose an integer-pel motion estimator on a DSP that adopts a pixel subsampling technique to reduce hardware cost. In addition, we change the subsampling pattern from the commonly used chessboard-like pattern to a vertical-striped pattern, which speeds up motion estimation by reducing memory access cycles. The proposed architecture can process 86.5 CIF frames per second at an operating frequency of 200 MHz.

    CiNii
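
As a rough software illustration of pixel subsampling in block matching, the sketch below computes a sum of absolute differences (SAD) over every `step`-th column of a block. Reading "vertical-striped pattern" as keeping whole columns is an assumption made here, as are the function name `sad_vertical_stripes` and the `step` parameter; the report's actual DSP functional unit is not reproduced.

```python
def sad_vertical_stripes(cur_block, ref_block, step=2):
    """SAD between two equally sized blocks, using only every `step`-th column.

    Blocks are lists of rows of pixel values. Keeping whole columns
    (rather than a chessboard of pixels) is an assumed interpretation
    of the vertical-striped subsampling pattern.
    """
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(cur_block, ref_block)
        for c, r in zip(cur_row[::step], ref_row[::step])
    )
```

With `step=1` the function reduces to the ordinary full-block SAD, so the two can be compared directly on the same data.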

  • A Functional Unit Design of Motion Estimator on DSP for H.264/AVC Encoding

    TAKAHASHI Toyokazu, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 116 ) 13 - 18  2006.06

     View Summary

    The improved coding efficiency in H.264/AVC comes at the cost of higher computational complexity, most of which is related to motion estimation. Some new features, such as multiple reference frames, variable block size motion compensation, and quarter-pel accuracy motion compensation, have been adopted to improve coding performance; however, they increase the processing time. To speed up motion estimation, many architectures that implement integer-pel motion estimation have been proposed. However, it is difficult to improve the processing performance of such architectures in a memory-bandwidth-restricted architecture such as a DSP datapath, due to irregular memory access. In this paper, we propose an integer-pel motion estimator on a DSP that adopts a pixel subsampling technique to reduce hardware cost. In addition, we change the subsampling pattern from the commonly used chessboard-like pattern to a vertical-striped pattern, which speeds up motion estimation by reducing memory access cycles. The proposed architecture can process 86.5 CIF frames per second at an operating frequency of 200 MHz.

    CiNii

  • A Functional Unit Design of Motion Estimator on DSP for H.264/AVC Encoding

    TAKAHASHI Toyokazu, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 112 ) 13 - 18  2006.06

     View Summary

    The improved coding efficiency in H.264/AVC comes at the cost of higher computational complexity, most of which is related to motion estimation. Some new features, such as multiple reference frames, variable block size motion compensation, and quarter-pel accuracy motion compensation, have been adopted to improve coding performance; however, they increase the processing time. To speed up motion estimation, many architectures that implement integer-pel motion estimation have been proposed. However, it is difficult to improve the processing performance of such architectures in a memory-bandwidth-restricted architecture such as a DSP datapath, due to irregular memory access. In this paper, we propose an integer-pel motion estimator on a DSP that adopts a pixel subsampling technique to reduce hardware cost. In addition, we change the subsampling pattern from the commonly used chessboard-like pattern to a vertical-striped pattern, which speeds up motion estimation by reducing memory access cycles. The proposed architecture can process 86.5 CIF frames per second at an operating frequency of 200 MHz.

    CiNii

  • An Area/Delay Estimation Method for an Application Processor in HW/SW Cosynthesis

    YAMAZAKI Daisuke, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 113 ) 1 - 6  2006.06

     View Summary

    This paper proposes an area/delay estimation method for a processor with configurable pipeline stages and controller structure. In HW/SW cosynthesis, we optimize the processor architecture for a target application and design the hardware part and the software part at the same time. To obtain an optimal processor architecture in a short time, we require a fast area/delay estimation method that does not rely on logic synthesis in the architecture exploration phase. Accurate estimation is important because large errors may lead to an inadequate solution. In the proposed method, we partition the processor core into several functional parts, parameterize them, and obtain estimation equations by analyzing the results of logic synthesis. We show the effectiveness of the proposed technique by comparing the area/delay values obtained from the estimation equations with the logic synthesis values of the processor core. Their relative error is 1.13% on average, and the delay error is 0.14 ns on average.

    CiNii

  • An Area/Delay Estimation Method for an Application Processor in HW/SW Cosynthesis

    YAMAZAKI Daisuke, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 115 ) 1 - 6  2006.06

     View Summary

    This paper proposes an area/delay estimation method for a processor with configurable pipeline stages and controller structure. In HW/SW cosynthesis, we optimize the processor architecture for a target application and design the hardware part and the software part at the same time. To obtain an optimal processor architecture in a short time, we require a fast area/delay estimation method that does not rely on logic synthesis in the architecture exploration phase. Accurate estimation is important because large errors may lead to an inadequate solution. In the proposed method, we partition the processor core into several functional parts, parameterize them, and obtain estimation equations by analyzing the results of logic synthesis. We show the effectiveness of the proposed technique by comparing the area/delay values obtained from the estimation equations with the logic synthesis values of the processor core. Their relative error is 1.13% on average, and the delay error is 0.14 ns on average.

    CiNii

  • An Area/Delay Estimation Method for an Application Processor in HW/SW Cosynthesis

    YAMAZAKI Daisuke, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   106 ( 111 ) 1 - 6  2006.06

     View Summary

    This paper proposes an area/delay estimation method for a processor with configurable pipeline stages and controller structure. In HW/SW cosynthesis, we optimize the processor architecture for a target application and design the hardware part and the software part at the same time. To obtain an optimal processor architecture in a short time, we require a fast area/delay estimation method that does not rely on logic synthesis in the architecture exploration phase. Accurate estimation is important because large errors may lead to an inadequate solution. In the proposed method, we partition the processor core into several functional parts, parameterize them, and obtain estimation equations by analyzing the results of logic synthesis. We show the effectiveness of the proposed technique by comparing the area/delay values obtained from the estimation equations with the logic synthesis values of the processor core. Their relative error is 1.13% on average, and the delay error is 0.14 ns on average.

    CiNii

  • LDPC decoder using FIFO-based high-efficiency message-passing schedule

    SHIMIZU Kazunori, ISHIKAWA Tatsuyuki, TOGAWA Nozomu

    Proceedings of the Karuizawa Workshop on Circuits and Systems   19   211 - 216  2006.04

    CiNii

  • An application specific data cache optimization algorithm for processor cores

    堀内 一央, 小原 俊逸, 戸川 望

    Proceedings of the Karuizawa Workshop on Circuits and Systems   19   583 - 588  2006.04

    CiNii

  • Positioning Method Using Camera Phone in A Pedestrian Navigation System

    NAKAGUCHI Satoshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    The Papers of Technical Meeting on Intelligent Transport Systems, IEE Japan   2006 ( 1 ) 25 - 30  2006.03

    CiNii

  • Positioning Method Using Camera Phone in A Pedestrian Navigation System

    NAKAGUCHI Satoshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   105 ( 688 ) 25 - 30  2006.03

     View Summary

    Pedestrian navigation systems currently rely on GPS or on the base station information of a cellular phone for positioning. In urban areas, however, significant errors can arise from factors such as multipath propagation and a reduced number of visible satellites. The system proposed in this paper aims at low-cost, highly accurate positioning that utilizes existing infrastructure: road traffic signs in a town are photographed and recognized using a camera phone, a device that is now widespread. We confirmed the effectiveness of the proposed method through an experiment in a real environment.

    CiNii

  • Improved Network Processor for Dynamic Packet Flows and Its Experimental Evaluations

    TABUCHI Hidetaka, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   105 ( 644 ) 25 - 30  2006.03

     View Summary

    A network processor for dynamic packet flows can adapt to dynamic communication data flows. This paper proposes a CAM module for an improved network processor for dynamic packet flows. We divide routing information based on the rate of required IP prefix lengths on a network and design a CAM module composed of CAM and TCAM utilizing this information. The CAM module can reduce the area and improve the throughput of a network processor compared with a conventional one. The effectiveness of the CAM module is shown through implementations and experimental evaluations. In the hardware evaluation using a 0.35 μm CMOS standard library, the area is reduced by 11.9% and the throughput is improved by up to 16.3%.
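
    A software analogy may help picture a lookup module that combines CAM and TCAM. The sketch below is only an assumption about how such a split could behave: prefixes whose length is in a chosen set go to exact-match tables (standing in for CAM), and all other prefixes go to a masked-match list (standing in for TCAM). The class name `HybridLookup`, the IPv4-only handling, and the example routes are hypothetical and do not describe the paper's hardware.

```python
import ipaddress

class HybridLookup:
    """Hypothetical IPv4 longest-prefix match split across two table kinds."""

    def __init__(self, cam_lengths):
        # One exact-match table per prefix length assumed to be frequent.
        self.cam = {n: {} for n in cam_lengths}
        # Masked entries for all other lengths: (value, mask, length, hop).
        self.tcam = []

    def add(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        key = int(net.network_address)
        if net.prefixlen in self.cam:
            self.cam[net.prefixlen][key] = next_hop
        else:
            self.tcam.append((key, int(net.netmask), net.prefixlen, next_hop))

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        best_len, best_hop = -1, None
        # Exact-match side: mask the address to each stored length.
        for length, table in self.cam.items():
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = table.get(addr & mask)
            if hop is not None and length > best_len:
                best_len, best_hop = length, hop
        # Masked-match side: keep the longest matching entry.
        for value, mask, length, hop in self.tcam:
            if addr & mask == value and length > best_len:
                best_len, best_hop = length, hop
        return best_hop

table = HybridLookup(cam_lengths=[24])
table.add("0.0.0.0/0", "default")
table.add("10.0.0.0/8", "A")
table.add("10.1.2.0/24", "B")
print(table.lookup("10.1.2.7"))
```

    In hardware, the point of such a split would be that exact-match cells are cheaper than ternary cells; the sketch only mirrors the lookup semantics, not area or throughput.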

    CiNii

  • Improved Network Processor for Dynamic Packet Flows and Its Experimental Evaluations

    TABUCHI Hidetaka, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   105 ( 646 ) 25 - 30  2006.03

     View Summary

    A network processor for dynamic packet flows can adapt to dynamic communication data flows. This paper proposes a CAM module for an improved network processor for dynamic packet flows. We divide routing information based on the rate of required IP prefix lengths on a network and design a CAM module composed of CAM and TCAM utilizing this information. The CAM module can reduce the area and improve the throughput of a network processor compared with a conventional one. The effectiveness of the CAM module is shown through implementations and experimental evaluations. In the hardware evaluation using a 0.35 μm CMOS standard library, the area is reduced by 11.9% and the throughput is improved by up to 16.3%.

    CiNii

  • A High-level Synthesis Algorithm Based on Floorplans for Distributed/Shared-Register Architectures

    OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2005 ( 121 ) 73 - 78  2005.11

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor of total delay. Distributed-Register architectures let us synthesize circuits with register-to-register data transfer and thereby reduce the influence of interconnect delay. However, Distributed-Register architectures have the problem that circuit area grows as the number of registers increases. In this paper, we propose a high-level synthesis method targeting Distributed/Shared-Register architectures. Our method repeats (1) scheduling, (2) register allocation, (3) register binding, and (4) module placement, and feeds back floorplan information from (4). This method can reduce circuit area while maintaining performance equal to that of Distributed-Register architectures. We show the effectiveness of the proposed method through experimental results.

    CiNii

  • Fast Interconnect Delay Estimation with Considering Inductance Based on Multiple Regression Analysis

    SUZUKI Kosei, ANWAR Marta D, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2005 ( 121 ) 109 - 114  2005.11

     View Summary

    In recent DSM (Deep SubMicron) technology, we need to take several important factors into consideration, such as floorplanning and interconnect resistance. It has been shown that the inductance effect on clock, power, bus, and macroblock interconnects is considerably large. In this paper, we propose a new method to estimate the 50% delay of a single interconnect by using an approximate equation obtained by multiple regression analysis. The proposed method achieves higher accuracy with fewer operations than a conventional method.

    CiNii

  • A High-level Synthesis Algorithm Based on Floorplans for Distributed/Shared-Register Architectures

    OHCHI Akira, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   105 ( 442 ) 31 - 36  2005.11

     View Summary

    As device feature size decreases, interconnection delay becomes the dominant factor of total delay. Distributed-Register architectures let us synthesize circuits with register-to-register data transfer and thereby reduce the influence of interconnect delay. However, Distributed-Register architectures have the problem that circuit area grows as the number of registers increases. In this paper, we propose a high-level synthesis method targeting Distributed/Shared-Register architectures. Our method repeats (1) scheduling, (2) register allocation, (3) register binding, and (4) module placement, and feeds back floorplan information from (4). This method can reduce circuit area while maintaining performance equal to that of Distributed-Register architectures. We show the effectiveness of the proposed method through experimental results.

    CiNii

  • Fast Interconnect Delay Estimation with Considering Inductance Based on Multiple Regression Analysis

    SUZUKI Kosei, ANWAR Marta D, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report   105 ( 442 ) 67 - 72  2005.11

     View Summary

    In recent DSM (Deep SubMicron) technology, we need to take several important factors into consideration, such as floorplanning and interconnect resistance. It has been shown that the inductance effect on clock, power, bus, and macroblock interconnects is considerably large. In this paper, we propose a new method to estimate the 50% delay of a single interconnect by using an approximate equation obtained by multiple regression analysis. The proposed method achieves higher accuracy with fewer operations than a conventional method.

    CiNii

  • Asia and South Pacific Design Automation Conference 2005 (ASP-DAC 2005)

    TOGAWA Nozomu

    The Journal of the IEICE   88 ( 4 ) 303 - 303  2005.04

    CiNii

  • Implementation and Evaluation of Partial-Parallel LDPC Decoder Improving Belief Propagation based on Sum-Product Algorithm

    SHIMIZU Kazunori, ISHIKAWA Tatsuyuki, TOGAWA Nozomu, IKENAGA Takeshi, GOTO Satoshi

    Technical report of IEICE. VLD   104 ( 709 ) 73 - 78  2005.03

     View Summary

    In this paper, we propose a partial-parallel LDPC decoder that improves belief propagation based on the sum-product algorithm. The proposed decoder processes column operations for bit nodes in conjunction with row operations for check nodes. A bit functional unit with a pipeline architecture allows the decoder to process column operations for every bit node connected to each of the check nodes that are processed by row operations in parallel. Thus, the proposed LDPC decoder increases the number of belief propagations in the sum-product algorithm. We implemented the proposed partial-parallel LDPC decoder on an FPGA and simulated its decoding performance. Practical simulation shows that the proposed decoder improves the number of iterations and the bit error performance of the sum-product algorithm.

    CiNii

  • Implementation and Evaluation of Partial-Parallel LDPC Decoder Improving Belief Propagation based on Sum-Product Algorithm

    SHIMIZU Kazunori, ISHIKAWA Tatsuyuki, TOGAWA Nozomu, IKENAGA Takeshi, GOTO Satoshi

    Technical report of IEICE. ICD   104 ( 711 ) 73 - 78  2005.03

     View Summary

    In this paper, we propose a partial-parallel LDPC decoder that improves belief propagation based on the sum-product algorithm. The proposed decoder processes column operations for bit nodes in conjunction with row operations for check nodes. A bit functional unit with a pipeline architecture allows the decoder to process column operations for every bit node connected to each of the check nodes that are processed by row operations in parallel. Thus, the proposed LDPC decoder increases the number of belief propagations in the sum-product algorithm. We implemented the proposed partial-parallel LDPC decoder on an FPGA and simulated its decoding performance. Practical simulation shows that the proposed decoder improves the number of iterations and the bit error performance of the sum-product algorithm.

    CiNii

  • A Hardware/Software Cosynthesis Algorithm for Processors with Heterogeneous Datapaths

    MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE Trans. Fundamentals, A   87 ( 4 ) 830 - 836  2004.04

     View Summary

    This paper proposes a hardware/software cosynthesis algorithm for processors with heterogeneous registers. Given a CDFG corresponding to an application program and a timing constraint, the algorithm generates a processor configuration that minimizes the processor area, together with assembly code for the processor. First, the algorithm configures a datapath that can execute several DFG nodes with data dependencies in one cycle. This datapath can execute the application program in the least number of cycles. A branch-and-bound algorithm is then applied, and all numbers of functional units and memory banks are tried. For each assumed number of functional units and memory banks, an appropriate number of heterogeneous registers and their connections to functional units and registers are explored. The experimental results show the effectiveness and efficiency of the algorithm.

    CiNii

  • A Hardware/Software Cosynthesis Method for CAM Processor with Area Constraints

    ISHIKAWA Yuichiro, MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

      2003 ( 105 ) 175 - 180  2003.10

     View Summary

    We have been building a hardware/software cosynthesis system for a processor core with a content addressable memory (CAM). The system takes as input an application program written in C and outputs an optimal hardware configuration of a CAM processor that executes the program. This paper extends our hardware/software cosynthesis system to incorporate area constraints for a CAM processor. The system computes the number of CAM words that minimizes the execution time while meeting the area constraints. We reduce the CAM processor's area by replacing CAM with RAM according to the number of words the system computed. Experimental results for practical application programs show that the system can output the processor configuration that executes the application program fastest while meeting the area constraints.

    CiNii

  • A Thread Partitioning Algorithm in Low Power High-Level Synthesis

    UCHIDA Jumpei, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   102 ( 684 ) 7 - 12  2003.02

     View Summary

    This paper proposes a thread partitioning algorithm for low-power high-level synthesis. The algorithm is applied to high-level synthesis systems that explicitly describe circuit blocks behaving in parallel (threads). It first focuses on a set R of local registers in a thread and partitions the thread into two sub-threads, one of which has R while the other does not. The partitioned sub-threads need to be synchronized with each other to preserve the data dependencies of the original thread. Since the partitioned sub-threads have waiting time for synchronization, gated clocks can be applied to each sub-thread. Power-reduced circuits are then synthesized with a low area overhead compared to the original circuits. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    CiNii

  • A SIMD operation optimization algorithm in HW/SW partitioning for SIMD processor cores

    TACHIKAKE Koichi, MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   102 ( 684 ) 13 - 18  2003.02

     View Summary

    A SIMD operation is composed of an arithmetic operation, a shift operation, and a saturation operation, which are called sub-operations. By executing all the sub-operations of a SIMD operation in one clock cycle, we can execute an application in a short time, but the corresponding functional unit in the processor core needs a complex configuration. By assigning the sub-operations to their respective functional units, we can simplify the configuration of a SIMD functional unit, but execution takes much longer. This paper proposes a SIMD operation optimization algorithm in HW/SW partitioning for SIMD processor cores. Given an application program and a timing constraint, we assume an initial processor core that executes all the sub-operations of each SIMD operation in one clock cycle. The algorithm then divides complex SIMD operations, one by one, into two arithmetic operations with the bit extending operation and one bit extracting operation with the shift operation. By repeating this process while the timing constraint is satisfied, the total cost of the processor core can be reduced. The experimental results show the effectiveness of the algorithm.

    CiNii

  • An Optimizing Algorithm for Extended CAM Processors with Threshold Search

    TOTSUKA Takao, MIYAOKA Yuichiro, ISHIKAWA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   102 ( 684 ) 19 - 24  2003.02

     View Summary

    An extended content addressable memory (CAM) realizes not only conventional equivalent search but also parallel threshold searches such as less-than search and greater-than search. In order to use the parallel processing function of CAM, parallel processing circuits are needed around the CAM cell array. Furthermore, every application requires its own specific CAM cell array and peripheral circuits. This paper proposes an optimizing algorithm for a processor core with an extended CAM. Based on an application and a timing constraint, the proposed algorithm determines the CAM cell array type and its peripheral circuits by means of a branch-and-bound method. It minimizes the processor core area while meeting the timing constraint. By introducing an improved hardware configuration tree, we can obtain a configuration in a short time. Experimental results for practical application programs show the effectiveness of the proposed algorithm.

    CiNii

  • A Thread Partitioning Algorithm in Low Power High-Level Synthesis

    UCHIDA Jumpei, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. ICD   102 ( 686 ) 7 - 12  2003.02

     View Summary

    This paper proposes a thread partitioning algorithm for low-power high-level synthesis. The algorithm is applied to high-level synthesis systems that explicitly describe circuit blocks behaving in parallel (threads). It first focuses on a set R of local registers in a thread and partitions the thread into two sub-threads, one of which has R while the other does not. The partitioned sub-threads need to be synchronized with each other to preserve the data dependencies of the original thread. Since the partitioned sub-threads have waiting time for synchronization, gated clocks can be applied to each sub-thread. Power-reduced circuits are then synthesized with a low area overhead compared to the original circuits. Experimental results demonstrate the effectiveness and efficiency of the algorithm.

    CiNii

  • A SIMD operation optimization algorithm in HW/SW partitioning for SIMD processor cores

    TACHIKAKE Koichi, MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. ICD   102 ( 686 ) 13 - 18  2003.02

     View Summary

    A SIMD operation is composed of an arithmetic operation, a shift operation, and a saturation operation, which are called sub-operations. By executing all the sub-operations of a SIMD operation in one clock cycle, we can execute an application in a short time, but the corresponding functional unit in the processor core needs a complex configuration. By assigning the sub-operations to their respective functional units, we can simplify the configuration of a SIMD functional unit, but execution takes much longer. This paper proposes a SIMD operation optimization algorithm in HW/SW partitioning for SIMD processor cores. Given an application program and a timing constraint, we assume an initial processor core that executes all the sub-operations of each SIMD operation in one clock cycle. The algorithm then divides complex SIMD operations, one by one, into two arithmetic operations with the bit extending operation and one bit extracting operation with the shift operation. By repeating this process while the timing constraint is satisfied, the total cost of the processor core can be reduced. The experimental results show the effectiveness of the algorithm.

    CiNii

  • An Optimizing Algorithm for Extended CAM Processors with Threshold Search

    TOTSUKA Takao, MIYAOKA Yuichiro, ISHIKAWA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. ICD   102 ( 686 ) 19 - 24  2003.02

     View Summary

    An extended content addressable memory (CAM) realizes not only conventional equivalent search but also parallel threshold searches such as less-than search and greater-than search. In order to use the parallel processing function of CAM, parallel processing circuits are needed around the CAM cell array. Furthermore, every application requires its own specific CAM cell array and peripheral circuits. This paper proposes an optimizing algorithm for a processor core with an extended CAM. Based on an application and a timing constraint, the proposed algorithm determines the CAM cell array type and its peripheral circuits by means of a branch-and-bound method. It minimizes the processor core area while meeting the timing constraint. By introducing an improved hardware configuration tree, we can obtain a configuration in a short time. Experimental results for practical application programs show the effectiveness of the proposed algorithm.

    CiNii

  • A Hardware/Software Partitioning Algorithm for Micro Processors Based on Response Time of Hardware IPs

    TAGAWA Hiroki, KOHARA Shunitsu, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

      2003 ( 7 ) 93 - 98  2003.01

     View Summary

    This paper proposes a hardware/software partitioning algorithm based on the response time of hardware IPs. We have been developing a new design approach that first determines the hardware IPs and then co-synthesizes a processor core. Our approach realizes an application-specific system LSI including a processor core that contains only the necessary functionalities. We can remove unnecessary functionalities through hardware/software partitioning for microprocessors based on the response time of hardware IPs. Our algorithm obtains the response time of hardware IPs at the instruction level, which enables efficient parallel execution of hardware and software. The experimental results show the effectiveness of the proposed algorithm and our new design approach.

    CiNii

  • A Processor Core Synthesis System Based on Response Time of Hardware IPs

    KOHARA Shunitsu, TAGAWA Hiroki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

      2003 ( 7 ) 87 - 92  2003.01

     View Summary

    This paper proposes a processor core synthesis system based on the response time of hardware IPs, and a framework for system LSI design built on the synthesis system. When designing a system LSI using hardware IPs, IPs with necessary and sufficient performance for the system LSI are not always available. Our approach is as follows: after system-level hardware/software partitioning, we use IPs for hardware but do not use processor core IPs for software. Instead, we use a processor core that is automatically synthesized by the proposed synthesis system and has just enough performance. We designed a JPEG encoder within the framework, and the results demonstrate its effectiveness and efficiency.

    CiNii

  • A DSP with Dedicated Functional Units for MPEG-4 Core Profile Encoding

    ISHIMOTO Takeshi, MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2003 ( 7 ) 81 - 86  2003.01

     View Summary

    This paper proposes a DSP with dedicated functional units for MPEG-4 core profile encoding. The proposed DSP achieves 30 fps QCIF encoding for the MPEG-4 core profile, which no other LSI for MPEG-4 has ever achieved. The proposed DSP has four dedicated functional units: a shape coding unit, a padding unit, a quantization unit, and a variable-length coding unit. These units execute the processes that require much computational power in MPEG-4 core profile encoding. It also has a bitstream load unit and a bitstream store unit, which realize variable-length memory access. These units speed up reading and writing of the variable-length codes output from the shape coding unit and the variable-length coding unit, and make it possible to incorporate these coding units into the proposed DSP. Therefore, our DSP achieves both the performance of dedicated hardware and the flexibility of a processor. Our DSP has been implemented using 0.35 μm CMOS technology and operates at 40 MHz.

    CiNii

  • A DSP with Dedicated Functional Units for MPEG-4 Core Profile Encoding

    ISHIMOTO Takeshi, MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report. Computer systems   102 ( 611 ) 25 - 30  2003.01

     View Summary

    This paper proposes a DSP with dedicated functional units for MPEG-4 core profile encoding. The proposed DSP achieves 30 fps QCIF encoding for the MPEG-4 core profile, which no other LSI for MPEG-4 has ever achieved. The proposed DSP has four dedicated functional units: a shape coding unit, a padding unit, a quantization unit, and a variable-length coding unit. These units execute the processes that require much computational power in MPEG-4 core profile encoding. It also has a bitstream load unit and a bitstream store unit, which realize variable-length memory access. These units speed up reading and writing of the variable-length codes output from the shape coding unit and the variable-length coding unit, and make it possible to incorporate these coding units into the proposed DSP. Therefore, our DSP achieves both the performance of dedicated hardware and the flexibility of a processor. Our DSP has been implemented using 0.35 μm CMOS technology and operates at 40 MHz.

    CiNii

  • A DSP with Dedicated Functional Units for MPEG-4 Core Profile Encoding

    ISHIMOTO Takeshi, MIYAOKA Yuichiro, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   102 ( 609 ) 25 - 30  2003.01

     View Summary

    This paper proposes a DSP with dedicated functional units for MPEG-4 core profile encoding. The proposed DSP achieves 30 fps QCIF encoding for the MPEG-4 core profile, which no other LSI for MPEG-4 has ever achieved. The proposed DSP has four dedicated functional units: a shape coding unit, a padding unit, a quantization unit, and a variable-length coding unit. These units execute the processes that require much computational power in MPEG-4 core profile encoding. It also has a bitstream load unit and a bitstream store unit, which realize variable-length memory access. These units speed up reading and writing of the variable-length codes output from the shape coding unit and the variable-length coding unit, and make it possible to incorporate these coding units into the proposed DSP. Therefore, our DSP achieves both the performance of dedicated hardware and the flexibility of a processor. Our DSP has been implemented using 0.35 μm CMOS technology and operates at 40 MHz.

    CiNii

  • A High-Level Energy-Optimizing Algorithm for System VLSIs Based on Area/Time/Power Estimation

    NODA Shinichi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE transactions on fundamentals of electronics, communications and computer sciences   85 ( 12 ) 2655 - 2666  2002.12

     View Summary

    This paper proposes a high-level energy-optimizing algorithm which can synthesize low energy system VLSIs. Given an initial system hardware obtained from an abstract behavioral description, the proposed algorithm applies to it the three energy reduction techniques, 1) reducing supply voltage, 2) selecting lower energy modules, and 3) applying gated clocks. By incorporating our area/delay/power estimation, the proposed algorithm can obtain low energy system VLSIs meeting the constraints of area, delay, and execution time. The proposed algorithm has been incorporated into a high-level synthesis system and experimental results demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • High-Level Area/Delay/Power Estimation for Low Power System VLSIs with Gated Clocks

    NODA Shinichi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE Trans. on Fundamentals, A   85 ( 4 ) 827 - 834  2002.04

     View Summary

    At high-level synthesis for system VLSIs, their power consumption is efficiently reduced by applying gated clocks to them. Since using gated clocks causes the reduction of power consumption and the increase of area/delay, estimating trade-off between power and area/delay by applying gated clocks is very important. In this paper, we discuss the amount of variance of area, delay and power by applying gated clocks. We propose a simple gate-level circuit model and estimation equations. We vary parameters in our proposed circuit model, and evaluate power consumption by back-annotating gate-level simulation results to the original circuit. This paper also proposes a conditional expression for applying gated clocks. The expression shows whether or not we can reduce power consumption by applying gated clocks. We confirm the accuracy of proposed estimation equations by experiments.
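
    A condition for applying gated clocks can be pictured with a very simple power model. The sketch below is not the paper's conditional expression; it assumes a generic model in which the clocked block consumes its clock power only while enabled, and the gating logic adds a fixed overhead. The function name `gating_saves_power` and all parameter names are hypothetical.

```python
def gating_saves_power(p_clock, enable_ratio, p_gate_overhead):
    """Return True if gating is estimated to reduce power.

    Assumed model (illustrative only):
      ungated power = p_clock
      gated power   = p_clock * enable_ratio + p_gate_overhead
    where enable_ratio is the fraction of cycles the block is active.
    """
    if not 0.0 <= enable_ratio <= 1.0:
        raise ValueError("enable_ratio must be between 0 and 1")
    gated = p_clock * enable_ratio + p_gate_overhead
    return gated < p_clock

# A mostly idle block benefits; a nearly always active block does not.
print(gating_saves_power(10.0, 0.3, 2.0))
print(gating_saves_power(10.0, 0.9, 2.0))
```

    Under this assumed model, gating pays off exactly when the power saved during idle cycles exceeds the overhead of the gating logic, which is the kind of trade-off the abstract describes.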

    CiNii

  • Retargetable Simulator Generation for Digital Signal Processor with Packed SIMD Type Instructions

    KASAHARA Kyosuke, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   101 ( 695 ) 17 - 24  2002.03

     View Summary

    Consider synthesizing a processor core with packed SIMD type instructions using a hardware/software cosynthesis system. A simulator specific to the synthesized processor core is necessary. However, it is not feasible to prepare simulator descriptions for every packed SIMD type instruction in advance, since the target instruction set of our hardware/software cosynthesis system contains too many of them. This paper proposes a methodology for generating the descriptions of packed SIMD type instructions: a description is generated by putting together the subfunctions that constitute a packed SIMD type instruction. Experimental results of simulating a processor core with packed SIMD type instructions demonstrate the effectiveness of the proposed methodology.

    CiNii

  • A High-Level Power Optimization Algorithm for System VLSIs Based on Area/Delay/Power Estimation

    NODA Shinichi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

      2002 ( 5 ) 169 - 176  2002.01

     View Summary

    This paper proposes a new high-level synthesis system which can synthesize low-powered system VLSIs under the constraints of area, delay, and execution time. In the proposed system, first an initial system hardware is obtained from an abstract behavioral description. Then three power reduction techniques, 1) reducing power supply voltage, 2) selecting lower power modules, and 3) applying gated clocks, are applied to it. However these power reduction techniques may increase area, delay, and/or execution time of a synthesized hardware, while they can reduce its power dissipation. In this paper, we propose a power optimization algorithm which incorporates area/delay/power estimation, in which we can obtain a synthesized hardware meeting given area/delay/power constraints. Experimental results demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • An Area/Delay Estimation Technique for Control-Based Hardware Synthesis

    YODA Takayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2001 ( 12 ) 25 - 32  2001.02

     View Summary

    This paper proposes an area/delay estimation technique in high-level synthesis for control-flow-based hardware. The input to the estimation is a state-transition graph generated by area/time optimization, and the output is the estimated area and delay for that graph. Our estimation technique gives the area and delay, including the control part of the hardware, using an estimation equation. The equation is determined by the number of operations, the number of states, and the types of operations. Experimental results for several control-based hardware designs demonstrate the effectiveness and efficiency of the technique.

    CiNii

  • A Resource Binding Algorithm Based on Computation Time Estimation Using Heuristic Method and Branch-and-bound Method

    NAKAMURA Hiroshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   2001 ( 2 ) 65 - 72  2001.01

     View Summary

    This paper proposes a resource binding algorithm based on computation time estimation in the high-level synthesis system for digital processing. In the algorithm, a heuristic based binder is first executed and then a branch-and-bound based binder is executed. The computation time to run the algorithm depends on the number of resource assignments which the heuristic based binder determines. Thus we can estimate computation time to run the algorithm by varying the number of such resource assignments. In the algorithm, for a given constraint of computation time, we first obtain the number of resource assignments which the heuristic based binder determines based on the computation time estimation. Then we actually execute the heuristic based binder. After that, we execute the branch-and-bound based binder for the rest of the resource assignments. Experimental results demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • A Resource Binding Algorithm Based on Computation Time Estimation Using Heuristic Method and Branch-and-bound Method

    NAKAMURA Hiroshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report. Computer systems   100 ( 534 ) 17 - 24  2001.01

     View Summary

    This paper proposes a resource binding algorithm based on computation time estimation in the high-level synthesis system for digital signal processing. In the algorithm, a heuristic based binder is first executed and then a branch-and-bound based binder is executed. The computation time to run the algorithm depends on the number of resource assignments which the heuristic based binder determines. Thus we can estimate computation time to run the algorithm by varying the number of such resource assignments. In the algorithm, for a given constraint of computation time, we first obtain the number of resource assignments which the heuristic based binder determines based on the computation time estimation. Then we actually execute the heuristic based binder. After that, we execute the branch-and-bound based binder for the rest of the resource assignments. Experimental results demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • A Resource Binding Algorithm Based on Computation Time Estimation Using Heuristic Method and Branch-and-bound Method

    NAKAMURA Hiroshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   100 ( 532 ) 17 - 24  2001.01

     View Summary

    This paper proposes a resource binding algorithm based on computation time estimation in the high-level synthesis system for digital signal processing. In the algorithm, a heuristic based binder is first executed and then a branch-and-bound based binder is executed. The computation time to run the algorithm depends on the number of resource assignments which the heuristic based binder determines. Thus we can estimate computation time to run the algorithm by varying the number of such resource assignments. In the algorithm, for a given constraint of computation time, we first obtain the number of resource assignments which the heuristic based binder determines based on the computation time estimation. Then we actually execute the heuristic based binder. After that, we execute the branch-and-bound based binder for the rest of the resource assignments. Experimental results demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • Area/Delay Estimation Techniques for Processors with Content Addressable Memory

    YODEN Tatsuhiko, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. FTS   100 ( 475 ) 83 - 88  2000.11

     View Summary

    A hardware/software cosynthesis which synthesizes processors with a CAM (Content Addressable Memory) unit requires area/delay estimation of a generated processor. We first configure processors with a CAM unit based on several parameters, the results are logic-synthesized, and the figures of the logic circuits are analyzed to obtain rough estimation equations of area and delay. Based on the obtained estimation equations of area and delay, we configure a variety of processors with various types of parameters, the results are logic-synthesized, and the final estimation equations are established. We have compared the established estimation equations with the logic-synthesized processor's area and delay. Errors of the area estimations are less than 2.7%. Errors of the delay estimations are less than 3.8%.

    CiNii

  • A Hardware/Software Cosynthesis System for CAM Processor

    WAKUI Tatsuhiko, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. FTS   100 ( 475 ) 89 - 94  2000.11

     View Summary

    This paper proposes a hardware/software cosynthesis system which synthesizes processors with a CAM (Content Addressable Memory) unit. The input of the system is an application program written in C including CAM functions and area/time constraints, and its output is hardware descriptions of a synthesized processor and an application binary code executed on the processor satisfying the constraints. Our system determines the hardware part and the software part of a CAM unit by means of branch and bound method. Experimental results show that we can obtain hardware descriptions of a CAM processor and a binary code satisfying a given timing constraint.

    CiNii

  • Area/Delay Estimation Techniques for Processors with Content Addressable Memory

    YODEN Tatsuhiko, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. ICD   100 ( 474 ) 83 - 88  2000.11

     View Summary

    A hardware/software cosynthesis which synthesizes processors with a CAM (Content Addressable Memory) unit requires area/delay estimation of a generated processor. We first configure processors with a CAM unit based on several parameters, the results are logic-synthesized, and the figures of the logic circuits are analyzed to obtain rough estimation equations of area and delay. Based on the obtained estimation equations of area and delay, we configure a variety of processors with various types of parameters, the results are logic-synthesized, and the final estimation equations are established. We have compared the established estimation equations with the logic-synthesized processor's area and delay. Errors of the area estimations are less than 2.7%. Errors of the delay estimations are less than 3.8%.

    CiNii

  • Area/Delay Estimation Techniques for Processors with Content Addressable Memory

    YODEN Tatsuhiko, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   100 ( 473 ) 83 - 88  2000.11

     View Summary

    A hardware/software cosynthesis which synthesizes processors with a CAM (Content Addressable Memory) unit requires area/delay estimation of a generated processor. We first configure processors with a CAM unit based on several parameters, the results are logic-synthesized, and the figures of the logic circuits are analyzed to obtain rough estimation equations of area and delay. Based on the obtained estimation equations of area and delay, we configure a variety of processors with various types of parameters, the results are logic-synthesized, and the final estimation equations are established. We have compared the established estimation equations with the logic-synthesized processor's area and delay. Errors of the area estimations are less than 2.7%. Errors of the delay estimations are less than 3.8%.

    CiNii

  • Area/Delay Estimation Techniques for Digital Signal Processor Cores

    KATAOKA Yoshiharu, YOSHIZAWA Dai, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. FTS   99 ( 479 ) 1 - 8  1999.11

     View Summary

    A hardware/software cosynthesis system for digital signal processors with two types of register files requires certain evaluation values in the phase of hardware/software partitioning. These evaluation values are execution time of a given application program and a hardware cost of a generated processor core. In order to obtain these evaluation values, we, in advance, configure a variety of hardware units and the results are logic-synthesized and analyzed to establish estimation equations. We propose techniques for deriving the convincing equations which estimate both the delay and the area of the target processor core. For the area estimation, we show that the total area can be derived by the summation of area of a processor kernel and area of additional hardware units. The processor kernel area amounts to two independent rules: (1) area corresponding to an overhead when extra hardware units are added; (2) the size of general-purpose registers. We have compared the derived estimation values with the in-advance logic-synthesized data. Errors of the area estimation are less than 2%. For the delay estimation, we can reduce estimation errors by focusing on the functional units on a critical path. Errors of the delay estimation are all less than 2ns.

    CiNii

  • A Hardware/Software Partitioning Algorithm for Digital Signal Processors with Two Types of Register Files

    SAKURAI Takashi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. FTS   99 ( 479 ) 9 - 16  1999.11

     View Summary

    This paper proposes a hardware/software partitioning algorithm for digital signal processors with two types of register files. Given a compiled assembly code, analyzed application data and a timing constraint of execution time, the proposed algorithm generates a processor architecture with new assembly code for the processor. The target processor has a VLIW-type core consisting of a processor kernel, two register files and multiple hardware units such as hardware loops, addressing units, functional units and data memory buses. The two types of register files have different bit widths, and we can reduce total hardware costs for the register files by assigning variables to the appropriate register file. Also, our hardware unit library includes more than one functional unit for a single arithmetic or logical operation. We can reduce total hardware costs by selecting appropriate functional units depending on the given application program. The experimental results show the effectiveness of the proposed algorithm.

    CiNii

  • An Area/Time Optimizing Algorithm for Control-Based Hardware Synthesis

    IENAGA Masayuki, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   99 ( 317 ) 15 - 22  1999.09

     View Summary

    This paper proposes an area/time optimizing algorithm in high-level synthesis for control-based hardware. Given a call graph whose node corresponds to a control flow of an application program, the algorithm generates a set of state-transition graphs which represents the input call graph under area and timing constraint. In the algorithm, first state-transition graphs which satisfy only timing constraint are generated and second they are transformed so that they can satisfy area constraint. Since the algorithm is directly applied to control-flow graphs, it can deal with control flows such as bit-wise processes and conditional branches. Further, the algorithm synthesizes more than one hardware architecture candidate from a single call graph for an application program. Designers of an application program can select several good hardware architectures among candidates according to multiple design criteria. Experimental results for several control-based hardware designs demonstrate effectiveness and efficiency of the algorithm.

    CiNii

  • A Dynamic Reconfigurable System Based on Multiple FPGAs and Its Applications

    HASEGAWA Yohei, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   98 ( 625 ) 17 - 24  1999.03

     View Summary

    An FPGA is a reconfigurable device on which users realize their own logic circuits rapidly. A flexible hardware system can be designed based on FPGAs. Recently, there has been proposed a dynamic reconfigurable system where a part of the system can be reconfigured while executing an application program. This paper proposes a dynamic reconfigurable system mFPS2 for fast digital signal processing. mFPS2 consists of four FPGAs, two of which can be dynamically reconfigured. Therefore a large application program exceeding the size of physical hardware resources will be efficiently implemented on mFPS2. A JPEG encoder has been implemented on mFPS2 and experimental results show that mFPS2 executes the JPEG encoder two times faster than a software process on a workstation.

    CiNii

  • A Dynamic Reconfigurable System Based on Multiple FPGAs and Its Applications

    HASEGAWA Yohei, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    Technical report of IEICE. ICD   98 ( 626 ) 17 - 24  1999.03

     View Summary

    An FPGA is a reconfigurable device on which users realize their own logic circuits rapidly. A flexible hardware system can be designed based on FPGAs. Recently, there has been proposed a dynamic reconfigurable system where a part of the system can be reconfigured while executing an application program. This paper proposes a dynamic reconfigurable system mFPS2 for fast digital signal processing. mFPS2 consists of four FPGAs, two of which can be dynamically reconfigured. Therefore a large application program exceeding the size of physical hardware resources will be efficiently implemented on mFPS2. A JPEG encoder has been implemented on mFPS2 and experimental results show that mFPS2 executes the JPEG encoder two times faster than a software process on a workstation.

    CiNii

  • A Hardware/Software Cosynthesis System for Digital Signal Processors with Two Types of Register Files and Its Compiler

    NAKAMURA Tsuyoshi, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Notes   99 ( 12 ) 113 - 120  1999.02

     View Summary

    In digital signal processing, intermediate results require greater bit width than input data in order to keep high precision for arithmetic operation. If a digital signal processor has two types of register files, digital signal processing applications can keep high precision for arithmetic operation with small amount of processor area. This paper proposes a hardware/software cosynthesis system which synthesizes digital signal processors with two types of register files and its compiler. The input of the system is an application program written in C and application data, and its output is hardware descriptions of a synthesized processor core, an application binary code executed on the processor core and software environment. The proposed compiler generates an assembly code for a processor core with all the available hardware units which can be added to the processor core. It extracts from an input application program those instructions which can be executed concurrently and attempts to minimize its execution time. Moreover it generates an assembly code which keeps required precision for arithmetic operation, since the proposed compiler assigns two types of data to two types of register files. The experimental results show the effectiveness of the system and the compiler.

    CiNii

  • A Hardware/Software Cosynthesis System for Processors with Content Addressable Memory

    TERAJIMA Makoto, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE technical report. Computer systems   98 ( 291 ) 31 - 38  1998.09

     View Summary

    This paper proposes a hardware/software cosynthesis system which synthesizes processors with a CAM (Content Addressable Memory) unit. The input of the system is an application program written in C including CAM functions, and its output is hardware descriptions of a synthesized processor and an application binary code executed on the processor. CAM realizes word-parallel equivalence search and word-parallel writing within one clock cycle. Since the system synthesizes an application specific CAM unit according to the requirements of an input application program, it can execute the application program fast with small amount of processor area. Experimental results demonstrate its efficiency and effectiveness.

    CiNii

  • A Hardware/Software Cosynthesis System for Processors with Content Addressable Memory

    TERAJIMA Makoto, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Notes   98 ( 87 ) 83 - 90  1998.09

     View Summary

    This paper proposes a hardware/software cosynthesis system which synthesizes processors with a CAM (Content Addressable Memory) unit. The input of the system is an application program written in C including CAM functions, and its output is hardware descriptions of a synthesized processor and an application binary code executed on the processor. CAM realizes word-parallel equivalence search and word-parallel writing within one clock cycle. Since the system synthesizes an application specific CAM unit according to the requirements of an input application program, it can execute the application program fast with small amount of processor area. Experimental results demonstrate its effectiveness.

    CiNii

  • An FPGA Layout Reconfiguration Algorithm Based on Global Routes for Engineering Changes in System Design Specifications

    TOGAWA Nozomu, HAGI Kayoko, YANAGISAWA Masao, OHTSUKI Tatsuo

    IEICE transactions on fundamentals of electronics, communications and computer sciences   81 ( 5 ) 873 - 884  1998.05

     View Summary

    Rapid system prototyping is one of the main applications for field-programmable gate arrays (FPGAs). At the stage of rapid system prototyping, design specifications can often be changed since they cannot be determined completely. In this paper, layout design change is focused on and a layout reconfiguration algorithm is proposed for FPGAs. The target FPGA architecture is developed for transport processing. In order to implement more various circuits flexibly, it has three-input lookup tables (LUTs) as minimum logic cells. Since its logic granularity is finer than that of conventional FPGAs, it requires more routing resources to connect them and minimization of routing congestion is indispensable. In layout reconfiguration, the main problem is to add LUTs to initial layouts. Our algorithm consists of two steps: For given placement and global routing of LUTs, in Step 1 an added LUT is placed with allowing that the position of the added LUT may overlap that of a preplaced LUT; Then in Step 2 preplaced LUTs are moved to their adjacent positions so that the overlap of the LUT positions can be resolved. Global routes are updated corresponding to reconfiguration of placement. The algorithm keeps routing congestion small by evaluating global routes directly both in Steps 1 and 2. Especially in Step 2, if the minimum number of preplaced LUTs are moved to their adjacent positions, our algorithm minimizes routing congestion. Experimental results demonstrate that, if the number of added LUTs is at most 20% of the number of initial LUTs, our algorithm generates the reconfigured layouts whose routing congestion is as small as that obtained by executing a conventional placement and global routing algorithm. Run time of our algorithm is within approximately one second.

    CiNii

  • A Technology Mapping Method for Logic Blocks with Tree Structures

    IEICE Technical Report   VLD97;104  1997.12

    CiNii

  • A Performance-Oriented Simultaneous Placement and Global Routing Algorithm for Transport-Processing FPGAs

    TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    IEICE transactions on fundamentals of electronics, communications and computer sciences   80 ( 10 ) 1795 - 1806  1997.10

     View Summary

    In layout design of transport-processing FPGAs, it is required that not only routing congestion is kept small but also circuits implemented on them operate with higher operation frequency. This paper extends the proposed simultaneous placement and global routing algorithm for transport-processing FPGAs whose objective is to minimize routing congestion and proposes a new algorithm in which the length of each critical signal path (path length) is limited within a specified upper bound imposed on it (path length constraint). The algorithm is based on hierarchical bipartitioning of layout regions and LUT (LookUp Table) sets to be placed. In each bipartitioning, the algorithm first searches the paths with tighter path length constraints by estimating their path lengths. Second the algorithm proceeds the bipartitioning so that the path lengths of critical paths can be reduced. The algorithm is applied to transport-processing circuits and compared with conventional approaches. The results demonstrate that the algorithm satisfies the path length constraints for 11 out of 13 circuits, though it increases routing congestion by an average of 20%. After detailed routing, it achieves 100% routing for all the circuits and decreases a circuit delay by an average of 23%.

    CiNii

  • A Circuit Partitioning Algorithm with Path Delay Constraints for Multi-FPGA Systems

    TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    IEICE transactions on fundamentals of electronics, communications and computer sciences   80 ( 3 ) 494 - 505  1997.03

     View Summary

    In this paper, we extend the circuit partitioning algorithm which we have proposed for multi-FPGA systems and present a new algorithm in which the delay of each critical signal path is within a specified upper bound imposed on it. The core of the presented algorithm is recursive bipartitioning of a circuit. The bipartitioning procedure consists of three stages: 0) detection of critical paths; 1) bipartitioning of a set of primary inputs and outputs; and 2) bipartitioning of a set of logic-blocks. In 0), the algorithm computes the lower bounds of delays for paths with path delay constraints and detects the critical paths based on the difference between the lower and upper bound dynamically in every bipartitioning procedure. The delays of the critical paths are reduced with higher priority. In 1), the algorithm attempts to assign the primary inputs and outputs on each critical path to one chip so that the critical path does not cross between chips. Finally in 2), the algorithm not only decreases the number of crossings between chips but also assigns the logic-blocks on each critical path to one chip by exploiting a network flow technique. The algorithm has been implemented and applied to MCNC PARTITIONING 93 benchmark circuits. The experimental results demonstrate that it resolves almost all path delay constraints with maintaining the maximum number of required I/O blocks per chip small compared with conventional algorithms.

    CiNii

  • A fast scheduling algorithm in high-level synthesis system for digital signal processing

    TOGAWA N.

    Proc. IPSJ DA Symposium '97     167 - 172  1997

    CiNii

  • Simultaneous Placement and Global Routing for Transport-Processing FPGA Layout

    TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    IEICE transactions on fundamentals of electronics, communications and computer sciences   79 ( 12 ) 2140 - 2150  1996.12

     View Summary

    Transport-processing FPGAs have been proposed for flexible telecommunication systems. Since those FPGAs have finer granularity of logic functions to implement circuits on them, the amount of routing resources tends to increase. In order to keep routing congestion small, it is necessary to execute placement and routing simultaneously. This paper proposes a simultaneous placement and global routing algorithm for transport-processing FPGAs whose primary objective is minimizing routing congestion. The algorithm is based on hierarchical bipartition of layout regions and sets of LUTs (LookUp Tables) to be placed. It achieves bipartitioning which leads to small routing congestion by applying a network flow technique to it and computing a maximum flow and a minimum cut. If there exist connections between bipartitioned LUT sets, pairs of pseudo-terminals are introduced to preserve the connections. A sequence of pseudo-terminals represents a global route of each net. As a result, both placement of LUTs and global routing are determined when hierarchical bipartitioning procedures are finished. The proposed algorithm has been implemented and applied to practical transport-processing circuits. The experimental results demonstrate that it decreases routing congestion by an average of 37% compared with a conventional algorithm and achieves 100% routing for the circuits for which the conventional algorithm causes unrouted nets.

    CiNii

  • A Spacing Algorithm for Double-Sided Printed Wiring Boards

    KANAI Hirokazu, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    IPSJ SIG Notes   96 ( 51 ) 9 - 14  1996.05

     View Summary

    In design of printed wiring boards, after parts are placed on them, wires are routed among the parts. Thus all wires cannot always be routed among the placed parts. Spacing is the process to move the preplaced parts so that all wires can be routed. In this paper, we propose a spacing algorithm for double-sided printed wiring boards, on both sides of which parts are placed and wires are routed. The algorithm moves the preplaced parts so as to take the space in proportion to the number of wires among the parts. If the initial layout has violations of the design rules such as overlaps of the preplaced parts, the algorithm resolves the violations. Experimental results show that the algorithm is effective.

    CiNii

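  • A Circuit Partitioning Method for Multi-FPGA Systems Considering Path Delay Constraints

    Proceedings of the 9th IEICE Karuizawa Workshop on Circuits and Systems    1996.04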

  • A Time-Constrained Scheduling Algorithm for CDFG with Conditional Branches

    ISHIWATA Hiroaki, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   95 ( 561 ) 31 - 36  1996.03

     View Summary

    In case behavioral descriptions contain conditional branches in high-level synthesis of LSI, scheduling algorithms which deal with conditional branches are required. In scheduling control data flow graphs (CDFG) with conditional branches, operations which have different execution conditions as well as those which have different execution time are able to share hardware resources. In this paper, we propose a scheduling algorithm for CDFG with conditional branches. In the algorithm, we first search the operations which have different execution conditions and are able to share hardware resources and assign them to the same control step. Then we schedule other operations. Experimental results show that the algorithm obtains near optimal solutions in less than one second.

    CiNii

  • A Pipelined DSP Scheduling Algorithm for DFG with Inter Iteration Data Dependencies

    NISHIDA Koichi, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   95 ( 561 ) 37 - 44  1996.03

     View Summary

    In high-level synthesis for digital signal processing (DSP), scheduling of data-flow graphs plays a primary role. The primary operations of DSP are delaying signals with delay units and operating them with arithmetic units. Thus it is required to satisfy inter iteration data dependency constraints in scheduling. In high-speed DSP, pipelining data-paths is also important for improving throughputs. In this paper, we propose a pipelined DSP scheduling algorithm for DFG with inter iteration data dependencies. The algorithm deals with multi-cycle functional units and pipelined functional units and synthesizes pipelined data paths satisfying inter iteration data dependency constraints. In the algorithm, we first enumerate candidate control steps to which each operation in DFG is assigned. By reducing the assignment candidates gradually, we finally obtain a scheduling result. Experimental results for practical DSP data-flow graphs show that the algorithm obtains near optimal solutions in less than one second.

    CiNii

  • High Level Synthesis for Entropy CODEC

    SUZUKI Katsuharu, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    IPSJ SIG Notes   96 ( 16 ) 25 - 30  1996.02

     View Summary

    Entropy coding/decoding is implemented on FPGAs as a fast and flexible system in which high level synthesis technologies are key issues. In this paper, we propose scheduling and allocation algorithms for behavioral description of Entropy CODEC. The scheduling algorithm employs CFG as input and finds a solution with minimal cost and execution time by degenerating nodes in CFG. The allocation algorithm assigns each operation to functional units with various bit length. As a result RTL description is efficiently obtained from behavioral description of Entropy CODEC with many conditional branches and variable bit length. Experimental results demonstrate its efficiency and effectiveness.

    CiNii

  • School Education System in Korea : Especially on Teacher Training

    Hwang Ui-il, Im Bu-Yeul, Hata Katsuaki

    Memoirs of the Faculty of Education, Shimane University. Educational science   29   77 - 91  1995.12

    CiNii

  • An Algorithm for Generating Data Flow Graphs from Behavioral Descriptions

    KAWATA Yoko, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   95 ( 307 ) 55 - 62  1995.10

     View Summary

    In this paper, we propose an algorithm for generating data flow graphs (DFGs) from a behavioral description which consists of algebraic expressions without control structures. DFG generation is the first task of the high level synthesis for designing DSP. Since the results of the synthesis much depend on a DFG structure, it is important to consider design requirements such as time and area during generating DFGs. In the proposed technique, we transform a DFG structure and make operations in parallel without increasing a resource cost so that the DFG can satisfy a given time constraint. Among generated DFGs satisfying a given time constraint, multiple DFGs are chosen based on the estimated resource costs. We can obtain better results by preparing multiple DFGs and synthesizing each of them. Experimental results show that we obtain multiple DFGs with low resource costs from a practical behavioral description.

    CiNii

  • A Data Path Scheduling Algorithm with Resource Allocation for DSP Synthesis

    NISHIDA Koichi, TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    Technical report of IEICE. VLD   95 ( 307 ) 63 - 70  1995.10

     View Summary

    In high-level synthesis for DSP data paths, scheduling of data flow graphs plays a primary role. In scheduling, it is required that a hardware resource amount is estimated as precisely as possible, and that operations are assigned to control steps so that a resource amount is minimized. In addition, pipelining that overlaps operations is necessary in high-speed DSP application such as image processing. In this paper, we propose a time constraint scheduling algorithm that deals with pipelining, and minimizes both functional unit and register costs. In our algorithm, the control step regarded as the worst in terms of a resource cost is gradually eliminated for each operation in each iteration. Finally, each operation is assigned to one control step. Experimental results for practical DSP data flow graphs show that our algorithm obtains near optimal solutions in less than one second.

    CiNii

  • An Algorithm for Generating Data Flow Graphs from Behavioral Descriptions

    Kawata Yoko, Togawa Nozomu, Sato Masao, Ohtsuki Tatsuo

    IPSJ SIG Notes   1995 ( 99 ) 137 - 144  1995.10

     View Summary

    In this paper, we propose an algorithm for generating data flow graphs (DFGs) from a behavioral description which consists of algebraic expressions without control structures. DFG generation is the first task of the high level synthesis for designing DSP. Since the results of the synthesis much depend on a DFG structure, it is important to consider design requirements such as time and area during generating DFGs. In the proposed technique, we transform a DFG structure and make operations in parallel without increasing a resource cost so that the DFG can satisfy a given time constraint. Among generated DFGs satisfying a given time constraint, multiple DFGs are chosen based on the estimated resource costs. We can obtain better results by preparing multiple DFGs and synthesizing each of them. Experimental results show that we obtain multiple DFGs with low resource costs from a practical behavioral description.

    CiNii

  • A Data Path Scheduling Algorithm with Resource Allocation for DSP Synthesis

    Nishida Koichi, Togawa Nozomu, Sato Masao, Ohtsuki Tatsuo

    IPSJ SIG Notes   1995 ( 99 ) 145 - 152  1995.10

     View Summary

    In high-level synthesis for DSP data paths, scheduling of data flow graphs plays a primary role. In scheduling, it is required that a hardware resource amount is estimated as precisely as possible, and that operations are assigned to control steps so that a resource amount is minimized. In addition, pipelining that overlaps operations is necessary in high-speed DSP application such as image processing. In this paper, we propose a time constraint scheduling algorithm that deals with pipelining, and minimizes both functional unit and register costs. In our algorithm, the control step regarded as the worst in terms of a resource cost is gradually eliminated for each operation in each iteration. Finally, each operation is assigned to one control step. Experimental results for practical DSP data flow graphs show that our algorithm obtains near optimal solutions in less than one second.

    CiNii

  • A Hierarchical Circuit Partitioning Algorithm for Multi-FPGA Systems

    TOGAWA Nozomu, SATO Masao, OHTSUKI Tatsuo

    IEICE technical report. Circuits and systems   95 ( 106 ) 69 - 76  1995.06

     View Summary

    In this paper, we propose an algorithm that partitions an initial circuit across multiple FPGA chips. The algorithm is based on recursive bi-partitioning of a circuit. In each bi-partitioning, it searches for a partitioning position such that each partitioned subcircuit fits in an FPGA chip while the number of signal nets between chips is kept as small as possible. Such bi-partitioning is achieved by repeatedly computing a minimum cut with a network flow technique and replicating logic-blocks appropriately. Since the set of logic-blocks assigned to each chip is computed separately, the logic-blocks to be replicated are determined naturally. This means that the algorithm makes good use of unused logic-blocks to reduce the number of signal nets between chips, i.e., the number of required I/O blocks. Experimental results for several benchmark circuits show its efficiency and effectiveness.

    CiNii

  • A circuit partitioning algorithm with replication capability for multi-FPGA systems

    TOGAWA N.

    IEICE Trans. Fundamentals   78 ( 12 ) 1765 - 1776  1995

    CiNii

  • A Top-Down Hierarchical Routing Algorithm for FPGAs with Long-Lines

    TOGAWA NOZOMU, SATO MASAO, OHTSUKI TATSUO

    IPSJ Journal   35 ( 12 ) 2785 - 2796  1994.12

     View Summary

    FPGAs (Field-Programmable Gate Arrays) are programmable devices with relatively high density and are very important for rapid system prototyping. Most FPGAs consist of logic-blocks, switches, and wire segments. Each logic-block realizes a small logic function, and each net connecting logic-blocks is realized with some switches and wire segments. There are usually two kinds of wire segments: local-lines and long-lines. Signal delay is mainly caused by logic-blocks and switches. Since long-lines bypass switches, the signal delay between distant logic-blocks can be reduced by using long-lines. A top-down hierarchical FPGA routing algorithm is presented in this paper. It is based on top-down bi-partitioning and linear assignment. During the partitioning, the algorithm assigns each net crossing the bi-partitioning line (cut-line) to a wire segment by two-phase linear assignment. The nets are assigned so as to minimize the maximum delay between primary inputs and primary outputs by making good use of long-lines. The algorithm is implemented and applied to several benchmark circuits. The results show that the presented algorithm runs effectively and efficiently.

    CiNii

  • Maple: A Simultaneous Technology Mapping, Placement, and Global Routing Algorithm for LUT-based FPGAs

    Sato Masao, Togawa Nozomu, Ohtsuki Tatsuo

    IEICE technical report. Computer systems   94 ( 257 ) 41 - 48  1994.09

     View Summary

    Technology mapping algorithms for LUT (Look-Up Table) based FPGAs have aimed at transforming a Boolean network into logic-blocks. However, since those algorithms take no layout information into account, they cannot produce excellent layout results. In this paper, a simultaneous technology mapping, placement, and global routing algorithm for FPGAs, Maple, is presented. Maple is an extended version of a simultaneous placement and global routing algorithm for FPGAs, which is based on recursive partitioning of layout regions and block sets. Maple inherits its basic process and executes technology mapping simultaneously in each recursive process. Therefore, the mapping can be done with the placement and global routing information. Experimental results for some benchmark circuits demonstrate its efficiency and effectiveness.

    CiNii

  • Theories and an Optimal Algorithm for Crossing Number Minimization Problem in Multi-level Graphs

    Matsumoto Hideyuki, Sasaki Hitoshi, Umeda Tatsuya, Togawa Nozomu, Sato Masao, Takeya Makoto, Ohtsuki Tatsuo

    IPSJ SIG Notes   1994 ( 82 ) 49 - 56  1994.09

     View Summary

    A graph is called a multi-level graph when each node of the graph is given a hierarchical level. The problem of minimizing the number of edge crossings in a multi-level graph is discussed, where each edge is drawn as a line segment. The problem is known to include NP-complete problems. In the first part of this paper, previous works are summarized and several theories are introduced. Then, the problem is formulated as a 0-1 integer linear program and solved optimally using a binary decision diagram. Experimental results are also shown.

    CiNii

  • Layout CAD Methods for FPGAs

    SATO Masao, TOGAWA Nozomu

    IPSJ Magazine   35 ( 6 ) 535 - 540  1994.06

    CiNii

  • A Method for Manhattan Wiring using BDD

    Umeda Tatsuya, Togawa Nozomu, Sato Masao, Ohtsuki Tatsuo

    IPSJ SIG Notes   1993 ( 88 ) 35 - 42  1993.10

     View Summary

    Finding the maximum number of nets that can be connected by Manhattan wires without crossing on a plane is NP-hard. This problem can be regarded as the problem of finding an MIS (Maximum Independent Set) in graph theory. This paper presents a method for finding an MIS using a BDD (binary decision diagram). The variable ordering of a BDD is quite important, so we consider suitable variable orderings for Manhattan wiring. We also discuss the conditions under which our method is effective.

    CiNii

  • A Top-Down Hierarchical Global and Detailed Routing Algorithm for Field-Programmable Gate Arrays

    TOGAWA Nozomu, AWASHIMA Toru, KANEKO Kazuya, SATO Masao, OHTSUKI Tatsuo

    The Transactions of the Institute of Electronics, Information and Communication Engineers. A   76 ( 9 ) 1312 - 1321  1993.09

     View Summary

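    FPGAs are attracting attention as devices that fill the gap between gate arrays and PLAs. Because an FPGA is a user-programmable device with which a desired circuit can be designed in a short time, it is particularly important in areas such as system prototyping. This means that FPGA design methods are required above all to be fast. In addition, since FPGA programming is realized by memory elements or switches, signal delay tends to be large because of them. FPGA design methods must therefore also pay attention to delay control. This paper takes up routing among the FPGA design steps and proposes a hierarchical global and detailed routing method that is fast and achieves delay control. The hierarchical routing method is built on a fast procedure that recursively bipartitions the region and determines by linear assignment where each net crosses the partitioning line. Furthermore, by using two-phase linear assignment to determine even the track position at which each net crosses the partitioning line, global and detailed routing are processed together, which makes the method faster still. Delay control is achieved by assigning priorities to nets and routing high-priority nets preferentially with short paths. The method is applied to several benchmark circuits to show its effectiveness.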

    CiNii

  • A Simultaneous Placement and Global Routing Algorithm

    Togawa Nozomu, Sato Masao, Ohtsuki Tatsuo

    IEICE technical report. Electron devices   93 ( 216 ) 53 - 60  1993.09

     View Summary

    The conventional layout design is composed of two phases, placement and routing. Those phases, however, should be combined to make better layouts. A top-down hierarchical layout algorithm for FPGAs is presented, which executes both placement and global routing simultaneously. It is based on a simple and fast top-down bi-partitioning. The algorithm generates pseudo-blocks on bi-partitioning lines. Global routes are represented by sequences of them. Since the pseudo-blocks and logic-blocks are treated similarly, the placement and global routing are executed simultaneously. Experimental results for several benchmark circuits show that it runs quite effectively and efficiently.

    CiNii

  • A Top-Down Hierarchical FPGA Routing Algorithm Applicable to Long-lines

    SONEHARA Masahito, TOGAWA Nozomu, YANAGISAWA Masao, OHTSUKI Tatsuo

    IPSJ SIG Technical Reports, System LSI Design Methodology (SLDM)   1993 ( 22 ) 17 - 24  1993.03

     View Summary

    This paper proposes a routing algorithm for crosstalk reduction, where crosstalk is modeled by RC equivalent circuits. The estimation equation for crosstalk is obtained from the peak voltage of crosstalk noise. It also includes process rule parameters and thus takes the scaling law into account. Given a routing result that is not optimized for crosstalk noise, the algorithm repeatedly applies SA-based track permutation and obtains a routing result with reduced crosstalk. The experimental results show the effectiveness of the proposed crosstalk-reduced routing algorithm.

    CiNii

  • A Simultaneous Placement and Global Routing Method for FPGAs Considering Timing Constraints

    戸川望

    Proceedings of the DA Symposium '93     137 - 142  1993

    CiNii


Industrial Property Rights

  • Calculation Method, Calculation System, and Program

    戸川 望, 多和田 雅師, 跡部 悠太

    Patent

    J-GLOBAL

  • Calculation Method, Calculation System, and Program

    戸川 望, 白井 達彦

    Patent

    J-GLOBAL

  • Hardware Trojan Detection Method, Hardware Trojan Detection Device, and Program for Hardware Trojan Detection

    Japanese Patent No. 7410476

    永田 真一, 高橋 功次, 戸川 望, 大屋 優

    Patent

    J-GLOBAL

  • Combinatorial Optimization Device, Combinatorial Optimization Method, and Program

    巴 徳瑪, 新井 淳也, 八木 哲志, 寺本 純司, 川上 蒼馬, 武笠 陽介, 鮑 思雅, 戸川 望

    Patent

    J-GLOBAL

  • Processing Device, Processing Method, and Processing Program

    Japanese Patent No. 7285516

    巴 徳瑪, 内山 寛之, 八木 哲志, 新井 淳也, 吉村 夏一, 多和田 雅師, 田中 宗, 戸川 望

    Patent

    J-GLOBAL

  • Learning Device, Learning Method, and Learning Program

    Japanese Patent No. 7223372

    披田野 清良, 清本 晋作, 長谷川 健人, 戸川 望

    Patent

    J-GLOBAL

  • Detection Device, Learning Device, Detection Method, and Detection Program

    長谷川 健人, 披田野 清良, 清本 晋作, 戸川 望

    Patent

    J-GLOBAL

  • Detection Method and Detection Device

    Japanese Patent No. 7136439

    戸川 望, 長谷川 健人

    Patent

    J-GLOBAL

  • Calculation Method, Calculation Device, and Program

    戸川 望, 多和田 雅師, 於久 太祐, 田中 宗

    Patent

    J-GLOBAL

  • Learning Device, Learning Method, and Learning Program

    披田野 清良, 清本 晋作, 野澤 康平, 戸川 望

    Patent

    J-GLOBAL

  • Processing Device, Processing Method, and Processing Program

    新井 淳也, 巴 徳瑪, 八木 哲志, 吉村 夏一, 多和田 雅師, 戸川 望

    Patent

    J-GLOBAL

  • Hardware Trojan Detection Method, Hardware Trojan Detection Device, and Program for Hardware Trojan Detection

    永田 真一, 高橋 功次, 戸川 望, 大屋 優

    Patent

    J-GLOBAL

  • Measurement Device, Navigation System, Measurement Method, and Program

    Japanese Patent No. 6867254

    戸川 望, 矢野 椋也, 石川 和明

    Patent

    J-GLOBAL

  • Learning Device, Learning Method, and Learning Program

    披田野 清良, 清本 晋作, 長谷川 健人, 戸川 望

    Patent

    J-GLOBAL

  • Processing Device, Processing Method, and Processing Program

    巴 徳瑪, 内山 寛之, 八木 哲志, 新井 淳也, 吉村 夏一, 多和田 雅師, 田中 宗, 戸川 望

    Patent

    J-GLOBAL

  • Detection Method and Detection Device

    戸川 望, 長谷川 健人

    Patent

    J-GLOBAL

  • Hardware Trojan Detection Method, Hardware Trojan Detection Program, and Hardware Trojan Detection Device

    Japanese Patent No. 6566576

    戸川 望, 大屋 優

    Patent

    J-GLOBAL

  • Measurement Device, Navigation System, Measurement Method, and Program

    戸川 望, 矢野 椋也, 石川 和明

    Patent

    J-GLOBAL

  • Dictionary Search Method, Device, and Program

    右近 祐太, 宮崎 昭彦, 島▲崎▼ 健太, 多和田 雅師, 津田 俊隆, 中里 秀則, 戸川 望

    Patent

    J-GLOBAL

  • Dictionary Search Method and Device

    青木 孝, 羽田野 孝裕, 大塚 卓哉, 宮崎 昭彦, 島▲崎▼ 健太, 戸川 望, 朴 容震, 津田 俊隆

    Patent

    J-GLOBAL

  • Hash Function Calculation Device and Method

    青木 孝, 宮崎 昭彦, 羽田野 孝裕, 戸川 望, 島崎 健太, 津田 俊隆, 朴 容震

    Patent

    J-GLOBAL

  • Hash Function Calculation Device and Method

    青木 孝, 宮崎 昭彦, 羽田野 孝裕, 戸川 望, 島崎 健太, 津田 俊隆, 朴 容震

    Patent

    J-GLOBAL

  • Semiconductor Device and Control Method Therefor

    伊澤 義貴, 戸川 勝巳, 戸井 崇雄, 藤井 太郎

    Patent

    J-GLOBAL

  • Hardware Trojan Detection Method, Hardware Trojan Detection Program, and Hardware Trojan Detection Device

    戸川 望, 大屋 優

    Patent

    J-GLOBAL

  • Resource Reallocation Device, Resource Reallocation Method, and Program

    青木 孝, 右近 祐太, 関原 悠介, 戸川 望

    Patent

    J-GLOBAL

  • Configuration Device and Configuration Method for Image Processing System

    Japanese Patent No. 5697102

    小野澤 晃, 青木 孝, 戸川 望, 李 昇周

    Patent

    J-GLOBAL

  • Signal Processing Device and Signal Processing Method

    史 又華, 戸川 望, 柳澤 政生, 五十嵐 博昭

    Patent

    J-GLOBAL

  • Signal Processing Device and Signal Processing Method

    史 又華, 戸川 望, 柳澤 政生, 五十嵐 博昭

    Patent

    J-GLOBAL

  • Fault Attack Detection Circuit and Cryptographic Processing Device

    戸川 望, 五十嵐 博昭, 史 又華

    Patent

    J-GLOBAL

  • Computing System, Processing Device, and Internal Load Balancing Method in Computing System

    小野澤 晃, 青木 孝, 戸川 望, 李 昇周

    Patent

    J-GLOBAL

  • Configuration Device and Configuration Method for Image Processing System

    小野澤 晃, 青木 孝, 戸川 望, 李 昇周

    Patent

    J-GLOBAL


 

Syllabus


 

Sub-affiliation

  • Faculty of Science and Engineering   Graduate School of Fundamental Science and Engineering

  • Affiliated organization   Global Education Center

Research Institute

  • 2023
    -
    2024

    Center for Data Science   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Research Institute for Science and Engineering   Concurrent Researcher

  • 2022
    -
    2024

    Global Information and Telecommunication Institute   Concurrent Researcher

  • 2022
    -
    2024

    Waseda Center for a Carbon Neutral Society   Concurrent Researcher

  • 2022
    -
    2024

    Research Organization for Open Innovation Strategy   Concurrent Researcher

  • 2018
    -
    2023

    Research Institute for Next-Gen Computing   Director of Research Institute


Internal Special Research Projects

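  • Practical Application of Quantum Annealing Machines through Automatic Programming Using Machine Learning

    2023  

     View Summary

    The Cabinet Office's "Vision of Quantum Future Society" (published April 2022) calls for incorporating quantum technology into the whole socio-economic system, including drug discovery and medicine, materials, finance, and manufacturing, in order to create growth opportunities for Japanese industry and to solve social issues. Quantum annealing machines are attracting attention as one of the quantum technologies closest to industrial application, and they are expected to solve the combinatorial optimization problems found in many industrial applications quickly and in real time. However, solving a combinatorial optimization problem by quantum annealing requires expressing the target problem as an Ising model, and many combinatorial optimization problems are inherently impossible to express as an Ising model. This research focuses on programming quantum annealing machines by means of machine learning, and aims to build a mechanism that automatically approximates real-world problems and real physical phenomena sufficiently well in the "quadratic form" that underlies the Ising model. Solving the original problem on a quantum annealing machine using the obtained quadratic form makes it possible to solve, very quickly and accurately, diverse combinatorial optimization problems based on real-world problems and physical phenomena that have so far been considered difficult. This fiscal year, we verified the validity of the approach on several example problems.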

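  • Applications of Quantum Annealing through Automatic Programming Using Machine Learning

    2023  

     View Summary

    Among quantum technologies, a quantum annealing machine internally holds an Ising model composed of a set of "spins" and is expected to solve combinatorial optimization problems quickly by finding the ground state (minimum-energy state) of that model. It is close to industrial application and is highlighted in the Cabinet Office's "Vision of Quantum Future Society". To use a quantum annealing machine, however, a combinatorial optimization problem must be reduced to an energy function expressed as a quadratic form of binary explanatory variables (hereafter simply "explanatory variables"), which corresponds to programming the machine, and expressing real-world problems or real physical phenomena as a quadratic form of explanatory variables is extremely difficult or impossible. This research focuses on programming quantum annealing machines by means of machine learning, builds a mechanism that automatically approximates real-world problems and physical phenomena sufficiently well in quadratic form, and applies it to specific industrial problems to evaluate its effectiveness. In particular, we carried out research and development of fundamental technologies that complement the NEDO Quantum AI project adopted this fiscal year and that serve to evaluate the effectiveness of the proposed technique.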

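  • Solving Diverse Combinatorial Optimization Problems with Machine Learning and Quantum Annealing

    2022  

     View Summary

    The industrial fields that will realize Society 5.0 involve many combinatorial optimization problems, and solving them quickly and in real time is said to be the greatest difficulty. Meanwhile, Ising machines such as quantum annealing machines are attracting attention, and methods for solving various combinatorial optimization problems on them are being studied. However, solving a combinatorial optimization problem on an Ising machine requires expressing the target problem as an Ising model, and many combinatorial optimization problems are inherently impossible to express as an Ising model. To address this issue, this research carried out a basic study aimed at solving various real-world problems in the industrial fields that will realize Society 5.0 by combining machine learning and quantum annealing. As a result, we confirmed a path toward applying quantum annealing machines to a wide range of combinatorial optimization problems.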

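  • Realizing the "New Lifestyle" with Ising Machines

    2021  

     View Summary

    The industrial fields that will realize Society 5.0 involve many combinatorial optimization problems, and solving them quickly and in real time is said to be the greatest difficulty. Meanwhile, Ising machines such as quantum annealing machines are attracting attention, and their use is eagerly anticipated. In this research, to realize the "new lifestyle" presented by the Ministry of Health, Labour and Welfare, we formulated lifestyles and behavior patterns that avoid the "Three Cs" (crowded places, close-contact settings, and closed spaces) as combinatorial optimization problems and, with the aim of solving them on Ising machines, carried out QUBO formulations and evaluations on real Ising machines. The evaluation confirmed advantages in computation time and convergence over conventional methods.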

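  • Fast Solution of Geospatial Information Processing Problems with Ising Machines

    2020  

     View Summary

    In this research, we took up several geospatial information processing problems that are indispensable for realizing Society 5.0 and solved them quickly on Ising machines. Based on real-world problems, we considered a pickup-and-delivery routing problem with multiple collection points and capacity-constrained vehicles, and a route-search problem for amusement parks. We mapped these problems effectively onto the Ising model and solved them on Ising machines. Comparison with existing methods confirmed the effectiveness of the Ising model mapping and of solving on Ising machines.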

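  • Feasibility of Fast Solution of Combinatorial Optimization Problems with Ising Machines

    2019  

     View Summary

    According to Cabinet Office materials, in the industrial fields that will realize Society 5.0, such as mobility, finance, and drug discovery, the fast, real-time solution of many combinatorial optimization problems is the greatest difficulty, and success is said to depend on how quickly and in real time (near-)optimal solutions can be found for hard classes of combinatorial optimization problems such as NP-hard problems. Meanwhile, Ising machines, such as quantum annealing machines, are attracting attention as a decisive non-von Neumann computing technology. An Ising machine solves combinatorial optimization problems quickly by exploiting physical phenomena; D-Wave in Canada and, in Japan, Hitachi, NTT, Fujitsu, NEC, Toshiba, and others have announced Ising machines one after another. However, most of the combinatorial optimization problems targeted by these existing Ising machines are simple problems that suit Ising machines, such as the graph maximum-cut problem, and realistic combinatorial optimization problems have not yet been solved. Moreover, attention currently focuses on the physical characteristics of Ising machines (such as lengthening the coherence time of multiple spins), while their practical application receives little attention. In other words, the speedups and power reductions offered by Ising machines are still at the sample-evaluation stage, and this is the biggest problem. Against this background, this research aimed to bring a breakthrough to Ising machines and to solve quickly, on Ising machines, the realistic combinatorial optimization problems indispensable for realizing Society 5.0. In fiscal 2019, to complement results obtained under other research funding, we tackled a number of problems that had not previously been solved on Ising machines, including the optimal solution of the quadratic assignment problem, the rectangle packing problem, the cuboid packing problem that extends rectangle packing to three dimensions, and the graph isomorphism problem, and obtained a certain degree of success.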
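
    As an illustration of why the graph maximum-cut problem mentioned above maps so directly onto an Ising machine, the following minimal sketch writes max-cut as a QUBO (quadratic unconstrained binary optimization) and solves a tiny instance by exhaustive search. The function names (`maxcut_qubo`, `energy`, `brute_force`) are hypothetical and chosen for this example only; the brute-force search merely stands in for an Ising machine and is not the method used in the project.

```python
from itertools import product


def maxcut_qubo(edges, n):
    """Build Q so that sum_ij Q[i][j]*x_i*x_j equals minus the cut size.

    For an edge (i, j), the cut indicator is x_i + x_j - 2*x_i*x_j.
    Since x_i is binary, x_i*x_i == x_i, so the linear terms go on the diagonal.
    """
    Q = [[0] * n for _ in range(n)]
    for i, j in edges:
        Q[i][i] -= 1
        Q[j][j] -= 1
        Q[i][j] += 2
    return Q


def energy(Q, x):
    """Evaluate the quadratic form x^T Q x for a 0/1 assignment x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force(Q):
    """Exhaustively find a minimum-energy assignment (only viable for tiny n)."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: energy(Q, x))


# A 4-cycle: the best cut separates alternating vertices and cuts all 4 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Q = maxcut_qubo(edges, 4)
best = brute_force(Q)
print(best, -energy(Q, best))
```

    Problems such as quadratic assignment or rectangle packing need additional penalty terms for their constraints, which is part of what makes them harder to map than max-cut.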

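  • Ultra-Low-Energy, Robust Integrated Circuit Design Technology Enabling Perpetual Operation under Unstable Energy Harvesting

    2018   木村 晋二, 多和田 雅師, 川村 一志

     View Summary

    In the IoT (Internet of Things) era, once "things" are networked and operated everywhere, a stable power supply from the power grid becomes impossible, and local production and consumption of energy, that is, driving circuits by energy harvesting from sunlight, vibration, and the like, becomes essential. Based on a basic architecture called the distributed-register architecture, this research built integrated circuit design technology that reduces design margins and is robust against short-term and long-term delay variation. The technology integrates high-level design and physical design through the distributed-register architecture, thereby reducing design margins and achieving low energy. In addition, embedding delay-monitoring circuits handles short-term delay variation, and building in multiple design scenarios handles long-term delay variation. As a result, we obtained good prospects for each of these individual design techniques.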

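  • Development of Techniques for Detecting Hardware Trojans Inserted into FPGA Devices

    2018  

     View Summary

    In general, integrated circuit design and manufacturing actively rely on outsourcing to reduce design and manufacturing costs. Consequently, if a malicious designer or manufacturer is involved in the process, there is in principle a risk that malicious circuit components not intended by the designer (called hardware Trojans) are inserted into IoT devices. This research aims to discover hardware Trojan circuits in integrated circuits such as FPGA devices (reconfigurable circuit devices). We first studied the features of hardware Trojans in integrated circuit devices including FPGAs, and found that hardware Trojans tend to (1) have locally high fan-in (number of input lines) and (2) be located near primary inputs. Based on these findings, we trained a classifier on hardware Trojans and used it to successfully identify hardware Trojans in unknown integrated circuits.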

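  • Development of Machine-Learning-Based Anomaly Detection and Recovery Techniques for Complex IoT Devices

    2017  

     View Summary

    IoT (Internet of Things) devices are composed of many large-scale integrated circuits (LSIs), and if a malicious designer or manufacturer is involved in their design and manufacturing process, there is in principle a risk that malicious circuit components not intended by the designer (called hardware Trojans) are inserted into IoT devices. To operate IoT devices safely and securely, it is strongly required to detect malicious circuit components in IoT devices as early as possible and remove them, thereby realizing secure IoT devices. By making active use of machine learning, this research succeeded in detecting malicious circuits in IoT devices with high accuracy, and also succeeded in discovering malicious behavior by measuring the power consumption of IoT devices.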

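  • Development of a Machine-Learning-Based Hardware Trojan Detection Method for the Design Stage

    2016  

     View Summary

    In general, if a malicious designer or manufacturer is involved in the design and manufacturing of large-scale integrated circuits (LSIs), there is in principle a risk of hardware Trojan insertion. An LSI containing a hardware Trojan, and any system using it, may have its functions disabled or destroyed, or may leak confidential information. To respond adaptively to unknown hardware Trojan circuits, this research established a design-stage hardware Trojan detection technique using machine learning. For unknown hardware Trojans, it achieves a detection rate above 80% on some examples.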

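  • Coding That Exactly Minimizes the Number of Written Bits for Non-Volatile Memory and Its Application to Normally-Off Computing

    2016  

     View Summary

    Non-volatile memory plays a central role in normally-off computing, but the energy needed to write a bit to non-volatile memory is one to two orders of magnitude or more larger than that needed to read a bit, so success depends on how far the number of bits written to non-volatile memory can be reduced. To address this, we succeeded in building a code construction method that exactly minimizes the number of written bits by encoding data. In this research, we first built a write-bit-minimizing code whose structure is best suited to normally-off computing applications. In particular, we added error-correction capability to the written bits and succeeded in building a construction method for codes that provide write reduction and error correction at the same time.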

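  • Ultra-Low-Energy, Robust Integrated Circuit Design Technology for Perpetual Operation on Unstable, Minute Power from Energy Harvesting

    2016   木村晋二

     View Summary

    This research aimed to build design technology for integrated circuits that keep operating robustly despite variations in operation, and mainly addressed the following two points. First, we built a circuit design technique that implements circuit operation as multiple "scenarios", monitors delay variation during operation, and, when delay variation occurs, switches to an appropriate scenario so that the circuit always runs under the optimal scenario. Second, we focused on circuit aging, in particular NBTI (Negative Bias Temperature Instability), and built an aging-aware circuit design technique. By power-gating optimal locations in the circuit, we succeeded in keeping the delay variation caused by aging small.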

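  • Resilient Integrated Circuit Design Technology for Semi-Permanent Operation on Renewable Energy

    2015  

     View Summary

    In a society centered on renewable energy, keeping integrated circuits running on renewable power generation hinges on (A) design technology that eliminates wasted power consumption in integrated circuits to the utmost limit, and (B) design technology for integrated circuits that keep operating even when the renewable power supply fluctuates. This research (1) exploited the equivalence between correct operation of an integrated circuit and the correct connection state of its circuit elements to build (1-1) circuit technology that monitors the connection state and (1-2) circuit technology that repairs it when an anomaly is observed. We further (2) built circuit design technology that incorporates these connection-state monitoring and repair techniques. Applying it to various applications, we succeeded in reducing the performance margin by more than 20%.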

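  • Construction of Write-Reduction Codes That Exactly Minimize Written Bits for Normally-Off Computing

    2015  

     View Summary

    Non-volatile memory plays a central role in normally-off computing, but the energy needed to write a bit to non-volatile memory is one to two orders of magnitude or more larger than that needed to read a bit, so success depends on how far the number of bits written to non-volatile memory can be reduced. To address this, we succeeded in building a code construction method that exactly minimizes the number of written bits by encoding data. In this research, we first built a write-bit-minimizing code whose structure is best suited to normally-off computing applications, and built a non-volatile memory system consisting of an encoder, a decoder, and memory cells. We further built a code that provides error-correction capability together with write-bit reduction. Memory energy was reduced by up to about 40%.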

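  • Construction of Write-Reduction Codes That Reduce the Number of Written Bits by an Order of Magnitude and Their Deployment in Normally-Off Technology

    2014   多和田雅師

     View Summary

    Normally-off computing is a computing paradigm in which the power is basically always off, and non-volatile memory plays the central role in it. The write energy of non-volatile memory is far larger than that of ordinary volatile memory. The biggest issue in realizing normally-off computing is therefore how to reduce the number of bits written to non-volatile memory. Targeting non-volatile memory, this research built a method that minimizes write energy by first encoding the data and then minimizing the "distance" between codewords. Experiments confirmed reductions of up to 75% in the number of written bits and up to 33% in write energy.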
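
    To make the idea of "distance" between codewords concrete: when a word is overwritten in place, the number of bits that actually change is the Hamming distance between the stored word and the new word. The sketch below uses Flip-N-Write, a different and much simpler well-known scheme, purely as an illustration; it is not the write-reduction code constructed in this project. The function names and the one-flag-bit word layout are assumptions made for the example.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def encode(stored: int, data: int, width: int) -> int:
    """Choose the codeword for `data` that is closest to what is already stored.

    Each data value has two codewords of width+1 bits: the data itself with
    flag bit 0, or the inverted data with flag bit 1 (flag is the top bit).
    """
    mask = (1 << width) - 1
    plain = data & mask                        # flag 0, data as-is
    flipped = (1 << width) | (~data & mask)    # flag 1, data inverted
    return min((plain, flipped), key=lambda word: hamming(stored, word))


def decode(word: int, width: int) -> int:
    """Recover the data value from a stored codeword."""
    mask = (1 << width) - 1
    payload = word & mask
    return (~payload & mask) if (word >> width) & 1 else payload


stored = 0b0_00000000            # current cell contents (flag + 8 data bits)
data = 0b11111110                # writing this directly would flip 7 bits
word = encode(stored, data, 8)
print(hamming(stored, word), decode(word, 8) == data)
```

    Because the two candidate codewords are bitwise complements, their distances to the stored word always sum to width + 1, so the closer one changes at most half of the cells.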

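  • Research and Development of the World's Fastest Cache Configuration Simulator for Many-Core Processors

    2013   多和田 雅師

     View Summary

    Today, embedded processors large and small are built into almost every electrical product around us, including digital televisions, hard-disk recorders, mobile phones, automobiles, air conditioners, and rice cookers. The performance and price of embedded processors are now closely tied to our affluent, safe, and secure lives. In particular, with advances in semiconductor process technology, the trend in embedded processors has shifted from single processor cores to many-core processors that integrate multiple cores.

    High-performance many-core processors contain cache memory. Cache memory is a memory system placed between the processor and external memory to compensate for the gap between the performance of the many-core processor and that of external memory such as SDRAM. Mainly because of growing cache sizes and increased leakage current due to semiconductor scaling, however, the cache occupies up to 60% to 80% of the total processor area and likewise accounts for up to 50% to 70% of power consumption. Put bluntly, it is now the cache memory that determines the price and performance of a many-core embedded processor. Moreover, the memory organization of a many-core processor consists of an L1 cache private to each core and L2 and L3 caches shared by multiple cores, and is far more complex than that of a single processor. Given a specific application program, knowing exactly how the caches of a many-core embedded processor behave contributes greatly to determining its price and performance.

    Against this background, this research discovered and proved mathematical properties specific to many-core processor caches and, by applying them, developed an ultra-fast cache configuration simulation technique for many-core processors. The results are summarized in the following two points. (1) Cache configuration simulation can be carried out by running a single-configuration cache simulation many times, but this approach may take an impractical amount of time. If multiple cache configurations can be simulated together at the same time, execution time can be shortened. Doing so requires a data structure that represents multiple cache configurations simultaneously. If a data structure can be built such that searching and updating it once performs the search and update for multiple cache configurations, fast cache configuration simulation may be achievable. This research therefore focused on cache associativity and proposed a method for representing multiple cache configurations with different associativities in a single data structure. (2) We implemented the data structure proposed in (1) and built an actual cache configuration simulator for many-core processors. We confirmed that the simulator determines cache hits and misses exactly while running 20 times faster than conventional cache configuration simulation.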
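
    The idea of covering several associativities with one data structure can be illustrated with the classic LRU stack technique: with the number of sets and the line size fixed, a w-way LRU set always holds the w most recently used blocks of that set, so one ordered list per set yields hit counts for every associativity in a single pass over the trace. The sketch below shows that textbook technique for a single cache level under an LRU replacement assumption; it is not the data structure proposed in this project, and the function name and parameters are hypothetical.

```python
from collections import defaultdict


def hits_per_associativity(addresses, num_sets, line_size, max_ways):
    """Return [hits for 1-way, hits for 2-way, ..., hits for max_ways-way].

    One LRU stack per set is searched and updated once per access; the depth
    at which a block is found tells which associativities would have hit.
    """
    stacks = defaultdict(list)       # set index -> blocks, most recent first
    hits = [0] * (max_ways + 1)      # hits[w] counts hits of a w-way cache
    for addr in addresses:
        block = addr // line_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block) + 1       # 1-based LRU stack distance
            for ways in range(depth, max_ways + 1):
                hits[ways] += 1                  # every cache with >= depth ways hits
            stack.remove(block)
        stack.insert(0, block)                   # block becomes most recently used
        del stack[max_ways:]                     # deeper entries miss in all configs
    return hits[1:]


trace = [0, 64, 0, 128, 64]          # byte addresses, 64-byte lines, one set
print(hits_per_associativity(trace, num_sets=1, line_size=64, max_ways=3))
```

    Simulating each associativity separately would walk the trace once per configuration; here the trace is walked once in total, which is the kind of saving the summary above describes.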

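  • Research and Development of the World's Fastest Hierarchical Cache Configuration Simulator

    2011   多和田 雅師

     View Summary

    1. Background. Today, embedded processors large and small are built into almost every electrical product around us, including digital televisions, hard-disk recorders, mobile phones, automobiles, air conditioners, and rice cookers. The performance and price of embedded processors are now closely tied to our affluent, safe, and secure lives. High-performance embedded processors contain memory referred to as on-chip memory. On-chip memory is a memory system placed between the processor and external memory to compensate for the gap between the performance of the embedded processor and that of external memory such as DRAM. Mainly because of growing on-chip memory sizes and increased leakage current due to semiconductor scaling, however, on-chip memory occupies up to 60% to 80% of the total area of an embedded processor and likewise accounts for up to 50% to 70% of its power consumption. Put bluntly, it is now the on-chip memory that determines the price and performance of an embedded processor, and knowing its behavior contributes greatly to determining them.

    On-chip memory generally consists of (1) an L1 (level 1) cache, (2) an L2 (level 2) cache, and (3) scratchpad memory. Assuming that a specific program (for a digital television, for example, the decoding of digital broadcasts) runs on the embedded processor, this research aims to simulate, extremely quickly and exactly, how many hits and misses occur in the on-chip memory as a total of seven parameters of components (1) to (3) are each varied from their minimum to their maximum values, and to obtain the optimal on-chip memory configuration from the results. The goal was a speedup of more than 1000 times over what is regarded as the world's fastest on-chip memory simulator.

    2. Summary of results. The research was carried out in two stages: Stage 1 (L1 cache/L2 cache) and Stage 2 (L1 cache/L2 cache/scratchpad memory).

    [Stage 1] (Optimization by ultra-fast simulation of the L1/L2 caches) Among the components of on-chip memory, we first took up L1/L2 cache simulation and searched for the optimal parameters that minimize memory access time or energy consumption by exactly computing the numbers of hits and misses as six parameters are varied. The applicant was the first in the world to find universal mathematical properties of cache memory in L1 cache simulation. We reasoned that applying these properties to both the L1 and L2 caches would enable ultra-fast simulation-based minimization of memory access time or energy consumption. On this basis we found the following property and devised a fast on-chip memory optimization technique based on it. [Property 1] A two-level L1/L2 cache configuration whose L1 cache has associativity 1 has the same number of cache misses as a one-level L1 cache with the same configuration. Based on this property, we devised CRCB-T, a fast method that exactly simulates the numbers of cache hits and misses of two-level L1/L2 cache configurations. Evaluation on a computer confirmed a speedup of 1465 times over conventional techniques.

    [Stage 2] (Optimization by ultra-fast simulation of the L1 cache/L2 cache/scratchpad memory) The on-chip memory of an embedded processor has an L1 cache, L2 cache, and scratchpad memory structure. Stage 2 addresses the optimization of all seven parameters through fast simulation of the entire on-chip memory, including scratchpad memory as well as the L1/L2 caches. By extending the Stage 1 results to scratchpad memory and finding mathematical properties specific to the scratchpad, the aim is ultimately a speedup of 100 to 1000 times over the simulation-based on-chip memory optimization method regarded as the fastest to date. For the L1 cache, L2 cache, and scratchpad memory configuration, we found the following property in addition to Property 1 above. It is simple and clear, yet it is an extremely important property that can be applied to all on-chip memories as an invariant principle. [Property 2] Data held in a scratchpad memory of smaller capacity is always contained in a scratchpad memory of larger capacity. Based on this property, we devised CRCB-S, a fast method that exactly simulates the numbers of cache hits and misses of the L1 cache, L2 cache, and scratchpad memory configuration. Evaluation on a computer confirmed a speedup of about 3173 times over conventional techniques.

    As a result of this research, simulation-based on-chip memory optimization, which took several months even with the world's fastest previous approach, can be completed within several hours by the proposed technique. This means that the world's fastest cache configuration simulation has been achieved.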

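  • Attacks on Cryptographic LSIs and Their Countermeasures for a Safe and Secure Electronic Society

    2010  

     View Summary

    In recent years, scan-based attacks, which recover the secret key of an LSI (large-scale integrated circuit) implementing cryptographic processing by exploiting its scan path for testing, have attracted attention. A scan path is a design-for-testability technique that connects the registers in an LSI in series so that they can be directly controlled and observed from outside the LSI; scan-path testing greatly improves LSI test efficiency. A scan-based attack, on the other hand, exploits the fact that register outputs inside a running LSI can be obtained through the scan path, analyzes the operating state of the cryptographic circuit, and applies this to secret-key recovery. The difficulty of a scan-based attack is that even if the attacker obtains scan data during cryptographic operation, the correspondence between the scan data and the registers is unknown. Several methods have been proposed, but all have major problems in the following two respects. (1) They are effective only when the scan path consists solely of registers in the cryptographic circuit and cannot handle registers of peripheral circuits. (2) They target the symmetric-key ciphers DES and AES and cannot recover secret keys of public-key cryptosystems.

    Against this background, this research proposed a new scan-based attack method that recovers the secret key even when registers outside the cryptographic circuit are included in the scan path, and that can also recover the secret keys of RSA and elliptic curve cryptography, which are well-known public-key cryptosystems. The proposed method focuses on the sequence of changes in a specific 1-bit register that holds an intermediate value computed during encryption. The changes in one bit of the intermediate value, computed for a sufficient number of inputs, are close to random and are unique to that intermediate result (this sequence is called a discriminator or scan signature). Scan data is analyzed by checking whether the scan signature is present in it. Through computer simulation and evaluation experiments on an FPGA board, we showed that for each of AES, RSA, and elliptic curve cryptography, secret keys of more than 128 bits can be recovered with at most a few hundred plaintexts. Furthermore, to protect cryptographic LSIs from the proposed scan-based attack, we proposed a new scan-path protection technique that obstructs the analysis of scan data: the state-dependent scan register technique.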

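  • Research on the Design of Adaptive Processors for Communication Processing

    2005  

     View Summary

    Communication processors, or network processors, are a relatively new type of processor dedicated to communication processing, typified by packet switching. They have been used where high-speed switching of communication packets is the main concern, such as communication processing in backbone networks. In end-user equipment, led by networked home appliances, however, we consider it essential that communication processors adaptively realize, in addition to (1) simple switching, (2) encoding/decoding of multimedia information, (3) encryption/decryption of personal content, (4) firewall functions, and (5) QoS (Quality of Service) control.

    Against this background, this research worked on the design of a dedicated processor for communication processing whose application processing can be changed adaptively. The proposed communication processor is a collection of multiple processor cores with heterogeneous structures and has a mechanism that adaptively changes the on-board programs according to the packet-processing load in order to balance processing. To guarantee delay control of communication packets, a bus arbitration mechanism based on packet priority (QoS arbitration mechanism) is provided, and each processor core is connected to a shared bus with QoS arbitration. As a result, the processor is expected to realize processing (1) to (5) above together with reliable packet delay control.

    Through (1) construction of the basic architecture of the adaptive processor for communication processing and (2) construction of a framework for its automatic design environment, we confirmed that the proposed processor, by greatly improving scalability, achieves a performance improvement of about 10% to 20% over existing network processors, with switching at close to 3 Gbps and encrypted communication at over 250 Mbps.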

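  • Research on High-Level Synthesis Methods for Control-Flow-Dominated Hardware

    1999  

     View Summary

    In general, when application programs consisting of bit-level operations or conditional branches, such as image coding/decoding, protocol processing, or cryptographic processing, are implemented in dedicated hardware, the bit-level operations and conditional branches can be executed in parallel, allowing faster execution than on a microprocessor. We consider that a high-level synthesis system that automates the design of control-dominated dedicated hardware must (1) be able to synthesize hardware that implements control processing such as bit-level operations and conditional branches, and (2) provide an environment that offers multiple design candidates for a behavioral specification given by the designer and evaluates the optimal design. Based on this view, this research proposed a high-level synthesis system for control-processing hardware that synthesizes a hardware description in a hardware description language from a behavioral description in C.

    The system takes as input a behavioral description of the application program in C together with application data, and outputs a hardware description implementing the application program. For the input application program, it enumerates multiple hardware designs that satisfy the timing and area constraints. The system consists of (i) code optimization, (ii) area/time optimization, and (iii) hardware description generation. Code optimization takes the application program as input and represents it internally as a call graph and control flow graphs. Area/time optimization derives, from the call graph and control flow graphs, multiple hardware candidates that satisfy the timing and area constraints. Finally, hardware description generation outputs a hardware description for each candidate obtained by area/time optimization.

    This research further focused on area/time optimization, the core of the system, and proposed and built an algorithm for it. The proposed algorithm takes as input a call graph and the set of control flow graphs that make up the call graph, and synthesizes, under area and timing constraints, a set of state transition graphs representing the whole call graph. It first builds state transition graphs that satisfy only the timing constraint and then, while the timing constraint remains satisfied, transforms them to satisfy the area constraint, thereby obtaining multiple hardware candidates. The proposed algorithm has the following features. (1) By manipulating control flow graphs directly, it can handle control processing such as bit-level operations and conditional branches. (2) From a single call graph representing the whole application program, it can enumerate multiple hardware candidates that satisfy the area and timing constraints.

    When the proposed area/time optimization algorithm was incorporated into the system and applied to control-processing application programs, it synthesized multiple hardware designs that trade off area against execution time. Moreover, compared with manually designed hardware, the synthesized designs ranged from smaller to larger in area and from shorter to longer in execution time.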

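  • Research on Hardware/Software Co-Design Methods for Embedded Processors for Image Processing

    1998  

     View Summary

    An embedded processor for image processing is a processor integrated into a system LSI dedicated to image processing. Such processors have many dedicated image-processing units not found in conventional general-purpose microprocessors, and the main issue is how to combine these units to build the processor. Moreover, obtaining an optimal design requires enumerating many candidate processor architectures and running the application software on them. In other words, a method is needed that automatically designs the hardware of the embedded processor and the software running on it at the same time (a hardware/software co-design method). We have previously studied automatic hardware design methods based on enumerating architecture candidates, which achieve optimal architecture design by using a computer to search a wider architectural solution space. This research aims to extend that concept to hardware/software co-design of embedded processors for image processing.

    Against this background, this research built a method by which a computer automatically designs processors dedicated to image-processing application software such as coding/decoding of video or high-definition images, feature extraction, and enhancement and restoration. The method proceeds as follows. (1) First, a processor model is defined to which all hardware units envisaged for the processor (multiply-accumulate units, addressing units, hardware looping circuits, and multiple register files) have been added, and the given application program is compiled for this model. The processor hardware that executes the resulting assembly code has a higher hardware cost but the shortest execution time. (2) Next, parts implemented by hardware units are gradually replaced by software. The resulting assembly code gradually takes longer to execute, but the area required for the processor hardware becomes smaller. (3) From the processor configuration obtained in (2), a register-transfer-level (RT-level) processor description written in the hardware description language VHDL is synthesized. The resulting description can be synthesized with commercial logic synthesis tools, and with this method embedded processors for image processing could be obtained very quickly and at low cost.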

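  • Research on Hardware/Software Partitioning Methods for Digital Signal Processors

    1997  

     View Summary

    A DSP (Digital Signal Processor) is an indispensable device for today's digital signal processing, typified by high-definition image processing. Hardware/software partitioning (HW/SW partitioning) for a DSP is the problem of deciding which parts of the DSP are implemented in hardware and which in software; it determines the price, area, and performance of the DSP itself and hence of the digital signal processing system containing it. Automating HW/SW partitioning by computer is a new line of research that began about five years ago, and so far results have been reported only for general microprocessors. Because a DSP has many dedicated signal-processing units that microprocessors lack, HW/SW partitioning for DSPs is an inherently different problem from that for microprocessors. We consider that concepts from previously studied automatic design methods for integrated circuits, in particular high-level design by design-candidate enumeration for DSP data paths, can be applied to HW/SW partitioning for DSPs.

    Against this background, this research considered a system that automatically synthesizes, by computer, DSP hardware dedicated to a group of application programs, targeting image-processing applications such as video coding/decoding, feature extraction, and enhancement and restoration, and built the HW/SW partitioning method at the core of the system. The method is based on the following steps. (1) First, a processor model is defined to which all hardware units envisaged for the DSP hardware (multiply-accumulate units, addressing units, and hardware looping circuits) have been added, and the given application program is compiled for this model. The resulting assembly code has a higher hardware cost but the shortest execution time. (2) Next, parts implemented by hardware units are gradually replaced by software. The resulting assembly code gradually takes longer to execute, but the area required for the DSP hardware becomes smaller. (3) Repeating this process until the timing constraint is violated makes it possible to synthesize a processor core whose application execution time satisfies the timing constraint, that has a small area, and that is best suited to the application programs.

    With this method, DSP hardware suited to a group of application programs can be built and evaluated in a short time, and DSPs implementing the latest image-processing algorithms, as well as signal processing systems containing them, can be built quickly.

    Publications: March 1998, 戸川望, 桜井崇志, 柳澤政生, 大附辰夫, "Hardware/Software Cosynthesis System for Digital Signal Processors," IEICE Technical Report, VLD97-115. March 1998, 川崎隆志, 戸川望, 柳澤政生, 大附辰夫, "A Parallelizing Compiler in an Automatic Synthesis System for Digital Signal Processors," IEICE Technical Report, VLD97-116. March 1998, 濱辺雅哉, 能勢敦, 戸川望, 柳澤政生, 大附辰夫, "A Method for Automatically Generating Hardware Descriptions of Pipelined Processors," IEICE Technical Report, VLD97-117.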
