Updated on 2024/12/21

 
WATANABE, Takahiro
 
Affiliation
Faculty of Science and Engineering
Job title
Professor Emeritus
Degree
Doctor of Engineering ( Tohoku University )

Research Experience

  • 2003
    -
     

    Waseda University, Graduate School of Information, Production and Systems

  • 1990
    -
    2003

    Yamaguchi University, Faculty of Engineering

  • 1979
    -
    1990

    Toshiba Corporation, Research and Development Center

Education Background

  •  
    -
    1979

    Tohoku University   Graduate School of Engineering   Information Science  

  •  
    -
    1976

    Yamaguchi University   Graduate School of Engineering   Electrical Engineering  

  •  
    -
    1974

    Yamaguchi University   Faculty of Engineering   Electrical Engineering  

Professional Memberships

  •  
     
     

    The Institute of Electronics, Information and Communication Engineers

  •  
     
     

    The Japanese Society for Artificial Intelligence

  •  
     
     

    Research Institute of Signal Processing, Japan

  •  
     
     

    IEEE (the Institute of Electrical and Electronics Engineers, Inc.)

  •  
     
     

    Information Processing Society of Japan

  •  
     
     

    The Institute of Electronics, Information and Communication Engineers

Research Areas

  • Computer system

Research Interests

  • LSI , Electronic Circuit, Integrated Circuit, Design Automation, Computer-Aided Design, Algorithm

Awards

  • IEICE Young Researchers' Award

    1983  

 

Papers

  • Predicting stock high price using forecast error with recurrent neural network

    Zhiguo Bao, Qing Wei, Tingyu Zhou, Xin Jiang, Takahiro Watanabe

    Applied Mathematics and Nonlinear Sciences    2021.05

    Stock price forecasting is an eye-catching research topic. In previous works, many researchers used a single method or combination of methods to make predictions. However, accurately predicting stock prices is very difficult. To improve the predicting precision, in this study, an innovative prediction approach was proposed by recurrent substitution of forecast error into the historical neural network model through three steps. According to the historical data, the initial predicted value of the next day is obtained through the neural network. Then, the prediction error of the next day is obtained through the neural network according to the historical prediction error. Finally, the initial predicted value and the prediction error are added to obtain the final predicted value of the next day. We use recurrent neural network prediction methods, such as Long Short-Term Memory Network Model and Gated Recurrent Unit, which are popular in the recent neural network study. In the simulations, the past stock prices of China from June 2010 to August 2017 are used as training data, and those from September 2017 to April 2018 are used as test data. The experimental findings demonstrate that the proposed method with forecast error gives a more accurate prediction result for the stock’s high price on the next day, which indicates that the performance of the proposed one is superior to that of the traditional models without forecast error.

    DOI

    Scopus

    10
    Citation
    (Scopus)
  • High performance virtual channel based fully adaptive 3D NoC routing for congestion and thermal problem

    Xin Jiang, Xiangyang Lei, Lian Zeng, Takahiro Watanabe

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 11 ) 2379 - 2391  2017.11

    Recent Network on Chip (NoC) design must take the thermal issue into consideration due to its great impact on the network performance and reliability, especially for 3D NoC. In this work, we design a virtual channel based fully adaptive routing algorithm for the runtime 3D NoC thermal-aware management. To improve the network throughput and latency, we use two virtual channels for each horizontal direction and design a routing function which can not only avoid deadlock and livelock, but also ensure high adaptivity and routability in the throttled network. For path selection, we design a strategy that takes priority to the distance, but also considers path diversity and traffic state. For throttling information collection, instead of transmitting the topology information of the whole network, we use a 12 bits register to reserve the router state for one hop away, which saves the hardware cost largely and decreases the network latency. In the experiments, we test our proposed routing algorithm in different states with different sizes, and the proposed algorithm shows better network latency and throughput with low power compared with traditional algorithms.

    DOI

    Scopus

  • An adaptive routing algorithm based on network partitioning for 3D Network-on-Chip

    Jindun Dai, Xin Jiang, Takahiro Watanabe

    IEEE CITS 2017 - 2017 International Conference on Computer, Information and Telecommunication Systems     229 - 233  2017.09

    This paper presents an efficient routing algorithm for 3D meshes without virtual channels. The proposed routing algorithm is extended from 2D east-first routing algorithm and based on network partitioning. It is proven that the proposed method is free from deadlock. In comparison of previous routing algorithms, the average degree of adaptiveness is higher. This feature contributes to higher communication efficiency. Experimental results show that the proposed method can achieve lower communication latency and higher throughput over other traditional methods.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

    Huatao Zhao, Xiao Luo, Chen Zhu, Takahiro Watanabe, Tianbo Zhu

    MODERN PHYSICS LETTERS B   31 ( 19-21 )  2017.07  [Refereed]

    In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such cache hierarchies are restricted by their tumid size and interference accesses which leads to both performance degradation and wasted energy. In this paper, we firstly propose a behavior-aware cache hierarchy (BACH) which can optimally allocate the multi-level cache resources to many cores and highly improved the efficiency of cache hierarchy, resulting in low energy consumption. The BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the cache allocation bases, so that we can optimally configure the cache hierarchy to meet the runtime demand. The BACH was implemented on the GEM5 simulator. The experimental results show that energy consumption of a three-level cache hierarchy can be saved from 5.29% up to 27.94% compared with other key approaches while the performance of the multi-core system even has a slight improvement counting in hardware overhead.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

    Huatao Zhao, Xiao Luo, Chen Zhu, Tianbo Zhu, Takahiro Watanabe

    Modern Physics Letters B   31 ( 19 ) 1 - 7  2017.04  [Refereed]

  • High-Throughput Message Digest (MD5) Design and Simulation-Based Power Estimation Using Unfolding Transformation

    Suhaili Shamsiah binti, Watanabe Takahiro

    Journal of Signal Processing   21 ( 6 ) 233 - 238  2017

    The high throughput of the cryptographic hash function has become an important aspect of the hardware implementation of security system design. There are several methods that can be used to improve the throughput performance of MD5 design. In this paper, four types of MD5 design were proposed: MD5 iterative design, MD5 unfolding design, MD5 unfolding with 4 stages of pipelining design and MD5 unfolding with 32 stages of pipelining design. The results indicated that MD5 unfolding with 32 stages of pipelining design provides a high throughput compared with other MD5 designs. By using an unfolding transformation factor of 2, the number of cycles of MD5 design was reduced from 64 to 32. All the proposed designs were successfully designed using Verilog code and simulated using ModelSim. The throughput of MD5 unfolding with 32 stages of pipelining design was increased significantly to 137.97 Gbps, and the power of this MD5 unfolding with 32 stages of pipelining was 750.99 mW. Therefore, it is suggested that an unfolding transformation with high-performance pipelining be applied to MD5 hash function design in order to produce an embedded security system design. This paper is expected to contribute to improving the maximum frequency and the throughput of MD5 design. Thus, by increasing the number of stages in MD5 unfolding design, the performance of MD5 designs can be improved significantly.

    DOI CiNii

  • High Performance Virtual Channel Based Fully Adaptive 3D NoC Routing for Congestion and Thermal Problem

    JIANG Xin, LEI Xiangyang, ZENG Lian, WATANABE Takahiro

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   100 ( 11 ) 2379 - 2391  2017

    Recent Network on Chip (NoC) design must take the thermal issue into consideration due to its great impact on the network performance and reliability, especially for 3D NoC. In this work, we design a virtual channel based fully adaptive routing algorithm for the runtime 3D NoC thermal-aware management. To improve the network throughput and latency, we use two virtual channels for each horizontal direction and design a routing function which can not only avoid deadlock and livelock, but also ensure high adaptivity and routability in the throttled network. For path selection, we design a strategy that takes priority to the distance, but also considers path diversity and traffic state. For throttling information collection, instead of transmitting the topology information of the whole network, we use a 12 bits register to reserve the router state for one hop away, which saves the hardware cost largely and decreases the network latency. In the experiments, we test our proposed routing algorithm in different states with different sizes, and the proposed algorithm shows better network latency and throughput with low power compared with traditional algorithms.

    CiNii

  • High Performance Virtual Channel Based Fully Adaptive Thermal-aware Routing for 3D NoC

    Xin Jiang, Xiangyang Lei, Lian Zeng, Takahiro Watanabe

    PROCEEDINGS OF THE EIGHTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED)     289 - 295  2017  [Refereed]

    The thermal problem is a challenge in recent Network on Chip (NoC) designs due to its great impact on the network performance and reliability, especially for 3D NoC. In this work, we design a virtual channel based fully adaptive routing algorithm for the runtime 3D NoC thermal-aware management. For throttling information collection, instead of transmitting the topology information of the whole network, we use a 12 bits register to reserve the router state for one hop away. It saves the hardware cost largely and decreases the network latency. To ensure deadlock and livelock freedom and minimize the hardware overhead, we only use two virtual channels for each horizontal channel to achieve full adaptivity and high routability. For path selection, we design a strategy that takes priority to the distance, but also considers path diversity and traffic state. Experimental results show that the proposed algorithm achieves better network latency and throughput with low power compared with traditional algorithms.

    DOI

    Scopus

    8
    Citation
    (Scopus)
  • A Fast MER Enumeration Algorithm for Online Task Placement on Reconfigurable FPGAs

    Tieyuan Pan, Lian Zeng, Yasuhiro Takashima, Takahiro Watanabe

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E99A ( 12 ) 2412 - 2424  2016.12  [Refereed]

    In this paper, we propose a fast Maximal Empty Rectangle (MER) enumeration algorithm for online task placement on reconfigurable Field-Programmable Gate Arrays (FPGAs). On the assumption that each task utilizes rectangle-shaped resources, the proposed algorithm can manage the free space on FPGAs by an MER list. When assigning or removing a task, a series of MERs are selected and cut into segments according to the task and its assignment location. By processing these segments, the MER list can be updated quickly with low memory consumption. Under the proof of the upper limit of the number of the MERs on the FPGA, we analyze both the time and space complexity of the proposed algorithm. The efficiency of the proposed algorithm is verified by experiments.

    DOI CiNii

    Scopus

    6
    Citation
    (Scopus)
  • A fast MER enumeration algorithm for online task placement on reconfigurable FPGAs

    Tieyuan Pan, Lian Zeng, Yasuhiro Takashima, Takahiro Watanabe

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E99A ( 12 ) 2412 - 2424  2016.12

    In this paper, we propose a fast Maximal Empty Rectangle (MER) enumeration algorithm for online task placement on reconfigurable Field-Programmable Gate Arrays (FPGAs). On the assumption that each task utilizes rectangle-shaped resources, the proposed algorithm can manage the free space on FPGAs by an MER list. When assigning or removing a task, a series of MERs are selected and cut into segments according to the task and its assignment location. By processing these segments, the MER list can be updated quickly with low memory consumption. Under the proof of the upper limit of the number of the MERs on the FPGA, we analyze both the time and space complexity of the proposed algorithm. The efficiency of the proposed algorithm is verified by experiments.

    DOI

    Scopus

    6
    Citation
    (Scopus)
  • An efficient highly adaptive and deadlock-free routing algorithm for 3D network-on-chip

    Lian Zeng, Tieyuan Pan, Xin Jiang, Takahiro Watanabe

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E99A ( 7 ) 1334 - 1344  2016.07

    As the semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in the future Chip-Multiprocessors (CMPs) design. Three-Dimensional Network-on-Chips (3D NoCs) have become an attractive solution which can provide impressive high performance. An efficient and deadlock-free routing algorithm is critical to achieving the high performance of network-on-chip. Traditional methods based on deterministic and turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only in intra-layers but also in inter-layers according to congestion information and path diversity. Simulation results show that the proposed method achieves a significant performance improvement compared with others.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • An online task placement algorithm based on MER enumeration for partially reconfigurable device

    Tieyuan Pan, Li Zhu, Lian Zeng, Takahiro Watanabe, Yasuhiro Takashima

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E99A ( 7 ) 1345 - 1354  2016.07

    Recently, due to the development of design and manufacturing technologies for VLSI systems, an embedded system becomes more and more complex. Consequently, not only the performance of chips, but also the flexibility and dynamic adaptation of the implemented systems are required. To achieve these requirements, a partially reconfigurable device is promising. In this paper, we propose an efficient data structure to manage the reconfigurable units. And then, on the assumption that each task utilizes the rectangle shaped resources, a very simple MER enumeration algorithm based on this data structure is proposed. By utilizing the result of MER enumeration, the free space on the reconfigurable device can be used sufficiently. We analyze the complexity of the proposed algorithm and confirm its efficiency by experiments.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • High throughput evaluation of SHA-1 implementation using unfolding transformation

    Suhaili, Shamsiah Binti, Watanabe, Takahiro

    ARPN Journal of Engineering and Applied Sciences   11 ( 5 ) 3350 - 3355  2016.03

    Hash Function is widely used in the protocol scheme. In this paper, the design of SHA-1 hash function by using Verilog HDL based on FPGA is studied to optimise both hardware resource and performance. It was successfully synthesised and implemented using Altera Quartus II Arria II GX: EP2AGX45DF29C4. In this paper, two types of design are proposed, namely SHA-1 and SHA-1 unfolding. The maximum frequency of SHA-1 design is 274.2 MHz which is higher than SHA-1 unfolding that has the maximum frequency of only 174.73 MHz. However, this leads to a high throughput of the SHA-1 unfolding design with 2236.54 Mbps. Besides, both designs provide a small area implementation on Arria II that requires only 423 and 548 Combinational ALUTs, 693 and 907 total registers, respectively.

  • An Online Task Placement Algorithm Based on MER Enumeration for Partially Reconfigurable Device

    PAN Tieyuan, ZHU Li, ZENG Lian, WATANABE Takahiro, TAKASHIMA Yasuhiro

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   99 ( 7 ) 1345 - 1354  2016

    Recently, due to the development of design and manufacturing technologies for VLSI systems, an embedded system becomes more and more complex. Consequently, not only the performance of chips, but also the flexibility and dynamic adaptation of the implemented systems are required. To achieve these requirements, a partially reconfigurable device is promising. In this paper, we propose an efficient data structure to manage the reconfigurable units. And then, on the assumption that each task utilizes the rectangle shaped resources, a very simple MER enumeration algorithm based on this data structure is proposed. By utilizing the result of MER enumeration, the free space on the reconfigurable device can be used sufficiently. We analyze the complexity of the proposed algorithm and confirm its efficiency by experiments.

    CiNii

  • An Efficient Highly Adaptive and Deadlock-Free Routing Algorithm for 3D Network-on-Chip

    ZENG Lian, PAN Tieyuan, JIANG Xin, WATANABE Takahiro

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   99 ( 7 ) 1334 - 1344  2016

    As the semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in the future Chip-Multiprocessors (CMPs) design. Three-Dimensional Network-on-Chips (3D NoCs) have become an attractive solution which can provide impressive high performance. An efficient and deadlock-free routing algorithm is critical to achieving the high performance of network-on-chip. Traditional methods based on deterministic and turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only in intra-layers but also in inter-layers according to congestion information and path diversity. Simulation results show that the proposed method achieves a significant performance improvement compared with others.

    CiNii

  • Fully adaptive thermal-aware routing for runtime thermal management of 3D network-on-chip

    Jiang, Xin, Lei, Xiangyang, Zeng, Lian, Watanabe, Takahiro

    Lecture Notes in Engineering and Computer Science   2   659 - 664  2016.01

    Thermal problem is an essential issue which must be taken into account in the 3D Network-on-Chip (NoC) design, because it has a great impact on not only the network performance, but also the reliability of the message transmission. In this work, we present a fully adaptive runtime thermal-aware routing algorithm, which combines the distance, traffic state, path diversity and the thermal impact in the path determination. By simultaneously considering all these factors, the routing algorithm can effectively balance the traffic load while keeping high adaptivity and routability, which also results in an even distribution of temperature across the network. Instead of collecting the topology information of the whole network, we utilize a 12 bits register to reserve the router state for one hop away, which saves the hardware cost largely and decreases the network latency. The simulation results show our proposed routing algorithm can improve the latency and energy consumption by comparing with other previously proposed thermal-aware routing schemes, and the improvement is more remarkable in large scale networks.

  • C-009 A Novel Routing Algorithm based on Path Diversity and Congestion Estimation

    Hong Yang, Zeng Lian, Jiang Xin, Watanabe Takahiro

      14 ( 1 ) 251 - 252  2015.08

    This paper proposes a minimal adaptive routing algorithm for Network-on-Chip, which takes congestion information and routing diversity into consideration. Congestion is one of the most important factors on performance. Our algorithm can select a lower latency path for packet transmission according to the following conditions: the free buffer sizes of two neighbor routers are compared, the direction which has more different paths to the destination is chosen, and the direction in which the packet tends to be transmitted is decided by the positions of the source and destination. As a result, the algorithm gets the proper direction for the next step. Compared to other algorithms, our proposed routing algorithm has less latency and better throughput.

    CiNii

  • C-008 A High Density Escape Routing Method for Staggered-Pin-Array Based Mixed-Pattern Signal Model

    Xu Qianying, Pan Tieyuan, Zhang Ran, Tian Yang, Watanabe Takahiro

      14 ( 1 ) 249 - 250  2015.08

    Escape routing is one of the key problems in the design of printed circuit boards (PCB), and it becomes more and more difficult for manual design due to increased pin count. This paper proposes an algorithm using a unified model to formulate the problem of escape routing of differential pairs together with single signals (mixed-pattern signals) on staggered pin array (SPA) considering the routing density.

    CiNii

  • RC-009 Development of a System to Reduce the Load of DRC Verification

    Kamei Tomoki, Watanabe Takahiro

      14 ( 1 ) 69 - 74  2015.08

    CiNii

  • Sorting-Based I/O Connection Assignment and Non-Manhattan RDL Routing for Flip-Chip Designs

    Zhang Ran, Watanabe Takahiro

    IEEJ Transactions on Electronics, Information and Systems   135 ( 12 ) 1535 - 1544  2015

    In modern VLSI designs, a flip-chip package is widely used to meet the higher integration density and the larger I/O counts of circuits. Recently the I/O buffers are mapped onto bump balls without changing the module placement using re-distribution layer (RDL) in flip-chip designs. In this research, a sorting-based I/O connection assignment and non-Manhattan RDL routing method is proposed for area I/O flip-chip designs. The approach initially assigns the I/O buffers to bump balls by sorting the Manhattan distance between them. Three kinds of pair-exchange procedures are then carried out to improve the initial assignment. Then to shorten the wire length, non-Manhattan RDL routing is adopted to connect the I/O buffers and bump balls. Finally some un-routed connections are ripped up and rerouted. The experimental results show that the proposed method is able to obtain the routes with shorter wire length in reasonable CPU times.

    CiNii

  • Layer Assignment and Equal-length Routing for Disordered Pins in PCB Design

    Zhang Ran, Pan Tieyuan, Zhu Li, Watanabe Takahiro

    IMT   10 ( 3 ) 395 - 404  2015

    In recent printed circuit board (PCB) design, due to the high density of integration, the signal propagation delay or skew has become an important factor for a circuit performance. As the routing delay is proportional to the wire length, the controllability of the wire length is usually focused on. In this research, a heuristic algorithm to get equal-length routing for disordered pins in PCB design is proposed. The approach initially checks the longest common subsequence of source and target pin sets to assign layers for pins. Single commodity flow is then carried out to generate the base routes. Finally, considering target length requirement and available routing region, R-flip and C-flip are adopted to adjust the wire length. The experimental results show that the proposed method is able to obtain the routes with better wire length balance and smaller worst length error in reasonable CPU times.

    DOI CiNii

  • A Performance Enhanced Dual-switch Network-on-chip Architecture

    Zeng Lian, Jiang Xin, Watanabe Takahiro

    IMT   10 ( 3 ) 405 - 414  2015

    With rapid progress in semiconductor technology, Network-on-Chip (NoC) becomes an attractive solution for future systems on chip (SoC). The network performance depends critically on the performance of packets routing. The delay of router and packets contention can significantly affect network latency and throughput. As the network becomes more congested, packets will be blocked more frequently. It would result in degrading the network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing DSA design, we can make utmost use of idle output ports to reduce packets contention delay, meanwhile, without increasing router delay. Experimental results show that our design significantly achieves the performance improvement in terms of throughput and latency at the cost of very little power and area overhead.

    CiNii

  • A Performance Enhanced Dual-switch Network-on-chip Architecture

    Zeng Lian, Jiang Xin, Watanabe Takahiro

    IPSJ Transactions on System LSI Design Methodology   8 ( 0 ) 85 - 94  2015

    With rapid progress in semiconductor technology, Network-on-Chip (NoC) becomes an attractive solution for future systems on chip (SoC). The network performance depends critically on the performance of packets routing. The delay of router and packets contention can significantly affect network latency and throughput. As the network becomes more congested, packets will be blocked more frequently. It would result in degrading the network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing DSA design, we can make utmost use of idle output ports to reduce packets contention delay, meanwhile, without increasing router delay. Experimental results show that our design significantly achieves the performance improvement in terms of throughput and latency at the cost of very little power and area overhead.

    CiNii

  • A Performance Enhanced Dual-switch Network-on-Chip Architecture

    Lian Zeng, Takahiro Watanabe

    2015 20TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC)     69 - 74  2015  [Refereed]

    Network-on-Chip (NoC) is an attractive solution for future systems on chip (SoC). The network performance depends critically on the performance of packets routing. However, as the network becomes more congested, packets will be blocked more frequently. It would result in degrading the network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing two switch allocations, we can make utmost use of idle output ports. Experimental results show that our design significantly achieves the performance improvement in terms of throughput and latency at the cost of very little power overhead.

  • A Length Matching Routing Method for Disordered Pins in PCB Design

    Ran Zhang, Tieyuan Pan, Li Zhu, Takahiro Watanabe

    2015 20TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC)     402 - 407  2015  [Refereed]

    In this paper, for the disordered pins in printed circuit board (PCB) design, a heuristic algorithm is proposed to obtain a length matching routing. We initially check the longest common subsequence of pin pairs to assign layers for pins. Then, we adopt single commodity flow to generate base routes. R-flip and C-flip are finally carried out to adjust the wire length. The experiments show that our algorithm generates the optimal routes with better wire balance within reasonable CPU times.

  • Application-Specific Shared Last-Level Cache Optimization for Low-Power Embedded Systems

    Huatao Zhao, Jiongyao Ye, Xian Su, Takahiro Watanabe

    2015 IEEE 13TH INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS)    2015  [Refereed]

    Modern embedded systems favor the chip multiprocessor frame to achieve higher performance, but they are restricted by the inefficient cache hierarchies. Typically, the accessing interference and improper allocation in last-level cache (LLC) shared by multiprocessors cause significant energy consumption and performance depression. In this paper, we propose a configurable and partitioned cache hierarchy where an energy-efficient runtime mechanism can well manage the shared LLC to meet application programs. This mechanism utilizes the repeated behaviors in hot subroutines of application and selects the proper partition intervals. Then, a low-power metric based configurable scheme is employed to explore the optimal allocation of given cache resources. Thus, we can provide each core with the optimal allocation information to dynamically partition the shared LLC during runtime. Experimental results for a quad-core system using the SPEC2006 benchmarks show that the cache access energy can be reduced by on average 32.5 percent compared to the equal partition scheme only with 1.3 percent performance off.

    DOI

    Scopus

  • Vertical-Mesh-Conscious-Dynamic Routing Algorithm for 3D NoCs

    Xiangyang Lei, Xin Jiang, Lian Zeng, Takahiro Watanabe

    TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE   2016-January  2015  [Refereed]

    In this paper, a new deadlock-free dynamic turn model named VMCD (vertical-mesh-conscious-dynamic) is proposed for higher performance in 3D NoC. On vertical meshes and odd horizontal meshes, odd-even turn model is applied, while xy routing is utilized on even horizontal meshes. According to the priority of vertical meshes and horizontal meshes, two VMCD routing algorithms are applied based on this turn model. Compared with the Z-odd-even (ZOE) and balanced-odd-even (BOE), the proposed VMCD algorithms take adaptiveness and network balance into account simultaneously and show better performance including average latency and throughput. Compared to ZOE on 8*8*2 and 8*8*4 mesh, the improvement of throughput can be up to 68.5% and 9.3% respectively for the random traffic and 14.3% and 20% respectively for the transpose traffic pattern. The performance improvement is much more remarkable compared with BOE routing algorithm.

    DOI

    Scopus

    1
    Citation
    (Scopus)
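    The VMCD abstract above applies XY routing on even horizontal meshes. As a rough illustration of that baseline only (not of VMCD or the odd-even turn model), the following minimal sketch shows deterministic XY routing on a 2D mesh. The function names and the coordinate convention are assumptions made for this example, not taken from the paper.

```python
def xy_next_hop(cur, dst):
    """Return the next node under XY (dimension-order) routing.

    Nodes are (x, y) tuples; this convention is an assumption of the sketch.
    The packet first corrects its X coordinate, then its Y coordinate.
    """
    x, y = cur
    dx, dy = dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return cur  # already at the destination


def xy_route(src, dst):
    """Collect the full path from src to dst, including both endpoints."""
    path = [src]
    while path[-1] != dst:
        path.append(xy_next_hop(path[-1], dst))
    return path


route = xy_route((0, 0), (2, 1))
```

    Because every packet finishes its X moves before any Y move, the path is fixed for a given source and destination, which is why XY routing is deadlock-free on a mesh but not adaptive.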
  • A Performance Enhanced Adaptive Routing Algorithm for 3D Network-on-Chips

    Lian Zeng, Tieyuan Pan, Xin Jiang, Takahiro Watanabe

    TENCON 2015 - 2015 IEEE REGION 10 CONFERENCE   2016-January  2015  [Refereed]

     View Summary

    As semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in future Chip-Multiprocessor (CMP) designs. Three-Dimensional Network-on-Chips (3D NoCs) have therefore become an attractive solution that can provide high performance. Network performance depends critically on the performance of the routing algorithm. This paper proposes a novel adaptive routing algorithm for 3D NoCs that can relieve congestion not only within layers but also between layers. Simulation results show that our proposed method achieves a significant performance improvement compared with other traditional routing algorithms.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Sorting-Based Micro-Bump Assignment for 3D ICs

    Ran Zhang, Tieyuan Pan, Takahiro Watanabe

    2015 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     139 - 140  2015  [Refereed]

     View Summary

    Recently, RDLs (Re-Distribution Layers) and micro-bumps have been widely adopted in 3D IC designs. In this research, a sorting-based micro-bump assignment method is proposed. The approach initially assigns the I/O pads to micro-bumps by sorting the Manhattan distances between them. Then, single-layer routing is carried out in each of the two RDLs. The experimental results show that the proposed method is able to obtain routes with shorter total wire length in reasonable CPU time.

    DOI

    Scopus

    3
    Citation
    (Scopus)
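    The initial assignment step in the micro-bump abstract above sorts pad-to-bump Manhattan distances. The abstract does not give the exact procedure, so the following is only a minimal sketch under one assumed interpretation: all pad/bump pairs are sorted by distance and accepted greedily when both ends are still free. The function names and the sample coordinates are hypothetical.

```python
from itertools import product


def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def assign_pads_to_bumps(pads, bumps):
    """Greedy assignment (assumed interpretation, not the paper's procedure).

    Visits every pad/bump index pair in ascending Manhattan distance and
    accepts a pair when neither the pad nor the bump is already taken.
    Ties are broken by index order so the result is deterministic.
    """
    pairs = sorted(
        product(range(len(pads)), range(len(bumps))),
        key=lambda pb: (manhattan(pads[pb[0]], bumps[pb[1]]), pb),
    )
    assignment = {}
    used_bumps = set()
    for p, b in pairs:
        if p not in assignment and b not in used_bumps:
            assignment[p] = b
            used_bumps.add(b)
    return assignment


# Hypothetical coordinates chosen only to exercise the sketch.
pads = [(0, 0), (4, 1), (2, 5)]
bumps = [(1, 0), (2, 4), (5, 1)]
result = assign_pads_to_bumps(pads, bumps)
total_length = sum(manhattan(pads[p], bumps[b]) for p, b in result.items())
```

    A greedy pass like this does not guarantee the minimum total length in general; it only mirrors the idea of letting the sorted distances drive the initial assignment before routing.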
  • A Stack-based Solution for Alias Problem in Branch Prediction

    Sijie YIN, Huatao ZHANG, Takahiro WATANABE

    The 76th National Convention of IPSJ   2014 ( 1 ) 95 - 96  2014.03

    CiNii

  • Adaptive Routing with Congestion Estimation based on G-table

    Gong Zheng, Zeng Lian, Watanabe Takahiro

    2014 IEICE General Conference    2014.03

  • A Sophisticated Routing Algorithm in 3D NoC with Fixed TSVs for Low Energy and Latency

    Jiang Xin, Zeng Lian, Watanabe Takahiro

    IMT   9 ( 4 ) 404 - 412  2014

     View Summary

    With rapid progress in Integrated Circuit technologies, Three-Dimensional Network-on-Chips (3DNoCs) have become a promising solution for achieving low latency and low power. Under the constraint of the TSV number used in 3DNoCs, designing a proper routing algorithm with fewer TSVs is a critical problem for network performance improvement. In this work, we design a novel fully adaptive routing algorithm in 3D NoC. The algorithm consists of two parts: one is a vertical node assignment in inter-layer routing, which is a TSV selection scheme in a limited quantity of TSVs in the NoC architecture, and the other is a 2D fully adaptive routing algorithm in intra-layer routing, which combines the optimization of routing distance, network traffic condition and diversity of the path selection. Simulation results show that our proposed routing algorithm can achieve lower latency and energy consumption compared with other traditional routing algorithms.

    DOI CiNii

  • A sophisticated routing algorithm in 3D NoC with fixed TSVs for low energy and latency

    Xin Jiang, Lian Zeng, Takahiro Watanabe

    IPSJ Transactions on System LSI Design Methodology   7 ( 0 ) 101 - 109  2014

     View Summary

    With rapid progress in Integrated Circuit technologies, Three-Dimensional Network-on-Chips (3DNoCs) have become a promising solution for achieving low latency and low power. Under the constraint of the TSV number used in 3DNoCs, designing a proper routing algorithm with fewer TSVs is a critical problem for network performance improvement. In this work, we design a novel fully adaptive routing algorithm in 3D NoC. The algorithm consists of two parts: one is a vertical node assignment in inter-layer routing, which is a TSV selection scheme in a limited quantity of TSVs in the NoC architecture, and the other is a 2D fully adaptive routing algorithm in intra-layer routing, which combines the optimization of routing distance, network traffic condition and diversity of the path selection. Simulation results show that our proposed routing algorithm can achieve lower latency and energy consumption compared with other traditional routing algorithms.

    DOI CiNii

    Scopus

    5
    Citation
    (Scopus)
  • A Randomized Algorithm for the Fixed-Length Routing Problem

    Tieyuan Pan, Ran Zhang, Yasuhiro Takashima, Takahiro Watanabe

    2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     711 - 714  2014  [Refereed]

     View Summary

    In this paper, we propose a fixed-length routing method in Printed Circuit Board (PCB). The proposed method utilizes the simpath algorithm with a randomized graph reduction. It outputs the routing of the nets with small length-error. Its efficiency is confirmed empirically.

  • A Fast Verification System for VLSI Power Wire Widths Using LVS Output Information

    亀井智紀, 渡邊孝博, 川北真裕

    IEICE Transactions on Information and Systems (Japanese Edition)   Vol.J96-D ( 5 )  2013.05

  • An efficient algorithm for 3d NOC architecture optimization

    Xin Jiang, Ran Zhang, Takahiro Watanabe

    IPSJ Transactions on System LSI Design Methodology   6   34 - 41  2013.02

     View Summary

    With the progress of 3D IC integration technologies, 3D Networks-on-Chip (NoCs) have been proposed as a scalable and efficient solution to global communication in interconnect designs. In this work, we propose a new procedure for designing application-specific irregular 3D NoC architectures. This procedure not only accommodates the variability of highly customized SoC designs but also achieves significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. An efficient Genetic Algorithm (GA) based algorithm is applied to optimize both the topology and the floorplan. Numerical experiments are conducted on standard benchmarks, first comparing the method applied to 3D architectures with 2D designs and then comparing the architecture obtained by our proposed algorithm with both classical topologies and custom topologies. The experimental results show that the architectures produced by our design algorithm achieve greater performance improvement than those of other algorithms, and that the proposed algorithm is a time-efficient method for exploring the large solution space. © 2013 Information Processing Society of Japan.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • An Efficient Algorithm for 3D NoC Architecture Optimization

    Jiang Xin, Zhang Ran, Watanabe Takahiro

    IMT   8 ( 2 ) 254 - 261  2013

     View Summary

    With the progress of 3D IC integration technologies, 3D Networks-on-Chip (NoCs) have been proposed as a scalable and efficient solution to global communication in interconnect designs. In this work, we propose a new procedure for designing application-specific irregular 3D NoC architectures. This procedure not only accommodates the variability of highly customized SoC designs but also achieves significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. An efficient Genetic Algorithm (GA) based algorithm is applied to optimize both the topology and the floorplan. Numerical experiments are conducted on standard benchmarks, first comparing the method applied to 3D architectures with 2D designs and then comparing the architecture obtained by our proposed algorithm with both classical topologies and custom topologies. The experimental results show that the architectures produced by our design algorithm achieve greater performance improvement than those of other algorithms, and that the proposed algorithm is a time-efficient method for exploring the large solution space.

    DOI CiNii

  • Flexible L1 Cache Optimization for a Low Power Embedded System

    Huatao Zhao, Sijie Yin, Yuxin Sun, Takahiro Watanabe

    PROCEEDINGS 2013 INTERNATIONAL CONFERENCE ON MECHATRONIC SCIENCES, ELECTRIC ENGINEERING AND COMPUTER (MEC)   1   2433 - 2437  2013  [Refereed]

     View Summary

    The large power consumption of memory accesses has been one of the major bottlenecks in modern embedded systems, because caches account for about half of those systems' power consumption. It is therefore essential to concentrate on optimization strategies for cache parameters as well as on enhancing the cache's adaptability to various applications. Considering the particular applications of embedded systems, we can optimize the caches through configuration parameters such as cache size, line size, or associativity. In this paper, we first put forward the relations between those cache parameters, and the quantified results establish a new reconfigurable cache structure so that the optimal cache parameters can be found rapidly by a search algorithm. Furthermore, a possible hardware implementation with certain parameters is described, and the effectiveness of this method is verified by experiments using CACTI 6.5 and the SPEC2006 benchmarks on SimpleScalar 3.0e. Experimental results show that the proposed cache can reduce the power consumption to 38.4% of its maximum power consumption caused by the redundant hardware resources.

  • A Parallel Routing Method for Fixed Pins using Virtual Boundary

    Ran Zhang, Takahiro Watanabe

    2013 IEEE TENCON SPRING CONFERENCE     99 - 103  2013  [Refereed]

     View Summary

    In recent PCB systems, the routing for high speed board is still achieved manually. As IC technology advances rapidly, the dimensions of packages and PCBs are decreasing while the pin counts and routing layers keep increasing. In this research, a parallel routing method for fixed pins using virtual boundary is proposed, which partitions the routing area into several sub-areas and routes them separately. Applying this proposed method, the wire length can be reduced. Moreover, considering the length-matching constraints, especially for the isometric wires routing problems, the proposed method can get better wire shape resemblance.

  • A Novel Fully Adaptive Fault-tolerant Routing Algorithm for 3D Network-on-Chip

    Xin Jiang, Takahiro Watanabe

    2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON)    2013  [Refereed]

     View Summary

    In this work, we present an efficient fully adaptive fault-tolerant routing algorithm for 3D Network-on-Chip (3D NoC). The core of the algorithm first routes the packet to the destination layer using an adaptive vertical node assignment scheme in a NoC architecture with a limited number of TSVs, and then routes it to the destination node within the 2D layer through a fully adaptive routing algorithm. Instead of rerouting packets around the fault regions when a fault occurs, our proposed algorithm applies a fault detection scheme that obtains fault information one hop in advance and incorporates this information into the path computation. This algorithm can deal with multiple faults in the 3D NoC architecture. Simulation results show that our proposed routing algorithm achieves lower latency, lower energy consumption, and a higher packet arrival rate compared with other traditional routing algorithms in various network applications.

  • Adaptive Router with Predictor using Congestion Degree for 3D Network-on-Chip

    Lian Zeng, Xin Jiang, Takahiro Watanabe

    2013 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     46 - 49  2013  [Refereed]

     View Summary

    As the technology of chip multiprocessors (CMPs) evolves, the performance of 2D architectures becomes insufficient to meet various requirements, and three-dimensional integrated circuits (3D-ICs) provide an attractive solution to improve network performance by using through-silicon vias (TSVs). However, there are more transmitted packets in a 3D network, and the congestion conditions become more complex. The performance of a network depends critically on its routing algorithm. Various routing algorithms have been proposed for 3D NoCs. An adaptive routing algorithm that merges local congestion and future congestion information was proposed in [9]. However, the congestion used there is only roughly estimated, even though network performance is significantly affected by congestion. In this paper, we propose a more precise congestion measure for the predictor based on [9] and implement it in 3D NoCs. The proposed method is shown to achieve better latency and throughput than traditional routing methods such as XY routing and odd-even routing.

  • Region Oriented Routing FPGA Architecture for Dynamic Power Gating

    Ce Li, Yiping Dong, Takahiro Watanabe

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E95A ( 12 ) 2199 - 2207  2012.12  [Refereed]

     View Summary

    Dynamic power gating applicable to FPGAs can reduce power consumption effectively. In this paper, we propose a sophisticated routing architecture for a region-oriented FPGA that supports dynamic power gating. This is the first routing solution of dynamic power gating for coarse-grained FPGAs. This paper has two main contributions. First, it improves the routing resource graph and routing architecture to support special routing for a region-oriented FPGA. Second, some routing channels are made wider to avoid congestion. Experimental results show that the routing area can be reduced by 7.7% compared with the symmetric Wilton switch box in the region. Also, our proposed FPGA architecture with sophisticated P&R can reduce the power consumption of the system implemented in the FPGA.

    DOI

    Scopus

  • Rotational Display Problem for Array Reference in LSI Layout Data

    Tomoki Kamei, Takahiro Watanabe

    Proc. ITC-CSCC 2012    2012.07

  • Design and Implementation of SHA-1 Hash Function using Verilog HDL

    Suhaili Shamsiah binti, Takahiro Watanabe

    Proceedings of the 2012 IEICE General Conference DS-1-3   DS ( 1 ) s5 - s6  2012.03

  • A Parallel Routing Method using Virtual Boundary

    Ran Zhang, Takahiro Watanabe

    Proceedings of the 2012 IEICE General Conference A-3-2   A ( 3 ) 2  2012.03

  • A Time-efficient Approach to Evolve GA-based Image Filters

    Endong Ni, Takahiro Watanabe

    Proceedings of the 2012 IEICE General Conference   A ( 1 ) 33  2012.03

  • Region-Oriented Placement Algorithm for Coarse-Grained Power-Gating FPGA Architecture

    Ce Li, Yiping Dong, Takahiro Watanabe

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E95D ( 2 ) 314 - 323  2012.02  [Refereed]

     View Summary

    FPGAs play an essential role in industrial products due to their fast, stable and flexible features. However, the power consumption of FPGAs used in portable devices is a critical issue. A top-down hierarchical design method is commonly used in both ASIC and FPGA design. However, when multiple modules are integrated in an FPGA and some of them may be in sleep mode, current FPGA architectures cannot be fully effective. In this paper, a coarse-grained power-gating FPGA architecture is proposed in which the whole area of an FPGA is partitioned into several regions and the power supply is controlled for each region, so that modules in sleep mode can be effectively powered off. We also propose a region-oriented FPGA placement algorithm, based on VPR [1], that fits the user's hierarchical design. Simulation results show that the proposed method can reduce the power consumption of an FPGA by 38% on average by setting unused modules or regions to sleep mode.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Behavior-based Adaptive Access-mode for Low-power Set-associative Caches in Embedded Systems

    Jiongyao Ye, Hongfeng Ding, Yingtao Hu, Takahiro Watanabe

    Journal of information processing   20 ( 1 ) 26 - 36  2012.01

     View Summary

    Modern embedded processors commonly use a set-associative scheme to reduce cache misses. However, a conventional set-associative cache has drawbacks in terms of power consumption because it probes all ways to reduce the access time, although only the matched way is used. The energy spent in accessing the other ways is wasted, and the share of such energy grows as cache associativity increases. Previous techniques, such as phased caches, way-prediction caches and partial tag comparison, have been proposed to reduce the power consumption of set-associative caches by optimizing the cache access mode. However, these methods cannot adapt to program behavior because they use a single access mode throughout program execution. In this paper, we propose a behavior-based adaptive access mode for set-associative caches in embedded systems, which dynamically adjusts the access mode during program execution. First, a program is divided into several phases based on the principle of program behavior repetition. Then, an off-system pre-analysis determines the optimal access mode for each phase, so that each phase employs its own optimal access mode to meet the application's demand during program execution. Our proposed approach requires little hardware overhead and shifts most of the workload to software, so it is very effective for embedded processors. Simulation using SPEC 2000 shows that our proposed approach can reduce power by roughly 76.95% for an instruction cache and 64.67% for a data cache. At the same time, the performance degradation is less than 1%.

    DOI CiNii

    Scopus

    1
    Citation
    (Scopus)
  • A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction

    Jiongyao Ye, Yu Wan, Takahiro Watanabe

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E94A ( 12 ) 2639 - 2648  2011.12  [Refereed]

     View Summary

    Current trends in modern out-of-order processors involve implementing deeper pipelines and a larger instruction window to achieve high performance, which makes the penalty of branch misprediction recovery a critical factor in overall processor performance. Multipath execution has been proposed to reduce this penalty by executing both paths following a branch simultaneously. However, this mechanism has some drawbacks, such as the design complexity caused by processing both paths after a branch and the performance degradation due to hardware resource competition between the two paths. In this paper, we propose a new recovery mechanism, called Recovery Critical Misprediction (RCM), to reduce the penalty of branch misprediction recovery. The mechanism uses a small trace cache to save the decoded instructions from the alternative path following a branch. Then, during subsequent predictions, the trace cache is accessed. If there is a hit, the processor forks the second path of this branch at the rename stage, so that the design complexity in the fetch and decode stages is alleviated. The main contribution of this paper is that our proposed mechanism employs critical path prediction to identify the branches that will be most harmful if mispredicted. Only a critical branch can save its alternative path into the trace cache, which not only increases the usefulness of a limited-size trace cache but also avoids the performance degradation caused by forking non-critical branches. Experimental results using the SPECint 2000 benchmarks show that a processor with our proposed RCM improves IPC by 10.05% compared with a conventional processor.

    DOI

    Scopus

  • Low Power Placement and Routing for the Coarse-Grained Power Gating FPGA Architecture

    Ce Li, Yiping Dong, Takahiro Watanabe

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E94A ( 12 ) 2519 - 2527  2011.12  [Refereed]

     View Summary

    Since an FPGA consumes more power than an ASIC performing the same function at the same technology scaling, the application of FPGAs is limited, especially in portable electronic devices. In this paper, we propose a novel low-power FPGA architecture based on coarse-grained power gating to reduce power consumption. A new placement algorithm and a routing resource graph for sleep regions are also presented. After enhancing the CAD framework, a detailed discussion is given for the different region sizes supported by the new FPGA architecture. As a result, our proposed FPGA architecture combined with the new placement and routing algorithm can reduce the total power consumption by 19.4% compared with a traditional FPGA. With our proposed method, FPGAs are promising candidates for wide application in portable devices.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Hybrid Layer-Multiplexing and Pipeline Architecture for Efficient FPGA-based Multilayer Neural Network

    Y.P.Dong, C.Li, Z.Lin, Takahiro Watanabe

    IEICE NOLTA   E94-N ( 10 ) 522 - 532  2011.10

  • An Adaptive Various-Width Data Cache for Low Power Design

    Jiongyao Ye, Yu Wan, Takahiro Watanabe

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E94D ( 8 ) 1539 - 1546  2011.08  [Refereed]

     View Summary

    Modern microprocessors employ caches to bridge the large speed gap between main memory and the central processing unit, but these caches consume an increasingly large proportion of the total power. In fact, many values in a processor rarely need the full-bit dynamic range supported by a cache, and narrow-width values account for a large portion of cache accesses and storage. In view of these observations, this paper proposes an Adaptive Various-width Data Cache (AVDC), which reduces cache power consumption by exploiting the prevalence of narrow-width values stored in the cache. In AVDC, the data storage unit consists of three sub-arrays that store data of different widths. When the high sub-arrays are not used, they are turned off through a modified high-bit SRAM cell to save their dynamic and static power consumption. The main advantages of AVDC are: 1) Both the dynamic and static power consumption can be reduced. 2) Low power consumption is achieved by modifying the data storage unit with little additional hardware modification. 3) We exploit the redundancy of narrow-width values instead of compressed values, so cache access latency does not increase. Experimental results using the SPEC 2000 benchmarks show that our proposed AVDC reduces dynamic power by 34.83% and static power by 42.87% on average, compared with a cache without AVDC.

    DOI

    Scopus
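    The AVDC abstract above relies on the observation that many stored values fit in fewer bits than the full data width, so the high sub-arrays can stay off. The abstract does not state the sub-array widths or the detection logic, so the following is only a minimal sketch of narrow-width detection. The 8/16/32-bit boundaries, the two's-complement assumption, and the function names are all hypothetical.

```python
# Assumed cumulative widths covered by one, two, or three sub-arrays.
# These values are illustrative; the abstract does not specify them.
WIDTHS = (8, 16, 32)


def min_signed_width(value, widths=WIDTHS):
    """Smallest listed width that holds `value` as a two's-complement integer."""
    for w in widths:
        lo = -(1 << (w - 1))
        hi = (1 << (w - 1)) - 1
        if lo <= value <= hi:
            return w
    raise ValueError(f"{value} does not fit in {widths[-1]} bits")


def active_subarrays(value, widths=WIDTHS):
    """Number of sub-arrays that would need to be enabled for this value."""
    return widths.index(min_signed_width(value, widths)) + 1


needed = [active_subarrays(v) for v in (5, -128, 128, 70000)]
```

    In this sketch, a value that fits in the lowest width enables one sub-array only, which is the case where the higher sub-arrays could remain powered down.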

  • Analysis before Starting an Access: A New Power-Efficient Instruction Fetch Mechanism

    Jiongyao Ye, Yingtao Hu, Hongfeng Ding, Takahiro Watanabe

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E94D ( 7 ) 1398 - 1408  2011.07  [Refereed]

     View Summary

    Power consumption has become an increasing concern in high-performance microprocessor design. In particular, the Instruction Cache (I-Cache) contributes a large portion of the total power consumption in a microprocessor, since it is a complex unit and is accessed very frequently. Several low-power techniques have been presented for power-efficient cache design. However, these techniques usually suffer from a restriction of traditional Instruction Fetch Unit (IFU) architectures, in which the fetch address must be sent to the I-Cache as soon as it is available. Therefore, little work can be done to reduce power consumption between address generation and the start of an access. In this paper, we present a new power-aware IFU architecture, named Analysis Before Starting an Access (ABSA), which aims to maximize the power efficiency of low-power designs by eliminating the restrictions that the traditional IFU imposes on them. To achieve this goal, ABSA reorganizes the IFU pipeline and carefully assigns tasks to each stage so that sufficient time and information are available for the low-power techniques to maximize power efficiency before an access starts. The proposed design is fully scalable and its cost is low. Compared to a conventional IFU design, simulation results show that ABSA saves about 30.3% of fetch power consumption on average. The I-Cache employed by ABSA reduces static and dynamic power consumption by about 85.63% and 66.92%, respectively. Meanwhile, the performance degradation is only about 0.97%.

    DOI

    Scopus

  • High Performance Feedforward Neural Network Mapped by NoC Architecture with a New Routing Strategy Implementation Method

    Y.P.Dong, C.Li, Z.Lin, H.Zhang, Takahiro Watanabe

    J. Signal Processing   15 ( 3 ) 113 - 122  2011.03

    CiNii

  • Mixed Constrained Image Filter Design for Salt-and-pepper Noise Reduction using Genetic Algorithm

    Bao Zhiguo, Takahiro Watanabe

    IEEJ Trans.EIS   vol.131, No.3   363 - 368  2011.03

  • Efficient GA approach combined with Taguchi method for mixed constrained circuit design

    Yiwen Su, Zhiguo Bao, Fangfang Wang, Takahiro Watanabe

    Proceedings - 2011 International Conference on Computational Science and Its Applications, ICCSA 2011     290 - 293  2011

     View Summary

    This paper proposes a new circuit design optimization method in which a Genetic Algorithm (GA) with parameterized uniform crossover (GApuc) is combined with the Taguchi method. The purposes are (a) to use the Taguchi method to search for the optimal fitness value and (b) to evaluate the power and signal delay of logic blocks in circuit design so as to obtain a circuit that is optimal in complexity, power and signal delay. The present study enhances previous results by providing a much more detailed examination of mixed constrained circuit design. Experimental results show that our proposed approach can produce a good circuit in terms of both fitness function and CPU time. © 2011 IEEE.

    DOI

    Scopus

    4
    Citation
    (Scopus)
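    The GApuc abstract above uses parameterized uniform crossover. As a minimal sketch of that general operator (not the authors' implementation, and without the Taguchi step), each gene position is swapped between the two parents independently with a fixed probability. The function name, the `swap_prob` parameter, and the sample chromosomes are assumptions for illustration.

```python
import random


def parameterized_uniform_crossover(parent_a, parent_b, swap_prob=0.5, rng=None):
    """Swap each gene between the parents with probability `swap_prob`.

    With swap_prob = 0.5 this reduces to plain uniform crossover; other
    values bias the children toward or away from their own parent.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if rng.random() < swap_prob:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


# Hypothetical bit-string chromosomes.
a = [0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 1, 1]
kept = parameterized_uniform_crossover(a, b, swap_prob=0.0)
swapped = parameterized_uniform_crossover(a, b, swap_prob=1.0)
mixed = parameterized_uniform_crossover(a, b, swap_prob=0.3, rng=random.Random(1))
```

    Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing parameter settings.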
  • Speeding Up Large-Scale LSI Layout by Reducing the Number of Vias

    亀井 智紀, 安部 拓哉, 本垰 秀昭, 渡邊 孝博

    IPSJ SIG Technical Report (SLDM)   2011-SLDM-148(17)   1 - 6  2011.01

  • Fault-tolerant Image Filter Design using Particle Swarm Optimization

    Zhiguo Bao, Fangfang Wang, Xiaoming Zhao, Takahiro Watanabe

    PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 16TH '11)     653 - 658  2011  [Refereed]

     View Summary

    This paper describes mixed constrained image filter design with fault tolerance using Particle Swarm Optimization (PSO) on a reconfigurable processing array. There may be some faulty Configurable Logic Blocks (CLBs) in a reconfigurable processing array. The proposed PSO-based method autonomously synthesizes a filter fitted to a reconfigurable device with some faults, so as to optimize the complexity and power of the circuit and the signal delay in both CLBs and wires. An image filter for noise reduction is experimentally synthesized to verify the validity of our method. Through evolution, the quality of the optimized image filter on a reconfigurable device with a few faults is almost the same as that on a device with no faults.

  • A High Performance Digital Neural Processor Design by Network on Chip Architecture

    Yiping Dong, Ce Li, Hui Liu, Watanabe Takahiro

    2011 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT)     243 - 246  2011  [Refereed]

     View Summary

    This paper describes a high-performance neural processor that uses a Network on Chip (NoC) architecture to solve the interconnection and performance problems in hardware neural networks. The proposed NoC-based neural processor is composed of 20 tiles in a 4x5 2-D array, and each tile includes a Processing Element (PE) and a packet-switched router. In each PE, four neurons are implemented to achieve low communication load. The network has a 2-D torus topology, a 32 G/s bandwidth, and an asynchronous clocking system. Our proposed neural processor is designed using 90-nm CMOS technology with one poly layer and nine metal layers, and its performance is evaluated. As a result, it achieves over 3.1 G Connections Per Second (CPS) while dissipating 1.1317 W at a 1.2 V supply voltage with a 25 mm² chip area. Compared with other existing hardware neural networks, the proposed processor achieves low communication load and high performance, and it is reconfigurable and extendable.
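    The neural processor abstract above describes 20 tiles in a 4x5 array connected as a 2-D torus, with four neurons per tile. As a minimal sketch of what the torus wrap-around means for distance, the code below computes the minimal hop count between two tiles. Treating 4 as the row count and 5 as the column count, and the function name, are assumptions of this example.

```python
ROWS, COLS = 4, 5          # 4x5 tile array from the abstract (orientation assumed)
NEURONS_PER_TILE = 4       # four neurons per PE, as stated in the abstract


def torus_hops(src, dst, rows=ROWS, cols=COLS):
    """Minimal hop count between two (row, col) tiles on a 2-D torus.

    In each dimension a packet may travel directly or via the wrap-around
    link, so the per-dimension distance is the shorter of the two.
    """
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


total_neurons = ROWS * COLS * NEURONS_PER_TILE
corner_to_corner = torus_hops((0, 0), (3, 4))
diameter = max(
    torus_hops((0, 0), (r, c)) for r in range(ROWS) for c in range(COLS)
)
```

    On a plain 4x5 mesh the opposite corners would be 7 hops apart; the wrap-around links are what shorten that path in a torus.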

  • New power-efficient FPGA design combining with region-constrained placement and multiple power domains

    Ce Li, Yiping Dong, Takahiro Watanabe

    2011 IEEE 9th International New Circuits and Systems Conference, NEWCAS 2011     69 - 72  2011

     View Summary

    Multiple power domain architectures have been studied for power-efficient FPGAs. However, most of this research focuses on fine-grain power gating of clustered logic blocks, which increases the FPGA size significantly. This paper presents a fast placement algorithm for a coarse-grain FPGA architecture, by which a circuit with multiple power domains is mapped into several regions for low power consumption. Each region uses one or several sleep transistors in order to conserve leakage energy. Using the CAD framework, we discuss the power efficiency of the sleep-region FPGA architecture with benchmarks that assume multiple power domains. Simulation results show that the proposed placement algorithm reduces FPGA power consumption by 9.1% on average compared to the traditional algorithm. Furthermore, when the dual power domains are individually powered on and off, our proposed method can reduce power by more than 20%. © 2011 IEEE.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • New power-aware placement for region-based FPGA architecture combined with dynamic power gating by PCHM

    Ce Li, Yiping Dong, Takahiro Watanabe

    Proceedings of the International Symposium on Low Power Electronics and Design     223 - 228  2011

     View Summary

    An FPGA consumes more power than an ASIC performing the same function at the same technology scaling. In this paper, we propose a coarse-grained power-gating FPGA architecture based on a Power Control Hard Macro (PCHM) to dynamically reduce power consumption. A placement algorithm based on sleep regions is presented. After enhancing the CAD framework, a detailed study is given for the different region sizes supported by the new FPGA architecture. As a result, the proposed architecture and placement algorithm reduce power consumption by 51% on average compared with a normal architecture. © 2011 IEEE.

    DOI

    Scopus

    11
    Citation
    (Scopus)
  • An efficient design algorithm for exploring flexible topologies in custom adaptive 3D NoCs for high performance and low power

    Xin Jiang, Ran Zhang, Takahiro Watanabe

    Proceedings of International Conference on ASIC     535 - 538  2011

    The application of 3D Networks-on-Chip (NoCs) has proved to be an effective solution to global communication in 3D IC integration, and the design of NoC topologies plays a critical role in increasing interconnection performance. In this work, we propose a new procedure for designing application-specific irregular 3D NoC topologies that achieve significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. We propose a two-stage design model based on a series of efficient algorithms to explore the optimized topology in a large-scale search space. Numerical experimental results show that the topologies produced by our design algorithm achieve about 31.5% more performance improvement than the classical topologies, and that the proposed algorithm is also a time-efficient method for exploring the large solution space. © 2011 IEEE.

    Citations (Scopus): 1
  • Development of customizable Rip-up IP MIX and WIPER2.0

    李 美燕, 王 嘉宇, 渡邊孝博

    63rd Joint Conference of Electrical, Electronics and Information Engineers in Kyushu   02-1P-02  2010.09

  • Proposal of a low-latency routing algorithm for Network-on-Chip

    李 岩, 林 しん, 董 宜平, 渡邊孝博

    63rd Joint Conference of Electrical, Electronics and Information Engineers in Kyushu   10-2A-08  2010.09

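  • A multilayer routing method for parallel equal-length wiring

    張 然, 渡邊孝博

    63rd Joint Conference of Electrical, Electronics and Information Engineers in Kyushu   10-2A-07  2010.09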

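  • A method for high-performance hardware implementation of NoC routing algorithms

    張 華, 董 宜平, 渡邉孝博

    63rd Joint Conference of Electrical, Electronics and Information Engineers in Kyushu   10-2A-09  2010.09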

  • Circuit Design Using Genetic Algorithm combined with Taguchi method and Particle Swarm Optimization

    YiWen Su, Zhiguo Bao, Kuoyang Tu, Takahiro Watanabe

    63rd Joint Conference of Electrical, Electronics and Information Engineers in Kyushu   12-1A-04  2010.09

  • Power-efficient Level-2 Cache Design for Embedded Processors

    Mengyuan Tang, Jiongyao Ye, Takahiro Watanabe

    63rd Joint Conference of Electrical, Electronics and Information Engineers in Kyushu   12-1A-01  2010.09

  • A Novel Low Power FPGA Architecture

    Li Ce, Watanabe Takahiro

    Proc. FIT2010 (Forum on Information Technology)   1 ( RC002 )  2010.09

  • Multiple Network-on-Chip Model for High Performance Neural Network

    Yiping Dong, Ce Li, Zhen Lin, Takahiro Watanabe

    JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE   10 ( 1 ) 28 - 36  2010.03  [Refereed]

    Hardware implementation methods for Artificial Neural Networks (ANNs) have been researched for a long time to achieve high performance. We have proposed a Network on Chip (NoC) for ANNs, and this architecture can reduce communication load and increase performance when the implemented ANN is small. In this paper, multiple NoC models are proposed for ANNs, which can implement both small and large ANNs. The simulation results show that the proposed multiple NoC models can reduce communication load, increase system performance in connection-per-second (CPS), and reduce system running time compared with existing hardware ANNs. Furthermore, this architecture is reconfigurable and reparable, and can be used to implement different ANN applications.

  • Circuit Design Optimization Using Genetic Algorithm with Parameterized Uniform Crossover

    Zhiguo Bao, Takahiro Watanabe

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E93A ( 1 ) 281 - 290  2010.01  [Refereed]

    Evolvable hardware (EHW) is a new research field concerning the use of Evolutionary Algorithms (EAs) to construct electronic systems. In a narrow sense, EHW refers to the use of evolutionary mechanisms as the algorithmic drivers for system design; in a general sense, it refers to the capability of a hardware system to develop and improve itself. The Genetic Algorithm (GA) is a typical EA. We propose optimal circuit design using a GA with parameterized uniform crossover (GApuc) and a fitness function composed of circuit complexity, power, and signal delay. Parameterized uniform crossover is much more likely to distribute its disruptive trials in an unbiased manner over larger portions of the space, so it has more exploratory power than one-point and two-point crossover and gives more chances of finding better solutions. Its effectiveness is shown by experiments. The results show that the best elite fitness, the average fitness of the correct circuits, and the number of correct circuits of GApuc are better than those of GA with one-point or two-point crossover. The best optimal circuit generated by GApuc is 10.18% and 6.08% better in evaluation value than those generated by GA with one-point crossover and two-point crossover, respectively.
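
    As a rough illustration of the crossover operator named in this abstract, the Python sketch below swaps each gene position between two parents independently with a fixed probability. The function name, the list-of-genes encoding, and the `p_swap` parameter are assumptions made for this example; the paper's circuit encoding and parameter values are not given here.

```python
import random


def parameterized_uniform_crossover(parent_a, parent_b, p_swap=0.5, rng=None):
    """Return two children of equal-length parents.

    Each gene position is exchanged between the parents independently
    with probability p_swap (a hypothetical parameter name).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        # rng.random() is in [0.0, 1.0), so p_swap=0 never swaps
        # and p_swap=1 always swaps.
        if rng.random() < p_swap:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

    With `p_swap` set to 0.5 this reduces to ordinary uniform crossover; smaller values keep each child closer to one parent.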

    Citations (Scopus): 8
  • Mixed constrained image filter design using particle swarm optimization

    Zhiguo Bao, Takahiro Watanabe

    Artificial Life and Robotics   15 ( 3 ) 363 - 368  2010

    This article describes an evolutionary image filter design for noise reduction using particle swarm optimization (PSO), where mixed constraints on the circuit complexity, power, and signal delay are optimized. First, the evaluated values of correctness, complexity, power, and signal delay are introduced to the fitness function. Then PSO autonomously synthesizes a filter. To verify the validity of our method, an image filter for noise reduction was synthesized. The performance of the resultant filter by PSO was similar to that of a genetic algorithm (GA), but the running time of PSO is 10% shorter than that of GA. © 2010 International Symposium on Artificial Life and Robotics (ISAROB).

    Citations (Scopus): 5
  • High performance Implementation of Neural Networks by Networks on Chip with 5-Port 2-Virtual Channels

    Yiping Dong, Zhen Lin, Yan Li, Takahiro Watanabe

    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS     381 - 384  2010  [Refereed]

    A hardware implementation of an Artificial Neural Network (ANN) is proposed using a Network on Chip (NoC) with 5-port, 2-virtual-channel routers, aiming at higher performance and lower latency. Experimental results from the NIRGAM NoC simulator show that the proposed system has higher Connection-Per-Second (CPS), higher Connection-Per-Second-Per-Weight (CPSPW), and lower communication load. Furthermore, this NoC implementation is reconfigurable and expandable, so it can be applied to various applications.

  • A novel hardware method to implement a routing algorithm onto network on chip

    Yiping Dong, Hua Zhang, Zhen Lin, Takahiro Watanabe

    2010 International Conference on Communications, Circuits and Systems, ICCCAS 2010 - Proceedings     852 - 856  2010

    Recently, Network on Chip (NoC) has attracted much attention for its smart structure and high performance. However, NoC routing algorithms significantly influence performance and design cost. In this paper, a new hardware method to implement a routing algorithm is proposed. The proposed method replaces the general destination-tag method for router design. We simulate and evaluate the router and NoC with the proposed method in terms of circuit resources, latency, and throughput. The results indicate that the NoC architecture with the proposed method is effective in reducing circuit resources and latency and in increasing throughput. © 2010 IEEE.

  • High performance networks on chip architecture with a new routing strategy for neural network

    Yiping Dong, Zhen Lin, Takahiro Watanabe

    PrimeAsia 2010 - 2nd Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics     347 - 350  2010

    Hardware implementation of Artificial Neural Networks (ANNs) by Networks on Chip (NoC) has previously been proposed. In this work, a new NoC architecture with a hardware implementation of the routing algorithm is proposed for ANN design. This routing strategy can reduce the packet header size. The NOXIM NoC simulator is used to simulate the proposed system in terms of latency, throughput, and power consumption. The experimental results indicate that the proposed NoC architecture is effective in increasing throughput and reducing latency and power consumption compared with the traditional one. An ANN with the new NoC architecture can achieve higher performance and lower communication load.

  • A Variable Bitline Data Cache for low power design

    Jiongyao Ye, Takahiro Watanabe

    PrimeAsia 2010 - 2nd Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics     174 - 177  2010

    Reducing power consumption is one of the most important design problems at present. Modern microprocessors employ caches to bridge the large speed gap between main memory and the central processing unit, but these caches account for a larger and larger proportion of total power consumption. In fact, many values rarely need the full-bit dynamic range supported by a cache. Narrow-Width Values (NWVs) occupy a large portion of cache accesses and storage. It is unreasonable that the cache allocates the same storage space to a value of any data width, even if an NWV needs only a few bits to be stored. This paper proposes a Variable Bitline Data Cache (VBDC) that exploits the prevalence of NWVs stored in the cache. In the VBDC design, the cache data array is divided into several sub-arrays so that each data pattern is accessed with a different bitline length. The VBDC can shut off the corresponding unused high arrays to reduce dynamic and static power consumption. The VBDC achieves low power consumption by reducing the bitline length. Experimental results on SPEC 2000 benchmarks show that the proposed VBDC reduces dynamic and static power consumption by 44.75% and 42.86%, respectively.
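
    To make the narrow-width idea concrete, the Python sketch below counts how many low-order sub-arrays a signed value would need if the upper ones could be shut off and recovered by sign extension. The 32-bit word, the 8-bit sub-array width, and both function names are assumptions for illustration only; the abstract does not state the actual VBDC partitioning.

```python
WORD_BITS = 32      # assumed machine word width
SUBARRAY_BITS = 8   # assumed width of one data sub-array


def significant_bits(value):
    """Bits needed to store a signed integer in two's complement,
    including the sign bit."""
    if not -(1 << (WORD_BITS - 1)) <= value < (1 << (WORD_BITS - 1)):
        raise ValueError("value does not fit in the assumed word width")
    # For negative values, ~value has the same number of magnitude bits.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1


def active_subarrays(value):
    """Number of low-order sub-arrays that must stay powered for value;
    the remaining high sub-arrays hold only sign-extension bits."""
    bits = significant_bits(value)
    return -(-bits // SUBARRAY_BITS)  # ceiling division
```

    Under these assumptions a value such as 100 or -128 needs one sub-array, while a full-range value such as 2**31 - 1 needs all four.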

    Citations (Scopus): 2
  • Fault-tolerant Image Filter Design using GA

    Zhiguo Bao, Fangfang Wang, Xiaoming Zhao, Takahiro Watanabe

    TENCON 2010: 2010 IEEE REGION 10 CONFERENCE     897 - 902  2010  [Refereed]

    This paper describes fault-tolerant, mixed-constrained image filter design using a Genetic Algorithm (GA) on a reconfigurable processing array. Some Configurable Logic Blocks (CLBs) in a reconfigurable processing array may be faulty at random. The proposed GA-based method autonomously synthesizes a filter fitted to a reconfigurable device with some faults, evaluating the complexity, power, and signal delay of both CLBs and wires. An image filter for noise reduction is experimentally synthesized to verify the validity of our method. Through evolution, the quality of the optimized image filter on a reconfigurable device with a few faults is almost the same as that with no faults.

  • An Efficient Hardware Routing Algorithms for NoC

    Yiping Dong, Zhen Lin, Takahiro Watanabe

    TENCON 2010: 2010 IEEE REGION 10 CONFERENCE     1525 - 1530  2010  [Refereed]

    Networks on Chip (NoC) has been widely discussed for its smart structure and high performance. Routing algorithms significantly influence design cost and system performance of NoC. In this paper, a new hardware method called Final-Destination-Tag (FDT) is proposed to improve the original Destination-Tag (DT) method for implementing different routing algorithms. Compared with the DT method, the proposed FDT method could reduce the header size of the packet. We evaluate NoC with this proposed method in terms of circuit resource, average latency, max latency, average throughput and power consumption. The results indicate that the proposed method is effective in increasing throughput and reducing circuit resource, latency and power consumption for NoC.

  • An Efficient 3D NoC Synthesis by Using Genetic Algorithms

    Xin Jiang, Takahiro Watanabe

    TENCON 2010: 2010 IEEE REGION 10 CONFERENCE     1207 - 1212  2010  [Refereed]

    The application of 3D Network on Chip (NoC) provides an effective way of tackling the performance bottleneck of high-performance Systems on Chips (SoCs). How to design an efficient 3D NoC that satisfies the communication requirements of a 3D system and simultaneously enables significant performance enhancements has attracted a lot of attention. In this paper, we focus on the automatic design of custom NoC architectures using a novel approach. The synthesis develops a minimum-cost topology and an optimized floorplan to decrease power consumption under hardware and software constraints. Different algorithms are used to solve the sub-problems. In the core-to-switch connectivity stage, we first use Tarjan's algorithm to find the strongly connected parts of the core communication graph, and then use the min-cut algorithm to partition the core communication graph into sub-graphs. To establish the switch-to-switch connections, we apply a Genetic Algorithm (GA) to path computation and flow control. Finally, we use a GA to solve the switch position problem. Optimized switch positions in the floorplan that minimize power consumption are obtained while meeting the non-overlapping constraints. The experimental results show that the proposed synthesis approach is efficient and saves much power in NoC design.
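
    The first step of the core-to-switch stage, finding strongly connected parts of a directed graph with Tarjan's algorithm, can be sketched in Python as follows. The dictionary-of-successors graph representation and the function name are assumptions for this example, and the later min-cut and GA stages of the paper are not shown.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm.

    graph maps each node to an iterable of its successors.
    Returns a list of components, each a list of nodes.
    Recursive, so intended for small graphs only.
    """
    index_of = {}      # discovery order of each visited node
    lowlink = {}       # smallest index reachable from the node's subtree
    stack = []         # nodes of components still being built
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index_of:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index_of[succ])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop it off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index_of:
            visit(node)
    return components
```

    For a hypothetical communication graph in which cores c0, c1, and c2 send to each other in a cycle, those three cores come back as a single component and would be natural candidates for sharing a switch.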

  • A Hybrid Architecture for Efficient FPGA-based Implementation of Multilayer Neural Network

    Zhen Lin, Yiping Dong, Yan Li, Takahiro Watanabe

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     616 - 619  2010  [Refereed]

    This paper presents a novel architecture for the FPGA-based implementation of multilayer neural network (NN), which integrates the layer-multiplexing and pipeline architecture together. The proposed method is aimed at enhancing the efficiency of resource usage and improving the forward speed at the module level, so that a larger NN can be implemented on commercial FPGAs. We developed a mapping method from NN schematic to physical architecture in FPGA by using the hybrid architecture, and also developed an algorithm to automatically determine the architecture by optimizing the application specific neural network topology. The experimental results with several different network topologies show that the proposed architecture can produce a very compact circuit with higher speed, compared with conventional methods.

  • A novel genetic algorithm with different structure selection for circuit design optimization

    Zhiguo Bao, Takahiro Watanabe

    Artificial Life and Robotics   14 ( 2 ) 266 - 270  2009.11

    In the traditional GA, the tournament selection for crossover and mutation is based on the fitness of individuals. This can make convergence easy, but some useful genes may be lost. In selection, as well as fitness, we consider the different structure of each individual compared with an elite one. Some individuals are selected with many different structures, and then crossover and mutation are performed from these to generate new individuals. In this way, the GA can increase diversification into search spaces so that it can find a better solution. One promising application of GA is evolvable hardware (EHW), which is a new research field to synthesize an optimal circuit. We propose an optimal circuit design by using a GA with a different structure selection (GAdss), and with a fitness function composed of circuit complexity, power, and signal delay. Its effectiveness is shown by simulations. From the results, we can see that the best elite fitness, the average fitness value of correct circuits, and the number of correct circuits with GAdss are better than with GA. The best case of optimal circuits generated by GAdss is 8.1% better in evaluation value than that by traditional GA. © 2009 International Symposium on Artificial Life and Robotics (ISAROB).
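
    One way to picture selection by structural difference is to rank individuals by how many genes differ from the elite and keep the most different ones as parents. The Python sketch below does this with Hamming distance; the distance measure, the function names, and the parameter `k` are assumptions for illustration, since the abstract does not specify how GAdss measures structural difference or combines it with fitness.

```python
def hamming_distance(a, b):
    """Number of gene positions at which two equal-length individuals differ."""
    if len(a) != len(b):
        raise ValueError("individuals must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_most_different(population, elite, k):
    """Return the k individuals whose structure differs most from the elite.

    Ties keep their original population order because sorted() is stable.
    """
    ranked = sorted(
        population,
        key=lambda individual: hamming_distance(individual, elite),
        reverse=True,
    )
    return ranked[:k]
```

    The selected individuals would then go through crossover and mutation as in an ordinary GA, which is how the added diversity enters the search.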

    Citations (Scopus): 2
  • A Study of Customized Processor IP Design using WIPER

    Y. Wan, J. Ye, M. Bi, T. Watanabe

    Proc. PrimeAsia’09    2009.11

  • P/G network design to optimize area, performance and power consumption

    Y. Shi, Z. Bao, Y. Wang, X. Zuojun, T. Watanabe

    Proc. PrimeAsia’09    2009.11

  • A new flexible network on chip architecture for mapping complex feedforward neural network

    Y. Dong, C. Li, K. Kumai, Y. Li, Y.Wang, T.Watanabe

    Journal of Signal Processing   13 ( 6 ) 453 - 462  2009.11

  • Reducing Branch Misprediction Penalty in Superscalar Microprocessors by Recovering

    Ye Jiongyao, Wan Yu, Dong Yiping, Bao Zhiguo, Watanabe Takahiro

    Proc. FIT2009 (Forum on Information Technology2009)   1 ( RC-002 ) 121 - 128  2009.09

  • Low power and high speed network on chip architecture for bp neural network

    Y. P. Dong, Y. H. Li, Y. Wang, T. Watanabe

    Proc. ITC-CSCC’09    2009.07

  • An effective method to reduce recovery cache size by using hash table search

    JiongYao Ye, T. Watanabe

    Proc. ITC-CSCC2009    2009.07

  • A novel GA with multi-level evolution for mixed constrained circuit design optimization

    Zhiguo Bao, Takahiro Watanabe

    Proc.NCSP 2009 (RISP Int'l Workshop on Nonlinear Circuits and Signal Processing)     411 - 414  2009.03

  • Mixed NoC architecture for mapping complex feedforward neural network

    Yiping Dong, Takahiro Watanabe

    Proc.NCSP 2009 (RISP Int'l Workshop on Nonlinear Circuits and Signal Processing)     609 - 612  2009.03

  • A novel genetic algorithm with different structure selection for circuit optimization

    Zhiguo Bao, Takahiro Watanabe

    Proc. 14th AROB (Int'l Symposium on Artificial Life and Robotics)     218 - 222  2009.02

  • A Novel Genetic Algorithm with Cell Crossover for Circuit Design Optimization

    Zhiguo Bao, Takahiro Watanabe

    ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5     2982 - 2985  2009  [Refereed]

    Evolvable Hardware (EHW) is a new field concerning the use of Evolutionary Algorithms (EAs) to synthesize circuits. The Genetic Algorithm (GA) is a typical EA. In traditional GA, the crossover is one-point or two-point crossover. These operators change too many genes of an individual at one time and are not flexible, so some useful genes may be lost. In this paper, we propose a novel cell crossover. Cell crossover can change genes more flexibly and brings more diversification to the search space than one-point and two-point crossover, so that a better solution can be found. We propose optimal circuit design using GA with cell crossover (GAcc) and a fitness function composed of circuit complexity, power, and signal delay. Simulation results show that GAcc is superior to traditional GA in terms of the best elite fitness, the average fitness of correct circuits, and the number of correct circuits. The best optimal circuit generated by GAcc is 27.9% better in evaluation value than that generated by GA with one-point crossover.

  • High Performance and Low Latency Mapping for Neural Network into Network on Chip Architecture

    Yiping Dong, Yang Wang, Zhen Lin, Takahiro Watanabe

    2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS     891 - 894  2009  [Refereed]

    Various hardware implementations of neural networks have been well studied in recent years. We have already proposed a hardware implementation method for neural networks with a Network on Chip (NoC) architecture. The mapping of a neural network onto an NoC must be tuned to achieve high performance whenever the neural network application changes, so a different mapping method is needed every time and tedious work is required. In this paper, we propose a general mapping strategy based on three rules. The mapping method with this strategy can implement different neural network applications with the NoC architecture. The simulation results show that the proposed method gives the system low latency and high performance.

  • Evolutionary Design for Image Filter using GA

    Zhiguo Bao, Takahiro Watanabe

    TENCON 2009 - 2009 IEEE REGION 10 CONFERENCE, VOLS 1-4     164 - 169  2009  [Refereed]

    This paper describes evolutionary image filter design for noise reduction using a Genetic Algorithm (GA), where circuit complexity, power, and signal delay are optimized. First, evaluation values for correctness, complexity, power, and signal delay are introduced into the fitness function. Then the GA autonomously synthesizes a circuit that is simple and performs well. To verify the effectiveness of our method, an image filter for noise reduction is experimentally synthesized. The resulting image filter and the quality of the filtered image are discussed.

  • High Dependable Implementation of Neural Networks with Networks on Chip Architecture and a Backtracking Routing Algorithm

    Yiping Dong, Kento Kumai, Zhen Lin, Yinghe Li, Takahiro Watanabe

    2009 ASIA PACIFIC CONFERENCE ON POSTGRADUATE RESEARCH IN MICROELECTRONICS AND ELECTRONICS (PRIMEASIA 2009)     404 - +  2009  [Refereed]

    Networks on Chip (NoC), a new packet-based design method, combined with a new Dependable No Deadlock (DND) backtracking routing algorithm, are proposed to implement an Artificial Neural Network (ANN). The system is simulated with the NIRGAM NoC simulator to measure system performance. Experimental results show that the proposed system has higher Connection-Per-Second (CPS) and lower communication load than other existing ANN implementations. Furthermore, this NoC implementation is reconfigurable and expandable. In addition, this implementation method is more dependable than our former NoC-implemented ANN system.

  • A low-power misprediction recovery mechanism

    Jiongyao Ye, Takahiro Watanabe

    2009 ASIA PACIFIC CONFERENCE ON POSTGRADUATE RESEARCH IN MICROELECTRONICS AND ELECTRONICS (PRIMEASIA 2009)     209 - 212  2009  [Refereed]

    In modern superscalar processors, the branch misprediction penalty is a critical factor in overall processor performance. Previous research proposed dual-path (or multi-path) execution methods that attempt to reduce the misprediction penalty, but these methods are quite complex and consume much power, mostly because they simultaneously fetch and execute instructions from multiple paths. In this paper, we reduce branch misprediction penalties while balancing complexity, power, and performance. We present a novel technique, the Decode Recovery Cache (DRC), for reducing the misprediction penalty while taking complexity and power consumption into account. The DRC stores decoded instructions that are mispredicted. Then, during subsequent mispredictions, a hit in the DRC can reduce the pipeline re-fill time and eliminate instruction re-fetch and its subsequent decoding. Bypassing both re-fetching and re-decoding reduces processor power. Experimental results on the SPECint 2000 benchmarks show that, with a DRC, the IPC is improved by 10.4% on average over traditional processors and average power consumption is reduced by 62.6% compared with dual Path Instruction Processing.

  • An Adaptive Width Data Cache for Low Power Design

    Jiongyao Ye, Takahiro Watanabe

    2009 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2009)     488 - 491  2009  [Refereed]

    Reducing power consumption is one of the most important design problems at present. Modern microprocessors employ caches to bridge the large speed gap between main memory and the central processing unit, but these caches account for a larger and larger proportion of total power consumption. Narrow-Width Values (NWVs) occupy a large portion of cache accesses and storage. The cache allocates the same storage space to a value of any data width, even if an NWV needs only a few bits to be stored. This paper proposes an Adaptive Width Data Cache (AWDC) that exploits the prevalence of NWVs stored in the cache. In AWDC, the cache data array is divided into several data arrays so that data of different widths can be accessed and stored. Its purpose is to shut off the corresponding unused high arrays to reduce dynamic and static power consumption. AWDC achieves low power consumption only by modifying the high-bit SRAM unit, with almost no additional hardware, and does not affect cache performance. Experimental results on SPEC 2000 benchmarks show that the proposed AWDC reduces dynamic and static power consumption by 44.75% and 42.86%, respectively.

  • High performance autoassociative neural network using network on chip

    Yiping Dong, Zhen Lin, Takahiro Watanabe

    2009 1st International Conference on Information Science and Engineering, ICISE 2009     4015 - 4018  2009

    In this paper, an Artificial Autoassociative Neural Network (AANN) is implemented with a Network on Chip (NoC) architecture to solve communication and performance problems. The proposed NoC-based system can map four neurons onto one PE, and the whole system consists of PEs, each of which connects to a router. The system is reconfigurable and extendable, so it can easily be adapted to different applications. Simulation results show that the proposed implementation method can reduce communication load and total computation time. ©2009 IEEE.

  • A study on speeding up branch recovery in superscalar processors

    白馬成, 叶炯耀, 高芳, 渡邊孝博

    IEICE Society Conference    2008.09

  • Power Consideration Multilevel Partitioning Using Voltage Islands

    Wang Wei, Lin Tao, Watanabe Takahiro

    FIT2008    2008.09

  • Rapid Design of a Multiprocessor System for a JPEG Decoder on FPGA

    Cao Dawei, Chen Keyan, Watanabe Takahiro

    FIT2008    2008.09

  • Network on Chips Structure for Mapping Two Hidden Layers BP-ANNs

    Yiping Dong, Takahiro Watanabe

    Proc. 23rd Int'l Tech. Conf. on Circuits/Systems, Computers and Communications (ITC-CSCC2008)     601 - 604  2008.07

  • Recovery Scheme to Reduce Latency of Miss-Prediction for Superscalar Processor using L1 Recovery Cache

    JiongYao Ye, Takahiro Watanabe

    Proc. 23rd ITC-CSCC     233 - 236  2008.07

  • A study of chip multiprocessors using FPGA and SoftCore

    姜洋, 李策, 陳科研, 曹大為, 渡邊孝博

    IEICE General Conference    2008.03

  • A solution to the voltage-island partitioning problem for very-large-scale circuits using multilevel hypergraphs

    林涛, 王偉, 渡邊孝博

    IEICE General Conference    2008.03

  • A power-aware routing method for Network-on-Chip

    白秀君, 佐藤清久, 渡邊孝博

    IEICE General Conference    2008.03

  • Proposal of a power reduction method for on-chip routers using packet position information

    佐藤清久, 白秀君, 渡邊孝博

    IEICE General Conference    2008.03

  • A multiprocessor system for a small size soccer robot control system

    Ce Li, Yang Jiang, Zhenyu Wu, Takahiro Watanabe

    DELTA 2008: FOURTH IEEE INTERNATIONAL SYMPOSIUM ON ELECTRONIC DESIGN, TEST AND APPLICATIONS, PROCEEDINGS     115 - +  2008  [Refereed]

    In this paper, a new fully digitized hardware design scheme of a soccer robot controller is presented as an application of a multiprocessor system. It is designed and implemented on one-chip FPGA with two embedded Nios II processors to verify the effectiveness of our system. In the practical test, the system is dependable, and has the characteristics of fast response and high precision. It also has the advantages of smaller PCB area, less chip number and shorter development period.

    Citations (Scopus): 3
  • Network on Chip architecture for BP Neural Network

    Yiping Dong, Watanabe Takahiro

    2008 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS PROCEEDINGS, VOLS 1 AND 2     1083 - 1087  2008  [Refereed]

    Recently, Networks-on-Chips (NoCs) have developed greatly and have been proposed as a promising solution to complex on-chip communication problems. One such problem is the application of Artificial Neural Networks (ANNs). In this paper, we propose NoCs for ANNs. The NoCs are designed to implement back-propagation ANNs (BP-ANNs) and are evaluated by Network-on-Chips. Experimental results show that the proposed design achieves a great reduction in communication load and a high connection per second (CPS) compared with traditional BP-ANNs. It is also reconfigurable, expandable, and stable, so it can address various problems.

  • A New Approach for Circuit Design Optimization using Genetic Algorithm

    Zhiguo Bao, Takahiro Watanabe

    ISOCC: 2008 INTERNATIONAL SOC DESIGN CONFERENCE, VOLS 1-3     383 - 386  2008  [Refereed]

    A circuit designed by humans often results in very complex hardware architectures, requiring a large amount of manpower and computational resources. A wider objective is to find novel solutions for designing such complex architectures so that system functionality and performance are not compromised. Design automation using reconfigurable hardware and Evolutionary Algorithms (EAs), such as the Genetic Algorithm (GA), is one of the methods to tackle this issue. This concept applies the notion of Evolvable Hardware (EHW) to problem domains such as novel design solutions and circuit optimization. EHW is a new field concerning the use of EAs to synthesize circuits. An EA manipulates a population of individuals, where each individual describes how to construct a candidate for a good circuit. Each circuit is assigned a fitness, which indicates how well the candidate satisfies the design specification. The EA applies stochastic operators repeatedly to evolve new circuit configurations from existing ones, so that the resultant circuit configuration exhibits desirable behavior. In this paper, optimum circuit design using GA with a fitness function composed of circuit complexity, power, and time delay is proposed, and its effectiveness is shown by simulations.

  • High Performance NoC Architecture for two hidden layers BP Neural Network

    Yiping Dong, Watanabe Takahiro

    ISOCC: 2008 INTERNATIONAL SOC DESIGN CONFERENCE, VOLS 1-3     269 - 272  2008  [Refereed]

    Artificial Neural Networks (ANNs) are widely used in intelligent-system applications such as pattern recognition, fuzzy systems, optimization, and control. We have already proposed a novel NoC architecture for different kinds of BP-ANNs [1][2], and it was shown that the architecture is a promising hardware implementation for neural networks. However, some problems still remain to be solved. One of them is performance. In this paper, we propose another NoC architecture, network topology, and routing strategy for higher performance. Experimental results from an NoC simulator show that the new architecture and routing strategy reduce the communication load, reduce latency by 7.7% and dynamic power consumption by 10.3%, and improve throughput by 8.1%, all compared with the previous one.

  • Score sequence pair problems of (r(11), r(12), r(22))-tournaments - Determination of realizability

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E90D ( 2 ) 440 - 448  2007.02  [Refereed]

     View Summary

    Let G be any graph with property P (for example, a general graph or a directed graph) and let S be one or more nonnegative, non-decreasing integer sequences. The prescribed degree sequence problem is to determine whether there is a graph G having S as the prescribed sequence(s) of degrees or outdegrees of its vertices. Since the 1950s, this problem has attracted wide attention, and many extensions of it have been considered. Let P be the property satisfying the following (1) and (2):
    (1) G is a directed graph with two disjoint vertex sets A and B.
    (2) There are r(11) (r(22), respectively) directed edges between every pair of vertices in A (B), and r(12) directed edges between every pair consisting of a vertex in A and a vertex in B.
    Then G is called an (r(11), r(12), r(22))-tournament ("tournament", for short). The problem is called the score sequence pair problem of a "tournament", and S is called a score sequence pair of a "tournament" (or S is realizable) if the answer to the problem is "yes." In this paper, we propose characterizations of a score sequence pair of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable.
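
    The abstract does not state the characterization itself, so the sketch below shows only the classical single-set special case (an ordinary tournament with one directed edge per vertex pair), for which Landau's criterion decides realizability. The function name is hypothetical, and this is not the paper's algorithm for two vertex sets.

```python
from math import comb


def is_tournament_score_sequence(scores):
    """Landau's criterion for an ordinary tournament (one edge per vertex pair).

    Assumption: this covers only the classical single-set case, not the
    (r11, r12, r22) generalization discussed in the abstract.
    """
    ordered = sorted(scores)  # the criterion needs a non-decreasing sequence
    prefix = 0
    for k, value in enumerate(ordered, start=1):
        prefix += value
        # Any k vertices play C(k, 2) games among themselves,
        # so together they must win at least that many.
        if prefix < comb(k, 2):
            return False
    # All C(n, 2) games must be accounted for exactly.
    return prefix == comb(len(ordered), 2)
```

    With input that is already sorted, the check is a single pass over the sequence, which is the flavor of linear-time test the abstract refers to.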

    DOI

    Scopus

  • Construction of an (r(11), r(12), r(22))-tournament from a score sequence pair

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     3403 - +  2007  [Refereed]

     View Summary

    Let G be any directed graph and let S be one or more nonnegative, non-decreasing integer sequences. The prescribed degree sequence problem is to determine whether there is a graph G with S as the prescribed sequence(s) of outdegrees of its vertices. Suppose G satisfies the following (1) and (2):
    (1) G has two disjoint vertex sets A and B.
    (2) For every vertex pair u, v ∈ G (u ≠ v), G satisfies
    $|\{uv\}| + |\{vu\}| = \begin{cases} r_{11} & \text{if } u, v \in A \\ r_{12} & \text{if } u \in A,\ v \in B \\ r_{22} & \text{if } u, v \in B \end{cases}$
    where uv (vu, respectively) denotes a directed edge from u to v (from v to u).
    Then G is called an (r(11),r(12),r(22))-tournament ("tournament", for short). When G is a "tournament," the prescribed degree sequence problem is called the score sequence pair problem of a "tournament", and S is called a score sequence pair of a "tournament" (or S is realizable) if the answer is "yes."
    We previously proposed characterizations of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable [5]. In this paper, we propose an algorithm for constructing a "tournament" from such a score sequence pair.

  • Realizability of score sequence pair problem of an (r11,r12,r22)-tournament

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    Proc. IEEE APCCAS,Dec.2006     1021 - 1024  2006.12

  • A Consideration of the Score Sequence Pair Problems of (r11,r12,r22)-Tournaments

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    Proc.Int'l Mathematical Conference-Topics in Mathematical Analysis and Graph Theory,Magt Belgrade 2006     50 - 51  2006.09

  • Customizable IP for microprocessors using FPGAs

    北島圭祐, 渡邊孝博

    IPSJ Kyushu Chapter, Hinokuni Information Symposium 2006   Paper No. C-5-3  2006.03

  • A study of hierarchical circuit partitioning using 2-3 trees

    朱小松, 渡邊孝博

    IPSJ Kyushu Chapter, Hinokuni Information Symposium 2006   Paper No. C-5-4  2006.03

  • Score sequence pair problems of (r11, r12, r22)-tournaments: construction

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    IEICE Technical Report, Circuits and Systems (CAS)   CAS2005 ( 70 ) 1 - 6  2006.01

  • Realizability of score sequence pair of an (r(11), r(12), r(22))-tournament

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    2006 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS     1019 - +  2006  [Refereed]

     View Summary

    Let G be any directed graph and S be nonnegative and non-decreasing integer sequence(s). The prescribed degree sequence problem is a problem to determine whether there is a graph G with S as the prescribed sequence(s) of outdegrees of the vertices. Let G be the property satisfying the following (1) and (2):
    (1) G has two disjoint vertex sets A and B.
    (2) For every vertex pair u, v ∈ G (u ≠ v), G satisfies
    $|\{uv\}| + |\{vu\}| = \begin{cases} r_{11} & \text{if } u, v \in A \\ r_{12} & \text{if } u \in A,\ v \in B \\ r_{22} & \text{if } u, v \in B \end{cases}$
    where uv (vu, respectively) denotes a directed edge from u to v (from v to u).
    Then G is called an (r(11),r(12),r(22))-tournament ("tournament", for short). When G is a "tournament," the prescribed degree sequence problem is called the score sequence pair problem of a "tournament", and S is called a score sequence pair of a "tournament" (or S is realizable) if the answer is "yes."
    In this paper, we propose characterizations of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable.
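
    The definition above can be checked mechanically. The following is a minimal sketch, assuming a directed multigraph given as a list of (u, v) edge tuples and vertex sets given as lists of distinct labels. The function names and the data layout are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def is_r_tournament(A, B, edges, r11, r12, r22):
    """Check that |{uv}| + |{vu}| matches r11, r12 or r22 for every vertex pair.

    Assumptions: A and B are lists of distinct vertex labels, and edges is a
    list of directed (u, v) tuples in which parallel edges may repeat.
    """
    count = Counter(edges)  # multiplicity of each directed edge
    set_a, set_b = set(A), set(B)
    if set_a & set_b:
        return False  # A and B must be disjoint
    for u, v in combinations(list(A) + list(B), 2):
        total = count[(u, v)] + count[(v, u)]
        if u in set_a and v in set_a:
            required = r11
        elif u in set_b and v in set_b:
            required = r22
        else:
            required = r12
        if total != required:
            return False
    return True


def score_sequence_pair(A, B, edges):
    """Return the non-decreasing outdegree sequences of A and of B."""
    outdeg = Counter(u for u, _ in edges)
    return sorted(outdeg[a] for a in A), sorted(outdeg[b] for b in B)
```

    For example, with A = ["a1", "a2"], B = ["b1"] and (r11, r12, r22) = (1, 2, 0), the edge list a1→a2, a1→b1, b1→a1, a2→b1, a2→b1 satisfies the definition, and its score sequence pair is ([2, 2], [1]).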

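  • Customized design of microprocessor IP

    野村知弘, 渡邊孝博

    IPSJ Kyushu Chapter, Young Researchers' Seminar 2005    2005.03

  • A study on customizable microprocessor IP

    古賀雅隆, 渡邊孝博

    IPSJ Kyushu Chapter, Hinokuni Information Symposium 2005   Paper No. A-4-4  2005.03

  • A method for speeding up branch processing

    叶炯耀, 渡邊孝博

    Proceedings of the 2005 IEICE General Conference   Presentation No. D-6-2   50  2005.03

  • The (r11, r12, r22) score sequence pair problem

    高橋昌也, 渡邊孝博, 吉村猛

    IEICE Technical Report, Theoretical Foundations of Computing (COMP2004-72)   104 ( 642 ) 97 - 106  2005.01

  • A hierarchical partitioning method for large-scale circuits

    韓東,徐軼韜, 渡邊孝博

    Proc. 2004 HISS (6th IEEE Hiroshima Symposium)     210  2004.12

  • A method for using FPGA IP and its design environment

    徐軼韜, 渡邊孝博

    Proceedings of the 55th Joint Conference of the Chugoku Chapters of Electrical and Information-Related Societies, FY2004   Paper No. 122006   311  2004.10

  • An efficient hierarchical tree partitioning method for large-scale systems

    徳本守, 渡邊孝博

    Memoirs of the Faculty of Engineering, Yamaguchi University   52 ( 1 ) 5 - 12  2001.10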

    CiNii

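  • A constant-characteristic power dissipation architecture for encryption VLSI processors

    松原裕之, 中村維男, 渡邊孝博

    IPSJ Journal   41 ( 4 ) 950 - 957  2001.04

  • A study of the number of operation elements in a digital matched filter for M-ary/DS-SS systems using real-valued shift-orthogonal finite-length sequences

    T.Matsumoto, Y.Tanada, T.Watanabe

    Proc. 3rd IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications    2001.03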

  • A fine grain cooled logic architecture for low-power processors

    H Matsubara, T Watanabe, T Nakamura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E84A ( 3 ) 735 - 740  2001.03  [Refereed]

     View Summary

    In this paper, we propose a fine-grain Cooled Logic architecture for low-power processors. Cooled Logic uses a novel hardware method based on dual-rail logic to detect which functional blocks are to be active, and stops the clock to each functional block to make it inactive during certain periods. To confirm the effectiveness of our approach, we design 4-bit and 16-bit event-driven array multipliers and analyze their power consumption with the HSPICE simulator. The results show that Cooled Logic tends to reduce power consumption in both the functional blocks and the clock drivers of the multipliers.

  • Digital matched filter of reduced operation elements for M-ary/DS-SS system using real-valued shift-orthogonal finite-length sequences

    T Matsumoto, Y Tanada, T Watanabe

    2001 IEEE THIRD WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, PROCEEDINGS     46 - 49  2001  [Refereed]

     View Summary

    A real-valued shift-orthogonal finite-length sequence has a sharp autocorrelation function with zero sidelobes except at the shift ends, and can be synthesized from element sequences. In this paper, we propose the structure of a digital matched filter for an M-ary Direct Sequence Spread Spectrum (M-ary/DS-SS) system using these sequences, together with a reduction in the number of operation elements of the digital matched filter. It is shown that this digital matched filter for the M-ary/DS-SS system can be constructed with a number of multipliers and adders proportional to the sequence length.
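
    The shift-orthogonality property can be illustrated with a short check. This is a sketch under one reading of the abstract: the aperiodic autocorrelation must vanish at every nonzero shift except the largest one (the "shift end"). The function names and the tolerance parameter are hypothetical, and the example sequence was chosen by hand rather than taken from the paper.

```python
def aperiodic_autocorrelation(seq):
    """Aperiodic autocorrelation of a real sequence for shifts 0 .. N-1."""
    n = len(seq)
    return [
        sum(seq[i] * seq[i + shift] for i in range(n - shift))
        for shift in range(n)
    ]


def is_shift_orthogonal(seq, tol=1e-9):
    """True if every sidelobe is zero except the one at the shift end.

    Assumption: the "shift end" is the maximum shift N-1, so only the
    shifts 1 .. N-2 are required to vanish.
    """
    acf = aperiodic_autocorrelation(seq)
    return all(abs(value) <= tol for value in acf[1:-1])
```

    For the sequence (1, 2, -1), the autocorrelation is 6 at shift 0, 1·2 + 2·(-1) = 0 at shift 1, and 1·(-1) = -1 at the shift end, so the check passes. For (1, 1, 1) the sidelobe at shift 1 is 2 and the check fails.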

  • An Architecture for Secure Encryption VLSI Processors using a Constant-Characteristic Power Dissipation Concept

    H.Matsubara, T.Watanabe, T.Nakamura

    Journal.IPSJ   42 ( 4 ) 950 - 957  2001

  • A clocking scheme for lowering peak-current in dynamic logic circuits

    H Matsubara, T Watanabe, T Nakamura

    IEICE TRANSACTIONS ON ELECTRONICS   E83C ( 11 ) 1733 - 1738  2000.11  [Refereed]

     View Summary

    This paper deals with a new low-power clocking scheme for dynamic logic circuits to reduce power dissipation. Although conventional clocking schemes for dynamic logic circuits are mainly used for high-speed applications such as domino circuits, their peak current is very large because precharging and discharging are concentrated in a short period. It is hard for these schemes to achieve both reduced power dissipation and high performance at the same time. In the field of power engineering, leveling power means decreasing the peak-to-peak variation of power while keeping its total amount. We therefore propose a clocking scheme that levels the power dissipation of processing elements and mainly reduces the power dissipation of clock drivers. The proposed clocking scheme uses an overlapped clock with fine-grain power control, so that the peak current becomes lower and power dissipation over a short period is leveled without a speed penalty. The proposed scheme is applied to a 4-bit array multiplier, and the reductions in power dissipation of both the multiplier and the clock driver are measured with the HSPICE simulator based on 0.5 μm CMOS technology. It is shown that the power dissipation of the clock drivers, the 4-bit array multiplier, and the total are reduced by about 13.2 percent, 2.6 percent, and 7.0 percent, respectively. As a result, our clocking scheme is effective in reducing the power dissipation of clock drivers.

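  • Converting polygon wiring to conventional wiring in LSI layout

    江達夫, 渡邊孝博

    Memoirs of the Faculty of Engineering, Yamaguchi University   51 ( 1 ) 41 - 48  2000.10

    CiNii

  • Cooled Logic: a fine-grain power control architecture for low power

    松原裕之, 中村維男, 渡邊孝博

    IEICE 13th Karuizawa Workshop on Circuits and Systems    2000.04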

Books and Other Publications

  • Robot Soccer 〜 Chapter.1 The real-time and embedded soccer robot control system

    C. Li, T. Watanabe, Z. Wu, H. Li, Y. Huangfu; edited by Vladan Pap

    Sciyo, Vienna, Austria  2010.01 ISBN: 9789533070360

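  • Fundamentals of Digital Logic Circuits

    笹尾勤, 渡邊孝博, 見山友裕, 澤田直, 橋本浩二

    Fukuoka Industry, Science & Technology Foundation, System LSI Division  2007.04

  • Basic Knowledge of Circuit Design and Physical Design

    井上靖秋, 渡邊孝博, 淡野公一, 築添明

    Fukuoka Industry, Science & Technology Foundation  2005.04

  • Information Engineering Experiments and Exercises I: Textbook

    古賀和利, 中村秀明, 伊藤暁, 山口静馬, 石川昌明, 久長穣, 渡邊孝博

    Department of Computer Science and Systems Engineering, Faculty of Engineering, Yamaguchi University  2003.09

  • Latest VLSI Development, Design, and CAD, Chapter 7

    渡邊孝博, 大附辰夫, 後藤敏 (supervising eds.)

    Mimatsu Data System  1994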

Presentations

  • An Adaptive Adjustable Routing Algorithm for 3D Network-on-Chip

    Ma W, Watanabe T

    IEICE General Conf. 2018  Institute of Electronics, Information and Communication Engineers

    Presentation date: 2018.03

  • The High-speed Power Line Topology Check by Reducing Vias

    DAC 2011 User Track (2011 IEEE 48th Design Automation Conference) 

    Presentation date: 2011.06

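  • A high-speed DRC method for large-scale LSI layouts by reducing the number of vias

    IPSJ Special Interest Group on System LSI Design Methodology (SLDM) 

    Presentation date: 2011.01

  • A method for constructing BP neural networks using Network-on-Chip

    IEICE General Conference 2008 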

    Presentation date: 2008.03

Research Projects

  • Research on NoC system robust to fluctuation of traffic patterns

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2018.04
    -
    2021.03
     

    WATANABE TAKAHIRO

     View Summary

    NoC (Network-on-Chip) is an architecture that integrates hundreds or thousands of processor cores on one chip and handles inter-core communication as packets on an on-chip network. NoC scales easily to large systems and offers high communication performance. However, as traffic increases, delays due to communication congestion occur and become a bottleneck to further performance improvement. In this research, we proposed a mechanism that enables efficient communication even under traffic congestion. Specifically, it consists of (1) an analysis of the performance of various routing algorithms under different traffic patterns, (2) a traffic congestion detection mechanism, and (3) a mechanism for selecting the optimum algorithm according to the traffic pattern. The proposed mechanism was evaluated by simulation experiments, and its effectiveness was confirmed.

  • A Study of a Tile-based NoC System using IPs and its Design

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2011.04
    -
    2014.03
     

    WATANABE TAKAHIRO

     View Summary

    NoC (Network on Chip) is a promising solution for implementing ultra-large-scale, high-performance systems on a chip. To improve the design efficiency of NoC, an IP-reuse design method was proposed to implement a core in each tile; design techniques for an instruction-level customizable processor IP were developed and its design environment was constructed. Application-specific two- and three-dimensional NoCs were also discussed, NoC architectures for high throughput, low latency, and low power were explored, and routing algorithms with high performance or fault tolerance were developed. In addition, to solve a signal-delay problem in board-level systems composed of NoCs and SoCs (System on Chip), several routing algorithms were proposed and evaluated.

  • System Design Method for Communication SoC

    Project Year :

    2003
    -
    2008
     

  • ICT application LSI IP and Advanced Design Method

    Project Year :

    2007
    -
     
     

  • Prototyping Design for System LSI

    Project Year :

    2003
    -
    2007
     

  • Efficient Design Method for Microprocessors

    Project Year :

    1999
    -
    2002
     

  • Research on Digitalized Spread Spectrum Communication System Using Real-Valued Sequences

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    1998
    -
    1999
     

    TADANO Yoshihiro, MATSUMOTO Takahiro, WATANABE Takahiro

     View Summary

    This project was carried out from 1998 to 1999 with the aims of developing real-valued sequences for a quasi-synchronous CDMA system with no interchannel interference and little intersymbol interference, and of implementing trial digitalized modems for this CDMA system, in order to bring a novel technique to current CDMA systems. These aims have largely been attained. Fast digital processors for code generation and correlation were implemented on an FPGA, because the real-valued sequences are constructed so that a fast signal processing algorithm can be applied. The correlation processing performance in the multiple-access experiment closely follows the designed values. The real-valued finite-length sequences developed in this research are effective for designing codes for high-efficiency data transmission that needs no guard interval to avoid interference, and are therefore expected to be developed into concrete application systems in the future.
    The achievements of this project are as follows.
    (1) Development of sequences for CDMA and a fast correlation algorithm
    Orthogonal sets of real-valued finite-length sequences with a sharp autocorrelation function are derived from the convolution of element sequences, which leads to a fast correlation algorithm.
    (2) Trial production of a code generator and correlator
    A code generator using ROM and a correlator using an FPGA equivalent to 10,000 logic gates were implemented for the approximated integer sequence obtained from a real-valued shift-orthogonal finite-length sequence of length 33.
    The combined operation of the 8-bit generator output and the 8-bit correlator input achieves a rate of 16 Mchips/s and a correlation pulse with 1% error.
    (3) CDMA transmission experiment
    An experiment on optical and spatial multiplexing transmission based on two real-valued finite-length sequences shows that a D/U ratio of 0 dB at the correlator input is improved to 40 dB at its output, satisfying orthogonality within the allowable error.

  • CAD for Analog LSIs

    Project Year :

    1990
    -
    1998
     

  • Computer-Aided-Design for Analog-Digital Mixed Large Scaled Integration Circuits

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    1995
    -
    1996
     

    WATANABE Takahiro

     View Summary

    (1) Analog layout constraints were classified into two groups: constraints explicitly specified by human designers and constraints implicitly derived from circuit design specifications. The latter must be well defined and incorporated in a design database.
    (2) A device-level global routing algorithm was proposed in order to obtain high-performance detailed routing under various layout constraints. The algorithm was also improved to increase efficiency and to handle larger layout problems by introducing a new layout-evaluation function and a divide-and-conquer technique.
    (3) A multi-layer routing problem was discussed for Multi-Chip Modules (MCMs). MCM technology is usually used as a higher-density packaging technology, but its routing problems are similar to analog routing problems in that they call for minimum wire length, fewer routing layers or vias, and prevention of signal interference and cross-talk noise. We improved the V4R algorithm, and experimental results show that our approach performs fairly well in terms of total routing length. We also discussed the algorithm parameters needed to obtain optimum routes.
    (4) Commercial CAD systems were investigated and used to design sample circuits: a microprocessor and a simple application circuit. Design efficiency was compared between HDL design and conventional gate-level design. We also discussed test generation problems and proposed an efficient testing method for combinational circuits and a redundant fault detection method for sequential circuits.

Misc

  • CAME: A Novel Fast Connectivity-Aware MER Enumeration Algorithm for the Online Task Placement on Partially Reconfigurable Device (Mathematical Systems Science and its Applications)

    PAN Tieyuan, Zeng Lian, TAKASHIMA Yasuhiro, Watanabe Takahiro

    IEICE Technical Report   115 ( 480 ) 79 - 84  2016.03

    CiNii

  • A Length Matching Routing Method for Disordered Pins in PCB Design

    Zhang Ran, Pan Tieyuan, Zhu Li, Watanabe Takahiro

    Technical report of IEICE. VLD   114 ( 476 ) 103 - 108  2015.03

     View Summary

    In this paper, a heuristic algorithm is proposed to obtain length-matching routing for disordered pins in printed circuit board (PCB) design. We first compute the longest common subsequence of pin pairs to assign layers to pins. We then adopt single-commodity flow to generate base routes. Finally, R-flip and C-flip are carried out to adjust the wire length. The experiments show that our algorithm generates the optimal routes with better wire balance within reasonable CPU times.
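
    The first step mentioned above relies on a longest common subsequence (LCS). The abstract does not say how the LCS is turned into a layer assignment, so the sketch below shows only the standard dynamic-programming LCS on two pin orderings. Reading the result as a set of pin pairs that keep the same relative order on both sides is an assumption made here for illustration, and the function name is hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one optimal subsequence.
    result = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result
```

    For the pin orders [1, 2, 3, 4, 5] and [2, 1, 3, 5, 4], the LCS has length 3, so under the assumption above at most three of the five pin pairs keep their relative order on both sides.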

    CiNii

  • A Performance Enhanced Dual-switch Network-on-Chip Architecture

    Zeng Lian, Watanabe Takahiro

    Technical report of IEICE. VLD   114 ( 476 ) 97 - 102  2015.03

     View Summary

    Network-on-Chip (NoC) is an attractive solution for future systems on chip (SoC). Network performance depends critically on the performance of packet routing. However, as the network becomes more congested, packets are blocked more frequently, which degrades network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing two switch allocations, we can make the utmost use of idle output ports. Experimental results show that our design achieves a significant performance improvement in terms of throughput and latency at the cost of very little power overhead.

    CiNii

  • A-3-2 Adaptive Router with Predictor using Congestion Degree

    Zeng Lian, Watanabe Takahiro

    Proceedings of the Society Conference of IEICE   2013   45 - 45  2013.09

    CiNii

  • A Behavior-based Adaptive Access-mode for Low-power Set-associative Caches in Embedded Systems

    Jiongyao Ye, Hongfeng Ding, Yingtao Hu, Takahiro Watanabe

      52 ( 12 ) 11p  2011.12

    CiNii

  • A general neural network architecture for efficient FPGA-based implementation (VLSI Design Technologies)

    Lin Zhen, 董 宜平, 渡邊 孝博

    IEICE Technical Report   110 ( 36 ) 61 - 66  2010.05

    CiNii

  • A General Neural Network Architecture for Efficient FPGA-based Implementation

    LIN Zhen, DONG Yi-ping, WATANABE Takahiro

      2010 ( 11 ) 1 - 6  2010.05

    CiNii

  • Optimized Design of Logic Circuit using Genetic Algorithms

    WANG Fangfang, BAO Zhiguo, SU Yi-Wen, WATANABE Takahiro

      2010 ( 2 ) 1 - 6  2010.05

    CiNii

  • C_008 IP re-used design environment for quick customization using a Rip-up IP

    Kamei Tomoki, Watanabe Takahiro

      5 ( 1 ) 173 - 174  2006.08

    CiNii

  • A REDUNDANT FAULT IDENTIFICATION METHOD FOR SEQUENTIAL CIRCUITS BASED ON IMPLICATION PROCEDURE

    FUJIMOTO Yukihiro, WATANABE Takahiro

    Memoirs of the Faculty of Engineering, Yamaguchi University   48 ( 2 ) 213 - 220  1998.03

     View Summary

    Redundant faults in logic circuits are faults that cannot be detected in fault simulation. Removing redundant faults from a fault list prior to test pattern generation is therefore indispensable for efficient circuit testing. In this paper, two efficient methods for identifying redundant faults in sequential circuits are discussed. We implemented these methods and ran experiments on the ISCAS'89 benchmark circuits. Experimental results show that both methods improve the efficiency of a whole test generation system compared with a system without redundant-fault identification.

    CiNii

  • LSI Multi-Layer Routing Method Using a Flow Graph

    WATANABE Takahiro, OMOTANI Keiji

    Memoirs of the Faculty of Engineering, Yamaguchi University   45 ( 1 ) 83 - 90  1994.10

     View Summary

    Advances in VLSI fabrication technology have made it possible to use more than two routing layers for interconnection. In such multi-layer routing technology, one of the important objectives is via minimization; that is, the number of vias should be kept as small as possible. Topological planar routing (TPR) was proposed to solve this via-minimization problem. TPR is a layer assignment method that assigns each net to one of the layers without crossing other nets in the same layer. Although finding an optimum TPR is known to be an NP-complete problem, it can be solved approximately in polynomial time for the channel layout model as a minimum-cost maximum-flow problem using a flow graph. In this paper, we propose an improved TPR for a more general layout model such as the macrocell layout model, in which planarity testing and the flow graph are modified to handle our model. An experimental result shows that our improvements increase the efficiency with which the multiple layers are used.

    CiNii

  • Computer Aided Design for LSI Layout

    WATANABE Takahiro

    The Journal of the Institute of Electronics,Information and Communication Engineers   76 ( 7 ) 774 - 782  1993.07

    CiNii

 

Internal Special Research Projects

  • Study of Congestion-aware and Fault-tolerant NoC Routing and its implementation on FPGAs

    2020  

     View Summary

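    NoC (Network-on-Chip) is a type of MPSoC (Multi-Processor System-on-a-Chip) that excels in scalability, communication performance, and processing capability, and it has been studied extensively. In this research, we proposed a method for efficiently finding detour routes that avoid faulty parts when faults occur in an NoC. Specifically, we improved the Hamiltonian-based Odd-Even Routing method to make it fault tolerant when link failures occur on the NoC. Experiments evaluating latency and throughput confirmed the effectiveness of the proposed method. We also studied a mechanism for detecting and preventing, in advance, performance degradation caused by communication traffic congestion. We proposed a method that predicts routes avoiding traffic congestion based on past communication conditions, and confirmed its effectiveness experimentally. These results were published as two international conference papers.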

  • Traffic-Congestion-Aware Routing Strategy for 2D/3D NoC

    2019  

     View Summary

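    NoC (Network-on-Chip) realizes large-scale multi-core systems by handling packet communication between cores over an on-chip network, aiming to improve scalability and communication performance. The purpose of this research is to develop an NoC routing mechanism that performs well even when traffic increases and local communication congestion occurs. Specifically, we proposed a mechanism for detecting congestion according to the traffic pattern, an analysis of the relationship between congestion and algorithm performance under hotspot traffic patterns, and a low-cost congestion detection circuit. We also worked on routing methods that avoid faulty parts when the NoC has faults. The results were published as four refereed international conference papers.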

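  • Efficient solution methods for the online task placement problem on dynamically reconfigurable devices

    2018   周 亭宇

     View Summary

    In a dynamically reconfigurable processor (DRP), tasks are assigned to logic elements and processed in parallel; when a task completes, it is released from its group of logic elements, which can then be reused by assigning another task to them. The online task placement problem is to optimize the processing order of tasks and their assignment on the DRP so as to use the DRP efficiently and improve throughput. For the assignment problem, we improved MER, the data structure that manages regions on the DRP, and proposed a method to speed up the extraction of reusable regions. For optimizing the processing order, we defined a task processing order graph for the case where one-way communication exists between tasks, and proposed an efficient method for determining the processing order. The results were presented at international conferences and elsewhere.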
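
    To make the MER idea concrete, the following is a brute-force sketch that assumes MER stands for "maximal empty rectangle", as is usual in the task-placement literature, and that the device is modeled as a grid of cells with 0 for free and 1 for occupied. It is not the improved data structure proposed in this project, and the function name and grid encoding are hypothetical.

```python
def maximal_empty_rectangles(grid):
    """Enumerate maximal empty rectangles as (top, left, bottom, right).

    Assumptions: grid is a list of equal-length rows, 0 = free, 1 = occupied.
    Brute force, so only suitable for small illustrative grids.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0

    def is_empty(top, left, bottom, right):
        # Rectangles that leave the grid are treated as not empty.
        if top < 0 or left < 0 or bottom >= rows or right >= cols:
            return False
        return all(
            grid[r][c] == 0
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)
        )

    result = []
    for top in range(rows):
        for left in range(cols):
            for bottom in range(top, rows):
                for right in range(left, cols):
                    if not is_empty(top, left, bottom, right):
                        continue
                    # Maximal means it cannot grow by one row or column
                    # in any of the four directions.
                    extendable = (
                        is_empty(top - 1, left, bottom, right)
                        or is_empty(top, left - 1, bottom, right)
                        or is_empty(top, left, bottom + 1, right)
                        or is_empty(top, left, bottom, right + 1)
                    )
                    if not extendable:
                        result.append((top, left, bottom, right))
    return result
```

    On the 3×3 grid [[0, 0, 1], [0, 0, 0], [1, 0, 0]] this yields four rectangles: the 2×2 block in the top-left corner, the full middle row, the full middle column, and the 2×2 block in the bottom-right corner. A new task can be placed if it fits inside at least one of them.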

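  • A study of routing methods with symmetry constraints for mixed-signal LSIs

    2017   周 亭宇, 戴 Jindun, 黄 洪逸

     View Summary

    In mixed-signal LSIs, problems arising from routing design, such as signal interference and delay, have become prominent. To solve these problems, we introduced an evaluation function called the "symmetry degree" and proposed a routing method that can maintain symmetry constraints. This time, we examined (1) the influence of the weighting coefficients of the evaluation function and (2) its effect when routing obstacles are present. The results showed that, without routing obstacles, routes equivalent to manual designs are obtained, indicating that the evaluation function works. With obstacles, the evaluation function is effective for single-layer routing, but as the number of routing layers increases the number of candidate routes grows, so routing results with poor symmetry can occur even at the same symmetry degree. Future work is to incorporate the number of routing layers and a per-layer evaluation.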

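  • A study of automatic routing algorithms for LSI/PCB

    2016   蒋 欣, 潘 鉄源, 張 子驕

     View Summary

    In integrated circuit design, routing design is important because it affects circuit operation and performance. Automated length-matching routing methods exist for equalizing the wire lengths of multiple nets, but to account for delay and skew in bus and clock routing with higher accuracy, the symmetry of paired wires has also become an issue. In this research, we studied a method for giving paired wires symmetric shapes in multi-layer routing. The route search uses a maximum-flow algorithm to find the desired routes efficiently. To evaluate symmetric shapes, we defined the symmetrical rate, a function of wire length, number of wire bends, and wire direction. Experiments showed that routes produced by the proposed method have a high symmetrical rate and use few routing layers, and that the route search time is shorter than with conventional methods.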

  • A study of architectures and design techniques for NoC-based high-performance computing systems

    2014  

     View Summary

    To meet demands for larger-scale and higher-performance VLSI systems, research is being conducted from various viewpoints such as design technology and device technology. Network-on-Chip (NoC) is attracting attention in terms of VLSI architecture, and three-dimensional (3D) integration in terms of device structure. This research focuses on 3D NoC; we studied its architecture and network topologies and developed a routing mechanism that enables high-performance network processing. Simulation experiments confirmed that the proposed 3D NoC achieves lower power consumption, lower latency, and higher throughput than conventional architectures. In addition, a fault-tolerance mechanism was incorporated to improve reliability, and we confirmed that it copes effectively with faults in NoC nodes and links.

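  • SoC design using customizable IP and the construction of its application systems

    2010  

     View Summary

    This project consists of three research items: (1) development of Rip-up IP and a customized-design environment, (2) research on SoC/NoC architectures using Rip-up IP, and (3) research on application fields. This year we focused on (1) to establish the customization method, and on that basis examined large-scale SoC/NoC architectures under (2).
    For (1), we experimented with instruction-level customization of a Rip-up IP processor IP and function-level customization of applications that use the processor. Treating the algorithms used in an application as functions, we extract the instruction set needed to realize each function and thereby reuse the previously developed instruction-level customization process. Using a DSP as the motif, we confirmed that function-level customization is possible. Along with this function-level processing, we improved "WIPER", the design environment that uses Rip-up IP: the component that generates the required instruction subset from an application written in C and the full instruction set of the target processor IP was improved, and the result was named "WIPER-II". Next, we surveyed commercial tools with the aims of establishing a complete processing flow that leads to customization, taking an application algorithm and a general-purpose processor IP as inputs, and of widening the range of applications. As a result of the survey, we proposed using ASIP-Meister (by ASIP Solutions), an automatic generation tool for application-specific instruction-set processors (ASIPs), to create the Rip-up IP library, and began development work to connect it with the WIPER system.
    For (2), we took up NoC (Network-on-Chip) as an advanced application of multiprocessor SoC (MPSoC), examined architectures suited to the application field, and prototyped and evaluated routing schemes and router circuits for improving performance. As an application, we focused on hardware implementation of neural networks (NNs) and evaluated the method for implementing an NN on an NoC, its scalability, processing performance, and power. The approach was shown to be superior to existing hardware NNs in these evaluations, and the results were reported in journal papers and at international conferences.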

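  • A study of SoC implementation methods using customizable IP and the construction of application systems

    2009  

     View Summary

    This research studied design methods for SoC (System on a Chip) and NoC (Network on a Chip) that use customizable IP resources, and aimed to develop and evaluate the customization work and its support environment.
    We had already developed a customizable IP scheme called Rip-up IP. This year, aiming to widen the range of IP types, we studied turning a DSP into an IP and customizing it, as well as customization with bit width as a parameter, and designed improvements to the design environment to support them. For the customizable DSP in particular, we examined moving from the conventional instruction-level customization to algorithm-level customization, aiming to make the IP easier for IP users to use. This approach can further raise IP utilization and also leads to a wider variety of customizable IPs.
    For SoC, taking an embedded system composed of a DSP and a processor as the motif, we designed a multiprocessor system consisting of multiple processor IPs in place of the DSP, implemented it on an FPGA, and evaluated design efficiency and performance. For NoC, we examined the network structure (topology) among IP cores placed on a two-dimensional plane, the router circuit, and the routing strategy, evaluated them in terms of latency, throughput, and power consumption, and identified effective NoC configurations. An ANN (Artificial Neural Network) was assumed as the concrete case for evaluation. Through this research, we examined the IP sets, performance, and customization requirements needed to build large-scale systems.
    Providing a strong IP set is also important. We therefore carried out, in parallel, research on higher-performance processor IP and on generating the basic circuits used in IP design. In the former, targeting low power consumption for the IP, we proposed a new cache structure and confirmed its effect by simulation. In the latter, we proposed a new GA-based circuit generation and optimization method and confirmed experimentally that good-quality circuits can be generated.
    These results were presented at international conferences and in academic journals, as listed in the separate section on research presentations.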

  • Efficient Design Methods for High-Performance Multiprocessor SoCs Using Customizable IP

    2008  

     View Summary

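     In this research, using the “customizable processor IP and its customization design environment” studied through the previous fiscal year, we examined design methods for building multiprocessor SoCs (MPSoCs, *1), prototyped and evaluated an MPSoC on an FPGA for a concrete application, and studied how to construct NoCs (*2), which are expected to find applications in a wide range of fields as the future evolution of MPSoCs. In parallel, we studied a new attempt to introduce AI techniques into IP design optimization and a processor architecture for achieving high-performance, low-power SoCs. The specific research content and results are as follows. The results were published as papers at international conferences and elsewhere. (*1 MPSoC: Multi-Processor System on a Chip; *2 NoC: Network on Chip) (1) We developed three processor IPs (COMET, x86-compatible, and MiniMIPS) as the customizable IP “Rip-up IP”, customized them with the IP design support environment “WIPER”, implemented them on FPGAs, and evaluated circuit size, processing performance, power consumption, and so on. (2) As one method for predicting MPSoC performance, we prototyped a multiprocessor system using an FPGA. The application was a JPEG encoder system, with the hardware and the JPEG software implemented on an Altera FPGA. For configurations of one to four processors, we refined the task allocation and design approach and compared performance. (3) Using an NoC, in which a router is attached to each of the cores on the SoC and signals are transmitted over a network, we proposed an architecture for implementing a neural network in hardware, and evaluated it through performance experiments with an NoC simulator and power consumption estimation on an FPGA. (4) To further promote IP use for SoCs, good IPs need to be developed efficiently. We studied a system that uses AI techniques to design IPs optimally under multiple design constraints. We proposed an improved GA and confirmed through simulation experiments, using small logic circuits as examples, that an optimal design is obtained for multiple conditions. (5) We proposed a hardware mechanism that speeds up recovery processing after branch instructions, one of the bottlenecks in raising processor performance, and evaluated it by simulation.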

  • Design Using Rip-up IP Oriented Toward Multiprocessor SoCs

    2007  

     View Summary

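    The purpose of this research theme was to research and develop IPs that can be customized to system specifications together with a design environment for using them, and further to realize high-performance multiprocessor-SoC systems by combining multiple customizable processor IPs. Specifically: 1. Research on the customizable IP “Rip-up IP” and its design support environment. (1) Taking a processor as the example, we developed “Rip-up IP”, an IP whose functions are easy to remove and add. (2) We created a Rip-up IP of an x86 instruction-compatible processor and ran evaluation experiments. The circuit size and operating frequency on FPGA implementation were as expected, but the power saving was insufficient because reducing the instructions raised both the toggle count of the logic blocks and the operating frequency. This can be resolved by introducing low-power mechanisms such as power gating. (3) We developed the customization design support environment “WIPER”. We defined a source description format for Rip-up IP, built an IP library, and developed a GUI-based tool to support customization work. 2. Research on MPSoC configurations and their evolution, the NoC. We experimented with MPSoCs using FPGA soft IP cores. We also studied the NoC (Network on a Chip), in which cores communicate over a network instead of a communication bus. (1) Using the NIOS processor, a soft core for Altera FPGAs, we ran MPSoC configuration experiments with a control system as the subject. We replaced the conventional DSP-plus-processor configuration with two-processor configurations in various combinations and compared design efficiency, performance, and so on. (2) As the future form of MPSoCs, we studied NoCs, in which functional units such as processors and memories serve as cores and each core has a router and is connected through an on-chip network. This fiscal year, with low power as the key theme, we studied the network routing problem and mechanisms for reducing power consumption through sleep control.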

  • Research on Fundamental Technologies for Ultra-Short-Term Design of System LSIs

    2004   Takeshi Yoshimura, Shinji Kimura, Nobuhiro Doi

     View Summary

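    Aiming to realize ultra-short-term design of system LSIs, we carried out the following research in particular. (1) As an LSI with a new device structure, we proposed a latch-based circuit configuration and showed that it achieves circuit timing optimization and faster operation. This result was published in the IEICE Transactions E. (2) We proposed an algorithm that automatically converts floating-point processing to fixed-point processing and showed that it can reduce area and increase operating speed when algorithms are implemented in hardware. This result was accepted and published at the 12th SASIMI (Synthesis And System Integration of Mixed Information tech.) workshop. (3) We proposed a circuit partitioning method for designing very large-scale circuits in a practical amount of time and created an algorithm for partitioning into an arbitrary number of parts. This result was presented at the 6th IEEE Hiroshima Section Student Symposium. (4) We designed an FPGA IP of a microprocessor for use as a CPU core in system LSIs. We also prototyped a design environment for customizing the processor to user specifications. These results were presented at the Chugoku-Section Joint Convention of Institutes of Electrical and Information Engineers, the IPSJ Kyushu Section Symposium, and elsewhere, and were exhibited at the industry-academia collaboration fair held in Kitakyushu Science and Research Park. (5) We proposed a partitioning and implementation method that realizes a large-scale hardware system on multiple FPGAs of reasonable size and makes FPGA emulation easier. This result was presented at the IEICE General Conference. In summary, we studied fundamental technologies for shortening the design period of system LSIs, including verification on actual hardware, as well as circuit configurations for realizing higher-performance circuits, and published the results. The remaining tasks are to improve each of these fundamental technologies, further raise performance, and integrate them into a consistent design flow for ultra-short-term design.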
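
As an illustration of what floating-point to fixed-point conversion in item (2) involves, the following is a minimal sketch of range-based Q-format selection, a common textbook approach. It is not the algorithm proposed in this research, whose details the summary does not give; the function names and the 16-bit default word length are assumptions.

```python
import math

def choose_q_format(samples, word_bits=16):
    """Pick (int_bits, frac_bits) for a signed word that covers the sample range."""
    max_abs = max(abs(x) for x in samples)
    # Bits needed for the integer magnitude; the sign bit is counted separately.
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1) if max_abs > 0 else 0
    frac_bits = word_bits - 1 - int_bits
    if frac_bits < 0:
        raise ValueError("word length too small for the sample range")
    return int_bits, frac_bits

def to_fixed(x, frac_bits, word_bits=16):
    """Quantize x to a signed integer with frac_bits fractional bits, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float for error checking."""
    return q / (1 << frac_bits)
```

Narrower words give smaller and faster hardware at the cost of quantization error, which is the trade-off an automatic conversion has to manage.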
