Researcher Details - Takahiro Watanabe

Takahiro Watanabe (渡邊 孝博, ワタナベ タカヒロ)

Scopus Publication Information

Papers: 114 Citations: 441 h-index: 11

The data was downloaded from the Scopus API on April 18, 2026, via http://api.elsevier.com and http://www.scopus.com .

Affiliation

Faculty of Science and Engineering

Title

Professor Emeritus

Degree

Doctor of Engineering ( Tohoku University )

Homepage

http://www.f.waseda.jp/watt

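Career

2003 - : Faculty member, Graduate School of Information, Production and Systems, Waseda University
1990 - 2003: Faculty member, Faculty of Engineering, Yamaguchi University
1979 - 1990: Researcher, Senior Researcher, and Group Leader, Research & Development Center, Toshiba Corporation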

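Education

- 1979: Information Engineering, Graduate School of Engineering, Tohoku University
- 1976: Electrical Engineering, Graduate School of Engineering, Yamaguchi University
- 1974: Department of Electrical Engineering, Faculty of Engineering, Yamaguchi University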

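Professional Memberships

The Institute of Electronics, Information and Communication Engineers (IEICE)
The Japanese Society for Artificial Intelligence (JSAI)
Research Institute of Signal Processing, Japan (信号処理学会)
IEEE (The Institute of Electrical and Electronics Engineers, Inc.)
Information Processing Society of Japan (IPSJ)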

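Research Areas

Computer Systems

Research Keywords

LSI, electronic circuits, integrated circuits, design automation, computer-aided design, processors, algorithms

Awards

IEICE Young Researcher's Award (電子情報通信学会学術奨励賞)

1983

Papers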

Predicting stock high price using forecast error with recurrent neural network

Zhiguo Bao, Qing Wei, Tingyu Zhou, Xin Jiang, Takahiro Watanabe

Applied Mathematics and Nonlinear Sciences May 2021

Stock price forecasting is an eye-catching research topic. In previous works, many researchers used a single method or combination of methods to make predictions. However, accurately predicting stock prices is very difficult. To improve the predicting precision, in this study, an innovative prediction approach was proposed by recurrent substitution of forecast error into the historical neural network model through three steps. According to the historical data, the initial predicted value of the next day is obtained through the neural network. Then, the prediction error of the next day is obtained through the neural network according to the historical prediction error. Finally, the initial predicted value and the prediction error are added to obtain the final predicted value of the next day. We use recurrent neural network prediction methods, such as Long Short-Term Memory Network Model and Gated Recurrent Unit, which are popular in the recent neural network study. In the simulations, the past stock prices of China from June 2010 to August 2017 are used as training data, and those from September 2017 to April 2018 are used as test data. The experimental findings demonstrate that the proposed method with forecast error gives a more accurate prediction result for the stock’s high price on the next day, which indicates that the performance of the proposed one is superior to that of the traditional models without forecast error.

Citations (Scopus): 10

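The three-step scheme described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the moving-average predictor is a hypothetical stand-in for the trained LSTM or GRU networks, and the names `moving_average`, `forecast_with_error_correction`, and `warmup` are assumptions introduced here.

```python
from typing import Callable, List, Sequence

# A predictor maps a history of values to a one-step-ahead forecast.
Predictor = Callable[[Sequence[float]], float]


def moving_average(window: int) -> Predictor:
    """Hypothetical stand-in for a trained recurrent network:
    forecasts the mean of the last `window` values."""
    def predict(history: Sequence[float]) -> float:
        tail = history[-window:]
        return sum(tail) / len(tail)
    return predict


def forecast_with_error_correction(
    prices: Sequence[float],
    price_model: Predictor,
    error_model: Predictor,
    warmup: int,
) -> float:
    """Return the next-day forecast corrected by a predicted forecast error."""
    if not 0 < warmup < len(prices):
        raise ValueError("warmup must be between 1 and len(prices) - 1")

    # Step 1: initial next-day prediction from the price history.
    initial = price_model(prices)

    # Historical one-step-ahead errors (actual minus predicted).
    errors: List[float] = [
        prices[t] - price_model(prices[:t]) for t in range(warmup, len(prices))
    ]

    # Step 2: predict the next-day error from the error history.
    predicted_error = error_model(errors)

    # Step 3: add the predicted error to the initial prediction.
    return initial + predicted_error


prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
model = moving_average(2)
print(forecast_with_error_correction(prices, model, model, warmup=2))  # 16.0
```

On this linear toy series the plain moving average lags the trend by 1.5 every day, so the error model learns that offset and the corrected forecast lands on 16.0. Real price data and trained networks would of course behave less neatly.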
High performance virtual channel based fully adaptive 3D NoC routing for congestion and thermal problem

Xin Jiang, Xiangyang Lei, Lian Zeng, Takahiro Watanabe

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E100A ( 11 ) 2379 - 2391 November 2017

Recent Network on Chip (NoC) design must take the thermal issue into consideration due to its great impact on the network performance and reliability, especially for 3D NoC. In this work, we design a virtual channel based fully adaptive routing algorithm for the runtime 3D NoC thermal-aware management. To improve the network throughput and latency, we use two virtual channels for each horizontal direction and design a routing function which can not only avoid deadlock and livelock, but also ensure high adaptivity and routability in the throttled network. For path selection, we design a strategy that gives priority to distance, but also considers path diversity and traffic state. For throttling information collection, instead of transmitting the topology information of the whole network, we use a 12-bit register to reserve the router state for one hop away, which greatly reduces the hardware cost and decreases the network latency. In the experiments, we test our proposed routing algorithm in different states with different sizes, and the proposed algorithm shows better network latency and throughput with low power compared with traditional algorithms.

An adaptive routing algorithm based on network partitioning for 3D Network-on-Chip

Jindun Dai, Xin Jiang, Takahiro Watanabe

IEEE CITS 2017 - 2017 International Conference on Computer, Information and Telecommunication Systems 229 - 233 September 2017

This paper presents an efficient routing algorithm for 3D meshes without virtual channels. The proposed routing algorithm is extended from the 2D east-first routing algorithm and is based on network partitioning. It is proven that the proposed method is free from deadlock. Compared with previous routing algorithms, the average degree of adaptiveness is higher. This feature contributes to higher communication efficiency. Experimental results show that the proposed method achieves lower communication latency and higher throughput than other traditional methods.

Citations (Scopus): 4

Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

Huatao Zhao, Xiao Luo, Chen Zhu, Takahiro Watanabe, Tianbo Zhu

Modern Physics Letters B 31 ( 19-21 ) July 2017 [Peer-reviewed]

In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such cache hierarchies are restricted by their bloated size and interfering accesses, which lead to both performance degradation and wasted energy. In this paper, we first propose a behavior-aware cache hierarchy (BACH) which can optimally allocate the multi-level cache resources to many cores and greatly improve the efficiency of the cache hierarchy, resulting in low energy consumption. The BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the cache allocation bases, so that we can optimally configure the cache hierarchy to meet the runtime demand. The BACH was implemented on the GEM5 simulator. The experimental results show that the energy consumption of a three-level cache hierarchy can be reduced by 5.29% up to 27.94% compared with other key approaches, while the performance of the multi-core system even improves slightly when hardware overhead is counted in.

Citations (Scopus): 1

Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

Huatao Zhao, Xiao Luo, Chen Zhu, Tianbo Zhu, Takahiro Watanabe

Modern Physics Letters B 31 ( 19 ) 1 - 7 April 2017 [Peer-reviewed]
High-Throughput Message Digest (MD5) Design and Simulation-Based Power Estimation Using Unfolding Transformation

Suhaili Shamsiah binti, Watanabe Takahiro

Journal of Signal Processing 21 ( 6 ) 233 - 238 2017

The high throughput of the cryptographic hash function has become an important aspect of the hardware implementation of security system design. There are several methods that can be used to improve the throughput performance of MD5 design. In this paper, four types of MD5 design were proposed: MD5 iterative design, MD5 unfolding design, MD5 unfolding with 4 stages of pipelining design and MD5 unfolding with 32 stages of pipelining design. The results indicated that MD5 unfolding with 32 stages of pipelining design provides a high throughput compared with other MD5 designs. By using an unfolding transformation factor of 2, the number of cycles of MD5 design was reduced from 64 to 32. All the proposed designs were successfully designed using Verilog code and simulated using ModelSim. The throughput of MD5 unfolding with 32 stages of pipelining design was increased significantly to 137.97 Gbps, and the power of this design was 750.99 mW. Therefore, it is suggested that an unfolding transformation with high-performance pipelining be applied to MD5 hash function design in order to produce an embedded security system design. This paper is expected to help improve the maximum frequency and the throughput of MD5 design. Thus, by increasing the number of stages in MD5 unfolding design, the performance of MD5 designs can be improved significantly.

High Performance Virtual Channel Based Fully Adaptive 3D NoC Routing for Congestion and Thermal Problem

JIANG Xin, LEI Xiangyang, ZENG Lian, WATANABE Takahiro

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 100 ( 11 ) 2379 - 2391 2017

Recent Network on Chip (NoC) design must take the thermal issue into consideration due to its great impact on the network performance and reliability, especially for 3D NoC. In this work, we design a virtual channel based fully adaptive routing algorithm for the runtime 3D NoC thermal-aware management. To improve the network throughput and latency, we use two virtual channels for each horizontal direction and design a routing function which can not only avoid deadlock and livelock, but also ensure high adaptivity and routability in the throttled network. For path selection, we design a strategy that gives priority to distance, but also considers path diversity and traffic state. For throttling information collection, instead of transmitting the topology information of the whole network, we use a 12-bit register to reserve the router state for one hop away, which greatly reduces the hardware cost and decreases the network latency. In the experiments, we test our proposed routing algorithm in different states with different sizes, and the proposed algorithm shows better network latency and throughput with low power compared with traditional algorithms.

High Performance Virtual Channel Based Fully Adaptive Thermal-aware Routing for 3D NoC

Xin Jiang, Xiangyang Lei, Lian Zeng, Takahiro Watanabe

Proceedings of the Eighteenth International Symposium on Quality Electronic Design (ISQED) 289 - 295 2017 [Peer-reviewed]

The thermal problem is a challenge in recent Network on Chip (NoC) designs due to its great impact on the network performance and reliability, especially for 3D NoC. In this work, we design a virtual channel based fully adaptive routing algorithm for the runtime 3D NoC thermal-aware management. For throttling information collection, instead of transmitting the topology information of the whole network, we use a 12-bit register to reserve the router state for one hop away. This greatly reduces the hardware cost and decreases the network latency. To ensure freedom from deadlock and livelock and to minimize the hardware overhead, we use only two virtual channels for each horizontal channel to achieve full adaptivity and high routability. For path selection, we design a strategy that gives priority to distance, but also considers path diversity and traffic state. Experimental results show that the proposed algorithm achieves better network latency and throughput with low power compared with traditional algorithms.

Citations (Scopus): 8

A Fast MER Enumeration Algorithm for Online Task Placement on Reconfigurable FPGAs

Tieyuan Pan, Lian Zeng, Yasuhiro Takashima, Takahiro Watanabe

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E99A ( 12 ) 2412 - 2424 December 2016 [Peer-reviewed]

In this paper, we propose a fast Maximal Empty Rectangle (MER) enumeration algorithm for online task placement on reconfigurable Field-Programmable Gate Arrays (FPGAs). On the assumption that each task utilizes rectangle-shaped resources, the proposed algorithm can manage the free space on FPGAs by an MER list. When assigning or removing a task, a series of MERs are selected and cut into segments according to the task and its assignment location. By processing these segments, the MER list can be updated quickly with low memory consumption. Under the proof of the upper limit of the number of the MERs on the FPGA, we analyze both the time and space complexity of the proposed algorithm. The efficiency of the proposed algorithm is verified by experiments.

Citations (Scopus): 6

A fast MER enumeration algorithm for online task placement on reconfigurable FPGAs

Tieyuan Pan, Lian Zeng, Yasuhiro Takashima, Takahiro Watanabe

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E99A ( 12 ) 2412 - 2424 December 2016

In this paper, we propose a fast Maximal Empty Rectangle (MER) enumeration algorithm for online task placement on reconfigurable Field-Programmable Gate Arrays (FPGAs). On the assumption that each task utilizes rectangle-shaped resources, the proposed algorithm can manage the free space on FPGAs by an MER list. When assigning or removing a task, a series of MERs are selected and cut into segments according to the task and its assignment location. By processing these segments, the MER list can be updated quickly with low memory consumption. Under the proof of the upper limit of the number of the MERs on the FPGA, we analyze both the time and space complexity of the proposed algorithm. The efficiency of the proposed algorithm is verified by experiments.

Citations (Scopus): 6

An efficient highly adaptive and deadlock-free routing algorithm for 3D network-on-chip

Lian Zeng, Tieyuan Pan, Xin Jiang, Takahiro Watanabe

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E99A ( 7 ) 1334 - 1344 July 2016

As semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in future Chip-Multiprocessor (CMP) designs. Three-Dimensional Network-on-Chips (3D NoCs) have become an attractive solution which can provide impressively high performance. An efficient and deadlock-free routing algorithm is critical to achieving high network-on-chip performance. Traditional methods based on deterministic routing and the turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only within layers but also between layers according to congestion information and path diversity. Simulation results show that the proposed method achieves a significant performance improvement compared with others.

Citations (Scopus): 7

An online task placement algorithm based on MER enumeration for partially reconfigurable device

Tieyuan Pan, Li Zhu, Lian Zeng, Takahiro Watanabe, Yasuhiro Takashima

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E99A ( 7 ) 1345 - 1354 July 2016

Recently, due to the development of design and manufacturing technologies for VLSI systems, an embedded system becomes more and more complex. Consequently, not only the performance of chips, but also the flexibility and dynamic adaptation of the implemented systems are required. To achieve these requirements, a partially reconfigurable device is promising. In this paper, we propose an efficient data structure to manage the reconfigurable units. And then, on the assumption that each task utilizes the rectangle shaped resources, a very simple MER enumeration algorithm based on this data structure is proposed. By utilizing the result of MER enumeration, the free space on the reconfigurable device can be used sufficiently. We analyze the complexity of the proposed algorithm and confirm its efficiency by experiments.

Citations (Scopus): 3

High throughput evaluation of SHA-1 implementation using unfolding transformation

Suhaili, Shamsiah Binti, Watanabe, Takahiro

ARPN Journal of Engineering and Applied Sciences 11 ( 5 ) 3350 - 3355 March 2016

Hash functions are widely used in protocol schemes. In this paper, the design of the SHA-1 hash function using Verilog HDL on an FPGA is studied to optimise both hardware resources and performance. It was successfully synthesised and implemented using Altera Quartus II Arria II GX: EP2AGX45DF29C4. Two types of design are proposed, namely SHA-1 and SHA-1 unfolding. The maximum frequency of the SHA-1 design is 274.2 MHz, which is higher than that of SHA-1 unfolding, whose maximum frequency is only 174.73 MHz. However, the SHA-1 unfolding design achieves a high throughput of 2236.54 Mbps. Besides, both designs provide a small area implementation on Arria II, requiring only 423 and 548 combinational ALUTs and 693 and 907 total registers, respectively.
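
The relation between the reported clock frequency and throughput can be checked with a short calculation. This is a sketch under stated assumptions, not the authors' method: it assumes an iterative design that digests one 512-bit block every `cycles_per_block` clock cycles, one SHA-1 round per cycle in the baseline, and an unfolding factor of 2 (so 80 rounds take 40 cycles). The function name `hash_throughput_mbps` is introduced here for illustration. Under these assumptions the formula reproduces the 2236.54 Mbps figure quoted in the abstract.

```python
def hash_throughput_mbps(block_bits: int, fmax_mhz: float, cycles_per_block: int) -> float:
    """Throughput in Mbps for a core that digests one message block
    every `cycles_per_block` clock cycles at `fmax_mhz`."""
    return block_bits * fmax_mhz / cycles_per_block


SHA1_BLOCK_BITS = 512  # SHA-1 message block size in bits
SHA1_ROUNDS = 80       # number of SHA-1 rounds per block
UNFOLD_FACTOR = 2      # assumption: two rounds evaluated per clock cycle

# Unfolded design: 174.73 MHz (from the abstract), 80 / 2 = 40 cycles per block.
unfolded = hash_throughput_mbps(SHA1_BLOCK_BITS, 174.73, SHA1_ROUNDS // UNFOLD_FACTOR)

# Baseline design: 274.2 MHz (from the abstract), assumed 80 cycles per block.
# This value is computed here and is not a figure reported in the abstract.
baseline = hash_throughput_mbps(SHA1_BLOCK_BITS, 274.2, SHA1_ROUNDS)

print(f"{unfolded:.2f} Mbps")  # 2236.54 Mbps
print(unfolded > baseline)     # True
```

The calculation shows why a lower clock frequency can still give higher throughput: halving the cycle count more than compensates for the slower clock.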
An Online Task Placement Algorithm Based on MER Enumeration for Partially Reconfigurable Device

PAN Tieyuan, ZHU Li, ZENG Lian, WATANABE Takahiro, TAKASHIMA Yasuhiro

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 99 ( 7 ) 1345 - 1354 2016

Recently, due to the development of design and manufacturing technologies for VLSI systems, an embedded system becomes more and more complex. Consequently, not only the performance of chips, but also the flexibility and dynamic adaptation of the implemented systems are required. To achieve these requirements, a partially reconfigurable device is promising. In this paper, we propose an efficient data structure to manage the reconfigurable units. And then, on the assumption that each task utilizes the rectangle shaped resources, a very simple MER enumeration algorithm based on this data structure is proposed. By utilizing the result of MER enumeration, the free space on the reconfigurable device can be used sufficiently. We analyze the complexity of the proposed algorithm and confirm its efficiency by experiments.

An Efficient Highly Adaptive and Deadlock-Free Routing Algorithm for 3D Network-on-Chip

ZENG Lian, PAN Tieyuan, JIANG Xin, WATANABE Takahiro

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 99 ( 7 ) 1334 - 1344 2016

As semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in future Chip-Multiprocessor (CMP) designs. Three-Dimensional Network-on-Chips (3D NoCs) have become an attractive solution which can provide impressively high performance. An efficient and deadlock-free routing algorithm is critical to achieving high network-on-chip performance. Traditional methods based on deterministic routing and the turn model are deadlock-free, but they are unable to distribute the traffic loads over the network. In this paper, we propose an efficient, adaptive and deadlock-free algorithm (EAR) based on a novel routing selection strategy in 3D NoC, which can distribute the traffic loads not only within layers but also between layers according to congestion information and path diversity. Simulation results show that the proposed method achieves a significant performance improvement compared with others.

Fully adaptive thermal-aware routing for runtime thermal management of 3D network-on-chip

Jiang, Xin, Lei, Xiangyang, Zeng, Lian, Watanabe, Takahiro

Lecture Notes in Engineering and Computer Science 2 659 - 664 January 2016

The thermal problem is an essential issue which must be taken into account in 3D Network-on-Chip (NoC) design, because it has a great impact on not only the network performance, but also the reliability of the message transmission. In this work, we present a fully adaptive runtime thermal-aware routing algorithm, which combines the distance, traffic state, path diversity and the thermal impact in the path determination. By simultaneously considering all these factors, the routing algorithm can effectively balance the traffic load while keeping high adaptivity and routability, which also results in an even distribution of temperature across the network. Instead of collecting the topology information of the whole network, we utilize a 12-bit register to reserve the router state for one hop away, which greatly reduces the hardware cost and decreases the network latency. The simulation results show that our proposed routing algorithm improves latency and energy consumption compared with previously proposed thermal-aware routing schemes, and the improvement is more remarkable in large-scale networks.
C-009 A Novel Routing Algorithm based on Path Diversity and Congestion Estimation

洪揚, 曾濂, 蒋欣, 渡邊孝博

Proceedings of the Forum on Information Technology (情報科学技術フォーラム講演論文集) 14 ( 1 ) 251 - 252 August 2015

This paper proposes a minimal adaptive routing algorithm for Network-on-Chip which takes congestion information and routing diversity into consideration. Congestion is one of the most important factors affecting performance. Our algorithm selects a lower-latency path for packet transmission according to the following conditions: the free buffer sizes of the two neighboring routers are compared, the direction which has more distinct paths to the destination is preferred, and the positions of the source and destination determine the direction in which the packet tends to be transmitted. As a result, the algorithm obtains the proper direction for the next step. Compared with other algorithms, our proposed routing algorithm has lower latency and better throughput.

C-008 A High Density Escape Routing Method for Staggered-Pin-Array Based Mixed-Pattern Signal Model

徐倩影, 潘鉄源, 張然, 田楊, 渡邊孝博

Proceedings of the Forum on Information Technology (情報科学技術フォーラム講演論文集) 14 ( 1 ) 249 - 250 August 2015

Escape routing is one of the key problems in the design of printed circuit boards (PCBs), and it is becoming more and more difficult to do manually due to increased pin counts. This paper proposes an algorithm that uses a unified model to formulate the problem of escape routing of differential pairs together with single signals (mixed-pattern signals) on a staggered pin array (SPA), taking routing density into account.

RC-009 Development of a System to Reduce the Workload of DRC Verification (Field C: Hardware and Architecture, Peer-Reviewed Paper)

亀井智紀, 渡邊孝博

Proceedings of the Forum on Information Technology (情報科学技術フォーラム講演論文集) 14 ( 1 ) 69 - 74 August 2015

Sorting-Based I/O Connection Assignment and Non-Manhattan RDL Routing for Flip-Chip Designs

Zhang Ran, Watanabe Takahiro

IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌 C) 135 ( 12 ) 1535 - 1544 2015

In modern VLSI designs, a flip-chip package is widely used to meet the higher integration density and the larger I/O counts of circuits. Recently the I/O buffers are mapped onto bump balls without changing the module placement using re-distribution layer (RDL) in flip-chip designs. In this research, a sorting-based I/O connection assignment and non-Manhattan RDL routing method is proposed for area I/O flip-chip designs. The approach initially assigns the I/O buffers to bump balls by sorting the Manhattan distance between them. Three kinds of pair-exchange procedures are then carried out to improve the initial assignment. Then to shorten the wire length, non-Manhattan RDL routing is adopted to connect the I/O buffers and bump balls. Finally some un-routed connections are ripped up and rerouted. The experimental results show that the proposed method is able to obtain the routes with shorter wire length in reasonable CPU times.

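The initial assignment step mentioned in the abstract above (sorting by Manhattan distance) can be illustrated with a simple greedy sketch. This is an assumption-laden illustration rather than the published procedure: the abstract does not spell out how the sorted distances are consumed, so the greedy pairing below, along with the names `manhattan` and `sorted_greedy_assignment`, is hypothetical. The pair-exchange improvement and the non-Manhattan RDL routing are not reproduced.

```python
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Manhattan (L1) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sorted_greedy_assignment(buffers: List[Point], bumps: List[Point]) -> Dict[int, int]:
    """Map each I/O buffer index to a distinct bump ball index,
    taking candidate pairs in order of increasing Manhattan distance."""
    pairs = sorted(
        product(range(len(buffers)), range(len(bumps))),
        # Ties are broken by index so the result is deterministic.
        key=lambda p: (manhattan(buffers[p[0]], bumps[p[1]]), p),
    )
    assignment: Dict[int, int] = {}
    used_bumps = set()
    for buf, bump in pairs:
        if buf not in assignment and bump not in used_bumps:
            assignment[buf] = bump
            used_bumps.add(bump)
    return assignment


buffers = [(0, 0), (5, 5), (0, 4)]
bumps = [(1, 0), (4, 5), (0, 5), (9, 9)]
print(sorted_greedy_assignment(buffers, bumps))  # {0: 0, 1: 1, 2: 2}
```

A greedy pass like this can leave some buffers with long connections, which is presumably why the abstract follows the initial assignment with pair-exchange procedures.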
Layer Assignment and Equal-length Routing for Disordered Pins in PCB Design

Zhang Ran, Pan Tieyuan, Zhu Li, Watanabe Takahiro

Information and Media Technologies 10 ( 3 ) 395 - 404 2015

In recent printed circuit board (PCB) design, due to the high density of integration, the signal propagation delay or skew has become an important factor for a circuit performance. As the routing delay is proportional to the wire length, the controllability of the wire length is usually focused on. In this research, a heuristic algorithm to get equal-length routing for disordered pins in PCB design is proposed. The approach initially checks the longest common subsequence of source and target pin sets to assign layers for pins. Single commodity flow is then carried out to generate the base routes. Finally, considering target length requirement and available routing region, R-flip and C-flip are adopted to adjust the wire length. The experimental results show that the proposed method is able to obtain the routes with better wire length balance and smaller worst length error in reasonable CPU times.

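The longest-common-subsequence check mentioned in the abstract above can be sketched as follows. Nets whose order is the same on the source side and the target side form a common subsequence, so they can share a layer without crossing. The layer-peeling loop in `assign_layers` is an assumption made for illustration (the abstract does not describe the exact assignment rule), and both function names are hypothetical. The flow-based routing and the R-flip and C-flip adjustments are not covered.

```python
from typing import List, Sequence


def longest_common_subsequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Classic dynamic-programming LCS; returns one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to recover the subsequence itself.
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def assign_layers(source_order: Sequence[str], target_order: Sequence[str]) -> List[List[str]]:
    """Assumed scheme: repeatedly peel off the LCS of the remaining nets;
    each peeled subsequence becomes one routing layer."""
    src, tgt = list(source_order), list(target_order)
    layers: List[List[str]] = []
    while src:
        common = longest_common_subsequence(src, tgt)
        if not common:
            raise ValueError("source and target must contain the same nets")
        layers.append(common)
        chosen = set(common)
        src = [net for net in src if net not in chosen]
        tgt = [net for net in tgt if net not in chosen]
    return layers


print(assign_layers(["a", "b", "c", "d", "e"], ["b", "a", "c", "e", "d"]))
# [['b', 'c', 'e'], ['a', 'd']]
```

In this example five disordered nets fit on two layers, since each layer holds nets that appear in the same relative order at both ends.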
A Performance Enhanced Dual-switch Network-on-chip Architecture

Zeng Lian, Jiang Xin, Watanabe Takahiro

Information and Media Technologies 10 ( 3 ) 405 - 414 2015

With rapid progress in semiconductor technology, Network-on-Chip (NoC) becomes an attractive solution for future systems on chip (SoC). The network performance depends critically on the performance of packets routing. The delay of router and packets contention can significantly affect network latency and throughput. As the network becomes more congested, packets will be blocked more frequently. It would result in degrading the network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing DSA design, we can make utmost use of idle output ports to reduce packets contention delay, meanwhile, without increasing router delay. Experimental results show that our design significantly achieves the performance improvement in terms of throughput and latency at the cost of very little power and area overhead.

A Performance Enhanced Dual-switch Network-on-chip Architecture

Zeng Lian, Jiang Xin, Watanabe Takahiro

IPSJ Transactions on System LSI Design Methodology 8 ( 0 ) 85 - 94 2015

With rapid progress in semiconductor technology, Network-on-Chip (NoC) becomes an attractive solution for future systems on chip (SoC). The network performance depends critically on the performance of packets routing. The delay of router and packets contention can significantly affect network latency and throughput. As the network becomes more congested, packets will be blocked more frequently. It would result in degrading the network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing DSA design, we can make utmost use of idle output ports to reduce packets contention delay, meanwhile, without increasing router delay. Experimental results show that our design significantly achieves the performance improvement in terms of throughput and latency at the cost of very little power and area overhead.

A Performance Enhanced Dual-switch Network-on-Chip Architecture

Lian Zeng, Takahiro Watanabe

2015 20th Asia and South Pacific Design Automation Conference (ASP-DAC) 69 - 74 2015 [Peer-reviewed]

Network-on-Chip (NoC) is an attractive solution for future systems on chip (SoC). The network performance depends critically on the performance of packets routing. However, as the network becomes more congested, packets will be blocked more frequently. It would result in degrading the network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing two switch allocations, we can make utmost use of idle output ports. Experimental results show that our design significantly achieves the performance improvement in terms of throughput and latency at the cost of very little power overhead.
A Length Matching Routing Method for Disordered Pins in PCB Design

Ran Zhang, Tieyuan Pan, Li Zhu, Takahiro Watanabe

2015 20th Asia and South Pacific Design Automation Conference (ASP-DAC) 402 - 407 2015 [Peer-reviewed]

In this paper, for the disordered pins in printed circuit board (PCB) design, a heuristic algorithm is proposed to obtain a length-matching routing. We initially check the longest common subsequence of pin pairs to assign layers for pins. Then, we adopt single commodity flow to generate base routes. R-flip and C-flip are finally carried out to adjust the wire length. The experiments show that our algorithm generates the optimal routes with better wire balance within reasonable CPU times.
Application-Specific Shared Last-Level Cache Optimization for Low-Power Embedded Systems

Huatao Zhao, Jiongyao Ye, Xian Su, Takahiro Watanabe

2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS) 2015 [Peer-reviewed]

Modern embedded systems favor the chip multiprocessor frame to achieve higher performance, but they are restricted by the inefficient cache hierarchies. Typically, the accessing interference and improper allocation in last-level cache (LLC) shared by multiprocessors cause significant energy consumption and performance depression. In this paper, we propose a configurable and partitioned cache hierarchy where an energy-efficient runtime mechanism can well manage the shared LLC to meet application programs. This mechanism utilizes the repeated behaviors in hot subroutines of application and selects the proper partition intervals. Then, a low-power metric based configurable scheme is employed to explore the optimal allocation of given cache resources. Thus, we can provide each core with the optimal allocation information to dynamically partition the shared LLC during runtime. Experimental results for a quad-core system using the SPEC2006 benchmarks show that the cache access energy can be reduced by on average 32.5 percent compared to the equal partition scheme only with 1.3 percent performance off.

Vertical-Mesh-Conscious-Dynamic Routing Algorithm for 3D NoCs

Xiangyang Lei, Xin Jiang, Lian Zeng, Takahiro Watanabe

TENCON 2015 - 2015 IEEE Region 10 Conference 2016-January 2015 [Peer-reviewed]

In this paper, a new deadlock-free dynamic turn model named VMCD (vertical-mesh-conscious-dynamic) is proposed for higher performance in 3D NoC. On vertical meshes and odd horizontal meshes, odd-even turn model is applied, while xy routing is utilized on even horizontal meshes. According to the priority of vertical meshes and horizontal meshes, two VMCD routing algorithms are applied based on this turn model. Compared with the Z-odd-even (ZOE) and balanced-odd-even (BOE), the proposed VMCD algorithms take adaptiveness and network balance into account simultaneously and show better performance including average latency and throughput. Compared to ZOE on 8*8*2 and 8*8*4 mesh, the improvement of throughput can be up to 68.5% and 9.3% respectively for the random traffic and 14.3% and 20% respectively for the transpose traffic pattern. The performance improvement is much more remarkable compared with BOE routing algorithm.

Citations (Scopus): 1

A Performance Enhanced Adaptive Routing Algorithm for 3D Network-on-Chips

Lian Zeng, Tieyuan Pan, Xin Jiang, Takahiro Watanabe

TENCON 2015 - 2015 IEEE Region 10 Conference 2016-January 2015 [Peer-reviewed]

As semiconductor technology continues to develop, hundreds of cores will be deployed on a single die in future Chip-Multiprocessor (CMP) designs. Three-Dimensional Network-on-Chips (3D NoCs) have therefore become an attractive solution which can provide high performance. The network performance depends critically on the performance of the routing algorithm. This paper proposes a novel adaptive routing algorithm for 3D NoC which can resolve congestion not only within layers but also between layers. Simulation results show that our proposed method achieves a significant performance improvement compared with other traditional routing algorithms.

Citations (Scopus): 1

A Sorting-Based Micro-Bump Assignment for 3D ICs

Ran Zhang, Tieyuan Pan, Takahiro Watanabe

2015 International SoC Design Conference (ISOCC) 139 - 140 2015 [Peer-reviewed]

Recently, RDLs (Re-Distribution Layers) and micro-bumps have been widely adopted in 3D IC designs. In this research, a sorting-based micro-bump assignment method is proposed. The approach initially assigns the I/O pads to micro-bumps by sorting the Manhattan distance between them. Then single-layer routing is carried out in each of the two RDLs. The experimental results show that the proposed method is able to obtain routes with shorter total wire length in reasonable CPU times.

Citations (Scopus): 3

A Stack-based Solution for Alias Problem in Branch Prediction

Sijie YIN, Huatao ZHANG, Takahiro WATANABE

The 76th National Convention of IPSJ (情報処理学会第76回全国大会) 2014 ( 1 ) 95 - 96 March 2014

In modern embedded systems, improving branch prediction accuracy is one of the most crucial concerns. It is well known that branch aliasing has become one of the most serious problems affecting the accuracy of the 2-level adaptive predictor. In this paper, we propose a stack-based solution which can alleviate the alias problem significantly and improve the accuracy of branch prediction. Our proposed solution extends the conventional PHT by 4 bits on its higher bits. Experiments are carried out on Simple-scalar3.0e and the performance is verified by testing SPEC2006. The results show that, in contrast to conventional prediction, the proposed structure achieves about 10.5% improvement in IPC on average with negligible extra hardware cost.

CiNii
Adaptive Routing with Congestion Estimation based on G-table

Gong Zheng, Zeng Lian, Watanabe Takahiro

2014 IEICE General Conference March 2014
A Sophisticated Routing Algorithm in 3D NoC with Fixed TSVs for Low Energy and Latency

Jiang Xin, Zeng Lian, Watanabe Takahiro

Information and Media Technologies 9 ( 4 ) 404 - 412 2014

Abstract

With rapid progress in Integrated Circuit technologies, Three-Dimensional Network-on-Chips (3DNoCs) have become a promising solution for achieving low latency and low power. Under the constraint of the TSV number used in 3DNoCs, designing a proper routing algorithm with fewer TSVs is a critical problem for network performance improvement. In this work, we design a novel fully adaptive routing algorithm in 3D NoC. The algorithm consists of two parts: one is a vertical node assignment in inter-layer routing, which is a TSV selection scheme in a limited quantity of TSVs in the NoC architecture, and the other is a 2D fully adaptive routing algorithm in intra-layer routing, which combines the optimization of routing distance, network traffic condition and diversity of the path selection. Simulation results show that our proposed routing algorithm can achieve lower latency and energy consumption compared with other traditional routing algorithms.

DOI CiNii
A sophisticated routing algorithm in 3D NoC with fixed TSVs for low energy and latency

Xin Jiang, Lian Zeng, Takahiro Watanabe

IPSJ Transactions on System LSI Design Methodology 7 ( 0 ) 101 - 109 August 2014

Abstract

With rapid progress in Integrated Circuit technologies, Three-Dimensional Network-on-Chips (3DNoCs) have become a promising solution for achieving low latency and low power. Under the constraint of the TSV number used in 3DNoCs, designing a proper routing algorithm with fewer TSVs is a critical problem for network performance improvement. In this work, we design a novel fully adaptive routing algorithm in 3D NoC. The algorithm consists of two parts: one is a vertical node assignment in inter-layer routing, which is a TSV selection scheme in a limited quantity of TSVs in the NoC architecture, and the other is a 2D fully adaptive routing algorithm in intra-layer routing, which combines the optimization of routing distance, network traffic condition and diversity of the path selection. Simulation results show that our proposed routing algorithm can achieve lower latency and energy consumption compared with other traditional routing algorithms.

DOI CiNii

Scopus

Citations (Scopus): 5
A Randomized Algorithm for the Fixed-Length Routing Problem

Tieyuan Pan, Ran Zhang, Yasuhiro Takashima, Takahiro Watanabe

2014 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS) 711 - 714 2014 [Refereed]

Abstract

In this paper, we propose a fixed-length routing method for Printed Circuit Boards (PCBs). The proposed method uses the simpath algorithm with a randomized graph reduction. It outputs a routing of the nets with small length error. Its efficiency is confirmed empirically.
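To make the fixed-length routing objective concrete, the sketch below enumerates simple (self-avoiding) paths between two pins on a small grid and keeps the one whose wire length is closest to a target. This is a brute-force stand-in written for illustration: it is not the simpath algorithm and does not include the randomized graph reduction mentioned in the abstract. The grid model, function names, and parameters are all assumptions.

```python
def simple_paths(width, height, src, dst):
    """Yield every simple path from src to dst on a width x height grid."""
    def neighbors(cell):
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)

    path, visited = [src], {src}

    def dfs(cell):
        if cell == dst:
            yield list(path)
            return
        for nxt in neighbors(cell):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from dfs(nxt)
                # Backtrack so other branches can reuse this cell.
                path.pop()
                visited.remove(nxt)

    yield from dfs(src)

def best_fixed_length_route(width, height, src, dst, target_length):
    """Return (path, error) minimizing |wire length - target_length|."""
    best, best_err = None, None
    for p in simple_paths(width, height, src, dst):
        err = abs((len(p) - 1) - target_length)  # length = number of edges
        if best_err is None or err < best_err:
            best, best_err = p, err
    return best, best_err

route, error = best_fixed_length_route(3, 3, (0, 0), (2, 2), target_length=6)
print(len(route) - 1, error)
```

Exhaustive enumeration grows exponentially with grid size, which is why compact path-enumeration techniques and graph reduction matter in practice; the sketch is only meant to show what "length error" measures.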
A Fast Verification System for VLSI Power Wire Widths Using LVS Output Information

亀井智紀, 渡邊孝博, 川北真裕

IEICE Transactions D (Japanese Edition) Vol.J96-D ( 5 ) May 2013
An efficient algorithm for 3D NoC architecture optimization

Xin Jiang, Ran Zhang, Takahiro Watanabe

IPSJ Transactions on System LSI Design Methodology 6 34 - 41 February 2013

Abstract

With the progress of 3D IC integration technologies, 3D Networks-on-Chip (NoCs) have been proposed as a scalable and efficient solution for global communication in interconnect design. In this work, we propose a new procedure for designing application-specific irregular 3D NoC architectures. This procedure not only accommodates the variability of highly customized SoC designs but also achieves significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. An efficient Genetic Algorithm (GA) based algorithm is applied to optimize both the topology and the floorplan. Numerical experiments are conducted on standard benchmarks, first comparing the method applied to 3D architectures with 2D designs and then comparing the architecture obtained by the proposed algorithm with both classical and custom topologies. The experimental results show that the architectures produced by our design algorithm achieve greater performance improvement than those of other algorithms, and the proposed algorithm also proves to be a time-efficient method for exploring the large solution space. © 2013 Information Processing Society of Japan.

DOI

Scopus

Citations (Scopus): 1
An Efficient Algorithm for 3D NoC Architecture Optimization

Jiang Xin, Zhang Ran, Watanabe Takahiro

Information and Media Technologies 8 ( 2 ) 254 - 261 2013

Abstract

With the progress of 3D IC integration technologies, 3D Networks-on-Chip (NoCs) have been proposed as a scalable and efficient solution for global communication in interconnect design. In this work, we propose a new procedure for designing application-specific irregular 3D NoC architectures. This procedure not only accommodates the variability of highly customized SoC designs but also achieves significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. An efficient Genetic Algorithm (GA) based algorithm is applied to optimize both the topology and the floorplan. Numerical experiments are conducted on standard benchmarks, first comparing the method applied to 3D architectures with 2D designs and then comparing the architecture obtained by the proposed algorithm with both classical and custom topologies. The experimental results show that the architectures produced by our design algorithm achieve greater performance improvement than those of other algorithms, and the proposed algorithm also proves to be a time-efficient method for exploring the large solution space.

DOI CiNii
Flexible L1 Cache Optimization for a Low Power Embedded System

Huatao Zhao, Sijie Yin, Yuxin Sun, Takahiro Watanabe

PROCEEDINGS 2013 INTERNATIONAL CONFERENCE ON MECHATRONIC SCIENCES, ELECTRIC ENGINEERING AND COMPUTER (MEC) 1 2433 - 2437 2013 [Refereed]

Abstract

The large power consumption of memory accesses has been one of the major bottlenecks in modern embedded systems, where caches can account for about half of the total power consumption. It is therefore essential to optimize cache parameters and to enhance the cache's adaptability to various applications. Considering the particular applications of embedded systems, we can optimize caches through configuration parameters such as cache size, line size, and associativity. In this paper, we first present the relations between those cache parameters; based on the quantified results, we establish a new reconfigurable cache structure in which a search algorithm rapidly finds the optimal cache parameters. Furthermore, a possible hardware implementation with certain parameters is described, and the effectiveness of the method is verified by experiments using CACTI 6.5 and the SPEC2006 benchmarks on SimpleScalar 3.0e. Experimental results show that the proposed cache can reduce power consumption to 38.4% of the maximum power consumption caused by redundant hardware resources.
A Parallel Routing Method for Fixed Pins using Virtual Boundary

Ran Zhang, Takahiro Watanabe

2013 IEEE TENCON SPRING CONFERENCE 99 - 103 2013 [Refereed]

Abstract

In recent PCB systems, routing for high-speed boards is still done manually. As IC technology advances rapidly, the dimensions of packages and PCBs are decreasing while pin counts and the number of routing layers keep increasing. In this research, a parallel routing method for fixed pins using a virtual boundary is proposed, which partitions the routing area into several sub-areas and routes them separately. With the proposed method, the wire length can be reduced. Moreover, under length-matching constraints, especially in isometric wire routing problems, the proposed method achieves better wire shape resemblance.
A Novel Fully Adaptive Fault-tolerant Routing Algorithm for 3D Network-on-Chip

Xin Jiang, Takahiro Watanabe

2013 IEEE INTERNATIONAL CONFERENCE OF IEEE REGION 10 (TENCON) 2013 [Refereed]

Abstract

In this work, we present an efficient fully adaptive fault-tolerant routing algorithm for 3D Network-on-Chip (3D NoC). The algorithm first routes a packet to the destination layer using an adaptive vertical node assignment scheme in a NoC architecture with a limited number of TSVs, and then routes it to the destination node within the 2D layer using a fully adaptive routing algorithm. Instead of rerouting packets around fault regions when a fault occurs, the proposed algorithm applies a fault detection scheme that obtains fault information one hop in advance and incorporates it into the path computation. The algorithm can handle multiple faults in the 3D NoC architecture. Simulation results show that the proposed routing algorithm achieves lower latency, lower energy consumption, and a higher packet arrival rate than other traditional routing algorithms in various network applications.
Adaptive Router with Predictor using Congestion Degree for 3D Network-on-Chip

Lian Zeng, Xin Jiang, Takahiro Watanabe

2013 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC) 46 - 49 2013 [Refereed]

Abstract

As chip multiprocessor (CMP) technology evolves, the performance of 2D architectures becomes insufficient to meet various requirements, and three-dimensional integrated circuits (3D-ICs) provide an attractive solution for improving network performance by using through-silicon vias (TSVs). However, more packets are transmitted in a 3D network, and congestion conditions become more complex. Network performance depends critically on the routing algorithm, and various routing algorithms have been proposed for 3D NoCs. An adaptive routing algorithm that merges local congestion and future congestion information was proposed in [9]. However, the congestion used there is only roughly estimated, even though congestion significantly affects network performance. In this paper, we propose a more precise congestion measure for the predictor of [9] and implement it in 3D NoCs. The proposed method is shown to achieve better latency and throughput than traditional routing methods such as XY routing and odd-even routing.
Region Oriented Routing FPGA Architecture for Dynamic Power Gating

Ce Li, Yiping Dong, Takahiro Watanabe

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E95A ( 12 ) 2199 - 2207 December 2012 [Refereed]

Abstract

Dynamic power gating applied to FPGAs can reduce power consumption effectively. In this paper, we propose a sophisticated routing architecture for a region-oriented FPGA that supports dynamic power gating. This is the first routing solution for dynamic power gating in coarse-grained FPGAs. This paper makes two main contributions. First, it improves the routing resource graph and routing architecture to support special routing for a region-oriented FPGA. Second, some routing channels are made wider to avoid congestion. Experimental results show that the routing area can be reduced by 7.7% compared with the symmetric Wilton switch box in the region. In addition, the proposed FPGA architecture with sophisticated P&R can reduce the power consumption of a system implemented on the FPGA.

DOI

Scopus
Rotational Display Problem for Array Reference in LSI Layout Data

Tomoki Kamei, Takahiro Watanabe

Proc. ITC-CSCC 2012 July 2012
Design and Implementation of SHA-1 Hash Function using Verilog HDL

Suhaili Shamsiah binti, Takahiro Watanabe

Proceedings of the 2012 IEICE General Conference DS-1-3 DS ( 1 ) s5 - s6 March 2012
A Parallel Routing Method using Virtual Boundary

Ran Zhang, Takahiro Watanabe

Proceedings of the 2012 IEICE General Conference A-3-2 A ( 3 ) 2 March 2012
A Time-efficient Approach to Evolve GA-based Image Filters

Endong Ni, Takahiro Watanabe

Proceedings of the 2012 IEICE General Conference A ( 1 ) 33 March 2012
Region-Oriented Placement Algorithm for Coarse-Grained Power-Gating FPGA Architecture

Ce Li, Yiping Dong, Takahiro Watanabe

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E95D ( 2 ) 314 - 323 February 2012 [Refereed]

Abstract

An FPGA plays an essential role in industrial products because it is fast, stable, and flexible. However, the power consumption of FPGAs used in portable devices is a critical issue. The top-down hierarchical design method is commonly used in both ASIC and FPGA design. However, when multiple modules are integrated in an FPGA and some of them may be in sleep mode, current FPGA architectures cannot be fully effective. In this paper, a coarse-grained power-gating FPGA architecture is proposed in which the whole area of an FPGA is partitioned into several regions and the power supply is controlled for each region, so that modules in sleep mode can be effectively powered off. We also propose a region-oriented FPGA placement algorithm, based on VPR [1], that fits the user's hierarchical design. Simulation results show that the proposed method can reduce FPGA power consumption by 38% on average by putting unused modules or regions into sleep mode.

DOI

Scopus

Citations (Scopus): 1
A Behavior-based Adaptive Access-mode for Low-power Set-associative Caches in Embedded Systems

Jiongyao Ye, Hongfeng Ding, Yingtao Hu, Takahiro Watanabe

Journal of Information Processing 20 ( 1 ) 26 - 36 January 2012

Abstract

Modern embedded processors commonly use a set-associative scheme to reduce cache misses. However, a conventional set-associative cache has drawbacks in terms of power consumption because it probes all ways to reduce the access time, although only the matched way is used. The energy spent accessing the other ways is wasted, and the share of such energy increases as cache associativity increases. Previous techniques, such as phased caches, way-prediction caches, and partial tag comparison, have been proposed to reduce the power consumption of set-associative caches by optimizing the cache access mode. However, these methods cannot adapt to program behavior because they use a single access mode throughout program execution. In this paper, we propose a behavior-based adaptive access mode for set-associative caches in embedded systems, which can dynamically adjust the access mode during program execution. First, a program is divided into several phases based on the principle of program behavior repetition. Then, an off-system pre-analysis determines the optimal access mode for each phase, so that each phase employs its own optimal access mode to meet the application's demands during execution. The proposed approach requires little hardware overhead and shifts most of the workload to software, so it is very effective for embedded processors. Simulation using SPEC 2000 shows that the proposed approach can reduce power by roughly 76.95% for an instruction cache and 64.67% for a data cache, while the performance degradation is less than 1%.

DOI CiNii

Scopus

Citations (Scopus): 1
A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction

Jiongyao Ye, Yu Wan, Takahiro Watanabe

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E94A ( 12 ) 2639 - 2648 December 2011 [Refereed]

Abstract

Current trends in modern out-of-order processors involve deeper pipelines and larger instruction windows to achieve high performance, which makes the branch misprediction recovery penalty a critical factor in overall processor performance. Multipath execution has been proposed to reduce this penalty by executing both paths following a branch simultaneously. However, this mechanism has drawbacks, such as the design complexity of processing both paths after a branch and the performance degradation due to hardware resource competition between the two paths. In this paper, we propose a new recovery mechanism, called Recovery Critical Misprediction (RCM), to reduce the penalty of branch misprediction recovery. The mechanism uses a small trace cache to save the decoded instructions from the alternative path following a branch. Then, during subsequent predictions, the trace cache is accessed. If there is a hit, the processor forks the second path of this branch at the rename stage, so that the design complexity in the fetch and decode stages is alleviated. The main contribution of this paper is that the proposed mechanism employs critical path prediction to identify the branches that will be most harmful if mispredicted. Only critical branches can save their alternative paths into the trace cache, which not only increases the usefulness of a limited-size trace cache but also avoids the performance degradation caused by forking non-critical branches. Experimental results using the SPECint 2000 benchmarks show that a processor with the proposed RCM improves IPC by 10.05% compared with a conventional processor.

DOI

Scopus
Low Power Placement and Routing for the Coarse-Grained Power Gating FPGA Architecture

Ce Li, Yiping Dong, Takahiro Watanabe

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E94A ( 12 ) 2519 - 2527 December 2011 [Refereed]

Abstract

Since an FPGA consumes more power than an ASIC performing the same function at the same technology scaling, the application of FPGAs is limited, especially in portable electronic devices. In this paper, we propose a novel low-power FPGA architecture based on coarse-grained power gating to reduce power consumption. A new placement algorithm and a routing resource graph for sleep regions are also presented. After enhancing the CAD framework, we give a detailed discussion of the different region sizes supported by the new FPGA architecture. As a result, the proposed FPGA architecture combined with the new placement and routing algorithm reduces total power consumption by 19.4% compared with a traditional FPGA. With the proposed method, FPGAs can be expected to be widely applied to portable devices.

DOI

Scopus

Citations (Scopus): 1
A Hybrid Layer-Multiplexing and Pipeline Architecture for Efficient FPGA-based Multilayer Neural Network

Y.P.Dong, C.Li, Z.Lin, Takahiro Watanabe

IEICE NOLTA E94-N ( 10 ) 522 - 532 October 2011
An Adaptive Various-Width Data Cache for Low Power Design

Jiongyao Ye, Yu Wan, Takahiro Watanabe

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E94D ( 8 ) 1539 - 1546 August 2011 [Refereed]

Abstract

Modern microprocessors employ caches to bridge the large speed gap between main memory and the central processing unit, but these caches account for an increasingly large proportion of total power consumption. In fact, many values in a processor rarely need the full-bit dynamic range supported by a cache, and narrow-width values occupy a large portion of cache accesses and storage. In view of these observations, this paper proposes an Adaptive Various-width Data Cache (AVDC) to reduce cache power consumption by exploiting the prevalence of narrow-width values stored in the cache. In AVDC, the data storage unit consists of three sub-arrays that store data of different widths. When the high sub-arrays are not used, they are shut off through a modified high-bit SRAM cell to save their dynamic and static power. The main advantages of AVDC are: 1) both dynamic and static power consumption can be reduced; 2) low power consumption is achieved by modifying the data storage unit with little hardware modification; 3) the redundancy of narrow-width values is exploited instead of compressing values, so cache access latency does not increase. Experimental results using the SPEC 2000 benchmarks show that the proposed AVDC reduces dynamic power by 34.83% and static power by 42.87% on average, compared with a cache without AVDC.
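The narrow-width idea in this abstract can be illustrated by computing how many bits a signed value actually needs and, from that, how many sub-arrays would have to stay powered. The abstract states only that there are three sub-arrays for different widths; the 8/16/32-bit split, the 32-bit word size, and the function names below are assumptions chosen for illustration.

```python
def required_bits(value, word_bits=32):
    """Smallest two's-complement width that can hold a signed value."""
    if not -(1 << (word_bits - 1)) <= value < (1 << (word_bits - 1)):
        raise ValueError("value does not fit in the word")
    # For a negative value, ~value is non-negative and has the same
    # number of significant bits; one extra bit holds the sign.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1

def active_subarrays(value, widths=(8, 16, 32)):
    """Number of sub-arrays that must stay powered to hold the value.

    The cumulative widths (8, 16, 32) are a hypothetical split.
    """
    bits = required_bits(value, widths[-1])
    for count, width in enumerate(widths, start=1):
        if bits <= width:
            return count
    raise AssertionError("unreachable: value fits in the widest width")

values = [3, -1, 200, 70000]
print([active_subarrays(v) for v in values])
```

Under these assumed widths, small values such as 3 or -1 need only the lowest sub-array, so the higher ones could be shut off, which is the saving the abstract describes.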

DOI

Scopus
Analysis before Starting an Access: A New Power-Efficient Instruction Fetch Mechanism

Jiongyao Ye, Yingtao Hu, Hongfeng Ding, Takahiro Watanabe

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E94D ( 7 ) 1398 - 1408 July 2011 [Refereed]

Abstract

Power consumption has become an increasing concern in high-performance microprocessor design. In particular, the Instruction Cache (I-Cache) contributes a large portion of the total power consumption of a microprocessor, since it is a complex unit and is accessed very frequently. Several low-power techniques have been presented for power-efficient cache design. However, these techniques usually suffer from the restrictions of traditional Instruction Fetch Unit (IFU) architectures, in which the fetch address must be sent to the I-Cache as soon as it is available. Consequently, there is little opportunity to reduce power consumption between address generation and the start of an access. In this paper, we present a new power-aware IFU architecture, named Analysis Before Starting an Access (ABSA), which aims to maximize the power efficiency of low-power designs by eliminating the restrictions that the traditional IFU imposes on them. To achieve this goal, ABSA reorganizes the IFU pipeline and carefully assigns tasks to each stage so that the low-power techniques have sufficient time and information to maximize power efficiency before an access starts. The proposed design is fully scalable and its cost is low. Simulation results show that, compared with a conventional IFU design, ABSA saves about 30.3% of fetch power consumption on average. The I-Cache employed by ABSA reduces static and dynamic power consumption by about 85.63% and 66.92%, respectively. Meanwhile, the performance degradation is only about 0.97%.

DOI

Scopus
High Performance Feedforward Neural Network Mapped by NoC Architecture with a New Routing Strategy Implementation Method

Y.P.Dong, C.Li, Z.Lin, H.Zhang, Takahiro Watanabe

J. Signal Processing 15 ( 3 ) 113 - 122 March 2011

CiNii
Mixed Constrained Image Filter Design for Salt-and-pepper Noise Reduction using Genetic Algorithm

Bao Zhiguo, Takahiro Watanabe

IEEJ Trans. EIS Vol.131, No.3 363 - 368 March 2011
Efficient GA approach combined with Taguchi method for mixed constrained circuit design

Yiwen Su, Zhiguo Bao, Fangfang Wang, Takahiro Watanabe

Proceedings - 2011 International Conference on Computational Science and Its Applications, ICCSA 2011 290 - 293 2011

Abstract

This paper proposes a new circuit design optimization method in which a Genetic Algorithm (GA) with parameterized uniform crossover (GApuc) is combined with the Taguchi method. The purposes are (a) to use the Taguchi method to search for the optimal fitness value and (b) to evaluate the power and signal delay of logic blocks in circuit design in order to obtain a circuit that is optimal in complexity, power, and signal delay. The present study extends previous results by providing a much more detailed examination of mixed constrained circuit design. Experimental results show that the proposed approach can produce a good circuit in terms of both fitness function and CPU time. © 2011 IEEE.

DOI

Scopus

Citations (Scopus): 4
Speedup of Large-Scale LSI Layout by Reducing the Number of Vias

亀井智紀, 安部拓哉, 本垰秀昭, 渡邊孝博

IPSJ SIG Technical Reports (SLDM) 2011-SLDM-148(17) 1 - 6 January 2011
Fault-tolerant Image Filter Design using Particle Swarm Optimization

Zhiguo Bao, Fangfang Wang, Xiaoming Zhao, Takahiro Watanabe

PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 16TH '11) 653 - 658 2011 [Refereed]

Abstract

This paper describes fault-tolerant mixed constrained image filter design using Particle Swarm Optimization (PSO) on a reconfigurable processing array. A reconfigurable processing array may contain some faulty Configurable Logic Blocks (CLBs). The proposed PSO-based method autonomously synthesizes a filter fitted to a reconfigurable device with some faults, optimizing the complexity and power of the circuit and the signal delay in both CLBs and wires. An image filter for noise reduction is experimentally synthesized to verify the validity of the method. Through evolution, the quality of the optimized image filter on a reconfigurable device with a few faults is almost the same as that on a fault-free device.
A High Performance Digital Neural Processor Design by Network on Chip Architecture

Yiping Dong, Ce Li, Hui Liu, Watanabe Takahiro

2011 INTERNATIONAL SYMPOSIUM ON VLSI DESIGN, AUTOMATION AND TEST (VLSI-DAT) 243 - 246 2011 [Refereed]

Abstract

This paper describes a high-performance neural processor that uses a Network on Chip (NoC) architecture to solve the interconnection and performance problems of hardware neural networks. The proposed NoC-based neural processor is composed of 20 tiles in a 4x5 2-D array, and each tile includes a Processing Element (PE) and a packet-switched router. Four neurons are implemented in each PE to achieve a low communication load. The network has a 2-D torus topology, a 32 G/s bandwidth, and an asynchronous clocking system. The proposed neural processor is designed in 90-nm CMOS technology with one poly layer and nine metal layers, and its performance is evaluated. As a result, it achieves over 3.1 G Connections Per Second (CPS), with a power dissipation of 1.1317 W at a 1.2 V supply voltage and a chip area of 25 mm². Compared with other existing hardware neural networks, the proposed processor achieves a low communication load and high performance, and it is reconfigurable and extendable.
New power-efficient FPGA design combining with region-constrained placement and multiple power domains

Ce Li, Yiping Dong, Takahiro Watanabe

2011 IEEE 9th International New Circuits and Systems Conference, NEWCAS 2011 69 - 72 2011

Abstract

Multiple power domain architectures have been studied for power-efficient FPGAs. However, most of this research focuses on fine-grained power gating of clustered logic blocks, which increases the FPGA size significantly. This paper presents a fast placement algorithm for a coarse-grained FPGA architecture, by which a circuit with multiple power domains is mapped into several regions for low power consumption. Each region uses one or several sleep transistors to conserve leakage energy. Using the CAD framework, we discuss the power efficiency of the sleep-region FPGA architecture on benchmarks assumed to have multiple power domains. Simulation results show that the proposed placement algorithm reduces FPGA power consumption by 9.1% on average compared with the traditional algorithm. Furthermore, when the dual power domains are powered on and off individually, the proposed method can reduce power by more than 20%. © 2011 IEEE.

DOI

Scopus

Citations (Scopus): 5
New Power-aware Placement for Region based FPGA Architecture combined with Dynamic Power Gating by PCHM

C.Li, Y.P.Dong, T. Watanabe

Proc.ISLPED'11 (Int'l Symp. Low Power Electronics Design) 223 - 228 2011

DOI

Scopus

Citations (Scopus): 11
An efficient design algorithm for exploring flexible topologies in custom adaptive 3D NoCs for high performance and low power

Xin Jiang, Ran Zhang, Takahiro Watanabe

Proceedings of International Conference on ASIC 535 - 538 2011

Abstract

The application of 3D Networks-on-chip (NoCs) has been proved to be an effective solution to the global communication of 3D IC integration, while the design of NoC topologies has played a critical role to increase interconnection performance. In this work, we propose a new procedure for designing application specific irregular 3D NoC topologies which achieve significant performance improvement. The objective is to improve both communication latency and power consumption under several 3D constraints. We propose a two-stage design model based on a series of efficient algorithms to explore the optimized topology in a large scale searching space. Numerical experimental results show that the topologies by our design algorithm achieve more performance improvement (about 31.5%) than the classical topologies and the proposed algorithm also proves to be a time efficient method for exploration in the large solution space. © 2011 IEEE.

DOI

Scopus

Citations (Scopus): 1
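Development of a Customizable Rip-up IP MIX and WIPER2.0

李美燕, 王嘉宇, 渡邊孝博

The 63rd Joint Conference of Electrical and Electronics Engineers in Kyushu 02-1P-02 September 2010
A Proposal of a Low-Latency Routing Algorithm for Network-on-Chip

李岩, 林しん, 董宜平, 渡邊孝博

The 63rd Joint Conference of Electrical and Electronics Engineers in Kyushu 10-2A-08 September 2010
A Multilayer Routing Method for Parallel Equal-Length Routing

張然, 渡邊孝博

The 63rd Joint Conference of Electrical and Electronics Engineers in Kyushu 10-2A-07 September 2010
A Method for High-Performance Hardware Implementation of NoC Routing Algorithms

張華, 董宜平, 渡邉孝博

The 63rd Joint Conference of Electrical and Electronics Engineers in Kyushu 10-2A-09 September 2010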
Circuit Design Using Genetic Algorithm combined with Taguchi method and Particle Swarm Optimization

YiWen Su, Zhiguo Bao, Kuoyang Tu, Takahiro Watanabe

The 63rd Joint Conference of Electrical and Electronics Engineers in Kyushu 12-1A-04 September 2010
Power-efficient Level-2 Cache Design for Embedded Processors

Mengyuan Tang, Jiongyao Ye, Takahiro Watanabe

The 63rd Joint Conference of Electrical and Electronics Engineers in Kyushu 12-1A-01 September 2010
A Novel Low Power FPGA Architecture

Li Ce, Watanabe Takahiro

Proc. FIT2010 (Forum on Information Technology) 1 ( RC002 ) September 2010
Multiple Network-on-Chip Model for High Performance Neural Network

Yiping Dong, Ce Li, Zhen Lin, Takahiro Watanabe

JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE 10 ( 1 ) 28 - 36 March 2010 [Peer-reviewed]

Hardware implementation methods for Artificial Neural Networks (ANNs) have long been researched to achieve high performance. We have proposed a Network on Chip (NoC) for ANNs, and this architecture can reduce communication load and increase performance when the implemented ANN is small. In this paper, multiple NoC models are proposed for ANNs, which can implement both small and large ANNs. The simulation results show that the proposed multiple NoC models reduce communication load, increase system performance in connection-per-second (CPS), and reduce system running time compared with existing hardware ANNs. Furthermore, this architecture is reconfigurable and repairable. It can be used to implement different ANN applications.
Circuit Design Optimization Using Genetic Algorithm with Parameterized Uniform Crossover

Zhiguo Bao, Takahiro Watanabe

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E93A ( 1 ) 281 - 290 January 2010 [Peer-reviewed]

Evolvable hardware (EHW) is a new research field concerning the use of Evolutionary Algorithms (EAs) to construct electronic systems. In a narrow sense, EHW refers to the use of evolutionary mechanisms as the algorithmic drivers for system design; in a general sense, it refers to the capability of a hardware system to develop and improve itself. The Genetic Algorithm (GA) is a typical EA. We propose optimal circuit design using a GA with parameterized uniform crossover (GApuc) and a fitness function composed of circuit complexity, power, and signal delay. Parameterized uniform crossover is much more likely to distribute its disruptive trials in an unbiased manner over larger portions of the space, so it has more exploratory power than one-point and two-point crossover and offers more chances of finding better solutions. Its effectiveness is shown by experiments. The results show that the best elite fitness, the average fitness of the correct circuits, and the number of correct circuits obtained by GApuc are better than those of a GA with one-point or two-point crossover. The best optimal circuit generated by GApuc is 10.18% and 6.08% better in evaluation value than those generated by a GA with one-point crossover and two-point crossover, respectively.
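The crossover operator named in this abstract can be illustrated with a minimal sketch. It follows the common textbook description of parameterized uniform crossover, in which each gene position is exchanged between the two parents independently with a fixed probability; it is not the authors' GApuc implementation. The function name, the `p_swap` parameter, and the list-of-genes encoding are assumptions made for illustration only.

```python
import random


def parameterized_uniform_crossover(parent_a, parent_b, p_swap=0.5, rng=None):
    """Return two children built by swapping each gene with probability p_swap.

    Hypothetical helper: the abstract does not specify an encoding, so genes
    are modelled here as the elements of two equal-length lists.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        # Each position is decided independently, unlike one-point or
        # two-point crossover, which exchange contiguous segments.
        if rng.random() < p_swap:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

With `p_swap=0.5` this reduces to ordinary uniform crossover; smaller values keep children closer to their parents, which is one way the disruptiveness of the operator can be tuned.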

Citations (Scopus): 9
Mixed constrained image filter design using particle swarm optimization

Zhiguo Bao, Takahiro Watanabe

Artificial Life and Robotics 15 ( 3 ) 363 - 368 2010

This article describes an evolutionary image filter design for noise reduction using particle swarm optimization (PSO), where mixed constraints on the circuit complexity, power, and signal delay are optimized. First, the evaluated values of correctness, complexity, power, and signal delay are introduced to the fitness function. Then PSO autonomously synthesizes a filter. To verify the validity of our method, an image filter for noise reduction was synthesized. The performance of the resultant filter by PSO was similar to that of a genetic algorithm (GA), but the running time of PSO is 10% shorter than that of GA. © 2010 International Symposium on Artificial Life and Robotics (ISAROB).

Citations (Scopus): 5
High performance Implementation of Neural Networks by Networks on Chip with 5-Port 2-Virtual Channels

Yiping Dong, Zhen Lin, Yan Li, Takahiro Watanabe

2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS 381 - 384 2010 [Peer-reviewed]

A hardware implementation of an Artificial Neural Network (ANN) is proposed using Networks on Chip (NoC) with a 5-port, 2-virtual-channel router, aiming at higher performance and lower latency. Experimental results from the NIRGAM NoC simulator show that the proposed system has higher Connection-Per-Second (CPS), higher Connection-Per-Second-Per-Weight (CPSPW), and lower communication load. Furthermore, this NoC implementation is reconfigurable and expandable, so it can be applied to various applications.
A novel hardware method to implement a routing algorithm onto network on chip

Yiping Dong, Hua Zhang, Zhen Lin, Takahiro Watanabe

2010 International Conference on Communications, Circuits and Systems, ICCCAS 2010 - Proceedings 852 - 856 2010

Recently, the Network on Chip (NoC) has attracted much attention for its smart structure and high performance. However, NoC routing algorithms significantly influence performance and design cost. In this paper, a new hardware method to implement a routing algorithm is proposed. The proposed method replaces the general destination-tag method for router design. We simulate and evaluate the router and NoC with the proposed method in terms of circuit resources, latency, and throughput. The results indicate that the NoC architecture with the proposed method is effective in reducing circuit resources and latency and in increasing throughput. © 2010 IEEE.

High performance networks on chip architecture with a new routing strategy for neural network

Yiping Dong, Zhen Lin, Takahiro Watanabe

PrimeAsia 2010 - 2nd Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics 347 - 350 2010

A hardware implementation of Artificial Neural Networks (ANNs) by Networks on Chip (NoC) has previously been proposed to improve performance. In this work, a new NoC architecture with a hardware implementation of the routing algorithm is proposed for ANN design. This routing strategy can reduce the size of the packet header. The NOXIM NoC simulator is used to simulate the proposed system in terms of latency, throughput, and power consumption. The experimental results indicate that the proposed NoC architecture is effective in increasing throughput and reducing latency and power consumption compared with the traditional one. An ANN with the new NoC architecture can achieve higher performance and lower communication load.

A Variable Bitline Data Cache for low power design

Jiongyao Ye, Takahiro Watanabe

PrimeAsia 2010 - 2nd Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics 174 - 177 2010

Reducing power consumption is one of the most important design problems at present. Modern microprocessors employ caches to bridge the large speed gap between main memory and the central processing unit, but these caches account for a larger and larger proportion of total power consumption. In fact, many values rarely need the full-bit dynamic range supported by a cache. Narrow-Width Values (NWVs) account for a large portion of cache accesses and storage. It is unreasonable that the storage space for a value is the same regardless of its data width, even though an NWV needs only a few bits to be stored. This paper proposes a Variable Bitline Data Cache (VBDC) that exploits the prevalence of NWVs stored in the cache. In the VBDC design, the cache data array is divided into several sub-arrays so that each data pattern is accessed with a different bitline length. The VBDC can shut off the corresponding unused high arrays to reduce its dynamic and static power consumption. The VBDC achieves low power consumption by reducing the bitline length. Experimental results on SPEC 2000 benchmarks show that the proposed VBDC reduces dynamic and static power consumption by 44.75% and 42.86%, respectively.
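The narrow-width idea described in this abstract can be made concrete with a small software sketch. It only illustrates how one might count the low-order sub-arrays a signed value actually needs, so that the remaining high sub-arrays could in principle be shut off; it is not the VBDC hardware design. The 32-bit word, the 8-bit sub-array width, the two's-complement assumption, and the function name are all hypothetical choices that do not come from the abstract.

```python
def active_subarrays(value, word_bits=32, subarray_bits=8):
    """Count the low-order sub-arrays needed to hold a signed value.

    Hypothetical model: a word is split into word_bits // subarray_bits
    equal sub-arrays, and a value is "narrow" when its upper bits are
    only sign extension.
    """
    if word_bits % subarray_bits != 0:
        raise ValueError("word_bits must be a multiple of subarray_bits")
    lowest, highest = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError("value does not fit in the word")
    # Minimal two's-complement width: magnitude bits plus one sign bit.
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    needed_bits = magnitude_bits + 1
    # Ceiling division gives the number of sub-arrays that must stay on.
    return -(-needed_bits // subarray_bits)
```

Under these assumptions a small loop counter such as 100 needs one sub-array out of four, while a full-range value needs all four.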

Citations (Scopus): 2
Fault-tolerant Image Filter Design using GA

Zhiguo Bao, Fangfang Wang, Xiaoming Zhao, Takahiro Watanabe

TENCON 2010: 2010 IEEE REGION 10 CONFERENCE 897 - 902 2010 [Peer-reviewed]

This paper describes fault-tolerant, mixed-constrained image filter design using a Genetic Algorithm (GA) on a reconfigurable processing array. Some Configurable Logic Blocks (CLBs) in a reconfigurable processing array may be faulty at random. The proposed GA-based method autonomously synthesizes a filter fitted to a reconfigurable device with some faults, evaluating complexity, power, and signal delay in both CLBs and wires. An image filter for noise reduction is experimentally synthesized to verify the validity of our method. Through evolution, the quality of the optimized image filter on a reconfigurable device with a few faults is almost the same as that on a device with no faults.
An Efficient Hardware Routing Algorithms for NoC

Yiping Dong, Zhen Lin, Takahiro Watanabe

TENCON 2010: 2010 IEEE REGION 10 CONFERENCE 1525 - 1530 2010 [Peer-reviewed]

Networks on Chip (NoC) has been widely discussed for its smart structure and high performance. Routing algorithms significantly influence design cost and system performance of NoC. In this paper, a new hardware method called Final-Destination-Tag (FDT) is proposed to improve the original Destination-Tag (DT) method for implementing different routing algorithms. Compared with the DT method, the proposed FDT method could reduce the header size of the packet. We evaluate NoC with this proposed method in terms of circuit resource, average latency, max latency, average throughput and power consumption. The results indicate that the proposed method is effective in increasing throughput and reducing circuit resource, latency and power consumption for NoC.
An efficient 3D NoC synthesis by using Genetic Algorithms

Xin Jiang, Takahiro Watanabe

Proc. IEEE TENCON2010 1207 - 1212 2010 [Peer-reviewed]
A Hybrid Architecture for Efficient FPGA-based Implementation of Multilayer Neural Network

Zhen Lin, Yiping Dong, Yan Li, Takahiro Watanabe

PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS) 616 - 619 2010 [Peer-reviewed]

This paper presents a novel architecture for the FPGA-based implementation of multilayer neural network (NN), which integrates the layer-multiplexing and pipeline architecture together. The proposed method is aimed at enhancing the efficiency of resource usage and improving the forward speed at the module level, so that a larger NN can be implemented on commercial FPGAs. We developed a mapping method from NN schematic to physical architecture in FPGA by using the hybrid architecture, and also developed an algorithm to automatically determine the architecture by optimizing the application specific neural network topology. The experimental results with several different network topologies show that the proposed architecture can produce a very compact circuit with higher speed, compared with conventional methods.
A novel genetic algorithm with different structure selection for circuit design optimization

Zhiguo Bao, Takahiro Watanabe

Artificial Life and Robotics 14 ( 2 ) 266 - 270 November 2009

In the traditional GA, the tournament selection for crossover and mutation is based on the fitness of individuals. This can make convergence easy, but some useful genes may be lost. In selection, as well as fitness, we consider the different structure of each individual compared with an elite one. Some individuals are selected with many different structures, and then crossover and mutation are performed from these to generate new individuals. In this way, the GA can increase diversification into search spaces so that it can find a better solution. One promising application of GA is evolvable hardware (EHW), which is a new research field to synthesize an optimal circuit. We propose an optimal circuit design by using a GA with a different structure selection (GAdss), and with a fitness function composed of circuit complexity, power, and signal delay. Its effectiveness is shown by simulations. From the results, we can see that the best elite fitness, the average fitness value of correct circuits, and the number of correct circuits with GAdss are better than with GA. The best case of optimal circuits generated by GAdss is 8.1% better in evaluation value than that by traditional GA. © 2009 International Symposium on Artificial Life and Robotics (ISAROB).

Citations (Scopus): 2
A Study of Customized Processor IP Design using WIPER

Y. Wan, J. Ye, M. Bi, T. Watanabe

Proc. PrimeAsia’09 November 2009
P/G network design to optimize area, performance and power consumption

Y. Shi, Z. Bao, Y. Wang, X. Zuojun, T. Watanabe

Proc. PrimeAsia’09 November 2009
A new flexible network on chip architecture for mapping complex feedforward neural network

Y. Dong, C. Li, K. Kumai, Y. Li, Y.Wang, T.Watanabe

Journal of Signal Processing 13 ( 6 ) 453 - 462 November 2009
Reducing Branch Misprediction Penalty in Superscalar Microprocessors by Recovering

Ye Jiongyao, Wan Yu, Dong Yiping, Bao Zhiguo, Watanabe Takahiro

Proc. FIT2009 (Forum on Information Technology 2009) 1 ( RC-002 ) 121 - 128 September 2009
Low power and high speed network on chip architecture for bp neural network

Y. P. Dong, Y. H. Li, Y. Wang, T. Watanabe

Proc. ITC-CSCC’09 July 2009
An effective method to reduce recovery cache size by using hash table search

JiongYao Ye, T. Watanabe

Proc. ITC-CSCC2009 July 2009
A novel GA with multi-level evolution for mixed constrained circuit design optimization

Zhiguo Bao, Takahiro Watanabe

Proc. NCSP 2009 (RISP Int'l Workshop on Nonlinear Circuits and Signal Processing) 411 - 414 March 2009
Mixed NoC architecture for mapping complex feedforward neural network

Yiping Dong, Takahiro Watanabe

Proc. NCSP 2009 (RISP Int'l Workshop on Nonlinear Circuits and Signal Processing) 609 - 612 March 2009
A novel genetic algorithm with different structure selection for circuit optimization

Zhiguo Bao, Takahiro Watanabe

Proc. 14th AROB (Int'l Symposium on Artificial Life and Robotics) 218 - 222 February 2009
A Novel Genetic Algorithm with Cell Crossover for Circuit Design Optimization

Zhiguo Bao, Takahiro Watanabe

ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5 2982 - 2985 2009 [Peer-reviewed]

Evolvable Hardware (EHW) is a new field concerning the use of Evolutionary Algorithms (EAs) to synthesize circuits. The Genetic Algorithm (GA) is a typical EA. In a traditional GA, the crossover is one-point or two-point. These crossovers change too many genes of an individual at one time and are not flexible, so useful genes may be lost. In this paper, we propose a novel cell crossover. Cell crossover can change genes more flexibly and bring more diversification to the search space than one-point and two-point crossover, so that better solutions can be found. We propose optimal circuit design using a GA with cell crossover (GAcc) and a fitness function composed of circuit complexity, power, and signal delay. Simulation results show that GAcc is superior to the traditional GA in terms of the best elite fitness, the average fitness of correct circuits, and the number of correct circuits. The best optimal circuit generated by GAcc is 27.9% better in evaluation value than that generated by a GA with one-point crossover.
High Performance and Low Latency Mapping for Neural Network into Network on Chip Architecture

Yiping Dong, Yang Wang, Zhen Lin, Takahiro Watanabe

2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS 891 - 894 2009 [Peer-reviewed]

Various hardware implementations of neural networks have been studied well in recent years. We have already proposed a hardware implementation method for neural network with a Network on Chip (NoC) architecture. A mapping of a neural network on NoC should be tuned to achieve high performance whenever neural network application is changed, so that different mapping methods are needed every time and tedious or burdensome works are required. In this paper, we propose a general mapping strategy based on three rules. The mapping method with this strategy can implement different neural networks applications with NoC architecture. The simulation results show that the proposed method makes the system low latency and high performance.
Evolutionary Design for Image Filter using GA

Zhiguo Bao, Takahiro Watanabe

TENCON 2009 - 2009 IEEE REGION 10 CONFERENCE, VOLS 1-4 164 - 169 2009 [Peer-reviewed]

This paper describes evolutionary image filter design for noise reduction using Genetic Algorithm (GA), where the circuit complexity, power and signal delay are optimized. First, the evaluating value about correctness, complexity, power and signal delay are introduced to the fitness function. Then GA autonomously synthesizes a circuit which is simple and has good performance. To verify the effectiveness of our method, an image filter for noise reduction is experimentally synthesized. The resultant image filter by GA and the quality of filtered image are discussed.
High Dependable Implementation of Neural Networks with Networks on Chip Architecture and a Backtracking Routing Algorithm

Yiping Dong, Kento Kumai, Zhen Lin, Yinghe Li, Takahiro Watanabe

2009 ASIA PACIFIC CONFERENCE ON POSTGRADUATE RESEARCH IN MICROELECTRONICS AND ELECTRONICS (PRIMEASIA 2009) 404 - + 2009 [Peer-reviewed]

Networks on Chip (NoC), a new packet-based design method, together with a new Dependable No Deadlock (DND) backtracking routing algorithm, are proposed to implement an Artificial Neural Network (ANN). The system is simulated with the NIRGAM NoC simulator to measure system performance. Experimental results show that the proposed system has higher Connection-Per-Second (CPS) and lower communication load than other existing ANN implementations. Furthermore, this NoC implementation is reconfigurable and expandable. In addition, this implementation method is more dependable than our former NoC-implemented ANN system.
A low-power misprediction recovery mechanism

Jiongyao Ye, Takahiro Watanabe

2009 ASIA PACIFIC CONFERENCE ON POSTGRADUATE RESEARCH IN MICROELECTRONICS AND ELECTRONICS (PRIMEASIA 2009) 209 - 212 2009 [Peer-reviewed]

In modern superscalar processors, the branch misprediction penalty is a critical factor in overall processor performance. Previous research proposed dual-path (or multi-path) execution methods to reduce the misprediction penalty, but these methods are quite complex and consume much power, mostly because they simultaneously fetch and execute instructions from multiple paths. In this paper, we reduce branch misprediction penalties while balancing complexity, power, and performance. We present a novel technique, the Decode Recovery Cache (DRC), for reducing the misprediction penalty while taking complexity and power consumption into account. The DRC stores decoded instructions that are mispredicted. During subsequent mispredictions, a hit in the DRC reduces the pipeline refill time and eliminates instruction re-fetch and the subsequent decoding. Bypassing both re-fetching and re-decoding reduces processor power. Experimental results on the SPECint 2000 benchmark show that a processor with the DRC improves IPC by 10.4% on average over traditional processors and reduces average power consumption by 62.6% compared with dual-path instruction processing.
An Adaptive Width Data Cache for Low Power Design

Jiongyao Ye, Takahiro Watanabe

2009 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2009) 488 - 491 2009 [Peer-reviewed]

Reducing power consumption is one of the most important design problems at present. Modern microprocessors employ caches to bridge the large speed gap between main memory and the central processing unit, but these caches account for a larger and larger proportion of total power consumption. Narrow-Width Values (NWVs) account for a large portion of cache accesses and storage. The storage space for a value is the same regardless of its data width, even though an NWV needs only a few bits to be stored. This paper proposes an Adaptive Width Data Cache (AWDC) that exploits the prevalence of NWVs stored in the cache. In the AWDC, the cache data array is divided into several data arrays so that values of different data widths can be accessed and stored. Its purpose is to shut off the corresponding unused high arrays to reduce dynamic and static power consumption. The AWDC achieves low power consumption only by modifying the high-bit SRAM unit, with almost no additional hardware, and does not affect cache performance. Experimental results on SPEC 2000 benchmarks show that the proposed AWDC reduces dynamic and static power consumption by 44.75% and 42.86%, respectively.
High performance autoassociative neural network using network on chip

Yiping Dong, Zhen Lin, Takahiro Watanabe

2009 1st International Conference on Information Science and Engineering, ICISE 2009 4015 - 4018 2009

In this paper, an Artificial Autoassociative Neural Network (AANN) is implemented by Network on Chip (NoC) architecture to solve communication and performance problem. This proposed NoC based system can map four neurons in one PE and the whole system consists of PEs each of which connects with a router. This system is reconfigurable and extendable so that it can easily suit for different applications. Simulation results show that the proposed implementation method can reduce communication load and total computation time. ©2009 IEEE.

A study on accelerating branch recovery in superscalar processors

白馬成, 叶炯耀, 高芳, 渡邊孝博

IEICE Society Conference September 2008
Power Consideration Multilevel Partitioning Using Voltage Islands

Wang Wei, Lin Tao, Watanabe Takahiro

FIT2008 September 2008
Rapid Design of a Multiprocessor System for a JPEG Decoder on FPGA

Cao Dawei, Chen Keyan, Watanabe Takahiro

FIT2008 September 2008
Network on Chips Structure for Mapping Two Hidden Layers BP-ANNs

Yiping Dong, Takahiro Watanabe

Proc. 23rd Int'l Tech. Conf. Circuits/Systems, Computers and Communications (ITC-CSCC2008) 601 - 604 July 2008
Recovery Scheme to Reduce Latency of Miss-Prediction for Superscalar Processor using L1 Recovery Cache

JiongYao Ye, Takahiro Watanabe

Proc. 23rd ITC-CSCC 233 - 236 July 2008
A study of chip multiprocessors using FPGAs and SoftCore

姜洋, 李策, 陳科研, 曹大為, 渡邊孝博

IEICE General Conference March 2008
A solution to the voltage-island partitioning problem for very-large-scale circuits using multilevel hypergraphs

林涛, 王偉, 渡邊孝博

IEICE General Conference March 2008
A power-aware routing method for Network-on-Chip

白秀君, 佐藤清久, 渡邊孝博

IEICE General Conference March 2008
Proposal of a power-reduction method for on-chip routers using packet location information

佐藤清久, 白秀君, 渡邊孝博

IEICE General Conference March 2008
A multiprocessor system for a small size soccer robot control system

Ce Li, Yang Jiang, Zhenyu Wu, Takahiro Watanabe

DELTA 2008: FOURTH IEEE INTERNATIONAL SYMPOSIUM ON ELECTRONIC DESIGN, TEST AND APPLICATIONS, PROCEEDINGS 115 - + 2008 [Peer-reviewed]

In this paper, a new fully digitized hardware design scheme of a soccer robot controller is presented as an application of a multiprocessor system. It is designed and implemented on one-chip FPGA with two embedded Nios II processors to verify the effectiveness of our system. In the practical test, the system is dependable, and has the characteristics of fast response and high precision. It also has the advantages of smaller PCB area, less chip number and shorter development period.

Citations (Scopus): 4
Network on Chip architecture for BP Neural Network

Yiping Dong, Watanabe Takahiro

2008 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS PROCEEDINGS, VOLS 1 AND 2 1083 - 1087 2008 [Peer-reviewed]

Recently, Networks-on-Chips (NoCs) have developed greatly and have been proposed as a promising solution to complex on-chip communication problems. One such problem is the implementation of Artificial Neural Networks (ANNs). In this paper, we propose NoCs for ANNs. NoCs are designed to implement back-propagation ANNs (BP-ANNs), and the NoC-based implementation is evaluated. Experimental results show that it achieves a great reduction in communication load and a high connection per second (CPS) rate compared with traditional BP-ANNs. It is also reconfigurable, expandable, and stable, so it can address various problems.
A New Approach for Circuit Design Optimization using Genetic Algorithm

Zhiguo Bao, Takahiro Watanabe

ISOCC: 2008 INTERNATIONAL SOC DESIGN CONFERENCE, VOLS 1-3 383 - 386 2008 [Peer-reviewed]

Circuits designed by humans often result in very complex hardware architectures, requiring a large amount of manpower and computational resources. A wider objective is to find novel solutions for designing such complex architectures so that system functionality and performance are not compromised. Design automation using reconfigurable hardware and Evolutionary Algorithms (EAs), such as the Genetic Algorithm (GA), is one of the methods to tackle this issue. This concept applies the notion of Evolvable Hardware (EHW) to problem domains such as novel design solutions and circuit optimization. EHW is a new field concerning the use of EAs to synthesize circuits. An EA manipulates a population of individuals, where each individual describes how to construct a candidate for a good circuit. Each circuit is assigned a fitness, which indicates how well a candidate satisfies the design specification. The EA repeatedly applies stochastic operators to evolve new circuit configurations from existing ones, so that a resultant circuit configuration exhibits desirable behavior. In this paper, optimum circuit design using a GA with a fitness function composed of circuit complexity, power, and time delay is proposed, and its effectiveness is shown by simulations.
High Performance NoC Architecture for two hidden layers BP Neural Network

Yiping Dong, Watanabe Takahiro

ISOCC: 2008 INTERNATIONAL SOC DESIGN CONFERENCE, VOLS 1-3 269 - 272 2008 [Peer-reviewed]

Artificial Neural Networks (ANNs) are widely used in intelligent-system applications such as pattern recognition, fuzzy systems, optimization, and control. We have already proposed a novel NoC architecture for different kinds of BP-ANNs [1][2], and it was shown that the architecture is a promising hardware implementation for neural networks. However, some problems remain to be solved. One of them is performance. In this paper, we propose another NoC architecture, network topology, and routing strategy for higher performance. Experimental results from an NoC simulator show that the new architecture and routing strategy reduce the communication load, reduce latency by 7.7% and dynamic power consumption by 10.3%, and improve throughput by 8.1%, all compared with the previous one.
Score sequence pair problems of (r(11), r(12), r(22))-tournaments - Determination of realizability

Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E90D ( 2 ) 440 - 448 February 2007 [Peer-reviewed]

Let G be any graph with property P (for example, a general graph, a directed graph, etc.) and S be nonnegative and non-decreasing integer sequence(s). The prescribed degree sequence problem is the problem of determining whether there is a graph G having S as the prescribed sequence(s) of degrees or outdegrees of the vertices. Since the 1950s, this problem has attracted wide attention, and many extensions of it have been considered. Let P be the property satisfying the following (1) and (2):
(1) G is a directed graph with two disjoint vertex sets A and B.
(2) There are r(11) (r(22), respectively) directed edges between every pair of vertices in A (B), and r(12) directed edges between every pair consisting of a vertex in A and a vertex in B.
Then G is called an (r(11), r(12), r(22))-tournament ("tournament", for short). The problem is called the score sequence pair problem of a "tournament". S is called a score sequence pair of a "tournament" (realizable, for short) if the answer to the problem is "yes." In this paper, we propose characterizations of a score sequence pair of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable or not.

Construction of an (r(11), r(12), r(22))-tournament from a score sequence pair

Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11 3403 - + 2007 [Peer-reviewed]

Let G be any directed graph and S be nonnegative and non-decreasing integer sequence(s). The prescribed degree sequence problem is the problem of determining whether there is a graph G with S as the prescribed sequence(s) of outdegrees of the vertices. Let G satisfy the following (1) and (2):
(1) G has two disjoint vertex sets A and B.
(2) For every vertex pair $u, v \in G$ ($u \neq v$), G satisfies
$$|\{uv\}| + |\{vu\}| = \begin{cases} r(11) & \text{if } u, v \in A \\ r(12) & \text{if } u \in A,\ v \in B \\ r(22) & \text{if } u, v \in B \end{cases}$$
where uv (vu, respectively) denotes a directed edge from u to v (from v to u).
Then G is called an (r(11),r(12),r(22))-tournament ("tournament", for short). When G is a "tournament," the prescribed degree sequence problem is called the score sequence pair problem of a "tournament", and S is called a score sequence pair of a "tournament" (or S is realizable) if the answer is "yes."
We proposed characterizations of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable or not [5]. In this paper, we propose an algorithm for constructing a "tournament" from such a score sequence pair.
Realizability of score sequence pair problem of an (r11,r12,r22)-tournament

Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

Proc. IEEE APCCAS, Dec. 2006 1021 - 1024 December 2006
A Consideration of the Score Sequence Pair Problems of (r11,r12,r22)-Tournaments

Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

Proc. Int'l Mathematical Conference - Topics in Mathematical Analysis and Graph Theory, MAGT Belgrade 2006 50 - 51 September 2006
Customizable microprocessor IP using FPGAs

北島圭祐, 渡邊孝博

IPSJ Kyushu Chapter, Hinokuni Information Symposium 2006, paper no. C-5-3 March 2006
A study of hierarchical circuit partitioning using 2-3 trees

朱小松, 渡邊孝博

IPSJ Kyushu Chapter, Hinokuni Information Symposium 2006, paper no. C-5-4 March 2006
Score Sequence Pair Problems of (r11, r12, r22)-tournaments: construction

Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

IEICE Technical Report, Circuits and Systems (CAS) CAS2005 ( 70 ) 1 - 6 January 2006
Realizability of score sequence pair of an (r(11), r(12), r(22))-tournament

Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

2006 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS 1019 - + 2006 [Peer-reviewed]

Let G be any directed graph and S be nonnegative and non-decreasing integer sequence(s). The prescribed degree sequence problem is the problem of determining whether there is a graph G with S as the prescribed sequence(s) of outdegrees of the vertices. Let G satisfy the following (1) and (2):
(1) G has two disjoint vertex sets A and B.
(2) For every vertex pair $u, v \in G$ ($u \neq v$), G satisfies
$$|\{uv\}| + |\{vu\}| = \begin{cases} r(11) & \text{if } u, v \in A \\ r(12) & \text{if } u \in A,\ v \in B \\ r(22) & \text{if } u, v \in B \end{cases}$$
where uv (vu, respectively) denotes a directed edge from u to v (from v to u).
Then G is called an (r(11),r(12),r(22))-tournament ("tournament", for short). When G is a "tournament," the prescribed degree sequence problem is called the score sequence pair problem of a "tournament", and S is called a score sequence pair of a "tournament" (or S is realizable) if the answer is "yes."
In this paper, we propose characterizations of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable or not.
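The definition in this abstract can be checked directly on a small example. The sketch below verifies conditions (1) and (2) for a given directed multigraph and computes its score sequence pair (the sorted outdegrees of A and of B). It is a brute-force illustration of the definition only, not the linear-time realizability algorithm the paper proposes. The function names and the edge representation, a dictionary mapping an ordered pair (u, v) to the number of edges from u to v, are assumptions made for illustration.

```python
from itertools import combinations


def is_tournament(A, B, edges, r11, r12, r22):
    """Check conditions (1) and (2) of an (r11, r12, r22)-tournament."""
    set_a, set_b = set(A), set(B)
    if set_a & set_b:
        return False  # condition (1): A and B must be disjoint
    vertices = set_a | set_b
    for (u, v), count in edges.items():
        if u == v or u not in vertices or v not in vertices or count < 0:
            return False  # self-loops and unknown vertices are not allowed
    for u, v in combinations(list(vertices), 2):
        total = edges.get((u, v), 0) + edges.get((v, u), 0)
        if u in set_a and v in set_a:
            required = r11
        elif u in set_b and v in set_b:
            required = r22
        else:
            required = r12
        if total != required:
            return False  # condition (2) fails for this pair
    return True


def score_sequence_pair(A, B, edges):
    """Return the non-decreasing outdegree sequences of A and of B."""
    outdegree = {v: 0 for v in list(A) + list(B)}
    for (u, _v), count in edges.items():
        outdegree[u] += count
    return sorted(outdegree[v] for v in A), sorted(outdegree[v] for v in B)
```

For instance, with A = {a1, a2}, B = {b1}, r(11) = 1 and r(12) = 2, the edge multiset {a1→a2, a1→b1 (twice), a2→b1, b1→a2} satisfies both conditions, so its outdegree sequences form a realizable score sequence pair.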
Customized design of microprocessor IP

野村知弘, 渡邊孝博

IPSJ Kyushu Chapter, Young Researchers' Seminar 2005 March 2005
A study on customizable microprocessor IP

古賀雅隆, 渡邊孝博

IPSJ Kyushu Chapter, Hinokuni Information Symposium 2005, paper no. A-4-4 March 2005
A method for accelerating branch processing

叶炯耀, 渡邊孝博

Proceedings of the 2005 IEICE General Conference, lecture no. D-6-2 50 March 2005
The (r11, r12, r22) score sequence pair problem

高橋昌也, 渡邊孝博, 吉村猛

IEICE Technical Report, Computation (COMP2004-72) 104 ( 642 ) 97 - 106 January 2005
A hierarchical partitioning method for large-scale circuits

韓東, 徐軼韜, 渡邊孝博

Proc. 2004 HISS (6th IEEE Hiroshima Symposium) 210 December 2004
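A method for using FPGA-IP and its design environment

徐軼韜, 渡邊孝博

Proceedings of the 55th Joint Conference of the Chugoku Chapters of Electrical and Information-Related Societies, FY2004, paper no. 122006 311 October 2004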
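An efficient hierarchical tree partitioning method for large-scale systems

徳本守, 渡邊孝博

Memoirs of the Faculty of Engineering, Yamaguchi University 52 ( 1 ) 5 - 12 October 2001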

Recently, the circuit complexity of VLSIs, especially SoCs (System-on-a-Chip), has increased more and more due to the requirements of high performance and various functions, and their layout design has become a very difficult task. Circuit partitioning, in which the whole circuit is partitioned into sub-circuits of a reasonable size, is therefore indispensable to an efficient and superior system design. Circuit partitioning reduces to a graph partitioning problem. However, the problem is known to be NP-complete, even for two-way partitioning of a graph with unit node sizes and edge weights. We therefore propose a hierarchical tree partitioning method in which two greedy algorithms are executed with some probability. Experimental results show that the proposed method can efficiently produce a good circuit partition and is very useful for VLSI design.

CiNii
暗号VLSIプロセッサのための固有電力消費アーキテクチャ

松原裕之, 中村維男, 渡邊孝博

情報処理学会論文誌 41 ( 4 ) 950 - 957 2001年04月
シフト直交実数有限長系列に対するＭ-ary /DS-SS方式用ディジタルマッチトフィルタの演算素子数の検討

T.Matsumoto, Y.Tanada, T.Watanabe

Proc.3rd IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications, 2001年03月
A fine grain cooled logic architecture for low-power processors

H Matsubara, T Watanabe, T Nakamura

IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES E84A ( 3 ) 735 - 740 2001年03月 [査読有り]

　概要を見る

In this paper, we propose a fine-grain Cooled Logic architecture for low-power processors. Cooled Logic uses a novel hardware method with dual-rail logic to detect the functional blocks that should be active, and stops the clock to each functional block to make it inactive during certain periods. To confirm the effectiveness of our approach, we design 4-bit and 16-bit event-driven array multipliers and analyze their power consumption with the HSPICE simulator. The results show that Cooled Logic tends to reduce power consumption in both the functional blocks and the clock drivers of the multipliers.
Digital matched filter of reduced operation elements for M-ary/DS-SS system using real-valued shift-orthogonal finite-length sequences

T Matsumoto, Y Tanada, T Watanabe

2001 IEEE THIRD WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, PROCEEDINGS 46 - 49 2001年 [査読有り]

　概要を見る

A real-valued shift-orthogonal finite-length sequence has a sharp autocorrelation function with zero sidelobes except at the shift ends, and can be synthesized from element sequences. In this paper, we propose the structure of a digital matched filter for an M-ary Direct Sequence Spread Spectrum (M-ary/DS-SS) system using these sequences, and a reduction in the number of operation elements of the digital matched filter. It is shown that this digital matched filter for the M-ary/DS-SS system can be constructed with a number of multipliers and adders proportional to the sequence length.
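The autocorrelation property described in this abstract can be illustrated numerically. The sketch below is a direct-form correlator, not the reduced-element structure proposed in the paper; the length-3 sequence [1, 1, -1] is an illustrative example chosen here, not one taken from the paper, and the function names are hypothetical.

```python
def aperiodic_autocorrelation(seq):
    """Aperiodic autocorrelation R[k] = sum_i seq[i] * seq[i + k] for k = 0 .. n-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(n)]


def is_shift_orthogonal(seq, tol=1e-9):
    """True if every sidelobe is zero except the one at the shift end (k = n-1)."""
    r = aperiodic_autocorrelation(seq)
    return all(abs(value) <= tol for value in r[1:-1])


def matched_filter(received, seq):
    """Direct-form correlation of the received samples with the sequence."""
    n = len(seq)
    return [
        sum(received[t + i] * seq[i] for i in range(n))
        for t in range(len(received) - n + 1)
    ]


code = [1, 1, -1]                      # illustrative sequence (assumption)
received = [0, 0] + code + [0, 0]      # noiseless signal padded with zeros
print(aperiodic_autocorrelation(code))
print(is_shift_orthogonal(code))
print(matched_filter(received, code))
```

With a noiseless input, the filter output shows a single peak with zero values next to it and nonzero values only at the two shift ends, which is the behavior the abstract attributes to shift-orthogonal sequences.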
An Architecture for Secure Encryption VLSI Processors using a Constant-Characteristic Power Dissipation Concept

H.Matsubara, T.Watanabe, T.Nakamura

Journal.IPSJ 42 ( 4 ) 950 - 957 2001年
A clocking scheme for lowering peak-current in dynamic logic circuits

H Matsubara, T Watanabe, T Nakamura

IEICE TRANSACTIONS ON ELECTRONICS E83C ( 11 ) 1733 - 1738 2000年11月 [査読有り]

　概要を見る

This paper deals with a new low-power clocking scheme for dynamic logic circuits to reduce power dissipation. Although conventional clocking schemes for dynamic logic circuits are mainly used for high-speed applications like domino circuits, their peak currents are very large due to the concentration of precharging and discharging in a short period. It is hard for these schemes to achieve both reduced power dissipation and high performance at the same time. In the field of power engineering, leveling power means decreasing the peak-to-peak variation of power while keeping its total amount. We therefore propose a sophisticated clocking scheme that levels the power dissipation of processing elements and mainly reduces the power dissipation of clock drivers. The proposed clocking scheme uses an overlapped clock with fine-grain power control, so that the peak current becomes lower and power dissipation over a short period is leveled without any speed penalty. The scheme is applied to a 4-bit array multiplier, and the reductions in power dissipation of both the multiplier and the clock driver are measured with the HSPICE simulator based on 0.5 μm CMOS technology. It is shown that the power dissipation of the clock drivers, the 4-bit array multiplier, and the total is reduced by about 13.2 percent, 2.6 percent, and 7.0 percent, respectively. As a result, our clocking scheme is effective in reducing the power dissipation of clock drivers.
LSIレイアウトにおけるポリゴン配線の通常配線変換

江達夫, 渡邊孝博

山口大学工学部研究報告 51 ( 1 ) 41 - 48 2000年10月

　概要を見る

In LSI layout design, two kinds of wire patterns are used to connect equipotential terminals. One is the so-called "ordinary routing-pattern", which is defined by line segments of a given width along a route, and the other is the "polygon routing-pattern", whose shape is defined by the coordinates of a polygon's vertices. The latter is difficult to handle with layout CAD/DA tools such as a layout compactor, because the direction in which a wire expands or contracts cannot be recognized. We therefore propose an algorithm that efficiently transforms a polygon routing-pattern into an ordinary one by constructing a routing tree that connects the terminals in each polygon pattern. Experimental results show that the transformed routing patterns obtained by the proposed algorithm are suitable for existing CAD tools.

CiNii
低電力のための細粒度電力制御Cooled Logic アーキテクチャ

松原裕之, 中村維男, 渡邊孝博

電子情報通信学会第13回回路とシステム軽井沢ワークショップ 2000年04月


書籍等出版物

Robot Soccer, Chapter 1: The real-time and embedded soccer robot control system

C. Li, T. Watanabe, Z. Wu, H. Li, Y. Huangfu, edited by Vladan Pap

Sciyo, Vienna, Austria 2010年01月 ISBN: 9789533070360
デジタル論理回路の基礎

笹尾勤, 渡邊孝博, 見山友裕, 澤田直, 橋本浩二

(財)福岡県産業・科学技術振興財団システムLSI部 2007年04月
回路設計・物理設計の基礎知識

井上靖秋, 渡邊孝博, 淡野公一, 築添明

（財）福岡県産業・科学技術振興財団 2005年04月
情報工学実験及び演習Ⅰ テキスト

古賀和利, 中村秀明, 伊藤暁, 山口静馬, 石川昌明, 久長穣, 渡邊孝博

山口大学工学部知能情報システム工学科 2003年09月
最新VLSIの開発設計とCAD 第7章

渡邊孝博, 大附辰夫, 後藤敏監

ミマツデータシステム 1994年

講演・口頭発表等

An Adaptive Adjustable Routing Algorithm for 3D Network-on-Chip

Ma W, Watanabe T

電子情報通信学会総合大会電子情報通信学会

発表年月： 2018年03月
The High-speed Power Line Topology Check by Reducing Vias

DAC 2011 User Track (2011 IEEE 48th Design Automation Conference)

発表年月： 2011年06月
Via数削減による大規模LSIレイアウトの高速DRC手法

情報処理学会システムLSI設計技術研究会（SLDM)

発表年月： 2011年01月
ネットワークオンチップによるBPニューラルネットワークの一構成法

電子情報通信学会2008年総合大会

発表年月： 2008年03月

共同研究・競争的資金等の研究課題

トラフィックパターンの変動にロバストなＮｏＣシステムの研究

日本学術振興会科学研究費助成事業

研究期間:

2018年04月

-

2021年03月

渡邊孝博

　概要を見る

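Advances in LSI design, fabrication, and system development technology keep increasing the scale of SoCs (Systems-on-Chip). As a result, in MPSoCs (Multi-Processor SoCs) that integrate several hundred or more processor cores, communication between cores becomes a serious problem. An NoC (Network-on-Chip) replaces the bus-based communication of conventional SoCs with packet communication over an on-chip network, which improves scalability and communication performance and makes large-scale multicore systems feasible. Even in an NoC, however, congestion arises as traffic grows, and the network can no longer deliver sufficient performance. Routing algorithms that transfer packets efficiently under congestion are therefore essential. This research realized NoC routing that keeps delay small and performs well even when growing traffic causes congestion. Specifically, it comprised (1) a performance evaluation of several routing algorithms under various traffic patterns to study how well each pattern and algorithm match, (2) a proposed mechanism for detecting congestion under a given traffic pattern, (3) a proposed mechanism for selecting the best algorithm according to the characteristics of the traffic pattern, (4) an analysis of the relationship between local congestion and algorithm performance under hotspot traffic patterns, and (5) a proposed low-cost congestion detection circuit. As a result, we proposed a mechanism that selects the best algorithm from a set of routing algorithms prepared in advance for a given traffic pattern, and a mechanism that switches algorithms according to the congestion state, and thereby achieved the original research objectives. We also worked on routing methods that avoid the faulty part when the NoC contains faults. This problem has aspects in common with congestion avoidance and is an advanced topic to be pursued in future work.

Continuing the research of the previous years, we applied the two proposed congestion detection mechanisms, which are based on an analysis of the factors that cause traffic patterns to vary, and ran performance experiments on routing algorithms (evaluating packet throughput and latency as the packet injection rate increases). Congestion was classified into (1) the case where local congestion occurs at random and (2) the case where it depends on the type of traffic pattern. For (1), the congestion detection mechanism uses a congestion degree defined from the waiting state of packets in the router buffers, and for a basic 3×3 mesh NoC we proposed a path search algorithm that avoids regions with a high congestion degree. For (2), to identify the type of traffic pattern, we classified patterns into (A) traffic patterns that can be predicted from the source and destination addresses of packets (such as transpose and bit-reversal patterns) and (B) traffic patterns whose features can be extracted from the variance of the distribution of destinations of a group of sample packets (such as random patterns), and proposed a framework that applies West-First routing to pattern (A) and X-Y routing to pattern (B). In experiments on mesh NoCs of different sizes, the proposed method outperformed conventional methods in latency and throughput for every traffic pattern. Thus, by classifying the traffic conditions that cause patterns to vary and applying a suitable congestion detection mechanism to each, we achieved the goal of "improving NoC routing performance according to the type of traffic and the congestion state". These results were presented as papers at the IEEE MCSoC international conference.

The results so far confirm the construction of mechanisms for detecting traffic patterns and congestion degree and the performance gain from switching routing algorithms, so that an NoC routing system robust to traffic pattern variation, the main theme, has been built. The goal for the final year is therefore to focus on developing congestion avoidance algorithms that achieve still higher routing performance under a given traffic pattern. One approach predicts congested locations from the history of packet transfer paths and searches for the best route; the other identifies regions of the mesh NoC where congestion is concentrated and selects a detour. The basic algorithm for each has already been studied, and experimental evaluation will follow. In addition, we will propose and experimentally evaluate routing that increases fault tolerance while avoiding deadlock. Fault tolerance will be evaluated using the packet arrival rate as the metric. Results obtained from this research will be reported at conferences and elsewhere as appropriate.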
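The pattern-dependent choice between West-First and X-Y routing described in this summary can be sketched on a 2D mesh as follows. This is a minimal illustration under stated assumptions, not the project's implementation: nodes are (x, y) tuples, both routers are minimal (they only take hops that reduce the distance), the pattern class "A" or "B" is supplied by the caller rather than detected, and the `congestion` dictionary is a hypothetical stand-in for the buffer-based congestion degree.

```python
def xy_next_hop(cur, dst):
    """Deterministic X-Y routing: correct the X coordinate first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return None  # already at the destination


def west_first_candidates(cur, dst):
    """West-First turn model: all westward hops come first; after that any
    productive hop to the east, north, or south is allowed."""
    (x, y), (dx, dy) = cur, dst
    if dx < x:
        return [(x - 1, y)]
    candidates = []
    if dx > x:
        candidates.append((x + 1, y))
    if dy != y:
        candidates.append((x, y + (1 if dy > y else -1)))
    return candidates


def route(src, dst, pattern_class, congestion=None):
    """Class 'A' uses West-First and picks the least congested candidate;
    any other class uses X-Y routing."""
    congestion = congestion or {}  # hypothetical per-node congestion degree
    path, cur = [src], src
    while cur != dst:
        if pattern_class == "A":
            cur = min(west_first_candidates(cur, dst),
                      key=lambda node: congestion.get(node, 0))
        else:
            cur = xy_next_hop(cur, dst)
        path.append(cur)
    return path


print(route((0, 0), (2, 1), "B"))
print(route((0, 0), (1, 1), "A", congestion={(1, 0): 5}))
```

In the second call the adaptive router steps around the congested node (1, 0), whereas X-Y routing would pass through it; this is the kind of freedom that makes an adaptive algorithm preferable when congestion is local.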
ＩＰを用いたタイルベースＮｏＣのシステムの構成と設計技術に関する研究

日本学術振興会科学研究費助成事業

研究期間:

2011年04月

-

2014年03月

渡邊孝博

　概要を見る

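We adopted a tile-based architecture for the NoC (Network on Chip), a way of implementing large-scale LSI systems, and proposed designing the core of each tile through IP reuse to improve design efficiency. For processor core design, we built a method and a design environment for creating processor IP that can be customized at the instruction level. For 2D and 3D NoCs, we studied application-specific architectures and routing with high throughput, low latency, and low power consumption, and clarified methods for constructing high-performance NoCs. We also proposed a wiring method to solve the wire delay problem, a board-level issue for boards that carry NoCs and SoCs (System on Chip).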
ＩＰを用いたタイルベースＮｏＣのシステムの構成と設計技術に関する研究

科学研究費助成事業(早稲田大学) 科学研究費助成事業(基盤研究(C))

研究期間:

2011年

-

2013年

渡邊孝博

　概要を見る

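We adopted a tile-based architecture for the NoC (Network on Chip), a way of implementing large-scale LSI systems, and proposed designing the core of each tile through IP reuse to improve design efficiency. For processor core design, we built a method and a design environment for creating processor IP that can be customized at the instruction level. For 2D and 3D NoCs, we studied application-specific architectures and routing with high throughput, low latency, and low power consumption, and clarified methods for constructing high-performance NoCs. We also proposed a wiring method to solve the wire delay problem, a board-level issue for boards that carry NoCs and SoCs (System on Chip).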
通信用SoCのシステムレベル設計手法

研究期間:

2003年

-

2008年
ＩＣＴアプリケーションＬＳＩＩＰとその先端的設計支援技術

研究期間:

2007年

-

　
システムLSIプロトタイピングベース設計環境

研究期間:

2003年

-

2007年
μプロセッサの効率的設計法

研究期間:

1999年

-

2002年
実数値系列を用いたスペクトル拡散通信方式のディジタル化に関する研究

科学研究費助成事業(山口大学) 科学研究費助成事業(基盤研究(C))

研究期間:

1998年

-

1999年

棚田嘉博, 松元隆博, 渡邊孝博

　概要を見る

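This research was carried out from fiscal 1998 to fiscal 1999 (Heisei 10 to 11). Its aims were to develop code sequences for quasi-synchronous CDMA that, within a fixed shift range, produce no inter-station interference from unwanted signals and almost no intersymbol interference in the wanted signal, and that allow high-speed digital signal processing, and to show that new technology can be introduced into CDMA by actually prototyping a spread-spectrum modem based mainly on digital signal processing. The original objectives were largely achieved. Because the sequences were constructed so that fast algorithms apply even though they are real-valued, circuits that perform code generation and correlation digitally at high speed were realized on an FPGA, and in multiple-access experiments the correlation performance was close to the design. The finite-length sequences developed here require no guard interval for interference avoidance in the code design and give high information transmission efficiency, so further development toward concrete applications is expected. The results are summarized below.
(1) Development of sequences for CDMA and a fast correlation algorithm
A set of finite-length sequences with sharp autocorrelation, orthogonality between different sequences, and a fast correlation algorithm applicable to each was derived by convolving element sequences.
(2) Prototyping of a code generator and a correlator
A real-valued shift-orthogonal finite-length sequence of length 33 was approximated by integers, and a ROM-based code generator and a fast correlator on an FPGA of about 10,000 gates were prototyped. For a code generator output and correlator input with 8-bit resolution, we obtained an operating speed of 16 Mchip/s and correlation pulses with 1% error.
(3) CDMA transmission experiment
In a free-space optical multiplexed transmission experiment with signals based on two real-valued finite-length sequences, the undesired signal was suppressed so that a D/U ratio of 0 dB at the correlator input became 40 dB at the output, and orthogonality was satisfied within the error range.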
アナログLSIのCAD

研究期間:

1990年

-

1998年
アナログ・デジタル混載型大規模集積回路の計算機支援設計の研究

科学研究費助成事業(山口大学) 科学研究費助成事業(一般研究(C))

研究期間:

1995年

-

1996年

渡邊孝博

　概要を見る

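1. A survey of the relationship between analog circuit specifications and layout constraints showed that layout design specifications can be classified into "explicitly specified constraints" and "constraints that can be specified implicitly from the circuit structure and past design experience". To automate the whole flow from circuit design to layout design, past design knowledge about the implicit constraints must be organized, systematized, and stored in a database. Various proposals for analog description languages are now beginning to appear, and how layout specifications can be incorporated into them needs further study.
2. We prototyped an analog global routing algorithm that satisfies complex layout constraints. Branch and bound gave faster processing and high-quality routing results. We proposed a new routing metric for further speedup and combined it with divide and conquer to handle large problems. Control of the divide-and-conquer process remains as future work.
3. For multilayer routing of MCMs (multi-chip modules), a packaging technology for high-density, high-speed transmission, we improved the algorithm to shorten routing path length and to reduce the number of routing layers and vias, and obtained the desired results. MCMs have so far targeted digital systems, but they share the same underlying technical issues as analog and mixed analog-digital designs, so this area will also become necessary as a packaging technology for large-scale systems.
4. As a basis for studying an integrated design environment for mixed-signal LSIs, we surveyed and tried out logic design CAD systems. Using a design environment based on a hardware description language (Verilog-HDL), we designed a microprocessor and digital application circuits and deepened our understanding of design productivity and design quality. As a measure against the test problem of large-scale systems, we also prototyped an efficient test generation method for combinational circuits and a redundant-fault identification method for sequential circuits, and confirmed their effectiveness experimentally.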


Misc

CAME : A Novel Fast Connectivity-Aware MER Enumeration Algorithm for the Online Task Placement on Partially Reconfigurable Device (システム数理と応用)

PAN Tieyuan, Zeng Lian, TAKASHIMA Yasuhiro, Watanabe Takahiro

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 115 ( 480 ) 79 - 84 2016年03月

CiNii
A Length Matching Routing Method for Disordered Pins in PCB Design (VLSI設計技術)

Zhang Ran, Pan Tieyuan, Zhu Li, Watanabe Takahiro

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 114 ( 476 ) 103 - 108 2015年03月

　概要を見る

In this paper, we propose a heuristic algorithm that obtains a length-matching routing for disordered pins in printed circuit board (PCB) design. We first compute the longest common subsequence of the pin pairs to assign pins to layers. We then adopt a single-commodity flow formulation to generate base routes. Finally, R-flip and C-flip operations are carried out to adjust the wire lengths. The experiments show that our algorithm generates the optimal routes with better wire balance within reasonable CPU times.
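The layer-assignment step mentioned in this abstract can be illustrated with a standard longest-common-subsequence computation. The sketch below rests on an assumption that is not stated in the abstract: the two pin rows are given as permutations of the same net identifiers, and nets whose relative order agrees on both sides (a common subsequence) can share a layer without crossing. The greedy layer-peeling loop and all names are illustrative, not the authors' procedure.

```python
def longest_common_subsequence(a, b):
    """Classic O(len(a) * len(b)) dynamic program; returns one LCS as a list."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to recover one optimal subsequence.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def assign_layers(order_a, order_b):
    """Greedily peel off one crossing-free group of nets per layer."""
    remaining_a, remaining_b = list(order_a), list(order_b)
    layers = []
    while remaining_a:
        group = longest_common_subsequence(remaining_a, remaining_b)
        layers.append(group)
        chosen = set(group)
        remaining_a = [net for net in remaining_a if net not in chosen]
        remaining_b = [net for net in remaining_b if net not in chosen]
    return layers


# Nets 1 and 2 are swapped between the two pin rows, so they cannot share a layer.
print(assign_layers([1, 2, 3, 4], [2, 1, 3, 4]))
```

Peeling the longest crossing-free group first is a simple heuristic for keeping the layer count low; it does not by itself address length matching, which the abstract handles in later steps.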

CiNii
A Performance Enhanced Dual-switch Network-on-Chip Architecture (VLSI設計技術)

Zeng Lian, Watanabe Takahiro

電子情報通信学会技術研究報告 = IEICE technical report : 信学技報 114 ( 476 ) 97 - 102 2015年03月

　概要を見る

Network-on-Chip (NoC) is an attractive solution for future systems on chip (SoC). Network performance depends critically on the performance of packet routing. However, as the network becomes more congested, packets are blocked more frequently, which degrades network performance. In this article, we propose an innovative dual-switch allocation (DSA) design. By introducing two switch allocations, we can make the fullest use of idle output ports. Experimental results show that our design achieves a significant performance improvement in terms of throughput and latency at the cost of very little power overhead.

CiNii
A-3-2 Adaptive Router with Predictor using Congestion Degree

Zeng Lian, Watanabe Takahiro

電子情報通信学会ソサイエティ大会講演論文集 2013 45 - 45 2013年09月

CiNii
A Behavior-based Adaptive Access-mode for Low-power Set-associative Caches in Embedded Systems (特集:組込みシステム工学)

Jiongyao Ye, Hongfeng Ding, Yingtao Hu, Takahiro Watanabe

情報処理学会論文誌 52 ( 12 ) 11p 2011年12月

　概要を見る

Modern embedded processors commonly use a set-associative scheme to reduce cache misses. However, a conventional set-associative cache has drawbacks in terms of power consumption because it probes all ways to reduce access time, although only the matched way is used. The energy spent accessing the other ways is wasted, and the share of such energy grows as cache associativity increases. Previous techniques, such as phased caches, way-prediction caches, and partial tag comparison, have been proposed to reduce the power consumption of set-associative caches by optimizing the cache access mode. However, these methods cannot adapt to program behavior because they use a single access mode throughout program execution. In this paper, we propose a behavior-based adaptive access mode for set-associative caches in embedded systems, which dynamically adjusts the access mode during program execution. First, a program is divided into several phases based on the principle of program behavior repetition. Then, an off-system pre-analysis determines the optimal access mode for each phase, so that each phase employs its own optimal access mode to meet the application's demand during execution. The proposed approach requires little hardware overhead and commits most of the workload to software, so it is very effective for embedded processors. Simulation using SPEC 2000 shows that the proposed approach reduces power by roughly 76.95% for an instruction cache and 64.67% for a data cache, with performance degradation of less than 1%.
This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.20 (2012) No.1 (online), DOI http://dx.doi.org/10.2197/ipsjjip.20.26

CiNii
A general neural network architecture for efficient FPGA-based implementation (VLSI設計技術)

Lin Zhen, 董宜平, 渡邊孝博

電子情報通信学会技術研究報告 110 ( 36 ) 61 - 66 2010年05月

CiNii
効率的なFPGA実装を指向したニューラルネットワークのアーキテクチャ

林圳, 董宜平, 渡邊孝博

研究報告システムLSI設計技術（SLDM） 2010 ( 11 ) 1 - 6 2010年05月

　概要を見る

This paper presents a general architecture for implementing a multilayer neural network (NN) on an FPGA. The proposed architecture aims to improve the efficiency of resource usage and reduce net delay, so that even a large NN can be realized on a commercially available FPGA chip. Its key feature is an improved mapping method that uses layer multiplexing and partial pipelining. The architecture can be applied to any multilayer NN with a given number of layers and a given number of neurons in each layer. Experimental results show that, compared with conventional methods, the proposed architecture yields a very compact circuit with higher speed and lower cost.

CiNii
GAを用いたディジタル回路設計の一手法

王芳芳, 鮑治国, 蘇怡文, 渡邊孝博

研究報告システムLSI設計技術（SLDM） 2010 ( 2 ) 1 - 6 2010年05月

　概要を見る

In digital circuit design, several evaluation criteria such as signal delay and power consumption are needed in addition to structural complexity as measures of circuit quality, which makes circuit design still more complicated. Design methods that use a genetic algorithm (GA) to generate circuits satisfying multiple criteria have therefore been proposed (1), (2). In this paper, we propose a new GA-based digital circuit design method that introduces new devices in the chromosome representation and in the crossover, mutation, and selection processing. We propose a tree-based chromosome representation whose initialization relies on guided random initialization. Taking the characteristics of this representation into account, we adopt two kinds of crossover operators and three kinds of mutation operators. Experiments on small circuits show that the proposed method generates circuits with fewer gates than previous methods.

CiNii
C_008 Rip-up IPを用いたカスタマイズ設計環境(C分野:ハードウェア)

亀井智紀, 渡邊孝博

情報科学技術フォーラム一般講演論文集 5 ( 1 ) 173 - 174 2006年08月

CiNii
含意操作に基づく順序回路の冗長故障判定

藤本幸宏, 渡邊孝博

山口大学工学部研究報告 48 ( 2 ) 213 - 220 1998年03月

　概要を見る

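A redundant fault in a logic circuit is a fault that cannot be detected by fault testing of the circuit, and removing such faults in advance from the fault list of items to be tested is essential for efficient test generation. In this study, we examine two efficient redundant-fault identification methods, in particular for sequential circuits, whose structure is more complex. We implemented both methods and ran experiments on the ISCAS'89 benchmark circuits. The results confirm that the efficiency of test generation as a whole improves compared with the case without redundant-fault identification.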

CiNii
フローグラフによる LSI 多層配線問題の解法

渡邊孝博, 面谷圭司

山口大学工学部研究報告 45 ( 1 ) 83 - 90 1994年10月

　概要を見る

Advances in VLSI fabrication technology have made it possible to use more than two routing layers for interconnection. In such multi-layer routing, one of the important objectives is via minimization, that is, keeping the number of vias as small as possible. Topological planar routing (TPR) was proposed to solve this via-minimization problem. TPR is a layer assignment method that assigns each net to one of the layers without crossing other nets in the same layer. Although optimum TPR is known to be NP-complete, for the channel layout model it can be solved approximately in polynomial time as a minimum-cost maximum-flow problem on a flow graph. In this paper, we propose an improved TPR for more general layout models such as the macrocell layout model, in which planarity testing and the flow graph are modified to handle our model. An experimental result shows that our improvements increase the efficiency with which the multiple layers are used.

CiNii
LSIレイアウト自動設計の現状と可能性

渡辺孝博

電子情報通信学会誌 76 ( 7 ) 774 - 782 1993年07月

　概要を見る

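LSI layout CAD has contributed greatly to shortening design time and to designing large-scale, high-density LSIs. However, semiconductor and system design and manufacturing technologies continue to advance, and demands for still higher performance and more functions in LSIs pose new challenges. This article explains layout design styles and introduces the current state of the layout methods that have made large-scale, high-density LSIs possible. It also describes the problems and current status of performance-driven layout and analog layout, chosen from among the future technical challenges of LSI layout.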

CiNii


特定課題制度（学内資金）

Study of Congestion-aware and Fault-tolerant NoC Routing and its implementation on FPGAs

2020年

　概要を見る

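An NoC (Network-on-Chip) is a kind of MPSoC (Multi-Processor System-on-a-Chip) that excels in scalability, communication performance, and processing capability, and it has been studied extensively. In this research, we proposed a method for efficiently finding a detour that avoids the faulty part when a fault occurs in an NoC. Specifically, we improved the Hamiltonian-based Odd-Even Routing method so that it tolerates link failures in the NoC. Experiments evaluating latency and throughput confirmed the effectiveness of the proposed method. We also studied a mechanism for detecting in advance, and preventing, performance degradation due to traffic congestion. We proposed predicting routes that avoid traffic congestion from past communication conditions, and confirmed its effectiveness experimentally. These results were published as two international conference papers.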
Traffic-Congestion-Aware Routing Strategy for 2D/3D NoC

2019年

　概要を見る

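An NoC (Network-on-Chip) handles packet communication between cores through an on-chip network, aiming to improve scalability and communication performance and to realize large-scale multicore systems. The aim of this research is to develop an NoC routing mechanism that performs well even when traffic grows and local congestion occurs. Specifically, we proposed a congestion detection mechanism adapted to the traffic pattern, an analysis of the relationship between congestion and algorithm performance under hotspot traffic patterns, and a low-cost congestion detection circuit. We also worked on routing methods that avoid the faulty part when the NoC contains faults. The results were published as four refereed international conference papers.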
動的再構成可能デバイスによるオンライン・タスク配置問題の効率的解法

2018年

周　亭宇

　概要を見る

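In a dynamically reconfigurable processor (DRP), tasks are assigned to logic elements and processed in parallel, and when a task completes, its logic elements are released and can be reused by assigning another task to them. The online task placement problem is the problem of optimizing the processing order of tasks and their allocation on the DRP so that the DRP is used efficiently and throughput improves. For the allocation problem, we improved MER, the data structure that manages regions on the DRP, and proposed a method that speeds up the extraction of reusable regions. For the ordering problem, we defined a task processing order graph for the case where one-way communication exists between tasks and proposed an efficient method for determining the processing order. The results were presented at international conferences and elsewhere.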
ミクスト・シグナルLSIの対称制約条件付き配線手法の研究

2017年

周　亭宇, 戴 Jindun, 黄　洪逸

　概要を見る

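In mixed-signal LSIs, problems caused by routing design, such as signal interference and delay, have become prominent. To solve them, we introduced an evaluation function called the "symmetrical rate" (対称度) and proposed a routing method that can maintain symmetry constraints. This time, we examined (1) the influence of the weight coefficients and (2) the effect of routing obstacles on the evaluation function. The results show that, without routing obstacles, routes equivalent to manual designs are obtained and the evaluation function works. With obstacles, the evaluation function is effective for single-layer routing, but as the number of routing layers grows the number of candidate routes increases, and routing results with poor symmetry can occur even for the same symmetrical rate. Future work is to incorporate the number of routing layers and a per-layer evaluation.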
LSI/PCBの自動配線アルゴリズムに関する研究

2016年

蒋　欣, 潘　鉄源, 張　子驕

　概要を見る

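In integrated circuit design, routing design is important because it affects circuit operation and performance. Automated length-matching routing, which equalizes the wire lengths of multiple nets, exists for this purpose, but to account more accurately for delay and skew in bus and clock routing, the symmetry of paired wires has also become an issue. In this research, we studied a method that gives paired wires a symmetric shape in multilayer routing. Path search uses a maximum-flow algorithm to find the desired paths efficiently. To evaluate symmetric shape, we defined the symmetrical rate, a function of wire length, number of wire bends, and wire direction. Experiments show that the routes produced by the proposed method have a high symmetrical rate, use few routing layers, and require shorter path search time than conventional methods.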
NoCベース高性能演算処理システムの構成方式と設計技術に関する研究

2014年

　概要を見る

Research on meeting the demand for larger and higher-performance VLSI systems is being pursued from many angles, including design technology and device technology. From the standpoint of VLSI organization, the Network-on-Chip (NoC) is attracting attention, and from the standpoint of device structure, three-dimensional (3D) integration. This research focused on 3D NoCs and developed an architecture and a routing mechanism that enables high-performance network processing. Simulation experiments confirmed that the proposed 3D NoC achieves lower power consumption, lower latency, and higher throughput than conventional architectures. A fault-tolerance mechanism was also incorporated to improve reliability, and we confirmed that it copes effectively with faults in NoC nodes and links.
カスタマイズ可能なＩＰを用いたＳｏＣ設計とその応用システムの構成

2010年

　概要を見る

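This project consists of three research items: (1) development of Rip-up IP and an environment for customized design, (2) research on SoC/NoC architectures based on Rip-up IP, and (3) research on application fields. This year we placed particular emphasis on (1) in order to establish the customization method, and on that basis studied large-scale SoC/NoC architectures for (2).
In (1), we experimented with instruction-level customization of Rip-up IP processor IP and with function-level customization for applications that use the processor. The algorithms used in an application are regarded as functions, and by extracting the instruction set needed to realize each function, the previously developed instruction-level customization process is reused. Using a DSP as the subject, we confirmed that function-level customization is possible. Along with this function-level processing, we improved "WIPER", the design environment that uses Rip-up IP. That is, we improved the component that generates the required instruction subset from an application written in C and the full instruction set of the target processor IP, and named the result "WIPER-II". Next, we surveyed commercial tools with the aims of establishing a complete processing flow that takes an application algorithm and a general-purpose processor IP as input and leads to customization, and of broadening the range of applications. As a result of the survey, we proposed using ASIP-Meister (from ASIP Solutions), a tool that automatically generates application-specific instruction-set processors (ASIPs), to create the Rip-up IP library, and began development work to connect it to the WIPER system.
In (2), we took up the NoC (Network-on-Chip) as an advanced application of the multiprocessor SoC (MPSoC), studied architectures suited to the application field, and prototyped and evaluated routing schemes and router circuits for higher performance. As the application, we focused on hardware implementation of neural networks (NNs) and evaluated the method for mapping an NN onto an NoC, its scalability, processing performance, and power. It was shown to be superior to existing hardware NNs in these evaluations, and the results were reported in journal papers and at international conferences.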
カスタマイズ可能ＩＰを用いたＳｏＣ実現手法とその応用システム構築に関する研究

2009年

　概要を見る

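This research studied design methods for SoCs (System on a Chip) and NoCs (Network on a Chip) that use customizable IP resources, and aimed to develop and evaluate the customization process and an environment that supports it.
We had already developed a form of customizable IP called Rip-up IP. This year, aiming to widen the range of IP types, we studied turning a DSP into IP together with its customization method, and customization that takes bit width as a parameter, and designed improvements to the design environment to support them. For the customizable DSP in particular, we moved from the earlier instruction-level customization to algorithm-level customization, aiming to make the IP easier for its users. This approach can further raise the utilization of IP and also helps increase the number of kinds of customizable IP.
For SoCs, taking an embedded system composed of a DSP and a processor as the subject, we designed a multiprocessor system made of several processor IPs in place of the DSP, implemented it on an FPGA, and evaluated design efficiency and performance. For NoCs, we examined the network structure (topology) between IP cores placed on a two-dimensional plane, the router circuit, and the routing strategy, evaluated them in terms of latency, throughput, power consumption, and so on, and clarified an effective NoC configuration. An ANN (Artificial Neural Network) was assumed as the concrete case for evaluation. Through these studies we examined the set of IPs and the performance and customization requirements needed to build large-scale systems.
Providing a strong set of IPs is also important. We therefore pursued, in parallel, research on higher-performance processor IP and on generating the basic circuits used in IP design. For the former, targeting low power consumption of the IP, we proposed a new cache structure in particular and confirmed its effect by simulation. For the latter, we proposed a new GA-based circuit generation and optimization method and confirmed experimentally that it generates good circuits.
These results were published at international conferences, in academic journals, and elsewhere, as listed in the separate section on research publications.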
カスタマイズＩＰによる高性能マルチプロセッサＳｏＣの効率的設計手法

2008年

　概要を見る

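Using the "customizable processor IP and its customization design environment" studied up to the previous year, this research examined a design method for building multiprocessor SoCs (MPSoC *1), prototyped and evaluated an MPSoC on an FPGA for a concrete application, and studied how to construct NoCs (*2), which are expected to find applications in a wide range of fields as the future form of the MPSoC. In parallel, we studied a new attempt to bring AI techniques into the optimization of IP design, and a processor architecture for achieving high performance and low power in SoCs. The concrete research content and results are as follows. The results were presented as papers at international conferences and elsewhere. (*1 MPSoC: Multi-Processor System on a Chip; *2 NoC: Network on Chip)
(1) We developed three processor IPs, COMET, x86-compatible, and MiniMIPS, as customizable "Rip-up IP", customized them with the IP design support environment "WIPER", implemented them on FPGAs, and evaluated circuit size, processing performance, power consumption, and so on.
(2) As one way to predict MPSoC performance, we prototyped a multiprocessor system on an FPGA. The application was a JPEG encoder system, with the hardware and JPEG software implemented on an Altera FPGA. For configurations from one to four processors, we refined the task partitioning and design method and compared performance.
(3) Using an NoC, in which a router is attached to each of the cores on the SoC and signals are carried over a network, we proposed an architecture for hardware implementation of a neural network, and carried out performance evaluation experiments with an NoC simulator and evaluations such as power estimation on an FPGA.
(4) To further promote the use of IP for SoCs, good IP must be developed efficiently. We studied a system that can optimize IP design with AI techniques under multiple design constraints. We proposed an improved GA and, using small logic circuits as examples, confirmed by simulation that optimal designs are obtained for multiple conditions.
(5) We proposed a hardware mechanism that speeds up recovery processing after branch instructions, one of the bottlenecks to higher processor performance, and evaluated it by simulation.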
マルチプロセッサＳｏＣを指向したＲｉｐ－ｕｐ　ＩＰ利用設計

2007年

　概要を見る

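This research developed IP that can be customized to a system's specification together with a design environment for using it, and aimed further to realize high-performance multiprocessor-SoC systems by combining several customizable processor IPs. The details are as follows.
1. Customizable IP "Rip-up IP" and its design support environment
(1) Taking a processor as the example, we developed "Rip-up IP", an IP whose functions are easy to remove and add.
(2) We created a Rip-up IP of an x86 instruction-compatible processor and ran evaluation experiments. Circuit size and operating frequency on the FPGA were as expected, but the power saving was insufficient because instruction reduction raised the toggle count of the logic blocks and the operating frequency became higher. This can be solved by introducing low-power mechanisms such as power gating.
(3) We developed the customization design support environment "WIPER". We defined a source description format for Rip-up IP, organized the IP into a library, and developed a GUI-based tool that supports customization work.
2. MPSoC organization and its advanced form, the NoC
We experimented with MPSoCs using FPGA soft IP cores. We also studied the NoC (Network on a Chip), in which cores communicate over a network instead of a communication bus.
(1) Using the NIOS processor, a soft core for Altera FPGAs, we ran MPSoC configuration experiments with a control system as the subject. We replaced the conventional DSP-plus-processor configuration with two-processor configurations in various combinations and compared design efficiency, performance, and so on.
(2) As the future form of the MPSoC, we studied the NoC, in which functional units such as processors and memories are cores, and each core has a router and is connected through an on-chip network. This year, with low power as the keyword, we studied the network routing problem and a mechanism for reducing power consumption through sleep control.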
システムＬＳＩの超短期設計のための基礎技術に関する研究

2004年

吉村猛, 木村晋二, 土井伸洋

　概要を見る

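With the goal of ultra-short design times for system LSIs, we carried out the following research in particular.
(1) As a new device structure, we proposed a latch-based circuit organization and showed that it achieves circuit timing optimization and faster operation. This result was published in the IEICE Transactions (電子情報通信学会論文誌E).
(2) We proposed an algorithm that automatically converts floating-point processing to fixed-point processing and showed that it reduces area and improves operating speed when an algorithm is implemented in hardware. This result was accepted and published at the 12th SASIMI (Synthesis And System Integration of Mixed Information tech.) workshop.
(3) We proposed a circuit partitioning method for designing very large circuits in practical time and created an algorithm for partitioning into an arbitrary number of parts. This result was presented at the 6th IEEE Hiroshima Section Student Symposium.
(4) We designed an FPGA-IP of a microprocessor for use as a CPU core in system LSIs. We also prototyped a design environment for customizing the processor to the user's specification. These results were presented at the Chugoku-section joint convention of electrical and information related societies (電気･情報関連学会中国支部連合大会), the IPSJ Kyushu Section symposium, and elsewhere, and were exhibited at an industry-academia collaboration fair held in Kitakyushu Science and Research Park.
(5) We proposed a partitioning and implementation method that realizes a large hardware system on multiple FPGAs of reasonable size and makes FPGA emulation easier. This result was presented at the IEICE General Conference.
In summary, we studied basic technologies for shortening the design period, including verification of system LSIs on real hardware, as well as circuit organizations for realizing higher-performance circuits, and published the results. Future work is to improve each of these basic technologies for higher performance and to integrate them into a consistent design flow for ultra-short design.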
