Updated on 2024/04/23

写真a

 
YOSHIMURA, Takeshi
 
Affiliation
Faculty of Science and Engineering
Job title
Professor Emeritus
Degree
博士(工学) ( 大阪大学 )

Professional Memberships

  •  
     
     

    日本学術会議

  •  
     
     

    日本技術者教育認定機構(JABEE)

  •  
     
     

    IEEE CAS Society Tokyo Chapter

  •  
     
     

    IEEE

  •  
     
     

    情報処理学会

  •  
     
     

    電子情報通信学会

▼display all

Research Interests

  • optimization technologies, VLSI design automation

Awards

  • IEEE APCCAS2006 Best Paper Award

    2006.12  

  • IEICE Fellow

    2004.09  

  • IEEE Circuits and Systems CAD Transactions Best Paper Award

    2002.06  

  • 電子情報通信学会論文賞

    1988  

 

Papers

  • A Unified Scheduling Approach for Power and Resource Optimization With Multiple V-dd or/and V-th in High-Level Synthesis

    Cong Hao, Nan Wang, Takeshi Yoshimura

    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS   36 ( 12 ) 2030 - 2043  2017.12

     View Summary

    In this paper, we focus on the low-power scheduling problem with multiple threshold and/or supply voltage technologies in high-level synthesis. We propose a unified scheduling approach which is applicable to various optimization problems, including: 1) dynamic power and resource usage co-optimization; 2) leakage power optimization; and 3) dynamic power and leakage power co-optimization. To deal with different objectives with high flexibility, three problems are divided into two common subproblems including delay assignment and resource density variance minimization, then a vertex potential-based mobility allocation model is proposed to solve two subproblems simultaneously. Experimental results show that, for dynamic power and resource co-optimization, our scheduling approach produces optimum solutions for all six benchmarks with 15 groups of data; for leakage power optimization it also greatly excels the latest existing work, by 20% leakage power reduction and 52 times speedup. Besides, for dynamic and leakage power co-optimization, the Pareto solutions are studied.

    DOI

  • Framework and VLSI architecture of measurement-domain intra prediction for compressively sensed visual contents

    Jianbin Zhou, Dajiang Zhou, Li Guo, Takeshi Yoshimura, Satoshi Goto

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 12 ) 2869 - 2877  2017.12

     View Summary

    This paper presents a measurement-domain intra prediction coding framework that is compatible with compressive sensing (CS)-based image sensors. In this framework, we propose a low-complexity intra prediction algorithm that can be directly applied to measurements captured by the image sensor. We proposed a structural random 0/1 measurement matrix, embedding the block boundary information that can be extracted from the measurements for intra prediction. Furthermore, a low-cost Very Large Scale Integration (VLSI) architecture isimplemented for the proposed framework, by substituting the matrix multiplication with shared adders and shifters. The experimental results show that our proposed framework can compress the measurements and increase coding efficiency, with 34.9% BD-rate reduction compared to the direct output of CS-based sensors. The VLSI architecture of the proposed framework is 9.1 Kin area, and achieves the 83% reduction in size of memory bandwidth and storage for the line buffer. This could significantly reduce both the energy consumption and bandwidth in communication of wireless camera systems, which are expected to be massively deployed in the Internet of Things (IoT) era.

    DOI

  • Application of on-line machine learning in optimization algorithms: A case study for local search

    Cong Hao, Takeshi Yoshimura

    2017 9th Computer Science and Electronic Engineering Conference, CEEC 2017 - Proceedings     19 - 24  2017.11

     View Summary

    The study on machine learning has been flourishing for several years, and machine learning algorithms are being applied to various fields with great achievements. In this paper, combining the on-line machine learning method into optimization algorithms is to be studied. In many heuristic optimization algorithms, one common way to reduce execution time and improve solution optimality is, first estimating the quality of a set of candidate solutions, and solving only promising candidates in detail. Currently most estimations are performed by empirical equations, whose accuracy greatly relies on the how well the equation is designed. In this paper, we propose an on-line learning based estimator to perform the solution estimation in heuristic algorithms to improve estimation accuracy. Then a simple case study is discussed, where a local search based heuristic with random start is used, and an on-line estimator considering the properties of local search is proposed. The experiments show that the accuracy of on-line estimator is much higher than the static estimator, and is also higher than a general off-line pre-Trained learner. Even though the on-line estimator introduced special time for its training, the heuristic algorithm still speeds up by 3.7X without optimality sacrifice.

    DOI

  • A particle swarm optimization and branch and bound based algorithm for economical smart home scheduling

    Yangyizhou Wang, Cong Hao, Takeshi Yoshimura

    Midwest Symposium on Circuits and Systems   2017-   213 - 216  2017.09

     View Summary

    Smart home scheduling, as one of the most effective techniques in Demand Side Management (DSM), is now attracting more and more research interests in the recent years. In this paper we propose an efficient scheduling algorithm for smart home resident to reduce the monetary cost of the electricity. The proposed algorithm is an improved particle swarm optimization(PSO) algorithm that can schedule the smart appliances under discrete power level and quadratic pricing model. Branch and bound method is adopted to map real number values to discrete power level values. Simulation results shows that our method exceeds the previous methods both in total monetary cost and execution time.

    DOI

  • Energy-efficient scheduling method with cross-loop model for resource-limited CNN accelerator designs

    Kaiyi Yang, Shihao Wang, Jianbin Zhou, Takeshi Yoshimura

    Proceedings - IEEE International Symposium on Circuits and Systems    2017.09

     View Summary

    The state-of-the-art customized accelerators of convolution neural networks (CNN) have achieved high throughput while the huge amount of data movements still remains as the dominant part of the total energy costs. In this paper, we propose an energy-efficient scheduling approach to find an efficient dataflow that minimizes data movements with limited hardware resource budgets. In detail, two-level nested loop transformations are proposed to separate memory and computing resource constraints. This allows us to fully exploit the potential of available memory resources for reducing off-chip memory traffic. Further, the proposed cross-loop model is capable of figuring out the data locality across nested loops in CNN algorithms. Finally, energy-delay production is employed as the evaluation criteria to balancing energy and throughput performance. The experimental results show our cross-loop model can reduce the off-chip data movements by 11-21% and achieve the theoretical optimum. Therefore, the proposed scheduling method can increase the energy efficiency by at least 8.7 times.

    DOI

  • An Efficient Multi-Level Algorithm for 3D-IC TSV Assignment

    Cong Hao, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E100A ( 3 ) 776 - 784  2017.03

     View Summary

    Through-silicon via (TSV) assignment problem is one of the key design challenges of 3-D IC which is crucial to the wire length and signal delay. In this work we formulate the 3-D IC TSV assignment as an Integer Minimum Cost Multi Commodity (IMCMC) problem on a IMCMC network, and propose a multi-level algorithm. It coarsens the IMCMC network level by level, applies a rough flow assignment on each level of coarsened graph, and generates only promising edges to reduce the IMCMC network size. Benefiting from the multi-level structure, we propose a mixed single and multi commodity flow method improve the TSV assignment solution quality. Moreover, given a TSV assignment, we propose an extended layer by layer algorithm to further optimize the TSV assignment. The experimental results demonstrate that our multi-level with mixed single and multi commodity flow algorithm achieves not only smaller wire length but also shorter runtime compared to other existing works.

    DOI CiNii

  • Thermal-Aware floorplanning for noc-sprinting

    Zhu, Hui, Zhu, Hui, Hao, Cong, Yoshimura, Takeshi

    Midwest Symposium on Circuits and Systems    2017.03

     View Summary

    © 2016 IEEE. Due to the rise of utilization wall, a large portion of silicon chips become dark or dim silicon. A NoC-sprinting method is proposed to deal with this problem for instantaneous improvement and the key design constraint of these problems is thermal design power(TDP). In this work we propose a thermalaware Modified Insert After Remove Floorplanning(MD-IARFP) algorithm for NoC-sprinting. Wire length is taken into consideration while only thermal behavior is concerned in the previous work. A thermal model is constructed to evaluate temperature, using relationship between heat transfer and electrical phenomena. Simulated Annealing(SA) based MD-IARFP algorithm is applied to optimize the distribution of active cores. In terms of perturbation of SA, Modified Insert After Remove(MD-IAR) method gives an efficient generation of new floorplan which helps to reduce iterate number and lead to less CPU time. Effective as the experimental results show, our proposal provides better solution with lower temperature and significant decrease of wire length.

    DOI

  • Interconnection Allocation Between Functional Units and Registers in High-Level Synthesis

    Cong Hao, Jianmo Ni, Nan Wang, Takeshi Yoshimura

    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS   25 ( 3 ) 1140 - 1153  2017.03

     View Summary

    Data path interconnection on VLSI chips usually consumes a significant amount of both power and area. In this paper, we focus on the port assignment problem for binary commutative operators for interconnection complexity reduction. First, the port assignment problem is formulated on a constraint graph, and a practical method is proposed to find a valid and initial solution. For solution optimization, an elementary spanning-tree-transformation-based local search algorithm is proposed. To improve the efficiency of optimization, a matrix formulation, which meets the simplex tabuleau format, is proposed and thus the simplex method is adopted for optimization. Moreover, operation pivoting and successive pivoting are discussed for algorithm speedup. The experimental results show that on the randomly generated test cases, the matrix-based algorithm shows the highest solution optimality and is five times faster than the elementary transformation method. On the real high-level synthesis benchmarks, the matrix-based method reduced 14% interconnections, while the previous greedy algorithm reduced 8% on average.

    DOI

  • An efficient multi-level algorithm for 3D-IC TSV assignment

    Cong Hao, Takeshi Yoshimura

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 3 ) 776 - 784  2017.03

     View Summary

    Through-silicon via (TSV) assignment problem is one of the key design challenges of 3-D IC which is crucial to the wire length and signal delay. In this work we formulate the 3-D IC TSV assignment as an Integer Minimum Cost Multi Commodity (IMCMC) problem on a IMCMC network, and propose a multi-level algorithm. It coarsens the IMCMC network level by level, applies a rough flow assignment on each level of coarsened graph, and generates only promising edges to reduce the IMCMC network size. Benefiting from the multi-level structure, we propose a mixed single and multi commodity flow method improve the TSV assignment solution quality. Moreover, given a TSV assignment, we propose an extended layer by layer algorithm to further optimize the TSV assignment. The experimental results demonstrate that our multi-level with mixed single and multi commodity flow algorithm achieves not only smaller wire length but also shorter runtime compared to other existing works.

    DOI

  • Economical smart home scheduling for single and multiple users

    Hao, Cong, Yoshimura, Takeshi

    Midwest Symposium on Circuits and Systems    2017.03

     View Summary

    © 2016 IEEE. Smart home scheduling, one of the most powerful technologies for electric power control from demand side, has been attracting rapidly growing interests. In this paper smart home scheduling algorithms are proposed for a single user and for multiple users to reduce total monetary cost. For a single user a game theory based method is proposed where each home appliance is regarded as a player and scheduling for appliances is iteratively optimized. For multiple users mixed strategy is adopted instead of pure strategy to reduce congestions of appliances on the current cheapest time interval. Experimental results show that both the scheduling for a single user or multiple users outperform the latest existing work both in terms of total monetary cost and execution time.

    DOI

  • Real-Time UHD Background Modelling with Mixed Selection Block Updates

    Axel Beaugendre, Satoshi Goto, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E100A ( 2 ) 581 - 591  2017.02

     View Summary

    The vast majority of foreground detection methods require heavy hardware optimization to process in real-time standard definition videos. Indeed, those methods process the whole frame for the detection but also for the background modelling part which makes them resource guzzlers (time, memory, etc.) unable to be applied to Ultra High Definition (UHD) videos. This paper presents a real-time background modelling method called Mixed Block Background Modelling (MBBM). It is a spatio-temporal approach which updates the background model by carefully selecting block by a linear and pseudo-random orders and update the corresponding model's block parts. The two block selection orders make sure that every block will be updated. For foreground detection purposes, the method is combined with a foreground detection designed for UHD videos such as the Adaptive Block-Propagative Background Subtraction method. Experimental results show that the proposed MBBM can process 50 min. of 4K UHD videos in less than 6 hours. while other methods are estimated to take from 8 days to more than 21 years. Compared to 10 state-of-the-art foreground detection methods, the proposed MBBM shows the best quality results with an average global quality score of 0.597 (1 being the maximum) on a dataset of 4K UHDTV sequences containing various situation like illumination variation. Finally, the processing time per pixel of the MBBM is the lowest of all compared methods with an average of 3.18x10(-8) s.

    DOI CiNii

  • Real-time UHD background modelling with mixed selection block updates

    Axel Beaugendre, Satoshi Goto, Takeshi Yoshimura

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E100A ( 2 ) 581 - 591  2017.02

     View Summary

    The vast majority of foreground detection methods require heavy hardware optimization to process in real-time standard definition videos. Indeed, those methods process the whole frame for the detection but also for the background modelling part which makes them resourceguzzlers (time, memory, etc.) unable to be applied to Ultra High Definition (UHD) videos. This paper presents a real-time background modelling method called Mixed Block Background Modelling (MBBM). It is a spatio-temporal approach which updates the background model by carefully selecting block by a linear and pseudo-random orders and update the corresponding model's block parts. The two block selection orders make sure that every block will be updated. For foreground detection purposes, the method is combined with a foreground detection designed for UHD videos such as the Adaptive Block-Propagative Background Subtraction method. Experimental results show that the proposed MBBM can process 50 min. of 4K UHD videos in less than 6 hours. while other methods are estimated to take from 8 days to more than 21 years. Compared to 10 stateof-the-art foreground detection methods, the proposed MBBM shows the best quality results with an average global quality score of 0.597 (1 being the maximum) on a dataset of 4K UHDTV sequences containing various situation like illumination variation. Finally, the processing time per pixel of the MBBM is the lowest of all compared methods with an average of 3.18×10-8 s.

    DOI

  • A Dual-Clock VLSI Design of H.265 Sample Adaptive Offset Estimation for 8k Ultra- HD TV Encoding

    Jianbin Zhou, Dajiang Zhou, Shihao Wang, Shuping Zhang, Takeshi Yoshimura, Satoshi Goto

    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS   25 ( 2 ) 714 - 724  2017.02

     View Summary

    Sample adaptive offset (SAO) is a newly introduced in-loop filtering component in H. 265/High Efficiency Video Coding (HEVC). While SAO contributes to a notable coding efficiency improvement, the estimation of SAO parameters dominates the complexity of in-loop filtering in HEVC encoding. This paper presents an efficient VLSI design for SAO estimation. Our design features a dual-clock architecture that processes statistics collection (SC) and parameter decision (PD), the two main functional blocks of SAO estimation, at high-and lowspeed clocks, respectively. Such a strategy reduces the overall area by 56% by addressing the heterogeneous data flows of SC and PD. To further improve the area and power efficiency, algorithm-architecture co-optimizations are applied, including a coarse range selection (CRS) and an accumulator bit width reduction (ABR). CRS shrinks the range of fine processed bands for the band offset estimation. ABR further reduces the area by narrowing the accumulators of SC. They together achieve another 25% area reduction. The proposed VLSI design is capable of processing 8k at 120-frames/s encoding. It occupies 51k logic gates, only one-third of the circuit area of the state-of-the-art implementations.

    DOI

  • VLSI Implementation of HEVC Motion Compensation With Distance Biased Direct Cache Mapping for 8K UHDTV Applications

    Shihao Wang, Dajiang Zhou, Jianbin Zhou, Takeshi Yoshimura, Satoshi Goto

    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY   27 ( 2 ) 380 - 393  2017.02

     View Summary

    Ultrahigh definition television is becoming increasingly attractive and practical with the doubled compression performance delivered by High Efficiency Video Coding (H.265/HEVC). Meanwhile, implementation of real-time video codecs is challenged by not only the huge throughput and memory bandwidth requirements but also the increased complexity of new algorithms. For motion compensation (MC) that is a known bottleneck in video decoding, the enlarged and diversified prediction unit sizes impose notably higher difficulties in trading off area, power, and memory traffic. This paper presents a very large scale integration implementation of HEVC MC that supports 7680 x 4320@60 frames/s bidirectional prediction. The MC design incorporates a highly efficient cache realized by novel architecture optimizations including distance biased directing mapping, eight-bank memory structure, row-based miss information compression, and mask-based block conflict checking. As a result, the proposed design not only achieves 8x throughput enhancement but also improves hardware efficiency by at least 2.01 times, in comparison with prior arts.

    DOI

  • An 8K H.265/HEVC Video Decoder Chip With a New System Pipeline Design

    Dajiang Zhou, Shihao Wang, Heming Sun, Jianbin Zhou, Jiayi Zhu, Yijin Zhao, Jinjia Zhou, Shuping Zhang, Shinji Kimura, Takeshi Yoshimura, Satoshi Goto

    IEEE JOURNAL OF SOLID-STATE CIRCUITS   52 ( 1 ) 113 - 126  2017.01

     View Summary

    8K ultra-HD is being promoted as the next-generation video specification. While the High Efficiency Video Coding (HEVC) standard greatly enhances the feasibility of 8K with a doubled compression ratio, its implementation is a challenge, owing to ultrahigh-throughput requirements and increased complexity per pixel. The latter comes from the new features of HEVC. At the system level, the most challenging of them is the enlarged and highly variable-size coding/prediction/transform units, which significantly increase the requirement for on-chip memory as pipeline buffers and the difficulty in maintaining pipeline utilization. This paper presents an HEVC decoder chip featuring a system pipeline that works at a nonunified and variable granularity. The pipeline saves on-chip memory with a novel block-in-block-out queue system and a parameter delivery network, while allowing overhead-free and fully pipelined operation of the processing components. With the system pipeline design combined with various component-level optimizations, the proposed decoder in 40 nm achieves a maximum throughput of 4 Gpixels/s or 8K 120 frames/s for the low-delay-P configuration of HEVC, 7.5-55 times faster than prior works. It supports 8K 60 frames/s for the low-delay and random-access configurations. In a normalized comparison, it also shows 3.1-3.6 times better area efficiency and 31%-55% superior energy efficiency.

    DOI

  • Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks

    Shihao Wang, Dajiang Zhou, Xushen Han, Takeshi Yoshimura

    PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE)     1032 - 1037  2017

     View Summary

    Deep convolutional neural networks (CNN) have shown their good performances in many computer vision tasks. However, the high computational complexity of CNN involves a huge amount of data movements between the computational processor core and memory hierarchy which occupies the major of the power consumption. This paper presents Chain-NN, a novel energy-efficient 1D chain architecture for accelerating deep CNNs. Chain-NN consists of the dedicated dual-channel process engines (PE). In Chain-NN, convolutions are done by the 1D systolic primitives composed of a group of adjacent PEs. These systolic primitives, together with the proposed column-wise scan input pattern, can fully reuse input operand to reduce the memory bandwidth requirement for energy saving. Moreover, the 1D chain architecture allows the systolic primitives to be easily reconfigured according to specific CNN parameters with fewer design complexity. The synthesis and layout of Chain-NN is under TSMC 28nm process. It costs 3751k logic gates and 352KB on-chip memory. The results show a 576-PE Chain-NN can be scaled up to 700MHz. This achieves a peak throughput of 806.4GOPS with 567.5mW and is able to accelerate the five convolutional layers in AlexNet at a frame rate of 326.2fps. 1421.0GOPS/W power efficiency is at least 2.5 to 4.1x times better than the state-of-the-art works.

    DOI

  • Leakage-Power-Aware Scheduling With Dual-Threshold Voltage Design

    Nan Wang, Wei Zhong, Cong Hao, Song Chen, Takeshi Yoshimura, Yu Zhu

    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS   24 ( 10 ) 3067 - 3079  2016.10

     View Summary

    The exponential increase in leakage power and the substantial power-saving opportunities provided by scheduling have made dual-threshold voltage (dual-V-th) an attractive choice for low-leakage-power designs. In this paper, we work under the assumption that functional units (FUs) are allocated after scheduling, and fully explore the solution space of scheduling with dual-V-th operations to optimize the leakage power of the FUs. First, a binding conflict graph (BCG)-based scheduling method is presented to minimize the number of FUs. Second, the BCG-based method is extended to allow scheduling with dual-V-th operation targeting the minimization of leakage power. In timing-constrained scheduling, each operation in the data flow is initialized with low-V-th. Then, starting from an operation schedule with the timing constraint satisfied, we scale the sets of low-V-th operations in the off-critical paths with high-V-th so as to reduce the number of low-V-th FUs without increasing the total delay. Finally, a scheduling method for minimizing the leakage power under both timing and resource constraints is presented. The results of benchmark tests show that the proposed algorithms can reduce the leakage power reported in previous works by 10.2% while maintaining high circuit performance.

    DOI

  • An efficient algorithm for 3D-IC TSV assignment

    Hao, Cong, Ding, Nan, Yoshimura, Takeshi

    14th IEEE International NEWCAS Conference, NEWCAS 2016    2016.10

     View Summary

    © 2016 IEEE.In this work we solve the 3D-IC TSV assignment which is formulated as an Integer Minimum Cost Multi Commodity (IMCMC) problem. We propose a multi-level algorithm including graph coarsening and un-coarsening, to greatly reduce the IMCMC problem size; benefiting from it, we transform the IMCMC problem into a mixed single and multi commodity problem and partially solved it using min-cost flow method. Further, we extend the layer by layer method for TSV assignment optimization. The experimental results shows that the our proposal generates initial solutions with 2.8% wire length reduction compared to the existing work, and our extended layer by layer method achieves another 1.7% improvement with slight time overhead.

    DOI

  • Leakage-Power-Aware Scheduling With Dual-Threshold Voltage Design

    Nan Wang, Wei Zhong, Cong Hao, Song Chen, Takeshi Yoshimura, Yu Zhu

    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS   24 ( 10 ) 3067 - 3079  2016.10

     View Summary

    The exponential increase in leakage power and the substantial power-saving opportunities provided by scheduling have made dual-threshold voltage (dual-V-th) an attractive choice for low-leakage-power designs. In this paper, we work under the assumption that functional units (FUs) are allocated after scheduling, and fully explore the solution space of scheduling with dual-V-th operations to optimize the leakage power of the FUs. First, a binding conflict graph (BCG)-based scheduling method is presented to minimize the number of FUs. Second, the BCG-based method is extended to allow scheduling with dual-V-th operation targeting the minimization of leakage power. In timing-constrained scheduling, each operation in the data flow is initialized with low-V-th. Then, starting from an operation schedule with the timing constraint satisfied, we scale the sets of low-V-th operations in the off-critical paths with high-V-th so as to reduce the number of low-V-th FUs without increasing the total delay. Finally, a scheduling method for minimizing the leakage power under both timing and resource constraints is presented. The results of benchmark tests show that the proposed algorithms can reduce the leakage power reported in previous works by 10.2% while maintaining high circuit performance.

    DOI

  • Primal-dual method based simultaneous functional unit and register binding

    Ni, Jianmo, Hao, Cong, Wang, Nan, Ai, Qian, Yoshimura, Takeshi

    Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015    2016.07

     View Summary

    © 2015 IEEE.Interconnect reduction is one of the key issues in high-level synthesis. In this paper, we propose a primal-dual based method to solve the functional unit (FU) and register binding simultaneously while minimizing the global interconnection. Specifically, the binding problem is formulated as a min-cost network flow based on splitting weighted and order compatibility graphs (SWOCGs). The interconnect sharing among registers and FUs are maximized by binding the the operations or variables on the same path to the same FUs or registers according to the flow. Experimental results show that, compared with the previous greedy method [7], our proposed algorithm achieves an average 4.8% further reduction in global interconnection for a suite of benchmarks.

    DOI

  • Simultaneous scheduling and binding for resource usage and interconnect complexity reduction in high-level synthesis

    Hao, Cong, Ni, Jian Mo, Wang, Hui Tong, Yoshimura, Takeshi

    Proceedings - 2015 IEEE 11th International Conference on ASIC, ASICON 2015    2016.07

     View Summary

    © 2015 IEEE.This paper proposes a simultaneous scheduling and binding approach for resource and interconnect reduction in high-level synthesis. The scheme incorporates the operation scheduling into functional unit (FU) and register binding, targeting the reduction of both resource and interconnect reduction. A simplified weighted and ordered compatibility graph (SWOCG) based binding algorithm is also proposed and runs tens of times faster than the WOCG based binding algorithm. The experimental results show that our proposal achieves 4% to 15% reduction in resource usage and interconnect reduction, and also runs 5X faster compared to previous works.

    DOI

  • 14.7 A 4Gpixel/s 8/10b H.265/HEVC video decoder chip for 8K Ultra HD applications

    Zhou, Dajiang, Wang, Shihao, Sun, Heming, Zhou, Jianbin, Zhu, Jiayi, Zhao, Yijin, Zhou, Jinjia, Zhang, Shuping, Kimura, Shinji, Yoshimura, Takeshi, Goto, Satoshi

    Digest of Technical Papers - IEEE International Solid-State Circuits Conference   59   266 - 268  2016.02

     View Summary

    © 2016 IEEE.8K Ultra HD is being promoted as the next-generation digital video format. From a communication channel perspective, the latest high-efficiency video coding standard (H.265/HEVC) greatly enhances the feasibility of 8K by doubling the compression ratio. Implementation of such codecs is a challenge, owing to ultra-high throughput requirements and increased complexity per pixel. The former corresponds to up to 10b/pixel, 7680×4320pixels/frame and 120fps - 80× larger than 1080p HD. The latter comes from the new features of HEVC relative to its predecessor H.264/AVC. The most challenging of them is the enlarged and highly variable-size coding/prediction/transform units (CU/PU/TU), which significantly increase: 1) the requirement for on-chip memory as pipeline buffers, 2) the difficulty in maintianing pipeline utilization, and 3) the complexity of inverse transforms (IT). This paper presents an HEVC decoder chip supporting 8K Ultra HD, featuring a 16pixel/cycle true-variable-block-size system pipeline. The pipeline: 1) saves on-chip memory with a novel block-in-block-out (BIBO) queue system and a parameter delivery network, and 2) allows high design efficiency and utilizat

    DOI

  • Power-efficient Partitioning and Cluster Generation Design for Application-Specific Network-on-Chip

    Jiayi Ma, Cong Hao, Wencan Zhang, Takeshi Yoshimura

    2016 INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC)     83 - 84  2016

     View Summary

    Network-on-Chip (NoC) is a promising solution for System-on-Chip (SoC) challenges. In this work, we present a Decompose and Cluster generation Refinement (DCR) algorithm to find minimum power consumption simultaneously. A two-stage method is proposed for decompose and cluster generation step to generate solutions with lower power. Refinement step explores optimal positions and adjusts clusters for selected solutions to find balanced point between power consumption and CPU time. Experimental results show that the proposed method outperforms the existing work.

    DOI

  • Unified parameter decoder architecture for H.265/HEVC motion vector and boundary strength decoding

    Shihao Wang, Dajiang Zhou, Jianbin Zhou, Takeshi Yoshimura, Satoshi Goto

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E98A ( 7 ) 1356 - 1365  2015.07

     View Summary

    In this paper, VLSI architecture design of unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for 8K UHDTV HEVC decoder is presented. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and memory bandwidth requirement, especially for 8K UHDTV application. We propose four techniques for these challenges. Firstly, this work unifies MV and BS parameter decoders for line buffer memory sharing. Secondly, to support high throughput, we propose the top-level CU-adaptive pipeline scheme by trading off between implementation complexity and performance. Thirdly, PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, PU-based coding scheme is proposed for 30% DRAM bandwidth reduction. In 90 nm process, our design costs 93.3k logic gates with 23.0 kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps application at 249MHz in the worst case.

    DOI CiNii

  • Unified Parameter Decoder Architecture for H.265/HEVC Motion Vector and Boundary Strength Decoding

    Shihao Wang, Dajiang Zhou, Jianbin Zhou, Takeshi Yoshimura, Satoshi Goto

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E98A ( 7 ) 1356 - 1365  2015.07

     View Summary

    In this paper, VLSI architecture design of unified motion vector (MV) and boundary strength (BS) parameter decoder (PDec) for 8K UHDTV HEVC decoder is presented. The adoption of new coding tools in PDec, such as Advanced Motion Vector Prediction (AMVP), increases the VLSI hardware realization overhead and memory bandwidth requirement, especially for 8K UHDTV application. We propose four techniques for these challenges. Firstly, this work unifies MV and BS parameter decoders for line buffer memory sharing. Secondly, to support high throughput, we propose the top-level CU-adaptive pipeline scheme by trading off between implementation complexity and performance. Thirdly, PDec process engine with optimizations is adopted for 43.2k area reduction. Finally, PU-based coding scheme is proposed for 30% DRAM bandwidth reduction. In 90 nm process, our design costs 93.3k logic gates with 23.0 kB line buffer. The proposed architecture can support real-time decoding for 7680x4320@60fps application at 249 MHz in the worst case.

    DOI

  • High Performance VLSI Architecture of H.265/HEVC Intra Prediction for 8K UHDTV Video Decoder

    ZHOU Jianbin, ZHOU Dajiang, WANG Shihao, YOSHIMURA Takeshi, GOTO Satoshi

    IEICE Trans. Fundamentals   98 ( 12 ) 2519 - 2527  2015

     View Summary

    8K Ultra High Definition Television (UHDTV) requires extremely high throughput for video decoding based on H.265. In H.265, intra coding could significantly enhance video compression efficiency, at the expense of an increased computational complexity compared with H.264. For intra prediction of 8K UHDTV real-time H.265 decoding, the joint complexity and throughput issue is more difficult to solve. Therefore, based on the divide-and-conquer strategy, we propose a new VLSI architecture in this paper, including two techniques, in order to achieve 8K UHDTV H.265 intra prediction decoding. The first technique is the LUT based Reference Sample Fetching Scheme (LUT-RSFS), reducing the number of reference samples in the worst case from 99 to 13. It further reduces the circuit area and enhances the performance. The second one is the Hybrid Block Reordering and Data Forwarding (HBRDF), minimizing the idle time and eliminating the dependency between TUs by creating 3 Data Forwarding paths. It achieves the hardware utilization of 94%. Our design is synthesized using Synopsys Design Compiler in 40nm process technology. It achieves an operation frequency of 260MHz, with a gate count of 217.8K fo

    CiNii

  • Mobility Overlap-Removal-Based Leakage Power and Register-Aware Scheduling in High-Level Synthesis

    Nan Wang, Song Chen, Wei Zhong, Nan Liu, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 8 ) 1709 - 1719  2014.08

     View Summary

    Scheduling is a key problem in high level synthesis, as the scheduling results affect most of the important design metrics. In this paper, we propose a novel scheduling method to simultaneously optimize the leakage power of functional units with dual-Vth techniques and the number of registers under given timing and resource constraints. The mobility overlaps between operations are removed to eliminate data dependencies, and a simulated-annealing-based method is introduced to explore the mobility overlap removal solution space. Given the overlapfree mobilities, the resource usage and register usage in each control step can be accurately estimated. Meanwhile, operations are scheduled so as to optimize the leakage power of functional units with minimal number of registers. Then, a set of operations is iteratively selected, reassigned as low-Vth, and rescheduled until the resource constraints are all satisfied. Experimental results show the efficiency of the proposed algorithm.

    DOI

  • Leakage Power Aware Scheduling in High-Level Synthesis

    Nan Wang, Song Chen, Cong Hao, Haoran Zhang, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E97A ( 4 ) 940 - 951  2014.04

     View Summary

    In this paper, we address the problem of scheduling operations into control steps with a dual threshold voltage (dual-V-th) technique, under timing and resource constraints. We present a two-stage algorithm for leakage power optimization. In the threshold voltage (V-th) assignment stage, the proposed algorithm first initializes all the operations to high-V-th, and then it iteratively shortens the critical path delay by reassigning the set of operations covering all the critical paths to low-V-th, until the timing constraint is met. In the scheduling stage, a modified force-directed scheduling is implemented to schedule operations and to adjust threshold voltage assignments with a consideration of the resource constraints. To eliminate the potential resource constraint violations, the operations' threshold voltage adjustment problem is formulated as a "weighted interval scheduling" problem. The experimental results show that our proposed method performs better in both running time and leakage power reduction compared with MWIS [3].

    DOI

  • Leakage Power Aware Scheduling in High-Level Synthesis

    Nan Wang, Song Chen, Cong Hao, Haoran Zhang, Takeshi Yoshimura

    IEICE Trans. Fundamentals.   Vol.E97-A ( No.4 ) 940 - 951  2014.04

  • Leakage power aware scheduling in high-level synthesis

    Nan Wang, Song Chen, Cong Hao, Haoran Zhang, Takeshi Yoshimura

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E97-A ( 4 ) 940 - 951  2014

     View Summary

    In this paper, we address the problem of scheduling operations into control steps with a dual threshold voltage (dual-Vth) technique, under timing and resource constraints. We present a two-stage algorithm for leakage power optimization. In the threshold voltage (Vth) assignment stage, the proposed algorithm first initializes all the operations to high-V th, and then it iteratively shortens the critical path delay by reassigning the set of operations covering all the critical paths to low-V th until the timing constraint is met. In the scheduling stage, a modified force-directed scheduling is implemented to schedule operations and to adjust threshold voltage assignments with a consideration of the resource constraints. To eliminate the potential resource constraint violations, the operations' threshold voltage adjustment problem is formulated as a "weighted interval scheduling" problem. The experimental results show that our proposed method performs better in both running time and leakage power reduction compared with MWIS [3]. Copyright © 2014 The Institute of Electronics, Information and Communication Engineers.

    DOI CiNii

  • Mobility overlap-removal-based leakage power and register-aware scheduling in high-level synthesis

    Nan Wang, Song Chen, Wei Zhong, Nan Liu, Takeshi Yoshimura

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E97-A ( 8 ) 1709 - 1719  2014

     View Summary

    Scheduling is a key problem in high level synthesis, as the scheduling results affect most of the important design metrics. In this paper, we propose a novel scheduling method to simultaneously optimize the leakage power of functional units with dual-Vth techniques and the number of registers under given timing and resource constraints. The mobility overlaps between operations are removed to eliminate data dependencies, and a simulated-annealing-based method is introduced to explore the mobility overlap removal solution space. Given the overlapfree mobilities, the resource usage and register usage in each control step can be accurately estimated. Meanwhile, operations are scheduled so as to optimize the leakage power of functional units with minimal number of registers. Then, a set of operations is iteratively selected, reassigned as low-Vth, and rescheduled until the resource constraints are all satisfied. Experimental results show the efficiency of the proposed algorithm. Copyright © 2014 The Institute of Electronics, Information and Communication Engineers.

    DOI CiNii

  • Genetic Algorithm Based Pipeline Scheduling in High-level Synthesis

    Xiaohao Gao, T. Yoshimura

    Proc. ASICON 2013     1 - 4  2013.10

    DOI

  • Timing and Resource Constrained Leakage Power Aware Scheduling in High-Level Synthesis

    Nan Wang, Song Chen, T. Yoshimura

    Proc. ASICON 2013     1 - 4  2013.10

    DOI

  • Power and Resource Aware Scheduling with Multiple Voltages

    Haoran Zhang, Cong Hao, Nan Wang, Song Chen, Takeshi Yoshimura

    Proc. ASICON 2013     1 - 4  2013.10

    DOI

  • Interconnection Allocation Between Functional Units And Registers in High-Level Synthesis

    Cong Hao, Nan Wang, Song Chen, Takeshi Yoshimura, Min-You Wu

    Proc ASICN 2013     1 - 4  2013.10

    DOI

  • Mobility Overlap-Removal Based Leakage Power Aware Scheduling in High-Level Synthesis

    Nan Wang, Song Chen, Takeshi Yoshimura

    Proc. ISCAS2013     1745 - 1748  2013.05

  • Topology-Aware Floorplanning for 3D Application-Specific Network-on-Chip Synthesis

    Bo Huang, Song Chen, Wei Zhong, Takeshi Yoshimura

    Proc. ISCAS2013     1732 - 1735  2013.05

  • Resource-Aware Multi-Layer Floorplanning for Partially Reconfigurable FPGAs

    Nan Liu, Song Chen, Takeshi Yoshimura

    IEICE TRANSACTIONS ON ELECTRONICS   E96C ( 4 ) 501 - 510  2013.04

     View Summary

    Modem field programmable gate arrays (FPGAs) with heterogeneous resources are partially reconfigurable. Existing methods of reconfiguration-aware floorplanning have limitations with regard to homogeneous resources; they solve only a part of the reconfigurable problem. In this paper, first, a precise model for partially reconfigurable FPGAs is formulated, and then, a two-phase floorplanning approach is presented. In the proposed approach, resource distribution is taken into consideration at all times. In the first step, a resource-aware insertion-after-remove perturbation is devised on the basis of the multi-layer sequence pair constraint graphs, and resource-aware slack-based moves (RASBM) are made to satisfy resource requirements. In the second step, a resource-aware fixed-outline floorplanner is used, and RASBM are applied to pack the reconfigurable regions on the FPGAs. Experimental results show that the proposed approach is resource- and reconfiguration-aware, and facilitates stable floorplanning. In addition, it reduces the wire-length by 4-28% in the first step, and by 12% on average in the second step compared to the wire-length in previous approaches.

    DOI

  • Resource-Aware Multi-Layer Floorplanning for Partially Reconfigurable FPGAs

    Nan Liu, Song Chen, Takeshi Yoshimura

    IEICE Trans. on Electronics   Vol.E96-C ( No.4 ) 510 - 510  2013.04

  • Min-Cut Based Leakage Power Aware Scheduling in High-Level Synthesis

    Nan Wang, Song Chen, Takeshi Yoshimura

    Proc. ISQED 2013    2013.03

  • Floorplanning and topology synthesis for application-specific network-on-chips

    Wei Zhong, Song Chen, Bo Huang, Takeshi Yoshimura, Satoshi Goto

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E96-A ( 6 ) 1174 - 1184  2013

     View Summary

    Application-Specific Network-on-Chips (ASNoCs) have been proposed as a more promising solution than regular NoCs to the global communication challenges for particular applications in nanoscale Systemon- Chip (SoC) designs. In ASNoC Design, one of the key challenges is to generate the most suitable and power efficient NoC topology under the constraints of the application specification. In this work, we present a two-step floorplanning (TSF) algorithm, integrating topology synthesis into floorplanning phase, to automate the synthesis of such ASNoC topologies. At the first-step floorplanning, during the simulated annealing, we explore the optimal positions and clustering of cores and implement an incremental path allocation algorithm to predictively evaluate the power consumption of the generated NoC topology. At the second-step floorplanning, we explore the optimal positions of switches and network interfaces on the floorplan. A power and timing aware path allocation algorithm is also integrated into this step to determine the connectivity across different switches. Experimental results on a variety of benchmarks show that our algorithm can produce greatly improved solutions over the latest works. Copyright © 2013 The Institute of Electronics, Information and Communication Engineers.

    DOI

  • Genetic Algorithm based pipeline scheduling in high-level synthesis

    Gao, Xiaohao, Yoshimura, Takeshi

    Proceedings of International Conference on ASIC    2013.01

     View Summary

    In this work, we present a Genetic Algorithm (GA) based method for pipeline scheduling optimization. The objective is to minimize the circuit area under both data initiation interval and pipeline latency constraints. In the initialization, the scheduler generates a series of solutions between As Soon As Possible (ASAP) and As Late As Possible (ALAP) interval. Afterwards a Linear Programming (LP) algorithm is applied for transforming unfeasible solutions to feasible solutions, which are input to GA for searching the optimization result. In the experiments, our proposed algorithm achieves an average of 29.74% area improvement by comparing with ASAP and ALAP methods. © 2013 IEEE.

    DOI

  • Network Simplex Method Based Multiple Voltage Scheduling in Power-Efficient High-Level Synthesis

    Cong Hao, Song Chen, Takeshi Yoshimura

    2013 18TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC)     237 - 242  2013

     View Summary

    In this work, we focus on the problem of latency-constrained scheduling with consideration of multiple voltage technologies in High-level synthesis. Without the resource concern, we propose an Integer Linear Programming (ILP) formulation and further relax it to a piecewise Linear Programming (LP) problem, which is optimally solved using the efficient piecewise-linear extended network simplex method(PLNSM). The experimental results showed 80X+ speedup compared to the general LP formulation. Considering the resource usage, we propose a two-stage heuristic Network Simplex Method based Power-efficient Multiple Voltage Scheduling(NPMVS) method. Firstly, the above relaxed LP formulation is modified to perform mobility allocation and delay assignment for the operations so as to minimize the power and the differences between the allocated operation mobilities and the predefined target mobilities. The modified formulation is solved using the PLNSM and iteratively performed to minimize power and resource density variation in control steps by gradually updating the predefined target mobilities. Secondly, with the allocated operation mobilities, we apply dependencyfree operation scheduling with the objective of minimizing the resource usage. Experimental results show that the proposed method can produce optimum solutions for all 6 benchmarks with 14 groups of data in a maximum time of 0.25 second.

  • A Novel Floorplan Representation with Random Contour Corner Selecting Scheme

    Xiaohao Gao, Takeshi Yoshimura

    2013 IEEE TENCON SPRING CONFERENCE   (Accepted)   552 - 556  2013

     View Summary

    Floorplanning has played a crucial role in the VLSI physical design process, while a lot of focus has been put on the representation methodologies for the floorplan optimization in the recent researches. In this work, we propose an efficient Padmissible representation, called random contour corner (RCC), for non-slicing floorplans. It depends on a two-section random code, representing the sequence and location of the blocks respectively. The objective is to improve both area and wire length. For the optimization procedure, we apply a simulated annealing (SA) algorithm. The method is quite simple and time efficient for implementation even on large-scale integration floorplans, and the run time complexity for the algorithm is O(nlogn). The experimental results show that our proposed method achieves promising results by comparing with some other representations (O-tree, B*-tree and TCG) on basic MCNC benchmark circuits.

  • Network Simplex Method Based Multiple Voltage Scheduling in Power-Efficient High-Level Synthesis

    Cong Hao, Song Chen, Takeshi Yoshimura

    2013 18TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC)     237 - 242  2013

     View Summary

    In this work, we focus on the problem of latency-constrained scheduling with consideration of multiple voltage technologies in High-level synthesis. Without the resource concern, we propose an Integer Linear Programming (ILP) formulation and further relax it to a piecewise Linear Programming (LP) problem, which is optimally solved using the efficient piecewise-linear extended network simplex method(PLNSM). The experimental results showed 80X+ speedup compared to the general LP formulation. Considering the resource usage, we propose a two-stage heuristic Network Simplex Method based Power-efficient Multiple Voltage Scheduling(NPMVS) method. Firstly, the above relaxed LP formulation is modified to perform mobility allocation and delay assignment for the operations so as to minimize the power and the differences between the allocated operation mobilities and the predefined target mobilities. The modified formulation is solved using the PLNSM and iteratively performed to minimize power and resource density variation in control steps by gradually updating the predefined target mobilities. Secondly, with the allocated operation mobilities, we apply dependencyfree operation scheduling with the objective of minimizing the resource usage. Experimental results show that the proposed method can produce optimum solutions for all 6 benchmarks with 14 groups of data in a maximum time of 0.25 second.

    DOI

  • Floorplanning and topology synthesis for application-specific network-on-chips

    Wei Zhong, Song Chen, Bo Huang, Takeshi Yoshimura, Satoshi Goto

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E96-A ( 6 ) 1174 - 1184  2013

     View Summary

    Application-Specific Network-on-Chips (ASNoCs) have been proposed as a more promising solution than regular NoCs to the global communication challenges for particular applications in nanoscale Systemon- Chip (SoC) designs. In ASNoC Design, one of the key challenges is to generate the most suitable and power efficient NoC topology under the constraints of the application specification. In this work, we present a two-step floorplanning (TSF) algorithm, integrating topology synthesis into floorplanning phase, to automate the synthesis of such ASNoC topologies. At the first-step floorplanning, during the simulated annealing, we explore the optimal positions and clustering of cores and implement an incremental path allocation algorithm to predictively evaluate the power consumption of the generated NoC topology. At the second-step floorplanning, we explore the optimal positions of switches and network interfaces on the floorplan. A power and timing aware path allocation algorithm is also integrated into this step to determine the connectivity across different switches. Experimental results on a variety of benchmarks show that our algorithm can produce greatly improved solutions over the latest works. Copyright © 2013 The Institute of Electronics, Information and Communication Engineers.

    DOI

  • Min-cut based leakage power aware scheduling in high-level synthesis

    Nan Wang, Song Chen, Takeshi Yoshimura

    Proceedings - International Symposium on Quality Electronic Design, ISQED     164 - 169  2013

     View Summary

    In this paper, we address the problem of scheduling operations into control steps with dual threshold voltage (dual-Vth) technique under timing and resource constraints. We present a min-cut based algorithm for leakage power optimization. The proposed algorithm first initializes all the operations to high-Vth, then iteratively shorten the critical path delay by reassigning the set of operations covering all the critical paths to low-Vth until the timing constraints are met. A modified force-directed scheduling is implemented to schedule operations and to adjust threshold voltage assignments with consideration of resource constraints. During this procedure, mobility overlap graph (MOG) is constructed based on the mobilities of high-Vth operations. To guarantee the resource constraints are satisfied, operations' threshold voltages are adjusted by computing the min-cut of MOG. Experimental results show that our method performs better both in running time and leakage power reduction compared with MWIS-heuristic [3]. © 2013 IEEE.

    DOI

  • Port assignment for multiplexer and interconnection optimization

    Cong Hao, Hao-Ran Zhang, Song Chen, Takeshi Yoshimura, Min-You Wu

    Proceedings of the 5th Asia Symposium on Quality Electronic Design, ASQED 2013     136 - 143  2013

     View Summary

    Data path connection elements usually consume a significant amount of both power and area on a VLSI chip. In this paper, we focus on the port assignment problem for multiplexer (MUX) and interconnection optimization in High-Level Synthesis. Given a binding solution of operations and variables, the port assignment problem connects the registers to the operator ports through MUXes, to minimize the interconnections between MUXes and operator ports, as well as the MUX power and area. We formulate the port assignment problem for binary commutative operators as a vertex partition problem on a graph, and propose a local search based heuristic algorithm that iteratively performs the elementary spanning tree transformation on the graph to solve it. We also propose a method to estimate the result of the tree transformation and filter a considerable amount of bad solutions in advance which greatly accelerate the algorithm. The experimental results show that our proposed algorithm is able to achieve 48% execution time reduction and 8.3% power reduction compared with the previous work, and the power reduction can be obtained for 37% test benches. © 2013 IEEE.

    DOI

  • A Novel Floorplan Representation with Random Contour Corner Selecting Scheme

    Xiaohao Gao, Takeshi Yoshimura

    2013 IEEE TENCON SPRING CONFERENCE     552 - 556  2013

     View Summary

    Floorplanning has played a crucial role in the VLSI physical design process, while a lot of focus has been put on the representation methodologies for the floorplan optimization in the recent researches. In this work, we propose an efficient Padmissible representation, called random contour corner (RCC), for non-slicing floorplans. It depends on a two-section random code, representing the sequence and location of the blocks respectively. The objective is to improve both area and wire length. For the optimization procedure, we apply a simulated annealing (SA) algorithm. The method is quite simple and time efficient for implementation even on large-scale integration floorplans, and the run time complexity for the algorithm is O(nlogn). The experimental results show that our proposed method achieves promising results by comparing with some other representations (O-tree, B*-tree and TCG) on basic MCNC benchmark circuits.

    DOI

  • Topology-aware floorplanning for 3D application-specific Network-on-Chip synthesis

    Bo Huang, Song Chen, Wei Zhong, Takeshi Yoshimura

    Proceedings - IEEE International Symposium on Circuits and Systems     1732 - 1735  2013

     View Summary

    As technology scaling, three-dimensional integrated circuits (3D-ICs) are emerging as a promising solution to address the challenges in system on chips (SoCs). Moreover, it's a necessity to design an efficient Network-on-Chip (NoC) topology for the interconnection issues of 3D SoCs. In this paper, we propose a topology-aware floorplanning method to determine the power-performance efficient 3D NoC topology. Unlike the previous works which explore the path allocation of the NoC components and Through-Silicon Vias (TSVs) assignment after the floorplan of cores is fixed, we integrate these steps (the clustering of cores + the placement of cores and switches + the path allocation + the TSV-aware topology evaluation) within the 3D floorplanning procedure. Experimental results show the effectiveness of our method. © 2013 IEEE.

    DOI

  • Mobility overlap-removal based leakage power aware scheduling in high-level synthesis

    Nan Wang, Song Chen, Yuhuan Sun, Takeshi Yoshimura

    Proceedings - IEEE International Symposium on Circuits and Systems     1745 - 1748  2013

     View Summary

    In this paper, we address the problem of scheduling operations into control steps with dual threshold voltage (dual-Vth) technique under timing and resource constraints. Recently, some scheduling methods are proposed based on the mobility overlap removal, and it is a hard problem to remove the mobility overlap optimally. There might be no feasible solution with an improper mobility overlap removal. In this work, we implement the mobility overlap removal together with dual threshold voltage technique to minimize the total leakage power. A simulated-annealing based method is introduced to explore the optimal solution. For each mobility overlap removal, a probability-based method is proposed to schedule operations at appropriate control steps, and to assign them with proper threshold voltages. The experimental results show the effectiveness of the proposed method. © 2013 IEEE.

    DOI

  • Lagrangian relaxation based pin assignment and Through-Silicon Via planning for 3-D SoCs

    Wei Zhong, Song Chen, Yang Geng, Takeshi Yoshimura

    Proceedings of International Conference on ASIC     1 - 4  2013

     View Summary

    As technology advances, 3-D stacking of silicon layers that promises a solution to significantly alleviate the interconnect problem faced by current System-on-Chips (SoCs). In 3-D SoCs, judicious assignment of the pin locations and related vertical Through-Silicon Vias (TSVs) can improve the routing area and interconnection delay by reducing wire length and wire congestion. In this paper, we propose a significant algorithm to locate pin and TSV positions simultaneously for the two-pin net list in 3-D SoCs. Given a floorplan result, we formulate the pin assignment and TSV planning problem as a min-cost multi-commodity flow model and solve it based on lagrangian relaxation. By relaxing the capacity constraints in pin and TSV locations, we transform the min-cost multi-commodity flow problem to several min-cost max-flow problems that can be solved independently. A heuristic algorithm is also proposed to improve the result of lagrangian relaxation to guarantee the feasible solution. Experimental results show the effectiveness of the proposed algorithm. © 2013 IEEE.

    DOI

  • Application-specific network-on-chip synthesis with topology-aware floorplanning

    Huang, Bo, Chen, Song, Zhong, Wei, Yoshimura, Takeshi

    Proceedings - SBCCI 2012: 25th Symposium on Integrated Circuits and Systems Design    2012.12

     View Summary

    Application-Specific Network-on-Chip (ASNoC) architecture is more promising than regular network-on-Chip(NoC) for some particular applications. In ASNoC Design, one of the key challenges is to generate the most suitable and power efficient NoC topology. In previous works, the placement of the cores and network components, and the path allocation are explored separately. However, the path allocation strongly depends on the placement of cores and network components. In this paper, we integrate these steps together through the floorplanning with the cluster reconstruction and path allocation (FCRPA). Several SoC benchmarks have been tested and the results showed improvements over the latest works. copy;2012 IEEE.

  • Floorplanning for High Utilization of Heterogeneous FPGAs

    Nan Liu, Song Chen, Takeshi Yoshimura

    IEICE Transactions on Fundamentals   E95-A ( 9 ) 1529 - 1537  2012.09

  • Applicationi Specific Network-on-Chip Synthesis with Topology-Aware Floorplanning

    Bo Huang, S ong Chen, WeiZhong,T. Yoshimura

    International Symposium on Integrated Circuits and Systems Design (SBCCI)    2012.08

  • Integrating Routing Path Allocation with Network Component Placement for Application- Specific Network-on-Chips

    Wei Zhong, Song Chen, Dan Su, Takeshi Yoshimura, S.Goto

    Proc. ITC-CSCC    2012.07

  • Cluster Generation and Network Component Insertion for Topology Synthesis of Application-Specific Network-on-Chips

    Wei Zhong, Takeshi Yoshimura, Bei Yu, Song Chen, Sheqin Dong, Satoshi Goto

    IEICE TRANSACTIONS ON ELECTRONICS   E95C ( 4 ) 534 - 545  2012.04

     View Summary

    Network-on-Chips (NoCs) have been proposed as a solution for addressing the global communication challenges in System-on-Chip (SoC) architectures that are implemented in nanoscale technologies. For the use of NoCs to be feasible in today's industrial designs, a custom-tailored, power- efficient NoC topology that satisfies the application characteristics is required. In this work, we present a design methodology that automates the synthesis of such application-specific NoC topologies. We present a method which integrates partitioning into floorplanning phase to explore optimal clustering of cores during floorplanning with minimized link and switch power consumption. Based on the size of applications, we also present an Integer Linear Programming and a heuristic method to place switches and network interfaces on the floorplan. Then, a power and timing aware path allocation algorithm is carried out to determine the connectivity across different switches. We perform experiments on several SoC benchmarks and present a comparison with the latest work. For small applications, the NoC topologies synthesized by our method show large improvements in power consumption (27,54%), hop-count (4%) and running time (66%) on average. And for large applications, the synthesized topologies result in large power (31.77%), hop-count (29%) and running time (94.18%) on average.

    DOI

  • Port Assignment for Interconnect Reduction in High-Level Synthesis

    Cong Hao, Song Chen, Takeshi Yoshimura

    International Symposium on VLSI Design, Automation and Test (VLSI-DAT)     1 - 4  2012.04

  • Cluster Generation and Network Component Insertion for Topology Synthesis of Application-Specific Network-on-Chips

    Wei Zhong, T. Yoshimura, Bei Yu, Song Chen, Sheqin Dong, S. Goto

    IEICE Transactions on Electronics   E95-C ( 4 ) 535 - 545  2012.04

  • Floorplanning for high utilization of heterogeneous FPGAs

    Nan Liu, Song Chen, Takeshi Yoshimura

    IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences   E95-A ( 9 ) 1529 - 1537  2012

     View Summary

    Heterogeneous resources such as configurable logic blocks (CLBs), multiplier blocks (MULs) and RAM blocks (RAMs) where millions of logic gates are included have been added to field programmable gate arrays (FPGAs). The fixed-outline floorplanning used by the existing methods always has a big penalty item in the objective function to ensure all the modules are placed in the specified chip region, which maybe greatly degrade the wirelength. This paper presents a three-phase floorplanning method for heterogeneous FPGAs. First, a non-slicing free-outline floorplanning method is used to optimize the wirelength, however, in this phase, the satisfaction of resource requirements from functional modules might fail. Second, a min-cost-max-flow algorithm is used to tune the assignment of CLBs to functional modules, and assign contiguous regions to each module so that all the functional modules satisfy CLB requirements. Finally, the MULs and RAMs are allocated to modules by a network flow model. CLBs hold the maximum quantity among all the resources. Therefore, making a high utilization of them means an enhancement of the FPGA densities. The proposed method can improve the utilization of CLBs, hence, much larger circuits could be mapped to the same FPGA chip. The results show that about 7-85% wirelength reduction is obtained, and CLB utilization is improved by about 25%. Copyright © 2012 The Institute of Electronics, Information and Communication Engineers.

    DOI

  • Practically scalable floorplanning with voltage island generation

    Song Chen, Xiaolin Zhang, Takeshi Yoshimura

    Proceedings of the International Symposium on Low Power Electronics and Design     27 - 32  2012

     View Summary

    In this paper, we propose a method of floorplanning with voltage island generation for system-on-chips (SoC) designs, which is deeply coupled with voltage island partitioning and voltage assignment, and has a good scalability. Floorplans with voltage islands are represented using nested Sequence Pairs, where the cores involved in the same voltage island consecutively appear in the sequences. Starting from a randomly generated initial floorplan, where each non chip-level core occupies an individual voltage island, we iteratively improve the solution by removing a core from the floorplan, then, inserting back the core by trying all the possible O(n 2) block positions, which are defined as the combinations of insertion points and voltage islands. An almost linear algorithm is devised to roughly but quickly filter many worse block positions, in each iteration, considering fixed-outline constraints, wirelength, power, power/ground routing resources, and level shifters. Compared with the latest work, the proposed method shows 9.74% and 21.04% improvements respectively on the wirelength and power when all the blocks are hard. When soft blocks are involved, the proposed method shows 12.38% wirelength reduction and 3.10% more power saving with a penalty of 7.5% whitespace. Moreover, more than 20X speedups can be obtained for the large test cases. © 2012 ACM.

    DOI

  • Practically scalable floorplanning with voltage island generation

    Song Chen, Xiaolin Zhang, Takeshi Yoshimura

    Proceedings of the International Symposium on Low Power Electronics and Design     27 - 32  2012

     View Summary

    In this paper, we propose a method of floorplanning with voltage island generation for system-on-chips (SoC) designs, which is deeply coupled with voltage island partitioning and voltage assignment, and has a good scalability. Floorplans with voltage islands are represented using nested Sequence Pairs, where the cores involved in the same voltage island consecutively appear in the sequences. Starting from a randomly generated initial floorplan, where each non chip-level core occupies an individual voltage island, we iteratively improve the solution by removing a core from the floorplan, then, inserting back the core by trying all the possible O(n 2) block positions, which are defined as the combinations of insertion points and voltage islands. An almost linear algorithm is devised to roughly but quickly filter many worse block positions, in each iteration, considering fixed-outline constraints, wirelength, power, power/ground routing resources, and level shifters. Compared with the latest work, the proposed method shows 9.74% and 21.04% improvements respectively on the wirelength and power when all the blocks are hard. When soft blocks are involved, the proposed method shows 12.38% wirelength reduction and 3.10% more power saving with a penalty of 7.5% whitespace. Moreover, more than 20X speedups can be obtained for the large test cases. © 2012 ACM.

    DOI

  • Wirelength driven I/O buffer placement for Flip-chip with timing-constrained

    Nan Liu, Shiyu Liu, Takeshi Yoshimura

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     631 - 634  2012

     View Summary

    Flip-chip package provides the highest chip density because I/O buffers in it could be placed anywhere inside a chip. The assignment of bump pads, I/O buffers and I/O pins will affect the satisfaction of timing requirement inside die core. In this paper, we proposed an effective three-step hierarchical approach to satisfy the timing-constrained I/O buffer placement in an area-I/O flip-chip design, meanwhile, wirelength could be optimized. First of all, I/O buffers are inserted to the floorplan plane greedily, and then, the wirelength between I/O buffers and I/O pins are optimized by a fixed-outline floorplanning algorithm. Secondly, a network flow model is conducted, and a min-cost-max-flow algorithm is used to assign I/O pins, I/O buffers and bump pads. Finally, the timing constraints are translated to length constraints, the results that satisfy timing constraints are selected. The experimental results show that, under the given timing constraints, higher timing-constrained satisfaction ratio (TCSR) is obtained, and the reduction of total wirelength is 14% on average.

    DOI

  • Wirelength driven I/O buffer placement for Flip-chip with timing-constrained

    Nan Liu, Shiyu Liu, Takeshi Yoshimura

    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS)     631 - 634  2012

     View Summary

    Flip-chip package provides the highest chip density because I/O buffers in it could be placed anywhere inside a chip. The assignment of bump pads, I/O buffers and I/O pins will affect the satisfaction of timing requirement inside die core. In this paper, we proposed an effective three-step hierarchical approach to satisfy the timing-constrained I/O buffer placement in an area-I/O flip-chip design, meanwhile, wirelength could be optimized. First of all, I/O buffers are inserted to the floorplan plane greedily, and then, the wirelength between I/O buffers and I/O pins are optimized by a fixed-outline floorplanning algorithm. Secondly, a network flow model is conducted, and a min-cost-max-flow algorithm is used to assign I/O pins, I/O buffers and bump pads. Finally, the timing constraints are translated to length constraints, the results that satisfy timing constraints are selected. The experimental results show that, under the given timing constraints, higher timing-constrained satisfaction ratio (TCSR) is obtained, and the reduction of total wirelength is 14% on average.

  • A Low Power Technology Mapping Method for Adaptive Logic Module

    Wei Chen, Yuichi Nakamura, Xiaolin Zhang, Takeshi Yoshimura

    Proc. International Conference on Field-Programmable Technology, New Delhi, India    2011.12

  • mobility Overlap-Removal Based Timing-Constrained Scheduling

    Song Chen, Yuan Yao, Takeshi Yoshimura

    Proc. IEEE International Conference on ASIC (ASICON), Xiamen, China    2011.10

  • An Efficient Level-Shifter Floorplanning Method for Multi-Voltage Design

    Xiaolin Zhang, Zhi Lin, Song Chen, Takeshi Yoshimura

    Proc. IEEE International Conference on ASIC (ASICON), Xiamen, China    2011.10

  • Application-Specfic Network-on-Chip Synthesis: Cluster Generation and Network Component Insertion

    W. Zhong, B. Yu, S. Chen, T. Yoshimura, S. Dong, S. Goto

    Proc.IEEE International Symposium on Quality Electronic Design (ISQED), Santa Clara, USA.    2011.03

  • Floorplanning for high utilization of heterogeneous FPGAs

    N. Liu, S. Chen, T. Yoshimura

    Proc. IEEE International Symposium on Quality Electronic Design (ISQED), Santa Clara, USA    2011.03

  • Floorplanning Driven Network-on-Chip Synthesis for 3-D SoCs

    Wei Zhong, Song Chen, Fei Ma, Takeshi Yoshimura, Satoshi Goto

    2011 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS)     1203 - 1206  2011

     View Summary

    As technology advances, 3-D stacking of silicon layers is emerging as a promising approach to address the integration challenges faced by current System-on-Chips (SoCs). Designing efficient Network-on-Chips (NoCs) is necessary to handle the 3-D interconnect complexity. In this paper, we present a four-stage synthesis approach to determine the power-performance efficient 3-D NoC topology for the application. First, we propose an algorithm to explore optimal clustering of cores during 3-D floorplanning. Then, an Integer Linear Programming (ILP) algorithm is proposed to place switches and network interfaces on the 3-D floorplan. Thirdly, a power and timing aware path allocation algorithm is carried out to determine the connectivity across different switches. Last, a min-cost max-flow based algorithm is proposed for Through-Silicon Via (TSV) assignment to minimize the link power consumption. Experimental results show the effectiveness of the proposed algorithm.

  • Application-Specific Network-on-Chip Synthesis: Cluster Generation and Network Component Insertion

    Wei Zhong, Bei Yu, Song Chen, Takeshi Yoshimura, Sheqin Dong, Satoshi Goto

    2011 12TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED)     144 - 149  2011

     View Summary

    Network-on-Chips (NoCs) have emerged as a paradigm for designing scalable communication architecture for System-on-Chips (SoCs). In NoC, one of the key challenges is to design the most power-performance efficient NoC topology that satisfies the application characteristics. In this paper, we present a three-stage synthesis approach to solve this problem. First, we propose an algorithm [floorplanning integrated with cluster generation (FCG)] to explore optimal clustering of cores during floorplanning with minimized link and switch power consumption. Then, based on the size of applications, an Integer Linear Programming (ILP) and a heuristic method (H) are also proposed to place switches and network interfaces on the floorplan. Finally, a power and timing aware path allocation algorithm (PA) is carried out to determine the connectivity across different switches. Experimental results show that, for small applications, the NoC topology synthesized by FIP (FCG+ILP+PA) method can save 27.54% of power, 4% of hop-count and 66% of running time on average. And for large applications, FHP (FCG+H+PA) synthesis method can even save 31.77% of power, 29% of hop-count and 94.18% of running time on average.

    DOI

  • Floorplanning driven network-on-chip synthesis for 3-D SoCs

    Wei Zhong, Song Chen, Fei Ma, Takeshi Yoshimura, Satoshi Goto

    Proceedings - IEEE International Symposium on Circuits and Systems     1203 - 1206  2011

     View Summary

    1 As technology advances, 3-D stacking of silicon layers is emerging as a promising approach to address the integration challenges faced by current System-on-Chips (SoCs). Designing efficient Network-on-Chips (NoCs) is necessary to handle the 3-D interconnect complexity. In this paper, we present a four-stage synthesis approach to determine the power-performance efficient 3-D NoC topology for the application. First, we propose an algorithm to explore optimal clustering of cores during 3-D floorplanning. Then, an Integer Linear Programming (ILP) algorithm is proposed to place switches and network interfaces on the 3-D floorplan. Thirdly, a power and timing aware path allocation algorithm is carried out to determine the connectivity across different switches. Last, a min-cost max-flow based algorithm is proposed for Through-Silicon Via (TSV) assignment to minimize the link power consumption. Experimental results show the effectiveness of the proposed algorithm. © 2011 IEEE.

    DOI

  • Mobility overlap-removal based timing-constrained scheduling

    Song Chen, Yuan Yao, Takeshi Yoshimura

    Proceedings of International Conference on ASIC     417 - 420  2011

     View Summary

    In the High-Level Synthesis (HLS), the scheduling is a key problem. Recently, the max-flow scheduling method is proposed for resource-constrained scheduling based on the mobility overlap removal. In the max-flow scheduling method, it is a hard problem how to remove the overlaps optimally. With an improper mobility overlap removal, there might be no feasible solutions with resource constraints. In this work, the mobility overlap removal-based method is extended to deal with timing-constrained scheduling problem, where the timing constraint is given as an input and the resource usage should be minimized. A simulated-annealing based method is used to explore the optimal mobility overlap removal. And, given the overlap-free mobilities, a convex-cost flow based algorithm is proposed to schedule single-cycle operations optimally. At the same time, greedy algorithms are devised to schedule multi-cycle operations. The experimental results show the effectiveness of the proposed method. © 2011 IEEE.

    DOI

  • Redundant via Insertion: Removing Design Rule Conflicts and Balancing via Density

    Song Chen, Jianwei Shen, Wei Guo, Mei-Fang Chiang, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E93A ( 12 ) 2372 - 2379  2010.12

     View Summary

    The occurrence of via defects Increases due to the shrinking size in integrated circuit manufacturing Redundant via insertion is an effective and recommended method to reduce the yield loss caused by via failures In this paper we introduce the redundant via allocation problem for layer partition based redundant via insertion methods [1] and solve it using the genetic algorithm At the same time we use a convex cost flow model to equilibrate the via density which is good for the via density rules The results of layer partition based model depend on the partition and processing order of metal layers Furthermore even we try all of partitions and processing orders we might miss the optimal solutions By introducing the redundant via allocation problem on partitioning boundaries we can avoid the sub optimality of the original layer partition based method The experimental results show that the proposed method got 12 more redundant vias inserted on average and the via density balance can be greatly Improved

    DOI

  • Redundant Via Insertion: Removing Design Rule Conflicts and Balancing Via Density

    S. Chen, Jianwei Shen, Wei Guo, Mei-fang Chiang, T. Yoshimura

    IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences   E93-A ( 12 ) 1 - 8  2010.12

  • A dynamic programming based algorithm for post-scheduling frequency assignment in energy-efficient high-level synthesis

    S. Chen, Y. Yao, T. Yoshimura

    Proc. International Conference on Solid-State and Integrated-Circuit Technology (ICSICT)     794 - 706  2010.11

  • Redundant via insertion based on conflict removal

    J. Liang, S. Chen, T. Yoshimura

    Proc. International Conference on Solid-State and Integrated-Circuit Technology (ICSICT)     797 - 799  2010.11

  • Multi-layer floorplanning for stacked ICs: Configuration number and fixed-outline constraints

    Song Chen, Takeshi Yoshimura

    INTEGRATION-THE VLSI JOURNAL   43 ( 4 ) 378 - 388  2010.09

     View Summary

    3-D (stacked device layers) ICs can significantly alleviate the interconnect problem coming with the decreasing feature size and is promising for heterogeneous integration. In this paper, we concentrate on the configuration number and fixed-outline constraints in the floorplanning for 3-D ICs. Extended sequence pair, named partitioned sequence pair (in short, P-SP), is used to represent 3-D IC floorplans. We prove that the number of configuration of 3-D IC floorplans represented by P-SP is less than that of planar floorplans represented by sequence pair (SP) and decreases as the device layer number increases. Moreover, we applied the technique of block position enumeration, which have been successfully used in planar fixed-outline floorplanning, to fixed-outline multi-layer floorplanning. The experimental results demonstrate the efficiency and effectiveness of the proposed method. (C) 2010 Elsevier B.V. All rights reserved.

    DOI

  • Multi-layer floorplanning for stacked ICs: Configuration number and fixed-outline constraints

    S. Chen, T. Yoshimura

    Integration, the VLSI journal   43 ( 4 ) 1 - 11  2010.09

  • Ultra Low Power SoC Design Technologies for Media Processing

    GOTO Satoshi, IKENAGA Takeshi, YOSHIMURA Takeshi, KIMURA Shinji, TOGAWA Nozomu

    IPSJ Magazine   51 ( 7 ) 837 - 845  2010.07

    CiNii

  • Voltage-Island-Driven Floorplanning for Low-Power System-on-Chip Design

    Xiaolin Zhang, Song Chen, Takeshi Yoshimura

    火の国シンポジウム2010    2010.03

  • Floorplanning for High Utilization of Heterogeneous FPGAs

    Nan Liu, Song Chen, Takeshi Yoshimura

    火の国シンポジウム2010    2010.03

  • Whitespace Insertion for Through-Silicon Via Planning on 3-D SoCs

    Wei Zhong, Song Chen, Takeshi Yoshimura

    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS     913 - 916  2010

     View Summary

    As technology advances, 3-D ICs can significantly alleviate the interconnect problem coming with the decreasing of feature size and are promising for heterogeneous integration. In 3-D ICs, one of the key challenges is the vertical technology, using through-silicon via (TSV) for different device layers connection. In this paper, by noticing the TSV assignment comes under the influence of the whitespace distribution in a given 3-D floorplan, we proposed an algorithm called whitespace insertion (WSI) based on the floorplan-representation Sequence Pair to make the whitespace distribution in the given floorplan more reasonable for TSV insertion. When given 3-D circuit placement or floorplan results, we also proposed a minimum spanning tree based algorithm for TSV assignment to minimize the total wire length, assuming each net may have at most one TSV on each device layer. Experimental results show that, in the given 3-D floorplans there is a huge gap about 45.54% of the wire length increase between the ideal and the practice. And based on our method, the total wire length can be reduced by 13% on average without changing the chip area.

  • Whitespace Insertion for Through-Silicon Via Planning on 3-D SoCs

    Wei Zhong, Song Chen, Takeshi Yoshimura

    2010 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS     913 - 916  2010

     View Summary

    As technology advances, 3-D ICs can significantly alleviate the interconnect problem coming with the decreasing of feature size and are promising for heterogeneous integration. In 3-D ICs, one of the key challenges is the vertical technology, using through-silicon via (TSV) for different device layers connection. In this paper, by noticing the TSV assignment comes under the influence of the whitespace distribution in a given 3-D floorplan, we proposed an algorithm called whitespace insertion (WSI) based on the floorplan-representation Sequence Pair to make the whitespace distribution in the given floorplan more reasonable for TSV insertion. When given 3-D circuit placement or floorplan results, we also proposed a minimum spanning tree based algorithm for TSV assignment to minimize the total wire length, assuming each net may have at most one TSV on each device layer. Experimental results show that, in the given 3-D floorplans there is a huge gap about 45.54% of the wire length increase between the ideal and the practice. And based on our method, the total wire length can be reduced by 13% on average without changing the chip area.

    DOI

  • Redundant via insertion based on conflict removal

    Jia Liang, Song Chen, Takeshi Yoshimura

    ICSICT-2010 - 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology, Proceedings     794 - 796  2010

     View Summary

    The occurrence of via defects increases due to the shrinking size in integrated circuit manufacturing. Double-via insertion is an effective and recommended method for improving chip yield and reliability and reducing the yield loss caused by via failures. In this paper we present a genetic algorithm based method to do the double-via insertion for layouts with grid-less or grid-based routing. Design rule violation between redundant via can be represented by a conflict graph whose vertices are redundant vias and edges represent design rule violations. We propose a genetic algorithm based method exploring the optimal removal of some redundant vias to get a conflict-free redundant via set for double via insertion. To reduce the problem size, we will first merge into one vertex (one redundant via) all the connected components that are cliques of the conflict graph. Experiment results show that the effectiveness of the proposed method. ©2010 IEEE.

    DOI

  • Post-Scheduling Frequency Assignment for Energy-Efficient High-Level Synthesis

    Ru Liu, Song Chen, Takeshi Yoshimura

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     588 - 591  2010

     View Summary

    The technique of multiple supply voltages and dynamic frequency has been explored as a possible energy-efficient strategy in CMOS circuits. In this paper, we consider the technique in the scheduling process which is the keys of high level synthesis. Given a schedule, a novel method is presented for the frequency assignment. The objective is to decrease the energy consumption as much as possible without violating the timing constraints. Initially, an approach based on the convex cost integer network flow is proposed to generate a feasible initial frequency assignment solution. Secondly, a branch and bound method is used to improve the initial solution. Finally, due to the frequency assignment result, we perform the assignment of voltage to each function operation in the control steps. The experimental results show the effectiveness of the proposed method. It is observed that using three supply voltage levels (5V; 3.3V; 2.4V), an average energy savings of 25% to 40% (with the time constraint of 1.5 to 2.0 times the critical path) is obtained as compared to using a single-frequency clocking scheme with a single supply voltage.

    DOI

  • A dynamic programming based algorithm for post-scheduling frequency assignment in energy-efficient high-level synthesis

    Song Chen, Yuan Yao, Takeshi Yoshimura

    ICSICT-2010 - 2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology, Proceedings     797 - 799  2010

     View Summary

    Scaling frequency and voltage in a coordinated manner is a promising way to reduce energy and power. we explore the use of dynamic frequency clocking within the datapath and datapath scheduling algorithms that can be incorporated into a datapath synthesis tool. Given a schedule, we propose a practical optimal frequency assignment algorithm based on dynamic programming. The algorithm run very fast in practice. Though the algorithm theoretically run in a pseudo polynomial time O(ncTex2), nc is the number of control steps and Tex is the difference between the timing constraint and the critical path delay. ©2010 IEEE.

    DOI

  • Post-Scheduling Frequency Assignment for Energy-Efficient High-Level Synthesis

    Ru Liu, Song Chen, Takeshi Yoshimura

    PROCEEDINGS OF THE 2010 IEEE ASIA PACIFIC CONFERENCE ON CIRCUIT AND SYSTEM (APCCAS)     588 - 591  2010

     View Summary

    The technique of multiple supply voltages and dynamic frequency has been explored as a possible energy-efficient strategy in CMOS circuits. In this paper, we consider the technique in the scheduling process which is the keys of high level synthesis. Given a schedule, a novel method is presented for the frequency assignment. The objective is to decrease the energy consumption as much as possible without violating the timing constraints. Initially, an approach based on the convex cost integer network flow is proposed to generate a feasible initial frequency assignment solution. Secondly, a branch and bound method is used to improve the initial solution. Finally, due to the frequency assignment result, we perform the assignment of voltage to each function operation in the control steps. The experimental results show the effectiveness of the proposed method. It is observed that using three supply voltage levels (5V; 3.3V; 2.4V), an average energy savings of 25% to 40% (with the time constraint of 1.5 to 2.0 times the critical path) is obtained as compared to using a single-frequency clocking scheme with a single supply voltage.

  • Convex-Cost Flow Based redundant-Via Insertion with Density-Balance Consideration

    Jianwei Shen, M. F. Chiang, Song Chen, Wei Guo, T. Yoshimura

    Proc. IEEE International Conference on ASIC (ASICON 2009)     734 - 737  2009.10

  • Register placement for high-performance circuits

    M.-F. Chiang, T. Okamoto, T. Yoshimura

    Proc. Design, Automation and Test in Europe (DATE), Nice, France     1470 - 1475  2009.04

  • Lagrangian Relaxation Based Inter-Layer Signal Via Assignment for 3-D ICs

    Song Chen, Liangwei Ge, Mei-Fang Chiang, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E92A ( 4 ) 1080 - 1087  2009.04

     View Summary

    Three-dimensional integrated circuits (3-D ICs), i.e., stacked dies, can alleviate the interconnect problem coining with the decreasing feature size and increasing integration density, and promise a solution to heterogenous integration. The vertical connection, which is generally implemented by the through-the-si I icon via, is a key technology for 3-D ICs. In this paper, given 3-D circuit placement or floorplan results with white space reserved between blocks for inter-layer interconnections, we proposed methods for assigning inter-layer signal via locations. Introducing a grid structure on the chip, the inter-layer via assignment of two-layer chips can be optimally solved by a convex-cost max-flow formulation with signal via congestion optimized. As for 3-D ICs with three or more layers, the inter-layer signal via assignment is modeled as an integral min-cost multi-commodity flow problem, which is solved by a heuristic method based on the lagrangian relaxation. Relaxing the capacity constraints in the grids, we transfer the min-cost multi-commodity flow problem to a sequence of lagrangian sub-problems, which are solved by finding a sequence of shortest paths. The complexity of solving a lagrangian sub-problem is O(n(nt)n(g)(2)), where n(nt) is the number of nets and n(g) is the number of grids on one chip layer. The experimental results demonstrated the effectiveness of the method.

    DOI

  • Lagrangian relaxation based register placement for high-performance circuits

    Mei-Fang Chiang, Takumi Okamoto, Takeshi Yoshimura

    Proceedings of the 2009 10th International Symposium on Quality of Electronic Design     511 - 516  2009.03

  • Exploration of schedule space by random walk

    Liangwei Ge, Song Chen, Takeshi Yoshimura

    IPSJ Transactions on System LSI Design Methodology   2   30 - 42  2009

     View Summary

    Scheduling, an important step in high-level synthesis, is essentially a searching process in the solution space. Due to the vastness of the solution space and the complexity of the imposed constraints, it is usually difficult to explore the solution space efficiently. In this paper, we present a random walk based perturbation method to explore the schedule space. The method works by limiting the search within a specifically defined sub-solution space (SSS), where schedules in the SSS can be found in polynomial time. Then, the SSS is repeatedly perturbed by using an N-dimension random walk so that better schedules can be searched in the new SSS. To improve the search efficiency, a guided perturbation strategy is presented that leads the random walk toward promising directions. Experiments on well-known benchmarks show that by controlling the number of perturbations, our method conveniently makes tradeoff between schedule quality and runtime. In reasonable runtime, the proposed method finds schedules of better quality than existing methods. © 2009 Information Processing Society of Japan.

    DOI

  • A Generalized V-shaped Multilevel Method for Large Scale Floorplanning

    Song Chen, Zheng Xu, Takeshi Yoshimura

    ISQED 2009: PROCEEDINGS 10TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN, VOLS 1 AND 2     734 - 739  2009

     View Summary

    In this paper, we propose a generalized V-shaped multilevel floorplanning method with consideration of fixed-outline constraint. The Sequence Pair is used as the floorplan representation. The proposed multilevel method (ML-IARFP) adopts a two-stage structure: top-down partitioning and floorplanning followed by bottom-up merging and refinement. At the first stage, we recursively partition and floorplan the circuits until there are limited number blocks in each sub-circuit. Since we use a multi-partitioning instead of bi-partitioning, general non-slicing floorplan structures are explored in each level, which potentially lead to more effective exploration of the solution space. At the second stage, using a multilevel sequence pair structure, we recursively merge the sub-circuits into bigger circuits and do the refinement. Compared with IMF, Capo 10.2 and IARFP, ML-IARFP obtained the best results under fixed-outline constraints, and compared with IARFP, it achieved 9% wirelength reduction on average and showed a better scalability.

  • A Heuristic Method for Module Sizing Under Fixed-Outline Constraints

    Xiaolin Zhang, Song Chen, Longfan Piao, Takeshi Yoshimura

    2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS     738 - 741  2009

     View Summary

    In this paper a heuristic method for module sizing is proposed as a post-processing of the fixed-outline floorplanning. The heuristic method is based on horizontal and vertical slacks of blocks in the floorplan. By evaluating the distances of each block to the chip boundaries, x-slack and y-slack can be calculated. On the one hand, the heuristic focuses on the blocks has non-zero x-slack or non-zero y-slack but not both. These blocks will be first sorted by the amount of x/y-slack. The block having the most slack will be selected and reshaped, it has a potential to reduce the fixed-outline violation. On the other hand, the heuristic makes Use of a marked flag for each block. After reshaping the block, the marked flag of the block will be set to 1. The slack of all the blocks will be recomputed and the focused blocks with marked flag 0 are reselected and sorted. If the set of focused blocks with marked flag 0 is empty, the marked flag of all the blocks will be reset to 0 and repeat the previous steps. If the fixed-outline constraint is not satisfied, we will repeat the procedure with defined maximum iteration number Experimental results show that, the proposed heuristic could efficient achieve highly success rate without sacrificing much wire-length.

  • Convex-cost Flow based Redundant-Via Insertion with Density-Balance Consideration

    Wei Guo, Song Chen, Mei-Fang Chiang, Jian-Wei Shen, Takeshi Yoshimura

    2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS     1280 - 1283  2009

     View Summary

    As VLSI design complexity continues to increase, the yield loss due to via failure becomes more and more significant Redundant via insertion is highly recommended for improving chip yield and reliability. In this paper, we study the redundant via insertion problem in a post-routing stage, where a single via can have at most one redundant via inserted next to it. The goal is to insert as many redundant vias as possible and, at the same time, try to equilibrate the via density, which is good for the via density rules. We propose a convex-cost flow based approach to solve the problem within up to three routing layers. The experimental results show that, the proposed method can produce optimal solutions on defined density grid structure with maximum redundant via insertion, and achieve an at least 32.61% improvement of via density equalization(1).

  • High-Speed, Pipelined Implementation of Squashing Functions in Neural Networks

    Liangwei Ge, Song Chen, Takeshi Yoshimura

    Proc. The 9th International Conference on Solid-State and Integrated-Circuit Technology (ICSICT 2008)    2008.10

  • Automatic Implementation of Arithmetic Functions in High-Level Synthesis

    Liangwei Ge, Song Chen, Takeshi Yoshimura

    Proc. The 9th International Conference on Solid-State and Integrated-Circuit Technology (ICSICT 2008)    2008.10

  • A synthesis method of general floating-point arithmetic units by aligned partition

    Liangwei Ge, Song Chen, Yuichi Nakamura, Takeshi Yoshimura

    IPSJ Transactions on System LSI Design Methodology   1 ( No.1 ) 67 - 77  2008.08

     View Summary

    Since many embedded applications involve intensive mathematic operations, floating-point arithmetic units (FPU) have paramount importance in embedded systems. However, previous implementations of FPU either require much manual work or only support special functions (e.g. reciprocal, square root, logarithm, etc.). In this paper, we present an automatic method to synthesize general FPU by aligned partition. Based on the novel partition algorithm and the concept of grouping floating-point numbers into zones, our method supports general functions of wide, irreducible domain. The synthesized FPU achieves smaller area, higher frequency, and greater accuracy. Experimental results show that our method achieves 1) on average 90% smaller and 50% faster indexer than the conventional automatic method
    2) on the hyperbolic functions, 20 k times smaller error rate and 50% use of LUTs and flip-flops than the conventional manual design. © 2008 Information Processing Society of Japan.

    DOI

  • A New Implementation of Multilevel Framework for Interconnect-Driven Floorplanning

    Zheng Xu, Song Chen, Takeshi Yoshimura, Yong Fang

    Proc. the 23rd International Technical Conference on Circuits/Systems, Computers and Communications(ITC-CSCC2008)     185 - 188  2008.07

  • On Objective Functions for Fixed-Outline Floorplanning

    Lu Wang, Xiaolin Zhang, Song Chen, Takeshi Yoshimura

    Proc. the 23rd International Technical Conference on Circuits/Systems, Computers and Communications(ITC-CSCC2008)     569 - 572  2008.07

  • A Synthesis Method of General Floating-Point Arithmetic Units by Aligned Partition

    Liangwei Ge, Song Chen, Yuichi Nakamura, Takeshi Yoshimura

    Proc. 23rd International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2008)     1177 - 1180  2008.07

  • Exploration of Schedule Space by Random Walk

    Liangwei Ge, Song Chen, Takeshi Yoshimura

    Proc. 23rd International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC2008)     1573 - 1576  2008.07

  • Fixed-Outline Floorplanning: Block Position Enumeration and a New Method for Calculating Area Costs

    Song Chen, Takeshi Yoshimura

    IEEE Transactions on CAD   Vol.27 ( No.5 ) 858 - 871  2008.05

  • Lagrangian Relaxation Based Inter-Layer Signal Via Assignment for 3-D ICs

    Song Chen, Liang-Wei Ge, Mei-Fang Chiang, Takeshi Yoshimura

    Proc. 21st Circuits and Systems Karuizawa Workshop     581 - 586  2008.04

  • Weighted Jumper Insertion for Antenna Fixing

    Mei-Fang Chiang, Takeshi Yoshimura

    Proc. 21st Circuits and Systems Karuizawa Workshop     575 - 580  2008.04

  • Performance Maximized Interlayer Via Planning for 3D ICs

    Jun Lu, Song Chen, Takeshi Yoshimura

    Proceedings of ASICON2007     1196 - 1099  2007.10

  • A Fixed-Outline Floorplanning Method

    Takeshi Yosihmura, Song Chen

    Proceedings of ASCON2007     1070 - 1075  2007.10

  • Max-flow scheduling in high-level synthesis

    Liangwei Ge, Song Chen, Kazutoshi Wakabayashi, Takashi Takenaka, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E90A ( 9 ) 1940 - 1948  2007.09

     View Summary

    Scheduling, an essential step in high-level synthesis, is an intractable process. Traditional heuristic scheduling methods usually search schedules directly in the entire solution space. In this paper, we propose the idea of searching within an intermediate solution space (ISS). We put forward a max-flow scheduling method that heuristically prunes the solution space into a specific ISS and finds the optimum of ISS in polynomial time. The proposed scheduling algorithm has some unique features, such as the correction of previous scheduling decisions in a later stage, the simultaneous scheduling of all the operations, and the optimization of more complicated objectives. Aided by the max-flow scheduling method, we implement the optimization of the IC power-ground integrity problem at the behavior level conveniently. Experiments on well-known benchmarks show that without requiring additional resources or prolonging schedule latency, the proposed scheduling method can find a schedule that draws current more stably from a supply, which mitigates the voltage fluctuation in the on-chip power distribution network.

    DOI

  • Score Sequence Pair Problems of (r11, r12, r22)-Tournaments ~ Determination of Realizability

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    IEICE Transactions on Information and Systems   E90-D ( 2 ) 440 - 448  2007.02

  • A Stable Fixed-Outline Floorplanning Method

    Song Chen, Takeshi Yosihmura

    ISPD'07: PROCEEDINGS OF THE 2007 INTERNATIONAL SYMPOSIUM ON PHYSICAL DESIGN     119 - 126  2007

     View Summary

    In this paper, we propose a stable fixed-outline floorplanning method(IARFP). An elaborated method for perturbing solutions, Insertion after Remove(IAR), is devised for the simulated annealing. The IAR operation uses the technique of enumerating positions in Sequence Pair and greatly accelerates the searching. Moreover, based on the analysis of diverse objective functions used in the existing researches, we suggest a new objective function, which is still effective when combined with other objectives, for the fixed-outline floorplanning. Compared with the previous fixed-outline floorplanners, the proposed method is effective and efficient. Experiments showed that the proposed fixed-outline floorplanner achieved 100% success rate efficiently when optimizing area and wire-length simultaneously, while getting much smaller wirelength. On the other hand, we validated once more by experiments that aspect ratio close to one is beneficial to wire-length.

  • Construction of an (r(11), r(12), r(22))-tournament from a score sequence pair

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11     3403 - +  2007

     View Summary

    Let G be any directed graph and S be nonnegative and non-decreasing integer sequence(s). The prescribed degree sequence problem is a problem to determine whether there is a graph G with S as the prescribed sequence(s) of outdegrees of the vertices. Let G be the property satisfying the following (1) and (2):
    (1) G has two disjoint vertex sets A and B.
    (2) For every vertex pair u, v epsilon G (u not equal v), G satisfies
    [GRAPHICS]
    where uv (vu, respectively) means a directed edges from u to v (from v to u).
    Then G is called an (r(11),r(12),r(22))-tournament ("tournament", for short). When G is a "tournament," the prescribed degree sequence problem is called the score sequence pair problem of a "tournament", and S is called a score sequence pair of a "tournament "(or S is realizable) if the answer is "yes."
    We proposed the characterizations of a "tournament" and an algorithm for determining in linear time whether a pair of two integer sequences is realizable or not [5]. In this paper, we propose an algorithm for constructing a "tournament" from such a score sequence pair.

  • On the Number of 3-D IC Floorplan Configurations and a Solution Perturbation Method with Good Convergence

    Song Chen, Takeshi Yoshimura

    Proceedings of IEEE Asia Pacific Conference on Circuits and Systems at Singapore (Best Paper Award)     1867 - 1870  2006.12

  • Realizability of Score Sequence Pair of an (r11, r12, r22)-Tournament

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    Proceedings of IEEE Asia Pacific Conference on Circuits and Systems at Singapore     1021 - 1024  2006.12

  • Hierarchical-analysis-based fast chip-scale power estimation method for large and complex LSIs

    Yuichi Nakamura, Takeshi Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E89A ( 12 ) 3458 - 3463  2006.12

     View Summary

    This paper presents a novel power estimation method for large and complex LSIs. The proposed method is based on simulation and is used for analyzing the ways in chip-scale gate-level circuits including processors and memory are affected by gated-clock power reduction and the voltage drop due to electrical resistance. The chip-scale power estimation based on simulation patterns generally takes enormous time. In order to reduce the time to obtain accurate estimation results based on simulation patterns, we introduce three approaches: "partitioning of target LSIs and simulation pattern," "memory modeling," and "processor modeling." After placing and routing, the target LSIs are partitioned into hierarchical blocks, memory, and processors. The power consumption of each hierarchical block is calculated by using the partitioned patterns generated from chip-scale simulation patterns. The power consumption of the processor and memory blocks is estimated by a method considering the static power consumption and the rate of LSI activity ratio. Experimental results for a commercial 0.18 mu m-technology media processing chip show that the proposed method is 23 times faster than the conventional method without partitioning and that both the results are almost the same.

    DOI

  • A Consideration of the Score Sequence Problems of (r11, r12, r22)-Tournaments

    Masaya Takahashi, Takahiro Watanabe, Takeshi Yoshimura

    Proceeding of abstract, International Mathematical Conference ~ Topics in Mathematical Analysis and Graph Theory ~ at Belgrade     50 - 51  2006.09

  • Domino Logic Synthesis System and Its Applications

    Ko Yoshikawa, Shigeto Inui, Yuichi Nakamura, Takeshi Yoshimura

    Journal of Circuits, Systems and Computers   15 ( 2 ) 277 - 287  2006.06

    DOI

  • Removal of Overlapped Freedom during Scheduling in High Level Synthesis

    Liangwei Ge, Kouhei Isoda, Takeshi Yoshimura

    Proceedings of the 13th Workshop on Synthesis And System Integration of MIxed Technologies    2006.04

  • An engineering change orders design method based on patchwork-like partitioning for high performance LSIs

    Y Nakamura, K Yoshikawa, T Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E88A ( 12 ) 3351 - 3357  2005.12

     View Summary

    This paper describes a novel engineering change order (ECO) design method for large-scale, high performance LSIs, based on a patchwork-like partitioning technique. In conventional design methods, even when only small changes are made to the design after the placement and routing process, a whole re-layout must be done, and this is very time consuming. Using the proposed method, we can partition the design into several parts after logic synthesis. When design changes occur in HDL, only the parts related to the changes need to be redesigned. The netlist for the changed design remains almost the same as the original, except for the small changed parts. For partitioning, we used multiple-fan-out-points as partition borders. An experimental evaluation of our method showed that when a small change was made in the RTL description, the revised circuit part had only about 87 gates on average. This greatly reduces the re-layout time required for implementing an ECO. In actual commercial designs in which several design changes are required, it takes only one day to redesign.

    DOI

  • Design and Implementation of an EOS Chip

    liangwei ge, Takeshi Yoshimura

    Proceedings of the 6-th International Conference on ASIC    2005.10

  • A fast chip-scale power estimation method for large and complex LSIs based on hierarchical analysis

    Y Nakamura, T Yoshimura

    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS     628 - 631  2005

     View Summary

    This paper presents a novel power estimation method for large and complex LSIs. The proposed method is based on simulation and is used for analyzing the ways in chip-scale gate-level circuits including processors and memory are affected by gated-clock power reduction and the voltage drop due to electrical resistance (IR-Drop). The chip-scale power estimation based on simulation patterns generally takes a long time. To obtain accurate estimation results based on simulation patterns in a short time, we introduce three approaches: (1) hierarchical circuit and simulation pattern partitioning, (2) memory modeling, and (3) the processor modeling. First, the target LSI design after placing and routing is partitioned into hierarchical blocks, memory, and processors. The power consumption of each hierarchical block is calculated by using the partitioned patterns generated from chip-scale simulation patterns. The power consumption of the processor and memory blocks is estimated by a method considering the static power consumption and the rate of LSI activity. The experimental results for the commercial 0.18um-technology digital TV chip show that our proposed method can get almost the same results obtained by the conventional method without partitioning and can do it 25 times faster.

  • Timing Optimization Methodology Based on Replacing Flip-Flops by Latches

    K. Yoshikawa, K. Kanamaru, Y. Hagihara, S. Inui, Y. Nakamura, T. Yoshimura

    IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   E87-A ( 12 ) 186 - 191  2004.12

  • A Patchwork-like Partitioning Method for Engineering Change Orders in Redesign of High Performance LSIs

    Y. Nakamura, K. Yoshikawa, T. Yoshimura

    Proceedings of the 12th Workshop on Synthesis And System Integration of MIxed Technologies     351 - 356  2004.10

  • Logic Optimization Method after Technology Mapping

    K. Yoshikawa, Y. Nakamura, K Akashi, T. Yoshimura

    Proceedings of the 12th Workshop on Synthesis And System Integration of MIxed Technologies     357 - 362  2004.10

  • A fast hardware/software Co-verification method for system-on-a-chip by using a C/C++ simulator and FPGA emulator with shared register communication

    Y Nakamura, K Hosokawa, Kuroda, I, K Yoshikawa, T Yoshimura

    41ST DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 2004     299 - 304  2004

     View Summary

    This paper describes a new hardware/software co-verification method for System-On-a-Chip, based on the integration of a C/C++ simulator and an inexpensive FPGA emulator. Communication between the simulator and emulator occurs via a flexible interface based on shared communication registers. This method enables easy debugging, rich portability, and high verification speed, at a low cost. We describe the application of this environment to the verification of three different complex commercial SoCs, supporting concurrent hardware and embedded software development. In these projects, our verification methodology was used to perform complete system verification at 0.2-1.1 MHz, while supporting full graphical interface functions such as "waveform" or "signal dump" viewers, and debugging functions such as "step" or "break".

  • Timing optimization by replacing flip-flops to latches

    K Yoshikawa, K Kanamaru, S Inui, Y Hagihara, Y Nakamura, T Yoshimura

    ASP-DAC 2004: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE     186 - 191  2004

     View Summary

    Latch circuits have advantage for timing and are widely used for high-speed custom circuits. However, ASIC design flows are based on the circuits with flip-flops. Then, ASIC designers don't use latches. This paper describes a new timing optimization algorithm for ASIC by replacing flip-flops to latches without changing the functionality of the circuits. After latch replacement, restricted retiming called fixed-phase retiming is carried out for timing optimization by minimizing the impact of clock skew and jitter. The experimental results show that 17% delay improvement of the benchmark circuits is achieved by proposed algorithms.

  • A New Approach to VLSI Floorplanning based on Quadratic Programming and Rectangle Packing

    Takumi Okamoto, Takeshi Yoshimura

    Proceedings of the 10th Workshop on Synthesis and System Integration of Mixed Technologies   ( 257 ) 263  2001.10

  • Floorplanning using a tree representation

    PN Guo, T Takahashi, CK Cheng, T Yoshimura

    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS   20 ( 2 ) 281 - 289  2001.02

     View Summary

    We present an ordered tree (O tree) structure to represent nonslicing floorplans, The O tree uses only n(2 + inverted right perpendicular lg n inverted left perpendicular) bits for a floorplan of n, rectangular blocks. We define an admissible placement as a compacted placement in both a: and y directions. For each admissible placement, we can find an O-tree representation. We show that the number of possible O-tree combinations is O(n!2(2n-2)/n(1.5)). This is very concise compared to a sequence pair representation that has O((n!)(2)) combinations. The approximate ratio of sequence pair and O-tree combinations is O(n(2)(n/4e)(n)), The complexity of O tree is even smaller than a binary tree structure for slicing floorplan that has O(n!2(5n-3)/n(1.5)) combinations, Given an O tree, it takes only linear time to construct the placement and its constraint graph, We have developed a deterministic floorplanning algorithm utilizing the structure of O tree. Empirical results on MCNC (www.mcnc.org) benchmarks show promising performance with average 16% improvement in wire length and 1% less dead space over previous central processing unit (CPU) intensive cluster refinement method.

  • An Enhanced Perturbing Algorithm for Floorplan Design using O-tree Representation

    Yingxin Pang, Chung-Kuan Cheng, Takeshi Yoshimura

    Proceedings of the 2000 international symposium on Physical design (ISPD2000)     168 - 173  2000.04

  • Logic minimization for large-scale networks based on multi-signal implications

    M Yuguchi, K Wakabayashi, T Yoshimura

    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES   E82A ( 11 ) 2390 - 2397  1999.11

     View Summary

    This paper presents a novel implication-based method for logic minimization in large-scale, multi-level networks. It significantly reduces network size through repeated addition and removal of redundant subnetworks, utilizing multisignal implications and relationships among these implications. These are handled on a transitive implication graph, proposed ill this paper, which offers the practical use of implications for logic minimization. The proposed method holds great promise for the achievement of an interactive logic design environment for large-scale networks.

  • An O-Tree Representation of Nonslicing Floorplan and Its Applications

    Pei-Ning Guo, Chung-Kuan Cheng, Takeshi Yoshimura

    Proceedings of the 36th Design Automation Conference (DAC'99)     268 - 273  1999.06

  • A partitioning-based logic optimization method for large scale circuits with Boolean matrix

    Y NAKAMURA, T YOSHIMURA

    32ND DESIGN AUTOMATION CONFERENCE, PROCEEDINGS 1995     653 - 657  1995

  • Minimum Path-Length Equi-Distant Routing

    M. Edahiro, T. Yoshimura

    Proceedings of the IEEE Asia-Pacific Conference on Circuits and Systems(APCCAS '92)     41 - 46  1992.12

  • A MULTILAYER CHANNEL ROUTER WITH NEW STYLE OF OVER-THE-CELL ROUTING

    T FUJII, Y MIMA, T MATSUDA, T YOSHIMURA

    29TH ACM/IEEE DESIGN AUTOMATION CONFERENCE : PROCEEDINGS     585 - 588  1992

  • ハイレベルシンセシスの動向

    高橋,吉村

    電子情報通信学会論文誌 A   Vol.J74-A ( 2 ) 143 - 151  1991.02

  • New Placement and Global Routing Algorithms for Standard Cell Layouts

    M. Edahiro, T. Yoshimura

    Proceedings of the 27th Design Automation Conference (DAC'90)     642 - 645  1990.06

  • A RESOURCE SHARING AND CONTROL SYNTHESIS METHOD FOR CONDITIONAL BRANCHES

    K WAKABAYASHI, T YOSHIMURA

    1989 IEEE INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN     62 - 65  1989

  • VLSIのレイアウトコンパクション

    吉村

    東レWORLD TECNO TREND   3 ( 6 ) 11 - 14  1988.12

  • 最小コストフローアルゴリズムに基づく管路網解析問題の一解法

    阪内,吉村

    情報処理学会論文誌   Vol.29 ( 11 ) 1079 - 1187  1988.11

  • ネットワーク問題の最近の動向 -理論と応用

    国枝,吉村, 築山, 岡田,大附

    電子情報通信学会誌     922 - 933  1988

  • 最新のLSI-CAD技術

    吉村

    PIXEL(図形処理センター)     77 - 81  1988

  • ネットワーク問題におけるシンプレックス法

    吉村

    電子通信学会論文誌A   J70-A ( 2 ) 156 - 163  1987.02

  • A New Module Generator with Structural Routers and Graphical Interface

    M.Ishikawa, T.Yoshimura

    Proceedings of the 1987 IEEE International Conference on Computer-Aided Design (ICCAD-1987)     436 - 439  1987

  • A Rule-Based and Algorithmic Approach for Logic Synthesis

    T.Yoshimura, S.Goto

    International Workshop on Logic Synthesis    1987

  • Compaction-Based Custom LSI Layout Design Method

    Masaki Ishikawa, Tsuneo Matsuda, Takeshi Yoshimura, Satoshi Goto

    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems   6 ( 3 ) 374 - 382  1987

     View Summary

    This paper presents a new design method for custom LSI layouts. This method is based on layout compaction with automatic jog (wiring bend) generation in the layout. A dense chip design can be realized by this technique. Experimental results show that the chip size designed by using the proposed layout method is only 1.2–1.4 times larger than that resulting from manual layouts. Therefore, this comkpaction-based custom LSI layout design method is very effective for achieving a minimal chip layout design. © 1987 by The Institute of Electrical and Electronics Engineers, Inc.

    DOI

  • A Rule-Base and Algorithmic Approach for Logic Synthesis

    T.Yoshimura, S.Goto

    Proceedings of the 1986 IEEE International Conference on Computer-Aided Design (ICCAD-1986)     162 - 165  1986.11

  • A VLSI Architecture Evaluation System

    R.Takahashi, T.Yoshimura, S.goto

    Proceedings of the IEEE International Conference on Computer Design (ICCD'86)     60 - 63  1986

  • A Graph Theoretical Compaction Algorithm

    T.Yoshimura

    Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS1985)     1455 - 1458  1985

    CiNii

  • An Automatic Compaction Method for Building Block LSIs

    M. Ishikawa, T. Matsuda, T.Yoshimura, S. Goto

    Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS1985)    1985

  • An Efficient Channel Router

    T.Yoshimura

    Proceedings of the 21st Design Automation Conference (DAC'84)     38 - 44  1984

    CiNii

  • 区分線形近似による管網解析手法

    吉村

    電子通信学会論文誌A   J66-A ( 4 ) 297 - 304  1983.04

  • 水運用計画における予測問題とタンクモデル

    小橋,吉村

    情報処理学会論文誌   Vol.24 ( 4 ) 406 - 413  1983.04

  • Efficient Algorithms for Channel Routing

    Takeshi Yoshimura, Ernest S. Kuh

    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems   1 ( 1 ) 25 - 35  1982

     View Summary

    In the layout design of LSI chips, channel routing is one of the key problems. The problem is to route a specified net list between two rows of terminals across a two-layer channel. Nets are routed with horizontal segments on one layer and vertical segments on the other. Connections between two layers are made through via holes. Two new algorithms are proposed. These algorithms merge nets instead of assigning horizontal tracks to individual nets. The algorithms were coded in Fortran and implemented on a VAX 11/780 computer. Experimental results are quite encouraging. Both programs generated optimal solutions in 6 out of 8 cases, using examples in previously published papers. The computation times of the algorithms for a typical channel (300 terminals, 70 nets) are 1.0 and 2.1 s, respectively. © 1982 IEEE

    DOI

  • Some Algorithms for Channel Routing

    T.Yoshimura, E.S.Kuh

    Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS1980)     955 - 955  1980.05

  • Optimizing Water Distribution in Pipe Networks Using Maximal Flow Algorithms

    T.Yoshimura, T.Ohtsuki, Y.Fujisawa

    Proc. of TIMS XXIV International Meeting    1979.08

  • An Algorithm for Designing Multidrop Teleprocessing Networks

    T.Yoshimura

    Proceedings of the 4th IEEE International Conference on Computer Communication (ICCC-78)     487 - 492  1978

  • A Communication Network Design Tool - NETS

    Y.Teshigawara, I.Wakayama, K.Wakabayashi, T.Yoshimura

    Proc. COMPCON-78     158 - 165  1978

  • Sparse Matrix Techniques for the Shortest Path Problem

    S.Goto, T.Ohtuski, T.Yoshimura

    IEEE Transactions on Circuits and Systems   CAS-23 ( 12 ) 752 - 758  1976

  • A Shortest Path Calculation Program based on Code Generation Technique

    S.Goto, T.Ohtsuki, T.Yoshimura

    Proceedings of the 13th Allerton Conference     431 - 438  1975

▼display all

Books and Other Publications

  • 最新VLSIの開発設計とCAD

    大附,後藤, 國尾, 福間, 石川, 吉村他

    ミマツデータシステム  1994.04

Research Projects

  • Study of combinatorial and mathematical programming methods for high level synthesis

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2014.04
    -
    2017.03
     

    Yoshimura Takeshi

     View Summary

    The research on high level synthesis of system LSI were conducted. First, in the scheduling problem, a method based on a combination of mathematical programming and graph theory was proposed for the dynamic power optimization problem. For leakage power optimization problem, a method with the modifications of the above method and additional post-processing were proposed. In both problems, optimal solutions were obtained in most cases. In the port assignment problem, a method to avoid local optimal solutions considering sub-solution space and a method to reduce processing time were proposed. Optimal solutions were obtained for all the evaluation data. In the TSV assignment problem for three-dimensional LSI, a method to reduce the scale of the problem without sacrificing the solution quality was proposed based on a hierarchical design method. The CPU time was reduced to about 1/38 of the conventional methods

  • 組み合わせ論的および数理計画論的高位レベル合成手法の研究

    科学研究費助成事業(早稲田大学)  科学研究費助成事業(基盤研究(C))

    Project Year :

    2014
    -
    2016
     

  • Floorplan-base Design Environment Technologies for Large-Scale System LSIs

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2011.04
    -
    2014.03
     

    YOSHIMURA Takeshi

     View Summary

    Aiming at efficient design environments for the large system LSIs, the research on High-level Synthesis and Floorplanning has been done.First, the research on the methods for minimizing power consumption by optimizing the value of the frequency, the power-supply voltage and the threshold voltage of the operation units has been done and new methods based on graph theoretical approach for the linear programming problem, and flow algorithms were developed. Then, a graph theoretical approach for the port assignment problem which is one of the important problems in high-level synthesis was developed. As for the floorplanning problems, floorplanning algorithms for FPGA and 2D/3D Application Specific Network on Chips were developed

  • Floorplan-base Design Environment Technologies for Large-Scale System LSIs

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2011
    -
    2013
     

    YOSHIMURA Takeshi

     View Summary

    Aiming at efficient design environments for the large system LSIs, the research on High-level Synthesis and Floorplanning has been done.
    First, the research on the methods for minimizing power consumption by optimizing the value of the frequency, the power-supply voltage and the threshold voltage of the operation units has been done and new methods based on graph theoretical approach for the linear programming problem, and flow algorithms were developed. Then, a graph theoretical approach for the port assignment problem which is one of the important problems in high-level synthesis was developed. As for the floorplanning problems, floorplanning algorithms for FPGA and 2D/3D Application Specific Network on Chips were developed.

  • Research on design and implementation of Ultra Large scale LSI

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2008
    -
    2010
     

    GOTO Satoshi, TAKESHI Yoshimura, SHINJI Kimura

     View Summary

    In this research, fundamental technologies have been developed from architecture circuit, device design to package design to implement 100 million Gate LSI within 1/5 development period, 1/10 fabrication cost and 1/10 power consumption compared with conventional SoC or SiP technologies. Particularly, by doing (1) Research on large-scale system design methodologies, (2) Research on large-scale design automation technologies, (3) Research on high level verification technologies, achieved the drastic reduction on design and fabrication cost with realizing ultra low power and huge bandwidth communication.

  • Research on basic technologies for physical design of large scale system LSI

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research

    Project Year :

    2005
    -
    2008
     

    YOSHIMURA Takeshi

  • システムLSIプロトタイピングベース設計システムの研究

    Project Year :

    2002
    -
    2006
     

  • LSIレイアウト設計アルゴリズムの研究

    Project Year :

    2003
    -
     
     

▼display all

 

Internal Special Research Projects

  • 大規模システムLSIフロアプランベース設計基盤技術の研究

    2010  

     View Summary

    本研究ではフロアプランベースの設計基盤技術の確立を目指し,種々の大きさのブロックを指定された領域への配置を決定するフロアプラン手法、高位レベル設計自動化におけるスケジューリング手法、超高速回路の実現にむけたクロックスキューの最適化手法、Design for Manufacturability のための冗長ビア挿入手法に関する研究を行ない、以下の成果を得た。(1)3次元フロアプランアルゴリズム フロアプランは通常、指定された2次元の領域の中に、種々の大きさの機能モジュールを配置する問題であるが、これを3次元に拡張した問題を取り上げ、最適化アルゴリズムを開発した、従来の2次元の手法に比べ、大幅な面積削減を達成し、論文発表を行った。(2) 3次元フロアプランのためのTSV挿入最適化手法 3次元フロアプランでは異なる層の間を連結するThrough-Silicon Via(TSV)が必要になる。ここでは、TSV挿入位置の最適化手法を開発し、国際会議発表を行った。(3)高位レベルスケジューリングにおける周波数最適化手法 高位レベルスケジューリングにおける各コントロールステップの周波数を、与えられた性能を満たす範囲で最適化し、低電力化を行う手法を検討した。そして、最小コストフローアルゴリズムにより近似解を高速に求める手法、動的計画法により最適解を求める手法を開発し、それぞれ国際会議発表を行った。(3) Application Specific NoC 設計手法 フロアプランアルゴリズムを基にして、種々の大きさのコア(機能ブロック)を含むNetwork on Chip(NoC)の設計を行う手法を開発した。ここでは、まずフロアプランアルゴリズムとクラスタリングアルゴリズムを融合してコア、NI(Network Interface)およびSWの配置を決定し、次に整数計画法に基づき、スイッチ間のリンクの設計を行う。本手法は従来手法を大幅に上回る結果を出し、国際会議発表を行った。(4)チップ製造信頼性向上のための冗長ビア挿入手法 遺伝アルゴリズムに基づく冗長ビア挿入手法および最小コストフローアルゴリズムを利用して、挿入される冗長ビアの密度を均一化する手法を提案し、従来手法を上回る結果を得た。前者は論文発表、後者は国際会議発表を行った。

  • 大規模システムLSIフロアプランベース設計基盤技術の研究

    2009  

     View Summary

    本研究ではフロアプランベースの設計基盤技術の確立を目指し,種々の大きさのブロックを指定された領域への配置を決定するフロアプラン手法、高位レベル設計自動化におけるスケジューリング手法、超高速回路の実現にむけたクロックスキューの最適化手法、Design for Manufacturabilityに関して、冗長ビア導入によるチップ製造の信頼性向上に関する研究を行ない、以下の成果を得た。(1)ソフトブロックを含むフロアプラン 形状が可変なブロック(ソフトブロック)の対応として,(a)フロアプラン途中で形状変更を考慮するアルゴリズム,(b)フロアプラン後に、ブロック形状を最適化する手法を提案・評価を行った。実験の結果、(a)の手法により、余裕の少ない指定領域への配置成功率が大幅に向上すること、(b)により既存手法に比べ、配線長が15%前後削減できることを確認した。(2)高位レベルスケジューリングアルゴリズム 高位レベルスケジューリングを直接計算するのではなく、まず、タイミングの自由度から一旦、2部グラフを構築し、それをフローネットワーク上での最大フローを計算することで、より最適なタイミング割当求める方法を提案した。(3) 超高速回路のクロックスキュー最適化 レジスタのクロックドライバへの割り当てを最適化することにより、スキューを最適化する方法を検討し、ラグランジュ緩和法による最適化手法の改良を行った。これにより、解の精度向上、計算時間の短縮を実現できる見通しを得た。(4)チップ製造信頼性向上のための冗長ビア挿入手法 遺伝アルゴリズムに基づく冗長ビア挿入手法を提案した。また、最小コストフローアルゴリズムを利用して、挿入される冗長ビアの密度を均一化する手法を提案した。

  • 大規模システムLSI物理設計基盤技術の研究

    2005  

     View Summary

     本研究では大規模システムLSI物理設計基盤技術の確立を目指し,連結度の強いセル同士をまとめる「クラスタリング」,クラスタレベルでの配置の最適化を行う「フロアプラン」およびセルレベルでの「配置最適化」の各アルゴリズムの研究開発を行っている。 平成17年度はまず3000個程度のクラスタを想定した最適化基盤技術の開発とプロトタイプ作成による評価実験を行った。具体的な研究実績は以下のとおりである。(1)高速クラスタリングアルゴリズムの開発 ・セル連結度の評価関数の提案 ・連結度の強い順にセルをまとめていくための,Heapを利用した高速アルゴリズムの開発を行い,プログラムの作成および標準ベンチマークデータによる評価を行った。実験では20万セルのデータの処理時間は3秒以下と極めて高速であった。(2)大規模フロアプランアルゴリズムの開発: 従来の局所探索法に替え,Simulated Annealing法をベースとしたフロアプランのための最適化アルゴリズムを開発し,企業から提供された実データを用いて評価を行った。その結果,3000ブロックレベルでの収束回数をこれまでの約1/3に減少できることを確認した。また,平行して,特定ブロックの位置を固定するための手法の理論的検討を行った。現在,評価中である。(3)レイアウト配置最適化アルゴリズムの開発。 LU分解法を用いた2次計画最適化手法による配置最適化手法を検討し,プロトタイプ開発した。実験による評価では,従来のCG法,SOR法に比べ,3倍程度の高速化を達成した。(4)その他 関連研究として,システムLSI設計方法論および演算のスケジューリング手法の研究開発を行った。