Updated on 2024/07/17


 
KASAHARA, Hironori
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Professor
Degree
Doctor of Engineering (Waseda University, Electrical Engineering (Computer Systems))

Research Experience

  • 2017.05
    -
    Now

    Engineering Academy of Japan   Member

  • 2017.01
    -
    Now

    IEEE   Fellow

  • 2010.01
    -
    Now

    IEEE Computer Society   Golden Core Member

  • 2004.04
    -
    Now

    Waseda University   Advanced Multicore Processor Research Institute   Director

  • 1997.04
    -
    Now

    Waseda University   Department of Computer Science and Engineering   Professor

  • 2020.06
    -
    2024.06

    Engineering Academy of Japan   Director

  • 2019.05
    -
    2023.05

    COCN (Council on Competitiveness-Nippon)   Board Member

  • 2020.04
    -
    2022.09

    Waseda University   Senior Executive Vice President (Research Promotion)

  • 2019.06
    -
    2021.05

    Japan Universities Association for Computer Education   Standing Director

  • 2018.11
    -
    2020.03

    Waseda University   Senior Executive Vice President (Research and Information System Promotion)

  • 2017.01
    -
    2019.12

    IEEE Computer Society   Strategic Planning Committee Chair

  • 2018.01
    -
    2018.12

    IEEE   Technical Activity Board Member

  • 2018.01
    -
    2018.12

    IEEE Computer Society   Board of Governors Chair

  • 2018.01
    -
    2018.12

    IEEE Computer Society   President

  • 2017
    -
     

    Science Council of Japan   Member

  • 2015
    -
     

    Information Processing Society of Japan   Fellow

  • 2009.01
    -
    2014.12

    IEEE Computer Society   Board of Governors

  • 1988.04
    -
    1997.03

    Waseda University   Department of Electrical, Electronics, and Computer Engineering   Associate Professor

  • 1989.03
    -
    1990.03

    University of Illinois at Urbana-Champaign   Center for Supercomputing Research and Development   Visiting Research Scholar

  • 1986.04
    -
    1988.03

    Waseda University   Department of Electrical Engineering   Assistant Professor

  • 1985.09
    -
    1986.03

    The Japan Society for the Promotion of Science (JSPS)   The First Special Research Fellow (PD)

  • 1985.07
    -
    1985.12

    University of California at Berkeley   Department of Electrical Engineering and Computer Science   Visiting Scholar

  • 1983.04
    -
    1985.03

    Waseda University   Department of Electrical Engineering   Research Associate

  • 2023.01
    -
    Now

    IEEE   Life Fellow

  • 2021.06
    -
    Now

    IEEE   Frances E. Allen Medal Committee

  • 2017.06
    -
    2025.03

    The Okawa Foundation for Information and Telecommunications (The Okawa Foundation)   Councilor

  • 2018.06
    -
    2024.06

    The Okawa Foundation for Information and Telecommunications (The Okawa Foundation)   Okawa Prize Selection Committee Member

  • 2022.03
    -
    2023.04

    The Japan Prize Foundation   The 2023 Japan Prize Selection Committee Selection Subcommittee for the “Electronics, Information, and Communication” field Deputy Chairman

  • 2021.01
    -
    2022.09

    Research Innovation Center, Waseda University   General Manager

  • 2018.11
    -
    2022.09

    Research Organization for Open Innovation Strategy, Waseda University   Chairperson

  • 2018.11
    -
    2022.09

    Waseda Shibuya Senior High School   Representative Director

  • 2018.11
    -
    2022.09

    Waseda Junior and Senior High School   Member of Board of Directors and Councilor

  • 2021.04
    -
    2021.10

    IEEE Computer Society   Election Committee Member

  • 2020.09
    -
    2020.12

    Research Innovation Center, Waseda University   Head of Intellectual Property and Research Collaboration Support Section

  • 2019.06
    -
    2020.12

    Research Innovation Center, Waseda University   Director

  • 2012.01
    -
    2020.09

    IEEE Computer Society   Multicore STC (Special Technical Community) Chair

  • 2019.01
    -
    2019.12

    IEEE Computer Society   Nomination Committee Chair

  • 2019.01
    -
    2019.12

    IEEE Computer Society   Past President

  • 2017.01
    -
    2019.12

    IEEE Computer Society   Executive Committee Member

  • 2018.11
    -
    2019.05

    Research Collaboration and Promotion Center, Waseda University   Director

  • 2018.01
    -
    2018.12

    IEEE Computer Society   Executive Committee Chair

  • 2017.01
    -
    2017.12

    IEEE Computer Society   President Elect

  • 2017
    -
     

    Professional member of the IEEE-Eta Kappa Nu (IEEE-HKN)

  • 2010.04
    -
    2013.03

    Egypt-Japan University of Science and Technology (E-JUST)   Invited Professor

  • 2011.04
    -
    2011.09

    The University of Tokyo   Department of Computer Science   Part-time Lecturer


Education Background

  • 1982.04
    -
    1985.03

    Waseda University   Graduate School of Science and Engineering   Department of Electrical Engineering  

    Doctor of Engineering

  • 1980.04
    -
    1982.03

    Waseda University   Graduate School of Science and Engineering Master Course   Department of Electrical Engineering  

    Master of Engineering

  • 1976.04
    -
    1980.03

    Waseda University   School of Science and Engineering   Department of Electrical Engineering  

    Bachelor of Engineering

Committee Memberships

  • 2024.04
    -
    Now

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  National University Corporation Evaluation Committee Temporary Member

  • 2024.02
    -
    Now

    Japan Science and Technology Agency  Chair, Committee for Doctoral Students Support

  • 2023.06
    -
    Now

    ACM / IEEE  ACM/IEEE Co-General Chair, ISCA2025 (International Symposium on Computer Architecture)

  • 2023.04
    -
    Now

    Japan Science and Technology Agency  External Committee Member, Japan Science and Technology Agency (JST) Self-Assessment Committee Subcommittee

  • 2023.03
    -
    Now

    Japan Science and Technology Agency  Governing Board Member, Japan Science and Technology Agency (JST) Research Accomplishments Development Project University-Launched New Industry Creation Program, Governing Board

  • 2023.03
    -
    Now

    World Economic Forum Impact Circle: Innovation for the Public Sector  Member

  • 2023.01
    -
    Now

    IEEE  Life Fellow

  • 2021.11
    -
    Now

    IEEE Eta Kappa Nu (IEEE-HKN)  Professional member, IEEE-Eta Kappa Nu (IEEE-HKN)

  • 2021.07
    -
    Now

    Japan Science and Technology Agency  University Originated New Industry Creation Program Project Promotion Type SBIR Phase1 Support Committee Chair

  • 2021.06
    -
    Now

    IEEE  Frances E. Allen Medal Committee

  • 2020.06
    -
    Now

    World Economic Forum Expert Network  Member

  • 2020.05
    -
    Now

    Japan Science and Technology Agency  Advisor, Moonshot Research and Development Project Division 3

  • 2018.06
    -
    Now

    The Okawa Foundation for Information and Telecommunications  Committee Members, Selection Committee of the Okawa Prize

  • 2017.10
    -
    Now

    Science Council of Japan  Member, Science Council of Japan (Section III: Physical Sciences and Engineering)

  • 2017.10
    -
    Now

    Science Council of Japan  Informatics Committee, Member of the IT-generated Issue Review Committee

  • 2017.06
    -
    Now

    The Okawa Foundation for Information and Telecommunications  The Okawa Foundation for Information and Telecommunications Board of Councillors

  • 2017.01
    -
    Now

    IEEE  Fellow, IEEE

  • 2013.11
    -
    Now

    Oscar Technology Corporation  Adviser

  • 2020.06
    -
    2024.06

    The Engineering Academy of Japan (EAJ)  Director, The Engineering Academy of Japan Inc. (EAJ)

  • 2023.04
    -
    2024.02

    Japan Science and Technology Agency  Chairman, Next Generation Researcher Challenging Research Program Committee

  • 2019.05
    -
    2023.05

    Council on Competitiveness-Nippon (COCN)  Board member

  • 2018.11
    -
    2023.05

    Life Science Innovation Network Japan  Members, Management Advisory Committee

  • 2022.03
    -
    2023.04

    The Japan Prize Foundation  The 2023 Japan Prize Selection Committee Selection Subcommittee for the “Electronics, Information, and Communication” field Deputy Chairman

  • 2021.10
    -
    2023.03

    MEXT Research Infrastructure Committee  Advisor

  • 2021.05
    -
    2023.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Member of the Expert Council on the Appropriate Management of Public Research Funds

  • 2020.10
    -
    2023.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Job-type Research Internship Promotion Committee

  • 2019.06
    -
    2023.03

    Circular Economy Organization (CEO)  Advisory Board

  • 2020.04
    -
    2022.09

    Japan Science and Technology Agency  University Originated New Industry Creation Program (W-SPRING) Integrated Leader

  • 2018.11
    -
    2022.09

    Waseda Junior and Senior High School  Member of Board of Directors and Councilor

  • 2018.11
    -
    2022.09

    Waseda Shibuya Senior High School in Singapore  Representative Director

  • 2021.07
    -
    2022.03

    Japan Bioindustry Association  Member of Greater Tokyo Biocommunity Council

  • 2021.01
    -
    2021.12

    IEEE Computer Society  Past President, IEEE Computer Society

  • 2021.01
    -
    2021.12

    IEEE Computer Society  Chair of Board of Governors, IEEE Computer Society

  • 2021.01
    -
    2021.12

    IEEE Computer Society  Chair of Executive Committee, IEEE Computer Society

  • 2021.04
    -
    2021.11

    ACM / IEEE  ACM/IEEE Co-Chair, SC2021 Workshop on Programming Environments for Heterogeneous Computing (PEHC)

  • 2021.02
    -
    2021.11

    ACM / IEEE  ACM/IEEE Committee Member, SC'21 Invited Speakers Committee

  • 2021.05
    -
    2021.11

    IEEE Computer Society  Election Committee Member

  • 2019.05
    -
    2021.05

    Japan Universities Association for Computer Education  Managing Director

  • 2019.06
    -
    2021.02

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Expert Member of the Science and Technology Council

  • 2020.08
    -
    2020.12

    IEEE Computer Society  Steering Committee Chair, IEEE InTech Forum: A Forum on the Response and Resiliency to COVID-19

  • 2012.01
    -
    2020.12

    IEEE Computer Society  Chair, IEEE Computer Society Special Technical Community on Multicore

  • 2020.07
    -
     

    Other Society  Chief Digital & Learning Officer, World Economic Forum: The Reimagining Learning for Higher Education Committee

  • 2019.07
    -
    2020.06

    Inamori Foundation  Chair, Inamori Foundation Kyoto Prize Selection Committee Advanced Technology Department

  • 2017.01
    -
    2019.12

    IEEE Computer Society  Chair of Strategic Planning (SP9) Committee

  • 2019.01
    -
    2019.08

    IEEE Computer Society  Chair of Nomination Committee

  • 2018.08
    -
    2019.06

    IEEE  Co-Chair of Future of Computing, IEEE International Conference on Cloud Engineering (IC2E 2019)

  • 2018.01
    -
    2018.12

    IEEE  Technical Activity Board (TAB)

  • 2018.01
    -
    2018.12

    IEEE Computer Society  President, IEEE Computer Society

  • 2017.08
    -
    2018.03

    The Engineering Academy of Japan (EAJ)  Member, The Engineering Academy of Japan Inc. (EAJ)

  • 2016.04
    -
    2018.03

    New Energy and Industrial Technology Development Organization (NEDO)  Peer Reviewer for Prior Evaluation

  • 2010.04
    -
    2018.03

    Japan Science and Technology Agency  JST CREST "Creation of System Software Technology to Support Post-Peta Scale High Performance Computation" Evaluation Committee Member

  • 2017.11
    -
    2017.12

    Other Society  Steering Committee, The Ivannikov ISPRAS Open Conference, Institute for System Programming of the Russian Academy of Sciences

  • 2017.01
    -
    2017.12

    IEEE Computer Society  Chair of Planning Committee

  • 2017.01
    -
    2017.12

    IEEE Computer Society  Chair of Constitution & Bylaws Committee

  • 2017.01
    -
    2017.12

    IEEE Computer Society  President Elect

  • 2016.04
    -
    2017.03

    Information Processing Society of Japan  Representative Member

  • 2016.03
    -
    2017.03

    The Japan Prize Foundation  Members, Fields Selection Committee for the 2017 Japan Prize

  • 2007.03
    -
    2017.02

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Sub Committee, Evaluation WG on Concept Design of Next Generation Supercomputer

  • 2016.04
    -
    2017.02

    ACM  Program Committee, PPOPP 2017, the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Austin, Texas, USA

  • 2011.02
    -
    2017.01

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Information Science and Technology Committee

  • 2015.12
    -
    2016.11

    ACM / IEEE  Program Committee, SC16, IEEE ACM International Conference for High Performance Computing, Networking, Storage and Analysis, Salt Palace Convention Center, Salt Lake City, Utah, USA

  • 2016.04
    -
    2016.09

    Other Society  Program Committee, The 29th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2016), Rochester NY, USA

  • 2016.02
    -
    2016.03

    RIKEN  Project Leader Research Accomplishment Evaluation Committee

  • 2015.04
    -
    2015.09

    Other Society  Program Committee, The 28th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2015), Raleigh, NC, USA

  • 2015.06
    -
     

    Information Processing Society of Japan  Fellow

  • 2014.09
    -
    2015.06

    Information Processing Society of Japan  Senior Member

  • 2010.08
    -
    2015.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  High Performance Computing Infrastructure Project Promotion Committee

  • 2014.01
    -
    2014.12

    IEEE Ad Hoc Committee on Serving Individuals in Industry  Committee Member of IEEE Ad Hoc on Serving Individuals in Industry

  • 2014.01
    -
    2014.12

    IEEE Computer Society  Member of Constitution & Bylaws Committees

  • 2014.01
    -
    2014.12

    IEEE Computer Society  Member of Nomination Committees

  • 2009.01
    -
    2014.12

    IEEE Computer Society  Board of Governors, Computer Society

  • 2008.04
    -
    2014.09

    Cabinet Office  Expert Member, Exploratory Committee on Office for Government Procurement Challenge System

  • 2014.04
    -
    2014.09

    Other Society  Program Committee, The 27th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2014), Intel Corporation, Hillsboro, OR, USA

  • 2013.10
    -
    2014.05

    Japan Electronics and Information Technology Industries Association (JEITA)  Chair, PC Power Consumption Measurement Method JIS Drafting Committee

  • 2014.01
    -
    2014.03

    Japan Science and Technology Agency  Follow-up Evaluation Committee for JST CREST (Dependable OS for Embedded Systems)

  • 2009.04
    -
    2014.03

    New Energy and Industrial Technology Development Organization (NEDO)  Technology Committee

  • 2006.01
    -
    2014.03

    Japan Science and Technology Agency  JST CREST "Dependable Operating System for Embedded Systems Targeting Practicalization" Evaluation Committee Member

  • 2013.01
    -
    2013.12

    IEEE  The 2013 Nominations Committee

  • 2013.04
    -
    2013.09

    Other Society  Program Committee, The 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2013), Qualcomm Research Silicon Valley, Santa Clara, CA, USA

  • 2007.09
    -
    2013.05

    Japan Electronics and Information Technology Industries Association (JEITA)  Member, Evaluation Committee for IT & Electronics Human Resource Development

  • 2013.01
    -
    2013.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Supercomputer "K" Post Development Evaluation Committee

  • 2011.12
    -
    2013.03

    Japan Atomic Energy Agency  Research Evaluation Committee for Promotion of Computational Science and Engineering on Atomic Energy Fundamental Engineering

  • 2001.03
    -
    2013.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  National Institute of Science and Technology Policy Science and Technology Specialists Network Committee

  • 2011.12
    -
    2012.11

    Japan Society for the Promotion of Science  Scientific Research Fund Sub Committee

  • 2012.11
     
     

    IEEE  Program Committee, LASCCDCN2012, 2012 Latin America Symposium on Cloud Computing Datacenter and Networking, Mexico City, MEXICO

  • 2012.04
    -
    2012.09

    Other Society  General Chair, The 25th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2012), Green Computing Systems R&D Center, Waseda University, Tokyo, Japan

  • 2010.04
    -
    2012.06

    RIKEN  Next Generation Super Computer Technology Advisory Committee

  • 2012.06
     
     

    Other Society  Program Committee, 2012 First Asia-Pacific Programming Languages and Compilers Workshop (APPLC 2012), Beijing, China

  • 2012.06
     
     

    IEEE  Program Committee, 11th IEEE/ACM International Conference on Ubiquitous Computing and Communications (IUCC 2012), Liverpool, UK

  • 2011.04
    -
    2012.02

    ACM  Program Committee, PPOPP 2012, The 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New Orleans, LA, USA

  • 2010.12
    -
    2011.11

    Japan Society for the Promotion of Science  Scientific Research Fund Sub Committee

  • 2011.10
     
     

    IEEE  Program Committee, The Twentieth International Conference on Parallel Architectures and Compilation Techniques (PACT), Galveston Island, Texas, USA

  • 2011.04
    -
    2011.09

    Other Society  Program Committee, The 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC2011), Colorado State University, Fort Collins, Colorado, USA

  • 2011.04
    -
    2011.09

    The University of Tokyo Department of Computer Science  Part-time Lecturer

  • 2011.03
    -
    2011.09

    Other Society  Program Committee, ICPP-EMS 2011 (The 2011 International Workshop on Embedded Multicore Systems), Taipei, Taiwan

  • 2010.09
    -
    2011.08

    Egypt-Japan University of Science and Technology (E-JUST)  Visiting Professor

  • 2011.05
    -
    2011.07

    Japan Atomic Energy Agency  Research Evaluation Committee on Atomic Energy Fundamental Engineering

  • 2011.06
     
     

    IEEE  Program Committee, The 10th International Symposium on Parallel and Distributed Computing (ISPDC 2011), The Technical University of Cluj-Napoca, Romania

  • 2011.03
    -
    2011.05

    Japan Science and Technology Agency  Evaluation Committee for JST CREST (Dependable OS for Embedded Systems)

  • 2011.05
     
     

    IEEE  SYSTOR 2011 Program Committee

  • 2011.04
     
     

    IEEE  Program Committee, International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems, Kohala Coast, Hawaii Hapuna Beach Prince Hotel

  • 2010.12
    -
    2011.03

    Japan Atomic Energy Agency  Atomic Energy Code Research Committee

  • 2009.06
    -
    2011.03

    Japan Atomic Energy Agency  Steering Committee Member, Supercomputing in Nuclear Applications SNA2010+MC2010, Tokyo, Japan

  • 2011.02
    -
     

    New Energy and Industrial Technology Development Organization (NEDO)  Chair, Technical Committee of Green Network Systems Technology Research and Development Project (Green IT Project)

  • 2009.02
    -
    2011.01

    Other Society  Editorial Board, The Encyclopedia of Parallel Computing (Springer)

  • 2007.01
    -
    2010.12

    IEEE  Member, IEEE Japan Council Long Range Strategy Committee

  • 2010.10
     
     

    Other Society  Organizing Committee, The Joint International Conference of the 7th Supercomputing in Nuclear Application and the 3rd Monte Carlo (SNA-MC2010), Tokyo, Japan

  • 2010.04
    -
    2010.10

    Other Society  Program Committee, The 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC2010), Rice University, Houston, Texas, USA

  • 2009.12
    -
    2010.06

    ACM  Program Committee, ICS'10, 24th ACM International Conference on Supercomputing, Epochal Tsukuba, Tsukuba, Japan

  • 2010.03
    -
     

    Ministry of Economy, Trade and Industry (METI)  Evaluation Committee IT Practical Use for Asia Intelligent Economic Evolution

  • 2009.09
    -
    2010.03

    RIKEN  Next Generation Super Computer Technology Advisory Committee

  • 2009.09
    -
    2010.03

    New Energy and Industrial Technology Development Organization (NEDO)  Evaluation Committee 2009, Energy Saving Innovative Development Project

  • 2009.04
    -
    2010.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Committee Member of Next Generation Supercomputer Project Intermediate Evaluation

  • 2008.07
    -
    2010.03

    New Energy and Industrial Technology Development Organization (NEDO)  Application Evaluation Committee for Green Network Systems Technology Research and Development Project (Green IT Project)

  • 2008.07
    -
    2010.03

    New Energy and Industrial Technology Development Organization (NEDO)  Application Evaluation Committee for 2008 Strategic Development of Efficient Energy Use Technology

  • 2006.11
    -
    2010.03

    New Energy and Industrial Technology Development Organization (NEDO)  Research Committee on Electronics and Information Technological Strategy (Member of Exploratory WG on Field-crossing Technological Strategy)

  • 2006.01
    -
    2010.03

    Japan Atomic Energy Agency  Atomic Energy Code Research Committee

  • 2003.06
    -
    2010.03

    New Energy and Industrial Technology Development Organization (NEDO)  Exploratory WG in Deliberation Committee on Electronics and Information Technological Strategy

  • 2003.04
    -
    2010.03

    New Energy and Industrial Technology Development Organization (NEDO)  Peer Reviewer for Prior Evaluation

  • 2000.04
    -
    2010.03

    Japan Atomic Energy Agency  Research Evaluation Committee (Committee for Promotion of Computational Science and Engineering)

  • 1997.05
    -
    2010.03

    Japan Atomic Energy Agency  Research Evaluation Committee (Information Technology Committee)

  • 2010.03
     
     

    New Energy and Industrial Technology Development Organization (NEDO)  Electronics and Information Technology Roadmap (Computer Technology Strategy WG) Committee Chair

  • 2009.10
    -
    2010.03

    ACM  Program Committee, 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10), Mar. 13-17, 2010, Pittsburgh, PA, USA

  • 2009.07
    -
    2009.12

    IBM  Review Committee of 23rd IBM Japan Science Prize in Computer Science

  • 2009.07
    -
    2009.12

    Ministry of Economy, Trade and Industry (METI)  Evaluation Committee of Technology Development and Demonstration Project for Reliable Data Center

  • 2009.12
    -
     

    IEEE  Program Committee, The Fifteenth International Conference on Parallel and Distributed Systems (ICPADS'09), Shenzhen, China

  • 2009.08
    -
    2009.12

    IEEE  Program Committee, The 7th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC-09), Vancouver, Canada

  • 2009.01
    -
    2009.12

    IEEE  The 2009 Nominations Committee

  • 2009.04
    -
    2009.10

    Other Society  Program Committee, The 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC 2009), University of Delaware, Newark, Delaware, USA

  • 2009.02
    -
    2009.09

    IEEE  Program Committee, The 10th IEEE International Conference on High Performance Computing and Communications (HPCC-08), Dalian, China

  • 2009.08
    -
    2009.09

    New Energy and Industrial Technology Development Organization (NEDO)  2009 Evaluation Committee of Energy Saving Innovative Development Project

  • 2009.01
    -
    2009.06

    IEEE  Program Committee, The 11th IEEE International Conference on High Performance Computing and Communications (HPCC-09), Seoul, Korea

  • 2009.01
    -
    2009.06

    IEEE  Program Committee, 8th International Symposium on Parallel and Distributed Computing (ISPDC'2009), Lisbon, Portugal

  • 2006.11
    -
    2009.05

    New Energy and Industrial Technology Development Organization (NEDO)  Chair, Computer Strategy Investigation WG in Electronics and Information Technology Strategy Investigation Committee

  • 2006.06
    -
    2009.05

    Information Processing Society of Japan  Senior Reviewer, Journal of IPSJ

  • 2008.11
    -
    2009.03

    New Energy and Industrial Technology Development Organization (NEDO)  Review Committee for Low Power Consumption Architecture Considering Future Evolution

  • 2008.06
    -
    2009.03

    Japan Agency for Marine-Earth Science and Technology (JAMSTEC)  Earth Simulator Advisory Committee

  • 2008.01
    -
    2009.03

    Cabinet Office  Member, Security & Software Investigation Committee for Information & Communication Field Promotion Strategy, Council for Science and Technology Policy (CSTP) Expert Panel on Basic Policy

  • 2008.01
    -
    2009.03

    Cabinet Office  Member, R&D Infrastructure Investigation Committee for Information & Communication Field Promotion Strategy, Council for Science and Technology Policy (CSTP) Expert Panel on Basic Policy

  • 2004.07
    -
    2009.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Steering Committee Member, Promotion Budget for Science and Technology 'Improvement of Basis of Distributed Research Data Sharing for Solution Research of Important Issues' Research Steering Committee

  • 2009.03
     
     

    New Energy and Industrial Technology Development Organization (NEDO)  Hearing Member for Roadmap Development of Energy Saving Technologies in IT and Electronics

  • 2008.10
    -
    2009.02

    ACM  Program Committee, PPoPP2009, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, North Carolina, USA

  • 2007.03
    -
    2009.01

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Sub Committee, Evaluation WG on Concept Design of Next Generation Supercomputer

  • 2008.07
    -
    2008.12

    Asahi Shimbun  Judging Committee, Asahi Shimbun JSEC2008 'Science & Technology Challenge for High School Students'

  • 2008.09
    -
     

    Cabinet Office  External Hearing Committee Resource Dispatch Policy for 2009 Science and Technology Budget Requests in Information and Communication

  • 2008.01
    -
    2008.09

    IEEE  Program Committee, ICPP-2008, 2008 International Conference on Parallel Processing, Portland, Oregon

  • 2008.03
    -
    2008.06

    IEEE  Program Committee, Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures (PESPMA), Co-located with ISCA 2008, Beijing, China (IEEE, ACM)

  • 2008.02
    -
    2008.06

    IEEE  Program Committee, HIPS 2008, 13th International Workshop on High-Level Parallel Programming Models and Supportive Environments, Miami, Florida

  • 2008.01
    -
    2008.06

    ACM  Program Committee, ICS'08, 22nd ACM International Conference on Supercomputing, Island of Kos-Aegean Sea, Greece

  • 2008.02
    -
    2008.03

    New Energy and Industrial Technology Development Organization (NEDO)  Chair, Investigation Committee for Technological Strategy of Construction of Power-saving Information Space and Next-generation Power-saving Devices

  • 2005.07
    -
    2008.03

    New Energy and Industrial Technology Development Organization (NEDO)  Chair of Implemented Architecture Investigation Committee, "Research and Development of Multicore Technology for Real Time Consumer Electronics Project"

  • 2005.07
    -
    2008.03

    New Energy and Industrial Technology Development Organization (NEDO)  Chair of Multicore, Architecture and API Investigation Committee, "Research and Development of Multicore Technology for Real Time Consumer Electronics Project"

  • 2005.07
    -
    2008.03

    New Energy and Industrial Technology Development Organization (NEDO)  Chair of R&D Steering Committee, "Research and Development of Multicore Technology for Real Time Consumer Electronics Project"

  • 2005.07
    -
    2008.03

    New Energy and Industrial Technology Development Organization (NEDO)  Chair of Integrated R&D Steering Committee, "Research and Development of Multicore Technology for Real Time Consumer Electronics Project"

  • 2005.07
    -
    2008.03

    New Energy and Industrial Technology Development Organization (NEDO)  Project Leader, "Research and Development of Multicore Technology for Real Time Consumer Electronics Project"

  • 2004.04
    -
    2008.03

    Information Processing Society of Japan  Committee Member, Steering Committee of SIG. on Computer Architecture

  • 2005.12
    -
    2007.12

    Cabinet Office  Member, Security & Software WG for Information & Communication Field Promotion Strategy, Council for Science and Technology Policy (CSTP) Expert Panel on Basic Policy

  • 2005.12
    -
    2007.12

    Cabinet Office  Member, R&D Infrastructure WG for Information & Communication Field Promotion Strategy, Council for Science and Technology Policy (CSTP) Expert Panel on Basic Policy

  • 2007.08
    -
    2007.12

    Asahi Shimbun  Judging Committee, Asahi Shimbun JSEC2007 'Science & Technology Challenge for High School Students'

  • 2005.01
    -
    2007.12

    IEEE  Chair, IEEE Computer Society Japan Chapter

  • 2005.01
    -
    2007.12

    IEEE  Board Member, IEEE Tokyo Section

  • 2007.02
    -
    2007.11

    IEEE  Program Committee, SC 2007, The 2007 International Conference for High Performance Computing and Communications, Reno, Nevada (IEEE, ACM)

  • 2007.01
    -
    2007.07

    IEEE  Program Committee, ISPDC 2007, 6th International Symposium on Parallel and Distributed Computing, Hagenberg, Austria

  • 2006.07
    -
    2007.06

    ACM  Program Committee, LCTES'07, ACM SIGPLAN/SIGBED 2007 Conference on Languages, Compilers, and Tools for Embedded Systems, San Diego, California

  • 2007.01
    -
    2007.06

    ACM  Program Committee, ICS'07, 21st ACM International Conference on Supercomputing, Seattle, USA

  • 2007.05
    -
     

    New Energy and Industrial Technology Development Organization (NEDO)  Supervising Committee, Roadmap of Next-generation Power-saving Devices

  • 2006.08
    -
    2007.03

    New Energy and Industrial Technology Development Organization (NEDO)  Member, Evaluation Committee for "Semiconductor Application Chip Project" (Semiconductor Chip for High-speed, High-reliable Server)

  • 2006.07
    -
    2007.03

    Cabinet Office  Strategy Committee on "Development and Utilization of cutting-edge high performance general-purpose supercomputer Project"

  • 2006.06
    -
    2007.03

    Cabinet Office  Panelist on the Promotion of Collaboration among Business, Academia and Government and Human Resource Development for Creation of Innovations, 5th Conference for the Promotion of Collaboration Among Business, Academia, and Government

  • 2001.06
    -
    2007.03

    Japan Science and Technology Agency  Research Area Adviser, PRESTO; Precursory Research for Embryonic Science and Technology

  • 2000.07
    -
    2007.03

    Japan Atomic Energy Agency  Atomic Energy Code Research Expert Committee

  • 2006.11
    -
    2007.03

    IEEE  Program Committee, IPDPS 2007, 21st IEEE International Parallel & Distributed Processing Symposium, Long Beach, California USA, March 26-30, 2007

  • 2005.04
    -
    2007.03

    Information Processing Society of Japan  Committee Member, Steering Committee of EMBedded Systems

  • 2006.11
    -
    2007.02

    Ministry of Economy, Trade and Industry (METI)  Member, Evaluation Committee for Business Grid Computing Project

  • 2006.08
    -
    2006.12

    Asahi Shimbun  Judging Committee, Asahi Shimbun JSEC2006 'Science & Technology Challenge for High School Students'

  • 2005.12
    -
    2006.10

    New Energy and Industrial Technology Development Organization (NEDO)  Research Committee on Electronics and Information Technological Strategy (Member of Exploratory WG on Field-crossing Technological Strategy)

  • 2005.12
    -
    2006.10

    New Energy and Industrial Technology Development Organization (NEDO)  Chair, Computer Strategy Investigation WG in Electronics and Information Technology Strategy Investigation Committee

  • 2006.09
     
     

    IEEE  Program Committee, PARELEC2006, International Conference on Parallel Computing in Electrical Engineering

  • 2006.07
     
     

    IEEE  Publication Chair, Twelfth International Conference on Parallel and Distributed Systems (ICPADS 2006), Minneapolis, USA

  • 2005.11
    -
    2006.03

    Ministry of Internal Affairs and Communications  Evaluation Committee on Promotion Systems of Strategic Information Network Research and Development

  • 2004.06
    -
    2006.03

    New Energy and Industrial Technology Development Organization (NEDO)  Technology Committee

  • 2004.06
    -
    2006.03

    New Energy and Industrial Technology Development Organization (NEDO)  Member, Evaluation Committee for "Development of Low Power Superconductive Network Devices"

  • 2005.08
    -
    2005.11

    Asahi Shimbun  Judging Committee, Asahi Shimbun JSEC2005 'Science & Technology Challenge for High School Students'

  • 2004.06
    -
    2005.06

    New Energy and Industrial Technology Development Organization (NEDO)  Research Committee on Electronics and Information Technological Strategy (Member of Exploratory WG on Field-crossing Technological Strategy)

  • 2004.06
    -
    2005.06

    New Energy and Industrial Technology Development Organization (NEDO)  Chair, Computer Strategy Investigation WG in Electronics and Information Technology Strategy Investigation Committee

  • 2005.01
    -
    2005.06

    ACM  Program Committee, ICS'05, 19th ACM International Conference on Supercomputing, Massachusetts, U.S.A

  • 2004.12
    -
    2005.03

    International Superconductivity Technology Center (ISTEC)  Member, Supercomputing Using SFQ Device Investigation Committee

  • 2004.12
    -
    2005.03

    Ministry of Internal Affairs and Communications  Evaluation Committee on Promotion Systems of Strategic Information Network Research and Development

  • 2003.06
    -
    2005.03

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Steering Committee Member, Promotion Budget for Science and Technology 'Research of Common Infrastructure for Parallelizing Compiler' Research Steering Committee

  • 2004.04
    -
    2005.03

    Information Processing Society of Japan  Representative Member, FY2004

  • 2005.02
     
     

    Other Society  Program Committee, PDCN2005: the IASTED International Conference on Parallel and Distributed Computing and Networks, Innsbruck, Austria

  • 2004.01
    -
    2004.12

    Information Processing Society of Japan  Editorial Board, IPSJ Transactions on Advanced Computing Systems

  • 2004.08
     
     

    Other Society  Program Committee, ICPP04 (The 2004 International Conference on Parallel Processing), Montreal, Quebec, Canada

  • 2004.07
     
     

    Other Society  Program Committee, HPC Asia 2004, 7th International Conference on High Performance Computing and Grid in Asia Pacific Region, Omiya Sonic City, Tokyo Area, Japan

  • 2003.06
    -
    2004.05

    Information Processing Society of Japan  Steering Committee, SACSIS2004

  • 2003.12
    -
    2004.04

    Other Society  Program Committee, HIPS04 (9th International Workshop on High-Level Parallel Programming Models and Supportive Environments), Santa Fe, New Mexico, USA

  • 2003.04
    -
    2004.03

    Information Processing Society of Japan  Representative Member, FY2003

  • 2000.04
    -
    2004.03

    Information Processing Society of Japan  Computer Science Field Committee

  • 2000.04
    -
    2004.03

    Information Processing Society of Japan  Chair, Steering Committee of SIG. on Computer Architecture

  • 2004.02
     
     

    Other Society  Program Committee, PDCN2005: the IASTED International Conference on Parallel and Distributed Computing and Networks, Innsbruck, Austria

  • 2002.01
    -
    2003.12

    Information Processing Society of Japan  Editorial Board, IPSJ Transactions on High Performance Computing Systems

  • 2003.10
     
     

    Other Society  Program Committee, ICPP2003, International Conference on Parallel Processing 2003

  • 2003.10
     
     

    Other Society  Program Committee, ISHPC-V, The 5th International Symposium on High Performance Computing

  • 2003.01
    -
    2003.09

    Japan Society for the Promotion of Science  Scientific Research Fund Sub Committee

  • 2003.01
    -
    2003.06

    ACM  Program Committee, ICS'03, 17th ACM International Conference on Supercomputing, San Francisco, U.S.A

  • 2002.06
    -
    2003.05

    Information Processing Society of Japan  Steering Committee, SACSIS2003

  • 2003.04
    -
     

    Ministry of Economy, Trade and Industry (METI)  Invited Speaker, Minister's Secretariat Sig. on R&D Human Resource for Innovation Systems

  • 2002.12
    -
    2003.04

    Other Society  Program Committee, HIPS03, 8th International Workshop on High-Level Parallel Programming Models and Supportive Environments, held in conjunction with IPDPS2003, Nice, France

  • 1995.05
    -
    2003.04

    The Institute of Electronics, Information and Communication Engineers  Steering Committee, SIG. on Computer Systems

  • 2002.05
    -
    2003.03

    Japan Atomic Energy Research Institute  ITBL Infrastructure Software Evaluation Committee

  • 2001.08
    -
    2003.03

    Ministry of Economy, Trade and Industry (METI)  Chair, (NEDO) Advanced Parallelizing Compiler Technology International Coordination Committee

  • 2001.04
    -
    2003.03

    STARC  SoC DesignText Development Committee

  • 2000.10
    -
    2003.03

    Ministry of Economy, Trade and Industry (METI)  Chair, (NEDO) Advanced Parallelizing Compiler Project Steering Committee

  • 2000.10
    -
    2003.03

    Ministry of Economy, Trade and Industry (METI)  Chair, (NEDO) Advanced Parallelizing Compiler Technology Research Workshop

  • 2000.10
    -
    2003.03

    Ministry of Economy, Trade and Industry (METI)  Chair, (NEDO) Advanced Parallelizing Compiler Intellectual Property Committee

  • 2000.10
    -
    2003.03

    Ministry of Economy, Trade and Industry (METI)  Chair, (NEDO) Advanced Parallelizing Compiler Technology Committee

  • 2000.06
    -
    2003.03

    Ministry of Economy, Trade and Industry (METI)  Project Leader, (NEDO) Millennium Project 'Advanced Parallelizing Compiler'

  • 2002.10
    -
    2003.03

    The Institute of Electronics, Information and Communication Engineers  Editorial Board, IEICE TRANSACTIONS Special Issue on Computer System Development

  • 2002.04
    -
    2003.03

    Information Processing Society of Japan  Representative Member, FY2002

  • 2001.10
    -
    2002.11

    Japan Information Processing Development Center Research Institute for Advanced Information Technology (AITEC)  High-end Computing Technology Research Working Group

  • 2002.09
     
     

    IEEE  Program Committee, PARELEC2002, International Conference on Parallel Computing in Electrical Engineering, Warsaw, Poland

  • 2002.09
    -
     

    Ministry of Economy, Trade and Industry (METI)  Observer, Workshop for Happiness and Independence of Children

  • 2002.08
     
     

    Other Society  Program Committee on Programming Methodologies & Tools, ICPP-2002, International Conference on Parallel Processing, Vancouver, British Columbia, Canada

  • 2002.08
     
     

    Other Society  Program Committee on Compilers and Languages, ICPP-2002, International Conference on Parallel Processing, Vancouver, British Columbia, Canada

  • 2002.01
    -
    2002.06

    ACM  Program Committee, ICS'02, 16th ACM International Conference on Supercomputing, N.Y., U.S.A

  • 2001.06
    -
    2002.06

    Information Processing Society of Japan  Steering Committee, JSPP2002

  • 2002.05
     
     

    Other Society  Program Committee, ISHPC-Ⅳ, The 4th International Symposium on High Performance Computing

  • 2002.02
    -
    2002.05

    Other Society  Program Committee, WOMPEI 2002, International Workshop on OpenMP : Experiences and Implementations

  • 2002.04
     
     

    Other Society  Program Committee, HIPS02, The 7th International Workshop on High-Level Parallel Programming Models and Supportive Environments, held in conjunction with IPDPS2002, Ft.Lauderdale, U.S.A.

  • 2001.12
    -
    2002.03

    Japan Information Processing Development Center  Next Generation Electronics Information Infrastructure Human Resource Investigation Committee

  • 2001.12
    -
    2002.03

    Japan Information Processing Development Center  Next Generation IT Talented Persons Investigation Committee

  • 2000.04
    -
    2002.03

    Japan Atomic Energy Research Institute  Part-Time Invited Researcher(Research and Development of Parallel Processing Basic

  • 2001.04
    -
    2002.03

    Information Processing Society of Japan  Representative Member, FY2001

  • 2002.02
     
     

    Japan Atomic Energy Research Institute  Research accomplishment evaluation committee for research staff employment

  • 2001.06
     
     

    Other Society  Organizing Committee, PDPTA'01, International Conference on Parallel Processing and Distributed Processing Techniques and Applications, Las Vegas, Nevada, U.S.A.

  • 2000.06
    -
    2001.06

    Information Processing Society of Japan  Steering Committee, JSPP2001

  • 2000.04
    -
    2001.05

    Information Processing Society of Japan  Chair of Editorial Board: Guest Editor, Journal of IPSJ Special Issue on Parallel Processing

  • 2000.10
    -
    2001.03

    The University of Tokyo  Doctoral Dissertation Evaluation Committee

  • 2000.10
    -
    2001.03

    Japan Information Processing Development Center Research Institute for Advanced Information Technology (AITEC)  Chair, Next Generation Electronic Information Basis Technology Investigation Committee

  • 1999.10
    -
    2001.03

    Japan Information Processing Development Center Research Institute for Advanced Information Technology (AITEC)  High End Computing and Communication Committee (HECC)

  • 1999.06
    -
    2001.03

    Japan Atomic Energy Research Institute  Steering Committee, Supercomputing International Conference on Atomic Energy

  • 1997.04
    -
    2001.03

    Tokyo Electric Power Company  Academic Evaluation Committee

  • 1996.09
    -
    2001.03

    Ministry of Economy, Trade and Industry (METI)  Chair, Industry, Academia & Government Information Policy Forum Investigation WG4 (Information/System [HPC])

  • 2001.03
    -
     

    Kyoto University  Data Processing Center 66th Research Seminar Invited Speaker

  • 2000.06
    -
    2001.03

    Information Processing Society of Japan  Copyright Investigation Committee

  • 2000.10
     
     

    Other Society  Program Co-Chair, ISHPC'2000, International Symposium on High Performance Computing

  • 2000.05
    -
    2000.10

    Other Society  Steering Committee, ISHPC2000, International Workshop on OpenMP: Experiences and Implementations

  • 2000.09
     
     

    Other Society  Steering Committee, JAERI Nuclear Supercomputing 2000

  • 2000.08
     
     

    Other Society  Program Committee, ICPP2000, International Conference on Parallel Processing 2000, The Westin Harbour Castle, Toronto, Canada

  • 2000.05
     
     

    Other Society  Program Committee, HPC-Asia 2000, Beijing, China

  • 1999.06
    -
    2000.05

    Information Processing Society of Japan  Program Chair, JSPP2000

  • 2000.04
    -
     

    Other Society  Editorial Advisory Board, Scientific Programming, John Wiley & Sons, Inc.

  • 1999.12
    -
    2000.03

    Japan Information Processing Development Center Research Institute for Advanced Information Technology (AITEC)  Super Advanced Electronic Basis Technology Investigation Committee

  • 1999.07
    -
    2000.03

    Japan Information Processing Development Center  Chair, Investigation Research Committee of Super Compiler Technology Parallelizing Compiler WG

  • 1999.07
    -
    2000.03

    Japan Information Processing Development Center  Investigation Research Committee of Super Compiler Technology

  • 1999.04
    -
    2000.03

    Japan Atomic Energy Research Institute  Research Evaluation Committee on Computer Science Technology Sub Committee

  • 1997.04
    -
    2000.03

    Japan Atomic Energy Research Institute  The First Class Invited Researcher

  • 1996.04
    -
    2000.03

    Information Processing Society of Japan  Steering Committee, SIG. on Computer Architecture

  • 1997.11
    -
    1999.11

    Other Society  Program Committee, ISHPC'97, Institute of Systems & Information Technologies/ KYUSHU, Fukuoka

  • 1999.09
     
     

    Other Society  Program Committee, ICPP'99, Aizu Univ., Fukushima, Japan

  • 1999.06
    -
    1999.07

    Other Society  Program Committee, PDPTA'99, Las Vegas, Nevada, U.S.A.

  • 1999.03
    -
    1999.06

    ACM  Program Committee, 13th ACM ICS Workshop on Scheduling Algorithms for Parallel/Distributed Computing -From Theory to Practice-, Rhodes, Greece

  • 1999.01
    -
    1999.06

    ACM  Program Committee, ICS'99, 13th ACM International Conference on Supercomputing, Rhodes, Greece

  • 1999.05
     
     

    Other Society  Program Committee, ISHPC'99, Keihan International Plaza, Kyoto, Japan

  • 1999.02
    -
    1999.03

    Japan Atomic Energy Research Institute  Research Results Evaluation Committee on Computational Software Sub Committee

  • 1999.01
    -
    1999.03

    Japan Atomic Energy Research Institute  Research Achievement Evaluation Committee for Research Staff Employment

  • 1999.01
    -
    1999.03

    Japan Information Processing Development Center  Chair, Investigation Research Committee of Super Compiler Technology Parallelizing Compiler WG

  • 1999.01
    -
    1999.03

    Japan Information Processing Development Center  Investigation Research Committee of Super Compiler Technology

  • 1998.10
    -
    1999.03

    Japan Atomic Energy Research Institute  Research Achievement Evaluation Committee for Ph.D Researchers (Computer Science Technology Promotion Center)

  • 1996.10
    -
    1999.03

    Japan Information Processing Development Center Research Institute for Advanced Information Technology (AITEC)  Research Trend Investigation WG on Petaflops Machines

  • 1998.05
    -
    1998.12

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  Earth Simulator Intermediate Evaluation Committee

  • 1998.03
    -
    1998.09

    Japan Information Processing Development Center Research Institute for Advanced Information Technology (AITEC)  Super Compiler Systems Technology Investigation Committee

  • 1998.07
     
     

    Other Society  Program Committee, PDPTA'98, Las Vegas, Nevada, U.S.A.

  • 1998.06
     
     

    Other Society  Organizing Committee, SGDC'98 The Symposium on Global Distributed Computing Toward The Year 2010, Waseda Univ., Tokyo, Japan

  • 1997.06
    -
    1998.05

    Information Processing Society of Japan  Chair, Journal of IPSJ Editorial Board HG

  • 1993.06
    -
    1998.05

    Information Processing Society of Japan  Journal of IPSJ Editorial Board

  • 1996.01
    -
    1998.03

    National Aerospace Laboratory  Fundamental Research on Creation Supports in Intellectual Manufacture Activities Third Section Committee

  • 1997.09
    -
    1998.03

    Information Processing Society of Japan  Trans. and SIG. Joint Committee

  • 1995.04
    -
    1998.03

    Information Processing Society of Japan  Steering Committee, SIG. on Algorithm

  • 1986.04
    -
    1998.03

    The Institute of Electrical Engineers of Japan  Secretary, Committee on Information Technology

  • 1995.10
    -
    1997.12

    The Institute of Electrical Engineers of Japan  Chair, SIG. on Parallel Processing

  • 1993.01
    -
    1997.12

    Information Processing Society of Japan  Best Paper Award Selection Committee

  • 1997.08
     
     

    Other Society  Program Committee, ICPP'97, Bloomingdale, Illinois

  • 1997.07
    -
     

    Japan Atomic Energy Research Institute  Research Achievement Evaluation Committee for Research Staff Employment

  • 1997.02
    -
    1997.07

    ACM  Program Committee, ICS'97, 11th ACM International Conference on Supercomputing, Vienna, Austria

  • 1996.06
    -
    1997.05

    Information Processing Society of Japan  Vice Chair, Journal of IPSJ Editorial Board HG

  • 1994.04
    -
    1997.03

    Real World Computing Partnership (RWC)  RWC Super Parallel Architecture Workshop Committee

  • 1996.05
    -
    1997.03

    The Institute of Electronics, Information and Communication Engineers  Editorial Board Secretary, English Trans. (D) Special Issue on 'Parallel and Distributed Supercomputing'

  • 1993.04
    -
    1997.03

    Information Processing Society of Japan  Steering Committee, SIG. on OS and System Software

  • 1996.10
     
     

    IEEE  Program Committee, SPDP'96, 8th Symposium on Parallel and Distributed Processing, New Orleans, Louisiana, U.S.A.

  • 1995.11
    -
    1996.05

    ACM  Program Vice Chair, ICS'96, 10th ACM International Conference on Supercomputing, Philadelphia, Pennsylvania, U.S.A

  • 1994.06
    -
    1996.03

    The Institute of Electronics, Information and Communication Engineers  SIG. on Multi-media Infrastructure & Service

  • 1993.01
    -
    1995.12

    Information Processing Society of Japan  Program Committee, The National Convention

  • 1995.11
     
     

    IEEE  Program Committee, ICECCS'95, First IEEE International Conference on Engineering of Complex Computer Systems, Westin Cypress Creek Hotel, Ft. Lauderdale, Florida, U.S.A

  • 1995.02
    -
    1995.07

    ACM  Program Committee, ICS'95, 9th ACM International Conference on Supercomputing, Barcelona, Spain

  • 1994.06
    -
    1995.05

    Information Processing Society of Japan  Program Committee, JSPP'95

  • 1995.01
    -
    1995.03

    Japan Atomic Energy Research Institute  Research Evaluation Committee on Computer Science Technology Sub Committee

  • 1995.01
    -
    1995.03

    National Aerospace Laboratory  Research Evaluation Committee (Computer Science Sub Committee)

  • 1995
     
     

    The Institute of Electrical Engineers of Japan  Chair, Electronics, Information and System Section 'Trends of Multiprocessor Supercomputer'

  • 1994
    -
    1995

    The Institute of Electronics, Information and Communication Engineers  Editorial Board, Japanese Trans. (D) Special Issue on 'Real Time Processing Systems and Applications'

  • 1994.01
    -
    1994.12

    Information Processing Society of Japan  Program Committee, Electrical and Electronics Joint Conference

  • 1994.09
     
     

    Other Society  Program Committee, CONPAR'94/ VAPP VI International Conference on Parallel Processing and Vector and Parallel Processing in Computational Sciences, Linz, Austria (Springer-Verlag)

  • 1992.10
    -
    1994.09

    The Institute of Electrical Engineers of Japan  Chair, SIG. on Parallel Processing Computer Technology in Industry

  • 1993.11
    -
     

    The Institute of Electrical Engineers of Japan  Guest Editor, Trans. (C) of IEEJ Special Issue on 'Parallel and Distributed Super Computing'

  • 1993.06
    -
     

    Information Processing Society of Japan  Reviewer, Journal of IPSJ

  • 1992.06
    -
    1993.05

    Information Processing Society of Japan  Program Committee, JSPP'93

  • 1992.06
    -
    1993.05

    Information Processing Society of Japan  Chair, IPSJ Magazine Editorial Board HWG

  • 1990.06
    -
    1993.05

    Information Processing Society of Japan  IPSJ Magazine Editorial Board

  • 1993.01
    -
    1993.03

    Kyushu University  Interdisciplinary Graduate School of Engineering Sciences Invited Lecturer

  • 1991
    -
    1993

    The Institute of Electronics, Information and Communication Engineers  Secretary, SIG. on Computer Systems

  • 1992.01
    -
    1992.12

    Information Processing Society of Japan  The Incentive Prize Selection Committee

  • 1991.06
    -
    1992.05

    Information Processing Society of Japan  Vice Chair, IPSJ Magazine Editorial Board HWG

  • 1988.04
    -
    1991.03

    The Institute of Electrical Engineers of Japan  SIG. on Simulation Technology

  • 1988.06
    -
    1990.05

    Information Processing Society of Japan  Literature and News Committee in IPSJ Magazine

  • 1988.04
    -
    1990.03

    Japan Electronic Industry Development Association  Secretary, SIG. on Distributed Computer Control Systems

Professional Memberships

  • 2023.01
    -
    Now

    IEEE Life Fellow

  • 2020.06
    -
    Now

    The Engineering Academy of Japan, Director

  • 2019.01
    -
    Now

    COCN Board Member

  • 2017.11
    -
    Now

    IEEE Eta Kappa Nu Professional member

  • 2017.05
    -
    Now

    The Engineering Academy of Japan Inc. (EAJ)

  • 2017.04
    -
    Now

    Science Council of Japan Member

  • 2017.01
    -
    Now

    IEEE Fellow

  • 2017.01
    -
    Now

    The Okawa Foundation for Information and Telecommunications

  • 2016.02
    -
    Now

    IEEE Senior Member

  • 2015.06
    -
    Now

    IPSJ Fellow

  • 1987.04
    -
    Now

    ACM

  • 1986.01
    -
    Now

    IEEE Professional member

  • 1983.01
    -
    Now

    The Robotics Society of Japan

  • 1982.06
    -
    Now

    IEEE Computer Society

  • 1982.06
    -
    Now

    Japan Society for Simulation Technology

  • 1982.04
    -
    Now

    The Institute of Electronics, Information and Communication Engineers

  • 1982.01
    -
    Now

    IEEE

  • 1981.04
    -
    Now

    Information Processing Society of Japan

  • 2018.01
    -
    2018.12

    IEEE Computer Society President

  • 1980.04
    -
     

    Institute of Electrical Engineers of Japan

Research Areas

  • Computer system

Research Interests

  • Parallel Processing, Parallelizing Compiler, Multicore Processor, Green Computing, Computer Science

Awards

  • Life Fellow

    2023.01   IEEE  

  • SCAT (Support Center for Advanced Telecommunications Technology Research) President Grand Award

    2021.01   SCAT (Support Center for Advanced Telecommunications Technology Research)  

    Winner: Hironori Kasahara

  • Information Processing Society of Japan, Contribution Award

    2020.06   Information Processing Society of Japan  

    Winner: Hironori Kasahara

  • Spirit of the IEEE Computer Society Award

    2019.10   IEEE Computer Society   Distinguished Contribution for Progress of Research, Education and Standard in Computer Technology in the World

    Winner: Hironori Kasahara

  • Fellow

    2017.01   IEEE  

    Winner: Hironori Kasahara

  • Information Processing Society of Japan Fellow Award

    2015.06  

    Winner: Hironori Kasahara

  • Prize for Science and Technology (Research Category), The Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology

    2014.04  

    Winner: Hironori Kasahara, Keiji Kimura

  • IEEE Computer Society Golden Core Member

    2010.02   IEEE  

    Winner: Hironori Kasahara

  • Intel 2008 Asia Academic Forum Best Research Award

    2008.10   Intel  

    Winner: Hironori Kasahara

  • 2008 LSI Of-The-Year Second Prize

    2008.07  

  • STARC (Semiconductor Technology Academic Research Center) Industry-Academia Cooperative Research Award

    2005.01  

  • IPSJ Sakai Memorial Special Award

    1997  

  • IFAC World Congress Young Author Prize

    1987   IFAC (International Federation of Automatic Control)  

    Winner: Hironori Kasahara

Media Coverage

  • Quantum Technology Research and Implementation Center Inauguration Symposium

    Internet

    Quantum Technology Research and Implementation Center HP  

    2024.03

  • Advanced Multicore Processor Research Institute participated in Exhibition of IEEE ACM SC 2023

    Internet

    Waseda University Green Computing Systems Research organization News  

    2024.02

  • IEEE Computer Society (CS) Leaders Reveal Predictions on the Technologies to Watch in 2024: Generative AI leads expectations for the greatest impact this year

    Internet

    IEEE Computer Society, LOS ALAMITOS, Calif.  

    2024.01

  • IEEE Computer Society Leaders Reveal Predictions on the Technologies to Watch in 2024

    Internet

    HPCwire  

    2024.01

  • IEEE Computer Society (CS) Leaders Reveal Predictions on the Technologies to Watch in 2024

    Internet

    IEEE Computer Society  

    2024.01

  • Automatic Parallelizing and Power Reducing Compiler Technology of Real-time Control Computation on Multicores with Accelerators

    Internet

    TIER IV Workshop 2023 on AI Computing in Automatic Driving  

    2023.07

  • Japanese University Ranking Having Most Attractive Professors: Chosen by Citizens of Tokyo! The 2nd 'Waseda University'

    Internet

    NetLab  

    2023.04

  • Japanese University Ranking Having Most Attractive Professors: Result of a survey offered to people older than 70 years old! The 2nd 'Waseda University', Which is No. 1?

    Internet

    NetLab  

    2023.03

  • RU11 special programme: Challenges and prospects for the World University Rankings: Japanese universities perspectives

    Other

    THE Asia Universities Summit 2022, EVENT REPORT  

    2022.05

  • Waseda Shibuya Senior High School in Singapore (WSS) April 22, 2022, Entrance ceremony was held: Waseda University SEVP Prof. Hironori Kasahara gave us a congratulatory address

    Internet

    Waseda Shibuya Senior High School in Singapore Home Page  

    2022.04

  • Waseda University Resists the Tide: Open Innovation Ecosystem Promotes Internationalization

    Newspaper, magazine

    Yazhou Zhoukan (亜洲週刊), 2022 No. 17, 2022/4/25-5/1 issue  

    2022.04

  • Policy and System for Bridging Industry and Academia, 2.3.3. Waseda University

    Other

    JST CRDS Survey Report 'Current Status and Problems for Innovation Ecosystem Implementation', CRDS-FY2021-RR-04  

    2022.03

  • Waseda University organizes WOI'22: Including Information Sharing for Innovation and Carbon Neutral Research

    Internet

    YAHOO! JAPAN news  

    2022.02

  • Waseda Carbon Net Zero Challenge: Advanced Research -Interview with Waseda University Senior Executive Vice President Hironori Kasahara-

    Internet

    Waseda Net Carbon Zero Challenge Home Page  

    2022.01

  • Waseda Open Innovation Forum 2022 (WOI'22) Promotion Video

    Internet

    Waseda Univ. HP  

    2022.01

  • Waseda Open Innovation Forum 2022 (WOI'22) Promotion Video

    Internet

    Waseda Open Innovation Forum 2022 (WOI'22) HP  

    2022.01

  • The Future of Tech: 2022 Technology Predictions Revealed

    Other

    IEEE Computer Society  

    2022.01

  • Computing Experts Release Scorecard for IEEE Computer Society’s 2021 Tech Predictions

    Internet

    HPCwire  

    2021.12

  • Voices from University and Industry: Prof. Hironori Kasahara, Waseda University Open Innovation Strategic Research Organization Development Project Director

    Internet

    MEXT Open Innovation Strategic Research Organization Development Project Home Page  

    2021.12

  • Prof. Kasahara Received SCAT (Support Center for Advanced Telecommunications Technology Research) President Grand Award. [July 7 Award Celebration Speech]

    Internet

    Author: Other  

    Waseda University Green Computing Systems Research organization News  

    2021.06

  • Waseda University: University Tradition and Color Opening New Era

    Internet

    Author: Other  

    Asahi Shimbun, The Power of University  

    2021.05

  • 2020 Accepted Organizations of SCORE (University Promotion Type, specifically Gateway City Environment Improvement Type --Start-up incubation from COre REsearch), Research Achievement and Evolution Project University-- Initiated for Creation of New Industry Program

    Other

    Japan Science and Technology Agency  

    2021.03

  • Waseda Univ. Research Organization for Open Innovation Strategy

    Internet

    Author: Other  

    Waseda Univ. HP  

    2021.03

  • Program of Start-up incubation from COre REsearch: SCORE

    Internet

    Author: Other  

    Waseda Univ. HP  

    2021.03

  • Announcement of WOI'21 WASEDA

    Internet

    Author: Other  

    Waseda Univ. HP  

    2021.02

  • Engineering Education in the Age of Autonomous Machines

    Internet

    CoRR abs/2102.07900  

    2021.02

  • Creation of New Global Values

    Internet

    Waseda Net Carbon Zero Challenge Home Page  

    2021

  • Promoting Research Exchanges Between Waseda and the University of Oxford

    Promotional material

    Author: Other  

    Waseda University CAMPUS NOW, Vol. 238  

    2021.01

  • 60th Anniversary of the Information Processing Society of Japan (IPSJ) --60 Years of Historical Accomplishments and Advancements in Computing--

    Other

    Author: Other  

    IEEE Computer Society  

    2020.12

  • WASEDA Open Innovation Forum 2021

    Other

    Author: Other  

    Waseda Univ. HP  

    2020.12

  • Research Promotion at Waseda University for Realizing Innovation

    Internet

    Author: Other  

    Waseda University CAMPUS NOW  

    2020.12

  • My Research at the University of Illinois, Urbana-Champaign in 1989 Led to the Future Beyond My Imagination

    Other

    Author: Other  

    Murata Overseas Scholarship Foundation 50th Anniversary Publication  

    2020.10

  • Research Promotion at Waseda University for Realizing Innovation

    Other

    Author: Other  

    Waseda University CAMPUS NOW, Vol. 237 pp.6  

    2020.10

  • The Role of the University in the Innovation Ecosystem

    Other

    Author: Other  

    Road to Silicon Valley Event Summary Report pp.5  

    2020.10

  • 2020 GITI Forum --Corona Society Coped by ICT--

    Other

    Author: Other  

    Waseda Univ. HP  

    2020.09

  • Cyber Symposium -Information Sharing on Remote Education Started from April

    Internet

    Author: Other  

    National Institute of Informatics  

    2020.09

  • Agreement of Ishikawa Prefecture, Komatsu and Waseda University for Human Resource Development on IoT and AI -Yomiuri Shimbun (Local Version)-

    Internet

    Author: Other  

    Yomiuri Shimbun (Local Version)  

    2020.09

  • Human Resource Development for Digital Technology by Industry, Academia and Government -NHK Ishikawa-

    Internet

    Author: Other  

    NHK Ishikawa  

    2020.09

  • Advanced Technical Human Resource Development: Ishikawa Prefecture, Komatsu, and Waseda University Collaboration Agreement Conclusion -Hokkoku Shimbun-

    Internet

    Author: Other  

    Hokkoku Shimbun  

    2020.09

  • Agreement of Ishikawa Prefecture, Komatsu and Waseda University for Human Resource Development on IoT and AI

    Newspaper, magazine

    Author: Other  

    Yomiuri Shimbun (Local Version)  

    2020.09

  • Cooperation for Advanced Human Resource Development: Collaboration Agreement among Ishikawa Prefecture, Komatsu and Waseda University / Ishikawa Prefecture

    Newspaper, magazine

    Author: Other  

    Mainichi Shimbun (Local Version)  

    2020.09

  • Advanced Technical Human Resource Development: Ishikawa Prefecture, Komatsu, and Waseda University Collaboration Agreement Conclusion

    Newspaper, magazine

    Author: Other  

    Hokkoku Shimbun  

    2020.09

  • 'School' Started for Training of AI Engineers, etc. : Collaboration of Ishikawa Prefecture, Komatsu, and Waseda University

    Newspaper, magazine

    Author: Other  

    Chunichi Shimbun (Hokuriku)  

    2020.09

  • Human Resource Development for Digital Technology by Industry, Academia and Government

    TV or radio program

    Author: Other  

    NHK Ishikawa  

    2020.09

  • Ishikawa Prefecture, Komatsu and Waseda University: Cooperation Agreement Ceremony for Human Resource Development on IoT

    TV or radio program

    Author: Other  

    MRO Hokuriku Broadcasting  

    2020.09

  • University-wide agreement concluded between Oxford and Waseda

    Other

    Author: Other  

    Waseda Univ. HP  

    2020.04

  • University of Oxford signs Memorandum of Understanding with Waseda University

    Other

    Author: Other  

    University of Oxford HP  

    2020.04

  • Waseda Univ. opens a new Industry and Academia Collaboration facility: Expecting participation of more than 200 companies

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2020.04

  • Waseda signs agreement with University of Tokyo to fast track social change

    Other

    Author: Other  

    Waseda Univ. HP  

    2020.03

  • EECS Seminar: Green Multicore Computing

    Internet

    Author: Other  

    Samueli School of Engineering University of California, Irvine  

    2020.02

  • How Waseda University is Helping Japan Stay Competitive

    Other

    Author: Other  

    Science Magazine, Vol. 367, Issue.6479  

    2020.02

  • Robots, Baseball, and Bilingualism Embody Waseda University's Culture of Scholarship

    Other

    Author: Other  

    Science Magazine, Vol.367, Issue.6478  

    2020.02

  • Theoretical and Applied Research Help Cut Pollution

    Other

    Author: Other  

    Science Magazine, Vol.367, Issue.6476  

    2020.01

  • Waseda University: Driving positive change in science and society

    Internet

    Author: Other  

    American Association for the Advancement of Science  

    2020.01

  • Waseda University SEVP Hironori Kasahara, Active Worldwide, Spoke on the Waseda Open Innovation Valley Project for Strengthening Research at Waseda University, at the Waseda Counsellor Forum on Dec. 7, 2019

    Internet

    Waseda Alumni Nishitokyo-tomonkai Home Page  

    2019.12

  • E. Tsutsui, "World Academic Summit Participation Report," pp.8-

    Other

    Author: Other  

    Between  

    2019.11

  • Rhino-Bird Academic News | Waseda University Vice President Kasahara and Delegation Visit Tencent

    Other

    Author: Other  

    騰訊高校合作  

    2019.11

  • The Only Way to Reflourish an Industrial Nation Japan is Industry-Academia Collaboration

    Other

    Author: Other  

    LINK-J Interview Column  

    2019.10

  • Laser Kasahara-san --Innovation Day--

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2019.10

  • Joining Forces for New Industry Creation --Waseda Open Innovation Valley Connecting Research Facility--

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2019.09

  • Baidu ABC Institute and IEEE Computer Society Sign Memorandum of Understanding and Secure Global Partner Program

    Other

    Author: Other  

    MarketWatch  

    2019.08

  • IEEE Computer Society Awards Presentations

    Other

    Author: Other  

    Computer, IEEE CS  

    2019.06

  • Research Visit to Maeda Corporation ICI Integrated Research Center on May 7, 2019

    Other

    Author: Other  

    Waseda Univ. HP  

    2019.05

  • Multigrain Parallelization and Compiler/Architecture Co-design for 30 Years, Hironori Kasahara, pp.22

    Other

    Author: Other  

    Springer Nature Switzerland AG 2019, LNCS (Lecture Notes in Computer Science) 11403, Languages and Compilers for Parallel Computing, -- 30th International Workshop, LCPC 2017, College Station, TX, USA, October 11-13, 2017, Revised Selected Papers  

    2019.04

  • Computer Society marks Russia's 70th anniversary in computer science

    Other

    Author: Other  

    Computer, IEEE CS  

    2018.12

  • Welcome to SC18 Supercomputing Conference, with world's fastest temporary network at 4.02 terabytes a second. How fast is that? Enough to download Netflix's entire HD movie library in 45 seconds.

    Other

    Author: Other  

    Computer, IEEE CS  

    2018.12

  • IEEE Computer Society Brings Tencent and Waseda University Together for Special Event

    Internet

    Author: Other  

    IEEE Computer Society  

    2018.12

  • Tencent Travels to IEEE Computer Society President's Research Center in Japan to Discuss Supercomputing, Robotics under Global Partner Program

    Other

    Author: Other  

    Computer, IEEE CS  

    2018.12

  • Gallery of 2018 IEEE Computer Society Award Winners

    Other

    Author: Other  

    Computer, IEEE CS  

    2018.12

  • Waseda Univ. chasing World Top Class

    Newspaper, magazine

    Author: Other  

    Nihon Keizai Shimbun  

    2018.12

  • Hironori Kasahara congratulated ISP RAS and IEEE Computer Society Russia on the 70th anniversary of IT

    Internet

    Author: Other  

    ISP RAS HP (Ivannikov Institute for System Programming of the RAS)  

    2018.11

  • ACM Ken Kennedy Award

    Other

    Author: Other  

    ACM HP  

    2018.11

  • Vice President, Executive Vice President, Executive Directors and Auditors

    Internet

    Author: Other  

    Waseda Univ. HP  

    2018.11

  • 2018 China National Computer Congress Grandly Held in Hangzhou

    Other

    Author: Other  

    CNCC News  

    2018.10

  • Global AI Narratives - Tokyo Workshop (invitation only: up to 40)

    Internet

    Author: Other  

    Toshie Takahashi Official Website  

    2018.09

  • State Key Laboratory Successfully Hosts the ACM ICS-2018 Conference

    Internet

    Author: Other  

    Institute of Computing Technology, Chinese Academy of Sciences  

    2018.07

  • Embedded Multicore/Manycore Software Development Technical Seminar

    Internet

    Author: Other  

    GAIO TECHNOLOGY CO., LTD  

    2018.07

  • Young Researchers in Information Processing were awarded -- IPSJ and IEEE-CS founded Young Researcher Award

    Internet

    Author: Other  

    Dream News  

    2018.07

  • Young Researchers in Information Processing were awarded -- IPSJ and IEEE-CS founded Young Researcher Award

    Internet

    Author: Other  

    Information Processing Society of Japan  

    2018.07

  • Name of HKN Chapter

    Internet

    Author: Other  

    IEEE HP  

    2018.07

  • The Attraction Is That Students Can Learn from World-Leading Professors and Conduct High-Level Research

    Internet

    Author: Other  

    Waseda Univ. HP  

    2018.07

  • Proxor and IEEE Computer Society (CS) to co-host the COMPSAC 2018 Software Developer-Java Programming T1 Challenge

    Internet

    Author: Other  

    Proxor and IEEE Computer Society (CS)  

    2018.06

  • Making User-Optimal Products without Being Bound by "Co-design"

    Other

    Author: Other  

    NIKKEI ELECTRONICS  

    2018.04

  • Meet Hironori Kasahara, The 2018 President of the IEEE Computer Society

    Other

    Author: Other  

    Interface, IEEE CS  

    2018.04

  • Meet Hironori Kasahara, the 2018 President of the IEEE Computer Society. What Are His Plans?

    Other

    Author: Other  

    IEEE計算机協会  

    2018.03

  • Computer and IEEE Micro Magazines Highlight Intel's Loihi, a Revolutionary Neuromorphic 'Self-Learning' Chip

    Other

    Author: Other  

    KSLA NEWS12  

    2018.03

  • Meet Hironori Kasahara, The 2018 President Of The IEEE Computer Society. What Are His Plans?

    Other

    Author: Other  

    Computer, IEEE CS  

    2018.03

  • Now Accepting Nominations for Computer Society Officer Positions

    Other

    Author: Other  

    Interface, IEEE CS  

    2018.03

  • Hironori Kasahara, computer science educator

    Internet

    Author: Other  

    Prabook  

    2018.01

  • A new member of the Engineering Academy of Japan

    Other

    Author: Other  

    EAJ News  

    2017.12

  • Future of Green Multicore Computing

    Internet

    Author: Other  

    Dipartimento di Elettronica  

    2017.07

  • Message from the CAP 2017 Organizing Committee

    Internet

    Author: Other  

    COMPSAC 2017  

    2017.07

  • Automatic Cache and Local Memory Optimization for Multicores

    Internet

    Author: Other  

    17th INTERNATIONAL FORUM ON MPSoC  

    2017.07

  • New Fellow Awards ceremony and social gathering in 2017

    Other

    Author: Other  

    IEEE TOKYO  

    2017.05

  • Parallel software technology going ahead 5 years, the huge market beyond Death Valley

    Internet

    Author: Other  

    NIKKEI Technology Online  

    2017.04

  • Cool Chips, Low Power Multicores, Open the Way to the Future

    Internet

    Author: Other  

    COOL CHIPS2017  

    2017.04

  • INCJ to invest in Oscar Technology Corporation, a venture company providing software parallelization technology

    Internet

    Author: Other  

    Innovation Network Corporation of Japan HP  

    2017.03

  • Prof. Kasahara, Faculty of Science and Engineering, has been elected 2018 IEEE Computer Society President, the first from Japan in the society's 70-year history

    Other

    Author: Other  

    CAMPUS NOW  

    2017.02

  • Basic Plan of the American Exascale Project Becomes Clear

    Internet

    Author: Other  

    My navi News Technology  

    2017.01

  • International Workshop on A Strategic Initiative of Computing: Systems and Applications (SISA): Integrating HPC, Big Data, AI and Beyond

    Internet

    Author: Other  

    Japan ROBOT Database System  

    2017.01

  • International Workshop on A Strategic Initiative of Computing: Systems and Applications (SISA): Integrating HPC, Big Data, AI and Beyond

    Internet

    Author: Other  

    SGU, Waseda University  

    2016.12

  • The future of tech: 16 trends for 2017 through 2022

    Internet

    Author: Other  

    Health Data Management  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    Yahoo finance  

    2016.12

  • Prof. Kasahara, Waseda University Faculty of Science and Engineering, was elevated to IEEE Fellow.

    Internet

    Author: Other  

    Faculty of Science and Engineering, Waseda University  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    EconoTimes  

    2016.12

  • IEEE Computer Society Predicts the Future of Tech for 2017 and Next Five Years

    Internet

    Author: Other  

    Yahoo singapore finance  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    The Sacramento Bee  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    StreetInsider  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    Silicon Valley Business Journal  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    San Francisco Business Times  

    2016.12

  • IEEE Computer Society Predicts the Future of Tech for 2017 and Next Five Years

    Internet

    Author: Other  

    PR Newswire  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    Pittsburgh Post-Gazette  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    New York Business Journal  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    MarketWatch  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    ITBusinessNet  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    infoTECH Spotlight  

    2016.12

  • IEEE Computer Society Predicts the Future of Tech for 2017 and Next Five Years

    Internet

    Author: Other  

    IEEE Computer Society  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    EE Times  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    Denver Business Journal  

    2016.12

  • IEEE Computer Society expects blockchain technology to reach adoption in 2017

    Internet

    Author: Other  

    Boston Business Journal  

    2016.12

  • 2017 Newly Elevated Fellows

    Internet

    Author: Other  

    IEEE Computer Society  

    2016.12

  • Hosted by Xidian University | A Wave of International Academic Conferences, Including HPC and NPC, Is Coming

    Internet

    Author: Other  

    必品文章網  

    2016.10

  • Hosted by Xidian University | A Wave of International Academic Conferences, Including HPC and NPC, Is Coming

    Internet

    Author: Other  

    西電承弁  

    2016.10

  • Xidian University Hosts the 13th International Conference on Network and Parallel Computing

    Internet

    Author: Other  

    西安電子科技大学学術信息網  

    2016.10

  • Xidian University Hosts the 13th International Conference on Network and Parallel Computing

    Internet

    Author: Other  

    西安電子科技大学新聞網  

    2016.10

  • IEEE Computer Society elects its first president from Japan in its 70-year history

    Internet

    Author: Other  

    Waseda Univ. HP  

    2016.10

  • IEEE Computer Society elects its first president from Japan in its 70-year history

    Internet

    Author: Other  

    Faculty of Science and Engineering, Waseda University  

    2016.10

  • IEEE Computer Society elects its first president from Japan in its 70-year history

    Internet

    Author: Other  

    Facebook, Waseda University  

    2016.10

  • Prof. Kasahara has been elected IEEE Computer Society President 2018.

    Internet

    Author: Other  

    Waseda University Green Computing Systems Research organization News  

    2016.10

  • IEEE Computer Society elects its first president from Japan in its 70-year history

    Internet

    Author: Other  

    research-er.jp  

    2016.10

  • Hironori Kasahara Voted 2017 IEEE Computer Society President-Elect

    Internet

    Author: Other  

    IEEE Computer Society  

    2016.09

  • 2016 IEEE Computer Society Election Results -- Hironori Kasahara selected 2017 President-Elect (2018 President)--

    Internet

    Author: Other  

    IEEE Computer Society  

    2016.09

  • IEEE Computer Society Election Opens on 01 August 2016

    Internet

    Author: Other  

    IEEE Computer Society  

    2016.07

  • Person: Dr. Hironori Kasahara, Professor of Faculty of Science and Engineering, Waseda University --Unique Multicore Technology Establishment for Green ICT Realization--

    Newspaper, magazine

    Author: Other  

    Japan Chemical Daily  

    2016.04

  • Productization of Multicore Processors and Parallelizing Compilers aiming at Fastest Execution and Low Power Consumption

    Other

    Author: Other  

    THE TOWER  

    2016.03

  • Passionately pursuing research and inspiring students for over 30 years

    Internet

    Author: Other  

    WASEDA ONLINE  

    2016.02

  • Pursuing Excitement for Over 30 Years -Waseda Weekly-

    Internet

    Author: Other  

    Waseda Weekly  

    2016.02

  • Pursuing Excitement for Over 30 Years

    Other

    Author: Other  

    Waseda Weekly  

    2016.02

  • The Feeling of Excitement Has Continued for More Than 30 Years --What He Aims for Has Not Changed--

    Other

    Author: Other  

    Waseda Weekly  

    2016.02

  • The Feeling of Excitement Has Continued for More Than 30 Years --What He Aims for Has Not Changed--

    Internet

    Author: Other  

    Waseda Weekly  

    2016.01

  • Gaudiot Voted 2016 Computer Society President-Elect, pp.102-103

    Other

    Author: Other  

    Computer, IEEE Computer Society  

    2015.12

  • Environment Friendly Low Power Computer Technology --Launched for Products toward Social Implementation--

    Newspaper, magazine

    Author: Other  

    The Science News  

    2015.11

  • Waseda University Kasahara Kimura Laboratory

    Internet

    Author: Other  

    Embedded Technology 2015  

    2015.11

  • Eco-friendly innovation for automobiles to mobile phones to cancer therapy

    Internet

    Author: Other  

    Waseda Univ. HP  

    2015.11

  • Japanese ambassador welcomed: Opportunities for UD-Japan collaborations focus of visit

    Internet

    Author: Other  

    UDaily, University of Delaware  

    2015.11

  • Multicore Processor and Parallelizing

    Internet

    Author: Other  

    Automotive Engineers' Guide  

    2015.11

  • Remarkable Embedded Systems Latest Technology - Oscar Tech. Compiler Parallelizes Customers' Sequential Programs -

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2015.10

  • Global computing collaboration

    Internet

    Author: Other  

    UDaily, University of Delaware  

    2015.09

  • IEEE Computer Society 2022 Report: In Era of Seamless Intelligence, Information Will Be Gathered by Our Senses

    Internet

    Author: Other  

    Forward Geek  

    2014.11

  • Roger Fujii Voted 2015 IEEE Computer Society President-Elect

    Internet

    Author: Other  

    IEEE Computer Society  

    2014.10

  • Technology in 2022: A Report from the IEEE

    Internet

    Author: Other  

    IEEE Computer Society  

    2014.10

  • IEEE Report shows how ingrained IoT has become in our future

    Internet

    Author: Other  

    Rethink Internet of Things  

    2014.10

  • What will our world look like in 2022? --IEEE Computer Society

    Internet

    Author: Other  

    @godwin. Caruana  

    2014.09

  • IEEE: 23 technologies that could make 2022 look a whole lot different

    Internet

    Author: Other  

    Smart Cities Council  

    2014.09

  • IEEE Computer Society Looks to the Future with Report on Top Technologies for 2022 -Cloud Computing-

    Internet

    Author: Other  

    Cloud Computing  

    2014.09

  • IEEE Picks Top 23 Technologies for 2022

    Internet

    Author: Other  

    eweek  

    2014.09

  • IEEE Visualises The Technology Landscape in 2022

    Internet

    Author: Other  

    Computer Business Review  

    2014.09

  • Candidates Approved for 2014 IEEE Computer Society Election -i-Newswire.com-

    Internet

    Author: Other  

    i-Newswire.com  

    2014.06

  • Three Waseda Professors Receive Minister of Education, Culture, Sports, Science and Technology 2014 Commendations for Science and Technology: Prof. Kasahara, Prof. Kimura and Assistant Prof. Tanabe -Yomiuri Online-

    Internet

    Author: Other  

    Yomiuri Online  

    2014.04

  • Three Waseda professors receive Education Minister 2014 Commendations for Science and Technology

    Internet

    Author: Other  

    The Japan News by The Yomiuri Shimbun  

    2014.04

  • Waseda Univ. Prof. Kasahara, Prof. Kimura and Assist. Prof. Tanabe Receive Prizes for Science and Technology (Research), the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology

    Other

    Author: Other  

    Waseda University Press Release  

    2014.04

  • 2014 Recipients of Minister of Education, Culture, Sports, Science and Technology Commendations for Science and Technology

    Internet

    Author: Other  

    Ministry of Education, Culture, Sports, Science and Technology (MEXT)  

    2014.04

  • Linking Advanced Research Accomplishments of Waseda Univ. Hironori Kasahara/Keiji Kimura Laboratory to Industry

    Internet

    Author: Other  

    Oscar Technology Corporation  

    2014

  • IEEE Computer Society Looks to the Future with Report on Top Technologies for 2022

    Internet

    Author: Other  

    iReach by PR Newswire  

    2014

  • President Elect

    Internet

    Author: Other  

    IEEE Computer Society  

    2014

  • IEEE predicts Top Technologies for 2022

    Internet

    Author: Other  

    eweek  

    2014

  • Candidates Approved for 2014 IEEE Computer Society Election

    Internet

    Author: Other  

    Calameo  

    2014

  • IEEE predicts top technologies for 2022

    Internet

    Author: Other  

    Bicsi South Pacific  

    2014

  • Multicore Software Parallelization --Oscar Technology Corporation--

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2013.12

  • ESOL starts research on program parallelization with Waseda Univ.

    Internet

    Author: Other  

    YAHOO! JAPAN news  

    2012.11

  • ESOL starts research on program parallelization with Waseda Univ.

    Internet

    Author: Other  

    Pixiv  

    2012.11

  • ESOL et al. start research on program parallelization service for multicores

    Internet

    Author: Other  

    nikoniko news  

    2012.11

  • ESOL et al. start research on program parallelization service for multicores

    Internet

    Author: Other  

    mynabi news  

    2012.11

  • ESOL starts research on program parallelization with Waseda Univ.

    Internet

    Author: Other  

    msn topics  

    2012.11

  • ESOL et al. start research on program parallelization service for multicores

    Internet

    Author: Other  

    Mapion news  

    2012.11

  • ESOL et al. start research on program parallelization service for multicores

    Internet

    Author: Other  

    Livedoor NEWS  

    2012.11

  • ESOL starts research on program parallelization with Waseda Univ.

    Internet

    Author: Other  

    japan.internet.com  

    2012.11

  • ESOL starts collaborative research project with Waseda Univ. on program parallelization support service for multicores using Oscar compiler

    Internet

    Author: Other  

    income.co.jp  

    2012.11

  • ESOL starts research on program parallelization with Waseda Univ.

    Internet

    Author: Other  

    excite.news  

    2012.11

  • ESOL et al. start research on program parallelization service for multicores

    Internet

    Author: Other  

    excite.news  

    2012.11

  • ESOL starts collaborative research project with Waseda Univ. on program parallelization support service for multicores using Oscar compiler

    Internet

    Author: Other  

    ESOL Press Release  

    2012.11

  • US Patent Issued to Hitachi, Renesas Electronics, Waseda University on June 12 for 'Data Transfer Unit in Multi-Core Processor' (Japanese Inventors)

    Internet

    Author: Other  

    HighBeam RESEARCH  

    2012.06

  • Performance-up Multicore Processors: Allowing High Speed Low Power Execution of a Parallel Program

    Newspaper, magazine

    Author: Other  

    The Science News  

    2012.05

  • Multicore Parallel Program Software Standard 'OSCAR API' Adopted ESOL 'eT-Kernel Multi-Core Edition' as an Evaluation Environment

    Internet

    Author: Other  

    Tech-On! Nikkei BP  

    2012.05

  • Software Standard 'OSCAR API' for Parallel Programs on Multicore Processors, Developed by Waseda Univ., Industry, etc., Adopted ESOL 'Real-time OS eT-Kernel Multi-Core Edition' as an Evaluation Environment

    Internet

    Author: Other  

    incom.co.jp  

    2012.05

  • High Speed and Low Power Execution of Parallel Programs on Multicore Processor Systems, Waseda Kasahara Lab. Development and Free Distribution of World New Software Interface (Oscar API ver. 2.0) -Waseda University Green Computing Systems Research organization News-

    Internet

    Author: Other  

    Waseda University Green Computing Systems Research organization News  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    YAHOO! JAPAN news  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    webapi.jpn.com  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    unwired job professional.jp  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    prtimes.jp  

    2012.04

  • 'Oscar API ver. 2.0' for Homogeneous / Heterogeneous Multicores is open to the public

    Internet

    Author: Other  

    PC Watch  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    mynabi news  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    media jam  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    livedoor news  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    Infoseek news  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    HosPit119.net  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    Hatena Bookmark  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    excite.news  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    choix.jp  

    2012.04

  • Waseda Univ. developed Software Standard 'Oscar API ver. 2.0'

    Internet

    Author: Other  

    apiclip.blogspot.jp  

    2012.04

  • High Speed and Low Power Execution of Parallel Programs on Multicore Processor Systems, Waseda Kasahara Lab. Development and Free Distribution of World New Software Interface (Oscar API ver. 2.0)

    Other

    Author: Other  

    Waseda University Press Release  

    2012.04

  • Efficient Development of High Performance Smart Phones, Waseda with Hitachi etc.

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2012.04

  • ESOL 'eT-Kernel Multi-Core Edition' has been adopted as an evaluation environment of Multicore Software Standard 'OSCAR API'

    Internet

    Author: Other  

    markezine.jp  

    2012.04

  • ESOL 'eT-Kernel Multi-Core Edition' has been adopted as an evaluation environment of Multicore Software Standard 'OSCAR API'

    Internet

    Author: Other  

    KOCHI Press  

    2012.04

  • ESOL 'eT-Kernel Multi-Core Edition' has been adopted as an evaluation environment of Multicore Software Standard 'OSCAR API'

    Internet

    Author: Other  

    ipadnews.jp  

    2012.04

  • Computer Science and Engineering, Waseda University, 'OSCAR API 2.0 Specification Download'

    Other

    Author: Other  

    Kasahara Laboratory  

    2012.04

  • Potential capabilities of multicore systems can be drawn out by the cooperation of development tools and OS

    Internet

    Author: Other  

    Nikkei BP Tech-on Special Discussion  

    2011.07

  • Securing a competitive advantage for Japan through green IT which supports a low-carbon society

    Internet

    Author: Other  

    Asia Research News  

    2011.06

  • Opening Japan's Future with Ultimate High-Performance, Power-Saving Computers -CYBERNET NEWS, No.133 SUMMER-

    Internet

    Author: Other  

    CYBERNET NEWS, No.133 SUMMER  

    2011.06

  • Opening Japan's Future with Ultimate High-Performance, Power-Saving Computers

    Other

    Author: Other  

    CYBERNET NEWS, No.133 SUMMER, pp.4-7  

    2011.06

  • Opening Japan's Future with Ultimate High-Performance, Power-Saving Computers

    Newspaper, magazine

    Author: Other  

    Visual Communications Journal  

    2011.05

  • Founded Research & Development Center of Green Computing

    Internet

    Author: Other  

    Waseda Univ. HP  

    2011.05

  • Securing a competitive advantage for Japan through green IT which supports a low-carbon society

    Internet

    Author: Other  

    Waseda Research Zone  

    2011.05

  • Waseda Research Zone,'Securing a competitive advantage for Japan through green IT which supports a low-carbon society'

    Internet

    Author: Other  

    Daily Yomiuri Online  

    2011.05

  • Waseda Univ. Founded Research & Development Center for Realization of Future Green Computing -Unwired Job Professional-

    Internet

    Author: Other  

    Unwired Job Professional  

    2011.05

  • Waseda Univ. Founded Research & Development Center for Realization of Future Green Computing -midashi.jp-

    Internet

    Author: Other  

    midashi.jp  

    2011.05

  • Waseda Univ. Founded Research & Development Center for Realization of Future Green Computing -media jam-

    Internet

    Author: Other  

    media jam  

    2011.05

  • Waseda Univ. Founded Research & Development Center for Realization of Future Green Computing -First career Trading System Development-

    Internet

    Author: Other  

    First career Trading System Development  

    2011.05

  • Waseda Univ. Founded Research & Development Center for Realization of Future Green Computing

    Internet

    Author: Other  

    Mycom Journal  

    2011.05

  • Hitachi, Adding a New Model SR16000 VM1 to Super-technical Server SR16000 Series

    Other

    Author: Other  

    Hitachi Hitac, Vol. 2011-Spring, No. 5, pp.17  

    2011.05

  • Waseda Univ. & Nagoya Univ. Founded Centers for Environment Technology Development

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2011.05

  • Waseda Univ. Opens Low Power Consumption IT Apparatus Research Center

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2011.05

  • Waseda Symposium

    Internet

    Author: Other  

    Toshin.com  

    2011.05

  • Waseda Symposium

    Internet

    Author: Other  

    TOSHIN TIMES Express Education Information  

    2011.05

  • Founded Research & Development Center of Green Computing

    Other

    Author: Other  

    Waseda University CAMPUS NOW, Vol. 196 pp.4  

    2011.05

  • Waseda University Bldg.40

    Other

    Author: Other  

    SHINKENCHIKU-SHA, SHINKENCHIKU 2011:4, pp.101  

    2011.04

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series: Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    Qlep  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    News2u.net  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    Mapion News  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    Livedoor News  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    IT Press Release  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    Infoseek News  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    IMPRESS BUSINESS MEDIA  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series

    Internet

    Author: Other  

    IT Leaders  

    2011.03

  • Theoretical Peak Performance 6.4 Times, Scientific and Technological Supercomputer

    Internet

    Author: Other  

    asahi.com  

    2011.03

  • Theoretical Peak Performance 6.4 Times, Scientific and Technological Supercomputer, Delivered to Waseda Univ.

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2011.03

  • New Server for Scientific and Technological computation

    Newspaper, magazine

    Author: Other  

    Japan Chemical Daily  

    2011.03

  • Power7 Processor Based Super Technical Server:A New Model Added

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    Nikkei Press Release  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series:Installation to Waseda University Green Computing Systems Research Development Center

    Internet

    Author: Other  

    Nikkan Kogyo Shimbun Business Line  

    2011.03

  • Hitachi, Adding new model SR16000 VM1 to Super-technical Server SR16000 Series

    Internet

    Author: Other  

    Bluecom  

    2011.03

  • Putting Japanese Technology at the Top of the World with Parallelization of Next Generation Multicore Processors.

    Internet

    Author: Other  

    WASEDA ONLINE  

    2010.10

  • Putting Japanese Technology at the Top of the World with Parallelization of Next Generation Multicore Processors.

    Internet

    Author: Other  

    innovations report  

    2010.05

  • Developed LSI for next generation consumer electronics realizing highest level performance and low power consumption

    Internet

    Author: Other  

    WASEDA ONLINE (YOMIURI ONLINE) Campus Now  

    2010.05

  • Waseda University's Prof. Kasahara is seeding the next revolution in eco-friendly computing, by Hugh Ashton

    Other

    Author: Other  

    ACCJ Journal (American Chamber of Commerce in Japan)  

    2010.02

  • The Japanese supercomputer next generation shelved?

    Internet

    Author: Other  

    Science Knowledge  

    2010.02

  • Knowledge Co-Creation Profiles of researchers Putting Japanese Technology at the Top of the World With Parallelization of Next Generation Multicore Processors

    Internet

    Author: Other  

    Daily Yomiuri Online Waseda Online  

    2010.02

  • Next Generation Supercomputer

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2010.02

  • Heterogeneous Multicore for the Next Generation of Information Appliance Developed

    Other

    Author: Other  

    Waseda University Press Release  

    2010.02

  • Putting Japanese Technology at the Top of the World with Parallelization of Next Generation Multicore Processors.

    Internet

    Author: Other  

    WASEDA ONLINE  

    2010.02

  • LSI Performance Per Watt Twice

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2010.02

  • Hetero multicore LSI offers highest level performance per watt

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2010.02

  • Development of Heterogeneous Multicore LSI attaining 37 GOPS/WATT

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2010.02

  • Heterogeneous Multicore for the Next Generation of Information Appliance Developed -Techno Future-

    Internet

    Author: Other  

    Techno Future  

    2010.02

  • Renesas developed Heterogeneous Multicore System LSI for next Generation Information Appliance

    Internet

    Author: Other  

    Semiconductor Japan Net  

    2010.02

  • Renesas: Heterogeneous Multicore Embedded Processor

    Internet

    Author: Other  

    PC Watch  

    2010.02

  • Renesas and Hitachi developed Heterogeneous High Performance Multicore System LSI for Next Generation TV and Recorders

    Internet

    Author: Other  

    NIKKEI NET  

    2010.02

  • Renesas developed Heterogeneous Multicore System LSI for next Generation Information Appliance

    Internet

    Author: Other  

    Nikkan Kogyo Shimbun  

    2010.02

  • ISSCC2010, Renesas developed Heterogeneous Multicore System LSI

    Internet

    Author: Other  

    Mycom Journal  

    2010.02

  • Renesas, Hitachi developed Heterogeneous Multicore LSI for next Generation Information Appliance

    Internet

    Author: Other  

    IT+PLUS  

    2010.02

  • Renesas developed Heterogeneous Multicore System LSI

    Internet

    Author: Other  

    Feed Archive  

    2010.02

  • Renesas developed Heterogeneous Multicore System LSI

    Internet

    Author: Other  

    ELISNET  

    2010.02

  • Renesas, Hitachi developed 37 GOPS/W Heterogeneous Multicore LSI for Information Appliance

    Internet

    Author: Other  

    EDR, LLC  

    2010.02

  • ISSCC2010, Renesas developed Heterogeneous Multicore System LSI

    Internet

    Author: Other  

    BIO IMPACT  

    2010.02

  • Heterogeneous Multicore for the Next Generation of Information Appliance Developed

    Other

    Author: Other  

    Renesas Technology Press Release  

    2010.02

  • Heterogeneous Multicore for the Next Generation of Information Appliance Developed

    Other

    Author: Other  

    Hitachi Press Release  

    2010.02

  • The Epoch of Parallelizing Software

    Internet

    Author: Other  

    EE Times Japan  

    2009.12

  • Wasedauniversitetet Japan bygger super-cpu

    Internet

    Author: Other  

    Newsbrook  

    2009.11

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    MCU BBS  

    2009.11

  • UPCRC Illinois: Research Seminar-Hironori Kasahara,Waseda University

    Internet

    Author: Other  

    PARALLEL@ILLINOIS  

    2009.10

  • Japanese researchers downplay super CPU effort

    Internet

    Author: Other  

    ZDNet  

    2009.10

  • Njujork podneo tuzbu protiv Intel-a

    Internet

    Author: Other  

    PC Press info  

    2009.10

  • Intel bi tong tan cong bang du an super CPU

    Internet

    Author: Other  

    Newsad.org  

    2009.10

  • सात Chipmakers सम्मिलित हों हाथ नई प्रोसेसर विकसित करने के लिए

    Internet

    Author: Other  

    GURUPERL.net  

    2009.10

  • Japanese researchers downplay super CPU effort

    Internet

    Author: Other  

    Design Analysis  

    2009.10

  • Panasonic: Projekat upravljanja energijom u kuci

    Internet

    Author: Other  

    PC Press info  

    2009.10

  • Japanese researchers have used parallel chip

    Internet

    Author: Other  

    Joomla Onair  

    2009.10

  • Japanese researchers harness parallel chips

    Internet

    Author: Other  

    ZDNet UK  

    2009.09

  • Japanisches Projekt soll Standard-API für Multicore-Prozessoren entwickeln

    Internet

    Author: Other  

    ZDNet News  

    2009.09

  • Japanese Researchers Downplay Super CPU Effect

    Internet

    Author: Other  

    Communications of the ACM  

    2009.09

  • Japanese Researchers Downplay Super CPU Effect

    Internet

    Author: Other  

    CACM (Communications of the ACM)  

    2009.09

  • Japanese researchers downplay super CPU effort

    Internet

    Author: Other  

    ZDNet Asia  

    2009.09

  • Full Coverage: Japanese researchers downplay super CPU effort

    Internet

    Author: Other  

    Newstin  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    World Tech Magazine  

    2009.09

  • Cac hang Nhat phat trien CPU tiet kiem dien nang

    Internet

    Author: Other  

    VietnamPlus  

    2009.09

  • Intel to get a new competitor by 2012

    Internet

    Author: Other  

    Techie-buzz AMD  

    2009.09

  • Giappone: maxi-alleanza nei microprocessori contro

    Internet

    Author: Other  

    Swissinfo.ch  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Striker  

    2009.09

  • Seven Chipmakers Join Hands to Develop New Processor, Take on Intel and AMD

    Internet

    Author: Other  

    softpedia  

    2009.09

  • Japan spending $42m to develop solar-powered 'super CPU'

    Internet

    Author: Other  

    silobreaker  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    OSNews  

    2009.09

  • Japanese electronics giants set to make microprocessor

    Internet

    Author: Other  

    NordicHardware  

    2009.09

  • Japanese Firms In CPU Alliance To Unseat Intel

    Internet

    Author: Other  

    Nikkei.com  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    NexGadget  

    2009.09

  • Cac hang Nhat phat trien CPU tiet kiem dien nang

    Internet

    Author: Other  

    Kinhte hop, tac viet nam  

    2009.09

  • Intel Atom dev program launched, seeks to inspire netbook-centric applications

    Internet

    Author: Other  

    Kev.W  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Insomnia  

    2009.09

  • Japan lapkagyartok az Intel ellen

    Internet

    Author: Other  

    Informatika Online  

    2009.09

  • Japan Fashions Super Chip

    Internet

    Author: Other  

    Forbes.com  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    ENGADGET  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Elanso  

    2009.09

  • Japon elektronik devleri, Intel'e karsi bir araya geliyorlar

    Internet

    Author: Other  

    donanimhaver.com  

    2009.09

  • Japan to Develop Super CPU

    Internet

    Author: Other  

    CDRinfo.com  

    2009.09

  • Japan spending $42m to develop solar-powered 'super CPU'

    Internet

    Author: Other  

    Business Green  

    2009.09

  • Is it Intel's challenge?

    Internet

    Author: Other  

    @astera  

    2009.09

  • Steady efforts seen in Unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    ZUNOU HOUDAN  

    2009.09

  • Νεο πρωτοποριακό chip-επεξεργαστής

    Internet

    Author: Other  

    zefyr  

    2009.09

  • Япония: догнать и перегнать Intel

    Internet

    Author: Other  

    DonbassUA  

    2009.09

  • Основатели GLOBALFOUNDRIES покупают Chartered

    Internet

    Author: Other  

    3D News  

    2009.09

  • Японцы бросят вызов Intel?

    Internet

    Author: Other  

    Понедельник, 07 Сентября, 2009  

    2009.09

  • Японцы бросят вызов Intel?

    Internet

    Author: Other  

    Mobus news  

    2009.09

  • Японцы бросят вызов Intel?

    Internet

    Author: Other  

    @astera  

    2009.09

  • "सात Chipmakers सम्मिलित हों हाथ नई प्रोसेसर विकसित करने के लिए"

    Internet

    Author: Other  

    GURUPERL.net  

    2009.09

  • Linux for realtid fran Wind River

    Internet

    Author: Other  

    ELEKTRONIK  

    2009.09

  • A japan oriasok kihivjak az Intelt

    Internet

    Author: Other  

    PROHARDVER  

    2009.09

  • A japan oriasok kihivjak az Intelt

    Internet

    Author: Other  

    Bovito.hu  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    sketchubar  

    2009.09

  • Эра мобильных одноядерных чипов на пороге заката

    Internet

    Author: Other  

    3DNews  

    2009.09

  • Giganter bag plan om strøm-besparende chip

    Internet

    Author: Other  

    Ingenioren  

    2009.09

  • 日大厂結盟来勢洶洶 ARM威脇更顕迫切 Intel地位岌岌可危

    Internet

    Author: Other  

    Cibu.cn  

    2009.09

  • Japanische Elektronikkonzerne wollen Intel-Chips durch Eigenentwicklung

    Internet

    Author: Other  

    Zdnews.de  

    2009.09

  • Toshiba, Nec, Hitachi et Canon contre Intel sur les processeurs

    Internet

    Author: Other  

    UNHOMME.FR  

    2009.09

  • Asian firms eye alternative to Intel

    Internet

    Author: Other  

    SILICON INVESTOR  

    2009.09

  • Seven Chipmakers Join Hands to Develop New

    Internet

    Author: Other  

    ERODOV.COM  

    2009.09

  • Empresas da Asia buscam uma alternativa a Intel

    Internet

    Author: Other  

    Convergencia Digital  

    2009.09

  • Intel e AMD, pericolo asiatico

    Internet

    Author: Other  

    Arduer.com  

    2009.09

  • 日大厂結盟来勢洶洶 ARM威脇更顕迫切 Intel地位岌岌可危

    Internet

    Author: Other  

    第五頻道論壇  

    2009.09

  • 日大廠結盟來勢洶洶 ARM威脅更顯迫切 英特爾地位岌岌可危

    Internet

    Author: Other  

    財經新聞 科技産業  

    2009.09

  • 日本数家電子巨頭聯合自主開発芯片対抗Intel

    Internet

    Author: Other  

    Donews  

    2009.09

  • 일본, 디지털가전 규격 통일 추진

    Internet

    Author: Other  

    esnet.go.kr  

    2009.09

  • 7 Perusahaan Jepang Hadapi AMD-Intel

    Internet

    Author: Other  

    VIVANEWS  

    2009.09

  • Toshiba, Nec, Hitachi et Canon contre Intel sur les processeurs

    Internet

    Author: Other  

    Ubergizmo  

    2009.09

  • Intel bi tong tan cong bang du an super CPU

    Internet

    Author: Other  

    Trasua  

    2009.09

  • 7 Perusahaan Jepang Hadapi AMD-Intel

    Internet

    Author: Other  

    Teknologi  

    2009.09

  • Japanske kompanije razvijaju novi mikroprocesor

    Internet

    Author: Other  

    PCPRESS  

    2009.09

  • Waseda University at center of Efforts to Produce Super Green Processor Chip

    Internet

    Author: Other  

    Japan Higher Education Outlook (JHEO)  

    2009.09

  • Japanske kompanije razvijaju novi mikroprocesor

    Internet

    Author: Other  

    ETH.RS  

    2009.09

  • Asian firms eye alternative to Intel

    Internet

    Author: Other  

    C-NET  

    2009.09

  • Report: Asian firms eye alternative chips

    Other

    Author: Other  

    CNETNews  

    2009.09

  • Intel bi tong tan cong bang du an super CPU

    Internet

    Author: Other  

    Tien phong  

    2009.09

  • Intel bi tong tan cong bang du an super CPU

    Internet

    Author: Other  

    THUGIAN  

    2009.09

  • Japonski konzorcij kot konkurenca Intelu

    Internet

    Author: Other  

    Slo-tech.com  

    2009.09

  • Japansk processor på vej i 2012

    Internet

    Author: Other  

    newsDK  

    2009.09

  • 7 Japanese Companies to Develop CPU to Compete Against AMD and Intel

    Internet

    Author: Other  

    Neowin  

    2009.09

  • Sem Samuraev Protiv Intel Japoncy Reshili Sozdat Svoj Jenergojeffektivnyj Processor

    Internet

    Author: Other  

    Lucky Ace Poker  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Fuwuqi  

    2009.09

  • Le gouvernement japonais se donne 2 ans pour créer un super micro-processeur

    Internet

    Author: Other  

    Digitaladventures  

    2009.09

  • Intel e AMD, attenti alle sette sorelle

    Internet

    Author: Other  

    Arduer.com  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    望見竜  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    木本之家  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Yesky  

    2009.09

  • Seven Japanese Companies to Develop Microprocessor to Compete Against AMD and Intel

    Internet

    Author: Other  

    Xbitlaboratory  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    UPNB  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    The Third Media  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Server.ctocio  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    PCPOP.com  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    PCONLINE 太平洋社区  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Pchome  

    2009.09

  • iaponelebi vs. Intel

    Internet

    Author: Other  

    Overclockers  

    2009.09

  • 7 Japanese companies come together to develop a super CPU to challenge Intel

    Internet

    Author: Other  

    News.xzjdw.com  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Iworks  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    ITHOV  

    2009.09

  • Japan lapkagyartok az Intel ellen

    Internet

    Author: Other  

    HOC.hu  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Engadget Chinese  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Enet  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Citygf  

    2009.09

  • Tujuh Samurai Dari Jepang

    Internet

    Author: Other  

    CHIP Online Indonesia  

    2009.09

  • Japonsko chce vytvořit superprocesor, prý jako konkurenci Intelu

    Internet

    Author: Other  

    CDR.CZ  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Aol Tec  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    51invest.com  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    51CTO  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    新聞中心  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    PCBETA  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Ejiarui  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    8998CN  

    2009.09

  • 日本預打造'超級処理器'日本芯抗衡Intel

    Internet

    Author: Other  

    Redbots  

    2009.09

  • Bay cong ty cong nghe Nhat lien minh san xuat vi xu ly xanh

    Internet

    Author: Other  

    Techzone-vn  

    2009.09

  • 7 Japanese companies come together to develop a super CPU to challenge Intel

    Internet

    Author: Other  

    TechFuels News  

    2009.09

  • 7 Japanese companies come together to develop a super CPU to challenge Intel

    Internet

    Author: Other  

    JBTALKS  

    2009.09

  • Seven Samurai chipmakers set to take on Intel

    Internet

    Author: Other  

    Gadgetswow  

    2009.09

  • 日本科技企業聯手研発処理器対抗Intel

    Internet

    Author: Other  

    天涯社区  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    The Daily Tech Log  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    Ryanshooltz  

    2009.09

  • 7 Japanese companies come together to develop a super CPU to challenge Intel

    Internet

    Author: Other  

    Pclaunches  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    nexgadget  

    2009.09

  • Seven Samurai Chipmakers Set to Take on Intel

    Internet

    Author: Other  

    iRepairGuide  

    2009.09

  • 日本科技企業聯手研発処理器対抗Intel

    Internet

    Author: Other  

    Forum.esm-cn  

    2009.09

  • Cac hang Nhat phat trien CPU tiet kiem dien nang

    Internet

    Author: Other  

    Congthuong  

    2009.09

  • Japanese Companies unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    ZaiDiamond  

    2009.09

  • Japanese Companies unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    Yahoo finance  

    2009.09

  • Japanese Companies unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    Stock Station  

    2009.09

  • Japanese Companies unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    Searchina  

    2009.09

  • Japanese Companies unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    NIKKEI NET  

    2009.09

  • Japanese Companies unifying standard of CPU for Digital Consumer Electronics

    Internet

    Author: Other  

    NIKKEI IT PLUS  

    2009.09

  • Is Japan Gunning for Intel?

    Internet

    Author: Other  

    Fidelity  

    2009.09

  • Japan bygger 'super-cpu'

    Internet

    Author: Other  

    Elektroniktidningen  

    2009.09

  • Are we in for a CPU war? Japanese companies team up against Intel

    Internet

    Author: Other  

    Crunch Gear  

    2009.09

  • Japanese companies unify the standard of digital consumer electronics CPU

    Newspaper, magazine

    Author: Other  

    Nihon Keizai Shimbun  

    2009.09

  • Small Low Power Dynamic Reconfigurable Processor FE-GA

    Other

    Author: Other  

    The Journal of The Institute of Image Information and Television Engineers, Vol.63, No.9, pp.21-23  

    2009.09

  • Japanese researchers downplay super CPU effort

    Other

    Author: Other  

    The Invest Penang  

    2009.09

  • OSCAR Compiler

    Other

    Author: Other  

    DIME, Shogakukan  

    2009.09

  • Low Power High Speed LSI System: Waseda University Builds Innovative Development Center

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2009.08

  • Evolution Theory of Embedded Multicore (5)Multicore Standard API OpenMP

    Internet

    Author: Other  

    IT MONOIST  

    2009.06

  • Writability rather than Performance: Knowing the Heart of the Software Designer in Multicore LSI

    Internet

    Author: Other  

    Nikkei Tech-On EDA Online  

    2009.05

  • 'Networking Now' Part2-5, Connecting Games -asahi.com-

    Internet

    Author: Other  

    asahi.com  

    2009.02

  • 'Networking Now' Part2-5, Connecting Games

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2009.02

  • Multicore MPU for Consumer Electronics -Reduction of Consumed Power by Parallel Processing: Accomplishment by National Project-

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2009.01

  • Innovative Multicore LSI Realized Low Consumed Power and High Software Productivity -Hitachi Hyoron-

    Internet

    Author: Other  

    Hitachi Hyoron  

    2009.01

  • API for 'OSCAR compiler' automatically generate code for multicores -EDN Japan MAGAZINE ARTICLES, Jan. 2009-

    Internet

    Author: Other  

    EDN Japan MAGAZINE ARTICLES, Jan. 2009  

    2009.01

  • Innovative Multicore LSI Realized Low Consumed Power and High Software Productivity

    Other

    Author: Other  

    Hitachi Hyoron, Vol.91, No.1, pp.125  

    2009.01

  • API for 'OSCAR compiler' automatically generate code for multicores

    Other

    Author: Other  

    EDN Japan, No.95, pp.17  

    2009.01

  • Information Technology Research

    Other

    Author: Other  

    Research Activities 2008-2009  

    2009.01

  • Realization of Japanese Style Industry Academia Collaboration Management

    Internet

    Author: Other  

    Nikkei BP Mail Magazine No.194 -Emerging Technology Business-  

    2008.12

  • Apple OpenCL Gives Processors Freedom

    Other

    Author: Other  

    Nikkei Electronics, No.993, pp.107-117  

    2008.12

  • Public Release of OSCAR API, Prof. Kasahara, Waseda Univ.

    Internet

    Author: Other  

    Faculty of Science and Engineering, Waseda Univ. HP  

    2008.11

  • Group Develops Standard API to Give Parallel Execution, Power Control Orders to Compiler

    Internet

    Author: Other  

    Nikkei Electronics Tech On  

    2008.11

  • Prof. Kasahara, Waseda Univ. Developed API for Real-time Parallel Processing in a National Project with 6 companies and Opened it to the Public in Nov. 2008

    Internet

    Author: Other  

    Nikkei BP Emerging Technology Business  

    2008.11

  • Waseda Univ.: Efficient Use of Multicore MPU, Opening Program Specification to the Public -Nikkei Shushoku Navi-

    Internet

    Author: Other  

    Nikkei Shushoku Navi  

    2008.11

  • Waseda Univ.: Efficient Use of Multicore MPU, Opening Program Specification to the Public -NIKKEI NET-

    Internet

    Author: Other  

    NIKKEI NET  

    2008.11

  • Waseda Univ.: Efficient Use of Multicore MPU, Opening Program Specification to the Public

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2008.11

  • Waseda University et al. Develop and Open API realizing Low Consumed Power Real-time Parallel Processing

    Internet

    Author: Other  

    TRENDLINE  

    2008.11

  • Waseda University et al. Develop and Open API realizing Low Consumed Power Real-time Parallel Processing

    Internet

    Author: Other  

    NIKKEI NET IT PLUS  

    2008.11

  • Waseda University et al. Develop and Open API realizing Low Consumed Power Real-time Parallel Processing

    Internet

    Author: Other  

    NIKKEI NET  

    2008.11

  • Software Standard (API) Realizing Low Power Real-time Parallel Processing on Multicores for Consumer Electronics from Different Vendors

    Other

    Author: Other  

    Waseda University Press Release  

    2008.11

  • Selected as the Runner-Up Grand Prix of the '15th Annual LSI of the Year, 2008' -WASEDA ONLINE (YOMIURI ONLINE) Campus Now-

    Internet

    Author: Other  

    WASEDA ONLINE (YOMIURI ONLINE) Campus Now  

    2008.10

  • IEEE Computer Society Election, IEEE Computer Society Officers and Board of Governors Positions in 2009

    Internet

    Author: Other  

    IEEE Computer Society  

    2008.10

  • Selected as the Runner-Up Grand Prix of the '15th Annual LSI of the Year, 2008'

    Other

    Author: Other  

    Waseda University CAMPUS NOW, Vol. 183  

    2008.10

  • LSI of the Year 2008 the Second Grand Prix: LSI:RP2 Having 8CPU Cores and 8RAMs Realizing Independent Power Shut-down: Differentiates by Software Productivity and Extreme Low Consumed Power -Accomplishment by Industry-Academia Collaboration Bringing Advantageous Technologies-

    Newspaper, magazine

    Author: Other  

    Handoutai Sangyo Shimbun  

    2008.09

  • Microprocessor Forum Japan: Software for multicore processor was the focal point -EDN Japan MAGAZINE ARTICLES, Sep. 2008-

    Internet

    Author: Other  

    EDN Japan MAGAZINE ARTICLES, Sep. 2008  

    2008.09

  • Microprocessor Forum Japan: Software for multicore processor was the focal point

    Other

    Author: Other  

    EDN Japan, No.91, pp.19-26  

    2008.09

  • 2008 LSI of The Year The Second Prize. Innovative Low Power Consumption LSI (Renesas, Hitachi, Waseda Univ.)

    Newspaper, magazine

    Author: Other  

    Handoutai Sangyo Shimbun  

    2008.07

  • ECO Computer by Solar Battery? Leading edge multicore technology

    Internet

    Author: Other  

    innovations report  

    2008.07

  • 2008 LSI of the Year

    Internet

    Author: Other  

    Handoutai Sangyo Shimbun HP  

    2008.07

  • Renesas, Hitachi, Waseda Univ. Received 2008 LSI of The Year The Second Prize

    Newspaper, magazine

    Author: Other  

    Handoutai Sangyo Shimbun  

    2008.07

  • ECO Computer by Solar Battery -Leading edge multicore technology-

    Internet

    Author: Other  

    WASEDA ONLINE (YOMIURI ONLINE)  

    2008.07

  • Multicore LSI Developed by Waseda Univ., Hitachi, and Renesas Won 2008 LSI of the Year Second Prize

    Internet

    Author: Other  

    Waseda University HP  

    2008.07

  • 2008 LSI of the Year

    Internet

    Author: Other  

    MYCOM Journal  

    2008.07

  • 2008 LSI of the Year

    Internet

    Author: Other  

    Denshi Journal  

    2008.07

  • MPSOC '08, Live from Maastricht: Got SMP? Need Auto Parallelization? Just add Multigrain OSCAR

    Internet

    Author: Other  

    Electronics Design, Strategy, News -Leibson's Law-  

    2008.07

  • Microprocessor Forum Japan 2008, A Highlight is Advanced Technology of Processor for Small Devices.

    Internet

    Author: Other  

    @IT MONOist  

    2008.07

  • Small-Footprint, Power-Saving, and High-Performance Deskside Computer Reduces Software Development Period

    Other

    Author: Other  

    sgi news, No.43, pp.8  

    2008.07

  • Case Introduction, Dept. of Computer Science, Waseda Univ.: Parallelizing Compiler Research for Multicore Processor. SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ. : Small-Footprint, Power-Saving, and High-Performance Deskside Computer Reduce Software Development Period

    Internet

    Author: Other  

    SGI e-News  

    2008.06

  • 2. Accomplishments of 'Multicore Technology for Real Time Consumer Electronics' in Semiconductor Application Chip Project were Introduced in the Council for Science and Technology Policy as a Next Generation IT Energy Saving Technology

    Other

    Author: Other  

    Brochure about NEDO Electronic & Information Technology Development Dept. pp.13  

    2008.06

  • Cool Chips XI -Panel Discussion-

    Internet

    Author: Other  

    MYCOM Journal  

    2008.05

  • Development of Multicore Technology for Efficient Development of Consumer Electronics, Waseda Univ.

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2008.05

  • Situation of Research of Multicore CPU and Expectation from User View

    Internet

    Author: Other  

    IPSJ Special Interest Group on Computer Architecture: Panel Discussion "Multicore Strategy of a New Era" (Dr. Fukunaga, Hitachi)  

    2008.05

  • Cool Chips XI -Multicore Compiler Realizing Power-Saving and High-Performance-

    Internet

    Author: Other  

    MYCOM Journal  

    2008.05

  • Upcoming Epoch of Multicore Processor

    Internet

    Author: Other  

    Automotive Electronics Feature  

    2008.05

  • Cool Chips XI -Remarkable Papers-

    Internet

    Author: Other  

    MYCOM Journal  

    2008.05

  • Highlights of ESEC2008

    Other

    Author: Other  

    EDN Japan, No.87, pp.72-73  

    2008.05

  • Upcoming Epoch of Multicore Processor

    Other

    Author: Other  

    Automotive Electronics, Vol.2, pp.52-55  

    2008.05

  • Development of Low Power Consumption Technology of Multicore LSI for Consumer Electronics

    Other

    Author: Other  

    RENESAS Edge, Vol.21 pp.6  

    2008.04

  • Development of Low Power Consumption Technology of Multicore LSI for Consumer Electronics

    Internet

    Author: Other  

    Waseda Univ. CAMPUS NOW Online  

    2008.04

  • Developed multicore was introduced in the CSTP at the Prime Minister's office

    Internet

    Author: Other  

    Council for Science and Technology Policy 74th session  

    2008.04

  • Development of Low Power Consumption Technology of Multicore LSI for Consumer Electronics

    Other

    Author: Other  

    Waseda University CAMPUS NOW, Vol. 180  

    2008.04

  • Development of Low Power Consumption Technology of Multicore LSI

    Other

    Author: Other  

    Hitachi Environmental Report 2008  

    2008.04

  • Cover Story: IT warming

    Internet

    Author: Other  

    asahi.com  

    2008.03

  • Low Power Consumption Technology of Multicore LSI

    Internet

    Author: Other  

    Japan Edition Semiconductor International  

    2008.03

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ. Introduction of Efforts and Goal of Research and Development

    Internet

    Author: Other  

    SGI e-News No.94  

    2008.02

  • IT Apparatus : Pressing need for saving energy

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2008.02

  • Midrange Server : Waseda Kasahara Lab

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun Data Communication  

    2008.02

  • Rock and Tukwila Are the Stars of ISSCC This Week

    Internet

    Author: Other  

    The Unix Guardian  

    2008.02

  • Waseda etc. Developed Low Power Consumption Technology of Multicore LSI

    Internet

    Author: Other  

    Semiconductor Japan Net  

    2008.02

  • ISSCC 2008 : Tilera, Tile 64 Compared with Renesas Technology's 8 core chip

    Internet

    Author: Other  

    Mycomi Journal  

    2008.02

  • [ISSCC 2008]Large Reduction of Consumed Power with Compiler Collaboration

    Internet

    Author: Other  

    EE TIMES Japan  

    2008.02

  • Renesas etc Developed Low Power Consumption Technology of Multicore LSI by Parallelizing Compiler

    Internet

    Author: Other  

    Mycomi Journal  

    2008.02

  • Hitachi, Renesas, Waseda Univ. etc Developed Low Power Consumption Technology of Multicore LSI

    Internet

    Author: Other  

    Micro Technology Business  

    2008.02

  • Renesas etc Developed Low Power Consumption Technology of Multicore LSI by Parallelizing Compiler

    Internet

    Author: Other  

    media jam  

    2008.02

  • Development of Low Power Consumption Technology for Multicore LSI for Consumer Electronics

    Internet

    Author: Other  

    ELISNET  

    2008.02

  • Multicore LSI Reduction of Consumed Power by Compiler Collaboration

    Newspaper, magazine

    Author: Other  

    Japan Chemical Daily  

    2008.02

  • 'ISSCC 2008 Previous Report' Low Power Consumption Processor

    Internet

    Author: Other  

    Yahoo News  

    2008.02

  • Waseda, Hitachi, Renesas Developed Low Power Consumption Technology for Consumer Electronics

    Internet

    Author: Other  

    Yahoo News  

    2008.02

  • 'ISSCC 2008 Previous Report' Low Power Consumption Processor

    Internet

    Author: Other  

    Yahoo Game  

    2008.02

  • Waseda Univ., Hitachi, Renesas Developed Low Power Consumption Technology for Multicore LSI for Consumer Electronics

    Internet

    Author: Other  

    Waseda Univ. HP  

    2008.02

  • Renesas etc. 80% Consumed Power Reduction in Digital Consumer Electronics LSI

    Internet

    Author: Other  

    NIKKEI NET  

    2008.02

  • Information Processing Software Renesas etc. Reduced 80% consumed Power

    Internet

    Author: Other  

    Nikkei Navi 2008  

    2008.02

  • Waseda, Hitachi, Renesas Developed Low Power Consumption Technology for Consumer Electronics

    Internet

    Author: Other  

    Nikkan Kogyo Shimbun Business Line  

    2008.02

  • Waseda Univ., Hitachi, Renesas Developed Low Power Consumption Technology for Multicore LSI for Consumer Electronics

    Internet

    Author: Other  

    Kabuka Zairyo  

    2008.02

  • 'ISSCC 2008 Previous Report' Low Power Consumption Processor

    Internet

    Author: Other  

    infoseek News  

    2008.02

  • 'ISSCC 2008 Previous Report' Low Power Consumption Processor

    Internet

    Author: Other  

    Impress Watch  

    2008.02

  • Waseda, Hitachi, Renesas Developed Low Power Consumption Technology for Consumer Electronics (Prof. Hironori Kasahara, Dept. of Computer Science)

    Internet

    Author: Other  

    Faculty of Science and Engineering, Waseda Univ. HP  

    2008.02

  • Renesas, Hitachi, Waseda Univ. Co-developed Low Power Consumption Technology for Multicore LSI by Parallelizing Compiler

    Internet

    Author: Other  

    EDA News  

    2008.02

  • Renesas, Hitachi, Waseda Univ. Co-developed Low Power Consumption Technology for Multicore LSI by Parallelizing Compiler

    Internet

    Author: Other  

    EDA Express  

    2008.02

  • Waseda, Hitachi, Renesas Developed Low Power Consumption Technology for Consumer Electronics

    Internet

    Author: Other  

    asahi.com  

    2008.02

  • Development of Low Power Consumption Technology of Multicore LSI for Consumer Electronics

    Other

    Author: Other  

    Waseda University Press Release  

    2008.02

  • Development of Low Power Consumption Technology of Multicore LSI for Consumer Electronics

    Other

    Author: Other  

    Renesas Technology Press Release  

    2008.02

  • Development of Low Power Consumption Technology of Multicore LSI for Consumer Electronics

    Other

    Author: Other  

    Hitachi Press Release  

    2008.02

  • 80% Consumed Power Reduction in Digital Consumer Electronics LSI

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2008.02

  • Parallelizing Compiler Technology for Multicores : Development by Waseda, Hitachi, Renesas

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2008.02

  • Development of LSI Consumed Power Reduction Technology for Consumer Electronics Waseda, Hitachi, etc.

    Newspaper, magazine

    Author: Other  

    Denki Shimbun  

    2008.02

  • Waseda, Hitachi, Renesas Development of Low Power Consumption Technology for Multicore LSI

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2008.02

  • ISSCC 2008 Preview : Microprocessor Session

    Internet

    Author: Other  

    Mycomi Journal  

    2008.01

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ. Contribution to Research Theme 'Improvement of Computer's Processing Speed and Reduction of Software Development Period'

    Internet

    Author: Other  

    SGI e-News No.91  

    2008.01

  • Deskside Super Computer Supports Development of Automatic Parallelizing Compilers for Multicore

    Internet

    Author: Other  

    YahooNews  

    2008.01

  • Deskside Super Computer Supports Development of Automatic Parallelizing Compilers for Multicore

    Internet

    Author: Other  

    NEWS@nifty  

    2008.01

  • Deskside Super Computer Supports Development of Automatic Parallelizing Compilers for Multicore

    Internet

    Author: Other  

    livedoor News  

    2008.01

  • Deskside Super Computer Supports Development of Automatic Parallelizing Compilers for Multicore

    Internet

    Author: Other  

    infoseekNews  

    2008.01

  • High Performance Digital : Coping with Consumed Power

    Other

    Author: Other  

    Nikkei Electronics, No.969 (Jan. 14, 2008)  

    2008.01

  • SGI Japan Installed Compact Server to Waseda Univ.

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2008.01

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ.

    Internet

    Author: Other  

    YahooNews  

    2007.12

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ.

    Internet

    Author: Other  

    webBCN  

    2007.12

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ.

    Internet

    Author: Other  

    Excite News Press Release  

    2007.12

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ.

    Internet

    Author: Other  

    asahi.com  

    2007.12

  • SGI Japan Installed 3 Mid Range Servers to Kasahara Lab, Waseda Univ. for Parallelizing Compiler Research

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2007.12

  • SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ. : Small-Footprint, Power-Saving, and High-Performance Deskside Computer Reduces Software Development Period

    Internet

    Author: Other  

    Security Online News  

    2007.12

  • Parallelizing Compiler Research for Multicore Processor. SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ. : Small-Footprint, Power-Saving, and High-Performance Deskside Computer Reduces Software Development Period

    Other

    Author: Other  

    Waseda University Press Release  

    2007.12

  • Parallelizing Compiler Research for Multicore Processor. SGI Japan Installed Mid Range Servers 'Altix 450' to Kasahara Lab, Waseda Univ. : Small-Footprint, Power-Saving, and High-Performance Deskside Computer Reduces Software Development Period

    Other

    Author: Other  

    SGI Japan Press Release  

    2007.12

  • Announcement of Waseda Univ. 125th & Faculty of Science and Engineering 100th Anniversary Symposium 'Innovative Information, Electronics, and Optical Technology'

    Internet

    Author: Other  

    Faculty of Science and Engineering, Waseda Univ. HP, News & Events  

    2007.09

  • Announcement of Waseda Univ. 125th & Faculty of Science and Engineering 100th Anniversary Symposium 'Innovative Information, Electronics, and Optical Technology'

    Internet

    Author: Other  

    Waseda Univ. HP, News & Events  

    2007.09

  • Waseda University Hironori Kasahara Laboratory : A New Value Creation by Combining Research and Needs for Multicore Technology

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2007.09

  • Waseda University: Innovation of University Management in the 125th Anniversary, Global Navi

    TV or radio program

    Author: Other  

    TBS   BSi TV Program  

    2007.07

  • Development of Multicore Technology to Reduce Development Period of Consumer Electronics

    Other

    Author: Other  

    Waseda University CAMPUS NOW, Vol. 173  

    2007.07

  • Development of Multicore Technology : Shorten development period of consumer electronics

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2007.06

  • Parallelizing Compiler Technology for Multicores : Development by Waseda, Hitachi, Renesas

    Other

    Author: Other  

    EE Times Japan E-mail News Letter (no.98)  

    2007.06

  • Report on Next Generation Supercomputer Concept Design Evaluation

    Other

    Author: Other  

    MEXT Next Generation Supercomputer Concept WG  

    2007.06

  • Waseda Univ. etc. Developed a New Multicore Technology for Quick Software Development of Consumer Electronics

    Newspaper, magazine

    Author: Other  

    Nihon Joho Sangyo Shimbun  

    2007.06

  • Parallelizing Compiler Technology for Multicores : Development by Waseda, Hitachi, Renesas

    Internet

    Author: Other  

    EE TIMES Japan  

    2007.06

  • Development of Consumer Electronics Software : High Speed Processing by Multicore Technology

    Newspaper, magazine

    Author: Other  

    Japan Chemical Daily  

    2007.06

  • Development of Multicore Technology for Efficient Development of Consumer Electronics

    Newspaper, magazine

    Author: Other  

    Denkei Shimbun  

    2007.06

  • Waseda Univ. and Hitachi etc. Developed Multicore Technology to Reduce Development Period of Consumer Electronics

    Internet

    Author: Other  

    Kankyoubu.com  

    2007.06

  • Waseda Univ. and Hitachi etc. Developed Multicore Technology to Reduce Development Period of Consumer Electronics

    Internet

    Author: Other  

    IPNEXT  

    2007.06

  • Hitachi etc. Established Technology to Reduce Development Period of Multicore LSI

    Internet

    Author: Other  

    Yahoo Japan News(Nikkan Kogyo Shimbun)  

    2007.06

  • Waseda Univ. etc. Co-developed Multicore Technology to Reduce Development Period of Consumer Electronics

    Internet

    Author: Other  

    Semiconductor Japan Net  

    2007.06

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics

    Internet

    Author: Other  

    Nippon R&D Community  

    2007.06

  • Waseda Univ. etc. Developed Multicore Software : Enhancing Performance of Digital Consumer Electronics

    Internet

    Author: Other  

    NIKKEI NET IT PLUS  

    2007.06

  • Hitachi, Waseda Univ., Renesas Show off Power of Parallelizing Compiler Technology for Multicore SoC

    Internet

    Author: Other  

    Nikkei Electronics Tech On  

    2007.06

  • Hitachi, Waseda Univ., Renesas Show off Power of Parallelizing Compiler Technology for Multicore SoC

    Internet

    Author: Other  

    News about Semiconductor and Car Electronics  

    2007.06

  • Hitachi etc. Developed Multicore Technology to Reduce Development Period of Consumer Electronics

    Internet

    Author: Other  

    Micro Technology Business  

    2007.06

  • Hitachi etc. Established Technology to Reduce Development Period of Multicore LSI

    Internet

    Author: Other  

    Kabuka Zairyo  

    2007.06

  • Waseda Univ. and Hitachi etc. Developed Multicore Technology to Reduce Development Period of Consumer Electronics

    Internet

    Author: Other  

    IBTimes  

    2007.06

  • Waseda Univ. and Hitachi etc. Developed Multicore Technology to Reduce Development Period of Consumer Electronics

    Internet

    Author: Other  

    CMSNAVI  

    2007.06

  • Hitachi etc. Established Technology to Reduce Development Period of Multicore LSI

    Internet

    Author: Other  

    asahi.com  

    2007.06

  • Waseda Univ. etc. Developed Multicore Software

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2007.06

  • Parallel Processing on Multicore LSI

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2007.06

  • Parallel Processing Software for Consumer Electronics Development by Waseda, Hitachi, etc.

    Newspaper, magazine

    Author: Other  

    Nihon Keizai Shimbun  

    2007.06

  • Waseda/Hitachi/Renesas Developed Multicore Technology to Reduce Software Development Period

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2007.06

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics -NIKKEI NET-

    Internet

    Author: Other  

    NIKKEI NET  

    2007.05

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics -Matsui Securities-

    Internet

    Author: Other  

    Matsui Securities  

    2007.05

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics -JCN Network-

    Internet

    Author: Other  

    JCN Network  

    2007.05

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics -Infoseek Money-

    Internet

    Author: Other  

    Infoseek Money  

    2007.05

  • NEDO Roadmap Committee Report

    Other

    Author: Other  

    NEDO Electronics and Information Technology Roadmap  

    2007.05

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics

    Other

    Author: Other  

    Waseda University Press Release  

    2007.05

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics

    Other

    Author: Other  

    Renesas Technology Press Release  

    2007.05

  • Waseda Univ., Hitachi, Renesas Technology Developed Multicore Technology to Shorten Development Period of Consumer Electronics

    Other

    Author: Other  

    Hitachi Press Release  

    2007.05

  • Introduction of Kasahara Research Group: Compiler Cooperated Chip Multiprocessor

    Other

    Author: Other  

    Nikkei Microdevices Special Edition 2007  

    2007.05

  • SH-4A Multicore

    Other

    Author: Other  

    RENESAS Edge Vol.17 pp.04  

    2007.04

  • Special Issue on Advanced Embedded Microprocessors

    Other

    Author: Other  

    RENESAS Edge Vol.17 pp.04  

    2007.04

  • Microprocessor

    Other

    Author: Other  

    HITACHI 2007Spring pp.13-15  

    2007.04

  • What do supercomputers do?

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2007.01

  • Toward the Fastest Supercomputers

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2006.11

  • Arm Forum 2006 - Potential of Multi-core Compilers

    Internet

    Author: Other  

    MYCOM Journal  

    2006.10

  • ARM Forum 2006- Cortex Family and Compilers for Multi-core

    Internet

    Author: Other  

    MYCOM Journal  

    2006.10

  • Creation of the World's Best Parallelizing Compiler: Toward the Era of Multi-core Everywhere in the 21st Century

    Other

    Author: Other  

    IBM High Performance Computing Case Introduction  

    2006.09

  • A special Theme Case: Strengthening International Competitiveness: Real-time Multi-core for Consumer Electronics

    Other

    Author: Other  

    MEXT the 10th Convention for Coordinators of Collaboration of Business, Academia and Government  

    2006.09

  • IV. Outline of Completed Projects (19) Research and Development of Advanced Parallelizing Compiler

    Other

    Author: Other  

    Brochure about NEDO Electronic & Information Technology Development Dept. pp.60  

    2006.09

  • Science & Technology Research Contest JSEC An Important Chance for High School Students to Know the Fun of Research and Development

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2006.06

  • "Is There a Limit of Speedup of Supercomputers?" Hot Science Nattoku Kagaku

    Newspaper, magazine

    Author: Other  

    Yomiuri Shimbun  

    2006.01

  • II. Outline of Projects - The fields of Semiconductor Technology - (2) Semiconductor Application Chip Project (Field of Information Appliances)

    Other

    Author: Other  

    Brochure about NEDO Electronic & Information Technology Development Dept. pp.20-21  

    2006.01

  • Face-to-face communication! This is distinctive advantage of Tokyo.

    Other

    Author: Other  

    Nikkei BP Mook, School of Science and Engineering, Waseda University 2006-2007, pp. 39  

    2005.12

  • - Site of Academic/Research Field - 'Learning about Theory of Opinion Reader'

    Other

    Author: Other  

    PC-Webzine, Vol.165, pp.100  

    2005.11

  • Chapter 1 Closing Up of the Forefront of Science and Technology "2. Overwhelming the World with Parallelizing Compiler and Multicore Processor"

    Other

    Author: Other  

    Chuokoron-Shinsha, Advanced Research from Laboratories, Energetic Research Activities in Waseda University, pp.28-37  

    2005.09

  • Adopted for Semiconductor Application Chip Project at NEDO 'Research and Development of Real-Time Multi-Core Technology for Information Appliances' Waseda University, Hitachi Ltd. and Renesas Technology Corp. (Project leader / Prof. Hironori Kasahara)

    Internet

    Author: Other  

    Waseda University Liaison office Press Release  

    2005.07

  • NEDO Adds Four New Themes to 'Semiconductor Application Chip Project'

    Internet

    Author: Other  

    NEDO Press Release  

    2005.06

  • 'Japanese Hinomaru Processor' Cooperation between Hitachi and Waseda University (NEDO Matching Fund), Research of Advanced Heterogeneous Multiprocessor

    Internet

    Author: Other  

    The 4th Conference for the Promotion of Collaboration among Business, Academia, and Government, Special Lecture by Dr. Kenji Takeda (RIKEN Executive Director, Former Hitachi General Manager of Research Alliance in Headquarters of Research and Development)  

    2005.06

  • Easily Understandable! 'Electronics and IT Field: Advanced Parallelizing Compiler Project'

    Internet

    Author: Other  

    NEDO  

    2005.01

  • Japanese Universities and Research Institutes Embrace Cosy. Waseda University and Tokyo University enter into advanced compiler research with compiler development system from ACE

    Internet

    Author: Other  

    Cosy 2004 Announcement  

    2004.11

  • Current and Future of Automatic Parallelizing Compilers

    Other

    Author: Other  

    Mycom PC Web  

    2004.11

  • Development of Speeding up and Power Saving Multi-core processor for Mobile Phone, Hitachi and Waseda Univ.

    Internet

    Author: Other  

    Nikkei Net IT Business & News  

    2004.10

  • Unique Industry-University Cooperation between Hitachi and Waseda Univ. Expatriate Employees Give Their Practical Business Know-how to Students in English.

    Newspaper, magazine

    Author: Other  

    Shinano Mainichi Shimbun  

    2004.10

  • Framework Agreement between Hitachi and Waseda Univ. --Having its beginning with research and development of Multi-core Processor

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2004.10

  • Cooperative Research and Development by Hitachi and Waseda Univ.

    Newspaper, magazine

    Author: Other  

    Nihon Keizai Shimbun  

    2004.10

  • Comprehensive Cooperation between Waseda Univ. and Hitachi, in the Fields of Research and Education

    Newspaper, magazine

    Author: Other  

    Kensetsutsushin Shimbun  

    2004.10

  • Framework Agreement on Industry-University Cooperation

    Newspaper, magazine

    Author: Other  

    FujiSankei Business i  

    2004.10

  • Framework cooperation between Hitachi and Waseda Univ. --Promotion of Exchange in Many Fields, Human Resource, Technology and Information.

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2004.10

  • Framework Agreement on Industry-University Cooperation between Hitachi and Waseda Univ. Starting Their Cooperative Research and Development of Multi-core Microprocessor as Their First Shot

    Internet

    Author: Other  

    Tech On!  

    2004.09

  • Framework Agreement on Industry-University Cooperation between Hitachi and Waseda Univ. Promoting Development of Semiconductor, Robot and so on, as an Important Pillar.

    Internet

    Author: Other  

    Nikkeibp.jp for Technology & Business  

    2004.09

  • Waseda University and Hitachi Group concluded an agreement for Comprehensive Academic Industrial Collaboration to start comprehensive cooperation

    Other

    Author: Other  

    Waseda University Press Release  

    2004.09

  • Toward Multi-core Processors from Single-core.

    Other

    Author: Other  

    Nikkei Electronics Vol.8, pp.97-121, 2004  

    2004.08

  • Upbringing of Human Resources by Collaboration of Industry and Academia

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2004.04

  • Assistant professors are engineers of the first class Education in industry-university cooperation

    Other

    Author: Other  

    Nikkei Shingaku Guide  

    2004.01

  • Development of Single Chip Multicore Architecture

    Other

    Author: Other  

    Nikkei Microdevices Special Edition 2004 (Reader for Job Hunting in Semiconductor Industry)  

    2004.01

  • Success in Software Development Speeding up Parallel Computer more than 10 times in Industry-Academia-Government Collaboration Project

    Internet

    Author: Other  

    Digital JECC NEWS  

    2003.05

  • Research and Development of Advanced Parallelizing Compiler

    Other

    Author: Other  

    Focus NEDO Vol. 3, No. 9  

    2003.05

  • Industry, Government, Academia Cooperation Toward Strengthening IT Competitive Power

    Internet

    Author: Other  

    WASEDA.COM on asahi.com  

    2003.04

  • MOT: Upbringing Future Engineers

    Other

    Author: Other  

    Nikkei Electronics No.845, pp.106-107, 2003  

    2003.04

  • Published in the newspapers

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun, Dempa Shimbun, Yomiuri Shimbun  

    2003.04

  • JEITA Donated Lecture 'IT Forefront' in Kansai

    Internet

    Author: Other  

    JEITA  

    2003.03

  • New Compiler Technology Easily Develops Parallel Application Programs

    Internet

    Author: Other  

    IT Pro News  

    2003.03

  • Boosting up High Performance Computer's Speed---APC Technology Speeding up 10.7 times at Maximum, 3.5 times on Average, Several Years Advanced Performance without Hardware Change

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2003.03

  • Success in Software Development Speeding up more than 10 times

    Internet

    Author: Other  

    METI Press Release  

    2003.03

  • 10-fold Speedup of Parallel Computer

    Internet

    Author: Other  

    @braina.com  

    2003.03

  • Prof. Kasahara, Waseda Univ. etc. Developed Parallelizing Compiler Technology Speeding up Parallel Computer 10 times, Presenting the Accomplishment in International Symposium

    Newspaper, magazine

    Author: Other  

    The Science News  

    2003.03

  • 'Advanced Parallelizing Compiler Project' Boosting up Performance of Parallel Computers 10 times

    Newspaper, magazine

    Author: Other  

    Japan Chemical Daily  

    2003.03

  • Parallelizing Compiler Realizing Speedup of 3.5 times on Average

    Internet

    Author: Other  

    Kure JBC  

    2003.03

  • Fujitsu etc. Development of Software Speeding up Parallel Computer more than 10 times

    Internet

    Author: Other  

    ZDNet News  

    2003.02

  • 10-fold Computing Speed in Parallel Computer, New Software Development

    Internet

    Author: Other  

    Yahoo Japan News(Yomiuri Shimbun)  

    2003.02

  • JIPDEC etc. Developed Software Speeding up the Latest Parallel Computer

    Internet

    Author: Other  

    The Japan Industrial Journal  

    2003.02

  • Cooperative Development of Software Speeding up Parallel Computer more than 10 times

    Internet

    Author: Other  

    Nikkan Kogyo Shimbun  

    2003.02

  • Waseda Univ. etc. Software for Parallel Computers Speeding up 10 times (06:52)

    Internet

    Author: Other  

    Nihon Keizai Shimbun IT Business & News  

    2003.02

  • NEDO Develops Software Speeding up Computer Operation 10 times (14:46)

    Internet

    Author: Other  

    KYODO NEWS  

    2003.02

  • "Development of High Speed Operation Software" by Collaboration of Academia and Industry (Hitachi etc.)

    Newspaper, magazine

    Author: Other  

    Yomiuri Shimbun  

    2003.02

  • "Speeding up the Latest Parallel Computer more than 10 times" JIPDEC etc.

    Newspaper, magazine

    Author: Other  

    The Japan Industrial Journal  

    2003.02

  • 'Software Speeding up Processing 10-fold' Parallel Computer Waseda Univ. etc. Development

    Newspaper, magazine

    Author: Other  

    Nikkei Sangyo Shimbun  

    2003.02

  • "Speeding up Parallel Computers more than 10 times" JIPDEC Parallelizing Compiler Development

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    2003.02

  • "Development of Software Speeding up Parallel Computers 10 times" by Hitachi, Waseda, etc.

    Newspaper, magazine

    Author: Other  

    Nihon Keizai Shimbun  

    2003.02

  • "Compiler Development Speeding up Parallel Computers" NEDO

    Newspaper, magazine

    Author: Other  

    Japan Chemical Daily  

    2003.02

  • "Speeding up Parallel Computers" Cooperative Development by JIPDEC, Fujitsu and so on

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun  

    2003.02

  • Promotion of JEITA Donated Lecture 'IT Forefront' by Collaboration of Industry and Academia

    Internet

    Author: Other  

    BCN  

    2003.02

  • Success in Development of Speeding up Parallel Computers over 10 times

    Other

    Author: Other  

    Fujitsu Press Release  

    2003.02

  • Success in Development of Software Speeding up Parallel Computer more than 10 times

    Internet

    Author: Other  

    WASEDA.COM on asahi.com  

    2003.02

  • New Software Development Speeding up Parallel Computer 10 times (23:23)

    Internet

    Author: Other  

    Yomiuri Shimbun  

    2003.02

  • Speeding up 10 times at Maximum: NEDO, New Software Development

    Internet

    Author: Other  

    The Kyoto Shimbun  

    2003.02

  • Hitachi, Cooperative Development of Compiler Speeding up Parallel Computer more than 10 times

    Internet

    Author: Other  

    PC Web  

    2003.02

  • Development of Software Speeding up Parallel Computer 10 times

    Internet

    Author: Other  

    nth dimension  

    2003.02

  • Parallelizing Compiler Realizing Speedup of 3.5 times on Average (19:12)

    Internet

    Author: Other  

    Nikkei Business Publications  

    2003.02

  • Fukushima Mimpo Internet

    Internet

    Author: Other  

    Fukushima Mimpo  

    2003.02

  • Success in Software Development Speeding up Parallel Computer more than 10 times

    Internet

    Author: Other  

    Doshisha University  

    2003.02

  • The Part 3 Policy for Science and Technology Promotion

    Internet

    Author: Other  

    2002 MEXT Science Technology White Paper  

    2002.06

  • Upbringing of IT Engineers by Collaboration of Industry and Academia

    TV or radio program

    Author: Other  

    NHK  

    2002.04

  • TV News Lecture in University by Collaboration of Industry and Academia

    TV or radio program

    Author: Other  

    NHK   Business Front Line  

    2002.04

  • Upbringing of IT Engineers by Collaboration of Industry and Academia

    TV or radio program

    Author: Other  

    NHK  

    2002.03

  • Opening of JEITA Donated Lecture 'IT Forefront'

    Internet

    Author: Other  

    The Japan Association of Private Colleges and Universities HP  

    2002.03

  • Promotion of JEITA Donated Lecture 'IT Forefront' by Collaboration of Industry and Academia (Web BCN)

    Internet

    Author: Other  

    Mycom PC Web  

    2002.03

  • Promotion of JEITA Donated Lecture 'IT Forefront' by Collaboration of Industry and Academia

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    2002.03

  • Electronic Manufacturer Engineers Dispatched to teach in JEITA Donated Lecture 'IT Forefront' at Tokyo Univ., Waseda Univ. etc.

    Newspaper, magazine

    Author: Other  

    Yomiuri Shimbun  

    2002.03

  • Opening of JEITA Donated Lecture 'IT Forefront'

    Internet

    Author: Other  

    Waseda Univ. News Flash  

    2002.03

  • Collaboration of JEITA and Tokyo Univ. etc. Opening of Donated Lecture 'IT Forefront' Taught by Engineers in IT Firms.

    Internet

    Author: Other  

    Nikkei Press Release  

    2002.03

  • Courses on Demand by 9 IT Firms at 3 Universities (Tokyo Univ. etc.) - Supported by METI for Human Resource Development

    Internet

    Author: Other  

    LYCOS NEWS  

    2002.03

  • Establishment of JEITA Donated Lecture 'IT Forefront'

    Internet

    Author: Other  

    JEITA  

    2002.03

  • JEITA Donated Lecture 'IT Forefront' Held by IT firms at Universities (Tokyo Univ. etc.)

    Newspaper, magazine

    Author: Other  

    Sanyo Shimbun  

    2002.03

  • Close-up: Introduction of New Projects: Research and Development of Advanced Parallelizing Compiler

    Promotional material

    Author: Other  

    NEDO BEST MIX vol.47  

    2001.03

  • Hundredfold Speed Computer Technology

    Newspaper, magazine

    Author: Other  

    Nikkan Kogyo Shimbun  

    1999.02

  • Facing Limit of Supercomputer, The Coming of Age of Super Parallel Computer

    Newspaper, magazine

    Author: Other  

    Asahi Shimbun  

    1993.12

  • Flourishing Super Parallel Computer Market

    Newspaper, magazine

    Author: Other  

    Computer World  

    1991.04

  • Perspective on Parallel Computers

    Newspaper, magazine

    Author: Other  

    Dempa Shimbun Data Communication  

    1991.01

  • Waseda Univ. Developed High Performance Compiler

    Newspaper, magazine

    Author: Other  

    The Japan Industrial Journal  

    1990.05

  • Research on parallel processing computer

    Newspaper, magazine

    Author: Other  

    Juken Kouza  

    1989.04

  • Interview with Dr. Hironori Kasahara, Waseda Univ. -Parallel Processing Method is Considered as Creative and Innovative -

    Newspaper, magazine

    Author: Other  

    Nihon Joho Sangyo Shimbun  

    1988.10

  • Waseda Univ. Developed High Speed Parallel Processing Computer

    Newspaper, magazine

    Author: Other  

    Nihon Joho Sangyo Shimbun  

    1988.08

  • Purchase of 1,000,000 High Performance Workstations for Network Construction

    Newspaper, magazine

    Author: Other  

    The Science News  

    1988.02

  • Dr. Hironori Kasahara Received Young Author Prize: IFAC Triennial World Congress

    Promotional material

    Author: Other  

    Waseda Weekly, Vol.554  

    1987.11

  • OSCAR Multi Processing -World class Computer Using Scheduling Theory

    Newspaper, magazine

    Author: Other  

    Waseda Gakusei Shimbun  

    1987.10

  • Waseda Univ. Developing Parallel Prolog Processing System on General Multiprocessor

    Newspaper, magazine

    Author: Other  

    Nikkei AI  

    1987.09

  • Waseda University Open Innovation Strategy

    Promotional material

    University & College Management  


 

Papers

  • Parallelizing Ladder Applications with Task Fusion Techniques for Reducing Parallelization Overhead by OSCAR Automatic Parallelizing Compiler

    Tohma Kawasumi, Hiroki Mikami, Tomoya Yoshikawa, Takero Hosomi, Shingo Oidate, Keiji Kimura, Hironori Kasahara

    Trans. of IPSJ   65 ( 2 ) 539 - 551  2024.02  [Refereed]

    Authorship:Last author

  • Automatic Deep Learning Parallelization for Vector Multicore Chips with the OSCAR Parallelizing and the TVM Open-Source Deep Learning Compiler

    Fumiaki Onishi, Ryosei Otaka, Kazuki Fujita, Tomoki Suetsugu, Tohma Kawasumi, Toshiaki Kitamura, Hironori Kasahara, Keiji Kimura

    Proc. of The 36th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2023), Lexington, Kentucky, USA.    2023.10  [Refereed]

  • Investigation of code generation techniques for vector multicore targeting using the deep learning compiler TVM

    Fumiaki Onishi, Ryosei Otaka, Kazuki Fujita, Tomoki Suetsugu, Tohma Kawasumi, Toshiaki Kitamura, Hironori Kasahara, Keiji Kimura

    IPSJ SIG Technical Report   2023-ARC-254 ( 8 ) 1 - 8  2023.08  [Refereed]

  • Evaluation of Convolution Layers on an Embedded Vector Multicore having Local Memory Architecture

    Ryosei OTAKA, Honoka KOIKE, Ryusei ISONO, Toma KAWASUMI, Toshiaki KITAMURA, Hiroki MIKAMI, Akira NODOMI, Sadahiro KIMURA, Keiji KIMURA, Hironori KASAHARA

    IPSJ SIG Technical Report   2023-EMB-62 ( 32 )  2023.03  [Refereed]

    Authorship:Last author

  • Preliminary Evaluation of Low Power Optimized ORB-SLAM3 on Jetson Xavier NX

    Raito HAYASHI, Hiroki MIKAMI, Akira NODOMI, Sadahiro KIMURA, Keiji KIMURA, Hironori KASAHARA

    IEICE Technical Report   CPSY2022-40  2023.03  [Refereed]

    Authorship:Last author

  • Parallelizing Factory Automation Ladder Programs by OSCAR Automatic Parallelizing Compiler

    Tohma Kawasumi, Tsumura Yuta, Hiroki Mikami, Tomoya Yoshikawa, Takero Hosomi, Shingo Oidate, Keiji Kimura, Hironori Kasahara

    Proc. of the 35th International Workshop on Languages and Compilers for Parallel Computing (LCPC2022)    2022.10  [Refereed]

    Authorship:Last author

  • Parallelism Analysis of Ladder Programs by OSCAR Automatic Parallelizing Compiler

    Yuta TSUMURA, Tohma KAWASUMI, Hiroki MIKAMI, Daiki KAWAKAMI, Takero HOSOMI, Shingo OIDATE, Keiji KIMURA, Hironori KASAHARA

    IPSJ SIG Technical Report   ( 53 )  2022.03

    Authorship:Last author

  • LocalMapping Parallelization and CPU Allocation Method on ORB-SLAM3

    Kazuki YAMAMOTO, Takugo OSAKABE, Honoka KOIKE, Tohma KAWASUMI, Kazuki FUJITA, Toshiaki KITAMURA, Akihiro KAWASHIMA, Akira NODOMI, Sadahiro KIMURA, Keiji KIMURA, Hironori KASAHARA

    IEICE Technical Report   121 ( 425, CPSY2021-58 ) 79 - 74  2022.03

    Authorship:Last author

  • Trends in Parallelization Techniques for Embedded Systems

    Keiji Kimura, Dan Umeda, Hironori Kasahara

      66 ( 1 ) 2 - 7  2022.01  [Refereed]  [Invited]

    Authorship:Last author

  • Parallelizing Compiler Translation Validation Using Happens-Before and Task-Set

    Jixin Han, Tomofumi Yuki, Michelle Mills Strout, Dan Umeda, Hironori Kasahara, Keiji Kimura

    2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW)     87 - 93  2021.11  [Refereed]

    DOI

  • OSCAR Parallelizing and Power Reducing Compiler and API for Heterogeneous Multicores : (Invited Paper)

    Hironori Kasahara, Keiji Kimura, Toshiaki Kitamura, Hiroki Mikami, Kazutaka Morita, Kazuki Fujita, Kazuki Yamamoto, Tohma Kawasumi

    2021 IEEE/ACM SC'21 Workshop on Programming Environments for Heterogeneous Computing (PEHC)     10 - 19  2021.11  [Refereed]  [Invited]

    Authorship:Lead author

    DOI

  • Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores

    Birk Martin Magnussen, Tohma Kawasumi, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

       2021.10  [Refereed]

  • Engineering Education in the Age of Autonomous Machines

    Shaoshan Liu, Jean-Luc Gaudiot, Hironori Kasahara

    IEEE Computer   54 ( 4 ) 66 - 69  2021.04  [Refereed]

    Authorship:Last author

  • Automatic Parallelization of MATLAB/Simulink Applications Using OSCAR Compiler

    Ryo Koyama, Yuta Tsumura, Toma Kawasumi, Yuya Nakada, Dan Umeda, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC236@ETNET2021)    2021.03

    Authorship:Last author

  • Parallelization and Vectorization of SpMM for Sparse Neural Network

    Yuta Tadokoro, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC236@ETNET2021)    2021.03

    Authorship:Last author

  • Waseda University Venture Creation and Expectations for 'Lab to Market'

    Hironori Kasahara

    STE Relay Column : Narratives 130, Research Organization for Open Innovation Strategy, Science, Technology and Entrepreneurship Research Factory    2021.03  [Invited]

    Authorship:Lead author

  • Computer Education in the Age of COVID-19

    Jean-Luc Gaudiot, Hironori Kasahara

    Computer, January 2020, IEEE Computer Society   53 ( 10 ) 114 - 118  2020.10  [Refereed]

    Authorship:Last author

    DOI

    Scopus

    Citations (Scopus): 17

  • Local Memory Mapping of Multicore Processors on an Automatic Parallelizing Compiler

    Yoshitake Oki, Yuto Abe, Kazuki Yamamoto, Kohei Yamamoto, Tomoya Shirakawa, Akimasa Yoshida, Keiji Kimura, Hironori Kasahara

    IEICE Transaction on Electronics Special Section on “Low-Power and High-Speed Chips”   E103-C ( 3 ) 98 - 109  2020.03  [Refereed]  [Domestic journal]

    Authorship:Last author

    DOI

    Scopus

  • Compiler Software Coherent Control for Embedded High Performance Multicore

    Boma A. Adhi, Tomoya Kashimata, Ken Takahashi, Keiji Kimura, Hironori Kasahara

    IEICE Transaction on Electronics Special Section on “Low-Power and High-Speed Chips”   E103-C ( 3 ) 85 - 97  2020.03  [Refereed]  [Domestic journal]

    Authorship:Last author

    DOI

    Scopus

    Citations (Scopus): 2

  • Consideration of Accelerator Cost Estimation Method in Multi-Target Automatic Parallelizing Compiler

    Kazuki Yamamoto, Kazuki Fujita, Tomoya Kashimata, Ken Takahashi, Boma A. Adhi, Toshiaki Kitamura, Akihiro Kawashima, Akira Nodomi, Yuji Mori, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC232@ETNET2020)    2020.02

    Authorship:Last author

  • Automatic Vector-Parallelization by Collaboration of OSCAR Automatic Parallelizing Compiler and NEC Vectorizing Compiler

    Yuta Tadokoro, Hiroki Mikami, Takeo Hosomi, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC232@ETNET2020)    2020.02

    Authorship:Last author

  • Extensions of OSCAR Compiler for Parallelizing C++ Programs

    Toma Kawasumi, Tilman Priesner, Masato Noguchi, Jixin Han, Hiroki Mikami, Takahiro Miyajima, Keishiro Tanaka, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC232@ETNET2020)    2020.02

    Authorship:Last author

  • Automatically Parallelizing Compiler Cooperative OSCAR Vector Multicore

    Keiji Kimura, Kazuki Fujita, Kazuki Yamamoto, Tomoya Kashimata, Toshiaki Kitamura, Hironori Kasahara

    International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems    2020.02  [Refereed]

    Authorship:Last author

  • Aiming for World Level Research Promotion Considering Safety and Environment

    Hironori Kasahara

    Waseda Univ. Environmental Safety Center "Environment 40th anniversary edition "     3 - 3  2019.11  [Invited]

    Authorship:Lead author

  • Cascaded DMA Controller for Speedup of Indirect Memory Access in Irregular Applications

    Tomoya Kashimata, Toshiaki Kitamura, Keiji Kimura, Hironori Kasahara

    9th Workshop on Irregular Applications: Architectures and Algorithms (IA3 2019)    2019.11  [Refereed]

    Authorship:Last author

  • Fast and Highly Optimizing Separate Compilation for Automatic Parallelization

    Tohma Kawasumi, Ryota Tamura, Yuya Asada, Jixin Han, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    The 2019 International Conference on High Performance Computing & Simulation (HPCS 2019)    2019.07  [Refereed]

    Authorship:Last author

  • 2018 CS PRESIDENT’S MESSAGE --Collaboration for the Future--

    Hironori Kasahara

    Computer, January 2019, IEEE Computer Society   ( 1-19 ) 72 - 76  2019.03  [Refereed]  [Invited]

    Authorship:Lead author

  • Speedup of indirect load by DMA cascading

    Tomoya Kashimata, Toshiaki Kitamura, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan(2018-ARC-234)    2019.01

    Authorship:Last author

  • Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications

    Zhang, F., Zhai, J., Snir, M., Jin, H., Kasahara, H., Valero, M.

    International Journal of Parallel Programming   47 ( 3 )  2019

    DOI

    Scopus

  • Software Cache Coherent Control by Parallelizing Compiler

    Boma A. Adhi, Masayoshi Mase, Yuhei Hosokawa, Yohei Kishimoto, Taisuke Onishi, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Lecture Notes in Computer Science   LNCS 11403. Springer, 2019   17 - 25  2019.01  [Refereed]

    Authorship:Last author

  • NPC: 15th IFIP International Conference Network and Parallel Computing

    Feng Zhang, Jidong Zhai, Marc Snir, Hai Jin, Hironori Kasahara, Mateo Valero

    Lecture Notes in Computer Science   11276 ( LNCS )  2018.11

  • IEEE Division VIII Delegate/Director Candidates

    Hironori Kasahara

    Computer, IEEE Computer Society   50 ( 8 ) 94 - 95  2018.07  [Invited]

  • Development of Compilation Flow and Evaluation of OSCAR Vector Multicore Architecture

    Ken Takahashi, Satoshi Karino, Kazuki Miyamoto, Takumi Kawata, Tomoya Kashimata, Tetsuya Makita, Toshiaki Kitamura, Keiji Kimura, Hironori Kasahara

    The 80th National Convention of Information Processing Society of Japan    2018.03

    Authorship:Last author

  • FPGA implementation of OSCAR Vector Accelerator

    Tomoya Kashimata, Boma A. Adhi, Satoshi Karino, Kazuki Miyamoto, Takumi Kawata, Ken Takahashi, Tetsuya Makita, Toshiaki Kitamura, Keiji Kimura, Hironori Kasahara

    The 80th National Convention of Information Processing Society of Japan    2018.03

    Authorship:Last author

  • Automatic parallelizing and vectorizing compiler framework for OSCAR vector multicore processor.

    Kazuki Miyamoto, Tetsuya Makita, Ken Takahashi, Tomoya Kashimata, Takumi Kawada, Satoshi Karino, Toshiaki Kitamura, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC222@ETNET2018)    2018.03

    Authorship:Last author

  • Preface

    Zhang, F., Zhai, J., Snir, M., Jin, H., Kasahara, H., Valero, M.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   11276 LNCS  2018

  • Satisfaction and Sustainability

    Hironori Kasahara

    Computer IEEE Computer Society   51   4 - 6  2018.01  [Refereed]  [Invited]

    Authorship:Lead author

  • Automatic Local Memory Management Using Hierarchical Adjustable Block for Multicores and Its Performance Evaluation

    Tomoya Shirakawa, Yuto Abe, Yoshitake Ooki, Akimasa Yoshida, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2017-ARC-220 (DesignGaia2017)    2017.11

    Authorship:Last author

  • IEEE President-Elect Candidates Address Computer Society Concerns

    Hironori Kasahara

    Computer, IEEE Computer Society   50 ( 8 ) 96 - 100  2017.08  [Invited]

    Authorship:Corresponding author

  • Multicore Cache Coherence Control by a Parallelizing Compiler

    Hironori Kasahara, Keiji Kimura, Boma A. Adhi, Yuhei Hosokawa, Yohei Kishimoto, Masayoshi Mase

    IEEE COMPSAC 2017 (The 41st IEEE Computer Society International Conference on Computers, Software & Applications)    2017.07  [Refereed]

  • Message from the CAP 2017 Organizing Committee

    Cristina Seceleanu, Hironori Kasahara, Tiberiu Seceleanu

    2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC)    2017.07  [Invited]

    DOI

  • Hierarchical Interconnection Network Extension for Gen 5 Simulator Considering Large Scale Systems

    Tatsuya Onoguchi, Ayane Hayashi, Katsuyuki Utaka, Yuichi Matsushima, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC217@ETNET2017)    2017.03

    Authorship:Last author

  • Parallel Processing of Automobile Real-time Control on Multicore System with Multiple Clusters

    Jin Miyata, Mamoru Shimaoka, Hiroki Mikami, Hirofumi Nishi, Hitoshi Suzuki, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC217@ETNET2017)    2017.03

    Authorship:Last author

  • Code Generating Method with Profile Feedback for Reducing Compilation Time of Automatic Parallelizing Compiler

    Rina Fujino, Jixin Han, Mamoru Shimaoka, Hiroki Mikami, Takahiro Miyajima, Moriyuki Takamura, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on System Architecture (ARC217@ETNET2017)    2017.03

    Authorship:Last author

  • Automatic Local Memory Management for Multicores Having Global Address Space

    Kouhei Yamamoto, Tomoya Shirakawa, Yoshitake Oki, Akimasa Yoshida, Keiji Kimura, Hironori Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2016   10136   282 - 296  2017  [Refereed]

     View Summary

    Embedded multicore processors for hard real-time applications like automobile engine control require the usage of local memory on each processor core to precisely meet the real-time deadline constraints, since cache memory cannot satisfy the deadline requirements due to cache misses. To utilize local memory, programmers or compilers need to explicitly manage data movement and data replacement for local memory considering the limited size. However, such management is extremely difficult and time-consuming for programmers. This paper proposes an automatic local memory management method by compilers through (i) multi-dimensional data decomposition techniques to fit working sets onto limited size local memory, (ii) suitable block management structures, called Adjustable Blocks, to create application specific fixed size data transfer blocks, (iii) multi-dimensional templates to preserve the original multi-dimensional representations of the decomposed multi-dimensional data that are mapped onto one-dimensional Adjustable Blocks, (iv) block replacement policies from liveness analysis of the decomposed data, and (v) code size reduction schemes to generate shorter codes. The proposed local memory management method is implemented on the OSCAR multi-grain and multi-platform compiler and evaluated on the Renesas RP2 8 core embedded homogeneous multicore processor equipped with local and shared memory. Evaluations on 5 programs including multimedia and scientific applications show promising results. For instance, speedups on 8 cores compared to single core execution using off-chip shared memory on an AAC encoder program, an MPEG2 encoder program, Tomcatv, and Swim are improved from 7.14 to 20.12, 1.97 to 7.59, 5.73 to 7.38, and 7.40 to 11.30, respectively, when using local memory with the proposed method. These evaluations indicate the usefulness and the validity of the proposed local memory management method on real embedded multicore processors.

    DOI

    Scopus

    Citations (Scopus): 2

  • Kasahara Voted 2017 Computer Society President-Elect

    Hironori Kasahara, Jean Luc Gaudiot

    Computer, IEEE Computer Society   49 ( 12 ) 90 - 92  2016.12  [Invited]

    DOI

  • Architecture Design for the Environmental Monitoring System over the Winter Season

    Koichiro Yamashita, Takahisa Suzuki, Hongchun Li, Chen Ao, Yi Xu, Jun Tian, Keiji Kimura, Hironori Kasahara

    Proceedings of the 14th ACM International Symposium on Mobility Management and Wireless Access     27 - 34  2016.11  [Refereed]

    DOI

    Scopus

    Citations (Scopus): 2

  • Reducing parallelizing compilation time by removing redundant analysis

    Jixin Han, Rina Fujino, Ryota Tamura, Mamoru Shimaoka, Hiroki Mikami, Moriyuki Takamura, Sachio Kamiya, Kazuhiko Suzuki, Takahiro Miyajima, Keiji Kimura, Hironori Kasahara

    SEPS 2016 - Proceedings of the 3rd International Workshop on Software Engineering for Parallel Systems, co-located with SPLASH 2016     1 - 9  2016.10  [Refereed]

    Authorship:Last author

     View Summary

    Parallelizing compilers employing powerful compiler optimizations are essential tools to fully exploit performance from today's computer systems. These optimizations are supported by both highly sophisticated program analysis techniques and aggressive program restructuring techniques. However, the compilation time for such powerful compilers becomes larger and larger for real commercial applications due to these strong program analysis techniques. In this paper, we propose a compilation time reduction technique for parallelizing compilers. The basic idea of the proposed technique is based on an observation that parallelizing compilers apply multiple program analysis passes and restructuring passes to a source program, but not all program analysis passes have to be applied to the whole source program. Thus, there is an opportunity for compilation time reduction by removing redundant program analysis. We describe techniques for removing redundant program analysis that consider the inter-procedural propagation of analysis update information. We implement the proposed technique in the OSCAR automatic multigrain parallelizing compiler. We then evaluate the proposed technique by using three proprietary large scale programs. The proposed technique can remove 37.7% of program analysis time on average for basic analysis, which includes def-use analysis and dependence calculation, and 51.7% for pointer analysis.

    DOI

    Scopus

    Citations (Scopus): 2

  • A Compilation Framework for Multicores having Vector Accelerators using LLVM

    Akira Maruoka, Yuya Mushu, Satoshi Karino, Takashi Mochiyama, Toshiaki Kitamura, Sachio Kamiya, Moriyuki Takamura, Keiji Kimura, Hironori Kasahara

    Summer United Workshops on Parallel, Distributed and Cooperative Processing, Technical Report of IPSJ,Vol.2016-ARC-221 No.4    2016.08

    Authorship:Last author

  • Multigrain Parallelization of Program for Medical Image Filtering

    Mariko Okumura, Tomoyuki Shibasaki, Kohei Kuwajima, Hiroki Mikami, Keiji Kimura, Kohei Kadoshita, Keiichi Nakano, Hironori Kasahara

    Technical Report of IPSJ, 2016-HPC-153    2016.03

    Authorship:Last author

  • Automatic Multigrain Parallel Processing for 3D Noise Reduction Using OSCAR Compiler

    Tomoyuki Shibasaki, Kohei Kuwajima, Mariko Okumura, Hiroki Mikami, Keiji Kimura, Kohei Kadoshita, Keiichi Nakano, Hironori Kasahara

    Technical Report of IPSJ, 2016-HPC-153    2016.03

    Authorship:Last author

  • The parallelism abstraction method with a data conversion at analysis in an OSCAR compiler

    Naoto Kageura, Tamami Wake, Ji Xin Han, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2016-HPC-153    2016.03

    Authorship:Last author

  • Multigrain Parallelization Using Profile Information of Embedded Applications Generated by Model-based Development Tools on Multicore Processors

    Dan Umeda, Takahiro Suzuki, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Trans. of IPSJ   57 ( 2 ) 1 - 12  2016.02  [Refereed]

    Authorship:Last author

    CiNii

  • Coarse grain task parallelization of earthquake simulator GMS using OSCAR compiler on various cc-NUMA servers

    Mamoru Shimaoka, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9519   238 - 253  2016  [Refereed]

    Authorship:Last author

     View Summary

    This paper proposes coarse grain task parallelization for an earthquake simulation program, the Ground Motion Simulator (GMS), which uses the Finite Difference Method to solve the wave equations in 3-D heterogeneous structure, on various cc-NUMA servers using IBM, Intel and Fujitsu multicore processors. The GMS has been developed by the National Research Institute for Earth Science and Disaster Prevention (NIED) in Japan. Earthquake wave propagation simulations are important numerical applications to save lives through damage predictions of residential areas by earthquakes. Parallel processing with strong scaling has been required to precisely calculate the simulations quickly. The proposed method uses the OSCAR compiler for exploiting coarse grain task parallelism efficiently to get scalable speed-ups with strong scaling. The OSCAR compiler can analyze data dependence and control dependence among coarse grain tasks, such as subroutines, loops and basic blocks. Moreover, locality optimizations considering the boundary calculations of FDM and a new static scheduler that enables more efficient task scheduling on cc-NUMA servers are presented. The performance evaluation shows 110 times speed-up using 128 cores against the sequential execution on a POWER7 based 128 cores cc-NUMA server Hitachi SR16000 VM1, 37.2 times speed-up using 64 cores against the sequential execution on a Xeon E7-8830 based 64 cores cc-NUMA server BS2000, 19.8 times speed-up using 32 cores against the sequential execution on a Xeon X7560 based 32 cores cc-NUMA server HA8000/RS440, 99.3 times speed-up using 128 cores against the sequential execution on a SPARC64 VII based 256 cores cc-NUMA server Fujitsu M9000, and 9.42 times speed-up using 12 cores against the sequential execution on a POWER8 based 12 cores cc-NUMA server Power System S812L.

    DOI

    Scopus

  • Multigrain parallelization for model-based design applications using the OSCAR compiler

    Dan Umeda, Takahiro Suzuki, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   9519   125 - 139  2016  [Refereed]

    Authorship:Last author

     View Summary

    Model-based design is a very popular software development method for developing a wide variety of embedded applications such as automotive systems, aircraft systems, and medical systems. Model-based design tools like MATLAB/Simulink typically allow engineers to graphically build models consisting of connected blocks for the purpose of reducing development time. These tools also support automatic C code generation from models with a special tool such as Embedded Coder to map models onto various kinds of embedded CPUs. Since embedded systems require real-time processing, the use of multi-core CPUs poses more opportunities for accelerating program execution to satisfy the real-time constraints. While prior approaches exploit parallelism among blocks by inspecting MATLAB/Simulink models, this may lose an opportunity for fully exploiting parallelism of the whole program because models potentially have parallelism within a block. To unlock this limitation, this paper presents an automatic parallelization technique for auto-generated C code developed by MATLAB/Simulink with Embedded Coder. Specifically, this work (1) exploits multi-level parallelism including inter-block and intra-block parallelism by analyzing the auto-generated C code, and (2) performs static scheduling to reduce dynamic overheads as much as possible. Also, this paper proposes an automatic profiling framework for the auto-generated code for enhancing static scheduling, which leads to improving the performance of MATLAB/Simulink applications. Performance evaluation shows 4.21 times speedup with six processor cores on Intel Xeon X5670 and 3.38 times speedup with four processor cores on ARM Cortex-A15 compared with uniprocessor execution for a road tracking application.

    DOI

    Scopus

    Citations (Scopus): 10

  • Multicore Local Memory Management Scheme using Data Multidimensional Aligned Decomposition

    Kohei Yamamoto, Tomoya Shirakawa, Akimasa Yoshida, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on Embedded Systems(SIGEMB) Vol.2016-ARC-218No.10 Vol.2016-SLDM174No   115 ( 400 ) 55 - 60  2016.01

    Authorship:Last author

    CiNii

  • Android video processing system combined with automatically parallelized and power optimized code by OSCAR compiler

    Bui Duc Binh, Tomohiro Hirano, Hiroki Mikami, Hideo Yamamoto, Keiji Kimura, Hironori Kasahara

    Journal of Information Processing   24 ( 3 ) 504 - 511  2016  [Refereed]

    Authorship:Last author

     View Summary

    The emergence of multi-core processors in smart devices promises higher performance and low power consumption. The parallelization of applications enables us to improve their performance. However, simultaneously utilizing many cores would drastically drain the device battery life. This paper shows a demonstration system of real-time video processing combined with power reduction controlled by the OSCAR automatic parallelization compiler on ODROID-X2, an open Android development platform based on Samsung Exynos4412 Prime with 4 ARM Cortex-A9 cores. In this paper, we exploited the DVFS framework, core partitioning, and profiling technique and OSCAR parallelization - power control algorithm to reduce the total consumption in a real-time video application. The demonstration results show that it can cut power consumption by 42.8% for MPEG-2 Decoder application and 59.8% for Optical Flow application by using 3 cores in both applications.

    DOI

    Scopus

  • Accelerating Multicore Architecture Simulation Using Application Profile

    Keiji Kimura, Gakuho Taguchi, Hironori Kasahara

    2016 IEEE 10TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANY-CORE SYSTEMS-ON-CHIP (MCSOC)     177 - 184  2016  [Refereed]

    Authorship:Last author

     View Summary

    Architecture simulators play an important role in exploring frontiers in the early stages of the architecture design. However, the execution time of simulators increases with an increase in the number of cores. The sampling simulation technique that was originally proposed to simulate single-core processors is a promising approach to reduce simulation time. Two main hurdles for multi/many-core are preparing sampling points and thread skewing at functional simulation time. This paper proposes a very simple and low-error sampling-based acceleration technique for multi/many-core simulators. For a parallelized application, an iteration of a large loop including a parallelizable program part, is defined as a sampling unit. We apply X-means method to a profile result of the collection of iterations derived from a real machine to form clusters of those iterations. Multiple iterations are exploited as sampling points from these clusters. We execute the simulation along the sampling points and calculate the number of total execution cycles. Results from a 16-core simulation show that our proposed simulation technique gives us a maximum of 443x speedup with a 0.52% error and 218x speedup with 1.50% error on an average.

    DOI

    Scopus

    Citations (Scopus): 3

  • Annotatable systrace: An extended linux ftrace for tracing a parallelized program

    Daichi Fukui, Mamoru Shimaoka, Hiroki Mikami, Dominic Hillenbrand, Hideo Yamamoto, Keiji Kimura, Hironori Kasahara

    SEPS 2015 - Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems     21 - 25  2015.10  [Refereed]

    Authorship:Last author

     View Summary

    Investigation of the runtime behavior is one of the most important processes for performance tuning on a computer system. Profiling tools have been widely used to detect hot-spots in a program. In addition to them, tracing tools produce valuable information especially from parallelized programs, such as thread scheduling, barrier synchronizations, context switching, thread migration, and jitter by interrupts. Users can optimize a runtime system and hardware configuration in addition to a program itself by utilizing the attained information. However, existing tools provide information per process or per function. Finer information like task- or loop-granularity should be required to understand the program behavior more precisely. This paper has proposed a tracing tool, Annotatable Systrace, to investigate runtime execution behavior of a parallelized program based on an extended Linux ftrace. The Annotatable Systrace can add arbitrary annotations in a trace of a target program. The proposed tool exploits traces from 183.equake, 179.art, and mpeg2enc on Intel Xeon X7560 and ARMv7 as an evaluation. The evaluation shows that the tool enables us to observe load imbalance along with the program execution. It can also generate a trace with the inserted annotations even on a 32-core machine. The overhead of one annotation on Intel Xeon is 1.07 us and the one on ARMv7 is 4.44 us, respectively.

    DOI

    Scopus

    4
    Citation
    (Scopus)
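
For readers unfamiliar with trace annotations, the sketch below shows the stock Linux ftrace mechanism for injecting user annotations: writing a line to the `trace_marker` file in tracefs. This is not the Annotatable Systrace tool itself, whose extensions are not described on this page; the function names and annotation strings are hypothetical, and the marker file is normally writable only with elevated privileges.

```python
# Standard ftrace marker file on Linux; on older kernels it may live under
# /sys/kernel/debug/tracing instead.
TRACE_MARKER = "/sys/kernel/tracing/trace_marker"


def annotate(message, marker_path=TRACE_MARKER):
    """Write one annotation line into the trace buffer."""
    with open(marker_path, "w") as marker:
        marker.write(message + "\n")


def traced_loop(n, marker_path=TRACE_MARKER):
    """Bracket each loop iteration with begin/end annotations."""
    for i in range(n):
        annotate(f"loop_iter_begin {i}", marker_path)
        _ = sum(range(1000))  # placeholder for the real loop body
        annotate(f"loop_iter_end {i}", marker_path)
```

Passing a different `marker_path` (for example a temporary file) lets the annotation calls be exercised on a machine where tracefs is not accessible.
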
  • Nominees for Computer Society Officers and Board of Governors Positions in 2016

    Jean-Luc Gaudiot, Hironori Kasahara

    IEEE Computer Society Computer     96 - 97  2015.08  [Invited]

  • Evaluation of Parallelization of video decoding on Intel and ARM Multicore

    Tamami Wake, Shuhei Iizuka, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on Embedded Systems(SIGEMB)    2015.03

    Authorship:Last author

  • Dynamic Scheduling Algorithm for Automatically Parallelized and Power Reduced Applications on Multicore Systems

    Takashi Goto, Kohei Muto, Tomohiro Hirano, Hiroki Mikami, Uichiro Takahashi (Fujitsu), Sakae Inoue (Fujitsu), Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on Embedded Systems(SIGEMB)    2015.03

    Authorship:Last author

  • Power Reduction of Real-time Dynamic Image Processing on Haswell Multicore Using OSCAR Compiler

    Shuhei Iizuka, Hideo Yamamoto, Tomohiro Hirano, Youhei Kishimoto, Takashi Goto, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Information Processing Society of Japan, Special Interest Group on Embedded Systems(SIGEMB)   114 ( 507 ) 219 - 224  2015.03

    CiNii

  • What Will 2022 Look Like? The IEEE CS 2022 Report

    Hasan Alkhatib, Paolo Faraboschi, Eitan Frachtenberg, Hironori Kasahara, Danny Lange, Phil Laplante, Arif Merchant, Dejan Milojicic, Karsten Schwan

    COMPUTER   48 ( 3 ) 68 - 76  2015.03  [Refereed]

    Over the last two years, nine IEEE Computer Society tech leaders collaborated to identify important industry advances that promise to change the world by 2022. The 23 technologies provide new insights into the emergence of "seamless intelligence."

  • Evaluation of Automatic Power Reduction with OSCAR Compiler on Intel Haswell and ARM Cortex-A9 Multicores

    Tomohiro Hirano, Hideo Yamamoto, Shuhei Iizuka, Kohei Muto, Takashi Goto, Tamami Wake, Hiroki Mikami, Moriyuki Takamura, Keiji Kimura, Hironori Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING (LCPC 2014)   8967   239 - 252  2015  [Refereed]

    Reducing power dissipation without performance degradation is one of the most important issues for all computing systems, such as supercomputers, cloud servers, desktop PCs, medical systems, smartphones and wearable devices. Exploiting parallelism, careful frequency-and-voltage control, and clock-and-power-gating control on multicore/manycore systems are promising ways to improve performance and reduce power dissipation. However, hand parallelization and power reduction of application programs are very difficult and time-consuming. The OSCAR automatic parallelization compiler has been developed to overcome these problems by realizing automatic low-power control in addition to parallelization. This paper evaluates the performance of the low-power control technology of the OSCAR compiler on Intel Haswell and ARM multicore platforms. The evaluations show that, compared with 3 cores without power control, power control on 3 cores of the Intel Haswell multicore reduces power consumption to 2/5 for the H.264 decoder and to 1/3 for Optical Flow. On the ARM Cortex-A9 using 3 cores, power control reduces power consumption to 1/2 for the H.264 decoder and to 1/3 for Optical Flow. These results show that the OSCAR multi-platform compiler allows us to reduce power consumption on Intel and ARM multicores.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Evaluation of Software Cache Coherency Control Scheme by an Automatic Parallelizing Compiler

    Yohei Kishimoto, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    IPSJ SIG Technical Report Vol.2014-ARC-213 No.19    2014.12

    Authorship:Last author

  • Power Reduction of H.264/AVC Decoder on Android Multicore Using OSCAR Compiler

    Shuhei Iizuka, Hideo Yamamoto, Tomohiro Hirano, Takashi Goto, Hiroki Mikami, Uichiro Takahashi, Sakae Yamamoto, Moriyuki Takamura, Keiji Kimura, Hironori Kasahara

    IPSJ SIG Technical Report Vol.2014-ARC-204    2014.10

    Authorship:Last author

  • Expectation for Green Computing and Smart Grid

    Hironori Kasahara

    Technical Journal "Smart Grid", Special Issue `New Technologies for Smart Grid'     2 - 2  2014.10  [Refereed]  [Invited]

    Authorship:Lead author

  • Prospect of Green Computing

    Keiji Kimura, Hironori Kasahara

    Technical Journal "Smart Grid", Special Issue `New Technologies for Smart Grid'   55 ( 14 ) 3 - 8  2014.10  [Refereed]  [Invited]

    Authorship:Last author

  • Parallel Hashtable Building Using Serialization Based on Inter-Thread Pipes

    Makoto Nakayama, Kenichi Yamazaki (Shibaura Institute of Technology), Satoshi Tanaka (NTT DOCOMO), Hironori Kasahara

    The IEICE Transactions on Information and Systems   J97-D ( 10 ) 1541 - 1552  2014.10  [Refereed]

    J-GLOBAL

  • OSCAR Compiler Controlled Multicore Power Reduction on Android Platform

    Hideo Yamamoto, Tomohiro Hirano, Kohei Muto, Hiroki Mikami, Takashi Goto, Dominic Hillenbrand, Moriyuki Takamura, Keiji Kimura, Hironori Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2013   8664   155 - 168  2014.09  [Refereed]

    Authorship:Last author

    In recent years, smart devices have been transitioning from single-core processors to multicore processors to satisfy the growing demands for higher performance and lower power consumption. However, the power consumption of multicore processors is increasing as usage of smart devices becomes more intense. This situation is one of the most fundamental and important obstacles that the mobile device industry faces in extending the battery life of smart devices. This paper evaluates power reduction control by the OSCAR Automatic Parallelizing Compiler on an Android platform, using a newly developed precise power measurement environment on the ODROID-X2, a development platform with the Samsung Exynos4412 Prime, which consists of 4 ARM Cortex-A9 cores. The OSCAR Compiler enables automatic exploitation of multigrain parallelism within a sequential program and automatically generates parallelized code with the OSCAR Multi-Platform API power reduction directives for DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating. The paper also introduces a newly developed microsecond-order pseudo clock gating method that reduces power consumption using WFI (Wait For Interrupt). By inserting GPIO (General Purpose Input Output) control functions into programs, signals appear on the power waveform indicating where the GPIO control was inserted, which provides a precise power measurement of the specified program area. The results of the power evaluation on 3-core execution show 86.7% power reduction, namely from 2.79[W] to 0.37[W], for the real-time MPEG2 decoder and 86.5% power reduction, namely from 2.23[W] to 0.36[W], for real-time Optical Flow.

    DOI

    Scopus

    3
    Citation
    (Scopus)
  • Automatic Parallelization of Designed Engine Control C Codes by MATLAB/Simulink

    Dan Umeda, Youhei Kanehagi, Hiroki Mikami, Akihiro Hayashi, Mituhiro Tani (DENSO), Yuji Mori (DENSO), Keiji Kimura, Hironori Kasahara

    Journal of Embedded System Symposium   55 ( 8 ) 1817 - 1829  2014.08  [Refereed]

    CiNii

  • Tracing method of a parallelized program using Linux ftrace on a multicore processor

    Daichi Fukui, Mamoru Shimaoka, Hiroki Mikami, Dominic Hillenbrand, Keiji Kimura, Hironori Kasahara

    Summer United Workshops on Parallel, Distributed and Cooperative Processing, Technical Report of IPSJ,Vol.2014-ARC-211 No.6    2014.07

    Authorship:Last author

  • Android Demonstration System of Automatic Parallelization and Power Optimization by OSCAR Compiler

    Bui Duc Binh, Tomohiro Hirano, Hiroki Mikami, Dominic Hillenbrand, Keiji Kimura, Hironori Kasahara

    Summer United Workshops on Parallel, Distributed and Cooperative Processing, Technical Report of IPSJ,Vol.2014-ARC-211 No.6    2014.07

    Authorship:Last author

  • Automatic Parallelization of Small Point FFT on Multicore Processor

    Yuuki Furuyama, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    IPSJ SIG Technical Report Vol.2013-ARC-201   113 ( 474 ) 15 - 22  2014.03

    Authorship:Last author

    Fast Fourier Transform (FFT) is one of the most frequently used algorithms for computing the Discrete Fourier Transform (DFT) in many applications, including digital signal processing and image processing. Although small-size FFT programs must be used in baseband signal processing such as LTE, it is difficult to use special hardware like DSPs for such small problems because of their relatively large data transfer and control overhead. This paper proposes an automatic parallelization method that generates low-overhead parallelized programs for small-size FFTs, suited to shared-memory multicore processors, by applying cache optimization to avoid false sharing between cores. The proposed method has been implemented in the OSCAR automatic parallelizing compiler, which parallelized small-point FFT programs from 32 points to 256 points; these were evaluated on the RP2 multicore processor with 8 SH-4A cores. The method achieved a 1.97 times speedup on 2 SH-4A cores and a 3.9 times speedup on 4 SH-4A cores for a 256-point FFT program. In addition to the FFT programs, the proposed approach was applied to the Fast Hadamard Transform (FHT), whose computation is similar to that of the FFT. The results are a 1.91 times speedup on 2 SH-4A cores and a 3.32 times speedup on 4 SH-4A cores. These results show the effectiveness of the proposed method and the ease of applying it to many kinds of programs.

    CiNii

  • A Latency Reduction Technique for IDS by Allocating Decomposed Signature on Multi-core

    Shohei Yamada, Keiji Kimura, Hironori Kasahara

    IPSJ SIG Technical Report Vol.2013-ARC-201    2014.03

    Authorship:Last author

  • A Parallelizing Compiler Cooperative Acceleration Technique of Multicore Architecture Simulation using a Statistical Method

    Gakuho Taguchi, Keiji Kimura, Hironori Kasahara

    IPSJ SIG Technical Report    2014.03

    Authorship:Last author

  • Multicore Technologies Realizing Low-power Computing

    Keiji Kimura, Hironori Kasahara

    The Journal of Electronics, Information and Communication Engineers   97 ( 2 ) 133 - 139  2014.02  [Refereed]  [Invited]

    Authorship:Last author

    CiNii

  • Parallelization of Tree-to-TLV Serialization

    Makoto Nakayama, Kenichi Yamazaki, Satoshi Tanaka, Hironori Kasahara

    2014 IEEE INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC)    2014  [Refereed]

    A serializer/deserializer (SerDe) is necessary to serialize a data object into a byte array and to deserialize it in the reverse direction. A SerDe that is used worldwide and runs quickly is the Protocol Buffer (ProtoBuf), which serializes a tree-structured data object into the Type-Length-Value (TLV) format. Acceleration of SerDe processing is beneficial because SerDes are used in various fields. This paper proposes a new method that accelerates tree-to-TLV serialization through 2-way parallel processing, called "parallelized serialization" and "parallelization with streaming". Experimental results show that parallelized serialization with 4 worker threads achieves a 1.97-fold shorter serialization time than with a single worker thread, and that the combination of the 2-way parallel processing achieves a 2.11-fold shorter output time than ProtoBuf when 4 worker threads, FileOutputStream, and trees of 10,080 container nodes are used.
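
To make the tree-to-TLV idea concrete, the sketch below serializes a small tree into Type-Length-Value records and serializes the root's subtrees in worker threads before joining them in order. It is a minimal illustration under stated assumptions, not the paper's method and not ProtoBuf's wire format: the 1-byte tag, 4-byte big-endian length, the `(tag, payload)` node shape, and all function names are hypothetical.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def encode_tlv(tag, value):
    """Encode one record as a 1-byte tag, 4-byte big-endian length, then the value."""
    return struct.pack(">BI", tag, len(value)) + value


def serialize(node):
    """Serialize a (tag, payload) tree; a list payload holds child nodes."""
    tag, payload = node
    if isinstance(payload, list):
        payload = b"".join(serialize(child) for child in payload)
    return encode_tlv(tag, payload)


def serialize_parallel(root, workers=4):
    """Serialize the root's children in worker threads, then join them in order."""
    tag, children = root
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(serialize, children))  # map preserves child order
    return encode_tlv(tag, b"".join(parts))


tree = (1, [(2, b"ab"), (3, [(4, b"c")]), (2, b"")])
print(serialize_parallel(tree) == serialize(tree))
```

A container's length field depends on the sizes of all its descendants, which is why subtrees can be serialized independently but must be joined before the parent header is written. In CPython, threads will not speed up this pure-Python work; the sketch only shows the decomposition.
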

  • Profile-Based Automatic Parallelization for Android 2D Rendering by Using OSCAR Compiler

    Takashi Goto, Kohei Muto, Hideo Yamamoto, Tomohiro Hirano, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2013-ARC-207 No.12   2013 ( 12 ) 1 - 7  2013.12

    Authorship:Last author

    CiNii

  • New SerDe Featured by Precompression Using Knowledge of Redundant Subtrees

    Makoto Nakayama, Kenichi Yamazaki (Shibaura Institute of Technology), Satoshi Tanaka (NTT DOCOMO), Hironori Kasahara

    The IEICE Transactions on Information and Systems   J96-D ( 10 ) 2089 - 2100  2013.10  [Refereed]

    CiNii

  • An Evaluation of Hardware Barrier Synchronization Mechanism Considering Hierarchical Processor Grouping using OSCAR API Standard Translator

    Akihiro Kawashima, Youhei Kanehagi, Akihiro Hayashi, Keiji Kimura, Hironori Kasahara

    Summer United Workshops on Parallel, Distributed and Cooperative Processing, Technical Report of IPSJ, Vol.2013-ARC-206 No.16    2013.08

    Authorship:Last author

  • Automatic Power Control on Multicore Android Devices

    Tomohiro Hirano, Hideo Yamamoto, Kohei Muto, Hiroki Mikami, Takashi Goto, Dominic Hillenbrand, Keiji Kimura, Hironori Kasahara

    Summer United Workshops on Parallel, Distributed and Cooperative Processing, Technical Report of IPSJ, Vol.2013-ARC-206 No.23    2013.08

    Authorship:Last author

  • Automatic Parallelization of Hand Written Automotive Engine Control Codes Using OSCAR Compiler

    Dan Umeda, Yohei Kanehagi, Hiroki Mikami, Akihiro Hayashi, Keiji Kimura, Hironori Kasahara

    17th Workshop on Compilers for Parallel Computing (CPC2013), Lyon, France    2013.07  [Refereed]

    Authorship:Last author

  • OSCAR API v2.1: Extensions for an Advanced Accelerator Control Scheme to a Low-Power Multicore API

    Keiji Kimura, Cecilia Gonzáles-Álvarez, Akihiro Hayashi, Hiroki Mikami, Mamoru Shimaoka, Jun Shirako, Hironori Kasahara

    17th Workshop on Compilers for Parallel Computing (CPC2013), Lyon, France    2013.07  [Refereed]

    Authorship:Last author

  • Enhancing the Performance of a Multiplayer Game by Using a Parallelizing Compiler

    Yasir I. M. Al-Dosary, Yuki Furuyama, Dominic Hillenbrand, Keiji Kimura, Hironori Kasahara, Seinosuke Narita

    Technical Report of IPSJ    2013.04

    Video games have been a very popular form of digital entertainment in recent years. They have been delivered on state-of-the-art technologies, including multi-core processors, which are known to be the leading contributor to enhancing the performance of computer applications. Since parallel programming is difficult to implement, its use in video games still leaves considerable room for advancement. This paper investigates performance enhancement in video games using parallelizing compilers and the difficulties involved in achieving it. The experiment proceeds in several stages in attempting to parallelize a well-known, sequentially written video game called ioquake3. First, the game is profiled to discover bottlenecks, then examined by hand to determine how much parallelism can be extracted from those bottlenecks and what sort of hazards exist in delivering a parallel-friendly version of ioquake3. Then, the game code is rewritten into a hazard-free version and modified to comply with the Parallelizable-C rules, which crucially aid parallelizing compilers in extracting parallelism. Next, the program is compiled using a parallelizing compiler called OSCAR (Optimally Scheduled Advanced Multiprocessor) to produce a parallel version of ioquake3. Finally, the performance of the newly produced parallel version of ioquake3 on a multi-core platform is analyzed.
    The findings are as follows: (1) the game parallelized by the compiler from the revised sequential program achieves 5.1 times faster performance with 8 threads than the original on an IBM Power 5+ machine equipped with 8 cores; (2) hazards are caused by thread contention over globally shared data as well as thread-private data; (3) AI-driven players are represented very similarly to human players inside the ioquake3 engine, which gives an estimate of the cost of parallelizing human-driven sessions; and (4) 70% of the cost of the experiment is spent analyzing the ioquake3 code and 30% implementing the changes in the code.

  • An Investigation of Parallelization and Evaluation on Commercial Multi-core Smart Device

    Hideo Yamamoto, Takashi Goto, Tomohiro Hirano, Kouhei Muto, Hiroki Mikami, Dominic Hillenbrand, Akihiro Hayashi, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol. 2013-OS-124 No. 000310    2013.02

    Authorship:Last author

  • Preface

    Kasahara, H., Kimura, K.

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7760 LNCS  2013

  • Languages and Compilers for Parallel Computing: 25th International Workshop, LCPC 2012, Tokyo, Japan, September 11-13, 2012, Revised Selected Papers

    Hironori Kasahara, Keiji Kimura

    Lecture Notes in Computer Science   7760  2013  [Refereed]

    Authorship:Last author

  • Evaluation of power consumption at execution of multiple automatically parallelized and power controlled media applications on the RP2 low-power multicore

    Hiroki Mikami, Shumpei Kitaki, Masayoshi Mase, Akihiro Hayashi, Mamoru Shimaoka, Keiji Kimura, Masato Edahiro, Hironori Kasahara

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   7146   31 - 45  2013  [Refereed]

    Authorship:Last author

    This paper evaluates the automatic power reduction scheme of the OSCAR automatic parallelizing compiler, which has power reduction control capability, when multiple media applications parallelized by the OSCAR compiler are executed simultaneously on RP2, an 8-core multicore processor developed by Renesas Electronics, Hitachi, and Waseda University. The OSCAR compiler enables hierarchical multigrain parallel processing and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating and power gating for each processor core through the OSCAR multi-platform API. The RP2 has eight SH4A processor cores, each of which has power control mechanisms such as DVFS, clock gating and power gating. First, multiple applications with relatively light computational load are executed simultaneously on the RP2. The average power consumption of eight power-controlled AAC encoder programs, each executed on one processor, was reduced by 47% (to 1.01W) against one AAC encoder execution on one processor without power control (from 1.89W). Second, when multiple intermediate computational load applications are executed, the power consumption of an AAC encoder executed on four processors with power reduction control was reduced by 57% (to 0.84W) against an AAC encoder execution on one processor (from 1.95W). The power consumption of one MPEG2 decoder on four processors with power reduction control was reduced by 49% (to 1.01W) against one MPEG2 decoder execution on one processor (from 1.99W). Finally, when a high computational load application and an intermediate computational load application are executed simultaneously, the consumed power was reduced by 21% by using twice the number of cores for each application. This paper confirms that parallel processing and power reduction by the OSCAR compiler are efficient for multiple application executions.
    In the execution of multiple light computational load applications, power consumption increases by only 12% relative to one application. With parallel processing applied to intermediate computational load applications, the power consumption of executing one application on one processor core (1.49W) is almost the same as that of two applications on eight processor cores (1.46W). © 2013 Springer-Verlag.

    DOI

    Scopus

    1
    Citation
    (Scopus)
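
The reduction percentages quoted in the abstract above can be reproduced from the wattages it gives. The short check below is only arithmetic on those published figures; the function name and case labels are hypothetical.

```python
def reduction_percent(before_w, after_w):
    """Percentage reduction when power drops from before_w to after_w watts."""
    return (before_w - after_w) / before_w * 100.0


# (label, baseline W, power-controlled W, reduction reported in the abstract)
cases = [
    ("eight AAC encoders (average)", 1.89, 1.01, 47),
    ("AAC encoder on four processors", 1.95, 0.84, 57),
    ("MPEG2 decoder on four processors", 1.99, 1.01, 49),
]
for label, before, after, reported in cases:
    computed = reduction_percent(before, after)
    print(f"{label}: {computed:.1f}% (reported {reported}%)")
```

Each computed value rounds to the percentage stated in the abstract.
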
  • Parallelization of Automobile Engine Control Software on Multicore Processor

    Youhei Kanehagi, Dan Umeda, Hiroki Mikami, Akihiro Hayashi, Mitsuo Sawada (TOYOTA), Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2013-ARC-203 No.2    2013.01

    Authorship:Last author

  • An Acceleration Technique of Many-core Architecture Simulation with Parallelized Applications by Statistical Technique

    Yoichi Abe, Gakuho Taguchi, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-203 No.13    2013.01

    Authorship:Last author

  • A Parallelizing Compiler Cooperative Multicore Architecture Simulator with Changeover Mechanism of Simulation Modes

    Gakuho Taguchi, Yoichi Abe, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-203 No.14    2013.01

    Authorship:Last author

  • Automatic Design Exploration Framework for Multicores with Reconfigurable Accelerators

    Cecilia Gonzalez-Alvarez, Haruku Ishikawa, Akihiro Hayashi, Daniel Jimenez-Gonzalez, Carlos Alvarez, Keiji Kimura, Hironori Kasahara

    7th Workshop on Reconfigurable Computing (WRC) 2013, held in conjunction with HiPEAC conference 2013, Berlin    2013.01  [Refereed]

    Authorship:Last author

  • Automatic Parallelization, Performance Predictability and Power Control for Mobile-Applications

    Dominic Hillenbrand, Akihiro Hayashi, Hideo Yamamoto, Keiji Kimura, Hironori Kasahara

    2013 IEEE COOL CHIPS XVI (COOL CHIPS)    2013  [Refereed]

    Authorship:Last author

    Currently, few mobile applications exploit the power and performance capabilities of multi-core architectures. As the number of cores increases, the challenges become more pressing. We picked three challenges: application parallelization, performance predictability/portability, and power control for mobile devices. We tackled these challenges with our auto-parallelizing compiler and operating system enhancements.

  • Parallelization of Automotive Engine Control Software On Embedded Multi-core Processor Using OSCAR Compiler

    Yohei Kanehagi, Dan Umeda, Akihiro Hayashi, Keiji Kimura, Hironori Kasahara

    2013 IEEE COOL CHIPS XVI (COOL CHIPS)    2013  [Refereed]

    Authorship:Last author

  • Reconciling application power control and operating systems for optimal power and performance

    Dominic Hillenbrand, Yuuki Furuyama, Akihiro Hayashi, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC 2013    2013  [Refereed]

    Authorship:Last author

    In the age of dark silicon, on-chip power control is a necessity. Upcoming and state-of-the-art embedded and cloud computer systems-on-chip (SoCs) already provide interfaces for fine-grained power control. For example, sometimes both core and interconnect voltage and frequency can be scaled. To further reduce power consumption, SoCs often have specialized accelerators. Due to the rising specialization of hardware and software, general-purpose operating systems require changes to exploit the power-saving opportunities provided by the hardware. However, they lack detailed hardware- and application-level information. Application-level power control, in turn, is still very uncommon and difficult to realize. Nowadays, vendors of mobile devices are forced to tweak and patch system-level software to enhance the power efficiency of each individual product. This manual process is time-consuming and must be repeated for each new product. In this paper we explore the opportunities and challenges of automatic application-level power control using compilers. © 2013 IEEE.

    DOI

    Scopus

    4
    Citation
    (Scopus)
  • Dynamic Profiling and Feedback Framework for Reduce-side Join

    Makoto Nakayama, Kenichi Yamazaki, Satoshi Tanaka, Hironori Kasahara

    2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013)     1255 - 1262  2013  [Refereed]

    MapReduce has become popular, and Reduce-side join is one of the most important applications of MapReduce. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases the MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithms to address data skew in Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that lets the framework adapt to a wide range of MapReduce cluster sizes. This paper presents two example algorithms that address data skew using the estimation method, and the experimental results show up to a 2.59 times speedup of join completion time on a cluster with 50 servers and highly skewed input data.

    DOI

    Scopus

  • Automatic parallelization with OSCAR API Analyzer: a cross-platform performance evaluation

    Cecilia Gonzalez-Alvarez, Youhei Kanehagi, Kosei Takemoto, Yohei Kishimoto, Kohei Muto, Hiroki Mikami, Akihiro Hayashi, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-202HPC137 No.10   2012 ( 10 ) 1 - 8  2012.12

    Authorship:Last author

    CiNii

  • Automatic Parallelization of Ground Motion Simulator

    Mamoru Shimaoka, Hiroki Mikami, Akihiro Hayashi, Yasutaka Wada, Keiji Kimura, Hidekazu Morita (HITACHI), Kunio Uchiyama (HITACHI), Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-202HPC137 No.11    2012.12

    Authorship:Last author

  • Opportunities and Challenges of Application-Power Control in the Age of Dark Silicon

    Dominic Hillenbrand, Yuuki Furuyama, Akihiro Hayashi, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-202HPC137 No.26    2012.12

    Authorship:Last author

  • Parallelization of Basic Engine Control Software Model on Multicore Processor

    Dan Umeda, Youhei Kanehagi, Hiroki Mikami, Akihiro Hayashi, Mituhiro Tani (DENSO), Yuji Mori (DENSO), Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-201 No.22    2012.08

    Authorship:Last author

  • Realization of 1 Watt Web Service with RP-X Low-power Multicore Processor

    Yuuki Furuyama, Mamoru Shimaoka, Hiroki Mikami, Akihiro Hayashi, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2012-ARC-201 No.24    2012.08

    Authorship:Last author

  • Low Power Consumption Multicore Technology for Green Computing

    Hironori Kasahara

    Tokugicon Patent Office Society, Japan Patent Office   265   31 - 42  2012.05  [Refereed]

  • Inlining Analysis of Exception Flow and Fast Method Dispatch on Automatic Parallelization of Java

    Keiichi Tabata, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol. 2012-ARC-199, No. 9    2012.03

    Authorship:Last author

  • An Examination of Accelerating Many-core Architecture Simulation for Parallelized Media Applications

    Yoichi Abe, Ryo Ishizuka, Ryota Daigo, Gakuho Taguchi, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol. 2012-ARC-199, No. 3    2012.03

    Authorship:Last author

  • Heterogeneous multicore processor technologies for embedded systems

    Uchiyama, K., Arakawa, F., Kasahara, H., Nojiri, T., Noda, H., Tawara, Y., Idehara, A., Iwata, K., Shikano, H.

    Heterogeneous Multicore Processor Technologies for Embedded Systems   9781461402848  2012

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • OSCAR Parallelizing Compiler and API for Real-time Low Power Heterogeneous Multicores

    Akihiro Hayashi, Mamoru Shimaoka, Hiroki Mikami, Masayoshi Mase, Yasutaka Wada, Jun Shirako, Keiji Kimura, Hironori Kasahara

    16th Workshop on Compilers for Parallel Computing(CPC2012), Padova, Italy   5 ( 1 ) 68 - 79  2012.01  [Refereed]

    Authorship:Last author

    CiNii

  • Enhancing the Performance of a Multiplayer Game by Using a Parallelizing Compiler

    Yasir I. M. Al-Dosary, Keiji Kimura, Hironori Kasahara, Seinosuke Narita

    2012 17TH INTERNATIONAL CONFERENCE ON COMPUTER GAMES (CGAMES)     67 - 75  2012  [Refereed]

    Video games have been a very popular form of digital entertainment in recent years. They have been delivered on state-of-the-art technologies, including multi-core processors, which are known to be the leading contributor to enhancing the performance of computer applications. Since parallel programming is difficult to implement, its use in video games still leaves considerable room for advancement. This paper investigates performance enhancement in video games using parallelizing compilers and the difficulties involved in achieving it. The experiment proceeds in several stages in attempting to parallelize a well-known, sequentially written video game called ioquake3. First, the game is profiled to discover bottlenecks, then examined by hand to determine how much parallelism can be extracted from those bottlenecks and what sort of hazards exist in delivering a parallel-friendly version of ioquake3. Then, the game code is rewritten into a hazard-free version and modified to comply with the Parallelizable-C rules, which crucially aid parallelizing compilers in extracting parallelism. Next, the program is compiled using a parallelizing compiler called OSCAR (Optimally Scheduled Advanced Multiprocessor) to produce a parallel version of ioquake3. Finally, the performance of the newly produced parallel version of ioquake3 on a multi-core platform is analyzed.
    The findings are as follows: (1) the game parallelized by the compiler from the revised sequential program achieves 5.1 times faster performance with 8 threads than the original on an IBM Power 5+ machine equipped with 8 cores; (2) hazards are caused by thread contention over globally shared data as well as thread-private data; (3) AI-driven players are represented very similarly to human players inside the ioquake3 engine, which gives an estimate of the cost of parallelizing human-driven sessions; and (4) 70% of the cost of the experiment is spent analyzing the ioquake3 code and 30% implementing the changes in the code.

  • Automatic Parallelization of Dose Calculation Engine for A Particle Therapy on SMP Servers

    Akihiro Hayashi, Takuji Matsumoto, Hiroki Mikami, Keiji Kimura, Keiji Yamamoto, Hironori Saki, Yasuyuki Takatani, Hironori Kasahara

    Technical Report of IPSJ, Vol.2011-ARC189HPC132-2    2011.11

  • Parallelizing Compiler Framework and API for Heterogeneous Multicores

    Akihiro Hayashi, Yasutaka Wada, Takeshi Watanabe, Takeshi Sekiguchi, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    IPSJ Transactions on Advanced Computing Systems   5   68 - 79  2011.11  [Refereed]

  • An Evaluation of An Acceleration Technique of Many Core Architecture Simulator Considering Science Technology Calculation Program Structure

    Ryo Ishizuka, Yoichi Abe, Ryota Daigo, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2011-ARC-196-14    2011.07

  • Examination of Parallelization by CUDA in SPEC benchmark program

    Yuki Taira, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2011-HPC-130-16    2011.07

  • Hiding I/O overheads with Parallelizing Compiler for Media Applications

    Akihiro Hayashi, Takeshi Sekiguchi, Masayoshi Mase, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2011-ARC-195OS117-14   2011 ( 14 ) 1 - 7  2011.04

    CiNii

  • A 45-nm 37.3 GOPS/W Heterogeneous Multi-Core SOC with 16/32 Bit Instruction-Set General-Purpose Core

    Osamu Nishii, Yoichi Yuyama, Masayuki Ito, Yoshikazu Kiyoshige, Yusuke Nitta, Makoto Ishikawa, Tetsuya Yamada, Junichi Miyakoshi, Yasutaka Wada, Keiji Kimura, Hironori Kasahara, Hideo Maejima

    IEICE TRANSACTIONS ON ELECTRONICS   E94C ( 4 ) 663 - 669  2011.04  [Refereed]

    We built a 12.4 mm x 12.4 mm, 45-nm CMOS, chip that integrates eight 648-MHz general purpose cores, two matrix processor (MX-2) cores, four flexible engine (FE) cores and media IP (VPU5) to establish heterogeneous multi-core chip architecture. The general purpose core had its IPC (instructions per cycle) performance enhanced by adding 32-bit instructions to the existing 16-bit fixed-length instruction set and executing up to two 32-bit instructions per cycle. Considering these five-to-seven years of embedded LSI and increasing trend of access-master within LSI, we predict that the memory usage of single core will not exceed 32-bit physical area (i.e. 4 GB), but chip-total memory usage will exceed 4 GB. Based on this prediction, the physical address was expanded from 32-bit to 40-bit. The fabricated chip was tested and a parallel operation of eight general purpose cores and four FE cores and eight data transfer units (DTU) is obtained on AAC (Advanced Audio Coding) encode processing.

    DOI

    Scopus

  • Evaluation of Power Consumption by Executing Media Applications on Low-power Multicore RP2

    Hiroki Mikami, Shumpei Kitaki, Takafumi Sato, Masayoshi Mase, Keiji Kimura, Kazuhisa Ishizaka, Junji Sakai, Masato Edahiro, Hironori Kasahara

    Technical Report of IPSJ, 2011-ARC-194-1    2011.03

  • A parallelizing compiler cooperative heterogeneous multicore processor architecture

    Yasutaka Wada, Akihiro Hayashi, Takeshi Masuura, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   6760   215 - 233  2011  [Refereed]

     View Summary

    Heterogeneous multicore architectures, integrating several kinds of accelerator cores in addition to general purpose processor cores, have been attracting much attention to realize high performance with low power consumption. To attain effective high performance, high application software productivity, and low power consumption on heterogeneous multicores, cooperation between an architecture and a parallelizing compiler is important. This paper proposes a compiler cooperative heterogeneous multicore architecture and parallelizing compilation scheme for it. Performance of the proposed scheme is evaluated on the heterogeneous multicore integrating Hitachi and Renesas' SH4A processor cores and Hitachi's FE-GA accelerator cores, using an MP3 encoder. The heterogeneous multicore gives us 14.34 times speedup with two SH4As and two FE-GAs, and 26.05 times speedup with four SH4As and four FE-GAs against sequential execution with a single SH4A. The cooperation between the heterogeneous multicore architecture and the parallelizing compiler enables to achieve high performance in a short development period. © 2011 Springer-Verlag Berlin Heidelberg.

    DOI

  • Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

    Akihiro Hayashi, Yasutaka Wada, Takeshi Watanabe, Takeshi Sekiguchi, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING   6548   184 - 198  2011  [Refereed]

     View Summary

    Heterogeneous multicores have been attracting much attention to attain high performance keeping power consumption low in wide spread of areas. However, heterogeneous multicores force programmers very difficult programming. The long application program development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridges a gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on OSCAR compiler. It realizes coarse grain task parallel processing, data transfer using a DMA controller, power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and the power reduction by the proposed framework on a newly developed 15 core heterogeneous multicore chip named RP-X integrating 8 general purpose processor cores and 3 types of accelerator cores which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups up to 32x for an optical flow program with eight general purpose processor cores and four DRP(Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core and 80% of power reduction for the real-time AAC encoding.

  • Evaluation of Parallelizable C Programs by the OSCAR API Standard Translator

    Takuya Sato, Hiroki Mikami, Akihiro Hayashi, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2010-ARC-191-2   2010 ( 2 ) 1 - 6  2010.10

    CiNii

  • Performance of Power Reduction Scheme by a Compiler on Heterogeneous Multicore for Consumer Electronics "RP-X"

    Yasutaka Wada, Akihiro Hayashi, Takeshi Watanabe, Takeshi Sekiguchi, Masahiro Mase, Jun Shirako, Keiji Kimura, Masayuki Ito, Jun Hasegawa, Makoto Sato, Tohru Nojiri, Kunio Uchiyama, Hironori Kasahara

    Technical Report of IPSJ, 2010-ARC-190-8(SWoPP2010)    2010.08

  • A Compiler Framework for Heterogeneous Multicores for Consumer Electronics

    Akihiro Hayashi, Yasutaka Wada, Takeshi Watanabe, Takeshi Sekiguchi, Masahiro Mase, Keiji Kimura, Masayuki Ito, Jun Hasegawa, Makoto Sato, Tohru Nojiri, Kunio Uchiyama, Hironori Kasahara

    Technical Report of IPSJ, 2010-ARC-190-7(SWoPP2010)    2010.08

  • An Acceleration Technique of Many Core Architecture Simulator Considering Program Structure

    Ryo Ishizuka, Toshiya Ootomo, Ryouta Daigo, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2010-ARC-190-20    2010.07

  • Parallelizable C and Its Performance on Low Power High Performance Multicore Processors

    Masayoshi Mase, Yuto Onozaki, Keiji Kimura, Hironori Kasahara

    15th Workshop on Compilers for Parallel Computing 2010    2010.07  [Refereed]

  • Parallelizing Compiler Directed Software Coherence

    Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2010-ARC-189-7   2010 ( 7 ) 1 - 10  2010.04

    CiNii

  • Processing Performance of Automatically Parallelized Applications on Embedded Multicore with Running Multiple Applications

    Takamichi Miyamoto, Masayoshi Mase, Keiji Kimura, Kazuhisa Ishizaka, Junji Sakai, Masato Edahiro, Hironori Kasahara

    Technical Report of IPSJ   2010-ARC-188 ( 9 )  2010.03

    CiNii

  • Hierarchical Parallel Processing of H.264/AVC Encoder on a Multicore Processor

    Hiroki Mikami, Takamichi Miyamoto, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, Vol.2010-ARC-187 No.22, Vol.2010-EMB-15 No.22   2010 ( 22 ) 1 - 6  2010.01

    CiNii

  • OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    Keiji Kimura, Masayoshi Mase, Hiroki Mikami, Takamichi Miyamoto, Jun Shirako, Hironori Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING   5898   188 - 202  2010  [Refereed]

     View Summary

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers since the OSCAR API is designed on a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of the scalability evaluation, it is found that, on average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.

  • Research and Development of Advanced Low Power Computer (Multicore/Manycore) Hardware and Software

    Hironori Kasahara

    EWE   ( 51 )  2009.11

    Authorship:Lead author

  • Element-Sensitive Pointer Analysis for Automatic Parallelization

    Masayoshi Mase, Keiji Kimura, Hironori Kasahara, Yuta Murata

    IPSJ-SIGPRO    2009.10

  • Roles of Parallelizing Compilers for Low Power Manycores, Panel: "What do compiler optimizations mean for many-cores?"

    Hironori Kasahara

    The 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC09)    2009.10  [Refereed]

  • Low Power Multicore Processors and Software That Can Be Driven by Solar Cells

    Hironori Kasahara

    Waseda University DCC Industry and Academia Cooperation Forum    2009.09  [Refereed]

  • Automatic Parallelization of Parallelizable C Programs on Multicore Processors

    Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2009-ARC-184-15(SWoPP2009)   2009  2009.08

    CiNii

  • Compiler Technology and API for Multi-Core

    Hironori Kasahara, Jun Shirako

    The IEEE Computer Society 2009 Vail Computer Elements Workshop    2009.06  [Refereed]

  • Parallelizing Compiler and API for Low Power Multicores

    Hironori Kasahara

    LSI and Systems Workshop 2009    2009.05  [Refereed]

  • Parallelizing Compiler and API for Low Power Multicores

    Hironori Kasahara

    LSI and Systems Workshop 2009 "LSI and Systems for Energy and Environment"    2009.05  [Refereed]

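  • A Power Reduction Scheme for Parallelizing Compiler Using OSCAR API on Multicore Processors

    Ryo Nakagawa, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    SACSIS2009 - Symposium on Advanced Computing Systems and Infrastructures    2009.05  [Refereed]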

  • A Power Reduction Scheme for Parallelizing Compiler Using OSCAR API on Multicore Processors

    Ryo Nakagawa, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2009)    2009.05  [Refereed]

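  • New Markets Opened by Embedded Multicores and Forefront of the Parallelizing Compiler Technology Supporting Them

    Hironori Kasahara

    Embedded Processor and Platform Workshop 2009    2009.04  [Refereed]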

  • New Markets Opened by Embedded Multicores and Forefront of Parallelizing Compiler Technology

    Hironori Kasahara

    Embedded Processor and Platform Workshop 2009    2009.04  [Refereed]

  • OSCAR Parallelizing Compiler and API for Low Power High Performance Multicores

    Hironori Kasahara

    The 11th International Specialist Meeting on The Next generation Models on Climate Change and Sustainability for Advanced High-performance Computing Facilities (Climate Meeting 2009)    2009.03  [Refereed]

  • Low Power Multicore Processor and Software Technologies

    Hironori Kasahara

    Waseda University Technical Presentation Meeting    2009.03  [Refereed]

  • Low Power Multicores Processor and Software Technologies

    Hironori Kasahara

    Waseda University Technical Presentation Meeting    2009.03  [Refereed]

  • Performance Evaluation of Minimum Execution Time Multiprocessor Scheduling Algorithms Using Standard Task Graph Set Ver3 Consider Parallelism of Task Graphs and Deviation of Task Execution Time

    Mamoru Shimaoka, Kazuhiro Imaizumi, Fumiyo Takano, Keiji Kimura, Hironori Kasahara

    Technical Report of IEICE   2009 ( 14 ) 127 - 132  2009.02

     View Summary

    This paper proposes the "Standard Task Graph Set Ver3" (STG Ver3) to evaluate the performance of heuristic and optimization algorithms for the minimum execution time multiprocessor scheduling problem, which is known to be a strongly NP-hard combinatorial optimization problem. The STG Ver2 was created with random task execution times and random predecessors. In addition, the STG Ver3 considers the parallelism of task graphs and the deviation of task execution times so that the characteristics of algorithms can be understood. This paper describes evaluation results obtained by applying the STG Ver3 to several algorithms. The performance evaluation shows that DF/IHS gives optimal solutions for 87.25% of the problems, and PDF/IHS for 92.25%, within 600 seconds.

    CiNii

  • Parallel and Concurrent Search for Fast AND/OR Tree Search on Multicore Processors

    Fumiyo Takano, Yoshitaka Maekawa, Hironori Kasahara

    Proc. of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 2009)    2009.02  [Refereed]

  • Parallelizing Compiler and API for Embedded Multi-cores

    Hironori Kasahara

    TRON Association    2009.02  [Refereed]

  • Parallelizing Compiler and API for Embedded Multi-cores

    Hironori Kasahara

    TRON Association    2009.02  [Refereed]

  • Evaluation of Scheduling Algorithms Using the Standard Task Graph Set STG Ver3 Considering Parallelism and Deviation of Task Execution Time

    Mamoru Shimaoka, Kazuhiro Imaizumi, Fumiyo Takano, Keiji Kimura, Hironori Kasahara

    119th High Performance Computing Research Meeting   2009 ( 14 ) 127 - 132  2009.02  [Refereed]

     View Summary

    This paper proposes the "Standard Task Graph Set Ver3" (STG Ver3) to evaluate the performance of heuristic and optimization algorithms for the minimum execution time multiprocessor scheduling problem, which is known to be a strongly NP-hard combinatorial optimization problem. The STG Ver2 was created with random task execution times and random predecessors. In addition, the STG Ver3 considers the parallelism of task graphs and the deviation of task execution times so that the characteristics of algorithms can be understood. This paper describes evaluation results obtained by applying the STG Ver3 to several algorithms. The performance evaluation shows that DF/IHS gives optimal solutions for 87.25% of the problems, and PDF/IHS for 92.25%, within 600 seconds.

    CiNii

  • Performance Evaluation of Minimum Execution Time Multiprocessor Scheduling Algorithms Using Standard Task Graph Set Ver3 Consider Parallelism of Task Graphs and Deviation of Task Execution Time

    Mamoru Shimaoka, Kazuhiro Imaizumi, Fumiyo Takano, Keiji Kimura, Hironori Kasahara

    Technical Report of IEICE   2009 ( 14 ) 127 - 132  2009.02  [Refereed]

     View Summary

    This paper proposes the "Standard Task Graph Set Ver3" (STG Ver3) to evaluate the performance of heuristic and optimization algorithms for the minimum execution time multiprocessor scheduling problem, which is known to be a strongly NP-hard combinatorial optimization problem. The STG Ver2 was created with random task execution times and random predecessors. In addition, the STG Ver3 considers the parallelism of task graphs and the deviation of task execution times so that the characteristics of algorithms can be understood. This paper describes evaluation results obtained by applying the STG Ver3 to several algorithms. The performance evaluation shows that DF/IHS gives optimal solutions for 87.25% of the problems, and PDF/IHS for 92.25%, within 600 seconds.

    CiNii

  • Green multicore-SoC software-execution framework with timely-power-gating scheme

    Masafumi Onouchi, Keisuke Toyama, Toru Nojiri, Makoto Sato, Masayoshi Mase, Jun Shirako, Mikiko Sato, Masashi Takada, Masayuki Ito, Hiroyuki Mizuno, Mitaro Namiki, Keiji Kimura, Hironori Kasahara

    Proceedings of the International Conference on Parallel Processing     510 - 517  2009  [Refereed]

     View Summary

    We are developing a software-execution framework based on an octo-core chip multiprocessor named RP2 and an automatic multigrain-parallelizing compiler named OSCAR. The main purpose of this framework is to maintain good speed scalability and power efficiency over the number of processor cores under severe hardware restrictions for embedded use. Key to the speed scalability is reduction of the communication overhead among parallelized tasks. A data-categorization scheme enables small-overhead cache-coherency maintenance by using directives and instructions from the compiler. In this scheme, the number of cache flushes is minimized and parallelized tasks are quickly synchronized by using flags in local memory. As regards power efficiency, to reduce power consumption, the power supply to processor cores waiting for other cores is cut off in a timely and frequent manner, even in the middle of an application, by using a timely-power-gating scheme. In this scheme, to achieve quick mode transition between "NORMAL" mode and "RESUME POWER-OFF" mode, register values of the processor core are stored in core-local memory, which is active even in "RESUME POWER-OFF" mode and can be accessed in one or two clock cycles. Measured speed and power of an application show good speed scalability in execution time and high power efficiency, simultaneously. In the case of a secure AAC-LC encoding program, execution speed when eight processor cores are used can be increased by 4.85 times compared to that of sequential execution. Moreover, power consumption under the same condition can be reduced by 51.0% by parallelizing and timely-power gating. The time for mode transition is less than 20 μsec, which is only 2.5% of the "RESUME POWER-OFF" period. © 2009 IEEE.

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • A Power Saving Scheme on Multicore Processors Using OSCAR API

    Ryo Nakagawa, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2008/145)    2009.01

  • Local Memory Management Scheme by a Compiler for Multicore Processor

    Taku Momozono, Hirofumi Nakano, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2008/141)    2009.01

  • Performance Evaluation of Parallelizing Compiler Cooperated Heterogeneous Multicore Architecture Using Media Applications

    Teruo Kamiyama, Yasutaka Wada, Akihiro Hayashi, Masayoshi Mase, Hirofumi Nakano, Takeshi Watanabe, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2008/140)   2009 ( 1 ) 63 - 68  2009.01

     View Summary

    This paper describes a heterogeneous multicore architecture having accelerator cores in addition to general purpose cores, an automatic parallelizing compiler that cooperatively works with the heterogeneous multicore, a heterogeneous multicore architecture simulation environment, and performance evaluation results with the simulation environment. For the performance evaluation, multimedia applications written in C or Fortran, considered with parallelization by the compiler, are used. As a result, the evaluated heterogeneous multicore having two general purpose cores and two accelerator cores achieves 9.82 times speedup from MP3 encoder. This architecture also achieves 14.64 times speedup from JPEG2000 encoder.

    CiNii

  • Performance of OSCAR Multigrain Parallelizing Compiler on Multicore Processors

    Hiroki Mikami, Jun Shirako, Masayoshi Mase, Takamichi Miyamoto, Hirofumi Nakano, Fumiyo Takano, Akihiro Hayashi, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    Proc. of 14th Workshop on Compilers for Parallel Computing(CPC 2009)    2009.01  [Refereed]

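  • A Power Saving Scheme on Multicore Processors Using OSCAR API

    Ryo Nakagawa, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    The Institute of Electronics, Information and Communication Engineers, IEICE Technical Report, ICD2008-145    2009.01  [Refereed]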

  • A Power Saving Scheme on Multicore Processors Using OSCAR API

    Ryo Nakagawa, Masayoshi Mase, Jun Shirako, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2008/145)    2009.01  [Refereed]

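  • Local Memory Management Scheme by a Compiler for Multicore Processor

    Taku Momozono, Hirofumi Nakano, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    The Institute of Electronics, Information and Communication Engineers, IEICE Technical Report, ICD2008-141    2009.01  [Refereed]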

  • Local Memory Management Scheme by a Compiler for Multicore Processor

    Taku Momozono, Hirofumi Nakano, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2008/141)    2009.01  [Refereed]

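  • Simulation Evaluation of Parallelizing Compiler Cooperative Heterogeneous Multicore Architecture Using Media Applications

    Teruo Kamiyama, Yasutaka Wada, Akihiro Hayashi, Masayoshi Mase, Hirofumi Nakano, Takeshi Watanabe, Keiji Kimura, Hironori Kasahara

    The Institute of Electronics, Information and Communication Engineers, IEICE Technical Report, ICD2008-140   108 ( 375 ) 63 - 68  2009.01  [Refereed]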

     View Summary

    This paper describes a heterogeneous multicore architecture having accelerator cores in addition to general purpose cores, an automatic parallelizing compiler that cooperatively works with the heterogeneous multicore, a heterogeneous multicore architecture simulation environment, and performance evaluation results with the simulation environment. For the performance evaluation, multimedia applications written in C or Fortran, considered with parallelization by the compiler, are used. As a result, the evaluated heterogeneous multicore having two general purpose cores and two accelerator cores achieves 9.82 times speedup from MP3 encoder. This architecture also achieves 14.64 times speedup from JPEG2000 encoder.

    CiNii

  • Performance Evaluation of Parallelizing Compiler Cooperated Heterogeneous Multicore Architecture Using Media Applications

    Teruo Kamiyama, Yasutaka Wada, Akihiro Hayashi, Masayoshi Mase, Hirofumi Nakano, Takeshi Watanabe, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2008/140)   2009 ( 1 ) 63 - 68  2009.01  [Refereed]

     View Summary

    This paper describes a heterogeneous multicore architecture having accelerator cores in addition to general purpose cores, an automatic parallelizing compiler that cooperatively works with the heterogeneous multicore, a heterogeneous multicore architecture simulation environment, and performance evaluation results with the simulation environment. For the performance evaluation, multimedia applications written in C or Fortran, considered with parallelization by the compiler, are used. As a result, the evaluated heterogeneous multicore having two general purpose cores and two accelerator cores achieves 9.82 times speedup from MP3 encoder. This architecture also achieves 14.64 times speedup from JPEG2000 encoder.

    CiNii

  • Multiple-paths Search with Concurrent Thread Scheduling for Fast AND/OR Tree Search

    Fumiyo Takano, Yoshitaka Maekawa, Hironori Kasahara

    CISIS: 2009 INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS, VOLS 1 AND 2     51 - +  2009  [Refereed]

     View Summary

    This paper proposes a fast AND/OR tree search algorithm using a multiple-paths concurrent search method. Conventional heuristic AND/OR tree search algorithms expand nodes only in descending order of heuristic evaluation values. However, since the evaluation values are heuristic, a solution node group sometimes includes nodes with lower evaluation values. A tree whose solution node group includes nodes with lower evaluation values requires a long time to be solved by the conventional algorithms. The proposed algorithm allows us to search paths including nodes with lower evaluation values and paths including nodes with higher evaluation values concurrently. For searching various paths concurrently, the proposed algorithm uses pseudo-threads and a pseudo-thread scheduler managed by a user program with low overhead compared with the OS thread management. The pseudo-thread scheduler can weight the amount of search on each path and schedule the pseudo-threads. The proposed algorithm can also quickly solve trees which have solutions including nodes with lower evaluation values. For performance evaluation, the proposed algorithm was applied to a tsume-shogi (Japanese chess problem) solver as a typical AND/OR tree search problem. In tsume-shogi, players can reuse captured pieces. Performance evaluation results on 385 problems show that the proposed algorithm is 1.67 times faster on average than the previous algorithm df-pn.

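  • An Evaluation of Parallelization with Automatic Parallelizing Compiler Generating Consumer Electronics Multicore API

    Takamichi Miyamoto, Saori Asaka, Hiroki Mikami, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    IPSJ Transactions on Advanced Computing Systems   1 ( 3 ) 83 - 95  2008.12  [Refereed]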

    CiNii

  • An Evaluation of Parallelization with Automatic Parallelizing Compiler Generating Consumer Electronics Multicore API

    Takamichi Miyamoto, Saori Asaka, Hiroki Mikami, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    IPSJ Transactions on Advanced Computing Systems   1 ( 3 ) 83 - 95  2008.12  [Refereed]

    CiNii

  • Panel Discussions: Japanese Challenges for Multicore -Low Power High Performance Multicores,Compiler and API-

    Hironori Kasahara

    Intel Higher Education Program 2008 Asia Academic Forum    2008.10  [Refereed]

  • Multicore Technologies for Realization of Low-carbon Society and Challenge for Utilization Technologies

    Hironori Kasahara

    IBM HPC Forum 2008    2008.09  [Refereed]

  • Multicore Technologies for Realization of Low-carbon Society and Challenge for Utilization Technologies

    Hironori Kasahara

    IBM HPC Forum 2008    2008.09  [Refereed]

  • An Eight Core - Eight-RAM SoC Delivers 8.6GMIPS and 33.6GFLOPS at 600MHz (1/2)

    Hironori Kasahara

    Microprocessor Forum Japan 2008    2008.07  [Refereed]

  • An SoC with Eight Cores and Eight RAMs Delivering 8.6GMIPS/33.6GFLOPS (1/2)

    Hironori Kasahara

    Microprocessor Forum Japan 2008    2008.07  [Refereed]

  • Low Power High Performance Multicores Technology

    Hironori Kasahara

    JAPAN ASSOCIATION for HEAT PIPE Seminar    2008.07  [Refereed]

  • Low Power High Performance Multicore Technology

    Hironori Kasahara

    Japan Association for Heat Pipe, 27th General Meeting and Lecture    2008.07  [Refereed]

  • Parallelizing Compiler Cooperative Heterogeneous Multicore

    Yasutaka Wada, Akihiro Hayashi, Takeshi Masuura, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Proc. of Workshop on Software and Hardware Challenges of Manycore Platforms (SHCMP 2008)    2008.06  [Refereed]

  • Parallelization of MP3 Encoder using Static Scheduling on a Heterogeneous Multicore

    Yasutaka Wada, Akihiro Hayashi, Takeshi Masuura, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Trans. of IPSJ on Computing Systems   1 ( 1 ) 105 - 119  2008.06  [Refereed]

    CiNii

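  • Parallelization of MP3 Encoder using Static Scheduling on a Heterogeneous Multicore

    Yasutaka Wada, Akihiro Hayashi, Takeshi Masuura, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    IPSJ Transactions on Advanced Computing Systems   1 ( 1 ) 105 - 119  2008.06  [Refereed]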

    CiNii

  • OSCAR Low Power High Performance Multicore and Parallelizing Compiler

    Hironori Kasahara

    Nokia, Finland    2008.06  [Refereed]

  • Compiler and API for Low Power High Performance Multicores

    Hironori Kasahara

    8th International Forum on Application-Specific Multi-Processor SoC (MpSoc '08)    2008.06  [Refereed]

  • An Evaluation of Barrier Synchronization Mechanism Considering Hierarchical Processor Grouping

    Kaito Yamada, Masayoshi Mase, Jun Shirako, Keiji Kimura, Masayuki Ito, Toshihiro Hattori, Hiroyuki Mizuno, Kunio Uchiyama, Hironori Kasahara

    Technical Report of IPSJ   108 ( 28 ) 19 - 24  2008.05

    To use a large number of processor cores on a chip, hierarchical coarse grain task parallel processing, which exploits whole-program parallelism by analyzing hierarchical coarse grain task parallelism inside loops and subroutines, has been proposed and implemented in the OSCAR automatic parallelizing compiler. This processing defines processor groups hierarchically and logically, and assigns hierarchical coarse grain tasks to each processor group. A lightweight and scalable barrier synchronization mechanism that takes hierarchical processor grouping into account, and thereby supports hierarchical coarse grain task parallel processing, was developed and implemented on the RP2 multicore processor, which has eight SH4A cores, with support from the NEDO "Multicore Technology for Realtime Consumer Electronics" project. This paper proposes and evaluates this barrier mechanism. An evaluation using an AAC encoder program on 8 cores shows that the barrier mechanism achieves 16% better performance than a software barrier.

  • Automatic Parallelization of Restricted C Programs using Pointer Analysis

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Yuta Murata, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ   108 ( 28 ) 69 - 74  2008.05

    This paper describes a restriction on pointer usage in the C language that enables parallelism extraction by an automatic parallelizing compiler. With programs rewritten to satisfy the restriction, automatic parallelization using flow-sensitive, context-sensitive pointer analysis on an 8-core SMP server achieved speedups over sequential execution of 3.80 times for SPEC2000 art, 6.17 times for SPEC2006 lbm, and 5.14 times for MediaBench mpeg2enc.

  • OSCAR Multigrain Parallelizing Compiler for High Performance Low Power Multicores

    Hironori Kasahara

    The 14th Workshop on Compiler Techniques for High-Performance Computing(CTHPC2008)    2008.05  [Refereed]

  • OSCAR Multigrain Parallelizing Compiler for High Performance Low Power Multicores

    Hironori Kasahara

    Industrial Technology Research Institute, Hosted by Dr. Cheng    2008.05  [Refereed]

  • Embedded Multi-cores Advanced Parallelizing Compiler Technologies

    Hironori Kasahara

    11th Embedded Systems Expo    2008.05  [Refereed]

  • Embedded Multi-cores Advanced Parallelizing Compiler Technologies (in Japanese)

    笠原 博徳

    第11回組込みシステム開発技術展(ESEC) 専門セミナー    2008.05  [Refereed]

  • An Evaluation of Barrier Synchronization Mechanism Considering Hierarchical Processor Grouping

    Kaito Yamada, Masayoshi Mase, Jun Shirako, Keiji Kimura, Masayuki Ito, Toshihiro Hattori, Hiroyuki Mizuno, Kunio Uchiyama, Hironori Kasahara

    Technical Report of IPSJ, 2008   108 ( 28 ) 19 - 24  2008.05  [Refereed]

    To use a large number of processor cores on a chip, hierarchical coarse grain task parallel processing, which exploits whole-program parallelism by analyzing hierarchical coarse grain task parallelism inside loops and subroutines, has been proposed and implemented in the OSCAR automatic parallelizing compiler. This processing defines processor groups hierarchically and logically, and assigns hierarchical coarse grain tasks to each processor group. A lightweight and scalable barrier synchronization mechanism that takes hierarchical processor grouping into account, and thereby supports hierarchical coarse grain task parallel processing, was developed and implemented on the RP2 multicore processor, which has eight SH4A cores, with support from the NEDO "Multicore Technology for Realtime Consumer Electronics" project. This paper proposes and evaluates this barrier mechanism. An evaluation using an AAC encoder program on 8 cores shows that the barrier mechanism achieves 16% better performance than a software barrier.

  • An Evaluation of Barrier Synchronization Mechanism Considering Hierarchical Processor Grouping (in Japanese)

    山田 海斗, 間瀬 正啓, 白子 準, 木村 啓二, 伊藤 雅之, 服部 俊洋, 水野 弘之, 内山 邦男, 笠原 博徳

    第170回 計算機アーキテクチャ研究会   108 ( 28 ) 19 - 24  2008.05  [Refereed]

    To use a large number of processor cores on a chip, hierarchical coarse grain task parallel processing, which exploits whole-program parallelism by analyzing hierarchical coarse grain task parallelism inside loops and subroutines, has been proposed and implemented in the OSCAR automatic parallelizing compiler. This processing defines processor groups hierarchically and logically, and assigns hierarchical coarse grain tasks to each processor group. A lightweight and scalable barrier synchronization mechanism that takes hierarchical processor grouping into account, and thereby supports hierarchical coarse grain task parallel processing, was developed and implemented on the RP2 multicore processor, which has eight SH4A cores, with support from the NEDO "Multicore Technology for Realtime Consumer Electronics" project. This paper proposes and evaluates this barrier mechanism. An evaluation using an AAC encoder program on 8 cores shows that the barrier mechanism achieves 16% better performance than a software barrier.

  • Automatic Parallelization of Restricted C Programs using Pointer Analysis

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Yuta Murata, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2008   108 ( 28 ) 69 - 74  2008.05  [Refereed]

    This paper describes a restriction on pointer usage in the C language that enables parallelism extraction by an automatic parallelizing compiler. With programs rewritten to satisfy the restriction, automatic parallelization using flow-sensitive, context-sensitive pointer analysis on an 8-core SMP server achieved speedups over sequential execution of 3.80 times for SPEC2000 art, 6.17 times for SPEC2006 lbm, and 5.14 times for MediaBench mpeg2enc.

  • Automatic Parallelization of Restricted C Programs using Pointer Analysis (in Japanese)

    間瀬正啓, 馬場大介, 長山晴美, 村田雄太, 木村啓二, 笠原博徳

    第170回 計算機アーキテクチャ研究会   108 ( 28 ) 69 - 74  2008.05  [Refereed]

    This paper describes a restriction on pointer usage in the C language that enables parallelism extraction by an automatic parallelizing compiler. With programs rewritten to satisfy the restriction, automatic parallelization using flow-sensitive, context-sensitive pointer analysis on an 8-core SMP server achieved speedups over sequential execution of 3.80 times for SPEC2000 art, 6.17 times for SPEC2006 lbm, and 5.14 times for MediaBench mpeg2enc.

  • Parallelization of Multimedia Applications by Compiler on Multicores for Consumer Electronics

    Takamichi Miyamoto, Saori Asaka, Hiroki Mikami, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2008)    2008.05  [Refereed]

  • Parallelization of Multimedia Applications by Compiler on Multicores for Consumer Electronics (in Japanese)

    宮本孝道, 浅香沙織, 見神広紀, 間瀬正啓, 木村啓二, 笠原博徳

    SACSIS2008 - 先進的計算基盤システムシンポジウム    2008.05  [Refereed]

  • Heterogeneous multi-core architecture that enables 54x AAC-LC stereo encoding

    Hiroaki Shikano, Masaki Ito, Masafumi Onouchi, Takashi Todaka, Takanobu Tsunoda, Tomoyuki Kodama, Kunio Uchiyama, Toshihiko Odaka, Tatsuya Kamei, Ei Nagahama, Manabu Kusaoke, Yusuke Nitta, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    IEEE JOURNAL OF SOLID-STATE CIRCUITS   43 ( 4 ) 902 - 910  2008.04  [Refereed]

    This paper describes a heterogeneous multi-core processor (HMCP) architecture that integrates general-purpose processors (CPUs) and accelerators (ACCs) to achieve exceptional performance as well as low-power consumption for the SoCs of embedded systems. The memory architectures of CPUs and ACCs were unified to improve programming and compiling efficiency. Advanced audio codec-low complexity (AAC-LC) stereo audio encoding was parallelized on a heterogeneous multi-core having homogeneous processor cores and dynamically reconfigurable processor (DRP) ACC cores in a preliminary evaluation of the HMCP architecture. The performance evaluation revealed that 54x AAC encoding was achieved on the chip with two CPUs at 600 MHz and two DRPs at 300 MHz, which achieved encoding of an entire CD within 1-2 min.

    Citations (Scopus): 16

  • An 8 CPU SoC with Independent Power-off Control of CPUs and Multicore Software Debug Function

    Yutaka Yoshida, Masayuki Ito, Kiyoshi Hayase, Tomoichi Hayashi, Osamu Nishii, Toshihiro Hattori, Jun Sakiyama, Masashi Takada, Kunio Uchiyama, Jun Shirako, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Proc. of IEEE Cool Chips XI: Symposium on Low-Power and High-Speed Chips 2008    2008.04  [Refereed]

  • Panel Discussions: Multi-Core and Many-Core: the 5 to 10 Year View

    Hironori Kasahara

    IEEE Symposium on Low-Power and High-Speed Chips COOLChips XI    2008.04  [Refereed]

  • Multicore Compiler for Low Power High Performance Embedded Computing

    Hironori Kasahara

    IEEE Symposium on Low-Power and High-Speed Chips COOLChips XI, Yokohama, Japan    2008.04  [Refereed]

  • Power-aware compiler controllable chip multiprocessor

    Hiroaki Shikano, Jun Shirako, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    IEICE TRANSACTIONS ON ELECTRONICS   E91C ( 4 ) 432 - 439  2008.04  [Refereed]

    A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.

    Citations (Scopus): 1

  • Multicore Processors for Consumer Electronics (in Japanese)

    笠原博徳

    電気学会誌   128 ( 3 ) 172 - 175  2008.03  [Refereed]

  • Multicore Processors for Consumer Electronics

    Hironori Kasahara

    The Journal of IEE of Japan   128 ( 3 ) 172 - 175  2008.03  [Refereed]

  • A Multigrain Parallelizing Compiler with Power Control for Multicore Processors

    Hironori Kasahara

    Intel Headquarters, Hosted by Dr. Peng Tu    2008.02  [Refereed]

  • A Multigrain Parallelizing Compiler with Power Control for Multicore Processors

    Hironori Kasahara

    Google Headquarters, Hosted by Dr. Shih-wei Liao    2008.02  [Refereed]

  • Performance evaluation of compiler controlled power saving scheme

    Jun Shirako, Munehiro Yoshida, Naoto Oshiyama, Yasutaka Wada, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    HIGH-PERFORMANCE COMPUTING   4759   480 - 493  2008  [Refereed]

    Multicore processors, or chip multiprocessors, which allow us to realize low power consumption, high effective performance, good cost performance and short hardware/software development period, are attracting much attention. In order to achieve full potential of multicore processors, cooperation with a parallelizing compiler is very important. The latest compiler extracts multilevel parallelism, such as coarse grain task parallelism, loop parallelism and near fine grain parallelism, to keep parallel execution efficiency high. It also controls voltage and clock frequency of processors carefully to reduce energy consumption during execution of an application program. This paper evaluates performance of compiler controlled power saving scheme which has been implemented in OSCAR multigrain parallelizing compiler. The developed power saving scheme realizes voltage/frequency control and power shutdown of each processor core during coarse grain task parallel processing. In performance evaluation, when static power is assumed as one-tenth of dynamic power, OSCAR compiler with the power saving scheme achieved 61.2 percent energy reduction for SPEC CFP95 applu without performance degradation on 4 processors and 87.4 percent energy reduction for mpeg2encode, 88.1 percent energy reduction for SPEC CFP95 tomcatv and 84.6 percent energy reduction for applu with real-time deadline constraint on 4 processors.

  • Language extensions in support of compiler parallelization

    Jun Shirako, Hironori Kasahara, Vivek Sarkar

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING   5234   78 - +  2008  [Refereed]

    In this paper, we propose an approach to automatic compiler parallelization based on language extensions that is applicable to a broader range of program structures and application domains than in past work. As a complement to ongoing work on high productivity languages for explicit parallelism, the basic idea in this paper is to make sequential languages more amenable to compiler parallelization by adding enforceable declarations and annotations. Specifically, we propose the addition of annotations and declarations related to multidimensional arrays, points, regions, array views, parameter intents, array and object privatization, pure methods, absence of exceptions, and gather/reduce computations. In many cases, these extensions are also motivated by best practices in software engineering, and can also contribute to performance improvements in sequential code. A detailed case study of the Java Grande Forum benchmark suite illustrates the obstacles to compiler parallelization in current object-oriented languages, and shows that the extensions proposed in this paper can be effective in enabling compiler parallelization. The results in this paper motivate future work on building an automatically parallelizing compiler for the language extensions proposed in this paper.

  • Advanced Parallelizing Compiler Technology for High Performance Low Power Multicores

    Hironori Kasahara

    VDEC Refresh Seminar    2008.01  [Refereed]

  • Advanced Parallelizing Compiler Technology for High Performance Low Power Multicores (in Japanese)

    笠原 博徳

    VDECリフレッシュ・セミナー    2008.01  [Refereed]

  • Software-cooperative power-efficient heterogeneous multi-core for media processing

    Hiroaki Shikano, Masaki Ito, Kunio Uchiyama, Toshihiko Odaka, Akihiro Hayashi, Takeshi Masuura, Masayoshi Mase, Jun Shirako, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    2008 ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, VOLS 1 AND 2     712 - +  2008  [Refereed]

    A heterogeneous multi-core processor (HMCP) architecture, which integrates general purpose processors (CPUs) and accelerators (ACCs) to achieve high performance as well as low power consumption with the support of a parallelizing compiler, was developed. The evaluation was performed using an MP3 audio encoder on a simulator that accurately models the HMCP. It showed that 16-frame encoding on the HMCP with four CPUs and four ACCs yielded a 24.5-fold speed-up against sequential execution on one CPU. Furthermore, power saving by the compiler reduced the energy consumption of the encoding to 0.17 J, namely, by 28.4%.

  • An 8640 MIPS SoC with independent power-off control of 8 CPUs and 8 RAMs by an automatic parallelizing compiler

    Masayuki Ito, Toshihiro Hattori, Yutaka Yoshida, Kiyoshi Hayase, Tomoichi Hayashi, Osamu Nishii, Yoshihiko Yasu, Atsushi Hasegawa, Masashi Takada, Masaki Ito, Hiroyuki Mizuno, Kunio Uchiyama, Toshihiko Odaka, Jun Shirako, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Digest of Technical Papers - IEEE International Solid-State Circuits Conference   51   81 - 598  2008  [Refereed]

    A 104.8 mm² 90 nm CMOS 600 MHz SoC integrates 8 processor cores and 8 user RAMs in 17 separate power domains and delivers 33.6 GFLOPS. An automatic parallelizing compiler assigns tasks to each CPU and controls its power mode including power supply in accordance with its processing load and status. The compiler also uses barrier registers to achieve fast and accurate CPU synchronization.

    Citations (Scopus): 37

  • Power Reduction Control for Multicores in OSCAR Multigrain Parallelizing Compiler

    Jun Shirako, Keiji Kimura, Hironori Kasahara

    ISOCC: 2008 INTERNATIONAL SOC DESIGN CONFERENCE, VOLS 1-3     50 - 55  2008  [Refereed]

    Multicore processors have become the mainstream computer architecture for going beyond the performance and power efficiency limits of single-core processors. To achieve low power consumption and high performance on multicores, parallelizing compilers play an important role. This paper describes the performance of a compiler-based power reduction scheme cooperating with the OSCAR multigrain parallelizing compiler on a newly developed 8-way SH4A low power multicore chip for consumer electronics, which supports DVFS (Dynamic Voltage and Frequency Scaling) and Clock/Power Gating. Using hardware parameters and parallelized program information, the OSCAR compiler determines a suitable voltage and frequency for each active processor core and an appropriate schedule of clock gating and power gating. Performance experiments show that the compiler reduces consumed power by 88.3%, namely from 5.68 W to 0.67 W, for real-time secure AAC Encoding and by 73.5%, namely from 5.73 W to 1.52 W, for real-time MPEG2 Decoding on 8 core execution.

  • Parallelization with Automatic Parallelizing Compiler Generating Consumer Electronics Multicore API

    Takamichi Miyamoto, Saori Asaka, Hiroki Mikami, Masayoshi Mase, Yasutaka Wada, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS     600 - 607  2008  [Refereed]

    Multicore processors have been adopted for consumer electronics such as portable electronics, mobile phones, car navigation systems, digital TVs and games to obtain high performance with low power consumption. The OSCAR automatic parallelizing compiler has been developed to utilize these multicores easily. Also, a new Consumer Electronics Multicore Application Program Interface (API), which allows the OSCAR compiler to be used with native sequential compilers for various kinds of multicores from different vendors, has been developed in the NEDO (New Energy and Industrial Technology Development Organization) "Multicore Technology for Realtime Consumer Electronics" project with six Japanese IT companies. This paper evaluates the parallel processing performance of multimedia applications using this API with the OSCAR compiler on the FR1000 multicore processor with 4 VLIW cores developed by Fujitsu Ltd., and on the RP1 multicore processor with 4 SH-4A cores jointly developed by Renesas Technology Corp., Hitachi Ltd. and Waseda University. As a result, the parallel codes generated by the OSCAR compiler using the API achieve a 3.27 times speedup on average using 4 cores against 1 core on the FR1000 multicore, and a 3.31 times speedup on average using 4 cores against 1 core on the RP1 multicore.

    Citations (Scopus): 6

  • Parallelization for Multimedia Processing on Multicore Processors

    Takamichi Miyamoto, Kei Tamura, Hiroaki Tano, Hiroki Mikami, Saori Asaka, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-175-05 (DesignGaia2007)   2007 ( 115 ) 77 - 82  2007.11

    Multicore processors have attracted much attention as a way to handle increasing power consumption, the slowdown in processor clock speed improvement, and lengthening hardware/software development periods. Speeding up multimedia applications is also required as consumer electronics devices such as mobile phones, digital TVs and games advance. This paper describes parallelization methods for multimedia applications on multicore processors. In particular, MPEG2 encoding and MPEG2 decoding are selected as examples of video sequence processing, MP3 encoding as an example of audio processing, and JPEG 2000 encoding as an example of picture processing. The OSCAR multigrain parallelizing compiler parallelizes these media applications using a newly developed multicore API. This paper evaluates the parallel processing performance of these multimedia applications on the FR1000 multicore processor developed by Fujitsu Ltd., and the RP1 multicore processor jointly developed by Waseda University, Renesas Technology Corp. and Hitachi Ltd.

  • Parallelization for Multimedia Processing on Multicore Processors (in Japanese)

    宮本孝道, 田村圭, 田野裕秋, 見神広紀, 浅香沙織, 間瀬正啓, 木村啓二, 笠原博徳

    情報処理学会研究会報告2007-ARC-175-15(デザインガイア2007)   2007 ( 115 ) 77 - 82  2007.11  [Refereed]

    Multicore processors have attracted much attention as a way to handle increasing power consumption, the slowdown in processor clock speed improvement, and lengthening hardware/software development periods. Speeding up multimedia applications is also required as consumer electronics devices such as mobile phones, digital TVs and games advance. This paper describes parallelization methods for multimedia applications on multicore processors. In particular, MPEG2 encoding and MPEG2 decoding are selected as examples of video sequence processing, MP3 encoding as an example of audio processing, and JPEG 2000 encoding as an example of picture processing. The OSCAR multigrain parallelizing compiler parallelizes these media applications using a newly developed multicore API. This paper evaluates the parallel processing performance of these multimedia applications on the FR1000 multicore processor developed by Fujitsu Ltd., and the RP1 multicore processor jointly developed by Waseda University, Renesas Technology Corp. and Hitachi Ltd.

  • Parallelization for Multimedia Processing on Multicore Processors

    Takamichi Miyamoto, Kei Tamura, Hiroaki Tano, Hiroki Mikami, Saori Asaka, Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-175-05 (DesignGaia2007)   2007 ( 115 ) 77 - 82  2007.11  [Refereed]

    Multicore processors have attracted much attention as a way to handle increasing power consumption, the slowdown in processor clock speed improvement, and lengthening hardware/software development periods. Speeding up multimedia applications is also required as consumer electronics devices such as mobile phones, digital TVs and games advance. This paper describes parallelization methods for multimedia applications on multicore processors. In particular, MPEG2 encoding and MPEG2 decoding are selected as examples of video sequence processing, MP3 encoding as an example of audio processing, and JPEG 2000 encoding as an example of picture processing. The OSCAR multigrain parallelizing compiler parallelizes these media applications using a newly developed multicore API. This paper evaluates the parallel processing performance of these multimedia applications on the FR1000 multicore processor developed by Fujitsu Ltd., and the RP1 multicore processor jointly developed by Waseda University, Renesas Technology Corp. and Hitachi Ltd.

  • Multigrain Parallelization of Restricted C Programs on SMP Servers and Low Power Multicores

    M. Mase, D. Baba, H. Nagayama, H. Tano, T. Masuura, T. Miyamoto, J. Shirako, H. Nakano, K. Kimura, H. Kasahara

    The 20th International Workshop on Languages and Compilers for Parallel Computing (LCPC2007)    2007.10  [Refereed]

  • Low Power High Performance Multicores and Compiler Technology

    Hironori Kasahara

    The 5th Technology Link in W.T.L.O - For International Research Center in Collaboration of Industry and Academia    2007.10  [Refereed]

  • Low Power High Performance Multicores and Compiler Technology (in Japanese)

    笠原 博徳

    第5回Technology Link in W.T.L.O 〜 産学連携における国際化拠点の構築に向けて 〜    2007.10  [Refereed]

  • Multigrain Parallelization of Restricted C Programs in SMP Execution Mode of a Multicore for Consumer Electronics (in Japanese)

    間瀬正啓, 馬場大介, 長山晴美, 田野裕秋, 益浦健, 宮本孝道, 白子準, 中野啓史, 木村啓二, 笠原博徳

    情報家電用マルチコアSMP実行モードにおける制約付きCプログラムのマルチグレイン並列化    2007.10  [Refereed]

  • A Multi-core Parallelizing Compiler for Low-Power High-Performance Computing

    Hironori Kasahara

    Colloquium Electrical and Computer Engineering, Computer and Information Technology Institute, Computer Science, and Dean of Engineering, Duncan Hall, Rice University, Hosted by Prof. Vivek Sarkar    2007.10  [Refereed]

  • Panel Discussion on "How is specifically multicore programming different from traditional parallel computing?"

    Hironori Kasahara

    The 20th International Workshop on Languages and Compilers for Parallel Computing (LCPC2007), University of Illinois at Urbana-Champaign    2007.10  [Refereed]

  • Multigrain Parallelization of Restricted C Programs in SMP Execution Mode of a Multicore for Consumer Electronics (in Japanese)

    間瀬正啓, 馬場大介, 長山晴美, 田野裕秋, 益浦健, 宮本孝道, 白子準, 中野啓史, 木村啓二, 笠原博徳

    組込みシステムシンポジウム2007    2007.10  [Refereed]

  • Multigrain Parallelization of Restricted C Programs in SMP Execution Mode of a Multicore for Consumer Electronics

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Hiroaki Tano, Takeshi Masuura, Takamichi Miyamoto, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Embedded Systems Symposium 2007 (ESS 2007)    2007.10  [Refereed]

  • Multicore Innovation

    Hironori Kasahara

    Waseda Univ. 125th & Faculty of Science and Engineering 100th Anniversary Symposium "Innovative Information, Electronics, and Optical Technology"    2007.09  [Refereed]

  • Multicore Innovation (in Japanese)

    笠原 博徳

    早稲田大学125周年・理工学部100周年記念シンポジウム “イノベーティブ情報・電子・光技術”    2007.09  [Refereed]

  • Compiler Control Power Saving for Heterogeneous Multicore Processor

    Akihiro Hayashi, Taketo Iyoku, Ryo Nakagawa, Shigeru Matsumoto, Kaito Yamada, Naoto Oshiyama, Jun Shirako, Yasutaka Wada, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-174-18(SWoPP2007)   2007 ( 79 ) 103 - 108  2007.08

    Multicore processors are being introduced to improve performance and reduce power dissipation in various IT fields, such as consumer electronics, PCs, servers and supercomputers. In particular, heterogeneous multicores have attracted much attention in consumer electronics as a way to achieve higher performance per watt. To satisfy the demand for high performance, low power dissipation and high software productivity, parallelizing compilers that handle both parallelization and frequency and voltage control are required. This paper describes evaluation results of compiler-controlled power saving for a heterogeneous multicore processor that integrates up to 4 Renesas SH4A general-purpose embedded processors and 4 accelerator cores such as Hitachi FE-GA dynamically reconfigurable processors. The performance evaluation shows that the heterogeneous multicore achieved a 24.32 times speedup against sequential processing and 28.43% energy savings for an MP3 encoding program without performance degradation.

  • A Hierarchical Coarse Grain Task Static Scheduling Scheme on a Heterogeneous Multicore

    Yasutaka Wada, Akihiro Hayashi, Taketo Iyoku, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-174-17(SWoPP2007)   2007 ( 79 ) 97 - 102  2007.08

    This paper proposes a static scheduling scheme for hierarchical coarse grain task parallel processing on a heterogeneous multicore processor. A heterogeneous multicore processor integrates not only general purpose processors but also accelerators such as dynamically reconfigurable processors (DRPs) or digital signal processors (DSPs). Effective use of these accelerators makes it possible to obtain high performance and low power consumption at the same time. In the proposed scheme, the compiler extracts parallelism using coarse grain parallel processing and assigns tasks considering the characteristics of each core to minimize the execution time of an application. The performance of the proposed scheme is evaluated on a heterogeneous multicore processor using an MP3 encoder. Heterogeneous configurations give a 12.64 times speedup with two SH4As and two DRPs, and a 24.48 times speedup with four SH4As and four DRPs, against sequential execution with one SH4A core.

  • Evaluation of Heterogeneous Multicore Architecture with AAC-LC Stereo Encoding

    Hiroaki Shikano, Masaki Ito, Takashi Todaka, Takanobu Tsunoda, Tomoyuki Kodama, Masafumi Onouchi, Kunio Uchiyama, Toshihiko Odaka, Tatsuya Kamei, Ei Nagahama, Manabu Kusaoke, Yusuke Nitta, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2007-71)   107 ( 195 ) 11 - 16  2007.08

    This paper describes a heterogeneous multi-core processor (HMCP) architecture which integrates general purpose processors (CPUs) and accelerators (ACCs) to achieve high performance as well as low power consumption for SoCs of embedded systems. The memory architectures of CPUs and ACCs were unified to improve programming and compiling efficiency. For a preliminary evaluation of the HMCP architecture, AAC-LC stereo audio encoding is parallelized on a heterogeneous multi-core having homogeneous processor cores and dynamically reconfigurable processor (DRP) accelerator cores. The performance evaluation shows that 54x AAC encoding is achieved on the chip with two CPUs at 600 MHz and two DRPs at 300 MHz, which realizes encoding of a whole CD in 1-2 minutes.

  • Compiler-Based Low Power Control on a Heterogeneous Multicore

    林明宏, 伊能健人, 中川亮, 松本繁, 山田海斗, 押山直人, 白子準, 和田康孝, 中野啓史, 鹿野裕明, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2007-ARC-174-18 (SWoPP2007)   2007 ( 79 ) 103 - 108  2007.08  [Refereed]

     View Summary

    Multicore processors are being introduced to improve performance and reduce power dissipation in various IT fields, such as consumer electronics, PCs, servers and supercomputers. In particular, heterogeneous multicores have attracted much attention in consumer electronics as a way to achieve higher performance per watt. To satisfy the demand for high performance, low power dissipation and high software productivity, parallelizing compilers that handle both parallelization and frequency and voltage control are required. This paper describes the evaluation results of compiler-controlled power saving for a heterogeneous multicore processor that integrates up to 4 Renesas SH4A general purpose embedded processors and 4 accelerator cores such as Hitachi FE-GA dynamically reconfigurable processors. Performance evaluation shows that the heterogeneous multicore gave 24.32 times speedup against sequential processing and 28.43% energy savings for an MP3 encoding program without performance degradation.

    CiNii

  • A Hierarchical Coarse Grain Task Static Scheduling Scheme on a Heterogeneous Multicore

    和田康孝, 林明宏, 伊能健人, 白子準, 中野啓史, 鹿野裕明, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2007-ARC-174-17 (SWoPP2007)   2007 ( 79 ) 97 - 102  2007.08  [Refereed]

     View Summary

    This paper proposes a static scheduling scheme for hierarchical coarse grain task parallel processing on a heterogeneous multicore processor. A heterogeneous multicore processor integrates not only general purpose processors but also accelerators such as dynamically reconfigurable processors (DRPs) or digital signal processors (DSPs). Effective use of these accelerators yields high performance and low power consumption at the same time. In the proposed scheme, the compiler extracts parallelism using coarse grain parallel processing and assigns tasks according to the characteristics of each core to minimize the execution time of an application. The performance of the proposed scheme is evaluated on a heterogeneous multicore processor using an MP3 encoder. Heterogeneous configurations give 12.64 times speedup with two SH4As and two DRPs and 24.48 times speedup with four SH4As and four DRPs against sequential execution with one SH4A core.

    CiNii

  • A Study of a Heterogeneous Multicore Architecture Realizing 54x AAC Encoding

    鹿野裕明, 伊藤雅樹, 戸高貴司, 津野田賢伸, 兒玉征之, 小野内雅文, 内山邦男, 小高俊彦, 亀井達也, 永濱 衛, 草桶 学, 新田祐介, 和田康孝, 木村啓二, 笠原博徳

    The Institute of Electronics, Information and Communication Engineers (IEICE), IEICE Technical Report, ICD2007-71   107 ( 195 ) 11 - 16  2007.08  [Refereed]

     View Summary

    This paper describes a heterogeneous multi-core processor (HMCP) architecture that integrates general purpose processors (CPUs) and accelerators (ACCs) to achieve high performance as well as low power consumption for SoCs in embedded systems. The memory architectures of the CPUs and ACCs were unified to improve programming and compiling efficiency. For a preliminary evaluation of the HMCP architecture, AAC-LC stereo audio encoding is parallelized on a heterogeneous multi-core having homogeneous processor cores and dynamic reconfigurable processor (DRP) accelerator cores. The performance evaluation shows that 54x AAC encoding is achieved on the chip with two CPUs at 600MHz and two DRPs at 300MHz, which realizes encoding of a whole CD in 1-2 minutes.

    CiNii

  • A Hierarchical Coarse Grain Task Static Scheduling Scheme on a Heterogeneous Multicore

    Yasutaka Wada, Akihiro Hayashi, Taketo Iyoku, Jun Shirako, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-174-17(SWoPP2007)   2007 ( 79 ) 97 - 102  2007.08  [Refereed]

     View Summary

    This paper proposes a static scheduling scheme for hierarchical coarse grain task parallel processing on a heterogeneous multicore processor. A heterogeneous multicore processor integrates not only general purpose processors but also accelerators such as dynamically reconfigurable processors (DRPs) or digital signal processors (DSPs). Effective use of these accelerators yields high performance and low power consumption at the same time. In the proposed scheme, the compiler extracts parallelism using coarse grain parallel processing and assigns tasks according to the characteristics of each core to minimize the execution time of an application. The performance of the proposed scheme is evaluated on a heterogeneous multicore processor using an MP3 encoder. Heterogeneous configurations give 12.64 times speedup with two SH4As and two DRPs and 24.48 times speedup with four SH4As and four DRPs against sequential execution with one SH4A core.

    CiNii

  • Evaluation of Heterogeneous Multicore Architecture with AAC-LC Stereo Encoding

    Hiroaki Shikano, Masaki Ito, Takashi Todaka, Takanobu Tsunoda, Tomoyuki Kodama, Masafumi Onouchi, Kunio Uchiyama, Toshihiko Odaka, Tatsuya Kamei, Ei Nagahama, Manabu Kusaoke, Yusuke Nitta, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD2007-71)   107 ( 195 ) 11 - 16  2007.08  [Refereed]

     View Summary

    This paper describes a heterogeneous multi-core processor (HMCP) architecture that integrates general purpose processors (CPUs) and accelerators (ACCs) to achieve high performance as well as low power consumption for SoCs in embedded systems. The memory architectures of the CPUs and ACCs were unified to improve programming and compiling efficiency. For a preliminary evaluation of the HMCP architecture, AAC-LC stereo audio encoding is parallelized on a heterogeneous multi-core having homogeneous processor cores and dynamic reconfigurable processor (DRP) accelerator cores. The performance evaluation shows that 54x AAC encoding is achieved on the chip with two CPUs at 600MHz and two DRPs at 300MHz, which realizes encoding of a whole CD in 1-2 minutes.

    CiNii

  • State-of-the-Art Compiler Technologies for Embedded Multicores

    笠原 博徳

    DA Symposium 2007 - System LSI Design Technology and DA -    2007.08  [Refereed]

  • Advanced Parallelizing Compiler Technologies for Embedded Multi-cores

    Hironori Kasahara

    DA Symposium 2007    2007.08  [Refereed]

  • Multigrain Parallel Processing in SMP Execution Mode on a Multicore for Consumer Electronics

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Hiroaki Tano, Takeshi Masuura, Takamichi Miyamoto, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Tatsuya Kamei, Toshihiro Hattori, Atsushi Hasegawa, Makoto Sato, Masaki Ito, Toshihiko Odaka, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-173-05   107 ( 76 ) 25 - 30  2007.05

     View Summary

    Currently, multicore processors are becoming ubiquitous in various computing domains, including consumer electronics such as games, car navigation systems and mobile phones, PCs, and supercomputers. This paper describes the parallelization of media processing programs written in a restricted C language by the OSCAR multigrain parallelizing compiler, and SMP processing performance on the RP1 4-core SH-4A (SH-X3) multicore processor, which was developed by Renesas Technology Corp. and Hitachi, Ltd. based on the standard OSCAR multicore memory architecture as part of the NEDO "Research and Development of Multicore Technology for Real Time Consumer Electronics Project". Performance evaluation shows that OSCAR compiler achieved 3.34 times speedup using 4 cores against using 1 core for an AAC audio encoder.

    CiNii

  • Performance Evaluation of OSCAR Heterogeneous Chip Multiprocessor Using an MP3 Encoder

    鹿野裕明, 鈴木裕貴, 和田康孝, 白子準, 木村啓二, 笠原博徳

    Trans. of IPSJ   48 ( SIG8(ACS18) ) 141 - 152  2007.05  [Refereed]

  • Development of a 4320MIPS Four-Processor LSI Supporting SMP/AMP with Independently Controllable Frequencies

    早瀬 清, 吉田 裕, 亀井達也, 芝原真一, 西井 修, 服部俊洋, 長谷川 淳, 高田雅士, 入江直彦, 内山邦男, 小高俊彦, 高田 究, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2007-ARC-173-06 (165th Meeting of the SIG on Computer Architecture)   107 ( 76 ) 31 - 35  2007.05  [Refereed]

     View Summary

    A 4320MIPS 4-processor SoC that provides low power consumption and high performance was designed using a 90nm process. A 32KB data cache is built into each processor, along with a module that maintains data cache coherency between processors. Low power is achieved by controlling the frequency of each processor according to its processing load and by adopting a sleep mode that maintains data cache coherency between processors.

    CiNii

  • Multigrain Parallel Processing in SMP Execution Mode on a Multicore for Consumer Electronics

    間瀬正啓, 馬場大介, 長山晴美, 田野裕秋, 益浦健, 深津幸二, 宮本孝道, 白子準, 中野啓史, 木村啓二, 亀井達也, 服部俊洋, 長谷川淳, 佐藤真琴, 伊藤雅樹, 内山 邦男, 小高俊彦, 笠原博徳

    Technical Report of IPSJ, 2007-ARC-173-05 (165th Meeting of the SIG on Computer Architecture)   107 ( 76 ) 25 - 30  2007.05  [Refereed]

     View Summary

    Currently, multicore processors are becoming ubiquitous in various computing domains, including consumer electronics such as games, car navigation systems and mobile phones, PCs, and supercomputers. This paper describes the parallelization of media processing programs written in a restricted C language by the OSCAR multigrain parallelizing compiler, and SMP processing performance on the RP1 4-core SH-4A (SH-X3) multicore processor, which was developed by Renesas Technology Corp. and Hitachi, Ltd. based on the standard OSCAR multicore memory architecture as part of the NEDO "Research and Development of Multicore Technology for Real Time Consumer Electronics Project". Performance evaluation shows that OSCAR compiler achieved 3.34 times speedup using 4 cores against using 1 core for an AAC audio encoder.

    CiNii

  • Performance Evaluation of MP3 Audio Encoder on OSCAR Heterogeneous Chip Multicore Processor

    Hiroaki Shikano, Yuki Suzuki, Yasutaka Wada, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Trans. of IPSJ   48 ( SIG8(ACS18) ) 141 - 152  2007.05  [Refereed]

     View Summary

    This paper evaluates a heterogeneous chip multi-processor (HCMP) and its scheduling scheme. The HCMP possesses different types of processing elements (PEs), such as CPUs as general-purpose processors as well as digital signal processors or dynamic reconfigurable processors (DRPs) as special-purpose processors. In specific applications such as media processing, the HCMP realizes higher performance and lower power consumption than conventional single-core processors or even homogeneous multi-core processors while operating at a low frequency. In this paper, the performance of the HCMP is analyzed by studying a parallelizing scheme and a power control scheme for an MP3 audio encoding program and by scheduling the program onto the HCMP using these two schemes. As a result, an HCMP consisting of two CPUs and four DRPs is observed to outperform a single-core processor with one CPU by a speedup factor of 18.4. The estimated energy on the HCMP with power control is also reduced by as much as 80.0%.

    CiNii

  • Multigrain Parallel Processing in SMP Execution Mode on a Multicore for Consumer Electronics

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Hiroaki Tano, Takeshi Masuura, Takamichi Miyamoto, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Tatsuya Kamei, Toshihiro Hattori, Atsushi Hasegawa, Makoto Sato, Masaki Ito, Toshihiko Odaka, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-173-05   107 ( 76 ) 25 - 30  2007.05  [Refereed]

     View Summary

    Currently, multicore processors are becoming ubiquitous in various computing domains, including consumer electronics such as games, car navigation systems and mobile phones, PCs, and supercomputers. This paper describes the parallelization of media processing programs written in a restricted C language by the OSCAR multigrain parallelizing compiler, and SMP processing performance on the RP1 4-core SH-4A (SH-X3) multicore processor, which was developed by Renesas Technology Corp. and Hitachi, Ltd. based on the standard OSCAR multicore memory architecture as part of the NEDO "Research and Development of Multicore Technology for Real Time Consumer Electronics Project". Performance evaluation shows that OSCAR compiler achieved 3.34 times speedup using 4 cores against using 1 core for an AAC audio encoder.

    CiNii

  • A Local Memory Management Scheme in Multigrain Parallelizing Compiler

    Tsuyoshi Miura, Tomohiro Tagawa, Yusuke Muramatsu, Akinori Ikemi, Masahiro Nakagawa, Hirofumi Nakano, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-172/HPC-109-11   2007 ( 17 ) 61 - 66  2007.03

     View Summary

    Multicore systems have been attracting much attention for performance, low power consumption and short hardware/software development periods. To take full advantage of multiprocessor systems, parallelizing compilers play important roles. On a multicore processor, the memory wall caused by the speed gap between the processor core and memory is also a serious problem. Therefore, it is important for performance improvement to make effective use of fast memories near a processor, such as cache and local memory. This paper proposes a local memory management scheme for coarse grain task parallel processing. In the evaluation using SPEC 95fp tomcatv, the proposed scheme using 8 processors achieved 19.6 times speedup against the sequential execution without the proposed scheme on the OSCAR multicore processor through the effective use of local memories.

    CiNii

  • A Local Memory Management Scheme in Multigrain Parallelizing Compiler

    三浦 剛, 田川友博, 村松裕介, 池見明紀, 中川正洋, 中野啓史, 白子 準, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2007-ARC-109/HPC-109-11 (HOKKE2007)   2007 ( 17 ) 61 - 66  2007.03  [Refereed]

     View Summary

    Multicore systems have been attracting much attention for performance, low power consumption and short hardware/software development periods. To take full advantage of multiprocessor systems, parallelizing compilers play important roles. On a multicore processor, the memory wall caused by the speed gap between the processor core and memory is also a serious problem. Therefore, it is important for performance improvement to make effective use of fast memories near a processor, such as cache and local memory. This paper proposes a local memory management scheme for coarse grain task parallel processing. In the evaluation using SPEC 95fp tomcatv, the proposed scheme using 8 processors achieved 19.6 times speedup against the sequential execution without the proposed scheme on the OSCAR multicore processor through the effective use of local memories.

    CiNii

  • A Local Memory Management Scheme in Multigrain Parallelizing Compiler

    Tsuyoshi Miura, Tomohiro Tagawa, Yusuke Muramatsu, Akinori Ikemi, Masahiro Nakagawa, Hirofumi Nakano, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-172/HPC-109-11   2007 ( 17 ) 61 - 66  2007.03  [Refereed]

     View Summary

    Multicore systems have been attracting much attention for performance, low power consumption and short hardware/software development periods. To take full advantage of multiprocessor systems, parallelizing compilers play important roles. On a multicore processor, the memory wall caused by the speed gap between the processor core and memory is also a serious problem. Therefore, it is important for performance improvement to make effective use of fast memories near a processor, such as cache and local memory. This paper proposes a local memory management scheme for coarse grain task parallel processing. In the evaluation using SPEC 95fp tomcatv, the proposed scheme using 8 processors achieved 19.6 times speedup against the sequential execution without the proposed scheme on the OSCAR multicore processor through the effective use of local memories.

    CiNii

  • Power-aware compiler controllable chip multiprocessor

    Hiroaki Shikano, Jun Shirako, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT     427  2007  [Refereed]

    DOI

    Scopus

    1
    Citation
    (Scopus)
  • Automatic Parallelization for Multimedia Applications on Multicore Processors

    Takamichi Miyamoto, Saori Asaka, Nobuhito Kamakura, Hiromasa Yamauchi, Masayoshi Mase, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-171-13   2007 ( 4 ) 69 - 74  2007.01

     View Summary

    Multicore processors have attracted much attention as a way to handle the increase in power consumption that accompanies higher integration of semiconductor devices, the slowdown in processor clock improvement, and the lengthening of hardware/software development periods. Also, speeding up multimedia applications is required with the progress of consumer electronics such as mobile phones, digital TVs and games. This paper describes parallelization methods for multimedia applications on multicore processors. In particular, MPEG2 encoding and MPEG2 decoding are selected as examples of video sequence processing, MP3 encoding as an example of audio processing, and JPEG 2000 encoding as an example of picture processing. OSCAR multigrain parallelizing compiler automatically parallelizes these media applications. This paper evaluates the parallel processing performance of these multimedia applications on the OSCAR multicore processor and on the IBM p5 550Q Power5+ 8-processor SMP server. On the OSCAR multicore processor, parallel execution with the proposed method of managing local memory and optimizing data transfer using 4 processors gives 3.81 times speedup for MPEG2 encoding, 3.04 times speedup for MPEG2 decoding, 3.09 times speedup for MP3 encoding and 3.79 times speedup for JPEG 2000 encoding against the sequential execution. On the IBM p5 550Q Power5+ 8-processor server, parallel execution using 8 processors gives 5.19 times speedup for MPEG2 encoding, 5.12 times speedup for MPEG2 decoding, 3.69 times speedup for MP3 encoding and 4.32 times speedup for JPEG 2000 encoding against the sequential execution.

    CiNii

  • Automatic Parallelization for Multimedia Applications on Multicore Processors

    宮本孝道, 浅香沙織, 鎌倉信仁, 山内宏真, 間瀬正啓, 白子準, 中野啓史, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2006-ARC-171-13   2007 ( 4 ) 69 - 74  2007.01  [Refereed]

     View Summary

    Multicore processors have attracted much attention as a way to handle the increase in power consumption that accompanies higher integration of semiconductor devices, the slowdown in processor clock improvement, and the lengthening of hardware/software development periods. Also, speeding up multimedia applications is required with the progress of consumer electronics such as mobile phones, digital TVs and games. This paper describes parallelization methods for multimedia applications on multicore processors. In particular, MPEG2 encoding and MPEG2 decoding are selected as examples of video sequence processing, MP3 encoding as an example of audio processing, and JPEG 2000 encoding as an example of picture processing. OSCAR multigrain parallelizing compiler automatically parallelizes these media applications. This paper evaluates the parallel processing performance of these multimedia applications on the OSCAR multicore processor and on the IBM p5 550Q Power5+ 8-processor SMP server. On the OSCAR multicore processor, parallel execution with the proposed method of managing local memory and optimizing data transfer using 4 processors gives 3.81 times speedup for MPEG2 encoding, 3.04 times speedup for MPEG2 decoding, 3.09 times speedup for MP3 encoding and 3.79 times speedup for JPEG 2000 encoding against the sequential execution. On the IBM p5 550Q Power5+ 8-processor server, parallel execution using 8 processors gives 5.19 times speedup for MPEG2 encoding, 5.12 times speedup for MPEG2 decoding, 3.69 times speedup for MP3 encoding and 4.32 times speedup for JPEG 2000 encoding against the sequential execution.

    CiNii

  • Automatic Parallelization for Multimedia Applications on Multicore Processors

    Takamichi Miyamoto, Saori Asaka, Nobuhito Kamakura, Hiromasa Yamauchi, Masayoshi Mase, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2007-ARC-171-13   2007 ( 4 ) 69 - 74  2007.01  [Refereed]

     View Summary

    Multicore processors have attracted much attention as a way to handle the increase in power consumption that accompanies higher integration of semiconductor devices, the slowdown in processor clock improvement, and the lengthening of hardware/software development periods. Also, speeding up multimedia applications is required with the progress of consumer electronics such as mobile phones, digital TVs and games. This paper describes parallelization methods for multimedia applications on multicore processors. In particular, MPEG2 encoding and MPEG2 decoding are selected as examples of video sequence processing, MP3 encoding as an example of audio processing, and JPEG 2000 encoding as an example of picture processing. OSCAR multigrain parallelizing compiler automatically parallelizes these media applications. This paper evaluates the parallel processing performance of these multimedia applications on the OSCAR multicore processor and on the IBM p5 550Q Power5+ 8-processor SMP server. On the OSCAR multicore processor, parallel execution with the proposed method of managing local memory and optimizing data transfer using 4 processors gives 3.81 times speedup for MPEG2 encoding, 3.04 times speedup for MPEG2 decoding, 3.09 times speedup for MP3 encoding and 3.79 times speedup for JPEG 2000 encoding against the sequential execution. On the IBM p5 550Q Power5+ 8-processor server, parallel execution using 8 processors gives 5.19 times speedup for MPEG2 encoding, 5.12 times speedup for MPEG2 decoding, 3.69 times speedup for MP3 encoding and 4.32 times speedup for JPEG 2000 encoding against the sequential execution.

    CiNii

  • A 4320MIPS four-processor core SMP/AMP with individually managed clock frequency for low power consumption

    Yutaka Yoshida, Tatsuya Kamei, Kiyoshi Hayase, Shinichi Shibahara, Osamu Nishii, Toshihiro Hattori, Atsushi Hasegawa, Masashi Takada, Naohiko Irie, Kunio Uchiyama, Toshihiko Odaka, Kiwamu Takada, Keiji Kimura, Hironori Kasahara

    Digest of Technical Papers - IEEE International Solid-State Circuits Conference     95 - 590  2007  [Refereed]

     View Summary

    A 4320MIPS four-core SoC that supports both SMP and AMP for embedded applications is designed in 90nm CMOS. Each processor core can be operated dynamically at a different frequency, including clock stop, while keeping data cache coherency, to maintain maximum processing performance and to reduce average operating power. The 97.6mm² die achieves a floating-point performance of 16.8GFLOPS. © 2007 IEEE.

    DOI

    Scopus

    26
    Citation
    (Scopus)
  • A 4320MIPS four-processor core SMP/AMP with individually managed clock frequency for low power consumption

    Yutaka Yoshida, Tatsuya Kamei, Kiyoshi Hayase, Shinichi Shibahara, Osamu Nishii, Toshihiro Hattori, Atsushi Hasegawa, Masashi Takada, Naohiko Irie, Kunio Uchiyama, Toshihiko Odaka, Kiwamu Takada, Keiji Kimura, Hironori Kasahara

    Digest of Technical Papers - IEEE International Solid-State Circuits Conference     95 - 590  2007

     View Summary

    A 4320MIPS four-core SoC that supports both SMP and AMP for embedded applications is designed in 90nm CMOS. Each processor core can be operated dynamically at a different frequency, including clock stop, while keeping data cache coherency, to maintain maximum processing performance and to reduce average operating power. The 97.6mm² die achieves a floating-point performance of 16.8GFLOPS. © 2007 IEEE.

    DOI

    Scopus

    26
    Citation
    (Scopus)
  • Heterogeneous multiprocessor on a chip which enables 54x AAC-LC stereo encoding

    Masaki Ito, Takashi Todaka, Takanobu Tsunoda, Hiroshi Tanaka, Tomoyuki Kodama, Hiroaki Shikano, Masafumi Onouchi, Kunio Uchiyama, Toshihiko Odaka, Tatsuya Kamei, Ei Nagahama, Manabu Kusaoke, Yusuke Nitta, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    2007 Symposium on VLSI Circuits, Digest of Technical Papers     18 - 19  2007  [Refereed]

     View Summary

    A heterogeneous multiprocessor on a chip has been designed and implemented. It consists of 2 CPUs and 2 DRPs (Dynamic Reconfigurable Processors). The DRP was designed to achieve high performance in a small area so that it can be integrated on an SoC for embedded systems. The memory architectures of the CPUs and DRPs were unified to improve programming and compiling efficiency. 54x AAC-LC stereo encoding has been enabled with 2 DRPs at 300MHz and 2 CPUs at 600MHz.

  • Automatic Parallelization of Restricted C Programs in OSCAR Compiler

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Hiroaki Tano, Takeshi Masuura, Koji Fukatsu, Takamichi Miyamoto, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-170-01 (DesignGaia2006)   2006 ( 127 ) 1 - 6  2006.11

     View Summary

    Along with the popularization of multiprocessors and multicore architectures, automatic parallelizing compilers, which can realize high effective performance and low power consumption, are becoming more and more important in various areas from high performance computing to embedded computing. OSCAR compiler realizes multigrain automatic parallelization, which can exploit parallelism and data locality from the whole program. This paper describes C language support in OSCAR compiler. For rapid support of the C language, a restricted C language is proposed. In the preliminary performance evaluation of automatic parallelization using the media applications MPEG2 encode, MP3 encode and AAC encode, Susan (smoothing) derived from MiBench, and Art from SPEC2000, OSCAR compiler achieved a maximum speedup of 7.49 times, for Susan (smoothing), against sequential execution on an IBM p5 550 server with 8 processors, and a maximum speedup of 3.75 times, also for Susan (smoothing), against sequential execution on a Sun Ultra80 workstation with 4 processors.

    CiNii

  • Performance of OSCAR Multigrain Parallelizing Compiler on SMP Servers and Embedded Multicore

    Jun Shirako, Tomohiro Tagawa, Tsuyoshi Miura, Takamichi Miyamoto, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-170-02 (DesignGaia2006)   2006 ( 127 ) 7 - 12  2006.11

     View Summary

    Currently, multiprocessor systems, especially multicore processors, are attracting much attention for performance, low power consumption and short hardware/software development periods. To take full advantage of multiprocessor systems, parallelizing compilers play important roles. This paper describes the execution performance of OSCAR multigrain parallelizing compiler, which uses coarse grain task parallelization and near fine grain parallelization in addition to loop parallelization, on the latest SMP servers and an SMP embedded multicore. The OSCAR compiler has realized automatic determination of the parallelizing layer, which decides the suitable number of processors and parallelizing technique for each nested part of the program, and global cache memory optimization over loops and coarse grain tasks. In the performance evaluation using 10 SPEC CFP95 benchmark programs and 4 SPEC CFP2000 programs, OSCAR compiler gave 2.74 times speedup compared with IBM XL Fortran compiler 10.1 on an IBM p5 550Q Power5+ 8-processor server, and 4.82 times speedup compared with IBM XL Fortran compiler 8.1 on an IBM pSeries690 Power4 24-processor server. OSCAR compiler can also be applied to the NEC/ARM MPCore ARMv6 4-processor low power embedded multicore, using a subset of OpenMP libraries and the g77 compiler. In the evaluation using SPEC CFP95 benchmarks with reduced data sets, OSCAR compiler achieved 4.08 times speedup for tomcatv, 3.90 times speedup for swim, 2.21 times speedup for su2cor, 3.53 times speedup for hydro2d, 3.85 times speedup for mgrid, 3.62 times speedup for applu and 3.20 times speedup for turb3d against the sequential execution.

    CiNii

  • Performance of OSCAR Multigrain Automatic Parallelizing Compiler on SMP Servers and Embedded Multicore

    白子準, 田川友博, 三浦剛, 宮本孝道, 中野啓史, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2006-ARC-170-02 (DesignGaia2006)   2006 ( 127 ) 7 - 12  2006.11  [Refereed]

     View Summary

    Currently, multiprocessor systems, especially multicore processors, are attracting much attention for performance, low power consumption and short hardware/software development periods. To take full advantage of multiprocessor systems, parallelizing compilers play important roles. This paper describes the execution performance of OSCAR multigrain parallelizing compiler, which uses coarse grain task parallelization and near fine grain parallelization in addition to loop parallelization, on the latest SMP servers and an SMP embedded multicore. The OSCAR compiler has realized automatic determination of the parallelizing layer, which decides the suitable number of processors and parallelizing technique for each nested part of the program, and global cache memory optimization over loops and coarse grain tasks. In the performance evaluation using 10 SPEC CFP95 benchmark programs and 4 SPEC CFP2000 programs, OSCAR compiler gave 2.74 times speedup compared with IBM XL Fortran compiler 10.1 on an IBM p5 550Q Power5+ 8-processor server, and 4.82 times speedup compared with IBM XL Fortran compiler 8.1 on an IBM pSeries690 Power4 24-processor server. OSCAR compiler can also be applied to the NEC/ARM MPCore ARMv6 4-processor low power embedded multicore, using a subset of OpenMP libraries and the g77 compiler. In the evaluation using SPEC CFP95 benchmarks with reduced data sets, OSCAR compiler achieved 4.08 times speedup for tomcatv, 3.90 times speedup for swim, 2.21 times speedup for su2cor, 3.53 times speedup for hydro2d, 3.85 times speedup for mgrid, 3.62 times speedup for applu and 3.20 times speedup for turb3d against the sequential execution.

    CiNii

  • Automatic Parallelization of Restricted C Programs in OSCAR Compiler

    間瀬正啓, 馬場大介, 長山晴美, 田野裕秋, 益浦健, 深津幸二, 宮本孝道, 白子準, 中野啓史, 木村啓二, 笠原博徳

    Technical Report of IPSJ, 2006-ARC-170-01 (DesignGaia2006)   2006 ( 127 ) 1 - 6  2006.11  [Refereed]

     View Summary

    Along with the popularization of multiprocessors and multicore architectures, automatic parallelizing compilers, which can realize high effective performance and low power consumption, are becoming more and more important in various areas from high performance computing to embedded computing. OSCAR compiler realizes multigrain automatic parallelization, which can exploit parallelism and data locality from the whole program. This paper describes C language support in OSCAR compiler. For rapid support of the C language, a restricted C language is proposed. In the preliminary performance evaluation of automatic parallelization using the media applications MPEG2 encode, MP3 encode and AAC encode, Susan (smoothing) derived from MiBench, and Art from SPEC2000, OSCAR compiler achieved a maximum speedup of 7.49 times, for Susan (smoothing), against sequential execution on an IBM p5 550 server with 8 processors, and a maximum speedup of 3.75 times, also for Susan (smoothing), against sequential execution on a Sun Ultra80 workstation with 4 processors.

    CiNii

  • Performance of OSCAR Multigrain Parallelizing Compiler on SMP Servers and Embedded Multicore

    Jun Shirako, Tomohiro Tagawa, Tsuyoshi Miura, Takamichi Miyamoto, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-170-02/ (DesignGaia2006)   2006 ( 127 ) 7 - 12  2006.11  [Refereed]

    Multiprocessor systems, especially multicore processors, are currently attracting much attention for their performance, low power consumption and short hardware/software development periods. To take full advantage of multiprocessor systems, parallelizing compilers play an important role. This paper describes the execution performance of the OSCAR multigrain parallelizing compiler, which uses coarse grain task parallelization and near fine grain parallelization in addition to loop parallelization, on the latest SMP servers and an SMP embedded multicore. The OSCAR compiler realizes automatic determination of the parallelizing layer, which decides the suitable number of processors and parallelizing technique for each nested part of the program, and global cache memory optimization over loops and coarse grain tasks. In the performance evaluation using 10 SPEC CFP95 benchmark programs and 4 SPEC CFP2000 programs, the OSCAR compiler gave a 2.74 times speedup compared with IBM XL Fortran compiler 10.1 on an IBM p5 550Q Power5+ 8-processor server, and a 4.82 times speedup compared with IBM XL Fortran compiler 8.1 on an IBM pSeries690 Power4 24-processor server. The OSCAR compiler can also be applied to the NEC/ARM MPCore ARMv6 4-processor low power embedded multicore, using a subset of OpenMP libraries and the g77 compiler. In the evaluation using SPEC CFP95 benchmarks with reduced data sets, the OSCAR compiler achieved a 4.08 times speedup for tomcatv, 3.90 times for swim, 2.21 times for su2cor, 3.53 times for hydro2d, 3.85 times for mgrid, 3.62 times for applu and 3.20 times for turb3d against sequential execution.

    CiNii

  • Automatic Parallelization of Restricted C Programs in OSCAR Compiler

    Masayoshi Mase, Daisuke Baba, Harumi Nagayama, Hiroaki Tano, Takeshi Masuura, Koji Fukatsu, Takamichi Miyamoto, Jun Shirako, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-170-01/ (DesignGaia2006)   2006 ( 127 ) 1 - 6  2006.11  [Refereed]

    With the spread of multiprocessor and multicore architectures, automatic parallelizing compilers, which can realize high effective performance and low power consumption, are becoming more and more important in areas ranging from high performance computing to embedded computing. The OSCAR compiler realizes multigrain automatic parallelization, which can exploit parallelism and data locality across the whole program. This paper describes C language support in the OSCAR compiler. For rapid support of the C language, a restricted C language is proposed. In the preliminary performance evaluation of automatic parallelization using media applications such as MPEG2 encode, MP3 encode and AAC encode, Susan (smoothing) derived from MiBench, and Art from SPEC2000, the OSCAR compiler achieved a maximum speedup of 7.49 times for susan (smoothing) against sequential execution on an IBM p5 550 server with 8 processors, and a maximum speedup of 3.75 times, also for susan (smoothing), against sequential execution on a Sun Ultra80 workstation with 4 processors.

    CiNii

  • 最先端のコンピュータアーキテクチャ -経済産業省/NEDOリアルタイム情報家電用マルチコアプロジェクトを中心として-

    笠原 博徳

    東京電力EWE講演会2006    2006.10  [Refereed]

  • 最先端マルチコアコンパイラとその並列化・低消費電力化性能

    笠原 博徳

    アーム株式会社 ARMセミナー2006    2006.10  [Refereed]

  • Multi-core Parallelizing Compiler for Low Power High Performance Computing

    Hironori Kasahara

    University of Illinois at Urbana-Champaign, Hosted by Prof. David Padua    2006.10  [Refereed]

  • Advanced Computer Architecture: METI/NEDO Multicore-processor Technology for Real-time Consumer Electronics Project

    Hironori Kasahara

    Tokyo Electric Power Company EWE Seminar 2006    2006.10  [Refereed]

  • Advanced Multi-core Compiler and Its Parallelization and Power Reduction Performance

    Hironori Kasahara

    ARM Seminar 2006    2006.10  [Refereed]

  • C Language Support in OSCAR Multigrain Parallelizing Compiler using CoSy

    Masayoshi Mase, Keiji Kimura, Hironori Kasahara

    ACE 2nd CoSy Community Gathering    2006.10  [Refereed]

  • マルチコアプロセッサにおけるコンパイラ制御低消費電力化手法

    白子 準, 吉田 宗弘, 押山 直人, 和田 康孝, 中野 啓史, 鹿野 裕明, 木村 啓二, 笠原 博徳

    情報処理学会論文誌コンピューティングシステム   47 ( SIG12(ACS15) ) 147 - 158  2006.09  [Refereed]

  • Software Challenges in Multi-Core Chip Era (Panel Discussion)

    Guang R. Gao, Hironori Kasahara, Vivek Sarkar, Skevos Evripidou, Murphy Brian

    Workshop on Software Challenges for Multicore Architectures (Tsinghua Univ., Beijing, China)    2006.09  [Refereed]

  • OSCAR Multigrain Parallelizing Compiler for Multicore Architectures

    Hironori Kasahara

    Workshop on Software Challenges for Multicore Architectures (Tsinghua Univ., Beijing, China)    2006.09  [Refereed]

  • 並列化コンパイラ協調型 チップマルチプロセッサ技術

    笠原博徳, 木村啓二, 白子準, 和田康孝, 中野啓史, 宮本孝道

    STARCシンポジウム2006    2006.09  [Refereed]

  • Parallelizing Compiler Cooperative Chip Multiprocessor Technology

    Hironori Kasahara, Keiji Kimura, Jun Shirako, Yasutaka Wada, Hirofumi Nakano, Takamichi Miyamoto

    STARC Symposium 2006    2006.09  [Refereed]

  • Parallelization of Multi-Path Concurrent Search for Iterative Deepening using Proof and Disproof Numbers

    Fumiyo Takano, Yoshitaka Maekawa, Hironori Kasahara, Seinosuke Narita

    Technical Report of IPSJ, 2006-HPC-103-17 (SWoPP2006)    2006.08

  • Local Memory Management on OSCAR Multicore

    Hirofumi Nakano, Takumi Nito, Takanori Maruyama, Masahiro Nakagawa, Yuki Suzuki, Yosuke Naito, Takamichi Miyamoto, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-169-28 (SWoPP2006)    2006.08

  • 並列化コンパイラの最新動向

    笠原 博徳

    日本IBM 先駆的科学計算に関するフォーラム2006    2006.08  [Refereed]

  • 証明数・反証数を用いた反復深化法における複数経路並行探索の並列化

    鷹野芙美代, 前川仁孝, 笠原博徳, 成田誠之助

    情報処理学会研究会報告2006-HPC-103-17(SWoPP高知2006)    2006.08  [Refereed]

  • OSCARマルチコア上でのローカルメモリ管理手法

    中野啓史, 仁藤拓実, 丸山貴紀, 中川正洋, 鈴木裕貴, 内藤陽介, 宮本孝道, 和田康孝, 木村啓二, 笠原博徳

    情報処理学会研究会報告2006-ARC-169-28(SWoPP高知2006)    2006.08  [Refereed]

  • Parallelization of Multi-Path Concurrent Search for Iterative Deepening using Proof and Disproof Numbers

    Fumiyo Takano, Yoshitaka Maekawa, Hironori Kasahara, Seinosuke Narita

    Technical Report of IPSJ, 2006-HPC-103-17/ (SWoPP2006)    2006.08  [Refereed]

  • Local Memory Management on OSCAR Multicore

    Hirofumi Nakano, Takumi Nito, Takanori Maruyama, Masahiro Nakagawa, Yuki Suzuki, Yosuke Naito, Takamichi Miyamoto, Yasutaka Wada, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-169-28/ (SWoPP2006)    2006.08  [Refereed]

  • 情報家電用マルチコアと並列化コンパイラ

    笠原 博徳

    JEITAマイクロプロセッサ専門委員会講演会「マルチコアアーキテクチャの研究開発動向及び将来展望」    2006.08  [Refereed]

  • Multicores for Consumer Electronics and Parallelizing Compilers

    Hironori Kasahara

    JEITA SIG. on Microprocessor    2006.08  [Refereed]

  • The Latest Trend of Parallelizing Compiler

    Hironori Kasahara

    IBM Japan Forum on Pioneering Scientific Computing    2006.08  [Refereed]

  • イノベーション創出を目指した産官学連携と人材育成の試み(「イノベーションの創出に向けた 産学官連携の推進と人材の育成」パネリスト)

    笠原 博徳

    第5回産学官連携推進会議分科会    2006.06  [Refereed]

  • Trials of Collaboration among Business, Academia and Government and Human Resource Development for Creation of Innovations (Panel on the Promotion of Collaboration among Business, Academia and Government and Human Resource Development for Creation of Innovations)

    Hironori Kasahara

    5th Conference for the Promotion of Collaboration Among Business, Academia, and Government (Section Meeting)    2006.06  [Refereed]

  • Compiler Control Power Saving Scheme for Multicore Processors

    Jun Shirako, Munehiro Yoshida, Naoto Oshiyama, Yasutaka Wada, Hirofumi Nakano, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Symposium on Advanced Computing Systems and Infrastructures (SACSIS 2006)   47 ( SIG12(ACS15) ) 147 - 158  2006.05  [Refereed]

    CiNii

  • マルチCPUアーキテクチャと並列化コンパイラ技術の動向(コンスーマー機器への応用)

    笠原 博徳

    ソニー株式会社 技術講演会    2006.05  [Refereed]

  • Latest Trends of Multi-CPU Architectures and Parallelizing Compilers: Application for Consumer Electronics

    Hironori Kasahara

    Sony Technology seminar    2006.05  [Refereed]

  • マルチコアプロセッサにおけるコンパイラ制御低消費電力化手法

    白子 準, 吉田 宗広, 押山 直人, 和田 康孝, 中野 啓史, 鹿野 裕明, 木村 啓二, 笠原 博徳

    SACSIS2006 - 先進的計算基盤システムシンポジウム    2006.05  [Refereed]

  • Performance Evaluation of Heterogeneous Chip Multi-Processor with MP3 Audio Encoder

    Hiroaki Shikano, Yuki Suzuki, Yasutaka Wada, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Proc. of IEEE Symposium on Low-Power and High Speed Chips (COOL Chips IX)     349 - 363  2006.04  [Refereed]

    CiNii

  • Data Transfer Overlap of Coarse Grain Task Parallel Processing on a Multicore Processor

    Takamichi Miyamoto, Masahiro Nakagawa, Shoichiro Asano, Yosuke Naito, Takumi Nito, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-167/HPC-105-10   2006 ( 20 ) 55 - 60  2006.02

    As the integration density of semiconductor devices increases, multicore processors have attracted much attention as a next-generation microprocessor architecture that can overcome the increase in power consumption, the slowdown in improvement of effective processor performance, and the lengthening of hardware/software development periods. However, the memory wall caused by the gap between memory access speed and processor core speed remains a serious problem on multicore processors as well. Therefore, the effective use of fast memories near the processor, such as cache and local memory, is important for reducing large memory access overhead. Furthermore, hiding the data transfer overhead among the local or distributed shared memories of processors and the centralized shared memory is important. On the memory architecture, the data transfer is specified. Considering these problems, the authors have proposed the OSCAR multicore processor architecture, which cooperates with the OSCAR multigrain parallelizing compiler and aims at a computer system with high effective performance and good cost performance. The OSCAR multicore processor has local data memory (LDM) for processor private data, distributed shared memory (DSM) with two ports for synchronization and data transfer among processor cores, centralized shared memory (CSM) to support dynamic task scheduling, and a data transfer unit (DTU) that transfers data asynchronously and aims at overlapping data transfer overhead. This paper proposes and evaluates a static data transfer scheduling algorithm aimed at overlapping data transfer overhead. As a result, the proposed scheme controlled by the OSCAR compiler gives a 2.86 times speedup using 4 processors for a JPEG2000 encoding program against ideal sequential execution, which assumes that all data can be assigned to local memory.

    CiNii

  • マルチコアプロセッサ上での粗粒度タスク並列処理におけるデータ転送オーバラップ方式

    宮本孝道, 中川正洋, 浅野尚一郎, 内藤陽介, 仁藤拓実, 中野啓史, 木村啓二, 笠原博徳

    情報処理学会研究報告2006ARC-167-10(HOKKE2006)    2006.02  [Refereed]

  • Data Transfer Overlap of Coarse Grain Task Parallel Processing on a Multicore Processor

    Takamichi Miyamoto, Masahiro Nakagawa, Shoichiro Asano, Yosuke Naito, Takumi Nito, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-167/HPC-105-10   2006 ( 20 ) 55 - 60  2006.02  [Refereed]

    As the integration density of semiconductor devices increases, multicore processors have attracted much attention as a next-generation microprocessor architecture that can overcome the increase in power consumption, the slowdown in improvement of effective processor performance, and the lengthening of hardware/software development periods. However, the memory wall caused by the gap between memory access speed and processor core speed remains a serious problem on multicore processors as well. Therefore, the effective use of fast memories near the processor, such as cache and local memory, is important for reducing large memory access overhead. Furthermore, hiding the data transfer overhead among the local or distributed shared memories of processors and the centralized shared memory is important. On the memory architecture, the data transfer is specified. Considering these problems, the authors have proposed the OSCAR multicore processor architecture, which cooperates with the OSCAR multigrain parallelizing compiler and aims at a computer system with high effective performance and good cost performance. The OSCAR multicore processor has local data memory (LDM) for processor private data, distributed shared memory (DSM) with two ports for synchronization and data transfer among processor cores, centralized shared memory (CSM) to support dynamic task scheduling, and a data transfer unit (DTU) that transfers data asynchronously and aims at overlapping data transfer overhead. This paper proposes and evaluates a static data transfer scheduling algorithm aimed at overlapping data transfer overhead. As a result, the proposed scheme controlled by the OSCAR compiler gives a 2.86 times speedup using 4 processors for a JPEG2000 encoding program against ideal sequential execution, which assumes that all data can be assigned to local memory.

    CiNii

  • A Static Scheduling Scheme for Coarse Grain Tasks on a Heterogeneous Chip Multi Processor

    Yasutaka Wada, Naoto Oshiyama, Yuki Suzuki, Yosuke Naito, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-166-3 (SHINING2006)   2006 ( 8 ) 13 - 18  2006.01

    This paper proposes a static scheduling scheme for coarse grain tasks on a heterogeneous chip multi processor that integrates not only general purpose processors but also accelerators such as DRPs or DSPs. A heterogeneous chip multi processor makes it possible to obtain high performance by using the accelerators and to save energy through frequency/voltage control by the compiler. In this scheme, the compiler aims to minimize the execution time of an application by considering the characteristics of each core. The proposed scheme is evaluated with an MP3 encoder on a heterogeneous chip multi processor that has 4 general purpose processors and 2 accelerators, and gives an 8.8 times speedup against sequential execution without the proposed scheme.

    CiNii

  • Preliminary Evaluation of Heterogeneous Chip Multi-Processor with MP3 Audio Encoder

    Hiroaki Shikano, Yuki Suzuki, Yasutaka Wada, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-166-1 (SHINING2006)    2006.01

  • Parallelizing Compiler Cooperated Low Power High Effective Performance Multi-core Processors

    Hironori Kasahara

    Technical Report of IPSJ, 2006-ARC-166-6 (SHINING2006)    2006.01

  • 並列化コンパイラ協調型低消費電力・高実効性能マルチコアプロセッサの動向

    笠原 博徳

    情報処理学会2006 ARC-166-6(SHINING2006)    2006.01  [Refereed]

  • ヘテロジニアスチップマルチプロセッサにおける粗粒度タスクスタティックスケジューリング手法

    和田康孝, 押山直人, 鈴木裕貴, 内藤陽介, 白子準, 中野啓史, 鹿野裕明, 木村啓二, 笠原博徳

    情報処理学会2006 ARC-166-3(SHINING2006)   2006 ( 8 ) 13 - 18  2006.01  [Refereed]

    This paper proposes a static scheduling scheme for coarse grain tasks on a heterogeneous chip multi processor that integrates not only general purpose processors but also accelerators such as DRPs or DSPs. A heterogeneous chip multi processor makes it possible to obtain high performance by using the accelerators and to save energy through frequency/voltage control by the compiler. In this scheme, the compiler aims to minimize the execution time of an application by considering the characteristics of each core. The proposed scheme is evaluated with an MP3 encoder on a heterogeneous chip multi processor that has 4 general purpose processors and 2 accelerators, and gives an 8.8 times speedup against sequential execution without the proposed scheme.

    CiNii

  • MP3エンコーダを用いたヘテロジニアスチップマルチプロセッサの性能評価

    鹿野裕明, 鈴木裕貴, 和田康孝, 白子準, 木村啓二, 笠原博徳

    情報処理学会2006 ARC-166-1(SHINING2006)    2006.01  [Refereed]

  • 2.マルチコアにおけるプログラミング( 「特集 マルチコアにおけるソフトウェア」)

    笠原博徳, 木村啓二

    情報処理   47 ( 1 ) 17 - 23  2006.01  [Refereed]

    CiNii

  • 1.マルチコア化するマイクロプロセッサ( 「特集 マルチコアにおけるソフトウェア」)

    笠原博徳, 木村啓二

    情報処理   47 ( 1 ) 10 - 16  2006.01  [Refereed]

  • Parallelizing Compiler Cooperated Low Power High Effective Performance Multi-core Processors

    Hironori Kasahara

    Technical Report of IPSJ,2006-ARC-166-6(SHINING2006)    2006.01  [Refereed]

  • A Static Scheduling Scheme for Coarse Grain Tasks on a Heterogeneous Chip Multi Processor

    Yasutaka Wada, Naoto Oshiyama, Yuki Suzuki, Yosuke Naito, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ,2006-ARC-166-3(SHINING2006)   2006 ( 8 ) 13 - 18  2006.01  [Refereed]

    This paper proposes a static scheduling scheme for coarse grain tasks on a heterogeneous chip multi processor that integrates not only general purpose processors but also accelerators such as DRPs or DSPs. A heterogeneous chip multi processor makes it possible to obtain high performance by using the accelerators and to save energy through frequency/voltage control by the compiler. In this scheme, the compiler aims to minimize the execution time of an application by considering the characteristics of each core. The proposed scheme is evaluated with an MP3 encoder on a heterogeneous chip multi processor that has 4 general purpose processors and 2 accelerators, and gives an 8.8 times speedup against sequential execution without the proposed scheme.

    CiNii

  • Preliminary Evaluation of Heterogeneous Chip Multi-Processor with MP3 Audio Encoder

    Hiroaki Shikano, Yuki Suzuki, Yasutaka Wada, Jun Shirako, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ,2006-ARC-166-1(SHINING2006)   2006 ( 8 ) 1 - 6  2006.01  [Refereed]

    This paper proposes a heterogeneous chip multi-processor (HCMP) that possesses different types of processing elements (PEs) such as CPUs as general-purpose processors, as well as digital signal processors or dynamic reconfigurable processors (DRPs) as special-purpose processors. The HCMP realizes higher performance than conventional single-core processors or even homogeneous multi-processors in some specific applications such as media processing, with low operating frequency supplied, which results in lower power consumption. In this paper, the performance of the HCMP is analyzed by studying parallelizing scheme and power control scheme of an MP3 audio encoding program and by scheduling the program onto the HCMP using these two schemes. As a result, it is confirmed that an HCMP, consisting of three CPUs and two DRPs, outperforms a single-core processor with one CPU by a speed-up factor of 16.3, and a homogeneous multi-processor with 5 CPUs by a speed-up factor of 4.0. It is also confirmed that the power control on the HCMP results in 24% power reduction.

    CiNii

  • Parallelizing Compilation Scheme for Reduction of Power Consumption of Chip Multiprocessors

    Jun Shirako, Naoto Oshiyama, Yasutaka Wada, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Proc. of 12th Workshop on Compilers for Parallel Computers (CPC 2006)     426 - 440  2006.01  [Refereed]

  • 2.Programming for Multicore Systems

    Hironori Kasahara, Keiji Kimura

    IPSJ MAGAZINE   47 ( 1 ) 17 - 23  2006.01  [Refereed]

  • 1.Multicores Emerge as Next Generation Microprocessors

    Hironori Kasahara, Keiji Kimura

    IPSJ MAGAZINE   47 ( 1 ) 10 - 16  2006.01  [Refereed]

    CiNii

  • Compiler control power saving scheme for multi core processors

    Jun Shirako, Naoto Oshiyama, Yasutaka Wada, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   4339   362 - 376  2006  [Refereed]

    With the increase of transistors integrated onto a chip, multi core processor architectures have attracted much attention to achieve high effective performance, shorten development period and reduce the power consumption. To this end, the compiler for a multi core processor is expected not only to parallelize program effectively, but also to control the voltage and clock frequency of processors and storages carefully inside an application program. This paper proposes a compilation scheme for reduction of power consumption under the multigrain parallel processing environment that controls Voltage/Frequency and power supply of each processor core on a chip. In the evaluation, the OSCAR compiler with the proposed scheme achieves 60.7 percent energy savings for SPEC CFP95 applu without performance degradation on 4 processors, and 45.4 percent energy savings for SPEC CFP95 tomcatv with real-time deadline constraint on 4 processors, and 46.5 percent energy savings for SPEC CFP95 swim with the deadline constraint on 4 processors. © 2006 Springer-Verlag Berlin Heidelberg.

    DOI

    Scopus

    18 citations (Scopus)

  • Data Localization on a Multicore Processor

    Hirofumi Nakano, Shoichiro Asano, Yosuke Naito, Takumi Nito, Tomohiro Tagawa, Takamichi Miyamoto, Takeshi Kodaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2005-ARC-165-10     51 - 56  2005.12  [Refereed]

  • マルチコアプロセッサ上でのデータローカライゼーション

    中野啓文, 浅野尚一郎, 内藤陽介, 仁藤拓実, 田川友博, 宮本孝道, 小高剛, 木村啓二, 笠原博徳

    情報処理学会研究会報告2005-ARC-165-10     51 - 56  2005.11  [Refereed]

  • マルチコアプロセッサ上でのデータローカライゼーション

    中野啓文, 浅野尚一郎, 内藤陽介, 仁藤拓実, 田川友博, 宮本孝道, 小高剛, 木村啓二, 笠原博徳

    情報処理学会研究会報告2005-ARC-165-10   2005 ( 120 ) 51 - 56  2005.11  [Refereed]

    As the integration density of semiconductor devices increases, multicore processors, which integrate multiple processor cores on a single chip, have attracted much attention as a next-generation microprocessor architecture that can overcome the increase in power consumption, the slowdown in improvement of effective processor performance, and the lengthening of hardware/software development periods. However, the memory wall caused by the gap between memory access speed and processor core speed remains a serious problem on multicore processors as well. Therefore, the effective use of fast memories near the processor, such as cache and local memory, is important. Considering these problems, the authors have proposed the OSCAR multicore processor architecture, which cooperates with the OSCAR multigrain parallelizing compiler and aims at a computer system with high effective performance and good cost performance. The OSCAR multicore processor has local data memory (LDM) for processor private data, distributed shared memory (DSM) with two ports for synchronization and data transfer among processor cores, centralized shared memory (CSM) to support dynamic task scheduling, and a data transfer unit (DTU) that transfers data asynchronously and aims at overlapping data transfer overhead. This paper describes a data localization scheme that aims to improve the effective use of LDM through coarse grain parallel processing and a compiler-controlled LDM management scheme. As a result, the proposed scheme automatically gives an 8.01 times speedup for an MPEG2 encoding program on 8 processors against sequential execution.

    CiNii

  • ホモジニアスマルチコアにおけるコンパイラ制御低消費電力化手法

    白子 準, 押山 直人, 和田 康孝, 鹿野 裕明, 木村 啓二, 笠原博徳

    情報処理学会研究会報告2005-ARC-164-10(SwoPP2005)     55 - 60  2005.09  [Refereed]

  • チップマルチプロセッサ上でのMPEG2エンコードの並列処理

    小高 剛, 中野 啓史, 木村 啓二, 笠原 博徳

    情報処理学会論文誌   46 ( 9 ) 2311 - 2325  2005.09  [Refereed]

  • Parallel Processing of MPEG2 Encoding on a Chip Multiprocessor Architecture

    Takeshi Kodaka, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Trans. of IPSJ   46 ( 9 ) 2311 - 2325  2005.09  [Refereed]

  • 並列化コンパイラ協調型チップマルチプロセッサ技術

    笠原 博徳, 木村 啓二, 中野 啓史, 白子 準, 宮本 孝道, 和田 康孝

    STARCシンポジウム2005    2005.09  [Refereed]

  • Compiler Control Power Saving Scheme for Homogeneous Multiprocessor

    Jun Shirako, Naoto Oshiyama, Yasutaka Wada, Hiroaki Shikano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2005-ARC-164-10 (SwoPP2005)     55 - 60  2005.08

  • 組み込みマルチコア用コンパイラ技術

    笠原 博徳

    アーム株式会社 ARMセミナー2005    2005.06  [Refereed]

  • Compiler technology for built-in multi-core processor

    H. Kasahara

    ARM Seminar 2005, Tokyo    2005.06  [Refereed]

  • 最先端の高性能コンピュータ

    笠原 博徳

    文部科学省 科学技術振興調整費 新興分野人材養成プログラム 「ナノ・IT・バイオ知財経営戦略スキルアッププログラム」 特別講座「先端技術と知的財産①ナノ・IT編」    2005.05  [Refereed]

  • コンピュータ分野のロードマップ

    笠原 博徳

    NEDO 電子・情報技術ロードマップ成果報告会    2005.05  [Refereed]

  • Road map of the computer area

    H. Kasahara

    NEDO Electronics and Information Technology Road map Accomplishment Report Symposium, Tokyo    2005.05  [Refereed]

  • Advanced High-Performance Computer

    H. Kasahara

    Lecture on 'Advanced technology and intellectual property in Nano and IT', Program for cultivation of people in new fields of study 'Upskilling program for Nano, IT, Bio - Intellectual Property Management Strategy', Promotion Budget for Science and Techno    2005.05  [Refereed]

  • Hierarchical parallelism control for multigrain parallel processing

    M Obata, J Shirako, H Kaminaga, K Ishizaka, H Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING   2481   31 - 44  2005  [Refereed]

    To improve the effective performance and usability of shared memory multiprocessor systems, a multigrain compilation scheme is important; such a scheme hierarchically exploits coarse grain parallelism among loops, subroutines and basic blocks, conventional loop parallelism, and near fine grain parallelism among statements inside a basic block. To use the hierarchical parallelism of each nest level, or layer, efficiently in multigrain parallel processing, it is necessary to determine how many processors or groups of processors should be assigned to each layer according to the parallelism of the layer. This paper proposes an automatic hierarchical parallelism control scheme that assigns a suitable number of processors to each layer so that the parallelism of each hierarchy can be used efficiently. The performance of the proposed scheme is evaluated on an IBM RS6000 SMP server with 8 processors using 8 programs of SPEC95FP.

  • Performance of OSCAR multigrain parallelizing compiler on SMP servers

    K Ishizaka, T Miyamoto, J Shirako, M Obata, K Kimura, H Kasahara

    LANGUAGES AND COMPILERS FOR HIGH PERFORMANCE COMPUTING   3602   319 - 331  2005  [Refereed]

    This paper describes the performance of the OSCAR multigrain parallelizing compiler on various SMP servers, such as IBM pSeries 690, Sun Fire V880, Sun Ultra 80, NEC TX7/i6010 and SGI Altix 3700. The OSCAR compiler hierarchically exploits coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block, in addition to loop parallelism. It also performs global cache optimization over different loops, or coarse grain tasks, based on a data localization technique with inter-array padding to reduce memory access overhead. The current performance of the OSCAR compiler is evaluated on the above SMP servers. For example, the OSCAR compiler, generating OpenMP parallelized programs from ordinary sequential Fortran programs, gives a 5.7 times speedup on average over seven programs (SPEC CFP95 tomcatv, swim, su2cor, hydro2d, mgrid, applu and turb3d) compared with IBM XL Fortran compiler 8.1 on an IBM pSeries 690 24-processor SMP server. It also gives a 2.6 times speedup compared with Intel Fortran Itanium Compiler 7.1 on an SGI Altix 3700 Itanium 2 16-processor server, a 1.7 times speedup compared with NEC Fortran Itanium Compiler 3.4 on an NEC TX7/i6010 Itanium 2 8-processor server, a 2.5 times speedup compared with Sun Forte 7.0 on a Sun Ultra 80 UltraSPARC II 4-processor desktop workstation, and a 2.1 times speedup compared with Sun Forte compiler 7.1 on a Sun Fire V880 UltraSPARC III Cu 8-processor server.

  • Performance Evaluation of Minimum Execution Time Multiprocessor Scheduling Algorithms using Standard Task Graph Set Which Takes into Account Parallelism of Task Graphs

    Takanari Matsuzawa, Shinya Sakaida, Takao Tobita, Hironori Kasahara

    Technical Report of IPSJ, ARC2004-161-9    2005.01

  • Performance of OSCAR Multigrain Parallelizing Compiler on Shared Memory Multiprocessor Servers

    Jun Shirako, Takamichi Miyamoto, Kazuhisa Ishizaka, Motoki Obata, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2004-161-5   2005 ( 7 ) 21 - 26  2005.01

    The need for automatic parallelizing compilers is growing with the widespread use of multiprocessor systems. However, loop parallelization techniques are almost mature, and a new generation of parallelization methods such as multigrain parallelization is required to achieve higher effective performance. This paper describes the performance of the OSCAR multigrain parallelizing compiler, which uses coarse grain task parallelization and near fine grain parallelization in addition to loop parallelization. The OSCAR compiler realizes the following two important techniques. The first is the automatic determination scheme of the parallelizing layer, which decides the number of processors and the parallelizing technique for each part of the program. The other is global cache memory optimization among loops and coarse grain tasks. In the evaluation using SPEC95FP benchmarks, the OSCAR compiler gave a 4.78 times speedup compared with IBM XL Fortran compiler 8.1 on an IBM pSeries690 Power4 24-processor server, a 2.40 times speedup compared with Intel Fortran Itanium Compiler 7.1 on an SGI Altix3700 Itanium2 16-processor server, and a 1.90 times speedup compared with Sun Forte compiler 7.1 on a Sun Fire V880 Ultra SPARC III Cu 8-processor server.

    CiNii

  • Performance Evaluation of Electronic Circuit Simulation Using Code Generation Method without Array Indirect Access

    Akira Kuroda, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2005-161-1 (SHINING2005)    2005.01

  • 並列度を考慮した標準タスクグラフセットを用いた実行時間最小マルチプロセッサスケジューリングアルゴリズムの性能評価

    松澤能成, 坂井田真也, 飛田高雄, 笠原博徳

    情報処理学会研究報告ARC2005-161-5 (SHINING2005)    2005.01  [Refereed]

  • 共有メモリ型マルチプロセッササーバ上におけるOSCARマルチグレイン自動並列化コンパイラの性能評価

    白子準, 宮本孝道, 石坂一久, 小幡元樹, 木村啓二, 笠原博徳

    情報処理学会研究報告ARC2005-161-5 (SHINING2005)   2005 ( 7 ) 21 - 26  2005.01  [Refereed]

    The need for automatic parallelizing compilers is growing with the widespread use of multiprocessor systems. However, loop parallelization techniques are almost mature, and a new generation of parallelization methods such as multi-grain parallelization is required to achieve higher effective performance. This paper describes the performance of the OSCAR multigrain parallelizing compiler, which uses coarse grain task parallelization and near fine grain parallelization in addition to loop parallelization. The OSCAR compiler realizes the following two important techniques. The first is an automatic scheme for determining the parallelizing layer, which decides the number of processors and the parallelizing technique for each part of the program. The other is global cache memory optimization among loops and coarse grain tasks. In the evaluation using SPEC95FP benchmarks, the OSCAR compiler gave a 4.78 times speedup compared with IBM XL Fortran compiler 8.1 on an IBM pSeries690 Power4 24-processor server, a 2.40 times speedup compared with Intel Fortran Itanium Compiler 7.1 on an SGI Altix3700 Itanium2 16-processor server, and a 1.90 times speedup compared with Sun Forte compiler 7.1 on a Sun Fire V880 Ultra SPARC III Cu 8-processor server.

  • 配列間接アクセスを用いないコード生成法を用いた電子回路シミュレーション手法の性能評価

    黒田亮, 木村啓二, 笠原博徳

    情報処理学会研究報告ARC2005-161-1 (SHINING2005)   2005 ( 7 ) 1 - 6  2005.01  [Refereed]

    This paper evaluates the performance of a fast sequential circuit simulation scheme using loop-free code without array indirect accesses. This scheme achieves several tens of times higher processing performance than SPICE version 3f5 on a workstation (WS) and a PC. The array indirect accesses used for the sparse matrix solution in SPICE have been one of the factors that prevent efficient processing. This paper describes the circuit simulation scheme using loop-free code without any array indirect accesses, and its performance evaluation shows that the scheme gives 2 to 110 times better performance than SPICE3f5 on a WS and a PC. The performance is improved by significantly reducing memory access overhead.

  • Performance Evaluation of Electronic Circuit Simulation Using Code Generation Method without Array Indirect Access

    Akira Kuroda, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2005-161-1 (SHINING2005)    2005.01  [Refereed]

  • Performance Evaluation of Minimum Execution Time Multiprocessor Scheduling Algorithms using Standard Task Graph Set Which Takes into Account Parallelism of Task Graphs

    Takanari Matsuzawa, Shinya Sakaida, Takao Tobita, Hironori Kasahara

    Technical Report of IPSJ, ARC2004-161-9   2005 ( 7 ) 45 - 50  2005.01  [Refereed]

    This paper evaluates the performance of heuristic and optimization algorithms using benchmark task graphs named the Standard Task Graph Set (STG) for the minimum execution time nonpreemptive multiprocessor scheduling problem. In the standard task graph set used in this paper, in addition to the relationship between the parallelism of task graphs and the number of processors used in the scheduling problem, the scale of the task graphs (50, 100, 300, 500, 700 and 1000 tasks) and the parallelism "para" (1.5 ≤ para < 20.5) affect the optimal solution rate. This paper evaluates the performance of heuristic algorithms, the practical sequential optimization algorithm DF/IHS (Depth First/Implicit Heuristic Search) and the practical parallel optimization algorithm PDF/IHS (Parallelized DF/IHS) using this STG for 2 to 16 processors. The evaluation shows that, for the total of 12312 tested problems, FIFO gives optimal solutions for 15.14% of the problems, RTRS for 14.63%, CP for 65.80%, CP/MISF for 65.85%, DF/IHS for 87.79% and PDF/IHS for 91.62%. It was also confirmed that the parallel algorithm PDF/IHS gave a 554.6 times speedup over the sequential algorithm DF/IHS for 2-processor scheduling problems and 461.8 times for 4-processor scheduling problems. When para is close to the number of processors, each algorithm gives a low optimal solution rate. In addition, when the number of processors is 4 and para is greater than the number of processors, heuristic algorithms such as CP give a low optimal solution rate (60%), whereas DF/IHS and PDF/IHS give high optimal solution rates of 90% and 100%, respectively.

  • Performance of OSCAR Multigrain Parallelizing Compiler on Shared Memory Multiprocessor Servers

    Jun Shirako, Takamichi Miyamoto, Kazuhisa Ishizaka, Motoki Obata, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2004-161-5   2005 ( 7 ) 21 - 26  2005.01  [Refereed]

    The need for automatic parallelizing compilers is growing with the widespread use of multiprocessor systems. However, loop parallelization techniques are almost mature, and a new generation of parallelization methods such as multi-grain parallelization is required to achieve higher effective performance. This paper describes the performance of the OSCAR multigrain parallelizing compiler, which uses coarse grain task parallelization and near fine grain parallelization in addition to loop parallelization. The OSCAR compiler realizes the following two important techniques. The first is an automatic scheme for determining the parallelizing layer, which decides the number of processors and the parallelizing technique for each part of the program. The other is global cache memory optimization among loops and coarse grain tasks. In the evaluation using SPEC95FP benchmarks, the OSCAR compiler gave a 4.78 times speedup compared with IBM XL Fortran compiler 8.1 on an IBM pSeries690 Power4 24-processor server, a 2.40 times speedup compared with Intel Fortran Itanium Compiler 7.1 on an SGI Altix3700 Itanium2 16-processor server, and a 1.90 times speedup compared with Sun Forte compiler 7.1 on a Sun Fire V880 Ultra SPARC III Cu 8-processor server.

  • Multigrain parallel processing on compiler cooperative chip multiprocessor

    K Kimura, Y Wada, H Nakano, T Kodaka, J Shirako, K Ishizaka, H Kasahara

    9TH ANNUAL WORKSHOP ON INTERACTION BETWEEN COMPILERS AND COMPUTER ARCHITECTURES, PROCEEDINGS     11 - 20  2005  [Refereed]

    This paper describes multigrain parallel processing on a compiler cooperative chip multiprocessor. Multigrain parallel processing hierarchically exploits multiple grains of parallelism such as coarse grain task parallelism, loop iteration level parallelism and statement level near-fine grain parallelism. The chip multiprocessor has been designed to attain high effective performance, cost effectiveness and high software productivity by supporting the optimizations of the multigrain parallelizing compiler, which was developed by the Japanese Millennium Project IT21 "Advanced Parallelizing Compiler". To achieve the full potential of multigrain parallel processing, the chip multiprocessor integrates simple single-issue processors having distributed shared data memory for both optimal use of data locality and scalar data transfer, and local data memory for processor private data, in addition to centralized shared memory for data shared among processors. This paper focuses on the scalability of the chip multiprocessor having up to eight processors on a chip by exploiting the multigrain parallelism of SPECfp95 programs. When a simple microSPARC-like processor core is used under the assumption of 90 nm technology and 2.8 GHz, the evaluation results show that the speedups for eight processors and four processors reach 7.1 and 3.9, respectively. Similarly, when 400 MHz is assumed for embedded usage, the speedups reach 7.8 and 4.0, respectively.

  • Parallel Processing for MPEG2 Encoding on OSCAR Chip Multiprocessor

    Takeshi Kodaka, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-160-07     119 - 127  2004.12

    Authorship:Last author

    Currently, many people enjoy multimedia applications with image and audio processing on PCs, PDAs, mobile phones and so on. With the popularization of multimedia applications, the need for low cost, low power consumption and high performance processors has been increasing. To this end, chip multiprocessor architectures, which attain scalable performance improvement by using multigrain parallelism, are attracting much attention. However, in order to extract higher performance on a chip multiprocessor, more sophisticated software techniques are required, such as decomposing a program into tasks of adequate grain size, assigning them to processors considering parallelism, and data locality optimization. This paper describes a parallel processing scheme for MPEG2 encoding on a chip multiprocessor using data localization, which improves execution efficiency by consecutively assigning coarse grain tasks that share the same data to the same processor. The performance evaluation on the OSCAR chip multiprocessor architecture shows that the proposed scheme gives a 6.97 times speedup using 8 processors and a 10.93 times speedup using 16 processors against sequential execution time. Moreover, the proposed scheme gives a 1.61 times speedup using 8 processors and a 2.08 times speedup using 16 processors against loop parallel processing, which has been widely used for multiprocessor systems, using the same number of processors.

  • OSCARチップマルチプロセッサ上でのMPEG2エンコードの並列処理

    小高 剛, 中野 啓史, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2004-ARC-160-07   2004 ( 123 ) 53 - 58  2004.12  [Refereed]

    This paper proposes a coarse grain task parallel processing scheme for MPEG2 encoding on a chip multiprocessor. The scheme uses data localization, which optimizes execution efficiency by consecutively assigning coarse grain tasks that access the same array data to the same processor, and a data transfer overlapping technique, which minimizes data transfer overhead by overlapping task execution and data transfer. The performance of the proposed scheme is evaluated. In the evaluation on an OSCAR chip multiprocessor architecture, the proposed scheme gave a 1.24 times speedup for 1 processor, 2.47 times for 2 processors, 4.57 times for 4 processors, 7.97 times for 8 processors and 11.93 times for 16 processors against sequential execution on a single processor without the proposed scheme.

  • HPC用自動並列化コンパイラの動向と将来課題

    笠原 博徳

    第19回NEC・HPC研究会    2004.11  [Refereed]

  • Current and Future of Automatic Parallelizing Compilers

    H. Kasahara

    The 19th NEC HPC Forum    2004.11  [Refereed]

  • Performance of OSCAR Multigrain Parallelizing Compiler on SMP Servers

    Kazuhisa Ishizaka, Takamichi Miyamoto, Jun Shirako, Motoki Obata, Keiji Kimura, Hironori Kasahara

    Proc. of 17th International Workshop on Languages and Compilers for Parallel Computing(LCPC2004)    2004.09  [Refereed]

  • 世界一のコンパイラを作る--アドバンスト並列化コンパイラプロジェクト--

    笠原 博徳

    IBMライフサイエンス天城セミナー    2004.09  [Refereed]

  • Developing World Fastest Compiler: Advanced Parallelizing Compiler Project

    H. Kasahara

    IBM Life Science Amagi Seminar    2004.09  [Refereed]

  • Data Localization using Data Transfer Unit on OSCAR Chip Multiprocessor

    Hirofumi Nakano, Yosuke Naito, Takahisa Suzuki, Takeshi Kodaka, Kazuhisa Ishizaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-159-20    2004.07

  • Evaluation of Multigrain Parallelism on OSCAR Chip Multi Processor

    Yasutaka Wada, Jun Shirako, Kazuhisa Ishizaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-159-11    2004.07

  • OSCARチップマルチプロセッサ上でのデータ転送ユニットを用いたデータローカライゼーション

    中野 啓史, 内藤 陽介, 鈴木 貴久, 小高 剛, 石坂 一久, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2004-ARC-159-20    2004.07  [Refereed]

  • OSCARチップマルチプロセッサ上でのマルチグレイン並列性評価

    和田 康孝, 白子 準, 石坂 一久, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2004-ARC-159-11    2004.07  [Refereed]

  • Data Localization using Data Transfer Unit on OSCAR Chip Multiprocessor

    Hirofumi Nakano, Yosuke Naito, Takahisa Suzuki, Takeshi Kodaka, Kazuhisa Ishizaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-159-20    2004.07  [Refereed]

  • Evaluation of Multigrain Parallelism on OSCAR Chip Multi Processor

    Yasutaka Wada, Jun Shirako, Kazuhisa Ishizaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-159-11    2004.07  [Refereed]

  • マルチグレイン並列性向上のための選択的インライン展開手法

    白子 準, 長澤 耕平, 石坂 一久, 小幡 元樹, 笠原 博徳

    情報処理学会論文誌   45 ( 4 ) 1354 - 1356  2004.05  [Refereed]

    With the growing use of multiprocessor systems, the need for automatic parallelizing compilers is increasing in order to improve effective performance, cost performance and software productivity. In particular, to obtain higher effective performance from the compiler, multi-grain parallel processing, which exploits coarse grain parallelism among loops, subroutines and basic blocks, medium grain parallelism among loop iterations and near fine grain parallelism among statements inside a basic block, is becoming important. In multi-grain parallel processing, the appropriate number of processors must be assigned to each nested layer, considering the parallelism of each layer. Inline expansion of subroutines having large parallelism in a lower layer can increase coarse grain parallelism significantly, so the compiler must assign processors to each layer with this program restructuring taken into account. To this end, this paper proposes a selective inline expansion scheme for improving multi-grain parallelism. The effectiveness of the proposed scheme is evaluated on an IBM RS6000, a midrange SMP server with 8 processors, and an IBM pSeries690 regattaH, a high-end SMP server with 16 processors, using 103.su2cor, 107.mgrid and 125.turb3d of SPEC95FP. Multi-grain parallel processing using the proposed scheme gave 2.84 to 6.04 times speedup on the RS6000 and 3.54 to 11.19 times speedup on regattaH against sequential processing, and 1.12 to 1.79 times speedup on the RS6000 and 1.03 to 1.47 times speedup on regattaH against conventional multi-grain parallelization.

  • Selective Inline Expansion for Improvement of Multi Grain Parallelism

    Jun Shirako, Kouhei Nagasawa, Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Trans. of IPSJ   45 ( 5 ) 1354 - 1356  2004.05  [Refereed]

  • 150th ARC memorial special technical meeting(2), Panel: Future of Computer Architecture Research 'Development of high-value added Chip Multiprocessors by industry-government-academia collaboration'

    H. Kasahara

    150th IPSJ Special Interest Group on Computer Architecture    2004.05  [Refereed]

  • 配列間パディングを用いた粗粒度タスク間キャッシュ最適化

    石坂 一久, 小幡 元樹, 笠原 博徳

    情報処理学会論文誌   45 ( 4 ) 1063 - 1076  2004.04  [Refereed]

    The importance of automatic parallelizing compilers is growing with the widespread use of multiprocessor systems. To improve the performance of multiprocessor systems, multigrain parallelization is currently attracting much attention. In multigrain parallelization, coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are used in addition to traditional loop parallelism. Locality optimization for effective use of the cache is also important for performance improvement. This paper proposes inter-array padding for data localization to minimize cache conflict misses over loops. The proposed padding scheme was evaluated on two commercial 4-processor workstations, namely the Sun Ultra 80 and IBM RS/6000 44p-270, which have different cache configurations. Compared with the maximum performance of automatic loop parallelization by the Sun Forte 6 update 2 compiler on the Ultra 80, the proposed padding with data localization gave a 5.1 times speedup for SPEC CFP95 tomcatv, 3.3 times for swim, 2.1 times for hydro2d and 1.1 times for turb3d. On the IBM RS/6000 44p-270, it showed a 1.7 times speedup for tomcatv, 4.2 times for swim, 2.5 times for hydro2d and 1.03 times for turb3d against automatic parallelization by the IBM XL Fortran 7.1 compiler.

  • Cache Optimization among Coarse Grain Tasks using Inter-Array Padding

    Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Trans. of IPSJ   45 ( 4 )  2004.04  [Refereed]

  • IBM pSeries 690 上での OSCAR マルチグレイン自動並列化コンパイラの性能評価

    石坂 一久, 白子 準, 小幡 元樹, 木村 啓二, 笠原 博徳

    情報処理学会第66回全国大会    2004.03  [Refereed]

  • Software Development on Large Parallel Supercomputers in Japan -- Parallelizing Compilers and Parallel Programming Language Projects --

    H. Kasahara

    U.S.-Japan Forum on the Future of Supercomputing, U.S. National Academy of Engineering, The Engineering Academy of Japan    2004.03  [Refereed]

  • Research on Parallelizing Compiler for High Performance Computing in Japan

    H. Kasahara

    Japan-U.S.A. Supercomputing Forum, The Engineering Academy of Japan Inc.(EAJ)    2004.03  [Refereed]

  • ミレニアムプロジェクトIT21アドバンスト並列化コンパイラとコンパイラ協調型チップマルチプロセッサ

    笠原 博徳

    NECソフト㈱ 第四回 VTC先端領域セミナー    2004.02  [Refereed]

  • Parallel Processing for MPEG2 Encoding using Data Localization

    Takeshi Kodaka, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-156-3   2004 ( 12 ) 13 - 18  2004.02

    Recently, many people have come to enjoy multimedia applications with image and audio processing on PCs, mobile phones and PDAs. Accordingly, the development of low cost, low power consumption and high performance processors for multimedia applications is expected. To satisfy these demands, chip multiprocessor architectures, which attain scalability by using coarse grain level parallelism and loop level parallelism in addition to instruction level parallelism, are attracting much attention. However, in order to extract performance from chip multiprocessor architectures efficiently, highly sophisticated techniques are required, such as decomposing a program into tasks of adequate grain size and assigning them to processors considering the parallelism and data locality of the target application. This paper describes a parallel processing scheme for MPEG2 encoding on a chip multiprocessor using data localization, which improves execution efficiency by consecutively assigning coarse grain tasks that share the same data to the same processor, and evaluates its performance. In the evaluation on an OSCAR CMP using 8 processors, the proposed scheme gives a 1.64 times speedup against loop parallel processing and a 6.82 times speedup against sequential execution time.

  • データローカライゼーションを伴うMPEG2エンコーディングの並列処理

    小高 剛, 中野 啓史, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2004-ARC-156-3   2004 ( 12 ) 13 - 18  2004.02  [Refereed]

    Recently, many people have come to enjoy multimedia applications with image and audio processing on PCs, mobile phones and PDAs. Accordingly, the development of low cost, low power consumption and high performance processors for multimedia applications is expected. To satisfy these demands, chip multiprocessor architectures, which attain scalability by using coarse grain level parallelism and loop level parallelism in addition to instruction level parallelism, are attracting much attention. However, in order to extract performance from chip multiprocessor architectures efficiently, highly sophisticated techniques are required, such as decomposing a program into tasks of adequate grain size and assigning them to processors considering the parallelism and data locality of the target application. This paper describes a parallel processing scheme for MPEG2 encoding on a chip multiprocessor using data localization, which improves execution efficiency by consecutively assigning coarse grain tasks that share the same data to the same processor, and evaluates its performance. In the evaluation on an OSCAR CMP using 8 processors, the proposed scheme gives a 1.64 times speedup against loop parallel processing and a 6.82 times speedup against sequential execution time.

  • Millennium Project IT21 Advanced Parallelizing Compiler and Compiler Cooperative Chip Multiprocessor

    H. Kasahara

    The 4th VTC Seminar, NEC Soft    2004.02  [Refereed]

  • Parallel Processing for MPEG2 Encoding using Data Localization

    Takeshi Kodaka, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2004-ARC-156-3   2004 ( 12 ) 13 - 18  2004.02  [Refereed]

    Recently, many people have come to enjoy multimedia applications with image and audio processing on PCs, mobile phones and PDAs. Accordingly, the development of low cost, low power consumption and high performance processors for multimedia applications is expected. To satisfy these demands, chip multiprocessor architectures, which attain scalability by using coarse grain level parallelism and loop level parallelism in addition to instruction level parallelism, are attracting much attention. However, in order to extract performance from chip multiprocessor architectures efficiently, highly sophisticated techniques are required, such as decomposing a program into tasks of adequate grain size and assigning them to processors considering the parallelism and data locality of the target application. This paper describes a parallel processing scheme for MPEG2 encoding on a chip multiprocessor using data localization, which improves execution efficiency by consecutively assigning coarse grain tasks that share the same data to the same processor, and evaluates its performance. In the evaluation on an OSCAR CMP using 8 processors, the proposed scheme gives a 1.64 times speedup against loop parallel processing and a 6.82 times speedup against sequential execution time.

  • Selective inline expansion for improvement of multi grain parallelism

    J Shirako, K Nagasawa, K Ishizaka, M Obata, H Kasahara

    Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks     476 - 482  2004  [Refereed]

    This paper proposes a selective procedure inlining scheme to improve multi-grain parallelism, which hierarchically exploits coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block, in addition to loop parallelism. Using the proposed scheme, the parallelism among different layers (nested levels) can be exploited. In the evaluation using 103.su2cor, 107.mgrid and 125.turb3d from the SPEC95FP benchmarks on a 16-way IBM pSeries690 SMP server, multi-grain parallel processing with the proposed scheme gave 3.65 to 5.34 times speedups against the IBM XL Fortran compiler and 1.03 to 1.47 times speedups against conventional multi-grain parallelization.

  • Cache optimization for coarse grain task parallel processing using inter-array padding

    K Ishizaka, M Obata, H Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING   2958   64 - 76  2004  [Refereed]

    The wide use of multiprocessor systems has been making automatic parallelizing compilers more important. To further improve the performance of multiprocessor systems by compiler, multigrain parallelization is important. In multigrain parallelization, coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are used in addition to traditional loop parallelism. In addition, locality optimization for effective use of the cache is also important for performance improvement. This paper describes inter-array padding to minimize cache conflict misses among macro-tasks, combined with a data localization scheme that decomposes loops sharing the same arrays to fit the cache size and executes the decomposed loops consecutively on the same processor. In the performance evaluation on a Sun Ultra 80 (4 PEs), the OSCAR compiler, in which the proposed scheme is implemented, gave a 2.5 times speedup against the maximum performance of automatic loop parallelization by the Sun Forte compiler, on average over the SPEC CFP95 tomcatv, swim, hydro2d and turb3d programs. The OSCAR compiler also showed a 2.1 times speedup on an IBM RS/6000 44p-270 (4 PEs) against the XLF compiler.

  • Parallel processing using data localization for MPEG2 encoding on OSCAR chip multiprocessor

    T Kodaka, H Nakano, K Kimura, H Kasahara

    INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, PROCEEDINGS     119 - 127  2004  [Refereed]

    Currently, many people enjoy multimedia applications with image and audio processing on PCs, PDAs, mobile phones and so on. With the popularization of multimedia applications, the need for low cost, low power consumption and high performance processors has been increasing. To this end, chip multiprocessor architectures, which attain scalable performance improvement by using multigrain parallelism, are attracting much attention. However, in order to extract higher performance on a chip multiprocessor, more sophisticated software techniques are required, such as decomposing a program into tasks of adequate grain size, assigning them to processors considering parallelism, and data locality optimization. This paper describes a parallel processing scheme for MPEG2 encoding on a chip multiprocessor using data localization, which improves execution efficiency by consecutively assigning coarse grain tasks that share the same data to the same processor. The performance evaluation on the OSCAR chip multiprocessor architecture shows that the proposed scheme gives a 6.97 times speedup using 8 processors and a 10.93 times speedup using 16 processors against sequential execution time. Moreover, the proposed scheme gives a 1.61 times speedup using 8 processors and a 2.08 times speedup using 16 processors against loop parallel processing, which has been widely used for multiprocessor systems, using the same number of processors.

  • Memory management for data localization on OSCAR chip multiprocessor

    H Nakano, T Kodaka, K Kimura, H Kasahara

    INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, PROCEEDINGS     82 - 88  2004  [Refereed]

    Chip multiprocessor (CMP) architecture has been attracting much attention as a next-generation microprocessor architecture, and many kinds of CMP are being widely researched. However, CMP architectures have several difficulties in the effective use of memory, especially cache or local memory near a processor core. The authors have proposed the OSCAR CMP architecture, which works cooperatively with a multigrain parallelizing compiler that provides much higher parallelism than instruction level parallelism or loop level parallelism, as well as high productivity of application programs. To support compiler optimization for effective use of cache or local memory, the OSCAR CMP has local data memory (LDM) for processor private data and distributed shared memory (DSM) for synchronization and fine grain data transfers among processors, in addition to centralized shared memory (CSM) to support dynamic task scheduling. This paper proposes a static coarse grain task scheduling scheme for data localization using live variable analysis. A remote memory data transfer scheduling scheme using information from live variable analysis is also described. The proposed scheme is implemented in the OSCAR FORTRAN multigrain parallelizing compiler and is evaluated on the OSCAR CMP using Tomcatv and Swim from the SPEC CFP 95 benchmark.

  • The Data Prefetching of Coarse Grain Task Parallel Processing on Symmetric Multi Processor Machine

    Takamichi Miyamoto, Takahiro Yamaguchi, Takao Tobita, Kazuhisa Ishizaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-155-06   2003 ( 119 ) 63 - 68  2003.11

     View Summary

    On the shared multi processor system used in current computing servers, the increase of memory access overhead with the speedup of CPU interfere to get the scalable performance improvement with the increase of the processors. In order to get scalable performance improvement, this paper proposes and evaluates the static scheduling algorithm which reduces the memory access overhead by using cache prefetch to overlap of data transfer and task processing. The proposed algorithm is used in static scheduling stage in a compiler, moreover the compiler generates a OpenMP parallelized Fortran program with prefetch directives for SUN Forte compiler for Sun Fire V880 server. Performance evaluation shows that the proposed algorithm gave us super linear speedup compared with sequential processing without prefetching by Sun Forte compiler such as 13.9 times speedup on 8 processors for SPEC95fp tomcatv program and 22.3 times speedup on 8 processors for SPEC95fp swim program. Futhermore, compared with automatic prefetching by SUN Forte compiler using the same number of processors, this algorithm shows that 1.1 times speedup on 1 processor, 3.86 times speedup on 8 processors for SPEC95fp tomcatv and 1.44 times speedup on 1processor, 1.85 times speedup on 8 processors for SPEC95fp swim.

  • SMPマシン上での粗粒度タスク並列処理におけるデータプリフェッチ手法

    宮本 孝道, 山口 高弘, 飛田 高雄, 石坂 一久, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2003-ARC-155-06   2003 ( 119 ) 63 - 68  2003.11  [Refereed]

    On shared multiprocessor systems used in current computing servers, the growth of memory access overhead as CPUs become faster prevents performance from scaling with the number of processors. To obtain scalable performance improvement, this paper proposes and evaluates a static scheduling algorithm that reduces memory access overhead by using cache prefetching to overlap data transfer with task processing. The proposed algorithm is used in the static scheduling stage of a compiler, and the compiler generates an OpenMP parallelized Fortran program with prefetch directives for the Sun Forte compiler on a Sun Fire V880 server. Performance evaluation shows that the proposed algorithm gave super-linear speedups compared with sequential processing without prefetching by the Sun Forte compiler: 13.9 times on 8 processors for the SPEC95fp tomcatv program and 22.3 times on 8 processors for the SPEC95fp swim program. Furthermore, compared with automatic prefetching by the Sun Forte compiler using the same number of processors, the algorithm gave 1.1 times speedup on 1 processor and 3.86 times speedup on 8 processors for SPEC95fp tomcatv, and 1.44 times speedup on 1 processor and 1.85 times speedup on 8 processors for SPEC95fp swim.

  • The Data Prefetching of Coarse Grain Task Parallel Processing on Symmetric Multi Processor Machine

    Takamichi Miyamoto, Takahiro Yamaguchi, Takao Tobita, Kazuhisa Ishizaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-155-06   2003 ( 119 ) 63 - 68  2003.11  [Refereed]

    On shared multiprocessor systems used in current computing servers, the growth of memory access overhead as CPUs become faster prevents performance from scaling with the number of processors. To obtain scalable performance improvement, this paper proposes and evaluates a static scheduling algorithm that reduces memory access overhead by using cache prefetching to overlap data transfer with task processing. The proposed algorithm is used in the static scheduling stage of a compiler, and the compiler generates an OpenMP parallelized Fortran program with prefetch directives for the Sun Forte compiler on a Sun Fire V880 server. Performance evaluation shows that the proposed algorithm gave super-linear speedups compared with sequential processing without prefetching by the Sun Forte compiler: 13.9 times on 8 processors for the SPEC95fp tomcatv program and 22.3 times on 8 processors for the SPEC95fp swim program. Furthermore, compared with automatic prefetching by the Sun Forte compiler using the same number of processors, the algorithm gave 1.1 times speedup on 1 processor and 3.86 times speedup on 8 processors for SPEC95fp tomcatv, and 1.44 times speedup on 1 processor and 1.85 times speedup on 8 processors for SPEC95fp swim.

  • Millennium Project IT21 Advanced Parallelizing Compiler

    H. Kasahara

    Information Processing Society of Japan Kansai Branch    2003.10  [Refereed]

  • ミレニアムプロジェクトIT21 アドバンスト並列化コンパイラ

    笠原 博徳

    (社)情報処理学会 関西支部大会    2003.10  [Refereed]

  • Data Localization Scheme using Static Scheduling on Chip Multiprocessor

    Hirofumi Nakano, Takeshi Kodaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-154-14   2003 ( 84 ) 79 - 84  2003.08

    Recently, chip multiprocessor architectures that contain multiple processors on a chip have become a popular approach even in the commercial area. The authors have proposed the OSCAR chip multiprocessor (OSCAR CMP), which aims at exploiting multiple grains of parallelism hierarchically from a sequential program on a chip. OSCAR CMP has local data memory (LDM) for processor private data and two-ported distributed shared memory for processor shared data so that data allocation can be controlled appropriately by a compiler. This paper describes a data localization scheme for OSCAR CMP that exploits data locality by assigning coarse grain tasks sharing the same data to the same processor consecutively. In addition, OSCAR CMP using the data localization scheme is compared with a shared cache architecture and a snooping cache architecture. Then, the current naive code generation for OSCAR CMP is examined using the evaluation results.

  • Parallel Processing on MPEG2 Encoding for OSCAR Chip Multiprocessor

    Takeshi Kodaka, Hirofumi Nakano, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-154-10   2003 ( 84 ) 55 - 60  2003.08  [Refereed]

    Recently, multimedia applications with visual and sound processing have become popular on mobile phones and PDAs. To satisfy the need for efficient multimedia processing, low cost, low power and high performance processors for multimedia applications are expected. Chip multiprocessor architectures, which allow scalability to be attained using coarse grain level parallelism and loop level parallelism in addition to instruction level parallelism, are attracting much attention. However, to realize efficient processing on chip multiprocessor architectures, parallel processing techniques such as decomposing a program into adequate tasks considering the characteristics of the program and assigning these tasks onto processors are essential. This paper describes a parallel processing scheme for MPEG2 encoding on a chip multiprocessor and its performance.

  • OSCAR CMP上でのスタティックスケジューリングを用いたデータローカライゼーション手法

    中野 啓史, 小高 剛, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2003-ARC-154-14   2003 ( 84 ) 79 - 84  2003.08  [Refereed]

    Recently, chip multiprocessor architectures that contain multiple processors on a chip have become a popular approach even in the commercial area. The authors have proposed the OSCAR chip multiprocessor (OSCAR CMP), which aims at exploiting multiple grains of parallelism hierarchically from a sequential program on a chip. OSCAR CMP has local data memory (LDM) for processor private data and two-ported distributed shared memory for processor shared data so that data allocation can be controlled appropriately by a compiler. This paper describes a data localization scheme for OSCAR CMP that exploits data locality by assigning coarse grain tasks sharing the same data to the same processor consecutively. In addition, OSCAR CMP using the data localization scheme is compared with a shared cache architecture and a snooping cache architecture. Then, the current naive code generation for OSCAR CMP is examined using the evaluation results.

  • OSCARマルチプロセッサシステム上でのMPEG2エンコーディングの並列処理

    小高 剛, 中野 啓史, 木村 啓二, 笠原 博徳

    情報処理学会研究会報告2003-ARC-154-10    2003.08  [Refereed]

  • Millennium Project IT21 'Advanced Parallelizing Compiler' and Compiler Cooperative Chip Multiprocessor

    H. Kasahara

    The 2nd Super H Open Forum, Renesas Technology Corp. & Hitachi Ltd.    2003.08  [Refereed]

  • Data Localization Scheme using Static Scheduling on Chip Multiprocessor

    Hirofumi Nakano, Takeshi Kodaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-154-14    2003.08  [Refereed]

  • ミレニアムプロジェクトIT21”アドバンスト並列化コンパイラ”とコンパイラ協調型チップマルチプロセッサ

    笠原 博徳

    ㈱ルネサステクノロジ、㈱日立製作所 第2回 Super H オープンフォーラム    2003.08  [Refereed]

  • Static coarse grain task scheduling with cache optimization using OpenMP

    H Nakano, K Ishizaka, M Obata, K Kimura, H Kasahara

    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING   31 ( 3 ) 211 - 223  2003.06  [Refereed]

    Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed increases. Also, use of multigrain parallelism is becoming more important to improve effective performance beyond the limitation of loop iteration level parallelism. Considering these factors, this paper proposes a coarse grain task static scheduling scheme considering cache optimization. The proposed scheme schedules coarse grain tasks to threads so that shared data among coarse grain tasks can be passed via cache after task and data decomposition considering cache size at compile time. It is implemented on the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPEC fp 95. As a result, the proposed scheme gives 4.56 times speedup for Swim and 2.37 times speedup for Tomcatv on 4 processors against the Sun Forte HPC Ver. 6 update 1 loop parallelizing compiler.

  • Inter-Array Padding for Data Localization with Static Scheduling

    Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-153-11    2003.05

  • スタティックスケジューリングを用いたデータローカライゼーションにおける配列間パディング

    石坂 一久, 小幡 元樹, 笠原 博徳

    情報処理学会研究会報告2003-ARC-153    2003.05  [Refereed]

  • Inter-Array Padding for Data Localization with Static Scheduling

    Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, 2003-ARC-153-11    2003.05  [Refereed]

  • IT競争力強化に向けた産官学連携

    笠原博徳

    朝日新聞社企画 WASEDA.COM, オピニオン    2003.04  [Refereed]  [Invited]

  • マルチグレイン並列処理のための階層的並列性制御手法

    小幡 元樹, 白子 準, 神長 浩気, 石坂 一久, 笠原 博徳

    情報処理学会論文誌   44 ( 4 ) 1044 - 1055  2003.04  [Refereed]

  • 最先端の自動並列化コンパイラ技術

    笠原博徳

    情報処理学会誌   44 ( 4 ) 384 - 392  2003.04  [Refereed]

  • IT競争力強化のための研究開発人材---経済産業省アドバンスト並列化コンパイラプロジェクトリーダ,JEITA及びSTARC産官学連携講座の経験を通して---

    笠原 博徳

    経済産業省 大臣官房 イノベーション・システムにおける研究開発人材に関する研究会    2003.04  [Refereed]

  • Hierarchical Parallelism Control Scheme for Multigrain Parallelization

    Motoki Obata, Jun Shirako, Hiroki Kaminaga, Kazuhisa Ishizaka, Hironori Kasahara

    Trans. of IPSJ   44 ( 4 )  2003.04  [Refereed]

  • Multigrain parallel processing on compiler cooperative OSCAR chip multiprocessor architecture

    K Kimura, T Kodaka, M Obata, H Kasahara

    IEICE TRANSACTIONS ON ELECTRONICS   E86C ( 4 ) 570 - 579  2003.04  [Refereed]

    This paper describes multigrain parallel processing on the OSCAR (Optimally SCheduled Advanced multiprocessoR) chip multiprocessor architecture. The OSCAR compiler cooperative chip multiprocessor architecture aims at developing a scalable chip multiprocessor with high effective performance, cost effectiveness and ease of use through compiler support. The OSCAR chip multiprocessor architecture integrates simple single issue processors having distributed shared data memory for optimal use of data locality over different loops and for fine grain data transfer and synchronization, local data memory for private data recognized by the compiler, and a compiler controllable data transfer unit for overlapping data transfer to hide data transfer overhead. The OSCAR chip multiprocessor and the OSCAR multigrain parallelizing compiler have been developed simultaneously. Performance of multigrain parallel processing on the OSCAR chip multiprocessor architecture is evaluated using the SPEC fp 2000/95 benchmark suite. When a microSPARC-like single issue core is used, the OSCAR chip multiprocessor architecture gives 2.36 times speedup in fpppp, 2.64 times in su2cor, 2.88 times in turb3d, 2.98 times in hydro2d, 3.84 times in tomcatv, 3.84 times in mgrid and 3.97 times in swim, respectively, for four processors against a single processor.

  • Collaboration of Industry, Government and Academia for IT Competitive Power Strengthening

    Hironori Kasahara

    Opinions, WASEDA.COM, Asahi Shimbunsha    2003.04  [Refereed]

  • R&D Human Resource for Strengthening IT Competitive Power---From the experience of a Project Leader of METI Advanced Parallelizing Compiler Project and JEITA & STARC Industry, Government and Academia Cooperative Lectures---

    H. Kasahara

    METI Minister's Secretariat Sig. on R&D Human Resource for Innovation Systems    2003.04  [Refereed]

  • Advanced Automatic Parallelizing Compiler Technology

    Hironori Kasahara

    IPSJ MAGAZINE   44 ( 4 ) 384 - 392  2003.04  [Refereed]

  • 研究開発競争力強化に向けた産官学連携寄付講座:JEITA IT最前線

    笠原博徳

    早稲田大学理工学部・大学院報「塔」78号    2003.03  [Refereed]  [Invited]

  • Industry, Government and Academia Collaborative Donated Course for R&D Competitive Power Strengthening

    Hironori Kasahara

    Waseda University School of Science and Engineering, "Tower", No.78    2003.03  [Refereed]

  • Coarse grain task parallel processing with cache optimization on shared memory multiprocessor

    K Ishizaka, M Obata, H Kasahara

    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING   2624   352 - 365  2003  [Refereed]

    In multiprocessor systems, the gap between peak and effective performance has been getting larger. To cope with this performance gap, it is important to use multigrain parallelism in addition to ordinary loop level parallelism. Also, effective use of the memory hierarchy is important for the performance improvement of multiprocessor systems because the speed gap between processors and memories is getting larger. This paper describes coarse grain task parallel processing that uses parallelism among macro-tasks such as loops and subroutines, considering cache optimization using a data localization scheme. The proposed scheme is implemented on the OSCAR automatic multigrain parallelizing compiler. The OSCAR compiler generates an OpenMP FORTRAN program realizing the proposed scheme from a sequential FORTRAN77 program. Its performance is evaluated on an IBM RS6000 SP 604e High Node 8-processor SMP machine using SPEC95fp tomcatv, swim and mgrid. In the evaluation, the proposed coarse grain task parallel processing scheme with cache optimization gives up to 1.3 times speedup on 1 PE, 4.7 times speedup on 4 PEs and 8.8 times speedup on 8 PEs compared with sequential processing time.

  • Data Localization using Coarse Grain Task Parallelization on Chip Multiprocessor

    Hirofumi Nakano, Takeshi Kodaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2003-151-3(SHINING2003)   2003 ( 10 ) 13 - 18  2003.01

    Recently, Chip Multiprocessor (CMP) architecture has attracted much attention as a next-generation microprocessor architecture, and many kinds of CMP have been widely developed. However, these CMP architectures still have the problem of effective use of the memory system near the processor cores, such as cache and local memory. On the other hand, the authors have proposed OSCAR CMP, which works cooperatively with multigrain parallel processing, to achieve high effective performance and good cost effectiveness. To overcome the problem of effective use of cache and local memory, OSCAR CMP has local data memory (LDM) for processor private data and distributed shared memory (DSM) having two ports for synchronization and data transfer among processor cores, in addition to centralized shared memory (CSM). The multigrain parallelizing compiler uses this memory architecture of OSCAR CMP with a data localization scheme that fully uses compile time information. This paper proposes a coarse grain task static scheduling scheme considering data localization using live variable analysis. Furthermore, a scheme for inserting data transfers between CSM and LDM using information from live variable analysis is also described. This data localization scheme is implemented on the OSCAR FORTRAN multigrain parallelizing compiler and is evaluated on OSCAR CMP using Tomcatv from the SPEC fp 95 benchmark suite. As a result, the proposed scheme gives about 1.3 times speedup using 20 clocks as the access latency of CSM, and about 1.6 times speedup using 40 clocks as the access latency of CSM, against execution without the data localization scheme.

  • Inline Expansion for Improvement of Multi Grain Parallelism

    Jun Shirako, Kouhei Nagasawa, Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2003-151-2(SHINING2003)    2003.01

  • チップマルチプロセッサ上での粗粒度タスク並列処理によるデータローカライゼーション

    中野 啓史, 小高 剛, 木村 啓二, 笠原 博徳

    情報処理学会研究報告ARC2003-151-3(SHINING2003)   2003 ( 10 ) 13 - 18  2003.01  [Refereed]

    Recently, Chip Multiprocessor (CMP) architecture has attracted much attention as a next-generation microprocessor architecture, and many kinds of CMP have been widely developed. However, these CMP architectures still have the problem of effective use of the memory system near the processor cores, such as cache and local memory. On the other hand, the authors have proposed OSCAR CMP, which works cooperatively with multigrain parallel processing, to achieve high effective performance and good cost effectiveness. To overcome the problem of effective use of cache and local memory, OSCAR CMP has local data memory (LDM) for processor private data and distributed shared memory (DSM) having two ports for synchronization and data transfer among processor cores, in addition to centralized shared memory (CSM). The multigrain parallelizing compiler uses this memory architecture of OSCAR CMP with a data localization scheme that fully uses compile time information. This paper proposes a coarse grain task static scheduling scheme considering data localization using live variable analysis. Furthermore, a scheme for inserting data transfers between CSM and LDM using information from live variable analysis is also described. This data localization scheme is implemented on the OSCAR FORTRAN multigrain parallelizing compiler and is evaluated on OSCAR CMP using Tomcatv from the SPEC fp 95 benchmark suite. As a result, the proposed scheme gives about 1.3 times speedup using 20 clocks as the access latency of CSM, and about 1.6 times speedup using 40 clocks as the access latency of CSM, against execution without the data localization scheme.

  • マルチグレイン並列性向上のためのインライン展開手法

    白子 準, 長澤 耕平, 石坂 一久, 小幡 元樹, 笠原 博徳

    情報処理学会研究報告ARC2003-151-2(SHINING2003)    2003.01  [Refereed]

  • Data Localization using Coarse Grain Task Parallelization on Chip Multiprocessor

    Hirofumi Nakano, Takeshi Kodaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2003-151-3(SHINING2003)    2003.01  [Refereed]

  • Multigrain parallel processing on OSCAR CMP

    K Kimura, T Kodaka, M Obata, H Kasahara

    INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS     56 - 65  2003  [Refereed]

    It seems that the Instruction Level Parallelism (ILP) approach, which has been used by various superscalar processors and VLIW processors for a long time, is reaching its limit of performance improvement. To obtain scalable performance improvement, cost effectiveness and high productivity even in the era of one billion transistors, cooperative work between software and hardware is getting increasingly important. For this reason, the authors have developed the OSCAR (Optimally SCheduled Advanced multiprocessoR) Chip Multiprocessor (OSCAR CMP) and the OSCAR multigrain compiler simultaneously. To preserve scalability in the future, OSCAR CMP has mechanisms for efficient use of parallelism and data locality, and for hiding data transfer overhead. These mechanisms can be fully controlled by the OSCAR multigrain compiler. In this paper, the authors focus on multigrain parallel processing on OSCAR CMP, which makes it possible to exploit loop iteration level parallelism and coarse grain task parallelism in addition to ILP from an entire program. Performance of multigrain parallel processing on the OSCAR CMP architecture is evaluated using the SPEC fp 2000/95 benchmark suite. When a microSPARC-like single issue core is used, OSCAR CMP gives from 1.77 to 3.96 times speedup for four processors against a single processor. In addition, OSCAR CMP is compared with a Sun UltraSPARC II-like processor to evaluate cost effectiveness. As a result, OSCAR CMP gives 1.66 times better performance on average under the condition that OSCAR CMP and UltraSPARC II are built from almost the same number of transistors.

  • Multigrain Parallel Processing on OSCAR Chip Multiprocessor

    Keiji Kimura, Takeshi Kodaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-150-7    2002.11

  • Multigrain Parallel Processing on Motion Vector Estimation for Single Chip Multiprocessor

    Takeshi Kodaka, Takahisa Suzuki, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-150-6    2002.11

  • OSCAR チップマルチプロセッサ上でのマルチグレイン並列処理

    木村 啓二, 小高 剛, 小幡 元樹, 笠原 博徳

    情報処理学会研究報告ARC2002-150-7    2002.11  [Refereed]

  • OSCAR 型シングルチップマルチプロセッサにおける動きベクトル探索処理

    小高 剛, 鈴木 貴久, 木村 啓二, 笠原 博徳

    情報処理学会研究報告ARC2002-150-6    2002.11  [Refereed]

  • Multigrain Parallel Processing on OSCAR Chip Multiprocessor

    Keiji Kimura, Takeshi Kodaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-150-7    2002.11  [Refereed]

  • Multigrain Parallel Processing on Motion Vector Estimation for Single Chip Multiprocessor

    Takeshi Kodaka, Takahisa Suzuki, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-150-6    2002.11  [Refereed]

  • Multigrain Parallelizing Compiler for Chip Multiprocessors to High Performance Servers

    H. Kasahara

    Intel ICRC, China    2002.11  [Refereed]

  • A standard task graph set for fair evaluation of multiprocessor scheduling algorithms

    Takao Tobita, Hironori Kasahara

    Journal of Scheduling, John Wiley & Sons Ltd   5 ( 5 ) 379 - 394  2002.10  [Refereed]

  • シングルチップマルチプロセッサにおけるJPEGエンコーディングのマルチグレイン並列処理

    小高 剛, 内田 貴之, 木村 啓二, 笠原 博徳

    情報処理学会ハイパフォーマンスコンピューティングシステム論文誌   43 ( Sig.6(HPS5) ) 153 - 62  2002.09  [Refereed]

  • NEDO-1 アドバンスト並列化コンパイラ技術

    笠原 博徳

    情報処理学会・電子情報通信学会FIT (Forum on Information Technology), 大型プロジェクト紹介(国家プロジェクト紹介), 東工大 百年記念館フェライト会議室    2002.09  [Refereed]

  • OSCAR Multigrain Parallelizing Compiler for Chip Multiprocessors to High Performance Servers

    H. Kasahara

    Polish-Japanese Institute of Information Technology (PJIIT) hosted by Prof. Marek Tudruj    2002.09  [Refereed]

  • NEDO-1 Advanced Parallelizing Technology, IPSJ-IEICE FIT2002 (Forum on Information Technology), National Project Introduction

    H. Kasahara

       2002.09  [Refereed]

  • Cache Optimization among Coarse Grain Tasks considering Line Conflict Miss

    Kazuhisa Ishizaka, Hirofumi Nakano, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-149-25(SWoPP2002)    2002.08

  • Performance of OSCAR Multigrain Parallelizing Compiler on SMPs

    Motoki Obata, Jun Shirako, Kazuhisa Ishizaka, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-149-20(SWoPP2002)    2002.08  [Refereed]

  • ラインコンフリクトミスを考慮した粗粒度タスク間キャッシュ最適化

    石坂 一久, 中野 啓史, 小幡 元樹, 笠原 博徳

    情報処理学会研究報告ARC2002-149-25(SWoPP2002)   2002 ( 81 ) 145 - 150  2002.08  [Refereed]

    Effective use of cache is becoming important as the speed gap between processors and memories increases. In this paper, cache optimization for coarse grain task parallel processing is described. Coarse grain task parallel processing uses the parallelism among coarse grain tasks such as basic blocks, loops and subroutines to increase the effective performance of a multiprocessor. In the proposed cache optimization, loops are decomposed into small loops that access less data than the cache size. Moreover, these loops are executed as consecutively as possible on the same processor to use cache effectively for data transfer among loops. In addition, the proposed cache optimization eliminates conflict misses among the data used in macro tasks that are consecutively executed on the same processor by intra-variable padding, which changes array dimension sizes. The proposed scheme is evaluated on a Sun Ultra80 using SPEC95 swim. Cache optimization among macro tasks (10.0s) gave 10 times speedup against sequential execution (99.8s) by eliminating conflict misses on 4 processors, where all data fits in cache after padding because the total cache size exceeds the data size. On a single processor, padding and cache optimization among macro tasks (79.1s) gave an 18% speedup against the Sun Forte compiler (93.5s). Also, in the evaluation on an IBM RS6000 SP 604e, the proposed scheme improved the performance of coarse grain task parallel processing by 14% (59.2s to 52.0s) on 8 PEs, and gave 2.08 times speedup against the XLF compiler on 6 PEs, the processor count at which XLF performed best (108.0s).

  • SMPシステム上でのOSCARマルチグレイン並列化コンパイラの性能

    小幡 元樹, 石坂 一久, 白子 準, 笠原 博徳

    情報処理学会研究報告ARC2002-149-20(SWoPP2002)   2002 ( 81 ) 115 - 120  2002.08  [Refereed]

    This paper describes the OSCAR multigrain parallelizing compiler, which has been developed in the Japanese Millennium Project IT21 "Advanced Parallelizing Compiler", and its performance on SMP machines. The compiler realizes multigrain parallelization for platforms from chip multiprocessors to high-end servers, hierarchically exploiting coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block, in addition to loop parallelism. Also, it globally optimizes cache use over different loops, or coarse grain tasks, based on a data localization technique to reduce memory access overhead. Performance of the OSCAR compiler for SPEC95fp is evaluated on different SMPs. For example, it gives 10.6 times speedup for MGRID on a 16-processor IBM RegattaH and 8.5 times speedup for HYDRO2D on an 8-processor IBM RS6000 604e High Node against sequential processing, and 6.0 times speedup for TOMCATV using 4 processors on a Sun Fire V880 server.

  • ミレニアムプロジェクトIT21アドバンスト並列化コンパイラにおけるマルチグレイン並列処理

    笠原 博徳

    自律分散システム研究会(名古屋大学)    2002.08  [Refereed]

  • Cache Optimization among Coarse Grain Tasks considering Line Conflict Miss

    Kazuhisa Ishizaka, Hirofumi Nakano, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-149-25(SWoPP2002)   2002 ( 81 ) 145 - 150  2002.08  [Refereed]

    Effective use of cache is becoming important as the speed gap between processors and memories increases. In this paper, cache optimization for coarse grain task parallel processing is described. Coarse grain task parallel processing uses the parallelism among coarse grain tasks such as basic blocks, loops and subroutines to increase the effective performance of a multiprocessor. In the proposed cache optimization, loops are decomposed into small loops that access less data than the cache size. Moreover, these loops are executed as consecutively as possible on the same processor to use cache effectively for data transfer among loops. In addition, the proposed cache optimization eliminates conflict misses among the data used in macro tasks that are consecutively executed on the same processor by intra-variable padding, which changes array dimension sizes. The proposed scheme is evaluated on a Sun Ultra80 using SPEC95 swim. Cache optimization among macro tasks (10.0s) gave 10 times speedup against sequential execution (99.8s) by eliminating conflict misses on 4 processors, where all data fits in cache after padding because the total cache size exceeds the data size. On a single processor, padding and cache optimization among macro tasks (79.1s) gave an 18% speedup against the Sun Forte compiler (93.5s). Also, in the evaluation on an IBM RS6000 SP 604e, the proposed scheme improved the performance of coarse grain task parallel processing by 14% (59.2s to 52.0s) on 8 PEs, and gave 2.08 times speedup against the XLF compiler on 6 PEs, the processor count at which XLF performed best (108.0s).

  • Multigrain Parallel Processing in Millennium Project IT21 Advanced Parallelizing Compiler

    H. Kasahara

    Sig. on Autonomous Distributed Systems, Nagoya University hosted by Prof. Toshio Fukuda    2002.08  [Refereed]

  • Coarse Grain Task Parallel Processing with Automatic Determination Scheme of Parallel Processing Layer

    Jun Shirako, Hiroki Kaminaga, Noriaki Kondo, Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-148-4   2002 ( 37 ) 19 - 24  2002.05

    To improve the performance and usability of multiprocessor systems ranging from chip multiprocessors to high performance computers, a multi-grain compilation scheme is important. Such a scheme exploits coarse grain parallelism among loops, subroutines and basic blocks, conventional medium grain parallelism among loop iterations in a Doall loop, and near fine grain parallelism among statements inside a basic block. In order to extract the parallelism of each layer (nest level) hierarchically and achieve better performance in multi-grain parallel processing, it is necessary to determine how many processors or groups of processors (processor clusters) should be assigned to each layer according to the parallelism of the target program layers. This paper proposes a scheme that automatically determines the number of processors to be assigned to each layer so that the parallelism of each hierarchy in a program is used efficiently. The effectiveness of the proposed scheme is evaluated on an IBM RS6000 SMP server with 8 processors using 8 programs from SPEC95FP.

  • Evaluation of Overhead with Coarse Grain Task Parallel Processing on SMP Machines

    Yasutaka Wada, Hirofumi Nakano, Keiji Kimura, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-148-3    2002.05

  • 世界トップのIT産業を担う技術と人材の育成

    笠原博徳

    早稲田大学広報誌 月刊 Campus Now 2002/5号    2002.05  [Refereed]  [Invited]

  • シングルチップマルチプロセッサにおける JPEGエンコーディングのマルチグレイン並列処理

    小高 剛, 内田 貴之, 木村 啓二, 笠原 博徳

    情報処理学会並列処理シンポジウム(JSPP2002)    2002.05  [Refereed]

  • 並列処理階層自動決定手法を用いた粗粒度タスク並列処理

    白子 準, 神長 浩気, 近藤 巧章, 石坂 一久, 小幡 元樹, 笠原博徳

    情報処理学会研究報告ARC2002-148-4   2002 ( 37 ) 19 - 24  2002.05  [Refereed]

    To improve the performance and usability of multiprocessor systems ranging from chip multiprocessors to high performance computers, a multi-grain compilation scheme is important. Such a scheme exploits coarse grain parallelism among loops, subroutines and basic blocks, conventional medium grain parallelism among loop iterations in a Doall loop, and near fine grain parallelism among statements inside a basic block. In order to extract the parallelism of each layer (nest level) hierarchically and achieve better performance in multi-grain parallel processing, it is necessary to determine how many processors or groups of processors (processor clusters) should be assigned to each layer according to the parallelism of the target program layers. This paper proposes a scheme that automatically determines the number of processors to be assigned to each layer so that the parallelism of each hierarchy in a program is used efficiently. The effectiveness of the proposed scheme is evaluated on an IBM RS6000 SMP server with 8 processors using 8 programs from SPEC95FP.

  • SMPマシン上での粗粒度タスク並列処理オーバーへッドの解析

    和田 康孝, 中野 啓史, 木村 啓二, 小幡 元樹, 笠原博徳

    情報処理学会研究報告ARC2002-148-3   2002 ( 37 ) 13 - 18  2002.05  [Refereed]

    Coarse grain task parallel processing, which exploits parallelism among loops, subroutines and basic blocks, is becoming more important for attaining performance improvement on multiprocessor architectures. To implement coarse grain task parallel processing efficiently, it is important to analyze various processor overheads quantitatively. This paper evaluates the overheads of barrier synchronization, thread fork/join and L2 cache miss penalty using performance measurement mechanisms to analyze the performance improvements obtained by the OSCAR Fortran compiler on Sun Ultra80, IBM RS6000 and SGI Origin2000.

  • Upbringing of Technology and Human Resource Aiming at World Top IT Industry

    Hironori Kasahara

    Waseda Univ. Monthly Report "Campus Now" Vol.5, 2002    2002.05  [Refereed]

  • Coarse Grain Task Parallel Processing with Automatic Determination Scheme of Parallel Processing Layer

    Jun Shirako, Hiroki Kaminaga, Noriaki Kondo, Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-148-4   2002 ( 37 ) 19 - 24  2002.05  [Refereed]

     View Summary

    To improve the performance and usability of multiprocessor systems ranging from chip multiprocessors to high performance computers, a multi-grain compilation scheme is important. Such a scheme exploits coarse grain parallelism among loops, subroutines and basic blocks, conventional medium grain parallelism among loop iterations in a Doall loop, and near fine grain parallelism among statements inside a basic block. To extract the parallelism of each layer (nest level) hierarchically and achieve better performance in multi-grain parallel processing, it is necessary to determine how many processors or groups of processors (processor clusters) should be assigned to each layer, according to the parallelism of the target program layers. This paper proposes a scheme that automatically determines the number of processors to be assigned to each layer, so that the parallelism of each hierarchy in a program is used efficiently. The effectiveness of the proposed scheme is evaluated on an IBM RS6000 SMP server with 8 processors using 8 programs from SPEC95FP.

    CiNii

  • Evaluation of Overhead with Coarse Grain Task Parallel Processing on SMP Machines

    Yasutaka Wada, Hirofumi Nakano, Keiji Kimura, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-148-3    2002.05  [Refereed]

  • JPEG Encoding using Multigrain Parallel Processing on a Single Chip Multiprocessor

    Takeshi Kodaka, Takayuki Uchida, Keiji Kimura, Hironori Kasahara

    Joint Symposium on Parallel Processing 2002 (JSPP2002)   43 ( 6 ) 153 - 162  2002.05  [Refereed]

    CiNii

  • 標準タスクグラフセットを用いた実行時間最小マルチプロセッサスケジューリングアルゴリズムの性能評価

    飛田 高雄, 笠原 博徳

    情報処理学会論文誌   43 ( 4 ) 936 - 947  2002.04  [Refereed]

     View Summary

    This paper proposes a "Standard Task Graph Set" (STG) for evaluating the performance of heuristic and optimization algorithms for the minimum execution time multiprocessor scheduling problem, which is known to be a strongly NP-hard combinatorial optimization problem, and describes evaluation results obtained by applying it to several algorithms. In previous research on multiprocessor scheduling algorithms, it was not possible to compare performance and decide which algorithm is better, because the task graphs were tailored to the algorithm proposed in each paper or were not available to other researchers. To cope with this problem, STG makes fair evaluation and comparison of algorithms possible under the same conditions for every researcher, by providing many kinds of random task graphs based on various task graph generation methods used in the literature together with their scheduling results, and by making them available from a website. This paper evaluates several algorithms using 2,700 task graphs with 50 to 5,000 tasks from STG and assesses the effectiveness of STG. The performance evaluation confirms that the heuristic algorithms CP and CP/MISF obtained optimal schedules in 68.22% and 68.46% of tested cases, respectively, the sequential optimization algorithm DF/IHS in 85.79%, and the parallel optimization algorithm PDF/IHS on an SMP with 4 processor elements in 89.60%, within an upper limit of 600 seconds. It was also confirmed that the proposed STG is useful for evaluating heuristic and optimization scheduling algorithms.

    CiNii

  • 共有メモリマルチプロセッサ上でのキャッシュ最適化を考慮した粗粒度タスク並列処理

    石坂 一久, 中野 啓史, 八木 哲志, 小幡 元樹, 笠原 博徳

    情報処理学会論文誌   43 ( 4 ) 958 - 970  2002.04  [Refereed]

     View Summary

    In multiprocessor systems, the gap between peak and effective performance has been getting larger. To cope with this performance gap, it is important to use multigrain parallelism in addition to ordinary loop level parallelism. Effective use of the memory hierarchy is also important for improving the performance of multiprocessor systems, because the speed gap between processors and memories is getting larger. This paper describes coarse grain task parallel processing that uses parallelism among macro-tasks such as loops and subroutines, with cache optimization based on a data localization scheme. The proposed scheme is implemented in the OSCAR automatic multigrain parallelizing compiler. The OSCAR compiler generates an OpenMP FORTRAN program realizing the proposed scheme from an ordinary FORTRAN77 program. Its performance is evaluated on an IBM RS6000 SP 604e High Node 8-processor SMP machine and a Sun Ultra80 4-processor SMP machine. In the evaluation, the OSCAR compiler gives up to 5.8 times speedup against the minimum execution time of the IBM XL FORTRAN compiler on IBM RS/6000 and up to 3.6 times speedup against the Sun Forte 6 update 1 compiler on Sun Ultra80.

    CiNii

  • Coarse Grain Task Parallel Processing with Cache Optimization on Shared Memory Multiprocessor

    Kazuhisa Ishizaka, Hirofumi Nakano, Satoshi Yagi, Motoki Obata, Hironori Kasahara

    Trans. of IPSJ   43 ( 4 ) 958 - 970  2002.04  [Refereed]

     View Summary

    In multiprocessor systems, the gap between peak and effective performance has been getting larger. To cope with this performance gap, it is important to use multigrain parallelism in addition to ordinary loop level parallelism. Effective use of the memory hierarchy is also important for improving the performance of multiprocessor systems, because the speed gap between processors and memories is getting larger. This paper describes coarse grain task parallel processing that uses parallelism among macro-tasks such as loops and subroutines, with cache optimization based on a data localization scheme. The proposed scheme is implemented in the OSCAR automatic multigrain parallelizing compiler. The OSCAR compiler generates an OpenMP FORTRAN program realizing the proposed scheme from an ordinary FORTRAN77 program. Its performance is evaluated on an IBM RS6000 SP 604e High Node 8-processor SMP machine and a Sun Ultra80 4-processor SMP machine. In the evaluation, the OSCAR compiler gives up to 5.8 times speedup against the minimum execution time of the IBM XL FORTRAN compiler on IBM RS/6000 and up to 3.6 times speedup against the Sun Forte 6 update 1 compiler on Sun Ultra80.

    CiNii

  • 粗粒度並列性抽出のための解析時インライニングとフレキシブルクローニング

    熊澤 慎也, 石坂 一久, 小幡 元樹, 笠原 博徳

    情報処理学会研究報告 ARC   2002 ( 22 ) 191 - 196  2002.03  [Refereed]

     View Summary

    This paper proposes an interprocedural parallelism analysis scheme that combines analysis-time inline expansion and flexible cloning for coarse-grain parallelization. The analysis-time inlining is applied to selected subroutines. After analyzing global parallelism across procedures, the compiler generates inlined code for program parts that have global parallelism, and applies "flexible cloning" to program parts without global parallelism, restoring them to subroutines of the original shape or of a different shape. With this scheme, the compiler can exploit global coarse-grain parallelism with a minimum increase in code size. Performance evaluation using the benchmark program ARC2D on a SUN Ultra80 shows that the proposed scheme gives up to 15% speedup over automatic parallelization by the SUN Forte compiler. In addition, flexible cloning reduces the increase in code size by 14.8% compared with the case in which it is not used.

    CiNii

  • 共有メモリマルチプロセッサ上でのデータローカライゼーション対象マクロタスク決定手法

    八木 哲志, 板垣 裕樹, 中野 啓史, 石坂 一久, 小幡 元樹, 吉田 明正, 笠原 博徳

    情報処理学会研究報告 ARC    2002.03  [Refereed]

    CiNii

  • An Analysis-time Procedure Inlining and Flexible Cloning Scheme for Coarse-grain Automatic Parallelizing Compilation

    Shin-ya Kumazawa, Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC   2002 ( 22 ) 191 - 196  2002.03  [Refereed]

     View Summary

    This paper proposes an interprocedural parallelism analysis scheme that combines analysis-time inline expansion and flexible cloning for coarse-grain parallelization. The analysis-time inlining is applied to selected subroutines. After analyzing global parallelism across procedures, the compiler generates inlined code for program parts that have global parallelism, and applies "flexible cloning" to program parts without global parallelism, restoring them to subroutines of the original shape or of a different shape. With this scheme, the compiler can exploit global coarse-grain parallelism with a minimum increase in code size. Performance evaluation using the benchmark program ARC2D on a SUN Ultra80 shows that the proposed scheme gives up to 15% speedup over automatic parallelization by the SUN Forte compiler. In addition, flexible cloning reduces the increase in code size by 14.8% compared with the case in which it is not used.

    CiNii

  • A Macrotask selection technique for Data-Localization Scheme on Shared-memory Multi-Processor

    Satoshi Yagi, Hiroki Itagaki, Hirofumi Nakano, Kazuhisa Ishizaka, Motoki Obata, Akimasa Yoshida, Hironori Kasahara

    Technical Report of IPSJ, ARC    2002.03  [Refereed]

  • シングルチップマルチプロセッサにおけるマルチグレイン並列処理

    内田 貴之, 木村 啓二, 小高 剛, 笠原 博徳

    情報処理学会研究報告ARC-2002-146-5   2002 ( 9 ) 13 - 18  2002.02  [Refereed]

     View Summary

    With the advances in semiconductor integration technology, efficient use of transistors on a chip and scalable performance improvement have been demanded. To satisfy this demand, much research on next generation microprocessor architectures and their software, especially compilers, has been performed. Among these next generation microprocessor architectures, a single chip multiprocessor (SCM) using multigrain parallel processing, which hierarchically exploits different levels of parallelism from the whole program, is one of the most promising. This paper evaluates the performance of SCM architectures for multigrain parallel processing, using five application programs from SPEC2000fp and SPEC95fp. The evaluation shows that a four-processor-core SCM using multigrain parallel processing gives 1.4 to 3.8 times larger speedup against a simple processor.

    CiNii

  • OSCAR型シングルチップマルチプロセッサ上でのJPEGエンコーディングプログラムのマルチグレイン並列処理

    小高 剛, 内田 貴之, 木村 啓二, 笠原 博徳

    情報処理学会研究報告ARC-2002-146-4   2002 ( 9 ) 19 - 24  2002.02  [Refereed]

     View Summary

    With the recent increase of multimedia contents using JPEG and MPEG, low cost, low power consumption and high performance processors for multimedia have been expected. In particular, single chip multiprocessor architecture with simple processor cores is attracting much attention for developing such processors. This paper describes a multigrain parallel processing scheme for a JPEG encoding program on an OSCAR type single chip multiprocessor and its performance. The evaluation shows that an OSCAR type single chip multiprocessor with four single-issue simple processor cores gave 3.59 times speed-up over sequential execution and 2.87 times speed-up over an OSCAR type single chip multiprocessor with a four-issue UltraSPARC-II type superscalar processor core.

    CiNii

  • 商用SMP上での粗粒度タスク並列処理

    小幡 元樹, 石坂 一久, 神長 浩気, 中野 啓史, 吉田 明正, 笠原 博徳

    情報処理学会研究報告ARC-2002-146-10   2002 ( 9 ) 55 - 60  2002.02  [Refereed]

     View Summary

    This paper evaluates the performance of coarse grain task parallel processing using the OSCAR Multigrain Parallelizing Compiler for five applications from the SPEC95FP and Perfect Club benchmarks on commercial SMP machines. Coarse grain task parallel processing is important for improving the effective performance of SMP machines beyond the limit of loop parallelism. In the OSCAR compiler, a One-time Single Level Thread Generation scheme using the OpenMP API and a data localization scheme are used to realize coarse grain task parallelization efficiently on various SMP machines. The evaluation shows that coarse grain parallel processing gives 60-430% larger speedup than the automatic loop parallelizing compiler for the five applications, by reducing the overheads of thread management and shared memory access, on the SMP server IBM RS6000 SP 604e High Node and the SMP workstation SUN Ultra80.

    CiNii

  • Multigrain Parallel Processing for JPEG Encoding Program on an OSCAR type Single Chip Multiprocessor

    Takeshi Kodaka, Takayuki Uchida, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-146-4   2002 ( 9 ) 19 - 24  2002.02  [Refereed]

     View Summary

    With the recent increase of multimedia contents using JPEG and MPEG, low cost, low power consumption and high performance processors for multimedia have been expected. In particular, single chip multiprocessor architecture with simple processor cores is attracting much attention for developing such processors. This paper describes a multigrain parallel processing scheme for a JPEG encoding program on an OSCAR type single chip multiprocessor and its performance. The evaluation shows that an OSCAR type single chip multiprocessor with four single-issue simple processor cores gave 3.59 times speed-up over sequential execution and 2.87 times speed-up over an OSCAR type single chip multiprocessor with a four-issue UltraSPARC-II type superscalar processor core.

    CiNii

  • Multigrain Parallel Processing on Single Chip Multiprocessor

    Takayuki Uchida, Takeshi Kodaka, Keiji Kimura, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-146-3   2002 ( 9 ) 13 - 18  2002.02  [Refereed]

     View Summary

    With the advances in semiconductor integration technology, efficient use of transistors on a chip and scalable performance improvement have been demanded. To satisfy this demand, much research on next generation microprocessor architectures and their software, especially compilers, has been performed. Among these next generation microprocessor architectures, a single chip multiprocessor (SCM) using multigrain parallel processing, which hierarchically exploits different levels of parallelism from the whole program, is one of the most promising. This paper evaluates the performance of SCM architectures for multigrain parallel processing, using five application programs from SPEC2000fp and SPEC95fp. The evaluation shows that a four-processor-core SCM using multigrain parallel processing gives 1.4 to 3.8 times larger speedup against a simple processor.

    CiNii

  • Coarse Grain Task Parallel Processing on Commercial SMPs

    Motoki Obata, Kazuhisa Ishizaka, Hiroki Kaminaga, Hirofumi Nakano, Akimasa Yoshida, Hironori Kasahara

    Technical Report of IPSJ, ARC2002-146-10   2002 ( 9 ) 55 - 60  2002.02  [Refereed]

     View Summary

    This paper evaluates the performance of coarse grain task parallel processing using the OSCAR Multigrain Parallelizing Compiler for five applications from the SPEC95FP and Perfect Club benchmarks on commercial SMP machines. Coarse grain task parallel processing is important for improving the effective performance of SMP machines beyond the limit of loop parallelism. In the OSCAR compiler, a One-time Single Level Thread Generation scheme using the OpenMP API and a data localization scheme are used to realize coarse grain task parallelization efficiently on various SMP machines. The evaluation shows that coarse grain parallel processing gives 60-430% larger speedup than the automatic loop parallelizing compiler for the five applications, by reducing the overheads of thread management and shared memory access, on the SMP server IBM RS6000 SP 604e High Node and the SMP workstation SUN Ultra80.

    CiNii

  • Static coarse grain task scheduling with cache optimization using openMP

    Hirofumi Nakano, Kazuhisa Ishizaka, Motoki Obata, Keiji Kimura, Hironori Kasahara

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   2327   479 - 489  2002  [Refereed]

     View Summary

    Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed increases. Use of multigrain parallelism is also becoming more important for improving effective performance beyond the limitation of loop iteration level parallelism. Considering these factors, this paper proposes a static scheduling scheme for coarse grain tasks that takes cache optimization into account. The proposed scheme schedules coarse grain tasks to threads so that data shared among coarse grain tasks can be passed via cache, after task and data decomposition that considers the cache size at compile time. It is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation, using Swim and Tomcatv from SPEC fp 95. As a result, the proposed scheme gives 4.56 times speedup for Swim and 2.37 times speedup for Tomcatv on 4 processors against the Sun Forte HPC 6 loop parallelizing compiler. © 2002 Springer Berlin Heidelberg.

    DOI

    Scopus

    2
    Citation
    (Scopus)
  • Multigrain parallel processing for JPEG encoding on a single chip multiprocessor

    T Kodaka, K Kimura, H Kasahara

    INTERNATIONAL WORKSHOP ON INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS     57 - 63  2002  [Refereed]

     View Summary

    With the recent increase of multimedia contents using JPEG and MPEG, low cost, low power consumption and high performance processors for multimedia application have been expected. Particularly, single chip multiprocessor architecture having simple processor cores that will attain good scalability and cost effectiveness is attracting much attention. To exploit full performance of single chip multiprocessor architecture, multigrain parallel processing, which exploits coarse grain task parallelism, loop parallelism and instruction level parallelism, is attractive. This paper describes a multigrain parallel processing scheme for the JPEG encoding on a single chip multiprocessor and its performance. The evaluation shows an OSCAR type single chip multiprocessor having four single-issue simple processor cores gave us 3.59 times speed-up against sequential execution time.

  • 自動並列化コンパイラ協調型シングルチップ・マルチプロセッサの研究

    笠原 博徳

    JEITA/EDS Fair 2002    2002.01  [Refereed]

  • Automatic Parallelizing Compiler Cooperative Single Chip Multiprocessor

    Hironori Kasahara

    JEITA/EDS Fair 2002    2002.01  [Refereed]

  • Humanoid Robots in Waseda University---Hadaly-2 and WABIAN

    S Hashimoto, S Narita, H Kasahara, K Shirai, T Kobayashi, A Takanishi, S Sugano, J Yamaguchi, H Sawada, H Takanobu, K Shibuya, T Morita, T Kurata, N Onoe, K Ouchi, T Noguchi, Y Niwa, S Nagayama, H Tabayashi, Matsui, I, M Obata, H Matsuzaki, A Murasugi, T Kobayashi, S Haruyama, T Okada, Y Hidaki, Y Taguchi, K Hoashi, E Morikawa, Y Iwano, D Araki, J Suzuki, M Yokoyama, Dawa, I, D Nishino, S Inoue, T Hirano, E Soga, S Gen, T Yanada, K Kato, S Sakamoto, Y Ishii, S Matsuo, Y Yamamoto, K Sato, T Hagiwara, T Ueda, N Honda, K Hashimoto, T Hanamoto, S Kayaba, T Kojima, H Iwata, H Kubodera, R Matsuki, T Nakajima, K Nitto, D Yamamoto, Y Kamizaki, S Nagaike, Y Kunitake, S Morita

    Autonomous Robots, Kluwer Academic Publishers   12 ( 1 ) 25 - 38  2002.01  [Refereed]

     View Summary

    This paper describes two humanoid robots developed in the Humanoid Robotics Institute, Waseda University. Hadaly-2 is intended to realize information interaction with humans by integrating environmental recognition with vision, conversation capability (voice recognition, voice synthesis), and gesture behaviors. It also possesses physical interaction functions for direct contact with humans and behaviors that are gentle and safe for humans. WABIAN is a robot with a complete human configuration that is capable of walking on two legs and carrying things as humans do. Furthermore, it has functions for information interaction suited to use at home.

  • Multigrain parallel processing for JPEG encoding on a single chip multiprocessor

    T. Kodaka, K. Kimura, H. Kasahara

    Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems   2002-   57 - 63  2002  [Refereed]

     View Summary

    With the recent increase of multimedia content using JPEG and MPEG, low cost, low power consumption and high performance processors for multimedia application are desirable. In particular, single chip multiprocessor architecture having simple processor cores that will attain good scalability and cost effectiveness is attracting much attention. To exploit full performance of single chip multiprocessor architecture, multigrain parallel processing, which exploits coarse grain task parallelism, loop parallelism and instruction level parallelism, is attractive. This paper describes a multigrain parallel processing scheme for JPEG encoding on a single chip multiprocessor and its performance. The evaluation shows that an OSCAR type single chip multiprocessor having four single-issue simple processor cores gave a 3.59 times speed-up against sequential execution time.

    DOI

    Scopus

    12
    Citation
    (Scopus)
  • Multigrain automatic parallelization in Japanese Millennium Project IT21 Advanced Parallelizing Compiler

    H Kasahara, M Obata, K Ishizaka, K Kimura, H Kaminaga, H Nakano, K Nagasawa, A Murai, H Itagaki, J Shirako

    PAR ELEC 2002: INTERNATIONAL CONFERENCE ON PARALLEL COMPUTING IN ELECTRICAL ENGINEERING     105 - 111  2002  [Refereed]

     View Summary

    This paper describes the OSCAR multigrain parallelizing compiler, which has been developed in the Japanese Millennium Project IT21 "Advanced Parallelizing Compiler" project, and its performance on SMP machines. The compiler realizes multigrain parallelization for systems ranging from chip multiprocessors to high-end servers. It hierarchically exploits coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block, in addition to loop parallelism. It also globally optimizes cache use over different loops, or coarse grain tasks, based on a data localization technique to reduce memory access overhead. The current performance of the OSCAR compiler for SPEC95fp is evaluated on different SMPs. For example, it gives 3.7 times speedup for HYDRO2D, 1.8 times for SWIM, 1.7 times for SU2COR, 2.0 times for MGRID and 3.3 times for TURB3D on an 8-processor IBM RS6000 against the XL Fortran compiler ver. 7.1, and 4.2 times speedup for SWIM and 2.2 times speedup for TURB3D on a 4-processor Sun Ultra80 workstation against Forte6 update 2.

  • キャッシュ最適化を考慮したマルチプロセッサシステム上での粗粒度タスクスタティックスケジューリング手法

    中野 啓史, 石坂 一久, 小幡 元樹, 木村 啓二, 笠原 博徳

    情報処理学会研究報告ARC-2001-140-12   2001 ( 76 ) 67 - 72  2001.08  [Refereed]

    CiNii

  • シングルチップマルチプロセッサ上でのマルチメディアアプリケーションの近細粒度並列処理

    小高 剛, 宮下 直久, 木村 啓二, 笠原 博徳

    情報処理学会研究報告ARC-2001-140-11    2001.08  [Refereed]

  • Future of Automatic Parallelizing Compiler

    H. Kasahara

    The 14th International Workshop on Languages and Compilers for Parallel Computing (LCPC'01) Panel: Future of Languages and Compilers, Kentucky    2001.08  [Refereed]

  • A Static Scheduling Scheme for Coarse Grain Tasks considering Cache Optimization on SMP

    Hirofumi Nakano, Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    IPSJ SIG Notes 2001-ARC-144-12    2001.08  [Refereed]

  • Near Fine Grain Parallel Processing on Multimedia Application for Single Chip Multiprocessor

    Takeshi Kodaka, Naohisa Miyashita, Keiji Kimura, Hironori Kasahara

    IPSJ SIG Notes 2001-ARC-144-11    2001.08  [Refereed]

  • A Data Localization Scheme for Coarse Grain Task Parallel Processing on Shared Memory Multiprocessors

    Akimasa Yoshida, Satoshi Yagi, Hironori Kasahara

    Proc. of IEEE International Workshop on Advanced Compiler Technology for High Performance and Embedded Systems     111 - 118  2001.07  [Refereed]

    CiNii

  • OSCAR Single Chip Multiprocessor and Multigrain Parallelizing Compiler

    H. Kasahara

    IEEE International Workshop on Advanced Compiler Technology for High Performance and Embedded Systems (IWACT 2001) Panel : New Architecture and Their Compilers, Romania    2001.07  [Refereed]

  • Automatic Coarse Grain Task Parallel Processing Using OSCAR Multigrain Parallelizing Compiler

    Motoki Obata, Kazuhisa Ishizaka, Hironori Kasahara

    Ninth International Workshop on Compilers for Parallel Computers(CPC 2001)     173 - 182  2001.06  [Refereed]

  • 近細粒度並列処理用シングルチップマルチプロセッサにおけるプロセッサコアの評価

    木村 啓二, 加藤 孝幸, 笠原 博徳

    情報処理学会論文誌   42 ( 4 ) 692 - 703  2001.04  [Refereed]

    CiNii

  • 共有メモリマルチプロセッサシステム上での粗粒度タスク並列処理

    笠原 博徳, 小幡 元樹, 石坂 一久

    情報処理学会論文誌   42 ( 4 )  2001.04  [Refereed]

    CiNii

  • メタスケジューリング--自動並列分散処理の試み

    小出 洋, 笠原 博徳

    bit、共立出版   33 ( 4 ) 10 - 14  2001.04  [Refereed]

    J-GLOBAL

  • Meta-scheduling -- Trial for Automatic Distributed Computing

    Hiroshi Koide, Hironori Kasahara

    bit, Kyoritsu Shuppan   33 ( 4 ) 10 - 14  2001.04  [Refereed]

  • Evaluation of Processor Core Architecture for Single Chip Multiprocessor with Near Fine Grain Parallel Processing

    Keiji Kimura, Takayuki Kato, Hironori Kasahara

    Trans. of IPSJ   42 ( 4 ) 692 - 703  2001.04  [Refereed]

  • Coarse Grain Task Parallel Processing on a Shared Memory Multiprocessor System

    Hironori Kasahara, Motoki Obata, Kazuhisa Ishizaka

    Trans. of IPSJ   42 ( 4 )  2001.04  [Refereed]

  • 資源情報サーバにおける資源情報予測の評価

    小出 洋, 山岸 信寛, 武宮 博, 笠原 博徳

    情報処理学会論文誌   42 ( SIG03 ) 65 - 73  2001.03  [Refereed]

    J-GLOBAL

  • 標準タスクグラフセットを用いたデータ転送オーバーへッドを考慮したスケジューリングアルゴリズムの性能評価

    山口 高弘, 田中 雄一, 飛田 高雄, 笠原 博徳

    情報処理学会第62回全国大会   2Q-01  2001.03  [Refereed]

  • 近細粒度並列処理に適したシングルチップマルチプロセッサのメモリアーキテクチャの評価

    松元 信介, 木村 啓二, 笠原 博徳

    情報処理学会第62回全国大会   4P-01  2001.03  [Refereed]

  • 異機種分散計算機環境におけるOSCARマルチグレイン並列化コンパイラを用いたメタスケジューリング手法

    林 拓也, 茂田 有己光, 小出 洋, 飛田 高雄, 笠原 博徳

    情報処理学会第62回全国大会   3R-01 ( 1 )  2001.03  [Refereed]

    J-GLOBAL

  • メモリ容量を考慮したプレロード・ポストストアスケジューリングアルゴリズムの評価

    田中 崇久, 舟山 洋央, 飛田 高雄, 笠原 博徳

    情報処理学会第62回全国大会   4R-03  2001.03  [Refereed]

    CiNii

  • マルチメディアアプリケーションのシングルチップマルチプロセッサ上での近細粒度並列処理

    小高 剛, 木村 啓二, 宮下 直久, 笠原 博徳

    情報処理学会第62回全国大会   3P-08  2001.03  [Refereed]

  • マルチプロセッサシステム上でのキャッシュ最適化を考慮した粗粒度タスクスタティックスケジューリング手法

    中野 啓史, 石坂 一久, 小幡 元樹, 木村 啓二, 笠原 博徳

    情報処理学会第62回全国大会   4R-02  2001.03  [Refereed]

  • マルチグレイン並列処理用シングルチップマルチプロセッサにおけるデータ転送ユニットの検討

    宮下 直久, 木村 啓二, 小高 剛, 笠原 博徳

    情報処理学会第62回全国大会   4P-02  2001.03  [Refereed]

    CiNii

  • データマイニングツールdataFORESTを用いた異機種分散計算機環境におけるプロセッサ負荷予測

    茂田 有己光, 林 拓也, 小出 洋, 鹿島 亨, 筒井 宏明, 笠原 博徳

    情報処理学会第62回全国大会   3R-02 ( 1 )  2001.03  [Refereed]

    J-GLOBAL

  • OSCARマルチグレイン並列化コンパイラとシングルチップ・マルチプロセッサ

    笠原 博徳

    京都大学大型計算機センター研究開発部第66回研究セミナー    2001.03  [Refereed]

  • OSCAR Multigrain Parallelizing Compiler and Single Chip Multiprocessor

    H. Kasahara

    Data Processing Center, Kyoto University    2001.03  [Refereed]

  • Automatic coarse grain task parallel processing on SMP using openMP

    Hironori Kasahara, Motoki Obata, Kazuhisa Ishizaka

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)   2017   189 - 207  2001  [Refereed]

     View Summary

    This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on an SMP machine. The OSCAR multigrain parallelizing compiler automatically generates parallelized code including OpenMP directives, and its performance is evaluated on a commercial SMP machine. Coarse grain task parallel processing is important for improving the effective performance of a wide range of multiprocessor systems, from a single chip multiprocessor to a high performance computer, beyond the limit of loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks, analyzes parallelism among tasks by "Earliest Executable Condition Analysis" considering control and data dependencies, statically schedules the coarse grain tasks to threads or generates dynamic task scheduling code to assign the tasks to threads, and generates OpenMP Fortran source code for an SMP machine. The thread parallel code using OpenMP generated by the OSCAR compiler forks threads only once at the beginning of the program and joins them only once at the end, even though the program is processed in parallel based on the hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on an 8-processor SMP machine, IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of the OSCAR multigrain compiler. The evaluation shows that the OSCAR compiler with IBM XL Fortran compiler version 5.1 gives 1.5 to 3 times larger speedup than the native XL Fortran compiler for SPEC 95fp SWIM, TOMCATV, HYDRO2D, MGRID and Perfect Benchmarks ARC2D.

    DOI

    Scopus

    17
    Citation
    (Scopus)
  • 特集:並列処理

    笠原 博徳

    情報処理学会論文誌   42 ( 4 ) 651 - 920  2001  [Refereed]

    CiNii

  • 共有メモリマルチプロセッサシステム上での粗粒度タスク並列実現手法の評価

    石坂 一久, 八木 哲志, 小幡 元樹, 吉田 明正, 笠原 博徳

    情報処理学会研究報告ARC-141-7    2001.01  [Refereed]

  • SMP上でのデータ依存マクロタスクグラフのデータローカライゼーション手法

    吉田 明正, 八木 哲志, 笠原 博徳

    情報処理学会研究報告ARC-141-6   2001  2001.01  [Refereed]

    CiNii

  • アドバンスト並列化コンパイラ技術研究開発の概要

    笠原 博徳

    経済産業省・NEDOミレニアムプロジェクト, 日本情報処理開発協会先端情報技術研究所    2001.01  [Refereed]

  • Evaluation of coarse grain task parallel processing on the shared memory multiprocessor system

    Kazuhisa Ishizaka, Satoshi Yagi, Motoki Obata, Akimasa Yoshida, Hironori Kasahara

    Technical Report of IPSJ, ARC-141-7    2001.01  [Refereed]

  • A Data-Localization Scheme for Macrotask-Graph with Data Dependencies on SMP

    Akimasa Yoshida, Satoshi Yagi, Hironori Kasahara

    Technical Report of IPSJ, ARC-141-6    2001.01  [Refereed]

  • Evaluation of Single Chip Multiprocessor Core Architecture with Near Fine Grain Parallel Processing

    Keiji Kimura, Hironori Kasahara

    Proc. of International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'01)    2001.01  [Refereed]

  • Overview of METI/NEDO Millennium Project 'Advanced Parallelizing Compiler'

    H. Kasahara

    Japan Information Processing Development Center Research Institute for Advanced Information Technology    2001.01  [Refereed]

    CiNii

  • OSCAR Multigrain Parallelizing Compiler and Single Chip Multiprocessor

    H. Kasahara

    University of Illinois at Urbana-Champaign, Hosted by Prof. David Padua, USA    2000.11  [Refereed]

  • Coarse-grain Task Parallel Processing using the OpenMP backend of the OSCAR Multigrain Parallelizing Compiler

    Kazuhisa Ishizaka, Hironori Kasahara, Motoki Obata

    Proc. of Third International Symposium, ISHPC 2000   1940   352 - 365  2000.10  [Refereed]

    DOI

    Scopus

    5
    Citation
    (Scopus)
  • Multigrain Parallel Processing Model for Future Single Chip Multiprocessor Systems

    H. Kasahara

    ISHPC2000, Panel "Programming Models for New Architectures"    2000.10  [Refereed]

  • Evaluation of the resource information prediction in the resource information server

    Hiroshi Koide, Nobuhiro Yamagishi, Hiroshi Takemiya, Hironori Kasahara

    Technical Report of IPSJ,PRO   42 ( SIG3(PRO10) )  2000.08  [Domestic journal]

    Authorship:Last author

    J-GLOBAL

  • OpenMPを用いた粗粒度タスク並列処理現

    石坂 一久, 小幡 元樹, 笠原 博徳

    情報処理学会研究報告ARC-139-32(SWoPP2000)    2000.08  [Refereed]

  • 近細粒度並列処理用シングルチップマルチプロセッサにおけるプロセッサコアの構成

    木村 啓二, 内田 貴之, 加藤 孝幸, 笠原 博徳

    情報処理学会研究報告ARC-139-16(SWoPP2000)     91 - 96  2000.08  [Refereed]

    CiNii

  • Coarse Grain Task Parallel Processing with OpenMP API

    Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

    Technical Report of IPSJ, ARC-139-32    2000.08  [Refereed]

  • Processor Core Architecture of Single Chip Multiprocessor for Near Fine Grain Parallel Processing

    Keiji Kimura, Takayuki Uchida, Takayuki Kato, Hironori Kasahara

    Technical Report of IPSJ, ARC-139-16    2000.08  [Refereed]

  • 標準タスクグラフセットを用いたマルチプロセッサスケジューリングアルゴリズムの性能評価

    飛田 高雄, 笠原 博徳

    情報処理学会2000年記念並列処理シンポジウム(JSPP2000)論文集     131 - 138  2000.05  [Refereed]

  • メタスケジューリングのための資源情報サーバの構築

    小出 洋, 山岸 信寛, 武宮 博, 林 拓也, 引田 雅之, 笠原 博徳

    計算工学講演会論文集   5 ( 1 ) 357 - 360  2000.05  [Refereed]

    CiNii

  • Performance Evaluation of Multiprocessor Scheduling Algorithms Using Standard Task Graph Set

    T. Tobita, H. Kasahara

    Joint Symposium on Parallel Processing 2000 (JSPP2000)     131 - 138  2000.05  [Refereed]

    CiNii

  • 配列間接アクセスを用いないコード生成法による電子回路シミュレーションの高速化

    間中 邦之, 刑部 亮, 前川 仁孝, 笠原 博徳

    情報処理学会第60回全国大会   5H-08  2000.03  [Refereed]

  • 解析時インライニングを用いたマルチグレイン自動並列化手法

    吉井 謙一郎, 松井 巌徹, 小幡 元樹, 熊澤 慎也, 笠原 博徳

    情報処理学会第60回全国大会   4J-03  2000.03  [Refereed]

  • メモリ容量を考慮したデータプレロード・マルチプロセッサスケジューリング

    増田 高史, 飛田 高雄, 舟山 洋央, 笠原博徳

    情報処理学会第60回全国大会   4J-06  2000.03  [Refereed]

  • マルチグレイン並列処理における階層的並列処理のためのプロセッサクラスタリング決定手法

    山本 正行, 山本 晃正, 小幡 元樹, 笠原 博徳

    情報処理学会第60回全国大会   4J-05   4J - 5  2000.03  [Refereed]

    CiNii

  • データ依存のみを持つ任意形状のマクロタスクグラフに対するデータローカライゼーション手法

    成清暁博, 八木哲志, 松崎秀則, 小幡元樹, 吉田明正, 笠原博徳

    情報処理学会第60回全国大会   4J-02  2000.03  [Refereed]

  • シングルチップマルチプロセッサの近細粒度並列処理に対する性能評価

    加藤 考幸, 尾形 航, 木村 啓二, 内田 貴之, 笠原 博徳

    情報処理学会第60回全国大会   4J-07  2000.03  [Refereed]

  • SMP上での有限要素・境界要素法併用法による電磁界解析アプリケーション並列処理

    金子 大作, 小幡 元樹, 若尾 真治, 小貫 天, 笠原 博徳

    情報処理学会第60回全国大会   5H-07  2000.03  [Refereed]

  • OpenMPを用いたマルチグレイン並列処理の実現

    石坂 一久, 小幡 元樹, 瀧 康太郎, 笠原 博徳

    情報処理学会第60回全国大会   4J-04  2000.03  [Refereed]

  • 配列間接アクセスを用いないコード生成法による電子回路シミュレーションの高速化とその並列処理

    間中 邦之, 刑部 亮, 前川 仁孝, 笠原 博徳

    情報処理学会ARC研究会/HPC研究会    2000.03  [Refereed]

  • マルチグレイン自動並列化のための解析時インライニング

    吉井 謙一郎, 松井 巌徹, 小幡 元樹, 熊澤 慎也, 笠原 博徳

    情報処理学会ARC研究会/HPC研究会    2000.03  [Refereed]

    CiNii

  • Performance Evaluation and Parallelize of Electronic Circuit Simulation which generate code without array indirect access

    K. Manaka, R. Osakabe, Y. Maekawa, H. Kasahara

    IPSJ ARC/HPC    2000.03  [Refereed]

  • An Analysis-time Procedure Inlining Scheme for Multi-grain Automatic Parallelizing Compilation

    K. Yoshii, G. Matsui, M. Obata, S. Kumazawa, H. Kasahara

    IPSJ ARC/HPC    2000.03  [Refereed]

  • Evaluation of the resource information prediction in the resource information server

    Hiroshi Koide, Nobuhiro Yamagishi, Hiroshi Takemiya, Hironori Kasahara

    Trans. of IPSJ: Programming   42 ( SIG03 ) 65 - 73  2000.03  [Refereed]  [Domestic journal]

    Authorship:Last author

    J-GLOBAL

  • Near fine grain parallel processing using static scheduling on single chip multiprocessors

    K Kimura, H Kasahara

    INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS     23 - 31  2000  [Refereed]

    With the increase in the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of the effective performance of a processor are becoming important problems. However, it has been thought that popular superscalar and VLIW architectures would have difficulty obtaining scalable improvement of effective performance in the future because of the limitation of instruction level parallelism. To cope with this problem, a single chip multiprocessor (SCM) approach with multi grain parallel processing inside a chip, which hierarchically exploits loop parallelism and coarse grain parallelism among subroutines, loops and basic blocks in addition to instruction level parallelism, is thought to be one of the most promising approaches. This paper evaluates the effectiveness of single chip multiprocessor architectures with a shared cache, global registers, distributed shared memory and/or local memory for near fine grain parallel processing as the first step of research on SCM architecture to support multi grain parallel processing. The evaluation shows that the OSCAR (Optimally Scheduled Advanced Multiprocessor) architecture, having distributed shared memory and local memory in addition to centralized shared memory and attachment of global registers, gives a significant speedup of 13.8% to 143.8% for four processors compared with the shared cache architecture for applications from which it has been difficult to extract parallelism effectively.

  • データ依存のみを持つマクロタスクグラフに対するデータローカライゼーション手法

    成清 暁博, 松崎 秀則, 小幡 元樹, 吉田 明正, 笠原 博徳

    情報処理学会ARC136-8研究会     43 - 48  2000.01  [Refereed]

  • A Data-Localization Scheme for Macrotask-Graphs with Data Dependencies

    A. Narikiyo, H. Matsuzaki, M. Obata, A. Yoshida, H. Kasahara

    Technical Report of IPSJ, ARC-136-8   2000 ( 1 ) 43 - 48  2000.01  [Refereed]

    This paper proposes a data-localization scheme for a part with data dependence edges in any kind of macrotask-graph in hierarchical coarse grain parallel processing. First, multiple loops having data dependences are decomposed into data-localization-groups in each macrotask-graph layer. Next, the compiler generates a hierarchical dynamic scheduling routine with partial static task assignment, which assigns macrotasks inside a data-localization-group to the same processor or processor-cluster in each layer, so that shared data can be transferred via local memory. This data-localization scheme can be applied to a part or the whole of a macrotask-graph which has only data dependence edges. It also handles loops whose lower and upper limits are given by variables. As a result, most array data is transferred via local memory. Finally, this paper describes the performance evaluation on the multiprocessor system OSCAR. The evaluation shows that hierarchical coarse grain parallel processing with data-localization can reduce execution time by about 20% compared with hierarchical coarse grain parallel processing without data-localization.

    CiNii

  • Performance evaluation of minimum execution time multiprocessor scheduling algorithms using standard task graph set

    T Tobita, M Kouda, H Kasahara

    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V   43 ( 4 ) 745 - 751  2000  [Refereed]

    This paper evaluates the performance of heuristic algorithms such as CP (Critical Path) and CP/MISF (Critical Path/Most Immediate Successors First), the practical sequential optimization algorithm DF/IHS (Depth First/Implicit Heuristic Search) and the practical parallel optimization algorithm PDF/IHS (Parallelized DF/IHS) using a "Standard Task Graph Set" for evaluation of multiprocessor scheduling algorithms. The Standard Task Graph Set has been developed to allow researchers worldwide to evaluate multiprocessor scheduling algorithms fairly under the same evaluation conditions. It includes random task graphs generated by several generation methods that were used in previous papers published by many research groups. Performance evaluation shows that PDF/IHS gives optimal solutions for 96.06% of the 660 tested task graphs with 50 to 1900 tasks by using 6 parallel processors within 600 seconds of wall-clock time, and that the heuristic algorithms give optimal solutions for about 75% of the tested graphs.

  • マルチグレイン並列化FORTRANコンパイラ

    岡本 雅巳, 小幡 元樹, 松井 巌徹, 松崎 秀則, 笠原 博徳, 成田 誠之助

    情報処理学会論文誌   40 ( 12 ) 4296 - 4308  1999.12  [Refereed]

    This paper describes a FORTRAN multi-grain parallelizing compiler. The multi-grain parallelizing compiler improves the effective performance and ease of use of multiprocessor systems from single-chip multiprocessors to supercomputers. The multi-grain parallelizing scheme realizes effective parallel processing over the whole program by hierarchically applying coarse grain parallelization among subroutines, loops and basic blocks, and fine grain parallelization among statements or instructions, in addition to conventional loop parallelization.

    CiNii

  • Multi-grain Parallelizing FORTRAN Compiler

    M. Okamoto, M. Obata, G. Matsui, H. Matsuzaki, H. Kasahara, S. Narita

    Trans. of IPSJ   40 ( 12 ) 4296 - 4308  1999.12  [Refereed]

    This paper describes a FORTRAN multi-grain parallelizing compiler. The multi-grain parallelizing compiler improves the effective performance and ease of use of multiprocessor systems from single-chip multiprocessors to supercomputers. The multi-grain parallelizing scheme realizes effective parallel processing over the whole program by hierarchically applying coarse grain parallelization among subroutines, loops and basic blocks, and fine grain parallelization among statements or instructions, in addition to conventional loop parallelization.

    CiNii

  • マルチグレイン並列化コンパイラのメモリアクセスアナライザ

    岩井 啓輔, 小幡 元樹, 木村 啓二, 天野 英晴, 笠原 博徳

    電子情報通信学会技術報告CPSY99-62   99 ( 252 ) 1 - 8  1999.08  [Refereed]

  • シングルチップマルチプロセッサ上での近細粒度並列処理の性能評価

    木村 啓二, 間中 邦之, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会研究報告ARC134-4     19 - 24  1999.08  [Refereed]

  • Performance Evaluation of Near Fine Grain Parallel Processing on the Single Chip Multiprocessor

    K. Kimura, K. Manaka, W. Ogata, M. Okamoto, H. Kasahara

    Technical Report of IPSJ, ARC-134-5     19 - 24  1999.08  [Refereed]

  • Memory access analyzer for a Multi-grain parallel processing

    K. Iwai, M. Obata, K. Kimura, H. Amano, H. Kasahara

    Technical Report of IEICE,CPSY99   99 ( 252 ) 1 - 8  1999.08  [Refereed]

  • An Automatic Coarse Grain Parallel Processing Scheme Using Multiprocessor Scheduling Algorithms Considering Overlap of Task Execution and Data Transfer

    H. Kasahara, M. Kogou, T. Tobita, T. Masuda, T. Tanaka

    Proc. SCI99 and ISAS99   9   82 - 89  1999.08  [Refereed]

    CiNii

  • Meta-scheduling for a Cluster of Supercomputers

    H. Koide, T. Hirayama, A. Murasugi, T. Hayashi, H. Kasahara

    Proc. ICS99 Workshop     63 - 69  1999.06  [Refereed]

    CiNii

  • A Standard Task Graph Set for Fair Evaluation of Multiprocessor Scheduling Algorithms

    T. Tobita, H. Kasahara

    Proc. ICS99 Workshop     71 - 77  1999.06  [Refereed]

  • 階層型粗粒度並列処理における同一階層内ループ間データローカライゼーション手法

    吉田 明正, 越塚 健一, 岡本 雅巳, 笠原 博徳

    情報処理学会論文誌   40 ( 5 ) 2054 - 2063  1999.05  [Refereed]

    This paper proposes a data-localization scheme for hierarchical macro-dataflow processing, which hierarchically exploits coarse-grain parallelism. The proposed data-localization scheme consists of three parts: (1) hierarchical loop aligned decomposition, which decomposes multiple loops having data dependences into data-localization-groups in each layer; (2) generation of a hierarchical dynamic scheduling routine with partial static task assignment, which assigns macrotasks inside a data-localization-group to the same processor-cluster in each layer; (3) generation of data transfer code via local memory inside a data-localization-group. Performance evaluation on the multiprocessor system OSCAR shows that hierarchical macro-dataflow processing with data-localization can reduce execution time by 10-20% compared with hierarchical macro-dataflow processing without data-localization.

    CiNii

  • シングルチップマルチプロセッサ上での近細粒度並列処理

    木村 啓二, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会論文誌   40 ( 5 ) 1924 - 1934  1999.05  [Refereed]

  • 並列分散科学技術計算の支援環境─SSP─

    武宮 博, 太田 浩史, 今村 俊幸, 小出 洋, 松田 勝之, 樋口 健二, 平山 俊雄, 笠原 博徳

    計算工学講演会論文集   4 ( 1 ) 333 - 336  1999.05  [Refereed]

    CiNii

  • Near Fine Grain Parallel Processing on Single Chip Multiprocessors

    K. Kimura, W. Ogata, M. Okamoto, H. Kasahara

    Trans. of IPSJ   40 ( 5 ) 1924 - 1934  1999.05  [Refereed]

    CiNii

  • A Data-Localization Scheme among Loops for each Layer in Hierarchical Coarse Grain Parallel Processing

    A.Yoshida, K. Koshizuka, M. Okamoto, H. Kasahara

    Trans. of IPSJ   40 ( 5 ) 2054 - 2063  1999.05  [Refereed]

    This paper proposes a data-localization scheme for hierarchical macro-dataflow processing, which hierarchically exploits coarse-grain parallelism. The proposed data-localization scheme consists of three parts: (1) hierarchical loop aligned decomposition, which decomposes multiple loops having data dependences into data-localization-groups in each layer; (2) generation of a hierarchical dynamic scheduling routine with partial static task assignment, which assigns macrotasks inside a data-localization-group to the same processor-cluster in each layer; (3) generation of data transfer code via local memory inside a data-localization-group. Performance evaluation on the multiprocessor system OSCAR shows that hierarchical macro-dataflow processing with data-localization can reduce execution time by 10-20% compared with hierarchical macro-dataflow processing without data-localization.

    CiNii

  • 処理とデータ転送のオーバーラップのための自動並列化手法

    古郷 誠, 田中 崇久, 藤本 謙作, 岡本 雅巳, 笠原 博徳

    情報処理学会第58回全国大会   3H-06  1999.03  [Refereed]

  • 最早実行可能条件解析を用いたキャッシュ最適化手法

    稲石 大祐, 木村 啓二, 藤本 謙作, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会第58回全国大会   3H-07  1999.03  [Refereed]

  • マルチグレイン並列処理におけるサブルーチンを含むデータローカライゼーション手法

    宇治川 泰史, 成清 暁博, 小幡 元樹, 吉田 明正, 岡本 雅巳, 笠原 博徳

    情報処理学会第58回全国大会   2D-05  1999.03  [Refereed]

  • OSCARマルチグレイン並列化コンパイラを用いたスーパーコンピュータクラスタのためのメタ・スケジューリング手法

    村杉 明夫, 林 拓也, 飛田 高雄, 小出 洋, 笠原 博徳

    情報処理学会第58回全国大会   2D-06  1999.03  [Refereed]

  • OSCARマルチグレイン並列化コンパイラにおける階層的並列処理手法

    山本 晃正, 稲石 大祐, 宇治川 泰史, 小幡 元樹, 岡本 雅巳, 笠原 博徳

    情報処理学会第58回全国大会   2D-04   2D - 4  1999.03  [Refereed]

    CiNii

  • Near fine grain parallel processing using static scheduling on single chip multiprocessors

    Keiji Kimura, Hironori Kasahara

    Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems   1999-   23 - 31  1999  [Refereed]

    With the increase in the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of the effective performance of a processor are becoming important problems. However, it has been thought that popular superscalar and VLIW architectures would have difficulty obtaining scalable improvement of effective performance in the future because of the limitation of instruction level parallelism. To cope with this problem, a single chip multiprocessor (SCM) approach with multi grain parallel processing inside a chip, which hierarchically exploits loop parallelism and coarse grain parallelism among subroutines, loops and basic blocks in addition to instruction level parallelism, is thought to be one of the most promising approaches. This paper evaluates the effectiveness of single chip multiprocessor architectures with a shared cache, global registers, distributed shared memory and/or local memory for near fine grain parallel processing as the first step of research on SCM architecture to support multi grain parallel processing. The evaluation shows that the OSCAR (Optimally Scheduled Advanced Multiprocessor) architecture, having distributed shared memory and local memory in addition to centralized shared memory and attachment of global registers, gives a significant speedup of 13.8% to 143.8% for four processors compared with the shared cache architecture for applications from which it has been difficult to extract parallelism effectively.

    DOI

    Scopus

    7
    Citation
    (Scopus)
  • Job Scheduling Scheme for Pure Space Sharing among Rigid Jobs

    K. Aida, H. Kasahara, S. Narita

    Proc. 4th Workshop on Job Scheduling Strategies for Parallel Processing     98 - 121  1998.12  [Refereed]

  • OSCAR Scalable Multigrain Parallelizing Compiler for Single Chip Multiprocessors to A Cluster of Supercomputers

    H. Kasahara

    Hosted by Prof. David Padua, University of Illinois at Urbana-Champaign    1998.11  [Refereed]

  • 最早実行可能条件解析を用いたキャッシュ利用の最適化

    稲石 大祐, 木村 啓二, 藤本 謙作, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会研究報告ARC130-6   1998 ( 70 ) 31 - 36  1998.08  [Refereed]

    Cache optimizations by a compiler for a single processor machine have mainly been applied to a single nested loop. In contrast, this paper proposes a cache optimization scheme using earliest executable condition analysis for FORTRAN programs on a single processor system. The OSCAR FORTRAN multi-grain automatic parallelizing compiler decomposes a FORTRAN program into three types of macrotasks (MTs), namely loops, subroutines and basic blocks, analyzes the earliest executable condition of each MT to extract coarse grain parallelism among MTs, and generates a macrotask graph (MTG). The MTG represents data dependence and extended control dependence among MTs, together with information on data shared among MTs. By using this MTG, the compiler realizes global code motion to use the cache effectively. The code motion technique moves an MT that accesses data accessed by a preceding MT on the MTG to immediately after the preceding MT, to increase the cache hit rate. This optimization is realized using the OSCAR multi-grain compiler as a preprocessor that outputs optimized sequential FORTRAN code. A performance evaluation shows about 62% speedup compared with the original program on a 167MHz UltraSPARC.

    CiNii

  • シングルチップマルチプロセッサ上でのマルチグレイン並列処理

    木村 啓二, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会研究報告ARC130-5    1998.08  [Refereed]

  • OSCAR FORTRAN Compilerを用いたマルチグレイン並列性の評価

    小幡 元樹, 松井 巌徹, 松崎 秀則, 木村 啓二, 稲石 大祐, 宇治川 泰史, 山本 晃正, 岡本 雅巳, 笠原 博徳

    情報処理学会研究報告ARC130-3     13 - 18  1998.08  [Refereed]

    CiNii

  • Multigrain parallel Processing on the Single Chip Multiprocessor

    K. Kimura, W. Ogata, M. Okamoto, H. Kasahara

    Technical Report of IPSJ,ARC-130-5    1998.08  [Refereed]

  • A Cache Optimization with Earliest Executable Condition Analysis

    D. Inaishi, K. Kimura, K. Fujimoto, W. Ogata, M. Okamoto, H. Kasahara

    Technical Report of IPSJ, ARC-130-6   1998 ( 70 ) 31 - 36  1998.08  [Refereed]

    Cache optimizations by a compiler for a single processor machine have mainly been applied to a single nested loop. In contrast, this paper proposes a cache optimization scheme using earliest executable condition analysis for FORTRAN programs on a single processor system. The OSCAR FORTRAN multi-grain automatic parallelizing compiler decomposes a FORTRAN program into three types of macrotasks (MTs), namely loops, subroutines and basic blocks, analyzes the earliest executable condition of each MT to extract coarse grain parallelism among MTs, and generates a macrotask graph (MTG). The MTG represents data dependence and extended control dependence among MTs, together with information on data shared among MTs. By using this MTG, the compiler realizes global code motion to use the cache effectively. The code motion technique moves an MT that accesses data accessed by a preceding MT on the MTG to immediately after the preceding MT, to increase the cache hit rate. This optimization is realized using the OSCAR multi-grain compiler as a preprocessor that outputs optimized sequential FORTRAN code. A performance evaluation shows about 62% speedup compared with the original program on a 167MHz UltraSPARC.

    CiNii

  • Evaluation of Multigrain Parallelism using OSCAR FORTRAN Compiler

    M. Obata, G. Matsui, H. Matsuzaki, K. Kimura, D. Inaishi, Y. Ujigawa, T. Yamamoto, M. Okamoto, H. Kasahara

    Technical Report of IPSJ, ARC-130-3    1998.08  [Refereed]

  • Job Scheduling Scheme for Pure Space Sharing among Rigid Jobs

    K. Aida, H. Kasahara, S. Narita

    Lecture Notes in Computer Science   1459, Springer   33 - 45  1998.08  [Refereed]

    CiNii

  • 実用的並列最適化マルチプロセッサスケジューリングアルゴリズム PDF/IHS の大規模問題への適用と性能評価

    飛田 高雄, 笠原 博徳

    情報処理学会並列処理シンポジウムJSPP '98論文集     31 - 37  1998.06  [Refereed]

  • 階層型マクロデータフロー処理における同一階層内ループ間データローカライゼーション手法

    吉田 明正, 越塚 健一, 岡本 雅巳, 小幡 元樹, 笠原 博徳

    情報処理学会並列処理シンポジウムJSPP '98論文集     375 - 382  1998.06  [Refereed]

  • Data-Localization among Doall and Sequential Loops in Coarse Grain Parallel Processing

    A. YOSHIDA, Y. UJIGAWA, M. OBATA, K. KIMURA, H. KASAHARA

    Seventh Workshop on Compilers for Parallel Computers, Linkoping, Sweden     266 - 277  1998.06  [Refereed]

  • Application and Evaluation of a Practical Parallel Optimization Algorithm PDF/IHS (Parallelized Depth First / Implicit Heuristic Search) to Large Scale Problems

    T. Tobita, H. Kasahara

    Joint Symposium on Parallel Processing (JSPP'98)     31 - 37  1998.06  [Refereed]

  • A Data-Localization Scheme among Loops inside the Same Layer of Hierarchical Macro-Dataflow Processing

    A. Yoshida, K. Koshizuka, M. Okamoto, M. Obata, H. Kasahara

    Joint Symposium on Parallel Processing (JSPP'98)     375 - 382  1998.06  [Refereed]

  • 並列分散科学技術計算環境STA(4)─異機種並列計算機の統合利用環境の構築

    今村 俊幸, 太田 浩史, 川崎 啄治, 小出 洋, 武宮 博, 樋口 健二, 久野 章則, 笠原 博徳, 相川裕史

    計算工学講演会論文集   3  1998.05  [Refereed]

    CiNii

  • 並列分散科学技術計算環境STA(3)─異機種並列計算機間通信ライブラリの構築

    小出 洋, 今村 俊幸, 太田 浩史, 川崎 啄治, 武宮 博, 樋口 健二, 笠原 博徳, 相川裕史

    計算工学講演会論文集   3  1998.05  [Refereed]

  • 並列分散科学技術計算環境STA(2)─エディタを中心に統合された並列プログラム開発環境PPDEの構築

    太田 浩史, 今村 俊幸, 川崎 啄治, 小出 洋, 武宮 博, 樋口 健二, 笠原 博徳, 相川裕史

    計算工学講演会論文集   3  1998.05  [Refereed]

  • 並列分散科学技術計算環境STA(1)─目的及び概要

    武宮 博, 今村 俊幸, 太田 浩史, 川崎 琢治, 小出 洋, 笠原 博徳, 相川 裕史

    計算工学講演会論文集   3  1998.05  [Refereed]

    CiNii

  • A data-localization compilation scheme using partial-static task assignment for Fortran coarse-grain parallel processing

    H Kasahara, A Yoshida

    PARALLEL COMPUTING   24 ( 3-4 ) 579 - 596  1998.05  [Refereed]

    This paper proposes a compilation scheme for data localization using partial-static task assignment for Fortran coarse-grain parallel processing, or macro-dataflow processing, on a multiprocessor system with local memories and centralized shared memory. Data localization allows effective use of local memories and reduces data transfer overhead under a dynamic task-scheduling environment. The proposed compilation scheme mainly consists of the following three parts: (1) loop-aligned decomposition, which decomposes each of the loops having data dependence among them into smaller loops, and groups the decomposed loops into data-localizable groups so that shared data among the decomposed loops inside each group can be passed via local memory and data transfer overhead among the groups is minimized; (2) partial static task assignment, which informs the dynamic scheduling routine generator in the macro-dataflow compiler that the decomposed loops inside each data-localizable group are to be assigned to the same processor; (3) parallel machine code generation, which generates parallel machine code to pass shared data inside the group through local memory and transfer data among groups through centralized shared memory. This compilation scheme has been implemented for a multiprocessor system, OSCAR (Optimally SCheduled Advanced multiprocessoR), having centralized shared memory and distributed shared memory, in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme can reduce the execution time by 20%, on average, compared with macro-dataflow processing without data localization.

  • A Multigrain Parallelizing Compiler and Its Architectural Support

    H. Kasahara, W. Ogata, K. Kimura, M. Obata, T. Tobita, D. Inaishi

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD98-10, CPSY98-10, FTS98-10)   98 ( 22 ) 71 - 76  1998.04

    Expanding the world market for supercomputers is becoming difficult because their cost performance in terms of real effective performance is not excellent and because parallel tuning requires a high level of experience. In addition, in general-purpose microprocessors, the limits on extracting the instruction-level parallelism used by superscalar and VLIW architectures are becoming clear. This paper describes a multigrain compilation technology and its architectural support as an approach to coping with these difficulties and developing user-friendly supercomputers and single-chip multiprocessors with excellent cost performance.

    CiNii

  • 電磁界解析における有限要素・境界要素併用法の並列処理手法

    小幡 元樹, 前川 仁孝, 若尾 真治, 小貫 天, 笠原 博徳

    電気学会論文誌 A (基礎・材料・共通部門誌)   118-A ( 4 ) 377 - 379  1998.04  [Refereed]

    CiNii

  • マルチグレイン並列化コンパイラとそのアーキテクチャ支援

    笠原 博徳, 尾形 航, 木村 啓二, 小幡 元樹, 飛田 高雄, 稲石 大祐

    社団法人 電子情報通信学会, 信学技報, ICD98-10, CPSY98-10, FTS98-10   98 ( 22 ) 71 - 76  1998.04  [Refereed]

    Expanding the world market for supercomputers is becoming difficult because their cost performance in terms of real effective performance is not excellent and because parallel tuning requires a high level of experience. In addition, in general-purpose microprocessors, the limits on extracting the instruction-level parallelism used by superscalar and VLIW architectures are becoming clear. This paper describes a multigrain compilation technology and its architectural support as an approach to coping with these difficulties and developing user-friendly supercomputers and single-chip multiprocessors with excellent cost performance.

    CiNii

  • マルチグレイン並列化コンパイラとそのアーキテクチャ支援

    笠原 博徳

    社団法人 電子情報通信学会, 信学技報, ICD98-10, CPSY98-10, FTS98-10   98 ( 22 ) 71 - 76  1998.04  [Refereed]

    Expanding the world market for supercomputers is becoming difficult because their cost performance in terms of real effective performance is not excellent and because parallel tuning requires a high level of experience. In addition, in general-purpose microprocessors, the limits on extracting the instruction-level parallelism used by superscalar and VLIW architectures are becoming clear. This paper describes a multigrain compilation technology and its architectural support as an approach to coping with these difficulties and developing user-friendly supercomputers and single-chip multiprocessors with excellent cost performance.

    CiNii

  • Parallel Processing of Hybrid Finite Element and Boundary Element Method for Electro-magnetic Field Analysis

    M. Obata, Y. Maekawa, S. Wakao, T. Onuki, H. Kasahara

    Trans.IEE of Japan   118-A ( 4 ) 377 - 379  1998.04  [Refereed]

    CiNii

  • A Multigrain Parallelizing Compiler and Its Architectural Support

    H. Kasahara, W. Ogata, K. Kimura, M. Obata, T. Tobita, D. Inaishi

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD98-10, CPSY98-10, FTS98-10)   98 ( 22 ) 71 - 76  1998.04  [Refereed]

    Expanding the world market for supercomputers is becoming difficult because their cost performance in terms of real effective performance is not excellent and because parallel tuning requires a high level of experience. In addition, in general-purpose microprocessors, the limits on extracting the instruction-level parallelism used by superscalar and VLIW architectures are becoming clear. This paper describes a multigrain compilation technology and its architectural support as an approach to coping with these difficulties and developing user-friendly supercomputers and single-chip multiprocessors with excellent cost performance.

    CiNii

  • A Multigrain Parallelizing Compiler and Its Architectural Support

    H. Kasahara

    THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, TECHNICAL REPORT OF IEICE. (ICD98-10, CPSY98-10, FTS98-10)    1998.04  [Refereed]

  • Implementation of FPGA Based Architecture Test Bed For Multi Processor System

    W. Ogata, T. Yamamoto, M. Mizuno, K. Kimura, H. Kasahara

    IPSJ SIG Notes, 98-ARC-128-14    1998.03

  • 科学技術計算プログラムにおけるマルチグレイン並列性の評価

    小幡 元樹, 松井 巌徹, 松崎 秀則, 木村 啓二, 稲石 大裕, 宇治川 泰史, 山本 晃正, 岡本 雅巳, 笠原 博徳

    情報処理学会第56回全国大会   2E-07  1998.03  [Refereed]

  • 一般的なマクロタスクグラフに対するループ間データローカライゼーション手法

    松崎秀則, 吉田明正, 岡本雅巳, 松井巌徹, 小幡元樹, 宇治川泰史, 笠原博徳

    情報処理学会第56回全国大会   2E-05  1998.03  [Refereed]

  • 異機種並列分散コンピューティングのためのメタ・スケジューリングの構想

    小出 洋, 武宮 博, 今村 俊幸, 太田 浩史, 川崎 琢治, 樋口 健二, 笠原 博徳, 相川 裕史

    情報処理学会第56回全国大会   2J-10  1998.03  [Refereed]

    CiNii

  • マルチグレイン並列処理用シングルチップマルチプロセッサアーキテクチャ

    木村 啓二, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会第56回全国大会   1N-03  1998.03  [Refereed]

  • マルチグレイン並列処理におけるインタープロシージャ解析

    松井 巌徹, 岡本 雅巳, 松崎 秀則, 小幡 元樹, 吉井 謙一郎, 笠原 博徳

    情報処理学会第56回全国大会   2E-04  1998.03  [Refereed]

  • マクロタスク最早実行可能条件解析を用いたキャッシュ最適化手法

    稲石 大祐, 木村 啓二, 尾形 航, 岡本 雅巳, 笠原 博徳

    情報処理学会第56回全国大会   2E-06   303 - 304  1998.03  [Refereed]

    CiNii

  • FPGAを用いたマルチプロセッサシステムテストベッドの実装

    尾形 航, 山本 泰平, 水尾 学, 木村 啓二, 笠原 博徳

    情報処理学会, ARC研究会,98-ARC-128-14    1998.03  [Refereed]

  • Job Scheduling Scheme for Pure Space Sharing among Rigid Jobs

    K. Aida, H. Kasahara, S. Narita

    Proc. 4th Workshop on Job Scheduling Strategies for Parallel Processing     98 - 121  1998.03  [Refereed]

  • Implementation of FPGA Based Architecture Test Bed For Multi Processor System

    W. Ogata, T. Yamamoto, M. Mizuno, K. Kimura, H. Kasahara

    IPSJ SIG Notes, 98-ARC-128-14    1998.03  [Refereed]

  • OSCAR multi-grain architecture and its evaluation

    H Kasahara, W Ogata, K Kimura, G Matsui, H Matsuzaki, M Okamoto, A Yoshida, H Honda

    INNOVATIVE ARCHITECTURE FOR FUTURE GENERATION HIGH-PERFORMANCE PROCESSORS AND SYSTEMS, PROCEEDINGS     106 - 115  1998  [Refereed]

    OSCAR (Optimally Scheduled Advanced Multiprocessor) was designed to efficiently realize multi-grain parallel processing using static and dynamic scheduling. It is a shared-memory multiprocessor system with centralized and distributed shared memories in addition to local memory on each processor, and with a data transfer controller for overlapping data transfer and task processing. Its Fortran multi-grain compiler hierarchically exploits coarse-grain parallelism among loops, subroutines and basic blocks, conventional medium-grain parallelism among loop iterations in a Doall loop, and near-fine-grain parallelism among statements. In coarse-grain parallel processing, data localization (automatic data distribution) has been employed to minimize data transfer overhead. In near-fine-grain processing of a basic block, explicit synchronization can be removed by using a clock-level accurate code scheduling technique with architectural support. This paper describes OSCAR's architecture, its compiler, and its performance for multi-grain parallel processing. OSCAR's architecture and compilation technology will become more important in future high-performance computers and single-chip multiprocessors.

  • Performance Evaluation of a Practical Parallel Optimization Multiprocessor Scheduling Algorithm PDF/IHS

    T. Tobita, H. Kasahara

    IPSJ SIG Notes   97 ( 113 ) 13 - 18  1997.11

  • 実用的並列最適化マルチプロセッサスケジューリングアルゴリズムPDF/IHSの性能評価

    飛田 高雄, 笠原 博徳

    情報処理学会研究報告   97 ( 113 ) 13 - 18  1997.11  [Refereed]

  • ヒューマンノイド-人間形高度情報処理ロボット-

    橋本 周司, 成田 誠之助, 白井 克彦, 小林 哲則, 高西 淳夫, 菅野 重樹, 笠原 博徳

    情報処理   38 ( 11 ) 959 - 969  1997.11  [Refereed]

    CiNii

  • Performance Evaluation of a Practical Parallel Optimization Multiprocessor Scheduling Algorithm PDF/IHS

    T. Tobita, H. Kasahara

    IPSJ SIG Notes   97 ( 113 ) 13 - 18  1997.11  [Refereed]

  • Humanoid - Intelligent Anthropomorphic Robot

    S. Hashimoto, S. Narita, K. Shirai, T. Kobayashi, A. Takanishi, S. Sugano, H. Kasahara

    IPSJ MAGAZINE   38 ( 11 ) 959 - 969  1997.11  [Refereed]

    CiNii

  • 21世紀へ向けたHPCにおける日本-EU技術移転と協力

    笠原 博徳

    教育・科学技術に関する日本・EU協力会議ラウンドテーブル論文集, United Nations University    1997.09  [Refereed]

  • Technology Transfer and Cooperation in HPC Toward the 21st Century Between Japan and EU

    H. Kasahara

    Conference on EU-Japan Co-operation in Education, Science and Technology: Round Table on Science and Technology    1997.09  [Refereed]

  • Parallel Processing of Hybrid Finite Element and Boundary Element Method for Electro-magnetic field analysis

    M. Obata, Y. Maekawa, S. Wakao, T. Onuki, H. Kasahara

    IPSJ SIG Notes, 97-HPC-67-3    1997.08

  • Multi-processor system for Multi-grain Parallel Processing

    K. Iwai, T. Fujiwara, T. Morimura, H. Amano, K. Kimura, W. Ogata, H. Kasahara

    Technical Report of IEICE, CPSY97-46    1997.08

  • A Macro Task Dynamic Scheduling Algorithm with Overlapping of Task Processing and Data Transfer

    K. Kimura, S. Hashimoto, M. Kogou, W. Ogata, H. Kasahara

    Technical Report of IEICE, CPSY97-40    1997.08

  • Evaluation of a Practical Parallel Optimization Algorithm for the Minimum Execution-Time Multiprocessor Scheduling Problem

    T. Tobita, H. Kasahara

    Technical Report of IEICE, CPSY97-39    1997.08

  • Data-Localization for Fortran Hierarchical Macro-Dataflow Processing

    A. Yoshida, K. Koshizuka, M. Okamoto, H. Kasahara

    IPSJ SIG Notes,97-ARC-125-2   1997 ( 76 ) 7 - 12  1997.08

    This paper proposes a data-localization scheme for Fortran hierarchical macro-dataflow processing, which hierarchically exploits coarse-grain parallelism. The proposed data-localization scheme consists of three parts: (1) loop-aligned decomposition, which decomposes multiple loops having data dependences into data-localization groups; (2) generation of a dynamic scheduling routine with partial static task assignment, which assigns the macrotasks inside a data-localization group to the same processor; (3) generation of code that transfers data via local memory inside a data-localization group. Performance evaluations on the OSCAR multiprocessor system show that hierarchical macro-dataflow processing with data localization can reduce execution time by 10%-20% compared with hierarchical macro-dataflow processing without data localization.

    CiNii

  • 処理とデータ転送のオーバーラッピングを考慮したダイナミックスケジューリングアルゴリズム

    木村 啓二, 橋本 茂, 古郷 誠, 尾形 航, 笠原 博徳

    電子情報通信学会研究報告、CPSY97-40    1997.08  [Refereed]

  • 実行時間最小マルチプロセッサスケジューリング問題に対する実用的並列最適化アルゴリズムの性能評価

    飛田 高雄, 笠原 博徳

    電子情報通信学会研究報告、CPSY97-39    1997.08  [Refereed]

  • マルチグレイン並列処理用マルチプロセッサシステム

    岩井 啓輔, 藤原 崇, 森村 知弘, 天野 英晴, 木村 啓二, 尾形 航, 笠原 博徳

    電子情報通信学会研究報告, CPSY97-46    1997.08  [Refereed]

    CiNii

  • 電磁界解析における有限要素・境界要素併用法の並列処理

    小幡 元樹, 前川 仁孝, 若尾 真治, 小貫 天, 笠原 博徳

    電気学会電子・情報・システム部門大会講演論文集     549 - 554  1997.08  [Refereed]

  • Fortran階層型マクロデータフロー処理におけるデータローカライゼーション

    吉田 明正, 越塚 健一, 岡本 雅巳, 笠原 博徳

    情報処理学会研究会報告、97-ARC-125-2    1997.08  [Refereed]

  • 電磁界解析における有限要素・境界要素併用法の並列処理手法

    小幡 元樹, 前川 仁孝, 若尾 真治, 小貫 天, 笠原 博徳

    情報処理学会研究会報告, 97-HPC-67-3   1997 ( 75 ) 13 - 18  1997.08  [Refereed]