Updated on 2025/05/09


 
SHA, Xingan
 
Affiliation
Faculty of Science and Engineering, School of Fundamental Science and Engineering
Job title
Research Associate
 

Internal Special Research Projects

  • Accelerating Stable Diffusion: A Low-Latency and Energy-Efficient FPGA Solution

    2024  


    This research will produce one of the first FPGA-based accelerators tailored for the Stable Diffusion model, a state-of-the-art generative AI model widely used for text-to-image synthesis. The key expected outcomes are as follows:

    1. FPGA Prototype
    An FPGA implementation capable of running Stable Diffusion inference efficiently is being developed. The design uses a unified computation architecture that supports convolution, transposed convolution, and matrix multiplication on the same hardware. Compared with prior work, this approach enables better resource utilization and higher performance under limited FPGA capacity.

    2. Latency and Power Optimization
    The accelerator introduces two techniques: a zero-padding skipping mechanism that avoids meaningless multiply-and-accumulate computations, and an optimized on-chip memory hierarchy that minimizes off-chip DRAM access, the most energy-consuming operation. Together they are expected to significantly increase processing speed and reduce energy use, with the aim of achieving the best balance between inference latency and power consumption.

    3. Benchmarking and Comparison
    Comprehensive benchmarking will be conducted against high-end GPUs such as the RTX 4080, focusing on energy efficiency, defined as frames per second divided by power consumption. Estimated results for the FPGA implementation under development indicate the potential of this design in terms of cost and energy efficiency. The results will highlight both the advantages and the limitations of FPGAs in running large-scale diffusion models.

    4. Dissemination and Open Source
    The research will lead to one publication in a hardware or embedded systems conference or journal. To support reproducibility and encourage future work, the FPGA design and evaluation scripts will be released as open source on GitHub.

    5. Broader Impacts
    This work will demonstrate the feasibility of energy-efficient generative AI on cost-effective hardware, making models like Stable Diffusion more accessible in low-resource environments and on edge devices. This contributes to the democratization of generative AI, enabling more people to access and benefit from cutting-edge AI technologies.
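
The summary does not describe how the zero-padding skipping mechanism is implemented, so the following is only a minimal software sketch of the general idea, assuming it targets the structural zeros that a transposed convolution inserts between and around its inputs. The function names and the 1-D setting are hypothetical and chosen for illustration; they are not taken from the project. The sketch compares a zero-insertion formulation, which executes every multiply-and-accumulate (MAC), with a formulation that only touches real inputs, and counts the MACs in each.

```python
def transposed_conv1d_dense(x, w, stride):
    """Zero-insertion formulation: every MAC runs, even on inserted zeros."""
    k = len(w)
    # Insert (stride - 1) zeros between neighbouring inputs.
    up = []
    for i, v in enumerate(x):
        up.append(v)
        if i < len(x) - 1:
            up.extend([0] * (stride - 1))
    # Pad with (k - 1) zeros on both sides, then slide the flipped kernel.
    padded = [0] * (k - 1) + up + [0] * (k - 1)
    out_len = len(padded) - k + 1
    out = [0] * out_len
    macs = 0
    for o in range(out_len):
        for j in range(k):
            out[o] += padded[o + j] * w[k - 1 - j]
            macs += 1
    return out, macs


def transposed_conv1d_skip(x, w, stride):
    """Skipping formulation (illustrative assumption): only real inputs are
    multiplied, so MACs on inserted or padded zeros never happen."""
    k = len(w)
    out = [0] * ((len(x) - 1) * stride + k)
    macs = 0
    for i, v in enumerate(x):
        for m in range(k):
            out[i * stride + m] += v * w[m]
            macs += 1
    return out, macs


if __name__ == "__main__":
    x, w, stride = [1, 2, 3], [1, 2, 3], 2
    dense_out, dense_macs = transposed_conv1d_dense(x, w, stride)
    skip_out, skip_macs = transposed_conv1d_skip(x, w, stride)
    print(dense_out == skip_out, dense_macs, skip_macs)
```

Both functions return the same output, but the skipping version needs `len(x) * len(w)` MACs instead of `((len(x) - 1) * stride + len(w)) * len(w)`. A hardware design would realize the skipping in its address generation and scheduling rather than in loops like these.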