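| Cite this article: | LI Lei, LI Yuansong, SHI Rui. Hardware acceleration and implementation of YOLOv4-tiny network based on FPGA (J/M/D/N; J: journal, M: book, D: thesis, N: newspaper). Journal name, 2026, 43(3): 45-52 |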
|
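| Abstract: |
| Objective: To improve the computational performance and energy efficiency ratio of object detection algorithms on
resource-constrained edge hardware platforms, an accelerator design for the YOLOv4-tiny network on an edge hardware
platform based on a field programmable gate array (FPGA) is proposed and verified. Methods: High-level synthesis (HLS)
was used to design and optimize the operators and modules of the algorithm for a high degree of parallelism. To increase
design throughput, a double-buffering strategy was adopted to raise system resource utilization, and fusion of the
convolutional and batch normalization (BN) layers together with model quantization was used to reduce the model's
parameter footprint and increase computational density. Results: Experiments on the PYNQ-Z2 platform show that the
accelerator achieves a computational performance of 15.33 GOPS at a total power consumption of 2.65 W, a 2.79-fold
improvement in computational performance over FPGA platforms in comparable studies and a 29.5-fold improvement in
energy efficiency ratio over a CPU platform. Conclusion: The design improves the acceleration of the YOLOv4-tiny network
on edge FPGA platforms and provides a reference for research on accelerating object detection algorithms on hardware
platforms. |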
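The layer-fusion and quantization steps named in the abstract can be illustrated with a minimal sketch. It assumes the standard inference-time folding of BN parameters into per-output-channel convolution weights and biases, followed by symmetric per-tensor 8-bit quantization. The abstract does not state the bit width or quantization scheme the authors actually used, and the function names `fuse_conv_bn` and `quantize_symmetric` are hypothetical.

```python
import math


def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding convolution (inference only).

    weights: one flat list of kernel weights per output channel.
    bias, gamma, beta, mean, var: one float per output channel.
    With s = gamma / sqrt(var + eps), the folded layer uses
    W' = W * s and b' = (b - mean) * s + beta.
    """
    fused_w, fused_b = [], []
    for k, kernel in enumerate(weights):
        s = gamma[k] / math.sqrt(var[k] + eps)
        fused_w.append([w * s for w in kernel])
        fused_b.append((bias[k] - mean[k]) * s + beta[k])
    return fused_w, fused_b


def quantize_symmetric(values, bits=8):
    """Symmetric per-tensor quantization to signed `bits`-bit integers.

    Assumed scheme (not confirmed by the source). Returns the integer codes
    and the scale such that value ~= code * scale.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


if __name__ == "__main__":
    # Hypothetical single-output-channel layer with a 3-tap kernel.
    weights, bias = [[0.5, -1.0, 0.25]], [0.1]
    gamma, beta, mean, var, eps = [2.0], [0.3], [0.05], [0.99], 0.01
    fw, fb = fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps)

    x = [1.0, 2.0, 3.0]
    conv = sum(w * v for w, v in zip(weights[0], x)) + bias[0]
    bn = gamma[0] * (conv - mean[0]) / math.sqrt(var[0] + eps) + beta[0]
    fused = sum(w * v for w, v in zip(fw[0], x)) + fb[0]
    print(abs(bn - fused) < 1e-9)  # True: folding leaves the output unchanged

    codes, scale = quantize_symmetric(fw[0])
    print(codes, scale)
```

Folding BN removes a separate normalization pass at inference time, and 8-bit weights need a quarter of the storage of 32-bit floats, which is one way such steps can raise computational density on an FPGA.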
| Key words: field programmable gate array; high-level synthesis; YOLO; hardware acceleration |
| DOI: |
| CLC number: |
| Funding: |
|
| Hardware Acceleration and Implementation of YOLOv4-Tiny Network Based on FPGA |
|
LI Lei, LI Yuansong, SHI Rui
|
|
School of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin 643000, Sichuan,
China
|
| Abstract: |
| Objective: To improve the computational performance and energy efficiency ratio of object detection algorithms on
resource-constrained edge hardware platforms, this paper proposes and verifies an accelerator for the YOLOv4-tiny
network on an edge hardware platform based on a field programmable gate array (FPGA). Methods: High-level synthesis
(HLS) was used to design and optimize the operators and modules of the algorithm for a high degree of parallelism. To
improve design throughput, a double-buffering strategy was adopted to increase system resource utilization.
Additionally, fusion of the convolutional and batch normalization (BN) layers and model quantization were applied to
reduce the model's parameter footprint and increase computational density. Results: Experiments on the PYNQ-Z2
platform show that the accelerator achieves a computational performance of 15.33 GOPS at a total power consumption
of 2.65 W. Compared with FPGA platforms in comparable studies, the design improves computational performance by a
factor of 2.79, and compared with a CPU platform it achieves a 29.5-fold increase in energy efficiency ratio.
Conclusion: The proposed design improves the acceleration of the YOLOv4-tiny network on edge FPGA platforms and
provides a reference for research on accelerating object detection algorithms on hardware platforms. |
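The double-buffering strategy and the reported throughput and power figures can be related through a simple model. The sketch below is an illustration under stated assumptions, not the authors' implementation: it models a tiled loop in which double (ping-pong) buffering overlaps loading tile i+1 with computing tile i, uses hypothetical tile timings, ignores write-back, and takes energy efficiency as GOPS per watt, a common definition that the abstract does not spell out. The names `pipeline_latency` and `energy_efficiency` are hypothetical.

```python
def pipeline_latency(load_times, compute_times, double_buffered):
    """Latency of a tiled loop under a lockstep buffering model.

    load_times[i]: time to copy tile i from off-chip memory into an on-chip buffer.
    compute_times[i]: time to process tile i once it is on chip.
    Write-back of results is ignored in this simplified model.
    """
    if len(load_times) != len(compute_times) or not load_times:
        raise ValueError("need one load and one compute time per tile")
    if not double_buffered:
        # Single buffer: each tile is loaded, then computed, before the
        # buffer can be reused for the next tile.
        return sum(load_times) + sum(compute_times)
    # Ping-pong buffers: the first load cannot be hidden; afterwards each step
    # computes tile i from one buffer while tile i + 1 loads into the other.
    total = load_times[0]
    for i in range(len(load_times) - 1):
        total += max(compute_times[i], load_times[i + 1])
    return total + compute_times[-1]


def energy_efficiency(gops, watts):
    """Energy efficiency expressed as GOPS per watt (assumed definition)."""
    return gops / watts


if __name__ == "__main__":
    loads, computes = [4, 4, 4, 4], [5, 5, 5, 5]  # hypothetical tile timings
    print(pipeline_latency(loads, computes, double_buffered=False))  # 36
    print(pipeline_latency(loads, computes, double_buffered=True))   # 24
    # Figures reported in the abstract: 15.33 GOPS at 2.65 W.
    print(f"{energy_efficiency(15.33, 2.65):.2f} GOPS/W")  # 5.78 GOPS/W
```

When each load is no longer than the preceding compute, as in this example, the steady-state cost per tile drops from load plus compute to compute alone, which is how double buffering keeps the compute units busy.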
| Key words: field programmable gate array; high-level synthesis; YOLO; hardware acceleration |