Qinping CAO
School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510641, China
Funding: The Science and Technology Project of Guangdong Province (2014B090910002)
Details
Author biographies: Huabiao QIN: male, born in 1967, professor. His research interests include intelligent information processing, wireless communication networks, embedded systems, and FPGA design.
Qinping CAO: male, born in 1995, master's student. His research interest is integrated circuit design.
Corresponding author: Huabiao QIN, eehbqin@scut.edu.cn
CLC number: TP331
Publication history
Received: 2019-01-22
Revised: 2019-06-10
Published online: 2019-06-20
Issue date: 2019-11-01
Design of Convolutional Neural Networks Hardware Acceleration Based on FPGA
Huabiao QIN, Qinping CAO
School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510641, China
Funds: The Science and Technology Project of Guangdong Province (2014B090910002)
Abstract: To address the large computational cost and long computation time of Convolutional Neural Networks (CNN), a Field-Programmable Gate Array (FPGA)-based CNN hardware accelerator is proposed. First, by analyzing the forward-computation principle of the convolutional layer and exploring its parallelism, a hardware architecture with input-channel parallelism, output-channel parallelism, and deep pipelining of the convolution window is presented. Then, within this architecture, a fully parallel multiply-add tree module is designed to accelerate the convolution, and an efficient window-buffer module is designed to pipeline the convolution-window operations. Experimental results show that the energy efficiency of the proposed accelerator reaches 32.73 GOPS/W, 34% higher than existing solutions, while its performance reaches 317.86 GOPS.
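The fully parallel multiply-add tree named in the abstract can be sketched in software. This is an illustrative model only, not the paper's RTL: all K×K products of one convolution window are formed concurrently, then reduced pairwise through log2(K·K) adder stages, each stage corresponding to one pipeline level of a hardware adder tree.

```python
def multiply_add_tree(window, kernel):
    """Model a fully parallel multiply-add tree for one flattened
    K x K convolution window (software sketch of the hardware idea)."""
    # Stage 0: in hardware, all K*K multiplications happen in parallel.
    sums = [w * k for w, k in zip(window, kernel)]
    # Reduction stages: each loop iteration models one adder-tree level.
    while len(sums) > 1:
        it = iter(sums)
        reduced = [a + b for a, b in zip(it, it)]
        if len(sums) % 2:            # odd element carries to the next level
            reduced.append(sums[-1])
        sums = reduced
    return sums[0]

# A 3x3 window reduces in ceil(log2(9)) = 4 adder stages.
acc = multiply_add_tree(list(range(9)), [1] * 9)  # sum of 0..8
```

In hardware, each reduction level is registered, so a new window can enter the tree every clock cycle while earlier windows drain through the pipeline.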
Key words: Convolutional Neural Networks (CNN); Hardware acceleration; FPGA; Parallel computation; Deep pipeline
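The window-buffer module mentioned in the abstract keeps the multiply-add tree fed with one convolution window per cycle. A minimal behavioural sketch, assuming the common line-buffer scheme (the paper's exact buffer design may differ): pixels arrive in row-major order, the most recent K rows are held in line buffers, and once K rows are primed a K×K window slides across them.

```python
from collections import deque

def sliding_windows(image, K):
    """Stream a 2-D image pixel by pixel and yield K x K windows,
    modelling a hardware line buffer (illustrative only)."""
    H, W = len(image), len(image[0])
    lines = deque(maxlen=K)          # the K most recent complete rows
    row_buf = []
    for r in range(H):
        for c in range(W):
            row_buf.append(image[r][c])   # one pixel per "cycle"
            if len(row_buf) == W:
                lines.append(row_buf)     # oldest row is evicted at maxlen
                row_buf = []
        if len(lines) == K:
            # Slide the K x K window across the buffered row band.
            for c0 in range(W - K + 1):
                yield [row[c0:c0 + K] for row in lines]

img = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = list(sliding_windows(img, 3))   # (4-3+1)^2 = 4 windows
```

Because only K rows are buffered instead of the whole feature map, on-chip memory stays proportional to K·W, which is what makes per-cycle window generation practical on an FPGA.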
Full-text PDF download:
https://jeit.ac.cn/article/exportPdf?id=33d031ea-ad5d-4c9c-80ef-f69b4a6067dd