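Kaili ZHOU,
Yichang WANG,
Guang WANG,
Jun YUAN
College of Electronics Engineering/International Semiconductor College, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funds: The National Natural Science Foundation of China (61404019), Major Themes of Integrated Circuit Industry in Chongqing (cstc2018jszx-cyztzx0211, cstc2018jszx-cyztzx0217)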
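Details
Author biographies: Wei WANG: male, born in 1967, postdoctoral, professor. Research interest: integrated circuit design
Kaili ZHOU: female, born in 1991, master's student. Research interest: digital integrated circuit design
Yichang WANG: male, born in 1996, master's student. Research interest: analog integrated circuit design
Guang WANG: male, born in 1994, master's student. Research interest: semiconductor optoelectronic device design
Jun YUAN: male, born in 1984, Ph.D., associate professor. Research interest: mixed-signal integrated circuit design
Corresponding author: Kaili ZHOU, 2508005354@qq.com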
CLC number: TN432
Publication history
Received: 2019-01-15
Revised: 2019-03-20
Published online: 2019-05-23
Issue date: 2019-11-01
Design of Convolutional Neural Networks Accelerator Based on Fast Filter Algorithm
Wei WANG,
Kaili ZHOU,
Yichang WANG,
Guang WANG,
Jun YUAN
College of Electronics Engineering/International Semiconductor College, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funds: The National Natural Science Foundation of China (61404019), Major Themes of Integrated Circuit Industry in Chongqing (cstc2018jszx-cyztzx0211, cstc2018jszx-cyztzx0217)
Abstract: To reduce the computational load of Convolutional Neural Networks (CNNs), a two-dimensional fast filtering algorithm is introduced into the CNN, and a hardware architecture that accelerates the CNN layer by layer on an FPGA is proposed. First, a line-buffer loop control unit is designed using a loop transformation method. It efficiently manages the input feature-map data across different convolution windows and different layers, and it starts the convolution acceleration unit with a flag signal to achieve layer-by-layer acceleration. Second, a convolution acceleration unit based on a 4-parallel fast filtering algorithm is designed. The unit is implemented as a low-complexity parallel filtering structure composed of several small filters. The accelerator circuit is tested on the MNIST handwritten digit dataset. The results show that on the Xilinx Kintex-7 platform, with a 100 MHz input clock, the circuit reaches a computational performance of 20.49 GOPS and a recognition rate of 98.68%. Reducing the amount of computation in the CNN therefore improves the computational performance of the circuit.
Key words: Convolutional Neural Network (CNN)/
Fast filter algorithm/
FPGA/
Parallel structure
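As a rough illustration of why a parallel fast filtering structure built from small filters needs less computation, the sketch below implements the classic 1-D 2-parallel fast FIR decomposition in Python. It is an assumption-laden teaching example, not the 2-D 4-parallel hardware unit described in the abstract: the function names `convolve` and `ffa2_fir` are hypothetical, and the inputs are plain integer lists of even length. The input and the filter are each split into even and odd phases, and three half-length subfilters replace the four that a direct 2-parallel implementation would need.

```python
def convolve(a, b):
    """Direct (schoolbook) linear convolution of two non-empty sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ffa2_fir(x, h):
    """2-parallel fast FIR filtering (illustrative 1-D sketch).

    Polyphase split: X = X0(z^2) + z^-1 X1(z^2), H = H0(z^2) + z^-1 H1(z^2).
    Even outputs: Y0 = H0*X0 + z^-2 * H1*X1
    Odd outputs:  Y1 = (H0 + H1)*(X0 + X1) - H0*X0 - H1*X1
    Only three subfilter convolutions are evaluated instead of four.
    """
    if not x or not h or len(x) % 2 or len(h) % 2:
        raise ValueError("x and h must be non-empty with even length")

    x0, x1 = x[0::2], x[1::2]          # even / odd input phases
    h0, h1 = h[0::2], h[1::2]          # even / odd coefficient phases

    a = convolve(h0, x0)               # subfilter 1: H0 * X0
    b = convolve(h1, x1)               # subfilter 2: H1 * X1
    c = convolve([p + q for p, q in zip(h0, h1)],
                 [p + q for p, q in zip(x0, x1)])  # subfilter 3

    # Even phase: H0*X0 plus H1*X1 delayed by one sample at the half rate.
    y0 = [0] * (len(a) + 1)
    for i, v in enumerate(a):
        y0[i] += v
    for i, v in enumerate(b):
        y0[i + 1] += v

    # Odd phase: recovered from the shared products by subtraction.
    y1 = [ci - ai - bi for ai, bi, ci in zip(a, b, c)]

    # Interleave the two phases back into a single output stream.
    y = [0] * (len(x) + len(h) - 1)
    y[0::2] = y0
    y[1::2] = y1
    return y


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, -2, 3, 4]
assert ffa2_fir(x, h) == convolve(x, h)
```

Applying the same decomposition recursively is one standard way to build a 4-parallel structure from 9 subfilters instead of 16. Whether the accelerator uses exactly that construction is not stated in the abstract, so treat the sketch only as background on the general technique.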
Full-text PDF download:
https://jeit.ac.cn/article/exportPdf?id=f8f1987b-fabe-4777-afc4-f07ab9f0323d