Design and Implementation of an FPGA-Based Deep Learning Accelerator

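University of Science and Technology of China
Master's Thesis
Design and Implementation of an FPGA-Based Deep Learning Accelerator
Author:
Major:
Supervisor:
Completion date:
Wanfang Data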

Y3021 457
University of Science and Technology of China
A dissertation for master's degree
Deep learning accelerator design and implementation based on FPGA
Author's Name:
Speciality:
Finished time:
Prof. Xuehai Zhou

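University of Science and Technology of China
Declaration of Originality for the Thesis

I declare that the thesis submitted is the result of research work I carried out under the guidance of my supervisor. Except where specifically marked and acknowledged, the thesis contains no research results that have been published or written by others. The contributions made to this research by colleagues who worked with me have all been clearly stated in the thesis.

Author's signature:        Date signed:

University of Science and Technology of China
Authorization for Use of the Thesis

As one of the conditions for applying for the degree, the copyright holder of the thesis authorizes the University of Science and Technology of China to hold partial rights of use of the thesis, namely: the university has the right, in accordance with relevant regulations, to submit photocopies and electronic copies of the thesis to relevant national departments or institutions, to allow the thesis to be consulted and borrowed, to include the thesis in the China Dissertation Full-text Database (《中国学位论文全文数据库》) and other relevant databases for retrieval, and to preserve and compile the thesis by photocopying, reduced-size printing, scanning, or other means of reproduction. The content of the electronic document I submit is consistent with that of the paper thesis. A confidential thesis is also subject to these provisions after declassification.

[x] Public    [ ] Confidential (____ years)

Author's signature:        Date signed:
Supervisor's signature:        Date signed: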

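Abstract

In recent years, with the sharp increase in computing power and the continued cross-fertilization and development of scientific disciplines, machine learning has gradually become known to and accepted by the public and has begun to appear in everyday life. Whether in item recommendation on Taobao, driverless cars, or the sensational human-versus-machine Go match involving AlphaGo, machine learning has impressed people with the power of technology while improving their daily lives. Deep learning, an emerging field of machine learning, originated from further research on artificial neural networks and is a product of the intersection of biological science and computer science. It performs remarkably well on complex, abstract learning problems and has therefore quickly become popular in both academia and industry.

However, to solve more abstract and more complex learning problems, deep learning networks keep growing in scale, and computational and data complexity grow sharply with them; for example, the network of the Google Cat system has about one billion neuron connections. How to implement deep learning algorithms with high performance and low energy consumption has thus become a research hotspot for research institutions.

The field-programmable gate array (FPGA), one of the commonly used means of acceleration, offers high performance, low power consumption, and programmability. This thesis uses an FPGA to design an accelerator for the general-purpose computation part of deep learning. The main work is as follows:

1) Analyze the commonalities and distinctive features of the algorithms used in the prediction and training processes of deep neural networks and convolutional neural networks, and design the FPGA computation units on that basis. The algorithms include the forward computation algorithm, the local pre-training algorithm, and the global training algorithm.
2) Design the basic computation units according to the available FPGA resources, including the forward computation unit and the weight update unit. All units are configurable and pipelined, so they adapt to deep learning neural networks of different scales while achieving high throughput.
3) Analyze the upper-level framework and data path of the FPGA accelerator, and write a driver for the Linux operating system along with a simple, easy-to-use calling interface for upper-level users.
4) Through extensive experiments, analyze the factors that affect accelerator performance and obtain the trends in its performance and energy consumption; use test datasets to compare performance, power, energy consumption, and other parameters against CPU and GPU platforms, and analyze the strengths and weaknesses of the FPGA implementation.

Keywords: deep learning; artificial neural network; FPGA; prediction process; training process; accelerator; low power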

ABSTRACT

In recent years, with the development of computing power and scientific theories, machine learning has begun to emerge in public life, and the benefits of machine learning applications have gradually been accepted. Whether in the Taobao item recommendation system, driverless cars, or the man-machine Go competition involving AlphaGo, machine learning has shown us the amazing power of science and technology and improved our daily life. As an emerging field of machine learning, Deep Learning originates in the further study of artificial neural networks and is an organic combination of biological science and computer science. It shows excellent ability in solving complex learning problems and is receiving significant attention from industry.

However, in order to solve more abstract and more complicated machine learning problems, the networks are becoming increasingly large; for example, the Google Cat system has one billion neuron connections. High-performance implementations of deep learning networks have therefore become one of the research hotspots.

As a common means of accelerating algorithms, the FPGA (field-programmable gate array) offers high performance, low power consumption, programmability, small size, and other characteristics. In this paper, we use an FPGA to design a pipelined accelerator for the common computations of Deep Learning. The main work includes:

1) This paper analyzes the prediction process and the training process of Deep Neural Networks and Convolutional Neural Networks, and derives their common computational primitives and characteristics in order to design the accelerator. The algorithms include the feedforward algorithm, the local pre-training algorithm, and the global training algorithm.
2) Based on the available resources and memory width, the paper designs the processing elements, including the feedforward module and the weight update module. Each module is configurable for different sizes of neural network and is pipelined for high throughput.
3) The paper analyzes the upper-level framework and data access of the FPGA-based accelerator, and designs the hardware driver for the Linux operating system and the application programming interface for users.
4) The paper summarizes the factors that affect the performance and energy consumption of the FPGA accelerator through a large number of comparative experiments. It uses several datasets to compare performance, power, and energy consumption with CPU and GPU platforms, and analyzes the advantages and disadvantages of the FPGA accelerator.

Key words: Deep Learning; artificial neural network; FPGA; prediction; training; accelerator; low power

Contents

Abstract (Chinese)
ABSTRACT
Contents
List of Tables
List of Figures
List of Code Listings
Chapter 1 Introduction
  1.1 Background and Significance
  1.2 Research Status at Home and Abroad
    1.2.1 Acceleration Techniques
    1.2.2 Current Research
  1.3 Main Work of This Thesis
  1.4 Thesis Organization
Chapter 2 Technical Background
  2.1 Basic Concepts of Deep Learning
    2.1.1 Introduction to Artificial Neural Networks
    2.1.2 Introduction to Deep Learning
    2.1.3 Network Topologies
    2.1.4 Related Algorithms
  2.2 Hardware/Software Co-Design
    2.2.1 Hardware/Software Co-Computing Model
    2.2.2 Design Flow
  2.3 Overview of Hardware Acceleration Techniques
