CN109447256A - Design method for accelerating the TensorFlow system based on FPGA - Google Patents

Design method for accelerating the TensorFlow system based on FPGA

Info

Publication number
CN109447256A
CN109447256A
Authority
CN
China
Prior art keywords
fpga
opencl
operator
language
kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811061386.1A
Other languages
Chinese (zh)
Inventor
张英杰
郭开城
陈勇彪
刘焰强
戚正伟
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201811061386.1A priority Critical patent/CN109447256A/en
Publication of CN109447256A publication Critical patent/CN109447256A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

A method for accelerating the TensorFlow system based on FPGA, comprising the steps of: using Python as the upper-layer client library; encapsulating the modules implemented in OpenCL as an interface for upper-layer client programs to call; preparing the FPGA operator development environment through OpenCL; developing FPGA operators through OpenCL; compiling kernel operators through OpenCL; and incorporating the FPGA device into the overall system. The present invention reduces development difficulty and gives the FPGA, as a device in the TensorFlow system, a guarantee of stability and practicality.

Description

Design method for accelerating the TensorFlow system based on FPGA
Technical field
The present invention relates to the field of heterogeneous computing development. Specifically, it takes certain operators in the TensorFlow artificial intelligence system that are natively implemented on the CPU and realizes them instead on a field-programmable gate array (hereinafter referred to as FPGA).
Background technique
With the development of artificial intelligence, deep neural networks have become increasingly popular in computer vision, natural language processing and other interdisciplinary research fields. A deep neural network extracts features from the input through multiple stacked layers and makes the final decision with a classifier, which means it contains large matrix or convolution operators. Recent evidence shows that the depth of a neural network is critical to its performance, which greatly increases the demand for computing power. Traditional CPUs cannot meet this growing demand: completing inference for a deep neural network often takes a long time, making CPUs unsuitable for practical application scenarios. A common way to solve this problem is to use a heterogeneous distributed environment with various computing devices, such as DSPs, GPUs and custom hardware accelerators such as FPGAs. However, although FPGAs offer flexibility, high energy efficiency and cost benefits, they have not been incorporated into state-of-the-art deep learning frameworks or systems, for example as computation accelerators in TensorFlow; this is closely related to the slowness of their traditional development workflow. Therefore, building FPGAs into TensorFlow as computing accelerators and improving the development efficiency of FPGA accelerators through high-level synthesis is both far-sighted and novel.
Summary of the invention
To solve the problems that, in the prior art, CPUs cannot adequately support the development and running speed of artificial neural networks, and that the TensorFlow artificial intelligence system currently does not support FPGA devices, the present invention provides a design method for accelerating the TensorFlow system based on FPGA, adding the FPGA to the TensorFlow system as a computation accelerator. The FPGA, a programmable logic device, is incorporated into TensorFlow as a basic computing device so that it completes some of the computation in TensorFlow. At the same time, a set of operation kernels corresponding to the FPGA device is implemented, so that various tensor operations gain computing speed on the FPGA and the acceleration effect is guaranteed.
The technical solution of the invention is as follows:
A method for accelerating the TensorFlow system based on FPGA, characterized in that it comprises the following steps:
Step 1: use Python as the upper-layer client library.
Step 2: provide a suitable C-language interface.
The modules implemented in OpenCL are encapsulated as an interface for the upper-layer client programs to call;
Step 3: prepare the FPGA operator development environment through OpenCL.
Step 4: develop FPGA operators through OpenCL.
The required kernel functions are first written in C; the BSP (board support package) and OpenCL kernel compilation environment provided by the device vendor then automatically generate the corresponding binary stream file, and after this file is programmed into the FPGA, the operator can run on the FPGA;
Step 5: compile kernel operators through OpenCL.
The operator described in C is taken as the kernel and is mapped through the BSP provided by the device vendor;
Step 6: incorporate the FPGA device into the overall system.
A C-language interface layer is added to the OpenCL C code, together with some device-ID registration code, so that upper-layer Python and C++ clients can call it directly.
Said preparing the FPGA operator development environment through OpenCL specifically comprises:
querying the platform and devices to determine the model and number of FPGAs in the system;
creating command queues and buffers to stage the operations to be performed and to store the operation data;
mapping the kernel written in C, through the BSP, into a binary stream file executable by the FPGA;
setting execution parameters, whereupon the OpenCL host side executes the kernels according to actual demand.
The execution parameters include the type and number of kernels to execute.
Said compiling kernel operators through OpenCL specifically comprises:
determining the FPGA model and its corresponding BSP;
implementing the operator to be realized in C;
running the compilation command in the environment provided by the device vendor;
whereby the corresponding netlist file is generated, together with the binary stream file that can be programmed into the FPGA to produce the corresponding hardware.
Compared with the prior art, the present invention has the following advantages: adding the FPGA to the TensorFlow system to accelerate neural network operations improves computing speed over the traditional TensorFlow system, which has no FPGA support. Moreover, the corresponding operators are developed with OpenCL; by exploiting OpenCL's own characteristics, this greatly reduces development difficulty compared with the traditional approach of developing operators in a hardware description language and then developing the data-flow management and communication management step by step, and it gives the FPGA, as a device in the TensorFlow system, a guarantee of stability and practicality.
Detailed description of the invention
Fig. 1 is a flowchart of the FPGA-based TensorFlow system acceleration design method of the present invention;
Fig. 2 is the FPGA operator implementation architecture diagram;
Fig. 3 is the OpenCL kernel implementation flow.
Specific embodiment
The present invention is further explained below with reference to the accompanying drawings and embodiments, but this should not be taken to limit the scope of protection of the invention.
The specific implementation of the present invention mainly includes the following steps:
Step 1: select a suitable upper-layer client library.
Implementing neural networks or other vector operations directly in a machine-level language would inevitably be very time-consuming. TensorFlow therefore provides integrated libraries packaged for high-level languages such as Python and C++. The implementation of the present invention also uses these libraries, because the functions they provide have very good abstractions for both implementation and declaration.
As shown in Fig. 1, no matter which language implements the library, the final concrete operations still pass through a series of processes before being executed on devices at the device level. That is, the high-level languages on the upper layer provide a unified programming model for the convenience of users, and the present invention follows this same set of rules: the accelerating operators implemented on the FPGA are likewise packaged, layer by layer, into functions that upper-layer users can call directly from a high-level language.
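The layered dispatch described above, where a high-level call is routed down to a device-specific kernel, can be sketched as a minimal registry in Python. This is an illustrative mock, not TensorFlow's actual dispatch machinery; all names here are hypothetical:

```python
# Hypothetical sketch: a high-level call is routed to a per-device kernel,
# mirroring how an FPGA-backed operator would slot in beside CPU ones.
KERNELS = {}

def register_kernel(op_name, device, fn):
    """Associate an (operator, device) pair with a concrete kernel."""
    KERNELS[(op_name, device)] = fn

def dispatch(op_name, device, *args):
    """Look up the kernel registered for the requested device and run it."""
    return KERNELS[(op_name, device)](*args)

# A CPU reference kernel and a stand-in for the FPGA-backed one.
register_kernel("vec_add", "CPU", lambda a, b: [x + y for x, y in zip(a, b)])
register_kernel("vec_add", "FPGA", lambda a, b: [x + y for x, y in zip(a, b)])

result = dispatch("vec_add", "FPGA", [1, 2, 3], [4, 5, 6])
print(result)  # [5, 7, 9]
```

Upper-layer code only ever sees `dispatch`; which device executes the operator is decided by the registry, which is the pattern the patent relies on when adding FPGA entries alongside existing CPU ones.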
Step 2: provide a suitable C-language interface.
As shown in Fig. 2, although high-level languages make upper-layer development convenient, the goal of the present invention is to add the FPGA to the whole system, so corresponding modifications must still be made to the C-language interface layer that connects the upper layer with the device layer.
The method used is: refer to the existing implementations for CPU and GPU, retain the parts that can be reused, and then develop with OpenCL, because OpenCL is itself an open standard oriented toward general heterogeneous systems and can use C as its implementation carrier. Newer versions of TensorFlow have also implemented support for OpenCL and OpenCL devices, which substantially reduces development difficulty at the interface and improves the interface's stability and practicality.
Step 3: prepare the FPGA operator development environment through OpenCL.
To accelerate the corresponding operators with an FPGA, the traditional method is to write the operator functions in a hardware description language, but such development takes a very long time. OpenCL instead allows the operator functions to be implemented in C and then realized on the FPGA through the board support package (hereinafter BSP) provided by the device vendor.
As shown in Fig. 3, OpenCL provides a standard development process and a large number of C-language interfaces, which can be reused in the second step. First, query the platform and devices to determine the model and number of FPGAs in the system; then create command queues and buffers to stage the operations to be performed and to store the operation data; then map the kernel written in C, through the BSP, into a binary stream file executable by the FPGA. Finally, set execution parameters such as the type and number of kernels to run; the OpenCL host side will execute the kernels according to actual demand, so that the operator is realized by the FPGA. From this process it can be seen that OpenCL greatly reduces the difficulty of connecting the FPGA to the whole system, because the process itself completes most of the work shown in Fig. 2, from data transfer and migration down to device management.
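The host-side sequence above (query platform and devices, create a command queue and buffers, build the kernel, set execution parameters and run) can be mocked in pure Python to make the control flow concrete. This is a simulation only; the classes are hypothetical stand-ins for the OpenCL host API and no real OpenCL runtime is involved:

```python
# Illustrative pure-Python mock of the OpenCL host-side sequence in Fig. 3.
class Device:
    def __init__(self, model):
        self.model = model

class Platform:
    def __init__(self, devices):
        self.devices = devices
    def get_devices(self, device_type="ACCELERATOR"):
        # A real host program would filter by device type here.
        return self.devices

class CommandQueue:
    def __init__(self, device):
        self.device = device
    def enqueue(self, kernel, buffers):
        # A real queue would DMA the buffers and launch on the FPGA;
        # here we simply run the kernel on host memory.
        return kernel(*buffers)

# 1. Query the platform and devices.
platform = Platform([Device("FPGA-A10")])
fpga = platform.get_devices()[0]
# 2. Create a command queue and the input buffers.
queue = CommandQueue(fpga)
buf_a, buf_b = [1.0, 2.0], [3.0, 4.0]
# 3. "Build" the kernel (stands in for the BSP-produced binary stream).
vec_add = lambda a, b: [x + y for x, y in zip(a, b)]
# 4. Set execution parameters and run.
out = queue.enqueue(vec_add, (buf_a, buf_b))
print(out)  # [4.0, 6.0]
```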
Step 4: develop FPGA operators through OpenCL.
An operator that needs to run on the FPGA can be completed by writing a kernel function in C following the standard provided by OpenCL. From a kernel function written in C, the BSP and OpenCL kernel compilation environment provided by the device vendor can automatically generate an operator that runs on the FPGA hardware.
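As an illustration of the kind of kernel function meant here, the sketch below keeps a minimal OpenCL C vector-add kernel as a source string, the way a host program would hold it before handing it to the vendor toolchain, alongside a pure-Python reference implementation of the same computation. The kernel text is a generic example, not taken from the patent:

```python
# A minimal OpenCL C kernel of the kind written in step 4, stored as a
# source string prior to offline compilation by the vendor toolchain.
VEC_ADD_KERNEL = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""

def vec_add_reference(a, b):
    """Pure-Python reference for what the kernel computes per work-item."""
    return [x + y for x, y in zip(a, b)]

print(vec_add_reference([1.0, 2.0], [3.0, 4.0]))  # [4.0, 6.0]
```

Keeping a host-language reference implementation next to the kernel is a common way to check the FPGA result for correctness once the binary stream is programmed.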
Step 5: compile kernel operators through OpenCL.
"Compile kernel" in Fig. 3 is the process of taking the operator described in C as the kernel and mapping it through the BSP provided by the device vendor. The concrete method is: determine the FPGA model and its corresponding BSP, implement the operator to be realized in C, and then run the compilation command in the environment provided by the device vendor, such as Intel's SDK for OpenCL. This generates the corresponding netlist and the binary stream for the actual hardware realization, so that the FPGA can directly realize the corresponding operator.
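A minimal sketch of driving this offline compilation step from Python follows. The `aoc` tool name and flag spelling follow Intel's FPGA SDK for OpenCL offline compiler, but exact flags vary by SDK version and board, so treat the command line as an assumption rather than a definitive invocation:

```python
# Sketch of constructing the offline OpenCL-to-FPGA compile command.
# The "aoc" name and flags are assumptions based on Intel's FPGA SDK for
# OpenCL; consult the vendor BSP documentation for the exact invocation.
def build_compile_cmd(kernel_src, board, out_file):
    return ["aoc", kernel_src, "-o", out_file, f"-board={board}"]

cmd = build_compile_cmd("vec_add.cl", "a10gx", "vec_add.aocx")
print(" ".join(cmd))
# On a machine with the vendor toolchain installed, one would then run:
# subprocess.run(cmd, check=True)
```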
Step 6: incorporate the FPGA device into the overall system.
After the above steps are completed, the kernel implementation and the device-layer part in Fig. 1 are done. Moreover, because OpenCL is used, the host-to-FPGA communication does not need to be developed separately: the data-flow executor part is also already complete. Finally, one only needs to add a C-language interface layer to the OpenCL C code and add some device-ID registration code so that upper-layer Python and C++ clients can call it directly. Through this encapsulation, upper-layer users can implement neural networks in a high-level language within the TensorFlow system, while the FPGA realizes part of the operators to accelerate the network.
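The device-ID registration mentioned in this step can be sketched as a small table that upper-layer clients consult when placing operators. This is a hypothetical mock rather than TensorFlow's real device registry; all names are illustrative:

```python
# Hypothetical sketch of "register a device ID": the FPGA is added to a
# device table so upper-layer clients can target it by name, the same way
# CPU and GPU entries already exist in the system.
DEVICE_TABLE = {}

def register_device(device_type, device_id):
    DEVICE_TABLE[device_type] = device_id

register_device("CPU", 0)
register_device("FPGA", 1)

def place_op(op_name, device_type):
    """Return the (device_id, op) pair the runtime would schedule."""
    return (DEVICE_TABLE[device_type], op_name)

print(place_op("matmul", "FPGA"))  # (1, 'matmul')
```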
The present invention not only extends TensorFlow's support to the new FPGA device; it also innovates in the development model by developing the FPGA through OpenCL, an open, royalty-free programming environment designed for general parallel programming on heterogeneous systems. In this way TensorFlow can make full use of entirely new FPGA operations to explore the FPGA's potential, the environment provided by OpenCL speeds up FPGA development progress, and the many interface problems brought about by connecting the FPGA to TensorFlow are reduced. This improves the running speed of the operators and also adds some new operators, completed by the FPGA, for TensorFlow to call.

Claims (4)

1. A method for accelerating the TensorFlow system based on FPGA, characterized in that it comprises the following steps:
Step 1: use Python as the upper-layer client library.
Step 2: provide a suitable C-language interface.
The modules implemented in OpenCL are encapsulated as an interface for the upper-layer client programs to call;
Step 3: prepare the FPGA operator development environment through OpenCL.
Step 4: develop FPGA operators through OpenCL.
The required kernel functions are first written in C; the BSP and OpenCL kernel compilation environment provided by the device vendor can then automatically generate the corresponding binary stream file, and after this file is programmed into the FPGA, the operator can run on the FPGA;
Step 5: compile kernel operators through OpenCL.
The operator described in C is taken as the kernel and is mapped through the BSP provided by the device vendor;
Step 6: incorporate the FPGA device into the overall system.
A C-language interface layer is added to the OpenCL C code, together with some device-ID registration code, so that upper-layer Python and C++ clients can call it directly.
2. The method for accelerating the TensorFlow system based on FPGA according to claim 1, characterized in that said preparing the FPGA operator development environment through OpenCL specifically comprises:
querying the platform and devices to determine the model and number of FPGAs in the system;
creating command queues and buffers to stage the operations to be performed and to store the operation data;
mapping the kernel written in C, through the BSP, into a binary stream file executable by the FPGA;
setting execution parameters, whereupon the OpenCL host side executes the kernels according to actual demand.
3. The method for accelerating the TensorFlow system based on FPGA according to claim 2, characterized in that said execution parameters include the type and number of kernels to execute.
4. The method for accelerating the TensorFlow system based on FPGA according to claim 1, characterized in that said compiling kernel operators through OpenCL specifically comprises:
determining the FPGA model and its corresponding BSP;
implementing the operator to be realized in C;
running the compilation command in the environment provided by the device vendor;
whereby the corresponding netlist file is generated, together with the binary stream file that can be programmed into the FPGA to produce the corresponding hardware.
CN201811061386.1A 2018-09-12 2018-09-12 Design method for accelerating the TensorFlow system based on FPGA Pending CN109447256A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811061386.1A CN109447256A (en) 2018-09-12 2018-09-12 Design method for accelerating the TensorFlow system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811061386.1A CN109447256A (en) 2018-09-12 2018-09-12 Design method for accelerating the TensorFlow system based on FPGA

Publications (1)

Publication Number Publication Date
CN109447256A true CN109447256A (en) 2019-03-08

Family

ID=65532812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811061386.1A Pending CN109447256A (en) 2018-09-12 2018-09-12 Design method for accelerating the TensorFlow system based on FPGA

Country Status (1)

Country Link
CN (1) CN109447256A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399234A * 2019-07-10 2019-11-01 苏州浪潮智能科技有限公司 Task acceleration processing method, apparatus, device and readable storage medium
CN110928529A (en) * 2019-11-06 2020-03-27 第四范式(北京)技术有限公司 Method and system for assisting operator development
CN111858036A (en) * 2020-06-29 2020-10-30 浪潮电子信息产业股份有限公司 Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium
CN112001494A (en) * 2020-08-20 2020-11-27 浪潮电子信息产业股份有限公司 Method for realizing support of FPGA (field programmable Gate array) back-end equipment by nGraph framework
CN113496272A (en) * 2021-05-10 2021-10-12 中国电子科技集团公司第十四研究所 Convolutional neural network operation method based on heterogeneous platform
CN114201154A (en) * 2021-12-10 2022-03-18 北京百度网讯科技有限公司 Operator generation method and device
CN116698411A (en) * 2023-06-29 2023-09-05 重庆邮电大学空间通信研究院 Rolling bearing health state early warning method and device based on convolutional neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239315A * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Programming model for neural network heterogeneous computing platforms
WO2018077295A1 (en) * 2016-10-31 2018-05-03 腾讯科技(深圳)有限公司 Data processing method and apparatus for convolutional neural network
CN107992940A * 2017-12-12 2018-05-04 郑州云海信息技术有限公司 Method and device for implementing a convolutional neural network on FPGA
CN108090560A * 2018-01-05 2018-05-29 中国科学技术大学苏州研究院 Design method of FPGA-based LSTM recurrent neural network hardware accelerators
CN108520300A * 2018-04-09 2018-09-11 郑州云海信息技术有限公司 Method and device for implementing a deep learning network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018077295A1 * 2016-10-31 2018-05-03 腾讯科技(深圳)有限公司 Data processing method and apparatus for convolutional neural network
CN107239315A * 2017-04-11 2017-10-10 北京深鉴智能科技有限公司 Programming model for neural network heterogeneous computing platforms
CN107992940A * 2017-12-12 2018-05-04 郑州云海信息技术有限公司 Method and device for implementing a convolutional neural network on FPGA
CN108090560A * 2018-01-05 2018-05-29 中国科学技术大学苏州研究院 Design method of FPGA-based LSTM recurrent neural network hardware accelerators
CN108520300A * 2018-04-09 2018-09-11 郑州云海信息技术有限公司 Method and device for implementing a deep learning network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XINYU ZHANG et al.: "A design methodology for efficient implementation of network on an FPGA", Computer Science 2017 *
朱虎明 et al.: "A survey of deep neural network parallelization", Chinese Journal of Computers (《计算机学报》) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399234A * 2019-07-10 2019-11-01 苏州浪潮智能科技有限公司 Task acceleration processing method, apparatus, device and readable storage medium
CN110928529A (en) * 2019-11-06 2020-03-27 第四范式(北京)技术有限公司 Method and system for assisting operator development
WO2021088909A1 (en) * 2019-11-06 2021-05-14 第四范式(北京)技术有限公司 Method and system for assisting operator development
CN111858036A (en) * 2020-06-29 2020-10-30 浪潮电子信息产业股份有限公司 Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium
CN111858036B (en) * 2020-06-29 2022-06-10 浪潮电子信息产业股份有限公司 Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium
CN112001494A (en) * 2020-08-20 2020-11-27 浪潮电子信息产业股份有限公司 Method for realizing support of FPGA (field programmable Gate array) back-end equipment by nGraph framework
US11762721B2 (en) 2020-08-20 2023-09-19 Inspur Electronic Information Industry Co., Ltd. Method for realizing nGraph framework supporting FPGA rear-end device
CN113496272A (en) * 2021-05-10 2021-10-12 中国电子科技集团公司第十四研究所 Convolutional neural network operation method based on heterogeneous platform
CN114201154A (en) * 2021-12-10 2022-03-18 北京百度网讯科技有限公司 Operator generation method and device
JP7403586B2 (en) 2021-12-10 2023-12-22 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Operator generation method and device, electronic equipment, storage medium, and computer program
CN116698411A (en) * 2023-06-29 2023-09-05 重庆邮电大学空间通信研究院 Rolling bearing health state early warning method and device based on convolutional neural network
CN116698411B (en) * 2023-06-29 2024-03-08 重庆邮电大学空间通信研究院 Rolling bearing health state early warning method and device based on convolutional neural network

Similar Documents

Publication Publication Date Title
CN109447256A (en) Design method for accelerating the TensorFlow system based on FPGA
US11893386B1 (en) Optimizing source code from binary files
CN103858099B (en) The method and system applied for execution, the circuit with machine instruction
US11561772B2 (en) Low-code development platform
CN110149800B (en) Apparatus for processing abstract syntax tree associated with source code of source program
US20160378438A1 (en) Agile communication operator
CN106687921A (en) Specifying components in graph-based programs
US10282179B2 (en) Nested communication operator
CN103718159B (en) Image processing software development approach, image processing software development device
CN112199086A (en) Automatic programming control system, method, device, electronic device and storage medium
US8615729B2 (en) Extending existing model-to-model transformations
CN104503778A (en) Installation method and installation device for applications
CN107741847A (en) Realize the method and device of domain-driven model
CN106020905A (en) Microcontroller firmware developing and updating method and system
CN106528171A (en) Method, device and system for designing interface between heterogeneous computing platforms
US20130019225A1 (en) Incremental Inferences for Developing Data Models
US8935657B2 (en) Model-to-model transformation by kind
Narihira et al. Neural Network Libraries: A Deep Learning Framework Designed from Engineers' Perspectives
US8914782B2 (en) Optimization of declarative queries
Costa et al. Exploiting different types of parallelism in distributed analysis of remote sensing data
CN116861359A (en) Operator fusion method and system for deep learning reasoning task compiler
CN115983378A (en) Automatic compiling method for kernel of machine learning operating system
US9026985B2 (en) Dynamically configurable model-to-model transformation engine
Panyala et al. On the use of term rewriting for performance optimization of legacy HPC applications
Papenhausen et al. Polyhedral user mapping and assistant visualizer tool for the r-stream auto-parallelizing compiler

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190308
