CN109447256A - Design method for FPGA-based acceleration of the TensorFlow system - Google Patents
Design method for FPGA-based acceleration of the TensorFlow system
- Publication number: CN109447256A
- Application number: CN201811061386.1A
- Authority: CN (China)
- Prior art keywords: fpga, opencl, operator, language, kernel
- Prior art date: 2018-09-12
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
An acceleration method for the TensorFlow system based on FPGA, comprising the steps of: using Python as the upper-layer client program library; encapsulating the modules implemented with OpenCL as an interface for the upper-layer client program to call; preparing the FPGA operator development environment through OpenCL; developing FPGA operators through OpenCL; compiling kernel operators through OpenCL; and incorporating the FPGA device into the whole system. The present invention reduces development difficulty and allows the FPGA, as a device in the TensorFlow system, to offer guaranteed stability and practicality.
Description
Technical field
The present invention relates to the field of heterogeneous computing development, and specifically to re-implementing on a field-programmable gate array (hereinafter FPGA) certain operators in the TensorFlow artificial-intelligence system that are otherwise executed by the CPU.
Background art
With the development of artificial intelligence, deep neural networks have become increasingly popular in computer vision, natural language processing, and other interdisciplinary research fields. A deep neural network extracts features from its input through multiple stacked layers and uses a classifier to make the final decision, which means it contains large matrix or convolution operators. Recent evidence shows that the depth of a neural network is critical to its performance, which greatly increases the demand for computing power. Traditional CPUs cannot meet this growing demand: running a deep neural network to completion often takes a long time, making CPUs unsuitable for practical application scenarios. A common solution is to use a heterogeneous distributed environment with various computing devices, such as DSPs, GPUs, and custom hardware accelerators such as FPGAs. However, although FPGAs offer flexibility, energy efficiency, and cost effectiveness, they have not been incorporated into state-of-the-art deep-learning frameworks or systems, for example as computation accelerators in TensorFlow; this is closely related to the slow speed of the traditional FPGA development scheme. Therefore, building the FPGA into TensorFlow as a computing accelerator, and improving the development efficiency of FPGA accelerators through high-level synthesis, is both far-sighted and novel.
Summary of the invention
To address the problems that, in the prior art, the CPU cannot well support the development and running speed of artificial-intelligence neural networks, and that the TensorFlow artificial-intelligence system currently does not support FPGA devices, the present invention provides a design method for FPGA-based acceleration of the TensorFlow system, in which an FPGA is added to the TensorFlow system as a computation accelerator. This programmable logic device, the FPGA, is incorporated into TensorFlow as a basic computing device so that it completes some of the computation in TensorFlow. Meanwhile, a group of corresponding operator kernels is implemented on the FPGA device to obtain the computation speed of the various tensor operations on the FPGA and guarantee the acceleration effect.
The technical solution of the invention is as follows:
A method for accelerating the TensorFlow system based on FPGA, characterized by comprising the following steps:
The first step: use Python as the upper-layer client program library.
The second step: provide a suitable C-language interface, in which the modules implemented with OpenCL are encapsulated as an interface for the upper-layer client program to call;
The third step: prepare the FPGA operator development environment through OpenCL.
The fourth step: develop FPGA operators through OpenCL. The required kernel functions are first written in C; the BSP and OpenCL kernel compilation environment provided by the device vendor then automatically generates the corresponding binary stream files, and once a generated binary stream file has been programmed into the FPGA, the operator can run on the FPGA;
The fifth step: compile the kernel operators through OpenCL. The operator described in C serves as the kernel and is mapped through the BSP provided by the device vendor;
The sixth step: incorporate the FPGA device into the whole system. A C-language interface layer is added to the OpenCL C code, together with some device-ID registration code, so that the upper-layer Python and C++ clients can call it directly.
The preparation of the FPGA operator development environment through OpenCL specifically comprises:
querying devices by searching platforms to determine the models and number of FPGAs in the system;
creating buffers through the command queue to store the data to be operated on;
mapping the kernels written in C, through the BSP, into binary stream files executable on the FPGA;
setting the execution parameters, after which the OpenCL host side executes the kernels according to actual demand.
The execution parameters include the type and number of kernels to execute.
The compilation of kernel operators through OpenCL specifically comprises:
determining the FPGA model and its corresponding BSP;
developing the operators to be implemented in C;
executing the compilation instruction in the environment provided by the device vendor, which generates the corresponding netlist file and a binary stream file that can be programmed into the FPGA to produce the corresponding hardware.
Compared with the prior art, the advantage of the present invention is that adding an FPGA to the TensorFlow system to accelerate neural-network operations improves neural-network computation speed over the traditional TensorFlow system, which does not support FPGAs. Furthermore, the corresponding operators are developed through OpenCL; using the characteristics of OpenCL itself, this greatly reduces development difficulty compared with the traditional approach of developing operators in a hardware description language and then developing the corresponding data-stream management and communication management step by step, and allows the FPGA, as a device in the TensorFlow system, to offer guaranteed stability and practicality.
Brief description of the drawings
Fig. 1 is a flow chart of the design method for FPGA-based acceleration of the TensorFlow system according to the present invention;
Fig. 2 is an architecture diagram of the FPGA operator implementation;
Fig. 3 shows the OpenCL kernel implementation process.
Specific embodiment
The present invention is further explained below with reference to the accompanying drawings and embodiments, which should not be taken to limit the scope of protection of the invention.
The specific implementation of the present invention mainly includes the following steps:
The first step: select a suitable upper-layer client program library.
Implementing a neural network or other vector operations directly in a machine-level language would inevitably be very time-consuming, so TensorFlow provides integrated libraries packaged for high-level languages such as Python and C++. The implementation of the present invention also uses these libraries, because the functions they provide offer very good abstraction over both implementation and state.
As shown in Fig. 1, no matter which language library is used, the concrete operations are ultimately carried out, after a series of processes, on the devices at the device level. In other words, the upper-layer high-level language provides a unified programming model for the user's convenience. The present invention follows the same set of rules: the accelerating operators implemented on the FPGA are likewise packaged, step by step, into functions that upper-layer users can call directly from a high-level language.
The second step: provide a suitable C-language interface.
As shown in Fig. 2, although a high-level language is convenient for upper-layer development, the goal of the invention is to add the FPGA into the whole system, so the present invention still needs to make corresponding modifications to the C-language interface layer connecting the upper layer and the device layer.
The method used is as follows: referring to the existing implementations for CPU and GPU, the parts that can be reused are retained, and development then proceeds with OpenCL, because OpenCL itself is an open standard oriented toward heterogeneous systems and uses the C language as its implementation carrier. Moreover, new versions of TensorFlow have implemented support for OpenCL and for OpenCL devices, which substantially reduces the development difficulty of the interface and improves its stability and practicality.
The third step: prepare the FPGA operator development environment through OpenCL.
To accelerate an operator on an FPGA, the traditional method is to write the corresponding operator function in a hardware description language, but such development takes very long. OpenCL instead allows the operator function to be implemented in C and then realized on the FPGA through the board support package (hereinafter BSP) provided by the device vendor.
As shown in Fig. 3, OpenCL provides a standard development process with a large number of C-language interfaces, which can be reused in the second step. First, devices are queried by searching platforms to determine the models and number of FPGAs in the system. Next, buffers are created through the command queue to store the data to be operated on. The kernels written in C are then mapped through the BSP into binary stream files executable on the FPGA. Finally, execution parameters such as the type and number of kernels are set, and the OpenCL host side executes the kernels according to actual demand, so that the operator is realized through the FPGA. This process shows that OpenCL greatly reduces the difficulty of connecting the FPGA to the whole system, because the process itself completes most of the work in Fig. 2, from data transfer to device management.
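The four-stage host-side flow above can be sketched as follows. This is a minimal illustration of the call order only: the `fpga_*` functions are hypothetical stand-ins for the corresponding OpenCL host API calls named in the comments; a real host program would include `<CL/cl.h>` and link against the vendor's OpenCL runtime.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OpenCL host calls described in the text.
 * A real host program would use clGetPlatformIDs/clGetDeviceIDs,
 * clCreateBuffer, clCreateProgramWithBinary and clEnqueueNDRangeKernel;
 * here each stage just reports what it would do and returns success. */
static int fpga_query_devices(int *num_fpgas) {
    *num_fpgas = 1;                               /* pretend one board was found */
    printf("found %d FPGA device(s)\n", *num_fpgas);
    return 0;
}
static int fpga_create_buffers(size_t bytes) {
    printf("created buffers of %zu bytes\n", bytes);
    return 0;
}
static int fpga_load_binary(const char *aocx) {   /* BSP-compiled binary stream file */
    printf("loaded kernel binary %s\n", aocx);
    return 0;
}
static int fpga_run_kernel(const char *name, int work_items) {
    printf("executed kernel %s over %d work-items\n", name, work_items);
    return 0;
}

/* The four-stage flow of the third step: query devices, create buffers,
 * map the C kernel to a binary stream, then execute according to demand. */
int run_host_flow(void) {
    int num_fpgas = 0;
    if (fpga_query_devices(&num_fpgas)) return -1;
    if (fpga_create_buffers(1024 * sizeof(float))) return -1;
    if (fpga_load_binary("vec_add.aocx")) return -1;
    return fpga_run_kernel("vec_add", 1024);
}
```

Each stage maps one-to-one onto the boxes of Fig. 3; the only FPGA-specific part is loading a pre-compiled binary stream instead of compiling kernel source at run time.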
The fourth step: develop FPGA operators through OpenCL.
An operator to be completed on the FPGA can be written as a kernel function in C according to the standard provided by OpenCL. The BSP and OpenCL kernel compilation environment provided by the device vendor then automatically turns the C kernel function into an operator that can run on the FPGA hardware.
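As a concrete illustration, the body of such a kernel function is ordinary C. The sketch below shows a hypothetical element-wise vector-addition operator (not an operator named by the patent); the comment shows how the same body would look as an OpenCL C kernel, where the loop disappears because the NDRange launch supplies one work-item per element via `get_global_id(0)`.

```c
#include <stddef.h>

/* Plain-C form of a hypothetical vector-add operator. As an OpenCL
 * kernel the same logic would read:
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *out)
 *   { size_t i = get_global_id(0); out[i] = a[i] + b[i]; }
 */
void vec_add(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];   /* one element per loop iteration / work-item */
}
```

It is this per-element body that the vendor's BSP and kernel compiler map into a hardware pipeline on the FPGA.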
The fifth step: compile the kernel operators through OpenCL.
"Compile kernel" in Fig. 3 is the process of taking the operator described in C as the kernel and mapping it through the BSP provided by the device vendor. The concrete method is to determine the FPGA model and its corresponding BSP, develop the operator to be implemented in C, and then execute the compilation instruction in the environment provided by the device vendor, such as the Intel FPGA SDK for OpenCL. This generates the corresponding netlist and the binary stream for the actual hardware realization, allowing the FPGA to implement the operator directly.
The sixth step: incorporate the FPGA device into the whole system.
After the above steps, the kernel implementation and the device-layer part in Fig. 1 are complete. Because OpenCL is used, the communication between the host side and the FPGA does not need to be developed separately, and the data-flow executor part is also already complete. All that remains is to add a C-language interface layer to the OpenCL C code, plus some device-ID registration code, so that the upper-layer Python and C++ clients can call it directly. Through this encapsulation, upper-layer users of the TensorFlow system can implement neural networks in a high-level language while the FPGA-implemented operators accelerate the network.
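The device-ID registration mentioned above can be pictured as a small lookup table in the C interface layer: each FPGA operator is registered under a name and a device ID, and the upper-layer binding dispatches calls through it. The sketch below is purely illustrative; the names (`fpga_register_op`, `FPGA_DEVICE_ID`, the ReLU example) are hypothetical and do not correspond to TensorFlow's actual registration API.

```c
#include <string.h>
#include <stddef.h>

#define FPGA_DEVICE_ID 2   /* hypothetical ID; e.g. CPU = 0 and GPU = 1 */
#define MAX_OPS 16

/* An operator implementation takes input/output buffers and a length. */
typedef void (*op_fn)(const float *in, float *out, size_t n);

struct op_entry { const char *name; int device_id; op_fn fn; };
static struct op_entry g_ops[MAX_OPS];
static int g_num_ops = 0;

/* Register an operator under a name for a given device ID. */
int fpga_register_op(const char *name, int device_id, op_fn fn) {
    if (g_num_ops >= MAX_OPS) return -1;
    g_ops[g_num_ops++] = (struct op_entry){ name, device_id, fn };
    return 0;
}

/* Look up the implementation the upper-layer client should dispatch to;
 * returns NULL when no operator is registered for that name and device. */
op_fn fpga_find_op(const char *name, int device_id) {
    for (int i = 0; i < g_num_ops; ++i)
        if (g_ops[i].device_id == device_id && strcmp(g_ops[i].name, name) == 0)
            return g_ops[i].fn;
    return NULL;
}

/* Example operator: ReLU, standing in for an FPGA-backed kernel. */
static void relu_fpga(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

/* Called once at library load to make the FPGA operators visible. */
void fpga_register_all(void) {
    fpga_register_op("relu", FPGA_DEVICE_ID, relu_fpga);
}
```

With such a table, a Python or C++ client only ever names the operator and the device; whether the call lands on the CPU implementation or the FPGA binary stream is decided entirely inside the C interface layer.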
The present invention not only extends TensorFlow's support to a new device, the FPGA, but also innovates in the development mode: the FPGA is developed through OpenCL, an open, free programming environment designed for general parallel programming on heterogeneous systems. This allows TensorFlow to make full use of the FPGA's new operations and explore the FPGA's potential, accelerates FPGA development progress through the environment OpenCL provides, and reduces the many interface problems brought about by connecting the FPGA to TensorFlow. As a result, operator running speed is improved, and some new operators can be completed by the FPGA for TensorFlow to call.
Claims (4)
1. A method for accelerating the TensorFlow system based on FPGA, characterized by comprising the following steps:
The first step: use Python as the upper-layer client program library.
The second step: provide a suitable C-language interface, in which the modules implemented with OpenCL are encapsulated as an interface for the upper-layer client program to call;
The third step: prepare the FPGA operator development environment through OpenCL.
The fourth step: develop FPGA operators through OpenCL. The required kernel functions are first written in C; the BSP and OpenCL kernel compilation environment provided by the device vendor then automatically generates the corresponding binary stream files, and once a generated binary stream file has been programmed into the FPGA, the operator can run on the FPGA;
The fifth step: compile the kernel operators through OpenCL. The operator described in C serves as the kernel and is mapped through the BSP provided by the device vendor;
The sixth step: incorporate the FPGA device into the whole system. A C-language interface layer is added to the OpenCL C code, together with some device-ID registration code, so that the upper-layer Python and C++ clients can call it directly.
2. The method for accelerating the TensorFlow system based on FPGA according to claim 1, characterized in that preparing the FPGA operator development environment through OpenCL specifically comprises:
querying devices by searching platforms to determine the models and number of FPGAs in the system;
creating buffers through the command queue to store the data to be operated on;
mapping the kernels written in C, through the BSP, into binary stream files executable on the FPGA;
setting the execution parameters, after which the OpenCL host side executes the kernels according to actual demand.
3. The method for accelerating the TensorFlow system based on FPGA according to claim 2, characterized in that the execution parameters include the type and number of kernels to execute.
4. The method for accelerating the TensorFlow system based on FPGA according to claim 1, characterized in that compiling the kernel operators through OpenCL specifically comprises:
determining the FPGA model and its corresponding BSP;
developing the operators to be implemented in C;
executing the compilation instruction in the environment provided by the device vendor, which generates the corresponding netlist file and a binary stream file that can be programmed into the FPGA to produce the corresponding hardware.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811061386.1A CN109447256A (en) | 2018-09-12 | 2018-09-12 | Design method for FPGA-based acceleration of the TensorFlow system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109447256A true CN109447256A (en) | 2019-03-08 |
Family
ID=65532812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811061386.1A Pending CN109447256A (en) | 2018-09-12 | 2018-09-12 | Design method for FPGA-based acceleration of the TensorFlow system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109447256A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018077295A1 (en) * | 2016-10-31 | 2018-05-03 | 腾讯科技(深圳)有限公司 | Data processing method and apparatus for convolutional neural network |
CN107239315A (en) * | 2017-04-11 | 2017-10-10 | 北京深鉴智能科技有限公司 | Towards the programming model of neutral net heterogeneous computing platforms |
CN107992940A (en) * | 2017-12-12 | 2018-05-04 | 郑州云海信息技术有限公司 | Implementation method and device of a kind of convolutional neural networks on FPGA |
CN108090560A (en) * | 2018-01-05 | 2018-05-29 | 中国科学技术大学苏州研究院 | The design method of LSTM recurrent neural network hardware accelerators based on FPGA |
CN108520300A (en) * | 2018-04-09 | 2018-09-11 | 郑州云海信息技术有限公司 | A kind of implementation method and device of deep learning network |
Non-Patent Citations (2)
Title |
---|
XINYU ZHANG et al.: "A design methodology for efficient implementation of network on an FPGA", Computer Science, 2017 |
ZHU Huming et al.: "深度神经网络并行化研究综述" [Survey of research on deep neural network parallelization], 《计算机学报》 [Chinese Journal of Computers] |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399234A (en) * | 2019-07-10 | 2019-11-01 | 苏州浪潮智能科技有限公司 | A kind of task accelerated processing method, device, equipment and readable storage medium storing program for executing |
CN110928529A (en) * | 2019-11-06 | 2020-03-27 | 第四范式(北京)技术有限公司 | Method and system for assisting operator development |
WO2021088909A1 (en) * | 2019-11-06 | 2021-05-14 | 第四范式(北京)技术有限公司 | Method and system for assisting operator development |
CN111858036A (en) * | 2020-06-29 | 2020-10-30 | 浪潮电子信息产业股份有限公司 | Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium |
CN111858036B (en) * | 2020-06-29 | 2022-06-10 | 浪潮电子信息产业股份有限公司 | Tensorflow system acceleration method, device and equipment based on FPGA equipment and storage medium |
CN112001494A (en) * | 2020-08-20 | 2020-11-27 | 浪潮电子信息产业股份有限公司 | Method for realizing support of FPGA (field programmable Gate array) back-end equipment by nGraph framework |
US11762721B2 (en) | 2020-08-20 | 2023-09-19 | Inspur Electronic Information Industry Co., Ltd. | Method for realizing nGraph framework supporting FPGA rear-end device |
CN113496272A (en) * | 2021-05-10 | 2021-10-12 | 中国电子科技集团公司第十四研究所 | Convolutional neural network operation method based on heterogeneous platform |
CN114201154A (en) * | 2021-12-10 | 2022-03-18 | 北京百度网讯科技有限公司 | Operator generation method and device |
JP7403586B2 (en) | 2021-12-10 | 2023-12-22 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Operator generation method and device, electronic equipment, storage medium, and computer program |
CN116698411A (en) * | 2023-06-29 | 2023-09-05 | 重庆邮电大学空间通信研究院 | Rolling bearing health state early warning method and device based on convolutional neural network |
CN116698411B (en) * | 2023-06-29 | 2024-03-08 | 重庆邮电大学空间通信研究院 | Rolling bearing health state early warning method and device based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190308 |