CN109858610A - A kind of accelerated method of convolutional neural networks, device, equipment and storage medium - Google Patents
- Publication number: CN109858610A (application CN201910016345.9A, filed 2019-01-08)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Abstract
The invention discloses an acceleration method, apparatus, device and storage medium for a convolutional neural network (CNN). The method includes: receiving in advance calculating-operation models of multiple preset types for CNNs; from the multiple calculating-operation models, obtaining the models that can realize each calculating operation of the CNN to be accelerated, as the stand-by calculating-operation models; controlling the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating-operation models, a kernel program for executing the CNN to be accelerated; obtaining an action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and controlling the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter and to operate on preset data, so as to realize acceleration. With the present invention, the acceleration of any CNN to be accelerated can be executed on any single accelerator card, without developing multiple kinds of accelerator card; flexibility is higher and development cost is saved.
Description
Technical field
The present invention relates to the field of algorithm acceleration, and in particular to an acceleration method for convolutional neural networks; the present invention also relates to an acceleration apparatus, device and storage medium for convolutional neural networks.
Background technique
A CNN (Convolutional Neural Network) is a kind of artificial neural network. To meet requirements such as operation speed, an accelerator card is usually used to accelerate the calculation process of a CNN. However, there are many different types of CNN, and in the prior art, when the calculation process of a CNN is accelerated, a dedicated accelerator card for that type of CNN must be used; that is, each type of CNN requires its own dedicated accelerator card to realize acceleration. This is inflexible, and developing multiple types of accelerator card incurs high development cost.
Therefore, how to provide a scheme that solves the above technical problem is a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the present invention is to provide an acceleration method for convolutional neural networks that is more flexible and saves development cost; a further object of the present invention is to provide an acceleration apparatus, device and storage medium for convolutional neural networks with the same advantages.
In order to solve the above technical problem, the present invention provides an acceleration method for convolutional neural networks, comprising:
receiving in advance calculating-operation models of multiple preset types for convolutional neural networks (CNNs);
from the multiple calculating-operation models, obtaining the calculating-operation models that can realize each calculating operation of the CNN to be accelerated, as the stand-by calculating-operation models;
controlling the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating-operation models, a kernel program for executing the CNN to be accelerated;
obtaining an action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
controlling the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter, and to operate on preset data, so as to realize acceleration.
Preferably, obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated specifically comprises:
converting the CNN to be accelerated into a CNN to be accelerated in a predetermined deep-learning framework;
obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated in the predetermined deep-learning framework.
Preferably, the predetermined deep learning framework is caffe or TensorFlow.
Preferably, controlling the FPGA of the accelerator card to compile, according to the stand-by calculating-operation models, the kernel program for executing the CNN to be accelerated specifically comprises:
controlling the FPGA of the accelerator card to compile, according to the stand-by calculating-operation models and through its own hardware compilation platform, the kernel program for executing the CNN to be accelerated.
Preferably, receiving in advance the calculating-operation models of multiple preset types for convolutional neural networks specifically comprises:
receiving in advance calculating-operation models of multiple preset types for CNNs, implemented with the Open Computing Language (OpenCL).
Preferably, the preset types include the convolution operation, the pooling operation, the rectified linear unit (ReLU) function and the Norm function.
In order to solve the above technical problem, the present invention also provides an acceleration apparatus for convolutional neural networks, comprising:
a receiving module, for receiving in advance the calculating-operation models of multiple preset types for CNNs;
a first obtaining module, for obtaining, from the multiple calculating-operation models, the calculating-operation models that can realize each calculating operation of the CNN to be accelerated, as the stand-by calculating-operation models;
a first control module, for controlling the FPGA of an accelerator card to compile, according to the stand-by calculating-operation models, a kernel program for executing the CNN to be accelerated;
a second obtaining module, for obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
a second control module, for controlling the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter, and to operate on preset data, so as to realize acceleration.
Preferably, the second obtaining module comprises:
a conversion module, for converting the CNN to be accelerated into a CNN to be accelerated in the predetermined deep-learning framework;
an obtaining submodule, for obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated in the predetermined deep-learning framework.
In order to solve the above technical problem, the present invention also provides an acceleration device for convolutional neural networks, comprising:
a memory, for storing a computer program;
a processor, for realizing, when executing the computer program, the steps of the acceleration method for convolutional neural networks of any one of the above.
In order to solve the above technical problem, the present invention also provides a computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the steps of the acceleration method for convolutional neural networks of any one of the above are realized.
The present invention provides an acceleration method for convolutional neural networks, including: receiving in advance the calculating-operation models of multiple preset types for CNNs; from the multiple calculating-operation models, obtaining the models that can realize each calculating operation of the CNN to be accelerated, as the stand-by calculating-operation models; controlling the FPGA of an accelerator card to compile, according to the stand-by models, a kernel program for executing the CNN to be accelerated; obtaining an action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and controlling the FPGA to execute the kernel program according to that action sequence and to operate on preset data, so as to realize acceleration.
It can be seen that, whenever an acceleration operation is to be executed for any CNN, the calculating-operation models realizing each of its calculating operations can be obtained from the preset models of multiple types and used as the stand-by models; the FPGA in the accelerator card can then be controlled to compile the kernel program for executing the CNN to be accelerated according to the stand-by models; the action-sequence parameter containing the action sequence of each calculating operation can then be obtained, and the FPGA is controlled to execute the kernel program according to that action sequence and to operate on the preset data, realizing acceleration. The present invention can therefore execute the acceleration of any CNN to be accelerated with any single accelerator card, without developing multiple kinds of accelerator card; flexibility is higher, and development cost is saved.
The present invention also provides an acceleration apparatus, device and storage medium for convolutional neural networks, which have the same beneficial effects as the above acceleration method.
Detailed description of the invention
In order to describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the prior art and the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a kind of flow diagram of the accelerated method of convolutional neural networks provided by the invention;
Fig. 2 is a kind of structural schematic diagram of the accelerator of convolutional neural networks provided by the invention;
Fig. 3 is a kind of structural schematic diagram of the acceleration equipment of convolutional neural networks provided by the invention.
Specific embodiment
The core of the present invention is to provide an acceleration method for convolutional neural networks that is more flexible and saves development cost; another core of the invention is to provide an acceleration apparatus, device and storage medium for convolutional neural networks with the same advantages.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the protection scope of the present invention.
Referring to FIG. 1, Fig. 1 is a flow diagram of an acceleration method for convolutional neural networks provided by the present invention, which includes:
Step S1: receiving in advance the calculating-operation models of multiple preset types for CNNs;
Specifically, the calculating-operation models of the multiple preset types may be models each realizing one calculating operation commonly used in various convolutional neural networks; for example, calculating-operation model A may realize calculating operation A, calculating-operation model B may realize calculating operation B, and so on. Their number can be set independently according to demand, which is not limited in the embodiments of the present invention.
Specifically, the executing subject in the embodiments of the present invention may be a CPU. This step may specifically be that a storage module in the CPU receives in advance the calculating-operation models of the multiple preset types for CNNs, or that the CPU stores the models in the storage module after receiving them. In either case, once each calculating-operation model is present in the storage module, the subsequent steps can be executed, so as to realize the acceleration of various algorithms.
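The patent does not specify how the pre-received models are stored; one plausible host-side sketch of step S1 is a registry keyed by operation type (all names here, such as `OperationModelRegistry` and the `.cl` file names, are illustrative assumptions, not from the patent):

```python
# Hypothetical sketch of step S1: a host-side registry holding one pre-built
# calculating-operation model per preset operation type. A "model" is stood in
# for by the name of an OpenCL source file; real storage could differ.

class OperationModelRegistry:
    """Stores one calculating-operation model per preset operation type."""

    def __init__(self):
        self._models = {}

    def register(self, op_type, model):
        # e.g. op_type = "conv", model = handle/path of the pre-built model
        self._models[op_type] = model

    def lookup(self, op_type):
        if op_type not in self._models:
            raise KeyError(f"no pre-built model for operation type {op_type!r}")
        return self._models[op_type]

# Receive the preset models once, up front.
registry = OperationModelRegistry()
for op in ("conv", "pool", "relu", "norm"):
    registry.register(op, f"{op}_kernel.cl")
```

Once the registry is populated, the later steps can consult it for any CNN to be accelerated.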
Step S2: from the multiple calculating-operation models, obtaining the calculating-operation models that can realize each calculating operation of the CNN to be accelerated, as the stand-by calculating-operation models;
Specifically, the CNN to be accelerated may be any one of various CNNs, which is not limited in the embodiments of the present invention.
Each calculating operation in the CNN to be accelerated can first be obtained, that is, it is determined what each calculating operation in the CNN to be accelerated is; then, from the multiple calculating-operation models, the models that can realize each of these calculating operations are taken as the stand-by calculating-operation models, for use in the subsequent steps.
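The selection in step S2 amounts to covering the network's operation list with the preset models; a minimal sketch, with illustrative names and an AlexNet-like operation list assumed for the example:

```python
# Hypothetical sketch of step S2: given the operations a target CNN actually
# uses, pick the matching pre-built models as the "stand-by" set.

PRESET_MODELS = {"conv": "conv.cl", "pool": "pool.cl",
                 "relu": "relu.cl", "norm": "norm.cl"}

def select_standby_models(cnn_operations, preset_models=PRESET_MODELS):
    """Return the subset of preset models covering every operation of the CNN."""
    standby = {}
    for op in cnn_operations:
        if op not in preset_models:
            raise ValueError(f"CNN uses operation {op!r} with no preset model")
        standby[op] = preset_models[op]
    return standby

# An AlexNet-like network uses all four preset operation types.
alexnet_ops = ["conv", "relu", "norm", "pool", "conv", "relu", "pool"]
standby = select_standby_models(alexnet_ops)
```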
Step S3: controlling the FPGA (Field-Programmable Gate Array) of the accelerator card to compile, according to the stand-by calculating-operation models, the kernel program for executing the CNN to be accelerated;
Specifically, for the FPGA to smoothly accelerate the CNN to be accelerated, the kernel program for executing that CNN can be compiled according to the stand-by calculating-operation models; the FPGA can then execute the kernel program, in cooperation with the subsequent steps, to accelerate the CNN to be accelerated.
The acceleration realized by the FPGA in the embodiments of the present invention may be heterogeneous acceleration, applicable to various types of CNN, which is not limited here.
After the kernel program is compiled, it can be loaded into the FPGA for subsequent execution.
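Step S3 can be modeled as two bookkeeping stages: assemble a program from the stand-by model sources, then load it onto the board. The following is only a stand-in for the real flow (in practice an OpenCL-for-FPGA toolchain compiles the sources into a bitstream offline); all names are illustrative:

```python
# Illustrative only: models the host-side bookkeeping of step S3. The real
# compilation is done by the FPGA's hardware compilation platform; here a
# "program" is just a record of which stand-by sources it covers.

def build_kernel_program(standby_models):
    """Combine the stand-by model sources into one kernel-program object."""
    sources = sorted(standby_models.values())
    return {"sources": sources, "loaded": False}

def load_onto_fpga(program):
    # Stands in for programming the FPGA with the compiled result.
    program["loaded"] = True
    return program

program = load_onto_fpga(build_kernel_program(
    {"conv": "conv.cl", "relu": "relu.cl"}))
```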
Step S4: obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
Specifically, the action-sequence parameter of the CNN to be accelerated can be obtained in multiple ways; for example, it can be obtained by directly parsing the CNN to be accelerated, or from a pre-stored database, which is not limited here.
The action-sequence parameter may include the action sequence of the CNN to be accelerated, for example: execute action B after action A has finished, execute action D after action B has finished, and so on. The concrete form of the action sequence corresponds to the type of the CNN to be accelerated, which is not limited here.
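The patent leaves the encoding of the action-sequence parameter open; one natural sketch is an ordered list of per-operation records, from which the bare execution order can be read off (field names and layer parameters below are illustrative assumptions):

```python
# Hypothetical encoding of an action-sequence parameter: an ordered list of
# (operation, per-layer parameters) records describing the timing of the
# CNN's calculating operations. Values mimic an AlexNet-style first block.

action_sequence = [
    {"op": "conv", "out_channels": 96, "kernel": 11, "stride": 4},
    {"op": "relu"},
    {"op": "norm", "local_size": 5},
    {"op": "pool", "kernel": 3, "stride": 2},
]

def operation_order(sequence):
    """The bare ordering the FPGA must follow when executing the kernels."""
    return [step["op"] for step in sequence]
```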
Step S5: controlling the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter, and to operate on preset data, so as to realize acceleration.
Specifically, after the above steps are completed, the FPGA can be controlled to execute the kernel program according to the action sequence in the action-sequence parameter, and during this process to operate on the preset data, realizing the acceleration of the CNN to be accelerated and improving the operation speed.
The preset data may be data of multiple types, for example the face data obtained in a face-recognition process; under the control of the CPU, the preset data can be transferred into the FPGA through global memory for computation, which is not limited here.
The action-sequence parameter can be saved in a data array, and the read/write operations on the data in the array can then be controlled so that the data is passed into the global memory of the FPGA; the FPGA kernel program is started and reads from global memory the input data, which includes the action-sequence parameter and the preset data, so as to accelerate the algorithm.
In addition, the CPU can also obtain the operation result of the FPGA after the operation ends. The process of obtaining the operation result may be: controlling the FPGA to store the operation result, then the CPU obtaining the result from storage and outputting it in various forms, such as a diagram or a voice prompt, which is not limited here.
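The host-side driving loop of step S5 (data to global memory, kernels launched per action, result read back) can be sketched end to end; here each device kernel is stood in for by a trivial Python function, purely to make the control flow concrete:

```python
# Toy sketch of step S5. Real work happens in the FPGA kernel program; the
# two "kernels" below are placeholders, and "global_memory" is just a list.

KERNELS = {
    "relu":   lambda xs: [max(0.0, x) for x in xs],
    "scale2": lambda xs: [2.0 * x for x in xs],  # placeholder operation
}

def run_on_accelerator(input_data, action_sequence):
    global_memory = list(input_data)       # host -> device transfer
    for action in action_sequence:         # follow the action sequence
        global_memory = KERNELS[action](global_memory)
    return global_memory                   # device -> host readback

result = run_on_accelerator([-1.0, 2.0], ["relu", "scale2"])
```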
It should be noted that deep learning, as a branch of machine learning, is one of the fastest-growing fields in artificial intelligence and can help computers understand large amounts of data in the form of images, sound and text. Recently, as deep-learning open-source tools such as caffe (Convolutional Architecture for Fast Feature Embedding) have matured, deep-learning technology has developed rapidly; at present, deep learning is widely applied in fields such as face recognition, speech recognition, precision medicine and autonomous driving. CNN is a kind of artificial neural network and the first deep-learning algorithm to truly and successfully train a multi-layer network structure. Developers create CNNs with computation-intensive algorithms and implement them on a variety of platforms. Because a CNN processes data through multiple connected layers of neurons, it can mimic the behavior of the biological visual nervous system and obtain very high recognition accuracy; it has become a research hotspot of current speech analysis and image recognition. CNNs have been used in check-reading systems, OCR (Optical Character Recognition) and handwriting-recognition systems, face recognition and license-plate recognition in street view, and face recognition in France Telecom's video-conferencing system.
Most existing CNN implementations are based on a general-purpose CPU. In a CNN network structure, the calculations within a layer are independent of one another, and the inter-layer structure can be understood as a pipeline structure. Owing to this specific calculation pattern, a general-purpose CPU, by its own nature, cannot fully exploit the parallelism inside a CNN; CPU implementations of CNNs are therefore inefficient and hard-pressed to meet performance requirements. Recently, accelerators based on FPGAs, GPUs (Graphics Processing Units) and even ASICs (Application-Specific Integrated Circuits) have successively been proposed to improve CNN performance. Among these schemes, FPGA-based accelerators have attracted the attention of more and more researchers because of their better performance, high energy efficiency, fast development cycle and reconfigurability. The FPGA is a computation-intensive acceleration component: by mapping an algorithm onto the parallel hardware of the FPGA, each hardware module designed on the FPGA can execute in parallel; the interconnection of the hardware modules' inputs and outputs and the pipeline structure provided by the FPGA match the CNN algorithm well, making full use of the parallelism inside the network structure and reducing energy consumption while improving operation speed. Scholars have previously implemented CNNs of different structures on FPGAs for simple real-time image recognition or classification, but most of these works only compute the more complex convolutional layers or are based on one specific neural network; for example, Aydonat et al. proposed a completely new CNN implementation framework that performs heterogeneous FPGA acceleration of the AlexNet network. When developers need to apply FPGA heterogeneous acceleration to a new convolutional neural network, they must redesign the FPGA implementation architecture according to the specific structure of the new network, which has poor versatility and flexibility.
The present invention provides an acceleration method for convolutional neural networks, including: receiving in advance the calculating-operation models of multiple preset types for CNNs; from the multiple calculating-operation models, obtaining the models that can realize each calculating operation of the CNN to be accelerated, as the stand-by calculating-operation models; controlling the FPGA of the accelerator card to compile, according to the stand-by models, the kernel program for executing the CNN to be accelerated; obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and controlling the FPGA to execute the kernel program according to that action sequence and to operate on preset data, so as to realize acceleration.
It can be seen that, when an acceleration operation is to be executed for any CNN, the calculating-operation models realizing each of its calculating operations can be obtained from the preset models of multiple types and used as the stand-by models; the FPGA in the accelerator card is then controlled to compile the kernel program for executing the CNN to be accelerated according to the stand-by models; the action-sequence parameter containing the action sequence of each calculating operation is then obtained, and the FPGA is controlled to execute the kernel program according to that action sequence and to operate on the preset data, realizing acceleration. The present invention can therefore execute the acceleration of any CNN to be accelerated with any single accelerator card, without developing multiple kinds of accelerator card; flexibility is higher, and development cost is saved.
On the basis of the above embodiments:
As a preferred embodiment, obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated specifically comprises:
converting the CNN to be accelerated into a CNN to be accelerated in the predetermined deep-learning framework;
obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated in the predetermined deep-learning framework.
Specifically, considering that the CNN to be accelerated may come from deep-learning frameworks of multiple types, obtaining the action-sequence parameter directly from each of the different frameworks would require building every type of deep-learning framework in the CPU in advance. In the embodiments of the present invention, in order to save resources, only one predetermined deep-learning framework may be built in the CPU; in this case, it is only necessary to convert the CNN to be accelerated into a CNN of the predetermined framework, after which the CPU can obtain the action-sequence parameter of the CNN to be accelerated, saving resources.
Of course, the action-sequence parameter may also be obtained in other ways, for example by building multiple deep-learning frameworks in the CPU in advance and then directly obtaining the action-sequence parameter of the CNN to be accelerated, which is not limited here.
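At its simplest, the conversion step is a translation between framework layer vocabularies, so that a single parser for the predetermined framework can extract the action sequence. A hedged sketch, assuming caffe-style names as the target and a hand-written name table (the table and layer names are illustrative; real converters also translate weights and attributes):

```python
# Illustrative sketch: rename TensorFlow-style layer types into caffe-style
# equivalents, as one small part of converting a CNN into the predetermined
# deep-learning framework. The mapping table is an assumption for this sketch.

TF_TO_CAFFE = {
    "Conv2D": "Convolution",
    "MaxPool": "Pooling",
    "Relu": "ReLU",
    "LRN": "LRN",
}

def convert_layers(tf_layers):
    """Translate a list of TF-style layer types to caffe-style names."""
    return [TF_TO_CAFFE[layer] for layer in tf_layers]

converted = convert_layers(["Conv2D", "Relu", "LRN", "MaxPool"])
```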
As a preferred embodiment, the predetermined deep-learning framework is caffe or TensorFlow.
Specifically, caffe and TensorFlow are commonly used deep-learning frameworks; in this case, if the CNN to be accelerated is already a caffe or TensorFlow model, no deep-learning-framework conversion is needed, which further saves computing resources.
Of course, besides caffe and TensorFlow, the predetermined deep-learning framework may also be of another type, which is not limited here.
As a preferred embodiment, controlling the FPGA of the accelerator card to compile, according to the stand-by calculating-operation models, the kernel program for executing the CNN to be accelerated specifically comprises:
controlling the FPGA of the accelerator card to compile, according to the stand-by calculating-operation models and through its own hardware compilation platform, the kernel program for executing the CNN to be accelerated.
Specifically, compiling the kernel program with the FPGA's own hardware compilation platform saves cost, avoids exporting the data, and improves work efficiency.
Of course, ways other than using the FPGA's own hardware compilation platform to compile the kernel program may also be used, which is not limited here.
As a preferred embodiment, receiving in advance the calculating-operation models of multiple preset types for convolutional neural networks specifically comprises:
receiving in advance the calculating-operation models of multiple preset types for CNNs, implemented with OpenCL (Open Computing Language).
Specifically, OpenCL has advantages such as a simple structure and ease of use.
In the embodiments of the present invention, use can be made of the fact that the computing modules of the network layers in a CNN are mutually independent: each network-layer computing module commonly used in CNNs is implemented separately with OpenCL, the high-level programming language for the FPGA; the parallel optimization design of the OpenCL code is completed; the calculating-operation models of the multiple preset types are thus constructed; and all the calculating-operation models can be built into a network-layer calculation library.
Of course, the calculating-operation models may also be realized with programming languages other than OpenCL, which is not limited here.
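The network-layer calculation library described above can be pictured as a table mapping operation names to OpenCL kernel sources. The embedded kernel below is a plausible element-wise ReLU written for flavour; the patent gives no kernel code, so both the kernel and the library layout are assumptions:

```python
# Illustrative "network-layer calculation library": each preset operation
# maps to an OpenCL kernel source string the host can compile on demand.

RELU_KERNEL_SRC = """
__kernel void relu(__global const float *in, __global float *out) {
    size_t i = get_global_id(0);
    out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}
"""

CALC_LIBRARY = {"relu": RELU_KERNEL_SRC}

def kernel_names(library):
    # The host selects kernels out of the library by operation name.
    return sorted(library)
```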
As a preferred embodiment, the preset types include the convolution operation, the pooling operation, ReLU (Rectified Linear Unit) and the Norm function.
Specifically, the convolution operation, the pooling operation, ReLU and the Norm function are calculating operations commonly used in various CNNs, so these preset types can realize various kinds of CNN well.
Of course, the preset types may also include other types, which is not limited here.
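To make concrete what each preset operation type computes, here are unoptimized 1-D reference versions in plain Python (the FPGA models would be 2-D/3-D and parallel; the L2 norm below merely stands in for a Norm-style layer, which the patent does not define precisely):

```python
# Pure-Python reference sketches of the four preset operation types.

def relu(xs):
    return [max(0.0, x) for x in xs]

def pool_max(xs, size):
    # Non-overlapping max pooling.
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def conv1d(xs, weights):
    # 'Valid' 1-D convolution in the correlation form used by CNNs.
    n = len(weights)
    return [sum(xs[i + j] * weights[j] for j in range(n))
            for i in range(len(xs) - n + 1)]

def norm(xs, eps=1e-6):
    # Simple L2 normalization standing in for a Norm layer.
    scale = (sum(x * x for x in xs) + eps) ** 0.5
    return [x / scale for x in xs]
```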
Referring to FIG. 2, Fig. 2 is a kind of accelerator of convolutional neural networks provided by the invention, comprising:
a receiving module 1, configured to receive, in advance, computing operation models of multiple preset types in a preset convolutional neural network (CNN);
a first obtaining module 2, configured to obtain, from the multiple computing operation models, the computing operation models capable of implementing each computing operation of a CNN to be accelerated, as standby computing operation models;
a first control module 3, configured to control a field-programmable gate array (FPGA) of an accelerator card to compile, according to the standby computing operation models, a kernel program for executing the CNN to be accelerated;
a second obtaining module 4, configured to obtain an action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated; and
a second control module 5, configured to control the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter and to operate on preset data, thereby achieving acceleration.
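The five-module flow above can be imitated in a short host-side sketch. Everything in it — the model registry, the simulated "compile" step, and the sequence format — is a hypothetical illustration standing in for the FPGA/OpenCL toolchain, not the patented implementation:

```python
# Hypothetical host-side sketch of the five modules: a library of preset
# operation models, selection of the standby models needed by the target CNN,
# a simulated compile step standing in for the FPGA hardware compilation of
# the OpenCL sources, and execution driven by an action-sequence parameter.

MODEL_LIBRARY = {                          # receiving module (1): preset models
    "conv": lambda x, p: [v * p["weight"] for v in x],
    "pool": lambda x, p: [max(x)],
    "relu": lambda x, p: [max(0, v) for v in x],
}

def select_standby_models(required_ops):   # first obtaining module (2)
    return {op: MODEL_LIBRARY[op] for op in required_ops}

def compile_kernel(standby_models):        # first control module (3)
    def kernel(op, data, params):          # the "compiled kernel program"
        return standby_models[op](data, params)
    return kernel

def run(kernel, action_sequence, data):    # second control module (5)
    for op, params in action_sequence:     # action-sequence parameter (4)
        data = kernel(op, data, params)
    return data
```

A two-step sequence such as `[("conv", {"weight": 2}), ("relu", {})]` then drives the same compiled kernel through both operations in order, which is the role the action-sequence parameter plays in the apparatus.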
As a preferred embodiment, the second obtaining module 4 comprises:
a conversion module, configured to convert the CNN to be accelerated into a CNN to be accelerated of a predetermined deep-learning framework; and
an acquisition submodule, configured to obtain the action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated of the predetermined deep-learning framework.
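As a hypothetical illustration of the conversion module and the acquisition submodule, a framework-style network description (a Caffe-like list of layer dicts is assumed here purely for illustration) can be flattened into an ordered action-sequence parameter:

```python
# Hypothetical sketch: normalize a raw CNN description into a
# deep-learning-framework style layer list (conversion module), then
# extract the ordered action-sequence parameter (acquisition submodule).

def to_framework(layers):
    """Conversion module: normalize raw layer specs into framework form."""
    return [{"type": l["type"].lower(), "params": l.get("params", {})}
            for l in layers]

def action_sequence_parameter(framework_net):
    """Acquisition submodule: ordered (operation, params) pairs."""
    return [(layer["type"], layer["params"]) for layer in framework_net]
```

The resulting list of (operation, parameters) pairs is exactly the shape of sequence the second control module would feed to the kernel program.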
For an introduction to the acceleration apparatus for a convolutional neural network provided by the present invention, please refer to the foregoing embodiments of the acceleration method; details are not repeated here.
Please refer to FIG. 3. FIG. 3 shows acceleration equipment for a convolutional neural network provided by the present invention, comprising:
a memory 6, configured to store a computer program; and
a processor 7, configured to implement, when executing the computer program, the steps of the acceleration method for a convolutional neural network in the foregoing embodiments.
For an introduction to the acceleration equipment for a convolutional neural network provided by the present invention, please refer to the foregoing embodiments of the acceleration method; details are not repeated here.
The present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the acceleration method for a convolutional neural network in the foregoing embodiments are implemented.
For an introduction to the computer-readable storage medium provided by the present invention, please refer to the foregoing embodiments of the acceleration method; details are not repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may refer to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. An acceleration method for a convolutional neural network, characterized by comprising:
receiving, in advance, computing operation models of multiple preset types in a preset convolutional neural network (CNN);
obtaining, from the multiple computing operation models, the computing operation models capable of implementing each computing operation of a CNN to be accelerated, as standby computing operation models;
controlling a field-programmable gate array (FPGA) of an accelerator card to compile, according to the standby computing operation models, a kernel program for executing the CNN to be accelerated;
obtaining an action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated; and
controlling the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter and to operate on preset data, thereby achieving acceleration.
2. The acceleration method according to claim 1, characterized in that the obtaining of the action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated is specifically:
converting the CNN to be accelerated into the CNN to be accelerated of a predetermined deep-learning framework; and
obtaining the action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated of the predetermined deep-learning framework.
3. The acceleration method according to claim 2, characterized in that the predetermined deep-learning framework is Caffe or TensorFlow.
4. The acceleration method according to claim 2, characterized in that the controlling of the field-programmable gate array (FPGA) of the accelerator card to compile, according to the standby computing operation models, the kernel program for executing the CNN to be accelerated is specifically:
controlling the FPGA of the accelerator card to compile, according to the standby computing operation models and through its own hardware compilation platform, the kernel program for executing the CNN to be accelerated.
5. The acceleration method according to claim 4, characterized in that the receiving, in advance, of the computing operation models of multiple preset types in the preset convolutional neural network (CNN) is specifically:
receiving, in advance, computing operation models of multiple preset types in the preset CNN implemented with Open Computing Language (OpenCL).
6. The acceleration method according to any one of claims 1 to 5, characterized in that the preset types include a convolution operation, a pooling operation, a rectified linear unit (ReLU) function and a Norm function.
7. An acceleration apparatus for a convolutional neural network, characterized by comprising:
a receiving module, configured to receive, in advance, computing operation models of multiple preset types in a preset convolutional neural network (CNN);
a first obtaining module, configured to obtain, from the multiple computing operation models, the computing operation models capable of implementing each computing operation of a CNN to be accelerated, as standby computing operation models;
a first control module, configured to control a field-programmable gate array (FPGA) of an accelerator card to compile, according to the standby computing operation models, a kernel program for executing the CNN to be accelerated;
a second obtaining module, configured to obtain an action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated; and
a second control module, configured to control the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter and to operate on preset data, thereby achieving acceleration.
8. The acceleration apparatus according to claim 7, characterized in that the second obtaining module comprises:
a conversion module, configured to convert the CNN to be accelerated into the CNN to be accelerated of a predetermined deep-learning framework; and
an acquisition submodule, configured to obtain the action-sequence parameter containing the action sequence of each computing operation of the CNN to be accelerated of the predetermined deep-learning framework.
9. Acceleration equipment for a convolutional neural network, characterized by comprising:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the acceleration method for a convolutional neural network according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the acceleration method for a convolutional neural network according to any one of claims 1 to 6 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910016345.9A CN109858610A (en) | 2019-01-08 | 2019-01-08 | A kind of accelerated method of convolutional neural networks, device, equipment and storage medium |
PCT/CN2019/103637 WO2020143236A1 (en) | 2019-01-08 | 2019-08-30 | Method, device, and equipment for accelerating convolutional neural network, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109858610A true CN109858610A (en) | 2019-06-07 |
Family
ID=66894174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910016345.9A Pending CN109858610A (en) | 2019-01-08 | 2019-01-08 | A kind of accelerated method of convolutional neural networks, device, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109858610A (en) |
WO (1) | WO2020143236A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929860A (en) * | 2019-11-07 | 2020-03-27 | 深圳云天励飞技术有限公司 | Convolution acceleration operation method and device, storage medium and terminal equipment |
WO2020143236A1 (en) * | 2019-01-08 | 2020-07-16 | 广东浪潮大数据研究有限公司 | Method, device, and equipment for accelerating convolutional neural network, and storage medium |
WO2021077284A1 (en) * | 2019-10-22 | 2021-04-29 | 深圳鲲云信息科技有限公司 | Neural network operating system and method |
CN115829064A (en) * | 2023-02-17 | 2023-03-21 | 山东浪潮科学研究院有限公司 | Method, device and equipment for accelerating federated learning and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170103298A1 (en) * | 2015-10-09 | 2017-04-13 | Altera Corporation | Method and Apparatus for Designing and Implementing a Convolution Neural Net Accelerator |
CN107463990A (en) * | 2016-06-02 | 2017-12-12 | 国家计算机网络与信息安全管理中心 | A kind of FPGA parallel acceleration methods of convolutional neural networks |
US20180114117A1 (en) * | 2016-10-21 | 2018-04-26 | International Business Machines Corporation | Accelerate deep neural network in an fpga |
CN107992299A (en) * | 2017-11-27 | 2018-05-04 | 郑州云海信息技术有限公司 | Neutral net hyper parameter extraction conversion method, system, device and storage medium |
CN108710941A (en) * | 2018-04-11 | 2018-10-26 | 杭州菲数科技有限公司 | The hard acceleration method and device of neural network model for electronic equipment |
CN108764466A (en) * | 2018-03-07 | 2018-11-06 | 东南大学 | Convolutional neural networks hardware based on field programmable gate array and its accelerated method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10303953B2 (en) * | 2017-04-17 | 2019-05-28 | Intel Corporation | Person tracking and privacy and acceleration of data using autonomous machines |
CN107657581B (en) * | 2017-09-28 | 2020-12-22 | 中国人民解放军国防科技大学 | Convolutional neural network CNN hardware accelerator and acceleration method |
CN109858610A (en) * | 2019-01-08 | 2019-06-07 | 广东浪潮大数据研究有限公司 | A kind of accelerated method of convolutional neural networks, device, equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Cao Fang et al., "Implementation of a single-machine multi-card CNN acceleration algorithm based on FPGA", 2017 Power Industry Informatization Annual Conference * |
Also Published As
Publication number | Publication date |
---|---|
WO2020143236A1 (en) | 2020-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109858610A (en) | A kind of accelerated method of convolutional neural networks, device, equipment and storage medium | |
US11783227B2 (en) | Method, apparatus, device and readable medium for transfer learning in machine learning | |
CN111651207B (en) | Neural network model operation chip, method, device, equipment and medium | |
CN110287489A (en) | Document creation method, device, storage medium and electronic equipment | |
CN106528613B (en) | Intelligent answer method and device | |
CN110750298B (en) | AI model compiling method, equipment and storage medium | |
CN110717584A (en) | Neural network compiling method, compiler, computer device, and readable storage medium | |
Rosenbloom et al. | Towards emotion in sigma: from appraisal to attention | |
CN105894043A (en) | Method and system for generating video description sentences | |
CN113157917A (en) | OpenCL-based optimized classification model establishing and optimized classification method and system | |
US9336195B2 (en) | Method and system for dictionary noise removal | |
CN110109658B (en) | ROS code generator based on formalized model and code generation method | |
EP4318319A1 (en) | Model processing method and apparatus | |
Wen et al. | Taso: Time and space optimization for memory-constrained DNN inference | |
US20220067495A1 (en) | Intelligent processor, data processing method and storage medium | |
CN111831285B (en) | Code conversion method, system and application for memory computing platform | |
Benmeziane | Comparison of deep learning frameworks and compilers | |
CN111190690A (en) | Intelligent training device based on container arrangement tool | |
US20230419039A1 (en) | Named Entity Recognition Using Capsule Networks | |
Tarasyuk et al. | Stochastic process reduction for performance evaluation in dtsiPBC | |
CN110825530B (en) | Instruction execution method and device for artificial intelligence chip | |
CN106126311A (en) | A kind of intermediate code optimization method based on algebraically calculation | |
US20230214598A1 (en) | Semantic Frame Identification Using Capsule Networks | |
CN111475775B (en) | Data processing method, text processing method, device and equipment of graphic processor | |
Benmeziane | Accelerating a Deep Learning Framework with Tiramisu |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190607 |