CN109858610A - Acceleration method, apparatus, device and storage medium for a convolutional neural network - Google Patents

Acceleration method, apparatus, device and storage medium for a convolutional neural network

Info

Publication number
CN109858610A
CN109858610A (application CN201910016345.9A)
Authority
CN
China
Prior art keywords
cnn
accelerated
calculating operation
action sequence
neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910016345.9A
Other languages
Chinese (zh)
Inventor
王丽
曹芳
郭振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Inspur Smart Computing Technology Co Ltd
Original Assignee
Guangdong Inspur Big Data Research Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Inspur Big Data Research Co Ltd
Priority: CN201910016345.9A
Publication: CN109858610A
PCT application: PCT/CN2019/103637 (WO2020143236A1)
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an acceleration method, apparatus, device and storage medium for a convolutional neural network, the method comprising: receiving in advance calculating operation models of multiple preset types for convolutional neural networks (CNN); obtaining, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models; controlling the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated; obtaining an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and controlling the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration. With the present invention, any accelerator card can perform the acceleration operation for any CNN to be accelerated without developing multiple kinds of accelerator cards; flexibility is improved and development cost is saved.

Description

Acceleration method, apparatus, device and storage medium for a convolutional neural network
Technical field
The present invention relates to the field of algorithm acceleration, and in particular to an acceleration method for a convolutional neural network; the present invention also relates to an acceleration apparatus, device and storage medium for a convolutional neural network.
Background art
CNN (Convolutional Neural Network) is a kind of artificial neural network. To meet requirements such as computing speed, an accelerator card is usually used to accelerate the computing process of a CNN. However, there are many different types of CNN, and in the prior art, accelerating the computing process of a CNN requires an accelerator card dedicated to that type of CNN; that is, each type of CNN needs its own dedicated accelerator card to realize acceleration. Flexibility is therefore poor, and developing multiple types of accelerator cards incurs high development cost.
Therefore, how to provide a scheme that solves the above technical problem is a problem that those skilled in the art currently need to solve.
Summary of the invention
The object of the present invention is to provide an acceleration method for a convolutional neural network that is more flexible and saves development cost; a further object of the present invention is to provide an acceleration apparatus, device and storage medium for a convolutional neural network that are more flexible and save development cost.
To solve the above technical problem, the present invention provides an acceleration method for a convolutional neural network, comprising:
receiving in advance calculating operation models of multiple preset types for convolutional neural networks (CNN);
obtaining, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models;
controlling a field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated;
obtaining an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
controlling the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration.
Preferably, obtaining the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated specifically comprises:
converting the CNN to be accelerated into the CNN to be accelerated under a predetermined deep learning framework;
obtaining the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated under the predetermined deep learning framework.
Preferably, the predetermined deep learning framework is caffe or TensorFlow.
Preferably, controlling the FPGA of the accelerator card to compile, according to the stand-by calculating operation models, the kernel program for executing the CNN to be accelerated specifically comprises:
controlling the FPGA of the accelerator card to compile the kernel program for executing the CNN to be accelerated through its own hardware compilation platform, according to the stand-by calculating operation models.
Preferably, receiving in advance the calculating operation models of multiple preset types for convolutional neural networks (CNN) specifically comprises:
receiving in advance calculating operation models of multiple preset types implemented with the Open Computing Language (OpenCL).
Preferably, the preset types include a convolution operation, a pooling operation, the ReLU (Rectified Linear Unit) function and the Norm function.
To solve the above technical problem, the present invention also provides an acceleration apparatus for a convolutional neural network, comprising:
a receiving module, configured to receive in advance calculating operation models of multiple preset types for convolutional neural networks (CNN);
a first obtaining module, configured to obtain, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models;
a first control module, configured to control the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated;
a second obtaining module, configured to obtain an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
a second control module, configured to control the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration.
Preferably, the second obtaining module comprises:
a conversion module, configured to convert the CNN to be accelerated into the CNN to be accelerated under a predetermined deep learning framework;
an obtaining submodule, configured to obtain the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated under the predetermined deep learning framework.
To solve the above technical problem, the present invention also provides an acceleration device for a convolutional neural network, comprising:
a memory for storing a computer program;
a processor which, when executing the computer program, implements the steps of the acceleration method for a convolutional neural network described in any one of the above.
To solve the above technical problem, the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the acceleration method for a convolutional neural network described in any one of the above are implemented.
The present invention provides an acceleration method for a convolutional neural network, comprising: receiving in advance calculating operation models of multiple preset types for convolutional neural networks (CNN); obtaining, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models; controlling the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated; obtaining an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and controlling the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration.
It can be seen that when an acceleration operation is to be performed on any CNN, the present invention can obtain, from the calculating operation models of multiple preset types, the calculating operation models that realize each calculating operation of the CNN to be accelerated as stand-by calculating operation models; the FPGA in the accelerator card can then be controlled to compile, according to the stand-by calculating operation models, the kernel program for executing the CNN to be accelerated; the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated can then be obtained, and the FPGA can be controlled to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration. Thus any accelerator card can perform the acceleration operation for any CNN to be accelerated without developing multiple kinds of accelerator cards; flexibility is improved and development cost is saved.
The present invention also provides an acceleration apparatus, device and storage medium for a convolutional neural network, which have the same beneficial effects as the above acceleration method for a convolutional neural network.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention and in the prior art more clearly, the drawings needed in the description are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow diagram of an acceleration method for a convolutional neural network provided by the present invention;
Fig. 2 is a structural diagram of an acceleration apparatus for a convolutional neural network provided by the present invention;
Fig. 3 is a structural diagram of an acceleration device for a convolutional neural network provided by the present invention.
Specific embodiment
The core of the present invention is to provide an acceleration method for a convolutional neural network that is more flexible and saves development cost; another core of the present invention is to provide an acceleration apparatus, device and storage medium for a convolutional neural network that are more flexible and save development cost.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flow diagram of an acceleration method for a convolutional neural network provided by the present invention, comprising:
Step S1: receiving in advance calculating operation models of multiple preset types for CNNs;
Specifically, the calculating operation models of the multiple preset types may each realize one of the calculating operations commonly used in various convolutional neural networks; for example, calculating operation model A realizes calculating operation A, calculating operation model B realizes calculating operation B, and so on. Their number can be set freely according to demand; the embodiment of the present invention is not limited in this regard.
Specifically, the execution subject in the embodiment of the present invention may be a CPU. In this step, a storage module in the CPU may receive in advance the calculating operation models of the multiple preset types for CNNs, or the CPU may store the models in the storage module after receiving them. In either case, once the storage module contains each calculating operation model, the subsequent steps can be executed to realize the acceleration of various algorithms.
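To make step S1 concrete, here is a minimal sketch of such a model library, assuming a simple registry keyed by operation type; all names and kernel sources below are illustrative placeholders, not the patent's implementation:

```python
# Hypothetical registry of preset calculating operation models.
# Each entry maps an operation type to the OpenCL kernel source that
# would implement that operation on the FPGA (sources abbreviated).
OPERATION_MODEL_LIBRARY = {
    "conv": "__kernel void conv() { /* convolution */ }",
    "pool": "__kernel void pool() { /* pooling */ }",
    "relu": "__kernel void relu() { /* rectified linear unit */ }",
    "norm": "__kernel void norm() { /* normalization */ }",
}

def register_model(op_type, kernel_source):
    """Receive one preset calculating operation model and store it."""
    OPERATION_MODEL_LIBRARY[op_type] = kernel_source

# The library can later be extended with further preset types:
register_model("fc", "__kernel void fc() { /* fully connected */ }")
```

Once every model the storage module needs is registered, the selection in step S2 can run against this registry.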
Step S2: obtaining, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models;
Specifically, the CNN to be accelerated may be any one of various CNNs; the embodiment of the present invention is not limited in this regard.
Here, each calculating operation in the CNN to be accelerated can first be obtained, i.e. it is determined what each calculating operation in the CNN to be accelerated is; then, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated are obtained as the stand-by calculating operation models, so that the subsequent steps can be executed.
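The selection of stand-by models described in step S2 can be sketched as follows, under the simplifying assumption that operations are matched purely by type name (a hypothetical scheme, not the patent's matching rule):

```python
def select_standby_models(required_ops, library):
    """From the library of preset calculating operation models, pick the
    models that realize each calculating operation of the CNN to be
    accelerated; fail loudly if an operation has no preset model."""
    missing = sorted({op for op in required_ops if op not in library})
    if missing:
        raise KeyError(f"no preset model for operations: {missing}")
    # One stand-by model per distinct operation type.
    return {op: library[op] for op in set(required_ops)}

library = {"conv": "src_conv", "pool": "src_pool",
           "relu": "src_relu", "norm": "src_norm"}
# e.g. an AlexNet-like layer prefix uses these operation types in order:
standby = select_standby_models(["conv", "relu", "norm", "pool", "conv"],
                                library)
```

Raising an error for an unsupported operation makes explicit the precondition that the preset library must cover every calculating operation of the CNN to be accelerated.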
Step S3: controlling the FPGA (Field-Programmable Gate Array) of the accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated;
Specifically, for the FPGA to accelerate the CNN to be accelerated smoothly, a kernel program for executing the CNN to be accelerated can be compiled according to the stand-by calculating operation models; the FPGA can then execute the kernel program, in cooperation with the subsequent steps, to accelerate the CNN to be accelerated.
The acceleration realized through the FPGA in the embodiment of the present invention may be heterogeneous acceleration and is applicable to various types of CNN; the embodiment of the present invention is not limited in this regard.
After the kernel program is compiled, it can be loaded into the FPGA for subsequent execution.
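A minimal host-side sketch of step S3, with the actual FPGA compilation replaced by a string-level stand-in (on a real system this source would go through the accelerator card's OpenCL toolchain; every name here is an assumption):

```python
def build_kernel_program(standby_models):
    """Concatenate the stand-by calculating operation models into a single
    kernel program source for the CNN to be accelerated.  On real
    hardware this source would be handed to the FPGA vendor's OpenCL
    compiler; here the 'compilation' is only a string-level stand-in."""
    return "\n".join(standby_models[op] for op in sorted(standby_models))

def load_into_fpga(program_source):
    """Stand-in for loading the compiled kernel program into the FPGA."""
    return {"loaded": True, "source": program_source}

standby = {"conv": "__kernel void conv() {}",
           "relu": "__kernel void relu() {}"}
fpga_state = load_into_fpga(build_kernel_program(standby))
```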
Step S4: obtaining an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
Specifically, the action sequence parameter of the CNN to be accelerated can be obtained in multiple ways: for example, by directly parsing the CNN to be accelerated, or from a pre-stored database; the embodiment of the present invention is not limited in this regard.
The action sequence parameter may contain the action sequence of the CNN to be accelerated, for example: execute action B after action A has finished, execute action D after action B has finished, and so on. The concrete form of the action sequence corresponds to the type of the CNN to be accelerated; the embodiment of the present invention is not limited in this regard.
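As an illustration of step S4, an action sequence parameter can be modelled as an ordered list of per-layer entries; the structure and field names below are assumptions made for the sketch, not the patent's actual format:

```python
# Hypothetical action sequence parameter for a small CNN: the order of
# entries is the action sequence; each entry names the calculating
# operation to invoke and carries its layer configuration.
action_sequence_parameter = [
    {"op": "conv", "kernel": 3, "stride": 1, "out_channels": 16},
    {"op": "relu"},
    {"op": "norm", "local_size": 5},
    {"op": "pool", "kernel": 2, "stride": 2},
    {"op": "conv", "kernel": 3, "stride": 1, "out_channels": 32},
]

def action_sequence(parameter):
    """Extract just the execution order of the calculating operations."""
    return [entry["op"] for entry in parameter]
```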
Step S5: controlling the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration.
Specifically, after the above steps are completed, the FPGA can be controlled to execute the kernel program according to the action sequence in the action sequence parameter and, in the process, to perform operations on the preset data, realizing the acceleration of the CNN to be accelerated and improving the computing speed.
The preset data can be of multiple types, such as face data obtained during face recognition; under the control of the CPU, the preset data can be input into the FPGA through global memory for computation; the embodiment of the present invention is not limited in this regard.
The action sequence parameter can be saved in a data array, and the read/write operations on the array can then be controlled to transfer the data into the global memory of the FPGA; the FPGA kernel program is started and reads from global memory the input data, which includes the action sequence parameter and the preset data, so as to accelerate the algorithm.
In addition, the CPU can obtain the computation result from the FPGA after the computation ends. This can be done by controlling the FPGA to store the result; the CPU then obtains the result from storage and outputs it in various forms, such as a diagram or a voice prompt; the embodiment of the present invention is not limited in this regard.
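Putting steps S4 and S5 together, a CPU-side sketch with the FPGA replaced by in-process stand-ins; the operation implementations below are toy assumptions, chosen only to make the data flow concrete and testable:

```python
# Stand-ins for the FPGA kernels: each calculating operation is a plain
# function over a 1-D list of values, to make the control flow concrete.
def relu_op(data, **cfg):
    return [x if x > 0 else 0.0 for x in data]

def scale_norm_op(data, **cfg):
    # Toy normalization: divide by the largest magnitude.  This is NOT
    # the patent's Norm function; it is purely illustrative.
    m = max(abs(x) for x in data) or 1.0
    return [x / m for x in data]

KERNELS = {"relu": relu_op, "norm": scale_norm_op}

def run_action_sequence(parameter, preset_data):
    """Execute the kernels in the order given by the action sequence
    parameter, feeding each operation's output to the next one."""
    data = list(preset_data)
    for entry in parameter:
        cfg = {k: v for k, v in entry.items() if k != "op"}
        data = KERNELS[entry["op"]](data, **cfg)
    return data

result = run_action_sequence([{"op": "relu"}, {"op": "norm"}],
                             [-2.0, 1.0, 4.0])
```

Chaining each operation's output into the next mirrors the layer-by-layer pipeline the description attributes to the FPGA kernel program.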
It should be noted that deep learning, as a branch of machine learning, is one of the fastest-growing areas of artificial intelligence and can help computers understand large amounts of image, sound and text data. As open-source deep learning tools such as caffe (Convolutional Architecture for Fast Feature Embedding) have matured, deep learning technology has developed rapidly and is now widely applied in fields such as face recognition, speech recognition, precision medicine and autonomous driving. The CNN is a kind of artificial neural network and was the first deep learning algorithm to truly train a multi-layer network structure successfully. Developers create CNNs with computation-intensive algorithms and implement them on a variety of platforms. Because a CNN processes data through multiple connected layers of neurons, it can imitate the behavior of the biological visual nerve and achieve very high recognition accuracy; it has become a research hotspot in current speech analysis and image recognition. CNNs have been used in check reading systems, OCR (Optical Character Recognition) and handwriting recognition systems, face recognition and license plate recognition in street view, and face recognition in France Telecom's video conferencing system.
Most existing CNN implementations are based on general-purpose CPUs. In a CNN network structure, the computations within a layer are independent of one another, and the structure between layers can be understood as a pipeline. Because of this specific computation pattern, a general-purpose CPU cannot fully exploit the parallelism inside a CNN, so a CPU implementation is inefficient and can hardly meet the performance requirements. Recently, various accelerators based on FPGAs, GPUs (Graphics Processing Units) and even ASICs (Application-Specific Integrated Circuits) have been proposed to improve CNN performance. Among these schemes, FPGA-based accelerators have attracted the attention of more and more researchers because of their better performance, high energy efficiency, fast development cycle and reconfigurability. The FPGA serves as a computation-intensive acceleration component: by mapping the parallel parts of an algorithm onto hardware on the FPGA, the hardware modules designed on the FPGA can execute in parallel, and the interconnection of module inputs and outputs, together with the pipeline structure provided by the FPGA, matches the CNN algorithm well, fully exploiting the parallelism inside the network structure and reducing energy consumption while improving computing speed. Scholars have previously implemented CNNs of different structures on FPGAs for simple real-time image recognition or classification, but most of these works only implement the more computationally complex convolutional layer or target one specific neural network; for example, Aydonat et al. proposed a completely new CNN implementation framework and completed heterogeneous FPGA acceleration of the AlexNet network. When developers need to perform heterogeneous FPGA acceleration for a new convolutional neural network, the FPGA implementation architecture must be redesigned according to the specific structure of the new network, which has poor versatility and flexibility.
The present invention provides an acceleration method for a convolutional neural network, comprising: receiving in advance calculating operation models of multiple preset types for convolutional neural networks (CNN); obtaining, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models; controlling the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated; obtaining an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and controlling the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration.
It can be seen that when an acceleration operation is to be performed on any CNN, the calculating operation models that realize each calculating operation of the CNN to be accelerated can be obtained from the preset calculating operation models of multiple types as stand-by calculating operation models; the FPGA in the accelerator card is then controlled to compile, according to the stand-by calculating operation models, the kernel program for executing the CNN to be accelerated; the action sequence parameter containing the action sequence of each calculating operation is then obtained, and the FPGA is controlled to execute the kernel program according to that action sequence and to perform operations on preset data, thereby realizing acceleration. Thus any accelerator card can perform the acceleration operation for any CNN to be accelerated without developing multiple kinds of accelerator cards; flexibility is improved and development cost is saved.
On the basis of the above embodiments:
As a preferred embodiment, obtaining the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated specifically comprises:
converting the CNN to be accelerated into the CNN to be accelerated under a predetermined deep learning framework;
obtaining the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated under the predetermined deep learning framework.
Specifically, considering that the CNN to be accelerated may come from any of multiple types of deep learning frameworks, obtaining its action sequence parameter from different frameworks would require building each of those frameworks in the CPU in advance. In the embodiment of the present invention, in order to save resources, only one predetermined deep learning framework may be built in the CPU; in that case it is only necessary to convert the CNN to be accelerated into the CNN to be accelerated under the predetermined framework, after which the CPU can obtain the action sequence parameter of the CNN to be accelerated, saving resources.
Of course, the action sequence parameter may also be obtained in other ways, for example by building multiple deep learning frameworks in the CPU in advance and then obtaining the action sequence parameter of the CNN to be accelerated directly; the embodiment of the present invention is not limited in this regard.
As a preferred embodiment, the predetermined deep learning framework is caffe or TensorFlow.
Specifically, caffe and TensorFlow are common deep learning frameworks. In that case, if the CNN to be accelerated is already a caffe or TensorFlow model, no deep learning framework conversion is needed, which further saves computing resources.
Of course, besides caffe and TensorFlow, the predetermined deep learning framework may also be of other types; the embodiment of the present invention is not limited in this regard.
As a preferred embodiment, controlling the FPGA of the accelerator card to compile, according to the stand-by calculating operation models, the kernel program for executing the CNN to be accelerated specifically comprises:
controlling the FPGA of the accelerator card to compile the kernel program for executing the CNN to be accelerated through its own hardware compilation platform, according to the stand-by calculating operation models.
Specifically, compiling the kernel program with the FPGA's own hardware compilation platform can save cost and improve work efficiency, since no data needs to be exported.
Of course, besides compiling the kernel program with the FPGA's own hardware compilation platform, other ways may also be used; the embodiment of the present invention is not limited in this regard.
As a preferred embodiment, receiving in advance the calculating operation models of multiple preset types for CNNs specifically comprises:
receiving in advance calculating operation models of multiple preset types implemented with OpenCL (Open Computing Language).
Specifically, OpenCL has advantages such as a simple structure and ease of use.
In the embodiment of the present invention, the fact that the computing modules within a CNN network layer are independent of one another can be exploited: each commonly used network-layer computing module of a CNN is implemented separately with OpenCL, a high-level programming language for FPGAs, and the parallel optimization design in OpenCL is completed, so as to construct the calculating operation models of the multiple preset types; all the calculating operation models can be built into a network-layer computing library.
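As one illustrative entry of such a network-layer computing library, here is a plausible OpenCL ReLU kernel (written for this sketch, not taken from the patent) together with a pure-Python reference for the per-element behavior it is meant to compute:

```python
# A hypothetical OpenCL kernel source for the ReLU network-layer module,
# as one entry of the network-layer computing library.
RELU_KERNEL_SOURCE = """
__kernel void relu(__global const float *in,
                   __global float *out,
                   const int n) {
    int i = get_global_id(0);      /* one work-item per element */
    if (i < n)
        out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}
"""

def relu_reference(values):
    """Pure-Python reference for what the kernel computes per element."""
    return [v if v > 0 else 0.0 for v in values]
```

Because each element is handled by an independent work-item, the kernel exposes exactly the intra-layer parallelism the paragraph above describes.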
Of course, besides OpenCL, the calculating operation models may also be implemented with other programming languages; the embodiment of the present invention is not limited in this regard.
As a preferred embodiment, the preset types include the convolution operation, the pooling operation, the ReLU (Rectified Linear Unit) function and the Norm function.
Specifically, the convolution operation, the pooling operation, the ReLU function and the Norm function are calculating operations commonly used in various CNNs, so these preset types can realize various types of CNN well.
Of course, the preset types may also include other types; the embodiment of the present invention is not limited in this regard.
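For concreteness, pure-Python sketches of two of these preset operation types, max pooling and a simplified local response normalization; the text does not specify the exact Norm variant, so the formula below is an assumption:

```python
import math

def max_pool_1d(values, kernel=2, stride=2):
    """Max pooling over a 1-D sequence (illustrative; real CNN pooling
    slides a window over 2-D feature maps)."""
    return [max(values[i:i + kernel])
            for i in range(0, len(values) - kernel + 1, stride)]

def lrn(values, local_size=3, alpha=1.0, beta=0.5, k=1.0):
    """Simplified local response normalization across neighbors:
    out[i] = x[i] / (k + alpha * sum of squares in the window)**beta.
    The exact Norm variant is an assumption, not the patent's formula."""
    half = local_size // 2
    out = []
    for i, x in enumerate(values):
        window = values[max(0, i - half): i + half + 1]
        scale = k + alpha * sum(v * v for v in window)
        out.append(x / math.pow(scale, beta))
    return out
```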
Referring to Fig. 2, Fig. 2 is an acceleration apparatus for a convolutional neural network provided by the present invention, comprising:
a receiving module 1, configured to receive in advance calculating operation models of multiple preset types for convolutional neural networks (CNN);
a first obtaining module 2, configured to obtain, from the multiple calculating operation models, the calculating operation models that can realize each calculating operation of the CNN to be accelerated, as stand-by calculating operation models;
a first control module 3, configured to control the field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated;
a second obtaining module 4, configured to obtain an action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated;
a second control module 5, configured to control the FPGA to execute the kernel program according to the action sequence in the action sequence parameter and to perform operations on preset data, thereby realizing acceleration.
As a preferred embodiment, the second obtaining module 4 comprises:
a conversion module, configured to convert the CNN to be accelerated into the CNN to be accelerated under a predetermined deep learning framework;
an obtaining submodule, configured to obtain the action sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated under the predetermined deep learning framework.
For an introduction to the acceleration apparatus for a convolutional neural network provided by the present invention, please refer to the foregoing embodiments of the acceleration method; details are not repeated here.
Referring to Fig. 3, Fig. 3 is an acceleration device for a convolutional neural network provided by the present invention, comprising:
a memory 6 for storing a computer program;
a processor 7 which, when executing the computer program, implements the steps of the acceleration method for a convolutional neural network in the foregoing embodiments.
For the introduction of the acceleration device for a convolutional neural network provided by the present invention, please refer to the foregoing embodiments of the acceleration method; details are not described herein again.
The present invention further provides a computer-readable storage medium storing a computer program; when the computer program is executed by the processor 7, the steps of the acceleration method for a convolutional neural network in the foregoing embodiments are implemented.
For the introduction of the computer-readable storage medium provided by the present invention, please refer to the foregoing embodiments of the acceleration method; details are not described herein again.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and for the relevant parts, reference may be made to the description of the method.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An acceleration method for a convolutional neural network, characterized by comprising:
receiving in advance calculating operation models of multiple preset kinds in a preset convolutional neural network (CNN);
obtaining, from the multiple calculating operation models, the calculating operation models capable of realizing each calculating operation of a CNN to be accelerated, as stand-by calculating operation models;
controlling a field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated;
obtaining an action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and
controlling the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter and to perform operations on preset data, so as to realize acceleration.
2. The acceleration method according to claim 1, characterized in that the obtaining of the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated specifically comprises:
converting the CNN to be accelerated into the CNN to be accelerated of a predetermined deep learning framework; and
obtaining the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated of the predetermined deep learning framework.
3. The acceleration method according to claim 2, characterized in that the predetermined deep learning framework is Caffe or TensorFlow.
4. The acceleration method according to claim 2, characterized in that the controlling of the field-programmable gate array (FPGA) of the accelerator card to compile, according to the stand-by calculating operation models, the kernel program for executing the CNN to be accelerated specifically comprises:
controlling the FPGA of the accelerator card to compile, according to the stand-by calculating operation models and through its own hardware compiling platform, the kernel program for executing the CNN to be accelerated.
5. The acceleration method according to claim 4, characterized in that the receiving in advance of the calculating operation models of multiple preset kinds in the preset convolutional neural network (CNN) specifically comprises:
receiving in advance the calculating operation models of multiple preset kinds in the preset CNN written in Open Computing Language (OpenCL).
6. The acceleration method according to any one of claims 1 to 5, characterized in that the preset kinds include a convolution operation, a pooling operation, a rectified linear unit (ReLU) function, and a Norm function.
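Claim 6 enumerates four preset kinds of calculating operation. As a hedged illustration of what each kind computes (these are minimal host-side Python reference versions, not the OpenCL kernels the patent contemplates, and the simplified `norm` in particular is only an assumption about a Norm-style normalization):

```python
# Minimal reference versions of the four preset operation kinds.
# Illustrative sketches only; the patent's actual kernels run on an FPGA.

def conv1d(x, w):
    """Valid 1-D convolution (really cross-correlation, as used in CNNs)."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

def max_pool(x, size):
    """Non-overlapping max pooling with the given window size."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def relu(x):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in x]

def norm(x, alpha=1.0, beta=0.5, k=1.0):
    """Simplified response normalization over the whole vector (assumed form)."""
    s = sum(v * v for v in x)
    return [v / (k + alpha * s) ** beta for v in x]

x = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0]
y = relu(conv1d(x, [1.0, 0.0, -1.0]))   # convolution followed by ReLU
print(y)            # [0.0, 2.0, 0.0, 2.0]
print(max_pool(y, 2))  # [2.0, 2.0]
```

Each of these four kinds would correspond to one calculating operation model received by the method of claim 1, with the FPGA kernel chaining them in the order given by the action-sequence parameter.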
7. An acceleration apparatus for a convolutional neural network, characterized by comprising:
a receiving module, configured to receive in advance calculating operation models of multiple preset kinds in a preset convolutional neural network (CNN);
a first obtaining module, configured to obtain, from the multiple calculating operation models, the calculating operation models capable of realizing each calculating operation of a CNN to be accelerated, as stand-by calculating operation models;
a first control module, configured to control a field-programmable gate array (FPGA) of an accelerator card to compile, according to the stand-by calculating operation models, a kernel program for executing the CNN to be accelerated;
a second obtaining module, configured to obtain an action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated; and
a second control module, configured to control the FPGA to execute the kernel program according to the action sequence in the action-sequence parameter and to perform operations on preset data, so as to realize acceleration.
8. The acceleration apparatus according to claim 7, characterized in that the second obtaining module comprises:
a conversion module, configured to convert the CNN to be accelerated into the CNN to be accelerated of a predetermined deep learning framework; and
an obtaining submodule, configured to obtain the action-sequence parameter containing the action sequence of each calculating operation of the CNN to be accelerated of the predetermined deep learning framework.
9. An acceleration device for a convolutional neural network, characterized by comprising:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the acceleration method for a convolutional neural network according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the acceleration method for a convolutional neural network according to any one of claims 1 to 6 are implemented.
CN201910016345.9A 2019-01-08 2019-01-08 A kind of accelerated method of convolutional neural networks, device, equipment and storage medium Pending CN109858610A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910016345.9A CN109858610A (en) 2019-01-08 2019-01-08 A kind of accelerated method of convolutional neural networks, device, equipment and storage medium
PCT/CN2019/103637 WO2020143236A1 (en) 2019-01-08 2019-08-30 Method, device, and equipment for accelerating convolutional neural network, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910016345.9A CN109858610A (en) 2019-01-08 2019-01-08 A kind of accelerated method of convolutional neural networks, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109858610A true CN109858610A (en) 2019-06-07

Family

ID=66894174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910016345.9A Pending CN109858610A (en) 2019-01-08 2019-01-08 A kind of accelerated method of convolutional neural networks, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN109858610A (en)
WO (1) WO2020143236A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929860A (en) * 2019-11-07 2020-03-27 深圳云天励飞技术有限公司 Convolution acceleration operation method and device, storage medium and terminal equipment
WO2020143236A1 (en) * 2019-01-08 2020-07-16 广东浪潮大数据研究有限公司 Method, device, and equipment for accelerating convolutional neural network, and storage medium
WO2021077284A1 (en) * 2019-10-22 2021-04-29 深圳鲲云信息科技有限公司 Neural network operating system and method
CN115829064A (en) * 2023-02-17 2023-03-21 山东浪潮科学研究院有限公司 Method, device and equipment for accelerating federated learning and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103298A1 (en) * 2015-10-09 2017-04-13 Altera Corporation Method and Apparatus for Designing and Implementing a Convolution Neural Net Accelerator
CN107463990A (en) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 A kind of FPGA parallel acceleration methods of convolutional neural networks
US20180114117A1 (en) * 2016-10-21 2018-04-26 International Business Machines Corporation Accelerate deep neural network in an fpga
CN107992299A (en) * 2017-11-27 2018-05-04 郑州云海信息技术有限公司 Neutral net hyper parameter extraction conversion method, system, device and storage medium
CN108710941A (en) * 2018-04-11 2018-10-26 杭州菲数科技有限公司 The hard acceleration method and device of neural network model for electronic equipment
CN108764466A (en) * 2018-03-07 2018-11-06 东南大学 Convolutional neural networks hardware based on field programmable gate array and its accelerated method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303953B2 (en) * 2017-04-17 2019-05-28 Intel Corporation Person tracking and privacy and acceleration of data using autonomous machines
CN107657581B (en) * 2017-09-28 2020-12-22 中国人民解放军国防科技大学 Convolutional neural network CNN hardware accelerator and acceleration method
CN109858610A (en) * 2019-01-08 2019-06-07 广东浪潮大数据研究有限公司 A kind of accelerated method of convolutional neural networks, device, equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曹芳 (Cao Fang) et al.: "基于FPGA的CNN单机多卡加速算法实现" [Implementation of a single-machine multi-card CNN acceleration algorithm based on FPGA], 《2017电力行业信息化年会》 [2017 Power Industry Informatization Annual Conference] *


Also Published As

Publication number Publication date
WO2020143236A1 (en) 2020-07-16

Similar Documents

Publication Publication Date Title
CN109858610A (en) A kind of accelerated method of convolutional neural networks, device, equipment and storage medium
US11783227B2 (en) Method, apparatus, device and readable medium for transfer learning in machine learning
CN111651207B (en) Neural network model operation chip, method, device, equipment and medium
CN110287489A (en) Document creation method, device, storage medium and electronic equipment
CN106528613B (en) Intelligent answer method and device
CN110750298B (en) AI model compiling method, equipment and storage medium
CN110717584A (en) Neural network compiling method, compiler, computer device, and readable storage medium
Rosenbloom et al. Towards emotion in sigma: from appraisal to attention
CN105894043A (en) Method and system for generating video description sentences
CN113157917A (en) OpenCL-based optimized classification model establishing and optimized classification method and system
US9336195B2 (en) Method and system for dictionary noise removal
CN110109658B (en) ROS code generator based on formalized model and code generation method
EP4318319A1 (en) Model processing method and apparatus
Wen et al. Taso: Time and space optimization for memory-constrained DNN inference
US20220067495A1 (en) Intelligent processor, data processing method and storage medium
CN111831285B (en) Code conversion method, system and application for memory computing platform
Benmeziane Comparison of deep learning frameworks and compilers
CN111190690A (en) Intelligent training device based on container arrangement tool
US20230419039A1 (en) Named Entity Recognition Using Capsule Networks
Tarasyuk et al. Stochastic process reduction for performance evaluation in dtsiPBC
CN110825530B (en) Instruction execution method and device for artificial intelligence chip
CN106126311A (en) A kind of intermediate code optimization method based on algebraically calculation
US20230214598A1 (en) Semantic Frame Identification Using Capsule Networks
CN111475775B (en) Data processing method, text processing method, device and equipment of graphic processor
Benmeziane Accelerating a Deep Learning Framework with Tiramisu

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190607)