CN111931913A - Caffe-based deployment method of convolutional neural network on FPGA - Google Patents


Info

Publication number
CN111931913A
CN111931913A (application CN202010793360.7A)
Authority
CN
China
Prior art keywords
neural network
layer
convolutional neural
caffe
fpga
Prior art date
Legal status
Granted
Application number
CN202010793360.7A
Other languages
Chinese (zh)
Other versions
CN111931913B (en)
Inventor
杨鹏飞 (Yang Pengfei)
王泉 (Wang Quan)
张志强 (Zhang Zhiqiang)
梁瑀 (Liang Yu)
王振翼 (Wang Zhenyi)
李喜林 (Li Xilin)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202010793360.7A
Publication of CN111931913A
Application granted
Publication of CN111931913B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/34 Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G06F30/347 Physical level, e.g. placement or routing
    • G06N20/00 Machine learning
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a Caffe-based method for deploying a convolutional neural network on an FPGA (field-programmable gate array), which addresses the large time consumption, long deployment time and poor practical applicability of the prior art. The implementation steps are: acquire a training sample set; construct a convolutional neural network model C based on Caffe; train the model C based on Caffe; extract and store the parameters of the trained model C* based on Caffe; build a convolutional neural network C** in Verilog; and acquire the deployment result of the convolutional neural network on the FPGA. The invention uses Caffe to establish an easily controlled convolutional neural network model, improves the speed at which the convolutional neural network is deployed on the FPGA, and designs a mapping operation for parameter configuration so that the convolutional neural network can be deployed smoothly.

Description

Caffe-based deployment method of convolutional neural network on FPGA
Technical Field
The invention belongs to the technical field of hardware acceleration of convolutional neural networks and relates to a method for deploying a convolutional neural network on an FPGA (field-programmable gate array), in particular a Caffe-based deployment method, which can be used to develop artificial-intelligence hardware-acceleration platforms.
Background
Convolutional neural networks have the characteristics of weight sharing and sparse connectivity: they extract local features from the data while keeping the parameter count small, which makes them well suited to processing large data sets. With their high recognition accuracy, they have shown excellent performance in artificial intelligence and are widely applied to image classification, target localization, face recognition, skeleton recognition and other fields. The FPGA has become popular in hardware development mainly for two reasons. On the one hand, an FPGA integrates a large number of basic digital-circuit gate structures and storage units, and a user can change its internal logic structure by programming a configuration file into the device, thereby customizing the circuit. On the other hand, an FPGA is not a von Neumann architecture: the result computed by one unit can be sent directly to the next unit without being staged in memory, so the bandwidth requirement is very low, and the pipelined processing architecture responds quickly with low latency. FPGAs are therefore widely used in video and image processing, communications and digital signal processing.
A convolutional neural network can be deployed on a CPU, a GPU or an FPGA, with time consumption and deployment speed generally used as the metrics of a deployment. The computing capacity of a CPU is too low to exploit the characteristics of a convolutional neural network, and the power consumption of a GPU is too high, which limits its application scenarios. The FPGA combines low power consumption and low latency with abundant logic resources, and large-scale parallel computation can be realized by programming its internal structure. Deploying a convolutional neural network on an FPGA therefore fully exploits the network's parallelism: it reduces time consumption compared with deployment on a CPU and improves deployment speed compared with deployment on a GPU.
For example, the patent application with publication number CN 111104124 A, entitled "Fast deployment method of a PyTorch-based convolutional neural network on an FPGA", discloses a method with the following steps: first, a fast model-mapping mechanism is established through a naming rule. Then, an optimization strategy is computed under hardware-resource constraints, and a template library based on the hardware optimization strategy is established. Finally, the complex network model file is decomposed on the FPGA side in an adaptive, rule-mapping-based processing flow, the network is abstracted into a directed acyclic graph, and the neural network accelerator is generated. The method has the following defects: deploying a convolutional neural network with the PyTorch framework consumes much time and the deployment speed is slow; in addition, the method lacks a concrete procedure for calling the template library, so parameters may fail to be configured during an actual call, giving poor practical applicability.
Disclosure of Invention
The invention aims to provide a Caffe-based method for deploying a convolutional neural network on an FPGA (field-programmable gate array) that overcomes the defects of the prior art, namely high time consumption, slow deployment and poor practical applicability.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) obtaining a training sample set:
taking n images with class labels obtained from the Iris dataset as a training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n ≥ 10000;
(2) constructing a convolutional neural network model C based on Caffe:
constructing a convolutional layer A based on caffe.layers.Convolution, a pooling layer B based on caffe.layers.Pooling and m fully connected layers D based on caffe.layers.InnerProduct, and connecting the convolutional layer A, the pooling layer B and the m fully connected layers in sequence to obtain a convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution stride stride_conv; the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window stride stride_pool; the number of neurons connected by each fully connected layer D is q; and m ≥ 2;
(3) training a convolutional neural network model C based on Caffe:
(3a) initialize the iteration counter t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize a configuration file Solver with Caffe, setting in Solver the maximum number of iterations T, the initial learning rate α, the learning-rate update step stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1 and 0 < power < 1; and let t = 0, α_t = α, C_t = C;
(3b) call the convolutional neural network model C_t based on caffe.Layer, and at the same time call the configuration file Solver based on caffe.Solver;
(3c) randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict them to obtain the predicted class labels of the N samples, and compute the loss value E_t of C_t from the class label and predicted class label of each sample:
E_t = -(1/N) · Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i,k_i} · log(ŷ_{n_i,k_i})
(3d) using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layers D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t;
(3e) judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power, otherwise execute step (3f);
(3f) judge whether t = T holds; if so, the trained convolutional neural network model C* is obtained; otherwise let t = t + 1 and return to step (3c);
(4) extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) decompose the trained convolutional neural network C* with the get_layer function in Caffe to obtain the convolutional Layer A′, pooling Layer B′ and fully connected Layer D′ represented as Layer structures;
(4b) create an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ with the items function in Caffe, and store them in the dictionary file d;
(5) build a convolutional neural network C** in Verilog:
(5a) use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) using a mapping operation, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d, obtaining the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d, obtaining the configured pooling-layer module m_pool; configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d, obtaining the configured fully-connected-layer module m_fc; and connect m_conv, m_pool and m_fc in sequence to form the convolutional neural network C**;
(6) Acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) describe C** at RTL level in Verilog, and translate the RTL description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) according to the timing and area constraints, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
Compared with the prior art, the invention has the following advantages:
First, Caffe is used to train the convolutional neural network. Because Caffe executes as directly compiled code, it runs fast and can adjust the learning rate dynamically, so the convolutional neural network is trained in a shorter time; at the same time, the Caffe framework simplifies the deployment process with its built-in functions, reducing the time consumed by deploying the convolutional neural network.
Second, the invention uses Caffe to extract the trained convolutional neural network model C*. Caffe can store the convolutional, pooling and fully connected layers of the model directly in Layer structures through a single function; the information stored in a Layer structure is easy to inspect and manipulate and responds quickly when called, which effectively improves the deployment speed.
Third, a mapping method is adopted when configuring the parameters of the convolutional-layer, pooling-layer and fully-connected-layer templates, and several VIVADO tools are used to obtain the deployment result, which improves the practical applicability.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a flow chart of the implementation of the present invention for training the convolutional neural network model C based on Caffe.
FIG. 3 is a diagram of how the present invention acquires the deployment result of the convolutional neural network on the FPGA.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set:
The Iris dataset, a classic dataset in machine learning and artificial intelligence, comprises 150 data samples divided into 3 classes of 50 samples each, where every sample has 4 attributes: sepal length, sepal width, petal length and petal width, from which the Iris species can be predicted. The n images with class labels acquired from the Iris dataset are used as the training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image; in this example n = 10000;
step 2) constructing a convolutional neural network model C based on Caffe:
A convolutional layer A is constructed based on caffe.layers.Convolution, a pooling layer B based on caffe.layers.Pooling and m fully connected layers D based on caffe.layers.InnerProduct, and the convolutional layer A, the pooling layer B and the m fully connected layers are connected in sequence to obtain the convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution stride stride_conv; the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window stride stride_pool; and the number of neurons connected by each fully connected layer D is q. This construction method can directly define the names, types and parameters of the convolutional layer, pooling layer and fully connected layers through function configuration parameters, which effectively simplifies model building and improves the deployment speed. In this example m = 2, and the specific parameters of each layer of the convolutional neural network C are as follows:
the convolution kernel size of convolution layer A is 7 x 7, the number of convolution kernels is 64, and the convolution step size is 2;
The pooling window size of pooling layer B is set to 2 × 2, and the pooling window stride is 2;
the number of the neurons connected by the full connection layer D is 512;
The number of fully connected layers can be adjusted according to actual needs, balancing the cost in time and space against the desired model complexity.
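For illustration only, the example network above (convolution 7×7 with 64 kernels and stride 2, pooling 2×2 with stride 2, two fully connected layers) can be written out in Caffe's prototxt syntax with plain string templating; the helper functions and layer names below are hypothetical and do not require Caffe to be installed:

```python
def conv_layer(name, bottom, kernel, num_output, stride):
    # Emit a Caffe-style prototxt Convolution layer (text only).
    return ('layer {\n'
            '  name: "%s"  type: "Convolution"  bottom: "%s"  top: "%s"\n'
            '  convolution_param { kernel_size: %d num_output: %d stride: %d }\n'
            '}\n' % (name, bottom, name, kernel, num_output, stride))

def pool_layer(name, bottom, kernel, stride):
    # Emit a Caffe-style prototxt max-Pooling layer.
    return ('layer {\n'
            '  name: "%s"  type: "Pooling"  bottom: "%s"  top: "%s"\n'
            '  pooling_param { pool: MAX kernel_size: %d stride: %d }\n'
            '}\n' % (name, bottom, name, kernel, stride))

def fc_layer(name, bottom, num_output):
    # Emit a Caffe-style prototxt InnerProduct (fully connected) layer.
    return ('layer {\n'
            '  name: "%s"  type: "InnerProduct"  bottom: "%s"  top: "%s"\n'
            '  inner_product_param { num_output: %d }\n'
            '}\n' % (name, bottom, name, num_output))

# Network C of this example: conv 7x7/64/s2 -> pool 2x2/s2 -> two FC layers.
prototxt = (conv_layer("A", "data", 7, 64, 2)
            + pool_layer("B", "A", 2, 2)
            + fc_layer("D1", "B", 512)
            + fc_layer("D2", "D1", 512))
```

The generated text could be saved as a .prototxt file; the patent itself constructs the same layers programmatically through caffe.layers.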
Step 3) referring to fig. 2, training a convolutional neural network model C based on Caffe:
(3a) initialize the iteration counter t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize a configuration file Solver with Caffe, setting in Solver the maximum number of iterations T, the initial learning rate α, the learning-rate update step stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1 and 0 < power < 1; and let t = 0, α_t = α, C_t = C. In this example T = 70, α = 0.002, stepsize = 5000, gamma = 0.5, power = 0.5;
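As a sketch of the Solver configuration for the example values above: Caffe's SolverParameter exposes these settings as max_iter, base_lr, stepsize, gamma and power, and its "inv" learning-rate policy computes base_lr × (1 + gamma × iter)^(-power), which matches the update formula used in step (3e). The file-generation code is illustrative only:

```python
# Hedged sketch: write the solver hyper-parameters of this example
# (T=70, alpha=0.002, stepsize=5000, gamma=0.5, power=0.5) in
# Caffe solver.prototxt syntax. Field names follow Caffe's SolverParameter.
solver = {
    "max_iter": 70,
    "base_lr": 0.002,
    "lr_policy": '"inv"',   # inverse-decay policy: base_lr*(1+gamma*iter)^(-power)
    "stepsize": 5000,
    "gamma": 0.5,
    "power": 0.5,
}
solver_text = "\n".join("%s: %s" % (k, v) for k, v in solver.items())
```

Writing solver_text to solver.prototxt would give a file loadable by caffe.Solver.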
(3b) call the convolutional neural network model C_t based on caffe.Layer, and at the same time call the configuration file Solver based on caffe.Solver; both calls use the `using` calling method, which lets the required type be called directly without specifying the full namespace, reducing the time consumed by the calls;
(3c) randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict them to obtain the predicted class labels of the N samples, and compute the loss value E_t of C_t from the class label and predicted class label of each sample. In this example N = 500, and E_t is calculated as:

E_t = -(1/N) · Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i,k_i} · log(ŷ_{n_i,k_i})

where n_i is the serial number of a selected sample, c is the total number of columns of the class labels of the training sample set, k_i is the column number of a sample's class label, y_{n_i,k_i} is the k_i-th column element of the class label of the n_i-th sample, ŷ_{n_i,k_i} is the k_i-th column element of the predicted class label of the n_i-th sample, and log denotes the logarithm to base e;
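A pure-Python reference model of this cross-entropy loss (no Caffe dependency; the sample values below are made up for illustration):

```python
import math

def loss(labels, preds):
    # E_t = -(1/N) * sum_i sum_k y[i][k] * ln(p[i][k]),
    # with one-hot labels y and predicted probabilities p.
    N = len(labels)
    total = 0.0
    for y, p in zip(labels, preds):
        # only the column holding the true class contributes for one-hot y
        total += sum(yk * math.log(pk) for yk, pk in zip(y, p) if yk > 0)
    return -total / N

# Two samples, three classes (illustrative values).
y = [[1, 0, 0], [0, 1, 0]]
p = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
E = loss(y, p)
```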
(3d) using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layers D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t. The update formulas are:

ω_{t+1} = ω_t - α_t · ∂E_t/∂ω_t
θ_{t+1} = θ_t - α_t · ∂E_t/∂θ_t

where ω_{t+1} are the updated convolution kernel weights, θ_{t+1} are the updated fully-connected-layer connection parameters, and ∂/∂ denotes the derivative operation;
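The update rule is plain gradient descent; a minimal sketch with illustrative numbers:

```python
def sgd_step(w, grad, lr):
    # omega_{t+1} = omega_t - alpha_t * dE/d(omega); same rule applies to theta.
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# Illustrative weights and gradients, lr = alpha = 0.002 as in the example.
w_next = sgd_step([0.5, -0.2], [0.1, -0.4], 0.002)
```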
(3e) judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power, otherwise execute step (3f). The learning-rate update formula is:

α_{t+1} = α_t × (1 + gamma × t)^(-power)

where α_{t+1} is the updated learning rate;
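This schedule is easy to check numerically; a minimal sketch:

```python
def update_lr(alpha_t, gamma, power, t):
    # alpha_{t+1} = alpha_t * (1 + gamma * t) ** (-power)
    return alpha_t * (1.0 + gamma * t) ** (-power)

# With gamma = 0.5 and power = 0.5 as in the example, at t = 8:
# (1 + 0.5*8)^(-0.5) = 1/sqrt(5), so the rate shrinks by about 55%.
a = update_lr(0.002, 0.5, 0.5, 8)
```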
(3f) judge whether t = T holds; if so, the trained convolutional neural network model C* is obtained; otherwise let t = t + 1 and return to step (3c);
Step 4) extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) decompose the trained convolutional neural network C* with the get_layer function in Caffe to obtain the convolutional Layer A′, pooling Layer B′ and fully connected Layer D′ represented as Layer structures;
(4b) create an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ with the items function in Caffe, and store them in the dictionary file d;
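A sketch of what the dictionary file d might hold for the example network, serialized to JSON purely for illustration (the key names are hypothetical; the patent does not fix a file format):

```python
import io
import json

# Extracted layer parameters of the example network, stored as plain
# key/value pairs keyed by layer name (key names are an assumption).
d = {
    "A": {"scale_conv": 7, "num": 64, "stride_conv": 2},
    "B": {"scale_pool": 2, "stride_pool": 2},
    "D": {"q": 512},
}

# Round-trip through a file-like buffer to mimic saving/loading d.
buf = io.StringIO()
json.dump(d, buf)
restored = json.loads(buf.getvalue())
```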
Step 5) build a convolutional neural network C** in Verilog:
(5a) use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) using a mapping operation, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d, obtaining the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d, obtaining the configured pooling-layer module m_pool; and configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d, obtaining the configured fully-connected-layer module m_fc.
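The mapping operation amounts to substituting values from the dictionary file d into parameter slots of the hardware templates. A sketch using Python's string.Template to generate a Verilog parameter override (the module and port names are hypothetical):

```python
from string import Template

# Verilog instantiation template with $-placeholders for the mapped values.
conv_template = Template(
    "conv_layer #(.KERNEL($scale), .NUM_KERNELS($num), .STRIDE($stride))\n"
    "    m_conv (.clk(clk), .din(din), .dout(conv_out));\n"
)

# Values taken from the dictionary file d of the example network.
d = {"A": {"scale_conv": 7, "num": 64, "stride_conv": 2}}
m_conv = conv_template.substitute(
    scale=d["A"]["scale_conv"],
    num=d["A"]["num"],
    stride=d["A"]["stride_conv"],
)
```

The same substitution pattern applies to the pooling-layer and fully-connected-layer templates.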
Traverse the input feature-map sizes and the input/output channel sizes of the convolutional neural network, searching for hardware design parameters that satisfy the formula below: the output-feature-map parallelism Tm, the input-feature-map parallelism Tn, and the height Tr and width Tc of the blocked feature map. If the resource occupation of a candidate group of hardware design parameters is less than the total resources and its overall computation delay is the minimum found so far, store that group of parameters; otherwise continue searching. The hardware design parameters achieving the minimum delay are finally obtained;
T = ⌈M/Tm⌉ × ⌈N/Tn⌉ × ⌈H/Tr⌉ × ⌈W/Tc⌉

where T represents the overall computation delay, H the height of the feature map, W the width of the feature map, M the number of output channels and N the number of input channels;
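A minimal sketch of this design-space search, assuming the delay is modeled as the product of blocked-loop trip counts and the resource cost as Tm × Tn multipliers (the cost model and candidate tile sizes are illustrative assumptions, not the patent's):

```python
from math import ceil

def search_tiling(H, W, M, N, resource_limit):
    # Exhaustively search (Tm, Tn, Tr, Tc) minimizing the blocked-loop
    # delay ceil(M/Tm)*ceil(N/Tn)*ceil(H/Tr)*ceil(W/Tc), subject to an
    # illustrative resource model of Tm*Tn parallel multipliers.
    best, best_delay = None, float("inf")
    for Tm in range(1, M + 1):
        for Tn in range(1, N + 1):
            if Tm * Tn > resource_limit:      # exceeds total resources
                continue
            for Tr in (7, 14, 28):            # candidate block heights
                for Tc in (7, 14, 28):        # candidate block widths
                    delay = (ceil(M / Tm) * ceil(N / Tn)
                             * ceil(H / Tr) * ceil(W / Tc))
                    if delay < best_delay:
                        best, best_delay = (Tm, Tn, Tr, Tc), delay
    return best, best_delay

# Example: 28x28 maps, 64 output / 3 input channels, 96 multipliers.
params, delay = search_tiling(H=28, W=28, M=64, N=3, resource_limit=96)
```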
will Tm×TnxW weights and TnSize is Tr×TcIs input to the convolution module mconvBuffer of (1), and then buffering T from the buffernInputting the input characteristic diagram of each input channel and the corresponding number of weights into a convolution calculation unit, simultaneously performing multiplication and accumulation, and performing convolution window sliding on the block input characteristic diagram until the block input characteristic diagram slides to Tr×TcAfter the position of the point, the convolution calculation of the group of input feature maps is finished, and the convolution calculation result is input to mpool
Load the Tm output feature maps obtained from the convolution into the buffer of m_pool. When the pooling operation starts, the values at the same positions of the Tm feature maps are fed into the comparison unit, which takes the maximum of the feature values as the feature value of the output feature map, completing the pooling operation; the pooling result in m_pool is then input to m_fc.
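The comparison unit implements max pooling; a pure-Python reference model of one feature map (illustrative only):

```python
def max_pool(fmap, k, stride):
    # Keep the maximum value inside each k x k pooling window.
    H, W = len(fmap), len(fmap[0])
    return [[max(fmap[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(0, W - k + 1, stride)]
            for i in range(0, H - k + 1, stride)]

# 2x2 window, stride 2, matching the example pooling layer B.
pooled = max_pool([[1, 3, 2, 4],
                   [5, 2, 8, 1],
                   [0, 6, 7, 2],
                   [9, 4, 3, 5]], 2, 2)
```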
Connect m_conv, m_pool and m_fc to form the convolutional neural network C**.
Step 6) referring to fig. 3, acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) describe C** at RTL level in Verilog, and translate the RTL description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) according to the timing and area constraints, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
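Steps (6a)-(6c) correspond to a standard non-project Vivado Tcl flow. A sketch that emits such a script (the top-module, part and file names are hypothetical; the Tcl command names are standard Vivado batch commands):

```python
def vivado_script(top, part, bit_file):
    # Emit a synthesis -> implementation -> bitstream Tcl flow as text.
    steps = [
        "read_verilog {%s.v}" % top,
        "synth_design -top %s -part %s" % (top, part),   # Synthesis (6a)
        "opt_design",                                    # Implementation (6b)
        "place_design",
        "route_design",
        "write_bitstream -force %s" % bit_file,          # Generate Bitstream (6c)
    ]
    return "\n".join(steps)

# Hypothetical top module and Zynq-7020 part number.
tcl = vivado_script("cnn_top", "xc7z020clg400-1", "cnn_top.bit")
```

The resulting script would be run with `vivado -mode batch -source run.tcl`; burning the bitstream into Flash is a separate hardware-manager step.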

Claims (5)

1. A Caffe-based convolutional neural network deployment method on an FPGA is characterized by comprising the following steps:
(1) obtaining a training sample set:
taking n images with class labels obtained from the Iris dataset as a training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n ≥ 10000;
(2) constructing a convolutional neural network model C based on Caffe:
constructing a convolutional layer A based on caffe.layers.Convolution, a pooling layer B based on caffe.layers.Pooling and m fully connected layers D based on caffe.layers.InnerProduct, and connecting the convolutional layer A, the pooling layer B and the m fully connected layers in sequence to obtain a convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution stride stride_conv; the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window stride stride_pool; the number of neurons connected by each fully connected layer D is q; and m ≥ 2;
(3) training a convolutional neural network model C based on Caffe:
(3a) initialize the iteration counter t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize a configuration file Solver with Caffe, setting in Solver the maximum number of iterations T, the initial learning rate α, the learning-rate update step stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1 and 0 < power < 1; and let t = 0, α_t = α, C_t = C;
(3b) call the convolutional neural network model C_t based on caffe.Layer, and at the same time call the configuration file Solver based on caffe.Solver;
(3c) randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict them to obtain the predicted class labels of the N samples, and compute the loss value E_t of C_t from the class label and predicted class label of each sample:
E_t = -(1/N) · Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i,k_i} · log(ŷ_{n_i,k_i})
(3d) using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layers D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t;
(3e) judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power, otherwise execute step (3f);
(3f) judge whether t = T holds; if so, the trained convolutional neural network model C* is obtained; otherwise let t = t + 1 and return to step (3c);
(4) extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) decompose the trained convolutional neural network C* with the get_layer function in Caffe to obtain the convolutional Layer A′, pooling Layer B′ and fully connected Layer D′ represented as Layer structures;
(4b) create an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ with the items function in Caffe, and store them in the dictionary file d;
(5) build a convolutional neural network C** in Verilog:
(5a) use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) using a mapping operation, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d, obtaining the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d, obtaining the configured pooling-layer module m_pool; configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d, obtaining the configured fully-connected-layer module m_fc; and connect m_conv, m_pool and m_fc in sequence to form the convolutional neural network C**;
(6) acquiring the deployment result of the convolutional neural network on the FPGA:
(6a) describing C** at the RTL level with Verilog, and translating the RTL-level description of C** into a logic netlist using the Synthesis tool of Vivado;
(6b) optimizing the placement and routing of the logic netlist on the FPGA with the Implementation tool of Vivado according to the timing constraints and area constraints, generating an optimized logic netlist;
(6c) converting the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA through the Generate Bitstream tool of Vivado, and burning the bitstream file into the Flash memory on the FPGA.
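Steps (6a)-(6c) are commonly driven by a Vivado Tcl script. As a sketch, the snippet below emits such a script using standard Vivado non-project-mode commands (`synth_design`, `opt_design`, `place_design`, `route_design`, `write_bitstream`); the top-module name, part number, and file names are placeholders, not values from the patent:

```python
# Emit a Vivado non-project-mode Tcl script covering steps (6a)-(6c).
# Part number and file names are placeholder assumptions.
tcl_lines = [
    "read_verilog c_star_star.v",                    # RTL description of C**
    "synth_design -top top -part xc7z020clg400-1",   # (6a) translate to logic netlist
    "opt_design",                                    # (6b) optimize under constraints
    "place_design",                                  # (6b) placement
    "route_design",                                  # (6b) routing
    "write_bitstream -force c_star_star.bit",        # (6c) bitstream for deployment
]
with open("build.tcl", "w") as f:
    f.write("\n".join(tcl_lines) + "\n")
```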
2. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the convolutional neural network C in step (2) comprises two fully connected layers D, and the specific parameters of each layer of the convolutional neural network C are as follows:
the convolution kernel size of the convolutional layer A is 7 × 7, the number of convolution kernels is 64, and the convolution stride is 2;
the pooling window size of the pooling layer B is 2 × 2, and the pooling window stride is 2;
the number of connected neurons in the fully connected layer D is 512.
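As a worked example of what these parameters imply for feature-map sizes, the snippet below applies the standard output-size formula to the layer parameters of claim 2; the 224 × 224 input size and zero padding are assumptions, since the patent does not state them:

```python
def conv_out(size, kernel, stride, pad=0):
    """Output edge length after sweeping a kernel/pooling window over an input."""
    return (size + 2 * pad - kernel) // stride + 1

# Layer parameters from claim 2; the 224x224 input is an illustrative assumption.
h = 224
h = conv_out(h, kernel=7, stride=2)   # convolutional layer A: 7x7 kernels, stride 2
h_after_conv = h
h = conv_out(h, kernel=2, stride=2)   # pooling layer B: 2x2 window, stride 2
h_after_pool = h
```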
3. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the loss value E_t of C_t in step (3c) is calculated by the following formula:

E_t = −Σ_{n_i} Σ_{k_i=1}^{c} y_{n_i,k_i} × log(ŷ_{n_i,k_i})

wherein n_i denotes the serial number of a selected sample, c denotes the total number of columns of the class labels of the training sample set, k_i denotes the column number of a sample class label, y_{n_i,k_i} denotes the k_i-th column element in the class label of the n_i-th sample, log denotes the logarithm with base e, and ŷ_{n_i,k_i} denotes the k_i-th column element in the predicted class label of the n_i-th sample.
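A small numeric illustration of the loss of claim 3, computed over two samples with c = 3 label columns; the one-hot labels and predicted probabilities are illustrative values, not data from the patent:

```python
import math

def cross_entropy(labels, preds):
    """Loss of claim 3: -sum over samples n_i and columns k_i of y * log(y_hat)."""
    return -sum(y * math.log(p)
                for row_y, row_p in zip(labels, preds)
                for y, p in zip(row_y, row_p))

# Two samples, c = 3 columns, one-hot class labels (illustrative values).
labels = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
preds  = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1]]
loss = cross_entropy(labels, preds)
```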
4. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the update formula of the convolution kernel weight ω_t and the update formula of the fully connected layer connection parameter θ_t in step (3d) are, respectively:

ω_{t+1} = ω_t − α_t × ∂E_t/∂ω_t
θ_{t+1} = θ_t − α_t × ∂E_t/∂θ_t

wherein ω_{t+1} represents the updated convolution kernel weight, θ_{t+1} represents the updated fully connected layer connection parameter, and ∂ represents the partial derivative operation.
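A one-step numeric sketch of the gradient-descent update of claim 4; the weight values, gradients, and learning rate below are illustrative:

```python
def sgd_step(w, grad, lr):
    """Update of claim 4: w_{t+1} = w_t - alpha_t * dE_t/dw_t, element-wise."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

omega_t = [0.5, -0.3]   # convolution kernel weights (illustrative values)
grad    = [0.1, -0.2]   # partial derivatives dE_t/d(omega_t)
alpha_t = 0.01          # learning rate
omega_next = sgd_step(omega_t, grad, alpha_t)
```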
5. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the update formula of the learning rate α_t in step (3e) is:

α_{t+1} = α_t × (1 + gamma × t)^(-power)

wherein α_{t+1} represents the updated learning rate.
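The update of claim 5 has the shape of Caffe's "inv" learning-rate policy; the snippet below evaluates it once with illustrative hyperparameter values (gamma, power, and t are not specified numerically in the patent):

```python
def inv_lr(alpha_t, gamma, power, t):
    """Update of claim 5: alpha_{t+1} = alpha_t * (1 + gamma * t) ** (-power)."""
    return alpha_t * (1.0 + gamma * t) ** (-power)

# Illustrative hyperparameters in the style of a Caffe "inv" policy.
alpha_next = inv_lr(alpha_t=0.01, gamma=0.0001, power=0.75, t=10000)
```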
CN202010793360.7A 2020-08-10 2020-08-10 Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe Active CN111931913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010793360.7A CN111931913B (en) 2020-08-10 2020-08-10 Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe

Publications (2)

Publication Number Publication Date
CN111931913A true CN111931913A (en) 2020-11-13
CN111931913B CN111931913B (en) 2023-08-01

Family

ID=73306448

Country Status (1)

Country Link
CN (1) CN111931913B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111896823A (en) * 2020-06-30 2020-11-06 成都四威功率电子科技有限公司 System for carrying out online health monitoring and fault early warning on power amplifier

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137406A1 (en) * 2016-11-15 2018-05-17 Google Inc. Efficient Convolutional Neural Networks and Techniques to Reduce Associated Computational Costs
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN109740734A (en) * 2018-12-29 2019-05-10 北京工业大学 A kind of method of neuron spatial arrangement in optimization convolutional neural networks
US20190318231A1 (en) * 2018-04-11 2019-10-17 Hangzhou Flyslice Technologies Co., Ltd. Method for acceleration of a neural network model of an electronic euqipment and a device thereof related appliction information
CN111104124A (en) * 2019-11-07 2020-05-05 北京航空航天大学 Pythrch framework-based rapid deployment method of convolutional neural network on FPGA


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Kunlun; Wei Zefa; Song Huansheng: "Vehicle color recognition based on the SqueezeNet convolutional neural network", Journal of Chang'an University (Natural Science Edition), no. 04 *
Xie Da; Zhou Daokui; Ji Zhenkai; Dai Xinyu; Wu Rui: "Implementation and acceleration of a Caffe-framework object classification algorithm based on a heterogeneous multi-core platform", Electronics & Packaging, no. 05 *


Also Published As

Publication number Publication date
CN111931913B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant