CN111931913A - Caffe-based deployment method of convolutional neural network on FPGA - Google Patents


Info

Publication number
CN111931913A
CN111931913A (application CN202010793360.7A)
Authority
CN
China
Prior art keywords
neural network
layer
convolutional neural
caffe
fpga
Prior art date
Legal status
Granted
Application number
CN202010793360.7A
Other languages
Chinese (zh)
Other versions
CN111931913B (en)
Inventor
杨鹏飞 (Yang Pengfei)
王泉 (Wang Quan)
张志强 (Zhang Zhiqiang)
梁瑀 (Liang Yu)
王振翼 (Wang Zhenyi)
李喜林 (Li Xilin)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202010793360.7A
Publication of CN111931913A
Application granted
Publication of CN111931913B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/34 Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G06F30/347 Physical level, e.g. placement or routing
    • G06N20/00 Machine learning
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a Caffe-based method for deploying a convolutional neural network on an FPGA (field-programmable gate array), which addresses the large time consumption, long deployment time and poor practical applicability of the prior art. The implementation steps are: acquire a training sample set; construct a convolutional neural network model C based on Caffe; train the model C based on Caffe; extract and store the parameters of the trained model C* based on Caffe; build a convolutional neural network C** in Verilog; and acquire the deployment result of the convolutional neural network on the FPGA. The invention uses Caffe to establish an easily controlled convolutional neural network model, improves the speed at which the convolutional neural network is deployed on the FPGA, and designs a mapping operation for parameter configuration so that the convolutional neural network can be deployed smoothly.

Description

Caffe-based deployment method of convolutional neural network on FPGA
Technical Field
The invention belongs to the technical field of hardware acceleration of convolutional neural networks and relates to a method for deploying a convolutional neural network on an FPGA (field-programmable gate array), in particular a Caffe-based deployment method, which can be used to develop artificial-intelligence hardware-acceleration platforms.
Background
Convolutional neural networks have the characteristics of weight sharing and sparse connectivity: they extract local features from the data while keeping the parameter count small, which makes them well suited to processing large data sets. With their high recognition accuracy, they have shown excellent performance in artificial intelligence and are widely applied to image classification, target localization, face recognition, skeleton recognition and other fields. The FPGA has become popular in hardware development mainly for two reasons. On the one hand, an FPGA integrates a large number of basic digital-circuit gate structures and storage units, and a user can change its internal logic structure by programming a configuration file into the device, thereby customizing the circuit. On the other hand, an FPGA is not a von Neumann architecture: the result computed by one unit can be sent directly to the next unit without being staged in memory, so the bandwidth requirement is very low, and the pipelined processing architecture responds quickly with low latency. FPGAs are therefore widely used in video and image processing, communications and digital signal processing.
A convolutional neural network can be deployed on a CPU, a GPU or an FPGA, with time consumption and deployment speed generally used as the metrics of a deployment. The computing capacity of a CPU is too low to exploit the characteristics of a convolutional neural network, and the power consumption of a GPU is too high, which limits its application scenarios. The FPGA combines low power consumption and low latency with abundant logic resources, and large-scale parallel computation can be realized by programming its internal structure. Deploying a convolutional neural network on an FPGA therefore fully exploits the network's parallelism: it reduces time consumption compared with deployment on a CPU and improves deployment speed compared with deployment on a GPU.
For example, the patent application with publication number CN 111104124 A, entitled "Fast deployment method of a PyTorch-based convolutional neural network on an FPGA", discloses a method with the following steps: first, a fast model-mapping mechanism is established through a naming rule. Then, an optimization strategy is computed under hardware-resource constraints, and a template library based on the hardware optimization strategy is established. Finally, the complex network model file is decomposed on the FPGA side in an adaptive, rule-mapping-based processing flow, the network is abstracted into a directed acyclic graph, and the neural network accelerator is generated. The method has the following defects: deploying a convolutional neural network with the PyTorch framework consumes much time and the deployment speed is slow; in addition, the method lacks a concrete procedure for calling the template library, so parameters may fail to be configured during an actual call, giving poor practical applicability.
Disclosure of Invention
The invention aims to provide a Caffe-based method for deploying a convolutional neural network on an FPGA (field-programmable gate array) that overcomes the defects of the prior art, namely high time consumption, slow deployment and poor practical applicability.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) obtaining a training sample set:
taking n images with class labels obtained from the Iris dataset as a training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n ≥ 10000;
(2) constructing a convolutional neural network model C based on Caffe:
constructing a convolutional layer A based on caffe.layers.Convolution, a pooling layer B based on caffe.layers.Pooling and m fully connected layers D based on caffe.layers.InnerProduct, and connecting the convolutional layer A, the pooling layer B and the m fully connected layers in sequence to obtain a convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution stride stride_conv; the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window stride stride_pool; the number of neurons connected by each fully connected layer D is q; and m ≥ 2;
(3) training a convolutional neural network model C based on Caffe:
(3a) initialize the iteration counter t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize a configuration file Solver with Caffe, setting in Solver the maximum number of iterations T, the initial learning rate α, the learning-rate update step stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1 and 0 < power < 1; and let t = 0, α_t = α, C_t = C;
(3b) call the convolutional neural network model C_t based on caffe.Layer, and at the same time call the configuration file Solver based on caffe.Solver;
(3c) randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict them to obtain the predicted class labels of the N samples, and compute the loss value E_t of C_t from the class label and predicted class label of each sample:
E_t = -(1/N) · Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i,k_i} · log(ŷ_{n_i,k_i})
(3d) using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layers D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t;
(3e) judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power, otherwise execute step (3f);
(3f) judge whether t = T holds; if so, the trained convolutional neural network model C* is obtained; otherwise let t = t + 1 and return to step (3c);
(4) extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) decompose the trained convolutional neural network C* with the get_layer function in Caffe to obtain the convolutional Layer A′, pooling Layer B′ and fully connected Layer D′ represented as Layer structures;
(4b) create an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ with the items function in Caffe, and store them in the dictionary file d;
(5) build a convolutional neural network C** in Verilog:
(5a) use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) using a mapping operation, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d, obtaining the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d, obtaining the configured pooling-layer module m_pool; configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d, obtaining the configured fully-connected-layer module m_fc; and connect m_conv, m_pool and m_fc in sequence to form the convolutional neural network C**;
(6) Acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) describe C** at RTL level in Verilog, and translate the RTL description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) according to the timing and area constraints, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
Compared with the prior art, the invention has the following advantages:
First, Caffe is used to train the convolutional neural network. Because Caffe executes as directly compiled code, it runs fast and can adjust the learning rate dynamically, so the convolutional neural network is trained in a shorter time; at the same time, the Caffe framework simplifies the deployment process with its built-in functions, reducing the time consumed by deploying the convolutional neural network.
Second, the invention uses Caffe to extract the trained convolutional neural network model C*. Caffe can store the convolutional, pooling and fully connected layers of the model directly in Layer structures through a single function; the information stored in a Layer structure is easy to inspect and manipulate and responds quickly when called, which effectively improves the deployment speed.
Third, a mapping method is adopted when configuring the parameters of the convolutional-layer, pooling-layer and fully-connected-layer templates, and several VIVADO tools are used to obtain the deployment result, which improves the practical applicability.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 is a flow chart of the implementation of the present invention for training the convolutional neural network model C based on Caffe.
FIG. 3 is a diagram of how the present invention acquires the deployment result of the convolutional neural network on the FPGA.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
Referring to fig. 1, the present invention includes the steps of:
step 1) obtaining a training sample set:
The Iris dataset, a classic dataset in machine learning and artificial intelligence, comprises 150 data samples divided into 3 classes of 50 samples each, where every sample has 4 attributes: sepal length, sepal width, petal length and petal width, from which the Iris species can be predicted. The n images with class labels acquired from the Iris dataset are used as the training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image; in this example n = 10000;
step 2) constructing a convolutional neural network model C based on Caffe:
A convolutional layer A is constructed based on caffe.layers.Convolution, a pooling layer B based on caffe.layers.Pooling and m fully connected layers D based on caffe.layers.InnerProduct, and the convolutional layer A, the pooling layer B and the m fully connected layers are connected in sequence to obtain the convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution stride stride_conv; the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window stride stride_pool; and the number of neurons connected by each fully connected layer D is q. This construction method can directly define the names, types and parameters of the convolutional layer, pooling layer and fully connected layers through function configuration parameters, which effectively simplifies model building and improves the deployment speed. In this example m = 2, and the specific parameters of each layer of the convolutional neural network C are as follows:
the convolution kernel size of convolution layer A is 7 x 7, the number of convolution kernels is 64, and the convolution step size is 2;
The pooling window size of pooling layer B is set to 2 × 2, and the pooling window stride is 2;
the number of the neurons connected by the full connection layer D is 512;
The number of fully connected layers can be adjusted according to actual needs, balancing the cost in time and space against the desired model complexity.
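For illustration only, the example network above (convolution 7×7 with 64 kernels and stride 2, pooling 2×2 with stride 2, two fully connected layers) can be written out in Caffe's prototxt syntax with plain string templating; the helper functions and layer names below are hypothetical and do not require Caffe to be installed:

```python
def conv_layer(name, bottom, kernel, num_output, stride):
    # Emit a Caffe-style prototxt Convolution layer (text only).
    return ('layer {\n'
            '  name: "%s"  type: "Convolution"  bottom: "%s"  top: "%s"\n'
            '  convolution_param { kernel_size: %d num_output: %d stride: %d }\n'
            '}\n' % (name, bottom, name, kernel, num_output, stride))

def pool_layer(name, bottom, kernel, stride):
    # Emit a Caffe-style prototxt max-Pooling layer.
    return ('layer {\n'
            '  name: "%s"  type: "Pooling"  bottom: "%s"  top: "%s"\n'
            '  pooling_param { pool: MAX kernel_size: %d stride: %d }\n'
            '}\n' % (name, bottom, name, kernel, stride))

def fc_layer(name, bottom, num_output):
    # Emit a Caffe-style prototxt InnerProduct (fully connected) layer.
    return ('layer {\n'
            '  name: "%s"  type: "InnerProduct"  bottom: "%s"  top: "%s"\n'
            '  inner_product_param { num_output: %d }\n'
            '}\n' % (name, bottom, name, num_output))

# Network C of this example: conv 7x7/64/s2 -> pool 2x2/s2 -> two FC layers.
prototxt = (conv_layer("A", "data", 7, 64, 2)
            + pool_layer("B", "A", 2, 2)
            + fc_layer("D1", "B", 512)
            + fc_layer("D2", "D1", 512))
```

The generated text could be saved as a .prototxt file; the patent itself constructs the same layers programmatically through caffe.layers.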
Step 3) referring to fig. 2, training a convolutional neural network model C based on Caffe:
(3a) initialize the iteration counter t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize a configuration file Solver with Caffe, setting in Solver the maximum number of iterations T, the initial learning rate α, the learning-rate update step stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1 and 0 < power < 1; and let t = 0, α_t = α, C_t = C. In this example T = 70, α = 0.002, stepsize = 5000, gamma = 0.5, power = 0.5;
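As a sketch of the Solver configuration for the example values above: Caffe's SolverParameter exposes these settings as max_iter, base_lr, stepsize, gamma and power, and its "inv" learning-rate policy computes base_lr × (1 + gamma × iter)^(-power), which matches the update formula used in step (3e). The file-generation code is illustrative only:

```python
# Hedged sketch: write the solver hyper-parameters of this example
# (T=70, alpha=0.002, stepsize=5000, gamma=0.5, power=0.5) in
# Caffe solver.prototxt syntax. Field names follow Caffe's SolverParameter.
solver = {
    "max_iter": 70,
    "base_lr": 0.002,
    "lr_policy": '"inv"',   # inverse-decay policy: base_lr*(1+gamma*iter)^(-power)
    "stepsize": 5000,
    "gamma": 0.5,
    "power": 0.5,
}
solver_text = "\n".join("%s: %s" % (k, v) for k, v in solver.items())
```

Writing solver_text to solver.prototxt would give a file loadable by caffe.Solver.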
(3b) call the convolutional neural network model C_t based on caffe.Layer, and at the same time call the configuration file Solver based on caffe.Solver; both calls use the `using` calling method, which lets the required type be called directly without specifying the full namespace, reducing the time consumed by the calls;
(3c) randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict them to obtain the predicted class labels of the N samples, and compute the loss value E_t of C_t from the class label and predicted class label of each sample. In this example N = 500, and E_t is calculated as:

E_t = -(1/N) · Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i,k_i} · log(ŷ_{n_i,k_i})

where n_i is the serial number of a selected sample, c is the total number of columns of the class labels of the training sample set, k_i is the column number of a sample's class label, y_{n_i,k_i} is the k_i-th column element of the class label of the n_i-th sample, ŷ_{n_i,k_i} is the k_i-th column element of the predicted class label of the n_i-th sample, and log denotes the logarithm to base e;
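A pure-Python reference model of this cross-entropy loss (no Caffe dependency; the sample values below are made up for illustration):

```python
import math

def loss(labels, preds):
    # E_t = -(1/N) * sum_i sum_k y[i][k] * ln(p[i][k]),
    # with one-hot labels y and predicted probabilities p.
    N = len(labels)
    total = 0.0
    for y, p in zip(labels, preds):
        # only the column holding the true class contributes for one-hot y
        total += sum(yk * math.log(pk) for yk, pk in zip(y, p) if yk > 0)
    return -total / N

# Two samples, three classes (illustrative values).
y = [[1, 0, 0], [0, 1, 0]]
p = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
E = loss(y, p)
```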
(3d) using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layers D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t. The update formulas are:

ω_{t+1} = ω_t - α_t · ∂E_t/∂ω_t
θ_{t+1} = θ_t - α_t · ∂E_t/∂θ_t

where ω_{t+1} are the updated convolution kernel weights, θ_{t+1} are the updated fully-connected-layer connection parameters, and ∂/∂ denotes the derivative operation;
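The update rule is plain gradient descent; a minimal sketch with illustrative numbers:

```python
def sgd_step(w, grad, lr):
    # omega_{t+1} = omega_t - alpha_t * dE/d(omega); same rule applies to theta.
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# Illustrative weights and gradients, lr = alpha = 0.002 as in the example.
w_next = sgd_step([0.5, -0.2], [0.1, -0.4], 0.002)
```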
(3e) judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power, otherwise execute step (3f). The learning-rate update formula is:

α_{t+1} = α_t × (1 + gamma × t)^(-power)

where α_{t+1} is the updated learning rate;
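This schedule is easy to check numerically; a minimal sketch:

```python
def update_lr(alpha_t, gamma, power, t):
    # alpha_{t+1} = alpha_t * (1 + gamma * t) ** (-power)
    return alpha_t * (1.0 + gamma * t) ** (-power)

# With gamma = 0.5 and power = 0.5 as in the example, at t = 8:
# (1 + 0.5*8)^(-0.5) = 1/sqrt(5), so the rate shrinks by about 55%.
a = update_lr(0.002, 0.5, 0.5, 8)
```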
(3f) judge whether t = T holds; if so, the trained convolutional neural network model C* is obtained; otherwise let t = t + 1 and return to step (3c);
Step 4) extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) decompose the trained convolutional neural network C* with the get_layer function in Caffe to obtain the convolutional Layer A′, pooling Layer B′ and fully connected Layer D′ represented as Layer structures;
(4b) create an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ with the items function in Caffe, and store them in the dictionary file d;
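A sketch of what the dictionary file d might hold for the example network, serialized to JSON purely for illustration (the key names are hypothetical; the patent does not fix a file format):

```python
import io
import json

# Extracted layer parameters of the example network, stored as plain
# key/value pairs keyed by layer name (key names are an assumption).
d = {
    "A": {"scale_conv": 7, "num": 64, "stride_conv": 2},
    "B": {"scale_pool": 2, "stride_pool": 2},
    "D": {"q": 512},
}

# Round-trip through a file-like buffer to mimic saving/loading d.
buf = io.StringIO()
json.dump(d, buf)
restored = json.loads(buf.getvalue())
```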
Step 5) build a convolutional neural network C** in Verilog:
(5a) use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) using a mapping operation, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d, obtaining the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d, obtaining the configured pooling-layer module m_pool; and configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d, obtaining the configured fully-connected-layer module m_fc.
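The mapping operation amounts to substituting values from the dictionary file d into parameter slots of the hardware templates. A sketch using Python's string.Template to generate a Verilog parameter override (the module and port names are hypothetical):

```python
from string import Template

# Verilog instantiation template with $-placeholders for the mapped values.
conv_template = Template(
    "conv_layer #(.KERNEL($scale), .NUM_KERNELS($num), .STRIDE($stride))\n"
    "    m_conv (.clk(clk), .din(din), .dout(conv_out));\n"
)

# Values taken from the dictionary file d of the example network.
d = {"A": {"scale_conv": 7, "num": 64, "stride_conv": 2}}
m_conv = conv_template.substitute(
    scale=d["A"]["scale_conv"],
    num=d["A"]["num"],
    stride=d["A"]["stride_conv"],
)
```

The same substitution pattern applies to the pooling-layer and fully-connected-layer templates.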
Traverse the input feature-map sizes and the input/output channel sizes of the convolutional neural network, searching for hardware design parameters that satisfy the formula below: the output-feature-map parallelism Tm, the input-feature-map parallelism Tn, and the height Tr and width Tc of the blocked feature map. If the resource occupation of a candidate group of hardware design parameters is less than the total resources and its overall computation delay is the minimum found so far, store that group of parameters; otherwise continue searching. The hardware design parameters achieving the minimum delay are finally obtained;
T = ⌈M/Tm⌉ × ⌈N/Tn⌉ × ⌈H/Tr⌉ × ⌈W/Tc⌉

where T represents the overall computation delay, H the height of the feature map, W the width of the feature map, M the number of output channels and N the number of input channels;
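A minimal sketch of this design-space search, assuming the delay is modeled as the product of blocked-loop trip counts and the resource cost as Tm × Tn multipliers (the cost model and candidate tile sizes are illustrative assumptions, not the patent's):

```python
from math import ceil

def search_tiling(H, W, M, N, resource_limit):
    # Exhaustively search (Tm, Tn, Tr, Tc) minimizing the blocked-loop
    # delay ceil(M/Tm)*ceil(N/Tn)*ceil(H/Tr)*ceil(W/Tc), subject to an
    # illustrative resource model of Tm*Tn parallel multipliers.
    best, best_delay = None, float("inf")
    for Tm in range(1, M + 1):
        for Tn in range(1, N + 1):
            if Tm * Tn > resource_limit:      # exceeds total resources
                continue
            for Tr in (7, 14, 28):            # candidate block heights
                for Tc in (7, 14, 28):        # candidate block widths
                    delay = (ceil(M / Tm) * ceil(N / Tn)
                             * ceil(H / Tr) * ceil(W / Tc))
                    if delay < best_delay:
                        best, best_delay = (Tm, Tn, Tr, Tc), delay
    return best, best_delay

# Example: 28x28 maps, 64 output / 3 input channels, 96 multipliers.
params, delay = search_tiling(H=28, W=28, M=64, N=3, resource_limit=96)
```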
will Tm×TnxW weights and TnSize is Tr×TcIs input to the convolution module mconvBuffer of (1), and then buffering T from the buffernInputting the input characteristic diagram of each input channel and the corresponding number of weights into a convolution calculation unit, simultaneously performing multiplication and accumulation, and performing convolution window sliding on the block input characteristic diagram until the block input characteristic diagram slides to Tr×TcAfter the position of the point, the convolution calculation of the group of input feature maps is finished, and the convolution calculation result is input to mpool
Load the Tm output feature maps obtained from the convolution into the buffer of m_pool. When the pooling operation starts, the values at the same positions of the Tm feature maps are fed into the comparison unit, which takes the maximum of the feature values as the feature value of the output feature map, completing the pooling operation; the pooling result in m_pool is then input to m_fc.
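The comparison unit implements max pooling; a pure-Python reference model of one feature map (illustrative only):

```python
def max_pool(fmap, k, stride):
    # Keep the maximum value inside each k x k pooling window.
    H, W = len(fmap), len(fmap[0])
    return [[max(fmap[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(0, W - k + 1, stride)]
            for i in range(0, H - k + 1, stride)]

# 2x2 window, stride 2, matching the example pooling layer B.
pooled = max_pool([[1, 3, 2, 4],
                   [5, 2, 8, 1],
                   [0, 6, 7, 2],
                   [9, 4, 3, 5]], 2, 2)
```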
Connect m_conv, m_pool and m_fc to form the convolutional neural network C**.
Step 6) referring to fig. 3, acquiring a deployment result of the convolutional neural network on the FPGA:
(6a) describe C** at RTL level in Verilog, and translate the RTL description of C** into a logic netlist with the Synthesis tool of VIVADO;
(6b) according to the timing and area constraints, optimize the placement and routing of the logic netlist on the FPGA with the Implementation tool of VIVADO to generate an optimized logic netlist;
(6c) convert the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA with the Generate Bitstream tool of VIVADO, and burn the bitstream file into the Flash memory on the FPGA.
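Steps (6a)-(6c) correspond to a standard non-project Vivado Tcl flow. A sketch that emits such a script (the top-module, part and file names are hypothetical; the Tcl command names are standard Vivado batch commands):

```python
def vivado_script(top, part, bit_file):
    # Emit a synthesis -> implementation -> bitstream Tcl flow as text.
    steps = [
        "read_verilog {%s.v}" % top,
        "synth_design -top %s -part %s" % (top, part),   # Synthesis (6a)
        "opt_design",                                    # Implementation (6b)
        "place_design",
        "route_design",
        "write_bitstream -force %s" % bit_file,          # Generate Bitstream (6c)
    ]
    return "\n".join(steps)

# Hypothetical top module and Zynq-7020 part number.
tcl = vivado_script("cnn_top", "xc7z020clg400-1", "cnn_top.bit")
```

The resulting script would be run with `vivado -mode batch -source run.tcl`; burning the bitstream into Flash is a separate hardware-manager step.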

Claims (5)

1. A Caffe-based convolutional neural network deployment method on an FPGA is characterized by comprising the following steps:
(1) obtaining a training sample set:
taking n images with class labels obtained from the Iris dataset as a training sample set P = {P_1, P_2, ..., P_i, ..., P_n}, where P_i represents the i-th image and n ≥ 10000;
(2) constructing a convolutional neural network model C based on Caffe:
constructing a convolutional layer A based on caffe.layers.Convolution, a pooling layer B based on caffe.layers.Pooling and m fully connected layers D based on caffe.layers.InnerProduct, and connecting the convolutional layer A, the pooling layer B and the m fully connected layers in sequence to obtain a convolutional neural network model C, where the parameters of the convolutional layer A are the convolution kernel size scale_conv, the number of convolution kernels num and the convolution stride stride_conv; the parameters of the pooling layer B are the pooling window size scale_pool and the pooling window stride stride_pool; the number of neurons connected by each fully connected layer D is q; and m ≥ 2;
(3) training a convolutional neural network model C based on Caffe:
(3a) initialize the iteration counter t, the convolutional neural network model of the t-th iteration C_t and the learning rate of the t-th iteration α_t; initialize a configuration file Solver with Caffe, setting in Solver the maximum number of iterations T, the initial learning rate α, the learning-rate update step stepsize and the learning-rate update parameters gamma and power, with 60 < T, 0 < α < 0.01, stepsize < T, 0 < gamma < 1 and 0 < power < 1; and let t = 0, α_t = α, C_t = C;
(3b) call the convolutional neural network model C_t based on caffe.Layer, and at the same time call the configuration file Solver based on caffe.Solver;
(3c) randomly select N samples from the training sample set as the input of the convolutional neural network model C_t, predict them to obtain the predicted class labels of the N samples, and compute the loss value E_t of C_t from the class label and predicted class label of each sample:
E_t = -(1/N) · Σ_{n_i=1}^{N} Σ_{k_i=1}^{c} y_{n_i,k_i} · log(ŷ_{n_i,k_i})
(3d) using the back-propagation method, update the convolution kernel weights ω_t of the convolutional layer A and the connection parameters θ_t of the fully connected layers D of the convolutional neural network model C_t according to the learning rate α_t in Solver and the loss value E_t, obtaining the t-th-iteration convolutional neural network model C′_t;
(3e) judge whether t > stepsize holds; if so, update the learning rate α_t according to gamma and power, otherwise execute step (3f);
(3f) judge whether t = T holds; if so, the trained convolutional neural network model C* is obtained; otherwise let t = t + 1 and return to step (3c);
(4) extract the trained convolutional neural network model C* based on Caffe and store its parameters:
(4a) decompose the trained convolutional neural network C* with the get_layer function in Caffe to obtain the convolutional Layer A′, pooling Layer B′ and fully connected Layer D′ represented as Layer structures;
(4b) create an empty dictionary file d, extract the parameters of the convolutional layer A′, the parameters of the pooling layer B′ and the number q of connected neurons of the fully connected layer D′ with the items function in Caffe, and store them in the dictionary file d;
(5) build a convolutional neural network C** in Verilog:
(5a) use Verilog to build a convolutional-layer template for configuring the parameters of the convolutional layer A′, a pooling-layer template for configuring the parameters of the pooling layer B′ and a fully-connected-layer template for configuring the number q of connected neurons of the fully connected layer D′;
(5b) using a mapping operation, configure the convolutional-layer template with the parameters scale_conv, num and stride_conv of the convolutional layer A′ in the dictionary file d, obtaining the configured convolutional-layer module m_conv; configure the pooling-layer template with the parameters scale_pool and stride_pool of the pooling layer B′ in the dictionary file d, obtaining the configured pooling-layer module m_pool; configure the fully-connected-layer template with the number q of connected neurons of the fully connected layer D′ in the dictionary file d, obtaining the configured fully-connected-layer module m_fc; and connect m_conv, m_pool and m_fc in sequence to form the convolutional neural network C**;
(6) acquiring the deployment result of the convolutional neural network on the FPGA:
(6a) describing C** at the RTL level with Verilog, and translating the RTL-level description of C** into a logic netlist using the Synthesis tool of Vivado;
(6b) optimizing the placement and routing of the logic netlist on the FPGA with the Implementation tool of Vivado according to the timing constraints and area constraints, generating an optimized logic netlist;
(6c) converting the optimized logic netlist into a bitstream file usable for circuit deployment on the FPGA through the Generate Bitstream tool of Vivado, and burning the bitstream file into the Flash memory on the FPGA.
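Steps (6a)-(6c) are commonly driven by a Vivado Tcl script. As a sketch, the snippet below emits such a script using standard Vivado non-project-mode commands (`synth_design`, `opt_design`, `place_design`, `route_design`, `write_bitstream`); the top-module name, part number, and file names are placeholders, not values from the patent:

```python
# Emit a Vivado non-project-mode Tcl script covering steps (6a)-(6c).
# Part number and file names are placeholder assumptions.
tcl_lines = [
    "read_verilog c_star_star.v",                    # RTL description of C**
    "synth_design -top top -part xc7z020clg400-1",   # (6a) translate to logic netlist
    "opt_design",                                    # (6b) optimize under constraints
    "place_design",                                  # (6b) placement
    "route_design",                                  # (6b) routing
    "write_bitstream -force c_star_star.bit",        # (6c) bitstream for deployment
]
with open("build.tcl", "w") as f:
    f.write("\n".join(tcl_lines) + "\n")
```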
2. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the convolutional neural network C in step (2) comprises two fully connected layers D, and the specific parameters of each layer of the convolutional neural network C are as follows:
the convolution kernel size of the convolutional layer A is 7 × 7, the number of convolution kernels is 64, and the convolution stride is 2;
the pooling window size of the pooling layer B is 2 × 2, and the pooling window stride is 2;
the number of connected neurons in the fully connected layer D is 512.
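As a worked example of what these parameters imply for feature-map sizes, the snippet below applies the standard output-size formula to the layer parameters of claim 2; the 224 × 224 input size and zero padding are assumptions, since the patent does not state them:

```python
def conv_out(size, kernel, stride, pad=0):
    """Output edge length after sweeping a kernel/pooling window over an input."""
    return (size + 2 * pad - kernel) // stride + 1

# Layer parameters from claim 2; the 224x224 input is an illustrative assumption.
h = 224
h = conv_out(h, kernel=7, stride=2)   # convolutional layer A: 7x7 kernels, stride 2
h_after_conv = h
h = conv_out(h, kernel=2, stride=2)   # pooling layer B: 2x2 window, stride 2
h_after_pool = h
```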
3. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the loss value E_t of C_t in step (3c) is calculated by the following formula:

E_t = −Σ_{n_i} Σ_{k_i=1}^{c} y_{n_i,k_i} × log(ŷ_{n_i,k_i})

wherein n_i denotes the serial number of a selected sample, c denotes the total number of columns of the class labels of the training sample set, k_i denotes the column number of a sample class label, y_{n_i,k_i} denotes the k_i-th column element in the class label of the n_i-th sample, log denotes the logarithm with base e, and ŷ_{n_i,k_i} denotes the k_i-th column element in the predicted class label of the n_i-th sample.
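A small numeric illustration of the loss of claim 3, computed over two samples with c = 3 label columns; the one-hot labels and predicted probabilities are illustrative values, not data from the patent:

```python
import math

def cross_entropy(labels, preds):
    """Loss of claim 3: -sum over samples n_i and columns k_i of y * log(y_hat)."""
    return -sum(y * math.log(p)
                for row_y, row_p in zip(labels, preds)
                for y, p in zip(row_y, row_p))

# Two samples, c = 3 columns, one-hot class labels (illustrative values).
labels = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
preds  = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1]]
loss = cross_entropy(labels, preds)
```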
4. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the update formula of the convolution kernel weight ω_t and the update formula of the fully connected layer connection parameter θ_t in step (3d) are, respectively:

ω_{t+1} = ω_t − α_t × ∂E_t/∂ω_t
θ_{t+1} = θ_t − α_t × ∂E_t/∂θ_t

wherein ω_{t+1} represents the updated convolution kernel weight, θ_{t+1} represents the updated fully connected layer connection parameter, and ∂ represents the partial derivative operation.
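A one-step numeric sketch of the gradient-descent update of claim 4; the weight values, gradients, and learning rate below are illustrative:

```python
def sgd_step(w, grad, lr):
    """Update of claim 4: w_{t+1} = w_t - alpha_t * dE_t/dw_t, element-wise."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

omega_t = [0.5, -0.3]   # convolution kernel weights (illustrative values)
grad    = [0.1, -0.2]   # partial derivatives dE_t/d(omega_t)
alpha_t = 0.01          # learning rate
omega_next = sgd_step(omega_t, grad, alpha_t)
```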
5. The Caffe-based deployment method of the convolutional neural network on the FPGA as claimed in claim 1, wherein the update formula of the learning rate α_t in step (3e) is:

α_{t+1} = α_t × (1 + gamma × t)^(-power)

wherein α_{t+1} represents the updated learning rate.
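The update of claim 5 has the shape of Caffe's "inv" learning-rate policy; the snippet below evaluates it once with illustrative hyperparameter values (gamma, power, and t are not specified numerically in the patent):

```python
def inv_lr(alpha_t, gamma, power, t):
    """Update of claim 5: alpha_{t+1} = alpha_t * (1 + gamma * t) ** (-power)."""
    return alpha_t * (1.0 + gamma * t) ** (-power)

# Illustrative hyperparameters in the style of a Caffe "inv" policy.
alpha_next = inv_lr(alpha_t=0.01, gamma=0.0001, power=0.75, t=10000)
```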
CN202010793360.7A 2020-08-10 2020-08-10 Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe Active CN111931913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010793360.7A CN111931913B (en) 2020-08-10 2020-08-10 Deployment method of convolutional neural network on FPGA (field programmable gate array) based on Caffe

Publications (2)

Publication Number Publication Date
CN111931913A true CN111931913A (en) 2020-11-13
CN111931913B CN111931913B (en) 2023-08-01

Family

ID=73306448

Country Status (1)

Country Link
CN (1) CN111931913B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111896823A (en) * 2020-06-30 2020-11-06 成都四威功率电子科技有限公司 System for carrying out online health monitoring and fault early warning on power amplifier

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137406A1 (en) * 2016-11-15 2018-05-17 Google Inc. Efficient Convolutional Neural Networks and Techniques to Reduce Associated Computational Costs
CN109086867A (en) * 2018-07-02 2018-12-25 武汉魅瞳科技有限公司 A kind of convolutional neural networks acceleration system based on FPGA
CN109740734A (en) * 2018-12-29 2019-05-10 北京工业大学 A kind of method of neuron spatial arrangement in optimization convolutional neural networks
US20190318231A1 (en) * 2018-04-11 2019-10-17 Hangzhou Flyslice Technologies Co., Ltd. Method for acceleration of a neural network model of an electronic euqipment and a device thereof related appliction information
CN111104124A (en) * 2019-11-07 2020-05-05 北京航空航天大学 Pythrch framework-based rapid deployment method of convolutional neural network on FPGA


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Kunlun; Wei Zefa; Song Huansheng: "Vehicle color recognition based on the SqueezeNet convolutional neural network", Journal of Chang'an University (Natural Science Edition), no. 04 *
Xie Da; Zhou Daokui; Ji Zhenkai; Dai Xinyu; Wu Rui: "Implementation and acceleration of a Caffe-framework object classification algorithm based on a heterogeneous multi-core platform", Electronics & Packaging, no. 05 *


Also Published As

Publication number Publication date
CN111931913B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant