CN116543291A - Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration - Google Patents

Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration

Info

Publication number
CN116543291A
CN116543291A
Authority
CN
China
Prior art keywords
vertex
layer
convolution
representing
fpga
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310748451.2A
Other languages
Chinese (zh)
Inventor
李强
赵峰
庄莉
王秋琳
伍臣周
宋立华
邱镇
黄晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Fujian Yirong Information Technology Co Ltd filed Critical State Grid Information and Telecommunication Co Ltd
Priority to CN202310748451.2A priority Critical patent/CN116543291A/en
Publication of CN116543291A publication Critical patent/CN116543291A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of neural network models and discloses a method for realizing a CNN by using an FPGA with flexible resource configuration, comprising the following steps: step 101, before the convolution layer is entered, selecting a serial-parallel combination configuration, the configuration including a value K; step 102, generating the serial step count, which equals ⌈N/K⌉, where N is the number of feature maps in the previous layer, M is the number of feature maps in the next layer, and the current convolution layer has N×M convolution kernels; step 103, the K-to-1 convolution computation structure computes K feature maps of the previous layer in parallel, and the serial merging structure accumulates its outputs over ⌈N/K⌉ serial steps to obtain the final result of a next-layer feature map, completing the convolution computation over all N input feature maps of the previous layer. Performing convolution in this way improves the operating efficiency of the whole neural network, so that the degree of plant insect pest can be computed more efficiently.

Description

Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration
Technical Field
The invention relates to the technical field of neural network models, in particular to a method for realizing CNN by using an FPGA with flexible resource configuration.
Background
A neural network model can be used to evaluate the degree of insect pest on plants that have a simple three-dimensional structure dominated by leaf surfaces. The existing obstacle is that training an ordinary neural network requires a large number of training samples annotated with pest counts; manually counting the pests in a sample takes a long time, and because pests are active and mobile, the time span of the counting distorts the statistics. Sufficient training samples therefore cannot be obtained to train the neural network, and the pest-count results it outputs are consequently inaccurate.
Disclosure of Invention
The invention provides a method for realizing a CNN by using an FPGA with flexible resource configuration, which solves the technical problem in the related art that training an ordinary neural network to count pests requires a large number of training samples annotated with pest counts.
The invention provides a CNN neural network implemented by an FPGA with flexible resource configuration, comprising:
H convolution layers sequentially connected in series, wherein the first convolution layer inputs a unit image, and a plurality of unit images are input in sequence; the feature map output by the last convolution layer is linearly transformed to generate a first feature vector;
generating an original vertex graph, wherein each vertex of the original vertex graph corresponds to one unit image; generating a vertex network graph for each vertex, and vectorizing the vertices of the vertex network graph to obtain vertex vectors;
first hidden layers, wherein a first hidden layer inputs the first feature vectors and the vertex vectors of the vertices of the vertex network graph; the h-th first hidden layer comprises C channels; the vertices in layer h of the vertex network graph are randomly sampled to generate random subsets of equal size, and the i-th random subset, together with the first feature vector and vertex vector of the vertex at the center of the vertex network graph, is input into the i-th channel; the computation comprises:
$$p_h^i = \sigma\left(W_h\, m_v^{h,i} + b_h\right)$$
where $p_h^i$ denotes the i-th propagation vector of the h-th first hidden layer, $W_h$ and $b_h$ respectively denote the weight parameter and bias parameter of the h-th first hidden layer, $\sigma$ denotes an activation function, and $m_v^{h,i}$ denotes the i-th propagation information of vertex v in layer h of the vertex network graph, vertex v being the vertex at the center of the vertex network graph;
$$m_v^{h,i} = \sum_{e \in S_h^i(v)} \alpha_{ve}\, x_e$$
where $S_h^i(v)$ denotes the i-th random subset of $N_h(v)$, $N_h(v)$ denotes the set of vertices in layer h of the vertex network graph connected to vertex v, vertex e belongs to $S_h^i(v)$, $\alpha_{ve}$ denotes the attention parameter between the first feature vector corresponding to vertex v and the first feature vector corresponding to vertex e, and $x_e$ denotes the vertex vector of vertex e.
a second hidden layer, which receives the preprocessed output of the first hidden layers;
the computation of the second hidden layer comprises:
$$g_v = \sigma\left(W_g\left(\sum_{y=1}^{C'} c_y + x_v\right) + b_g\right)$$
where $g_v$ denotes the graph encoding vector of vertex v, $c_y$ denotes the vector of the center of the y-th cluster, obtained from the preprocessing of the output of the first hidden layers, $C'$ denotes the number of cluster centers, $W_g$ and $b_g$ respectively denote the weight parameter and bias parameter of the second hidden layer, $\sigma$ denotes an activation function, and $x_v$ denotes the vertex vector corresponding to vertex v;
a result output layer, which inputs the graph encoding vectors of the vertices and searches for the local maxima of each graph encoding vector; the number of local maxima is the number of pests in the region of the unit image corresponding to the vertex;
during training, the second hidden layer is connected to a fully connected layer, which inputs the graph encoding vectors of the vertices and outputs a mapping into a classification space; the classification labels of the classification space represent the degree of insect pest.
Further, connecting edges exist between the vertices corresponding to adjacent unit images on the plant image.
Further, the attention parameter between the first feature vector corresponding to vertex v and the first feature vector corresponding to vertex e is calculated as:
$$\alpha_{ve} = \frac{\exp(a_{ve})}{\sum_{k \in S(e)} \exp(a_{vk})}, \qquad a_{ve} = \lambda\, u_v^{\top} u_e$$
where $a_{ve}$ denotes the original attention parameter, $S(e)$ denotes the random subset to which vertex e belongs in the vertex network graph of vertex v, $\exp$ denotes the exponential function with the natural constant as its base, $u_v$ and $u_e$ respectively denote the first feature vectors corresponding to vertex v and vertex e, and $\lambda$ denotes the scaling coefficient.
Further, the preprocessing of the output of the first hidden layers comprises: taking the C propagation vectors output by one first hidden layer as cluster centers, clustering the propagation vectors output by the other first hidden layers to generate C clusters, and computing the vector of the cluster center for each cluster.
Further, a random walk is performed on the original vertex graph to generate a vertex network graph for each vertex, wherein the number of layers of the vertex network graph is the same as the number of layers walked by the random walk.
Further, the method for generating a vertex network graph for each vertex by the random walk includes:
step 201, selecting a vertex and starting a random walk centered on the selected vertex until the number of layers walked reaches A, where A is the number of first hidden layers;
step 202, adding the walked vertex sequence to the vertex network graph; if the number of walks is less than B, incrementing the walk count by one and returning to step 201; otherwise, ending.
Further, the classification labels of the classification space respectively represent no pest, general pest, and serious pest.
Further, the unit images are generated by uniformly dividing the plant image.
The invention provides a method for realizing a CNN by using an FPGA with flexible resource configuration, used to perform the convolution operation of a convolution layer of the above CNN neural network, comprising the following steps:
step 101, before the convolution layer is entered, selecting a serial-parallel combination configuration, the configuration including a value K;
step 102, generating the serial step count, which equals ⌈N/K⌉, where N is the number of feature maps in the previous layer, M is the number of feature maps in the next layer, and the current convolution layer has N×M convolution kernels;
and step 103, the K-to-1 convolution computation structure computes K feature maps of the previous layer in parallel, and the serial merging structure accumulates its outputs over ⌈N/K⌉ serial steps to obtain the final result of a next-layer feature map, completing the convolution computation over all N input feature maps of the previous layer.
Further, the K-to-1 convolution computation structure performs the convolution of K convolution kernels with their corresponding feature maps and sums the results point-wise to obtain values of 1 output feature map; the convolutions of the K feature maps with the K convolution kernels are executed in parallel, and the K output convolution values are summed every clock cycle;
the serial merging structure accumulates the results output by the K-to-1 convolution computation structure each time; after each K-to-1 convolution computation, the K input feature maps are changed and the computation continues; after ⌈N/K⌉ times, the convolution computation over all N input feature maps of the previous layer is completed; a bias is added to the accumulated value, and the final result of the feature map is obtained through the operation of an activation function f.
The invention has the beneficial effects that: the training samples of the neural network model take little time to annotate; information is propagated and integrated across the multiple unit images to generate the encoding vectors corresponding to the unit images, and the pest count is output indirectly by exploiting what the neural network learns during training about the vector features associated with pests, which guarantees accuracy while reducing workload; at the same time, the FPGA with flexible resource configuration performs efficient convolution processing for the convolution layers, guaranteeing the operating speed of the neural network model.
Drawings
FIG. 1 is a flow chart of the method of the present invention for implementing a CNN by an FPGA with flexible resource configuration;
FIG. 2 is a flow chart of the method of the present invention for generating a vertex network graph for each vertex by random walk.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It is to be understood that these embodiments are merely discussed so that those skilled in the art may better understand and implement the subject matter described herein and that changes may be made in the function and arrangement of the elements discussed without departing from the scope of the disclosure herein. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.
An FPGA-implemented CNN neural network with flexible resource configuration, comprising:
H convolution layers, wherein the first convolution layer inputs a unit image, and a plurality of unit images are input in sequence;
the H convolution layers are sequentially connected in series; the feature map output by the last convolution layer is defined as the final feature map, and each unit image corresponds to one final feature map;
a first linear layer, which inputs the final feature map and performs a linear transformation on it to generate a first feature vector;
generating an original vertex graph, wherein each vertex of the original vertex graph corresponds to one unit image, and connecting edges exist between the vertices corresponding to adjacent unit images on the plant image;
generating a vertex network graph for each vertex by performing random walks on the original vertex graph, wherein the starting point of each random walk is the vertex at the center of the vertex network graph, the number of layers of the vertex network graph is the same as the number of layers walked, and the vertices of the vertex network graph are vectorized to obtain vertex vectors;
first hidden layers connected in parallel, wherein a first hidden layer inputs the first feature vectors and the vertex vectors of a vertex network graph;
the h-th first hidden layer comprises C channels; the vertices in layer h of the vertex network graph are randomly sampled to generate random subsets of equal size, and the i-th random subset, together with the first feature vector and vertex vector of the vertex at the center of the vertex network graph, is input into the i-th channel (the channels share weight parameters and bias parameters); the computation comprises:
$$p_h^i = \sigma\left(W_h\, m_v^{h,i} + b_h\right)$$
where $p_h^i$ denotes the i-th propagation vector of the h-th first hidden layer, $W_h$ and $b_h$ respectively denote the weight parameter and bias parameter of the h-th first hidden layer, $\sigma$ denotes an activation function, and $m_v^{h,i}$ denotes the i-th propagation information of vertex v (the vertex at the center of the vertex network graph) in layer h of the vertex network graph;
$$m_v^{h,i} = \sum_{e \in S_h^i(v)} \alpha_{ve}\, x_e$$
where $S_h^i(v)$ denotes the i-th random subset of $N_h(v)$, $N_h(v)$ denotes the set of vertices in layer h of the vertex network graph connected to vertex v, $\alpha_{ve}$ denotes the attention parameter between the first feature vector corresponding to vertex v and the first feature vector corresponding to vertex e, and $x_e$ denotes the vertex vector of vertex e.
$$\alpha_{ve} = \frac{\exp(a_{ve})}{\sum_{k \in S(e)} \exp(a_{vk})}, \qquad a_{ve} = \lambda\, u_v^{\top} u_e$$
where $a_{ve}$ denotes the original attention parameter, $S(e)$ denotes the random subset to which vertex e belongs in the vertex network graph of vertex v, $\exp$ denotes the exponential function with the natural constant as its base, $u_v$ and $u_e$ respectively denote the first feature vectors corresponding to vertex v and vertex e, and $\lambda$ denotes the scaling coefficient, an adjustable parameter that defaults to 0.2.
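For illustration, the computation of one channel can be rendered as a minimal NumPy sketch, under the assumptions made in the reconstructed formulas above (scaled dot-product attention with coefficient λ, Sigmoid activation as chosen later in this description); the function names, shapes, and the exact attention form are illustrative assumptions rather than limitations of the method described here.

```python
import numpy as np

def attention_weights(u_v, U_subset, lam=0.2):
    """Softmax-normalized attention of center vertex v over one random subset.
    u_v: first feature vector of center vertex v, shape (d,)
    U_subset: first feature vectors of the subset's vertices, shape (s, d)
    lam: scaling coefficient (the adjustable parameter, default 0.2)"""
    a = lam * (U_subset @ u_v)      # original attention parameters a_ve (assumed dot-product form)
    e = np.exp(a - a.max())         # exponential with max-shift for numerical stability
    return e / e.sum()              # alpha_ve, normalized over the subset

def channel(u_v, U_subset, X_subset, W, b):
    """i-th channel of the h-th first hidden layer: attention-weighted sum of the
    subset's vertex vectors (propagation information m_v), then affine map + sigmoid."""
    m = attention_weights(u_v, U_subset) @ X_subset    # m_v^{h,i}
    return 1.0 / (1.0 + np.exp(-(W @ m + b)))          # propagation vector p_h^i
```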
A second hidden layer receives the preprocessed output of the first hidden layers; the preprocessing comprises:
taking the C propagation vectors output by one first hidden layer as cluster centers, clustering the propagation vectors output by the other first hidden layers to generate C clusters, and computing the vector of the cluster center for each cluster;
the computation of the second hidden layer comprises:
$$g_v = \sigma\left(W_g\left(\sum_{y=1}^{C'} c_y + x_v\right) + b_g\right)$$
where $g_v$ denotes the graph encoding vector of vertex v, $c_y$ denotes the vector of the center of the y-th cluster, $C'$ denotes the number of cluster centers, $W_g$ and $b_g$ respectively denote the weight parameter and bias parameter of the second hidden layer, $\sigma$ denotes an activation function, and $x_v$ denotes the vertex vector corresponding to vertex v;
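For illustration, the interface between the two hidden layers can be sketched as follows; a single k-means-style assignment step stands in for the clustering (the description above does not fix a clustering algorithm), and the summation form follows the reconstructed formula, so all names and shapes are illustrative assumptions.

```python
import numpy as np

def cluster_centers(P_seed, P_others):
    """Preprocessing between the hidden layers: the C propagation vectors of one
    first hidden layer (P_seed) seed the clusters; the other layers' propagation
    vectors are assigned to the nearest seed and each cluster's center vector is computed."""
    d2 = ((P_others[:, None, :] - P_seed[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)                       # nearest-seed assignment
    return np.stack([
        np.vstack([P_seed[y:y + 1], P_others[assign == y]]).mean(axis=0)
        for y in range(P_seed.shape[0])              # vector c_y of the y-th cluster center
    ])

def second_hidden_layer(centers, x_v, W, b):
    """Graph encoding vector g_v of vertex v, under the reconstructed summation form."""
    z = W @ (centers.sum(axis=0) + x_v) + b
    return 1.0 / (1.0 + np.exp(-z))                  # sigmoid activation
```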
and the result output layer, which inputs the graph encoding vectors of the vertices and searches for the local maxima of each graph encoding vector; the number of local maxima is the number of pests in the region of the unit image corresponding to the vertex.
In an embodiment of the present invention, the second hidden layer of the CNN neural network implemented by the FPGA with flexible resource configuration is connected to a fully connected layer during training; the fully connected layer inputs the graph encoding vectors of the vertices and outputs a mapping into a classification space, whose classification labels respectively represent no pest, general pest, and serious pest. Specifically, since each vertex traces back to a unit image, a classification label in fact describes the degree of insect pest of the region corresponding to that unit image; for the training set, the labels can be annotated by manually inspecting the pictures based on experience, without needing to obtain specific information such as the number of pests in each unit image.
The algorithm the result output layer applies to search for local maxima is a conventional technique, and the result output layer can be attached after the other parts have finished training.
In one embodiment of the invention, each vector component of the graph encoding vector is mapped into a two-dimensional coordinate system: the value of a component gives its Y-axis coordinate and its position in the vector gives its X-axis coordinate; a curve is fitted, and the peaks of the curve are taken as the local maxima.
In one embodiment of the invention, the dimension of the graph encoding vector of a vertex is the same as the number of elements of the matrix of the final feature map; the graph encoding vector is cut into equal-length segments that are spliced in order into an intermediate matrix of the same size as the matrix of the final feature map, and the local maxima of the intermediate matrix are searched for and taken as the local maxima of the graph encoding vector.
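For illustration, the intermediate-matrix variant can be sketched as below; the strict 4-neighborhood comparison is one conventional choice of local-maximum search, which the description leaves open.

```python
import numpy as np

def count_local_maxima(g, rows, cols):
    """Fold graph encoding vector g into a (rows, cols) intermediate matrix and
    count entries strictly greater than their 4 neighbors; the count is taken as
    the pest number for the vertex's unit image."""
    m = np.asarray(g).reshape(rows, cols)            # requires len(g) == rows * cols
    p = np.pad(m, 1, constant_values=-np.inf)        # -inf border so edge cells can win
    c = p[1:-1, 1:-1]
    is_max = ((c > p[:-2, 1:-1]) & (c > p[2:, 1:-1]) &
              (c > p[1:-1, :-2]) & (c > p[1:-1, 2:]))
    return int(is_max.sum())
```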
As shown in fig. 2, in one embodiment of the present invention, a method for generating a vertex network graph for each vertex by random walk includes:
step 201, selecting a vertex and starting a random walk centered on the selected vertex until the number of layers walked reaches A, where A is the number of first hidden layers;
step 202, adding the walked vertex sequence to the vertex network graph; if the number of walks is less than B, incrementing the walk count by one and returning to step 201; otherwise, ending;
the vertex sequences are one-hot encoded and input into a continuous bag-of-words model, which outputs the vectorized representations of the vertices, recorded as the vertex vectors. A sketch of this procedure follows.
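For illustration, the walk procedure can be sketched as follows; the adjacency dictionary and the use of gensim's Word2Vec with sg=0 (i.e. a continuous bag-of-words model) for the vectorization step are illustrative assumptions.

```python
import random

def vertex_network_walks(adj, v, A, B):
    """Random walks that build the vertex network graph of center vertex v.
    adj: dict mapping each vertex of the original vertex graph to its neighbor list
         (vertices of adjacent unit images share an edge)
    A:   walk depth, equal to the number of first hidden layers
    B:   number of walks"""
    sequences = []
    for _ in range(B):
        walk, cur = [v], v
        for _ in range(A):                 # walk until the layer count reaches A
            if not adj[cur]:
                break
            cur = random.choice(adj[cur])
            walk.append(cur)
        sequences.append(walk)             # add the walked sequence to the graph
    return sequences
```

The vertices reached at step h of any walk form layer h of the vertex network graph; the sequences would then be fed, with vertices treated as tokens, to a CBOW model such as gensim's `Word2Vec(sequences, sg=0)`, and the learned embeddings recorded as the vertex vectors.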
In one embodiment of the invention, the unit images are generated by uniformly dividing the plant image.
In one embodiment of the present invention, the plant image is a top-view image of a plant cultivation area, the plant being tobacco or the like; the plant image is an RGB image, so each unit image is also an RGB image, and the image feature inputs of the three channels are generated separately.
In one embodiment of the invention, as in a general CNN, pooling layers are arranged between the convolution layers.
In one embodiment of the invention, the activation function $\sigma$ is the Sigmoid activation function.
In one embodiment of the invention, generating the original vertex graph, generating a vertex network graph for each vertex, and vectorizing the vertices of the vertex network graph to obtain vertex vectors are performed outside the CNN neural network implemented by the FPGA with flexible resource configuration.
As shown in fig. 1, a method for implementing a CNN by using an FPGA with flexible resource configuration includes the following steps:
step 101, before the convolution layer is entered, selecting a serial-parallel combination configuration, the configuration including a value K;
here 1 ≤ K ≤ N; the size of K determines the resource cost of the parallel operation, and when the FPGA has fewer resources, K can be set to a smaller value.
step 102, generating the serial step count, which equals ⌈N/K⌉, where N is the number of feature maps in the previous layer, M is the number of feature maps in the next layer, and the current convolution layer has N×M convolution kernels;
and step 103, the K-to-1 convolution computation structure computes K feature maps of the previous layer in parallel, and the serial merging structure accumulates its outputs over ⌈N/K⌉ serial steps to obtain the final result of a next-layer feature map, completing the convolution computation over all N input feature maps of the previous layer; here ⌈·⌉ denotes rounding up to an integer.
the K-1 convolution calculation structure is a convolution calculation structure which calculates K feature images of the previous layer in parallel, calculates to obtain 1 feature image of the next layer, executes convolution calculation of K convolution kernels and corresponding feature images, sums corresponding points to obtain values in the 1 feature images, the convolution calculation of the K feature images and the K convolution kernels is executed in parallel, and each clock sums the output K convolution values;
the serial merging structure is to accumulate the results output by each time of K-1 convolution calculation structure, change K input feature graphs to continue calculation after each time of K-1 convolution calculation, complete the convolution calculation of the total N input feature graphs of the previous layer after N/K times, add bias to the accumulated value, and obtain the final result of the feature graphs through the operation of an activation function f (such as Sigmoid).
The feature maps input to the first convolution layer are the unit images.
The feature map information computed each time is cached off-chip or stored on-chip: when the number of feature maps is large and the feature dimension is high, storage resource occupation is large, and the result needs to be cached off-chip after each single feature map is computed; otherwise it is stored directly on-chip, which reduces transmission time and the final operation latency.
Computations such as the activation function, pooling, and full connection are run in the normal way; the method of the invention focuses on optimizing the inter-layer convolution operation, which is the core computation that consumes the most resources and most affects performance.
Layer L-1 has N feature maps and layer L has M feature maps, so this layer has N×M convolution kernels. Each output feature map of layer L is obtained by convolving all the layer-(L-1) feature maps with the corresponding convolution kernels, summing, adding the bias, and applying the excitation function. The calculation formula is:
$$X_j^{L} = f\left(\sum_{i=1}^{N} X_i^{L-1} * W_{ij}^{L} + b_j^{L}\right)$$
where $X_j^{L}$ denotes the information of the j-th feature map of layer L, $X_i^{L-1}$ denotes the information of the i-th feature map of layer L-1, $W_{ij}^{L}$ denotes the convolution kernel between the i-th input and the j-th output, $*$ denotes convolution, $b_j^{L}$ denotes the bias of the j-th output feature map of layer L, and f denotes the activation function. For convenience of the following explanation, let N=4 and M=3, so this layer has 12 convolution kernels, each of size 3×3;
according to the serial-parallel structure, a K-1 convolution calculation structure is firstly executed, K L-1 layer input features are taken, convolution is carried out on the K L-1 layer input features and K convolution kernels, then the K L-1 layer input features and the K L-1 layer input features are added, and a temporary value of an L-1 layer output feature diagram is obtained, wherein K is not less than 1 and not more than N, and K is assumed to be 2.
Corresponding to convolution operation between the L-1 layer and the L layer, firstly taking the 1 st characteristic diagram information of the L-1 layerAnd 2 nd feature map information->Performing the convolution calculation structure of K-1 to obtain the first temporary feature map of the L-th layer +.>Performing on-chip caching; then take the 3 rd feature map information of L-1 layer +.>And 3 rd feature map information->Performing the convolution calculation structure of K-1 to obtain the second temporary feature map of the L-th layer +.>Will->And->After accumulation, adding bias, and activating function f operation to obtain the result of the first output characteristic diagram of the L layers.
The size of K determines the resource cost of the parallel operation. When the FPGA has fewer resources, K can be set smaller, down to K = 1, executing a fully serial operation; the convolution between the N feature maps of layer L-1 and the M feature maps of layer L then takes N serial steps per output feature map. When the FPGA has more resources, K can be set larger, up to N, and the computation between the N feature maps of layer L-1 and the M feature maps of layer L is completed in only M executions, improving the timeliness of the operation.
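To make the scheduling concrete, the following is a behavioral software model of the serial-parallel scheme, not HDL: on the FPGA the K-to-1 structure would be K parallel multiply-accumulate pipelines, while here SciPy's correlate2d stands in for the hardware convolution primitive and plain loops stand in for the serial steps; all names are illustrative, and the shapes follow the worked example above (N=4, M=3, K=2, 3×3 kernels).

```python
import numpy as np
from math import ceil
from scipy.signal import correlate2d      # stands in for the hardware convolution unit

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def k_to_1(maps, kernels):
    """K-to-1 convolution computation structure: convolve up to K input feature maps
    with their kernels in parallel and sum point-wise into 1 temporary map."""
    return sum(correlate2d(m, k, mode="valid") for m, k in zip(maps, kernels))

def conv_layer(inputs, kernels, biases, K):
    """Serial-parallel convolution between layer L-1 (N maps) and layer L (M maps).
    inputs:  list of N input feature maps
    kernels: kernels[i][j] is the kernel between input i and output j
    biases:  length-M list of biases"""
    N, M = len(inputs), len(biases)
    steps = ceil(N / K)                    # serial step count, ceil(N/K)
    outputs = []
    for j in range(M):
        acc = 0.0
        for s in range(steps):             # serial merging structure
            lo, hi = s * K, min((s + 1) * K, N)
            acc = acc + k_to_1(inputs[lo:hi],
                               [kernels[i][j] for i in range(lo, hi)])
        outputs.append(sigmoid(acc + biases[j]))   # add bias, apply activation f
    return outputs

# worked example: N=4, M=3, K=2, 3x3 kernels on 8x8 inputs
rng = np.random.default_rng(0)
inputs = [rng.standard_normal((8, 8)) for _ in range(4)]
kernels = [[rng.standard_normal((3, 3)) for _ in range(3)] for _ in range(4)]
outs = conv_layer(inputs, kernels, [0.1, 0.2, 0.3], K=2)
assert len(outs) == 3 and outs[0].shape == (6, 6)
```

Setting K = 1 reduces the inner loop to fully serial operation, while K = N computes each output map in a single step, matching the resource trade-off described above.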
For the CNN neural network implemented by the FPGA with flexible resource configuration, since independent unit images are input, the number of convolution operations multiplies as the number of unit images grows; performing convolution with the method described above therefore improves the operating efficiency of the whole neural network.
The embodiments have been described above with reference to examples, but the invention is not limited to the specific implementations described, which are merely illustrative and not restrictive; those of ordinary skill in the art, enlightened by this disclosure, may devise many other forms without departing from the scope of the invention, and all such forms fall within its protection.

Claims (10)

1. A CNN neural network implemented by an FPGA with flexible resource configuration, comprising:
H convolution layers sequentially connected in series, wherein the first convolution layer inputs a unit image, and a plurality of unit images are input in sequence; the feature map output by the last convolution layer is linearly transformed to generate a first feature vector;
generating an original vertex graph, wherein each vertex of the original vertex graph corresponds to one unit image; generating a vertex network graph for each vertex, and vectorizing the vertices of the vertex network graph to obtain vertex vectors;
first hidden layers, wherein a first hidden layer inputs the first feature vectors and the vertex vectors of the vertices of the vertex network graph; the h-th first hidden layer comprises C channels; the vertices in layer h of the vertex network graph are randomly sampled to generate random subsets of equal size, and the i-th random subset, together with the first feature vector and vertex vector of the vertex at the center of the vertex network graph, is input into the i-th channel; the computation comprises:
$$p_h^i = \sigma\left(W_h\, m_v^{h,i} + b_h\right)$$
where $p_h^i$ denotes the i-th propagation vector of the h-th first hidden layer, $W_h$ and $b_h$ respectively denote the weight parameter and bias parameter of the h-th first hidden layer, $\sigma$ denotes an activation function, and $m_v^{h,i}$ denotes the i-th propagation information of vertex v in layer h of the vertex network graph, vertex v being the vertex at the center of the vertex network graph;
$$m_v^{h,i} = \sum_{e \in S_h^i(v)} \alpha_{ve}\, x_e$$
where $S_h^i(v)$ denotes the i-th random subset of $N_h(v)$, $N_h(v)$ denotes the set of vertices in layer h of the vertex network graph connected to vertex v, vertex e belongs to $S_h^i(v)$, $\alpha_{ve}$ denotes the attention parameter between the first feature vector corresponding to vertex v and the first feature vector corresponding to vertex e, and $x_e$ denotes the vertex vector of vertex e;
a second hidden layer, which receives the preprocessed output of the first hidden layers;
the computation of the second hidden layer comprises:
$$g_v = \sigma\left(W_g\left(\sum_{y=1}^{C'} c_y + x_v\right) + b_g\right)$$
where $g_v$ denotes the graph encoding vector of vertex v, $c_y$ denotes the vector of the center of the y-th cluster, obtained by preprocessing the output of the first hidden layers, $C'$ denotes the number of cluster centers, $W_g$ and $b_g$ respectively denote the weight parameter and bias parameter of the second hidden layer, $\sigma$ denotes an activation function, and $x_v$ denotes the vertex vector corresponding to vertex v;
a result output layer, which inputs the graph encoding vectors of the vertices and searches for the local maxima of each graph encoding vector; the number of local maxima is the number of pests in the region of the unit image corresponding to the vertex;
during training, the second hidden layer is connected to a fully connected layer, which inputs the graph encoding vectors of the vertices and outputs a mapping into a classification space; the classification labels of the classification space represent the degree of insect pest.
2. The CNN neural network implemented by an FPGA with flexible resource configuration according to claim 1, wherein connecting edges exist between the vertices corresponding to adjacent unit images on the plant image.
3. The CNN neural network implemented by an FPGA with flexible resource configuration according to claim 1, wherein the attention parameter between the first feature vector corresponding to vertex v and the first feature vector corresponding to vertex e is calculated as:
$$\alpha_{ve} = \frac{\exp(a_{ve})}{\sum_{k \in S(e)} \exp(a_{vk})}, \qquad a_{ve} = \lambda\, u_v^{\top} u_e$$
where $a_{ve}$ denotes the original attention parameter, $S(e)$ denotes the random subset to which vertex e belongs in the vertex network graph of vertex v, $\exp$ denotes the exponential function with the natural constant as its base, $u_v$ and $u_e$ respectively denote the first feature vectors corresponding to vertex v and vertex e, and $\lambda$ denotes the scaling coefficient.
4. The FPGA-implemented CNN neural network of claim 1, wherein the preprocessing of the output of the first hidden layers comprises: taking the C propagation vectors output by one first hidden layer as cluster centers, clustering the propagation vectors output by the other first hidden layers to generate C clusters, and computing the vector of the cluster center for each cluster.
5. The CNN neural network implemented by an FPGA with flexible resource configuration according to claim 1, wherein a random walk is performed on the original vertex graph to generate the vertex network graph for each vertex, and the number of layers of the vertex network graph is the same as the number of layers walked by the random walk.
6. The FPGA-implemented CNN neural network of claim 5, wherein the method of generating a vertex network graph for each vertex by random walk comprises:
step 201, selecting a vertex and starting a random walk centered on the selected vertex until the number of layers walked reaches A, where A is the number of first hidden layers;
step 202, adding the walked vertex sequence to the vertex network graph; if the number of walks is less than B, incrementing the walk count by one and returning to step 201; otherwise, ending.
7. The FPGA-implemented CNN neural network of claim 1, wherein the classification labels of the classification space respectively represent no pest, general pest, and serious pest.
8. The FPGA-implemented CNN neural network of claim 1, wherein the unit images are generated by uniformly dividing the plant image.
9. A method for implementing a CNN by using an FPGA with flexible resource configuration, used to perform the convolution operation of a convolution layer of the CNN neural network implemented by an FPGA with flexible resource configuration according to any one of claims 1 to 8, comprising the following steps:
step 101, before the convolution layer is entered, selecting a serial-parallel combination configuration, the configuration including a value K;
step 102, generating the serial step count, which equals ⌈N/K⌉, where N is the number of feature maps in the previous layer, M is the number of feature maps in the next layer, and the current convolution layer has N×M convolution kernels;
and step 103, the K-to-1 convolution computation structure computes K feature maps of the previous layer in parallel, and the serial merging structure accumulates its outputs over ⌈N/K⌉ serial steps to obtain the final result of a next-layer feature map, completing the convolution computation over all N input feature maps of the previous layer.
10. The method for implementing a CNN by using an FPGA with flexible resource configuration according to claim 9, wherein the K-to-1 convolution computation structure performs the convolution of K convolution kernels with their corresponding feature maps and sums the results point-wise to obtain values of 1 output feature map; the convolutions of the K feature maps with the K convolution kernels are executed in parallel, and the K output convolution values are summed every clock cycle;
the serial merging structure accumulates the results output by the K-to-1 convolution computation structure each time; after each K-to-1 convolution computation, the K input feature maps are changed and the computation continues; after ⌈N/K⌉ times, the convolution computation over all N input feature maps of the previous layer is completed; a bias is added to the accumulated value, and the final result of the feature map is obtained through the operation of an activation function f.
CN202310748451.2A 2023-06-25 2023-06-25 Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration Pending CN116543291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310748451.2A CN116543291A (en) Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310748451.2A CN116543291A (en) Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration

Publications (1)

Publication Number Publication Date
CN116543291A true CN116543291A (en) 2023-08-04

Family

ID=87445549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310748451.2A CN116543291A (en) 2023-06-25 2023-06-25 Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration

Country Status (1)

Country Link
CN (1) CN116543291A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117348949A (en) * 2023-12-05 2024-01-05 成都玖锦科技有限公司 Multi-channel measurement method and system based on vector network analyzer
CN117348949B (en) * 2023-12-05 2024-03-12 成都玖锦科技有限公司 Multi-channel measurement method and system based on vector network analyzer

Similar Documents

Publication Publication Date Title
Zhang et al. A review of deep learning-based semantic segmentation for point cloud
Chen et al. Efficient approximation of deep relu networks for functions on low dimensional manifolds
Xie et al. Grnet: Gridding residual network for dense point cloud completion
CN108510012B (en) Target rapid detection method based on multi-scale feature map
Park et al. Analysis on the dropout effect in convolutional neural networks
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
US20200234172A1 (en) Systems and methods for hybrid algorithms using cluster contraction
US20180260709A1 (en) Calculating device and method for a sparsely connected artificial neural network
JP6962263B2 (en) 3D point cloud label learning device, 3D point cloud label estimation device, 3D point cloud label learning method, 3D point cloud label estimation method, and program
CN111507521B (en) Method and device for predicting power load of transformer area
CN112288011B (en) Image matching method based on self-attention deep neural network
CN111401436B (en) Streetscape image segmentation method fusing network and two-channel attention mechanism
CN113822209B (en) Hyperspectral image recognition method and device, electronic equipment and readable storage medium
CN113449736B (en) Photogrammetry point cloud semantic segmentation method based on deep learning
CN112163601A (en) Image classification method, system, computer device and storage medium
CN115908908B (en) Remote sensing image aggregation type target recognition method and device based on graph attention network
CN116543291A (en) Method for realizing CNN (convolutional neural network) by using FPGA (field programmable gate array) with flexible resource configuration
Chen et al. A 68-mw 2.2 tops/w low bit width and multiplierless DCNN object detection processor for visually impaired people
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
Zhang et al. Segmentation model based on convolutional neural networks for extracting vegetation from Gaofen-2 images
CN113642716A (en) Depth variation autoencoder model training method, device, equipment and storage medium
CN113627440A (en) Large-scale point cloud semantic segmentation method based on lightweight neural network
CN115830596A (en) Remote sensing image semantic segmentation method based on fusion pyramid attention
JP2021039758A (en) Similar region emphasis method and system using similarity among images
Xiao et al. A point selection method in map generalization using graph convolutional network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination