CN112784969B - Convolutional neural network acceleration learning method for image feature extraction - Google Patents

Convolutional neural network acceleration learning method for image feature extraction

Info

Publication number
CN112784969B
CN112784969B (application CN202110136925.9A)
Authority
CN
China
Prior art keywords
matrix
convolution kernel
vector
convolution
input
Prior art date
Legal status
Active
Application number
CN202110136925.9A
Other languages
Chinese (zh)
Other versions
CN112784969A (en)
Inventor
杨晓春
张宇杰
许婧楠
王斌
Original Assignee
东北大学
Priority date
Filing date
Publication date
Application filed by 东北大学
Priority to CN202110136925.9A
Publication of CN112784969A
Application granted granted Critical
Publication of CN112784969B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a sampling-based accelerated learning method for convolutional neural networks, belonging to the technical field of convolutional neural networks. In the forward propagation stage, only a sampled subset of the convolution kernel vectors is multiplied with the input data; the remaining vectors are skipped. In the backward propagation stage, only the convolution kernel vectors that participated in the forward computation are updated. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated at each step, network convergence is accelerated. The sampling-based convolutional neural network acceleration learning method requires no adjustment to the macroscopic structure of the convolutional network in practical application, does not affect the network's local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and saves cost.

Description

Convolutional neural network acceleration learning method for image feature extraction
Technical Field
The invention belongs to the technical field of convolutional neural networks, and particularly relates to a convolutional neural network acceleration learning method for image feature extraction.
Background
Convolutional neural networks (Convolutional Neural Network, CNN) are among the earliest successful deep models and remain at the forefront of commercial deep learning applications, such as image detection and segmentation, object recognition, and speech processing.
The convolution operation is the process of sliding different convolution kernels over an input picture and performing a fixed computation at each position. Specifically, at each sliding position, the elements of the convolution kernel are multiplied one-to-one with the corresponding elements of the input picture and summed. Afterwards, a nonlinearity is applied through an activation function, most commonly linear rectification, i.e. the ReLU function. This computational principle gives the convolutional network its ability to extract local features. The main structure of a convolutional neural network stacks several convolution layers as a feature extractor, followed by a fully connected layer as a classifier. To give the convolutional network better feature extraction capability, many convolution layers often need to be stacked, so the parameter scale of a convolutional neural network grows greatly with network depth. The computational cost of forward feature extraction and backward error propagation through the network is typically on the order of tens of millions to hundreds of millions of operations, with the convolution operations of the convolution layers consuming the most computing resources. Accelerating the convolution operation is therefore the key to improving the computational efficiency of convolutional neural network models.
Commonly used neural network training frameworks, such as Caffe and TensorFlow, unroll the input data and the convolution kernels into two-dimensional matrices, thereby converting the convolution operation into a matrix multiplication. In convolutional network learning, each convolution layer must perform three matrix products in total: one in forward propagation, and one each in backward propagation to compute the output gradient matrix and the gradient matrix of the convolution kernel, which consumes a great amount of computing resources. In practice, not all neural units in a convolutional network are computationally meaningful: units with larger eigenvalues have larger influence on subsequent network layers, and the common activation function ReLU directly sets negative eigenvalues to 0 when applying the nonlinearity. Therefore a new convolutional network learning method is needed that does not compute the complete matrix multiplication but computes only the values of the more meaningful neural units in the original output, omitting the computation of the remaining units; this reduces computational cost and accelerates convolutional network training. Moreover, such a learning method is an improvement at the level of the method itself, and is more practical than convolution acceleration methods based on hardware devices.
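The unrolling described above is often called im2col; a minimal NumPy sketch (an illustrative toy only, assuming a single channel, stride 1, and no padding; the function name im2col and all shapes are chosen for this example) shows how it turns convolution into a matrix product:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll a single-channel H×W image into an (OH*OW) × (kh*kw) matrix
    so that convolution becomes a matrix product (stride 1, no padding)."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            # Each row of the unrolled matrix is one kernel-sized patch.
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

# Convolution via matrix multiplication: each kernel becomes one row of W.
x = np.arange(16.0).reshape(4, 4)
kernels = np.stack([np.ones((3, 3)), np.eye(3)])  # 2 kernels, each 3×3
W = kernels.reshape(2, -1)                        # 2 × 9 kernel matrix
X = im2col(x, 3, 3)                               # 4 × 9 unrolled input (OH*OW = 4)
Y = X @ W.T                                       # 4 × 2 output feature map
```

Each row of Y is one sliding position; each column is one kernel, matching the N×n output matrix the method operates on.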
Disclosure of Invention
In practical feature extraction scenarios, convolutional network parameters are large in scale, computation is expensive, and the computation contains redundancy, so training a convolutional network model for feature extraction is slow, while acceleration methods based on hardware devices are difficult to apply in practice. The invention provides a convolutional neural network acceleration learning method for image feature extraction.
The technical scheme of the invention is as follows:
The convolutional neural network acceleration learning method for image feature extraction comprises the following steps:
Step 1: in the forward propagation stage, compute the output feature map using probability sampling;
Step 2: in the backward propagation stage, retain only the gradient values corresponding to the neurons that participated in the forward computation, set the remaining gradient values to 0 to prune the gradient matrix, and use the pruned gradient matrix to calculate and update the convolution kernel parameters;
Step 3: and repeatedly executing the steps 1 and 2 until the network training stopping condition is reached.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the method of obtaining the output feature map by probability-sampled computation in the forward propagation stage in step 1 is as follows: first, expand the input feature map and the convolution kernel into two-dimensional matrices; for each input vector in the expanded two-dimensional matrix, obtain a corresponding candidate convolution kernel vector number set V by probability sampling; multiply the input vector only with the vectors in set V, fill the calculation results into the corresponding positions of the output feature map, and set the remaining positions of the output feature map to 0.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the step 1 specifically includes the following steps:
Step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, convert the convolution operation into the product of the expanded two-dimensional matrices X and W, and calculate the absolute value sum s_t of each column of elements in the matrix W;
Step 1.2: construct a conditional probability distribution P(j|x_i) for each vector x_i in X according to the absolute value sums s_t of the columns of matrix W;
Step 1.3: sample τ times according to the probability distribution P(j|x_i); each sample yields one convolution kernel vector number, and after τ samples a convolution kernel vector candidate set V_pre is obtained;
Step 1.4: screen the elements in V_pre according to a preset condition, and form the final candidate convolution kernel vector number set V from the screened elements;
Step 1.5: compute the inner product of vector x_i only with the vectors in set V, fill the results into the corresponding positions of the output feature map Y, and set the remaining positions to 0.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the meaning of the conditional probability distribution P(j|x_i) is that for one vector x_i in X, a vector number is drawn from among all the convolution kernel vectors, and the probability of drawing the j-th convolution kernel vector w_j is proportional to the absolute value of the inner product x_i w_j^T of the input vector x_i and w_j, as follows:
P(j|x_i) ∝ |x_i w_j^T|, i∈[1,N], j∈[1,n] (2)
wherein i is a row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the specific content of step 1.2 includes:
constructing a polynomial distribution P(j|t) for each column t of the convolution kernel matrix W from the absolute value sums s_t of the columns of matrix W, and constructing a polynomial distribution P(t|x_i) for each vector x_i in X; the conditional probability distribution P(j|x_i) is then obtained by marginalizing over t:
P(j|x_i) = Σ_{t=1}^{d} P(t|x_i) P(j|t)
wherein i is a row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of matrix W; d = d_x = d_w, where d_x is the dimension of vector x and d_w is the dimension of the convolution kernel vector w.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the polynomial distributions P(j|t) and P(t|x_i) are represented by formula (3) and formula (4), respectively:
P(j|t) ~ PN([|W_1t|, …, |W_nt|]), j∈[1,n], t∈[1,d] (3)
P(t|x_i) ~ PN([|x_i1 s_1|, …, |x_id s_d|]), t∈[1,d] (4)
wherein PN denotes a polynomial distribution. In formula (3), each distribution P(j|t) stores the probabilities of selecting the different row numbers j of the matrix W given that the t-th column of the convolution kernel matrix W has been selected; since the matrix W has d columns, d such distributions are constructed in total. In formula (4), the distribution P(t|x_i) represents the probabilities of selecting the different column numbers t of the matrix W for the input vector x_i.
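Formulas (3) and (4), and the marginalization that recovers P(j|x_i) ∝ Σ_t |x_it W_jt|, can be checked numerically with a small sketch (assuming NumPy; all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6))   # n=4 kernel vectors, d=6
x = rng.standard_normal(6)        # one input vector x_i

s = np.abs(W).sum(axis=0)         # s_t: column-wise absolute sums, as in Eq. (1)

# Eq. (3): one distribution over rows j for each column t (each column sums to 1).
P_j_given_t = np.abs(W) / s       # shape n×d
# Eq. (4): distribution over columns t for this input vector.
p_t = np.abs(x * s)
P_t = p_t / p_t.sum()

# Marginalizing over t recovers P(j|x_i) proportional to Σ_t |x_t W_jt|.
P_j = P_j_given_t @ P_t
target = np.abs(x * W).sum(axis=1)
```

Here P_j equals target normalized to sum 1, confirming that the two cheap factor distributions reproduce the intended sampling probabilities.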
Further, according to the convolutional neural network accelerated learning method for image feature extraction, the method for obtaining a convolution kernel vector number in each sample in step 1.3 is as follows: sample according to P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; find the t_chosen-th probability distribution P(j|t = t_chosen) among the conditional probability distributions P(j|t), and sample from it to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Further, according to the convolutional neural network accelerated learning method for image feature extraction, step 1.4 specifically includes: first, calculate a resulting weight ω_j for the convolution kernel vector number j obtained in each sample; then sort the elements in the set V_pre by the weights ω_j and retain the θ vectors with the largest weights as the candidate convolution kernel vector number set V; wherein ω_j = ω_j + sgn(x_it W_jt), and sgn() denotes the sign function: when x_it W_jt > 0, sgn(x_it W_jt) = 1; when x_it W_jt = 0, sgn(x_it W_jt) = 0; when x_it W_jt < 0, sgn(x_it W_jt) = -1.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, θ represents the number of candidate convolutional kernel vectors which are finally reserved, and θ is less than or equal to n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e. the number of convolutional kernel vectors in the matrix.
The convolutional neural network acceleration learning method for image feature extraction, wherein step 2 comprises the following steps:
Step 2.1: set the unnecessary gradient values in the input gradient matrix ∂L/∂Y to 0, pruning it to obtain the pruned input gradient matrix (∂L/∂Y)';
Step 2.2: compute the convolution kernel gradient matrix ∂L/∂W;
Step 2.3: compute the output gradient matrix ∂L/∂X;
Step 2.4: update the matrix W from the convolution kernel gradient matrix ∂L/∂W; the output gradient matrix ∂L/∂X serves as the input gradient matrix of the previous layer in the network, and back propagation continues.
Compared with the prior art, the convolutional neural network acceleration learning method for image feature extraction has the following beneficial effects: using the principle of probability sampling, the method reduces the amount of computation without affecting the network's feature extraction capability, thereby accelerating construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the need for rapid feature extraction in practical applications. Specifically, the forward propagation stage multiplies only a sampled subset of the convolution kernel vectors with the input data; the remaining vectors are skipped. The backward propagation stage updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated at each step, network convergence is accelerated. The method requires no adjustment to the macroscopic structure of the convolutional network in practical application, does not affect the network's local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and saves cost.
Drawings
FIG. 1 is a schematic diagram of a sample-based convolutional layer feature extraction process provided by the present invention;
FIG. 2 is a schematic flow chart of the convolutional neural network accelerated learning method for image feature extraction provided by the invention;
FIG. 3 is a schematic flow chart of forward propagation provided by the present invention;
FIG. 4 is a schematic flow chart of constructing the probability distribution P(j|x_i) provided by the present invention;
FIG. 5 is a schematic diagram of a specific sampling flow in step 1.3 according to the present invention;
FIG. 6 is a flow chart of obtaining the final convolution kernel vector set V by screening from the set V_pre;
Fig. 7 is a flow chart of the back propagation and update process provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, the technical solutions of the present invention are described below clearly and completely with reference to the embodiments and the accompanying drawings; the described embodiments are preferred embodiments of the present invention, but not all possible embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Fig. 1 shows a schematic diagram of the sampling-based convolution layer feature extraction process provided by the present invention. When a convolutional network is used for feature extraction, several convolution layers are stacked to improve the network's feature extraction capability. Taking image feature extraction as an example, the input of each convolution layer is the original picture or the feature map produced by the previous layer, and its output is the output feature map obtained through the convolution operation. The invention aims to reduce the amount of computation of the convolution operation without affecting the feature extraction effect of the convolutional network and to improve the computational efficiency of feature extraction with a convolutional neural network model, so the input and output dimensions are the same as defined in the existing convolution operation. The definitions of the symbol variables annotated in fig. 1 are shown in table 1.
Table 1. Meanings of the symbol variables referred to in fig. 1
An embodiment of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 2, the flowchart of the convolutional neural network accelerated learning method for image feature extraction provided by the invention includes steps 1, 2, and 3:
Step 1: the forward propagation stage utilizes probability sampling to calculate an output feature map.
First, the input feature map and the convolution kernel are both expanded into two-dimensional matrices. For each input vector in the expanded input two-dimensional matrix, a corresponding candidate convolution kernel vector number set V is obtained by probability sampling. The input vector is multiplied only with the vectors in set V, the calculation results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are 0.
The specific workflow of the forward propagation phase, as shown in fig. 3, includes steps 1.1, 1.2, 1.3, 1.4 and 1.5:
step 1.1: and expanding the input feature map into a two-dimensional matrix X according to the dimension of a convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, and calculating the absolute value sum of each column of elements in the matrix W.
Specifically, as shown in fig. 1, when unrolling the input data, by the definition of the existing convolution operation, as the convolution kernel slides over the original image, the elements of the kernel at each sliding position are multiplied one-to-one with the elements of the input feature map and summed; the convolution operation can therefore be converted into a matrix product. As shown in fig. 1, the convolution kernel is a four-dimensional tensor of kn×kh×kw×kc, which is unrolled into a two-dimensional matrix denoted W. W has dimensions n×d_w; each row of the matrix is a convolution kernel vector w, whose dimension is d_w = kh×kw×kc; n is the number of matrix rows, i.e. the number of convolution kernel vectors in the matrix, so n = kn. Let the input feature map be a four-dimensional tensor of in×ih×iw×ic, and unroll it into a two-dimensional matrix X according to the convolution kernel dimensions; X is then an N×d_x two-dimensional matrix. Each vector x in X is the region covered by the convolution kernel at one sliding position on the original image, so the dimension d_x of vector x equals the dimension d_w of the convolution kernel vector w; let d_x = d_w = d. N is the total number of sliding positions of the convolution kernel on the input feature map; by the definition of the convolution operation, N = in×oh×ow. The convolution operation can thus be converted into the product of the unrolled two-dimensional matrices X and W.
The absolute value sum s_t of each column of elements of the matrix W is calculated as:
s_t = Σ_{j=1}^{n} |W_jt|, t∈[1,d] (1)
wherein t is the column number of the matrix W and W_jt is the element in row j, column t of the convolution kernel matrix W. Calculating this value prepares the probability distributions for the subsequent steps.
Step 1.2: from the absolute value sum s t of each column element of the matrix W, a conditional probability distribution P (j|x i) is constructed for each vector X i in X.
P(j|xi)∝xiwj T,i∈[1,N],,∈[1,n] (2)
Wherein i is a line number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W. The significance of this probability distribution is that for one vector X i of X, the vector number is extracted among all convolution kernel vectors, and the probability of extracting to the jth convolution kernel vector w j is proportional to the absolute value size of the inner product X iwj T of the input vectors X i and w j.
In particular, since it is difficult to construct the conditional probability distribution P(j|x_i) ∝ |x_i w_j^T| directly, the method obtains P(j|x_i) by constructing two polynomial distributions, P(j|t) (step 1.2.1) and P(t|x_i) (step 1.2.2), and marginalizing over t.
The specific workflow for constructing the conditional probability distribution P (j|x i), as shown in fig. 4, includes steps 1.2.1 and 1.2.2:
Step 1.2.1: construct a polynomial distribution P(j|t) for each column of the convolution kernel matrix W.
P(j|t) ~ PN([|W_1t|, …, |W_nt|]), j∈[1,n], t∈[1,d] (3)
wherein each distribution P(j|t) stores the probabilities of selecting the different row numbers j of the matrix W given that the t-th column of the matrix W has been selected; since the matrix W has d columns, d such distributions are constructed in total. PN denotes a polynomial distribution. In this distribution, taking j = 5 as an example, the specific probability value is calculated as P(j = 5|t) = |W_5t| / s_t.
Step 1.2.2: for each vector X i in X, a polynomial distribution P (t|x i) is constructed.
P(t|x_i) ~ PN([|x_i1 s_1|, …, |x_id s_d|]), t∈[1,d] (4)
wherein each s_t is the absolute value sum of the corresponding column of the convolution kernel matrix calculated in step 1.1. Specifically, for example, the probability of t being 3 (3∈[1,d]) is P(t = 3|x_i) = |x_i3 s_3| / Σ_{t'=1}^{d} |x_it' s_t'|.
Step 1.3: and (3) sampling is carried out for tau times according to the probability distribution P (j|x i), a convolution kernel vector number is obtained each time, and a convolution kernel vector candidate set V pre is obtained after the tau times of sampling. τ represents the number of samples and the specific value is customized by the skilled person based on the experimental effect.
The specific workflow of sampling according to probability distribution P (j|x i) is shown in fig. 5. Comprises the steps of 1.3.1, 1.3.2 and 1.3.3:
Step 1.3.1: sample from P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W.
Step 1.3.2: find the t_chosen-th probability distribution P(j|t = t_chosen) among the distributions constructed in step 1.2.1, and sample from it to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Specifically, for one sample, suppose t = 3 is first drawn according to P(t|x_i) (step 1.3.1); the probability distribution P(j|t = 3) is then found, and j = 5 is drawn from it (step 1.3.2). The convolution kernel vector number finally extracted is 5. The significance of this result is that the inner product of the input vector x_i with the drawn convolution kernel vector w_5 is likely to yield a larger eigenvalue than the non-drawn convolution kernel vectors.
Step 1.3.3: repeat steps 1.3.1 and 1.3.2 for τ draws to obtain the convolution kernel vector candidate set V_pre.
Step 1.4: v pre is filtered to obtain a final candidate convolution kernel vector number set V.
The specific workflow of the final convolution kernel vector set V is obtained by screening the set V pre, as shown in fig. 6, including steps 1.4.1 and 1.4.2:
Step 1.4.1: a resulting weight is calculated for the convolution kernel number j obtained for each sample, denoted omega jj=ωj+sgn(xitWjt).
Where sgn () represents a sign function, when x itWjt > 0, sgn (x itWjt) =1; when x itWjt =0, sgn (x itWjt) =0; when x itWjt < 0, sgn (x itWjt) = -1.
Specifically, since the objective is to construct a probability distribution P (j|x i)∝xiwj Y, the inner product x iwj T is divided by a positive and negative value, and two probability distributions P (t|x i) and P (j|t) are constructed in proportion to the magnitude of the inner product absolute value.
Step 1.4.2: the elements in the set V pre are ordered according to the weight omega j, and the theta vectors with the largest weights are reserved as the final set V.
Specifically, θ represents the number of candidate convolution kernel vector numbers finally reserved, and θ is less than or equal to n. When all convolution kernel vectors are selected, θ=n, the amount of computation at this time is the same as in the existing convolution method. The specific value of θ is customized by the skilled person according to the experimental effect.
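The screening of steps 1.4.1 and 1.4.2 can be sketched as follows (a toy illustration; the function name screen_candidates and the dictionary-based weight accumulation are assumptions of this example):

```python
import numpy as np

def screen_candidates(V_pre, x, W, theta):
    """Sketch of Step 1.4: accumulate a weight ω_j += sgn(x_t * W_jt) for every
    draw (t, j) in V_pre, then keep the θ candidate numbers with the largest
    weights as the final set V."""
    omega = {}
    for t, j in V_pre:
        omega[j] = omega.get(j, 0) + np.sign(x[t] * W[j, t])
    return sorted(omega, key=omega.get, reverse=True)[:theta]

# Toy check: kernel 0 agrees in sign with x at every drawn column, kernel 1 opposes.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, 1.0, 1.0],
              [-1.0, -1.0, -1.0]])
V_pre = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
V = screen_candidates(V_pre, x, W, theta=1)
```

Kernel 0 accumulates weight +3 and kernel 1 accumulates -2, so with θ = 1 only kernel 0 survives the screening.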
Step 1.5: vector x i is inner-product with only those extracted vectors in set V, and as a result, fills in the locations corresponding to output feature map Y, the rest of locations are 0.
The specific flow of step 1 has thus been described in connection with a specific embodiment. The construction and sampling of the polynomial distributions in step 1.2 can be further accelerated by existing methods: for example, with the alias sampling method, the time complexity of constructing P(t|x_i) for each input vector is O(d) and the time complexity of each sample is O(1). The existing convolution method must compute the product of x_i with all n convolution kernel vectors, whereas the invention only computes the product of x_i with the θ vectors in the set V obtained by probability sampling, so the forward propagation time complexity of the improved convolution operation is O(N·θ·d); the forward propagation of the existing convolution method requires a complete matrix multiplication, with time complexity O(N·n·d). Because θ ≤ n, the invention has an acceleration advantage. Moreover, since the threshold θ is customized by the technician according to experimental results, the trade-off between speed and accuracy can be adjusted to actual requirements.
Step 2: the backward propagation phase only retains the corresponding gradient values of neurons participating in computation in the forward propagation, and the rest gradient values are ignored for 0. The pruned gradient matrix is used to calculate and update convolution kernel parameters.
Specifically, in the existing definition, the forward propagation stage of the network training is a process of calculating the given input data layer by the network to obtain the final output; the counter-propagating stage calculates the error between the output and the target value, and propagates the error forward layer by layer from the last layer of the network. For the convolution layer, the back propagation stage calculates a convolution kernel gradient matrix and an output gradient matrix according to the input gradient matrix, wherein the convolution kernel gradient matrix is used for updating the convolution kernel tensor of the current layer, and the output gradient matrix is transmitted to the previous layer and used as the input gradient matrix of the previous layer. The invention does not change the definition of the existing convolution network to the back propagation process, and only reduces the calculated amount of the convolution kernel gradient matrix and the output gradient matrix in the process.
The specific workflow of the back propagation and update process in the present invention, as shown in fig. 7, includes steps 2.1, 2.2, 2.3, and 2.4:
Step 2.1: for input gradient matrix The unnecessary gradient value in the (1) is ignored to be 0, and the/>
According to the back-propagation definition,The same dimensions as the output profile Y. According to the flow of step 1, for each vector X i in the input two-dimensional matrix X, only the product of the vector X i and the convolution kernel vector obtained by sampling is carried out, and the result is filled in the corresponding position of the output characteristic diagram Y, only the positions actually participate in the calculation of forward propagation. Thus only remain/>The rest of the gradient values are set to 0, thus obtaining/>
Step 2.2: gradient matrix for computing convolution kernel
In particular, the method comprises the steps of,Only the gradient values of the positions participating in the computation in the forward propagation are retained, such that/>And multiplying X, so that the gradient value is calculated for each vector X i in X by the convolution kernel vector sampled in the corresponding forward propagation stage, and the gradient values of the rest convolution kernel vectors are 0.
Step 2.3: calculating an output gradient matrix
In particular, i.e. using a pruned input gradient matrixAnd multiplying the convolution kernel matrix W to obtain an output gradient matrix/>
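Step 2.3 is a single matrix product; a sketch under the same unfolded-matrix convention as above (names are illustrative):

```python
import numpy as np

def output_gradient(grad_Y_pruned, W):
    """Step 2.3 sketch: dL/dX = (pruned dL/dY) @ W.  Rows whose sampled
    products were all pruned receive zero gradient and pass nothing
    back to the previous layer."""
    return grad_Y_pruned @ W
```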
Step 2.4: update the matrix W according to the convolution kernel gradient matrix ∂L/∂W, and pass the output gradient matrix ∂L/∂X to the previous layer in the network as its input gradient matrix, so that back propagation continues.
The update operation is the same as in existing convolutional networks and supports different update strategies, for example the stochastic gradient descent rule W ← W − η·∂L/∂W, where η is the learning rate set by the technician according to an existing learning-rate setting strategy.
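A sketch of the stochastic-gradient-descent form of the step 2.4 update (names are illustrative; η is the learning rate):

```python
import numpy as np

def sgd_update(W, grad_W, eta=0.01):
    """Step 2.4 sketch: plain stochastic-gradient-descent rule
    W <- W - eta * dL/dW.  Other update strategies (momentum, Adam, ...)
    slot into the same place."""
    return W - eta * grad_W
```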
Step 3: repeat steps 1 and 2 until the network training stop condition is reached. The stop condition is the same as in the existing convolutional network learning process. This completes the learning process of the convolutional network.
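The overall loop of steps 1-3 can be sketched as follows, with the sampled forward pass and pruned backward pass abstracted as callables and a fixed epoch budget standing in for the stop condition (both are illustrative assumptions, not the patent's own stopping rule):

```python
def train(step_forward, step_backward, W, num_epochs):
    """Steps 1-3 sketch: repeat step 1 (probability-sampled forward pass)
    and step 2 (pruned backward pass + kernel update) until the stop
    condition -- here an illustrative fixed epoch budget -- is met."""
    for _ in range(num_epochs):
        Y, mask = step_forward(W)        # step 1: sampled forward propagation
        W = step_backward(W, Y, mask)    # step 2: pruned back propagation + update
    return W
```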
According to the above technical scheme, the convolutional neural network acceleration learning method for image feature extraction provided by the embodiments of this application uses the probability-sampling principle to select and update only the more meaningful weight computations in the network. It reduces the amount of computation without affecting the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the demand for fast feature extraction in practical applications. Compared with hardware-based convolution acceleration methods, it is also easier to apply and saves cost.
The present invention has thus been described with reference to the accompanying drawings and embodiments. The above description is only a preferred embodiment of the present invention and is not intended to limit its scope.

Claims (8)

1. The convolutional neural network acceleration learning method for image feature extraction is characterized by comprising the following steps of:
Step 1: the forward propagation stage utilizes probability sampling to calculate and obtain an output characteristic diagram;
Step 2: the corresponding gradient values of neurons participating in calculation during forward propagation are reserved only in the backward propagation stage, the rest gradient values are ignored for 0, the gradient matrix is pruned, and the pruned gradient matrix is used for calculating and updating convolution kernel parameters;
step 3: repeating the steps 1 and 2 until reaching the network training stop condition;
The step 1 specifically comprises the following steps:
Step 1.1: expanding the input feature map into a two-dimensional matrix X according to the dimension of a convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, converting convolution operation into the product of the expanded two-dimensional matrix X and W, and calculating the absolute value sum s t of each column of elements in the matrix W;
Step 1.2: constructing a conditional probability distribution P (j|x i) for each vector X i in X according to the absolute value sum s t of each column element of the matrix W;
step 1.3: sampling is carried out for tau times according to probability distribution P (j|x i), a convolution kernel vector number is obtained by each sampling, and a convolution kernel vector candidate set V pre is obtained after tau times of sampling;
Step 1.4: screening elements in the V pre according to a preset condition, and forming a final candidate convolution kernel vector number set V by the screened elements;
Step 1.5: vector x i is only inner product with the vector in set V, the result is filled in the corresponding position of the output characteristic diagram Y, and the rest positions are 0;
The step2 comprises the following steps:
Step 2.1: for input gradient matrix The unnecessary gradient value in (1) is ignored to be 0, and the input gradient matrix/>Deleting to obtain deleted input gradient matrix/>
Step 2.2: gradient matrix for computing convolution kernel
Step 2.3: calculating an output gradient matrix
Step 2.4: from a convolution kernel gradient matrixUpdating matrix W, outputting gradient matrix/>The input gradient matrix, which is the previous layer in the network, continues back propagation.
2. The method for accelerating learning of a convolutional neural network for image feature extraction according to claim 1, wherein the method for obtaining the output feature map by probability-sampling calculation in the forward propagation stage in step 1 is as follows: firstly, the input feature map and the convolution kernel are expanded into two-dimensional matrices; for each input vector in the expanded two-dimensional matrix, a corresponding candidate convolution kernel vector number set V is obtained by probability sampling; the input vector is multiplied only with the vectors in the set V, the calculation results are filled into the corresponding positions of the output feature map, and the remaining positions of the output feature map are 0.
3. The method for accelerating learning of a convolutional neural network for image feature extraction according to claim 1, wherein the meaning of the conditional probability distribution P(j|x_i) is that, for a vector x_i in X, vector numbers are drawn among all convolution kernel vectors, and the probability of drawing the j-th convolution kernel vector w_j is proportional to the absolute value of the inner product x_i·w_jᵀ of the input vector x_i and w_j, as given by:
P(j|x_i) ∝ |x_i·w_jᵀ|, i ∈ [1, N], j ∈ [1, n] (2)
wherein i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e., the number of convolution kernel vectors in the matrix.
4. The convolutional neural network acceleration learning method for image feature extraction of claim 1, wherein the step 1.2 comprises:
Constructing a polynomial distribution P(j|t) for each column of the convolution kernel matrix W according to the absolute-value sum s_t of the elements of each column of the matrix W, and constructing a polynomial distribution P(t|x_i) for each vector x_i in X; constructing the conditional probability distribution P(j|x_i) according to P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t); wherein i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of the matrix W; and d = d_x = d_w, where d_x is the dimension of vector x and d_w is the dimension of the convolution kernel vector w.
5. The convolutional neural network acceleration learning method for image feature extraction of claim 4, wherein the polynomial distributions P (j|t) and P (t|x i) are represented by equations (3) and (4), respectively:
P(j|t)~PN([|W1t|,…,|Wnt|]),j∈[1,n],t∈[1,d] (3)
P(t|xi)~PN([|xi1s1|,…,|xidsd|]),t∈[1,d] (4)
Wherein PN represents a polynomial distribution; in the formula (3), each distribution P (j|t) stores the probability of selecting a different row number j of the matrix W on the premise that the t-th column of the matrix W is selected; since the matrix W has d columns, d distributions are constructed altogether; in equation (4), the distribution P (t|x i) represents the probability of selecting a different column number t of the matrix W for the input vector x i.
6. The method for accelerated learning of convolutional neural networks for image feature extraction of claim 1,
The method for obtaining a convolution kernel vector number in each sampling in step 1.3 is as follows: sampling according to P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; the t_chosen-th probability distribution P(j|t = t_chosen) is found among the conditional probability distributions P(j|t), and a number j_chosen is obtained by sampling from that distribution; j_chosen is the number of the convolution kernel vector obtained in this sampling.
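Under the distributions of claims 4-5, the two-stage sampling of this claim can be sketched as follows (numpy; names such as `sample_kernel_vector` are illustrative assumptions, not the patented code):

```python
import numpy as np

def sample_kernel_vector(x_i, W, s, rng):
    """Two-stage sampling sketch:
    1. draw a column number t_chosen with P(t|x_i) ~ PN(|x_i1*s_1|, ..., |x_id*s_d|),
       where s_t is the absolute-value sum of column t of W (claim 5, eq. 4);
    2. draw a row number j_chosen with P(j|t_chosen) ~ PN(|W_1t|, ..., |W_nt|)
       (claim 5, eq. 3).
    Returns j_chosen, the sampled convolution kernel vector number."""
    p_t = np.abs(x_i * s)              # unnormalized P(t|x_i)
    p_t = p_t / p_t.sum()
    t_chosen = rng.choice(len(s), p=p_t)
    p_j = np.abs(W[:, t_chosen])       # unnormalized P(j|t = t_chosen)
    p_j = p_j / p_j.sum()
    return rng.choice(W.shape[0], p=p_j)
```

Repeating this draw τ times yields the candidate set V_pre of step 1.3.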
7. The convolutional neural network acceleration learning method for image feature extraction of claim 1, wherein step 1.4 specifically comprises: firstly, for the convolution kernel vector number j obtained in each sampling, calculating a weight ω_j; then sorting the elements in the set V_pre according to the weights ω_j, and retaining the θ vectors with the largest weights as the candidate convolution kernel vector number set V; wherein ω_j = ω_j + sgn(x_it·W_jt); sgn() denotes the sign function: sgn(x_it·W_jt) = 1 when x_it·W_jt > 0; sgn(x_it·W_jt) = 0 when x_it·W_jt = 0; and sgn(x_it·W_jt) = −1 when x_it·W_jt < 0.
8. The method for accelerating learning of a convolutional neural network for image feature extraction according to claim 7, wherein θ represents the number of candidate convolutional kernel vectors finally reserved, and θ is less than or equal to n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e., the number of convolutional kernel vectors in the matrix.
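The screening of claim 7 (accumulating ω_j over the τ samples and keeping the top θ) can be sketched as follows, assuming each sample records the pair (j, t) that produced it — an illustrative representation, not specified in the claims:

```python
import numpy as np
from collections import defaultdict

def screen_candidates(samples, x_i, W, theta):
    """Claim 7 sketch: each sample is a pair (j, t) recording the kernel
    vector number j drawn via column t.  The weight of j accumulates
    sgn(x_it * W_jt) over all tau samples; the theta numbers with the
    largest weights form the final candidate set V (claim 8: theta <= n)."""
    omega = defaultdict(float)
    for j, t in samples:
        omega[j] += np.sign(x_i[t] * W[j, t])
    ranked = sorted(omega, key=lambda j: omega[j], reverse=True)
    return set(ranked[:theta])
```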
CN202110136925.9A 2021-02-01 2021-02-01 Convolutional neural network acceleration learning method for image feature extraction Active CN112784969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110136925.9A CN112784969B (en) 2021-02-01 2021-02-01 Convolutional neural network acceleration learning method for image feature extraction


Publications (2)

Publication Number Publication Date
CN112784969A CN112784969A (en) 2021-05-11
CN112784969B true CN112784969B (en) 2024-05-14

Family

ID=75760307


Country Status (1)

Country Link
CN (1) CN112784969B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118070847B (en) * 2024-04-17 2024-07-05 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Model parameter updating method, device, equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
US10140421B1 (en) * 2017-05-25 2018-11-27 Enlitic, Inc. Medical scan annotator system
CN109612708A (en) * 2018-12-28 2019-04-12 东北大学 Based on the power transformer on-line detecting system and method for improving convolutional neural networks
CN109948029A (en) * 2019-01-25 2019-06-28 南京邮电大学 Based on the adaptive depth hashing image searching method of neural network
CN111428188A (en) * 2020-03-30 2020-07-17 南京大学 Convolution operation method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474458B2 (en) * 2017-04-28 2019-11-12 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
EP3622520A1 (en) * 2017-10-16 2020-03-18 Illumina, Inc. Deep learning-based techniques for training deep convolutional neural networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Structured Probabilistic Pruning for Convolution Neural Network Acceleration; Huan Wang et al.; arXiv; 2018-09-10; pp. 1-13 *
Optimization Methods for Convolutional Neural Network Models under Resource Constraints; Han Tao; China Masters' Theses Full-text Database, Information Science and Technology; 2019-02-15; I140-91 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant