CN112784969B - Convolutional neural network acceleration learning method for image feature extraction - Google Patents

Convolutional neural network acceleration learning method for image feature extraction

Info

Publication number
CN112784969B
CN112784969B (application CN202110136925.9A)
Authority
CN
China
Prior art keywords
matrix
convolution kernel
vector
convolution
input
Prior art date
Legal status
Active
Application number
CN202110136925.9A
Other languages
Chinese (zh)
Other versions
CN112784969A (en)
Inventor
杨晓春
张宇杰
许婧楠
王斌
Original Assignee
东北大学
Priority date
Filing date
Publication date
Application filed by 东北大学
Priority to CN202110136925.9A
Publication of CN112784969A
Application granted granted Critical
Publication of CN112784969B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a sampling-based accelerated learning method for convolutional neural networks, belonging to the technical field of convolutional neural networks. In the forward propagation stage, only a sampled subset of the convolution kernel vectors is multiplied with the input data; the remaining vectors are skipped. In the backward propagation stage, only the convolution kernel vectors that participated in the forward computation are updated. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated at each step, network convergence is accelerated. The sampling-based convolutional neural network acceleration learning method requires no adjustment to the macroscopic structure of the convolutional network in practical application, does not affect the network's local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and saves cost.

Description

Convolutional neural network acceleration learning method for image feature extraction
Technical Field
The invention belongs to the technical field of convolutional neural networks, and particularly relates to a convolutional neural network acceleration learning method for image feature extraction.
Background
Convolutional neural networks (Convolutional Neural Network, CNN) are among the earliest successful deep models and remain at the forefront of commercial deep learning applications, such as image detection and segmentation, object recognition, and speech processing.
The convolution operation is the process of sliding different convolution kernels over an input picture and performing a fixed computation at each position. Specifically, at each sliding position, the elements of the convolution kernel are multiplied one-to-one with the corresponding elements of the input picture and summed. Afterwards, a nonlinearity is applied through an activation function, most commonly linear rectification, i.e. the ReLU function. This computational principle gives the convolutional network its ability to extract local features. The main structure of a convolutional neural network stacks several convolution layers as a feature extractor, followed by a fully connected layer as a classifier. To give the convolutional network better feature extraction capability, many convolution layers often need to be stacked, so the parameter scale of a convolutional neural network grows greatly with network depth. The computational cost of forward feature extraction and backward error propagation through the network is typically on the order of tens of millions to hundreds of millions of operations, with the convolution operations of the convolution layers consuming the most computing resources. Accelerating the convolution operation is therefore the key to improving the computational efficiency of convolutional neural network models.
Commonly used neural network training frameworks, such as Caffe and TensorFlow, unroll the input data and the convolution kernels into two-dimensional matrices, thereby converting the convolution operation into a matrix multiplication. In convolutional network learning, each convolution layer must perform three matrix products in total: one in forward propagation, and one each in backward propagation to compute the output gradient matrix and the gradient matrix of the convolution kernel, which consumes a great amount of computing resources. In practice, not all neural units in a convolutional network are computationally meaningful: units with larger eigenvalues have larger influence on subsequent network layers, and the common activation function ReLU directly sets negative eigenvalues to 0 when applying the nonlinearity. Therefore a new convolutional network learning method is needed that does not compute the complete matrix multiplication but computes only the values of the more meaningful neural units in the original output, omitting the computation of the remaining units; this reduces computational cost and accelerates convolutional network training. Moreover, such a learning method is an improvement at the level of the method itself, and is more practical than convolution acceleration methods based on hardware devices.
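The unrolling described above is often called im2col; a minimal NumPy sketch (an illustrative toy only, assuming a single channel, stride 1, and no padding; the function name im2col and all shapes are chosen for this example) shows how it turns convolution into a matrix product:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll a single-channel H×W image into an (OH*OW) × (kh*kw) matrix
    so that convolution becomes a matrix product (stride 1, no padding)."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            # Each row of the unrolled matrix is one kernel-sized patch.
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

# Convolution via matrix multiplication: each kernel becomes one row of W.
x = np.arange(16.0).reshape(4, 4)
kernels = np.stack([np.ones((3, 3)), np.eye(3)])  # 2 kernels, each 3×3
W = kernels.reshape(2, -1)                        # 2 × 9 kernel matrix
X = im2col(x, 3, 3)                               # 4 × 9 unrolled input (OH*OW = 4)
Y = X @ W.T                                       # 4 × 2 output feature map
```

Each row of Y is one sliding position; each column is one kernel, matching the N×n output matrix the method operates on.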
Disclosure of Invention
In practical feature extraction scenarios, convolutional network parameters are large in scale, computation is expensive, and the computation contains redundancy, so training a convolutional network model for feature extraction is slow, while acceleration methods based on hardware devices are difficult to apply in practice. The invention provides a convolutional neural network acceleration learning method for image feature extraction.
The technical scheme of the invention is as follows:
The convolutional neural network acceleration learning method for image feature extraction comprises the following steps:
Step 1: in the forward propagation stage, compute the output feature map using probability sampling;
Step 2: in the backward propagation stage, retain only the gradient values corresponding to the neurons that participated in the forward computation, set the remaining gradient values to 0 to prune the gradient matrix, and use the pruned gradient matrix to calculate and update the convolution kernel parameters;
Step 3: and repeatedly executing the steps 1 and 2 until the network training stopping condition is reached.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the method of obtaining the output feature map by probability-sampled computation in the forward propagation stage in step 1 is as follows: first, expand the input feature map and the convolution kernel into two-dimensional matrices; for each input vector in the expanded two-dimensional matrix, obtain a corresponding candidate convolution kernel vector number set V by probability sampling; multiply the input vector only with the vectors in set V, fill the calculation results into the corresponding positions of the output feature map, and set the remaining positions of the output feature map to 0.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the step 1 specifically includes the following steps:
Step 1.1: expand the input feature map into a two-dimensional matrix X according to the dimensions of the convolution kernel, expand the convolution kernel into a two-dimensional matrix W, convert the convolution operation into the product of the expanded two-dimensional matrices X and W, and calculate the absolute value sum s_t of each column of elements in the matrix W;
Step 1.2: construct a conditional probability distribution P(j|x_i) for each vector x_i in X according to the absolute value sums s_t of the columns of matrix W;
Step 1.3: sample τ times according to the probability distribution P(j|x_i); each sample yields one convolution kernel vector number, and after τ samples a convolution kernel vector candidate set V_pre is obtained;
Step 1.4: screen the elements in V_pre according to a preset condition, and form the final candidate convolution kernel vector number set V from the screened elements;
Step 1.5: compute the inner product of vector x_i only with the vectors in set V, fill the results into the corresponding positions of the output feature map Y, and set the remaining positions to 0.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the meaning of the conditional probability distribution P(j|x_i) is that for one vector x_i in X, a vector number is drawn from among all the convolution kernel vectors, and the probability of drawing the j-th convolution kernel vector w_j is proportional to the absolute value of the inner product x_i w_j^T of the input vector x_i and w_j, as follows:
P(j|x_i) ∝ |x_i w_j^T|, i∈[1,N], j∈[1,n] (2)
wherein i is a row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e. the number of convolution kernel vectors in the matrix.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the specific content of step 1.2 includes:
constructing a polynomial distribution P(j|t) for each column t of the convolution kernel matrix W from the absolute value sums s_t of the columns of matrix W, and constructing a polynomial distribution P(t|x_i) for each vector x_i in X; the conditional probability distribution P(j|x_i) is then obtained by marginalizing over t:
P(j|x_i) = Σ_{t=1}^{d} P(t|x_i) P(j|t)
wherein i is a row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of matrix W; d = d_x = d_w, where d_x is the dimension of vector x and d_w is the dimension of the convolution kernel vector w.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, the polynomial distributions P(j|t) and P(t|x_i) are represented by formula (3) and formula (4), respectively:
P(j|t) ~ PN([|W_1t|, …, |W_nt|]), j∈[1,n], t∈[1,d] (3)
P(t|x_i) ~ PN([|x_i1 s_1|, …, |x_id s_d|]), t∈[1,d] (4)
wherein PN denotes a polynomial distribution. In formula (3), each distribution P(j|t) stores the probabilities of selecting the different row numbers j of the matrix W given that the t-th column of the convolution kernel matrix W has been selected; since the matrix W has d columns, d such distributions are constructed in total. In formula (4), the distribution P(t|x_i) represents the probabilities of selecting the different column numbers t of the matrix W for the input vector x_i.
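Formulas (3) and (4), and the marginalization that recovers P(j|x_i) ∝ Σ_t |x_it W_jt|, can be checked numerically with a small sketch (assuming NumPy; all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6))   # n=4 kernel vectors, d=6
x = rng.standard_normal(6)        # one input vector x_i

s = np.abs(W).sum(axis=0)         # s_t: column-wise absolute sums, as in Eq. (1)

# Eq. (3): one distribution over rows j for each column t (each column sums to 1).
P_j_given_t = np.abs(W) / s       # shape n×d
# Eq. (4): distribution over columns t for this input vector.
p_t = np.abs(x * s)
P_t = p_t / p_t.sum()

# Marginalizing over t recovers P(j|x_i) proportional to Σ_t |x_t W_jt|.
P_j = P_j_given_t @ P_t
target = np.abs(x * W).sum(axis=1)
```

Here P_j equals target normalized to sum 1, confirming that the two cheap factor distributions reproduce the intended sampling probabilities.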
Further, according to the convolutional neural network accelerated learning method for image feature extraction, the method for obtaining a convolution kernel vector number in each sample in step 1.3 is as follows: sample according to P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; find the t_chosen-th probability distribution P(j|t = t_chosen) among the conditional probability distributions P(j|t), and sample from it to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Further, according to the convolutional neural network accelerated learning method for image feature extraction, step 1.4 specifically includes: first, calculate a resulting weight ω_j for the convolution kernel vector number j obtained in each sample; then sort the elements in the set V_pre by the weights ω_j and retain the θ vectors with the largest weights as the candidate convolution kernel vector number set V; wherein ω_j = ω_j + sgn(x_it W_jt), and sgn() denotes the sign function: when x_it W_jt > 0, sgn(x_it W_jt) = 1; when x_it W_jt = 0, sgn(x_it W_jt) = 0; when x_it W_jt < 0, sgn(x_it W_jt) = -1.
Further, according to the convolutional neural network acceleration learning method for image feature extraction, θ represents the number of candidate convolutional kernel vectors which are finally reserved, and θ is less than or equal to n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e. the number of convolutional kernel vectors in the matrix.
The convolutional neural network acceleration learning method for image feature extraction, wherein step 2 comprises the following steps:
Step 2.1: set the unnecessary gradient values in the input gradient matrix ∂L/∂Y to 0, pruning it to obtain the pruned input gradient matrix (∂L/∂Y)';
Step 2.2: compute the convolution kernel gradient matrix ∂L/∂W;
Step 2.3: compute the output gradient matrix ∂L/∂X;
Step 2.4: update the matrix W from the convolution kernel gradient matrix ∂L/∂W; the output gradient matrix ∂L/∂X serves as the input gradient matrix of the previous layer in the network, and back propagation continues.
Compared with the prior art, the convolutional neural network acceleration learning method for image feature extraction has the following beneficial effects: using the principle of probability sampling, the method reduces the amount of computation without affecting the network's feature extraction capability, thereby accelerating construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the need for rapid feature extraction in practical applications. Specifically, the forward propagation stage multiplies only a sampled subset of the convolution kernel vectors with the input data; the remaining vectors are skipped. The backward propagation stage updates only the convolution kernel vectors that participated in the forward computation. Compared with existing convolutional network learning methods that compute the complete matrix multiplication, the method effectively reduces the amount of computation in both forward and backward propagation; at the same time, because only the meaningful weights in the network are computed and updated at each step, network convergence is accelerated. The method requires no adjustment to the macroscopic structure of the convolutional network in practical application, does not affect the network's local feature extraction property, and, compared with hardware-based convolution acceleration methods, is easier to apply and saves cost.
Drawings
FIG. 1 is a schematic diagram of a sample-based convolutional layer feature extraction process provided by the present invention;
FIG. 2 is a schematic flow chart of the convolutional neural network accelerated learning method for image feature extraction provided by the invention;
FIG. 3 is a schematic flow chart of forward propagation provided by the present invention;
FIG. 4 is a schematic flow chart of constructing the probability distribution P(j|x_i) provided by the present invention;
FIG. 5 is a schematic diagram of a specific sampling flow in step 1.3 according to the present invention;
FIG. 6 is a flow chart of obtaining the final convolution kernel vector set V by screening from the set V_pre;
Fig. 7 is a flow chart of the back propagation and update process provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, the technical solutions of the present invention are described below clearly and completely with reference to the embodiments and the accompanying drawings; the described embodiments are preferred embodiments of the present invention, but not all possible embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Fig. 1 shows a schematic diagram of the sampling-based convolution layer feature extraction process provided by the present invention. When a convolutional network is used for feature extraction, several convolution layers are stacked to improve the network's feature extraction capability. Taking image feature extraction as an example, the input of each convolution layer is the original picture or the feature map produced by the previous layer, and its output is the output feature map obtained through the convolution operation. The invention aims to reduce the amount of computation of the convolution operation without affecting the feature extraction effect of the convolutional network and to improve the computational efficiency of feature extraction with a convolutional neural network model, so the input and output dimensions are the same as defined in the existing convolution operation. The definitions of the symbol variables annotated in fig. 1 are shown in table 1.
Table 1. Meanings of the symbol variables referred to in fig. 1
An embodiment of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 2, the flowchart of the convolutional neural network accelerated learning method for image feature extraction provided by the invention includes steps 1, 2, and 3:
Step 1: the forward propagation stage utilizes probability sampling to calculate an output feature map.
First, the input feature map and the convolution kernel are both expanded into two-dimensional matrices. For each input vector in the expanded input two-dimensional matrix, a corresponding candidate convolution kernel vector number set V is obtained by probability sampling. The input vector is multiplied only with the vectors in set V, the calculation results fill the corresponding positions of the output feature map, and the remaining positions of the output feature map are 0.
The specific workflow of the forward propagation phase, as shown in fig. 3, includes steps 1.1, 1.2, 1.3, 1.4 and 1.5:
step 1.1: and expanding the input feature map into a two-dimensional matrix X according to the dimension of a convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, and calculating the absolute value sum of each column of elements in the matrix W.
Specifically, as shown in fig. 1, when unrolling the input data, by the definition of the existing convolution operation, as the convolution kernel slides over the original image, the elements of the kernel at each sliding position are multiplied one-to-one with the elements of the input feature map and summed; the convolution operation can therefore be converted into a matrix product. As shown in fig. 1, the convolution kernel is a four-dimensional tensor of kn×kh×kw×kc, which is unrolled into a two-dimensional matrix denoted W. W has dimensions n×d_w; each row of the matrix is a convolution kernel vector w, whose dimension is d_w = kh×kw×kc; n is the number of matrix rows, i.e. the number of convolution kernel vectors in the matrix, so n = kn. Let the input feature map be a four-dimensional tensor of in×ih×iw×ic, and unroll it into a two-dimensional matrix X according to the convolution kernel dimensions; X is then an N×d_x two-dimensional matrix. Each vector x in X is the region covered by the convolution kernel at one sliding position on the original image, so the dimension d_x of vector x equals the dimension d_w of the convolution kernel vector w; let d_x = d_w = d. N is the total number of sliding positions of the convolution kernel on the input feature map; by the definition of the convolution operation, N = in×oh×ow. The convolution operation can thus be converted into the product of the unrolled two-dimensional matrices X and W.
The absolute value sum s_t of each column of elements of the matrix W is calculated as:
s_t = Σ_{j=1}^{n} |W_jt|, t∈[1,d] (1)
wherein t is the column number of the matrix W and W_jt is the element in row j, column t of the convolution kernel matrix W. Calculating this value prepares the probability distributions for the subsequent steps.
Step 1.2: from the absolute value sum s t of each column element of the matrix W, a conditional probability distribution P (j|x i) is constructed for each vector X i in X.
P(j|xi)∝xiwj T,i∈[1,N],,∈[1,n] (2)
Wherein i is a line number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W. The significance of this probability distribution is that for one vector X i of X, the vector number is extracted among all convolution kernel vectors, and the probability of extracting to the jth convolution kernel vector w j is proportional to the absolute value size of the inner product X iwj T of the input vectors X i and w j.
In particular, since it is difficult to construct the conditional probability distribution P(j|x_i) ∝ |x_i w_j^T| directly, the method obtains P(j|x_i) by constructing two polynomial distributions, P(j|t) (step 1.2.1) and P(t|x_i) (step 1.2.2), and marginalizing over t.
The specific workflow for constructing the conditional probability distribution P (j|x i), as shown in fig. 4, includes steps 1.2.1 and 1.2.2:
Step 1.2.1: construct a polynomial distribution P(j|t) for each column of the convolution kernel matrix W.
P(j|t) ~ PN([|W_1t|, …, |W_nt|]), j∈[1,n], t∈[1,d] (3)
wherein each distribution P(j|t) stores the probabilities of selecting the different row numbers j of the matrix W given that the t-th column of the matrix W has been selected; since the matrix W has d columns, d such distributions are constructed in total. PN denotes a polynomial distribution. In this distribution, taking j = 5 as an example, the specific probability value is calculated as P(j = 5|t) = |W_5t| / s_t.
Step 1.2.2: for each vector X i in X, a polynomial distribution P (t|x i) is constructed.
P(t|x_i) ~ PN([|x_i1 s_1|, …, |x_id s_d|]), t∈[1,d] (4)
wherein each s_t is the absolute value sum of the corresponding column of the convolution kernel matrix calculated in step 1.1. Specifically, for example, the probability of t being 3 (3∈[1,d]) is P(t = 3|x_i) = |x_i3 s_3| / Σ_{t'=1}^{d} |x_it' s_t'|.
Step 1.3: and (3) sampling is carried out for tau times according to the probability distribution P (j|x i), a convolution kernel vector number is obtained each time, and a convolution kernel vector candidate set V pre is obtained after the tau times of sampling. τ represents the number of samples and the specific value is customized by the skilled person based on the experimental effect.
The specific workflow of sampling according to probability distribution P (j|x i) is shown in fig. 5. Comprises the steps of 1.3.1, 1.3.2 and 1.3.3:
Step 1.3.1: sample from P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W.
Step 1.3.2: find the t_chosen-th probability distribution P(j|t = t_chosen) among the distributions constructed in step 1.2.1, and sample from it to obtain a number j_chosen; j_chosen is the convolution kernel vector number obtained in this sample.
Specifically, for one sample, suppose t = 3 is first drawn according to P(t|x_i) (step 1.3.1); the probability distribution P(j|t = 3) is then found, and j = 5 is drawn from it (step 1.3.2). The convolution kernel vector number finally extracted is 5. The significance of this result is that the inner product of the input vector x_i with the drawn convolution kernel vector w_5 is likely to yield a larger eigenvalue than the non-drawn convolution kernel vectors.
Step 1.3.3: repeat steps 1.3.1 and 1.3.2 for τ draws to obtain the convolution kernel vector candidate set V_pre.
Step 1.4: v pre is filtered to obtain a final candidate convolution kernel vector number set V.
The specific workflow of the final convolution kernel vector set V is obtained by screening the set V pre, as shown in fig. 6, including steps 1.4.1 and 1.4.2:
Step 1.4.1: a resulting weight is calculated for the convolution kernel number j obtained for each sample, denoted omega jj=ωj+sgn(xitWjt).
Where sgn () represents a sign function, when x itWjt > 0, sgn (x itWjt) =1; when x itWjt =0, sgn (x itWjt) =0; when x itWjt < 0, sgn (x itWjt) = -1.
Specifically, since the objective is to construct a probability distribution P (j|x i)∝xiwj Y, the inner product x iwj T is divided by a positive and negative value, and two probability distributions P (t|x i) and P (j|t) are constructed in proportion to the magnitude of the inner product absolute value.
Step 1.4.2: the elements in the set V pre are ordered according to the weight omega j, and the theta vectors with the largest weights are reserved as the final set V.
Specifically, θ represents the number of candidate convolution kernel vector numbers finally reserved, and θ is less than or equal to n. When all convolution kernel vectors are selected, θ=n, the amount of computation at this time is the same as in the existing convolution method. The specific value of θ is customized by the skilled person according to the experimental effect.
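The screening of steps 1.4.1 and 1.4.2 can be sketched as follows (a toy illustration; the function name screen_candidates and the dictionary-based weight accumulation are assumptions of this example):

```python
import numpy as np

def screen_candidates(V_pre, x, W, theta):
    """Sketch of Step 1.4: accumulate a weight ω_j += sgn(x_t * W_jt) for every
    draw (t, j) in V_pre, then keep the θ candidate numbers with the largest
    weights as the final set V."""
    omega = {}
    for t, j in V_pre:
        omega[j] = omega.get(j, 0) + np.sign(x[t] * W[j, t])
    return sorted(omega, key=omega.get, reverse=True)[:theta]

# Toy check: kernel 0 agrees in sign with x at every drawn column, kernel 1 opposes.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, 1.0, 1.0],
              [-1.0, -1.0, -1.0]])
V_pre = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
V = screen_candidates(V_pre, x, W, theta=1)
```

Kernel 0 accumulates weight +3 and kernel 1 accumulates -2, so with θ = 1 only kernel 0 survives the screening.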
Step 1.5: vector x i is inner-product with only those extracted vectors in set V, and as a result, fills in the locations corresponding to output feature map Y, the rest of locations are 0.
The specific flow of step 1 has thus been described in connection with a specific embodiment. The construction and sampling of the polynomial distributions in step 1.2 can be further accelerated by existing methods: for example, with the alias sampling method, the time complexity of constructing P(t|x_i) for each input vector is O(d) and the time complexity of each sample is O(1). The existing convolution method must compute the product of x_i with all n convolution kernel vectors, whereas the invention only computes the product of x_i with the θ vectors in the set V obtained by probability sampling, so the forward propagation time complexity of the improved convolution operation is O(N·θ·d); the forward propagation of the existing convolution method requires a complete matrix multiplication, with time complexity O(N·n·d). Because θ ≤ n, the invention has an acceleration advantage. Moreover, since the threshold θ is customized by the technician according to experimental results, the trade-off between speed and accuracy can be adjusted to actual requirements.
Step 2: the backward propagation phase only retains the corresponding gradient values of neurons participating in computation in the forward propagation, and the rest gradient values are ignored for 0. The pruned gradient matrix is used to calculate and update convolution kernel parameters.
Specifically, in the existing definition, the forward propagation stage of the network training is a process of calculating the given input data layer by the network to obtain the final output; the counter-propagating stage calculates the error between the output and the target value, and propagates the error forward layer by layer from the last layer of the network. For the convolution layer, the back propagation stage calculates a convolution kernel gradient matrix and an output gradient matrix according to the input gradient matrix, wherein the convolution kernel gradient matrix is used for updating the convolution kernel tensor of the current layer, and the output gradient matrix is transmitted to the previous layer and used as the input gradient matrix of the previous layer. The invention does not change the definition of the existing convolution network to the back propagation process, and only reduces the calculated amount of the convolution kernel gradient matrix and the output gradient matrix in the process.
The specific workflow of the back propagation and update process in the present invention, as shown in fig. 7, includes steps 2.1, 2.2, 2.3, and 2.4:
Step 2.1: for input gradient matrix The unnecessary gradient value in the (1) is ignored to be 0, and the/>
According to the back-propagation definition,The same dimensions as the output profile Y. According to the flow of step 1, for each vector X i in the input two-dimensional matrix X, only the product of the vector X i and the convolution kernel vector obtained by sampling is carried out, and the result is filled in the corresponding position of the output characteristic diagram Y, only the positions actually participate in the calculation of forward propagation. Thus only remain/>The rest of the gradient values are set to 0, thus obtaining/>
Step 2.2: gradient matrix for computing convolution kernel
In particular, the method comprises the steps of,Only the gradient values of the positions participating in the computation in the forward propagation are retained, such that/>And multiplying X, so that the gradient value is calculated for each vector X i in X by the convolution kernel vector sampled in the corresponding forward propagation stage, and the gradient values of the rest convolution kernel vectors are 0.
Step 2.3: calculating an output gradient matrix
In particular, i.e. using a pruned input gradient matrixAnd multiplying the convolution kernel matrix W to obtain an output gradient matrix/>
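Step 2.3 is a single matrix product; a sketch under the same unfolded-matrix convention as above (names are illustrative):

```python
import numpy as np

def output_gradient(grad_Y_pruned, W):
    """Step 2.3 sketch: dL/dX = (pruned dL/dY) @ W.  Rows whose sampled
    products were all pruned receive zero gradient and pass nothing
    back to the previous layer."""
    return grad_Y_pruned @ W
```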
Step 2.4: update the matrix W according to the convolution kernel gradient matrix ∂L/∂W, and pass the output gradient matrix ∂L/∂X to the previous layer in the network as its input gradient matrix, so that back propagation continues.
The update operation is the same as in existing convolutional networks and supports different update strategies, for example the stochastic gradient descent rule W ← W − η·∂L/∂W, where η is the learning rate set by the technician according to an existing learning-rate setting strategy.
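A sketch of the stochastic-gradient-descent form of the step 2.4 update (names are illustrative; η is the learning rate):

```python
import numpy as np

def sgd_update(W, grad_W, eta=0.01):
    """Step 2.4 sketch: plain stochastic-gradient-descent rule
    W <- W - eta * dL/dW.  Other update strategies (momentum, Adam, ...)
    slot into the same place."""
    return W - eta * grad_W
```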
Step 3: repeat steps 1 and 2 until the network training stop condition is reached. The stop condition is the same as in the existing convolutional network learning process. This completes the learning process of the convolutional network.
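The overall loop of steps 1-3 can be sketched as follows, with the sampled forward pass and pruned backward pass abstracted as callables and a fixed epoch budget standing in for the stop condition (both are illustrative assumptions, not the patent's own stopping rule):

```python
def train(step_forward, step_backward, W, num_epochs):
    """Steps 1-3 sketch: repeat step 1 (probability-sampled forward pass)
    and step 2 (pruned backward pass + kernel update) until the stop
    condition -- here an illustrative fixed epoch budget -- is met."""
    for _ in range(num_epochs):
        Y, mask = step_forward(W)        # step 1: sampled forward propagation
        W = step_backward(W, Y, mask)    # step 2: pruned back propagation + update
    return W
```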
According to the above technical scheme, the convolutional neural network acceleration learning method for image feature extraction provided by the embodiments of this application uses the probability-sampling principle to select and update only the more meaningful weight computations in the network. It reduces the amount of computation without affecting the network's feature extraction capability, thereby speeding up the construction of the convolutional network, improving the computational efficiency of feature extraction with a convolutional neural network model, and meeting the demand for fast feature extraction in practical applications. Compared with hardware-based convolution acceleration methods, it is also easier to apply and saves cost.
The present invention has thus been described with reference to the accompanying drawings and embodiments. The above description is only a preferred embodiment of the present invention and is not intended to limit its scope.

Claims (8)

1. The convolutional neural network acceleration learning method for image feature extraction is characterized by comprising the following steps of:
Step 1: the forward propagation stage utilizes probability sampling to calculate and obtain an output characteristic diagram;
Step 2: the corresponding gradient values of neurons participating in calculation during forward propagation are reserved only in the backward propagation stage, the rest gradient values are ignored for 0, the gradient matrix is pruned, and the pruned gradient matrix is used for calculating and updating convolution kernel parameters;
step 3: repeating the steps 1 and 2 until reaching the network training stop condition;
The step 1 specifically comprises the following steps:
Step 1.1: expanding the input feature map into a two-dimensional matrix X according to the dimension of a convolution kernel, expanding the convolution kernel into a two-dimensional matrix W, converting convolution operation into the product of the expanded two-dimensional matrix X and W, and calculating the absolute value sum s t of each column of elements in the matrix W;
Step 1.2: constructing a conditional probability distribution P (j|x i) for each vector X i in X according to the absolute value sum s t of each column element of the matrix W;
step 1.3: sampling is carried out for tau times according to probability distribution P (j|x i), a convolution kernel vector number is obtained by each sampling, and a convolution kernel vector candidate set V pre is obtained after tau times of sampling;
Step 1.4: screening elements in the V pre according to a preset condition, and forming a final candidate convolution kernel vector number set V by the screened elements;
Step 1.5: vector x i is only inner product with the vector in set V, the result is filled in the corresponding position of the output characteristic diagram Y, and the rest positions are 0;
The step2 comprises the following steps:
Step 2.1: for input gradient matrix The unnecessary gradient value in (1) is ignored to be 0, and the input gradient matrix/>Deleting to obtain deleted input gradient matrix/>
Step 2.2: gradient matrix for computing convolution kernel
Step 2.3: calculating an output gradient matrix
Step 2.4: from a convolution kernel gradient matrixUpdating matrix W, outputting gradient matrix/>The input gradient matrix, which is the previous layer in the network, continues back propagation.
2. The method for accelerating learning of a convolutional neural network for image feature extraction according to claim 1, wherein the method for obtaining the output feature map by probability-sampling calculation in the forward propagation stage in step 1 is as follows: firstly, the input feature map and the convolution kernel are expanded into two-dimensional matrices; for each input vector in the expanded two-dimensional matrix, a corresponding candidate convolution kernel vector number set V is obtained by probability sampling; the input vector is multiplied only with the vectors in the set V, the calculation results are filled into the corresponding positions of the output feature map, and the remaining positions of the output feature map are 0.
3. The method for accelerating learning of a convolutional neural network for image feature extraction according to claim 1, wherein the meaning of the conditional probability distribution P(j|x_i) is that, for a vector x_i in X, vector numbers are drawn among all convolution kernel vectors, and the probability of drawing the j-th convolution kernel vector w_j is proportional to the absolute value of the inner product x_i·w_jᵀ of the input vector x_i and w_j, as given by:
P(j|x_i) ∝ |x_i·w_jᵀ|, i ∈ [1, N], j ∈ [1, n] (2)
wherein i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; N is the number of rows of the input two-dimensional matrix X; and n is the number of rows of the convolution kernel matrix, i.e., the number of convolution kernel vectors in the matrix.
4. The convolutional neural network acceleration learning method for image feature extraction of claim 1, wherein the step 1.2 comprises:
Constructing a polynomial distribution P(j|t) for each column of the convolution kernel matrix W according to the absolute-value sum s_t of the elements of each column of the matrix W, and constructing a polynomial distribution P(t|x_i) for each vector x_i in X; constructing the conditional probability distribution P(j|x_i) according to P(j|x_i) = Σ_{t=1}^{d} P(t|x_i)·P(j|t); wherein i is the row number of the input two-dimensional matrix X; j is the row number of the convolution kernel matrix W; t is the column number of the matrix W; and d = d_x = d_w, where d_x is the dimension of vector x and d_w is the dimension of the convolution kernel vector w.
5. The convolutional neural network acceleration learning method for image feature extraction of claim 4, wherein the polynomial distributions P (j|t) and P (t|x i) are represented by equations (3) and (4), respectively:
P(j|t)~PN([|W1t|,…,|Wnt|]),j∈[1,n],t∈[1,d] (3)
P(t|xi)~PN([|xi1s1|,…,|xidsd|]),t∈[1,d] (4)
Wherein PN represents a polynomial distribution; in the formula (3), each distribution P (j|t) stores the probability of selecting a different row number j of the matrix W on the premise that the t-th column of the matrix W is selected; since the matrix W has d columns, d distributions are constructed altogether; in equation (4), the distribution P (t|x i) represents the probability of selecting a different column number t of the matrix W for the input vector x i.
6. The method for accelerated learning of convolutional neural networks for image feature extraction of claim 1,
The method for obtaining a convolution kernel vector number in each sampling in step 1.3 is as follows: sampling according to P(t|x_i) to obtain a column number t_chosen of the convolution kernel matrix W; the t_chosen-th probability distribution P(j|t = t_chosen) is found among the conditional probability distributions P(j|t), and a number j_chosen is obtained by sampling from that distribution; j_chosen is the number of the convolution kernel vector obtained in this sampling.
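Under the distributions of claims 4-5, the two-stage sampling of this claim can be sketched as follows (numpy; names such as `sample_kernel_vector` are illustrative assumptions, not the patented code):

```python
import numpy as np

def sample_kernel_vector(x_i, W, s, rng):
    """Two-stage sampling sketch:
    1. draw a column number t_chosen with P(t|x_i) ~ PN(|x_i1*s_1|, ..., |x_id*s_d|),
       where s_t is the absolute-value sum of column t of W (claim 5, eq. 4);
    2. draw a row number j_chosen with P(j|t_chosen) ~ PN(|W_1t|, ..., |W_nt|)
       (claim 5, eq. 3).
    Returns j_chosen, the sampled convolution kernel vector number."""
    p_t = np.abs(x_i * s)              # unnormalized P(t|x_i)
    p_t = p_t / p_t.sum()
    t_chosen = rng.choice(len(s), p=p_t)
    p_j = np.abs(W[:, t_chosen])       # unnormalized P(j|t = t_chosen)
    p_j = p_j / p_j.sum()
    return rng.choice(W.shape[0], p=p_j)
```

Repeating this draw τ times yields the candidate set V_pre of step 1.3.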
7. The convolutional neural network acceleration learning method for image feature extraction of claim 1, wherein step 1.4 specifically comprises: firstly, for the convolution kernel vector number j obtained in each sampling, calculating a weight ω_j; then sorting the elements in the set V_pre according to the weights ω_j, and retaining the θ vectors with the largest weights as the candidate convolution kernel vector number set V; wherein ω_j = ω_j + sgn(x_it·W_jt); sgn() denotes the sign function: sgn(x_it·W_jt) = 1 when x_it·W_jt > 0; sgn(x_it·W_jt) = 0 when x_it·W_jt = 0; and sgn(x_it·W_jt) = −1 when x_it·W_jt < 0.
8. The method for accelerating learning of a convolutional neural network for image feature extraction according to claim 7, wherein θ represents the number of candidate convolutional kernel vectors finally reserved, and θ is less than or equal to n, where n represents the number of matrix rows of the two-dimensional matrix W, i.e., the number of convolutional kernel vectors in the matrix.
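The screening of claim 7 (accumulating ω_j over the τ samples and keeping the top θ) can be sketched as follows, assuming each sample records the pair (j, t) that produced it — an illustrative representation, not specified in the claims:

```python
import numpy as np
from collections import defaultdict

def screen_candidates(samples, x_i, W, theta):
    """Claim 7 sketch: each sample is a pair (j, t) recording the kernel
    vector number j drawn via column t.  The weight of j accumulates
    sgn(x_it * W_jt) over all tau samples; the theta numbers with the
    largest weights form the final candidate set V (claim 8: theta <= n)."""
    omega = defaultdict(float)
    for j, t in samples:
        omega[j] += np.sign(x_i[t] * W[j, t])
    ranked = sorted(omega, key=lambda j: omega[j], reverse=True)
    return set(ranked[:theta])
```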
CN202110136925.9A 2021-02-01 2021-02-01 Convolutional neural network acceleration learning method for image feature extraction Active CN112784969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110136925.9A CN112784969B (en) 2021-02-01 2021-02-01 Convolutional neural network acceleration learning method for image feature extraction


Publications (2)

Publication Number Publication Date
CN112784969A CN112784969A (en) 2021-05-11
CN112784969B true CN112784969B (en) 2024-05-14

Family

ID=75760307


Country Status (1)

Country Link
CN (1) CN112784969B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118070847B (en) * 2024-04-17 2024-07-05 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Model parameter updating method, device, equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871136A (en) * 2017-03-22 2018-04-03 中山大学 The image-recognizing method of convolutional neural networks based on openness random pool
US10140421B1 (en) * 2017-05-25 2018-11-27 Enlitic, Inc. Medical scan annotator system
CN109612708A (en) * 2018-12-28 2019-04-12 东北大学 Based on the power transformer on-line detecting system and method for improving convolutional neural networks
CN109948029A (en) * 2019-01-25 2019-06-28 南京邮电大学 Based on the adaptive depth hashing image searching method of neural network
CN111428188A (en) * 2020-03-30 2020-07-17 南京大学 Convolution operation method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474458B2 (en) * 2017-04-28 2019-11-12 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
EP3622520A1 (en) * 2017-10-16 2020-03-18 Illumina, Inc. Deep learning-based techniques for training deep convolutional neural networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Structured Probabilistic Pruning for Convolution Neural Network Acceleration; Huan Wang et al.; arXiv; 2018-09-10; pp. 1-13 *
Optimization Methods for Convolutional Neural Network Models under Resource Constraints; Han Tao; China Masters' Theses Full-text Database, Information Science and Technology; 2019-02-15; I140-91 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant