CN118114723A - Convolutional neural network training method based on asymmetric convolution kernel - Google Patents
Convolutional neural network training method based on asymmetric convolution kernel
- Publication number
- CN118114723A (application CN202410383952.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- convolution
- data
- batch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Learning methods
Abstract
The invention discloses a convolutional neural network training method based on asymmetric convolution kernels, which comprises the following steps. S01: acquire a data set. S02: split the n×n convolution kernels of a designated convolution layer in the convolutional neural network into n×1 and 1×n asymmetric convolution kernels, forming two parallel convolution layers, and add a batch normalization layer after the two split parallel convolution layers. S03: train the convolutional neural network processed in step S02 with the data set acquired in step S01, obtaining image feature maps during training, extracting image features, and classifying images. S04: after each round of training, verify the convolutional neural network on a verification set. S05: execute steps S02 to S04 in a loop, training the convolutional neural network over multiple rounds until the model-convergence end condition is reached. The method reduces the number of network parameters while preserving network classification performance in classification tasks.
Description
Technical Field
The invention relates to a convolutional neural network training method based on an asymmetric convolutional kernel.
Background
At present, the convolutional neural network (Convolutional Neural Network, CNN) is a classical model in deep learning. A CNN mainly comprises convolution layers, downsampling layers, and fully connected layers. By stacking multiple convolution and downsampling layers, a CNN can progressively extract the rich high-level semantics of an image, which is the key information for completing tasks such as image classification or recognition.
The Chinese patent with publication No. CN114091648A discloses a convolutional-neural-network-based image classification method and apparatus, in which the number of extracted feature values is increased by adding parallel sub-convolutional layers. Although convolutional neural networks perform excellently in computer vision tasks, network models keep growing, and networks with hundreds of millions of parameters are sometimes used. This significantly increases memory occupation and computation requirements, so constraints on memory and computing resources must be considered.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a convolutional neural network training method based on asymmetric convolution kernels, which reduces the number of network parameters while preserving network classification performance in classification tasks.
In order to solve the technical problems, the technical scheme of the invention is as follows: a convolutional neural network training method based on an asymmetric convolutional kernel comprises the following steps:
s01: acquiring a data set, wherein the data set is used for training a convolutional neural network;
s02: splitting the n×n convolution kernels of a designated convolution layer in the convolutional neural network into n×1 and 1×n asymmetric convolution kernels to form two parallel convolution layers, and adding a batch normalization layer after the two split parallel convolution layers;
s03: training the convolutional neural network processed in the step S02 by adopting the data set acquired in the step S01, acquiring an image feature map in the training process, extracting image features, and classifying images;
s04: after the convolutional neural network is trained in one round, the convolutional neural network is verified by using a verification set;
s05: circularly executing steps S02 to S04, performing multi-round training on the convolutional neural network until the model-convergence end condition of the convolutional neural network is reached.
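As an illustrative sketch rather than the patent's reference implementation, the split performed in step S02 can be modeled in plain Python as two parallel branches whose outputs are summed; the helper names `conv2d` and `asym_conv` are hypothetical, and a real network would additionally apply the batch normalization layer that step S02 adds after the branches:

```python
def conv2d(img, kernel, ph, pw):
    """Naive 2D cross-correlation with zero padding (stride 1)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    # zero-pad the input by ph rows and pw columns on each side
    padded = [[0.0] * (w + 2 * pw) for _ in range(h + 2 * ph)]
    for r in range(h):
        for c in range(w):
            padded[r + ph][c + pw] = img[r][c]
    oh = h + 2 * ph - kh + 1
    ow = w + 2 * pw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            out[r][c] = sum(kernel[i][j] * padded[r + i][c + j]
                            for i in range(kh) for j in range(kw))
    return out

def asym_conv(img, k_vert, k_horz):
    """Parallel n×1 and 1×n branches, summed (step S02, before batch norm)."""
    n = len(k_vert)                            # k_vert is n×1, k_horz is 1×n
    a = conv2d(img, k_vert, (n - 1) // 2, 0)   # pad vertically only
    b = conv2d(img, k_horz, 0, (n - 1) // 2)   # pad horizontally only
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
k_vert = [[1.0], [0.0], [-1.0]]          # 3×1 kernel
k_horz = [[1.0, 0.0, -1.0]]              # 1×3 kernel
y = asym_conv(img, k_vert, k_horz)
print(len(y), len(y[0]))                 # 4 4: output keeps the input size
```

Because each branch pads only along its own kernel axis, the summed output has the same spatial size as the input, as the embodiment requires.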
Further, in order to obtain a larger-scale data set to be applied to the neural network model test, in step S01, data enhancement is performed on the data set.
Further, the data enhancement mode is as follows: at least one of blurring and rotation.
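One of the augmentations named above, rotation, can be sketched in plain Python for the simple case of a 90-degree turn (a minimal illustration; real augmentation pipelines typically rotate by small random angles and may also blur):

```python
def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(rotate90(img))  # [[3, 1], [4, 2]]
```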
Further, in order to ensure that the output size is consistent with that of the normal convolution, in step S02, after splitting the n×n convolution kernels of the designated convolution layer into two asymmetric convolution kernels, a padding operation is added.
Further, the filling formula of the padding operation is:
W_N = (W − KW + 2·PW) / S + 1;
H_N = (H − KH + 2·PH) / S + 1;
wherein,
W, H are the width and height of the feature map;
KW, KH are the convolution kernel width and height;
PW, PH are the left/right padding size and the up/down padding size;
S is the convolution stride;
W_N, H_N are the updated feature-map width and height.
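Under the output-size relation above, the "same" padding for stride 1 and odd kernels follows directly; the sketch below (with hypothetical helper names `same_padding` and `out_size`) shows that a 1×n kernel needs only left/right padding while an n×1 kernel needs only up/down padding:

```python
def same_padding(kh, kw):
    """Per-side padding keeping output size equal to input size (stride 1, odd kernels)."""
    return (kh - 1) // 2, (kw - 1) // 2   # (PH, PW)

def out_size(w, kw, pw, s):
    """Convolution output width: W_N = (W - KW + 2*PW) / S + 1."""
    return (w - kw + 2 * pw) // s + 1

ph, pw = same_padding(1, 3)    # 1×3 horizontal kernel
print(ph, pw)                  # 0 1 (pad left/right only)
print(out_size(32, 3, pw, 1))  # 32 (width preserved)
```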
Further, the BN formulas of the batch normalization layer are:
μ_B = (1/m) · Σ_{i=1..m} x_i;
σ²_B = (1/m) · Σ_{i=1..m} (x_i − μ_B)²;
x̂_i = (x_i − μ_B) / √(σ²_B + ε);
y_i = BN_{γ,β}(x_i) = γ·x̂_i + β;
wherein,
i is the index of a sample within the batch;
x_i is the i-th sample input in the batch;
y_i is the i-th sample output;
γ, β are learnable parameters;
m is the number of samples in the batch;
μ_B is the mean of the batch input data;
σ²_B is the variance of the batch input data;
x̂_i is the normalized data;
ε is a very small number that avoids division by zero.
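The BN formulas can be exercised directly. This is a minimal per-batch sketch over scalars; a real batch normalization layer normalizes per channel and also tracks running statistics for inference:

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """x_hat = (x - mu) / sqrt(var + eps); y = gamma * x_hat + beta."""
    m = len(xs)
    mu = sum(xs) / m                            # batch mean
    var = sum((x - mu) ** 2 for x in xs) / m    # batch variance
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

ys = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in ys])   # zero-mean, unit-variance batch
```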
Further, the method further comprises the following step:
S06: calculating the accuracy and the positive/negative sample prediction bias values of the convolutional neural network on the test set; the convolutional neural network uses the accuracy together with the positive/negative sample prediction bias values as evaluation indexes, and the final algorithm target of the convolutional neural network is the lowest positive/negative sample prediction bias.
After the technical scheme is adopted, the invention has the following beneficial effects:
Firstly, the convolutional neural network training method based on asymmetric convolution kernels changes the shape of the convolution kernel: one n×n convolution kernel is decomposed into 1×n and n×1 convolution kernels, and the two kernels are used in parallel. Introducing this technique significantly reduces the number of model parameters. Meanwhile, to avoid the problems caused by adding asymmetric convolutions, a batch normalization layer is added after the two split parallel convolution layers, which improves the training speed. As a result, the image classification performance is equivalent to that of a traditional convolutional neural network while the number of network parameters is reduced.
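The parameter saving claimed above is easy to quantify: an n×n kernel has n² weights per input/output channel pair, while the parallel n×1 and 1×n pair has only 2n. A back-of-the-envelope sketch, ignoring biases and BN parameters:

```python
def params_square(n, c_in, c_out):
    """Weights in a standard n×n convolution layer."""
    return n * n * c_in * c_out

def params_asym(n, c_in, c_out):
    """Weights in the parallel n×1 + 1×n replacement."""
    return 2 * n * c_in * c_out

n, c_in, c_out = 3, 64, 64
print(params_square(n, c_in, c_out))  # 36864
print(params_asym(n, c_in, c_out))    # 24576, a 33% reduction for n = 3
```

The saving grows with kernel size: for n = 3 the split keeps 2/3 of the weights, for n = 7 only 2/7.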
Secondly, aiming at the problem that the accuracies of multi-classification evaluation indexes can be similar, the method adopts a positive/negative sample prediction bias index and uses the accuracy together with the positive/negative sample prediction bias as evaluation indexes, making the evaluation result more reasonable.
Drawings
FIG. 1 is a diagram of a ResNet network architecture based on asymmetric convolution optimization in accordance with the present invention;
FIG. 2 is a diagram of a batch normalization layer placement order based on asymmetric convolution optimization in accordance with the present invention.
Detailed Description
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
As shown in fig. 1 and 2, a convolutional neural network training method based on an asymmetric convolutional kernel includes the following steps:
s01: acquiring a data set, wherein the data set is used for training a convolutional neural network;
s02: splitting the n×n convolution kernels of a designated convolution layer in the convolutional neural network into n×1 and 1×n asymmetric convolution kernels to form two parallel convolution layers, and adding a batch normalization layer after the two split parallel convolution layers;
s03: training the convolutional neural network processed in the step S02 by adopting the data set acquired in the step S01, acquiring an image feature map in the training process, extracting image features, and classifying images;
s04: after the convolutional neural network is trained in one round, the convolutional neural network is verified by using a verification set;
s05: circularly executing steps S02 to S04, performing multi-round training on the convolutional neural network until the model-convergence end condition of the convolutional neural network is reached.
Specifically, in step S01, data enhancement is performed on the data set to obtain a larger-scale data set for application to neural network model testing.
Specifically, the data enhancement mode is as follows: at least one of blurring and rotation.
Specifically, in step S02, after splitting the n×n convolution kernels of the designated convolution layer into two asymmetric convolution kernels, a padding operation is added to ensure that the output size is consistent with that of the normal convolution.
Specifically, the filling formula of the padding operation is:
W_N = (W − KW + 2·PW) / S + 1;
H_N = (H − KH + 2·PH) / S + 1;
wherein W and H are the feature-map width and height;
KW, KH are the convolution kernel width and height;
PW, PH are the left/right padding size and the up/down padding size;
S is the convolution stride; W_N, H_N are the updated feature-map width and height.
In the asymmetric convolution, padding is only required in the vertical or the horizontal direction and need not be applied in the other direction, which ensures that the size of the output feature map stays consistent with the size of the input feature map.
The batch normalization layer (Batch Normalization) pulls the input-value distribution of every neuron in each layer back to a standard normal distribution with mean 0 and variance 1. By forcibly pulling increasingly skewed distributions back toward a standard distribution, the inputs to the activation fall into a region where the nonlinear function is more sensitive to its input. This enlarges the gradients, avoids the vanishing-gradient problem, and speeds up learning convergence, thereby accelerating training.
Specifically, the BN formulas of the batch normalization layer are:
μ_B = (1/m) · Σ_{i=1..m} x_i;
σ²_B = (1/m) · Σ_{i=1..m} (x_i − μ_B)²;
x̂_i = (x_i − μ_B) / √(σ²_B + ε);
y_i = BN_{γ,β}(x_i) = γ·x̂_i + β;
wherein,
i is the index of a sample within the batch;
x_i is the i-th sample input in the batch;
y_i is the i-th sample output;
γ and β are learnable parameters, a scaling factor and an offset factor; the scaled data are offset so as to adjust their position, letting the model learn more complex feature representations;
m is the number of samples in the batch;
μ_B is the mean of the batch input data;
σ²_B is the variance of the batch input data;
x̂_i is the normalized data, whose distribution range, mean, and variance are adjusted so that the model learns a suitable feature representation;
ε is a very small number that avoids division by zero.
The batch normalization layer is placed after the split convolution layers and should not be placed in the middle of the split parallel convolution layers. If placed in the middle, the batch normalization layer would make the model less sensitive to the initialization method and the parameters in the network, but it would conflict with the perception of directional information in the input data.
Specifically, the method further comprises the following step:
S06: calculating the accuracy and the positive/negative sample prediction bias values of the convolutional neural network on the test set; the convolutional neural network uses the accuracy together with the positive/negative sample prediction bias values as evaluation indexes, and the final algorithm target is the lowest positive/negative sample prediction bias. The positive and negative sample prediction biases can be calculated from the probabilities the convolutional neural network model predicts for each category.
Specifically, in step S06, when the classification accuracies are similar, performance evaluation is carried out in combination with the positive and negative sample prediction biases.
Wherein,
Out is the vector of probabilities the model predicts for each category; i indicates that the model predicts the current image to belong to the i-th category; Out_i is the probability that the model predicts the current image to belong to the i-th category; max is the subscript of the maximum value in Out, i.e. the category the model predicts; cur is the true category of the current image.
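The bias formula itself is not reproduced in this text. One plausible reading, shown purely as an assumption and not as the patent's stated formula, is to measure how far the predicted probability of the true category cur falls short of certainty, averaged over the samples of interest (`prediction_bias` is a hypothetical name):

```python
def prediction_bias(outs, curs):
    """Assumed bias: mean of (1 - Out[cur]) over the given samples.

    outs: per-sample probability vectors Out (each sums to 1);
    curs: per-sample true-category indices cur.
    This reading is an assumption, not the patent's stated formula.
    """
    return sum(1.0 - out[cur] for out, cur in zip(outs, curs)) / len(outs)

outs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
curs = [0, 1]
print(round(prediction_bias(outs, curs), 3))  # 0.25
```

Computed separately over positive and negative samples, such a quantity would distinguish models whose plain accuracies are similar, which is the stated motivation.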
Specifically, in this embodiment, in step S03, during the forward propagation of the convolutional neural network, the input features of each layer are calculated as:
y^(l) = f( Σ_{i∈m} x_i ∗ w_i + b_l );
wherein y^(l) is the output of the l-th convolution layer; x is the input vector; ∗ is the convolution operation; b_l is the bias; w is the weight of the layer; m is the set of input features; i indexes the i-th neuron; f(·) is the activation function.
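The per-layer formula can be illustrated for a single one-dimensional layer (a sketch; `relu` stands in for the activation f, and the weights and bias below are arbitrary illustration values):

```python
def relu(v):
    """Rectified linear activation, a common choice for f."""
    return max(0.0, v)

def conv1d_layer(x, w, b):
    """y[j] = f( sum_i x[j+i] * w[i] + b ), valid convolution, stride 1."""
    k = len(w)
    return [relu(sum(x[j + i] * w[i] for i in range(k)) + b)
            for j in range(len(x) - k + 1)]

y = conv1d_layer([1.0, 2.0, 3.0, 4.0], [-0.5, 0.5], b=0.1)
print([round(v, 3) for v in y])  # [0.6, 0.6, 0.6]
```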
In the ResNet model based on asymmetric convolution optimization, Epochs was set to 100, BatchSize to 128, and LR to 0.0009. Tables 1 to 3 report the model's test-set results.
Table 1: accuracy of the ResNet model based on asymmetric convolution optimization on the Raw-img test set.
Table 2: positive and negative prediction bias of the ResNet model based on asymmetric convolution optimization on the Raw-img test set.
Table 3: test-set accuracy and positive/negative prediction bias of the ResNet model based on asymmetric convolution optimization on IMAGENETTE.
The technical problems solved, technical solutions, and beneficial effects of the present invention have been described in detail through the above embodiments. It should be understood that the above embodiments are only illustrative of the present invention and are not intended to limit it; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present invention shall be included in its scope of protection.
Claims (7)
1. A convolutional neural network training method based on an asymmetric convolutional kernel is characterized by comprising the following steps:
s01: acquiring a data set, wherein the data set is used for training a convolutional neural network;
s02: splitting the n×n convolution kernels of a designated convolution layer in the convolutional neural network into n×1 and 1×n asymmetric convolution kernels to form two parallel convolution layers, and adding a batch normalization layer after the two split parallel convolution layers;
s03: training the convolutional neural network processed in the step S02 by adopting the data set acquired in the step S01, acquiring an image feature map in the training process, extracting image features, and classifying images;
s04: after the convolutional neural network is trained in one round, the convolutional neural network is verified by using a verification set;
s05: circularly executing steps S02 to S04, performing multi-round training on the convolutional neural network until the model-convergence end condition of the convolutional neural network is reached.
2. The convolutional neural network training method of claim 1, wherein in step S01, data enhancement is performed on the data set.
3. The convolutional neural network training method of claim 2,
The data enhancement mode is as follows: at least one of blurring and rotation.
4. The convolutional neural network training method of claim 1,
In step S02, after splitting the n×n convolution kernels of the designated convolution layer into two asymmetric convolution kernels, a padding operation is added.
5. The convolutional neural network training method of claim 4,
The filling formula of the padding operation is:
W_N = (W − KW + 2·PW) / S + 1;
H_N = (H − KH + 2·PH) / S + 1;
wherein,
W, H are the width and height of the feature map;
KW, KH are the convolution kernel width and height;
PW, PH are the left/right padding size and the up/down padding size;
S is the convolution stride; W_N, H_N are the updated feature-map width and height.
6. The convolutional neural network training method of claim 1,
The BN formulas of the batch normalization layer are:
μ_B = (1/m) · Σ_{i=1..m} x_i;
σ²_B = (1/m) · Σ_{i=1..m} (x_i − μ_B)²;
x̂_i = (x_i − μ_B) / √(σ²_B + ε);
y_i = BN_{γ,β}(x_i) = γ·x̂_i + β;
wherein,
i is the index of a sample within the batch;
x_i is the i-th sample input in the batch;
y_i is the i-th sample output;
γ, β are learnable parameters;
m is the number of samples in the batch;
μ_B is the mean of the batch input data;
σ²_B is the variance of the batch input data;
x̂_i is the normalized data;
ε is a very small number that avoids division by zero.
7. The convolutional neural network training method of claim 1, further comprising:
S06: calculating the accuracy and positive and negative sample prediction bias values of the convolutional neural network on the test set, wherein the convolutional neural network uses the accuracy and the positive and negative sample prediction bias values together as evaluation indexes, and the final algorithm target of the convolutional neural network is as follows: the positive and negative samples have the lowest predicted bias value.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202410383952.XA | 2024-04-01 | 2024-04-01 | Convolutional neural network training method based on asymmetric convolution kernel |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202410383952.XA | 2024-04-01 | 2024-04-01 | Convolutional neural network training method based on asymmetric convolution kernel |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN118114723A | 2024-05-31 |

Family
- ID=91214199

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202410383952.XA | Convolutional neural network training method based on asymmetric convolution kernel | 2024-04-01 | 2024-04-01 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN118114723A |

- 2024-04-01: application CN202410383952.XA filed; CN118114723A status Pending
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |