CN111950699A - Neural network regularization method based on characteristic space correlation - Google Patents

Neural network regularization method based on characteristic space correlation

Info

Publication number
CN111950699A
CN111950699A (application CN202010632236.2A)
Authority
CN
China
Prior art keywords: discarding, feature map, mask matrix, feature, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010632236.2A
Other languages
Chinese (zh)
Inventor
戴涛
曾钰媛
夏树涛
李清
李伟超
汪漪
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Peng Cheng Laboratory
Priority date
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University, Peng Cheng Laboratory filed Critical Shenzhen International Graduate School of Tsinghua University
Priority claimed from CN202010632236.2A
Publication of CN111950699A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network regularization method based on feature spatial correlation, which comprises the following steps: acquiring a spatial correlation matrix of a feature map to be processed, and determining a first discarding mask matrix based on the spatial correlation matrix; determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed; determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector; and determining the discarded feature map corresponding to the feature map to be processed according to the second discarding mask matrix and the first feature map. The method discards features of the feature map based on both spatial and channel feature correlation, so that low-correlation features in the feature map can be effectively selected for discarding, achieving adaptive discarding; a CNN can thereby be effectively regularized and the generalization ability of the model improved.

Description

Neural network regularization method based on characteristic space correlation
Technical Field
The invention relates to the technical field of neural networks, in particular to a neural network regularization method based on feature space correlation.
Background
A deep neural network is a highly nonlinear model that is prone to overfitting during training, so machine learning research has proposed various regularization algorithms to improve the generalization performance of machine learning models. Early regularization methods limited the complexity of the model (e.g., a linear model or a neural network) by adding a regularization or penalty term to the loss function, such as L1 regularization, L2 regularization, etc. The regularization term is typically a parameter-norm penalty that reduces the magnitude of the model parameters. In computer vision research, data augmentation is a common and effective regularization method: for images, it enlarges the training data set by horizontal/vertical flipping, scaling, rotating, cropping, adding color jitter and noise, and the like, giving the training set more diversity and enhancing the generalization ability of the model.
With the development of deep neural networks, more efficient regularization techniques have been proposed. Dropout, proposed by Hinton et al. in 2012 for fully connected networks, regularizes the network by randomly zeroing out some of the neurons. For the structural characteristics of CNNs, a large class of regularization algorithms based on random discarding has been proposed, such as Cutout, Spatial Dropout, and DropBlock. Since CNNs exhibit local correlation when capturing image features, these algorithms effectively discard local features in CNNs by designing structured discarding rules. Cutout randomly zeroes a contiguous pixel region of the input image, which also acts as data augmentation. Spatial Dropout randomly discards some of the channels in the feature map, and DropBlock discards contiguous active regions in the spatial domain of the feature map. Such feature-discarding methods increase the learning difficulty of the network to a certain extent and thereby regularize the model, so that the model learns features with better generalization performance.
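For readers unfamiliar with these prior-art baselines, the two simplest schemes can be sketched in a few lines of NumPy. This is an illustrative sketch of standard Dropout and Spatial Dropout only, not the method claimed by this application:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))  # toy H x W x C feature map

# Standard Dropout: zero individual activations with probability p.
p = 0.3
unit_mask = rng.random(x.shape) >= p
x_dropout = x * unit_mask / (1 - p)  # inverted scaling keeps the expectation

# Spatial Dropout: zero whole channels instead of single units.
channel_mask = (rng.random(x.shape[-1]) >= p).astype(x.dtype)
x_spatial = x * channel_mask  # broadcasts over H and W
```

Dropped channels in `x_spatial` are zero across all spatial positions, which is exactly what distinguishes Spatial Dropout from the unstructured, per-unit zeroing of standard Dropout.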
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a neural network regularization method based on feature space correlation, aiming at the defects of the prior art.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a method of neural network regularization based on feature spatial correlation, the method comprising:
acquiring a spatial correlation matrix of a feature map to be processed, and determining a first discarding mask matrix corresponding to the feature map to be processed based on the spatial correlation matrix;
determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed;
determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector;
and determining the discarded feature map corresponding to the feature map to be processed according to the second discarding mask matrix and the first feature map.
The neural network regularization method based on the feature spatial correlation, wherein the feature map to be processed is a feature map output by a convolutional layer in the neural network.
The neural network regularization method based on the feature spatial correlation, wherein the obtaining of the spatial correlation matrix of the feature map to be processed specifically includes:
down-sampling the feature map to be processed to obtain a down-sampled feature map;
converting the down-sampled feature map into a feature matrix, and normalizing the feature matrix to obtain a normalized feature matrix;
and determining a spatial correlation matrix corresponding to the normalized feature matrix according to feature orthogonality.
The neural network regularization method based on the feature spatial correlation, wherein the determining of the first discarding mask matrix corresponding to the down-sampled feature map based on the spatial correlation matrix specifically includes:
determining a first discarding probability vector corresponding to the down-sampled feature map according to the spatial correlation matrix, and determining a third discarding mask matrix corresponding to the down-sampled feature map according to the first discarding probability vector;
determining a first candidate discarding proportion corresponding to the third discarding mask matrix, and determining a first target discarding proportion corresponding to the down-sampled feature map according to a preset first discarding proportion and the first candidate discarding proportion;
determining a fourth discarding mask matrix corresponding to the down-sampled feature map according to the first target discarding proportion;
and determining a first discarding mask matrix corresponding to the feature map to be processed according to the third discarding mask matrix and the fourth discarding mask matrix.
The method for regularizing a neural network based on feature spatial correlation, wherein the determining, according to the third discarding mask matrix and the fourth discarding mask matrix, a first discarding mask matrix corresponding to the feature map to be processed specifically includes:
selecting a target element from the third discarding mask matrix, wherein the element value of the target element and the element value of a candidate element corresponding to the target element are both 0, and the element position of the candidate element in the fourth discarding mask matrix is the same as the element position of the target element in the third discarding mask matrix;
setting the element values of other elements except the target element in the third discarding mask matrix to be 1 so as to obtain a discarding mask matrix corresponding to the down-sampling feature map;
and performing up-sampling on the discarding mask matrix corresponding to the down-sampling feature map to obtain a discarding mask matrix corresponding to the feature map to be processed, wherein the image size corresponding to the discarding mask matrix corresponding to the feature map to be processed is the same as the image size of the feature map to be processed.
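The mask-combination and up-sampling steps of this claim can be sketched as follows. Nearest-neighbour up-sampling via `np.kron` is an assumption; the claim only requires the up-sampled mask to match the size of the feature map to be processed:

```python
import numpy as np

# Third and fourth drop masks on the down-sampled grid (1 = keep, 0 = drop).
m3 = np.array([[0, 1],
               [0, 1]])
m4 = np.array([[0, 0],
               [1, 1]])

# A position is dropped only where BOTH masks are 0; every other position
# is set to 1, giving the combined mask on the down-sampled grid.
coarse = np.where((m3 == 0) & (m4 == 0), 0, 1)

# Up-sample back to the original H x W resolution (k = pooling factor),
# so each coarse cell expands into a k x k block of identical values.
k = 2
full_mask = np.kron(coarse, np.ones((k, k), dtype=int))
```

Only position (0, 0) is zero in both coarse masks, so the final 4 x 4 mask drops exactly one 2 x 2 block, mirroring how a coarse drop decision covers a contiguous spatial region at full resolution.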
The neural network regularization method based on the feature spatial correlation, wherein the determining, according to the first discarding mask matrix and the feature map to be processed, of the first feature map corresponding to the feature map to be processed specifically includes:
performing element-wise (dot product) multiplication of the first discarding mask matrix and the feature map to be processed to obtain the first feature map corresponding to the feature map to be processed.
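As a concrete illustration, the "dot product" here is element-wise multiplication with the spatial mask broadcast over the channel dimension. A minimal NumPy sketch (toy shapes, not the patented implementation):

```python
import numpy as np

H, W, C = 4, 4, 3
rng = np.random.default_rng(1)
feat = rng.standard_normal((H, W, C))

# A spatial drop mask (1 = keep, 0 = drop), one value per spatial position.
mask = np.ones((H, W))
mask[1, 2] = 0
mask[3, 0] = 0

# Element-wise multiply; the H x W mask broadcasts across all C channels,
# zeroing every channel at the dropped positions.
first_feature_map = feat * mask[:, :, None]
```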
The neural network regularization method based on the feature spatial correlation, wherein the determining of the channel correlation vector corresponding to the first feature map specifically includes:
reshaping the first feature map along the channel direction to obtain a feature matrix corresponding to the first feature map;
and determining a channel correlation vector corresponding to the feature matrix according to feature orthogonality.
The neural network regularization method based on the feature spatial correlation, wherein the determining of the second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector specifically includes:
determining a second discarding probability vector corresponding to the first feature map according to the channel correlation vector, and determining a fifth discarding mask matrix corresponding to the first feature map according to the second discarding probability vector;
determining a second candidate discarding proportion corresponding to the fifth discarding mask matrix, and determining a second target discarding proportion corresponding to the first feature map according to a preset second discarding proportion and the second candidate discarding proportion;
determining a sixth discarding mask matrix corresponding to the first feature map according to the second target discarding proportion;
and determining a second discarding mask matrix corresponding to the feature map to be processed according to the fifth discarding mask matrix and the sixth discarding mask matrix.
A neural network model, characterized in that the neural network model comprises at least one convolution module; the convolution module comprises a convolutional layer and a regularization module, and the output of the convolutional layer is the input of the regularization module; the regularization module is configured to perform the feature-spatial-correlation-based neural network regularization method described above.
A terminal device, characterized in that the terminal device is loaded with a neural network model as described above.
Advantageous effects: compared with the prior art, the invention provides a neural network regularization method based on feature spatial correlation, comprising the following steps: acquiring a spatial correlation matrix of a feature map to be processed, and determining a first discarding mask matrix based on the spatial correlation matrix; determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed; determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector; and determining the discarded feature map corresponding to the feature map to be processed according to the second discarding mask matrix and the first feature map. The method discards features of the feature map based on both spatial and channel feature correlation, so that low-correlation features in the feature map can be effectively selected for discarding, achieving adaptive discarding; a CNN can thereby be effectively regularized and the generalization ability of the model improved.
Drawings
Fig. 1 is a flowchart of a neural network regularization method based on feature spatial correlation provided in the present invention.
Fig. 2 is a flow diagram illustrating the correlation-based discarding process in the spatial domain in the neural network regularization method based on feature spatial correlation according to the present invention.
Fig. 3 is a flow diagram illustrating the correlation-based discarding process in the channel domain in the neural network regularization method based on feature spatial correlation according to the present invention.
FIG. 4 is an application scenario diagram of a convolution module constructed by the neural network regularization method based on feature spatial correlation provided in the present invention.
Fig. 5 is a schematic structural diagram of a terminal device provided in the present invention.
Detailed Description
The invention provides a neural network regularization method based on feature space correlation, and in order to make the purpose, technical scheme and effect of the invention clearer and clearer, the invention is further described in detail below by referring to the attached drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventor has found that deep neural networks are prone to overfitting during training, so machine learning research has proposed various regularization algorithms to improve the generalization performance of machine learning models. Early regularization methods limited the complexity of the model (e.g., a linear model or a neural network) by adding a regularization or penalty term to the loss function, such as L1 regularization, L2 regularization, etc. The regularization term is typically a parameter-norm penalty that reduces the magnitude of the model parameters. In computer vision research, data augmentation is a common and effective regularization method: for images, it enlarges the training data set by horizontal/vertical flipping, scaling, rotating, cropping, adding color jitter and noise, and the like, giving the training set more diversity and enhancing the generalization ability of the model.
With the development of deep neural networks, more efficient regularization techniques have been proposed. Dropout, proposed by Hinton et al. in 2012 for fully connected networks, regularizes the network by randomly zeroing out some of the neurons. For the structural characteristics of CNNs, a large class of regularization algorithms based on random discarding has been proposed, such as Cutout, Spatial Dropout, and DropBlock. Since CNNs exhibit local correlation when capturing image features, these algorithms effectively discard local features in CNNs by designing structured discarding rules. Cutout randomly zeroes a contiguous pixel region of the input image, which also acts as data augmentation. Spatial Dropout randomly discards some of the channels in the feature map, and DropBlock discards contiguous active regions in the spatial domain of the feature map. Such feature-discarding methods increase the learning difficulty of the network to a certain extent and thereby regularize the model, so that the model learns features with better generalization performance.
In addition, among existing regularization algorithms, those based on feature discarding play an important role in improving the generalization of deep neural network models, but they still have some limitations. For example, the standard Dropout technique randomly discards individual feature points, which does not regularize CNNs well. Moreover, Dropout techniques based on random discarding do not consider the relative importance of feature points, which limits their regularization effect on CNNs.
In order to solve the above problem, in the embodiment of the present invention, after the feature map to be processed is obtained, a spatial correlation matrix of the feature map to be processed is obtained, and a first discarding mask matrix is determined based on the spatial correlation matrix; determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed; determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector; and determining the discarded characteristic diagram corresponding to the characteristic diagram to be processed according to the second discarding mask matrix and the first characteristic diagram. The method and the device perform feature discarding on the feature map based on the spatial feature correlation and the channel feature correlation, so that the low-correlation features in the feature map can be effectively selected to be discarded to achieve the purpose of self-adaptive discarding, a CNN network can be effectively regularized, and the generalization capability of a model is improved.
The invention will be further explained by the description of the embodiments with reference to the drawings.
The present embodiment provides a neural network regularization method based on feature spatial correlation; as shown in figs. 1-3, the method includes:
s10, obtaining a spatial correlation matrix of the feature map to be processed, and determining a first discarding mask matrix corresponding to the feature map to be processed based on the spatial correlation matrix.
Specifically, the feature map to be processed is a feature map output by a convolutional layer in a deep learning network model (e.g., a convolutional neural network), that is, the feature map to be processed is an output of the convolutional layer, which may be any convolutional layer in the model. For example, for an input image of a classification convolutional neural network, the feature map produced from the input image by any convolutional layer of that network may be used.
The spatial correlation matrix reflects the spatial correlation between the feature points in the down-sampled feature map; each element of the matrix reflects how strongly the feature point corresponding to that element is correlated with the other feature points in the feature map. In an implementation manner of this embodiment, the obtaining of the spatial correlation matrix of the feature map to be processed specifically includes:
A11, down-sampling the feature map to be processed to obtain a down-sampled feature map;
A12, converting the down-sampled feature map into a feature matrix, and normalizing the feature matrix to obtain a normalized feature matrix;
and A13, determining a spatial correlation matrix corresponding to the normalized feature matrix according to feature orthogonality.
Specifically, in step A11, the down-sampled feature map is obtained by down-sampling the feature map to be processed and represents local feature information of the feature map to be processed. It can be understood that after the feature map to be processed is obtained, it is down-sampled to obtain the down-sampled feature map. In an implementation manner of this embodiment, the down-sampling may use average pooling: after the feature map to be processed is obtained, an average pooling kernel corresponding to it is determined, and the feature map is average-pooled with that kernel to obtain the down-sampled feature map. For example, assuming that the feature map x to be processed has size H × W × C, the average pooling kernel is k × k (e.g., k = 5) with stride k; a sliding window averages the feature values over each k × k local region, yielding a down-sampled feature map x' corresponding to the feature map to be processed, where x' ∈ R^(H'×W'×C). Of course, average pooling is only one implementation for obtaining the down-sampled feature map in this embodiment; in practical applications other methods, such as bilinear pooling, may be used as long as they produce a down-sampled feature map corresponding to the feature map to be processed.
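The average-pooling down-sampling just described can be sketched as follows; the reshape trick assumes H and W are divisible by k, a simplification not required by the text:

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool an H x W x C feature map with a k x k kernel, stride k.

    Assumes H and W are divisible by k (a simplification of the text)."""
    H, W, C = x.shape
    # Split each spatial axis into (blocks, within-block) and average
    # over the within-block axes.
    return x.reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)  # toy feature map
x_down = avg_pool(x, 2)  # down-sampled map of size (2, 2, 2)
```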
Further, in step A12, converting the down-sampled feature map into the feature matrix means representing the down-sampled feature map in matrix form, where the feature matrix contains H' × W' elements and each element is a multidimensional feature vector whose dimension equals the number of channels of the down-sampled feature map. For example, if the down-sampled feature map has size H' × W' × C, the feature matrix has H' × W' elements and each element is a C-dimensional feature vector. In an implementation manner of this embodiment, the conversion may be: for x' ∈ R^(H'×W'×C), arrange the H' × W' values of each channel along the column direction, so that each channel of the down-sampled feature map is stretched into an (H' × W')-dimensional feature vector; the down-sampled feature map x' is thereby mapped to an (H' × W') × C feature matrix X'. After the feature matrix is obtained, it is normalized; the normalization formula may be written as (reconstructed here as row-wise L2 normalization, since the original figure is unreadable):

X̄_i = X'_i / ||X'_i||_2,  i = 1, ..., H' × W'

where X'_i denotes the i-th row of X' (the C-dimensional feature vector of the i-th spatial position) and X̄ is the feature matrix after normalization.
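The reshape-and-normalize step can be sketched in NumPy; the row-wise L2 normalization is an assumed reading of the unreadable original formula:

```python
import numpy as np

rng = np.random.default_rng(2)
Hp, Wp, C = 3, 3, 4
x_down = rng.standard_normal((Hp, Wp, C))  # toy down-sampled feature map

# Flatten the spatial positions: each row of X is the C-dimensional
# feature vector of one spatial position, giving an (H'*W') x C matrix.
X = x_down.reshape(Hp * Wp, C)

# Row-wise L2 normalization, so every position's feature vector has
# unit length and inner products become cosine similarities.
X_bar = X / np.linalg.norm(X, axis=1, keepdims=True)
```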
Further, in step A13, after the normalized feature matrix is obtained, the spatial correlation matrix is computed from the orthogonality of the features; its calculation formula may be written as (reconstructed, since the original figure is unreadable):

P = X̄ X̄ᵀ − I

where P is the spatial correlation matrix, X̄ is the normalized feature matrix, and I is the (H' × W') × (H' × W') identity matrix. Each element P_{i,j} of the spatial correlation matrix P represents the magnitude of the correlation between the i-th and j-th elements of the normalized feature matrix; subtracting I removes each position's trivial correlation with itself.
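Under the reconstruction above, the correlation matrix is the matrix of pairwise cosine similarities between spatial positions with the unit diagonal removed. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N, C = 9, 4  # N = H' * W' spatial positions, C channels
X = rng.standard_normal((N, C))

# Row-wise L2 normalization (assumed, as in the previous step).
X_bar = X / np.linalg.norm(X, axis=1, keepdims=True)

# Pairwise cosine similarities between positions; subtracting the
# identity zeroes out each position's self-correlation.
P = X_bar @ X_bar.T - np.eye(N)
```

P is symmetric with a zero diagonal, so row averages of P summarize how strongly each position correlates with all the others, which is exactly the score the next step turns into a discarding probability.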
Further, in an implementation manner of this embodiment, the determining, based on the spatial correlation matrix, of the first discarding mask matrix corresponding to the down-sampled feature map specifically includes:
B11, determining a first discarding probability vector corresponding to the down-sampled feature map according to the spatial correlation matrix, and determining a third discarding mask matrix corresponding to the down-sampled feature map according to the first discarding probability vector;
B12, determining a first candidate discarding proportion corresponding to the third discarding mask matrix, and determining a first target discarding proportion corresponding to the down-sampled feature map according to a preset first discarding proportion and the first candidate discarding proportion;
B13, determining a fourth discarding mask matrix corresponding to the down-sampled feature map according to the first target discarding proportion;
and B14, determining a first discarding mask matrix corresponding to the feature map to be processed according to the third discarding mask matrix and the fourth discarding mask matrix.
Specifically, the third discarding mask matrix is a discarding mask matrix based on feature correlation and reflects which feature points of the feature map to be processed are discarded. Some elements of the third discarding mask matrix take a first preset value and the rest a second preset value: the first value indicates that the feature point corresponding to that element in the down-sampled feature map is discarded; the second value indicates that it is not. In an implementation manner of this embodiment, the first value may be 0 and the second value may be 1.
The first discarding probability vector is determined according to feature correlation; each value in it is the probability that the corresponding feature point is discarded. A feature point with a larger value has smaller correlation with the other feature points in the down-sampled feature map, and conversely a feature point with a smaller value has larger correlation with the other feature points. It can be understood that, among the correlation scores determined by the spatial correlation matrix, feature points with lower correlation scores have higher discarding probabilities, and feature points with higher correlation scores have lower discarding probabilities.
Further, the first discarding probability vector may be determined as follows: first, average each row of the spatial correlation matrix P to obtain a correlation vector of length H' × W', where each element represents the correlation importance of the corresponding position relative to the other positions of the down-sampled feature map; second, normalize the correlation vector and determine from it the discarding probability of each element, giving the first discarding probability vector. The normalization may use a softmax function; each first discarding probability may be computed as (reconstructed with a negated score, since the original figure is unreadable, so that low-correlation points receive higher discarding probabilities as described above):

γ_i = exp(−F_i) / Σ_{k=1}^{N} exp(−F_k)

where γ_i is the i-th first discarding probability in the first discarding probability vector, F_i is the i-th element of the normalized correlation vector, and i and k both lie in [1, N]; N is the number of elements in the correlation vector, i.e., the product of the width and height of the down-sampled feature map: when the local feature has scale (H' × W') × C, N = H' × W'.
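The softmax normalization can be sketched on toy scores. The sign convention (negating the scores so that low-correlation positions get the highest discarding probability) is an assumption consistent with the surrounding text:

```python
import numpy as np

# Row-averaged correlation scores for 4 spatial positions (toy values);
# a higher score means the position is more correlated with the others.
F = np.array([0.9, 0.1, 0.5, 0.3])

# Softmax over the NEGATED scores, so a LOW correlation score yields a
# HIGH discarding probability (sign convention assumed; original figure
# is unreadable).
gamma = np.exp(-F) / np.exp(-F).sum()
```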
Further, after the first discarding probability vector is obtained, Bernoulli sampling is performed on each element γ_i in the first discarding probability vector to obtain an N-dimensional mask vector, and the N-dimensional mask vector is rearranged by columns into a third discarding mask matrix of size H′ × W′. The Bernoulli sampling may be calculated as:
m_i = Bernoulli(1 − γ_i)
wherein m_i is the ith element of the N-dimensional mask vector; γ_i is the ith first discarding probability in the first discarding probability vector; and Bernoulli() denotes Bernoulli sampling.
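As a concrete illustration, the probability computation and Bernoulli sampling above can be sketched in NumPy. This is a minimal sketch: the function names are illustrative, and the use of a negated softmax (chosen so that weakly correlated positions receive higher discarding probabilities, as the text describes) is an assumption rather than the patent's exact formulation.

```python
import numpy as np

def first_drop_probabilities(corr_matrix):
    """Row-mean the spatial correlation matrix into a correlation vector,
    then softmax its negation so low-correlation positions get a HIGHER
    discarding probability (the negation is inferred from the text)."""
    f = corr_matrix.mean(axis=1)      # correlation vector: one score per position
    g = -f
    e = np.exp(g - g.max())           # numerically stable softmax
    return e / e.sum()                # first discarding probability vector

def bernoulli_mask(gamma, rng):
    """m_i ~ Bernoulli(1 - gamma_i): keep a position with probability 1 - gamma_i."""
    return (rng.random(gamma.shape) < 1.0 - gamma).astype(float)

rng = np.random.default_rng(0)
P = rng.random((16, 16))              # stand-in spatial correlation matrix, N = H' * W' = 16
gamma = first_drop_probabilities(P)
third_mask = bernoulli_mask(gamma, rng).reshape(4, 4)  # rearranged into H' x W'
```

Because the probabilities sum to 1 over the N positions, each individual discarding probability is small; the proportion adjustment described next compensates for this.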
Further, the first discarding proportion is preset and represents the proportion of feature points discarded from the feature map to be processed. The preset first discarding proportion may be set according to actual requirements and is not specifically limited herein; for example, it may be 1/10. In addition, after the preset first discarding proportion is obtained, a first candidate discarding proportion corresponding to the third discarding mask matrix may be determined according to the third discarding mask matrix; the preset first discarding proportion is then adjusted according to the first candidate discarding proportion, the adjusted proportion is used as the first target discarding proportion corresponding to the feature map to be processed, and the fourth discarding mask matrix is determined based on the first target discarding proportion. The first candidate discarding proportion and the first target discarding proportion are respectively calculated as:
p_m = (N − sum(M)) / N
p′ = p / p_m
wherein p_m is the first candidate discarding proportion; N is the number of elements in the third discarding mask matrix; M is the third discarding mask matrix; sum(M) is the number of elements with a value of 1 in the third discarding mask matrix; p′ is the first target discarding proportion; and p is the preset first discarding proportion.
Further, after the first target discarding proportion is obtained, Bernoulli sampling is performed based on the first target discarding proportion to obtain the fourth discarding mask matrix, where the Bernoulli sampling formula corresponding to the fourth discarding mask matrix may be:
b_i = Bernoulli(1 − p′)
wherein b_i is the ith element of the fourth discarding mask matrix.
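A small sketch of the proportion adjustment above (NumPy; the helper name `target_drop_ratio` is illustrative, and clipping p′ to 1 is an added safeguard not stated in the text):

```python
import numpy as np

def target_drop_ratio(third_mask, preset_p):
    """p_m = (N - sum(M)) / N is the fraction actually dropped by the
    correlation-based mask; p' = p / p_m rescales the preset proportion so
    the combined mask drops roughly preset_p of the elements overall."""
    n = third_mask.size
    p_m = (n - third_mask.sum()) / n
    if p_m == 0:
        return 0.0                     # nothing dropped yet: no further masking needed
    return min(preset_p / p_m, 1.0)    # clip to a valid probability (assumed safeguard)

rng = np.random.default_rng(1)
third_mask = (rng.random(64) < 0.5).astype(float)        # stand-in third discarding mask
p_target = target_drop_ratio(third_mask, preset_p=0.1)   # first target discarding proportion
fourth_mask = (rng.random(64) < 1.0 - p_target).astype(float)  # b_i ~ Bernoulli(1 - p')
```

Since the combined matrix (described next) drops an element only when both masks drop it, the overall drop rate is approximately p_m × p′ = p, matching the preset proportion.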
Further, the first discarding mask matrix is determined by considering both spatial correlation discarding and the first discarding proportion. An element with a value of 0 in the first discarding mask matrix indicates that the corresponding feature point in the down-sampling feature map is discarded, and an element with a non-zero value indicates that the corresponding feature point is not discarded. In this embodiment, for an element whose value in the first discarding mask matrix is 0, the corresponding element values in the third discarding mask matrix and the fourth discarding mask matrix are both 0. Correspondingly, the determining, according to the third discarding mask matrix and the fourth discarding mask matrix, the first discarding mask matrix corresponding to the feature map to be processed specifically includes:
selecting a target element from the third discarding mask matrix, wherein the element value of the target element and the element value of a candidate element corresponding to the target element are both 0, and the element position of the candidate element in the fourth discarding mask matrix is the same as the element position of the target element in the third discarding mask matrix;
setting the element values of other elements except the target element in the third discarding mask matrix to be 1 so as to obtain a discarding mask matrix corresponding to the down-sampling feature map;
and performing up-sampling on the discarding mask matrix corresponding to the down-sampling feature map to obtain a discarding mask matrix corresponding to the feature map to be processed, wherein the image size corresponding to the discarding mask matrix corresponding to the feature map to be processed is the same as the image size of the feature map to be processed.
Specifically, the dimensional space of the third discarding mask matrix corresponding to the down-sampling feature map is the same as that of the fourth discarding mask matrix. It is to be understood that, for each element of the first discarding mask matrix, the corresponding elements may be found in both the third discarding mask matrix and the fourth discarding mask matrix; and for an element whose value in the first discarding mask matrix is 0, the value of the corresponding element in the third discarding mask matrix is 0 and the value of the corresponding element in the fourth discarding mask matrix is 0. Thus, after the third discarding mask matrix and the fourth discarding mask matrix are obtained, each element of the first discarding mask matrix may be calculated as:
s_{i,j} = 0, if m_{i,j} = 0 and b_{i,j} = 0; s_{i,j} = 1, otherwise
wherein s_{i,j} is the element in the ith row and jth column of the discarding mask matrix corresponding to the down-sampling feature map; m_{i,j} is the element in the ith row and jth column of the third discarding mask matrix; and b_{i,j} is the element in the ith row and jth column of the fourth discarding mask matrix.
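The piecewise combination rule above amounts to keeping a position unless both masks drop it; a sketch (the helper name is assumed):

```python
import numpy as np

def combine_drop_masks(m, b):
    """s = 0 only where both the correlation mask m and the proportion
    mask b are 0; all other positions keep the value 1."""
    return np.where((m == 0) & (b == 0), 0.0, 1.0)

m = np.array([[0., 1.],
              [0., 1.]])              # third discarding mask (correlation-based)
b = np.array([[0., 0.],
              [1., 1.]])              # fourth discarding mask (proportion-based)
s = combine_drop_masks(m, b)          # only the top-left element is dropped
```

For binary masks this is equivalent to the element-wise maximum of m and b.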
Further, after the first discarding mask matrix corresponding to the downsampled feature map is obtained, the first discarding mask matrix corresponding to the downsampled feature map is upsampled, so that the image size corresponding to the first discarding mask matrix obtained through upsampling is the same as the image size of the feature map to be processed. In addition, in the first discarding mask matrix, the element values in a preset area around an element of which the element value is 0 in the first discarding mask matrix corresponding to the downsampling feature map are all 0, wherein the size of the preset area is the same as that of an upsampling core of the upsampling, so that the purpose of local area discarding is achieved.
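The up-sampling of the mask can be illustrated with a nearest-neighbour expansion, where every zero grows into a block of zeros the size of the up-sampling core. This is a sketch; `np.kron` here stands in for whichever up-sampling core the implementation actually uses.

```python
import numpy as np

def upsample_mask(mask, factor):
    """Nearest-neighbour up-sampling of the H' x W' mask to H x W: each 0
    expands into a factor x factor block of 0s, which realizes the
    local-region discarding described in the text."""
    return np.kron(mask, np.ones((factor, factor)))

small = np.array([[1., 0.],
                  [1., 1.]])
big = upsample_mask(small, 2)   # 4 x 4 mask; the dropped entry becomes a 2 x 2 zero block
```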
S20, determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed.
Specifically, the first discarding mask matrix is multiplied element-wise (dot product) with the feature map to be processed, so as to obtain the first feature map corresponding to the feature map to be processed. In the first feature map, the feature points in the regions corresponding to elements with a value of 0 in the first discarding mask matrix are set to 0, so that those feature points are discarded.
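Applying the mask is a broadcast element-wise multiplication that zeroes the masked positions in every channel (a sketch with made-up shapes):

```python
import numpy as np

rng = np.random.default_rng(2)
feature_map = rng.random((3, 4, 4))     # C x H x W feature map to be processed
mask = np.ones((4, 4))
mask[1:3, 1:3] = 0.0                    # first discarding mask matrix, H x W
first_feature_map = feature_map * mask  # broadcasts over the channel axis
```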
S30, determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector.
Specifically, the channel correlation vector is used to reflect the correlation between the channels of the first feature map, and each element of the channel correlation vector reflects the correlation importance of the corresponding channel with respect to the other channels of the first feature map. The second discarding mask matrix is a discarding mask matrix based on channel correlation and is used to indicate the discarded channels of the first feature map: some of its element values are a first preset value, indicating that the corresponding channel of the first feature map is a discarded channel, and the others are a second preset value, indicating that the corresponding channel is not a discarded channel. In an implementation manner of this embodiment, the first preset value may be 0 and the second preset value may be 1.
In an implementation manner of this embodiment, the determining the channel correlation vector corresponding to the first feature map specifically includes:
converting the first feature map along the channel direction to obtain a feature matrix corresponding to the first feature map;
and determining a channel correlation vector corresponding to the feature matrix according to the feature orthogonality.
Specifically, converting the first feature map along the channel direction means that each channel of the first feature map is stretched into a vector, and the vectors obtained by stretching the channels are concatenated in channel order to obtain the feature matrix. For example, if the image scale of the first feature map is H × W × C, the feature matrix of each channel is stretched into a vector of dimension N = H × W, and the C channel vectors are concatenated in channel order to obtain the feature matrix V, where the dimension of V is N × C.
Further, after the feature matrix is obtained, a channel correlation matrix is determined, and the channel correlation vector is determined according to the channel correlation matrix. The channel correlation matrix may be calculated as P = V^T V − I, where P is the channel correlation matrix, V is the feature matrix, and I is a C × C identity matrix; each element P_{i,j} of the correlation matrix P indicates the correlation between the ith channel and the jth channel.
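The channel correlation computation can be sketched as follows (NumPy; the per-column normalization of V, which makes the diagonal of V^T V equal to 1 before the identity is subtracted, is an assumption not stated explicitly here):

```python
import numpy as np

def channel_correlation_vector(first_feature_map):
    """Stretch each channel of an H x W x C map into a column of V
    (shape N x C with N = H * W), form P = V^T V - I, and row-mean P
    into the 1 x C channel correlation vector."""
    h, w, c = first_feature_map.shape
    v = first_feature_map.reshape(h * w, c)
    v = v / (np.linalg.norm(v, axis=0, keepdims=True) + 1e-12)  # assumed normalization
    p = v.T @ v - np.eye(c)
    return p.mean(axis=1)

rng = np.random.default_rng(3)
fmap = rng.random((4, 4, 3))
F = channel_correlation_vector(fmap)   # one correlation score per channel
```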
In an implementation manner of this embodiment, the determining, based on the channel correlation vector, the second discard mask matrix corresponding to the to-be-processed feature map specifically includes:
determining a second discarding probability vector corresponding to the first feature map according to the channel correlation vector, and determining a fifth discarding mask matrix corresponding to the first feature map according to the second discarding probability vector;
determining a second candidate discarding proportion corresponding to the fifth discarding mask matrix, and determining a second target discarding proportion corresponding to the first feature map according to a preset second discarding proportion and the second candidate discarding proportion;
determining a sixth discarding mask matrix corresponding to the first feature map according to the second target discarding proportion;
and determining a second discarding mask matrix corresponding to the feature map to be processed according to the fifth discarding mask matrix and the sixth discarding mask matrix.
Specifically, the second discarding probability vector is determined according to the feature correlation, and each value in the second discarding probability vector represents the probability that the corresponding channel is discarded: a channel with a larger value has smaller correlation with the other channels of the first feature map, and conversely a channel with a smaller value has larger correlation with the other channels. It can be understood that, among the correlation scores of the channels, the lower the correlation score, the higher the discarding probability, and conversely, the higher the correlation score, the lower the discarding probability.
Further, the second discarding probability vector may be determined as follows: firstly, averaging each row of the channel correlation matrix P to obtain a 1 × C channel correlation vector F, wherein each element of the channel correlation vector represents the correlation importance of the corresponding channel with respect to the other channels of the first feature map; and secondly, normalizing the channel correlation vector F, and determining the discarding probability corresponding to each element according to the normalized channel correlation vector, so as to obtain the second discarding probability vector. The normalization may be performed with a softmax function, and each second discarding probability in the second discarding probability vector is calculated as:
γ_i = exp(−F_i) / (Σ_{k=1}^{C} exp(−F_k))
wherein γ_i is the ith second discarding probability in the second discarding probability vector; F_i is the ith element of the normalized channel correlation vector; F_k is the kth element of the normalized channel correlation vector; i and k both take values in [1, C]; and C is the number of elements in the channel correlation vector, namely the number of channels of the first feature map.
Further, after the second discarding probability vector is obtained, Bernoulli sampling is performed on each element γ_i in the second discarding probability vector to obtain a C-dimensional mask vector, and the C-dimensional mask vector is rearranged into a fifth discarding mask matrix of size 1 × C. The Bernoulli sampling may be calculated as:
m_i = Bernoulli(1 − γ_i)
wherein m_i is the ith element of the C-dimensional mask vector; γ_i is the ith second discarding probability in the second discarding probability vector; and Bernoulli() denotes Bernoulli sampling.
Further, the second discarding proportion is preset and represents the proportion of channels discarded from the first feature map. The preset second discarding proportion may be set according to actual requirements and is not specifically limited herein; for example, it may be 1/10. In addition, after the preset second discarding proportion is obtained, a second candidate discarding proportion corresponding to the fifth discarding mask matrix may be determined according to the fifth discarding mask matrix; the preset second discarding proportion is then adjusted according to the second candidate discarding proportion, the adjusted proportion is used as the second target discarding proportion corresponding to the first feature map, and the sixth discarding mask matrix is determined based on the second target discarding proportion. The second candidate discarding proportion and the second target discarding proportion are respectively calculated as:
p_m = (C − sum(M)) / C
p′ = p / p_m
wherein p_m is the second candidate discarding proportion; C is the number of elements in the fifth discarding mask matrix; M is the fifth discarding mask matrix; sum(M) is the number of elements with a value of 1 in the fifth discarding mask matrix; p′ is the second target discarding proportion; and p is the preset second discarding proportion.
Further, after the second target discarding proportion is obtained, Bernoulli sampling may be performed based on the second target discarding proportion to obtain the sixth discarding mask matrix, where the Bernoulli sampling formula corresponding to the sixth discarding mask matrix may be:
b_i = Bernoulli(1 − p′)
wherein b_i is the ith element of the sixth discarding mask matrix B, B ∈ R^{1×C}.
Further, the second discarding mask matrix is determined by considering both channel correlation discarding and the second discarding proportion. An element with a value of 0 in the second discarding mask matrix indicates that the corresponding channel of the first feature map is a discarded channel, and an element with a non-zero value indicates that the corresponding channel is not a discarded channel. In this embodiment, for an element whose value in the second discarding mask matrix is 0, the corresponding element values in the fifth discarding mask matrix and the sixth discarding mask matrix are both 0. Correspondingly, the determining, according to the fifth discarding mask matrix and the sixth discarding mask matrix, the second discarding mask matrix corresponding to the first feature map specifically includes:
selecting a target element from the fifth discarding mask matrix, wherein the element value of the target element and the element value of a candidate element corresponding to the target element are both 0, and the element position of the candidate element in the sixth discarding mask matrix is the same as the element position of the target element in the fifth discarding mask matrix;
and setting the element values of other elements except the target element in the fifth discarding mask matrix to be 1 so as to obtain a second discarding mask matrix corresponding to the first feature map.
Specifically, the dimensional space of the fifth discarding mask matrix corresponding to the first feature map is the same as that of the sixth discarding mask matrix. It is to be understood that, for each element of the second discarding mask matrix, the corresponding elements may be found in both the fifth discarding mask matrix and the sixth discarding mask matrix; and for an element whose value in the second discarding mask matrix is 0, the value of the corresponding element in the fifth discarding mask matrix is 0 and the value of the corresponding element in the sixth discarding mask matrix is 0. Thus, after the fifth discarding mask matrix and the sixth discarding mask matrix are obtained, each element of the second discarding mask matrix may be calculated as:
s_{i,j} = 0, if m_{i,j} = 0 and b_{i,j} = 0; s_{i,j} = 1, otherwise
wherein s_{i,j} is the element in the ith row and jth column of the second discarding mask matrix corresponding to the first feature map; m_{i,j} is the element in the ith row and jth column of the fifth discarding mask matrix; and b_{i,j} is the element in the ith row and jth column of the sixth discarding mask matrix.
And S40, determining the discarded feature map corresponding to the feature map to be processed according to the second discarded mask matrix and the first feature map.
Specifically, after the second discarding mask matrix and the first feature map are obtained, the second discarding mask matrix and the first feature map are multiplied element-wise (dot product) to obtain the discarded feature map corresponding to the feature map to be processed. The dot product multiplies each element of the second discarding mask matrix with every element of the corresponding channel, so that every element of a channel whose mask value is 0 is set to 0, thereby discarding the feature points of that channel. It is to be understood that, for a channel of the discarded feature map corresponding to an element value of 0 in the second discarding mask matrix, the value of every element of that channel is 0.
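The channel-wise dot product can be sketched as a broadcast multiplication along the channel axis (illustrative shapes):

```python
import numpy as np

rng = np.random.default_rng(4)
first_feature_map = rng.random((4, 4, 3))   # H x W x C
channel_mask = np.array([1., 0., 1.])       # second discarding mask: channel 1 is dropped
dropped_feature_map = first_feature_map * channel_mask  # zeroes every element of channel 1
```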
Further, in an implementation manner of this embodiment, the neural network regularization method based on feature space correlation may be implemented as a regularization module, and the regularization module may be placed after a convolutional layer to regularize the convolutional neural network during training, thereby improving the generalization performance of the model. In a specific implementation manner, the regularization module may include a first execution module, a second execution module, and a third execution module, where the first execution module is connected to the second execution module and the second execution module is connected to the third execution module. The first execution module is configured to down-sample the feature map to be processed to obtain the down-sampling feature map; the second execution module is configured to perform feature discarding based on spatial-domain feature correlation; and the third execution module is configured to perform feature discarding based on channel-domain feature correlation. It is to be understood that the first execution module includes a convolutional layer, and the feature map to be processed is down-sampled by the convolutional layer; the second execution module is configured to determine the first discarding mask matrix corresponding to the down-sampling feature map and perform feature discarding based on the first discarding mask matrix to obtain the first feature map; and the third execution module is configured to determine the second discarding mask matrix corresponding to the first feature map and perform feature discarding on the first feature map based on the second discarding mask matrix to obtain the discarded feature map.
Based on this, the present embodiment provides a neural network regularization method based on feature space correlation, and also provides a neural network model, where the neural network model includes at least one convolution module, the convolution module includes a convolutional layer and a regularization module, and the output term of the convolutional layer is the input term of the regularization module; the regularization module is configured to execute the neural network regularization method based on feature space correlation as described in this embodiment.
Further, the regularization module may be executed by a computer device. In the computer execution process, the regularization module may be constructed as follows: respectively constructing a first execution module, a second execution module, and a third execution module, where the first execution module is used for down-sampling the feature map to be processed to obtain the down-sampling feature map, the second execution module is used for feature discarding based on spatial-domain feature correlation, and the third execution module is used for feature discarding based on channel-domain feature correlation. After the three execution modules are constructed, they are connected according to a preset connection rule to form the regularization module, and the regularization module is added after any convolutional layer of an existing image classification network to construct an image classification CNN. In addition, after the image classification convolutional neural network is constructed, the constructed CNN classification network may be trained on a pre-constructed image data set comprising images of different categories and their corresponding labels, so that the constructed CNN classification network outputs correct image labels.
Of course, in practical applications, the regularization module may be a general module and may be embedded into an existing Convolutional Neural Network (CNN). For example, as shown in fig. 4, a standard residual block includes two convolutional layers, a ReLU activation layer, and a residual connection; the regularization module (corrprep module) can be embedded directly into the residual block. Therefore, the regularization method based on feature correlation can be embedded into any CNN network structure and has good universality; meanwhile, the regularization method effectively selects low-correlation features in the CNN feature map to discard, so as to achieve adaptive discarding, effectively regularize the CNN, and improve the generalization capability of the model.
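Putting the pieces together, the module's forward pass can be sketched end to end in NumPy. Down-sampling and up-sampling are omitted for brevity, and the masking helper, the negated softmax, and the clipping of the target proportion are all assumptions of this sketch rather than the patent's exact design.

```python
import numpy as np

def correlated_mask(scores, preset_p, rng):
    """One domain (spatial positions or channels): negated-softmax drop
    probabilities, a Bernoulli mask, a proportion-adjusted second mask,
    then drop only where both masks drop."""
    g = -scores
    gamma = np.exp(g - g.max())
    gamma /= gamma.sum()
    m = (rng.random(scores.size) < 1.0 - gamma).astype(float)
    p_m = 1.0 - m.mean()                                   # fraction dropped so far
    p_t = min(preset_p / p_m, 1.0) if p_m > 0 else 0.0     # adjusted target proportion
    b = (rng.random(scores.size) < 1.0 - p_t).astype(float)
    return np.maximum(m, b)                                # 0 only where both masks are 0

def correlation_drop(fmap, p=0.1, rng=None):
    """Spatial drop followed by channel drop on one H x W x C map."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, c = fmap.shape
    v = fmap.reshape(h * w, c)
    spatial_scores = (v @ v.T - np.eye(h * w)).mean(axis=1)
    fmap = fmap * correlated_mask(spatial_scores, p, rng).reshape(h, w, 1)
    v = fmap.reshape(h * w, c)
    channel_scores = (v.T @ v - np.eye(c)).mean(axis=1)
    return fmap * correlated_mask(channel_scores, p, rng).reshape(1, 1, c)

rng = np.random.default_rng(5)
out = correlation_drop(rng.random((6, 6, 4)), p=0.1, rng=rng)
```

In a real CNN this step would run only during training, with the feature map passed through unchanged at inference time, as with standard dropout.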
In addition, in order to show that the regularization method provided by the present application effectively regularizes a CNN and improves the generalization capability of the model, the present application compares the classification accuracy of ResNet20 on the CIFAR-10 and CIFAR-100 image classification data sets under different regularization methods; the results are shown in Table 1. As can be seen from Table 1, compared with the other methods, the present method allows the model to obtain higher test accuracy on the test set through feature-based adaptive discarding.
TABLE 1 Classification accuracy table of ResNet20 under different regular methods
(The contents of Table 1 are reproduced as images in the original publication.)
In summary, the present embodiment provides a neural network regularization method based on feature space correlation, where the method includes: acquiring a spatial correlation matrix of a feature map to be processed, and determining a first discarding mask matrix based on the spatial correlation matrix; determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed; determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector; and determining the discarded feature map corresponding to the feature map to be processed according to the second discarding mask matrix and the first feature map. The method performs feature discarding based on both spatial and channel feature correlation, so that during training feature regions are discarded with probabilities that depend on the strength of the feature-point correlation, more discriminative semantic information is retained in the network features, a good regularization effect is achieved for the CNN, over-fitting of the CNN is avoided, and the generalization of the CNN is improved. Meanwhile, the regularization method can serve as a general feature discarding module that realizes adaptive discarding of feature points and can be embedded into CNNs of different depths or structures, which widens its application range.
Based on the above neural network regularization method based on feature space correlation, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the neural network regularization method based on feature space correlation as described above.
Based on the above neural network regularization method based on feature space correlation, the present invention further provides a terminal device. As shown in fig. 5, the terminal device includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, and may further include a communication interface (Communications Interface) 23 and a bus 24. The processor 20, the display screen 21, the memory 22, and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may invoke logic instructions in the memory 22 to execute the method as described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory, for example, any of a variety of media that can store program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a transitory storage medium.
In addition, the specific processes loaded and executed by the instruction processors in the neural network, the storage medium, and the terminal device are described in detail in the method above and are not repeated herein.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for regularization of a neural network based on feature spatial correlation, the method comprising:
acquiring a spatial correlation matrix of a feature map to be processed, and determining a first discarding mask matrix corresponding to the feature map to be processed based on the spatial correlation matrix;
determining a first feature map corresponding to the feature map to be processed according to the first discarding mask matrix and the feature map to be processed;
determining a channel correlation vector corresponding to the first feature map, and determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector;
and determining the discarded characteristic diagram corresponding to the characteristic diagram to be processed according to the second discarding mask matrix and the first characteristic diagram.
2. The method according to claim 1, wherein the feature map to be processed is a feature map output by a convolution layer in the neural network.
3. The method according to claim 1, wherein the obtaining of the spatial correlation matrix of the feature map to be processed specifically includes:
down-sampling the feature map to be processed to obtain a down-sampling feature map;
converting the down-sampling feature map into a feature matrix, and carrying out normalization processing on the feature matrix to obtain a normalized feature matrix;
and determining a spatial correlation matrix corresponding to the normalized feature matrix according to the feature orthogonality.
4. The method according to claim 1, wherein the determining of the first discarding mask matrix corresponding to the down-sampled feature map based on the spatial correlation matrix specifically includes:
determining a first discarding probability vector corresponding to the down-sampled feature map according to the spatial correlation matrix, and determining a third discarding mask matrix corresponding to the down-sampled feature map according to the first discarding probability vector;
determining a first candidate discarding proportion corresponding to the third discarding mask matrix, and determining a first target discarding proportion corresponding to the down-sampled feature map according to a preset first discarding proportion and the first candidate discarding proportion;
determining a fourth discarding mask matrix corresponding to the down-sampled feature map according to the first target discarding proportion;
and determining the first discarding mask matrix corresponding to the feature map to be processed according to the third discarding mask matrix and the fourth discarding mask matrix.
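A hedged reading of claim 4: the mean correlation of each position yields a drop-probability vector; a Bernoulli sample gives the "third" mask, and a second uniform mask (the "fourth") is sized so that their combination under claim 5's AND rule lands near the preset proportion. The probability normalization and the ratio-correction rule below are assumptions for illustration, not the patent's formulas:

```python
import numpy as np

def sampled_masks(corr, preset_ratio=0.1, seed=0):
    """Claim 4 sketch: correlation-weighted Bernoulli mask + ratio-correcting mask."""
    rng = np.random.default_rng(seed)
    n = corr.shape[0]
    score = corr.mean(axis=1)                             # per-position correlation
    # first discarding probability vector, scaled toward the preset proportion
    p = np.clip(preset_ratio * n * score / (score.sum() + 1e-8), 0.0, 1.0)
    mask3 = (rng.random(n) >= p).astype(int)              # third discarding mask
    cand_ratio = 1.0 - mask3.mean()                       # first candidate proportion
    # first target proportion for the fourth mask, chosen so that the
    # AND-combined drop rate (claim 5) approaches the preset proportion
    target = min(preset_ratio / max(cand_ratio, 1e-8), 1.0)
    mask4 = (rng.random(n) >= target).astype(int)         # fourth discarding mask
    return mask3, mask4
```

Positions with high correlation (redundant features) receive higher drop probability, which is the stated motivation for correlation-based dropout.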
5. The method according to claim 4, wherein the determining of the first discarding mask matrix corresponding to the feature map to be processed according to the third discarding mask matrix and the fourth discarding mask matrix specifically includes:
selecting target elements from the third discarding mask matrix, wherein the element value of each target element and the element value of its corresponding candidate element are both 0, the candidate element occupying the same element position in the fourth discarding mask matrix as the target element occupies in the third discarding mask matrix;
setting the element values of all elements other than the target elements in the third discarding mask matrix to 1, so as to obtain a discarding mask matrix corresponding to the down-sampled feature map;
and up-sampling the discarding mask matrix corresponding to the down-sampled feature map to obtain the first discarding mask matrix corresponding to the feature map to be processed, wherein the image size corresponding to this discarding mask matrix is the same as the image size of the feature map to be processed.
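Claim 5's combination and up-sampling step can be sketched as follows, using nearest-neighbour up-sampling (the claim does not specify the interpolation; `scale` is assumed to match the earlier down-sampling factor):

```python
import numpy as np

def combine_and_upsample(mask3, mask4, scale=2):
    """Claim 5 sketch: drop (0) only where BOTH masks drop, then up-sample."""
    combined = np.where((mask3 == 0) & (mask4 == 0), 0, 1)
    # nearest-neighbour up-sampling back to the full feature-map grid
    return np.kron(combined, np.ones((scale, scale), dtype=combined.dtype))
```

Because an element stays dropped only when both masks drop it, the combined drop rate can never exceed that of either input mask.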
6. The method according to claim 1, wherein the determining, according to the first discarding mask matrix and the feature map to be processed, the first feature map corresponding to the feature map to be processed specifically includes:
and performing dot product multiplication on the first discarding mask matrix and the feature map to be processed to obtain a first feature map corresponding to the feature map to be processed.
7. The method according to claim 1, wherein the determining the channel correlation vector corresponding to the first feature map specifically includes:
converting the first feature map along the channel direction to obtain a feature matrix corresponding to the first feature map;
and determining a channel correlation vector corresponding to the feature matrix according to feature orthogonality.
8. The feature space correlation-based neural network regularization method according to claim 7, wherein the determining a second discarding mask matrix corresponding to the feature map to be processed based on the channel correlation vector specifically includes:
determining a second discarding probability vector corresponding to the first feature map according to the channel correlation vector, and determining a fifth discarding mask matrix corresponding to the first feature map according to the second discarding probability vector;
determining a second candidate discarding proportion corresponding to the fifth discarding mask matrix, and determining a second target discarding proportion corresponding to the first feature map according to a preset second discarding proportion and the second candidate discarding proportion;
determining a sixth discarding mask matrix corresponding to the first feature map according to the second target discarding proportion;
and determining a second discarding mask matrix corresponding to the feature map to be processed according to the fifth discarding mask matrix and the sixth discarding mask matrix.
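Claims 7-8 mirror the spatial stage along the channel axis. A combined sketch, again assuming cosine similarity for "feature orthogonality" and an assumed normalization from the channel correlation vector to drop probabilities (the two-mask ratio correction of claim 8 is collapsed into a single Bernoulli sample here for brevity):

```python
import numpy as np

def channel_mask(first_map, preset_ratio=0.1, seed=0):
    """Claims 7-8 sketch: channel correlation vector -> channel drop mask."""
    rng = np.random.default_rng(seed)
    C = first_map.shape[0]
    X = first_map.reshape(C, -1)                           # (C, H*W) feature matrix
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    corr_vec = (X @ X.T).mean(axis=1)                      # channel correlation vector
    # higher correlation -> more redundant channel -> higher drop probability
    p = np.clip(preset_ratio * C * corr_vec / (corr_vec.sum() + 1e-8), 0.0, 1.0)
    return (rng.random(C) >= p).astype(first_map.dtype).reshape(C, 1, 1)
```

The returned (C, 1, 1) mask broadcasts over the first feature map, zeroing out entire channels as in claim 1's final step.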
9. A neural network model, characterized in that the neural network model comprises at least one convolution module, the convolution module comprises a convolution layer and a regularization module, and the output of the convolution layer is the input of the regularization module; the regularization module is configured to perform the feature spatial correlation based neural network regularization method according to any one of claims 1 to 8.
10. A terminal device, characterized in that the terminal device is loaded with the neural network model according to claim 9.
CN202010632236.2A 2020-07-03 2020-07-03 Neural network regularization method based on characteristic space correlation Pending CN111950699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010632236.2A CN111950699A (en) 2020-07-03 2020-07-03 Neural network regularization method based on characteristic space correlation

Publications (1)

Publication Number Publication Date
CN111950699A true CN111950699A (en) 2020-11-17

Family

ID=73337386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010632236.2A Pending CN111950699A (en) 2020-07-03 2020-07-03 Neural network regularization method based on characteristic space correlation

Country Status (1)

Country Link
CN (1) CN111950699A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132169A1 (en) * 2007-11-19 2009-05-21 Schlumberger Technology Corporation Methods and systems for evaluating fluid movement related reservoir properties via correlation of low-frequency part of seismic data with borehole measurements
CN110163302A (en) * 2019-06-02 2019-08-23 东北石油大学 Indicator card recognition methods based on regularization attention convolutional neural networks
CN110796177A (en) * 2019-10-10 2020-02-14 温州大学 Method for effectively reducing neural network overfitting in image classification task

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUYUAN ZENG ET AL.: "Corrdrop: Correlation Based Dropout for Convolutional Neural Networks", ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3473-3475 *
YUYUAN ZENG ET AL.: "Correlation-based structural dropout for convolutional neural networks", Pattern Recognition, pages 1-11 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465722A (en) * 2020-12-04 2021-03-09 武汉大学 Abnormal phase image restoration method
WO2022171002A1 (en) * 2021-02-10 2022-08-18 北京灵汐科技有限公司 Task processing method and apparatus, many-core system, and computer-readable medium
WO2022205890A1 (en) * 2021-03-30 2022-10-06 中国电信股份有限公司 Method, apparatus, and system for transmitting image features
CN113255768A (en) * 2021-05-26 2021-08-13 之江实验室 Method for improving robustness of convolutional neural network
CN115187950A (en) * 2022-09-13 2022-10-14 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning training data enhancement
CN115187950B (en) * 2022-09-13 2022-11-22 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning image data enhancement

Similar Documents

Publication Publication Date Title
CN111950699A (en) Neural network regularization method based on characteristic space correlation
CN110717851B (en) Image processing method and device, training method of neural network and storage medium
CN109685819B (en) Three-dimensional medical image segmentation method based on feature enhancement
EP3923233A1 (en) Image denoising method and apparatus
CN112613581B (en) Image recognition method, system, computer equipment and storage medium
CN112446383B (en) License plate recognition method and device, storage medium and terminal
CN111368937B (en) Image classification method and device, training method and device, equipment and medium
Mungra et al. PRATIT: a CNN-based emotion recognition system using histogram equalization and data augmentation
Vishwakarma et al. A novel non-linear modifier for adaptive illumination normalization for robust face recognition
CN107239733A (en) Continuous hand-written character recognizing method and system
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
Zuo et al. Convolutional neural networks for image denoising and restoration
CN111914997B (en) Method for training neural network, image processing method and device
CN113011567B (en) Training method and device of convolutional neural network model
US20210049447A1 (en) Neural network processing method and evaluation method, and data analysis method and device
WO2023065759A1 (en) Video action recognition method based on spatial-temporal enhanced network
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN111666937A (en) Method and system for recognizing text in image
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN111401294A (en) Multitask face attribute classification method and system based on self-adaptive feature fusion
Mustafa et al. New algorithm based on deep learning for number recognition
Zhang et al. A simple and effective static gesture recognition method based on attention mechanism
CN116912604A (en) Model training method, image recognition device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination