CN111967574B - Tensor singular value delimitation-based convolutional neural network training method - Google Patents

Tensor singular value delimitation-based convolutional neural network training method

Info

Publication number
CN111967574B
CN111967574B (application CN202010700940.7A)
Authority
CN
China
Prior art keywords
weight
tensor
singular value
layer
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010700940.7A
Other languages
Chinese (zh)
Other versions
CN111967574A (en)
Inventor
郭锴凌
陈琦
徐向民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010700940.7A priority Critical patent/CN111967574B/en
Publication of CN111967574A publication Critical patent/CN111967574A/en
Application granted granted Critical
Publication of CN111967574B publication Critical patent/CN111967574B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a convolutional neural network training method based on tensor singular value delimitation, which comprises the following steps: S1, initializing the weights of a convolutional neural network so that the weight matrices of the fully connected layers and convolution layers are orthogonal and the singular values of the convolution layer weight tensors are equal; S2, training the convolutional neural network by stochastic gradient descent or a variant thereof; and S3, after several training iterations, alternately performing matrix singular value delimitation updates and tensor singular value delimitation updates on the convolution layer weights, performing matrix singular value delimitation updates on the fully connected layer weights, and performing delimitation updates on the weights of the batch normalization layers. The invention proposes adding an orthogonality constraint to the weight tensor, which maintains the network energy without destroying the tensor structure of the weights. For this orthogonal tensor constraint, the invention thresholds the singular values of the weight tensor, realizing the training of an orthogonal tensor network and improving the performance of image classification networks.

Description

Tensor singular value delimitation-based convolutional neural network training method
Technical Field
The invention belongs to the field of artificial intelligence, relates to machine learning and deep learning, and in particular relates to a convolutional neural network training method based on tensor singular value delimitation.
Background
Deep convolutional neural networks have achieved great success in many applications, such as image classification and object detection. Their success stems primarily from their strong expressive power in representing complex relationships from input to output. This strong expressive power, however, also increases the risk of overfitting. To mitigate overfitting, researchers have proposed many techniques, such as weight decay, dropout, and label perturbation. The deep cascaded structure of convolutional neural networks also brings problems such as vanishing/exploding gradients and proliferating saddle points, making training difficult. To address these problems, parameter initialization, shortcut connections, and batch normalization (BN) have been proposed to simplify the optimization of convolutional neural networks.
Orthogonality has also been used to address the overfitting and optimization problems of deep convolutional neural networks. It has been shown theoretically that when the singular values of the weight matrix are equal, a convolutional neural network can achieve the optimal generalization error, reducing the risk of overfitting. Orthogonality also limits the magnitude of the gradient and stabilizes the distribution of each layer's activation outputs, making optimization more efficient. Many convolutional neural network methods that exploit orthogonality constraints have been proposed. Soft orthogonal regularization constrains the Gram matrix of the weight matrix to approach the identity matrix in the F-norm. By analyzing the restricted isometry property, the F-norm can be replaced by the spectral norm, yielding improved performance. Since orthogonal matrices lie on the Stiefel manifold, projected gradient descent is another class of methods for deep learning optimization under strict orthogonality constraints. Linear module methods add to the network architecture a module that converts an ordinary weight matrix into an orthogonal matrix and can be optimized by ordinary gradient descent. By relaxing the hard orthogonality constraint, the singular value bounding method rapidly enforces approximate orthogonality by limiting all singular values of each weight matrix to a threshold range around 1 after each training epoch (i.e., the number of iterations required to traverse the entire training data once).
Orthogonality constraints have been applied successfully to convolutional neural network training. For example, adding an orthogonality constraint can improve convergence stability during training; an orthogonal regularization loss function can be used in the training process; singular value delimitation can be performed after the weights are unfolded into a two-dimensional matrix; and unfolding the convolutional neural network weights into a two-dimensional matrix and multiplying by the pseudo-inverse of that matrix forms a new orthogonality constraint on the convolution kernel weights, which can improve the recognition accuracy of the network (Huang L, Liu X, Lang B, et al. Orthogonal weight normalization: Solution to optimization over multiple dependent Stiefel manifolds in deep neural networks [C]// Thirty-Second AAAI Conference on Artificial Intelligence. 2018.). However, when the constraint is imposed, the weight tensor of the convolution layer is unfolded into a matrix, which destroys the structural characteristics of the tensor. The tensor-tensor product is a newly defined tensor operation from which a series of matrix-analogue properties can be derived; it has received attention in the machine learning field in recent years, with many successful applications in preserving tensor structural properties. Robust principal component analysis has been generalized to tensors by combining tensor singular value decomposition (tensor Singular Value Decomposition), with good results in image and video processing. The tensor singular value decomposition derived from the tensor-tensor product generalizes the orthogonal matrix constraint to tensors and enables a convolutional neural network training method based on tensor structure constraints.
Disclosure of Invention
The invention provides a convolutional neural network training method based on tensor singular value delimitation, which adds matrix and tensor structural constraints to a convolutional neural network so as to improve its performance.
In the design of the objective function constraints, the constraint that the tensor singular values are equal is added on top of the weight matrix orthogonality constraint, theoretically guaranteeing that the weight tensor solved by the network is an orthogonal tensor or the product of an orthogonal tensor and a constant. In the training optimization process, after several optimization iterations, the matrix singular values and tensor singular values of the weights are each limited to a certain threshold range, so that the solved weights approximately satisfy the constraints of the objective function.
The invention is realized at least by one of the following technical schemes.
A convolutional neural network training method based on tensor singular value delimitation comprises the following steps:
s1, initializing the weights of a convolutional neural network so that the weight matrices of the fully connected layers and convolution layers are orthogonal and the singular values of the convolution layer weight tensors are equal;
s2, training the initialized convolutional neural network by stochastic gradient descent (SGD) or a variant thereof (including SGD with momentum, SGD with Nesterov momentum, AdaGrad, Adadelta, RMSprop, and Adam);
and S3, after several training iterations, alternately performing matrix singular value delimitation updates and tensor singular value delimitation updates on the convolution layer weights of the convolutional neural network, performing matrix singular value delimitation updates on the fully connected layer weights, and performing delimitation updates on the weights of the batch normalization (Batch Normalization, BN) layers. If the loss function has converged, training ends; if the loss function has not converged, return to step S2.
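As an illustrative sketch of this S2/S3 alternation with the convergence test (names such as `train_one_epoch` and `bound_weights` are hypothetical stand-ins for one SGD epoch and the delimitation updates; this is not the patent's implementation):

```python
def train_with_bounding(weights, train_one_epoch, bound_weights,
                        max_epochs=100, tol=1e-4):
    """Alternate SGD training (S2) with singular value delimitation (S3).

    `train_one_epoch` runs one pass of SGD (or a variant) and returns the
    loss; `bound_weights` applies the matrix/tensor singular value
    delimitation updates and returns the new weights.
    """
    prev_loss = float("inf")
    for _ in range(max_epochs):
        loss = train_one_epoch(weights)   # S2: one training epoch
        weights = bound_weights(weights)  # S3: threshold the singular values
        if abs(prev_loss - loss) < tol:   # loss converged -> stop
            break
        prev_loss = loss
    return weights
```

In an actual run the two callables would wrap a deep learning framework's optimizer step and the delimitation routines described below.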
Further, step S1 imposes the constraints of matrix orthogonality and equal tensor singular values on the initialization of the convolutional neural network weights.
Further, in step S3, after the fully connected layer weights of the convolutional neural network are randomly initialized, all singular values of the fully connected layer weight matrix are limited to 1, so that the fully connected layer weight matrix is orthogonal.
Further, after randomly initializing the convolution layer weights of the convolution neural network in step S3, the following operations are alternately performed until convergence:
1) Limiting all singular values of the convolution layer weight matrix to 1 so that the convolution layer weight matrix is orthogonal;
2) All singular values of the convolution layer weight tensor are limited to be equal while keeping the Frobenius norm (hereinafter F-norm) unchanged, so that the convolution layer weight tensor is an orthogonal tensor or the product of an orthogonal tensor and a constant.
Further, in step S2, after several training iterations of the initialized convolutional neural network by stochastic gradient descent or a variant thereof, the weights of the convolutional neural network are updated by threshold delimitation.
Further, step S3 of matrix singular value delimiting the weight matrices of the convolutional layer and the fully-connected layer includes the following steps:
a) Performing matrix singular value decomposition on the weight matrix;
b) Threshold constraint is carried out on each singular value of the weight matrix, so that each singular value is in the vicinity of 1;
c) And reconstructing a weight matrix according to the updated singular values.
Further, the tensor singular value delimiting of step S3 comprises the steps of:
(1) keeping the F-norm of the tensor equal to the F-norm of the corresponding orthogonal matrix, and calculating expected singular values when all tensor singular values are equal;
(2) performing tensor singular value decomposition on the weight tensor;
(3) threshold constraint is carried out on each singular value of the weight tensor, so that each singular value is near the calculated expected singular value;
(4) and reconstructing a weight tensor according to the updated singular value.
Further, step S3 of thresholding the weights of the BN layer comprises the steps of:
(I) calculating the mean of the quotients of each neuron weight and the corresponding input standard deviation;
(II) limiting the quotient of each neuron weight and its input standard deviation to lie near the corresponding mean, obtaining the new BN layer weights.
Compared with the prior art, the invention has the beneficial effects that:
constraint is carried out on the weight tensor of the convolutional neural network, and structural information of the weight tensor is reserved on a model structure; compared with a method for performing matrix orthogonal constraint on a weight matrix of a convolutional neural network in the aspect of optimizing performance, the method reduces the solving space of the network weight and simplifies optimization. The invention effectively improves the performance of the convolutional neural network.
Drawings
FIG. 1 is a flowchart of a training process of a convolutional neural network training method based on tensor singular value delimitation in the present embodiment;
FIG. 2 is a schematic diagram of singular value delimitation of the matrix of the present embodiment;
fig. 3 is a schematic diagram of tensor singular value delimitation in this embodiment.
Detailed Description
The present invention will be described in further detail by way of the following specific embodiments, but the embodiments of the present invention are not limited thereto.
The principle of the invention comprises: on the basis of orthogonal constraint on the weight matrixes of the convolution layer and the full connection layer of the convolution neural network, the singular values of the weight tensors of the convolution layer are further constrained to be equal, so that the structural characteristics of the weight tensors are reserved. The singular values of the matrix and the tensor are subjected to threshold limiting to approximately meet constraint conditions, so that a convolutional neural network training method is obtained, and network performance is improved.
As shown in fig. 1, a convolutional neural network training method based on tensor singular value delimitation includes the following steps:
s1, initializing the weight of a convolutional neural network, so that weight matrixes of a full-connection layer and a convolutional layer are orthogonal, and singular values of weight tensors of the convolutional layer are equal;
specifically, the weights of the convolution layer and the full connection layer are initialized to random values, and then all singular values of the weight matrixes of the convolution layer and the full connection layer are set to be 1, so that the weight matrixes are orthogonal matrixes. For the convolution layer, on the premise of keeping F-norm unchanged, all singular values of the weight tensor of the convolution layer are further set to be equal, so that the weight tensor of the convolution layer is orthogonal tensor or the product of the orthogonal tensor and a constant, and the weight matrix of the convolution layer and the singular values of the weight tensor are alternately set until convergence.
S2, train the initialized convolutional neural network by stochastic gradient descent or a variant thereof (including SGD with momentum, SGD with Nesterov momentum, AdaGrad, Adadelta, RMSprop, and Adam). After each training epoch, update the weights according to step S3.
And S3, updating the weight of the convolution layer by utilizing matrix singular value delimitation and tensor singular value delimitation.
For convenience of description, the notation is introduced first. For any convolution layer, the convolution weight tensor is 𝒲 ∈ R^(K×C×d²), a three-dimensional real tensor whose three dimensions have sizes K, C and d², where R denotes the real numbers, C is the number of input channels, K is the number of convolution kernels, and d is the convolution kernel size. Unfolding the convolution weight tensor 𝒲 gives the corresponding weight matrix W ∈ R^(K×Cd²), where K and Cd² are the sizes of the two dimensions of the matrix. For the fully connected layer, the weight is itself a matrix. For uniformity of representation, the weight matrices of the convolution layers and the fully connected layers are all written as W ∈ R^(K×m); when W represents an unfolded convolution weight matrix, m = Cd².
The step S3 is specifically as follows:
(1) Updating the convolution layer weights by matrix singular value delimitation, as shown in fig. 2, comprising the following steps:
① Perform singular value decomposition (Singular Value Decomposition, hereinafter SVD) on the convolution layer weight matrix W to obtain W = UΣVᵀ, where U is a K×m unitary matrix, Σ is an m×m non-negative real diagonal matrix whose diagonal elements are the singular values of W, V is an m×m unitary matrix, and Vᵀ denotes the transpose of V.
② Threshold each diagonal element of the singular value matrix Σ as follows:
if σᵢ > 1+ε₁, set σᵢ = 1+ε₁;
if σᵢ < 1/(1+ε₁), set σᵢ = 1/(1+ε₁);
if 1/(1+ε₁) ≤ σᵢ ≤ 1+ε₁, σᵢ remains unchanged;
where σᵢ denotes the i-th diagonal element of Σ and ε₁ is a small value in the range 0.1–0.5; each singular value σᵢ is thus delimited and constrained to lie near 1.
③ Compute UΣVᵀ from the new Σ to obtain the new convolution layer weight matrix W.
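The steps above can be sketched in numpy (an illustrative sketch; `eps` plays the role of ε1):

```python
import numpy as np

def bound_matrix_singular_values(w, eps=0.2):
    """Clip every singular value of w into [1/(1+eps), 1+eps] and rebuild w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    s = np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)  # threshold around 1
    return u @ np.diag(s) @ vt
```

The same routine serves the fully connected layer update of step S4, since both operate on the unfolded weight matrix.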
(2) Updating the convolution layer weights according to the tensor singular values. The convolution weight tensor 𝒲 is decomposed by tensor singular value decomposition (Tensor Singular Value Decomposition, t-SVD) as 𝒲 = 𝒰 * 𝒮 * 𝒱ᵀ, and the tensor singular values of 𝒲 are delimited, as shown in fig. 3. Here 𝒰 and 𝒱 are orthogonal tensors of sizes K×K×d² and C×C×d² respectively; 𝒮 is an f-diagonal tensor of size K×C×d² (i.e., every frontal slice of the tensor is a diagonal matrix), and the diagonal elements of the first frontal slice of 𝒮 are the tensor singular values of 𝒲; * denotes the tensor-tensor product. In actual operation the tensor singular value decomposition is obtained from the Fourier transform and matrix singular value decomposition, in three stages: first, apply the fast Fourier transform to the convolution weight tensor 𝒲 along the convolution-kernel dimension, written Ŵ = fft(𝒲, [ ], 3); second, perform a matrix SVD of each frontal slice, Ŵ^(i) = U^(i)S^(i)(V^(i))ᵀ, where Ŵ^(i) denotes the matrix of the i-th frontal slice of Ŵ, U^(i) and V^(i) are unitary matrices, S^(i) is a non-negative real diagonal matrix whose diagonal elements are the singular values of Ŵ^(i), and (V^(i))ᵀ denotes the transpose of V^(i); finally, apply the inverse fast Fourier transform along the convolution-kernel dimension to the tensors formed by these frontal slices, obtaining 𝒰, 𝒮 and 𝒱. By the properties of the inverse Fourier transform, each element of the first frontal slice of 𝒮 equals the mean of the corresponding elements of all the S^(i); therefore, delimiting the tensor singular values of 𝒲 can be realized by delimiting the singular values of each Ŵ^(i), comprising the following steps:
① Apply the fast Fourier transform to the convolution weight tensor 𝒲 along the convolution-kernel dimension to obtain Ŵ = fft(𝒲, [ ], 3).
② Compute the expected tensor singular value τ, i.e., the common value the tensor singular values take when the F-norm of 𝒲 equals the F-norm of the corresponding orthogonal matrix.
③ For each i from 1 to d², perform the following operations:
perform SVD on the matrix of the i-th frontal slice of Ŵ to obtain Ŵ^(i) = U^(i)S^(i)(V^(i))ᵀ;
threshold each diagonal element σ_j^(i) of S^(i) as follows:
if σ_j^(i) > (1+ε₂)τ, set σ_j^(i) = (1+ε₂)τ;
if σ_j^(i) < τ/(1+ε₂), set σ_j^(i) = τ/(1+ε₂);
if τ/(1+ε₂) ≤ σ_j^(i) ≤ (1+ε₂)τ, σ_j^(i) remains unchanged;
where σ_j^(i) denotes the j-th diagonal element of S^(i) and ε₂ is a small value in the range 0.1–0.5, so that each singular value is delimited and constrained to lie near the expected value τ;
compute the matrix product U^(i)S^(i)(V^(i))ᵀ from the new S^(i) to obtain the new Ŵ^(i).
④ Apply the inverse fast Fourier transform to the new Ŵ along the convolution-kernel dimension to obtain the new convolution weight tensor 𝒲.
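The tensor delimitation above can be sketched in numpy (an illustrative sketch: the expected singular value `target` is assumed to be given, and `eps` plays the role of ε2; the FFT/per-slice-SVD route mirrors the description above):

```python
import numpy as np

def bound_tensor_singular_values(w, target, eps=0.2):
    """FFT along the kernel axis, clip each frontal slice's singular values
    toward `target`, then transform back; w has shape (K, C, d*d)."""
    w_hat = np.fft.fft(w, axis=2)               # FFT along the kernel dimension
    lo, hi = target / (1.0 + eps), target * (1.0 + eps)
    for i in range(w_hat.shape[2]):             # slice-wise SVD and clipping
        u, s, vt = np.linalg.svd(w_hat[:, :, i], full_matrices=False)
        w_hat[:, :, i] = u @ np.diag(np.clip(s, lo, hi)) @ vt
    return np.real(np.fft.ifft(w_hat, axis=2))  # inverse FFT, back to real
```

Clipping is a spectral function applied identically to conjugate slices, so the conjugate symmetry of the FFT of a real tensor is preserved and the imaginary residue discarded by `np.real` is only numerical noise.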
(3) Steps (1) and (2) are performed alternately for several iterations. This embodiment suggests performing 1.5 iterations (i.e., step (1), then step (2), then step (1) again), which strikes a good balance between computation time and final effect; practical applications are not limited to 1.5 iterations.
S4, updating the weights of the fully connected layer by matrix singular value delimitation, comprising the following steps:
(1) Perform SVD on the weight matrix W of the fully connected layer to obtain W = UΣVᵀ.
(2) Threshold each diagonal element σᵢ of Σ as follows:
if σᵢ > 1+ε, set σᵢ = 1+ε;
if σᵢ < 1/(1+ε), set σᵢ = 1/(1+ε);
if 1/(1+ε) ≤ σᵢ ≤ 1+ε, σᵢ remains unchanged.
(3) Compute UΣVᵀ from the new Σ to obtain the new W.
S5, carrying out delimitation updating on the weights of the BN layer. Suppose the input of the BN layer is h ∈ Rⁿ (i.e., h is a real vector of dimension n, where R denotes the real numbers); n is the number of channels of the BN layer and is numerically equal to the number of convolution kernels of the convolution layer it follows, i.e., n = K. The operation of the BN layer can then be expressed as:
BN(h) = ΥΦ(h − μ) + β,
where μ is the batch mean of the neuron inputs; Φ is a diagonal matrix whose diagonal elements are the reciprocals 1/φᵢ of the batch standard deviations φᵢ of the neuron inputs; Υ is a diagonal matrix whose diagonal elements are the learnable BN layer weights υᵢ; and β is a learnable BN layer bias term. The threshold delimitation update of the BN layer comprises the following steps:
(1) Compute the mean of the quotients of the neuron weights and the input standard deviations, α = (1/n) Σᵢ υᵢ/φᵢ.
(2) Threshold each quotient υᵢ/φᵢ as follows:
if υᵢ/φᵢ > (1+ε₃)α, set υᵢ = (1+ε₃)αφᵢ;
if υᵢ/φᵢ < α/(1+ε₃), set υᵢ = αφᵢ/(1+ε₃);
if α/(1+ε₃) ≤ υᵢ/φᵢ ≤ (1+ε₃)α, υᵢ remains unchanged;
where υᵢ is the weight of the i-th neuron of the BN layer, φᵢ is the batch standard deviation of the i-th neuron input, and ε₃ is a small value in the range 0.1–0.5, so that each BN layer weight υᵢ is delimited and the quotient υᵢ/φᵢ is constrained to lie near α.
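The two BN steps above can be sketched in numpy (illustrative; `gamma` stands for the BN weights υᵢ, `sigma` for the batch standard deviations φᵢ, and positive ratios are assumed, as is typical for BN weights):

```python
import numpy as np

def bound_bn_weights(gamma, sigma, eps=0.2):
    """Constrain each ratio gamma_i / sigma_i to lie near their mean alpha."""
    ratio = gamma / sigma
    alpha = ratio.mean()                                              # step (1)
    ratio = np.clip(ratio, alpha / (1.0 + eps), alpha * (1.0 + eps))  # step (2)
    return ratio * sigma  # recover the new BN weights
```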
and S6, repeatedly executing the steps S3 to S5 until the convolutional neural network training converges.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and is included in the protection scope of the present invention.

Claims (3)

1. The convolutional neural network training method based on tensor singular value delimitation is characterized by comprising the following steps of:
s1, initializing the weights of a convolutional neural network so that the weight matrices of the fully connected layers and convolution layers are orthogonal and the singular values of the convolution layer weight tensors are equal;
s2, training the initialized convolutional neural network by stochastic gradient descent or a variant thereof; after several training iterations of the initialized convolutional neural network with stochastic gradient descent or a variant thereof, updating the weights of the convolutional neural network by threshold delimitation;
s3, after several training iterations, alternately performing matrix singular value delimitation updates and tensor singular value delimitation updates on the convolution layer weights of the convolutional neural network, performing matrix singular value delimitation updates on the fully connected layer weights of the convolutional neural network, and performing delimitation updates on the weights of the batch normalization layer (BN layer); if the loss function has converged, training ends; if the loss function has not converged, returning to step S2;
the matrix singular value delimitation of the weight matrix of the convolution layer and the full connection layer comprises the following steps:
a) Performing matrix singular value decomposition on the weight matrix;
b) Threshold constraint is carried out on each singular value of the weight matrix, so that each singular value is in the vicinity of 1;
c) Reconstructing the weight matrix according to the updated singular values, and generalizing robust principal component analysis to tensors by combining tensor singular value decomposition, for application to image and video processing;
after randomly initializing the convolution layer weight of the convolution neural network, the following operations are alternately performed until convergence:
1) Limiting all singular values of the convolution layer weight matrix to 1 so that the convolution layer weight matrix is orthogonal;
2) All singular values of the convolution layer weight tensor are limited to be equal while keeping the Frobenius norm (F-norm) unchanged, so that the convolution layer weight tensor is an orthogonal tensor or the product of an orthogonal tensor and a constant;
tensor singular value delimitation comprises the steps of:
(1) keeping the F-norm of the tensor equal to the F-norm of the corresponding orthogonal matrix, and calculating expected singular values when all tensor singular values are equal;
(2) performing tensor singular value decomposition on the weight tensor;
(3) threshold constraint is carried out on each singular value of the weight tensor, so that each singular value is near the calculated expected singular value;
(4) reconstructing a weight tensor according to the updated singular value;
threshold delimiting the weights of the BN layer comprises the steps of:
(I) calculating the mean of the quotients of each neuron weight and the corresponding input standard deviation;
(II) limiting the quotient of each neuron weight and its input standard deviation to lie near the corresponding mean, obtaining the new BN layer weights.
2. The method according to claim 1, characterized in that step S1 performs a constraint of matrix orthogonality and tensor singular value equality on the initialization of the convolutional neural network weights.
3. The method according to claim 2, wherein after the fully connected layer weights of the convolutional neural network are randomly initialized in step S3, all singular values of the fully connected layer weight matrix are limited to 1 so that the fully connected layer weight matrix is orthogonal.
CN202010700940.7A 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method Active CN111967574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010700940.7A CN111967574B (en) 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010700940.7A CN111967574B (en) 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method

Publications (2)

Publication Number Publication Date
CN111967574A (en) 2020-11-20
CN111967574B (en) 2024-01-23

Family

ID=73362075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010700940.7A Active CN111967574B (en) 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method

Country Status (1)

Country Link
CN (1) CN111967574B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537492B (en) * 2021-07-19 2024-04-26 第六镜科技(成都)有限公司 Model training and data processing method, device, equipment, medium and product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092960A (en) * 2017-04-17 2017-08-25 中国民航大学 A kind of improved parallel channel convolutional neural networks training method
CN107680044A (en) * 2017-09-30 2018-02-09 福建帝视信息科技有限公司 A kind of image super-resolution convolutional neural networks speed-up computation method
CN107967516A (en) * 2017-10-12 2018-04-27 中科视拓(北京)科技有限公司 A kind of acceleration of neutral net based on trace norm constraint and compression method
CN108537252A (en) * 2018-03-21 2018-09-14 温州大学苍南研究院 A kind of image noise elimination method based on new norm
CN108649926A (en) * 2018-05-11 2018-10-12 电子科技大学 DAS data de-noising methods based on wavelet basis tensor rarefaction representation
CN109214441A (en) * 2018-08-23 2019-01-15 桂林电子科技大学 A kind of fine granularity model recognition system and method


Also Published As

Publication number Publication date
CN111967574A (en) 2020-11-20

Similar Documents

Publication Publication Date Title
Cheng et al. Model compression and acceleration for deep neural networks: The principles, progress, and challenges
CN112116017B (en) Image data dimension reduction method based on kernel preservation
Zeng et al. A GA-based feature selection and parameter optimization for support tucker machine
Wang et al. Autoencoder, low rank approximation and pseudoinverse learning algorithm
CN112215292B (en) Image countermeasure sample generation device and method based on mobility
Che et al. The computation of low multilinear rank approximations of tensors via power scheme and random projection
Wu et al. Improved expressivity through dendritic neural networks
CN110717519A (en) Training, feature extraction and classification method, device and storage medium
CN111967574B (en) Tensor singular value delimitation-based convolutional neural network training method
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism
Bilski et al. Towards a very fast feedforward multilayer neural networks training algorithm
Shamsabadi et al. A new algorithm for training sparse autoencoders
CN112949610A (en) Improved Elman neural network prediction method based on noise reduction algorithm
CN111178897B (en) Cost-sensitive dynamic clustering method for fast feature learning on unbalanced data
CN108765137A (en) A kind of credit demand prediction technique and system, storage medium
Ben et al. An adaptive neural networks formulation for the two-dimensional principal component analysis
Berradi Symmetric power activation functions for deep neural networks
CN113408610B (en) Image identification method based on adaptive matrix iteration extreme learning machine
Hasan et al. Compressed neural architecture utilizing dimensionality reduction and quantization
CN112988548A (en) Improved Elman neural network prediction method based on noise reduction algorithm
Sun et al. Deep non-parallel hyperplane support vector machine for classification
CN112765148A (en) Network intrusion detection method based on improved SVM multi-classification
Huang et al. Algorithm of image classification based on Atrous-CNN
Ju et al. Tensorizing restricted Boltzmann machine
Chen et al. Matrix product operator restricted Boltzmann machines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant