CN111967574A - Convolutional neural network training method based on tensor singular value delimitation - Google Patents

Info

Publication number
CN111967574A
Authority
CN
China
Prior art keywords
tensor
weight
singular value
layer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010700940.7A
Other languages
Chinese (zh)
Other versions
CN111967574B (en
Inventor
郭锴凌
陈琦
徐向民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010700940.7A priority Critical patent/CN111967574B/en
Publication of CN111967574A publication Critical patent/CN111967574A/en
Application granted granted Critical
Publication of CN111967574B publication Critical patent/CN111967574B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network training method based on tensor singular value delimitation, comprising the following steps: S1, initializing the weights of the convolutional neural network so that the weight matrices of the fully-connected layers and the convolutional layers are orthogonal and the singular values of the weight tensors of the convolutional layers are equal; S2, training the convolutional neural network with stochastic gradient descent or a variant thereof; and S3, after each training iteration, alternately applying matrix singular value delimitation and tensor singular value delimitation to the convolutional layer weights, applying matrix singular value delimitation to the fully-connected layer weights, and applying delimitation to the batch normalization layer weights. The invention adds an orthogonal constraint to the weight tensor, which preserves the network energy without destroying the tensor structure of the weights. To enforce the orthogonal tensor constraint, the invention bounds the singular values of the weight tensor within a threshold range, thereby realizing the training of an orthogonal tensor network and improving the performance of image classification networks.

Description

Convolutional neural network training method based on tensor singular value delimitation
Technical Field
The invention belongs to the field of artificial intelligence, relates to machine learning and deep learning, and particularly relates to a convolutional neural network training method based on tensor singular value delimitation.
Background
Deep convolutional neural networks have achieved great success in many applications, such as image classification and object detection. This success is primarily due to their strong expressive power, which lets them represent complex relationships between input and output. However, strong expressive power also increases the risk of overfitting. To mitigate overfitting, researchers have introduced many techniques, such as weight decay, dropout, and label perturbation. The cascaded deep hierarchical structure of convolutional neural networks also brings problems such as vanishing/exploding gradients and the proliferation of saddle points, which make training difficult. To address these problems, methods such as parameter initialization, shortcut connections, and batch normalization (BN) have been proposed to simplify the optimization of convolutional neural networks.
Orthogonality is also used to address the overfitting and optimization problems of deep convolutional neural networks. It has been proved theoretically that a convolutional neural network achieves the optimal generalization error when the singular values of its weight matrices are equal, which reduces the risk of overfitting. Orthogonality also bounds the magnitude of the gradients and stabilizes the distribution of each layer's activations, making optimization more efficient. Many methods that constrain convolutional neural networks with orthogonality have been proposed. Soft orthogonality regularization constrains the Gram matrix of the weight matrix to be close to the identity matrix under the F-norm. By analyzing the restricted isometry property, the F-norm has been replaced with the spectral norm, which improves performance. Since orthogonal matrices lie on the Stiefel manifold, projected gradient descent has also been used to solve deep learning optimization problems with strict orthogonality constraints. In terms of network structure design, a linear module has been proposed that includes a component converting an ordinary weight matrix into an orthogonal matrix, and it can be optimized by ordinary gradient descent. By relaxing the hard orthogonality constraint, the singular value bounding method limits all singular values of each weight matrix to a threshold range around 1 after each training epoch (i.e., the number of iterations required to traverse all the training data once), which provides a fast approximate solution to the orthogonality constraint.
Orthogonal constraints have been applied successfully to convolutional neural network training. For example, adding an orthogonal constraint can increase the stability of convergence during training; an orthogonal regularization loss function can be used in the training process; singular value delimitation can be performed after the weights are unfolded into a two-dimensional matrix; and the weights can be unfolded into a two-dimensional matrix and multiplied by the pseudo-inverse of that matrix to form a new orthogonal constraint on the convolution kernel weights. However, these methods unfold the weight tensor of the convolutional layer into a matrix when applying the constraint, which destroys the structural characteristics of the tensor. The tensor-tensor product is a recently defined tensor operation from which a series of properties analogous to those of matrices can be derived; it has attracted attention in the machine learning field in recent years and has been applied successfully to preserving the structural characteristics of tensors. Robust principal component analysis has been extended to tensors by means of the tensor singular value decomposition (t-SVD), with good results in image and video processing. Based on the tensor singular value decomposition derived from the tensor-tensor product, the present invention extends the orthogonal matrix constraint to tensors and realizes a convolutional neural network training method based on a tensor structure constraint.
Disclosure of Invention
The invention provides a convolutional neural network training method based on tensor singular value delimitation. Its aim is to add structural constraints on both matrices and tensors to the convolutional neural network, incorporating orthogonal constraints while preserving the tensor structure of the network weights, and thereby to improve the performance of the convolutional neural network stably.
In designing the constraints of the objective function, the invention adds the constraint that the tensor singular values are equal on top of the constraint that the weight matrices are orthogonal, which theoretically guarantees that the weight tensor obtained by solving the network is an orthogonal tensor or the product of an orthogonal tensor and a constant. During training, after every several optimization iterations, the matrix singular values and the tensor singular values of the weights are each limited to a certain threshold range, so that the solved weights approximately satisfy the constraints of the objective function.
The invention is realized by at least one of the following technical schemes.
A convolutional neural network training method based on tensor singular value delimitation comprises the following steps:
S1, initializing the weights of the convolutional neural network so that the weight matrices of the fully-connected layers and the convolutional layers are orthogonal and the singular values of the weight tensors of the convolutional layers are equal;
S2, training the initialized convolutional neural network with stochastic gradient descent (SGD) or a variant thereof (including SGD with Momentum, SGD with Nesterov Momentum, AdaGrad, Adadelta, RMSprop and Adam);
S3, after each training iteration, alternately applying matrix singular value delimitation and tensor singular value delimitation to the convolutional layer weights of the convolutional neural network, applying matrix singular value delimitation to the fully-connected layer weights, and applying delimitation to the weights of the batch normalization (BN) layers. If the loss function has converged, training ends; otherwise, the process returns to step S2.
Further, step S1 performs a constraint that matrix orthogonality and tensor singular values are equal for initialization of convolutional neural network weights.
Further, after the fully-connected layer weights of the convolutional neural network are randomly initialized in step S3, all singular values of the fully-connected layer weight matrix are set to 1, so that the fully-connected layer weight matrix is orthogonal.
Further, after the convolutional layer weights of the convolutional neural network are initialized randomly in step S3, the following operations are performed alternately until convergence:
1) limiting all singular values of the convolutional layer weight matrix to 1 so that the convolutional layer weight matrix is orthogonal;
2) on the premise of keeping the Frobenius norm (hereinafter referred to as F norm) unchanged, all singular values of the convolutional layer weight tensor are restricted to be equal, so that the convolutional layer weight tensor is an orthogonal tensor or a product of the orthogonal tensor and a constant.
Further, in step S2, after several training iterations have been performed on the initialized convolutional neural network with stochastic gradient descent or a variant thereof, the weights of the convolutional neural network are updated by threshold delimitation.
Further, the step S3 of performing matrix singular value delimitation on the weight matrices of the convolutional layer and the fully-connected layer includes the following steps:
a) performing matrix singular value decomposition on the weight matrix;
b) performing threshold value constraint on each singular value of the weight matrix, so that each singular value is close to 1;
c) and reconstructing a weight matrix according to the updated singular value.
Further, the tensor singular value delimitation of step S3 includes the following steps:
1) keeping the F-norm of the tensor equal to the F-norm of the corresponding orthogonal matrix, and calculating the expected singular value for the case in which all tensor singular values are equal;
2) performing tensor singular value decomposition on the weight tensor;
3) applying a threshold constraint to each singular value of the weight tensor so that each singular value is close to the calculated expected singular value;
4) reconstructing the weight tensor from the updated singular values.
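The expected singular value in the first step can be illustrated with a short derivation. This is a sketch under stated assumptions, not the patent's own formula (which is only available as an image): the weight tensor has size $K \times C \times n_3$ with $n_3 = d^2$, the Fourier transform along the third dimension is the unnormalized DFT, $\bar{W}^{(i)}$ is the i-th Fourier-domain frontal slice, and the symbols $\hat{s}$ and $n_3$ are introduced here for illustration only.

```latex
% Assumed setting: W_{(1)} is the K x (C n_3) unfolding of the weight tensor,
% and every slice \bar{W}^{(i)} has \min(K, C) singular values, all equal to \hat{s}.
\begin{align*}
\|W_{(1)}\|_F^2 &= \min(K,\, C n_3)
  && \text{(orthogonal matrix: all singular values equal 1)} \\
\sum_{i=1}^{n_3} \|\bar{W}^{(i)}\|_F^2 &= n_3\, \|\mathcal{W}\|_F^2
  && \text{(Parseval, unnormalized DFT)} \\
n_3 \min(K,\, C)\, \hat{s}^2 &= n_3 \min(K,\, C n_3)
  && \text{(all slice singular values equal } \hat{s}\text{)} \\
\hat{s} &= \sqrt{\frac{\min(K,\, C n_3)}{\min(K,\, C)}}
\end{align*}
```

Under these assumptions the tensor singular values, being the averages of the slice singular values over the $n_3$ slices, also equal $\hat{s}$.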
Further, the delimitation of the BN layer weights in step S3 includes the following steps:
(I) calculating the mean of the quotients of each neuron's weight and its input standard deviation;
(II) limiting the quotient of each neuron's weight and its input standard deviation to be close to that mean, thereby obtaining the new BN layer weights.
Compared with the prior art, the invention has the beneficial effects that:
The method constrains the weight tensor of the convolutional neural network, so that in terms of model structure the structural information of the weight tensor is preserved; in terms of optimization, compared with imposing only a matrix orthogonality constraint on the weight matrices, it reduces the solution space of the network weights and simplifies optimization. The invention thereby effectively improves the performance of the convolutional neural network.
Drawings
Fig. 1 is a flowchart of the training process of the convolutional neural network training method based on tensor singular value delimitation in this embodiment;
Fig. 2 is a schematic diagram of matrix singular value delimitation in this embodiment;
Fig. 3 is a schematic diagram of tensor singular value delimitation in this embodiment.
Detailed Description
The present invention will be described in further detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
The principle of the invention is as follows: on top of the orthogonal constraint on the weight matrices of the convolutional layers and the fully-connected layers, the singular values of the weight tensors of the convolutional layers are further constrained to be equal, so that the structural characteristics of the weight tensors are preserved. The singular values of the matrices and tensors are bounded by thresholds so that the constraints are approximately satisfied, which yields a convolutional neural network training method that improves network performance.
As shown in fig. 1, a convolutional neural network training method based on tensor singular value delimitation includes the following steps:
S1, initializing the weights of the convolutional neural network so that the weight matrices of the fully-connected layers and the convolutional layers are orthogonal and the singular values of the weight tensors of the convolutional layers are equal;
Specifically, the weights of the convolutional layers and the fully-connected layers are first initialized to random values, and then all singular values of their weight matrices are set to 1, so that the weight matrices are orthogonal. For a convolutional layer, all singular values of the weight tensor are additionally made equal while the F-norm is kept constant, so that the weight tensor is an orthogonal tensor or the product of an orthogonal tensor and a constant. The singular value settings of the weight matrix and of the weight tensor of the convolutional layer are applied alternately until convergence.
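The matrix part of this initialization can be sketched in NumPy as follows. This is a minimal illustration rather than the patent's implementation: the function name `orthogonalize` and the layer sizes are hypothetical, and a reduced SVD is assumed.

```python
import numpy as np

def orthogonalize(w):
    """Set every singular value of w to 1 while keeping its singular vectors."""
    # Reduced SVD: w = u @ diag(s) @ vt; dropping s is the same as setting s to 1.
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
K, C, d = 8, 3, 3                          # hypothetical layer sizes
w = rng.standard_normal((K, C * d * d))    # random initialization, shape K x Cd^2
w0 = orthogonalize(w)                      # rows of w0 are orthonormal because K < Cd^2
```

For a convolutional layer this step would be alternated with the tensor singular value step until the weights stop changing.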
S2, train the initialized convolutional neural network with stochastic gradient descent or a variant thereof (including SGD with Momentum, SGD with Nesterov Momentum, AdaGrad, Adadelta, RMSprop and Adam). After each training epoch, the weights are updated according to step S3.
S3, update the weights of the convolutional layers by matrix singular value delimitation and tensor singular value delimitation.
For convenience of description, the notation is fixed as follows. For any convolutional layer, the convolution weight tensor is $\mathcal{W} \in \mathbb{R}^{K \times C \times d^2}$, where $\mathbb{R}^{K \times C \times d^2}$ denotes the three-dimensional real tensors whose three dimensions have sizes K, C and $d^2$, $\mathbb{R}$ denotes the real numbers, C is the number of input channels, K is the number of convolution kernels, and d is the convolution kernel size. The mode-1 unfolding of the convolution weight tensor $\mathcal{W}$ yields the corresponding weight matrix $W \in \mathbb{R}^{K \times Cd^2}$, where K and $Cd^2$ are the sizes of the two dimensions of the matrix. For a fully-connected layer, the weight is already a matrix. For convenience, the invention denotes the weight matrices of both the convolutional layers and the fully-connected layers by $W \in \mathbb{R}^{K \times m}$; for a convolutional layer, $m = Cd^2$.
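The relation between the weight tensor and its mode-1 unfolding can be made concrete with a small NumPy sketch. The helper names `unfold_mode1` and `fold_mode1`, the (K, C, d, d) kernel layout, and the column ordering (frontal slices placed side by side) are assumptions made for illustration; the singular values of the unfolded matrix do not depend on the column ordering.

```python
import numpy as np

def unfold_mode1(t):
    """Mode-1 unfolding: (K, C, n3) tensor -> (K, C*n3) matrix, frontal slices side by side."""
    K, C, n3 = t.shape
    return t.transpose(0, 2, 1).reshape(K, n3 * C)

def fold_mode1(m, C, n3):
    """Inverse of unfold_mode1: (K, C*n3) matrix -> (K, C, n3) tensor."""
    K = m.shape[0]
    return m.reshape(K, n3, C).transpose(0, 2, 1)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((4, 3, 3, 3))   # hypothetical conv kernel, layout (K, C, d, d)
t = kernel.reshape(4, 3, 9)                  # weight tensor of size K x C x d^2
m = unfold_mode1(t)                          # weight matrix of size K x Cd^2
```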
Step S3 is specifically as follows:
(1) updating the weights of the convolutional layers according to the matrix singular values, as shown in fig. 2, includes the following steps:
First, perform singular value decomposition (SVD) on the convolutional layer weight matrix W to obtain $W = U \Sigma V^T$, where U is a K × m unitary matrix, $\Sigma$ is an m × m non-negative real diagonal matrix whose diagonal elements are the singular values of W, V is an m × m unitary matrix, and $V^T$ denotes the transpose of V.
Second, threshold each diagonal element of the singular value matrix $\Sigma$ as follows:
if $\sigma_i > 1 + \epsilon_1$, then $\sigma_i = 1 + \epsilon_1$;
if $\sigma_i < 1/(1 + \epsilon_1)$, then $\sigma_i = 1/(1 + \epsilon_1)$;
if $1/(1 + \epsilon_1) \le \sigma_i \le 1 + \epsilon_1$, then $\sigma_i$ remains unchanged;
where $\sigma_i$ denotes the i-th diagonal element of $\Sigma$, and $\epsilon_1$ is a small value ranging from 0.1 to 0.5 that is used to delimit the singular values $\sigma_i$ to a neighborhood of 1.
Third, compute $U \Sigma V^T$ from the new $\Sigma$ to obtain the new convolutional layer weight matrix W.
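The three steps above can be sketched with NumPy. The function name `bound_matrix_singular_values` and the parameter name `eps` (standing for the small threshold value) are hypothetical, and a reduced SVD is used, so the factor shapes differ from those stated in the text when K < m.

```python
import numpy as np

def bound_matrix_singular_values(w, eps=0.2):
    """Clip every singular value of w to the interval [1/(1+eps), 1+eps]."""
    # Step 1: singular value decomposition (reduced form).
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    # Step 2: threshold each singular value around 1.
    s = np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)
    # Step 3: rebuild the weight matrix from the bounded singular values.
    return (u * s) @ vt

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 27))            # hypothetical K x Cd^2 weight matrix
w_new = bound_matrix_singular_values(w, eps=0.2)
```

Singular values already inside the interval are left untouched, so a matrix that is nearly orthogonal changes very little.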
(2) Update the weights of the convolutional layer according to the tensor singular values. The convolution weight tensor $\mathcal{W}$ is decomposed by tensor singular value decomposition (t-SVD),
$\mathcal{W} = \mathcal{U} * \mathcal{S} * \mathcal{V}^T$,
and the tensor singular values in $\mathcal{S}$ are delimited, as shown in Fig. 3. Here $\mathcal{U}$ and $\mathcal{V}$ are orthogonal tensors of size $K \times K \times d^2$ and $C \times C \times d^2$, respectively; $\mathcal{S}$ is an f-diagonal tensor (i.e., every frontal slice of the tensor is a diagonal matrix) of size $K \times C \times d^2$; the diagonal elements $\mathcal{S}(i, i, 1)$ of its first frontal slice are the tensor singular values of $\mathcal{W}$; and $*$ is the tensor-tensor product.
In practice, the tensor singular value decomposition can be computed with the Fourier transform and the matrix singular value decomposition, as follows. First, a fast Fourier transform is applied to the convolution weight tensor $\mathcal{W}$ along the kernel-size dimension to obtain the transformed tensor $\bar{\mathcal{W}}$, written $\bar{\mathcal{W}} = \mathrm{fft}(\mathcal{W}, [\,], 3)$. Second, matrix singular value decomposition is applied to each frontal slice matrix of $\bar{\mathcal{W}}$, i.e. $\bar{W}^{(i)} = \bar{U}^{(i)} \bar{S}^{(i)} (\bar{V}^{(i)})^T$, where $\bar{W}^{(i)}$ denotes the i-th frontal slice matrix of $\bar{\mathcal{W}}$, $\bar{U}^{(i)}$ is a K × m unitary matrix, $\bar{S}^{(i)}$ is an m × m non-negative real diagonal matrix whose diagonal elements are the singular values of $\bar{W}^{(i)}$, $\bar{V}^{(i)}$ is an m × m unitary matrix, and $(\bar{V}^{(i)})^T$ denotes the transpose of $\bar{V}^{(i)}$. Finally, the tensors $\bar{\mathcal{U}}$, $\bar{\mathcal{S}}$ and $\bar{\mathcal{V}}$ whose frontal slices are $\bar{U}^{(i)}$, $\bar{S}^{(i)}$ and $\bar{V}^{(i)}$ are each transformed by the inverse fast Fourier transform along the kernel-size dimension to obtain $\mathcal{U}$, $\mathcal{S}$ and $\mathcal{V}$, i.e. $\mathcal{U} = \mathrm{ifft}(\bar{\mathcal{U}}, [\,], 3)$, $\mathcal{S} = \mathrm{ifft}(\bar{\mathcal{S}}, [\,], 3)$, $\mathcal{V} = \mathrm{ifft}(\bar{\mathcal{V}}, [\,], 3)$.
By the properties of the inverse Fourier transform, each element of the first frontal slice of $\mathcal{S}$ equals the mean of the elements at the corresponding position over all frontal slices $\bar{S}^{(i)}$. The tensor singular values of $\mathcal{W}$ can therefore be delimited by delimiting the diagonal elements of the $\bar{S}^{(i)}$, which comprises the following steps:
① Perform a fast Fourier transform on the convolution weight tensor $\mathcal{W}$ along the kernel-size dimension to obtain the transformed tensor $\bar{\mathcal{W}}$, i.e. $\bar{\mathcal{W}} = \mathrm{fft}(\mathcal{W}, [\,], 3)$.
② Compute the expected tensor singular value [Figure BDA0002593021690000063].
③ For each i from 1 to $d^2$, perform the following operations:
Perform SVD on the i-th frontal slice matrix of $\bar{\mathcal{W}}$ to obtain $\bar{W}^{(i)} = \bar{U}^{(i)} \bar{S}^{(i)} (\bar{V}^{(i)})^T$.
Threshold each diagonal element of $\bar{S}^{(i)}$ as follows:
if [Figure BDA0002593021690000067], then [Figure BDA0002593021690000068];
if [Figure BDA0002593021690000069], then [Figure BDA00025930216900000610];
if [Figure BDA00025930216900000611], then $\bar{s}^{(i)}_j$ remains unchanged;
where $\bar{s}^{(i)}_j$ denotes the j-th diagonal element of $\bar{S}^{(i)}$, and $\epsilon_2$ is a small value ranging from 0.1 to 0.5 that is used to delimit the singular values $\bar{s}^{(i)}_j$, constraining them around 1.
From the new $\bar{S}^{(i)}$, compute the matrix product $\bar{U}^{(i)} \bar{S}^{(i)} (\bar{V}^{(i)})^T$ to obtain the new $\bar{W}^{(i)}$.
④ Perform an inverse fast Fourier transform on the new $\bar{\mathcal{W}}$ along the kernel-size dimension to obtain the new convolution weight tensor $\mathcal{W}$.
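The Fourier-domain procedure above can be sketched in NumPy. Two parts are assumptions rather than the patent's formulas, which are only available as images: the expected singular value is taken as sqrt(min(K, C·d²)/min(K, C)), obtained by matching the F-norm of an orthogonal K × Cd² matrix, and the thresholds are taken as multiplicative bounds around that value by analogy with the matrix case. The function names are hypothetical.

```python
import numpy as np

def expected_singular_value(K, C, n3):
    """Assumed target: common singular value that matches the F-norm of an orthogonal matrix."""
    return np.sqrt(min(K, C * n3) / min(K, C))

def bound_tensor_singular_values(t, eps=0.2):
    """Bound the Fourier-domain slice singular values of a (K, C, n3) tensor."""
    K, C, n3 = t.shape
    target = expected_singular_value(K, C, n3)
    # Step 1: FFT along the kernel-size (third) dimension.
    t_hat = np.fft.fft(t, axis=2)
    # Step 3a: batched SVD of every frontal slice (slice axis moved to the front).
    slices = np.moveaxis(t_hat, 2, 0)                      # shape (n3, K, C)
    u, s, vh = np.linalg.svd(slices, full_matrices=False)
    # Step 3b: clip each slice singular value around the target (assumed bounds).
    s = np.clip(s, target / (1.0 + eps), target * (1.0 + eps))
    # Step 3c: rebuild every slice from the bounded singular values.
    new_slices = (u * s[:, None, :]) @ vh
    # Step 4: inverse FFT; the imaginary part is rounding error for real input.
    return np.fft.ifft(np.moveaxis(new_slices, 0, 2), axis=2).real

rng = np.random.default_rng(2)
t = rng.standard_normal((4, 3, 9))          # hypothetical K x C x d^2 weight tensor
t_new = bound_tensor_singular_values(t, eps=0.2)
```

Clipping in the Fourier domain keeps the conjugate symmetry of the slices for full-rank slices, which is why the inverse transform of a real tensor stays real up to rounding.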
(3) Alternate step (1) and step (2) for several iterations. This embodiment suggests 1.5 iterations (i.e., performing step (1), step (2), and step (1) again, then exiting), which achieves a good balance between computation time and final effect; practical applications are not limited to 1.5 iterations.
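The 1.5-iteration schedule can be sketched as a self-contained NumPy function. The helper names, the unfolding order, the assumed expected singular value sqrt(min(K, C·d²)/min(K, C)), and the multiplicative tensor bounds are all illustrative assumptions, not taken from the patent's formula images.

```python
import numpy as np

def matrix_step(t, eps):
    """Step (1): bound the singular values of the mode-1 unfolding around 1."""
    K, C, n3 = t.shape
    w = t.transpose(0, 2, 1).reshape(K, n3 * C)
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    w = (u * np.clip(s, 1.0 / (1.0 + eps), 1.0 + eps)) @ vt
    return w.reshape(K, n3, C).transpose(0, 2, 1)

def tensor_step(t, eps):
    """Step (2): bound the Fourier-domain slice singular values (assumed target and bounds)."""
    K, C, n3 = t.shape
    target = np.sqrt(min(K, C * n3) / min(K, C))
    f = np.moveaxis(np.fft.fft(t, axis=2), 2, 0)
    u, s, vh = np.linalg.svd(f, full_matrices=False)
    s = np.clip(s, target / (1.0 + eps), target * (1.0 + eps))
    f = (u * s[:, None, :]) @ vh
    return np.fft.ifft(np.moveaxis(f, 0, 2), axis=2).real

def bound_conv_weights(t, eps1=0.2, eps2=0.2):
    """1.5 iterations: step (1), step (2), then step (1) again."""
    t = matrix_step(t, eps1)
    t = tensor_step(t, eps2)
    return matrix_step(t, eps1)

rng = np.random.default_rng(3)
t_new = bound_conv_weights(rng.standard_normal((4, 3, 9)))
```

Ending on the matrix step means the matrix bounds hold exactly on exit, while the tensor bounds hold only approximately.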
S4, update the weights of the fully-connected layers by matrix singular value delimitation, comprising the following steps:
(1) Perform SVD on the fully-connected layer weight matrix W to obtain $W = U \Sigma V^T$.
(2) Threshold each diagonal element $\sigma_i$ of $\Sigma$ as follows:
if $\sigma_i > 1 + \epsilon$, then $\sigma_i = 1 + \epsilon$;
if $\sigma_i < 1/(1 + \epsilon)$, then $\sigma_i = 1/(1 + \epsilon)$;
if $1/(1 + \epsilon) \le \sigma_i \le 1 + \epsilon$, then $\sigma_i$ remains unchanged.
(3) Compute $U \Sigma V^T$ from the new $\Sigma$ to obtain the new W.
S5, update the weights of the BN layer by delimitation. Suppose the input of the BN layer is $h \in \mathbb{R}^n$ (i.e., h is an n-dimensional real vector, where $\mathbb{R}$ denotes the real numbers); the operation of the BN layer can then be expressed as:
$\mathrm{BN}(h) = \Upsilon \Phi (h - \mu) + \beta$,
where n is the number of channels of the BN layer and equals the number of convolution kernels of the convolutional layer connected to it, i.e. n = K; $\mu$ is the mean of the neuron inputs over the batch; $\Phi$ is a diagonal matrix whose diagonal elements are the reciprocals $1/\phi_i$ of the standard deviations $\phi_i$ of the neuron inputs over the batch; $\Upsilon$ is a diagonal matrix whose diagonal elements are the learnable BN layer weights $\upsilon_i$; and $\beta$ is the learnable BN layer bias. The delimitation update of the BN layer proceeds as follows:
(1) Compute the mean of the quotients of each neuron's weight and its input standard deviation: $\alpha = \frac{1}{n} \sum_{i=1}^{n} \frac{\upsilon_i}{\phi_i}$.
(2) Threshold each diagonal element $\upsilon_i$ of $\Upsilon$ as follows:
if [Figure BDA0002593021690000072], then [Figure BDA0002593021690000073];
if [Figure BDA0002593021690000074], then [Figure BDA0002593021690000075];
if [Figure BDA0002593021690000076], then $\upsilon_i$ remains unchanged;
where $\upsilon_i$ is the weight of the i-th neuron of the BN layer, $\phi_i$ is the standard deviation of the i-th neuron's input over the batch, and [Figure BDA0002593021690000077] denotes a small value (ranging from 0.1 to 0.5) used to delimit the BN layer weights $\upsilon_i$ to a neighborhood of $\alpha$.
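The BN delimitation can be sketched in NumPy. The exact threshold conditions are only available as formula images, so the multiplicative bounds used here (the ratio of each quotient to the mean kept within [1/(1+eps), 1+eps]) are an assumption made by analogy with the matrix case. The names `bound_bn_weights`, `gamma`, `sigma` and `eps` are hypothetical, and the mean quotient is assumed to be positive.

```python
import numpy as np

def bound_bn_weights(gamma, sigma, eps=0.2):
    """Keep each quotient gamma_i / sigma_i close to the mean quotient alpha.

    gamma: BN layer weights; sigma: per-channel batch standard deviations.
    Assumes alpha > 0 so that the clipping interval is ordered correctly.
    """
    ratio = gamma / sigma
    alpha = ratio.mean()                                   # step (1): mean quotient
    # Step (2): bound the quotient relative to alpha (assumed multiplicative form).
    bounded = np.clip(ratio / alpha, 1.0 / (1.0 + eps), 1.0 + eps)
    return alpha * sigma * bounded

gamma = np.array([0.5, 1.0, 2.0, 4.0])      # hypothetical BN weights
sigma = np.ones(4)                          # hypothetical batch standard deviations
gamma_new = bound_bn_weights(gamma, sigma, eps=0.2)
```

In this example the mean quotient is 1.875, so the two smallest weights are raised to 1.5625, the largest is lowered to 2.25, and the weight 2.0 is left unchanged.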
S6, repeat steps S2 to S5 until the training of the convolutional neural network converges.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (8)

1. A convolutional neural network training method based on tensor singular value delimitation is characterized by comprising the following steps:
S1, initializing the weights of the convolutional neural network so that the weight matrices of the fully-connected layers and the convolutional layers are orthogonal and the singular values of the weight tensors of the convolutional layers are equal;
S2, training the initialized convolutional neural network with stochastic gradient descent or a variant thereof;
S3, after each training iteration, alternately applying matrix singular value delimitation and tensor singular value delimitation to the convolutional layer weights of the convolutional neural network, applying matrix singular value delimitation to the fully-connected layer weights, and applying delimitation to the weights of the batch normalization layers (BN layers); if the loss function has converged, training ends; if the loss function has not converged, the process returns to step S2.
2. The method of claim 1, wherein step S1 constrains the initialization of convolutional neural network weights to be matrix orthogonal and tensor singular value equal.
3. The method according to claim 2, wherein after the random initialization of the fully-connected layer weights of the convolutional neural network in step S3, all singular values of the fully-connected layer weight matrix are limited to 1, so that the fully-connected layer weight matrix is orthogonal.
4. The method according to claim 3, wherein the random initialization of convolutional layer weights of convolutional neural network in step S3 is followed by the following operations until convergence:
limiting all singular values of the convolutional layer weight matrix to 1 so that the convolutional layer weight matrix is orthogonal;
on the premise of keeping the Frobenius norm (hereinafter referred to as F norm) unchanged, all singular values of the convolutional layer weight tensor are restricted to be equal, so that the convolutional layer weight tensor is an orthogonal tensor or a product of the orthogonal tensor and a constant.
5. The method according to claim 1, wherein step S2 is implemented by performing several training iterations on the initialized convolutional neural network by using a stochastic gradient descent method or its variant, and then updating the weights of the convolutional neural network by using threshold bounding.
6. The method of claim 1, wherein the step S3 of performing matrix singular value delimitation on the weight matrices of the convolutional layer and the fully-connected layer comprises the steps of:
performing matrix singular value decomposition on the weight matrix;
performing threshold value constraint on each singular value of the weight matrix, so that each singular value is close to 1;
and reconstructing a weight matrix according to the updated singular value.
7. The method of claim 1, wherein the tensor singular value delimitation of step S3 comprises the steps of:
keeping the F-norm of the tensor equal to the F-norm of the corresponding orthogonal matrix, and calculating expected singular values when all tensor singular values are equal;
carrying out tensor singular value decomposition on the weight tensor;
carrying out threshold value constraint on each singular value of the weight tensor to enable each singular value to be close to the expected singular value obtained through calculation;
reconstructing the weight tensor according to the updated singular values.
8. The method of claim 1, wherein the step S3 of thresholding the weight of the BN layer comprises the steps of:
calculating the mean of the quotients of each neuron's weight and its input standard deviation;
and limiting the quotient of each neuron's weight and its input standard deviation to be close to that mean, to obtain the new BN layer weights.
CN202010700940.7A 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method Active CN111967574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010700940.7A CN111967574B (en) 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010700940.7A CN111967574B (en) 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method

Publications (2)

Publication Number Publication Date
CN111967574A true CN111967574A (en) 2020-11-20
CN111967574B CN111967574B (en) 2024-01-23

Family

ID=73362075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010700940.7A Active CN111967574B (en) 2020-07-20 2020-07-20 Tensor singular value delimitation-based convolutional neural network training method

Country Status (1)

Country Link
CN (1) CN111967574B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092960A (en) * 2017-04-17 2017-08-25 中国民航大学 A kind of improved parallel channel convolutional neural networks training method
CN107680044A (en) * 2017-09-30 2018-02-09 福建帝视信息科技有限公司 A kind of image super-resolution convolutional neural networks speed-up computation method
CN107967516A (en) * 2017-10-12 2018-04-27 中科视拓(北京)科技有限公司 A kind of acceleration of neural network based on trace norm constraint and compression method
CN108537252A (en) * 2018-03-21 2018-09-14 温州大学苍南研究院 A kind of image noise elimination method based on new norm
CN108649926A (en) * 2018-05-11 2018-10-12 电子科技大学 DAS data de-noising methods based on wavelet basis tensor rarefaction representation
CN109214441A (en) * 2018-08-23 2019-01-15 桂林电子科技大学 A kind of fine granularity model recognition system and method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537492A (en) * 2021-07-19 2021-10-22 第六镜科技(成都)有限公司 Model training and data processing method, device, equipment, medium and product
CN113537492B (en) * 2021-07-19 2024-04-26 第六镜科技(成都)有限公司 Model training and data processing method, device, equipment, medium and product
CN114638283A (en) * 2022-02-11 2022-06-17 华南理工大学 Orthogonal convolution neural network image identification method based on tensor optimization space
CN114638283B (en) * 2022-02-11 2024-08-16 华南理工大学 Tensor optimization space-based orthogonal convolutional neural network image recognition method
CN117352049A (en) * 2023-10-31 2024-01-05 河南大学 Parameter efficient protein language model design method based on self-supervision learning and Kronecker product decomposition
CN117352049B (en) * 2023-10-31 2024-09-13 河南大学 Parameter efficient protein language model design method based on self-supervision learning and Kronecker product decomposition

Also Published As

Publication number Publication date
CN111967574B (en) 2024-01-23

Similar Documents

Publication Publication Date Title
Cheng et al. Model compression and acceleration for deep neural networks: The principles, progress, and challenges
Yang et al. Feed-forward neural network training using sparse representation
Dziugaite et al. Training generative neural networks via maximum mean discrepancy optimization
CN111967574A (en) Convolutional neural network training method based on tensor singular value delimitation
Tang et al. Automatic sparse connectivity learning for neural networks
CN110490227B (en) Feature conversion-based few-sample image classification method
Tang et al. Analysis dictionary learning based classification: Structure for robustness
Wu et al. Improved expressivity through dendritic neural networks
CN111224905B (en) Multi-user detection method based on convolution residual error network in large-scale Internet of things
CN113408610B (en) Image identification method based on adaptive matrix iteration extreme learning machine
CN110717519A (en) Training, feature extraction and classification method, device and storage medium
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism
Bungert et al. A Bregman learning framework for sparse neural networks
CN112949610A (en) Improved Elman neural network prediction method based on noise reduction algorithm
Rojas et al. Blind source separation in post-nonlinear mixtures using competitive learning, simulated annealing, and a genetic algorithm
CN118196231B (en) Lifelong learning draft method based on concept segmentation
CN113240105A (en) Power grid steady state discrimination method based on graph neural network pooling
CN114638283B (en) Tensor optimization space-based orthogonal convolutional neural network image recognition method
Kim et al. Revisiting orthogonality regularization: a study for convolutional neural networks in image classification
CN117033985A (en) Motor imagery electroencephalogram classification method based on ResCNN-BiGRU
Bilski et al. Towards a very fast feedforward multilayer neural networks training algorithm
Yoshida et al. Tropical neural networks and its applications to classifying phylogenetic trees
Ben et al. An adaptive neural networks formulation for the two-dimensional principal component analysis
Upadhya et al. Learning RBM with a DC programming approach
CN110288002A (en) A kind of image classification method based on sparse Orthogonal Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant