CN111062474A - Neural network optimization method by solving lifted proximal operator machines - Google Patents

Neural network optimization method by solving lifted proximal operator machines

Info

Publication number
CN111062474A
CN111062474A
Authority
CN
China
Prior art keywords
neural network
optimization method
lpom
training
network optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811203464.7A
Other languages
Chinese (zh)
Other versions
CN111062474B (en)
Inventor
林宙辰
李嘉
方聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201811203464.7A priority Critical patent/CN111062474B/en
Publication of CN111062474A publication Critical patent/CN111062474A/en
Application granted granted Critical
Publication of CN111062474B publication Critical patent/CN111062474B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network optimization method by solving lifted proximal operator machines (LPOM), and relates to the technical field of deep learning neural network optimization. In the training of a feed-forward neural network, the LPOM model is solved by a block coordinate descent method; each subproblem of the LPOM model has a convergence guarantee, the weights and activations of each layer of the neural network can be updated in parallel, and no extra memory space is occupied. By adopting the technical scheme of the invention, the parallelism, applicability, and training effect of neural network training can be improved while using relatively little storage.

Description

Neural network optimization method by solving lifted proximal operator machines
Technical Field
The invention relates to the technical field of deep learning neural network optimization, and in particular to a method for neural network optimization by solving a lifted proximal operator machine (LPOM).
Background
A feed-forward deep neural network is composed of a hierarchy of fully connected layers with no feedback connections. With recent advances in hardware and dataset size, feed-forward deep neural networks have become the standard for many tasks, for example image recognition [16], speech recognition [12], and natural language understanding [6], and they serve as an important component of the Go-playing system of [22].
For decades, the objective for training a feed-forward neural network has typically been a highly non-convex function that is nested with respect to the network weights. The main method for optimizing feed-forward neural networks is stochastic gradient descent (SGD) [21], whose effectiveness has been verified by its success in various practical applications. In recent years, various variants of SGD have been proposed. They use adaptive learning rates or momentum terms, such as Nesterov momentum [23], AdaGrad [8], RMSProp [7], and Adam [15]. SGD and its variants use a small number of training samples to estimate the gradient, which makes each iteration cheap to compute. Furthermore, since the estimated gradient contains noise, this helps in escaping saddle points [9]. However, these methods also have disadvantages. The main problem is that the magnitude of the gradient decreases or increases exponentially with the number of network layers, causing the gradient to vanish or explode. This phenomenon leads to slow or unstable convergence and is particularly severe in deeper neural networks. The disadvantage can be mitigated by using non-saturating activation functions, such as rectified linear units (ReLUs), and modified network architectures, such as ResNet [11]. However, the fundamental problem still remains [24]. In addition, these methods can neither directly handle non-differentiable activation functions (such as those of binarized neural networks [13]), nor update the weights of different layers in parallel.
The shortcomings of SGD have motivated the study of new approaches to training feed-forward neural networks. Recently, training a feed-forward neural network has been formulated as a constrained optimization problem. It introduces the network activations as auxiliary variables, and the network structure is enforced by layer-wise constraints [3]. This decouples the nested dependency of the functions into equality constraints, which can then be handled by a number of standard optimization algorithms. The main difference among methods of this type is how the equality constraints are handled. Document [4] approximates the equality constraints by quadratic penalty terms and alternately optimizes the network weights and activations. Document [25] introduces one more auxiliary variable per layer and also uses quadratic penalty terms to approximate the equality constraints. However, both of these approaches either only approximate the equality constraints or introduce more auxiliary variables. Inspired by the alternating direction method [16], documents [24] and [27] use the augmented Lagrangian method to enforce the equality constraints exactly. However, both of these methods involve Lagrange multipliers and nonlinear constraints, which require more memory and make the optimization more difficult. Based on the fact that the ReLU activation function is equivalent to a simple constrained convex optimization problem, document [26] relaxes the nonlinear constraints into penalty terms that characterize the network structure and the ReLU activation function, so the nonlinear constraints no longer exist. However, this method is limited to the ReLU activation function and cannot be used for other activation functions. Document [2] adopts a similar idea but discusses general monotonically increasing activation functions; nevertheless, its algorithms for updating the weights and activations are still limited to the ReLU function, and the method can only be used to initialize SGD and cannot exceed the performance of SGD. Patent [1] proposes a new model that approximates the feed-forward neural network, called the lifted proximal operator machine (LPOM). LPOM rewrites the activation function as an equivalent proximal operator and adds the proximal operator as a penalty term to the objective function to approximate the feed-forward neural network. However, the solving algorithm presented in patent [1] does not exploit the property that the model is block-convex with respect to the per-layer weights and activations: updating the network activations with the alternating direction method introduces many auxiliary variables, and it is very difficult to select a proper learning rate when updating the weights with gradient descent.
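For intuition, the proximal-operator rewriting on which LPOM is based can be summarized as follows; this is a standard derivation sketched here for illustration (using the definition of f given with formula 1 below), not text quoted from patent [1].

```latex
% For a monotonically increasing activation \phi, let f(x) = \int_0^x (\phi^{-1}(y) - y)\,dy.
% Then \phi is exactly the proximal operator of f:
\begin{align*}
\operatorname{prox}_f(x) &= \operatorname*{arg\,min}_u \; f(u) + \tfrac{1}{2}(u - x)^2, \\
0 &= \big(\phi^{-1}(u) - u\big) + (u - x) \;\Longrightarrow\; u = \phi(x),
\end{align*}
% so the layer relation X_i = \phi(W_{i-1}X_{i-1}) is recovered as the minimizer of
% f(X_i) + \tfrac{1}{2}\|X_i - W_{i-1}X_{i-1}\|_F^2, the penalty term that LPOM adds to the objective.
```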
Cited documents:
[1] A lifted proximal operator machine neural network optimization method. Chinese patent application 201711156691.4.
[2] Askari, A.; Negiar, G.; Sambharya, R.; and Ghaoui, L. E. 2018. Lifted neural networks. arXiv preprint arXiv:1805.01532.
[3] Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 183–202.
[4] Carreira-Perpinan, M., and Wang, W. 2014. Distributed optimization of deeply nested systems. In International Conference on Artificial Intelligence and Statistics, 10–19.
[5] Clevert, D.-A.; Unterthiner, T.; and Hochreiter, S. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
[6] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537.
[7] Dauphin, Y.; de Vries, H.; and Bengio, Y. 2015. Equilibrated adaptive learning rates for non-convex optimization. In NIPS, 1504–1512.
[8] Duchi, J.; Hazan, E.; and Singer, Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12:2121–2159.
[9] Ge, R.; Huang, F.; Jin, C.; and Yuan, Y. 2015. Escaping from saddle points - online stochastic gradient for tensor decomposition. In COLT, 797–842.
[10] Glorot, X., and Bengio, Y. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256.
[11] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
[12] Hinton, G.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.-R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T. N.; et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6):82–97.
[13] Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; and Bengio, Y. 2016. Binarized neural networks. In Advances in NIPS, 4107–4115.
[14] Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, 675–678. ACM.
[15] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[16] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS, 1097–1105.
[17] Lin, Z.; Liu, R.; and Su, Z. 2011. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, 612–620.
[18] Nesterov, Y., ed. 2004. Introductory Lectures on Convex Optimization: A Basic Course. Springer.
[19] Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; and Ng, A. Y. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, 5.
[20] Parikh, N.; Boyd, S.; et al. 2014. Proximal algorithms. Foundations and Trends in Optimization 1(3):127–239.
[21] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature 323(6088):533.
[22] Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484.
[23] Sutskever, I.; Martens, J.; Dahl, G.; and Hinton, G. 2013. On the importance of initialization and momentum in deep learning. In ICML, 1139–1147.
[24] Taylor, G.; Burmeister, R.; Xu, Z.; Singh, B.; Patel, A.; and Goldstein, T. 2016. Training neural networks without gradients: A scalable ADMM approach. In ICML, 2722–2731.
[25] Zeng, J.; Ouyang, S.; Lau, T. T.-K.; Lin, S.; and Yao, Y. 2018. Global convergence in deep learning with variable splitting via the Kurdyka-Lojasiewicz property. arXiv preprint arXiv:1803.00225.
[26] Zhang, Z., and Brand, M. 2017. Convergent block coordinate descent for training Tikhonov regularized deep neural networks. In NIPS, 1721–1730.
[27] Zhang, Z.; Chen, Y.; and Saligrama, V. 2016. Efficient training of very deep neural networks for supervised hashing. In CVPR, 1487–1495.
Disclosure of Invention
To overcome the above-mentioned deficiencies of the prior art, the present invention provides a new method of solving the lifted proximal operator machine (LPOM) for training feed-forward neural networks. Unlike existing neural network optimization methods, this solution has a convergence guarantee for each subproblem, can update the variables in parallel, and occupies memory comparable to that of the stochastic gradient descent (SGD) method during solving.
For convenience of description, the invention first introduces the LPOM model, specifically as shown in formula 1:
$$\min_{\{W_i\},\{X_i\}} \; \ell(X_n, L) + \sum_{i=2}^{n} \mu_i \Big( \mathbf{1}^{\top} f(X_i)\, \mathbf{1} + \mathbf{1}^{\top} g(W_{i-1} X_{i-1})\, \mathbf{1} + \tfrac{1}{2}\, \| X_i - W_{i-1} X_{i-1} \|_F^2 \Big) \qquad (1)$$
where W_{i-1} is the weight of layer i-1, X_i is the activation of layer i, i = 2, …, n, ℓ(X_n, L) is the loss function, n is the number of layers of the neural network, X_1 is the training sample (when i = 2, X_{i-1} is X_1), and L is the class labels corresponding to X_1;
$$f(x) = \int_0^x \big(\phi^{-1}(y) - y\big)\, dy, \qquad g(x) = \int_0^x \big(\phi(y) - y\big)\, dy;$$
for matrix inputs, f(x) and g(x) are applied element-wise; φ(x) is the activation function and φ^{-1} is the inverse function of φ; μ_i > 0 is the parameter of the i-th penalty term; 1 is the all-ones column vector; ‖·‖_F is the Frobenius norm. If ℓ(X_n, L) is convex with respect to X_n and φ(x) is monotonically increasing, then LPOM is block-convex with respect to W_i and X_i, i.e., the objective function of formula 1 is convex with respect to W_i and with respect to X_i if the remaining variables remain unchanged.
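For concreteness, the following minimal NumPy sketch evaluates the LPOM objective of formula 1 for a fully-connected network with the ReLU activation; the helper names (`lpom_objective`, `f_int`, `g_int`) and the least-squares loss used here are illustrative assumptions rather than requirements of formula 1.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def g_int(z):
    # g(x) = ∫_0^x (φ(y) − y) dy; for φ = ReLU this is −x²/2 for x < 0 and 0 for x ≥ 0
    return np.where(z < 0.0, -0.5 * z ** 2, 0.0)

def f_int(x):
    # f(x) = ∫_0^x (φ⁻¹(y) − y) dy; for ReLU it reduces to the indicator of x ≥ 0,
    # which is 0 on the non-negative activations produced by the updates, so 0 is returned here.
    return np.zeros_like(x)

def lpom_objective(W, X, L, mu):
    """Formula 1: ℓ(X_n, L) + Σ_{i=2}^n μ_i (1ᵀf(X_i)1 + 1ᵀg(W_{i−1}X_{i−1})1 + ½‖X_i − W_{i−1}X_{i−1}‖_F²).

    W: [W_1, ..., W_{n−1}]; X: [X_1, ..., X_n] with X_1 the input batch (columns are samples);
    L: labels for the loss; mu: [μ_2, ..., μ_n].
    """
    value = 0.5 * np.linalg.norm(X[-1] - L) ** 2           # least-squares loss as an example ℓ
    for i in range(1, len(X)):                             # zero-based index i corresponds to X_{i+1}
        pre = W[i - 1] @ X[i - 1]
        value += mu[i - 1] * (f_int(X[i]).sum() + g_int(pre).sum()
                              + 0.5 * np.linalg.norm(X[i] - pre) ** 2)
    return value
```

For example, `lpom_objective(W, X, L, [20.0] * (len(X) - 1))` evaluates the penalty with μ_i = 20, the value used in the embodiments described below.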
The technical scheme provided by the invention is as follows:
a neural network optimization method by solving lifted proximal operator machines, wherein, in the training of a feed-forward neural network, the LPOM model (formula 1) is solved by a new block coordinate descent method; each subproblem of the LPOM model has a convergence guarantee, the variables can be updated in parallel, and no additional memory space is occupied; the method comprises the following steps:
1) Randomly select m_1 training samples X_1 and their class labels L from the neural network training samples, where m_1 is the batch size and L is the class labels corresponding to the training samples X_1;
2) Update the network activations X_i layer by layer, i = 2, …, n, by performing operations 21) and 22); the symbols in these steps have the same meanings as in formula 1:
21) Update X_i sequentially in the order i = 2, …, n-1, iterating formula 2 until convergence:
$$X_i \leftarrow \phi\Big( W_{i-1} X_{i-1} - \frac{\mu_{i+1}}{\mu_i}\, W_i^{\top} \big( \phi(W_i X_i) - X_{i+1} \big) \Big) \qquad (2)$$
In formula 2, μ_i and μ_{i+1} are the parameters of the i-th and (i+1)-th penalty terms, respectively.
22) Update X_n by iterating formula 3 until convergence:
$$X_n \leftarrow \phi\Big( W_{n-1} X_{n-1} - \frac{1}{\mu_n} \frac{\partial \ell(X_n, L)}{\partial X_n} \Big) \qquad (3)$$
In formula 3, μ_n is the parameter of the n-th penalty term in formula 1.
3) Update the network weights W_i, i = 1, …, n-1.
It is assumed here that the function
$$h(Z) = \mathbf{1}^{\top} g(Z)\, \mathbf{1} + \tfrac{1}{2}\, \| X_{i+1} - Z \|_F^2$$
is β-smooth, i.e., the following inequality holds:
$$\| \nabla h(Z_1) - \nabla h(Z_2) \|_F = \| \phi(Z_1) - \phi(Z_2) \|_F \le \beta\, \| Z_1 - Z_2 \|_F .$$
W_i is updated by the following procedure.
Initialization: W_{i,0}, W_{i,1}, θ_0 = 0, and t = 1; where W_{i,0} and W_{i,1} are the initial values for iteratively updating W_i, θ_0 is the initial value of the parameter θ, and t is the iteration counter.
31) Compute θ_t:
$$\theta_t = \frac{1 + \sqrt{1 + 4\theta_{t-1}^2}}{2}$$
where θ_t > 0 is the value of the parameter θ at the t-th iteration;
32) Compute Y_{i,t}:
$$Y_{i,t} = W_{i,t} + \frac{\theta_{t-1} - 1}{\theta_t} \big( W_{i,t} - W_{i,t-1} \big)$$
where Y_{i,t} denotes the value of Y_i at the t-th iteration;
33) Compute W_{i,t+1}:
$$W_{i,t+1} = Y_{i,t} - \frac{1}{\beta} \big( \phi(Y_{i,t} X_i) - X_{i+1} \big)\, X_i^{\dagger}$$
where W_{i,t+1} denotes the value of W_i at the (t+1)-th iteration and X_i^† denotes the pseudo-inverse of X_i;
34)t←t+1;
Steps 21), 22), and 3) have convergence guarantees and realize layer-by-layer updating of the network activations and network weights.
In this way, neural network optimization is realized by the block coordinate descent method for solving lifted proximal operator machines.
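As a concrete illustration of steps 2) and 3), the following NumPy sketch performs one block coordinate descent pass on a mini-batch. It assumes a generic increasing activation `phi`, a caller-supplied loss gradient for formula 3, and fixed inner iteration counts; the function names, the step size 1/β in step 33), and the stopping rule are illustrative assumptions, not a verbatim implementation of the patent.

```python
import numpy as np

def update_activations(W, X, L, mu, phi, loss_grad, n_iter=100):
    """Steps 21)-22): layer-by-layer fixed-point updates of X_2, ..., X_n (formulas 2 and 3).

    Zero-based lists: X[0] = X_1 (input), W[j] maps X[j] to the pre-activation of X[j+1],
    mu[k] = μ_{k+2}; loss_grad(Xn, L) returns ∂ℓ(X_n, L)/∂X_n.
    """
    n = len(X)
    for i in range(1, n - 1):                       # X_2, ..., X_{n-1}
        pre = W[i - 1] @ X[i - 1]                   # W_{i-1} X_{i-1} stays fixed in the inner loop
        for _ in range(n_iter):                     # formula 2, iterated until (approximate) convergence
            X[i] = phi(pre - (mu[i] / mu[i - 1]) * (W[i].T @ (phi(W[i] @ X[i]) - X[i + 1])))
    pre = W[n - 2] @ X[n - 2]
    for _ in range(n_iter):                         # formula 3 for the last layer X_n
        X[n - 1] = phi(pre - loss_grad(X[n - 1], L) / mu[n - 2])
    return X

def update_weight(W_i, X_in, X_out, phi, beta=1.0, n_iter=5):
    """Step 3) with the accelerated iterations 31)-34); X_in = X_i, X_out = X_{i+1}."""
    X_pinv = np.linalg.pinv(X_in)                   # pseudo-inverse of X_i
    W_old, W, theta = W_i, W_i, 0.0                 # W_{i,0} = W_{i,1} = W_i, θ_0 = 0
    for _ in range(n_iter):
        theta_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2))        # step 31)
        Y = W + (theta - 1.0) / theta_new * (W - W_old)                   # step 32)
        W_old, W = W, Y - (phi(Y @ X_in) - X_out) @ X_pinv / beta         # step 33)
        theta = theta_new                                                 # step 34): t ← t + 1
    return W
```

Each weight update depends only on the fixed activations X_i and X_{i+1}, so the calls to `update_weight` for i = 1, …, n-1 can run in parallel, which is the parallelism referred to above.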
Compared with the prior art, the invention has the beneficial effects that:
the method optimizes the forward neural network by solving and improving the adjacent computer machine, and can be used for specific tasks such as image recognition, voice recognition, natural language understanding and the like. The method for solving and improving the block coordinate descent of the adjacent operator machine can improve the parallelism, the applicability and the training effect of neural network training under the condition of using relatively less storage.
Specifically, the method provided by the invention can update the weights and activations of each layer in parallel. In addition, the algorithm uses only the activation function itself and not its derivative, which avoids the gradient vanishing or exploding problem of gradient-based training methods and can improve the training effect of the neural network. The proposed method for optimizing feed-forward neural networks is applicable to general monotonically increasing, Lipschitz-continuous activation functions, which may be saturating and non-differentiable. No auxiliary variables are required other than the activations of each layer, so essentially the same amount of memory is used as with SGD. Furthermore, experiments verify the convergence of the algorithm's layer-wise weight and activation updates. Image recognition experiments on the MNIST, CIFAR-10, and SVHN [19] datasets also verify that the algorithm achieves high accuracy when used for neural network optimization.
Drawings
FIG. 1 compares the results of the proposed algorithm for solving LPOM with the SGD method on the MNIST and CIFAR-10 datasets.
Detailed Description
The invention will be further described by way of examples, without in any way limiting the scope of the invention, with reference to the accompanying drawings.
The invention provides a neural network optimization method by solving lifted proximal operator machines, which adopts a new block coordinate descent method to solve the LPOM model in the training of a feed-forward neural network; each subproblem of the LPOM model has a convergence guarantee, the variables can be updated in parallel, the accuracy of neural network training is improved, and no extra memory space is occupied. The neural network optimization method provided by the invention can be applied to specific tasks such as image recognition, speech recognition, and natural language processing.
The following describes an embodiment using image recognition as an example and compares it with current best results. The method of the invention uses the least-squares loss function
$$\ell(X_n, L) = \tfrac{1}{2}\, \| X_n - L \|_F^2$$
and the ReLU activation function, ReLU(x) = max(x, 0), without using any regularization on the weights. The proposed method for solving LPOM uses the same inputs as the SGD method and uses the random initialization method described in document [10]. The LPOM solving method and the SGD method are compared on the image recognition task on three datasets: MNIST, CIFAR-10, and SVHN [19]. For both SGD and LPOM, all training images in each dataset are used only once per training pass (epoch). Optimizing the training of the image recognition neural network with the method comprises the following steps:
1) Randomly select m_1 training images X_1 and their class labels L from the training samples of the image recognition neural network, where m_1 is the batch size (a typical value is 100 or 256) and L is the class labels corresponding to X_1; the commonly used MNIST and CIFAR-10 datasets each contain 10 classes;
2) Update the activations X_i of the feed-forward neural network layer by layer, i = 2, …, n, by performing operations 21) and 22); the symbols in these steps have the same meanings as in formula 1:
21) Update X_i sequentially in the order i = 2, …, n-1, repeating formula 4 for 100 iterations:
$$X_i \leftarrow \phi\Big( W_{i-1} X_{i-1} - \frac{\mu_{i+1}}{\mu_i}\, W_i^{\top} \big( \phi(W_i X_i) - X_{i+1} \big) \Big) \qquad (4)$$
22) Update the activation X_n of the feed-forward neural network, repeating formula 5 for 100 iterations:
$$X_n \leftarrow \phi\Big( W_{n-1} X_{n-1} - \frac{1}{\mu_n} \frac{\partial \ell(X_n, L)}{\partial X_n} \Big) \qquad (5)$$
3) Update the weights W_i of the feed-forward neural network, i = 1, …, n-1.
For the ReLU activation function, the function
$$h(Z) = \mathbf{1}^{\top} g(Z)\, \mathbf{1} + \tfrac{1}{2}\, \| X_{i+1} - Z \|_F^2$$
is β-smooth with β = 1, i.e., the following inequality holds:
$$\| \mathrm{ReLU}(Z_1) - \mathrm{ReLU}(Z_2) \|_F \le \| Z_1 - Z_2 \|_F .$$
Therefore, W_i can be updated by the following procedure, for a total of 5 iterations.
Initialization: W_{i,0} = W_i, W_{i,1} = W_i, θ_0 = 0, t = 1; that is, W_{i,0} and W_{i,1} are both initialized to the current W_i.
31) Compute θ_t:
$$\theta_t = \frac{1 + \sqrt{1 + 4\theta_{t-1}^2}}{2}$$
where θ_{t-1} is the value of the parameter θ at iteration t-1;
32) Compute Y_{i,t}:
$$Y_{i,t} = W_{i,t} + \frac{\theta_{t-1} - 1}{\theta_t} \big( W_{i,t} - W_{i,t-1} \big)$$
33) Compute W_{i,t+1}:
$$W_{i,t+1} = Y_{i,t} - \frac{1}{\beta} \big( \phi(Y_{i,t} X_i) - X_{i+1} \big)\, X_i^{\dagger}$$
where X_i^† denotes the pseudo-inverse of X_i;
34)t←t+1;
Steps 21), 22), and 3) have convergence guarantees.
In this way, optimization of the image recognition neural network is realized by the block coordinate descent method for solving lifted proximal operator machines.
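Putting steps 1) to 3) together, a hypothetical mini-batch training loop for this ReLU / least-squares embodiment might look as follows, reusing the `update_activations` and `update_weight` sketches given in the Disclosure section above; the layer sizes, initialization scale, and epoch count are placeholders, not values prescribed by the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ls_grad(Xn, L):
    # gradient of the least-squares loss ℓ(X_n, L) = ½‖X_n − L‖_F² with respect to X_n
    return Xn - L

def train_lpom(images, labels, layer_sizes, mu_val=20.0, batch=100, n_epochs=100, seed=0):
    """Sketch of LPOM training: images of shape (d_1, N), one-hot labels of shape (d_n, N)."""
    rng = np.random.default_rng(seed)
    n = len(layer_sizes)
    W = [rng.normal(0.0, np.sqrt(2.0 / (layer_sizes[i] + layer_sizes[i + 1])),
                    (layer_sizes[i + 1], layer_sizes[i]))
         for i in range(n - 1)]                      # simple random initialization (stand-in for [10])
    mu = [mu_val] * (n - 1)                          # μ_2 = ... = μ_n = 20 as in the embodiment
    for _ in range(n_epochs):                        # each training image is used once per epoch
        for start in range(0, images.shape[1], batch):
            X1 = images[:, start:start + batch]      # step 1): a mini-batch of training images
            L = labels[:, start:start + batch]
            X = [X1]
            for i in range(n - 1):                   # forward pass to initialize the activations
                X.append(relu(W[i] @ X[i]))
            X = update_activations(W, X, L, mu, relu, ls_grad, n_iter=100)        # step 2)
            for i in range(n - 1):                   # step 3); these updates can run in parallel
                W[i] = update_weight(W[i], X[i], X[i + 1], relu, beta=1.0, n_iter=5)
    return W
```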
Specifically, on the MNIST dataset, the 784 raw pixels are used as input to both the proposed LPOM solving method and SGD. The dataset contains 60,000 training images and 10,000 test images in total. No pre-processing or data augmentation is used in the implementation. As in document [25], the invention uses a feed-forward fully-connected neural network with 784 input units. For LPOM, μ_i is simply set to 20. In the experiments, the LPOM solving method and the SGD method are both run for 100 epochs with a batch size of 100. On the CIFAR-10 dataset, similarly to document [25], the invention uses a 3072-4000-1000-4000-10 feed-forward fully-connected neural network. The color images are normalized by subtracting the mean of the red, green, and blue channels; no other pre-processing or data augmentation is used. For the LPOM solving method, μ_i is set in the same way, and both the LPOM solving method and SGD are run for 100 epochs with a batch size of 100.
When comparing with document [2] on the MNIST dataset, the invention uses the same network structures as document [2]. In its actual computations, document [2] uses only the ReLU activation function. As in document [2], the LPOM solving method is run for 17 epochs with a batch size of 100; for LPOM, μ_i is set uniformly on all network structures, and no pre-processing or data augmentation is used in the implementation. When comparing with document [24] on the SVHN dataset [19], the settings of document [24] regarding the network structure and the dataset are followed. For the proposed LPOM solving method, μ_i = 20 is set.
The training and test accuracies of the LPOM solving method and the SGD method on the MNIST dataset are shown in FIGS. 1(a) and (b). It can be seen that the training accuracy of both methods is close to 100%, while the test accuracy obtained by solving LPOM with the method of the invention (98.2%) is slightly better than that of SGD (98.0%). The training and test accuracies of LPOM and SGD on the CIFAR-10 dataset are shown in FIGS. 1(c) and (d). It can be seen that the training accuracy of both methods is close to 100%, while the test accuracy of LPOM (52.5%) is higher than that of SGD (47.5%).
The test accuracies of LPOM solved by the method of the invention and of document [2] on MNIST are shown in Table 1. It can be seen that the results of LPOM are significantly better than those of document [2]. The test accuracies of LPOM, SGD, and document [24] on the SVHN dataset are shown in Table 2. It can be seen that the results of solving LPOM by the method of the invention are better than those of SGD and document [24].
Table 1: comparison of LPOM and literature [2] solved by the method of the invention on MNIST data set
Table 2: Comparison of LPOM solved by the method of the invention, SGD, and document [24] on the SVHN dataset (test accuracy)
SGD: 95.0%
Document [24]: 96.5%
LPOM: 98.3%
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (5)

1. A neural network optimization method by solving lifted proximal operator machines, wherein, in the training of a feed-forward neural network, the lifted proximal operator machine (LPOM) model is solved by a block coordinate descent method; each subproblem of the LPOM model has a convergence guarantee; the weights and the network activations of each layer of the neural network can be updated in parallel; and no additional memory space is occupied;
the objective function of the lifted proximal operator machine (LPOM) model is expressed as formula 1:
$$\min_{\{W_i\},\{X_i\}} \; \ell(X_n, L) + \sum_{i=2}^{n} \mu_i \Big( \mathbf{1}^{\top} f(X_i)\, \mathbf{1} + \mathbf{1}^{\top} g(W_{i-1} X_{i-1})\, \mathbf{1} + \tfrac{1}{2}\, \| X_i - W_{i-1} X_{i-1} \|_F^2 \Big) \qquad (1)$$
where W_{i-1} is the weight of layer i-1 of the neural network; X_i is the activation of layer i, i = 2, …, n; ℓ(X_n, L) is the loss function and n is the number of layers of the neural network; X_1 is the training sample (when i = 2, X_{i-1} is X_1); L is the class labels corresponding to X_1;
$$f(x) = \int_0^x \big(\phi^{-1}(y) - y\big)\, dy, \qquad g(x) = \int_0^x \big(\phi(y) - y\big)\, dy;$$
for matrix inputs, f(x) and g(x) are applied element-wise; φ(x) is the activation function and φ^{-1} is the inverse function of φ; μ_i > 0 is the parameter of the i-th penalty term; 1 is the all-ones column vector; ‖·‖_F is the Frobenius norm; if ℓ(X_n, L) is convex with respect to X_n and φ(x) is monotonically increasing, then LPOM is block-convex with respect to W_i and X_i, i.e., the objective function expressed by formula 1 is convex with respect to W_i and with respect to X_i if the remaining variables remain unchanged;
the neural network optimization method for solving and improving the adjacent computer machines comprises the following steps:
1) randomly selecting m_1 training samples X_1 and their class labels L from the neural network training samples, where m_1 is the batch size and L is the class labels corresponding to the training samples X_1;
2) updating the network activations X_i layer by layer, i = 2, …, n, by performing operations 21) and 22):
21) updating X_i sequentially in the order i = 2, …, n-1, iterating formula 2 until convergence:
$$X_i \leftarrow \phi\Big( W_{i-1} X_{i-1} - \frac{\mu_{i+1}}{\mu_i}\, W_i^{\top} \big( \phi(W_i X_i) - X_{i+1} \big) \Big) \qquad (2)$$
in formula 2, μ_{i+1} is the parameter of the (i+1)-th penalty term;
22) updating X_n by iterating formula 3 until convergence:
$$X_n \leftarrow \phi\Big( W_{n-1} X_{n-1} - \frac{1}{\mu_n} \frac{\partial \ell(X_n, L)}{\partial X_n} \Big) \qquad (3)$$
in formula 3, μ_n is the parameter of the n-th penalty term;
3) updating the network weights W_i, i = 1, …, n-1:
in particular, assume that the function
$$h(Z) = \mathbf{1}^{\top} g(Z)\, \mathbf{1} + \tfrac{1}{2}\, \| X_{i+1} - Z \|_F^2$$
is β-smooth, i.e., the inequality
$$\| \nabla h(Z_1) - \nabla h(Z_2) \|_F = \| \phi(Z_1) - \phi(Z_2) \|_F \le \beta\, \| Z_1 - Z_2 \|_F$$
holds; W_i is then updated by the following procedure:
initialization: W_{i,0}, W_{i,1}, θ_0 = 0, t = 1; where W_{i,0} and W_{i,1} are the initial values for iteratively updating W_i, θ_0 is the initial value of the parameter θ, and t is the number of iterations;
31) computing θ_t:
$$\theta_t = \frac{1 + \sqrt{1 + 4\theta_{t-1}^2}}{2}$$
where θ_t > 0 is the value of the parameter θ at the t-th iteration;
32) computing Y_{i,t}:
$$Y_{i,t} = W_{i,t} + \frac{\theta_{t-1} - 1}{\theta_t} \big( W_{i,t} - W_{i,t-1} \big)$$
where Y_{i,t} denotes the value of Y_i at the t-th iteration;
33) computing W_{i,t+1}:
$$W_{i,t+1} = Y_{i,t} - \frac{1}{\beta} \big( \phi(Y_{i,t} X_i) - X_{i+1} \big)\, X_i^{\dagger}$$
where W_{i,t+1} denotes the value of W_i at iteration t+1 and X_i^† denotes the pseudo-inverse of X_i;
34) adding 1 to the number of iterations: t ← t + 1;
steps 21), 22), and 3) have convergence guarantees;
through the above steps, neural network optimization is realized by the block coordinate descent method for solving lifted proximal operator machines.
2. The neural network optimization method of claim 1, wherein the neural network optimization method performs parallel update of weights and network activations of each layer of the neural network.
3. The neural network optimization method of claim 1, wherein the neural network optimization method is applied to image recognition, speech recognition and natural language processing neural networks.
4. The neural network optimization method of claim 1, wherein the neural network optimization method is applied to image recognition;
using the least-squares loss function
$$\ell(X_n, L) = \tfrac{1}{2}\, \| X_n - L \|_F^2$$
and the ReLU activation function ReLU(x) = max(x, 0), without using any regularization on the weights; the training samples are training images in an image dataset.
5. A neural network optimization method as claimed in claim 4, wherein the image dataset is a MNIST, CIFAR-10 and/or SVHN dataset.
CN201811203464.7A 2018-10-16 2018-10-16 Neural network optimization method by solving lifted proximal operator machines Active CN111062474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811203464.7A CN111062474B (en) 2018-10-16 2018-10-16 Neural network optimization method by solving lifted proximal operator machines

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811203464.7A CN111062474B (en) 2018-10-16 2018-10-16 Neural network optimization method by solving lifted proximal operator machines

Publications (2)

Publication Number Publication Date
CN111062474A true CN111062474A (en) 2020-04-24
CN111062474B CN111062474B (en) 2023-04-28

Family

ID=70296459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811203464.7A Active CN111062474B (en) 2018-10-16 2018-10-16 Neural network optimization method by solving lifted proximal operator machines

Country Status (1)

Country Link
CN (1) CN111062474B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132760A (en) * 2020-09-14 2020-12-25 北京大学 Image recovery method based on learnable differentiable matrix inversion and matrix decomposition
CN112183742A (en) * 2020-09-03 2021-01-05 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN113313175A (en) * 2021-05-28 2021-08-27 北京大学 Image classification method of sparse regularization neural network based on multivariate activation function

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110019693A1 (en) * 2009-07-23 2011-01-27 Sanyo North America Corporation Adaptive network system with online learning and autonomous cross-layer optimization for delay-sensitive applications
CN107784361A (en) * 2017-11-20 2018-03-09 Peking University A lifted proximal operator machine neural network optimization method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110019693A1 (en) * 2009-07-23 2011-01-27 Sanyo North America Corporation Adaptive network system with online learning and autonomous cross-layer optimization for delay-sensitive applications
CN107784361A (en) * 2017-11-20 2018-03-09 Peking University A lifted proximal operator machine neural network optimization method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GEOFFREY HINTON et al.: "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups"
亢良伊; 王建飞; 刘杰; 叶丹: "A survey of parallel and distributed optimization algorithms for scalable machine learning"
李晓宇; 周铭; 袁晓彤; 罗琦; 刘青山: "Parallel estimation of Gaussian graphical model structure based on neighborhood selection with coordinate descent"
李智; 杨洪耕: "A parallel decomposition and coordination algorithm for reactive power optimization based on block coordinate descent"
谢佩; 游科友; 洪奕光; 谢立华: "Research progress on networked distributed convex optimization algorithms"

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183742A (en) * 2020-09-03 2021-01-05 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112183742B (en) * 2020-09-03 2023-05-12 南强智视(厦门)科技有限公司 Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112132760A (en) * 2020-09-14 2020-12-25 北京大学 Image recovery method based on learnable differentiable matrix inversion and matrix decomposition
CN112132760B (en) * 2020-09-14 2024-02-27 北京大学 Image recovery method based on matrix inversion and matrix decomposition capable of learning and differentiating
CN113313175A (en) * 2021-05-28 2021-08-27 北京大学 Image classification method of sparse regularization neural network based on multivariate activation function
CN113313175B (en) * 2021-05-28 2024-02-27 北京大学 Image classification method of sparse regularized neural network based on multi-element activation function

Also Published As

Publication number Publication date
CN111062474B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
Rastegari et al. Xnor-net: Imagenet classification using binary convolutional neural networks
Cortes et al. Adanet: Adaptive structural learning of artificial neural networks
Cai et al. Efficient architecture search by network transformation
Balaji et al. Metareg: Towards domain generalization using meta-regularization
Guberman On complex valued convolutional neural networks
CN110288030B (en) Image identification method, device and equipment based on lightweight network model
Xu et al. Deep neural network compression with single and multiple level quantization
Huang et al. Deep networks with stochastic depth
US12001918B2 (en) Classification using quantum neural networks
Lee et al. Deeply-supervised nets
Godin et al. Dual rectified linear units (DReLUs): A replacement for tanh activation functions in quasi-recurrent neural networks
Singh et al. Layer-specific adaptive learning rates for deep networks
CN111062474B (en) Neural network optimization method by solving lifted proximal operator machines
CN110443372B (en) Transfer learning method and system based on entropy minimization
Li et al. Lifted proximal operator machines
Mosca et al. Deep incremental boosting
US20220215252A1 (en) Method and system for initializing a neural network
Hayou et al. Mean-field behaviour of neural tangent kernel for deep neural networks
Wang et al. Enresnet: Resnet ensemble via the feynman-kac formalism
Wu et al. Steepest descent neural architecture optimization: Escaping local optimum with signed neural splitting
Basheer et al. Alternating layered variational quantum circuits can be classically optimized efficiently using classical shadows
Roth et al. Variational inference in neural networks using an approximate closed-form objective
Wani et al. Training supervised deep learning networks
Wani et al. Supervised deep learning architectures
Chavan et al. A hybrid deep neural network for online learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant