CN111062474A - Neural network optimization method for solving the lifted proximal operator machine - Google Patents
- Publication number
- CN111062474A (application CN201811203464.7A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- optimization method
- lpom
- training
- network optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a neural network optimization method that solves the lifted proximal operator machine (LPOM), and relates to the technical field of deep learning neural network optimization. In the training of a feedforward neural network, the LPOM model is solved by a block coordinate descent method; each subproblem in the LPOM model has a convergence guarantee, the weights and network activations of each layer can be updated in parallel, and no additional memory space is occupied. With the technical scheme of the invention, the parallelism, applicability and training performance of neural network training can be improved while using relatively little storage.
Description
Technical Field
The invention relates to the technical field of deep learning neural network optimization, and in particular to a method for neural network optimization by solving the lifted proximal operator machine (LPOM).
Background
A feedforward deep neural network is composed of hierarchically stacked fully connected layers and has no feedback connections. With recent advances in hardware and data set size, feedforward deep neural networks have become the standard for many tasks, for example image recognition [16], speech recognition [12] and natural language understanding [6], and they are an important component of Go-playing systems [22].
For decades, optimizing a feedforward neural network has meant minimizing an objective function that is highly non-convex and nested with respect to the network weights. The main optimization method is stochastic gradient descent (SGD) [21], whose effectiveness has been verified by its success in a wide range of practical applications. In recent years, many SGD variants have been proposed that use adaptive learning rates or momentum terms, such as Nesterov momentum [23], AdaGrad [8], RMSProp [7] and Adam [15]. SGD and its variants estimate the gradient from a small number of training samples, which keeps the cost of each iteration low. Furthermore, because the estimated gradient is noisy, it helps the iterate escape saddle points [9]. However, these methods also have disadvantages. The main problem is that the magnitude of the gradient decreases or increases exponentially with the number of network layers, causing the gradient to vanish or explode. This leads to slow or unstable convergence and is particularly severe in deeper networks. The problem can be mitigated with non-saturating activation functions such as rectified linear units (ReLUs) and with modified network architectures such as ResNet [11], but the fundamental problem remains [24]. In addition, these methods cannot directly handle non-differentiable activation functions (such as those in binarized neural networks [13]), nor can they update the weights of different layers in parallel.
The shortcomings of SGD have motivated research into new ways of training feedforward neural networks. Recently, training a feedforward neural network has been formulated as a constrained optimization problem in which the network activations are introduced as auxiliary variables and the network structure is enforced by layer-wise constraints [4]. This turns the dependencies of the nested function into equality constraints, which can then be handled with many standard optimization algorithms. Methods of this type differ mainly in how they treat the equality constraints. Document [4] approximates the equality constraints by quadratic penalty terms and alternately optimizes the network weights and activations. Document [25] introduces one additional auxiliary variable per layer and also approximates the equality constraints by quadratic penalty terms. These two approaches therefore either satisfy the equality constraints only approximately or require more auxiliary variables. Inspired by the alternating direction method [17], documents [24] and [27] use the augmented Lagrangian method to enforce the equality constraints exactly. However, both methods involve Lagrange multipliers and nonlinear constraints, which require more memory and make the optimization more difficult. Based on the fact that the ReLU activation function is equivalent to a simple constrained convex optimization problem, document [26] relaxes the nonlinear constraints into penalty terms that encode both the network structure and the ReLU activation function, so the nonlinear constraints no longer exist. However, this method is limited to the ReLU activation function and cannot be used with other activation functions. Document [2] follows a similar idea but considers general monotonically increasing activation functions; nevertheless, its algorithms for updating the weights and activations are still limited to the ReLU function.
Their method can only be used to initialize SGD and cannot surpass the performance of SGD. Patent [1] proposes a new model that approximates the feedforward neural network, called the lifted proximal operator machine (LPOM). LPOM rewrites the activation function as its equivalent proximal operator and adds the proximal operator to the objective function as a penalty term to approximate the feedforward neural network. However, the solving algorithm given in patent [1] does not exploit the fact that LPOM is block-convex with respect to the weights and activations of each layer. Updating the network activations with the alternating direction method introduces a number of auxiliary variables, and it is very difficult to choose a suitable learning rate when updating the weights by gradient descent.
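The statement that an activation function can be rewritten as an equivalent proximal operator can be checked numerically for ReLU. The sketch below is illustrative only and is not taken from the patent: it assumes the standard identity that ReLU(y) is the minimizer of 0.5 * (x - y)^2 over x >= 0, and the helper name prox_by_grid_search and the search grid are hypothetical choices.

```python
import numpy as np

def relu(y):
    return np.maximum(y, 0.0)

def prox_by_grid_search(y, grid):
    """Brute-force minimizer of 0.5 * (x - y)^2 subject to x >= 0.

    The constraint x >= 0 plays the role of the penalty function whose
    proximal operator is assumed here to coincide with ReLU.
    """
    feasible = grid[grid >= 0.0]
    objective = 0.5 * (feasible - y) ** 2
    return feasible[np.argmin(objective)]

# Grid with spacing 0.001 on [-3, 3]; built from integers so that 0.0 is exact.
grid = np.arange(-3000, 3001) / 1000.0
samples = np.array([-2.0, -0.5, 0.0, 0.7, 2.5])
prox_values = np.array([prox_by_grid_search(y, grid) for y in samples])
```

On these sample points the brute-force minimizer agrees with relu(samples), which is the property LPOM relies on when it replaces the activation function by a penalty term.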
Cited documents:
[1] A lifted proximal operator machine neural network optimization method. Chinese patent application 201711156691.4.
[2] Askari, A.; Negiar, G.; Sambharya, R.; and Ghaoui, L. E. 2018. Lifted neural networks. arXiv preprint arXiv:1805.01532.
[3] Beck, A., and Teboulle, M. 2009. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 183–202.
[4] Carreira-Perpinan, M., and Wang, W. 2014. Distributed optimization of deeply nested systems. In International Conference on Artificial Intelligence and Statistics, 10–19.
[5] Clevert, D.-A.; Unterthiner, T.; and Hochreiter, S. 2015. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
[6] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537.
[7] Dauphin, Y.; de Vries, H.; and Bengio, Y. 2015. Equilibrated adaptive learning rates for non-convex optimization. In NIPS, 1504–1512.
[8] Duchi, J.; Hazan, E.; and Singer, Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12:2121–2159.
[9] Ge, R.; Huang, F.; Jin, C.; and Yuan, Y. 2015. Escaping from saddle points - online stochastic gradient for tensor decomposition. In COLT, 797–842.
[10] Glorot, X., and Bengio, Y. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256.
[11] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
[12] Hinton, G.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.-R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T. N.; et al. 2012. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6):82–97.
[13] Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; and Bengio, Y. 2016. Binarized neural networks. In Advances in NIPS, 4107–4115.
[14] Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, 675–678. ACM.
[15] Kingma, D. P., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[16] Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS, 1097–1105.
[17] Lin, Z.; Liu, R.; and Su, Z. 2011. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, 612–620.
[18] Nesterov, Y., ed. 2004. Introductory Lectures on Convex Optimization: A Basic Course. Springer.
[19] Netzer, Y.; Wang, T.; Coates, A.; Bissacco, A.; Wu, B.; and Ng, A. Y. 2011. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, 5.
[20] Parikh, N.; Boyd, S.; et al. 2014. Proximal algorithms. Foundations and Trends in Optimization 1(3):127–239.
[21] Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature 323(6088):533.
[22] Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484.
[23] Sutskever, I.; Martens, J.; Dahl, G.; and Hinton, G. 2013. On the importance of initialization and momentum in deep learning. In ICML, 1139–1147.
[24] Taylor, G.; Burmeister, R.; Xu, Z.; Singh, B.; Patel, A.; and Goldstein, T. 2016. Training neural networks without gradients: A scalable ADMM approach. In ICML, 2722–2731.
[25] Zeng, J.; Ouyang, S.; Lau, T. T.-K.; Lin, S.; and Yao, Y. 2018. Global convergence in deep learning with variable splitting via the Kurdyka-Lojasiewicz property. arXiv preprint arXiv:1803.00225.
[26] Zhang, Z., and Brand, M. 2017. Convergent block coordinate descent for training Tikhonov regularized deep neural networks. In NIPS, 1721–1730.
[27] Zhang, Z.; Chen, Y.; and Saligrama, V. 2016. Efficient training of very deep neural networks for supervised hashing. In CVPR, 1487–1495.
Disclosure of Invention
To overcome the above deficiencies of the prior art, the invention provides a new method for solving the lifted proximal operator machine (LPOM) for training feedforward neural networks. Unlike existing neural network optimization methods, the solution has a convergence guarantee for every subproblem, can update variables in parallel, and uses about the same amount of memory as stochastic gradient descent (SGD) during solving.
For convenience of description, the invention first introduces the LPOM model, as shown in formula 1:
wherein W_{i-1} is the weight of layer i-1 of the network; X_i is the activation of layer i, i = 2, …, n; ℓ(X_n, L) is the loss function; n is the number of layers of the neural network; X_1 is the training sample (when i = 2, X_{i-1} is X_1); L is the class label corresponding to X_1; for matrix inputs, f(x) and g(x) are applied element-wise; φ(x) is the activation function and φ^{-1} is the inverse function of φ; μ_i > 0 is the parameter of the i-th penalty term; 1 is the all-ones column vector; and ||·||_F is the Frobenius norm. If ℓ(X_n, L) is convex with respect to X_n and φ(x) is monotonically increasing, then LPOM is block-convex with respect to W_i and X_i, i.e., the objective function of formula 1 is convex with respect to each W_i and each X_i when the remaining variables are held fixed.
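The image of formula 1 does not survive in this text. The LaTeX below is a reconstruction that matches the symbol definitions above and the objective in the paper by Li et al., "Lifted proximal operator machines", listed under Similar Documents. The integral definitions of f and g are taken from that paper, so the block should be read as an assumption about formula 1 rather than as the patent's verbatim formula.

```latex
\min_{\{W_i\},\{X_i\}} \; \ell(X_n, L)
  + \sum_{i=2}^{n} \mu_i \Big(
      \mathbf{1}^{T} f(X_i)\,\mathbf{1}
    + \mathbf{1}^{T} g(W_{i-1} X_{i-1})\,\mathbf{1}
    + \tfrac{1}{2}\,\lVert X_i - W_{i-1} X_{i-1} \rVert_F^{2}
  \Big),
\qquad
f(x) = \int_{0}^{x} \big(\phi^{-1}(y) - y\big)\,dy, \quad
g(x) = \int_{0}^{x} \big(\phi(y) - y\big)\,dy .
```

Under these assumed definitions, ReLU with φ(x) = max(x, 0) gives g(x) = 0 for x ≥ 0 and g(x) = -x²/2 for x < 0, while f is taken as 0 for x ≥ 0 and +∞ otherwise, i.e. it acts as the constraint X_i ≥ 0.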
The technical scheme provided by the invention is as follows:
A neural network optimization method that solves the lifted proximal operator machine (LPOM): in the training of a feedforward neural network, the LPOM model (formula 1) is solved with a new block coordinate descent method in which every subproblem has a convergence guarantee, variables can be updated in parallel, and no additional memory space is occupied. The method comprises the following steps:
1) Randomly select m_1 training samples X_1 and their labels L from the neural network training samples, where m_1 is the batch size and L contains the class labels corresponding to X_1;
2) Update the network activations X_i layer by layer, i = 1, …, n, by performing operations 21) and 22); the symbols in these steps have the same meaning as in formula 1:
21) Update X_i in the order i = 1, …, n-1, iterating formula 2 until convergence.
In formula 2, μ_i and μ_{i+1} are the parameters of the i-th and (i+1)-th penalty terms, respectively.
22) Update X_n, iterating formula 3 until convergence.
In formula 3, μ_n is the parameter of the n-th penalty term in formula 1.
3) Update the network weights W_i, i = 1, …, n-1.
W_i is updated by the following procedure:
Initialization: W_{i,0}, W_{i,1}, θ_0 = 0, and t = 1, where W_{i,0} and W_{i,1} are the initial values for the iterative update of W_i, θ_0 is the initial value of the parameter θ, and t is the iteration counter.
31) Compute θ_t, where θ_t > 0 denotes the value of the parameter θ at the t-th iteration;
33) Compute W_{i,t+1}, where W_{i,t+1} denotes W_i after the (t+1)-th iterative update and X_i^† denotes the pseudo-inverse of X_i;
34) t ← t + 1;
Steps 21), 22) and 3) each have a convergence guarantee, and together they update the network activations and network weights layer by layer.
In this way, neural network optimization is realized by the block coordinate descent method for solving the lifted proximal operator machine.
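The activation updates in steps 21) and 22) can be sketched as fixed-point iterations. Formulas 2 and 3 do not survive in this text, so the update rules below are assumptions: they are the first-order optimality conditions of an LPOM objective of the form used by Li et al. ("Lifted proximal operator machines"), with a least-squares loss and ReLU. The function names, the stopping tolerance and the synthetic data are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def update_hidden_activation(W_prev, X_prev, W_i, X_i, X_next,
                             mu_i, mu_next, phi=relu,
                             max_iters=100, tol=1e-10):
    """ASSUMED form of formula 2: fixed-point iteration for a hidden X_i."""
    for _ in range(max_iters):
        residual = phi(W_i @ X_i) - X_next
        X_new = phi(W_prev @ X_prev - (mu_next / mu_i) * (W_i.T @ residual))
        converged = np.max(np.abs(X_new - X_i)) < tol
        X_i = X_new
        if converged:
            break
    return X_i

def update_output_activation(W_prev, X_prev, X_n, L, mu_n, phi=relu,
                             max_iters=100, tol=1e-10):
    """ASSUMED form of formula 3 for the loss 0.5 * ||X_n - L||_F^2."""
    for _ in range(max_iters):
        loss_grad = X_n - L  # gradient of the least-squares loss w.r.t. X_n
        X_new = phi(W_prev @ X_prev - loss_grad / mu_n)
        converged = np.max(np.abs(X_new - X_n)) < tol
        X_n = X_new
        if converged:
            break
    return X_n

# Tiny synthetic network: 4 inputs -> 3 hidden units -> 2 outputs, batch of 5.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((4, 5))
W1 = 0.1 * rng.standard_normal((3, 4))
W2 = 0.1 * rng.standard_normal((2, 3))
X2_forward = relu(W1 @ X1)            # ordinary forward pass
X3_forward = relu(W2 @ X2_forward)

# Start from perturbed values; the targets are the forward-pass activations.
X2 = update_hidden_activation(W1, X1, W2, X2_forward + 0.5, X3_forward,
                              mu_i=20.0, mu_next=20.0)
X3 = update_output_activation(W2, X2_forward, np.zeros_like(X3_forward),
                              X3_forward, mu_n=20.0)
```

Only the activation function itself appears in these updates, not its derivative. In this toy setting the forward-pass activations are a fixed point of both iterations, so X2 and X3 return to X2_forward and X3_forward.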
Compared with the prior art, the invention has the beneficial effects that:
The method optimizes a feedforward neural network by solving the lifted proximal operator machine and can be used for specific tasks such as image recognition, speech recognition and natural language understanding. The block coordinate descent method for solving the lifted proximal operator machine improves the parallelism, applicability and training performance of neural network training while using relatively little storage.
Specifically, the method can update the weights and activations of each layer in parallel. In addition, the algorithm uses only the activation function itself and not its derivative, so the vanishing or exploding gradient problem of gradient-based training methods is avoided and the training performance of the neural network can be improved. The method is applicable to a wide range of monotonically increasing, Lipschitz continuous activation functions, which may be saturating and non-differentiable. No auxiliary variables are required other than the activations of each layer, so the memory used is essentially the same as for SGD. Furthermore, the experiments in the embodiments verify the convergence of the algorithm when updating the weights and activations of each layer. Image recognition experiments on the MNIST, CIFAR-10 and SVHN [19] data sets also verify that the algorithm achieves high accuracy when used for neural network optimization.
Drawings
FIG. 1 compares the results of the proposed algorithm for solving LPOM with those of the SGD method on the MNIST and CIFAR-10 data sets.
Detailed Description
The invention is further described below through embodiments and with reference to the accompanying drawings, without limiting its scope in any way.
The invention provides a neural network optimization method that solves the lifted proximal operator machine (LPOM). In the training of a feedforward neural network, the LPOM model is solved with a new block coordinate descent method that guarantees the convergence of every subproblem, allows variables to be updated in parallel, improves the accuracy of neural network training, and occupies no additional memory space. The method can be applied to specific tasks such as image recognition, speech recognition and natural language processing.
The following embodiment uses image recognition as an example and compares the method with the best current results. The method uses the least-squares loss function and the ReLU activation function, ReLU(x) = max(x, 0), and applies no regularization to the weights. The proposed method for solving LPOM uses the same inputs as the SGD method and the random initialization method described in document [10]. The proposed method and SGD are compared on image recognition tasks on three data sets: MNIST, CIFAR-10 and SVHN [19]. For both SGD and LPOM, every training image in a data set is used exactly once per training epoch. Optimizing the training of an image recognition neural network with the method comprises the following steps:
1) Randomly select m_1 training images X_1 and their labels L from the training samples of the image recognition neural network, where m_1 is the batch size, which can be set to a value such as 100 or 256, and L contains the class labels corresponding to X_1; the commonly used MNIST and CIFAR-10 data sets each contain 10 classes;
2) Update the activations X_i of the feedforward neural network layer by layer, i = 1, …, n, by performing operations 21) and 22); the symbols in these steps have the same meaning as in formula 1:
21) Update X_i in the order i = 1, …, n-1, iterating formula 4 for 100 iterations.
22) Update the activation X_n of the feedforward neural network, iterating formula 5 for 100 iterations.
3) Update the weights W_i of the feedforward neural network, i = 1, …, n-1.
For the ReLU activation function, the β-smoothness condition required by the weight update holds with β = 1.
Therefore, W_i can be updated by the following procedure, using a total of 5 iterations:
Initialization: W_{i,0} = W_i, W_{i,1} = W_i, θ_0 = 0, and t = 1; that is, W_{i,0} and W_{i,1} are both initialized to W_i.
33) Compute W_{i,t+1}, where X_i^† denotes the pseudo-inverse of X_i;
34) t ← t + 1;
Steps 21), 22) and 3) each have a convergence guarantee.
In this way, the image recognition neural network is optimized by the block coordinate descent method for solving the lifted proximal operator machine.
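The weight update of step 3) with ReLU, β = 1 and 5 iterations can be sketched as follows. The formulas for θ_t and W_{i,t+1} do not survive in this text, and step 32) is absent, so the sketch fills them with assumptions: a FISTA-style schedule for θ_t in the spirit of document [3], an extrapolation point Y, and a step that multiplies the residual by the pseudo-inverse of X_i and by 1/β. The function name update_weight and the synthetic data are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def update_weight(W_i, X_i, X_next, beta=1.0, num_iters=5, phi=relu):
    """Accelerated update of one layer's weights (assumed form, see lead-in)."""
    W_prev = W_i.copy()        # W_{i,0} = W_i
    W_curr = W_i.copy()        # W_{i,1} = W_i
    theta_prev = 0.0           # theta_0 = 0, as stated in the initialization
    X_pinv = np.linalg.pinv(X_i)
    for _ in range(num_iters):
        # ASSUMPTION: FISTA-style schedule; the patent's formula is not shown.
        theta = (1.0 + np.sqrt(1.0 + 4.0 * theta_prev ** 2)) / 2.0
        # ASSUMPTION: extrapolation step standing in for the missing step 32).
        Y = W_curr + ((theta_prev - 1.0) / theta) * (W_curr - W_prev)
        # ASSUMPTION: residual times pseudo-inverse, scaled by 1 / beta.
        W_next = Y - (phi(Y @ X_i) - X_next) @ X_pinv / beta
        W_prev, W_curr, theta_prev = W_curr, W_next, theta
    return W_curr

# Synthetic check: if X_next is exactly relu(W_true @ X_i), W_true is a
# fixed point of the update, because the residual vanishes at every step.
rng = np.random.default_rng(0)
X_i = rng.standard_normal((4, 50))
W_true = 0.5 * rng.standard_normal((3, 4))
X_next = relu(W_true @ X_i)
W_fixed = update_weight(W_true, X_i, X_next)
```

Using the pseudo-inverse of X_i instead of a gradient step removes the need to choose a learning rate. Whether this matches the patent's formula exactly cannot be confirmed from the text.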
Specifically, on the MNIST data set, the 784 raw pixels are used as the input to both the proposed LPOM solver and SGD. The data set contains 60,000 training images and 10,000 test images. No pre-processing or data augmentation is used. As in document [25], a fully connected feedforward neural network with a 784-[…] structure is used. For LPOM, μ_i is simply set to 20. In the experiment, the LPOM solver and SGD are each run for 100 epochs with a batch size of 100. On the CIFAR-10 data set, as in document [25], a 3072-4000-1000-4000-10 fully connected feedforward neural network is used. The color images are normalized by subtracting the means of the red, green and blue channels; no other pre-processing or data augmentation is used. For the LPOM solver, μ_i is set to […]; the LPOM solver and SGD are each run for 100 epochs with a batch size of 100.
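The CIFAR-10 pre-processing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the channel means are computed over all images and pixels of the array passed in, images are stored as (num_images, height, width, 3), and samples are arranged as columns so that X_1 can be multiplied as W_1 X_1. The helper names and the random stand-in data are hypothetical.

```python
import numpy as np

def subtract_channel_means(images):
    """Subtract the mean of each color channel from every pixel.

    images: float array of shape (num_images, height, width, 3).
    """
    channel_means = images.mean(axis=(0, 1, 2), keepdims=True)  # (1, 1, 1, 3)
    return images - channel_means

def to_column_matrix(images):
    """Flatten each image into one column: result has shape (pixels, num_images)."""
    return images.reshape(images.shape[0], -1).T

# Synthetic stand-in for a batch of 32x32 RGB images (not real CIFAR-10 data).
rng = np.random.default_rng(0)
fake_batch = rng.uniform(0.0, 255.0, size=(8, 32, 32, 3))
centered = subtract_channel_means(fake_batch)
X1 = to_column_matrix(centered)

# Layer widths of the 3072-4000-1000-4000-10 network named in the text.
layer_sizes = [3072, 4000, 1000, 4000, 10]
```

Each 32x32x3 image flattens to 3072 values, which matches the input width of the network.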
For the comparison with document [2] on the MNIST data set, the same network structures as in document [2] are used. In its actual computations, document [2] uses only the ReLU activation function. As in document [2], the LPOM solver is run for 17 epochs with a batch size of 100. For LPOM, μ_i is set to […] on all network structures. No pre-processing or data augmentation is used. For the comparison with document [24] on the SVHN data set [19], the network structure and data set settings of document [24] are followed. For the proposed LPOM solver, μ_i = 20.
The training and test accuracies of the LPOM solver and the SGD method on the MNIST data set are shown in FIG. 1(a) and (b). The training accuracy of both methods is close to 100%, but the test accuracy obtained by solving LPOM with the method of the invention (98.2%) is slightly better than that of SGD (98.0%). The training and test accuracies of LPOM and SGD on the CIFAR-10 data set are shown in FIG. 1(c) and (d). The training accuracy of both methods is again close to 100%, but the test accuracy of LPOM (52.5%) is higher than that of SGD (47.5%).
The test accuracies on MNIST of LPOM solved by the method of the invention and of document [2] are shown in Table 1; the results of LPOM are significantly better than those of document [2]. The test accuracies on the SVHN data set of LPOM, SGD and document [24] are shown in Table 2; the results obtained by solving LPOM with the method of the invention are better than those of SGD and document [24].
Table 1: Comparison on the MNIST data set of LPOM solved by the method of the invention and document [2]
Table 2: Comparison on the SVHN data set of LPOM solved by the method of the invention, SGD and document [24]
Method | Test accuracy |
---|---|
SGD | 95.0% |
Document [24] | 96.5% |
LPOM | 98.3% |
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.
Claims (5)
1. A neural network optimization method for solving the lifted proximal operator machine, wherein, in the training of a feedforward neural network, the lifted proximal operator machine (LPOM) model is solved by a block coordinate descent method, each subproblem in the LPOM model has a convergence guarantee, the weights and network activations of each layer of the neural network can be updated in parallel, and no additional memory space is occupied;
the objective function of the LPOM model is expressed as formula 1:
wherein W_{i-1} is the weight of layer i-1 of the neural network; X_i is the network activation of layer i, i = 2, …, n; ℓ(X_n, L) is the loss function; n is the number of layers of the neural network; X_1 is the training sample, so that X_{i-1} is X_1 when i = 2; L is the class label corresponding to X_1; for matrix inputs, f(x) and g(x) are applied element-wise; φ(x) is the activation function and φ^{-1} is the inverse function of φ; μ_i > 0 is the parameter of the i-th penalty term; 1 is the all-ones column vector; ||·||_F is the Frobenius norm; if ℓ(X_n, L) is convex with respect to X_n and φ(x) is monotonically increasing, then LPOM is block-convex with respect to W_i and X_i, i.e., the objective function expressed by formula 1 is convex with respect to each W_i and each X_i when the remaining variables are held fixed;
the neural network optimization method for solving the lifted proximal operator machine comprises the following steps:
1) randomly selecting m_1 training samples X_1 and their labels L from the neural network training samples, wherein m_1 is the batch size and L contains the class labels corresponding to X_1;
2) updating the network activations X_i layer by layer, i = 1, …, n, by performing operations 21) and 22):
21) updating X_i in the order i = 1, …, n-1, iterating formula 2 until convergence:
in formula 2, μ_{i+1} is the parameter of the (i+1)-th penalty term;
22) updating X_n, iterating formula 3 until convergence:
in formula 3, μ_n is the parameter of the n-th penalty term;
3) updating the network weights W_i, i = 1, …, n-1:
specifically, β-smoothness is assumed, i.e., the corresponding inequality holds; W_i is updated by the following procedure:
initialization: W_{i,0}, W_{i,1}, θ_0 = 0, and t = 1; wherein W_{i,0} and W_{i,1} are the initial values for the iterative update of W_i, θ_0 is the initial value of the parameter θ, and t is the iteration counter;
31) computing θ_t, wherein θ_t > 0 denotes the value of the parameter θ at the t-th iteration;
33) computing W_{i,t+1}, wherein W_{i,t+1} denotes W_i after the (t+1)-th iterative update and X_i^† denotes the pseudo-inverse of X_i;
34) incrementing the iteration counter by 1: t ← t + 1;
wherein steps 21), 22) and 3) each have a convergence guarantee;
through the above steps, neural network optimization is realized by the block coordinate descent method for solving the lifted proximal operator machine.
2. The neural network optimization method of claim 1, wherein the weights and network activations of each layer of the neural network are updated in parallel.
3. The neural network optimization method of claim 1, wherein the method is applied to neural networks for image recognition, speech recognition and natural language processing.
4. The neural network optimization method of claim 1, wherein the method is applied to image recognition.
5. The neural network optimization method of claim 4, wherein the image data set is the MNIST, CIFAR-10 and/or SVHN data set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811203464.7A CN111062474B (en) | 2018-10-16 | 2018-10-16 | Neural network optimization method for solving the lifted proximal operator machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111062474A (en) | 2020-04-24 |
CN111062474B (en) | 2023-04-28 |
Family
ID=70296459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811203464.7A Active CN111062474B (en) | Neural network optimization method for solving the lifted proximal operator machine | 2018-10-16 | 2018-10-16 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111062474B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112132760A (en) * | 2020-09-14 | 2020-12-25 | 北京大学 | Image recovery method based on learnable differentiable matrix inversion and matrix decomposition |
CN112183742A (en) * | 2020-09-03 | 2021-01-05 | 南强智视(厦门)科技有限公司 | Neural network hybrid quantization method based on progressive quantization and Hessian information |
CN113313175A (en) * | 2021-05-28 | 2021-08-27 | 北京大学 | Image classification method of sparse regularization neural network based on multivariate activation function |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110019693A1 (en) * | 2009-07-23 | 2011-01-27 | Sanyo North America Corporation | Adaptive network system with online learning and autonomous cross-layer optimization for delay-sensitive applications |
CN107784361A (en) * | 2017-11-20 | 2018-03-09 | 北京大学 | A lifted proximal operator machine neural network optimization method |
Non-Patent Citations (5)
Title |
---|
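Geoffrey Hinton et al.: "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups" * |
亢良伊; 王建飞; 刘杰; 叶丹: "A survey of parallel and distributed optimization algorithms for scalable machine learning" * |
李晓宇; 周铭; 袁晓彤; 罗琦; 刘青山: "Parallel structure estimation of Gaussian graphical models based on coordinate descent neighborhood selection" * |
李智; 杨洪耕: "A parallel decomposition and coordination algorithm for reactive power optimization based on block coordinate descent" * |
谢佩; 游科友; 洪奕光; 谢立华: "Research progress on networked distributed convex optimization algorithms" * |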
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183742A (en) * | 2020-09-03 | 2021-01-05 | 南强智视(厦门)科技有限公司 | Neural network hybrid quantization method based on progressive quantization and Hessian information |
CN112183742B (en) * | 2020-09-03 | 2023-05-12 | 南强智视(厦门)科技有限公司 | Neural network hybrid quantization method based on progressive quantization and Hessian information |
CN112132760A (en) * | 2020-09-14 | 2020-12-25 | 北京大学 | Image recovery method based on learnable differentiable matrix inversion and matrix decomposition |
CN112132760B (en) * | 2020-09-14 | 2024-02-27 | 北京大学 | Image recovery method based on matrix inversion and matrix decomposition capable of learning and differentiating |
CN113313175A (en) * | 2021-05-28 | 2021-08-27 | 北京大学 | Image classification method of sparse regularization neural network based on multivariate activation function |
CN113313175B (en) * | 2021-05-28 | 2024-02-27 | 北京大学 | Image classification method of sparse regularized neural network based on multi-element activation function |
Also Published As
Publication number | Publication date |
---|---|
CN111062474B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rastegari et al. | Xnor-net: Imagenet classification using binary convolutional neural networks | |
Cortes et al. | Adanet: Adaptive structural learning of artificial neural networks | |
Cai et al. | Efficient architecture search by network transformation | |
Balaji et al. | Metareg: Towards domain generalization using meta-regularization | |
Guberman | On complex valued convolutional neural networks | |
CN110288030B (en) | Image identification method, device and equipment based on lightweight network model | |
Xu et al. | Deep neural network compression with single and multiple level quantization | |
Huang et al. | Deep networks with stochastic depth | |
US12001918B2 (en) | Classification using quantum neural networks | |
Lee et al. | Deeply-supervised nets | |
Godin et al. | Dual rectified linear units (DReLUs): A replacement for tanh activation functions in quasi-recurrent neural networks | |
Singh et al. | Layer-specific adaptive learning rates for deep networks | |
CN111062474B (en) | Neural network optimization method for solving the lifted proximal operator machine | |
CN110443372B (en) | Transfer learning method and system based on entropy minimization | |
Li et al. | Lifted proximal operator machines | |
Mosca et al. | Deep incremental boosting | |
US20220215252A1 (en) | Method and system for initializing a neural network | |
Hayou et al. | Mean-field behaviour of neural tangent kernel for deep neural networks | |
Wang et al. | Enresnet: Resnet ensemble via the feynman-kac formalism | |
Wu et al. | Steepest descent neural architecture optimization: Escaping local optimum with signed neural splitting | |
Basheer et al. | Alternating layered variational quantum circuits can be classically optimized efficiently using classical shadows | |
Roth et al. | Variational inference in neural networks using an approximate closed-form objective | |
Wani et al. | Training supervised deep learning networks | |
Wani et al. | Supervised deep learning architectures | |
Chavan et al. | A hybrid deep neural network for online learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||