CN112906861A - Neural network optimization method and device - Google Patents


Info

Publication number
CN112906861A
CN112906861A
Authority
CN
China
Prior art keywords
neural network
target
preset
network model
solution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110163918.8A
Other languages
Chinese (zh)
Inventor
李卫军
王国俊
孙琳钧
张丽萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Semiconductors of CAS
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS filed Critical Institute of Semiconductors of CAS
Priority to CN202110163918.8A priority Critical patent/CN112906861A/en
Publication of CN112906861A publication Critical patent/CN112906861A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a neural network optimization method and device. The method comprises the following steps: inputting preset random noise data into a preset neural network model and obtaining a solution of a target transcendental equation; obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and the solution of the target transcendental equation, and updating the network parameters of the preset neural network model with a back propagation algorithm according to the loss value, so as to optimize the preset neural network model. The device is used for executing the method. When the optimized neural network model is used to solve a transcendental equation, no specific initialization of the solution parameters is required and no dedicated network structure needs to be designed for each target equation, so the applicability is wide; at the same time, the precision and stability of solving the transcendental equation are improved.

Description

Neural network optimization method and device
Technical Field
The invention relates to the technical field of artificial neural networks, in particular to a neural network optimization method and device.
Background
In engineering applications, transcendental equations are commonly solved with neural-network-based methods. In the method of S. K. Jeswal, a shallow neural network is designed and its parameters are updated directly with a back propagation algorithm to obtain the solution of the transcendental equation. For example, when solving the transcendental equation x^2 - ln(x) - 1 = 0, the network structure of the Jeswal method is shown in Fig. 1.
The Jeswal network treats the weights of the connections between the input layer and the first hidden layer as the solution of the transcendental equation, so the neurons in the subsequent layers can be determined by the individual terms of the equation. Each specific term in the equation, together with its precedence, determines the specific network structure. The weights of all layers except the first are determined by the coefficients of the target transcendental equation.
In the forward pass, given an initial estimate x0 of the solution, the network produces the output f(x0):

f(x0) = x0^2 - ln(x0) - 1    (1)
The network output f(x0) is passed through a tanh activation, and the error between tanh(f(x0)) and 0 is taken as the loss value of the network:

loss = tanh(f(x0)) - 0    (2)
In the back propagation, the parameter of the network, namely the solution x of the equation, is updated by gradient descent:

x = x0 - η · ∂loss/∂x0    (3)
Although the Jeswal network can find solutions of transcendental equations, performing gradient descent directly on the solution parameter often leads to slow convergence and poor stability. In addition, the Jeswal network does not remove the traditional methods' strict requirement on the initial estimate of the solution: the parameters of the Jeswal network must still be given particular initial values, which limits the use of the model in practical problems.
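For concreteness, the prior-art idea can be sketched as follows in PyTorch, treating the solution itself as the single trainable parameter. The example equation, the learning rate, and the starting point x0 = 2.0 are illustrative assumptions; the signed loss tanh(f(x0)) - 0 of equation (2) is squared here so that gradient descent drives it toward zero rather than to negative infinity.

import torch

# Sketch of the Jeswal-style prior art: the solution x is itself the
# trainable parameter, updated by direct gradient descent.
x = torch.tensor([2.0], requires_grad=True)  # initial estimate x0, must be chosen carefully
optimizer = torch.optim.SGD([x], lr=0.05)

for step in range(2000):
    f_x = x ** 2 - torch.log(x) - 1.0        # f(x) = x^2 - ln(x) - 1, the example equation
    loss = torch.tanh(f_x).pow(2).sum()      # squared form of tanh(f(x0)) - 0, eq. (2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # x <- x - lr * dloss/dx, cf. eq. (3)

print(x.item())  # approaches the root x = 1 only from a suitable starting point

Note how the tanh saturates for a poor starting point, so the gradient nearly vanishes; this is one source of the slow convergence described above.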
Disclosure of Invention
The neural network optimization method provided by the invention overcomes the above problems in the prior art: the transcendental equation is solved with the optimized neural network model, and the precision and stability of solving the transcendental equation are improved.
The invention provides a neural network optimization method, which comprises the following steps:
inputting preset random noise data into a preset neural network model, and obtaining a solution of a target transcendental equation;
and obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of the target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
According to the neural network optimization method provided by the invention, the target loss function is determined by the following method:
determining the target loss function according to a mean square loss function between an output value, obtained after the solution of the target transcendental equation is input to a target discriminator, and a preset true value;
wherein the target discriminator is determined by the target transcendental equation.
According to the neural network optimization method provided by the invention,
updating the network parameters of the preset neural network model based on a back propagation algorithm according to the loss value to optimize the neural network model, including:
iteratively updating the network parameters of the preset neural network model based on a back propagation algorithm according to the loss value, and stopping updating the network parameters when a preset network convergence condition is met;
and optimizing the preset neural network model according to the updated network parameters.
According to the neural network optimization method provided by the invention, the preset network convergence condition comprises the following steps:
the loss value is smaller than a preset error value; or
The iteration times reach the preset maximum iteration times.
According to the neural network optimization method provided by the invention, the iteratively updating the network parameters of the preset neural network model comprises the following steps:
updating the network parameters according to the network parameters after the previous iteration, a preset learning rate and the values of the target partial derivatives of the network parameters after the previous iteration;
and determining the target partial derivative according to the partial derivative of the target loss function to the network parameter.
According to a neural network optimization method provided by the invention, the preset neural network model comprises the following steps:
a fully-connected neural network or a convolutional neural network.
The present invention also provides a neural network optimization apparatus, including: an acquisition module and an optimization module;
the acquisition module is used for inputting random noise data into a preset neural network model and acquiring a solution of a target transcendental equation;
the optimization module is used for obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of the target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
The neural network optimization device provided by the invention further comprises:
a loss function determination module for determining the target loss function according to a mean square loss function between an output value obtained by inputting the solution of the target transcendental equation to a target discriminator and a preset true value;
wherein the target discriminator is determined by the target transcendental equation.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the neural network optimization method as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the neural network optimization method as described in any one of the above.
With the neural network optimization method and device provided by the invention, solving a transcendental equation with the optimized neural network model requires no specific initialization of the solution parameters and no dedicated network structure designed for each target equation, so the applicability is wide; at the same time, the precision and stability of solving the transcendental equation are improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a network architecture diagram of a prior-art Jeswal neural network solving a target transcendental equation;
FIG. 2 is a schematic flow chart of a neural network optimization method provided by the present invention;
FIG. 3 is a schematic diagram of a neural network model structure provided by the present invention;
FIG. 4 is a schematic structural diagram of a neural network optimization device provided by the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 2 is a schematic flow chart of a neural network optimization method provided by the present invention, and as shown in fig. 2, the method includes:
s1, inputting preset random noise data into a preset neural network model, and obtaining a solution of a target transcendental equation;
s2, obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of a target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
It should be noted that the execution subject of the method may be an electronic device, a component in an electronic device, an integrated circuit, or a chip. The electronic device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a personal computer (personal computer, PC), a Television (TV), a teller machine, a self-service machine, and the like, which is not limited in this respect.
The method skillfully exploits the strong nonlinear fitting capability and generative capability of classical neural networks to solve transcendental equations. Compared with other neural network methods, this new approach converges faster and more stably, and it offers a new reference for solving other nonlinear systems. In addition, compared with prior-art transcendental equation solvers, the method is more general: the solution parameters in the neural network model can be initialized randomly, and no specific network structure has to be designed for a specific equation.
Specifically, the invention directly generates the solution of the target transcendental equation instead of treating the solution as an unknown parameter to be updated.
Following the generative idea of the Generative Adversarial Network (GAN), the preset neural network model in the invention mainly comprises two parts: a generation network formed by a fully connected neural network or a convolutional neural network, and a target discriminator formed by the mathematical expression of the target transcendental equation. The structure of the preset neural network model provided by the invention is shown in Fig. 3.
A GAN is an unsupervised learning method in which two neural networks play a game against each other; it was proposed by Ian Goodfellow et al. in 2014. A GAN consists of a generation network and a discrimination network. The generation network takes random samples from a latent space as input, and its output should mimic the real samples in the training set as closely as possible. The discrimination network takes a real sample or the output of the generation network as input and tries to distinguish the two as well as possible, while the generation network tries to fool the discrimination network. The two networks contend with each other and continually adjust their parameters; the final goal is for the discrimination network to be unable to tell whether the output of the generation network is real.
In the method, the collected random noise data are input into the generation network as raw data to generate a solution a of the target transcendental equation; the loss value of the preset neural network model is then computed from the target loss function of the preset neural network model and the solution of the target transcendental equation, and the network parameters of the generation network are updated with a back propagation algorithm to optimize the preset neural network model. Finally, the solution produced by the generation network is taken as the solution of the transcendental equation.
Back propagation (BP), short for "error back propagation", is a common method used together with an optimization method (such as gradient descent) to train artificial neural networks. It computes the gradient of the loss function with respect to all weights in the network; this gradient is fed to the optimization method, which uses it to update the weights so as to minimize the loss function.
When solving a transcendental equation whose solution must lie in a given domain, only the neurons of the last layer of the generation network need to be constrained accordingly, an operation that other neural network methods cannot realize. A sketch of such a constraint follows.
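As an illustration of this last-layer constraint, the following sketch bounds the generated solution to an assumed domain by rescaling a sigmoid on the final neuron; the layer sizes and the bounds 0.5 and 2.0 are hypothetical.

import torch
import torch.nn as nn

# Hypothetical sketch: constraining only the final activation of the generation
# network so that every generated solution lies in the given domain (lo, hi).
class BoundedGenerator(nn.Module):
    def __init__(self, lo: float, hi: float):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
        self.lo, self.hi = lo, hi

    def forward(self, z):
        # sigmoid squashes the raw output to (0, 1); rescaling maps it into (lo, hi)
        return self.lo + (self.hi - self.lo) * torch.sigmoid(self.body(z))

gen = BoundedGenerator(lo=0.5, hi=2.0)  # assumed domain for illustration
print(gen(torch.randn(4, 1)))           # four candidate solutions, all inside (0.5, 2.0)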
The optimization objective function of the preset neural network model in the invention is as follows:
θ = arg min_θ function(Discriminator(Generator(D, θ)), 0)
the solution x of the target transcendental equation is represented by a large number of parameters in a generated network in a preset neural network model, and the parameterized representation of the solution enables the generated network to select a better optimization direction according to the gradient direction in the process of generating the solution, so that the convergence speed and stability of the technology in solving the transcendental equation are improved, and the parameters in the generated network can be randomly initialized, so that specific initialization operation is not needed in the invention, and the universality of the technology is further improved.
The neural network optimization method provided by the invention thus omits the specific initialization of solution parameters when the transcendental equation is solved with the optimized neural network model, and no dedicated network structure needs to be designed for each target equation, so the method is widely applicable; at the same time, the precision and stability of solving the transcendental equation with the optimized neural network model are improved.
Further, in one embodiment, the target loss function is determined by:
determining a target loss function according to a mean square loss function between an output value obtained after the solution of the target transcendental equation is input to the target discriminator and a preset true value;
wherein the target discriminator is determined by the target transcendental equation. Specifically, the collected random noise data are input into the generation network as raw data to generate a solution a of the target transcendental equation; the target discriminator then evaluates the generated solution, i.e. the solution a of the target transcendental equation is input into the target discriminator to obtain an output value, and the target loss function is determined from a mean square loss function between this output value and a preset true value.
For example, 100 random noise samples D are drawn from a one-dimensional space and used as the input data of the preset neural network model. To solve a target transcendental equation f(x) = 0, the random noise data D are input into the generation network of the preset neural network model, which generates a solution a of the transcendental equation:
a=Generator(D,θ) (4)
where θ denotes the network parameters of the generation network and Generator is the generation network of the preset neural network model. The solution a generated by the generation network is then input to the target discriminator composed of the mathematical expression f(x) of the transcendental equation to obtain f(a), and an error value between f(a) and the preset true value 0 is computed with a predefined loss function, such as the mean square loss function:
f(a)=Discriminator(a) (5)
loss=function(f(a),0) (6)
where Discriminator is the target discriminator, function is the mean square loss function, and loss is the target loss function of the preset neural network model.
After the loss value loss of the preset neural network model has been computed, the network parameters θ of the generation network in the preset neural network model are updated with a back propagation algorithm to optimize the preset neural network model.
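The embodiment above maps directly onto a short training loop. The sketch below is an assumed PyTorch rendering of equations (4)-(6): the generator architecture, learning rate, iteration count, and the softplus output (which keeps the generated solution inside the domain x > 0 of ln(x)) are illustrative choices, not prescribed by the invention.

import torch
import torch.nn as nn

# Assumed sketch of equations (4)-(6): a small generation network maps the
# noise D to a candidate solution a, and the target discriminator is simply f.
generator = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 1), nn.Softplus(),     # softplus keeps a > 0 so that ln(a) is defined
)
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

def discriminator(a):
    # target transcendental equation f(x) = x^2 - ln(x) - 1 (the background example)
    return a ** 2 - torch.log(a) - 1.0

D = torch.randn(100, 1)                  # 100 preset random noise samples

for step in range(5000):
    a = generator(D)                     # a = Generator(D, theta), eq. (4)
    f_a = discriminator(a)               # f(a) = Discriminator(a), eq. (5)
    loss = torch.mean((f_a - 0.0) ** 2)  # mean square loss against the true value 0, eq. (6)
    optimizer.zero_grad()
    loss.backward()                      # back propagation through the generation network
    optimizer.step()

print(a.mean().item())                   # generated solutions cluster near a root of f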
In the neural network optimization method provided by the invention, the target discriminator of the preset neural network model is determined by the target transcendental equation to be solved, the target loss function of the preset neural network model is obtained by combining it with the mean square loss function, and the network parameters of the generation network are trained continuously with the resulting target discriminator, so that the solution produced by the optimized preset neural network model is closer to the true solution of the target transcendental equation.
Further, in an embodiment, the updating the network parameters of the preset neural network model based on the back propagation algorithm according to the loss value in step S2 to optimize the neural network model specifically includes:
s21, iteratively updating the network parameters of the preset neural network model based on a back propagation algorithm according to the loss value, and stopping updating the network parameters when the preset network convergence condition is met;
and S22, optimizing the preset neural network model according to the updated network parameters.
Further, in an embodiment, the presetting of the network convergence condition may specifically include:
the loss value is smaller than a preset error value; or
The iteration times reach the preset maximum iteration times.
Specifically, network parameters of a preset neural network model are updated iteratively based on a back propagation algorithm according to the loss value, and updating of the network parameters is stopped when preset network convergence conditions are met;
For example, the maximum iteration number of the preset neural network model is set to M (a positive integer), with iteration step 1 and the iteration counter initialized to m = 0; after each iteration, the counter is updated as m = m + 1.
The counter m is compared with M: if m > M, the preset neural network model is deemed converged, and updating of the network parameters of the generation network stops.
Alternatively, with the preset error value denoted σ, if the loss value satisfies loss < σ, the preset neural network model is deemed converged, and updating of the network parameters of the generation network stops.
The optimization of the preset neural network model is then completed with the updated network parameters.
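The two stopping tests can be written out directly; the sketch below reuses the assumed generator, discriminator and optimizer shapes from the earlier training-loop sketch, and the values of σ and M are illustrative.

import torch
import torch.nn as nn

# Assumed setup, mirroring the earlier training-loop sketch.
generator = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1), nn.Softplus())
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)
f = lambda a: a ** 2 - torch.log(a) - 1.0
D = torch.randn(100, 1)

sigma, M = 1e-8, 10000   # preset error value sigma and maximum iteration number M (assumed)
m = 0                    # iteration counter, initially m = 0

while True:
    loss = torch.mean((f(generator(D)) - 0.0) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    m = m + 1            # m = m + 1 after every iteration
    if loss.item() < sigma or m > M:  # stop: loss below sigma, or iterations exceed M
        break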
In the neural network optimization method provided by the invention, the network parameters of the preset neural network model are updated with a back propagation algorithm according to the target loss function, and the updated network parameters are used to iteratively train the preset neural network model until the network converges, improving the accuracy with which the preset neural network model solves the target transcendental equation.
Further, in an embodiment, iteratively updating the network parameters of the preset neural network model may specifically include:
updating the network parameters according to the network parameters after the previous iteration, the preset learning rate and the values of the target partial derivatives of the network parameters after the previous iteration;
and determining a target partial derivative according to the partial derivative of the target loss function to the network parameter.
Specifically, the network parameters θ_(t-1) after the previous iteration, the preset learning rate η, and the value at θ_(t-1) of the partial derivative of the target loss function with respect to the network parameters are substituted into formula (7) to update the network parameters of the generation network in the preset neural network model:

θ_t = θ_(t-1) - η · ∂loss/∂θ_(t-1)    (7)
It should be noted that the invention can also optimize the network parameters of the preset neural network model with the Adam algorithm.
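Written out by hand, the update of formula (7) is one line per parameter tensor. The sketch below is an assumed PyTorch rendering with a stand-in loss; the built-in torch.optim.SGD and torch.optim.Adam optimizers perform the same step (or Adam's adaptive variant of it).

import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))  # assumed shapes
D = torch.randn(100, 1)
eta = 1e-3                            # preset learning rate eta

loss = torch.mean(generator(D) ** 2)  # stand-in loss for illustration only
loss.backward()                       # fills theta.grad with dloss/dtheta

with torch.no_grad():
    for theta in generator.parameters():
        theta -= eta * theta.grad     # theta_t = theta_(t-1) - eta * dloss/dtheta, eq. (7)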
With the network parameters determined by this iterative updating method, the neural network optimization method provided by the invention greatly improves the precision, convergence speed and stability of solving the target transcendental equation compared with other neural network methods and traditional methods. In addition, the network parameters are initialized randomly, i.e. the initial solution of the target transcendental equation is random, which further improves the generality of the technique.
Further, in an embodiment, the presetting of the neural network model may specifically include:
a fully-connected neural network or a convolutional neural network.
Specifically, the generation network in the preset neural network model of the invention is a fully-connected neural network or a convolutional neural network. The convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons respond to part of the surrounding units within their receptive field; it performs excellently in large-scale image processing. A convolutional neural network consists of one or more convolutional layers and a fully connected top layer (corresponding to the classical neural network), together with associated weights and pooling layers.
In a fully-connected neural network, each neuron takes the outputs of all neurons in the previous layer as input and feeds its own output to every neuron in the next layer, so every pair of neurons in adjacent layers has a connection weight; the Multilayer Perceptron (MLP) is a typical example. Both choices are illustrated in the sketch below.
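As an illustration of the two admissible architectures, the following sketch defines a fully-connected (MLP) generator and a convolutional one; all layer sizes and kernel widths are assumptions.

import torch.nn as nn

# Assumed sketch of the two generation-network choices named above.
mlp_generator = nn.Sequential(       # fully connected: every pair of neurons in
    nn.Linear(1, 64), nn.ReLU(),     # adjacent layers shares a connection weight
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

conv_generator = nn.Sequential(      # convolutional: each neuron responds to a
    nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),  # local window of the input
    nn.Conv1d(8, 1, kernel_size=3, padding=1),
    nn.Flatten(),
    nn.LazyLinear(1),                # project the flattened features to a scalar solution
)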
In the neural network optimization method provided by the invention, a fully-connected neural network or a convolutional neural network is selected as the generation network of the preset neural network model, no initialization processing of the network parameters of the generation network is needed, and the traditional methods' strict requirement on the initial estimate when solving the target transcendental equation is removed.
The following describes the neural network optimization device provided by the present invention, and the neural network optimization device described below and the neural network optimization method described above may be referred to correspondingly.
Fig. 4 is a schematic structural diagram of a neural network optimization device provided in the present invention, as shown in fig. 4, including: an acquisition module 410 and an optimization module 420;
the obtaining module 410 is configured to input random noise data to a preset neural network model, and obtain a solution of a target transcendental equation;
and the optimization module 420 is configured to obtain a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of the target transcendental equation, and update a network parameter of the preset neural network model based on a back propagation algorithm according to the loss value, so as to optimize the preset neural network model.
The neural network optimization device provided by the invention omits the specific initialization of solution parameters when the transcendental equation is solved with the optimized neural network model, and no dedicated network structure needs to be designed for each target equation, so the device is widely applicable; at the same time, the precision and stability of solving the transcendental equation with the optimized neural network model are improved.
Further, in one embodiment, the apparatus further comprises:
a loss function determining module 430, configured to determine a target loss function according to a mean square loss function between an output value obtained by inputting the solution of the target transcendental equation to the target discriminator and a preset true value;
wherein the target discriminator is determined by the target transcendental equation.
In the neural network optimization device provided by the invention, the target discriminator of the preset neural network model is determined by the target transcendental equation to be solved, the target loss function of the preset neural network model is obtained by combining it with the mean square loss function, and the network parameters of the generation network are trained continuously with the resulting target discriminator, so that the solution produced by the optimized preset neural network model is closer to the true solution of the target transcendental equation.
Fig. 5 is a schematic physical structure diagram of an electronic device provided in the present invention, and as shown in fig. 5, the electronic device may include: a processor (processor)510, a communication interface (communication interface)511, a memory (memory)512 and a bus (bus)513, wherein the processor 510, the communication interface 511 and the memory 512 complete mutual communication through the bus 513. Processor 510 may call logic instructions in memory 512 to perform the following method:
inputting preset random noise data into a preset neural network model, and obtaining a solution of a target transcendental equation;
and obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of a target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
Further, the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the neural network optimization method provided by the above-mentioned method embodiments, for example, comprising:
inputting preset random noise data into a preset neural network model, and obtaining a solution of a target transcendental equation;
and obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of a target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
In another aspect, the present invention also provides a non-transitory computer readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the neural network optimization method provided in the foregoing embodiments, for example, including:
inputting preset random noise data into a preset neural network model, and obtaining a solution of a target transcendental equation;
and obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of a target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A neural network optimization method, comprising:
inputting preset random noise data into a preset neural network model, and obtaining a solution of a target transcendental equation;
and obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of the target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
2. The neural network optimization method of claim 1, wherein the objective loss function is determined by:
determining the target loss function according to a mean square loss function between an output value obtained after the solution of the target transcendental equation is input to a target discriminator and a preset real value;
wherein the target discriminator is determined by the target transcendental equation.
3. The neural network optimization method according to claim 2, wherein the updating the network parameters of the preset neural network model based on a back propagation algorithm according to the loss values to optimize the neural network model comprises:
iteratively updating the network parameters of the preset neural network model based on a back propagation algorithm according to the loss value, and stopping updating the network parameters when a preset network convergence condition is met;
and optimizing the preset neural network model according to the updated network parameters.
4. The neural network optimization method of claim 3, wherein the preset network convergence condition comprises:
the loss value is smaller than a preset error value; or
The iteration times reach the preset maximum iteration times.
5. The neural network optimization method of claim 3, wherein the iteratively updating the network parameters of the preset neural network model comprises:
updating the network parameters according to the network parameters after the previous iteration, a preset learning rate and the values of the target partial derivatives of the network parameters after the previous iteration;
and determining the target partial derivative according to the partial derivative of the target loss function to the network parameter.
6. The neural network optimization method according to any one of claims 1 to 5, wherein the presetting of the neural network model comprises:
a fully-connected neural network or a convolutional neural network.
7. An apparatus for neural network optimization, comprising: an acquisition module and an optimization module;
the acquisition module is used for inputting random noise data into a preset neural network model and acquiring a solution of a target transcendental equation;
the optimization module is used for obtaining a loss value of the preset neural network model according to a target loss function of the preset neural network model and a solution of the target transcendental equation, and updating network parameters of the preset neural network model based on a back propagation algorithm according to the loss value so as to optimize the preset neural network model.
8. The neural network optimization device of claim 7, further comprising:
a loss function determination module for determining the target loss function according to a mean square loss function between an output value obtained by inputting the solution of the target transcendental equation to a target discriminator and a preset true value;
wherein the target discriminator is determined by the target transcendental equation.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the neural network optimization method of any one of claims 1 to 6 when executing the computer program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the neural network optimization method according to any one of claims 1 to 6.
CN202110163918.8A 2021-02-05 2021-02-05 Neural network optimization method and device Pending CN112906861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110163918.8A CN112906861A (en) 2021-02-05 2021-02-05 Neural network optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110163918.8A CN112906861A (en) 2021-02-05 2021-02-05 Neural network optimization method and device

Publications (1)

Publication Number Publication Date
CN112906861A true CN112906861A (en) 2021-06-04

Family

ID=76123267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110163918.8A Pending CN112906861A (en) 2021-02-05 2021-02-05 Neural network optimization method and device

Country Status (1)

Country Link
CN (1) CN112906861A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438588A (en) * 2022-09-29 2022-12-06 中南大学 Temperature prediction method, system, equipment and storage medium of lithium battery
CN115438588B (en) * 2022-09-29 2023-05-09 中南大学 Temperature prediction method, system, equipment and storage medium for lithium battery
CN115688229A (en) * 2022-10-18 2023-02-03 河海大学 Worst defect mode creation method of latticed shell structure based on deep learning

Similar Documents

Publication Publication Date Title
CN113408743B (en) Method and device for generating federal model, electronic equipment and storage medium
WO2020108474A1 (en) Picture classification method, classification identification model generation method and apparatus, device, and medium
US20180018555A1 (en) System and method for building artificial neural network architectures
CN110766044A (en) Neural network training method based on Gaussian process prior guidance
CN110796253A (en) Training method and device for generating countermeasure network
CN112906861A (en) Neural network optimization method and device
CN111950711A (en) Second-order hybrid construction method and system of complex-valued forward neural network
CN112580728B (en) Dynamic link prediction model robustness enhancement method based on reinforcement learning
CN113822444A (en) Method, apparatus and computer-readable storage medium for model training and data processing
CN111989696A (en) Neural network for scalable continuous learning in domains with sequential learning tasks
CN115588030B (en) Visual target tracking method and device based on twin network
CN111860364A (en) Training method and device of face recognition model, electronic equipment and storage medium
CN111178504B (en) Information processing method and system of robust compression model based on deep neural network
CN107240100B (en) Image segmentation method and system based on genetic algorithm
US11853896B2 (en) Neural network model, method, electronic device, and readable medium
CN113239472B (en) Missile guidance method and device based on reinforcement learning
CN114091652A (en) Impulse neural network model training method, processing chip and electronic equipment
TWI452529B (en) Combined with the system equivalent model of the system and its computer program products
WO2019234156A1 (en) Training spectral inference neural networks using bilevel optimization
Zhou et al. Improving robustness of random forest under label noise
CN115457365B (en) Model interpretation method and device, electronic equipment and storage medium
CN111416595B (en) Big data filtering method based on multi-core fusion
CN115640845A (en) Method for generating few-category samples of neural network of graph based on generation of confrontation network
TWI793516B (en) A training method for adaptively adjusting the batch size of neural networks
CN114970732A (en) Posterior calibration method and device for classification model, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604