CN114548400A - Rapid flexible holomorphic embedding neural network wide-area optimization training method - Google Patents
- Publication number: CN114548400A (application CN202210125273.3A)
- Authority: CN (China)
- Prior art keywords: training, function, neural network, activation function, data
- Prior art date: 2022-02-10
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/08 Computing arrangements based on biological models; neural networks; learning methods
- G06N3/084 Learning methods; backpropagation, e.g. using gradient descent
- G06N3/045 Architecture, e.g. interconnection topology; combinations of networks
- G06N3/048 Activation functions
- G06F17/13 Complex mathematical operations for solving equations; differential equations
Abstract
The invention provides a fast, flexible, holomorphic-embedding neural network wide-area optimization training method comprising the following steps: step 1, determine the differential equation to be solved and sample in its domain of definition to obtain training data and test data; step 2, construct a neural network model and embed activation function layers based on piecewise rational approximation; step 3, tune the hyper-parameters and train the neural network model; step 4, perform model prediction; if the prediction result meets the requirement, the model training has succeeded and training ends; otherwise, return to step 3. An activation function constructed by the piecewise rational approximation method outperforms common activation functions in both training time and training accuracy, providing a powerful tool for quickly and accurately solving the high-dimensional partial differential equations involved in practical engineering computation tasks.
Description
Technical Field
The invention relates to the technical field of information science and engineering computation, and in particular to a fast, flexible, holomorphic-embedding neural network wide-area optimization training method.
Background
Partial differential equations are widely used throughout natural science and engineering, for example in oil and gas exploration, bridge design, and machine manufacturing. In complex scenarios, however, analytical solutions are rarely available, so numerical methods such as finite differences, finite elements, and finite volumes are commonly used. These traditional methods must divide the region into many grid cells to approximate the solution space of the partial differential equation; when the dimension is high the number of grid cells explodes and the computational cost becomes enormous. Solving the partial differential equation with a neural network (NN) requires no mesh generation: random samples from the domain serve as model inputs, avoiding the curse of dimensionality.
In the past decade, deep neural networks (DNNs) have developed into a fundamental technology and key tool of machine learning. Research has found that their performance exceeds that of traditional statistical learning techniques (such as kernel methods, support vector machines, and random forests) in many practical applications, including image classification, speech recognition, image segmentation, and medical imaging.
A neural network is a complex network system formed by a large number of simple processing units (neurons) that are widely interconnected. It reflects many basic features of human brain function and is a highly complex nonlinear dynamical learning system. A neural network has four basic features:
(i) Nonlinearity: nonlinear relationships are a common property of nature, and the intelligence of the brain is itself a nonlinear phenomenon. An artificial neuron is in one of two states, activated or inhibited, and this behavior is mathematically a nonlinear relationship. Networks of thresholded neurons perform better, with improved fault tolerance and storage capacity.
(ii) Non-locality: a neural network is typically formed from many widely connected neurons. The overall behavior of the system depends not only on the characteristics of the individual neurons but chiefly on the interactions and interconnections between units; the large number of connections between cells simulates the non-locality of the brain. Associative memory is a representative example.
(iii) Non-constancy: an artificial neural network has adaptive, self-organizing, and self-learning capabilities. It not only processes information that may itself change in many ways; the nonlinear dynamical system also evolves continuously while processing it. Iterative processes are often used to describe the evolution of such dynamic systems.
(iv) Non-convexity: under certain conditions the evolution of a system depends on a particular state function, for example an energy function whose extrema correspond to comparatively stable states of the system. Non-convexity means the function has multiple extrema, so the system has multiple stable equilibria, which leads to diversity in the system's evolution.
The activation function plays an important role in an artificial neural network model's ability to learn and represent complex, generally highly nonlinear, relationships: it introduces nonlinearity into the network. In a neuron, the inputs are weighted, summed, and passed through a function, and that function is the activation function. By introducing a nonlinear factor into each neuron, the activation function enables the neural network to approximate arbitrary nonlinear functions and thus to be applied to many nonlinear models.
There is no clear guiding theory for choosing an activation function; the usual choices are the ReLU function, the Sigmoid function, and the hyperbolic tangent function. Existing activation functions are typically one of these three or variants of them (e.g., with one or two trainable parameters). Their advantages and disadvantages are:
(i) The ReLU function is the most commonly used activation function in modern neural networks, and most feed-forward networks use it by default. Its advantages are fast convergence and freedom from gradient saturation and vanishing gradients in the region x > 0. Its drawbacks are equally evident: because ReLU is identically zero on the negative axis, neurons can die; the gradient of such a neuron and of everything behind it is permanently zero, so they cannot be updated in subsequent training rounds. Moreover, since the second and higher derivatives of ReLU are zero on both the positive and negative axes, the neural network model cannot be trained effectively in some special applications (such as solving differential equations with a neural network).
(ii) The Sigmoid function has the advantages that its output lies in (0, 1), optimization is stable, and the function is continuous and easy to differentiate; its disadvantage is saturation: when the absolute value of the input is large, the output barely responds to changes in the input.
(iii) The hyperbolic tangent function can be regarded as a variant of the Sigmoid function and still suffers from gradient saturation.
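As a concrete illustration of point (i) above (not part of the patent itself), the following minimal PyTorch sketch shows that the second derivative of ReLU vanishes everywhere it is defined, while tanh retains nonzero curvature, which is exactly the property needed when a network output must be differentiated twice with respect to its input:

```python
import torch

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True)

def second_derivative(f, x):
    y = f(x)
    g1 = torch.autograd.grad(y.sum(), x, create_graph=True)[0]  # dy/dx
    g2 = torch.autograd.grad(g1.sum(), x)[0]                    # d2y/dx2
    return g2

print(second_derivative(torch.relu, x))  # zeros everywhere: no curvature signal
print(second_derivative(torch.tanh, x))  # smooth, nonzero second derivative
```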
Disclosure of Invention
In order to solve these problems, the invention provides a fast, flexible, holomorphic-embedding neural network wide-area optimization training method with strong expressive power, good smoothness, and convenient computation.
In order to achieve the purpose, the invention is realized by the following technical scheme:
The invention relates to a fast, flexible, holomorphic-embedding neural network wide-area optimization training method, which comprises the following steps:
step 1, determining a differential equation to be solved, and sampling in a defined domain to obtain training data and test data;
step 2, constructing a neural network model, and embedding an activation function layer based on piecewise rational approximation;
step 3, adjusting the hyper-parameters and training a neural network model;
step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished;
otherwise, returning to the step 3.
The invention is further improved in that: the differential equation in step 1 is a Burgers equation.
The invention is further improved in that: the neural network model constructed in the step 2 comprises an input layer, four full-connection layers, four activation function layers and an output layer.
The invention is further improved in that: the construction of the activation function of the piecewise rational approximation in step 2 is as follows:
Suppose that at some point $x_0$ the function $f(x)$ is approximated using a single-point Padé approximation of the form

$$r_{[L/M]}(x)=\frac{p_0+p_1x+\cdots+p_Lx^L}{1+q_1x+\cdots+q_Mx^M},\tag{1}$$

where $p_k$ and $q_k$ are coefficients to be determined, $L$ is the highest order of $x$ in the numerator, and $M$ is the highest order of $x$ in the denominator. With $L+M$ held constant, take $L=M$; the numerator and denominator are then solved as follows. Let $L=M=n$ and write $c_k$ for the $k$-th Taylor coefficient of $f$ at $x_0$. First solve the linear system $Aq=b$ to obtain $(q_1,q_2,q_3,\dots,q_n)$, where

$$A=\begin{pmatrix}c_n&c_{n-1}&\cdots&c_1\\c_{n+1}&c_n&\cdots&c_2\\\vdots&\vdots&\ddots&\vdots\\c_{2n-1}&c_{2n-2}&\cdots&c_n\end{pmatrix},\qquad b=-\begin{pmatrix}c_{n+1}\\c_{n+2}\\\vdots\\c_{2n}\end{pmatrix},\tag{2}$$

with $c_k=0$ for $k<0$. The values $(p_0,p_1,p_2,\dots,p_n)$ are then obtained from

$$p_k=c_k+\sum_{j=1}^{\min(k,n)}q_j\,c_{k-j},\qquad k=0,1,\dots,n.\tag{3}$$
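The single-point construction above can be sketched in a few lines of Python; this is a hedged illustration of equations (1)-(3) with illustrative names, not code from the patent. Given the Taylor coefficients $c_0,\dots,c_{L+M}$ of $f$ at $x_0$, it solves the linear system for the denominator and reads off the numerator:

```python
import numpy as np
from math import factorial

def pade_single_point(c, L, M):
    """c[k]: k-th Taylor coefficient of f at x0. Returns (p, q) with q[0] = 1."""
    # Denominator: sum_{j=1}^{M} q_j * c_{L+i-j} = -c_{L+i},  i = 1..M  (eq. (2))
    A = np.array([[c[L + i - j] if L + i - j >= 0 else 0.0
                   for j in range(1, M + 1)] for i in range(1, M + 1)])
    b = -np.array([c[L + i] for i in range(1, M + 1)])
    q = np.concatenate(([1.0], np.linalg.solve(A, b)))
    # Numerator: p_k = c_k + sum_{j=1}^{min(k,M)} q_j * c_{k-j}  (eq. (3))
    p = np.array([sum(q[j] * c[k - j] for j in range(min(k, M) + 1))
                  for k in range(L + 1)])
    return p, q

# Sanity check: f = exp reproduces the classic [2/2] Padé approximant,
# (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12).
c = [1.0 / factorial(k) for k in range(5)]
p, q = pade_single_point(c, 2, 2)
```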
The multi-point Padé approximation is a generalization of the single-point Padé approximation. Let the approximated function be $f(x)$; if its values are known at the $n+1$ interpolation points $x_0,x_1,x_2,\dots,x_n$, there is a rational expression

$$r_{[L/M]}(x)=\frac{u_{[L/M]}(x)}{v_{[L/M]}(x)},\tag{4}$$

where $L+M=n$, $u_{[L/M]}(x)$ is a polynomial of highest order $L$,

$$u_{[L/M]}(x)=a_0+a_1x+\cdots+a_Lx^L,\tag{5}$$

and $v_{[L/M]}(x)$ is a polynomial of highest order $M$,

$$v_{[L/M]}(x)=b_0+b_1x+\cdots+b_Mx^M.\tag{6}$$

Here $u_{[L/M]}(x)$ and $v_{[L/M]}(x)$ are polynomials that are constructed by means of divided differences.

First, the divided differences of $f(x)$ are defined recursively as

$$f[x_i]=f(x_i),\qquad f[x_i,x_{i+1},\dots,x_j]=\frac{f[x_{i+1},\dots,x_j]-f[x_i,\dots,x_{j-1}]}{x_j-x_i}.\tag{7}$$

Let $f_{i,j}$ denote $f[x_i,x_{i+1},\dots,x_j]$, $j\ge i$; then $u_{[L/M]}(x)$ can be computed from these divided differences by equation (8), and $v_{[L/M]}(x)$ likewise by equation (9).
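The divided-difference table in equation (7) can be tabulated with the standard recursion; a small illustrative sketch, assuming distinct nodes:

```python
def divided_differences(x, y):
    """F[i][j] = f[x_i, ..., x_j] for i <= j, built by the recursion in eq. (7)."""
    n = len(x)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = y[i]                                   # f[x_i] = f(x_i)
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            F[i][j] = (F[i + 1][j] - F[i][j - 1]) / (x[j] - x[i])
    return F
```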
The piecewise Padé approximation used by the invention is a special form of the multi-point Padé approximation: each interpolation point is given together with the function value and the derivative values from first order to $m$-th order at that point, and each segment is constructed from a multi-point Padé approximation.

Let the approximated function be $f(x)$, and suppose that at the $n+1$ interpolation points $x_0,x_1,x_2,\dots,x_n$ the function values and derivatives up to order $m$ are prescribed:

$$f^{(l)}(x_k)=a_k^{(l)},\qquad l=0,1,\dots,m,\quad k=0,1,\dots,n.\tag{10}$$

Take any interval $[x_k,x_{k+1}]$ and construct on it the Padé approximation expression

$$r^{k}_{[L/M]}(x)=\frac{u^{k}_{[L/M]}(x)}{v^{k}_{[L/M]}(x)},\tag{11}$$

where $L+M+1=n$, and the expressions for $u^{k}_{[L/M]}(x)$ and $v^{k}_{[L/M]}(x)$ are those of equations (8) and (9). The specific calculation considers the equivalent (confluent) set formed by $2m+2$ points,

$$z_0=z_1=\cdots=z_m=x_k,\qquad z_{m+1}=z_{m+2}=\cdots=z_{2m+1}=x_{k+1},\tag{12}$$

and, following equations (8) and (9), requires the divided differences $f_{i,j}=f[z_i,z_{i+1},\dots,z_j]$, $0\le i\le j\le 2m+1$.

From the properties of divided differences and equation (10):

$$f_{i,j}=\frac{a_k^{(j-i)}}{(j-i)!},\qquad 0\le i\le j\le m,\tag{13}$$

$$f_{i,j}=\frac{a_{k+1}^{(j-i)}}{(j-i)!},\qquad m+1\le i\le j\le 2m+1.\tag{14}$$

When $0\le i\le m$ and $m+1\le j\le 2m+1$, the recurrence formula

$$f_{i,j}=\frac{f_{i+1,j}-f_{i,j-1}}{x_{k+1}-x_k}\tag{15}$$

applies; within it, when $i+1\ge m+1$ the term $f_{i+1,j}$ is computed directly from equation (14), and when $j-1\le m$ the term $f_{i,j-1}$ is computed directly from equation (13).

Substituting the computed $f_{i,j}$ into equations (8) and (9) yields $u^{k}_{[L/M]}(x)$ and $v^{k}_{[L/M]}(x)$, and hence $r^{k}_{[L/M]}(x)$. The function constructed by the piecewise Padé approximation is then expressed as

$$r_{[L/M]}(x)=r^{k}_{[L/M]}(x),\qquad x\in[x_k,x_{k+1}),\quad k=0,1,\dots,n-1.\tag{16}$$
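Evaluating the piecewise function of equation (16) amounts to locating the segment containing each input and applying that segment's rational approximant. A hedged NumPy sketch, assuming `segments[k]` holds ascending numerator/denominator coefficients (p, q) for the interval $[x_k,x_{k+1}]$ and that each segment is expanded around its left knot:

```python
import numpy as np

def piecewise_pade_eval(x, knots, segments):
    """knots: sorted interpolation points; segments[k] = (p, q) on [knots[k], knots[k+1]]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    idx = np.clip(np.searchsorted(knots, x, side="right") - 1, 0, len(segments) - 1)
    out = np.empty_like(x)
    for k, (p, q) in enumerate(segments):
        mask = idx == k
        t = x[mask] - knots[k]                 # local variable on segment k
        out[mask] = np.polyval(p[::-1], t) / np.polyval(q[::-1], t)
    return out
```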
the invention is further improved in that: setting the training round as N in the step 3, wherein the training steps are as follows:
step 3.1, inputting the training data into a neural network, and executing step 3.2;
step 3.2, data in the module is transmitted in the forward direction, and data Hn×mInputting the data into an activation function layer, and executing the next step;
step 3.3, activating the hyper-parameter x of the function layer0,x1,x2,…,xnAnd trainable parameters The piecewise function is obtained according to equations (10) - (16) as interpolation points and derivative values from zero order to m orderForming a piecewise activation function r[L/M](x);
Step 3.4, data Hn×mAfter passing through the activation function r[L/M](x) To obtain an output Zn×mExpressed as:
to obtain an output Zn×m;
3.5, continuing forward propagation of the data until a next activation function layer is encountered, jumping to the step 3.3, and otherwise, executing the step 3.6;
step 3.6, obtaining a training result, calculating the value of a loss function, and automatically performing back propagation and updating neural network weight and trainable parameters by a framework; if the current round is less than or equal to N, a new batch of training data is taken, and the step 3.2 is skipped; otherwise, the model training process is ended.
The invention is further improved in that: step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished; otherwise, returning to the step 3.
The beneficial effects of the invention are: the invention provides an activation function based on segmentation rational approximation according to the idea of fast flexible all-pure embedding (FFHE). Firstly, initializing function points, function values and various derivative values,
and constructing a segmented activation function by using a segmented rational approximation method. Its advantages are as follows:
(i) more potent expression: the expression capability of the piecewise function is stronger than that of the ordinary function, and a solid theoretical foundation is provided. It has been demonstrated in the literature that under the Lipschitz condition, the property of point-by-point nonlinearity is linked to the global Lipschitz constant of the network by introducing a boundary, and then regularization is performed using the boundary to derive a representation theorem indicating that the optimal configuration is implemented by a depth spline network, wherein each activation function is a piecewise linear spline function with adaptive nodes.
(ii) Better smoothness: other commonly used activation functions such as ReLu function, PReLu function and piecewise linear spline are only piecewise first-order derivative, which is limited in some scenes, for example, solving a differential equation by using a neural network usually requires a second-order derivative or even a higher-order derivative of a network output to an input, and the activation function which is only first-order derivative causes a gradient to be zero and cannot effectively update parameters.
(iii) More flexible and easy to calculate: the activation function based on the piecewise rational approximation sets an initialization function point, a function value and each order derivative, takes the function value and each order derivative as parameters which can be adjusted along with the training of the neural network, and the adaptive adjustment of the parameters enables the back propagation of the neural network to be updated towards the steepest direction, so that the activation function needs fewer turns than other activation functions to achieve the expected precision.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a flow chart of neural network model training based on a piecewise rational approximation activation function.
Fig. 3 is a schematic diagram of a neural network model structure of the present invention.
Fig. 4 is a schematic diagram of the structure of the PINNs model.
Fig. 5 is a graph of the LeakyReLu, ReLu, Tanh, and FFHE activation function training.
Detailed Description
Embodiments of the invention will be described in detail below with reference to the drawings, and for the sake of clarity, many implementation details will be set forth in the following description. It should be understood, however, that these implementation details should not be used to limit the invention. That is, details of these implementations are not necessary in some embodiments of the invention.
As shown in FIGS. 1-3, the invention is a fast, flexible, holomorphic-embedding neural network wide-area optimization training method comprising the following steps:
step 1, determining a differential equation to be solved, and sampling in a defined domain to obtain training data and test data; step 2, constructing a neural network model, and embedding an activation function layer based on piecewise rational approximation;
step 3, adjusting the hyper-parameters and training a neural network model;
step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished; otherwise, returning to the step 3.
The differential equation to be solved in step 1 is the Burgers equation. The Burgers equation is a very useful mathematical model for many physical problems, such as shock waves, shallow-water waves, and traffic flow mechanics, and is an important mathematical model for describing diffusion phenomena in the physical world. It is a nonlinear partial differential equation that simulates the propagation and reflection of shock waves, defined as follows:
$$u_t+uu_x-(0.01/\pi)\,u_{xx}=0,\qquad x\in[-1,1],\quad t\in[0,1],$$
$$u(0,x)=-\sin(\pi x),$$
$$u(t,-1)=u(t,1)=0.$$
The equation is a time-dependent, one-dimensional partial differential equation with an initial condition and boundary conditions.
The PINNs model is used in step 2; its approximate structure is shown in FIG. 4, with the independent variables x and t of the differential equation as inputs and the dependent variable u as output. In the figure, NN(x, t; θ) denotes a fully connected neural network, where θ are the weights of its hidden layers. The PDE(λ) part of the figure represents the composition of the loss function of the neural network model. The loss function of PINNs is divided into two parts: one is the initial-condition and boundary part, and the other is the equation itself.
Taking the Burgers equation as an example, let the number of sampling points on the boundary and initial conditions be $N_u$ and the number of interior collocation points be $N_f$. The first part of the loss function is the MSE of the model output on the initial and boundary conditions:

$$MSE_u=\frac{1}{N_u}\sum_{i=1}^{N_u}\bigl|u(t_u^i,x_u^i)-u^i\bigr|^2.$$

The second part of the loss function is the MSE of the model output on the equation itself. Let

$$\gamma=u_t+uu_x-(0.01/\pi)\,u_{xx}.$$

Then:

$$MSE_f=\frac{1}{N_f}\sum_{i=1}^{N_f}\bigl|\gamma(t_f^i,x_f^i)\bigr|^2.$$

The final loss function is the sum of the two:

$$MSE=MSE_u+MSE_f.$$
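In a framework with automatic differentiation, both loss terms are direct to implement: the derivatives $u_t$, $u_x$, $u_{xx}$ in $\gamma$ come from autograd rather than from a mesh. A hedged PyTorch sketch with illustrative names, assuming `net` maps (t, x) to u:

```python
import torch

def pinns_loss(net, t_u, x_u, u_data, t_f, x_f):
    # MSE_u: network output vs. known initial/boundary values.
    mse_u = torch.mean((net(t_u, x_u) - u_data) ** 2)

    # MSE_f: residual gamma = u_t + u*u_x - (0.01/pi)*u_xx at collocation points.
    t_f = t_f.clone().requires_grad_(True)
    x_f = x_f.clone().requires_grad_(True)
    u = net(t_f, x_f)
    u_t = torch.autograd.grad(u.sum(), t_f, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x_f, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x_f, create_graph=True)[0]
    gamma = u_t + u * u_x - (0.01 / torch.pi) * u_xx
    mse_f = torch.mean(gamma ** 2)

    return mse_u + mse_f
```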
As shown in FIG. 3, the invention uses a fully connected PINNs network with four hidden layers of 20 neurons each. Sampling the interior, the boundary, and the initial condition yields 25600 (x, t) data pairs in total; using Latin hypercube sampling, 10000 (x, t) pairs are drawn from the interior and 100 pairs from the boundary and initial condition, giving 10100 data pairs in total as the model's training data. The remaining (x, t) data pairs are used as test data for the model.
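The sampling and architecture just described might look as follows; this is a sketch with assumed names, using SciPy's Latin hypercube sampler and a generic tanh placeholder where the patent installs its piecewise-rational activation layers:

```python
import torch
import torch.nn as nn
from scipy.stats import qmc

# 10000 interior collocation points over x in [-1, 1], t in [0, 1].
sampler = qmc.LatinHypercube(d=2, seed=0)
pts = qmc.scale(sampler.random(10000), [-1.0, 0.0], [1.0, 1.0])
x_f = torch.tensor(pts[:, 0:1], dtype=torch.float32)
t_f = torch.tensor(pts[:, 1:2], dtype=torch.float32)

class PINN(nn.Module):
    """Four hidden layers of 20 neurons; inputs (t, x), output u."""
    def __init__(self, width=20, depth=4):
        super().__init__()
        layers, dim = [], 2
        for _ in range(depth):
            layers += [nn.Linear(dim, width), nn.Tanh()]  # stand-in activation
            dim = width
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, t, x):
        return self.net(torch.cat([t, x], dim=-1))
```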
Each fully connected hidden layer is followed by an activation function layer based on piecewise rational approximation; in this embodiment each such layer has six trainable parameters. In general, each activation function layer has $n+1$ hyper-parameters $x_0,x_1,x_2,\dots,x_n$ representing the interpolation points and $(m+1)(n+1)$ trainable parameters $a_k^{(l)}$ representing the derivative values from zero order to $m$-th order.
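A hedged sketch of such a layer: the knots are fixed buffers, the per-knot derivative values up to order m are trainable, and `build_coeffs` is a caller-supplied hypothetical standing in for the per-segment coefficient construction of equations (10)-(16):

```python
import torch
import torch.nn as nn

class PiecewisePadeActivation(nn.Module):
    def __init__(self, knots, m, build_coeffs):
        super().__init__()
        self.register_buffer("knots", torch.as_tensor(knots, dtype=torch.float32))
        # (n+1) x (m+1) trainable derivative values a_k^(0..m), one row per knot.
        self.derivs = nn.Parameter(0.1 * torch.randn(len(knots), m + 1))
        self.build_coeffs = build_coeffs  # hypothetical: eqs. (10)-(16) -> (P, Q)

    def forward(self, h):
        # P, Q: ascending coefficients per segment, shapes (n_seg, L+1), (n_seg, M+1).
        P, Q = self.build_coeffs(self.knots, self.derivs)
        idx = torch.clamp(torch.searchsorted(self.knots, h.detach()) - 1,
                          0, P.shape[0] - 1)
        t = h - self.knots[idx]               # local variable on each segment
        num = sum(P[idx, k] * t ** k for k in range(P.shape[1]))
        den = sum(Q[idx, k] * t ** k for k in range(Q.shape[1]))
        return num / den
```

Because the derivative values sit inside the forward graph, backpropagation updates them together with the network weights, which is what step 3.6 below relies on.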
The invention designs the activation function according to the idea of fast flexible holomorphic embedding (FFHE), combined with the mathematics of piecewise rational approximation. The Padé approximation is a method of constructing rational-function approximations; it is more accurate than a truncated Taylor series, and it often converges even where the Taylor series does not. In addition, when constructing an interpolating function, piecewise interpolation is generally adopted to avoid the Runge phenomenon caused by high-order polynomials; the interpolation result then depends only on a few surrounding points, finally forming a composite piecewise function.
The construction process of the activation function based on piecewise rational approximation in step 2 is as described above; the details are given in equations (10)-(16).
In step 3 the maximum number of training rounds is set to N, and training the neural network model specifically comprises the following steps:
step 3.1, input the training data into the neural network, and execute step 3.2;
step 3.2, propagate the data forward through the model until the data $H_{n\times m}$ enters an activation function layer, then execute the next step;
step 3.3, take the activation function layer's hyper-parameters $x_0,x_1,x_2,\dots,x_n$ as interpolation points and its trainable parameters $a_k^{(l)}$ ($l=0,1,\dots,m$; $k=0,1,\dots,n$) as the derivative values from zero order to $m$-th order, and obtain the piecewise function according to equations (10)-(16), forming the piecewise activation function $r_{[L/M]}(x)$;
step 3.4, pass the data $H_{n\times m}$ through the activation function $r_{[L/M]}(x)$, applied element-wise as $Z_{n\times m}=r_{[L/M]}(H_{n\times m})$, to obtain the output $Z_{n\times m}$;
step 3.5, continue forward propagation of the data; when the next activation function layer is encountered, jump to step 3.3, otherwise execute step 3.6;
step 3.6, obtain the training result, compute the value of the loss function, and let the framework automatically back-propagate and update the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, end the model training process.
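Putting steps 3.1-3.6 together, a minimal training-loop sketch reusing the `PINN` and `pinns_loss` sketches above; the initial-condition tensors here are built directly from u(0, x) = -sin(πx) for illustration:

```python
import torch

x_u = torch.linspace(-1.0, 1.0, 100).reshape(-1, 1)  # points on the initial line t = 0
t_u = torch.zeros_like(x_u)
u_data = -torch.sin(torch.pi * x_u)                   # u(0, x) = -sin(pi x)

model = PINN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)

for epoch in range(7000):                             # maximum training round N
    optimizer.zero_grad()
    loss = pinns_loss(model, t_u, x_u, u_data, t_f, x_f)
    loss.backward()       # framework back-propagates automatically (step 3.6)
    optimizer.step()      # updates weights and trainable activation parameters
```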
In step 4, model prediction is carried out; if the prediction result meets the requirement, the model training has succeeded and training ends; otherwise, return to step 3.
7000 rounds of training were performed with the learning rate set to 0.002. The LeakyReLU and ReLU activation functions trained worst; their training curves in FIG. 5 are the top two, almost coincident, curves. The activation function constructed by piecewise Padé approximation averaged 4.307 s per hundred training rounds, versus 3.532 s for the Tanh function; however, the piecewise Padé activation reached a training error of 9.4067E-04 by round 1500, whereas Tanh needed the full 7000 rounds just to bring the training error down to 9.1780E-04. That is, the FFHE method needs only about one fifth of the training rounds required by Tanh to reach the same error level, and over a full 7000 rounds it yields results more than two orders of magnitude (100 times) more accurate than Tanh. The activation function constructed by the FFHE (piecewise Padé approximation) method is therefore superior to common activation functions in both training time and training accuracy, and the method provides a powerful tool for quickly and accurately solving the high-dimensional partial differential equations involved in practical engineering computation tasks.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (6)
1. A fast, flexible, holomorphic-embedding neural network wide-area optimization training method, characterized by comprising the following steps:
step 1, determining a differential equation to be solved, and sampling in a defined domain to obtain training data and test data;
step 2, constructing a neural network model, and embedding an activation function layer based on piecewise rational approximation;
step 3, adjusting the hyper-parameters and training a neural network model;
step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished; otherwise, returning to the step 3.
2. The method of claim 1, wherein the differential equation in step 1 is the Burgers equation.
3. The method of claim 1, wherein the neural network model constructed in step 2 comprises an input layer, four fully connected layers, four activation function layers, and an output layer.
4. The method of claim 1, wherein the activation function of the piecewise rational approximation in step 2 is constructed as follows:
suppose that at some point $x_0$ the function $f(x)$ is approximated using a single-point Padé approximation of the form

$$r_{[L/M]}(x)=\frac{p_0+p_1x+\cdots+p_Lx^L}{1+q_1x+\cdots+q_Mx^M},\tag{1}$$

where $p_k$ and $q_k$ are coefficients to be determined, $L$ is the highest order of $x$ in the numerator, and $M$ is the highest order of $x$ in the denominator; with $L+M$ held constant, take $L=M$; the numerator and denominator are then solved as follows: let $L=M=n$ and write $c_k$ for the $k$-th Taylor coefficient of $f$ at $x_0$; first solve the linear system $Aq=b$ to obtain $(q_1,q_2,q_3,\dots,q_n)$, where

$$A=\begin{pmatrix}c_n&c_{n-1}&\cdots&c_1\\c_{n+1}&c_n&\cdots&c_2\\\vdots&\vdots&\ddots&\vdots\\c_{2n-1}&c_{2n-2}&\cdots&c_n\end{pmatrix},\qquad b=-\begin{pmatrix}c_{n+1}\\c_{n+2}\\\vdots\\c_{2n}\end{pmatrix},\tag{2}$$

with $c_k=0$ for $k<0$; the values $(p_0,p_1,p_2,\dots,p_n)$ are then obtained from

$$p_k=c_k+\sum_{j=1}^{\min(k,n)}q_j\,c_{k-j},\qquad k=0,1,\dots,n;\tag{3}$$

the multi-point Padé approximation is a generalization of the single-point Padé approximation: let the approximated function be $f(x)$; if its values are known at the $n+1$ interpolation points $x_0,x_1,x_2,\dots,x_n$, there is a rational expression

$$r_{[L/M]}(x)=\frac{u_{[L/M]}(x)}{v_{[L/M]}(x)},\tag{4}$$

where $L+M=n$, $u_{[L/M]}(x)$ is a polynomial of highest order $L$,

$$u_{[L/M]}(x)=a_0+a_1x+\cdots+a_Lx^L,\tag{5}$$

and $v_{[L/M]}(x)$ is a polynomial of highest order $M$,

$$v_{[L/M]}(x)=b_0+b_1x+\cdots+b_Mx^M;\tag{6}$$

here $u_{[L/M]}(x)$ and $v_{[L/M]}(x)$ are polynomials constructed by means of divided differences; first, the divided differences of $f(x)$ are defined recursively as

$$f[x_i]=f(x_i),\qquad f[x_i,x_{i+1},\dots,x_j]=\frac{f[x_{i+1},\dots,x_j]-f[x_i,\dots,x_{j-1}]}{x_j-x_i};\tag{7}$$

let $f_{i,j}$ denote $f[x_i,x_{i+1},\dots,x_j]$, $j\ge i$; then $u_{[L/M]}(x)$ can be computed from these divided differences by equation (8), and $v_{[L/M]}(x)$ likewise by equation (9);
the piecewise Padé approximation used is a special form of the multi-point Padé approximation: each interpolation point is given together with the function value and the derivative values from first order to $m$-th order at that point, and each segment is constructed from a multi-point Padé approximation;
let the approximated function be $f(x)$, and suppose that at the $n+1$ interpolation points $x_0,x_1,x_2,\dots,x_n$ the function values and derivatives up to order $m$ are prescribed:

$$f^{(l)}(x_k)=a_k^{(l)},\qquad l=0,1,\dots,m,\quad k=0,1,\dots,n;\tag{10}$$

take any interval $[x_k,x_{k+1}]$ and construct on it the Padé approximation expression

$$r^{k}_{[L/M]}(x)=\frac{u^{k}_{[L/M]}(x)}{v^{k}_{[L/M]}(x)},\tag{11}$$

where $L+M+1=n$, and the expressions for $u^{k}_{[L/M]}(x)$ and $v^{k}_{[L/M]}(x)$ are those of equations (8) and (9); the specific calculation considers the equivalent set formed by $2m+2$ points,

$$z_0=z_1=\cdots=z_m=x_k,\qquad z_{m+1}=z_{m+2}=\cdots=z_{2m+1}=x_{k+1},\tag{12}$$

and, following equations (8) and (9), requires the divided differences $f_{i,j}=f[z_i,z_{i+1},\dots,z_j]$, $0\le i\le j\le 2m+1$; from the properties of divided differences and equation (10):

$$f_{i,j}=\frac{a_k^{(j-i)}}{(j-i)!},\qquad 0\le i\le j\le m,\tag{13}$$

$$f_{i,j}=\frac{a_{k+1}^{(j-i)}}{(j-i)!},\qquad m+1\le i\le j\le 2m+1;\tag{14}$$

when $0\le i\le m$ and $m+1\le j\le 2m+1$, the recurrence formula

$$f_{i,j}=\frac{f_{i+1,j}-f_{i,j-1}}{x_{k+1}-x_k}\tag{15}$$

applies; within it, when $i+1\ge m+1$ the term $f_{i+1,j}$ is computed directly from equation (14), and when $j-1\le m$ the term $f_{i,j-1}$ is computed directly from equation (13);
substituting the computed $f_{i,j}$ into equations (8) and (9) yields $u^{k}_{[L/M]}(x)$ and $v^{k}_{[L/M]}(x)$, and hence $r^{k}_{[L/M]}(x)$; the function constructed by the piecewise Padé approximation is then expressed as

$$r_{[L/M]}(x)=r^{k}_{[L/M]}(x),\qquad x\in[x_k,x_{k+1}),\quad k=0,1,\dots,n-1.\tag{16}$$
5. The method of claim 1, wherein in step 3 the number of training rounds is set to N and the training steps are as follows:
step 3.1, input the training data into the neural network, and execute step 3.2;
step 3.2, propagate the data forward through the model until the data $H_{n\times m}$ enters an activation function layer, then execute the next step;
step 3.3, take the activation function layer's hyper-parameters $x_0,x_1,x_2,\dots,x_n$ as interpolation points and its trainable parameters $a_k^{(l)}$ ($l=0,1,\dots,m$; $k=0,1,\dots,n$) as the derivative values from zero order to $m$-th order, and obtain the piecewise function according to equations (10)-(16), forming the piecewise activation function $r_{[L/M]}(x)$;
step 3.4, pass the data $H_{n\times m}$ through the activation function $r_{[L/M]}(x)$, applied element-wise as $Z_{n\times m}=r_{[L/M]}(H_{n\times m})$, to obtain the output $Z_{n\times m}$;
step 3.5, continue forward propagation of the data; when the next activation function layer is encountered, jump to step 3.3, otherwise execute step 3.6;
step 3.6, obtain the training result, compute the value of the loss function, and let the framework automatically back-propagate and update the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, end the model training process.
6. The method of claim 1, wherein in step 4 model prediction is performed; if the prediction result meets the requirement, the model training is successful and training ends; otherwise, return to step 3.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210125273.3A CN114548400A (en) | 2022-02-10 | 2022-02-10 | Rapid flexible full-pure embedded neural network wide area optimization training method |
PCT/CN2022/094901 WO2023151201A1 (en) | 2022-02-10 | 2022-05-25 | Fast and flexible holomorphic embedding type neural network wide-area optimization training method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210125273.3A CN114548400A (en) | 2022-02-10 | 2022-02-10 | Rapid flexible full-pure embedded neural network wide area optimization training method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114548400A true CN114548400A (en) | 2022-05-27 |
Family
ID=81672897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210125273.3A Pending CN114548400A (en) | 2022-02-10 | 2022-02-10 | Rapid flexible full-pure embedded neural network wide area optimization training method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114548400A (en) |
WO (1) | WO2023151201A1 (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11175641B2 (en) * | 2018-08-10 | 2021-11-16 | Cornell University | Processing platform with holomorphic embedding functionality for power control and other applications |
CN112597700B (en) * | 2020-12-15 | 2022-09-27 | 北京理工大学 | Aircraft trajectory simulation method based on neural network |
CN112784496A (en) * | 2021-01-29 | 2021-05-11 | 上海明略人工智能(集团)有限公司 | Method and device for predicting motion parameters of hydrodynamics and storage medium |
CN113183146B (en) * | 2021-02-04 | 2024-02-09 | 中山大学 | Mechanical arm motion planning method based on rapid and flexible full-pure embedding idea |
CN113489014B (en) * | 2021-07-19 | 2023-06-02 | 中山大学 | Quick and flexible full-pure embedded power system optimal power flow evaluation method |
CN114239698A (en) * | 2021-11-26 | 2022-03-25 | 中国空间技术研究院 | Data processing method, device and equipment |
CN114385969A (en) * | 2022-01-12 | 2022-04-22 | 温州大学 | Neural network method for solving differential equations |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116700049A (en) * | 2023-07-12 | 2023-09-05 | 山东大学 | Multi-energy network digital twin real-time simulation system and method based on data driving |
CN116700049B (en) * | 2023-07-12 | 2024-05-28 | 山东大学 | Multi-energy network digital twin real-time simulation system and method based on data driving |
Also Published As
Publication number | Publication date |
---|---|
WO2023151201A1 (en) | 2023-08-17 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |