CN114548400A - Rapid flexible full-pure embedded neural network wide area optimization training method - Google Patents

Rapid flexible full-pure embedded neural network wide area optimization training method

Info

Publication number
CN114548400A
Authority
CN
China
Prior art keywords
training
function
neural network
activation function
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210125273.3A
Other languages
Chinese (zh)
Inventor
汪涛
谭洪宇
高子雄
何晓斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210125273.3A priority Critical patent/CN114548400A/en
Priority to PCT/CN2022/094901 priority patent/WO2023151201A1/en
Publication of CN114548400A publication Critical patent/CN114548400A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/11 - Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13 - Differential equations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a fast, flexible, holomorphic-embedding neural network wide-area optimization training method, comprising the following steps: step 1, determine the differential equation to be solved and sample within its domain of definition to obtain training data and test data; step 2, construct a neural network model and embed activation function layers based on piecewise rational approximation; step 3, adjust the hyper-parameters and train the neural network model; step 4, perform model prediction; if the prediction result meets the requirement, the model training has succeeded and training ends; otherwise, return to step 3. The activation function constructed by the piecewise rational approximation method is superior to common activation functions in both training time and training accuracy, providing a powerful means of rapidly and accurately solving the high-dimensional partial differential equations involved in practical engineering computation tasks.

Description

Rapid flexible full-pure embedded neural network wide area optimization training method
Technical Field
The invention relates to the technical field of information science and engineering computation, and in particular to a fast, flexible, holomorphic-embedding neural network wide-area optimization training method.
Background
Partial differential equations are widely used across natural science and engineering applications such as oil and gas exploration, bridge design, and machine manufacturing. In complex scenarios, however, analytical solutions are rarely available, so numerical methods such as the traditional finite difference, finite element, and finite volume methods are commonly used. These traditional methods must divide the domain into many grid cells to approximate the solution space of the partial differential equation; when the dimension is high, the number of grid cells becomes enormous and the computational cost is prohibitive. Solving the partial differential equation with a neural network (NN) requires no mesh generation: points are sampled randomly in the domain and used as model inputs, which avoids the curse of dimensionality.
Over the past decade, deep neural networks (DNNs) have developed into a fundamental technology and key tool of machine learning. In many practical applications such as image classification, speech recognition, image segmentation, and medical imaging, their performance has been found to surpass that of traditional statistical learning techniques (such as kernel methods, support vector machines, and random forests).
A neural network is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected. It reflects many basic features of human brain function and is a highly complex nonlinear dynamical learning system. A neural network has four basic features:
(i) Nonlinearity: nonlinear relationships are a common property of nature, and the intelligence of the brain is a nonlinear phenomenon. An artificial neuron is in one of two states, activation or inhibition, and this behavior is mathematically a nonlinear relationship. A network formed of neurons with thresholds performs better and can improve fault tolerance and storage capacity.
(ii) Non-locality: a neural network is typically formed by many widely connected neurons. The overall behavior of the system depends not only on the characteristics of the individual neurons but is determined mainly by the interactions and interconnections between the units. The non-locality of the brain is simulated by the large number of connections between units; associative memory is a representative example.
(iii) Non-stationarity: an artificial neural network has adaptive, self-organizing, and self-learning capabilities. A neural network not only processes information of many kinds; while the information is being processed, the nonlinear dynamical system itself also keeps changing. Iterative processes are often used to describe the evolution of such dynamic or time-varying systems.
(iv) Non-convexity: under certain conditions, the evolution direction of a system depends on a particular state function, for example an energy function whose extreme values correspond to more stable states of the system. Non-convexity means that this function has multiple extreme values, so the system has multiple stable equilibrium states, which leads to diversity in the system's evolution.
The activation function plays an important role in enabling an artificial neural network model to learn and understand complex (generally highly nonlinear) laws of change; it introduces nonlinear characteristics into the network. In a neuron, the inputs are weighted, summed, and then passed through a function, the activation function. By introducing nonlinear factors into the neurons, the activation function allows the neural network to approximate any nonlinear function arbitrarily well, so the neural network can be applied to many nonlinear models.
There is no clear guiding theoretical principle for the selection of activation functions; the usual choices are the ReLU function, the Sigmoid function, and the hyperbolic tangent function. Existing activation functions are often one of these three functions or variants of them (e.g., with one or two trainable parameters). The advantages and disadvantages of these three activation functions are:
(i) The ReLU function is the most commonly used activation function in modern neural networks, and most feed-forward neural networks use it by default. Its advantages are fast convergence and the absence of gradient saturation and vanishing-gradient problems in the region x > 0. Its disadvantages are also evident: because the ReLU function is identically zero in the negative region, neurons can die, with their gradients and the gradients of the neurons behind them remaining zero so that they are not updated during training; moreover, because the second and higher-order derivatives of the ReLU function are zero in both the positive and negative regions, the neural network model cannot be trained effectively in certain special applications, such as solving a differential equation with a neural network (see the short check after this list).
(ii) The advantages of the Sigmoid function are that its output lies in (0, 1), the optimization is stable, and the function is continuous and easy to differentiate; its disadvantage is that it saturates when the absolute value of its argument is large, which makes it insensitive to changes of the input and output.
(iii) The hyperbolic tangent function can be regarded as a variant of the Sigmoid function and still suffers from the gradient saturation problem.
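The higher-derivative limitation noted in (i) can be checked directly with automatic differentiation; the small snippet below (PyTorch assumed, illustrative only and not part of the original text) computes second derivatives of ReLU and tanh:

import torch

x = torch.linspace(-2.0, 2.0, 5, requires_grad=True)
for act in (torch.relu, torch.tanh):
    y = act(x).sum()
    g1 = torch.autograd.grad(y, x, create_graph=True)[0]          # first derivative
    g2 = torch.autograd.grad(g1.sum(), x, create_graph=True)[0]   # second derivative
    print(act.__name__, g2)
# relu: the second derivative is identically zero, so a PDE residual that needs u_xx
# receives no useful signal through it; tanh: the second derivative is generally non-zero.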
Disclosure of Invention
In order to solve these problems, the invention provides a fast, flexible, holomorphic-embedding neural network wide-area optimization training method with strong expressive capability, good smoothness, and convenient computation.
In order to achieve this purpose, the invention adopts the following technical scheme:
The invention relates to a fast, flexible, holomorphic-embedding neural network wide-area optimization training method, which comprises the following steps:
step 1, determining a differential equation to be solved, and sampling in a defined domain to obtain training data and test data;
step 2, constructing a neural network model, and embedding an activation function layer based on piecewise rational approximation;
step 3, adjusting the hyper-parameters and training a neural network model;
step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished;
otherwise, returning to the step 3.
The invention is further improved in that: the differential equation in step 1 is a Burgers equation.
The invention is further improved in that: the neural network model constructed in step 2 comprises an input layer, four fully connected layers, four activation function layers, and an output layer.
The invention is further improved in that: the construction of the activation function of the piecewise rational approximation in step 2 is as follows:
Suppose that at some point x_0 the function f(x) is approximated using the single-point Padé approximation method, in the form

    r_[L/M](x) = (Σ_{k=0}^{L} p_k x^k) / (Σ_{k=0}^{M} q_k x^k), with q_0 = 1,    (1)

where p_k and q_k are coefficients to be determined, L is the highest power of x in the numerator, and M is the highest power of x in the denominator. For a fixed L + M, take L = M; the numerator and denominator are then solved as follows. Let L = M = n and let a_0, a_1, ..., a_{2n} denote the Taylor coefficients of f(x) at x_0. First solve the linear system A q = b to obtain the values (q_1, q_2, q_3, ..., q_n), where

    A = ( a_n       a_{n-1}   ...  a_1
          a_{n+1}   a_n       ...  a_2
          ...
          a_{2n-1}  a_{2n-2}  ...  a_n ),    (2)

    q = (q_1, q_2, ..., q_n)^T,  b = -(a_{n+1}, a_{n+2}, ..., a_{2n})^T.    (3)

The values (p_0, p_1, p_2, ..., p_n) are then obtained from

    p_k = a_k + Σ_{j=1}^{k} q_j a_{k-j}, k = 0, 1, ..., n (so that p_0 = a_0).    (4)
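As an illustration of this single-point construction, the following sketch (in Python with NumPy; an illustrative sketch, not part of the original disclosure) solves the system A q = b of equations (2)-(3) and recovers the numerator coefficients by equation (4), under the assumption that a_0, ..., a_{2n} are the Taylor coefficients of f(x) at the expansion point:

import numpy as np

def pade_single_point(a, n):
    """[n/n] Pade coefficients from Taylor coefficients a_0..a_{2n} (eqs. (1)-(4)).

    Returns (p, q) with q[0] = 1 such that
        r(x) = sum_k p_k x^k / sum_k q_k x^k
    matches the Taylor series of f up to order 2n.
    """
    a = np.asarray(a, dtype=float)
    # Equations (2)-(3): rows of A are (a_n, ..., a_1), ..., (a_{2n-1}, ..., a_n)
    A = np.array([[a[n + i - j] for j in range(1, n + 1)] for i in range(1, n + 1)])
    b = -a[n + 1: 2 * n + 1]
    q = np.concatenate(([1.0], np.linalg.solve(A, b)))
    # Equation (4): p_k = a_k + sum_{j=1}^{k} q_j a_{k-j}
    p = np.array([a[k] + sum(q[j] * a[k - j] for j in range(1, k + 1))
                  for k in range(n + 1)])
    return p, q

For example, for f(x) = e^x with n = 1 and Taylor coefficients a = [1, 1, 0.5], the sketch returns p = [1, 0.5] and q = [1, -0.5], i.e. the classical [1/1] Padé approximant (1 + x/2)/(1 - x/2).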
The multi-point Padé approximation is a generalization of the single-point Padé approximation. Let the approximated function be f(x); if the function values at the n + 1 interpolation points x_0, x_1, x_2, ..., x_n are known, there is a rational expression

    r_[L/M](x) = u_[L/M](x) / v_[L/M](x),    (5)

where L + M = n, u_[L/M](x) is a polynomial of degree at most L, and v_[L/M](x) is a polynomial of degree at most M:

    u_[L/M](x) = Σ_{k=0}^{L} u_k x^k,  v_[L/M](x) = Σ_{k=0}^{M} v_k x^k.    (6)

Here u_[L/M](x) and v_[L/M](x) are polynomial functions that need to be constructed from divided differences.
First, the divided differences of f(x) are defined as follows:

    f[x_i] = f(x_i),  f[x_i, x_{i+1}, ..., x_j] = ( f[x_{i+1}, ..., x_j] - f[x_i, ..., x_{j-1}] ) / ( x_j - x_i ).    (7)

Let f_{i,j} denote f[x_i, x_{i+1}, ..., x_j], j ≥ i; u_[L/M](x) can then be calculated from the divided differences f_{i,j} by equation (8), and v_[L/M](x) by equation (9) (equations (8) and (9) appear only as images in the source publication and are not reproduced here).
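The divided differences of equation (7) are conveniently tabulated with the usual recursive scheme; the short sketch below (illustrative only, not part of the original text) builds the full table f_{i,j} = f[x_i, ..., x_j] for distinct nodes:

def divided_differences(x, fx):
    """Divided-difference table of equation (7): table[i][j] = f[x_i, ..., x_j]."""
    n = len(x)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        table[i][i] = fx[i]                 # f[x_i] = f(x_i)
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            table[i][j] = (table[i + 1][j] - table[i][j - 1]) / (x[j] - x[i])
    return table

These f_{i,j} are the quantities that enter equations (8) and (9).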
The piecewise Padé approximation used by the invention is a special form of the multi-point Padé approximation: each interpolation point is given together with the function value and the derivative values from first order to order m at that point, and each segment is constructed on the basis of the multi-point Padé approximation.
Let the approximated function be f(x), with the following data given at the n + 1 interpolation points x_0, x_1, x_2, ..., x_n:

    f_i^(τ) = f^(τ)(x_i), τ = 0, 1, ..., m, i = 0, 1, ..., n,    (10)

where f_i^(τ) denotes the derivative value of order τ of f(x) at x_i.
Take any subinterval [x_k, x_{k+1}] and construct the Padé approximation expression on it,

    r_[L/M]^(k)(x) = u_[L/M](x) / v_[L/M](x),    (11)

where L + M + 1 = n, and the expressions for u_[L/M](x) and v_[L/M](x) have been given in equations (8) and (9). The specific calculation requires the equivalent set formed by 2m + 2 points,

    {z_0, z_1, ..., z_{2m+1}} = {x_k, ..., x_k, x_{k+1}, ..., x_{k+1}},    (12)

in which x_k and x_{k+1} each appear m + 1 times, i.e. z_i = x_k for 0 ≤ i ≤ m and z_i = x_{k+1} for m + 1 ≤ i ≤ 2m + 1.
According to equations (8) and (9), the divided differences f_{i,j} = f[z_i, z_{i+1}, ..., z_j], 0 ≤ i ≤ j ≤ 2m + 1, are required.
From the properties of divided differences at coincident nodes and equation (10):

    f_{i,j} = f_k^(j-i) / (j - i)!, 0 ≤ i ≤ j ≤ m,    (13)

    f_{i,j} = f_{k+1}^(j-i) / (j - i)!, m + 1 ≤ i ≤ j ≤ 2m + 1.    (14)

When 0 ≤ i ≤ m and m + 1 ≤ j ≤ 2m + 1, the recurrence formula is

    f_{i,j} = ( f_{i+1,j} - f_{i,j-1} ) / ( x_{k+1} - x_k ).    (15)

When i + 1 ≥ m + 1, the term f_{i+1,j} is calculated directly from equation (14); when j - 1 ≤ m, the term f_{i,j-1} is calculated directly from equation (13).
The calculated f_{i,j} are substituted into equations (8) and (9) to obtain u_[L/M](x) and v_[L/M](x), and hence r_[L/M]^(k)(x) on [x_k, x_{k+1}].
The function r_[L/M](x) constructed by the piecewise Padé approximation is then expressed as

    r_[L/M](x) = r_[L/M]^(k)(x), x ∈ [x_k, x_{k+1}), k = 0, 1, ..., n - 1.    (16)
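A short sketch of how the table f_{i,j} of equations (12)-(15) can be built on one subinterval [x_k, x_{k+1}] is given below (illustrative only; it assumes d_left[τ] = f_k^(τ) and d_right[τ] = f_{k+1}^(τ) are the derivative values of equation (10)):

import math

def confluent_divided_differences(xk, xk1, d_left, d_right):
    """Table F with F[i][j] = f[z_i, ..., z_j] over the node set of equation (12)."""
    m = len(d_left) - 1
    N = 2 * m + 2
    z = [xk] * (m + 1) + [xk1] * (m + 1)           # equation (12)
    F = [[0.0] * N for _ in range(N)]
    for width in range(N):
        for i in range(N - width):
            j = i + width
            if j <= m:                              # all nodes equal x_k: equation (13)
                F[i][j] = d_left[j - i] / math.factorial(j - i)
            elif i >= m + 1:                        # all nodes equal x_{k+1}: equation (14)
                F[i][j] = d_right[j - i] / math.factorial(j - i)
            else:                                   # mixed nodes: recurrence (15)
                F[i][j] = (F[i + 1][j] - F[i][j - 1]) / (z[j] - z[i])
    return F

With this table, u_[L/M](x) and v_[L/M](x) follow from equations (8) and (9), giving the Padé expression (11) on that subinterval and, segment by segment, the piecewise function (16).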
the invention is further improved in that: setting the training round as N in the step 3, wherein the training steps are as follows:
step 3.1, inputting the training data into a neural network, and executing step 3.2;
step 3.2, data in the module is transmitted in the forward direction, and data Hn×mInputting the data into an activation function layer, and executing the next step;
step 3.3, activating the hyper-parameter x of the function layer0,x1,x2,…,xnAnd trainable parameters
Figure BDA0003500135930000069
Figure BDA00035001359300000610
The piecewise function is obtained according to equations (10) - (16) as interpolation points and derivative values from zero order to m order
Figure BDA00035001359300000611
Forming a piecewise activation function r[L/M](x);
Step 3.4, data Hn×mAfter passing through the activation function r[L/M](x) To obtain an output Zn×mExpressed as:
Figure 100002_16
to obtain an output Zn×m
3.5, continuing forward propagation of the data until a next activation function layer is encountered, jumping to the step 3.3, and otherwise, executing the step 3.6;
step 3.6, obtaining a training result, calculating the value of a loss function, and automatically performing back propagation and updating neural network weight and trainable parameters by a framework; if the current round is less than or equal to N, a new batch of training data is taken, and the step 3.2 is skipped; otherwise, the model training process is ended.
The invention is further improved in that: in step 4, model prediction is carried out; if the prediction result meets the requirement, the model training is successful and the training is finished; otherwise, the method returns to step 3.
The beneficial effects of the invention are as follows. Following the idea of fast flexible holomorphic embedding (FFHE), the invention provides an activation function based on piecewise rational approximation: the interpolation points, function values, and derivative values of each order are first initialized, and the piecewise activation function is then constructed by the piecewise rational approximation method. Its advantages are:
(i) Stronger expressive capability: a piecewise function has stronger expressive capability than an ordinary function, with a solid theoretical foundation. It has been shown in the literature that, under a Lipschitz condition, the pointwise nonlinearity can be linked to the global Lipschitz constant of the network through a bound, and regularizing with this bound yields a representer theorem stating that the optimal configuration is realized by a deep spline network in which every activation function is a piecewise-linear spline with adaptive knots.
(ii) Better smoothness: other commonly used activation functions such as the ReLU function, the PReLU function, and piecewise-linear splines are only piecewise first-order differentiable, which is limiting in some scenarios; for example, solving a differential equation with a neural network usually requires second-order or even higher-order derivatives of the network output with respect to the input, and an activation function that is only first-order differentiable makes those gradients zero, so the parameters cannot be updated effectively.
(iii) More flexible and easier to compute: the activation function based on piecewise rational approximation sets initial interpolation points, function values, and derivative values of each order, and treats the function values and derivative values as parameters that are adjusted as the neural network trains. This adaptive adjustment lets back propagation update the network in the steepest direction, so the activation function needs fewer rounds than other activation functions to reach the expected accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a flow chart of neural network model training based on a piecewise rational approximation activation function.
Fig. 3 is a schematic diagram of a neural network model structure of the present invention.
Fig. 4 is a schematic diagram of the structure of the PINNs model.
Fig. 5 is a graph of the training curves of the LeakyReLU, ReLU, Tanh, and FFHE activation functions.
Detailed Description
Embodiments of the invention will be described in detail below with reference to the drawings, and for the sake of clarity, many implementation details will be set forth in the following description. It should be understood, however, that these implementation details should not be used to limit the invention. That is, details of these implementations are not necessary in some embodiments of the invention.
As shown in FIGS. 1-3, the invention is a fast, flexible, holomorphic-embedding neural network wide-area optimization training method, comprising the following steps:
step 1, determining a differential equation to be solved, and sampling in a defined domain to obtain training data and test data; step 2, constructing a neural network model, and embedding an activation function layer based on piecewise rational approximation;
step 3, adjusting the hyper-parameters and training a neural network model;
step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished; otherwise, returning to the step 3.
The differential equation to be solved in step 1 is the Burgers equation. The Burgers equation is a very useful mathematical model for many physical problems, such as shock waves, shallow-water waves, and traffic flow mechanics, and is an important mathematical model for describing diffusion phenomena in the physical world. It is a nonlinear partial differential equation that simulates the propagation and reflection of a shock wave, defined as follows:

    u_t + u·u_x - (0.01/π)·u_xx = 0, x ∈ [-1, 1], t ∈ [0, 1],
    u(0, x) = -sin(πx),
    u(t, -1) = u(t, 1) = 0.

The equation is a time-dependent, one-dimensional partial differential equation with an initial condition and boundary conditions in space.
The PINNs (physics-informed neural networks) model is used in step 2, and its overall structure is shown in FIG. 4, with the independent variables x and t of the differential equation as inputs and the dependent variable u as the output. NN(x, t; θ) in the figure denotes a fully connected neural network, where θ are the hidden-layer weights. The PDE(λ) part of the figure represents the composition of the loss function of the neural network model. The loss function of PINNs is divided into two parts: one part covers the initial and boundary conditions, and the other covers the equation itself.
Taking the Burgers equation as an example, let the number of samples on the boundary and the initial condition be N_u and the number of samples inside the domain be N_f. The first part of the loss function is the MSE of the model output on the initial and boundary conditions:

    MSE_u = (1/N_u) Σ_{i=1}^{N_u} | u(t_u^i, x_u^i) - u^i |^2.

The second part of the loss function is the MSE of the model output on the equation itself. Let

    γ = u_t + u·u_x - (0.01/π)·u_xx.

Then

    MSE_f = (1/N_f) Σ_{i=1}^{N_f} | γ(t_f^i, x_f^i) |^2.

The final loss function is the sum of the two:

    MSE = MSE_u + MSE_f.
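A minimal sketch of this loss with automatic differentiation is given below (PyTorch is assumed as the framework, and `u_net`, the network mapping (x, t) to u, together with the tensor shapes, are assumptions made for illustration rather than definitions from the original text):

import math
import torch

def burgers_residual(u_net, x, t):
    """gamma = u_t + u*u_x - (0.01/pi)*u_xx, computed with autograd."""
    x = x.detach().requires_grad_(True)   # treat x, t as leaf inputs
    t = t.detach().requires_grad_(True)
    u = u_net(x, t)
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t + u * u_x - (0.01 / math.pi) * u_xx

def pinn_loss(u_net, x_u, t_u, u_true, x_f, t_f):
    """MSE = MSE_u (boundary/initial data term) + MSE_f (equation term)."""
    mse_u = torch.mean((u_net(x_u, t_u) - u_true) ** 2)
    mse_f = torch.mean(burgers_residual(u_net, x_f, t_f) ** 2)
    return mse_u + mse_f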
As shown in FIG. 3, the present invention uses a fully connected PINNs network with four hidden layers of 20 neurons each. A total of 25,600 (x, t) data pairs are obtained by sampling over the domain, its boundary, and the initial condition; from all of these data, the Latin hypercube sampling method selects 10,000 (x, t) data pairs inside the domain and 100 (x, t) data pairs on the boundary and the initial condition, so that 10,100 data pairs in total are taken as the training data of the model. The remaining (x, t) data pairs are used as the test data of the model.
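One possible sketch of this data preparation is shown below (it uses scipy's Latin hypercube sampler; the split of the 100 initial and boundary samples and the random-number details are assumptions for illustration only):

import numpy as np
from scipy.stats import qmc

def sample_burgers_data(n_f=10000, n_u=100, seed=0):
    """Latin hypercube collocation points plus initial/boundary training data."""
    rng = np.random.default_rng(seed)
    # n_f interior collocation points (x, t) in [-1, 1] x [0, 1]
    unit = qmc.LatinHypercube(d=2, seed=seed).random(n_f)
    x_f = 2.0 * unit[:, 0:1] - 1.0
    t_f = unit[:, 1:2]
    # Initial condition t = 0: u(0, x) = -sin(pi x)
    x0 = 2.0 * rng.random((n_u // 2, 1)) - 1.0
    initial = np.hstack([x0, np.zeros_like(x0), -np.sin(np.pi * x0)])
    # Boundary conditions x = +/-1: u = 0
    tb = rng.random((n_u - n_u // 2, 1))
    xb = rng.choice([-1.0, 1.0], size=(n_u - n_u // 2, 1))
    boundary = np.hstack([xb, tb, np.zeros_like(tb)])
    return x_f, t_f, np.vstack([initial, boundary])   # columns of the last array: x, t, u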
Each full-connection hidden layer is followed by an activation function layer based on piecewise rational approximation, each activation function layer has six trainable parameters, and each activation function layer has n +1 hyper-parameters x0,x1,x2,…,xnRepresenting interpolation points, (m +1) (n +1) trainable parameters
Figure BDA0003500135930000102
Figure BDA0003500135930000103
Representing the derivative values from zero to m.
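A sketch of how such a layer can be registered so that the framework trains the derivative values together with the network weights is given below; `build_piecewise_pade` is a hypothetical helper standing in for the construction of equations (10)-(16) and is not defined in the original text:

import torch
import torch.nn as nn

class PiecewiseRationalActivation(nn.Module):
    """Activation layer with fixed interpolation points and trainable derivative values."""
    def __init__(self, knots, m):
        super().__init__()
        # n + 1 interpolation points x_0..x_n (hyper-parameters, not trained)
        self.register_buffer("knots", torch.as_tensor(knots, dtype=torch.float32))
        # (m + 1)(n + 1) trainable derivative values f_i^(tau), tau = 0..m
        self.derivs = nn.Parameter(0.1 * torch.randn(len(knots), m + 1))

    def forward(self, H):
        # Hypothetical helper: builds r_[L/M] from the current parameters (eqs. (10)-(16))
        r = build_piecewise_pade(self.knots, self.derivs)
        return r(H)   # Z = r_[L/M](H), applied element-wise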
The invention designs the activation function according to the idea of fast flexible holomorphic embedding (FFHE), combined with the mathematics of piecewise rational approximation. The Padé approximation is a method for constructing rational-function approximations; it is more accurate than a truncated Taylor series, and it often converges even when the Taylor series does not converge. In addition, when constructing an interpolation function, piecewise interpolation is generally adopted in order to avoid the Runge phenomenon caused by high-order polynomials; the interpolation result then depends only on a few surrounding points, and a composite piecewise function is finally formed.
The construction process of the activation function based on piecewise rational approximation in step 2 is as described by equations (10)-(16) above.
In step 3, the maximum number of training rounds is set to N, and training the neural network model specifically comprises the following steps:
step 3.1: input the training data into the neural network, and execute step 3.2;
step 3.2: propagate the data forward in the model; when the data H_{n×m} reaches an activation function layer it is input into that layer, and the next step is executed;
step 3.3: with the hyper-parameters x_0, x_1, x_2, ..., x_n of the activation function layer as interpolation points and the trainable parameters f_i^(τ) (τ = 0, 1, ..., m; i = 0, 1, ..., n) as the derivative values from order zero to order m, obtain the piecewise functions r_[L/M]^(k)(x) according to equations (10)-(16), forming the piecewise activation function r_[L/M](x);
step 3.4: the data H_{n×m} passes through the activation function r_[L/M](x), applied element-wise, to obtain the output Z_{n×m} = r_[L/M](H_{n×m});
step 3.5: continue the forward propagation of the data; if another activation function layer is encountered, jump to step 3.3, otherwise execute step 3.6;
step 3.6: obtain the training result and calculate the value of the loss function; the framework automatically performs back propagation and updates the neural network weights and the trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, end the model training process.
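These steps correspond to an ordinary gradient-based training loop. A minimal sketch is given below (PyTorch is assumed; `model`, `batches`, and `compute_loss` stand for the PINNs network with its piecewise-rational activation layers, the sampled training data, and the loss MSE_u + MSE_f, and are placeholders rather than definitions from the original text):

import torch

def train_model(model, batches, compute_loss, N, lr=0.002):
    """Training loop sketch for steps 3.1-3.6."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(N):                      # at most N training rounds
        for batch in batches:                   # step 3.1: a batch of training data
            optimizer.zero_grad()
            loss = compute_loss(model, batch)   # steps 3.2-3.6: forward pass and loss value
            loss.backward()                     # back propagation by the framework
            optimizer.step()                    # update weights and trainable activation parameters
    return model

Because the derivative values of the activation layers are registered as ordinary trainable parameters, `model.parameters()` already includes them, so no separate update step is needed.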
Step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished; otherwise, returning to the step 3.
Training was run for 7,000 rounds with the learning rate set to 0.002. The LeakyReLU activation function and the ReLU activation function have the worst training effect; their corresponding training curves in FIG. 5 are the top two, almost coincident, curves. The activation function constructed by the piecewise Padé approximation takes an average of 4.307 s of training time per hundred rounds, versus 3.532 s per hundred rounds for the Tanh function; however, the activation function constructed by the piecewise Padé approximation reaches a training error of 9.4067E-04 by round 1,500, whereas the Tanh function only brings the training error down to 9.1780E-04 after 7,000 rounds of training. That is, the FFHE method needs only about one fifth of the training rounds required by Tanh to reach the same error level; and if both are trained for 7,000 rounds, the FFHE method gives results more than two orders of magnitude (100 times) more accurate than those obtained with Tanh. Therefore, the activation function constructed with the FFHE (piecewise Padé approximation) method is superior to common activation functions in both training time and training accuracy, and the method provides a powerful means of rapidly and accurately solving the high-dimensional partial differential equations involved in practical engineering computation tasks.
The above description covers only preferred embodiments of the present invention. It should be noted that various modifications and adaptations will be apparent to those skilled in the art without departing from the principles of the invention, and these are intended to fall within the scope of the invention.

Claims (6)

1. A fast, flexible, holomorphic-embedding neural network wide-area optimization training method, characterized by comprising the following steps:
step 1, determining a differential equation to be solved, and sampling in a defined domain to obtain training data and test data;
step 2, constructing a neural network model, and embedding an activation function layer based on piecewise rational approximation;
step 3, adjusting the hyper-parameters and training a neural network model;
step 4, model prediction is carried out, if the prediction result meets the requirement, the model training is successful, and the training is finished; otherwise, returning to the step 3.
2. The method of claim 1, wherein the differential equation in step 1 is the Burgers equation.
3. The method of claim 1, wherein the neural network model constructed in step 2 comprises an input layer, four fully connected layers, four activation function layers, and an output layer.
4. The method of claim 1, wherein the activation function of the piecewise rational approximation in step 2 is constructed as follows:
suppose that at some point x_0 the function f(x) is approximated using the single-point Padé approximation method, in the form

    r_[L/M](x) = (Σ_{k=0}^{L} p_k x^k) / (Σ_{k=0}^{M} q_k x^k), with q_0 = 1,    (1)

where p_k and q_k are coefficients to be determined, L is the highest power of x in the numerator, and M is the highest power of x in the denominator; for a fixed L + M, take L = M; let L = M = n and let a_0, a_1, ..., a_{2n} denote the Taylor coefficients of f(x) at x_0; first solve the linear system A q = b to obtain the values (q_1, q_2, q_3, ..., q_n), where

    A = ( a_n       a_{n-1}   ...  a_1
          a_{n+1}   a_n       ...  a_2
          ...
          a_{2n-1}  a_{2n-2}  ...  a_n ),    (2)

    q = (q_1, q_2, ..., q_n)^T,  b = -(a_{n+1}, a_{n+2}, ..., a_{2n})^T;    (3)

with p_0 = a_0 and q_0 = 1, the values (p_0, p_1, p_2, ..., p_n) are obtained from

    p_k = a_k + Σ_{j=1}^{k} q_j a_{k-j}, k = 0, 1, ..., n;    (4)

the multi-point Padé approximation is a generalization of the single-point Padé approximation: let the approximated function be f(x); if the function values at the n + 1 interpolation points x_0, x_1, x_2, ..., x_n are known, there is a rational expression

    r_[L/M](x) = u_[L/M](x) / v_[L/M](x),    (5)

where L + M = n, u_[L/M](x) is a polynomial of degree at most L, and v_[L/M](x) is a polynomial of degree at most M:

    u_[L/M](x) = Σ_{k=0}^{L} u_k x^k,  v_[L/M](x) = Σ_{k=0}^{M} v_k x^k;    (6)

here u_[L/M](x) and v_[L/M](x) are polynomial functions constructed from divided differences; first, the divided differences of f(x) are defined by

    f[x_i] = f(x_i),  f[x_i, x_{i+1}, ..., x_j] = ( f[x_{i+1}, ..., x_j] - f[x_i, ..., x_{j-1}] ) / ( x_j - x_i );    (7)

let f_{i,j} denote f[x_i, x_{i+1}, ..., x_j], j ≥ i; u_[L/M](x) is then calculated from the f_{i,j} by equation (8), and v_[L/M](x) by equation (9) (equations (8) and (9) are given only as images in the original publication and are not reproduced here);
the piecewise Padé approximation used by the invention is a special form of the multi-point Padé approximation: each interpolation point is given together with the function value and the derivative values from first order to order m at that point, and each segment is constructed on the basis of the multi-point Padé approximation;
let the approximated function be f(x), with the following data given at the n + 1 interpolation points x_0, x_1, x_2, ..., x_n:

    f_i^(τ) = f^(τ)(x_i), τ = 0, 1, ..., m, i = 0, 1, ..., n,    (10)

where f_i^(τ) denotes the derivative value of order τ of f(x) at x_i;
take any subinterval [x_k, x_{k+1}] and construct the Padé approximation expression on it,

    r_[L/M]^(k)(x) = u_[L/M](x) / v_[L/M](x),    (11)

where L + M + 1 = n, and the expressions for u_[L/M](x) and v_[L/M](x) have been given in equations (8) and (9); the specific calculation requires the equivalent set formed by 2m + 2 points,

    {z_0, z_1, ..., z_{2m+1}} = {x_k, ..., x_k, x_{k+1}, ..., x_{k+1}},    (12)

in which x_k and x_{k+1} each appear m + 1 times, i.e. z_i = x_k for 0 ≤ i ≤ m and z_i = x_{k+1} for m + 1 ≤ i ≤ 2m + 1; according to equations (8) and (9), the divided differences f_{i,j} = f[z_i, z_{i+1}, ..., z_j], 0 ≤ i ≤ j ≤ 2m + 1, are required; from the properties of divided differences at coincident nodes and equation (10),

    f_{i,j} = f_k^(j-i) / (j - i)!, 0 ≤ i ≤ j ≤ m,    (13)

    f_{i,j} = f_{k+1}^(j-i) / (j - i)!, m + 1 ≤ i ≤ j ≤ 2m + 1;    (14)

when 0 ≤ i ≤ m and m + 1 ≤ j ≤ 2m + 1, the recurrence formula is

    f_{i,j} = ( f_{i+1,j} - f_{i,j-1} ) / ( x_{k+1} - x_k );    (15)

when i + 1 ≥ m + 1, the term f_{i+1,j} is calculated directly from equation (14); when j - 1 ≤ m, the term f_{i,j-1} is calculated directly from equation (13); the calculated f_{i,j} are substituted into equations (8) and (9) to obtain u_[L/M](x) and v_[L/M](x), and hence r_[L/M]^(k)(x); the function r_[L/M](x) constructed by the piecewise Padé approximation is then expressed as

    r_[L/M](x) = r_[L/M]^(k)(x), x ∈ [x_k, x_{k+1}), k = 0, 1, ..., n - 1.    (16)
5. The method of claim 1, wherein the number of training rounds in step 3 is set to N, and the training steps are as follows:
step 3.1: input the training data into the neural network, and execute step 3.2;
step 3.2: propagate the data forward in the model; when the data H_{n×m} reaches an activation function layer it is input into that layer, and the next step is executed;
step 3.3: with the hyper-parameters x_0, x_1, x_2, ..., x_n of the activation function layer as interpolation points and the trainable parameters f_i^(τ) (τ = 0, 1, ..., m; i = 0, 1, ..., n) as the derivative values from order zero to order m, obtain the piecewise functions r_[L/M]^(k)(x) according to equations (10)-(16), forming the piecewise activation function r_[L/M](x);
step 3.4: the data H_{n×m} passes through the activation function r_[L/M](x), applied element-wise, to obtain the output Z_{n×m} = r_[L/M](H_{n×m});
step 3.5: continue the forward propagation of the data; if another activation function layer is encountered, jump to step 3.3, otherwise execute step 3.6;
step 3.6: obtain the training result and calculate the value of the loss function; the framework automatically performs back propagation and updates the neural network weights and the trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, end the model training process.
6. The method of claim 1, wherein in step 4 model prediction is carried out; if the prediction result meets the requirement, the model training is successful and the training is finished; otherwise, the method returns to step 3.
CN202210125273.3A 2022-02-10 2022-02-10 Rapid flexible full-pure embedded neural network wide area optimization training method Pending CN114548400A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210125273.3A CN114548400A (en) 2022-02-10 2022-02-10 Rapid flexible full-pure embedded neural network wide area optimization training method
PCT/CN2022/094901 WO2023151201A1 (en) 2022-02-10 2022-05-25 Fast and flexible holomorphic embedding type neural network wide-area optimization training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210125273.3A CN114548400A (en) 2022-02-10 2022-02-10 Rapid flexible full-pure embedded neural network wide area optimization training method

Publications (1)

Publication Number Publication Date
CN114548400A true CN114548400A (en) 2022-05-27

Family

ID=81672897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210125273.3A Pending CN114548400A (en) 2022-02-10 2022-02-10 Rapid flexible full-pure embedded neural network wide area optimization training method

Country Status (2)

Country Link
CN (1) CN114548400A (en)
WO (1) WO2023151201A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700049A (en) * 2023-07-12 2023-09-05 山东大学 Multi-energy network digital twin real-time simulation system and method based on data driving

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11175641B2 (en) * 2018-08-10 2021-11-16 Cornell University Processing platform with holomorphic embedding functionality for power control and other applications
CN112597700B (en) * 2020-12-15 2022-09-27 北京理工大学 Aircraft trajectory simulation method based on neural network
CN112784496A (en) * 2021-01-29 2021-05-11 上海明略人工智能(集团)有限公司 Method and device for predicting motion parameters of hydrodynamics and storage medium
CN113183146B (en) * 2021-02-04 2024-02-09 中山大学 Mechanical arm motion planning method based on rapid and flexible full-pure embedding idea
CN113489014B (en) * 2021-07-19 2023-06-02 中山大学 Quick and flexible full-pure embedded power system optimal power flow evaluation method
CN114239698A (en) * 2021-11-26 2022-03-25 中国空间技术研究院 Data processing method, device and equipment
CN114385969A (en) * 2022-01-12 2022-04-22 温州大学 Neural network method for solving differential equations

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700049A (en) * 2023-07-12 2023-09-05 山东大学 Multi-energy network digital twin real-time simulation system and method based on data driving
CN116700049B (en) * 2023-07-12 2024-05-28 山东大学 Multi-energy network digital twin real-time simulation system and method based on data driving

Also Published As

Publication number Publication date
WO2023151201A1 (en) 2023-08-17

Similar Documents

Publication Publication Date Title
CN110119854B (en) Voltage stabilizer water level prediction method based on cost-sensitive LSTM (least squares) cyclic neural network
CN109784480B (en) Power system state estimation method based on convolutional neural network
CN111292525B (en) Traffic flow prediction method based on neural network
CN112541572B (en) Residual oil distribution prediction method based on convolutional encoder-decoder network
CN108764568B (en) Data prediction model tuning method and device based on LSTM network
CN108537366B (en) Reservoir scheduling method based on optimal convolution bidimensionalization
KR20040099092A (en) Improved performance of artificial neural network models in the presence of instrumental noise and measurement errors
CN112578089B (en) Air pollutant concentration prediction method based on improved TCN
Lun et al. The modified sufficient conditions for echo state property and parameter optimization of leaky integrator echo state network
CN112800675A (en) KPCA and ELM-based time-space separation distribution parameter system modeling method
CN114036850A (en) Runoff prediction method based on VECGM
CN114548400A (en) Rapid flexible full-pure embedded neural network wide area optimization training method
Fan et al. Combining a fully connected neural network with an ensemble kalman filter to emulate a dynamic model in data assimilation
CN110619382A (en) Convolution depth network construction method suitable for seismic exploration
CN115081323A (en) Method for solving multi-objective constrained optimization problem and storage medium thereof
Alrubaie Cascade-Forward neural network for volterra integral equation solution
CN114202063A (en) Fuzzy neural network greenhouse temperature prediction method based on genetic algorithm optimization
Nguyen et al. A deep learning approach for solving Poisson’s equations
CN110829434B (en) Method for improving expansibility of deep neural network tidal current model
CN114638421A (en) Method for predicting requirement of generator set spare parts
CN114529040A (en) On-line prediction method for assembly error of electromechanical product
Marepally et al. Data Puncturing and Training Strategies for Cost-Efficient Surrogate Modeling of Airfoil Aerodynamics
YANG Benefits of a metamodel for automatic calibration of 1D and 2D fluvial models
Yang et al. ELM weighted hybrid modeling and its online modification
CN117786396A (en) Short-term sea surface temperature prediction method and system based on CSA-ConvLSTM model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination