WO2023151201A1 - A fast and flexible holomorphic embedding neural network wide-area optimization training method - Google Patents

A fast and flexible holomorphic embedding neural network wide-area optimization training method

Info

Publication number
WO2023151201A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
neural network
function
activation function
approximation
Prior art date
Application number
PCT/CN2022/094901
Other languages
English (en)
French (fr)
Inventor
汪涛
谭洪宇
高子雄
何晓斌
Original Assignee
中山大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中山大学 filed Critical 中山大学
Publication of WO2023151201A1 publication Critical patent/WO2023151201A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13Differential equations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • The invention relates to the technical fields of information science and engineering computation, and in particular to a fast and flexible holomorphic embedding neural network wide-area optimization training method.
  • Partial differential equations are widely used throughout the natural sciences and in engineering applications such as oil and gas exploration, bridge design, and mechanical manufacturing. In many complex scenarios, however, an analytical solution is unavailable, so numerical methods are more commonly used, such as the traditional finite difference, finite element, and finite volume methods. Traditional methods must partition the domain into a number of grid cells to approximate the solution space of the partial differential equation; when the dimension is very high, the number of grid cells becomes enormous and the computational cost is very large. Solving partial differential equations with neural networks (NN), by contrast, requires no mesh generation: points sampled randomly in the domain serve as the model input, which avoids the curse of dimensionality.
  • Over the past decade, deep neural networks (DNNs) have developed into a foundational technology and key tool of machine learning. They have been found to outperform traditional statistical learning techniques (e.g., kernel methods, support vector machines, random forests) in many practical applications such as image classification, speech recognition, image segmentation, and medical imaging.
  • A neural network is a complex network system formed by the extensive interconnection of a large number of simple processing units (called neurons); it reflects many basic characteristics of human brain function and is a highly complex nonlinear dynamical learning system.
  • A neural network has the following four basic characteristics:
  • Nonlinearity: nonlinear relationships are a universal property of nature, and brain intelligence is a nonlinear phenomenon. An artificial neuron is in one of two states, activated or inhibited, and this behavior is mathematically a nonlinear relationship. Networks composed of neurons with thresholds perform better, with improved fault tolerance and storage capacity.
  • Non-locality: a neural network is usually formed by the extensive connection of many neurons. The overall behavior of the system depends not only on the characteristics of individual neurons but may be determined mainly by the interactions and interconnections between units. The non-locality of the brain is simulated through the large number of connections between units; associative memory is a typical example of non-locality.
  • Non-stationarity: an artificial neural network is self-adaptive, self-organizing, and self-learning. Not only can the information processed by the network change in various ways, but the nonlinear dynamical system itself also changes continuously while processing that information. An iterative process is often used to describe the evolution of such a dynamic or time-varying system.
  • Non-convexity: under certain conditions, the evolution direction of a system depends on a particular state function, for example an energy function whose extrema correspond to relatively stable states of the system. Non-convexity means that this function has multiple extrema, so the system has multiple relatively stable equilibrium states, which leads to diversity in the system's evolution.
  • Activation functions play a very important role when artificial neural network models learn and represent complex (usually highly nonlinear) relationships; they introduce nonlinearity into the network. In a neuron, the inputs are weighted, summed, and passed through a function called the activation function. The activation function introduces a nonlinear factor into the neuron, so that the neural network can approximate arbitrary nonlinear functions and can therefore be applied to a wide range of nonlinear models.
  • There are still few clear theoretical principles guiding the choice of activation function. The usual choices are the ReLU function, the Sigmoid function, and the hyperbolic tangent function; existing activation functions are typically one of these three or a variant of them (for example, with one or two trainable parameters). The advantages and disadvantages of these three activation functions are:
  • The ReLU function is the most commonly used activation function in modern neural networks and the default activation function of most feedforward networks. Its advantages are fast convergence and the absence of gradient saturation and vanishing gradients in the region x > 0. Its disadvantages are also obvious: in the negative region the ReLU function is identically zero, which causes dying neurons, where the gradient of that neuron and of all neurons behind it remains zero and they can no longer be updated during that training round; moreover, because the second- and higher-order derivatives of ReLU are zero in both the positive and negative regions, the network cannot be trained effectively in certain special applications, such as using neural networks to solve differential equations.
  • The advantage of the Sigmoid function is that its output lies in (0, 1), it is stable to optimize, and it is continuous and easy to differentiate; its drawback is that it saturates when the absolute value of its argument is very large, becoming insensitive to changes in input and output.
  • The hyperbolic tangent function can be regarded as a transformation of the Sigmoid function and still suffers from gradient saturation.
  • The present invention is a fast and flexible holomorphic embedding neural network wide-area optimization training method, comprising the following steps:
  • Step 1: determine the differential equation to be solved, and sample training data and test data within its domain of definition;
  • Step 2: construct a neural network model and embed activation function layers based on piecewise rational approximation;
  • Step 3: adjust the hyperparameters and train the neural network model;
  • Step 4: perform model prediction; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
  • The differential equation in step 1 is the Burgers equation.
  • the neural network model constructed in step 2 includes an input layer, four fully connected layers, four activation function layers and an output layer.
  • The activation function based on piecewise rational approximation in step 2 is constructed as follows: suppose that a function f(x) is approximated at some point x_0 by the single-point Padé approximation, in which p_k and q_k are the coefficients to be determined, L is the highest power of x in the numerator, and M is the highest power of x in the denominator, with L+M held constant.
  • The multi-point Padé approximation is a generalization of the single-point Padé approximation: if the values of the approximated function f(x) are known at n+1 interpolation points x_0, x_1, x_2, …, x_n, then there is a rational fraction whose numerator u_{[L/M]}(x) is a polynomial of degree at most L and whose denominator v_{[L/M]}(x) is a polynomial of degree at most M, both constructed from divided differences. Letting f_{i,j} denote the divided difference f[x_i, x_{i+1}, …, x_j], j ≥ i, the polynomials u_{[L/M]}(x) and v_{[L/M]}(x) can be computed as given in formulas (8) and (9) of the description.
  • The piecewise Padé approximation used in the present invention constructs each segment from the multi-point Padé approximation by specifying the interpolation points together with the function value and the first- to m-th-order derivative values at each interpolation point; it is a special form of the multi-point Padé approximation and is constructed as described in the description.
  • A further improvement of the present invention is that the number of training rounds in step 3 is set to N, and the training steps are as follows:
  • Step 3.1: input the training data into the neural network and go to step 3.2;
  • Step 3.2: the data propagates forward inside the module; the data H_{n×m} is input to an activation function layer; go to the next step;
  • Step 3.3: using the hyperparameters x_0, x_1, x_2, …, x_n of the activation function layer as interpolation points and the trainable parameters as the derivative values of order zero to order m, obtain the piecewise functions according to formulas (10)-(16) and assemble them into the piecewise activation function r_{[L/M]}(x);
  • Step 3.4: the data H_{n×m} passes through the activation function r_{[L/M]}(x) to give the output Z_{n×m};
  • Step 3.5: the data continues to propagate forward; when the next activation function layer is encountered, jump to step 3.3; otherwise, go to step 3.6;
  • Step 3.6: obtain the training result and compute the value of the loss function; the framework automatically performs backpropagation and updates the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, the model training procedure ends.
  • A further improvement of the present invention is that model prediction is performed in step 4; if the prediction result meets the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
  • Based on the idea of fast and flexible holomorphic embedding (FFHE), the present invention proposes an activation function based on piecewise rational approximation. The interpolation points, function values, and derivative values of each order are first initialized, and the piecewise activation function is then constructed by the piecewise rational approximation method. Its advantages are stronger expressive power, better smoothness (continuous higher-order derivatives, so the parameters can still be updated effectively when second- or higher-order derivatives of the network output are required), and greater flexibility, since the function values and derivative values are trainable parameters that adapt during training.
  • Fig. 1 is a schematic flow chart of the present invention.
  • Fig. 2 is a flow chart of neural network model training based on piecewise rational approximation activation function.
  • Fig. 3 is a structural schematic diagram of the neural network model of the present invention.
  • Figure 4 is a schematic diagram of the structure of the PINNs model.
  • Figure 5 shows the training curves of the LeakyReLU, ReLU, Tanh, and FFHE activation functions.
  • The present invention is a fast and flexible holomorphic embedding neural network wide-area optimization training method, comprising the following steps:
  • Step 1: determine the differential equation to be solved, and sample training data and test data within its domain of definition;
  • Step 2: construct a neural network model and embed activation function layers based on piecewise rational approximation;
  • Step 3: adjust the hyperparameters and train the neural network model;
  • Step 4: perform model prediction; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
  • the differential equation to be solved in step 1 is the Burgers equation.
  • The Burgers equation is a very useful mathematical model for many physical problems, such as shock waves, shallow-water waves, and traffic flow dynamics, and is an important mathematical model for describing diffusion phenomena in the physical world. It is a nonlinear partial differential equation that models the propagation and reflection of shock waves: u_t + u u_x - (0.01/π) u_{xx} = 0 for x ∈ [-1, 1], t ∈ [0, 1], with u(0, x) = -sin(πx) and u(t, -1) = u(t, 1) = 0.
  • The equation is a time-varying partial differential equation with a one-dimensional state space, subject to an initial condition and boundary conditions.
  • the PINNs model is used in step 2, and the general structure of the model is shown in Figure 4.
  • the independent variables x and t of the differential equation are used as input, and the dependent variable u is used as output.
  • NN(x, t; θ) denotes a fully connected neural network, and θ are the weights of its hidden layers.
  • the PDE( ⁇ ) part in the figure indicates the composition of the loss function in the neural network model.
  • the loss function of PINNs is divided into two parts: one is the initial condition and boundary part, and the other is the equation itself.
  • the first part of the loss function is to calculate the MSE of the output of the model over the initial and boundary conditions:
  • the second part of the loss function is to calculate the MSE of the output of the model on the equation:
  • the final loss function is the sum of the two:
  • MSE = MSE_u + MSE_f.
  • The fully connected neural network of PINNs has four hidden layers, each with 20 neurons.
  • 25,600 (x, t) data pairs are obtained by sampling inside the domain and on the boundary and initial condition; Latin hypercube sampling is then applied to all of the data to obtain 10,000 (x, t) pairs inside the domain and 100 (x, t) pairs on the boundary and initial condition, for a total of 10,100 pairs used as the training data of the model.
  • the remaining (x, t) data pairs are used as test data for the model.
  • Each activation function layer has six trainable parameters; in each activation function layer there are n+1 hyperparameters x_0, x_1, x_2, …, x_n representing the interpolation points and (m+1)(n+1) trainable parameters representing the derivative values of order zero to order m.
  • The present invention designs the activation function according to the idea of fast and flexible holomorphic embedding (FFHE), combined with the mathematics of piecewise rational approximation.
  • the Padé approximation is a method of constructing rational function approximation, and the Padé approximation is often more accurate than the truncated Taylor series; moreover, even when the Taylor series does not converge, the Padé approximation can often converge.
  • When constructing interpolation functions, piecewise interpolation is usually adopted to avoid the Runge phenomenon caused by high-degree polynomials: the interpolation result depends only on a few neighboring points, and a composite piecewise function is formed.
  • The construction of the activation function based on piecewise rational approximation in step 2 is as given in the description (formulas (10)-(16)).
  • In step 3 the maximum number of training rounds is set to N, and the specific steps for training the neural network model are as follows:
  • Step 3.1: input the training data into the neural network and go to step 3.2;
  • Step 3.2: the data propagates forward inside the module; the data H_{n×m} is input to an activation function layer; go to the next step;
  • Step 3.3: using the hyperparameters x_0, x_1, x_2, …, x_n of the activation function layer as interpolation points and the trainable parameters as the derivative values of order zero to order m, obtain the piecewise functions according to formulas (10)-(16) and assemble them into the piecewise activation function r_{[L/M]}(x);
  • Step 3.4: the data H_{n×m} passes through the activation function r_{[L/M]}(x) to give the output Z_{n×m};
  • Step 3.5: the data continues to propagate forward; when the next activation function layer is encountered, jump to step 3.3; otherwise, go to step 3.6;
  • Step 3.6: obtain the training result and compute the value of the loss function; the framework automatically performs backpropagation and updates the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, the model training procedure ends.
  • In step 4, model prediction is carried out. If the prediction result meets the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
  • Training is performed for 7000 rounds, and the learning rate is set to 0.002.
  • The LeakyReLU and ReLU activation functions give the worst training results.
  • Their training curves in Figure 5 are the two nearly overlapping curves at the top.
  • The activation function constructed by the piecewise Padé approximation takes an average of 4.307 s of training time per hundred rounds, versus 3.532 s for the Tanh function; the piecewise Padé activation function reaches a training error of 9.4067E-04 by round 1500, whereas the Tanh function only brings the training error down to 9.1780E-04 after 7000 rounds.
  • The FFHE method therefore needs only about one fifth of the training rounds required by Tanh to reduce the error to the same level; if both are trained for 7000 rounds, the accuracy of the FFHE results is more than two orders of magnitude (100 times) higher than that obtained with Tanh. The activation function constructed by the FFHE (piecewise Padé approximation) method of the present invention is thus superior to common activation functions in both training time and training accuracy, and the present invention provides a powerful solution for quickly and accurately solving the high-dimensional partial differential equation problems involved in practical engineering computation tasks.


Abstract

The present invention provides a fast and flexible holomorphic embedding neural network wide-area optimization training method, with the following specific steps: step 1, determine the differential equation to be solved, and sample training data and test data within its domain of definition; step 2, construct a neural network model and embed activation function layers based on piecewise rational approximation; step 3, adjust the hyperparameters and train the neural network model; step 4, perform model prediction; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3. The activation function constructed by the piecewise rational approximation method of the present invention outperforms common activation functions in both training time and training accuracy, and provides a powerful solution for quickly and accurately solving the high-dimensional partial differential equation problems involved in practical engineering computation tasks.

Description

A fast and flexible holomorphic embedding neural network wide-area optimization training method
Technical Field
The present invention relates to the technical fields of information science and engineering computation, and in particular to a fast and flexible holomorphic embedding neural network wide-area optimization training method.
Background Art
Partial differential equations are widely used throughout the natural sciences and in engineering applications such as oil and gas exploration, bridge design, and mechanical manufacturing. In many complex scenarios, however, analytical solutions are unavailable, so numerical methods are more commonly used, such as the traditional finite difference, finite element, and finite volume methods. Traditional methods must partition the domain into a number of grid cells to approximate the solution space of the partial differential equation; when the dimension is very high, the number of grid cells becomes enormous and the computational cost is very large. Solving partial differential equations with neural networks (NN), by contrast, requires no mesh generation: points sampled randomly in the domain serve as the model input, which avoids the curse of dimensionality.
Over the past decade, deep neural networks (DNNs) have developed into a foundational technology and key tool of machine learning. They have been found to outperform traditional statistical learning techniques (such as kernel methods, support vector machines, and random forests) in many practical applications, including image classification, speech recognition, image segmentation, and medical imaging.
A neural network is a complex network system formed by the extensive interconnection of a large number of simple processing units (called neurons); it reflects many basic characteristics of human brain function and is a highly complex nonlinear dynamical learning system. A neural network has the following four basic characteristics:
(i) Nonlinearity: nonlinear relationships are a universal property of nature, and brain intelligence is a nonlinear phenomenon. An artificial neuron is in one of two states, activated or inhibited, and this behavior is mathematically a nonlinear relationship. Networks composed of neurons with thresholds perform better, with improved fault tolerance and storage capacity.
(ii) Non-locality: a neural network is usually formed by the extensive connection of many neurons. The overall behavior of the system depends not only on the characteristics of individual neurons but may be determined mainly by the interactions and interconnections between units. The non-locality of the brain is simulated through the large number of connections between units; associative memory is a typical example of non-locality.
(iii) Non-stationarity: an artificial neural network is self-adaptive, self-organizing, and self-learning. Not only can the information processed by the network change in various ways, but the nonlinear dynamical system itself also changes continuously while processing that information. An iterative process is often used to describe the evolution of such a dynamic or time-varying system.
(iv) Non-convexity: under certain conditions, the evolution direction of a system depends on a particular state function, for example an energy function whose extrema correspond to relatively stable states of the system. Non-convexity means that this function has multiple extrema, so the system has multiple relatively stable equilibrium states, which leads to diversity in the system's evolution.
Activation functions play a very important role when artificial neural network models learn and represent complex (usually highly nonlinear) relationships; they introduce nonlinearity into the network. In a neuron, the inputs are weighted, summed, and passed through a function called the activation function. The activation function introduces a nonlinear factor into the neuron, so that the neural network can approximate arbitrary nonlinear functions and can therefore be applied to a wide range of nonlinear models.
Technical Problem
There are still few clear theoretical principles guiding the choice of activation function. The usual choices are the ReLU function, the Sigmoid function, and the hyperbolic tangent function; existing activation functions are typically one of these three or a variant of them (for example, with one or two trainable parameters). The advantages and disadvantages of these three activation functions are:
(i) The ReLU function is the most commonly used activation function in modern neural networks and the default activation function of most feedforward networks. Its advantages are fast convergence and the absence of gradient saturation and vanishing gradients in the region x > 0. Its disadvantages are also obvious: in the negative region the ReLU function is identically zero, which causes dying neurons, where the gradient of that neuron and of all neurons behind it remains zero and they can no longer be updated during that training round; moreover, because the second- and higher-order derivatives of ReLU are zero in both the positive and negative regions, the neural network model cannot be trained effectively in certain special applications, such as using neural networks to solve differential equations.
(ii) The advantage of the Sigmoid function is that its output lies in (0, 1), it is stable to optimize, and it is a continuous function that is easy to differentiate; its drawback is that it saturates when the absolute value of its argument is very large, becoming insensitive to changes in input and output.
(iii) The hyperbolic tangent function can be regarded as a transformation of the Sigmoid function and still suffers from gradient saturation.
It is therefore necessary to propose a fast and flexible holomorphic embedding neural network wide-area optimization training method whose activation function has strong expressive power, good smoothness, and is easy to compute.
Technical Solution
To achieve the above objective, the present invention is realized through the following technical solution:
The present invention is a fast and flexible holomorphic embedding neural network wide-area optimization training method, comprising the following steps:
Step 1: determine the differential equation to be solved, and sample training data and test data within its domain of definition;
Step 2: construct a neural network model and embed activation function layers based on piecewise rational approximation;
Step 3: adjust the hyperparameters and train the neural network model;
Step 4: perform model prediction; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
A further improvement of the present invention is that the differential equation in step 1 is the Burgers equation.
A further improvement of the present invention is that the neural network model constructed in step 2 comprises an input layer, four fully connected layers, four activation function layers, and an output layer.
A further improvement of the present invention is that the activation function based on piecewise rational approximation in step 2 is constructed as follows:
Suppose that a function f(x) is approximated at some point x_0 by the single-point Padé approximation, whose form is:
Figure PCTCN2022094901-appb-000001
where p_k and q_k are the coefficients to be determined, L is the highest power of x in the numerator, and M is the highest power of x in the denominator. When L+M is held constant and L=M is taken, the numerator and denominator are solved as follows. Let L=M=n; first solve the linear system Aq=b to obtain the values (q_1, q_2, q_3, …, q_n), where:
Figure PCTCN2022094901-appb-000002
Figure PCTCN2022094901-appb-000003
The values (p_0, p_1, p_2, …, p_n) are then obtained from:
Figure PCTCN2022094901-appb-000004
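Formulas (1)-(4) above are available only as images in this record. Purely as an illustration of the kind of computation they describe, and not as the patent's own code, the following sketch builds a standard [n/n] single-point Padé approximant from the Taylor coefficients of f(x) at the expansion point by solving a linear system Aq = b of the type described above; the function name, the variable names, and the use of Taylor coefficients as input are assumptions.

```python
import numpy as np

def pade_single_point(c, n):
    """[n/n] Pade approximant coefficients from Taylor coefficients c[0..2n].

    Returns (p, q) such that
        r(x) = (p[0] + p[1]*x + ... + p[n]*x**n) / (1 + q[1]*x + ... + q[n]*x**n).
    """
    c = np.asarray(c, dtype=float)
    # Linear system A q = b for the denominator coefficients (q1, ..., qn).
    A = np.array([[c[n + k - j] for j in range(1, n + 1)] for k in range(1, n + 1)])
    b = np.array([-c[n + k] for k in range(1, n + 1)])
    q = np.concatenate(([1.0], np.linalg.solve(A, b)))          # prepend q0 = 1
    # Numerator coefficients follow by matching the low-order Taylor terms.
    p = np.array([sum(q[j] * c[k - j] for j in range(min(k, n) + 1))
                  for k in range(n + 1)])
    return p, q

# Example: [2/2] Pade approximant of exp(x) at x0 = 0.
p, q = pade_single_point([1, 1, 1 / 2, 1 / 6, 1 / 24], n=2)
x = 0.5
r = np.polyval(p[::-1], x) / np.polyval(q[::-1], x)
print(r, np.exp(x))   # both close to 1.6487
```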
The multi-point Padé approximation is a generalization of the single-point Padé approximation. Let f(x) be the function to be approximated; if its values are known at n+1 interpolation points x_0, x_1, x_2, …, x_n, then there is a rational fraction:
Figure PCTCN2022094901-appb-000005
where L+M=n, u_{[L/M]}(x) is a polynomial of degree at most L, and v_{[L/M]}(x) is a polynomial of degree at most M:
Figure PCTCN2022094901-appb-000006
Here, u_{[L/M]}(x) and v_{[L/M]}(x) are polynomial functions that must be constructed from divided differences;
First, the divided differences of f(x) are defined as follows:
Figure PCTCN2022094901-appb-000007
Let f_{i,j} denote f[x_i, x_{i+1}, …, x_j], j ≥ i; then u_{[L/M]}(x) can be computed as:
Figure PCTCN2022094901-appb-000008
and v_{[L/M]}(x) can be computed as:
Figure PCTCN2022094901-appb-000009
The piecewise Padé approximation used in the present invention constructs each segment from the multi-point Padé approximation by specifying the interpolation points together with the function value and the first- to m-th-order derivative values at each interpolation point; it is a special form of the multi-point Padé approximation and is constructed as follows.
Let the function to be approximated be f(x), and suppose that at the n+1 interpolation points x_0, x_1, x_2, …, x_n the following are known:
Figure PCTCN2022094901-appb-000010
where
Figure PCTCN2022094901-appb-000011
denotes the τ-th-order derivative value of f(x) at x_i;
Take any interval [x_k, x_{k+1}] and construct the Padé approximation expression:
Figure PCTCN2022094901-appb-000012
where L+M+1=n,
Figure PCTCN2022094901-appb-000013
Figure PCTCN2022094901-appb-000014
whose expressions have been given in formulas (8) and (9). The concrete computation requires considering the equivalent set formed by the 2m+2 points:
Figure PCTCN2022094901-appb-000015
According to formulas (8) and (9), the divided differences are f_{i,j} = f[z_i, z_{i+1}, …, z_j], 0 ≤ i ≤ j ≤ 2m+1;
From the properties of divided differences and formula (10), it follows that:
Figure PCTCN2022094901-appb-000016
Figure PCTCN2022094901-appb-000017
When 0 ≤ i ≤ m and m+1 ≤ j ≤ 2m+1, the following recurrence holds:
Figure PCTCN2022094901-appb-000018
When i+1 ≥ m+1, the value is obtained directly from formula (14);
When j-1 ≤ m, the value is obtained directly from formula (13);
Substituting the computed f_{i,j} into formulas (8) and (9) yields
Figure PCTCN2022094901-appb-000019
Figure PCTCN2022094901-appb-000020
and hence
Figure PCTCN2022094901-appb-000021
The function r_{[L/M]}(x) constructed by the piecewise Padé approximation is expressed as:
Figure PCTCN2022094901-appb-000022
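Formulas (5)-(16) are likewise available only as images here, so the exact segment-wise construction cannot be reproduced in text. As a hedged illustration of the final form in formula (16), a different rational function on each interval [x_k, x_{k+1}], the sketch below evaluates such a piecewise rational function in PyTorch, assuming the per-segment numerator and denominator coefficients have already been produced by the multi-point Padé construction; all names and tensor shapes are illustrative. Because only differentiable tensor operations are used, gradients flow back to the coefficients, which is what allows the derivative values at the interpolation points to be treated as trainable parameters.

```python
import torch

def piecewise_rational(x, knots, num_coef, den_coef):
    """Evaluate a piecewise rational function r_[L/M](x).

    knots    : interpolation points x_0 < x_1 < ... < x_n, shape (n+1,)
    num_coef : per-segment numerator coefficients, shape (n, L+1), low order first
    den_coef : per-segment denominator coefficients, shape (n, M+1), low order first
    """
    # Route every input value to the segment [x_k, x_{k+1}] that contains it
    # (values outside the knot range reuse the first/last segment).
    seg = torch.searchsorted(knots, x.detach()) - 1
    seg = seg.clamp(0, num_coef.shape[0] - 1)

    def horner(coef):
        y = torch.zeros_like(x)
        for k in range(coef.shape[1] - 1, -1, -1):   # highest power first
            y = y * x + coef[seg, k]                 # segment-specific coefficient
        return y

    return horner(num_coef) / horner(den_coef)
```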
A further improvement of the present invention is that the number of training rounds in step 3 is set to N, and the training steps are as follows:
Step 3.1: input the training data into the neural network and go to step 3.2;
Step 3.2: the data propagates forward inside the module; the data H_{n×m} is input to an activation function layer; go to the next step;
Step 3.3: using the hyperparameters x_0, x_1, x_2, …, x_n of the activation function layer and the trainable parameters
Figure PCTCN2022094901-appb-000023
Figure PCTCN2022094901-appb-000024
as the interpolation points and as the derivative values of order zero to order m respectively, obtain the piecewise functions according to formulas (10)-(16):
Figure PCTCN2022094901-appb-000025
which together form the piecewise activation function r_{[L/M]}(x);
Step 3.4: the data H_{n×m} passes through the activation function r_{[L/M]}(x) to give the output Z_{n×m}, expressed as:
Figure PCTCN2022094901-appb-000026
obtaining the output Z_{n×m};
Step 3.5: the data continues to propagate forward; when the next activation function layer is encountered, jump to step 3.3; otherwise, go to step 3.6;
Step 3.6: obtain the training result and compute the value of the loss function; the framework automatically performs backpropagation and updates the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, the model training procedure ends.
A further improvement of the present invention is that model prediction is performed in step 4; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
Beneficial Effects
Based on the idea of fast and flexible holomorphic embedding (FFHE), the present invention proposes an activation function based on piecewise rational approximation. The interpolation points, function values, and derivative values of each order are first initialized, and the piecewise activation function is then constructed by the piecewise rational approximation method. Its advantages are as follows:
(i) Stronger expressive power: piecewise functions are more expressive than ordinary functions and rest on a solid theoretical foundation. The existing literature has shown that, under a Lipschitz condition, introducing a bound that relates the pointwise nonlinearity to the global Lipschitz constant of the network and using this bound as a regularizer yields a representer theorem: the optimal configuration is realized by a deep spline network in which every activation function is a piecewise linear spline with its own adaptive knots.
(ii) Better smoothness: other commonly used activation functions such as ReLU, PReLU, and piecewise linear splines are only piecewise first-order differentiable, which is limiting in certain scenarios. For example, solving differential equations with neural networks often requires second- or even higher-order derivatives of the network output with respect to the input, and an activation function that is only first-order differentiable leads to zero gradients and parameters that cannot be updated effectively. The piecewise rational activation function designed in the present invention has continuous higher-order derivatives and allows the parameters to be updated effectively.
(iii) More flexible and easier to compute: the activation function based on piecewise rational approximation initializes the interpolation points, function values, and derivative values of each order, and treats the function values and derivative values as parameters that can be adjusted as the neural network trains. The adaptive adjustment of these parameters lets backpropagation update the network in the steepest direction, so the expected accuracy is reached in fewer rounds than with other activation functions.
Brief Description of the Drawings
Fig. 1 is a schematic flow chart of the present invention.
Fig. 2 is a flow chart of training a neural network model based on the piecewise rational approximation activation function.
Fig. 3 is a schematic structural diagram of the neural network model of the present invention.
Fig. 4 is a schematic structural diagram of the PINNs model.
Fig. 5 shows the training curves of the LeakyReLU, ReLU, Tanh, and FFHE activation functions.
Embodiments of the Invention
The embodiments of the present invention are described in detail below with reference to the accompanying drawings; for the sake of clarity, many implementation details are given in the following description. It should be understood, however, that these implementation details are not intended to limit the present invention; that is, in some embodiments of the present invention these implementation details are unnecessary.
As shown in Figs. 1-3, the present invention is a fast and flexible holomorphic embedding neural network wide-area optimization training method, comprising the following steps:
Step 1: determine the differential equation to be solved, and sample training data and test data within its domain of definition;
Step 2: construct a neural network model and embed activation function layers based on piecewise rational approximation;
Step 3: adjust the hyperparameters and train the neural network model;
Step 4: perform model prediction; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
The differential equation to be solved in step 1 is the Burgers equation. The Burgers equation is a very useful mathematical model for many physical problems, such as shock waves, shallow-water waves, and traffic flow dynamics, and is an important mathematical model for describing diffusion phenomena in the physical world. It is a nonlinear partial differential equation that models the propagation and reflection of shock waves, and it is defined as follows:
u_t + u u_x - (0.01/π) u_{xx} = 0,  x ∈ [-1, 1],  t ∈ [0, 1],
u(0, x) = -sin(πx),
u(t, -1) = u(t, 1) = 0.
The equation is a time-varying partial differential equation with a one-dimensional state space, subject to an initial condition and boundary conditions.
Step 2 uses the PINNs model, whose general structure is shown in Fig. 4: the independent variables x and t of the differential equation are the inputs, and the dependent variable u is the output. In the figure, NN(x, t; θ) denotes a fully connected neural network, where θ are the weights of its hidden layers. The PDE(λ) part of the figure shows how the loss function of the model is composed. The loss function of PINNs has two parts: one for the initial and boundary conditions, and one for the equation itself.
Taking the Burgers equation as an example, let N_u be the number of points sampled on the boundary and the initial condition, and N_f the number of points sampled inside the domain. The first part of the loss function is the MSE of the model output on the initial and boundary conditions:
Figure PCTCN2022094901-appb-000027
The second part of the loss function is the MSE of the model output on the equation itself. Let
γ = u_t + u u_x - (0.01/π) u_{xx};
then:
Figure PCTCN2022094901-appb-000028
The final loss function is the sum of the two:
MSE = MSE_u + MSE_f.
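An illustrative sketch of how these two terms can be assembled with automatic differentiation (assuming PyTorch and a generic network u = net(x, t); none of these names come from the patent). The second-order derivative u_{xx} in the residual is exactly where an activation function with continuous higher-order derivatives is needed.

```python
import torch

def pinn_loss(net, x_u, t_u, u_target, x_f, t_f, nu=0.01 / torch.pi):
    # First part: MSE_u on the initial and boundary points
    mse_u = torch.mean((net(x_u, t_u) - u_target) ** 2)

    # Second part: MSE_f, the Burgers residual at the interior collocation points
    x_f = x_f.clone().requires_grad_(True)
    t_f = t_f.clone().requires_grad_(True)
    u = net(x_f, t_f)
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t_f, grad_outputs=ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x_f, grad_outputs=ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x_f, grad_outputs=torch.ones_like(u_x),
                               create_graph=True)[0]
    residual = u_t + u * u_x - nu * u_xx    # gamma = u_t + u*u_x - (0.01/pi)*u_xx
    mse_f = torch.mean(residual ** 2)

    return mse_u + mse_f                    # MSE = MSE_u + MSE_f
```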
As shown in Fig. 3, the fully connected neural network of PINNs in the present invention has four hidden layers with 20 neurons each. 25,600 (x, t) data pairs are obtained by sampling inside the domain and on the boundary and initial condition; Latin hypercube sampling is then applied to all of the data to obtain 10,000 (x, t) pairs inside the domain and 100 (x, t) pairs on the boundary and initial condition, for a total of 10,100 pairs used as the training data of the model. The remaining (x, t) data pairs are used as the test data of the model.
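A sketch of how such points could be drawn over x ∈ [-1, 1], t ∈ [0, 1] with SciPy's Latin hypercube sampler. The choice of library is an assumption (the patent does not name one), and the sketch draws the 10,000 interior collocation points directly rather than subsampling from 25,600 candidates as described above.

```python
import numpy as np
from scipy.stats import qmc

# 10000 interior collocation points (x, t) in [-1, 1] x [0, 1]
sampler = qmc.LatinHypercube(d=2, seed=0)
xt_f = qmc.scale(sampler.random(n=10000), l_bounds=[-1.0, 0.0], u_bounds=[1.0, 1.0])

# 100 initial/boundary points: 50 on t = 0 and 50 on x = +/-1
x_init = np.random.uniform(-1.0, 1.0, size=50)
t_bnd = np.random.uniform(0.0, 1.0, size=50)
xt_u = np.vstack([
    np.column_stack([x_init, np.zeros(50)]),                           # initial condition
    np.column_stack([np.random.choice([-1.0, 1.0], size=50), t_bnd])   # boundaries
])
u_target = np.concatenate([-np.sin(np.pi * x_init), np.zeros(50)])     # u(0,x), u(t,+/-1)
```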
In the present invention, each fully connected hidden layer is followed by an activation function layer based on piecewise rational approximation. Each activation function layer has six trainable parameters; in each activation function layer there are n+1 hyperparameters x_0, x_1, x_2, …, x_n representing the interpolation points and (m+1)(n+1) trainable parameters
Figure PCTCN2022094901-appb-000029
Figure PCTCN2022094901-appb-000030
representing the derivative values of order zero to order m.
The present invention designs the activation function according to the idea of fast and flexible holomorphic embedding (FFHE), combined with the mathematics of piecewise rational approximation. The Padé approximation is a method of constructing rational-function approximations; it is often more accurate than a truncated Taylor series, and it can often converge even when the Taylor series does not. In addition, when constructing interpolation functions, piecewise interpolation is usually adopted to avoid the Runge phenomenon caused by high-degree polynomials: the interpolation result depends only on a few neighboring points, and a composite piecewise function is formed.
The construction process of the activation function based on piecewise rational approximation in step 2 is:
as already described above; see formulas (10)-(16).
In step 3 the maximum number of training rounds is set to N, and the specific steps for training the neural network model are as follows:
Step 3.1: input the training data into the neural network and go to step 3.2;
Step 3.2: the data propagates forward inside the module; the data H_{n×m} is input to an activation function layer; go to the next step;
Step 3.3: using the hyperparameters x_0, x_1, x_2, …, x_n of the activation function layer and the trainable parameters
Figure PCTCN2022094901-appb-000031
Figure PCTCN2022094901-appb-000032
as the interpolation points and as the derivative values of order zero to order m respectively, obtain the piecewise functions according to formulas (10)-(16):
Figure PCTCN2022094901-appb-000033
which together form the piecewise activation function r_{[L/M]}(x);
Step 3.4: the data H_{n×m} passes through the activation function r_{[L/M]}(x) to give the output Z_{n×m}, expressed as:
Figure PCTCN2022094901-appb-000034
obtaining the output Z_{n×m};
Step 3.5: the data continues to propagate forward; when the next activation function layer is encountered, jump to step 3.3; otherwise, go to step 3.6;
Step 3.6: obtain the training result and compute the value of the loss function; the framework automatically performs backpropagation and updates the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, the model training procedure ends.
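A minimal sketch of the training loop of steps 3.1-3.6, assuming the `pinn_loss` function and the sampled arrays from the sketches above. `torch.nn.Tanh` is used here only as a placeholder: in the invention each activation would instead be a piecewise rational activation layer, whose trainable derivative parameters are then updated by the same `optimizer.step()` call in step 3.6.

```python
import torch

to_col = lambda a: torch.tensor(a, dtype=torch.float32).reshape(-1, 1)
x_f, t_f = to_col(xt_f[:, 0]), to_col(xt_f[:, 1])
x_u, t_u = to_col(xt_u[:, 0]), to_col(xt_u[:, 1])
u_bc = to_col(u_target)

# Four hidden layers of 20 neurons; the piecewise rational activation layers of
# the invention would take the place of torch.nn.Tanh below.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)
u_fn = lambda x, t: net(torch.cat([x, t], dim=1))          # u(x, t)
optimizer = torch.optim.Adam(net.parameters(), lr=0.002)   # learning rate from the text

for epoch in range(7000):                                  # N = 7000 training rounds
    optimizer.zero_grad()
    loss = pinn_loss(u_fn, x_u, t_u, u_bc, x_f, t_f)       # forward pass and loss
    loss.backward()                                        # automatic backpropagation
    optimizer.step()                                       # update weights and trainable parameters
```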
In step 4, model prediction is performed; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
Training is carried out for 7,000 rounds with the learning rate set to 0.002. The LeakyReLU and ReLU activation functions give the worst training results; their training curves in Fig. 5 are the two nearly overlapping curves at the top. The activation function constructed by the piecewise Padé approximation takes an average of 4.307 s of training time per hundred rounds, versus 3.532 s for the Tanh function; however, the piecewise Padé activation function already reaches a training error of 9.4067E-04 by round 1,500, whereas the Tanh function only brings the training error down to 9.1780E-04 after 7,000 rounds. In other words, the FFHE method needs only about one fifth of the training rounds required by Tanh to reduce the error to the same level, and if both are trained for 7,000 rounds, the accuracy of the FFHE results is more than two orders of magnitude (100 times) higher than that obtained with Tanh. It can therefore be seen that the activation function constructed by the FFHE (piecewise Padé approximation) method of the present invention outperforms common activation functions in both training time and training accuracy. The present invention thus provides a powerful solution for quickly and accurately solving the high-dimensional partial differential equation problems involved in practical engineering computation tasks.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (6)

  1. A fast and flexible holomorphic embedding neural network wide-area optimization training method, characterized by comprising the following steps:
    Step 1: determine the differential equation to be solved, and sample training data and test data within its domain of definition;
    Step 2: construct a neural network model and embed activation function layers based on piecewise rational approximation;
    Step 3: adjust the hyperparameters and train the neural network model;
    Step 4: perform model prediction; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
  2. The fast and flexible holomorphic embedding neural network wide-area optimization training method according to claim 1, characterized in that the differential equation in step 1 is the Burgers equation.
  3. The fast and flexible holomorphic embedding neural network wide-area optimization training method according to claim 1, characterized in that the neural network model constructed in step 2 comprises an input layer, four fully connected layers, four activation function layers, and an output layer.
  4. The fast and flexible holomorphic embedding neural network wide-area optimization training method according to claim 1, characterized in that the activation function based on piecewise rational approximation in step 2 is constructed as follows:
    Suppose that a function f(x) is approximated at some point x_0 by the single-point Padé approximation, whose form is:
    Figure PCTCN2022094901-appb-100001
    where p_k and q_k are the coefficients to be determined, L is the highest power of x in the numerator, and M is the highest power of x in the denominator. When L+M is held constant and L=M is taken, the numerator and denominator are solved as follows. Let L=M=n; first solve the linear system Aq=b to obtain the values (q_1, q_2, q_3, …, q_n), where:
    Figure PCTCN2022094901-appb-100002
    Figure PCTCN2022094901-appb-100003
    The values (p_0, p_1, p_2, …, p_n) are then obtained from:
    Figure PCTCN2022094901-appb-100004
    The multi-point Padé approximation is a generalization of the single-point Padé approximation. Let f(x) be the function to be approximated; if its values are known at n+1 interpolation points x_0, x_1, x_2, …, x_n, then there is a rational fraction:
    Figure PCTCN2022094901-appb-100005
    where L+M=n, u_{[L/M]}(x) is a polynomial of degree at most L, and v_{[L/M]}(x) is a polynomial of degree at most M:
    Figure PCTCN2022094901-appb-100006
    Here, u_{[L/M]}(x) and v_{[L/M]}(x) are polynomial functions that must be constructed from divided differences;
    First, the divided differences of f(x) are defined as follows:
    Figure PCTCN2022094901-appb-100007
    Let f_{i,j} denote f[x_i, x_{i+1}, …, x_j], j ≥ i; then u_{[L/M]}(x) can be computed as:
    Figure PCTCN2022094901-appb-100008
    and v_{[L/M]}(x) can be computed as:
    Figure PCTCN2022094901-appb-100009
    The piecewise Padé approximation used in the present invention constructs each segment from the multi-point Padé approximation by specifying the interpolation points together with the function value and the first- to m-th-order derivative values at each interpolation point; it is a special form of the multi-point Padé approximation and is constructed as follows.
    Let the function to be approximated be f(x), and suppose that at the n+1 interpolation points x_0, x_1, x_2, …, x_n the following are known:
    Figure PCTCN2022094901-appb-100010
    where
    Figure PCTCN2022094901-appb-100011
    denotes the τ-th-order derivative value of f(x) at x_i;
    Take any interval [x_k, x_{k+1}] and construct the Padé approximation expression:
    Figure PCTCN2022094901-appb-100012
    where L+M+1=n,
    Figure PCTCN2022094901-appb-100013
    Figure PCTCN2022094901-appb-100014
    whose expressions have been given in formulas (8) and (9). The concrete computation requires considering the equivalent set formed by the 2m+2 points:
    Figure PCTCN2022094901-appb-100015
    According to formulas (8) and (9), the divided differences are f_{i,j} = f[z_i, z_{i+1}, …, z_j], 0 ≤ i ≤ j ≤ 2m+1;
    From the properties of divided differences and formula (10), it follows that:
    Figure PCTCN2022094901-appb-100016
    Figure PCTCN2022094901-appb-100017
    When 0 ≤ i ≤ m and m+1 ≤ j ≤ 2m+1, the following recurrence holds:
    Figure PCTCN2022094901-appb-100018
    When i+1 ≥ m+1, the value is obtained directly from formula (14);
    When j-1 ≤ m, the value is obtained directly from formula (13);
    Substituting the computed f_{i,j} into formulas (8) and (9) yields
    Figure PCTCN2022094901-appb-100019
    Figure PCTCN2022094901-appb-100020
    and hence
    Figure PCTCN2022094901-appb-100021
    The function r_{[L/M]}(x) constructed by the piecewise Padé approximation is expressed as:
    Figure PCTCN2022094901-appb-100022
  5. The fast and flexible holomorphic embedding neural network wide-area optimization training method according to claim 1, characterized in that the number of training rounds in step 3 is set to N, and the training steps are as follows:
    Step 3.1: input the training data into the neural network and go to step 3.2;
    Step 3.2: the data propagates forward inside the module; the data H_{n×m} is input to an activation function layer; go to the next step; Step 3.3: using the hyperparameters x_0, x_1, x_2, …, x_n of the activation function layer and the trainable parameters
    Figure PCTCN2022094901-appb-100023
    Figure PCTCN2022094901-appb-100024
    as the interpolation points and as the derivative values of order zero to order m respectively, obtain the piecewise functions according to formulas (10)-(16):
    Figure PCTCN2022094901-appb-100025
    which together form the piecewise activation function r_{[L/M]}(x);
    Step 3.4: the data H_{n×m} passes through the activation function r_{[L/M]}(x) to give the output Z_{n×m}, expressed as:
    Figure PCTCN2022094901-appb-100026
    obtaining the output Z_{n×m};
    Step 3.5: the data continues to propagate forward; when the next activation function layer is encountered, jump to step 3.3; otherwise, go to step 3.6;
    Step 3.6: obtain the training result and compute the value of the loss function; the framework automatically performs backpropagation and updates the neural network weights and trainable parameters; if the current round is less than or equal to N, take a new batch of training data and jump to step 3.2; otherwise, the model training procedure ends.
  6. The fast and flexible holomorphic embedding neural network wide-area optimization training method according to claim 1, characterized in that model prediction is performed in step 4; if the prediction results meet the requirements, the model has been trained successfully and training ends; otherwise, return to step 3.
PCT/CN2022/094901 2022-02-10 2022-05-25 A fast and flexible holomorphic embedding neural network wide-area optimization training method WO2023151201A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210125273.3 2022-02-10
CN202210125273.3A CN114548400A (zh) 2022-02-10 2022-02-10 A fast and flexible holomorphic embedding neural network wide-area optimization training method

Publications (1)

Publication Number Publication Date
WO2023151201A1 true WO2023151201A1 (zh) 2023-08-17

Family

ID=81672897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/094901 WO2023151201A1 (zh) 2022-02-10 2022-05-25 A fast and flexible holomorphic embedding neural network wide-area optimization training method

Country Status (2)

Country Link
CN (1) CN114548400A (zh)
WO (1) WO2023151201A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116700049B * 2023-07-12 2024-05-28 山东大学 Data-driven multi-energy network digital twin real-time simulation system and method


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200050159A1 (en) * 2018-08-10 2020-02-13 Cornell University Processing platform with holomorphic embedding functionality for power control and other applications
CN112597700A (zh) * 2020-12-15 2021-04-02 北京理工大学 Neural-network-based aircraft trajectory simulation method
CN112784496A (zh) * 2021-01-29 2021-05-11 上海明略人工智能(集团)有限公司 Fluid mechanics motion parameter prediction method, device and storage medium
CN113183146A (zh) * 2021-02-04 2021-07-30 中山大学 Robotic arm motion planning method based on the fast and flexible holomorphic embedding idea
CN113489014A (zh) * 2021-07-19 2021-10-08 中山大学 Fast and flexible holomorphic embedding optimal power flow evaluation method for power systems
CN114239698A (zh) * 2021-11-26 2022-03-25 中国空间技术研究院 Data processing method, apparatus and device
CN114385969A (zh) * 2022-01-12 2022-04-22 温州大学 Neural network method for solving differential equations

Also Published As

Publication number Publication date
CN114548400A (zh) 2022-05-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22925560

Country of ref document: EP

Kind code of ref document: A1