CN110531620B - Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model - Google Patents

Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model

Info

Publication number
CN110531620B
CN110531620B (application CN201910823151.XA)
Authority
CN
China
Prior art keywords
state
function
gaussian process
model
trolley
Prior art date
Legal status
Active
Application number
CN201910823151.XA
Other languages
Chinese (zh)
Other versions
CN110531620A (en)
Inventor
钟珊
陈雪梅
应文豪
伏玉琛
龚声蓉
钱振江
Current Assignee
Changshu Institute of Technology
Original Assignee
Changshu Institute of Technology
Priority date
Filing date
Publication date
Application filed by Changshu Institute of Technology filed Critical Changshu Institute of Technology
Priority to CN201910823151.XA priority Critical patent/CN110531620B/en
Publication of CN110531620A publication Critical patent/CN110531620A/en
Application granted granted Critical
Publication of CN110531620B publication Critical patent/CN110531620B/en

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model. The method learns a value function and a policy from online samples generated by a physical system simulator, and at the same time uses those online samples to learn a Gaussian-process-based model of the environment dynamics. Once the dynamics model reaches a given accuracy, it is used for offline planning, which together with the online learning accelerates the convergence of the algorithm. The method therefore obtains the optimal control law for the trolley hill-climbing system more quickly.

Description

Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model
Technical Field
The invention relates to adaptive control methods for physical systems, and in particular to an adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model.
Background
The trolley hill-climbing system is shown in Figure 1: the trolley sits in the valley between two hills, and its goal is the five-pointed-star position at the top of the right-hand hill. Because its engine is not powerful enough, the trolley cannot reach the goal simply by accelerating up the right slope; it must first drive up the left slope so that it gains enough forward momentum to reach the goal on the right with sufficient acceleration. Adaptive control of this system means choosing the trolley's acceleration at every time step so that it reaches the right-hand goal in the shortest time. This is an optimal control problem over a continuous state space and a continuous action space. Such a control problem can in general be modeled as a Markov decision process (MDP): all possible states of the physical system form the state space, all possible actions form the action space, the probability distribution over the next state reached after applying an action in the current state is the transition function, and the feedback received from the environment after applying an action in the current state is called the reward function.
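For concreteness, the state and action spaces described above can be written down directly; the numerical bounds below are those given later in the detailed description, and the class and method names are illustrative only (a sketch, not part of the patent text).

import dataclasses

@dataclasses.dataclass
class TrolleyHillMDP:
    """State and action bounds of the trolley hill-climbing MDP."""
    w_min: float = -1.2    # x-axis limit at the top of the left hill
    w_max: float = 0.5     # x-axis limit at the top of the right hill (goal position)
    v_min: float = -0.07   # velocity bounds
    v_max: float = 0.07
    u_min: float = -1.0    # acceleration (action) bounds
    u_max: float = 1.0

    def is_goal(self, w: float) -> bool:
        # the target is the five-pointed-star position on the right hill
        return w >= self.w_max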
Once the physical system has been modeled as an MDP, the optimal policy, i.e. the optimal control law of the physical system, can be obtained with reinforcement learning. Reinforcement learning methods fall into two categories: model-free methods and model-based methods. Model-free methods learn the value function and the policy directly from samples obtained as the agent interacts with the environment. They are simple and fast, but each sample is used once for learning and then discarded, so sample efficiency is extremely low. Model-based methods can learn the value function and the policy by planning with a dynamics model, without requiring real samples, and therefore use samples far more efficiently; their drawback is that the optimal solution is obtained by repeatedly iterating the Bellman equation, which makes them computationally expensive.
In most practical physical systems the model is unknown. To take advantage of model-based planning, one must first learn a model and then plan with it. However, most physical systems are continuous rather than discrete, and even when the model is known it cannot be used directly for iterative solution of the Bellman equation. Moreover, when the learned model is not accurate enough, the quality of the planning suffers directly.
Disclosure of Invention
The invention aims to provide an adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model. The method learns a value function and a policy from online samples generated by a physical system simulator, and at the same time uses those online samples to learn a Gaussian-process-based model of the environment dynamics. Once the dynamics model reaches a given accuracy, it is used for planning to generate simulated samples, and the value function and the policy are learned from the simulated samples together with the online samples; this accelerates the convergence of the algorithm, so that the optimal control law of the system is obtained more quickly.
The technical scheme of the invention is as follows. An adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model comprises the following steps:
Step (1): initialize the model. Set the state space X and the action space U of the environment. A state is represented by the two-dimensional vector x = (w, v) ∈ X, where w is the horizontal position of the trolley and v is its horizontal velocity; the action the trolley can execute is an acceleration u ∈ U. The temporary variables of the Gaussian process approximation model, i.e. the state transition function, are a vector p, a variable d = 0, a variable s = 0 and a matrix P; φ(x) denotes the feature function corresponding to state x, and φ(x, u) the feature function of the state-action pair (x, u);
Step (2): initialize the hyper-parameters. Set the discount rate γ, the decay factor λ, the maximum number of episodes E, the exploration variance σ² of the Gaussian function, the diagonal elements σ_i² (1 ≤ i ≤ k) of the matrix ΔN_k, the maximum number of time steps T per episode, the learning rate α of the value function and the policy, the current episode number e = 1, the value function parameter vector ν, the policy parameter vector θ, the Gaussian process approximation model parameter vector β, and the maximum number of planning steps K;
Step (3): initialize the ranges of the state space and the action space of the trolley hill-climbing system and the conditions for control success or failure; set the current time step t = 1 and the current state x = x_1;
Step (4): take the current optimal action u* as the mean of a Gaussian function and the exploration variance σ² specified in step (2) as its variance, build the Gaussian distribution N(u*, σ²), and use it to generate the action u_t to be executed;
Step (5): in the current state x_t, execute the action u_t determined in step (4), obtain the next state x_{t+1} of the trolley from the dynamics equation of the system, and obtain the immediate reward r_{t+1} from the reward function, forming the sample (x_t, u_t, x_{t+1}, r_{t+1});
Step (6): use the sample to compute the TD error of the value function: δ_t = r_{t+1} + γV(x_{t+1}; ν_t) - V(x_t; ν_t);
Step (7): update the eligibility trace e_{t+1} of the value function;
Step (8): update the value function parameter vector: ν_{t+1} ← ν_t + αδ_t e_{t+1};
Step (9): update the policy parameter vector: θ_{t+1} ← θ_t + αδ_t(u* - u_t);
Step (10): use the sample to update the model intermediate quantities p_{t+1}, d_{t+1}, s_{t+1} and P_{t+1};
Step (11): use the current sample to update the state transition function parameter vector β_{t+1};
Step (12): update the current state x = x_{t+1} and judge whether the state component w_{t+1} of x_{t+1} satisfies the control-success condition:
    if it does, let e = e + 1 and judge whether the current episode number satisfies e = E:
        if it does, go to step (19);
        otherwise, go to step (13);
Step (13): initialize the planning counter k = 1 and the initial state of the planning process x'_k = x_1;
Step (14): in the current state x'_k, select the action u_k to be executed according to step (4), and then predict the next state from the Gaussian process approximation model, where Φ_k = (φ(x'_1, u_1), φ(x'_2, u_2), ..., φ(x'_k, u_k))^T is the state feature matrix at planning step k, β_k is the parameter vector of the Gaussian process model, and ΔN_k ∈ R^{k×k} is the noise matrix whose position components up to planning step k satisfy a Gaussian distribution;
Step (15): update the eligibility trace according to the Gaussian process approximation model;
Step (16): update the value function parameters from the simulated samples generated by the Gaussian process approximation model: ν_{k+1} ← ν_k + αδ_k e_{k+1};
Step (17): update the policy parameters from the simulated samples generated by the Gaussian process approximation model: θ_{k+1} ← θ_k + αδ_k Δu_k;
Step (18): judge the current planning counter k:
    if k = K, update the current time step t = t + 1 and judge it:
        if the current time step has not reached the maximum time step T, return to step (4);
        otherwise, update the current episode e = e + 1 and judge it:
            if the current episode satisfies e = E, go to step (19);
            otherwise, go to step (3);
    otherwise, let k = k + 1 and go to step (14);
Step (19): output the optimal policy. The trolley now starts from the initial state x_1 and, from any state x_t, adopts the optimal policy h*(x_t) to obtain the corresponding optimal action, until the target state is reached.
Further, the optimal action in step (4) is solved as u* = θ_t^T φ(x_t), where φ(x_t) is the feature corresponding to state x_t and θ_t denotes the policy parameter vector corresponding to time step t.
Further, in step (5), given the current state x = x_t = (w_t, v_t), where w_t is the position component and v_t the velocity component, the next state can be written as x_{t+1} = (w_{t+1}, v_{t+1}); the velocity component of the next time step is obtained from v_{t+1} = v_t + 0.001u_t + g·cos(3w_t) and the position component of the next time step from w_{t+1} = w_t + v_{t+1}, with g = -0.0025 the gravitational acceleration term. The reward function is: if the next state x_{t+1} is the target state, r_{t+1} = 0; otherwise r_{t+1} = -1.
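These dynamics and the reward translate directly into a transition function; enforcing the state bounds of the detailed description by clipping is an added assumption (the patent gives the bounds but not how they are enforced).

import numpy as np

G = -0.0025  # gravitational term from the specification

def trolley_step(w_t: float, v_t: float, u_t: float):
    """One transition of the trolley hill-climbing system, as in step (5)."""
    v_next = v_t + 0.001 * u_t + G * np.cos(3.0 * w_t)
    v_next = float(np.clip(v_next, -0.07, 0.07))       # velocity bounds (assumed clipping)
    w_next = float(np.clip(w_t + v_next, -1.2, 0.5))   # position bounds (assumed clipping)
    reached_goal = w_next >= 0.5
    r_next = 0.0 if reached_goal else -1.0             # reward: 0 at the target, -1 otherwise
    return w_next, v_next, r_next, reached_goal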
Further, the state value function in step (6) is expressed as V(x_t; ν_t) = ν_t^T φ(x_t), where ν_t denotes the value function parameter vector corresponding to state x_t, V(x_{t+1}; ν_t) denotes the value function of state x_{t+1}, φ(x_t) is the feature corresponding to state x_t, and r_{t+1} is the reward obtained by executing action u_t in state x_t.
Further, the eligibility trace update formula in step (7) updates the eligibility trace e_t corresponding to state x_t into the eligibility trace e_{t+1} corresponding to state x_{t+1}.
Further, the value function parameter update in step (8) is ν_{t+1} ← ν_t + αδ_t e_{t+1}, where ν_t is the value function parameter vector corresponding to state x_t.
Further, the policy parameter update in step (9) is θ_{t+1} ← θ_t + αδ_tΔu with Δu = u* - u_t, where δ_t is the TD error of the value function computed in step (6).
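A sketch of the critic and actor updates of steps (7) to (9). The accumulating-trace form e ← γλe + φ(x_t) is an assumption, since the patent gives the trace update only as a formula image; scaling the policy update by φ(x_t) for a vector-valued θ is likewise an assumption (step (9) states θ_{t+1} ← θ_t + αδ_t(u* - u_t) without spelling out the feature dependence). The default hyper-parameter values are those of the embodiment.

import numpy as np

def critic_actor_update(nu, theta, e, phi_x, delta, u_star, u_t,
                        alpha=0.6, gamma=0.95, lam=0.4):
    """Steps (7)-(9): eligibility trace, value-parameter and policy-parameter updates."""
    e = gamma * lam * e + phi_x                              # assumed TD(lambda) accumulating trace
    nu = nu + alpha * delta * e                              # step (8): nu <- nu + alpha*delta*e
    theta = theta + alpha * delta * (u_star - u_t) * phi_x   # step (9), scaled by phi(x_t) (assumption)
    return nu, theta, e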
Further, the model intermediate quantities p_{t+1}, d_{t+1}, s_{t+1} and P_{t+1} in step (10) are each updated from the current sample, where u_{t+1} denotes the action to be executed in state x_{t+1}, obtained according to step (4), u_t denotes the action executed in state x_t at time step t, and σ_t is the standard deviation of the Gaussian process approximation model at time step t.
Further, the state transition function parameter vector in step (11) is updated to β_{t+1}, computed from the model intermediate quantities p_{t+1}, d_{t+1} and s_{t+1} obtained in step (10), where β_t is the parameter vector of the Gaussian process approximation model, i.e. of the state transition function, corresponding to time step t.
Further, the next state obtained in step (14) from the Gaussian process approximation model is predicted using Φ_k = (φ(x'_1, u_1), φ(x'_2, u_2), ..., φ(x'_k, u_k))^T, the state feature matrix at planning step k, where x'_1 is the initial state of the planning process, x'_k is the state reached after k planning steps starting from x'_1, β_k is the parameter vector of the Gaussian process model, and ΔN_k ∈ R^{k×k} is a noise matrix whose position components up to planning step k satisfy a Gaussian distribution, i.e. ΔN_k = diag(σ_1², σ_2², ..., σ_k²).
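The recursive updates of p, d, s, P and β in steps (10), (11) and (14) are given only as formula images; as a stand-in, the sketch below fits a linear-in-features transition model by noise-weighted regularized least squares, which is the standard Gaussian-process/ridge-regression mean predictor for a model of the form x'_{k+1} ≈ β^T φ(x'_k, u_k) with diagonal noise ΔN_k = diag(σ_i²). It illustrates the roles of Φ_k, β and ΔN_k, not the patent's exact recursion.

import numpy as np

def fit_transition_model(Phi_k: np.ndarray, X_next: np.ndarray, noise_diag: np.ndarray) -> np.ndarray:
    """Stand-in for steps (10)-(11): estimate beta from features and observed next states.

    Phi_k:      (k, m) matrix whose rows are phi(x'_i, u_i)
    X_next:     (k, 2) matrix of observed next states (w, v)
    noise_diag: (k,)   diagonal of Delta_N_k, i.e. the sigma_i^2
    """
    W = np.diag(1.0 / noise_diag)                              # weight samples by inverse noise
    A = Phi_k.T @ W @ Phi_k + 1e-6 * np.eye(Phi_k.shape[1])    # small ridge term (assumption)
    return np.linalg.solve(A, Phi_k.T @ W @ X_next)            # (m, 2) parameter matrix beta

def predict_next_state(beta: np.ndarray, phi_xu: np.ndarray) -> np.ndarray:
    # step (14): mean prediction of the next planning state from the learned model
    return phi_xu @ beta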
The technical scheme provided by the invention has the following advantages: it establishes an approximation method suited to solving the optimal policy of the trolley hill-climbing system, namely an approximate solution of the continuous-control value function and of the Bellman equation corresponding to the policy; and it improves the learning accuracy of the model, so that once the model meets a given accuracy requirement it can be used for planning to generate simulated samples, which accelerates the convergence of the optimal control method.
Drawings
FIG. 1 is a schematic view of a cart mountain climbing system;
FIG. 2 is a schematic flow chart of the method of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are not to be construed as limiting the invention thereto.
Referring to Figures 1 and 2, the dynamics model of the trolley hill-climbing system addressed by the invention is set up as follows. At any time step t the state of the trolley is x_t = (w_t, v_t), where w_t is the position of the trolley on the x axis at time step t; the x-axis limit at the top of the left hill is -1.2 and at the top of the right hill is 0.5, so the position satisfies -1.2 ≤ w_t ≤ 0.5. v_t is the velocity of the trolley and satisfies -0.07 ≤ v_t ≤ 0.07. u_t is the action applied to the trolley, i.e. its acceleration at time step t, with -1 ≤ u_t ≤ 1; a positive value corresponds to accelerating and a negative value to braking. At the initial time step, i.e. t = 1, the initial state of the trolley is x_1 = (w_1, v_1) = (-0.5, 0). The target position of the trolley is the five-pointed-star position on the right, i.e. w_t = 0.5.
Suppose the current state of the trolley is known to be x_t = (w_t, v_t). Then at the next time step t + 1 the next state x_{t+1} = (w_{t+1}, v_{t+1}) of the trolley is given by
v_{t+1} = v_t + 0.001u_t + g·cos(3w_t),
w_{t+1} = w_t + v_{t+1},
where g = -0.0025 is the gravitational acceleration term and u_t is the action applied to the trolley, i.e. its acceleration at time step t.
Solution objective of the problem: from the dynamics model above it can be seen that, given the current state x_t = (w_t, v_t), as long as the policy is known, i.e. the acceleration u_t at every time step, the trolley state at the next time step can be computed, and so on until the goal is reached.
The optimality of a policy can be measured by the time needed to reach the goal: the less time required, the better the policy. To express this, a reward function is introduced: r_{t+1} = 0 if the next state x_{t+1} is the target state, and r_{t+1} = -1 otherwise.
from this reward function, it can be seen that if the more time steps the vehicle takes to reach the target from the initial position, the less the accumulated reward value; conversely, if less time steps are required to reach the goal, then the more prize values are accumulated.
The optimization goal of the algorithm is therefore to maximize the accumulated reward, since fewer time steps to reach the target position means a better policy.
The accumulated reward can be approximated by a V-value function. The V value corresponding to the state x_t at time step t can be expressed as V(x_t; ν_t) = ν_t^T φ(x_t), where ν_t is the V-value function parameter vector corresponding to time step t and φ(x_t) is the feature of state x_t.
Solving for the optimal policy means finding the policy that maximizes the accumulated reward, namely h*(x_t) = argmax_h V^h(x_t; ν_t), where V^h(x_t; ν_t) denotes the V-value function under policy h.
Once the optimal policy has been obtained, h*(x) can be used to obtain the optimal acceleration of the trolley at any time step t.
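Once a policy is available, its quality can be read off by rolling it out from the initial state and counting the steps to the goal; the sketch below uses the initial state (-0.5, 0) and the 3000-step limit of the embodiment, and takes the policy and the environment step as callables so that it stays independent of the exact formulas.

def rollout(policy, env_step, w0: float = -0.5, v0: float = 0.0, max_steps: int = 3000):
    """Roll a policy out until the goal is reached or the step limit is hit.

    policy:   callable (w, v) -> acceleration u in [-1, 1]
    env_step: callable (w, v, u) -> (w_next, v_next, reward, done)
    """
    w, v, total_reward = w0, v0, 0.0
    for t in range(max_steps):
        u = policy(w, v)
        w, v, r, done = env_step(w, v, u)
        total_reward += r
        if done:
            return t + 1, total_reward  # steps needed and accumulated reward
    return max_steps, total_reward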
referring to fig. 2, the adaptive control method for the mountain climbing system of the trolley based on the gaussian process approximation model in the embodiment includes the following steps:
step (1) initializing a model, and setting a variable in a state space X of an environment to be a 2-dimensional directionQuantity xt=(wt,vt), wt∈[-1.2,+0.5]Is the position of the trolley in the horizontal direction, vt∈[-0.07,+0.07]Indicating the speed of the trolley in the horizontal direction. Temporary variables in Gaussian process approximation model (state transition function) are vectors
Figure GDA0002554377500000064
Variable d is 0, variable s is 0 and matrix
Figure GDA0002554377500000065
Figure GDA0002554377500000066
For state x, the corresponding feature function, and φ (x, u) the feature function of the state action pair (x, u). The motion that the trolley can perform is acceleration ut∈[-1,1]The reward function is set to:
Figure GDA0002554377500000067
step (2) initializing an environment, setting the discount rate gamma to be 0.95, the attenuation factor lambda to be 0.4, the maximum knot number E to be 500, and the exploration variance sigma of the Gaussian function20.9, the matrix Δ N is initializedkRespective element σ on the middle diagonali 2(1 ≦ i ≦ k) is a random number between 0.1 and 0.8, the maximum time step T included in each episode is 3000, the learning rate of the value function and the learning rate of the policy α are 0.6, the current episode number e is 1, the value function parameter vector is a vector of values
Figure GDA0002554377500000068
Policy parameter vector
Figure GDA0002554377500000069
Gaussian process approximation model parameter vector
Figure GDA00025543775000000610
Planning the maximum times K to be 100;
step (3) initializing a physical system of the trolley for going up the hill,the initial state is defined as x1=(w1,v1) Initializing the ranges of the state space and the action space of the trolley ascending system (-0.5,0), controlling the conditions of success or failure, achieving the target state w-0.5 or the current plot number equal to the maximum plot number E-E, the current time step t-1, and the current state x-x1
Step (4): take the current optimal action u* as the mean of a Gaussian function and the exploration variance σ² specified in step (2) as its variance, build the Gaussian distribution N(u*, σ²), and use it to generate the action u_t to be executed.
Step (5): in the current state x_t, execute the action u_t determined in step (4), obtain the next state x_{t+1} of the physical system and the immediate reward r_{t+1}, forming the sample (x_t, u_t, x_{t+1}, r_{t+1}). The next state x_{t+1} = (w_{t+1}, v_{t+1}) is computed as
v_{t+1} = v_t + 0.001u_t + g·cos(3w_t),
w_{t+1} = w_t + v_{t+1},
where g = -0.0025 is the gravitational acceleration term. After the current action has been executed, the reward r_{t+1} is generated by the reward function: r_{t+1} = 0 if the next state is the target state, and r_{t+1} = -1 otherwise.
Step (6): use the sample to compute the TD error of the value function: δ_t = r_{t+1} + γV(x_{t+1}; ν_t) - V(x_t; ν_t).
Step (7): update the eligibility trace of the value function; the initial eligibility trace vector defaults to 0. The feature φ(x_t) of a state is built from Gaussian radial basis functions whose centers are two-dimensional vectors, so that the dimension of the feature vector is the product of the numbers of center points in the position direction and in the velocity direction. Eleven center points are selected in the position direction: {-1.05, -0.9, -0.75, -0.6, -0.45, -0.3, -0.15, 0, 0.15, 0.3, 0.45}; ten center points are selected in the velocity direction: {-0.058, -0.046, -0.034, -0.022, -0.01, 0.002, 0.014, 0.026, 0.038, 0.05}. The variances of the position and of the velocity are taken as σ_w² and σ_v², respectively.
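A sketch of this feature construction over the 11 × 10 grid of centers, assuming the usual Gaussian radial-basis form exp(-½[(w - c_w)²/σ_w² + (v - c_v)²/σ_v²]); the variance values below are placeholders, since the specification gives them only as formula images.

import numpy as np

POS_CENTERS = np.array([-1.05, -0.9, -0.75, -0.6, -0.45, -0.3, -0.15, 0.0, 0.15, 0.3, 0.45])
VEL_CENTERS = np.array([-0.058, -0.046, -0.034, -0.022, -0.01, 0.002, 0.014, 0.026, 0.038, 0.05])
SIGMA_W2, SIGMA_V2 = 0.15 ** 2, 0.012 ** 2   # placeholder variances, not given in the text

def phi_state(w: float, v: float) -> np.ndarray:
    """Gaussian RBF feature of a state (w, v); dimension 11 * 10 = 110, as in step (7)."""
    cw, cv = np.meshgrid(POS_CENTERS, VEL_CENTERS, indexing="ij")
    activations = np.exp(-0.5 * ((w - cw) ** 2 / SIGMA_W2 + (v - cv) ** 2 / SIGMA_V2))
    return activations.ravel()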
step (8) updating the value function parameter vt+1:νt+1←νttet+1
Step (9) updating strategy parameter thetat+1:θt+1←θtt(u*-ut);
Step (10) uses the sample to update the model intermediate formula pt+1、dt+1、st+1And Pt+1Wherein the state action pairs yt=(xt,ut) Is phi (x)t,ut) Can be expressed as:
Figure GDA0002554377500000079
wherein the content of the first and second substances,
Figure GDA00025543775000000710
the center of the Gaussian radial basis function is represented as a three-dimensional vector, and the dimension of the three-dimensional vector is the product of the number of center points of the position direction, the speed direction and the action direction. The number of central points selected in the position direction is 11, and the central points of the positions are { -1.05, -0.9, -0.75, -0.6, -0.45, -0.3, -0.15, 0, 0.15, 0.3, 0.45 }; the number of central points selected in the speed direction is 10, and the speed central points are { -0.058, -0.046, -0.034, -0.022, -0.01, 0.002, 0.014, 0.026, 0.038 and 0.05 }; the number of the central points selected by the action directions is 5, and the central points of the action directions are as follows: { -1, -0.5,0,0.5, 1}. The variance of position, velocity and motion is taken separately
Figure GDA00025543775000000711
And
Figure GDA00025543775000000712
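The state-action feature φ(x, u) extends the same construction with the five action centers; as above, the Gaussian radial-basis form and the variance values are assumptions.

import numpy as np

POS_CENTERS = np.array([-1.05, -0.9, -0.75, -0.6, -0.45, -0.3, -0.15, 0.0, 0.15, 0.3, 0.45])
VEL_CENTERS = np.array([-0.058, -0.046, -0.034, -0.022, -0.01, 0.002, 0.014, 0.026, 0.038, 0.05])
ACT_CENTERS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
S_W2, S_V2, S_U2 = 0.15 ** 2, 0.012 ** 2, 0.5 ** 2   # placeholder variances

def phi_state_action(w: float, v: float, u: float) -> np.ndarray:
    """Gaussian RBF feature of a state-action pair; dimension 11 * 10 * 5 = 550, as in step (10)."""
    cw, cv, cu = np.meshgrid(POS_CENTERS, VEL_CENTERS, ACT_CENTERS, indexing="ij")
    activations = np.exp(-0.5 * ((w - cw) ** 2 / S_W2
                                 + (v - cv) ** 2 / S_V2
                                 + (u - cu) ** 2 / S_U2))
    return activations.ravel()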
Step (11): use the current sample to update the state transition function parameter vector β_{t+1}.
Step (12): update the current state x = x_{t+1} and judge whether the state component w_{t+1} of x_{t+1} has reached 0.5:
    if it has, let e = e + 1 and judge whether the current episode number e has reached the maximum value E:
        if it has, go to step (19);
        otherwise, go to step (3);
Step (13): initialize the planning counter k = 1 and the initial state of the planning process x'_k = x_1.
Step (14): in the current state x'_k, select the action u_k to be executed according to step (4), and then predict the next state from the Gaussian process model, where Φ_k = (φ(x'_1, u_1), φ(x'_2, u_2), ..., φ(x'_k, u_k))^T is the state feature matrix at planning step k, β_k is the parameter vector of the Gaussian process model, and ΔN_k ∈ R^{k×k} is the noise matrix whose position components up to planning step k satisfy a Gaussian distribution, with ΔN_k = diag(σ_1², σ_2², ..., σ_k²).
Step (15): update the eligibility trace according to the Gaussian process model.
Step (16): update the value function parameters from the simulated samples generated by the Gaussian process model: ν_{k+1} ← ν_k + αδ_k e_{k+1}.
Step (17): update the policy parameters from the simulated samples generated by the Gaussian process model: θ_{k+1} ← θ_k + αδ_k Δu_k.
Step (18): judge the current planning counter k:
    if k = K, update the current time step t = t + 1 and judge it:
        if the current time step has not reached the maximum time step T, return to step (4);
        otherwise, update the current episode e = e + 1 and judge it:
            if the current episode satisfies e = E, go to step (19);
            otherwise, go to step (3);
    otherwise, let k = k + 1 and continue with step (14);
Step (19): output the optimal policy. The trolley now starts from the initial state x_1 and, from any state x_t, adopts the optimal policy h*(x_t) to obtain the corresponding optimal action, until the target state is reached.
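Putting the steps of the embodiment together, the control loop has a Dyna-like structure: act online, update the critic, the actor and the transition model, then run up to K planning steps with the learned model. The sketch below shows only that outer structure; every per-step formula is passed in as a callable, the signatures are illustrative, and refitting the model at every step is done here purely for simplicity.

import numpy as np

def train(env_step, phi_s, phi_sa, select_action, td_err, ac_update,
          fit_model, predict_next,
          n_episodes: int = 500, max_steps: int = 3000, n_plan: int = 100,
          dim_s: int = 110):
    """Dyna-style outer loop of the embodiment (structure only, formulas supplied by callables)."""
    nu, theta = np.zeros(dim_s), np.zeros(dim_s)
    data_phi, data_next, beta = [], [], None
    for episode in range(n_episodes):
        w, v = -0.5, 0.0                                                   # initial state x_1
        e = np.zeros(dim_s)
        for t in range(max_steps):
            u = select_action(theta, phi_s(w, v))                          # step (4)
            w2, v2, r, done = env_step(w, v, u)                            # step (5)
            delta = td_err(r, nu, phi_s(w, v), phi_s(w2, v2))              # step (6)
            nu, theta, e = ac_update(nu, theta, e, phi_s(w, v), delta, u)  # steps (7)-(9)
            data_phi.append(phi_sa(w, v, u)); data_next.append([w2, v2])   # sample for the model
            beta = fit_model(np.array(data_phi), np.array(data_next))      # steps (10)-(11)
            if done:
                break                                                      # step (12): new episode
            pw, pv = -0.5, 0.0                                             # step (13): x'_1 = x_1
            for _ in range(n_plan):                                        # steps (14)-(18)
                pu = select_action(theta, phi_s(pw, pv))
                pw, pv = predict_next(beta, phi_sa(pw, pv, pu))            # model-based planning step
                # the value and policy updates on this simulated sample (steps (15)-(17)) go here
            w, v = w2, v2
    return nu, theta, beta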

Claims (8)

1. An adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model, characterized by comprising the following steps:
step (1): initializing a model: setting a state space X and an action space U of the environment, wherein a state is represented by a two-dimensional vector x = (w, v) ∈ X, w is the horizontal position of the trolley and v is its horizontal velocity, the action the trolley can execute is an acceleration u ∈ U, the temporary variables of the Gaussian process approximation model, i.e. the state transition function, are a vector p, a variable d = 0, a variable s = 0 and a matrix P, φ(x) is the feature function corresponding to state x, and φ(x, u) is the feature function of the state-action pair (x, u);
step (2): initializing the hyper-parameters: setting the discount rate γ, the decay factor λ, the maximum number of episodes E, the exploration variance σ² of the Gaussian function, the diagonal elements σ_i² (1 ≤ i ≤ k) of the matrix ΔN_k, the maximum number of time steps T per episode, the learning rate α of the value function and the policy, the current episode number e = 1, the value function parameter vector ν, the policy parameter vector θ, the Gaussian process approximation model parameter vector β, and the maximum number of planning steps K;
step (3): initializing the ranges of the state space and the action space of the trolley hill-climbing system and the conditions for control success or failure, and setting the current time step t = 1 and the current state x = x_1;
step (4): taking the current optimal action u* as the mean of a Gaussian function and the exploration variance σ² specified in step (2) as its variance, building the Gaussian distribution N(u*, σ²), and using it to generate the action u_t to be executed;
step (5): in the current state x_t, executing the action u_t determined in step (4), obtaining the next state x_{t+1} of the trolley from the dynamics equation of the system and the immediate reward r_{t+1} from the reward function, forming the sample (x_t, u_t, x_{t+1}, r_{t+1});
step (6): computing the TD error of the value function using the sample: δ_t = r_{t+1} + γV(x_{t+1}; ν_t) - V(x_t; ν_t), wherein ν_t denotes the value function parameter vector corresponding to state x_t, V(x_{t+1}; ν_t) denotes the value function of state x_{t+1}, and V(x_t; ν_t) denotes the value function of state x_t;
step (7): updating the eligibility trace e_{t+1} of the value function;
step (8): updating the value function parameter vector: ν_{t+1} ← ν_t + αδ_t e_{t+1};
step (9): updating the policy parameter vector: θ_{t+1} ← θ_t + αδ_t(u* - u_t);
step (10): using the sample to update the model intermediate quantities p_{t+1}, d_{t+1}, s_{t+1} and P_{t+1}, wherein u_{t+1} denotes the action to be executed in state x_{t+1}, obtained according to step (4), u_t denotes the action executed in state x_t at time step t, and σ_t is the standard deviation of the Gaussian process approximation model at time step t;
step (11): using the current sample to update the state transition function parameter vector β_{t+1};
step (12): updating the current state x = x_{t+1} and judging whether the state component w_{t+1} of x_{t+1} satisfies the control-success condition:
    if yes, letting e = e + 1 and judging whether the current episode satisfies e = E:
        if so, going to step (19);
        otherwise, going to step (13);
step (13): initializing the planning counter k = 1 and the initial state of the planning process x'_k = x_1;
step (14): in the current state x'_k, selecting the action u_k to be executed according to step (4), and then predicting the next state according to the Gaussian process approximation model, wherein Φ_k = (φ(x'_1, u_1), φ(x'_2, u_2), ..., φ(x'_k, u_k))^T is the state feature matrix at planning step k, β_k is the parameter vector of the Gaussian process model, and ΔN_k ∈ R^{k×k} is a noise matrix whose position components up to planning step k satisfy a Gaussian distribution;
step (15): updating the eligibility trace according to the Gaussian process approximation model;
step (16): updating the value function parameters from the simulated samples generated by the Gaussian process approximation model: ν_{k+1} ← ν_k + αδ_k e_{k+1}, wherein δ_k is the TD error of the value function;
step (17): updating the policy parameters from the simulated samples generated by the Gaussian process approximation model: θ_{k+1} ← θ_k + αδ_k Δu_k, wherein Δu_k = u* - u_k, u* is the current optimal action, and u_k denotes the action to be executed, generated using the Gaussian distribution N(u*, σ²);
step (18): judging the current planning counter k:
    if k = K, updating the current time step t = t + 1 and judging it:
        if the current time step has not reached the maximum time step T, returning to step (4);
        otherwise, updating the current episode e = e + 1 and judging it:
            if the current episode satisfies e = E, going to step (19);
            otherwise, going to step (3);
    otherwise, letting k = k + 1 and going to step (14);
step (19): outputting the optimal policy, whereby the trolley starts from the initial state x_1 and, from any state x_t, adopts the optimal policy h*(x_t) to obtain the corresponding optimal action, until the target state is reached.
2. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein the optimal action in step (4) is solved as u* = θ_t^T φ(x_t), wherein φ(x_t) is the feature corresponding to state x_t and θ_t denotes the policy parameter vector corresponding to time step t.
3. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein in step (5), given the current state x = x_t = (w_t, v_t), with w_t the position component and v_t the velocity component, the next state can be written as x_{t+1} = (w_{t+1}, v_{t+1}), wherein the velocity component of the next time step is solved from v_{t+1} = v_t + 0.001u_t + g·cos(3w_t) and the position component of the next time step from w_{t+1} = w_t + v_{t+1}, with g = -0.0025 the gravitational acceleration term, and the reward function is: if the next state x_{t+1} is the target state, r_{t+1} = 0, otherwise r_{t+1} = -1.
4. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein the state value function in step (6) is expressed as V(x_t; ν_t) = ν_t^T φ(x_t), wherein ν_t denotes the value function parameter vector corresponding to state x_t, V(x_{t+1}; ν_t) denotes the value function corresponding to state x_{t+1}, φ(x_t) is the feature corresponding to state x_t, and r_{t+1} is the reward obtained by executing action u_t in state x_t.
5. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein the eligibility trace update formula in step (7) updates the eligibility trace e_t corresponding to state x_t into the eligibility trace e_{t+1} corresponding to state x_{t+1}.
6. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein the policy parameter update in step (9) is θ_{t+1} ← θ_t + αδ_t(u* - u_t), wherein δ_t is the TD error of the value function computed in step (6).
7. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein in step (11) the state transition function parameter vector β_{t+1} is computed from the model intermediate quantities p_{t+1}, d_{t+1} and s_{t+1} obtained in step (10), wherein β_t is the parameter vector of the Gaussian process approximation model, i.e. of the state transition function, corresponding to time step t.
8. The adaptive control method for a trolley hill-climbing system based on a Gaussian process approximation model according to claim 1, wherein the next state obtained in step (14) from the Gaussian process approximation model is predicted using Φ_k = (φ(x'_1, u_1), φ(x'_2, u_2), ..., φ(x'_k, u_k))^T, the state feature matrix at planning step k, wherein x'_1 is the initial state of the planning process, x'_k is the state reached after k planning steps starting from x'_1, β_k is the parameter vector of the Gaussian process model, and ΔN_k ∈ R^{k×k} is a noise matrix whose position components up to planning step k satisfy a Gaussian distribution, i.e. ΔN_k = diag(σ_1², σ_2², ..., σ_k²).
CN201910823151.XA 2019-09-02 2019-09-02 Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model Active CN110531620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910823151.XA CN110531620B (en) 2019-09-02 2019-09-02 Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910823151.XA CN110531620B (en) 2019-09-02 2019-09-02 Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model

Publications (2)

Publication Number Publication Date
CN110531620A CN110531620A (en) 2019-12-03
CN110531620B true CN110531620B (en) 2020-09-18

Family

ID=68666154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910823151.XA Active CN110531620B (en) 2019-09-02 2019-09-02 Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model

Country Status (1)

Country Link
CN (1) CN110531620B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929281A (en) * 2012-11-05 2013-02-13 西南科技大学 Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment
CN104932267B (en) * 2015-06-04 2017-10-03 曲阜师范大学 A kind of neural network lea rning control method of use eligibility trace
CN108549232B (en) * 2018-05-08 2019-11-08 常熟理工学院 A kind of room air self-adaptation control method based on approximate model planning

Also Published As

Publication number Publication date
CN110531620A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
WO2021135554A1 (en) Method and device for planning global path of unmanned vehicle
CN110136481B (en) Parking strategy based on deep reinforcement learning
CN110989576B (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN112162555B (en) Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN108161934B (en) Method for realizing robot multi-axis hole assembly by utilizing deep reinforcement learning
CN106950956B (en) Vehicle track prediction system integrating kinematics model and behavior cognition model
Leottau et al. Decentralized reinforcement learning of robot behaviors
CN110615003B (en) Cruise control system based on strategy gradient online learning algorithm and design method
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
US20230367934A1 (en) Method and apparatus for constructing vehicle dynamics model and method and apparatus for predicting vehicle state information
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN110083167A (en) A kind of path following method and device of mobile robot
Gómez et al. Optimal motion planning by reinforcement learning in autonomous mobile vehicles
CN111783994A (en) Training method and device for reinforcement learning
Yang et al. Longitudinal tracking control of vehicle platooning using DDPG-based PID
CN114153213A (en) Deep reinforcement learning intelligent vehicle behavior decision method based on path planning
CN116679719A (en) Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy
CN113741533A (en) Unmanned aerial vehicle intelligent decision-making system based on simulation learning and reinforcement learning
CN115373415A (en) Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN110531620B (en) Adaptive control method of mountain climbing system of trolley based on Gaussian process approximate model
CN116127853A (en) Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused
CN115857548A (en) Terminal guidance law design method based on deep reinforcement learning
Guo et al. Modeling, learning and prediction of longitudinal behaviors of human-driven vehicles by incorporating internal human DecisionMaking process using inverse model predictive control
CN115743178A (en) Automatic driving method and system based on scene self-adaptive recognition
CN111562740B (en) Automatic control method based on multi-target reinforcement learning algorithm utilizing gradient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant