CN110879531B - Data-driven adaptive optimal control method and medium for a stochastic disturbance system - Google Patents


Info

Publication number
CN110879531B
CN110879531B
Authority
CN
China
Prior art keywords
data
optimal
state
control
driven
Prior art date
Legal status
Active
Application number
CN201911154069.9A
Other languages
Chinese (zh)
Other versions
CN110879531A (en)
Inventor
甘明刚
马千兆
张蒙
陈杰
窦丽华
邓方
白永强
Current Assignee
Beijing Institute of Technology BIT
Chongqing Innovation Center of Beijing University of Technology
Original Assignee
Beijing Institute of Technology BIT
Chongqing Innovation Center of Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT and Chongqing Innovation Center of Beijing University of Technology
Priority to CN201911154069.9A
Publication of CN110879531A
Application granted
Publication of CN110879531B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems ... electric
    • G05B13/04 - Adaptive control systems ... electric, involving the use of models or simulators
    • G05B13/042 - Adaptive control systems ... in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The invention discloses a data-driven adaptive optimal control method and medium for a stochastic disturbance system. The method comprises a problem formulation part, a design part for a data-driven optimal state observer, and an off-policy data-driven ADP control part for the stochastic disturbance system; the invention describes these three parts in detail. According to the method, a data-driven optimal state observer is designed, and off-policy data-driven ADP control of the stochastic disturbance system is performed. The data-driven ADP method is applied for the first time to a system whose state is completely unmeasurable; model-free LQG control is generalized to continuous-time systems; non-matching noise outside the control signal channel and independent noise uncorrelated with the state and control signal are considered in the ADP design; and a novel off-policy data-driven ADP control method and medium for a stochastic disturbance system are provided, which avoid the burden of repeatedly reading and updating control signals and significantly reduce the amount of computation.

Description

Data-driven adaptive optimal control method and medium for a stochastic disturbance system
Technical Field
The invention relates to systems disturbed by random noise, and in particular to model-free stochastic optimal control. Such systems arise in many fields, including industrial and agricultural production, electric power systems, chemical processes, machine manufacturing, transportation, aerospace, and artificial intelligence.
Background
Uncertainty in a practical system may come from noise in signals such as inputs and states. The optimal control problem for systems disturbed by random noise has therefore received sustained attention. In the literature, such problems are usually handled with H2 or H-infinity robust control methods, whose main realization is to model the disturbance input with some deterministic model and then design state-feedback control accordingly. In engineering practice, however, external disturbances rarely evolve in the way they are modeled. On the other hand, existing H2 and H-infinity results are mostly model-based. A practical control system may suffer not only from noise interference but also from uncertainty due to an unknown model. Research on model-free stochastic optimal control therefore has significant theoretical and practical value.
The adaptive dynamic programming (ADP) method offers a new route to model-free stochastic optimal control. Stochastic optimal control results based on reinforcement learning or ADP have appeared in recent years, but they consider only the "matched" noise entering through the control signal channel, require the control signal to be read and updated many times, and carry a heavy computational burden. In a practical system, however, the noise sources may fall into different categories. A further concern is that the system state is sometimes not directly available, whereas existing data-driven reinforcement learning or ADP methods require the state to be at least partially known. Within the MBC framework, some researchers have handled a completely unmeasurable state by treating the output as the state; in the presence of measurement noise, however, this approach inevitably degrades the performance of the control system.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a data-driven adaptive optimal control method for a stochastic disturbance system, which solves the control problem for systems whose state is completely unmeasurable.
Another object of the present invention is to provide a storage medium for the data-driven adaptive optimal control method for a stochastic disturbance system.
The purpose of the invention is realized by the following technical scheme:
a data-driven self-adaptive optimization control method of a random disturbance system comprises a problem description part, a design part of a data-driven optimal state observer and an ADP control part of a random disturbance system driven by different strategy data;
for the problem description section:
a stochastic disturbance system is given together with its associated output equation; the goal is to solve for the optimal linear control, i.e. to design a stochastic optimal control policy that minimizes a cost function;
for the design part of the data-driven optimal state observer:
aiming at the completely unmeasurable system state, a data-driven optimal state observer is designed; an observer system is formed from the stochastic disturbance system, the output equation, and the observer;
the optimal control policy of the system is designed using the observed state;
using the idea of data-driven ADP, a data-driven algorithm is designed on the observer system to solve for the optimal observation gain;
for the off-policy data-driven ADP control part of the stochastic disturbance system:
the optimal observer supplies online state information of the observer system, on which a data-driven ADP algorithm is designed to finally obtain the stochastic optimal control.
As a preferred mode, for the problem formulation part:

given a stochastic disturbance system described by a stochastic differential equation (1) [the equation is given as an image in the original; it combines a linear drift in the state x and control u with state-dependent, control-dependent and independent Wiener noise terms], and the output equation associated with it

y = Cx + υ (2)

wherein x ∈ R^n, u ∈ R^m and y ∈ R^p denote the system state, the control input and the output, respectively; the system matrices, including A ∈ R^{n×n} and B ∈ R^{n×m}, are unknown constant matrices; w and υ are mutually uncorrelated zero-mean Wiener processes whose covariance matrices are W and V, respectively; N1 and N2 are non-negative integers; ξ_i and η_j are zero-mean Wiener processes whose moment conditions are given as images in the original, where ρ_ij, σ_ij > 0 are known constants;

given the above system, the goal is to solve for the optimal linear control u* = −K*x, wherein K* ∈ R^{m×n} is the stochastic optimal control policy to be designed, such that the quadratic cost function [given as an image in the original] is minimized, where the weighting matrices satisfy Q = Qᵀ ≥ 0 and R = Rᵀ > 0.
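For intuition about the criterion, in the purely additive-noise case the steady-state average cost of a stabilizing linear policy u = −Kx has the closed form tr((Q + KᵀRK)X), where X is the stationary state covariance solving (A − BK)X + X(A − BK)ᵀ + W = 0. A minimal sketch under illustrative matrices (multiplicative noise omitted; this is not the patent's data-driven computation, which needs no model):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative model (not from the patent): damped double integrator with noise.
A = np.array([[0.0, 1.0], [0.0, -0.5]])
B = np.array([[0.0], [1.0]])
W = 0.01 * np.eye(2)          # additive process-noise covariance
Q = np.eye(2)
R = np.array([[1.0]])

def average_cost(K):
    """Steady-state average cost of u = -Kx under additive noise only."""
    Ac = A - B @ K
    assert np.all(np.linalg.eigvals(Ac).real < 0), "K must be stabilizing"
    # Stationary covariance: Ac X + X Ac^T + W = 0
    X = solve_continuous_lyapunov(Ac, -W)
    return np.trace((Q + K.T @ R @ K) @ X)

# Optimal gain from the control algebraic Riccati equation.
P = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P)

print(average_cost(K_star))        # cost of the optimal policy
print(average_cost(2.0 * K_star))  # a detuned policy costs more
```

The comparison at the end illustrates that K* is a genuine minimizer: any other stabilizing gain yields a strictly larger average cost.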
As a preferred mode, for the design part of the data-driven optimal state observer:

aiming at the completely unmeasurable system state, an observer (8) is designed [the observer equation is given as an image in the original; it is driven by the output-injection term L(y − Cx̂)], wherein x̂ ∈ R^n denotes the observed value of the state and L ∈ R^{n×p} is the observation gain to be solved for; the observation error is e = x − x̂;

from (1), (2) and (8) the error dynamics (9) are obtained [the equation is given as an image in the original; the drift matrix of e is A − LC];

the evolution of the error e is independent of the state x. Therefore, an optimal observer of the unknown state is designed first, and the optimal control policy of the system is then designed using the observed state.
Preferably, the method comprises the following steps: define the aggregated noise covariance matrices as in (10) and (11) [the definitions are given as images in the original]. According to LQG control theory, the optimal state observation gain L* can be expressed as

L* = S*CᵀV⁻¹ (12)

where S* is the unique symmetric positive definite solution of the algebraic Riccati equation (13) [given as an image in the original];

for the optimal observer to exist, the following assumption is made: the error system is mean-square stabilizable and strictly observable [the precise statement is given as images in the original], where e is its state;

given an initial observation gain L0 ∈ R^{n×p} such that A − L0C is a Hurwitz matrix, let S_k (k = 0, 1, 2, …) be the solution of the Lyapunov equation (15) [given as an image in the original], where L_k (k = 1, 2, 3, …) is given by

L_k = S_{k−1}CᵀV⁻¹ (16)

Then A − L_kC is a Hurwitz matrix, and the sequences {S_k} and {L_k} converge to S* and L*, respectively.

Using the idea of data-driven ADP, a data-driven algorithm is designed on the system (9) to solve for the optimal observation gain L*.
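When the model is known, the iteration of the Lyapunov equation (15) and gain update (16) is the observer-side counterpart of Kleinman's algorithm, and its limit can be checked against the filter algebraic Riccati equation. A minimal numeric sketch, with all matrices illustrative stand-ins (the patent's point is that the same limit is later reached from data alone):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative model (not from the patent).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
W = 0.1 * np.eye(2)            # process-noise covariance
V = np.array([[0.01]])         # measurement-noise covariance

L = np.zeros((2, 1))           # L0 = 0 is admissible because A is already Hurwitz
for k in range(30):
    Ak = A - L @ C
    # Lyapunov step (15): Ak S + S Ak^T + L V L^T + W = 0
    S = solve_continuous_lyapunov(Ak, -(L @ V @ L.T + W))
    # Gain update (16): L_{k+1} = S_k C^T V^{-1}
    L = S @ C.T @ np.linalg.inv(V)

# Limit check: S* solves the filter ARE  A S + S A^T - S C^T V^{-1} C S + W = 0
S_star = solve_continuous_are(A.T, C.T, W, V)
L_star = S_star @ C.T @ np.linalg.inv(V)
print(np.max(np.abs(L - L_star)))   # -> ~0
```

At the fixed point, substituting L = SCᵀV⁻¹ back into the Lyapunov equation recovers the Riccati equation exactly, which is why the sequence converges to (S*, L*).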
Preferably, the method comprises the following steps: the observation gain is fixed at L0 during the data-acquisition and learning stage, where A − L0C is a Hurwitz matrix. When the independent noise is zero, an exploration noise ζdt is added to the right-hand side of system (9) to guarantee the persistent excitation condition; when independent noise is present, the persistent excitation condition is satisfied automatically and no exploration noise needs to be added, i.e. ζ = 0;

define L̃_k := L_k − L0 (k = 0, 1, 2, …) and the associated closed-loop quantity [given as an image in the original]; let dσ1 [its definition is given as an image in the original] be measurable. Integrating both sides of system (9) along the trajectory yields the data equation (19) [given as an image in the original];

define the data matrices [their expressions are given as images in the original] over predefined sampling instants satisfying t_0 < t_1 < … < t_{r1}. Using these expressions, (19) can be transformed into a more compact form [given as an image in the original], where Ψ_ek and Ω_ek are defined as shown [images in the original];

for the given L0 making A − L0C a Hurwitz matrix, if the rank condition [given as an image in the original] holds, then S_k and S_{k+1} exist, and the computed sequences {S_k} and {L_k} converge to S* and L*, respectively;

as long as the number of samples r1 (the subscript of the last predefined sampling instant in formula (24), i.e. the amount of collected data) is chosen large enough, the rank condition is guaranteed to hold, and S_k and L_{k+1} can then be solved iteratively [the iteration formula is given as an image in the original]. A threshold κ1 is set as the loop-termination condition: when ‖S_k − S_{k−1}‖ ≤ κ1 the loop is terminated, and the current L_k is the resulting optimal observation gain.
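Each iteration of the data-driven stage reduces to a linear least-squares problem of the form Ψθ = Ω, and the rank condition is precisely the requirement that Ψ have full column rank so that the solution is unique. A generic sketch of that solve step, in which `Psi` and `Omega` are random placeholders standing in for Ψ_ek and Ω_ek rather than matrices built from collected trajectories:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data matrix: r1 sampled rows, one column per unknown.
r1, n_unknowns = 40, 6
Psi = rng.standard_normal((r1, n_unknowns))   # stands in for Psi_ek
theta_true = rng.standard_normal(n_unknowns)
Omega = Psi @ theta_true                      # stands in for Omega_ek

# Rank condition: enough independent samples for a unique LS solution.
assert np.linalg.matrix_rank(Psi) == n_unknowns, "collect more samples (increase r1)"

theta, *_ = np.linalg.lstsq(Psi, Omega, rcond=None)
print(np.max(np.abs(theta - theta_true)))     # -> ~0 when the rank condition holds
```

Checking the rank before solving mirrors the patent's requirement that r1 be large enough; with too few (or insufficiently exciting) samples, the least-squares system is underdetermined and the iterates are not unique.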
Preferably, for the off-policy data-driven ADP control part of the stochastic disturbance system:

when the independent noise is nonzero, the persistent excitation condition is satisfied automatically, and the control law in the learning stage is set to

u0 = −K0x (30)

where K0 is an admissible control policy of the system. When the independent noise is zero, an exploration noise [given as an image in the original] is added to the right-hand side of (30);

applying Itô's lemma yields (32) [given as an image in the original]. Define the auxiliary quantities shown as images in the original, and let dσ2 be measurable; then (33) follows [given as an image in the original]. Integrating both sides of equation (33) along the trajectory of system (1) further yields (34) [given as an image in the original];

define the data matrices [their expressions are given as images in the original] over predefined sampling instants satisfying t_0 < t_1 < … < t_{r2}. Using these expressions, (34) can be transformed into a compact form [given as an image in the original], where Ψ_xk and Ω_xk are defined as shown [images in the original]; from these, the iteration formula (41) is obtained [given as an image in the original];

given an initial admissible control policy K0, if the rank condition [given as an image in the original] holds (it can be ensured by choosing r2, the subscript of the last sampling instant in formula (38), large enough), then (P_k, K_{k+1}) has a unique solution and the computed sequences converge to their optimal values;

a threshold κ2 is set as the loop-termination condition: when ‖P_k − P_{k−1}‖ ≤ κ2 the loop is terminated, and u = −K_kx is the resulting stochastic optimal control input.
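On the control side, the off-policy recursion reproduces from data the model-based policy iteration in which P_k solves a Lyapunov equation under the current gain and K_{k+1} = R⁻¹BᵀP_k, with the ‖P_k − P_{k−1}‖ ≤ κ2 stopping rule. A sketch of that model-based counterpart under illustrative matrices (the data-driven version reaches the same limit without knowing A and B):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Illustrative model (not from the patent); open-loop unstable.
A = np.array([[0.0, 1.0], [1.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.array([[3.0, 1.0]])    # K0: an admissible (stabilizing) initial policy
kappa2 = 1e-9                 # loop-termination threshold
P_prev = None
while True:
    Ak = A - B @ K
    # Policy evaluation: Ak^T P + P Ak + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement: K_{k+1} = R^{-1} B^T P_k
    K = np.linalg.solve(R, B.T @ P)
    if P_prev is not None and np.linalg.norm(P - P_prev) <= kappa2:
        break
    P_prev = P

P_star = solve_continuous_are(A, B, Q, R)  # limit: control Riccati solution
print(np.max(np.abs(P - P_star)))          # -> ~0
```

Convergence is quadratic from any stabilizing K0, which is why the threshold test on successive P_k iterates is a practical stopping rule.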
A computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to perform the above method.
The beneficial effects of the invention are:
according to the method, the optimal state observer is driven by design data, and different strategy data driving ADP control of a random disturbance system is performed. The data driving ADP method is firstly used for a system with completely unmeasurable state; model-less LQG control is generalized to continuous time systems; non-matching noise outside a control signal channel and independent noise independent of a state and a control signal are considered in ADP design; a novel different strategy data driving ADP control method and medium for a random disturbance system are provided, the burden of repeatedly reading and updating control signals is avoided, and the calculation amount is obviously reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a top view of the experimental scene;
FIG. 2 is a flow chart of the data-driven adaptive optimal control algorithm for the stochastic disturbance system;
FIG. 3 illustrates the meaning of d(t) and θ(t);
FIG. 4 is a diagram of the trajectories of the tip movements;
FIG. 5 shows the tip velocity and motive force in the zero force field;
FIG. 6 shows the tip velocity and force in the VF before learning;
FIG. 7 shows the tip velocity and force in the VF after learning;
FIG. 8 shows the tip velocity and force after the VF is removed.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention; all other embodiments obtained by a person skilled in the art without inventive effort on the basis of these embodiments fall within the scope of the present invention. The following detailed description is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention.
Example one
The invention provides a data-driven adaptive optimal control method for a stochastic disturbance system, comprising a problem formulation part, a design part for a data-driven optimal state observer, and an off-policy data-driven ADP control part for the stochastic disturbance system;
for the problem formulation part:
a stochastic disturbance system is given together with its associated output equation; the goal is to solve for the optimal linear control, i.e. to design a stochastic optimal control policy that minimizes a cost function;
for the design part of the data-driven optimal state observer:
aiming at the completely unmeasurable system state, a data-driven optimal state observer is designed; an observer system is formed from the stochastic disturbance system, the output equation, and the observer;
the optimal control policy of the system is designed using the observed state;
using the idea of data-driven ADP, a data-driven algorithm is designed on the observer system to solve for the optimal observation gain;
for the off-policy data-driven ADP control part of the stochastic disturbance system:
the optimal observer supplies online state information of the observer system, on which a data-driven ADP algorithm is designed to finally obtain the stochastic optimal control.
Example two
In accordance with the above embodiment, the invention provides a computer-readable storage medium on which a computer program is stored; the computer program is executed by a processor to perform the method described above.
The present invention may employ a computer program product embodied on one or more storage media (including disk storage, CD-ROM, optical storage) having computer program code embodied therein.
The present invention has been described with reference to a method according to an embodiment of the invention. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means.
EXAMPLE III
In accordance with the above embodiments, the present invention provides an application example: a simulation of the learning mechanism of the central nervous system.
This example demonstrates the effectiveness of the above method by simulating arm-motion control experiments of the central nervous system (CNS) under external force-field disturbance. The subject moves the manipulator tip forward in the horizontal plane to the target position by arm movements, as shown in FIG. 1. Two torque motors mounted in the base of the manipulator can generate a required force field, applying a corresponding disturbance force to the arm through the mechanical linkage and the handle at the tip. The data-driven adaptive optimal control (AOC) method shown in FIG. 2 is used here to simulate the learning mechanism of the CNS.
1) Simulation setup
The dynamic behavior of the system can be described by the point-mass arm model [the equations are given as images in the original; the model comprises the kinematics dp = υdt, the force balance m dυ = (a − bυ + F)dt, and the first-order actuator dynamics τ da = (u − a)dt + dη], wherein the two-dimensional vectors p = [p_x, p_y]ᵀ and υ = [υ_x, υ_y]ᵀ denote the position and velocity of the tip, respectively; a = [a_x, a_y]ᵀ is the actuator state, i.e. the force applied to the tip by the subject; u = [u_x, u_y]ᵀ is the control signal of the CNS; m is the mass of the hand; b is the viscosity constant; τ is a time constant; dη is the control-dependent noise, given by the expression shown [image in the original], where η1 and η2 are two Wiener processes and c1, c2 are positive numbers measuring the noise amplitude; F is the external force generated by a velocity-dependent force field (VF), whose value is set as shown [image in the original], where χ is a constant scaling factor positively correlated with the subject's strength.

The values of the physical parameters of the system are listed in Table 1. The state vector is taken as [pᵀ, υᵀ, aᵀ]ᵀ. Since the output is noisy, the state cannot be measured directly and is instead obtained by the observer; here C = I_6 is set. Note that N1 = 0 and N2 = 2. The covariance matrix of the independent noise is taken as W = 0.001I_6, and V = 0.015 diag(1, 1, 1, 1, 10, 10). The initial observation gain and control policy are set to L0 = 10I_6 and K0 = [100I_2, 10I_2, 10I_2], respectively.
TABLE 1. Partial physical parameters of the arm-motion model [the parameter values are given as an image in the original].
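As a structural sanity check of the arm model described above, its deterministic part (noise, force field and observer omitted) can be integrated with a forward-Euler scheme; under a simple stabilizing feedback, assumed here purely for illustration, the tip settles at the commanded target. All numeric values are stand-ins, not the Table 1 parameters:

```python
import numpy as np

# Illustrative parameters (stand-ins, not the Table 1 values).
m, b, tau = 1.0, 10.0, 0.05
dt, T = 0.001, 3.0
target = np.array([0.0, 0.25])        # forward reach in the horizontal plane

p = np.zeros(2)   # tip position
v = np.zeros(2)   # tip velocity
a = np.zeros(2)   # actuator state (force applied to the tip)

for _ in range(int(T / dt)):
    u = 30.0 * (target - p) - 5.0 * v    # simple stabilizing feedback (illustrative)
    p = p + v * dt                        # dp = v dt
    v = v + (a - b * v) / m * dt          # m dv = (a - b v) dt  (F = 0, no field)
    a = a + (u - a) / tau * dt            # tau da = (u - a) dt  (noise omitted)

print(np.linalg.norm(p - target))         # small: the tip reaches the target
```

The closed-loop poles of this deterministic model are all in the left half-plane for the chosen gains, so the position error decays to near zero well within the 3 s horizon.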
2) Weight matrix selection
The weight matrix R = 0.01I_2. Q is task-dependent, i.e. the CNS can select different Q matrices for different tasks. For example, when a force-field disturbance is found in the x-axis direction, the CNS can increase the weight of that direction to enhance the stiffness of the system (i.e. the magnitude of the restoring force per unit of trajectory deflection). An appropriate Q can therefore be selected by data fitting over the existing tests (each test is called a trial). Note that Q contains 21 independent elements. To reduce redundancy, its form is set to Q = diag(Q0, 0.01Q0, 0.0005Q0), where Q0 is a task-dependent symmetric 2×2 matrix. Let g*(t) be the ideal motion trajectory, i.e. a straight path, and g(t) the actual trajectory of the last trial. Taking g*(t) as the origin, the polar-coordinate pair (d(t), θ(t)) of g(t) is obtained, as shown in FIG. 3.

At the time t_m when d(t) reaches its maximum, d_m = d(t_m) and θ_m = θ(t_m); the CNS then determines the value of Q0 by the model shown [given as an image in the original], where ω0, ω1, ω2 are empirical constants, taken in this example as ω0 = 5×10^5, ω1 = 5×10^4, ω2 = 10^5. For a fixed external force field, d_m and θ_m are constant; the CNS modulates d_m and θ_m only when changes in the external force field are observed, adapting Q to the new task requirements.
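The reduced parameterization of Q can be assembled directly from Q0 as a block diagonal: with Q0 symmetric 2×2, the full 6×6 Q carries 3 free elements instead of 21. A sketch, with placeholder Q0 values rather than values fitted from trials:

```python
import numpy as np
from scipy.linalg import block_diag

# Illustrative Q0 (a task-dependent symmetric 2x2 matrix; placeholder values).
Q0 = np.array([[8.0e5, 1.0e5],
               [1.0e5, 5.0e5]])
assert np.allclose(Q0, Q0.T), "Q0 must be symmetric"

# Q = diag(Q0, 0.01*Q0, 0.0005*Q0): position, velocity and actuator weights.
Q = block_diag(Q0, 0.01 * Q0, 0.0005 * Q0)

print(Q.shape)                 # (6, 6)
print(np.allclose(Q, Q.T))     # True: Q inherits symmetry from Q0
```

The fixed scale factors 0.01 and 0.0005 keep the velocity and actuator weights tied to the position weight, so only Q0 needs to be adapted between tasks.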
3) Simulation result
The trajectory of the tip movement in the simulation is shown in fig. 4.
Five trials were performed in a zero force field (null field) using an initial admissible control strategy (top-left panel); then VF was suddenly applied to the subject and trials were performed (top-right panel); after the first trial in VF, the weight-matrix parameters
(given as an image in the original)
were obtained. After the CNS had learned via the data-driven AOC algorithm, further trials were performed (bottom-left panel). When VF is suddenly removed, i.e., the zero-force-field condition is restored, the after-effect end-point trajectory deviates clearly to the right (bottom-right panel), showing that the previous learning does produce a stable compensation effect that remains valid until a new learning task is performed.
The end-point velocity and force states in the above four stages are shown in FIGS. 5-8, respectively. It can be seen that the initial velocity profile in the y-direction (i.e., the target direction) is approximately a bell curve. After VF is applied, the force components in the positive x-direction and negative y-direction increase significantly, indicating that the subject is disturbed by a large external force (whose direction is related to the initial motion direction) and must increase the applied force to compensate. After learning, the subject generates a stable compensation force that effectively counteracts the influence of the external force field.
While preferred embodiments of the present invention have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. The above description is only of preferred embodiments and is not intended to limit the invention; any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention should be included within its scope of protection.

Claims (5)

1. A data-driven self-adaptive optimization control method for a random disturbance system, characterized by comprising a problem-description part, a design part of a data-driven optimal state observer, and an off-policy data-driven ADP control part for the random disturbance system;
for the problem-description part:
a random disturbance system is given, and the output equation associated with it is obtained; the optimal linear control is solved for, minimizing a cost function through the stochastic optimal control strategy to be designed;
for the design part of the data-driven optimal state observer:
for the completely unmeasurable system state, a data-driven optimal state observer is designed; a state-design system is obtained from the random disturbance system, the output equation and the observer;
the optimal control strategy of the state-design system is obtained by observation;
using the idea of data-driven ADP, a data-driven algorithm is designed on the state-design system to solve for the optimal observation gain;
for the off-policy data-driven ADP control part of the random disturbance system:
online state information of the state-design system is obtained using the optimal observer, a data-driven ADP algorithm is further designed, and the stochastic optimal control is finally obtained;
the observation gain in the data acquisition and learning stage is fixed to L0Wherein
Figure FDA0003600446480000012
Is a Hurwitz matrix; when the independent noise is zero, in order to ensure the continuous excitation condition, the exploration noise zeta dt is added to the right side of the system (9); when independent noise exists, the continuous excitation condition is automatically met, and then exploration noise does not need to be added, namely zeta is equal to 0;
definition of Lk:=Lk-L0(k is 0,1,2 …) and
Figure FDA0003600446480000011
let d sigma1Can be measured if
Figure FDA0003600446480000021
Integrating the two sides of the system (9) along the trajectory to obtain
Figure FDA0003600446480000022
define
(the data matrices given as images in the original)
where
(the sampling instants shown as an image in the original)
are predefined sampling instants satisfying
(the condition given as an image in the original);
using the above expressions, (19) can be transformed into the more compact form
(equation given as an image in the original)
where Ψek and Ωek are respectively defined as
(formulas given as images in the original);
for a given L0 such that A − L0C is a Hurwitz matrix, if the rank condition
(given as an image in the original)
holds, then Sk and Lk+1 exist, and the computed sequences
{Sk} and {Lk}
converge to S* and L*, respectively;
as long as the number of samples r1 (the subscript of the preset sampling instants in formula (24), representing the amount of collected data) is chosen large enough, the rank condition can be satisfied; at that point
(the equation given as an image in the original)
can be solved iteratively for Sk and Lk+1; a threshold κ1 is set as the loop-termination condition, and when ||Sk − Sk−1|| ≤ κ1 holds the loop is terminated, at which time Lk is the optimal observation gain obtained;
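The iteration just described (solve for Sk, update Lk+1, stop when ||Sk − Sk−1|| ≤ κ1) has a model-based analogue built on the Lyapunov recursion of equations (15)-(16). The sketch below shows that analogue for the additive-noise case, with hypothetical system matrices; the patent's algorithm itself is data-driven and needs no model:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Hypothetical system (additive noise only): dx = Ax dt + dw, y = Cx + v
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
W = 0.01 * np.eye(2)      # process-noise covariance
V = np.array([[0.1]])     # measurement-noise covariance

def observer_gain_iteration(L0, kappa1=1e-9, max_iter=100):
    """Policy iteration for the observation gain:
    solve (A - L_k C) S_k + S_k (A - L_k C)^T + W + L_k V L_k^T = 0,
    then L_{k+1} = S_k C^T V^{-1}, until ||S_k - S_{k-1}|| <= kappa1."""
    L, S_prev = L0, None
    for _ in range(max_iter):
        Ac = A - L @ C
        # solve_continuous_lyapunov(a, q) solves a X + X a^T = q
        S = solve_continuous_lyapunov(Ac, -(W + L @ V @ L.T))
        L = S @ C.T @ np.linalg.inv(V)
        if S_prev is not None and np.linalg.norm(S - S_prev) <= kappa1:
            break
        S_prev = S
    return S, L

# L0 = 0 is admissible here because A itself is Hurwitz
S_opt, L_opt = observer_gain_iteration(np.zeros((2, 1)))
```

The fixed point of this recursion is the solution of the filter algebraic Riccati equation, so the resulting gain matches L* = S*CᵀV⁻¹ from LQG theory.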
for the off-policy data-driven ADP control part of the random disturbance system:
when the independent noise is not zero, the persistent-excitation condition is satisfied automatically, and the control law of the learning stage is set as
u0 = −K0x (30)
where K0 is an admissible control strategy of the system; when the independent noise is zero, exploration noise θ is added to the right-hand side of (30);
by Itô's lemma, one obtains
(formula given as an image in the original)
define
(the quantities given as images in the original)
with dσ2 measurable; one has
(formula given as an image in the original)
integrating both sides of equation (33) along the trajectory of system (1) further gives
(formula given as an image in the original)
define
(the data matrices given as images in the original)
where
(the sampling instants shown as an image in the original)
are predefined sampling instants satisfying
(the condition given as an image in the original);
using the above expressions, (34) can be transformed into the compact form
(equation given as an image in the original)
where Ψxk and Ωxk are respectively defined as
(formulas given as images in the original);
which further gives
(formula given as an image in the original)
given an initial admissible control strategy K0, if the rank condition
(given as an image in the original)
holds, then by choosing r2 large enough, (Pk, Kk+1) is obtained as the unique solution, and
(given as an image in the original);
a threshold κ2 is set as the loop-termination condition; when ||Pk − Pk−1|| ≤ κ2 holds the loop is terminated, at which time u = −Kkx is the resulting stochastic optimal control input.
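The stopping rule ||Pk − Pk−1|| ≤ κ2 mirrors Kleinman-type policy iteration, which the data-driven scheme reproduces from trajectory data instead of from the model. A minimal model-based sketch, under the assumption of a known, hypothetical double-integrator model (not the arm model of the description):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # hypothetical double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def kleinman(K0, kappa2=1e-10, max_iter=100):
    """Policy iteration: P_k solves (A-BK_k)^T P + P (A-BK_k) + Q + K_k^T R K_k = 0,
    then K_{k+1} = R^{-1} B^T P_k; stop when ||P_k - P_{k-1}|| <= kappa2."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        Ak = A - B @ K
        # solve_continuous_lyapunov(a, q) solves a X + X a^T = q
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        K = np.linalg.inv(R) @ B.T @ P
        if P_prev is not None and np.linalg.norm(P - P_prev) <= kappa2:
            break
        P_prev = P
    return P, K

# K0 = [1, 1] is admissible: eig(A - B K0) = roots of s^2 + s + 1, all in the left half-plane
P_opt, K_opt = kleinman(np.array([[1.0, 1.0]]))
```

The iterates converge to the stabilizing solution of the continuous-time algebraic Riccati equation, so the final u = −K_opt x is the optimal feedback for this deterministic special case.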
2. The data-driven adaptive optimization control method for the random disturbance system according to claim 1, wherein for the problem description part:
given a random disturbance system described by the stochastic differential equation
(equation (1), given as an image in the original)
And the output equation associated therewith
y=Cx+υ (2)
where
x, u and y (dimensions given as images in the original)
respectively denote the system state, control input and output;
A and B (dimensions given as images in the original)
are unknown constant matrices;
w and υ (given as images in the original)
are uncorrelated zero-mean Wiener processes whose covariance matrices are respectively denoted
W and V (given as images in the original);
N1 and N2 are non-negative integers; ξi and ηj (given as an image in the original) are zero-mean Wiener processes satisfying
(the conditions given as images in the original)
where ρij, σij > 0 are known constants;
given the above system, the goal is to solve for the optimal linear control u* = −K*x, where
K* (given as an image in the original)
is the stochastic optimal control strategy to be designed, such that the cost function
(given as an image in the original),
namely
(given as an image in the original),
is minimized, where
Q = Q^T ≥ 0 and R = R^T > 0 (given as an image in the original);
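For intuition, in the deterministic special case (no noise terms) the cost of a fixed stabilizing feedback u = −Kx can be evaluated through a Lyapunov equation rather than by simulating trajectories. A brief sketch with hypothetical matrices:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Deterministic special case: dx/dt = Ax + Bu with u = -Kx.
# For stabilizing K, the cost J = ∫ (x^T Q x + u^T R u) dt equals x0^T P x0,
# where P solves (A - BK)^T P + P (A - BK) + Q + K^T R K = 0.
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # hypothetical matrices
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
K = np.array([[1.0, 1.0]])               # stabilizing feedback gain

Ac = A - B @ K
# solve_continuous_lyapunov(a, q) solves a X + X a^T = q
P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))

x0 = np.array([1.0, 0.0])
J = x0 @ P @ x0                          # closed-loop cost from this initial state
```

Minimizing J over K is exactly the LQR problem whose solution the claimed method recovers from data.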
3. The data-driven adaptive optimization control method of the random disturbance system according to claim 1, wherein for the design part of the data-driven optimal state observer:
for the completely unmeasurable system state, an observer is designed as
(equation (8), given as an image in the original)
where
x̂ (given as an image in the original)
denotes the observed value of the state, and
L (given as an image in the original)
is the observation gain to be solved for; the observation error is written as
e = x − x̂ (given as an image in the original);
from (1), (2) and (8) one obtains:
(equation (9), given as an image in the original)
the evolution of the error e is independent of the state x; therefore the optimal observer of the unknown state is designed first, and the optimal control strategy of the system is then designed using the observed state.
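The separation just noted (error dynamics independent of the state) can be illustrated in the deterministic limit, where the observation error obeys ė = (A − LC)e and decays whenever A − LC is Hurwitz. A brief sketch with hypothetical matrices:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])  # hypothetical system matrix
C = np.array([[1.0, 0.0]])
L = np.array([[1.0], [1.0]])              # observation gain; A - LC is Hurwitz here

# Forward-Euler simulation of the error dynamics e' = (A - LC) e
Ae = A - L @ C
e = np.array([1.0, -1.0])                 # initial observation error
dt = 0.001
for _ in range(20000):                    # simulate 20 seconds
    e = e + dt * (Ae @ e)
```

Because the error converges regardless of x, the observer gain and the feedback gain can be designed in two separate stages, as the claim describes.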
4. The data-driven adaptive optimization control method of the random disturbance system according to claim 3, wherein:
definition of
Figure FDA0003600446480000072
Figure FDA0003600446480000073
According to the LQG control theory, the optimal state observation gain L*Can be expressed as
L*=S*CTV-1 (12)
Wherein S*Is composed of
Figure FDA0003600446480000074
A unique symmetric positive solution of;
for the optimal observer to exist, the following assumption is made:
(the system given as an image in the original)
is mean-square stabilizable and strictly observable, where
(given as an image in the original)
is its state;
given an initial observation gain
L0 (given as an image in the original)
such that
A − L0C
is a Hurwitz matrix, if
Sk (given as an image in the original)
is the solution of the Lyapunov equation
(equation (15), given as an image in the original)
where Lk (k = 1, 2, 3, …) is given by
Lk = Sk−1CᵀV⁻¹ (16)
then
A − LkC
is a Hurwitz matrix, and the sequences
{Sk} and {Lk}
converge to S* and L*, respectively;
using the idea of data-driven ADP, a data-driven algorithm is designed on system (9) to solve for the optimal observation gain L*.
5. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the method according to any one of claims 1-4.
CN201911154069.9A 2019-11-22 2019-11-22 Data-driven self-adaptive optimization control method and medium for random disturbance system Active CN110879531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911154069.9A CN110879531B (en) 2019-11-22 2019-11-22 Data-driven self-adaptive optimization control method and medium for random disturbance system


Publications (2)

Publication Number Publication Date
CN110879531A CN110879531A (en) 2020-03-13
CN110879531B true CN110879531B (en) 2022-06-24

Family

ID=69730443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911154069.9A Active CN110879531B (en) 2019-11-22 2019-11-22 Data-driven self-adaptive optimization control method and medium for random disturbance system

Country Status (1)

Country Link
CN (1) CN110879531B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111665719B (en) * 2020-06-11 2022-11-22 大连海事大学 Supply ship synchronous control algorithm with timeliness and stability

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104019520A (en) * 2014-05-20 2014-09-03 天津大学 Data drive control method for minimum energy consumption of refrigerating system on basis of SPSA
CN107273445A (en) * 2017-05-26 2017-10-20 电子科技大学 The apparatus and method that missing data mixes multiple interpolation in a kind of big data analysis
CN107807069A (en) * 2017-10-25 2018-03-16 中国石油大学(华东) The adaptive tracking control method and its system of a kind of offshore spilled oil

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0719969D0 (en) * 2007-10-12 2007-11-21 Cambridge Entpr Ltd Substance monitoring and control in human or animal bodies


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Gang et al., "Research on fault diagnosis of gyroscopes for aircraft inertial navigation", Computer Simulation, 2019, vol. 36, no. 3, pp. 32-38, 44. *

Also Published As

Publication number Publication date
CN110879531A (en) 2020-03-13

Similar Documents

Publication Publication Date Title
Doerr et al. Model-based policy search for automatic tuning of multivariate PID controllers
Qi et al. Stable indirect adaptive control based on discrete-time T–S fuzzy model
CN109375512B (en) Prediction control method for ensuring closed loop stability of inverted pendulum system based on RBF-ARX model
Kersten et al. State-space transformations of uncertain systems with purely real and conjugate-complex eigenvalues into a cooperative form
Yang et al. Adaptive backstepping terminal sliding mode control method based on recurrent neural networks for autonomous underwater vehicle
Guan et al. Ship steering control based on quantum neural network
CN110879531B (en) Data-driven self-adaptive optimization control method and medium for random disturbance system
CN111273677B (en) Autonomous underwater robot speed and heading control method based on reinforcement learning technology
Rahman et al. Neural ordinary differential equations for nonlinear system identification
CN112571420A (en) Dual-function model prediction control method under unknown parameters
Kim et al. TOAST: Trajectory Optimization and Simultaneous Tracking Using Shared Neural Network Dynamics
Abadi et al. Chattering-free adaptive finite-time sliding mode control for trajectory tracking of MEMS gyroscope
Takahashi Remarks on a recurrent quaternion neural network with application to servo control systems
Yang et al. Robust control of a class of under-actuated mechanical systems with model uncertainty
Wu et al. Date-Driven Tracking Control via Fuzzy-State Observer for AUV under Uncertain Disturbance and Time-Delay
Zhu et al. Online parameter estimation for uncertain robot manipulators with fixed-time convergence
Loria Uniform global position feedback tracking control of mechanical systems without friction
Fan et al. Differential Dynamic Programming for time-delayed systems
Zheng et al. Identification for nonlinear singularly perturbed system using recurrent high-order multi-time scales neural network
Azarfar et al. Adaptive control for nonlinear singular systems
Xia et al. Three-Dimensional Trajectory Tracking for a Heterogeneous XAUV via Finite-Time Robust Nonlinear Control and Optimal Rudder Allocation
Żak Neural Controlling of Remotely Operated Underwater Vehicle
Balakhnov et al. Robust explicit model predictive control for hybrid linear systems with parameter uncertainties
Rawat et al. Trajectory Control of Robotic Manipulator using Metaheuristic Algorithms
Takahashi et al. Remarks on an Echo State Network–Based Optimal Predictive Control Using a Metaheuristics Optimisation Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant