CN115877871A - Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning - Google Patents

Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Info

Publication number
CN115877871A
CN115877871A (application CN202310193021.9A)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
game
subsystem
zero-sum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310193021.9A
Other languages
Chinese (zh)
Other versions
CN115877871B (en)
Inventor
刘昊
吕金虎
马子豪
高庆
刘德元
王薇
钟森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University, Academy of Mathematics and Systems Science of CAS filed Critical Beihang University
Priority to CN202310193021.9A
Publication of CN115877871A
Application granted
Publication of CN115877871B
Active legal status
Anticipated expiration

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention belongs to the technical field of unmanned aerial vehicle control and provides a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, which comprises the following specific steps: S1: establishing an unmanned aerial vehicle dynamic model; S2: establishing a non-zero-sum game formation model; S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method; S4: designing a non-zero-sum game formation controller. The control method ensures that the states of the unmanned aerial vehicle cluster subsystems converge quickly to their desired values, i.e., the cluster can form the required formation pattern in a short time, and the process is stable and fast. Meanwhile, the error system adopts a game-based control method which, compared with traditional control methods, achieves a faster convergence rate and smaller overshoot. The proposed non-zero-sum game controller therefore better solves the formation trajectory tracking problem.

Description

Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle control, and in particular relates to a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning.
Background
Unmanned aerial vehicle (UAV) formations are gaining attention across research fields because, compared with a single UAV, they offer advantages such as low cost, high maneuverability, and good adaptability in typical applications such as heavy transportation, wide-area search missions, and large-scale scientific observation. Traditional formation control methods include the leader-follower method, behavior-based methods, the virtual structure method, and the artificial potential field method. The leader-follower method defines only the leader's behavior during formation flight; the followers automatically maintain their relative positions to the leader through information exchange, so that the formation-keeping task of the whole team can be completed. In practical applications, the distributed leader-follower method has a simple and clear control structure: each UAV needs only its own state and the state information of its neighbors, so the demands on onboard communication hardware are low and the coordination problem among formation members is greatly simplified. It has therefore been widely applied to robot formation, UAV formation, and missile formation. In recent years, game theory has attracted extensive attention in the field of robot formation; for example, the equilibrium solution of a differential game has been used as the formation control strategy to solve the formation control problem effectively. In fact, the UAV formation control problem can be cast as a multi-player differential game.
When the dynamic parameters of the UAVs cannot be obtained accurately, or constant disturbances act on the formation, the Nash equilibrium solution of the game formation problem is difficult to compute, and hence the Nash-equilibrium optimal formation control law that realizes the formation required by the UAV team cannot be obtained. This problem can be addressed with the adaptive-learning capability of reinforcement learning, which intelligently identifies the unknown parameters of the UAV formation system and learns the optimal controller from state data.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, which uses the adaptive-learning capability of reinforcement learning to intelligently identify the unknown parameters of the unmanned aerial vehicle formation system and to learn the optimal controller from state data.
The technical scheme of the invention is as follows:
A non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning comprises the following specific steps:
S1: establishing an unmanned aerial vehicle dynamic model;
S2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a lateral-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method;
S4: designing a non-zero-sum game formation controller.
Preferably, step S1 comprises the following:
for the i-th unmanned aerial vehicle, a six-degree-of-freedom unmanned aerial vehicle dynamic system is established; the six-degree-of-freedom dynamic system is a multi-input, multi-output system composed of four subsystems, where the input and output of the longitudinal subsystem are denoted $u_{xi}$ and $x_i$, those of the lateral subsystem $u_{yi}$ and $y_i$, those of the vertical subsystem $u_{zi}$ and $z_i$, and those of the yaw subsystem $u_{\psi i}$ and $\psi_i$; $[x_i, y_i, z_i]^T$ represents the position of the unmanned aerial vehicle in the earth-fixed inertial frame, and $[\phi_i, \theta_i, \psi_i]^T$ represents its Euler attitude, where $\phi_i$, $\theta_i$ and $\psi_i$ denote the roll, pitch and yaw angles respectively; the control input $u_i$ of the unmanned aerial vehicle is divided into a local control input $u_i^l$ and a global control input $u_i^g$, i.e. $u_i = u_i^l + u_i^g$; the local input $u_i^l$ is the control input generated from the unmanned aerial vehicle's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other unmanned aerial vehicles; the state vectors of the longitudinal, lateral, vertical and yaw subsystems are defined correspondingly;
the longitudinal-subsystem and lateral-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (1): image not reproduced]
where the feedback coefficient is a constant, the nominal parameters are those of the longitudinal and lateral subsystems, j indexes the other unmanned aerial vehicles, and n denotes the total number of unmanned aerial vehicles; for the longitudinal subsystem, the quantities in equation (1) are the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the longitudinal subsystem of the i-th unmanned aerial vehicle; for the lateral subsystem, they are the corresponding quantities of the lateral subsystem of the i-th unmanned aerial vehicle;
the vertical-subsystem and yaw-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (2): image not reproduced]
where the feedback coefficient is a constant and the nominal parameters are those of the vertical and yaw subsystems; the quantities in equation (2) are, analogously, the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the vertical and yaw subsystems of the i-th unmanned aerial vehicle.
Preferably, the longitudinal-subsystem non-zero-sum game formation model in step S2 is established as follows:
according to the dynamic model of the longitudinal subsystem of the unmanned aerial vehicle, the longitudinal-subsystem non-zero-sum game formation model is established as follows:
the unmanned aerial vehicle set contains n unmanned aerial vehicles in total; letting the stacked subsystem states of the n unmanned aerial vehicles represent the state of the non-zero-sum game formation model, the global dynamic model is obtained as
[Equation (3): image not reproduced]
where $I_n$ is the n-dimensional identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose i-th element is 1 and whose other elements are 0; substituting the known output relation into the above formula gives
[Equation (4): image not reproduced]
considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
[Equation (5): image not reproduced]
subtracting equation (5) from equation (4) then yields the following error system:
[Equation (6): image not reproduced]
The non-zero-sum game formation models of the lateral, vertical and yaw subsystems are established by the same method.
Preferably, step S3 specifically comprises the following:
the cost function of each drone is defined as follows:
[Equation (7): image not reproduced]
where the weight parameters are symmetric matrices, the initial state of the non-zero-sum game formation model is given, t represents the starting time, and τ represents the integration time;
a non-zero-sum game formation controller is designed for each drone to track the predetermined trajectory while minimizing the cost function of the i-th drone; the game-optimal controller is designed as
[Equation (8): image not reproduced]
where the gain in equation (8) is the optimal feedback control gain;
subject to the error system (6), the optimal gain is solved by minimizing the cost function;
the game feedback controller of equation (8) constitutes a Nash equilibrium of the non-zero-sum game formation control problem, i.e. a Nash control strategy, when every drone satisfies all of the following inequalities:
[Equation (9): image not reproduced]
Preferably, step S4 specifically comprises the following:
for the unknown system matrices, the coupled algebraic Riccati equations (AREs) and the Nash equilibrium solution are solved approximately using a reinforcement learning algorithm; the optimal feedback control gain of each drone is obtained from the solution of the coupled AREs, wherein $K_i$ is the feedback control gain and the associated symmetric matrix is solved by the following coupled AREs:
[Equation (10): image not reproduced]
in which v denotes a counting variable; if the dynamics of the system are known, a numerical solution is obtained using a model-based policy iteration algorithm; if the dynamics of the system are unknown, the following policy-iteration reinforcement learning algorithm is used to solve approximately:
[Equation (11): image not reproduced]
wherein k denotes the number of iterations, and
[Equation (12): image not reproduced]
from the Kronecker product identities:
[Equation (13): image not reproduced]
[Equation (14): image not reproduced]
wherein the vectors involved are arbitrary column vectors and M and N represent any two matrices;
using equations (13) and (14), one obtains
[Equations (15)-(17): images not reproduced]
wherein vec(·) stacks the columns of its matrix argument into a single column vector; column vectors are defined, with s a positive integer, as follows:
[Equation (18): image not reproduced]
combining equations (11) and (15)-(18) yields
[Equation (19): image not reproduced]
from which the following linear iterative equation is derived from equation (11):
[Equation (20): image not reproduced]
if the data matrix in equation (20) has full row rank, the unique solution of equation (20) is obtained; introducing appropriate random harmonic exploration noise in the learning process makes the data matrix full row rank; when the iteration increment falls below the convergence threshold ε, the feedback control gain of the i-th drone is obtained from equation (20); equation (20) simultaneously solves the optimal feedback control gain and the associated symmetric matrix without requiring the system dynamics matrices in equation (4); the game-based control protocol is derived from the state information and control input information of the system.
Preferably, in step S4: the policy iteration algorithm is a model-free algorithm that does not rely on prior knowledge of the drone system, the specific algorithm being as follows:
Step 1. Initialize the iteration index and select a stabilizing initial feedback control gain for each drone;
Step 2. During the data-collection interval, apply to each drone the control input given by the current feedback gain plus a bounded exploration noise;
Step 3. Solve equation (20) for each drone;
Step 4. Increment the iteration index and return to Step 3 until the spectral norm of the change in the gain matrices falls below the convergence threshold;
Step 5. Obtain the approximate solution of the Nash equilibrium.
The invention provides a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, which has the following advantages:
the control method ensures that the states of the unmanned aerial vehicle cluster subsystems converge quickly to their desired values, i.e., the cluster can form the required formation pattern in a short time, and the process is stable and fast. Meanwhile, the error system adopts a game-based control method which, compared with traditional control methods, achieves a faster convergence rate and smaller overshoot. The proposed non-zero-sum game controller therefore better solves the formation trajectory tracking problem.
Drawings
To illustrate the embodiments of the invention or the solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings are schematic and should not be understood as limiting the invention in any way; other drawings may be obtained from them by those skilled in the art without inventive effort. In the drawings:
FIG. 1 is a convergence plot of the learning process (convergence of the learned gains to their optimal values);
FIG. 2 is a diagram of the drone positions;
FIG. 3 is a drone position response diagram (3D);
FIG. 4 is a drone position error diagram;
FIG. 5 is a response plot of the drone attitude angles;
FIG. 6 is a drone position response diagram (LQR).
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
A non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning comprises the following specific steps:
S1: establishing an unmanned aerial vehicle dynamic model;
S2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a lateral-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method;
S4: designing a non-zero-sum game formation controller.
1. Unmanned aerial vehicle model
The six-degree-of-freedom unmanned aerial vehicle dynamic system can be divided into four subsystems: the longitudinal, lateral, vertical and yaw subsystems. Let $p_i = [x_i, y_i, z_i]^T$ denote the position of the i-th drone in the earth-fixed inertial frame, and let $\Theta_i = [\phi_i, \theta_i, \psi_i]^T$ denote its Euler attitude, where $\phi_i$, $\theta_i$ and $\psi_i$ denote the roll, pitch and yaw angles respectively. Let $u_i$ denote the control input of the i-th drone, whose four components are the control inputs of the four subsystems. The control input can be divided into a local control input $u_i^l$ and a global control input $u_i^g$, i.e. $u_i = u_i^l + u_i^g$. The local input $u_i^l$ is the control input generated from the drone's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other drones. The state vectors of the longitudinal, lateral, vertical and yaw subsystems are defined correspondingly. The longitudinal-subsystem and lateral-subsystem dynamics models are as follows:
[Equation (1): image not reproduced]
where the feedback coefficient is a constant, the nominal parameters are those of the longitudinal and lateral subsystems, j indexes the other drones, and n denotes the total number of drones; the quantities in equation (1) are the state vector, local control input, global control input, own output and other drones' outputs of the longitudinal and lateral subsystems of the i-th drone.
The vertical-subsystem and yaw-subsystem dynamics models are as follows:
[Equation (2): image not reproduced]
where the feedback coefficient is a constant and the nominal parameters are those of the vertical and yaw subsystems. As can be seen from equations (1) and (2), the UAV system is a multi-input, multi-output system composed of four subsystems, with four control inputs and four outputs.
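To make the subsystem structure concrete, the sketch below simulates one generic subsystem of the form chi_dot = A·chi + B·(u_local + u_global). The double-integrator matrices and feedback coefficients are illustrative assumptions, not the patent's nominal parameters (those appear only as equation images in the original).

```python
import numpy as np

# Hypothetical double-integrator stand-in for one UAV subsystem.
# The patent's nominal A, B are given only as images in the source,
# so these matrices and gains are illustrative assumptions.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

def step(chi, u_local, u_global, dt=0.01):
    """One Euler step of chi_dot = A chi + B (u_local + u_global)."""
    return chi + dt * (A @ chi + B.flatten() * (u_local + u_global))

# Local input from the drone's own output, global input from another
# drone's output, mirroring the u = u^l + u^g decomposition in the text.
k_p, k_d, k_nbr = 2.0, 2.0, 0.5       # assumed feedback coefficients
chi = np.array([1.0, 0.0])            # [position, velocity]
neighbor_output = 0.3
for _ in range(1000):
    u_l = -k_p * chi[0] - k_d * chi[1]    # generated from own output
    u_g = k_nbr * neighbor_output         # generated from other drones' outputs
    chi = step(chi, u_l, u_g)
print(chi)  # settles near [k_nbr*neighbor_output/k_p, 0] = [0.075, 0]
```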
2. Non-zero-sum game formation model
The non-zero-sum game formation model is established here taking the vertical subsystem as an example; the models of the other subsystems can be established by the same method. Consider the set of n drones and let the stacked subsystem states of the n drones represent the state of the non-zero-sum game formation model. The global dynamics model can then be obtained as follows:
[Equation (3): image not reproduced]
where $I_n$ is the n-dimensional identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose i-th element is 1 and whose other elements are 0. Substituting the known output relation into the above formula gives
[Equation (4): image not reproduced]
Considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
[Equation (5): image not reproduced]
Subtracting equation (5) from equation (4) then yields the following error system:
[Equation (6): image not reproduced]
As can be seen from equation (6), the state of the formation subsystem is influenced by the control inputs of all the drones; this means that both cooperation and conflict exist within the multi-drone system, so the global system can be studied within the framework of differential game theory.
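The Kronecker-product structure just described can be assembled directly in code. A minimal sketch, assuming generic per-drone matrices A and B as toy stand-ins (equations (3)-(6) are images in the source):

```python
import numpy as np

n = 3                                   # number of drones
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # assumed per-drone subsystem matrices
B = np.array([[0.0], [1.0]])

# Global dynamics X_dot = (I_n ⊗ A) X + sum_i (e_i ⊗ B) u_i, using the
# identity matrix, Kronecker product, and unit column vectors e_i from
# the text.
A_glob = np.kron(np.eye(n), A)
def e(i):
    v = np.zeros((n, 1)); v[i] = 1.0    # i-th element 1, others 0
    return v
B_glob = [np.kron(e(i), B) for i in range(n)]

# The ideal zero-input system X*_dot = (I_n ⊗ A) X* carries the virtual
# leader's required relative positions; the error system uses E = X - X*.
X = np.random.randn(2 * n)
X_star = np.zeros(2 * n)                # placeholder ideal trajectory
E = X - X_star
print(A_glob.shape, B_glob[0].shape, E.shape)  # (6, 6) (6, 1) (6,)
```

Keeping a separate input channel e_i ⊗ B per drone is what makes the per-drone cost functions, and hence the game structure of the next section, well defined.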
3. Solving the non-zero-sum game formation control problem
The cost function of each drone is defined as follows:
[Equation (7): image not reproduced]
where the weight parameters are symmetric weight matrices, the initial state of the non-zero-sum game formation model is given, t represents the starting time, and τ represents the integration time. The aim of the invention is to design a non-zero-sum game formation controller for each drone that tracks the predetermined trajectory while minimizing the cost function of the i-th drone. The game controller is designed as
[Equation (8): image not reproduced]
Subject to the error system (6), the optimal gain is obtained by minimizing the cost function.
The game feedback controller of equation (8) constitutes a Nash equilibrium of the non-zero-sum game formation control problem, i.e. a Nash control strategy, when every drone satisfies all of the following inequalities:
[Equation (9): image not reproduced]
As equation (9) shows, when the other drones maintain their Nash control strategies, no participant can reduce its cost by deviating from the Nash equilibrium; this means that the Nash equilibrium forces each participant to maintain its Nash control strategy.
4. Designing a controller:
in the part, a non-zero game formation controller is designed based on game theory and reinforcement learning theory. For unknown matrix
Figure SMS_175
and />
Figure SMS_176
Coupled Algebraic Riccati Equations (AREs) and Nash equilibrium solutions are solved approximately using a reinforcement learning algorithm. According to the optimal control theory and the game theory, the stable feedback control gain of the unmanned aerial vehicle can be obtained through->
Figure SMS_177
And (4) obtaining. In which symmetry matrix +>
Figure SMS_178
This can be solved by the following coupled AREs:
Figure SMS_179
(10)
in the formula
Figure SMS_180
. Of non-linear equation (10)The analytical solution is difficult to directly obtain. Thus, if the dynamics of the system are known, a model-based strategy iterative algorithm can be used to obtain a numerical solution of equation (10); if the dynamics of the system are unknown, the following strategy can be used to iterate a reinforcement learning algorithm to approximate solution (10).
If the dynamics of the system are unknown, the following policy-iteration reinforcement learning algorithm can be used to solve equation (10) approximately:
[Equation (11): image not reproduced]
[Equation (12): image not reproduced]
From the Kronecker product identities:
[Equation (13): image not reproduced]
[Equation (14): image not reproduced]
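Identities (13) and (14) are presumably the standard vectorization rules of the form vec(MXN) = (Nᵀ ⊗ M) vec(X); since the originals are images, the rule is stated here as an assumption and checked numerically:

```python
import numpy as np

# Check vec(M X N) == (N^T ⊗ M) vec(X), using column-major (Fortran)
# vectorization, which is what vec() denotes here.
rng = np.random.default_rng(0)
M = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
N = rng.standard_normal((4, 5))

vec = lambda A: A.reshape(-1, order="F")          # stack columns
lhs = vec(M @ X @ N)
rhs = np.kron(N.T, M) @ vec(X)
print(np.allclose(lhs, rhs))                      # True
```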
obtained by using the formula (13) and the formula (14),
Figure SMS_185
(15)
Figure SMS_186
(16)
Figure SMS_187
(17)
wherein, vec () has the following general formula:
Figure SMS_188
,/>
Figure SMS_189
is->
Figure SMS_190
Each column of elements ofA column vector of pixels; defining a column vector
Figure SMS_191
,/>
Figure SMS_192
and />
Figure SMS_193
sIs a positive integer as follows:
Figure SMS_194
(18)
wherein ,
Figure SMS_195
,/>
Figure SMS_196
combinations of (11), (15), (16), (17) and (18), push-out
Figure SMS_197
(19)
wherein ,
Figure SMS_198
the following linear iterative equation can be derived from equation (11):
Figure SMS_199
(20)
wherein
Figure SMS_200
Note 1: if it is not
Figure SMS_201
Is a full rank, a unique solution of equation (20) can be obtained. By introducing appropriate followers in the learning processThe machine harmonic wave detects noise and can make->
Figure SMS_202
The full rank. When/is>
Figure SMS_203
When it comes toiFeedback control matrix of unmanned aerial vehicle on frame->
Figure SMS_204
Can be obtained from the formula (20).
Note 2: the optimal feedback control gain can be solved simultaneously by the formula (20)
Figure SMS_205
And the symmetry matrix->
Figure SMS_206
Without the need for the system dynamics matrix ≥ in equation (4)>
Figure SMS_207
and />
Figure SMS_208
. The game-based control protocol may be derived from state information and control input information of the system. Therefore, the proposed strategy iterative algorithm is a model-free algorithm that does not rely on the prior knowledge of the drone system. The specific algorithm is as follows:
step1. Selection
Figure SMS_209
(ii) a Selecting a stable initial feedback control gain ^ for each drone>
Figure SMS_210
Step2. In
Figure SMS_211
During, unmanned aerial vehicle's control input is->
Figure SMS_212
, wherein />
Figure SMS_213
Is a bounded heuristic noise;
step3. Solving for each drone by equation (20)
Figure SMS_214
;/>
Step4. Order
Figure SMS_215
And returns to Step3 until implemented->
Figure SMS_216
,/>
Figure SMS_217
A spectral norm representing a matrix;
step5. Obtaining an approximate solution to the Nash equilibrium solution
Figure SMS_218
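A minimal single-agent sketch of Steps 1-5, assuming toy dynamics: the patent runs this kind of iteration per drone on the coupled game system, but a single LQR-type subsystem suffices to show the mechanics. A and B are used only to generate the measured data; the learner itself never reads them, matching the model-free claim.

```python
import numpy as np

# Model-free policy iteration (Steps 1-5) sketched for one subsystem in
# the single-agent LQR setting, a simplified stand-in for the coupled
# game. A, B below are used ONLY to simulate measurements.
rng = np.random.default_rng(1)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # assumed toy dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
freqs = rng.uniform(0.5, 50.0, size=10)    # random harmonic exploration

def collect(K, T=8.0, dt=1e-3, win=50):
    """Run u = -K x + noise; return per-window data for equation (20)."""
    x = np.array([1.0, -0.5])
    rows_dxx, rows_ixu, rows_q = [], [], []
    xx0, ixu, iq = np.kron(x, x), np.zeros(2), 0.0
    for step in range(int(T / dt)):
        noise = 0.2 * np.sin(freqs * step * dt).sum()
        u = (-K @ x).item() + noise
        y = np.atleast_1d(u) + K @ x       # u + K x (the exploration part)
        ixu += dt * np.kron(x, R @ y)      # integral of x ⊗ R(u + Kx)
        iq += dt * (x @ (Q + K.T @ R @ K) @ x)
        x = x + dt * (A @ x + (B * u).flatten())
        if (step + 1) % win == 0:          # close one data window
            rows_dxx.append(np.kron(x, x) - xx0)
            rows_ixu.append(ixu.copy()); rows_q.append(iq)
            xx0, ixu, iq = np.kron(x, x), np.zeros(2), 0.0
    return np.array(rows_dxx), np.array(rows_ixu), np.array(rows_q)

K = np.zeros((1, 2))                       # Step 1: stabilizing initial gain
for it in range(10):                       # Steps 2-4
    dxx, ixu, q = collect(K)               # Step 2: data with exploration
    Phi = np.hstack([dxx, -2.0 * ixu])     # unknowns: [vec(P); vec(K_next)]
    theta, *_ = np.linalg.lstsq(Phi, -q, rcond=None)   # Step 3: eq. (20)
    K_next = theta[4:].reshape(1, 2)
    if np.linalg.norm(K_next - K, 2) < 1e-3:           # Step 4: spectral norm
        K = K_next
        break
    K = K_next
print(K)   # Step 5: approximate optimal gain, learned without reading A or B
```

With persistent excitation, the least-squares system standing in for equation (20) has full row rank and the gain iterates converge; this is precisely the role assigned to the random harmonic exploration noise in Note 1.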
In the simulation, the game formation system comprises three drones, each modeled as the single-drone system above, with the parameters shown in Table 1. The validity of the drone system parameters has been verified by real-time experimental results, while the matrices A, B and C are unknown to the formation controller. In addition, a comparison with the traditional optimal control method (LQR) is made. The weight matrices, the initial state of the ideal system, the initial state of the non-zero-sum game formation system, the stabilizing initial feedback control gains and the design parameters of the reinforcement learning algorithm are selected; the exploration noise is selected as random harmonics whose coefficients are random numbers drawn from a fixed interval; finally, a convergence threshold is selected to verify the validity of the proposed game formation controller. [The numerical values of these quantities are given as equation images in the original.]
TABLE 1. Subsystem parameters
[Table 1: image not reproduced]
During learning, the gain estimates converge to their optimal values; the convergence process is shown in FIG. 1. As the figure shows, the proposed reinforcement learning algorithm achieves convergence after 12 iterations. The position responses are shown in FIG. 2, the position errors in FIG. 4, and the attitude responses in FIG. 5, in which the dashed, dash-dotted and solid lines represent drones 1-3 respectively. As these figures show, all subsystems of the three drones converge to their desired values within 5 seconds: the three drones form the required formation pattern in a short time, and the process is stable and fast. Comparing FIG. 4 and FIG. 6, the error system under the game-based control method has a faster convergence rate and smaller overshoot than under the traditional control method. Therefore, the proposed non-zero-sum game controller better solves the formation trajectory tracking problem.
The above description is given for the purpose of illustrating the invention and is not to be construed as limiting the invention, since various modifications and changes will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, characterized by comprising the following specific steps:
S1: establishing an unmanned aerial vehicle dynamic model;
S2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a lateral-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method;
S4: designing a non-zero-sum game formation controller.
2. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 1, characterized in that step S1 comprises the following:
for the i-th unmanned aerial vehicle, a six-degree-of-freedom unmanned aerial vehicle dynamic system is established; the six-degree-of-freedom dynamic system is a multi-input, multi-output system composed of four subsystems, where the input and output of the longitudinal subsystem are denoted $u_{xi}$ and $x_i$, those of the lateral subsystem $u_{yi}$ and $y_i$, those of the vertical subsystem $u_{zi}$ and $z_i$, and those of the yaw subsystem $u_{\psi i}$ and $\psi_i$; $[x_i, y_i, z_i]^T$ represents the position of the unmanned aerial vehicle in the earth-fixed inertial frame, and $[\phi_i, \theta_i, \psi_i]^T$ represents its Euler attitude, where $\phi_i$, $\theta_i$ and $\psi_i$ denote the roll, pitch and yaw angles respectively; the control input $u_i$ of the unmanned aerial vehicle is divided into a local control input $u_i^l$ and a global control input $u_i^g$, i.e. $u_i = u_i^l + u_i^g$; the local input $u_i^l$ is the control input generated from the unmanned aerial vehicle's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other unmanned aerial vehicles; the state vectors of the longitudinal, lateral, vertical and yaw subsystems are defined correspondingly;
the longitudinal-subsystem and lateral-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (1): image not reproduced]
where the feedback coefficient is a constant, the nominal parameters are those of the longitudinal and lateral subsystems, j indexes the other unmanned aerial vehicles, and n denotes the total number of unmanned aerial vehicles; for the longitudinal subsystem, the quantities in equation (1) are the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the longitudinal subsystem of the i-th unmanned aerial vehicle; for the lateral subsystem, they are the corresponding quantities of the lateral subsystem of the i-th unmanned aerial vehicle;
the vertical-subsystem and yaw-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (2): image not reproduced]
where the feedback coefficient is a constant and the nominal parameters are those of the vertical and yaw subsystems; the quantities in equation (2) are, analogously, the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the vertical and yaw subsystems of the i-th unmanned aerial vehicle.
3. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 2, characterized in that the longitudinal-subsystem non-zero-sum game formation model in step S2 is established as follows:
according to the dynamic model of the longitudinal subsystem of the unmanned aerial vehicle, the longitudinal-subsystem non-zero-sum game formation model is established as follows:
the unmanned aerial vehicle set contains n unmanned aerial vehicles in total; letting the stacked subsystem states of the n unmanned aerial vehicles represent the state of the non-zero-sum game formation model, the global dynamic model is obtained as
[Equation (3): image not reproduced]
where $I_n$ is the n-dimensional identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose i-th element is 1 and whose other elements are 0; substituting the known output relation into equation (3) gives
[Equation (4): image not reproduced]
considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
[Equation (5): image not reproduced]
subtracting equation (5) from equation (4) then yields the following error system:
[Equation (6): image not reproduced]
4. the reinforced learning-based non-zero and game unmanned aerial vehicle formation control method according to claim 3, wherein the step S3 comprises the following steps:
the cost function for each drone is defined as follows:
Figure QLYQS_59
(7)
wherein the weight parameter
Figure QLYQS_60
And a weight parameter>
Figure QLYQS_61
Is a symmetric matrix, is>
Figure QLYQS_62
For the initial state of the non-zero and game formation models,twhich represents the starting time of the process,τrepresents the integration time;
designing a non-zero and game formation controller for each drone to track a predetermined trajectory, i.e.
Figure QLYQS_63
and />
Figure QLYQS_64
While minimizingiCost function of unmanned aerial vehicle on shelf->
Figure QLYQS_65
The game optimal controller is designed into
Figure QLYQS_66
(8)
wherein ,
Figure QLYQS_67
represents an optimal feedback control gain;
under the condition of satisfying the formula (6),
Figure QLYQS_68
solving by minimizing a cost function:
Figure QLYQS_69
game feedback controller equation (8) establishes nash equilibrium, a nash control strategy, of the non-zero and game formation control problems when each drone satisfies all of the following inequalities:
Figure QLYQS_70
(9)。
5. the reinforced learning-based non-zero and game unmanned aerial vehicle formation control method according to claim 4, wherein the step S4 comprises the following steps:
for unknown matrix
Figure QLYQS_71
and />
Figure QLYQS_72
Approximately solving a coupled algebra Riccati equation set and a Nash equilibrium solution by using a reinforcement learning algorithm; optimal feedback control gain pass for drone>
Figure QLYQS_73
Obtaining; whereinK i For feedback control of gain, symmetric matrices
Figure QLYQS_74
Solved by the following coupled AREs:
Figure QLYQS_75
(10)
in the formula vIt is shown that the counting variable is,
Figure QLYQS_76
obtaining (10) a numerical solution using a model-based strategy iterative algorithm if the dynamics of the system are known; if the dynamics of the system is unknown, the following strategy is utilized to iterate the reinforcement learning algorithm to approximately solve;
Figure QLYQS_77
(11)
wherein ,krepresenting number of iterationsThe number of the first and second groups is,
Figure QLYQS_78
(12)
from the kronecker product algorithm:
Figure QLYQS_79
(13)
Figure QLYQS_80
(14)
wherein
Figure QLYQS_81
Is an arbitrary column vector and is a linear vector,MandNrepresents any two matrices;
obtained by using the formula (13) and the formula (14),
Figure QLYQS_82
(15)
Figure QLYQS_83
(16)
Figure QLYQS_84
(17)
wherein the general formula of vec () is as follows:
Figure QLYQS_85
,/>
Figure QLYQS_86
is/>
Figure QLYQS_87
A column vector formed by each column element of (a); definition ofColumn vector
Figure QLYQS_88
,/>
Figure QLYQS_89
and />
Figure QLYQS_90
sIs a positive integer as follows: />
Figure QLYQS_91
(18)
wherein ,
Figure QLYQS_92
,/>
Figure QLYQS_93
combinations of (11), (15), (16), (17) and (18), push-out
Figure QLYQS_94
(19)
wherein ,
Figure QLYQS_95
the following linear iterative equation is derived from equation (11):
Figure QLYQS_96
(20)
wherein
Figure QLYQS_97
If it is not
Figure QLYQS_98
If the rank is full, then the only solution of formula (20) is obtained; based on the fact that a random harmonic detection noise is introduced into the learning process>
Figure QLYQS_99
Rank of full rank; when/is>
Figure QLYQS_100
In whichεThe convergence threshold is expressed, and the first is obtained from the formula (20)iFeedback control gain of unmanned aerial vehicle on frame->
Figure QLYQS_101
Simultaneous solution of optimal feedback control gain
Figure QLYQS_102
And the symmetry matrix->
Figure QLYQS_103
Without the need for the system dynamics matrix ≥ in equation (4)>
Figure QLYQS_104
And
Figure QLYQS_105
(ii) a The game-based control protocol is derived from state information and control input information of the system.
6. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 5, characterized in that in step S4: the policy iteration algorithm is a model-free algorithm that does not rely on prior knowledge of the unmanned aerial vehicle system, the specific algorithm being as follows:
Step 1. Initialize the iteration index and select a stabilizing initial feedback control gain for each unmanned aerial vehicle;
Step 2. During the data-collection interval, apply to each unmanned aerial vehicle the control input given by the current feedback gain plus a bounded exploration noise;
Step 3. Solve equation (20) for each unmanned aerial vehicle;
Step 4. Increment the iteration index and return to Step 3 until the spectral norm of the change in the gain matrices falls below the convergence threshold;
Step 5. Obtain the approximate solution of the Nash equilibrium.
CN202310193021.9A 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning Active CN115877871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310193021.9A CN115877871B (en) 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310193021.9A CN115877871B (en) 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115877871A true CN115877871A (en) 2023-03-31
CN115877871B CN115877871B (en) 2023-05-26

Family

ID=85761836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310193021.9A Active CN115877871B (en) 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115877871B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing
CN117420849A (en) * 2023-12-18 2024-01-19 山东科技大学 Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109375514A (en) * 2018-11-30 2019-02-22 沈阳航空航天大学 Optimal tracking controller design method in the presence of false data injection attacks
CN113093538A (en) * 2021-03-18 2021-07-09 长春工业大学 Non-zero-sum game neural-optimal control method of modular robot system
US20210403159A1 (en) * 2018-10-18 2021-12-30 Telefonaktiebolaget Lm Ericsson (Publ) Formation Flight of Unmanned Aerial Vehicles
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN114047758A (en) * 2021-11-08 2022-02-15 南京云智控产业技术研究院有限公司 Q-learning-based multi-mobile-robot formation method
CN114460959A (en) * 2021-12-15 2022-05-10 北京机电工程研究所 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN115562342A (en) * 2022-10-24 2023-01-03 南京航空航天大学 Multi-aircraft task allocation, flight path planning and formation control integrated game method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210403159A1 (en) * 2018-10-18 2021-12-30 Telefonaktiebolaget Lm Ericsson (Publ) Formation Flight of Unmanned Aerial Vehicles
CN109375514A (en) * 2018-11-30 2019-02-22 沈阳航空航天大学 Optimal tracking controller design method in the presence of false data injection attacks
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN113093538A (en) * 2021-03-18 2021-07-09 长春工业大学 Non-zero-sum game neural-optimal control method of modular robot system
CN114047758A (en) * 2021-11-08 2022-02-15 南京云智控产业技术研究院有限公司 Q-learning-based multi-mobile-robot formation method
CN114460959A (en) * 2021-12-15 2022-05-10 北京机电工程研究所 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN115562342A (en) * 2022-10-24 2023-01-03 南京航空航天大学 Multi-aircraft task allocation, flight path planning and formation control integrated game method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
弓镇宇 et al.: "H∞ consensus of multi-agent systems based on a zero-sum game method" *
王醒策 et al.: "Research on reinforcement learning algorithms for dynamic multi-robot formation" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing
CN117420849A (en) * 2023-12-18 2024-01-19 山东科技大学 Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning
CN117420849B (en) * 2023-12-18 2024-03-08 山东科技大学 Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning

Also Published As

Publication number Publication date
CN115877871B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN115877871A (en) Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning
CN113093802B (en) Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
US20220315219A1 (en) Air combat maneuvering method based on parallel self-play
CN109669475A (en) Multiple no-manned plane three-dimensional formation reconfiguration method based on artificial bee colony algorithm
Ali et al. Explicit model following distributed control scheme for formation flying of mini UAVs
CN114020042A (en) Heterogeneous unmanned cluster formation enclosure tracking control method and system
CN110347181B (en) Energy consumption-based distributed formation control method for unmanned aerial vehicles
Yang et al. Distributed optimal consensus with obstacle avoidance algorithm of mixed-order UAVs–USVs–UUVs systems
CN104589349A (en) Combination automatic control method with single-joint manipulator under mixed suspension microgravity environments
Zhou et al. Distributed formation control for multiple quadrotor UAVs under Markovian switching topologies with partially unknown transition rates
CN110825116B (en) Unmanned aerial vehicle formation method based on time-varying network topology
CN115639830B (en) Air-ground intelligent agent cooperative formation control system and formation control method thereof
Cong et al. Formation control for multiquadrotor aircraft: Connectivity preserving and collision avoidance
CN115755956B (en) Knowledge and data collaborative driving unmanned aerial vehicle maneuvering decision method and system
Shen et al. Attitude active disturbance rejection control of the quadrotor and its parameter tuning
CN116974299A (en) Reinforced learning unmanned aerial vehicle track planning method based on delayed experience priority playback mechanism
Fiori et al. Extension of a PID control theory to Lie groups applied to synchronising satellites and drones
CN117055605A (en) Multi-unmanned aerial vehicle attitude control method and system
CN113268084B (en) Intelligent fault-tolerant control method for unmanned aerial vehicle formation
Yang et al. Cooperative group formation control for multiple quadrotors system with finite-and fixed-time convergence
Aruneshwaran et al. Neural adaptive flight controller for ducted-fan UAV performing nonlinear maneuver
Montella et al. Reinforcement learning for autonomous dynamic soaring in shear winds
CN117452975A (en) Security performance cooperative formation control design method for four-rotor unmanned aerial vehicle cluster
CN114995521B (en) Multi-unmanned aerial vehicle distributed formation control method and device and electronic equipment
Razzaghian et al. Robust adaptive neural network control of miniature unmanned helicopter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant