CN115877871A - Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning - Google Patents
Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning
- Publication number: CN115877871A (application CN202310193021.9A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
The invention belongs to the technical field of unmanned aerial vehicle (UAV) control and provides a reinforcement-learning-based non-zero-sum game UAV formation control method comprising the following steps: S1: establish a UAV dynamics model; S2: establish a non-zero-sum game formation model; S3: solve the non-zero-sum game formation model established in step S2 with a reinforcement learning method; S4: design a non-zero-sum game formation controller. The control method ensures that the states of the UAV cluster subsystems converge quickly to their desired values, i.e., the UAV cluster forms the required formation pattern in a short time, and the process is smooth and fast. Meanwhile, the error system adopts a game-based control method that, compared with a traditional control method, yields a faster convergence rate and smaller overshoot. The proposed non-zero-sum game controller therefore better solves the formation trajectory-tracking problem.
Description
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle control, and in particular relates to a reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method.
Background
Unmanned aerial vehicle (UAV) formations are gaining attention across many research fields because, compared to a single UAV, they offer advantages such as low cost, high maneuverability, and good adaptability in typical applications such as heavy transportation, wide-area search missions, and large-scale scientific observation. Traditional formation control methods include the leader-follower method, behavior-based methods, the virtual structure method, and the artificial potential field method. In the leader-follower method, only the behavior of the leader is prescribed during formation flight; the followers automatically maintain their relative positions to the leader through information interaction, so the formation-keeping task of the whole team is completed. The distributed leader-follower method has a simple and clear control structure in practical applications, and each UAV needs only the state information of its neighbors and itself, so the demand on onboard communication hardware is low and the cooperation problem among formation members is greatly simplified; it is therefore widely applied to robot, UAV, and missile formations. In recent years, game theory has attracted extensive attention in the field of robot formation; for example, the equilibrium solution of a differential game can be used as the formation control strategy to solve the formation control problem effectively. In fact, the UAV formation control problem can be expressed as a multi-player differential game.
When the dynamic parameters of the UAVs cannot be obtained accurately, or constant disturbances act on the formation, the Nash equilibrium solution of the game formation problem is difficult to compute, and hence the Nash-equilibrium optimal formation control law needed to achieve the desired formation cannot be obtained directly. This problem can be solved by exploiting the adaptive-learning ability of reinforcement learning: unknown parameters of the UAV formation system are identified intelligently, and the optimal controller is learned from state data.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method that uses the adaptive-learning capability of reinforcement learning to intelligently identify unknown parameters of the UAV formation system and learn the optimal controller from state data.
The technical scheme of the invention is as follows:
a reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method comprises the following steps:
s1: establish a dynamics model of the unmanned aerial vehicle;
s2: establish a non-zero-sum game formation model, comprising a longitudinal-subsystem, a transverse-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
s3: solve the non-zero-sum game formation model established in step S2 with a reinforcement learning method;
s4: design a non-zero-sum game formation controller.
Preferably, the step S1 comprises the following steps:

for the $i$-th unmanned aerial vehicle, a six-degree-of-freedom UAV dynamic system is established; the system is a multi-input multi-output system consisting of four subsystems, wherein the input and output of the longitudinal subsystem are defined as $u_{i,x}$ and $x_i$, the input and output of the transverse subsystem as $u_{i,y}$ and $y_i$, the input and output of the vertical subsystem as $u_{i,z}$ and $z_i$, and the input and output of the yaw subsystem as $u_{i,\psi}$ and $\psi_i$;

$P_i=(x_i,y_i,z_i)$ represents the position of the UAV in the earth-fixed inertial frame and $\Theta_i=(\phi_i,\theta_i,\psi_i)$ the Euler attitude angles, where $\phi_i$, $\theta_i$, $\psi_i$ respectively represent the roll, pitch, and yaw angle; $u_i$ represents the control input of the UAV, divided into a local control input $u_i^d$ and a global control input $u_i^g$, i.e. $u_i = u_i^d + u_i^g$; the local input $u_i^d$ is the control input generated from the UAV's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other UAVs; the state vectors of the longitudinal, transverse, vertical, and yaw subsystems are respectively defined as $X_{i,x}$, $X_{i,y}$, $X_{i,z}$, and $X_{i,\psi}$;

the longitudinal-subsystem and transverse-subsystem dynamics models of the UAV are established in the linear state-space form

$$\dot{X}_{i,\sigma} = A_\sigma X_{i,\sigma} + B_\sigma\big(u^d_{i,\sigma} + u^g_{i,\sigma}\big), \qquad \sigma \in \{x, y\},$$

where $k$ is a constant feedback coefficient, $A_x$, $B_x$ and $A_y$, $B_y$ are the nominal parameters of the longitudinal and transverse subsystems, $j$ denotes the index of another UAV, and $n$ denotes the total number of UAVs; for $\sigma = x$ (respectively $\sigma = y$), $X_{i,\sigma}$, $u^d_{i,\sigma}$, $u^g_{i,\sigma}$, and the outputs denote the state vector, local control input, global control input, own output, and other-UAV outputs of the longitudinal (respectively transverse) subsystem of the $i$-th UAV;

the vertical-subsystem and yaw-subsystem dynamics models of the UAV are established in the same form,

$$\dot{X}_{i,\sigma} = A_\sigma X_{i,\sigma} + B_\sigma\big(u^d_{i,\sigma} + u^g_{i,\sigma}\big), \qquad \sigma \in \{z, \psi\},$$

where $A_z$, $B_z$ and $A_\psi$, $B_\psi$ are the nominal parameters of the vertical and yaw subsystems, and the corresponding quantities denote the state vector, local control input, global control input, own output, and other-UAV outputs of each subsystem.
Preferably, the process of establishing the longitudinal-subsystem non-zero-sum game formation model in step S2 is as follows:

according to the longitudinal-subsystem dynamics model of the unmanned aerial vehicle, the longitudinal-subsystem non-zero-sum game formation model is established as follows:

consider the UAV set $V=\{1,\dots,n\}$ with $n$ UAVs in total, and let $X$ denote the state of the non-zero-sum game formation model; the global dynamics model is obtained as

$$\dot{X} = (I_n \otimes A)X + \sum_{i=1}^{n}(e_i \otimes B)\big(u_i^d + u_i^g\big),$$

where $I_n$ is the $n \times n$ identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose $i$-th element is 1 and whose other elements are 0; since $u_i = u_i^d + u_i^g$, substitution into the above formula gives

$$\dot{X} = (I_n \otimes A)X + \sum_{i=1}^{n}(e_i \otimes B)u_i; \qquad (4)$$

considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as

$$\dot{X}^{*} = (I_n \otimes A)X^{*}, \qquad (5)$$

where $X^{*}$ is the ideal state; letting $e = X - X^{*}$, the following error system is obtained by subtracting equation (5) from equation (4):

$$\dot{e} = (I_n \otimes A)e + \sum_{i=1}^{n}(e_i \otimes B)u_i; \qquad (6)$$

the non-zero-sum game formation models of the transverse, vertical, and yaw subsystems are established by the same method.
Preferably, the step S3 specifically comprises the following steps:

the cost function of each drone is defined as

$$J_i = \int_t^{\infty}\Big(e^{\mathrm T}Q_i e + \sum_{j=1}^{n}u_j^{\mathrm T}R_{ij}u_j\Big)\,\mathrm{d}\tau, \qquad (7)$$

where the weight parameters $Q_i$ and $R_{ij}$ are symmetric matrices, $e(t)$ is the initial state of the non-zero-sum game formation model, $t$ represents the starting time, and $\tau$ represents the integration time;

a non-zero-sum game formation controller is designed for each drone so that the formation tracks the predetermined trajectory, i.e. $e \to 0$ as $t \to \infty$, while minimizing the cost function $J_i$ of the $i$-th drone;

the optimal game controller is designed as

$$u_i^{*} = -K_i e; \qquad (8)$$

the game feedback controller of equation (8) establishes the Nash equilibrium of the non-zero-sum game formation control problem, i.e. a Nash control strategy, when every drone satisfies all of the following inequalities:

$$J_i(u_1^{*},\dots,u_i^{*},\dots,u_n^{*}) \le J_i(u_1^{*},\dots,u_i,\dots,u_n^{*}), \qquad i = 1,\dots,n. \qquad (9)$$
preferably, the step S4 specifically comprises the following steps:
for unknown matrix and />Approximately solving a coupled algebra Riccati equation set and a Nash equilibrium solution by using a reinforcement learning algorithm; optimal feedback control gain pass for drone>Obtaining; whereinK i For feedback control of the gain, the symmetric matrix->Solved by the following coupled AREs:
in the formula vIt is shown that the counting variable is,if the dynamics of the system are known, obtaining a numerical solution by using a strategy iterative algorithm based on a model; if the dynamics of the system is unknown, the following strategy is utilized to iterate the reinforcement learning algorithm to approximately solve;
wherein ,kthe number of iterations is indicated and,
from the kronecker product algorithm:
obtained by using the formula (13) and the formula (14),
wherein, vec () has the following general formula:
,/>is->A column vector formed by each column element of (a); defining a column vector,/> and />,sIs a positive integer as follows:
combinations (11), (15), (16), (17) and (18) of push-out
the following linear iterative equation is derived from equation (11):
wherein
If it is notIf the rank is full, then a unique solution of equation (20) is obtained; by introducing an appropriate random harmonic detection noise in the learning process, ≥ is set>Rank of full row; when/is>In whichεThe convergence threshold is expressed, and the first is obtained from the formula (20)iFeedback control gain->;
At the same timeSolving optimal feedback control gainsAnd the symmetry matrix->Without the need for the system dynamic matrix in equation (4) and />(ii) a The game-based control protocol is derived from state information and control input information of the system.
Preferably, in step S4: the policy iteration algorithm is a model-free algorithm that does not rely on prior knowledge of the UAV system, and the specific algorithm comprises:

Step 2. During the data-collection interval, the control input of the drone is the current feedback control plus $\xi_i$, where $\xi_i$ is a bounded exploration noise;
The invention provides a reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method with the following advantages:

the control method ensures that the states of the UAV cluster subsystems converge quickly to their desired values, i.e., the UAV cluster can form the required formation pattern in a short time, and the process is smooth and fast. Meanwhile, the error system adopts a game-based control method that, compared with a traditional control method, yields a faster convergence rate and smaller overshoot. The proposed non-zero-sum game controller therefore better solves the formation trajectory-tracking problem.
Drawings
To describe the embodiments of the invention or prior-art solutions more clearly, the drawings needed in the embodiments are briefly described below. The drawings are schematic and should not be understood as limiting the invention in any way; other drawings may be obtained from them by those skilled in the art without inventive effort. In the drawings:
fig. 1 is a convergence diagram of the learning process;
fig. 2 is a diagram of drone positions;
fig. 3 is a drone position response diagram (3D);
figure 4 is a drone position error map;
FIG. 5 is a response plot of the attitude angle of the drone;
fig. 6 is a drone position response diagram (LQR).
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention; however, the invention may be practiced in ways other than those specifically described herein, and therefore its scope is not limited to the specific embodiments disclosed below.
A reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method comprises the following steps:
s1: establish an unmanned aerial vehicle dynamics model;
s2: establish a non-zero-sum game formation model, comprising a longitudinal-subsystem, a transverse-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
s3: solve the non-zero-sum game formation model established in step S2 with a reinforcement learning method;
s4: design a non-zero-sum game formation controller.
1. Unmanned aerial vehicle model
The six-degree-of-freedom UAV dynamic system can be divided into four subsystems: a longitudinal subsystem, a transverse subsystem, a vertical subsystem, and a yaw subsystem. $P_i=(x_i,y_i,z_i)$ denotes the position of the $i$-th UAV in the earth-fixed inertial frame, and $\Theta_i=(\phi_i,\theta_i,\psi_i)$ its Euler attitude angles, where $\phi_i$, $\theta_i$, $\psi_i$ respectively denote roll, pitch, and yaw. Let $u_i$ denote the control input of the $i$-th UAV, whose four components are the control inputs of the four subsystems. The control input can be divided into a local control input $u_i^d$ and a global control input $u_i^g$, i.e. $u_i = u_i^d + u_i^g$. The local input $u_i^d$ is the control input generated from the UAV's own output; the global input $u_i^g$ is generated from the outputs of the other UAVs. Therefore, the state vectors of the longitudinal, transverse, vertical, and yaw subsystems are respectively defined as $X_{i,x}$, $X_{i,y}$, $X_{i,z}$, and $X_{i,\psi}$. The longitudinal-subsystem and transverse-subsystem dynamics models are of the linear state-space form

$$\dot{X}_{i,\sigma} = A_\sigma X_{i,\sigma} + B_\sigma\big(u^d_{i,\sigma} + u^g_{i,\sigma}\big), \quad \text{output} = C_\sigma X_{i,\sigma}, \qquad \sigma \in \{x, y\}, \qquad (1)$$

where $k$ is a constant feedback coefficient, $A_x$, $B_x$ and $A_y$, $B_y$ are the nominal parameters of the longitudinal and transverse subsystems, $j$ denotes the index of another UAV, and $n$ denotes the total number of UAVs; for $\sigma = x$ (respectively $\sigma = y$), $X_{i,\sigma}$, $u^d_{i,\sigma}$, $u^g_{i,\sigma}$, and the outputs denote the state vector, local control input, global control input, own output, and other-UAV outputs of the longitudinal (respectively transverse) subsystem of the $i$-th UAV.

The vertical-subsystem and yaw-subsystem dynamics models are of the same form,

$$\dot{X}_{i,\sigma} = A_\sigma X_{i,\sigma} + B_\sigma\big(u^d_{i,\sigma} + u^g_{i,\sigma}\big), \quad \text{output} = C_\sigma X_{i,\sigma}, \qquad \sigma \in \{z, \psi\}, \qquad (2)$$

where $A_z$, $B_z$ and $A_\psi$, $B_\psi$ are the nominal parameters of the vertical and yaw subsystems. As can be seen from equations (1) and (2), the UAV system is a multi-input multi-output system consisting of four subsystems, with four control inputs and four outputs $(x_i, y_i, z_i, \psi_i)$.
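As a concrete illustration of the subsystem state-space form described above, one linear subsystem can be simulated directly. The matrices A, B, C and the output-feedback gain below are illustrative placeholders, not the patent's nominal UAV parameters (which are given only in the original figures):

```python
import numpy as np

# Illustrative placeholder subsystem (NOT the patent's nominal parameters):
# x' = A x + B (u_local + u_global),  output = C x
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

def step(x, u_local, u_global, dt=0.01):
    """One forward-Euler integration step of the subsystem dynamics."""
    return x + dt * (A @ x + B @ (u_local + u_global))

x = np.array([[1.0], [0.0]])      # initial subsystem state
for _ in range(1000):             # simulate 10 s
    u_local = -2.0 * (C @ x)      # local input from the UAV's own output
    u_global = np.zeros((1, 1))   # no inter-UAV coupling in this sketch
    x = step(x, u_local, u_global)
y = C @ x                         # output has decayed toward zero
```

With the assumed output feedback the closed loop is stable, so the output decays to zero; in the patent's setting, the global input would instead be built from the other UAVs' outputs.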
2. Non-zero-sum game formation model
A non-zero-sum game formation model is established by taking the longitudinal subsystem as an example; the models of the other subsystems can be established by the same method. Consider the UAV set $V=\{1,\dots,n\}$ and let $X$ denote the state of the non-zero-sum game formation model. Then the global dynamics model can be obtained as

$$\dot{X} = (I_n \otimes A)X + \sum_{i=1}^{n}(e_i \otimes B)\big(u_i^d + u_i^g\big), \qquad (3)$$

where $I_n$ is the $n \times n$ identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose $i$-th element is 1 and whose other elements are 0. Since $u_i = u_i^d + u_i^g$, substitution into (3) gives

$$\dot{X} = (I_n \otimes A)X + \sum_{i=1}^{n}(e_i \otimes B)u_i. \qquad (4)$$

Considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as

$$\dot{X}^{*} = (I_n \otimes A)X^{*}, \qquad (5)$$

where $X^{*}$ is the ideal state. Letting $e = X - X^{*}$, the following error system is obtained by subtracting equation (5) from equation (4):

$$\dot{e} = (I_n \otimes A)e + \sum_{i=1}^{n}(e_i \otimes B)u_i. \qquad (6)$$

As can be seen from equation (6), the state of the formation subsystem is influenced by the control inputs of all the UAVs; this means that both cooperation and conflict exist within the multi-UAV system, so the global system can be studied within the scope of differential game theory.
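The Kronecker-product stacking used for the global model can be checked numerically: block i of the global derivative must equal the i-th individual subsystem's dynamics. The matrices and the number of UAVs below are illustrative assumptions:

```python
import numpy as np

n = 3                                       # number of UAVs (illustrative)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])    # assumed subsystem matrices
B = np.array([[0.0], [1.0]])

A_glob = np.kron(np.eye(n), A)              # (I_n ⊗ A): block-diagonal

def e(i):
    """Standard basis column vector e_i of length n."""
    v = np.zeros((n, 1))
    v[i] = 1.0
    return v

rng = np.random.default_rng(0)
X = rng.standard_normal((2 * n, 1))         # stacked global state
u = [rng.standard_normal((1, 1)) for _ in range(n)]

# Global dynamics: X' = (I_n ⊗ A) X + Σ_i (e_i ⊗ B) u_i
Xdot = A_glob @ X + sum(np.kron(e(i), B) @ u[i] for i in range(n))

# Block i of the global derivative equals the i-th subsystem's dynamics
for i in range(n):
    xi = X[2 * i:2 * i + 2]
    assert np.allclose(Xdot[2 * i:2 * i + 2], A @ xi + B @ u[i])
```

The `e_i ⊗ B` factor routes UAV i's input into its own block only, which is why the global model decomposes exactly into the individual subsystem models.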
3. Solving the non-zero-sum game formation control problem
The cost function of each drone is defined as

$$J_i = \int_t^{\infty}\Big(e^{\mathrm T}Q_i e + \sum_{j=1}^{n}u_j^{\mathrm T}R_{ij}u_j\Big)\,\mathrm{d}\tau, \qquad (7)$$

where the weight parameters $Q_i$ and $R_{ij}$ are symmetric weight matrices, $e(t)$ is the initial state of the non-zero-sum game formation model, $t$ represents the starting time, and $\tau$ represents the integration time. The aim of the invention is to design a non-zero-sum game formation controller for each drone so that the formation tracks the predetermined trajectory, i.e. $e \to 0$ as $t \to \infty$, while minimizing the cost function of the $i$-th drone. The game controller is designed as $u_i = -K_i e$.

Minimizing the cost function subject to equation (6) gives

$$u_i^{*} = -K_i e = -R_{ii}^{-1}(e_i \otimes B)^{\mathrm T}P_i\,e. \qquad (8)$$

The game feedback controller of equation (8) establishes the Nash equilibrium of the non-zero-sum game formation control problem, i.e. a Nash control strategy, when every drone satisfies all of the following inequalities:

$$J_i(u_1^{*},\dots,u_i^{*},\dots,u_n^{*}) \le J_i(u_1^{*},\dots,u_i,\dots,u_n^{*}), \qquad i = 1,\dots,n. \qquad (9)$$

As equation (9) shows, when the other drones maintain their Nash control strategies, no participant can reduce its cost by deviating from the Nash equilibrium; this means the Nash equilibrium forces each participant to maintain its Nash control strategy.
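Assuming the standard linear-quadratic game form of the integrand, e^T Q_i e + Σ_j u_j^T R_ij u_j (the exact integrand is given only in the original figures, so this form is a hedged assumption), the cost J_i can be approximated from sampled trajectories; the weights and trajectories below are illustrative:

```python
import numpy as np

Q = np.diag([10.0, 1.0])                     # assumed symmetric weight Q_i
R = [np.array([[1.0]]), np.array([[0.5]])]   # assumed weights R_ij, two players

def cost(e_traj, u_trajs, dt):
    """Riemann approximation of J_i = ∫ (e^T Q e + Σ_j u_j^T R_ij u_j) dτ."""
    J = 0.0
    for k, ek in enumerate(e_traj):
        J += (ek.T @ Q @ ek).item() * dt
        for Rj, uj in zip(R, u_trajs):
            J += (uj[k].T @ Rj @ uj[k]).item() * dt
    return J

dt = 0.01
ts = np.arange(0.0, 5.0, dt)
e_traj = [np.array([[np.exp(-t)], [0.0]]) for t in ts]        # decaying error
u_trajs = [[np.zeros((1, 1)) for _ in ts] for _ in range(2)]  # zero inputs

J = cost(e_traj, u_trajs, dt)   # analytically ≈ ∫_0^5 10·e^{-2τ} dτ ≈ 5.0
```

A faster-decaying error trajectory or smaller inputs lower the cost, which is what each player's controller trades off against the others in the game.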
4. Designing the controller:

In this part, the non-zero-sum game formation controller is designed based on game theory and reinforcement learning theory. For the unknown matrices $A$ and $B$, the coupled Algebraic Riccati Equations (AREs) and the Nash equilibrium solution are solved approximately with a reinforcement learning algorithm. According to optimal control theory and game theory, the stabilizing feedback control gain of the drone can be obtained through $u_i^{*} = -K_i e$, where the symmetric matrix $P_i$ can be solved from the following coupled AREs:

$$\Big((I_n \otimes A) - \sum_{j=1}^{n}(e_j \otimes B)K_j\Big)^{\mathrm T}P_i + P_i\Big((I_n \otimes A) - \sum_{j=1}^{n}(e_j \otimes B)K_j\Big) + Q_i + \sum_{j=1}^{n}K_j^{\mathrm T}R_{ij}K_j = 0, \qquad (10)$$

in which $v$ denotes a counting variable. An analytical solution of the nonlinear equation (10) is difficult to obtain directly. Therefore, if the dynamics of the system are known, a model-based policy iteration algorithm can be used to obtain a numerical solution of equation (10); if the dynamics of the system are unknown, the following policy-iteration reinforcement learning algorithm can be used to solve (10) approximately.
From the Kronecker-product identity of formulas (13) and (14),

$$\operatorname{vec}(AXB) = (B^{\mathrm T}\otimes A)\operatorname{vec}(X),$$

equation (15) is obtained, where the operator $\operatorname{vec}(M)$ denotes the column vector formed by stacking the columns of the matrix $M$. Column vectors are then defined in equations (16)-(18), with $s$ a positive integer.
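The identity behind this step, vec(AXB) = (Bᵀ ⊗ A)·vec(X) with vec(·) stacking columns, can be verified numerically:

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single column vector (column-major)."""
    return M.reshape(-1, 1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

lhs = vec(A @ X @ B)                 # vec(A X B)
rhs = np.kron(B.T, A) @ vec(X)       # (B^T ⊗ A) vec(X)
assert np.allclose(lhs, rhs)
```

This is how unknowns that enter a matrix equation bilinearly can be rearranged into one long column vector, so that the iterative equation becomes linear in them.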
Combining (11) and (15)-(18) yields equation (19), and the following linear iterative equation can be derived from equation (11):

$$\Theta_i^{k}\begin{bmatrix}\operatorname{vec}(P_i^{k})\\ \operatorname{vec}(K_i^{k+1})\end{bmatrix} = \Xi_i^{k}, \qquad (20)$$

where $\Theta_i^{k}$ and $\Xi_i^{k}$ are data matrices built from the measured states and control inputs.

Note 1: If $\Theta_i^{k}$ has full column rank, a unique solution of equation (20) can be obtained. By introducing appropriate random harmonic detection noise in the learning process, $\Theta_i^{k}$ can be made full column rank. When $\lVert P_i^{k} - P_i^{k-1}\rVert \le \varepsilon$, where $\varepsilon$ represents the convergence threshold, the feedback control matrix $K_i$ of the $i$-th drone can be obtained from equation (20).

Note 2: Equation (20) simultaneously solves the optimal feedback control gain $K_i$ and the symmetric matrix $P_i$ without requiring the system dynamics matrices $A$ and $B$ of equation (4). The game-based control protocol can be derived from the state information and control-input information of the system. Therefore, the proposed policy iteration algorithm is a model-free algorithm that does not rely on prior knowledge of the drone system. The specific algorithm is as follows:
Step 2. During the data-collection interval, the control input of the drone is the current feedback control plus $\xi_i$, where $\xi_i$ is a bounded exploration noise;
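The data-collection and solving steps above reduce to a linear least-squares problem. A generic sketch of the rank condition and its solution follows; the data matrix here is randomly generated and its dimensions are hypothetical, standing in for states and inputs recorded under exploration noise:

```python
import numpy as np

rng = np.random.default_rng(1)

s, p = 200, 9                         # s data windows, p unknown entries (s >= p)
Theta = rng.standard_normal((s, p))   # stands in for the measured-data matrix
w_true = rng.standard_normal((p, 1))  # stacked unknowns, e.g. vec(P) and vec(K)
b = Theta @ w_true                    # right-hand side built from the same data

# Full column rank (ensured in practice by the exploration noise)
# guarantees a unique least-squares solution.
assert np.linalg.matrix_rank(Theta) == p

w, *_ = np.linalg.lstsq(Theta, b, rcond=None)
```

Without sufficiently rich excitation the data matrix loses column rank and the unknowns are no longer identifiable, which is exactly why the bounded exploration noise is injected during learning.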
In the simulation, the game formation system comprises three drones, all modeled according to the single-UAV system, with the parameters shown in Table 1. The validity of the UAV system parameters has been verified by real-time experimental results, while the matrices $A$, $B$, and $C$ are unknown to the formation controller. Furthermore, a comparison with the traditional optimal control method (LQR) is made. The weight matrices $Q_i$ and $R_{ij}$, the initial states of the ideal system and of the non-zero-sum game formation system, the stabilizing feedback control gains, and the design parameters of the reinforcement learning algorithm are selected accordingly. The exploration noises are selected as bounded signals whose random coefficients are drawn from a fixed interval. Finally, a convergence threshold $\varepsilon$ is selected to verify the validity of the proposed game formation controller.
TABLE 1 subsystem parameters
During the learning process, $P_i^{k}$ converges to its optimal value $P_i^{*}$; the convergence process is shown in fig. 1. As the figure shows, the proposed reinforcement learning algorithm achieves convergence after 12 iterations. The position responses are shown in fig. 2, the position errors in fig. 4, and the attitude responses in fig. 5, in which the dashed, dash-dotted, and solid lines respectively represent drones 1-3. As can be seen from these figures, all subsystems of the three drones converge to their desired values within 5 seconds: the three drones form the required formation pattern in a short time, and the process is smooth and fast. Comparing fig. 4 and fig. 6, the error system using the game-based control method has a faster convergence rate and smaller overshoot than the traditional control method. The proposed non-zero-sum game controller therefore better solves the formation trajectory-tracking problem.
The above description is intended to illustrate the invention, not to limit it; various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in its protection scope.
Claims (6)
1. A reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method, characterized by comprising the following steps:
s1: establishing an unmanned aerial vehicle dynamics model;
s2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a transverse-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
s3: solving the non-zero-sum game formation model established in step S2 with a reinforcement learning method;
s4: designing a non-zero-sum game formation controller.
2. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 1, wherein step S1 comprises the following steps:

for the $i$-th unmanned aerial vehicle, establishing a six-degree-of-freedom UAV dynamic system, the system being a multi-input multi-output system consisting of four subsystems, wherein the input and output of the longitudinal subsystem are respectively defined as $u_{i,x}$ and $x_i$, the input and output of the transverse subsystem as $u_{i,y}$ and $y_i$, the input and output of the vertical subsystem as $u_{i,z}$ and $z_i$, and the input and output of the yaw subsystem as $u_{i,\psi}$ and $\psi_i$;

$P_i=(x_i,y_i,z_i)$ represents the position of the UAV in the earth-fixed inertial frame and $\Theta_i=(\phi_i,\theta_i,\psi_i)$ the Euler attitude angles, where $\phi_i$, $\theta_i$, $\psi_i$ respectively represent the roll, pitch, and yaw angle; $u_i$ represents the control input of the UAV, divided into a local control input $u_i^d$ and a global control input $u_i^g$, i.e. $u_i = u_i^d + u_i^g$; the local input $u_i^d$ is the control input generated from the UAV's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other UAVs; the state vectors of the longitudinal, transverse, vertical, and yaw subsystems are respectively defined as $X_{i,x}$, $X_{i,y}$, $X_{i,z}$, and $X_{i,\psi}$;

establishing the longitudinal-subsystem and transverse-subsystem dynamics models of the UAV as

$$\dot{X}_{i,\sigma} = A_\sigma X_{i,\sigma} + B_\sigma\big(u^d_{i,\sigma} + u^g_{i,\sigma}\big), \qquad \sigma \in \{x, y\},$$

where $k$ is a constant feedback coefficient, $A_x$, $B_x$ and $A_y$, $B_y$ are the nominal parameters of the longitudinal and transverse subsystems, $j$ denotes the index of another UAV, and $n$ denotes the total number of UAVs; for $\sigma = x$ (respectively $\sigma = y$), $X_{i,\sigma}$, $u^d_{i,\sigma}$, $u^g_{i,\sigma}$, and the outputs denote the state vector, local control input, global control input, own output, and other-UAV outputs of the longitudinal (respectively transverse) subsystem of the $i$-th UAV;

establishing the vertical-subsystem and yaw-subsystem dynamics models of the UAV as

$$\dot{X}_{i,\sigma} = A_\sigma X_{i,\sigma} + B_\sigma\big(u^d_{i,\sigma} + u^g_{i,\sigma}\big), \qquad \sigma \in \{z, \psi\},$$

where $A_z$, $B_z$ and $A_\psi$, $B_\psi$ are the nominal parameters of the vertical and yaw subsystems, and the corresponding quantities denote the state vector, local control input, global control input, own output, and other-UAV outputs of each subsystem.
3. The reinforced learning-based non-zero and game unmanned aerial vehicle formation control method according to claim 2, wherein the process of establishing the vertical subsystem non-zero and game formation model in step S2 is as follows:
according to the dynamics model of the longitudinal subsystem of the unmanned aerial vehicle, a non-zero game formation model of the longitudinal subsystem is established, and the steps are as follows:
let the drone set contain n drones in total, and let the stacked vector denote the state of the nonzero-sum game formation model; the global dynamic model is then obtained as:
wherein the identity matrix appears, ⊗ denotes the Kronecker product, and the basis column vector has its i-th element equal to 1 and all other elements equal to 0; it is known that
Substituting into formula (3) gives
considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
wherein the ideal state and output are defined accordingly; letting the tracking error be the difference between the actual and ideal states, the following error system is obtained by subtracting equation (5) from equation (4):
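The Kronecker-product construction of the global model and the subtraction that yields the error system can be sketched in numpy; the per-drone matrices and states below are illustrative placeholders, not the patent's nominal parameters:

```python
import numpy as np

# Illustrative per-drone dynamics (assumed values, not the patent's).
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])      # local state matrix
B = np.array([[0.0],
              [1.0]])             # local input matrix

n = 3                             # number of drones in the formation

# Global model via the Kronecker product: I_n ⊗ A places a copy of the
# local dynamics on each diagonal block of the stacked system.
A_global = np.kron(np.eye(n), A)  # shape (2n, 2n)
B_global = np.kron(np.eye(n), B)  # shape (2n, n)

def basis(i, n):
    """Column vector whose i-th element is 1 and all others are 0."""
    v = np.zeros((n, 1))
    v[i] = 1.0
    return v

# Error system: subtract the virtual leader's zero-input ideal
# trajectory from the actual stacked state.
x = np.ones((2 * n, 1))           # actual global state (example)
x_ideal = np.zeros((2 * n, 1))    # ideal zero-input trajectory
error = x - x_ideal
```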
4. The reinforcement-learning-based nonzero-sum game unmanned aerial vehicle formation control method according to claim 3, wherein step S3 comprises the following steps:
the cost function for each drone is defined as follows:
wherein the weight parameters are symmetric matrices; the initial state of the nonzero-sum game formation model is given; t denotes the starting time, and τ denotes the integration variable;
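A quadratic cost of this form can be evaluated numerically by integrating the closed-loop error; the dynamics, gain, and symmetric weights Q, R below are illustrative placeholders, not the patent's values:

```python
import numpy as np

# Placeholder open-loop dynamics, gain, and symmetric weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 2.0]])        # stabilizing feedback gain, u = -K e
Q = np.eye(2)                     # symmetric state-weight matrix
R = np.eye(1)                     # symmetric input-weight matrix

def cost(e0, T=40.0, dt=1e-3):
    """Euler approximation of J = ∫ (eᵀQe + uᵀRu) dτ over [t, t+T]."""
    e, J = np.array(e0, dtype=float).reshape(2, 1), 0.0
    for _ in range(int(T / dt)):
        u = -K @ e
        J += float(e.T @ Q @ e + u.T @ R @ u) * dt
        e = e + (A @ e + B @ u) * dt   # propagate the closed loop
    return J

J = cost([1.0, 0.0])
```

For these placeholder matrices the exact cost equals e0ᵀPe0 with P solving the closed-loop Lyapunov equation, which the Euler sum approximates.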
a nonzero-sum game formation controller is designed for each drone so that it tracks the predetermined trajectory while minimizing the cost function of the i-th drone;
The game-optimal controller is designed as
the game feedback controller of equation (8) constitutes a Nash equilibrium (Nash control strategy) of the nonzero-sum game formation control problem when every drone satisfies all of the following inequalities:
5. The reinforcement-learning-based nonzero-sum game unmanned aerial vehicle formation control method according to claim 4, wherein step S4 comprises the following steps:
for the unknown system matrices, the set of coupled algebraic Riccati equations (AREs) and the Nash equilibrium solution are solved approximately by a reinforcement learning algorithm; the optimal feedback control gain of each drone is obtained from the solution, wherein K_i is the feedback control gain; the symmetric matrices are solved from the following coupled AREs:
wherein v denotes a counting variable; if the system dynamics are known, a numerical solution of (10) is obtained using a model-based policy iteration algorithm; if the system dynamics are unknown, the following policy-iteration reinforcement learning algorithm is used to solve approximately:
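The model-based policy iteration named above can be sketched for a single agent (the coupled multi-drone AREs iterate the same evaluation/improvement loop per drone); the matrices A, B, Q, R are illustrative placeholders:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Placeholder single-agent dynamics and symmetric weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

def policy_iteration(A, B, Q, R, K0, eps=1e-10, max_iter=100):
    """Alternate policy evaluation (a Lyapunov equation) and policy
    improvement (K = R^{-1} Bᵀ P) until the value matrix converges."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        Acl = A - B @ K
        # Evaluate the current policy: Aclᵀ P + P Acl + Q + Kᵀ R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        if P_prev is not None and np.linalg.norm(P - P_prev) < eps:
            break
        P_prev = P
        K = np.linalg.solve(R, B.T @ P)   # improve the policy
    return P, K

# Any initial stabilizing gain works; this one stabilizes A - B K0.
P, K = policy_iteration(A, B, Q, R, K0=np.array([[1.0, 1.0]]))
```

The converged P solves the corresponding single ARE, which is why the iteration can replace a direct Riccati solve when the dynamics are known.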
wherein k denotes the iteration index,
from the Kronecker product identities:
using formula (13) and formula (14), one obtains
wherein the vec(·) operator is defined as follows:
that is, vec(·) stacks the columns of a matrix into a single column vector; the following column vectors are defined, where s is a positive integer:
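The vec(·) operator and the identity vec(AXB) = (Bᵀ ⊗ A)vec(X) are what turn the matrix equations into the linear system solved later; a minimal numpy check with arbitrary sizes:

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single column vector."""
    return M.reshape(-1, 1, order="F")

# Random matrices of compatible (arbitrary) sizes.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
X = rng.standard_normal((2, 4))
B = rng.standard_normal((4, 5))

# vec(A X B) equals (Bᵀ ⊗ A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
```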
combining (11), (15), (16), (17) and (18) yields
the following linear iterative equation is derived from equation (11):
wherein
if the data matrix has full rank, the unique solution of formula (20) is obtained; a random probing noise is introduced into the learning process to guarantee the full-rank condition; when the iteration has converged, wherein ε denotes the convergence threshold, the feedback control gain of the i-th drone is obtained from formula (20);
6. The reinforcement-learning-based nonzero-sum game unmanned aerial vehicle formation control method according to claim 5, wherein in step S4 the policy iteration algorithm is a model-free algorithm that does not depend on prior knowledge of the unmanned aerial vehicle system; the specific algorithm is as follows:
Step 2: during the data-collection interval, the control input of each drone is the current control policy plus a bounded exploration noise;
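A common choice of bounded exploration noise in this literature is a fixed sum of sinusoids; the amplitudes, frequencies, and gain below are assumptions for illustration, not values from the patent:

```python
import numpy as np

def exploration_noise(t, n_freq=10, amp=0.01, seed=0):
    """Bounded probing noise: |ξ(t)| ≤ amp * n_freq for every t."""
    rng = np.random.default_rng(seed)      # fixed seed → one fixed signal
    freqs = rng.uniform(0.1, 10.0, n_freq)
    phases = rng.uniform(0.0, 2 * np.pi, n_freq)
    return amp * float(np.sum(np.sin(freqs * t + phases)))

# Control input during learning: current feedback policy plus the noise.
K = np.array([[1.0, 2.0]])                 # current gain (placeholder)
def learning_input(x, t):
    return float(-K @ x) + exploration_noise(t)
```

Fixing the seed makes the noise a deterministic function of time, so the same signal is applied throughout one data-collection interval.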
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310193021.9A CN115877871B (en) | 2023-03-03 | 2023-03-03 | Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115877871A true CN115877871A (en) | 2023-03-31 |
CN115877871B CN115877871B (en) | 2023-05-26 |
Family
ID=85761836
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310193021.9A Active CN115877871B (en) | 2023-03-03 | 2023-03-03 | Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115877871B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109375514A (en) * | 2018-11-30 | 2019-02-22 | 沈阳航空航天大学 | A kind of optimal track control device design method when the injection attacks there are false data |
CN113093538A (en) * | 2021-03-18 | 2021-07-09 | 长春工业大学 | Non-zero and game neural-optimal control method of modular robot system |
US20210403159A1 (en) * | 2018-10-18 | 2021-12-30 | Telefonaktiebolaget Lm Ericsson (Publ) | Formation Flight of Unmanned Aerial Vehicles |
US20220004191A1 (en) * | 2020-07-01 | 2022-01-06 | Wuhan University Of Technology | Usv formation path-following method based on deep reinforcement learning |
CN114047758A (en) * | 2021-11-08 | 2022-02-15 | 南京云智控产业技术研究院有限公司 | Q-learning-based multi-mobile-robot formation method |
CN114460959A (en) * | 2021-12-15 | 2022-05-10 | 北京机电工程研究所 | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game |
CN115562342A (en) * | 2022-10-24 | 2023-01-03 | 南京航空航天大学 | Multi-aircraft task allocation, flight path planning and formation control integrated game method |
Non-Patent Citations (2)
Title |
---|
GONG Zhenyu et al.: "H∞ consensus of multi-agent systems based on a zero-sum game method" *
WANG Xingce et al.: "Research on reinforcement learning algorithms for dynamic multi-robot formation" *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116880551A (en) * | 2023-07-13 | 2023-10-13 | 之江实验室 | Flight path planning method, system and storage medium based on random event capturing |
CN117420849A (en) * | 2023-12-18 | 2024-01-19 | 山东科技大学 | Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning |
CN117420849B (en) * | 2023-12-18 | 2024-03-08 | 山东科技大学 | Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN115877871B (en) | 2023-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115877871A (en) | Non-zero and game unmanned aerial vehicle formation control method based on reinforcement learning | |
CN113093802B (en) | Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning | |
US20220315219A1 (en) | Air combat maneuvering method based on parallel self-play | |
CN109669475A (en) | Multiple no-manned plane three-dimensional formation reconfiguration method based on artificial bee colony algorithm | |
Ali et al. | Explicit model following distributed control scheme for formation flying of mini UAVs | |
CN114020042A (en) | Heterogeneous unmanned cluster formation enclosure tracking control method and system | |
CN110347181B (en) | Energy consumption-based distributed formation control method for unmanned aerial vehicles | |
Yang et al. | Distributed optimal consensus with obstacle avoidance algorithm of mixed-order UAVs–USVs–UUVs systems | |
CN104589349A (en) | Combination automatic control method with single-joint manipulator under mixed suspension microgravity environments | |
Zhou et al. | Distributed formation control for multiple quadrotor UAVs under Markovian switching topologies with partially unknown transition rates | |
CN110825116B (en) | Unmanned aerial vehicle formation method based on time-varying network topology | |
CN115639830B (en) | Air-ground intelligent agent cooperative formation control system and formation control method thereof | |
Cong et al. | Formation control for multiquadrotor aircraft: Connectivity preserving and collision avoidance | |
CN115755956B (en) | Knowledge and data collaborative driving unmanned aerial vehicle maneuvering decision method and system | |
Shen et al. | Attitude active disturbance rejection control of the quadrotor and its parameter tuning | |
CN116974299A (en) | Reinforced learning unmanned aerial vehicle track planning method based on delayed experience priority playback mechanism | |
Fiori et al. | Extension of a PID control theory to Lie groups applied to synchronising satellites and drones | |
CN117055605A (en) | Multi-unmanned aerial vehicle attitude control method and system | |
CN113268084B (en) | Intelligent fault-tolerant control method for unmanned aerial vehicle formation | |
Yang et al. | Cooperative group formation control for multiple quadrotors system with finite-and fixed-time convergence | |
Aruneshwaran et al. | Neural adaptive flight controller for ducted-fan UAV performing nonlinear maneuver | |
Montella et al. | Reinforcement learning for autonomous dynamic soaring in shear winds | |
CN117452975A (en) | Security performance cooperative formation control design method for four-rotor unmanned aerial vehicle cluster | |
CN114995521B (en) | Multi-unmanned aerial vehicle distributed formation control method and device and electronic equipment | |
Razzaghian et al. | Robust adaptive neural network control of miniature unmanned helicopter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||