CN115877871A - Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning - Google Patents

Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Info

Publication number
CN115877871A
CN115877871A (application CN202310193021.9A)
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
game
subsystem
zero-sum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310193021.9A
Other languages
Chinese (zh)
Other versions
CN115877871B (en)
Inventor
刘昊
吕金虎
马子豪
高庆
刘德元
王薇
钟森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Academy of Mathematics and Systems Science of CAS
Original Assignee
Beihang University
Academy of Mathematics and Systems Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University, Academy of Mathematics and Systems Science of CAS filed Critical Beihang University
Priority to CN202310193021.9A
Publication of CN115877871A
Application granted
Publication of CN115877871B
Active legal status
Anticipated expiration

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention belongs to the technical field of unmanned aerial vehicle control and provides a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, which comprises the following specific steps: S1: establishing an unmanned aerial vehicle dynamic model; S2: establishing a non-zero-sum game formation model; S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method; S4: designing a non-zero-sum game formation controller. The control method ensures that the states of the unmanned aerial vehicle cluster subsystems converge quickly to their desired values, i.e., the cluster can form the required formation pattern in a short time, and the process is stable and fast. Meanwhile, the error system adopts a game-based control method which, compared with traditional control methods, achieves a faster convergence rate and smaller overshoot. The proposed non-zero-sum game controller therefore better solves the formation trajectory tracking problem.

Description

Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle control, and in particular relates to a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning.
Background
Unmanned aerial vehicle (UAV) formations are gaining attention across research fields because, compared with a single UAV, they offer advantages such as low cost, high maneuverability, and good adaptability in typical applications such as heavy transportation, wide-area search missions, and large-scale scientific observation. Traditional formation control methods include the leader-follower method, behavior-based methods, the virtual structure method, and the artificial potential field method. The leader-follower method defines only the leader's behavior during formation flight; the followers automatically maintain their relative positions to the leader through information exchange, so that the formation-keeping task of the whole team can be completed. In practical applications, the distributed leader-follower method has a simple and clear control structure: each UAV needs only its own state and the state information of its neighbors, so the demands on onboard communication hardware are low and the coordination problem among formation members is greatly simplified. It has therefore been widely applied to robot formation, UAV formation, and missile formation. In recent years, game theory has attracted extensive attention in the field of robot formation; for example, the equilibrium solution of a differential game has been used as the formation control strategy to solve the formation control problem effectively. In fact, the UAV formation control problem can be cast as a multi-player differential game.
When the dynamic parameters of the UAVs cannot be obtained accurately, or constant disturbances act on the formation, the Nash equilibrium solution of the game formation problem is difficult to compute, and hence the Nash-equilibrium optimal formation control law that realizes the formation required by the UAV team cannot be obtained. This problem can be addressed with the adaptive-learning capability of reinforcement learning, which intelligently identifies the unknown parameters of the UAV formation system and learns the optimal controller from state data.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, which uses the adaptive-learning capability of reinforcement learning to intelligently identify the unknown parameters of the unmanned aerial vehicle formation system and to learn the optimal controller from state data.
The technical scheme of the invention is as follows:
A non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning comprises the following specific steps:
S1: establishing an unmanned aerial vehicle dynamic model;
S2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a lateral-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method;
S4: designing a non-zero-sum game formation controller.
Preferably, step S1 comprises the following:
for the i-th unmanned aerial vehicle, a six-degree-of-freedom unmanned aerial vehicle dynamic system is established; the six-degree-of-freedom dynamic system is a multi-input, multi-output system composed of four subsystems, where the input and output of the longitudinal subsystem are denoted $u_{xi}$ and $x_i$, those of the lateral subsystem $u_{yi}$ and $y_i$, those of the vertical subsystem $u_{zi}$ and $z_i$, and those of the yaw subsystem $u_{\psi i}$ and $\psi_i$; $[x_i, y_i, z_i]^T$ represents the position of the unmanned aerial vehicle in the earth-fixed inertial frame, and $[\phi_i, \theta_i, \psi_i]^T$ represents its Euler attitude, where $\phi_i$, $\theta_i$ and $\psi_i$ denote the roll, pitch and yaw angles respectively; the control input $u_i$ of the unmanned aerial vehicle is divided into a local control input $u_i^l$ and a global control input $u_i^g$, i.e. $u_i = u_i^l + u_i^g$; the local input $u_i^l$ is the control input generated from the unmanned aerial vehicle's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other unmanned aerial vehicles; the state vectors of the longitudinal, lateral, vertical and yaw subsystems are defined correspondingly;
the longitudinal-subsystem and lateral-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (1): image not reproduced]
where the feedback coefficient is a constant, the nominal parameters are those of the longitudinal and lateral subsystems, j indexes the other unmanned aerial vehicles, and n denotes the total number of unmanned aerial vehicles; for the longitudinal subsystem, the quantities in equation (1) are the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the longitudinal subsystem of the i-th unmanned aerial vehicle; for the lateral subsystem, they are the corresponding quantities of the lateral subsystem of the i-th unmanned aerial vehicle;
the vertical-subsystem and yaw-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (2): image not reproduced]
where the feedback coefficient is a constant and the nominal parameters are those of the vertical and yaw subsystems; the quantities in equation (2) are, analogously, the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the vertical and yaw subsystems of the i-th unmanned aerial vehicle.
Preferably, the longitudinal-subsystem non-zero-sum game formation model in step S2 is established as follows:
according to the dynamic model of the longitudinal subsystem of the unmanned aerial vehicle, the longitudinal-subsystem non-zero-sum game formation model is established as follows:
the unmanned aerial vehicle set contains n unmanned aerial vehicles in total; letting the stacked subsystem states of the n unmanned aerial vehicles represent the state of the non-zero-sum game formation model, the global dynamic model is obtained as
[Equation (3): image not reproduced]
where $I_n$ is the n-dimensional identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose i-th element is 1 and whose other elements are 0; substituting the known output relation into the above formula gives
[Equation (4): image not reproduced]
considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
[Equation (5): image not reproduced]
subtracting equation (5) from equation (4) then yields the following error system:
[Equation (6): image not reproduced]
The non-zero-sum game formation models of the lateral, vertical and yaw subsystems are established by the same method.
Preferably, step S3 specifically comprises the following:
the cost function of each drone is defined as follows:
[Equation (7): image not reproduced]
where the weight parameters are symmetric matrices, the initial state of the non-zero-sum game formation model is given, t represents the starting time, and τ represents the integration time;
a non-zero-sum game formation controller is designed for each drone to track the predetermined trajectory while minimizing the cost function of the i-th drone; the game-optimal controller is designed as
[Equation (8): image not reproduced]
where the gain in equation (8) is the optimal feedback control gain;
subject to the error system (6), the optimal gain is solved by minimizing the cost function;
the game feedback controller of equation (8) constitutes a Nash equilibrium of the non-zero-sum game formation control problem, i.e. a Nash control strategy, when every drone satisfies all of the following inequalities:
[Equation (9): image not reproduced]
Preferably, step S4 specifically comprises the following:
for the unknown system matrices, the coupled algebraic Riccati equations (AREs) and the Nash equilibrium solution are solved approximately using a reinforcement learning algorithm; the optimal feedback control gain of each drone is obtained from the solution of the coupled AREs, wherein $K_i$ is the feedback control gain and the associated symmetric matrix is solved by the following coupled AREs:
[Equation (10): image not reproduced]
in which v denotes a counting variable; if the dynamics of the system are known, a numerical solution is obtained using a model-based policy iteration algorithm; if the dynamics of the system are unknown, the following policy-iteration reinforcement learning algorithm is used to solve approximately:
[Equation (11): image not reproduced]
wherein k denotes the number of iterations, and
[Equation (12): image not reproduced]
from the Kronecker product identities:
[Equation (13): image not reproduced]
[Equation (14): image not reproduced]
wherein the vectors involved are arbitrary column vectors and M and N represent any two matrices;
using equations (13) and (14), one obtains
[Equations (15)-(17): images not reproduced]
wherein vec(·) stacks the columns of its matrix argument into a single column vector; column vectors are defined, with s a positive integer, as follows:
[Equation (18): image not reproduced]
combining equations (11) and (15)-(18) yields
[Equation (19): image not reproduced]
from which the following linear iterative equation is derived from equation (11):
[Equation (20): image not reproduced]
if the data matrix in equation (20) has full row rank, the unique solution of equation (20) is obtained; introducing appropriate random harmonic exploration noise in the learning process makes the data matrix full row rank; when the iteration increment falls below the convergence threshold ε, the feedback control gain of the i-th drone is obtained from equation (20); equation (20) simultaneously solves the optimal feedback control gain and the associated symmetric matrix without requiring the system dynamics matrices in equation (4); the game-based control protocol is derived from the state information and control input information of the system.
Preferably, in step S4: the policy iteration algorithm is a model-free algorithm that does not rely on prior knowledge of the drone system, the specific algorithm being as follows:
Step 1. Initialize the iteration index and select a stabilizing initial feedback control gain for each drone;
Step 2. During the data-collection interval, apply to each drone the control input given by the current feedback gain plus a bounded exploration noise;
Step 3. Solve equation (20) for each drone;
Step 4. Increment the iteration index and return to Step 3 until the spectral norm of the change in the gain matrices falls below the convergence threshold;
Step 5. Obtain the approximate solution of the Nash equilibrium.
The invention provides a non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, which has the following advantages:
the control method ensures that the states of the unmanned aerial vehicle cluster subsystems converge quickly to their desired values, i.e., the cluster can form the required formation pattern in a short time, and the process is stable and fast. Meanwhile, the error system adopts a game-based control method which, compared with traditional control methods, achieves a faster convergence rate and smaller overshoot. The proposed non-zero-sum game controller therefore better solves the formation trajectory tracking problem.
Drawings
To illustrate the embodiments of the invention or the solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings are schematic and should not be understood as limiting the invention in any way; other drawings may be obtained from them by those skilled in the art without inventive effort. In the drawings:
FIG. 1 is a convergence plot of the learning process (convergence of the learned gains to their optimal values);
FIG. 2 is a diagram of the drone positions;
FIG. 3 is a drone position response diagram (3D);
FIG. 4 is a drone position error diagram;
FIG. 5 is a response plot of the drone attitude angles;
FIG. 6 is a drone position response diagram (LQR).
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
A non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning comprises the following specific steps:
S1: establishing an unmanned aerial vehicle dynamic model;
S2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a lateral-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method;
S4: designing a non-zero-sum game formation controller.
1. Unmanned aerial vehicle model
The six-degree-of-freedom unmanned aerial vehicle dynamic system can be divided into four subsystems: the longitudinal, lateral, vertical and yaw subsystems. Let $p_i = [x_i, y_i, z_i]^T$ denote the position of the i-th drone in the earth-fixed inertial frame, and let $\Theta_i = [\phi_i, \theta_i, \psi_i]^T$ denote its Euler attitude, where $\phi_i$, $\theta_i$ and $\psi_i$ denote the roll, pitch and yaw angles respectively. Let $u_i$ denote the control input of the i-th drone, whose four components are the control inputs of the four subsystems. The control input can be divided into a local control input $u_i^l$ and a global control input $u_i^g$, i.e. $u_i = u_i^l + u_i^g$. The local input $u_i^l$ is the control input generated from the drone's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other drones. The state vectors of the longitudinal, lateral, vertical and yaw subsystems are defined correspondingly. The longitudinal-subsystem and lateral-subsystem dynamics models are as follows:
[Equation (1): image not reproduced]
where the feedback coefficient is a constant, the nominal parameters are those of the longitudinal and lateral subsystems, j indexes the other drones, and n denotes the total number of drones; the quantities in equation (1) are the state vector, local control input, global control input, own output and other drones' outputs of the longitudinal and lateral subsystems of the i-th drone.
The vertical-subsystem and yaw-subsystem dynamics models are as follows:
[Equation (2): image not reproduced]
where the feedback coefficient is a constant and the nominal parameters are those of the vertical and yaw subsystems. As can be seen from equations (1) and (2), the UAV system is a multi-input, multi-output system composed of four subsystems, with four control inputs and four outputs.
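To make the subsystem structure concrete, the sketch below simulates one generic subsystem of the form chi_dot = A·chi + B·(u_local + u_global). The double-integrator matrices and feedback coefficients are illustrative assumptions, not the patent's nominal parameters (those appear only as equation images in the original).

```python
import numpy as np

# Hypothetical double-integrator stand-in for one UAV subsystem.
# The patent's nominal A, B are given only as images in the source,
# so these matrices and gains are illustrative assumptions.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

def step(chi, u_local, u_global, dt=0.01):
    """One Euler step of chi_dot = A chi + B (u_local + u_global)."""
    return chi + dt * (A @ chi + B.flatten() * (u_local + u_global))

# Local input from the drone's own output, global input from another
# drone's output, mirroring the u = u^l + u^g decomposition in the text.
k_p, k_d, k_nbr = 2.0, 2.0, 0.5       # assumed feedback coefficients
chi = np.array([1.0, 0.0])            # [position, velocity]
neighbor_output = 0.3
for _ in range(1000):
    u_l = -k_p * chi[0] - k_d * chi[1]    # generated from own output
    u_g = k_nbr * neighbor_output         # generated from other drones' outputs
    chi = step(chi, u_l, u_g)
print(chi)  # settles near [k_nbr*neighbor_output/k_p, 0] = [0.075, 0]
```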
2. Non-zero-sum game formation model
The non-zero-sum game formation model is established here taking the vertical subsystem as an example; the models of the other subsystems can be established by the same method. Consider the set of n drones and let the stacked subsystem states of the n drones represent the state of the non-zero-sum game formation model. The global dynamics model can then be obtained as follows:
[Equation (3): image not reproduced]
where $I_n$ is the n-dimensional identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose i-th element is 1 and whose other elements are 0. Substituting the known output relation into the above formula gives
[Equation (4): image not reproduced]
Considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
[Equation (5): image not reproduced]
Subtracting equation (5) from equation (4) then yields the following error system:
[Equation (6): image not reproduced]
As can be seen from equation (6), the state of the formation subsystem is influenced by the control inputs of all the drones; this means that both cooperation and conflict exist within the multi-drone system, so the global system can be studied within the framework of differential game theory.
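The Kronecker-product structure just described can be assembled directly in code. A minimal sketch, assuming generic per-drone matrices A and B as toy stand-ins (equations (3)-(6) are images in the source):

```python
import numpy as np

n = 3                                   # number of drones
A = np.array([[0.0, 1.0], [0.0, 0.0]])  # assumed per-drone subsystem matrices
B = np.array([[0.0], [1.0]])

# Global dynamics X_dot = (I_n ⊗ A) X + sum_i (e_i ⊗ B) u_i, using the
# identity matrix, Kronecker product, and unit column vectors e_i from
# the text.
A_glob = np.kron(np.eye(n), A)
def e(i):
    v = np.zeros((n, 1)); v[i] = 1.0    # i-th element 1, others 0
    return v
B_glob = [np.kron(e(i), B) for i in range(n)]

# The ideal zero-input system X*_dot = (I_n ⊗ A) X* carries the virtual
# leader's required relative positions; the error system uses E = X - X*.
X = np.random.randn(2 * n)
X_star = np.zeros(2 * n)                # placeholder ideal trajectory
E = X - X_star
print(A_glob.shape, B_glob[0].shape, E.shape)  # (6, 6) (6, 1) (6,)
```

Keeping a separate input channel e_i ⊗ B per drone is what makes the per-drone cost functions, and hence the game structure of the next section, well defined.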
3. Solving the non-zero-sum game formation control problem
The cost function of each drone is defined as follows:
[Equation (7): image not reproduced]
where the weight parameters are symmetric weight matrices, the initial state of the non-zero-sum game formation model is given, t represents the starting time, and τ represents the integration time. The aim of the invention is to design a non-zero-sum game formation controller for each drone that tracks the predetermined trajectory while minimizing the cost function of the i-th drone. The game controller is designed as
[Equation (8): image not reproduced]
Subject to the error system (6), the optimal gain is obtained by minimizing the cost function.
The game feedback controller of equation (8) constitutes a Nash equilibrium of the non-zero-sum game formation control problem, i.e. a Nash control strategy, when every drone satisfies all of the following inequalities:
[Equation (9): image not reproduced]
As equation (9) shows, when the other drones maintain their Nash control strategies, no participant can reduce its cost by deviating from the Nash equilibrium; this means that the Nash equilibrium forces each participant to maintain its Nash control strategy.
4. Designing a controller:
in the part, a non-zero game formation controller is designed based on game theory and reinforcement learning theory. For unknown matrix
Figure SMS_175
and />
Figure SMS_176
Coupled Algebraic Riccati Equations (AREs) and Nash equilibrium solutions are solved approximately using a reinforcement learning algorithm. According to the optimal control theory and the game theory, the stable feedback control gain of the unmanned aerial vehicle can be obtained through->
Figure SMS_177
And (4) obtaining. In which symmetry matrix +>
Figure SMS_178
This can be solved by the following coupled AREs:
Figure SMS_179
(10)
in the formula
Figure SMS_180
. Of non-linear equation (10)The analytical solution is difficult to directly obtain. Thus, if the dynamics of the system are known, a model-based strategy iterative algorithm can be used to obtain a numerical solution of equation (10); if the dynamics of the system are unknown, the following strategy can be used to iterate a reinforcement learning algorithm to approximate solution (10).
If the dynamics of the system are unknown, the following policy-iteration reinforcement learning algorithm can be used to solve equation (10) approximately:
[Equation (11): image not reproduced]
[Equation (12): image not reproduced]
From the Kronecker product identities:
[Equation (13): image not reproduced]
[Equation (14): image not reproduced]
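Identities (13) and (14) are presumably the standard vectorization rules of the form vec(MXN) = (Nᵀ ⊗ M) vec(X); since the originals are images, the rule is stated here as an assumption and checked numerically:

```python
import numpy as np

# Check vec(M X N) == (N^T ⊗ M) vec(X), using column-major (Fortran)
# vectorization, which is what vec() denotes here.
rng = np.random.default_rng(0)
M = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
N = rng.standard_normal((4, 5))

vec = lambda A: A.reshape(-1, order="F")          # stack columns
lhs = vec(M @ X @ N)
rhs = np.kron(N.T, M) @ vec(X)
print(np.allclose(lhs, rhs))                      # True
```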
obtained by using the formula (13) and the formula (14),
Figure SMS_185
(15)
Figure SMS_186
(16)
Figure SMS_187
(17)
wherein, vec () has the following general formula:
Figure SMS_188
,/>
Figure SMS_189
is->
Figure SMS_190
Each column of elements ofA column vector of pixels; defining a column vector
Figure SMS_191
,/>
Figure SMS_192
and />
Figure SMS_193
sIs a positive integer as follows:
Figure SMS_194
(18)
wherein ,
Figure SMS_195
,/>
Figure SMS_196
combinations of (11), (15), (16), (17) and (18), push-out
Figure SMS_197
(19)
wherein ,
Figure SMS_198
the following linear iterative equation can be derived from equation (11):
Figure SMS_199
(20)
wherein
Figure SMS_200
Note 1: if it is not
Figure SMS_201
Is a full rank, a unique solution of equation (20) can be obtained. By introducing appropriate followers in the learning processThe machine harmonic wave detects noise and can make->
Figure SMS_202
The full rank. When/is>
Figure SMS_203
When it comes toiFeedback control matrix of unmanned aerial vehicle on frame->
Figure SMS_204
Can be obtained from the formula (20).
Note 2: the optimal feedback control gain can be solved simultaneously by the formula (20)
Figure SMS_205
And the symmetry matrix->
Figure SMS_206
Without the need for the system dynamics matrix ≥ in equation (4)>
Figure SMS_207
and />
Figure SMS_208
. The game-based control protocol may be derived from state information and control input information of the system. Therefore, the proposed strategy iterative algorithm is a model-free algorithm that does not rely on the prior knowledge of the drone system. The specific algorithm is as follows:
step1. Selection
Figure SMS_209
(ii) a Selecting a stable initial feedback control gain ^ for each drone>
Figure SMS_210
Step2. In
Figure SMS_211
During, unmanned aerial vehicle's control input is->
Figure SMS_212
, wherein />
Figure SMS_213
Is a bounded heuristic noise;
step3. Solving for each drone by equation (20)
Figure SMS_214
;/>
Step4. Order
Figure SMS_215
And returns to Step3 until implemented->
Figure SMS_216
,/>
Figure SMS_217
A spectral norm representing a matrix;
step5. Obtaining an approximate solution to the Nash equilibrium solution
Figure SMS_218
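A minimal single-agent sketch of Steps 1-5, assuming toy dynamics: the patent runs this kind of iteration per drone on the coupled game system, but a single LQR-type subsystem suffices to show the mechanics. A and B are used only to generate the measured data; the learner itself never reads them, matching the model-free claim.

```python
import numpy as np

# Model-free policy iteration (Steps 1-5) sketched for one subsystem in
# the single-agent LQR setting, a simplified stand-in for the coupled
# game. A, B below are used ONLY to simulate measurements.
rng = np.random.default_rng(1)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # assumed toy dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
freqs = rng.uniform(0.5, 50.0, size=10)    # random harmonic exploration

def collect(K, T=8.0, dt=1e-3, win=50):
    """Run u = -K x + noise; return per-window data for equation (20)."""
    x = np.array([1.0, -0.5])
    rows_dxx, rows_ixu, rows_q = [], [], []
    xx0, ixu, iq = np.kron(x, x), np.zeros(2), 0.0
    for step in range(int(T / dt)):
        noise = 0.2 * np.sin(freqs * step * dt).sum()
        u = (-K @ x).item() + noise
        y = np.atleast_1d(u) + K @ x       # u + K x (the exploration part)
        ixu += dt * np.kron(x, R @ y)      # integral of x ⊗ R(u + Kx)
        iq += dt * (x @ (Q + K.T @ R @ K) @ x)
        x = x + dt * (A @ x + (B * u).flatten())
        if (step + 1) % win == 0:          # close one data window
            rows_dxx.append(np.kron(x, x) - xx0)
            rows_ixu.append(ixu.copy()); rows_q.append(iq)
            xx0, ixu, iq = np.kron(x, x), np.zeros(2), 0.0
    return np.array(rows_dxx), np.array(rows_ixu), np.array(rows_q)

K = np.zeros((1, 2))                       # Step 1: stabilizing initial gain
for it in range(10):                       # Steps 2-4
    dxx, ixu, q = collect(K)               # Step 2: data with exploration
    Phi = np.hstack([dxx, -2.0 * ixu])     # unknowns: [vec(P); vec(K_next)]
    theta, *_ = np.linalg.lstsq(Phi, -q, rcond=None)   # Step 3: eq. (20)
    K_next = theta[4:].reshape(1, 2)
    if np.linalg.norm(K_next - K, 2) < 1e-3:           # Step 4: spectral norm
        K = K_next
        break
    K = K_next
print(K)   # Step 5: approximate optimal gain, learned without reading A or B
```

With persistent excitation, the least-squares system standing in for equation (20) has full row rank and the gain iterates converge; this is precisely the role assigned to the random harmonic exploration noise in Note 1.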
In the simulation, the game formation system comprises three drones, each modeled as the single-drone system above, with the parameters shown in Table 1. The validity of the drone system parameters has been verified by real-time experimental results, while the matrices A, B and C are unknown to the formation controller. In addition, a comparison with the traditional optimal control method (LQR) is made. The weight matrices, the initial state of the ideal system, the initial state of the non-zero-sum game formation system, the stabilizing initial feedback control gains and the design parameters of the reinforcement learning algorithm are selected; the exploration noise is selected as random harmonics whose coefficients are random numbers drawn from a fixed interval; finally, a convergence threshold is selected to verify the validity of the proposed game formation controller. [The numerical values of these quantities are given as equation images in the original.]
TABLE 1. Subsystem parameters
[Table 1: image not reproduced]
During learning, the gain estimates converge to their optimal values; the convergence process is shown in FIG. 1. As the figure shows, the proposed reinforcement learning algorithm achieves convergence after 12 iterations. The position responses are shown in FIG. 2, the position errors in FIG. 4, and the attitude responses in FIG. 5, in which the dashed, dash-dotted and solid lines represent drones 1-3 respectively. As these figures show, all subsystems of the three drones converge to their desired values within 5 seconds: the three drones form the required formation pattern in a short time, and the process is stable and fast. Comparing FIG. 4 and FIG. 6, the error system under the game-based control method has a faster convergence rate and smaller overshoot than under the traditional control method. Therefore, the proposed non-zero-sum game controller better solves the formation trajectory tracking problem.
The above description is given for the purpose of illustrating the invention and is not to be construed as limiting the invention, since various modifications and changes will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning, characterized by comprising the following specific steps:
S1: establishing an unmanned aerial vehicle dynamic model;
S2: establishing a non-zero-sum game formation model, comprising a longitudinal-subsystem, a lateral-subsystem, a vertical-subsystem, and a yaw-subsystem non-zero-sum game formation model;
S3: solving the non-zero-sum game formation model established in step S2 by a reinforcement learning method;
S4: designing a non-zero-sum game formation controller.
2. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 1, characterized in that step S1 comprises the following:
for the i-th unmanned aerial vehicle, a six-degree-of-freedom unmanned aerial vehicle dynamic system is established; the six-degree-of-freedom dynamic system is a multi-input, multi-output system composed of four subsystems, where the input and output of the longitudinal subsystem are denoted $u_{xi}$ and $x_i$, those of the lateral subsystem $u_{yi}$ and $y_i$, those of the vertical subsystem $u_{zi}$ and $z_i$, and those of the yaw subsystem $u_{\psi i}$ and $\psi_i$; $[x_i, y_i, z_i]^T$ represents the position of the unmanned aerial vehicle in the earth-fixed inertial frame, and $[\phi_i, \theta_i, \psi_i]^T$ represents its Euler attitude, where $\phi_i$, $\theta_i$ and $\psi_i$ denote the roll, pitch and yaw angles respectively; the control input $u_i$ of the unmanned aerial vehicle is divided into a local control input $u_i^l$ and a global control input $u_i^g$, i.e. $u_i = u_i^l + u_i^g$; the local input $u_i^l$ is the control input generated from the unmanned aerial vehicle's own output, and the global input $u_i^g$ is the control input generated from the outputs of the other unmanned aerial vehicles; the state vectors of the longitudinal, lateral, vertical and yaw subsystems are defined correspondingly;
the longitudinal-subsystem and lateral-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (1): image not reproduced]
where the feedback coefficient is a constant, the nominal parameters are those of the longitudinal and lateral subsystems, j indexes the other unmanned aerial vehicles, and n denotes the total number of unmanned aerial vehicles; for the longitudinal subsystem, the quantities in equation (1) are the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the longitudinal subsystem of the i-th unmanned aerial vehicle; for the lateral subsystem, they are the corresponding quantities of the lateral subsystem of the i-th unmanned aerial vehicle;
the vertical-subsystem and yaw-subsystem dynamic models of the unmanned aerial vehicle are established as
[Equation (2): image not reproduced]
where the feedback coefficient is a constant and the nominal parameters are those of the vertical and yaw subsystems; the quantities in equation (2) are, analogously, the state vector, local control input, global control input, own output and other unmanned aerial vehicles' outputs of the vertical and yaw subsystems of the i-th unmanned aerial vehicle.
3. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 2, characterized in that the longitudinal-subsystem non-zero-sum game formation model in step S2 is established as follows:
according to the dynamic model of the longitudinal subsystem of the unmanned aerial vehicle, the longitudinal-subsystem non-zero-sum game formation model is established as follows:
the unmanned aerial vehicle set contains n unmanned aerial vehicles in total; letting the stacked subsystem states of the n unmanned aerial vehicles represent the state of the non-zero-sum game formation model, the global dynamic model is obtained as
[Equation (3): image not reproduced]
where $I_n$ is the n-dimensional identity matrix, $\otimes$ is the Kronecker product, and $e_i$ is the column vector whose i-th element is 1 and whose other elements are 0; substituting the known output relation into equation (3) gives
[Equation (4): image not reproduced]
considering that the virtual leader provides the required relative position for each follower in the formation, an ideal zero-input system is defined as follows:
[Equation (5): image not reproduced]
subtracting equation (5) from equation (4) then yields the following error system:
[Equation (6): image not reproduced]
4. the reinforced learning-based non-zero and game unmanned aerial vehicle formation control method according to claim 3, wherein the step S3 comprises the following steps:
the cost function for each drone is defined as follows:
Figure QLYQS_59
(7)
wherein the weight parameter
Figure QLYQS_60
And a weight parameter>
Figure QLYQS_61
Is a symmetric matrix, is>
Figure QLYQS_62
For the initial state of the non-zero and game formation models,twhich represents the starting time of the process,τrepresents the integration time;
designing a non-zero and game formation controller for each drone to track a predetermined trajectory, i.e.
Figure QLYQS_63
and />
Figure QLYQS_64
While minimizingiCost function of unmanned aerial vehicle on shelf->
Figure QLYQS_65
The game optimal controller is designed into
Figure QLYQS_66
(8)
wherein ,
Figure QLYQS_67
represents an optimal feedback control gain;
under the condition of satisfying the formula (6),
Figure QLYQS_68
solving by minimizing a cost function:
Figure QLYQS_69
game feedback controller equation (8) establishes nash equilibrium, a nash control strategy, of the non-zero and game formation control problems when each drone satisfies all of the following inequalities:
Figure QLYQS_70
(9)。
5. the reinforced learning-based non-zero and game unmanned aerial vehicle formation control method according to claim 4, wherein the step S4 comprises the following steps:
for unknown matrix
Figure QLYQS_71
and />
Figure QLYQS_72
Approximately solving a coupled algebra Riccati equation set and a Nash equilibrium solution by using a reinforcement learning algorithm; optimal feedback control gain pass for drone>
Figure QLYQS_73
Obtaining; whereinK i For feedback control of gain, symmetric matrices
Figure QLYQS_74
Solved by the following coupled AREs:
Figure QLYQS_75
(10)
in the formula vIt is shown that the counting variable is,
Figure QLYQS_76
obtaining (10) a numerical solution using a model-based strategy iterative algorithm if the dynamics of the system are known; if the dynamics of the system is unknown, the following strategy is utilized to iterate the reinforcement learning algorithm to approximately solve;
Figure QLYQS_77
(11)
wherein ,krepresenting number of iterationsThe number of the first and second groups is,
Figure QLYQS_78
(12)
from the kronecker product algorithm:
Figure QLYQS_79
(13)
Figure QLYQS_80
(14)
wherein
Figure QLYQS_81
Is an arbitrary column vector and is a linear vector,MandNrepresents any two matrices;
obtained by using the formula (13) and the formula (14),
Figure QLYQS_82
(15)
Figure QLYQS_83
(16)
Figure QLYQS_84
(17)
wherein the general formula of vec () is as follows:
Figure QLYQS_85
,/>
Figure QLYQS_86
is/>
Figure QLYQS_87
A column vector formed by each column element of (a); definition ofColumn vector
Figure QLYQS_88
,/>
Figure QLYQS_89
and />
Figure QLYQS_90
sIs a positive integer as follows: />
Figure QLYQS_91
(18)
wherein ,
Figure QLYQS_92
,/>
Figure QLYQS_93
combinations of (11), (15), (16), (17) and (18), push-out
Figure QLYQS_94
(19)
wherein ,
Figure QLYQS_95
the following linear iterative equation is derived from equation (11):
Figure QLYQS_96
(20)
wherein
Figure QLYQS_97
If it is not
Figure QLYQS_98
If the rank is full, then the only solution of formula (20) is obtained; based on the fact that a random harmonic detection noise is introduced into the learning process>
Figure QLYQS_99
Rank of full rank; when/is>
Figure QLYQS_100
In whichεThe convergence threshold is expressed, and the first is obtained from the formula (20)iFeedback control gain of unmanned aerial vehicle on frame->
Figure QLYQS_101
Simultaneous solution of optimal feedback control gain
Figure QLYQS_102
And the symmetry matrix->
Figure QLYQS_103
Without the need for the system dynamics matrix ≥ in equation (4)>
Figure QLYQS_104
And
Figure QLYQS_105
(ii) a The game-based control protocol is derived from state information and control input information of the system.
6. The reinforcement-learning-based non-zero-sum game unmanned aerial vehicle formation control method according to claim 5, characterized in that in step S4: the policy iteration algorithm is a model-free algorithm that does not rely on prior knowledge of the unmanned aerial vehicle system, the specific algorithm being as follows:
Step 1. Initialize the iteration index and select a stabilizing initial feedback control gain for each unmanned aerial vehicle;
Step 2. During the data-collection interval, apply to each unmanned aerial vehicle the control input given by the current feedback gain plus a bounded exploration noise;
Step 3. Solve equation (20) for each unmanned aerial vehicle;
Step 4. Increment the iteration index and return to Step 3 until the spectral norm of the change in the gain matrices falls below the convergence threshold;
Step 5. Obtain the approximate solution of the Nash equilibrium.
CN202310193021.9A 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning Active CN115877871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310193021.9A CN115877871B (en) 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310193021.9A CN115877871B (en) 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN115877871A true CN115877871A (en) 2023-03-31
CN115877871B CN115877871B (en) 2023-05-26

Family

ID=85761836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310193021.9A Active CN115877871B (en) 2023-03-03 2023-03-03 Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115877871B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing
CN117420849A (en) * 2023-12-18 2024-01-19 山东科技大学 Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109375514A (en) * 2018-11-30 2019-02-22 沈阳航空航天大学 Optimal tracking controller design method in the presence of false data injection attacks
CN113093538A (en) * 2021-03-18 2021-07-09 长春工业大学 Non-zero-sum game neural-optimal control method of modular robot system
US20210403159A1 (en) * 2018-10-18 2021-12-30 Telefonaktiebolaget Lm Ericsson (Publ) Formation Flight of Unmanned Aerial Vehicles
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN114047758A (en) * 2021-11-08 2022-02-15 南京云智控产业技术研究院有限公司 Q-learning-based multi-mobile-robot formation method
CN114460959A (en) * 2021-12-15 2022-05-10 北京机电工程研究所 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN115562342A (en) * 2022-10-24 2023-01-03 南京航空航天大学 Multi-aircraft task allocation, flight path planning and formation control integrated game method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210403159A1 (en) * 2018-10-18 2021-12-30 Telefonaktiebolaget Lm Ericsson (Publ) Formation Flight of Unmanned Aerial Vehicles
CN109375514A (en) * 2018-11-30 2019-02-22 沈阳航空航天大学 Optimal tracking controller design method in the presence of false data injection attacks
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN113093538A (en) * 2021-03-18 2021-07-09 长春工业大学 Non-zero-sum game neural-optimal control method of modular robot system
CN114047758A (en) * 2021-11-08 2022-02-15 南京云智控产业技术研究院有限公司 Q-learning-based multi-mobile-robot formation method
CN114460959A (en) * 2021-12-15 2022-05-10 北京机电工程研究所 Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN115562342A (en) * 2022-10-24 2023-01-03 南京航空航天大学 Multi-aircraft task allocation, flight path planning and formation control integrated game method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
弓镇宇 et al.: "H∞ consensus of multi-agent systems based on a zero-sum game method" *
王醒策 et al.: "Research on reinforcement learning algorithms for dynamic multi-robot formation" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing
CN117420849A (en) * 2023-12-18 2024-01-19 山东科技大学 Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning
CN117420849B (en) * 2023-12-18 2024-03-08 山东科技大学 Marine unmanned aerial vehicle formation granularity-variable collaborative search and rescue method based on reinforcement learning

Also Published As

Publication number Publication date
CN115877871B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN115877871A (en) Non-zero-sum game unmanned aerial vehicle formation control method based on reinforcement learning
CN113093802B (en) Unmanned aerial vehicle maneuver decision method based on deep reinforcement learning
US20220315219A1 (en) Air combat maneuvering method based on parallel self-play
CN109669475A (en) Multiple no-manned plane three-dimensional formation reconfiguration method based on artificial bee colony algorithm
Ali et al. Explicit model following distributed control scheme for formation flying of mini UAVs
CN114020042A (en) Heterogeneous unmanned cluster formation enclosure tracking control method and system
CN110347181B (en) Energy consumption-based distributed formation control method for unmanned aerial vehicles
Yang et al. Distributed optimal consensus with obstacle avoidance algorithm of mixed-order UAVs–USVs–UUVs systems
CN104589349A (en) Combination automatic control method with single-joint manipulator under mixed suspension microgravity environments
Zhou et al. Distributed formation control for multiple quadrotor UAVs under Markovian switching topologies with partially unknown transition rates
CN110825116B (en) Unmanned aerial vehicle formation method based on time-varying network topology
CN115639830B (en) Air-ground intelligent agent cooperative formation control system and formation control method thereof
Cong et al. Formation control for multiquadrotor aircraft: Connectivity preserving and collision avoidance
CN115755956B (en) Knowledge and data collaborative driving unmanned aerial vehicle maneuvering decision method and system
Shen et al. Attitude active disturbance rejection control of the quadrotor and its parameter tuning
CN116974299A (en) Reinforced learning unmanned aerial vehicle track planning method based on delayed experience priority playback mechanism
Fiori et al. Extension of a PID control theory to Lie groups applied to synchronising satellites and drones
CN117055605A (en) Multi-unmanned aerial vehicle attitude control method and system
CN113268084B (en) Intelligent fault-tolerant control method for unmanned aerial vehicle formation
Yang et al. Cooperative group formation control for multiple quadrotors system with finite-and fixed-time convergence
Aruneshwaran et al. Neural adaptive flight controller for ducted-fan UAV performing nonlinear maneuver
Montella et al. Reinforcement learning for autonomous dynamic soaring in shear winds
CN117452975A (en) Security performance cooperative formation control design method for four-rotor unmanned aerial vehicle cluster
CN114995521B (en) Multi-unmanned aerial vehicle distributed formation control method and device and electronic equipment
Razzaghian et al. Robust adaptive neural network control of miniature unmanned helicopter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant