CN115562322A - Unmanned aerial vehicle variable impedance flight control method based on reinforcement learning - Google Patents
Unmanned aerial vehicle variable impedance flight control method based on reinforcement learning
- Publication number: CN115562322A
- Application number: CN202211234445.7A
- Authority: CN (China)
- Prior art keywords: reinforcement learning, learning network, unmanned aerial vehicle, state
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS; G05—CONTROLLING; REGULATING; G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw, specially adapted for aircraft
- G05D1/101—Simultaneous control of position or course in three dimensions, specially adapted for aircraft
Abstract
The invention relates to an unmanned aerial vehicle variable impedance flight control method based on reinforcement learning, and belongs to the technical field of unmanned aerial vehicles. The method comprises: establishing, by means of a reinforcement learning algorithm, a reinforcement learning network _1 that outputs a flight control command in a dynamic model and a reinforcement learning network _2 that outputs impedance parameters in an impedance control model; inputting the flight control command into the dynamic model to obtain the current actual state of the unmanned aerial vehicle; inputting the impedance parameters into the impedance control model to obtain the estimated state error of the unmanned aerial vehicle; and applying the difference between the actual state, the estimated state error and the expected state simultaneously to the reinforcement learning network _1 and the reinforcement learning network _2, so that compliant (soft) control of the unmanned aerial vehicle during flight is achieved, the rigid body of the unmanned aerial vehicle is protected from damage, and less energy is consumed. The method has a good application prospect in the field of unmanned aerial vehicle flight control.
Description
Technical Field
The invention relates to an unmanned aerial vehicle variable impedance flight control method based on reinforcement learning, and belongs to the technical field of unmanned aerial vehicles.
Background
Most traditional flight control methods depend strictly on various dynamic models and parameters, require complex mathematical proofs, and consume a large amount of time in parameter tuning to achieve stable flight. In addition, existing flight control methods cannot actively adjust the flight compliance of the unmanned aerial vehicle according to the current external environment during flight, so they may fail to cope with various emergencies and cause damage to the rigid body of the unmanned aerial vehicle.
Disclosure of Invention
Aiming at the defects of existing flight control methods, the invention provides a reinforcement-learning-based variable impedance flight control method for an unmanned aerial vehicle. Based on a combination of reinforcement learning and variable impedance, the agent continuously interacts with the environment and autonomously learns the impedance parameters instead of using fixed ones; this protects the rigid body of the unmanned aerial vehicle from damage, consumes less energy, and achieves smooth operation.
The purpose of the invention is realized by the following technical scheme.
A method for controlling variable impedance flight of an unmanned aerial vehicle based on reinforcement learning comprises the steps of respectively establishing a reinforcement learning network _1 for outputting a flight control command in a dynamic model and a reinforcement learning network _2 for outputting impedance parameters in an impedance control model by adopting a reinforcement learning algorithm, inputting the flight control command into the dynamic model to obtain the current actual state of the unmanned aerial vehicle, inputting the impedance parameters into the impedance control model to obtain the estimated state error of the unmanned aerial vehicle, and simultaneously acting the difference values of the actual state, the estimated state error and the expected state on the reinforcement learning network _1 and the reinforcement learning network _2, so that the purpose of soft control of the unmanned aerial vehicle in the flight process can be achieved;
Here, reinforcement learning learns the mapping from the state error s to the optimal action a, i.e. it finds a policy (a function or logical rule) such that the decisions made by the policy in a given state ultimately yield the maximum reward. The learner is not told which action should be taken; instead, it discovers, through constant interaction with the environment, which actions lead to the highest reward value.
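The trial-and-error idea described above can be sketched with a minimal tabular Q-learning loop. This is illustrative only — the patent itself uses PPO policy networks, and the toy state/action sets and reward below are assumptions made up for the sketch:

```python
import random

# Minimal tabular Q-learning sketch: the agent is never told the correct
# action; it discovers, through repeated interaction, which action in each
# state yields the highest long-term reward.
def q_learning(states, actions, reward_fn, step_fn, episodes=300,
               alpha=0.5, gamma=0.9, epsilon=0.2):
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(20):
            # epsilon-greedy: mostly exploit, sometimes explore
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: q[(s, act)])
            s_next = step_fn(s, a)
            r = reward_fn(s, a)
            best_next = max(q[(s_next, a2)] for a2 in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    # the learned policy: the highest-valued action in each state
    return {s: max(actions, key=lambda act: q[(s, act)]) for s in states}

# Toy 1-D example: states are integer "position errors", actions move the
# agent; the reward is highest when the resulting error is small.
random.seed(0)
states = list(range(-3, 4))
actions = [-1, 0, 1]
step = lambda s, a: max(-3, min(3, s + a))
reward = lambda s, a: -abs(step(s, a))
policy = q_learning(states, actions, reward, step)
print(policy)
```

With this setup the learned policy pushes every positive error down and every negative error up, i.e. toward zero, without ever having been told to do so.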
Further, the unmanned aerial vehicle variable impedance flight control method based on reinforcement learning specifically operates as follows:
(1) Construct a reinforcement learning network _1 that takes the dynamic model as its interactive environment: the target state error is the input of the reinforcement learning network _1, and the corresponding output of the reinforcement learning network _1 is the speeds in the three directions x, y and z plus a yaw rate. The output of the reinforcement learning network _1 (i.e. the three speeds and the yaw rate) is then taken as the input of the dynamic environment (i.e. the dynamic model), and the output of the dynamic environment is correspondingly the current actual state of the unmanned aerial vehicle;
(2) The reinforcement learning network _1 is run repeatedly according to step (1) until reward _1 in the reinforcement learning network _1 no longer increases (reward _1 rewards minimizing the error value between the current state and the target state of the unmanned aerial vehicle and minimizing the number of collisions of the unmanned aerial vehicle);
(3) Constructing a reinforcement learning network _2 taking the impedance control model as an interactive environment, taking the target state error as the input of the reinforcement learning network _2, and correspondingly taking the output of the reinforcement learning network _2 as the damping parameter and the rigidity parameter of the impedance control model; taking the output of the reinforcement learning network _2 as the input of an impedance environment (or a second-order system environment or an impedance control model), and taking the output of the impedance environment as an estimated state error correspondingly;
(4) When reward _1 in step (2) no longer increases, the difference between the correspondingly output current actual state, the estimated state error output in step (3), and the expected state value is applied to the reinforcement learning network _1 and the reinforcement learning network _2 respectively, so that the reinforcement learning network _1 (operating as in step (1)) and the reinforcement learning network _2 (operating as in step (3)) run simultaneously. From then on, the difference between the current actual state output by the reinforcement learning network _1, the estimated state error output by the reinforcement learning network _2, and the expected state value is used to drive both networks, and so on, until reward _1 in the reinforcement learning network _1 and reward _2 in the reinforcement learning network _2 no longer increase, at which point the regulation of the flight state of the unmanned aerial vehicle in the variable impedance environment is completed;
the target state error comprises a position error, a speed error, a rotation matrix error and an angular speed error. Reward _1, set in the reinforcement learning network _1, is used to realize stable flight of the unmanned aerial vehicle and is defined by the error value between the current state and the target state of the unmanned aerial vehicle and the number of collisions of the unmanned aerial vehicle. The estimated state error is the state error, relative to the original target position, produced by an external force. Reward _2, set in the reinforcement learning network _2, is used to realize variable impedance operation: when the external force is large, the unmanned aerial vehicle produces a large displacement in the direction of the external force, and when the external force is small it produces a small displacement; the attitude of the unmanned aerial vehicle likewise changes with the external torque. In addition, energy consumption is related to the impedance parameter values — the larger the impedance parameter values, the more energy is consumed — so reward _2 is defined by the impedance parameter values and the displacement produced by the unmanned aerial vehicle under the external force.
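The coupling of the two networks in steps (1)-(4) can be sketched as follows. Every function here is a hand-written stand-in (a simple proportional law, an Euler-integrated second-order system) for the trained networks and the physics, so all names, gains and dimensions are illustrative assumptions rather than the patent's implementation:

```python
import math

def network_1(state_error):
    # stand-in for reinforcement learning network _1:
    # target state error -> velocity commands (here a simple P-law)
    return [0.5 * e for e in state_error]

def dynamics(state, command, dt=0.1):
    # stand-in for the dynamic model: integrate the velocity command
    return [s + dt * c for s, c in zip(state, command)]

def network_2(state_error):
    # stand-in for reinforcement learning network _2:
    # state error -> damping parameter k2 and stiffness parameter k3
    mag = math.sqrt(sum(e * e for e in state_error))
    return 2.0 + mag, 4.0 + mag

def impedance_model(force, k2, k3, dt=0.01, steps=2000):
    # stand-in for the second-order impedance model with k1 = identity:
    # y'' + k2*y' + k3*y = F, integrated until it settles; the settled y
    # is the estimated state error produced by the external force
    y, z = 0.0, 0.0
    for _ in range(steps):
        y, z = y + dt * z, z + dt * (force - k2 * z - k3 * y)
    return y

desired = [0.0, 0.0, 0.0, 0.0]              # expected state
state = [-1.0, 1.0, -0.5, -0.1]             # current actual state
force = 2.0                                 # constant external force

k2, k3 = network_2([d - s for d, s in zip(desired, state)])   # step (3)
est_error = impedance_model(force, k2, k3)  # estimated state error
for _ in range(400):                        # step (4): combined error loop
    target_error = [d - s - est_error for d, s in zip(desired, state)]
    state = dynamics(state, network_1(target_error))

# the drone settles at the compliantly shifted target: desired - est_error
print([round(s, 3) for s in state], round(est_error, 3))
```

The point of the sketch is the data flow: the impedance model shifts the target by an amount that grows with the external force, and the flight controller then tracks the shifted target rather than fighting the force.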
Furthermore, the overall structures of the reinforcement learning network _1 and the reinforcement learning network _2 are based on the PPO model architecture, and the PPO model is built on the garage framework.
Furthermore, in the reinforcement learning network _1 and the reinforcement learning network _2 based on the PPO model architecture, the actor network and the critic network each consist of two fully connected layers.
Furthermore, when the actor network and the critic network each consist of two fully connected layers, the number of neurons is 64 × 64 (64 neurons per layer).
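As a shape check, an actor and a critic with two fully connected hidden layers of 64 neurons each might look as follows in plain NumPy. The patent builds the real PPO model on the garage framework; the 12-dimensional input here is an assumed size for the stacked position/velocity/rotation/angular-velocity errors:

```python
import numpy as np

def mlp_init(in_dim, hidden=(64, 64), out_dim=4, seed=0):
    # two fully connected hidden layers of 64 neurons (the "64 x 64" above)
    rng = np.random.default_rng(seed)
    dims = [in_dim, *hidden, out_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)      # hidden activations; output layer is linear
    return x

# Actor: state error -> 4 actions (vx, vy, vz, yaw rate).
actor = mlp_init(in_dim=12, out_dim=4)
# Critic: same input -> scalar state value.
critic = mlp_init(in_dim=12, out_dim=1)

err = np.zeros(12)
print(mlp_forward(actor, err).shape, mlp_forward(critic, err).shape)
```

For reinforcement learning network _2 the same actor shape would simply end in two outputs (the damping and stiffness parameters) instead of four.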
Further, the impedance control model adopts a second-order impedance model, the specific formula of which is shown as (1.1):

k1·ÿ + k2·ẏ + k3·y = F    (1.1)

In formula (1.1), F on the right-hand side is a 6-dimensional vector formed by the corresponding external force and external moment; k1 is the target inertia matrix, k2 is the target damping matrix, and k3 is the target stiffness matrix; y is the position error value, ẏ is the velocity error value, and ÿ is the acceleration error value.
The solution is obtained with the classical fourth-order Runge-Kutta method. Written as a second-order differential equation, (1.1) becomes (1.2):

ÿ = k1⁻¹·(F − k2·ẏ − k3·y)    (1.2)

Introducing a variable z and letting z = ẏ, the original equation can be reduced to the first-order system shown in (1.3):

ẏ = z
ż = k1⁻¹·(F − k2·z − k3·y)    (1.3)

Solving this system by the fourth-order R-K method, the values of y and z at successive points x are obtained as in (1.4):

y_{n+1} = y_n + (h/6)·(K1 + 2K2 + 2K3 + K4)
z_{n+1} = z_n + (h/6)·(L1 + 2L2 + 2L3 + L4)    (1.4)

where K_i and L_i are the standard fourth-order Runge-Kutta slope evaluations of ẏ and ż at the sub-steps of step size h. In formula (1.4), h is a hyper-parameter (the step size) and y_n is the estimated trajectory error value (original desired trajectory − estimated trajectory error value = current trajectory target position, and the current trajectory target position is recorded as the expected state value);
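A direct implementation of this fourth-order Runge-Kutta integration of the impedance system (scalar case, k1 = 1, with an assumed constant external force) might look like:

```python
# Fourth-order Runge-Kutta integration of the second-order impedance model
# k1*y'' + k2*y' + k3*y = F, reduced (via z = y') to the first-order system
# y' = z, z' = (F - k2*z - k3*y) / k1, as in equations (1.2)-(1.4).
def impedance_rk4(F, k2, k3, y0=0.0, z0=0.0, h=0.01, steps=1000, k1=1.0):
    def f(y, z):
        return z, (F - k2 * z - k3 * y) / k1
    y, z = y0, z0
    for _ in range(steps):
        ky1, kz1 = f(y, z)
        ky2, kz2 = f(y + h / 2 * ky1, z + h / 2 * kz1)
        ky3, kz3 = f(y + h / 2 * ky2, z + h / 2 * kz2)
        ky4, kz4 = f(y + h * ky3, z + h * kz3)
        y += h / 6 * (ky1 + 2 * ky2 + 2 * ky3 + ky4)
        z += h / 6 * (kz1 + 2 * kz2 + 2 * kz3 + kz4)
    return y  # estimated state error y_n after steps*h units of time

# A constant external force settles at the static deflection F / k3.
y_final = impedance_rk4(F=2.0, k2=2.0, k3=4.0)
print(y_final)
```

After the transient dies out (here ζ = 0.5, ω = 2), the response converges to the static deflection F/k3 = 0.5, which is exactly the "estimated state error" that gets subtracted from the desired trajectory.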
since the second-order impedance system should meet the requirement of stable second-order system parameters, the second-order linear system function is shown as (1.5):
the relationship between the damping factor ζ and the frequency w is shown in the formula (1.6):
equation (1.1) can be converted to equation (1.7):
to simplify the calculation, k 1 Can be taken as a unit matrix, then equation (1.8) can be converted to equation (1.9):
k 2 =2ζω
k 3 =ω 2 (1.9)
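The reduction in (1.9) maps a chosen damping ratio and natural frequency of a stable second-order system directly to the impedance parameters; as a tiny sketch (the example values are arbitrary):

```python
# Equation (1.9): with k1 taken as the identity, the damping and stiffness
# parameters follow from a chosen damping ratio zeta and frequency omega.
def impedance_params(zeta, omega):
    k2 = 2.0 * zeta * omega   # damping coefficient
    k3 = omega ** 2           # stiffness coefficient
    return k2, k3

# e.g. a critically damped response (zeta = 1) at omega = 3 rad/s:
k2, k3 = impedance_params(zeta=1.0, omega=3.0)
print(k2, k3)  # 6.0 9.0
```

Parameterizing by (ζ, ω) instead of (k2, k3) is what lets the learned policy stay inside the stable region of the second-order system.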
the parameter values of the formula (1.1) are trained by using reinforcement learning, and the training aims to minimize the overall energy consumption and the rigid body damage, so that a target reward _2 in the optimized impedance control model can be designed, as shown in the formula (1.10):
Cb=1-e -1/2||goal-real_state||
Ce=w 2
R=r1Cb+r2Ce (1.10)
in the formula (1.10), good is a target state value, real _ state is a current actual state value, w is a frequency value in the formula (1.7), R1 and R2 are fixed hyper-parameters, and R is reward _2. In addition, the optimized reward _2 can be selected from, but not limited to, the formula (1.10).
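Formula (1.10) can be computed directly. One sketch assumption here: since Cb grows with the tracking error and Ce with the frequency (hence with the impedance parameters and energy use), a trainer minimizing energy and damage would drive R down, e.g. by maximizing −R; the weights r1, r2 below are made-up values:

```python
import math

# reward_2 of equation (1.10): a tracking-error term Cb plus an
# energy/impedance-parameter term Ce, weighted by fixed hyper-parameters.
def reward_2(goal, real_state, w, r1=1.0, r2=0.01):
    dist = math.sqrt(sum((g - s) ** 2 for g, s in zip(goal, real_state)))
    cb = 1.0 - math.exp(-0.5 * dist)   # grows with ||goal - real_state||
    ce = w ** 2                        # grows with the frequency w of (1.7)
    return r1 * cb + r2 * ce

# perfect tracking: Cb = 0, so only the energy term r2 * w**2 remains
r = reward_2(goal=[0.0, 0.0, 1.0], real_state=[0.0, 0.0, 1.0], w=2.0)
print(r)
```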
Has the advantages that:
(1) The invention uses a reinforcement learning network, established with a reinforcement learning algorithm and combined with the dynamic environment, to realize stable flight control of the unmanned aerial vehicle. It avoids the complicated parameter-tuning process of traditional unmanned aerial vehicle control methods and needs no repeated experiments to determine parameters; once the policy network is generated, the control instruction of the unmanned aerial vehicle can be generated directly from the current state and the target state value.
(2) The invention adopts the reinforcement learning network established by the reinforcement learning algorithm and is combined with the impedance environment, and variable impedance control can be realized, so that the unmanned aerial vehicle can adapt to the environment, and the unmanned aerial vehicle and other objects in the current environment are protected.
(3) The invention combines reinforcement learning with variable impedance: under an uncontrollable external force, the variable impedance model trained through reinforcement learning changes the target state value according to the magnitude of the external force, and with the cooperative action of the dynamics model trained through reinforcement learning, compliant control of the flight process of the unmanned aerial vehicle can be realized, thereby reducing energy consumption while meeting the target displacement.
(4) The control method provided by the invention is simple to operate, avoids a complicated parameter adjustment process, can actively adjust the flying flexibility of the unmanned aerial vehicle according to the current external environment state in the flying process, can avoid the damage of the rigid body of the unmanned aerial vehicle, can consume less energy, and has a good application prospect in the field of unmanned aerial vehicle flight control.
Drawings
Fig. 1 is a schematic flow chart of the unmanned aerial vehicle variable impedance flight control method based on reinforcement learning.
Detailed Description
The present invention is further illustrated with reference to specific embodiments; unless otherwise specified, the processes are conventional, and the materials used are commercially available from public sources.
Example 1
The reinforced learning-based unmanned aerial vehicle variable impedance flight control method specifically operates as follows:
(1) Construct a reinforcement learning network _1 that takes the dynamic model as its interactive environment: the target state error is the input of the reinforcement learning network _1, and the corresponding output of the reinforcement learning network _1 is the speeds in the three directions x, y and z plus a yaw rate. The output of the reinforcement learning network _1 (i.e. the three speeds and the yaw rate) is then taken as the input of the dynamic environment (i.e. the dynamic model), and the output of the dynamic environment is correspondingly the current actual state of the unmanned aerial vehicle;
(2) The reinforcement learning network _1 is run repeatedly according to step (1) until reward _1 in the reinforcement learning network _1 no longer increases (reward _1 rewards minimizing the error value between the current state and the target state of the unmanned aerial vehicle and minimizing the number of collisions of the unmanned aerial vehicle);
(3) Constructing a reinforcement learning network _2 taking the impedance control model as an interactive environment, taking the target state error as the input of the reinforcement learning network _2, and correspondingly taking the output of the reinforcement learning network _2 as the damping parameter and the rigidity parameter of the impedance control model; taking the output of the reinforcement learning network _2 as the input of an impedance environment (or a second-order system environment or an impedance control model), and taking the output of the impedance environment as an estimated state error correspondingly;
(4) When reward _1 in step (2) no longer increases, the difference between the correspondingly output current actual state, the estimated state error output in step (3), and the expected state value is applied to the reinforcement learning network _1 and the reinforcement learning network _2 respectively, so that the reinforcement learning network _1 (operating as in step (1)) and the reinforcement learning network _2 (operating as in step (3)) run simultaneously. From then on, the difference between the current actual state output by the reinforcement learning network _1, the estimated state error output by the reinforcement learning network _2, and the expected state value is used to drive both networks, and so on, until reward _1 in the reinforcement learning network _1 and reward _2 in the reinforcement learning network _2 no longer increase, at which point the regulation of the flight state of the unmanned aerial vehicle in the variable impedance environment is completed;
the target state error comprises a position error, a speed error, a rotation matrix error and an angular speed error. Reward _1, set in the reinforcement learning network _1, is used to realize stable flight of the unmanned aerial vehicle and is defined by the error value between the current state and the target state of the unmanned aerial vehicle and the number of collisions of the unmanned aerial vehicle. The estimated state error is the state error, relative to the original target position, produced by an external force. Reward _2, set in the reinforcement learning network _2, is used to realize variable impedance operation: when the external force is large, the unmanned aerial vehicle produces a large displacement in the direction of the external force, and when the external force is small it produces a small displacement; the attitude of the unmanned aerial vehicle likewise changes with the external torque. In addition, energy consumption is related to the impedance parameter values — the larger the impedance parameter values, the more energy is consumed — so reward _2 is defined by the impedance parameter values and the displacement produced by the unmanned aerial vehicle under the external force, and the formula for optimizing reward _2 is shown as formula (1.10). The reinforcement learning network _1 and the reinforcement learning network _2 are based on the PPO model architecture built on the garage framework, in which the actor network and the critic network each consist of two fully connected layers with 64 neurons per layer (64 × 64).
Cb = 1 − e^(−(1/2)·||goal − real_state||)

Ce = ω²

R = r1·Cb + r2·Ce    (1.10)

In formula (1.10), goal is the target state value, real_state is the current actual state value, ω is the frequency, r1 and r2 are fixed hyper-parameters, and R is reward _2.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. An unmanned aerial vehicle variable impedance flight control method based on reinforcement learning is characterized in that: respectively establishing a reinforcement learning network _1 for outputting a flight control command in a dynamic model and a reinforcement learning network _2 for outputting impedance parameters in an impedance control model by adopting a reinforcement learning algorithm, inputting the flight control command into the dynamic model to obtain the current actual state of the unmanned aerial vehicle, inputting the impedance parameters into the impedance control model to obtain the estimated state error of the unmanned aerial vehicle, and simultaneously acting the difference values of the actual state, the estimated state error and the expected state on the reinforcement learning network _1 and the reinforcement learning network _2, so that the purpose of flexibly controlling the unmanned aerial vehicle in the flight process can be achieved.
2. The unmanned aerial vehicle variable impedance flight control method based on reinforcement learning of claim 1, characterized in that: the reinforced learning-based unmanned aerial vehicle variable impedance flight control method specifically operates as follows:
(1) Constructing a reinforcement learning network _1 taking a dynamic model as an interactive environment, taking a target state error as the input of the reinforcement learning network _1, and correspondingly outputting the reinforcement learning network _1 as the speeds in three directions of x, y and z and a yaw rate; taking the output of the reinforcement learning network _1 as the input of the dynamic model, and correspondingly taking the output of the dynamic model as the current actual state of the unmanned aerial vehicle;
(2) Repeatedly running the reinforcement learning network _1 according to the step (1) until the reward _1 in the reinforcement learning network _1 is not increased any more;
(3) Constructing a reinforcement learning network _2 taking the impedance control model as an interactive environment, taking the target state error as the input of the reinforcement learning network _2, and correspondingly taking the output of the reinforcement learning network _2 as the damping parameter and the rigidity parameter of the impedance control model; taking the output of the reinforcement learning network _2 as the input of an impedance control model, and correspondingly taking the output of the impedance control model as an estimated state error;
(4) When reward _1 in step (2) no longer increases, the difference between the correspondingly output current actual state, the estimated state error output in step (3), and the expected state value is applied to the reinforcement learning network _1 and the reinforcement learning network _2 respectively, so that the reinforcement learning network _1 (operating as in step (1)) and the reinforcement learning network _2 (operating as in step (3)) run simultaneously. From then on, the difference between the current actual state output by the reinforcement learning network _1, the estimated state error output by the reinforcement learning network _2, and the expected state value is used to drive both networks, and so on, until reward _1 in the reinforcement learning network _1 and reward _2 in the reinforcement learning network _2 no longer increase, at which point the regulation of the flight state of the unmanned aerial vehicle in the variable impedance environment is completed;
the target state error comprises a position error, a speed error, a rotation matrix error and an angular speed error; the reward _1 is set as an error value of the current state and the target state of the unmanned aerial vehicle and the collision frequency of the unmanned aerial vehicle; the estimated state error is a state error value generated with an original target position due to external force; the reward _2 is set as an impedance parameter value and a displacement value generated by the unmanned aerial vehicle under the action of external force.
3. The unmanned aerial vehicle variable impedance flight control method based on reinforcement learning of claim 1 or 2, wherein: the overall structures of the reinforcement learning network _1 and the reinforcement learning network _2 are based on the PPO model architecture, and the PPO model is built on the garage framework.
4. The unmanned aerial vehicle variable impedance flight control method based on reinforcement learning of claim 3, wherein: the actor network and the critic network in the reinforcement learning network _1 and the reinforcement learning network _2 based on the PPO model architecture each consist of two fully connected layers.
5. The unmanned aerial vehicle variable impedance flight control method based on reinforcement learning of claim 4, wherein: when the actor network and the critic network each consist of two fully connected layers, the number of neurons is 64 × 64 (64 neurons per layer).
6. The unmanned aerial vehicle variable impedance flight control method based on reinforcement learning of claim 1 or 2, wherein the formula of the target reward _2 in the optimized impedance control model is as follows:

Cb = 1 − e^(−(1/2)·||goal − real_state||)

Ce = ω²

R = r1·Cb + r2·Ce

where goal is the target state value, real_state is the current actual state value, ω is the frequency, r1 and r2 are fixed hyper-parameters, and R is reward _2.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211234445.7A | 2022-10-10 | 2022-10-10 | Unmanned aerial vehicle variable impedance flight control method based on reinforcement learning
Publications (1)

Publication Number | Publication Date
---|---
CN115562322A | 2023-01-03
Family
ID=84744539
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202211234445.7A | Unmanned aerial vehicle variable impedance flight control method based on reinforcement learning | 2022-10-10 | 2022-10-10

Country Status (1)

Country | Link
---|---
CN | CN115562322A (en)
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116643501A | 2023-07-18 | 2023-08-25 | 湖南大学 | Variable impedance control method and system for aerial working robot under stability constraint
CN116643501B | 2023-07-18 | 2023-10-24 | 湖南大学 | Variable impedance control method and system for aerial working robot under stability constraint
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination