CN112486192B - Aircraft guided transfer learning training algorithm based on destination movement prediction - Google Patents
Aircraft guided transfer learning training algorithm based on destination movement prediction
- Publication number
- CN112486192B (application CN202011294913.0A)
- Authority
- CN
- China
- Prior art keywords
- aircraft
- agent
- destination
- quadruple
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses an aircraft guidance transfer learning training algorithm based on destination movement prediction, comprising the following steps: first, set up a kinematics and dynamics model of the aircraft and train a fixed-destination guidance agent B; next, set up a movement model of the moving destination, set an agent selection factor and its update method, and initialize a moving-destination guidance agent A; then, according to the agent selection factor, either use agent A to generate an episode of data, or use agent B to update an episode of data by the destination movement prediction method; finally, train the moving-destination guidance agent with a reinforcement learning algorithm so that it guides the aircraft to reach the moving destination along a specific direction. The method can be applied to an automatic aircraft guidance system: based on an existing guidance agent, it trains a new agent applicable to a new scenario and guides the aircraft from any attitude to a moving destination along a specific direction.
Description
Technical Field
The invention relates to the field of aircraft guidance control, in particular to an aircraft guidance transfer learning training algorithm based on destination movement prediction.
Background
In many flight missions, an aircraft must be guided to a moving destination along a particular direction, and the same aircraft often faces destinations with different movement patterns in different missions. For example: landing an aircraft on an aircraft carrier under different carrier movement modes; reaching the refueling station position during aerial refueling; or reaching an advantageous position in air combat.
Reinforcement learning executes efficiently and is flexible to use, and it has been studied widely in the field of aircraft guidance, for example guiding a small unmanned aerial vehicle to land on a moving vehicle, guiding a fixed-wing aircraft to reach an airport along the runway direction, and guiding a carrier-based aircraft to reach the approach point of an aircraft carrier. Different destinations move in different ways; if each scenario is trained from scratch, more time is needed and the success rate cannot be guaranteed. Combining reinforcement learning with transfer learning makes the method usable in aircraft guidance tasks. For an aircraft, a guidance agent is first trained with the capability of guiding the aircraft to a fixed destination. Taking this agent as a baseline agent, for a new guidance task whose destination moves in a different way, training with data generated by the baseline agent through destination movement prediction can improve both the training speed of the agent and the guidance success rate of the aircraft. This has practical significance for flight guidance tasks with destinations in different movement modes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an aircraft guidance transfer learning training algorithm based on destination movement prediction.
To achieve this purpose, the invention adopts the following technical scheme:
an aircraft guidance transfer learning training algorithm based on destination movement prediction, comprising the following steps:
(1) set up the kinematics and dynamics model of the aircraft, take the aircraft guidance environment with a fixed destination as the training environment, and train a baseline agent B with a reinforcement learning algorithm; the current state s_t includes aircraft attitude and destination attitude information, and the aircraft action a_t in the current state is a guidance instruction for the aircraft;
(2) set the destination movement model according to the destination movement characteristics, construct an aircraft guidance training environment with a moving destination, set the agent selection factor p used in training and its update method, initialize a guidance agent A whose input and output are the same as those of agent B, and start training agent A;
(3) use guidance agent A to execute an episode, taking the episode as the unit; for each time step t in the episode, record four data items: the current state s_t, the aircraft action a_t in the current state, the reward r_t, and the next state s_{t+1}, stored as the quadruple (s_t, a_t, r_t, s_{t+1});
(4) generate a random number in the range [0, 1]; if the random number is smaller than the agent selection factor, update the quadruples using agent B; if it is greater than or equal to the agent selection factor, do not update the quadruples;
(5) train with a reinforcement learning algorithm and update the agent selection factor p.
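The five steps above can be sketched as a single training loop. All names here (`rollout`, `update_with_baseline`, `train_step`) are illustrative placeholders supplied by the caller, not the patent's implementation; the sketch only shows how the agent selection factor p gates the choice between raw episodes from agent A and episodes rewritten with baseline agent B:

```python
import random

def run_training(rollout, update_with_baseline, train_step,
                 episodes, p=1.0, k=0.001, rng=random.random):
    """Sketch of steps (3)-(5): generate an episode with agent A, optionally
    rewrite its quadruples with baseline agent B (probability p), train,
    then decay the agent selection factor p."""
    for _ in range(episodes):
        quadruples = rollout()                 # step (3): list of (s, a, r, s_next)
        if rng() < p:                          # step (4): random number vs. selection factor
            quadruples = update_with_baseline(quadruples)
        train_step(quadruples)                 # step (5): reinforcement-learning update
        p = max(0.0, p - k)                    # p := p - K, clamped at 0
    return p
```

Early in training p is near 1, so most episodes are rewritten using the baseline agent; as p decays toward 0, agent A's own experience dominates.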
As a preferred technical solution, in step (4), when the quadruples are to be updated, the update proceeds as follows:
loop from the start step t = 0 to the end step t = T; for each step t = N in the loop, take every quadruple (s_t, a_t, r_t, s_{t+1}) from t = 0 to t = N and replace the destination attitude information in s_t with that of the quadruple at time t = N, forming a new state s'_t that contains the new destination attitude information;
starting at t = 0 and ending at t = N, use baseline agent B with the replaced state s'_t as input to generate new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); if at any moment in this stage the aircraft successfully reaches the destination, the quadruple update process ends;
if, from the start step t = 0 to the episode end step t = T, no replacement makes the aircraft successfully reach the destination, record the last executed quadruple;
replace the aircraft attitude information in the original quadruples (s_t, a_t, r_t, s_{t+1}) with the aircraft attitude information from the new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); for an online-training reinforcement learning method, train directly on the stored new quadruples; for an offline-training reinforcement learning method, store the new quadruples in the experience pool;
for the case where the quadruples are not updated: for an online-training reinforcement learning method, train directly on the stored quadruples; for an offline-training reinforcement learning method, store the quadruples in the experience pool.
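A minimal sketch of this update step, under the assumption that states are dictionaries with separate aircraft and destination fields, and that `agent_b`, `step`, and `reached` are caller-supplied stand-ins for the baseline agent, the simulation step, and the arrival test (none of these names come from the patent):

```python
def update_quadruples(quadruples, agent_b, step, reached):
    """For each prediction horizon N, freeze the destination at its t = N
    attitude, then replay from t = 0 with baseline agent B; stop at the first
    horizon where the aircraft reaches the (frozen) destination."""
    T = len(quadruples)
    for n in range(T):
        dest_n = quadruples[n][0]["destination"]       # destination attitude at t = N
        state = dict(quadruples[0][0], destination=dest_n)
        new_quads = []
        for t in range(n + 1):
            action = agent_b(state)                    # baseline agent drives the replay
            next_state, reward = step(state, action)
            new_quads.append((state, action, reward, next_state))
            state = next_state
            if reached(state):                         # success: keep this replay
                return new_quads
    return quadruples                                  # no horizon succeeded: keep originals
```

The returned quadruples then feed the online training step or the experience pool, as described above.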
As a preferred technical solution, in step (5), the agent selection factor p is updated according to the formula p ← p − K, where K is the decay factor with value range [0, 1]; when the agent selection factor falls below 0, updating stops and the factor is set to 0.
Compared with the prior art, the invention has the following advantages and effects:
(1) the invention combines transfer learning with reinforcement learning: based on a baseline agent trained on a fixed destination, it performs destination prediction and updates the training data, which improves the training speed of the agent and trains aircraft guidance agents for different tasks more efficiently in scenarios with different moving destinations.
(2) The invention applies the aircraft guidance transfer learning training algorithm based on destination movement prediction to an automatic aircraft guidance system, quickly trains a guidance agent for a new task, and generates instructions that guide the aircraft to reach a moving destination along a specific direction, which has practical significance for the aircraft when executing new flight missions.
Drawings
FIG. 1 is a flowchart of the aircraft guidance transfer learning training algorithm based on destination movement prediction according to this embodiment;
fig. 2 is a graph comparing the change in success rate during training of the straight-line-moving-destination guidance agent for this embodiment's algorithm;
fig. 3 is a route chart of a guidance example for an aircraft flying to a straight-line-moving destination using this embodiment's algorithm.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In this embodiment, the aircraft is guided to fly in a scenario where the destination moves in a straight line. The aircraft guidance transfer learning training algorithm based on destination movement prediction uses an existing trained fixed-destination aircraft guidance agent as the baseline agent, so that in this scenario, compared with training from scratch, a guidance agent can be trained faster to guide the aircraft to reach the moving destination along a specific direction; as shown in fig. 1, the method comprises the steps of:
(1) set up the kinematics and dynamics model of the aircraft, take the aircraft guidance environment with a fixed destination as the training environment, and train a baseline agent B with a reinforcement learning algorithm; the current state s_t includes aircraft attitude and destination attitude information, and the aircraft action a_t in the current state is a guidance instruction for the aircraft;
in this embodiment, the aircraft dynamics model is expressed in the following variables: (x, y, z) are the three-dimensional coordinates of the aircraft; (v_x, v_y, v_z) are the velocity components of the aircraft in the three directions; θ is the pitch angle of the aircraft and θ̇ its rate of change; ψ is the heading angle of the aircraft and ψ̇ its rate of change; v is the speed of the aircraft and v̇ its rate of change; m is the mass of the aircraft; L, D and T are respectively the lift, drag and thrust of the aircraft; α is the angle of attack of the aircraft; and φ is the roll angle of the aircraft.
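The patent text lists the model's variables, but the equations themselves appear only as formula images in the original. A conventional three-degree-of-freedom point-mass model over the same variables can be sketched as follows; this is an illustrative stand-in consistent with the variable list, not necessarily the patent's exact model:

```python
import math

def aircraft_derivatives(state, thrust, alpha, phi, lift, drag, m, g=9.81):
    """state = (x, y, z, v, theta, psi); returns the time derivatives.
    theta: pitch (flight-path) angle, psi: heading angle, phi: roll angle,
    alpha: angle of attack; thrust/lift/drag in newtons, m in kilograms."""
    x, y, z, v, theta, psi = state
    dx = v * math.cos(theta) * math.cos(psi)
    dy = v * math.cos(theta) * math.sin(psi)
    dz = v * math.sin(theta)
    # Along-track: thrust component minus drag, minus gravity projection.
    dv = (thrust * math.cos(alpha) - drag) / m - g * math.sin(theta)
    # Normal force (rolled lift plus thrust component) bends the flight path.
    dtheta = ((thrust * math.sin(alpha) + lift) * math.cos(phi)
              - m * g * math.cos(theta)) / (m * v)
    dpsi = (thrust * math.sin(alpha) + lift) * math.sin(phi) / (m * v * math.cos(theta))
    return dx, dy, dz, dv, dtheta, dpsi
```

In steady level flight (thrust balancing drag, lift balancing weight, wings level) all rates except the along-track position rate vanish, which is a quick sanity check on the signs.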
(2) Set the destination movement model according to the destination movement characteristics, construct an aircraft guidance training environment with a moving destination, set the agent selection factor p used in training and its update method, initialize a guidance agent A whose input and output are the same as those of agent B, and start training agent A.
In this embodiment, the destination moves in a straight line at constant speed, and the agent selection factor is initialized to 1.
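The uniform straight-line destination movement of this embodiment amounts to per-axis linear extrapolation; a minimal sketch (the function name and tuple representation are illustrative):

```python
def destination_position(p0, velocity, t):
    """Uniform straight-line destination motion:
    position = initial position + velocity * t, applied per axis."""
    return tuple(p + v * t for p, v in zip(p0, velocity))
```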
(3) Use guidance agent A to execute an episode, taking the episode as the unit; for each time step t in the episode, record four data items: the current state s_t, the aircraft action a_t in the current state, the reward r_t, and the next state s_{t+1}, stored as the quadruple (s_t, a_t, r_t, s_{t+1}).
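Executing one episode while recording a quadruple per time step can be sketched as below; `env_reset`, `env_step`, and `agent` are hypothetical callables standing in for the training environment and guidance agent A:

```python
def run_episode(env_reset, env_step, agent, max_steps=500):
    """Execute one episode with the guidance agent, recording the quadruple
    (s_t, a_t, r_t, s_{t+1}) at every time step until termination."""
    quadruples = []
    state = env_reset()
    for _ in range(max_steps):
        action = agent(state)                          # guidance instruction a_t
        next_state, reward, done = env_step(state, action)
        quadruples.append((state, action, reward, next_state))
        state = next_state
        if done:                                       # destination reached or episode over
            break
    return quadruples
```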
(4) Generate a random number in the range [0, 1]; if the random number is smaller than the agent selection factor, update the quadruples using agent B according to the following steps; if it is greater than or equal to the agent selection factor, do not update the quadruples:
loop from the start step t = 0 to the end step t = T; for each step t = N in the loop, take every quadruple (s_t, a_t, r_t, s_{t+1}) from t = 0 to t = N and replace the destination attitude information in s_t with that of the quadruple at time t = N, forming a new state s'_t that contains the new destination attitude information;
starting at t = 0 and ending at t = N, use baseline agent B with the replaced state s'_t as input to generate new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); if at any moment in this stage the aircraft successfully reaches the destination, the quadruple update process ends;
if, from the start step t = 0 to the episode end step t = T, no replacement makes the aircraft successfully reach the destination, record the last executed quadruple;
replace the aircraft attitude information in the original quadruples (s_t, a_t, r_t, s_{t+1}) with the aircraft attitude information from the new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); for an online-training reinforcement learning method, train directly on the stored new quadruples; for an offline-training reinforcement learning method, store the new quadruples in the experience pool;
for the case where the quadruples are not updated: for an online-training reinforcement learning method, train directly on the stored quadruples; for an offline-training reinforcement learning method, store the quadruples in the experience pool;
in this embodiment, the proximal policy optimization (PPO) deep reinforcement learning method is used to train the guidance agent, which continuously learns by itself in the aircraft guidance training environment;
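PPO's central ingredient is the clipped surrogate objective; a minimal scalar version is sketched below (real implementations average this over batched trajectories and add value and entropy terms). It illustrates the method named in the text, not the embodiment's actual training code:

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate for one sample.
    ratio = pi_new(a|s) / pi_old(a|s); clipping to [1 - eps, 1 + eps]
    keeps each policy update from moving too far from the old policy."""
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped * advantage)
```

The objective is maximized over the policy parameters; the `min` makes the bound pessimistic, so overly large probability-ratio changes earn no extra credit.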
In this embodiment, the guidance success rate keeps increasing during agent training; as shown in fig. 2, compared with training from scratch, the aircraft guidance transfer learning training algorithm based on destination movement prediction converges faster.
(5) Train with the reinforcement learning method and update the agent selection factor p;
update the agent selection factor p according to the formula p ← p − K, where K is the decay factor with value range [0, 1]; when the agent selection factor falls below 0, stop updating and set the factor to 0;
in this embodiment, K takes the value 0.001; as shown in fig. 3, the trained guidance agent generates accurate guidance instructions to guide the aircraft to reach the moving destination along a specific direction.
The above embodiment expresses only one implementation of the present invention, and although its description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the claims.
Claims (3)
1. An aircraft guidance transfer learning training algorithm based on destination movement prediction, characterized by comprising the following steps:
(1) set up the kinematics and dynamics model of the aircraft, take the aircraft guidance environment with a fixed destination as the training environment, and train a baseline agent B with a reinforcement learning method; the current state s_t includes aircraft attitude and destination attitude information, and the aircraft action a_t in the current state is a guidance instruction for the aircraft;
(2) set the destination movement model according to the destination movement characteristics, construct an aircraft guidance training environment with a moving destination, set the agent selection factor p used in training and its update method, initialize a guidance agent A whose input and output are the same as those of agent B, and start training agent A;
(3) use guidance agent A to execute an episode, taking the episode as the unit; for each time step t in the episode, record four data items: the current state s_t, the aircraft action a_t in the current state, the reward r_t, and the next state s_{t+1}, stored as the quadruple (s_t, a_t, r_t, s_{t+1});
(4) generate a random number in the range [0, 1]; if the random number is smaller than the agent selection factor, update the quadruples using agent B; if it is greater than or equal to the agent selection factor, do not update the quadruples;
(5) train with a reinforcement learning method and update the agent selection factor p.
2. The aircraft guidance transfer learning training algorithm based on destination movement prediction according to claim 1, characterized in that in step (4), when the quadruples are to be updated, the update proceeds as follows:
loop from the start step t = 0 to the end step t = T; for each step t = N in the loop, take every quadruple (s_t, a_t, r_t, s_{t+1}) from t = 0 to t = N and replace the destination attitude information in s_t with that of the quadruple at time t = N, forming a new state s'_t that contains the new destination attitude information;
starting at t = 0 and ending at t = N, use baseline agent B with the replaced state s'_t as input to generate new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); if at any moment in this stage the aircraft successfully reaches the destination, the quadruple update process ends;
if, from the start step t = 0 to the episode end step t = T, no replacement makes the aircraft successfully reach the destination, record the last executed quadruple; replace the aircraft attitude information in the original quadruples (s_t, a_t, r_t, s_{t+1}) with the aircraft attitude information from the new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); for an online-training reinforcement learning method, train directly on the stored new quadruples; for an offline-training reinforcement learning method, store the new quadruples in the experience pool;
for the case where the quadruples are not updated: for an online-training reinforcement learning method, train directly on the stored quadruples; for an offline-training reinforcement learning method, store the quadruples in the experience pool.
3. The aircraft guidance transfer learning training algorithm based on destination movement prediction according to claim 1, characterized in that in step (5), the agent selection factor p is updated according to the formula p ← p − K, where K is the decay factor with value range [0, 1]; when the agent selection factor falls below 0, updating stops and the factor is set to 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011294913.0A CN112486192B (en) | 2020-11-18 | 2020-11-18 | Aircraft guided transfer learning training algorithm based on destination movement prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112486192A CN112486192A (en) | 2021-03-12 |
CN112486192B true CN112486192B (en) | 2022-04-08 |
Family
ID=74931399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011294913.0A Active CN112486192B (en) | 2020-11-18 | 2020-11-18 | Aircraft guided transfer learning training algorithm based on destination movement prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112486192B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026157A (en) * | 2019-12-18 | 2020-04-17 | 四川大学 | Intelligent aircraft guiding method based on reward remodeling reinforcement learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11176370B2 (en) * | 2018-07-02 | 2021-11-16 | United States Of America As Represented By The Secretary Of The Air Force | Diffusion maps and transfer subspace learning |
CN109164821B (en) * | 2018-09-26 | 2019-05-07 | 中科物栖(北京)科技有限责任公司 | A kind of UAV Attitude training method and device |
CN113396428B (en) * | 2019-03-05 | 2024-05-07 | 赫尔实验室有限公司 | Learning system, computer program product and method for multi-agent application |
CN111027143B (en) * | 2019-12-18 | 2020-12-04 | 四川大学 | Shipboard aircraft approach guiding method based on deep reinforcement learning |
CN111859541B (en) * | 2020-07-17 | 2022-10-14 | 西北工业大学 | PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement |
- 2020-11-18: CN application CN202011294913.0A granted as patent CN112486192B (status: active)
Non-Patent Citations (3)
Title |
---|
Design of Agent Training Environment for Aircraft Landing Guidance Based on Deep Reinforcement Learning; Zhuang Wang et al.; 2018 11th International Symposium on Computational Intelligence and Design (ISCID); 2019-04-25; pp. 76-79 *
Research on reinforcement learning assisted by expert knowledge and its application in UAV path planning; Wang Guofang; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 2017-08-15 (No. 08); pp. i-ii, 12 *
Information-strength-guided heuristic Q-learning with online updating; Wu Haolin et al.; Application Research of Computers; 2017-07-21 (No. 08); pp. 89-93 *
Also Published As
Publication number | Publication date |
---|---|
CN112486192A (en) | 2021-03-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||