CN112486192B - Aircraft guided transfer learning training algorithm based on destination movement prediction - Google Patents
Aircraft guided transfer learning training algorithm based on destination movement prediction
- Publication number
- CN112486192B (application CN202011294913.0A)
- Authority
- CN
- China
- Prior art keywords
- aircraft
- agent
- destination
- quadruple
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses an aircraft guidance transfer learning training algorithm based on destination movement prediction, comprising the following steps: first, set up a kinematics and dynamics model of the aircraft and train a fixed-destination guidance agent B; next, set up a movement model of the moving destination, set an agent selection factor and its update method, and initialize a moving-destination guidance agent A; then, according to the agent selection factor, either use agent A to generate an episode of data, or use agent B to update an episode of data by the destination movement prediction method; finally, train the moving-destination guidance agent with a reinforcement learning algorithm so that it guides the aircraft to reach the moving destination along a specific direction. The method can be applied to an automatic aircraft guidance system: based on an existing guidance agent, it trains a new agent applicable to a new scenario and guides the aircraft from any attitude to a moving destination along a specific direction.
Description
Technical Field
The invention relates to the field of aircraft guidance control, in particular to an aircraft guidance transfer learning training algorithm based on destination movement prediction.
Background
In many flight missions, an aircraft must be guided to a moving destination along a particular direction, and the same aircraft often faces destinations with different movement patterns in different missions. For example: landing an aircraft on an aircraft carrier under different carrier movement modes; reaching the refueling station position during aerial refueling; or reaching an advantageous position in air combat.
Reinforcement learning executes efficiently and is flexible to use, and it has been studied widely in the field of aircraft guidance, for example guiding a small unmanned aerial vehicle to land on a moving vehicle, guiding a fixed-wing aircraft to reach an airport along the runway direction, and guiding a carrier-based aircraft to reach the approach point of an aircraft carrier. Different destinations move in different ways; if each scenario is trained from scratch, more time is needed and the success rate cannot be guaranteed. Combining reinforcement learning with transfer learning makes the method usable in aircraft guidance tasks. For an aircraft, a guidance agent is first trained with the capability of guiding the aircraft to a fixed destination. Taking this agent as a baseline agent, for a new guidance task whose destination moves in a different way, training with data generated by the baseline agent through destination movement prediction can improve both the training speed of the agent and the guidance success rate of the aircraft. This has practical significance for flight guidance tasks with destinations in different movement modes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an aircraft guidance transfer learning training algorithm based on destination movement prediction.
To achieve this purpose, the invention adopts the following technical scheme:
an aircraft guidance transfer learning training algorithm based on destination movement prediction, comprising the following steps:
(1) set up the kinematics and dynamics model of the aircraft, take the aircraft guidance environment with a fixed destination as the training environment, and train a baseline agent B with a reinforcement learning algorithm; the current state s_t includes aircraft attitude and destination attitude information, and the aircraft action a_t in the current state is a guidance instruction for the aircraft;
(2) set the destination movement model according to the destination movement characteristics, construct an aircraft guidance training environment with a moving destination, set the agent selection factor p used in training and its update method, initialize a guidance agent A whose input and output are the same as those of agent B, and start training agent A;
(3) use guidance agent A to execute an episode, taking the episode as the unit; for each time step t in the episode, record four data items: the current state s_t, the aircraft action a_t in the current state, the reward r_t, and the next state s_{t+1}, stored as the quadruple (s_t, a_t, r_t, s_{t+1});
(4) generate a random number in the range [0, 1]; if the random number is smaller than the agent selection factor, update the quadruples using agent B; if it is greater than or equal to the agent selection factor, do not update the quadruples;
(5) train with a reinforcement learning algorithm and update the agent selection factor p.
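The five steps above can be sketched as a single training loop. All names here (`rollout`, `update_with_baseline`, `train_step`) are illustrative placeholders supplied by the caller, not the patent's implementation; the sketch only shows how the agent selection factor p gates the choice between raw episodes from agent A and episodes rewritten with baseline agent B:

```python
import random

def run_training(rollout, update_with_baseline, train_step,
                 episodes, p=1.0, k=0.001, rng=random.random):
    """Sketch of steps (3)-(5): generate an episode with agent A, optionally
    rewrite its quadruples with baseline agent B (probability p), train,
    then decay the agent selection factor p."""
    for _ in range(episodes):
        quadruples = rollout()                 # step (3): list of (s, a, r, s_next)
        if rng() < p:                          # step (4): random number vs. selection factor
            quadruples = update_with_baseline(quadruples)
        train_step(quadruples)                 # step (5): reinforcement-learning update
        p = max(0.0, p - k)                    # p := p - K, clamped at 0
    return p
```

Early in training p is near 1, so most episodes are rewritten using the baseline agent; as p decays toward 0, agent A's own experience dominates.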
As a preferred technical solution, in step (4), when the quadruples are to be updated, the update proceeds as follows:
loop from the start step t = 0 to the end step t = T; for each step t = N in the loop, take every quadruple (s_t, a_t, r_t, s_{t+1}) from t = 0 to t = N and replace the destination attitude information in s_t with that of the quadruple at time t = N, forming a new state s'_t that contains the new destination attitude information;
starting at t = 0 and ending at t = N, use baseline agent B with the replaced state s'_t as input to generate new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); if at any moment in this stage the aircraft successfully reaches the destination, the quadruple update process ends;
if, from the start step t = 0 to the episode end step t = T, no replacement makes the aircraft successfully reach the destination, record the last executed quadruple;
replace the aircraft attitude information in the original quadruples (s_t, a_t, r_t, s_{t+1}) with the aircraft attitude information from the new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); for an online-training reinforcement learning method, train directly on the stored new quadruples; for an offline-training reinforcement learning method, store the new quadruples in the experience pool;
for the case where the quadruples are not updated: for an online-training reinforcement learning method, train directly on the stored quadruples; for an offline-training reinforcement learning method, store the quadruples in the experience pool.
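A minimal sketch of this update step, under the assumption that states are dictionaries with separate aircraft and destination fields, and that `agent_b`, `step`, and `reached` are caller-supplied stand-ins for the baseline agent, the simulation step, and the arrival test (none of these names come from the patent):

```python
def update_quadruples(quadruples, agent_b, step, reached):
    """For each prediction horizon N, freeze the destination at its t = N
    attitude, then replay from t = 0 with baseline agent B; stop at the first
    horizon where the aircraft reaches the (frozen) destination."""
    T = len(quadruples)
    for n in range(T):
        dest_n = quadruples[n][0]["destination"]       # destination attitude at t = N
        state = dict(quadruples[0][0], destination=dest_n)
        new_quads = []
        for t in range(n + 1):
            action = agent_b(state)                    # baseline agent drives the replay
            next_state, reward = step(state, action)
            new_quads.append((state, action, reward, next_state))
            state = next_state
            if reached(state):                         # success: keep this replay
                return new_quads
    return quadruples                                  # no horizon succeeded: keep originals
```

The returned quadruples then feed the online training step or the experience pool, as described above.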
As a preferred technical solution, in step (5), the agent selection factor p is updated according to the formula p ← p − K, where K is the decay factor with value range [0, 1]; when the agent selection factor falls below 0, updating stops and the factor is set to 0.
Compared with the prior art, the invention has the following advantages and effects:
(1) the invention combines transfer learning with reinforcement learning: based on a baseline agent trained on a fixed destination, it performs destination prediction and updates the training data, which improves the training speed of the agent and trains aircraft guidance agents for different tasks more efficiently in scenarios with different moving destinations.
(2) The invention applies the aircraft guidance transfer learning training algorithm based on destination movement prediction to an automatic aircraft guidance system, quickly trains a guidance agent for a new task, and generates instructions that guide the aircraft to reach a moving destination along a specific direction, which has practical significance for the aircraft when executing new flight missions.
Drawings
FIG. 1 is a flowchart of the aircraft guidance transfer learning training algorithm based on destination movement prediction according to this embodiment;
fig. 2 is a graph comparing the change in success rate during training of the straight-line-moving-destination guidance agent for this embodiment's algorithm;
fig. 3 is a route chart of a guidance example for an aircraft flying to a straight-line-moving destination using this embodiment's algorithm.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In this embodiment, the aircraft is guided to fly in a scenario where the destination moves in a straight line. The aircraft guidance transfer learning training algorithm based on destination movement prediction uses an existing trained fixed-destination aircraft guidance agent as the baseline agent, so that in this scenario, compared with training from scratch, a guidance agent can be trained faster to guide the aircraft to reach the moving destination along a specific direction; as shown in fig. 1, the method comprises the steps of:
(1) set up the kinematics and dynamics model of the aircraft, take the aircraft guidance environment with a fixed destination as the training environment, and train a baseline agent B with a reinforcement learning algorithm; the current state s_t includes aircraft attitude and destination attitude information, and the aircraft action a_t in the current state is a guidance instruction for the aircraft;
in this embodiment, the aircraft dynamics model is expressed in the following variables: (x, y, z) are the three-dimensional coordinates of the aircraft; (v_x, v_y, v_z) are the velocity components of the aircraft in the three directions; θ is the pitch angle of the aircraft and θ̇ its rate of change; ψ is the heading angle of the aircraft and ψ̇ its rate of change; v is the speed of the aircraft and v̇ its rate of change; m is the mass of the aircraft; L, D and T are respectively the lift, drag and thrust of the aircraft; α is the angle of attack of the aircraft; and φ is the roll angle of the aircraft.
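The patent text lists the model's variables, but the equations themselves appear only as formula images in the original. A conventional three-degree-of-freedom point-mass model over the same variables can be sketched as follows; this is an illustrative stand-in consistent with the variable list, not necessarily the patent's exact model:

```python
import math

def aircraft_derivatives(state, thrust, alpha, phi, lift, drag, m, g=9.81):
    """state = (x, y, z, v, theta, psi); returns the time derivatives.
    theta: pitch (flight-path) angle, psi: heading angle, phi: roll angle,
    alpha: angle of attack; thrust/lift/drag in newtons, m in kilograms."""
    x, y, z, v, theta, psi = state
    dx = v * math.cos(theta) * math.cos(psi)
    dy = v * math.cos(theta) * math.sin(psi)
    dz = v * math.sin(theta)
    # Along-track: thrust component minus drag, minus gravity projection.
    dv = (thrust * math.cos(alpha) - drag) / m - g * math.sin(theta)
    # Normal force (rolled lift plus thrust component) bends the flight path.
    dtheta = ((thrust * math.sin(alpha) + lift) * math.cos(phi)
              - m * g * math.cos(theta)) / (m * v)
    dpsi = (thrust * math.sin(alpha) + lift) * math.sin(phi) / (m * v * math.cos(theta))
    return dx, dy, dz, dv, dtheta, dpsi
```

In steady level flight (thrust balancing drag, lift balancing weight, wings level) all rates except the along-track position rate vanish, which is a quick sanity check on the signs.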
(2) Set the destination movement model according to the destination movement characteristics, construct an aircraft guidance training environment with a moving destination, set the agent selection factor p used in training and its update method, initialize a guidance agent A whose input and output are the same as those of agent B, and start training agent A.
In this embodiment, the destination moves in a straight line at constant speed, and the agent selection factor is initialized to 1.
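The uniform straight-line destination movement of this embodiment amounts to per-axis linear extrapolation; a minimal sketch (the function name and tuple representation are illustrative):

```python
def destination_position(p0, velocity, t):
    """Uniform straight-line destination motion:
    position = initial position + velocity * t, applied per axis."""
    return tuple(p + v * t for p, v in zip(p0, velocity))
```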
(3) Use guidance agent A to execute an episode, taking the episode as the unit; for each time step t in the episode, record four data items: the current state s_t, the aircraft action a_t in the current state, the reward r_t, and the next state s_{t+1}, stored as the quadruple (s_t, a_t, r_t, s_{t+1}).
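Executing one episode while recording a quadruple per time step can be sketched as below; `env_reset`, `env_step`, and `agent` are hypothetical callables standing in for the training environment and guidance agent A:

```python
def run_episode(env_reset, env_step, agent, max_steps=500):
    """Execute one episode with the guidance agent, recording the quadruple
    (s_t, a_t, r_t, s_{t+1}) at every time step until termination."""
    quadruples = []
    state = env_reset()
    for _ in range(max_steps):
        action = agent(state)                          # guidance instruction a_t
        next_state, reward, done = env_step(state, action)
        quadruples.append((state, action, reward, next_state))
        state = next_state
        if done:                                       # destination reached or episode over
            break
    return quadruples
```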
(4) Generate a random number in the range [0, 1]; if the random number is smaller than the agent selection factor, update the quadruples using agent B according to the following steps; if it is greater than or equal to the agent selection factor, do not update the quadruples:
loop from the start step t = 0 to the end step t = T; for each step t = N in the loop, take every quadruple (s_t, a_t, r_t, s_{t+1}) from t = 0 to t = N and replace the destination attitude information in s_t with that of the quadruple at time t = N, forming a new state s'_t that contains the new destination attitude information;
starting at t = 0 and ending at t = N, use baseline agent B with the replaced state s'_t as input to generate new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); if at any moment in this stage the aircraft successfully reaches the destination, the quadruple update process ends;
if, from the start step t = 0 to the episode end step t = T, no replacement makes the aircraft successfully reach the destination, record the last executed quadruple;
replace the aircraft attitude information in the original quadruples (s_t, a_t, r_t, s_{t+1}) with the aircraft attitude information from the new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); for an online-training reinforcement learning method, train directly on the stored new quadruples; for an offline-training reinforcement learning method, store the new quadruples in the experience pool;
for the case where the quadruples are not updated: for an online-training reinforcement learning method, train directly on the stored quadruples; for an offline-training reinforcement learning method, store the quadruples in the experience pool;
in this embodiment, the proximal policy optimization (PPO) deep reinforcement learning method is used to train the guidance agent, which continuously learns by itself in the aircraft guidance training environment;
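PPO's central ingredient is the clipped surrogate objective; a minimal scalar version is sketched below (real implementations average this over batched trajectories and add value and entropy terms). It illustrates the method named in the text, not the embodiment's actual training code:

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate for one sample.
    ratio = pi_new(a|s) / pi_old(a|s); clipping to [1 - eps, 1 + eps]
    keeps each policy update from moving too far from the old policy."""
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped * advantage)
```

The objective is maximized over the policy parameters; the `min` makes the bound pessimistic, so overly large probability-ratio changes earn no extra credit.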
In this embodiment, the guidance success rate keeps increasing during agent training; as shown in fig. 2, compared with training from scratch, the aircraft guidance transfer learning training algorithm based on destination movement prediction converges faster.
(5) Train with the reinforcement learning method and update the agent selection factor p;
update the agent selection factor p according to the formula p ← p − K, where K is the decay factor with value range [0, 1]; when the agent selection factor falls below 0, stop updating and set the factor to 0;
in this embodiment, K takes the value 0.001; as shown in fig. 3, the trained guidance agent generates accurate guidance instructions to guide the aircraft to reach the moving destination along a specific direction.
The above embodiment expresses only one implementation of the present invention, and although its description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the claims.
Claims (3)
1. An aircraft guidance transfer learning training algorithm based on destination movement prediction, characterized by comprising the following steps:
(1) set up the kinematics and dynamics model of the aircraft, take the aircraft guidance environment with a fixed destination as the training environment, and train a baseline agent B with a reinforcement learning method; the current state s_t includes aircraft attitude and destination attitude information, and the aircraft action a_t in the current state is a guidance instruction for the aircraft;
(2) set the destination movement model according to the destination movement characteristics, construct an aircraft guidance training environment with a moving destination, set the agent selection factor p used in training and its update method, initialize a guidance agent A whose input and output are the same as those of agent B, and start training agent A;
(3) use guidance agent A to execute an episode, taking the episode as the unit; for each time step t in the episode, record four data items: the current state s_t, the aircraft action a_t in the current state, the reward r_t, and the next state s_{t+1}, stored as the quadruple (s_t, a_t, r_t, s_{t+1});
(4) generate a random number in the range [0, 1]; if the random number is smaller than the agent selection factor, update the quadruples using agent B; if it is greater than or equal to the agent selection factor, do not update the quadruples;
(5) train with a reinforcement learning method and update the agent selection factor p.
2. The aircraft guidance transfer learning training algorithm based on destination movement prediction according to claim 1, characterized in that in step (4), when the quadruples are to be updated, the update proceeds as follows:
loop from the start step t = 0 to the end step t = T; for each step t = N in the loop, take every quadruple (s_t, a_t, r_t, s_{t+1}) from t = 0 to t = N and replace the destination attitude information in s_t with that of the quadruple at time t = N, forming a new state s'_t that contains the new destination attitude information;
starting at t = 0 and ending at t = N, use baseline agent B with the replaced state s'_t as input to generate new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); if at any moment in this stage the aircraft successfully reaches the destination, the quadruple update process ends;
if, from the start step t = 0 to the episode end step t = T, no replacement makes the aircraft successfully reach the destination, record the last executed quadruple; replace the aircraft attitude information in the original quadruples (s_t, a_t, r_t, s_{t+1}) with the aircraft attitude information from the new quadruples (s'_t, a'_t, r'_t, s'_{t+1}); for an online-training reinforcement learning method, train directly on the stored new quadruples; for an offline-training reinforcement learning method, store the new quadruples in the experience pool;
for the case where the quadruples are not updated: for an online-training reinforcement learning method, train directly on the stored quadruples; for an offline-training reinforcement learning method, store the quadruples in the experience pool.
3. The aircraft guidance transfer learning training algorithm based on destination movement prediction according to claim 1, characterized in that in step (5), the agent selection factor p is updated according to the formula p ← p − K, where K is the decay factor with value range [0, 1]; when the agent selection factor falls below 0, updating stops and the factor is set to 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011294913.0A CN112486192B (en) | 2020-11-18 | 2020-11-18 | Aircraft guided transfer learning training algorithm based on destination movement prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112486192A CN112486192A (en) | 2021-03-12 |
CN112486192B true CN112486192B (en) | 2022-04-08 |
Family
ID=74931399
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011294913.0A Active CN112486192B (en) | 2020-11-18 | 2020-11-18 | Aircraft guided transfer learning training algorithm based on destination movement prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112486192B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026157A (en) * | 2019-12-18 | 2020-04-17 | 四川大学 | Intelligent aircraft guiding method based on reward remodeling reinforcement learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11176370B2 (en) * | 2018-07-02 | 2021-11-16 | United States Of America As Represented By The Secretary Of The Air Force | Diffusion maps and transfer subspace learning |
CN109164821B (en) * | 2018-09-26 | 2019-05-07 | 中科物栖(北京)科技有限责任公司 | A kind of UAV Attitude training method and device |
CN113396428B (en) * | 2019-03-05 | 2024-05-07 | 赫尔实验室有限公司 | Learning system, computer program product and method for multi-agent application |
CN111027143B (en) * | 2019-12-18 | 2020-12-04 | 四川大学 | Shipboard aircraft approach guiding method based on deep reinforcement learning |
CN111859541B (en) * | 2020-07-17 | 2022-10-14 | 西北工业大学 | PMADDPG multi-unmanned aerial vehicle task decision method based on transfer learning improvement |
- 2020-11-18: CN application CN202011294913.0A granted as patent CN112486192B (status: active)
Non-Patent Citations (3)
Title |
---|
Design of Agent Training Environment for Aircraft Landing Guidance Based on Deep Reinforcement Learning; Zhuang Wang et al.; 2018 11th International Symposium on Computational Intelligence and Design (ISCID); 2019-04-25; pp. 76-79 *
Research on reinforcement learning assisted by expert knowledge and its application in UAV path planning; Wang Guofang; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 2017-08-15 (No. 08); pp. i-ii, 12 *
Information-strength-guided heuristic Q-learning with online updating; Wu Haolin et al.; Application Research of Computers; 2017-07-21 (No. 08); pp. 89-93 *
Also Published As
Publication number | Publication date |
---|---|
CN112486192A (en) | 2021-03-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||