CN114942651A - Unmanned aerial vehicle autonomous control method and system based on experience pool optimization - Google Patents

Unmanned aerial vehicle autonomous control method and system based on experience pool optimization

Info

Publication number
CN114942651A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
encoder
self
autonomous control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210654543.XA
Other languages
Chinese (zh)
Inventor
韩升
林友芳
吕凯
张硕
宋明惠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202210654543.XA priority Critical patent/CN114942651A/en
Publication of CN114942651A publication Critical patent/CN114942651A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • G05D1/101Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an unmanned aerial vehicle autonomous control method and system based on experience pool optimization, and belongs to the technical field of flight control. The method comprises the following steps: setting a simulation environment for a target unmanned aerial vehicle in an unmanned aerial vehicle simulator; establishing a state space, an action space and a reward function of the target unmanned aerial vehicle; constructing an auto-encoder for feature extraction according to the state space and the action space; constructing an unmanned aerial vehicle autonomous control task decision network model; loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through a self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data; and autonomously controlling the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model. The invention improves the diversity of experience in the experience pool.

Description

Unmanned aerial vehicle autonomous control method and system based on experience pool optimization
Technical Field
The invention relates to the technical field of flight control, in particular to an unmanned aerial vehicle autonomous control method and system based on experience pool optimization.
Background
An Unmanned Aerial Vehicle (UAV), commonly referred to as a drone, can be flown by radio from a remote control device or fly autonomously using onboard sensor devices and internal programs. With the development of related technologies, unmanned aerial vehicles have gradually become part of everyday life. Because of their low cost, small size, high flexibility, strong battlefield survivability, ease of operation and other advantages, unmanned aerial vehicles are widely applied in both military and civil fields.
With the rapid development of unmanned aerial vehicles, the autonomous flight control technology that constrains their application has also received wide attention, and many scholars and organizations have devoted themselves to its research. Over the long course of this research a number of control methods have been proposed; the existing methods can be divided into traditional linear control methods, general nonlinear control methods and learning-based control methods. However, traditional linear control algorithms cannot output precise control because of the limitations of their model structure and suffer from poor anti-interference capability, while general nonlinear methods depend too heavily on expert experience and likewise suffer from low control accuracy. Deep reinforcement learning combines the neural networks of deep learning with the ideas of reinforcement learning and has shown good performance in many sequential decision tasks. Because a neural network can approximate any continuous function arbitrarily well and is not limited by a specific controller structure, it can output high-precision control actions.
However, the autonomous control task of an unmanned aerial vehicle suffers from sparse rewards, and traditional deep reinforcement learning has difficulty learning a good strategy in such a reward-sparse environment. This is mainly because many experiences that cannot help the agent learn are stored in the experience pool, so it is difficult to obtain effective experiences during experience replay.
Disclosure of Invention
To address the above problems, the invention provides an unmanned aerial vehicle autonomous control method based on experience pool optimization, which comprises the following steps:
setting a simulation environment for a target unmanned aerial vehicle in an unmanned aerial vehicle simulator;
establishing a state space, an action space and a reward function of the target unmanned aerial vehicle;
constructing an auto-encoder for feature extraction according to the state space and the action space;
constructing an unmanned aerial vehicle autonomous control task decision network model;
loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through a self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data;
and autonomously controlling the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model.
Optionally, the method further comprises: in the process of setting the simulation environment, the starting position and the target position of the target unmanned aerial vehicle flying in the simulation environment are set simultaneously.
Optionally, the self-encoders for feature extraction include: a self-encoder for extracting state information features and a self-encoder for extracting action information features;
the self-encoder for extracting the state information features is constructed according to a state space, and specifically comprises the following steps: establishing a simulation state data set according to a state space, and establishing a self-encoder for extracting state information characteristics according to the simulation state data set;
the self-encoder for extracting the motion information features is constructed according to a motion space, and specifically comprises the following steps: and constructing a simulated motion data set according to the motion space, and constructing a self-encoder for extracting motion information characteristics according to the simulated motion data set.
Optionally, the network model for autonomous control task decision of the unmanned aerial vehicle includes: an Actor network and a Critic network.
Optionally, loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting a feature value of the experience data through a self-encoder, screening the experience data according to the feature value, and training the autonomous control task decision network model of the unmanned aerial vehicle according to the screened experience data specifically includes:
loading a simulation environment;
making an action decision according to the current state information of the target unmanned aerial vehicle by using an unmanned aerial vehicle autonomous control task decision network model; the action decision is used for controlling the target unmanned aerial vehicle to simulate flight in a simulation environment;
calculating reward values generated after actions of the target unmanned aerial vehicle act on the simulation environment at all times through the reward function, and generating experience data according to the reward values, the state information, the action information and the new state information of the target unmanned aerial vehicle;
extracting characteristic values of the empirical data through a self-encoder, and screening the empirical data according to the characteristic values;
and training the unmanned aerial vehicle autonomous control task decision network model through the screened empirical data.
The invention also provides an unmanned aerial vehicle autonomous control system based on experience pool optimization, which comprises the following steps:
a simulation environment construction unit which sets a simulation environment for the target unmanned aerial vehicle in the unmanned aerial vehicle simulator;
the first calculation unit is used for establishing a state space, an action space and a reward function of the target unmanned aerial vehicle;
the second calculation unit is used for constructing a self-encoder for feature extraction according to the state space and the action space;
the model building unit is used for building an unmanned aerial vehicle autonomous control task decision network model;
the model training unit is used for loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through the self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data;
and the unmanned aerial vehicle autonomous control unit autonomously controls the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model.
Optionally, the simulation environment building unit is further configured to: in the process of setting the simulation environment, the starting position and the target position of the target unmanned aerial vehicle flying in the simulation environment are set simultaneously.
Optionally, the self-encoders for feature extraction include: a self-encoder for extracting state information features and a self-encoder for extracting action information features;
the self-encoder for extracting the state information features is constructed according to a state space, and specifically comprises the following steps: establishing a simulation state data set according to a state space, and establishing a self-encoder for extracting state information characteristics according to the simulation state data set;
the self-encoder for extracting the motion information features is constructed according to a motion space, and specifically comprises the following steps: and constructing a simulated motion data set according to the motion space, and constructing a self-encoder for extracting the motion information characteristics according to the simulated motion data set.
Optionally, the unmanned aerial vehicle autonomous control task decision network model includes: an Actor network and a Critic network.
Optionally, loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting a feature value of the experience data through a self-encoder, screening the experience data according to the feature value, and training the autonomous control task decision network model of the unmanned aerial vehicle according to the screened experience data specifically includes:
loading a simulation environment;
making an action decision according to the current state information of the target unmanned aerial vehicle by using an unmanned aerial vehicle autonomous control task decision network model; the action decision is used for controlling the target unmanned aerial vehicle to simulate flight in a simulation environment;
calculating reward values generated after actions of the target unmanned aerial vehicle act on the simulation environment at all times through the reward function, and generating experience data according to the reward values, the state information and the action information of the target unmanned aerial vehicle and new state information;
extracting characteristic values of the empirical data through a self-encoder, and screening the empirical data according to the characteristic values;
and training the unmanned aerial vehicle autonomous control task decision network model through the screened empirical data.
The method can effectively reduce repeated experience stored in the experience pool, improve the diversity of the experience in the experience pool, ensure that the intelligent agent can learn various experiences as much as possible, solve the problem that the intelligent agent is difficult to learn the optimal strategy due to sparse reward, and accelerate the learning of the decision network model.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a block diagram of the system of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings; however, the present invention may be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that the disclosure of the present invention is thorough and complete and fully conveys the scope of the present invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
The invention provides an unmanned aerial vehicle autonomous control method based on experience pool optimization, which comprises the following steps of:
setting a simulation environment for a target unmanned aerial vehicle in an unmanned aerial vehicle simulator;
establishing a state space, an action space and a reward function of the target unmanned aerial vehicle;
constructing an auto-encoder for feature extraction according to the state space and the action space;
constructing an unmanned aerial vehicle autonomous control task decision network model;
loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through a self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data;
and autonomously controlling the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model.
Wherein, the method further comprises: in the process of setting the simulation environment, the starting position and the target position of the target unmanned aerial vehicle flying in the simulation environment are set simultaneously.
Three attitude angles describe the state of the unmanned aerial vehicle: the roll angle (Roll), the pitch angle (Pitch) and the yaw angle (Yaw). The three attitude angles are defined as (roll, pitch, yaw); the angular velocities about the three attitude angles are defined as (roll_v, pitch_v, yaw_v); the accelerations in the three attitude-angle directions are defined as (roll_a, pitch_a, yaw_a); the coordinates of the target position relative to the current position of the unmanned aerial vehicle in the three-dimensional coordinate system are defined as (point_x, point_y, point_z); the velocity of the unmanned aerial vehicle in the three-dimensional coordinate system is defined as (v_x, v_y, v_z); and the acceleration of the unmanned aerial vehicle in the three-dimensional coordinate system is (a_x, a_y, a_z);
establishing a state space, an action space and a reward function of the target unmanned aerial vehicle according to the defined parameters;
establishing a state space of the unmanned aerial vehicle, specifically as follows;
the state of each unmanned aerial vehicle comprises the attitude and the dynamic information of the unmanned aerial vehicle, and the state of the unmanned aerial vehicle at the moment t is defined as:
s_t = (roll, pitch, yaw, roll_v, pitch_v, yaw_v, roll_a, pitch_a, yaw_a, point_x, point_y, point_z, v_x, v_y, v_z, a_x, a_y, a_z)   (1)
the states of the unmanned aerial vehicle at all times form a state space of the unmanned aerial vehicle;
establishing an action space of the unmanned aerial vehicle, specifically as follows;
at time t, the state of the unmanned aerial vehicle is passed to the agent, and the agent outputs the unmanned aerial vehicle control actions (action_pitch, action_yaw, action_throttle) according to the current strategy, where action_pitch represents control of the pitch angle, action_yaw represents control of the roll angle, and action_throttle represents control of the throttle. The value ranges of the three control actions are all between -1 and 1.
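For illustration only, the following sketch (Python, not part of the patent text) shows how such a state vector and a clipped action command could be assembled; the function names build_state and clip_action are assumptions introduced here, and the field order simply follows the definitions above.

```python
import numpy as np

def build_state(attitude, angular_velocity, angular_acceleration,
                target_offset, velocity, acceleration):
    """Concatenate the state components defined above into one 18-dimensional vector.

    attitude             : (roll, pitch, yaw)
    angular_velocity     : (roll_v, pitch_v, yaw_v)
    angular_acceleration : (roll_a, pitch_a, yaw_a)
    target_offset        : (point_x, point_y, point_z) relative to the UAV
    velocity             : (v_x, v_y, v_z)
    acceleration         : (a_x, a_y, a_z)
    """
    return np.concatenate([attitude, angular_velocity, angular_acceleration,
                           target_offset, velocity, acceleration]).astype(np.float32)

def clip_action(action):
    """Keep the three control commands inside the documented range [-1, 1]."""
    return np.clip(np.asarray(action, dtype=np.float32), -1.0, 1.0)
```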
Establishing an unmanned aerial vehicle reward function, which specifically comprises the following steps:
the reward function for the pitch angle is defined by the following equation (2), where P_1 denotes the current pitch angle of the drone, P_2 denotes the pitch angle at which the drone should be, and r_P denotes the reward value of the drone in the pitch attitude:
r_P = -|P_1 - P_2|   (2)
the reward function for the roll angle is defined by the following formula (3), where Roll denotes the current roll angle of the drone and r_R denotes the reward value of the roll attitude:
r_R = -|Roll|   (3)
the reward function for the heading angle is defined by the following equation (4), where Y_1 denotes the current heading angle of the drone, Y_2 denotes the heading angle at which the drone should be, and r_Y denotes the reward value of the heading attitude:
r_Y = -|Y_1 - Y_2|   (4)
the final drone reward function is set to:
r = (r_P + r_Y + r_R) * r_dis   (5)
where r_dis is the advancing distance of the unmanned aerial vehicle and r is the final reward value;
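For illustration, the reward terms of equations (2) to (5) could be computed as in the following sketch; the absolute-difference form used for the pitch and heading rewards follows the same pattern as equation (3) and is an assumption of this sketch, and all function names are hypothetical.

```python
def pitch_reward(p1, p2):
    # Eq. (2): penalise the deviation of the current pitch p1 from the desired pitch p2
    # (absolute-difference form assumed here).
    return -abs(p1 - p2)

def roll_reward(roll):
    # Eq. (3): r_R = -|Roll|
    return -abs(roll)

def yaw_reward(y1, y2):
    # Eq. (4): penalise the deviation of the current heading y1 from the desired heading y2
    # (absolute-difference form assumed here).
    return -abs(y1 - y2)

def total_reward(p1, p2, roll, y1, y2, r_dis):
    # Eq. (5): r = (r_P + r_Y + r_R) * r_dis, where r_dis is the advancing distance.
    return (pitch_reward(p1, p2) + yaw_reward(y1, y2) + roll_reward(roll)) * r_dis
```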
wherein, the self-encoder for feature extraction comprises: the self-encoder is used for extracting the state information features and the self-encoder is used for extracting the action information features;
the self-encoder for extracting the state information features is constructed according to a state space;
the self-encoder for extracting the motion information features is constructed according to the motion space.
The state space is used for constructing a simulation state data set, and a self-encoder for extracting state information features is constructed according to the simulation state data set.
The motion space is used for constructing a simulated motion data set, and a self-encoder for extracting motion information features is constructed according to the simulated motion data set.
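As an illustrative sketch only, such a feature-extraction self-encoder trained on a simulated state or action data set could look like the code below; PyTorch is assumed here (the patent does not name a framework), and the layer widths and the 4-dimensional code size are assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Compress a state (or action) vector to a short code used as its feature value."""
    def __init__(self, input_dim, code_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU(),
                                     nn.Linear(64, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                     nn.Linear(64, input_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def train_autoencoder(model, dataset, epochs=50, lr=1e-3):
    """Fit the self-encoder on a simulated data set by minimising the reconstruction error."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for batch in dataset:            # dataset is assumed to yield float tensors
            recon, _ = model(batch)
            loss = loss_fn(recon, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```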
The unmanned aerial vehicle autonomous control task decision network model comprises an Actor network and a Critic network. Both the Actor network and the Critic network have a dual-network structure, each with its own target network and current network.
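A minimal sketch of this dual-network structure is given below, again in PyTorch; the hidden sizes, the 18-dimensional state (the components defined above) and the 3-dimensional action are assumptions, and only the overall shape of the Actor/Critic pair and its target copies is meant to match the description.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim), nn.Tanh())  # actions in [-1, 1]

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Current networks and their target copies, as described above.
state_dim = 18
actor, critic = Actor(state_dim), Critic(state_dim)
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
```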
Loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, obtaining experience data of the simulated flight through the reward function and the self-encoders, and training the unmanned aerial vehicle autonomous control task decision network model according to the experience data specifically includes the following steps:
loading a simulation environment;
an Actor network of an unmanned aerial vehicle autonomous control task decision network model is used for making an action decision according to the current state information of the target unmanned aerial vehicle; the action decision is used for controlling the target unmanned aerial vehicle to simulate flight in a simulation environment; in the flight process, the state of the simulation environment changes;
through the reward function, calculating empirical data generated after actions of the target unmanned aerial vehicle at all times act on the simulation environment, specifically:
the reward function calculates the reward value produced after the action generated by the unmanned aerial vehicle at the current moment acts on the environment, and the experience data of the unmanned aerial vehicle is thereby obtained. The experience data of the unmanned aerial vehicle comprises the state of the unmanned aerial vehicle at the current moment, the action strategy, the reward value, and the state of the unmanned aerial vehicle at the next moment. One piece of experience data of the unmanned aerial vehicle is expressed as <s_t, a_t, r_t, s_{t+1}>, where s_t represents the state at the current moment in the unmanned aerial vehicle mission, a_t represents the action strategy of the unmanned aerial vehicle at the current moment, r_t represents the reward value of the action of the unmanned aerial vehicle at the current moment, and s_{t+1} represents the state at the next moment in the unmanned aerial vehicle mission;
extracting the empirical data through a self-encoder: the unmanned aerial vehicle system continuously generates experience data at every moment; the space for storing the experience data is defined as the experience pool; each piece of experience data undergoes feature extraction through the self-encoder to obtain a feature value f, and the space for storing the feature values f is defined as the feature record table.
Before each piece of experience data is stored in the experience pool, it is first judged whether the experience pool is full; if the experience pool is full, one piece of experience data is removed, feature extraction is performed on it with the self-encoder to obtain its feature value f_i, and the feature value f_i is simultaneously removed from the feature record table;
performing feature extraction on the current experience data with the self-encoder to obtain a feature value f_j, and looking up whether the feature value f_j already exists in the feature record table. If the feature value f_j already exists, the current experience data is not stored in the experience pool;
if the feature value f_j does not exist in the feature record table, the feature value is stored in the feature record table, and the current experience data is stored in the experience pool.
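The screening procedure described above can be sketched as follows; the FilteredReplayBuffer class, the rounding used to make feature values hashable, and the eviction of the oldest experience when the pool is full are implementation assumptions, not details fixed by the patent.

```python
import random
from collections import deque

class FilteredReplayBuffer:
    """Experience pool that rejects experiences whose feature value is already recorded."""

    def __init__(self, capacity, feature_fn):
        self.capacity = capacity
        self.feature_fn = feature_fn   # maps an experience to its feature value f (e.g. self-encoder codes)
        self.buffer = deque()          # stores tuples (s_t, a_t, r_t, s_next, key)
        self.features = set()          # the feature record table

    def _key(self, experience):
        f = self.feature_fn(experience)
        return tuple(round(float(v), 3) for v in f)    # rounding assumed, to make the feature hashable

    def add(self, experience):
        key = self._key(experience)
        if key in self.features:                       # duplicate feature value: do not store
            return False
        if len(self.buffer) >= self.capacity:          # pool full: remove one experience and its feature
            removed = self.buffer.popleft()
            self.features.discard(removed[-1])
        self.buffer.append((*experience, key))
        self.features.add(key)
        return True

    def sample(self, n):
        """Randomly take n experiences at different moments for training."""
        return [e[:-1] for e in random.sample(list(self.buffer), n)]
```

A buffer like this keeps at most one experience per distinct feature value, which is what improves the diversity of the experiences available for replay.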
Training an unmanned aerial vehicle autonomous control task decision network model through the extracted empirical data, and specifically comprises the following steps:
randomly taking N experiences at different moments from the experience pool to form a sampled experience data set with the structure <S, A, R, S'>, where S is the set of current-moment states of the unmanned aerial vehicle in the sampled experience data set, A is the set of current-moment action strategies of the unmanned aerial vehicle, R is the set of current-moment reward values of the unmanned aerial vehicle, and S' is the set of next-moment states of the unmanned aerial vehicle obtained after the current action set A is applied in the current state set S;
inputting S' into the target Actor network to obtain the set A' of action strategies of the unmanned aerial vehicle at the next moment, and then inputting A' and S' together into the target Critic network to obtain the target Q' value estimated for the next moment;
the loss function of the Critic network is defined as:
L(θ^Q) = (1/N) · Σ_{i=1..N} (y_i - Q(s_i, a_i | θ^Q))^2   (6)
where θ^Q is a parameter of the current Critic network and N represents the number of experiences extracted during training; Q(s_i, a_i | θ^Q) represents the Q value output by the current Critic network when s_i and a_i are the inputs;
y_i can be expressed as:
y_i = r_i + γ · Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′})   (7)
where γ is the discount factor, θ^{Q′} is a parameter of the target Critic network, and θ^{μ′} is a parameter of the target Actor network; Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}) represents the output of the target Critic network when s_{i+1} and μ′(s_{i+1} | θ^{μ′}) are taken as inputs;
taking y_i as the training label, updating the weights of the current Critic network through back propagation;
training and updating the weights of the current Actor network by adopting an off-policy method;
updating the weights of the target Critic network and the target Actor network in a soft updating mode at fixed time intervals;
and stopping training when the set training times are reached.
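One training iteration under equations (6) and (7) might be implemented as in the following sketch (a DDPG-style update). The optimizers, the discount factor gamma = 0.99, the soft-update rate tau = 0.005, and updating the target networks on every step are illustrative assumptions; the batch is assumed to come from the screened experience pool sketched earlier.

```python
import torch
import torch.nn.functional as F

def soft_update(target, source, tau=0.005):
    """theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

def train_step(batch, actor, critic, target_actor, target_critic,
               actor_opt, critic_opt, gamma=0.99):
    s, a, r, s_next = batch                      # tensors built from the N sampled experiences

    # Target value of Eq. (7): y = r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))

    # Critic loss of Eq. (6): mean squared error between y and Q(s, a)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: maximise Q(s, mu(s)) by minimising its negative
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks (done every step here, as an assumption)
    soft_update(target_critic, critic)
    soft_update(target_actor, actor)
```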
The invention further provides an unmanned aerial vehicle autonomous control system 200 based on experience pool optimization, as shown in fig. 2, including:
a simulation environment construction unit 201 that sets a simulation environment for a target unmanned aerial vehicle in the unmanned aerial vehicle simulator;
the first calculation unit 202 is used for establishing a state space, an action space and a reward function of the target unmanned aerial vehicle;
a second calculation unit 203, which constructs a self-encoder for feature extraction according to the state space and the action space;
the model building unit 204 is used for building an unmanned aerial vehicle autonomous control task decision network model;
the model training unit 205 is used for loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through a self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data;
the unmanned aerial vehicle autonomous control unit 206 autonomously controls the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model.
Wherein, the simulation environment construction unit 201 is further configured to: in the process of setting the simulation environment, the starting position and the target position of the target unmanned aerial vehicle flying in the simulation environment are set simultaneously.
The self-encoders for feature extraction comprise a self-encoder for extracting state information features and a self-encoder for extracting action information features;
the self-encoder for extracting the state information features is constructed according to a state space, and specifically comprises the following steps: establishing a simulation state data set according to a state space, and establishing a self-encoder for extracting state information characteristics according to the simulation state data set;
the self-encoder for extracting the motion information features is constructed according to a motion space, and specifically comprises the following steps: and constructing a simulated motion data set according to the motion space, and constructing a self-encoder for extracting the motion information characteristics according to the simulated motion data set.
The unmanned aerial vehicle autonomous control task decision network model comprises an Actor network and a Critic network.
Loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through a self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data specifically includes:
loading a simulation environment;
making an action decision according to the current state information of the target unmanned aerial vehicle by using an unmanned aerial vehicle autonomous control task decision network model; the action decision is used for controlling the target unmanned aerial vehicle to simulate flight in a simulation environment;
calculating reward values generated after actions of the target unmanned aerial vehicle act on the simulation environment at all times through the reward function, and generating experience data according to the reward values, the state information and the action information of the target unmanned aerial vehicle and new state information;
extracting characteristic values of the empirical data through a self-encoder, and screening the empirical data according to the characteristic values;
and training the unmanned aerial vehicle autonomous control task decision network model through the screened empirical data.
The method can effectively reduce repeated experience stored in the experience pool, improve the diversity of the experience in the experience pool, ensure that the intelligent agent can learn various experiences as much as possible, solve the problem that the intelligent agent is difficult to learn the optimal strategy due to sparse reward, and accelerate the learning of the decision network model.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiment of the invention can be realized by adopting various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An unmanned aerial vehicle autonomous control method based on experience pool optimization is characterized by comprising the following steps:
setting a simulation environment for a target unmanned aerial vehicle in an unmanned aerial vehicle simulator;
establishing a state space, an action space and a reward function of the target unmanned aerial vehicle;
constructing an auto-encoder for feature extraction according to the state space and the action space;
constructing an unmanned aerial vehicle autonomous control task decision network model;
loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through a self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data;
and autonomously controlling the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model.
2. The method of claim 1, further comprising: in the process of setting the simulation environment, the starting position and the target position of the target unmanned aerial vehicle flying in the simulation environment are set simultaneously.
3. The method of claim 1, wherein the self-encoders for feature extraction comprise: a self-encoder for extracting state information features and a self-encoder for extracting action information features;
the self-encoder for extracting the state information features is constructed according to a state space, and specifically comprises the following steps: establishing a simulation state data set according to a state space, and establishing a self-encoder for extracting state information characteristics according to the simulation state data set;
the self-encoder for extracting the motion information features is constructed according to a motion space, and specifically comprises the following steps: and constructing a simulated motion data set according to the motion space, and constructing a self-encoder for extracting motion information characteristics according to the simulated motion data set.
4. The method of claim 1, wherein the drone autonomous control task decision network model comprises: an Actor network and a Critic network.
5. The method according to claim 1, wherein loading the simulation environment to simulate the flight of the target drone, generating experience data through the reward function, extracting feature values of the experience data through a self-encoder, screening the experience data according to the feature values, and training the drone autonomous control task decision network model according to the screened experience data specifically comprises:
loading a simulation environment;
making an action decision according to the current state information of the target unmanned aerial vehicle by using an unmanned aerial vehicle autonomous control task decision network model; the action decision is used for controlling the target unmanned aerial vehicle to simulate flight in a simulation environment;
calculating reward values generated after actions of the target unmanned aerial vehicle act on the simulation environment at all times through the reward function, and generating experience data according to the reward values, the state information and the action information of the target unmanned aerial vehicle and new state information;
extracting characteristic values of the empirical data through a self-encoder, and screening the empirical data according to the characteristic values;
and training the unmanned aerial vehicle autonomous control task decision network model through the screened empirical data.
6. An autonomous unmanned aerial vehicle control system based on experience pool optimization, the system comprising:
a simulation environment building unit which sets a simulation environment for a target unmanned aerial vehicle in the unmanned aerial vehicle simulator;
the first calculation unit is used for establishing a state space, an action space and a reward function of the target unmanned aerial vehicle;
the second calculation unit is used for constructing a self-encoder for feature extraction according to the state space and the action space;
the model building unit is used for building an unmanned aerial vehicle autonomous control task decision network model;
the model training unit is used for loading the simulation environment to simulate the flight of the target unmanned aerial vehicle, generating experience data through the reward function, extracting characteristic values of the experience data through the self-encoder, screening the experience data according to the characteristic values, and training the unmanned aerial vehicle autonomous control task decision network model according to the screened experience data;
and the unmanned aerial vehicle autonomous control unit autonomously controls the target unmanned aerial vehicle through the trained unmanned aerial vehicle autonomous control task decision network model.
7. The system of claim 6, wherein the simulation environment construction unit is further configured to: in the process of setting the simulation environment, the starting position and the target position of the target unmanned aerial vehicle flying in the simulation environment are set simultaneously.
8. The system of claim 6, wherein the self-encoders for feature extraction comprise: a self-encoder for extracting state information features and a self-encoder for extracting action information features;
the self-encoder for extracting the state information features is constructed according to a state space, and specifically comprises the following steps: establishing a simulation state data set according to a state space, and establishing a self-encoder for extracting state information characteristics according to the simulation state data set;
the self-encoder for extracting the motion information features is constructed according to a motion space, and specifically comprises the following steps: and constructing a simulated motion data set according to the motion space, and constructing a self-encoder for extracting motion information characteristics according to the simulated motion data set.
9. The system of claim 6, wherein the drone autonomous control task decision network model comprises: an Actor network and a Critic network.
10. The system according to claim 6, wherein the loading of the simulated environment simulating the flight of the target drone, the generating of experience data through the reward function, the extracting of feature values of the experience data through the self-encoder, the screening of the experience data according to the feature values, and the training of the drone autonomous control task decision network model according to the screened experience data specifically include:
loading a simulation environment;
making an action decision according to the current state information of the target unmanned aerial vehicle by using an unmanned aerial vehicle autonomous control task decision network model; the action decision is used for controlling the target unmanned aerial vehicle to simulate flight in a simulation environment;
calculating reward values generated after actions of the target unmanned aerial vehicle act on the simulation environment at all times through the reward function, and generating experience data according to the reward values, the state information and the action information of the target unmanned aerial vehicle and new state information;
extracting characteristic values of the empirical data through a self-encoder, and screening the empirical data according to the characteristic values;
and training the unmanned aerial vehicle autonomous control task decision network model through the screened empirical data.
CN202210654543.XA 2022-06-10 2022-06-10 Unmanned aerial vehicle autonomous control method and system based on experience pool optimization Pending CN114942651A (en)

Priority Applications (1)

Application Number: CN202210654543.XA; Publication: CN114942651A (en); Priority Date: 2022-06-10; Filing Date: 2022-06-10; Title: Unmanned aerial vehicle autonomous control method and system based on experience pool optimization

Applications Claiming Priority (1)

Application Number: CN202210654543.XA; Publication: CN114942651A (en); Priority Date: 2022-06-10; Filing Date: 2022-06-10; Title: Unmanned aerial vehicle autonomous control method and system based on experience pool optimization

Publications (1)

Publication Number: CN114942651A; Publication Date: 2022-08-26

Family

ID=82910072

Family Applications (1)

Application Number: CN202210654543.XA; Status: Pending; Publication: CN114942651A (en); Priority Date: 2022-06-10; Filing Date: 2022-06-10; Title: Unmanned aerial vehicle autonomous control method and system based on experience pool optimization

Country Status (1)

Country Link
CN (1) CN114942651A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination