CN114489107B - Aircraft twin-delayed deep deterministic policy gradient attitude control method - Google Patents

Aircraft twin-delayed deep deterministic policy gradient attitude control method

Info

Publication number
CN114489107B
CN114489107B (application CN202210113006.4A)
Authority
CN
China
Prior art keywords
network
aircraft
reinforcement learning
target
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210113006.4A
Other languages
Chinese (zh)
Other versions
CN114489107A (en)
Inventor
韦常柱
朱光楠
刘哲
浦甲伦
徐世昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Zhuyu Aerospace Technology Co ltd
Original Assignee
Harbin Zhuyu Aerospace Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Zhuyu Aerospace Technology Co ltd filed Critical Harbin Zhuyu Aerospace Technology Co ltd
Priority to CN202210113006.4A
Publication of CN114489107A
Application granted
Publication of CN114489107B
Active legal status
Anticipated expiration

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08 - Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808 - Control of attitude specially adapted for aircraft
    • G05D1/0816 - Control of attitude specially adapted for aircraft to ensure stability
    • G05D1/0833 - Control of attitude specially adapted for aircraft to ensure stability using limited authority control
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 - Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

An aircraft twin-delayed deep deterministic policy gradient (TD3) attitude control method, belonging to the technical field of aircraft control. The method comprises the following steps: establishing an aircraft dynamics model and encapsulating it to form a reinforcement learning environment; initializing the reinforcement learning interaction environment, the agent and the maximum number of steps; obtaining the aircraft control quantities as the action; calculating the reward value and the next observation corresponding to the action, combining them into an experience sample and recording it in the experience replay buffer; adjusting the agent parameters to complete one round of reinforcement learning; and outputting the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle. The invention is a high-precision adaptive intelligent aircraft control method: reinforcement learning with the twin-delayed deep deterministic policy gradient method realizes the design of an optimal attitude controller that depends only weakly on the model; only a basic model of the aircraft is required, and the parameters of the model need not all be given accurately, thereby reducing the dependence of the control-system design on the model.

Description

Aircraft twin-delayed deep deterministic policy gradient attitude control method
Technical Field
The invention relates to an aircraft twin-delayed deep deterministic policy gradient (TD3) attitude control method, belonging to the technical field of aircraft control.
Background
Because of strong parameter uncertainty and coupling, strong model nonlinearity, complex disturbances and other problems, it is difficult to establish an accurate control model of the aircraft, whereas traditional controller design methods rely on a relatively accurate control model. A control method whose design process depends only weakly on an accurate aircraft model therefore needs to be developed.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides an aircraft twin-delayed deep deterministic policy gradient attitude control method.
The invention adopts the following technical scheme: an aircraft twin-delayed deep deterministic policy gradient attitude control method, the method comprising the following steps:
S1: establishing an aircraft dynamics model and encapsulating it to form a reinforcement learning environment;
S2: initializing the reinforcement learning interaction environment, the agent and the maximum number of steps;
S3: obtaining the aircraft control quantities as the action; calculating the reward value and the next observation corresponding to the action, and storing the experience data in the experience replay buffer;
S4: randomly sampling experience data from the experience replay buffer and adjusting the agent parameters based on the twin-delayed deep deterministic policy gradient algorithm to complete one round of reinforcement learning;
if the accumulated number of reinforcement learning rounds has not reached the maximum number of steps defined in S2, returning to S3; otherwise, ending the reinforcement learning;
S5: after the reinforcement learning ends, saving the agent and storing the Actor network for use as an adaptive controller; given the stacked observation as input, the adaptive controller outputs the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle.
Compared with the prior art, the invention has the following beneficial effects:
the invention is a high-precision adaptive intelligent aircraft control method: reinforcement learning with the twin-delayed deep deterministic policy gradient method realizes the design of an optimal attitude controller that depends only weakly on the model; only a basic model of the aircraft is required, and the parameters of the model need not all be given accurately, thereby reducing the dependence of the control-system design on the model.
Drawings
FIG. 1 is a design flow diagram of the present invention;
FIG. 2 is a flow chart of reinforcement learning according to the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art without creative work on the basis of these embodiments fall within the protection scope of the present invention.
An aircraft twin-delayed deep deterministic policy gradient attitude control method, the method comprising the following steps:
S1: establishing an aircraft dynamics model based on the aircraft design parameters and wind-tunnel data, and encapsulating it to form a reinforcement learning environment;
S101: the aircraft dynamics model is established as:
dV/dt = (T cos α − D)/m − g sin γ + d_V
dγ/dt = (T sin α + L)/(mV) − (g/V) cos γ + d_γ
dθ/dt = ω_z
dω_z/dt = M/I_yy + d_Q          (1)
in formula (1):
V represents the speed;
α represents the angle of attack (α = θ − γ);
g represents the gravitational acceleration;
γ represents the flight-path angle (track inclination angle);
θ represents the pitch angle;
ω_z represents the pitch angular velocity;
m represents the aircraft mass;
I_yy represents the pitch-channel moment of inertia;
T = C_{T,φ}(α)φ + C_T(α) represents the thrust, wherein: φ denotes the fuel-air mixture ratio, C_{T,φ}(α) denotes the coefficient between T and φ, and C_T(α) denotes the coefficient between T and α, both obtained from wind-tunnel tests;
D = qSC_D represents the drag, wherein: q denotes the dynamic pressure, S the aircraft reference area, and C_D the drag coefficient, obtained from wind-tunnel tests;
L = qSC_L represents the lift, wherein: C_L denotes the lift coefficient, obtained from wind-tunnel tests;
M = z_T T + qScC_M represents the pitching moment, wherein: z_T denotes the thrust moment-arm length, c the mean aerodynamic chord length, and C_M the pitching-moment coefficient, obtained from wind-tunnel tests;
d_V, d_γ, d_Q represent the uncertainties in the model caused by parameter uncertainty;
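Purely as an illustration (not part of the patent disclosure), a minimal Python sketch of one integration step of the longitudinal model (1) is given below; the function name, the explicit Euler scheme and the coeffs dictionary of wind-tunnel-derived coefficient functions are assumptions made for this example, and the uncertainty terms d_V, d_γ, d_Q are omitted.

    import numpy as np

    def longitudinal_step(state, action, coeffs, dt=0.01):
        """One explicit-Euler step of the longitudinal model (1).

        state  : [V, gamma, theta, omega_z]  (speed, flight-path angle, pitch angle, pitch rate)
        action : [phi, delta_e]              (fuel-air mixture ratio, elevator deflection)
        coeffs : dictionary of vehicle data and coefficient functions standing in for the
                 wind-tunnel tables C_T_phi, C_T, C_D, C_L, C_M (hypothetical placeholders).
        """
        V, gamma, theta, omega_z = state
        phi, delta_e = action
        alpha = theta - gamma                                   # angle of attack
        m, I_yy, S, c, z_T, g, rho = (coeffs[k] for k in ("m", "I_yy", "S", "c", "z_T", "g", "rho"))

        q = 0.5 * rho * V ** 2                                  # dynamic pressure
        T = coeffs["C_T_phi"](alpha) * phi + coeffs["C_T"](alpha)        # thrust
        D = q * S * coeffs["C_D"](alpha, delta_e)                        # drag
        L = q * S * coeffs["C_L"](alpha, delta_e)                        # lift
        M = z_T * T + q * S * c * coeffs["C_M"](alpha, delta_e)          # pitching moment

        dV = (T * np.cos(alpha) - D) / m - g * np.sin(gamma)
        dgamma = (T * np.sin(alpha) + L) / (m * V) - g * np.cos(gamma) / V
        dtheta = omega_z
        domega = M / I_yy
        return np.asarray(state, dtype=float) + dt * np.array([dV, dgamma, dtheta, domega])

In practice the entries of coeffs would be interpolators built from the wind-tunnel tables.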
s102: and compiling the aircraft dynamics model and a resolving program thereof into a C language program, compiling to form a dynamic link library file, and forming a reinforcement learning environment.
S2: initializing the reinforcement learning interaction environment, the agent and the maximum number of steps;
S201: the reinforcement learning interaction environment is defined by three items: the stacked observation o_T, the action a_T and the reward function, defined as follows:
the observation at each simulation time step t is o_t = {V, γ, θ, Q}, wherein: V represents the speed; γ represents the flight-path angle; θ represents the pitch angle; Q represents the attitude angular rate;
the stacked observation o_T = {o_{t-3}, o_{t-2}, o_{t-1}, o_t} is the superposition of four consecutive time-step observations, where t-3, t-2 and t-1 denote the three, two and one time steps preceding the simulation time step t;
the action is a_T = {φ, δ_e}, wherein: φ represents the fuel-air mixture ratio; δ_e represents the elevator deflection angle;
the reward function is r_T = r_1 + r_2, wherein: r_1 = λ_1(V − V_r)² + λ_2(γ − γ_r)² is the reward term related to the speed and flight-path-angle control errors, V_r is the speed command, γ_r is the flight-path-angle command, and λ_1, λ_2 are set negative so as to penalize the speed and flight-path-angle control errors; the term r_2 is designed to give a reward when the speed and flight-path-angle control errors are small:
if |V − V_r| < ε_1 and |γ − γ_r| < ε_2, where ε_1, ε_2 represent the desired control accuracies, then r_2 = P, with P > 0 being the reward value for reaching the desired speed and flight-path-angle control accuracy; otherwise r_2 = 0;
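As a hedged illustration of S201, the stacked observation and the reward could be computed as follows in Python; the concrete values of λ_1, λ_2, ε_1, ε_2 and P are not specified in the patent and are chosen here only for the example.

    from collections import deque
    import numpy as np

    N_STACK = 4                                   # o_T is the superposition of four observations
    history = deque(maxlen=N_STACK)

    def stacked_observation(o_t):
        """o_t = [V, gamma, theta, Q]; returns o_T = {o_{t-3}, o_{t-2}, o_{t-1}, o_t}."""
        if not history:                           # pad with the first observation at start-up
            history.extend([o_t] * N_STACK)
        history.append(o_t)
        return np.concatenate(list(history))

    def reward(V, gamma, V_r, gamma_r,
               lam1=-1.0, lam2=-100.0, eps1=1.0, eps2=0.01, P=10.0):
        """r_T = r_1 + r_2 of S201; all numerical parameters here are illustrative."""
        r1 = lam1 * (V - V_r) ** 2 + lam2 * (gamma - gamma_r) ** 2
        r2 = P if (abs(V - V_r) < eps1 and abs(gamma - gamma_r) < eps2) else 0.0
        return r1 + r2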
S202: the reinforcement learning agent comprises six neural networks, namely: the Actor network μ(o_T), the target Actor network μ_t(o_T), Critic network 1 Q_1(o_T, a_T; θ^{Q_1}), Critic network 2 Q_2(o_T, a_T; θ^{Q_2}), target Critic network 1 Q_1'(o_T, a_T; θ^{Q_1'}) and target Critic network 2 Q_2'(o_T, a_T; θ^{Q_2'}), wherein:
the input of the Actor network is the stacked observation o_T and its output is the action a_T;
the inputs of Critic network 1 and Critic network 2 are the stacked observation o_T and the action a_T, and their output is the expected value of the cumulative reward obtained after the agent takes that action;
the Actor network has the same structure as the target Actor network, Critic network 1 has the same structure as target Critic network 1, and Critic network 2 has the same structure as target Critic network 2; the parameters of each neural network are initialized randomly, and each target network is initialized with the same parameters as its corresponding network, i.e.
θ^{μ'} = θ^μ, θ^{Q_1'} = θ^{Q_1}, θ^{Q_2'} = θ^{Q_2},
wherein:
θ^μ is the parameter set of the Actor network;
θ^{μ'} is the parameter set of the target Actor network;
θ^{Q_1} is the parameter set of Critic network 1;
θ^{Q_1'} is the parameter set of target Critic network 1;
θ^{Q_2} is the parameter set of Critic network 2;
θ^{Q_2'} is the parameter set of target Critic network 2;
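A possible PyTorch realization of the six networks of S202 is sketched below; the hidden-layer sizes, activations and the tanh output scaling are assumptions, since the patent does not fix a network architecture.

    import copy
    import torch
    import torch.nn as nn

    OBS_DIM, ACT_DIM = 4 * 4, 2        # four stacked 4-dimensional observations; a_T = {phi, delta_e}

    class Actor(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, 128), nn.ReLU(),
                                     nn.Linear(128, ACT_DIM), nn.Tanh())
        def forward(self, o):
            return self.net(o)

    class Critic(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, 128), nn.ReLU(),
                                     nn.Linear(128, 1))      # expected cumulative reward
        def forward(self, o, a):
            return self.net(torch.cat([o, a], dim=-1))

    actor, critic1, critic2 = Actor(), Critic(), Critic()
    # Each target network starts from the same (randomly initialized) parameters, as in S202.
    actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))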
S203: setting the maximum number of reinforcement learning steps to N_step.
S3: in each simulation time step, the collected aircraft speed, flight-path angle, attitude angle and attitude angular rate are taken as the observation and input to the agent to obtain the aircraft control quantities as the action; the reward value and the next observation corresponding to the action are calculated, and the observation, action, reward value and next observation of the simulation time step are combined into an experience sample and stored in the experience replay buffer;
S301: from the aircraft speed V_t, flight-path angle γ_t, attitude angle θ_t and attitude angular rate Q_t collected at each simulation time step t, the stacked observation o_T is formed as defined in S201;
S302: the stacked observation o_T obtained in S301 is input to the Actor network, and random noise N is superimposed on the network output to obtain the action a_T = {φ, δ_e};
according to the physical limits of the aircraft, φ is clipped to φ_min ≤ φ ≤ φ_max and δ_e is clipped to δ_emin ≤ δ_e ≤ δ_emax, wherein: φ_min is the minimum fuel-air mixture ratio; φ_max is the maximum fuel-air mixture ratio; δ_emin is the minimum elevator deflection angle; δ_emax is the maximum elevator deflection angle;
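The exploration step of S302 (Actor output plus random noise, clipped to the physical limits) might look as follows; the noise scale and the numerical limits are illustrative assumptions, and actor is any network with the interface of the earlier sketch.

    import numpy as np
    import torch

    PHI_MIN, PHI_MAX = 0.05, 1.0            # illustrative fuel-air mixture ratio limits
    DE_MIN, DE_MAX = -0.35, 0.35            # illustrative elevator deflection limits (rad)

    def select_action(actor, o_T, noise_std=0.1):
        """Actor output plus exploration noise N, clipped to the physical limits (S302)."""
        with torch.no_grad():
            a = actor(torch.as_tensor(o_T, dtype=torch.float32)).numpy()
        a = a + np.random.normal(0.0, noise_std, size=a.shape)
        return np.array([np.clip(a[0], PHI_MIN, PHI_MAX),
                         np.clip(a[1], DE_MIN, DE_MAX)])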
S303: the action a_T is input to the environment of S102 to obtain the observation o_{t+1} of the next simulation time step, and the reward r_T and the next stacked observation o_{T+1} are computed according to the definitions in S201;
S304: the quadruple formed by the stacked observation o_T, the action a_T, the next stacked observation o_{T+1} and the reward r_T is stored in the experience replay buffer as one experience sample.
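One straightforward way to realize the experience replay buffer of S303-S304 is sketched below; the capacity and batch size M are illustrative choices.

    import random
    from collections import deque

    class ReplayBuffer:
        """Stores the quadruples (o_T, a_T, r_T, o_{T+1}) of S304 and samples mini-batches."""
        def __init__(self, capacity=100_000):
            self.buf = deque(maxlen=capacity)
        def store(self, o_T, a_T, r_T, o_T1):
            self.buf.append((o_T, a_T, r_T, o_T1))
        def sample(self, M=64):
            return random.sample(self.buf, M)        # the batch B of M quadruples used in S4
        def __len__(self):
            return len(self.buf)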
S4: when the experience replay buffer contains the specified number of samples, experience data are randomly sampled from it and the agent parameters are adjusted based on the twin-delayed deep deterministic policy gradient algorithm, completing one round of reinforcement learning;
if the accumulated number of reinforcement learning rounds has not reached the maximum number of steps defined in S2, return to S3; otherwise, end the reinforcement learning;
S401: M quadruples are randomly sampled from the experience replay buffer; the batch is denoted B and its i-th quadruple B_i, 1 ≤ i ≤ M;
S402: the next stacked observation o_{T+1} of B_i is input to the target Actor network, and clipped random noise is superimposed on its output to obtain the target action ã_i; the components of ã_i are clipped to φ_min ≤ φ ≤ φ_max and δ_emin ≤ δ_e ≤ δ_emax;
S403: the target action ã_i and the observation o_{T+1} are input to target Critic network 1 and target Critic network 2, which output Q_1i and Q_2i respectively;
S404: the value target is calculated as
y_i = r_T + η·min(Q_1i, Q_2i),
wherein: η represents the discount factor and min(Q_1i, Q_2i) is the smaller of Q_1i and Q_2i;
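Steps S402-S404 (target-policy smoothing and the clipped double-Q target) could be written in PyTorch roughly as follows; the noise parameters, the discount value and the tensor-valued action limits act_low/act_high are assumptions for the example, and actor_t, critic1_t, critic2_t denote the target networks from the earlier sketch.

    import torch

    def td3_targets(o_next, r, actor_t, critic1_t, critic2_t,
                    act_low, act_high, discount=0.99, noise_std=0.2, noise_clip=0.5):
        """Clipped double-Q target of S402-S404: y = r + discount * min(Q'_1, Q'_2)."""
        with torch.no_grad():
            mu = actor_t(o_next)                                            # target Actor action (S402)
            noise = (torch.randn_like(mu) * noise_std).clamp(-noise_clip, noise_clip)
            a_tilde = torch.max(torch.min(mu + noise, act_high), act_low)   # clip to the action limits
            q1 = critic1_t(o_next, a_tilde)                                 # target Critic 1 (S403)
            q2 = critic2_t(o_next, a_tilde)                                 # target Critic 2 (S403)
            return r + discount * torch.min(q1, q2)                         # value target (S404)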
S405: S402-S404 are repeated to obtain the outputs and value targets corresponding to all quadruples in B;
s406: calculating a loss function of a Critic network one
Figure BDA00034954306100000610
Loss function of Critic network two
Figure BDA00034954306100000611
Using a gradient descent method to minimize L 1 And L 2 Updating the parameters of the critical network I and the critical network II for the target
Figure BDA00034954306100000612
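The critic update of S406 amounts to one gradient-descent step on each mean-squared error; a sketch follows, where opt1 and opt2 are assumed to be optimizers (e.g. Adam) over the respective critic parameters, the optimizer choice not being specified in the patent.

    import torch
    import torch.nn.functional as F

    def update_critics(critic1, critic2, opt1, opt2, o, a, y):
        """One gradient-descent step on L_1 and L_2 (S406)."""
        loss1 = F.mse_loss(critic1(o, a), y)          # L_1
        loss2 = F.mse_loss(critic2(o, a), y)          # L_2
        opt1.zero_grad(); loss1.backward(); opt1.step()
        opt2.zero_grad(); loss2.backward(); opt2.step()
        return loss1.item(), loss2.item()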
S407: with the objective of maximizing
J(θ^μ) = (1/M) Σ_{i=1}^{M} Q_1(o_T, μ(o_T); θ^{Q_1}),
the parameter θ^μ of the Actor network is updated by gradient ascent;
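S407 performs gradient ascent on the expected value of Critic network 1 under the current policy; in practice this is usually implemented by descending the negated objective, as in the following sketch (the optimizer is again an assumption).

    import torch

    def update_actor(actor, critic1, actor_opt, o):
        """Gradient ascent on J = mean Q_1(o_T, mu(o_T)) by minimizing -J (S407)."""
        loss = -critic1(o, actor(o)).mean()
        actor_opt.zero_grad()
        loss.backward()
        actor_opt.step()
        return -loss.item()          # current value of the objective J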
S408: the parameters of the target Actor network, target Critic network 1 and target Critic network 2 are updated with the following formula:
θ^{μ'} ← τθ^μ + (1 − τ)θ^{μ'}
θ^{Q_1'} ← τθ^{Q_1} + (1 − τ)θ^{Q_1'}
θ^{Q_2'} ← τθ^{Q_2} + (1 − τ)θ^{Q_2'}          (2)
in formula (2): 0 < τ < 1 is the soft-update factor;
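The soft update of formula (2) can be written in a few lines of PyTorch; the value of τ below is illustrative.

    import torch

    @torch.no_grad()
    def soft_update(target_net, net, tau=0.005):
        """Formula (2): theta' <- tau * theta + (1 - tau) * theta'."""
        for p_t, p in zip(target_net.parameters(), net.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)

    # Applied to all three target networks of the earlier sketch:
    # soft_update(actor_t, actor); soft_update(critic1_t, critic1); soft_update(critic2_t, critic2)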
thus, a round of reinforcement learning is completed;
If the accumulated number of reinforcement learning rounds has not reached the maximum number of steps defined in S203, return to S3; otherwise, end the reinforcement learning.
S5: after the reinforcement learning ends, the agent is saved and the Actor network is stored for use as an adaptive controller; given the stacked observation as input, the adaptive controller outputs the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle.
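Once training is finished, the stored Actor network alone closes the loop: at every control step the stacked observation is pushed through it and the clipped output is applied to the aircraft. A minimal deployment sketch follows; the checkpoint file name, the layer sizes and the clipping limits are assumptions for the example, and the checkpoint is assumed to match this architecture.

    import numpy as np
    import torch
    import torch.nn as nn

    OBS_DIM, ACT_DIM = 16, 2
    actor = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                          nn.Linear(128, 128), nn.ReLU(),
                          nn.Linear(128, ACT_DIM), nn.Tanh())
    actor.load_state_dict(torch.load("trained_actor.pt"))    # hypothetical checkpoint of the stored Actor
    actor.eval()

    def controller(o_T,
                   act_low=np.array([0.05, -0.35]),           # illustrative physical limits
                   act_high=np.array([1.0, 0.35])):
        """Map the stacked observation o_T to the control quantities a_T = {phi, delta_e}."""
        with torch.no_grad():
            a = actor(torch.as_tensor(o_T, dtype=torch.float32)).numpy()
        return np.clip(a, act_low, act_high)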
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, each embodiment does not necessarily contain only a single independent technical solution; this manner of description is merely for clarity, and those skilled in the art should take the description as a whole, the technical solutions of the embodiments also being combinable as appropriate to form further embodiments understandable to those skilled in the art.

Claims (3)

1. An aircraft twin-delayed deep deterministic policy gradient attitude control method, characterized in that the method comprises the following steps:
S1: establishing an aircraft dynamics model and encapsulating it to form a reinforcement learning environment;
S2: initializing a reinforcement learning interaction environment, an agent and a maximum number of steps;
S201: the reinforcement learning interaction environment is defined by three items: the stacked observation o_T, the action a_T and the reward function, defined as follows:
the observation at each simulation time step t is o_t = {V, γ, θ, Q}, wherein: V represents the speed; γ represents the flight-path angle; θ represents the pitch angle; Q represents the attitude angular rate;
the stacked observation o_T = {o_{t-3}, o_{t-2}, o_{t-1}, o_t} is the superposition of four consecutive time-step observations, where t-3, t-2 and t-1 denote the three, two and one time steps preceding the simulation time step t;
the action is a_T = {φ, δ_e}, wherein: φ represents the fuel-air mixture ratio; δ_e represents the elevator deflection angle;
the reward function is r_T = r_1 + r_2, wherein: r_1 = λ_1(V − V_r)² + λ_2(γ − γ_r)² is the reward term related to the speed and flight-path-angle control errors, V_r is the speed command, γ_r is the flight-path-angle command, and λ_1, λ_2 are set negative so as to penalize the speed and flight-path-angle control errors; the term r_2 is designed to give a reward when the speed and flight-path-angle control errors are small:
if |V − V_r| < ε_1 and |γ − γ_r| < ε_2, where ε_1, ε_2 represent the desired control accuracies, then r_2 = P, with P > 0 being the reward value for reaching the desired speed and flight-path-angle control accuracy; otherwise r_2 = 0;
S202: the reinforcement learning agent comprises six neural networks, namely: the Actor network μ(o_T), the target Actor network μ_t(o_T), Critic network 1 Q_1(o_T, a_T; θ^{Q_1}), Critic network 2 Q_2(o_T, a_T; θ^{Q_2}), target Critic network 1 Q_1'(o_T, a_T; θ^{Q_1'}) and target Critic network 2 Q_2'(o_T, a_T; θ^{Q_2'}), wherein:
the input of the Actor network is the stacked observation o_T and its output is the action a_T;
the inputs of Critic network 1 and Critic network 2 are the stacked observation o_T and the action a_T, and their output is the expected value of the cumulative reward obtained after the agent takes that action;
the Actor network has the same structure as the target Actor network, Critic network 1 has the same structure as target Critic network 1, and Critic network 2 has the same structure as target Critic network 2; the parameters of each neural network are initialized randomly, and each target network is initialized with the same parameters as its corresponding network, i.e.
θ^{μ'} = θ^μ, θ^{Q_1'} = θ^{Q_1}, θ^{Q_2'} = θ^{Q_2},
wherein:
θ^μ is the parameter set of the Actor network;
θ^{μ'} is the parameter set of the target Actor network;
θ^{Q_1} is the parameter set of Critic network 1;
θ^{Q_1'} is the parameter set of target Critic network 1;
θ^{Q_2} is the parameter set of Critic network 2;
θ^{Q_2'} is the parameter set of target Critic network 2;
S203: setting the maximum number of reinforcement learning steps to N_step;
S3: obtaining the aircraft control quantities as the action; calculating the reward value and the next observation corresponding to the action, and storing the experience data in the experience replay buffer;
S4: randomly sampling experience data from the experience replay buffer and adjusting the agent parameters based on the twin-delayed deep deterministic policy gradient algorithm to complete one round of reinforcement learning;
S401: randomly sampling M quadruples from the experience replay buffer, denoting the batch B and its i-th quadruple B_i, 1 ≤ i ≤ M;
S402: inputting the next stacked observation o_{T+1} of B_i into the target Actor network and superimposing clipped random noise on its output to obtain the target action ã_i, whose components are clipped to φ_min ≤ φ ≤ φ_max and δ_emin ≤ δ_e ≤ δ_emax;
S403: inputting the target action ã_i and the observation o_{T+1} into target Critic network 1 and target Critic network 2, which output Q_1i and Q_2i respectively;
S404: calculating the value target
y_i = r_T + η·min(Q_1i, Q_2i),
wherein: η represents the discount factor and min(Q_1i, Q_2i) is the smaller of Q_1i and Q_2i;
S405: repeating S402-S404 to obtain the outputs and value targets corresponding to all quadruples in B;
S406: calculating the loss function of Critic network 1, L_1 = (1/M) Σ_{i=1}^{M} (y_i − Q_1(o_T, a_T; θ^{Q_1}))², and the loss function of Critic network 2, L_2 = (1/M) Σ_{i=1}^{M} (y_i − Q_2(o_T, a_T; θ^{Q_2}))², and updating the parameters of Critic network 1 and Critic network 2 by gradient descent with the objective of minimizing L_1 and L_2;
S407: updating the parameter θ^μ of the Actor network by gradient ascent with the objective of maximizing J(θ^μ) = (1/M) Σ_{i=1}^{M} Q_1(o_T, μ(o_T); θ^{Q_1});
S408: updating the parameters of the target Actor network, target Critic network 1 and target Critic network 2 with the following formula:
θ^{μ'} ← τθ^μ + (1 − τ)θ^{μ'}
θ^{Q_1'} ← τθ^{Q_1} + (1 − τ)θ^{Q_1'}
θ^{Q_2'} ← τθ^{Q_2} + (1 − τ)θ^{Q_2'}          (2)
in formula (2): 0 < τ < 1 is the soft-update factor;
thus one round of reinforcement learning is completed;
if the accumulated number of reinforcement learning rounds has not reached the maximum number of steps defined in S203, returning to S3; otherwise, ending the reinforcement learning;
S5: after the reinforcement learning ends, saving the agent and storing the Actor network for use as an adaptive controller; given the stacked observation as input, the adaptive controller outputs the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle.
2. The aircraft twin-delayed deep deterministic policy gradient attitude control method according to claim 1, characterized in that S1 comprises the following steps:
S101: the aircraft dynamics model is established as:
dV/dt = (T cos α − D)/m − g sin γ + d_V
dγ/dt = (T sin α + L)/(mV) − (g/V) cos γ + d_γ
dθ/dt = ω_z
dω_z/dt = M/I_yy + d_Q          (1)
in formula (1):
V represents the speed;
α represents the angle of attack (α = θ − γ);
g represents the gravitational acceleration;
γ represents the flight-path angle (track inclination angle);
θ represents the pitch angle;
ω_z represents the pitch angular velocity;
m represents the aircraft mass;
I_yy represents the pitch-channel moment of inertia;
T = C_{T,φ}(α)φ + C_T(α) represents the thrust, wherein: φ denotes the fuel-air mixture ratio, C_{T,φ}(α) denotes the coefficient between T and φ, and C_T(α) denotes the coefficient between T and α, both obtained from wind-tunnel tests;
D = qSC_D represents the drag, wherein: q denotes the dynamic pressure, S the aircraft reference area, and C_D the drag coefficient, obtained from wind-tunnel tests;
L = qSC_L represents the lift, wherein: C_L denotes the lift coefficient, obtained from wind-tunnel tests;
M = z_T T + qScC_M represents the pitching moment, wherein: z_T denotes the thrust moment-arm length, c the mean aerodynamic chord length, and C_M the pitching-moment coefficient, obtained from wind-tunnel tests;
d_V, d_γ, d_Q represent the uncertainties in the model caused by parameter uncertainty;
S102: the aircraft dynamics model and its solver are implemented as a C program, compiled into a dynamic link library file, and wrapped to form the reinforcement learning environment.
3. The aircraft twin-delayed deep deterministic policy gradient attitude control method according to claim 2, characterized in that S3 comprises the following steps:
S301: from the aircraft speed V_t, flight-path angle γ_t, attitude angle θ_t and attitude angular rate Q_t collected at each simulation time step t, forming the stacked observation o_T as defined in S201;
S302: inputting the stacked observation o_T obtained in S301 into the Actor network and superimposing random noise N on the network output to obtain the action a_T = {φ, δ_e};
according to the physical limits of the aircraft, clipping φ to φ_min ≤ φ ≤ φ_max and δ_e to δ_emin ≤ δ_e ≤ δ_emax, wherein: φ_min is the minimum fuel-air mixture ratio; φ_max is the maximum fuel-air mixture ratio; δ_emin is the minimum elevator deflection angle; δ_emax is the maximum elevator deflection angle;
S303: inputting the action a_T into the environment of S102 to obtain the observation o_{t+1} of the next simulation time step, and computing the reward r_T and the next stacked observation o_{T+1} according to the definitions in S201;
S304: storing the quadruple formed by the stacked observation o_T, the action a_T, the next stacked observation o_{T+1} and the reward r_T in the experience replay buffer as one experience sample.
CN202210113006.4A 2022-01-29 2022-01-29 Aircraft twin-delayed deep deterministic policy gradient attitude control method Active CN114489107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210113006.4A CN114489107B (en) 2022-01-29 2022-01-29 Aircraft double-delay depth certainty strategy gradient attitude control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210113006.4A CN114489107B (en) 2022-01-29 2022-01-29 Aircraft double-delay depth certainty strategy gradient attitude control method

Publications (2)

Publication Number Publication Date
CN114489107A CN114489107A (en) 2022-05-13
CN114489107B (en) 2022-10-25

Family

ID=81479574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210113006.4A Active CN114489107B (en) 2022-01-29 2022-01-29 Aircraft double-delay depth certainty strategy gradient attitude control method

Country Status (1)

Country Link
CN (1) CN114489107B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117289709A (en) * 2023-09-12 2023-12-26 中南大学 High-ultrasonic-speed appearance-changing aircraft attitude control method based on deep reinforcement learning
CN117518836B (en) * 2024-01-04 2024-04-09 中南大学 Robust deep reinforcement learning guidance control integrated method for variant aircraft

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286218A (en) * 2020-12-29 2021-01-29 南京理工大学 Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN113268074A (en) * 2021-06-07 2021-08-17 哈尔滨工程大学 Unmanned aerial vehicle flight path planning method based on joint optimization
CN113377121A (en) * 2020-07-02 2021-09-10 北京航空航天大学 Aircraft intelligent disturbance rejection control method based on deep reinforcement learning
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377121A (en) * 2020-07-02 2021-09-10 北京航空航天大学 Aircraft intelligent disturbance rejection control method based on deep reinforcement learning
CN112286218A (en) * 2020-12-29 2021-01-29 南京理工大学 Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN113268074A (en) * 2021-06-07 2021-08-17 哈尔滨工程大学 Unmanned aerial vehicle flight path planning method based on joint optimization
CN113392935A (en) * 2021-07-09 2021-09-14 浙江工业大学 Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Low-Level Control of a Quadrotor using Twin Delayed Deep Deterministic Policy Gradient (TD3); Mazen Shehab et al.; 2021 18th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE); IEEE; 2021-12-14; pp. 1-6 *
UAV counter-pursuit maneuver decision-making based on an improved twin delayed deep deterministic policy gradient method; 郭万春 et al.; Journal of Air Force Engineering University (Natural Science Edition); 2021-08-31; Vol. 22, No. 4; pp. 15-21 *
Backstepping control of hypersonic vehicle based on tracking differentiator; 路遥; Acta Aeronautica et Astronautica Sinica; 2021-11-25; Vol. 42, No. 11; pp. 524737-1 to 524737-12 *

Also Published As

Publication number Publication date
CN114489107A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN114489107B (en) Aircraft twin-delayed deep deterministic policy gradient attitude control method
CN110377045B (en) Aircraft full-profile control method based on anti-interference technology
CN110806759B (en) Aircraft route tracking method based on deep reinforcement learning
CN109144084B (en) A kind of VTOL Reusable Launch Vehicles Attitude tracking control method based on set time Convergence monitoring device
Klein Estimation of aircraft aerodynamic parameters from flight data
CN109270947B (en) Tilt rotor unmanned aerial vehicle flight control system
CN106896722B (en) The hypersonic vehicle composite control method of adoption status feedback and neural network
CN109635494B (en) Flight test and ground simulation aerodynamic force data comprehensive modeling method
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN113485304B (en) Aircraft hierarchical fault-tolerant control method based on deep learning fault diagnosis
CN112859889B (en) Autonomous underwater robot control method and system based on self-adaptive dynamic planning
CN112286217A (en) Automatic pilot based on radial basis function neural network and decoupling control method thereof
CN115437406A (en) Aircraft reentry tracking guidance method based on reinforcement learning algorithm
CN112880704A (en) Intelligent calibration method for fiber optic gyroscope strapdown inertial navigation system
CN113377121A (en) Aircraft intelligent disturbance rejection control method based on deep reinforcement learning
CN115857530A (en) Decoupling-free attitude control method of aircraft based on TD3 multi-experience pool reinforcement learning
CN117289709A (en) High-ultrasonic-speed appearance-changing aircraft attitude control method based on deep reinforcement learning
CN112560343B (en) J2 perturbation Lambert problem solving method based on deep neural network and targeting algorithm
CN111273056B (en) Attack angle observation method of high-speed aircraft without adopting altitude measurement
CN112683261A (en) Unmanned aerial vehicle robustness navigation method based on speed prediction
CN113821057B (en) Planetary soft landing control method and system based on reinforcement learning and storage medium
CN116907503A (en) Remote sensing formation satellite positioning method and system based on robust positioning algorithm of outlier
CN114462149B (en) Aircraft pneumatic parameter identification method based on pre-training and incremental learning
CN115048724A (en) B-type spline-based method for online identification of aerodynamic coefficient of variant aerospace vehicle
CN117784616B (en) High-speed aircraft fault reconstruction method based on intelligent observer group

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant