CN114489107B - Aircraft twin-delayed deep deterministic policy gradient attitude control method - Google Patents
Aircraft twin-delayed deep deterministic policy gradient attitude control method
- Publication number
- CN114489107B (application CN202210113006.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- aircraft
- reinforcement learning
- target
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
- G05D1/0816—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability
- G05D1/0833—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft to ensure stability using limited authority control
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T90/00—Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation
Abstract
An aircraft twin-delayed deep deterministic policy gradient attitude control method belongs to the technical field of aircraft control. The method comprises the following steps: establishing an aircraft dynamics model and encapsulating it to form a reinforcement learning environment; initializing the reinforcement learning interaction environment, the agent and the maximum step number; obtaining control quantities of the aircraft as the action; calculating the reward value corresponding to the action and the next observation, combining them into experience data and recording the data in an experience replay buffer; and adjusting the agent parameters to complete a round of reinforcement learning; the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle, are then output. The invention provides a high-precision adaptive intelligent control method for aircraft: reinforcement learning is performed with the twin-delayed deep deterministic policy gradient method, realizing the design of an optimal attitude controller that is only weakly dependent on the model. Only a basic model of the aircraft is required, and the parameters in the model need not all be given accurately, which weakens the dependence of control system design on the model.
Description
Technical Field
The invention relates to a twin-delayed deep deterministic policy gradient attitude control method applicable to various aircraft, belonging to the technical field of aircraft control.
Background
Because of high parameter uncertainty, strong coupling, strong model nonlinearity and complex disturbances, it is difficult to establish an accurate control model for an aircraft, while traditional controller design methods depend on a relatively accurate control model. A control method whose design process is only weakly dependent on an accurate aircraft model therefore needs to be developed.
Disclosure of Invention
In order to solve the problems described in the background art, the invention provides an aircraft twin-delayed deep deterministic policy gradient attitude control method.
The invention adopts the following technical scheme: an aircraft twin-delayed deep deterministic policy gradient attitude control method, the method comprising the following steps:
S1: establishing an aircraft dynamics model, and encapsulating it to form a reinforcement learning environment;
S2: initializing the reinforcement learning interaction environment, the agent and the maximum step number;
S3: obtaining control quantities of the aircraft as the action; calculating the reward value corresponding to the action and the next observation, and storing the experience data in an experience replay buffer;
S4: randomly sampling experience data from the experience replay buffer, and adjusting the agent parameters based on the twin-delayed deep deterministic policy gradient algorithm to complete one round of reinforcement learning;
if the accumulated number of reinforcement learning rounds has not reached the maximum step number defined in S2, returning to S3; otherwise, ending the reinforcement learning;
S5: after the reinforcement learning is finished, saving the agent, and saving the Actor network for use as an adaptive controller; given the stacked observation as input, the adaptive controller outputs the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle.
Compared with the prior art, the invention has the beneficial effects that:
the invention relates to a high-precision self-adaptive aircraft intelligent control method, which carries out reinforcement learning by a double-delay depth certainty strategy gradient method, realizes the design of an optimal attitude controller which is weakly dependent on a model, only needs a basic model of an aircraft, and does not need to give out all parameter quantities in the model accurately, thereby weakening the dependence degree of the control system design on the model.
Drawings
FIG. 1 is a design flow diagram of the present invention;
FIG. 2 is a flow chart of reinforcement learning according to the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the invention, rather than all embodiments, and all other embodiments obtained by those skilled in the art without any creative work based on the embodiments of the present invention belong to the protection scope of the present invention.
An aircraft twin-delayed deep deterministic policy gradient attitude control method, the method comprising the following steps:
S1: establishing an aircraft dynamics model based on aircraft design parameters and wind-tunnel data, and encapsulating it to form a reinforcement learning environment;
S101: the aircraft dynamics model is established as formula (1):

dV/dt = (T·cos α − D)/m − g·sin γ + d_V
dγ/dt = (L + T·sin α)/(m·V) − (g·cos γ)/V + d_γ
dθ/dt = ω_z
dω_z/dt = M/I_yy + d_Q          (1)

with the angle of attack α = θ − γ; in formula (1):
V represents the speed;
α represents the angle of attack;
g represents the gravitational acceleration;
γ represents the track inclination angle;
θ represents the pitch angle;
ω_z represents the pitch angular velocity;
m represents the aircraft mass;
I_yy represents the pitch-channel moment of inertia;
T = C_{T,φ}(α)·φ + C_T(α) represents the thrust, where φ denotes the fuel-air mixture ratio, C_{T,φ}(α) denotes the coefficient between T and φ, and C_T(α) denotes the coefficient between T and α, both obtained through wind-tunnel tests;
D = qSC_D represents the drag, where q denotes the dynamic pressure, S denotes the aircraft reference area, and C_D denotes the drag coefficient, obtained through wind-tunnel tests;
L = qSC_L represents the lift, where C_L denotes the lift coefficient, obtained through wind-tunnel tests;
M = z_T·T + qScC_M represents the pitching moment, where z_T denotes the thrust moment arm length, c denotes the mean aerodynamic chord length, and C_M denotes the pitching-moment coefficient, obtained through wind-tunnel tests;
d_V, d_γ, d_Q represent the uncertainties in the model caused by parameter uncertainty;
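For illustration, a minimal simulation sketch of formula (1) follows (Python). The coefficient functions, mass properties, air density and step size below are illustrative assumptions; in the patent these quantities come from aircraft design parameters and wind-tunnel data.

```python
import numpy as np

# Placeholder coefficient models; the patent obtains C_T_phi, C_T, C_D, C_L, C_M
# from wind-tunnel tests, so the forms and numbers here are illustrative only.
def C_T_phi(alpha): return 150.0
def C_T(alpha):     return 20.0 + 5.0 * alpha
def C_D(alpha):     return 0.03 + 0.4 * alpha**2
def C_L(alpha):     return 0.1 + 4.5 * alpha
def C_M(alpha, delta_e): return -0.03 * alpha + 0.02 * delta_e

m, I_yy, S, c, z_T, g = 4000.0, 8000.0, 17.0, 1.6, 0.3, 9.81  # assumed values

def step(state, action, rho=0.4, dt=0.01, d=(0.0, 0.0, 0.0)):
    """One Euler step of formula (1). state = (V, gamma, theta, omega_z)."""
    V, gamma, theta, omega_z = state
    phi, delta_e = action
    alpha = theta - gamma                          # angle of attack
    q = 0.5 * rho * V**2                           # dynamic pressure
    T = C_T_phi(alpha) * phi + C_T(alpha)          # thrust
    D = q * S * C_D(alpha)                         # drag
    L = q * S * C_L(alpha)                         # lift
    M = z_T * T + q * S * c * C_M(alpha, delta_e)  # pitching moment
    d_V, d_gamma, d_Q = d                          # parameter-uncertainty terms
    V_dot = (T * np.cos(alpha) - D) / m - g * np.sin(gamma) + d_V
    gamma_dot = (L + T * np.sin(alpha)) / (m * V) - g * np.cos(gamma) / V + d_gamma
    theta_dot = omega_z
    omega_dot = M / I_yy + d_Q
    return (V + dt * V_dot, gamma + dt * gamma_dot,
            theta + dt * theta_dot, omega_z + dt * omega_dot)
```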
S102: the aircraft dynamics model and its solver are written as a C program and compiled into a dynamic-link library file, forming the reinforcement learning environment.
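Once compiled into a dynamic-link library as in S102, the dynamics can be called from the training side. The library name and the C function signature in this sketch are hypothetical:

```python
import ctypes

# Hypothetical interface: a C function
#   void step(double state[4], const double action[2], double dt);
# compiled from the dynamics program into a dynamic-link library.
lib = ctypes.CDLL("./aircraft_dynamics.so")
lib.step.argtypes = [ctypes.c_double * 4, ctypes.c_double * 2, ctypes.c_double]
lib.step.restype = None

state = (ctypes.c_double * 4)(1000.0, 0.0, 0.02, 0.0)   # V, gamma, theta, omega_z
action = (ctypes.c_double * 2)(0.3, 0.0)                # phi, delta_e
lib.step(state, action, 0.01)                           # advance one time step in place
```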
S2: initializing a reinforcement learning interaction environment, an agent and a maximum step number;
S201: the reinforcement learning interaction environment comprises the stacked observation o_T, the action a_T, and the reward function, defined as follows:

the observation at each simulation time step t is o_t = {V, γ, θ, Q}, where V represents the speed, γ the track inclination angle, θ the pitch angle, and Q the attitude angular rate;

the stacked observation is o_T = {o_{t-3}, o_{t-2}, o_{t-1}, o_t}, i.e. the superposition of the observations of four consecutive simulation time steps, where t-3, t-2 and t-1 denote 3, 2 and 1 time steps before simulation time step t, respectively;

the action is a_T = {φ, δ_e}, where φ represents the fuel-air mixture ratio and δ_e the elevator deflection angle;

the reward function is r_T = r_1 + r_2, where r_1 = λ_1·(V − V_r)² + λ_2·(γ − γ_r)² is the term related to the speed and track-inclination control errors, V_r is the speed command, γ_r is the track-inclination command, and λ_1, λ_2 are set negative to penalize the speed and track-inclination control errors; r_2 is designed to grant a reward when the speed and track-inclination control errors are small:

if |V − V_r| < ε_1 and |γ − γ_r| < ε_2, where ε_1, ε_2 represent the ideal control accuracies, then r_2 = P, with P > 0 the reward value for reaching the ideal speed and track-inclination control accuracy; otherwise r_2 = 0;
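A sketch of the stacked observation o_T and reward r_T of S201 (Python); the numerical values of λ_1, λ_2, ε_1, ε_2 and P are assumed, not taken from the patent:

```python
from collections import deque
import numpy as np

obs_history = deque(maxlen=4)   # holds o_{t-3} ... o_t

def stacked_observation(V, gamma, theta, Q):
    """o_T: concatenation of the last four per-step observations o_t = {V, gamma, theta, Q}."""
    o_t = np.array([V, gamma, theta, Q])
    if not obs_history:                 # pad with the first observation at episode start
        obs_history.extend([o_t] * 4)
    else:
        obs_history.append(o_t)
    return np.concatenate(obs_history)

def reward(V, gamma, V_r, gamma_r,
           lam1=-1e-4, lam2=-100.0,     # negative weights penalizing tracking error
           eps1=1.0, eps2=0.001, P=5.0):
    r1 = lam1 * (V - V_r) ** 2 + lam2 * (gamma - gamma_r) ** 2
    r2 = P if abs(V - V_r) < eps1 and abs(gamma - gamma_r) < eps2 else 0.0
    return r1 + r2                      # r_T = r1 + r2
```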
S202: the reinforcement learning agent comprises six neural networks, namely: the Actor network μ(o_T), the target Actor network μ_t(o_T), Critic network one Q_1(o_T, a_T), Critic network two Q_2(o_T, a_T), target Critic network one Q_{1t}(o_T, a_T), and target Critic network two Q_{2t}(o_T, a_T),

wherein:

the input of the Actor network is the stacked observation o_T and its output is the action a_T;

the inputs of Critic network one and Critic network two are both the stacked observation o_T and the action a_T, and the output is the expected value of the cumulative reward obtained after the agent takes the action;

the Actor network has the same structure as the target Actor network, Critic network one has the same structure as target Critic network one, and Critic network two has the same structure as target Critic network two; the parameters of each neural network are initialized randomly, and the initial parameters of each network equal those of the corresponding target network, namely θ_{μt} = θ_μ, θ_{Q1t} = θ_{Q1}, θ_{Q2t} = θ_{Q2},

where θ_μ denotes the parameters of the Actor network, θ_{Q1} and θ_{Q2} the parameters of the two Critic networks, and θ_{μt}, θ_{Q1t}, θ_{Q2t} the parameters of the corresponding target networks;
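The six networks of S202 could be constructed as follows (PyTorch); the patent does not specify the layer sizes, so the architecture here is an assumption:

```python
import copy
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 16, 2    # o_T stacks four 4-dimensional observations; a_T = (phi, delta_e)

def mlp(in_dim, out_dim, out_act=None):
    layers = [nn.Linear(in_dim, 256), nn.ReLU(),
              nn.Linear(256, 256), nn.ReLU(),
              nn.Linear(256, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

actor   = mlp(OBS_DIM, ACT_DIM, nn.Tanh())   # mu(o_T): normalized action, rescaled to limits later
critic1 = mlp(OBS_DIM + ACT_DIM, 1)          # Q1(o_T, a_T): expected cumulative reward
critic2 = mlp(OBS_DIM + ACT_DIM, 1)          # Q2(o_T, a_T)
# Target networks start as exact copies, matching the initialization in S202.
actor_t, critic1_t, critic2_t = (copy.deepcopy(n) for n in (actor, critic1, critic2))
```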
S203: the maximum number of reinforcement learning steps is set to N_step.
S3: in each simulation time step, the acquired aircraft speed, track inclination angle, attitude angle and attitude angular rate are taken as the observation and input to the agent to obtain the control quantities of the aircraft as the action; the reward value and the next observation corresponding to the action are calculated, and the observation, action, reward value and next observation of the simulation time step are combined into experience data and stored in the experience replay buffer;
S301: from the aircraft speed V_t, track inclination angle γ_t, attitude angle θ_t and attitude angular rate Q_t collected at each simulation time step t, the stacked observation o_T is calculated according to S201;
S302: the stacked observation o_T obtained in S301 is input to the Actor network to obtain the network output, and random noise N is superimposed to obtain the action a_T = {φ, δ_e};

according to the physical constraints of the aircraft, φ is clipped to φ_min ≤ φ ≤ φ_max and δ_e is clipped to δ_e,min ≤ δ_e ≤ δ_e,max, where φ_min and φ_max are the minimum and maximum values of the fuel-air mixture ratio, and δ_e,min and δ_e,max are the minimum and maximum values of the elevator deflection angle;
S303: the action a_T is input to the reinforcement learning environment of S102 to obtain the observation o_{t+1} of the next simulation time step, and the reward r_T and the next stacked observation o_{T+1} are calculated according to the definitions in S201;
S304: the quadruple formed by the stacked observation o_T, the action a_T, the next-step stacked observation o_{T+1} and the reward r_T is stored in the experience replay buffer as experience data.
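Continuing the sketch, steps S301–S304 might be implemented as below; env_step and the clipping bounds are hypothetical stand-ins for the S102 environment and the aircraft's physical constraints:

```python
import numpy as np
import torch

PHI_MIN, PHI_MAX = 0.05, 1.0            # illustrative fuel-air mixture-ratio limits
DE_MIN, DE_MAX = -0.35, 0.35            # illustrative elevator-deflection limits (rad)
replay = []                             # experience replay buffer of (o_T, a_T, r_T, o_{T+1})

def collect_step(o_T, env_step, noise_std=0.1):
    with torch.no_grad():               # S302: Actor output plus exploration noise N
        a = actor(torch.as_tensor(o_T, dtype=torch.float32)).numpy()
    a = a + np.random.normal(0.0, noise_std, size=a.shape)
    a = np.clip(a, [PHI_MIN, DE_MIN], [PHI_MAX, DE_MAX])
    o_next, r = env_step(a)             # S303: advance the environment, get o_{T+1}, r_T
    replay.append((o_T, a, r, o_next))  # S304: store the quadruple
    return o_next
```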
S4: when the data in the experience replay buffer reach a specified number, the experience data are randomly sampled and the agent parameters are adjusted based on the twin-delayed deep deterministic policy gradient algorithm, completing one round of reinforcement learning;

if the accumulated number of reinforcement learning rounds has not reached the maximum step number defined in S2, the method returns to S3; otherwise, the reinforcement learning ends;
S401: M quadruples are randomly sampled from the experience replay buffer and denoted B, where B_i, 1 ≤ i ≤ M, is the i-th quadruple in B;
S402: the next-step stacked observation o_{T+1} stored in B_i is input to the target Actor network, and random noise is superimposed to obtain the action ã_i; the components of ã_i are clipped to φ_min ≤ φ ≤ φ_max and δ_e,min ≤ δ_e ≤ δ_e,max;
S403: the action ã_i and the observation o_{T+1} are input to target Critic network one and target Critic network two, obtaining the outputs Q_{1i} and Q_{2i} respectively;
S404: the value function y_i = r_i + γ_d·min(Q_{1i}, Q_{2i}) is calculated, where r_i is the reward stored in B_i, γ_d represents the discount factor, and min(Q_{1i}, Q_{2i}) represents the minimum of Q_{1i} and Q_{2i};
S405: S402 to S404 are repeated to obtain the outputs and value functions corresponding to all quadruples in B;
S406: the loss function of Critic network one, L_1 = (1/M)·Σ_{i=1}^{M} (y_i − Q_1(o_i, a_i))², and the loss function of Critic network two, L_2 = (1/M)·Σ_{i=1}^{M} (y_i − Q_2(o_i, a_i))², are calculated, where o_i and a_i denote the stacked observation and action stored in B_i; with minimizing L_1 and L_2 as the objective, the parameters of Critic network one and Critic network two are updated by the gradient descent method;
S407: with maximizing J = (1/M)·Σ_{i=1}^{M} Q_1(o_i, μ(o_i)) as the objective, the parameter θ_μ of the Actor network is updated by the gradient ascent method;
S408: the parameters of the target Actor network, target Critic network one and target Critic network two are updated using formula (2):

θ_{μt} ← τ·θ_μ + (1 − τ)·θ_{μt}
θ_{Q1t} ← τ·θ_{Q1} + (1 − τ)·θ_{Q1t}
θ_{Q2t} ← τ·θ_{Q2} + (1 − τ)·θ_{Q2t}          (2)

in formula (2), τ, with 0 < τ < 1, is the smoothing update factor;
thus, a round of reinforcement learning is completed;
if the accumulated number of reinforcement learning rounds has not reached the maximum step number N_step defined in S203, the method returns to S3; otherwise, the reinforcement learning ends.
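A condensed sketch of one S401–S408 update, continuing the networks and replay buffer above (PyTorch). The discount factor, noise scales, learning rates and batch size are assumed values, and actions are treated as normalized to [−1, 1]:

```python
import random
import numpy as np
import torch
import torch.nn.functional as F

GAMMA_D, TAU = 0.99, 0.005                   # assumed discount factor and smoothing factor tau
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=1e-3)

def soft_update(net, target):
    # Formula (2): theta_t <- tau * theta + (1 - tau) * theta_t
    for p, pt in zip(net.parameters(), target.parameters()):
        pt.data.mul_(1.0 - TAU).add_(TAU * p.data)

def td3_update(M=128):
    batch = random.sample(replay, M)                           # S401
    o, a, r, o2 = (torch.as_tensor(np.stack(x), dtype=torch.float32) for x in zip(*batch))
    r = r.unsqueeze(1)
    with torch.no_grad():
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_t(o2) + noise).clamp(-1.0, 1.0)            # S402: smoothed, clipped target action
        q_min = torch.min(critic1_t(torch.cat([o2, a2], 1)),   # S403: both target critics
                          critic2_t(torch.cat([o2, a2], 1)))
        y = r + GAMMA_D * q_min                                # S404: value target
    L1 = F.mse_loss(critic1(torch.cat([o, a], 1)), y)          # S406: critic losses
    L2 = F.mse_loss(critic2(torch.cat([o, a], 1)), y)
    opt_critic.zero_grad(); (L1 + L2).backward(); opt_critic.step()
    actor_loss = -critic1(torch.cat([o, actor(o)], 1)).mean()  # S407: ascend Q1 w.r.t. theta_mu
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    for net, tgt in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
        soft_update(net, tgt)                                  # S408: formula (2)
```

Note that the sketch updates the Actor on every call for brevity; twin-delayed implementations typically update the Actor and the target networks only once every few Critic updates.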
S5: after the reinforcement learning is finished, the agent is saved, and the Actor network is saved for use as an adaptive controller; given the stacked observation as input, the adaptive controller outputs the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle.
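After training, S5 reduces to saving the Actor and evaluating it; a sketch with a hypothetical file name:

```python
import torch

torch.save(actor.state_dict(), "actor_controller.pt")          # save the trained Actor

controller = mlp(OBS_DIM, ACT_DIM, torch.nn.Tanh())            # same structure as the Actor
controller.load_state_dict(torch.load("actor_controller.pt"))
controller.eval()

def control(o_T):
    """Map the stacked observation to the control quantities (phi, delta_e), normalized."""
    with torch.no_grad():
        phi, delta_e = controller(torch.as_tensor(o_T, dtype=torch.float32)).tolist()
    return phi, delta_e
```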
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description is written in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only, and those skilled in the art should take the description as a whole, with the technical solutions of the embodiments combinable as appropriate to form other embodiments understandable to those skilled in the art.
Claims (3)
1. An aircraft twin-delayed deep deterministic policy gradient attitude control method, characterized in that the method comprises the following steps:
S1: establishing an aircraft dynamics model, and encapsulating it to form a reinforcement learning environment;
S2: initializing a reinforcement learning interaction environment, an agent and a maximum step number;
S201: the reinforcement learning interaction environment comprises the stacked observation o_T, the action a_T, and the reward function, defined as follows:

the observation at each simulation time step t is o_t = {V, γ, θ, Q}, where V represents the speed, γ the track inclination angle, θ the pitch angle, and Q the attitude angular rate;

the stacked observation is o_T = {o_{t-3}, o_{t-2}, o_{t-1}, o_t}, i.e. the superposition of the observations of four consecutive simulation time steps, where t-3, t-2 and t-1 denote 3, 2 and 1 time steps before simulation time step t, respectively;

the action is a_T = {φ, δ_e}, where φ represents the fuel-air mixture ratio and δ_e the elevator deflection angle;

the reward function is r_T = r_1 + r_2, where r_1 = λ_1·(V − V_r)² + λ_2·(γ − γ_r)² is the term related to the speed and track-inclination control errors, V_r is the speed command, γ_r is the track-inclination command, and λ_1, λ_2 are set negative to penalize the speed and track-inclination control errors; r_2 is designed to grant a reward when the speed and track-inclination control errors are small:

if |V − V_r| < ε_1 and |γ − γ_r| < ε_2, where ε_1, ε_2 represent the ideal control accuracies, then r_2 = P, with P > 0 the reward value for reaching the ideal speed and track-inclination control accuracy; otherwise r_2 = 0;
S202: the reinforcement learning agent comprises six neural networks, namely: the Actor network μ(o_T), the target Actor network μ_t(o_T), Critic network one Q_1(o_T, a_T), Critic network two Q_2(o_T, a_T), target Critic network one Q_{1t}(o_T, a_T), and target Critic network two Q_{2t}(o_T, a_T),

wherein:

the input of the Actor network is the stacked observation o_T and its output is the action a_T;

the inputs of Critic network one and Critic network two are both the stacked observation o_T and the action a_T, and the output is the expected value of the cumulative reward obtained after the agent takes the action;

the Actor network has the same structure as the target Actor network, Critic network one has the same structure as target Critic network one, and Critic network two has the same structure as target Critic network two; the parameters of each neural network are initialized randomly, and the initial parameters of each network equal those of the corresponding target network, namely θ_{μt} = θ_μ, θ_{Q1t} = θ_{Q1}, θ_{Q2t} = θ_{Q2},

where θ_μ denotes the parameters of the Actor network, θ_{Q1} and θ_{Q2} the parameters of the two Critic networks, and θ_{μt}, θ_{Q1t}, θ_{Q2t} the parameters of the corresponding target networks;
S203: the maximum number of reinforcement learning steps is set to N_step;
S3: obtaining control quantities of the aircraft as the action; calculating the reward value corresponding to the action and the next observation, and storing the experience data in an experience replay buffer;
S4: randomly sampling experience data from the experience replay buffer, and adjusting the agent parameters based on the twin-delayed deep deterministic policy gradient algorithm to complete one round of reinforcement learning;
S401: M quadruples are randomly sampled from the experience replay buffer and denoted B, where B_i, 1 ≤ i ≤ M, is the i-th quadruple in B;
S402: the next-step stacked observation o_{T+1} stored in B_i is input to the target Actor network, and random noise is superimposed to obtain the action ã_i; the components of ã_i are clipped to φ_min ≤ φ ≤ φ_max and δ_e,min ≤ δ_e ≤ δ_e,max;
S403: the action ã_i and the observation o_{T+1} are input to target Critic network one and target Critic network two, obtaining the outputs Q_{1i} and Q_{2i} respectively;
S404: the value function y_i = r_i + γ_d·min(Q_{1i}, Q_{2i}) is calculated, where r_i is the reward stored in B_i, γ_d represents the discount factor, and min(Q_{1i}, Q_{2i}) represents the minimum of Q_{1i} and Q_{2i};
S405: S402 to S404 are repeated to obtain the outputs and value functions corresponding to all quadruples in B;
S406: the loss function of Critic network one, L_1 = (1/M)·Σ_{i=1}^{M} (y_i − Q_1(o_i, a_i))², and the loss function of Critic network two, L_2 = (1/M)·Σ_{i=1}^{M} (y_i − Q_2(o_i, a_i))², are calculated, where o_i and a_i denote the stacked observation and action stored in B_i; with minimizing L_1 and L_2 as the objective, the parameters of Critic network one and Critic network two are updated by the gradient descent method;
S407: with maximizing J = (1/M)·Σ_{i=1}^{M} Q_1(o_i, μ(o_i)) as the objective, the parameter θ_μ of the Actor network is updated by the gradient ascent method;
S408: the parameters of the target Actor network, target Critic network one and target Critic network two are updated using formula (2):

θ_{μt} ← τ·θ_μ + (1 − τ)·θ_{μt}
θ_{Q1t} ← τ·θ_{Q1} + (1 − τ)·θ_{Q1t}
θ_{Q2t} ← τ·θ_{Q2} + (1 − τ)·θ_{Q2t}          (2)

in formula (2), τ, with 0 < τ < 1, is the smoothing update factor;
thus, a round of reinforcement learning is completed;
if the accumulated number of reinforcement learning rounds has not reached the maximum step number N_step defined in S203, the method returns to S3; otherwise, the reinforcement learning ends;
S5: after the reinforcement learning is finished, the agent is saved, and the Actor network is saved for use as an adaptive controller; given the stacked observation as input, the adaptive controller outputs the aircraft control quantities, namely the fuel-air mixture ratio and the elevator deflection angle.
2. The aircraft twin-delayed deep deterministic policy gradient attitude control method according to claim 1, characterized in that S1 comprises the following steps:
S101: the aircraft dynamics model is established as formula (1):

dV/dt = (T·cos α − D)/m − g·sin γ + d_V
dγ/dt = (L + T·sin α)/(m·V) − (g·cos γ)/V + d_γ
dθ/dt = ω_z
dω_z/dt = M/I_yy + d_Q          (1)

with the angle of attack α = θ − γ; in formula (1):
V represents the speed;
α represents the angle of attack;
g represents the gravitational acceleration;
γ represents the track inclination angle;
θ represents the pitch angle;
ω_z represents the pitch angular velocity;
m represents the aircraft mass;
I_yy represents the pitch-channel moment of inertia;
T = C_{T,φ}(α)·φ + C_T(α) represents the thrust, where φ denotes the fuel-air mixture ratio, C_{T,φ}(α) denotes the coefficient between T and φ, and C_T(α) denotes the coefficient between T and α, both obtained through wind-tunnel tests;
D = qSC_D represents the drag, where q denotes the dynamic pressure, S denotes the aircraft reference area, and C_D denotes the drag coefficient, obtained through wind-tunnel tests;
L = qSC_L represents the lift, where C_L denotes the lift coefficient, obtained through wind-tunnel tests;
M = z_T·T + qScC_M represents the pitching moment, where z_T denotes the thrust moment arm length, c denotes the mean aerodynamic chord length, and C_M denotes the pitching-moment coefficient, obtained through wind-tunnel tests;
d_V, d_γ, d_Q represent the uncertainties in the model caused by parameter uncertainty;
S102: the aircraft dynamics model and its solver are written as a C program and compiled into a dynamic-link library file, forming the reinforcement learning environment.
3. The aircraft twin-delayed deep deterministic policy gradient attitude control method according to claim 2, characterized in that S3 comprises the following steps:
S301: from the aircraft speed V_t, track inclination angle γ_t, attitude angle θ_t and attitude angular rate Q_t collected at each simulation time step t, the stacked observation o_T is calculated according to S201;
S302: the stacked observation o_T obtained in S301 is input to the Actor network to obtain the network output, and random noise is superimposed to obtain the action a_T = {φ, δ_e};

according to the physical constraints of the aircraft, φ is clipped to φ_min ≤ φ ≤ φ_max and δ_e is clipped to δ_e,min ≤ δ_e ≤ δ_e,max, where φ_min and φ_max are the minimum and maximum values of the fuel-air mixture ratio, and δ_e,min and δ_e,max are the minimum and maximum values of the elevator deflection angle;
S303: the action a_T is input to the reinforcement learning environment of S102 to obtain the observation o_{t+1} of the next simulation time step, and the reward r_T and the next stacked observation o_{T+1} are calculated according to the definitions in S201;
S304: the quadruple formed by the stacked observation o_T, the action a_T, the next-step stacked observation o_{T+1} and the reward r_T is stored in the experience replay buffer as experience data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210113006.4A CN114489107B (en) | 2022-01-29 | 2022-01-29 | Aircraft twin-delayed deep deterministic policy gradient attitude control method
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210113006.4A CN114489107B (en) | 2022-01-29 | 2022-01-29 | Aircraft twin-delayed deep deterministic policy gradient attitude control method
Publications (2)
Publication Number | Publication Date |
---|---|
CN114489107A CN114489107A (en) | 2022-05-13 |
CN114489107B true CN114489107B (en) | 2022-10-25 |
Family
ID=81479574
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210113006.4A Active CN114489107B (en) | 2022-01-29 | 2022-01-29 | Aircraft twin-delayed deep deterministic policy gradient attitude control method
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114489107B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117289709A (en) * | 2023-09-12 | 2023-12-26 | 中南大学 | High-ultrasonic-speed appearance-changing aircraft attitude control method based on deep reinforcement learning |
CN117518836B (en) * | 2024-01-04 | 2024-04-09 | 中南大学 | Robust deep reinforcement learning guidance control integrated method for variant aircraft |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112286218A (en) * | 2020-12-29 | 2021-01-29 | 南京理工大学 | Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient |
CN113268074A (en) * | 2021-06-07 | 2021-08-17 | 哈尔滨工程大学 | Unmanned aerial vehicle flight path planning method based on joint optimization |
CN113377121A (en) * | 2020-07-02 | 2021-09-10 | 北京航空航天大学 | Aircraft intelligent disturbance rejection control method based on deep reinforcement learning |
CN113392935A (en) * | 2021-07-09 | 2021-09-14 | 浙江工业大学 | Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism |
-
2022
- 2022-01-29 CN CN202210113006.4A patent/CN114489107B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113377121A (en) * | 2020-07-02 | 2021-09-10 | 北京航空航天大学 | Aircraft intelligent disturbance rejection control method based on deep reinforcement learning |
CN112286218A (en) * | 2020-12-29 | 2021-01-29 | 南京理工大学 | Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient |
CN113268074A (en) * | 2021-06-07 | 2021-08-17 | 哈尔滨工程大学 | Unmanned aerial vehicle flight path planning method based on joint optimization |
CN113392935A (en) * | 2021-07-09 | 2021-09-14 | 浙江工业大学 | Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism |
Non-Patent Citations (4)
Title |
---|
Low-Level Control of a Quadrotor using Twin Delayed Deep Deterministic Policy Gradient (TD3); Mazen Shehab et al.; 2021 18th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE); IEEE; 2021-12-14; pp. 1-6 *
UAV counter-pursuit maneuver decision based on an improved twin-delayed deep deterministic policy gradient method (基于改进双延迟深度确定性策略梯度法的无人机反追击机动决策); Guo Wanchun et al.; Journal of Air Force Engineering University (Natural Science Edition); 2021-08-31; vol. 22, no. 4; pp. 15-21 *
Backstepping control of hypersonic vehicles based on a tracking differentiator (基于跟踪微分器的高超声速飞行器Backstepping控制); Lu Yao; Acta Aeronautica et Astronautica Sinica (航空学报); 2021-11-25; vol. 42, no. 11; pp. 524737-1 to 524737-12 *
Also Published As
Publication number | Publication date |
---|---|
CN114489107A (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114489107B (en) | Aircraft twin-delayed deep deterministic policy gradient attitude control method | |
CN110377045B (en) | Aircraft full-profile control method based on anti-interference technology | |
CN110806759B (en) | Aircraft route tracking method based on deep reinforcement learning | |
CN109144084B (en) | A kind of VTOL Reusable Launch Vehicles Attitude tracking control method based on set time Convergence monitoring device | |
Klein | Estimation of aircraft aerodynamic parameters from flight data | |
CN109270947B (en) | Tilt rotor unmanned aerial vehicle flight control system | |
CN106896722B (en) | The hypersonic vehicle composite control method of adoption status feedback and neural network | |
CN109635494B (en) | Flight test and ground simulation aerodynamic force data comprehensive modeling method | |
CN112286218B (en) | Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient | |
CN113485304B (en) | Aircraft hierarchical fault-tolerant control method based on deep learning fault diagnosis | |
CN112859889B (en) | Autonomous underwater robot control method and system based on self-adaptive dynamic planning | |
CN112286217A (en) | Automatic pilot based on radial basis function neural network and decoupling control method thereof | |
CN115437406A (en) | Aircraft reentry tracking guidance method based on reinforcement learning algorithm | |
CN112880704A (en) | Intelligent calibration method for fiber optic gyroscope strapdown inertial navigation system | |
CN113377121A (en) | Aircraft intelligent disturbance rejection control method based on deep reinforcement learning | |
CN115857530A (en) | Decoupling-free attitude control method of aircraft based on TD3 multi-experience pool reinforcement learning | |
CN117289709A (en) | High-ultrasonic-speed appearance-changing aircraft attitude control method based on deep reinforcement learning | |
CN112560343B (en) | J2 perturbation Lambert problem solving method based on deep neural network and targeting algorithm | |
CN111273056B (en) | Attack angle observation method of high-speed aircraft without adopting altitude measurement | |
CN112683261A (en) | Unmanned aerial vehicle robustness navigation method based on speed prediction | |
CN113821057B (en) | Planetary soft landing control method and system based on reinforcement learning and storage medium | |
CN116907503A (en) | Remote sensing formation satellite positioning method and system based on robust positioning algorithm of outlier | |
CN114462149B (en) | Aircraft pneumatic parameter identification method based on pre-training and incremental learning | |
CN115048724A (en) | B-type spline-based method for online identification of aerodynamic coefficient of variant aerospace vehicle | |
CN117784616B (en) | High-speed aircraft fault reconstruction method based on intelligent observer group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |