CN113985924B - Aircraft control method, device, equipment and computer readable storage medium


Info

Publication number
CN113985924B
CN113985924B
Authority
CN
China
Prior art keywords
network
aircraft
data
parameters
critic
Prior art date
Legal status
Active
Application number
CN202111608105.1A
Other languages
Chinese (zh)
Other versions
CN113985924A (en)
Inventor
周志明
刘振
蒲志强
易建强
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202111608105.1A
Publication of CN113985924A
Application granted
Publication of CN113985924B

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The application provides an aircraft control method, an aircraft control device, aircraft control equipment and a computer program product, wherein the method comprises the following steps: determining observation data and agent action data of the aircraft according to model parameters in an aircraft model; training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function; updating the actor network parameters and the critic network parameters through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network; and constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller. According to the aircraft control method provided by the embodiments of the application, the online controller is trained offline through the deep deterministic policy gradient algorithm, so that the online controller has good adaptability and robustness and accurate control of the aircraft is realized.

Description

Aircraft control method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of aircraft control technologies, and in particular, to an aircraft control method, apparatus, device, and computer-readable storage medium.
Background
An aircraft generally has a large flight envelope, a high speed and a wide range of flight altitudes, so the dynamic coefficients that depend on the flight state change drastically during flight. In engineering practice, aircraft control is generally realized with a classical frequency-domain controller. Although this approach can meet the requirements of practical applications to a certain extent, it has several obvious defects: first, a good controller can be designed only if the aircraft is modeled accurately, and the ability to guarantee system stability and dynamic performance is poor under model uncertainty; second, when the state of the aircraft changes over the flight time, the classical frequency-domain controller needs to carry out a large amount of interpolation calculation, which places high demands on the on-board storage space. In addition, the expansion of the flight envelope makes the flight environment during flight more complicated and changeable, with numerous uncertain factors, which brings great difficulty to the precise control of the aircraft.
Disclosure of Invention
The application provides an aircraft control method, an aircraft control device, aircraft control equipment and a computer program product, and aims to realize accurate control of an aircraft.
In a first aspect, the present application provides an aircraft control method comprising:
determining observation data and agent action data of the aircraft according to model parameters in the aircraft model;
training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function;
updating an actor network parameter in the deterministic policy and a critic network parameter in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
and constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller.
In one embodiment, the deep deterministic policy gradient algorithm includes an actor current network and an actor target network,
and updating the actor network parameters in the deterministic policy through the deep deterministic policy gradient algorithm to obtain an optimal actor network includes the following steps:
updating initial actor network parameters in the deterministic policy through the actor current network, selecting a current action according to the current state to interact with the current environment, generating a next state, and transmitting the next state to an experience replay pool;
acquiring the next state from the experience replay pool through the actor target network, selecting an optimal next action according to the next state, and calculating target actor network parameters;
and determining the latest actor network parameters through the actor target network according to the initial actor network parameters, the target actor network parameters and the inertia update rate, and updating the target actor network parameters with the latest actor network parameters to obtain the optimal actor network.
The deep deterministic policy gradient algorithm includes a critic current network and a critic target network,
and updating the critic network parameters in the action value function through the deep deterministic policy gradient algorithm to obtain an optimal critic network includes the following steps:
updating initial critic network parameters in the action value function through the critic current network, calculating a function value of the current action, and constructing a to-be-processed action value function according to the function value of the current action;
estimating the to-be-processed action value function through the critic target network to obtain target critic network parameters;
and determining the latest critic network parameters through the critic target network according to the initial critic network parameters, the target critic network parameters and the inertia update rate, and updating the target critic network parameters with the latest critic network parameters to obtain the optimal critic network.
The actor network includes a first input layer and a first intermediate fully-connected layer,
and training the actor network based on the observation data and outputting a deterministic policy includes the following steps:
determining the observation data as input data for the first input layer;
processing the observation data through the first intermediate fully-connected layer, outputting a rudder deflection angle action, and determining the rudder deflection angle action as the deterministic policy;
wherein the first input layer is a single input layer of 9 neurons, and the first intermediate fully-connected layer consists of two fully-connected layers of 49 neurons each.
The critic network includes a second input layer, a second intermediate fully-connected layer and a third intermediate fully-connected layer,
and training the critic network based on the observation data and the agent action data and outputting an action value function includes the following steps:
determining the observation data and the agent action data as input data of the second input layer;
processing the observation data through the second intermediate fully-connected layer to obtain first data to be processed;
processing the agent action data through the third intermediate fully-connected layer to obtain second data to be processed;
summing the first data to be processed and the second data to be processed to obtain target data, processing the target data through the third intermediate fully-connected layer, and outputting the action value function;
wherein the second input layer is a single input layer of 10 neurons, the second intermediate fully-connected layer consists of two fully-connected layers of 49 neurons each, and the third intermediate fully-connected layer is a single fully-connected layer of 49 neurons.
Determining the observation data and agent action data of the aircraft according to the model parameters in the aircraft model includes the following steps:
determining, with the step as the period, preset frames of data of the deviation between the actual overload and the command, the pitch angular velocity and the pseudo angle of attack in the model parameters as the observation data;
and determining the pitch rudder deflection angle in the model parameters as the agent action data.
Before determining the observation data and agent action data of the aircraft according to the model parameters in the aircraft model, the method further includes:
determining a transfer function from normal overload to pitch rudder deflection angle through transfer-function combination with the normal overload, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
determining a transfer function from pitch angular velocity to pitch rudder deflection angle through transfer-function combination with the pitch angular velocity, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
determining a transfer function from angle of attack to pitch rudder deflection angle through transfer-function combination with the angle of attack, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
and constructing the aircraft model from the transfer function from normal overload to pitch rudder deflection angle, the transfer function from pitch angular velocity to pitch rudder deflection angle and the transfer function from angle of attack to pitch rudder deflection angle.
In a second aspect, the present application also provides an aircraft control device comprising:
a determining module, used for determining observation data and agent action data of the aircraft according to model parameters in the aircraft model;
a training module, used for training an actor network and a critic network based on the observation data and/or the agent action data and outputting a deterministic policy and an action value function;
an updating module, used for updating the actor network parameters in the deterministic policy and the critic network parameters in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
and a control module, used for constructing an online controller based on the optimal actor network and the optimal critic network and controlling the aircraft through the online controller.
In a third aspect, the present application further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the aircraft control method of the first aspect when executing the program.
In a fourth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the aircraft control method of the first aspect.
In a fifth aspect, the present application also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the aircraft control method of the first aspect.
According to the aircraft control method, the aircraft control device, the aircraft control equipment and the computer program product, the online controller is trained offline through a deep deterministic policy gradient algorithm, and aircraft control is realized through the online controller. Because the online controller is obtained by training with the deep deterministic policy gradient algorithm, it adapts well to model characteristic changes caused by unknown parameter disturbances, fault inputs and model uncertainty. Meanwhile, the online controller realizes good command following of the aircraft under parameter deviations, disturbances and faults of a certain degree, has strong robustness and generalization performance, and realizes accurate control of the aircraft through its good adaptability and robustness.
Drawings
In order to more clearly illustrate the technical solutions in the present application or in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of an aircraft control method provided herein;
FIG. 2 is a control framework diagram of an aircraft control method provided herein;
FIG. 3 is a schematic structural diagram of an aircraft control device provided herein;
fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The aircraft control methods, apparatus, devices and computer program products provided herein are described below in conjunction with fig. 1-4.
The present application provides an aircraft control method, with reference to fig. 1 to 4, fig. 1 is a schematic flow chart of the aircraft control method provided herein; FIG. 2 is a control framework diagram of an aircraft control method provided herein; FIG. 3 is a schematic structural diagram of an aircraft control device provided herein; fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in a different order than presented herein.
The embodiments of the present application are described by taking an electronic device as the execution subject, and the aircraft control system is taken as the electronic device in the embodiments of the application; this does not limit the electronic device.
Referring to fig. 1, an aircraft control method provided in an embodiment of the present application includes:
Step S50, determining observation data and agent action data of the aircraft according to the model parameters in the aircraft model.
It should be noted that, before this step, the aircraft control system needs to build an aircraft model, with the aid of which the online controller of the aircraft is constructed. In this embodiment, the aircraft model is constructed with the aid of transfer functions, and the specific construction process is described in steps S10 to S40.
Further, the description of steps S10 to S40 is as follows:
Step S10, determining a transfer function from normal overload to pitch rudder deflection angle through transfer-function combination with the normal overload, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
Step S20, determining a transfer function from pitch angular velocity to pitch rudder deflection angle through transfer-function combination with the pitch angular velocity, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
Step S30, determining a transfer function from angle of attack to pitch rudder deflection angle through transfer-function combination with the angle of attack, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
Step S40, constructing the aircraft model from the transfer function from normal overload to pitch rudder deflection angle, the transfer function from pitch angular velocity to pitch rudder deflection angle and the transfer function from angle of attack to pitch rudder deflection angle.
Specifically, the aircraft control system determines the normal overload, pitch angular velocity, angle of attack, pitch rudder deflection angle and characteristic parameters of the aircraft: the normal overload of the aircraft is denoted $n_y$, the pitch angular velocity of the aircraft is denoted $\omega_z$, the angle of attack of the aircraft is denoted $\alpha$, the pitch rudder deflection angle of the aircraft is denoted $\delta_z$, and the characteristic parameters of the aircraft are denoted $a_i$. The aircraft control system then combines, through transfer-function combination, the normal overload $n_y$, the pitch rudder deflection angle $\delta_z$ and the characteristic parameters $a_i$ to determine the transfer function from normal overload to pitch rudder deflection angle; this transfer function is recorded as the first transfer function and can be expressed as $G_{n_y \delta_z}(s)$. Then, the aircraft control system combines, through transfer-function combination, the pitch angular velocity $\omega_z$, the pitch rudder deflection angle $\delta_z$ and the characteristic parameters $a_i$ to determine the transfer function from pitch angular velocity to pitch rudder deflection angle; this transfer function is recorded as the second transfer function and can be expressed as $G_{\omega_z \delta_z}(s)$. Then, the aircraft control system combines, through transfer-function combination, the angle of attack $\alpha$, the pitch rudder deflection angle $\delta_z$ and the characteristic parameters $a_i$ to determine the transfer function from angle of attack to pitch rudder deflection angle; this transfer function is recorded as the third transfer function and can be expressed as $G_{\alpha \delta_z}(s)$. Finally, the aircraft control system combines the transfer function $G_{n_y \delta_z}(s)$ from normal overload to pitch rudder deflection angle, the transfer function $G_{\omega_z \delta_z}(s)$ from pitch angular velocity to pitch rudder deflection angle and the transfer function $G_{\alpha \delta_z}(s)$ from angle of attack to pitch rudder deflection angle to construct the aircraft model.
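As an illustration of this model structure, the following sketch assembles a pitch channel of this form with scipy.signal. The patent gives the three transfer functions only as equation images, so the numerator and denominator coefficients below, as well as the sampling period, are hypothetical placeholders rather than the patent's values:

    # Sketch: pitch-channel aircraft model built from the three transfer
    # functions described above. All numeric coefficients are assumptions;
    # only the structure (rudder deflection delta_z as input, n_y, omega_z
    # and alpha as outputs) comes from the text.
    import numpy as np
    from scipy import signal

    den = [1.0, 0.8, 4.0]                               # shared short-period dynamics (assumed)
    G_ny = signal.TransferFunction([2.0, 1.5], den)     # first transfer function
    G_wz = signal.TransferFunction([3.0, 0.5], den)     # second transfer function
    G_alpha = signal.TransferFunction([1.2], den)       # third transfer function

    dt = 0.01                                           # control step / sampling period (assumed)
    G_ny_d = G_ny.to_discrete(dt)                       # discretize for step-wise simulation

    t = np.arange(0.0, 5.0, dt)
    delta_z = 0.1 * np.ones_like(t)                     # constant rudder deflection input
    tout, n_y = signal.dlsim(G_ny_d, delta_z, t=t)[:2]  # simulated normal overload response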
Further, the aircraft control system needs to set a reward function through the normal overload, the pitch angular velocity, the angle of attack, the pitch rudder deflection angle and the characteristic parameters of the aircraft. The reward function can be expressed as $r = r_1 + r_2 + r_3 + r_4$. The immediate reward $r_1$ is a function of the deviation $e$ between the actual overload and the command: when the deviation between the control effect and the actual effect is large, a large penalty is output; when the deviation between the control effect and the actual effect is small, a penalty of almost zero is output. The term $r_2$ is used for constraining the energy of the control input. The sparse reward $r_3$ means that if the overload deviation is greater than 0.1 and less than 0.5, $r_3$ takes 1; if the overload deviation is less than 0.1, $r_3$ takes 5; in the remaining cases $r_3$ takes 0. The terminal reward $r_4$ means that when the overload deviation of the system is greater than 100 or less than -100, the exploration is ended and $r_4 = -500$ is output.
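A minimal sketch of such a reward, assuming a quadratic penalty for the immediate reward $r_1$ and a quadratic control-energy penalty for $r_2$ (the patent shows both terms only as images, so those analytic forms and the weights k1, k2 are assumptions; the $r_3$ and $r_4$ thresholds follow the text):

    def reward(e, delta_z, k1=1.0, k2=0.01):
        """Reward r = r1 + r2 + r3 + r4 for one control step.

        e       -- deviation between the actual overload and the command
        delta_z -- pitch rudder deflection angle (the control input)
        k1, k2  -- hypothetical weights for the assumed r1 and r2 forms
        """
        r1 = -k1 * e ** 2            # assumed: large deviation -> large penalty,
                                     # small deviation -> penalty of almost zero
        r2 = -k2 * delta_z ** 2      # assumed quadratic constraint on control energy
        abs_e = abs(e)               # interpreting "overload deviation" as |e|
        if abs_e < 0.1:
            r3 = 5.0                 # sparse reward: deviation below 0.1
        elif abs_e < 0.5:
            r3 = 1.0                 # sparse reward: deviation in (0.1, 0.5)
        else:
            r3 = 0.0                 # all remaining cases
        done = e > 100.0 or e < -100.0
        r4 = -500.0 if done else 0.0 # terminal penalty ends the exploration
        return r1 + r2 + r3 + r4, done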
According to the embodiment of the application, the performance of the online controller is further generalized by constructing the aircraft model and using the aircraft model to assist in constructing the online controller of the aircraft.
After the aircraft model is constructed, the aircraft control system needs to acquire model parameters from the aircraft model, where the model parameters include but are not limited to the deviation between the actual overload and the command and the pitch angular velocity; this embodiment also needs to acquire a pseudo angle of attack because the actual angle of attack is not measurable. Next, the aircraft control system determines the observation data and agent action data of the aircraft according to the deviation between the actual overload and the command, the pitch angular velocity and the pseudo angle of attack in the model parameters, as described in steps S501 to S502.
Further, the description of steps S501 to S502 is as follows:
Step S501, determining, with the step as the period, preset frames of data of the deviation between the actual overload and the command, the pitch angular velocity and the pseudo angle of attack in the model parameters as the observation data;
Step S502, determining the pitch rudder deflection angle in the model parameters as the agent action data.
It should be noted that the preset number of frames in this embodiment is set according to actual conditions, and in this embodiment is preferably 3 frames of data. Specifically, the aircraft control system, taking the step as the period, determines 3 frames of data of the deviation $e$ between the actual overload and the command, the pitch angular velocity $\omega_z$ and the pseudo angle of attack $\hat{\alpha}$ as the observation data.
It can be further understood that, with the step as the period, one available frame of data of the deviation $e$ between the actual overload and the command, the pitch angular velocity $\omega_z$ and the pseudo angle of attack $\hat{\alpha}$ is $(e, \omega_z, \hat{\alpha})$, so the 3 frames of data are the values of $(e, \omega_z, \hat{\alpha})$ at the current step and the two preceding steps; that is, the observation data can be expressed as $o = (e_k, \omega_{z,k}, \hat{\alpha}_k, e_{k-1}, \omega_{z,k-1}, \hat{\alpha}_{k-1}, e_{k-2}, \omega_{z,k-2}, \hat{\alpha}_{k-2})$. Meanwhile, the aircraft control system determines the pitch rudder deflection angle in the model parameters as the agent action data, i.e., the agent action data can be expressed as $a = \delta_z$.
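A sketch of assembling this 9-dimensional observation from a rolling 3-frame history (the frame ordering and the zero-padding used before three frames have accumulated are conventions assumed here, not specified in the text):

    from collections import deque
    import numpy as np

    history = deque(maxlen=3)   # rolling 3-frame buffer, one frame per control step

    def observe(e, omega_z, alpha_hat):
        """Append the newest frame (e, omega_z, pseudo alpha) and return
        the flattened 9-dimensional observation vector."""
        history.append((e, omega_z, alpha_hat))
        frames = list(history)
        while len(frames) < 3:                    # zero-pad at episode start (assumption)
            frames.insert(0, (0.0, 0.0, 0.0))
        return np.asarray(frames, dtype=np.float32).ravel()   # shape (9,)

    obs = observe(e=0.2, omega_z=0.01, alpha_hat=0.05)        # observation data o
    action = np.array([0.0], dtype=np.float32)                # agent action a = delta_z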
According to the method and the device, the preset frames of data of the deviation between the actual overload and the command, the pitch angular velocity and the pseudo angle of attack are determined as the observation data with the step as the period, and the pitch rudder deflection angle is determined as the agent action data, thereby guaranteeing the accuracy of both the observation data and the agent action data.
Step S60, training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function.
The aircraft control system takes the observation data as the input data of the actor network, trains the actor network and outputs a deterministic policy, wherein the actor network comprises an input layer of 9 neurons, two intermediate fully-connected layers of 49 neurons each and an output layer of 1 neuron; the specific training process is described in steps S601 to S602. Meanwhile, the aircraft control system takes the observation data and the agent action data as the input data of the critic network, trains the critic network and outputs an action value function, wherein the critic network comprises an input layer of 10 neurons, two intermediate fully-connected layers of 49 neurons each, one further intermediate fully-connected layer of 49 neurons and an output layer of 1 neuron; the specific training process is described in steps S603 to S606. Note that the input layers, intermediate fully-connected layers and output layers of the actor network and the critic network are all BP (back-propagation) feed-forward neural networks.
Further, this embodiment also provides a control framework to assist the training of the actor network and the critic network so as to achieve fast convergence. Taking the actor network as an example, its control framework is shown in fig. 2; fig. 2 is a control framework diagram of the aircraft control method provided by the application.
As shown in fig. 2, the auxiliary controller in the control framework diagram is a proportional controller, and the gain coefficient $K$ of the proportional controller is selected to match the steady-state gain of the system. Therefore, when the system determines, through the command, the auxiliary controller, the pitch rudder deflection angle $\delta_z$, the normal overload $n_y$, the flight action and the current environment, that a steady state has been reached, the pitch rudder deflection angle output by the actor network is zero.
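A sketch of this control framework, assuming the actor's output is simply added to the auxiliary proportional controller's rudder command (fig. 2 is not reproduced here, so the additive combination and the helper names are assumptions):

    def auxiliary_rudder(command, n_y, K):
        """Proportional auxiliary controller: K is the gain chosen to match
        the steady-state gain of the system (its value depends on the
        transfer-function coefficients, which the patent does not list)."""
        return K * (command - n_y)

    def total_rudder(command, n_y, obs, actor, K):
        # actor is any callable mapping the observation to a rudder deflection.
        # It only needs to learn the residual around the auxiliary controller,
        # so at steady state its contribution goes to zero, as stated above.
        return auxiliary_rudder(command, n_y, K) + float(actor(obs))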
Further, the description of steps S601 to S602 is as follows:
Step S601, determining the observation data as input data for the first input layer;
Step S602, processing the observation data through the first intermediate fully-connected layer, outputting a rudder deflection angle action, and determining the rudder deflection angle action as the deterministic policy.
Specifically, the aircraft control system takes the observation data as the input data of the actor network; since the input layer of the actor network is a single input layer of 9 neurons, the 9 observation values are correspondingly taken as the input data of the input layer. Then, the aircraft control system processes the 9 input observation values through the two intermediate fully-connected layers of 49 neurons each, outputs the processed rudder deflection angle action through the output layer of 1 neuron, and determines the rudder deflection angle action as the deterministic policy; the rudder deflection angle action can be understood as one concrete expression of the deterministic policy. Further, this embodiment may activate the deterministic policy with the activation function tanh, and additionally add a scaling layer in the actor network for scaling the output amplitude.
According to the method and the device, the observation data are used as input data, and the deterministic policy is output through the single input layer of 9 neurons and the two intermediate fully-connected layers of 49 neurons each in the actor network, so that the deterministic policy is more accurate.
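A minimal PyTorch sketch of this actor network (the text specifies the 9-49-49-1 layout, the tanh output activation and a scaling layer; the hidden-layer activation and the output amplitude are assumptions):

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Actor: 9-neuron input layer, two 49-neuron fully-connected layers,
        1-neuron output with tanh, followed by a scaling layer."""

        def __init__(self, max_rudder=0.35):       # output amplitude in rad (assumed)
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(9, 49), nn.ReLU(),       # hidden activation assumed ReLU
                nn.Linear(49, 49), nn.ReLU(),
                nn.Linear(49, 1), nn.Tanh(),       # tanh activates the policy output
            )
            self.max_rudder = max_rudder           # the added scaling layer

        def forward(self, obs):
            return self.max_rudder * self.net(obs)

    actor = Actor()
    delta_z = actor(torch.zeros(1, 9))             # one 9-dimensional observation in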
Further, the description of steps S603 to S606 is as follows:
Step S603, determining the observation data and the agent action data as input data of the second input layer;
Step S604, processing the observation data through the second intermediate fully-connected layer to obtain first data to be processed;
Step S605, processing the agent action data through the third intermediate fully-connected layer to obtain second data to be processed;
Step S606, summing the first data to be processed and the second data to be processed to obtain target data, processing the target data through the third intermediate fully-connected layer, and outputting the action value function.
Specifically, the aircraft control system takes the observation data and the agent action data as the input data of the critic network; since the input layer of the critic network is a single input layer of 10 neurons, the 9 observation values and 1 agent action value are taken as the input data of the input layer. Then, the aircraft control system processes the 9 observation values through the two intermediate fully-connected layers of 49 neurons each to obtain the processed first data to be processed, and simultaneously processes the 1 agent action value through the single intermediate fully-connected layer of 49 neurons to obtain the processed second data to be processed. Next, the aircraft control system sums the first data to be processed from the two fully-connected layers of 49 neurons and the second data to be processed from the fully-connected layer of 49 neurons to obtain the target data. Finally, the aircraft control system processes the target data through one more fully-connected layer of 49 neurons and outputs the resulting action value function through the output layer of 1 neuron; the action value function can be understood as the state-action value estimate for the current state and action. Further, this embodiment may use the activation function ReLU in the critic network.
According to the embodiment of the application, the observation data and the agent action data are used as input data, and the action value function is output through the input layer of 10 neurons, the two intermediate fully-connected layers of 49 neurons each and the single intermediate fully-connected layer of 49 neurons of the critic network, so that the action value function is more accurate.
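A matching PyTorch sketch of the critic (two-layer observation branch, one-layer action branch, branch sum, one more 49-neuron layer, scalar output; ReLU activations per the text):

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        """Critic: 10 inputs in total (9 observations + 1 action),
        branch outputs summed and mapped to a single Q value."""

        def __init__(self):
            super().__init__()
            self.obs_branch = nn.Sequential(       # two 49-neuron layers for observations
                nn.Linear(9, 49), nn.ReLU(),
                nn.Linear(49, 49), nn.ReLU(),
            )
            self.act_branch = nn.Sequential(       # one 49-neuron layer for the action
                nn.Linear(1, 49), nn.ReLU(),
            )
            self.head = nn.Sequential(             # one more 49-neuron layer, then output
                nn.Linear(49, 49), nn.ReLU(),
                nn.Linear(49, 1),
            )

        def forward(self, obs, act):
            return self.head(self.obs_branch(obs) + self.act_branch(act))

    critic = Critic()
    q = critic(torch.zeros(1, 9), torch.zeros(1, 1))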
Step S70, updating the actor network parameters in the deterministic policy and the critic network parameters in the action value function through the deep deterministic policy gradient algorithm to obtain the optimal actor network and the optimal critic network.
It should be noted that the deep deterministic policy gradient algorithm includes an actor current network, an actor target network, a critic current network, and a critic target network.
Specifically, the aircraft control system expresses the deterministic policy as $a = \mu(s \mid \theta^{\mu})$, so the actor network parameter in the deterministic policy is $\theta^{\mu}$. In this embodiment, the actor network parameter $\theta^{\mu}$ is determined by the gradient of the objective function, which can be expressed as
$\nabla_{\theta^{\mu}} J = \mathbb{E}_{s \sim \rho^{\mu}} \left[ \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{a = \mu(s)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right]$,
where $\rho^{\mu}$ is the state distribution of the deterministic policy. The aircraft control system then updates the actor network parameter $\theta^{\mu}$ through the actor current network and the actor target network; the specific updating process is described in steps S701 to S703. The aircraft control system expresses the action value function as $Q(s, a \mid \theta^{Q})$, so the critic network parameter in the action value function is $\theta^{Q}$. In this embodiment, the critic network parameter $\theta^{Q}$ is updated according to the temporal-difference method, and the updating formulas are
$y_t = r_t + \gamma Q'(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$,
$\theta^{Q} \leftarrow \theta^{Q} + \alpha_c \left( y_t - Q(s_t, a_t \mid \theta^{Q}) \right) \nabla_{\theta^{Q}} Q(s_t, a_t \mid \theta^{Q})$,
$\theta^{\mu} \leftarrow \theta^{\mu} + \alpha_a \nabla_{\theta^{\mu}} J$,
where $\gamma$ is the discount factor that converts the future reward into its current proportion and is generally taken as 0.99; $\alpha_c$ is the learning rate of the critic network, generally 0.001; and $\alpha_a$ is the learning rate of the actor network, generally 0.0001. The aircraft control system then updates the critic network parameter $\theta^{Q}$ through the critic current network and the critic target network; the specific updating process is described in steps S704 to S706.
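A minimal sketch of one such update step in PyTorch, assuming actor and critic are the networks sketched above, actor_target and critic_target are structural copies of them, and replay batches arrive as tensors (s, a, r, s_next) with the reward shaped (batch, 1); episode-termination masking of the TD target and exploration noise are omitted for brevity:

    import torch
    import torch.nn.functional as F

    GAMMA, LR_CRITIC, LR_ACTOR = 0.99, 1e-3, 1e-4      # values stated in the text

    critic_opt = torch.optim.Adam(critic.parameters(), lr=LR_CRITIC)
    actor_opt = torch.optim.Adam(actor.parameters(), lr=LR_ACTOR)

    def ddpg_update(s, a, r, s_next):
        """One temporal-difference update on a replay batch."""
        with torch.no_grad():                          # TD target from the target networks
            y = r + GAMMA * critic_target(s_next, actor_target(s_next))
        critic_loss = F.mse_loss(critic(s, a), y)      # squared TD error
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()
        actor_loss = -critic(s, actor(s)).mean()       # ascend the policy gradient
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()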
Further, the description of steps S701 to S703 is as follows:
Step S701, updating the initial actor network parameters in the deterministic policy through the actor current network, selecting a current action according to the current state to interact with the current environment, generating the next state, and transmitting the next state to the experience replay pool;
Step S702, acquiring the next state from the experience replay pool through the actor target network, selecting an optimal next action according to the next state, and calculating the target actor network parameters;
Step S703, determining the latest actor network parameters through the actor target network according to the initial actor network parameters, the target actor network parameters and the inertia update rate, and updating the target actor network parameters with the latest actor network parameters to obtain the optimal actor network.
Specifically, the aircraft control system updates the initial actor network parameters $\theta^{\mu}$ in the deterministic policy through the actor current network, selects a current action $a_t$ according to the current state $s_t$ to interact with the current environment, generating the next state $s_{t+1}$ and a reward, and transmits the next state $s_{t+1}$ and the reward to the experience replay pool. Then, the aircraft control system acquires the next state $s_{t+1}$ from the experience replay pool through the actor target network, selects an optimal next action $a_{t+1}$ according to the next state $s_{t+1}$, and calculates the target actor network parameters $\theta^{\mu'}$. Then, the aircraft control system determines the latest actor network parameters through the actor target network according to the initial actor network parameters $\theta^{\mu}$, the target actor network parameters $\theta^{\mu'}$ and the inertia update rate $\tau$, where the inertia update rate $\tau$ is typically set to 0.001. Finally, the aircraft control system periodically copies the latest actor network parameters to the target actor network parameters $\theta^{\mu'}$ through the actor target network, so as to update the target actor network parameters and obtain the optimal actor network. Thus, the update by which the aircraft control system copies the latest actor network parameters to the target actor network parameters can be expressed as:
$\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau) \theta^{\mu'}$
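A sketch of this soft (inertia) update, applied identically to the actor here and to the critic in step S706 below (again assuming the network objects from the sketches above):

    TAU = 0.001   # inertia update rate stated in the text

    def soft_update(target_net, current_net, tau=TAU):
        """theta_target <- tau * theta_current + (1 - tau) * theta_target."""
        with torch.no_grad():
            for p_t, p in zip(target_net.parameters(), current_net.parameters()):
                p_t.mul_(1.0 - tau).add_(tau * p)

    soft_update(actor_target, actor)       # actor target network update
    soft_update(critic_target, critic)     # critic target network update (step S706)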
the method and the device for controlling the aircraft in the embodiment of the application update the parameters of the actor network through the actor current network and the actor target network in the depth certainty strategy gradient algorithm to obtain the optimal actor network, so that the online controller which enables the online controller to have good adaptability and robustness is constructed through the optimal actor network, and the accurate control of the aircraft is further realized.
Further, the description of steps S704 to S706 is as follows:
step S704, updating the initial critic network parameters in the action value function through the critic current network, calculating the function value of the current action, and constructing a to-be-processed action value function according to the function value of the current action;
step S705, estimating the to-be-processed action value function through the critic target network to obtain a target critic network parameter;
step S706, determining the latest critic network parameters through the critic target network according to the initial critic network parameters, the target critic network parameters and the inertia updating rate, and updating the target critic network parameters through the latest critic network parameters to obtain the optimal critic network.
Specifically, the aircraft control system updates the initial critic network parameters $\theta^{Q}$ in the action value function through the critic current network, calculates the function value of the current action, and constructs a to-be-processed action value function according to the function value of the current action. Then, the aircraft control system estimates the to-be-processed action value function through the critic target network to obtain the target critic network parameters $\theta^{Q'}$. Then, the aircraft control system determines the latest critic network parameters through the critic target network according to the initial critic network parameters $\theta^{Q}$, the target critic network parameters $\theta^{Q'}$ and the inertia update rate $\tau$, where the inertia update rate $\tau$ is typically set to 0.001. Finally, the aircraft control system periodically copies the latest critic network parameters to the target critic network parameters $\theta^{Q'}$ through the critic target network, so as to update the target critic network parameters and obtain the optimal critic network. Thus, copying the latest critic network parameters to the target critic network parameters can be expressed as:
$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau) \theta^{Q'}$
according to the method and the device, the critic network parameters are updated through the critic current network and the critic target network in the depth certainty strategy gradient algorithm to obtain the optimal critic network, so that the online controller which has good adaptability and robustness is constructed through the optimal critic network, and the accurate control of the aircraft is further realized.
And step S80, constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller.
The aircraft control system constructs the online controller from the optimal actor network and the optimal critic network, completing the offline training of the online controller. It can be understood that when the aircraft control system detects an input state, the input state is evaluated by the online controller to complete the control of the aircraft. In one embodiment, the online controller determines the action for the input state through the optimal actor network, judges through the optimal critic network whether the action determined by the optimal actor network is appropriate, and controls the aircraft through the cooperation of the optimal actor network and the optimal critic network.
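A sketch of deploying the trained actor as the online controller, building on the Actor sketch above (the checkpoint file name is an illustrative assumption):

    # Load the offline-trained optimal actor and use it online.
    actor.load_state_dict(torch.load("optimal_actor.pt"))
    actor.eval()

    def online_controller(obs):
        """Map the current 9-dimensional observation to a pitch rudder command."""
        with torch.no_grad():
            obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            return float(actor(obs_t).squeeze())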
This embodiment provides an aircraft control method in which the online controller is trained offline through the deep deterministic policy gradient algorithm and aircraft control is realized through the online controller. Because the online controller is obtained by training with the deep deterministic policy gradient algorithm, it adapts well to model characteristic changes caused by unknown parameter disturbances, fault inputs and model uncertainty. Meanwhile, the online controller realizes good command following of the aircraft under parameter deviations, disturbances and faults of a certain degree, has strong robustness and generalization performance, and realizes accurate control of the aircraft through its good adaptability and robustness.
Further, the aircraft control device provided by the present application is described below; the aircraft control device described below and the aircraft control method described above correspond to each other and may be referred to jointly.
As shown in fig. 3, fig. 3 is a schematic structural diagram of an aircraft control device provided in the present application, where the aircraft control device includes:
a determining module 301, configured to determine observation data and agent action data of an aircraft according to model parameters in an aircraft model;
a training module 302, configured to train an actor network and a critic network based on the observation data and/or the agent action data, and output a deterministic policy and an action value function;
an updating module 303, configured to update an actor network parameter in the deterministic policy and a critic network parameter in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
and a control module 304, configured to construct an online controller based on the optimal actor network and the optimal critic network, and control the aircraft through the online controller.
Further, the updating module 303 is further configured to:
updating initial actor network parameters in the deterministic policy through the actor current network, selecting a current action according to the current state to interact with the current environment, generating a next state, and transmitting the next state to an experience replay pool;
acquiring the next state from the experience replay pool through the actor target network, selecting an optimal next action according to the next state, and calculating target actor network parameters;
and determining the latest actor network parameters through the actor target network according to the initial actor network parameters, the target actor network parameters and the inertia update rate, and updating the target actor network parameters with the latest actor network parameters to obtain the optimal actor network.
Further, the updating module 303 is further configured to:
updating initial critic network parameters in the action value function through the critic current network, calculating a function value of the current action, and constructing a to-be-processed action value function according to the function value of the current action;
estimating the to-be-processed action value function through the critic target network to obtain target critic network parameters;
and determining the latest critic network parameters through the critic target network according to the initial critic network parameters, the target critic network parameters and the inertia update rate, and updating the target critic network parameters with the latest critic network parameters to obtain the optimal critic network.
Further, the training module 302 is further configured to:
determining the observation data as input data for the first input layer;
processing the observation data through the first intermediate fully-connected layer, outputting a rudder deflection angle action, and determining the rudder deflection angle action as the deterministic policy;
wherein the first input layer is a single input layer of 9 neurons, and the first intermediate fully-connected layer consists of two fully-connected layers of 49 neurons each.
Further, the training module 302 is further configured to:
determining the observation data and the agent action data as input data of the second input layer;
processing the observation data through the second intermediate fully-connected layer to obtain first data to be processed;
processing the agent action data through the third intermediate fully-connected layer to obtain second data to be processed;
summing the first data to be processed and the second data to be processed to obtain target data, processing the target data through the third intermediate fully-connected layer, and outputting the action value function;
wherein the second input layer is a single input layer of 10 neurons, the second intermediate fully-connected layer consists of two fully-connected layers of 49 neurons each, and the third intermediate fully-connected layer is a single fully-connected layer of 49 neurons.
Further, the determining module 301 is further configured to:
determining, with the step as the period, preset frames of data of the deviation between the actual overload and the command, the pitch angular velocity and the pseudo angle of attack in the model parameters as the observation data;
and determining the pitch rudder deflection angle in the model parameters as the agent action data.
Further, the aircraft control device further comprises a building module for:
determining a transfer function from normal overload to pitch rudder deflection angle through transfer-function combination with the normal overload, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
determining a transfer function from pitch angular velocity to pitch rudder deflection angle through transfer-function combination with the pitch angular velocity, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
determining a transfer function from angle of attack to pitch rudder deflection angle through transfer-function combination with the angle of attack, the pitch rudder deflection angle and the characteristic parameters of the aircraft;
and constructing the aircraft model from the transfer function from normal overload to pitch rudder deflection angle, the transfer function from pitch angular velocity to pitch rudder deflection angle and the transfer function from angle of attack to pitch rudder deflection angle.
The specific embodiment of the aircraft control device provided in the present application is substantially the same as the embodiments of the aircraft control method described above, and details are not described here.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor) 410, a communication Interface 420, a memory (memory) 430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 are communicated with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform an aircraft control method comprising:
determining observation data and agent action data of the aircraft according to model parameters in the aircraft model;
training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function;
updating an actor network parameter in the deterministic policy and a critic network parameter in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
and constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as independent products. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present application also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the aircraft control method provided by the above-mentioned methods, the method comprising:
determining observation data and agent action data of the aircraft according to model parameters in the aircraft model;
training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function;
updating an actor network parameter in the deterministic policy and a critic network parameter in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
and constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller.
In yet another aspect, the present application also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aircraft control method provided above, the method comprising:
determining observation data and agent action data of the aircraft according to model parameters in the aircraft model;
training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function;
updating an actor network parameter in the deterministic policy and a critic network parameter in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
and constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (9)

1. An aircraft control method, comprising:
combining normal overload of aircraft by transfer function
Figure 445244DEST_PATH_IMAGE001
Pitching rudder deflection angle
Figure 17170DEST_PATH_IMAGE002
And characteristic parameters
Figure 114439DEST_PATH_IMAGE003
Determining a first transfer function of normal overload to pitch rudder deflection angle, said first transfer function being expressed as
Figure 71900DEST_PATH_IMAGE004
Combining the pitch angle velocity of the aircraft by the transfer function
Figure DEST_PATH_IMAGE005
Pitching rudder deflection angle
Figure 370157DEST_PATH_IMAGE006
And characteristic parameters
Figure 554014DEST_PATH_IMAGE003
Determining a second transfer function of pitch angle velocity to pitch rudder deflection angle, said second transfer function being expressed as
Figure DEST_PATH_IMAGE007
Combining the angle of attack of the aircraft by the transfer function
Figure 612231DEST_PATH_IMAGE008
Pitching rudder deflection angle
Figure 909351DEST_PATH_IMAGE009
And characteristic parameters
Figure 237564DEST_PATH_IMAGE003
Determining a third transfer function of the angle of attack to the pitch rudder deflection angle, said third transfer function being expressed as
Figure 502193DEST_PATH_IMAGE010
Combining the first transfer function, the second transfer function and the third transfer function to construct the aircraft model;
setting a reward function through the normal overload, the pitch angle rate, the angle of attack, the pitch rudder deflection angle and the characteristic parameters of the aircraft, the reward function being expressed as (equation image), wherein:
the immediate reward (equation image) depends on the deviation (equation image) of the actual overload from the command, outputting a large penalty when the deviation between the control effect and the actual effect is large and a penalty of almost zero when the deviation is small;
the term (equation image) constrains the energy of the control input;
the sparse reward (equation image) takes 1 if the overload deviation is greater than 0.1 and less than 0.5, takes 5 if the overload deviation is less than 0.1, and takes 0 in all other cases; and
the termination reward (equation image) ends the exploration and outputs (equation image) = -500 when the overload deviation of the system is greater than 100 or less than -100 (a sketch of this reward logic is given after this claim);
determining observation data and agent action data of the aircraft according to model parameters in the aircraft model;
training an actor network and a critic network based on the observation data and/or the agent action data, and outputting a deterministic policy and an action value function;
updating the actor network parameters in the deterministic policy and the critic network parameters in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
constructing an online controller based on the optimal actor network and the optimal critic network, and controlling the aircraft through the online controller; and
assisting the training of the actor network and the critic network through an auxiliary controller, wherein the auxiliary controller is a proportional controller whose gain coefficient is (equation image), and the pitch rudder deflection angle output by the actor network is zero when a steady state is determined to have been reached from the command, the auxiliary controller, the pitch rudder deflection angle, the normal overload, the flight action and the current environment.
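For concreteness, here is a minimal Python sketch of the reward logic this claim describes. Only the sparse-reward thresholds (0.1, 0.5), their values (1, 5, 0) and the termination rule (end the episode and output -500 when the overload deviation exceeds ±100) are taken from the claim text; the functional forms of the immediate reward and the energy term appear only as equation images in the original, so the quadratic penalties and the weight w_energy below are assumptions.

```python
def reward(overload_dev: float, rudder_cmd: float, w_energy: float = 0.01):
    """Hedged sketch of the claim's reward: immediate penalty on overload
    deviation, an energy term on the control input, a sparse bonus, and a
    termination penalty. Forms of r1 and r2 are assumed, not from the patent."""
    # Immediate reward: large penalty for large deviation, near zero for small
    # (assumed quadratic; the patent gives this only as an equation image).
    r1 = -overload_dev ** 2
    # Energy term constraining the control input (assumed quadratic form).
    r2 = -w_energy * rudder_cmd ** 2
    # Sparse reward r3: thresholds and values taken from the claim text.
    abs_dev = abs(overload_dev)
    if abs_dev < 0.1:
        r3 = 5.0
    elif abs_dev < 0.5:
        r3 = 1.0
    else:
        r3 = 0.0
    # Termination penalty r4: exploration ends when the deviation diverges.
    done = overload_dev > 100 or overload_dev < -100
    r4 = -500.0 if done else 0.0
    return r1 + r2 + r3 + r4, done
```

In use, the environment would call this once per control step and feed the returned done flag back into the exploration loop.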
2. The aircraft control method of claim 1, wherein the deep deterministic policy gradient algorithm comprises an actor current network and an actor target network, and
updating the actor network parameters in the deterministic policy through the deep deterministic policy gradient algorithm to obtain the optimal actor network comprises:
updating initial actor network parameters in the deterministic policy through the actor current network, selecting a current action according to the current state to interact with the current environment, generating a next state, and transmitting the next state to an experience replay pool;
acquiring the next state from the experience replay pool through the actor target network, selecting an optimal next action according to the next state, and calculating target actor network parameters; and
determining the latest actor network parameters through the actor target network according to the initial actor network parameters, the target actor network parameters and the inertia update rate, and updating the target actor network parameters with the latest actor network parameters to obtain the optimal actor network.
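The "inertia update rate" step of this claim is, in standard DDPG terms, a soft (Polyak) blend of current-network parameters into the target network. A minimal PyTorch sketch follows; the rate value tau = 0.005 is a conventional assumption, not a figure from the patent.

```python
import torch

def soft_update(target_net: torch.nn.Module,
                current_net: torch.nn.Module,
                tau: float = 0.005):
    """Blend current-network parameters into the target network at the
    inertia update rate tau; the target keeps (1 - tau) of its old values."""
    with torch.no_grad():
        for tgt, cur in zip(target_net.parameters(), current_net.parameters()):
            tgt.mul_(1.0 - tau).add_(tau * cur)
```

The same routine serves both the actor pair described here and the critic pair of claim 3.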
3. The aircraft control method of claim 1, wherein the deep deterministic policy gradient algorithm comprises a critic current network and a critic target network, and
updating the critic network parameters in the action value function through the deep deterministic policy gradient algorithm to obtain the optimal critic network comprises:
updating initial critic network parameters in the action value function through the critic current network, calculating a function value of the current action, and constructing a to-be-processed action value function according to the function value of the current action;
estimating the to-be-processed action value function through the critic target network to obtain target critic network parameters; and
determining the latest critic network parameters through the critic target network according to the initial critic network parameters, the target critic network parameters and the inertia update rate, and updating the target critic network parameters with the latest critic network parameters to obtain the optimal critic network.
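A hedged sketch of the critic update this claim outlines: the critic target network evaluates the actor target network's next action to form a temporal-difference target for the critic current network. The discount factor gamma and the mean-squared-error loss are conventional DDPG choices assumed here, not stated in the patent.

```python
import torch
import torch.nn.functional as F

def critic_loss(critic, critic_target, actor_target, batch, gamma=0.99):
    """One critic update step on a minibatch sampled from the replay pool.
    batch = (obs, act, rew, next_obs, done), all torch tensors."""
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act = actor_target(next_obs)            # optimal next action
        target_q = rew + gamma * (1.0 - done) * critic_target(next_obs, next_act)
    current_q = critic(obs, act)                     # current action value
    return F.mse_loss(current_q, target_q)           # drives the parameter update
```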
4. The aircraft control method of claim 1, wherein the actor network comprises a first input layer and a first intermediate fully connected layer, and
training the actor network based on the observation data and outputting the deterministic policy comprises:
determining the observation data as the input data of the first input layer; and
processing the observation data through the first intermediate fully connected layer, outputting a rudder deflection angle action, and determining the rudder deflection angle action as the deterministic policy,
wherein the first input layer is a single input layer of 9 neurons, and the first intermediate fully connected layer consists of two fully connected layers of 49 neurons each.
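Per this claim, the actor maps a 9-dimensional observation through two 49-neuron fully connected layers to a single rudder deflection action. A PyTorch sketch; the ReLU/tanh activations and the output scaling max_deflection are assumptions, since the patent does not state them.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor network of claim 4: 9 inputs, two 49-unit hidden layers,
    one bounded rudder-deflection output."""
    def __init__(self, max_deflection: float = 30.0):  # degrees, assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9, 49), nn.ReLU(),
            nn.Linear(49, 49), nn.ReLU(),
            nn.Linear(49, 1), nn.Tanh(),  # squash the action to [-1, 1]
        )
        self.max_deflection = max_deflection

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.max_deflection * self.net(obs)
```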
5. The aircraft control method of claim 1, wherein the critic network comprises a second input layer, a second intermediate fully connected layer and a third intermediate fully connected layer, and
training the critic network based on the observation data and the agent action data and outputting the action value function comprises:
determining the observation data and the agent action data as the input data of the second input layer;
processing the observation data through the second intermediate fully connected layer to obtain first to-be-processed data;
processing the agent action data through the third intermediate fully connected layer to obtain second to-be-processed data; and
summing the first to-be-processed data and the second to-be-processed data to obtain target data, processing the target data through the third intermediate fully connected layer, and outputting the action value function,
wherein the second input layer is an input layer of 10 neurons, the second intermediate fully connected layer is a fully connected layer of 49 neurons, and the third intermediate fully connected layer is a fully connected layer of 49 neurons.
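A sketch of the two-branch critic this claim describes: one 49-unit layer for the 9-dimensional observation and one 49-unit layer for the 1-dimensional action (10 inputs in total), summed and mapped to a scalar action value. The ReLU activation and the final linear head are assumptions; the claim's reuse of the third layer on the summed data is simplified here to a single head.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Critic network of claim 5: observation and action branches are
    summed before producing the scalar action value Q(s, a)."""
    def __init__(self):
        super().__init__()
        self.obs_branch = nn.Linear(9, 49)  # second intermediate FC layer
        self.act_branch = nn.Linear(1, 49)  # third intermediate FC layer
        self.head = nn.Linear(49, 1)        # scalar Q-value head (assumed)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.obs_branch(obs) + self.act_branch(act))
        return self.head(h)
```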
6. The aircraft control method of claim 1, wherein determining the observation data and the agent action data of the aircraft according to the model parameters in the aircraft model comprises:
determining, with the step length as the period, a preset number of frames of the deviation of the actual overload from the command, the pitch angle rate and the pseudo angle of attack in the model parameters as the observation data; and
determining the pitch rudder deflection angle in the model parameters as the agent action data.
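A sketch of the observation construction in this claim: the most recent frames of overload deviation, pitch angle rate and pseudo angle of attack are stacked once per control step. The frame count of 3 is an inference (3 frames of 3 signals matches the actor's 9-neuron input layer in claim 4), not a figure stated in this claim.

```python
from collections import deque
import numpy as np

N_FRAMES = 3  # inferred from the 9-neuron input layer of claim 4

class ObservationStack:
    """Hold the most recent frames of (overload deviation, pitch angle rate,
    pseudo angle of attack), sampled once per step period."""
    def __init__(self):
        self.frames = deque([(0.0, 0.0, 0.0)] * N_FRAMES, maxlen=N_FRAMES)

    def step(self, overload_dev: float, pitch_rate: float, pseudo_aoa: float):
        self.frames.append((overload_dev, pitch_rate, pseudo_aoa))
        # Flatten to the 9-dimensional observation vector fed to the actor.
        return np.asarray(list(self.frames), dtype=np.float32).ravel()
```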
7. An aircraft control device, comprising:
a construction module, configured to: combine, through a transfer function, the normal overload (equation image), the pitch rudder deflection angle (equation image) and the characteristic parameters (equation image) of the aircraft, and determine a first transfer function of normal overload to pitch rudder deflection angle, the first transfer function being expressed as (equation image); combine, through a transfer function, the pitch angle rate (equation image), the pitch rudder deflection angle (equation image) and the characteristic parameters (equation image) of the aircraft, and determine a second transfer function of pitch angle rate to pitch rudder deflection angle, the second transfer function being expressed as (equation image); combine, through a transfer function, the angle of attack (equation image), the pitch rudder deflection angle (equation image) and the characteristic parameters (equation image) of the aircraft, and determine a third transfer function of angle of attack to pitch rudder deflection angle, the third transfer function being expressed as (equation image); and combine the first transfer function, the second transfer function and the third transfer function to construct the aircraft model;
the construction module being further configured to set a reward function through the normal overload, the pitch angle rate, the angle of attack, the pitch rudder deflection angle and the characteristic parameters of the aircraft, the reward function being expressed as (equation image), wherein: the immediate reward (equation image) depends on the deviation (equation image) of the actual overload from the command, outputting a large penalty when the deviation between the control effect and the actual effect is large and a penalty of almost zero when the deviation is small; the term (equation image) constrains the energy of the control input; the sparse reward (equation image) takes 1 if the overload deviation is greater than 0.1 and less than 0.5, takes 5 if the overload deviation is less than 0.1, and takes 0 in all other cases; and the termination reward (equation image) ends the exploration and outputs (equation image) = -500 when the overload deviation of the system is greater than 100 or less than -100;
a determination module, configured to determine observation data and agent action data of the aircraft according to model parameters in the aircraft model;
a training module, configured to train an actor network and a critic network based on the observation data and/or the agent action data, and to output a deterministic policy and an action value function;
an updating module, configured to update the actor network parameters in the deterministic policy and the critic network parameters in the action value function through a deep deterministic policy gradient algorithm to obtain an optimal actor network and an optimal critic network;
a control module, configured to construct an online controller based on the optimal actor network and the optimal critic network and to control the aircraft through the online controller; and
an auxiliary training module, configured to assist the training of the actor network and the critic network through an auxiliary controller, wherein the auxiliary controller is a proportional controller whose gain coefficient is (equation image), and the pitch rudder deflection angle output by the actor network is zero when a steady state is determined to have been reached from the command, the auxiliary controller, the pitch rudder deflection angle, the normal overload, the flight action and the current environment.
8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the aircraft control method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the aircraft control method according to any one of claims 1 to 6.
CN202111608105.1A 2021-12-27 2021-12-27 Aircraft control method, device, equipment and computer readable storage medium Active CN113985924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111608105.1A CN113985924B (en) 2021-12-27 2021-12-27 Aircraft control method, device, equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN113985924A (en) 2022-01-28
CN113985924B (en) 2022-04-08

Family

ID=79734400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111608105.1A Active CN113985924B (en) 2021-12-27 2021-12-27 Aircraft control method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113985924B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286218A (en) * 2020-12-29 2021-01-29 南京理工大学 Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN112363519A (en) * 2020-10-20 2021-02-12 天津大学 Four-rotor unmanned aerial vehicle reinforcement learning nonlinear attitude control method
CN112558470A (en) * 2020-11-24 2021-03-26 中国科学技术大学 Optimal consistency control method and device for actuator saturated multi-agent system
CN112861442A (en) * 2021-03-10 2021-05-28 中国人民解放军国防科技大学 Multi-machine collaborative air combat planning method and system based on deep reinforcement learning
CN112966816A (en) * 2021-03-31 2021-06-15 东南大学 Multi-agent reinforcement learning method surrounded by formation
CN113391556A (en) * 2021-08-12 2021-09-14 中国科学院自动化研究所 Group distributed control method and device based on role distribution

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3101703B1 (en) * 2019-10-03 2021-11-26 Thales Sa AUTOMATIC LEARNING FOR MISSION SYSTEM
CN111595210A (en) * 2020-04-30 2020-08-28 南京理工大学 Precise vertical recovery control method for large-airspace high-dynamic rocket sublevel landing area


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Intelligent cooperative attack-defense confrontation of multiple UAVs with asymmetric maneuvering capability; Chen Can, et al.; Acta Aeronautica et Astronautica Sinica; 2020-12-31; Vol. 41, No. 12; 342-354 *

Also Published As

Publication number Publication date
CN113985924A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
JP7003355B2 (en) Autonomous navigation with a spiking neuromorphic computer
JP7014368B2 (en) Programs, methods, devices, and computer-readable storage media
EP3485432B1 (en) Training machine learning models on multiple machine learning tasks
EP3480741B1 (en) Reinforcement and imitation learning for a task
EP3459021B1 (en) Training neural networks using synthetic gradients
CN107209872B (en) Systems, methods, and storage media for training a reinforcement learning system
CN108051999B (en) Accelerator beam orbit control method and system based on deep reinforcement learning
CN111856925B (en) State trajectory-based confrontation type imitation learning method and device
EP3586277A1 (en) Training policy neural networks using path consistency learning
KR101706367B1 (en) Neural network-based fault-tolerant control method of underactuated autonomous vehicle
CN110088775B (en) Environmental prediction using reinforcement learning
CN112119404A (en) Sample efficient reinforcement learning
CN110692066A (en) Selecting actions using multimodal input
WO2018189404A1 (en) Distributional reinforcement learning
JP6446126B2 (en) Processing system and program
CN107797454A (en) Multi-agent system collaboration fault tolerant control method based on finite-time control
US20220366246A1 (en) Controlling agents using causally correct environment models
CN114253296B (en) Hypersonic aircraft airborne track planning method and device, aircraft and medium
Farivar et al. Continuous reinforcement learning to robust fault tolerant control for a class of unknown nonlinear systems
Baker A learning-boosted quasi-newton method for ac optimal power flow
CN111914069A (en) Training method and device, dialogue processing method and system and medium
CN112947505B (en) Multi-AUV formation distributed control method based on reinforcement learning algorithm and unknown disturbance observer
Meng et al. Adaptive fault tolerant control for a class of switched nonlinear systems with unknown control directions
CN111487992A (en) Unmanned aerial vehicle sensing and obstacle avoidance integrated method and device based on deep reinforcement learning
Zhang et al. Adaptive asymptotic tracking control for autonomous underwater vehicles with non-vanishing uncertainties and input saturation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant