CN118034373A

CN118034373A - Method and system for controlling residence of optimal intelligent area of stratospheric airship environment

Info

Publication number: CN118034373A
Application number: CN202410291247.7A
Authority: CN
Inventors: 郑泽伟; 温弘毅; 张一飞; 陈天; 祝明
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2024-03-14
Filing date: 2024-03-14
Publication date: 2024-05-14

Abstract

The invention discloses a method and a system for environment-optimal intelligent regional residence control of a stratospheric airship, and relates to the technical field of intelligent control. The method comprises: obtaining an actual track, an expected track, an airship position, an airship speed and an airship acceleration of a target stratospheric airship; determining, according to the actual track and the expected track, a linear error and an angle error between the two tracks by means of an environment optimal outer ring controller; determining an observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error; and determining, according to the observation state and based on an inner ring attitude controller, the two-dimensional action quantity to be executed by the target stratospheric airship. Because the outer ring controller is designed on the basis of environment optimal control theory, forward wind resistance control is achieved autonomously, so that regional residence is robust to unknown wind field disturbance and to model uncertainty.

Description

Method and system for controlling residence of optimal intelligent area of stratospheric airship environment

Technical Field

The invention relates to the technical field of intelligent control, in particular to a method and a system for controlling residence in an optimal intelligent area of a stratospheric airship environment.

Background

In recent years, with the development of technology, stratospheric airships have been widely studied. As an emerging aerial vehicle, the stratospheric airship has great application potential in fields such as communication, observation and transportation. However, its control remains a difficult problem in practical applications. In particular, how to achieve regional residence control of a stratospheric airship in a complex atmospheric environment has become a focus of research.

The traditional linear control method can control the stratospheric airship over a short time, but has great difficulty keeping it near the operating point for a long time in a complex atmospheric environment. This is mainly because linear control algorithms are limited in coping with system uncertainty, external disturbance and the like, which makes control parameter tuning complicated. To overcome this problem, researchers have begun to explore nonlinear control methods. These can improve the control performance of the stratospheric airship to a certain extent, but often depend strongly on accurate modeling. In practical applications, accurate modeling is difficult to achieve because of the complexity of the atmospheric environment and the nonlinear nature of the airship system, so nonlinear control methods also remain limited in practice.

Disclosure of Invention

The invention aims to provide a method and a system for controlling regional residence in an optimal intelligent area of a stratospheric airship environment, which can control the stratospheric airship to realize regional residence resistant to unknown wind field disturbance and resistant to model uncertainty.

In order to achieve the above object, the present invention provides the following solutions:

in a first aspect, the present invention provides a method for controlling residence in an optimal intelligent area of a stratospheric airship environment, including:

acquiring environmental parameters of a target stratospheric airship; the environmental parameters include actual trajectory, desired trajectory, airship position, airship speed, and airship acceleration.

According to the actual track and the expected track of the target stratospheric airship, determining a linear error and an angle error between the actual track and the expected track by adopting an environment optimal outer ring controller; the environment-optimal outer ring controller is an outer ring controller designed based on the principle that the target stratospheric airship converges to a forward wind resistance point under the action of an unknown wind field.

And determining the observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error.

According to the observation state, determining the two-dimensional action quantity required to be executed by the target stratospheric airship based on an inner ring attitude controller; the two-dimensional action quantity comprises a forward motion control quantity and a steering motion control quantity; the inner ring attitude controller is a controller obtained by training a neural network based on a depth deterministic strategy algorithm (deep deterministic policy gradient, DDPG).

Optionally, before acquiring the environmental parameter of the target stratospheric airship, the method further comprises:

and acquiring the circle center of the virtual arc of the residence area and the radius of the virtual arc of the residence area.

And determining the expected track of the target stratospheric airship according to the circle center of the virtual arc of the residence area and the radius of the virtual arc of the residence area.

Optionally, according to the actual track and the expected track of the target stratospheric airship, an environment optimal outer ring controller is adopted to determine a linear error and an angle error between the actual track and the expected track, which specifically includes:

determining the linear error $e_{b,x}$ according to the formula [formula not reproduced]; and

determining the angle error according to the formula $e_\psi = \psi_c - \psi$.

Wherein [symbol not reproduced] is the x-direction error, [symbol not reproduced] is the y-direction error, $\psi_c$ is the desired heading angle, $e_{b,x}$ is the linear error, $e_\psi$ is the angle error, $r_c$ is the radius of the virtual arc of the residence area, and $\psi$ is the heading angle.
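As a minimal sketch of the angle-error formula e_ψ = ψ_c - ψ, the Python snippet below computes the error and wraps it to [-π, π). The wrapping step and the function name `heading_error` are assumptions added for illustration; the text gives only the plain difference.

```python
import math

def heading_error(psi_c: float, psi: float) -> float:
    """Return e_psi = psi_c - psi, wrapped to [-pi, pi).

    The wrapping is an assumption; the source states only the difference.
    """
    e = psi_c - psi
    return (e + math.pi) % (2.0 * math.pi) - math.pi

# Desired heading 170 deg, actual heading -170 deg:
# the raw difference is 340 deg, the wrapped error is about -20 deg.
print(math.degrees(heading_error(math.radians(170.0), math.radians(-170.0))))
```

Without wrapping, a heading just either side of ±180° would produce an error close to 360° and command a nearly full turn, which is why the wrap is commonly added in practice.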

Optionally, determining the observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error specifically includes:

determining the observation state of the target stratospheric airship according to the formula $s_t = [e_{b,x}, e_\psi, u, r, x, y, du, dr]$.

Wherein $s_t$ is the observation state, $u$ is the forward speed of the airship, $r$ is the steering speed, $du$ is the forward acceleration of the airship, $dr$ is the steering angular acceleration, $e_{b,x}$ is the linear error, $e_\psi$ is the angle error, $x$ is the abscissa of the airship in the inertial coordinate system, and $y$ is the ordinate of the airship in the inertial coordinate system.
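The eight-element observation state can be assembled as in the following sketch. The class name `Observation`, the method `as_vector` and the numeric values are hypothetical; only the ordering of the eight quantities follows the text.

```python
from typing import List, NamedTuple

class Observation(NamedTuple):
    """Observation s_t in the order [e_bx, e_psi, u, r, x, y, du, dr]."""
    e_bx: float   # linear error
    e_psi: float  # angle error
    u: float      # forward speed
    r: float      # steering speed
    x: float      # abscissa in the inertial coordinate system
    y: float      # ordinate in the inertial coordinate system
    du: float     # forward acceleration
    dr: float     # steering angular acceleration

    def as_vector(self) -> List[float]:
        """Flatten to the 8-element list fed to the controller."""
        return [float(v) for v in self]

# Illustrative numbers only; they are not taken from the source.
s_t = Observation(e_bx=1.5, e_psi=0.1, u=8.0, r=0.02,
                  x=30.0, y=-40.0, du=0.0, dr=0.0)
print(len(s_t.as_vector()))
```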

Optionally, the training process of the neural network based on the depth deterministic strategy algorithm is as follows:

Acquiring historical sample data of the target stratospheric airship; the historical sample data comprise an old state, a two-dimensional action quantity and a new state of the target stratospheric airship; the old state is an observation state when the target stratospheric airship does not execute two-dimensional action quantity; the new state is an observation state of the target stratospheric airship after the two-dimensional action quantity is executed.

And calculating the rewarding value of the two-dimensional action quantity in the current iteration process according to the new state.

And forming the old state, the two-dimensional action quantity, the rewarding value and the new state into a quadruple, and storing the quadruple into an experience playback pool.

Extracting a set number of quadruples from the experience playback pool according to an importance sampling strategy, and calculating a loss value according to the quadruples and a loss function of a target network; the target network is a neural network based on a depth deterministic strategy algorithm.

Optimizing the weights of the actor neural network and the critic neural network by using an optimizer according to the loss value.

Updating the weight of the target network according to the weight of the actor neural network and the weight of the critic neural network to obtain the trained neural network based on the depth deterministic strategy algorithm.

Optionally, the reward value of the two-dimensional action quantity in the current iteration is calculated according to the new state, specifically as follows:

$r_t = r_{err} + r_{acc}$.

Wherein $r_{err}$ is the reward value corresponding to the tracking error, $r_{err} = K_{b,x} \cdot \exp(-k_{b,x} \cdot e_{b,x}) + K_\psi \cdot \exp(-k_\psi \cdot e_\psi)$; $r_{acc}$ is the penalty caused by acceleration, $r_{acc} = -K_{du} \cdot |du| - K_{dr} \cdot |dr|$; $K_{b,x}$, $K_\psi$, $K_{du}$ and $K_{dr}$ are control parameters, and $k_{b,x}$ and $k_\psi$ are scaling factors; $r_t$ is the reward value of the two-dimensional action quantity; $e_{b,x}$ is the linear error and $e_\psi$ is the angle error.
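The reward can be written as a short function, sketched below. The function name `reward` and all default gain values are hypothetical placeholders, since the text does not give numeric gains. The errors enter the exponentials exactly as written in the formula; whether their magnitudes are intended is not stated.

```python
import math

def reward(e_bx: float, e_psi: float, du: float, dr: float,
           K_bx: float = 1.0, K_psi: float = 1.0,    # hypothetical gains
           k_bx: float = 0.1, k_psi: float = 1.0,    # hypothetical scaling factors
           K_du: float = 0.01, K_dr: float = 0.01) -> float:
    """r_t = r_err + r_acc, following the formula in the text."""
    # Tracking term: largest when both errors are zero.
    r_err = K_bx * math.exp(-k_bx * e_bx) + K_psi * math.exp(-k_psi * e_psi)
    # Acceleration term: penalises abrupt changes in speed and turn rate.
    r_acc = -K_du * abs(du) - K_dr * abs(dr)
    return r_err + r_acc

print(reward(e_bx=0.0, e_psi=0.0, du=0.0, dr=0.0))
print(reward(e_bx=5.0, e_psi=0.3, du=1.0, dr=1.0))
```

With these placeholder gains, perfect tracking at zero acceleration gives the largest reward, and the acceleration penalty discourages control chatter.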

Optionally, the loss value is calculated by using the target network, and the weights of the actor neural network and the critic neural network are optimized by using an optimizer, specifically as follows:

$y_i = r_i + \gamma \cdot Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$.

Wherein $y_i$ is the target value; $L_c$ and $L_a$ are the loss functions of the critic neural network and the actor neural network, respectively; $\theta^{\mu}$ and $\theta^{Q}$ are the weights of the actor neural network and the critic neural network, respectively; $\theta^{\mu'}$ and $\theta^{Q'}$ are the weights of the target actor and the target critic, respectively; the optimizer used to update the critic and actor weights is the Adam optimizer; and $\gamma$ is the discount factor.
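The expressions for the two loss functions are not reproduced in this text. For orientation only, the standard DDPG formulation is shown below; it is an assumption here and may differ from the formulas of the invention, for instance by per-sample importance weights. $N$ denotes the batch size and $Q$, $\mu$ the critic and actor networks.

```latex
L_c = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - Q(s_i, a_i \mid \theta^{Q})\bigr)^{2},
\qquad
L_a = -\frac{1}{N}\sum_{i=1}^{N} Q\bigl(s_i, \mu(s_i \mid \theta^{\mu}) \mid \theta^{Q}\bigr)
```

Under this formulation, decreasing $L_c$ moves the critic's estimate toward the target value $y_i$, and decreasing $L_a$ moves the actor toward actions that the critic rates more highly.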

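Optionally, the weight of the target network is updated according to the weight of the actor neural network and the weight of the critic neural network, specifically as follows:

$\theta^{\mu'} \leftarrow \lambda \theta^{\mu} + (1 - \lambda) \theta^{\mu'}$.

$\theta^{Q'} \leftarrow \lambda \theta^{Q} + (1 - \lambda) \theta^{Q'}$.

Where $\lambda$ is the soft update rate, $\theta^{\mu}$ and $\theta^{Q}$ are the weights of the actor neural network and the critic neural network, respectively, and $\theta^{\mu'}$ and $\theta^{Q'}$ are the weights of the corresponding target networks.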
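A minimal sketch of the soft update rule follows. Weights are stored as plain lists keyed by a layer name; the layout, the name `soft_update` and the value λ = 0.01 are assumptions chosen for illustration.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]

def soft_update(target: Weights, online: Weights, lam: float) -> None:
    """In place: theta' <- lam * theta + (1 - lam) * theta'."""
    for name, theta in online.items():
        target[name] = [lam * w + (1.0 - lam) * w_t
                        for w, w_t in zip(theta, target[name])]

# Hypothetical one-layer example.
actor = {"layer1": [1.0, 2.0]}
target_actor = {"layer1": [0.0, 0.0]}
soft_update(target_actor, actor, lam=0.01)
print(target_actor)
```

A small λ makes the target network trail the online network slowly, which keeps the target value from shifting abruptly between updates.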

In a second aspect, the present invention provides a stratospheric airship environment optimal intelligent regional residence control system, including:

the parameter acquisition module is used for acquiring the environmental parameters of the target stratospheric airship; the environmental parameters include actual trajectory, desired trajectory, airship position, airship speed, and airship acceleration.

The error calculation module is used for determining linear errors and angle errors between the actual track and the expected track by adopting an environment optimal outer ring controller according to the actual track and the expected track of the target stratospheric airship; the environment-optimal outer ring controller is an outer ring controller designed based on the principle that the target stratospheric airship converges to a forward wind resistance point under the action of an unknown wind field.

And the state observation module is used for determining the observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error.

The action determining module is used for determining the two-dimensional action quantity required to be executed by the target stratospheric airship based on the inner ring attitude controller according to the observation state; the two-dimensional action quantity comprises a forward motion control quantity and a steering motion control quantity; the inner ring attitude controller is a controller obtained by training a neural network based on a depth deterministic strategy algorithm.

Optionally, the system further comprises:

the residence area determining module is used for acquiring the center of the residence area virtual arc and the radius of the residence area virtual arc.

And the expected track determining module is used for determining the expected track of the target stratospheric airship according to the circle center of the virtual arc of the residence area and the radius of the virtual arc of the residence area.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

The invention provides a method and a system for controlling residence in an intelligent area with optimal stratospheric airship environment. The method acquires environmental parameters of the airship, including the actual track, the expected track, the position, the speed and the acceleration. An environment optimal outer ring controller determines the linear error and the angle error according to the actual track and the expected track; the design principle of the outer ring controller is that the airship converges to a forward wind resistance point in an unknown wind field. The inner ring attitude controller then determines, according to the observation state, a two-dimensional action quantity comprising a forward motion control quantity and a steering motion control quantity. The inner ring attitude controller is obtained by training a reinforcement learning agent with a depth deterministic strategy algorithm so as to reduce the error between the actual path and the expected path. Because the outer ring controller is designed on the basis of environment optimal control theory, forward wind resistance control is achieved autonomously, so that regional residence is robust to unknown wind field disturbance and to model uncertainty.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for controlling residence in an optimal intelligent area of a stratospheric airship environment according to an embodiment of the invention.

Fig. 2 is a control frame diagram according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of an expected track of an environment optimization method according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of calculating a control error of an outer loop controller according to a first embodiment of the present invention.

Fig. 5 is a diagram of a binary tree according to a first embodiment of the present invention.

Fig. 6 is a structural diagram of a system for controlling residence in an optimal intelligent area of a stratospheric airship according to a second embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

Example 1

As shown in fig. 1, the present embodiment provides a method for controlling residence in an optimal intelligent area of a stratospheric airship environment, including:

Step 101: acquiring environmental parameters of a target stratospheric airship; the environmental parameters include actual trajectory, desired trajectory, airship position, airship speed, and airship acceleration.

Step 102: according to the actual track and the expected track of the target stratospheric airship, determining a linear error and an angle error between the actual track and the expected track by adopting an environment optimal outer ring controller; the environment-optimal outer ring controller is an outer ring controller designed based on the principle that the target stratospheric airship converges to a forward wind resistance point under the action of an unknown wind field.

Step 103: and determining the observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error.

Step 104: according to the observation state, determining the two-dimensional action quantity required to be executed by the target stratospheric airship based on an inner ring attitude controller; the two-dimensional motion quantity comprises a forward motion control quantity and a steering motion control quantity; the inner ring attitude controller is a controller obtained by training a neural network based on a depth deterministic strategy algorithm.

In this embodiment, a schematic diagram of the expected track is shown in fig. 3. The expected track is analogous to a simple pendulum model: under the action of a gravity field and damping, the pendulum ball finally converges to the point of lowest potential energy, namely the stable equilibrium point, no matter where on the circular arc it starts and even if the gravity field is unknown. The outer ring controller is designed on this principle, so that the airship is controlled to imitate the motion of the simple pendulum and converges to the forward wind resistance point (the stable equilibrium point) under the action of an unknown wind field. With the outer ring controller designed in this way, a reinforcement learning agent is trained to output the forward control quantity and the steering control quantity so as to control the airship to track the expected track.
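The pendulum analogy can be checked numerically with the sketch below, which integrates a damped pendulum from several starting angles and shows that each run ends near the lowest point. It illustrates the analogy only, not the airship dynamics; the damping coefficient, step size, duration and function name are all assumed values.

```python
import math

def simulate_pendulum(theta0: float, omega0: float = 0.0,
                      damping: float = 0.5, g_over_l: float = 1.0,
                      dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate theta'' = -(g/l) * sin(theta) - damping * theta'.

    Uses semi-implicit Euler and returns the final angle in radians,
    measured from the lowest point of the arc.
    """
    theta, omega = theta0, omega0
    for _ in range(steps):
        omega += (-g_over_l * math.sin(theta) - damping * omega) * dt
        theta += omega * dt
    return theta

# Different starting points on the arc all settle near theta = 0.
for start in (-2.0, 0.5, 2.5):
    print(start, simulate_pendulum(start))
```

The equilibrium is found without the simulation ever being told where it is, which mirrors the claim that the airship reaches the forward wind resistance point without measuring the wind direction.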

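Before step 101 is executed, the method further includes:

acquiring the circle center of the virtual arc of the residence area and the radius of the virtual arc of the residence area, and determining the expected track of the target stratospheric airship according to the circle center and the radius.

For example, as shown in fig. 3, the expected residence area is set, the center coordinates of the virtual circle are [coordinates not reproduced], and the radius is set to $r_c = 50$.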

In some implementations of the present embodiment, when performing step 102, as shown in fig. 4, the method specifically may include:

determining the linear error $e_{b,x}$ according to the formula [formula not reproduced]; and

determining the angle error according to the formula $e_\psi = \psi_c - \psi$.

Wherein [symbol not reproduced] is the x-direction error, [symbol not reproduced] is the y-direction error, $\psi_c$ is the desired heading angle, $e_{b,x}$ is the linear error, $e_\psi$ is the angle error, $r_c$ is the radius of the virtual arc of the residence area, and $\psi$ is the heading angle.

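In some implementations of this embodiment, step 103 may specifically include:

determining the observation state of the target stratospheric airship according to the formula $s_t = [e_{b,x}, e_\psi, u, r, x, y, du, dr]$.

Wherein $s_t$ is the observation state, $u$ is the forward speed of the airship, $r$ is the steering speed, $du$ is the forward acceleration of the airship, $dr$ is the steering angular acceleration, $e_{b,x}$ is the linear error, $e_\psi$ is the angle error, and $x$ and $y$ are the abscissa and the ordinate of the airship in the inertial coordinate system.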

In some implementations of this embodiment, step 104 may specifically include:

inputting the observation state $s_t$ and outputting the two-dimensional action quantity $a_t$.

Specifically, the actor neural network and the critic neural network are each a multi-layer perceptron comprising an input layer, two middle layers and an output layer. The actor neural network has 8 neurons in the input layer, 128 neurons in each of the two middle layers and 2 neurons in the output layer; the critic neural network has 10 neurons in the input layer, 128 neurons in each middle layer and 1 neuron in the output layer. The activation function is $\tanh(\cdot)$; the output of the actor neural network is limited to the range $[-1, 1]$ by $\tanh(\cdot)$, while the output of the critic neural network is not limited. The actor neural network takes the observation vector $s_t = [e_{b,x}, e_\psi, u, r, x, y, du, dr]$ as input and outputs the two-dimensional action quantity $a_t = [a_1, a_2]$.
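The layer sizes given above can be sketched as two small multi-layer perceptrons in plain Python. The weight initialization, the helper names `init_mlp` and `forward`, and feeding the critic the concatenation of state and action (8 + 2 = 10 inputs) are assumptions; the text specifies only the layer widths and the tanh activation.

```python
import math
import random

def init_mlp(sizes, rng):
    """Create (weights, biases) per layer; the Gaussian init is an assumption."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers

def forward(layers, x, squash_output):
    """tanh on hidden layers; tanh on the output only if squash_output is True."""
    h = list(x)
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        is_last = idx == len(layers) - 1
        h = z if (is_last and not squash_output) else [math.tanh(v) for v in z]
    return h

rng = random.Random(0)
actor = init_mlp([8, 128, 128, 2], rng)     # s_t -> a_t in [-1, 1]^2
critic = init_mlp([10, 128, 128, 1], rng)   # (s_t, a_t) -> unbounded value

s_t = [rng.uniform(-1.0, 1.0) for _ in range(8)]      # illustrative observation
a_t = forward(actor, s_t, squash_output=True)
q = forward(critic, s_t + a_t, squash_output=False)   # list concatenation
print(len(a_t), len(q))
```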

The training process of the neural network based on the depth deterministic strategy algorithm is as follows:

step 201: acquiring historical sample data of the target stratospheric airship; the historical sample data comprise an old state, a two-dimensional action quantity and a new state of the target stratospheric airship; the old state is an observation state when the target stratospheric airship does not execute two-dimensional action quantity; the new state is an observation state of the target stratospheric airship after the two-dimensional action quantity is executed.

Step 202: and calculating the rewarding value of the two-dimensional action quantity in the current iteration process according to the new state.

Step 203: and forming the old state, the two-dimensional action quantity, the rewarding value and the new state into a quadruple, and storing the quadruple into an experience playback pool.

Step 204: extracting a set number of quadruples from the experience playback pool according to an importance sampling strategy, calculating a loss value according to the quadruples and a loss function of a target network, and optimizing the weights of the actor neural network and the critic neural network by using an optimizer; the target network is a neural network based on a depth deterministic strategy algorithm.

Step 205: updating the weight of the target network according to the weight of the actor neural network and the weight of the critic neural network to obtain the trained neural network based on the depth deterministic strategy algorithm.

Step 202 may specifically be as follows:

$r_t = r_{err} + r_{acc}$.

Where $r_{err}$ is the reward value corresponding to the tracking error, $r_{err} = K_{b,x} \cdot \exp(-k_{b,x} \cdot e_{b,x}) + K_\psi \cdot \exp(-k_\psi \cdot e_\psi)$; $r_{acc}$ is the penalty caused by acceleration, $r_{acc} = -K_{du} \cdot |du| - K_{dr} \cdot |dr|$; $K_{b,x}$, $K_\psi$, $K_{du}$ and $K_{dr}$ are control parameters, and $k_{b,x}$ and $k_\psi$ are scaling factors.

Specifically, the airship driving system receives $a_t = [a_1, a_2]$. The maximum value of the forward control quantity of the airship is limited to 2e5 and the maximum value of the steering control quantity is limited to 2e7, so the control quantity input to the airship system is the scaled action quantity $\tau = [2 \times 10^5 \cdot a_1, 2 \times 10^7 \cdot a_2]$. After the airship executes this control quantity, a new state quantity $s_{t+1}$ is obtained.
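The scaling from the normalized action to the control quantity can be sketched as follows. The function name `scale_action` and the defensive clipping to [-1, 1] are assumptions; the limits 2e5 and 2e7 come from the text, which does not state their units.

```python
from typing import List, Sequence

def scale_action(a_t: Sequence[float],
                 max_forward: float = 2e5,
                 max_steer: float = 2e7) -> List[float]:
    """Map a_t = [a1, a2] in [-1, 1] to tau = [2e5 * a1, 2e7 * a2]."""
    # Clipping is an added safeguard; a tanh output is already in [-1, 1].
    a1, a2 = (max(-1.0, min(1.0, a)) for a in a_t)
    return [max_forward * a1, max_steer * a2]

print(scale_action([0.5, -0.25]))  # [100000.0, -5000000.0]
```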

The reward value obtained from the tracking error is calculated as $r_{err} = K_{b,x} \cdot \exp(-k_{b,x} \cdot e_{b,x}) + K_\psi \cdot \exp(-k_\psi \cdot e_\psi)$, the penalty caused by acceleration is calculated as $r_{acc} = -K_{du} \cdot |du| - K_{dr} \cdot |dr|$, and the two terms are added to obtain the final reward $r_t = r_{err} + r_{acc}$.

Step 203 may specifically be as follows:

The old state $s_t$, the action quantity $a_t$, the reward value $r_t$ and the new state $s_{t+1}$ form the quadruple $(s_t, a_t, r_t, s_{t+1})$, which is saved to the experience playback pool.

The weight of this set of data is set to the initial value 10, i.e., $w = 10$, and saved to the corresponding leaf node of the binary tree.

The capacity of the experience playback pool is a hyperparameter to be determined according to the data volume; when the experience playback pool is full, the newest data replace the oldest data.

In executing step 204, the method specifically may include:

A batch of $N$ quadruples $(s_i, a_i, r_i, s_{i+1})$ is extracted from the experience playback pool according to the importance sampling strategy.

The importance sampling strategy determines the probability of each datum being sampled according to its weight and uses a binary tree to store the weights, as shown in fig. 5. Each datum corresponds to one weight stored in a leaf node (a lowest-level node). The value of each parent node is the sum of the values of its two child nodes, and so on, so that the root node (the topmost node) holds the sum of all weights. Sampling is carried out according to the weights stored in the leaf nodes, so that data with larger weights are more likely to be extracted; the extraction probability of each datum is calculated as follows:

$P_i = \dfrac{w_i}{\sum_k w_k}$,

where $w_i$ is the weight corresponding to the $i$-th datum.

For example, the extraction batch size is set to $N = 256$, the probability $P_i$ of each datum in the experience playback pool is calculated, and 256 data are extracted according to this probability.
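The weighted sampling described above can be sketched with an array-backed sum tree. The class `SumTree`, its method names, the capacity of 4 and the example weights are hypothetical. Drawing a uniform number in [0, root value) and descending from the root is one common way to sample each leaf with probability proportional to its weight; it is assumed here rather than taken from the text.

```python
import random

class SumTree:
    """Binary tree whose leaves hold weights and whose parents hold sums."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # tree[1] is the root; leaves start at `capacity`
        self.data = [None] * capacity
        self.next_slot = 0

    def add(self, item, weight: float = 10.0) -> None:
        """Store an item with the default new-data weight of 10."""
        slot = self.next_slot
        self.data[slot] = item
        self.update(slot, weight)
        self.next_slot = (slot + 1) % self.capacity  # overwrite the oldest when full

    def update(self, slot: int, weight: float) -> None:
        """Set one leaf weight and refresh the sums on the path to the root."""
        i = slot + self.capacity
        self.tree[i] = weight
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self) -> float:
        return self.tree[1]

    def sample(self, rng: random.Random):
        """Return (slot, item), chosen with probability w_i / sum of weights."""
        value = rng.random() * self.total()
        i = 1
        while i < self.capacity:          # descend until a leaf is reached
            left = 2 * i
            if value < self.tree[left]:
                i = left
            else:
                value -= self.tree[left]
                i = left + 1
        slot = i - self.capacity
        return slot, self.data[slot]

rng = random.Random(0)
pool = SumTree(capacity=4)
for name in ("a", "b", "c", "d"):
    pool.add(name)                        # new data enter with weight 10
for slot, w in enumerate((1.0, 1.0, 1.0, 7.0)):
    pool.update(slot, w)                  # pretend the weights were refreshed
counts = [0, 0, 0, 0]
for _ in range(10000):
    slot, _item = pool.sample(rng)
    counts[slot] += 1
print(pool.total(), [c / 10000 for c in counts])
```

Both an update and a draw touch only one root-to-leaf path, so their cost grows with the logarithm of the pool capacity rather than with the capacity itself.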

All data weights are limited to the interval $[0, 10]$. Newly saved data have a default weight of 10, which makes the agent more inclined to learn from new data. After each time step, the data weights are updated according to the temporal-difference error by the following formula:

$w_i = e_{td,i}^{0.6}$.

Where $e_{td,i}$ is the temporal-difference error and $w_i$ is the updated weight. The batch size $N$ of the data extracted from the experience playback pool is a manually set hyperparameter; the extracted batch is used to update the weights of the neural networks.
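The weight update can be sketched as a one-line function with the stated limits applied. Taking the magnitude of the error before raising it to the power 0.6 is an assumption made so that negative errors are handled; the function name `updated_weight` is hypothetical.

```python
def updated_weight(td_error: float, exponent: float = 0.6,
                   w_min: float = 0.0, w_max: float = 10.0) -> float:
    """Return the error magnitude raised to `exponent`, limited to [w_min, w_max]."""
    w = abs(td_error) ** exponent   # abs() is an assumption, see the note above
    return max(w_min, min(w_max, w))

for e in (0.0, 1.0, -32.0, 1e6):
    print(e, updated_weight(e))
```

Because the exponent is below 1, large errors are compressed: an error of magnitude 32 maps to a weight of about 8, and anything larger than about 46 saturates at the upper limit of 10.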

The loss values are calculated with the target network, and the weights of the actor and critic networks are optimized using an optimizer.

Using the 256 data just extracted, the target value $y_i = r_i + \gamma \cdot Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$ is calculated, and then the loss function value $L_c$ of the critic is calculated [formula not reproduced].

The loss function value $L_a$ of the actor is calculated [formula not reproduced].

The parameters are optimized using the Adam optimizer: the critic parameters are optimized by decreasing $L_c$, and the actor parameters are optimized by decreasing $L_a$.

Step 205 is specifically as follows:

The target network weights are updated as follows:

$\theta^{\mu'} \leftarrow \lambda \theta^{\mu} + (1 - \lambda) \theta^{\mu'}$.

$\theta^{Q'} \leftarrow \lambda \theta^{Q} + (1 - \lambda) \theta^{Q'}$.

$\lambda$ is the soft update rate, a hyperparameter selected according to actual requirements.

The simulation step length is set to 0.1 s, and the above steps are simulated in a loop to train the network parameters of the agent based on the depth deterministic strategy algorithm (DDPG).

In addition, in a practical application scenario, state parameters of the stratospheric airship such as track and speed are measured by sensors, and the control quantity output by the fully trained agent is allocated and then transmitted to actuators such as the control surfaces and the propellers, thereby achieving regional residence control of the stratospheric airship that is robust to unknown wind field interference and model uncertainty; the overall flow is shown in fig. 2. According to actual demands, a control engineer can set the position of the residence virtual circle and the initial position of the stratospheric airship, and the calculated control quantity is transmitted to the actuators to realize the regional residence control function.

Example two

As shown in fig. 6, the present embodiment provides a stratospheric airship environment optimal intelligent regional residence control system, including:

The parameter obtaining module 601 is configured to obtain an environmental parameter of the target stratospheric airship; the environmental parameters include actual trajectory, desired trajectory, airship position, airship speed, and airship acceleration.

The error calculation module 602 is configured to determine, according to an actual track and an expected track of the target stratospheric airship, a linear error and an angle error between the actual track and the expected track by using an environment-optimal outer ring controller; the environment-optimal outer ring controller is an outer ring controller designed based on the principle that the target stratospheric airship converges to a forward wind resistance point under the action of an unknown wind field.

The state observation module 603 is configured to determine an observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error.

The action determining module 604 is configured to determine, based on the inner ring attitude controller, a two-dimensional action amount that needs to be performed by the target stratospheric airship according to the observation state; the two-dimensional motion quantity comprises a forward motion control quantity and a steering motion control quantity; the inner ring attitude controller is a controller obtained by training a neural network based on a depth deterministic strategy algorithm.

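Optionally, the system further comprises:

a residence area determining module, configured to acquire the circle center of the virtual arc of the residence area and the radius of the virtual arc of the residence area; and

an expected track determining module, configured to determine the expected track of the target stratospheric airship according to the circle center of the virtual arc of the residence area and the radius of the virtual arc of the residence area.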

In summary, the invention has the following technical effects:

1) The invention designs the regional resident outer ring controller by utilizing the environment optimal control theory, which is used for constructing the expected path of the stratospheric airship, so that the airship can automatically fly to a forward wind-resistant position under the action of a wind field; meanwhile, the inner ring attitude controller is designed based on a depth deterministic strategy algorithm (DDPG), and aims to reduce errors of an actual path and an expected path through training reinforcement learning agents until the expected path can be tracked rapidly and accurately.

2) The invention avoids establishing a mathematical model, can be directly applied to a high-nonlinearity real environment, has simple and efficient steps, and can ensure the stability of the system.

3) The outer ring controller designed based on the environment optimal control theory can realize forward wind resistance without measuring wind direction.

4) The inner loop controller designed by the depth deterministic strategy method can realize self-learning, has less parameter adjustment, and saves the cost of manual parameter adjustment.

5) The algorithm has simple structure and high response speed, and is easy for engineering realization.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the method of the present invention and its core ideas; also, those of ordinary skill in the art may, in light of the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, this description should not be construed as limiting the invention.

Claims

1. The intelligent area residence control method for the stratospheric airship environment optimization is characterized by comprising the following steps of:

acquiring environmental parameters of a target stratospheric airship; the environmental parameters include an actual track, an expected track, an airship position, an airship speed, and an airship acceleration;

determining, by an environment-optimal outer ring controller, a linear error and an angle error between the actual track and the expected track according to the actual track and the expected track of the target stratospheric airship; the environment-optimal outer ring controller is an outer ring controller designed based on the principle that the target stratospheric airship converges to a forward wind resistance point under the action of an unknown wind field;

determining an observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error;

2. The method for intelligent regional residence control for optimal stratospheric airship environment according to claim 1, further comprising, before obtaining the environmental parameters of the target stratospheric airship:

Acquiring a virtual arc circle center of a resident area and a virtual arc radius of the resident area;

3. The method for controlling the residence of the optimal intelligent area of the stratospheric airship environment according to claim 1, wherein determining the linear error and the angle error between the actual track and the expected track by adopting the environment-optimal outer ring controller, according to the actual track and the expected track of the target stratospheric airship, specifically comprises the following steps:

determining the linear error according to the formula […];

determining the angle error according to the formula e_ψ＝ψ_c-ψ;

wherein […] is the x-direction error, […] is the y-direction error, ψ_c is the desired heading angle, e_b,x is the linear error, e_ψ is the angle error, r_c is the virtual arc radius of the residence area, and ψ is the heading angle.
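As an illustration of the angle-error relation e_ψ＝ψ_c-ψ only, a minimal Python sketch follows. The function name `heading_error` and the wrapping of the result into [-π, π) are assumptions made for this example and are not stated in the claim; the linear error is not sketched because its formula is not reproduced in the text.

```python
import math


def heading_error(psi_c: float, psi: float) -> float:
    """Return e_psi = psi_c - psi in radians.

    Assumption (not in the claim): the raw difference is wrapped into
    [-pi, pi) so that the controller always sees the shorter turn.
    """
    raw = psi_c - psi
    return (raw + math.pi) % (2.0 * math.pi) - math.pi


if __name__ == "__main__":
    # Desired heading 0.5 rad, current heading 0.2 rad.
    print(heading_error(0.5, 0.2))
```

Without the wrapping step the function reduces exactly to the claimed difference ψ_c-ψ.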

4. The method for controlling the residence of an intelligent area optimal for a stratospheric airship environment according to claim 1, wherein determining the observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error specifically comprises the following steps:

determining the observation state of the target stratospheric airship according to the formula s_t＝[e_b,x, e_ψ, u, r, x, y, du, dr];
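The assembly of the observation state can be sketched as follows. This is a hedged illustration: the container `AirshipMeasurements` and the function `build_observation` are hypothetical names, and reading u and r as speed components with du and dr as the corresponding accelerations is an assumption drawn from the ordering of the claimed vector.

```python
from typing import List, NamedTuple


class AirshipMeasurements(NamedTuple):
    """Hypothetical container for the measured quantities in s_t."""
    x: float   # position, x component
    y: float   # position, y component
    u: float   # speed component u (assumed meaning)
    r: float   # speed component r (assumed meaning)
    du: float  # acceleration matching u
    dr: float  # acceleration matching r


def build_observation(e_bx: float, e_psi: float,
                      m: AirshipMeasurements) -> List[float]:
    """Return s_t = [e_b,x, e_psi, u, r, x, y, du, dr] in the claimed order."""
    return [e_bx, e_psi, m.u, m.r, m.x, m.y, m.du, m.dr]


if __name__ == "__main__":
    m = AirshipMeasurements(x=10.0, y=20.0, u=5.0, r=0.01, du=0.2, dr=0.001)
    print(build_observation(1.0, 0.1, m))
```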

5. The method for controlling the residence of the optimal intelligent area in the stratospheric airship environment according to claim 1, wherein the training process of the neural network based on the deep deterministic policy algorithm is as follows:

acquiring historical sample data of the target stratospheric airship; the historical sample data comprise an old state, a two-dimensional action quantity and a new state of the target stratospheric airship; the old state is the observation state before the target stratospheric airship executes the two-dimensional action quantity; the new state is the observation state after the target stratospheric airship executes the two-dimensional action quantity;

calculating a reward value of the two-dimensional action quantity in the current iteration according to the new state;

forming the old state, the two-dimensional action quantity, the reward value and the new state into a quadruple, and storing the quadruple in an experience replay pool;

extracting a set number of quadruples from the experience replay pool according to an importance sampling strategy, and calculating a loss value according to the quadruples and a loss function of a target network; the target network is a neural network based on the deep deterministic policy algorithm;

optimizing the weights of the actor neural network and the critic neural network by using an optimizer according to the loss value;
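The storage and sampling steps above can be sketched with a small replay pool. This is a minimal sketch under stated assumptions, not the claimed implementation: the class `ReplayPool` and its methods are hypothetical, and the "importance sampling strategy" is approximated here by sampling with replacement in proportion to a per-sample priority, which the claim does not specify.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (old state, two-dimensional action, reward value, new state)
Transition = Tuple[Sequence[float], Sequence[float], float, Sequence[float]]


class ReplayPool:
    """Fixed-capacity pool of quadruples; the oldest entries are evicted first."""

    def __init__(self, capacity: int) -> None:
        self.transitions: Deque[Transition] = deque(maxlen=capacity)
        self.priorities: Deque[float] = deque(maxlen=capacity)

    def store(self, old_state: Sequence[float], action: Sequence[float],
              reward: float, new_state: Sequence[float],
              priority: float = 1.0) -> None:
        """Append one quadruple with a non-negative priority (assumed scheme)."""
        self.transitions.append((old_state, action, reward, new_state))
        self.priorities.append(priority)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw batch_size quadruples with probability proportional to priority."""
        return rng.choices(list(self.transitions),
                           weights=list(self.priorities), k=batch_size)

    def __len__(self) -> int:
        return len(self.transitions)
```

A training loop would call `store` after every control step and `sample` once enough quadruples have accumulated; at least one stored priority must be positive for sampling to work.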

6. The method for controlling the residence of the optimal intelligent area in the stratospheric airship environment according to claim 5, wherein calculating the reward value of the two-dimensional action quantity in the current iteration according to the new state is specifically as follows:

r_t＝r_err+r_acc；

7. The method for controlling the residence of the optimal intelligent area in the stratospheric airship environment according to claim 6, wherein the loss value is calculated by adopting the target network, and the weights of the actor neural network and the critic neural network are optimized by using the optimizer, specifically:

y_i＝r_i+γ·Q′(s_{i+1},μ′(s_{i+1}|θ^μ′)|θ^Q′)；
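The target value y_i can be computed as in the following sketch, where the target actor μ′ and target critic Q′ are passed in as plain callables. The function name `td_target` and the callable interfaces are assumptions for illustration; in practice both would be neural networks with weights θ^μ′ and θ^Q′.

```python
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]


def td_target(reward: float,
              next_state: State,
              target_actor: Callable[[State], Action],
              target_critic: Callable[[State, Action], float],
              gamma: float) -> float:
    """Return y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    next_action = target_actor(next_state)  # mu'(s_{i+1} | theta^mu')
    return reward + gamma * target_critic(next_state, next_action)


if __name__ == "__main__":
    # Stand-in functions used only to exercise the formula.
    actor = lambda s: [0.5 * v for v in s[:2]]
    critic = lambda s, a: sum(s) + sum(a)
    print(td_target(1.0, [2.0, 4.0], actor, critic, gamma=0.9))
```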

8. The method for controlling the optimal intelligent area residence of the stratospheric airship environment according to claim 6, wherein the weights of the target network are updated according to the weights of the actor neural network and the weights of the critic neural network, specifically as follows:

θ^μ′←λθ^μ+(1-λ)θ^μ′；

θ^Q′←λθ^Q+(1-λ)θ^Q′；
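The two update rules share the same form, θ′←λθ+(1-λ)θ′, which the following sketch applies to a flat list of weights. Representing the weights as a Python list and the name `soft_update` are assumptions for illustration; a real implementation would iterate over the network's parameter tensors.

```python
from typing import List


def soft_update(target_weights: List[float],
                online_weights: List[float],
                lam: float) -> List[float]:
    """Return lam * theta + (1 - lam) * theta' for each pair of weights."""
    if len(target_weights) != len(online_weights):
        raise ValueError("weight lists must have the same length")
    return [lam * w + (1.0 - lam) * w_t
            for w, w_t in zip(online_weights, target_weights)]


if __name__ == "__main__":
    # With a small lam the target weights move only slightly per update.
    print(soft_update([0.0, 10.0], [1.0, 0.0], lam=0.1))
```

A small λ keeps the target network close to its previous weights, which is the usual reason for preferring this soft update over copying the weights outright.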

9. An intelligent regional residence control system for an optimal stratospheric airship environment, which is characterized by comprising:

the parameter acquisition module is used for acquiring the environmental parameters of the target stratospheric airship; the environmental parameters include an actual track, an expected track, an airship position, an airship speed, and an airship acceleration;

the error calculation module is used for determining linear errors and angle errors between the actual track and the expected track by adopting an environment optimal outer ring controller according to the actual track and the expected track of the target stratospheric airship; the environment-optimal outer ring controller is an outer ring controller designed based on the principle that the target stratospheric airship converges to a forward wind resistance point under the action of an unknown wind field;

The state observation module is used for determining the observation state of the target stratospheric airship according to the airship position, the airship speed, the airship acceleration, the linear error and the angle error;

10. The stratospheric airship environment-optimal intelligent regional residence control system of claim 9, further comprising:

the residence area determining module is used for acquiring the center of the residence area virtual arc and the radius of the residence area virtual arc;